| % Generated by roxygen2: do not edit by hand |
| % Please edit documentation in R/dataset-scan.R |
| \name{map_batches} |
| \alias{map_batches} |
| \title{Apply a function to a stream of RecordBatches} |
| \usage{ |
| map_batches(X, FUN, ..., .schema = NULL, .lazy = TRUE, .data.frame = NULL) |
| } |
| \arguments{ |
| \item{X}{A \code{Dataset} or \code{arrow_dplyr_query} object, as returned by the |
| \code{dplyr} methods on \code{Dataset}.} |
| |
| \item{FUN}{A function or \code{purrr}-style lambda expression to apply to each |
| batch. It must return a \code{RecordBatch} or something coercible to one via |
| \code{as_record_batch()}.} |
| |
| \item{...}{Additional arguments passed to \code{FUN}.} |
| |
| \item{.schema}{An optional \code{\link[=schema]{schema()}}. If \code{NULL}, the schema is inferred |
| from the first batch.} |
| |
| \item{.lazy}{Use \code{TRUE} to evaluate \code{FUN} lazily as batches are read from |
| the result; use \code{FALSE} to evaluate \code{FUN} on all batches before returning |
| the reader.} |
| |
| \item{.data.frame}{Deprecated and ignored.} |
| } |
| \value{ |
| An \code{arrow_dplyr_query}. |
| } |
| \description{ |
| As an alternative to calling \code{collect()} on a \code{Dataset} query, you can |
| use this function to access the stream of \code{RecordBatch}es in the \code{Dataset}. |
| This lets you perform more complex operations in R on chunks of data |
| without holding the entire \code{Dataset} in memory at once. You can include |
| \code{map_batches()} in a dplyr pipeline and apply further dplyr verbs to the |
| resulting stream of data in Arrow. |
| } |
| \details{ |
| This function is experimental and not recommended for production use. It is |
| also single-threaded and runs in R, not C++, so it will not be as fast as core |
| Arrow methods. |
| } |
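| \examples{ |
| \dontrun{ |
| # A minimal sketch of the batch-wise workflow described above, not a |
| # definitive recipe. Assumptions: the arrow and dplyr packages are installed, |
| # the built-in mtcars data stands in for a real Dataset, and first_two_rows() |
| # is a hypothetical helper introduced only for this illustration. |
| library(arrow) |
| library(dplyr) |
| |
| # Write a small data frame to a temporary directory and open it as a Dataset |
| path <- tempfile() |
| write_dataset(mtcars, path) |
| ds <- open_dataset(path) |
| |
| # FUN receives one RecordBatch at a time and must return a RecordBatch, |
| # or something as_record_batch() can coerce |
| first_two_rows <- function(batch) { |
|   as_record_batch(head(as.data.frame(batch), 2)) |
| } |
| |
| # Nested calls are used here instead of a pipe; collect() pulls the |
| # processed stream into an R data frame |
| result <- collect(map_batches(ds, first_two_rows)) |
| } |
| } |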