| % Generated by roxygen2: do not edit by hand |
| % Please edit documentation in R/dataset.R |
| \name{map_batches} |
| \alias{map_batches} |
| \title{Apply a function to a stream of RecordBatches} |
| \usage{ |
| map_batches(X, FUN, ..., .data.frame = TRUE) |
| } |
| \arguments{ |
| \item{X}{A \code{Dataset} or \code{arrow_dplyr_query} object, as returned by the |
| \code{dplyr} methods on \code{Dataset}.} |
| |
| \item{FUN}{A function or \code{purrr}-style lambda expression to apply to each |
| batch.} |
| |
| \item{...}{Additional arguments passed to \code{FUN}.} |
| |
| \item{.data.frame}{logical: combine the results of applying \code{FUN} to each |
| batch into a single \code{data.frame}? Default \code{TRUE}.} |
| } |
| \description{ |
| As an alternative to calling \code{collect()} on a \code{Dataset} query, you can |
| use this function to process the stream of \code{RecordBatch}es in the |
| \code{Dataset} one batch at a time. This lets you aggregate each batch |
| separately and pull the intermediate results into a \code{data.frame} for |
| further aggregation, even when the full \code{Dataset} result would not fit |
| in memory. |
| } |
| \details{ |
| This is experimental and not recommended for production use. |
| } |
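| \examples{ |
| \dontrun{ |
| # A minimal sketch of the per-batch aggregation described above; it is not |
| # taken from the package sources. It assumes the arrow package is installed |
| # with dataset support. The toy dataset written from mtcars and the names |
| # `path`, `ds`, and `partial` are illustrative only. |
| library(arrow) |
| |
| path <- tempfile() |
| write_dataset(mtcars, path, partitioning = "cyl") |
| ds <- open_dataset(path) |
| |
| # Reduce each RecordBatch to a one-row data.frame of partial sums. |
| partial <- map_batches(ds, function(batch) { |
|   df <- as.data.frame(batch) |
|   data.frame(n = nrow(df), total_mpg = sum(df$mpg)) |
| }) |
| |
| # Combine the per-batch results into the overall mean of mpg. |
| sum(partial$total_mpg) / sum(partial$n) |
| } |
| } |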