| % Generated by roxygen2: do not edit by hand |
| % Please edit documentation in R/dplyr-funcs-doc.R |
| \name{acero} |
| \alias{acero} |
| \title{Functions available in Arrow dplyr queries} |
| \description{ |
| The \code{arrow} package contains methods for 37 \code{dplyr} table functions, many of |
| which are "verbs" that transform one or more tables. |
| The package also maps 211 R functions to the corresponding |
| functions in the Arrow compute library. These mappings let you call R functions |
| inside \code{dplyr} methods, including many from packages like |
| \code{stringr} and \code{lubridate}; the calls are translated to Arrow and run |
| on the Arrow query engine (Acero). This document lists all of the mapped |
| functions. |
| } |
| \section{\code{dplyr} verbs}{ |
| Most verb functions return an \code{arrow_dplyr_query} object, similar in spirit |
| to a \code{dbplyr::tbl_lazy}. This means that the verbs do not eagerly evaluate |
| the query on the data. To run the query, call either \code{compute()}, |
| which returns an \code{arrow} \link{Table}, or \code{collect()}, which pulls the resulting |
| Table into an R \code{data.frame}. |
| \itemize{ |
| \item \code{\link[dplyr:filter-joins]{anti_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:arrange]{arrange()}} |
| \item \code{\link[dplyr:compute]{collapse()}} |
| \item \code{\link[dplyr:compute]{collect()}} |
| \item \code{\link[dplyr:compute]{compute()}} |
| \item \code{\link[dplyr:count]{count()}} |
| \item \code{\link[dplyr:distinct]{distinct()}}: \code{.keep_all = TRUE} not supported |
| \item \code{\link[dplyr:explain]{explain()}} |
| \item \code{\link[dplyr:filter]{filter()}} |
| \item \code{\link[dplyr:mutate-joins]{full_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:glimpse]{glimpse()}} |
| \item \code{\link[dplyr:group_by]{group_by()}} |
| \item \code{\link[dplyr:group_by_drop_default]{group_by_drop_default()}} |
| \item \code{\link[dplyr:group_data]{group_vars()}} |
| \item \code{\link[dplyr:group_data]{groups()}} |
| \item \code{\link[dplyr:mutate-joins]{inner_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:mutate-joins]{left_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:mutate]{mutate()}}: window functions (e.g. things that require aggregation within groups) not currently supported |
| \item \code{\link[dplyr:pull]{pull()}}: the \code{name} argument is not supported; returns an R vector by default, but this default is deprecated and a future release will return an Arrow \link{ChunkedArray} instead. Pass \code{as_vector = TRUE} or \code{as_vector = FALSE} to choose explicitly, or set \code{options(arrow.pull_as_vector)} globally. |
| \item \code{\link[dplyr:relocate]{relocate()}} |
| \item \code{\link[dplyr:rename]{rename()}} |
| \item \code{\link[dplyr:rename]{rename_with()}} |
| \item \code{\link[dplyr:mutate-joins]{right_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:select]{select()}} |
| \item \code{\link[dplyr:filter-joins]{semi_join()}}: the \code{copy} and \code{na_matches} arguments are ignored |
| \item \code{\link[dplyr:explain]{show_query()}} |
| \item \code{\link[dplyr:slice]{slice_head()}}: slicing within groups not supported; Arrow datasets do not have row order, so head is non-deterministic; \code{prop} only supported on queries where \code{nrow()} is knowable without evaluating |
| \item \code{\link[dplyr:slice]{slice_max()}}: slicing within groups not supported; \code{with_ties = TRUE} (dplyr default) is not supported; \code{prop} only supported on queries where \code{nrow()} is knowable without evaluating |
| \item \code{\link[dplyr:slice]{slice_min()}}: slicing within groups not supported; \code{with_ties = TRUE} (dplyr default) is not supported; \code{prop} only supported on queries where \code{nrow()} is knowable without evaluating |
| \item \code{\link[dplyr:slice]{slice_sample()}}: slicing within groups not supported; \code{replace = TRUE} and the \code{weight_by} argument not supported; \code{n} only supported on queries where \code{nrow()} is knowable without evaluating |
| \item \code{\link[dplyr:slice]{slice_tail()}}: slicing within groups not supported; Arrow datasets do not have row order, so tail is non-deterministic; \code{prop} only supported on queries where \code{nrow()} is knowable without evaluating |
| \item \code{\link[dplyr:summarise]{summarise()}}: window functions not currently supported; arguments \code{.drop = FALSE} and \code{.groups = "rowwise"} not supported |
| \item \code{\link[dplyr:count]{tally()}} |
| \item \code{\link[dplyr:transmute]{transmute()}} |
| \item \code{\link[dplyr:group_by]{ungroup()}} |
| \item \code{\link[dplyr:setops]{union()}} |
| \item \code{\link[dplyr:setops]{union_all()}} |
| } |
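| |
| The lazy evaluation described at the top of this section can be sketched as |
| follows. This is a minimal illustration, not a definitive recipe: it assumes |
| that \code{arrow} and \code{dplyr} are installed, that R is version 4.1 or later (for |
| the native pipe), and it uses the built-in \code{mtcars} data purely as sample input. |
| \preformatted{ |
| library(arrow) |
| library(dplyr) |
| |
| # Build a small in-memory Arrow Table from a base R data frame |
| cars <- arrow_table(mtcars) |
| |
| # Verbs only record the query; nothing is evaluated yet |
| query <- cars |> |
|   filter(cyl == 4) |> |
|   select(mpg, cyl) |
| inherits(query, "arrow_dplyr_query") |
| |
| # compute() runs the query and keeps the result in Arrow memory |
| result_table <- compute(query) |
| |
| # collect() runs the query and returns the result to R |
| result_df <- collect(query) |
| } |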
| } |
| |
| \section{Function mappings}{ |
| In the list below, any differences in behavior or support between Acero and |
| the R function are listed. If no notes follow the function name, then you |
| can assume that the function works in Acero just as it does in R. |
| |
| Functions can be called either as \code{pkg::fun()} or just \code{fun()}, i.e. both |
| \code{str_sub()} and \code{stringr::str_sub()} work. |
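| |
| As a small sketch of the two calling forms (assuming \code{arrow}, \code{dplyr} and |
| \code{stringr} are installed; the table and column names are made up for |
| illustration), both new columns below should hold the same values: |
| \preformatted{ |
| library(arrow) |
| library(dplyr) |
| library(stringr) |
| |
| # Hypothetical sample data |
| words <- arrow_table(name = c("alpha", "beta")) |
| |
| words |> |
|   mutate( |
|     short_a = str_sub(name, 1, 2),          # unqualified call |
|     short_b = stringr::str_sub(name, 1, 2)  # namespaced call |
|   ) |> |
|   collect() |
| } |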
| |
| In addition to these functions, you can call any of Arrow's 246 compute |
| functions directly. Many of them have no R equivalent. Where an R function |
| mapping does exist, you can still call the Arrow function directly to bypass |
| the adaptations the mapping adds to make Acero behave like R. These functions |
| are listed in the \href{https://arrow.apache.org/docs/cpp/compute.html}{C++ documentation}; |
| in the R function registry they are named with an \code{arrow_} prefix, such |
| as \code{arrow_ascii_is_decimal}. |
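| |
| A minimal sketch of calling a compute function through its \code{arrow_} prefix |
| is shown here. It assumes \code{arrow} and \code{dplyr} are installed; the table and |
| column names are hypothetical. |
| \preformatted{ |
| library(arrow) |
| library(dplyr) |
| |
| # Hypothetical sample data: one numeric-looking string, one not |
| codes <- arrow_table(code = c("123", "abc")) |
| |
| # Call the Arrow compute function ascii_is_decimal via the arrow_ prefix |
| codes |> |
|   mutate(all_digits = arrow_ascii_is_decimal(code)) |> |
|   collect() |
| } |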
| \subsection{arrow}{ |
| \itemize{ |
| \item \code{\link[=add_filename]{add_filename()}} |
| \item \code{\link[=cast]{cast()}} |
| } |
| } |
| |
| \subsection{base}{ |
| \itemize{ |
| \item \code{\link[=-]{-}} |
| \item \code{\link[=!]{!}} |
| \item \code{\link[=!=]{!=}} |
| \item \code{\link[=*]{*}} |
| \item \code{\link[=/]{/}} |
| \item \code{\link[=&]{&}} |
| \item \code{\link[=\%/\%]{\%/\%}} |
| \item \code{\link[=\%\%]{\%\%}} |
| \item \code{\link[=\%in\%]{\%in\%}} |
| \item \code{\link[=^]{^}} |
| \item \code{\link[=+]{+}} |
| \item \code{\link[=<]{<}} |
| \item \code{\link[=<=]{<=}} |
| \item \code{\link[===]{==}} |
| \item \code{\link[=>]{>}} |
| \item \code{\link[=>=]{>=}} |
| \item \code{\link[=|]{|}} |
| \item \code{\link[base:MathFun]{abs()}} |
| \item \code{\link[base:Trig]{acos()}} |
| \item \code{\link[base:all]{all()}} |
| \item \code{\link[base:any]{any()}} |
| \item \code{\link[base:character]{as.character()}} |
| \item \code{\link[base:as.Date]{as.Date()}}: multiple \code{tryFormats} not supported in Arrow. |
| Consider using the lubridate specialised parsing functions \code{ymd()}, \code{mdy()}, etc. |
| \item \code{\link[base:difftime]{as.difftime()}}: only supports \code{units = "secs"} (the default) |
| \item \code{\link[base:double]{as.double()}} |
| \item \code{\link[base:integer]{as.integer()}} |
| \item \code{\link[base:logical]{as.logical()}} |
| \item \code{\link[base:numeric]{as.numeric()}} |
| \item \code{\link[base:Trig]{asin()}} |
| \item \code{\link[base:Round]{ceiling()}} |
| \item \code{\link[base:Trig]{cos()}} |
| \item \code{\link[base:data.frame]{data.frame()}}: \code{row.names} and \code{check.rows} arguments not supported; |
| \code{stringsAsFactors} must be \code{FALSE} |
| \item \code{\link[base:difftime]{difftime()}}: only supports \code{units = "secs"} (the default); |
| \code{tz} argument not supported |
| \item \code{\link[base:startsWith]{endsWith()}} |
| \item \code{\link[base:Log]{exp()}} |
| \item \code{\link[base:Round]{floor()}} |
| \item \code{\link[base:format]{format()}} |
| \item \code{\link[base:grep]{grepl()}} |
| \item \code{\link[base:grep]{gsub()}} |
| \item \code{\link[base:ifelse]{ifelse()}} |
| \item \code{\link[base:character]{is.character()}} |
| \item \code{\link[base:double]{is.double()}} |
| \item \code{\link[base:factor]{is.factor()}} |
| \item \code{\link[base:is.finite]{is.finite()}} |
| \item \code{\link[base:is.finite]{is.infinite()}} |
| \item \code{\link[base:integer]{is.integer()}} |
| \item \code{\link[base:list]{is.list()}} |
| \item \code{\link[base:logical]{is.logical()}} |
| \item \code{\link[base:NA]{is.na()}} |
| \item \code{\link[base:is.finite]{is.nan()}} |
| \item \code{\link[base:numeric]{is.numeric()}} |
| \item \code{\link[base:ISOdatetime]{ISOdate()}} |
| \item \code{\link[base:ISOdatetime]{ISOdatetime()}} |
| \item \code{\link[base:Log]{log()}} |
| \item \code{\link[base:Log]{log10()}} |
| \item \code{\link[base:Log]{log1p()}} |
| \item \code{\link[base:Log]{log2()}} |
| \item \code{\link[base:Log]{logb()}} |
| \item \code{\link[base:Extremes]{max()}} |
| \item \code{\link[base:mean]{mean()}} |
| \item \code{\link[base:Extremes]{min()}} |
| \item \code{\link[base:nchar]{nchar()}}: \code{allowNA = TRUE} and \code{keepNA = TRUE} not supported |
| \item \code{\link[base:paste]{paste()}}: the \code{collapse} argument is not yet supported |
| \item \code{\link[base:paste]{paste0()}}: the \code{collapse} argument is not yet supported |
| \item \code{\link[base:Extremes]{pmax()}} |
| \item \code{\link[base:Extremes]{pmin()}} |
| \item \code{\link[base:Round]{round()}} |
| \item \code{\link[base:sign]{sign()}} |
| \item \code{\link[base:Trig]{sin()}} |
| \item \code{\link[base:MathFun]{sqrt()}} |
| \item \code{\link[base:startsWith]{startsWith()}} |
| \item \code{\link[base:strptime]{strftime()}} |
| \item \code{\link[base:strptime]{strptime()}}: accepts a \code{unit} argument not present in the \code{base} function. |
| Valid values are "s", "ms" (default), "us", "ns". |
| \item \code{\link[base:strrep]{strrep()}} |
| \item \code{\link[base:strsplit]{strsplit()}} |
| \item \code{\link[base:grep]{sub()}} |
| \item \code{\link[base:substr]{substr()}}: \code{start} and \code{stop} must be length 1 |
| \item \code{\link[base:substr]{substring()}} |
| \item \code{\link[base:sum]{sum()}} |
| \item \code{\link[base:Trig]{tan()}} |
| \item \code{\link[base:chartr]{tolower()}} |
| \item \code{\link[base:chartr]{toupper()}} |
| \item \code{\link[base:Round]{trunc()}} |
| } |
| } |
| |
| \subsection{bit64}{ |
| \itemize{ |
| \item \code{\link[bit64:as.integer64.character]{as.integer64()}} |
| \item \code{\link[bit64:bit64-package]{is.integer64()}} |
| } |
| } |
| |
| \subsection{dplyr}{ |
| \itemize{ |
| \item \code{\link[dplyr:across]{across()}} |
| \item \code{\link[dplyr:between]{between()}} |
| \item \code{\link[dplyr:case_when]{case_when()}} |
| \item \code{\link[dplyr:coalesce]{coalesce()}} |
| \item \code{\link[dplyr:desc]{desc()}} |
| \item \code{\link[dplyr:across]{if_all()}} |
| \item \code{\link[dplyr:across]{if_any()}} |
| \item \code{\link[dplyr:if_else]{if_else()}} |
| \item \code{\link[dplyr:context]{n()}} |
| \item \code{\link[dplyr:n_distinct]{n_distinct()}} |
| } |
| } |
| |
| \subsection{lubridate}{ |
| \itemize{ |
| \item \code{\link[lubridate:am]{am()}} |
| \item \code{\link[lubridate:as_date]{as_date()}} |
| \item \code{\link[lubridate:as_date]{as_datetime()}} |
| \item \code{\link[lubridate:round_date]{ceiling_date()}} |
| \item \code{\link[lubridate:date]{date()}} |
| \item \code{\link[lubridate:date_decimal]{date_decimal()}} |
| \item \code{\link[lubridate:day]{day()}} |
| \item \code{\link[lubridate:duration]{ddays()}} |
| \item \code{\link[lubridate:decimal_date]{decimal_date()}} |
| \item \code{\link[lubridate:duration]{dhours()}} |
| \item \code{\link[lubridate:duration]{dmicroseconds()}} |
| \item \code{\link[lubridate:duration]{dmilliseconds()}} |
| \item \code{\link[lubridate:duration]{dminutes()}} |
| \item \code{\link[lubridate:duration]{dmonths()}} |
| \item \code{\link[lubridate:ymd]{dmy()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{dmy_h()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{dmy_hm()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{dmy_hms()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:duration]{dnanoseconds()}} |
| \item \code{\link[lubridate:duration]{dpicoseconds()}}: not supported |
| \item \code{\link[lubridate:duration]{dseconds()}} |
| \item \code{\link[lubridate:dst]{dst()}} |
| \item \code{\link[lubridate:duration]{dweeks()}} |
| \item \code{\link[lubridate:duration]{dyears()}} |
| \item \code{\link[lubridate:ymd]{dym()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:week]{epiweek()}} |
| \item \code{\link[lubridate:year]{epiyear()}} |
| \item \code{\link[lubridate:parse_date_time]{fast_strptime()}}: non-default values of \code{lt} and \code{cutoff_2000} not supported |
| \item \code{\link[lubridate:round_date]{floor_date()}} |
| \item \code{\link[lubridate:force_tz]{force_tz()}}: Timezone conversion from non-UTC timezone not supported; |
| \code{roll_dst} values of 'error' and 'boundary' are supported for nonexistent times, |
| \code{roll_dst} values of 'error', 'pre', and 'post' are supported for ambiguous times. |
| \item \code{\link[lubridate:format_ISO8601]{format_ISO8601()}} |
| \item \code{\link[lubridate:hour]{hour()}} |
| \item \code{\link[lubridate:date_utils]{is.Date()}} |
| \item \code{\link[lubridate:is.instant]{is.instant()}} |
| \item \code{\link[lubridate:posix_utils]{is.POSIXct()}} |
| \item \code{\link[lubridate:is.instant]{is.timepoint()}} |
| \item \code{\link[lubridate:week]{isoweek()}} |
| \item \code{\link[lubridate:year]{isoyear()}} |
| \item \code{\link[lubridate:leap_year]{leap_year()}} |
| \item \code{\link[lubridate:make_datetime]{make_date()}} |
| \item \code{\link[lubridate:make_datetime]{make_datetime()}}: only supports UTC (default) timezone |
| \item \code{\link[lubridate:make_difftime]{make_difftime()}}: only supports \code{units = "secs"} (the default); |
| providing both \code{num} and \code{...} is not supported |
| \item \code{\link[lubridate:day]{mday()}} |
| \item \code{\link[lubridate:ymd]{mdy()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{mdy_h()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{mdy_hm()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{mdy_hms()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:minute]{minute()}} |
| \item \code{\link[lubridate:month]{month()}} |
| \item \code{\link[lubridate:ymd]{my()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd]{myd()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:parse_date_time]{parse_date_time()}}: \code{quiet = FALSE} is not supported. |
| Available formats are H, I, j, M, S, U, w, W, y, Y, R, T. |
| On Linux and macOS, the formats a, A, b, B, Om, p, r are additionally available. |
| \item \code{\link[lubridate:am]{pm()}} |
| \item \code{\link[lubridate:day]{qday()}} |
| \item \code{\link[lubridate:quarter]{quarter()}} |
| \item \code{\link[lubridate:round_date]{round_date()}} |
| \item \code{\link[lubridate:second]{second()}} |
| \item \code{\link[lubridate:quarter]{semester()}} |
| \item \code{\link[lubridate:tz]{tz()}} |
| \item \code{\link[lubridate:day]{wday()}} |
| \item \code{\link[lubridate:week]{week()}} |
| \item \code{\link[lubridate:with_tz]{with_tz()}} |
| \item \code{\link[lubridate:day]{yday()}} |
| \item \code{\link[lubridate:ymd]{ydm()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ydm_h()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ydm_hm()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ydm_hms()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:year]{year()}} |
| \item \code{\link[lubridate:ymd]{ym()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd]{ymd()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ymd_h()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ymd_hm()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd_hms]{ymd_hms()}}: \code{locale} argument not supported |
| \item \code{\link[lubridate:ymd]{yq()}}: \code{locale} argument not supported |
| } |
| } |
| |
| \subsection{methods}{ |
| \itemize{ |
| \item \code{\link[methods:is]{is()}} |
| } |
| } |
| |
| \subsection{rlang}{ |
| \itemize{ |
| \item \code{\link[rlang:type-predicates]{is_character()}} |
| \item \code{\link[rlang:type-predicates]{is_double()}} |
| \item \code{\link[rlang:type-predicates]{is_integer()}} |
| \item \code{\link[rlang:type-predicates]{is_list()}} |
| \item \code{\link[rlang:type-predicates]{is_logical()}} |
| } |
| } |
| |
| \subsection{stats}{ |
| \itemize{ |
| \item \code{\link[stats:median]{median()}}: approximate median (t-digest) is computed |
| \item \code{\link[stats:quantile]{quantile()}}: \code{probs} must be length 1; |
| approximate quantile (t-digest) is computed |
| \item \code{\link[stats:sd]{sd()}} |
| \item \code{\link[stats:cor]{var()}} |
| } |
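| |
| The following sketch shows \code{median()} and \code{quantile()} inside a query, with |
| \code{probs} kept to a single value as required above. It assumes \code{arrow} and |
| \code{dplyr} are installed and uses made-up data. Because Acero computes |
| approximations, the results may differ slightly from the \code{stats} functions, and |
| \code{arrow} may emit a warning to that effect. |
| \preformatted{ |
| library(arrow) |
| library(dplyr) |
| |
| # Hypothetical sample data |
| nums <- arrow_table(x = as.numeric(1:100)) |
| |
| nums |> |
|   summarise( |
|     med = median(x),                # approximate median |
|     q90 = quantile(x, probs = 0.9)  # probs must be length 1 |
|   ) |> |
|   collect() |
| } |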
| } |
| |
| \subsection{stringi}{ |
| \itemize{ |
| \item \code{\link[stringi:stri_reverse]{stri_reverse()}} |
| } |
| } |
| |
| \subsection{stringr}{ |
| |
| Pattern modifiers \code{coll()} and \code{boundary()} are not supported in any functions. |
| \itemize{ |
| \item \code{\link[stringr:str_c]{str_c()}}: the \code{collapse} argument is not yet supported |
| \item \code{\link[stringr:str_count]{str_count()}}: \code{pattern} must be a length 1 character vector |
| \item \code{\link[stringr:str_detect]{str_detect()}} |
| \item \code{\link[stringr:str_dup]{str_dup()}} |
| \item \code{\link[stringr:str_starts]{str_ends()}} |
| \item \code{\link[stringr:str_length]{str_length()}} |
| \item \code{\link[stringr:str_like]{str_like()}} |
| \item \code{\link[stringr:str_pad]{str_pad()}} |
| \item \code{\link[stringr:str_remove]{str_remove()}} |
| \item \code{\link[stringr:str_remove]{str_remove_all()}} |
| \item \code{\link[stringr:str_replace]{str_replace()}} |
| \item \code{\link[stringr:str_replace]{str_replace_all()}} |
| \item \code{\link[stringr:str_split]{str_split()}}: Case-insensitive string splitting and splitting into 0 parts not supported |
| \item \code{\link[stringr:str_starts]{str_starts()}} |
| \item \code{\link[stringr:str_sub]{str_sub()}}: \code{start} and \code{end} must be length 1 |
| \item \code{\link[stringr:case]{str_to_lower()}} |
| \item \code{\link[stringr:case]{str_to_title()}} |
| \item \code{\link[stringr:case]{str_to_upper()}} |
| \item \code{\link[stringr:str_trim]{str_trim()}} |
| } |
| } |
| |
| \subsection{tibble}{ |
| \itemize{ |
| \item \code{\link[tibble:tibble]{tibble()}} |
| } |
| } |
| |
| \subsection{tidyselect}{ |
| \itemize{ |
| \item \code{\link[tidyselect:all_of]{all_of()}} |
| \item \code{\link[tidyselect:starts_with]{contains()}} |
| \item \code{\link[tidyselect:starts_with]{ends_with()}} |
| \item \code{\link[tidyselect:everything]{everything()}} |
| \item \code{\link[tidyselect:everything]{last_col()}} |
| \item \code{\link[tidyselect:starts_with]{matches()}} |
| \item \code{\link[tidyselect:starts_with]{num_range()}} |
| \item \code{\link[tidyselect:one_of]{one_of()}} |
| \item \code{\link[tidyselect:starts_with]{starts_with()}} |
| } |
| } |
| } |
| |