| # Licensed to the Apache Software Foundation (ASF) under one |
| # or more contributor license agreements. See the NOTICE file |
| # distributed with this work for additional information |
| # regarding copyright ownership. The ASF licenses this file |
| # to you under the Apache License, Version 2.0 (the |
| # "License"); you may not use this file except in compliance |
| # with the License. You may obtain a copy of the License at |
| # |
| # http://www.apache.org/licenses/LICENSE-2.0 |
| # |
| # Unless required by applicable law or agreed to in writing, |
| # software distributed under the License is distributed on an |
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| # KIND, either express or implied. See the License for the |
| # specific language governing permissions and limitations |
| # under the License. |
| |
| # Generated by using data-raw/docgen.R -> do not edit by hand |
| |
| #' Functions available in Arrow dplyr queries |
| #' |
| #' The `arrow` package contains methods for 37 `dplyr` table functions, many of |
| #' which are "verbs" that transform one or more tables. |
| #' The package also maps 211 R functions to the corresponding |
| #' functions in the Arrow compute library. These mappings let you write code inside |
| #' `dplyr` methods that calls R functions, including many in packages like |
| #' `stringr` and `lubridate`; the calls are translated to Arrow and run |
| #' on the Arrow query engine (Acero). This document lists all of the mapped |
| #' functions. |
| #' |
| #' # `dplyr` verbs |
| #' |
| #' Most verb functions return an `arrow_dplyr_query` object, similar in spirit |
| #' to a `dbplyr::tbl_lazy`. This means that the verbs do not eagerly evaluate |
| #' the query on the data. To run the query, call either `compute()`, |
| #' which returns an `arrow` [Table], or `collect()`, which pulls the resulting |
| #' Table into an R `tibble`. |
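| #' |
| #' A minimal sketch of this lazy behavior follows. It assumes the `arrow` and |
| #' `dplyr` packages are installed and uses the built-in `mtcars` data purely |
| #' as illustrative input. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' # Illustrative input (assumption): an in-memory Arrow Table from mtcars |
| #' tbl <- arrow_table(mtcars) |
| #' |
| #' # Building the query only records the steps; no data is processed yet |
| #' query <- select(filter(tbl, cyl == 6), mpg, cyl, hp) |
| #' print(class(query)) |
| #' |
| #' # compute() evaluates the query in Acero and returns an Arrow Table |
| #' result_tbl <- compute(query) |
| #' |
| #' # collect() evaluates the query and pulls the result into R |
| #' result_df <- collect(query) |
| #' print(dim(result_df)) |
| #' ``` |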
| #' |
| #' * [`anti_join()`][dplyr::anti_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`arrange()`][dplyr::arrange()] |
| #' * [`collapse()`][dplyr::collapse()] |
| #' * [`collect()`][dplyr::collect()] |
| #' * [`compute()`][dplyr::compute()] |
| #' * [`count()`][dplyr::count()] |
| #' * [`distinct()`][dplyr::distinct()]: `.keep_all = TRUE` not supported |
| #' * [`explain()`][dplyr::explain()] |
| #' * [`filter()`][dplyr::filter()] |
| #' * [`full_join()`][dplyr::full_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`glimpse()`][dplyr::glimpse()] |
| #' * [`group_by()`][dplyr::group_by()] |
| #' * [`group_by_drop_default()`][dplyr::group_by_drop_default()] |
| #' * [`group_vars()`][dplyr::group_vars()] |
| #' * [`groups()`][dplyr::groups()] |
| #' * [`inner_join()`][dplyr::inner_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`left_join()`][dplyr::left_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`mutate()`][dplyr::mutate()]: window functions (e.g. things that require aggregation within groups) not currently supported |
| #' * [`pull()`][dplyr::pull()]: the `name` argument is not supported; returns an R vector by default, but this behavior is deprecated and a future release will return an Arrow [ChunkedArray] instead. Provide `as_vector = TRUE/FALSE` to control this behavior, or set `options(arrow.pull_as_vector)` globally. |
| #' * [`relocate()`][dplyr::relocate()] |
| #' * [`rename()`][dplyr::rename()] |
| #' * [`rename_with()`][dplyr::rename_with()] |
| #' * [`right_join()`][dplyr::right_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`select()`][dplyr::select()] |
| #' * [`semi_join()`][dplyr::semi_join()]: the `copy` and `na_matches` arguments are ignored |
| #' * [`show_query()`][dplyr::show_query()] |
| #' * [`slice_head()`][dplyr::slice_head()]: slicing within groups not supported; Arrow datasets do not have row order, so head is non-deterministic; `prop` only supported on queries where `nrow()` is knowable without evaluating |
| #' * [`slice_max()`][dplyr::slice_max()]: slicing within groups not supported; `with_ties = TRUE` (dplyr default) is not supported; `prop` only supported on queries where `nrow()` is knowable without evaluating |
| #' * [`slice_min()`][dplyr::slice_min()]: slicing within groups not supported; `with_ties = TRUE` (dplyr default) is not supported; `prop` only supported on queries where `nrow()` is knowable without evaluating |
| #' * [`slice_sample()`][dplyr::slice_sample()]: slicing within groups not supported; `replace = TRUE` and the `weight_by` argument not supported; `n` only supported on queries where `nrow()` is knowable without evaluating |
| #' * [`slice_tail()`][dplyr::slice_tail()]: slicing within groups not supported; Arrow datasets do not have row order, so tail is non-deterministic; `prop` only supported on queries where `nrow()` is knowable without evaluating |
| #' * [`summarise()`][dplyr::summarise()]: window functions not currently supported; arguments `.drop = FALSE` and `.groups = "rowwise"` not supported |
| #' * [`tally()`][dplyr::tally()] |
| #' * [`transmute()`][dplyr::transmute()] |
| #' * [`ungroup()`][dplyr::ungroup()] |
| #' * [`union()`][dplyr::union()] |
| #' * [`union_all()`][dplyr::union_all()] |
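| #' |
| #' The sketch below exercises two of the notes above: it passes an explicit |
| #' `as_vector` value to `pull()` instead of relying on the deprecated default, |
| #' and it sets `with_ties = FALSE` in `slice_max()` because the dplyr default |
| #' is not supported. It assumes `arrow` and `dplyr` are installed; `mtcars` is |
| #' only illustrative input. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(mtcars)  # illustrative input |
| #' |
| #' # as_vector = TRUE returns an R vector; FALSE returns an Arrow ChunkedArray |
| #' mpg_vec <- pull(tbl, mpg, as_vector = TRUE) |
| #' mpg_chunked <- pull(tbl, mpg, as_vector = FALSE) |
| #' |
| #' # with_ties = FALSE returns exactly n rows even when values tie |
| #' top3 <- collect(slice_max(tbl, order_by = mpg, n = 3, with_ties = FALSE)) |
| #' print(top3$mpg) |
| #' ``` |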
| #' |
| #' # Function mappings |
| #' |
| #' In the list below, any differences in behavior or support between Acero and |
| #' the R function are listed. If no notes follow the function name, then you |
| #' can assume that the function works in Acero just as it does in R. |
| #' |
| #' Functions can be called either as `pkg::fun()` or just `fun()`, i.e. both |
| #' `str_sub()` and `stringr::str_sub()` work. |
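| #' |
| #' For example, both spellings below should translate to the same Acero |
| #' expression. This is a sketch assuming `arrow`, `dplyr`, and `stringr` are |
| #' installed; the column `x` is illustrative. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(x = c("apple", "kiwi", "banana"))  # illustrative input |
| #' |
| #' res <- collect(mutate( |
| #'   tbl, |
| #'   a = str_sub(x, 1, 3),           # bare function name |
| #'   b = stringr::str_sub(x, 1, 3)   # namespaced form |
| #' )) |
| #' print(res) |
| #' ``` |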
| #' |
| #' In addition to these functions, you can call any of Arrow's 254 compute |
| #' functions directly. Many Arrow functions don't map to an existing R |
| #' function. Where an R mapping does exist, you can still call the Arrow |
| #' function directly if you want to bypass the adaptations the mapping applies |
| #' to make Acero behave like R. These functions are listed in the |
| #' [C++ documentation](https://arrow.apache.org/docs/cpp/compute.html), and |
| #' in the R function registry they are named with an `arrow_` prefix, such |
| #' as `arrow_ascii_is_decimal`. |
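| #' |
| #' A minimal sketch of calling a compute function directly, assuming `arrow` |
| #' and `dplyr` are installed (the column `x` is illustrative). Inside the query, |
| #' `arrow_ascii_is_decimal()` maps to the Arrow compute function |
| #' `ascii_is_decimal`; `list_compute_functions()` lists the names available in |
| #' the installed build. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' # Check that the compute function exists in this Arrow build |
| #' print(is.element("ascii_is_decimal", list_compute_functions())) |
| #' |
| #' tbl <- arrow_table(x = c("123", "abc", "4.5"))  # illustrative input |
| #' |
| #' # The arrow_ prefix calls the compute function with no R-specific adaptation |
| #' res <- collect(mutate(tbl, is_dec = arrow_ascii_is_decimal(x))) |
| #' print(res$is_dec) |
| #' ``` |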
| #' |
| #' ## arrow |
| #' |
| #' * [`add_filename()`][arrow::add_filename()] |
| #' * [`cast()`][arrow::cast()] |
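| #' |
| #' As a sketch of `cast()` inside a query (assuming `arrow` and `dplyr` are |
| #' installed; `mtcars` is illustrative input), the example below converts a |
| #' double column to an explicit 8-bit integer type. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(mtcars)  # illustrative input; cyl is stored as double |
| #' |
| #' # cast() sets an explicit Arrow type; compute() keeps the result in Arrow |
| #' res <- compute(mutate(tbl, cyl_small = cast(cyl, int8()))) |
| #' print(res$cyl_small$type) |
| #' ``` |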
| #' |
| #' ## base |
| #' |
| #' * [`!`][!()] |
| #' * [`!=`][!=()] |
| #' * [`%%`][%%()] |
| #' * [`%/%`][%/%()] |
| #' * [`%in%`][%in%()] |
| #' * [`&`][&()] |
| #' * [`*`][*()] |
| #' * [`+`][+()] |
| #' * [`-`][-()] |
| #' * [`/`][/()] |
| #' * [`<`][<()] |
| #' * [`<=`][<=()] |
| #' * [`==`][==()] |
| #' * [`>`][>()] |
| #' * [`>=`][>=()] |
| #' * [`ISOdate()`][base::ISOdate()] |
| #' * [`ISOdatetime()`][base::ISOdatetime()] |
| #' * [`^`][^()] |
| #' * [`abs()`][base::abs()] |
| #' * [`acos()`][base::acos()] |
| #' * [`all()`][base::all()] |
| #' * [`any()`][base::any()] |
| #' * [`as.Date()`][base::as.Date()]: Multiple `tryFormats` not supported in Arrow. |
| #' Consider using the lubridate specialised parsing functions `ymd()`, `dmy()`, etc. |
| #' * [`as.character()`][base::as.character()] |
| #' * [`as.difftime()`][base::as.difftime()]: only supports `units = "secs"` (the default) |
| #' * [`as.double()`][base::as.double()] |
| #' * [`as.integer()`][base::as.integer()] |
| #' * [`as.logical()`][base::as.logical()] |
| #' * [`as.numeric()`][base::as.numeric()] |
| #' * [`asin()`][base::asin()] |
| #' * [`ceiling()`][base::ceiling()] |
| #' * [`cos()`][base::cos()] |
| #' * [`data.frame()`][base::data.frame()]: `row.names` and `check.rows` arguments not supported; |
| #' `stringsAsFactors` must be `FALSE` |
| #' * [`difftime()`][base::difftime()]: only supports `units = "secs"` (the default); |
| #' `tz` argument not supported |
| #' * [`endsWith()`][base::endsWith()] |
| #' * [`exp()`][base::exp()] |
| #' * [`floor()`][base::floor()] |
| #' * [`format()`][base::format()] |
| #' * [`grepl()`][base::grepl()] |
| #' * [`gsub()`][base::gsub()] |
| #' * [`ifelse()`][base::ifelse()] |
| #' * [`is.character()`][base::is.character()] |
| #' * [`is.double()`][base::is.double()] |
| #' * [`is.factor()`][base::is.factor()] |
| #' * [`is.finite()`][base::is.finite()] |
| #' * [`is.infinite()`][base::is.infinite()] |
| #' * [`is.integer()`][base::is.integer()] |
| #' * [`is.list()`][base::is.list()] |
| #' * [`is.logical()`][base::is.logical()] |
| #' * [`is.na()`][base::is.na()] |
| #' * [`is.nan()`][base::is.nan()] |
| #' * [`is.numeric()`][base::is.numeric()] |
| #' * [`log()`][base::log()] |
| #' * [`log10()`][base::log10()] |
| #' * [`log1p()`][base::log1p()] |
| #' * [`log2()`][base::log2()] |
| #' * [`logb()`][base::logb()] |
| #' * [`max()`][base::max()] |
| #' * [`mean()`][base::mean()] |
| #' * [`min()`][base::min()] |
| #' * [`nchar()`][base::nchar()]: `allowNA = TRUE` and `keepNA = TRUE` not supported |
| #' * [`paste()`][base::paste()]: the `collapse` argument is not yet supported |
| #' * [`paste0()`][base::paste0()]: the `collapse` argument is not yet supported |
| #' * [`pmax()`][base::pmax()] |
| #' * [`pmin()`][base::pmin()] |
| #' * [`round()`][base::round()] |
| #' * [`sign()`][base::sign()] |
| #' * [`sin()`][base::sin()] |
| #' * [`sqrt()`][base::sqrt()] |
| #' * [`startsWith()`][base::startsWith()] |
| #' * [`strftime()`][base::strftime()] |
| #' * [`strptime()`][base::strptime()]: accepts a `unit` argument not present in the `base` function. |
| #' Valid values are "s", "ms" (default), "us", "ns". |
| #' * [`strrep()`][base::strrep()] |
| #' * [`strsplit()`][base::strsplit()] |
| #' * [`sub()`][base::sub()] |
| #' * [`substr()`][base::substr()]: `start` and `stop` must be length 1 |
| #' * [`substring()`][base::substring()] |
| #' * [`sum()`][base::sum()] |
| #' * [`tan()`][base::tan()] |
| #' * [`tolower()`][base::tolower()] |
| #' * [`toupper()`][base::toupper()] |
| #' * [`trunc()`][base::trunc()] |
| #' * [`|`][|()] |
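| #' |
| #' A short sketch combining several of the base mappings above, assuming |
| #' `arrow` and `dplyr` are installed (the column `x` is illustrative). |
| #' `paste0()` is called without `collapse`, which is not yet supported. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(x = c("apple", "kiwi", "banana"))  # illustrative input |
| #' |
| #' res <- collect(mutate( |
| #'   tbl, |
| #'   len = nchar(x), |
| #'   size = ifelse(nchar(x) > 4, "long", "short"), |
| #'   label = paste0(toupper(x), "_", as.character(nchar(x)))  # no collapse |
| #' )) |
| #' print(res) |
| #' ``` |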
| #' |
| #' ## bit64 |
| #' |
| #' * [`as.integer64()`][bit64::as.integer64()] |
| #' * [`is.integer64()`][bit64::is.integer64()] |
| #' |
| #' ## dplyr |
| #' |
| #' * [`across()`][dplyr::across()] |
| #' * [`between()`][dplyr::between()] |
| #' * [`case_when()`][dplyr::case_when()]: `.ptype` and `.size` arguments not supported |
| #' * [`coalesce()`][dplyr::coalesce()] |
| #' * [`desc()`][dplyr::desc()] |
| #' * [`if_all()`][dplyr::if_all()] |
| #' * [`if_any()`][dplyr::if_any()] |
| #' * [`if_else()`][dplyr::if_else()] |
| #' * [`n()`][dplyr::n()] |
| #' * [`n_distinct()`][dplyr::n_distinct()] |
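| #' |
| #' For example, `n()` and `n_distinct()` can be used in a grouped summary. |
| #' This is a sketch assuming `arrow` and `dplyr` are installed, with `mtcars` |
| #' as illustrative input. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(mtcars)  # illustrative input |
| #' |
| #' # Row count and distinct gear values per cylinder count, sorted by cyl |
| #' res <- collect(arrange( |
| #'   summarise(group_by(tbl, cyl), cars = n(), gears = n_distinct(gear)), |
| #'   cyl |
| #' )) |
| #' print(res) |
| #' ``` |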
| #' |
| #' ## lubridate |
| #' |
| #' * [`am()`][lubridate::am()] |
| #' * [`as_date()`][lubridate::as_date()] |
| #' * [`as_datetime()`][lubridate::as_datetime()] |
| #' * [`ceiling_date()`][lubridate::ceiling_date()] |
| #' * [`date()`][lubridate::date()] |
| #' * [`date_decimal()`][lubridate::date_decimal()] |
| #' * [`day()`][lubridate::day()] |
| #' * [`ddays()`][lubridate::ddays()] |
| #' * [`decimal_date()`][lubridate::decimal_date()] |
| #' * [`dhours()`][lubridate::dhours()] |
| #' * [`dmicroseconds()`][lubridate::dmicroseconds()] |
| #' * [`dmilliseconds()`][lubridate::dmilliseconds()] |
| #' * [`dminutes()`][lubridate::dminutes()] |
| #' * [`dmonths()`][lubridate::dmonths()] |
| #' * [`dmy()`][lubridate::dmy()]: `locale` argument not supported |
| #' * [`dmy_h()`][lubridate::dmy_h()]: `locale` argument not supported |
| #' * [`dmy_hm()`][lubridate::dmy_hm()]: `locale` argument not supported |
| #' * [`dmy_hms()`][lubridate::dmy_hms()]: `locale` argument not supported |
| #' * [`dnanoseconds()`][lubridate::dnanoseconds()] |
| #' * [`dpicoseconds()`][lubridate::dpicoseconds()]: not supported |
| #' * [`dseconds()`][lubridate::dseconds()] |
| #' * [`dst()`][lubridate::dst()] |
| #' * [`dweeks()`][lubridate::dweeks()] |
| #' * [`dyears()`][lubridate::dyears()] |
| #' * [`dym()`][lubridate::dym()]: `locale` argument not supported |
| #' * [`epiweek()`][lubridate::epiweek()] |
| #' * [`epiyear()`][lubridate::epiyear()] |
| #' * [`fast_strptime()`][lubridate::fast_strptime()]: non-default values of `lt` and `cutoff_2000` not supported |
| #' * [`floor_date()`][lubridate::floor_date()] |
| #' * [`force_tz()`][lubridate::force_tz()]: Timezone conversion from non-UTC timezone not supported; |
| #' `roll_dst` values of 'error' and 'boundary' are supported for nonexistent times, |
| #' and `roll_dst` values of 'error', 'pre', and 'post' are supported for ambiguous times. |
| #' * [`format_ISO8601()`][lubridate::format_ISO8601()] |
| #' * [`hour()`][lubridate::hour()] |
| #' * [`is.Date()`][lubridate::is.Date()] |
| #' * [`is.POSIXct()`][lubridate::is.POSIXct()] |
| #' * [`is.instant()`][lubridate::is.instant()] |
| #' * [`is.timepoint()`][lubridate::is.timepoint()] |
| #' * [`isoweek()`][lubridate::isoweek()] |
| #' * [`isoyear()`][lubridate::isoyear()] |
| #' * [`leap_year()`][lubridate::leap_year()] |
| #' * [`make_date()`][lubridate::make_date()] |
| #' * [`make_datetime()`][lubridate::make_datetime()]: only supports UTC (default) timezone |
| #' * [`make_difftime()`][lubridate::make_difftime()]: only supports `units = "secs"` (the default); |
| #' providing both `num` and `...` is not supported |
| #' * [`mday()`][lubridate::mday()] |
| #' * [`mdy()`][lubridate::mdy()]: `locale` argument not supported |
| #' * [`mdy_h()`][lubridate::mdy_h()]: `locale` argument not supported |
| #' * [`mdy_hm()`][lubridate::mdy_hm()]: `locale` argument not supported |
| #' * [`mdy_hms()`][lubridate::mdy_hms()]: `locale` argument not supported |
| #' * [`minute()`][lubridate::minute()] |
| #' * [`month()`][lubridate::month()] |
| #' * [`my()`][lubridate::my()]: `locale` argument not supported |
| #' * [`myd()`][lubridate::myd()]: `locale` argument not supported |
| #' * [`parse_date_time()`][lubridate::parse_date_time()]: `quiet = FALSE` is not supported. |
| #' Available formats are H, I, j, M, S, U, w, W, y, Y, R, T. |
| #' On Linux and macOS, a, A, b, B, Om, p, and r are also available. |
| #' * [`pm()`][lubridate::pm()] |
| #' * [`qday()`][lubridate::qday()] |
| #' * [`quarter()`][lubridate::quarter()] |
| #' * [`round_date()`][lubridate::round_date()] |
| #' * [`second()`][lubridate::second()] |
| #' * [`semester()`][lubridate::semester()] |
| #' * [`tz()`][lubridate::tz()] |
| #' * [`wday()`][lubridate::wday()] |
| #' * [`week()`][lubridate::week()] |
| #' * [`with_tz()`][lubridate::with_tz()] |
| #' * [`yday()`][lubridate::yday()] |
| #' * [`ydm()`][lubridate::ydm()]: `locale` argument not supported |
| #' * [`ydm_h()`][lubridate::ydm_h()]: `locale` argument not supported |
| #' * [`ydm_hm()`][lubridate::ydm_hm()]: `locale` argument not supported |
| #' * [`ydm_hms()`][lubridate::ydm_hms()]: `locale` argument not supported |
| #' * [`year()`][lubridate::year()] |
| #' * [`ym()`][lubridate::ym()]: `locale` argument not supported |
| #' * [`ymd()`][lubridate::ymd()]: `locale` argument not supported |
| #' * [`ymd_h()`][lubridate::ymd_h()]: `locale` argument not supported |
| #' * [`ymd_hm()`][lubridate::ymd_hm()]: `locale` argument not supported |
| #' * [`ymd_hms()`][lubridate::ymd_hms()]: `locale` argument not supported |
| #' * [`yq()`][lubridate::yq()]: `locale` argument not supported |
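| #' |
| #' A sketch of parsing dates and extracting their components in Acero, |
| #' assuming `arrow` and `dplyr` are installed (and `lubridate`, to be safe); |
| #' the date strings are illustrative. Parsing and extraction are written as |
| #' separate `mutate()` steps for clarity. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(s = c("2023-01-15", "2024-02-29"))  # illustrative input |
| #' |
| #' # ymd() parses the strings in Acero (the locale argument is not supported) |
| #' parsed <- mutate(tbl, d = ymd(s)) |
| #' res <- collect(mutate(parsed, y = year(d), m = month(d))) |
| #' print(res) |
| #' ``` |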
| #' |
| #' ## methods |
| #' |
| #' * [`is()`][methods::is()] |
| #' |
| #' ## rlang |
| #' |
| #' * [`is_character()`][rlang::is_character()] |
| #' * [`is_double()`][rlang::is_double()] |
| #' * [`is_integer()`][rlang::is_integer()] |
| #' * [`is_list()`][rlang::is_list()] |
| #' * [`is_logical()`][rlang::is_logical()] |
| #' |
| #' ## stats |
| #' |
| #' * [`median()`][stats::median()]: approximate median (t-digest) is computed |
| #' * [`quantile()`][stats::quantile()]: `probs` must be length 1; |
| #' approximate quantile (t-digest) is computed |
| #' * [`sd()`][stats::sd()] |
| #' * [`var()`][stats::var()] |
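| #' |
| #' A sketch of the approximate aggregations, assuming `arrow` and `dplyr` are |
| #' installed (`mtcars` is illustrative input). Because t-digest is approximate, |
| #' the results may differ slightly from `stats::median()` and |
| #' `stats::quantile()` on the same data, and Arrow may emit a warning saying so. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(mtcars)  # illustrative input |
| #' |
| #' res <- collect(summarise( |
| #'   tbl, |
| #'   med = median(mpg),                 # approximate median (t-digest) |
| #'   p90 = quantile(mpg, probs = 0.9)   # probs must be a single value |
| #' )) |
| #' print(res) |
| #' ``` |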
| #' |
| #' ## stringi |
| #' |
| #' * [`stri_reverse()`][stringi::stri_reverse()] |
| #' |
| #' ## stringr |
| #' |
| #' Pattern modifiers `coll()` and `boundary()` are not supported in any functions. |
| #' |
| #' * [`str_c()`][stringr::str_c()]: the `collapse` argument is not yet supported |
| #' * [`str_count()`][stringr::str_count()]: `pattern` must be a length 1 character vector |
| #' * [`str_detect()`][stringr::str_detect()] |
| #' * [`str_dup()`][stringr::str_dup()] |
| #' * [`str_ends()`][stringr::str_ends()] |
| #' * [`str_length()`][stringr::str_length()] |
| #' * [`str_like()`][stringr::str_like()] |
| #' * [`str_pad()`][stringr::str_pad()] |
| #' * [`str_remove()`][stringr::str_remove()] |
| #' * [`str_remove_all()`][stringr::str_remove_all()] |
| #' * [`str_replace()`][stringr::str_replace()] |
| #' * [`str_replace_all()`][stringr::str_replace_all()] |
| #' * [`str_split()`][stringr::str_split()]: Case-insensitive string splitting and splitting into 0 parts not supported |
| #' * [`str_starts()`][stringr::str_starts()] |
| #' * [`str_sub()`][stringr::str_sub()]: `start` and `end` must be length 1 |
| #' * [`str_to_lower()`][stringr::str_to_lower()] |
| #' * [`str_to_title()`][stringr::str_to_title()] |
| #' * [`str_to_upper()`][stringr::str_to_upper()] |
| #' * [`str_trim()`][stringr::str_trim()] |
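| #' |
| #' A sketch using a few of the `stringr` mappings with plain regular-expression |
| #' patterns, since the `coll()` and `boundary()` modifiers are not supported. |
| #' It assumes `arrow` and `dplyr` are installed; the column `x` is illustrative. |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(x = c("apple", "kiwi", "banana"))  # illustrative input |
| #' |
| #' res <- collect(mutate( |
| #'   tbl, |
| #'   has_an = str_detect(x, "an"), |
| #'   n_a = str_count(x, "a"),              # pattern must be a single string |
| #'   up = str_to_upper(x), |
| #'   swapped = str_replace_all(x, "a", "A") |
| #' )) |
| #' print(res) |
| #' ``` |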
| #' |
| #' ## tibble |
| #' |
| #' * [`tibble()`][tibble::tibble()] |
| #' |
| #' ## tidyselect |
| #' |
| #' * [`all_of()`][tidyselect::all_of()] |
| #' * [`contains()`][tidyselect::contains()] |
| #' * [`ends_with()`][tidyselect::ends_with()] |
| #' * [`everything()`][tidyselect::everything()] |
| #' * [`last_col()`][tidyselect::last_col()] |
| #' * [`matches()`][tidyselect::matches()] |
| #' * [`num_range()`][tidyselect::num_range()] |
| #' * [`one_of()`][tidyselect::one_of()] |
| #' * [`starts_with()`][tidyselect::starts_with()] |
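| #' |
| #' Selection helpers can be used in `select()` on Arrow data. A minimal sketch, |
| #' assuming `arrow` and `dplyr` are installed and using `mtcars` as |
| #' illustrative input: |
| #' |
| #' ```r |
| #' library(arrow, warn.conflicts = FALSE) |
| #' library(dplyr, warn.conflicts = FALSE) |
| #' |
| #' tbl <- arrow_table(mtcars)  # illustrative input |
| #' |
| #' # starts_with("d") matches disp and drat; last_col() picks carb |
| #' res <- collect(select(tbl, starts_with("d"), last_col())) |
| #' print(names(res)) |
| #' ``` |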
| #' |
| #' @name acero |
| #' |
| #' @aliases arrow-functions arrow-verbs arrow-dplyr |
| NULL |