| # arrow <img src="https://arrow.apache.org/img/arrow-logo_hex_black-txt_white-bg.png" align="right" alt="" width="120" /> |
| |
| <!-- badges: start --> |
| |
| [CRAN](https://cran.r-project.org/package=arrow) |
| [R build status](https://github.com/apache/arrow/actions/workflows/r.yml?query=branch%3Amain+event%3Apush) |
| [R-universe](https://apache.r-universe.dev) |
| [conda-forge](https://anaconda.org/conda-forge/r-arrow) |
| |
| <!-- badges: end --> |
| |
| ## Overview |
| |
| The R `{arrow}` package gives R users access to many features of the [Apache Arrow C++ library](https://arrow.apache.org/docs/cpp/index.html). Its goal is to provide an Arrow C++ backend for `{dplyr}`, along with access to the Arrow C++ library through familiar base R and tidyverse functions or `{R6}` classes. Full documentation is available on the [arrow R package website](https://arrow.apache.org/docs/r/index.html). |
| |
| To learn more about the Apache Arrow project, see the documentation of the parent [Arrow Project](https://arrow.apache.org/). The Arrow project provides functionality for a wide range of data analysis tasks to store, process and move data fast. See the [read/write article](https://arrow.apache.org/docs/r/articles/read_write.html) to learn about reading and writing data files, [data wrangling](https://arrow.apache.org/docs/r/articles/data_wrangling.html) to learn how to use dplyr syntax with arrow objects, and the [function documentation](https://arrow.apache.org/docs/r/reference/acero.html) for a full list of supported functions within dplyr queries. |
| |
| ## Installation |
| |
| The latest release of arrow can be installed from CRAN. In most cases, installing the latest release requires no additional system dependencies, especially on Windows or macOS. |
| |
| ```r |
| install.packages("arrow") |
| ``` |
| |
| If you have trouble installing from CRAN, there are two alternative ways to install the latest arrow release. First, [R-universe](https://r-universe.dev/) provides pre-compiled binaries for the most commonly used operating systems.[^1] |
| |
| [^1]: Linux users should consult the R-universe [documentation](https://docs.r-universe.dev/install/binaries.html) for guidance on the exact repo URL path and potential limitations. |
| |
| ```r |
| install.packages("arrow", repos = c("https://apache.r-universe.dev", "https://cloud.r-project.org")) |
| ``` |
| |
| Second, if you are using conda then you can install arrow from conda-forge. |
| |
| ```sh |
| conda install -c conda-forge --strict-channel-priority r-arrow |
| ``` |
| |
| There are some special cases to note: |
| |
| - On macOS, the R you use with Arrow should match the architecture of your machine. If you're using an ARM processor (M1, M2, etc.), use R compiled for arm64. If you're using an Intel-based Mac, use R compiled for x86. Using R and Arrow compiled for Intel-based Macs on an ARM-based Mac will result in segfaults and crashes. |
| |
| - On Linux the installation process can sometimes be more involved because CRAN does not host binaries for Linux. For more information please see the [installation guide](https://arrow.apache.org/docs/r/articles/install.html). |
| |
| - If you are compiling arrow from source, note that as of version 23.0.0, arrow requires C++20 to build. This has implications for Windows and CentOS 7. On Windows, it means you need R 4.3 or later (R 4.2 has incomplete support and might work with special configuration). See the [installation details article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) for guidance. |
| |
| - Development versions of arrow are released nightly. For information on how to install nightly builds please see the [installing nightly builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) article. |
| |
| ## What can the arrow package do? |
| |
| The Arrow C++ library comprises several components, each of which serves a specific purpose. The arrow package provides bindings to this C++ functionality for a wide range of data analysis tasks. |
| |
| It allows users to read and write data in a variety of formats: |
| |
| - Read and write Parquet files, an efficient and widely used columnar format |
| - Read and write Arrow (formerly known as Feather) files, a format optimized for speed and |
| interoperability |
| - Read and write CSV files with excellent speed and efficiency |
| - Read and write multi-file and larger-than-memory datasets |
| - Read JSON files |
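| |
| As a minimal sketch of the read/write functions listed above, the following round-trips a data frame through Parquet and CSV files. The temporary file paths and the built-in `mtcars` data are illustrative assumptions, and the example assumes arrow was installed with Parquet support (the default for the pre-built binaries). |
| |
| ```r |
| library(arrow) |
| |
| # Write a built-in data frame to a temporary Parquet file, then read it back. |
| parquet_path <- tempfile(fileext = ".parquet") |
| write_parquet(mtcars, parquet_path) |
| cars <- read_parquet(parquet_path) |
| |
| # CSV files follow the same read/write pattern. |
| csv_path <- tempfile(fileext = ".csv") |
| write_csv_arrow(mtcars, csv_path) |
| cars_csv <- read_csv_arrow(csv_path) |
| |
| # Both results should have the same shape as the original data. |
| dim(cars) |
| dim(cars_csv) |
| ``` |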
| |
| It provides access to remote filesystems and servers: |
| |
| - Read and write files in Amazon S3 and Google Cloud Storage buckets |
| - Connect to Arrow Flight servers to transport large datasets over networks |
| |
| Additional features include: |
| |
| - Manipulate and analyze Arrow data with dplyr verbs |
| - Zero-copy data sharing between R and Python |
| - Fine control over column types to work seamlessly with databases and data warehouses |
| - Toolkit for building connectors to other applications and services that use Arrow |
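| |
| To illustrate the dplyr support mentioned above, here is a small sketch that runs a query against an in-memory Arrow Table. The `mtcars` data and the particular query are assumptions chosen for illustration; the example also assumes `{dplyr}` is installed. |
| |
| ```r |
| library(arrow) |
| library(dplyr) |
| |
| # Convert a data frame to an Arrow Table. |
| cars_tbl <- arrow_table(mtcars) |
| |
| # dplyr verbs build up a query; collect() evaluates it and |
| # returns the result to R as a data frame. |
| result <- cars_tbl %>% |
|   filter(cyl > 4) %>% |
|   group_by(cyl) %>% |
|   summarise(mean_mpg = mean(mpg)) %>% |
|   collect() |
| |
| result |
| ``` |
| |
| Nothing is computed until `collect()` is called, so filters and aggregations are applied to the Arrow data before the result is brought into R. |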
| |
| ## What is Apache Arrow? |
| |
| Apache Arrow is a cross-language development platform for in-memory and |
| larger-than-memory data. It specifies a standardized language-independent |
| columnar memory format for flat and hierarchical data, organized for efficient |
| analytic operations on modern hardware. It also provides computational libraries |
| and zero-copy streaming, messaging, and interprocess communication. |
| |
| This package exposes an interface to the Arrow C++ library, enabling access to |
| many of its features in R. It provides low-level access to the Arrow C++ library |
| API and higher-level access through a dplyr backend and familiar R functions. |
| |
| |
| ## Arrow resources |
| |
| There are a few additional resources that you may find useful for getting started with arrow: |
| |
| - The official [Arrow R package documentation](https://arrow.apache.org/docs/r/) |
| - [Scaling Up With R and Arrow](https://arrowrbook.com) |
| - [Arrow for R cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf) |
| - [Apache Arrow R Cookbook](https://arrow.apache.org/cookbook/r/index.html) |
| - R for Data Science [Chapter on Arrow](https://r4ds.hadley.nz/arrow) |
| - [Awesome Arrow R](https://github.com/thisisnic/awesome-arrow-r) |
| |
| ## Getting help |
| |
| We welcome questions, discussion, and contributions from users of the |
| arrow package. For information about mailing lists and other venues |
| for engaging with the Arrow developer and user communities, please see |
| the [Apache Arrow Community](https://arrow.apache.org/community/) page. |
| |
| If you encounter a bug, please file an issue with a minimal reproducible |
| example on [GitHub issues](https://github.com/apache/arrow/issues). |
| Log in to your GitHub account, click on **New issue** and select the type of |
issue you want to create. Give the issue a meaningful title prefixed with **`[R]`**
followed by a space, write a summary of the issue, and select component **R** from the
dropdown list. For more information, see the **Report bugs and propose
| features** section of the [Contributing to Apache |
| Arrow](https://arrow.apache.org/docs/developers/#contributing) page |
| in the Arrow developer documentation. |
| |
| ## Code of Conduct |
| |
| Please note that all participation in the Apache Arrow project is |
| governed by the Apache Software Foundation's [code of |
| conduct](https://www.apache.org/foundation/policies/conduct.html). |