| <div id="main" class="col-md-9" role="main"> |
| |
| # arrow  |
| |
| <div class="section level1"> |
| |
| <div class="section level2"> |
| |
| ## Overview |
| |
| The R [arrow](https://github.com/apache/arrow/) package provides access |
| to many of the features of the [Apache Arrow C++ |
| library](https://arrow.apache.org/docs/cpp/index.html) for R users. The |
| goal of arrow is to provide an Arrow C++ backend to |
| [dplyr](https://dplyr.tidyverse.org), and access to the Arrow C++ |
| library through familiar base R and tidyverse functions, or |
| [R6](https://r6.r-lib.org) classes. The R package has a dedicated |
| [documentation website](https://arrow.apache.org/docs/r/index.html). |
| |
| To learn more about the Apache Arrow project, see the documentation of |
| the parent [Arrow Project](https://arrow.apache.org/). The Arrow project |
| provides tools to store, process, and move data quickly across a wide |
| range of data analysis tasks. See the [read/write |
| article](https://arrow.apache.org/docs/r/articles/read_write.html) to |
| learn about reading and writing data files, [data |
| wrangling](https://arrow.apache.org/docs/r/articles/data_wrangling.html) |
| to learn how to use dplyr syntax with arrow objects, and the [function |
| documentation](https://arrow.apache.org/docs/r/reference/acero.html) for |
| a full list of supported functions within dplyr queries. |
| |
| </div> |
| |
| <div class="section level2"> |
| |
| ## Installation |
| |
| The latest release of arrow can be installed from CRAN. In most cases |
| installing the latest release should work without requiring any |
| additional system dependencies, especially if you are using Windows or |
| macOS. |
| |
| <div id="cb1" class="sourceCode"> |
| |
| ``` r |
| install.packages("arrow") |
| ``` |
| |
| </div> |
| |
| If you are having trouble installing from CRAN, there are two |
| alternative ways to install the latest arrow release. |
| First, [R-universe](https://r-universe.dev/) provides pre-compiled |
| binaries for the most commonly used operating systems.[¹](#fn1) |
| |
| <div id="cb2" class="sourceCode"> |
| |
| ``` r |
| install.packages("arrow", repos = c("https://apache.r-universe.dev", "https://cloud.r-project.org")) |
| ``` |
| |
| </div> |
| |
| Second, if you are using conda then you can install arrow from |
| conda-forge. |
| |
| <div id="cb3" class="sourceCode"> |
| |
| ``` sh |
| conda install -c conda-forge --strict-channel-priority r-arrow |
| ``` |
| |
| </div> |
| |
| There are some special cases to note: |
| |
| - On macOS, the R you use with Arrow should match the architecture of |
| the machine you are using. If you’re using an ARM processor (Apple |
| silicon: M1, M2, etc.), use R compiled for arm64. If you’re using an |
| Intel-based Mac, use R compiled for x86_64. Using R and Arrow compiled |
| for Intel-based Macs on an ARM-based Mac will result in segfaults and |
| crashes. |
| |
| - On Linux, the installation process can sometimes be more involved |
| because CRAN does not host binaries for Linux. For more information |
| please see the [installation |
| guide](https://arrow.apache.org/docs/r/articles/install.html). |
| |
| - If you are compiling arrow from source, please note that as of |
| version 10.0.0, arrow requires C++17 to build. This affects Windows |
| and CentOS 7. On Windows, you need R 4.0 or later. On CentOS 7, you |
| need to install a compiler newer than the default system gcc. |
| See the [installation details |
| article](https://arrow.apache.org/docs/r/articles/developers/install_details.html) |
| for guidance. |
| |
| - Development versions of arrow are released nightly. For information |
| on how to install nightly builds please see the [installing nightly |
| builds](https://arrow.apache.org/docs/r/articles/install_nightly.html) |
| article. |
| |
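Before troubleshooting an install, it can help to confirm which architecture and version your R build reports. The following is a minimal sketch using base R plus `arrow::arrow_info()`; the variable names are illustrative only. Note that R reports 64-bit ARM builds as `aarch64` rather than `arm64`.

```r
# Architecture this R build was compiled for, e.g. "aarch64" or "x86_64"
r_arch <- R.version$arch

# R version as a comparable object
r_version <- getRversion()
message("R ", as.character(r_version), " built for ", r_arch)

# Compiling arrow from source on Windows needs R 4.0 or later
if (r_version < "4.0.0") {
  message("This R is older than 4.0.")
}

# If arrow is already installed, summarise its version and capabilities
if (requireNamespace("arrow", quietly = TRUE)) {
  print(arrow::arrow_info())
}
```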
| </div> |
| |
| <div class="section level2"> |
| |
| ## What can the arrow package do? |
| |
| The Arrow C++ library is composed of different parts, each of which |
| serves a specific purpose. The arrow package provides bindings to the C++ |
| functionality for a wide range of data analysis tasks. |
| |
| It allows users to read and write data in a variety of formats: |
| |
| - Read and write Parquet files, an efficient and widely used columnar |
| format |
| - Read and write Arrow (formerly known as Feather) files, a format |
| optimized for speed and interoperability |
| - Read and write CSV files with excellent speed and efficiency |
| - Read and write multi-file and larger-than-memory datasets |
| - Read JSON files |
| |
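As a rough illustration of the formats listed above, the sketch below round-trips a small data frame through Parquet, Arrow (Feather), and CSV files, then writes and reopens a partitioned multi-file dataset. The temporary paths and the example data frame are made up for this sketch, and it assumes an arrow build with Parquet and dataset support.

```r
library(arrow)

# Hypothetical scratch locations, used only for this sketch
parquet_path <- tempfile(fileext = ".parquet")
feather_path <- tempfile(fileext = ".arrow")
csv_path <- tempfile(fileext = ".csv")
dataset_dir <- tempfile()

df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Write the same data frame in three single-file formats
write_parquet(df, parquet_path)
write_feather(df, feather_path)
write_csv_arrow(df, csv_path)

# Read each file back into R
from_parquet <- read_parquet(parquet_path)
from_feather <- read_feather(feather_path)
from_csv <- read_csv_arrow(csv_path)

# Write a multi-file dataset, one directory per value of `cyl`,
# then open it without loading the data into memory
write_dataset(mtcars, dataset_dir, partitioning = "cyl")
ds <- open_dataset(dataset_dir)
n_rows <- nrow(ds)
```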
| It provides access to remote filesystems and servers: |
| |
| - Read and write files in Amazon S3 and Google Cloud Storage buckets |
| - Connect to Arrow Flight servers to transport large datasets over |
| networks |
| |
| Additional features include: |
| |
| - Manipulate and analyze Arrow data with dplyr verbs |
| - Zero-copy data sharing between R and Python |
| - Fine control over column types to work seamlessly with databases and |
| data warehouses |
| - Toolkit for building connectors to other applications and services |
| that use Arrow |
| |
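For instance, dplyr verbs can be applied to an Arrow Table, and they are only evaluated when the result is collected. The following is a minimal sketch, assuming dplyr is installed alongside arrow and using the built-in `mtcars` data purely for illustration.

```r
library(arrow)
library(dplyr)

# Convert an R data frame to an Arrow Table
cars_tbl <- arrow_table(mtcars)

# Build a query with dplyr verbs; nothing is computed yet
query <- cars_tbl %>%
  filter(cyl > 4) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg)) %>%
  arrange(cyl)

# collect() runs the query in Arrow and returns an R data frame
result <- collect(query)
```

Calling `collect()` is what brings the result back into R; until then, the query is only a description of the work to be done.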
| </div> |
| |
| <div class="section level2"> |
| |
| ## What is Apache Arrow? |
| |
| Apache Arrow is a cross-language development platform for in-memory and |
| larger-than-memory data. It specifies a standardized |
| language-independent columnar memory format for flat and hierarchical |
| data, organized for efficient analytic operations on modern hardware. It |
| also provides computational libraries and zero-copy streaming, |
| messaging, and interprocess communication. |
| |
| This package exposes an interface to the Arrow C++ library, enabling |
| access to many of its features in R. It provides low-level access to the |
| Arrow C++ library API and higher-level access through a dplyr backend |
| and familiar R functions. |
| |
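As a small illustration of the lower-level interface, the sketch below builds an Arrow Array and Table through their R6 classes and inspects a few properties. The values and column names are invented for this example.

```r
library(arrow)

# Build an Arrow Array directly; NA becomes an Arrow null
arr <- Array$create(c(1.5, 2.5, NA))
arr$type        # the Arrow data type of the array
arr$null_count  # number of null entries

# Build a Table from named vectors and inspect it
tbl <- Table$create(x = 1:3, y = c("a", "b", "c"))
tbl$schema      # column names and Arrow types
tbl$num_rows

# Convert back to an ordinary R data frame
df <- as.data.frame(tbl)
```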
| </div> |
| |
| <div class="section level2"> |
| |
| ## Arrow resources |
| |
| There are a few additional resources that you may find useful for |
| getting started with arrow: |
| |
| - The official [Arrow R package |
| documentation](https://arrow.apache.org/docs/r/) |
| - [Scaling Up With R and Arrow](https://arrowrbook.com) |
| - [Arrow for R |
| cheatsheet](https://github.com/apache/arrow/blob/-/r/cheatsheet/arrow-cheatsheet.pdf) |
| - [Apache Arrow R |
| Cookbook](https://arrow.apache.org/cookbook/r/index.html) |
| - R for Data Science [Chapter on Arrow](https://r4ds.hadley.nz/arrow) |
| - [Awesome Arrow R](https://github.com/thisisnic/awesome-arrow-r) |
| |
| </div> |
| |
| <div class="section level2"> |
| |
| ## Getting help |
| |
| We welcome questions, discussion, and contributions from users of the |
| arrow package. For information about mailing lists and other venues for |
| engaging with the Arrow developer and user communities, please see the |
| [Apache Arrow Community](https://arrow.apache.org/community/) page. |
| |
| If you encounter a bug, please file an issue with a minimal reproducible |
| example on [GitHub issues](https://github.com/apache/arrow/issues). Log |
| in to your GitHub account, click **New issue**, and select the type of |
| issue you want to create. Give the issue a meaningful title made up of |
| the prefix **`[R]`**, a space, and a short summary, then select the |
| component **R** from the dropdown list. For more information, see the |
| **Report bugs and |
| propose features** section of the [Contributing to Apache |
| Arrow](https://arrow.apache.org/docs/developers/#contributing) page in |
| the Arrow developer documentation. |
| |
| </div> |
| |
| <div class="section level2"> |
| |
| ## Code of Conduct |
| |
| Please note that all participation in the Apache Arrow project is |
| governed by the Apache Software Foundation’s [code of |
| conduct](https://www.apache.org/foundation/policies/conduct.html). |
| |
| </div> |
| |
| </div> |
| |
| <div id="footnotes" class="section footnotes footnotes-end-of-document" |
| role="doc-endnotes"> |
| |
| ------------------------------------------------------------------------ |
| |
| 1. <div id="fn1"> |
| |
| Linux users should consult the R-universe |
| [documentation](https://docs.r-universe.dev/install/binaries.html) |
| for guidance on the exact repo URL path and potential limitations. |
| |
| </div> |
| |
| </div> |
| |
| </div> |