<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Installing on Linux • Arrow R Package</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="96x96" href="../favicon-96x96.png">
<link rel="icon" type="image/svg+xml" href="../favicon.svg">
<link rel="apple-touch-icon" sizes="180x180" href="../apple-touch-icon.png">
<link rel="icon" sizes="any" href="../favicon.ico">
<link rel="manifest" href="../site.webmanifest">
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<meta property="og:title" content="Installing on Linux">
<meta name="description" content="Installing arrow on Linux usually just works, but occasionally poses problems. Learn how to handle installation problems if and when they arise.">
<meta property="og:description" content="Installing arrow on Linux usually just works, but occasionally poses problems. Learn how to handle installation problems if and when they arise.">
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png">
<meta property="og:image:alt" content="Apache Arrow logo, displaying the triple chevron image adjacent to the text">
<!-- Matomo --><script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script><!-- End Matomo Code --><!-- Kapa AI --><script async src="https://widget.kapa.ai/kapa-widget.bundle.js" data-website-id="9db461d5-ac77-4b3f-a5c5-75efa78339d2" data-project-name="Apache Arrow" data-project-color="#000000" data-project-logo="https://arrow.apache.org/img/arrow-logo_chevrons_white-txt_black-bg.png" data-modal-disclaimer="This is a custom LLM with access to all of [Arrow documentation](https://arrow.apache.org/docs/). If you want an R-specific answer, please mention this in your question." data-consent-required="true" data-user-analytics-cookie-enabled="false" data-consent-screen-disclaimer="By clicking &quot;I agree, let's chat&quot;, you consent to the use of the AI assistant in accordance with kapa.ai's [Privacy Policy](https://www.kapa.ai/content/privacy-policy). This service uses reCAPTCHA, which requires your consent to Google's [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms). By proceeding, you explicitly agree to both kapa.ai's and Google's privacy policies."></script><!-- End Kapa AI -->
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar fixed-top navbar-dark navbar-expand-lg bg-black"><div class="container">
<a class="navbar-brand me-2" href="../index.html">Arrow R Package</a>
<span class="version">
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">21.0.0.9000</small>
</span>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="nav-item"><a class="nav-link" href="../articles/arrow.html">Get started</a></li>
<li class="nav-item"><a class="nav-link" href="../reference/index.html">Reference</a></li>
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-articles" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true">Articles</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-articles">
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Using the package</h6></li>
<li><a class="dropdown-item" href="../articles/read_write.html">Reading and writing data files</a></li>
<li><a class="dropdown-item" href="../articles/data_wrangling.html">Data analysis with dplyr syntax</a></li>
<li><a class="dropdown-item" href="../articles/dataset.html">Working with multi-file data sets</a></li>
<li><a class="dropdown-item" href="../articles/python.html">Integrating Arrow, Python, and R</a></li>
<li><a class="dropdown-item" href="../articles/fs.html">Using cloud storage (S3, GCS)</a></li>
<li><a class="dropdown-item" href="../articles/flight.html">Connecting to a Flight server</a></li>
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Arrow concepts</h6></li>
<li><a class="dropdown-item" href="../articles/data_objects.html">Data objects</a></li>
<li><a class="dropdown-item" href="../articles/data_types.html">Data types</a></li>
<li><a class="dropdown-item" href="../articles/metadata.html">Metadata</a></li>
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Installation</h6></li>
<li><a class="dropdown-item" href="../articles/install.html">Installing on Linux</a></li>
<li><a class="dropdown-item" href="../articles/install_nightly.html">Installing development versions</a></li>
<li><hr class="dropdown-divider"></li>
<li><a class="dropdown-item" href="../articles/index.html">More articles...</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../news/index.html">Changelog</a></li>
</ul>
<form class="form-inline my-2 my-lg-0" role="search">
<input type="search" class="form-control me-sm-2" aria-label="Toggle navigation" name="search-input" data-search-index="../search.json" id="search-input" placeholder="" autocomplete="off">
</form>
<ul class="navbar-nav">
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/apache/arrow/" aria-label="GitHub"><span class="fa fab fa-github fa-lg"></span></a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<h1>Installing on Linux</h1>
<small class="dont-index">Source: <a href="https://github.com/apache/arrow/blob/main/r/vignettes/install.Rmd" class="external-link"><code>vignettes/install.Rmd</code></a></small>
<div class="d-none name"><code>install.Rmd</code></div>
</div>
<p>In most cases, <code>install.packages("arrow")</code> should just
work. This article documents things you can do to make the installation
faster. If installation does not work for some reason, set the
environment variable <code>ARROW_R_DEV=true</code>, retry, and share the
logs with us.</p>
<div class="section level2">
<h2 id="background">Background<a class="anchor" aria-label="anchor" href="#background"></a>
</h2>
<p>The Apache Arrow project is implemented in multiple languages, and
the R package depends on the Arrow C++ library (referred to from here on
as libarrow). This means that when you install arrow, you need both the
R and C++ versions. If you install arrow from CRAN on a machine running
Windows or macOS, when you call <code>install.packages("arrow")</code>,
a precompiled binary containing both the R package and libarrow will be
downloaded. However, CRAN does not host R package binaries for Linux,
and so you must choose from one of the alternative approaches.</p>
<p>This article outlines the recommended approaches to installing arrow
on Linux, ordered from the simplest and least customizable to the most
complex and most flexible.</p>
<p>The primary audience for this document is arrow R package
<em>users</em> on Linux, and not Arrow <em>developers</em>. Additional
resources for developers are listed at the end of this article.</p>
</div>
<div class="section level2">
<h2 id="system-dependencies">System dependencies<a class="anchor" aria-label="anchor" href="#system-dependencies"></a>
</h2>
<p>The arrow package is designed to work with very minimal system
requirements, but there are a few things to note.</p>
<div class="section level3">
<h3 id="compilers">Compilers<a class="anchor" aria-label="anchor" href="#compilers"></a>
</h3>
<p>As of version 10.0.0, arrow requires a C++17 compiler to build. For
<code>gcc</code>, this generally means version 7 or newer. Most
contemporary Linux distributions have a new enough compiler; however,
CentOS 7 is a notable exception, as it ships with gcc 4.8.</p>
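<p>If you’re not sure which compiler you have, the sketch below asks
<code>gcc</code> for its version (via <code>gcc -dumpversion</code>) and
compares it against the version 7 guideline above. It assumes the
<code>gcc</code> on your <code>PATH</code> is the compiler R uses
(<code>R CMD config CXX17</code> shows the C++17 compiler R is
configured with), and <code>gcc_supports_cpp17()</code> is a
hypothetical helper, not part of arrow.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper (not part of arrow): does this gcc version meet
# the "version 7 or newer" guideline for C++17 support?
gcc_supports_cpp17 = function(version = NULL) {
  if (is.null(version)) {
    # `gcc -dumpversion` prints e.g. "4.8.5", "9" or "11.4.0"
    version = tryCatch(
      system2("gcc", "-dumpversion", stdout = TRUE, stderr = FALSE),
      error = function(e) character(0),
      warning = function(w) character(0)
    )
  }
  if (length(version) == 0) return(NA)  # gcc not found
  # compareVersion() returns -1 (older), 0 (equal) or 1 (newer)
  utils::compareVersion(trimws(version[1]), "7") != -1
}

gcc_supports_cpp17("4.8.5")  # FALSE (the gcc shipped with CentOS 7)
gcc_supports_cpp17("9.4.0")  # TRUE
gcc_supports_cpp17()         # checks the gcc on your PATH</code></pre></div>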
</div>
<div class="section level3">
<h3 id="libraries">Libraries<a class="anchor" aria-label="anchor" href="#libraries"></a>
</h3>
<p>Optional support for reading from cloud storage (AWS S3 and Google
Cloud Storage, or GCS) requires additional system dependencies:</p>
<ul>
<li>CURL: install <code>libcurl-devel</code> (rpm) or
<code>libcurl4-openssl-dev</code> (deb)</li>
<li>OpenSSL &gt;= 1.0.2: install <code>openssl-devel</code> (rpm) or
<code>libssl-dev</code> (deb)</li>
</ul>
<p>The prebuilt binaries come with S3 and GCS support enabled, so you
will need to meet these system requirements in order to use them. If
you’re building everything from source, the install script will check
for the presence of these dependencies and turn off S3 and GCS support
in the build if the prerequisites are not met; installation will succeed
but without S3 or GCS functionality. If you install the missing system
requirements afterwards, you’ll need to reinstall the package to enable
S3 and GCS support.</p>
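<p>One way to check for these libraries before installing is to ask
<code>pkg-config</code>, which exits with status 0 when it finds a
module. This is a minimal sketch: it assumes <code>pkg-config</code> is
installed and that the development packages above register modules
named <code>libcurl</code> and <code>openssl</code> (the usual names,
though distributions can differ). <code>has_pkg_config_module()</code>
is a hypothetical helper.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: TRUE if pkg-config can find the named module
has_pkg_config_module = function(module) {
  status = tryCatch(
    # system2() returns the exit status when output isn't captured;
    # if pkg-config itself is missing, the status is non-zero
    suppressWarnings(system2("pkg-config", c("--exists", module),
                             stdout = FALSE, stderr = FALSE)),
    error = function(e) NA_integer_
  )
  identical(as.integer(status), 0L)
}

# Check the system requirements for S3/GCS support
vapply(c("libcurl", "openssl"), has_pkg_config_module, logical(1))</code></pre></div>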
</div>
</div>
<div class="section level2">
<h2 id="install-release-version-easy-way">Install release version (easy way)<a class="anchor" aria-label="anchor" href="#install-release-version-easy-way"></a>
</h2>
<p>On macOS and Windows, when you run
<code>install.packages("arrow")</code> and install arrow from CRAN, you
get an R binary package that contains a precompiled version of libarrow.
Installing binaries is much easier than installing from source, but CRAN
does not host binaries for Linux. This means that the default behaviour
when you run <code><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages()</a></code> on Linux is to retrieve the
source version of the R package and compile both the R package
<em>and</em> libarrow from source. We’ll talk about this scenario in the
next section (the “less easy” way), but first we’ll suggest two faster
alternatives that are usually much easier.</p>
<div class="section level3">
<h3 id="binary-r-package-with-libarrow-binary-via-rspmconda">Binary R package with libarrow binary via RSPM/conda<a class="anchor" aria-label="anchor" href="#binary-r-package-with-libarrow-binary-via-rspmconda"></a>
</h3>
<p><img src="r_binary_libarrow_binary.png" alt="Graphic showing R and C++ logo inside the package icon" width="30%"></p>
<p>If you want a quicker installation process, and by default a more
fully-featured build, you could install arrow from <a href="https://packagemanager.rstudio.com/client/#/" class="external-link">RStudio’s public
package manager</a>, which hosts binaries for both Windows and
Linux.</p>
<p>For example, if you are using Ubuntu 20.04 (Focal):</p>
<div class="sourceCode" id="cb1"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/options.html" class="external-link">options</a></span><span class="op">(</span></span>
<span> HTTPUserAgent <span class="op">=</span></span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/sprintf.html" class="external-link">sprintf</a></span><span class="op">(</span></span>
<span> <span class="st">"R/%s R (%s)"</span>,</span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/numeric_version.html" class="external-link">getRversion</a></span><span class="op">(</span><span class="op">)</span>,</span>
<span> <span class="fu"><a href="https://rdrr.io/r/base/paste.html" class="external-link">paste</a></span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/numeric_version.html" class="external-link">getRversion</a></span><span class="op">(</span><span class="op">)</span>, <span class="va">R.version</span><span class="op">[</span><span class="st">"platform"</span><span class="op">]</span>, <span class="va">R.version</span><span class="op">[</span><span class="st">"arch"</span><span class="op">]</span>, <span class="va">R.version</span><span class="op">[</span><span class="st">"os"</span><span class="op">]</span><span class="op">)</span></span>
<span> <span class="op">)</span></span>
<span><span class="op">)</span></span>
<span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span><span class="st">"arrow"</span>, repos <span class="op">=</span> <span class="st">"https://packagemanager.rstudio.com/all/__linux__/focal/latest"</span><span class="op">)</span></span></code></pre></div>
<p>Note that the User Agent header must be specified as in the example
above. Please check <a href="https://docs.posit.co/rspm/admin/serving-binaries/#using-linux-binary-packages" class="external-link">the
RStudio Package Manager: Admin Guide</a> for more details.</p>
<p>For other Linux distributions, to get the relevant URL, you can visit
<a href="https://packagemanager.rstudio.com/client/#/repos/1/overview" class="external-link">the
RSPM site</a>, click on ‘binary’, and select your preferred
distribution.</p>
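<p>On Ubuntu, the distribution part of that URL is the release codename
(<code>focal</code> for 20.04), which <code>/etc/os-release</code>
records as <code>VERSION_CODENAME</code>. The sketch below builds the
repository URL from it. It assumes the URL pattern from the 20.04
example carries over to your release, which you should confirm on the
RSPM site; other distributions use different identifiers, and
<code>rspm_repo_url()</code> is a hypothetical helper.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: build an RSPM binary repo URL from /etc/os-release,
# assuming the ".../__linux__/CODENAME/latest" pattern from the example above
rspm_repo_url = function(os_release = "/etc/os-release") {
  fields = readLines(os_release, warn = FALSE)
  line = grep("^VERSION_CODENAME=", fields, value = TRUE)
  if (length(line) == 0) {
    stop("VERSION_CODENAME not found in ", os_release)
  }
  # Values may be quoted, e.g. VERSION_CODENAME="focal"
  codename = gsub('"', "", sub("^VERSION_CODENAME=", "", line[1]))
  sprintf("https://packagemanager.rstudio.com/all/__linux__/%s/latest", codename)
}

# With the HTTPUserAgent option set as shown above:
# install.packages("arrow", repos = rspm_repo_url())</code></pre></div>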
<p>Similarly, if you use <code>conda</code> to manage your R
environment, you can get the latest official release of the R package
including libarrow via:</p>
<pre class="shell"><code># Using the --strict-channel-priority flag on `conda install` causes very long
# solve times, so we add it directly to the config
conda config --set channel_priority strict
conda install -c conda-forge r-arrow</code></pre>
</div>
<div class="section level3">
<h3 id="r-source-package-with-libarrow-binary">R source package with libarrow binary<a class="anchor" aria-label="anchor" href="#r-source-package-with-libarrow-binary"></a>
</h3>
<p><img src="r_source_libarrow_binary.png" alt="Graphic showing R logo in folder icon, then a plus sign, then C++ logo inside the package icon" width="50%"></p>
<p>Another way of achieving faster installation with all key features
enabled is to use static libarrow binaries we host. These are used
automatically on many Linux distributions (x86_64 architecture only),
according to the <a href="https://github.com/apache/arrow/blob/main/r/tools/nixlibs-allowlist.txt" class="external-link">allowlist</a>.
If your distribution isn’t in the list, you can opt in by setting the
<code>NOT_CRAN</code> environment variable before you call
<code><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages()</a></code>:</p>
<div class="sourceCode" id="cb3"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"NOT_CRAN"</span> <span class="op">=</span> <span class="st">"true"</span><span class="op">)</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span><span class="st">"arrow"</span><span class="op">)</span></span></code></pre></div>
<p>This installs the source version of the R package, but during the
installation process it checks for compatible libarrow binaries that we
host and uses them if available. If no binary is available or one can’t
be found, it falls back to the full source build described below, but
setting the environment variable still results in a more fully-featured
build than the default.</p>
<p>The libarrow binaries include support for AWS S3 and GCS, so they
require the libcurl and openssl libraries installed separately, as noted
above. If you don’t have these installed, the libarrow binary won’t be
used, and you will fall back to the full source build (with S3 and GCS
support disabled).</p>
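<p><code>Sys.setenv()</code> changes a variable for the rest of your R
session, so later installs of other packages will see it too. If you’d
rather limit it to a single install, a small base-R wrapper like the
hypothetical <code>with_install_env()</code> below sets the variables,
evaluates the expression, and then restores the previous values. The
<code>withr</code> package’s <code>withr::with_envvar()</code> serves
the same purpose if you already use it.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: set environment variables only while `expr` runs,
# then restore their previous values (or unset them again)
with_install_env = function(vars, expr) {
  old = Sys.getenv(names(vars), unset = NA, names = TRUE)
  do.call(Sys.setenv, as.list(vars))
  on.exit({
    was_set = !is.na(old)
    if (any(!was_set)) Sys.unsetenv(names(old)[!was_set])
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
  }, add = TRUE)
  force(expr)
}

with_install_env(c(NOT_CRAN = "true"), Sys.getenv("NOT_CRAN"))  # "true"
# Real use:
# with_install_env(c(NOT_CRAN = "true"), install.packages("arrow"))</code></pre></div>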
<p>If your computer’s internet access doesn’t allow downloading the
libarrow binaries (e.g. if access is limited to CRAN), you can first
identify the right source and version by attempting an install on the
offline computer:</p>
<div class="sourceCode" id="cb4"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"NOT_CRAN"</span> <span class="op">=</span> <span class="st">"true"</span>, <span class="st">"LIBARROW_BUILD"</span> <span class="op">=</span> <span class="cn">FALSE</span>, <span class="st">"ARROW_R_DEV"</span> <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span><span class="st">"arrow"</span><span class="op">)</span></span>
<span><span class="co"># This will fail without internet access, but will print the URL of the binaries</span></span></code></pre></div>
<p>Then you can obtain the libarrow binaries (using a computer with
internet access) and transfer the zip file to the target computer. Now
you just have to tell the installer to use that pre-downloaded file:</p>
<div class="sourceCode" id="cb5"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="co"># Watch out: the release number of the pre-downloaded libarrow must match CRAN!</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"ARROW_DOWNLOADED_BINARIES"</span> <span class="op">=</span> <span class="st">"/path/to/downloaded/libarrow.zip"</span><span class="op">)</span></span>
<span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span><span class="st">"arrow"</span><span class="op">)</span></span></code></pre></div>
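<p>Before pointing the installer at the transferred file, it can help to
sanity-check it. The sketch below (<code>check_libarrow_zip()</code> is
a hypothetical helper) confirms that the path exists and that
<code>utils::unzip(list = TRUE)</code> can read it as a non-empty zip
archive. It does not check the version, so you still need to confirm
that the release number matches the arrow version you are
installing.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: basic checks on a pre-downloaded libarrow archive
check_libarrow_zip = function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  # unzip(list = TRUE) reads the archive's file listing without
  # extracting anything, and fails if the file can't be opened as a zip
  entries = utils::unzip(path, list = TRUE)
  if (nrow(entries) == 0) stop("Archive is empty: ", path)
  normalizePath(path)
}

zip_path = "/path/to/downloaded/libarrow.zip"
if (file.exists(zip_path)) {
  Sys.setenv("ARROW_DOWNLOADED_BINARIES" = check_libarrow_zip(zip_path))
}</code></pre></div>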
</div>
</div>
<div class="section level2">
<h2 id="install-release-version-less-easy">Install release version (less easy)<a class="anchor" aria-label="anchor" href="#install-release-version-less-easy"></a>
</h2>
<p><img src="r_source_libarrow_source.png" alt="Graphic showing R inside a folder icon, then a plus sign, then C++ logo inside a folder icon" width="50%"></p>
<p>The “less easy” way to install arrow is to install both the R package
and the underlying Arrow C++ library (libarrow) from source. This method
is somewhat more difficult because compiling and installing R packages
with C++ dependencies generally requires installing system packages,
which you may not have privileges to do, and/or building the C++
dependencies separately, which introduces all sorts of additional ways
for things to go wrong.</p>
<p>Installing from the full source build of arrow, compiling both C++
and R bindings, will handle most of the dependency management for you,
but it is much slower than using binaries. However, if using binaries
isn’t an option for you, or you wish to customize your Linux
installation, the instructions in this section explain how to do
that.</p>
<div class="section level3">
<h3 id="basic-configuration">Basic configuration<a class="anchor" aria-label="anchor" href="#basic-configuration"></a>
</h3>
<p>If you wish to install libarrow from source instead of looking for
pre-compiled binaries, you can set the <code>LIBARROW_BINARY</code>
variable.</p>
<div class="sourceCode" id="cb6"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"LIBARROW_BINARY"</span> <span class="op">=</span> <span class="cn">FALSE</span><span class="op">)</span></span></code></pre></div>
<p>By default, this is set to <code>TRUE</code>, and so libarrow will
only be built from source if this environment variable is set to
<code>FALSE</code> or no compatible binary for your OS can be found.</p>
<p>When compiling libarrow from source, you have the power to really
fine-tune which features to install. You can set the environment
variable <code>LIBARROW_MINIMAL</code> to <code>FALSE</code> to enable a
more full-featured build including S3 support and alternative memory
allocators.</p>
<div class="sourceCode" id="cb7"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"LIBARROW_MINIMAL"</span> <span class="op">=</span> <span class="cn">FALSE</span><span class="op">)</span></span></code></pre></div>
<p>By default this variable is unset, which builds many commonly used
features such as Parquet support but disables some features that are
more costly to build, like S3 and GCS support. If set to
<code>TRUE</code>, a trimmed-down version of arrow is installed with all
optional features disabled.</p>
<p>Note that this guide has already mentioned the environment variable
<code>NOT_CRAN</code>. This is a convenience variable which, when set to
<code>TRUE</code>, automatically sets <code>LIBARROW_MINIMAL</code> to
<code>FALSE</code> and <code>LIBARROW_BINARY</code> to <code>TRUE</code>,
unless you have already set those variables yourself.</p>
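<p>The precedence rule, where <code>NOT_CRAN</code> only fills in
<code>LIBARROW_BINARY</code> and <code>LIBARROW_MINIMAL</code> when they
aren’t already set, can be sketched as a small pure function. This
illustrates the documented behaviour; it is not the code arrow’s build
script actually runs, and <code>resolve_not_cran()</code> is
hypothetical. It compares values case-insensitively, as the build
variables are.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical illustration of how NOT_CRAN=true fills in defaults.
# `env` is a named character vector of environment variable values.
resolve_not_cran = function(env) {
  if (isTRUE(tolower(env["NOT_CRAN"]) == "true")) {
    defaults = c(LIBARROW_BINARY = "true", LIBARROW_MINIMAL = "false")
    for (name in names(defaults)) {
      # Only fill in variables the user has not already set
      if (is.na(env[name]) || !nzchar(env[name])) env[name] = defaults[[name]]
    }
  }
  env
}

resolve_not_cran(c(NOT_CRAN = "true"))
# An explicit setting wins: LIBARROW_MINIMAL stays "true" here
resolve_not_cran(c(NOT_CRAN = "true", LIBARROW_MINIMAL = "true"))</code></pre></div>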
<p>Building libarrow from source requires more time and resources than
installing a binary. We recommend that you set the environment variable
<code>ARROW_R_DEV</code> to <code>TRUE</code> for more verbose output
during the installation process if anything goes wrong.</p>
<div class="sourceCode" id="cb8"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/base/Sys.setenv.html" class="external-link">Sys.setenv</a></span><span class="op">(</span><span class="st">"ARROW_R_DEV"</span> <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></span></code></pre></div>
<p>Once you have set these variables, call
<code><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages()</a></code> to install arrow using this
configuration.</p>
<div class="sourceCode" id="cb9"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span><span class="st">"arrow"</span><span class="op">)</span></span></code></pre></div>
<p>The section below discusses environment variables you can set before
calling <code>install.packages("arrow")</code> to build from source and
customize your configuration.</p>
<div class="section level4">
<h4 id="handling-libarrow-dependencies">Handling libarrow dependencies<a class="anchor" aria-label="anchor" href="#handling-libarrow-dependencies"></a>
</h4>
<p>When you build libarrow from source, its dependencies will be
automatically downloaded. The environment variable
<code>ARROW_DEPENDENCY_SOURCE</code> controls whether the libarrow
installation also downloads or installs all dependencies (when set to
<code>BUNDLED</code>), uses only system-installed dependencies (when set
to <code>SYSTEM</code>) or checks system-installed dependencies first
and only installs dependencies which aren’t already present (when set to
<code>AUTO</code>, the default).</p>
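<p>For example, to have the build rely only on libraries already
installed on your system, set <code>ARROW_DEPENDENCY_SOURCE</code>
before calling <code>install.packages()</code>. The hypothetical helper
<code>set_dependency_source()</code> below only adds a guard against
typos by accepting the three values listed above.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: set ARROW_DEPENDENCY_SOURCE, rejecting unknown values
set_dependency_source = function(value = c("AUTO", "BUNDLED", "SYSTEM")) {
  # match.arg() errors on anything other than the listed choices
  value = match.arg(toupper(value), c("AUTO", "BUNDLED", "SYSTEM"))
  Sys.setenv(ARROW_DEPENDENCY_SOURCE = value)
  invisible(value)
}

set_dependency_source("SYSTEM")
Sys.getenv("ARROW_DEPENDENCY_SOURCE")  # "SYSTEM"</code></pre></div>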
<p>These dependencies vary by platform; however, if you wish to install
these yourself prior to libarrow installation, we recommend that you
take a look at the <a href="https://github.com/apache/arrow/tree/main/ci/docker" class="external-link">docker file
for whichever of our CI builds</a> (the ones ending in “cpp” are for
building Arrow’s C++ libraries, aka libarrow) corresponds most closely
to your setup. This will contain the most up-to-date information about
dependencies and minimum versions.</p>
<p>If downloading dependencies at build time is not an option, as when
building on a system that is disconnected or behind a firewall, there
are a few options. See “Offline builds” below.</p>
</div>
<div class="section level4">
<h4 id="dependencies-for-s3-and-gcs-support">Dependencies for S3 and GCS support<a class="anchor" aria-label="anchor" href="#dependencies-for-s3-and-gcs-support"></a>
</h4>
<p>Support for working with data in S3 and GCS is not enabled in the
default source build, and it has additional system requirements as
described above. To enable it, set the environment variable
<code>LIBARROW_MINIMAL=false</code> or <code>NOT_CRAN=true</code> to
choose the full-featured build, or more selectively set
<code>ARROW_S3=ON</code> and/or <code>ARROW_GCS=ON</code>.</p>
<p>When either feature is enabled, the install script will check for the
presence of the required dependencies, and if the prerequisites are not
met, it will turn off S3 and GCS support; installation will succeed but
without S3 or GCS functionality. If you install the missing system
requirements afterwards, you’ll need to reinstall the package to enable
S3 and GCS support.</p>
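<p>Because a missing prerequisite disables these features rather than
failing the install, it is worth checking the result afterwards. The
sketch below wraps arrow’s <code>arrow_with_s3()</code> and
<code>arrow_with_gcs()</code>; the wrapper name
<code>cloud_support()</code> is hypothetical, and it returns
<code>NA</code> values if arrow isn’t installed.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical wrapper: which cloud filesystems does the installed arrow support?
cloud_support = function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(c(s3 = NA, gcs = NA))
  }
  c(s3 = arrow::arrow_with_s3(), gcs = arrow::arrow_with_gcs())
}

# A FALSE here after setting ARROW_S3=ON or ARROW_GCS=ON suggests the
# prerequisites were not found; install them, then reinstall arrow.
cloud_support()</code></pre></div>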
</div>
</div>
<div class="section level3">
<h3 id="advanced-configuration">Advanced configuration<a class="anchor" aria-label="anchor" href="#advanced-configuration"></a>
</h3>
<p>In this section, we describe how to fine-tune your installation at a
more granular level.</p>
<div class="section level4">
<h4 id="libarrow-configuration">libarrow configuration<a class="anchor" aria-label="anchor" href="#libarrow-configuration"></a>
</h4>
<p>Some features are optional when you build Arrow from source, and you
can control whether these components are built by setting environment
variables. The table below lists these variables and their default
values.</p>
<table class="table">
<colgroup>
<col width="33%">
<col width="33%">
<col width="33%">
</colgroup>
<thead><tr class="header">
<th>Name</th>
<th>Description</th>
<th align="center">Default Value</th>
</tr></thead>
<tbody>
<tr class="odd">
<td><code>ARROW_S3</code></td>
<td>S3 support (if dependencies are met)*</td>
<td align="center"><code>OFF</code></td>
</tr>
<tr class="even">
<td><code>ARROW_GCS</code></td>
<td>GCS support (if dependencies are met)*</td>
<td align="center"><code>OFF</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_JEMALLOC</code></td>
<td>The <code>jemalloc</code> memory allocator</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_MIMALLOC</code></td>
<td>The <code>mimalloc</code> memory allocator</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_PARQUET</code></td>
<td>Support for the Parquet file format</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_DATASET</code></td>
<td>Support for multi-file Datasets</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_JSON</code></td>
<td>The JSON parsing library</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_WITH_RE2</code></td>
<td>The RE2 regular expression library, used in some string compute
functions</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_WITH_UTF8PROC</code></td>
<td>The UTF8Proc string library, used in many other string compute
functions</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_WITH_BROTLI</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_WITH_BZ2</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_WITH_LZ4</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_WITH_SNAPPY</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="even">
<td><code>ARROW_WITH_ZLIB</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_WITH_ZSTD</code></td>
<td>Compression algorithm</td>
<td align="center"><code>ON</code></td>
</tr>
</tbody>
</table>
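<p>These are ordinary environment variables, so you set them before
calling <code>install.packages()</code> like the others in this article.
As a sketch, the hypothetical helper <code>set_arrow_features()</code>
below turns a named logical vector into the <code>ON</code>/<code>OFF</code>
values used in the table. The features switched in the example are only
an illustration, e.g. skipping codecs you don’t need to shorten a
source build.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical helper: set libarrow feature flags from TRUE/FALSE values
set_arrow_features = function(features) {
  stopifnot(is.logical(features), !is.null(names(features)), !anyNA(features))
  values = ifelse(features, "ON", "OFF")
  names(values) = names(features)
  do.call(Sys.setenv, as.list(values))
  invisible(values)
}

# Example: enable S3 and skip two compression codecs, then install
set_arrow_features(c(ARROW_S3 = TRUE, ARROW_WITH_BROTLI = FALSE, ARROW_WITH_BZ2 = FALSE))
Sys.getenv(c("ARROW_S3", "ARROW_WITH_BROTLI", "ARROW_WITH_BZ2"))</code></pre></div>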
</div>
<div class="section level4">
<h4 id="r-package-configuration">R package configuration<a class="anchor" aria-label="anchor" href="#r-package-configuration"></a>
</h4>
<p>There are a number of other variables that affect the
<code>configure</code> script and the bundled build script. All boolean
variables are case-insensitive.</p>
<table class="table">
<colgroup>
<col width="33%">
<col width="33%">
<col width="33%">
</colgroup>
<thead><tr class="header">
<th>Name</th>
<th>Description</th>
<th align="center">Default</th>
</tr></thead>
<tbody>
<tr class="odd">
<td><code>LIBARROW_BUILD</code></td>
<td>Allow building from source</td>
<td align="center"><code>true</code></td>
</tr>
<tr class="even">
<td><code>LIBARROW_BINARY</code></td>
<td>Try to install <code>libarrow</code> binary instead of building from
source</td>
<td align="center">(unset)</td>
</tr>
<tr class="odd">
<td><code>LIBARROW_DOWNLOAD</code></td>
<td>Set to <code>false</code> to explicitly forbid fetching a
<code>libarrow</code> binary</td>
<td align="center">(unset)</td>
</tr>
<tr class="even">
<td><code>LIBARROW_MINIMAL</code></td>
<td>Build with minimal features enabled</td>
<td align="center">(unset)</td>
</tr>
<tr class="odd">
<td><code>NOT_CRAN</code></td>
<td>Set <code>LIBARROW_BINARY=true</code> and
<code>LIBARROW_MINIMAL=false</code>
</td>
<td align="center"><code>false</code></td>
</tr>
<tr class="even">
<td><code>ARROW_R_DEV</code></td>
<td>Enable more verbose messaging and regenerate some code</td>
<td align="center"><code>false</code></td>
</tr>
<tr class="odd">
<td><code>ARROW_USE_PKG_CONFIG</code></td>
<td>Use <code>pkg-config</code> to search for <code>libarrow</code>
install</td>
<td align="center"><code>true</code></td>
</tr>
<tr class="even">
<td><code>LIBARROW_DEBUG_DIR</code></td>
<td>Directory to save source build logs</td>
<td align="center">(unset)</td>
</tr>
<tr class="odd">
<td><code>CMAKE</code></td>
<td>Alternative CMake path</td>
<td align="center">(unset)</td>
</tr>
</tbody>
</table>
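<p>To see which of these variables are currently set in your R session
before you install, you can query them all at once with base R’s
<code>Sys.getenv()</code>; the <code>"(unset)"</code> placeholder here
simply mirrors the table.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Show the current value of each arrow build variable in this R session
arrow_build_vars = c(
  "LIBARROW_BUILD", "LIBARROW_BINARY", "LIBARROW_DOWNLOAD",
  "LIBARROW_MINIMAL", "NOT_CRAN", "ARROW_R_DEV",
  "ARROW_USE_PKG_CONFIG", "LIBARROW_DEBUG_DIR", "CMAKE"
)
Sys.getenv(arrow_build_vars, unset = "(unset)")</code></pre></div>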
<p>See below for more in-depth explanations of these environment
variables.</p>
<ul>
<li>
<code>LIBARROW_BINARY</code> : By default on many distributions, or
if explicitly set to <code>true</code>, the script will determine
whether there is a prebuilt libarrow that will work with your system.
You can set it to <code>false</code> to skip this option altogether, or
you can specify a string that corresponds to an available binary, to
override what the script would otherwise detect. Possible values are:
“linux-openssl-1.0”, “linux-openssl-1.1”, “linux-openssl-3.0”.</li>
<li>
<code>LIBARROW_BUILD</code> : If set to <code>false</code>, the
build script will not attempt to build the C++ library from source. This
means you will only get a working arrow R package if a prebuilt binary is
found. Use this if you want to avoid compiling the C++ library, which
may be slow and resource-intensive, and ensure that you only use a
prebuilt binary.</li>
<li>
<code>LIBARROW_MINIMAL</code> : If set to <code>false</code>, the
build script will enable some optional features, including S3 support
and additional alternative memory allocators. This increases the
source build time but results in a more fully functional library. If set
to <code>true</code>, it turns off Parquet, Datasets, compression libraries,
and other optional features. This is not commonly used but may be
helpful if you need to compile on a platform that does not support these
features, e.g. Solaris.</li>
<li>
<code>NOT_CRAN</code> : If this variable is set to
<code>true</code>, as the <code>devtools</code> package does, the build
script will set <code>LIBARROW_BINARY=true</code> and
<code>LIBARROW_MINIMAL=false</code> unless those environment variables
are already set. This provides for a more complete and fast installation
experience for users who already have <code>NOT_CRAN=true</code> as part
of their workflow, without requiring additional environment variables to
be set.</li>
<li>
<code>ARROW_R_DEV</code> : If set to <code>true</code>, more verbose
messaging will be printed in the build script.
<code>arrow::install_arrow(verbose = TRUE)</code> sets this. This
variable is also needed if you’re modifying C++ code in the package: see
the developer guide article.</li>
<li>
<code>ARROW_USE_PKG_CONFIG</code>: If set to <code>false</code>, the
configure script won’t look for Arrow libraries on your system and
will instead download or build them. Use this if you have a version
mismatch between installed system libraries and the version of the R
package you’re installing.</li>
<li>
<code>LIBARROW_DEBUG_DIR</code> : If building the C++ library from
source (with <code>cmake</code>) fails, there may be messages telling you to
check some log file in the build directory. However, when the library is
built during R package installation, that location is in a temporary
directory that has already been deleted. To capture those logs, set this
variable to an absolute (not relative) path and the log files will be
copied there. The directory will be created if it does not exist.</li>
<li>
<code>CMAKE</code> : When building the C++ library from source, you
can specify a <code>/path/to/cmake</code> to use a different version
than whatever is found on the <code>$PATH</code>.</li>
</ul>
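<p>As a minimal sketch, these variables can be set from R with <code>Sys.setenv()</code> before calling <code>install.packages()</code>, since they only need to be visible to the build script at install time. The example assumes you want a binary-only installation, combining <code>LIBARROW_BINARY</code> and <code>LIBARROW_BUILD</code> as described above; adjust the values to your situation.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Require a prebuilt libarrow binary and never fall back to a source build.
# Values are case-insensitive, so "true" and "TRUE" are equivalent.
Sys.setenv(
  LIBARROW_BINARY = "true",
  LIBARROW_BUILD = "false"
)
install.packages("arrow")</code></pre></div>
<p>Variables set with <code>Sys.setenv()</code> last only for the current R session. To make a setting persistent, you could instead add a line such as <code>LIBARROW_BINARY=true</code> to your <code>~/.Renviron</code> file, which R reads at startup.</p>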
</div>
</div>
</div>
<div class="section level2">
<h2 id="using-install_arrow">Using install_arrow()<a class="anchor" aria-label="anchor" href="#using-install_arrow"></a>
</h2>
<p>The previous instructions are useful for a fresh arrow installation,
but arrow also provides the function <code><a href="../reference/install_arrow.html">install_arrow()</a></code>. There are
three common use cases for this function:</p>
<ul>
<li>You have arrow installed and want to upgrade to a different
version</li>
<li>You want to try to reinstall and fix issues with Linux C++
binaries</li>
<li>You want to install a development build</li>
</ul>
<p>Examples of using <code><a href="../reference/install_arrow.html">install_arrow()</a></code> are shown below:</p>
<div class="sourceCode" id="cb10"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="../reference/install_arrow.html">install_arrow</a></span><span class="op">(</span><span class="op">)</span> <span class="co"># latest release</span></span>
<span><span class="fu"><a href="../reference/install_arrow.html">install_arrow</a></span><span class="op">(</span>nightly <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span> <span class="co"># install development version</span></span>
<span><span class="fu"><a href="../reference/install_arrow.html">install_arrow</a></span><span class="op">(</span>verbose <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span> <span class="co"># verbose output to debug install errors</span></span></code></pre></div>
<p>Although this function is part of the arrow package, it is also
available as a standalone script, so you can access it without first
installing the package:</p>
<div class="sourceCode" id="cb11"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/source.html" class="external-link">source</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R"</span><span class="op">)</span></span></code></pre></div>
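<p>After sourcing the script, <code>install_arrow()</code> is defined in your session, so a fresh installation without a pre-existing arrow package might look like the following sketch (it needs internet access to fetch the script):</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Define install_arrow() and related helpers from the standalone script
source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")

# Install the latest release, with verbose output in case something fails
install_arrow(verbose = TRUE)</code></pre></div>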
<p>Notes:</p>
<ul>
<li>
<code><a href="../reference/install_arrow.html">install_arrow()</a></code> does not require environment variables
to be set in order to satisfy C++ dependencies.</li>
<li>Unlike packages such as <code>tensorflow</code>, <code>blogdown</code>,
and others that require external dependencies, you do not need to run
<code><a href="../reference/install_arrow.html">install_arrow()</a></code> after a successful arrow installation.</li>
</ul>
</div>
<div class="section level2">
<h2 id="offline-installation">Offline installation<a class="anchor" aria-label="anchor" href="#offline-installation"></a>
</h2>
<p>The <code>install-arrow.R</code> file mentioned in the previous
section includes a function called
<code><a href="../reference/create_package_with_all_dependencies.html">create_package_with_all_dependencies()</a></code>. Normally, when
installing on a computer with internet access, the build process will
download third-party dependencies as needed. This function provides a
way to download them in advance, which can be useful when installing
Arrow on a computer without internet access. The process is as
follows:</p>
<p><strong>Step 1.</strong> Using a computer with internet access,
download dependencies:</p>
<ul>
<li>
<p>Install the arrow package <strong>or</strong> source the script
directly using the following command:</p>
<div class="sourceCode" id="cb12"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="kw"><a href="https://rdrr.io/r/base/source.html" class="external-link">source</a></span><span class="op">(</span><span class="st">"https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R"</span><span class="op">)</span></span></code></pre></div>
</li>
<li>
<p>Use the <code><a href="../reference/create_package_with_all_dependencies.html">create_package_with_all_dependencies()</a></code>
function to create the installation bundle:</p>
<div class="sourceCode" id="cb13"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="../reference/create_package_with_all_dependencies.html">create_package_with_all_dependencies</a></span><span class="op">(</span><span class="st">"my_arrow_pkg.tar.gz"</span><span class="op">)</span></span></code></pre></div>
</li>
<li><p>Copy the newly created <code>my_arrow_pkg.tar.gz</code> file to
the computer without internet access</p></li>
</ul>
<p><strong>Step 2.</strong> On the computer without internet access,
install the prepared package:</p>
<ul>
<li>
<p>Install the arrow package from the copied file:</p>
<div class="sourceCode" id="cb14"><pre class="downlit sourceCode r">
<code class="sourceCode R"><span><span class="fu"><a href="https://rdrr.io/r/utils/install.packages.html" class="external-link">install.packages</a></span><span class="op">(</span></span>
<span> <span class="st">"my_arrow_pkg.tar.gz"</span>,</span>
<span> dependencies <span class="op">=</span> <span class="fu"><a href="https://rdrr.io/r/base/c.html" class="external-link">c</a></span><span class="op">(</span><span class="st">"Depends"</span>, <span class="st">"Imports"</span>, <span class="st">"LinkingTo"</span><span class="op">)</span></span>
<span> <span class="op">)</span></span></code></pre></div>
<p>This installation will build from source, so <code>cmake</code> must
be available.</p>
</li>
<li><p>Run <code><a href="../reference/arrow_info.html">arrow_info()</a></code> to check installed
capabilities</p></li>
</ul>
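<p>For the final check, something along these lines should work once the installation finishes. <code>arrow_info()</code> prints a summary, and its <code>capabilities</code> element is a named logical vector showing which optional features were built:</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R">library(arrow)

# Print package and C++ library versions, capabilities, and memory pool info
arrow_info()

# Named logical vector: TRUE for each optional feature that is available
arrow_info()$capabilities</code></pre></div>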
<p>Notes:</p>
<ul>
<li><p>arrow <em>can</em> be installed on a computer without internet
access without using this function, but many useful features will be
disabled, as they depend on third-party components. More precisely,
every entry of <code>arrow::arrow_info()$capabilities</code> will be
<code>FALSE</code>.</p></li>
<li><p>If you are using binary packages, you shouldn’t need to use this
function. You can download the appropriate binary from your package
repository, transfer it to the offline computer, and install
it there.</p></li>
<li><p>If you’re using RStudio Package Manager (RSPM) on Linux and you
want to make a source bundle with this function, make sure to set the
first repository in <code>options("repos")</code> to be a mirror that
contains source packages. That is, the repository needs to be something
other than the RSPM binary mirror URLs.</p></li>
</ul>
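<p>For example, preparing the bundle with a source-package repository listed first might look like the sketch below. The CRAN cloud mirror URL is used only as an example of a repository that serves source packages; any source mirror should do.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Put a source-package repository first so the arrow source tarball is fetched
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Define create_package_with_all_dependencies() without installing arrow
source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")

create_package_with_all_dependencies("my_arrow_pkg.tar.gz")</code></pre></div>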
</div>
<div class="section level2">
<h2 id="offline-installation-alternative">Offline installation (alternative)<a class="anchor" aria-label="anchor" href="#offline-installation-alternative"></a>
</h2>
<p>A second method for offline installation is a little more hands-on.
Follow these steps if you wish to try it:</p>
<ul>
<li>Download the dependency files
(<code>cpp/thirdparty/download_dependencies.sh</code> may be
helpful)</li>
<li>Copy the directory of dependencies to the offline computer</li>
<li>Create the environment variable
<code>ARROW_THIRDPARTY_DEPENDENCY_DIR</code> on the offline computer,
pointing to the copied directory.</li>
<li>Install the arrow package as usual.</li>
</ul>
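<p>On the offline computer, the last two steps could look like the following sketch. Both paths are hypothetical placeholders: one for the copied dependency directory, and one for an arrow source package that you have also copied over, since the offline computer cannot download it from CRAN.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Hypothetical location of the copied third-party dependency files
Sys.setenv(ARROW_THIRDPARTY_DEPENDENCY_DIR = "/path/to/arrow-thirdparty")

# Hypothetical path to an arrow source tarball copied to this machine
install.packages("/path/to/arrow.tar.gz", repos = NULL, type = "source")</code></pre></div>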
<p>For offline installation using libarrow binaries, see Method 1b
above.</p>
</div>
<div class="section level2">
<h2 id="troubleshooting">Troubleshooting<a class="anchor" aria-label="anchor" href="#troubleshooting"></a>
</h2>
<p>The intent is that <code>install.packages("arrow")</code> will just
work and handle all C++ dependencies, but depending on your system, you
may have better results if you tune one of several parameters. Here are
some known complications and ways to address them.</p>
<div class="section level3">
<h3 id="package-failed-to-build-c-dependencies">Package failed to build C++ dependencies<a class="anchor" aria-label="anchor" href="#package-failed-to-build-c-dependencies"></a>
</h3>
<p>If you see a message like</p>
<pre><code>------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------</code></pre>
<p>in the output when the package fails to install, that means that
installation failed to retrieve or build the libarrow version compatible
with the current version of the R package.</p>
<p>Please check whether any of the known issues described in the following
sections apply, and if none do, set the environment variable
<code>ARROW_R_DEV=TRUE</code> for more verbose output and try installing
again. Then, please <a href="https://github.com/apache/arrow/issues/new/choose" class="external-link">report an
issue</a> and include the full installation output.</p>
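<p>For instance, to get the more verbose output before retrying, you might set the variable in the same R session that runs the installation:</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Print detailed messages from the arrow build script during installation
Sys.setenv(ARROW_R_DEV = "true")
install.packages("arrow")</code></pre></div>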
</div>
<div class="section level3">
<h3 id="using-system-libraries">Using system libraries<a class="anchor" aria-label="anchor" href="#using-system-libraries"></a>
</h3>
<p>If a system library or other installed Arrow is found but it doesn’t
match the R package version (for example, you have libarrow 1.0.0 on
your system and are installing R package 2.0.0), it is likely that the R
bindings will fail to compile. Because the Apache Arrow project is under
active development, it is essential that the versions of libarrow and the R
package match. When <code>install.packages("arrow")</code> has to
download libarrow, the install script ensures that you fetch the
libarrow version that corresponds to your R package version. However, if
you are using a version of libarrow already on your system, a version
match isn’t guaranteed.</p>
<p>To fix a version mismatch, you can either update your libarrow system
packages to match the R package version, or set the environment variable
<code>ARROW_USE_PKG_CONFIG=FALSE</code> to tell the configure script not
to look for a system version of libarrow. (The latter is the default of
<code><a href="../reference/install_arrow.html">install_arrow()</a></code>.) System libarrow versions are available
corresponding to all CRAN releases but not for nightly or dev versions,
so depending on the R package version you’re installing, system libarrow
may not be an option.</p>
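<p>A sketch of both halves of this advice follows. It assumes <code>pkg-config</code> is installed and that your system libarrow ships an <code>arrow.pc</code> file; the first call only reports the system version so you can compare it with the R package version you plan to install.</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Report the libarrow version that pkg-config finds on this system, if any
system2("pkg-config", c("--modversion", "arrow"), stdout = TRUE)

# Skip the system libarrow and let the install script download or build
# a matching libarrow instead
Sys.setenv(ARROW_USE_PKG_CONFIG = "false")
install.packages("arrow")</code></pre></div>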
<p>Note also that once you have a working R package installation based
on system (shared) libraries, if you update your system libarrow
installation, you’ll need to reinstall the R package to match its
version. Similarly, if you’re using libarrow system libraries, running
<code><a href="https://rdrr.io/r/utils/update.packages.html" class="external-link">update.packages()</a></code> after a new release of the arrow package
will likely fail unless you first update the libarrow system
packages.</p>
</div>
<div class="section level3">
<h3 id="using-prebuilt-binaries">Using prebuilt binaries<a class="anchor" aria-label="anchor" href="#using-prebuilt-binaries"></a>
</h3>
<p>If the R package finds and downloads a prebuilt binary of libarrow,
but then the arrow package can’t be loaded, perhaps with “undefined
symbols” errors, please <a href="https://github.com/apache/arrow/issues/new/choose" class="external-link">report an
issue</a>. This is likely a compiler mismatch and may be resolvable by
setting some environment variables to instruct R to compile the packages
to match libarrow.</p>
<p>A workaround would be to set the environment variable
<code>LIBARROW_BINARY=FALSE</code> and retry installation: this value
instructs the package to build libarrow from source instead of
downloading the prebuilt binary. That should guarantee that the compiler
settings match.</p>
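<p>In R, this workaround amounts to the sketch below. Because it forces a source build, it assumes that a C++ compiler and <code>cmake</code> are available:</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Skip the prebuilt binary and compile libarrow from source instead
Sys.setenv(LIBARROW_BINARY = "false")
install.packages("arrow")</code></pre></div>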
<p>If a prebuilt libarrow binary wasn’t found for your operating system
but you think it should have been, please <a href="https://github.com/apache/arrow/issues/new/choose" class="external-link">report an
issue</a> and share the console output. You may also set the environment
variable <code>ARROW_R_DEV=TRUE</code> for additional debug
messages.</p>
</div>
<div class="section level3">
<h3 id="building-libarrow-from-source">Building libarrow from source<a class="anchor" aria-label="anchor" href="#building-libarrow-from-source"></a>
</h3>
<p>If building libarrow from source fails, check the error message. (If
you don’t see an error message, only the <code>----- NOTE -----</code>,
set the environment variable <code>ARROW_R_DEV=TRUE</code> to increase
verbosity and retry installation.) The install script should work
everywhere, so if libarrow fails to compile, please <a href="https://github.com/apache/arrow/issues/new/choose" class="external-link">report an
issue</a> so that we can improve the script.</p>
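<p>When a source build fails, it can also help to keep the <code>cmake</code> logs, which are otherwise deleted along with the temporary build directory. A minimal sketch combining <code>ARROW_R_DEV</code> with the <code>LIBARROW_DEBUG_DIR</code> variable described earlier might look like this; the log directory is an arbitrary example and must be an absolute path:</p>
<div class="sourceCode"><pre class="sourceCode r">
<code class="sourceCode R"># Verbose output from the build script
Sys.setenv(ARROW_R_DEV = "true")
# Example absolute path; build log files are copied here
Sys.setenv(LIBARROW_DEBUG_DIR = "/tmp/arrow-build-logs")
install.packages("arrow")

# After a failed build, list the saved logs to attach to an issue report
list.files("/tmp/arrow-build-logs", recursive = TRUE)</code></pre></div>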
</div>
</div>
<div class="section level2">
<h2 id="contributing">Contributing<a class="anchor" aria-label="anchor" href="#contributing"></a>
</h2>
<p>We are constantly working to make the installation process as
painless as possible. If you find ways to improve the process, please <a href="https://github.com/apache/arrow/issues" class="external-link">report an issue</a> so
that we can document it. Similarly, if you find that your Linux
distribution or version is not supported, we would welcome the
contribution of Docker images (hosted on Docker Hub) that we can use in
our continuous integration and hopefully improve our coverage. If you do
contribute a Docker image, it should be as minimal as possible,
containing only R and the dependencies it requires. For reference, see
the images that <a href="https://github.com/r-hub/rhub-linux-builders" class="external-link">R-hub</a> uses.</p>
<p>You can test the arrow R package installation using the
<code>docker compose</code> setup included in the
<code>apache/arrow</code> git repository. For example,</p>
<pre><code>R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker compose build r
R_ORG=rhub R_IMAGE=ubuntu-release R_TAG=latest docker compose run r</code></pre>
<p>installs the arrow R package, including libarrow, on the <a href="https://hub.docker.com/r/rhub/ubuntu-release" class="external-link">rhub/ubuntu-release</a>
image.</p>
</div>
<div class="section level2">
<h2 id="further-reading">Further reading<a class="anchor" aria-label="anchor" href="#further-reading"></a>
</h2>
<ul>
<li>To learn about installing development versions, see the article on
<a href="./install_nightly.html">installing nightly builds</a>.</li>
<li>If you’re contributing to the Arrow project, see the <a href="./developing.html">Arrow R developers guide</a> for resources to
help you set up your development environment.</li>
<li>Arrow developers may also wish to read a more detailed discussion of
the code run during the installation process, described in the <a href="./developers/install_details.html">install details
article</a>.</li>
</ul>
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p><a href="https://arrow.apache.org/docs/r/versions.html">Older versions of these docs</a></p>
</div>
<div class="pkgdown-footer-right">
<p>Site built with <a href="https://pkgdown.r-lib.org/" class="external-link">pkgdown</a> 2.1.3.</p>
</div>
</footer>
</div>
</body>
</html>