| <!DOCTYPE html> |
| <!-- Generated by pkgdown: do not edit by hand --><html lang="en"> |
| <head> |
| <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>Apache Arrow in Python and R with reticulate • Arrow R Package</title> |
| <!-- jquery --><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.4.1/jquery.min.js" integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script><!-- Bootstrap --><link href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/3.4.0/cosmo/bootstrap.min.css" rel="stylesheet" crossorigin="anonymous"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.4.1/js/bootstrap.min.js" integrity="sha256-nuL8/2cJ5NDSSwnKD8VqreErSWHtnEP9E7AySL+1ev4=" crossorigin="anonymous"></script><!-- bootstrap-toc --><link rel="stylesheet" href="../bootstrap-toc.css"> |
| <script src="../bootstrap-toc.js"></script><!-- Font Awesome icons --><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/all.min.css" integrity="sha256-mmgLkCYLUQbXn0B1SRqzHar6dCnv9oZFPEC1g1cwlkk=" crossorigin="anonymous"> |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.12.1/css/v4-shims.min.css" integrity="sha256-wZjR52fzng1pJHwx4aV2AO3yyTOXrcDW7jBpJtTwVxw=" crossorigin="anonymous"> |
| <!-- clipboard.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/2.0.6/clipboard.min.js" integrity="sha256-inc5kl9MA1hkeYUt+EC3BhlIgyp/2jDIyBLS6k3UxPI=" crossorigin="anonymous"></script><!-- headroom.js --><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/headroom.min.js" integrity="sha256-AsUX4SJE1+yuDu5+mAVzJbuYNPHj/WroHuZ8Ir/CkE0=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/headroom/0.11.0/jQuery.headroom.min.js" integrity="sha256-ZX/yNShbjqsohH1k95liqY9Gd8uOiE1S4vZc+9KQ1K4=" crossorigin="anonymous"></script><!-- pkgdown --><link href="../pkgdown.css" rel="stylesheet"> |
| <script src="../pkgdown.js"></script><script src="../extra.js"></script><meta property="og:title" content="Apache Arrow in Python and R with reticulate"> |
| <meta property="og:description" content="arrow"> |
| <!-- mathjax --><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js" integrity="sha256-nvJJv9wWKEm88qvoQl9ekL2J+k/RWIsaSScxxlsrv8k=" crossorigin="anonymous"></script><script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/config/TeX-AMS-MML_HTMLorMML.js" integrity="sha256-84DKXVJXs0/F8OTMzX4UR909+jtl4G7SPypPavF+GfA=" crossorigin="anonymous"></script><!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]-->
|
|
|
| <!-- Matomo -->
|
| <script>
|
| var _paq = window._paq = window._paq || [];
|
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
|
| _paq.push(["setDoNotTrack", true]);
|
| _paq.push(["disableCookies"]);
|
| _paq.push(['trackPageView']);
|
| _paq.push(['enableLinkTracking']);
|
| (function() {
|
| var u="https://analytics.apache.org/";
|
| _paq.push(['setTrackerUrl', u+'matomo.php']);
|
| _paq.push(['setSiteId', '20']);
|
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
|
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
|
| })();
|
| </script>
|
| <!-- End Matomo Code -->
|
| |
| </head> |
| <body data-spy="scroll" data-target="#toc"> |
| <div class="container template-article"> |
| <header><div class="navbar navbar-default navbar-fixed-top" role="navigation"> |
| <div class="container"> |
| <div class="navbar-header"> |
| <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false"> |
| <span class="sr-only">Toggle navigation</span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| <span class="navbar-brand"> |
| <a class="navbar-link" href="../index.html">Arrow R Package</a> |
| <span class="version label label-default" data-toggle="tooltip" data-placement="bottom" title="Released version">5.0.0</span> |
| </span> |
| </div> |
| |
| <div id="navbar" class="navbar-collapse collapse"> |
| <ul class="nav navbar-nav"> |
| <li> |
| <a href="https://arrow.apache.org/">❯❯❯</a> |
| </li> |
| <li> |
| <a href="../articles/arrow.html">Get started</a> |
| </li> |
| <li> |
| <a href="../reference/index.html">Reference</a> |
| </li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false"> |
| Articles |
| |
| <span class="caret"></span> |
| </a> |
| <ul class="dropdown-menu" role="menu"> |
| <li> |
| <a href="../articles/install.html">Installing the Arrow Package on Linux</a> |
| </li> |
| <li> |
| <a href="../articles/dataset.html">Working with Arrow Datasets and dplyr</a> |
| </li> |
| <li> |
| <a href="../articles/fs.html">Working with Cloud Storage (S3)</a> |
| </li> |
| <li> |
| <a href="../articles/python.html">Apache Arrow in Python and R with reticulate</a> |
| </li> |
| <li> |
| <a href="../articles/flight.html">Connecting to Flight RPC Servers</a> |
| </li> |
| <li> |
| <a href="../articles/developing.html">Arrow R Developer Guide</a> |
| </li> |
| </ul> |
| </li> |
| <li> |
| <a href="../news/index.html">Changelog</a> |
| </li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false"> |
| Project docs |
| |
| <span class="caret"></span> |
| </a> |
| <ul class="dropdown-menu" role="menu"> |
| <li> |
| <a href="https://arrow.apache.org/docs/format/README.html">Specification</a> |
| </li> |
| <li> |
| <a href="https://arrow.apache.org/docs/c_glib">C GLib</a> |
| </li> |
| <li> |
| <a href="https://arrow.apache.org/docs/cpp">C++</a> |
| </li> |
| <li> |
| <a href="https://arrow.apache.org/docs/java">Java</a> |
| </li> |
| <li> |
| <a href="https://arrow.apache.org/docs/js">JavaScript</a> |
| </li> |
| <li> |
| <a href="https://arrow.apache.org/docs/python">Python</a> |
| </li> |
| <li> |
| <a href="../index.html">R</a> |
| </li> |
| </ul> |
| </li> |
| </ul> |
| <ul class="nav navbar-nav navbar-right"></ul> |
| </div> |
| <!--/.nav-collapse --> |
| </div> |
| <!--/.container --> |
| </div> |
| <!--/.navbar --> |
| |
| |
| |
| </header><div class="row"> |
| <div class="col-md-9 contents"> |
| <div class="page-header toc-ignore"> |
| <h1 data-toc-skip>Apache Arrow in Python and R with reticulate</h1> |
| |
| |
| <small class="dont-index">Source: <a href="https://github.com/apache/arrow/blob/master/r/vignettes/python.Rmd"><code>vignettes/python.Rmd</code></a></small> |
| <div class="hidden name"><code>python.Rmd</code></div> |
| |
| </div> |
| |
| |
| |
| <p>The <code>arrow</code> package provides <code>reticulate</code> methods for passing data between R and Python in the same process. This document provides a brief overview.</p> |
| <div id="installing" class="section level2"> |
| <h2 class="hasAnchor"> |
| <a href="#installing" class="anchor"></a>Installing</h2> |
| <p>To use <code>arrow</code> in Python, you need at least the <code>pyarrow</code> library. To install it in a virtualenv:</p> |
| <div class="sourceCode" id="cb1"><pre class="downlit sourceCode r"> |
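| <code class="sourceCode R"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://github.com/apache/arrow/">arrow</a></span><span class="op">)</span> |
| <span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://github.com/rstudio/reticulate">reticulate</a></span><span class="op">)</span> |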
| <span class="fu"><a href="https://rdrr.io/pkg/reticulate/man/virtualenv-tools.html">virtualenv_create</a></span><span class="op">(</span><span class="st">"arrow-env"</span><span class="op">)</span> |
| <span class="fu"><a href="../reference/install_pyarrow.html">install_pyarrow</a></span><span class="op">(</span><span class="st">"arrow-env"</span><span class="op">)</span></code></pre></div> |
| <p>If you want to install a development version of <code>pyarrow</code>, add <code>nightly = TRUE</code>:</p> |
| <div class="sourceCode" id="cb2"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="fu"><a href="../reference/install_pyarrow.html">install_pyarrow</a></span><span class="op">(</span><span class="st">"arrow-env"</span>, nightly <span class="op">=</span> <span class="cn">TRUE</span><span class="op">)</span></code></pre></div> |
| <p><code><a href="../reference/install_pyarrow.html">install_pyarrow()</a></code> also works with <code>conda</code> environments (<code><a href="https://rdrr.io/pkg/reticulate/man/conda-tools.html">conda_create()</a></code> instead of <code><a href="https://rdrr.io/pkg/reticulate/man/virtualenv-tools.html">virtualenv_create()</a></code>).</p> |
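| <p>As a minimal sketch of the conda route, assuming a working conda installation that <code>reticulate</code> can find, and reusing the environment name <code>arrow-env</code> purely for illustration, the steps might look like this:</p> |
| <div class="sourceCode"><pre class="sourceCode r"> |
| <code class="sourceCode R">library(arrow) |
| library(reticulate) |
| |
| # Create a conda environment instead of a virtualenv |
| conda_create("arrow-env") |
| # Install pyarrow into that environment |
| install_pyarrow("arrow-env") |
| # Bind this R session to the conda environment before importing pyarrow |
| use_condaenv("arrow-env", required = TRUE) |
| pa &lt;- import("pyarrow")</code></pre></div> |
| <p>Call <code>use_condaenv()</code> before anything else initializes Python in the session; otherwise <code>reticulate</code> may already be bound to a different interpreter.</p> |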
| <p>For more on installing and configuring Python, see the <a href="https://rstudio.github.io/reticulate/articles/python_packages.html">reticulate docs</a>.</p> |
| </div> |
| <div id="using" class="section level2"> |
| <h2 class="hasAnchor"> |
| <a href="#using" class="anchor"></a>Using</h2> |
| <p>To start, load <code>arrow</code> and <code>reticulate</code>, and then import <code>pyarrow</code>.</p> |
| <div class="sourceCode" id="cb3"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://github.com/apache/arrow/">arrow</a></span><span class="op">)</span> |
| <span class="kw"><a href="https://rdrr.io/r/base/library.html">library</a></span><span class="op">(</span><span class="va"><a href="https://github.com/rstudio/reticulate">reticulate</a></span><span class="op">)</span> |
| <span class="fu"><a href="https://rdrr.io/pkg/reticulate/man/use_python.html">use_virtualenv</a></span><span class="op">(</span><span class="st">"arrow-env"</span><span class="op">)</span> |
| <span class="va">pa</span> <span class="op">&lt;-</span> <span class="fu"><a href="https://rdrr.io/pkg/reticulate/man/import.html">import</a></span><span class="op">(</span><span class="st">"pyarrow"</span><span class="op">)</span></code></pre></div> |
| <p>The package includes support for sharing Arrow <code>Array</code> and <code>RecordBatch</code> objects in-process between R and Python. For example, let’s create an <code>Array</code> in <code>pyarrow</code>.</p> |
| <div class="sourceCode" id="cb4"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="va">a</span> <span class="op">&lt;-</span> <span class="va">pa</span><span class="op">$</span><span class="fu">array</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="fl">1</span>, <span class="fl">2</span>, <span class="fl">3</span><span class="op">)</span><span class="op">)</span> |
| <span class="va">a</span> |
| |
| <span class="co">## Array</span> |
| <span class="co">## &lt;double&gt;</span> |
| <span class="co">## [</span> |
| <span class="co">## 1,</span> |
| <span class="co">## 2,</span> |
| <span class="co">## 3</span> |
| <span class="co">## ]</span></code></pre></div> |
| <p><code>a</code> is now an <code>Array</code> object in our R session, even though we created it in Python. We can apply R methods to it:</p> |
| <div class="sourceCode" id="cb5"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="va">a</span><span class="op">[</span><span class="va">a</span> <span class="op">&gt;</span> <span class="fl">1</span><span class="op">]</span> |
| |
| <span class="co">## Array</span> |
| <span class="co">## &lt;double&gt;</span> |
| <span class="co">## [</span> |
| <span class="co">## 2,</span> |
| <span class="co">## 3</span> |
| <span class="co">## ]</span></code></pre></div> |
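| <p>The same sharing applies to <code>RecordBatch</code> objects. The following is a sketch only, assuming the <code>arrow-env</code> virtualenv from above with <code>pyarrow</code> 0.17 or greater installed; the variable names <code>rb</code>, <code>py_rb</code>, and <code>rb_back</code> are illustrative. It converts explicitly with <code>reticulate</code>'s <code>r_to_py()</code> and <code>py_to_r()</code> generics:</p> |
| <div class="sourceCode"><pre class="sourceCode r"> |
| <code class="sourceCode R">library(arrow) |
| library(reticulate) |
| use_virtualenv("arrow-env") |
| |
| # Build a RecordBatch on the R side |
| rb &lt;- record_batch(x = c(1, 2, 3), y = c("a", "b", "c")) |
| |
| # Hand it to Python; py_rb is a reference to a pyarrow RecordBatch |
| py_rb &lt;- r_to_py(rb) |
| # Call a Python attribute and convert the result back to an R value |
| py_to_r(py_rb$num_rows) |
| |
| # Bring the batch back as an arrow RecordBatch in R |
| rb_back &lt;- py_to_r(py_rb) |
| rb_back$num_rows</code></pre></div> |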
| <p>We can send data both ways. One reason to use <code>pyarrow</code> from R is to reach functionality that is better supported in Python than in R. For example, <code>pyarrow</code> has a <code>concat_arrays</code> function that, as of 0.17, is not implemented in the <code>arrow</code> R package. With <code>reticulate</code>, we can call it from R efficiently.</p> |
| <div class="sourceCode" id="cb6"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="va">b</span> <span class="op">&lt;-</span> <span class="va">Array</span><span class="op">$</span><span class="fu">create</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/c.html">c</a></span><span class="op">(</span><span class="fl">5</span>, <span class="fl">6</span>, <span class="fl">7</span>, <span class="fl">8</span>, <span class="fl">9</span><span class="op">)</span><span class="op">)</span> |
| <span class="va">a_and_b</span> <span class="op">&lt;-</span> <span class="va">pa</span><span class="op">$</span><span class="fu">concat_arrays</span><span class="op">(</span><span class="fu"><a href="https://rdrr.io/r/base/list.html">list</a></span><span class="op">(</span><span class="va">a</span>, <span class="va">b</span><span class="op">)</span><span class="op">)</span> |
| <span class="va">a_and_b</span> |
| |
| <span class="co">## Array</span> |
| <span class="co">## &lt;double&gt;</span> |
| <span class="co">## [</span> |
| <span class="co">## 1,</span> |
| <span class="co">## 2,</span> |
| <span class="co">## 3,</span> |
| <span class="co">## 5,</span> |
| <span class="co">## 6,</span> |
| <span class="co">## 7,</span> |
| <span class="co">## 8,</span> |
| <span class="co">## 9</span> |
| <span class="co">## ]</span></code></pre></div> |
| <p>Now we have a single <code>Array</code> in R.</p> |
| <p>“Send”, however, isn’t the correct word. Internally, we’re passing pointers to the data between the R and Python interpreters running together in the same process, without copying anything. Nothing is being sent: we’re sharing and accessing the same internal Arrow memory buffers.</p> |
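| <p>To see the two sides of that handoff separately, one option is to import <code>pyarrow</code> with <code>reticulate</code>'s <code>convert = FALSE</code> argument, which leaves results as Python object references until <code>py_to_r()</code> is called. This is only a sketch, assuming the <code>arrow-env</code> virtualenv from above; <code>pa_raw</code>, <code>py_arr</code>, and <code>r_arr</code> are illustrative names, and the code shows where conversion happens rather than proving that no copy is made:</p> |
| <div class="sourceCode"><pre class="sourceCode r"> |
| <code class="sourceCode R">library(arrow) |
| library(reticulate) |
| use_virtualenv("arrow-env") |
| |
| # Import pyarrow without automatic conversion of return values |
| pa_raw &lt;- import("pyarrow", convert = FALSE) |
| |
| # The result stays a reference to a Python object |
| py_arr &lt;- pa_raw$array(c(1, 2, 3)) |
| class(py_arr) |
| |
| # Explicitly wrap the same Arrow data as an R Array |
| r_arr &lt;- py_to_r(py_arr) |
| class(r_arr)</code></pre></div> |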
| </div> |
| <div id="troubleshooting" class="section level2"> |
| <h2 class="hasAnchor"> |
| <a href="#troubleshooting" class="anchor"></a>Troubleshooting</h2> |
| <p>If you get an error like</p> |
| <pre><code>Error in py_get_attr_impl(x, name, silent) : |
| AttributeError: 'pyarrow.lib.DoubleArray' object has no attribute '_export_to_c'</code></pre> |
| <p>it means that the version of <code>pyarrow</code> you’re using is too old. Support for passing data to and from R is included in versions 0.17 and greater. Check your pyarrow version like this:</p> |
| <div class="sourceCode" id="cb8"><pre class="downlit sourceCode r"> |
| <code class="sourceCode R"><span class="va">pa</span><span class="op">$</span><span class="va">`__version__`</span> |
| |
| <span class="co">## [1] "0.16.0"</span></code></pre></div> |
| <p>Note that your <code>pyarrow</code> and <code>arrow</code> versions do not need to match each other: each just needs to be 0.17 or greater.</p> |
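| <p>The version requirement can also be checked programmatically. The helper below is hypothetical (<code>pyarrow_supports_r</code> is not part of <code>arrow</code> or <code>reticulate</code>) and is a sketch that assumes the version string starts with a numeric <code>major.minor</code> prefix:</p> |
| <div class="sourceCode"><pre class="sourceCode r"> |
| <code class="sourceCode R"># Hypothetical helper: TRUE when a pyarrow version string is 0.17 or newer |
| pyarrow_supports_r &lt;- function(version_string) { |
|   # Keep only the leading "major.minor" so suffixes such as ".dev123" are ignored |
|   major_minor &lt;- sub("^([0-9]+\\.[0-9]+).*$", "\\1", version_string) |
|   package_version(major_minor) &gt;= "0.17" |
| } |
| |
| pyarrow_supports_r("0.16.0")  # FALSE |
| pyarrow_supports_r("0.17.1")  # TRUE</code></pre></div> |
| <p>In a session where <code>pa</code> has been imported as shown earlier, calling <code>pyarrow_supports_r(pa$`__version__`)</code> would report whether the installed <code>pyarrow</code> is new enough.</p> |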
| </div> |
| </div> |
| |
| <div class="col-md-3 hidden-xs hidden-sm" id="pkgdown-sidebar"> |
| |
| <nav id="toc" data-toggle="toc"><h2 data-toc-skip>Contents</h2> |
| </nav> |
| </div> |
| |
| </div> |
| |
| |
| |
| <footer><div class="copyright"> |
| <p>Developed by Neal Richardson, Ian Cook, Nic Crane, Jonathan Keane, Romain François, Jeroen Ooms, Apache Arrow.</p> |
| </div> |
| |
| <div class="pkgdown"> |
| <p>Site built with <a href="https://pkgdown.r-lib.org/">pkgdown</a> 1.6.1.</p> |
| </div> |
| |
| </footer> |
| </div> |
| |
| |
| |
| |
| <script type="text/javascript" src="/docs/_static/versionwarning.js"></script> </body> |
| </html> |