<!DOCTYPE html>
<!-- Generated by pkgdown: do not edit by hand --><html lang="en-US">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<title>Introduction for developers • Arrow R Package</title>
<!-- favicons --><link rel="icon" type="image/png" sizes="96x96" href="../favicon-96x96.png">
<link rel="icon" type="image/svg+xml" href="../favicon.svg">
<link rel="apple-touch-icon" sizes="180x180" href="../apple-touch-icon.png">
<link rel="icon" sizes="any" href="../favicon.ico">
<link rel="manifest" href="../site.webmanifest">
<script src="../deps/jquery-3.6.0/jquery-3.6.0.min.js"></script>
<link href="../deps/bootstrap-5.3.1/bootstrap.min.css" rel="stylesheet">
<script src="../deps/bootstrap-5.3.1/bootstrap.bundle.min.js"></script><link href="../deps/font-awesome-6.5.2/css/all.min.css" rel="stylesheet">
<link href="../deps/font-awesome-6.5.2/css/v4-shims.min.css" rel="stylesheet">
<script src="../deps/headroom-0.11.0/headroom.min.js"></script><script src="../deps/headroom-0.11.0/jQuery.headroom.min.js"></script><script src="../deps/bootstrap-toc-1.0.1/bootstrap-toc.min.js"></script><script src="../deps/clipboard.js-2.0.11/clipboard.min.js"></script><script src="../deps/search-1.0.0/autocomplete.jquery.min.js"></script><script src="../deps/search-1.0.0/fuse.min.js"></script><script src="../deps/search-1.0.0/mark.min.js"></script><!-- pkgdown --><script src="../pkgdown.js"></script><link href="../extra.css" rel="stylesheet">
<meta property="og:title" content="Introduction for developers">
<meta name="description" content="Learn how to contribute to the arrow package
">
<meta property="og:description" content="Learn how to contribute to the arrow package
">
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png">
<meta property="og:image:alt" content="Apache Arrow logo, displaying the triple chevron image adjacent to the text">
<!-- Matomo --><script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script><!-- End Matomo Code --><!-- Kapa AI --><script async src="https://widget.kapa.ai/kapa-widget.bundle.js" data-website-id="9db461d5-ac77-4b3f-a5c5-75efa78339d2" data-project-name="Apache Arrow" data-project-color="#000000" data-project-logo="https://arrow.apache.org/img/arrow-logo_chevrons_white-txt_black-bg.png" data-modal-disclaimer="This is a custom LLM with access to all of [Arrow documentation](https://arrow.apache.org/docs/). If you want an R-specific answer, please mention this in your question." data-consent-required="true" data-user-analytics-cookie-enabled="false" data-consent-screen-disclaimer="By clicking &quot;I agree, let's chat&quot;, you consent to the use of the AI assistant in accordance with kapa.ai's [Privacy Policy](https://www.kapa.ai/content/privacy-policy). This service uses reCAPTCHA, which requires your consent to Google's [Privacy Policy](https://policies.google.com/privacy) and [Terms of Service](https://policies.google.com/terms). By proceeding, you explicitly agree to both kapa.ai's and Google's privacy policies."></script><!-- End Kapa AI -->
</head>
<body>
<a href="#main" class="visually-hidden-focusable">Skip to contents</a>
<nav class="navbar fixed-top navbar-dark navbar-expand-lg bg-black"><div class="container">
<a class="navbar-brand me-2" href="../index.html">Arrow R Package</a>
<span class="version">
<small class="nav-text text-muted me-auto" data-bs-toggle="tooltip" data-bs-placement="bottom" title="">21.0.0.9000</small>
</span>
<button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbar" aria-controls="navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div id="navbar" class="collapse navbar-collapse ms-3">
<ul class="navbar-nav me-auto">
<li class="nav-item"><a class="nav-link" href="../articles/arrow.html">Get started</a></li>
<li class="nav-item"><a class="nav-link" href="../reference/index.html">Reference</a></li>
<li class="active nav-item dropdown">
<button class="nav-link dropdown-toggle" type="button" id="dropdown-articles" data-bs-toggle="dropdown" aria-expanded="false" aria-haspopup="true">Articles</button>
<ul class="dropdown-menu" aria-labelledby="dropdown-articles">
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Using the package</h6></li>
<li><a class="dropdown-item" href="../articles/read_write.html">Reading and writing data files</a></li>
<li><a class="dropdown-item" href="../articles/data_wrangling.html">Data analysis with dplyr syntax</a></li>
<li><a class="dropdown-item" href="../articles/dataset.html">Working with multi-file data sets</a></li>
<li><a class="dropdown-item" href="../articles/python.html">Integrating Arrow, Python, and R</a></li>
<li><a class="dropdown-item" href="../articles/fs.html">Using cloud storage (S3, GCS)</a></li>
<li><a class="dropdown-item" href="../articles/flight.html">Connecting to a Flight server</a></li>
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Arrow concepts</h6></li>
<li><a class="dropdown-item" href="../articles/data_objects.html">Data objects</a></li>
<li><a class="dropdown-item" href="../articles/data_types.html">Data types</a></li>
<li><a class="dropdown-item" href="../articles/metadata.html">Metadata</a></li>
<li><hr class="dropdown-divider"></li>
<li><h6 class="dropdown-header" data-toc-skip>Installation</h6></li>
<li><a class="dropdown-item" href="../articles/install.html">Installing on Linux</a></li>
<li><a class="dropdown-item" href="../articles/install_nightly.html">Installing development versions</a></li>
<li><hr class="dropdown-divider"></li>
<li><a class="dropdown-item" href="../articles/index.html">More articles...</a></li>
</ul>
</li>
<li class="nav-item"><a class="nav-link" href="../news/index.html">Changelog</a></li>
</ul>
<form class="form-inline my-2 my-lg-0" role="search">
<input type="search" class="form-control me-sm-2" aria-label="Toggle navigation" name="search-input" data-search-index="../search.json" id="search-input" placeholder="" autocomplete="off">
</form>
<ul class="navbar-nav">
<li class="nav-item"><a class="external-link nav-link" href="https://github.com/apache/arrow/" aria-label="GitHub"><span class="fa fab fa-github fa-lg"></span></a></li>
</ul>
</div>
</div>
</nav><div class="container template-article">
<div class="row">
<main id="main" class="col-md-9"><div class="page-header">
<h1>Introduction for developers</h1>
<small class="dont-index">Source: <a href="https://github.com/apache/arrow/blob/main/r/vignettes/developing.Rmd" class="external-link"><code>vignettes/developing.Rmd</code></a></small>
<div class="d-none name"><code>developing.Rmd</code></div>
</div>
<p>If you’re interested in contributing to arrow, this article explains
our approach at a high level. At the end of the article, we have
included links to articles that expand on this in various ways.</p>
<div class="section level2">
<h2 id="package-structure-and-conventions">Package structure and conventions<a class="anchor" aria-label="anchor" href="#package-structure-and-conventions"></a>
</h2>
<p>It helps to first outline the structure of the package.</p>
<p>C++ is an object-oriented language, so the core logic of the Arrow
C++ library is encapsulated in classes and methods. In the arrow R
package, these classes are implemented as <a href="https://r6.r-lib.org" class="external-link">R6</a> classes, most of which are exported
from the namespace.</p>
<p>In order to match the C++ naming conventions, the R6 classes are
named in “TitleCase”, e.g. <code>RecordBatch</code>. This makes it easy
to look up the relevant C++ implementations in the <a href="https://github.com/apache/arrow/tree/main/cpp" class="external-link">code</a> or <a href="https://arrow.apache.org/docs/cpp/" class="external-link">documentation</a>. To simplify
things in R, the C++ library namespaces are generally dropped or
flattened; that is, where the C++ library has
<code>arrow::io::FileOutputStream</code>, it is just
<code>FileOutputStream</code> in the R package. One exception is for the
file readers, where the namespace is necessary to disambiguate. So
<code>arrow::csv::TableReader</code> becomes
<code>CsvTableReader</code>, and <code>arrow::json::TableReader</code>
becomes <code>JsonTableReader</code>.</p>
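<p>As a small illustration of the naming convention, the sketch below
inspects the R6 class generators for the classes mentioned above. It
assumes the arrow package is installed and relies on the
<code>classname</code> field that R6 generators carry.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(arrow)

# arrow::io::FileOutputStream in C++ is exposed without its namespace
fos_name &lt;- FileOutputStream$classname

# The file readers keep a prefix so that the two TableReader classes
# (arrow::csv::TableReader and arrow::json::TableReader) do not collide
csv_name &lt;- CsvTableReader$classname
json_name &lt;- JsonTableReader$classname

# Each of these is an R6 class generator, not an instance
is_generator &lt;- inherits(FileOutputStream, "R6ClassGenerator")

print(c(fos_name, csv_name, json_name))</code></pre></div>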
<p>Some of these classes are not meant to be instantiated directly; they
may be base classes or other kinds of helpers. For those that you should
be able to create, use the <code>$create()</code> method to instantiate
an object. For example,
<code>rb &lt;- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))</code>
will create a <code>RecordBatch</code>. Many of the factory methods
that an R user is likely to encounter also have a “snake_case”
alias, which will be more familiar to contemporary R users. So
<code>record_batch(int = 1:10, dbl = as.numeric(1:10))</code> would do
the same as <code>RecordBatch$create()</code> above.</p>
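<p>The two forms can be compared side by side. This is a minimal
sketch that assumes the arrow package is installed; the variable names
are only for illustration.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(arrow)

# Create a RecordBatch through the R6 factory method
rb &lt;- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))

# The snake_case alias builds an equivalent object
rb2 &lt;- record_batch(int = 1:10, dbl = as.numeric(1:10))

# Both results are R6 objects of class RecordBatch
is_rb &lt;- inherits(rb, "RecordBatch") &amp;&amp; inherits(rb2, "RecordBatch")

# Compare the contents using the method on the R6 object
same &lt;- rb$Equals(rb2)

print(rb$num_rows)</code></pre></div>
<p>Whichever form you use, the result is the same kind of R6 object, so
methods such as <code>$Equals()</code> and fields such as
<code>$num_rows</code> are available on both.</p>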
<p>The typical user of the arrow R package may never deal directly with
the R6 objects. We provide more R-friendly wrapper functions as a
higher-level interface to the C++ library. An R user can call
<code><a href="../reference/read_parquet.html">read_parquet()</a></code> without knowing or caring that they’re
instantiating a <code>ParquetFileReader</code> object and calling the
<code>$ReadTable()</code> method on it. The classes are there and
available to the advanced programmer who wants fine-grained control over
how the C++ library is used.</p>
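<p>To make the two levels concrete, the following sketch reads the same
file through the wrapper function and through the R6 reader, reading a
Table from the reader with <code>$ReadTable()</code>. It assumes an
arrow build with Parquet support; the temporary file and the sample
data are made up for the example.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(arrow)

# Write a small Parquet file to read back (sample data for illustration)
tf &lt;- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3, y = c("a", "b", "c")), tf)

# High-level interface: a single call that returns a data frame
df &lt;- read_parquet(tf)

# Lower-level interface: create the R6 reader, then read an Arrow Table
reader &lt;- ParquetFileReader$create(tf)
tab &lt;- reader$ReadTable()

# The Table can be converted to a data frame when needed
df2 &lt;- as.data.frame(tab)

print(tab$num_rows)</code></pre></div>
<p>The lower-level route takes more code, but it gives access to the
reader object itself, for example to inspect the schema with
<code>reader$GetSchema()</code> before reading any data.</p>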
<!--
[Temporarily hiding this in a comment until I have a plan]
It is also worth mentioning that the arrow package also defines classes that do not exist in the C++ library including:
* `ArrowDatum`: inherited by `Scalar`, `Array`, and `ChunkedArray`
* `ArrowTabular`: inherited by `RecordBatch` and `Table`
* `ArrowObject`: inherited by all Arrow objects
-->
</div>
<div class="section level2">
<h2 id="approach-to-implementing-functionality">Approach to implementing functionality<a class="anchor" aria-label="anchor" href="#approach-to-implementing-functionality"></a>
</h2>
<p>Our general philosophy when implementing functionality is to match
existing R function signatures that may be familiar to users, whilst
exposing any additional functionality available via Arrow. The intention
is to let users reuse their existing code with minimal changes, and
with as little new code or as few new approaches to learn as possible.</p>
<p>There are a number of ways in which we do this:</p>
<ul>
<li><p>When implementing a function with an R equivalent, support the
arguments available in the R version as much as possible: use the original
parameter names and translate them to the arrow parameter names inside the
function</p></li>
<li><p>If there are arrow parameters which do not exist in the R
function, allow the user to pass those options through too</p></li>
<li><p>Where necessary, add extra arguments to the function signature for
a feature that doesn’t exist in R but does in Arrow (e.g., passing in a
schema when reading a CSV dataset)</p></li>
</ul>
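<p>The three points above can be sketched in plain R. Everything in
this example is hypothetical and is not the package’s actual
implementation: <code>arrow_style_read()</code> stands in for a
lower-level function with Arrow-style option names, and
<code>read_delim_sketch()</code> is an illustrative wrapper.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical lower-level function using Arrow-style option names.
# It only returns the options it received, so the translation is visible.
arrow_style_read &lt;- function(path, delimiter = ",", skip_rows = 0L,
                             block_size = NULL, schema = NULL) {
  list(
    path = path,
    delimiter = delimiter,
    skip_rows = skip_rows,
    block_size = block_size,
    schema = schema
  )
}

# Hypothetical wrapper with an R-style signature:
# - `file`, `delim` and `skip` follow familiar R argument names and are
#   translated to the Arrow-style names inside the function
# - `...` passes through Arrow-only options such as `block_size`
# - `schema` is an extra argument for a feature with no R equivalent
read_delim_sketch &lt;- function(file, delim = ",", skip = 0L,
                              schema = NULL, ...) {
  arrow_style_read(
    path = file,
    delimiter = delim,
    skip_rows = as.integer(skip),
    schema = schema,
    ...
  )
}

opts &lt;- read_delim_sketch("data.csv", delim = ";", skip = 1,
                          block_size = 1024L)
str(opts)</code></pre></div>
<p>Existing calls that use only the familiar R arguments keep working
unchanged, while the Arrow-specific options remain reachable for users
who need them.</p>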
</div>
<div class="section level2">
<h2 id="further-reading">Further Reading<a class="anchor" aria-label="anchor" href="#further-reading"></a>
</h2>
<ul>
<li><a href="https://arrow.apache.org/docs/developers/guide/index.html" class="external-link">In-depth
guide to contributing to Arrow, including step-by-step examples</a></li>
<li><a href="https://arrow.apache.org/docs/developers/guide/architectural_overview.html#r-package-architectural-overview" class="external-link">R
package architectural overview</a></li>
<li><a href="https://arrow.apache.org/docs/r/articles/developers/setup.html">Setting
up a development environment, and building the R package and
components</a></li>
<li><a href="https://arrow.apache.org/docs/r/articles/developers/workflow.html">Common
Arrow developer workflow tasks</a></li>
<li><a href="https://arrow.apache.org/docs/r/articles/developers/debugging.html">Running
R with the C++ debugger attached</a></li>
<li><a href="https://arrow.apache.org/docs/r/articles/developers/install_details.html">In-depth
guide to how the package installation works</a></li>
<li><a href="https://arrow.apache.org/docs/r/articles/developers/docker.html">Using
Docker to diagnose a bug or test a feature on a specific OS</a></li>
</ul>
</div>
</main><aside class="col-md-3"><nav id="toc" aria-label="Table of contents"><h2>On this page</h2>
</nav></aside>
</div>
<footer><div class="pkgdown-footer-left">
<p><a href="https://arrow.apache.org/docs/r/versions.html">Older versions of these docs</a></p>
</div>
<div class="pkgdown-footer-right">
<p>Site built with <a href="https://pkgdown.r-lib.org/" class="external-link">pkgdown</a> 2.1.3.</p>
</div>
</footer>
</div>
</body>
</html>