---
layout: default
---
<div class="jumbotron">
<h1>Apache Arrow</h1>
<p class="lead">A cross-language development platform for in-memory data</p>
<p>
<a class="btn btn-lg btn-success" style="white-space: normal;" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
<a class="btn btn-lg btn-primary" style="white-space: normal;" href="{{ site.baseurl }}/install/" role="button">Install ({{site.data.versions['current'].number}} Release - {{site.data.versions['current'].date}})</a>
</p>
</div>
<h5>
Interested in contributing?
<small class="text-muted">Join the <a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/"><strong>mailing list</strong></a> or check out the <a href="https://cwiki.apache.org/confluence/display/ARROW"><strong>developer wiki</strong></a>.</small>
</h5>
<h5>
<a href="{{ site.baseurl }}/blog/"><strong>See Latest News</strong></a>
</h5>
<p>
Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.
</p>
<hr />
<div class="row">
<div class="col-lg-4">
<h2 class="mt-3">Fast</h2>
<p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing. The columnar layout is optimized for data locality, which improves performance on modern hardware such as CPUs and GPUs.</p>
<p>The Arrow memory format supports <strong>zero-copy reads</strong> for lightning-fast data access without serialization overhead.</p>
</div>
<div class="col-lg-4">
<h2 class="mt-3">Flexible</h2>
<p>Arrow acts as a new high-performance interface between various systems. It also focuses on supporting a wide variety of industry-standard programming languages. C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust implementations are in progress, and more languages are welcome.
</p>
</div>
<div class="col-lg-4">
<h2 class="mt-3">Standard</h2>
<p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm, making it the de facto standard for columnar in-memory analytics.</p>
<p>Learn more about projects that are <a href="{{ site.baseurl }}/powered_by/">Powered By Apache Arrow</a></p>
</div>
</div>
<hr />
<div class="row">
<div class="col-md-6">
<h2>Performance Advantage of Columnar In-Memory</h2>
<p class="lead">
Columnar memory layout allows applications to avoid unnecessary I/O and
accelerate analytical processing performance on modern CPUs and GPUs.
</p>
</div>
<div class="col-md-6">
<img src="{{ site.baseurl }}/img/simd.png" alt="SIMD" class="img-fluid mx-auto" />
</div>
</div>
<hr />
<h2>Advantages of a Common Data Layer</h2>
<div class="row">
<div class="col-md-6">
<img src="{{ site.baseurl }}/img/copy.png" alt="common data layer" class="img-fluid mx-auto" />
<ul>
<li>Each system has its own internal memory format</li>
<li>70-80% of computation is wasted on serialization and deserialization</li>
<li>Similar functionality implemented in multiple projects</li>
</ul>
</div>
<div class="col-md-6">
<img src="{{ site.baseurl }}/img/shared.png" alt="common data layer" class="img-fluid mx-auto" />
<ul>
<li>All systems utilize the same memory format</li>
<li>No overhead for cross-system communication</li>
<li>Projects can share functionality (e.g., a Parquet-to-Arrow reader)</li>
</ul>
</div>
</div>