Powering In-Memory Analytics
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
What's in the Arrow libraries?
The reference Arrow libraries contain a number of distinct software components:
- Columnar vector and table-like containers (similar to data frames) supporting flat or nested types
- Fast, language-agnostic metadata messaging layer (using Google's FlatBuffers library)
- Reference-counted off-heap buffer memory management, for zero-copy memory sharing and handling memory-mapped files
- Low-overhead IO interfaces to files on disk and HDFS (C++ only)
- Self-describing binary wire formats (streaming and batch/file-like) for remote procedure calls (RPC) and interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures
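To make the first and third items above more concrete, here is a toy sketch in pure Python of a columnar layout with a validity bitmap, plus a zero-copy slice over one column. It deliberately does not use the Arrow libraries. The record fields (`id`, `score`) and the helpers `to_columns` and `is_valid` are hypothetical names chosen for illustration, and the real Arrow containers are considerably richer (nested types, off-heap buffers, reference counting).

```python
from array import array

# Row-oriented records, as an application might hold them (illustrative data).
rows = [
    {"id": 1, "score": 9.5},
    {"id": 2, "score": None},  # missing value
    {"id": 3, "score": 7.25},
]


def to_columns(records):
    """Pivot row records into one contiguous buffer per column.

    Missing scores are tracked in a separate validity bitmap (one bit per
    row, least-significant bit first) instead of a sentinel value.
    """
    ids = array("q", (r["id"] for r in records))  # 64-bit signed integers
    scores = array(
        "d",  # 64-bit floats; the slot of a missing value holds a placeholder
        (r["score"] if r["score"] is not None else 0.0 for r in records),
    )
    validity = bytearray((len(records) + 7) // 8)
    for i, r in enumerate(records):
        if r["score"] is not None:
            validity[i // 8] |= 1 << (i % 8)
    return ids, scores, validity


def is_valid(validity, i):
    """Return True if row i has a non-missing value according to the bitmap."""
    return bool((validity[i // 8] >> (i % 8)) & 1)


ids, scores, validity = to_columns(rows)

# A memoryview slice shares the underlying buffer rather than copying it,
# so a write through the original column is visible through the slice.
tail = memoryview(scores)[1:]
scores[2] = 8.0
print(list(ids), [is_valid(validity, i) for i in range(len(rows))], tail[1])
```

Keeping each column in its own contiguous buffer is what lets a consumer scan a single field without touching the others, and sharing a view instead of copying is the same idea, in miniature, as the zero-copy memory sharing listed above.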
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved.
How to Contribute
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/arrow repository.
If you are looking for some ideas on what to contribute, check out the JIRA issues for the Apache Arrow project. Comment on the issue and/or contact email@example.com with your questions and ideas.
If you’d like to report a bug but don’t have time to fix it, you can still post it on JIRA or email the mailing list at firstname.lastname@example.org.
To contribute a patch:
- Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
- Create a JIRA issue for your patch in the Arrow project JIRA.
- Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (for an example, see https://github.com/apache/arrow/pull/240).
- Make sure that your code passes the unit tests. You can find instructions on how to run the unit tests for each Arrow component in its respective README file.
- Add new unit tests for your code.
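As a small aid for the naming step above, the following sketch checks that a pull request title starts with a JIRA key before you submit it. The exact pattern `ARROW-<number>: <summary>` is an assumption made for this example, and `jira_key` is a hypothetical helper, not part of any Arrow tooling.

```python
import re

# Assumed title convention: "ARROW-<number>: <summary>".
JIRA_PREFIX = re.compile(r"^(ARROW-\d+): \S")


def jira_key(pr_title):
    """Return the JIRA key that prefixes a pull request title, or None."""
    match = JIRA_PREFIX.match(pr_title)
    return match.group(1) if match else None


# Illustrative titles only; neither refers to a real issue.
print(jira_key("ARROW-123: Fix typo in README"))
print(jira_key("Fix typo in README"))
```

A title without the prefix returns `None`, which makes the helper easy to use as a quick local check before opening the pull request.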
Thank you in advance for your contributions!