The Apache Arrow team is pleased to announce the 0.4.0 release of the project. Although only 17 days have passed since the previous release, it includes 77 resolved JIRAs with some important new features and bug fixes.
See the Install Page to learn how to get the libraries for your platform.
The TypeScript Arrow implementation has undergone some work since 0.3.0 and can now read a substantial portion of the Arrow streaming binary format. As this implementation develops, we will eventually want to include JS in the integration test suite along with Java and C++ to ensure wire cross-compatibility.
With the 1.1.0 C++ release of Apache Parquet, we have enabled the `pyarrow.parquet` extension on Windows for Python 3.5 and 3.6. This should appear in conda-forge packages and PyPI in the near future. Developers can follow the source build instructions.
In the 0.2.0 release, we defined the first version of the Arrow streaming binary format for low-cost messaging with columnar data. These streams presume that the message components are written as a continuous byte stream over a socket or file.
We would like to support other transport protocols, such as gRPC, for the message components of Arrow streams. To that end, in C++ we defined an abstract stream reader interface, for which the current contiguous streaming format is one implementation:
{% highlight cpp %}
class RecordBatchReader {
 public:
  virtual std::shared_ptr<Schema> schema() const = 0;
  virtual Status GetNextRecordBatch(std::shared_ptr<RecordBatch>* batch) = 0;
};
{% endhighlight %}
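To illustrate why an abstract reader is useful, here is a minimal, self-contained sketch of a second implementation behind the same interface. It is not Arrow code: `Status`, `Schema`, and `RecordBatch` are simplified stand-ins so the example compiles without the Arrow headers, and `InMemoryReader` and `CountRows` are hypothetical names. The sketch also assumes that the end of the stream is signaled by setting the output batch to null.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Arrow types, defined here only so the
// sketch compiles without the Arrow headers.
struct Status {
  bool ok_flag;
  bool ok() const { return ok_flag; }
  static Status OK() { return Status{true}; }
};

struct Schema {
  std::vector<std::string> field_names;
};

struct RecordBatch {
  std::shared_ptr<Schema> schema;
  std::size_t num_rows;
};

// Same shape as the interface shown above, plus a virtual destructor.
class RecordBatchReader {
 public:
  virtual ~RecordBatchReader() = default;
  virtual std::shared_ptr<Schema> schema() const = 0;
  virtual Status GetNextRecordBatch(std::shared_ptr<RecordBatch>* batch) = 0;
};

// Hypothetical implementation that serves batches already held in memory.
// A reader backed by another transport would implement the same two methods.
class InMemoryReader : public RecordBatchReader {
 public:
  InMemoryReader(std::shared_ptr<Schema> schema,
                 std::vector<std::shared_ptr<RecordBatch>> batches)
      : schema_(std::move(schema)), batches_(std::move(batches)) {}

  std::shared_ptr<Schema> schema() const override { return schema_; }

  // Assumption of this sketch: a null batch marks the end of the stream.
  Status GetNextRecordBatch(std::shared_ptr<RecordBatch>* batch) override {
    if (position_ < batches_.size()) {
      *batch = batches_[position_++];
    } else {
      batch->reset();
    }
    return Status::OK();
  }

 private:
  std::shared_ptr<Schema> schema_;
  std::vector<std::shared_ptr<RecordBatch>> batches_;
  std::size_t position_ = 0;
};

// Consumer code depends only on the abstract interface, not on the transport.
std::size_t CountRows(RecordBatchReader* reader) {
  std::size_t total = 0;
  std::shared_ptr<RecordBatch> batch;
  while (reader->GetNextRecordBatch(&batch).ok() && batch != nullptr) {
    total += batch->num_rows;
  }
  return total;
}
```

Because `CountRows` only sees a `RecordBatchReader*`, the same consumer would work unchanged with a socket-backed, file-backed, or RPC-backed reader.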
It would also be good to define abstract stream reader and writer interfaces in the Java implementation.
In an upcoming blog post, we will explain in more depth how Arrow streams work, but you can learn more about them by reading the IPC specification.
As other Python libraries with C or C++ extensions use Apache Arrow, they will need to be able to return Python objects wrapping the underlying C++ objects. In this release, we have implemented a prototype C++ API which enables Python wrapper objects to be constructed from C++ extension code:
{% highlight cpp %}
#include "arrow/python/pyarrow.h"

if (!arrow::py::import_pyarrow()) {
  // Error
}

std::shared_ptr<arrow::RecordBatch> cpp_batch = GetData(...);
PyObject* py_batch = arrow::py::wrap_batch(cpp_batch);
{% endhighlight %}
This API is intended to be usable from Cython code as well:
{% highlight cython %}
cimport pyarrow
pyarrow.import_pyarrow()
{% endhighlight %}
With this release, `pip install pyarrow` works on macOS (OS X) as well as Linux. We are also working on providing binary wheel installers for Windows.