<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>Apache Arrow 0.16.0 Release | Apache Arrow</title>
<!-- Begin Jekyll SEO tag v2.8.0 -->
<meta name="generator" content="Jekyll v4.4.1" />
<meta property="og:title" content="Apache Arrow 0.16.0 Release" />
<meta name="author" content="pmc" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="The Apache Arrow team is pleased to announce the 0.16.0 release. This covers about 4 months of development work and includes 735 resolved issues from 99 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. New committers Since the 0.15.0 release, we&#39;ve added two new committers: Eric Erhardt Joris Van den Bossche Thank you for all your contributions! Columnar Format Notes We still have work to do to complete comprehensive columnar format integration testing between the Java and C++ libraries. Once this work is completed, we intend to make a 1.0.0 release with forward and backward compatibility guarantees. We clarified some ambiguity on dictionary encoding in the specification. Work is on going to implement the features in Arrow libraries. Arrow Flight RPC notes Flight development work has recently focused on robustness and stability. If you are not yet familiar with Flight, read the introductory blog post from October. We are also discussing adding a &quot;bidirectional RPC&quot; which enables request-response workflows requiring both client and server to send data streams to be performed a single RPC request. C++ notes Some work has been done to make the default build configuration of Arrow C++ as lean as possible. The Arrow C++ core can now be built without any external dependencies other than a new enough C++ compiler (gcc 4.9 or higher). Notably, Boost is no longer required. We invested effort to vendor some small essential dependencies: Flatbuffers, double-conversion, and uriparser. Many optional features requiring external libraries, like compression and GLog integration, are now disabled by default. Several subcomponents of the C++ project like the filesystem API, CSV, compute, dataset and JSON layers, as well as command-line utilities, are now disabled by default. The only toolchain dependency enabled by default is jemalloc, the recommended memory allocator, but this can also be disabled if desired. For illustration, see the example minimal build script and Dockerfile. When enabled, the default jemalloc configuration has been tweaked to return memory more aggressively to the OS (ARROW-6910, ARROW-6994). We welcome feedback from users about our memory allocation configuration and performance in applications. The array validation facilities have been vastly expanded and now exist in two flavors: the Validate method does a light-weight validation that&#39;s O(1) in array size, while the potentially O(N) method ValidateFull does thorough data validation (ARROW-6157). The IO APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7235). C++: CSV An option is added to attempt automatic dictionary encoding of string columns during reading a CSV file, until a cardinality limit is reached. When successful, it can make reading faster and the resulting Arrow data is much more memory-efficient (ARROW-3408). The CSV APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7236). C++: Datasets The 0.16 release introduces the Datasets API to the C++ library, along with bindings in Python and R. 
This API allows you to treat multiple files as a single logical dataset entity and make efficient selection queries against it. This release includes support for Parquet and Arrow IPC file formats. Factory objects allow you to discover files in a directory recursively, inspect the schemas in the files, and performs some basic schema unification. You may specify how file path segments map to partition, and there is support for auto-detecting some partition information, including Hive-style partitioning. The Datasets API includes a filter expression syntax as well as column selection. These are evaluated with predicate pushdown, and for Parquet, evaluation is pushed down to row groups. C++: Filesystem layer An HDFS implementation of the FileSystem class is available (ARROW-6720). We plan to deprecate the prior bespoke C++ HDFS class in favor of the standardized filesystem API. The filesystem APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7161). C++: IPC The Arrow IPC reader is being fuzzed continuously by the OSS-Fuzz infrastructure, to detect undesirable behavior on invalid or malicious input. Several issues have already been found and fixed. C++: Parquet Modular encryption is now supported (PARQUET-1300). A performance regression when reading a file with a large number of columns has been fixed (ARROW-6876, ARROW-7059), as well as several bugs (PARQUET-1766, ARROW-6895). C++: Tensors CSC sparse matrices are supported (ARROW-4225). The Tensor APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7420). C# Notes There were a number of C# bug fixes this release. Note that the C# library is not yet being tested in CI against the other native Arrow implementations (integration tests). We are looking for more contributors for the C# project to help with this and other new feature development. Java notes Added prose documentation describing how to work with the Java libraries. Some additional algorithms have been added to the &quot;contrib&quot; algorithms package: multithreaded searching of ValueVectors, The memory modules have been refactored so non-netty allocators can now be used. A new utility for populating ValueVectors more concisely for testing was introduced The &quot;contrib&quot; Avro adapter now supports Avro Logical type conversion to corresponding Arrow type. Various bug fixes across all packages Python notes pyarrow 0.16 will be the last release to support Python 2.7. Python now has bindings for the datasets API (ARROW-6341) as well as the S3 (ARROW-6655) and HDFS (ARROW-7310) filesystem implementations. The Duration (ARROW-5855) and Fixed Size List (ARROW-7261) types are exposed in Python. Sparse tensors can be converted to dense tensors (ARROW-6624). They are also interoperable with the pydata/sparse and scipy.sparse libraries (ARROW-4223, ARROW-4224). Pandas extension arrays now are able to roundtrip through Arrow conversion (ARROW-2428). A memory leak when converting Arrow data to Pandas &quot;object&quot; data has been fixed (ARROW-6874). Arrow is now tested against Python 3.8, and we now build manylinux2014 wheels for Python 3 (ARROW-7344). R notes This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. See vignette(&quot;dataset&quot;, package = &quot;arrow&quot;) for more. 
Another major area of work in this release was to improve the installation experience on Linux. A source package installation (as from CRAN) will now handle its C++ dependencies automatically, with no system dependencies beyond what R requires. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library. See vignette(&quot;install&quot;, package = &quot;arrow&quot;) for details. For more on what&#39;s in the 0.16 R package, see the R changelog. Ruby and C GLib notes Ruby and C GLib continues to follow the features in the C++ project. Ruby Ruby includes the following improvements. Improve CSV save performance (ARROW-7474). Add support for saving/loading TSV (ARROW-7454). Add Arrow::Schema#build_expression to improve building Gandiva::Expression (ARROW-6619). C GLib C GLib includes the following changes. Add support for LargeList, LargeBinary, and LargeString (ARROW-6285, ARROW-6286). Add filter and take API for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch (ARROW-7110, ARROW-7111). Add garrow_table_combine_chunks() API (ARROW-7369). Rust notes Support for Arrow data types has been improved, with the following array types now supported (ARROW-3690): Fixed Size List and Fixed Size Binary Adding a String Array for utf-8 strings, and keeping the Binary Array for general binary data Duration and interval arrays. Initial work on Arrow IPC support has been completed, with readers and writers for streams and files implemented (ARROW-5180). Rust: DataFusion Query execution has been reimplemented with an extensible physical query plan. This allows other projects to add other plans, such as for distributed computing or for specific database servers (ARROW-5227). Added support for writing query results to CSV (ARROW-6274). The new Parquet -&gt; Arrow reader is now used to read Parquet files (ARROW-6700). Various other query improvements have been implemented, especially on grouping and aggregate queries (ARROW-6689). Rust: Parquet The Arrow reader integration has been completed, allowing Parquet files to be read into Arrow memory (ARROW-4059). Development notes Arrow has moved away from Travis-CI and is now using Github Actions for PR-based continuous integration. This new CI configuration relies heavily on docker-compose, making it easier for developers to reproduce builds locally, thanks to tremendous work by Krisztián Szűcs (ARROW-7101). Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there. Mandatory fields in IPC format: to ease input validation, it is being proposed to mark some fields in our Flatbuffers schema &quot;required&quot;. Those fields are already semantically required, but are not considered so by the generated Flatbuffers verifier. Before accepting this proposal, we need to ensure that it does not break binary compatibility with existing valid data. The C Data Interface has not yet been formally adopted, though the community has reached consensus to move forward after addressing various design questions and concerns. Guidelines for the use of &quot;unsafe&quot; in the Rust implementation are being discussed." />
<meta property="og:description" content="The Apache Arrow team is pleased to announce the 0.16.0 release. This covers about 4 months of development work and includes 735 resolved issues from 99 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. New committers Since the 0.15.0 release, we&#39;ve added two new committers: Eric Erhardt Joris Van den Bossche Thank you for all your contributions! Columnar Format Notes We still have work to do to complete comprehensive columnar format integration testing between the Java and C++ libraries. Once this work is completed, we intend to make a 1.0.0 release with forward and backward compatibility guarantees. We clarified some ambiguity on dictionary encoding in the specification. Work is on going to implement the features in Arrow libraries. Arrow Flight RPC notes Flight development work has recently focused on robustness and stability. If you are not yet familiar with Flight, read the introductory blog post from October. We are also discussing adding a &quot;bidirectional RPC&quot; which enables request-response workflows requiring both client and server to send data streams to be performed a single RPC request. C++ notes Some work has been done to make the default build configuration of Arrow C++ as lean as possible. The Arrow C++ core can now be built without any external dependencies other than a new enough C++ compiler (gcc 4.9 or higher). Notably, Boost is no longer required. We invested effort to vendor some small essential dependencies: Flatbuffers, double-conversion, and uriparser. Many optional features requiring external libraries, like compression and GLog integration, are now disabled by default. Several subcomponents of the C++ project like the filesystem API, CSV, compute, dataset and JSON layers, as well as command-line utilities, are now disabled by default. The only toolchain dependency enabled by default is jemalloc, the recommended memory allocator, but this can also be disabled if desired. For illustration, see the example minimal build script and Dockerfile. When enabled, the default jemalloc configuration has been tweaked to return memory more aggressively to the OS (ARROW-6910, ARROW-6994). We welcome feedback from users about our memory allocation configuration and performance in applications. The array validation facilities have been vastly expanded and now exist in two flavors: the Validate method does a light-weight validation that&#39;s O(1) in array size, while the potentially O(N) method ValidateFull does thorough data validation (ARROW-6157). The IO APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7235). C++: CSV An option is added to attempt automatic dictionary encoding of string columns during reading a CSV file, until a cardinality limit is reached. When successful, it can make reading faster and the resulting Arrow data is much more memory-efficient (ARROW-3408). The CSV APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7236). C++: Datasets The 0.16 release introduces the Datasets API to the C++ library, along with bindings in Python and R. 
This API allows you to treat multiple files as a single logical dataset entity and make efficient selection queries against it. This release includes support for Parquet and Arrow IPC file formats. Factory objects allow you to discover files in a directory recursively, inspect the schemas in the files, and performs some basic schema unification. You may specify how file path segments map to partition, and there is support for auto-detecting some partition information, including Hive-style partitioning. The Datasets API includes a filter expression syntax as well as column selection. These are evaluated with predicate pushdown, and for Parquet, evaluation is pushed down to row groups. C++: Filesystem layer An HDFS implementation of the FileSystem class is available (ARROW-6720). We plan to deprecate the prior bespoke C++ HDFS class in favor of the standardized filesystem API. The filesystem APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7161). C++: IPC The Arrow IPC reader is being fuzzed continuously by the OSS-Fuzz infrastructure, to detect undesirable behavior on invalid or malicious input. Several issues have already been found and fixed. C++: Parquet Modular encryption is now supported (PARQUET-1300). A performance regression when reading a file with a large number of columns has been fixed (ARROW-6876, ARROW-7059), as well as several bugs (PARQUET-1766, ARROW-6895). C++: Tensors CSC sparse matrices are supported (ARROW-4225). The Tensor APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7420). C# Notes There were a number of C# bug fixes this release. Note that the C# library is not yet being tested in CI against the other native Arrow implementations (integration tests). We are looking for more contributors for the C# project to help with this and other new feature development. Java notes Added prose documentation describing how to work with the Java libraries. Some additional algorithms have been added to the &quot;contrib&quot; algorithms package: multithreaded searching of ValueVectors, The memory modules have been refactored so non-netty allocators can now be used. A new utility for populating ValueVectors more concisely for testing was introduced The &quot;contrib&quot; Avro adapter now supports Avro Logical type conversion to corresponding Arrow type. Various bug fixes across all packages Python notes pyarrow 0.16 will be the last release to support Python 2.7. Python now has bindings for the datasets API (ARROW-6341) as well as the S3 (ARROW-6655) and HDFS (ARROW-7310) filesystem implementations. The Duration (ARROW-5855) and Fixed Size List (ARROW-7261) types are exposed in Python. Sparse tensors can be converted to dense tensors (ARROW-6624). They are also interoperable with the pydata/sparse and scipy.sparse libraries (ARROW-4223, ARROW-4224). Pandas extension arrays now are able to roundtrip through Arrow conversion (ARROW-2428). A memory leak when converting Arrow data to Pandas &quot;object&quot; data has been fixed (ARROW-6874). Arrow is now tested against Python 3.8, and we now build manylinux2014 wheels for Python 3 (ARROW-7344). R notes This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. See vignette(&quot;dataset&quot;, package = &quot;arrow&quot;) for more. 
Another major area of work in this release was to improve the installation experience on Linux. A source package installation (as from CRAN) will now handle its C++ dependencies automatically, with no system dependencies beyond what R requires. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library. See vignette(&quot;install&quot;, package = &quot;arrow&quot;) for details. For more on what&#39;s in the 0.16 R package, see the R changelog. Ruby and C GLib notes Ruby and C GLib continues to follow the features in the C++ project. Ruby Ruby includes the following improvements. Improve CSV save performance (ARROW-7474). Add support for saving/loading TSV (ARROW-7454). Add Arrow::Schema#build_expression to improve building Gandiva::Expression (ARROW-6619). C GLib C GLib includes the following changes. Add support for LargeList, LargeBinary, and LargeString (ARROW-6285, ARROW-6286). Add filter and take API for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch (ARROW-7110, ARROW-7111). Add garrow_table_combine_chunks() API (ARROW-7369). Rust notes Support for Arrow data types has been improved, with the following array types now supported (ARROW-3690): Fixed Size List and Fixed Size Binary Adding a String Array for utf-8 strings, and keeping the Binary Array for general binary data Duration and interval arrays. Initial work on Arrow IPC support has been completed, with readers and writers for streams and files implemented (ARROW-5180). Rust: DataFusion Query execution has been reimplemented with an extensible physical query plan. This allows other projects to add other plans, such as for distributed computing or for specific database servers (ARROW-5227). Added support for writing query results to CSV (ARROW-6274). The new Parquet -&gt; Arrow reader is now used to read Parquet files (ARROW-6700). Various other query improvements have been implemented, especially on grouping and aggregate queries (ARROW-6689). Rust: Parquet The Arrow reader integration has been completed, allowing Parquet files to be read into Arrow memory (ARROW-4059). Development notes Arrow has moved away from Travis-CI and is now using Github Actions for PR-based continuous integration. This new CI configuration relies heavily on docker-compose, making it easier for developers to reproduce builds locally, thanks to tremendous work by Krisztián Szűcs (ARROW-7101). Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there. Mandatory fields in IPC format: to ease input validation, it is being proposed to mark some fields in our Flatbuffers schema &quot;required&quot;. Those fields are already semantically required, but are not considered so by the generated Flatbuffers verifier. Before accepting this proposal, we need to ensure that it does not break binary compatibility with existing valid data. The C Data Interface has not yet been formally adopted, though the community has reached consensus to move forward after addressing various design questions and concerns. Guidelines for the use of &quot;unsafe&quot; in the Rust implementation are being discussed." />
<link rel="canonical" href="https://arrow.apache.org/blog/2020/02/12/0.16.0-release/" />
<meta property="og:url" content="https://arrow.apache.org/blog/2020/02/12/0.16.0-release/" />
<meta property="og:site_name" content="Apache Arrow" />
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2020-02-12T01:00:00-05:00" />
<meta name="twitter:card" content="summary_large_image" />
<meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="twitter:title" content="Apache Arrow 0.16.0 Release" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"pmc"},"dateModified":"2020-02-12T01:00:00-05:00","datePublished":"2020-02-12T01:00:00-05:00","description":"The Apache Arrow team is pleased to announce the 0.16.0 release. This covers about 4 months of development work and includes 735 resolved issues from 99 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. New committers Since the 0.15.0 release, we&#39;ve added two new committers: Eric Erhardt Joris Van den Bossche Thank you for all your contributions! Columnar Format Notes We still have work to do to complete comprehensive columnar format integration testing between the Java and C++ libraries. Once this work is completed, we intend to make a 1.0.0 release with forward and backward compatibility guarantees. We clarified some ambiguity on dictionary encoding in the specification. Work is on going to implement the features in Arrow libraries. Arrow Flight RPC notes Flight development work has recently focused on robustness and stability. If you are not yet familiar with Flight, read the introductory blog post from October. We are also discussing adding a &quot;bidirectional RPC&quot; which enables request-response workflows requiring both client and server to send data streams to be performed a single RPC request. C++ notes Some work has been done to make the default build configuration of Arrow C++ as lean as possible. The Arrow C++ core can now be built without any external dependencies other than a new enough C++ compiler (gcc 4.9 or higher). Notably, Boost is no longer required. We invested effort to vendor some small essential dependencies: Flatbuffers, double-conversion, and uriparser. Many optional features requiring external libraries, like compression and GLog integration, are now disabled by default. Several subcomponents of the C++ project like the filesystem API, CSV, compute, dataset and JSON layers, as well as command-line utilities, are now disabled by default. The only toolchain dependency enabled by default is jemalloc, the recommended memory allocator, but this can also be disabled if desired. For illustration, see the example minimal build script and Dockerfile. When enabled, the default jemalloc configuration has been tweaked to return memory more aggressively to the OS (ARROW-6910, ARROW-6994). We welcome feedback from users about our memory allocation configuration and performance in applications. The array validation facilities have been vastly expanded and now exist in two flavors: the Validate method does a light-weight validation that&#39;s O(1) in array size, while the potentially O(N) method ValidateFull does thorough data validation (ARROW-6157). The IO APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7235). C++: CSV An option is added to attempt automatic dictionary encoding of string columns during reading a CSV file, until a cardinality limit is reached. When successful, it can make reading faster and the resulting Arrow data is much more memory-efficient (ARROW-3408). The CSV APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7236). 
C++: Datasets The 0.16 release introduces the Datasets API to the C++ library, along with bindings in Python and R. This API allows you to treat multiple files as a single logical dataset entity and make efficient selection queries against it. This release includes support for Parquet and Arrow IPC file formats. Factory objects allow you to discover files in a directory recursively, inspect the schemas in the files, and performs some basic schema unification. You may specify how file path segments map to partition, and there is support for auto-detecting some partition information, including Hive-style partitioning. The Datasets API includes a filter expression syntax as well as column selection. These are evaluated with predicate pushdown, and for Parquet, evaluation is pushed down to row groups. C++: Filesystem layer An HDFS implementation of the FileSystem class is available (ARROW-6720). We plan to deprecate the prior bespoke C++ HDFS class in favor of the standardized filesystem API. The filesystem APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7161). C++: IPC The Arrow IPC reader is being fuzzed continuously by the OSS-Fuzz infrastructure, to detect undesirable behavior on invalid or malicious input. Several issues have already been found and fixed. C++: Parquet Modular encryption is now supported (PARQUET-1300). A performance regression when reading a file with a large number of columns has been fixed (ARROW-6876, ARROW-7059), as well as several bugs (PARQUET-1766, ARROW-6895). C++: Tensors CSC sparse matrices are supported (ARROW-4225). The Tensor APIs now use Result&lt;T&gt; when returning both a Status and result value, rather than taking a pointer-out function parameter (ARROW-7420). C# Notes There were a number of C# bug fixes this release. Note that the C# library is not yet being tested in CI against the other native Arrow implementations (integration tests). We are looking for more contributors for the C# project to help with this and other new feature development. Java notes Added prose documentation describing how to work with the Java libraries. Some additional algorithms have been added to the &quot;contrib&quot; algorithms package: multithreaded searching of ValueVectors, The memory modules have been refactored so non-netty allocators can now be used. A new utility for populating ValueVectors more concisely for testing was introduced The &quot;contrib&quot; Avro adapter now supports Avro Logical type conversion to corresponding Arrow type. Various bug fixes across all packages Python notes pyarrow 0.16 will be the last release to support Python 2.7. Python now has bindings for the datasets API (ARROW-6341) as well as the S3 (ARROW-6655) and HDFS (ARROW-7310) filesystem implementations. The Duration (ARROW-5855) and Fixed Size List (ARROW-7261) types are exposed in Python. Sparse tensors can be converted to dense tensors (ARROW-6624). They are also interoperable with the pydata/sparse and scipy.sparse libraries (ARROW-4223, ARROW-4224). Pandas extension arrays now are able to roundtrip through Arrow conversion (ARROW-2428). A memory leak when converting Arrow data to Pandas &quot;object&quot; data has been fixed (ARROW-6874). Arrow is now tested against Python 3.8, and we now build manylinux2014 wheels for Python 3 (ARROW-7344). R notes This release includes a dplyr interface to Arrow Datasets, which let you work efficiently with large, multi-file datasets as a single entity. 
See vignette(&quot;dataset&quot;, package = &quot;arrow&quot;) for more. Another major area of work in this release was to improve the installation experience on Linux. A source package installation (as from CRAN) will now handle its C++ dependencies automatically, with no system dependencies beyond what R requires. For common Linux distributions and versions, installation will retrieve a prebuilt static C++ library for inclusion in the package; where this binary is not available, the package executes a bundled script that should build the Arrow C++ library. See vignette(&quot;install&quot;, package = &quot;arrow&quot;) for details. For more on what&#39;s in the 0.16 R package, see the R changelog. Ruby and C GLib notes Ruby and C GLib continues to follow the features in the C++ project. Ruby Ruby includes the following improvements. Improve CSV save performance (ARROW-7474). Add support for saving/loading TSV (ARROW-7454). Add Arrow::Schema#build_expression to improve building Gandiva::Expression (ARROW-6619). C GLib C GLib includes the following changes. Add support for LargeList, LargeBinary, and LargeString (ARROW-6285, ARROW-6286). Add filter and take API for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch (ARROW-7110, ARROW-7111). Add garrow_table_combine_chunks() API (ARROW-7369). Rust notes Support for Arrow data types has been improved, with the following array types now supported (ARROW-3690): Fixed Size List and Fixed Size Binary Adding a String Array for utf-8 strings, and keeping the Binary Array for general binary data Duration and interval arrays. Initial work on Arrow IPC support has been completed, with readers and writers for streams and files implemented (ARROW-5180). Rust: DataFusion Query execution has been reimplemented with an extensible physical query plan. This allows other projects to add other plans, such as for distributed computing or for specific database servers (ARROW-5227). Added support for writing query results to CSV (ARROW-6274). The new Parquet -&gt; Arrow reader is now used to read Parquet files (ARROW-6700). Various other query improvements have been implemented, especially on grouping and aggregate queries (ARROW-6689). Rust: Parquet The Arrow reader integration has been completed, allowing Parquet files to be read into Arrow memory (ARROW-4059). Development notes Arrow has moved away from Travis-CI and is now using Github Actions for PR-based continuous integration. This new CI configuration relies heavily on docker-compose, making it easier for developers to reproduce builds locally, thanks to tremendous work by Krisztián Szűcs (ARROW-7101). Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there. Mandatory fields in IPC format: to ease input validation, it is being proposed to mark some fields in our Flatbuffers schema &quot;required&quot;. Those fields are already semantically required, but are not considered so by the generated Flatbuffers verifier. Before accepting this proposal, we need to ensure that it does not break binary compatibility with existing valid data. The C Data Interface has not yet been formally adopted, though the community has reached consensus to move forward after addressing various design questions and concerns. 
Guidelines for the use of &quot;unsafe&quot; in the Rust implementation are being discussed.","headline":"Apache Arrow 0.16.0 Release","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2020/02/12/0.16.0-release/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"url":"https://arrow.apache.org/blog/2020/02/12/0.16.0-release/"}</script>
<!-- End Jekyll SEO tag -->
<!-- favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6">
<!-- dark mode favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
<script>
// Switch to the dark-mode favicons if prefers-color-scheme: dark
function onUpdate() {
light1 = document.querySelector('link#light1');
light2 = document.querySelector('link#light2');
light3 = document.querySelector('link#light3');
light4 = document.querySelector('link#light4');
light5 = document.querySelector('link#light5');
light6 = document.querySelector('link#light6');
dark1 = document.querySelector('link#dark1');
dark2 = document.querySelector('link#dark2');
dark3 = document.querySelector('link#dark3');
dark4 = document.querySelector('link#dark4');
dark5 = document.querySelector('link#dark5');
dark6 = document.querySelector('link#dark6');
if (matcher.matches) {
light1.remove();
light2.remove();
light3.remove();
light4.remove();
light5.remove();
light6.remove();
document.head.append(dark1);
document.head.append(dark2);
document.head.append(dark3);
document.head.append(dark4);
document.head.append(dark5);
document.head.append(dark6);
} else {
dark1.remove();
dark2.remove();
dark3.remove();
dark4.remove();
dark5.remove();
dark6.remove();
document.head.append(light1);
document.head.append(light2);
document.head.append(light3);
document.head.append(light4);
document.head.append(light5);
document.head.append(light6);
}
}
matcher = window.matchMedia('(prefers-color-scheme: dark)');
matcher.addListener(onUpdate);
onUpdate();
</script>
<link href="/css/main.css" rel="stylesheet">
<link href="/css/syntax.css" rel="stylesheet">
<script src="/javascript/main.js"></script>
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
<link type="application/atom+xml" rel="alternate" href="https://arrow.apache.org/feed.xml" title="Apache Arrow" />
</head>
<body class="wrap">
<header>
<nav class="navbar navbar-expand-md navbar-dark bg-dark">
<a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"></a>
<button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse justify-content-end" id="arrow-navbar">
<ul class="nav navbar-nav">
<li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
<li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li>
<li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownGetArrow" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Get Arrow
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
<a class="dropdown-item" href="/install/">Install</a>
<a class="dropdown-item" href="/release/">Releases</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownDocumentation" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Docs
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="/docs">Project Docs</a>
<a class="dropdown-item" href="/docs/format/Columnar.html">Format</a>
<hr>
<a class="dropdown-item" href="/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="/docs/cpp">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="/docs/java">Java</a>
<a class="dropdown-item" href="/docs/js">JavaScript</a>
<a class="dropdown-item" href="/julia/">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="/docs/python">Python</a>
<a class="dropdown-item" href="/docs/r">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://docs.rs/arrow/latest" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="/swift">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSource" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Source
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSource">
<a class="dropdown-item" href="https://github.com/apache/arrow" target="_blank" rel="noopener">Main Repo</a>
<hr>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/c_glib" target="_blank" rel="noopener">C GLib</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/cpp" target="_blank" rel="noopener">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/csharp" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-go" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-java" target="_blank" rel="noopener">Java</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-js" target="_blank" rel="noopener">JavaScript</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-julia" target="_blank" rel="noopener">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/matlab" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/python" target="_blank" rel="noopener">Python</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/r" target="_blank" rel="noopener">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/ruby" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-rs" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-swift" target="_blank" rel="noopener">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSubprojects" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Subprojects
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects">
<a class="dropdown-item" href="/adbc">ADBC</a>
<a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a>
<a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a>
<a class="dropdown-item" href="https://datafusion.apache.org" target="_blank" rel="noopener">DataFusion</a>
<a class="dropdown-item" href="/nanoarrow">nanoarrow</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownCommunity" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Community
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
<a class="dropdown-item" href="/community/">Communication</a>
<a class="dropdown-item" href="/docs/developers/index.html">Contributing</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/issues" target="_blank" rel="noopener">Issue Tracker</a>
<a class="dropdown-item" href="/committers/">Governance</a>
<a class="dropdown-item" href="/use_cases/">Use Cases</a>
<a class="dropdown-item" href="/powered_by/">Powered By</a>
<a class="dropdown-item" href="/visual_identity/">Visual Identity</a>
<a class="dropdown-item" href="/security/">Security</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener">Code of Conduct</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownASF" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
ASF Links
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF">
<a class="dropdown-item" href="https://www.apache.org/" target="_blank" rel="noopener">ASF Website</a>
<a class="dropdown-item" href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">Donate</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">Thanks</a>
<a class="dropdown-item" href="https://www.apache.org/security/" target="_blank" rel="noopener">Security</a>
</div>
</li>
</ul>
</div>
<!-- /.navbar-collapse -->
</nav>
</header>
<div class="container p-4 pt-5">
<div class="col-md-8 mx-auto">
<main role="main" class="pb-5">
<h1>
Apache Arrow 0.16.0 Release
</h1>
<hr class="mt-4 mb-3">
<p class="mb-4 pb-1">
<span class="badge badge-secondary">Published</span>
<span class="published mr-3">
12 Feb 2020
</span>
<br>
<span class="badge badge-secondary">By</span>
<a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a>
</p>
<p>The Apache Arrow team is pleased to announce the 0.16.0 release. This covers
about 4 months of development work and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.16.0" target="_blank" rel="noopener"><strong>735 resolved issues</strong></a>
from <a href="https://arrow.apache.org/release/0.16.0.html#contributors"><strong>99 distinct contributors</strong></a>. See the <a href="/install/">Install Page</a> to learn how to
get the libraries for your platform.</p>
<p>The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the <a href="https://arrow.apache.org/release/0.16.0.html">complete changelog</a>.</p>
<h2>New committers</h2>
<p>Since the 0.15.0 release, we've added two new committers:</p>
<ul>
<li><a href="https://github.com/eerhardt" target="_blank" rel="noopener">Eric Erhardt</a></li>
<li><a href="https://github.com/jorisvandenbossche" target="_blank" rel="noopener">Joris Van den Bossche</a></li>
</ul>
<p>Thank you for all your contributions!</p>
<h2>Columnar Format Notes</h2>
<p>We still have work to do to complete comprehensive columnar format integration
testing between the Java and C++ libraries. Once this work is completed, we
intend to make a 1.0.0 release with <a href="https://arrow.apache.org/docs/format/Versioning.html">forward and backward compatibility
guarantees</a>.</p>
<p>We <a href="https://github.com/apache/arrow/commit/0ddc1f4737c35008cd06be1ee28472ebd7da68e2" target="_blank" rel="noopener">clarified some ambiguity</a> on dictionary encoding in the specification.
Work is ongoing to implement these features in the Arrow libraries.</p>
<h2>Arrow Flight RPC notes</h2>
<p>Flight development work has recently focused on robustness and stability. If
you are not yet familiar with Flight, read the <a href="https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/">introductory blog post from
October</a>.</p>
<p>We are also discussing adding a "bidirectional RPC" that enables
request-response workflows, in which both client and server send data
streams, to be carried out in a single RPC request.</p>
<h2>C++ notes</h2>
<p>Some work has been done to make the default build configuration of Arrow C++ as
lean as possible. The Arrow C++ core can now be built without any external
dependencies other than a new enough C++ compiler (gcc 4.9 or higher). Notably,
Boost is no longer required. We invested effort to vendor some small essential
dependencies: Flatbuffers, double-conversion, and uriparser. Many optional
features requiring external libraries, like compression and GLog integration,
are now disabled by default. Several subcomponents of the C++ project like the
filesystem API, CSV, compute, dataset and JSON layers, as well as command-line
utilities, are now disabled by default. The only toolchain dependency enabled
by default is jemalloc, the recommended memory allocator, but this can also be
disabled if desired. For illustration, see the <a href="https://github.com/apache/arrow/blob/master/cpp/examples/minimal_build/build.sh" target="_blank" rel="noopener">example minimal build script and
Dockerfile</a>.</p>
<p>When enabled, the default jemalloc configuration has been tweaked to return
memory more aggressively to the OS (ARROW-6910, ARROW-6994). We welcome
feedback from users about our memory allocation configuration and performance
in applications.</p>
<p>The array validation facilities have been vastly expanded and now exist in
two flavors: the <code>Validate</code> method does a light-weight validation that's
O(1) in array size, while the potentially O(N) method <code>ValidateFull</code> does
thorough data validation (ARROW-6157).</p>
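<p>As a rough illustration, the same distinction is surfaced in the Python
bindings through <code>Array.validate()</code>; the <code>full=</code> flag shown
below follows current pyarrow and may postdate 0.16.</p>
<pre><code class="language-python">import pyarrow as pa

arr = pa.array([1, 2, None, 4])

# Lightweight structural checks, roughly O(1) in the array size.
arr.validate()

# Thorough data validation, potentially O(N). The full= keyword mirrors the
# current pyarrow signature and is an assumption for the 0.16 bindings.
arr.validate(full=True)
</code></pre>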
<p>The IO APIs now use <code>Result&lt;T&gt;</code> when returning both a Status
and result value, rather than taking a pointer-out function parameter
(ARROW-7235).</p>
<h3>C++: CSV</h3>
<p>An option has been added to attempt automatic dictionary encoding of string
columns while reading a CSV file, until a cardinality limit is reached. When
successful, it can make reading faster and the resulting Arrow data much more
memory-efficient (ARROW-3408).</p>
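<p>For users of the Python bindings, a minimal sketch of this behavior is shown
below; the <code>auto_dict_encode</code> and <code>auto_dict_max_cardinality</code>
options wrap the C++ feature, though their exact availability in 0.16 should be
checked against the pyarrow documentation.</p>
<pre><code class="language-python">import pyarrow.csv as csv

# Dictionary-encode string columns while reading, falling back to plain
# string encoding for any column whose cardinality exceeds the limit.
convert_options = csv.ConvertOptions(
    auto_dict_encode=True,
    auto_dict_max_cardinality=1000,
)
table = csv.read_csv("data.csv", convert_options=convert_options)
print(table.schema)  # low-cardinality string columns come back dictionary-encoded
</code></pre>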
<p>The CSV APIs now use <code>Result&lt;T&gt;</code> when returning both a Status
and result value, rather than taking a pointer-out function parameter
(ARROW-7236).</p>
<h3>C++: Datasets</h3>
<p>The 0.16 release introduces the Datasets API to the C++ library, along with
bindings in Python and R. This API allows you to treat multiple files as a
single logical dataset entity and make efficient selection queries against it.
This release includes support for Parquet and Arrow IPC file formats. Factory
objects allow you to discover files in a directory recursively, inspect the
schemas in the files, and perform some basic schema unification. You may
specify how file path segments map to partitions, and there is support for
auto-detecting some partition information, including Hive-style partitioning.
The Datasets API includes a filter expression syntax as well as column
selection. These are evaluated with predicate pushdown, and for Parquet,
evaluation is pushed down to row groups.</p>
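<p>A hedged Python sketch of how these pieces fit together follows; the
<code>pyarrow.dataset</code> bindings were new and experimental in 0.16, so the
function and option names below follow the current API and may differ from what
0.16 shipped.</p>
<pre><code class="language-python">import pyarrow.dataset as ds

# Recursively discover Parquet files under a directory, unify their schemas,
# and treat Hive-style path segments (e.g. year=2019/) as partition columns.
dataset = ds.dataset("path/to/data", format="parquet", partitioning="hive")

# Column selection plus a filter expression (illustrative column names);
# for Parquet, the filter can be pushed down so entire row groups are skipped.
table = dataset.to_table(
    columns=["passenger_count", "fare"],
    filter=ds.field("year") == 2019,
)
</code></pre>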
<h3>C++: Filesystem layer</h3>
<p>An HDFS implementation of the FileSystem class is available (ARROW-6720). We
plan to deprecate the prior bespoke C++ HDFS class in favor of the standardized
filesystem API.</p>
<p>The filesystem APIs now use <code>Result&lt;T&gt;</code> when returning both a Status
and result value, rather than taking a pointer-out function parameter
(ARROW-7161).</p>
<h3>C++: IPC</h3>
<p>The Arrow IPC reader is being fuzzed continuously by the <a href="https://google.github.io/oss-fuzz/" target="_blank" rel="noopener">OSS-Fuzz</a>
infrastructure, to detect undesirable behavior on invalid or malicious input.
Several issues have already been found and fixed.</p>
<h3>C++: Parquet</h3>
<p><a href="https://github.com/apache/parquet-format/blob/master/Encryption.md" target="_blank" rel="noopener">Modular encryption</a> is now supported (PARQUET-1300).</p>
<p>A performance regression when reading a file with a large number of columns has
been fixed (ARROW-6876, ARROW-7059), as well as several bugs (PARQUET-1766,
ARROW-6895).</p>
<h3>C++: Tensors</h3>
<p>CSC sparse matrices are supported (ARROW-4225).</p>
<p>The Tensor APIs now use <code>Result&lt;T&gt;</code> when returning both a Status
and result value, rather than taking a pointer-out function parameter
(ARROW-7420).</p>
<h2>C# Notes</h2>
<p>There were a number of C# bug fixes this release. Note that the C# library is
not yet being tested in CI against the other native Arrow implementations
(integration tests). We are looking for more contributors for the C# project to
help with this and other new feature development.</p>
<h2>Java notes</h2>
<ul>
<li>Added prose documentation describing how to work with the Java libraries.</li>
<li>Additional algorithms have been added to the "contrib" algorithms package, including multithreaded searching of ValueVectors.</li>
<li>The memory modules have been refactored so non-Netty allocators can now be used.</li>
<li>A new utility was introduced for populating ValueVectors more concisely in tests.</li>
<li>The "contrib" Avro adapter now supports converting Avro logical types to the corresponding Arrow types.</li>
<li>Various bug fixes were made across all packages.</li>
</ul>
<h2>Python notes</h2>
<p>pyarrow 0.16 will be the last release to support Python 2.7.</p>
<p>Python now has bindings for the datasets API (ARROW-6341) as well as the S3
(ARROW-6655) and HDFS (ARROW-7310) filesystem implementations.</p>
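<p>A minimal sketch of the new filesystem bindings, assuming the
<code>pyarrow.fs</code> layout of current releases (the 0.16 bindings were
experimental and constructor arguments may have differed):</p>
<pre><code class="language-python">from pyarrow import fs

# Illustrative constructor arguments only, not a complete list.
s3 = fs.S3FileSystem(region="us-east-1")
hdfs = fs.HadoopFileSystem(host="namenode", port=8020)

# List files under a prefix using the generic filesystem API.
infos = s3.get_file_info(fs.FileSelector("my-bucket/dataset/", recursive=True))
for info in infos:
    print(info.path, info.size)
</code></pre>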
<p>The Duration (ARROW-5855) and Fixed Size List (ARROW-7261) types are exposed
in Python.</p>
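<p>For example, using the current pyarrow factory functions (which may differ
slightly from the 0.16 spellings):</p>
<pre><code class="language-python">import pyarrow as pa

# A duration array with millisecond resolution.
durations = pa.array([1000, 2000, None], type=pa.duration("ms"))

# A fixed-size list type: every value holds exactly 3 int32 elements.
fixed_type = pa.list_(pa.int32(), 3)

print(durations.type)  # duration[ms]
print(fixed_type)      # fixed_size_list of 3 int32 values
</code></pre>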
<p>Sparse tensors can be converted to dense tensors (ARROW-6624). They are
also interoperable with the <code>pydata/sparse</code> and <code>scipy.sparse</code> libraries
(ARROW-4223, ARROW-4224).</p>
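<p>A short sketch of the SciPy interoperability, assuming the
<code>from_scipy</code>, <code>to_scipy</code>, and <code>to_tensor</code> methods
shown here match what the bindings expose:</p>
<pre><code class="language-python">import numpy as np
import pyarrow as pa
from scipy.sparse import coo_matrix

data = coo_matrix(
    (np.array([1.0, 2.0]), (np.array([0, 2]), np.array([1, 0]))), shape=(3, 3)
)

# SciPy COO matrix to Arrow sparse tensor, and back.
sparse = pa.SparseCOOTensor.from_scipy(data)
roundtripped = sparse.to_scipy()

# Densify into a regular Arrow tensor, then a NumPy array.
dense = sparse.to_tensor().to_numpy()
</code></pre>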
<p>Pandas extension arrays are now able to round-trip through Arrow conversion
(ARROW-2428).</p>
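<p>For instance, a nullable integer extension column can survive the round trip
(a sketch; the dtype is reconstructed from the pandas metadata that pyarrow
stores on the Arrow schema):</p>
<pre><code class="language-python">import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")})

# pandas to Arrow and back; the extension dtype is restored from the
# pandas metadata attached to the Arrow schema.
table = pa.table(df)
restored = table.to_pandas()
print(restored.dtypes)
</code></pre>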
<p>A memory leak when converting Arrow data to Pandas "object" data has been
fixed (ARROW-6874).</p>
<p>Arrow is now tested against Python 3.8, and we now build manylinux2014 wheels
for Python 3 (ARROW-7344).</p>
<h2>R notes</h2>
<p>This release includes a <code>dplyr</code> interface to Arrow Datasets,
which lets you work efficiently with large, multi-file datasets as a single entity.
See <a href="https://arrow.apache.org/docs/r/articles/dataset.html"><code>vignette("dataset", package = "arrow")</code></a> for more.</p>
<p>Another major area of work in this release was to improve the installation
experience on Linux. A source package installation (as from CRAN) will now
handle its C++ dependencies automatically, with no system dependencies beyond what R requires.
For common Linux distributions and versions, installation will retrieve a prebuilt static
C++ library for inclusion in the package; where this binary is not available,
the package executes a bundled script that should build the Arrow C++ library.
See <a href="https://arrow.apache.org/docs/r/articles/install.html"><code>vignette("install", package = "arrow")</code></a> for details.</p>
<p>For more on what's in the 0.16 R package, see the R <a href="https://arrow.apache.org/docs/r/news/">changelog</a>.</p>
<h2>Ruby and C GLib notes</h2>
<p>Ruby and C GLib continue to follow the features in the C++ project.</p>
<h3>Ruby</h3>
<p>Ruby includes the following improvements.</p>
<ul>
<li>Improve CSV save performance (ARROW-7474).</li>
<li>Add support for saving/loading TSV (ARROW-7454).</li>
<li>Add <code>Arrow::Schema#build_expression</code> to improve building <code>Gandiva::Expression</code> (ARROW-6619).</li>
</ul>
<h3>C GLib</h3>
<p>C GLib includes the following changes.</p>
<ul>
<li>Add support for LargeList, LargeBinary, and LargeString (ARROW-6285, ARROW-6286).</li>
<li>Add filter and take API for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch (ARROW-7110, ARROW-7111).</li>
<li>Add <code>garrow_table_combine_chunks()</code> API (ARROW-7369).</li>
</ul>
<h2>Rust notes</h2>
<p>Support for Arrow data types has been improved, with the following array types now supported (ARROW-3690):</p>
<ul>
<li>Fixed Size List and Fixed Size Binary arrays.</li>
<li>A String Array for UTF-8 strings, with the Binary Array retained for general binary data.</li>
<li>Duration and Interval arrays.</li>
</ul>
<p>Initial work on Arrow IPC support has been completed, with readers and writers for streams and files implemented (ARROW-5180).</p>
<h3>Rust: DataFusion</h3>
<p>Query execution has been reimplemented with an extensible physical query plan. This allows other projects to provide additional plans, such as for distributed computing or for specific database servers (ARROW-5227).</p>
<p>Added support for writing query results to CSV (ARROW-6274).</p>
<p>The new Parquet -&gt; Arrow reader is now used to read Parquet files (ARROW-6700).</p>
<p>Various other query improvements have been implemented, especially on grouping and aggregate queries (ARROW-6689).</p>
<h3>Rust: Parquet</h3>
<p>The Arrow reader integration has been completed, allowing Parquet files to be
read into Arrow memory (ARROW-4059).</p>
<h2>Development notes</h2>
<p>Arrow has moved away from Travis CI and is now using GitHub Actions for
PR-based continuous integration. This new CI configuration relies heavily
on <code>docker-compose</code>, making it easier for developers to reproduce builds
locally, thanks to tremendous work by Krisztián Szűcs (ARROW-7101).</p>
<h2>Community Discussions Ongoing</h2>
<p>There are a number of active discussions ongoing on the developer
<a href="mailto:dev@arrow.apache.org">dev@arrow.apache.org</a> mailing list. We look forward to hearing from the
community there.</p>
<ul>
<li>Mandatory fields in IPC format: to ease input validation, it is being
<a href="https://mail-archives.apache.org/mod_mbox/arrow-dev/202001.mbox/%3C0dd13489-9221-459a-3560-1426738d3bb4%40python.org%3E" target="_blank" rel="noopener">proposed</a> to mark some fields in our Flatbuffers schema "required".
Those fields are already semantically required, but are not considered so by
the generated Flatbuffers verifier. Before accepting this proposal, we need
to ensure that it does not break binary compatibility with existing valid
data.</li>
<li>The C Data Interface has not yet been formally adopted, though the community
has reached consensus to move forward after addressing various design
questions and concerns.</li>
<li>Guidelines for the use of "unsafe" in the Rust implementation are being
<a href="https://mail-archives.apache.org/mod_mbox/arrow-dev/202001.mbox/%3cMN2PR19MB347050FB22046485C0B2525EDD380@MN2PR19MB3470.namprd19.prod.outlook.com%3e" target="_blank" rel="noopener">discussed</a>.</li>
</ul>
</main>
</div>
<hr>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p>Apache Arrow, Arrow, Apache, the Apache logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
<p>© 2016-2025 The Apache Software Foundation</p>
</div>
<div class="col-md-3">
<a class="d-sm-none d-md-inline pr-2" href="https://www.apache.org/events/current-event.html" target="_blank" rel="noopener">
<img src="https://www.apache.org/events/current-event-234x60.png">
</a>
</div>
</div>
</footer>
</div>
</body>
</html>