<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>Apache Arrow 3.0.0 Release | Apache Arrow</title>
<!-- Begin Jekyll SEO tag v2.8.0 -->
<meta name="generator" content="Jekyll v4.4.1" />
<meta property="og:title" content="Apache Arrow 3.0.0 Release" />
<meta name="author" content="pmc" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="The Apache Arrow team is pleased to announce the 3.0.0 release. This covers over 3 months of development work and includes 666 resolved issues from 106 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Columnar Format Notes The Decimal256 data type, which was already supported by the Arrow columnar format specification, is now implemented in C++ and Java (ARROW-9747). Arrow Flight RPC notes Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers. Support for cookies has also been added. The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations. A basic Flight implementation for C#/.NET has been added. See the implementation status matrix for details. C++ notes The default memory pool can now be changed at runtime using the environment variable ARROW_DEFAULT_MEMORY_POOL (ARROW-11009). The environment variable is inspected at process startup. This is useful when trying to diagnose memory consumption issues with Arrow. STL-like iterators are now provided over concrete arrays. Those are useful for non-performance critical tasks, for example testing (ARROW-10776). It is now possible to concatenate dictionary arrays with unequal dictionaries. The dictionaries are unified when concatenating, for supported data types (ARROW-5336). Threads in a thread pool are now spawned lazily as needed for enqueued tasks, up to the configured capacity. They used to be spawned upfront on creation of the thread pool (ARROW-10038). 
Compute layer Comprehensive documentation for compute functions is now available: https://arrow.apache.org/docs/cpp/compute.html Compute functions for string processing have been added for: splitting on whitespace (ASCII and Unicode flavors) and splitting on a pattern (ARROW-9991); trimming characters (ARROW-9128). Behavior of the index_in and is_in compute functions with nulls has been changed for consistency (ARROW-10663). Multiple-column sort kernels are now available for tables and record batches (ARROW-8199, ARROW-10796, ARROW-10790). Performance of table filtering has been vastly improved (ARROW-10569). Scalar arguments are now accepted for more compute functions. Compute functions quantile (ARROW-10831) and is_nan (ARROW-11043) have been added for numeric data. Aggregation functions any (ARROW-1846) and all (ARROW-10301) have been added for boolean data. Dataset The Expression hierarchy has been simplified to a wrapper around literals, field references, or calls to named functions. This enables usage of any compute function while filtering with no boilerplate. Parquet statistics are lazily parsed in ParquetDatasetFactory and ParquetFileFragment for shorter construction time. CSV Conversion of string columns is now faster thanks to faster UTF-8 validation of small strings (ARROW-10313). Conversion of floating-point columns is now faster thanks to optimized string-to-double conversion routines (ARROW-10328). Parsing of ISO8601 timestamps is now more liberal: trailing zeros can be omitted in the fractional part (ARROW-10337). Fixed a bug where null detection could give the wrong results on some platforms (ARROW-11067). Added type inference for Date32 columns for values in the form YYYY-MM-DD (ARROW-11247). Feather Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163). Filesystem layer S3 recursive tree walks now benefit from a parallel implementation, where reads of multiple child directories are now issued concurrently (ARROW-10788). 
Improved empty directory detection to be mindful of differences between Amazon and Minio S3 implementations (ARROW-10942). Flight RPC IPv6 host addresses are now supported (ARROW-10475). IPC It is now possible to emit dictionary deltas where possible using the IPC stream writer. This is governed by a new variable in the IpcWriteOptions class (ARROW-6883). It is now possible to read wider tables, which used to fail due to reaching a limit during Flatbuffers verification (ARROW-10056). Parquet Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet implementation (ARROW-11301). Fixed a bug where writing multiple batches of nullable nested strings to Parquet would not write any data in batches after the first one (ARROW-10493). The Decimal256 data type can be read from and written to Parquet (ARROW-10607). LargeString and LargeBinary data can now be written to Parquet (ARROW-10426). C# notes The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages: Apache.Arrow.Flight (client) and Apache.Arrow.Flight.AspNetCore (server). Also fixed an issue where ArrowStreamWriter wasn&#39;t writing schema metadata to Arrow streams. Julia notes This is the first release to officially include an implementation for the Julia language. The pure Julia implementation includes support for wide coverage of the format specification. Additional details can be found in the julialang.org blog post. Python notes Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5 was removed (ARROW-5679). Support for building manylinux1 packages has been removed (ARROW-11212). PyArrow continues to be available as manylinux2010 and manylinux2014 wheels. The minimum required version for NumPy is now 1.16.6. Note that when upgrading NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility, as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833). 
Compute functions are now automatically exported from C++ to the pyarrow.compute module, and they have docstrings matching their C++ definition. An iter_batches() method is now available for reading a Parquet file iteratively (ARROW-7800). Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory pool) are now available from Python (ARROW-11049). Fixed a potential deadlock when importing pandas from several threads (ARROW-10519). See the C++ notes above for additional details. R notes This release contains new features for the Flight RPC wrapper, better support for saving R metadata (including sf spatial data) to Feather and Parquet files, several significant improvements to speed and memory management, and many other enhancements. For more on what’s in the 3.0.0 R package, see the R changelog. Ruby and C GLib notes Ruby In the Ruby bindings, 256-bit decimal support and Arrow::FixedBinaryArrayBuilder have been added, mirroring C GLib below. C GLib Version 3.0.0 of C GLib brings many new features. Chunked arrays, record batches, and tables now support the sort_indices function, in addition to arrays. These functions, including the array version, support specifying sort options. garrow_array_sort_to_indices has been renamed to garrow_array_sort_indices and the previous name has been deprecated. GArrowField supports functions to handle metadata. GArrowSchema supports the garrow_schema_has_metadata() function. GArrowArrayBuilder supports adding a single null, multiple nulls, a single empty value, and multiple empty values. GArrowFixedSizedBinaryArrayBuilder is newly supported. 256-bit decimal and extension types are newly supported. The filesystem module supports Mock, HDFS, and S3 file systems. The dataset module supports CSV, IPC, and Parquet file formats. 
Rust notes Core Arrow Crate The development of the arrow crate was focused on four main aspects: Make the crate usable in stable Rust Bug fixing and removal of unsafe code Extend functionality to keep up with the specification Increase performance of existing kernels Stable Rust Possibly the biggest news for this release is that all project crates, including arrow, parquet, and datafusion, now build with stable Rust by default. Nightly / unstable Rust is still required when enabling the SIMD feature. Parquet Arrow writer The Parquet Writer for Arrow arrays is now available, allowing Rust programs to easily read and write Parquet files and making it easier to integrate with the overall Arrow ecosystem. The reader and writer include both basic and nested type support (List, Dictionary, Struct). First Class Arrow Flight IPC Support In this release, the Arrow Flight IPC implementation in Rust became fully-featured enough to participate in the regular cross-language integration tests, thus ensuring Rust applications written using Arrow can interoperate with the rest of the ecosystem. Performance There have been numerous performance improvements in this release across the board. This includes both kernel operations, such as take, filter, and cast, as well as more fundamental parts such as bitwise comparison and reading and writing to CSV. Increased Data Type Support New DataTypes: Decimal data type for fixed-width decimal values Improved operation support for nested structures Dictionary, and Lists (filter, take, etc.) Other improvements: Added support for Date and time on FFI Added support for Binary type on FFI Added support for i64-sized arrays to the “take” kernel Support for the i128 Decimal Type Added support to cast string to date Added API to create arrays out of existing arrays (e.g. 
for join, merge-sort, concatenate) The simd feature is now also available on aarch64 API Changes BooleanArray is no longer a PrimitiveArray ArrowNativeType no longer includes bool since Arrow&#39;s boolean type is represented using bitpacking Several Buffer methods are now infallible instead of returning a Result DataType::List now contains a Field to track metadata about the contained elements PrimitiveArray::raw_values, values_slice and values methods were replaced by a values method returning a slice Buffer::data and raw_data were renamed to as_slice and as_ptr MutableBuffer::data_mut and freeze were renamed to as_slice_mut and into to be more consistent with the stdlib naming conventions The generic type parameter for BufferBuilder was changed from ArrowPrimitiveType to ArrowNativeType DataFusion SQL In this release, we clarified that DataFusion will standardize on the PostgreSQL SQL dialect. New SQL support: JOIN, LEFT JOIN, RIGHT JOIN COUNT DISTINCT CASE WHEN USING BETWEEN IS IN Nested SELECT statements Nested expressions in aggregations LOWER(), UPPER(), TRIM() NULLIF() SHA224(), SHA256(), SHA384(), SHA512() DATE_TRUNC() Performance There have been numerous performance improvements in this release: Optimizations for JOINs such as using vectorized hashing. We started adding statistics and cost-based optimizations. We choose the smaller side of a join as the build side if possible. Improved parallelism when reading partitioned Parquet data sources Concurrent writes of CSV and Parquet partitions to files Parquet Crate The Parquet crate has the following improvements: Nested reading Support for writing booleans Support for writing temporal types Roadmap for 4.0.0 We have also started building up a shared community roadmap for 4.0: Apache Arrow: Crowd Sourced Rust Roadmap for Arrow 4.0, January 2021." />
<meta property="og:description" content="The Apache Arrow team is pleased to announce the 3.0.0 release. This covers over 3 months of development work and includes 666 resolved issues from 106 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Columnar Format Notes The Decimal256 data type, which was already supported by the Arrow columnar format specification, is now implemented in C++ and Java (ARROW-9747). Arrow Flight RPC notes Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers. Support for cookies has also been added. The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations. A basic Flight implementation for C#/.NET has been added. See the implementation status matrix for details. C++ notes The default memory pool can now be changed at runtime using the environment variable ARROW_DEFAULT_MEMORY_POOL (ARROW-11009). The environment variable is inspected at process startup. This is useful when trying to diagnose memory consumption issues with Arrow. STL-like iterators are now provided over concrete arrays. Those are useful for non-performance critical tasks, for example testing (ARROW-10776). It is now possible to concatenate dictionary arrays with unequal dictionaries. The dictionaries are unified when concatenating, for supported data types (ARROW-5336). Threads in a thread pool are now spawned lazily as needed for enqueued tasks, up to the configured capacity. They used to be spawned upfront on creation of the thread pool (ARROW-10038). 
Compute layer Comprehensive documentation for compute functions is now available: https://arrow.apache.org/docs/cpp/compute.html Compute functions for string processing have been added for: splitting on whitespace (ASCII and Unicode flavors) and splitting on a pattern (ARROW-9991); trimming characters (ARROW-9128). Behavior of the index_in and is_in compute functions with nulls has been changed for consistency (ARROW-10663). Multiple-column sort kernels are now available for tables and record batches (ARROW-8199, ARROW-10796, ARROW-10790). Performance of table filtering has been vastly improved (ARROW-10569). Scalar arguments are now accepted for more compute functions. Compute functions quantile (ARROW-10831) and is_nan (ARROW-11043) have been added for numeric data. Aggregation functions any (ARROW-1846) and all (ARROW-10301) have been added for boolean data. Dataset The Expression hierarchy has been simplified to a wrapper around literals, field references, or calls to named functions. This enables usage of any compute function while filtering with no boilerplate. Parquet statistics are lazily parsed in ParquetDatasetFactory and ParquetFileFragment for shorter construction time. CSV Conversion of string columns is now faster thanks to faster UTF-8 validation of small strings (ARROW-10313). Conversion of floating-point columns is now faster thanks to optimized string-to-double conversion routines (ARROW-10328). Parsing of ISO8601 timestamps is now more liberal: trailing zeros can be omitted in the fractional part (ARROW-10337). Fixed a bug where null detection could give the wrong results on some platforms (ARROW-11067). Added type inference for Date32 columns for values in the form YYYY-MM-DD (ARROW-11247). Feather Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163). Filesystem layer S3 recursive tree walks now benefit from a parallel implementation, where reads of multiple child directories are now issued concurrently (ARROW-10788). 
Improved empty directory detection to be mindful of differences between Amazon and Minio S3 implementations (ARROW-10942). Flight RPC IPv6 host addresses are now supported (ARROW-10475). IPC It is now possible to emit dictionary deltas where possible using the IPC stream writer. This is governed by a new variable in the IpcWriteOptions class (ARROW-6883). It is now possible to read wider tables, which used to fail due to reaching a limit during Flatbuffers verification (ARROW-10056). Parquet Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet implementation (ARROW-11301). Fixed a bug where writing multiple batches of nullable nested strings to Parquet would not write any data in batches after the first one (ARROW-10493). The Decimal256 data type can be read from and written to Parquet (ARROW-10607). LargeString and LargeBinary data can now be written to Parquet (ARROW-10426). C# notes The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages: Apache.Arrow.Flight (client) and Apache.Arrow.Flight.AspNetCore (server). Also fixed an issue where ArrowStreamWriter wasn&#39;t writing schema metadata to Arrow streams. Julia notes This is the first release to officially include an implementation for the Julia language. The pure Julia implementation includes support for wide coverage of the format specification. Additional details can be found in the julialang.org blog post. Python notes Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5 was removed (ARROW-5679). Support for building manylinux1 packages has been removed (ARROW-11212). PyArrow continues to be available as manylinux2010 and manylinux2014 wheels. The minimum required version for NumPy is now 1.16.6. Note that when upgrading NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility, as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833). 
Compute functions are now automatically exported from C++ to the pyarrow.compute module, and they have docstrings matching their C++ definition. An iter_batches() method is now available for reading a Parquet file iteratively (ARROW-7800). Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory pool) are now available from Python (ARROW-11049). Fixed a potential deadlock when importing pandas from several threads (ARROW-10519). See the C++ notes above for additional details. R notes This release contains new features for the Flight RPC wrapper, better support for saving R metadata (including sf spatial data) to Feather and Parquet files, several significant improvements to speed and memory management, and many other enhancements. For more on what’s in the 3.0.0 R package, see the R changelog. Ruby and C GLib notes Ruby In the Ruby bindings, 256-bit decimal support and Arrow::FixedBinaryArrayBuilder have been added, mirroring C GLib below. C GLib Version 3.0.0 of C GLib brings many new features. Chunked arrays, record batches, and tables now support the sort_indices function, in addition to arrays. These functions, including the array version, support specifying sort options. garrow_array_sort_to_indices has been renamed to garrow_array_sort_indices and the previous name has been deprecated. GArrowField supports functions to handle metadata. GArrowSchema supports the garrow_schema_has_metadata() function. GArrowArrayBuilder supports adding a single null, multiple nulls, a single empty value, and multiple empty values. GArrowFixedSizedBinaryArrayBuilder is newly supported. 256-bit decimal and extension types are newly supported. The filesystem module supports Mock, HDFS, and S3 file systems. The dataset module supports CSV, IPC, and Parquet file formats. 
Rust notes Core Arrow Crate The development of the arrow crate was focused on four main aspects: Make the crate usable in stable Rust Bug fixing and removal of unsafe code Extend functionality to keep up with the specification Increase performance of existing kernels Stable Rust Possibly the biggest news for this release is that all project crates, including arrow, parquet, and datafusion, now build with stable Rust by default. Nightly / unstable Rust is still required when enabling the SIMD feature. Parquet Arrow writer The Parquet Writer for Arrow arrays is now available, allowing Rust programs to easily read and write Parquet files and making it easier to integrate with the overall Arrow ecosystem. The reader and writer include both basic and nested type support (List, Dictionary, Struct). First Class Arrow Flight IPC Support In this release, the Arrow Flight IPC implementation in Rust became fully-featured enough to participate in the regular cross-language integration tests, thus ensuring Rust applications written using Arrow can interoperate with the rest of the ecosystem. Performance There have been numerous performance improvements in this release across the board. This includes both kernel operations, such as take, filter, and cast, as well as more fundamental parts such as bitwise comparison and reading and writing to CSV. Increased Data Type Support New DataTypes: Decimal data type for fixed-width decimal values Improved operation support for nested structures Dictionary, and Lists (filter, take, etc.) Other improvements: Added support for Date and time on FFI Added support for Binary type on FFI Added support for i64-sized arrays to the “take” kernel Support for the i128 Decimal Type Added support to cast string to date Added API to create arrays out of existing arrays (e.g. 
for join, merge-sort, concatenate) The simd feature is now also available on aarch64 API Changes BooleanArray is no longer a PrimitiveArray ArrowNativeType no longer includes bool since Arrow&#39;s boolean type is represented using bitpacking Several Buffer methods are now infallible instead of returning a Result DataType::List now contains a Field to track metadata about the contained elements PrimitiveArray::raw_values, values_slice and values methods were replaced by a values method returning a slice Buffer::data and raw_data were renamed to as_slice and as_ptr MutableBuffer::data_mut and freeze were renamed to as_slice_mut and into to be more consistent with the stdlib naming conventions The generic type parameter for BufferBuilder was changed from ArrowPrimitiveType to ArrowNativeType DataFusion SQL In this release, we clarified that DataFusion will standardize on the PostgreSQL SQL dialect. New SQL support: JOIN, LEFT JOIN, RIGHT JOIN COUNT DISTINCT CASE WHEN USING BETWEEN IS IN Nested SELECT statements Nested expressions in aggregations LOWER(), UPPER(), TRIM() NULLIF() SHA224(), SHA256(), SHA384(), SHA512() DATE_TRUNC() Performance There have been numerous performance improvements in this release: Optimizations for JOINs such as using vectorized hashing. We started adding statistics and cost-based optimizations. We choose the smaller side of a join as the build side if possible. Improved parallelism when reading partitioned Parquet data sources Concurrent writes of CSV and Parquet partitions to files Parquet Crate The Parquet crate has the following improvements: Nested reading Support for writing booleans Support for writing temporal types Roadmap for 4.0.0 We have also started building up a shared community roadmap for 4.0: Apache Arrow: Crowd Sourced Rust Roadmap for Arrow 4.0, January 2021." />
<link rel="canonical" href="https://arrow.apache.org/blog/2021/01/25/3.0.0-release/" />
<meta property="og:url" content="https://arrow.apache.org/blog/2021/01/25/3.0.0-release/" />
<meta property="og:site_name" content="Apache Arrow" />
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2021-01-25T01:00:00-05:00" />
<meta name="twitter:card" content="summary_large_image" />
<meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="twitter:title" content="Apache Arrow 3.0.0 Release" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"pmc"},"dateModified":"2021-01-25T01:00:00-05:00","datePublished":"2021-01-25T01:00:00-05:00","description":"The Apache Arrow team is pleased to announce the 3.0.0 release. This covers over 3 months of development work and includes 666 resolved issues from 106 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Columnar Format Notes The Decimal256 data type, which was already supported by the Arrow columnar format specification, is now implemented in C++ and Java (ARROW-9747). Arrow Flight RPC notes Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers. Support for cookies has also been added. The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations. A basic Flight implementation for C#/.NET has been added. See the implementation status matrix for details. C++ notes The default memory pool can now be changed at runtime using the environment variable ARROW_DEFAULT_MEMORY_POOL (ARROW-11009). The environment variable is inspected at process startup. This is useful when trying to diagnose memory consumption issues with Arrow. STL-like iterators are now provided over concrete arrays. Those are useful for non-performance critical tasks, for example testing (ARROW-10776). It is now possible to concatenate dictionary arrays with unequal dictionaries. The dictionaries are unified when concatenating, for supported data types (ARROW-5336). Threads in a thread pool are now spawned lazily as needed for enqueued tasks, up to the configured capacity. 
They used to be spawned upfront on creation of the thread pool (ARROW-10038). Compute layer Comprehensive documentation for compute functions is now available: https://arrow.apache.org/docs/cpp/compute.html Compute functions for string processing have been added for: splitting on whitespace (ASCII and Unicode flavors) and splitting on a pattern (ARROW-9991); trimming characters (ARROW-9128). Behavior of the index_in and is_in compute functions with nulls has been changed for consistency (ARROW-10663). Multiple-column sort kernels are now available for tables and record batches (ARROW-8199, ARROW-10796, ARROW-10790). Performance of table filtering has been vastly improved (ARROW-10569). Scalar arguments are now accepted for more compute functions. Compute functions quantile (ARROW-10831) and is_nan (ARROW-11043) have been added for numeric data. Aggregation functions any (ARROW-1846) and all (ARROW-10301) have been added for boolean data. Dataset The Expression hierarchy has been simplified to a wrapper around literals, field references, or calls to named functions. This enables usage of any compute function while filtering with no boilerplate. Parquet statistics are lazily parsed in ParquetDatasetFactory and ParquetFileFragment for shorter construction time. CSV Conversion of string columns is now faster thanks to faster UTF-8 validation of small strings (ARROW-10313). Conversion of floating-point columns is now faster thanks to optimized string-to-double conversion routines (ARROW-10328). Parsing of ISO8601 timestamps is now more liberal: trailing zeros can be omitted in the fractional part (ARROW-10337). Fixed a bug where null detection could give the wrong results on some platforms (ARROW-11067). Added type inference for Date32 columns for values in the form YYYY-MM-DD (ARROW-11247). Feather Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163). 
Filesystem layer S3 recursive tree walks now benefit from a parallel implementation, where reads of multiple child directories are now issued concurrently (ARROW-10788). Improved empty directory detection to be mindful of differences between Amazon and Minio S3 implementations (ARROW-10942). Flight RPC IPv6 host addresses are now supported (ARROW-10475). IPC It is now possible to emit dictionary deltas where possible using the IPC stream writer. This is governed by a new variable in the IpcWriteOptions class (ARROW-6883). It is now possible to read wider tables, which used to fail due to reaching a limit during Flatbuffers verification (ARROW-10056). Parquet Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet implementation (ARROW-11301). Fixed a bug where writing multiple batches of nullable nested strings to Parquet would not write any data in batches after the first one (ARROW-10493). The Decimal256 data type can be read from and written to Parquet (ARROW-10607). LargeString and LargeBinary data can now be written to Parquet (ARROW-10426). C# notes The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages: Apache.Arrow.Flight (client) and Apache.Arrow.Flight.AspNetCore (server). Also fixed an issue where ArrowStreamWriter wasn&#39;t writing schema metadata to Arrow streams. Julia notes This is the first release to officially include an implementation for the Julia language. The pure Julia implementation includes support for wide coverage of the format specification. Additional details can be found in the julialang.org blog post. Python notes Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5 was removed (ARROW-5679). Support for building manylinux1 packages has been removed (ARROW-11212). PyArrow continues to be available as manylinux2010 and manylinux2014 wheels. The minimum required version for NumPy is now 1.16.6. 
Note that when upgrading NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility, as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833). Compute functions are now automatically exported from C++ to the pyarrow.compute module, and they have docstrings matching their C++ definition. An iter_batches() method is now available for reading a Parquet file iteratively (ARROW-7800). Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory pool) are now available from Python (ARROW-11049). Fixed a potential deadlock when importing pandas from several threads (ARROW-10519). See the C++ notes above for additional details. R notes This release contains new features for the Flight RPC wrapper, better support for saving R metadata (including sf spatial data) to Feather and Parquet files, several significant improvements to speed and memory management, and many other enhancements. For more on what’s in the 3.0.0 R package, see the R changelog. Ruby and C GLib notes Ruby In the Ruby bindings, 256-bit decimal support and Arrow::FixedBinaryArrayBuilder have been added, mirroring C GLib below. C GLib Version 3.0.0 of C GLib brings many new features. Chunked arrays, record batches, and tables now support the sort_indices function, in addition to arrays. These functions, including the array version, support specifying sort options. garrow_array_sort_to_indices has been renamed to garrow_array_sort_indices and the previous name has been deprecated. GArrowField supports functions to handle metadata. GArrowSchema supports the garrow_schema_has_metadata() function. GArrowArrayBuilder supports adding a single null, multiple nulls, a single empty value, and multiple empty values. GArrowFixedSizedBinaryArrayBuilder is newly supported. 256-bit decimal and extension types are newly supported. The filesystem module supports Mock, HDFS, and S3 file systems. The dataset module supports CSV, IPC, and Parquet file formats. 
Rust notes Core Arrow Crate The development of the arrow crate was focused on four main aspects: Make the crate usable in stable Rust Bug fixing and removal of unsafe code Extend functionality to keep up with the specification Increase performance of existing kernels Stable Rust Possibly the biggest news for this release is that all project crates, including arrow, parquet, and datafusion, now build with stable Rust by default. Nightly / unstable Rust is still required when enabling the SIMD feature. Parquet Arrow writer The Parquet Writer for Arrow arrays is now available, allowing Rust programs to easily read and write Parquet files and making it easier to integrate with the overall Arrow ecosystem. The reader and writer include both basic and nested type support (List, Dictionary, Struct). First Class Arrow Flight IPC Support In this release, the Arrow Flight IPC implementation in Rust became fully-featured enough to participate in the regular cross-language integration tests, thus ensuring Rust applications written using Arrow can interoperate with the rest of the ecosystem. Performance There have been numerous performance improvements in this release across the board. This includes both kernel operations, such as take, filter, and cast, as well as more fundamental parts such as bitwise comparison and reading and writing to CSV. Increased Data Type Support New DataTypes: Decimal data type for fixed-width decimal values Improved operation support for nested structures Dictionary, and Lists (filter, take, etc.) Other improvements: Added support for Date and time on FFI Added support for Binary type on FFI Added support for i64-sized arrays to the “take” kernel Support for the i128 Decimal Type Added support to cast string to date Added API to create arrays out of existing arrays (e.g. 
for join, merge-sort, concatenate) The simd feature is now also available on aarch64 API Changes BooleanArray is no longer a PrimitiveArray ArrowNativeType no longer includes bool since arrows boolean type is represented using bitpacking Several Buffer methods are now infallible instead of returning a Result DataType::List now contains a Field to track metadata about the contained elements PrimitiveArray::raw_values, values_slice and values methods got replaced by a values method returning a slice Buffer::data and raw_data were renamed to as_slice and as_ptr MutableBuffer::data_mut and freeze were renamed to as_slice_mut and into to be more consistent with the stdlib naming conventions The generic type parameter for BufferBuilder was changed from ArrowPrimitiveType to ArrowNativeType DataFusion SQL In this release, we clarified that DataFusion will standardize on the PostgreSQL SQL dialect. New SQL support: JOIN, LEFT JOIN, RIGHT JOIN COUNT DISTINCT CASE WHEN USING BETWEEN IS IN Nested SELECT statements Nested expressions in aggregations LOWER(), UPPER(), TRIM() NULLIF() SHA224(), SHA256(), SHA384(), SHA512() DATE_TRUNC() Performance There have been numerous performance improvements in this release: Optimizations for JOINs such as using vectorized hashing. We started with adding statistics and cost-based optimizations. We choose the smaller side of a join as the build side if possible. 
Improved parallelism when reading partitioned Parquet data sources Concurrent writes of CSV and Parquet partitions to file Parquet Crate The Parquet has the following improvements: Nested reading Support to write booleans Add support to write temporal types Roadmap for 4.0.0 We have also started building up a shared community roadmap for 4.0: Apache Arrow: Crowd Sourced Rust Roadmap for Arrow 4.0, January 2021.","headline":"Apache Arrow 3.0.0 Release","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2021/01/25/3.0.0-release/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"url":"https://arrow.apache.org/blog/2021/01/25/3.0.0-release/"}</script>
<!-- End Jekyll SEO tag -->
<!-- favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6">
<!-- dark mode favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
<script>
// Switch to the dark-mode favicons if prefers-color-scheme: dark
function onUpdate() {
light1 = document.querySelector('link#light1');
light2 = document.querySelector('link#light2');
light3 = document.querySelector('link#light3');
light4 = document.querySelector('link#light4');
light5 = document.querySelector('link#light5');
light6 = document.querySelector('link#light6');
dark1 = document.querySelector('link#dark1');
dark2 = document.querySelector('link#dark2');
dark3 = document.querySelector('link#dark3');
dark4 = document.querySelector('link#dark4');
dark5 = document.querySelector('link#dark5');
dark6 = document.querySelector('link#dark6');
if (matcher.matches) {
light1.remove();
light2.remove();
light3.remove();
light4.remove();
light5.remove();
light6.remove();
document.head.append(dark1);
document.head.append(dark2);
document.head.append(dark3);
document.head.append(dark4);
document.head.append(dark5);
document.head.append(dark6);
} else {
dark1.remove();
dark2.remove();
dark3.remove();
dark4.remove();
dark5.remove();
dark6.remove();
document.head.append(light1);
document.head.append(light2);
document.head.append(light3);
document.head.append(light4);
document.head.append(light5);
document.head.append(light6);
}
}
matcher = window.matchMedia('(prefers-color-scheme: dark)');
matcher.addListener(onUpdate);
onUpdate();
</script>
<link href="/css/main.css" rel="stylesheet">
<link href="/css/syntax.css" rel="stylesheet">
<script src="/javascript/main.js"></script>
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
<link type="application/atom+xml" rel="alternate" href="https://arrow.apache.org/feed.xml" title="Apache Arrow" />
</head>
<body class="wrap">
<header>
<nav class="navbar navbar-expand-md navbar-dark bg-dark">
<a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"></a>
<button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse justify-content-end" id="arrow-navbar">
<ul class="nav navbar-nav">
<li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
<li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li>
<li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownGetArrow" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Get Arrow
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
<a class="dropdown-item" href="/install/">Install</a>
<a class="dropdown-item" href="/release/">Releases</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownDocumentation" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Docs
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="/docs">Project Docs</a>
<a class="dropdown-item" href="/docs/format/Columnar.html">Format</a>
<hr>
<a class="dropdown-item" href="/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="/docs/cpp">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="/docs/java">Java</a>
<a class="dropdown-item" href="/docs/js">JavaScript</a>
<a class="dropdown-item" href="/julia/">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="/docs/python">Python</a>
<a class="dropdown-item" href="/docs/r">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://docs.rs/arrow/latest" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="/swift">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSource" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Source
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSource">
<a class="dropdown-item" href="https://github.com/apache/arrow" target="_blank" rel="noopener">Main Repo</a>
<hr>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/c_glib" target="_blank" rel="noopener">C GLib</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/cpp" target="_blank" rel="noopener">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/csharp" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-go" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-java" target="_blank" rel="noopener">Java</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-js" target="_blank" rel="noopener">JavaScript</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-julia" target="_blank" rel="noopener">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/matlab" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/python" target="_blank" rel="noopener">Python</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/r" target="_blank" rel="noopener">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/ruby" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-rs" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-swift" target="_blank" rel="noopener">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSubprojects" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Subprojects
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects">
<a class="dropdown-item" href="/adbc">ADBC</a>
<a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a>
<a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a>
<a class="dropdown-item" href="https://datafusion.apache.org" target="_blank" rel="noopener">DataFusion</a>
<a class="dropdown-item" href="/nanoarrow">nanoarrow</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownCommunity" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Community
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
<a class="dropdown-item" href="/community/">Communication</a>
<a class="dropdown-item" href="/docs/developers/index.html">Contributing</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/issues" target="_blank" rel="noopener">Issue Tracker</a>
<a class="dropdown-item" href="/committers/">Governance</a>
<a class="dropdown-item" href="/use_cases/">Use Cases</a>
<a class="dropdown-item" href="/powered_by/">Powered By</a>
<a class="dropdown-item" href="/visual_identity/">Visual Identity</a>
<a class="dropdown-item" href="/security/">Security</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener">Code of Conduct</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownASF" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
ASF Links
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF">
<a class="dropdown-item" href="https://www.apache.org/" target="_blank" rel="noopener">ASF Website</a>
<a class="dropdown-item" href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">Donate</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">Thanks</a>
<a class="dropdown-item" href="https://www.apache.org/security/" target="_blank" rel="noopener">Security</a>
</div>
</li>
</ul>
</div>
<!-- /.navbar-collapse -->
</nav>
</header>
<div class="container p-4 pt-5">
<div class="col-md-8 mx-auto">
<main role="main" class="pb-5">
<h1>
Apache Arrow 3.0.0 Release
</h1>
<hr class="mt-4 mb-3">
<p class="mb-4 pb-1">
<span class="badge badge-secondary">Published</span>
<span class="published mr-3">
25 Jan 2021
</span>
<br>
<span class="badge badge-secondary">By</span>
<a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a>
</p>
<!--
-->
<p>The Apache Arrow team is pleased to announce the 3.0.0 release. This covers
over 3 months of development work and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%203.0.0" target="_blank" rel="noopener"><strong>666 resolved issues</strong></a>
from <a href="/release/3.0.0.html#contributors"><strong>106 distinct contributors</strong></a>. See the <a href="/install/">Install Page</a> to learn how to
get the libraries for your platform.</p>
<p>The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the <a href="/release/3.0.0.html">complete changelog</a>.</p>
<h2>Columnar Format Notes</h2>
<p>The Decimal256 data type, which was already supported by the Arrow columnar
format specification, is now implemented in C++ and Java (ARROW-9747).</p>
<h2>Arrow Flight RPC notes</h2>
<p>Authentication in C++/Java/Python has been overhauled, allowing more flexible authentication methods and use of standard headers.
Support for cookies has also been added.
The C++/Java implementations are now more permissive when parsing messages in order to interoperate better with other Flight implementations.</p>
<p>A basic Flight implementation for C#/.NET has been added.
See the <a href="https://arrow.apache.org/docs/status.html#flight-rpc">implementation status matrix</a> for details.</p>
<h2>C++ notes</h2>
<p>The default memory pool can now be changed at runtime using the environment
variable <code>ARROW_DEFAULT_MEMORY_POOL</code> (ARROW-11009). The environment variable
is inspected at process startup. This is useful when trying to diagnose memory
consumption issues with Arrow.</p>
<p>STL-like iterators are now provided over concrete arrays. These are useful for
non-performance-critical tasks, for example testing (ARROW-10776).</p>
<p>It is now possible to concatenate dictionary arrays with unequal dictionaries.
The dictionaries are unified when concatenating, for supported data types
(ARROW-5336).</p>
<p>Threads in a thread pool are now spawned lazily as needed for enqueued
tasks, up to the configured capacity. They used to be spawned upfront on
creation of the thread pool (ARROW-10038).</p>
<h3>Compute layer</h3>
<p>Comprehensive documentation for compute functions is now available:
<a href="https://arrow.apache.org/docs/cpp/compute.html">https://arrow.apache.org/docs/cpp/compute.html</a></p>
<p>Compute functions for string processing have been added for:</p>
<ul>
<li>splitting on whitespace (ASCII and Unicode flavors) and splitting on a
pattern (ARROW-9991);</li>
<li>trimming characters (ARROW-9128).</li>
</ul>
<p>Behavior of the <code>index_in</code> and <code>is_in</code> compute functions with nulls has been
changed for consistency (ARROW-10663).</p>
<p>Multiple-column sort kernels are now available for tables and record batches
(ARROW-8199, ARROW-10796, ARROW-10790).</p>
<p>Performance of table filtering has been vastly improved (ARROW-10569).</p>
<p>Scalar arguments are now accepted for more compute functions.</p>
<p>Compute functions <code>quantile</code> (ARROW-10831) and <code>is_nan</code> (ARROW-11043) have been
added for numeric data.</p>
<p>Aggregation functions <code>any</code> (ARROW-1846) and <code>all</code> (ARROW-10301) have been
added for boolean data.</p>
<h3>Dataset</h3>
<p>The <code>Expression</code> hierarchy has been simplified to a wrapper around literals, field references,
or calls to named functions. This enables the use of any compute function while filtering,
with no boilerplate.</p>
<p>Parquet statistics are lazily parsed in <code>ParquetDatasetFactory</code> and
<code>ParquetFileFragment</code> for shorter construction time.</p>
<h3>CSV</h3>
<p>Conversion of string columns is now faster thanks to faster UTF-8 validation
of small strings (ARROW-10313).</p>
<p>Conversion of floating-point columns is now faster thanks to optimized
string-to-double conversion routines (ARROW-10328).</p>
<p>Parsing of ISO8601 timestamps is now more liberal: trailing zeros can
be omitted in the fractional part (ARROW-10337).</p>
<p>Fixed a bug where null detection could give the wrong results on some platforms
(ARROW-11067).</p>
<p>Added type inference for Date32 columns for values in the form <code>YYYY-MM-DD</code>
(ARROW-11247).</p>
<h3>Feather</h3>
<p>Fixed reading of compressed Feather files written with Arrow 0.17 (ARROW-11163).</p>
<h3>Filesystem layer</h3>
<p>S3 recursive tree walks now benefit from a parallel implementation: reads of
multiple child directories are issued concurrently (ARROW-10788).</p>
<p>Improved empty directory detection to be mindful of differences between the Amazon
and MinIO S3 implementations (ARROW-10942).</p>
<h3>Flight RPC</h3>
<p>IPv6 host addresses are now supported (ARROW-10475).</p>
<h3>IPC</h3>
<p>The IPC stream writer can now emit dictionary deltas where possible. This is
governed by a new option in the <code>IpcWriteOptions</code> class
(ARROW-6883).</p>
<p>It is now possible to read wider tables, which used to fail due to reaching a
limit during Flatbuffers verification (ARROW-10056).</p>
<h3>Parquet</h3>
<p>Fixed reading of LZ4-compressed Parquet columns emitted by the Java Parquet
implementation (ARROW-11301).</p>
<p>Fixed a bug where writing multiple batches of nullable nested strings to Parquet
would not write any data in batches after the first one (ARROW-10493).</p>
<p>The Decimal256 data type can be read from and written to Parquet (ARROW-10607).</p>
<p>LargeString and LargeBinary data can now be written to Parquet (ARROW-10426).</p>
<h2>C# notes</h2>
<p>The .NET package added initial support for Arrow Flight clients and servers. Support is enabled through two new NuGet packages <a href="https://www.nuget.org/packages/Apache.Arrow.Flight/" target="_blank" rel="noopener">Apache.Arrow.Flight</a> (client) and <a href="https://www.nuget.org/packages/Apache.Arrow.Flight.AspNetCore/" target="_blank" rel="noopener">Apache.Arrow.Flight.AspNetCore</a> (server).</p>
<p>Also fixed an issue where <code>ArrowStreamWriter</code> wasn't writing schema metadata to Arrow streams.</p>
<h2>Julia notes</h2>
<p>This is the first release to officially include
<a href="https://github.com/apache/arrow/tree/master/julia/Arrow" target="_blank" rel="noopener">an implementation</a>
for the Julia language. The pure Julia implementation includes support
for <a href="https://arrow.apache.org/docs/status.html">wide coverage of the format specification</a>.
Additional details can be found in the
<a href="https://julialang.org/blog/2021/01/arrow/" target="_blank" rel="noopener">julialang.org blog post</a>.</p>
<h2>Python notes</h2>
<p>Support for Python 3.9 was added (ARROW-10224), and support for Python 3.5
was removed (ARROW-5679).</p>
<p>Support for building manylinux1 packages has been removed (ARROW-11212).
PyArrow continues to be available as manylinux2010 and manylinux2014 wheels.</p>
<p>The minimal required version for NumPy is now 1.16.6. Note that when upgrading
NumPy to 1.20, you also need to upgrade pyarrow to 3.0.0 to ensure compatibility,
as this pyarrow release fixed a compatibility issue with NumPy 1.20 (ARROW-10833).</p>
<p>Compute functions are now automatically exported from C++ to the <code>pyarrow.compute</code>
module, and they have docstrings matching their C++ definition.</p>
<p>An <code>iter_batches()</code> method is now available for reading a Parquet file iteratively
(ARROW-7800).</p>
<p>Alternate memory pools (such as mimalloc, jemalloc or the C malloc-based memory
pool) are now available from Python (ARROW-11049).</p>
<p>Fixed a potential deadlock when importing pandas from several threads (ARROW-10519).</p>
<p>See the C++ notes above for additional details.</p>
<h2>R notes</h2>
<p>This release contains new features for the Flight RPC wrapper, better
support for saving R metadata (including <code>sf</code> spatial data) to Feather and
Parquet files, several significant improvements to speed and memory management,
and many other enhancements.</p>
<p>For more on what’s in the 3.0.0 R package, see the <a href="/docs/r/news/">R changelog</a>.</p>
<h2>Ruby and C GLib notes</h2>
<h3>Ruby</h3>
<p>The Ruby bindings add 256-bit decimal support and <code>Arrow::FixedBinaryArrayBuilder</code>, mirroring the C GLib additions below.</p>
<h3>C GLib</h3>
<p>Version 3.0.0 of C GLib includes many new features.</p>
<p>Chunked arrays, record batches, and tables now support a <code>sort_indices</code> function, as arrays already did.
These functions, including the array variant, support specifying sorting options.
<code>garrow_array_sort_to_indices</code> has been renamed to <code>garrow_array_sort_indices</code>, and the previous name has been deprecated.</p>
<p><code>GArrowField</code> now provides functions to handle metadata.
<code>GArrowSchema</code> gains a <code>garrow_schema_has_metadata()</code> function.</p>
<p><code>GArrowArrayBuilder</code> now supports appending a single null, multiple nulls, a single empty value, and multiple empty values.
<code>GArrowFixedSizedBinaryArrayBuilder</code> is newly supported.</p>
<p>256-bit decimal and extension types are newly supported.
The filesystem module supports the Mock, HDFS, and S3 file systems.
The dataset module supports the CSV, IPC, and Parquet file formats.</p>
<h2>Rust notes</h2>
<h3>Core Arrow Crate</h3>
<p>The development of the arrow crate was focused on four main aspects:</p>
<ul>
<li>Make the crate usable in stable Rust</li>
<li>Bug fixing and removal of <code>unsafe</code> code</li>
<li>Extend functionality to keep up with the specification</li>
<li>Increase performance of existing kernels</li>
</ul>
<h4>Stable Rust</h4>
<p>Possibly the biggest news for this release is that all project crates, including <code>arrow</code>, <code>parquet</code>,
and <code>datafusion</code>, now build with stable Rust by default. Nightly / unstable Rust is still required when enabling the SIMD feature.</p>
<h4>Parquet Arrow writer</h4>
<p>The Parquet writer for Arrow arrays is now available, allowing Rust programs to easily read and write Parquet
files and making it easier to integrate with the overall Arrow ecosystem. The reader and writer include both basic
and nested type support (List, Dictionary, Struct).</p>
<h4>First Class Arrow Flight IPC Support</h4>
<p>In this release, the Arrow Flight IPC implementation in Rust became fully featured enough to participate in the
regular cross-language integration tests, ensuring that Rust applications written using Arrow can interoperate with
the rest of the ecosystem.</p>
<h4>Performance</h4>
<p>There have been numerous performance improvements in this release across the board. This includes both kernel
operations, such as <code>take</code>, <code>filter</code>, and <code>cast</code>, as well as more fundamental parts such as bitwise comparison
and reading and writing to CSV.</p>
<h4>Increased Data Type Support</h4>
<p>New DataTypes:</p>
<ul>
<li>Decimal data type for fixed-width decimal values</li>
</ul>
<p>Improved operation support (<code>filter</code>, <code>take</code>, etc.) for nested structures such as Dictionary and List.</p>
<h4>Other improvements</h4>
<ul>
<li>Added support for date and time types on FFI</li>
<li>Added support for the Binary type on FFI</li>
<li>Added support for i64-sized arrays to the <code>take</code> kernel</li>
<li>Support for the <code>i128</code> Decimal type</li>
<li>Added support for casting string to date</li>
<li>Added an API to create arrays out of existing arrays (e.g. for join, merge-sort, concatenate)</li>
<li>The <code>simd</code> feature is now also available on aarch64</li>
</ul>
<h4>API Changes</h4>
<ul>
<li><code>BooleanArray</code> is no longer a <code>PrimitiveArray</code></li>
<li><code>ArrowNativeType</code> no longer includes <code>bool</code>, since Arrow's boolean type is represented using bit-packing</li>
<li>Several <code>Buffer</code> methods are now infallible instead of returning a <code>Result</code></li>
<li><code>DataType::List</code> now contains a <code>Field</code> to track metadata about the contained elements</li>
<li>The <code>PrimitiveArray::raw_values</code>, <code>values_slice</code> and <code>values</code> methods were replaced by a single <code>values</code> method returning a slice</li>
<li><code>Buffer::data</code> and <code>raw_data</code> were renamed to <code>as_slice</code> and <code>as_ptr</code></li>
<li><code>MutableBuffer::data_mut</code> and <code>freeze</code> were renamed to <code>as_slice_mut</code> and <code>into</code> to be more consistent with stdlib naming conventions</li>
<li>The generic type parameter for <code>BufferBuilder</code> was changed from <code>ArrowPrimitiveType</code> to <code>ArrowNativeType</code></li>
</ul>
<h3>DataFusion</h3>
<h4>SQL</h4>
<p>In this release, we clarified that DataFusion will standardize on the PostgreSQL SQL dialect.</p>
<p>New SQL support:</p>
<ul>
<li>JOIN, LEFT JOIN, RIGHT JOIN</li>
<li>COUNT DISTINCT</li>
<li>CASE WHEN</li>
<li>USING</li>
<li>BETWEEN</li>
<li>IS IN</li>
<li>Nested SELECT statements</li>
<li>Nested expressions in aggregations</li>
<li>LOWER(), UPPER(), TRIM()</li>
<li>NULLIF()</li>
<li>SHA224(), SHA256(), SHA384(), SHA512()</li>
<li>DATE_TRUNC()</li>
</ul>
<h4>Performance</h4>
<p>There have been numerous performance improvements in this release:</p>
<ul>
<li>Optimizations for JOINs such as using vectorized hashing.</li>
<li>We started adding statistics and cost-based optimizations; the smaller side of a join is now chosen as the build
side when possible.</li>
<li>Improved parallelism when reading partitioned Parquet data sources</li>
<li>Concurrent writes of CSV and Parquet partitions to file</li>
</ul>
<h3>Parquet Crate</h3>
<p>The Parquet crate has the following improvements:</p>
<ul>
<li>Nested reading</li>
<li>Support for writing booleans</li>
<li>Support for writing temporal types</li>
</ul>
<h3>Roadmap for 4.0.0</h3>
<p>We have also started building up a shared community roadmap for 4.0: <a href="https://docs.google.com/document/d/1qspsOM_dknOxJKdGvKbC1aoVoO0M3i6x1CIo58mmN2Y/edit#heading=h.kstb571j5g5j" target="_blank" rel="noopener">Apache Arrow: Crowd Sourced Rust Roadmap for
Arrow 4.0, January 2021</a>.</p>
</main>
</div>
<hr>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p>Apache Arrow, Arrow, Apache, the Apache logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
<p>© 2016-2025 The Apache Software Foundation</p>
</div>
<div class="col-md-3">
<a class="d-sm-none d-md-inline pr-2" href="https://www.apache.org/events/current-event.html" target="_blank" rel="noopener">
<img src="https://www.apache.org/events/current-event-234x60.png">
</a>
</div>
</div>
</footer>
</div>
</body>
</html>