The Apache Arrow team is pleased to announce the 15.0.0 release. This covers over 3 months of development work and includes 344 resolved issues on [536 distinct commits][2] from [101 distinct contributors][2]. See the Install Page to learn how to get the libraries for your platform.
The release notes below are not exhaustive and only cover selected highlights of the release. Many other bug fixes and improvements have been made; we refer you to the [complete changelog][3].
Since the 14.0.0 release, Curt Hagenlocher, Xuwei Fu, James Duong and Felipe Oliveira Carvalho have been invited to be committers. Jonathan Keane and Raúl Cumplido have joined the Project Management Committee (PMC).
As per our tradition of rotating the PMC chair once a year, Andy Grove was elected as the new PMC chair and VP.
Thanks for your contributions and participation in the project!
New C Data Interface format strings have been added for the ListView, LargeListView, BinaryView and StringView array types.
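As an illustration, here is a minimal Python sketch of a lookup table for these format strings. The four view-type strings (`vu`, `vz`, `+vl`, `+vL`) and the older strings shown for comparison are taken from the Arrow C Data Interface specification. The dictionaries and the `describe_format` / `is_view_format` helpers are hypothetical, written only for this example. The table covers just a handful of the format strings the specification defines.

```python
# View-type format strings in the Arrow C Data Interface.
VIEW_FORMATS = {
    "vu": "StringView (UTF-8 view)",
    "vz": "BinaryView",
    "+vl": "ListView (32-bit offsets and sizes)",
    "+vL": "LargeListView (64-bit offsets and sizes)",
}

# The corresponding non-view format strings, for comparison.
CLASSIC_FORMATS = {
    "u": "String (32-bit offsets)",
    "z": "Binary (32-bit offsets)",
    "+l": "List (32-bit offsets)",
    "+L": "LargeList (64-bit offsets)",
}


def is_view_format(fmt: str) -> bool:
    """Return True if `fmt` is one of the view-type format strings."""
    return fmt in VIEW_FORMATS


def describe_format(fmt: str) -> str:
    """Map a format string to a readable type name (hypothetical helper).

    Only the strings listed above are recognized; the specification defines
    many more, so anything else raises ValueError.
    """
    for table in (VIEW_FORMATS, CLASSIC_FORMATS):
        if fmt in table:
            return table[fmt]
    raise ValueError(f"format string not covered by this sketch: {fmt!r}")


if __name__ == "__main__":
    # Print each classic format string next to its view counterpart.
    for fmt in ("u", "vu", "+l", "+vl"):
        print(f"{fmt!r:>6} -> {describe_format(fmt)}")
```

In this table, each view variant differs from its classic counterpart by an extra `v` (`u` becomes `vu`, `+l` becomes `+vl`). A consumer that does not yet handle view arrays could use a check like `is_view_format` to fail with a clear error message.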
Flight SQL is now considered stable (GH-39037). The Flight SQL specification was clarified regarding how the result set schema of a prepared statement is affected by bound parameters (GH-37061).
The JDBC Arrow Flight SQL driver now supports mTLS authentication (GH-38460) and bind parameters (GH-33475), follows the Flight RPC spec when fetching data (GH-34532), and can reuse credentials across metadata and data connections (GH-38576). On macOS it will also use the system keychain to be consistent with other platforms (GH-39014). Applications can also retrieve the underlying Flight RPC metadata from the JDBC driver (GH-38024, GH-38022).
For C++ notes, refer to the [complete changelog][3].
Removal of build targets:
New features:
Fixes and improved compatibility:
`ValueLen` to Binary and String array interface (GH-38458)
`GetToTimeFunc` for fixed timestamp data types (GH-38795)
We expect a breaking change in the next release, Arrow 16.0.0. Support for Java 9 modules is coming, but it will require changing the JVM flags used to launch your application (GH-38998). Arrow 15.0.0 is not affected.
A bill-of-materials (BOM) package was added to make it easier to depend on multiple Arrow libraries (GH-38264).
The JDBC adapter (separate from the JDBC driver) now supports 256-bit decimals (GH-39484) and throws more informative exceptions (GH-39355).
Various improvements were made to utilities for working with vectors (GH-38662, GH-38614, GH-38511, GH-38254, GH-38246).
This release comes with new features and APIs. We also removed `getByteLength` to reduce bundle sizes.
New Features with API changes
Package changes
Compatibility notes:
The `ParquetDataset` custom implementation has been removed and only the new dataset API is now in use GH-31303.
New features:
A `RecordBatchReader` constructor from a stream object implementing the PyCapsule Protocol has been added [GH-39217](https://github.com/apache/arrow/issues/39217), together with some additional documentation [GH-39196](https://github.com/apache/arrow/issues/39196).
`__dlpack__` and `__dlpack_device__` dunder methods GH-33984.
`CacheOptions` are now configurable from Python as part of the `pyarrow.dataset.ParquetFragmentScanOptions` GH-36441.
`RowGroupMetaData` GH-35331.
Other improvements:
`FileOutputStream` is exposed for the `OSFile` class GH-38857.
`make_fragment` in the pyarrow datasets (`pyarrow.dataset.FileFormat` and `pyarrow.dataset.ParquetFileFormat`) GH-37857.
`FixedSizeListArray.from_arrays` GH-34316.
`to_struct_array` and `from_struct_array` are added to the `pyarrow.Table` class GH-33500.
`.nbytes` which is improving performance when calculating the data size GH-39096.
`DatetimeTZBlock` has been removed GH-38341.
A `DataType` instance can be passed to the `MapArray.from_arrays` constructor GH-39515.
Relevant bug fixes:
`S3FileSystem` equals `None` segfault has been fixed GH-38535.
`dictionary_encode(dictionary)` GH-34890.
`base::prod` have been added so you can now use it in your dplyr pipelines (i.e., `tbl |> summarize(prod(col))`) without having to pull the data into R GH-38601.
`dimnames` or `colnames` on `Dataset` objects now returns a useful result rather than just `NULL` GH-38377.
The `code()` method on Schema objects now takes an optional `namespace` argument which, when `TRUE`, prefixes names with `arrow::`, making the output more portable GH-38144.
`s3_bucket`, `S3FileSystem`), the debug log level for S3 can be set with the `AWS_S3_LOG_LEVEL` environment variable. See `?S3FileSystem` for more information. GH-38267
`sub`, `gsub`, `stringr::str_replace`, `stringr::str_replace_all` are passed a length > 1 vector of values in `pattern` GH-39219.
`?open_dataset` documenting how to use the ND-JSON support added in arrow 13.0.0 GH-38258.
`to_duckdb()`) no longer results in warnings when quitting your R session. GH-38495
For more on what’s in the 15.0.0 R package, see the [R changelog][4].
The Rust projects have moved to separate repositories outside the main Arrow monorepo. For notes on the latest release of the Rust implementation, see the latest [Arrow Rust changelog][5].
[2]: {{ site.baseurl }}/release/15.0.0.html#contributors
[3]: {{ site.baseurl }}/release/15.0.0.html#changelog
[4]: {{ site.baseurl }}/docs/r/news/
[5]: https://github.com/apache/arrow-rs/tags