| <!DOCTYPE html> |
| <html lang="en-US"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags --> |
| |
| <title>Apache Arrow 0.15.0 Release | Apache Arrow</title> |
| |
| |
| <!-- Begin Jekyll SEO tag v2.8.0 --> |
| <meta name="generator" content="Jekyll v4.4.1" /> |
| <meta property="og:title" content="Apache Arrow 0.15.0 Release" /> |
| <meta name="author" content="pmc" /> |
| <meta property="og:locale" content="en_US" /> |
| <meta name="description" content="The Apache Arrow team is pleased to announce the 0.15.0 release. This covers about 3 months of development work and includes 687 resolved issues from 80 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available. About a third of issues closed (240) were classified as bug fixes, so this release brings many stability, memory use, and performance improvements over 0.14.x. We will discuss some of the language-specific improvements and new features below. New committers Since the 0.14.0 release, we've added four new committers: Ben Kietzman David Li Kenta Murata Neal Richardson In addition, Sebastien Binet and Micah Kornfield have joined the PMC. Thank you for all your contributions! Columnar Format Notes The format gets new datatypes: LargeList (ARROW-4810), LargeBinary and LargeString (ARROW-750). LargeList is similar to List but with 64-bit offsets instead of 32-bit. The same relationship holds for LargeBinary and LargeString with respect to Binary and String. Since the last major release, we have also made a significant overhaul of the columnar format documentation to be clearer and easier to follow for implementation creators. Upcoming Columnar Format Stability and Library / Format Version Split The Arrow community has decided to make a 1.0.0 release of the project marking formal stability of the columnar format and binary protocol, including explicit forward and backward compatibility guarantees. You can read about these guarantees in the new documentation page about versioning. Starting with 1.0.0, we will give the columnar format and libraries separate version numbers. This will allow the library versions to evolve without creating confusion or uncertainty about whether the Arrow columnar format remains stable or not.
Columnar &quot;Streaming Protocol&quot; Change since 0.14.0 Since 0.14.0 we have modified the IPC &quot;encapsulated message&quot; format to insert 4 bytes of additional data in the message preamble to ensure that the Flatbuffers metadata starts on an aligned offset. By default, IPC streams generated by 0.15.0 and later will not be readable by library versions 0.14.1 and prior. Implementations have offered options to write messages using the now &quot;legacy&quot; message format. For users who cannot upgrade to version 0.15.0 in all parts of their system, such as Apache Spark users, we recommend one of two routes: If using pyarrow, set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 when using 0.15.0 and sending data to an old library Wait to upgrade all components simultaneously We do not anticipate making this kind of change again in the near future and would not have made such a non-forward-compatible change unless we deemed it very important. Arrow Flight notes A GetFlightSchema method is added to the Flight RPC protocol (ARROW-6094). As the name suggests, it returns the schema for a given Flight descriptor on the server. This is useful for cases where the Flight locations are not immediately available, depending on the server implementation. Flight implementations for C++ and Java now implement half-closed semantics for DoPut (ARROW-6063). The client can close the writing end of the stream to signal that it has finished sending the Flight data, but still receive the batch-specific response and its associated metadata. C++ notes C++ now supports the LargeList, LargeBinary and LargeString datatypes. The Status class gains the ability to carry additional subsystem-specific data with it, in the form of an opaque StatusDetail interface (ARROW-4036). This allows, for example, storing not only an exception message coming from Python but the actual Python exception object, so that it can be raised again if the Status is propagated back to Python.
It can also enable the consumer of Status to inspect the subsystem-specific error, such as a finer-grained Flight error code. DataType and Schema equality are significantly faster (ARROW-6038). The Column class is completely removed, as it did not have a strong enough motivation for existing between ChunkedArray and RecordBatch / Table (ARROW-5893). C++: Parquet The 0.15 release includes many improvements to the Apache Parquet C++ internals, resulting in greatly improved read and write performance. We described the work and published some benchmarks in a recent blog post. C++: CSV reader The CSV reader is now more flexible in terms of how column names are chosen (ARROW-6231) and column selection (ARROW-5977). C++: Memory Allocation Layer Arrow now has the option to allocate memory using the mimalloc memory allocator. jemalloc is still preferred for best performance, but mimalloc is a reasonable alternative to the system allocator on Windows where jemalloc is not currently supported. Also, we now expose explicit global functions to get a MemoryPool for each of the jemalloc allocator, mimalloc allocator and system allocator (ARROW-6292). The vendored jemalloc version is bumped from 4.5.x to 5.2.x (ARROW-6549). Performance characteristics may differ on memory allocation-heavy workloads, though we did not notice any significant regression on our suite of micro-benchmarks (and a multi-threaded benchmark of reading a CSV file showed a 25% speedup). C++: Filesystem layer A FileSystem implementation to access Amazon S3-compatible filesystems is now available. It depends on the AWS SDK for C++. C++: I/O layer Significant improvements were made to the Arrow I/O stack. ARROW-6180: Add RandomAccessFile::GetStream that returns an InputStream over a fixed subset of the file. 
ARROW-6381: Improve performance of small writes with BufferOutputStream ARROW-2490: Streamline concurrency semantics of InputStream implementations, and add debug checks for race conditions between non-thread-safe InputStream operations. ARROW-6527: Add an OutputStream::Write overload that takes an owned Buffer rather than a raw memory area. This allows OutputStream implementations to safely implement delayed writing without having to copy the data. C++: Tensors There are three improvements to Tensor and SparseTensor in this release. Add Tensor::Value template function for element access Add EqualOptions support to the Tensor::Equals function, which controls how two float tensors are compared Add smaller bit-width index support in SparseTensor C# Notes We have fixed some bugs causing incompatibilities between C# and other Arrow implementations. Java notes Added an initial version of an Avro adapter To improve the JDBC adapter performance, refactored the data-consumption logic and implemented an iterator API to prevent loading all data into one vector Implemented subField encoding for complex types; it is now available for List and Struct vectors Implemented visitor API for vector, range, type, and approximate equality comparison Performed a lot of optimization and refactoring for DictionaryEncoder, supporting all data types and avoiding memory copies via hash table and visitor API Introduced Error Prone into the code base to catch more potential errors earlier Fixed the bug where dictionary entries were required in IPC streams even when empty; readers can now also read interleaved messages Python notes The FileSystem API, implemented in C++, is now available in Python (ARROW-5494). The API for extension types has been straightened out, and defining custom extension types in Python is now more powerful (ARROW-5610). Sparse tensors are now available in Python (ARROW-4453). A potential crash when handling Python dates and datetimes was fixed (ARROW-6597).
Based on a mailing list discussion, we are looking for help with maintaining our Python wheels. Community members have found that the wheels take up a great deal of maintenance time, so if you or your organization depend on pip install pyarrow working, we would appreciate your assistance. Ruby and C GLib notes Ruby and C GLib continue to follow the features in the C++ project. Ruby includes the following backward-incompatible changes. Remove Arrow::Struct and use Hash instead. Add Arrow::Time for Arrow::Time{32,64}DataType value. Arrow::Decimal128Array#get_value returns BigDecimal. Ruby improves the performance of Arrow#values. Rust notes A number of core Arrow improvements were made to the Rust library. Add explicit SIMD vectorization for the divide kernel Add a feature to disable SIMD Use &quot;if cfg!&quot; pattern Optimizations to BooleanBufferBuilder::append_slice Implemented Debug trait for List/Struct/BinaryArray Improvements related to Rust Parquet and DataFusion are detailed next. Rust Parquet Implement Arrow record reader Add a converter that converts the record reader's content to an Arrow primitive array. Rust DataFusion Preview of new query execution engine using an extensible trait-based physical execution plan that supports parallel execution using threads ExecutionContext now has a register_parquet convenience method for registering Parquet data sources Fixed bug in type coercion optimizer rule TableProvider.scan() now returns a thread-safe BatchIterator Remove use of bare trait objects (switched to using dyn syntax) Adds casting from unsigned to signed integer data types R notes A major development since the 0.14 release was the arrival of the arrow R package on CRAN. We wrote about this in August on the Arrow blog. In addition to the package availability on CRAN, we also published package documentation on the Arrow website.
The 0.15 R package includes many of the enhancements in the C++ library release, such as the Parquet performance improvements and the FileSystem API. In addition, there are a number of upgrades that make it easier to read and write data, specify types and schema, and interact with Arrow tables and record batches in R. For more on what's in the 0.15 R package, see the changelog. Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there." /> |
| <meta property="og:description" content="The Apache Arrow team is pleased to announce the 0.15.0 release. This covers about 3 months of development work and includes 687 resolved issues from 80 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available. About a third of issues closed (240) were classified as bug fixes, so this release brings many stability, memory use, and performance improvements over 0.14.x. We will discuss some of the language-specific improvements and new features below. New committers Since the 0.14.0 release, we've added four new committers: Ben Kietzman David Li Kenta Murata Neal Richardson In addition, Sebastien Binet and Micah Kornfield have joined the PMC. Thank you for all your contributions! Columnar Format Notes The format gets new datatypes: LargeList (ARROW-4810), LargeBinary and LargeString (ARROW-750). LargeList is similar to List but with 64-bit offsets instead of 32-bit. The same relationship holds for LargeBinary and LargeString with respect to Binary and String. Since the last major release, we have also made a significant overhaul of the columnar format documentation to be clearer and easier to follow for implementation creators. Upcoming Columnar Format Stability and Library / Format Version Split The Arrow community has decided to make a 1.0.0 release of the project marking formal stability of the columnar format and binary protocol, including explicit forward and backward compatibility guarantees. You can read about these guarantees in the new documentation page about versioning. Starting with 1.0.0, we will give the columnar format and libraries separate version numbers. This will allow the library versions to evolve without creating confusion or uncertainty about whether the Arrow columnar format remains stable or not.
Columnar &quot;Streaming Protocol&quot; Change since 0.14.0 Since 0.14.0 we have modified the IPC &quot;encapsulated message&quot; format to insert 4 bytes of additional data in the message preamble to ensure that the Flatbuffers metadata starts on an aligned offset. By default, IPC streams generated by 0.15.0 and later will not be readable by library versions 0.14.1 and prior. Implementations have offered options to write messages using the now &quot;legacy&quot; message format. For users who cannot upgrade to version 0.15.0 in all parts of their system, such as Apache Spark users, we recommend one of two routes: If using pyarrow, set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 when using 0.15.0 and sending data to an old library Wait to upgrade all components simultaneously We do not anticipate making this kind of change again in the near future and would not have made such a non-forward-compatible change unless we deemed it very important. Arrow Flight notes A GetFlightSchema method is added to the Flight RPC protocol (ARROW-6094). As the name suggests, it returns the schema for a given Flight descriptor on the server. This is useful for cases where the Flight locations are not immediately available, depending on the server implementation. Flight implementations for C++ and Java now implement half-closed semantics for DoPut (ARROW-6063). The client can close the writing end of the stream to signal that it has finished sending the Flight data, but still receive the batch-specific response and its associated metadata. C++ notes C++ now supports the LargeList, LargeBinary and LargeString datatypes. The Status class gains the ability to carry additional subsystem-specific data with it, in the form of an opaque StatusDetail interface (ARROW-4036). This allows, for example, storing not only an exception message coming from Python but the actual Python exception object, so that it can be raised again if the Status is propagated back to Python.
It can also enable the consumer of Status to inspect the subsystem-specific error, such as a finer-grained Flight error code. DataType and Schema equality are significantly faster (ARROW-6038). The Column class is completely removed, as it did not have a strong enough motivation for existing between ChunkedArray and RecordBatch / Table (ARROW-5893). C++: Parquet The 0.15 release includes many improvements to the Apache Parquet C++ internals, resulting in greatly improved read and write performance. We described the work and published some benchmarks in a recent blog post. C++: CSV reader The CSV reader is now more flexible in terms of how column names are chosen (ARROW-6231) and column selection (ARROW-5977). C++: Memory Allocation Layer Arrow now has the option to allocate memory using the mimalloc memory allocator. jemalloc is still preferred for best performance, but mimalloc is a reasonable alternative to the system allocator on Windows where jemalloc is not currently supported. Also, we now expose explicit global functions to get a MemoryPool for each of the jemalloc allocator, mimalloc allocator and system allocator (ARROW-6292). The vendored jemalloc version is bumped from 4.5.x to 5.2.x (ARROW-6549). Performance characteristics may differ on memory allocation-heavy workloads, though we did not notice any significant regression on our suite of micro-benchmarks (and a multi-threaded benchmark of reading a CSV file showed a 25% speedup). C++: Filesystem layer A FileSystem implementation to access Amazon S3-compatible filesystems is now available. It depends on the AWS SDK for C++. C++: I/O layer Significant improvements were made to the Arrow I/O stack. ARROW-6180: Add RandomAccessFile::GetStream that returns an InputStream over a fixed subset of the file. 
ARROW-6381: Improve performance of small writes with BufferOutputStream ARROW-2490: Streamline concurrency semantics of InputStream implementations, and add debug checks for race conditions between non-thread-safe InputStream operations. ARROW-6527: Add an OutputStream::Write overload that takes an owned Buffer rather than a raw memory area. This allows OutputStream implementations to safely implement delayed writing without having to copy the data. C++: Tensors There are three improvements to Tensor and SparseTensor in this release. Add Tensor::Value template function for element access Add EqualOptions support to the Tensor::Equals function, which controls how two float tensors are compared Add smaller bit-width index support in SparseTensor C# Notes We have fixed some bugs causing incompatibilities between C# and other Arrow implementations. Java notes Added an initial version of an Avro adapter To improve the JDBC adapter performance, refactored the data-consumption logic and implemented an iterator API to prevent loading all data into one vector Implemented subField encoding for complex types; it is now available for List and Struct vectors Implemented visitor API for vector, range, type, and approximate equality comparison Performed a lot of optimization and refactoring for DictionaryEncoder, supporting all data types and avoiding memory copies via hash table and visitor API Introduced Error Prone into the code base to catch more potential errors earlier Fixed the bug where dictionary entries were required in IPC streams even when empty; readers can now also read interleaved messages Python notes The FileSystem API, implemented in C++, is now available in Python (ARROW-5494). The API for extension types has been straightened out, and defining custom extension types in Python is now more powerful (ARROW-5610). Sparse tensors are now available in Python (ARROW-4453). A potential crash when handling Python dates and datetimes was fixed (ARROW-6597).
Based on a mailing list discussion, we are looking for help with maintaining our Python wheels. Community members have found that the wheels take up a great deal of maintenance time, so if you or your organization depend on pip install pyarrow working, we would appreciate your assistance. Ruby and C GLib notes Ruby and C GLib continue to follow the features in the C++ project. Ruby includes the following backward-incompatible changes. Remove Arrow::Struct and use Hash instead. Add Arrow::Time for Arrow::Time{32,64}DataType value. Arrow::Decimal128Array#get_value returns BigDecimal. Ruby improves the performance of Arrow#values. Rust notes A number of core Arrow improvements were made to the Rust library. Add explicit SIMD vectorization for the divide kernel Add a feature to disable SIMD Use &quot;if cfg!&quot; pattern Optimizations to BooleanBufferBuilder::append_slice Implemented Debug trait for List/Struct/BinaryArray Improvements related to Rust Parquet and DataFusion are detailed next. Rust Parquet Implement Arrow record reader Add a converter that converts the record reader's content to an Arrow primitive array. Rust DataFusion Preview of new query execution engine using an extensible trait-based physical execution plan that supports parallel execution using threads ExecutionContext now has a register_parquet convenience method for registering Parquet data sources Fixed bug in type coercion optimizer rule TableProvider.scan() now returns a thread-safe BatchIterator Remove use of bare trait objects (switched to using dyn syntax) Adds casting from unsigned to signed integer data types R notes A major development since the 0.14 release was the arrival of the arrow R package on CRAN. We wrote about this in August on the Arrow blog. In addition to the package availability on CRAN, we also published package documentation on the Arrow website.
The 0.15 R package includes many of the enhancements in the C++ library release, such as the Parquet performance improvements and the FileSystem API. In addition, there are a number of upgrades that make it easier to read and write data, specify types and schema, and interact with Arrow tables and record batches in R. For more on what's in the 0.15 R package, see the changelog. Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there." /> |
| <link rel="canonical" href="https://arrow.apache.org/blog/2019/10/06/0.15.0-release/" /> |
| <meta property="og:url" content="https://arrow.apache.org/blog/2019/10/06/0.15.0-release/" /> |
| <meta property="og:site_name" content="Apache Arrow" /> |
| <meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" /> |
| <meta property="og:type" content="article" /> |
| <meta property="article:published_time" content="2019-10-06T02:00:00-04:00" /> |
| <meta name="twitter:card" content="summary_large_image" /> |
| <meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" /> |
| <meta property="twitter:title" content="Apache Arrow 0.15.0 Release" /> |
| <script type="application/ld+json"> |
| {"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"pmc"},"dateModified":"2019-10-06T02:00:00-04:00","datePublished":"2019-10-06T02:00:00-04:00","description":"The Apache Arrow team is pleased to announce the 0.15.0 release. This covers about 3 months of development work and includes 687 resolved issues from 80 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The complete changelog is also available. About a third of issues closed (240) were classified as bug fixes, so this release brings many stability, memory use, and performance improvements over 0.14.x. We will discuss some of the language-specific improvements and new features below. New committers Since the 0.14.0 release, we've added four new committers: Ben Kietzman David Li Kenta Murata Neal Richardson In addition, Sebastien Binet and Micah Kornfield have joined the PMC. Thank you for all your contributions! Columnar Format Notes The format gets new datatypes: LargeList (ARROW-4810), LargeBinary and LargeString (ARROW-750). LargeList is similar to List but with 64-bit offsets instead of 32-bit. The same relationship holds for LargeBinary and LargeString with respect to Binary and String. Since the last major release, we have also made a significant overhaul of the columnar format documentation to be clearer and easier to follow for implementation creators. Upcoming Columnar Format Stability and Library / Format Version Split The Arrow community has decided to make a 1.0.0 release of the project marking formal stability of the columnar format and binary protocol, including explicit forward and backward compatibility guarantees. You can read about these guarantees in the new documentation page about versioning. Starting with 1.0.0, we will give the columnar format and libraries separate version numbers.
This will allow the library versions to evolve without creating confusion or uncertainty about whether the Arrow columnar format remains stable or not. Columnar \"Streaming Protocol\" Change since 0.14.0 Since 0.14.0 we have modified the IPC \"encapsulated message\" format to insert 4 bytes of additional data in the message preamble to ensure that the Flatbuffers metadata starts on an aligned offset. By default, IPC streams generated by 0.15.0 and later will not be readable by library versions 0.14.1 and prior. Implementations have offered options to write messages using the now \"legacy\" message format. For users who cannot upgrade to version 0.15.0 in all parts of their system, such as Apache Spark users, we recommend one of two routes: If using pyarrow, set the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 when using 0.15.0 and sending data to an old library Wait to upgrade all components simultaneously We do not anticipate making this kind of change again in the near future and would not have made such a non-forward-compatible change unless we deemed it very important. Arrow Flight notes A GetFlightSchema method is added to the Flight RPC protocol (ARROW-6094). As the name suggests, it returns the schema for a given Flight descriptor on the server. This is useful for cases where the Flight locations are not immediately available, depending on the server implementation. Flight implementations for C++ and Java now implement half-closed semantics for DoPut (ARROW-6063). The client can close the writing end of the stream to signal that it has finished sending the Flight data, but still receive the batch-specific response and its associated metadata. C++ notes C++ now supports the LargeList, LargeBinary and LargeString datatypes. The Status class gains the ability to carry additional subsystem-specific data with it, in the form of an opaque StatusDetail interface (ARROW-4036).
This allows, for example, storing not only an exception message coming from Python but the actual Python exception object, so that it can be raised again if the Status is propagated back to Python. It can also enable the consumer of Status to inspect the subsystem-specific error, such as a finer-grained Flight error code. DataType and Schema equality are significantly faster (ARROW-6038). The Column class is completely removed, as it did not have a strong enough motivation for existing between ChunkedArray and RecordBatch / Table (ARROW-5893). C++: Parquet The 0.15 release includes many improvements to the Apache Parquet C++ internals, resulting in greatly improved read and write performance. We described the work and published some benchmarks in a recent blog post. C++: CSV reader The CSV reader is now more flexible in terms of how column names are chosen (ARROW-6231) and column selection (ARROW-5977). C++: Memory Allocation Layer Arrow now has the option to allocate memory using the mimalloc memory allocator. jemalloc is still preferred for best performance, but mimalloc is a reasonable alternative to the system allocator on Windows where jemalloc is not currently supported. Also, we now expose explicit global functions to get a MemoryPool for each of the jemalloc allocator, mimalloc allocator and system allocator (ARROW-6292). The vendored jemalloc version is bumped from 4.5.x to 5.2.x (ARROW-6549). Performance characteristics may differ on memory allocation-heavy workloads, though we did not notice any significant regression on our suite of micro-benchmarks (and a multi-threaded benchmark of reading a CSV file showed a 25% speedup). C++: Filesystem layer A FileSystem implementation to access Amazon S3-compatible filesystems is now available. It depends on the AWS SDK for C++. C++: I/O layer Significant improvements were made to the Arrow I/O stack. ARROW-6180: Add RandomAccessFile::GetStream that returns an InputStream over a fixed subset of the file.
ARROW-6381: Improve performance of small writes with BufferOutputStream ARROW-2490: Streamline concurrency semantics of InputStream implementations, and add debug checks for race conditions between non-thread-safe InputStream operations. ARROW-6527: Add an OutputStream::Write overload that takes an owned Buffer rather than a raw memory area. This allows OutputStream implementations to safely implement delayed writing without having to copy the data. C++: Tensors There are three improvements to Tensor and SparseTensor in this release. Add Tensor::Value template function for element access Add EqualOptions support to the Tensor::Equals function, which controls how two float tensors are compared Add smaller bit-width index support in SparseTensor C# Notes We have fixed some bugs causing incompatibilities between C# and other Arrow implementations. Java notes Added an initial version of an Avro adapter To improve the JDBC adapter performance, refactored the data-consumption logic and implemented an iterator API to prevent loading all data into one vector Implemented subField encoding for complex types; it is now available for List and Struct vectors Implemented visitor API for vector, range, type, and approximate equality comparison Performed a lot of optimization and refactoring for DictionaryEncoder, supporting all data types and avoiding memory copies via hash table and visitor API Introduced Error Prone into the code base to catch more potential errors earlier Fixed the bug where dictionary entries were required in IPC streams even when empty; readers can now also read interleaved messages Python notes The FileSystem API, implemented in C++, is now available in Python (ARROW-5494). The API for extension types has been straightened out, and defining custom extension types in Python is now more powerful (ARROW-5610). Sparse tensors are now available in Python (ARROW-4453). A potential crash when handling Python dates and datetimes was fixed (ARROW-6597).
Based on a mailing list discussion, we are looking for help with maintaining our Python wheels. Community members have found that the wheels take up a great deal of maintenance time, so if you or your organization depend on pip install pyarrow working, we would appreciate your assistance. Ruby and C GLib notes Ruby and C GLib continue to follow the features in the C++ project. Ruby includes the following backward-incompatible changes. Remove Arrow::Struct and use Hash instead. Add Arrow::Time for Arrow::Time{32,64}DataType value. Arrow::Decimal128Array#get_value returns BigDecimal. Ruby improves the performance of Arrow#values. Rust notes A number of core Arrow improvements were made to the Rust library. Add explicit SIMD vectorization for the divide kernel Add a feature to disable SIMD Use \"if cfg!\" pattern Optimizations to BooleanBufferBuilder::append_slice Implemented Debug trait for List/Struct/BinaryArray Improvements related to Rust Parquet and DataFusion are detailed next. Rust Parquet Implement Arrow record reader Add a converter that converts the record reader's content to an Arrow primitive array. Rust DataFusion Preview of new query execution engine using an extensible trait-based physical execution plan that supports parallel execution using threads ExecutionContext now has a register_parquet convenience method for registering Parquet data sources Fixed bug in type coercion optimizer rule TableProvider.scan() now returns a thread-safe BatchIterator Remove use of bare trait objects (switched to using dyn syntax) Adds casting from unsigned to signed integer data types R notes A major development since the 0.14 release was the arrival of the arrow R package on CRAN. We wrote about this in August on the Arrow blog. In addition to the package availability on CRAN, we also published package documentation on the Arrow website.
The 0.15 R package includes many of the enhancements in the C++ library release, such as the Parquet performance improvements and the FileSystem API. In addition, there are a number of upgrades that make it easier to read and write data, specify types and schema, and interact with Arrow tables and record batches in R. For more on what's in the 0.15 R package, see the changelog. Community Discussions Ongoing There are a number of active discussions ongoing on the developer dev@arrow.apache.org mailing list. We look forward to hearing from the community there.","headline":"Apache Arrow 0.15.0 Release","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2019/10/06/0.15.0-release/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"url":"https://arrow.apache.org/blog/2019/10/06/0.15.0-release/"}</script> |
| <!-- End Jekyll SEO tag --> |
| |
| |
| <!-- favicons --> |
| <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1"> |
| <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2"> |
| <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3"> |
| <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4"> |
| <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5"> |
| <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6"> |
| <!-- dark mode favicons --> |
| <link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1"> |
| <link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2"> |
| <link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3"> |
| <link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4"> |
| <link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5"> |
| <link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6"> |
| |
| <script> |
| // Switch to the dark-mode favicons if prefers-color-scheme: dark |
| function onUpdate() { |
| light1 = document.querySelector('link#light1'); |
| light2 = document.querySelector('link#light2'); |
| light3 = document.querySelector('link#light3'); |
| light4 = document.querySelector('link#light4'); |
| light5 = document.querySelector('link#light5'); |
| light6 = document.querySelector('link#light6'); |
| |
| dark1 = document.querySelector('link#dark1'); |
| dark2 = document.querySelector('link#dark2'); |
| dark3 = document.querySelector('link#dark3'); |
| dark4 = document.querySelector('link#dark4'); |
| dark5 = document.querySelector('link#dark5'); |
| dark6 = document.querySelector('link#dark6'); |
| |
| if (matcher.matches) { |
| light1.remove(); |
| light2.remove(); |
| light3.remove(); |
| light4.remove(); |
| light5.remove(); |
| light6.remove(); |
| document.head.append(dark1); |
| document.head.append(dark2); |
| document.head.append(dark3); |
| document.head.append(dark4); |
| document.head.append(dark5); |
| document.head.append(dark6); |
| } else { |
| dark1.remove(); |
| dark2.remove(); |
| dark3.remove(); |
| dark4.remove(); |
| dark5.remove(); |
| dark6.remove(); |
| document.head.append(light1); |
| document.head.append(light2); |
| document.head.append(light3); |
| document.head.append(light4); |
| document.head.append(light5); |
| document.head.append(light6); |
| } |
| } |
| matcher = window.matchMedia('(prefers-color-scheme: dark)'); |
| matcher.addListener(onUpdate); |
| onUpdate(); |
| </script> |
| |
| <link href="/css/main.css" rel="stylesheet"> |
| <link href="/css/syntax.css" rel="stylesheet"> |
| <script src="/javascript/main.js"></script> |
| |
| <!-- Matomo --> |
| <script> |
| var _paq = window._paq = window._paq || []; |
| /* tracker methods like "setCustomDimension" should be called before "trackPageView" */ |
| /* We explicitly disable cookie tracking to avoid privacy issues */ |
| _paq.push(['disableCookies']); |
| _paq.push(['trackPageView']); |
| _paq.push(['enableLinkTracking']); |
| (function() { |
| var u="https://analytics.apache.org/"; |
| _paq.push(['setTrackerUrl', u+'matomo.php']); |
| _paq.push(['setSiteId', '20']); |
| var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0]; |
| g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s); |
| })(); |
| </script> |
| <!-- End Matomo Code --> |
| |
| |
| <link type="application/atom+xml" rel="alternate" href="https://arrow.apache.org/feed.xml" title="Apache Arrow" /> |
| </head> |
| |
| |
| <body class="wrap"> |
| <header> |
| <nav class="navbar navbar-expand-md navbar-dark bg-dark"> |
| |
| <a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"></a> |
| |
| <button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation"> |
| <span class="navbar-toggler-icon"></span> |
| </button> |
| |
| <!-- Collect the nav links, forms, and other content for toggling --> |
| <div class="collapse navbar-collapse justify-content-end" id="arrow-navbar"> |
| <ul class="nav navbar-nav"> |
| <li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li> |
| <li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li> |
| <li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownGetArrow" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Get Arrow |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow"> |
| <a class="dropdown-item" href="/install/">Install</a> |
| <a class="dropdown-item" href="/release/">Releases</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownDocumentation" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Docs |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation"> |
| <a class="dropdown-item" href="/docs">Project Docs</a> |
| <a class="dropdown-item" href="/docs/format/Columnar.html">Format</a> |
| <hr> |
| <a class="dropdown-item" href="/docs/c_glib">C GLib</a> |
| <a class="dropdown-item" href="/docs/cpp">C++</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md" target="_blank" rel="noopener">C#</a> |
| <a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow" target="_blank" rel="noopener">Go</a> |
| <a class="dropdown-item" href="/docs/java">Java</a> |
| <a class="dropdown-item" href="/docs/js">JavaScript</a> |
| <a class="dropdown-item" href="/julia/">Julia</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md" target="_blank" rel="noopener">MATLAB</a> |
| <a class="dropdown-item" href="/docs/python">Python</a> |
| <a class="dropdown-item" href="/docs/r">R</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md" target="_blank" rel="noopener">Ruby</a> |
| <a class="dropdown-item" href="https://docs.rs/arrow/latest" target="_blank" rel="noopener">Rust</a> |
| <a class="dropdown-item" href="/swift">Swift</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSource" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Source |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownSource"> |
| <a class="dropdown-item" href="https://github.com/apache/arrow" target="_blank" rel="noopener">Main Repo</a> |
| <hr> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/c_glib" target="_blank" rel="noopener">C GLib</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/cpp" target="_blank" rel="noopener">C++</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/csharp" target="_blank" rel="noopener">C#</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-go" target="_blank" rel="noopener">Go</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-java" target="_blank" rel="noopener">Java</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-js" target="_blank" rel="noopener">JavaScript</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-julia" target="_blank" rel="noopener">Julia</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/matlab" target="_blank" rel="noopener">MATLAB</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/python" target="_blank" rel="noopener">Python</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/r" target="_blank" rel="noopener">R</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/ruby" target="_blank" rel="noopener">Ruby</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-rs" target="_blank" rel="noopener">Rust</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow-swift" target="_blank" rel="noopener">Swift</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSubprojects" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Subprojects |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects"> |
| <a class="dropdown-item" href="/adbc">ADBC</a> |
| <a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a> |
| <a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a> |
| <a class="dropdown-item" href="https://datafusion.apache.org" target="_blank" rel="noopener">DataFusion</a> |
| <a class="dropdown-item" href="/nanoarrow">nanoarrow</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownCommunity" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| Community |
| </a> |
| <div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity"> |
| <a class="dropdown-item" href="/community/">Communication</a> |
| <a class="dropdown-item" href="/docs/developers/index.html">Contributing</a> |
| <a class="dropdown-item" href="https://github.com/apache/arrow/issues" target="_blank" rel="noopener">Issue Tracker</a> |
| <a class="dropdown-item" href="/committers/">Governance</a> |
| <a class="dropdown-item" href="/use_cases/">Use Cases</a> |
| <a class="dropdown-item" href="/powered_by/">Powered By</a> |
| <a class="dropdown-item" href="/visual_identity/">Visual Identity</a> |
| <a class="dropdown-item" href="/security/">Security</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener">Code of Conduct</a> |
| </div> |
| </li> |
| <li class="nav-item dropdown"> |
| <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownASF" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> |
| ASF Links |
| </a> |
| <div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF"> |
| <a class="dropdown-item" href="https://www.apache.org/" target="_blank" rel="noopener">ASF Website</a> |
| <a class="dropdown-item" href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">Donate</a> |
| <a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">Thanks</a> |
| <a class="dropdown-item" href="https://www.apache.org/security/" target="_blank" rel="noopener">Security</a> |
| </div> |
| </li> |
| </ul> |
| </div> |
| <!-- /.navbar-collapse --> |
| </nav> |
| |
| </header> |
| |
| <div class="container p-4 pt-5"> |
| <div class="col-md-8 mx-auto"> |
| <main role="main" class="pb-5"> |
| |
| <h1> |
| Apache Arrow 0.15.0 Release |
| </h1> |
| <hr class="mt-4 mb-3"> |
| |
| |
| |
| <p class="mb-4 pb-1"> |
| <span class="badge badge-secondary">Published</span> |
| <span class="published mr-3"> |
| 06 Oct 2019 |
| </span> |
| <br> |
| <span class="badge badge-secondary">By</span> |
| |
| <a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a> |
| |
| |
| |
| </p> |
| |
| |
| <!-- |
| |
| --> |
| <p>The Apache Arrow team is pleased to announce the 0.15.0 release. This covers |
| about 3 months of development work and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%200.15.0" target="_blank" rel="noopener"><strong>687 resolved issues</strong></a> |
| from <a href="https://arrow.apache.org/release/0.15.0.html#contributors"><strong>80 distinct contributors</strong></a>. See the Install Page to learn how to |
| get the libraries for your platform. The <a href="https://arrow.apache.org/release/0.15.0.html">complete changelog</a> is also |
| available.</p> |
| <p>About a third of issues closed (240) were classified as bug fixes, so this |
| release brings many stability, memory use, and performance improvements over |
| 0.14.x. We will discuss some of the language-specific improvements and new |
| features below.</p> |
| <h2>New committers</h2> |
| <p>Since the 0.14.0 release, we've added four new committers:</p> |
| <ul> |
| <li><a href="https://github.com/bkietz" target="_blank" rel="noopener">Ben Kietzman</a></li> |
| <li><a href="https://github.com/lidavidm" target="_blank" rel="noopener">David Li</a></li> |
| <li><a href="https://github.com/mrkn" target="_blank" rel="noopener">Kenta Murata</a></li> |
| <li><a href="https://github.com/nealrichardson" target="_blank" rel="noopener">Neal Richardson</a></li> |
| </ul> |
| <p>In addition, <a href="https://github.com/sbinet" target="_blank" rel="noopener">Sebastien Binet</a> and <a href="https://github.com/emkornfield" target="_blank" rel="noopener">Micah Kornfield</a> have joined the PMC.</p> |
| <p>Thank you for all your contributions!</p> |
| <h2>Columnar Format Notes</h2> |
| <p>The format gains new datatypes: LargeList (ARROW-4810), LargeBinary and |
| LargeString (ARROW-750). LargeList is similar to List but with 64-bit |
| offsets instead of 32-bit. The same relationship holds for LargeBinary |
| and LargeString with respect to Binary and String.</p> |
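| <p>The difference in offset width can be sketched with nothing but the Python standard library. This is an illustration of the layout idea only, not pyarrow code: the helper <code>build_offsets</code> and the sample values are hypothetical, and little-endian signed offsets are assumed.</p> |

```python
import struct


def build_offsets(values, fmt):
    """Pack an offsets buffer for a list of byte strings.

    fmt is a struct format code: "i" for 32-bit offsets (String, Binary, List)
    or "q" for 64-bit offsets (LargeString, LargeBinary, LargeList).
    """
    offsets = [0]
    for value in values:
        offsets.append(offsets[-1] + len(value))
    # One more offset than there are values, stored little-endian
    return struct.pack("<%d%s" % (len(offsets), fmt), *offsets)


values = [b"arrow", b"", b"columnar"]
data = b"".join(values)  # the values buffer is identical in both layouts

small = build_offsets(values, "i")
large = build_offsets(values, "q")

print(len(small), len(large))  # 16 32

# Largest data buffer each offset width can address
max_32 = 2**31 - 1
max_64 = 2**63 - 1
```

| <p>Only the offsets buffer changes size; the payoff is that a single array is no longer limited to roughly 2 GiB of variable-length data.</p> |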
| <p>Since the last major release, we have also made a significant overhaul of the |
| <a href="https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst" target="_blank" rel="noopener">columnar format documentation</a> to be clearer and easier to follow for |
| implementation creators.</p> |
| <h2>Upcoming Columnar Format Stability and Library / Format Version Split</h2> |
| <p>The Arrow community has decided to make a 1.0.0 release of the project marking |
| formal stability of the columnar format and binary protocol, including explicit |
| forward and backward compatibility guarantees. You can read about these |
| guarantees in the <a href="https://github.com/apache/arrow/blob/master/docs/source/format/Versioning.rst" target="_blank" rel="noopener">new documentation page</a> about versioning.</p> |
| <p>Starting with 1.0.0, we will give the columnar format and libraries separate |
| version numbers. This will allow the library versions to evolve without |
| creating confusion or uncertainty about whether the Arrow columnar format |
| remains stable or not.</p> |
| <h2>Columnar "Streaming Protocol" Change since 0.14.0</h2> |
| <p>Since 0.14.0 we have modified the IPC "encapsulated message" format to insert 4 |
| bytes of additional data in the message preamble to ensure that the Flatbuffers |
| metadata starts on an aligned offset. By default, IPC streams generated by |
| 0.15.0 and later will not be readable by library versions 0.14.1 and |
| prior. Implementations have offered options to write messages using the now |
| "legacy" message format.</p> |
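| <p>The effect of the extra 4 bytes can be sketched as follows. This is a simplified model, not the library's implementation: it assumes the inserted bytes are a <code>0xFFFFFFFF</code> continuation marker followed by a little-endian 32-bit metadata length, and it uses a placeholder byte string where the real Flatbuffers metadata would be. The helpers <code>frame_message</code> and <code>metadata_offset</code> are hypothetical names.</p> |

```python
import struct

CONTINUATION = 0xFFFFFFFF  # assumed value of the 4 inserted bytes


def pad8(n):
    """Round n up to the next multiple of 8."""
    return (n + 7) & ~7


def frame_message(metadata, body=b"", legacy=False):
    """Frame metadata + body in the new (default) or legacy layout."""
    prefix_len = 4 if legacy else 8
    # Pad the metadata so that prefix + metadata ends on an 8-byte boundary
    padded_len = pad8(prefix_len + len(metadata)) - prefix_len
    padded = metadata.ljust(padded_len, b"\x00")
    if legacy:
        prefix = struct.pack("<i", padded_len)
    else:
        prefix = struct.pack("<Ii", CONTINUATION, padded_len)
    return prefix + padded + body


def metadata_offset(message):
    """Return the byte offset at which the metadata starts."""
    (first,) = struct.unpack_from("<I", message, 0)
    return 8 if first == CONTINUATION else 4


new = frame_message(b"fake-flatbuffer")
old = frame_message(b"fake-flatbuffer", legacy=True)
print(metadata_offset(new), metadata_offset(old))  # 8 4

# What a pre-0.15 reader would take for the metadata length of a new message
(legacy_view,) = struct.unpack_from("<i", new, 0)
print(legacy_view)  # -1
```

| <p>In this model the metadata moves from offset 4 to offset 8, which is 8-byte aligned, and an older reader that interprets the first four bytes as a length sees a negative value, which is consistent with the incompatibility described above.</p> |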
| <p>For users who cannot upgrade to version 0.15.0 in all parts of their system, |
| such as Apache Spark users, we recommend one of two routes: |
| <ul> |
| <li>If using pyarrow, set the environment variable <code>ARROW_PRE_0_15_IPC_FORMAT=1</code> |
| when using 0.15.0 and sending data to an old library</li> |
| <li>Wait to upgrade all components simultaneously</li> |
| </ul> |
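| <p>For the first route, the variable has to be present in the environment of the process that writes the data. A minimal sketch of passing it to a child process from Python is shown here; the child command is a stand-in that only echoes the variable, where a real deployment would launch the worker that uses pyarrow.</p> |

```python
import os
import subprocess
import sys

# Copy the current environment and opt in to the legacy IPC format
env = dict(os.environ, ARROW_PRE_0_15_IPC_FORMAT="1")

# Stand-in for the real worker: it only reports the variable it inherited
child_code = "import os; print(os.environ.get('ARROW_PRE_0_15_IPC_FORMAT'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
print(result.stdout.strip())  # 1
```

| <p>How the variable reaches worker processes depends on the deployment (for example, cluster or scheduler configuration), so treat this as a pattern rather than a recipe.</p> |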
| <p>We do not anticipate making this kind of change again in the near future and |
| would not have made such a non-forward-compatible change unless we deemed it |
| very important.</p> |
| <h2>Arrow Flight notes</h2> |
| <p>A GetFlightSchema method is added to the Flight RPC protocol (ARROW-6094). |
| As the name suggests, it returns the schema for a given Flight descriptor |
| on the server. This is useful for cases where the Flight locations are |
| not immediately available, depending on the server implementation.</p> |
| <p>Flight implementations for C++ and Java now implement half-closed |
| semantics for DoPut (ARROW-6063). The client can close the writing |
| end of the stream to signal that it has finished sending the Flight |
| data, but still receive the batch-specific response and its associated |
| metadata.</p> |
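| <p>Half-closed semantics can be illustrated by analogy with a plain socket pair. This sketch does not use the Flight API or gRPC; it only shows the general pattern of closing the writing side of a connection while keeping the reading side open, with made-up payloads.</p> |

```python
import socket

client, server = socket.socketpair()

# The client sends its data, then closes only the writing half
client.sendall(b"batch-1;batch-2")
client.shutdown(socket.SHUT_WR)

# The server reads until end-of-stream, which tells it the client is done
received = b""
while True:
    chunk = server.recv(4096)
    if not chunk:
        break
    received += chunk

# The server can still respond, and the client can still read the response
batch_count = received.count(b";") + 1
server.sendall(b"ack:" + str(batch_count).encode())
server.close()

reply = client.recv(4096)
client.close()
print(reply)  # b'ack:2'
```

| <p>In Flight the same idea applies at the level of the DoPut stream: finishing the upload does not prevent the client from receiving the server's response.</p> |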
| <h2>C++ notes</h2> |
| <p>C++ now supports the LargeList, LargeBinary and LargeString datatypes.</p> |
| <p>The Status class can now carry additional subsystem-specific data in the |
| form of an opaque StatusDetail interface (ARROW-4036). This makes it possible, |
| for example, to store not only an exception message coming from Python but the |
| actual Python exception object, so that it can be raised again if the Status |
| is propagated back to Python. It also lets the consumer of a Status inspect |
| the subsystem-specific error, such as a finer-grained Flight error code.</p> |
| <p>DataType and Schema equality are significantly faster (ARROW-6038).</p> |
| <p>The Column class is completely removed, as it did not have a strong enough |
| motivation for existing between ChunkedArray and RecordBatch / Table |
| (ARROW-5893).</p> |
| <h3>C++: Parquet</h3> |
| <p>The 0.15 release includes many improvements to the Apache Parquet C++ internals, |
| resulting in greatly improved read and write performance. We described the work |
| and published some benchmarks in a <a href="https://arrow.apache.org/blog/2019/09/05/faster-strings-cpp-parquet/">recent blog post</a>.</p> |
| <h3>C++: CSV reader</h3> |
| <p>The CSV reader is now more flexible in terms of how column names are chosen |
| (ARROW-6231) and column selection (ARROW-5977).</p> |
| <h3>C++: Memory Allocation Layer</h3> |
| <p>Arrow now has the option to allocate memory using the mimalloc memory |
| allocator. jemalloc is still preferred for best performance, but mimalloc |
| is a reasonable alternative to the system allocator on Windows where jemalloc |
| is not currently supported.</p> |
| <p>Also, we now expose explicit global functions to get a MemoryPool for each |
| of the jemalloc allocator, mimalloc allocator and system allocator (ARROW-6292).</p> |
| <p>The vendored jemalloc version is bumped from 4.5.x to 5.2.x (ARROW-6549). |
| Performance characteristics may differ on memory allocation-heavy workloads, |
| though we did not notice any significant regression on our suite of |
| micro-benchmarks (and a multi-threaded benchmark of reading a CSV file |
| showed a 25% speedup).</p> |
| <h3>C++: Filesystem layer</h3> |
| <p>A FileSystem implementation to access Amazon S3-compatible filesystems is now |
| available. It depends on the AWS SDK for C++.</p> |
| <h3>C++: I/O layer</h3> |
| <p>Significant improvements were made to the Arrow I/O stack.</p> |
| <ul> |
| <li>ARROW-6180: Add RandomAccessFile::GetStream that returns an InputStream over |
| a fixed subset of the file.</li> |
| <li>ARROW-6381: Improve performance of small writes with BufferOutputStream</li> |
| <li>ARROW-2490: Streamline concurrency semantics of InputStream implementations, |
| and add debug checks for race conditions between non-thread-safe InputStream |
| operations.</li> |
| <li>ARROW-6527: Add an OutputStream::Write overload that takes an owned Buffer |
| rather than a raw memory area. This allows OutputStream implementations |
| to safely implement delayed writing without having to copy the data.</li> |
| </ul> |
| <h3>C++: Tensors</h3> |
| <p>This release brings three improvements to Tensor and SparseTensor.</p> |
| <ul> |
| <li>Add Tensor::Value template function for element access</li> |
| <li>Add EqualOptions support to the Tensor::Equals function, which controls how two floating-point tensors are compared</li> |
| <li>Add support for smaller bit-width indices in SparseTensor</li> |
| </ul> |
| <h2>C# Notes</h2> |
| <p>We have fixed some bugs causing incompatibilities between C# and other Arrow |
| implementations.</p> |
| <h2>Java notes</h2> |
| <ul> |
| <li>Added an initial version of an Avro adapter</li> |
| <li>Refactored the JDBC adapter's data-consuming logic for better performance and |
| implemented an iterator API that avoids loading all data into a single vector</li> |
| <li>Implemented subField encoding for complex types; it is now available for |
| List and Struct vectors</li> |
| <li>Implemented a visitor API for vector, range, type, and approximate equality |
| comparisons</li> |
| <li>Optimized and refactored DictionaryEncoder extensively; it now supports all |
| data types and avoids memory copies by using a hash table and the visitor |
| API</li> |
| <li>Introduced <a href="https://github.com/google/error-prone" target="_blank" rel="noopener">Error Prone</a> into the code base to catch more potential errors |
| earlier</li> |
| <li>Fixed the bug where dictionary entries were required in IPC streams even when |
| empty; readers can now also read interleaved messages</li> |
| </ul> |
| <h2>Python notes</h2> |
| <p>The FileSystem API, implemented in C++, is now available in Python (ARROW-5494).</p> |
| <p>The API for extension types has been streamlined, and defining |
| custom extension types in Python is now more powerful (ARROW-5610).</p> |
| <p>Sparse tensors are now available in Python (ARROW-4453).</p> |
| <p>A potential crash when handling Python dates and datetimes was fixed |
| (ARROW-6597).</p> |
| <p>Based on a mailing list discussion, we are looking for help with maintaining |
| our Python wheels. Community members have found that the wheels take up a great |
| deal of maintenance time, so if you or your organization depend on <code>pip install pyarrow</code> working, we would appreciate your assistance.</p> |
| <h2>Ruby and C GLib notes</h2> |
| <p>Ruby and C GLib continue to follow the features in the C++ project. |
| Ruby includes the following backward-incompatible changes:</p> |
| <ul> |
| <li>Remove Arrow::Struct and use Hash instead.</li> |
| <li>Add Arrow::Time for Arrow::Time{32,64}DataType value.</li> |
| <li>Arrow::Decimal128Array#get_value returns BigDecimal.</li> |
| </ul> |
| <p>Ruby improves the performance of Arrow#values.</p> |
| <h2>Rust notes</h2> |
| <p>A number of core Arrow improvements were made to the Rust library.</p> |
| <ul> |
| <li>Add explicit SIMD vectorization for the divide kernel</li> |
| <li>Add a feature to disable SIMD</li> |
| <li>Use "if cfg!" pattern</li> |
| <li>Optimizations to BooleanBufferBuilder::append_slice</li> |
| <li>Implemented Debug trait for List/Struct/BinaryArray</li> |
| </ul> |
| <p>Improvements related to Rust Parquet and DataFusion are detailed next.</p> |
| <h3>Rust Parquet</h3> |
| <ul> |
| <li>Implement Arrow record reader</li> |
| <li>Add a converter that turns the record reader's content into an Arrow primitive array</li> |
| </ul> |
| <h3>Rust DataFusion</h3> |
| <ul> |
| <li>Preview of new query execution engine using an extensible trait-based |
| physical execution plan that supports parallel execution using threads</li> |
| <li>ExecutionContext now has a register_parquet convenience method for |
| registering Parquet data sources</li> |
| <li>Fixed bug in type coercion optimizer rule</li> |
| <li>TableProvider.scan() now returns a thread-safe BatchIterator</li> |
| <li>Remove use of bare trait objects (switched to using dyn syntax)</li> |
| <li>Add casting from unsigned to signed integer data types</li> |
| </ul> |
| <h2>R notes</h2> |
| <p>A major development since the 0.14 release was the arrival of the <code>arrow</code> R |
| package on <a href="https://cran.r-project.org/package=arrow" target="_blank" rel="noopener">CRAN</a>. We wrote about this in August on the <a href="https://arrow.apache.org/blog/2019/08/08/r-package-on-cran/">Arrow blog</a>. |
| In addition to the package availability on CRAN, we also published |
| <a href="https://arrow.apache.org/docs/r">package documentation</a> on the Arrow website.</p> |
| <p>The 0.15 R package includes many of the enhancements in the C++ library |
| release, such as the Parquet performance improvements and the FileSystem API. |
| In addition, there are a number of upgrades that make it easier to read and |
| write data, specify types and schema, and interact with Arrow tables and record |
| batches in R.</p> |
| <p>For more on what's in the 0.15 R package, see the <a href="http://arrow.apache.org/docs/r/news/">changelog</a>.</p> |
| <h2>Community Discussions Ongoing</h2> |
| <p>A number of discussions are active on the developer |
| <a href="mailto:dev@arrow.apache.org">dev@arrow.apache.org</a> mailing list. We look forward to hearing from the |
| community there.</p> |
| |
| </main> |
| </div> |
| |
| <hr> |
| <footer class="footer"> |
| <div class="row"> |
| <div class="col-md-9"> |
| <p>Apache Arrow, Arrow, Apache, the Apache logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p> |
| <p>© 2016-2025 The Apache Software Foundation</p> |
| </div> |
| <div class="col-md-3"> |
| <a class="d-sm-none d-md-inline pr-2" href="https://www.apache.org/events/current-event.html" target="_blank" rel="noopener"> |
| <img src="https://www.apache.org/events/current-event-234x60.png"> |
| </a> |
| </div> |
| </div> |
| </footer> |
| |
| </div> |
| </body> |
| </html> |