<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>Apache Arrow 7.0.0 Release | Apache Arrow</title>
<!-- Begin Jekyll SEO tag v2.8.0 -->
<meta name="generator" content="Jekyll v4.4.1" />
<meta property="og:title" content="Apache Arrow 7.0.0 Release" />
<meta name="author" content="pmc" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="The Apache Arrow team is pleased to announce the 7.0.0 release. This covers over 3 months of development work and includes 617 resolved issues from 105 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Community Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited to be committers. Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC). Thanks for your contributions and participation in the project! Arrow Flight RPC notes The Flight specification has been clarified to note that schemas are expected to be IPC-encapsulated on the wire. Documentation has been generally improved; see the Arrow Cookbook for recipes on how to use Flight in Python and R, and a new example on how to use Flight and gRPC services on the same port. This release includes Arrow Flight SQL, a protocol for using Arrow Flight to execute queries against and fetch metadata from SQL databases. Support is included for C++ and Java (but not languages that bind to C++, like Python or R). A more detailed blog post is forthcoming (EDIT 2022/02/16: see the Flight SQL announcement). Note that development is ongoing and the specification is currently experimental. C++ notes A set of CMake presets has been added to ease building Arrow in a number of cases (ARROW-14678, ARROW-14714). The arrow::BitUtil namespace has been renamed to arrow::bit_util (ARROW-13494). Concatenation of union arrays is now supported (ARROW-4975). StructType gained three convenience methods to add, change and remove a given field (ARROW-11424). The Datum kind COLLECTION has been removed as it was entirely unused in the codebase (ARROW-13598). 
Compute Layer A number of compute functions have been added: functions operating on strings: &quot;binary_reverse&quot; (ARROW-14306), &quot;string_repeat&quot; (ARROW-12712), &quot;utf8_normalize&quot; (ARROW-14205); &quot;fill_null_forward&quot;, &quot;fill_null_backward&quot; (ARROW-1699); &quot;ceil_temporal&quot;, &quot;floor_temporal&quot;, &quot;round_temporal&quot; to adjust temporal input to an integral multiple of a given unit (ARROW-14822); &quot;year_month_day&quot; to extract the calendar components of the input (ARROW-15032); &quot;random&quot; to generate random floating-point values between 0 and 1 (ARROW-12404); &quot;indices_nonzero&quot; to return the indices in the input where there are non-zero, non-null values (ARROW-13035). Decimal data is now supported as input of the arithmetic kernels (ARROW-13130). Dictionary data is now supported as input of the hash join execution node (ARROW-14181). Residual predicates have been implemented in the hash join node (ARROW-13643). The &quot;list_parent_indices&quot; function now always returns int64 data regardless of the input type (ARROW-14592). Month-day-nano interval data is now supported as input of the same functions as other interval types (ARROW-13989). CSV The CSV writer got additional configuration options: the string representation of null values (ARROW-14905); the quoting strategy: always / never / as needed (ARROW-14905); the end of line character(s) (ARROW-14907). Dataset Layer Skyhook, a dataset addition that offloads fragment scan operations to a Ceph distributed storage cluster, was contributed (ARROW-13607). The dataset writer now exposes options min_rows_per_group and max_rows_per_group to control the size of row groups created (ARROW-14426). IO and Filesystem Layer A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart uploads has been circumvented (ARROW-14523).
The Google Cloud Storage filesystem is now featureful enough to pass all generic filesystem tests (ARROW-14924). The OpenAppendStream method of filesystems has been un-deprecated; however, it still cannot be implemented for all filesystem backends (ARROW-14969). A new function arrow::fs::ResolveS3BucketRegion allows resolving the region where a particular S3 bucket resides (ARROW-15165). The S3 filesystem now sets the Content-Type of output files to &quot;application/octet-stream&quot; (instead of &quot;application/xml&quot; previously) if not explicitly specified by the caller (ARROW-15306). IPC Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683) and asynchronous (ARROW-14577) IPC reader. It is now possible to set the compression level when using LZ4 compression (ARROW-9648). ORC The ORC adapters have been significantly improved. A lot more properties of the ORC reader as well as ORC writer options are now available. Moreover, API docs for both the ORC reader and the ORC writer have been generated (ARROW-11297). Parquet DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to) bytearray columns in Parquet files (PARQUET-492). Go notes Arrow Bug Fixes License lifted up a level so that it is properly detected for the github.com/apache/arrow/go/v7 module for pkg.go.dev (ARROW-14728). Documentation on pkg.go.dev will look correct with complete major version handling as of the v7.0.0 release. Errors from MessageReader.Message get properly surfaced by Reader.Read (ARROW-14769). ipc.Reader properly uses the allocator it is initialized with instead of making native byte slices (ARROW-14717). Fixed a CI issue where the CGO tests were crashing on Windows (ARROW-14589). Various fixes for internal usages of Release and Retain to maintain proper management of reference counting.
Enhancements Continuous Integration for Go library now uses Go 1.16 as the version being tested (ARROW-14985). ValueOffsets function added to array.String to return the entire slice of offsets (ARROW-14645). array.Interface has been lifted to arrow.Array, array.{Record,Column,Chunked,Table} have been lifted to arrow.{Record,Column,Chunked,Table}. Interface arrow.ArrayData has been created to be used instead of array.Data. Aliases have been provided for the array package so existing code that doesn&#39;t directly use array.Data shouldn&#39;t be affected. The aliases will be removed in v8 (ARROW-5599). The Chunked.NewSlice method has been removed and is replaced by the array.NewChunkedSlice function. Arrays and Records now support marshalling to JSON via the json.Marshaler interface. Builders support adding values to them by unmarshalling from JSON via the json.Unmarshaler interface. array.FromJSON function added to create Arrays from JSON directly (ARROW-9630). Basic handling of field referencing and expression building similar to the C++ Compute APIs added through the new compute package in preparation for adding compute interfaces. Does not yet allow executing expressions (ARROW-14430). Parquet Enhancements Updated dependency versions (ARROW-14462). file module added, Go Parquet library now supports full file reading and writing (ARROW-13984, ARROW-13986). Does not yet provide direct Parquet &lt;--&gt; Arrow conversions. Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, gaining a 4x - 6x performance improvement (ARROW-15536). Java notes Flight SQL support is now available in the Java library, with integration tests to verify it against the C++ reference implementation. GeneralOutOfPlaceVectorSorter is now available for sorting any kind of vector. In general, if dedicated sorters can be used (like FixedWidthInPlaceVectorSorter) they should be preferred as they will generally perform better.
log4j2 dependency was removed as it was unused and a possible vector for attacks. VectorSchemaRootAppender now works with BitVector. JavaScript notes Major simplifications to the API. There is only a single Vector class now. See the (also much improved) docs for details. Dictionary vectors created with vectorFromArray are automatically cached for better performance. Better tree shaking support. Some bundles can now be only a few kb. Python notes Official support for Python 3.6 has been dropped. random and indices_nonzero compute functions are now supported in Python. pyarrow.orc.read_table is now provided to easily read the content of ORC files to a Table. pyarrow.orc.ORCFile now has a lot more properties exposed. pyarrow.orc.ORCWriter and pyarrow.orc.write_table now have the writer options available. pyarrow.orc now has much better API documentation. Support for compute functions arguments and options has been improved in general: arguments are not position only, while options can be provided as keyword args or not, and error reporting for wrong arguments has been improved. Table now has a group_by method that allows performing aggregations on table data. The compute functions documentation has also been improved to better distinguish between standard compute functions and HASH_AGGREGATE compute functions that can only be used for aggregations. Python documentation now provides interlinking for references to parameter types and return values, making it far easier to navigate the documentation. R notes This release includes additional improvements to the dplyr interface, to CSV support, and to the C-Data interface to exchange data with other languages. For more details, see the complete R changelog. Ruby and C GLib notes Ruby There are two new contributors: @okadakk and @simpl1g.
The updates of Red Arrow consist of the following improvements: Arrow::Function#execute now accepts an instance of an Arrow::Column as its argument (ARROW-14551) Arrow::Table.load now supports loading .arrows files (ARROW-15356) Add support for loading Arrow::Table from a URI in Arrow::Table.load (ARROW-14562) Arrow::Table now supports joining two tables (ARROW-14531) Arrow::Function#execute is now easier to use than before (ARROW-15274) Arrow::SortKey#name has been renamed to Arrow::SortKey#target (ARROW-14784) Add Cookbook section to documentation (ARROW-14636) Support the explicit initialization of S3 API by the Arrow.s3_initialize method (ARROW-14637) On macOS, stop specifying the version of openssl package explicitly when building the extension library (ARROW-14619) C GLib The updates of Arrow GLib etc. consist of the following improvements: Add garrow_execute_plan_build_hash_join_node function, GArrowHashJoinNodeOption, and GArrowJoinType (ARROW-15288) Add garrow_function_get_options_type function (ARROW-15273) Add garrow_function_get_default_options function (ARROW-15267) Add GArrowRoundToMultipleOptions to customize the round_to_multiple function (ARROW-15216) Add garrow_function_all function to list all the functions (ARROW-15205). In addition, add garrow_function_get_name, garrow_function_equal, and garrow_function_to_string functions for convenience. Add GArrowRoundOptions (ARROW-15204) Add garrow_struct_scalar_get_value function for converting a C++ scalar value to a GLib value (ARROW-15203) Add the following three interval data types (ARROW-15134): GArrowMonthIntervalDataType for the interval with the month component; GArrowDayTimeIntervalDataType for the interval with the days and the milliseconds components; GArrowMonthDayNanoIntervalDataType for the interval with the months, the days, and the nanoseconds components. Rename GArrowSortKey::name to ::target (ARROW-14784) Support the explicit initialization of S3 API by the garrow_s3_initialize function
(ARROW-14637) garrow_decimal128_new_string and garrow_decimal256_new_string now return errors when they get an invalid decimal string (ARROW-14530) garrow_decimal128_data_type_new and garrow_decimal256_data_type_new functions now validate the given precision (ARROW-14529) Rust notes Rust releases minor versions every 2 weeks in addition to a major version with the rest of the Arrow language implementations. Thus most enhancements have been incrementally released over the last 3 months as part of the 6.x releases. Going forward, the Rust implementation version will start deviating from the rest of the Arrow implementations, incrementing a major version if the changes to the crate require it. We still plan a release every other week. Please see issue #1120 for more details. Major changes in the 7.0.0 release include: additional support for Decimal; more ergonomic compute kernels that take dyn Array; Union type now follows the latest Arrow standard; support for custom datetime format for inference and parsing CSV files. Another highlight is that the community continues to improve the safety of the arrow crate. The 6.4.0 release included complete data validation and has resolved all outstanding RUSTSEC issues against the crate. For additional details on the 7.0.0 Rust implementation, please see the Arrow Rust CHANGELOG" />
<meta property="og:description" content="The Apache Arrow team is pleased to announce the 7.0.0 release. This covers over 3 months of development work and includes 617 resolved issues from 105 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Community Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited to be committers. Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC). Thanks for your contributions and participation in the project! Arrow Flight RPC notes The Flight specification has been clarified to note that schemas are expected to be IPC-encapsulated on the wire. Documentation has been generally improved; see the Arrow Cookbook for recipes on how to use Flight in Python and R, and a new example on how to use Flight and gRPC services on the same port. This release includes Arrow Flight SQL, a protocol for using Arrow Flight to execute queries against and fetch metadata from SQL databases. Support is included for C++ and Java (but not languages that bind to C++, like Python or R). A more detailed blog post is forthcoming (EDIT 2022/02/16: see the Flight SQL announcement). Note that development is ongoing and the specification is currently experimental. C++ notes A set of CMake presets has been added to ease building Arrow in a number of cases (ARROW-14678, ARROW-14714). The arrow::BitUtil namespace has been renamed to arrow::bit_util (ARROW-13494). Concatenation of union arrays is now supported (ARROW-4975). StructType gained three convenience methods to add, change and remove a given field (ARROW-11424). The Datum kind COLLECTION has been removed as it was entirely unused in the codebase (ARROW-13598). 
Compute Layer A number of compute functions have been added: functions operating on strings: &quot;binary_reverse&quot; (ARROW-14306), &quot;string_repeat&quot; (ARROW-12712), &quot;utf8_normalize&quot; (ARROW-14205); &quot;fill_null_forward&quot;, &quot;fill_null_backward&quot; (ARROW-1699); &quot;ceil_temporal&quot;, &quot;floor_temporal&quot;, &quot;round_temporal&quot; to adjust temporal input to an integral multiple of a given unit (ARROW-14822); &quot;year_month_day&quot; to extract the calendar components of the input (ARROW-15032); &quot;random&quot; to generate random floating-point values between 0 and 1 (ARROW-12404); &quot;indices_nonzero&quot; to return the indices in the input where there are non-zero, non-null values (ARROW-13035). Decimal data is now supported as input of the arithmetic kernels (ARROW-13130). Dictionary data is now supported as input of the hash join execution node (ARROW-14181). Residual predicates have been implemented in the hash join node (ARROW-13643). The &quot;list_parent_indices&quot; function now always returns int64 data regardless of the input type (ARROW-14592). Month-day-nano interval data is now supported as input of the same functions as other interval types (ARROW-13989). CSV The CSV writer got additional configuration options: the string representation of null values (ARROW-14905); the quoting strategy: always / never / as needed (ARROW-14905); the end of line character(s) (ARROW-14907). Dataset Layer Skyhook, a dataset addition that offloads fragment scan operations to a Ceph distributed storage cluster, was contributed (ARROW-13607). The dataset writer now exposes options min_rows_per_group and max_rows_per_group to control the size of row groups created (ARROW-14426). IO and Filesystem Layer A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart uploads has been circumvented (ARROW-14523).
The Google Cloud Storage filesystem is now featureful enough to pass all generic filesystem tests (ARROW-14924). The OpenAppendStream method of filesystems has been un-deprecated; however, it still cannot be implemented for all filesystem backends (ARROW-14969). A new function arrow::fs::ResolveS3BucketRegion allows resolving the region where a particular S3 bucket resides (ARROW-15165). The S3 filesystem now sets the Content-Type of output files to &quot;application/octet-stream&quot; (instead of &quot;application/xml&quot; previously) if not explicitly specified by the caller (ARROW-15306). IPC Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683) and asynchronous (ARROW-14577) IPC reader. It is now possible to set the compression level when using LZ4 compression (ARROW-9648). ORC The ORC adapters have been significantly improved. A lot more properties of the ORC reader as well as ORC writer options are now available. Moreover, API docs for both the ORC reader and the ORC writer have been generated (ARROW-11297). Parquet DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to) bytearray columns in Parquet files (PARQUET-492). Go notes Arrow Bug Fixes License lifted up a level so that it is properly detected for the github.com/apache/arrow/go/v7 module for pkg.go.dev (ARROW-14728). Documentation on pkg.go.dev will look correct with complete major version handling as of the v7.0.0 release. Errors from MessageReader.Message get properly surfaced by Reader.Read (ARROW-14769). ipc.Reader properly uses the allocator it is initialized with instead of making native byte slices (ARROW-14717). Fixed a CI issue where the CGO tests were crashing on Windows (ARROW-14589). Various fixes for internal usages of Release and Retain to maintain proper management of reference counting.
Enhancements Continuous Integration for Go library now uses Go 1.16 as the version being tested (ARROW-14985). ValueOffsets function added to array.String to return the entire slice of offsets (ARROW-14645). array.Interface has been lifted to arrow.Array, array.{Record,Column,Chunked,Table} have been lifted to arrow.{Record,Column,Chunked,Table}. Interface arrow.ArrayData has been created to be used instead of array.Data. Aliases have been provided for the array package so existing code that doesn&#39;t directly use array.Data shouldn&#39;t be affected. The aliases will be removed in v8 (ARROW-5599). The Chunked.NewSlice method has been removed and is replaced by the array.NewChunkedSlice function. Arrays and Records now support marshalling to JSON via the json.Marshaler interface. Builders support adding values to them by unmarshalling from JSON via the json.Unmarshaler interface. array.FromJSON function added to create Arrays from JSON directly (ARROW-9630). Basic handling of field referencing and expression building similar to the C++ Compute APIs added through the new compute package in preparation for adding compute interfaces. Does not yet allow executing expressions (ARROW-14430). Parquet Enhancements Updated dependency versions (ARROW-14462). file module added, Go Parquet library now supports full file reading and writing (ARROW-13984, ARROW-13986). Does not yet provide direct Parquet &lt;--&gt; Arrow conversions. Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, gaining a 4x - 6x performance improvement (ARROW-15536). Java notes Flight SQL support is now available in the Java library, with integration tests to verify it against the C++ reference implementation. GeneralOutOfPlaceVectorSorter is now available for sorting any kind of vector. In general, if dedicated sorters can be used (like FixedWidthInPlaceVectorSorter) they should be preferred as they will generally perform better.
log4j2 dependency was removed as it was unused and a possible vector for attacks. VectorSchemaRootAppender now works with BitVector. JavaScript notes Major simplifications to the API. There is only a single Vector class now. See the (also much improved) docs for details. Dictionary vectors created with vectorFromArray are automatically cached for better performance. Better tree shaking support. Some bundles can now be only a few kb. Python notes Official support for Python 3.6 has been dropped. random and indices_nonzero compute functions are now supported in Python. pyarrow.orc.read_table is now provided to easily read the content of ORC files to a Table. pyarrow.orc.ORCFile now has a lot more properties exposed. pyarrow.orc.ORCWriter and pyarrow.orc.write_table now have the writer options available. pyarrow.orc now has much better API documentation. Support for compute functions arguments and options has been improved in general: arguments are not position only, while options can be provided as keyword args or not, and error reporting for wrong arguments has been improved. Table now has a group_by method that allows performing aggregations on table data. The compute functions documentation has also been improved to better distinguish between standard compute functions and HASH_AGGREGATE compute functions that can only be used for aggregations. Python documentation now provides interlinking for references to parameter types and return values, making it far easier to navigate the documentation. R notes This release includes additional improvements to the dplyr interface, to CSV support, and to the C-Data interface to exchange data with other languages. For more details, see the complete R changelog. Ruby and C GLib notes Ruby There are two new contributors: @okadakk and @simpl1g.
The updates of Red Arrow consist of the following improvements: Arrow::Function#execute now accepts an instance of an Arrow::Column as its argument (ARROW-14551) Arrow::Table.load now supports loading .arrows files (ARROW-15356) Add support for loading Arrow::Table from a URI in Arrow::Table.load (ARROW-14562) Arrow::Table now supports joining two tables (ARROW-14531) Arrow::Function#execute is now easier to use than before (ARROW-15274) Arrow::SortKey#name has been renamed to Arrow::SortKey#target (ARROW-14784) Add Cookbook section to documentation (ARROW-14636) Support the explicit initialization of S3 API by the Arrow.s3_initialize method (ARROW-14637) On macOS, stop specifying the version of openssl package explicitly when building the extension library (ARROW-14619) C GLib The updates of Arrow GLib etc. consist of the following improvements: Add garrow_execute_plan_build_hash_join_node function, GArrowHashJoinNodeOption, and GArrowJoinType (ARROW-15288) Add garrow_function_get_options_type function (ARROW-15273) Add garrow_function_get_default_options function (ARROW-15267) Add GArrowRoundToMultipleOptions to customize the round_to_multiple function (ARROW-15216) Add garrow_function_all function to list all the functions (ARROW-15205). In addition, add garrow_function_get_name, garrow_function_equal, and garrow_function_to_string functions for convenience. Add GArrowRoundOptions (ARROW-15204) Add garrow_struct_scalar_get_value function for converting a C++ scalar value to a GLib value (ARROW-15203) Add the following three interval data types (ARROW-15134): GArrowMonthIntervalDataType for the interval with the month component; GArrowDayTimeIntervalDataType for the interval with the days and the milliseconds components; GArrowMonthDayNanoIntervalDataType for the interval with the months, the days, and the nanoseconds components. Rename GArrowSortKey::name to ::target (ARROW-14784) Support the explicit initialization of S3 API by the garrow_s3_initialize function
(ARROW-14637) garrow_decimal128_new_string and garrow_decimal256_new_string now return errors when they get an invalid decimal string (ARROW-14530) garrow_decimal128_data_type_new and garrow_decimal256_data_type_new functions now validate the given precision (ARROW-14529) Rust notes Rust releases minor versions every 2 weeks in addition to a major version with the rest of the Arrow language implementations. Thus most enhancements have been incrementally released over the last 3 months as part of the 6.x releases. Going forward, the Rust implementation version will start deviating from the rest of the Arrow implementations, incrementing a major version if the changes to the crate require it. We still plan a release every other week. Please see issue #1120 for more details. Major changes in the 7.0.0 release include: additional support for Decimal; more ergonomic compute kernels that take dyn Array; Union type now follows the latest Arrow standard; support for custom datetime format for inference and parsing CSV files. Another highlight is that the community continues to improve the safety of the arrow crate. The 6.4.0 release included complete data validation and has resolved all outstanding RUSTSEC issues against the crate. For additional details on the 7.0.0 Rust implementation, please see the Arrow Rust CHANGELOG" />
<link rel="canonical" href="https://arrow.apache.org/blog/2022/02/08/7.0.0-release/" />
<meta property="og:url" content="https://arrow.apache.org/blog/2022/02/08/7.0.0-release/" />
<meta property="og:site_name" content="Apache Arrow" />
<meta property="og:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2022-02-08T01:00:00-05:00" />
<meta name="twitter:card" content="summary_large_image" />
<meta property="twitter:image" content="https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png" />
<meta property="twitter:title" content="Apache Arrow 7.0.0 Release" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"pmc"},"dateModified":"2022-02-08T01:00:00-05:00","datePublished":"2022-02-08T01:00:00-05:00","description":"The Apache Arrow team is pleased to announce the 7.0.0 release. This covers over 3 months of development work and includes 617 resolved issues from 105 distinct contributors. See the Install Page to learn how to get the libraries for your platform. The release notes below are not exhaustive and only expose selected highlights of the release. Many other bugfixes and improvements have been made: we refer you to the complete changelog. Community Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited to be committers. Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC). Thanks for your contributions and participation in the project! Arrow Flight RPC notes The Flight specification has been clarified to note that schemas are expected to be IPC-encapsulated on the wire. Documentation has been generally improved; see the Arrow Cookbook for recipes on how to use Flight in Python and R, and a new example on how to use Flight and gRPC services on the same port. This release includes Arrow Flight SQL, a protocol for using Arrow Flight to execute queries against and fetch metadata from SQL databases. Support is included for C++ and Java (but not languages that bind to C++, like Python or R). A more detailed blog post is forthcoming (EDIT 2022/02/16: see the Flight SQL announcement). Note that development is ongoing and the specification is currently experimental. C++ notes A set of CMake presets has been added to ease building Arrow in a number of cases (ARROW-14678, ARROW-14714). The arrow::BitUtil namespace has been renamed to arrow::bit_util (ARROW-13494). Concatenation of union arrays is now supported (ARROW-4975). StructType gained three convenience methods to add, change and remove a given field (ARROW-11424). 
The Datum kind COLLECTION has been removed as it was entirely unused in the codebase (ARROW-13598). Compute Layer A number of compute functions have been added: functions operating on strings: &quot;binary_reverse&quot; (ARROW-14306), &quot;string_repeat&quot; (ARROW-12712), &quot;utf8_normalize&quot; (ARROW-14205); &quot;fill_null_forward&quot;, &quot;fill_null_backward&quot; (ARROW-1699); &quot;ceil_temporal&quot;, &quot;floor_temporal&quot;, &quot;round_temporal&quot; to adjust temporal input to an integral multiple of a given unit (ARROW-14822); &quot;year_month_day&quot; to extract the calendar components of the input (ARROW-15032); &quot;random&quot; to generate random floating-point values between 0 and 1 (ARROW-12404); &quot;indices_nonzero&quot; to return the indices in the input where there are non-zero, non-null values (ARROW-13035). Decimal data is now supported as input of the arithmetic kernels (ARROW-13130). Dictionary data is now supported as input of the hash join execution node (ARROW-14181). Residual predicates have been implemented in the hash join node (ARROW-13643). The &quot;list_parent_indices&quot; function now always returns int64 data regardless of the input type (ARROW-14592). Month-day-nano interval data is now supported as input of the same functions as other interval types (ARROW-13989). CSV The CSV writer got additional configuration options: the string representation of null values (ARROW-14905); the quoting strategy: always / never / as needed (ARROW-14905); the end of line character(s) (ARROW-14907). Dataset Layer Skyhook, a dataset addition that offloads fragment scan operations to a Ceph distributed storage cluster, was contributed (ARROW-13607). The dataset writer now exposes options min_rows_per_group and max_rows_per_group to control the size of row groups created (ARROW-14426). IO and Filesystem Layer A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart uploads has been circumvented (ARROW-14523).
The Google Cloud Storage filesystem is now featureful enough to pass all generic filesystem tests (ARROW-14924). The OpenAppendStream method of filesystems has been un-deprecated; however, it still cannot be implemented for all filesystem backends (ARROW-14969). A new function arrow::fs::ResolveS3BucketRegion allows resolving the region where a particular S3 bucket resides (ARROW-15165). The S3 filesystem now sets the Content-Type of output files to &quot;application/octet-stream&quot; (instead of &quot;application/xml&quot; previously) if not explicitly specified by the caller (ARROW-15306). IPC Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683) and asynchronous (ARROW-14577) IPC reader. It is now possible to set the compression level when using LZ4 compression (ARROW-9648). ORC The ORC adapters have been significantly improved. A lot more properties of the ORC reader as well as ORC writer options are now available. Moreover, API docs for both the ORC reader and the ORC writer have been generated (ARROW-11297). Parquet DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to) bytearray columns in Parquet files (PARQUET-492). Go notes Arrow Bug Fixes License lifted up a level so that it is properly detected for the github.com/apache/arrow/go/v7 module for pkg.go.dev (ARROW-14728). Documentation on pkg.go.dev will look correct with complete major version handling as of the v7.0.0 release. Errors from MessageReader.Message get properly surfaced by Reader.Read (ARROW-14769). ipc.Reader properly uses the allocator it is initialized with instead of making native byte slices (ARROW-14717). Fixed a CI issue where the CGO tests were crashing on Windows (ARROW-14589). Various fixes for internal usages of Release and Retain to maintain proper management of reference counting.
Enhancements Continuous Integration for Go library now uses Go1.16 as the version being tested ARROW-14985 ValueOffsets function added to array.String to return the entire slice of offsets ARROW-14645 array.Interface has been lifted to arrow.Array, array.{Record,Column,Chunked,Table} have been lifted to arrow.{Record,Column,Chunked,Table}. Interface arrow.ArrayData has been created to be used instead of array.Data. Aliases have been provided for the array package so existing code that doesn&#39;t directly use array.Data shouldn&#39;t be affected. The aliases will be removed in v8. ARROW-5599. The Chunked.NewSlice method has been removed and is replaced by the array.NewChunkedSlice function. Arrays and Records now support marshalling to JSON via the json.Marshaller interface. Builders support adding values to them by unmarshalling from JSON via the json.Unmarshaller interface. array.FromJSON function added to create Arrays from JSON directly. ARROW-9630 Basic handling of field referencing and expression building similar to the C++ Compute APIs added through the new compute package in preparation for adding compute interfaces. Does not yet allow executing expressions. ARROW-14430 Parquet Enhancements Updated dependency versions ARROW-14462 file module added, Go Parquet library now supports full file reading and writing. ARROW-13984 ARROW-13986. Does not yet provide direct Parquet &lt;--&gt; Arrow conversions. Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, gaining a 4x - 6x performance improvement. ARROW-15536 Java notes Flight SQL support is now available in the Java library, with integration tests to verify it against the C++ reference implementation. GeneralOutOfPlaceVectorSorter is now available for sorting any kind of vector. In general if dedicated sorters can be used (like FixedWidthInPlaceVectorSorter) they should preferred as they will generally perform better. 
log4j2 dependency was removed as it was unused and a possible vector for attacks VectorSchemaRootAppender now works with BitVector JavaScript notes Major simplifications to the API. There is only a single Vector class now. See the (also much improved) docs for details. Dictionary vectors created with vectorFromArray are automatically cached for better performance. Better tree shaking support. Some bundles can now be only a few kb. Python notes Official support for Python 3.6 has been dropped. random and indices_nonzero compute functions are now supported in Python pyarrow.orc.read_table is now provided to easily read the content of ORC files to a Table. pyarrow.orc.ORCFile now has a lot more properties exposed. pyarrow.orc.ORCWriter and pyarrow.orc.write_table now have the writer options available. pyarrow.orc now has much better API documentation. Support for compute functions arguments and options has been improved in general, arguments are not position only, while options can be provided as keyword args or not, and error reporting for wrong arguments has been improved. Table now has a group_by method that allows to perform aggregations on table data. The compute functions documentation has also been improved to better distinguish between standard compute functions and HASH_AGGREGATE compute functions that can only be using for aggregations. Python documentation now provides interlinking for references to parameter types and return values, thus making far easier to navigate the documentation. R notes This release adds additional improvements to the dplyr interface, to CSV support, and to the C-Data interface to exchange data with other languages. For more details, see the complete R changelog. Ruby and C GLib notes Ruby There are two new contributors @okadakk and @simpl1g . 
The updates of Red Arrow consists of the following improvements: Arrow::Function#execute now accepts an instance of an Arrow::Column as its argument (ARROW-14551) Arrow::Table.load now supports .arrows files to load (ARROW-15356) Add support loading Arrow::Table by a URI in Arrow::Table.load (ARROW-14562) Arrow::Table now supports to join two tables (ARROW-14531) Arrow::Function#execute gets more easier to use than before (ARROW-15274) Arrow::SortKey#name has been renamed to Arrow::SortKey#target (ARROW-14784) Add Cookbook section to documentation (ARROW-14636) Support the explicit initialization of S3 API by the Arrow.s3_initialize method (ARROW-14637) On macOS, stop specifying the version of openssl package explicitly when building the extension library (ARROW-14619) C GLib The updates of Arrow GLib etc. consists of the following improvements: Add garrow_execute_plan_build_hash_join_node function, GArrowHashJoinNodeOption, and GArrowJoinType (ARROW-15288) Add garrow_function_get_options_type function (ARROW-15273) Add garrow_function_get_default_options function (ARROW-15267) Add GArrowRoundToMultipleOptions to customize the round_to_multiple function (ARROW-15216) Add garrow_function_all function to list up all the functions (ARROW-15205) In addition, add garrow_function_get_name, garrow_function_equal, and garrow_function_to_string functions for convenience Add GArrowRoundOptions (ARROW-15204) Add garrow_struct_scalar_get_value function for converting a C++ scalar value to a GLib value (ARROW-15203) Add the following three interval data types (ARROW-15134) GArrowMonthIntervalDataType for the interval with the month component GArrowDayTimeIntervalDataType for the interval with the days and the milliseconds components GArrowMonthDayNanoIntervalDataType for the interval with the months, the days, and the nanoseconds components Rename GArrowSortKey::name to ::target (ARROW-14784) Support the explicit initialization of S3 API by the garrow_s3_initialize function 
(ARROW-14637) garrow_decimal128_new_string and garrow_decimal256_new_string now returns errors when they gets a invalid decimal string (ARROW-14530) garrow_decimal128_data_type_new and garrow_decimal256_data_type_new functions now validates the given precision (ARROW-14529) Rust notes Rust releases minor versions every 2 weeks in addition to a major version with the rest of the Arrow language implementations. Thus most enhancements have been incrementally released over the last 3 months as part of the 6.x. Going forward, the Rust implementation version will start deviating from the rest of the Arrow implementations, incrementing a major version if the changes to the crate require it. We still plan a release every other week. Please see issue #1120 for more details Major changes in the 7.0.0 release include: Additional support for Decimal More ergonomic compute kernels that take dyn Array Union type now follows the latest Arrow standard Support for custom datetime format for inference and parsing CSV files Another highlight is that the community continues to improve the safety of the arrow crate. The 6.4.0 release included complete data validation and has resolved all outstanding RUSTSEC issues against the crate. For additional details on the 7.0.0 Rust implementation, please see the Arrow Rust CHANGELOG","headline":"Apache Arrow 7.0.0 Release","image":"https://arrow.apache.org/img/arrow-logo_horizontal_black-txt_white-bg.png","mainEntityOfPage":{"@type":"WebPage","@id":"https://arrow.apache.org/blog/2022/02/08/7.0.0-release/"},"publisher":{"@type":"Organization","logo":{"@type":"ImageObject","url":"https://arrow.apache.org/img/logo.png"},"name":"pmc"},"url":"https://arrow.apache.org/blog/2022/02/08/7.0.0-release/"}</script>
<!-- End Jekyll SEO tag -->
<!-- favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16.png" id="light1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32.png" id="light2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon.png" id="light3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120.png" id="light4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76.png" id="light5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60.png" id="light6">
<!-- dark mode favicons -->
<link rel="icon" type="image/png" sizes="16x16" href="/img/favicon-16x16-dark.png" id="dark1">
<link rel="icon" type="image/png" sizes="32x32" href="/img/favicon-32x32-dark.png" id="dark2">
<link rel="apple-touch-icon" type="image/png" sizes="180x180" href="/img/apple-touch-icon-dark.png" id="dark3">
<link rel="apple-touch-icon" type="image/png" sizes="120x120" href="/img/apple-touch-icon-120x120-dark.png" id="dark4">
<link rel="apple-touch-icon" type="image/png" sizes="76x76" href="/img/apple-touch-icon-76x76-dark.png" id="dark5">
<link rel="apple-touch-icon" type="image/png" sizes="60x60" href="/img/apple-touch-icon-60x60-dark.png" id="dark6">
<script>
// Switch to the dark-mode favicons if prefers-color-scheme: dark
function onUpdate() {
light1 = document.querySelector('link#light1');
light2 = document.querySelector('link#light2');
light3 = document.querySelector('link#light3');
light4 = document.querySelector('link#light4');
light5 = document.querySelector('link#light5');
light6 = document.querySelector('link#light6');
dark1 = document.querySelector('link#dark1');
dark2 = document.querySelector('link#dark2');
dark3 = document.querySelector('link#dark3');
dark4 = document.querySelector('link#dark4');
dark5 = document.querySelector('link#dark5');
dark6 = document.querySelector('link#dark6');
if (matcher.matches) {
light1.remove();
light2.remove();
light3.remove();
light4.remove();
light5.remove();
light6.remove();
document.head.append(dark1);
document.head.append(dark2);
document.head.append(dark3);
document.head.append(dark4);
document.head.append(dark5);
document.head.append(dark6);
} else {
dark1.remove();
dark2.remove();
dark3.remove();
dark4.remove();
dark5.remove();
dark6.remove();
document.head.append(light1);
document.head.append(light2);
document.head.append(light3);
document.head.append(light4);
document.head.append(light5);
document.head.append(light6);
}
}
matcher = window.matchMedia('(prefers-color-scheme: dark)');
matcher.addListener(onUpdate);
onUpdate();
</script>
<link href="/css/main.css" rel="stylesheet">
<link href="/css/syntax.css" rel="stylesheet">
<script src="/javascript/main.js"></script>
<!-- Matomo -->
<script>
var _paq = window._paq = window._paq || [];
/* tracker methods like "setCustomDimension" should be called before "trackPageView" */
/* We explicitly disable cookie tracking to avoid privacy issues */
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
_paq.push(['enableLinkTracking']);
(function() {
var u="https://analytics.apache.org/";
_paq.push(['setTrackerUrl', u+'matomo.php']);
_paq.push(['setSiteId', '20']);
var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
g.async=true; g.src=u+'matomo.js'; s.parentNode.insertBefore(g,s);
})();
</script>
<!-- End Matomo Code -->
<link type="application/atom+xml" rel="alternate" href="https://arrow.apache.org/feed.xml" title="Apache Arrow" />
</head>
<body class="wrap">
<header>
<nav class="navbar navbar-expand-md navbar-dark bg-dark">
<a class="navbar-brand no-padding" href="/"><img src="/img/arrow-inverse-300px.png" height="40px"></a>
<button class="navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#arrow-navbar" aria-controls="arrow-navbar" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse justify-content-end" id="arrow-navbar">
<ul class="nav navbar-nav">
<li class="nav-item"><a class="nav-link" href="/overview/" role="button" aria-haspopup="true" aria-expanded="false">Overview</a></li>
<li class="nav-item"><a class="nav-link" href="/faq/" role="button" aria-haspopup="true" aria-expanded="false">FAQ</a></li>
<li class="nav-item"><a class="nav-link" href="/blog" role="button" aria-haspopup="true" aria-expanded="false">Blog</a></li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownGetArrow" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Get Arrow
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownGetArrow">
<a class="dropdown-item" href="/install/">Install</a>
<a class="dropdown-item" href="/release/">Releases</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownDocumentation" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Docs
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownDocumentation">
<a class="dropdown-item" href="/docs">Project Docs</a>
<a class="dropdown-item" href="/docs/format/Columnar.html">Format</a>
<hr>
<a class="dropdown-item" href="/docs/c_glib">C GLib</a>
<a class="dropdown-item" href="/docs/cpp">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/csharp/README.md" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://godoc.org/github.com/apache/arrow/go/arrow" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="/docs/java">Java</a>
<a class="dropdown-item" href="/docs/js">JavaScript</a>
<a class="dropdown-item" href="/julia/">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/matlab/README.md" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="/docs/python">Python</a>
<a class="dropdown-item" href="/docs/r">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/blob/main/ruby/README.md" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://docs.rs/arrow/latest" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="/swift">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSource" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Source
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSource">
<a class="dropdown-item" href="https://github.com/apache/arrow" target="_blank" rel="noopener">Main Repo</a>
<hr>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/c_glib" target="_blank" rel="noopener">C GLib</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/cpp" target="_blank" rel="noopener">C++</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/csharp" target="_blank" rel="noopener">C#</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-go" target="_blank" rel="noopener">Go</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-java" target="_blank" rel="noopener">Java</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-js" target="_blank" rel="noopener">JavaScript</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-julia" target="_blank" rel="noopener">Julia</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/matlab" target="_blank" rel="noopener">MATLAB</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/python" target="_blank" rel="noopener">Python</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/r" target="_blank" rel="noopener">R</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/tree/main/ruby" target="_blank" rel="noopener">Ruby</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-rs" target="_blank" rel="noopener">Rust</a>
<a class="dropdown-item" href="https://github.com/apache/arrow-swift" target="_blank" rel="noopener">Swift</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownSubprojects" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Subprojects
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownSubprojects">
<a class="dropdown-item" href="/adbc">ADBC</a>
<a class="dropdown-item" href="/docs/format/Flight.html">Arrow Flight</a>
<a class="dropdown-item" href="/docs/format/FlightSql.html">Arrow Flight SQL</a>
<a class="dropdown-item" href="https://datafusion.apache.org" target="_blank" rel="noopener">DataFusion</a>
<a class="dropdown-item" href="/nanoarrow">nanoarrow</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownCommunity" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Community
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdownCommunity">
<a class="dropdown-item" href="/community/">Communication</a>
<a class="dropdown-item" href="/docs/developers/index.html">Contributing</a>
<a class="dropdown-item" href="https://github.com/apache/arrow/issues" target="_blank" rel="noopener">Issue Tracker</a>
<a class="dropdown-item" href="/committers/">Governance</a>
<a class="dropdown-item" href="/use_cases/">Use Cases</a>
<a class="dropdown-item" href="/powered_by/">Powered By</a>
<a class="dropdown-item" href="/visual_identity/">Visual Identity</a>
<a class="dropdown-item" href="/security/">Security</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/policies/conduct.html" target="_blank" rel="noopener">Code of Conduct</a>
</div>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdownASF" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
ASF Links
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdownASF">
<a class="dropdown-item" href="https://www.apache.org/" target="_blank" rel="noopener">ASF Website</a>
<a class="dropdown-item" href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">Donate</a>
<a class="dropdown-item" href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">Thanks</a>
<a class="dropdown-item" href="https://www.apache.org/security/" target="_blank" rel="noopener">Security</a>
</div>
</li>
</ul>
</div>
<!-- /.navbar-collapse -->
</nav>
</header>
<div class="container p-4 pt-5">
<div class="col-md-8 mx-auto">
<main role="main" class="pb-5">
<h1>
Apache Arrow 7.0.0 Release
</h1>
<hr class="mt-4 mb-3">
<p class="mb-4 pb-1">
<span class="badge badge-secondary">Published</span>
<span class="published mr-3">
08 Feb 2022
</span>
<br>
<span class="badge badge-secondary">By</span>
<a class="mr-3" href="https://arrow.apache.org">The Apache Arrow PMC (pmc) </a>
</p>
<!--
-->
<p>The Apache Arrow team is pleased to announce the 7.0.0 release. This covers
over 3 months of development work and includes <a href="https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20%3D%20Resolved%20AND%20fixVersion%20%3D%207.0.0" target="_blank" rel="noopener"><strong>617 resolved issues</strong></a>
from <a href="/release/7.0.0.html#contributors"><strong>105 distinct contributors</strong></a>. See the Install Page to learn how to
get the libraries for your platform.</p>
<p>The release notes below are not exhaustive and only expose selected highlights
of the release. Many other bugfixes and improvements have been made: we refer
you to the <a href="/release/7.0.0.html#changelog">complete changelog</a>.</p>
<h2>Community</h2>
<p>Since the 6.0.1 release, Rémi Dattai and Alessandro Molina have been invited to be committers.
Daniël Heres and Yibo Cai have joined the Project Management Committee (PMC).
Thanks for your contributions and participation in the project!</p>
<h2>Arrow Flight RPC notes</h2>
<p>The Flight specification has been clarified to note that schemas are expected to be IPC-encapsulated on the wire.</p>
<p>Documentation has been generally improved; see the <a href="https://arrow.apache.org/cookbook/">Arrow Cookbook</a> for recipes on how to use Flight in Python and R, and a new <a href="https://github.com/apache/arrow/blob/master/cpp/examples/arrow/flight_grpc_example.cc" target="_blank" rel="noopener">example</a> on how to use Flight and gRPC services on the same port.</p>
<p>This release includes Arrow Flight SQL, a protocol for using Arrow Flight to execute queries against and fetch metadata from SQL databases. Support is included for C++ and Java (but <em>not</em> languages that bind to C++, like Python or R). A more detailed blog post is forthcoming (<em>EDIT</em> 2022/02/16: see the <a href="/blog/2022/02/16/introducing-arrow-flight-sql/">Flight SQL announcement</a>). Note that development is ongoing and the specification is currently experimental.</p>
<h2>C++ notes</h2>
<p>A set of CMake presets has been added to ease building Arrow in a number
of cases (ARROW-14678, ARROW-14714).</p>
<p>The <code>arrow::BitUtil</code> namespace has been renamed to <code>arrow::bit_util</code>
(ARROW-13494).</p>
<p>Concatenation of union arrays is now supported (ARROW-4975).</p>
<p><code>StructType</code> gained three convenience methods to add, change and remove
a given field (ARROW-11424).</p>
<p>The <code>Datum</code> kind <code>COLLECTION</code> has been removed as it was entirely unused
in the codebase (ARROW-13598).</p>
<h3>Compute Layer</h3>
<p>A number of compute functions have been added:</p>
<ul>
<li>functions operating on strings: "binary_reverse" (ARROW-14306),
"string_repeat" (ARROW-12712), "utf8_normalize" (ARROW-14205);</li>
<li>"fill_null_forward", "fill_null_backward" (ARROW-1699);</li>
<li>"ceil_temporal", "floor_temporal", "round_temporal" to adjust temporal input
to an integral multiple of a given unit (ARROW-14822);</li>
<li>"year_month_day" to extract the calendar components of the input (ARROW-15032);</li>
<li>"random" to generate random floating-point values between 0 and 1 (ARROW-12404);</li>
<li>"indices_nonzero" to return the indices in the input where there are
non-zero, non-null values (ARROW-13035).</li>
</ul>
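<p>To make the semantics of a few of these functions concrete, here is a minimal plain-Python sketch. It is not the Arrow implementation or its API: nulls are modeled as <code>None</code>, arrays as lists, and the helper names and the epoch-based rounding origin are assumptions made for illustration only.</p>

```python
from datetime import datetime, timedelta

def fill_null_forward(values):
    """Replace each None with the most recent non-None value before it."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)  # leading nulls stay None: nothing to carry forward yet
    return out

def fill_null_backward(values):
    """Replace each None with the next non-None value after it."""
    return fill_null_forward(values[::-1])[::-1]

def indices_nonzero(values):
    """Return the positions whose value is neither null (None) nor zero."""
    return [i for i, v in enumerate(values) if v is not None and v != 0]

def floor_temporal(ts, step):
    """Round a datetime down to a multiple of `step` (a timedelta).

    Assumption of this sketch: multiples are counted from the Unix epoch.
    """
    epoch = datetime(1970, 1, 1)
    return ts - (ts - epoch) % step

data = [None, 1, None, None, 4, None]
print(fill_null_forward(data))
print(fill_null_backward(data))
print(indices_nonzero([0, 5, None, 0, 7]))
print(floor_temporal(datetime(2022, 2, 8, 10, 47, 30), timedelta(minutes=15)))
```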
<p>Decimal data is now supported as input of the arithmetic kernels
(ARROW-13130).</p>
<p>Dictionary data is now supported as input of the hash join execution node
(ARROW-14181).</p>
<p>Residual predicates have been implemented in the hash join node
(ARROW-13643).</p>
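<p>A residual predicate is an extra, non-equality condition evaluated on rows that already matched on the join keys. The following plain-Python sketch illustrates the general idea; the function name, the row representation (dicts) and the example columns are hypothetical, and this is not how the Arrow execution node is implemented.</p>

```python
from collections import defaultdict

def hash_join(left, right, key, residual=None):
    """Inner join of two lists of dicts on `key`, with an optional residual filter."""
    # Build phase: index the right-hand rows by the equality key.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: find key matches, then keep only pairs passing the residual.
    out = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            if residual is None or residual(lrow, rrow):
                out.append((lrow, rrow))
    return out

orders = [{"id": 1, "ordered": 3}, {"id": 2, "ordered": 5}]
shipments = [
    {"id": 1, "shipped": 4},
    {"id": 1, "shipped": 2},
    {"id": 2, "shipped": 9},
    {"id": 3, "shipped": 1},
]
# Equality on "id" is handled by the hash table; the date check is the residual.
matches = hash_join(orders, shipments, "id",
                    residual=lambda l, r: r["shipped"] >= l["ordered"])
print(matches)
```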
<p>The "list_parent_indices" function now always returns int64 data
regardless of the input type (ARROW-14592).</p>
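<p>As an illustration of what "list_parent_indices" computes, the sketch below models a list array as a Python list of lists (with <code>None</code> for a null list). It only mirrors the semantics; in Arrow the result is an int64 array, whereas Python integers have no fixed width.</p>

```python
def list_parent_indices(lists):
    """For every child element, emit the index of the list that contains it."""
    # Empty and null (None) lists have no children, so they emit nothing.
    return [i for i, sub in enumerate(lists) for _ in (sub or [])]

print(list_parent_indices([[1, 2], [], None, [3]]))
```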
<p>Month-day-nano interval data is now supported as input of the same functions
as other interval types (ARROW-13989).</p>
<h3>CSV</h3>
<p>The CSV writer got additional configuration options:</p>
<ul>
<li>the string representation of null values (ARROW-14905);</li>
<li>the quoting strategy: always / never / as needed (ARROW-14905);</li>
<li>the end of line character(s) (ARROW-14907)</li>
</ul>
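<p>To show what these three knobs control, here is a small sketch using Python's standard <code>csv</code> module rather than Arrow's CSV writer. The helper <code>write_csv</code> and its parameter names are hypothetical, and the standard library's behavior may differ from Arrow's in details such as whether the null string itself is quoted.</p>

```python
import csv
import io

def write_csv(rows, null_string="", quoting=csv.QUOTE_MINIMAL, eol="\n"):
    """Serialize rows to CSV text with a chosen null string, quoting and line ending."""
    buf = io.StringIO(newline="")  # newline="" keeps the line ending exactly as given
    writer = csv.writer(buf, quoting=quoting, lineterminator=eol)
    for row in rows:
        # Substitute the configured representation for missing values.
        writer.writerow([null_string if v is None else v for v in row])
    return buf.getvalue()

rows = [[1, None, "a,b"]]
# "As needed": only the field containing the delimiter is quoted.
print(repr(write_csv(rows, null_string="NA", eol="\r\n")))
# "Always": every field is quoted.
print(repr(write_csv([[1, None]], null_string="NA", quoting=csv.QUOTE_ALL)))
# "Never": no quotes are written (the data must not require them).
print(repr(write_csv([["x", "y"]], quoting=csv.QUOTE_NONE)))
```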
<h3>Dataset Layer</h3>
<p><a href="/blog/2022/01/31/skyhook-bringing-computation-to-storage-with-apache-arrow/">Skyhook</a>,
a dataset addition that offloads fragment scan operations to a
Ceph distributed storage cluster, was contributed (ARROW-13607).</p>
<p>The dataset writer now exposes options <code>min_rows_per_group</code> and
<code>max_rows_per_group</code> to control the size of row groups created (ARROW-14426).</p>
<h3>IO and Filesystem Layer</h3>
<p>A critical bug in the AWS SDK for C++ that risks losing data in S3 multipart
uploads has been circumvented (ARROW-14523).</p>
<p>The Google Cloud Storage filesystem is now featureful enough to pass all
generic filesystem tests (ARROW-14924).</p>
<p>The OpenAppendStream method of filesystems has been un-deprecated; however,
it still cannot be implemented for all filesystem backends (ARROW-14969).</p>
<p>A new function <code>arrow::fs::ResolveS3BucketRegion</code> allows resolving the
region where a particular S3 bucket resides (ARROW-15165).</p>
<p>The S3 filesystem now sets the Content-Type of output files to
"application/octet-stream" (instead of "application/xml" previously)
if not explicitly specified by the caller (ARROW-15306).</p>
<h3>IPC</h3>
<p>Fine-grained I/O (coalescing) is now enabled in the synchronous (ARROW-12683)
and asynchronous (ARROW-14577) IPC reader.</p>
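<p>Coalescing here means turning many small reads of nearby byte ranges into fewer, larger reads. The sketch below shows the general technique in plain Python; the function and its two limit parameters are named for this illustration and should not be read as the IPC reader's actual code or defaults.</p>

```python
def coalesce_ranges(ranges, hole_size_limit, range_size_limit):
    """Merge (offset, length) byte ranges that are separated by small gaps.

    Two ranges are merged when the gap between them is at most
    `hole_size_limit` and the merged range is at most `range_size_limit`.
    """
    merged = []  # list of (start, end) pairs, end exclusive
    for offset, length in sorted(ranges):
        if merged:
            start, end = merged[-1]
            gap = offset - end  # negative when the ranges overlap
            new_end = max(end, offset + length)
            if hole_size_limit >= gap and range_size_limit >= new_end - start:
                merged[-1] = (start, new_end)
                continue
        merged.append((offset, offset + length))
    return [(start, end - start) for start, end in merged]

reads = [(0, 100), (110, 50), (1000, 20), (1015, 10)]
print(coalesce_ranges(reads, hole_size_limit=32, range_size_limit=1000))
```

<p>The trade-off is a few wasted bytes (the holes) in exchange for fewer round trips, which tends to matter most on high-latency storage.</p>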
<p>It is now possible to set the compression level when using LZ4 compression
(ARROW-9648).</p>
<h3>ORC</h3>
<p>The ORC adapters have been significantly improved. Many more properties of the ORC reader, as well as ORC writer options, are now available. Moreover, API docs for both the ORC reader and the ORC writer have been generated (ARROW-11297).</p>
<h3>Parquet</h3>
<p>DELTA_BYTE_ARRAY-encoded data can now be read from (but not written to)
byte array columns in Parquet files (PARQUET-492).</p>
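<p>DELTA_BYTE_ARRAY is a prefix (front) compression scheme: each value is stored as the number of leading bytes it shares with the previous value, plus the remaining suffix. The following sketch shows only that reconstruction step. It assumes the prefix lengths and suffixes have already been decoded into Python lists (in a real file they are themselves delta-encoded), and the function name is hypothetical.</p>

```python
def decode_delta_byte_array(prefix_lengths, suffixes):
    """Rebuild byte strings from shared-prefix lengths and suffixes."""
    values, previous = [], b""
    for n, suffix in zip(prefix_lengths, suffixes):
        # Reuse the first n bytes of the previous value, then append the suffix.
        current = previous[:n] + suffix
        values.append(current)
        previous = current
    return values

print(decode_delta_byte_array(
    [0, 2, 0, 3],
    [b"axis", b"le", b"babble", b"yhood"],
))
```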
<h2>Go notes</h2>
<h3>Arrow</h3>
<h4>Bug Fixes</h4>
<ul>
<li>License lifted up a level so that it is properly detected for the github.com/apache/arrow/go/v7 module for pkg.go.dev <a href="https://github.com/apache/arrow/pull/11715" target="_blank" rel="noopener">ARROW-14728</a>. Documentation on pkg.go.dev will look correct with complete major version handling as of the v7.0.0 release.</li>
<li>Errors from <code>MessageReader.Message</code> get properly surfaced by <code>Reader.Read</code> <a href="https://github.com/apache/arrow/pull/11739" target="_blank" rel="noopener">ARROW-14769</a>
</li>
<li>
<code>ipc.Reader</code> properly uses the allocator it is initialized with instead of making native byte slices <a href="https://github.com/apache/arrow/pull/11712" target="_blank" rel="noopener">ARROW-14717</a>
</li>
<li>Fixed a CI issue where the CGO tests were crashing on Windows <a href="https://github.com/apache/arrow/pull/11611" target="_blank" rel="noopener">ARROW-14589</a>
</li>
<li>Various fixes for internal usages of <code>Release</code> and <code>Retain</code> to maintain proper management of reference counting.</li>
</ul>
<h4>Enhancements</h4>
<ul>
<li>Continuous Integration for the Go library now tests against Go 1.16 <a href="https://github.com/apache/arrow/pull/11860" target="_blank" rel="noopener">ARROW-14985</a>
</li>
<li>
<code>ValueOffsets</code> function added to <code>array.String</code> to return the entire slice of offsets <a href="https://github.com/apache/arrow/pull/11653" target="_blank" rel="noopener">ARROW-14645</a>
</li>
<li>
<code>array.Interface</code> has been lifted to <code>arrow.Array</code>, <code>array.{Record,Column,Chunked,Table}</code> have been lifted to <code>arrow.{Record,Column,Chunked,Table}</code>. Interface <code>arrow.ArrayData</code> has been created to be used instead of <code>array.Data</code>. Aliases have been provided for the <code>array</code> package so existing code that doesn't directly use <code>array.Data</code> shouldn't be affected. The aliases will be removed in v8. <a href="https://github.com/apache/arrow/pull/11832" target="_blank" rel="noopener">ARROW-5599</a>. The <code>Chunked.NewSlice</code> method has been removed and is replaced by the <code>array.NewChunkedSlice</code> function.</li>
<li>Arrays and Records now support marshalling to JSON via the <code>json.Marshaler</code> interface. Builders support adding values to them by unmarshalling from JSON via the <code>json.Unmarshaler</code> interface. An <code>array.FromJSON</code> function has been added to create Arrays from JSON directly. <a href="https://github.com/apache/arrow/pull/11359" target="_blank" rel="noopener">ARROW-9630</a>
</li>
<li>Basic handling of field referencing and expression building similar to the C++ Compute APIs added through the new <code>compute</code> package in preparation for adding compute interfaces. Does not yet allow <em>executing</em> expressions. <a href="https://github.com/apache/arrow/pull/11514" target="_blank" rel="noopener">ARROW-14430</a>
</li>
</ul>
<h3>Parquet</h3>
<h4>Enhancements</h4>
<ul>
<li>Updated dependency versions <a href="https://github.com/apache/arrow/pull/11537" target="_blank" rel="noopener">ARROW-14462</a>
</li>
<li>
<code>file</code> module added; the Go Parquet library now supports full file reading and writing. <a href="https://github.com/apache/arrow/pull/11146" target="_blank" rel="noopener">ARROW-13984</a> <a href="https://github.com/apache/arrow/pull/11538" target="_blank" rel="noopener">ARROW-13986</a>. It does not yet provide direct Parquet &lt;--&gt; Arrow conversions.</li>
<li>Internal min_max utility functions given Arm64 NEON SIMD optimized assembly, gaining a 4x - 6x performance improvement. <a href="https://github.com/apache/arrow/pull/12163" target="_blank" rel="noopener">ARROW-15536</a>
</li>
</ul>
<h2>Java notes</h2>
<ul>
<li>Flight SQL support is now available in the Java library, with integration tests to verify it against the C++ reference implementation.</li>
<li>
<code>GeneralOutOfPlaceVectorSorter</code> is now available for sorting any kind of vector. If a dedicated sorter can be used (like <code>FixedWidthInPlaceVectorSorter</code>), it should be preferred, as it will generally perform better.</li>
<li>
<code>log4j2</code> dependency was removed, as it was unused and a possible attack vector.</li>
<li>
<code>VectorSchemaRootAppender</code> now works with <code>BitVector</code>
</li>
</ul>
<h2>JavaScript notes</h2>
<ul>
<li>Major simplifications to the API. There is only a single Vector class now. See the (also much improved) docs for details.</li>
<li>Dictionary vectors created with <code>vectorFromArray</code> are automatically cached for better performance.</li>
<li>Better tree shaking support. Some bundles can now be only a few kb.</li>
</ul>
<h2>Python notes</h2>
<ul>
<li>Official support for Python 3.6 has been dropped.</li>
<li>
<code>random</code> and <code>indices_nonzero</code> compute functions are now supported in Python</li>
<li>
<code>pyarrow.orc.read_table</code> is now provided to easily read the content of ORC files to a Table.</li>
<li>
<code>pyarrow.orc.ORCFile</code> now has a lot more properties exposed.</li>
<li>
<code>pyarrow.orc.ORCWriter</code> and <code>pyarrow.orc.write_table</code> now have the writer options available.</li>
<li>
<code>pyarrow.orc</code> now has much better API documentation.</li>
<li>Support for compute function arguments and options has been improved in general: arguments are now positional-only, options can be provided as keyword arguments or not, and error reporting for wrong arguments has been improved.</li>
<li>
<code>Table</code> now has a <code>group_by</code> method for performing aggregations on table data. The compute functions documentation has also been improved to better distinguish between standard compute functions and <code>HASH_AGGREGATE</code> compute functions, which can only be used for aggregations.</li>
<li>Python documentation now provides interlinking for references to parameter types and return values, making the documentation far easier to navigate.</li>
</ul>
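<p>For readers unfamiliar with grouped aggregation, the sketch below shows in plain Python what a grouped sum computes. It is only a model of the result, not pyarrow code: the helper name, the row representation (dicts) and the column names are hypothetical.</p>

```python
def group_by_sum(rows, key, value):
    """Sum `value` per distinct `key`, using a hash table (dict) of running totals."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals

rows = [
    {"k": "a", "v": 1},
    {"k": "b", "v": 2},
    {"k": "a", "v": 3},
]
print(group_by_sum(rows, "k", "v"))
```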
<h2>R notes</h2>
<p>This release adds additional improvements to the <code>dplyr</code> interface, to CSV support, and to the C-Data interface to exchange data with other languages. For more details, see the <a href="/docs/r/news/">complete R changelog</a>.</p>
<h2>Ruby and C GLib notes</h2>
<h3>Ruby</h3>
<p>There are two new contributors: @okadakk and @simpl1g.</p>
<p>The updates to Red Arrow consist of the following improvements:</p>
<ul>
<li>
<code>Arrow::Function#execute</code> now accepts an instance of an <code>Arrow::Column</code> as its argument <a href="https://issues.apache.org/jira/browse/ARROW-14551" target="_blank" rel="noopener">(ARROW-14551)</a>
</li>
<li>
<code>Arrow::Table.load</code> can now load <code>.arrows</code> files <a href="https://issues.apache.org/jira/browse/ARROW-15356" target="_blank" rel="noopener">(ARROW-15356)</a>
</li>
<li>Add support for loading an <code>Arrow::Table</code> from a <code>URI</code> in <code>Arrow::Table.load</code> <a href="https://issues.apache.org/jira/browse/ARROW-14562" target="_blank" rel="noopener">(ARROW-14562)</a>
</li>
<li>
<code>Arrow::Table</code> now supports joining two tables <a href="https://issues.apache.org/jira/browse/ARROW-14531" target="_blank" rel="noopener">(ARROW-14531)</a>
</li>
<li>
<code>Arrow::Function#execute</code> is now easier to use <a href="https://issues.apache.org/jira/browse/ARROW-15274" target="_blank" rel="noopener">(ARROW-15274)</a>
</li>
<li>
<code>Arrow::SortKey#name</code> has been renamed to <code>Arrow::SortKey#target</code> <a href="https://issues.apache.org/jira/browse/ARROW-14784" target="_blank" rel="noopener">(ARROW-14784)</a>
</li>
<li>Add Cookbook section to documentation <a href="https://issues.apache.org/jira/browse/ARROW-14636" target="_blank" rel="noopener">(ARROW-14636)</a>
</li>
<li>Support the explicit initialization of S3 API by the <code>Arrow.s3_initialize</code> method <a href="https://issues.apache.org/jira/browse/ARROW-14637" target="_blank" rel="noopener">(ARROW-14637)</a>
</li>
<li>On macOS, stop explicitly specifying the version of the openssl package when building the extension library <a href="https://issues.apache.org/jira/browse/ARROW-14619" target="_blank" rel="noopener">(ARROW-14619)</a>
</li>
</ul>
<h3>C GLib</h3>
<p>The updates to Arrow GLib and its related libraries consist of the following improvements:</p>
<ul>
<li>Add <code>garrow_execute_plan_build_hash_join_node</code> function, <code>GArrowHashJoinNodeOption</code>, and <code>GArrowJoinType</code> <a href="https://issues.apache.org/jira/browse/ARROW-15288" target="_blank" rel="noopener">(ARROW-15288)</a>
</li>
<li>Add <code>garrow_function_get_options_type</code> function <a href="https://issues.apache.org/jira/browse/ARROW-15273" target="_blank" rel="noopener">(ARROW-15273)</a>
</li>
<li>Add <code>garrow_function_get_default_options</code> function <a href="https://issues.apache.org/jira/browse/ARROW-15267" target="_blank" rel="noopener">(ARROW-15267)</a>
</li>
<li>Add <code>GArrowRoundToMultipleOptions</code> to customize the <code>round_to_multiple</code> function <a href="https://issues.apache.org/jira/browse/ARROW-15216" target="_blank" rel="noopener">(ARROW-15216)</a>
</li>
<li>Add <code>garrow_function_all</code> function to list all the functions <a href="https://issues.apache.org/jira/browse/ARROW-15205" target="_blank" rel="noopener">(ARROW-15205)</a>
<ul>
<li>In addition, add <code>garrow_function_get_name</code>, <code>garrow_function_equal</code>, and <code>garrow_function_to_string</code> functions for convenience</li>
</ul>
</li>
<li>Add <code>GArrowRoundOptions</code> <a href="https://issues.apache.org/jira/browse/ARROW-15204" target="_blank" rel="noopener">(ARROW-15204)</a>
</li>
<li>Add <code>garrow_struct_scalar_get_value</code> function for converting a C++ scalar value to a GLib value <a href="https://issues.apache.org/jira/browse/ARROW-15203" target="_blank" rel="noopener">(ARROW-15203)</a>
</li>
<li>Add the following three interval data types <a href="https://issues.apache.org/jira/browse/ARROW-15134" target="_blank" rel="noopener">(ARROW-15134)</a>
<ul>
<li>
<code>GArrowMonthIntervalDataType</code> for the interval with the month component</li>
<li>
<code>GArrowDayTimeIntervalDataType</code> for the interval with the days and the milliseconds components</li>
<li>
<code>GArrowMonthDayNanoIntervalDataType</code> for the interval with the months, the days, and the nanoseconds components</li>
</ul>
</li>
<li>Rename <code>GArrowSortKey::name</code> to <code>::target</code> <a href="https://issues.apache.org/jira/browse/ARROW-14784" target="_blank" rel="noopener">(ARROW-14784)</a>
</li>
<li>Support the explicit initialization of S3 API by the <code>garrow_s3_initialize</code> function <a href="https://issues.apache.org/jira/browse/ARROW-14637" target="_blank" rel="noopener">(ARROW-14637)</a>
</li>
<li>
<code>garrow_decimal128_new_string</code> and <code>garrow_decimal256_new_string</code> now return errors when given an invalid decimal string <a href="https://issues.apache.org/jira/browse/ARROW-14530" target="_blank" rel="noopener">(ARROW-14530)</a>
</li>
<li>
<code>garrow_decimal128_data_type_new</code> and <code>garrow_decimal256_data_type_new</code> functions now validate the given precision <a href="https://issues.apache.org/jira/browse/ARROW-14529" target="_blank" rel="noopener">(ARROW-14529)</a>
</li>
</ul>
<h2>Rust notes</h2>
<p>Rust releases minor versions every 2 weeks in addition to a major
version with the rest of the Arrow language implementations. Thus most
enhancements have been incrementally released over the last 3 months
as part of the 6.x releases.</p>
<p>Going forward, the Rust implementation version will start diverging
from the rest of the Arrow implementations, incrementing the major
version whenever changes to the crate require it. We still plan a
release every other week. Please see issue <a href="https://github.com/apache/arrow-rs/issues/1120" target="_blank" rel="noopener">#1120</a>
for more details.</p>
<p>Major changes in the 7.0.0 release include:</p>
<ol>
<li>Additional support for <code>Decimal</code>
</li>
<li>More ergonomic compute kernels that take <code>dyn Array</code>
</li>
<li>
<code>Union</code> type now follows the latest Arrow standard</li>
<li>Support for custom datetime format for inference and parsing CSV files</li>
</ol>
<p>Another highlight is that the community continues to improve the
safety of the arrow crate. The 6.4.0 release included complete data
validation and resolved all outstanding RUSTSEC issues against the
crate.</p>
<p>For additional details on the 7.0.0
Rust implementation, please see the <a href="https://github.com/apache/arrow-rs/blob/7.0.0/CHANGELOG.md" target="_blank" rel="noopener">Arrow Rust CHANGELOG</a>.</p>
</main>
</div>
<hr>
<footer class="footer">
<div class="row">
<div class="col-md-9">
<p>Apache Arrow, Arrow, Apache, the Apache logo, and the Apache Arrow project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.</p>
<p>© 2016-2025 The Apache Software Foundation</p>
</div>
</div>
</footer>
</div>
</body>
</html>