layout: default
title: Apache Arrow 16.0.0 Release
permalink: /release/16.0.0.html

Apache Arrow 16.0.0 (20 April 2024)

This is a major release covering more than 1 month of development.

Download

Contributors

This release includes 587 commits from 119 distinct contributors.

$ git shortlog -sn apache-arrow-15.0.2..apache-arrow-16.0.0
    79	dependabot[bot]
    70	Sutou Kouhei
    41	Antoine Pitrou
    31	Joris Van den Bossche
    28	Raúl Cumplido
    24	Alenka Frim
    19	mwish
    14	Felipe Oliveira Carvalho
    13	Jacob Wujciak-Jens
    12	Dewey Dunnington
    11	Dane Pitkin
    10	Bryce Mecum
    10	Matt Topol
     9	Jonathan Keane
     9	ZhangHuiGui
     8	Vibhatha Lakmal Abeykoon
     7	Rossi Sun
     6	Adam Reeve
     6	David Li
     6	Hyunseok Seo
     6	James Henderson
     6	Thomas Newton
     6	david dali susanibar arce
     5	Dominik Moritz
     5	Laurent Goujon
     5	Weston Pace
     4	Curt Hagenlocher
     4	Divyansh200102
     4	Gang Wu
     4	Ian Cook
     4	James Duong
     4	abandy
     3	Benjamin Kietzman
     3	Jin Shang
     3	Joel Lubinitsky
     3	Judah Rand
     3	Nic Crane
     3	Rok Mihevc
     3	Rossi(Ruoxi) Sun
     3	Vyas Ramasubramani
     3	Xiansen Chen
     2	Anja Kefala
     2	Gabriel Tomitsuka
     2	Josh Soref
     2	LucasG0
     2	Marcus D. Hanwell
     2	Michał Górny
     2	Neal Richardson
     2	Paul
     2	Sten Larsson
     2	Zhen Wang
     2	emkornfield
     2	wayne
     1	0x0000ffff
     1	Adam Curtis
     1	Alex Shcherbakov
     1	Alexander Blazhkov
     1	Ali Khalili
     1	Andrew Grosser
     1	Andrew Lamb
     1	Austin Dickey
     1	Chun Yang
     1	Clay Johnson
     1	Clif Houck
     1	David Greiss
     1	Donald Tolley
     1	Elliot Morrison-Reed
     1	Etienne Bacher
     1	Florian Bernard
     1	Florian Jetter
     1	Fokko Driesprong
     1	Francis
     1	Hadley Wickham
     1	Hattonuri
     1	Hussein Awala
     1	JB Onofré
     1	Jeffrey Vo
     1	Jeremy Aguilon
     1	Jinpeng
     1	Joe Marshall
     1	Jânio
     1	Kemal
     1	Kevin Gurney
     1	Kevin Mingtarja
     1	Lev Tolmachev
     1	Liang-Chi Hsieh
     1	Lubo Slivka
     1	Lyndon Shi
     1	MagicBoost
     1	Matthew McNew
     1	Miguel Pragier
     1	Miles
     1	Paul Nienaber
     1	Peter Newcomb
     1	Sandro
     1	Simon Perkins
     1	Siyang Tang
     1	Tom Jarosz
     1	Uwe L. Korn
     1	Will Jones
     1	Yan Zhou
     1	Yue
     1	arunppsg
     1	av8or1
     1	carehabit
     1	dsisnero
     1	ella-chao
     1	h-vetinari
     1	keshen-msft
     1	lriggs
     1	messense
     1	normanj-bitquill
     1	qmmk
     1	sgilmore10
     1	sullis
     1	tobim
     1	y.yoshida5
     1	ywgrit
     1	野鹿

Patch Committers

The following Apache committers merged contributed patches to the repository.

$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-15.0.2..apache-arrow-16.0.0
   176	Sutou Kouhei
    97	Antoine Pitrou
    58	Joris Van den Bossche
    50	David Li
    32	Matt Topol
    27	Curt Hagenlocher
    20	Jacob Wujciak-Jens
    17	Raúl Cumplido
    16	Felipe Oliveira Carvalho
    14	AlenkaF
    13	mwish
     9	Benjamin Kietzman
     8	Dewey Dunnington
     6	Nic Crane
     5	Bryce Mecum
     5	Jonathan Keane
     3	Weston Pace
     3	dependabot[bot]
     2	Kevin Gurney
     1	Rok Mihevc

Changelog

Apache Arrow 16.0.0 (2024-04-20 07:00:00)

Bug Fixes

  • GH-20379 - [Java] Dataset Failed to update reservation while freeing bytes (#40101)
  • GH-35081 - [Python] construct pandas.DataFrame with public API in to_pandas (#40897)
  • GH-35369 - [Docs] Add a missing space after ref:IPC format <format-ipc> (#38276)
  • GH-35718 - [Go][Parquet] Fix for null-only encoding panic (#39497)
  • GH-36026 - [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
  • GH-36026 - [Python] Fix ORC test segfault in the python wheel windows test (#40609)
  • GH-37164 - [Python] Attach Python stacktrace to errors in ConvertPyError (#39380)
  • GH-37841 - [Java] Dictionary decoding not using the compression factory from the ArrowReader (#38371)
  • GH-37989 - [Python] Plug reference leaks when creating Arrow array from Python list of dicts (#40412)
  • GH-38768 - [Python] Empty slicing an array backwards beyond the start is now empty (#40682)
  • GH-38768 - [Python] Slicing an array backwards beyond the start now includes first item. (#39240)
  • GH-38794 - [C++][S3] Handle conventional content-type for directories (#40147)
  • GH-38821 - [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371)
  • GH-38828 - [R] Ensure that streams can be written to socket connections (#38897)
  • GH-38833 - [C++] Avoid hash_mean overflow (#39349)
  • GH-38923 - [GLib] Fix spelling (#38924)
  • GH-38962 - [C++] Fix spelling (array) (#38963)
  • GH-39291 - [Docs] Remove the “Show source” links from doc pages (#40167)
  • GH-39309 - [Go][Parquet] handle nil bitWriter for DeltaBinaryPacked (#39347)
  • GH-39310 - [CI][Java][Docs] Failed by new module-info-compiler Maven plugin
  • GH-39416 - [GLib][Docs] Fixed Broken Link in README Content (#39896)
  • GH-39424 - [CI][R] test-r-rhub-debian-gcc-devel-lto-latest fails not being able to install Arrow
  • GH-39440 - [Python] Calling pyarrow.dataset.ParquetFileFormat.make_write_options as a class method results in a segfault (#40976)
  • GH-39444 - [Python] Fix parquet import in encryption test (#40505)
  • GH-39444 - [C++][Parquet] Fix crash in Modular Encryption (#39623)
  • GH-39456 - [Go][Parquet] Arrow DATE64 Type Coerced to Parquet DATE Logical Type (#39460)
  • GH-39466 - [Go][Parquet] Align Arrow and Parquet Timestamp Instant/Local Semantics (#39467)
  • GH-39519 - [Swift] Fix null count when using reader (#39520)
  • GH-39523 - [R] Don't override explicitly set NOT_CRAN=false when on dev version (#39524)
  • GH-39558 - [Java] Add SQL_ALL_TABLES_ARE_SELECTABLE, SQL_NULL_ORDERING and SQL_MAX_COLUMNS_IN_TABLE support to SqlInfoBuilder (#39561)
  • GH-39579 - [Python] fix raising ValueError on _ensure_partitioning (#39593)
  • GH-39683 - [Release] Use temporary directory with TEST_BINARY=1 (#39684)
  • GH-39706 - [Archery] Fix benchmark diff subcommand (#39733)
  • GH-39738 - [R] Support build against the last three released versions of Arrow (#39739)
  • GH-39765 - [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794)
  • GH-39769 - [C++][Device] Fix Importing nested and string types for DeviceArray (#39770)
  • GH-39782 - [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783)
  • GH-39788 - [Python] Validate max_chunksize in Table.to_batches (#39796)
  • GH-39841 - [GLib] Add support for GLib 2.56 again (#39842)
  • GH-39857 - [C++] Improve error message for “chunker out of sync” condition (#39892)
  • GH-39870 - [Go] Include buffered pages in TotalBytesWritten (#40105)
  • GH-39874 - [CI][C++][Windows] Use pre-installed OpenSSL (#39882)
  • GH-39883 - [CI][R][Windows] Use ci/scripts/install_minio.sh with Git bash (#39929)
  • GH-39909 - [Java][CI] Update reference to Float16 testing file reference on Testing submodule (#39911)
  • GH-39921 - [Go][Parquet] ColumnWriter not reset TotalCompressedBytes after Flush (#39922)
  • GH-39925 - [Go][Parquet] Fix re-slicing in maybeReplaceValidity function (#39926)
  • GH-39935 - [GLib][Docs] Use GI-DocGen instead of GTK-Doc (#40427)
  • GH-39955 - [C++] Use make -j1 to install bundled bzip2 (#39956)
  • GH-39965 - [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995)
  • GH-39973 - [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975)
  • GH-39992 - [CI][Docs][Java] ubuntu-docs uses Maven version in .env (#39993)
  • GH-39996 - [Archery] Fix Crossbow build on a PR from a fork's main branch (#40002)
  • GH-39996 - [Archery] Fix Crossbow build on a PR from a fork's main branch (#39997)
  • GH-40038 - [Java] Export non empty offset buffer for variable-size layout through C Data Interface (#40043)
  • GH-40039 - [Java][FlightRPC] Improve performance by removing unnecessary memory copies (#40042)
  • GH-40040 - [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041)
  • GH-40052 - [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054)
  • GH-40085 - [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086)
  • GH-40089 - [Go] Concurrent Recordset for receiving huge recordset (#40090)
  • GH-40097 - [Go][FlightRPC] Enable disabling TLS (#40098)
  • GH-40126 - [C++] Type resolution fails to bind decimal types with different precisions and scales when calling arithmetic functions (#40223)
  • GH-40145 - [C++][Docs] Correct the console emitter link (#40146)
  • GH-40153 - [C++][Python] Fix test_gdb failures on 32-bit (#40293)
  • GH-40153 - [Python] Make Tensor.__getbuffer__ work on 32-bit platforms (#40294)
  • GH-40153 - [Python] Avoid using np.take in Array.to_numpy() (#40295)
  • GH-40153 - [Python][C++] Fix large file handling on 32-bit Python build (#40176)
  • GH-40153 - [Python] Update size assumptions for 32-bit platforms (#40165)
  • GH-40153 - [Python] Fix OverflowError in foreign_buffer on 32-bit platforms (#40158)
  • GH-40171 - [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172)
  • GH-40181 - [C++] Support glog 0.7 build (#40230)
  • GH-40183 - [C++] Fix cast function bind failure after adding an alias name through AddAlias (#40200)
  • GH-40199 - [R] dbplyr 2.5.0 forward compatibility (#40197)
  • GH-40207 - [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206)
  • GH-40227 - [R] ensure executable files in create_package_with_all_dependencies (#40232)
  • GH-40233 - [C++] Fix an abort in asof_join_benchmark run caused by a missing arg (#40234)
  • GH-40249 - [Java] Fix NPE in ArrowDatabaseMetadata (#40988)
  • GH-40266 - [Python] Mark ListView as a nested type (#40265)
  • GH-40268 - [Archery] Bump the version of pygit2, adapt to API changes (#40269)
  • GH-40276 - [C++] Fix a simple buffer-overflow case in decimal_benchmark (#40277)
  • GH-40279 - [C++] Reduce S3Client initialization time (#40299)
  • GH-40306 - [C++] Fix a wrong total_bytes to generate StringType's test data in vector_hash_benchmark (#40307)
  • GH-40308 - [C++][Gandiva] Add support for compute module's decimal promotion rules (#40434)
  • GH-40316 - [Python] only allocate the ScalarMemoTable when used (#40565)
  • GH-40327 - [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330)
  • GH-40331 - [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332)
  • GH-40334 - [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338)
  • GH-40366 - [C++] Remove const qualifier from Buffer::mutable_span_as (#40367)
  • GH-40375 - [Python] Error compiling Cython files on Windows during release verification
  • GH-40395 - [C++] Avoid simplifying expressions which call impure functions (#40396)
  • GH-40398 - [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399)
  • GH-40422 - [C++][FlightRPC] Add missing expiration_time arguments (#40425)
  • GH-40431 - [C++] Move key_hash/key_map/light_array related files to internal to prevent use by users (#40484)
  • GH-40432 - [C++] Add missing Threads::Threads dependency to arrow_static (#40433)
  • GH-40439 - [Python] Fix flake8 failures in python/benchmarks/parquet.py (#40440)
  • GH-40443 - [Python] Suppress python/examples/minimal_build/Dockerfile.* warnings (#40444)
  • GH-40445 - [C++] Fix static build on Windows (#40446)
  • GH-40500 - [C++] Ensure using bundled FlatBuffers (#40519)
  • GH-40535 - [Docs][R] Set RETICULATE_PYTHON_ENV in order to find pyarrow (#40571)
  • GH-40558 - [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
  • GH-40562 - [C++] Repair FileSystem merge error (#40564)
  • GH-40566 - [C++] Fix 3.12 Python support (#40322)
  • GH-40568 - [Java] Test failure in Dataset regarding TestAllTypes (#40662)
  • GH-40591 - [R] Add extra CSS for navbar on pkgdown website (#40610)
  • GH-40602 - [C++] Move mold linker flags to variables (#40603)
  • GH-40615 - [Packaging][deb] Move libprotobuf-dev dependency to libarrow-dev from libarrow-flight-dev (#40617)
  • GH-40616 - [Docs][GLib] Ensure overwriting placeholder front pages (#40618)
  • GH-40619 - [Java] JDBC Adapter Build Issue (#40656)
  • GH-40623 - [Python][Docs] Add workaround for autosummary (#40739)
  • GH-40634 - [C#] ArrowStreamReader should not be null (#40765)
  • GH-40642 - [Python] BUG: Empty slicing an array backwards beyond the start should be empty
  • GH-40652 - [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769)
  • GH-40668 - [Ruby][CI] Require GLib 2.58 or later for timezone (#40669)
  • GH-40672 - [Go][Parquet] Add proper build tags for min_max (#40676)
  • GH-40674 - [GLib] Don't assume gint64 and int64_t use the same type (#40736)
  • GH-40693 - [Go] Fix Decimal type precision loss on GetOneForMarshal (#40694)
  • GH-40700 - [Go][CI] test-debian-12-go-1.21 fails with `go: updates to go.mod needed`
  • GH-40702 - [R] Avoid undocumented dbplyr internals in duckdb tests (#40710)
  • GH-40703 - [CI][Packaging] Homebrew can't install Python 3.12 on GHA runners (#40704)
  • GH-40706 - [CI][Python] Activate ARROW_PYTHON_VENV if defined in sdist-test job (#40707)
  • GH-40716 - [Java][Integration] Fix test_package_java in verification scripts (#40724)
  • GH-40718 - [JS] Fix set visitor in vectors for js dates (#40725)
  • GH-40719 - [Go] Make arrow.Null non-null for arrow.TypeEqual to work properly with new(arrow.NullType) (#40802)
  • GH-40727 - [C++][Gandiva] ‘ilike’ function does not work (#40728)
  • GH-40751 - [C++] Fix protobuf package name setting for builds with substrait (#40753)
  • GH-40773 - [Java] add DENSEUNION case to StructWriters, resolves #40773 (#40809)
  • GH-40775 - [Benchmarking][Java] Fix conbench timeout (#40786)
  • GH-40788 - [C#] Override Accept in MapArray (#40789)
  • GH-40790 - [C#] Account for offset and length when getting fields of a StructArray (#40805)
  • GH-40792 - [C#] Fix slicing a previously sliced array (#40793)
  • GH-40847 - [Go] update readme (#40877)
  • GH-40851 - [JS] Fix nullcount and make vectors created from typed arrays not nullable (#40852)
  • GH-40855 - [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023)
  • GH-40858 - [R] Remove dangling commas from codegen.R (#40859)
  • GH-40863 - [C++] Fix TSAN link error for module library (#40864)
  • GH-40870 - [C#] Update CompareValidityBuffer() to pass when unspecified final bits are not identical (#40873)
  • GH-40878 - [JAVA] Fix flight-sql-jdbc-driver shading issues (#40879)
  • GH-40891 - [JS] Store Dates as TimestampMillisecond (#40892)
  • GH-40893 - [Java][FlightRPC] Support IntervalMonthDayNanoVector in FlightSQL JDBC Driver (#40894)
  • GH-40896 - [Java] Remove runtime dependencies on Eclipse, logback (#40904)
  • GH-40898 - [C#] Do not import length-zero buffers from C Data Interface Arrays (#41054)
  • GH-40900 - [Go] Fix Mallocator Weirdness (#40902)
  • GH-40907 - [Java][FlightSQL] Shade slf4j-api in JDBC driver (#40908)
  • GH-40952 - [Java][FlightSQL] Clean up flight-sql-jdbc-driver dependencies (#40953)
  • GH-40954 - [CI] Fix use of obsolete docker-compose command on Github Actions (#40949)
  • GH-40961 - [GLib] Suppress warnings for Vala examples on macOS (#40962)
  • GH-40974 - [CI][Python] CI failures on Python builds due to pytest_cython (#40975)
  • GH-40991 - [R] Prefer r-universe, add a startup message (#41019)
  • GH-40999 - [Java] Fix AIOOBE trying to splitAndTransfer DUV within nullable struct (#41000)
  • GH-41004 - [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind (#41163)
  • GH-41005 - [CI] HDFS and skyhook tests require docker compose usage because they require multiple containers (#41027)
  • GH-41007 - [CI][Archery] Correctly interpolate environment variables from docker compose when using docker cli on archery docker (#41026)
  • GH-41015 - [JS][Benchmarking] allow JS benchmarks to run more portably (#41031)
  • GH-41016 - [C++] Fix null count check in BooleanArray.true_count() (#41070)
  • GH-41024 - [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
  • GH-41032 - [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037)
  • GH-41039 - [Python] ListView pandas tests should use np.nan instead of None (#41040)
  • GH-41044 - [C++] formatting.h: Make sure space is allocated for the ‘Z’ when formatting timestamps (#41045)
  • GH-41061 - [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062)
  • GH-41088 - [CI][Crossbow] Fix GitHub Actions workflow syntax error (#41091)
  • GH-41119 - [Archery][Packaging][CI] Avoid using --progress flag on Docker on Windows on archery (#41120)
  • GH-41121 - [C++] Fix: left anti join filter empty rows. (#41122)
  • GH-41124 - [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
  • GH-41127 - [CI] Use GitHub Actions instead of Azure Pipelines for docker-tests (#41153)
  • GH-41145 - [R][CI] test-r-dev-duckdb fails installing duckdb (#41152)
  • GH-41147 - [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
  • GH-41154 - [C++] Fix Valgrind error in string-to-float16 conversion (#41155)
  • GH-41167 - [CI][Release][GLib][Conda] Pin gobject-introspection to 1.78.1 (#41181)
  • GH-41169 - [CI][Release] Specify --build-config explicitly on Windows (#41178)
  • GH-41176 - [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177)
  • GH-41201 - [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202)

New Features and Improvements

  • GH-18014 - [C++] Filesystem implementation for Azure Blob Storage
  • GH-20127 - [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis setup (#40363)
  • GH-20127 - [Python] Remove deprecated pyarrow.filesystem legacy implementations (#39825)
  • GH-20213 - [C++] Implement cast to/from halffloat (#40067)
  • GH-20339 - [C++] Add residual filter support to swiss join (#39487)
  • GH-23221 - [C++] Add support for building with Emscripten (#37821)
  • GH-24826 - [Java] Add DUV.setOffset method (#40985)
  • GH-24834 - [C#] Support writing compressed IPC data (#39871)
  • GH-30915 - [C++][Python] Add missing methods to RecordBatch (#39506)
  • GH-31545 - [GLib] Enable clang-format (#40451)
  • GH-31735 - [Docs][Release] Move release verification guide to developers documentation (#39960)
  • GH-33499 - [Python][CI] Support ORC in Windows wheels
  • GH-34235 - [Python] Correct test marker for join_asof tests (#40666)
  • GH-34235 - [Python] Add join_asof binding (#34234)
  • GH-34865 - [C++][Java][Flight RPC] Add Session management messages (#34817)
  • GH-35875 - [R] Update Readme (#40148)
  • GH-35941 - [Dev][MATLAB] Add clang-format configuration to pre-commit (#40588)
  • GH-36656 - [Dev] Validate in merge script if issue has an assigned milestone already (#40771)
  • GH-37286 - [Java] Start adding nullability/nullness annotations (#37723)
  • GH-37328 - [Python] Add a function to download and extract timezone database on Windows (#38179)
  • GH-37381 - [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and Windows wheels for pyarrow
  • GH-37484 - [Python] Add a FixedSizeTensorScalar class (#37533)
  • GH-37931 - [Python][CI][Dev][Python] Release and merge script errors (#37819) (#40150)
  • GH-38010 - [Python] Construct pyarrow.Field and ChunkedArray through Arrow PyCapsule Protocol (#40818)
  • GH-38309 - [C++] build filesystems as separate modules (#39067)
  • GH-38560 - [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335)
  • GH-38573 - [Java][FlightRPC] Try all locations in JDBC driver (#40104)
  • GH-38659 - [CI][MATLAB][Packaging] Add MATLAB packaging task to crossbow tasks.yml (#38660)
  • GH-38663 - [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160)
  • GH-38703 - [C++][FS][Azure] Implement DeleteFile() (#39840)
  • GH-38704 - [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904)
  • GH-38717 - [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455)
  • GH-38916 - [R] Simplify dataset and table print output (#38917)
  • GH-38988 - [Go] Expose dictionary size from DictionaryBuilder (#39521)
  • GH-38998 - [Java] Build memory-core and memory-unsafe as JPMS modules (#39011)
  • GH-39001 - [Java] Modularize remaining modules (#39221)
  • GH-39057 - [CI][C++][Go] Don't run jobs that use a self-hosted GitHub Actions Runner on fork (#39903)
  • GH-39069 - [C++][FS][Azure] Use the generic filesystem tests (#40567)
  • GH-39147 - [R] Add Bootstrap.r (#39148)
  • GH-39231 - [C++][Compute] Add binary_slice kernel for fixed size binary (#39245)
  • GH-39233 - [Compute] Add some duration kernels (#39358)
  • GH-39270 - [C++] Avoid creating memory manager instance for every buffer view/copy (#39271)
  • GH-39277 - [Python] Fix missing byte_width attribute on DataType class (#39592)
  • GH-39330 - [Java][CI] Fix or suppress spurious errorprone warnings (#39529)
  • GH-39336 - [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337)
  • GH-39352 - [FS][Azure] Enable azure in builds (#39971)
  • GH-39377 - [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
  • GH-39385 - [C++] Use more permissible return code for rename (#39481)
  • GH-39398 - [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397)
  • GH-39427 - [GLib] Update script and documentation (#39428)
  • GH-39463 - [C++] Support cast kernel from large string, (large) binary to dictionary (#40017)
  • GH-39532 - [Python] Compatibility with NumPy 2.0
  • GH-39549 - [C++] Pass -jN to make in external projects (#39550)
  • GH-39552 - [Go] inclusion of option to use replacer when creating csv strings with go library (#39576)
  • GH-39555 - [Packaging][Python] Enable building pyarrow against numpy 2.0 (#39557)
  • GH-39560 - [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570)
  • GH-39574 - [Go] Enable PollFlightInfo in Flight RPC (#39575)
  • GH-39621 - [CI][Packaging] Update vcpkg to 2023.11.20 release (#39622)
  • GH-39651 - [Python] Basic pyarrow bindings for Binary/StringView classes (#39652)
  • GH-39654 - [Java] Upgrade to Netty 4.1.105.Final (#39655)
  • GH-39663 - [C++] Ensure top-level benchmarks present informative metrics (#40091)
  • GH-39666 - [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764)
  • GH-39667 - [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766)
  • GH-39669 - [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435)
  • GH-39680 - [Java] enable half float support on Java module (#39681)
  • GH-39697 - [R] Source build should check if offline (#39699)
  • GH-39702 - [GLib] Add support for time zone in GArrowTimestampDataType (#39717)
  • GH-39704 - [C++][Parquet] Benchmark levels decoding (#39705)
  • GH-39707 - [Java] Enable local build cache for Maven/Java build (#39708)
  • GH-39718 - [C++][FS][Azure] Remove StatusFromErrorResponse as it's not necessary (#39719)
  • GH-39720 - [Swift] Switch reader to use arrow field instead of proto for building arrays (#39721)
  • GH-39734 - [Java] Bump org.codehaus.mojo:exec-maven-plugin from 1.6.0 to 3.1.1 (#39696)
  • GH-39747 - [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748)
  • GH-39752 - [Java] Remove Static imports for Utf8 Usage (#40683)
  • GH-39761 - [Docs] Link to Go documentation references outdated documentation from 2018 (#39750)
  • GH-39771 - [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772)
  • GH-39774 - [Go] Add public access to PreparedStatement handle (#39775)
  • GH-39779 - [Python] Expose force_virtual_addressing in PyArrow (#39819)
  • GH-39780 - [Python][Parquet] Support hashing for FileMetaData and ParquetSchema (#39781)
  • GH-39812 - [Python] Add bindings for ListView and LargeListView (#39813)
  • GH-39815 - [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817)
  • GH-39823 - [C++] Allow building cpp/src/arrow/**/*.cc without waiting bundled libraries (#39824)
  • GH-39837 - [Go][Flight] Allow cloning existing cookies in middleware (#39838)
  • GH-39843 - [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844)
  • GH-39845 - [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847)
  • GH-39848 - [Python][Packaging] Build pyarrow wheels with numpy RC instead of nightly (#41097)
  • GH-39852 - [Python] Support creating Binary/StringView arrays from python objects (#39853)
  • GH-39855 - [Python] ListView support for pa.array() (#40160)
  • GH-39859 - [R] Remove macOS from the allow list (#39861)
  • GH-39863 - [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
  • GH-39864 - [C++] DataType::ToString support optionally show metadata (#39888)
  • GH-39872 - [Packaging][Ubuntu] Add support for Ubuntu 24.04 Noble Numbat (#39887)
  • GH-39885 - [CI][MATLAB] Bump matlab-actions/setup-matlab and matlab-actions/run-tests from v1 to v2 (#39886)
  • GH-39900 - [Java][CI] To upload Maven and Memory Netty Buffer Patch into Apache Nightly repository (#39901)
  • GH-39910 - [Go] Add func to load prepared statement from ActionCreatePreparedStatementResult (#39913)
  • GH-39928 - [C++][Gandiva] Accept LLVM 18 (#39934)
  • GH-39930 - [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932)
  • GH-39946 - [Java] Bump com.puppycrawl.tools:checkstyle from 8.19 to 8.29 (#39694)
  • GH-39958 - [Python][CI] Remove upper pin on pytest (#40487)
  • GH-39962 - [C++] Small CSV reader refactoring (#39963)
  • GH-39968 - [Python][FS][Azure] Minimal Python bindings for AzureFileSystem (#40021)
  • GH-39978 - [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
  • GH-39979 - [Python] Low-level bindings for exporting/importing the C Device Interface (#39980)
  • GH-39984 - [Python] Add ChunkedArray import/export to/from C (#39985)
  • GH-39987 - [R] Make it possible to use a rtools libarrow on windows (#39986)
  • GH-40011 - [CI] Update Fedora to 39 from 38 (#40012)
  • GH-40023 - [Python] Use Cast() instead of CastTo (#40116)
  • GH-40026 - [C++][FS][Azure] Add support for reading user defined metadata (#40671)
  • GH-40028 - [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325)
  • GH-40029 - [Packaging][Ubuntu] Drop support for Ubuntu 23.10 Mantic Minotaur (#40030)
  • GH-40037 - [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119)
  • GH-40055 - [Java][Docs] Simplify use of Filter and Expression into Dataset Substrait (#40056)
  • GH-40059 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064)
  • GH-40060 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359)
  • GH-40061 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803)
  • GH-40066 - [Python] Support requested_schema in __arrow_c_stream__() (#40070)
  • GH-40074 - [C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075)
  • GH-40077 - [CI] Use GitHub hosted M1 macOS runner (#40437)
  • GH-40079 - [CI][Packaging] Enable Azure in more tests and builds (#40080)
  • GH-40082 - [CI][C++] Add a job on ARM64 macOS (#40456)
  • GH-40092 - [Python] Support Binary/StringView conversion to numpy/pandas (#40093)
  • GH-40095 - [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127)
  • GH-40113 - [Go][Parquet] New RegisterCodec function (#40114)
  • GH-40133 - [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132)
  • GH-40142 - [Python] Allow FileInfo instances to be passed to dataset init (#40143)
  • GH-40151 - [C++] Make S3 narrative test more flexible (#40144)
  • GH-40152 - [C++] Remove redundant invocation of BatchesFromTable (#40173)
  • GH-40155 - [Go][FlightRPC][FlightSQL] Implement Session Management (#40284)
  • GH-40159 - [Python][CI] Add 32-bit Debian build on Crossbow (#40164)
  • GH-40190 - [R][Docs] Update NEWS.md with build system changes (#40191)
  • GH-40205 - [Python] ListView arrow-to-pandas conversion (#40482)
  • GH-40209 - [C++][CMake] Use “RapidJSON” CMake target for RapidJSON (#40210)
  • GH-40212 - [R][CI] Add a C++ with gcc 14 build (#40244)
  • GH-40221 - [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222)
  • GH-40224 - [C++] Fix: improve the backpressure handling in the dataset writer (#40722)
  • GH-40228 - [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229)
  • GH-40236 - [Python][CI] Disable generating C lines in Cython tracebacks (#40225)
  • GH-40261 - [Go] Don't export array functions with unexposed return types (#40272)
  • GH-40273 - [Python] Support construction of Run-End Encoded arrays in pa.array(..) (#40341)
  • GH-40274 - [C++] Add support for system glog 0.7 (#40275)
  • GH-40280 - [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281)
  • GH-40291 - [Python] Accept dict in pyarrow.record_batch() function (#40292)
  • GH-40318 - [C++][Docs] Add documentation of array factories (#40373)
  • GH-40323 - [R][CI] Use rocker/r-ver instead of library/r-base (#40321)
  • GH-40328 - [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329)
  • GH-40333 - [Docs] Improve env var docs for ARROW_USER_SIMD_LEVEL (#40374)
  • GH-40345 - [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084)
  • GH-40357 - [C++] Add benchmark for ToTensor conversions (#40358)
  • GH-40370 - [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
  • GH-40376 - [Python] Update for NumPy 2.0 ABI change in PyArray_Descr->elsize (#40418)
  • GH-40377 - [Python][CI] Fix install of nightly dask in integration tests (#40378)
  • GH-40379 - [Python] Fix byte_width for binary(0) + fix hypothesis tests (#40381)
  • GH-40394 - [C++] Add support for mold (#40397)
  • GH-40400 - [C++] Add support for LLD (#40927)
  • GH-40402 - [GLib] Add missing compute function options classes (#40403)
  • GH-40405 - [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406)
  • GH-40428 - [Python][CI] Fix dataset partition filter tests with pandas nightly (#40429)
  • GH-40438 - [GLib] Add GArrowTimestampParser (#40457)
  • GH-40441 - [GLib][Docs] Use Sphinx for Apache Arrow GLib front page (#40442)
  • GH-40448 - [CI][Dev] Run pre-commit (#40449)
  • GH-40454 - [CI][Debian] Update Debian to 12 from 11 (#40455)
  • GH-40495 - [GLib] Use G_DECLARE_DERIVABLE_TYPE() (#40497)
  • GH-40498 - [GLib] Remove arrow-glib/gobject-type.h (#40499)
  • GH-40507 - [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
  • GH-40515 - [Java] Bump org.apache.maven dependencies from 3.3.9 to 3.8.7 (#40514)
  • GH-40522 - [Dev][Go] Add Dependabot configuration for Go (#40523)
  • GH-40536 - [CI] Migrate remaining jobs away from self-hosted mac runners (#40537)
  • GH-40540 - [CI][C++] Don't install FlatBuffers (#40541)
  • GH-40542 - [Dev][CI] Run pre-commit to all files (#40543)
  • GH-40544 - [Dev] Add cmake-format configuration to pre-commit (#40545)
  • GH-40549 - [Java] Revert bump org.apache.maven.plugins:maven-shade-plugin from 3.2.4 to 3.5.2 in /java (#40462) (#41006)
  • GH-40551 - [Release][Docs] Improve documentation for patch Release process (#40552)
  • GH-40553 - [C#] Avoid logger instantiations per request (#40554)
  • GH-40573 - [GLib][Ruby][CSV] Add support for customizing timestamp parsers (#40590)
  • GH-40575 - [Docs][Python] Added JsonFileFormat to docs (#40585)
  • GH-40577 - [C++] Ensure pkg-config flags include -ldl for static builds (#40578)
  • GH-40586 - [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
  • GH-40607 - [C++] Rename Function::is_impure() to is_pure() (#40608)
  • GH-40621 - [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625)
  • GH-40630 - [Go][Parquet] Enable writing of Parquet footer without closing file (#40654)
  • GH-40659 - [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661)
  • GH-40680 - [Java] Test JDK 22 in CI (#41038)
  • GH-40684 - [Java][Docs] JNI module debugging with IntelliJ (#40685)
  • GH-40689 - [Docs] Add nanoarrow to implementation status page (#41052)
  • GH-40690 - [C#][FlightRPC] Add do_exchange csharp implementation (#40691)
  • GH-40695 - [C++] Expand Substrait type support (#40696)
  • GH-40698 - [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699)
  • GH-40720 - [Python] Simplify and improve perf of creation of the column names in Table.to_pandas (#40721)
  • GH-40731 - [C++][Parquet] Minor enhancement code of encryption (#40732)
  • GH-40733 - [Go] Require Go 1.21 or later (#40848)
  • GH-40745 - [Java][FlightRPC] Support configuring backpressure threshold (#41051)
  • GH-40767 - [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768)
  • GH-40783 - [C++] Re-order loads and stores in MemoryPoolStats update (#40647)
  • GH-40784 - [JS] Use bigIntToNumber (#40785)
  • GH-40791 - [Dev][CI] Use the official hadolint configuration (#40794)
  • GH-40796 - [Java] set lastSet in ListVector.setNull to avoid O(n²) in ListVectors with lots of nulls (#40810)
  • GH-40799 - [Doc][Format] Implementation status page should list canonical extension types (#41053)
  • GH-40801 - [Docs] Clarify device identifier documentation in the Arrow C Device data interface (#41101)
  • GH-40806 - [C++] Revert changes from PR #40857 (#40980)
  • GH-40806 - [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
  • GH-40814 - [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
  • GH-40833 - [Docs][Release] Make explicit in the documentation that verifying binaries is not required in order to cast a vote (#40834)
  • GH-40841 - [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842)
  • GH-40843 - [Java] Cleanup protobuf-maven-plugin usage (#40844)
  • GH-40866 - [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867)
  • GH-40872 - [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876)
  • GH-40882 - [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883)
  • GH-40888 - [Go][FlightRPC] support conversion from array.Duration in FlightSQL driver (#40889)
  • GH-40983 - [C++] Fix unused function build error (#40984)
  • GH-40994 - [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995)
  • GH-41034 - [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068)
  • GH-41043 - [CI][Python] check message in test_make_write_options_error for Cython 2 (#41059)
  • GH-41047 - [C#] Address performance issue of reading from StringArray (#41048)
  • GH-41098 - [Python] Add copy keyword in Array.array for numpy 2.0+ compatibility (#41071)
  • GH-41100 - [Python][Packaging] PyArrow wheel building is failing because of disabled vcpkg install of liblzma
  • GH-41227 - [CI][Release][GLib][Conda] Unpin gobject-introspection (#41228)
  • PARQUET-2423 - [C++][Parquet] Avoid allocating buffer object in RecordReader's SkipRecords (#39818)