layout: default
title: Apache Arrow 13.0.0 Release
permalink: /release/13.0.0.html

Apache Arrow 13.0.0 (23 August 2023)

This is a major release covering more than 2 months of development.

Download

Contributors

This release includes 608 commits from 108 distinct contributors.

$ git shortlog -sn apache-arrow-12.0.1..apache-arrow-13.0.0
    83	Sutou Kouhei
    47	Raúl Cumplido
    35	Nic Crane
    26	Joris Van den Bossche
    25	mwish
    24	Weston Pace
    20	sgilmore10
    19	Felipe Oliveira Carvalho
    17	Antoine Pitrou
    16	Alenka Frim
    15	Matt Topol
    15	rtpsw
    13	Igor Izvekov
    13	Jin Shang
    12	Dane Pitkin
    12	Kevin Gurney
    11	Alex Shcherbakov
    11	David Li
    11	Dewey Dunnington
     9	Gang Wu
     9	Jacob Wujciak-Jens
     8	Ben Harkins
     8	Herman Schaaf
     7	david dali susanibar arce
     6	Dominik Moritz
     6	Will Jones
     6	abandy
     5	Curt Hagenlocher
     5	Yevgeny Pats
     5	dependabot[bot]
     4	Li Jin
     4	Matthias Loibl
     4	Neal Richardson
     3	Bryce Mecum
     3	Jinpeng
     3	eitsupi
     2	Abe Tomoaki
     2	Aleksei Smirnov
     2	Benjamin Kietzman
     2	Chunchun Ye
     2	David Greiss
     2	Davide Pasetto
     2	Julien Jerphanion
     2	Junming Chen
     2	Laurent Goujon
     2	Michael Lui
     2	Simon Perkins
     2	Spencer Nelson
     2	henrymai
     2	liujiacheng777
     2	rtadepalli
     2	zhjwpku
     1	0x26res
     1	Adam Reeve
     1	Alexey Ozeritskiy
     1	Aljaž Mur Eržen
     1	Andrew Lamb
     1	Anja Kefala
     1	Arnaud Feldmann
     1	Austin Dickey
     1	Benson Muite
     1	Bryan Cutler
     1	Carlos O'Ryan
     1	Chenxi LI
     1	Chris Hoff
     1	Diana Sulmone
     1	Diogo Teles Sant'Anna
     1	Dirk Eddelbuettel
     1	Dongjoon Hyun
     1	Dr. Jan-Philip Gehrcke
     1	Elliott Brossard
     1	Erez Rokah
     1	Fokko Driesprong
     1	Francis
     1	Ian Cook
     1	Ivan Chesnov
     1	James Henderson
     1	June Liu
     1	Lei Hou
     1	Mark Wolfe
     1	Martin Traverse
     1	Mats Kindahl
     1	Matthew Roeschke
     1	Nick Byrne
     1	NoahFournier
     1	Parth Chonkar
     1	Philip
     1	Rok Mihevc
     1	Romain François
     1	Rong Ma
     1	Sergey Fedorov
     1	Sven Rebhan
     1	The Alchemist
     1	Theodore Tsirpanis
     1	Thor
     1	Toby Dylan Hocking
     1	Wenbo Hu
     1	candiduslynx
     1	clickingbuttons
     1	jeremyosterhoudt
     1	lord
     1	lriggs
     1	micah-white
     1	panbingkun
     1	ruoxi
     1	sunpeng
     1	takuya kodama
     1	wenxlan

Patch Committers

The following Apache committers merged contributed patches to the repository.

$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-12.0.1..apache-arrow-13.0.0
   155	Sutou Kouhei
    96	Antoine Pitrou
    62	Matt Topol
    44	Joris Van den Bossche
    44	Nic Crane
    34	David Li
    29	Raúl Cumplido
    27	Weston Pace
    16	Jacob Wujciak-Jens
    16	Will Jones
    13	Li Jin
     8	Dewey Dunnington
     7	Eric Erhardt
     6	Alenka Frim
     5	AlenkaF
     5	Dominik Moritz
     4	Benjamin Kietzman
     2	Andrew Lamb
     2	Kevin Gurney
     2	Matthew Topol
     1	Gang Wu
     1	Neal Richardson

Changelog

Apache Arrow 13.0.0 (2023-08-23 07:00:00)

Bug Fixes

  • GH-14969 - [R][Docs] Enable pkgdown built-in search (#36374)
  • GH-20385 - [C++][Parquet] Reject partial load of an extension type (#33634)
  • GH-23870 - [Python] Ensure parquet.write_to_dataset doesn't create empty files for non-observed dictionary (category) values (#36465)
  • GH-32832 - [Go] support building with tinygo (#35723)
  • GH-34017 - [Python][FlightRPC][Doc] Fix FlightStreamReader.read_chunk's docstring (#35583)
  • GH-34293 - [Java] Error loading native libraries on Windows (#34312)
  • GH-34338 - [Java] Removing the automatic enabling of BaseAllocator.DEBUG on -ea (#36042)
  • GH-34351 - [C++][Parquet] Statistics: add detail documentation and tiny optimization (#35989)
  • GH-34363 - [C++] Use equal size parts in S3 upload for R2 compatibility (#35808)
  • GH-34391 - [C++] Future as-of-join-node hangs on distant times (#34392)
  • GH-34523 - [C++] Avoid mixing bundled Abseil and system Abseil (#35387)
  • GH-34656 - [CI][Python] Use gemfury tool to upload wheels instead of curl to fix Windows wheel upload (#35032)
  • GH-34723 - [Java] Enable log trace for Netty allocator memory usage (#35314)
  • GH-34752 - [C++] Add support for LoongArch (#34740)
  • GH-34775 - [R] arrow_table: as.data.frame() sometimes returns a tbl and sometimes a data.frame (#35173)
  • GH-34884 - [Python] Support pickling pyarrow.dataset PartitioningFactory objects (#36550)
  • GH-34884 - [Python] Support pickling pyarrow.dataset Partitioning subclasses (#36462)
  • GH-34886 - [Python] Add correct array numpy conversion for Table and RecordBatch (#36242)
  • GH-34897 - [R] Ensure that the RStringViewer helper class does not own any Array references (#35812)
  • GH-34907 - [Docs][R] Version selector reports that release version is dev (#35103)
  • GH-35007 - [C++] Fix reading stdin (#35006)
  • GH-35015 - [Go] Fix parquet memleak (#35973)
  • GH-35027 - [Go] Use base64.StdEncoding in FixedSizeBinaryBuilder Unmarshal (#35028)
  • GH-35053 - [Java] Fix MemoryUtil to support Java 21 (#36370)
  • GH-35059 - [C++] Fix “hash_count” for run-end encoded inputs (#35129)
  • GH-35101 - [C++] Update deprecated LOCATION target property in ArrowConfig.cmake.in (#35109)
  • GH-35107 - [FlightSQL] Use uint8 to refer to 8 bit unsigned integers rather than uint1 (#35108)
  • GH-35118 - [Format][FlightSQL] More use int32 to refer to 32-bit integers rather than int (#35213)
  • GH-35118 - [FlightSQL] Use int32 to refer to 32-bit integers rather than int (#35120)
  • GH-35140 - [R] Rewrite configure script and ensure we don't use mismatched libarrow (#35147)
  • GH-35144 - [C++] Fix a unit test broken when the output order of the aggregate node changed (#35145)
  • GH-35177 - [Docs][Python] Suppress “WARNING: autosummary: failed to import serialize” (#35182)
  • GH-35179 - [C++] Fix IMPORTED_LOCATION property for Arrow::bundled_dependencies (#35196)
  • GH-35188 - [Go] Use AppendValueFromString for extension types in CSV Reader (#35189)
  • GH-35190 - [Go] Correctly handle null values in CSV reader (#35191)
  • GH-35193 - [Python][Packaging] Enable GCS on Windows wheels (#35255)
  • GH-35202 - [Go][Parquet] Fix panic reading nested empty list (#35276)
  • GH-35234 - [Go] Fix skip argument to Callers (#35231)
  • GH-35240 - [Go][FlightRPC] Fix crash in client middleware (#35241)
  • GH-35266 - [GLib][Parquet] Fix a GC bug that parent metadata reference is missing in sub metadata (#35286)
  • GH-35266 - [CI][GLib][Parquet] Omit gparquet_column_chunk_metadata_equal() test (#35278)
  • GH-35267 - [C#] Serialize TotalBytes and TotalRecords in FlightInfo (#35222)
  • GH-35270 - [C++] Use Buffer instead of raw buffer in hash join internals (#35347)
  • GH-35297 - [C++][IPC] Fix schema deserialization of map field (#35298)
  • GH-35306 - Fix Schema.Fields() to return copy of fields (#35307)
  • GH-35310 - [Go] Incorrect value decimal128 from string (#35311)
  • GH-35316 - [C++][FlightSQL] Use RowsToBatches() instead of ArrayFromJSON() in SQLite example server (#35322)
  • GH-35326 - [Go] Fix *array.List and *array.LargeList ValueOffsets implementation (#35327)
  • GH-35346 - [CI][Python] Move gdb from env-file to dockerfile (#35348)
  • GH-35352 - [Java] Fix issues with “semi complex” types. (#35353)
  • GH-35359 - [C++] FixedSizeListArray.flatten() errors if all elements are null (#35674)
  • GH-35360 - [C++] Take offset into account in ScalarHashImpl::ArrayHash() (#35814)
  • GH-35363 - [C++] Fix Substrait schema names and for segmented aggregation (#35364)
  • GH-35379 - [C++][FlightRPC] Add teardown needed checks to avoid crash on error (#35380)
  • GH-35383 - [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault (#35384)
  • GH-35406 - [Website][Docs] Missing logo on Arrow docs page
  • GH-35413 - [Python] Add concrete floating point array types to pyarrow public API (#35414)
  • GH-35421 - [Go] Ensure interface contract between array.X.ValueStr & array.XBuilder.AppendValueFromString (#35457)
  • GH-35425 - [R] Tests failures on R < 4.0 due to data.frame conversion (#35432)
  • GH-35438 - [Docs] Make corrections to the source docs (#35549)
  • GH-35445 - [R] Behavior something like group_by(foo) |> across(everything()) is different from dplyr (#35473)
  • GH-35448 - [C++] Fix detection of %z in strptime format (#35449)
  • GH-35468 - [C++] Fix Acero var/std for multiple batches (#35469)
  • GH-35483 - [CI][C++] Add header for snprintf for Windows (#35484)
  • GH-35490 - [Python] Interchange protocol: update tests for string and large_string (#35504)
  • GH-35501 - [C++] Fix error C2280 in MSVC (#35683)
  • GH-35503 - [CI][Packaging][C++] Snappy patch fails to apply on arm64 windows wheel builds (#35509)
  • GH-35521 - [C++] Hash null bitmap only if null count is 0 (#35522)
  • GH-35526 - [CI][C++] Fixing arrow::internal::IsNullRunEndEncoded redeclared (#35527)
  • GH-35528 - [Java] Fix RangeEqualsVisitor comparing BitVector with different begin index (#35525)
  • GH-35534 - [R] Ensure missing grouping variables are added to the beginning of the variable list (#36305)
  • GH-35539 - [C++] Remove use of internal header files from public header file (#35592)
  • GH-35553 - [Java] Fix unwrap() in NettyArrowBuf (#35554)
  • GH-35571 - [C++][CI][Parquet] Change EQ to FLOAT_EQ in Decryption tests (#35605)
  • GH-35573 - [Python] pa.FixedShapeTensorArray.to_numpy_ndarray fails on sliced arrays (#36164)
  • GH-35576 - [C++] Make Decimal{128,256}::FromReal more accurate (#35997)
  • GH-35588 - [Java] returning a constant hashCode for null values, resolves #35588 (#35590)
  • GH-35593 - [R] Confusing (NULL) results when using `[[` and `$` to try to extract columns from Datasets
  • GH-35596 - [C++][CI] Improve compilation caching with PCG (#35597)
  • GH-35599 - [Python] Canonical fixed-shape tensor extension array/type is not picklable. (#35933)
  • GH-35606 - [CI][C++][MinGW32] Use more accurate float inputs for decimal test (#35680)
  • GH-35617 - [Docs] Current n_buffers use in C API example (#35626)
  • GH-35618 - [C++][Doc] Improve doc for Datum (#35794)
  • GH-35633 - [R] R builds failing with error ‘Invalid: Timestamps already have a timezone: ‘UTC’. Cannot localize to ‘UTC’’ (#35671)
  • GH-35635 - [C++][CI] Preserve root when ignoring host on PathFromUriHelper to fix HDFS tests (#36063)
  • GH-35636 - [C++] Extract two expensive test suites from compute-vector-test (#36401)
  • GH-35649 - [R] Always call RecordBatchReader::ReadNext() from DuckDB from the main R thread (#36307)
  • GH-35651 - [C++] Suppress self-move warning introduced in gcc 13 (#36328)
  • GH-35651 - [C++] Don't use self-move with MinGW (#35653)
  • GH-35662 - [CI][C++][MinGW] Avoid crash in FormatTwoDigits() with release build (#35663)
  • GH-35665 - [C++][Parquet] DeltaLengthByteArrayEncoder::Put reserve too much space (#35670)
  • GH-35675 - [C++] Don't copy the ArraySpan into the REE ArraySpan (#35677)
  • GH-35681 - [Ruby] Add support for #select_columns with empty table (#35682)
  • GH-35684 - [Go][Parquet] Fix nil dereference with nil list array (#35690)
  • GH-35710 - [R] Followup improvements to new configure script (#36435)
  • GH-35712 - [C++][CI] MacOS Disable ASSERT_DEATH in arrow-array-test (#35724)
  • GH-35728 - [CI][Python] Move test_total_bytes_allocated to a subprocess to improve reliability (#36355)
  • GH-35733 - [Java] Fix minor type in IntervalMonthDayNanoVector ctor (#35734)
  • GH-35736 - [C++] Fix compile key_map_avx2.cc (#35737)
  • GH-35760 - [C++] C Data Interface helpers should also run checks in non-debug mode (#36215)
  • GH-35761 - [Go] Fix map comparison in TypeEqual (#35762)
  • GH-35763 - [Go] Fix TypeEqual for lists (#35764)
  • GH-35789 - [C++] Remove check_overflow from CumulativeSumOptions (#35790)
  • GH-35809 - [C#] Improvements to the C Data Interface (#35810)
  • GH-35819 - [GLib][Ruby] Refer dependency objects of GArrowExecutePlan (#35963)
  • GH-35833 - [C++] Add support for Abseil 20230125 (#35881)
  • GH-35837 - [C++] Acero will hang if StopProducing is called while backpressure is applied on the source node (#35902)
  • GH-35838 - [C++] Add backpressure test for asof join node (#35874)
  • GH-35838 - [C++] Fix asof join backpressure (#35878)
  • GH-35853 - [Python] Fix deprecation warnings from NumPy NEP50 (#35854)
  • GH-35858 - [Python] Fixup linting from PR GH-36011 (#36046)
  • GH-35858 - [Python] disallow none schema parquet writer (#36011)
  • GH-35859 - [Python] Actually change the default row group size to 1Mi (#36012)
  • GH-35866 - [Go] Provide a copy in arrow.NestedType.Fields() implementations (#35867)
  • GH-35868 - [C++] Occasional TSAN failure on asof-join-node-test (#35904)
  • GH-35869 - [R][Release] Undefined symbol _ZN5arrow6Status14AddContextLineEPKciS2_ on test-r-devdocs on maintenance branch for 12.0.1
  • GH-35870 - [C++] Add support for changing optimization flags with CMAKE_CXX_FLAGS_DEBUG (#35924)
  • GH-35891 - [Doc][Python] Update link to Parquet C++ repository (#35892)
  • GH-35911 - [Go] Fix method CastToBytes of decimal256Traits (#35912)
  • GH-35943 - [Dev] Ensure link issue works when PR body is empty (#36460)
  • GH-35948 - [Go] Only cast int8 and uint8 to float64 when JSON marshaling arrays (#35950)
  • GH-35952 - [R] Ensure that schema metadata can actually be set as a named character vector (#35954)
  • GH-35960 - [Java] Detect overflow in allocation (#36185)
  • GH-35965 - [Go] Fix Decimal256DictionaryBuilder (#35966)
  • GH-35982 - [Go] Fix go1.18 broken builds (#35983)
  • GH-35988 - [C#] The C data interface implementation can leak on import (#35996)
  • GH-36003 - [Packaging][RPM] RPM jobs have a duplicated artifact pattern (#36004)
  • GH-36013 - [C++] Disabling bundled OpenTelemetry with Protobuf 3.22+ (#36016)
  • GH-36052 - [Go][Parquet] Cross build failures for 386 (#36066)
  • GH-36053 - [C++] summarizing a variable results in NA at random, while there is no NA in the subset of data (#36368)
  • GH-36076 - [C++] Remove deprecated cli flag (#36077)
  • GH-36082 - [Release] Do nothing deb bump minor/patch version by post-11-bump-versions.sh on main (#36083)
  • GH-36090 - [C++] Add testing libraries for Acero & Datasets (#36206)
  • GH-36117 - [C++] Ensure creating BUILD_OUTPUT_ROOT_DIRECTORY (#36160)
  • GH-36121 - [R] Warn for set_io_thread_count() with num_threads < 2 (#36304)
  • GH-36168 - [C++][Python] Support halffloat for Arrow list to pandas (#35944)
  • GH-36172 - [R] Windows devdocs build failing as it uses libarrow built without JSON capabilities (#36174)
  • GH-36176 - [C++] Fix regression for single-key Table sorting (#36179)
  • GH-36182 - [Gandiva][C++] Fix substring_index function when index is negative. (#36184)
  • GH-36200 - [CI][Docs] Avoid “No space left on device” (#36230)
  • GH-36201 - [Python][CI] test_total_bytes_allocated fails on arm64 wheels for manylinux
  • GH-36209 - [Java] Upgrade Netty due to security vulnerability (#36211)
  • GH-36214 - [C++] Specify FieldPath::Hash as template parameter where possible (#36222)
  • GH-36224 - [CI] Update rest api invocations in GitHub scripts (#36225)
  • GH-36239 - [CI][C++] Add support for multiple flags for ARROW_FLAGS (#36281)
  • GH-36245 - [C++] Compile errors with gcc 13
  • GH-36257 - [CI][Dev][Archery] bot requires pygithub 1.59.0 or later (#36467)
  • GH-36259 - [R] Docs for as_schema description incorrect (#36260)
  • GH-36311 - [C++] Fix integer overflows in utf8_slice_codeunits (#36575)
  • GH-36327 - [C++][CI] Fix Valgrind failures (#36461)
  • GH-36329 - [C++][CI] Use OpenSSL 3 on macOS (#36336)
  • GH-36331 - [C++][CI] Sporadic errors in AsofJoinTest (#36356)
  • GH-36340 - [Java] Address race condition in allocator logger thread (#36341)
  • GH-36346 - [C++] Safe S3 finalization (#36442)
  • GH-36349 - [Python][CI] Avoid using ‘build/etc/localtime’ timezone in hypothesis tests (#36391)
  • GH-36352 - [Python] Add project_id to GcsFileSystem options (#36376)
  • GH-36353 - [R] Fix package version references to be text only and never numeric (#36364)
  • GH-36369 - [C++][FlightRPC] Fix a hang bug in FlightClient::Authenticate*() (#36372)
  • GH-36396 - [R] Non-existent functions called in array tests (#36397)
  • GH-36404 - [CI][C++][Gandiva] Crash tests for JNI build on arm64 macOS
  • GH-36446 - [C++] Minor style improvements in ConcatenateImpl (#36463)
  • GH-36447 - [C++][CI] arrow-s3fs-test fails on some nightly jobs
  • GH-36448 - [C++][CI] vcpkg nightly job fails to build scalar_test.cc
  • GH-36449 - [C++][CI] Don't use -g1 for Python jobs (#36453)
  • GH-36451 - [CI][C++] Fix compilation failure on Fedora 35 (#36457)
  • GH-36452 - [CI][C++] Test C++20 support with compatible compiler (#36454)
  • GH-36456 - [R] Link to correct version of OpenSSL when using autobrew (#36551)
  • GH-36475 - [C++][CI] Fix Flight feature verification (#36473)
  • GH-36476 - [C++][FlightRPC] Fix uninitialized fields in FlightInfo (#36484)
  • GH-36477 - [CI][macOS] Ignore brew update failure on crossbow tasks (#36478)
  • GH-36482 - [C++][CI] Fix sporadic test failures in AsofJoinBasicTest (#36499)
  • GH-36498 - [Python][CI] Hypothesis nightly test fails with pytz.exceptions.UnknownTimeZoneError: ‘Factory’ (#36508)
  • GH-36500 - [CI][Java][JAR] Remove Homebrew's protobuf (#36515)
  • GH-36501 - [CI][Java][JAR] Ensure removing Homebrew's gRPC packages (#36516)
  • GH-36523 - [C++] Fix TSan-detected lock ordering issues in S3 (#36536)
  • GH-36524 - [GLib] Suppress a pessimizing-move warning (#36531)
  • GH-36537 - [Python] Ensure dataset writer follows default Parquet version of 2.6 (#36538)
  • GH-36543 - [CI][Docs] Use -g1 instead of -g for building docs (#36576)
  • GH-36598 - [C++][MinGW] Fix build failure with Protobuf 23.4 (#36606)
  • GH-36629 - [CI][Python] Skip dask tests due to our non-nanosecond changes in arrow->pandas conversion (#36630)
  • GH-36641 - [C++] Remove reference to acero from non-acero file (#36650)
  • GH-36659 - [Python] Fix pyarrow.dataset.Partitioning.eq when comparing with other type (#36661)
  • GH-36669 - [Go] Guard against garbage in C Data structures (#36670)
  • GH-36686 - [C++] Pass CMAKE_OSX_SYSROOT to external projects (#36706)
  • GH-36687 - [R] Add correct branch name to autobrew formulae to facilitate local testing (#36689)
  • GH-36707 - [C++] Use ARROW_PACKAGE_PREFIX for OPENSSL_ROOT_DIR too (#36710)
  • GH-36812 - [C#] Fix C API support to work with .NET desktop framework (#36813)
  • GH-36832 - [Packaging][RPM] Remove needless Requires (#36833)
  • GH-36892 - [C++] Fix performance regressions in FieldPath::Get (#37032)
  • GH-36913 - [C++] Skip empty buffer concatenation to fix UBSan error (#36914)
  • GH-36928 - [Java] Make it run well with the netty newest version 4.1.96 (#36926)
  • GH-36969 - [R] Disable GCS by default when doing a bundled build on gcc-13 (#37147)
  • GH-37019 - [R] Documentation for read_parquet() et al needs updating (#37020)
  • GH-37197 - [Java][CI][Packaging] Free some disk space on the java-jars GitHub job (#37198)
  • GH-37201 - [CI][Packaging][Java] java-jars job fail on macOS aarch_64

New Features and Improvements

  • GH-14790 - [Dev] Avoid extra comment with Closes issue id on PRs (#35811)
  • GH-14946 - [C++] Add flattening FieldPath/FieldRef::Get methods (#35197)
  • GH-15187 - [Java] Made reader initialization lazy and added new getTransferPair() function that takes in a Field type (#34424)
  • GH-18547 - [Java] Support re-emitting dictionaries in ArrowStreamWriter (#35920)
  • GH-20047 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Windows (#35792)
  • GH-21761 - [Python] accept pyarrow scalars in array constructor (#36162)
  • GH-26153 - [C++] Share common codes for RecordBatchStreamReader and StreamDecoder (#36344)
  • GH-29781 - [C++][Parquet] Switch to use compliant nested types by default (#35146)
  • GH-29887 - [C++] Implement dictionary array sorting (#35280)
  • GH-31521 - [C++][Flight] Migrate Flight SQL client to Result (#36559)
  • GH-32190 - [C++][Compute] Implement cumulative prod, max and min functions (#36020)
  • GH-32282 - [R] Update case_when() binding to match changes in dplyr (#35502)
  • GH-32335 - [C++][Docs] Add design document for Acero (#35320)
  • GH-32605 - [C#] Extend validity buffer api (#35342)
  • GH-32605 - [C#] Extend ArrowBuffer.BitmapBuilder to improve performance of array concatenation (#13810)
  • GH-32739 - [CI][Docs] Document Docs PR Preview (#35614)
  • GH-32763 - [C++] Add FromProto for fetch & sort (#34651)
  • GH-33206 - [C++] Add support for StructArray sorting and nested sort keys (#35727)
  • GH-33321 - [Python] Support converting to non-nano datetime64 for pandas >= 2.0 (#35656)
  • GH-33517 - [C++][Flight] Exercise UCX on CI (#14667)
  • GH-33804 - [Python] Add support for manylinux_2_28 wheel (#34818)
  • GH-33854 - [MATLAB] Add basic libmexclass integration code to MATLAB interface (#34563)
  • GH-33856 - [C#] Implement C Data Interface for C# (#35496)
  • GH-33980 - [Docs][Python] Document DataFrame Interchange Protocol implementation and usage (#35835)
  • GH-33987 - [R] Support new dplyr .by/by argument (#35667)
  • GH-34216 - [Python] Support for reading JSON Datasets With Python (#34586)
  • GH-34223 - [Java] Java Substrait Consumer JNI call to ACERO C++ (#34227)
  • GH-34375 - [C++][Parquet] Ignore page header stats when page index enabled (#35455)
  • GH-34386 - [C++] Add a PathFromUriOrPath method (#34420)
  • GH-34436 - [R] Bindings for JSON Dataset (#35055)
  • GH-34509 - [C++][Parquet] Improve docstrings for ArrowReaderProperties::batch_size (#36486)
  • GH-34722 - [C++][Parquet] Minor: Update wording of Parquet NextPage (#35368)
  • GH-34729 - [C++][Python] Enhanced Arrow<->Pandas map/pydict support (#34730)
  • GH-34749 - [Java] Make Zstd compression level configurable (#34873)
  • GH-34787 - [Python] Accept zero_copy_only=False for ChunkedArray.to_numpy (#35582)
  • GH-34788 - [Python][Packaging][CI] Drop Python 3.7 support (#36061)
  • GH-34852 - [C++][Go][Java][FlightRPC] Add support for ordered data (#35178)
  • GH-34858 - [Swift] Initial reader impl (#34842)
  • GH-34868 - [Python] Share docstrings between classes (#34894)
  • GH-34911 - [C++] Add first and last aggregator (#34912)
  • GH-34918 - [C++] Update vendored double-conversion 3.2.1 (#34919)
  • GH-34921 - [C++][Python][Java] Require CMake 3.16 or later (#35921)
  • GH-34949 - [C++][Parquet] Enable page index by columns (#35230)
  • GH-34971 - [Format] Add non-CPU version of C Data Interface (#34972)
  • GH-34979 - [Python] Create a base class for Table and RecordBatch (#34980)
  • GH-35004 - [C++] Remove RelationInfo (#35005)
  • GH-35033 - [Java][Datasets] Add support for multi-file datasets from Java (#35034)
  • GH-35035 - [R] Implement names<- for Schemas (#35172)
  • GH-35067 - [JavaScript] toString for signed BigNums (#35067)
  • GH-35084 - [Docs][Format] Add how to change format specification (#35174)
  • GH-35099 - [CI][Packaging] Upgrade vcpkg to 2023.04.15 Release (#35430)
  • GH-35112 - [Python] Expose keys_sorted in python MapType (#35113)
  • GH-35124 - [C++] Avoid unnecessary copy when outputting join result (#35114)
  • GH-35125 - [C++][Acero] Add a self-defined io-executor in QueryOptions (#35464)
  • GH-35130 - [Docs] Document how to become a collaborator to get triage role (#36445)
  • GH-35134 - [C++] Add arrow_vendored namespace around double-conversion library (#35135)
  • GH-35136 - [Go][FlightSQL] Support backends without CreatePreparedStatement implemented (#35137)
  • GH-35162 - [Go] Float16 arithmetic (#35163)
  • GH-35164 - [Go] Additional methods for decimal data types (#35165)
  • GH-35168 - [CI][Packaging][Conan] Merge upstream changes (#35169)
  • GH-35171 - [C++][Parquet] Implement CRC for data page v2 (#35242)
  • GH-35180 - [R] Implement bindings for cumsum function (#35339)
  • GH-35212 - [Go] Add ability to show full call stack with ARROW_CHECKED_MAX_RETAINED_FRAMES (#35215)
  • GH-35228 - [C++][Parquet] Minor: Comment typo fixing in Parquet Reader (#35229)
  • GH-35245 - [Java][Dataset][Linux] Enable GCS (#35246)
  • GH-35247 - [C++] Add Arrow Substrait support for stddev/variance (#35249)
  • GH-35250 - [Python] Add test for datetime column conversion to pandas (#35546)
  • GH-35256 - [Go] Add ToMap to Metadata (#35257)
  • GH-35264 - [Python] Interchange protocol: test clean-up (#35530)
  • GH-35275 - [Java] Ensure VectorSchemaRoot slice returns a new root (#35476)
  • GH-35279 - [C++][Parquet] Tools: enhancement Parquet print stats (#35262)
  • GH-35282 - [C++] auto enable brotli when enable fuzzing (#35283)
  • GH-35290 - [JS] Update dependencies (#35291)
  • GH-35302 - [Go] Improve unsupported type error message in pqarrow (#35303)
  • GH-35304 - [C++][ORC] Support attributes conversion (#35499)
  • GH-35315 - [C++][CMake] Add presets for Flight SQL (#35317)
  • GH-35335 - [Python][Docs] Fix docstring of map_ (#35336)
  • GH-35361 - [C++] Remove Perl dependency from cpp/build-support/run-test.sh (#35362)
  • GH-35375 - [C++][FlightRPC] Add arrow::flight::ServerCallContext::incoming_headers() (#35376)
  • GH-35377 - [C++][FlightRPC] Add a ServerCallContext parameter to arrow::flight::ServerAuthHandler methods (#35378)
  • GH-35390 - [Python] Consolidate some APIs in Table and RecordBatch (#35396)
  • GH-35400 - [R] Import download.file from utils (#35401)
  • GH-35403 - [Docs] Support sphinx 6 for building the docs (#36296)
  • GH-35411 - [MATLAB] Create a templated C++ Proxy Class for Numeric Arrays (#35479)
  • GH-35415 - [Python] RecordBatch string representation includes column preview (#35416)
  • GH-35417 - [GLib] Add GArrowRunEndEncodedDataType (#36444)
  • GH-35418 - [GLib] Add GArrowRunEndEncodedArray (#36470)
  • GH-35435 - [Ruby][Flight] Add ArrowFlight::Client#authenticate_basic (#35436)
  • GH-35442 - [C++][FlightRPC] Pass ServerCallContext instead of CallHeaders to ServerMiddlewareFactory::StartCall() (#35454)
  • GH-35480 - [MATLAB] Add abstract MATLAB base class called arrow.array.Array (#35491)
  • GH-35482 - [Go] Append nulls to values in array.FixedSizeListBuilder.AppendNull (#35481)
  • GH-35485 - [CI][Python] Archery formats Python C++ codebase (#35487)
  • GH-35489 - [MATLAB] Add CMake build directory to MATLAB .gitignore (#35493)
  • GH-35492 - [MATLAB] Add arrow.array.Float32Array MATLAB Class (#35495)
  • GH-35500 - [C++][Go][Java][FlightRPC] Add support for result set expiration (#36009)
  • GH-35506 - [C++] Support First and Last aggregators in Substrait (#35513)
  • GH-35511 - [C++] Util: add memory_pool in SwapEndianArrayData (#36431)
  • GH-35515 - [C++][Python] Add non decomposable aggregation UDF (#35514)
  • GH-35516 - [R] Add 11.0.0.3 to backwards compatibility matrix (#35517)
  • GH-35537 - [MATLAB] Create shared test class utility for numeric arrays (#35556)
  • GH-35542 - [R] Implement schema extraction function (#35543)
  • GH-35545 - [R] Re-organise reference page on pkgdown site (#36171)
  • GH-35550 - [MATLAB] Add public toMATLAB method to arrow.array.Array for converting to MATLAB types (#35551)
  • GH-35557 - [MATLAB] Add unsigned integer array MATLAB classes (i.e. UInt8Array, UInt16Array, UInt32Array, UInt64Array) (#35562)
  • GH-35558 - [MATLAB] Add signed integer array MATLAB classes (i.e. Int8Array, Int16Array, Int32Array, Int64Array) (#35561)
  • GH-35579 - [C++] Support non-named FieldRefs in Parquet scanner (#35798)
  • GH-35598 - [MATLAB] Add a public Valid property to the MATLAB arrow.array.<Array> classes to query Null values (i.e. validity bitmap support) (#35655)
  • GH-35601 - [R][Documentation] Add missing docs to filesystem.R (#35895)
  • GH-35607 - [C++] Support simple Substrait aggregate extensions (#35608)
  • GH-35609 - [Docs] Enable the build of subsections of the documentation (#35610)
  • GH-35611 - [C++] Remove unnecessary safe operations for ListBuilder and BinaryBuilder (#35613)
  • GH-35652 - [Go][Compute] Allow executing Substrait Expressions using Go Compute (#35654)
  • GH-35659 - [Swift] Initial Swift IPC writer (#35660)
  • GH-35669 - [C++] Update to double-conversion 3.3.0, activate new flags, remove patches (#36002)
  • GH-35676 - [MATLAB] Add an InferNulls name-value pair for controlling null value inference during construction of arrow.array.Array (#35827)
  • GH-35686 - [Go] Add AppendTime to TimestampBuilder (#35687)
  • GH-35693 - [MATLAB] Add Valid as a name-value pair on the arrow.array.Float64Array constructor (#35977)
  • GH-35705 - [R] Rename docs page from acero (#36107)
  • GH-35706 - [CI] Set minimal permissions on pr_review_trigger.yml (#35708)
  • GH-35709 - [R][Documentation] Document passing data to duckdb for windowed aggregates (#35882)
  • GH-35711 - [Go] Add Value and GetValueIndex methods to some builders (#35744)
  • GH-35729 - [C++][Parquet] Implement batch interface for BloomFilter in Parquet (#35731)
  • GH-35746 - [Parquet][C++][Python] Switch default Parquet version to 2.6 (#36137)
  • GH-35749 - [C++] Handle run-end encoded filters in compute kernels (#35750)
  • GH-35752 - [CI][GLib][Ruby] Pass GITHUB_ACTIONS environment variable to Docker containers (#35753)
  • GH-35754 - [CI][GLib] Don't build static C++ libraries (#35755)
  • GH-35757 - [C++][Parquet] using page-encoding-stats to build encodings (#35758)
  • GH-35765 - [C++] Split vector_selection.cc into more compilation units (#35751)
  • GH-35779 - [R][Documentation] Document workaround for window-like functionality (#35702)
  • GH-35783 - [JS] Update dependencies (#35784)
  • GH-35786 - [C++] Add pairwise_diff function (#35787)
  • GH-35788 - [Swift] bug fixes and change reader/writer to use Result type (#35774)
  • GH-35803 - [Doc] Add columns to the Implementation Status tables for Swift (#35862)
  • GH-35817 - [Docs][C++] Fix value_counts/unique doc about null handling (#35818)
  • GH-35828 - [Go] Add array.WithUnorderedMapKeys option for array.ApproxEqual (#35823)
  • GH-35847 - [C++][Thirdparty] Bump xxhash version to v0.8.1 (#35849)
  • GH-35871 - [Go] Account for struct validity bitmap in array.ApproxEqual (#35872)
  • GH-35879 - [C++] Bump bundled google-cloud-cpp to 2.12.0 (#36119)
  • GH-35906 - [Docs] Enable building the documentation without having pyarrow installed (#35907)
  • GH-35909 - [Go] Deprecate arrow.MapType.ValueField & arrow.MapType.ValueType methods (#35899)
  • GH-35914 - [MATLAB] Integrate the latest libmexclass changes to support error-handling (#35918)
  • GH-35915 - [Ruby] Add support for converting function options from Hash automatically (#35927)
  • GH-35922 - [C++] Drop support for Debian GNU/Linux buster (10) (#35923)
  • GH-35926 - [C++][Parquet] Allow disabling ColumnIndex by disabling statistics (#35958)
  • GH-35935 - [C++] Clean interruption of an Acero plan with use_threads=false (#35953)
  • GH-35949 - [R] CSV File reader options class objects should print the selected values (#35955)
  • GH-35961 - [C++][FlightSQL] Accept Protobuf 3.12.0 or later (#35962)
  • GH-35969 - [Swift] use ArrowType instead of ArrowType.info and add binary, time32 and time64 types (#35985)
  • GH-35974 - [Go] Don't panic if importing C Array Stream fails (#35978)
  • GH-35975 - [Go] Support importing decimal256 (#35981)
  • GH-35979 - [C++] Refactor Acero scalar and hash aggregation into separate files (#35980)
  • GH-35984 - [MATLAB] Add null support to all numeric array classes (#36039)
  • GH-35987 - [C++] Unpin brew protobuf version (#36087)
  • GH-35987 - [C++] Pin brew protobuf version to 21 (#36029)
  • GH-35990 - [CI][C++][Windows] Don't use -l for “choco list” (#35991)
  • GH-36006 - [Packaging][RPM] Add support for Amazon Linux 2023 (#36081)
  • GH-36008 - [Ruby][Parquet] Add Parquet::ArrowFileReader#each_row_group (#36022)
  • GH-36014 - [Go] Allow duplicate field names in structs (#36015)
  • GH-36023 - [CI][Ruby][Release] Suppress meaningless progress log from verify-rc-ruby (#36024)
  • GH-36025 - [JS] Allow Node.js 18.14 or later in verify-release-candidate.sh (#36089)
  • GH-36031 - [JS] Update dependencies (#36032)
  • GH-36033 - [JS] Remove BigInt compat (#36034)
  • GH-36038 - [Python] Implement reduce on ExtensionType class (#36170)
  • GH-36040 - [MATLAB] Add arrow.array.BooleanArray class (#36041)
  • GH-36045 - [Python] Improve usability of pc.map_lookup / MapLookupOptions (#36387)
  • GH-36047 - [C++][Compute] Add support for duration types to IndexIn and IsIn (#36058)
  • GH-36050 - [Docs][C] Fix memory leak in C export documentation (#36051)
  • GH-36055 - [JS] Use Node.js 18 in CI (#36147)
  • GH-36056 - [CI] Enable Dependabot for GitHub Actions (#36194)
  • GH-36059 - [C++][Compute] Reserve space for hashtable for scalar lookup functions (#36067)
  • GH-36070 - [Go][Flight] Add Flight Client Cookie Middleware (#36071)
  • GH-36072 - [MATLAB] Add MATLAB arrow.tabular.RecordBatch class (#36190)
  • GH-36074 - [C++] Clarify docs for ConcatenateTablesOptions::field_merge_options (#36075)
  • GH-36092 - [C++] Simplify concurrency in as-of-join node (#36094)
  • GH-36095 - [Go] Add doc for pqarrow.FileWriter.WriteBuffered (#36163)
  • GH-36096 - [Python] Call from_arrow in Array.to_pandas (#36314)
  • GH-36098 - [MATLAB] Change C++ proxy constructors to accept an options struct instead of a cell array containing the arguments (#36108)
  • GH-36105 - [Go] Support float16 in csv (#36106)
  • GH-36109 - [MATLAB] Store a nullptr as the validity bitmap if all array elements are valid (#36114)
  • GH-36120 - [C#] Support schema metadata through the C API (#36122)
  • GH-36128 - [C++][Compute] Allow multiplication between duration and all integer types (#36231)
  • GH-36129 - [Python] Consolidate common APIs in Table and RecordBatch (#36130)
  • GH-36131 - [Docs] Use https://arrow.apache.org/julia/ for Julia URL (#36156)
  • GH-36141 - [Go] Support large and fixed types in csv (#36142)
  • GH-36151 - [Java] Add volatile declaration to keyPosition in ParallelSearcher (#36152)
  • GH-36157 - [C++][Dev] Add support for using python3 to run IWYU (#36159)
  • GH-36166 - [C++][MATLAB] Add utility to convert UTF-8 strings to UTF-16 and UTF-16 strings to UTF-8 (#36167)
  • GH-36173 - [C++] Add lone high and low code-point test case for UTF8StringToUTF16 (#36383)
  • GH-36177 - [MATLAB] Add the Type object hierarchy to the MATLAB interface (#36210)
  • GH-36178 - [C++] support prefetching for ReadRangeCache lazy mode (#36180)
  • GH-36181 - [Go] add methods AppendNulls and AppendEmptyValues for all builders (#36145)
  • GH-36198 - [Go] Remove deprecated equality checks (#36169)
  • GH-36203 - [C++] Support casting in both ways for is_in and index_in (#36204)
  • GH-36207 - [MATLAB] Add MATLAB autosave files (.asv) to the .gitignore (#36208)
  • GH-36212 - [MATLAB] Update README.md to mention support for arrow.array.Array classes (#36213)
  • GH-36217 - [MATLAB] Add arrow.array.TimestampArray (#36333)
  • GH-36218 - [CI][Go] Run benchmark steps only on the main branch (#36229)
  • GH-36218 - [CI][Go] Run benchmark steps only on the main branch (#36219)
  • GH-36220 - [CI] Run the “Docker Push” step only on the main branch (#36221)
  • GH-36227 - [C++] New GcsOption to set the project id (#36228)
  • GH-36232 - [Packaging][Ubuntu] Drop support for Ubuntu 22.10 (kinetic) (#36237)
  • GH-36233 - [Packaging][Ubuntu] Add support for Ubuntu 23.04 (lunar) (#36238)
  • GH-36234 - [Packaging][Debian] Add support for Debian GNU/Linux trixie (13) (#36285)
  • GH-36241 - [Packaging] Drop support for Amazon Linux 2 (#36282)
  • GH-36243 - [Dev] Remove PR workflow label as part of merge (#36244)
  • GH-36249 - [MATLAB] Create a MATLAB_ASSIGN_OR_ERROR macro to mirror the C++ ARROW_ASSIGN_OR_RAISE macro (#36273)
  • GH-36250 - [MATLAB] Add arrow.array.StringArray class (#36366)
  • GH-36251 - [MATLAB] Add Type property to arrow.array.Array (#36270)
  • GH-36252 - [Python] Add non decomposable hash aggregate UDF (#36253)
  • GH-36255 - [C++] Add benchmarks for “if_else” kernel on lists (#36256)
  • GH-36264 - [R] Add scalar() function (#36265)
  • GH-36271 - [R] Split out R6 classes and convenience functions (#36394)
  • GH-36284 - [Python][Parquet] Support write page index in Python API (#36290)
  • GH-36287 - [Ruby] Add support for installing arrow-c-glib conda package automatically (#36288)
  • GH-36293 - [C++] Use ipc_write_options.memory_pool for compressed buffer and shrink after compression (#36294)
  • GH-36297 - [C++][Parquet] Benchmark for non-binary dict encoding (#36298)
  • GH-36299 - [R][CI] Remove pkgdown check CI step (#36300)
  • GH-36309 - [C++] Add ability to cast between scalars of list-like types (#36310)
  • GH-36317 - [C++] Return a BufferVector from CleanListOffsets (#36316)
  • GH-36319 - [Go][Parquet] Improved row group writer error messages (#36320)
  • GH-36337 - [Ruby] Relax required Apache Arrow C++ version (#36338)
  • GH-36342 - [C++] Add missing move semantic to RecordBatch (#36343)
  • GH-36345 - [C++] Prefer TypeError over Invalid in IsIn and IndexIn kernels (#36358)
  • GH-36359 - [MATLAB] Add support for Timestamp arrays to RecordBatch (#36361)
  • GH-36367 - [C++] Add a zipped range utility (#36393)
  • GH-36375 - [Java] Added creating MapWriter in ComplexWriter. (#36351)
  • GH-36380 - [R] Create convenience function arrow_array (#36381)
  • GH-36384 - [Go] Schema: NumFields (#36365)
  • GH-36402 - [CI][macOS] Ignore brew update failure (#36403)
  • GH-36405 - [C++][ORC] Upgrade ORC to 1.9.0 (#36406)
  • GH-36407 - [C++] Add arrow::ipc::Listener::OnSchemaDecoded(schema, filtered_schema) (#36533)
  • GH-36408 - [GLib][FlightSQL] Add support for INSERT/UPDATE/DELETE (#36409)
  • GH-36414 - [C++] Add missing type_traits.h predicate: is_var_length_list() (#36415)
  • GH-36421 - [Java] Enable Support for reading JSON Datasets (#36422)
  • GH-36423 - [C++][Compute] Support “or” in Expression::IsSatisfiable (#36424)
  • GH-36450 - [CI][Python] Upload wheel artifacts for Windows (#36466)
  • GH-36479 - [C++][FlightRPC] Use gRPC version detected by find_package() (#36581)
  • GH-36483 - [C++] Make UTF8StringToUTF16 and UTF16StringToUTF8 accept string_views (#36485)
  • GH-36492 - [CI][Python] Add Ubuntu 22.04 nightly build (#36480)
  • GH-36513 - [Dev][C#] Add Dependabot configuration for NuGet (#36514)
  • GH-36541 - [Python][CI] Fixup nopandas build after merge of GH-33321 (#36586)
  • GH-36541 - [Python][CI] Ensure the “Without pandas” CI build has no pandas installed (don't install doc requirements in conda-python image) (#36542)
  • GH-36544 - [Swift] Add/change some init methods to public access (#36545)
  • GH-36553 - [Python] Improve error message if certain submodule (cython or cpp) is not built (#36554)
  • GH-36556 - [CI][C++] Enable S3 in Valgrind build (#36579)
  • GH-36560 - [MATLAB] Remove the DeepCopy name-value pair from arrow.array.<Numeric>Array constructors (#36561)
  • GH-36568 - [Go] Include Timestamp Zone in ValueStr (#36569)
  • GH-36577 - [Dev][C#] Use version-update:semver-major for some packages (#36578)
  • GH-36582 - [CI][C++][Homebrew] Backport the latest formula changes (#36583)
  • GH-36599 - [MATLAB] Bump libmexclass version to 3465900 (#36600)
  • GH-36744 - [Python][Packaging] Add upper pin for cython<3 to pyarrow build dependencies (#36743)
  • GH-36746 - [R] Update NEWS.md for 12.0.1.1 release (#36747)
  • GH-36756 - [CI][Python] Install Cython < 3.0 on verify-release-candidate script (#36757)
  • GH-36805 - [R] Update NEWS.md for 13.0.0 (#36806)
  • GH-36839 - [CI][Docs] Update test-ubuntu-default-docs to use GitHub actions instead of Azure (#36840)
  • GH-36947 - [CI] Move free up disk space to the Jinja macros to be able to reuse it on docs job (#36948)
  • PARQUET-2316 - [C++] Allow partial PreBuffer in the parquet FileReader (#36192)
  • PARQUET-2323 - [C++] Use bitmap to store pre-buffered column chunks (#36649)