---
layout: default
title: Apache Arrow 11.0.0 Release
permalink: /release/11.0.0.html
---

Apache Arrow 11.0.0 (26 January 2023)

This is a major release covering more than 3 months of development.

Download

Contributors

This release includes 516 commits from 95 distinct contributors.

$ git shortlog -sn apache-arrow-10.0.0..apache-arrow-11.0.0
    83	Sutou Kouhei
    35	Matt Topol
    28	Raúl Cumplido
    25	Dewey Dunnington
    21	Alenka Frim
    21	Antoine Pitrou
    20	Jacob Wujciak-Jens
    17	David Li
    17	Miles Granger
    16	Weston Pace
    15	Joris Van den Bossche
    15	Will Jones
    14	Nic Crane
    10	Neal Richardson
    10	Vibhatha Lakmal Abeykoon
     9	rtpsw
     8	eitsupi
     7	Ben Harkins
     7	Jin Shang
     6	Alessandro Molina
     6	Bryce Mecum
     6	Fatemah Panahi
     6	Gang Wu
     6	Larry White
     6	mwish
     5	gf2121
     4	David Sisson
     4	Hirokazu SUZUKI
     4	LouisClt
     3	0x26res
     3	Rok Mihevc
     3	h-vetinari
     2	Austin Dickey
     2	Benson Muite
     2	Jonathan Keane
     2	Kshiteej K
     2	Libor Ryšavý
     2	Nikita Eshkeev
     2	Percy Camilo Triveño Aucahuasi
     2	Sasha Krassovsky
     2	Todd Farmer
     2	Yibo Cai
     2	buaazhwb
     2	dependabot[bot]
     2	lafiona
     1	0xflotus
     1	André Kohn
     1	Anja Kefala
     1	Benjamin Kietzman
     1	Daniel Sullivan
     1	Danielle Navarro
     1	Dean Attali
     1	Dhulkifli Hussein
     1	Dominik Moritz
     1	Dongjoon Hyun
     1	Dr. Jan-Philip Gehrcke
     1	ElenaHenderson
     1	Felipe Oliveira Carvalho
     1	Frederick Jansen
     1	Hadley Wickham
     1	Ian Cook
     1	JacekPliszka
     1	JiaKe
     1	Jianshen Liu
     1	Jonas Haag
     1	Joost Hoozemans
     1	Julien Roncaglia
     1	Kae S
     1	Kazuaki Ishizaki
     1	Kyle Barron
     1	Laurent Quérel
     1	Lionel Henry
     1	Mark Schreiber
     1	Matti Picus
     1	Noah Treuhaft
     1	Paul Taylor
     1	Pierre Gramme
     1	Quang Hoang
     1	Sahaj Gupta
     1	Sanjiban Sengupta
     1	Sho Nakatani
     1	Siddhant Rao
     1	Tamas Mate
     1	Tao He
     1	Thomas Sarlandie
     1	Tomek Drabas
     1	William Ayd
     1	Y
     1	Yue
     1	emkornfield
     1	fdzuJ
     1	kambhamvivekshankar
     1	lukester1975
     1	martin-kokos
     1	zagto
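The headline totals can be checked against this output by summing the counts. The following is a minimal Python sketch, not part of the Arrow release tooling. The helper name `tally_shortlog` is hypothetical, and the sketch assumes each line has the `count<TAB>name` layout shown above. Applied to the full list, it yields the 516 commits and 95 contributors stated earlier.

```python
def tally_shortlog(output: str) -> tuple:
    """Return (total_commits, distinct_authors) from `git shortlog -sn` text.

    Assumes (hypothetically) that every non-empty line is a right-aligned
    commit count, a tab, and an author name, as in the listing above.
    """
    commits = 0
    authors = 0
    for line in output.splitlines():
        line = line.strip()  # drop the leading alignment spaces
        if not line:
            continue
        count, _name = line.split("\t", 1)  # split on the first tab only
        commits += int(count)
        authors += 1
    return commits, authors


sample = "    83\tSutou Kouhei\n    35\tMatt Topol\n     1\tzagto\n"
print(tally_shortlog(sample))  # (119, 3)
```

The same helper also works on the `--group=trailer:signed-off-by` output below, because that command prints the same layout.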

Patch Committers

The following Apache committers merged contributed patches to the repository.

$ git shortlog -sn --group=trailer:signed-off-by apache-arrow-10.0.0..apache-arrow-11.0.0
   148	Sutou Kouhei
    89	Antoine Pitrou
    50	Joris Van den Bossche
    36	David Li
    36	Matt Topol
    34	Weston Pace
    24	Dewey Dunnington
    24	Nic Crane
    16	Jacob Wujciak-Jens
    13	Will Jones
     8	Neal Richardson
     6	Raúl Cumplido
     6	Yibo Cai
     4	Alessandro Molina
     4	Rok Mihevc
     3	Dominik Moritz
     3	Jonathan Keane
     2	Alenka Frim
     1	Micah Kornfield
     1	dependabot[bot]

Changelog

Apache Arrow 11.0.0 (2023-01-25 08:00:00)

New Features and Improvements

  • ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
  • ARROW-11776 - [C++][Java] Support parquet write from ArrowReader to file (#14151)
  • ARROW-13938 - [C++] Date and datetime types should autocast from strings
  • ARROW-13980 - [Go] Implement Scalar ApproxEquals (#14543)
  • ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
  • ARROW-14832 - [R] Implement bindings for stringr::str_remove and stringr::str_remove_all (#14644)
  • ARROW-14999 - [C++] Optional field name equality checks for map and list type (#14847)
  • ARROW-15006 - [Python][Doc] Add five more numpydoc checks to CI (#15214)
  • ARROW-15006 - [Python][CI][Doc] Enable numpydoc check PR03 (#13983)
  • ARROW-15206 - [Ruby] Add support for Arrow::Table.load(uri, schema:) (#15148)
  • ARROW-15460 - [R] Add as.data.frame.Dataset method (#14461)
  • ARROW-15470 - [R] Set null value in CSV writer (#14679)
  • ARROW-15538 - [C++] Expanding coverage of math functions from Substrait to Acero (#14434)
  • ARROW-15592 - [C++] Add support for custom output field names in a substrait::PlanRel (#14292)
  • ARROW-15691 - [Dev] Update archery to work with either master or main as default branch (#14033)
  • ARROW-15732 - [C++] Do not use any CPU threads in execution plan when use_threads is false (#15104)
  • ARROW-15812 - [R] Accept col_names in open_dataset for CSV (#14705)
  • ARROW-16266 - [R] Add StructArray$create() (#14922)
  • ARROW-16337 - [Python] Expose flag to enable/disable storing Arrow schema in Parquet metadata (#13000)
  • ARROW-16430 - [Python] Add support for reading record batch custom metadata API (#13041)
  • ARROW-16480 - [R] Update read_csv_arrow and open_dataset parse_options, read_options, and convert_options to take lists (#15270)
  • ARROW-16616 - [Python] Add lazy Dataset.filter() method (#13409)
  • ARROW-16673 - [Java] Integrate C Data into allocator hierarchy (#14506)
  • ARROW-16728 - [Python] ParquetDataset to still take legacy code path when old filesystem is passed (#15269)
  • ARROW-16728 - [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset (#14052)
  • ARROW-16782 - [Format] Add REE definitions to FlatBuffers (#14176)
  • ARROW-17025 - [Dev] Remove github user name links from merge commit message (#14458)
  • ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
  • ARROW-17187 - [R] Improve lazy ALTREP implementation for String (#14271)
  • ARROW-17212 - [Python] Support lazy Dataset.filter
  • ARROW-17301 - [C++] Implement compute function “binary_slice” (#14550)
  • ARROW-17302 - [R] Configure curl timeout policy for S3 (#15166)
  • ARROW-17360 - [Python] Order of columns in pyarrow.feather.read_table (#14528)
  • ARROW-17416 - [R] Implement lubridate::with_tz and lubridate::force_tz
  • ARROW-17425 - [R] lubridate::as_datetime() in dplyr query should be able to handle time in sub seconds (#13890)
  • ARROW-17462 - [R] Cast scalars to type of field in Expression building (#13985)
  • ARROW-17509 - [C++] Simplify async scheduler by removing the need to call End (#14524)
  • ARROW-17520 - [C++] Implement Substrait SetRel (UnionAll) (#14186)
  • ARROW-17610 - [C++] Support additional source types in SourceNode (#14207)
  • ARROW-17613 - [C++] Add function execution API for a preconfigured kernel (#14043)
  • ARROW-17640 - [C++] Add File Handling Test cases for GlobFile handling in Substrait Read (#14132)
  • ARROW-17662 - [R] Facilitate offline installation from binaries (#14086)
  • ARROW-17726 - [CI] Enable sccache on more builds
  • ARROW-17731 - [Website] Add blog post about Flight SQL JDBC driver
  • ARROW-17732 - [Docs][Java] Add minimal JDBC driver docs (#14137)
  • ARROW-17751 - [Go][Benchmarking] Add Go Benchmark Script (#14148)
  • ARROW-17777 - [Dev] Update the pull request merge script to work with master or main
  • ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer (#14191)
  • ARROW-17812 - [Gandiva][Docs] Add C++ Gandiva User Guide (#14200)
  • ARROW-17825 - [C++] Allow the possibility to write several tables in ORCFileWriter (#14219)
  • ARROW-17832 - [Python] Construct MapArray from sequence of dicts (instead of list of tuples) (#14547)
  • ARROW-17836 - [C++] Allow specifying alignment of buffers (#14225)
  • ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext that will store a plan's shared data structures (#14227)
  • ARROW-17838 - [Python] Unify CMakeLists.txt in python/ (#14925)
  • ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource (#14250)
  • ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in Flight SQL (#14266)
  • ARROW-17870 - [Go] Add Scalar Binary Arithmetic
  • ARROW-17871 - [Go] initial binary arithmetic implementation (#14255)
  • ARROW-17887 - [R][Doc] Improve readability of the Get Started and README pages (#14514)
  • ARROW-17892 - [CI] Use Python 3.10 in AppVeyor build (#14307)
  • ARROW-17899 - [Go][CSV] Add Decimal support to CSV reader (#14504)
  • ARROW-17932 - [C++] Implement streaming RecordBatchReader for JSON (#14355)
  • ARROW-17949 - [C++][Docs] Remove the use of clcache from Windows dev docs (#14529)
  • ARROW-17953 - [Archery] Add archery docker info command (#14345)
  • ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
  • ARROW-17966 - [C++] Adjust to new format for Substrait optional arguments (#14415)
  • ARROW-17972 - [CI] Update CUDA docker jobs (#14362)
  • ARROW-17975 - [C++] Create at-fork facility (#14594)
  • ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
  • ARROW-17989 - [C++][Python] Enable struct_field kernel to accept string field names (#14495)
  • ARROW-18008 - [Python][C++] Add use_threads to run_substrait_query
  • ARROW-18012 - [R] Make map_batches .lazy = TRUE by default (#14521)
  • ARROW-18014 - [Java] Implement copy functions for vectors and Table (#14389)
  • ARROW-18016 - [CI] Add sccache to r jobs (#14570)
  • ARROW-18033 - [CI] Use $GITHUB_OUTPUT instead of set-output (#14409)
  • ARROW-18042 - [Java] Distribute Apple M1 compatible JNI libraries via mavencentral (#14472)
  • ARROW-18043 - [R] Properly instantiate empty arrays of extension types in Table__from_schema (#14519)
  • ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
  • ARROW-18075 - [Website] Update install page for 9.0.0
  • ARROW-18081 - [Go] Add Scalar Boolean functions (#14442)
  • ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
  • ARROW-18108 - [Go] More scalar binary arithmetic (Multiply and Divide) (#14544)
  • ARROW-18109 - [Go] Initial Unary Arithmetic (#14605)
  • ARROW-18110 - [Go] Scalar Comparisons (#14669)
  • ARROW-18111 - [Go] Remaining scalar binary arithmetic (shifts, power, bitwise) (#14703)
  • ARROW-18112 - [Go] Remaining Scalar Arithmetic (#14777)
  • ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
  • ARROW-18120 - [Release][Dev] Automate running binaries/wheels verifications (#14469)
  • ARROW-18121 - [Release][CI] Use Ubuntu 22.04 for verifying binaries (#14470)
  • ARROW-18122 - [Release][Dev] Update expected vote e-mail (#14548)
  • ARROW-18122 - [Release][Dev] Add verification PR URL to vote email (#14471)
  • ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be uninitialized (#14480)
  • ARROW-18137 - [Python][Docs] adding info about TableGroupBy.aggregation with empty list (#14482)
  • ARROW-18144 - [C++] Improve JSONTypeError error message in testing (#14486)
  • ARROW-18147 - [Go] Add Scalar Add/Sub for Decimal types (#14489)
  • ARROW-18151 - [CI] Avoid unnecessary redirect for some conda URLs (#14494)
  • ARROW-18152 - [Python] DataFrame Interchange Protocol for pyarrow Table
  • ARROW-18169 - [Website] Don't run dev docs update on fork repositories
  • ARROW-18173 - [Python] Drop older versions of Pandas (<1.0) (#14631)
  • ARROW-18174 - [R] Fix compile of altrep.cpp on some builds (#14530)
  • ARROW-18177 - [Go] Add Add/Sub for Temporal types (#14532)
  • ARROW-18178 - [Java] ArrowVectorIterator incorrectly closes Vectors (#14534)
  • ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
  • ARROW-18203 - [R] Refactor to remove unnecessary uses of build_expr (#14553)
  • ARROW-18206 - [C++][CI] Add a nightly build for C++20 compilation (#14571)
  • ARROW-18220 - [Dev] Remove a magic number for the default parallel level in downloader (#14563)
  • ARROW-18221 - [Release][Dev] Add support for customizing arrow-site dir (#14564)
  • ARROW-18222 - [Release][MSYS2] Detect reverse dependencies automatically (#14565)
  • ARROW-18223 - [Release][Homebrew] Detect reverse dependencies automatically (#14566)
  • ARROW-18224 - [Release][jar] Use temporary directory for download (#14567)
  • ARROW-18230 - [Python] Pass CMake args to Python CPP
  • ARROW-18233 - [Release][JS] don't install yarn to system (#14577)
  • ARROW-18235 - [C++][Gandiva] Fix the like function implementation for escape chars (#14579)
  • ARROW-18237 - [Java] Extend Table code (#14573)
  • ARROW-18238 - [Docs][Python] Improve docs for S3FileSystem (#14599)
  • ARROW-18240 - [R] head() is crashing on some nightly builds (#14582)
  • ARROW-18243 - [R] Sanitizer nightly failure pointing to mixup between TimestampType and DurationType
  • ARROW-18248 - [CI][Release] Use GitHub token to avoid API rate limit (#14588)
  • ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
  • ARROW-18253 - [C++][Parquet] Add additional bounds safety checks (#14592)
  • ARROW-18259 - [C++][CMake] Add support for system Thrift CMake package (#14597)
  • ARROW-18264 - [Python] Add missing value accessor to temporal types (#14746)
  • ARROW-18264 - [Python] Expose time32/time64 scalar values (#14637)
  • ARROW-18270 - [Python] Remove gcc 4.9 compatibility code (#14602)
  • ARROW-18278 - [Java] Adjust path in Maven generate-libs-jni-macos-linux (#14623)
  • ARROW-18280 - [C++][Python] Support slicing to end in list_slice kernel (#14749)
  • ARROW-18282 - [C++][Python] Support step >= 1 in list_slice kernel (#14696)
  • ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc provided by vcpkg (#14609)
  • ARROW-18289 - [Release][vcpkg] Add a script to update vcpkg's arrow port (#14610)
  • ARROW-18291 - [Release][Docs] Update how to release (#14612)
  • ARROW-18292 - [Release][Python] Upload .wheel/.tar.gz for release not RC (#14708)
  • ARROW-18303 - [Go] Allow easy compute module importing (#14690)
  • ARROW-18306 - [R] Failing test after compute function updates (#14620)
  • ARROW-18318 - [Python] Expose Scalar.validate() (#15149)
  • ARROW-18321 - [R] Add tests for binary_slice kernel (#14647)
  • ARROW-18323 - Enabling issue templates in GitHub issues (#14675)
  • ARROW-18332 - [Go] Cast Dictionary types to value type (#14650)
  • ARROW-18333 - [Go][Docs] Update compute function docs (#14815)
  • ARROW-18336 - [Release][Docs] Don't update versions not in major release (#14653)
  • ARROW-18337 - [R] Possible undesirable handling of POSIXlt objects (#15277)
  • ARROW-18340 - [Python] PyArrow C++ header files no longer always included in installed pyarrow (#14656)
  • ARROW-18341 - [Doc][Python] Update note about bundling Arrow C++ on Windows (#14660)
  • ARROW-18342 - [C++] AsofJoinNode support for Boolean data field (#14658)
  • ARROW-18345 - [R] Create a CRAN-specific packaging checklist that lives in the R package directory (#14678)
  • ARROW-18348 - [CI][Release][Yum] redhat-rpm-config is needed on AlmaLinux 9 (#14661)
  • ARROW-18350 - [C++] Use std::to_chars instead of std::to_string (#14666)
  • ARROW-18358 - [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow
  • ARROW-18361 - [CI][Conan] Merge upstream changes (#14671)
  • ARROW-18363 - [Docs] Include warning when viewing old docs (redirecting to stable/dev docs) (#14839)
  • ARROW-18366 - [Packaging][RPM][Gandiva] Fix link error on AlmaLinux 9 (#14680)
  • ARROW-18367 - [C++] Enable the creation of named table relations (#14681)
  • ARROW-18373 - Fix component drop-down, add license text (#14688)
  • ARROW-18377 - MIGRATION: Automate component labels from issue form content (#15245)
  • ARROW-18380 - [Dev] Update dev_pr GitHub workflows to accept both GitHub issues and JIRA (#14731)
  • ARROW-18384 - [Release][MSYS2] Show pull request title (#14709)
  • ARROW-18391 - [R] Fix the version selector dropdown in the dev docs (#14800)
  • ARROW-18395 - [C++] Move select-k implementation into separate module
  • ARROW-18399 - [Python] Reduce warnings during tests (#14729)
  • ARROW-18401 - [R] Failing test on test-r-rhub-ubuntu-gcc-release-latest (#14894)
  • ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
  • ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu 20.04 (#14735)
  • ARROW-18407 - [Release][Website] Use UTC for release date (#14737)
  • ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in building plasma-glib (#14739)
  • ARROW-18410 - [Packaging][Ubuntu] Add support for Ubuntu 22.10 (#14740)
  • ARROW-18413 - [C++][Parquet] Expose page index info from ColumnChunkMetaData (#14742)
  • ARROW-18418 - [Website] do not delete /datafusion-python
  • ARROW-18419 - [C++] Update vendored fast_float (#14817)
  • ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex (#14803)
  • ARROW-18421 - [C++][ORC] Add accessor for stripe information in reader (#14806)
  • ARROW-18423 - [Python] Expose reading a schema from an IPC message (#14831)
  • ARROW-18426 - Update committers and PMC members on website
  • ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode (#14934)
  • ARROW-18428 - [Website] Enable github issues on arrow-site repo
  • ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
  • GH-14474 - Opportunistically delete R references to shared pointers where possible (#15278)
  • GH-14720 - [Dev] Update merge_arrow_pr script to accept GitHub issues (#14750)
  • GH-14755 - [Python] Expose QuotingStyle to Python (#14722)
  • GH-14761 - [Dev] Update labels on PR labeler to use new Component ones (#14762)
  • GH-14778 - [Python] Add (Chunked)Array sort() method (#14781)
  • GH-14784 - [Dev] Add possibility to autoassign on GitHub issue comment (#14785)
  • GH-14786 - [Java][Doc] Replace in-folder documentation (#14789)
  • GH-14787 - [Java][Doc] Update table.rst (#14794)
  • GH-14809 - [Dev] Add created GitHub issues to issues@arrow.apache.org (#14811)
  • GH-14816 - [Release] Make dev/release/06-java-upload.sh reusable from other project (#14830)
  • GH-14824 - [CI] r-binary-packages should only upload artifacts if all tests succeed (#14841)
  • GH-14844 - [Java] Short circuit null checks when comparing non null field types (#15106)
  • GH-14846 - [Dev] Support GitHub Releases in download_rc_binaries.py (#14848)
  • GH-14854 - Make changes to .md pages (#14852)
  • GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in. (#14900)
  • GH-14873 - [Java] DictionaryEncoder can decode without building a DictionaryHashTable (#14874)
  • GH-14885 - [Docs] Make changes to the New Contrib Guide (Jira -> GitHub) (#14889)
  • GH-14901 - [Java] ListSubfieldEncoder and StructSubfieldEncoder can decode without DictionaryHashTable (#14902)
  • GH-14918 - [Docs] Make changes to developers section of the docs (Jira -> GitHub) (#14919)
  • GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake package (#15251)
  • GH-14937 - [C++] Add rank kernel benchmarks (#14938)
  • GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED encoding (#15140)
  • GH-14961 - [Ruby] Use newer extpp for C++17 (#14962)
  • GH-14975 - [Python] Dataset.sort_by (#14976)
  • GH-14976 - [Python] Avoid dependency on exec plan in Table.sort_by to fix minimal tests (#15268)
  • GH-14977 - [Dev][CI] Add notify-token-expiration to archery (#14978)
  • GH-14981 - [R] Forward compatibility with dplyr::join_by() (#33664)
  • GH-14986 - [Release] Don't detect previous version on maint-X.Y.Z branch (#14987)
  • GH-14992 - [Packaging] Make dev/release/binary-task.rb reusable from other project (#14994)
  • GH-14997 - [Release] Ensure archery release tasks work with both new style GitHub issues and old style JIRA issues (#33615)
  • GH-14999 - [Release][Archery] Update archery release changelog to support GitHub issues
  • GH-15002 - [Release][Archery] Update archery release cherry-pick to support GitHub issues
  • GH-15005 - [Go] Add scalar.Append to append scalars to builder (#15006)
  • GH-15009 - [R] stringr 1.5.0 with the str_like function is already released (#15010)
  • GH-15012 - [Packaging][deb] Use system Protobuf for Debian GNU/Linux bookworm (#15013)
  • GH-15035 - [CI] Remove unsupported turbodbc jobs and scripts from CI (#15036)
  • GH-15050 - [Java][Docs] Update and consolidate Memory documentation (#15051)
  • GH-15072 - [C++] Move the round functionality into a separate module (#15073)
  • GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit (#15182)
  • GH-15081 - [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh (#15082)
  • GH-15084 - [Ruby] Use common keys when keys.nil? in Table#join (#15088)
  • GH-15085 - [Ruby] Add ColumnContainable#column_names (#15089)
  • GH-15087 - [Release] Slow down downloading RC binaries from GitHub (#15090)
  • GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
  • GH-15100 - [C++][Parquet] Add benchmark for reading strings from Parquet (#15101)
  • GH-15119 - [Release][Docs][R] Update version information in patch release (#15120)
  • GH-15134 - [Ruby] Specify -mmacosx-version-min=10.14 explicitly for old Xcode (#15135)
  • GH-15146 - [GLib] Add GADatasetFinishOptions (#15147)
  • GH-15151 - [C++] Adding RecordBatchReaderSource to solve an issue in R API (#15183)
  • GH-15168 - [GLib] Add support for half float (#15169)
  • GH-15174 - [Go][FlightRPC] Expose Flight Server Desc and RegisterFlightService (#15177)
  • GH-15185 - [C++][Parquet] Improve documentation for Parquet Reader column_indices (#15184)
  • GH-15199 - [C++][Substrait] Allow AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
  • GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
  • GH-15205 - [R] Fix a parquet-fixture finding in R tests (#15207)
  • GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch (#15240)
  • GH-15218 - [Python] Remove auto generated pyarrow_api.h and pyarrow_lib.h (#15219)
  • GH-15226 - [C++] Add DurationType to hash kernels (#33685)
  • GH-15237 - [C++] Add ::arrow::Unreachable() using std::string_view (#15238)
  • GH-15239 - [C++][Parquet] Parquet writer writes decimal as int32/64 (#15244)
  • GH-15249 - [Documentation] Add PR template (#15250)
  • GH-15257 - [GLib][Dataset] Add GADatasetHivePartitioning (#15272)
  • GH-15265 - [Java] Publish SBOM artifacts (#15267)
  • GH-15289 - [Ruby] Return self when saving Table to csv (#33653)
  • GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case when the scalar is null (#15291)
  • GH-33607 - [C++] Support optional additional arguments for inline visit functions (#33608)
  • GH-33610 - [Dev] Do not allow ARROW prefixed tickets to be merged nor used on PR titles (#33611)
  • GH-33619 - [Documentation] Update PR template (#33620)
  • GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc without ARROW_PARQUET=ON (#33665)
  • GH-33670 - [GLib] Add GArrowProjectNodeOptions (#33677)
  • GH-33671 - [GLib] Add garrow_chunked_array_new_empty() (#33675)
  • PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated fields (#14366)
  • PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader (#14142)
  • PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space (#14509)
  • PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader ReadBatch and Skip (#14523)
  • PARQUET-2209 - [parquet-cpp] Optimize skip for the case that number of values to skip equals page size (#14545)
  • PARQUET-2210 - [C++][Parquet] Skip pages based on header metadata using a callback (#14603)
  • PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field (#14556)

Bug Fixes

  • ARROW-11631 - [R] Implement RPrimitiveConverter for Decimal type
  • ARROW-15026 - [Python] Error if datetime.timedelta to pyarrow.duration conversion overflows (#13718)
  • ARROW-15328 - [C++][Docs] Streaming CSV reader missing from documentation (#14452)
  • ARROW-15822 - [C++] Cast duration to string (thus CSV writing) not supported (#14450)
  • ARROW-16464 - [C++][CI][GPU] Add CUDA CI (#14497)
  • ARROW-16471 - [Go] RecordBuilder UnmarshalJSON handle complex values (#14560)
  • ARROW-16547 - [Python] to_pandas fails with FixedOffset timezones when timestamp_as_object is used (#14448)
  • ARROW-16795 - [C#][Flight] Nightly verify-rc-source-csharp-macos-arm64 fails (#15235)
  • ARROW-16817 - [C++] Test ORC writer errors with invalid types (#14638)
  • ARROW-17054 - [R] Creating an Array from an object bigger than 2^31 results in an Array of length 0 (#14929)
  • ARROW-17192 - [Python] Pass **kwargs in read_feather to to_pandas() (#14492)
  • ARROW-17332 - [R] error parsing folder path with accent (‘c:/Público’) in read_csv_arrow (#14930)
  • ARROW-17361 - [R] dplyr::summarize fails with division when divisor is a variable (#14933)
  • ARROW-17374 - [C++] Snappy package may be built without CMAKE_BUILD_TYPE (#14818)
  • ARROW-17458 - [C++] Cast between decimal and string (#14232)
  • ARROW-17538 - [C++] Import schema when importing array stream (#15037)
  • ARROW-17637 - [R][us][s] (#14935)
  • ARROW-17692 - [R] Add support for building with system AWS SDK C++ (#14235)
  • ARROW-17772 - [Doc] Sphinx / reST markup error
  • ARROW-17774 - [Python] Add python test for decimals to csv (#14525)
  • ARROW-17858 - [C++] Compilation warning in arrow/csv/parser.h (#14445)
  • ARROW-17893 - [Python] Test that reading of timedelta is stable (read_feather/to_pandas) (#14531)
  • ARROW-17985 - [C++][Python] Improve s3fs error message when wrong region (#14601)
  • ARROW-17991 - [Python][C++] Adding support for IpcWriteOptions to the dataset ipc file writer (#14414)
  • ARROW-18052 - [Python] Support passing create_dir through pq.write_to_dataset (#14459)
  • ARROW-18068 - [Dev][Archery][Crossbow] Comment bot only waits for task if link is not available (#14429)
  • ARROW-18070 - [C++] Invoke google::protobuf::ShutdownProtobufLibrary for substrait tests (#14508)
  • ARROW-18086 - [Ruby] Add support for HalfFloat (#15204)
  • ARROW-18087 - [C++] RecordBatch::Equals should not ignore field names (#14451)
  • ARROW-18088 - [CI][Python] Fix pandas master/nightly build failure related to timedelta (#14460)
  • ARROW-18101 - [R] RecordBatchReaderHead from ExecPlan with UDF cannot be read (#14518)
  • ARROW-18106 - [C++] JSON reader ignores explicit schema with default unexpected_field_behavior=“infer” (#14741)
  • ARROW-18117 - [C++] Fix static bundle build (#14465)
  • ARROW-18118 - [Release][Dev] Fix problems in 02-source.sh/03-binary-submit.sh for 10.0.0-rc0 (#14468)
  • ARROW-18123 - [Python] Fix writing files with multi-byte characters in file name (#14764)
  • ARROW-18125 - [Python] Handle pytest 8 deprecations about pytest.warns(None)
  • ARROW-18126 - [Python] Remove ARROW_BUILD_DIR in building pyarrow C++ (#14498)
  • ARROW-18128 - [Java][CI] Update timestamp of Java Nightlies X.Y.Z-SNAPSHOT folder (#14496)
  • ARROW-18149 - [C++] fix build failure of join_example (#14490)
  • ARROW-18157 - [Dev][Archery] “archery docker run” sets env var to None when inherited (#14501)
  • ARROW-18158 - [CI] Use default Python version when installing conda cpp environment to fix conda builds (#14500)
  • ARROW-18159 - [Go][Release] Add go install to verify-release script (#14503)
  • ARROW-18161 - [Ruby] Refer source input in sub objects (#15217)
  • ARROW-18164 - [Python] Honor default memory pool in Dataset scanning (#14516)
  • ARROW-18167 - [Go][Release] update go.work with release (#14522)
  • ARROW-18172 - [CI][Release] Source Release and Merge Script jobs fail on master
  • ARROW-18183 - [C++] cpp-micro benchmarks are failing on mac arm machine (#14562)
  • ARROW-18188 - [CI] CUDA nightly docker upload fails due to wrong tag (#14538)
  • ARROW-18195 - [C++] Fix case_when produces bad data when condition has nulls (#15131)
  • ARROW-18202 - [C++] Reallow regexp replace on empty string (#15132)
  • ARROW-18205 - [C++] Substrait consumer is not converting right side references correctly on joins (#14558)
  • ARROW-18207 - [Ruby] RubyGems for 10.0.0 aren't updated yet
  • ARROW-18209 - [Java] Make ComplexCopier agnostic of specific implementation of MapWriter (UnionMapWriter) (#14557)
  • ARROW-18212 - [C++] NumericBuilder::Reset() doesn't reset all members (#14559)
  • ARROW-18225 - [Python] Fully support filesystem in parquet.write_metadata (#14574)
  • ARROW-18227 - [CI][Packaging] Do not fail conda-clean if conda search raises PackagesNotFound (#14569)
  • ARROW-18229 - [Python] Check schema argument type in RecordBatchReader.from_batches (#14583)
  • ARROW-18231 - [C++][CMake] Add support for overriding optimization level (#15022)
  • ARROW-18246 - [Python][Docs] PyArrow table join docstring typos for left and right suffix arguments (#14591)
  • ARROW-18247 - [JS] fix: RangeError crash in Vector.toArray() (#14587)
  • ARROW-18256 - [C++][Windows] Use IMPORTED_IMPLIB for external shared Thrift (#14595)
  • ARROW-18257 - [Python] pass back time types with correct type class (#14633)
  • ARROW-18269 - [C++] Handle slash character in Hive-style partition values (#14646)
  • ARROW-18272 - [Python] Support filesystem parameter in ParquetFile (#14717)
  • ARROW-18284 - [Python][Docs] Add missing CMAKE_PREFIX_PATH to allow setup.py CMake invocations to find Arrow CMake package (#14586)
  • ARROW-18290 - [C++] Escape all special chars in URI-encoding (#14645)
  • ARROW-18309 - [Go] Fix delta bit packing decode panic (#14649)
  • ARROW-18320 - [C++][FlightRPC] Fix improper Status/Result conversion in Flight client (#14859)
  • ARROW-18334 - [C++] Handle potential non-commutativity by rebinding (#14659)
  • ARROW-18339 - [Python][CI] Add DYLD_LIBRARY_PATH to avoid requiring PYARROW_BUNDLE_ARROW_CPP on macOS job (#14643)
  • ARROW-18343 - [C++] Remove AllocateBitmap() with out parameter (#14657)
  • ARROW-18351 - [C++][FlightRPC] Fix crash in DoExchange with UCX (#15031)
  • ARROW-18353 - [C++][FlightRPC] Prevent concurrent Finish in UCX (#15034)
  • ARROW-18360 - [Python] Don't crash when schema=None in FlightClient.do_put (#14698)
  • ARROW-18374 - [Go][CI][Benchmarking] Fix Go benchmark github info (#14691)
  • ARROW-18374 - [Go][CI][Benchmarking] Fix Go Bench Script after Conbench change (#14689)
  • ARROW-18379 - [Python] Change warnings to _warnings in _plasma_store_entry_point (#14695)
  • ARROW-18382 - [C++] Set ADDRESS_SANITIZER in fuzzing builds (#14702)
  • ARROW-18383 - [C++] Avoid global variables for thread pools and at-fork handlers (#14704)
  • ARROW-18389 - [CI][Python] Update nightly test-conda-python-3.7-pandas-0.24 to pandas >= 1.0 (#14714)
  • ARROW-18390 - [CI][Python] Update spark test modules to match spark master (#14715)
  • ARROW-18392 - [Python] Fix test_s3fs_wrong_region; set anonymous=True (#14716)
  • ARROW-18394 - [Python][CI] Fix nightly job using pandas dev (temporarily skip tests) (#15048)
  • ARROW-18397 - [C++] Clear S3 region resolver client at S3 shutdown (#14718)
  • ARROW-18400 - [Python] Quadratic memory usage of Table.to_pandas with nested data
  • ARROW-18405 - [Ruby] Avoid rebuilding chunked arrays in Arrow::Table.new (#14738)
  • ARROW-18412 - [C++][R] Windows build fails because of missing ChunkResolver symbols (#14774)
  • ARROW-18424 - [C++] Fix Doxygen error on ARROW_ENGINE_EXPORT (#14845)
  • ARROW-18429 - [R] Bump dev version following 10.0.1 patch release (#14887)
  • ARROW-18436 - [C++] Ensure correct (un)escaping of special characters in URI paths (#14974)
  • ARROW-18437 - [C++][Parquet] Fix encoder for DELTA_BINARY_PACKED when flushing more than once (#14959)
  • GH-14745 - [R] {rlang} dependency must be at least version 1.0.0 because of check_dots_empty (#14744)
  • GH-14775 - [Go] Fix UnionBuilder.Len implementations (#14776)
  • GH-14780 - [Go] Fix issues with IPC writing of sliced map/list arrays (#14793)
  • GH-14791 - [JS] Fix BitmapBufferBuilder size truncation (#14881)
  • GH-14805 - [Format] C Data Interface: clarify nullability of buffer pointers (#14808)
  • GH-14819 - [CI][RPM] Add workaround for build failure on CentOS 9 Stream (#14820)
  • GH-14828 - [CI][Conda] Sync with conda-forge, fix nightly jobs (#14832)
  • GH-14842 - [C++] Propagate some errors in JSON chunker (#14843)
  • GH-14849 - [CI] R install-local builds sometimes fail because sccache times out (#14850)
  • GH-14855 - [C++] Support importing zero-case unions (#14857)
  • GH-14856 - [CI] Azure builds fail with docker permission error (#14858)
  • GH-14865 - [Go][Parquet] Address several memory leaks of buffers in pqarrow (#14878)
  • GH-14872 - [R] arrow returns wrong variable content when multiple group_by/summarise statements are used (#14905)
  • GH-14875 - [C++] C Data Interface: check imported buffer for non-null (#14814)
  • GH-14876 - [Go] Handling Crashes in C Data interface (#14877)
  • GH-14883 - [Go] Fix IPC encoding empty maps (#14904)
  • GH-14883 - [Go] ipc.Writer leaks memory when compressing body (#14892)
  • GH-14884 - [CI] R install resource may get 404 (#14893)
  • GH-14890 - [Java] Fix memory leak of DictionaryEncoder when exception thrown (#14891)
  • GH-14907 - [R] right_join() function does not produce the expected outcome (#15077)
  • GH-14909 - [Java] Prevent potential memory leak of ListSubfieldEncoder and StructSubfieldEncoder (#14910)
  • GH-14916 - [C++] Remove the API declaration about “ConcatenateBuffers” (#14915)
  • GH-14927 - [Dev] Crossbow submit does not work with fine grained PATs (#14928)
  • GH-14940 - [Go][Parquet] Fix Encryption Column writing (#14954)
  • GH-14943 - [Python] Fix pyarrow.get_libraries() order (#14944)
  • GH-14945 - [Ruby] Add support for macOS 12 / Xcode 14 (#14960)
  • GH-14947 - [R] Compatibility with dplyr 1.1.0 (#14948)
  • GH-14949 - [CI][Release] Output script's stdout on failure (#14957)
  • GH-14967 - [R] Minimal nightly builds are failing (#14972)
  • GH-14968 - [Python] Fix segfault for dataset ORC write (#15049)
  • GH-14990 - [C++][Skyhook] Follow FileFormat API change (#15086)
  • GH-14993 - [CI][Conda] Fix missing RECIPE_ROOT variable now expected by conda build (#15014)
  • GH-14995 - [Go][FlightSQL] Fix Supported Unions Constant (#15003)
  • GH-15001 - [R] Fix Parquet datatype test failure (#15197)
  • GH-15007 - [CI][RPM] Ignore import failed key (#15008)
  • GH-15023 - [CI][Packaging][Java] Force to use libz3.a with Homebrew (#15024)
  • GH-15025 - [CI][C++][Homebrew] Ensure removing Python related commands (#15026)
  • GH-15028 - [R][Docs] NOT_CRAN should be "true" instead of TRUE in R (#15029)
  • GH-15040 - [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF (#15075)
  • GH-15042 - [C++][Parquet] Update stats on subsequent batches of dictionaries (#15179)
  • GH-15043 - [Python][Docs] Update docstring for pyarrow.decompress (#15061)
  • GH-15052 - [C++][Parquet] Fix DELTA_BINARY_PACKED decoder when reading only one value (#15124)
  • GH-15062 - [C++] Simplify EnumParser behavior (#15063)
  • GH-15064 - [Python][CI] Dask nightly tests are failing due to fsspec bug (#15065)
  • GH-15069 - [C++][Python][FlightRPC] Make DoAction truly streaming (#15118)
  • GH-15080 - [CI][R] Re-enable binary package job for R 4.1 on Windows (#25359)
  • GH-15092 - [CI][C++][Homebrew] Ensure removing Python related commands (again) (#15093)
  • GH-15094 - [CI][Release][Ruby] Install Bundler by APT (#15095)
  • GH-15110 - [R][CI] Windows build fails in packaging job (#15111)
  • GH-15114 - [R][C++][CI] Homebrew can't install Python 3.11 on GHA runners (#15116)
  • GH-15115 - [R][CI] pyarrow tests fail on macOS 10.13 due to missing pyarrow wheel (#15117)
  • GH-15122 - [Benchmarking][Python] Set ARROW_INSTALL_NAME_RPATH=ON for benchmark builds (#15123)
  • GH-15126 - [R] purrr::rerun was deprecated in purrr 1.0.0 (#15127)
  • GH-15136 - [Python][macOS] Use @rpath for libarrow_python.dylib (#15143)
  • GH-15141 - [C++] fix for unstable test due to unstable sort (#15142)
  • GH-15150 - [C++][FlightRPC] Wait for side effects in DoAction (#15152)
  • GH-15156 - [JS] Fix can't find variable: BigInt64Array (#15157)
  • GH-15172 - [Python] Docstring test failure (#15186)
  • GH-15176 - Fix various issues introduced in the asof-join benchmark by ARROW-17980 and ARROW-15732 (#15190)
  • GH-15189 - [R] Skip S3 tests on MacOS 10.13 (#33613)
  • GH-15243 - [C++] fix for potential deadlock in the group-by node (#33700)
  • GH-15254 - [GLib] garrow_execute_plain_wait() checks the finished status (#15255)
  • GH-15259 - [CI] component assignment fails due to typo (#15260)
  • GH-15264 - [C++] Add scanner tests for disabling readahead and fix relevant bugs (#29185)
  • GH-15274 - [Java][FlightRPC] handle null keystore password (#15276)
  • GH-15282 - [CI][C++] add CLANG_TOOLS variable in .travis.yaml (#32972)
  • GH-15292 - [C++] Typeclass alias is missing in ExtensionArray (#15293)
  • GH-25633 - [CI][Java][macOS] Ensure using bundled RE2 (#33711)
  • GH-26209 - [Ruby] Add support for Ruby 2.5 (#33602)
  • GH-26394 - [Python] Don't use target_include_directories() for imported target (#33606)
  • GH-33626 - [Packaging][RPM] Don't remove metadata for non-target arch (#33672)
  • GH-33638 - [C++] Removing ExecPlan::Make deprecation warning (#33658)
  • GH-33643 - [C++] Remove implicit = capture of this which is not valid in c++20 (#33644)
  • GH-33666 - [R] Remove extraneous argument to semi_join (#33693)
  • GH-33667 - [C++][CI] Use Ubuntu 22.04 for ASAN (#33669)
  • GH-33687 - [Dev] Fix commit message generation in merge script (#33691)
  • GH-33705 - [R] Fix link on README (#33706)