layout: default
title: Apache Arrow 4.0.0 Release
permalink: /release/4.0.0.html

Apache Arrow 4.0.0 (26 April 2021)

This is a major release covering more than 3 months of development.

Download

Contributors

This release includes 719 commits from 114 distinct contributors.

$ git shortlog -sn apache-arrow-3.0.0..apache-arrow-4.0.0
    65	Antoine Pitrou
    47	Andrew Lamb
    41	Heres, Daniel
    40	David Li
    37	Sutou Kouhei
    33	Neal Richardson
    30	Weston Pace
    28	Jorge C. Leitao
    26	Krisztián Szűcs
    25	Ian Cook
    21	Dominik Moritz
    20	Andy Grove
    19	Yibo Cai
    18	Joris Van den Bossche
    17	Neville Dipale
    17	Jonathan Keane
    17	Ritchie Vink
    12	Mike Seddon
    12	Benjamin Kietzman
    11	Mauricio Vargas
    10	Qingping Hou
    10	Diana Clarke
     8	Micah Kornfield
     7	Matthew Topol
     7	Dmitry Patsura
     5	Projjal Chanda
     5	Kenta Murata
     4	Anthony Louis
     4	Ximo Guanter
     4	liyafan82
     3	Andre Braga Reis
     3	Kazuaki Ishizaki
     3	Maarten A. Breddels
     3	Uwe L. Korn
     3	ptaylor
     3	Steven Fackler
     3	Sagnik Chakraborty
     3	Nic Crane
     2	Marc Prud'hommeaux
     2	Raphael Taylor-Davies
     2	João Pedro
     2	Yordan Pavlov
     2	emkornfield
     2	Max Burke
     2	Florian Müller
     2	Ben Chambers
     2	mqy
     2	Christoph Schulze
     2	Manoj Karthick
     2	Sathis Kumar
     2	Ryan Jennings
     2	Ruan Pearce-Authers
     2	Tao He
     2	Eric Burden
     2	Tyrel Rink
     2	Romain Francois
     2	Rok
     1	witchard
     1	Adam Lippai
     1	Albert Villanova del Moral
     1	Alessandro Molina
     1	Ali
     1	Andrew Wieteska
     1	Bob Tinsman
     1	Brian Hulette
     1	Bryan Cutler
     1	Clcanny
     1	Daniel Russo
     1	Daniël Heres
     1	Eduardo Ponce
     1	Evan Chan
     1	FawnD2
     1	Felix Zhu
     1	Fernando Herrera
     1	Fiona La
     1	François Saint-Jacques
     1	GALI PREM SAGAR
     1	Gert Hulselmans
     1	Ha Thi Tham
     1	Hongze Zhang
     1	Ilya Biryukov
     1	Ivan Smirnov
     1	James Winegar
     1	Joe Roberts
     1	Johannes Müller
     1	Jörn Horstmann
     1	Mahmut Bulut
     1	Marco Gorelli
     1	Marko Mikulicic
     1	Markus Silberstein Hont
     1	Martin Nowak
     1	Matt Brubeck
     1	Matt Summersgill
     1	Max Meldrum
     1	Nathaniel Bauernfeind
     1	Nga Tran
     1	Nick Bruno
     1	Rok Mihevc
     1	Roman Karlstetter
     1	Sam Albers
     1	Simon Bertron
     1	Szangin
     1	Truc Lam Nguyen
     1	Weichen Xu
     1	Ying Zhou
     1	frank400
     1	ivan
     1	jpeeter
     1	martinblostein
     1	nmcdonnell-kx
     1	pierwill
     1	sjgupta2
     1	sundy-li
     1	ARF1
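
The contributor total quoted above can be cross-checked against the shortlog output itself. The following is a minimal sketch, not part of the release tooling: it assumes the `git shortlog -sn` output has been captured as text, and the helper name `parse_shortlog` and the three-line sample are hypothetical, chosen only for illustration.

```python
import re


def parse_shortlog(text):
    """Parse `git shortlog -sn` output into (commit_count, author) pairs.

    Hypothetical helper: lines that do not look like "<count><whitespace><name>"
    (for example the "$ git shortlog ..." prompt line) are skipped.
    """
    entries = []
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


# Three lines copied from the listing above, used as a small sample.
sample = """\
$ git shortlog -sn apache-arrow-3.0.0..apache-arrow-4.0.0
    65\tAntoine Pitrou
    47\tAndrew Lamb
     1\tARF1
"""

entries = parse_shortlog(sample)
total_commits = sum(count for count, _ in entries)

# Number of distinct author lines and the commits they account for.
print(len(entries), "contributors,", total_commits, "commits")
```

Run against the full listing, `len(entries)` gives the number of author lines. Note that the shortlog groups by author string, so one person committing under two spellings of their name is counted twice.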

Patch Committers

The following Apache committers merged contributed patches to the repository.

$ git shortlog -csn apache-arrow-3.0.0..apache-arrow-4.0.0
   157	Andrew Lamb
   101	Antoine Pitrou
    93	Neal Richardson
    88	Krisztián Szűcs
    72	Sutou Kouhei
    41	David Li
    30	Benjamin Kietzman
    25	Neville Dipale
    22	Micah Kornfield
    19	Jorge C. Leitao
    16	Andy Grove
    14	Praveen
    11	Joris Van den Bossche
     9	GitHub
     8	Yibo Cai
     4	Uwe L. Korn
     3	Sebastien Binet
     2	liyafan82
     1	Kenta Murata
     1	Eric Erhardt
     1	Chao Sun
     1	Bryan Cutler

Changelog

Apache Arrow 4.0.0 (2021-04-26)

New Features and Improvements

  • ARROW-951 - [JS] Fix generated API documentation
  • ARROW-2229 - [C++] Write CSV files from RecordBatch, Table
  • ARROW-3690 - [Rust] Add Rust to the format integration testing
  • ARROW-6103 - [Java] Stop using the maven release plugin
  • ARROW-6248 - [Python] Use FileNotFoundError in HadoopFileSystem.open() in Python 3
  • ARROW-6455 - [C++] Implement ExtensionType for non-UTF8 Unicode data
  • ARROW-6604 - [C++] Add support for nested types to MakeArrayFromScalar
  • ARROW-7215 - [C++][Gandiva] Implement castVARCHAR(numeric_type) functions in Gandiva
  • ARROW-7364 - [Rust] Add cast options to cast kernel
  • ARROW-7633 - [C++][CI] Create fuzz targets for tensors and sparse tensors
  • ARROW-7808 - [Java][Dataset] Implement Datasets Java API
  • ARROW-7906 - [C++][Python] Full functionality for ORC format
  • ARROW-8049 - [C++] Upgrade bundled Thrift version to 0.13.0
  • ARROW-8282 - [C++/Python][Dataset] Support schema evolution for integer columns
  • ARROW-8284 - [C++][Dataset] Schema evolution for timestamp columns
  • ARROW-8630 - [C++][Dataset] Pass schema including all materialized fields to catch CSV edge cases
  • ARROW-8631 - [C++][Dataset] Add ConvertOptions and ReadOptions to CsvFileFormat
  • ARROW-8658 - [C++][Dataset] Implement subtree pruning for FileSystemDataset::GetFragments
  • ARROW-8732 - [C++] Let Futures support cancellation
  • ARROW-8771 - [C++] Add boost/process library to build support
  • ARROW-8796 - [Rust] Allow parquet to be written directly to memory
  • ARROW-8797 - [C++] Support Flight RPC among different endian platforms
  • ARROW-8900 - [C++] Respect HTTP(S)_PROXY for S3 Filesystems and/or expose proxy options as parameters
  • ARROW-8919 - [C++] Add “DispatchBest” APIs to compute::Function that selects a kernel that may require implicit casts to invoke
  • ARROW-9128 - [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
  • ARROW-9149 - [C++] Improve configurability of RandomArrayGenerator::ArrayOf
  • ARROW-9196 - [C++] Make temporal casts work on Scalar inputs
  • ARROW-9318 - [C++][Parquet] Encryption key management tools
  • ARROW-9731 - [C++][Dataset] Port “head” method from R to C++ Dataset Scanner
  • ARROW-9749 - [C++][Dataset] Extract format-specific scan options from FileFormat
  • ARROW-9777 - [Rust] Implement IPC changes to catch up to 1.0.0 format
  • ARROW-9856 - [R] Add bindings for string compute functions
  • ARROW-10014 - [C++] TaskGroup::Finish should execute tasks
  • ARROW-10089 - [R] inject base class for Array, ChunkedArray and Scalar
  • ARROW-10183 - [C++] Create a ForEach library function that runs on an iterator of futures
  • ARROW-10195 - [C++] Add string struct extract kernel using re2
  • ARROW-10250 - [FlightRPC][C++] Remove default constructor for FlightClientOptions
  • ARROW-10255 - [JS] Reorganize imports and exports to be more friendly to ESM tree-shaking
  • ARROW-10297 - [Rust] Parameter for parquet-read to output data in json format
  • ARROW-10299 - [Rust] Support reading and writing V5 of IPC metadata
  • ARROW-10305 - [R] Filter with regular expressions
  • ARROW-10306 - [C++] Add string replacement kernel
  • ARROW-10349 - [Python] Build and publish aarch64 wheels
  • ARROW-10354 - [Rust] [DataFusion] Add support for regex extract
  • ARROW-10360 - [CI] Bump github actions cache version
  • ARROW-10372 - [C++][Dataset] Read compressed CSVs
  • ARROW-10406 - [C++] Unify dictionaries when writing IPC file in a single shot
  • ARROW-10420 - [C++] FileSystem::OpenInput{File,Stream} should accept a MemoryPool
  • ARROW-10421 - [R] Feather reader/writer should accept a MemoryPool
  • ARROW-10438 - [C++][Dataset] Partitioning::Format on nulls
  • ARROW-10520 - [C++][R] Implement add/remove/replace for RecordBatch
  • ARROW-10570 - [R] Use Converter API to convert SEXP to Array/ChunkedArray
  • ARROW-10580 - [C++] When Validating, ensure DenseUnionArray offsets are increasing
  • ARROW-10606 - [C++][Compute] Support casts to and from Decimal256 type.
  • ARROW-10655 - [C++] Add LRU cache facility
  • ARROW-10734 - [R] Build and test on Solaris
  • ARROW-10735 - [R] Remove arrow-without-arrow wrapping
  • ARROW-10766 - [Rust] Compute nested definition and repetition for list arrays
  • ARROW-10797 - [C++] Investigate faster random generation for tests and benchmarks
  • ARROW-10816 - [Rust] [DataFusion] Implement INTERVAL
  • ARROW-10831 - [C++][Compute] Implement quantile kernel
  • ARROW-10846 - [C++] Add async filesystem operations
  • ARROW-10880 - [Java] Support compressing RecordBatch IPC buffers by LZ4
  • ARROW-10882 - [Python][Dataset] Writing dataset from python iterator of record batches
  • ARROW-10895 - [C++][Gandiva] Implement bool to varchar cast function in Gandiva
  • ARROW-10903 - [Rust] Implement FromIter<Option<Vec<u8>>> constructor for FixedSizeBinaryArray
  • ARROW-11022 - [Rust] [DataFusion] Upgrade to tokio 1.0
  • ARROW-11070 - [C++] Implement power / exponentiation compute kernel
  • ARROW-11074 - [Rust][DataFusion] Implement predicate push-down for parquet tables
  • ARROW-11081 - [Java] Make IPC option immutable
  • ARROW-11108 - [Rust] Improve performance of MutableBuffer
  • ARROW-11141 - [Rust]: Miri checks
  • ARROW-11149 - [Rust] create_batch_empty - support List, LargeList
  • ARROW-11150 - [Rust] Set up bi-weekly Rust sync call and update website
  • ARROW-11154 - [CI][C++] Move homebrew crossbow tests off of Travis-CI
  • ARROW-11156 - [Rust][DataFusion] Create hashes vectorized in hash join
  • ARROW-11174 - [C++][Dataset] Make Expressions available for projection
  • ARROW-11179 - [Format] Make comments in fb files friendly to rust doc
  • ARROW-11183 - [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
  • ARROW-11191 - [C++] Use FnOnce for TaskGroup's tasks instead of std::function
  • ARROW-11216 - [Rust] Improve documentation for StringDictionaryBuilder
  • ARROW-11220 - [Rust] DF Implement GROUP BY support for Boolean
  • ARROW-11222 - [Rust] [Arrow] catch up with flatbuffers 0.8.1
  • ARROW-11246 - DF - Add type to Unexpected accumulator state message
  • ARROW-11254 - [Rust][DataFusion] Add SIMD and snmalloc flags as options to benchmarks
  • ARROW-11260 - [C++][Dataset] Don't require dictionaries for reading dataset with schema-based Partitioning
  • ARROW-11265 - [Rust] Made bool not convertible to bytes
  • ARROW-11268 - [Rust][DataFusion] Support specifying repartitions in MemTable
  • ARROW-11270 - [Rust] Use slices for simple array data buffer access
  • ARROW-11279 - [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
  • ARROW-11284 - [R] Support dplyr verb transmute()
  • ARROW-11289 - [Rust] [DataFusion] Support GROUP BY for Dictionary columns
  • ARROW-11290 - [Rust][DataFusion] Address hash aggregate performance with high number of groups
  • ARROW-11291 - [Rust] implement extend for MutableBuffer (from iterator)
  • ARROW-11300 - [Rust][DataFusion] Improve hash aggregate performance with large number of groups in
  • ARROW-11308 - [Rust] [Parquet] Add Arrow decimal array writer
  • ARROW-11309 - [Release][C#] Use .NET 3.1 for verification
  • ARROW-11310 - [Rust] Implement arrow JSON writer
  • ARROW-11314 - [Release][APT][Yum] Add support for verifying arm64 packages
  • ARROW-11317 - [Rust] Test the prettyprint feature in CI
  • ARROW-11318 - [Rust] Support pretty printing timestamp, date, and time types
  • ARROW-11319 - [Rust] [DataFusion] Improve test comparisons to record batch
  • ARROW-11321 - [Rust][DataFusion] Fix DataFusion compilation error
  • ARROW-11325 - [Packaging][C#] Release Apache.Arrow.Flight and Apache.Arrow.Flight.AspNetCore
  • ARROW-11329 - [Rust] Do not rebuild the library on every change
  • ARROW-11330 - [Rust][DataFusion] Add ExpressionVisitor pattern
  • ARROW-11332 - [Rust] Use MutableBuffer in take_string instead of Vec
  • ARROW-11333 - [Rust] Support creating arbitrary nested empty arrays
  • ARROW-11336 - [C++][Doc] Improve Developing on Windows docs
  • ARROW-11338 - [R] Bindings for quantile and median
  • ARROW-11340 - [C++] Add vcpkg.json manifest to cpp project root
  • ARROW-11343 - [DataFusion] Simplified example
  • ARROW-11346 - [C++][Compute] Implement quantile kernel benchmark
  • ARROW-11349 - [Rust] Add from_iter_values to create arrays from T instead of Option<T>
  • ARROW-11350 - [C++] Bump dependency versions
  • ARROW-11354 - [Rust] Speed-up casts of dates and times
  • ARROW-11355 - [Rust] Align Date type with spec
  • ARROW-11358 - [Rust] Add benchmark for concatenating small arrays
  • ARROW-11360 - [Rust] [DataFusion] Improve CSV “No files found” error message
  • ARROW-11361 - [Rust] Build buffers from iterator of booleans
  • ARROW-11362 - [Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance
  • ARROW-11365 - [Rust] [Parquet] Implement parsers for v2 of the text schema
  • ARROW-11366 - [Rust][DataFusion] Add Constant Folding / Support boolean literal in equality expression
  • ARROW-11367 - [C++] Implement approximate quantile utility
  • ARROW-11369 - [DataFusion] Split expressions.rs
  • ARROW-11372 - Support RC verification on macOS-ARM64
  • ARROW-11373 - [Python][Docs] Add example of specifying type for a column when reading csv file
  • ARROW-11374 - [Python] Make legacy pyarrow.filesystem / pyarrow.serialize warnings more visible
  • ARROW-11375 - [Rust] CI fails due to deprecation warning in clippy
  • ARROW-11377 - [C++][CI] Add ThreadSanitizer nightly build
  • ARROW-11383 - [Rust] use trusted len on bit ops
  • ARROW-11386 - [Release] Fix post documents update script
  • ARROW-11389 - [Rust] Inconsistent comments for datatypes
  • ARROW-11395 - [DataFusion] Support custom optimizations
  • ARROW-11401 - [Rust][DataFusion] Pass slices instead of Vec in DataFrame API
  • ARROW-11404 - [Rust][DataFusion] Upgrade to aHash 0.7
  • ARROW-11405 - [DataFusion] Support multiple custom nodes
  • ARROW-11406 - [CI][C++] Fix caching on Travis-CI builds
  • ARROW-11408 - Add window support to datafusion readme
  • ARROW-11411 - [Packaging][Linux] Disable arm64 nightly builds
  • ARROW-11414 - [Rust] Reduce copies in Schema::try_merge
  • ARROW-11417 - [Integration] Add integration test for buffer compression
  • ARROW-11418 - [Doc] Add IPC buffer compression to support matrix
  • ARROW-11421 - [Rust][DataFusion] Support group by Date32
  • ARROW-11422 - [C#] Add support for decimals
  • ARROW-11423 - [R] value_counts and some StructArray methods
  • ARROW-11425 - [C++][Compute] Improve quantile kernel for integers
  • ARROW-11426 - [Rust][DataFusion] EXTRACT support
  • ARROW-11428 - [Rust] Add power kernel
  • ARROW-11429 - Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11430 - [Rust] Kernel to combine two arrays based on boolean mask
  • ARROW-11431 - [Rust] [DataFusion] Add support for the SQL HAVING clause
  • ARROW-11435 - Allow creating ParquetPartition from external crate
  • ARROW-11436 - [Rust] Allow non-sized iterators in Primitive::from_iter
  • ARROW-11437 - [Rust] Simplify benches
  • ARROW-11438 - Unsupported ast node Value(Boolean(true)) in sqltorel
  • ARROW-11439 - [Rust] Add year support to temporal kernel
  • ARROW-11440 - [Rust] [DataFusion] Add method to CsvExec to get CSV schema
  • ARROW-11442 - [Rust] Expose the logic used to interpret date/times
  • ARROW-11443 - [Rust] Write datetime information for Date64 Type in csv writer
  • ARROW-11444 - [Rust][DataFusion] Pass slices instead of &Vec to functions
  • ARROW-11446 - [DataFusion] Support scalars in builtin functions
  • ARROW-11447 - [Rust] Add shift kernel
  • ARROW-11449 - [CI][R][Windows] Use ccache
  • ARROW-11457 - [Rust] Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11459 - [Rust] Allow ListArray of primitives to be built from iterator
  • ARROW-11462 - [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX
  • ARROW-11463 - [Python] Allow configuration of IpcWriterOptions 64Bit from PyArrow
  • ARROW-11466 - [Flight][Go] Add BasicAuth and BearerToken handlers for Go
  • ARROW-11467 - [R] Fix reference to json_table_reader() in R docs
  • ARROW-11468 - [R] Allow user to pass schema to read_json_arrow()
  • ARROW-11474 - [C++] Update bundled re2 version
  • ARROW-11476 - [Rust][DataFusion] Test running of TPCH benchmarks in CI
  • ARROW-11477 - [R][Doc] Reorganize and improve README and vignette content
  • ARROW-11478 - [R] Consider ways to make arrow.skip_nul option more user-friendly
  • ARROW-11479 - [Rust][Parquet] Add method to return compressed size of row group
  • ARROW-11481 - [Rust] More cast implementations
  • ARROW-11484 - [Rust] Derive Clone for ExecutionContext
  • ARROW-11486 - [Website] Use Jekyll 4 and webpack to support Ruby 3.0 or later
  • ARROW-11489 - [Rust][DataFusion] Make DataFrame Send+Sync
  • ARROW-11491 - [Rust] Support json schema inference for nested list and struct
  • ARROW-11493 - [CI][Packaging][deb][RPM] Test built packages
  • ARROW-11500 - [R] Allow bundled build script to run on Solaris
  • ARROW-11501 - [C++] endianness check does not work on Solaris
  • ARROW-11504 - [Rust] verify Datatype in ListArray::from(ArrayDataRef)
  • ARROW-11505 - [Rust] Add support for LargeUtf8 in csv-writer
  • ARROW-11507 - [R] Bindings for GetRuntimeInfo
  • ARROW-11510 - [Python] Add note that pip >= 19.0 is required to get binary packages
  • ARROW-11511 - [Rust] Replace Arc<ArrayData> by ArrayData
  • ARROW-11512 - [Packaging][deb] Add missing gRPC dependency for Ubuntu 21.04
  • ARROW-11513 - [R] Bindings for sub/gsub
  • ARROW-11516 - [R] Allow all C++ compute functions to be called by name in dplyr
  • ARROW-11539 - [Developer][Archery] Change items_per_seconds units
  • ARROW-11541 - [C++][Compute] Implement approximate quantile kernel
  • ARROW-11542 - [Rust] json reader should not crash when reading nested list
  • ARROW-11544 - [Rust] [DataFusion] Implement as_any for AggregateExpr
  • ARROW-11545 - [Rust] [DataFusion] SendableRecordBatchStream should implement Sync
  • ARROW-11556 - [C++] Minor benchmark improvements
  • ARROW-11557 - [Rust] Add table de-registration to DataFusion ExecutionContext
  • ARROW-11559 - [C++] Improve flatbuffers verification limits
  • ARROW-11561 - [Rust][DataFusion] Add Send + Sync to MemTable::load
  • ARROW-11563 - [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))
  • ARROW-11568 - [C++][Compute] Mode kernel performance is bad in some conditions
  • ARROW-11570 - [Rust] ScalarValue - support Date64
  • ARROW-11571 - [CI] Cancel stale Github Actions workflow runs
  • ARROW-11572 - [Rust] Add a kernel for division by single scalar
  • ARROW-11573 - [Developer][Archery] Google benchmark now reports run type
  • ARROW-11574 - [Rust][DataFusion] Upgrade sqlparser to 0.8 to support parsing all TPC-H queries
  • ARROW-11575 - [Developer][Archery] Expose execution time in benchmark results
  • ARROW-11576 - [Rust] Remove unused variable in example
  • ARROW-11580 - [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
  • ARROW-11589 - [R] Add methods for modifying Schemas
  • ARROW-11590 - [C++] Move CSV background generator to IO thread pool
  • ARROW-11591 - [C++][Compute] Prototype version of hash aggregation
  • ARROW-11592 - [Rust] Typo in comment
  • ARROW-11594 - [Rust] Support pretty printing with NullArrays
  • ARROW-11597 - [Rust] Split datatypes in a module
  • ARROW-11598 - [Rust] Split buffer.rs in smaller files
  • ARROW-11599 - [Rust] Add function to create array with all nulls
  • ARROW-11601 - [C++][Dataset] Expose pre-buffering in ParquetFileFormatReaderOptions
  • ARROW-11606 - [Rust] [DataFusion] Need guidance on HashAggregateExec reconstruction
  • ARROW-11610 - [C++] Download boost from sourceforge instead of bintray
  • ARROW-11612 - [C++] Rebuild trimmed boost bundle for 1.75.0
  • ARROW-11613 - [R] Move nightly C++ builds off of bintray
  • ARROW-11616 - [Rust][DataFusion] Expose collect_partitioned for DataFrame
  • ARROW-11621 - [CI][Gandiva][Linux] Fix Crossbow setup failure
  • ARROW-11626 - [Rust][DataFusion] Move DataFusion examples to own project to reduce nr dependencies
  • ARROW-11627 - [Rust] Typed allocator
  • ARROW-11637 - [CI][Conda] Update nightly clean target platforms and packages list
  • ARROW-11641 - [CI] Use docker buildkit's inline cache to reuse build cache across different hosts
  • ARROW-11649 - [R] Add support for null_fallback to R
  • ARROW-11651 - [Rust][DataFusion] Implement Postgres Length Functions
  • ARROW-11653 - Ascii/unicode functions
  • ARROW-11655 - Pad/trim functions
  • ARROW-11656 - Left over functions/fixes
  • ARROW-11659 - [R] Preserve group_by .drop argument
  • ARROW-11662 - [C++] Support sorting for decimal data type.
  • ARROW-11664 - [Rust] Cast to LargeUtf8
  • ARROW-11665 - [Python] Document precision and scale parameters of decimal128()
  • ARROW-11666 - [Integration] Add endianness “gold” integration file for decimal256
  • ARROW-11667 - [Rust] Add docs for utf8 comparison functions
  • ARROW-11669 - [Rust] [DataFusion] Remove concurrency field from GlobalLimitExec
  • ARROW-11671 - [Rust][DataFusion] Clean up docs on Expr
  • ARROW-11677 - [C++][Dataset] Write documentation
  • ARROW-11680 - [C++] Add vendored version of folly's spsc queue
  • ARROW-11683 - [R] Support dplyr::mutate()
  • ARROW-11685 - [C++] Typo in future_test.cc
  • ARROW-11688 - [Rust] Casts between utf8 and large-utf8
  • ARROW-11690 - [Rust][DataFusion] Avoid Expr::clone in Expr builder methods
  • ARROW-11692 - [Rust][DataFusion] Improve documentation on Optimizer
  • ARROW-11693 - [C++] Add string length kernel
  • ARROW-11700 - [R] Internationalize error handling in tidy eval
  • ARROW-11701 - [R] Implement dplyr::relocate()
  • ARROW-11703 - [R] Implement dplyr::arrange()
  • ARROW-11704 - [R] Wire up dplyr::mutate() for datasets
  • ARROW-11707 - Support CSV schema inference without IO
  • ARROW-11708 - Clean up Rust 2021 linting warning
  • ARROW-11709 - [Rust][DataFusion] Move `expressions` and `inputs` into LogicalPlan rather than helpers in util
  • ARROW-11710 - [Rust][DataFusion] Implement ExprRewriter to avoid tree traversal redundancy
  • ARROW-11719 - Support merged schema for memory table
  • ARROW-11721 - json schema inference should return Schema type instead of SchemaRef
  • ARROW-11722 - Improve error message in FFI
  • ARROW-11724 - [C++] Namespace collisions with protobuf 3.15
  • ARROW-11725 - [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow
  • ARROW-11727 - [C++][FlightRPC] Use TDigest to estimate latency quantiles in benchmark
  • ARROW-11730 - [C++] Add implicit Future(Status) constructor for convenience
  • ARROW-11733 - [Rust][DataFusion] Support hash repartitioning
  • ARROW-11734 - [C++] vendored safe-math.h does not compile on Solaris
  • ARROW-11735 - [R] Allow Parquet and Arrow Dataset to be optional components
  • ARROW-11736 - [R] Allow string compute functions to be optional
  • ARROW-11737 - [C++] Patch vendored xxhash for Solaris
  • ARROW-11738 - [Rust][DataFusion] Concat Functions
  • ARROW-11740 - [C++] posix_memalign not declared in scope on Solaris
  • ARROW-11742 - [Rust] [DataFusion] Add Expr::is_null and Expr::is_not_null functions
  • ARROW-11744 - [C++] Add xsimd dependency
  • ARROW-11745 - [C++] Improve configurability of random data generation
  • ARROW-11750 - [Python][Dataset] Add support for project expressions
  • ARROW-11752 - [R] Replace usage of testthat::expect_is()
  • ARROW-11753 - [Rust][DataFusion] Add test for Join Statement: Schema contains duplicate unqualified field name
  • ARROW-11754 - [R] Support dplyr::compute()
  • ARROW-11761 - [C++] Increase public API testing
  • ARROW-11766 - [R] Better handling for missing compression codecs on Linux
  • ARROW-11768 - [C++][CI] Make s390x build non-optional
  • ARROW-11773 - [Rust] Allow json writer to write out JSON arrays as well as newline formatted objects
  • ARROW-11774 - [R] one-line install from source on macOS
  • ARROW-11775 - [Rust][DataFusion] Feature Flags for Dependencies
  • ARROW-11777 - [Rust] impl AsRef for StringBuilder/BinaryBuilder
  • ARROW-11778 - Cast from large-utf8 to numerical arrays
  • ARROW-11779 - [Rust] make alloc module public
  • ARROW-11790 - [Rust][DataFusion] Change plan builder signature to take Vec<Expr> rather than &[Expr]
  • ARROW-11794 - [Go] Add concurrent-safe ipc.FileReader.RecordAt(i)
  • ARROW-11795 - [MATLAB] Migrate MATLAB Interface for Apache Arrow design doc to Markdown
  • ARROW-11797 - [C++][Dataset] Provide Scanner methods to yield/visit scanned batches
  • ARROW-11798 - [Integration] Update testing submodule
  • ARROW-11799 - [Rust] String and Binary arrays created with incorrect length from unbound iterator
  • ARROW-11801 - [C++] Remove bad header guard in filesystem/type_fwd.h
  • ARROW-11803 - [Rust] [Parquet] Support v2 LogicalType
  • ARROW-11806 - [Rust][DataFusion] Optimize inner join creation of indices
  • ARROW-11820 - Added macro create_native to construct impl
  • ARROW-11822 - Support case sensitive for function
  • ARROW-11824 - [Rust] [Parquet] Use logical types in Arrow writer
  • ARROW-11825 - [Rust][DataFusion] Add mimalloc as option to benchmarks
  • ARROW-11833 - [C++] Vendored fast_float errors for emscripten (architecture flag missing)
  • ARROW-11837 - [C++][Dataset] Expose originating fragment as a property of ScanTask
  • ARROW-11838 - [C++] Support reading IPC data with shared dictionaries
  • ARROW-11839 - [C++] Rewrite bit-unpacking optimizations using xsimd
  • ARROW-11842 - [Rust][Parquet] Use more efficient clone_from in get_batch_with_dict
  • ARROW-11852 - [Documentation] Update CONTRIBUTING to explain Contributor role
  • ARROW-11856 - [C++] Remove unused reference to RecordBatchStreamWriter
  • ARROW-11858 - [GLib] Gandiva Filter in GLib
  • ARROW-11859 - [GLib] GArrowArray: concatenate is missing
  • ARROW-11864 - [R] Document arrow.int64_downcast option
  • ARROW-11870 - [Dev] Automatically run merge script in venv
  • ARROW-11876 - [Website] Update governance page
  • ARROW-11877 - [C++] Add initial microbenchmarks for Dataset internals
  • ARROW-11879 - [Rust][DataFusion] ExecutionContext::sql should optimize query plan
  • ARROW-11883 - [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map
  • ARROW-11887 - [C++] Add asynchronous read to streaming CSV reader
  • ARROW-11894 - [Rust][DataFusion] Change flight server example to use DataFrame API
  • ARROW-11895 - [Rust][DataFusion] Add support for extra column statistics
  • ARROW-11898 - [Rust] Pretty print columns
  • ARROW-11899 - [Java] Refactor the compression codec implementation into core/Arrow specific parts
  • ARROW-11900 - [Website] Add Yibo to committer list
  • ARROW-11906 - [R] Make FeatherReader print method more informative
  • ARROW-11907 - [C++] Use our own executor in S3FileSystem
  • ARROW-11910 - [Packaging][Ubuntu] Drop support for 16.04
  • ARROW-11911 - [Website] Add protobuf vs arrow to FAQ
  • ARROW-11912 - [R] Remove args from FeatherReader$create
  • ARROW-11913 - [Rust] Improve performance of StringBuilder
  • ARROW-11920 - [R] Add r/libarrow to make clean
  • ARROW-11921 - [R] Set LC_COLLATE in r/data-raw/codegen.R
  • ARROW-11924 - [C++] Provide streaming output from GetFileInfo
  • ARROW-11925 - [R] Add `between` method for arrow_dplyr_query
  • ARROW-11927 - [Rust][DataFusion] Support limit push down
  • ARROW-11931 - [Go][CI] Bump CI to use Go 1.15
  • ARROW-11935 - [C++] Add push generator
  • ARROW-11944 - [Developer] Archery benchmark diff regression: cannot compare jsons
  • ARROW-11949 - [Ruby] Accept raw Ruby objects as sort key and options
  • ARROW-11951 - [Rust] Remove OffsetSize::prefix
  • ARROW-11952 - [Rust] Make ArrayData --> GenericListArray fallible instead of `panic!`
  • ARROW-11954 - [C++] arrow/util/io_util.cc does not compile on Solaris
  • ARROW-11955 - [Rust][DataFusion] Support Union
  • ARROW-11958 - [GLib] GArrowChunkedArray: combine is missing
  • ARROW-11959 - [Rust][DataFusion] Fix logging of optimized plan
  • ARROW-11962 - [Rust][DataFusion] Update Datafusion Docs / readme
  • ARROW-11969 - [Rust][DataFusion] Improve Examples in documentation
  • ARROW-11972 - [C++][Dataset] Extract IpcFragmentScanOptions, ParquetFragmentScanOptions
  • ARROW-11973 - [Rust] Boolean AND/OR kernels should follow sql behaviour regarding null values
  • ARROW-11977 - [Rust] Add documentation examples for sort kernel
  • ARROW-11982 - [Rust] Donate Ballista Distributed Compute Platform
  • ARROW-11984 - [C++][Gandiva] Implement SHA1 and SHA256 functions
  • ARROW-11987 - [C++][Gandiva] Implement trigonometric functions on Gandiva
  • ARROW-11988 - [C++][Gandiva] Implements the last_day function
  • ARROW-11992 - [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType #9731
  • ARROW-11993 - [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
  • ARROW-11996 - [R] Make r/configure run successfully on Solaris
  • ARROW-11999 - [Java] Support parallel vector element search with user-specified comparator
  • ARROW-12000 - [Documentation] Add note about deviation from style guide on struct/classes
  • ARROW-12005 - [R] Fix a bash typo in configure
  • ARROW-12017 - [R] [Documentation] Make proper developing arrow docs
  • ARROW-12019 - [Rust] [Parquet] Update README for 2.6.0 support
  • ARROW-12020 - [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion
  • ARROW-12031 - [C++][CSV] infer CSV timestamps columns with fractional seconds
  • ARROW-12032 - [Rust] Optimize comparison kernels using trusted_len iterator for bools
  • ARROW-12034 - [Docs] Formalize Minor PRs
  • ARROW-12037 - [Rust] [DataFusion] Support catalogs and schemas for table namespacing
  • ARROW-12038 - [Rust][DataFusion] Upgrade hashbrown to 0.11
  • ARROW-12039 - [CI][C++][Gandiva] Fix gandiva nightly linux build failure
  • ARROW-12040 - [R] [CI] [C++] test-r-rstudio-r-base-3.6-opensuse15 timing out during tests
  • ARROW-12043 - [Rust] [Parquet] Write fixed size binary arrays
  • ARROW-12045 - First Chunk of ported Parquet Code
  • ARROW-12047 - [Rust] Clippy parquet
  • ARROW-12048 - [Rust][DataFusion] Support Common Table Expressions
  • ARROW-12052 - [Rust] Implement child data in C FFI
  • ARROW-12056 - [C++] Create sequencing AsyncGenerator
  • ARROW-12058 - [Python] Enable arithmetic operations on Expressions
  • ARROW-12068 - [Python] Stop using distutils
  • ARROW-12069 - [C++][Gandiva] Implement IN expressions for Decimal types
  • ARROW-12070 - [GLib] Drop support for GNU Autotools
  • ARROW-12071 - [GLib] Keep input stream reference of GArrowJSONReader
  • ARROW-12075 - [Rust][DataFusion] Add CTE to list of supported features
  • ARROW-12081 - [R] Bindings for utf8_length
  • ARROW-12082 - [R][Dataset] Allow create dataset from vector of file paths
  • ARROW-12094 - [C++][R] Fix/workaround re2 building on clang/libc++
  • ARROW-12097 - [C++] Modify BackgroundGenerator so it creates fewer threads
  • ARROW-12098 - [R] Catch cpp build failures on linux
  • ARROW-12104 - Next Chunk of ported Code
  • ARROW-12106 - [Rust][DataFusion] Support `SELECT * from information_schema.tables`
  • ARROW-12107 - [Rust][DataFusion] Support `SELECT * from information_schema.columns`
  • ARROW-12108 - [Rust][DataFusion] Support `SHOW TABLES`
  • ARROW-12109 - [Rust][DataFusion] Support `SHOW COLUMNS`
  • ARROW-12110 - [Java] Implement ZSTD buffer compression for java
  • ARROW-12111 - [Java] place files generated by flatc under source control
  • ARROW-12116 - [Rust] Fix or ignore 1.51 clippy lints
  • ARROW-12119 - [Rust][DataFusion] Improve performance of to_array_of_size
  • ARROW-12120 - [Rust] Generate random arrays and batches
  • ARROW-12121 - [Rust] [Parquet] Arrow writer benchmarks
  • ARROW-12123 - [Rust][DataFusion] Use smallvec for indices for better join performance
  • ARROW-12128 - [CI][Crossbow] Remove (or fix) test-ubuntu-16.04-cpp job
  • ARROW-12131 - [CI][GLib] Ensure upgrading MSYS2
  • ARROW-12133 - [C++][Gandiva] Add option to disable setting mcpu flag to host cpu during llvm ir compilation
  • ARROW-12134 - [C++] Add regex string match kernel
  • ARROW-12136 - [Rust][DataFusion] Reduce default batch_size to 8192
  • ARROW-12139 - [Python][Packaging] Use vcpkg to build macOS wheels
  • ARROW-12141 - [R] Bindings for grepl
  • ARROW-12143 - [CI] R builds should timeout and fail after some threshold and dump the output.
  • ARROW-12146 - [C++][Gandiva] Implement CONVERT_FROM(expression, ‘UTF8’, replacement char) function
  • ARROW-12151 - [Docs] Add Jira component + summary conventions to the docs
  • ARROW-12153 - [Rust] [Parquet] Return file metadata after writing Parquet file
  • ARROW-12160 - [Rust] Add an `into_inner()` method to ipc::writer::StreamWriter
  • ARROW-12164 - [Java] Make BaseAllocator.Config public
  • ARROW-12165 - [Rust] Inline append functions in builders for performance
  • ARROW-12168 - [Go][IPC] Implement Compression handling for IPC
  • ARROW-12170 - [Rust][DataFusion] Introduce repartition optimization
  • ARROW-12173 - [GLib] Remove #include <config.h>
  • ARROW-12176 - parquet/low-level-api/reader-writer.cc has some typos.
  • ARROW-12187 - [C++][FlightRPC] Enable compression in Flight benchmark
  • ARROW-12188 - [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
  • ARROW-12190 - [Rust][DataFusion] Implement partitioned hash join
  • ARROW-12192 - [Website] Use downloadable URL for archive download
  • ARROW-12193 - [Dev][Release] Use downloadable URL for archive download
  • ARROW-12194 - [Rust] [Parquet] Update zstd version
  • ARROW-12197 - [R] dplyr bindings for cast, dictionary_encode
  • ARROW-12200 - [R] Export and document list_compute_functions
  • ARROW-12204 - [Rust][CI] Reduce size of rust build artifacts in integration test
  • ARROW-12206 - [Python] Fix Table docstrings
  • ARROW-12208 - [C++] Add the ability to run async tasks without using the CPU thread pool
  • ARROW-12210 - [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / InformationSchema
  • ARROW-12214 - [Rust][DataFusion] Add some tests for limit
  • ARROW-12215 - [C++] fixed size binary columns cannot be null in CSV reader
  • ARROW-12217 - [C++] Cleanup cpp examples source file names
  • ARROW-12222 - [Dev][Packaging] Include build url in the crossbow console report
  • ARROW-12224 - [Rust] Use stable rust for no default test, clean up CI tests
  • ARROW-12228 - [CI] Create base image for conda environments
  • ARROW-12236 - [R][CI] Add check that all docs pages are listed in _pkgdown.yml
  • ARROW-12237 - [Packaging][Debian] Add support for bullseye
  • ARROW-12238 - [JS] Remove trailing spaces
  • ARROW-12239 - [JS] Switch to yarn
  • ARROW-12242 - [Python][Doc] Tweak nightly build instructions
  • ARROW-12246 - [CI] Sync conda recipes with upstream feedstock
  • ARROW-12248 - [C++] Allow static builds to change memory allocators
  • ARROW-12249 - [R] [CI] Fix test-r-install-local nightlies
  • ARROW-12251 - [Rust] [Ballista] Add Ballista tests to CI
  • ARROW-12263 - [Dev][Packaging] Move Crossbow to Archery
  • ARROW-12269 - [JS] Move to eslint
  • ARROW-12274 - [JS] Document how to run tests without building
  • ARROW-12277 - [Rust][DataFusion] Min/Max are not supported for timestamp types
  • ARROW-12278 - [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type
  • ARROW-12280 - [Developer] Remove @-mentions from commit messages in merge tool
  • ARROW-12281 - [JS] Remove shx, trash, and rimraf
  • ARROW-12283 - [R] Bindings for basic type convert functions in dplyr verbs
  • ARROW-12286 - [C++] Create AsyncGenerator from Future<AsyncGenerator<T>>
  • ARROW-12287 - [C++] Create enumerating generator
  • ARROW-12288 - [C++] Create Scanner interface
  • ARROW-12289 - [C++] Create basic AsyncScanner implementation
  • ARROW-12303 - [JS] Use iterators instead of generators in critical code paths
  • ARROW-12304 - [R] Update news and polish docs for 4.0
  • ARROW-12305 - [JS] Benchmark test data generate.py assumes python 2
  • ARROW-12309 - [JS] Make es2015 bundles the default
  • ARROW-12316 - [C++] Switch default memory allocator from jemalloc to mimalloc on macOS
  • ARROW-12317 - [Rust] JSON writer does not support time, date or interval types
  • ARROW-12320 - [CI] REPO arg missing from conda-cpp-valgrind
  • ARROW-12323 - [C++][Gandiva] Implement castTIME(timestamp) function
  • ARROW-12325 - [C++] [CI] Nightly gandiva build failing due to failure of compiler to move return value
  • ARROW-12326 - [C++] Avoid needless c-ares detection
  • ARROW-12328 - [Rust] [Ballista] Fix code formatting
  • ARROW-12329 - [Rust] [Ballista] Add README
  • ARROW-12332 - [Rust] [Ballista] Api server for scheduler
  • ARROW-12333 - [JS] Remove jest-environment-node-debug and do not emit from typescript by default
  • ARROW-12335 - [Rust] [Ballista] Bump DataFusion version
  • ARROW-12337 - add DoubleEndedIterator and ExactSizeIterator traits
  • ARROW-12351 - [CI][Ruby] Use ruby/setup-ruby instead of actions/setup-ruby
  • ARROW-12352 - [CI][R][Windows] Remove needless workaround for MSYS2
  • ARROW-12353 - [Packaging][deb] Rename -archive-keyring to -apt-source
  • ARROW-12354 - [Packaging][RPM] Use apache.jfrog.io/artifactory/ instead of apache.bintray.com/
  • ARROW-12356 - [Website] Update install page instructions to point to artifactory
  • ARROW-12361 - [Rust] [DataFusion] Allow users to override physical optimization rules
  • ARROW-12367 - [C++] Stop producing when PushGenerator was destroyed
  • ARROW-12370 - [R] Bindings for power kernel
  • ARROW-12374 - [CI][C++][cron] Use Ubuntu 20.04 instead of 16.04
  • ARROW-12375 - [Release] Remove rebase post-release scripts
  • ARROW-12376 - [Dev] archery trigger-bot should use logger.exception
  • ARROW-12380 - [Rust][Ballista] Add scheduler ui
  • ARROW-12381 - [Packaging][Python] macOS wheels are built with wrong package kind
  • ARROW-12383 - [JS] Update direct deps
  • ARROW-12384 - [JS] Improve code style
  • ARROW-12389 - [R] [Docs] Add note about autocasting
  • ARROW-12395 - [C++] Create RunInSerialExecutor benchmark
  • ARROW-12396 - [Python][Docs] Clarify serialization docstrings about deprecated status
  • ARROW-12397 - [Rust] [DataFusion] Simplify readme example #10038
  • ARROW-12398 - [Rust] Remove double bound checks in iterators
  • ARROW-12400 - [Rust] Re-enable transform module tests
  • ARROW-12402 - [Rust] [DataFusion] Implement SQL metrics framework
  • ARROW-12406 - [R] fix checkbashims violation in configure
  • ARROW-12409 - [R] Remove LazyData from DESCRIPTION
  • ARROW-12419 - [Java] flatc is not used in mvn
  • ARROW-12420 - [C++][Dataset] Reading null columns as dictionary no longer possible
  • ARROW-12423 - [Docs] Codecov badge in main Readme only applies to Rust
  • ARROW-12425 - [Rust] new_null_array doesn't allocate keys buffer for dictionary arrays
  • ARROW-12432 - [Rust] [DataFusion] Add metrics for SortExec
  • ARROW-12436 - [Rust][Ballista] Add watch capabilities to config backend trait
  • ARROW-12467 - [C++][Gandiva] Add support for LLVM12
  • ARROW-12477 - [Release] Download linux aarch64 miniforge in verify-release-candidate.sh
  • ARROW-12485 - [C++] Use mimalloc as the default memory allocator on macOS
  • ARROW-12488 - [GLib] Use g_memdup2() with GLib 2.68 or later
  • ARROW-12494 - [C++] ORC adapter fails to compile on GCC 4.8
  • PARQUET-1846 - [C++] Remove deprecated IO classes and related functions
  • PARQUET-1899 - [C++] Deprecated ReadBatchSpaced in parquet/column_reader
  • PARQUET-1990 - [C++] ConvertedType::NA is written out in some cases
  • PARQUET-1993 - [C++] Expose when prefetching completes
  • PARQUET-1998 - [C++] Implement LZ4_RAW compression

Bug Fixes

  • ARROW-4784 - [C++][CI] Re-enable flaky mingw tests.
  • ARROW-6818 - [Doc] Format docs confusing
  • ARROW-7288 - [C++][R] read_parquet() freezes on Windows with Japanese locale
  • ARROW-7830 - [C++] Parquet library version doesn't change with releases
  • ARROW-9451 - [Python] Unsigned integer types will accept string values in pyarrow.array
  • ARROW-9634 - [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow
  • ARROW-9878 - [Python] table to_pandas self_destruct=True + split_blocks=True cannot prevent doubling memory
  • ARROW-10038 - [C++] SetCpuThreadPoolCapacity(1) spins up nCPUs threads
  • ARROW-10056 - [C++] Increase flatbuffers max_tables parameter in order to read wide tables
  • ARROW-10364 - [Dev][Archery] Test is failed with semver 2.13.0
  • ARROW-10370 - [Python] Spurious s3fs-related test failures
  • ARROW-10403 - [C++] Implement unique kernel for dictionary type
  • ARROW-10405 - [C++] IsIn kernel should be able to lookup dictionary in string
  • ARROW-10457 - [CI] Fix Spark branch-3.0 integration tests
  • ARROW-10489 - [C++] Unable to configure or make with intel compiler
  • ARROW-10514 - [C++][Parquet] Data inconsistency in parquet-reader output modes
  • ARROW-10953 - [R] Validate when creating Table with schema
  • ARROW-11066 - [Java] Is there a bug in flight AddWritableBuffer
  • ARROW-11134 - [C++][CI] ARM64 job on Travis-CI doesn't run tests
  • ARROW-11147 - [Python][CI] Parquet tests failing in nightly build with Dask master
  • ARROW-11180 - [Developer] cmake-format pre-commit hook doesn't run
  • ARROW-11192 - [Documentation] Describe opening Visual Studio so it inherits a working env
  • ARROW-11223 - [Java] BaseVariableWidthVector/BaseLargeVariableWidthVector setNull and getBufferSizeFor is buggy
  • ARROW-11235 - [Python] S3 test failures inside non-default regions
  • ARROW-11239 - [Rust] array::transform::tests::test_struct failed
  • ARROW-11269 - [Rust] Unable to read Parquet file because of mismatch in column-derived and embedded schemas
  • ARROW-11277 - [C++] Fix compilation error in dataset expressions on macOS 10.11
  • ARROW-11299 - [Python] build warning in python
  • ARROW-11303 - [Release][C++] Enable mimalloc in the windows verification script
  • ARROW-11305 - [Rust] parquet-rowcount binary tries to open itself as a parquet file
  • ARROW-11311 - [Rust] unset_bit is toggling bits, not unsetting them
  • ARROW-11313 - [Rust] Size hint of iterators is incorrect
  • ARROW-11315 - [Packaging][APT][arm64] Add missing gir1.2 files
  • ARROW-11320 - [C++] Spurious test failure when creating temporary dir
  • ARROW-11322 - [Rust] Arrow `memory` made private is a breaking API change
  • ARROW-11323 - [Rust][DataFusion] ComputeError(“concat requires input of at least one array”)) with queries with ORDER BY or GROUP BY that return no
  • ARROW-11328 - [R] Collecting zero columns from a dataset returns entire dataset
  • ARROW-11334 - [Python][CI] Nightly pandas builds failing because of internal pandas change
  • ARROW-11337 - [C++] Compilation error with ThreadSanitizer
  • ARROW-11357 - [Rust] take primitive implementation is unsound
  • ARROW-11376 - [C++] ThreadedTaskGroup failure with Thread Sanitizer enabled
  • ARROW-11379 - [C++][Dataset] Reading dataset with filtering on timestamp partition field crashes
  • ARROW-11387 - [Rust] Arrow 3.0.0 release with simd feature doesn't compile without feature=avx512.
  • ARROW-11391 - [C++] HdfsOutputStream::Write unsafely truncates integers exceeding INT32_MAX
  • ARROW-11394 - [Rust] Slice + Concat incorrect for structs
  • ARROW-11400 - [Python] Pickled ParquetFileFragment has invalid partition_expression with dictionary type in pyarrow 2.0
  • ARROW-11403 - [Developer] archery benchmark list: unexpected keyword ‘benchmark_filter’
  • ARROW-11412 - [Python] Expressions not working with logical boolean operators (and, or, not)
  • ARROW-11427 - [C++] Arrow uses AVX512 instructions even when not supported by the OS
  • ARROW-11448 - [C++] tdigest build failure on Windows with Visual Studio
  • ARROW-11451 - [C++] Fix gcc-4.8 build error
  • ARROW-11452 - [Rust] Parquet reader cannot read file where a struct column has the same name as struct member columns
  • ARROW-11461 - [Flight][Go] GetSchema does not work with Java Flight Server
  • ARROW-11464 - [Python] pyarrow.parquet.read_pandas doesn't conform to its docs
  • ARROW-11470 - [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColumnMajorStrides, and CheckTensorStridesValidity
  • ARROW-11472 - [Python][CI] Kartothek integrations build is failing with numpy 1.20
  • ARROW-11480 - [Python] Segmentation fault reading parquet with date filter with INT96 column
  • ARROW-11483 - [Java][C++][Integration] C++ integration test creates JSON files incompatible with Java
  • ARROW-11488 - [Rust] StructBuilder's Drop impl leaks memory
  • ARROW-11490 - [C++] BM_ArrowBinaryDict/EncodeLowLevel is not deterministic
  • ARROW-11494 - [Rust] Fix take bench
  • ARROW-11497 - [Python] pyarrow parquet writer for list does not conform with Apache Parquet specification
  • ARROW-11538 - [Python] Segfault reading Parquet dataset with Timestamp filter
  • ARROW-11547 - [Packaging][Conda][Drone] Nightly builds are failed by undefined variable error
  • ARROW-11548 - [C++] RandomArrayGenerator::List size mismatch
  • ARROW-11551 - [C++][Gandiva] castTIMESTAMP(utf8) function doesn't error out for invalid inputs
  • ARROW-11560 - [FlightRPC][C++][Python] Interrupting a Flight server results in abort
  • ARROW-11567 - [C++][Compute] Variance kernel has precision issue
  • ARROW-11577 - [Rust] Concat kernel panics on slices of string arrays
  • ARROW-11582 - [R] write_dataset “format” argument default and validation could be better
  • ARROW-11586 - [Rust] [Datafusion] Invalid SQL sometimes panics
  • ARROW-11595 - [C++][NIGHTLY:test-conda-cpp-valgrind] GenerateBitsUnrolled triggers valgrind on uninit inputs
  • ARROW-11596 - [Python][Dataset] SIGSEGV when executing scan tasks with Python executors
  • ARROW-11603 - [Rust] Fix clippy error
  • ARROW-11607 - [Python] Error when reading table with list values from parquet
  • ARROW-11614 - [C++][Gandiva] Fix round() logic to return positive zero when argument is zero
  • ARROW-11617 - [C++][Gandiva] Fix nested if-else optimisation in gandiva
  • ARROW-11620 - [Rust] [DataFusion] Inconsistent use of Box and Arc for TableProvider
  • ARROW-11630 - [Rust] Introduce partial_sort and limit option for sort kernel
  • ARROW-11632 - [Rust] csv::Reader doesn't propagate schema metadata to RecordBatches
  • ARROW-11639 - [C++][Gandiva] Fix signbit compilation issue in Ubuntu nightly build
  • ARROW-11642 - [C++] Incorrect preprocessor directive for Windows in JVM detection
  • ARROW-11657 - [R] group_by with .drop specified errors
  • ARROW-11658 - [R] Handle mutate/rename inside group_by
  • ARROW-11663 - [DataFusion] Master does not compile
  • ARROW-11668 - [C++] Sporadic UBSAN error in FutureStressTest.TryAddCallback
  • ARROW-11672 - [R] Fix string function test failure on R 3.3
  • ARROW-11681 - [Rust] IPC writers shouldn't unwrap in destructors
  • ARROW-11686 - [C++] flight-test-integration-client sometimes exits by SIGABRT but does not print the stack trace
  • ARROW-11687 - [Rust][DataFusion] RepartitionExec Hanging
  • ARROW-11694 - [C++] Array Take may dereference absent null bitmap
  • ARROW-11695 - [C++][FlightRPC][Packaging] Update support for disabling TLS server verification for recent gRPC versions
  • ARROW-11717 - [Integration] Intermittent (but frequent) flight integration failures with auth:basic_proto
  • ARROW-11718 - [Rust] IPC writers shouldn't implicitly finish on drop
  • ARROW-11741 - [C++] Decimal cast failure on big-endian
  • ARROW-11743 - [R] Use pkgdown's new found ability to autolink Jiras
  • ARROW-11746 - [Developer][Archery] Fix prefer real time check
  • ARROW-11756 - [R] passing a partition as a schema leads to segfaults
  • ARROW-11758 - [C++][Compute] Summation kernel round-off error
  • ARROW-11767 - [C++] Scalar::hash may segfault for null scalars
  • ARROW-11771 - [Developer][Archery] Move benchmark tests (so CI runs them)
  • ARROW-11784 - [Rust][DataFusion] CoalesceBatchesStream doesn't honor Stream interface
  • ARROW-11785 - [R] Fallback when filtering Table with unsupported expression fails
  • ARROW-11786 - [C++] CMake output noisy
  • ARROW-11788 - [Java] Appending Empty List Vector yields NPE
  • ARROW-11791 - [Rust][DataFusion] RepartitionExec Blocking
  • ARROW-11802 - [Rust][DataFusion] Mixing of crossbeam channel and async tasks can lead to deadlock
  • ARROW-11819 - [Rust] Add link to the doc
  • ARROW-11821 - [Rust] Edit Rust README
  • ARROW-11830 - [C++] gRPC compilation tests occur every time
  • ARROW-11832 - [R] Handle conversion of extra nested struct column
  • ARROW-11836 - Target libarrow_bundled_dependencies.a is not already created but is already required.
  • ARROW-11845 - [Rust] Debug implementation of Date32Array panics if array contains negative values
  • ARROW-11850 - [GLib] GARROW_VERSION_0_16 macro is missing
  • ARROW-11855 - [C++] [Python] Memory leak in to_pandas when converting chunked struct array
  • ARROW-11857 - [Python] Resource temporarily unavailable when using the new Dataset API with Pandas
  • ARROW-11860 - [Rust] [DataFusion] Add DataFusion logos
  • ARROW-11866 - [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
  • ARROW-11872 - [C++] Array Validation of GPU buffers fails due to incorrect validation check
  • ARROW-11880 - [R] Handle empty or NULL transmute() args properly
  • ARROW-11881 - [Rust][DataFusion] Fix Clippy Lint
  • ARROW-11896 - [Rust] Hang / failure in CI on AMD64 Debian 10 Rust stable test workspace
  • ARROW-11904 - [C++] “pure virtual method called” crash at the end of arrow-csv-test
  • ARROW-11905 - [C++] SIMD info always returning none on MacOS
  • ARROW-11914 - [R] [CI] r-sanitizer nightly is broken
  • ARROW-11918 - [R] [Documentation] Docs cleanups
  • ARROW-11923 - [CI] Update branch name for dask dev integration tests
  • ARROW-11937 - [C++] GZip codec hangs if flushed twice
  • ARROW-11941 - [Dev] “DEBUG=1 merge_arrow_pr.py” updates Jira issue
  • ARROW-11942 - [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads
  • ARROW-11945 - [R] filter doesn't accept negative numbers as valid
  • ARROW-11956 - [C++] Fix system re2 dependency detection for static library
  • ARROW-11965 - [R][Docs] Fix install.packages command in R dev docs
  • ARROW-11970 - [C++][CI] Fix Valgrind failures
  • ARROW-11971 - [Packaging] Vcpkg patch doesn't apply on windows due to line endings
  • ARROW-11975 - [CI][GLib] Failed to update gcc
  • ARROW-11976 - [C++] Sporadic TSAN error in TestThreadPool.SetCapacity
  • ARROW-11983 - [Python] ImportError calling pyarrow from_pandas within ThreadPool
  • ARROW-11997 - [Python] concat_tables crashes python interpreter
  • ARROW-12003 - [R] Fix NOTE re undefined global function group_by_drop_default
  • ARROW-12006 - [Java] Fix checkstyle config to work on Windows
  • ARROW-12012 - [Java] [JDBC] BinaryConsumer cannot reallocate memory correctly
  • ARROW-12013 - [C++][FlightRPC] Failed to detect gRPC version
  • ARROW-12015 - [Rust] [DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
  • ARROW-12028 - [Rust][DataFusion] Unsupported GROUP BY for Timestamp(Millisecond, None)
  • ARROW-12029 - Remove args from FeatherReader$create v2
  • ARROW-12033 - [Docs] Fix link in developers/benchmarks.html
  • ARROW-12041 - [C++] Fix string description of tensor IPC messages
  • ARROW-12051 - [GLib] Intermittent CI failure in test_add_column_type(TestCSVReader::#read::options)
  • ARROW-12057 - [Python] Remove direct usage of pandas' Block subclasses
  • ARROW-12065 - [C++][Python] Segfault reading JSON file
  • ARROW-12067 - [Python][Doc] Document pyarrow_(un)wrap_scalar
  • ARROW-12073 - [R] Fix R CMD check NOTE about ‘X_____X’
  • ARROW-12076 - [Rust] Fix build
  • ARROW-12077 - [C++] Out-of-bounds write in ListArray::FromArrays
  • ARROW-12086 - [C++] Offline builds do not use ARROW_$LIBRARY_URL to search for packages
  • ARROW-12088 - [Python][C++] Warning about offsetof in pyarrow.dataset.RecordBatchIterator
  • ARROW-12089 - [Doc] Fix warnings when building Sphinx docs
  • ARROW-12100 - [C#] Cannot round-trip record batch with PyArrow
  • ARROW-12103 - [C++] “load of misaligned address” in Parquet reader
  • ARROW-12112 - [CI] No space left on device - AMD64 Conda Integration test
  • ARROW-12113 - [R] Fix rlang deprecation warning from check_select_helpers()
  • ARROW-12130 - [C++] Arm64 build failed if -DARROW_SIMD_LEVEL=NONE
  • ARROW-12138 - [Go][IPC]
  • ARROW-12140 - [C++][CI] Valgrind failure on Grouper tests
  • ARROW-12145 - [Developer][Archery] Flaky test: test_static_runner_from_json
  • ARROW-12149 - [Dev] Archery benchmark test case is failing
  • ARROW-12154 - [C++][Gandiva] Fix gandiva crash in certain OS/CPU combinations
  • ARROW-12155 - [R] Require Table columns to be same length
  • ARROW-12161 - [C++][R] Async streaming CSV reader deadlocking when being run synchronously from datasets
  • ARROW-12169 - [C++] Fix compressed file reading with an empty stream at end of file
  • ARROW-12171 - [Rust] Clippy error
  • ARROW-12172 - [Python][Packaging] Pass python version as setuptools pretend version in the macOS wheel builds
  • ARROW-12178 - [CI] Update setuptools in the ubuntu images
  • ARROW-12186 - [Rust][DataFusion] Fix regexp_match test
  • ARROW-12209 - [JS] Neither @apache-arrow/ts nor apache-arrow compiles
  • ARROW-12220 - [C++][CI] Thread sanitizer failure
  • ARROW-12226 - [C++] ASAN error in s3fs_test.cc
  • ARROW-12227 - [R] Fix RE2 and median nightly build failures
  • ARROW-12235 - [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions
  • ARROW-12241 - [Python] Parallel csv reader cancellation test kills pytest
  • ARROW-12250 - [Rust] Failing test arrow::arrow_writer::tests::fixed_size_binary_single_column
  • ARROW-12254 - [Rust][DataFusion] Limit keeps polling input after limit is reached
  • ARROW-12258 - [R] Never do as.data.frame() on collect(as_data_frame = FALSE)
  • ARROW-12262 - [Doc][C++][Python] Docs built and pushed with S3 and Flight disabled
  • ARROW-12267 - [Rust] JSON writer does not support timestamp types
  • ARROW-12273 - [JS] Coveralls does not work anymore
  • ARROW-12279 - [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)
  • ARROW-12294 - [Rust] Fix Boolean Kleene Kernels with no Remainder
  • ARROW-12299 - [Python] pq.write_to_dataset does not recognize S3FileSystem
  • ARROW-12300 - [C++] ArrowCUDA erroneously links to CUDA Runtime while only using CUDA Driver API
  • ARROW-12313 - [Rust] [Ballista] Benchmark documentation out of date
  • ARROW-12314 - [Python] pq.read_pandas with use_legacy_dataset=False does not accept columns as a set (kartothek integration failure)
  • ARROW-12327 - [Dev] Use pull request's head remote when submitting crossbow jobs via the comment bot
  • ARROW-12330 - [Developer] Restore values in counters column of Archery benchmark
  • ARROW-12334 - [Rust] [Ballista] Aggregate queries producing incorrect results
  • ARROW-12342 - [Packaging] Fix tabulation in crossbow templates for submitting nightly builds
  • ARROW-12357 - [Archery] Error running “crossbow submit ...”
  • ARROW-12377 - [Doc][Java] Java doc build broken
  • ARROW-12379 - [C++][CI] Thread sanitizer failure in SerialExecutor
  • ARROW-12382 - [C++][CI] Conda nightly jobs fail due to not bundling xsimd
  • ARROW-12385 - [R] [CI] fix cran picking in CI
  • ARROW-12390 - [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice
  • ARROW-12401 - [R] Fix guard around dataset___Scanner__TakeRows
  • ARROW-12405 - [Packaging] Fix apt artifact patterns and artifact uploading from travis
  • ARROW-12408 - [R] Delete Scan() bindings
  • ARROW-12421 - [Rust] [DataFusion] topk_query test fails in master
  • ARROW-12429 - [C++] MergedGeneratorTestFixture is incorrectly instantiated
  • ARROW-12433 - [Rust] Builds failing due to new flatbuffer release introducing const generics
  • ARROW-12437 - [Rust] [Ballista] Ballista plans must not include RepartitionExec
  • ARROW-12440 - [Release] Various packaging, release script and release verification script fixes
  • ARROW-12466 - [Python] Comparing array to None raises error
  • ARROW-12475 - [C++] Build warning from thread_pool_benchmark.cc
  • ARROW-12487 - [C++][Dataset] ScanBatches() hangs if there's an error during scanning
  • ARROW-12495 - [C++][Python] NumPy buffer sets is_mutable_ to true but does not set mutable_data_ when the NumPy array is writable
  • PARQUET-1655 - [C++] Decimal comparisons used for min/max statistics are not correct
  • PARQUET-2008 - [C++] Wrong information written in RowGroup::total_byte_size