Apache Arrow 8.0.1 (2022-07-14)

New Features and Improvements

  • ARROW-16759 - [Go] backport gopkg.in/yaml.v3 security patch to v8 (#13588)

Apache Arrow 8.0.0 (2022-05-03)

Bug Fixes

  • ARROW-5248 - [Python] support zoneinfo / dateutil timezones
  • ARROW-7350 - [Python] Decode parquet statistics as scalars
  • ARROW-9664 - [Python] Array/ChunkedArray.to_pandas do not support types_mapper keyword
  • ARROW-11415 - [R] map_batches wouldn't accept a dataset as an argument
  • ARROW-13168 - [C++][R] Enable runtime timezone database for Windows
  • ARROW-13594 - [CI] Enable nightly turbodbc builds again
  • ARROW-13922 - [Python] Fix ParquetDataset throwing an error when len(path_or_paths) == 1
  • ARROW-14047 - [C++][Parquet] FileReader returns inconsistent results on repeat reads
  • ARROW-14215 - [R][CI] Conda Windows builds failing due to space in library name
  • ARROW-14256 - [CI][Package] Re-enable disabled conda packaging builds
  • ARROW-14389 - [C++][Gandiva] Fix performance bug with LIKE expressions
  • ARROW-14638 - [C++][R] Unknown C compiler / ccache on Arch Linux
  • ARROW-14647 - [JS] fix bignumToNumber for negative numbers
  • ARROW-14665 - [JAVA] fix JdbcToArrow ResultSet iteration bug
  • ARROW-14708 - [C++] Adding missing abseil dependencies to enable static flight build
  • ARROW-14908 - [C++][R] Dataset hash join segfaults on Windows
  • ARROW-14911 - [C++] arrow-compute-hash-join-node-test failed
  • ARROW-14960 - [C++] Add exception to Arrow style guide based on changes in Google style guide that we are not adopting
  • ARROW-15092 - [R] Support create_package_with_all_dependencies() on non-linux systems
  • ARROW-15253 - [Python] Error in to_pandas for empty dataframe with index with extension type
  • ARROW-15272 - [Java] Add cleanup failures as suppressed in ArrowVectorIterator#create
  • ARROW-15291 - [C++][Python] Segfault in StructArray.to_numpy and to_pandas if it contains an ExtensionArray
  • ARROW-15312 - [R][C++] filtering a Parquet dataset with is.na() misses some rows
  • ARROW-15401 - [Python] Gdb tests are failing on windows and apple M1
  • ARROW-15426 - [C++][Gandiva] Update InExpressionNode validation
  • ARROW-15444 - [C++] Compilation with GCC 7.5 fails in aggregate_basic.cc
  • ARROW-15465 - [Python] Add some missing parquet marks in dataset tests
  • ARROW-15502 - [Java] Detect exceptional footer size in Arrow file reader
  • ARROW-15504 - [Python][CI] Ensure that optional components are tested
  • ARROW-15509 - [Go][Parquet] Parquet cmds crash
  • ARROW-15511 - [Python][C++] Remove reference management in numpy indexer
  • ARROW-15514 - [C++][Gandiva] Add flag to enable Gandiva Object Code
  • ARROW-15520 - [C++] Qualify arrow_vendored::date::format() for C++20 compatibility
  • ARROW-15533 - [C++] Check ARROW_WITH_OPENTELEMETRY in CI
  • ARROW-15539 - [Archery] Add ARROW_JEMALLOC to build options
  • ARROW-15541 - [Python] Bump the minimum Cython version
  • ARROW-15544 - [Go][Parquet] Fix origin schema base64 decoding
  • ARROW-15546 - [FlightRPC][C++] Remove quotes from cookie header
  • ARROW-15555 - [Release] Don't push the release tag since it already exists
  • ARROW-15580 - [Python] Make pytz an actual optional dependency of PyArrow
  • ARROW-15593 - [C++] Make after-fork ThreadPool reinitialization thread-safe
  • ARROW-15598 - [C++][Gandiva] Avoid using hardcoded raw pointer addresses in generated code
  • ARROW-15599 - [R] Convert a column as a sub-second timestamp from CSV file with the T col type option
  • ARROW-15603 - [C++] Remove unused variables
  • ARROW-15604 - [C++][CI] Sporadic ThreadSanitizer failure with OpenTracing
  • ARROW-15607 - [C++] Fix incorrect CPUID flag for AVX detection
  • ARROW-15626 - [GLib] Fix a bug that GArrowGIOInputStream may not read enough data
  • ARROW-15627 - [R] Fix union dataset unify schema
  • ARROW-15648 - [C++][Gandiva] Fix the size of the Gandiva cache
  • ARROW-15652 - [C++] Fix GDB pretty-printing from inside parquet namespace
  • ARROW-15659 - [R] strptime should return NA (not error) with format mismatch
  • ARROW-15664 - [C++] parquet reader Segfaults with illegal SIMD instruction
  • ARROW-15667 - [R] Test development build with ARROW_BUILD_STATIC=OFF
  • ARROW-15674 - [C++][Gandiva] Like function doesn't properly handle patterns with special characters in certain cases
  • ARROW-15677 - [R] calling invalidate() method on ArrowObjects causes subsequent segfault
  • ARROW-15679 - [R] count should return an ungrouped dataframe
  • ARROW-15688 - [C++] add_checked doesn't error out on duration overflow
  • ARROW-15699 - [C++][Gandiva] Fix implementation of left and right func…
  • ARROW-15700 - [C++] Compilation error on Ubuntu 18.04
  • ARROW-15705 - [JavaScript] Allowing appending null on children in a StructBuilder
  • ARROW-15710 - [C++] Intermittent deadlock on arrow-threading-utility-test
  • ARROW-15715 - [Go] ipc trim value offsets on arrays
  • ARROW-15718 - [C++] Increase thread limit to work around thread issues
  • ARROW-15720 - [CI] Fix nightly dask build (skip failing test due to wrong usage of Array.to_pandas)
  • ARROW-15723 - [Python] Segfault orcWriter write table
  • ARROW-15727 - [Python] Allow converting lists of MonthDayNano intervals to Pandas
  • ARROW-15728 - [Python] Reduce entropy for zstd test_ipc
  • ARROW-15743 - [R] skip not connected up to skip_rows on open_dataset despite error messages indicating otherwise
  • ARROW-15746 - [Release][Java] Add missing artifacts to tasks.yml
  • ARROW-15748 - [Python] Round temporal options default unit is day but documented as second. Follow-up
  • ARROW-15748 - [Python] Round temporal options default unit is day but documented as second
  • ARROW-15757 - [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
  • ARROW-15760 - [C++] Avoid hard dependency on git in cmake (download tarballs from github instead)
  • ARROW-15770 - [CI] Not all python tests are running on CI jobs
  • ARROW-15772 - [Go][Flight] Server Basic Auth Middleware/Interceptor wrongly base64 decode
  • ARROW-15778 - [Java] set native endian to schema
  • ARROW-15783 - [Python] Initialize static pandas data on write
  • ARROW-15784 - [C++][Python] Removing flag enable_parallel_column_conversion which is no longer used
  • ARROW-15791 - [Go] ipc FileWriter negative WaitGroup counter
  • ARROW-15794 - [CI][Crossbow] Nightly builds failing due to error in types_mapper
  • ARROW-15815 - [C++][Parquet] Fix undefined behaviour on invalid input
  • ARROW-15819 - [R] R docs version switcher doesn't work on Safari on MacOS
  • ARROW-15830 - [C++] Ensure target directory exists before running Substrait generation
  • ARROW-15837 - [C++][Python] Clarify documentation for ListArray::offsets()
  • ARROW-15845 - [Python][Packaging] Fix macOS wheel builds
  • ARROW-15847 - [Python][CI] Ensure we have a nightly Python build with parquet encryption disabled
  • ARROW-15847 - [Python] Building with Parquet but without Parquet encryption fails
  • ARROW-15848 - [Gandiva][C++] Fix function istrue and is not true
  • ARROW-15851 - [C++] Enable RE2 when building with gRPC
  • ARROW-15852 - [JS] Fix error thrown by Table.getByteLength()
  • ARROW-15857 - [R] rhub/fedora-clang-devel fails to install ‘sass’ (rmarkdown dependency)
  • ARROW-15863 - [Packaging][C++][Python] Fix conda package builds
  • ARROW-15869 - [C++] Fix Valgrind failure (uninitialized value)
  • ARROW-15888 - [Doc][Python] Modernize development instructions
  • ARROW-15892 - [C++] Dataset APIs require s3:ListBucket Permissions
  • ARROW-15895 - [R] R docs version switcher disappears & reappears with back button on Chrome
  • ARROW-15898 - [CI] Clean old conda nightlies more thoroughly
  • ARROW-15905 - [Python][C++] Fix CMake warning when building PyArrow
  • ARROW-15928 - [C++] Fix crashes and implement chunked array support for replace_with_mask function
  • ARROW-15929 - [R] io_thread_count is actually the CPU thread count
  • ARROW-15946 - [Go] Fix memory leak in pqarrow.NewColumnWriter when writing nested data
  • ARROW-15949 - [Python] Do not require Parquet encryption when Parquet is disabled
  • ARROW-15951 - [CI][Python] “Test wheel” step successful despite test error
  • ARROW-15954 - [Java] Remove mac native netty kqueue dependency after upgrade
  • ARROW-15960 - [C++] Fix crash on adaptive int builder edge cases
  • ARROW-15962 - [C++][GANDIVA] Fix unhex errors return
  • ARROW-15965 - [C++][Python] Add Scalar constructor of RoundToMultipleOptions to Python
  • ARROW-15970 - [R][CI] Re-enable DuckDB dev tests
  • ARROW-15973 - [CI] Split nightly reports into three: Tests, Packaging, Release
  • ARROW-15982 - [Python] parquet.read_table fails to parse home directory path
  • ARROW-15985 - [CI] Fix conda-clean failure when there are no files to delete
  • ARROW-15987 - [C++][FlightRPC] Work around arrow-flight-test crash on AppVeyor
  • ARROW-15993 - [CI] Add sphinx-tabs to ci/conda_env_sphinx.txt
  • ARROW-16012 - [C++] Retry S3 request in tests when Minio not fully initialized
  • ARROW-16013 - [C++][Python] Signed overflow when using negative stride in NumPyStridedConverter
  • ARROW-16016 - [C++] Fix recursive ccache invocation error
  • ARROW-16019 - [C++] Minimize chances of Minio connect errors
  • ARROW-16021 - [C++] arrow-compute-hash-join-node-test timeout on MinGW
  • ARROW-16025 - [Python][C++] Fix segmentation fault when closing ORCFileWriter
  • ARROW-16031 - [C++][Gandiva] Fix Soundex errors generate
  • ARROW-16035 - [Java] Handling empty JDBC ResultSet
  • ARROW-16043 - [C++][Filesystem][S3] Add missing empty content for creating directory
  • ARROW-16048 - [Python] Avoid exposing null buffer address to the Python buffer protocol
  • ARROW-16051 - [Gandiva][C++] Fix datediff regression build
  • ARROW-16052 - [R] undefined global function %>%
  • ARROW-16060 - [C++] subtract_checked support for timestamp(“s”) and date32
  • ARROW-16071 - [R] More undefined global functions
  • ARROW-16078 - Upgrade bundled zlib to 1.2.12
  • ARROW-16099 - [JS] RecordBatches that are compressed should throw an error
  • ARROW-16107 - [Dev][Archery] Fix archery crossbow latest-prefix query
  • ARROW-16110 - [C++] GcsFileSystem::Make ignores IOContext
  • ARROW-16113 - [Python] Partitioning.dictionaries in case of a subset of fields are dictionary encoded
  • ARROW-16131 - [C++] support saving and retrieving custom metadata in batches for IPC file
  • ARROW-16134 - [C++][GANDIVA] Fix Concat_WS errors return
  • ARROW-16136 - [Gandiva][C++] Fix problem of the huge size of AddMappings function
  • ARROW-16139 - [Python] Crash in tests/test_dataset.py::test_write_dataset_s3
  • ARROW-16143 - [Java] Upgrade jackson dependencies CVE-2020-36518
  • ARROW-16146 - [C++] arrow-gcsfs-test is timing out
  • ARROW-16148 - [C++] TPC-H generator cleanup
  • ARROW-16152 - [C++] Fix segfault with unknown functions in Substrait
  • ARROW-16159 - [C++][Python] Allow FileSystem::DeleteDirContents to succeed if the directory is missing
  • ARROW-16162 - [C++][FlightRPC] Fix Flight build on Ubuntu 18.04
  • ARROW-16163 - [Go] IPC FileReader leaks memory when used with ZSTD compression
  • ARROW-16165 - [CI][Archery] Fix nightly query to crossbow to send reports
  • ARROW-16169 - [C++][Gandiva] Fix empty string case in convert_fromUTF8_binary()
  • ARROW-16181 - [CI][C++] Valgrind failure in TPCH node tests
  • ARROW-16182 - [C++][CI] TPCH node tests timeout under ThreadSanitizer
  • ARROW-16185 - [C++] Fix uninitialized output data in strptime kernel
  • ARROW-16197 - [Docs] Fix broken link
  • ARROW-16205 - [C++][FlightRPC] Don't use constexpr std::initializer_list
  • ARROW-16209 - [JS] Support setting arbitrary symbols on Tables
  • ARROW-16215 - [C++][FlightRPC] Fix segfault in Flight test on Windows
  • ARROW-16216 - [Python][FlightRPC] Fix test_flight.py when Flight is not available
  • ARROW-16219 - [CI] Fix git config to prevent SCM tools failure
  • ARROW-16223 - [C++] Fix decimal reduce scale rounding
  • ARROW-16225 - [C++][Parquet] Fix length of encryption AAD random byte generation
  • ARROW-16233 - [Python][Packaging] test_zoneinfo_tzinfo_to_string fails with zoneinfo._common.ZoneInfoNotFoundError on packaging wheels on Windows
  • ARROW-16235 - [C++] Fix build failure, compiler warnings from MinGW
  • ARROW-16236 - [Python][Packaging] test_s3fs_limited_permissions_create_bucket fails with Permission denied on macOS wheel builds
  • ARROW-16237 - [Docs] Apache Impala is no longer incubating
  • ARROW-16238 - [C++] Fix nullptr dereference when pre-buffering IPC reads
  • ARROW-16261 - [C++] Fix DeleteDirContents on HDFS with missing_dir_ok=True
  • ARROW-16262 - [CI][Integration] Skip failing tests from kartothek integration
  • ARROW-16278 - [CI] Fix git installation failure on brew
  • ARROW-16293 - [CI][GLib] Make tests stable
  • ARROW-16295 - [CI][Release] Use windows-2019 for verify-rc-source-windows
  • ARROW-16300 - pc.sort_indices with nonexistent column throws malloc error
  • ARROW-16301 - [C#][CI] Fix docker configuration for .NET 6
  • ARROW-16305 - [C++] Missed reference to ARROW_ENGINE during the rename
  • ARROW-16306 - [CI] Fix Nightly verify rc on ubuntu
  • ARROW-16307 - [Java][FlightRPC] Skip flaky test TestDoExchange.testClientCancel
  • ARROW-16311 - [Java] Do not return table_schema column when it's not requested
  • ARROW-16312 - [C++][CI] Install tzdata in the windows verification builds
  • ARROW-16313 - [R] Ensure assume_timezone options are always initialized
  • ARROW-16332 - [Release][Java] Add artifacts uploaded verification
  • ARROW-16336 - [Python] ParquetDataset - Hide internal (common_)metadata related warnings from the user
  • ARROW-16374 - [R][C++] skip another snappy test during sanitizer runs
  • ARROW-16375 - [R][CI] Pin test-r-devdocs on Windows to R 4.1
  • ARROW-16393 - [JAVA] Update option spec to accept value for query, catalog, schema and table
  • ARROW-16413 - [Python] Certain dataset APIs hang with a python filesystem
  • ARROW-16417 - [C++][Python] Segfault in test_exec_plan.py / test_joins
  • ARROW-16419 - [Python] Properly wait for ExecPlan to finish
  • ARROW-16442 - [Python][Dataset] Fix fragments of ORC Dataset to use FileFragment class
  • PARQUET-2115 - [C++] Parquet dictionary bit widths are limited to 32 bits
  • PARQUET-2118 - [C++] Don't assume standard pointers
  • PARQUET-2119 - [C++] Fix DeltaBitPackDecoder fuzzer found issue
  • PARQUET-2123 - [C++] Fix invalid memory access in ScanFileContents
  • PARQUET-2124 - [C++] Remove Parquet Dictionary DCHECK
  • PARQUET-2130 - Fix crash in debug with non-standard key names.
  • PARQUET-2131 - Number values decoded DCHECKs should be exceptions

New Features and Improvements

  • ARROW-1888 - [C++] Implement Struct Casts
  • ARROW-3016 - [Docs][C++] Memory profiling with perf
  • ARROW-3039 - [Go] Add support for DictionaryArray
  • ARROW-3998 - [C++] Add TPC-H Generator
  • ARROW-5107 - [Release] Validate non-RC source and binary artifacts
  • ARROW-5598 - [Go] Rename array.Array{,Approx}Equal to array.{,Approx}Equal
  • ARROW-6780 - [C++][Parquet] Support DurationType in writing/reading parquet (written as int64)
  • ARROW-7174 - [Python] Expose parquet dictionary_pagesize_limit write parameter
  • ARROW-7272 - [C++][Java][Dataset] JNI bridge between RecordBatch and VectorSchemaRoot
  • ARROW-7914 - [Python] Allow pandas datetime as index for feather
  • ARROW-9235 - [R] Support for connection class when reading and writing files
  • ARROW-9378 - [Go] Support unsigned dictionary indices
  • ARROW-9947 - [Python] High-level Python API for Parquet encryption of files.
  • ARROW-10643 - [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe
  • ARROW-10924 - [C++] Validate temporal data in ValidateArrayFull
  • ARROW-11071 - [R][CI] Use processx to set up minio and flight servers in tests
  • ARROW-11259 - [Python] Allow to create field reference to nested field
  • ARROW-11989 - [C++][Python] Improve ChunkedArray's complexity for the access of elements
  • ARROW-12515 - [Dev][Wiki][Release] Fix and update Windows RC verify script
  • ARROW-12516 - [C++][Gandiva] Implements castINTERVALDAY(varchar) and castINTERVALYEAR(varchar) functions
  • ARROW-12659 - [C++] Support is_valid as a guarantee
  • ARROW-12743 - [R] Add DESCRIPTION fields for dev dependencies
  • ARROW-13185 - [MATLAB] Create a single MEX gateway function which delegates to specific C++ functions
  • ARROW-13204 - [MATLAB] Update documentation for the MATLAB Interface to reflect latest CMake build system changes
  • ARROW-13231 - [Doc] Add ORC documentation
  • ARROW-13260 - [Doc] Host different released versions of the documentation + version switcher
  • ARROW-13337 - [R] Define Math group generics
  • ARROW-13375 - [C++][Gandiva] Implement POSITIVE and NEGATIVE Hive functions on Gandiva
  • ARROW-13409 - [C++][FlightRPC] Expose server shutdown with deadline
  • ARROW-13564 - [Dev] Check individual commit messages for “Co-authored-by:” tags when integrating a pull request
  • ARROW-13616 - [R] Cheat Sheet Structure
  • ARROW-13683 - [R] Test Windows UCRT R
  • ARROW-13703 - [Python][R] Add bindings for new dataset writing options
  • ARROW-13993 - [C++][Compute] Add hash_one aggregate function
  • ARROW-14075 - [C++][CI] Add an appveyor CI job for VisualStudio 2019, non-conda
  • ARROW-14091 - [C++] add(date, duration) -> timestamp kernel
  • ARROW-14093 - [C++] subtract(date, date) -> duration kernel
  • ARROW-14094 - [C++] add(timestamp, duration) -> timestamp kernel
  • ARROW-14095 - [C++] subtract(timestamp, duration) -> timestamp kernel
  • ARROW-14096 - [C++] add(time, duration) -> time kernel
  • ARROW-14097 - [C++] subtract(time, duration) -> time kernel
  • ARROW-14098 - [C++] subtract(time, time) -> duration kernel
  • ARROW-14099 - [C++] add(duration, duration) -> duration kernel
  • ARROW-14100 - [C++] subtract(duration, duration) -> duration kernel
  • ARROW-14101 - [C++] multiply(duration, integer) -> duration kernel
  • ARROW-14102 - [C++] divide(duration, integer) -> duration kernel
  • ARROW-14153 - [C++][Dataset] Add support for batch_size in the ORC Scanner
  • ARROW-14168 - [R] Warn only once about arrow function differences
  • ARROW-14169 - [R] altrep for factors
  • ARROW-14199 - [R] bindings for format (where possible)
  • ARROW-14266 - [R] Use WriteNode to write queries
  • ARROW-14279 - [Docs] Initial attempt at describing structure of PyArrow library
  • ARROW-14292 - [C++][Python] Join foundation for Tables
  • ARROW-14293 - [Python] Basic Join functionality in PyArrow
  • ARROW-14322 - [Doc] Add Python doc on how to connect Python to other languages
  • ARROW-14333 - [C++][Compute] Add binary and LargeStringType tests to comparison kernels
  • ARROW-14339 - [Docs] Add canonical url to the pkgdown (R) docs
  • ARROW-14442 - [R] fix behaviour when converting timestamps with "" as tzone
  • ARROW-14444 - [C++] Implement task-based model into the executable-pipelines.
  • ARROW-14498 - [Docs] Make it possible to regenerate older docs with additional patch(es)
  • ARROW-14502 - [C++][Gandiva] Add test DayOfMonth
  • ARROW-14506 - [C++] Conda support for google-cloud-cpp
  • ARROW-14553 - [Doc] Java Cookbook Release 1
  • ARROW-14579 - [Documentation] Document the CI
  • ARROW-14591 - [R] Implement bindings for lubridate duration types
  • ARROW-14612 - [C++] Support for filename-based partitioning
  • ARROW-14631 - [C++][Gandiva] Implement Nextday Function
  • ARROW-14651 - [Release][Archery] Add support for retrying download
  • ARROW-14672 - [Docs] Document how to exchange data between Python and Java
  • ARROW-14679 - [R][C++] Handle suffix argument in joins
  • ARROW-14698 - [Docs][FlightRPC] Add API docs for Flight SQL
  • ARROW-14702 - [Doc][C++] Document threading model
  • ARROW-14745 - [R] Enable true duckdb streaming
  • ARROW-14776 - [Website] Don't include squashed commits in merge commit message
  • ARROW-14798 - [C++][Python][R] Add container window to PrettyPrintOptions
  • ARROW-14808 - [R] Implement bindings for lubridate::date()
  • ARROW-14810 - [R] Implement bindings for lubridate's date_decimal() and decimal_date()
  • ARROW-14815 - [R] bindings for lubridate::semester()
  • ARROW-14817 - [R] Implement bindings for lubridate::tz()
  • ARROW-14823 - [R] Implement bindings for lubridate::leap_year
  • ARROW-14824 - [R] Implement bindings for lubridate::epiyear()
  • ARROW-14825 - [C++] Temporal component extraction function for extracting epiyear
  • ARROW-14826 - [R] Implement bindings for lubridate::dst()
  • ARROW-14827 - [C++] Temporal component extraction function for extracting dst indicator
  • ARROW-14893 - [C++] Allow creating GCS filesystem from URI
  • ARROW-14927 - [CI] Upgrade Fedora 33 to Fedora 35
  • ARROW-14942 - [R] Bindings for lubridate's dpicoseconds, dnanoseconds, dseconds, dmilliseconds, dmicroseconds
  • ARROW-14943 - [R] Bindings for lubridate's ddays, dhours, dminutes, dmonths, dweeks, dyears
  • ARROW-14944 - [R] Implement lubridate::make_difftime()
  • ARROW-14963 - [Doc] Add copy button extension to code-blocks
  • ARROW-14993 - [C++] Benchmark CSV writer
  • ARROW-14997 - [Python][Doc] Add thread_count functions to API docs
  • ARROW-15013 - [R] Expose concatenate at the R level
  • ARROW-15015 - [R] Test / CI flag for ensuring all tests are run?
  • ARROW-15020 - [R] Add bindings for new dataset writing options
  • ARROW-15040 - [R] Enable write_csv_arrow to take a Dataset or arrow_dplyr_query as input
  • ARROW-15061 - [C++] Add logging for kernel functions and exec plan nodes
  • ARROW-15062 - [C++] Add memory information to current spans
  • ARROW-15064 - [C++] Vectorize CheckStringHasNoStructuralChars in CSV writer
  • ARROW-15066 - [C++] Enable use of non-bundled OpenTelemetry
  • ARROW-15067 - [C++] Add tracing spans to the scanner
  • ARROW-15080 - [Python][C++] Enable tuples conversion to interval
  • ARROW-15089 - [C++][Compute] Implement kernel to lookup a MapArray item for a given key
  • ARROW-15098 - [R] Add binding for lubridate::duration() and/or as.difftime()
  • ARROW-15118 - [C++] Avoid bitmap buffer if all inputs are all valid for Scalar Kernels
  • ARROW-15152 - [C++][Compute] Implement hash_list aggregate function
  • ARROW-15156 - [Doc] Implement Tutorials for the Java Documentation
  • ARROW-15157 - [Doc] New Contributors Guide v2
  • ARROW-15163 - [R] lubridate functions for 8.0.0
  • ARROW-15167 - [R] Improve efficiency of decimal casting
  • ARROW-15168 - [R] Add S3 generics to create main Arrow objects
  • ARROW-15178 - [Java][Docs] Java Tutorial: Developer Docs for Java
  • ARROW-15180 - Document how to add JNI bindings for C++ features
  • ARROW-15183 - [Python][Docs] Add Missing Dataset Write Options
  • ARROW-15192 - [Java] Allow use of Jackson 2.12 and higher
  • ARROW-15195 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on macOS
  • ARROW-15197 - [C++] UTF-8 string repeat kernel
  • ARROW-15212 - [C++] Handle suffix argument in joins
  • ARROW-15215 - [C++] Consolidate kernel data-copy utilities between replace_with_mask, case_when, coalesce, choose, fill_null_forward, fill_null_backward
  • ARROW-15223 - [C++] Implement Not Between ternary kernel
  • ARROW-15238 - [C++] ARROW_ENGINE module with substrait consumer
  • ARROW-15239 - [C++][Compute] Adding Bloom filter implementation
  • ARROW-15258 - [C++] Easy options to create a source node from a table
  • ARROW-15262 - [C++] Create a ToTable sink node
  • ARROW-15281 - [C++] Implement ability to retrieve fragment filename
  • ARROW-15282 - [C++][FlightRPC] Split data methods from the underlying transport
  • ARROW-15294 - [R] Remove arrow-without-arrow and other Solaris hacks
  • ARROW-15296 - [CI][GO] Add Go staticcheck linting to CI lint job
  • ARROW-15299 - [R] investigate {remotes} dependencies “soft” vs TRUE
  • ARROW-15313 - [C++][Java][FlightRPC] Implement type info method to flight-sql
  • ARROW-15314 - [C++][Java][FlightRPC] Add missing metadata on Arrow schemas returned by Flight SQL
  • ARROW-15321 - [Dev][Python] Also numpydoc-validate Cython-generated methods
  • ARROW-15346 - [Doc][Guide] Arrow codebase - minor corrections
  • ARROW-15347 - [Doc][Guide] Update testing section in new contributors guide
  • ARROW-15348 - [Doc][Guide] Lifecycle of a PR - minor corrections
  • ARROW-15349 - [Doc][Guide] Existing Contributors page - update
  • ARROW-15350 - [Doc][Guide] Add styling and linters info section
  • ARROW-15351 - [Doc][Guide] Additional tutorial for R bindings
  • ARROW-15352 - [Doc][Guide] R package and make clean
  • ARROW-15353 - [Doc][Guide] Intro into CI topic and link to the existing docs
  • ARROW-15364 - [Python] Update filesystem entry in read docstrings to reflect current behaviour
  • ARROW-15366 - [Docs] Automate incrementing of package version for R and non-R version switchers
  • ARROW-15367 - [Python] Improve Classes and Methods Docstrings
  • ARROW-15369 - [Doc] Tweak example to use the new support for str pointers
  • ARROW-15374 - [C++][FlightRPC] Add support for MemoryManager in data methods
  • ARROW-15389 - [C++][Dev] Improve Array preview in GDB plugin
  • ARROW-15400 - [Go][CI] Exercise builds on arm machines
  • ARROW-15410 - [C++][Datasets] Improve memory usage of datasets API when scanning parquet
  • ARROW-15418 - [Go][Flight] Update gRPC version, hide impl details
  • ARROW-15425 - [C++] Add delta dictionaries in file format to integration tests
  • ARROW-15428 - [Python] Address docstrings in Parquet classes and functions
  • ARROW-15429 - [Python] Address docstrings for ChunkedArray class, methods, attributes and constructor
  • ARROW-15431 - [Python] Address docstrings in Schema
  • ARROW-15432 - [Python] Address CSV docstrings
  • ARROW-15440 - [Go] Implement ‘unpack_bool’ with Arm64 GoLang Assembly
  • ARROW-15450 - [Python][Wheel] Flight test receives SIGKILL during in macOS tests
  • ARROW-15462 - [GLib] Add GArrow{Month,DayTime,MonthDayNano}Interval{Scalar,Array,ArrayBuilder}
  • ARROW-15468 - [R][CI] A crossbow job that tests against DuckDB's dev branch
  • ARROW-15471 - [R] ExtensionType support in R
  • ARROW-15472 - [Website] Add Flight SQL blog post
  • ARROW-15477 - [C++][Python] Allow to create (FixedSize/Large)ListArray from arrays and type
  • ARROW-15480 - [R] Expand on schema/colnames mismatch error messages
  • ARROW-15483 - [Release] Revamp the verification scripts
  • ARROW-15487 - [FlightRPC][C++][GLib][Python][R] Implement FlightClient::Close
  • ARROW-15489 - [R] Expand RecordBatchReader usability
  • ARROW-15491 - [Website] Rotate PMC chair for 2022
  • ARROW-15497 - [C++][Homebrew] Use Clang Tools 12
  • ARROW-15501 - [Java] Support validating decimal vectors
  • ARROW-15503 - [GLib][Release] Avoid deprecation warning
  • ARROW-15505 - [C++][Compute] Support null type in product aggregation
  • ARROW-15506 - [C++][Compute] Support Null type in hash_sum/hash_product/hash_mean
  • ARROW-15510 - [C++][FlightRPC] Add CUDA memory manager support to benchmark
  • ARROW-15515 - [C++] Update ExecPlan example code and documentation with new options
  • ARROW-15517 - [R] Use WriteNode in write_dataset()
  • ARROW-15523 - [Python] Support for Datasets as inputs of Joins
  • ARROW-15524 - [Python] Make joins able to receive Tables as inputs
  • ARROW-15525 - [Python] Make joins able to output a Table as result.
  • ARROW-15526 - [Python] Support for Dataset.join
  • ARROW-15527 - [Python] Make Joins able to execute the join operation
  • ARROW-15532 - [C++] Fix unused warning for StringClassifyDoc
  • ARROW-15542 - [GLib][Parquet] Add GParquet*Metadata
  • ARROW-15550 - [C++] Add optional debug memory checks
  • ARROW-15551 - [C++][FlightRPC] Update gRPC TLS options detection for 1.43
  • ARROW-15552 - [Doc][Format] Remove erroneous mention of base64
  • ARROW-15556 - [Release] Add a script to update Homebrew packages
  • ARROW-15569 - [Packaging][deb] Use gem instead of apt to install gobject-introspection gem
  • ARROW-15570 - [CI][Nightly] Drop centos-8 R nightly job
  • ARROW-15572 - [Java][Docs] Add Installation section to Java documentation
  • ARROW-15573 - [Java][Doc] Document Apache Arrow memory management
  • ARROW-15574 - [Java][Doc] Review existing documentation
  • ARROW-15575 - [Java][Doc] Datasets Tutorial
  • ARROW-15576 - [Java][Doc] Document VectorSchemaRoots for 2D data
  • ARROW-15577 - [Java][Doc] Add Arrow Flight documentation
  • ARROW-15578 - [Java][Doc] Document C Data Interface and how to interface with other languages
  • ARROW-15579 - [C++] Add MemoryManager::CopyBuffer(const Buffer&)
  • ARROW-15594 - [C++][FlightRPC] Add Deserialize(const Buffer&) to various Flight types
  • ARROW-15595 - [Release][Ruby] Add support for MFA
  • ARROW-15600 - [C++][FlightRPC] Add minimal Flight SQL query example
  • ARROW-15601 - [Docs][Release] Update post release script to move stable docs + keep dev docs
  • ARROW-15605 - [CI][R] Keep using old macos runners on our autobrew CI job
  • ARROW-15606 - [CI][R] Add brew build that exercises the R package
  • ARROW-15609 - [C++][Compute] Support hash_aggregate with only keys
  • ARROW-15611 - [C++] Migrate arrow::ipc::internal::json::ArrayFromJSON to Result<>
  • ARROW-15614 - [C++] Add sqrt binary scalar kernel
  • ARROW-15617 - [Doc][C++] Document environment variables
  • ARROW-15619 - [C++] Temporal component extraction function for extracting is_leap_year indicator
  • ARROW-15623 - [C++][Python] Update developers/python.rst (console blocks + "" in archery install)
  • ARROW-15625 - [C++] Convert underscore to hyphen in example executable names
  • ARROW-15629 - [GLib] Add garrow_{,large_}string_array_builder_append_string_len()
  • ARROW-15630 - [Release][MSYS2] Update reverse dependencies too
  • ARROW-15631 - [Packaging][RPM] Add major version to libs packages
  • ARROW-15632 - [R] Prune the bundled libarrow source
  • ARROW-15633 - [R] Skip s3_bucket example that requires network connection
  • ARROW-15634 - [C++][Packaging] Improve compilation speed for java-jars nightly build for MacOS
  • ARROW-15643 - [C++] Allow selecting subset of fields of a StructArray via cast
  • ARROW-15650 - [MATLAB] Rename the MEX gateway function
  • ARROW-15653 - [R][CI] Fix tests of bundled cpp source
  • ARROW-15656 - [C++][R] Make valgrind builds slightly quicker
  • ARROW-15657 - [C++][Java] Upgrade Apache ORC to 1.7.3
  • ARROW-15665 - [C++] Fix error_is_null in strptime with invalid inputs
  • ARROW-15665 - [C++] Add error handling option to StrptimeOptions
  • ARROW-15670 - [C++/Python/Packaging] Update conda pinnings and enable GCS on Windows
  • ARROW-15672 - [C++] Enable CSV writer to control the field delimiter
  • ARROW-15673 - [R] Error gracefully if DuckDB isn't installed
  • ARROW-15680 - [C++] Temporal floor/ceil/round should accept week_starts_monday when rounding to multiple of week
  • ARROW-15682 - [CI] Github starting to migrate “windows-latest” tag from windows 2019 to windows 2022
  • ARROW-15683 - [Website][Rust][DataFusion] Make a 7.0.0 release announcement blog
  • ARROW-15690 - [Dev] Update GitHub Actions workflows that hardcode master as default
  • ARROW-15692 - [Dev] Update release scripts to use default branch
  • ARROW-15694 - [Dev] Update apache/arrow-site GitHub Actions deploy.yml website deployment workflow to support being triggered when pushing to main
  • ARROW-15697 - [R] Add logo and meta tags to pkgdown site
  • ARROW-15698 - [Integration] Privatized some code in tests
  • ARROW-15701 - [R] month() should allow integer inputs
  • ARROW-15706 - [C++][FlightRPC] Implement a UCX transport
  • ARROW-15707 - [C++][FlightRPC] Make Flight tests more reusable across transports
  • ARROW-15708 - [R][CI] skip snappy encoded parquets on clang sanitizer
  • ARROW-15709 - [C++] Compilation of ARROW_ENGINE fails if doing an “inline” build
  • ARROW-15709 - [C++] Revert change
  • ARROW-15709 - [C++] Compilation of ARROW_ENGINE fails if doing an “inline” build
  • ARROW-15712 - [R] Add a type method for Expression objects
  • ARROW-15714 - [C++][Gandiva] Increase the protobuf recursion limit in gandiva protobuf parser
  • ARROW-15717 - [Docs] Add hash_one to the documentation
  • ARROW-15721 - [Docs][FlightRPC] Add Flight/Flight SQL to subprojects
  • ARROW-15722 - [Java] Improve error message for nested types with incorrect children
  • ARROW-15726 - [C++] If a projected_schema is not supplied but a bound projection expression is then we should use that to infer the projected_schema
  • ARROW-15739 - [C++] Bump xsimd to latest version
  • ARROW-15740 - [C++][Compute] Benchmark element wise min/max
  • ARROW-15741 - [Doc][Format] Clarify thread-safety of C stream interface
  • ARROW-15742 - [Go] Implement ‘bitmap_neon’ with Arm64 GoLang Assembly
  • ARROW-15744 - [Gandiva][C++] Add NEGATIVE function for interval types
  • ARROW-15749 - [Ruby] Add support for #values of Month Interval Type
  • ARROW-15750 - [Ruby] Add support for #raw_records of Month Interval Type
  • ARROW-15755 - [Java] Support Java 17
  • ARROW-15763 - [C++] Improve CSV writer performance
  • ARROW-15766 - [R] Implement bindings for lubridate::duration()
  • ARROW-15769 - [C++] Generate less arithmetic kernels
  • ARROW-15775 - [R] Clean up as.* methods to use build_expr()
  • ARROW-15776 - [Python] Expose IpcReadOptions
  • ARROW-15777 - [Python][Flight] Allow passing IpcReadOptions to FlightCallOptions
  • ARROW-15781 - [Python] Release GIL in ensure_complete_metadata
  • ARROW-15782 - [C++] Fix Findre2Alt.cmake to check RE2_ROOT variable first
  • ARROW-15788 - [C++][FlightRPC] Prepare benchmark for alternative transports
  • ARROW-15789 - [C++] Update OpenTelemetry to v1.2.0
  • ARROW-15795 - [Java] Add a getter for the timeZone in timestamp with timezone vectors
  • ARROW-15796 - [Python] Pickling ParquetFileFragment shouldn't fetch metadata
  • ARROW-15799 - [R] Update as.Date() to support an origin different from epoch
  • ARROW-15800 - [R] Implement bindings for lubridate::as_date() and lubridate::as_datetime()
  • ARROW-15801 - [R] Implement bindings for lubridate date-time helpers
  • ARROW-15802 - [R] bindings for lubridate::make_datetime() and lubridate::make_date()
  • ARROW-15810 - [CI][Nightly] Check R related image strictly
  • ARROW-15814 - [R][DOCS] Improve documentation for cast()
  • ARROW-15817 - [R] Use TableSourceNode instead of InMemoryDataset
  • ARROW-15818 - [R] Implement initial Substrait consumer in the R bindings
  • ARROW-15820 - [C++][Doc] Add table_source to streaming_execution.rst & clarify parameter name
  • ARROW-15821 - [JS] Fix paths to sourcemaps in directories
  • ARROW-15823 - [C++][Python] Add a method to convert a Table to a RecordBatchReader
  • ARROW-15824 - [Python] Make pyarrow.parquet a package
  • ARROW-15827 - [R] Improve UX of write_dataset(..., max_rows_per_group)
  • ARROW-15831 - [Java] Upgrade Flight dependencies
  • ARROW-15841 - [R] Implement SafeCallIntoR to safely call the R API from another thread
  • ARROW-15844 - [Release][Packaging] Use ASCII format for detached sign
  • ARROW-15846 - [Format] Clarify presence of struct validity bitmap
  • ARROW-15850 - [C++] Engine substrait headers missing from install
  • ARROW-15854 - [C++] Refine CSV writer code
  • ARROW-15860 - [Python] Document RecordBatchReader
  • ARROW-15864 - [Java][Docs] Update Arrow nightly Maven releases documentation
  • ARROW-15866 - [Packaging][Ubuntu] Drop support for Ubuntu 21.04
  • ARROW-15870 - [Python] Start to raise deprecation warnings for use_legacy_dataset=True in parquet.read_table
  • ARROW-15871 - [Python] Start raising deprecation warnings for ParquetDataset keywords that won't be supported with the new API
  • ARROW-15873 - [CI] Migrate from Ubuntu 21.04 to 22.04
  • ARROW-15875 - [R] Expose ReadMetadata for input streams
  • ARROW-15882 - [Python][CI] Ensure we are running hypothesis tests in the nightly hypothesis build
  • ARROW-15885 - [Ruby] Add support for #values of DayTime Interval Type
  • ARROW-15886 - [Ruby] Add support for #raw_records of DayTimeInterval type
  • ARROW-15890 - [CI][Python] Use venv instead of virtualenv
  • ARROW-15896 - [Python][C++] Add errno detail for filesystem “file not found” errors
  • ARROW-15900 - [C++] Support Substrait reading of a Feather-format local file
  • ARROW-15902 - [Website] Add new committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, Kun Liu
  • ARROW-15916 - [Packaging][RPM] Add support for CentOS Stream 8
  • ARROW-15917 - [Java][Docs] Document how to use Flight artifacts
  • ARROW-15918 - [Ruby][{day:, millisecond:}, ...] )
  • ARROW-15919 - [C++] Add function not commutative with timestamps & duration maths
  • ARROW-15921 - [Format][FlightRPC][C++][Java] Clarify interpretation of FlightEndpoint.locations
  • ARROW-15923 - [Packaging][Linux] Enable GCS support
  • ARROW-15924 - [Ruby] Add support for #values of MonthDayNanoInterval type
  • ARROW-15925 - [Ruby] Add support for #raw_records of MonthDayNanoInterval type
  • ARROW-15931 - [Website] Add explicit Apache LICENSE.txt and NOTICE.txt files to apache/arrow-site repository
  • ARROW-15932 - [C++][FlightRPC] Add more tests to the common Flight suite
  • ARROW-15934 - [Python] Expose write_batch_size in python
  • ARROW-15935 - [Ruby] Add test for Arrow::DictionaryArray#values
  • ARROW-15939 - [Python] Add pickle support for JSON options classes
  • ARROW-15940 - [Gandiva][C++] Add NEGATIVE function for decimal data type
  • ARROW-15941 - [C++] Allow overriding the number of IO threads with an environment variable
  • ARROW-15944 - [Docs][C++] Document dependencies for building on Arch Linux
  • ARROW-15947 - [R] rename_with s3 method for arrow_dplyr_query
  • ARROW-15950 - [Go] Lift BitSetRunReader to internal/bitutils package
  • ARROW-15952 - [C++] Document Visitors and finish Scalar::Accept
  • ARROW-15955 - [Packaging][RPM] Add missing json-devel to CentOS Stream 8 build image
  • ARROW-15956 - [Java] Consolidate Flight integration testing code
  • ARROW-15963 - [Go][Parquet] simplify ReaderAtSeeker interface
  • ARROW-15968 - [C++] Update AsyncGenerator semantics to emit a terminal item only after all outstanding futures have completed
  • ARROW-15972 - [Java][Doc] Add Getting Started section
  • ARROW-15974 - [C++] Migrate flight/types.h header definitions to use Result<>
  • ARROW-15975 - [C++] Document type traits and inline visitors
  • ARROW-15976 - [C++] Clean up commenting on execution plan example
  • ARROW-15979 - [C++][Doc] Expose more functions of parquet::WriterProperties in doc
  • ARROW-15984 - [C++] Change RecordBatchReader API to use Result<>
  • ARROW-15989 - [R] rbind & cbind for Table & RecordBatch
  • ARROW-15994 - [C++] Back out taskify changes
  • ARROW-15995 - [Go] Improve ‘sum_float64_neon’ performance
  • ARROW-15998 - [Docs][CI] Use sphinx-design tabs instead of sphinx-tabs
  • ARROW-15999 - [Python] Turn deadlines off for the test using hypothesis
  • ARROW-16007 - [R] grepl bindings return FALSE for NA inputs
  • ARROW-16011 - [R] CI jobs should fail if lintr picked up issues
  • ARROW-16014 - [C++] Create more benchmarks for measuring expression evaluation overhead
  • ARROW-16026 - [C++] Add support for the serial executor to expose an async generator as an iterable
  • ARROW-16032 - [C++] Migrate FlightClient API to Result<>
  • ARROW-16033 - [C++] Pass schema to consuming sink node
  • ARROW-16038 - [R] different behavior from dplyr when mutate's .keep option is set
  • ARROW-16042 - [Go] Fix header file preprocessor issues
  • ARROW-16044 - [Julia] Remove from apache/arrow
  • ARROW-16046 - [Docs][FlightRPC][Python] Ensure Flight Python API is documented
  • ARROW-16049 - [C++][FlightRPC] Fix Flight SQL's ColumnMetadata constructor visibility
  • ARROW-16053 - [C++][FlightRPC] Fix flaky test TestAuthHandler.FailUnauthenticatedCalls
  • ARROW-16055 - [C++][Gandiva] Skip unnecessary work during cache hit when using object code cache
  • ARROW-16057 - [Python] Address docstrings for RecordBatch class, methods, attributes and constructor
  • ARROW-16058 - [Python] Address docstrings for Table class, methods, attributes and constructor
  • ARROW-16059 - [Python] Address docstrings for Tensor class
  • ARROW-16061 - [R][CI] Speed up windows 3.6 builds
  • ARROW-16062 - [Python] Move libarrow_python include definitions to its own file
  • ARROW-16064 - [Java][C++][FlightRPC] Add missing column metadata for type name on FlightSQL
  • ARROW-16065 - [FlightRPC][Docs] Improve Flight documentation
  • ARROW-16068 - [C++][FlightRPC] Migrate remaining flight API to use Result<>
  • ARROW-16069 - [C++][FlightRPC] Refactor out gRPC error code handling
  • ARROW-16073 - [R] clean-up date time unit testing once tzdb is available on Windows
  • ARROW-16074 - [Docs] Document joins
  • ARROW-16079 - [Python] Address docstrings in Parquet schema and metadata
  • ARROW-16082 - [Flight][Go] Allow specifying a net.Listener
  • ARROW-16098 - [JS] Don't return null in table and recordbatch iterators
  • ARROW-16102 - [C++] Add support for building with system gRPC and bundled GCS
  • ARROW-16104 - [Packaging] Add support for Ubuntu 22.04
  • ARROW-16105 - [C++][Gandiva] Add support for LLVM 14
  • ARROW-16109 - [Python] Add dataset mark to test in order to avoid failure
  • ARROW-16114 - [Docs][Python] Document Parquet FileMetaData
  • ARROW-16117 - [JS] Improve decode UTF8 performance
  • ARROW-16120 - [Python] ParquetDataset deprecation: change Deprecation to FutureWarnings
  • ARROW-16121 - [Python] Deprecate the (common_)metadata(_path) attributes of ParquetDataset
  • ARROW-16122 - [Python] Change use_legacy_dataset default and deprecate no-longer supported keywords in parquet.write_to_dataset
  • ARROW-16128 - [C++][FlightRPC] Fix Flight SQL static build on Windows
  • ARROW-16132 - [Packaging][deb][CUDA] Relax libcuda1 dependency
  • ARROW-16154 - [R] Errors which pass through handle_csv_read_error() and handle_parquet_io_error() need better error tracing
  • ARROW-16156 - [R] Clarify warning message for features not turned on in .onAttach()
  • ARROW-16158 - [C++][R] Rename ARROW_ENGINE to ARROW_SUBSTRAIT
  • ARROW-16166 - [C++][Compute] Utilities for assembling join output
  • ARROW-16167 - [JS] refactor get and set visitors
  • ARROW-16173 - [C++] Add benchmarks for temporal functions/kernels
  • ARROW-16176 - [Release][C#] Use .NET 6.0 on Ubuntu 22.04
  • ARROW-16186 - [C++][Gandiva] Add alias and tests for decimal, quarter, xor, etc...
  • ARROW-16187 - [Go][Parquet] Properly utilize BufferedStream and buffer size when reading
  • ARROW-16192 - [Go] Remove deprecated aliases for v8
  • ARROW-16193 - [Go] Replace CPU discovery package with golang.org/x/sys/cpu module
  • ARROW-16198 - [CI][Packaging][Python] Update VCPKG version
  • ARROW-16201 - [R] SafeCallIntoR on 3.4
  • ARROW-16203 - [Release] Remove all old artifacts on release
  • ARROW-16204 - [C++][Dataset] Default error existing_data_behaviour for writing dataset ignores a single file
  • ARROW-16208 - [JS] Upgrade deps
  • ARROW-16210 - [JS] Implement tableFromJSON and support struct vector in vectorFromArray
  • ARROW-16214 - [GLib][Parquet] Add GParquetFileMetadata
  • ARROW-16229 - [CI] Temporarily remove turbodbc tests from nightly tests
  • ARROW-16232 - [C++] Include OpenTelemetry in LICENSE.txt
  • ARROW-16240 - [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset with use_legacy_dataset=False
  • ARROW-16242 - [Go] xerrors.Errorf and xerrors.Is are deprecated, fix linting
  • ARROW-16245 - [GLib][Parquet] Add GParquetRowGroupMetadata
  • ARROW-16247 - [GLib] Add GArrowGCSFileSystem
  • ARROW-16250 - [GLib][Parquet] Add GParquetColumnChunkMetadata
  • ARROW-16251 - [GLib][Parquet] Add GParquetStatistics and its family
  • ARROW-16252 - [CI][Archery] Highlight number of failed builds on nightly reports
  • ARROW-16256 - [Docs] Document which format version is supported
  • ARROW-16257 - [R] Break-up as_date and as_datetime into individual functions
  • ARROW-16264 - [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test
  • ARROW-16277 - [Python] No builds for macOS arm64.
  • ARROW-16280 - [C++] Avoid copying shared_ptr in Expression::type()
  • ARROW-16282 - [CI][C#] Verify release on c-sharp has been failing since upgrading ubuntu to 22.04
  • ARROW-16283 - [Go] Cleanup panics in new Buffered Reader
  • ARROW-16284 - [Python][Packaging] Use delocate-fuse to create universal2 wheels
  • ARROW-16291 - [Java] Support JSE17 for Java Cookbooks
  • ARROW-16292 - [Java][Doc] Upgrade java documentation for JSE17/JSE18
  • ARROW-16294 - [C++] Improve performance of parquet readahead
  • ARROW-16296 - [GLib] Add missing casts for GArrowRoundMode
  • ARROW-16303 - [C++] Check EINTR in file IO
  • ARROW-16308 - [CI] Upgrade windows runner version as windows-2016 is deprecated.
  • ARROW-16314 - [Python][CI] Skip running cython tests in windows verification builds
  • ARROW-16325 - [R] Add task for R package with gcc12
  • ARROW-16334 - [Archery][CI] Use build links on nightly report emails instead of branch link
  • ARROW-16338 - [CI] Update azure windows image as vs2017-win2016 is retired
  • ARROW-16347 - [Release] Escape backtick in verification script
  • ARROW-16349 - [Release][Packaging][RPM] Remove ed25519 keys from KEYS
  • ARROW-16350 - [Dev][Archery] Add missing newline in error message comment
  • ARROW-16352 - [GLib] Fix wrong enums.h install location
  • ARROW-16354 - [Packaging][RPM] Update artifacts pattern list
  • ARROW-16355 - [Dev] Update verify-release-candidate.sh to compile cpp in parallel
  • ARROW-16373 - [Docs][CI] Small improvements to CI documentation
  • ARROW-16387 - [C++] Add -Wshorten-64-to-32 to list of CHECKIN warnings tested by clang
  • ARROW-16390 - [C++] Dataset initialization could segfault if called simultaneously
  • ARROW-16408 - [C++] Add support for DATE type in SQLite FlightSQL example
  • ARROW-16411 - [Website] Migrate to Matomo from Google Analytics
  • ARROW-16412 - [Java] Updated README to reference compilation docs
  • ARROW-16416 - [C++] Support cast-function in Substrait
  • ARROW-16428 - [Release] Add prefix to ENV variables

Apache Arrow 6.0.1 (2021-11-18)

Bug Fixes

  • ARROW-14437 - [Python] Make CSV cancellation test more robust
  • ARROW-14492 - [JS] Fix export for browser bundles
  • ARROW-14513 - [Release][Go] Add /v6 suffix to release-6.0.0
  • ARROW-14519 - [C++] joins segfault when data contains list column
  • ARROW-14523 - [C++] Fix potential data loss in S3 multipart upload
  • ARROW-14538 - [R] Work around empty tr call on Solaris
  • ARROW-14550 - [Doc] Remove the JSON license; a non-free one.
  • ARROW-14583 - [R][C++] Crash when summarizing after filtering to no rows on partitioned data
  • ARROW-14584 - [Python][CI] Python sdist installation fails with latest setuptools 58.5
  • ARROW-14620 - [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
  • ARROW-14630 - [C++] DCHECK in GroupByNode when error encountered
  • ARROW-14739 - [JS][Docs] Point to wrong source
  • ARROW-15071 - [C#] Fixed a bug in Column.cs ValidateArrayDataTypes method
  • ARROW-15072 - [R] Error: This build of the arrow package does not support Datasets

New Features and Improvements

  • ARROW-13156 - [R] bindings for str_count
  • ARROW-14181 - [C++][Compute] Hash Join support for dictionary
  • ARROW-14189 - [Docs] Add version dropdown to the sphinx docs
  • ARROW-14310 - [R] Make expect_dplyr_equal() more intuitive
  • ARROW-14365 - [R] Update README example to reflect new capabilities
  • ARROW-14390 - [Packaging][Ubuntu] Add support for Ubuntu 21.10
  • ARROW-14433 - [Release][APT] Skip arm64 Ubuntu 21.04 verification
  • ARROW-14450 - [R] Old macos build error
  • ARROW-14459 - [Doc] Update the pinned sphinx version to 4.2
  • ARROW-14480 - [R] Expose arrow::dataset::ExistingDataBehavior to R
  • ARROW-14486 - [Packaging][deb] Add missing libthrift-dev dependency
  • ARROW-14490 - [Doc] Regenerate CHANGELOG.md to include all versions
  • ARROW-14496 - [Docs] Create relative links for R / JS / C/Glib references in the sphinx toctree using stub pages
  • ARROW-14499 - [Docs] Version dropdown side-by-side with search box
  • ARROW-14514 - [C++][R] UBSAN error on round kernel
  • ARROW-14580 - [Python] update trove classifiers to include Python 3.10
  • ARROW-14623 - [Packaging][Java] Upload not only .jar but also .pom
  • ARROW-14628 - [Release][Python] Use python -m pytest
  • ARROW-15058 - [Java] Remove log4j2 dependency in performance module

Apache Arrow 6.0.0 (2021-10-26)

Bug Fixes

  • ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
  • ARROW-8452 - [Go] support proper nested nullable flags
  • ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
  • ARROW-8999 - [Python][C++] Non-deterministic segfault in “AMD64 MacOS 10.15 Python 3.7” build
  • ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
  • ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
  • ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
  • ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
  • ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
  • ARROW-11579 - [R] read_feather hanging on Windows
  • ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
  • ARROW-11729 - [R] Add examples to datasets documentation
  • ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
  • ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
  • ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
  • ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
  • ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
  • ARROW-12540 - [C++] Implementing casting support from date32/date64 to utf8/large_utf8
  • ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
  • ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
  • ARROW-12837 - [C++] Do not crash when printing invalid arrays
  • ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
  • ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
  • ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
  • ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
  • ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
  • ARROW-13336 - [Doc] Make clean in docs should clean generated docs
  • ARROW-13422 - [R] Clarify README about S3 support on Windows
  • ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
  • ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
  • ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
  • ARROW-13430 - [Go] fix handling of zero value for FromBigInt
  • ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
  • ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
  • ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
  • ARROW-13443 - [C++] Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
  • ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
  • ARROW-13446 - [Release] Fix verification on amazon linux
  • ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
  • ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
  • ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
  • ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
  • ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
  • ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
  • ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
  • ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
  • ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
  • ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
  • ARROW-13496 - [CI][R] Repair r-sanitizer job
  • ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
  • ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
  • ARROW-13500 - [C++] Fix using ‘-Wno-unknown-warning-option’ with GCC
  • ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
  • ARROW-13507 - [R] LTO job on CRAN fails
  • ARROW-13509 - [C++] Take kernel with empty inputs
  • ARROW-13522 - [C++] Fix regression in UTF8 trim functions
  • ARROW-13523 - [C++] Normalize test executable name
  • ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
  • ARROW-13529 - [Go] Fixing too many releases in IPC writer
  • ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
  • ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
  • ARROW-13556 - [C++] Add protobuf to linking for flight
  • ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
  • ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
  • ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
  • ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
  • ARROW-13600 - [C++] Fix maybe uninitialized warnings
  • ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
  • ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
  • ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
  • ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
  • ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
  • ARROW-13624 - [R] readr short type mapping has T and t backwards
  • ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
  • ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
  • ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
  • ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
  • ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
  • ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
  • ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
  • ARROW-13662 - [CI] Fix failing strftime test with older pandas
  • ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
  • ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
  • ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
  • ARROW-13676 - [C++][Parquet] Avoid potential invalid access.
  • ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
  • ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
  • ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
  • ARROW-13694 - [R] Arrow filter crashes (R aborted session)
  • ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
  • ARROW-13744 - [CI] c++14 and 17 nightly job fails
  • ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
  • ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
  • ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
  • ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
  • ARROW-13786 - [R][CI] Don’t fail the RCHK build if arrow doesn’t build
  • ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
  • ARROW-13792 - [Java] The toString representation is incorrect for unsigned integer vectors
  • ARROW-13799 - [R] case_when error handling is capturing strings
  • ARROW-13800 - [R] Use divide instead of divide_checked
  • ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
  • ARROW-13814 - [CI] Fix Spark master integration tests
  • ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
  • ARROW-13846 - [C++] Fix crashes on invalid IPC file
  • ARROW-13850 - [C++] Fix crashes on invalid Parquet data
  • ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
  • ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
  • ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
  • ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
  • ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
  • ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
  • ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
  • ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
  • ARROW-13882 - [C++] Improve min_max/hash_min_max type support
  • ARROW-13884 - [JS] Move source files into a separate directory
  • ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
  • ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
  • ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
  • ARROW-13916 - [C++] Implement strftime on date32/64 types
  • ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
  • ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
  • ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
  • ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
  • ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
  • ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
  • ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
  • ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
  • ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
  • ARROW-13997 - [C++] restore exec node based query performance
  • ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
  • ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
  • ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
  • ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
  • ARROW-14020 - [R] Writing dataframes with list columns is slow and scales poorly with nesting level
  • ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
  • ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
  • ARROW-14027 - [C++] Handle scalars in Grouper
  • ARROW-14040 - [C++] Fix result order dependence in scanner test
  • ARROW-14053 - [C++][CSV] Use atomic counter for async tests
  • ARROW-14057 - [C++] Bump aws-c-common version
  • ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
  • ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
  • ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
  • ARROW-14103 - [R] [C++] Allow min/max in grouped aggregation
  • ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys.
  • ARROW-14124 - [R] Timezone support in R <= 3.4
  • ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
  • ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
  • ARROW-14141 - [IR][C++] Join missing from RelationImpl
  • ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
  • ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
  • ARROW-14173 - [IR] Allow typed null literals to be represented
  • ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
  • ARROW-14184 - [C++] allow joins where the keys include new columns on the left
  • ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
  • ARROW-14195 - [R] Fix ExecPlan binding annotations
  • ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
  • ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
  • ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
  • ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
  • ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
  • ARROW-14206 - [Go][CI] Fix build on s390x and ARM
  • ARROW-14208 - [C++] Fix compilation on Windows
  • ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
  • ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
  • ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
  • ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
  • ARROW-14219 - [R][CI] DuckDB valgrind failure
  • ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
  • ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
  • ARROW-14223 - [C++] add missing third-party dependency
  • ARROW-14224 - [C++] Try to reduce build time/memory usage
  • ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
  • ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
  • ARROW-14240 - [C++] Fix wrong nlohmann-json header path
  • ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
  • ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
  • ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
  • ARROW-14252 - [R] Partial matching of arguments warning
  • ARROW-14255 - [Python] Fix FlightClient.do_action
  • ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
  • ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
  • ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
  • ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
  • ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
  • ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
  • ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
  • ARROW-14302 - [C++] Valgrind errors
  • ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
  • ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
  • ARROW-14313 - [Doc] Make Archery installation docs more accurate
  • ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
  • ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
  • ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
  • ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
  • ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
  • ARROW-14381 - [CI][Python] Fix Spark integration failures
  • ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
  • ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
  • ARROW-14393 - [C++] GTest linking errors during the source release verification
  • ARROW-14397 - [C++] Fix valgrind error in test utility
  • ARROW-14406 - [CI] Skip failing test on dask-master nightly build
  • ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
  • ARROW-14417 - [R] Joins ignore projection on left dataset
  • ARROW-14423 - [Python] Fix version constraints in pyproject.toml
  • ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
  • ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
  • ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
  • PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
  • PARQUET-2089 - [C++] Align RowGroup file_offset with specification

New Features and Improvements

  • ARROW-1565 - [C++] Implement TopK/BottomK
  • ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
  • ARROW-4333 - [C++] Sketch out design for kernels and “query” execution in compute layer
  • ARROW-4700 - [C++] Added support for decimal128 and decimal256 to the JSON converter
  • ARROW-5002 - [C++] Implement Hash Aggregation query execution node
  • ARROW-5244 - [C++] Remove experimental marker from some APIs
  • ARROW-6072 - [C++] Implement casting List <-> LargeList
  • ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
  • ARROW-6626 - [Python] Support converting nested sets when converting to arrow
  • ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
  • ARROW-7102 - [Python] Make filesystems compatible with fsspec
  • ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
  • ARROW-7901 - [Go][Integration] enable integration tests for null case
  • ARROW-8022 - [C++] Add static and small vector implementations
  • ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
  • ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
  • ARROW-8621 - [Release] Add post release step to add tags for Go versioning
  • ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
  • ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
  • ARROW-9226 - [Python] Support core-site.xml default filesystem.
  • ARROW-9434 - [C++] Store type code in UnionScalar
  • ARROW-9719 - [Python] Improve HadoopFileSystem docstring
  • ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
  • ARROW-10415 - [R] Support for dplyr::distinct()
  • ARROW-10898 - [C++] Improve table sort performance
  • ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
  • ARROW-11243 - [C++] Recognize time types in CSV files
  • ARROW-11460 - [R] Use system libraries if present on Linux
  • ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
  • ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
  • ARROW-11828 - [C++] Expose CSVWriter object in api
  • ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
  • ARROW-11981 - [C++] Implement Union ExecNode
  • ARROW-12063 - [C++] Add null placement option to sort functions
  • ARROW-12181 - [C++][R] The “CSV dataset” in test-dataset.R is failing on RTools 3.5
  • ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
  • ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
  • ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
  • ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
  • ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
  • ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
  • ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
  • ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
  • ARROW-12657 - [C++] Adding String hex to numeric conversion
  • ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
  • ARROW-12673 - [C++] Add callback to handle incorrect column counts
  • ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
  • ARROW-12714 - [C++] String title case kernel
  • ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
  • ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
  • ARROW-12744 - [C++][Compute] Add rounding kernel
  • ARROW-12759 - [C++][Compute] Add ExecNode for group by
  • ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
  • ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
  • ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
  • ARROW-12871 - [R] upgrade to testthat 3e
  • ARROW-12876 - [R] Fix build flags on Raspberry Pi
  • ARROW-12944 - [C++] String capitalize kernel
  • ARROW-12946 - [C++] String swap case kernel
  • ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
  • ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
  • ARROW-12965 - [Java] C Data Interface implementation
  • ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
  • ARROW-12981 - [R] Install source package from CRAN alone
  • ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
  • ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
  • ARROW-13067 - [C++][Compute] Implement integer to decimal cast
  • ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
  • ARROW-13112 - [R] altrep vectors for strings and other types
  • ARROW-13132 - [C++] Add Scalar validation
  • ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
  • ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
  • ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
  • ARROW-13164 - [R] altrep vectors from Array with nulls
  • ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
  • ARROW-13174 - [C++][Compute] Add strftime kernel
  • ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
  • ARROW-13218 - [Format] Clarify interpretation of timestamp values
  • ARROW-13220 - [C++] Implement ‘choose’ function
  • ARROW-13222 - [C++] Improve type support for case_when
  • ARROW-13227 - [Documentation][Compute] Document ExecNode
  • ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
  • ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
  • ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
  • ARROW-13287 - [C++] [Dataset] FileSystemDataset::Write should use an async scan
  • ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
  • ARROW-13298 - [C++] Implement any/all hash aggregate kernels
  • ARROW-13307 - [C++] Remove reflection-based enums
  • ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
  • ARROW-13317 - [Python] Improve documentation on what ‘use_threads’ does in ‘read_feather’
  • ARROW-13326 - [R][Archery] Add linting to dev CI
  • ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
  • ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
  • ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
  • ARROW-13345 - [C++] Add basic implementation for log to base b
  • ARROW-13358 - [C++] Improve type support in if_else
  • ARROW-13379 - [Dev][Docs] Improvements to archery docs
  • ARROW-13390 - [C++] Implement coalesce for remaining types
  • ARROW-13397 - [R] Update arrow.Rmd vignette
  • ARROW-13399 - [R] Update dataset.Rmd vignette
  • ARROW-13402 - [R] Update flight.Rmd vignette
  • ARROW-13403 - [R] Update developing.Rmd vignette
  • ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
  • ARROW-13405 - [Doc] Guide users to the documentation for their own platform
  • ARROW-13416 - [C++] Implement mod compute function
  • ARROW-13420 - [JS] Update dependencies
  • ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
  • ARROW-13433 - [R] Remove CLI hack from Valgrind test
  • ARROW-13434 - [R] group_by() with an unnamed expression
  • ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
  • ARROW-13444 - [C++] Remove usage of deprecated std::result_of
  • ARROW-13448 - [R] Bindings for strftime
  • ARROW-13453 - [R] DuckDB has not yet released 0.2.8
  • ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
  • ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
  • ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
  • ARROW-13465 - [R] to_arrow() from duckdb
  • ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
  • ARROW-13468 - [Release] Fix binary download/upload failures
  • ARROW-13472 - [R] Remove .engine = “duckdb” argument
  • ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
  • ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
  • ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
  • ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
  • ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
  • ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
  • ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
  • ARROW-13489 - [R] Bump CI jobs after 5.0.0
  • ARROW-13501 - [R] Bindings for count aggregation
  • ARROW-13502 - [R] Bindings for min/max aggregation
  • ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
  • ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
  • ARROW-13508 - [C++] Support custom retry strategies in S3Options
  • ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
  • ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
  • ARROW-13516 - [C++] Detect --version-script flag availability
  • ARROW-13519 - [R] Make doc examples less noisy
  • ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
  • ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
  • ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
  • ARROW-13528 - [R] Bindings for mean, var, sd aggregation
  • ARROW-13532 - [C++][Compute] adding set membership type filtering to hash table interface
  • ARROW-13534 - [C++] Improve csv chunker
  • ARROW-13540 - [C++] Add order by sink node
  • ARROW-13541 - [C++][Python] Implement ExtensionScalar
  • ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to ArrowBuf)
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to JDBC)
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to Vectors)
  • ARROW-13548 - [C++] Implement temporal difference kernels
  • ARROW-13549 - [C++] Add casts from timestamp to date/time
  • ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
  • ARROW-13552 - [C++] Remove deprecated APIs
  • ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
  • ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
  • ARROW-13562 - [R] Styler followups
  • ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
  • ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
  • ARROW-13573 - [C++] Support dictionaries natively in case_when
  • ARROW-13574 - [C++] Add ‘count all’ option to count kernels
  • ARROW-13575 - [C++] Add hash_product kernel
  • ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
  • ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
  • ARROW-13585 - [GLib] Add support for C ABI interface
  • ARROW-13587 - [R] Handle --use-LTO override
  • ARROW-13595 - [C++] Add debug mode check for compute kernel output type
  • ARROW-13604 - [Java] Remove deprecation annotations for APIs representing unsupported operations
  • ARROW-13606 - [R] Actually disable LTO
  • ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
  • ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
  • ARROW-13618 - [R] Use Arrow engine for summarize() by default
  • ARROW-13620 - [R] Binding for n_distinct()
  • ARROW-13626 - [R] Bindings for log base b
  • ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
  • ARROW-13629 - [Ruby] Add support for building/converting map
  • ARROW-13633 - [Packaging][Debian] Add support for bookworm
  • ARROW-13634 - [R] Update distro() in nixlibs.R to map from “bookworm” to 12
  • ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
  • ARROW-13637 - [Python] Fix docstrings
  • ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
  • ARROW-13645 - [Java] Allow NullVectors to have distinct field names
  • ARROW-13646 - [Go][Parquet] adding the parquet metadata package
  • ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
  • ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
  • ARROW-13651 - [Ruby][Symbol] to Arrow array
  • ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
  • ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
  • ARROW-13670 - [C++] add virtual destructors
  • ARROW-13674 - [CI] PR checks should check for JIRA components
  • ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
  • ARROW-13679 - [GLib][Ruby] Add support for group aggregation
  • ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
  • ARROW-13682 - [C++] Add TDigest API to merge one TDigest
  • ARROW-13684 - [C++][Compute] Strftime kernel follow-up
  • ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
  • ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
  • ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
  • ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
  • ARROW-13696 - [Python] Support for MapType with Fields
  • ARROW-13699 - [Python][Docs] Improve filesystem documentation
  • ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
  • ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
  • ARROW-13704 - [C#] Add support for reading streaming format delta dictionaries
  • ARROW-13705 - [Website] Pin node version
  • ARROW-13721 - [Doc][Cookbook] Specifying Schemas - Python
  • ARROW-13733 - [Java] Allow JDBC adapters to reuse vector schema roots
  • ARROW-13734 - [Format] Clarify allowed values for time types
  • ARROW-13736 - [C++] Reconcile PrettyPrint and StringFormatter
  • ARROW-13737 - [C++] Support for grouped aggregation over scalar columns
  • ARROW-13739 - [R] Support dplyr::count() and tally()
  • ARROW-13740 - [R] summarize() should not eagerly evaluate
  • ARROW-13757 - [R] Fix download of C++ source for CRAN patch releases
  • ARROW-13759 - [C++] Update linting and formatting scripts to specify python3 in shebang line
  • ARROW-13760 - [C++] Bump required Protobuf when using Flight
  • ARROW-13764 - [C++] Support CountOptions in grouped count distinct
  • ARROW-13768 - [R] Allow JSON to be an optional component
  • ARROW-13772 - [R] Binding for median aggregation
  • ARROW-13776 - [C++] Offline thirdparty versions.txt is missing extensions for some files
  • ARROW-13777 - [R] mutate after group_by should be ok as long as there are only scalar functions
  • ARROW-13778 - [R] Handle complex summarize expressions
  • ARROW-13782 - [C++] Add skip_nulls/min_count to tdigest/mode/quantile
  • ARROW-13783 - [Python] Preview data when printing tables
  • ARROW-13785 - [C++] Add methods to print exec nodes/plans
  • ARROW-13787 - [C++] Verify third-party downloads
  • ARROW-13789 - [Go] Implement Scalar Values for Go
  • ARROW-13793 - [C++] Migrate ORCFileReader to Result
  • ARROW-13794 - [C++] Deprecate PARQUET_VERSION_2_0
  • ARROW-13797 - [C++][Python] Column projection pushdown for ORC dataset reading + use liborc for column selection
  • ARROW-13803 - [C++] Don't read past end of buffer in BitUtil::SetBitmap
  • ARROW-13804 - [Go] Add Interval type Month, Day, Nano
  • ARROW-13806 - [C++][Python] Add support for new MonthDayNano Interval Type
  • ARROW-13809 - [C++][ABI] Add support for MonthDayNanoInterval to C ABI
  • ARROW-13810 - [C++][Compute] Predicate IsAsciiCharacter allows invalid types and values
  • ARROW-13815 - [R] Adapt to new callstack changes in rlang
  • ARROW-13816 - [Go][C] Implement Consumer APIs for C Data Interface in Go
  • ARROW-13820 - [R] Rename na.min_count to min_count and na.rm to skip_nulls
  • ARROW-13821 - [R] Handle na.rm in sd, var bindings
  • ARROW-13823 - [Java] Exclude .factorypath
  • ARROW-13824 - [C++][Compute] Make constexpr BooleanToNumber kernel
  • ARROW-13831 - [GLib][Ruby] Add support for writing by Arrow Dataset
  • ARROW-13835 - [Doc][Python] Add documentation for unify_schemas
  • ARROW-13842 - [C++] Bump vendored date library
  • ARROW-13843 - [C++][CI] Exercise ToString / PrettyPrint in fuzzing setup
  • ARROW-13845 - [C++] Reconcile RandomArrayGenerator::ArrayOf implementations
  • ARROW-13847 - [Java] Avoid unnecessary collection copies
  • ARROW-13849 - [C++] Wrap min_max with min/max functions
  • ARROW-13852 - [R] Handle Dataset schema metadata in ExecPlan
  • ARROW-13853 - [R] String to_title, to_lower, to_upper kernels
  • ARROW-13855 - [C++][Python] Implement C data interface support for extension types
  • ARROW-13857 - [R][CI] Remove checkbashisms download
  • ARROW-13859 - [Java] Add code coverage support
  • ARROW-13866 - [R] Implement Options for all compute kernels available via list_compute_functions
  • ARROW-13869 - [R] Implement options for non-bound MatchSubstringOptions kernels
  • ARROW-13871 - [C++] JSON reader can fail if a list array key is present in one chunk but not in a later chunk
  • ARROW-13874 - [R] Implement TrimOptions
  • ARROW-13883 - [Python] Allow more than numpy.array as masks when creating arrays
  • ARROW-13890 - [R] Split up test-dataset.R and test-dplyr.R
  • ARROW-13893 - [R] Make head/tail lazy on datasets and queries
  • ARROW-13897 - [Python] Correct TimestampScalar.as_py() and DurationScalar.as_py() docstrings
  • ARROW-13898 - [C++][Compute] Add support for string binary transforms
  • ARROW-13899 - [Ruby] Implement slicer by compute kernels
  • ARROW-13901 - [R] Implement IndexOptions
  • ARROW-13904 - [R] Implement ModeOptions
  • ARROW-13905 - [R] Implement ReplaceSliceOptions
  • ARROW-13906 - [R] Implement PartitionNthOptions
  • ARROW-13908 - [R] Implement ExtractRegexOptions
  • ARROW-13909 - [GLib] Add tests for GArrowVarianceOptions
  • ARROW-13909 - [GLib] Add GArrowVarianceOptions
  • ARROW-13910 - [Ruby] accepts Range and selectors
  • ARROW-13919 - [GLib] Add GArrowFunctionDoc
  • ARROW-13924 - [R] Bindings for stringr::str_starts, stringr::str_ends, base::startsWith and base::endsWith
  • ARROW-13925 - [R] Remove system installation devdocs jobs
  • ARROW-13927 - [R] Add Karl to the contributors list for the package
  • ARROW-13928 - [R] Rename the version(s) tasks so that it's clearer which is which
  • ARROW-13937 - [C++][Compute] Add explicit output values to sign function and fix unary type checks
  • ARROW-13942 - [Dev] Update cmake_format usage in autotune comment bot
  • ARROW-13944 - [C++] Bump xsimd to latest version
  • ARROW-13958 - [Python] Migrate Python ORC bindings to use new Result-based APIs
  • ARROW-13959 - [R] Update tests for extracting components from date32 objects
  • ARROW-13962 - [R] Catch up on the NEWS
  • ARROW-13963 - [Go] Minor: Add bitmap reader/writer impl from go Parquet module to Arrow Bitutil package
  • ARROW-13964 - MINOR: [Go][Parquet] remove base bitmap reader/writer from parquet module, use arrow bitutil ones
  • ARROW-13965 - [C++] dynamic_casts in parquet TypedColumnWriterImpl impacting performance
  • ARROW-13966 - [C++] Support decimals in comparisons
  • ARROW-13967 - [Go] Implement Concatenate function for array.Interface
  • ARROW-13973 - [C++] Add a SelectKSinkNode
  • ARROW-13974 - [C++] Resolve follow-up reviews for TopK/BottomK
  • ARROW-13975 - [C++] Implement decimal round
  • ARROW-13977 - [Format] clarify leap seconds for interval type
  • ARROW-13979 - [Go] Enable -race for go tests
  • ARROW-13990 - [R] Bindings for round kernels
  • ARROW-13994 - [Doc][C++] Build document misses git submodule update
  • ARROW-13995 - [R] Bindings for join node
  • ARROW-13999 - [C++] Fix bundled LZ4 build on MinGW
  • ARROW-14002 - [Python] Support tuples in unify_schemas
  • ARROW-14003 - [C++][Python] Not providing a sort_key in the “select_k_unstable” kernel crashes
  • ARROW-14005 - [R] Fix tests for PartitionNthOptions so that they can run on various platforms; fix partition_nth_indices test
  • ARROW-14006 - [C++][Python] Support cast of naive timestamps to strings
  • ARROW-14007 - [C++] Fix compiler warnings in decimal promotion helper
  • ARROW-14008 - [R][Compute] Running an ExecPlan should yield Reader instead of Table
  • ARROW-14009 - [C++] Seed parallelism in SourceNode
  • ARROW-14012 - [Python] Update kernel categories in compute doc to match C++
  • ARROW-14013 - [C++][Docs] Add instructions for Fedora
  • ARROW-14016 - [C++] Wrong type_name used for directory partitioning
  • ARROW-14019 - [R] expect_dplyr_equal() test helper function ignores grouping
  • ARROW-14023 - [Ruby] Arrow::Table#slice accepts Hash
  • ARROW-14025 - [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
  • ARROW-14030 - [GLib] Use arrow::Result based ORC API
  • ARROW-14031 - [Ruby] Use min and max separately
  • ARROW-14033 - [Ruby] Append OpenSSL's .pc path automatically on macOS with Homebrew
  • ARROW-14033 - [Ruby][Doc] Add macOS development guide for Red Arrow
  • ARROW-14035 - [C++][Python][R] Implement count distinct kernel
  • ARROW-14036 - [R] Binding for n_distinct() with no grouping
  • ARROW-14043 - [Python] Allow unsigned integer index type in dictionary() type factory function
  • ARROW-14044 - [R] Handle group_by .drop parameter in summarize
  • ARROW-14049 - [C++][Java] Upgrade ORC to 1.7.0
  • ARROW-14050 - [C++] Make TDigest/Quantile kernels return nulls instead
  • ARROW-14052 - [C++] Add approximate_median aggregation
  • ARROW-14054 - [C++][Docs] Simplify C++ row conversion example
  • ARROW-14055 - [Docs] Add canonical url to the sphinx docs
  • ARROW-14056 - [Doc][C++] Document ArrayData
  • ARROW-14061 - [Go][C++] Add Cgo Arrow Memory Pool Allocator
  • ARROW-14062 - [Format] Initial arrow-internal specification of compute IR
  • ARROW-14064 - [CI] Use Debian 11
  • ARROW-14069 - [R] By default, filter out hash functions in list_compute_functions()
  • ARROW-14070 - [C++][CI] Remove support for VS2015
  • ARROW-14072 - [GLib][Parquet] Add gparquet_arrow_file_reader_get_n_rows()
  • ARROW-14073 - [C++] Deduplicate sort keys
  • ARROW-14084 - [GLib][Ruby][Dataset] Add support for scanning from directory
  • ARROW-14088 - [GLib][Ruby][Dataset] Add support for filter
  • ARROW-14106 - [Go][C] Implement Exporting to the C Data Interface
  • ARROW-14107 - [R][CI] Parallelize Windows CI jobs
  • ARROW-14111 - [C++] Add extraction function support for time32/time64
  • ARROW-14116 - [C++][Docs] Consistent variable names in WriteCSV example
  • ARROW-14127 - [C++][Docs] Example of using compute function and output
  • ARROW-14128 - [Go] Implement MakeArrayFromScalar for nested types
  • ARROW-14132 - [C++] Improve CSV chunker tests
  • ARROW-14135 - [Python] Missing Python tests for compute kernels
  • ARROW-14140 - [R] skip arrow_binary/arrow_large_binary class from R metadata
  • ARROW-14143 - [IR][C++] Add explicit cast node to IR
  • ARROW-14146 - [Dev] Update merge script to specify python3 in shebang line
  • ARROW-14150 - [C++] Don't check delimiter in CSV chunker if no quoting
  • ARROW-14155 - [Go] add fingerprint and hash functions for types and scalars
  • ARROW-14157 - [C++] Refactor Abseil to its own macro
  • ARROW-14165 - [C++] Improve table sort performance
  • ARROW-14178 - [C++] Boost download location has moved
  • ARROW-14180 - [Packaging] Add support for AlmaLinux 8
  • ARROW-14191 - [C++][Dataset] Dataset writes should respect backpressure
  • ARROW-14194 - [Docs] Improve vertical spacing in the sphinx C++ API docs
  • ARROW-14198 - [Java] Upgrade netty, grpc, and boringssl dependencies
  • ARROW-14207 - [C++] Add missing dependencies for bundled Boost targets
  • ARROW-14212 - [GLib][Ruby] Add GArrowTableConcatenateOptions
  • ARROW-14217 - [Python][CI] Add support for python 3.10
  • ARROW-14222 - [C++] implement GCSFileSystem skeleton
  • ARROW-14228 - [R] Allow for creation of nullable fields
  • ARROW-14230 - [C++] Deprecate ArrayBuilder::Advance
  • ARROW-14232 - [C++] update crc32c to version 1.1.2
  • ARROW-14235 - [C++][Compute] Use a node counter as the label if no label is supplied
  • ARROW-14236 - [C++] Add GCS testbench for testing
  • ARROW-14239 - [R] Don't use rlang::as_label
  • ARROW-14241 - [C++][Java][CI] Fix java-jars build
  • ARROW-14243 - [C++] Split vector_sort.cc
  • ARROW-14244 - [C++] Reduce scalar_temporal.cc compilation time
  • ARROW-14258 - [R] Warn if an SF column is made into a table
  • ARROW-14259 - [R] converting from R vector to Array when the R vector is altrep
  • ARROW-14261 - [C++] Includes should be in alphabetical order
  • ARROW-14269 - [C++] Consolidate utf8 benchmark
  • ARROW-14274 - [C++] Refine base64 api
  • ARROW-14284 - [C++][Python] Improve error message when trying to use SyncScanner when requiring async
  • ARROW-14291 - [CI][C++] Add cpp/examples/ files to lint targets
  • ARROW-14295 - [Doc] Indicate location of archery
  • ARROW-14296 - [Go] Update generated flatbuf
  • ARROW-14304 - [R] Update news for 6.0.0
  • ARROW-14309 - [Python] Extend CompressedInputStream to work with paths, strings and files
  • ARROW-14317 - [Doc] Update C data interface implementation status
  • ARROW-14326 - [Docs] Add C/GLib and Ruby to C Data/Stream interface supported libraries
  • ARROW-14327 - [Release] Remove conda-* from packaging group
  • ARROW-14335 - [GLib][Ruby] Add support for expression
  • ARROW-14337 - [C++] Arrow doesn't build on M1 when SIMD acceleration is enabled
  • ARROW-14341 - [C++] Improve decimal benchmark
  • ARROW-14343 - [Packaging][Python] Enable NEON SIMD optimization for M1 wheels
  • ARROW-14345 - [C++] Implement streaming reads
  • ARROW-14348 - [R] add group_vars.RecordBatchReader method
  • ARROW-14349 - [IR] Remove RelBase
  • ARROW-14358 - [Doc] Update CMake options in documentation
  • ARROW-14361 - [C++] Add default simd level
  • ARROW-14364 - [CI][C++] Support LLVM 13
  • ARROW-14368 - [CI] Use ubuntu-latest for Azure Pipelines
  • ARROW-14369 - [C++][Python] Use std::move() explicitly for g++ 4.8.5
  • ARROW-14386 - [Packaging][Java] Ensure using installed devtoolset version
  • ARROW-14387 - [Release][Ruby] Check Homebrew/MSYS2 package version before releasing
  • ARROW-14396 - [R][Doc] Remove relic note in write_dataset that columns cannot be renamed
  • ARROW-14400 - [Go] Equals and ApproxEquals for Tables and Chunked Arrays
  • ARROW-14401 - [C++] Fix bundled crc32c's include path
  • ARROW-14402 - [Release][Yum] Specify gpg path explicitly
  • ARROW-14404 - [Release][APT] Skip arm64 Debian GNU/Linux bookworm verification
  • ARROW-14408 - [Packaging][Crossbow] Option for skipping artifact pattern validation
  • ARROW-14410 - [Python][Packaging] Use numpy 1.21.3 to build python 3.10 wheels for macOS and windows
  • ARROW-14452 - [Release][JS] Update Javascript testing
  • ARROW-14511 - [Website][Rust] Rust 6.0.0 release blog post
  • PARQUET-490 - [C++][Parquet] Basic support for reading DELTA_BINARY_PACKED data

Apache Arrow 5.0.0 (2021-07-28)

Bug Fixes

  • ARROW-6189 - [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values
  • ARROW-6312 - [C++] Add support for “pkg-config --static arrow”
  • ARROW-7948 - [Go] Decimal128 Integration fix
  • ARROW-9594 - [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas
  • ARROW-10910 - [Python] Provide better error message when trying to read from None source
  • ARROW-10958 - [GLib] “Nested data conversions not implemented” through glib, but not through pyarrow
  • ARROW-11077 - [Rust] ParquetFileArrowReader panics when trying to read nested list
  • ARROW-11146 - [CI] Remove test-conda-python-3.8-jpype build
  • ARROW-11161 - [C++][Python] Add stream metadata
  • ARROW-11633 - [CI][Doc] Maven default skin not found
  • ARROW-11780 - [Python] Avoid crashing when a ChunkedArray is provided to StructArray.from_arrays()
  • ARROW-11908 - [Rust] Intermittent Flight integration test failures
  • ARROW-12007 - [C++] Loading parquet file returns “Invalid UTF8 payload” error
  • ARROW-12055 - [R] is.na() evaluates to FALSE on Arrow NaN values
  • ARROW-12096 - [C++] Allows users to define arrow timestamp unit for Parquet INT96 timestamp
  • ARROW-12122 - [Python] Cannot install via pip M1 mac
  • ARROW-12142 - [Python][Doc] Mention the CXX ABI flag in the docs
  • ARROW-12150 - [Python] Correctly infer type of mixed-precision Decimals
  • ARROW-12232 - [Rust][Datafusion] Error with CAST: Unsupported SQL type Time
  • ARROW-12240 - [Python] Fix invalid-offsetof warning
  • ARROW-12377 - [Doc][Java] Java doc build broken
  • ARROW-12407 - [Python][Dataset] Remove ScanTask bindings
  • ARROW-12431 - [Python] Mask is inverted when creating FixedSizeBinaryArray
  • ARROW-12472 - [Python] Properly convert paths to strings (using fspath)
  • ARROW-12482 - [Doc][C++][Python] Mention CSVStreamingReader pitfalls with type inference
  • ARROW-12491 - [Packaging][RPM] Add support for Amazon Linux 2
  • ARROW-12503 - [C++] Ensure using “lib/” for jemalloc's library directory
  • ARROW-12508 - [R] expect_as_vector implementation causes test failure on R <= 3.3 & variables defined outside of test_that break build when no arrow install
  • ARROW-12543 - [CI][Python] Fix test-conda-python-3.9 build (gdb version conflict)
  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12569 - [R][CI] Run revdep in CI
  • ARROW-12570 - [JS] Fix issues that blocked the v4.0.0 release
  • ARROW-12579 - [Python] Pyarrow 4.0.0 dependency numpy 1.19.4 throws errors on Apple silicon/M1 compilation
  • ARROW-12589 - [C++] Compiling on windows doesn't work when -DARROW_WITH_BACKTRACE=OFF
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12605 - [Documentation] Update line numbers in cpp/dataset.rst
  • ARROW-12606 - [C++][Compute] Fix Quantile and Mode on arrays with offset
  • ARROW-12610 - [C++] Skip TestS3FSGeneric TestDeleteDir and TestDeleteDirContents on Windows as they are flaky
  • ARROW-12611 - [CI][Python] Add different numpy versions to pandas nightly builds
  • ARROW-12613 - [Python] Support comparison to None in Scalar values
  • ARROW-12614 - [C++][Compute] Remove support for Tables in ExecuteScalarExpression
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12620 - [C++][Dataset] Fix projection during writing
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12630 - [Dev][Integration] conda-integration docker build fails
  • ARROW-12639 - [CI][Archery] Archery build fails to create branch
  • ARROW-12640 - [C++] Fix errors from VS 2019 in cpp/src/parquet/types.h
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12644 - [C++][Python][R][Dataset] URL-decode path segments in partitioning
  • ARROW-12646 - [C++][CI][Packaging][Python] Bump vcpkg version to its latest release
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12668 - [C++][Dataset] Fix segfault in CountRows
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12672 - [C++] Fix fill_null kernel to set null_count + cast kernel to handle no-bitmap with unknown null_count case
  • ARROW-12679 - [Java] JDBC->Arrow for NOT NULL columns.
  • ARROW-12684 - [Go][Flight] fix nil pointer dereference, add test.
  • ARROW-12708 - [C++] Valgrind errors when calling negate_checked
  • ARROW-12729 - [R] Fix length method for Table, RecordBatch
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12756 - [C++] MSVC build fails with latest gtest from vcpkg
  • ARROW-12757 - [Archery] Fix spurious warning when running “archery docker run”
  • ARROW-12762 - [Python] Preserve field name when pickling list types
  • ARROW-12769 - [Python] Fix slicing array with “negative” length (start > stop)
  • ARROW-12771 - [C++][Compute] Fix MaybeReserve parameter in the Consume function of GroupedCountImpl
  • ARROW-12772 - [CI] Merge script test fails due to missing dependency
  • ARROW-12773 - [Docs] Clarify Java support for ORC and Parquet via JNI bindings
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12779 - [Python][FlightRPC] Guard against DoGet handler that never sends data
  • ARROW-12780 - [CI][C++] Install necessary packages for MinGW builds
  • ARROW-12790 - [C++] Improve HadoopFileSystem conformance
  • ARROW-12793 - [Python] Fix support for pyarrow debug builds
  • ARROW-12797 - [JS] Update readme with new links and remove outdated examples
  • ARROW-12798 - [JS] Use == null Comparison
  • ARROW-12799 - [JS] Use Nullish Coalescing Operator (??) For Defaults
  • ARROW-12804 - [C++] Return expected result for IsNull and IsValid for NullArray
  • ARROW-12807 - [C++] Fix build errors in IPC reader
  • ARROW-12838 - [Java][Gandiva] Fix JNI CI test
  • ARROW-12842 - [FlightRPC][Java] Fix sending trailers using CallStatus
  • ARROW-12850 - [R] is.nan() evaluates to null on Arrow null values
  • ARROW-12854 - [Dev][Release] Windows wheel verification script fails to download artifacts
  • ARROW-12857 - [C++] Fix build of hash_aggregate_test
  • ARROW-12864 - [C++] Remove needless out argument from arrow::internal::InvertBitmap
  • ARROW-12865 - [C++][FlightRPC] Link gRPC with RE2
  • ARROW-12882 - [C++][Gandiva] Fix behavior of the convert replace function on gandiva
  • ARROW-12887 - [CI] AppVeyor SSL certificate issue
  • ARROW-12906 - [C++][Python] Fix fill_null segfault
  • ARROW-12907 - [Java] Fix memory leak on deserialization errors
  • ARROW-12911 - [Python] Export scalar aggregate options to pc.sum
  • ARROW-12917 - [C++] Fix handling of decimal types with negative scale in C data import
  • ARROW-12918 - [C++] Fill out iterator_traits
  • ARROW-12919 - [Dev][Archery] Crossbow comment bot failing to react to comments
  • ARROW-12935 - [C++][CI] Fix compiler error on some clang versions
  • ARROW-12941 - [C++] Add rows skipped to rows seen
  • ARROW-12942 - [C++][Compute] Fix incorrect result of Arrow compute hash_min_max with a chunked array
  • ARROW-12956 - [C++] Fix crash on Parquet file (OSS-Fuzz)
  • ARROW-12969 - [C++] Fix match_substring with empty haystack
  • ARROW-12974 - [R] test-r-without-arrow build fails because of example requiring Arrow
  • ARROW-12983 - [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion
  • ARROW-12987 - [C++][CI] Switch to bundled utf8proc with version 2.2 in Ubuntu 18.04 images
  • ARROW-12988 - [CI][Python] Revert skip of failing test in kartothek nightly integration build
  • ARROW-12988 - [CI] Skip the failing test in kartothek nightly integration build
  • ARROW-12989 - [CI] Avoid aggressive cancellation of the “Dev PR” workflow
  • ARROW-12991 - [CI] Migrate Travis-CI ARM job to “arm64-graviton2” arch
  • ARROW-12993 - [Python] Avoid half-initialized FeatherReader object
  • ARROW-12995 - [C++] Add validation to CSV options
  • ARROW-12998 - [C++] Add dataset->toolchain dependency
  • ARROW-13001 - [Go][Parquet] fix build failure on s390x
  • ARROW-13003 - [C++] Fix key map unaligned access
  • ARROW-13008 - [C++] Avoid deprecated API in minimal example
  • ARROW-13010 - [C++][Compute] Support outputting to slices from kleene kernels
  • ARROW-13018 - [C++][Docs] Use consistent terminology for nulls (min_count) in scalar aggregate kernels
  • ARROW-13026 - [CI] Use LLVM 10 for s390x
  • ARROW-13037 - [R] Incorrect param when creating Expression crashes R
  • ARROW-13039 - [R] Fix error message handling
  • ARROW-13041 - [C++] Ensure unary kernels zero-initialize data behind null entries
  • ARROW-13046 - [Release] JS package failing test prior to publish
  • ARROW-13048 - [C++] Fix copying objects with special characters on S3FS
  • ARROW-13053 - [Python] Fix build issue with Homebrewed arrow library
  • ARROW-13069 - [Website] Add Daniël to committer list
  • ARROW-13073 - [Developer] archery benchmark list: unexpected keyword ‘benchmark_filter’
  • ARROW-13080 - [Release] Generate the API docs in ubuntu 20.10
  • ARROW-13083 - [Python] Wrong SCM version detection both in setup.py and crossbow
  • ARROW-13085 - [Python] Document compatible toolchains for python bindings
  • ARROW-13090 - [Python] Fix create_dir() implementation in FSSpecHandler
  • ARROW-13104 - [C++] Fix unsafe cast in ByteStreamSplit implementation
  • ARROW-13108 - [Python] Pyarrow 4.0.0 crashes upon import on macOS 10.13.6
  • ARROW-13116 - [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to missing dependencies
  • ARROW-13125 - [R] Throw error when 2+ args passed to desc() in arrange()
  • ARROW-13128 - [C#] TimestampArray conversion logic for nano and micro is wrong
  • ARROW-13135 - [C++] Fix Status propagation from Parquet exception
  • ARROW-13139 - [C++] ReadaheadGenerator cannot be safely copied/moved
  • ARROW-13145 - [C++][CI] Flight test crashes on MinGW
  • ARROW-13148 - [Dev][Archery] Fix crossbow job submission
  • ARROW-13153 - [C++] parquet_dataset loses ordering of files in _metadata
  • ARROW-13154 - [C++] Remove the undocumented type_code <= 125 restriction in union types
  • ARROW-13169 - [C++][Compute] Fix array offset support in GrouperFastImpl
  • ARROW-13173 - [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
  • ARROW-13187 - [Python] Avoid creating reference cycle when reading CSV file
  • ARROW-13189 - [R] Disable row-level metadata application on datasets
  • ARROW-13203 - [R] Fix optional component checks causing failures
  • ARROW-13207 - [Python][Doc] Dataset documentation still suggests deprecated scan method as the preferred iterative approach
  • ARROW-13216 - [R] Type checks test fails with rtools35
  • ARROW-13217 - [C++][Gandiva] Correct error on convert replace function for initial invalid bytes
  • ARROW-13223 - [C++] Fix Thread Sanitizer test failures
  • ARROW-13225 - [Go][FlightRPC][Integration] Implement Flight Custom Middleware and Integration Tests for Go
  • ARROW-13229 - [Python] ascii_trim, ascii_ltrim and ascii_rtrim lack options
  • ARROW-13239 - [Python][Doc] Expose signatures in pyx modules
  • ARROW-13243 - [R] altrep function call in R 3.5
  • ARROW-13246 - [C++] Using CSV skip_rows_after_names can cause data to be discarded prematurely
  • ARROW-13249 - [Java][CI] Consistent timeout in the Java JNI build
  • ARROW-13253 - [FlightRPC][C++] Fix segfault with large messages
  • ARROW-13254 - [Python] Processes killed and semaphore objects leaked when reading pandas data
  • ARROW-13265 - [R] cli valgrind errors in nightlies
  • ARROW-13266 - [JS] Improve benchmark names & include suite name in json
  • ARROW-13281 - [C++][Gandiva] Correct error on timestampDiffMonth function
  • ARROW-13284 - [C++] Fix wrong pkg_check_modules() option name
  • ARROW-13288 - [Python] Missing default values of kernel options in PyArrow
  • ARROW-13290 - [C++] Add missing include
  • ARROW-13305 - [C++] Unable to install nightly on Ubuntu 21.04 due to CSV options
  • ARROW-13315 - [R] Wrap r_task_group includes with ARROW_R_WITH_ARROW checking
  • ARROW-13321 - [C++][Python] MakeArrayFromScalar doesn't work for FixedSizeBinaryType
  • ARROW-13324 - [R] Typo in bindings for utf8_reverse and ascii_reverse
  • ARROW-13332 - [C++] TSAN failure in TestAsyncUtil.ReadaheadFailed
  • ARROW-13341 - [C++][Compute] Fix race condition in ScalarAggregateNode
  • ARROW-13350 - [Python][CI] Fix test_extract_datetime_components for pandas 0.24
  • ARROW-13352 - [C++] Make sure scalar case_when fully initializes output
  • ARROW-13353 - [Docs] Pin breathe to avoid failure parsing template parameters
  • ARROW-13360 - [C++] Missing dependencies in cpp thirdparty offline dependencies versions.txt
  • ARROW-13363 - [R] is.nan() errors on non-floating point data
  • ARROW-13368 - [C++][Doc] Rename project to make_struct in docs
  • ARROW-13381 - [C++] ArrayFromJSON doesn't work for float value dictionary type
  • ARROW-13382 - [C++] Avoid multiple definitions of same symbol
  • ARROW-13384 - [C++] Specify minimum required zstd version in cmake
  • ARROW-13391 - [CSV] Correct row and column number to error messages with CSV streaming reader
  • ARROW-13417 - [C++] The merged generator can sometimes pull from source sync-reentrant
  • ARROW-13419 - [JS] Fix perf tests
  • ARROW-13428 - [C++][Flight] Add missing -lssl with bundled gRPC and system shared OpenSSL
  • ARROW-13431 - [Release] Bump go version to 1.15; don't verify rust source anymore
  • ARROW-13432 - [Release] Fix ssh connection to the binary uploader container

New Features and Improvements

  • ARROW-2665 - [C++][Python] Add index() kernel
  • ARROW-3014 - [C++] Minimal writer adapter for ORC file format
  • ARROW-3316 - [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
  • ARROW-5385 - [Go] Implement EXTENSION datatype
  • ARROW-5640 - [Go] Implement Arrow Map Array
  • ARROW-6513 - [CI] Rename conda requirements files to have txt extension instead of yml
  • ARROW-7001 - [C++] Develop threading APIs to accommodate nested parallelism
  • ARROW-7114 - [JS][CI] Enable NodeJS tests for Windows
  • ARROW-7252 - [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation
  • ARROW-7396 - [Format] Register media types (MIME types) for Apache Arrow formats to IANA
  • ARROW-8421 - [Rust] [Parquet] Implement parquet writer
  • ARROW-8459 - [Dev][Archery] Use a more recent cmake-format
  • ARROW-8527 - [C++][CSV] Add support for ReadOptions::skip_rows >= block_size
  • ARROW-8655 - [C++][Python] Preserve partitioning information for a discovered Dataset
  • ARROW-8676 - [Rust] Create implementation of IPC RecordBatch body buffer compression from ARROW-300
  • ARROW-9054 - [C++] Add ScalarAggregateOptions
  • ARROW-9056 - [C++] Support aggregations over scalars
  • ARROW-9140 - [R] Zero-copy Arrow to R where possible
  • ARROW-9295 - [Archery] Support rust clippy in the lint command
  • ARROW-9299 - [C++][Python] Expose ORC metadata
  • ARROW-9313 - [Rust] Use feature enum
  • ARROW-9421 - [C++][Parquet] Redundancies in SchemaManifest::GetFieldIndices
  • ARROW-9430 - [C++] Implement replace_with_mask kernel
  • ARROW-9697 - [C++][Python][R][Dataset] Add CountRows for Scanner
  • ARROW-10031 - [CI][Java] Support Java benchmark in Archery
  • ARROW-10115 - [C++] Add CSV option to treat quoted strings as always non-null
  • ARROW-10316 - [Python] Improve introspection of compute function options
  • ARROW-10391 - [Rust] [Parquet] Nested Arrow reader
  • ARROW-10440 - [C++][Dataset] Visit FileWriters before Finish
  • ARROW-10550 - [Rust] [Parquet] Write nested types (struct, list)
  • ARROW-10557 - [C++] Add scalar string slicing/substring extract kernel
  • ARROW-10640 - [C++] Add “if_else” (“where”) kernel to combine two arrays based on a mask
  • ARROW-10658 - [Python][Packaging] Wheel builds for Apple Silicon
  • ARROW-10675 - [C++][Python] Support AWS S3 Web identity credentials
  • ARROW-10797 - [C++] Vendor and use PCG random generator library
  • ARROW-10926 - [Rust] Add parquet reader / writer for decimal types
  • ARROW-10959 - [C++] Add scalar string join kernel
  • ARROW-11061 - [Rust] Validate array properties against schema
  • ARROW-11173 - [Java] Add map type in complex reader / writer
  • ARROW-11199 - [C++][Python] Fix the unit tests for the ORC reader
  • ARROW-11206 - [C++][Compute][Python] Rename ‘project’ to ‘make_struct’
  • ARROW-11342 - [Python][Gandiva] Expose ToString and result type information
  • ARROW-11499 - [Release] Use Artifactory instead of Bintray
  • ARROW-11514 - [R][C++] Bindings for paste(), paste0(), str_c()
  • ARROW-11515 - [R] Bindings for strsplit
  • ARROW-11565 - [C++][Gandiva] Modify upper()/lower() to work with UTF8 and add INIT_CAP function
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11608 - [CI] Fix turbodbc nightly
  • ARROW-11660 - [C++] Move RecordBatch::SelectColumns method from R to C++ library
  • ARROW-11673 - [C++] Casting dictionary type to use different index type
  • ARROW-11675 - [CI][C++] Resolve ctest failures on VS 2019 builds
  • ARROW-11705 - [R] Support scalar value recycling in RecordBatch/Table$create()
  • ARROW-11759 - [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type
  • ARROW-11769 - [R] Pull groups from grouped_df into RecordBatch or Table
  • ARROW-11772 - [C++] Provide reentrant IPC file reader
  • ARROW-11782 - [GLib][Ruby][Dataset] Remove bindings for internal classes
  • ARROW-11787 - [R] Implement write csv
  • ARROW-11843 - [C++] Provide async Parquet reader
  • ARROW-11849 - [R] Use roxygen @examplesIf
  • ARROW-11889 - [C++] Add parallelism to streaming CSV reader
  • ARROW-11909 - [C++] Remove MakeIteratorGenerator
  • ARROW-11926 - [R] Add ucrt64 binaries and fix CI
  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-11928 - [C++] Execution engine API
  • ARROW-11929 - [C++][Dataset][Compute] Promote expression to the compute namespace
  • ARROW-11930 - [C++][Dataset][Compute] Use an ExecPlan for dataset scans
  • ARROW-11932 - [C++] Provide ArrayBuilder::AppendScalar
  • ARROW-11950 - [C++][Compute] Add unary negative kernel
  • ARROW-11960 - [C++][Gandiva] Support escape in LIKE
  • ARROW-11980 - [Python] Remove experimental status from Table.replace_schema_metadata
  • ARROW-11986 - [C++][Gandiva] Implement IN expressions for doubles and floats
  • ARROW-11990 - [C++][Compute] Handle errors consistently
  • ARROW-12004 - [C++] Result<detail::Empty> is annoying
  • ARROW-12010 - [C++][Compute] Improve performance of the hash table used in GroupIdentifier
  • ARROW-12016 - [C++] Implement array_sort_indices and sort_indices for BOOL type
  • ARROW-12050 - [C++][Python][FlightRPC] Make Flight operations interruptible in Python
  • ARROW-12074 - [C++][Compute] Add scalar arithmetic kernels for decimal
  • ARROW-12083 - [C++][Dataset] Use given column types when determining CSV fragment schema
  • ARROW-12092 - [R] Make expect_dplyr_equal() a bit stricter
  • ARROW-12166 - [C++][Gandiva] Implements CONVERT_TO(value, type) function
  • ARROW-12184 - [R] Bindings for na.fail, na.omit, na.exclude, na.pass
  • ARROW-12185 - [R] Bindings for any, all
  • ARROW-12198 - [R] bindings for strptime
  • ARROW-12199 - [R] bindings for stddev, variance
  • ARROW-12205 - [C++][Gandiva][number][number] seconds) function
  • ARROW-12231 - [C++][Python][Dataset] Isolate one-shot data to scanner
  • ARROW-12253 - [Rust] [Ballista] Implement scalable joins
  • ARROW-12255 - [Rust] [Ballista] Integrate scheduler with DataFusion
  • ARROW-12256 - [Rust] [Ballista] Add DataFrame support
  • ARROW-12257 - [Rust] [Ballista] Publish user guide to Arrow site
  • ARROW-12261 - [Rust] [Ballista] Ballista should not have its own DataFrame API
  • ARROW-12291 - [R] Determine the type of an unevaluated expression
  • ARROW-12310 - [Java] ValueVector#getObject should support covariance for complex types
  • ARROW-12355 - [C++] Implement efficient async CSV scanning
  • ARROW-12362 - [Rust] [DataFusion] topk_query test failure
  • ARROW-12364 - [Python][Dataset] Add metadata_collector option to ds.write_dataset()
  • ARROW-12378 - [C++][Gandiva] Implement castVARBINARY functions
  • ARROW-12386 - [C++] Support file parallelism in AsyncScanner
  • ARROW-12391 - [Rust][DataFusion] Implement date_trunc() function
  • ARROW-12392 - [C++] Restore asynchronous streaming CSV reader
  • ARROW-12393 - [JS] Use closure compiler for all UMD targets
  • ARROW-12403 - [Rust] [Ballista] Integration tests should check that query results are correct
  • ARROW-12415 - [CI][Python] Failed building wheel for pygit2 on ARM64
  • ARROW-12424 - [Go][Parquet] Adding Schema Package for Go Parquet
  • ARROW-12428 - [Python] Expose pre_buffer in pyarrow.parquet
  • ARROW-12434 - [Rust] [Ballista] Show executed plans with metrics
  • ARROW-12442 - [CI] Set job timeouts on GitHub Actions
  • ARROW-12443 - [C++][Gandiva] Implement castVARCHAR function for varbinary input
  • ARROW-12444 - [Rust] Remove rust
  • ARROW-12445 - [Rust] Design and implement packaging process to bundle Rust in signed tar
  • ARROW-12468 - [Python][R] Expose ScannerBuilder::UseAsync to Python & R
  • ARROW-12478 - [C++] Support LLVM 12
  • ARROW-12484 - [CI] Change jinja macros to not require CROSSBOW_TOKEN to upload artifacts in Github Actions
  • ARROW-12489 - [Developer] autotune is broken
  • ARROW-12490 - [Dev] Use only miniforge in verify-release-candidate.sh
  • ARROW-12492 - [Python] Helper method to decode DictionaryArray back to Array
  • ARROW-12496 - [C++][Dataset] Ensure AsyncScanner is covered by all scanner tests
  • ARROW-12499 - [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
  • ARROW-12500 - [C++][Datasets] Ensure better test coverage of Dataset file formats
  • ARROW-12501 - [CI][Ruby] Remove needless workaround for MinGW build
  • ARROW-12507 - [CI] Remove duplicated cron/nightly builds
  • ARROW-12512 - [C++][Python][Dataset] Create CSV writer class and add Datasets support
  • ARROW-12514 - [Release] Don't run Gandiva related Ruby test with ARROW_GANDIVA=OFF
  • ARROW-12517 - [Go][Flight] Expose app metadata in flight client and server
  • ARROW-12518 - [Python] Expose Parquet statistics has_null_count / has_distinct_count
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12522 - [C++] Add ReadRangeCache::WaitFor
  • ARROW-12525 - [JS] Vector toJSON() returns an array
  • ARROW-12527 - [Dev] Don't try getting JIRA information for MINOR PR
  • ARROW-12528 - [JS] Support typed arrays in Table.new
  • ARROW-12530 - [C++] Remove Buffer::mutable_data_
  • ARROW-12533 - [C++] Add random real distribution function
  • ARROW-12534 - [C++][Gandiva] Implement LEFT and RIGHT functions on Gandiva for string input values
  • ARROW-12537 - [JS] Docs build should not include test sources
  • ARROW-12541 - [Docs] Improve styling/readability of tables in the new doc theme
  • ARROW-12551 - [Java][Release] Java post-release tests fail due to missing testing data
  • ARROW-12554 - [C++] Allow duplicates in SetLookupOptions::value_set
  • ARROW-12555 - [Java][Release] Java post-release script misses dataset JNI bindings
  • ARROW-12556 - [C++][Gandiva] Implement BYTESUBSTRING function on Gandiva
  • ARROW-12560 - [C++] Add scheduling option for Future callbacks
  • ARROW-12567 - [C++][Gandiva] Implement ILIKE SQL function
  • ARROW-12567 - [C++][Gandiva] Implement LPAD and RPAD functions for string input values
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12575 - [R] Use unary negative kernel
  • ARROW-12577 - [Website] Use Artifactory instead of Bintray in all places
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12581 - [C++][FlightRPC] Allow benchmarking DoPut with a data file
  • ARROW-12584 - [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators
  • ARROW-12591 - [Java][Gandiva] Create single Gandiva jar for MacOS and Linux
  • ARROW-12593 - [Packaging][Ubuntu] Add support for Ubuntu 21.04
  • ARROW-12597 - [C++] Enable per-row-group parallelism in async Parquet reader
  • ARROW-12598 - [C++][Dataset] Speed up CountRows for CSV
  • ARROW-12599 - [Doc][Python] Documentation missing for pyarrow.Table
  • ARROW-12600 - [CI] Push docker images from crossbow tasks
  • ARROW-12602 - [R] Add BuildInfo from C++ to arrow_info
  • ARROW-12608 - [C++][Python][R] Add split_pattern_regex kernel
  • ARROW-12612 - [C++] Add Expression to type_fwd.h
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12621 - [C++][Gandiva] Add alias to sha1 and sha256 functions
  • ARROW-12631 - [Python] Accept Scanner in pyarrow.dataset.write_dataset
  • ARROW-12643 - [Governance] Added experimental repos guidelines.
  • ARROW-12645 - [Python] Fix numpydoc validation
  • ARROW-12648 - [C++][FlightRPC] Enable TLS for Flight benchmark
  • ARROW-12649 - [Python][Packaging] Move conda-aarch64 to Azure with cross-compilation
  • ARROW-12653 - [Archery] allow me to add a comment to crossbow requests
  • ARROW-12658 - [C++] Bump aws-c-common to v0.5.10
  • ARROW-12660 - [R] Post-4.0 adjustments for CRAN
  • ARROW-12661 - [C++] Add ReaderOptions::skip_rows_after_names
  • ARROW-12662 - [Website] Force to use squash merge
  • ARROW-12667 - [Python] Add a more complete test for strided numpy array conversion
  • ARROW-12675 - [C++] CSV parsing report row on which error occurred
  • ARROW-12677 - [Python] Add a mask argument to pyarrow.StructArray.from_arrays
  • ARROW-12685 - [C++][Compute] Add unary absolute value kernel
  • ARROW-12686 - [C++][Python][FlightRPC] Convert Flight reader into a regular reader
  • ARROW-12687 - [C++][Python][Dataset] Convert Scanner into a RecordBatchReader
  • ARROW-12689 - [R] Implement ArrowArrayStream C interface
  • ARROW-12692 - [R] Improve tests and comments for strsplit() bindings
  • ARROW-12694 - [C++] Fix segfault under RTools35 toolchain
  • ARROW-12696 - [R] Improve testing of error messages converted to warnings
  • ARROW-12699 - [CI][Packaging][Java] Generate a jar compatible with Linux and MacOS for all Arrow components
  • ARROW-12702 - [JS] Update webpack and terser
  • ARROW-12703 - [JS] Separate Table from DataFrame
  • ARROW-12704 - [JS] Support and use optional chaining
  • ARROW-12709 - [C++] Add binary_join_element_wise
  • ARROW-12713 - [C++] String reverse kernel
  • ARROW-12715 - [C++][Python] Add SQL LIKE match kernel
  • ARROW-12716 - [C++] Add string padding kernel
  • ARROW-12717 - [C++][Python] Add find_substring kernel
  • ARROW-12719 - [C++] Allow passing S3 canned ACL as output stream metadata
  • ARROW-12721 - [CI] Fix path for uploading aarch64 conda artifacts from the nightly builds
  • ARROW-12722 - [R] Raise error when attempting to print table with duplicated naming
  • ARROW-12730 - [MATLAB] Update featherreadmex and featherwritemex to build against latest Arrow C++ APIs
  • ARROW-12731 - [R] Use InMemoryDataset for Table/RecordBatch in dplyr code
  • ARROW-12736 - [C++] Eliminate forced copy of potentially large vector<shared_ptr<>>
  • ARROW-12738 - [C++][Python][R] Update conda variant files
  • ARROW-12741 - [CI] Configure Crossbow GitHub Token for Nightly Builds
  • ARROW-12745 - [C++][Compute] Add floor, ceiling, and truncate kernels
  • ARROW-12749 - [C++] Construct RecordBatch/Table/Schema with rvalue arguments
  • ARROW-12750 - [CI][R] Actually pass parameterized docker options to the templates
  • ARROW-12751 - [C++] Implement minimum/maximum kernels
  • ARROW-12758 - [R] Add examples to more function documentation
  • ARROW-12760 - [C++][Python][R] Allow setting I/O thread pool size
  • ARROW-12761 - [R] Better error handling for write_to_raw
  • ARROW-12764 - [CI] Support wildcard expansion when uploading crossbow artifacts
  • ARROW-12777 - [R] Convert all inputs to Arrow objects in match_arrow and is_in
  • ARROW-12781 - [R] Implement is.type() functions for dplyr
  • ARROW-12785 - [CI] the r-devdocs build errors when brew installing gcc
  • ARROW-12791 - [R] Better error handling for DatasetFactory$Finish() when no format specified
  • ARROW-12796 - [JS] Support JSON output from benchmarks
  • ARROW-12800 - [JS] Remove text encoder and decoder polyfills
  • ARROW-12801 - [CI][Packaging][Java] Include all modules in script that generate Arrow jars
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark
  • ARROW-12808 - [JS] Document browser support
  • ARROW-12810 - [Python] Stop AWS SDK from looking for metadata service
  • ARROW-12812 - [Packaging][Java] Improve JNI jars build
  • ARROW-12824 - [R][CI] Upgrade builds for R 4.1 release
  • ARROW-12827 - [C++] Improve error message for dataset discovery failure
  • ARROW-12829 - [GLib][Ruby] Add support for Apache Arrow Flight
  • ARROW-12831 - [CI][macOS] Remove needless Homebrew workaround
  • ARROW-12832 - [JS] Write benchmarks in TypeScript
  • ARROW-12833 - [JS] Construct perf data in JS
  • ARROW-12835 - [C++][Python][R] Implement case-insensitive match using RE2
  • ARROW-12836 - [C++] Add support for newer IBM i
  • ARROW-12841 - [R] Add examples to more function documentation - part 2
  • ARROW-12843 - [C++][R] Implement is_inf kernel
  • ARROW-12848 - [Release] Fix URLs in vote mail template
  • ARROW-12851 - [Go][Parquet] Add Golang Parquet encoding package
  • ARROW-12856 - [C++][Gandiva] Implement castBIT and castBOOLEAN functions
  • ARROW-12859 - [C++] Add ScalarFromJSON for testing
  • ARROW-12861 - [C++][Compute] Add sign function kernels
  • ARROW-12867 - [R] Bindings for abs()
  • ARROW-12868 - [R] Bindings for find_substring and find_substring_regex
  • ARROW-12869 - [R] Bindings for utf8_reverse and ascii_reverse
  • ARROW-12870 - [R] Bindings for stringr::str_like
  • ARROW-12875 - [JS] Upgrade Jest and other minor updates
  • ARROW-12883 - [R][CI] version compatibility fails on R 4.1
  • ARROW-12891 - [C++] Move subtree pruning to compute
  • ARROW-12894 - [R] Bump R version
  • ARROW-12895 - [CI] Use “concurrency” setting on Github Actions to cancel stale jobs
  • ARROW-12898 - [Release][C#] Fix package upload
  • ARROW-12900 - [Python][Doc] Add missing numpy import
  • ARROW-12901 - [R] Follow on to more examples
  • ARROW-12909 - [R][Release] Build of ubuntu-docs is failing
  • ARROW-12912 - [Website] Use .asf.yaml for publishing
  • ARROW-12915 - [Release] Build of ubuntu-docs is failing on thrift
  • ARROW-12936 - [C++][Gandiva] Implement ASCII Hive function on Gandiva
  • ARROW-12937 - [C++][Python] Allow setting default metadata for new S3 files
  • ARROW-12939 - [R] Simplify RTask stop handling
  • ARROW-12940 - [R] Expose C interface as R6 methods
  • ARROW-12948 - [C++][Python] Add slice_replace kernel
  • ARROW-12949 - [C++] Add starts_with and ends_with
  • ARROW-12950 - [C++] Add count_substring kernel
  • ARROW-12951 - [C++] Reduce generated code size for string kernels
  • ARROW-12952 - [C++] Add count_substring_regex
  • ARROW-12955 - [C++] Add additional type support for if_else kernel
  • ARROW-12957 - [R] rchk issues on cran
  • ARROW-12961 - [Python] Fix MSVC warning building PyArrow
  • ARROW-12962 - [GLib][Ruby] Add Arrow::Scalar
  • ARROW-12964 - [R] Add bindings for ifelse() and if_else()
  • ARROW-12966 - [Python] Expose element_wise_min/max and options in Python
  • ARROW-12967 - [R] Add bindings for pmin() and pmax()
  • ARROW-12968 - [R][CI] Add an rchk job to our nightlies
  • ARROW-12972 - [CI] Fix centos-8 cmake error
  • ARROW-12975 - [C++][Python] if_else kernel doesn't support upcasting
  • ARROW-12982 - [C++] Re-enable unused-variable warning
  • ARROW-12984 - [C++][Compute] Passing options parameter of Count/Index aggregation by reference
  • ARROW-12985 - [Python][Packaging] Unable to install pygit2 in the arm64 wheel builds
  • ARROW-12986 - [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva
  • ARROW-12992 - [R] bindings for substr(), substring(), str_sub()
  • ARROW-12994 - [R] Fix tests that assume UTC local tz
  • ARROW-12996 - Add bytes_read() to StreamingReader
  • ARROW-13002 - [C++] Add a check for the utf8proc's version in CMake
  • ARROW-13005 - [C++] Add support for take implementation on dense union type
  • ARROW-13006 - [C++][Gandiva] Implement BASE64 and UNBASE64 Hive functions on Gandiva
  • ARROW-13009 - [Doc][Dev] Document builds mailing-list
  • ARROW-13022 - [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions
  • ARROW-13025 - [C++][Python] Add FunctionOptions::Equals/ToString/Serialize
  • ARROW-13027 - [C++] Fix ASAN stack traces in CI
  • ARROW-13030 - [CI][Go] Setup Arm64 golang CI
  • ARROW-13031 - [JS] Support arm in closure compiler on macOS
  • ARROW-13032 - [Java] Update guava version
  • ARROW-13034 - [Python][Docs] Update the cloud examples on the Parquet doc page
  • ARROW-13036 - [Doc] Mention recommended file extension(s) for Arrow IPC
  • ARROW-13042 - [C++] Check that kernel output is fully initialized
  • ARROW-13043 - [GLib][Ruby] Add GArrowEqualOptions
  • ARROW-13044 - [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector
  • ARROW-13045 - [Packaging][RPM][deb] Don't install system utf8proc if it's old
  • ARROW-13047 - [Website] Add kiszk to committer list
  • ARROW-13049 - [C++][Gandiva] Implement BIN Hive function on Gandiva
  • ARROW-13050 - [C++][Gandiva] Implement SPACE Hive function on Gandiva
  • ARROW-13054 - [C++] Add option to specify the first day of the week for the “day_of_week” temporal kernel
  • ARROW-13064 - [C++] Implement select (‘case when’) function for fixed-width types
  • ARROW-13065 - [Packaging][RPM] Add missing required LZ4 version information
  • ARROW-13068 - [GLib][Dataset] Change prefix to gdataset_ from gad_
  • ARROW-13070 - [R] bindings for sd and var
  • ARROW-13072 - [C++] Add bit-wise arithmetic kernels
  • ARROW-13074 - [Python] Deprecate ParquetDataset custom properties (eg pieces, partitions)
  • ARROW-13075 - [Python] Expose C data interface API for pyarrow.Field
  • ARROW-13076 - [Java] Allow ExtensionTypeVector with Struct or Union vector storage
  • ARROW-13082 - [CI] Forward R argument to ubuntu-docs build
  • ARROW-13086 - [Python] De-duplicate time unit conversion code
  • ARROW-13086 - [Python] Expose Parquet ArrowReaderProperties::coerce_int96_timestamp_unit_
  • ARROW-13091 - [Python] Add compression_level argument to IpcWriteOptions constructor
  • ARROW-13092 - [C++] Return an error in CreateDir if target is a file
  • ARROW-13095 - [C++] Implement trig compute functions
  • ARROW-13096 - [C++] Implement logarithm compute functions
  • ARROW-13097 - [C++] Provide simple reflection utility
  • ARROW-13098 - [Dev][Archery] Reorganize docker submodule to its own subpackage
  • ARROW-13100 - [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
  • ARROW-13101 - [Python][Doc] pyarrow.FixedSizeListArray does not appear in the documentation
  • ARROW-13110 - [C++] Deadlock can happen when using BackgroundGenerator without transferring callbacks
  • ARROW-13113 - [R] use RTasks to manage parallel in converting arrow to R
  • ARROW-13117 - [R] Retain schema in new Expressions
  • ARROW-13119 - [R] Set empty schema in scalar Expressions
  • ARROW-13124 - [Ruby] Add support for memory view
  • ARROW-13127 - [R] Valgrind nightly errors
  • ARROW-13136 - [C++] Add coalesce function
  • ARROW-13137 - [C++][Documentation] Make in-table references consistent
  • ARROW-13140 - [C++/Python] Upgrade libthrift pin in the nightlies
  • ARROW-13142 - [Python] Use vector append when converting from list of non-strided numpy arrays
  • ARROW-13147 - [Java] Respect the rounding policy when allocating vector buffers
  • ARROW-13157 - [C++][Python] Add find_substring_regex kernel and implement ignore_case for find_substring
  • ARROW-13158 - [Python] Fix StructScalar contains and repr with duplicate field names
  • ARROW-13162 - [C++][Gandiva] Add new alias for extract date functions in registry
  • ARROW-13171 - [R] Add binding for str_pad()
  • ARROW-13190 - [C++][Gandiva] Change behavior of INITCAP function
  • ARROW-13194 - [Java][Document] Create prose document about Java algorithms
  • ARROW-13195 - [R] Problem with rlang reverse dependency checks
  • ARROW-13199 - [R] add ubuntu 21.04 to nightly builds
  • ARROW-13200 - [R] Add binding for case_when()
  • ARROW-13201 - [R] Add binding for coalesce()
  • ARROW-13210 - [Python][CI] Fix vcpkg caching mechanism for the macOS wheels
  • ARROW-13211 - [C++][CI] Remove outdated Github Actions ARM builds
  • ARROW-13212 - [Release] Support deploying to test PyPI in the python post release script
  • ARROW-13215 - [R][CI] Add ENV TZ to docker files
  • ARROW-13218 - [Doc] Document/clarify conventions for timestamp storage
  • ARROW-13219 - [C++][GLib] Demote/deprecate CompareOptions
  • ARROW-13224 - [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
  • ARROW-13226 - [Python] Add a general purpose cython trampolining utility
  • ARROW-13228 - [C++] S3 CreateBucket fails because AWS treats us-east-1 differently than other regions
  • ARROW-13230 - [Docs][Python] Add CSV writer docs
  • ARROW-13234 - [C++] Put extra padding spaces on the right
  • ARROW-13235 - [C++][Python] Simplify mapping of function options
  • ARROW-13236 - [Python] Include options class name in repr
  • ARROW-13238 - [C++][Compute][Dataset] Use an ExecPlan for dataset scans
  • ARROW-13242 - [C++] Improve random generation of decimal arrays
  • ARROW-13244 - [C++] Add facility to get current thread id as uint64
  • ARROW-13258 - [Python] Improve the repr of ParquetFileFragment
  • ARROW-13262 - [R] transmute() fails after pulling data into R
  • ARROW-13273 - [C++] Don't use .pc only in CMake paths for Requires.private
  • ARROW-13274 - [JS] Remove Webpack
  • ARROW-13275 - [JS] Fix perf tests
  • ARROW-13276 - [GLib][Ruby][Flight] Add support for ListFlights
  • ARROW-13277 - [JS] Add declaration maps for TypeScript and refactor testing infrastructure
  • ARROW-13280 - [R] Bindings for log and trig functions
  • ARROW-13282 - [C++] Remove obsolete generated files
  • ARROW-13283 - [Archery][Dev] Support passing CPU/memory limits to Docker
  • ARROW-13286 - [CI] Require docker-compose 1.27.0 or later
  • ARROW-13289 - [C++] Accept integer args in trig/log functions via promotion to double
  • ARROW-13291 - [GLib][CI] Require gobject-introspection 3.4.5 or later
  • ARROW-13296 - [C++] Provide a reflection compatible enum replacement
  • ARROW-13299 - [JS] Upgrade ix and rxjs
  • ARROW-13303 - [JS] Revise bundles
  • ARROW-13306 - [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
  • ARROW-13313 - [C++][Compute] Add scalar aggregate node
  • ARROW-13320 - [Website] Add MIME types to FAQ
  • ARROW-13323 - [Archery] Validate docker compose configuration
  • ARROW-13343 - [R] Update NEWS.md for 5.0
  • ARROW-13346 - [C++] Remove compile time parsing from EnumType
  • ARROW-13355 - [R] ensure that sf is installed in our revdep job
  • ARROW-13357 - [R] bindings for sign()
  • ARROW-13365 - [R] bindings for floor/ceiling/truncate
  • ARROW-13385 - [C++] Demonstrate registering compute functions out-of-tree
  • ARROW-13386 - [R][C++] CSV streaming changes break Rtools 35 32-bit build
  • ARROW-13418 - [R] typo in python.r
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • PARQUET-1798 - [C++] Review logic around automatic assignment of field_id's
  • PARQUET-1998 - [C++] Implement LZ4_RAW compression
  • PARQUET-2056 - [C++] Add ability for retrieving dictionary and indices separately for ColumnReader

Apache Arrow 4.0.1 (2021-05-26)

Bug Fixes

  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12603 - [C++][Dataset] Backport fix for specifying CSV column types (#10344)
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12769 - [Python] Fix slicing array with “negative” length (start > stop)
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12855 - error: no member named ‘TableReader’ in namespace during compilation

New Features and Improvements

  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark

Apache Arrow 4.0.0 (2021-04-26)

Bug Fixes

  • ARROW-4784 - [C++][CI] Re-enable flaky mingw tests.
  • ARROW-6818 - [DOC] Remove reference to Apache Drill design docs
  • ARROW-7288 - [C++][Parquet] Don't use regular expression to parse application version
  • ARROW-7830 - [C++][Parquet] Use Arrow version number for parquet
  • ARROW-9451 - [Python] Refuse implicit cast of str to unsigned integer
  • ARROW-9634 - [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow
  • ARROW-9878 - [Python] Document caveats of to_pandas(self_destruct=True)
  • ARROW-10038 - [C++] Spawn thread pool threads lazily
  • ARROW-10056 - [C++] Increase flatbuffers max_tables parameter in order to read wide tables
  • ARROW-10364 - [Dev][Archery] Add support for semver 2.13.0
  • ARROW-10370 - [Python] Clean-up filesystem handling in write_dataset
  • ARROW-10403 - [C++] Implement unique kernel for non-uniform chunked dictionary arrays
  • ARROW-10405 - [C++] IsIn kernel should be able to lookup dictionary in string
  • ARROW-10457 - [CI] Fix Spark integration tests with branch-3.0
  • ARROW-10489 - [C++] Add Intel C++ compiler options for different warning levels
  • ARROW-10514 - [C++][Parquet] Make the column name the same for both output formats of parquet reader
  • ARROW-10953 - [R] Validate when creating Table with schema
  • ARROW-11066 - [FlightRPC][Java] Make zero-copy writes a configurable option
  • ARROW-11066 - [FlightRPC][Java] Revert “fix zero-copy optimization”
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11066 - Revert "ARROW-11066: [Java][FlightRPC] fix zero-copy opt…
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11134 - [CI][C++] Always run tests on Travis-CI
  • ARROW-11147 - [CI][Python] Remove pandas=0.25.3 pin for dask-latest
  • ARROW-11180 - [Developer] cmake-format pre-commit hook doesn't run
  • ARROW-11192 - [Documentation] Describe opening Visual Studio so it inherits a working env
  • ARROW-11223 - [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflow
  • ARROW-11235 - [Python] Fix test failure inside non-default S3 region
  • ARROW-11239 - [Rust] Fixed equality with offsets and nulls
  • ARROW-11269 - [Rust][Parquet] Preserve timezone in int96 reader
  • ARROW-11277 - [C++] Workaround macOS 10.11: don't default construct consts
  • ARROW-11299 - [Python] Fix invalid-offsetof warnings
  • ARROW-11303 - [Release][C++] Enable mimalloc in the windows verification script
  • ARROW-11305 - Skip first argument (which is the program name) in parquet-rowcount binary
  • ARROW-11311 - [Rust] Fixed unset_bit
  • ARROW-11313 - [Rust] Fixed size_hint
  • ARROW-11315 - [Packaging][APT][arm64] Add missing gir1.2 files
  • ARROW-11320 - [C++] Try to strengthen temporary dir creation
  • ARROW-11322 - [Rust] Re-opening memory module as public
  • ARROW-11323 - [Rust][DataFusion] Allow sort queries to return no results
  • ARROW-11328 - [R] Collecting zero columns from a dataset returns entire dataset
  • ARROW-11334 - [Python][CI] Fix failing pandas nightly tests
  • ARROW-11337 - [C++] Compilation error with ThreadSanitizer
  • ARROW-11357 - [Rust] Fix out-of-bounds reads in take and other undefined behavior
  • ARROW-11376 - [C++] ThreadedTaskGroup failure with Thread Sanitizer enabled
  • ARROW-11379 - [C++][Dataset] Better formatting for timestamp scalars
  • ARROW-11387 - [Rust] fix build for conditional compilation of features ‘simd + avx512’
  • ARROW-11391 - [C++] Allow writing more than 2 GB to HDFS
  • ARROW-11394 - [Rust] Tests for Slice & Concat
  • ARROW-11400 - [Python] Ensure pickling Dataset with dictionary partitions works
  • ARROW-11403 - [Developer] archery benchmark list: unexpected keyword ‘benchmark_filter’
  • ARROW-11412 - [Python][Dataset] Disallow logical operators for Expression
  • ARROW-11412 - [Python] Improve Expression docs
  • ARROW-11427 - [C++] On Windows, only use AVX512 when enabled by the OS
  • ARROW-11448 - [C++] Fix tdigest build failure on Windows with Visual Studio
  • ARROW-11451 - [C++] Fix gcc-4.8 build errors
  • ARROW-11452 - [Rust] Fix issue with Parquet Arrow reader not following type path
  • ARROW-11461 - [Go][Flight] Some cleanup for flight, Fix Schema bytes
  • ARROW-11464 - [Python] Fix parquet.read_pandas to support all keywords of read_table
  • ARROW-11470 - [C++] Detect overflow on computation of tensor strides
  • ARROW-11472 - [Python][CI] Remove temporary pin of numpy in kartothek integration build
  • ARROW-11472 - [Python][CI] Temporary pin numpy on kartothek integration builds
  • ARROW-11480 - [Python] Test filtering on INT96 timestamps
  • ARROW-11483 - [C++] Write integration JSON files compatible with the Java reader
  • ARROW-11488 - [Rust] Don't leak memory in StructBuilder
  • ARROW-11490 - [C++] BM_ArrowBinaryDict/EncodeLowLevel is not deterministic
  • ARROW-11494 - [Rust] fix take bench
  • ARROW-11497 - [Python] Provide parquet enable compliant nested type flag for python binding
  • ARROW-11538 - [Python] Segfault reading Parquet dataset with Timestamp filter
  • ARROW-11547 - [Packaging][Conda][Drone] Fix undefined variable error
  • ARROW-11548 - [C++] Fix RandomArrayGenerator::List
  • ARROW-11551 - [C++][Gandiva] Fix castTimestamp(utf8) function
  • ARROW-11560 - [C++][FlightRPC] fix mutex error on SIGINT
  • ARROW-11567 - [C++][Compute] Improve variance kernel precision
  • ARROW-11577 - [Rust] Fix Array transform on strings
  • ARROW-11582 - [R] write_dataset ‘format’ argument default and validation could be better
  • ARROW-11586 - [Rust][Datafusion] Remove force unwrap
  • ARROW-11595 - [C++][NIGHTLY:test-conda-cpp-valgrind] Avoid branching on potentially indeterminate values in GenerateBitsUnrolled
  • ARROW-11596 - [Python][Dataset] make ScanTask.execute() eager
  • ARROW-11603 - [Rust] Fix Clippy Lints for Rust 1.50
  • ARROW-11607 - [C++][Parquet] Update values_capacity_ when resetting.
  • ARROW-11614 - Fix round() logic to return positive zero when argument is zero
  • ARROW-11617 - [C++][Gandiva] Fix nested if-else optimisation in gandiva
  • ARROW-11620 - [Rust][DataFusion] Consistently use Arc rather than Box and Arc
  • ARROW-11630 - [Rust] Introduce limit option for sort kernel
  • ARROW-11632 - [Rust] Make csv::Reader propagate schema metadata to generated RecordBatches
  • ARROW-11639 - [C++][Gandiva] Fix signbit compilation issue in Ubuntu nightly build
  • ARROW-11642 - [C++] Fix preprocessor directive for Windows in JVM detection
  • ARROW-11657 - [R] group_by with .drop specified errors
  • ARROW-11658 - [R] Handle mutate/rename inside group_by
  • ARROW-11663 - [Rust][DataFusion] Fixed error.
  • ARROW-11668 - [C++] Sporadic UBSAN error in FutureStessTest.TryAddCallback
  • ARROW-11672 - [R] Fix string function test failure on R 3.3
  • ARROW-11681 - [Rust] Don't unwrap in IPC writers
  • ARROW-11686 - [C++] Call ArrowLog::InstallFailureSignalHandler to show stack trace
  • ARROW-11687 - [Rust][DataFusion] RepartitionExec Hanging
  • ARROW-11694 - [C++] Fix Take() with no validity bitmap but unknown null count
  • ARROW-11695 - [C++][FlightRPC] fix option to disable TLS verification
  • ARROW-11717 - [Integration] Fix intermittent flight integration failures with rust
  • ARROW-11718 - [Rust] Don't write IPC footers on drop
  • ARROW-11741 - [C++] Fix decimal casts on big endian platforms
  • ARROW-11743 - [R] Use pkgdown's new found ability to autolink Jiras
  • ARROW-11746 - [Developer][Archery] Fix prefer real time check
  • ARROW-11756 - [R] passing a partition as a schema leads to segfaults
  • ARROW-11758 - [C++][Compute] Improve summation kernel precision
  • ARROW-11767 - [C++] Scalar::Hash may segfault
  • ARROW-11771 - [Developer][Archery] Move benchmark tests (so CI runs them)
  • ARROW-11781 - [Python] Reading small amount of files from a partitioned dataset is unexpectedly slow
  • ARROW-11784 - [Rust][DataFusion] CoalesceBatchesStream doesn't honor Stream interface
  • ARROW-11785 - [R] Fallback when filtering Table with unsupported expression fails
  • ARROW-11786 - [C++] Remove noisy CMake message
  • ARROW-11788 - [Java] Fix appending empty delta vectors
  • ARROW-11791 - [Rust][DataFusion] Fix RepartitionExec Blocking
  • ARROW-11802 - [Rust][DataFusion] Remove use of crossbeam channels to avoid potential deadlocks
  • ARROW-11819 - [Rust] Add link to the doc
  • ARROW-11821 - [Rust] Edit Rust README
  • ARROW-11830 - [C++] Don't re-detect gRPC every time
  • ARROW-11832 - [R] Handle conversion of extra nested struct column
  • ARROW-11836 - [C++] Avoid requiring arrow_bundled_dependencies when it doesn't exist for arrow_static.
  • ARROW-11845 - [Rust] Implement to_isize() for ArrowNativeTypes
  • ARROW-11850 - [GLib] Add GARROW_VERSION_0_16
  • ARROW-11855 - [C++][Python] Memory leak in to_pandas when converting chunked struct array
  • ARROW-11857 - [Python] Resource temporarily unavailable when using the new Dataset API with Pandas
  • ARROW-11860 - [Rust][DataFusion] Add DataFusion logos
  • ARROW-11866 - [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
  • ARROW-11872 - [C++] Fix Array validation when Array contains non-CPU buffers
  • ARROW-11880 - [R] Handle empty or NULL transmute() args properly
  • ARROW-11881 - [Rust][DataFusion] Fix clippy lint
  • ARROW-11896 - [Rust] Disable Debug symbols on CI test builds
  • ARROW-11904 - [C++] Try to fix crash on test tear down
  • ARROW-11905 - [C++] Fix SIMD detection on macOS
  • ARROW-11914 - [R][CI] r-sanitizer nightly is broken
  • ARROW-11918 - [R][Documentation] Docs cleanups
  • ARROW-11923 - [CI] Update branch name for dask dev integration tests
  • ARROW-11937 - [C++] Fix GZip codec hanging if flushed twice
  • ARROW-11941 - [Dev] Don't update Jira if run “DEBUG=1 merge_arrow_pr.py”
  • ARROW-11942 - [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads
  • ARROW-11945 - [R] filter doesn't accept negative numbers as valid
  • ARROW-11956 - [C++] Fix system re2 dependency detection for static library
  • ARROW-11965 - [R][Docs] Simplify install.packages command in R dev docs
  • ARROW-11970 - [C++][CI] Fix Valgrind error in arrow-csv-test
  • ARROW-11971 - [Packaging] Vcpkg patch doesn't apply on windows due to line endings
  • ARROW-11975 - [CI][GLib] Remove needless libgccjit
  • ARROW-11976 - [C++] Fix sporadic TSAN error with GatingTask
  • ARROW-11983 - [Python] Avoid ImportError calling from_pandas in threaded code
  • ARROW-11997 - [Python] concat_tables crashes python interpreter
  • ARROW-12003 - [R] Fix NOTE re undefined global function group_by_drop_default
  • ARROW-12006 - [Java] Fix checkstyle config to work on Windows
  • ARROW-12012 - [Java][JDBC] Fix BinaryConsumer reallocation
  • ARROW-12013 - [C++][FlightRPC] Fix bundled gRPC version probing
  • ARROW-12015 - [Rust][DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
  • ARROW-12028 - ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates
  • ARROW-12029 - [R] Remove args from FeatherReader$create v2
  • ARROW-12033 - [Minor][Docs] Fix link in developers/benchmarks.html
  • ARROW-12041 - [C++][Python] Fix type property of tensor and sparse tensor IPC messages
  • ARROW-12051 - [GLib] Keep input stream reference of GArrowCSVReader
  • ARROW-12057 - [Python] Remove direct usage of pandas' Block subclasses (partly)
  • ARROW-12065 - [C++][Python] Fix segfault reading JSON file
  • ARROW-12067 - [Python][Doc] Document pyarrow_(un)wrap_scalar
  • ARROW-12073 - [R] Fix R CMD check NOTE about ‘X_____X’
  • ARROW-12076 - [Rust] Fix build
  • ARROW-12077 - [C++] Fix out-of-bounds write in ListArray::FromArrays
  • ARROW-12086 - [C++] Fix environment variables for bzip2, utf8proc URLs
  • ARROW-12088 - [Python] Fix compiler warning about offsetof
  • ARROW-12089 - [Doc] Fix Sphinx warnings
  • ARROW-12100 - [C++][IPC] Allow null children field when num children is 0
  • ARROW-12103 - [C++] Correctly handle unaligned access in bit-unpacking code
  • ARROW-12112 - [CI] Reduce footprint of conda-integration image
  • ARROW-12112 - [Rust] Create and store less debug information in CI and integration tests
  • ARROW-12113 - [R] Fix rlang deprecation warning from check_select_helpers()
  • ARROW-12130 - [C++] Don't enable Neon if -DARROW_SIMD_LEVEL=NONE
  • ARROW-12138 - [Go][IPC] Update flatbuffers definitions
  • ARROW-12140 - [C++][CI] Fix Valgrind failures in Grouper tests
  • ARROW-12145 - [Developer][Archery] Flaky: test_static_runner_from_json
  • ARROW-12149 - [Dev] Archery benchmark test case is failing
  • ARROW-12154 - [C++][Gandiva] Fix gandiva crash in certain OS/CPU combinations
  • ARROW-12155 - [R] Require Table columns to be same length
  • ARROW-12161 - [C++][Dataset] Revert async CSV reader in datasets
  • ARROW-12161 - [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
  • ARROW-12169 - [C++] Fix decompressing file with empty stream at the end
  • ARROW-12171 - [Rust] clean up clippy lints
  • ARROW-12172 - [Python][Packaging] Pass python version as setuptools pretend version in the macOS wheel builds
  • ARROW-12178 - [CI] Update setuptools in the ubuntu images
  • ARROW-12186 - [Rust][DataFusion] Fix regexp_match test
  • ARROW-12209 - [JS] Copy all src files into the TypeScript package
  • ARROW-12220 - [C++][CI] Thread sanitizer failure
  • ARROW-12226 - [C++] Fix Address Sanitizer failures
  • ARROW-12227 - [R] Fix RE2 and median nightly build failures
  • ARROW-12235 - [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions
  • ARROW-12241 - [Python] Make CSV cancellation test more robust
  • ARROW-12250 - [Rust][Parquet] Fix failing arrow_writer test
  • ARROW-12254 - [Rust][DataFusion] Stop polling limit input once limit is reached
  • ARROW-12258 - [R] Never do as.data.frame() on collect(as_data_frame = FALSE)
  • ARROW-12262 - [Doc] Enable S3 and Flight in docs build
  • ARROW-12267 - [Rust] Implement support for timestamps in JSON writer
  • ARROW-12273 - [JS][Rust] Remove coveralls
  • ARROW-12279 - [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)
  • ARROW-12294 - [Rust] Fix boolean kleene kernels with no remainder
  • ARROW-12299 - [Python] Recognize new filesystems in pq.write_to_dataset
  • ARROW-12300 - [C++] Remove linking of cuda runtime library
  • ARROW-12313 - [Rust][Ballista] Update benchmark docs for Ballista
  • ARROW-12314 - [Python] Accept columns as set in parquet read_pandas
  • ARROW-12327 - [Dev] Use pull request's head remote when submitting crossbow jobs via the comment bot
  • ARROW-12330 - [Developer] Restore values at counters column of Archery benchmark
  • ARROW-12334 - [Rust][Ballista] Aggregate queries producing incorrect results
  • ARROW-12342 - [Packaging] Fix tabulation in crossbow templates for submitting nightly builds
  • ARROW-12357 - [Archery] Bump Jinja2 version requirement
  • ARROW-12379 - [C++] Fix ThreadSanitizer failure in SerialExecutor
  • ARROW-12382 - [C++] Bundle xsimd if runtime SIMD level is set
  • ARROW-12385 - [R][CI] fix cran picking in CI
  • ARROW-12390 - [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice
  • ARROW-12401 - [R] Fix guard around dataset___Scanner__TakeRows
  • ARROW-12405 - [Packaging] Fix apt artifact patterns and artifact uploading from travis
  • ARROW-12408 - [R] Delete Scan()
  • ARROW-12421 - [Rust][DataFusion] Fix topkexec failure
  • ARROW-12421 - [Rust][DataFusion] Disable repartition rule
  • ARROW-12429 - [C++] Fix incorrectly registered test
  • ARROW-12433 - [Rust] Update nightly rust version
  • ARROW-12437 - [Rust][Ballista] Create DataFusion context without repartition
  • ARROW-12440 - [Release][Packaging] Various packaging, release script and release verification script fixes
  • ARROW-12466 - [Python] Avoid AttributeError crash when comparing with None
  • ARROW-12475 - [C++] Fix ‘warn_unused_result’ warning
  • ARROW-12487 - [C++][Dataset] Fix ScanBatches() hanging
  • ARROW-12495 - [C++] Fix NumPyBuffer::mutable_data()
  • ARROW-12794 - C++/R: read_parquet halts process when accessed multiple times
  • PARQUET-1655 - [C++] Fix comparison of Decimal values in statistics
  • PARQUET-2008 - [C++] Fix information written in RowGroup::total_byte_size

New Features and Improvements

  • ARROW-951 - [JS] Upgrade to typedoc 0.20.19
  • ARROW-2229 - [C++][Python] Add WriteCsv functionality.
  • ARROW-3690 - [Rust] Add Rust to the format integration testing
  • ARROW-6103 - [Release][Java] Remove mvn release plugin
  • ARROW-6248 - [Python][C++] Raise better exception on HDFS file open error
  • ARROW-6455 - [C++] Implement ExtensionType for non-UTF8 Unicode data
  • ARROW-6604 - [C++] Add support for nested types to MakeArrayFromScalar
  • ARROW-7215 - [C++][Gandiva] Implement castVARCHAR(numeric_type) functions
  • ARROW-7364 - [Rust][DataFusion] Add cast options to cast kernel and TRY_CAST to DataFusion
  • ARROW-7633 - [C++][CI] Create fuzz targets for tensors and sparse tensors
  • ARROW-7808 - [Java][Dataset] Implement Dataset Java API by JNI to C++
  • ARROW-7906 - [C++][Python] Add ORC write support
  • ARROW-8049 - [C++] Bump thrift to 0.13 and require cmake 3.10 for it
  • ARROW-8282 - [C++/Python][Dataset] Support schema evolution for integer columns
  • ARROW-8284 - [C++][Dataset] Schema evolution for timestamp columns
  • ARROW-8630 - [C++][Dataset] Pass schema including all materialized fields to catch CSV edge cases
  • ARROW-8631 - [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python
  • ARROW-8658 - [C++][Dataset] Implement subtree pruning for FileSystemDataset
  • ARROW-8672 - [Java] Implement RecordBatch IPC buffer compression from ARROW-300
  • ARROW-8732 - [C++] Add basic cancellation API
  • ARROW-8771 - [C++] Add boost/process library to build support
  • ARROW-8796 - [Rust] Allow parquet to be written directly to memory
  • ARROW-8797 - [C++] Read RecordBatch in a different endian
  • ARROW-8900 - [C++][Python] Expose Proxy Options as parameters for S3FileSystem
  • ARROW-8919 - [C++][Compute][Dataset] Add Function::DispatchBest to accommodate implicit casts
  • ARROW-9128 - [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
  • ARROW-9149 - [C++] Improve configurability of RandomArrayGenerator::ArrayOf
  • ARROW-9196 - [C++][Compute] All casts accept scalar and sliced inputs
  • ARROW-9318 - [C++] Parquet encryption key management
  • ARROW-9731 - [C++][Python][R][Dataset] Implement Scanner::Head
  • ARROW-9749 - [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions
  • ARROW-9777 - [Rust] Implement IPC changes to catch up to 1.0.0 format
  • ARROW-9856 - [R] Add bindings for string compute functions
  • ARROW-10014 - [C++] TaskGroup::Finish should execute tasks
  • ARROW-10089 - [R] inject base class for Array, ChunkedArray and Scalar
  • ARROW-10183 - [C++] Apply composable futures to CSV
  • ARROW-10195 - [C++] Add string struct extract kernel using re2
  • ARROW-10250 - [C++][FlightRPC] Consistently use FlightClientOptions::Defaults
  • ARROW-10255 - [JS] Reorganize exports for ESM tree-shaking
  • ARROW-10297 - [Rust] Parameter for parquet-read to output data in json format, add “cli” feature to parquet crate
  • ARROW-10299 - [Rust] Use IPC Metadata V5 as default
  • ARROW-10305 - [R] Filter with regular expressions
  • ARROW-10306 - [C++] Add string replacement kernel
  • ARROW-10349 - [Python] Build and publish aarch64 wheels
  • ARROW-10354 - [Rust][DataFusion] regexp_extract function to select regex groups from strings
  • ARROW-10360 - [CI] Bump Github Actions cache version
  • ARROW-10372 - [Dataset][C++][Python][R] Support reading compressed CSV
  • ARROW-10406 - [C++] Unify dictionaries when writing IPC file in a single shot
  • ARROW-10420 - [C++] Refactor io and filesystem APIs to take an IOContext
  • ARROW-10421 - [R] Use gc_memory_pool in more places
  • ARROW-10438 - [C++][Dataset] Partitioning::Format on nulls
  • ARROW-10520 - [C++][R] Implement add/remove/replace for RecordBatch
  • ARROW-10570 - [R] Use Converter API to convert SEXP to Array/ChunkedArray
  • ARROW-10580 - [C++] Disallow non-monotonic dense union offsets
  • ARROW-10606 - [C++] Implement Decimal256 casts
  • ARROW-10655 - [C++] Add cache and memoization facility
  • ARROW-10734 - [R] Build and test on Solaris
  • ARROW-10735 - [R] Remove arrow-without-arrow wrapping
  • ARROW-10766 - [Rust][Parquet] Compute nested list definitions
  • ARROW-10816 - [Rust][DataFusion] Initial support for Interval expressions
  • ARROW-10831 - [C++][Compute] Implement quantile kernel
  • ARROW-10846 - [C++] Add async filesystem operations
  • ARROW-10880 - [Java] Support compressing RecordBatch IPC buffers by LZ4
  • ARROW-10882 - [Python] Allow writing dataset from iterator of batches
  • ARROW-10895 - [C++][Gandiva] Implement bool to varchar cast function in Gandiva
  • ARROW-10903 - [Rust] Implement FromIter<Option<Vec>> constructor for FixedSizeBinaryArray
  • ARROW-11022 - [Rust] Upgrade to Tokio 1.0
  • ARROW-11070 - [C++][Compute] Implement power kernel
  • ARROW-11074 - [Rust][DataFusion] Implement predicate push-down for parquet tables
  • ARROW-11081 - [Java] Make IPC option immutable
  • ARROW-11108 - [Rust] Fixed performance issue in mutableBuffer.
  • ARROW-11141 - [Rust] Add basic Miri checks to CI pipeline
  • ARROW-11149 - [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty
  • ARROW-11150 - [Rust] Add Arrow Rust Community section to Rust README
  • ARROW-11154 - [CI][C++] Move homebrew crossbow tests off of Travis-CI
  • ARROW-11156 - [Rust][DataFusion] Create hashes vectorized in hash join
  • ARROW-11174 - [C++][Dataset] Make expressions available to projection
  • ARROW-11179 - [Format] Make FB comments friendly to rust
  • ARROW-11183 - [Rust] [Parquet] LogicalType::TIMESTAMP_NANOS missing
  • ARROW-11191 - [C++] Use FnOnce for TaskGroup's tasks instead of std::function
  • ARROW-11216 - [Rust] add doc example for StringDictionaryBuilder
  • ARROW-11220 - [Rust] Implement GROUP BY support for Boolean
  • ARROW-11222 - [Rust] Catch up with flatbuffers 0.8.1 which had some UB problems fixed
  • ARROW-11246 - [Rust] Add type to Unexpected accumulator state error
  • ARROW-11254 - [Rust][DataFusion] Add SIMD and snmalloc flags as options to benchmarks
  • ARROW-11260 - [C++][Dataset] Don't require dictionaries when specifying explicit partition schema
  • ARROW-11265 - [Rust] Made bool not ArrowNativeType
  • ARROW-11268 - [Rust][DataFusion] MemTable::load output partition support
  • ARROW-11270 - [Rust] Array slice accessors
  • ARROW-11279 - [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
  • ARROW-11284 - [R] Support dplyr verb transmute()
  • ARROW-11289 - [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns
  • ARROW-11290 - [Rust][DataFusion] Address hash aggregate performance issue with high number of groups
  • ARROW-11291 - [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)
  • ARROW-11300 - [Rust][DataFusion] Further performance improvements on hash aggregation with small groups
  • ARROW-11308 - [Rust][Parquet] Support decimal when writing parquet files
  • ARROW-11309 - [Release][C#] Use .NET 3.1 for verification
  • ARROW-11310 - [Rust] implement JSON writer
  • ARROW-11314 - [Release][APT][Yum] Add support for verifying arm64 packages
  • ARROW-11317 - [Rust] Include the prettyprint feature in CI Coverage
  • ARROW-11318 - [Rust] Support pretty printing timestamp, date, and time types
  • ARROW-11319 - [Rust][DataFusion] Improve test comparisons to record batch, remove test::format_batch
  • ARROW-11321 - [Rust][DataFusion] Fix DataFusion compilation error
  • ARROW-11325 - [Packaging][C#] Release Apache.Arrow.Flight and Apache.Arrow.Flight.AspNetCore
  • ARROW-11329 - [Rust] Don't rerun build.rs on every file change
  • ARROW-11330 - [Rust][DataFusion] add ExpressionVisitor to encode expression walking
  • ARROW-11332 - [Rust] Use MutableBuffer in take_string instead of Vec
  • ARROW-11333 - [Rust] Generalized creation of empty arrays.
  • ARROW-11336 - [C++][Doc] Improve Developing on Windows docs
  • ARROW-11338 - [R] Bindings for quantile and median
  • ARROW-11340 - [C++] Add vcpkg.json manifest to cpp project root
  • ARROW-11343 - [Rust][DataFusion] Simplified example with UDF.
  • ARROW-11346 - [C++][Compute] Implement quantile kernel benchmark
  • ARROW-11349 - [Rust] Add from_iter_values to create arrays from (non null) values
  • ARROW-11350 - [C++] Bump dependency versions
  • ARROW-11354 - [Rust] Speed-up cast of dates and times (2-4x)
  • ARROW-11355 - [Rust] Aligned Date DataType with specification.
  • ARROW-11358 - [Rust] Add benchmark for concatenating small arrays
  • ARROW-11360 - [Rust][DataFusion] Improve CSV “No files found” error message
  • ARROW-11361 - [Rust] Build MutableBuffer/Buffer from iterator of bools
  • ARROW-11362 - [Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance
  • ARROW-11365 - [Rust][Parquet] Logical type printer and parser
  • ARROW-11366 - [DataFusion] Implement constant folding for boolean literal expressions
  • ARROW-11367 - [C++] Implement t-digest approximate quantile utility
  • ARROW-11369 - [DataFusion] Split physical_plan/expressions.rs
  • ARROW-11372 - [Release] Support RC verification on macOS-ARM64
  • ARROW-11373 - [Python][Docs] Add example of specifying type for a column when reading csv file
  • ARROW-11374 - [Python] Make legacy pyarrow.filesystem / pyarrow.serialize warnings more visible (DeprecationWarning -> FutureWarning)
  • ARROW-11375 - [Rust] Fix deprecation warning in clippy
  • ARROW-11377 - [C++][CI] Add Thread Sanitizer nightly build
  • ARROW-11383 - [Rust] Faster bit AND and OR (2x)
  • ARROW-11386 - [Release] Fix post documents update script
  • ARROW-11389 - [Rust] make comments more consistent and fix typos
  • ARROW-11395 - [DataFusion] Support custom optimizers
  • ARROW-11401 - [Rust][DataFusion] Pass slices instead of Vec in DataFrame API
  • ARROW-11404 - [Rust][DataFusion] Upgrade to aHash 0.7 + minor cleanup
  • ARROW-11405 - [DataFusion] Support multiple custom logical nodes
  • ARROW-11406 - [CI][C++] Fix ccache caching on Travis-CI
  • ARROW-11408 - [Rust] Add window support to datafusion readme
  • ARROW-11411 - [Packaging][Linux] Disable arm64 nightly builds
  • ARROW-11414 - [Rust] Reduce copies in Schema::try_merge
  • ARROW-11417 - [Integration] Add integration tests for buffer compression
  • ARROW-11418 - [Doc] Add buffer compression to IPC support matrix
  • ARROW-11421 - [Rust][DataFusion] Support GROUP BY Date32
  • ARROW-11422 - [C#] add decimal support
  • ARROW-11423 - [R] value_counts and some StructArray methods
  • ARROW-11425 - [C++][Compute] Optimize quantile kernel for integers
  • ARROW-11426 - [Rust][DataFusion] EXTRACT support
  • ARROW-11428 - [Rust] Add power_scalar kernel
  • ARROW-11429 - Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11430 - [Rust] zip kernel: combine arrays based on boolean mask
  • ARROW-11431 - [Rust][DataFusion] Support the HAVING clause.
  • ARROW-11435 - [DataFusion] allow creating ParquetPartition from external crate, make combine_filters public
  • ARROW-11436 - [Rust] Improved from_iter for primitive arrays (-20-30% for cast)
  • ARROW-11437 - [Rust] Removed duplicated code in benches
  • ARROW-11438 - [Rust][DataFusion] Support literal boolean values in DataFusion SQL
  • ARROW-11439 - [Rust] Add year support to temporal kernels
  • ARROW-11440 - [Rust][DataFusion] Add method to CsvExec to get CSV schema
  • ARROW-11442 - [Rust] Expose datetime conversion logic independently
  • ARROW-11443 - [Rust] Write datetime information for Date64 Type in csv writer
  • ARROW-11444 - [Rust][DataFusion] Accept slices as parameters
  • ARROW-11446 - [DataFusion] Added support for scalarValue in Builtin functions.
  • ARROW-11447 - [Rust] Add shift kernel for primitive types
  • ARROW-11449 - [CI][R][Windows] Use ccache
  • ARROW-11457 - [Rust] Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11459 - [Rust] Added API to build ListArray of Primitives from an iterator
  • ARROW-11462 - [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX
  • ARROW-11463 - [Python] Expose “allow_64bit” to IpcWriteOptions in pyarrow.
  • ARROW-11466 - [Go][Flight] adding Basic Auth handling for go flight client and server
  • ARROW-11467 - [R] Fix reference to json_table_reader() in R docs
  • ARROW-11468 - [R] Allow user to pass schema to read_json_arrow()
  • ARROW-11474 - [C++] Update bundled re2 version
  • ARROW-11476 - [Rust][DataFusion] Test running of TPCH benchmarks in CI
  • ARROW-11477 - [R][Doc] Reorganize and improve README and vignette content
  • ARROW-11478 - [R] Consider ways to make arrow.skip_nul option more user-friendly
  • ARROW-11479 - [Rust][Parquet] Add Method to get compressed size of columns from row group metadata
  • ARROW-11481 - [Rust] More cast implementations
  • ARROW-11484 - [Rust][DataFusion] Derive Clone for ExecutionContext
  • ARROW-11486 - [Website] Use Jekyll 4 and webpack to support Ruby 3.0 or later
  • ARROW-11489 - [Rust][DataFusion] Make DataFrame be Send + Sync
  • ARROW-11491 - [Rust] support JSON schema inference for nested list and struct
  • ARROW-11493 - [CI][Packaging][deb][RPM] Test built packages
  • ARROW-11500 - [R] Allow bundled build script to run on Solaris
  • ARROW-11501 - [C++] endianness check does not work on Solaris
  • ARROW-11504 - [Rust] Added checks to List DataType.
  • ARROW-11505 - [Rust] Add support for LargeUtf8 in csv-writer
  • ARROW-11507 - [R] Bindings for GetRuntimeInfo
  • ARROW-11510 - [Python] Add note that pip >= 19.0 is required to get binary packages
  • ARROW-11511 - [Rust] Replace Arc<ArrayData> by ArrayData in all arrays
  • ARROW-11512 - [Packaging][deb] Add missing gRPC dependency for Ubuntu 21.04
  • ARROW-11513 - [R] Bindings for sub/gsub
  • ARROW-11516 - [R] Allow all C++ compute functions to be called by name in dplyr
  • ARROW-11539 - [Developer][Archery] Change items_per_seconds units
  • ARROW-11541 - [C++][Compute] Implement tdigest kernel
  • ARROW-11542 - [Rust] fix validity bitmap buffer length count in json reader
  • ARROW-11544 - [Rust][DataFusion] Implement as_any for AggregateExpr
  • ARROW-11545 - [Rust][DataFusion] SendableRecordBatchStream should implement Sync
  • ARROW-11556 - [C++] Assorted benchmark-related improvements
  • ARROW-11557 - [Rust][DataFusion] Add deregister_table
  • ARROW-11559 - [C++] Add regression file
  • ARROW-11559 - [C++] Use smarter Flatbuffers verification parameters
  • ARROW-11561 - [Rust][DataFusion] Add Send + Sync to MemTable::load
  • ARROW-11563 - [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))
  • ARROW-11568 - [C++][Compute] Rewrite mode kernel
  • ARROW-11570 - [Rust] ScalarValue - support Date64
  • ARROW-11571 - [CI] Cancel stale Github Actions workflow runs
  • ARROW-11572 - [Rust] Add a kernel for division by single scalar
  • ARROW-11573 - [Developer][Archery] Google benchmark now reports run type
  • ARROW-11574 - [Rust][DataFusion] Upgrade sqlparser to support parsing all TPC-H queries
  • ARROW-11575 - [Developer][Archery] Expose execution time in benchmark results
  • ARROW-11576 - [Rust] Fix unused variable in Rust code example
  • ARROW-11580 - [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11589 - [R] Add methods for modifying Schemas
  • ARROW-11590 - [C++] Move CSV background generator to IO thread pool
  • ARROW-11591 - [C++][Compute] Grouped aggregation
  • ARROW-11592 - [Rust] Fix typo in comment
  • ARROW-11594 - [Rust] Support pretty printing of NullArray
  • ARROW-11597 - [Rust] Split file in smaller ones.
  • ARROW-11598 - [Rust] Split buffer.rs in smaller files
  • ARROW-11599 - [Rust] Add function to create array with all nulls
  • ARROW-11601 - [C++][Python][Dataset] expose Parquet pre-buffer option
  • ARROW-11606 - [Rust][DataFusion] Add input schema to HashAggregateExec
  • ARROW-11610 - [C++] Download boost from sourceforge instead of bintray
  • ARROW-11611 - [C++] Update third party dependency mirrors
  • ARROW-11612 - [C++] Rebuild trimmed boost bundle for 1.75.0
  • ARROW-11613 - [R] Move nightly C++ builds off of bintray
  • ARROW-11616 - [Rust][DataFusion] Add collect_partitioned on DataFrame
  • ARROW-11621 - [CI][Gandiva][Linux] Fix Crossbow setup failure
  • ARROW-11626 - [Rust][DataFusion] Move DataFusion examples to own project
  • ARROW-11627 - [Rust] Make allocator be a generic over type T
  • ARROW-11637 - [CI][Conda] Update nightly clean target platforms and packages list
  • ARROW-11641 - [CI] Use docker buildkit's inline cache to reuse build cache across different hosts
  • ARROW-11649 - [R] Add support for null_fallback to R
  • ARROW-11651 - [Rust][DataFusion] Implement Postgres String Functions: Length Functions
  • ARROW-11653 - [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex
  • ARROW-11655 - [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
  • ARROW-11656 - [Rust][DataFusion] Remaining Postgres String functions
  • ARROW-11659 - [R] Preserve group_by .drop argument
  • ARROW-11662 - [C++] Support sorting decimal and fixed size binary data
  • ARROW-11664 - [Rust] cast to LargeUtf8
  • ARROW-11665 - [C++][Python] Improve docstrings for decimal and union types
  • ARROW-11666 - [Integration] Add endianness “gold” integration file for decimal256
  • ARROW-11667 - [Rust] Add documentation for utf8 comparison kernels
  • ARROW-11669 - [Rust][DataFusion] Remove concurrency field from GlobalLimitExec and SortExec
  • ARROW-11671 - [Rust][DataFusion] Clean up Expr doc comments and examples
  • ARROW-11677 - [C++][Docs] Add basic C++ datasets documentation
  • ARROW-11680 - [C++] Add vendored version of folly's spsc queue
  • ARROW-11683 - [R] Support dplyr::mutate()
  • ARROW-11685 - [C++] Fix typo: FutureStessTest -> FutureStressTest
  • ARROW-11688 - [Rust] Casts between Utf8 and LargeUtf8
  • ARROW-11690 - [Rust][DataFusion] Avoid expr copies while using builder methods
  • ARROW-11692 - [Rust][DataFusion] Improve OptimizerRule comments
  • ARROW-11693 - [C++] Add string length kernel
  • ARROW-11700 - [R] Internationalize error handling in tidy eval
  • ARROW-11701 - [R] Implement dplyr::relocate()
  • ARROW-11703 - [R] Implement dplyr::arrange()
  • ARROW-11704 - [R] Wire up dplyr::mutate() for datasets
  • ARROW-11707 - [Rust] support CSV schema inference without file IO
  • ARROW-11708 - [Rust] fix Rust 2021 linting warnings
  • ARROW-11709 - [Rust][DataFusion] Move expressions and inputs into LogicalPlan rather than helpers in util
  • ARROW-11710 - [Rust][DataFusion] Implement ExpressionRewriter
  • ARROW-11719 - [Rust][DataFusion] support creating memory table with merged schema
  • ARROW-11721 - [Rust] json schema inference to return Schema instead of SchemaRef
  • ARROW-11722 - [Rust] Improve error message in FFI cast.
  • ARROW-11724 - [C++] Resolve namespace collisions with protobuf 3.15
  • ARROW-11725 - [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow
  • ARROW-11727 - [C++][FlightRPC] Estimate latency quantiles with TDigest
  • ARROW-11730 - [C++] Add implicit convenience constructors for constructing Future from Status/Result
  • ARROW-11733 - [Rust][DataFusion] Implement hash partitioning
  • ARROW-11734 - [C++] vendored safe-math.h does not compile on Solaris
  • ARROW-11735 - [R] Allow Parquet and Arrow Dataset to be optional components
  • ARROW-11736 - [R] Allow string compute functions to be optional
  • ARROW-11737 - [C++] Patch vendored xxhash for Solaris
  • ARROW-11738 - [Rust][DataFusion] Fix Concat and Trim Functions
  • ARROW-11740 - [C++] posix_memalign not declared in scope on Solaris
  • ARROW-11742 - [Rust][DataFusion] Add Expr::is_null and Expr::is_not_nu…
  • ARROW-11744 - [C++] Add xsimd dependency
  • ARROW-11745 - [C++] Add helper to generate random record batches by schema
  • ARROW-11750 - [Python][Dataset] Add support for project expressions
  • ARROW-11752 - [R] Replace usage of testthat::expect_is()
  • ARROW-11753 - [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved
  • ARROW-11754 - [R] Support dplyr::compute()
  • ARROW-11761 - [C++] Increase public API testing
  • ARROW-11766 - [R] Better handling for missing compression codecs on Linux
  • ARROW-11768 - [CI][C++] Make s390x job required
  • ARROW-11773 - [Rust] Support writing well formed JSON arrays as well as newline delimited json streams
  • ARROW-11774 - [R] macos one line install
  • ARROW-11775 - [Rust][DataFusion] Feature Flags for Dependencies
  • ARROW-11777 - [Rust] impl AsRef for StringBuilder/BinaryBuilder
  • ARROW-11778 - [Rust] Cast from LargeUtf8 to Numerical and temporal types
  • ARROW-11779 - [Rust] make alloc module public
  • ARROW-11790 - [Rust][DataFusion][Expr]
  • ARROW-11794 - [Go] Add concurrent-safe ipc.FileReader.RecordAt(i)
  • ARROW-11795 - [MATLAB] Migrate MATLAB Interface for Apache Arrow design doc to Markdown
  • ARROW-11797 - [C++][Dataset] Provide batch stream Scanner methods
  • ARROW-11798 - [Integration] Update testing submodule
  • ARROW-11799 - [Rust] fix len of string and binary arrays created from unbound iterator
  • ARROW-11801 - [C++] Remove bad header guard in filesystem/type_fwd.h
  • ARROW-11803 - [Rust][Parquet] Support v2 LogicalType
  • ARROW-11806 - [Rust][DataFusion] Optimize join / inner join creation of indices
  • ARROW-11820 - [Rust] Added macro to create native types
  • ARROW-11822 - [Rust][DataFusion] Support case sensitive comparisons for functions and aggregates
  • ARROW-11824 - [Rust][Parquet] Use logical types in Arrow schema conversion
  • ARROW-11825 - [Rust][DataFusion] Add mimalloc as option to benchmarks
  • ARROW-11833 - [C++] Bump vendored fast_float
  • ARROW-11837 - [C++][Dataset] expose originating Fragment on ScanTask
  • ARROW-11838 - [C++] Support IPC reads with shared dictionaries.
  • ARROW-11839 - [C++] Use xsimd for generation of accelerated bit-unpacking
  • ARROW-11842 - [Rust][Parquet] Use clone_from in get_batch_with_dict
  • ARROW-11852 - [Docs] Update CONTRIBUTING to explain Contributor role
  • ARROW-11856 - [C++] Remove unused reference to RecordBatchStreamWriter
  • ARROW-11858 - [GLib][Gandiva] Add Gandiva::Filter and related functions
  • ARROW-11859 - [GLib][Ruby] Add garrow_array_concatenate()
  • ARROW-11861 - [R][Packaging] Apply changes in r/tools/autobrew upstream
  • ARROW-11864 - [R] Document arrow.int64_downcast option
  • ARROW-11870 - [Dev] Automatically run merge script in virtual environment
  • ARROW-11876 - [Website] Update governance page
  • ARROW-11877 - [C++] Add microbenchmark for SimplifyWithGuarantee
  • ARROW-11879 - [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan
  • ARROW-11883 - [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map
  • ARROW-11887 - [C++] Add asynchronous read to streaming CSV reader
  • ARROW-11894 - [Rust][DataFusion] Change flight server example to use DataFrame API
  • ARROW-11895 - [Rust][DataFusion] Add support for more column statistics
  • ARROW-11898 - [Rust] Pretty print columns
  • ARROW-11899 - [Java] Refactor the compression codec implementation into core/Arrow specific parts
  • ARROW-11900 - [Website] Add Yibo to committer list
  • ARROW-11906 - [R] Make FeatherReader print method more informative
  • ARROW-11907 - [C++] Use our own executor in S3FileSystem
  • ARROW-11910 - [Packaging][Ubuntu] Drop support for 16.04
  • ARROW-11911 - [Website] Add protobuf vs arrow to FAQ
  • ARROW-11912 - [R] Remove args from FeatherReader$create
  • ARROW-11913 - [Rust] Improve performance of StringBuilder by delaying bitmap creation
  • ARROW-11920 - [R] Remove r/libarrow when make cleaning
  • ARROW-11921 - [R] Set LC_COLLATE in r/data-raw/codegen.R
  • ARROW-11924 - [C++] Add streaming version of FileSystem::GetFileInfo
  • ARROW-11925 - [R] Add between method for arrow_dplyr_query
  • ARROW-11927 - [Rust][DataFusion] Support Limit push down optimization
  • ARROW-11931 - [Go] bump to go1.15
  • ARROW-11935 - [C++] Add push generator
  • ARROW-11944 - [Developer] Fix archery's comparison of cached benchmark runs
  • ARROW-11949 - [Ruby] Accept raw Ruby objects as sort key and options
  • ARROW-11951 - [Rust] Remove OffsetSize::prefix
  • ARROW-11952 - [Rust] Make ArrayData --> GenericListArray fallible instead of panic!
  • ARROW-11954 - [C++] arrow/util/io_util.cc does not compile on Solaris
  • ARROW-11955 - [Rust][DataFusion] Support Union
  • ARROW-11958 - [GLib] Add garrow_chunked_array_combine()
  • ARROW-11959 - [Rust][DataFusion] Fix log line
  • ARROW-11962 - [Rust][DataFusion] Improve DataFusion docs
  • ARROW-11969 - [Rust][DataFusion] Improve Examples in documentation
  • ARROW-11972 - [C++][R][Python][Dataset] Extract IPC/Parquet fragment scan options
  • ARROW-11973 - [Rust][DataFusion] Boolean kleene kernels
  • ARROW-11977 - [Rust] Add documentation examples for sort kernel
  • ARROW-11982 - [Rust] Donate Ballista Distributed Compute Platform
  • ARROW-11984 - [C++][Gandiva] Implement SHA1 and SHA256 functions
  • ARROW-11987 - [C++][Gandiva] Implement trigonometric functions
  • ARROW-11988 - [C++][Gandiva] Implements last_day function
  • ARROW-11992 - [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType
  • ARROW-11993 - [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
  • ARROW-11996 - [R] Make r/configure run successfully on Solaris
  • ARROW-11999 - [Java] Support parallel vector element search with user-specified comparator
  • ARROW-12000 - [Documentation] Add note about deviation from style guide on struct/classes
  • ARROW-12005 - [R] Fix a bash typo in configure
  • ARROW-12017 - [R][Documentation] Make proper developing arrow docs
  • ARROW-12019 - [Rust][Parquet] Update README for 2.6.0 support
  • ARROW-12020 - [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion
  • ARROW-12031 - [C++][CSV] infer CSV timestamps columns with fractional seconds
  • ARROW-12032 - [Rust] Optimize comparison kernels
  • ARROW-12034 - [Developer Tools] Formalize Minor PRs
  • ARROW-12037 - [Rust][DataFusion] Support catalogs and schemas for table namespacing
  • ARROW-12038 - [Rust][DataFusion] Upgrade hashbrown to 0.11
  • ARROW-12039 - [Nightly][Gandiva] Fix gandiva-jar-ubuntu nightly build failure
  • ARROW-12040 - [C++] Fix potential deadlock in recursive S3 walks
  • ARROW-12043 - [Rust][Parquet] Write FSB arrays
  • ARROW-12045 - [Go][Parquet] Initial Chunk of Parquet port to Go
  • ARROW-12047 - [Rust][Parquet] Cleanup clippy
  • ARROW-12048 - [Rust][DataFusion] Support Common Table Expressions
  • ARROW-12052 - [Rust] Add Child Data to Arrow's C FFI implementation. …
  • ARROW-12056 - [C++] Create sequencing AsyncGenerator
  • ARROW-12058 - [Python] Enable arithmetic operations on Expressions
  • ARROW-12068 - [Python] Stop using distutils
  • ARROW-12069 - [C++][Gandiva] Implement IN expressions for Decimal type
  • ARROW-12070 - [GLib] Drop support for GNU Autotools
  • ARROW-12071 - [GLib] Keep input stream reference of GArrowJSONReader
  • ARROW-12075 - [Rust][DataFusion] Add CTE + UNION ALL to supported list of SQL features
  • ARROW-12081 - [R] Bindings for utf8_length
  • ARROW-12082 - [R][Dataset] Allow create dataset from vector of file paths
  • ARROW-12094 - [C++][R] Fix re2 building on clang/libc++
  • ARROW-12097 - [C++] Modify BackgroundGenerator so it creates fewer threads
  • ARROW-12098 - [R] Catch cpp build failures on linux
  • ARROW-12104 - [Go][Parquet] Second chunk of Ported Go Parquet code
  • ARROW-12106 - [Rust][DataFusion] Support SELECT * from information_schema.tables
  • ARROW-12107 - [Rust][DataFusion] Support SELECT * from information_schema.columns
  • ARROW-12108 - [Rust][DataFusion] Implement SHOW TABLES
  • ARROW-12109 - [Rust][DataFusion] Implement SHOW COLUMNS
  • ARROW-12110 - [Java] Implement ZSTD compression
  • ARROW-12111 - [Java] Generate flatbuffer files using flatc 1.12.0
  • ARROW-12116 - [Rust] Fix and ignore 1.51 clippy lints
  • ARROW-12119 - [Rust][DataFusion] Improve performance of to_array_of_size for primitives
  • ARROW-12120 - [Rust] Generate random arrays and batches
  • ARROW-12121 - [Rust][Parquet] Arrow writer benchmarks
  • ARROW-12123 - [Rust][DataFusion] Use smallvec for indices for better join performance
  • ARROW-12128 - [CI][Crossbow] Remove test-ubuntu-16.04-cpp job
  • ARROW-12131 - [CI][GLib] Ensure upgrading MSYS2
  • ARROW-12133 - [C++][Gandiva] Add option to disable targeting host cpu during llvm ir compilation
  • ARROW-12134 - [C++] Add match_substring_regex kernel
  • ARROW-12136 - [Rust][DataFusion] Reduce default batch_size to 8192
  • ARROW-12139 - [Python][Packaging] Use vcpkg to build macOS wheels
  • ARROW-12141 - [R] Bindings for grepl
  • ARROW-12143 - [CI] R builds should timeout and fail after some threshold and dump the output.
  • ARROW-12146 - [C++][Gandiva] Implement CONVERT_FROM(expression, replacement char) function
  • ARROW-12151 - [Docs] Add Jira component + summary conventions to the docs
  • ARROW-12153 - [Rust][Parquet] Return file stats after writing file
  • ARROW-12160 - [Rust] Add into_inner() to StreamWriter
  • ARROW-12164 - [Java] Make BaseAllocator.Config public
  • ARROW-12165 - [Rust] inline append functions of builders
  • ARROW-12168 - [Go][IPC] Implement Compression handling for Arrow IPC
  • ARROW-12170 - [Rust][DataFusion] Introduce repartition optimization
  • ARROW-12173 - [GLib] Remove #include <config.h>
  • ARROW-12176 - [C++] Fix some typos of cpp examples
  • ARROW-12187 - [C++][FlightRPC] Add compression benchmark for stream writing
  • ARROW-12188 - [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
  • ARROW-12190 - [Rust][DataFusion] Implement parallel / partitioned hash join
  • ARROW-12192 - [Website] Use downloadable URL for archive download
  • ARROW-12193 - [Dev][Release] Use downloadable URL for archive download
  • ARROW-12194 - [Rust][Parquet] Bump zstd to v0.7
  • ARROW-12197 - [R] dplyr bindings for cast, dictionary_encode
  • ARROW-12200 - [R] Export and document list_compute_functions
  • ARROW-12204 - [Rust][CI] Reduce size of Rust build artifacts in integration test
  • ARROW-12206 - [Python][Docs] Fix Table docstrings
  • ARROW-12208 - [C++] Add the ability to run async tasks without using the CPU thread pool
  • ARROW-12210 - [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / Information Schema
  • ARROW-12214 - [Rust][DataFusion] Add tests for limit
  • ARROW-12215 - [C++] Allow null values in fixed-size binary columns read from CSV
  • ARROW-12217 - [C++] Cleanup cpp examples source files naming
  • ARROW-12222 - [Dev][Packaging] Include build url in the crossbow console report
  • ARROW-12224 - [Rust] Use stable rust for no default test, clean up CI tests
  • ARROW-12228 - [CI] Create base image for conda environments
  • ARROW-12236 - [R][CI] Add check that all docs pages are listed in _pkgdown.yml
  • ARROW-12237 - [Packaging][Debian] Add support for bullseye
  • ARROW-12238 - [JS] Remove trailing spaces and consistently add space after //
  • ARROW-12239 - [JS] Switch to yarn
  • ARROW-12242 - [Python][Doc] Tweak nightly build instructions
  • ARROW-12246 - [CI] Sync conda recipes with upstream feedstock
  • ARROW-12248 - [C++] Avoid looking up ARROW_DEFAULT_MEMORY_POOL environment variable too late
  • ARROW-12249 - [R][CI] Fix test-r-install-local nightlies
  • ARROW-12251 - [Rust] Add Ballista to CI
  • ARROW-12263 - [Dev][Packaging] Move Crossbow to Archery
  • ARROW-12269 - [JS] Move to eslint
  • ARROW-12274 - [JS] Document how to run tests without building bundles
  • ARROW-12277 - [Rust][DataFusion] Implement Sum/Count/Min/Max aggregates for Timestamp(,)
  • ARROW-12278 - [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type
  • ARROW-12280 - [Developer] Remove @-mentions from commit messages in merge tool
  • ARROW-12281 - [JS] Remove shx, trash, and rimraf and update lerna for yarn
  • ARROW-12283 - [R] Bindings for basic type convert functions in dplyr verbs
  • ARROW-12286 - [C++] Create AsyncGenerator from Future<AsyncGenerator>
  • ARROW-12287 - [C++] Create enumerating generator
  • ARROW-12288 - [C++] Create Scanner interface
  • ARROW-12289 - [C++] Create basic AsyncScanner implementation
  • ARROW-12303 - [JS] Use iterator instead of yield
  • ARROW-12304 - [R] Update news and polish docs for 4.0
  • ARROW-12305 - [JS] Update generate.py to python3 and new versions of pyarrow
  • ARROW-12309 - [JS] Make es2015 bundles the default
  • ARROW-12316 - [C++] Prefer mimalloc on Apple
  • ARROW-12317 - [Rust] JSON writer support for time, duration and date
  • ARROW-12320 - [CI] REPO arg missing from conda-cpp-valgrind
  • ARROW-12323 - [C++][Gandiva] Implement castTIME(timestamp) function
  • ARROW-12325 - [C++][CI] Nightly gandiva build failing due to failure of compiler to move return value
  • ARROW-12326 - [C++] Avoid needless c-ares detection
  • ARROW-12328 - [Rust][Ballista] Fix formatting
  • ARROW-12329 - [Rust][Ballista] Add Ballista README
  • ARROW-12332 - [Rust][Ballista] Add simple api server in scheduler
  • ARROW-12333 - [JS] Remove jest-environment-node-debug and do not emit from typescript by default
  • ARROW-12335 - [Rust][Ballista] Use latest DataFusion
  • ARROW-12337 - [Rust] add DoubleEndedIterator and ExactSizeIterator traits
  • ARROW-12351 - [CI][Ruby] Use ruby/setup-ruby instead of actions/setup-ruby
  • ARROW-12352 - [CI][R][Windows] Remove needless workaround for MSYS2
  • ARROW-12353 - [Packaging][deb] Rename -archive-keyring to -apt-source
  • ARROW-12354 - [Packaging][RPM] Use apache.jfrog.io/artifactory/ instead of apache.bintray.com/
  • ARROW-12356 - [Website] Update install page instructions to point to artifactory
  • ARROW-12361 - [Rust][DataFusion] Allow users to override physical optimization rules
  • ARROW-12367 - [C++] Stop producing when PushGenerator was destroyed
  • ARROW-12370 - [R] Bindings for power kernel
  • ARROW-12374 - [CI][C++][cron] Use Ubuntu 20.04 instead of 16.04
  • ARROW-12375 - [Release] Remove rebase post-release scripts
  • ARROW-12376 - [Dev] Log traceback for unexpected exceptions in archery trigger-bot
  • ARROW-12380 - [Rust][Ballista] Basic scheduler ui
  • ARROW-12381 - [Packaging][Python] macOS wheels are built with wrong package kind
  • ARROW-12383 - [JS] Upgrade dependencies
  • ARROW-12384 - [JS] Use let/const and clean up eslint rules
  • ARROW-12389 - [R][Docs] Add note about autocasting
  • ARROW-12395 - Create RunInSerialExecutor benchmark
  • ARROW-12396 - [Python][Docs] Clarify serialization/filesystem docstrings about deprecated status
  • ARROW-12397 - [Rust][DataFusion] Simplify readme example
  • ARROW-12398 - [Rust] remove redundant bound check in iterators
  • ARROW-12400 - [Rust] Re-enable tests in arrow::array::transform
  • ARROW-12402 - [Rust][DataFusion] Implement SQL metrics example
  • ARROW-12406 - [R] Fix checkbashism violation in configure
  • ARROW-12409 - [R] Remove LazyData from DESCRIPTION
  • ARROW-12419 - [Java] Remove to download flatc binary for s390x
  • ARROW-12420 - [C++][Dataset] Reading null columns as dictionary no longer possible
  • ARROW-12423 - [Docs] Remove Codecov badge
  • ARROW-12425 - [Rust] Fix new_null_array dictionary creation
  • ARROW-12432 - [Rust][DataFusion] Add metrics to SortExec
  • ARROW-12436 - [Rust][Ballista] Add watch capabilities to config backend trait
  • ARROW-12467 - [C++][Gandiva] Add support for LLVM12
  • ARROW-12477 - [Release] Download aarch64 miniforge
  • ARROW-12485 - [C++] Use mimalloc as the default memory allocator on macOS
  • ARROW-12488 - [GLib] Use g_memdup2() with GLib 2.68 or later
  • ARROW-12494 - [C++] ORC adapter fails to compile on GCC 4.8
  • ARROW-12506 - [Python] Improve modularity of pyarrow codebase to speedup compile time
  • ARROW-12652 - disable conda arm64 in nightly
  • PARQUET-1846 - [C++] Remove deprecated IO classes
  • PARQUET-1899 - [C++] Deprecated ReadBatchSpaced
  • PARQUET-1990 - [C++] Refuse to write ConvertedType::NA
  • PARQUET-1993 - [C++] expose way to wait for I/O to complete

Apache Arrow 3.0.0 (2021-01-25)

New Features and Improvements

  • ARROW-1846 - [C++][Compute] Implement “any” reduction kernel for boolean data
  • ARROW-4193 - [Rust] Add support for decimal data type
  • ARROW-4544 - [Rust] JSON nested struct reader
  • ARROW-4804 - [Rust] Parse Date32 and Date64 in CSV reader
  • ARROW-4960 - [R] Build r-arrow conda package in crossbow
  • ARROW-4970 - [C++][Parquet] Implement parquet::FileMetaData::Equals
  • ARROW-5336 - [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries
  • ARROW-5350 - [Rust] Allow filtering on simple lists
  • ARROW-5394 - [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types
  • ARROW-5679 - [Python][CI] Remove Python 3.5 support
  • ARROW-5950 - [Rust][DataFusion] Add logger
  • ARROW-6071 - [C++] Generic binary-to-binary casts
  • ARROW-6697 - [Rust] [DataFusion] Validate that all parquet partitions have the same schema
  • ARROW-6715 - [Website] Describe that the “non-free” component is needed for Plasma packages in install page
  • ARROW-6883 - [C++][Python] Allow writing dictionary deltas
  • ARROW-6995 - [Packaging][Crossbow] The windows conda artifacts are not uploaded to GitHub releases
  • ARROW-7531 - [C++] Reduce header inclusion cost slightly
  • ARROW-7800 - [Python] implement iter_batches() method for ParquetFile and ParquetReader
  • ARROW-7842 - [Rust][Parquet] Arrow list reader
  • ARROW-8113 - [C++] Lighter weight variant<>
  • ARROW-8199 - [C++] Add support for multi-column sort indices on Table
  • ARROW-8289 - [Rust] Parquet Arrow writer with nested support
  • ARROW-8423 - [Rust][Parquet] Serialize Arrow schema metadata
  • ARROW-8425 - [Rust][Parquet] Correct temporal IO
  • ARROW-8426 - [Rust][Parquet] Add more support for converting Dicts
  • ARROW-8426 - [Rust][Parquet] Add support for writing dictionary types
  • ARROW-8853 - [Rust][Integration Testing] Enable Flight tests
  • ARROW-8876 - [C++] Implement casts from date types to Timestamp
  • ARROW-8883 - [Rust][Integration] Enable more tests
  • ARROW-9001 - [R] Box outputs as correct type in call_function
  • ARROW-9164 - [C++] Add embedded documentation to compute functions
  • ARROW-9187 - [R] Add bindings for arithmetic kernels
  • ARROW-9296 - [Rust][DataFusion] Address clippy errors clippy::unnecessary_unwrap, clippy::useless_format,
  • ARROW-9304 - [C++] Add “AppendEmpty” builder APIs for use inside StructBuilder::AppendNull
  • ARROW-9361 - [Rust] Move array types into their own modules
  • ARROW-9367 - [Python] Sorting on pyarrow data structures?
  • ARROW-9400 - [Python] Do not depend on conda-forge static libraries in Windows wheel builds
  • ARROW-9475 - [Java] Clean up usages of BaseAllocator, use BufferAllocator in…
  • ARROW-9489 - [C++][string][string] )
  • ARROW-9555 - [Rust][DataFusion] Implement physical node for inner join
  • ARROW-9564 - [Packaging] Vendor r-arrow-feedstock conda-forge recipe
  • ARROW-9674 - [Rust] Make the parquet read and writers Send
  • ARROW-9704 - [Java] TestEndianness.testLittleEndian supports little- and big-endian platforms
  • ARROW-9707 - [Rust] [DataFusion] Re-implement threading model
  • ARROW-9709 - [Java] Test cases in arrow-vector take care of endianness
  • ARROW-9728 - [Rust][Parquet] Nested definition & repetition for structs
  • ARROW-9747 - [Java][C++] Initial Support for 256-bit Decimals
  • ARROW-9771 - [Rust][DataFusion] treat predicates separated by AND separately in predicate pushdown
  • ARROW-9803 - [Go] Add initial support for s390x
  • ARROW-9804 - [FlightRPC] Flight auth redesign
  • ARROW-9828 - [Rust][DataFusion] Support filter pushdown optimisation for TableProvider implementations
  • ARROW-9861 - [Java] Support big-endian in DecimalVector
  • ARROW-9862 - [Java] Enable UnsafeDirectLittleEndian on a big-endian platform
  • ARROW-9911 - [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output
  • ARROW-9945 - [C++][Dataset] Refactor Expression::Assume to return a Result
  • ARROW-9991 - [C++] Split kernels for strings/binary
  • ARROW-10002 - [Rust] Remove trait specialization from arrow crate
  • ARROW-10021 - [C++][Compute] Return top-n modes in mode kernel
  • ARROW-10032 - [Documentation] update C++ windows docs
  • ARROW-10079 - [Rust] Benchmark and improve count bits
  • ARROW-10095 - [Rust] Update rust-parquet-arrow-writer branch's encode_arrow_schema with ipc changes
  • ARROW-10097 - [C++] Persist SetLookupState in between usages of IsIn when filtering dataset batches
  • ARROW-10106 - [FlightRPC][Java] Expose onIsReady() callback
  • ARROW-10108 - [Rust] [Parquet] Fix compiler warning about unused return value
  • ARROW-10109 - [Rust] Add support to the C data interface for primitive types and utf8
  • ARROW-10110 - [Rust] Add support to consume C Data Interface
  • ARROW-10131 - [C++][Dataset][Python] Lazily parse parquet metadata
  • ARROW-10135 - [Rust][Parquet] Refactor file module to help adding sources
  • ARROW-10143 - [C++] Rewrite Array(Range)Equals
  • ARROW-10144 - [Flight] Add support for using the TLS_SNI extension
  • ARROW-10149 - [Rust] Add support to external release of un-owned buffers
  • ARROW-10163 - [Rust][DataFusion] Add DictionaryArray coercion support
  • ARROW-10168 - [Rust][Parquet] Schema roundtrip - use Arrow schema from Parquet metadata when available
  • ARROW-10173 - [Rust][DataFusion] Implement support for direct comparison to scalar values
  • ARROW-10180 - [C++][Doc] Update dependency management docs
  • ARROW-10182 - [C++] Add basic continuation support to Future
  • ARROW-10191 - [Rust][Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes
  • ARROW-10197 - [Python][Gandiva] Execute expression on filtered data
  • ARROW-10203 - [Doc] Give guidance on big-endian support in the contributors docs
  • ARROW-10207 - [C++] Allow precomputing output string/list offsets in kernels
  • ARROW-10208 - [C++] Fix split string kernels on sliced input
  • ARROW-10216 - [Rust] Simd implementation for primitive min/max kernels
  • ARROW-10224 - [Python] Add support for Python 3.9 except macOS wheel and Windows wheel
  • ARROW-10225 - [Rust][Parquet] Fix null comparison in roundtrip
  • ARROW-10228 - [Julia] Contribute Julia implementation
  • ARROW-10236 - [Rust] Add can_cast_types to arrow cast kernel, use in DataFusion
  • ARROW-10241 - [C++][Compute] Add variance kernel benchmark
  • ARROW-10249 - [Rust] Support nested dictionaries inside list arrays
  • ARROW-10259 - [Rust] Add custom metadata to Field
  • ARROW-10261 - [Rust][Breaking] Change List datatype to Box<Field>
  • ARROW-10263 - [C++][Compute] Improve variance kernel numerical stability
  • ARROW-10268 - [Rust] Write out non-nested dictionaries in the IPC format
  • ARROW-10269 - [Rust] Update to 2020-11-14 nightly
  • ARROW-10277 - [C++] Support comparing scalars approximately
  • ARROW-10289 - [Rust] Read dictionaries in IPC streams
  • ARROW-10292 - [Rust][DataFusion] Simplify merge
  • ARROW-10295 - [Rust][DataFusion] Replace Rc<RefCell<>> by Box<> in accumulators.
  • ARROW-10300 - [Rust] Improve documentation for TPC-H benchmark
  • ARROW-10301 - [C++][Compute] Implement “all” reduction kernel for boolean data
  • ARROW-10302 - [Python] Don't double-package plasma-store-server
  • ARROW-10304 - [C++][Compute] Optimize variance kernel for integers
  • ARROW-10310 - [C++][Gandiva] Add single argument round() in Gandiva
  • ARROW-10311 - [Release] Update crossbow verification process
  • ARROW-10313 - [C++] Faster UTF8 validation for small strings
  • ARROW-10318 - [C++] Use pimpl idiom in CSV parser
  • ARROW-10319 - [Go][Flight] Add context to flight client auth handler
  • ARROW-10320 - [Rust][DataFusion] Migrated from batch iterators to batch streams.
  • ARROW-10322 - [C++][Dataset] Minimize Expression
  • ARROW-10323 - [Release][wheel] Add missing verification setup step
  • ARROW-10325 - [C++][Compute] Refine aggregate kernel registration
  • ARROW-10328 - [C++] Vendor fast_float number parsing library
  • ARROW-10330 - [Rust][DataFusion] Implement NULLIF() SQL function
  • ARROW-10331 - [Rust][DataFusion] Re-organize DataFusion errors
  • ARROW-10332 - [Rust] Allow CSV reader to iterate from start up to end
  • ARROW-10334 - [Rust][Parquet] NullArray roundtrip
  • ARROW-10336 - [Rust] Added FromIter and ToIter for string arrays
  • ARROW-10337 - [C++] More liberal parsing of ISO8601 timestamps with fractional seconds
  • ARROW-10338 - [Rust] Use const fn for applicable methods
  • ARROW-10340 - [Packaging][deb][RPM] Use Python 3.8 for pygit2
  • ARROW-10356 - [Rust][DataFusion] Add support for is_in
  • ARROW-10363 - [Python] Remove CMake bug workaround in manylinux
  • ARROW-10366 - [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate
  • ARROW-10375 - [Rust] Removed PrimitiveArrayOps
  • ARROW-10378 - [Rust] Update take() kernel with support for LargeList.
  • ARROW-10381 - [Rust] Generalized Ordering for inter-array comparisons
  • ARROW-10382 - [Rust] Fix typos
  • ARROW-10383 - [Doc] fix typos
  • ARROW-10384 - [C++] Fix typos
  • ARROW-10385 - [C++][Gandiva] Add support for LLVM 11
  • ARROW-10389 - [Rust][DataFusion] Make the custom source implementation API more explicit
  • ARROW-10392 - [C++][Gandiva] Avoid string copy while evaluating IN expression
  • ARROW-10396 - [Rust][Parquet] Publicly export SliceableCursor and FileSource
  • ARROW-10398 - [Rust][Parquet] Re-Export parquet::record::api::Field
  • ARROW-10400 - [C++] Propagate TLS client peer_identity when using mutual TLS
  • ARROW-10402 - [Rust] Refactor array equality
  • ARROW-10407 - [C++] Add BasicDecimal256 division Support
  • ARROW-10408 - [Java] Bump Avro to 1.10.0
  • ARROW-10410 - [Rust] Some refactorings
  • ARROW-10416 - [R] Support Tables in Flight
  • ARROW-10422 - [Rust] Removed unused trait BinaryArrayBuilder
  • ARROW-10424 - [Rust] Minor simplification to the generic impl PrimitiveArray
  • ARROW-10428 - [FlightRPC][Java] Add support for HTTP cookies
  • ARROW-10445 - [Rust] Added doubleEnded iterator to PrimitiveArrayIter
  • ARROW-10449 - [Rust] Make Dictionary::keys be an array
  • ARROW-10454 - [Rust][DataFusion] support creating ParquetExec from filelist and schema
  • ARROW-10455 - [Rust][CI] Fixed error in caching files
  • ARROW-10458 - [Rust][DataFusion] create_logical_plan should not require mutable reference
  • ARROW-10464 - [Rust][DataFusion] Add utility to convert TPC-H data from tbl to CSV and Parquet
  • ARROW-10466 - [Rust] [Website] Update implementation status page
  • ARROW-10467 - [FlightRPC][Java] Add the ability to pass arbitrary client headers.
  • ARROW-10468 - [C++][Compute] Provide KernelExecutor instead of FunctionExecutor
  • ARROW-10476 - [Rust] Allow string arrays to be built from Option<&str> or Option<String>
  • ARROW-10477 - [Rust] Add iterator support for Binary arrays.
  • ARROW-10478 - [Dev][Release] Correct Java versions to 3.0.0-SNAPSHOT
  • ARROW-10481 - [R] Bindings to add, remove, replace Table columns
  • ARROW-10483 - [C++] Move Executor into a separate header
  • ARROW-10484 - [C++] Make Future<> more generic
  • ARROW-10487 - [FlightRPC][C++] Header-based auth in clients
  • ARROW-10490 - [C++][GLib] Fix range-loop-analysis warnings
  • ARROW-10492 - [Java][JDBC] Allow users to config the mapping between SQL types and Arrow types
  • ARROW-10504 - [C++] Suppress UBSAN pointer-overflow warning in RapidJSON
  • ARROW-10510 - [Rust][DataFusion] Benchmark COUNT(DISTINCT) queries.
  • ARROW-10515 - [Julia][Doc] Update lists of supported languages to include Julia
  • ARROW-10522 - [R] Allow rename Table and RecordBatch columns with names()
  • ARROW-10526 - [FlightRPC][C++] Client cookie middleware
  • ARROW-10530 - [R] Optionally use distro package in linuxlibs.R
  • ARROW-10531 - [Rust][DataFusion] Add schema and graphviz formatting for LogicalPlans and a PlanVisitor
  • ARROW-10539 - [Packaging][Python] Use GitHub Actions to build wheels for Windows
  • ARROW-10540 - [Rust] Extended filter kernel to all types and improved performance
  • ARROW-10541 - [C++] Add re2 library to core arrow / ARROW_WITH_RE2
  • ARROW-10542 - [C#][Flight] Add initial Flight code for .NET Core
  • ARROW-10543 - [Developer] Add a note about being patient after gitbox is enabled
  • ARROW-10552 - [Rust] Removed un-used Result
  • ARROW-10559 - [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules
  • ARROW-10561 - [Rust] Simplified Buffer's write and write_bytes and fixed undefined behavior
  • ARROW-10562 - [Rust] Potential UB on unsafe code
  • ARROW-10566 - [C++] Allow validating ArrayData directly
  • ARROW-10567 - [C++] Add multiple perf runs options for higher precision reporting
  • ARROW-10572 - [Rust][DataFusion] Use aHash instead of FnvHashMap
  • ARROW-10574 - [Python][Parquet] Allow collections for ‘in’ / ‘not in’ filter (in addition to sets)
  • ARROW-10575 - [Rust] Rename union.rs to be consistent with other arrays
  • ARROW-10581 - [Doc] IPC dictionary reference to relevant section
  • ARROW-10582 - [Rust][DataFusion] Implement “repartition” operator
  • ARROW-10584 - [Rust][DataFusion] Add SQL support for JOIN ON syntax
  • ARROW-10585 - [Rust][DataFusion] Add join support to DataFrame and LogicalPlan
  • ARROW-10586 - [Rust] [DataFusion] Add join support to query planner
  • ARROW-10589 - [Rust] Implement AVX-512 bit and operation
  • ARROW-10590 - [Rust] Remove Date32(Millisecond) from casts
  • ARROW-10591 - [Rust] Add support for StructArray to MutableArrayData
  • ARROW-10595 - [Rust] Simplify inner loop of min/max kernels for non-null case
  • ARROW-10596 - [Rust] Improve take benchmark
  • ARROW-10598 - [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance
  • ARROW-10604 - [GLib][Ruby] Add support for 256-bit decimal
  • ARROW-10607 - [C++][Parquet] Add parquet support for decimal256.
  • ARROW-10609 - [Rust] Optimize min/max of non null strings
  • ARROW-10628 - [Rust] flag clippy warnings as errors
  • ARROW-10633 - [Rust][DataFusion] Dependency version updates
  • ARROW-10634 - [C#][CI] Change the build version from 2.2 to 3.1 in CI
  • ARROW-10636 - [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet
  • ARROW-10637 - [Rust] Added examples to some boolean kernels.
  • ARROW-10638 - [Rust] Improved tests of boolean kernel.
  • ARROW-10639 - [Rust] Added examples to is_null kernel and simplified signature.
  • ARROW-10644 - [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs
  • ARROW-10646 - [C++][FlightRPC] Disable flaky Flight test on Windows
  • ARROW-10648 - [Java] Prepare Java codebase for source release without requiring any git tags to be created or pushed
  • ARROW-10651 - [C++] Fix alloc-dealloc-mismatch in S3-related factory
  • ARROW-10652 - [C++][Gandiva] Make gandiva cache size configurable
  • ARROW-10653 - [Rust] Update toolchain nightly
  • ARROW-10654 - [Rust] Specialize parsing of floats / bools in CSV Reader
  • ARROW-10660 - [Rust] Implement AVX-512 bit or operation
  • ARROW-10665 - [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike
  • ARROW-10666 - [Rust][DataFusion] Support nested SELECT statements.
  • ARROW-10669 - [C++][Compute] Support scalar arguments to Boolean compute functions
  • ARROW-10672 - [Rust][DataFusion] Made Limit be computed on the stream.
  • ARROW-10673 - [Rust][DataFusion] Made sort not collect on execute.
  • ARROW-10674 - [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests
  • ARROW-10677 - [Rust] Fix CSV Boolean parsing + add tests to demonstrate supported csv parsing
  • ARROW-10679 - [Rust][DataFusion] Implement CASE WHEN physical expression
  • ARROW-10680 - [Rust][DataFusion] Add partial support for TPC-H query 12
  • ARROW-10682 - [Rust] Improve sort kernel performance by enabling inlining of is_valid calls
  • ARROW-10685 - [Rust][DataFusion] Added support for Join on filter-pushdown optimizer.
  • ARROW-10688 - [Rust][DataFusion] Implement CASE WHEN logical plan
  • ARROW-10689 - [Rust][DataFusion] Add SQL support for CASE WHEN
  • ARROW-10693 - [Rust][DataFusion] Add support to left join
  • ARROW-10696 - [C++] Add SetBitRunReader
  • ARROW-10697 - [C++] Add notes about bitmap readers
  • ARROW-10703 - [Rust][DataFusion] Compute build-side of hash join once
  • ARROW-10704 - [Rust][DataFusion] Remove Nested from expression enum
  • ARROW-10708 - [Packaging][deb] Add support for Ubuntu 20.10
  • ARROW-10709 - [C++][Python] Allow PyReadableFile::Read() to call pyobj.read_buffer()
  • ARROW-10712 - [Rust][DataFusion] Add tests to TPC-H benchmarks
  • ARROW-10717 - [Rust][DataFusion] Add support for right join
  • ARROW-10720 - [C++] Add Rescale support for BasicDecimal256
  • ARROW-10721 - [C#][CI] Use .NET 3.1 by default
  • ARROW-10722 - [Rust][DataFusion] Reduce overhead of some data types in aggregations / joins, improve benchmarks
  • ARROW-10723 - [Packaging][deb][RPM] Enable Parquet encryption
  • ARROW-10724 - [Dev Tools] Added labeler to PRs that need rebase.
  • ARROW-10725 - [Python][Compute] Expose sort options in Python bindings
  • ARROW-10728 - [Rust][DataFusion] Support USING in SQL
  • ARROW-10729 - [Rust][DataFusion] Add SQL support for JOIN using implicit syntax
  • ARROW-10732 - [Rust][DataFusion] Integrate DFSchema as a step towards supporting qualified column names
  • ARROW-10733 - [R] Improvements to Linux installation troubleshooting
  • ARROW-10740 - [Rust][DataFusion] Remove redundant clones found by clippy
  • ARROW-10741 - [Rust] Apply previously ignored clippy suggestions
  • ARROW-10742 - [Python] Check mask when creating array from numpy
  • ARROW-10745 - [Rust] Directly allocate padding bytes in filter context
  • ARROW-10747 - [Rust] CSV reader optimization
  • ARROW-10750 - [Rust][DataFusion] Add SQL support for LEFT and RIGHT join
  • ARROW-10752 - [GLib] Add garrow_schema_has_metadata()
  • ARROW-10754 - [GLib] Add support for metadata to GArrowField
  • ARROW-10755 - [Rust][Parquet] Add support for writing boolean type
  • ARROW-10756 - [Rust][DataFusion] Fix redundant clones
  • ARROW-10759 - [Rust][DataFusion] Implement string to date cast
  • ARROW-10763 - [Rust] Speed up take for primitive / boolean for non-null arrays
  • ARROW-10765 - [Rust] Optimize take string for non-null arrays
  • ARROW-10767 - [Rust] Speed up sum with nulls (non-simd)
  • ARROW-10770 - [Rust] JSON nested list reader
  • ARROW-10772 - [Rust] Speed up take by writing to buffer
  • ARROW-10775 - [Rust][DataFusion] Use ahash in join hashmap
  • ARROW-10776 - [C++] Allow STL iteration over concrete primitive arrays
  • ARROW-10781 - [Rust][DataFusion] add the ‘Statistics’ interface in data source
  • ARROW-10783 - [Rust][DataFusion] Implement Statistics for Parquet TableProvider
  • ARROW-10785 - [Rust] Optimize take string
  • ARROW-10786 - [Packaging][RPM] Drop support for CentOS 6
  • ARROW-10788 - [C++] Make S3 recursive tree walks parallel
  • ARROW-10789 - [Rust][DataFusion] Make TableProvider dynamically typed
  • ARROW-10790 - [C++] Improve ChunkedArray and Table sort_indices performance
  • ARROW-10792 - [Rust][CI] Modularize builds for faster build and smaller caches
  • ARROW-10795 - [Rust] Optimize specialization for datatypes
  • ARROW-10796 - [C++] Implement optimized RecordBatch sorting
  • ARROW-10800 - [Rust][Parquet] Provide access to the elements of parquet::record::{List, Map}
  • ARROW-10802 - [C++] Remove Dictionary[NullType] special casing in parquet column writer
  • ARROW-10808 - [Rust][DataFusion] Support nested expressions in aggregations.
  • ARROW-10809 - [C++] Use Datum for SortIndices() input
  • ARROW-10812 - [Rust] Make BooleanArray not a PrimitiveArray
  • ARROW-10813 - [Rust][DataFusion] Implement DFSchema
  • ARROW-10814 - [Packaging][deb] Remove support for Debian GNU/Linux Stretch
  • ARROW-10817 - [Rust][DataFusion] Implement TypedString and DATE coercion
  • ARROW-10820 - [Rust][DataFusion] Complete TPC-H Benchmark Queries
  • ARROW-10821 - [Rust][DataFusion] support negative expression
  • ARROW-10822 - [Rust][DataFusion] add simd feature flag to datafusion
  • ARROW-10824 - [Rust] Added partialEq to null array
  • ARROW-10825 - [Rust] Added support for NullArray to MutableArrayData
  • ARROW-10826 - [Rust] Add support for FixedSizeBinaryArray to MutableArrayData
  • ARROW-10827 - [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)
  • ARROW-10828 - [Rust][DataFusion] Address / fix clippy lints
  • ARROW-10829 - [Rust][DataFusion] Implement Into<Schema> for DFSchema
  • ARROW-10832 - [Rust][Arrow] generate src/ipc/gen/* with latest snapshot flatc.
  • ARROW-10836 - [Rust] Extend take kernel to FixedSizeListArray
  • ARROW-10838 - [Rust][CI] Add arrow build targeting wasm32
  • ARROW-10839 - [Rust][DataFusion] Implement BETWEEN operator
  • ARROW-10843 - [C++] Add support for temporal types in sort family kernels
  • ARROW-10845 - [Python][CI] Build with nightly numpy and pandas artifacts
  • ARROW-10849 - [Python] Handle numpy deprecation warnings for builtin type aliases
  • ARROW-10851 - [C++] Reduce size of generated code for sort kernels
  • ARROW-10857 - [Packaging] Follow PowerTools repository name change on CentOS 8
  • ARROW-10858 - [C++] Add missing Boost dependency with Visual C++
  • ARROW-10861 - [Python] Update minimal NumPy version to 1.16.6
  • ARROW-10864 - [Rust] Use standard ordering for floats
  • ARROW-10865 - [Rust] Easier to use Schema -> DFSchema conversion
  • ARROW-10867 - [C++] Workaround gcc internal compiler error
  • ARROW-10869 - [GLib] Add garrow_*_sort_indices() and related options
  • ARROW-10870 - [Julia][Doc] Include Julia in project documentation
  • ARROW-10871 - [Julia][CI] Setup Julia testing via Github Actions
  • ARROW-10873 - [C++] Apple Silicon is reported as arm64 in CMake
  • ARROW-10874 - [Rust][DataFusion] Add statistics for MemTable, change statistics struct
  • ARROW-10877 - [Rust] [DataFusion] Add benchmark based on kaggle movies
  • ARROW-10878 - [Rust] Simplify extend_from_slice
  • ARROW-10879 - [Packaging][deb] Restore Debian GNU/Linux Buster support
  • ARROW-10881 - [C++] Fix EXC_BAD_ACCESS in PutSpaced
  • ARROW-10885 - [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows
  • ARROW-10887 - [Doc][C++] Document C++ IPC API
  • ARROW-10889 - [Rust][Proposal] Add guidelines about usage of unsafe
  • ARROW-10890 - [Rust] [DataFusion] JOIN support
  • ARROW-10891 - [Rust][DataFusion] Enable / fix clone_on_copy, map_clone, or_fun_call
  • ARROW-10893 - [Rust][DataFusion] More clippy lints
  • ARROW-10896 - [C++][CMake] Rename internal RE2 package name to “re2” from “RE2”
  • ARROW-10900 - [Rust][DataFusion] Resolve TableScan provider eagerly
  • ARROW-10904 - [Python][CI][Packaging] Add support for Python 3.9 macOS wheels
  • ARROW-10905 - [Python] Add support for Python 3.9 windows wheels
  • ARROW-10908 - [Rust][DataFusion] Update relevant tpch-queries with BETWEEN
  • ARROW-10917 - [Doc] Update feature matrix for Rust
  • ARROW-10918 - [Doc][C++] Document supported Parquet features
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10927 - [Rust][Parquet][REVERT]
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10929 - [Rust] Change CI to use Stable Rust
  • ARROW-10933 - [Rust] Update readme files in regard to nightly rust
  • ARROW-10934 - [Python] Skip filesystem tests for in-memory fs for fsspec 0.8.5
  • ARROW-10938 - [Rust] upgrade dependency “flatbuffers” to 0.8
  • ARROW-10940 - [Rust] Extend sort kernel to ListArray
  • ARROW-10941 - [Doc] Document supported Parquet encryption features
  • ARROW-10944 - [Rust] Implement min/max aggregate kernels for BooleanArray
  • ARROW-10946 - [Rust] Simplified bit chunk iterator
  • ARROW-10947 - [Rust][DataFusion] Optimize UTF8 to Date32 Conversion
  • ARROW-10948 - [C++] Always use GTestConfig.cmake
  • ARROW-10949 - [Rust] Removed un-needed clone
  • ARROW-10951 - [Python][CI] Fix nightly pandas builds (pytest monkeypatch issue)
  • ARROW-10952 - [Rust] Add pre-commit hook
  • ARROW-10966 - [C++] Use FnOnce for ThreadPool's tasks instead of std::function
  • ARROW-10968 - [Rust][DataFusion] Don't build hash table for right side of join
  • ARROW-10969 - [Rust][DataFusion] Implement basic String ANSI SQL Functions
  • ARROW-10985 - [Rust] Update unsafe guidelines for adding JIRA references
  • ARROW-10986 - [Rust][DataFusion] Add average stats to TPC-H benchmarks
  • ARROW-10988 - [C++] Require CMake 3.5 or later
  • ARROW-10989 - [Rust] Iterate primitive buffers by slice
  • ARROW-10993 - [CI][macOS] Fix Python 3.9 installation by Homebrew
  • ARROW-10995 - [Rust][DataFusion] Limit ParquetExec concurrency when reading large number of files
  • ARROW-11004 - [FlightRPC][Python] Header-based auth in clients
  • ARROW-11005 - [Rust] Remove indirection from take kernel
  • ARROW-11008 - [Rust][DataFusion] Simplify count accumulator
  • ARROW-11009 - [C++] Allow changing default memory pool with an environment variable
  • ARROW-11010 - [Python] `np.float` deprecation warning in `_pandas_logical_type_map`
  • ARROW-11012 - [Rust][DataFusion] Make write_csv and write_parquet concurrent
  • ARROW-11015 - [CI][Gandiva] Move gandiva nightly build from travis to github action
  • ARROW-11018 - [Rust][DataFusion] Add support for column-level statistics, null count.
  • ARROW-11026 - [Rust] Run tests without requiring environment variables
  • ARROW-11028 - [Rust] Make a few pattern matches more idiomatic
  • ARROW-11029 - [Rust][DataFusion] Add documentation for code that determines number of rows per operator
  • ARROW-11032 - [C++][FlightRPC] Benchmark unix socket RPC
  • ARROW-11033 - [Rust] Csv writing performance improvements
  • ARROW-11034 - [Rust] remove rustfmt ignore list, fix format
  • ARROW-11035 - [Rust] Improved performance of casting to utf8
  • ARROW-11037 - [Rust] Optimized creation of string array from iterator.
  • ARROW-11038 - [Rust] Removed unused trait and Result.
  • ARROW-11039 - [Rust] Performance improvement for utf-8 to float cast
  • ARROW-11040 - [Rust] Simplified builders
  • ARROW-11042 - [Rust][DataFusion] Increase default batch size
  • ARROW-11043 - [C++] Add “is_nan” kernel
  • ARROW-11046 - [Rust][DataFusion] Support count_distinct in DataFrame API
  • ARROW-11049 - [Python] Expose alternate memory pools
  • ARROW-11052 - [Rust][DataFusion] Implement metrics for HashJoinExec
  • ARROW-11053 - [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches
  • ARROW-11054 - [Rust][DataFusion] Move to sqlparser 0.7.0
  • ARROW-11055 - [Rust][DataFusion] Support date_trunc function
  • ARROW-11058 - [Rust][DataFusion] Implement coalesce batches operator
  • ARROW-11063 - [Rust][Breaking] Validate null counts when building arrays
  • ARROW-11064 - [Rust][DataFusion] Speed up hash join on smaller batches
  • ARROW-11072 - [Rust][Parquet] Support reading decimal from physical int types
  • ARROW-11076 - [Rust][DataFusion] Refactor usage of right indices in hash join
  • ARROW-11079 - [R] Catch up on changelog since 2.0
  • ARROW-11080 - [C++][Dataset] Improvements to implicit casting
  • ARROW-11082 - [Rust] C data interface to largeUTF8
  • ARROW-11086 - [Rust] Extend take implementation to more index types
  • ARROW-11091 - [Rust][DataFusion] Fix new clippy linting errors
  • ARROW-11095 - [Python] access pyarrow.RecordBatch field() and column() by string name
  • ARROW-11096 - [Rust][Large] binary
  • ARROW-11097 - [Rust] Minor simplification of some tests.
  • ARROW-11099 - [Rust] Remove unsafe value_slice and raw_values methods from primitive and boolean arrays
  • ARROW-11100 - [Rust] Speed up numeric to string cast using lexical_core
  • ARROW-11101 - [Rust] rewrite pre-commit hook
  • ARROW-11104 - [GLib] Add append_null/append_nulls to GArrowArrayBuilder and use them
  • ARROW-11105 - [Rust] Migrated MutableBuffer::freeze to From<MutableBuffer> for Buffer
  • ARROW-11109 - [GLib] Add garrow_array_builder_append_empty_value() and values()
  • ARROW-11110 - [Rust][Datafusion] ExecutionContext.table should take immutable reference
  • ARROW-11111 - [GLib] Add GArrowFixedSizeBinaryArrayBuilder
  • ARROW-11121 - [Developer] Use pull_request_target for PR JIRA integration
  • ARROW-11122 - [Rust] Added FFI support for date and time.
  • ARROW-11124 - [Doc] Update status matrix for Decimal256
  • ARROW-11125 - [Rust] Logical equality for list arrays
  • ARROW-11126 - [Rust] Document and test ARROW-10656
  • ARROW-11127 - [C++] ifdef unused cpu_info on non-x86 platforms
  • ARROW-11129 - [Rust][DataFusion] Use tokio for loading parquet
  • ARROW-11130 - [Website][CentOS 8][RHEL 8] Enable all required repositories by default
  • ARROW-11131 - [Rust] Improve performance of boolean_equal
  • ARROW-11136 - [R] Bindings for is.nan
  • ARROW-11137 - [Rust][DataFusion] Clippy needless_range_loop,needless_lifetimes
  • ARROW-11138 - [Rust][DataFusion] Add ltrim, rtrim to built-in functions
  • ARROW-11139 - [GLib] Add support for extension type
  • ARROW-11155 - [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI
  • ARROW-11158 - [Julia] Implement Decimal256 support for Julia
  • ARROW-11159 - [Developer] Consolidate pull request related jobs
  • ARROW-11165 - [Rust][DataFusion] Document Postgres as standard SQL dialect
  • ARROW-11168 - [Rust][Doc] Fix cargo doc warnings
  • ARROW-11169 - [Rust] Add a comment explaining where float total_order algorithm came from
  • ARROW-11175 - [R] Small docs fixes
  • ARROW-11176 - [R] Expose memory pool name and document setting it
  • ARROW-11187 - [Rust][Parquet] Fix build error by pinning a specific parquet-format-rs version
  • ARROW-11188 - [Rust] Support crypto functions from PostgreSQL dialect
  • ARROW-11193 - [Java][Documentation] Add Java ListVector Documentation
  • ARROW-11194 - [Rust] Enable packed_simd for aarch64
  • ARROW-11195 - [Rust] [DataFusion] Built-in table providers should expose relevant fields
  • ARROW-11196 - [GLib] Add support for mock, HDFS and S3 file systems with factory function
  • ARROW-11198 - [Packaging][Python] Ensure setuptools version during build supports markdown
  • ARROW-11200 - [Rust][DataFusion] Physical operators and expressions should have public accessor methods
  • ARROW-11201 - [Rust][DataFusion] create_batch_empty - support more types
  • ARROW-11203 - [Developer][Website] Enable JIRA and pull request integration
  • ARROW-11204 - [C++] Fix build failures with bundled gRPC and Protobuf
  • ARROW-11205 - [GLib][Dataset] Add GADFileFormat and its family
  • ARROW-11209 - [Rust][DataFusion] Better error message on unsupported GROUP BY
  • ARROW-11210 - [CI] Restore workflows that had been blocked by INFRA
  • ARROW-11212 - [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels
  • ARROW-11213 - [Packaging][Python] Dockerize wheel building on windows
  • ARROW-11215 - [CI] Use named volumes by default for caching in docker-compose
  • ARROW-11218 - [R] Make SubTreeFileSystem print method more informative
  • ARROW-11219 - [CI][Ruby][MinGW] Reduce CI time
  • ARROW-11221 - [Rust][DataFusion] Implement GROUP BY support for Float32/Float64
  • ARROW-11231 - [Packaging][deb][RPM] Add support for mimalloc
  • ARROW-11234 - [CI][Ruby][macOS] Reduce CI time
  • ARROW-11236 - [Java] Bump Jackson to 2.11.4
  • ARROW-11240 - [Packaging][R] Add mimalloc to R packaging
  • ARROW-11242 - [CI] Remove CMake 3.2 job
  • ARROW-11245 - [C++][Gandiva] Add support for LLVM 11.1
  • ARROW-11247 - [C++] Infer date32 columns in CSV
  • ARROW-11256 - [Packaging][Linux] Don't buffer packaging output
  • ARROW-11272 - [Release][wheel] Remove unsupported Python 3.5 and manylinux1
  • ARROW-11273 - [Release][deb] Remove unsupported Debian GNU/Linux stretch
  • ARROW-11278 - [Release][NodeJS] Don't touch ~/.bash_profile
  • ARROW-11280 - [Release][APT] Fix minimal build example check
  • ARROW-11281 - [C++] Remove needless runtime RapidJSON dependency
  • ARROW-11282 - [Packaging][deb] Add missing libgflags-dev dependency
  • ARROW-11285 - [Release][APT] Add support for Ubuntu Groovy
  • ARROW-11292 - [Release][JS] Use Node.JS LTS
  • ARROW-11293 - [C++] Don't require Boost and gflags with find_package(Arrow)
  • ARROW-11307 - [Release][Ubuntu][20.10] Add workaround for dependency issue
  • ARROW-11454 - [Website] [Rust] 3.0.0 Blog Post
  • PARQUET-1566 - [C++] Indicate if null count, distinct count are present in column statistics

Bug Fixes

  • ARROW-2616 - [Python] Cross-compiling Pyarrow
  • ARROW-6582 - [R] Arrow to R fails with embedded nuls in strings
  • ARROW-7363 - [Python] add combine_chunks method to ChunkedArray
  • ARROW-7909 - [Website] Add how to install on Red Hat Enterprise Linux
  • ARROW-8258 - [Rust] [Parquet] ArrowReader fails on some timestamp types
  • ARROW-9027 - [Python][Testing] Split parquet tests into multiple files + clean-up
  • ARROW-9479 - [JS] Fix Table.from for zero-item serialized tables, Table.empty for schemas containing compound types (List, FixedSizeList, Map)
  • ARROW-9636 - [Python] Update documentation about ‘LZO’ compression in parquet.write_table
  • ARROW-9690 - [Go] tests failing on s390x
  • ARROW-9776 - [R] read_feather causes segfault in R if file doesn't exist
  • ARROW-9897 - [C++][Gandiva] Added to_date function
  • ARROW-9897 - [C++][Gandiva] Revert - to_date function
  • ARROW-9898 - [C++][Gandiva] Fix linking issue with castINT/FLOAT functions
  • ARROW-9903 - [R] open_dataset freezes opening feather files on Windows
  • ARROW-9963 - [Python] Recognize datetime.timezone.utc as UTC on conversion python->pyarrow
  • ARROW-10039 - [Rust] Do not require memory alignment of buffers
  • ARROW-10042 - [Rust] Fix tests involving ArrayData/Buffer equality
  • ARROW-10080 - [R] Call gc() and try again in MemoryPool
  • ARROW-10122 - [Python] Fix to_pandas conversion with subset of columns and MultiIndex
  • ARROW-10145 - [C++][Dataset] Assert integer overflow in partitioning falls back to string
  • ARROW-10146 - [Python] Fix parquet FileMetadata.to_dict in case statistics is not set
  • ARROW-10174 - [Java] Fix reading/writing dict structs
  • ARROW-10177 - [CI][Gandiva] Nightly gandiva-jar-xenial fails
  • ARROW-10186 - [Rust] Tests fail when following instructions in README
  • ARROW-10247 - [C++][Dataset] Support writing datasets partitioned on dictionary columns
  • ARROW-10264 - [Python] Fix failing hdfs test
  • ARROW-10270 - [R] Fix CSV timestamp_parsers test on R-devel
  • ARROW-10283 - [Python] Define PY_SSIZE_T_CLEAN to deal with Python deprecation warning
  • ARROW-10293 - [Rust][DataFusion] Fixed benchmarks
  • ARROW-10294 - [Java] Resolve problems of DecimalVector APIs on ArrowBufs
  • ARROW-10298 - [Rust] Incorrect offset handling in iterator over dictionary keys
  • ARROW-10321 - [C++] Use check_cxx_source_compiles for AVX512 detect in compiler
  • ARROW-10333 - [Java] Get rid of org.apache.arrow.util in vector
  • ARROW-10345 - [C++][Compute] Fix NaN handling in sorting and topn kernels
  • ARROW-10346 - [Python] Ensure tests aren't affected by user-supplied AWS config
  • ARROW-10348 - [C++] Fix crash on invalid Parquet data
  • ARROW-10350 - [Rust] Fixes to publication metadata in Cargo.toml
  • ARROW-10353 - [C++] Fix handling of compression in Parquet data pages v2
  • ARROW-10358 - [R] Followups to 2.0.0 release
  • ARROW-10365 - [R] Remove duplicate setting of S3 flag on macOS
  • ARROW-10369 - [Dev] Fix archery release utility test cases
  • ARROW-10371 - [R] Linux system requirements check needs to support older cmake versions
  • ARROW-10386 - [R] List column class attributes not preserved in roundtrip
  • ARROW-10388 - [Java] Fix Spark integration build failure
  • ARROW-10390 - [Rust][Parquet] Ensure it is possible to create custom parquet writers
  • ARROW-10393 - [Rust] Apply fix for null reading in json reader for nested
  • ARROW-10394 - [Rust][Large] BinaryArray creation
  • ARROW-10397 - [C++] Update comment to match change made in b1a7a73ff2
  • ARROW-10399 - [R] Fix performance regression from cpp11::r_string
  • ARROW-10411 - [C++] Fix incorrect child array lengths for Concatenate of FixedSizeList
  • ARROW-10412 - [C++] Improve grpc_cpp_plugin detection
  • ARROW-10413 - [Rust][Parquet] Unignore some tests that are passing now
  • ARROW-10414 - [R] open_dataset doesn't work with absolute/expanded paths on Windows
  • ARROW-10426 - [C++] Allow writing large strings to Parquet
  • ARROW-10433 - [Python] Swapped the conditions for checking for fsspec filesystems
  • ARROW-10434 - [Rust] Fix debug formatting for arrays with lengths between 10 and 20.
  • ARROW-10441 - [Java] Prevent closure of shared channels for FlightClient
  • ARROW-10446 - [C++][Python] Roundtrip Timestamp ns with TzInfo correctly
  • ARROW-10448 - [Rust] Remove PrimitiveArray::new that can cause UB
  • ARROW-10453 - [Rust] [DataFusion] Performance degradation after removing specialization
  • ARROW-10461 - [Rust] Fix offset bug in remainder bits
  • ARROW-10462 - [Python] Fix usage of fsspec in ParquetDataset causing path issue on Windows
  • ARROW-10463 - [R] Better messaging for currently unsupported CSV options in open_dataset
  • ARROW-10470 - [R] Fix missing file error causing NYC taxi example to fail
  • ARROW-10471 - [CI][Python] Ensure we have tests with s3fs and run those on CI
  • ARROW-10472 - [Python] Test to confirm casting timestamp scalars to date type works
  • ARROW-10475 - [C++][FlightRPC] handle IPv6 hosts
  • ARROW-10480 - [Python] don't infer compression by extension for Parquet
  • ARROW-10482 - [Python] Fix compression per column in Parquet writing
  • ARROW-10491 - [FlightRPC][Java] Fix NPE when using makeContext
  • ARROW-10493 - [C++][Parquet] Fix offset lost in MaybeReplaceValidity
  • ARROW-10495 - [Packaging][deb] Move FindRE2.cmake to libarrow-dev
  • ARROW-10496 - [R][CI] Fix conda-r job
  • ARROW-10499 - [C++][Java] Fix ORC Java JNI Crash
  • ARROW-10502 - [C++/Python] CUDA detection messes up nightly conda-win builds
  • ARROW-10503 - [C++] Uriparser will not compile using Intel compiler
  • ARROW-10508 - [Java] Allow FixedSizeListVector to have empty children
  • ARROW-10509 - [C++] Define operator<<(ostream, ParquetException) for clang+Windows
  • ARROW-10511 - [Python] Fix to_pandas() conversion in case of metadata mismatch about timezone
  • ARROW-10518 - [C++][Gandiva] Adding NativeFunction::kCanReturnErrors to cast function in gandiva
  • ARROW-10519 - [Python] Fix deadlock when importing pandas from several threads
  • ARROW-10525 - [C++] Fix crash on unsupported IPC stream
  • ARROW-10532 - [Python] Fix metadata in Table.from_pandas conversion with specified schema with different column order
  • ARROW-10545 - [C++] Fix crash on invalid Parquet file (OSS-Fuzz)
  • ARROW-10546 - [Python] Deprecate DaskFileSystem/S3FSWrapper + stop using it internally
  • ARROW-10547 - [Rust][DataFusion] Do not lose Filters with UserDefined plan nodes
  • ARROW-10551 - [Rust] Fix unreproducible benches by seeding random number generator
  • ARROW-10558 - [Python] Fix python S3 filesystem tests interdependence
  • ARROW-10560 - [Python] Fix crash when creating array from huge string
  • ARROW-10563 - [Packaging][deb][RPM] Add missing dev package dependencies
  • ARROW-10565 - [Python] Table.from_batches and Table.from_pandas have argument Schema_schema in documentation instead of schema
  • ARROW-10568 - [C++][Parquet] Avoid crashing when OutputStream::Tell fails
  • ARROW-10569 - [C++] Improve table filtering performance
  • ARROW-10577 - [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests
  • ARROW-10578 - [C++] Comparison kernels crashing for string array with null string scalar
  • ARROW-10610 - [C++] Updated vendored fast_float version to latest
  • ARROW-10616 - [Developer] Expand PR labeler to all supported languages
  • ARROW-10617 - [Python] Fix RecordBatchStreamReader iteration with Python 3.8
  • ARROW-10619 - [C++] Fix IPC validation regressions
  • ARROW-10620 - [Rust][Parquet] move column chunk range logic to metadata.rs
  • ARROW-10621 - [Java] Put required libraries into the common directory
  • ARROW-10622 - [R] Nameof should not use “void” as the crib
  • ARROW-10623 - [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0
  • ARROW-10624 - [R] Proactively remove “problems” attributes
  • ARROW-10627 - [Rust] Loosen cfg restrictions for wasm32
  • ARROW-10629 - [CI] Fix MinGW Github Actions jobs
  • ARROW-10631 - [Rust] Fixed error in computing equality of fixed-sized binary.
  • ARROW-10642 - [R] Can't get Table from RecordBatchReader with 0 batches
  • ARROW-10656 - [Rust] Allow schema validation to ignore field names and only check data types on new batch
  • ARROW-10656 - [Rust] Use DataType comparison without values
  • ARROW-10661 - [C#] Fix benchmarking project
  • ARROW-10662 - [Java] Avoid integer overflow for Json file reader
  • ARROW-10663 - [C++] Fix is_in and index_in behaviour
  • ARROW-10667 - [Rust][Parquet] Add a convenience type for writing Parquet to memory
  • ARROW-10668 - [R] Support for the .data pronoun
  • ARROW-10681 - [Rust] [DataFusion] TPC-H Query 12 fails with scheduler error
  • ARROW-10684 - [Rust] Inherit struct nulls in child null equality
  • ARROW-10690 - [Java] Fix ComplexCopier bug for list vector
  • ARROW-10692 - [Rust] Removed undefined behavior derived from null pointers
  • ARROW-10694 - [Python] ds.write_dataset() generates empty files for each final partition
  • ARROW-10699 - [C++] Fix BitmapUInt64Reader on big endian
  • ARROW-10701 - [Rust] Fix sort_limit_query_sql benchmark
  • ARROW-10705 - [Rust] Loosen restrictions on some lifetime annotations
  • ARROW-10710 - [Rust] Revert tokio upgrade, go back to 0.2
  • ARROW-10711 - [CI] Remove set-env from auto-tune to work with new GHA settings
  • ARROW-10719 - [C#] ArrowStreamWriter doesn't write schema metadata
  • ARROW-10746 - [C++] Bump gtest version + use GTEST_SKIP in tests
  • ARROW-10748 - [Java][JDBC] Support consuming timestamp data when time zone is not available
  • ARROW-10749 - [C++] Incorrect string format for Datum with the collection type
  • ARROW-10751 - [C++] Add RE2 to minimal build example
  • ARROW-10753 - [Rust][DataFusion] Fix parsing of negative numbers in DataFusion
  • ARROW-10757 - [Rust][CI] Fix CI failures
  • ARROW-10760 - [Rust][DataFusion] Fixed error in filter push down over joins
  • ARROW-10769 - [Rust] Use DataType comparison without values
  • ARROW-10774 - [R] Set minimum cpp11 version
  • ARROW-10777 - [Packaging][Python] Build sdist by Crossbow
  • ARROW-10778 - [Python] Fix RowGroupInfo.statistics for empty row groups
  • ARROW-10779 - [Java] Fix writeNull method in UnionListWriter
  • ARROW-10780 - [R] Update known R installation issues for CentOS 7
  • ARROW-10791 - [Rust] StreamReader, read_dictionary duplicating schema info
  • ARROW-10801 - [Rust][Flight] Support sending FlightData for Dictionaries with that of a RecordBatch
  • ARROW-10803 - Support R >= 3.3 and add CI
  • ARROW-10804 - [Rust] Removed some unsafe code from the parquet crate
  • ARROW-10807 - [Rust][DataFusion] Avoid double hashing
  • ARROW-10810 - [Rust] Improve comparison kernels performance
  • ARROW-10811 - [R][CI] Remove nightly centos6 build
  • ARROW-10823 - [Rust] Fixed error in MutableArrayData
  • ARROW-10830 - [Rust] avoid hard crash in json reader
  • ARROW-10833 - [Python] Allow pyarrow to be compiled on NumPy <1.16.6 and work on 1.20+
  • ARROW-10834 - [R] Fix print method for SubTreeFileSystem
  • ARROW-10837 - [Rust][DataFusion] Use Vec<u8> for hash keys
  • ARROW-10840 - [C++] FileMetaData does not have key_value_metadata when built from FileMetaDataBuilder
  • ARROW-10842 - [Rust] decouple IO from json reader, fix crash during json schema inference with invalid json
  • ARROW-10844 - [Rust][DataFusion] Allow joins after a table registration
  • ARROW-10850 - [R] Unrecognized compression type: LZ4
  • ARROW-10852 - [C++] AssertTablesEqual(verbose=true) segfaults if the le…
  • ARROW-10854 - [Rust][DataFusion] Simplify logical plan scans
  • ARROW-10855 - [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1
  • ARROW-10856 - [R] CC and CXX environment variables passing to cmake
  • ARROW-10859 - [Rust][DataFusion] Made collect not require ExecutionContext
  • ARROW-10860 - [Java] Avoid integer overflow for generated classes in Vector
  • ARROW-10863 - [Python] Fix pandas skip in ExtensionArray.to_pandas test
  • ARROW-10863 - [Python] Fix ExtensionArray.to_pandas to use underlying storage array
  • ARROW-10875 - [Rust] simplify simd cfg check with cfg_aliases
  • ARROW-10876 - [Rust] validate row value type in json reader
  • ARROW-10897 - [Rust] Removed level of indirection.
  • ARROW-10907 - [Rust] Fix Cast UTF8 to Date64
  • ARROW-10913 - [Python][Doc] Code block typo in filesystems docs
  • ARROW-10914 - [Rust] Refactor simd arithmetic kernels to use chunked iteration
  • ARROW-10915 - [Rust] README.md: set the Env vars as absolute dirs; several minor fixes.
  • ARROW-10921 - `TypeError: ‘coroutine’ object is not iterable` when reading parquet partitions via s3fs >= 0.5 with pyarrow
  • ARROW-10930 - [Python] Add value_field property to LargeListType / FixedSizeListType
  • ARROW-10932 - [C++] BinaryMemoTable::CopyOffsets access out-of-bound address when data is empty
  • ARROW-10942 - [C++] Fix S3FileSystem::Impl::IsEmptyDirectory on Amazon
  • ARROW-10943 - [Rust][Parquet] Always init new RleDecoder
  • ARROW-10954 - [C++][Doc] PlasmaClient is threadSafe now
  • ARROW-10955 - [C++] Fix JSON reading of list(null) values
  • ARROW-10960 - [C++][FlightRPC] Default to empty buffer instead of null
  • ARROW-10962 - [FlightRPC][Java] fill in empty body buffer if needed
  • ARROW-10967 - [Rust] Add functions for test data to mod arrow::util::test_util
  • ARROW-10990 - [Rust] Refactor simd comparison kernels to avoid out of bounds reads
  • ARROW-10994 - [Rust][DataFusion] Add support for compression when writing Parquet files
  • ARROW-10996 - [Rust][Parquet] change return value type of get_arrow_schema_from_metadata()
  • ARROW-10999 - [Rust][Benchmarks] Use signed ints for TPC-H schema
  • ARROW-11014 - [Rust][DataFusion] Use correct statistics for ParquetExec
  • ARROW-11023 - [C++][CMake] Fix gRPC build issue
  • ARROW-11024 - [Python] Add test for List data Parquet roundtrip
  • ARROW-11025 - [Rust] Fixed bench for binary boolean kernels
  • ARROW-11030 - [Rust][DataFusion] Concatenate left side batches to single batch in HashJoinExec
  • ARROW-11048 - [Rust] Add bench to MutableBuffer
  • ARROW-11050 - [R] Handle RecordBatch in write_parquet()
  • ARROW-11067 - [C++] Fix CSV null detection on large values
  • ARROW-11069 - [C++] Parquet writer incorrect data being written when data type is struct
  • ARROW-11073 - [Rust] fix lint error in /arrow/rust/arrow/src/ipc/reader.rs
  • ARROW-11083 - [CI] Ensure using Ubuntu 20.04 for dev.yml:release job
  • ARROW-11084 - [Rust] Fixed clippy
  • ARROW-11085 - [Rust] Migrated from action-rs to shell in github actions.
  • ARROW-11092 - [CI] (Temporarily) move offending workflows to separate files
  • ARROW-11102 - [Rust][DataFusion] fmt::Debug for ScalarValue(Utf8) is always quoted
  • ARROW-11113 - [Rust] support as_struct_array cast
  • ARROW-11114 - [Java] Fix Schema and Field metadata JSON serialization
  • ARROW-11132 - [CI] Use pip to install crossbow's dependencies for the comment bot
  • ARROW-11144 - [CI][C++][Python] Move to newer Hadoop version
  • ARROW-11152 - [CI][C++] Fix Homebrew numpy installation on macOS builds
  • ARROW-11162 - [C++][Parquet] Fix invalid cast on Decimal256 Parquet data
  • ARROW-11163 - [C++] Fix reading of compressed IPC/Feather files written with Arrow 0.17
  • ARROW-11166 - [Python] Add binding for ProjectOptions
  • ARROW-11171 - [Go] Fix building on s390x with noasm
  • ARROW-11189 - [Developer] support benchmark diff between JSONs
  • ARROW-11190 - [C++] Clean up compiler warnings
  • ARROW-11202 - [R][CI] Nightly builds not happening (or artifacts not exported)
  • ARROW-11224 - [R] don't test metadata serialization on old R versions
  • ARROW-11226 - [Python] Skip/workaround failing filesystem test with s3fs 0.5
  • ARROW-11227 - [Python] Fix to_pandas with ExtensionArray tests for pandas 0.24
  • ARROW-11229 - [C++][Dataset] Fix static build failure
  • ARROW-11230 - [R] Fix build failures on Windows when multiple libarrow binaries found
  • ARROW-11232 - [C++] Make Table::CombineChunks() handle table with zero column correctly
  • ARROW-11233 - [C++][Flight] Fix link error with bundled gRPC and Abseil
  • ARROW-11237 - [C++] Restore DCHECK definitions after GLog
  • ARROW-11250 - [Python] Inconsistent behavior calling ds.dataset()
  • ARROW-11251 - [CI] Make sure that devtoolset-8 is really installed + being used
  • ARROW-11253 - [R] Make sure that large metadata tests are reproducible
  • ARROW-11255 - [Packaging][Conda][macOS] Fix Python version
  • ARROW-11257 - [C++][Parquet] PyArrow Table contains different data after writing and reloading from Parquet
  • ARROW-11271 - [Rust][Parquet] Fix parquet list schema null conversion
  • ARROW-11274 - [Packaging][wheel][Windows] Fix wheels path for Gemfury
  • ARROW-11275 - [Packaging][wheel][Linux] Fix paths for Gemfury
  • ARROW-11283 - [Julia] Update Julia install link for 3.0 release
  • ARROW-11286 - [Release][Yum] Fix minimal build example check
  • ARROW-11287 - [Packaging][RPM] Add missing dependencies
  • ARROW-11301 - [C++] Fix reading Parquet LZ4-compressed files produced by Hadoop
  • ARROW-11302 - [Release][Python] Remove verification of python 3.5 wheel on macOS
  • ARROW-11306 - [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
  • ARROW-11363 - C++ Library Build Failure with gRPC 1.34+
  • ARROW-11390 - [Python] pyarrow 3.0 issues with turbodbc
  • ARROW-11445 - Type conversion failure on numpy 0.1.20
  • ARROW-11450 - [Python] pyarrow<3 incompatible with numpy>=1.20.0
  • ARROW-11487 - [Python] Can't create array from Categorical with numpy 1.20
  • ARROW-11835 - [Python] PyArrow 3.0/Pip installation errors on Big Sur.
  • ARROW-12399 - Unable to load libhdfs
  • PARQUET-1935 - [C++] Fix bug in WriteBatchSpaced

Apache Arrow 2.0.0 (2020-10-19)

Bug Fixes

  • ARROW-2367 - [Python] ListArray has trouble with sizes greater than kMaximumCapacity
  • ARROW-4189 - [Rust] Added coverage report.
  • ARROW-4917 - [C++] orc_ep fails in cpp-alpine docker
  • ARROW-5578 - [C++][Flight] Flight does not build out of the box on Alpine Linux
  • ARROW-7226 - [Python][Doc] Add note re: JSON format support
  • ARROW-7384 - [Website] Fix search indexing warning reported by Google
  • ARROW-7517 - [C++] Builder does not honour dictionary type provided during initialization
  • ARROW-7663 - [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table
  • ARROW-7903 - [Rust][DataFusion] Migrated to sqlparser 0.6.1
  • ARROW-7957 - [Python] Handle new FileSystem in ParquetDataset by automatically using new implementation
  • ARROW-8265 - [Rust] [DataFusion] Table API collect() should not require context
  • ARROW-8394 - [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+
  • ARROW-8735 - [Rust][Parquet] Allow arm 32 to use soft hash implementation
  • ARROW-8749 - [C++] IpcFormatWriter writes dictionary batches with wrong ID
  • ARROW-8773 - [Python] Preserve nullability of fields in schema.empty_table()
  • ARROW-9028 - [R] Should be able to convert an empty table
  • ARROW-9096 - [Python] Pandas roundtrip with dtype=“object” underlying numeric column index
  • ARROW-9177 - [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
  • ARROW-9414 - [Packaging][deb][RPM] Enable S3
  • ARROW-9462 - [Go] The Indentation after the first Record in arrjson writer is incorrect
  • ARROW-9463 - [Go] Make arrjson Writer close idempotent
  • ARROW-9490 - [Python][C++] Bug in pa.array when input mixes int8 with float
  • ARROW-9495 - [C++] Equality assertions don't handle Inf / -Inf properly
  • ARROW-9520 - [Rust][DataFusion] Add support for aliased aggregate exprs
  • ARROW-9528 - [Python] Honor tzinfo when converting from datetime
  • ARROW-9532 - [Python][Doc] Use Python3_EXECUTABLE instead of PYTHON_EXECUTABLE for finding Python executable
  • ARROW-9535 - [Python] Remove symlink fixes from conda recipe
  • ARROW-9536 - [Java] Miss parameters in PlasmaOutOfMemoryException.java
  • ARROW-9541 - [C++] CMakeLists requires UTF8PROC_STATIC when building static library
  • ARROW-9544 - [R] Fix version argument of write_parquet()
  • ARROW-9546 - [Python] Clean up Pandas Metadata Conversion test
  • ARROW-9548 - [Go] Test output files are not removed correctly
  • ARROW-9549 - [Rust] Fixed version in dependency in parquet.
  • ARROW-9554 - [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result
  • ARROW-9556 - [Python][C++] Segfaults in UnionArray with null values
  • ARROW-9560 - [Packaging] Add required conda-forge.yml
  • ARROW-9569 - [CI][R] Fix rtools35 builds for msys2 key change
  • ARROW-9570 - [Doc] Clean up sphinx sidebar
  • ARROW-9573 - [Python][Dataset] Provide read_table(ignore_prefixes=)
  • ARROW-9574 - [R] Cleanups for CRAN 1.0.0 release
  • ARROW-9575 - [R] gcc-UBSAN failure on CRAN
  • ARROW-9577 - [C++] Ignore EBADF error in posix_madvise()
  • ARROW-9583 - [Rust] Fix offsets in result of arithmetic kernels
  • ARROW-9588 - [C++] Partially support building with clang in an MSVC setting
  • ARROW-9589 - [C++/R] Forward declare structs as structs
  • ARROW-9592 - [CI] Update homebrew before calling brew bundle
  • ARROW-9596 - [CI][Crossbow] Fix homebrew-cpp again, again
  • ARROW-9597 - [C++] AddAlias in compute::FunctionRegistry should be synchronized
  • ARROW-9598 - [C++][Parquet] Fix writing nullable structs
  • ARROW-9599 - [CI] Appveyor toolchain build fails because CMake detects different C and C++ compilers
  • ARROW-9600 - [Rust] pin proc macro
  • ARROW-9600 - [Rust][Arrow] pin older version of proc-macro2 during build
  • ARROW-9602 - [R] Improve cmake detection in Linux build
  • ARROW-9603 - [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs
  • ARROW-9606 - [C++][Dataset] Support "a"_.In(<>).Assume(<compound>)
  • ARROW-9609 - [C++][Dataset] CsvFileFormat reads all virtual columns as null
  • ARROW-9621 - [Python] Skip test_move_file for in-memory fsspec filesystem
  • ARROW-9622 - [Java] Fixed UnsupportedOperationException in complexcopier with null value in unionvector inside st…
  • ARROW-9628 - [Rust] Disable artifact caching for Mac OSX builds
  • ARROW-9629 - [Python] Fix kartothek integration tests by fixing dependencies
  • ARROW-9631 - [Rust] Make arrow not depend on flight
  • ARROW-9631 - [Rust] flight should depend on arrow, not the other way around
  • ARROW-9642 - [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices
  • ARROW-9643 - [C++] Only register the SIMD variants when it's supported.
  • ARROW-9644 - [C++][Dataset] Don't apply ignore_prefixes to partition base_dir
  • ARROW-9652 - [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs
  • ARROW-9653 - [Rust][DataFusion] Do not error in planner when SQL has multiple group by expressions
  • ARROW-9659 - [C++] Fix RecordBatchStreamReader when source is CudaBufferReader
  • ARROW-9660 - [C++] Revamp dictionary association in IPC
  • ARROW-9666 - [Python][wheel][Windows] Fix wheel build for Windows
  • ARROW-9670 - [C++][FlightRPC] don't hang if Close and Read called simultaneously
  • ARROW-9676 - [R] Error converting Table with nested structs
  • ARROW-9684 - [C++] Fix undefined behaviour on invalid IPC / Parquet input
  • ARROW-9692 - [Python] Fix distutils-related warning
  • ARROW-9693 - [CI][Docs] Nightly docs build fails
  • ARROW-9696 - [Rust][DataFusion] fix nested binary expressions
  • ARROW-9698 - [C++] Remove -DNDEBUG flag leak in .pc file
  • ARROW-9700 - [Python] fix create_library_symlinks for macos
  • ARROW-9712 - [Rust][DataFusion] Fix parquet error handling and general code improvements
  • ARROW-9714 - [Rust][DataFusion] Implement type coercion rule for limit and sort
  • ARROW-9716 - [Rust][DataFusion] Implement limit on concurrent threads in MergeExec
  • ARROW-9726 - [Rust][DataFusion] Do not create parquet reader thread until execute is called
  • ARROW-9727 - [C++] Fix crashes on invalid IPC input (OSS-Fuzz)
  • ARROW-9729 - [Java] Disable Error Prone when project is imported into …
  • ARROW-9733 - [Rust][DataFusion] Added support for COUNT/MIN/MAX on string columns
  • ARROW-9734 - [Rust][DataFusion] TableProvider.scan now returns partitions instead of iterators
  • ARROW-9741 - [Rust] [DataFusion] Incorrect count in TPC-H query 1 result set
  • ARROW-9743 - [R] Sanitize paths in open_dataset
  • ARROW-9744 - [Python] Fix build failure on aarch64
  • ARROW-9764 - [CI][Java] Fix wrong image name for push
  • ARROW-9768 - [Python] Check overflow in conversion of datetime objects to nanosecond timestamps
  • ARROW-9768 - [Rust][DataFusion] Rename PhysicalPlannerImpl to DefaultPhysicalPlanner
  • ARROW-9778 - [Rust][DataFusion] Implement Expr.nullable() and make consistent between logical and physical plans
  • ARROW-9783 - [Rust][DataFusion] Remove aggregate expression data type
  • ARROW-9785 - [Python] Fix excessively slow S3 options test
  • ARROW-9789 - [C++] Don't install jemalloc in parallel
  • ARROW-9790 - [Rust][Parquet] Increase test coverage in arrow_reader.rs
  • ARROW-9790 - [Rust][Parquet] Fix PrimitiveArrayReader boundary conditions
  • ARROW-9793 - [Rust][DataFusion] Fixed unit tests
  • ARROW-9797 - [Rust] AMD64 Conda Integration Tests is failing for the Master branch
  • ARROW-9799 - [Rust] [DataFusion] Implementation of physical binary expression get_type method is incorrect
  • ARROW-9800 - [Rust][Parquet] Remove println! when writing column statistics
  • ARROW-9801 - DictionaryArray with non-unique values are silently corrupted when written to a Parquet file
  • ARROW-9809 - [Rust][DataFusion] Fixed type coercion, supertypes and type checking.
  • ARROW-9814 - [Python] Fix crash in test_parquet::test_read_partitioned_directory_s3fs
  • ARROW-9815 - [Rust][DataFusion] Remove the use of Arc/Mutex to protect plan time structures
  • ARROW-9815 - [Rust][DataFusion] Add a trait for looking up scalar functions by name
  • ARROW-9815 - [Rust][DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.
  • ARROW-9816 - [C++] Escape quotes in config.h
  • ARROW-9827 - [C++][Dataset] Skip parsing RowGroup metadata statistics when there is no filter
  • ARROW-9831 - [Rust][DataFusion] Fixed compilation error
  • ARROW-9840 - [Python] fs documentation out of date with code (FileStats -> FileInfo)
  • ARROW-9846 - [Rust] Master branch broken build
  • ARROW-9851 - [C++] Disable AVX512 runtime paths with Valgrind
  • ARROW-9852 - [C++] Add more IPC fuzz regression files
  • ARROW-9852 - [C++] Validate dictionaries fully when combining deltas
  • ARROW-9855 - [R] Fix bad merge/Rcpp conflict
  • ARROW-9859 - [C++] Decode username and password in URIs
  • ARROW-9864 - [Python] Support pathlib.Path in pq.write_to_dataset
  • ARROW-9874 - [C++] Add sink-owning version of IPC writers
  • ARROW-9876 - [C++] Faster ARM build on Travis-CI
  • ARROW-9877 - [C++] Fix homebrew-cpp build fail on AVX512
  • ARROW-9879 - [Python] Add support for numpy scalars to ChunkedArray.__getitem__
  • ARROW-9882 - [C++/Python] Update OSX build to conda-forge-ci-setup=3
  • ARROW-9883 - [R] Fix linuxlibs.R install script for R < 3.6
  • ARROW-9888 - [Rust][DataFusion] Allow ExecutionContext to be shared between threads (again)
  • ARROW-9889 - [Rust][DataFusion] Implement physical plan for EmptyRelation
  • ARROW-9906 - [C++] Keep S3 filesystem alive through open file objects
  • ARROW-9913 - [C++] Make outputs of Decimal128::FromString independent of the presence of one another.
  • ARROW-9920 - [Python] Validate input to pa.concat_arrays() to avoid segfault
  • ARROW-9922 - [Rust] Add StructArray::TryFrom (+40%)
  • ARROW-9924 - [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans
  • ARROW-9931 - [C++] Fix undefined behaviour on invalid IPC input
  • ARROW-9932 - [R] Arrow 1.0.1 R package fails to install on R3.4 over linux
  • ARROW-9936 - [Python] Fix / test relative file paths in pyarrow.parquet
  • ARROW-9937 - [Rust][DataFusion] Improved aggregations
  • ARROW-9943 - [C++] Recursively apply Arrow metadata when reading from Parquet
  • ARROW-9946 - [R] Check sink argument class in ParquetFileWriter
  • ARROW-9953 - [R] Declare minimum version for bit64
  • ARROW-9962 - [Python] Fix conversion to_pandas with tz-aware index column and fixed offset timezones
  • ARROW-9968 - [C++] Fix UBSAN build
  • ARROW-9969 - [C++] Fix RecordBatchBuilder with dictionary types
  • ARROW-9970 - [Go] fix checkptr failure in sum methods
  • ARROW-9972 - [CI] Work around grpc-re2 clash on Homebrew
  • ARROW-9973 - [Java] JDBC DateConsumer does not allow dates before epoch
  • ARROW-9976 - [Python] ArrowCapacityError when doing Table.from_pandas with large dataframe
  • ARROW-9990 - [Rust][DataFusion] Fixed the NOT operator
  • ARROW-9993 - [Python] Tzinfo - string roundtrip fails on pytz.StaticTzInfo objects
  • ARROW-9994 - [C++][Python] Auto chunking nested array containing binary-like fields results in malformed output
  • ARROW-9996 - [C++] Dictionary is unset when calling DictionaryArray.GetScalar for null values
  • ARROW-10003 - [C++] Create parent dir for any destination fs in CopyFiles
  • ARROW-10008 - [C++][Dataset] Fix filtering/row group statistics of dict columns
  • ARROW-10011 - [C++] Make FindRE2.cmake re-entrant
  • ARROW-10012 - [C++] Make MockFileSystem thread-safe
  • ARROW-10013 - [FlightRPC][C++] fix setting generic client options
  • ARROW-10017 - [Java] Fix LargeMemoryUtil long conversion
  • ARROW-10022 - [C++] Fix divide by zero and overflow error for scalar arithmetic benchmark
  • ARROW-10027 - [C++] Fix Take array kernel for NullType
  • ARROW-10034 - [Rust] Fix Rust build on master
  • ARROW-10041 - [Rust] Added check of data type to GenericString::from.
  • ARROW-10047 - [CI] Conda integration tests failing with cmake error
  • ARROW-10048 - [Rust] Fixed error in computing min/max with null entries.
  • ARROW-10049 - [C++/Python] Sync conda recipe with conda-forge
  • ARROW-10060 - [Rust][DataFusion] Fixed error in which Err values were discarded in MergeExec.
  • ARROW-10062 - [Rust] Fix for null elems at key position in dictionary arrays
  • ARROW-10073 - [Python] Don't rely on dict item order in test_parquet_nested_storage
  • ARROW-10081 - [C++/Python] Fix bash syntax in drone.io conda builds
  • ARROW-10085 - [C++] Fix S3 region resolution on Windows
  • ARROW-10087 - [CI] Fix nightly docs job
  • ARROW-10098 - [R][Doc] Fix copy_files doc mismatch
  • ARROW-10104 - [Python] Separate tests into its own conda package
  • ARROW-10114 - [R] Segfault in to_dataframe_parallel with deeply nested structs
  • ARROW-10116 - [Python][Packaging] Fix gRPC linking error in macOS wheels builds
  • ARROW-10119 - [C++] Fix Parquet crashes on invalid input
  • ARROW-10121 - [C++] Fix emission of new dictionaries in IPC writer
  • ARROW-10124 - [C++] Don't restrict permissions when creating files
  • ARROW-10125 - [R] Int64 downcast check doesn't consider all chunks
  • ARROW-10130 - [C++][Dataset] Ensure ParquetFileFragment::SplitByRowGroup preserves the ‘has_complete_metadata’ status
  • ARROW-10136 - [Rust] Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec
  • ARROW-10137 - [C++][R] Move nameof.h into R subproject
  • ARROW-10147 - [Python] Pandas metadata fails if index name not JSON-serializable
  • ARROW-10150 - [C++] Fix crashes on invalid Parquet file
  • ARROW-10169 - [Rust] Pretty print null PrimitiveTypes as empty strings
  • ARROW-10175 - [CI] Fix nightly HDFS integration tests (ensure to use legacy dataset)
  • ARROW-10176 - [C++] Avoid using unformattable types for test parameters
  • ARROW-10178 - [CI] Remove patch to fix Spark master build
  • ARROW-10179 - [Rust] Fixed error in labeler
  • ARROW-10181 - [Rust] Skip compiling one test on 32 bit ARM architecture
  • ARROW-10188 - [Rust][DataFusion] Fixed DataFusion examples.
  • ARROW-10189 - [Doc] Fixed typo in C-Data interface example
  • ARROW-10192 - [Python] Always decode inner dictionaries when converting array to Pandas
  • ARROW-10193 - [Python] Segfault when converting to fixed size binary array
  • ARROW-10200 - [CI][Java] Fix a job failure for s390x Java on TravisCI
  • ARROW-10204 - [Rust] Filter kernel should only count bits in valid range
  • ARROW-10214 - [Python] Allow printing undecodable schema metadata
  • ARROW-10226 - [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file
  • ARROW-10230 - [JS][Doc] JavaScript documentation fails to build
  • ARROW-10232 - FixedSizeListArray is incorrectly written/read to/from parquet
  • ARROW-10234 - [C++][Gandiva] Fix logic of round() for floats/decimals in Gandiva
  • ARROW-10237 - [C++] Duplicate dict values cause corrupt parquet
  • ARROW-10238 - [C#] List is broken
  • ARROW-10239 - [C++] Add missing zlib dependency to aws-sdk-cpp
  • ARROW-10244 - [Python] Document pyarrow.dataset.parquet_dataset
  • ARROW-10248 - [Python][Dataset] Always apply Python's default write properties
  • ARROW-10262 - [C++] Fix TypeClass for BinaryScalar and LargeBinaryScalar
  • ARROW-10271 - [Rust] Update dependencies
  • ARROW-10279 - [Release][Python] Fix verification script to align with the new macos wheel platform tags
  • ARROW-10280 - [Packaging][Python] Fix macOS wheel artifact patterns
  • ARROW-10281 - [Python] Fix warnings when running tests
  • ARROW-10284 - [Python] Correctly suppress warning about legacy filesystem on import
  • ARROW-10285 - [Python] Fix usage of deprecated num_children in pyarrow.orc submodule
  • ARROW-10286 - [C++][FlightRPC] Make CMake output less confusing
  • ARROW-10288 - [C++] Fix compilation errors on 32-bit x86
  • ARROW-10290 - [C++] List POP_BACK is not available in older CMake versions
  • ARROW-10296 - [R] Data saved as integer64 loaded as integer
  • ARROW-10517 - [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob
  • ARROW-11062 - [Java] When writing to flight stream, Spark's mapPartitions is not working

New Features and Improvements

  • ARROW-983 - [C++] Implement InputStream and OutputStream classes for interacting with socket connections
  • ARROW-1509 - [Python] Write serialized object as a stream of encapsulated IPC messages
  • ARROW-1644 - [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels
  • ARROW-1669 - [C++] Consider adding Abseil (Google C++11 standard library extensions) to toolchain
  • ARROW-1797 - [C++] Implement binary arithmetic kernels for numeric arrays
  • ARROW-2164 - [C++] Clean up unnecessary decimal module refs
  • ARROW-3080 - [Python] Unify Arrow to Python object conversion paths
  • ARROW-3757 - [R] R bindings for Flight RPC client
  • ARROW-3850 - [Python] Support MapType and StructType for enhanced PySpark integration
  • ARROW-3872 - [R] Add ad hoc test of feather compatibility
  • ARROW-4046 - [Python/CI] Exercise large memory tests
  • ARROW-4248 - [C++][Plasma] Build on Windows / Visual Studio
  • ARROW-4685 - [C++] Update Boost to 1.69 in manylinux1 docker image
  • ARROW-4927 - [Rust] Update top level README to describe current functionality
  • ARROW-4957 - [Rust] [DataFusion] Implement get_supertype correctly
  • ARROW-4965 - [Python] Timestamp array type detection should use tzname of datetime.datetime objects
  • ARROW-5034 - [C#] ArrowStreamWriter and ArrowFileWriter implement sync WriteRecordBatch
  • ARROW-5123 - [Rust] Parquet derive for simple structs
  • ARROW-6075 - [FlightRPC] Handle uncaught exceptions in middleware
  • ARROW-6281 - [Python] Produce chunked arrays for nested types in pyarrow.array
  • ARROW-6282 - [Format] Support lossy compression
  • ARROW-6437 - [R] Add AWS SDK to system dependencies for macOS and Windows
  • ARROW-6535 - [C++] Status::WithMessage should accept variadic parameters
  • ARROW-6537 - [R] Pass column_types to CSV reader
  • ARROW-6972 - [C#] Support for StructArrays
  • ARROW-6982 - [R] Add bindings for compare and boolean kernels
  • ARROW-7136 - [Rust] Added caching to the docker image
  • ARROW-7218 - [Python] Conversion from boolean numpy scalars not working
  • ARROW-7302 - [C++] CSV: allow dictionary types in explicit column types
  • ARROW-7372 - [C++] Allow creating dictionary array from simple JSON
  • ARROW-7871 - [Python] Expose more compute kernels
  • ARROW-7960 - [C++] Add support for reading additional types
  • ARROW-8001 - [R][Dataset] Bindings for dataset writing
  • ARROW-8002 - [C++][Dataset][R] Support partitioned dataset writing
  • ARROW-8048 - [Python] Run memory leak tests nightly as follow up to ARROW-4120
  • ARROW-8172 - [C++] ArrayFromJSON for dictionary arrays
  • ARROW-8205 - [Rust][DataFusion] Added check to uniqueness of column names.
  • ARROW-8253 - [Rust] [DataFusion] Improve ergonomics of registering UDFs
  • ARROW-8262 - [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
  • ARROW-8296 - [C++][Dataset] Add IpcFileWriteOptions
  • ARROW-8355 - [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py
  • ARROW-8359 - [C++/Python] Enable linux-aarch64 builds
  • ARROW-8383 - [Rust] Allow easier access to keys array of a dictionary array
  • ARROW-8402 - [Java] Support ValidateFull methods in Java
  • ARROW-8493 - [C++][Parquet] Start populating repeated ancestor definition
  • ARROW-8494 - [C++][Parquet] Full support for reading mixed list and structs
  • ARROW-8581 - [C#] Accept and return DateTime from DateXXArray
  • ARROW-8601 - [Go][FOLLOWUP] Fix RAT violations related to Flight in Go
  • ARROW-8601 - [Go][Flight] Implement Flight RPC server and client
  • ARROW-8618 - [C++] Clean up some redundant std::move()s
  • ARROW-8678 - [C++/Python][Parquet] Remove old writer code path
  • ARROW-8712 - [R] Expose strptime timestamp parsing in read_csv conversion options
  • ARROW-8774 - [Rust] [DataFusion] Improve threading model
  • ARROW-8810 - [R] Add documentation about Parquet format, appending to stream format
  • ARROW-8824 - [Rust] [DataFusion] Implement new SQL parser
  • ARROW-8828 - [Rust] Implement SQL tokenizer
  • ARROW-8829 - [Rust] Implement SQL parser
  • ARROW-9010 - [Java] Framework and interface changes for RecordBatch IPC buffer compression
  • ARROW-9065 - [C++] Support parsing date32 in dataset partition folders
  • ARROW-9068 - [C++][Dataset] Simplify partitioning interface
  • ARROW-9078 - [C++] Parquet read / write extension type with nested storage type
  • ARROW-9104 - [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory
  • ARROW-9107 - [C++][Dataset] Support temporal partitioning fields
  • ARROW-9147 - [C++][Dataset] Support projection from null->any type
  • ARROW-9205 - [Documentation] Fix typos
  • ARROW-9266 - [Python][Packaging] Enable S3 support in macOS wheels
  • ARROW-9271 - [R] Preserve data frame metadata in round trip
  • ARROW-9286 - [C++] Add function “aliases” to compute::FunctionRegistry
  • ARROW-9328 - [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string
  • ARROW-9338 - [Rust] Add clippy instructions
  • ARROW-9344 - [C++][Flight] Measure latency quantiles
  • ARROW-9358 - [Integration] remove generated_large_batch.json
  • ARROW-9371 - [Java] Run vector tests for both allocators
  • ARROW-9377 - [Java] Support unsigned dictionary indices
  • ARROW-9387 - [R] Use new C++ table select method
  • ARROW-9388 - [C++] Division kernels
  • ARROW-9394 - [Python] Support pickling of Scalars
  • ARROW-9398 - [C++] Register SIMD sum variants to function instance.
  • ARROW-9402 - [C++] Rework portable wrappers for checked integer arithmetic
  • ARROW-9405 - [R] Switch to cpp11
  • ARROW-9412 - [C++] Add non-bundled dependencies to INTERFACE_LINK_LIBRARIES of static libarrow
  • ARROW-9429 - [Python] ChunkedArray.to_numpy
  • ARROW-9454 - [GLib] Add binding of some dictionary builders
  • ARROW-9465 - [Python] Improve ergonomics of compute module
  • ARROW-9469 - [Python] Make more objects weakrefable
  • ARROW-9487 - [Developer] Cover the archery release utilities with unittests
  • ARROW-9488 - [Release] Use the new changelog generation when updating the website
  • ARROW-9507 - [Rust][DataFusion] Implement Display for PhysicalExpr
  • ARROW-9508 - [Release][APT][Yum] Enable verification for arm64 binaries
  • ARROW-9516 - [Rust][DataFusion] refactor of column names
  • ARROW-9517 - [C++/Python] Add support for temporary credentials to S3Options
  • ARROW-9518 - [Python] Deprecate pyarrow serialization
  • ARROW-9521 - [Rust][DataFusion] Handle custom CSV file extensions
  • ARROW-9523 - [Rust] Improve filter kernel performance
  • ARROW-9534 - [Rust][DataFusion] Added support for lit to all supported rust types.
  • ARROW-9550 - [Rust] [DataFusion] Remove Rc<RefCell<_>> from hash aggregate operator
  • ARROW-9553 - [Rust] Release script doesn't bump parquet crate's arrow dependency version
  • ARROW-9557 - [R] Iterating over parquet columns is slow in R
  • ARROW-9559 - [Rust][DataFusion] Made function public
  • ARROW-9563 - [Dev][Release] Use archery's changelog generator when creating release notes for the website
  • ARROW-9568 - [CI][C++] Use msys2/setup-msys2
  • ARROW-9576 - [Python][Doc] Fix error in example code for extension types
  • ARROW-9580 - [JS][Doc] Fix syntax error in example code
  • ARROW-9581 - [Dev][Release] Bump next snapshot versions to 2.0.0
  • ARROW-9582 - [Rust] Implement memory size methods
  • ARROW-9585 - [Rust][DataFusion] Remove duplicated to-do line
  • ARROW-9587 - [FlightRPC][Java] clean up FlightStream/DoPut
  • ARROW-9593 - [Python] Add custom pickle reducers for DictionaryScalar
  • ARROW-9604 - [C++] Add aggregate min/max benchmark
  • ARROW-9605 - [C++] Speed up aggregate min/max compute kernels on integer types
  • ARROW-9607 - [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers
  • ARROW-9608 - [Rust] Remove arrow flight from parquet's feature gating
  • ARROW-9615 - [Rust] Added kernel to compute length of a string.
  • ARROW-9617 - [Rust][DataFusion] Add length of string array
  • ARROW-9618 - [Rust][DataFusion] Made it easier to write optimizers
  • ARROW-9619 - [Rust][DataFusion] Add predicate push-down
  • ARROW-9632 - [Rust] add a func “new” for ExecutionContextSchemaProvider
  • ARROW-9638 - [C++][Compute] Implement mode kernel
  • ARROW-9639 - [Ruby] Add dependency version check
  • ARROW-9640 - [C++][Gandiva] Implement round() for integers and long integers
  • ARROW-9641 - [C++][Gandiva] Implement round() for floating point and double floating point numbers
  • ARROW-9645 - [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs
  • ARROW-9646 - [C++][Dataset] Support writing with ParquetFileFormat
  • ARROW-9650 - [Packaging][APT] Drop support for Ubuntu 19.10
  • ARROW-9654 - [Rust][DataFusion] Add EXPLAIN <SQL> statement
  • ARROW-9656 - [Rust][DataFusion] Better error messages for unsupported EXTERNAL TABLE types
  • ARROW-9658 - [Python] Python bindings for dataset writing
  • ARROW-9665 - [R] head/tail/take for Datasets
  • ARROW-9667 - [CI][Crossbow] Segfault in 2 nightly R builds
  • ARROW-9671 - [C++] Fix a bug in BasicDecimal128 constructor that interprets uint64_t integers with highest bit set as negative.
  • ARROW-9673 - [Rust][DataFusion] Add a param “dialect” for DFParser::parse_sql
  • ARROW-9678 - [Rust][DataFusion] Improve projection push down to remove unused columns
  • ARROW-9679 - [Rust][DataFusion] More efficient creation of final batch from HashAggregateExec
  • ARROW-9681 - [Java] Fix test failures of Arrow Memory - Core on big-endian platform
  • ARROW-9683 - [Rust][DataFusion] Add debug printing to physical plans and associated types
  • ARROW-9691 - [Rust][DataFusion] Make sql_statement_to_plan method public
  • ARROW-9695 - [Rust] Improve comments on LogicalPlan enum variants
  • ARROW-9699 - [C++][Compute] Optimize mode kernel for small integer types
  • ARROW-9701 - [CI][Java] Add a job for s390x Java on TravisCI
  • ARROW-9702 - [C++] Register bpacking SIMD to runtime path.
  • ARROW-9703 - [Developer][Archery] Restartable cherry-picking process for creating maintenance branches
  • ARROW-9706 - [Java] Tests of TestLargeListVector correctly read offset
  • ARROW-9710 - [C++] Improve performance of Decimal128::ToString by 10x, and make the implementation reusable for Decimal256.
  • ARROW-9711 - [Rust] Add new benchmark derived from TPC-H
  • ARROW-9713 - [Rust][DataFusion] Remove explicit panics
  • ARROW-9715 - [R] changelog/doc updates for 1.0.1
  • ARROW-9718 - [Python] ParquetWriter to work with new FileSystem API
  • ARROW-9721 - [Packaging][Python] Update wheel dependency files
  • ARROW-9722 - [Rust] Shorten key lifetime for dict lookup key
  • ARROW-9723 - [C++][Compute] Count NaN in mode kernel
  • ARROW-9725 - [Rust][DataFusion] SortExec and LimitExec re-use MergeExec
  • ARROW-9737 - [C++][Gandiva] Add bitwise_xor() for integers
  • ARROW-9739 - [CI][Ruby] Don't install gem documents
  • ARROW-9742 - [Rust][DataFusion] Improved DataFrame trait (formerly known as the Table trait)
  • ARROW-9751 - [Rust][DataFusion] Allow UDFs to accept multiple data types per argument
  • ARROW-9752 - [Rust][DataFusion] Add support for User-Defined Aggregate Functions.
  • ARROW-9753 - [Rust][DataFusion] Replaced Arc<Mutex<>> by Box<>
  • ARROW-9754 - [Rust][DataFusion] Implement async in ExecutionPlan trait
  • ARROW-9757 - [Rust][DataFusion] Add prelude.rs
  • ARROW-9758 - [Rust][DataFusion] Allow physical planner to be replaced
  • ARROW-9759 - [Rust][DataFusion] Implement DataFrame.sort()
  • ARROW-9760 - [Rust][DataFusion] Added DataFrame::explain
  • ARROW-9761 - [C/C++] Add experimental C stream interface
  • ARROW-9762 - [Rust][DataFusion] ExecutionContext::sql now returns DataFrame
  • ARROW-9769 - [Python] Un-skip tests with fsspec in-memory filesystems
  • ARROW-9775 - [C++] Automatic S3 region selection
  • ARROW-9781 - [C++] Fix valgrind uninitialized value warnings
  • ARROW-9782 - [C++][Dataset] More configurable Dataset writing
  • ARROW-9784 - [Rust][DataFusion] Make running TPCH benchmark repeatable
  • ARROW-9786 - [R] Unvendor cpp11 before release
  • ARROW-9788 - [Rust][DataFusion] Rename SelectionExec to FilterExec
  • ARROW-9792 - [Rust][DataFusion] Aggregate expression functions should not return result
  • ARROW-9794 - [C++] Add IsVendor API for CpuInfo
  • ARROW-9795 - [C++][Gandiva] Implement castTIMESTAMP(int64) in Gandiva
  • ARROW-9806 - [R] More compute kernel bindings
  • ARROW-9807 - [R] News update/version bump post-1.0.1
  • ARROW-9808 - [Python] Update read_table doc string
  • ARROW-9811 - [C++] Unchecked floating point division by 0 should succeed
  • ARROW-9813 - [C++] Disable semantic interposition
  • ARROW-9819 - [C++] Bump mimalloc to 1.6.4
  • ARROW-9821 - [Rust][DataFusion] Support for User Defined ExtensionNodes in the LogicalPlan
  • ARROW-9821 - [Rust][DataFusion] Make crate::logical_plan and crate::physical_plan modules
  • ARROW-9823 - [CI][C++][MinGW] Enable S3
  • ARROW-9832 - [Rust] [DataFusion] Refactor PhysicalPlan to remove Partition
  • ARROW-9833 - [Rust][DataFusion] TableProvider.scan now returns ExecutionPlan
  • ARROW-9834 - [Rust] [DataFusion] Remove Partition trait
  • ARROW-9835 - [Rust][DataFusion] Removed FunctionMeta and FunctionType
  • ARROW-9836 - [Rust][DataFusion] Improve API for usage of UDFs
  • ARROW-9837 - [Rust][DataFusion] Added provider for variable
  • ARROW-9838 - [Rust] [DataFusion] DefaultPhysicalPlanner should insert explicit MergeExec nodes
  • ARROW-9839 - [Rust][DataFusion] Implement ExecutionPlan.as_any
  • ARROW-9841 - [Rust] Update checked-in fbs files
  • ARROW-9844 - [CI] Add Go build job on s390x
  • ARROW-9845 - [Rust][Parquet] Move serde_json dependency to dev-dependencies as it is only used in tests
  • ARROW-9848 - [Rust] Implement 0.15 IPC alignment
  • ARROW-9849 - [Rust][DataFusion] Simplified argument types of ScalarFunctions.
  • ARROW-9850 - [Go] Defer should not be used inside a loop
  • ARROW-9853 - [Rust] Implement take kernel for dictionary arrays
  • ARROW-9854 - [R] Support reading/writing data to/from S3
  • ARROW-9858 - [Python][Docs] Add user guide for filesystems interface
  • ARROW-9863 - [C++][Parquet] Compile regexes only once
  • ARROW-9867 - [C++][Dataset] Add FileSystemDataset::filesystem property
  • ARROW-9868 - [C++][R] Provide CopyFiles for copying files between FileSystems
  • ARROW-9869 - [R] Implement full S3FileSystem/S3Options constructor
  • ARROW-9870 - [R] Friendly interface for filesystems (S3)
  • ARROW-9871 - [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
  • ARROW-9873 - [C++][Compute] Optimize mode kernel for integers in small value range
  • ARROW-9875 - [Python] Let FileSystem.get_file_info accept a single path
  • ARROW-9884 - [R] Bindings for writing datasets to Parquet
  • ARROW-9885 - [Rust][DataFusion] Minor code simplification
  • ARROW-9886 - [Rust][DataFusion] Parameterized testing of physical cast.
  • ARROW-9887 - [Rust][DataFusion] Added support for complex return types for built-in functions
  • ARROW-9890 - [R] Add zstandard compression codec in macOS build
  • ARROW-9891 - [Rust][DataFusion] Made math functions accept f32.
  • ARROW-9892 - [Rust][DataFusion] Added concat of utf8
  • ARROW-9893 - [Python] Support parquet options in dataset writing
  • ARROW-9895 - [Rust] Improve sorting kernels
  • ARROW-9899 - [Rust][DataFusion] Switch from Box --> SchemaRef (Arc) to be consistent with the rest of Arrow
  • ARROW-9900 - [Rust][DataFusion] Switch from Box -> Arc in LogicalPlanNode
  • ARROW-9901 - [C++] Add hand-crafted Parquet to Arrow reconstruction tests
  • ARROW-9902 - [Rust][DataFusion] Add array() built-in function
  • ARROW-9904 - [C++] Unroll the loop of CountSetBits.
  • ARROW-9908 - [Rust] Add support for temporal types in JSON reader
  • ARROW-9910 - [Rust][DataFusion] Fixed error in type coercion of Variadic.
  • ARROW-9914 - [Rust][DataFusion] Document SQL Type --> Arrow type mapping
  • ARROW-9916 - [Rust] Avoid cloning array data
  • ARROW-9917 - [Python][Compute] Bindings for mode kernel
  • ARROW-9919 - [Rust][DataFusion] Speedup math operations by 15%+
  • ARROW-9921 - [Rust] Replace TryFrom by From in StringArray from Vec<Option<&str>> (+50%)
  • ARROW-9925 - [GLib] Add low level value readers for GArrowListArray family
  • ARROW-9926 - [GLib] Use placement new for GArrowRecordBatchFileReader
  • ARROW-9928 - [C++] Speed up integer parsing slightly
  • ARROW-9929 - [Dev] Autotune cmake-format
  • ARROW-9933 - [Developer] Add drone as a CI provider for crossbow
  • ARROW-9934 - [Rust] Shape and stride check in tensor
  • ARROW-9941 - [Python] Better string representation for extension types
  • ARROW-9944 - [Rust][DataFusion] Implement to_timestamp function
  • ARROW-9949 - [C++] Improve performance of Decimal128::FromString by 46%, and make the implementation reusable for Decimal256.
  • ARROW-9950 - [Rust][DataFusion] Made UDFs usable without a registry
  • ARROW-9952 - [Python] Optionally use pyarrow.dataset in parquet.write_to_dataset
  • ARROW-9954 - [Rust][DataFusion] Made aggregates support the same signatures as functions.
  • ARROW-9956 - [C++][Gandiva] Implementation of binary_string function in gandiva
  • ARROW-9957 - [Rust] Replace tempdir with tempfile
  • ARROW-9961 - [Rust][DataFusion] Make to_timestamp function parse timestamps without timezone offset as local
  • ARROW-9964 - [C++] Allow reading date types from CSV data
  • ARROW-9965 - [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
  • ARROW-9966 - [Rust] Speedup kernels for sum,min,max by 10%-60%
  • ARROW-9967 - [Python] Add compute module docs + expose more option classes
  • ARROW-9971 - [Rust] Improve speed of take by 2x-3x (change scaling with batch size)
  • ARROW-9977 - [Rust][Large] StringArray
  • ARROW-9979 - [Rust] Fix arrow crate clippy lints
  • ARROW-9980 - [Rust][Parquet] Fix clippy lints
  • ARROW-9981 - [Rust][Flight] Expose IpcWriteOptions on utils
  • ARROW-9983 - [C++][Dataset][Python] Use larger default batch size than 32K for Datasets API
  • ARROW-9984 - [Rust][DataFusion] Minor cleanup DRY
  • ARROW-9986 - [Rust] allow to_timestamp to parse local times without fractional seconds
  • ARROW-9987 - [Rust][DataFusion] Improved docs for Expr
  • ARROW-9988 - [Rust][DataFusion] Added +-/* as operators to logical expressions.
  • ARROW-9992 - [C++][Python] Refactor python to arrow conversions based on a reusable conversion API
  • ARROW-9998 - [Python] Support pickling DictionaryScalar
  • ARROW-9999 - [Python] Support constructing dictionary array directly through pa.array()
  • ARROW-10000 - [C++][Python] Support constructing StructArray from list of key-value pairs
  • ARROW-10001 - [Rust][DataFusion] Added developer guide to README.
  • ARROW-10010 - [Rust] Speedup arithmetic (1.3-1.9x)
  • ARROW-10015 - [Rust] Simd aggregate kernels
  • ARROW-10016 - [Rust] Implement is null / is not null kernels
  • ARROW-10018 - [CI] Disable Sphinx and API documentation build on master
  • ARROW-10019 - [Rust] Add substring kernel
  • ARROW-10023 - [C++][Gandiva] Implement split_part function in gandiva
  • ARROW-10024 - [C++][Parquet] Create nested reading benchmarks
  • ARROW-10028 - [Rust] Simplified macro
  • ARROW-10030 - [Rust] Add support for FromIter and IntoIter for primitive types
  • ARROW-10035 - [C++] Update vendored libraries
  • ARROW-10037 - [C++] Workaround to force find AWS SDK to look for shared libraries
  • ARROW-10040 - [Rust] Iterate over and combine boolean buffers with arbitrary offsets
  • ARROW-10043 - [Rust][DataFusion] Implement COUNT(DISTINCT col)
  • ARROW-10044 - [Rust] Improved Arrow's README.
  • ARROW-10046 - [Rust][DataFusion] Made RecordBatchReader implement Iterator
  • ARROW-10050 - [C++][Gandiva] Implement concat() in Gandiva for up to 10 arguments
  • ARROW-10051 - [C++][Compute] Move kernel state when merging
  • ARROW-10054 - [Python] don't crash when slice offset > length
  • ARROW-10055 - [Rust] DoubleEndedIterator implementation for NullableIter
  • ARROW-10057 - [C++] Add hand-written Parquet nested tests
  • ARROW-10058 - [C++] Improve repeated levels conversion without BMI2
  • ARROW-10059 - [R][Doc] Give more advice on how to set up C++ build
  • ARROW-10063 - [Archery][CI] Fetch main branch in archery build only when it is a pull request
  • ARROW-10064 - [C++] Resolve compile warnings on Apple Clang 12
  • ARROW-10065 - [Rust] Simplify code (+500, -1k)
  • ARROW-10066 - [C++] Make sure default AWS region selection algorithm is used
  • ARROW-10068 - [C++] Add bundled external project for aws-sdk-cpp
  • ARROW-10069 - [Java] Support running Java benchmarks from command line
  • ARROW-10070 - [C++][Compute] Implement var and std aggregate kernel
  • ARROW-10071 - [R] segfault with ArrowObject from previous session, or saved
  • ARROW-10074 - [C++] Use string constructor instead of string_view.to_string
  • ARROW-10075 - [C++] Use nullopt from arrow::util instead of vendored namespace
  • ARROW-10076 - [C++] Use temporary directory facility in all unit tests
  • ARROW-10077 - [C++] Fix possible integer multiplication overflow
  • ARROW-10083 - [C++] Improve Parquet fuzz seed corpus
  • ARROW-10084 - [Rust][DataFusion] Added length of LargeStringArray and fixed undefined behavior.
  • ARROW-10086 - [Rust] Renamed min/max_large_string kernels
  • ARROW-10090 - [C++][Compute] Improve mode kernel
  • ARROW-10092 - [Dev][Go] Add grpc generated go files to rat exclusion list
  • ARROW-10093 - [R] Add ability to opt-out of int64 -> int demotion
  • ARROW-10096 - [Rust][DataFusion] Removed unused code
  • ARROW-10099 - [C++][Dataset] Simplify type inference for partition columns
  • ARROW-10100 - [C++][Python][Dataset] Add ParquetFileFragment::Subset method
  • ARROW-10102 - [C++] Refactor BasicDecimal128 Multiplication to use unsigned helper
  • ARROW-10103 - [Rust] Add contains kernel
  • ARROW-10105 - [FlightRPC] Add client option to disable certificate validation with TLS
  • ARROW-10120 - [C++] Add two-level nested Parquet read to Arrow benchmarks
  • ARROW-10127 - Update specification for Decimal to allow for 256-bits
  • ARROW-10129 - [Rust] Cargo build is rebuilding dependencies on arrow changes
  • ARROW-10134 - [Python][Dataset] Add ParquetFileFragment.num_row_groups
  • ARROW-10139 - [C++] Add support for building arrow_testing without building tests
  • ARROW-10148 - [Rust] Improved rust/lib.rs that is shown in docs.rs
  • ARROW-10151 - [Python] Add support for MapArray conversion to Pandas
  • ARROW-10155 - [Rust][DataFusion] Improved lib.rs docs
  • ARROW-10156 - [Rust] Added github action to label PRs for rust.
  • ARROW-10157 - [Rust] Add an example to the take kernel
  • ARROW-10160 - [Rust] Improve DictionaryType documentation (clarify which type is which)
  • ARROW-10161 - [Rust][DataFusion] DRYed code in tests
  • ARROW-10162 - [Rust] Add pretty print support for DictionaryArray
  • ARROW-10164 - [Rust] Add support for DictionaryArray to cast kernel
  • ARROW-10167 - [Rust][DataFusion] Support DictionaryArray in sql.rs tests, by using standard pretty printer
  • ARROW-10171 - [Rust][DataFusion] Added ExecutionContext::From
  • ARROW-10190 - [Website] Add Jorge to list of committers
  • ARROW-10196 - [C++] Add Future::DeferNotOk
  • ARROW-10199 - [Rust][Parquet] Release Parquet at crates.io to remove debug prints
  • ARROW-10201 - [C++][CI] Disable S3 in arm64 job on Travis CI
  • ARROW-10202 - [CI][Windows] Use sf.net mirror for MSYS2
  • ARROW-10205 - [Java][FlightRPC] Allow disabling server validation
  • ARROW-10206 - [C++][Python][FlightRPC] Allow disabling server validation
  • ARROW-10215 - [Rust][DataFusion] Renamed Source to SendableRecordBatchReader.
  • ARROW-10217 - [CI] Run fewer GitHub Actions jobs
  • ARROW-10227 - [Ruby] Use a table size as the default for parquet chunk_size
  • ARROW-10229 - [C++] Remove errant log line
  • ARROW-10231 - [CI] Unable to download minio in arm32v7 docker image
  • ARROW-10233 - [Rust] Make array_value_to_string available in all Arrow builds
  • ARROW-10235 - [Rust][DataFusion] Improve documentation for type coercion
  • ARROW-10240 - [Rust] Optionally load data into memory before running benchmark query
  • ARROW-10251 - [Rust][DataFusion] MemTable::load() now loads partitions in parallel
  • ARROW-10252 - [Python] Add option to skip inclusion of Arrow headers in Python installation
  • ARROW-10256 - [C++][Flight] Disable -Werror carefully
  • ARROW-10257 - [R] Prepare news/docs for 2.0 release
  • ARROW-10260 - [Python] Missing MapType in to_pandas_dtype()
  • ARROW-10265 - [CI] Use smaller build when cache doesn't exist on Travis CI
  • ARROW-10266 - [CI][macOS] Ensure using Python 3.8 with Homebrew
  • ARROW-10267 - [Python] Skip flight test if disable_server_verification feature is not available
  • ARROW-10272 - [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
  • ARROW-10273 - [CI][Homebrew] Fix “brew audit” usage
  • ARROW-10287 - [C++] Avoid std::random_device
  • PARQUET-1845 - [C++] Add expected results of Int96 in big-endian
  • PARQUET-1878 - [C++] lz4 codec is not compatible with Hadoop Lz4Codec
  • PARQUET-1904 - [C++] Export file_offset in RowGroupMetaData

Apache Arrow 1.0.1 (2020-08-21)

Bug Fixes

  • ARROW-9535 - [Python] Remove symlink fixes from conda recipe
  • ARROW-9536 - [Java] Missing parameters in PlasmaOutOfMemoryException.java
  • ARROW-9544 - [R] Fix version argument of write_parquet()
  • ARROW-9549 - [Rust] Fixed version in dependency in parquet.
  • ARROW-9556 - [Python][C++] Segfaults in UnionArray with null values
  • ARROW-9560 - [Packaging] Add required conda-forge.yml
  • ARROW-9569 - [CI][R] Fix rtools35 builds for msys2 key change
  • ARROW-9570 - [Doc] Clean up sphinx sidebar
  • ARROW-9573 - [Python][Dataset] Provide read_table(ignore_prefixes=)
  • ARROW-9574 - [R] Cleanups for CRAN 1.0.0 release
  • ARROW-9575 - [R] gcc-UBSAN failure on CRAN
  • ARROW-9577 - [C++] Ignore EBADF error in posix_madvise()
  • ARROW-9589 - [C++/R] Forward declare structs as structs
  • ARROW-9592 - [CI] Update homebrew before calling brew bundle
  • ARROW-9596 - [CI][Crossbow] Fix homebrew-cpp again, again
  • ARROW-9598 - [C++][Parquet] Fix writing nullable structs
  • ARROW-9599 - [CI] Appveyor toolchain build fails because CMake detects different C and C++ compilers
  • ARROW-9600 - [Rust] pin proc macro
  • ARROW-9600 - [Rust][Arrow] pin older version of proc-macro2 during build
  • ARROW-9602 - [R] Improve cmake detection in Linux build
  • ARROW-9606 - [C++][Dataset] Support "a"_.In(<>).Assume(<compound>)
  • ARROW-9609 - [C++][Dataset] CsvFileFormat reads all virtual columns as null
  • ARROW-9621 - [Python] Skip test_move_file for in-memory fsspec filesystem
  • ARROW-9631 - [Rust] Make arrow not depend on flight
  • ARROW-9631 - [Rust] flight should depend on arrow, not the other way around
  • ARROW-9644 - [C++][Dataset] Don't apply ignore_prefixes to partition base_dir
  • ARROW-9659 - [C++] Fix RecordBatchStreamReader when source is CudaBufferReader
  • ARROW-9684 - [C++] Fix undefined behaviour on invalid IPC / Parquet input
  • ARROW-9700 - [Python] fix create_library_symlinks for macos
  • ARROW-9712 - [Rust][DataFusion] Fix parquet error handling and general code improvements
  • ARROW-9743 - [R] Sanitize paths in open_dataset
  • ARROW-10126 - [Python] Impossible to import pyarrow module in python. Generates this “ImportError: DLL load failed: The specified procedure could not be found.”
  • ARROW-10460 - [FlightRPC][Python] FlightRPC authentication mechanism changed and is undocumented, breaking current working code

New Features and Improvements

  • ARROW-9402 - [C++] Rework portable wrappers for checked integer arithmetic
  • ARROW-9563 - [Dev][Release] Use archery's changelog generator when creating release notes for the website
  • ARROW-9715 - [R] changelog/doc updates for 1.0.1
  • ARROW-9845 - [Rust] [Parquet] serde_json is only used in tests but isn't in dev-dependencies

Apache Arrow 1.0.0 (2020-07-24)

Bug Fixes

  • ARROW-1692 - [Java] UnionArray round trip not working
  • ARROW-3329 - [Python] Python tests for decimal to int and decimal to decimal casts
  • ARROW-3861 - [Python] ParquetDataset.read() respect specified columns and not include partition columns
  • ARROW-4018 - [C++] Fix RLE tests' failures on big-endian platforms
  • ARROW-4309 - [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled
  • ARROW-4600 - [Ruby] returns dictionary value
  • ARROW-5158 - [Packaging][Wheel] Symlink libraries in wheels
  • ARROW-5310 - [Python] better error message on creating ParquetDataset from empty directory
  • ARROW-5359 - [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas
  • ARROW-5572, ARROW-5310, ARROW-5666 - [Python] ParquetDataset tests for new implementation
  • ARROW-5666 - [Python] Underscores in partition (string) values are dropped when reading dataset
  • ARROW-5744 - [C++] Allow Table::CombineChunks to leave string columns chunked
  • ARROW-5875 - [FlightRPC] integration tests for Flight features
  • ARROW-6235 - [R] Implement conversion from arrow::BinaryArray to R character vector
  • ARROW-6523 - [C++][Dataset] arrow_dataset target does not depend on anything
  • ARROW-6848 - [C++] Support building libraries targeting C++14 or higher
  • ARROW-7018 - [R] Non-UTF-8 data in Arrow <--> R conversion
  • ARROW-7028 - [R] Date roundtrip results in different R storage mode
  • ARROW-7084 - [C++] Check for full type equality in ArrayRangeEquals
  • ARROW-7173 - [Integration] Add test to verify Map field names can be arbitrary
  • ARROW-7208 - [Python][Parquet] Raise better error message when passing a directory path instead of a file path to ParquetFile
  • ARROW-7273 - [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conversion
  • ARROW-7480 - [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns
  • ARROW-7610 - [Java] Finish support for 64 bit int allocations
  • ARROW-7654 - [Python] Ability to set column_types to a Schema in csv.ConvertOptions is undocumented
  • ARROW-7681 - [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)
  • ARROW-7702 - [C++][Dataset] Provide (optional) deterministic order of batches
  • ARROW-7782 - [Python] Losing index information when using write_to_dataset with partition_cols
  • ARROW-7840 - [Java] [Integration] Java executables fail
  • ARROW-7843 - [Ruby] MSYS2 packages needed for Gandiva
  • ARROW-7925 - [C++][Docs] Better document use of IWYU, including new ‘match’ option
  • ARROW-7939 - [Python] crashes when reading parquet file compressed with snappy
  • ARROW-7967 - [CI][Crossbow] Pin macOS version in autobrew job to match CRAN
  • ARROW-8050 - [Python][Packaging] Do not include generated Cython source files in wheel packages
  • ARROW-8078 - [Python] Missing links in the docs regarding field and schema DataTypes
  • ARROW-8115 - [Python] Conversion when mixing NaT and datetime objects not working
  • ARROW-8251, ARROW-7782 - [Python] Preserve pandas index and extension dtypes in write_to_dataset roundtrip
  • ARROW-8344 - [C#] Bug-fixes to binary array plus other improvements
  • ARROW-8360 - [C++][Gandiva] Fixes date32 support for date/time functions
  • ARROW-8374 - [R] Table to vector of DictionaryType will error when Arrays don't have the same Dictionary per array
  • ARROW-8392 - [Java] Fix overflow related corner cases for vector value comparison
  • ARROW-8448 - [Packaging] Update linux-packages README
  • ARROW-8455 - [Rust] Parquet Arrow column read on partially compatible files FIX
  • ARROW-8455 - [Rust] Parquet Arrow column read on partially compatible files
  • ARROW-8471 - [C++][Integration] Represent 64 bit integers as strings
  • ARROW-8472 - [Go][Integration] Represent 64 bit integers as JSON::string
  • ARROW-8473 - [Rust] Untick “Statistics support”
  • ARROW-8480 - [Rust] Use NonNull well aligned pointer as Unique reference
  • ARROW-8503 - [Packaging][deb] Fix building apache-arrow-archive-keyring for RC
  • ARROW-8505 - [Release][C#] “sourcelink test” fails because of Apache.ArrowAssemblyInfo.cs
  • ARROW-8508 - [Rust] FixedSizeListArray improper offset for value
  • ARROW-8510 - [C++][Datasets] Do not use variant in WritePlan to fix compiler error with VS 2017
  • ARROW-8511 - [Release] In verify-release-candidate.bat, exit when CMake build fails, use Unity build
  • ARROW-8514 - [Developer][Release] Verify Python 3.5 Windows wheel
  • ARROW-8529 - [C++] Fix usage of NextCounts() on dictionary-encoded data
  • ARROW-8535 - [Rust] Specify arrow-flight version
  • ARROW-8536 - [Rust][Flight] Check in proto file, conditional build if file exists
  • ARROW-8537 - [C++] Revert Optimizing BitmapReader
  • ARROW-8539 - [CI] “AMD64 MacOS 10.15 GLib & Ruby” fails
  • ARROW-8554 - [C++][Benchmark] Fix building error “cannot bind lvalue”
  • ARROW-8556 - [R] zstd symbol not found if there are multiple installations of zstd
  • ARROW-8566 - [R] error when writing POSIXct to spark
  • ARROW-8568 - [C++] Fix decimal to decimal cast issues
  • ARROW-8577 - [Plasma][CUDA] Make CUDA initialization lazy
  • ARROW-8583 - [C++][Doc] Undocumented parameter in Dataset namespace
  • ARROW-8584 - [C++] Fix ORC link order
  • ARROW-8585 - [Packaging][Python] Windows wheels fail to build because of link error
  • ARROW-8586 - [R] installation failure on CentOS 7
  • ARROW-8587 - [C++] Fix linking Flight benchmarks
  • ARROW-8592 - [C++] Update docs to reflect LLVM 8
  • ARROW-8593 - [C++][Parquet] Fix build with musl libc
  • ARROW-8598 - [Rust] simd_compare_op creates buffer of incorrect length
  • ARROW-8602 - [C++][CMake] Fix ws2_32 link issue when cross-compiling on Linux
  • ARROW-8603 - [C++][Documentation] Add missing params comment
  • ARROW-8604 - [R][CI] Update CI to use R 4.0
  • ARROW-8608 - [C++] Update vendored ‘variant.hpp’ to fix CUDA 10.2
  • ARROW-8609 - [C++] Fix ORC Java JNI crash
  • ARROW-8610 - [Rust] DivideByZero when running arrow crate when simd feature is disabled
  • ARROW-8613 - [C++][Dataset][Python] Raise in discovery on unparsable partition expression
  • ARROW-8615 - [R] Error better and insist on RandomAccessFile in read_feather
  • ARROW-8617 - [Rust] Avoid loading simd_load_set_invalid which doesn't exist on aarch64
  • ARROW-8632 - [C++] Fix conversion error warning in array_union_test.cc
  • ARROW-8641 - [C++][Python] Sort included indices in IpcReader - Respect column selection in FeatherReader
  • ARROW-8643 - [Python] Fix failing pandas tests with DatetimeIndex on pandas master
  • ARROW-8644 - [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility
  • ARROW-8646 - [Java] Allow UnionListWriter to write null values
  • ARROW-8649 - [Java][Website] Java documentation on website is hidden
  • ARROW-8657 - [C++][Python] Add separate configuration for data pages
  • ARROW-8663 - [Documentation] Small correction to building.rst
  • ARROW-8680 - [Rust] Fix ComplexObjectArray null value shifting
  • ARROW-8684 - [Python] Workaround Cython type initialization bug
  • ARROW-8689 - [C++] Fix linking S3FS benchmarks
  • ARROW-8693 - [Python] Insert implicit cast in Dataset.get_fragments with filter
  • ARROW-8694 - [C++][Parquet] Relax string size limit when deserializing Thrift messages
  • ARROW-8701 - [Rust] Unresolved import `crate::compute::util::simd_load_set_invalid` on Raspberry Pi
  • ARROW-8704 - [C++] Fix Parquet undefined behaviour on invalid input
  • ARROW-8705 - copying null values from ComplexCopier
  • ARROW-8706 - [C++][Parquet] Tracking JIRA for PARQUET-1857 (unencrypted INT16_MAX Parquet row group limit)
  • ARROW-8710 - [Rust] Ensure right order of messages written, and flush stream when complete.
  • ARROW-8722 - [Dev] Pass environment variables to the container when running “archery docker run -e”
  • ARROW-8726 - [C++] Filename should not be part of DirectoryPartitioning
  • ARROW-8728 - [C++] Fix bitmap operation buffer overflow
  • ARROW-8729 - [C++][Dataset] Ensure non-empty batches when only virtual columns are projected
  • ARROW-8734 - [R] improve nightly build installation
  • ARROW-8741 - [Python][Packaging] Keep VS2015 for the Windows wheels
  • ARROW-8750 - [Python] Correctly default to lz4 compression for Feather V2 in Python
  • ARROW-8768 - [R][CI] Fix nightly as-cran spurious failure
  • ARROW-8775 - [C++][FlightRPC] fix integration tests
  • ARROW-8776 - [FlightRPC] Fix discrepancy between headers in Java and C++
  • ARROW-8798 - [C++] Fix Parquet crash on invalid input
  • ARROW-8799 - [C++][Parquet] NestedListReader needs to handle empty item batches
  • ARROW-8801 - [Python] Fix memory leak when converting datetime64-with-tz data to pandas
  • ARROW-8802 - [C++][Dataset] Preserve dataset schema's metadata on column projection
  • ARROW-8803 - [Java] Row count should be set before loading buffers in VectorLoader
  • ARROW-8808 - [Rust] Fix divide by zero error in builder
  • ARROW-8809 - [Rust] Fix JSON schema bug
  • ARROW-8811 - [Java] Fix CI
  • ARROW-8820 - [C++][Gandiva] fix date_trunc functions to return date types
  • ARROW-8821 - [Rust] fix type cast for nested binary expression using Like, NotLike, Not operators
  • ARROW-8825 - [C++] Mark parameter as unused
  • ARROW-8826 - [Crossbow] remote URL should always have .git
  • ARROW-8832 - [Python] Provide better error message when S3/HDFS is not enabled in installation
  • ARROW-8848 - [Ruby][CI] Fix MSYS2 update error
  • ARROW-8858 - [FlightRPC] ensure binary/multi-valued headers are properly exposed
  • ARROW-8860 - [C++] Fix IPC/Feather decompression for nested types (with child_data)
  • ARROW-8862 - [C++] NumericBuilder should use MemoryPool passed to CTOR
  • ARROW-8863 - [C++] Ensure that ArrayData::null_count is always set to 0 when using ArrayData::Make and supplying null validity bitmap
  • ARROW-8869 - [Rust][DataFusion] Add support for new scan nodes to type coercion rule
  • ARROW-8871 - [C++] Fix Gandiva for value_parsing.h refactor
  • ARROW-8872 - [CI] Restore ci/detect-changes.py
  • ARROW-8874 - [C++][Dataset] Scanner::ToTable race when ScanTask exit early with an error
  • ARROW-8878 - [R] try_download is confused when download.file.method isn't default
  • ARROW-8882 - [C#] Add .editorconfig to C# code
  • ARROW-8888 - [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable
  • ARROW-8889 - [Python] avoid SIGSEGV when comparing RecordBatch to None
  • ARROW-8892 - [C++][CI] CI builds for MSVC do not build benchmarks
  • ARROW-8909 - [Java] Out of order writes using setSafe
  • ARROW-8911 - [C++] Fix segfault when slicing ChunkedArray with zero chunks
  • ARROW-8924 - [C++][Gandiva] Avoid potential int overflow in castDATE_date32()
  • ARROW-8925 - [Rust][DataFusion] CsvExec::schema bug fix
  • ARROW-8930 - [C++] libz.so linking error with liborc.a
  • ARROW-8932 - [C++][CI] Fix link error at arrow-orc-adapter-test
  • ARROW-8946 - [Python] Add tests for parquet.write_metadata
  • ARROW-8948 - [Java][Integration] enable duplicate field names integration tests
  • ARROW-8951 - [C++] Fix compiler warnings on gcc8 in release builds
  • ARROW-8954 - [Website] ca-certificates should be listed in installation instructions
  • ARROW-8957 - [FlightRPC][C++] directly use IpcWriteOptions
  • ARROW-8959 - [Rust] Update benchmark to use new API (fixes broken build)
  • ARROW-8962 - [C++] Add explicit implementation for junk values
  • ARROW-8968 - [C++][Gandiva] set data layout for pre-compiled IR to llvm::module
  • ARROW-8975 - [FlightRPC][C++] try to fix MacOS flaky tests
  • ARROW-8977 - [R] Table$create with schema crashes with some dictionary index types
  • ARROW-8978 - [C++][CI] Fix valgrind warnings in cpp-conda-valgrind nightly build
  • ARROW-8980 - [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema
  • ARROW-8982 - [CI] Remove allow_failures for s390x on TravisCI
  • ARROW-8986 - [Archery][ursabot] Fix benchmark diff checkout of origin/master
  • ARROW-9000 - [Java] Update errorprone to 2.4.0
  • ARROW-9009 - [C++][Dataset] ARROW:schema should be removed from schema's metadata when reading Parquet files
  • ARROW-9013 - [C++] Validate CMake options
  • ARROW-9020 - [Python] read_json won't respect explicit_schema in parse_options
  • ARROW-9024 - [C++/Python] Install anaconda-client in conda-clean job
  • ARROW-9026 - [C++/Python] Force package removal from arrow-nightlies c…
  • ARROW-9037 - [C++] C-ABI: do not error out when importing array with null_count == -1
  • ARROW-9040 - [Python][Parquet] “_ParquetDatasetV2” fails to read with columns and use_pandas_metadata=True
  • ARROW-9057 - [Rust][Datafusion] Fix projection on in memory scan
  • ARROW-9059 - [Rust] Fix sign in array slice_data_docstring
  • ARROW-9066 - [Python] Raise correct error in isnull()
  • ARROW-9071 - [C++] Fixed a bug in MakeArrayOfNull
  • ARROW-9077 - [C++] Fix aggregate/scalar-compare benchmark null_percent calculation
  • ARROW-9080 - [C++] arrow::AllocateBuffer returns a Result<unique_ptr<Buffer>>
  • ARROW-9082 - [Rust] Stream reader fails when stream not ended with (opt…
  • ARROW-9084 - [C++] CMake is unable to find zstd target when ZSTD_SOURCE=SYSTEM
  • ARROW-9085 - [C++][CI] Fix Windows build
  • ARROW-9087 - [C++] Support additional HDFS options
  • ARROW-9098 - [C++] Fixed ToStructArray handling of 0 column RecordBatches
  • ARROW-9105 - [C++][Dataset][Python] Pass an explicit schema to split_by_row_groups
  • ARROW-9120 - [C++] Do not suppress linting on files with “codegen” in their name
  • ARROW-9121 - [C++] Forbid empty or root path in FileSystem::DeleteDirContents
  • ARROW-9122 - [C++] Properly handle sliced arrays in ascii_lower, ascii_upper kernels
  • ARROW-9126 - [C++] Fix building trimmed Boost bundle on Windows
  • ARROW-9127 - [Rust] Update thrift dependency to 0.13 (latest)
  • ARROW-9134 - [Python] Parquet partitioning degrades Int32 to float64
  • ARROW-9141 - [R] Update cross-package documentation links
  • ARROW-9142 - [C++] random::RandomArrayGenerator::Boolean “probability” misdocumented / incorrect
  • ARROW-9143 - [C++] Do not produce internal ArrayData with kUnknownNullCount in RecordBatch::Slice if source ArrayData::null_count is set to 0
  • ARROW-9146 - [C++][Dataset] Lazily store fragment physical schema
  • ARROW-9151 - [R][CI] Fix Rtools 4.0 build: pacman sync
  • ARROW-9160 - [C++] Implement contains for exact matches
  • ARROW-9174 - [Go] Fix table panic on 386
  • ARROW-9183 - [C++] Fix build with clang & old libstdc++.
  • ARROW-9184 - [Rust][Datafusion] table scan without projection should return all columns
  • ARROW-9194 - [C++] Array::GetScalar not implemented for decimal type
  • ARROW-9195 - [Java] Fixed UNSAFE.get from bytearray usage
  • ARROW-9209 - [C++] Benchmarks fail to build ARROW_IPC=OFF and ARROW_BUILD_TESTS=OFF
  • ARROW-9219 - [R] coerce_timestamps in Parquet write options does not work
  • ARROW-9221 - [Java] account for big-endian buffers in ArrowBuf.setBytes
  • ARROW-9223 - [Python] Propagate timezone information in pandas conversion
  • ARROW-9230 - [FlightRPC][Python] pass through all options in flight.connect
  • ARROW-9233 - [C++] Add NullType code paths for is_valid, is_null kernels
  • ARROW-9236 - [Rust] CSV WriterBuilder never writes header
  • ARROW-9237 - [R] 0.17 install on Arch Linux
  • ARROW-9238 - [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
  • ARROW-9252 - [Integration] Factor out IPC integration tests into script, add back 0.14.1 “gold” files
  • ARROW-9260 - [CI] Fix non amd64 job failures with Ubuntu 14.04 and 20.04
  • ARROW-9260 - [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed
  • ARROW-9261 - [Python] Fix CA certificate lookup with S3 filesystem on manylinux
  • ARROW-9274 - [Rust] Parse 64bit numbers from integration files as strings
  • ARROW-9282 - [R] Remove usage of _EXTPTR_PTR
  • ARROW-9284 - [Java] getMinorTypeForArrowType returns sparse minor type for dense union types
  • ARROW-9288 - [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartitioning
  • ARROW-9297 - [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
  • ARROW-9298 - [C++] Fix crashes with invalid IPC input
  • ARROW-9303 - [R] Linux static build should always bundle dependencies
  • ARROW-9305 - [Python] Dependency load failure in Windows wheel build
  • ARROW-9315 - [Java] Fix the failure of testAllocationManagerType
  • ARROW-9317 - [Java] add few testcases for arrow-memory
  • ARROW-9326 - [Python] Remove setuptools pinning
  • ARROW-9326 - [FOLLOWUP] Use requirements-build.txt for installing setuptools (#7638)
  • ARROW-9326 - [Python][TRIAGE] Pin to setuptools version prior to distutils-related changes on July 3 (#7636)
  • ARROW-9330 - [C++] Fix crash and undefined behaviour on corrupt IPC input
  • ARROW-9334 - [Dev][Archery] Push ancestor docker images
  • ARROW-9336 - [Ruby] Add support for missing keys in StructArrayBuilder
  • ARROW-9343 - [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces
  • ARROW-9347 - [Python] Fix mv in fsspec handler for directories
  • ARROW-9350 - [C++] Fix Valgrind failures
  • ARROW-9351 - [C++] Fix CMake 3.2 detection in option value validation
  • ARROW-9353 - [Python][CI] Disable known failures in dask integration tests
  • ARROW-9354 - [C++] Turbodbc latest fails to build in the integration tests
  • ARROW-9355 - [R] Fix -Wimplicit-int-float-conversion
  • ARROW-9360 - [CI][Crossbow] Nightly homebrew-cpp job times out
  • ARROW-9363 - [C++][Dataset] Preserve schema metadata in ParquetDatasetFactory
  • ARROW-9368 - [Python] Rename predicate argument to filter in split_by_row_group()
  • ARROW-9373 - [C++] Fix Parquet crash on invalid input (OSS-Fuzz)
  • ARROW-9380 - [C++] Fix Filter crashes and bug in kernels with NullHandling::OUTPUT_NOT_NULL
  • ARROW-9384 - [C++] Avoid memory blowup on invalid IPC input
  • ARROW-9385 - [Python] Fix JPype tests and JVM buffer lifetime
  • ARROW-9389 - [C++] Add binary metafunctions for the set lookup kernels isin and match that can be called with CallFunction
  • ARROW-9397 - [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
  • ARROW-9408 - [Integration] Fix Windows numpy datagen issues
  • ARROW-9409 - [CI][Crossbow] Nightly conda-r fails
  • ARROW-9410 - [CI][Crossbow] Fix homebrew-cpp again
  • ARROW-9413 - [Rust] Disable cmp_nan clippy error
  • ARROW-9415 - [C++] Arrow does not compile on Power9
  • ARROW-9416 - [Go] Add testcases for some datatypes
  • ARROW-9417 - [C++] Write length in IPC message by using little-endian
  • ARROW-9418 - [R] nyc-taxi Parquet files not downloaded in binary mode on Windows
  • ARROW-9419 - [C++] Expand fill_null function testing, test sliced arrays, fix some bugs
  • ARROW-9428 - [C++][Doc] Update buffer allocation documentation
  • ARROW-9436 - [C++][CI] Fix Valgrind failure
  • ARROW-9438 - [CI] Add spark patch to compile with recent Arrow Java changes
  • ARROW-9439 - [C++] Fix crash on invalid IPC input
  • ARROW-9440 - [Python] Expose Fill Null kernel
  • ARROW-9443 - [C++] Bundled bz2 build should only build libbz2
  • ARROW-9448 - [Java] fix empty ArrowBuf getting a null log in debug mode
  • ARROW-9449 - [R] Strip arrow.so
  • ARROW-9450 - [Python] Fix tests startup time
  • ARROW-9456 - [Python] Dataset segfault when not importing pyarrow.parquet
  • ARROW-9458 - [Python] Release GIL in ScanTask.execute
  • ARROW-9460 - [C++] Fix BinaryContainsExact for pattern with repeated characters
  • ARROW-9461 - [Rust] Fixed error in reading Date32 and Date64.
  • ARROW-9476 - [C++][Dataset] Fix incorrect dictionary association in HivePartitioningFactory
  • ARROW-9486 - [C++][Dataset] Support implicit cast of InExpression::set to dict
  • ARROW-9497 - [C++][Parquet] Fix fuzz failure case caused by malformed Parquet data
  • ARROW-9499 - [C++] AdaptiveIntBuilder::AppendNull does not increment the null count
  • ARROW-9500 - [C++] Do not use std::to_string to fix segfault on gcc 7.x in -O3 builds
  • ARROW-9501 - Add logic in timestampdiff() when end date is last day of…
  • ARROW-9503 - [Rust] Comparison of sliced arrays is wrong
  • ARROW-9504 - [C++/Python] Segmentation fault on ChunkedArray.take
  • ARROW-9506 - [Packaging][Python] Fix macOS wheel build failures
  • ARROW-9512 - [C++] Avoid variadic template unpack inside lambda to work around gcc 4.8 bug
  • ARROW-9524 - [CI][Gandiva] Fix c++ unit test failure in Gandiva nightly build
  • ARROW-9527 - [Rust] Removed un-used dev dependencies.
  • ARROW-10126 - [Python] Impossible to import pyarrow module in python. Generates this “ImportError: DLL load failed: The specified procedure could not be found.”
  • PARQUET-1839 - Set values read for required column
  • PARQUET-1857 - [C++] Do not fail to read unencrypted files with over 32767 row groups. Change some DCHECKs causing segfaults to throw exceptions
  • PARQUET-1865 - [C++] Fix usages of C++17 extensions in parquet/encoding_benchmark.cc
  • PARQUET-1877 - [C++] Reconcile thrift limits
  • PARQUET-1882 - [C++] Buffered Reads should allow for 0 length

New Features and Improvements

  • ARROW-300 - [Format] Proposal for “trivial” IPC body buffer compression using either LZ4 or ZSTD codecs
  • ARROW-842 - [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
  • ARROW-971 - [C++][Compute] IsValid, IsNull kernels
  • ARROW-974 - [Website] Add Use Cases section to the website
  • ARROW-1277 - Completing integration tests for major implemented data types
  • ARROW-1567 - [C++] Implement “fill_null” function that replaces null values with a scalar value
  • ARROW-1570 - [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature
  • ARROW-1682 - [Doc] Expand S3/MinIO filesystem dataset documentation
  • ARROW-1796 - [Python] RowGroup filtering on file level
  • ARROW-2260 - [C++][Plasma] Use Gflags for command-line parsing
  • ARROW-2444 - [Python][C++] Better handle reading empty parquet files
  • ARROW-2702 - [Python] Change a couple of error types in numpy_to_arrow.cc
  • ARROW-2714 - [Python] Implement variable step slicing with Take
  • ARROW-2912 - [Website] Build more detailed Community landing page a la Apache Spark
  • ARROW-3089 - [Rust] Add ArrayBuilder for different Arrow arrays
  • ARROW-3134 - [C++] Implement n-ary iterator for a collection of chunked arrays with possibly different chunking layouts
  • ARROW-3154 - [Python] Expand documentation on Parquet metadata inspection and writing of _metadata
  • ARROW-3244 - [Python] Multi-file parquet loading without scan
  • ARROW-3275 - [Python] Add documentation about inspecting Parquet file metadata
  • ARROW-3308 - [R] Convert R character vector with data exceeding 2GB to Large type
  • ARROW-3317 - [R] Test/support conversions from data.frame with a single character column exceeding 2GB capacity of BinaryArray
  • ARROW-3446 - [R] Document mapping of Arrow <-> R types
  • ARROW-3509 - [C++] Standardize on using Field in Type/Array
  • ARROW-3520 - [C++] Add “list_flatten” vector kernel wrapper for Flatten method of ListArray types
  • ARROW-3688 - [Rust] Add append_values for primitive builders
  • ARROW-3764 - [C++] Port Python “ParquetDataset” business logic to C++
  • ARROW-3827 - [Rust] Implement UnionArray Updated
  • ARROW-4022 - [C++] Promote Datum variant out of compute namespace
  • ARROW-4221 - [C++][Python] Add canonical flag in COO sparse index
  • ARROW-4390 - [R] Serialize “labeled” metadata in Feather files, IPC messages
  • ARROW-4412 - [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.
  • ARROW-4427 - [Doc] Move Confluence Wiki pages to the Sphinx docs
  • ARROW-4429 - [Doc] Add Git conventions to contributing guidelines
  • ARROW-4526 - [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package
  • ARROW-5035 - [C#] ArrowBuffer.Builder<bool> is broken
  • ARROW-5082 - [Python] Substantially reduce Python wheel package and install size
  • ARROW-5143 - [Flight] Enable integration testing of batches with dictionaries
  • ARROW-5279 - [C++] Support reading delta dictionaries in IPC streams
  • ARROW-5377 - [C++] Make IpcPayload public and add GetPayloadSize
  • ARROW-5489 - [C++] Normalize kernels and ChunkedArray behavior
  • ARROW-5548 - [Documentation] http://arrow.apache.org/docs/latest/ is not latest
  • ARROW-5649 - [Integration][C++] Create integration test for extension types
  • ARROW-5708 - [C#] Null support for BooleanArray
  • ARROW-5760 - [C++] New compute::Take implementation for better performance, faster dispatch, smaller code size / faster compilation
  • ARROW-5854 - [Python] Expose compare kernels on Array class
  • ARROW-6052 - [C++] Split up arrow/array.h/cc into multiple files under arrow/array/, move ArrayData to separate header, make ArrayData::dictionary ArrayData
  • ARROW-6110 - [Java][Integration] Support LargeList Type and add integration test with C++
  • ARROW-6111 - [Java] Support LargeVarChar and LargeBinary types
  • ARROW-6439 - [R] Implement S3 file-system interface in R
  • ARROW-6456 - [C++] Possible to reduce object code generated in compute/kernels/take.cc?
  • ARROW-6501 - [C++] Remove non_zero_length_ field from SparseIndex class
  • ARROW-6521 - [C++] Add an API to query runtime build info
  • ARROW-6543 - [R] Support LargeBinary and LargeString types
  • ARROW-6602 - [Doc] Add a feature/implementation matrix
  • ARROW-6603 - [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
  • ARROW-6645 - [Python] Use common boundschecking function for checking dictionary indices when converting to pandas
  • ARROW-6689 - [Rust] [DataFusion] Query execution enhancements for 1.0.0 release
  • ARROW-6691 - [Rust] [DataFusion] Use tokio and Futures instead of spawning threads
  • ARROW-6775 - [C++][Python] Implement list_value_lengths and list_parent_indices functions
  • ARROW-6776 - [Python] Need a lite version of pyarrow
  • ARROW-6800 - [C++] Add CMake option to build libraries targeting a C++14 or C++17 toolchain environment
  • ARROW-6839 - [Java] Add APIs to read and write “custom_metadata” field of IPC file footer (#7231)
  • ARROW-6856 - [C++] Use ArrayData instead of Array for ArrayData::dictionary
  • ARROW-6917 - [Archery][Release] Add support for JIRA curation, changelog generation and commit cherry-picking for maintenance releases
  • ARROW-6945 - [Rust][Integration] Run rust integration tests
  • ARROW-6959 - [C++] Clarify what signatures are preferred for compute kernels
  • ARROW-6978 - [R] Add bindings for sum and mean compute kernels
  • ARROW-6979 - [R] Enable jemalloc in autobrew formula
  • ARROW-7009 - [C++] Refactor filter/take kernels to use Datum instead of overloads
  • ARROW-7010 - [C++] Implement decimal-to-float casts
  • ARROW-7011 - [C++] Implement casts from float/double to decimal
  • ARROW-7012 - [C++] Add comments explaining high level detail about ChunkedArray class and questions about chunk sizes
  • ARROW-7068 - [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array
  • ARROW-7075 - [C++] Boolean kernels should not allocate in Call()
  • ARROW-7175 - [Website] Add a security page to track when vulnerabilities are patched
  • ARROW-7229 - [C++] Unify ConcatenateTables APIs
  • ARROW-7230 - [C++] Use vendored std::optional instead of boost::optional in Gandiva
  • ARROW-7237 - [C++] Use Result in arrow/json APIs
  • ARROW-7243 - [Docs] Add common “implementation status” table to the README of each native language implementation, as well as top level README
  • ARROW-7285 - [C++] ensure C++ implementation meets clarified dictionary spec
  • ARROW-7300 - [C++][Gandiva] Implement functions to cast from strings to integers/floats
  • ARROW-7313 - [C++] Add function for retrieving a scalar from an array slot
  • ARROW-7371 - [GLib] Add GLib binding of Dataset
  • ARROW-7375 - [Python] Expose C++ MakeArrayOfNull
  • ARROW-7391 - [C++][Dataset] Remove Expression subclasses from bindings
  • ARROW-7495 - [Java] Remove “empty” concept from ArrowBuf, replace with custom referencemanager (#6433)
  • ARROW-7605 - [C++] Create and install “dependency bundle” static library including jemalloc, mimalloc, and any BUNDLED static library so that static linking to libarrow.a is possible
  • ARROW-7607 - [C++] Example of using Arrow as a dependency of another CMake project
  • ARROW-7673 - [C++][Dataset] Revisit File discovery failure mode
  • ARROW-7676 - [Packaging][Python] Ensure that the static libraries are not built in the wheel scripts
  • ARROW-7699 - [Java] Support concatenating dense union vectors in batch
  • ARROW-7705 - [Rust] Initial sort implementation
  • ARROW-7717 - [CI] Have nightly integration test for Spark's latest release
  • ARROW-7759 - [C++][Dataset] Add CsvFileFormat
  • ARROW-7778 - [Integration][C++] Enable nested dictionaries
  • ARROW-7784 - [C++] Improve compilation time of arrow/array/diff.cc and reduce code size
  • ARROW-7801 - [Developer] Add issue_comment workflow to fix lint/style/codegen
  • ARROW-7803 - [R][CI] Autobrew/homebrew tests should not always install from master
  • ARROW-7831 - [Java] Fix build error from #6402
  • ARROW-7831 - [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
  • ARROW-7902 - [Integration] Unskip nested dictionary integration tests
  • ARROW-7910 - [C++] Add internal GetPageSize() function
  • ARROW-7924 - [Rust] Add sort for float types
  • ARROW-7950 - [Python] Determine + test minimal pandas version + raise error when pandas is too old
  • ARROW-7955 - [Java] Support large buffer for file/stream IPC
  • ARROW-8020 - [Java] Implement vector validate functionality
  • ARROW-8023 - [Website] Write a blog post about the C data interface
  • ARROW-8025 - [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc
  • ARROW-8025 - [C++] Implement cast from String to Binary
  • ARROW-8046 - [Developer][Integration] Makefile.docker's target names are broken
  • ARROW-8062 - [C++][Dataset] Implement ParquetDatasetFactory
  • ARROW-8065 - [C++][Dataset] Refactor ScanOptions and Fragment relation
  • ARROW-8074 - [C++][Dataset][Python] FileFragments from buffers and NativeFiles
  • ARROW-8108 - [Java] Extract a common interface for dictionary encoders
  • ARROW-8111 - [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl
  • ARROW-8114 - [Java][Integration] Enable custom_metadata integration test
  • ARROW-8121 - [Java] Enhance code style checking for Java code (add spaces after commas, semi-colons and type casts)
  • ARROW-8149 - [C++/Python] Enable CUDA Support in conda recipes
  • ARROW-8157 - [C++][Gandiva] Support building with LLVM 9
  • ARROW-8162 - [Format][Python] Add serialization for CSF sparse tensors to Python
  • ARROW-8169 - [Java] Improve the performance of JDBC adapter by allocating memory proactively
  • ARROW-8171 - [Java] Consider pre-allocating memory for fix-width vector in Avro adapter iterator (#7211)
  • ARROW-8190 - [FlightRPC][C++] Expose IPC options
  • ARROW-8229 - [Java] Move ArrowBuf into the Arrow package (#6729)
  • ARROW-8230 - [Java] Remove netty dependency from arrow-memory (#7347)
  • ARROW-8261 - [Rust-DataFusion] Made limit accept integers and no longer accept expressions.
  • ARROW-8263 - [Rust][DataFusion] Added some documentation to available SQL functions.
  • ARROW-8281 - [R] Name collision of arrow.dll on Windows conda
  • ARROW-8283 - [Python] Limit FileSystemDataset constructor from fragments/paths, no filesystem interaction
  • ARROW-8287 - [Rust] Add “pretty” util to help with printing tabular output of RecordBatches
  • ARROW-8293 - [Python] Run flake8 on python/examples also
  • ARROW-8297 - [FlightRPC][C++] Implement Flight DoExchange for C++
  • ARROW-8301 - [R] Handle ChunkedArray and Table in C data interface
  • ARROW-8312 - [Java][Gandiva] support TreeNode in IN expression
  • ARROW-8314 - [Python] Add a Table.select method to select a subset of columns
  • ARROW-8318 - [C++][Dataset] Construct FileSystemDataset from fragments
  • ARROW-8399 - [Rust] Extend memory alignments to include other architectures
  • ARROW-8413 - [C++][Parquet] Refactor Generating validity bitmap for values column
  • ARROW-8422 - [Rust][Parquet] Arrow to Parquet schema conversion
  • ARROW-8430 - [CI] Configure self-hosted runners for Github Actions
  • ARROW-8434 - [C++] Avoid multiple schema deserializations in RecordBatchFileReader
  • ARROW-8440 - [C++] Refine SIMD header files
  • ARROW-8443 - [Gandiva][C++] Fix Trunc and Round output types.
  • ARROW-8447 - [C++][Dataset] Ensure deterministic row ordering in Scanner::ToTable
  • ARROW-8456 - [Release] Add python script to help curating JIRA
  • ARROW-8467 - [C++] Fix TestArrayImport tests for big-endian platforms
  • ARROW-8474 - [CI][Crossbow] Skip some nightlies we don't need to run
  • ARROW-8477 - [C++] Enable reading and writing of long filenames for Windows
  • ARROW-8481 - [Java] Provide an allocation manager based on Unsafe API
  • ARROW-8483 - [Ruby] Removed irrelevant bits of documentation in Arrow::Table
  • ARROW-8485 - [Integration][Java] Implement extension types integration
  • ARROW-8486 - [C++] Fix BitArray failures on big-endian platforms
  • ARROW-8487 - [FlightRPC] Provide a way to target a particular payload size
  • ARROW-8488 - [R] Remove VALUE_OR_STOP and STOP_IF_NOT_OK macros
  • ARROW-8496 - [C++] Refine ByteStreamSplitDecodeScalar
  • ARROW-8497 - [Archery] Add missing components to build options
  • ARROW-8499 - [C++][Dataset] In ScannerBuilder, batch_size will not wor…
  • ARROW-8500 - [C++] Add benchmark for using Filter on RecordBatch
  • ARROW-8501 - [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6
  • ARROW-8502 - [Release][APT][Yum] Ignore all Linux packages for arm64v8
  • ARROW-8504 - [C++] Add BitRunReader and use it in parquet
  • ARROW-8506 - [C++] Add tests to verify the encoded stream of RLE with bit_width > 8
  • ARROW-8507 - [Release] Detect .git directory automatically in changelog.py
  • ARROW-8509 - [GLib] Add low level record batch read/write functions
  • ARROW-8512 - [C++] Remove unused expression/operator prototype code
  • ARROW-8513 - [Python] Expose Take with Table input in Python
  • ARROW-8515 - [C++] Bitmap::ToString should group by bytes
  • ARROW-8516 - [Rust] Improve PrimitiveBuilder::append_slice performance
  • ARROW-8517 - [Release] Update Crossbow release verification tasks for 0.17.0 RC0
  • ARROW-8520 - [Developer] Use .asf.yaml to direct GitHub notifications to JIRA and mailing lists
  • ARROW-8521 - [Release] Update CHANGELOG.md to include patch releases
  • ARROW-8522 - [Release][Developer] Add option to bootstrap NPM when running release verification script
  • ARROW-8524 - [CI] Free up space on github actions
  • ARROW-8526 - [Python] Fix non-deterministic row order failure in dataset tests
  • ARROW-8531 - [C++] Deprecate ARROW_USE_SIMD CMake option
  • ARROW-8538 - [Packaging] Remove boost from homebrew formula
  • ARROW-8540 - [C++] Add memory allocation benchmarks
  • ARROW-8541 - [Release] Don't remove previous source releases automatically
  • ARROW-8542 - [Release] Fix checksum url in the website post release script
  • ARROW-8543 - [C++] Single pass coalescing algorithm + Rebase
  • ARROW-8544 - [CI][Crossbow] Add a status.json to the gh-pages summary of nightly builds to get around rate limiting
  • ARROW-8548 - [Website] 0.17 release post
  • ARROW-8549 - [R] Assorted post-0.17 release cleanups
  • ARROW-8550 - [CI] Don't run cron GHA jobs on forks
  • ARROW-8551 - [CI][Gandiva] Use LLVM 8 in gandiva linux build
  • ARROW-8552 - [Rust] support iterate parquet row columns
  • ARROW-8553 - [C++] Optimize unaligned bitmap operations
  • ARROW-8555 - [FlightRPC][Java] implement DoExchange
  • ARROW-8558 - [Rust][CI] GitHub Actions missing rustfmt
  • ARROW-8559 - [Rust] Consolidate Record Batch reader traits in main arrow crate
  • ARROW-8560 - [Rust] Docs for MutableBuffer resize are incorrect
  • ARROW-8561 - [C++][Gandiva] Stop using deprecated google::protobuf::MessageLite::ByteSize()
  • ARROW-8562 - [C++] IO: Parameterize I/O Coalescing using S3 metrics
  • ARROW-8563 - [Go] Minor change to make newBuilder public
  • ARROW-8564 - [Website] Add Ubuntu 20.04 LTS to supported package list
  • ARROW-8569 - [CI] Upgrade xcode version for testing homebrew formulae
  • ARROW-8571 - [C++] Switch AppVeyor image to VS 2017
  • ARROW-8572 - [Python] expose UnionArray fields to Python
  • ARROW-8573 - [Rust] Upgrade Rust to 1.44 nightly
  • ARROW-8574 - [Rust] Implement Debug for all plain types
  • ARROW-8575 - [Developer] Add issue_comment workflow to rebase a PR
  • ARROW-8590 - [Rust] Use arrow crate pretty util in DataFusion
  • ARROW-8591 - [Rust] Reverse lookup for a key in DictionaryArray
  • ARROW-8597 - [Rust] Lints and readability improvements for arrow crate
  • ARROW-8606 - [CI] Don't trigger all builds on a change to any file in ci/
  • ARROW-8607 - [R][CI] Unbreak builds following R 4.0 release
  • ARROW-8611 - [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3
  • ARROW-8612 - [GLib] Add GArrowReadOptions and GArrowWriteOptions
  • ARROW-8616 - [Rust] Turn explicit SIMD off by default
  • ARROW-8619 - [C++] Use distinct enum values for MonthInterval, DayTimeInterval
  • ARROW-8622 - [Rust] Allow the parquet crate to be compiled on aarch64 platforms
  • ARROW-8623 - [C++][Gandiva] Reduce use of Boost, remove Boost headers from header files
  • ARROW-8624 - [Website] Install page should mention arrow-dataset packages
  • ARROW-8628 - [Dev] Wrap docker-compose commands with archery
  • ARROW-8629 - [Rust] Eliminate indirection of zero sized allocations
  • ARROW-8633 - [C++] Add ValidateAscii function
  • ARROW-8634 - [Java] Add Getting Started section to Java README
  • ARROW-8639 - [C++][Plasma] Require gflags
  • ARROW-8645 - [C++] Missing gflags dependency for plasma
  • ARROW-8647 - [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type
  • ARROW-8648 - [Rust] Optimize Rust CI Workflows
  • ARROW-8650 - [Rust][Website] Add documentation to Arrow website
  • ARROW-8651 - [Python][Dataset] Support pickling of Dataset objects
  • ARROW-8656 - [Python] Switch to VS2017 in the windows wheel builds
  • ARROW-8659 - [Rust] ListBuilder allocate with_capacity
  • ARROW-8660 - [C++][Gandiva] Reduce usage of Boost in Gandiva codebase
  • ARROW-8662 - [CI] Consolidate appveyor scripts
  • ARROW-8664 - [Java] Add flag to skip null check
  • ARROW-8668 - [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
  • ARROW-8669 - [C++] Add IpcWriteOptions argument to GetRecordBatchSize()
  • ARROW-8671 - [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files
  • ARROW-8671 - [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata
  • ARROW-8682 - [Ruby][Parquet] Add support for column level compression
  • ARROW-8687 - [Java] Remove references to io.netty.buffer.ArrowBuf
  • ARROW-8690 - [Python] Clean-up dataset+parquet tests now order is deterministic
  • ARROW-8692 - [C++] Avoid memory copies when downloading from S3
  • ARROW-8695 - [Java] Remove references to PlatformDependent in arrow-memory
  • ARROW-8696 - [Java] Convert tests to maven failsafe
  • ARROW-8699 - [R] Fix automatic r_to_py conversion
  • ARROW-8702 - [Packaging][C#] Build NuGet packages in release process
  • ARROW-8703 - [R] schema$metadata should be properly typed
  • ARROW-8707 - [CI] Docker push fails because of wrong dockerhub credentials
  • ARROW-8708 - [CI] Utilize github actions cache for docker-compose volumes
  • ARROW-8711 - [Python] Expose timestamp_parsers in csv.ConvertOptions
  • ARROW-8717 - [CI][Packaging] Add build dependency on boost to homebrew
  • ARROW-8720 - [C++] Fix checked_pointer_cast ifdef logic
  • ARROW-8721 - [CI] Fix R build matrix
  • ARROW-8723 - [Rust] Remove SIMD specific benchmark code
  • ARROW-8724 - [Packaging][deb][RPM] Use directory in host as build directory
  • ARROW-8725 - [Rust] remove redundant directory walk in parquet datasource
  • ARROW-8727 - [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function
  • ARROW-8730 - [Rust] Use slice instead of &Vec for function args
  • ARROW-8733 - [C++][Dataset][Python] Expose RowGroupInfo statistics values
  • ARROW-8736 - [Rust][DataFusion] Table API should provide a schema() method
  • ARROW-8740 - [CI] Fix archery option in pandas master cron test
  • ARROW-8742 - [C++][Python] Add GRPC Mutual TLS for clients and server
  • ARROW-8743 - [CI][C++] Add a test job for s390x
  • ARROW-8744 - [Rust] handle channel close in parquet batch iterator
  • ARROW-8745 - [C++] Enhance Bitmap::ToString test for big-endian platforms
  • ARROW-8747 - [C++] Write compressed size in little-endian format for Feather V2
  • ARROW-8751 - [Rust] support empty parquet file in arrow array reader
  • ARROW-8752 - [Rust] remove unused hashmaps in build_array_reader
  • ARROW-8753 - [CI][C++] Add a test job for ARM
  • ARROW-8754 - [C++][CI] Enable additional tests on s390x
  • ARROW-8756 - [C++] Fix Bitmap Words tests' failures on big-endian platforms
  • ARROW-8757 - [C++][Plasma] Write Plasma header in little-endian format
  • ARROW-8758 - [R] Updates for compatibility with dplyr 1.0
  • ARROW-8759 - [C++][Plasma] Fix TestPlasmaSerialization.DeleteReply failure on big-endian platforms
  • ARROW-8762 - [C++] Use arrow::internal::BitmapAnd directly in Gandiva
  • ARROW-8763 - [C++] Add RandomAccessFile::WillNeed
  • ARROW-8764 - [C++] Make executor configurable in ReadAsync and ReadRangeCache
  • ARROW-8766 - [Python] Allow implementing filesystems in Python
  • ARROW-8769 - [C++][R] Add convenience accessor for StructScalar fields
  • ARROW-8770 - [C++][CI] Enable arrow-csv-test on s390x
  • ARROW-8772 - [C++] Unrolled aggregate dense for better speculative execution
  • ARROW-8777 - [Rust] Parquet.rs does not support reading fixed-size binary fields.
  • ARROW-8778 - [C++][Gandiva] Fix SelectionVector related failure on big-endian platform
  • ARROW-8779 - [R] Implement conversion to List
  • ARROW-8781 - [CI][MinGW] Enable ccache
  • ARROW-8782 - [Rust] Add benchmark crate
  • ARROW-8783 - [Rust][DataFusion] Add ParquetScan and CsvScan variants in LogicalPlan
  • ARROW-8784 - [Rust][DataFusion] Remove use of Arc from LogicalPlan
  • ARROW-8785 - [Python][Packaging] Build the windows wheels with MIMALLOC enabled
  • ARROW-8786 - [Packaging][rpm] Use bundled zstd in the CentOS 8 build
  • ARROW-8788 - [C#] Introduce bit-packed builder for null support in builders
  • ARROW-8789 - [Rust] Add separate crate for integration test binaries
  • ARROW-8790 - [C++][CI] Enable arrow-flight-test on s390x
  • ARROW-8791 - [Rust] Allow creation of StringDictionaryBuilder with an existing array of dictionary values
  • ARROW-8792 - [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
  • ARROW-8793 - [C++] Do not inline BitUtil::SetBitsTo
  • ARROW-8794 - [C++] Expand performance coverage of parquet to arrow reading
  • ARROW-8795 - [C++] Limited iOS support
  • ARROW-8800 - [C++] Split ChunkedArray into arrow/chunked_array.h/cc
  • ARROW-8804 - [R][CI] Followup to Rtools40 upgrade
  • ARROW-8814 - [Dev][Release] Binary upload script keeps raising locale warnings
  • ARROW-8815 - [Dev][Release] Binary upload script should retry on unexpected bintray request error
  • ARROW-8818 - [Rust] Failing to build on master due to Flatbuffers/Union issues
  • ARROW-8822 - [Rust][DataFusion] Add InMemoryScan to LogicalPlan
  • ARROW-8827 - [Rust] Add initial skeleton for Rust integration tests
  • ARROW-8830 - [GLib] Add support for Tell against non-seekable GIO output stream
  • ARROW-8831 - [Rust] change simd_compare_op in comparison kernel to use bitmask SIMD operation to significantly improve performance
  • ARROW-8833 - [Rust] Implement VALIDATE mode in integration tests
  • ARROW-8834 - [Rust][Integration Testing] Implement stream-to-file, file-to-stream
  • ARROW-8835 - [Rust] Implement arrow-stream-to-file for integration testing
  • ARROW-8836 - [Website] Update copyright end year automatically
  • ARROW-8837 - [Rust] Implement Null data type
  • ARROW-8838 - [Rust] File reader fails to read header from valid files
  • ARROW-8839 - [Rust][DataFusion] support CSV schema inference in logical plan
  • ARROW-8840 - [Rust][DataFusion] implement std::error::Error trait for ExecutionError
  • ARROW-8841 - [C++] Add benchmark and unittest for encoding::PLAIN spaced
  • ARROW-8843 - [C++] Compare bitmaps in words
  • ARROW-8844 - [C++] Transfer bitmap in words
  • ARROW-8846 - [Dev][Python] Autoformat Python files with archery
  • ARROW-8847 - [C++] Pass task hints in Executor API
  • ARROW-8851 - [Python][Documentation] Fix FutureWarnings in Python Plas…
  • ARROW-8852 - [R] Post-0.17.1 adjustments
  • ARROW-8854 - [Rust][Integration Testing] Standardize error handling
  • ARROW-8855 - [Rust][Integration] Complete record_batch_from_json types
  • ARROW-8856 - [Rust][Integration] Return None from an empty IPC message
  • ARROW-8864 - [R] Add methods to Table/RecordBatch for consistency with data.frame
  • ARROW-8866 - [C++] Split UNION into SPARSE_UNION and DENSE_UNION
  • ARROW-8867 - [R] Support converting POSIXlt type
  • ARROW-8875 - [C++] use AWS SDK SetResponseStreamFactory to avoid a copy of bytes
  • ARROW-8877 - [Rust][DataFusion] introduce CsvReadOption struct to simplify UX
  • ARROW-8879 - [FlightRPC][Java] FlightStream should unwrap ExecutionExceptions
  • ARROW-8880 - [R][Linux] Make R Binary Install Friendlier
  • ARROW-8881 - [Rust] Add large binary, string and list support
  • ARROW-8885 - [R] Don't include everything everywhere
  • ARROW-8886 - [C#] Resize to negative length no longer permitted
  • ARROW-8887 - [Java] Avoid runaway doubling of buffer size for complex vectors
  • ARROW-8890 - [R] Fix C++ lint issues
  • ARROW-8895 - [C++] Test temporal types with Take and Filter, expand types supported by RandomArrayGenerator::ArrayOf
  • ARROW-8896 - [C++] Use Take to implement dictionary to T casts
  • ARROW-8899 - [R] Add R metadata like pandas metadata for round-trip fidelity
  • ARROW-8901 - [C++] Reduce number of take kernels
  • ARROW-8903 - [C++] Implement optimized “unsafe take” for use with selection vectors for kernel execution
  • ARROW-8904 - [Python] Adapt to child->field API migration/deprecation
  • ARROW-8906 - [Rust][DataFusion] support schema inference from multiple CSV files
  • ARROW-8907 - [Rust] Implement scalar comparison operations
  • ARROW-8912 - [Ruby] Keep reference of Arrow::Buffer's data for GC
  • ARROW-8913 - [Ruby] Use “field” instead of “child”
  • ARROW-8914 - [C++] Keep BasicDecimal128 in native-endian order
  • ARROW-8915 - [Dev][Archery] Require Click 7
  • ARROW-8917 - [C++] Formalize “metafunction” concept. Add Take and Filter metafunctions, port R and Python bindings
  • ARROW-8918 - [C++][Python] Implement cast metafunction to allow use of “cast” with CallFunction, use in Python
  • ARROW-8922 - [C++] Add illustrative “ascii_upper” and “ascii_length” scalar string functions valid for Array and Scalar inputs
  • ARROW-8923 - [C++] Improve usability of arrow::compute::CallFunction
  • ARROW-8926 - [C++] Improve arrow/compute/*.h comments, correct typos and outdated language
  • ARROW-8927 - [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions
  • ARROW-8929 - [C++] Set the default for compute::Arity::VarArgs to 0
  • ARROW-8931 - [Rust] add lexical sort support to arrow compute kernel
  • ARROW-8933 - [C++] Trim redundant generated code from compute/kernels/vector_hash.cc
  • ARROW-8934 - [C++] Enable compute::Subtract with timestamp inputs to return duration
  • ARROW-8937 - [C++] Implement strptime scalar string to timestamp kernel
  • ARROW-8938 - [R] Provide binding for arrow::compute::CallFunction
  • ARROW-8940 - [Java] Fix the performance degradation of integration tests
  • ARROW-8941 - [C++/Python] Add cleanup script for arrow-nightlies conda repository
  • ARROW-8942 - [R] Detect compression in reading CSV/JSON
  • ARROW-8943 - [C++][Python][Dataset] Add partitioning support to ParquetDatasetFactory
  • ARROW-8950 - [C++] Avoid HEAD when possible in S3 filesystem
  • ARROW-8958 - [FlightRPC][Python] implement DoExchange
  • ARROW-8960 - [MINOR][FORMAT] fix typo
  • ARROW-8961 - [C++] Add utf8proc library to toolchain
  • ARROW-8963 - [C++][Parquet] optimize LeafReader::NextBatch to save memory
  • ARROW-8965 - [Python][Doc] Pyarrow documentation for pip nightlies references 404'd location
  • ARROW-8966 - [C++] Move arrow::ArrayData to a separate header file
  • ARROW-8969 - [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators
  • ARROW-8970 - [C++] Reduce shared library / binary code size (umbrella issue)
  • ARROW-8972 - [Java] Support range value comparison for large varchar/varbinary vectors
  • ARROW-8973 - [Java] Support batch value appending for large varchar/varbinary vectors
  • ARROW-8974 - [C++] Simplify TransferBitmap
  • ARROW-8976 - [C++] compute::CallFunction can't Filter/Take with ChunkedArray
  • ARROW-8979 - [C++] Refine bitmap unaligned word access
  • ARROW-8984 - [R] Revise install guides now that Windows conda package exists
  • ARROW-8985 - [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility
  • ARROW-8989 - [C++][Doc] Document available compute functions
  • ARROW-8993 - [Rust] support reading non-seekable sources
  • ARROW-8994 - [C++] Disable include-what-you-use cpplint lint checks
  • ARROW-8996 - [C++] Add AVX version for aggregate sum/mean with runtime dispatch
  • ARROW-8997 - [Archery] Improve benchmark comparison formatting
  • ARROW-9004 - [C++][Gandiva] Support building with LLVM 10
  • ARROW-9005 - [Rust][Datafusion] support sort expression
  • ARROW-9007 - [Rust] Support appending array data to builders
  • ARROW-9011 - [Python][Packaging] Move the anaconda cleanup script to crossbow
  • ARROW-9014 - [Packaging] Bump the minor part of the automatically generated version in crossbow
  • ARROW-9015 - [Java] Make BaseAllocator package private
  • ARROW-9016 - [Java] Remove direct references to Netty/Unsafe Allocators
  • ARROW-9017 - [C++][Python] Refactor scalar bindings
  • ARROW-9018 - [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior
  • ARROW-9021 - [Python] Add the filesystem explanation to parquet.read_table docstring
  • ARROW-9022 - [C++] Add/Sub/Mul arithmetic kernels with overflow check
  • ARROW-9029 - [C++] Implement BitBlockCounter for much faster block popcounts of bitmaps
  • ARROW-9030 - [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib
  • ARROW-9031 - [R] Implement conversion from Type::UINT64 to R vector
  • ARROW-9032 - [C++] Split up arrow/util/bit_util.h into multiple header files
  • ARROW-9034 - [C++] Implement “BinaryBitBlockCounter”, add single-word functions to BitBlockCounter
  • ARROW-9042 - [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
  • ARROW-9043 - [Go][FOLLOWUP] Move license file copy to correct location
  • ARROW-9043 - [Go] Temporarily copy LICENSE.txt to go/
  • ARROW-9045 - [C++] Expand / improve Take and Filter benchmarks for enhanced baseline
  • ARROW-9046 - [C++][R] Put more things in type_fwds
  • ARROW-9047 - [Rust] Fix a segfault when setting zero bits in a zero-length bitset.
  • ARROW-9050 - [Release] Use 1.0.0 as the next version
  • ARROW-9051 - [GLib] Refer Array related objects from Array
  • ARROW-9052 - [CI][MinGW] Enable Gandiva
  • ARROW-9055 - [C++] Add sum/mean/minmax kernels for Boolean type
  • ARROW-9058 - [Packaging][wheel] Use sourceforge.net to download Boost
  • ARROW-9060 - [GLib] Add support for building Apache Arrow Datasets GLib with non-installed Apache Arrow Datasets
  • ARROW-9061 - [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib
  • ARROW-9062 - [Rust] json reader dictionary support
  • ARROW-9067 - [C++] Create reusable branchless / vectorized index boundschecking functions
  • ARROW-9070 - [C++] StructScalar needs field accessor methods
  • ARROW-9073 - [C++] Fix RapidJSON include directory detection with RapidJSONConfig.cmake
  • ARROW-9074 - [GLib] Add missing arrow-json check
  • ARROW-9075 - [C++] Optimized Filter implementation: faster performance + compilation, smaller code size
  • ARROW-9079 - [C++] Write benchmark for arithmetic kernels
  • ARROW-9083 - [R] collect int64, uint32, uint64 as R integer type if not out of bounds
  • ARROW-9086 - [CI][Homebrew] Enable Gandiva
  • ARROW-9088 - [Rust] Make prettyprint optional
  • ARROW-9089 - [Python] A PyFileSystem handler for fsspec-based filesystems
  • ARROW-9090 - [C++] Bump versions of bundled libraries
  • ARROW-9091 - [C++][Compute] Add default FunctionOptions
  • ARROW-9093 - [FlightRPC][C++][Python] expose generic gRPC transport options
  • ARROW-9094 - [Python] Bump versions of compiled dependencies in manylinux wheels
  • ARROW-9095 - [Rust] Spec-compliant NullArray
  • ARROW-9099 - [C++][Gandiva] Implement trim function for string
  • ARROW-9100 - [C++] Add ascii_lower kernel
  • ARROW-9101 - [Doc][C++] Document encoding expected for CSV data
  • ARROW-9102 - [Packaging] Upload built manylinux docker images
  • ARROW-9106 - [Python] Allow specifying CSV file encoding
  • ARROW-9108 - [C++][Dataset] Add support for missing types in Statistics to Scalar conversion
  • ARROW-9109 - [Python][Packaging] Enable S3 support in manylinux wheels
  • ARROW-9110 - [C++] Fix CPU cache size detection on macOS
  • ARROW-9112 - [R] Update autobrew script location
  • ARROW-9115 - [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch
  • ARROW-9116 - [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length
  • ARROW-9116 - [C++] Add BaseBinaryArray::total_values_length
  • ARROW-9118 - [C++] Add more general BoundsCheck function that also checks for arbitrary lower limits in integer arrays
  • ARROW-9119 - [C++] Add support for building with system static gRPC
  • ARROW-9123 - [Python][wheel] Use libzstd.a explicitly
  • ARROW-9124 - [Rust][Datafusion] optimize DFParser::parse_sql to take query string as &str
  • ARROW-9125 - [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
  • ARROW-9129 - [Python][JPype] Remove JPype version check
  • ARROW-9130 - [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask
  • ARROW-9131 - [C++] Faster ascii_lower and ascii_upper.
  • ARROW-9132 - [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor
  • ARROW-9133 - [C++] Add utf8_upper and utf8_lower
  • ARROW-9137 - [GLib] Add gparquet_arrow_file_reader_read_row_group()
  • ARROW-9138 - [Docs][Format] Make sure format version is hard coded in the docs
  • ARROW-9139 - [Python] Switch parquet.read_table to use new datasets API by default
  • ARROW-9144 - [CI] OSS-Fuzz build fails because of recent changes in the google repository
  • ARROW-9145 - [C++] Implement BooleanArray::true_count and false_count, add Python bindings
  • ARROW-9152 - [C++] Specialized implementation of filtering Binary/LargeBinary-based types
  • ARROW-9153 - [Python] Add bindings for StructScalar
  • ARROW-9154 - [Developer] Use GitHub issue templates better
  • ARROW-9155 - [Archery] Less precise but faster default settings for “archery benchmark diff”
  • ARROW-9156 - [C++] Reducing the code size of the tensor module
  • ARROW-9157 - [Rust][Datafusion] create_physical_plan should take self as immutable reference
  • ARROW-9158 - [Rust][Datafusion] projection physical plan compilation should preserve nullability
  • ARROW-9159 - [Python] Implement Array.isnull/isvalid methods
  • ARROW-9162 - [Python] Expose Add/Subtract/Multiply arithmetic kernels
  • ARROW-9163 - [C++] Validate UTF8 contents of a StringArray
  • ARROW-9166 - [Website] Add overview page
  • ARROW-9167 - [Doc][Website] /docs/c_glib/index.html is overwritten
  • ARROW-9168 - [C++][Flight] Don't share TCP connection among clients
  • ARROW-9173 - [C++][Doc] Document how to use Arrow from a third-party CMake project
  • ARROW-9175 - [FlightRPC][C++] Expose peer to server
  • ARROW-9176 - [Rust] Fix for memory leaks in Arrow allocator
  • ARROW-9178 - [R] Improve documentation about CSV reader
  • ARROW-9179 - [R] Replace usage of iris dataset in tests
  • ARROW-9180 - [Developer] Remove usage of whitelist, blacklist, slave, etc.
  • ARROW-9181 - [C++] Instantiate fewer templates for cast kernels
  • ARROW-9182 - [C++] Use “applicator” namespace for some kernel execution functors. Streamline some applicator implementations
  • ARROW-9185 - [Java][Gandiva] Make llvm build optimisation configurable from java
  • ARROW-9188 - [C++] Use Brotli shared libraries if they are available
  • ARROW-9189 - [Website] Improve contributor guide
  • ARROW-9190 - [Website][C++] Add blog post on efforts to make building lighter and easier
  • ARROW-9191 - [Rust] Do not panic when milliseconds is less than zero as chrono can handle…
  • ARROW-9192 - [CI][Rust] Add support for running clippy
  • ARROW-9193 - [C++] Avoid spurious intermediate string copy in ToDateHolder
  • ARROW-9197 - [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size
  • ARROW-9201 - [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests
  • ARROW-9202 - [GLib] Add GArrowDatum
  • ARROW-9203 - [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install
  • ARROW-9204 - [C++][Flight] Change records_per_stream to int64
  • ARROW-9206 - [C++][Flight] Add latency benchmark
  • ARROW-9207 - [Python] Clean-up internal FileSource class
  • ARROW-9210 - [C++] Use BitBlockCounter in array/visitor_inline.h
  • ARROW-9214 - [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline
  • ARROW-9216 - [C++] Use BitBlockCounter for plain spaced encoding/decoding
  • ARROW-9217 - [C++] Cover 0.01% null for the plain spaced benchmark
  • ARROW-9220 - [C++] Make utf8proc optional even with ARROW_COMPUTE=ON
  • ARROW-9222 - [Format] Columnar.rst changes for removing validity bitmap from union types
  • ARROW-9224 - [Dev][Archery] clone local source with --shared
  • ARROW-9225 - [C++][Compute] Speed up counting sort
  • ARROW-9231 - [Format] Increment MetadataVersion from V4 to V5
  • ARROW-9234 - [GLib][CUDA] Add support for dictionary memo on reading record batch from buffer
  • ARROW-9241 - [C++] Add forward compatibility check for Decimal bit width
  • ARROW-9242 - [Java] Add forward compatibility check for Decimal bit width
  • ARROW-9247 - [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray
  • ARROW-9248 - [C++] Add “list_size” function that returns Int32Array/Int64Array giving list cell sizes
  • ARROW-9249 - [C++] Implement “list_parent_indices” vector function
  • ARROW-9250 - [C++] Instantiate fewer templates in IsIn, Match kernel implementations
  • ARROW-9251 - [C++] Relocate integration testing JSON code implementation to src/arrow/testing
  • ARROW-9254 - [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primitive scalar data opaquely
  • ARROW-9255 - [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
  • ARROW-9256 - [C++] Incorrect variable name ARROW_CXX_FLAGS
  • ARROW-9258 - [FORMAT] Add V5 MetadataVersion to Schema.fbs
  • ARROW-9259 - [Format] Add language indicating that unsigned dictionary indices are supported but that signed integers are preferred
  • ARROW-9262 - [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI
  • ARROW-9263 - [C++] Promote compute aggregate benchmark size to 1M.
  • ARROW-9264 - [C++][Parquet] Refactor and modernize schema conversion code
  • ARROW-9265 - [C++] Allow writing and reading V4-compliant IPC data
  • ARROW-9268 - [C++] add string_is{alnum,alpha...,upper} kernels
  • ARROW-9272 - [C++][Python] Reduce complexity in python to arrow conversion
  • ARROW-9276 - [Dev] Enable ARROW_CUDA when generating API documentations
  • ARROW-9277 - [C++] Fix docs of reading CSV files
  • ARROW-9278 - [C++][Python] Remove validity bitmap from Union types, update IPC read/write and integration tests
  • ARROW-9280 - [Rust][Parquet] Calculate page and column statistics
  • ARROW-9281 - [R] Turn off utf8proc in R builds
  • ARROW-9283 - [Python] Expose build info
  • ARROW-9287 - [C++] Support unsigned dictionary indices
  • ARROW-9289 - [R] Remove deprecated functions
  • ARROW-9290 - [Rust][Parquet] Add features to allow opting out of dependencies
  • ARROW-9291 - [R] Support fixed size binary/list types
  • ARROW-9292 - [Doc] Remove Rust from feature matrix
  • ARROW-9294 - [GLib] Add GArrowFunction and related objects
  • ARROW-9300 - [Java] Separate Netty Memory to its own module
  • ARROW-9306 - [Ruby] Add support for Arrow::RecordBatch.new(raw_table)
  • ARROW-9307 - [Ruby] Add Arrow::RecordBatchIterator#to_a
  • ARROW-9308 - [Format] Add Feature enum for forward compatibility.
  • ARROW-9316 - [C++] Use “Dataset” instead of “Datasets”
  • ARROW-9321 - [C++][Dataset] Populate statistics opportunistically
  • ARROW-9322 - [R] Dataset documentation polishing
  • ARROW-9323 - [Ruby] Add Red Arrow Dataset
  • ARROW-9327 - [Rust] Fix all clippy errors for arrow crate
  • ARROW-9329 - [C++][Gandiva] Implement castTimestampToDate function in gandiva
  • ARROW-9331 - [C++] Improve the performance of Tensor-to-SparseTensor conversion
  • ARROW-9333 - [Python] Expose more IPC options
  • ARROW-9335 - [Website] Update website for 1.0
  • ARROW-9337 - [R] On C++ library build failure, give an unambiguous message
  • ARROW-9339 - [Rust] Comments on SIMD in Arrow README are incorrect
  • ARROW-9340 - [R] Use CRAN version of decor package
  • ARROW-9341 - [GLib] Use arrow::Datum version Take()
  • ARROW-9345 - [C++][Dataset] Support casting scalars to dictionary scalars
  • ARROW-9346 - [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo
  • ARROW-9362 - [Java] Support reading/writing V5 MetadataVersion
  • ARROW-9365 - [Go] Added the rest of the implemented array builders to NewBuilder
  • ARROW-9370 - [Java] Bump Netty version
  • ARROW-9374 - [C++][Python] Expose MakeArrayFromScalar
  • ARROW-9379 - [Rust] Add support for unsigned dictionary keys
  • ARROW-9383 - [Python] Support fsspec filesystems in Dataset API
  • ARROW-9386 - [Rust] RecordBatch.schema() should not return &Arc<Schema>
  • ARROW-9390 - [C++][Followup] Add underscores to is* string functions
  • ARROW-9390 - [Doc] Add missing file
  • ARROW-9390 - [C++][Doc] Review compute function names
  • ARROW-9391 - [Rust] Padding added to arrays causes float32s to be incorrectly cast to float64s when a record batch contains only one row.
  • ARROW-9393 - [Doc] update supported types documentation for Java
  • ARROW-9395 - [Python] allow configuring MetadataVersion
  • ARROW-9399 - [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
  • ARROW-9403 - [Python] add Array.tolist as alias of .to_pylist
  • ARROW-9407 - [Python] Recognize more pandas null sentinels in sequence type inference when converting to Arrow
  • ARROW-9411 - [Rust] Update dependencies
  • ARROW-9424 - [C++][Parquet] Disable writing files with LZ4 codec
  • ARROW-9425 - [Rust][DataFusion] Made ExecutionContext sharable and sync
  • ARROW-9427 - [Rust][DataFusion] Added ExecutionContext.tables()
  • ARROW-9437 - [Python][Packaging] Homebrew fails to install build dependencies in the macOS wheel builds
  • ARROW-9442 - [Python] Do not call Validate() in pyarrow_wrap_table
  • ARROW-9445 - [Python] Revert Array.equals changes + expose comparison ops in compute
  • ARROW-9446 - [C++] Add compiler id, version, and build flags to BuildInfo
  • ARROW-9447 - [Rust][DataFusion] Made ScalarUDF (Send + Sync)
  • ARROW-9452 - [Rust][DataFusion] Optimize ParquetScanExec
  • ARROW-9470 - [CI][Java] Run Maven in parallel
  • ARROW-9472 - [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed
  • ARROW-9473 - [Doc] Polishing for 1.0
  • ARROW-9478 - [C++] Improve error message for unsupported casts
  • ARROW-9484 - [Docs] Update is* functions to be is_* in the compute docs
  • ARROW-9485 - [R] Better shared library stripping
  • ARROW-9493 - [Python] Enable dictionary encoding in read_table with datasets API
  • ARROW-9509 - [Release] Don't test Gandiva in the windows wheel verification script
  • ARROW-9511 - [Packaging][Release] Set conda packages' build number to 0
  • ARROW-9514 - [Python] The new Dataset API will not work with files on Azure Blob
  • ARROW-9519 - [Rust] Improved error message when getting a field by name.
  • ARROW-9529 - [Dev][Release] Improvements to release verification scripts
  • ARROW-9531 - [Packaging][Release] Update conda forge dependency pins
  • PARQUET-1820 - [C++] pre-buffer specified columns of row group
  • PARQUET-1843 - [C++] Drop duplicated assignment
  • PARQUET-1855 - [C++] Improve parquet *MetaData documentation
  • PARQUET-1861 - [Parquet][Documentation] Clarify buffered stream option

Apache Arrow 0.17.1 (2020-05-18)

Bug Fixes

  • ARROW-8503 - [Packaging][deb] Fix building apache-arrow-archive-keyring for RC
  • ARROW-8505 - [Release][C#] “sourcelink test” fails because of Apache.ArrowAssemblyInfo.cs
  • ARROW-8584 - [C++] Fix ORC link order
  • ARROW-8608 - [C++] Update vendored ‘variant.hpp’ to fix CUDA 10.2
  • ARROW-8609 - [C++] Fix ORC Java JNI crash
  • ARROW-8641 - [C++][Python] Sort included indices in IpcReader - Respect column selection in FeatherReader
  • ARROW-8657 - [C++][Python] Add separate configuration for data pages
  • ARROW-8684 - [Python] Workaround Cython type initialization bug
  • ARROW-8694 - [C++][Parquet] Relax string size limit when deserializing Thrift messages
  • ARROW-8704 - [C++] Fix Parquet undefined behaviour on invalid input
  • ARROW-8706 - [C++][Parquet] Tracking JIRA for PARQUET-1857 (unencrypted INT16_MAX Parquet row group limit)
  • ARROW-8728 - [C++] Fix bitmap operation buffer overflow
  • ARROW-8741 - [Python][Packaging] Keep VS2015 for the windows wheels
  • ARROW-8750 - [Python] Correctly default to lz4 compression for Feather V2 in Python
  • PARQUET-1857 - [C++] Do not fail to read unencrypted files with over 32767 row groups. Change some DCHECKs causing segfaults to throw exceptions

New Features and Improvements

  • ARROW-7731 - [C++][Parquet] Support LargeListArray
  • ARROW-8501 - [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6
  • ARROW-8549 - [R] Assorted post-0.17 release cleanups
  • ARROW-8699 - [R] Fix automatic r_to_py conversion
  • ARROW-8758 - [R] Updates for compatibility with dplyr 1.0
  • ARROW-8786 - [Packaging][rpm] Use bundled zstd in the CentOS 8 build

Apache Arrow 0.17.0 (2020-04-20)

Bug Fixes

  • ARROW-1907 - [C++/Python] Feather format cannot accommodate string columns containing more than a total of 2GB of data
  • ARROW-2255 - [C++][Developer][Integration] Serialize custom field/schema metadata
  • ARROW-2587 - [Python][Parquet] Verify nested data can be written
  • ARROW-3004 - [Documentation] Builds docs for master rather than a pinned commit
  • ARROW-3543 - [R] Better support for timestamp format and time zones in R
  • ARROW-5265 - [Python][CI] Add integration test with kartothek
  • ARROW-5473 - [C++] Fix googletest_ep build failure on windows+ninja
  • ARROW-5981 - [C++] Propagate errors from MemoTable to DictionaryBuilder
  • ARROW-6528 - [C++] Spurious Flight test failures (port allocation failure)
  • ARROW-6547 - [C++] valgrind errors in diff-test
  • ARROW-6738 - [Java] Fix problems with current union comparison logic
  • ARROW-6757 - [Release] Use same CMake generator for C++ and Python when verifying RC, remove Python 3.5 from wheel verification
  • ARROW-6871 - [Java] Enhance TransferPair related parameters check and tests
  • ARROW-6872 - [Python] Fix empty table creation from schema with dictionary field
  • ARROW-6890 - [Rust] [Parquet] ArrowReader fails with seg fault
  • ARROW-6895 - [C++][Parquet] Do not reset dictionary in ByteArrayDictionaryRecordReader during incremental reads
  • ARROW-7008 - [C++] Check binary offsets and data buffers for nullness in validation. Produce valid arrays in DictionaryEncode on zero-length arrays
  • ARROW-7049 - [C++] Fix MinGW64 warning in FieldRef::Get
  • ARROW-7301 - [Java] Sql type DATE should correspond to DateDayVector
  • ARROW-7335 - [C++][Gandiva] Add day_time_interval functions: castBIGINT, extractDay
  • ARROW-7390 - [C++][Dataset] Fix RecordBatchProjector race
  • ARROW-7405 - [Java] ListVector isEmpty API is incorrect
  • ARROW-7466 - [CI][Java] Fix gandiva-jar-osx nightly build failure
  • ARROW-7467 - [Java] ComplexCopier does incorrect copy for Map nullable info
  • ARROW-7507 - [Rust] Bump Thrift version to 0.13 in parquet-format and parquet
  • ARROW-7520 - [R] Writing many batches causes a crash
  • ARROW-7546 - [Java] Use new implementation to concatenate vector values in batch
  • ARROW-7624 - [Rust] Soundness issues via Buffer methods
  • ARROW-7628 - [Python] Clarify docs of csv reader skip_rows and nulls in strings
  • ARROW-7631 - [C++][Gandiva] return zero if there is an overflow while downscaling a decimal
  • ARROW-7672 - [C++] NULL pointer dereference bug
  • ARROW-7680 - [C++] Fix dataset.factory(...) with Windows paths
  • ARROW-7701 - [FlightRPC][C++] disable flaky MacOS test
  • ARROW-7713 - [Java] TastLeak was put at the wrong location
  • ARROW-7722 - [FlightRPC][Java] disable flaky Flight auth test
  • ARROW-7734 - [C++] check status details for nullptr in equality
  • ARROW-7740 - [C++] Fix StructArray::Flatten corruption
  • ARROW-7755 - [Python] Windows wheel cannot be installed on Python 3.8
  • ARROW-7758 - [Python] Safe cast to nanosecond timestamps in to_pandas conversion
  • ARROW-7760 - [Release] Fix verify-release-candidate.sh since pip3 seems to no longer be in miniconda, install miniconda unconditionally
  • ARROW-7762 - [Python] Do not ignore exception for invalid version in ParquetWriter
  • ARROW-7766 - [Python][Packaging] Windows py38 wheels are built with wrong ABI tag
  • ARROW-7772 - [R][C++][Dataset] Unable to filter on date32 object with date64 scalar
  • ARROW-7775 - [Rust] fix: Don't let safe code arbitrarily transmute readers and writers
  • ARROW-7777 - [Go] Fix StructBuilder and ListBuilder panics on index out of range
  • ARROW-7780 - [Release] Fix Windows wheel RC verification script given lack of “m” ABI tag in Python 3.8
  • ARROW-7781 - [C++] Improve message when referencing a missing field
  • ARROW-7783 - [C++] Set ARROW_COMPUTE=ON if ARROW_DATASET=ON
  • ARROW-7785 - [C++] Improve compilation performance of sparse tensor related code
  • ARROW-7786 - [R] Wire up check_metadata in Table.Equals method
  • ARROW-7789 - [R] Can't initialize arrow objects when R.oo package is loaded
  • ARROW-7791 - [C++][Parquet] Fix building error “cannot bind lvalue”
  • ARROW-7792 - [R] read_* functions should close connection to file
  • ARROW-7793 - [Java] Release accounted-for reservation memory to parent in case of leak
  • ARROW-7794 - [Rust][Flight] Remove hard-coded relative path to Flight.proto
  • ARROW-7794 - [Rust] Support releasing arrow-flight
  • ARROW-7797 - [Release][Rust] Fix arrow-flight's version in datafusion crate
  • ARROW-7802 - [C++][Python] Support LargeBinary and LargeString in the hash kernel
  • ARROW-7806 - [Python] Support LargeListArray and list conversion to pandas.
  • ARROW-7807 - [R] Installation on RHEL 7 Cannot call io___MemoryMappedFile__Open()
  • ARROW-7809 - [R] vignette does not run on Win 10 nor ubuntu
  • ARROW-7813 - [Rust] Remove and fix unsafe code
  • ARROW-7815 - [C++] Improve input validation
  • ARROW-7827 - [Python] conda-forge pyarrow package does not have s3 enabled
  • ARROW-7832 - [R] Patches to 0.16.0 release
  • ARROW-7836 - [Rust] “allocate_aligned”/“reallocate” need to initialize memory to avoid UB
  • ARROW-7837 - [JAVA] copyFromSafe fails due to a bug in handleSafe
  • ARROW-7838 - [C++] Only link Boost libraries with tests, not libarrow.so
  • ARROW-7841 - [C++] Use ${HADOOP_HOME}/lib/native/ to find libhdfs.so again
  • ARROW-7844 - [R] Converter_List is not thread-safe
  • ARROW-7848 - [C++][Python][Doc] Add MapType API doc
  • ARROW-7852 - [Python] 0.16.0 wheels not compatible with older numpy
  • ARROW-7857 - [Python] Revert temporary changes to pandas extension array tests
  • ARROW-7861 - [C++][Parquet] Add fuzz regression corpus for parquet reader
  • ARROW-7884 - [C++] Relax concurrency rules around GetSize()
  • ARROW-7887 - [Rust] Add date/time/duration/timestamp types to filter kernel
  • ARROW-7889 - [Rust] Add support to datafusion-cli for parquet files.
  • ARROW-7899 - [Integration][Java] Fix Flight integration test client to verify each batch
  • ARROW-7908 - [R] Can't install package without setting LIBARROW_DOWNLOAD=true
  • ARROW-7922 - [CI][Crossbow] Nightly macOS wheel builds fail (brew bundle edition)
  • ARROW-7923 - [CI][Crossbow] macOS autobrew fails on homebrew-versions
  • ARROW-7926 - [Dev] Improve “archery lint” UI
  • ARROW-7928 - [Python] Update Python flight server and client examples for latest API
  • ARROW-7931 - [C++] Fix crash on corrupt Map array input (OSS-Fuzz)
  • ARROW-7936 - [Python] Fix and exercise tests on python 3.5
  • ARROW-7940 - [C++] Remove ARROW_USE_CLCACHE handling
  • ARROW-7944 - [Python] Test failures without Pandas
  • ARROW-7956 - [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
  • ARROW-7958 - [Java] Update Avro to version 1.9.2
  • ARROW-7962 - [R][Dataset] Followup to “Consolidate Source and Dataset classes”
  • ARROW-7968 - [C++] orc_ep build fails on 64-bit Raspbian
  • ARROW-7973 - [Developer][C++] ResourceWarnings in run_cpplint.py
  • ARROW-7974 - [C++][Developer] Fix linter warnings when PYTHONDEVMODE enabled
  • ARROW-7975 - [C++] Preserve intended buffer size by default when writing to IPC format
  • ARROW-7978 - [Dev] Do not run IWYU in Github Actions “lint” workflow
  • ARROW-7980 - [Python] Fix creation of tz-aware datetime dtype on first pandas import
  • ARROW-7981 - [C++][Dataset] Fix compilation on gcc 5.4
  • ARROW-7985 - [C++] Fix builder capacity check
  • ARROW-7990 - [Developer][C++] Add option to run “archery lint --iwyu” on all C++ files, not just the ones that you changed. Add “match” option to iwyu.sh
  • ARROW-7992 - [C++] Fix MSVC warning (#6525)
  • ARROW-7996 - [Python] Error serializing empty pandas DataFrame with pyarrow
  • ARROW-7997 - [Python] Schema equals method with inconsistent docs in pyarrow
  • ARROW-7999 - [C++] Fix crash on corrupt List / Map array input
  • ARROW-8000 - [C++] Fix compilation on gcc 4.8
  • ARROW-8003 - [C++] Use CMAKE_C_COMPILER when building bundled bzip2
  • ARROW-8006 - [C++] Initialize spaced data when reading nulls from Parquet
  • ARROW-8007 - [Python] Remove unused and defunct assert_get_object_equal in plasma tests
  • ARROW-8008 - [C++/Python] Set Python3_FIND_FRAMEWORK=LAST
  • ARROW-8009 - [Java] Fix the hash code methods for BitVector
  • ARROW-8011 - [C++] Fix buffer size when reading Parquet data to Arrow
  • ARROW-8013 - [Python][Packaging] Fix building manylinux wheels
  • ARROW-8021 - [Python] Install test requirements including pandas in Appveyor
  • ARROW-8029 - [R] rstudio/r-base:3.6-centos7 GHA build failing on master
  • ARROW-8036 - [C++] Avoid gtest 1.10 deprecation warnings
  • ARROW-8042 - [Python] Clean up docstring and error message when creating ChunkedArray with no chunks
  • ARROW-8057 - [Python] Do not compare schema metadata in Schema.equals and Table.equals by default
  • ARROW-8070 - [C++] Cast segfaults on unsupported cast from list to utf8
  • ARROW-8071 - [GLib] Fix build error with configure
  • ARROW-8075 - [R] Loading R.utils after arrow breaks some arrow functions
  • ARROW-8088 - [C++][Dataset] Support dictionary partition columns
  • ARROW-8091 - [CI][Crossbow] Fix nightly homebrew and R failures
  • ARROW-8092 - [CI][Crossbow] OSX wheels fail on bundled bzip2
  • ARROW-8094 - [CI][Crossbow] Nightly valgrind test fails
  • ARROW-8095 - [C++] Add support for string dictionary value with length
  • ARROW-8098 - [Go] Avoid unsafe unsafe.Pointer usage
  • ARROW-8099 - [Integration] archery integration --with-LANG flags don't work
  • ARROW-8101 - [FlightRPC][Java] Fix null arrays in Flight with no buffers
  • ARROW-8102 - [Dev] Crossbow's version detection doesn't work in the comment bot's scenario
  • ARROW-8105 - [Python] Fix segfault when shrunken masked array is passed to pyarrow.array
  • ARROW-8106 - [Python] Ensure extension array conversion tests pass with latest pandas
  • ARROW-8110 - [C#] BuildArrays fails if NestedType is included
  • ARROW-8112 - [FlightRPC][C++] make sure status codes round-trip through gRPC
  • ARROW-8119 - [Dev] Make Yaml optional dependency for archery
  • ARROW-8122 - [Python] Empty numpy arrays with shape cannot be deserialized
  • ARROW-8125 - [C++] Restore link between tests created with add_arrow_test and arrow-tests target
  • ARROW-8127 - [C++][Parquet] Incorrect column chunk metadata for multipage batch writes
  • ARROW-8128 - [C#] NestedType children serialized on wrong length
  • ARROW-8132 - [C++] Fix S3FileSystem tests on Windows
  • ARROW-8133 - [CI] Github Actions sometimes fail to checkout Arrow
  • ARROW-8136 - [Python] More robust inference of local relative path in dataset
  • ARROW-8136 - [Python] Restore creating a dataset from a relative path
  • ARROW-8138 - [C++] parquet::arrow::FileReader cannot read multiple RowGroup
  • ARROW-8139 - [C++] FileSystem enum causes attributes warning
  • ARROW-8142 - [C++][Compute] Explicit no chunks case for WrapDatumsLike
  • ARROW-8144 - [CI] Cmake 3.2 nightly build fails
  • ARROW-8154 - [Python] HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release
  • ARROW-8159 - [Python] Support pandas.ExtensionDtype in Schema.from_pandas
  • ARROW-8166 - [C++] fix AVX512 intrinsics fail with clang-8
  • ARROW-8176 - [FlightRPC] bind to a free port for integration tests
  • ARROW-8186 - [Python] Fix dataset expression operation with invalid scalar
  • ARROW-8188 - [R] Adapt to latest checks in R-devel
  • ARROW-8193 - [C++] Fix gcc 4.8 compilation error with non-copyable types in Iterator::ToVector
  • ARROW-8197 - [Rust][DataFusion] Fix schema returned by physical plan
  • ARROW-8206 - [R] Minor fix for backwards compatibility on Linux installation
  • ARROW-8209 - [Python] Improve error message when trying to access duplicate Table column
  • ARROW-8213 - [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message
  • ARROW-8216 - [C++][Compute] Filter out nulls by default
  • ARROW-8217 - [R] Unskip previously failing test on Win32 in test-dataset.R from ARROW-7979
  • ARROW-8219 - [Rust] sqlparser crate needs to be bumped to version 0.2.5
  • ARROW-8223 - [Python] Schema.from_pandas breaks with pandas nullable integer dtype
  • ARROW-8233 - [CI][GLib][R] Fix timeout on MinGW
  • ARROW-8234 - [CI] Build timeouts on “AMD64 Windows RTools 35”
  • ARROW-8236 - [Rust] Linting GitHub Actions task failing
  • ARROW-8237 - [Python][Documentation] Minor corrections to python minimal build documentation
  • ARROW-8237 - [Python][Documentation] Review Python developer documentation, add Dockerfile showing minimal source build with conda and pip/virtualenv
  • ARROW-8238 - [C++] Fix FieldPath type definition
  • ARROW-8239 - [Java] fix param checks in splitAndTransfer method
  • ARROW-8245 - [Python][Parquet] Skip hidden directories when reading partitioned parquet files
  • ARROW-8254 - [Rust] [DataFusion] CLI is not working as expected
  • ARROW-8255 - [Rust][DataFusion] Bug fix for COUNT(*)
  • ARROW-8259 - [Rust][DataFusion] ProjectionPushDown now respects LIMIT
  • ARROW-8268 - [CI][Ruby] Enable Zstandard on Ubuntu 16.04
  • ARROW-8269 - [Python] Add pandas mark to test_parquet_row_group_fragments to fix nopandas build
  • ARROW-8270 - [Python][Flight] Update Python server example to support TLS
  • ARROW-8272 - [CI][Python] Fix test failure on Python 3.5
  • ARROW-8274 - [C++] Use LZ4 frame format for “LZ4” compression in IPC
  • ARROW-8276 - [C++][Dataset] Use Scanner for Fragment.to_table
  • ARROW-8280 - [C++] Use c-ares_INCLUDE_DIR
  • ARROW-8286 - [Python] Ensure to create FileSystemDataset when passing pathlib path
  • ARROW-8298 - [C++][MinGW] Fix gRPC detection
  • ARROW-8303 - [Python] Fix test failure on Python 3.5 caused by non-deterministic dict key ordering
  • ARROW-8304 - [Flight][Python] Fix client example with TLS
  • ARROW-8305 - [Java] ExtensionTypeVector should make sure underlyingVector not null
  • ARROW-8310 - [C++] Improve auto-retry in S3 tests
  • ARROW-8315 - [Python] Fix dataset tests on Python 3.5
  • ARROW-8323 - [C++] Add pragmas wrapping proto_utils.h to disable conversion warnings
  • ARROW-8326 - [C++] Use TYPED_TEST_SUITE instead of deprecated TYPED_TEST_CASE
  • ARROW-8327 - [FlightRPC][Java] check gRPC trailers for null
  • ARROW-8331 - [C++] Fix filter_benchmark.cc compilation
  • ARROW-8333 - [C++] Compile benchmarks in at least one C++ CI entry
  • ARROW-8334 - [C++][Gandiva] Missing DATE32 in LLVM Types
  • ARROW-8342 - [Python] Continue to return dict from “metadata” properties accessing KeyValueMetadata
  • ARROW-8345 - [Python] Ensure feather read/write can work without pandas installed
  • ARROW-8346 - [CI][GLib] Follow pkg-config change in Homebrew
  • ARROW-8349 - [CI][NIGHTLY:gandiva-jar-osx] Use latest pygit2
  • ARROW-8353 - [C++] Fix some compiler warnings in release builds
  • ARROW-8354 - [R] Fix segfault in Table to Array conversion
  • ARROW-8357 - [Rust][DataFusion] Add format dir to dockerfile for CLI
  • ARROW-8358 - [C++] Fix some clang-11 compiler warnings
  • ARROW-8365 - [C++] Error when writing files to S3 larger than 5 GB
  • ARROW-8366 - [Rust] Revert "ARROW-7794: [Rust] Support releasing arrow-flight"
  • ARROW-8369 - [CI] Fix crossbow wildcard groups
  • ARROW-8373 - [CI][GLib] Find gio-2.0 manually on macOS
  • ARROW-8380 - [Rust] Export StringDictionaryBuilder from arrow::array crate
  • ARROW-8384 - [Python][C++] Allow configuring Kerberos ticket cache path
  • ARROW-8386 - [Python] Fix error when pyarrow.jvm gets an empty vector
  • ARROW-8388 - [C++][CI] Ensure Arrow compiles with GCC 4.8
  • ARROW-8397 - [C++] Fail to compile aggregate_test.cc on Ubuntu 16.04
  • ARROW-8406 - [C++][Python] Fix file URI handling
  • ARROW-8410 - [C++] Fix compilation errors on modest ARMv8 platforms (rockpro64, rpi4)
  • ARROW-8414 - [Python] Fix non-deterministic row order failure in parquet tests
  • ARROW-8415 - [C++][Packaging] Fix gandiva linux job
  • ARROW-8416 - [Python] Add feather alias for ipc format in dataset API
  • ARROW-8420 - [C++] Distinguish ARMv7 from ARMv8 in SetupCxxFlags.cmake
  • ARROW-8427 - [C++][Dataset] Only apply ignore_prefixes to selector results
  • ARROW-8428 - [C++] GCC 4.8 Implicit move-on-return failure in C++ tests
  • ARROW-8429 - [C++] Implement missing checks in IPC MessageDecoder
  • ARROW-8432 - [CI] Don't depend on a single apache mirror for dependencies
  • ARROW-8437 - [C++] Remove std::move return value from MakeRandomNullBitmap test utility
  • ARROW-8438 - [C++] Fix crash in io-memory-benchmark
  • ARROW-8439 - [Python] Update options usage in S3FileSystem docs
  • ARROW-8441 - [C++] Check invalid input in ipc::MessageDecoder
  • ARROW-8442 - [Python] Change NullType.to_pandas_dtype to return object instead of float64
  • ARROW-8460 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8465 - [Packaging][Python] Windows py35 wheel build fails because of boost
  • ARROW-8466 - [Packaging] The python unittests are not running in the windows wheel builds
  • ARROW-8468 - [C++][Documentation] Fix the incorrect null bits description
  • ARROW-8469 - [Dev] Fix nightly docker tests on azure
  • ARROW-8478 - [Java] Revert "ARROW-7534"
  • ARROW-8498 - [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works
  • PARQUET-1780 - [C++] Set ColumnMetadata.encoding_stats field
  • PARQUET-1788 - Remove UBSan when rep/def levels are null
  • PARQUET-1797 - [C++] Fix fuzzer issues
  • PARQUET-1799 - [C++] Stream API: Relax schema checking when reading
  • PARQUET-1810 - [C++] Fix undefined behaviour on invalid enum values (OSS-Fuzz)
  • PARQUET-1813 - [C++] Remove debug print statement from parquet-arrow-schema-test
  • PARQUET-1819 - [C++] Refactor decoding
  • PARQUET-1819 - [C++] Fix crashes on invalid input
  • PARQUET-1823 - [C++] Invalid RowGroup returned by parquet::arrow::FileReader
  • PARQUET-1824 - [C++] Fix crashes and undefined behaviour on invalid input
  • PARQUET-1829 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1831 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1835 - [C++] Fix crashes on invalid input

New Features and Improvements

  • ARROW-590 - [Integration][C++] Implement union types
  • ARROW-1470 - [C++] Add BufferAllocator abstract interface
  • ARROW-1560 - [C++] Kernel implementations for “match” function
  • ARROW-1571 - [C++][Compute] Optimize sorting integers in small value range
  • ARROW-1581 - [Packaging] Tooling to make nightly wheels available for install
  • ARROW-1582 - [Python] Set up + document nightly conda builds for macOS
  • ARROW-1636 - [C++][Integration] Implement integration test parsing in C++ for null type, add integration test data generation
  • ARROW-2447 - [C++] Device and MemoryManager API
  • ARROW-2882 - [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
  • ARROW-3054 - [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel
  • ARROW-3410 - [C++][Python] Add streaming CSV reader.
  • ARROW-3750 - [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate
  • ARROW-4120 - [Python] Testing utility for checking for “macro” memory leaks detectable with psutil.Process
  • ARROW-4226 - [C++] Add sparse CSF tensor support
  • ARROW-4286 - [C++/R] Namespace vendored Boost
  • ARROW-4304 - [Rust] Enhance documentation for arrow
  • ARROW-4428 - [R] Feature flags for R build
  • ARROW-4482 - [Website] Add blog archive page
  • ARROW-4815 - [Rust][DataFusion] Add support for SQL wildcard operator
  • ARROW-5357 - [Rust] Change Buffer::len to represent total bytes instead of used bytes
  • ARROW-5405 - [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript
  • ARROW-5497 - [Release] Build and publish R/Java/JS docs
  • ARROW-5501 - [R] Reorganize read/write file/stream functions
  • ARROW-5510 - [C++][Python][R][GLib] Implement Feather “V2” using Arrow IPC file format
  • ARROW-5563 - [Format] Update integration test JSON format documentation
  • ARROW-5585 - [Go] Rename TypeEquals to TypeEqual
  • ARROW-5742 - [CI][C++] Add nightly Valgrind build
  • ARROW-5757 - [Python] Remove Python 2.7 support
  • ARROW-5949 - [Rust] Implement Dictionary Array
  • ARROW-6165 - [Integration] Run integration tests on multiple cores
  • ARROW-6176 - [Python] Basic implementation of arrow_ext_class, in pure Python
  • ARROW-6275 - [C++] Deprecate RecordBatchReader::ReadNext
  • ARROW-6393 - [C++] Add EqualOptions support in SparseTensor::Equals
  • ARROW-6479 - [C++] Inline errors from externalprojects on failure
  • ARROW-6510 - [Python][Filesystem] Expose nanosecond resolution mtime
  • ARROW-6666 - [Rust] Datafusion parquet string literal support
  • ARROW-6724 - [C++] Allow simpler BufferOutputStream creation
  • ARROW-6821 - [C++][Parquet] Do not require Thrift compiler when building (but still require library)
  • ARROW-6823 - [C++][Python][R] Support metadata in the feather format?
  • ARROW-6829 - [Docs] Migrate integration test docs to Sphinx, fix instructions after ARROW-6466
  • ARROW-6837 - [C++] Add APIs to read and write “custom_metadata” field of IPC file footer
  • ARROW-6841 - [C++] Migrate to LLVM 8
  • ARROW-6875 - [FlightRPC] implement criteria for ListFlights
  • ARROW-6915 - [Developer] Do not overwrite point release fix versions with merge tool
  • ARROW-6947 - [Rust][DataFusion] Scalar UDF support
  • ARROW-6996 - [Python] Expose boolean filter kernel on ChunkedArray/RecordBatch/Table
  • ARROW-7044 - [Release] Create a post release script for the home-brew formulas
  • ARROW-7048 - [Java] Support for combining multiple vectors under VectorSchemaRoot
  • ARROW-7063 - [C++][Python] Add metadata output and toggle in PrettyPrint, add pyarrow.Schema.to_string, disable metadata output by default
  • ARROW-7073 - [Java] Support concatenating vector values in batch
  • ARROW-7080 - [C++][Parquet] Read and write “field_id” attribute in Parquet files, propagate to Arrow field metadata. Assorted additional changes
  • ARROW-7091 - [C++] Move DataType factory decls to type_fwd.h
  • ARROW-7119 - [C++][CI] Show automatic backtraces
  • ARROW-7201 - [GLib][Gandiva] Add support for BooleanNode
  • ARROW-7202 - [R][CI] Improve rwinlib building on CI to stop re-downloading dependencies
  • ARROW-7222 - [Python][Release] Wipe any existing generated Python API documentation when updating website
  • ARROW-7233 - [C++] Use Result in remaining value-returning IPC APIs
  • ARROW-7256 - [C++] Remove ARROW_MEMORY_POOL_DEFAULT macro
  • ARROW-7330 - [C++] Migrate Arrow Cuda to Result
  • ARROW-7332 - [C++][Python] Propagate Arrow Status through Parquet errors
  • ARROW-7336 - [C++][Compute] fix minmax kernel options
  • ARROW-7338 - [C++] Improve InMemoryDataSource to support generator instead of static list
  • ARROW-7365 - [Python] Convert FixedSizeList in to_pandas
  • ARROW-7373 - [C++][Dataset] Remove FileSource
  • ARROW-7400 - [Java] Avoid the worst case for quick sort
  • ARROW-7412 - [C++][Dataset] Provide FieldRef to disambiguate field references
  • ARROW-7419 - [Python] Support SparseCSCMatrix
  • ARROW-7427 - [Python] Support SparseCSFTensor
  • ARROW-7428 - [Format][C++] Add serialization for CSF sparse tensors
  • ARROW-7444 - [GLib] Add LocalFileSystem support
  • ARROW-7462 - [C++] Add CpuInfo detection for Arm64 Architecture
  • ARROW-7491 - [Java] Improve the performance of aligning
  • ARROW-7499 - [C++] CMake should collect libs when making static build
  • ARROW-7501 - [C++] CMake build_thrift should build flex and bison if necessary
  • ARROW-7515 - [C++] Rename nonexistent and non_existent to not_found
  • ARROW-7524 - [C++][CI] Enable Parquet in the VS2019 GHA job
  • ARROW-7530 - [Developer] Do not include list of PR commits in commit message when using PR merge tool
  • ARROW-7534 - [Java] Create a new java/contrib module
  • ARROW-7547 - [C++][Dataset][Python] Add ParquetFileFormat options
  • ARROW-7555 - [Python] Drop support for python 2.7
  • ARROW-7587 - [C++][Compute] Implement nth_to_indices kernel
  • ARROW-7608 - [C++][Dataset] Add the ability to list files in FileSystemSource
  • ARROW-7615 - [CI][Gandiva] Ensure gandiva_jni library has only a whitelisted set of shared dependencies
  • ARROW-7616 - [Java] Support comparing value ranges for dense union vector
  • ARROW-7625 - [Parquet][GLib] Add support for writer properties
  • ARROW-7641 - [R] Make dataset vignette have executable code
  • ARROW-7662 - [R] Support creating ListArray from R list
  • ARROW-7664 - [C++] Rework FileSystemFromUri
  • ARROW-7675 - [R][CI] Move Windows CI from Appveyor to GHA
  • ARROW-7679 - [R] Cleaner interface for creating UnionDataset
  • ARROW-7684 - [Rust] Example Flight client and server for DataFusion
  • ARROW-7685 - [Developer] Add support for GitHub Actions to Crossbow
  • ARROW-7691 - [C++] Check non-scalar Flatbuffers fields are not null
  • ARROW-7708 - [Developer][Release] Include PARQUET issues in release changelogs by scraping git history
  • ARROW-7712 - [CI][Crossbow] Delete fuzzit jobs
  • ARROW-7720 - [C++][Python] Add check_metadata argument to Table.equals
  • ARROW-7725 - [C++] Add infrastructure for unity builds and precompiled headers
  • ARROW-7726 - [CI][C++] Use boost binaries on Windows GHA build
  • ARROW-7729 - [Python][CI] Pin pandas version to 0.25 in the dask integration test
  • ARROW-7733 - [Developer] Download new enough Go locally in release verification script
  • ARROW-7735 - [Release][Python] Use pip to install dependencies for wheel verification
  • ARROW-7736 - [Release] Retry binary download on transient error
  • ARROW-7739 - [GLib] Use placement new to initialize shared_ptr object in private structs
  • ARROW-7741 - [C++] Adds parquet write support for nested types
  • ARROW-7742 - [GLib] Add support for MapArray
  • ARROW-7745 - [Doc][C++] Update Parquet documentation
  • ARROW-7749 - [C++] Link more tests together
  • ARROW-7750 - [Release] Make the source release verification script restartable
  • ARROW-7751 - [Release] macOS wheel verification also needs arrow-testing
  • ARROW-7752 - [Release] Enable and test dataset in the verification script
  • ARROW-7754 - [C++] Make Result<> faster
  • ARROW-7761 - [C++][Python] Support S3 URIs
  • ARROW-7764 - [C++] Don't keep a null bitmap in ArrayData if null_count == 0
  • ARROW-7771 - [Developer] Use ARROW_TMPDIR environment variable in the verification scripts instead of TMPDIR
  • ARROW-7774 - [Packaging][Python] Update macos and windows wheel filenames
  • ARROW-7787 - [Rust] Added .collect to Table API
  • ARROW-7788 - [C++][Parquet] Enable Arrow Schema to Parquet Schema for missing types
  • ARROW-7790 - [Website] Update how to install Linux packages
  • ARROW-7795 - [Rust] Added support for NOT
  • ARROW-7796 - [R] write_* functions should invisibly return their inputs
  • ARROW-7799 - [R][CI] Remove flatbuffers from homebrew formulae
  • ARROW-7804 - [C++][R] Compile error on macOS 10.11
  • ARROW-7812 - [Packaging][Python] Use LLVM 8 in manylinux1 wheels
  • ARROW-7817 - [CI] macOS R autobrew nightly failed on installing dependency from source
  • ARROW-7819 - [C++][Gandiva] Add DumpIR to Filter/Projector object
  • ARROW-7824 - [C++][Dataset] WriteFragments to disk
  • ARROW-7828 - [Release] Remove SSH keys for internal use
  • ARROW-7829 - [R] Test R bindings on clang
  • ARROW-7833 - [R] Make install_arrow() actually install arrow
  • ARROW-7834 - [Release] Post release task for updating the documentation
  • ARROW-7839 - [Python][Dataset] Expose IPC format in python bindings
  • ARROW-7846 - [Python][Dev] Remove dependencies on six
  • ARROW-7847 - [Website] Write a blog post about fuzzing
  • ARROW-7849 - [Packaging][Python] Remove the remaining py27 crossbow wheel tasks from the nightlies
  • ARROW-7858 - [C++][Python] Support casting from ExtensionArray
  • ARROW-7859 - [R] Minor patches for CRAN submission 0.16.0.2
  • ARROW-7860 - [C++] Support cast to/from halffloat
  • ARROW-7862 - [R] Linux installation should run quieter by default
  • ARROW-7863 - [C++][Python][CI] Ensure running HDFS related tests
  • ARROW-7864 - [R] Make sure bundled installation works even if there are system packages
  • ARROW-7865 - [R] Test builds on latest Linux versions
  • ARROW-7868 - [Crossbow] Reduce GitHub API query parallelism
  • ARROW-7869 - [Python] Remove boost::system and boost::filesystem from Python wheels
  • ARROW-7872 - [C++/Python] Support conversion of list of structs to pandas
  • ARROW-7874 - [Python][Archery] Validate docstrings with numpydoc
  • ARROW-7876 - [R] Installation fails in the documentation generation image
  • ARROW-7877 - [Packaging] Fix crossbow deployment to github artifacts
  • ARROW-7879 - [C++][Doc] Add doc for the Device API
  • ARROW-7880 - [CI][R] R sanitizer job is not really working
  • ARROW-7881 - [C++] Fix -Wpedantic warnings
  • ARROW-7882 - [C++][Gandiva] Optimise like function for substring pattern
  • ARROW-7886 - [C++][Dataset][Python][R] Consolidate Source and Dataset classes
  • ARROW-7888 - [Python] Update pyarrow.jvm to support jpype 0.7+
  • ARROW-7890 - [C++] Add Future implementation
  • ARROW-7891 - [C++][GLib][Python][R] Make uniform use of check_metadata=false default. Add Py/R/GLib bindings for RecordBatch::Equals with check_metadata
  • ARROW-7892 - [Python] Add FileSystemDataset.format attribute
  • ARROW-7895 - [Python] Remove more python 2.7 cruft
  • ARROW-7896 - [C++] Refactor from #include guards to #pragma once
  • ARROW-7897 - [Packaging] Temporarily disable artifact uploading until we fix the deployment issues
  • ARROW-7898 - [Python] Reduce the number of docstring violations using numpydoc
  • ARROW-7904 - [C++][Python] Revamp metadata display, change show_metadata to verbose_metadata
  • ARROW-7907 - [Python] Add test case for previously failing code involving slicing a 0-length ChunkedArray
  • ARROW-7912 - [Format] C data interface
  • ARROW-7913 - [C++][Python][R] C++ implementation of C data interface
  • ARROW-7915 - [CI][Python] Enable development mode in tests
  • ARROW-7916 - [C++] Project IPC batches to materialized fields only
  • ARROW-7917 - [C++] Find Python 3 in CMake configuration
  • ARROW-7919 - [R] install_arrow() should conda install if appropriate
  • ARROW-7920 - [R] Fill in some missing input validation
  • ARROW-7921 - [Go] Add Reset method to various components and clean up comments.
  • ARROW-7927 - [C++] Fix ‘cpu_info.cc’ compilation warning.
  • ARROW-7929 - [C++] Align CMake target names to upstreams
  • ARROW-7930 - [CI][Python] Test jpype integration
  • ARROW-7932 - [Rust] implement array_reader for temporal types
  • ARROW-7934 - [C++] Fix UriEscape for empty string
  • ARROW-7935 - [Java] Remove Netty dependency for BufferAllocator and ReferenceManager
  • ARROW-7937 - [Python][Packaging] Remove boost from the macos wheels
  • ARROW-7941 - [Rust][DataFusion] Add support for named columns in logical plan
  • ARROW-7943 - [C++][Parquet] Add code to generate rep/def levels for nested arrays
  • ARROW-7947 - [Rust][Flight][DataFusion] Implement get_schema example
  • ARROW-7949 - [Git] Ignore macOS specific file: ‘Brewfile.lock.json’
  • ARROW-7951 - [Python] Expose BYTE_STREAM_SPLIT in pyarrow
  • ARROW-7959 - [Ruby] Add support for Ruby 2.3 again
  • ARROW-7963 - [C++][Dataset][Python] Expose Dataset Fragments to Python
  • ARROW-7965 - [Python] Refine higher level dataset API
  • ARROW-7966 - [FlightRPC][C++] Validate individual batches in integration
  • ARROW-7969 - [Packaging] Use cURL to upload artifacts
  • ARROW-7970 - [Packaging][Python] Use system boost to build the macOS wheels
  • ARROW-7971 - [Rust] Create rowcount utility
  • ARROW-7977 - [C++] Rename fs::FileStats to fs::FileInfo
  • ARROW-7979 - [C++] Add experimental buffer compression to IPC write path. Add “field” selection to read path. Migrate some APIs to Result. Read/write Message metadata
  • ARROW-7982 - [C++] Add function VisitArrayDataInline() helper
  • ARROW-7983 - [CI][R] Nightly builds should be more verbose when they fail
  • ARROW-7984 - [R] Check for valid inputs in more places
  • ARROW-7986 - [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector
  • ARROW-7987 - [CI][R] Fix for verbose nightly builds
  • ARROW-7988 - [R] Fix on.exit calls in reticulate bindings
  • ARROW-7991 - [C++][Plasma] Allow option for evicting if full when creating an object
  • ARROW-7993 - [Java] Support decimal type in ComplexCopier
  • ARROW-7994 - [CI][C++][GLib][Ruby] Move MinGW CI to GitHub Actions from AppVeyor
  • ARROW-7995 - [C++] Add facility to coalesce and cache reads
  • ARROW-7998 - [C++][Plasma] Make Seal requests synchronous
  • ARROW-8005 - [Tools] Update apache mirror links
  • ARROW-8014 - [C++] Provide CMake targets exercising tests with a label
  • ARROW-8016 - [Developer] Fix jira-python deprecation warning in merge_arrow_pr.py
  • ARROW-8018 - [C++][Parquet] Parquet Modular Encryption
  • ARROW-8024 - [R] Bindings for BinaryType and FixedSizeBinaryType
  • ARROW-8026 - [Python] Support memoryview as a value type for creating binary-like arrays
  • ARROW-8027 - [Integration] Add test case for duplicated field names
  • ARROW-8028 - [Go] Allow duplicate field names in schemas and nested types
  • ARROW-8030 - [Plasma] Uniform comments style
  • ARROW-8035 - [Developer][Integration] Add integration tests for extension types
  • ARROW-8039 - [Python] Use dataset API in existing parquet readers and tests
  • ARROW-8044 - [CI][NIGHTLY:gandiva-jar-osx] Pin pygit2 at 1.0.3 for OSX
  • ARROW-8055 - [GLib][Ruby] Add some metadata bindings to GArrowSchema
  • ARROW-8058 - [Dataset] Relax DatasetFactory discovery validation
  • ARROW-8059 - [Python] Make FileSystem objects serializable
  • ARROW-8060 - [Python] Make dataset Expression objects serializable
  • ARROW-8061 - [C++][Dataset] Provide RowGroup fragments for ParquetFileFormat
  • ARROW-8063 - [Python][Dataset] Start user guide for pyarrow.dataset
  • ARROW-8064 - [Dev] Implement Comment bot via Github actions
  • ARROW-8069 - [C++] Should the default value of “check_metadata” arguments of Equals methods be “true”?
  • ARROW-8072 - [Plasma] Add const for plasma protocol
  • ARROW-8077 - [Python][Packaging] Add Windows Python 3.5 wheel build script
  • ARROW-8079 - [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant
  • ARROW-8080 - [C++] Add ARROW_SIMD_LEVEL option
  • ARROW-8082 - [Plasma] Add JNI list() interface
  • ARROW-8083 - [GLib] Add support for Peek() to GIOInputStream
  • ARROW-8086 - [Java] Support writing decimal from big endian byte array in UnionListWriter
  • ARROW-8087 - [C++][Dataset] Partitioning schema fields follow paths' segment ordering
  • ARROW-8096 - [C++][Gandiva] fix TreeExprBuilder::MakeNull to create node for interval type
  • ARROW-8097 - [Dev] Comment bot's crossbow command acts on the master branch
  • ARROW-8103 - [R] Make default Linux build more minimal
  • ARROW-8104 - [C++] Don't install bundled Thrift
  • ARROW-8107 - [Packaging][APT] Use HTTPS for LLVM APT repository for Debian GNU/Linux stretch
  • ARROW-8109 - [Packaging][APT] Drop support for Ubuntu Disco
  • ARROW-8117 - [DataFusion][Rust] allow cast SQLTimestamp to Timestamp
  • ARROW-8118 - [R] dim method for FileSystemDataset
  • ARROW-8120 - [Packaging][APT] Add support for Ubuntu Focal
  • ARROW-8123 - [Rust][DataFusion] Add LogicalPlanBuilder
  • ARROW-8124 - [Rust] Update library dependencies
  • ARROW-8126 - [C++][Compute] Add nth-to-indices kernel benchmark
  • ARROW-8129 - [C++][Compute] Refine compare sort kernel
  • ARROW-8130 - [C++][Gandiva] fix dex visitor to handle interval type
  • ARROW-8140 - [Dev] Follow class name change
  • ARROW-8141 - [C++] Speed up unpack1_32 using intrinsics API
  • ARROW-8145 - [C++] Rename FileSystem::GetTargetInfos to GetFileInfo
  • ARROW-8146 - [C++] Add per-filesystem facility to sanitize a path
  • ARROW-8150 - [Rust] Allow writing custom FileMetaData k/v pairs
  • ARROW-8151 - [Dataset][Benchmarking] benchmark S3File performance
  • ARROW-8153 - [Packaging] Update the conda feedstock files and upload artifacts to Anaconda
  • ARROW-8158 - [Java] Getting length of data buffer and base variable width vector
  • ARROW-8164 - [C++][Dataset] Provide Dataset::ReplaceSchema()
  • ARROW-8165 - [Packaging] Make nightly wheels available on a PyPI server
  • ARROW-8167 - [CI] Add support for skipping builds with skip pattern in pull request title
  • ARROW-8168 - [Java][Plasma] Improve Java Plasma client off-heap memory usage
  • ARROW-8177 - [Rust] Make schema_to_fb_offset public because it is very useful!
  • ARROW-8178 - [C++] Update to Flatbuffers 1.12.0
  • ARROW-8179 - [R] Windows build script tweaking for nightly packaging on GHA
  • ARROW-8181 - [Java][FlightRPC] Expose transport error metadata
  • ARROW-8182 - [Packaging] Increment the version number detected from the latest git tag
  • ARROW-8183 - [C++][Python][FlightRPC] Expose transport error metadata
  • ARROW-8184 - [Packaging] Use arrow-nightlies organization name on Anaconda and Gemfury to host the nightlies
  • ARROW-8185 - [Packaging] Document the available nightly wheels and conda packages
  • ARROW-8187 - [R] Make test assertions robust to i18n
  • ARROW-8191 - [Packaging][APT] Fix cmake removal in Debian GNU/Linux Stretch
  • ARROW-8192 - [C++] script for unpack avx512 intrinsics code
  • ARROW-8194 - [CI] Run tests in parallel on Github Actions
  • ARROW-8195 - [CI][C++][MSVC] Use preinstalled Boost
  • ARROW-8198 - [C++] Format Diff of NullArrays
  • ARROW-8200 - [GLib] Rename garrow_file_system_target_info{,s}() to ..._file_info{,s}()
  • ARROW-8203 - [C#] Use the latest SourceLink
  • ARROW-8204 - [Rust][DataFusion] Add support for aliased expressions in SQL
  • ARROW-8207 - [Packaging][wheel] Use LLVM 8 in manylinux2010 and manylinux2014
  • ARROW-8215 - [CI][GLib] Fix install error on macOS
  • ARROW-8218 - [C++] Decompress record batch messages in parallel at field level. Only allow LZ4_FRAME, ZSTD compression
  • ARROW-8220 - [Python] Make dataset FileFormat objects serializable
  • ARROW-8222 - [C++] Use bcp to make a slim boost for bundled build
  • ARROW-8224 - [C++] Remove APIs deprecated prior to 0.16.0
  • ARROW-8225 - [Rust] Continuation marker check was in wrong location.
  • ARROW-8225 - [Rust] Rust Arrow IPC reader must respect continuation markers.
  • ARROW-8227 - [C++] Refine SIMD feature definitions
  • ARROW-8231 - [Rust] Parse parquet key_value_metadata
  • ARROW-8232 - [Python] Deprecate pyarrow.open_stream and pyarrow.open_file APIs in favor of accessing via pyarrow.ipc namespace
  • ARROW-8235 - [C++][Compute] Filter out nulls by default
  • ARROW-8241 - [Rust] Add Schema convenience methods index_of and field_with_name
  • ARROW-8242 - [C++] Flight fails to compile on GCC 4.8
  • ARROW-8243 - [Rust][DataFusion] Fix inconsistency in LogicalPlanBuilder api
  • ARROW-8244 - [Python] Fix parquet.write_to_dataset to set file path in metadata_collector
  • ARROW-8246 - [C++] Add -Wa,-mbig-obj to CXXFLAGS on MinGW if it is supported
  • ARROW-8247 - [Python] Expose Parquet writing “engine” setting in pyarrow.parquet.write_table
  • ARROW-8249 - [Rust][DataFusion] Table API now uses LogicalPlanBuilder
  • ARROW-8252 - [CI][Ruby] Add Ubuntu 20.04
  • ARROW-8256 - [Rust][DataFusion] Update CLI documentation for 0.17.0 release
  • ARROW-8264 - [Rust][DataFusion] Add utility for printing batches
  • ARROW-8266 - [C++] Provide backup mirrors for thrift externalproject
  • ARROW-8267 - [CI][GLib] Fix build error on Ubuntu 16.04
  • ARROW-8271 - [Packaging] Allow wheel upload failures to gemfury
  • ARROW-8275 - [Python] Update Feather documentation for V2, Python IPC API cleanups / deprecations
  • ARROW-8277 - [Python] Implement __eq__ and __repr__, and provide a wrapper of Take() for RecordBatch
  • ARROW-8279 - [C++] Do not export Codec implementation symbols, remove codec-specific headers
  • ARROW-8288 - [Python] Expose with_ modifiers on DataType
  • ARROW-8290 - [Python] Improve FileSystemDataset constructor
  • ARROW-8291 - [Packaging] Conda nightly builds can't locate Numpy
  • ARROW-8292 - [Python] Allow to manually specify schema in dataset() function
  • ARROW-8294 - [Flight] Add DoExchange to Flight.proto
  • ARROW-8295 - [C++][Dataset] Push down projection to IpcReadOptions
  • ARROW-8299 - [C++] Reusable “optional ParallelFor” function for optional use of multithreading
  • ARROW-8300 - [R] Documentation and changelog updates for 0.17
  • ARROW-8307 - [Python] Add memory_map= option to pyarrow.feather.read_table
  • ARROW-8308 - [Rust] Implement DoExchange on examples
  • ARROW-8309 - [CI] C++/Java/Rust workflows should trigger on changes to Flight.proto
  • ARROW-8311 - [C++] Add push style stream format reader
  • ARROW-8316 - [CI] Set docker-compose to use docker-cli instead of docker-py for building images
  • ARROW-8319 - [CI] Install thrift compiler in the debian build
  • ARROW-8320 - [Format] Add clarification to CDataInterface.rst regarding memory alignment of buffers
  • ARROW-8321 - [CI] Use bundled thrift in Fedora 30 build
  • ARROW-8322 - [CI] Fix C# workflow file syntax
  • ARROW-8325 - [R][CI] Stop including boost in R windows bundle
  • ARROW-8329 - [Documentation][C++] Undocumented FilterOptions argument in Filter kernel
  • ARROW-8330 - [Documentation] The post release script generates the documentation with a development version
  • ARROW-8332 - [C++] Don't require Thrift compiler for Parquet build
  • ARROW-8335 - [Release] Add crossbow jobs to run release verification
  • ARROW-8336 - [Packaging][deb] Use libthrift-dev on Debian 10 and Ubuntu 19.10 or later
  • ARROW-8341 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8343 - [GLib] Add GArrowRecordBatchIterator
  • ARROW-8347 - [C++] Migrate Array methods to Result
  • ARROW-8351 - [R][CI] Store the Rtools-built Arrow C++ library as a build artifact
  • ARROW-8352 - [R] Add install_pyarrow()
  • ARROW-8356 - [Developer] Support * wildcards with “crossbow submit” via GitHub actions
  • ARROW-8361 - [C++] Add Result APIs to Buffer methods and functions
  • ARROW-8362 - [Crossbow] Ensure that the locally generated version is used in the docker tasks
  • ARROW-8367 - [C++] Deprecate Buffer::FromString(..., MemoryPool*)
  • ARROW-8368 - [C++][C Data Interface] Move several child arrays
  • ARROW-8370 - [C++] Migrate type/schema APIs to Result
  • ARROW-8371 - [Crossbow] Implement and exercise sanity checks for tasks.yml
  • ARROW-8372 - [C++] Migrate Table and RecordBatch APIs to Result
  • ARROW-8375 - [CI][R] Make Windows tests more verbose in case of segfault
  • ARROW-8376 - [R] Add experimental interface to ScanTask/RecordBatch iterators
  • ARROW-8387 - [Rust] Make schema_to_fb public
  • ARROW-8389 - [Integration] Run tests in parallel
  • ARROW-8390 - [R] Expose schema unification features
  • ARROW-8393 - [C++][Gandiva] Make gandiva function registry case-insensitive
  • ARROW-8396 - [Rust] Removes libc dependency
  • ARROW-8398 - [Python] Remove deprecated API usage from python tests
  • ARROW-8401 - [C++] Add byte-stream-split AVX2/AVX512 implementation
  • ARROW-8403 - [C++] Add ToString() to ChunkedArray, Table and RecordBatch
  • ARROW-8407 - [Rust] Add documentation for Dictionary data type
  • ARROW-8408 - [Python] Add memory_map argument to feather.read_feather
  • ARROW-8409 - [R] Add R wrappers for getting and setting global CPU thread pool capacity
  • ARROW-8412 - [C++][Gandiva] Fix gandiva date_diff function definitions
  • ARROW-8433 - [R] Add feather alias for ipc format in dataset API
  • ARROW-8444 - [Documentation] Fix spelling errors across the codebase
  • ARROW-8449 - [R] Use CMAKE_UNITY_BUILD everywhere
  • ARROW-8450 - [Integration][C++] Implement large offsets types
  • ARROW-8457 - [C++] Add expected results for ArrowSchema in big-endian
  • ARROW-8458 - [C++] Prefer the original mirrors for the bundled thirdparty dependencies
  • ARROW-8461 - [Packaging][deb] Use zstd package for Ubuntu Xenial
  • ARROW-8463 - [CI] Balance the nightly test builds between CircleCI, Azure and Github
  • ARROW-8679 - [Python] supporting pandas sparse series in pyarrow
  • PARQUET-458 - [C++][Parquet] Add support for reading/writing DataPageV2 format
  • PARQUET-1663 - [C++] Provide API to check the presence of repeated fields
  • PARQUET-1716 - [C++] Add BYTE_STREAM_SPLIT encoder and decoder
  • PARQUET-1770 - [C++][CI] Add fuzz target for reading Parquet files
  • PARQUET-1785 - [C++] Implement ByteStreamSplitDecoder::DecodeArrow and refactor tests
  • PARQUET-1786 - [C++] Improve ByteStreamSplit decoder using SSE2
  • PARQUET-1806 - [C++] Improve fuzzing seed corpus
  • PARQUET-1825 - [C++] Fix compilation error in column_io_benchmark.cc
  • PARQUET-1828 - [C++] Use SSE2 for the ByteStreamSplit encoder
  • PARQUET-1840 - [C++] Stop Early on DecodeSpaced

Apache Arrow 0.16.0 (2020-02-07)

Bug Fixes

  • ARROW-3783 - [R] Incorrect collection of float type
  • ARROW-3962 - [Go] Handle null values in CSV
  • ARROW-4470 - [Python] Pyarrow using considerably more memory when reading partitioned Parquet file
  • ARROW-4998 - [R] R package fails to install on OSX
  • ARROW-5575 - [C++] Split Targets.cmake for each module
  • ARROW-5655 - [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
  • ARROW-5680 - [Rust][DataFusion] GROUP BY sql tests are now deterministic
  • ARROW-6157 - [C++] Array data validation
  • ARROW-6195 - [C++] Detect Apache mirror without Python
  • ARROW-6298 - [Rust] [CI] Examples are not being tested in CI
  • ARROW-6320 - [C++] Arrow utilities are linked statically
  • ARROW-6429 - [CI][Crossbow] Nightly spark integration job fails
  • ARROW-6445 - [CI][Crossbow] Nightly Gandiva jar trusty job fails
  • ARROW-6567 - [Rust][DataFusion] Wrap aggregate in projection when needed
  • ARROW-6581 - [C++] Fix fuzzit job submission
  • ARROW-6704 - [C++] Check for out of bounds timestamp in unsafe cast
  • ARROW-6708 - [C++] Fix hardcoded boost library names
  • ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
  • ARROW-6736 - [Rust][DataFusion] Evaluate the input to the aggregate expression just once per batch
  • ARROW-6740 - [C++] Unmap MemoryMappedFile as soon as possible
  • ARROW-6745 - [Rust] Fix a variety of minor typos.
  • ARROW-6749 - [Python] Let Array.to_numpy use general conversion code with zero_copy_only=True
  • ARROW-6750 - [Python] Silence S3 error logs by default
  • ARROW-6761 - [Rust] Travis build now uses the correct Rust toolchain
  • ARROW-6762 - [C++] Support reading JSON files with no newline at end
  • ARROW-6785 - [JS] Remove superfluous child assignment
  • ARROW-6786 - [C++] arrow-dataset-file-parquet-test is slow
  • ARROW-6795 - [C#] Fix for reading large (2GB+) files
  • ARROW-6798 - [CI] [Rust] Improve build times by caching dependencies in the Docker image
  • ARROW-6801 - [Rust] Arrow source release tarball is missing benchmarks
  • ARROW-6806 - [C++][Python] Fix crash validating an IPC-originating empty array
  • ARROW-6808 - [Ruby] Ensure requiring suitable MSYS2 package
  • ARROW-6809 - [Ruby] Gem does not install on macOS due to glib2 3.3.7 compilation failure
  • ARROW-6812 - [Java] Fix License header
  • ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
  • ARROW-6820 - [Format] Update Map type child to “entries”
  • ARROW-6834 - [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds
  • ARROW-6835 - [Archery][CMake] Restore ARROW_LINT_ONLY cmake option
  • ARROW-6842 - [Website] Jekyll error building website
  • ARROW-6844 - [C++][Parquet] Fix regression in reading List types with item name that is not “item”
  • ARROW-6846 - [C++] Build failures with glog enabled
  • ARROW-6857 - [C++] Fix DictionaryEncode for zero-chunk ChunkedArray
  • ARROW-6859 - [CI][Nightly] Disable docker layer caching for CircleCI tasks
  • ARROW-6860 - [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so
  • ARROW-6861 - [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder
  • ARROW-6864 - [C++] Add compression-related compile definitions before adding any unit tests
  • ARROW-6867 - [FlightRPC][Java] clean up default executor
  • ARROW-6868 - [Go] Fix slicing struct arrays
  • ARROW-6869 - [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add “FinishDelta” method and “ResetFull” method
  • ARROW-6873 - [Python] Remove stale CColumn references
  • ARROW-6874 - [Python] Fix memory leak when converting to Pandas object data
  • ARROW-6876 - [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns
  • ARROW-6877 - [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions
  • ARROW-6878 - [Python] Fix creating array from list of dicts with bytes keys
  • ARROW-6882 - [C++] Ensure the DictionaryArray indices has no dictionary data
  • ARROW-6885 - [Python] Remove superfluous skipped timedelta test
  • ARROW-6886 - [C++] Fix arrow::io nvcc compiler warnings
  • ARROW-6898 - [Java][hotfix] fix ArrowWriter memory leak
  • ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
  • ARROW-6899 - [Python] Decode dictionary-encoded List children to dense when converting to pandas
  • ARROW-6901 - [Rust][Parquet] Increment total_num_rows when closing a row group
  • ARROW-6903 - [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc
  • ARROW-6905 - [Gandiva][Crossbow] Use xcode9.4 for osx builds, do not build dataset, filesystem
  • ARROW-6910 - [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS dirty_decay_ms and muzzy_decay_ms to 0 by default, add C++ / Python option to configure this
  • ARROW-6913 - [R] Potential bug in compute.cc
  • ARROW-6914 - [CI] docker-clang-format nightly failing
  • ARROW-6922 - [Python] Compat with pandas for MultiIndex.levels.names
  • ARROW-6925 - [C++] Only add -stdlib flag on MacOS when using clang.
  • ARROW-6929 - [C++] Remove first offset==0 check from Validate()
  • ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
  • ARROW-6938 - [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues
  • ARROW-6948 - [Rust][Parquet] Fix boolean array in arrow reader.
  • ARROW-6950 - [C++][Dataset] Add dataset benchmark example
  • ARROW-6957 - [CI][Crossbow] Nightly R with sanitizers build fails installing dependencies
  • ARROW-6962 - [C++][CI] Stop compiling with -Weverything
  • ARROW-6966 - [Go] Set a default memset for when the platform doesn't set one
  • ARROW-6977 - [C++] Disable jemalloc background_thread on macOS
  • ARROW-6983 - [C++] Fix ThreadedTaskGroup lifetime issue
  • ARROW-6989 - [Python] Check for out of range precision decimals in python conversion
  • ARROW-6992 - [C++] Undefined Behavior sanitizer build option fails with GCC
  • ARROW-6999 - [Python] Fix unnamed index when specifying schema in Table.from_pandas
  • ARROW-7013 - [C++] arrow-dataset pkgconfig is incomplete
  • ARROW-7020 - [Java] Fix the bugs when calculating vector hash code
  • ARROW-7021 - [Java] UnionFixedSizeListWriter decimal type should check writer index
  • ARROW-7022, ARROW-7023 - [Python] fix handling of pandas Index and Period/Interval extension arrays in pa.array
  • ARROW-7023 - [Python] pa.array does not use “from_pandas” semantics for pd.Index
  • ARROW-7024 - [CI][R] Update R dependencies for Conda build
  • ARROW-7027 - [Python] Correctly raise error in pa.table(..) on invalid input
  • ARROW-7033 - [C++] Set SDKROOT automatically on macOS
  • ARROW-7045 - [R] Preserve factor in Parquet roundtrip
  • ARROW-7050 - [R] Fix compiler warnings in R bindings
  • ARROW-7053 - [Python] setuptools-scm produces incorrect version at apache-arrow-0.15.1 tag
  • ARROW-7056 - [Python] Fix test_fs failures when S3 not enabled
  • ARROW-7059 - [C++][Parquet] Mostly fix performance regression when reading Parquet file with many columns
  • ARROW-7074 - [C++] ASSERT_OK_AND_ASSIGN should use ASSERT_OK instead of EXPE…
  • ARROW-7077 - [C++] Casting dictionary to unrelated value type shouldn't crash
  • ARROW-7087 - [Python] Metadata disappear from pandas dataset
  • ARROW-7097 - [Rust][CI] Apply rustfmt nightly
  • ARROW-7100 - [C++][HDFS] Fix search directories for libjvm.so
  • ARROW-7105 - [CI][Crossbow] Nightly homebrew-cpp job fails
  • ARROW-7106 - [Java] Fix the problem that flight perf test hangs endlessly
  • ARROW-7117 - [C++][CI] Fix the hanging C++ tests in Windows 2019
  • ARROW-7128 - [CI] Use proper version for fedora tests in GitHub actions cron jobs
  • ARROW-7133 - [CI] Allow GH Actions to run on all branches
  • ARROW-7142 - [C++] GCC compilation failures in nightlies
  • ARROW-7152 - [Java] Delete useless class DiffFunction
  • ARROW-7157 - [R] Add validation, helpful error message to Object$new()
  • ARROW-7158 - [C++] Use compiler information provided by CMake
  • ARROW-7163 - [Doc] Fix double-and typos
  • ARROW-7164 - [CI] Dev cron github action is failing every 15 minutes
  • ARROW-7167 - [CI][Python] Add nightly tests for additional pandas versions to Github Actions
  • ARROW-7168 - [Python] Respect the specified dictionary type for pd.Categorical conversion
  • ARROW-7170 - [C++] Fix linking with bundled ORC
  • ARROW-7180 - [CI] Java builds are not triggered on the master branch
  • ARROW-7181 - [C++] Fix an Arrow module search bug with pkg-config
  • ARROW-7183 - [CI][Crossbow] Re-skip r-sanitizer nightly tests
  • ARROW-7187 - [C++][Doc] doxygen broken on master because of @
  • ARROW-7188 - [C++][Doc] doxygen broken on master: missing param implicit_casts
  • ARROW-7189 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-7194 - [Rust] Fix CSV writer recursion issues
  • ARROW-7199 - [Java] Fix ConcurrentModificationException in BaseAllocator::getChildAllocators
  • ARROW-7200 - [C++][Flight] Enable the server to serve to remote clients
  • ARROW-7209 - [Python] Fix tests on pandas master related to extension dtype conversion
  • ARROW-7212 - [Go] add missing Release to benchmark code
  • ARROW-7214 - [Python] Fix pickling of DictionaryArray
  • ARROW-7217 - [CI][Python] Use correct python version in Github Actions
  • ARROW-7225 - [C++] Fix *std::move(Result<T>) for move-only T
  • ARROW-7249 - [CI] Release test fails in master due to new arrow-flight Rust crate
  • ARROW-7250 - [C++] Define constexpr symbols explicitly in StringToFloatConverter::Impl
  • ARROW-7253 - [CI] Fix failure in release test
  • ARROW-7254 - [Java] BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
  • ARROW-7264 - [Java] RangeEqualsVisitor type check is not correct
  • ARROW-7266 - [C++] Fix ArrayDataVisitor on sliced binary-like array
  • ARROW-7271 - [C++][Flight] Use the single parameter version of SetTotalBytesLimit
  • ARROW-7281 - [C++] Make Adaptive builders' length match expectations
  • ARROW-7282 - [Python] IO functions should raise the right exceptions
  • ARROW-7291 - [Dev] Fix FORMAT_DIR
  • ARROW-7294 - [Python] converted_type_name_from_enum(): Incorrect name for INT_64
  • ARROW-7295 - [R] Fix bad test that causes failure on R < 3.5
  • ARROW-7298 - [C++] Fix thirdparty dependency downloader script
  • ARROW-7314 - [Python] Fix compiler warning in pyarrow.union
  • ARROW-7318 - [C#] TimestampArray serialization failure
  • ARROW-7320 - [C++] Specify CMAKE_INSTALL_LIBDIR for gbenchmark
  • ARROW-7327 - [CI] Failing C GLib and R buildbot builders
  • ARROW-7328 - [CI] GitHub Actions should trigger on changes to GitHub Actions configuration
  • ARROW-7341 - [CI] Unbreak nightly Conda R job
  • ARROW-7343 - [Java][FlightRPC] prevent leak in DoGet
  • ARROW-7349 - [C++] Fix the bug of parsing string hex values
  • ARROW-7353 - [C++] Ignore -Wmissing-braces when building with clang
  • ARROW-7354 - [C++] Fix crash in test-io-hdfs
  • ARROW-7355 - [CI] Environment variables are defined twice for the fuzzit builds
  • ARROW-7358 - [CI] [Dev] [C++] ccache disabled on conda-python-hdfs
  • ARROW-7359 - [C++][Gandiva] Don't throw error for locate function for start position exceeding string length
  • ARROW-7360 - [R] Can't use dplyr filter() with variables defined in parent scope
  • ARROW-7361 - [Rust] Build directory is not passed to ci/scripts/rust_test.sh
  • ARROW-7362 - [Python][C++] Added ListArray.Flatten() that properly flattens a ListArray
  • ARROW-7374 - [Dev][C++] Fix cuda-cpp docker build
  • ARROW-7381 - [C++] Unbreak manylinux1 wheels after Iterator refactor
  • ARROW-7386 - [C#] Array offset does not work properly
  • ARROW-7388 - [Python] Skip HDFS tests if libhdfs cannot be located
  • ARROW-7389 - [Python][Packaging] Remove pyarrow.s3fs import check from the recipe
  • ARROW-7393 - [Plasma] Fix plasma executable name in plasma_java build
  • ARROW-7395 - [C++] Do not warn or error on logical “or” with constants
  • ARROW-7397 - [C++][JSON] Fix white space length detection error
  • ARROW-7404 - [C++][Gandiva] Fix utf8 char length error on Arm64
  • ARROW-7406 - [Java] NonNullableStructVector#hashCode should pass hasher to child vectors
  • ARROW-7407 - [Python] Declare NumPy a PEP517 build dependency
  • ARROW-7408 - [C++] Fix compilation of reference benchmarks
  • ARROW-7435 - [C++] Validate all list / binary offsets in ValidateFull()
  • ARROW-7436 - [Archery] Enable more benchmark binaries in archery benchmark
  • ARROW-7437 - [Java] ReadChannel#readFully does not set writer index correctly
  • ARROW-7442 - [Ruby] Add abstract type check to Arrow::DataType.resolve
  • ARROW-7447 - [Java] ComplexCopier does incorrect copy in some cases
  • ARROW-7450 - [C++] Also link boost_filesystem when using static test linkage
  • ARROW-7458 - [GLib] Fix incorrect build dependency in Makefile
  • ARROW-7471 - [CI][Python] Run flake8 on Cython files
  • ARROW-7472 - [Java] Fix some incorrect behavior in UnionListWriter
  • ARROW-7478 - [Rust][DataFusion] Group by expression ignored unless paired with aggregate expression
  • ARROW-7492 - [CI][Crossbow] Nightly homebrew-cpp job fails on Python installation
  • ARROW-7497 - [Python] Stop relying on (deprecated) pandas.util.testing, move to pandas.testing
  • ARROW-7500 - [C++][Dataset] Remove std::regex usage
  • ARROW-7503 - [Rust][Parquet] Fix build failures
  • ARROW-7506 - [Java] JMH benchmarks should be called from main methods
  • ARROW-7508 - [C#] DateTime32 Reading is Broken
  • ARROW-7510 - [C++] Make ArrayData::null_count thread-safe
  • ARROW-7516 - [C#] Fix .NET Benchmarks
  • ARROW-7518 - [Python] Use PYARROW_WITH_HDFS when building wheels, conda packages
  • ARROW-7527 - [Python] Fix pandas/feather tests for unsupported types with pandas master
  • ARROW-7528 - [Python] Remove usage of deprecated pd.np and pd.datetime in tests
  • ARROW-7535 - [C++] Fix ASAN failures in Array::Validate()
  • ARROW-7543 - [R] Fixes R arrow::write_parquet() documentation code examples
  • ARROW-7545 - [C++] [Dataset] Scanning dataset with dictionary type hangs
  • ARROW-7551 - [FlightRPC][C++] Flight test on macOS fails due to Homebrew gRPC
  • ARROW-7552 - [C++][CI] Disable timing-sensitive tests on public CI
  • ARROW-7554 - [C++] Add support for building on FreeBSD
  • ARROW-7559 - [Rust] Incorrect index check assertion in StringArray and BinaryArray
  • ARROW-7561 - [Doc][Python] Add missing conda_env_gandiva.yml in python.rst
  • ARROW-7563 - [Rust] failed to select a version for `byteorder`
  • ARROW-7582 - [Rust][Flight] Unable to compile arrow.flight.protocol.rs
  • ARROW-7583 - [FlightRPC][C++] relax auth tests due to nondeterminism
  • ARROW-7591 - [Python] Fix DictionaryArray.to_numpy() to return decoded numpy array
  • ARROW-7592 - [C++] Fix crashes on corrupt IPC input
  • ARROW-7593 - [CI][Python] Python datasets failing / not run on CI
  • ARROW-7595 - [R][CI] R appveyor job fails due to pacman compression change
  • ARROW-7596 - [Python] Only permit zero-copy DataFrame block construction when split_blocks=True
  • ARROW-7599 - [Java] Fix build break due to change in RangeEqualsVisitor
  • ARROW-7603 - [Packaging][RPM] Add workaround for LLVM on CentOS 8
  • ARROW-7611 - [Packaging][Python] Fix artifacts patterns for wheel
  • ARROW-7612 - [Packaging][Python] Fix artifacts path for Conda on Windows
  • ARROW-7614 - [Python] Limit size of data in test_parquet.py::test_set_data_page_size
  • ARROW-7618 - [C++] Fix crashes or undefined behaviour on corrupt IPC input
  • ARROW-7620 - [Rust] Remove call to flatc
  • ARROW-7621 - [Doc] Fix doc build
  • ARROW-7634 - [Python] Run pyarrow.dataset tests on Appveyor + fix failures to parse Windows file paths
  • ARROW-7638 - [C++][Dataset] Fix a segfault in DirectoryPartitioningFactory
  • ARROW-7639 - [R] Cannot convert Dictionary Array to R when values aren't strings
  • ARROW-7640 - [C++][Dataset][Parquet] Detect missing compression support
  • ARROW-7647 - [C++] Repair JSON parser's handling of ListArrays
  • ARROW-7650 - [C++][Dataset] enable dataset tests on Windows
  • ARROW-7651 - [CI][Crossbow] Nightly macOS wheel builds fail
  • ARROW-7652 - [Python][Dataset] Use implicit cast in ScannerBuilder.filter
  • ARROW-7661 - [Python] Test for optimal CSV chunking
  • ARROW-7689 - [FlightRPC][C++] bump bundled gRPC to 1.25 to fix MacOS test failure
  • ARROW-7690 - [R] Cannot write parquet to OutputStream
  • ARROW-7693 - [CI] Fix test name for Spark integration, add new tests
  • ARROW-7709 - [Python] Preserve column name in conversion from Table column to pandas for non-ns timestamps
  • ARROW-7714 - [Release] Add missing variable expansion
  • ARROW-7718 - [Release] Fix auto-retry in the binary release script
  • ARROW-7723 - [Python] Triage untested functional regression when converting tz-aware timestamp inside struct to pandas/NumPy format
  • ARROW-7727 - [Python] Unable to read a ParquetDataset when schema validation is on.
  • ARROW-8135 - [Python] Problem importing PyArrow on a cluster
  • ARROW-8638 - Arrow Cython API Usage Gives an error when calling CTable API Endpoints
  • PARQUET-1692 - [C++] Don't use the same CMake variable name for thirdparty version and found version
  • PARQUET-1692 - [C++] LogicalType::FromThrift error on Centos 7 RPM
  • PARQUET-1693 - [C++] Fix parquet examples with compression define guards
  • PARQUET-1702 - [C++] Make BufferedRowGroupWriter compatible with parquet encryption
  • PARQUET-1706 - [C++] Wrong dictionary_page_offset when writing only data pages via BufferedPageWriter
  • PARQUET-1707 - [C++] parquet-arrow-test fails with UBSAN
  • PARQUET-1709 - [C++] Avoid unnecessary temporary std::shared_ptr copies
  • PARQUET-1715 - [C++] Add the Parquet code samples to CI + Refactor Parquet Encryption Samples
  • PARQUET-1720 - [C++] JSONPrint not showing version correctly
  • PARQUET-1747 - [C++] Access to ColumnChunkMetaData fails when encryption is on
  • PARQUET-1766 - [C++] Handle parquet::Statistics NaNs and -0.0f as per upstream parquet-mr
  • PARQUET-1772 - [C++] ParquetFileWriter: Data overwritten in append mode

New Features and Improvements

  • ARROW-412 - [Format][Documentation] Clarify that Buffer.size in Flatbuffers should reflect the actual memory size rather than the padded size
  • ARROW-501 - [C++] Implement concurrent / buffering InputStream for streaming data use cases
  • ARROW-772 - [C++] Implement take kernel functions
  • ARROW-843 - [C++][Dataset] Ensure Schemas are unified in DataSourceDiscovery
  • ARROW-976 - [C++][Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes
  • ARROW-1036 - [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)
  • ARROW-1119 - [Python][C++] Implement NativeFile interfaces for Amazon S3
  • ARROW-1175 - [Java] Implement/test dictionary-encoded subfields
  • ARROW-1456 - [Python] Run s3fs unit tests in Travis CI
  • ARROW-1562 - [C++] Numeric kernel implementations for add
  • ARROW-1638 - [Java] IPC roundtrip for null type
  • ARROW-1900 - [C++] Add kernel for min / max
  • ARROW-2428 - [Python] Support pandas ExtensionArray in Table.to_pandas conversion
  • ARROW-2602 - [Packaging] Automate build of development docker containers
  • ARROW-2863 - [Python] Add context manager APIs to RecordBatch*Writer/Reader classes
  • ARROW-3085 - [Rust] Add an adapter for parquet.
  • ARROW-3408 - [C++] Add CSV option to automatically attempt dict encoding
  • ARROW-3444 - [Python] Add Array/ChunkedArray/Table nbytes attribute
  • ARROW-3706 - [Rust] Add record batch reader trait.
  • ARROW-3789 - [Python] Use common conversion path for Arrow to pandas.Series/DataFrame. Zero copy optimizations for DataFrame, add split_blocks and self_destruct options
  • ARROW-3808 - [R] Array extract, including Take method
  • ARROW-3813 - [R] lower level construction of Dictionary Arrays
  • ARROW-4059 - [Rust] Parquet/Arrow Integration
  • ARROW-4091 - [C++] Curate default list of CSV null spellings
  • ARROW-4208 - [CI/Python] Have automated tests for S3
  • ARROW-4219 - [Rust][Parquet] Initial support for arrow reader.
  • ARROW-4223 - [Python] Support scipy.sparse integration
  • ARROW-4224 - [Python] Support integration with pydata/sparse library
  • ARROW-4225 - [Format][C++] Add CSC sparse matrix support
  • ARROW-4722 - [C++] Implement Bitmap class to modularize handling of bitmaps
  • ARROW-4748 - [Rust][DataFusion] Optimize GROUP BY aggregate queries
  • ARROW-4930 - [C++] Improve find_package() support
  • ARROW-5180 - [Rust] IPC Support
  • ARROW-5181 - [Rust] Initial support for Arrow File reader
  • ARROW-5182 - [Rust] Arrow IPC file writer
  • ARROW-5227 - [Rust] [DataFusion] Re-implement query execution with an extensible physical query plan
  • ARROW-5277 - [C#] MemoryAllocator.Allocate(length: 0) doesn't return null
  • ARROW-5333 - [C++] Clamp build option summary width to 90
  • ARROW-5366 - [Rust] Duration and Interval Arrays
  • ARROW-5400 - [Rust] Test/ensure that reader and writer support zero-length record batches
  • ARROW-5445 - [Website] Remove language that encourages pinning a version
  • ARROW-5454 - [C++] Implement Take on ChunkedArray for DataFrame use
  • ARROW-5502 - [R] file readers should mmap
  • ARROW-5508 - [C++] Create reusable Iterator<T> interface
  • ARROW-5523 - [Python][Packaging] Use HTTPS consistently for downloading wheel dependencies
  • ARROW-5712 - [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
  • ARROW-5767 - [Format] Permit dictionary replacements in IPC protocol
  • ARROW-5801 - [CI] Dockerize (add to docker-compose) all Travis CI Linux tasks
  • ARROW-5802 - [CI][Archery] Dockerify lint utilities
  • ARROW-5804 - [C++] Dockerize C++ CI job with conda-forge toolchain, code coverage from Travis CI
  • ARROW-5805 - [Python] Dockerize (add to docker-compose) Python Travis CI job
  • ARROW-5806 - [CI] Dockerize (add to docker-compose) Integration tests Travis CI entry
  • ARROW-5807 - [JS] Dockerize NodeJS Travis CI entry
  • ARROW-5808 - [GLib][Ruby] Dockerize (add to docker-compose) current GLib + Ruby Travis CI entry
  • ARROW-5809 - [CI][Rust] Travis runs dockerized Rust build
  • ARROW-5810 - [Go] Dockerize Travis CI Go build
  • ARROW-5831 - [Release] Add Python program to download binary artifacts in parallel, allow abort/resume
  • ARROW-5839 - [Python] Test manylinux2010 in CI
  • ARROW-5855 - [Python] Support for Duration (timedelta) type
  • ARROW-5859 - [Python] Support ExtensionArray.to_numpy using storage array
  • ARROW-5971 - [Website] Blog post introducing Arrow Flight
  • ARROW-5994 - [CI] [Rust] Create nightly releases of the Rust implementation
  • ARROW-6003 - [C++] Better input validation and error messaging in CSV reader
  • ARROW-6074 - [FlightRPC][Java] Middleware
  • ARROW-6091 - [Rust][DataFusion] Implement physical execution plan for LIMIT
  • ARROW-6109 - [Integration] Docker image for integration testing can't be built on windows
  • ARROW-6112 - [Java] Support int64 buffer lengths in Java
  • ARROW-6184 - [Java] Provide hash table based dictionary encoder
  • ARROW-6251 - [Developer] Add PR merge tool to apache/arrow-site
  • ARROW-6257 - [C++] Add fnmatch compatible globbing function
  • ARROW-6274 - [Rust][DataFusion] Add support for writing results to CSV
  • ARROW-6277 - [C++][Parquet] Support direct DictionaryArray write of all parquet types
  • ARROW-6283 - [Rust][DataFusion] Implement Context::write_csv to write partitioned CSV results
  • ARROW-6285 - [GLib] Add support for LargeBinary and LargeString types
  • ARROW-6286 - [GLib] Add support for LargeList type
  • ARROW-6299 - [C++] Simplify FileFormat classes to singletons
  • ARROW-6321 - [Python] Ability to create ExtensionBlock on conversion to pandas
  • ARROW-6340 - [R] Implement low-level bindings to Dataset classes
  • ARROW-6341 - [Python] Implement low-level bindings for Dataset
  • ARROW-6352 - [Java] Add implementation of DenseUnionVector
  • ARROW-6367 - [C++][Gandiva] Implement string reverse
  • ARROW-6378 - [C++][Dataset] Implement recursive TreeDataSource
  • ARROW-6386 - [C++][Documentation] Explicit documentation of null slot interpretation
  • ARROW-6394 - [Java] Support conversions between delta vector and partial sum vector
  • ARROW-6396 - [C++] Add overloads of Boolean kernels implementing Kleene logic
  • ARROW-6398 - [C++] Consolidate ScanOptions and ScanContext
  • ARROW-6405 - [Python] Add std::move wrapper for use in Cython
  • ARROW-6452 - [Java] Override ValueVector toString() method
  • ARROW-6463 - [C++][Python] Rename arrow::fs::Selector to FileSelector
  • ARROW-6466 - [Integration][CI] Move integration test code to archery integration command. Dockerize integration tests
  • ARROW-6468 - [C++] Remove unused hashing routines
  • ARROW-6473 - Dictionary encoding format clarifications/future proofing
  • ARROW-6503 - [C++] Add an argument of memory pool object to SparseTensorConverter
  • ARROW-6508 - [C++] Add Tensor and SparseTensor factory function with validations
  • ARROW-6515 - [C++] Clean type_traits.h definitions
  • ARROW-6578 - [C++] Allow casting number to string
  • ARROW-6592 - [Java] Add support for skipping decoding of columns/field in Avro converter
  • ARROW-6594 - [Java] Support logical type encodings from Avro
  • ARROW-6598 - [Java] Sort the code for ApproxEqualsVisitor and provide an interface for custom vector equality
  • ARROW-6608 - [C++] Make default for ARROW_HDFS to be OFF
  • ARROW-6610 - [C++] Add cmake option to disable filesystem layer
  • ARROW-6611 - [C++] Make ARROW_JSON=OFF the default
  • ARROW-6612 - [C++] Add ARROW_CSV CMake build flag
  • ARROW-6619 - [Ruby] Add support for building Gandiva::Expression by Arrow::Schema#build_expression
  • ARROW-6624 - [C++][Python] Add SparseTensor.ToTensor() method
  • ARROW-6625 - [C++][Python] Allow concat_tables to null fill missing columns
  • ARROW-6631 - [C++] Do not build any compression libraries by default in C++ build
  • ARROW-6632 - [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default
  • ARROW-6633 - [C++] Vendor double-conversion library
  • ARROW-6634 - [C++][FOLLOWUP] Remove Flatbuffers EP remnants from C++ Dockerfiles
  • ARROW-6634 - [C++] Vendor Flatbuffers and check in compiled sources
  • ARROW-6635 - [C++] Disable glog integration by default
  • ARROW-6636 - [C++] Do not build command line tools by default
  • ARROW-6637 - [Packaging][FOLLOWUP] Enable necessary components in Autobrew build for R
  • ARROW-6637 - [C++] Further streamline default build, add ARROW_CSV CMake option
  • ARROW-6646 - [Go] Write no IPC buffer metadata for NullType
  • ARROW-6650 - [Rust][Integration] Compare integration JSON with schema & batch
  • ARROW-6656 - [Rust][DataFusion] Add MAX, MIN expressions
  • ARROW-6657 - [Rust][DataFusion] Add Count Aggregate Expression
  • ARROW-6658 - [Rust][DataFusion] Implement AVG expression
  • ARROW-6659 - [Rust][DataFusion] Refactor of HashAggregateExec to support custom merge
  • ARROW-6662 - [Java] Implement equals/approxEquals API for VectorSchemaRoot
  • ARROW-6671 - [C++][Python] Use more consistent names for sparse tensor items
  • ARROW-6672 - [Java] Extract a common interface for dictionary builders
  • ARROW-6685 - [C++] Ignore trailing slashes in S3FS
  • ARROW-6686 - [CI] Pull and push docker images to speed up the nightly builds
  • ARROW-6688 - [Packaging] Include s3 support in the conda packages
  • ARROW-6690 - [Rust][DataFusion] Optimize aggregates without GROUP BY to use SIMD
  • ARROW-6692 - [Rust][DataFusion] Update examples to use physical query plan
  • ARROW-6693 - [Rust] [DataFusion] Update unit tests to use physical query plan
  • ARROW-6694 - [Rust][DataFusion] Integration tests now use physical query plan
  • ARROW-6695 - [Rust][DataFusion] Remove legacy code for executing logical plan
  • ARROW-6696 - [Rust][DataFusion] Implement simple math operations in physical query plan
  • ARROW-6700 - [Rust][DataFusion] Use new Arrow Parquet reader
  • ARROW-6707 - [Java] Improve the performance of JDBC adapters by using nullable information
  • ARROW-6710 - [Java] Add JDBC adapter test to cover cases which contains some null values
  • ARROW-6711 - [C++] Consolidate Filter and Expression
  • ARROW-6721 - [JAVA] Avro adapter benchmark only runs once in JMH
  • ARROW-6722 - [Java] Provide a uniform way to get vector name
  • ARROW-6729 - [C++] Prevent data copying in StlStringBuffer
  • ARROW-6730 - [CI] Use GitHub Actions for “C++ with clang 7” docker image
  • ARROW-6731 - [CI] [Rust] Set up GitHub Action to run Rust tests
  • ARROW-6732 - [Java] Implement quick sort in a non-recursive way to avoid stack overflow
  • ARROW-6741 - [Release] Update changelog.py to use APACHE_ prefixed JIRA_USERNAME and JIRA_PASSWORD environment variables
  • ARROW-6742 - [C++] Remove boost::filesystem dependency in hdfs_internal.cc
  • ARROW-6743 - [C++] Remove usage of boost::filesystem
  • ARROW-6744 - [Rust] Publicly expose JsonEqual
  • ARROW-6754 - [C++] Merge allocator.h into stl.h
  • ARROW-6758 - [Developer] Install local NodeJS via nvm when running release verification
  • ARROW-6764 - [C++] Create a readahead iterator
  • ARROW-6767 - [JS] Lazily bind batches in scan/scanReverse
  • ARROW-6768 - [C++][Dataset] Add method to convert from Scanner to Table
  • ARROW-6769 - [Dataset][C++] End to end test
  • ARROW-6770 - [CI][Travis] Download Minio quietly
  • ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
  • ARROW-6778 - [C++] Support cast for DurationType
  • ARROW-6782 - [C++] Do not require Boost for minimal C++ build
  • ARROW-6784 - [C++][R] Move filter and take for ChunkedArray, RecordBatch, and Table from Rcpp to C++ library
  • ARROW-6787 - [CI][C++] Decommission “C++ with clang 7 and system packages” Travis CI job
  • ARROW-6788 - [CI][Dev] Exercise merge script tests
  • ARROW-6789 - [Python] Improve ergonomics by automatically boxing Action and Result in do_action RPC
  • ARROW-6790 - [Release] Enable selected integration tests in release verification
  • ARROW-6793 - [R] Arrow C++ binary packaging for Linux
  • ARROW-6797 - [Release] Use a separately cloned arrow-site repository in the website post release script
  • ARROW-6802 - [Packaging][deb][RPM] Update qemu-user-static package URL
  • ARROW-6803 - [Rust][DataFusion] Performance optimization for single partition aggregate queries
  • ARROW-6804 - [CI][Rust] Migrate Travis job to GitHub Actions
  • ARROW-6807 - [Java][FlightRPC] Expose gRPC service & client
  • ARROW-6810 - [Website] Add docs for R package 0.15 release
  • ARROW-6811 - [R] Assorted post-0.15 release cleanups
  • ARROW-6814 - [C++] Resolve compiler warnings that occurred on release build
  • ARROW-6822 - [Website] merge_pr.py is published
  • ARROW-6824 - [Plasma] Allow creation of multiple objects through a single IPC in Plasma Store
  • ARROW-6825 - [C++] Rework CSV reader IO around readahead iterator
  • ARROW-6831 - [R] Update R macOS/Windows builds for change in cmake compression defaults
  • ARROW-6832 - [R] Implement Codec::IsAvailable
  • ARROW-6833 - [R][CI] Add crossbow job for full R autobrew macOS build
  • ARROW-6836 - [Format] Add a custom_metadata:[KeyValue] field to the Footer table in File.fbs
  • ARROW-6843 - [Website] Disable deploy on pull request
  • ARROW-6847 - [C++] Add range_expression adapter to Iterator
  • ARROW-6850 - [Java] Jdbc converter support Null type
  • ARROW-6852 - [C++] Fix build issue on memory-benchmark
  • ARROW-6853 - [Java] Support vector and dictionary encoder use different hasher for calculating hashCode
  • ARROW-6855 - [FlightRPC][C++][Python] Flight middleware for C++/Python
  • ARROW-6862 - [Developer] Check pull request title
  • ARROW-6863 - [Java] Provide parallel searcher
  • ARROW-6865 - [Java] Improve the performance of comparing an ArrowBuf against a byte array
  • ARROW-6866 - [Java] Improve the performance of calculating hash code for struct vector
  • ARROW-6879 - [Rust] Add explicit SIMD for sum kernel
  • ARROW-6880 - [Rust] Add explicit SIMD for min/max kernel
  • ARROW-6881 - [Rust] Remove “array_ops” in favor of the “compute” sub-module
  • ARROW-6884 - [Python] Format friendlier message in Python when a server-side RPC handler fails
  • ARROW-6887 - [Java] Create prose documentation for using ValueVectors
  • ARROW-6888 - [Java] Support copy operation for vector value comparators
  • ARROW-6889 - [Java] ComplexCopier enable FixedSizeList type & fix RangeEqualsVisitor StackOverFlow
  • ARROW-6891 - [Rust][Parquet] utf8 support for arrow reader.
  • ARROW-6902 - [C++][Compute] Add String/Binary support to Compare kernel
  • ARROW-6904 - [Python] Add support for MapArray
  • ARROW-6907 - [Plasma] Allow Plasma to send batched notifications.
  • ARROW-6911 - [Java] Provide composite comparator
  • ARROW-6912 - [Java] Extract a common base class for avro converter consumers
  • ARROW-6916 - [Developer] Sort tasks by name in Crossbow e-mail report
  • ARROW-6918 - [R] Make docker-compose setup faster
  • ARROW-6919 - [Python] Expose more builders in Cython
  • ARROW-6920 - [Packaging] Build python 3.8 wheels
  • ARROW-6926 - [Python] Support sizeof protocol for Python objects
  • ARROW-6927 - [C++] Add gRPC version check
  • ARROW-6928 - [Rust] Add support for FixedSizeListArray
  • ARROW-6930 - [Java] Create utility class for populating vector values used for test purpose only
  • ARROW-6932 - [JAVA] incorrect log on known extension type
  • ARROW-6933 - [Java] Support linear dictionary encoder
  • ARROW-6936 - [Python] Improve error message when unwrapping object fails
  • ARROW-6942 - [Developer] Add support for Parquet in pull request check by GitHub Actions
  • ARROW-6943 - [Website] Translate Apache Arrow Flight introduction to Japanese
  • ARROW-6944 - [Rust] Add String, FixedSizeBinary types
  • ARROW-6949 - [Java] Fix promotable writer to handle nullvectors
  • ARROW-6951 - [C++][Dataset] Column projection in ParquetFragment
  • ARROW-6952 - [C++][Dataset] Implement predicate pushdown with ParquetFileFragment
  • ARROW-6954 - [Python][CI] Add Python 3.8 to CI matrix
  • ARROW-6960 - [R] Add lz4 and zstd to R PKGBUILD
  • ARROW-6961 - [C++][Gandiva] Add string lower function in Gandiva
  • ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds
  • ARROW-6964 - [C++][Dataset] Add multithread support to Scanner::ToTable
  • ARROW-6965 - [C++][Dataset] Optionally expose partition keys as columns
  • ARROW-6967 - [C++][Dataset] IN, IS_VALID filter expressions
  • ARROW-6969 - [C++][Dataset] ParquetScanTask defer memory usage
  • ARROW-6970 - [Packaging][RPM] Add support for CentOS 8
  • ARROW-6973 - [C++][ThreadPool] Use perfect forwarding in Submit
  • ARROW-6975 - [C++] Put make_unique in its own header
  • ARROW-6980 - [R] dplyr backend for RecordBatch/Table
  • ARROW-6984 - [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
  • ARROW-6986 - [R] Add basic Expression class
  • ARROW-6987 - [CI] Travis OSX failing to install sdk headers
  • ARROW-6991 - [Packaging][deb] Add support for Ubuntu 19.10
  • ARROW-6994 - [C++] Fix aggressive RSS inflation on macOS when jemalloc background_thread is not enabled
  • ARROW-6997 - [Packaging][RPM] Add apache-arrow-release
  • ARROW-7000 - [C++][Gandiva] Handle empty inputs in string upper, lower functions
  • ARROW-7003 - [Rust] Generate flatbuffers files in docker build image
  • ARROW-7004 - [Plasma] Make it possible to bump up object in LRU cache
  • ARROW-7006 - [Rust] Bump flatbuffers version to avoid vulnerability
  • ARROW-7007 - [C++] Add use_mmap option to LocalFS
  • ARROW-7014 - [Developer][Release] Add “wheels” verification option to verify-release-candidate.sh for Linux and macOS
  • ARROW-7015 - [Developer] Write script to verify macOS wheels given local environment with conda or virtualenv
  • ARROW-7016 - [Developer][Python] Add Windows batch script to test Python wheels for release candidate
  • ARROW-7019 - [Java] Improve the performance of loading validity buffers
  • ARROW-7026 - [Java] Remove assertions in MessageSerializer/vector/writer/reader
  • ARROW-7031 - [Python] Correct LargeListArray.offsets attribute
  • ARROW-7031 - [Python] Expose the offsets of a ListArray in python
  • ARROW-7032 - [Release] Run the python unit tests in the release verification script
  • ARROW-7034 - [CI][Crossbow] Skip known nightly failures
  • ARROW-7035 - [R] Default arguments are unclear in write_parquet docs
  • ARROW-7036 - [C++] Version up ORC to avoid compile errors
  • ARROW-7037 - [C++] Compile error on the combination of protobuf >= 3.9 and clang
  • ARROW-7039 - [Python] Fix pa.table/record_batch typecheck to work without pandas
  • ARROW-7047 - [C++] Insert implicit casts in ScannerBuilder::Finish
  • ARROW-7052 - [C++] Fix linking of datasets example when ARROW_BUILD_SHARED=OFF
  • ARROW-7054 - [Docs] Enable overriding project version with environment variable when building Sphinx docs
  • ARROW-7057 - [C++] Add API to parse URI query strings
  • ARROW-7058 - [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to its base dir
  • ARROW-7060 - [R] Post-0.15.1 cleanup
  • ARROW-7061 - [C++][Dataset] Add ignore file options to FileSystemDataSourceDiscovery
  • ARROW-7062 - [C++][Dataset] Ensure ParquetFileFormat::Open catch parqu…
  • ARROW-7064 - [R] Support null type using vctrs::unspecified()
  • ARROW-7066 - [Python] Allow returning ChunkedArray in arrow_array
  • ARROW-7067 - [CI] Disable code coverage on Travis-CI
  • ARROW-7069 - [C++][Dataset] Replace ConstantPartitionScheme with PrefixDictionaryPartitionScheme
  • ARROW-7070 - [Packaging][deb] Update package names for 1.0.0
  • ARROW-7072 - [Java] Support concatenating validity bits efficiently
  • ARROW-7082 - [Packaging][deb] Add apache-arrow-archive-keyring package
  • ARROW-7086 - [C++] Provide a wrapper for invoking factories to produce a Result
  • ARROW-7092 - [R] Add vignette for dplyr and datasets
  • ARROW-7093 - [R] Support creating ScalarExpressions for more data types
  • ARROW-7094 - [C++] FileSystemDataSource should use an owning pointer for fs::Filesystem
  • ARROW-7095 - [R] Require an explicit call to pull Datasets into memory
  • ARROW-7096 - [C++] Unified ConcatenateTables APIs
  • ARROW-7098 - [Java] Improve the performance of comparing two memory blocks
  • ARROW-7099 - [C++] Disambiguate function calls in csv parser test
  • ARROW-7101 - [CI] Refactor docker-compose setup and use it with GitHub Actions
  • ARROW-7103 - [R] Various minor cleanups
  • ARROW-7107 - [C++][MinGW] Enable Flight on AppVeyor
  • ARROW-7110 - [GLib] Add filter support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7111 - [GLib] Add take support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7113 - [Rust] Add unowned buffer.
  • ARROW-7116 - [CI] Use the docker repository provided by apache organization
  • ARROW-7120 - [C++][CI] Add .ccache to the docker-compose volume mounts
  • ARROW-7146 - [R][CI] Various fixes and speedups for the R docker-compose setup
  • ARROW-7147 - [C++][Dataset] Refactor dataset's API to use Result<T>
  • ARROW-7148 - [C++][Dataset] Major API cleanup
  • ARROW-7149 - [C++] Remove experimental status on filesystem APIs
  • ARROW-7155 - [Java][CI] add maven wrapper to make setup process simple
  • ARROW-7159 - [CI] Run HDFS tests as cron task
  • ARROW-7160 - [C++] Update string_view backport
  • ARROW-7161 - [C++] Migrate filesystem APIs from Status to Result
  • ARROW-7162 - [C++] Cleanup warnings in cmake_modules/SetupCxxFlags.cmake
  • ARROW-7166 - [Java] Remove redundant code for Jdbc adapters
  • ARROW-7169 - [C++] Vendor uriparser library
  • ARROW-7171 - [Ruby] Pass Array for Arrow::Table#filter
  • ARROW-7172 - [C++][Dataset] Improve format of Expression::ToString
  • ARROW-7176 - [C++] Fix arrow::ipc compiler warning
  • ARROW-7178 - [C++] Vendor forward compatible std::optional
  • ARROW-7185 - [R][Dataset] Add bindings for IN, IS_VALID expressions
  • ARROW-7186 - [R] Add inline comments to document the dplyr code
  • ARROW-7192 - [Rust] Implement Flight crate
  • ARROW-7193 - [Rust] Arrow stream reader
  • ARROW-7195 - [Ruby] Improve #filter, #take, and #is_in
  • ARROW-7196 - [Ruby] Remove needless BinaryArrayBuilder#append_values
  • ARROW-7197 - [Ruby] Suppress keyword argument related warnings with Ruby 2.7
  • ARROW-7204 - [C++][Dataset] Implicit cast support for InExpression
  • ARROW-7206 - [Java] Avoid string concatenation when calling Preconditions#checkArgument
  • ARROW-7207 - [Rust] Update generated fbs files
  • ARROW-7210 - [C++][R] Allow Numeric <-> Temporal Scalar casts
  • ARROW-7211 - [Rust] Support byte buffers as a parquet sink
  • ARROW-7216 - [Java] Improve the performance of setting/clearing individual bits
  • ARROW-7219 - [Python][CI] Test with pickle5 installed
  • ARROW-7227 - [Python] Added a python wrapper for ConcatenateTablesWithPromotions
  • ARROW-7228 - [Python] Added a python wrapper for RecordBatch.FromStructArray()
  • ARROW-7235 - [C++] Add Result APIs to IO layer
  • ARROW-7236 - [C++] Add Result APIs to arrow/csv
  • ARROW-7240 - [C++] Add Result APIs to arrow/util
  • ARROW-7246 - [CI][Python] Use Python 3 for docker-compose
  • ARROW-7247 - [CI][Python] Fix wheel build error on macOS
  • ARROW-7248 - [Rust] Automatically Generate IPC Messages
  • ARROW-7255 - [CI] Re-enable source release test on pull request
  • ARROW-7257 - [CI] Fix Homebrew formula audit error by openssl
  • ARROW-7258 - [CI] Fix fuzzit build directory
  • ARROW-7259 - [Java] Support subfield encoder use different hasher
  • ARROW-7260 - [CI] Remove Ubuntu 14.04 test job
  • ARROW-7261 - [Python] Add Python support for Fixed Size List type
  • ARROW-7262 - [C++][Gandiva] Added replace function
  • ARROW-7263 - [C++][Gandiva] Implemented locate function
  • ARROW-7268 - [Rust] Add custom_metadata field from IPC message to Schema.
  • ARROW-7269 - [Python] Add ORC to api documentation
  • ARROW-7270 - [Go] preserve CSV reading behaviour, improve memory usage
  • ARROW-7274 - [C++] Add Result APIs to Decimal class
  • ARROW-7275 - [Ruby] Add support for Arrow::ListDataType.new(data_type)
  • ARROW-7276 - [Ruby][...]
  • ARROW-7277 - [Java][Doc] Add discussion about vector lifecycle
  • ARROW-7279 - [C++] Rename UnionArray::type_ids to type_codes
  • ARROW-7284 - [Java] ensure java implementation meets clarified dictionary spec
  • ARROW-7289 - [C#] ListType constructor argument is redundant
  • ARROW-7290 - [C#] Implement ListArray Builder
  • ARROW-7292 - [CI][C++] Add ASAN / UBSAN run
  • ARROW-7293 - [Dev][C++] Persist ccache in docker-compose build volumes
  • ARROW-7296 - [Python] Add ORC api documentation
  • ARROW-7299 - [GLib] Use Result instead of Status
  • ARROW-7303 - [C++] Refactor CSV benchmarks to use Result APIs
  • ARROW-7306 - [C++] Add Result-returning version of FileSystemFromUri
  • ARROW-7307 - [CI][GLib] Ensure generating documentation
  • ARROW-7309 - [Python] Support HDFS federation viewfs
  • ARROW-7310 - [Python] Expose HDFS implementation for pyarrow.fs
  • ARROW-7311 - [Python] Return filesystem and path from URI
  • ARROW-7312 - [Rust] Implement std::error::Error for ArrowError.
  • ARROW-7317 - [C++] Migrate Iterator to a Result API
  • ARROW-7319 - [C++] Refactor Iterator<T> to yield Result<T>
  • ARROW-7321 - [CI][GLib] Disable development mode
  • ARROW-7322 - [CI][Python] Fall back to arrowdev dockerhub organization for manylinux images
  • ARROW-7323 - [CI][Rust] Use the same toolchain
  • ARROW-7324 - [Rust] Add timezone to timestamp
  • ARROW-7325 - [Rust][Parquet] Update to parquet-format 2.6 and thrift 0.12
  • ARROW-7329 - [Java] AllocationManager: Allow managing different types …
  • ARROW-7333 - [CI][Rust] Remove duplicated nightly job
  • ARROW-7334 - [CI][Python] Use Python 3 on macOS
  • ARROW-7339 - [CMake] Thrift version not respected in CMake configuration version.txt
  • ARROW-7340 - [CI] Prune defunct appveyor build setup
  • ARROW-7344 - [Packaging][Python] Build manylinux2014 wheels
  • ARROW-7346 - [CI] Explicit usage of ccache across the builds
  • ARROW-7347 - [C++] Update bundled Boost to 1.71.0
  • ARROW-7348 - [Rust] Add api to return null bitmap buffer.
  • ARROW-7351 - [Developer] Only suggest cpp-* versions by default for PARQUET issues in merge tool
  • ARROW-7357 - [Go] migrate to x/xerrors
  • ARROW-7366 - [C++][Dataset] Use PartitionSchemeDiscovery in DataSourceDiscovery
  • ARROW-7367 - [Python] Use np.full instead of np.array.repeat in ParquetDatasetPiece
  • ARROW-7368 - [Ruby] Use :arrow_file and :arrow_streaming for format name
  • ARROW-7369 - [GLib] Add garrow_table_combine_chunks
  • ARROW-7370 - [C++] Fix old Protobuf with AUTO detection failure
  • ARROW-7377 - [C++][Dataset] Add ScanOptions::MaterializedFields
  • ARROW-7378 - [C++][Gandiva] Fix loop vectorization in gandiva
  • ARROW-7379 - [C++] Introduce SchemaBuilder companion class and Field::IsCompatibleWith
  • ARROW-7380 - [C++][Dataset] Implement DatasetFactory
  • ARROW-7382 - [C++][Dataset] Insert missing directories in FileSystemDataSourceDiscovery::Make
  • ARROW-7387 - [C#] Support ListType Serialization
  • ARROW-7392 - [Packaging] Add conda packaging tasks for python 3.8
  • ARROW-7398 - [Packaging][Python] Conda builds are failing on macOS
  • ARROW-7399 - [C++][Gandiva] set Mcpu based on host cpu
  • ARROW-7402 - [C++] Add more information on CUDA error
  • ARROW-7403 - [C++][JSON] Enable Rapidjson on Arm64 Neon
  • ARROW-7410 - [Doc][Python] Document filesystem API
  • ARROW-7411 - [C++][Flight] Improve the output of Arrow Flight benchmark
  • ARROW-7413 - [Python] Expose and test the partitioning discovery
  • ARROW-7414 - [R][Dataset] Implement *PartitionSchemeDiscovery in R
  • ARROW-7415 - [C++][Dataset] implement IpcFormat
  • ARROW-7416 - [R][Nightly] Fix macos-r-autobrew build on R 3.6.2
  • ARROW-7417 - [C++] Add a docker-compose entry for CUDA 10.1
  • ARROW-7418 - [C++] Fix build error on Ubuntu 16.04
  • ARROW-7420 - [C++] Migrate tensor related APIs to Result-returning version
  • ARROW-7429 - [Java] Enhance code style checking for Java code (remove consecutive spaces)
  • ARROW-7430 - [Python] Add more docstrings to dataset bindings
  • ARROW-7431 - [Python] Add dataset API to reference docs
  • ARROW-7432 - [Python] Add higher level open_dataset function
  • ARROW-7439 - [C++][Dataset] Remove pointer aliases
  • ARROW-7449 - [GLib] Make GObject Introspection optional
  • ARROW-7452 - [GLib] Make GArrowTimeDataType abstract
  • ARROW-7453 - [Ruby]
  • ARROW-7454 - [Ruby] Add support for saving/loading TSV
  • ARROW-7455 - [Ruby] Use Arrow::DataType.resolve for all GArrowDataType input
  • ARROW-7456 - [C++] Add support for YYYY-MM-DDThh and YYYY-MM-DDThh:mm timestamp formats
  • ARROW-7457 - [Doc] fix typos
  • ARROW-7459 - [Python] Fix document lint error
  • ARROW-7460 - [Rust] Improve some kernel performance
  • ARROW-7461 - [Java] fix typos
  • ARROW-7463 - [Doc] fix a broken link and typo
  • ARROW-7464 - [C++] Refine CpuInfo singleton with std::call_once
  • ARROW-7465 - [C++] Add Arrow memory benchmark for Arm64
  • ARROW-7468 - [Python] fix typos
  • ARROW-7469 - [C++] Improve division related bit operations
  • ARROW-7470 - [JS] fix typos
  • ARROW-7474 - [Ruby] Improve CSV save performance
  • ARROW-7475 - [Rust] Arrow IPC Stream writer
  • ARROW-7477 - [Java][FlightRPC] set up gRPC reflection metadata
  • ARROW-7479 - [Rust][Ruby][R] Fix typos
  • ARROW-7481 - [C#] fix typo
  • ARROW-7482 - [C++] Fix typos
  • ARROW-7484 - [C++][Gandiva] Fix typos
  • ARROW-7485 - [C++][Plasma] Fix typos
  • ARROW-7487 - [Developer] Fix typos
  • ARROW-7488 - [GLib] Fix typos and broken links
  • ARROW-7489 - [CI] Fix typos
  • ARROW-7490 - [Java] Avro converter should convert attributes and props to FieldType metadata
  • ARROW-7493 - [Python] Expose sum kernel in pyarrow.compute and support ChunkedArray inputs
  • ARROW-7498 - [Dataset] Rename core classes before stable API
  • ARROW-7502 - [Integration] Remove Spark patch not needed
  • ARROW-7513 - [JS][tutorial] - Rich cols part 1
  • ARROW-7514 - [C#] Make GetValueOffset Obsolete
  • ARROW-7519 - [Python] Build wheels, conda packages with dataset support
  • ARROW-7521 - [Rust] Remove tuple on FixedSizeList
  • ARROW-7523 - [Developer] Relax clang-tidy check
  • ARROW-7526 - [C++][Compute] Optimize small integer sorting
  • ARROW-7532 - [CI] Unskip brew test after Homebrew fixes it upstream
  • ARROW-7537 - [CI][R] Nightly macOS autobrew job should be more verbose if it fails
  • ARROW-7538 - [Java] Clarify actual and desired size in AllocationManager
  • ARROW-7540 - [C++] Install license files and README
  • ARROW-7541 - [GLib] Install license files
  • ARROW-7542 - [CI][C++] Use $(sysctl -n hw.ncpu) instead of $(nproc) on macOS
  • ARROW-7549 - [Java] Reorganize Flight modules to keep top level clean/organized
  • ARROW-7550 - [R][CI] Run donttest examples in CI
  • ARROW-7557 - [C++][Compute] Validate sorting stability
  • ARROW-7558 - [Packaging][deb][RPM] Use the host owner and group for artifacts
  • ARROW-7560 - [Rust] Reduce Rc/Refcell usage
  • ARROW-7565 - [Website] Add support for download URL redirect
  • ARROW-7566 - [CI] Use more recent Miniconda on AppVeyor
  • ARROW-7567 - [Java] Fix races in checkstyle upgrade
  • ARROW-7567 - [Java] Bump Checkstyle from 6.19 to 8.19
  • ARROW-7568 - [Java] Bump Apache Avro from 1.9.0 to 1.9.1
  • ARROW-7569 - [Python] Add API to map Arrow types to pandas ExtensionDtypes in to_pandas conversions
  • ARROW-7570 - [Java] Fix high severity issues
  • ARROW-7571 - [Java] Correct minimal Java version on README
  • ARROW-7572 - [Java] Enforce Maven 3.3+ as mentioned in README
  • ARROW-7573 - [Rust] Reduce boxing and cleanup
  • ARROW-7575 - [R] Linux binary packaging followup
  • ARROW-7576 - [C++][Dev] Improve fuzzing setup
  • ARROW-7577 - [CI][C++] Check OSS-Fuzz build in GitHub Actions
  • ARROW-7578 - [R] Add support for datasets with IPC files and with multiple sources
  • ARROW-7580 - [Website] 0.16 release post
  • ARROW-7581 - [R] Documentation/polishing for 0.16 release
  • ARROW-7590 - [C++] Don't ignore managed files in thirdparty
  • ARROW-7597 - [C++] More compact CMake configuration summary
  • ARROW-7600 - [C++][Parquet] failing disabled unittest for nested parquet.
  • ARROW-7601 - [Doc][C++] Update fuzzing doc
  • ARROW-7602 - [Archery] Add more archery build options
  • ARROW-7613 - [Rust] Remove redundant :: prefixes
  • ARROW-7622 - [Format] Mark Tensor and SparseTensor fields required
  • ARROW-7623 - [C++] Update generated flatbuffers code
  • ARROW-7626 - [Parquet][GLib] Add support for version macros
  • ARROW-7627 - [C++][Gandiva] Optimize string truncate function
  • ARROW-7629 - [C++][CI] Add fuzz regression files to arrow-testing
  • ARROW-7630 - [C++][CI] Check fuzz crash regressions in CI
  • ARROW-7632 - [C++][CI] Add extension type data to IPC fuzz seed corpus
  • ARROW-7635 - [C++] Add pkg-config support for each components
  • ARROW-7636 - [Python] Clean-up the pyarrow.dataset.partitioning() API
  • ARROW-7644 - Add vcpkg installation instructions
  • ARROW-7645 - [Packaging][deb][RPM] Fix arm64 packaging build
  • ARROW-7648 - [C++] Sanitize local paths on Windows
  • ARROW-7658 - [R] Support dplyr filtering on date/time
  • ARROW-7659 - [Rust] Reduce Rc usage
  • ARROW-7660 - [C++][Gandiva] Optimise castVarchar(string, int) function for single byte characters
  • ARROW-7665 - [R] Build in parallel in linuxLibs.R
  • ARROW-7666 - [Packaging][deb] Always use Ninja to reduce build time
  • ARROW-7667 - [Packaging][deb] Add ubuntu-eoan to nightly jobs
  • ARROW-7668 - [Packaging][RPM] Use Ninja if possible to reduce build time
  • ARROW-7670 - [Python][Dataset] More ergonomic API
  • ARROW-7671 - [Python][Dataset] Add bindings for the DatasetFactory
  • ARROW-7674 - [Dev] Add helpful message for captcha challenge in merge_arrow_pr.py
  • ARROW-7682 - [Packaging] Add support for arm64 APT/Yum repositories
  • ARROW-7683 - [Packaging] Set 0.16.0 as the next version
  • ARROW-7686 - [Packaging][deb][RPM] Include more arrow-*.pc
  • ARROW-7687 - [C++] Fix dead links in README
  • ARROW-7692 - [Rust] Simplify some Option / Result pattern matches
  • ARROW-7694 - [Packaging][deb][RPM] Add support for RC to repository packages
  • ARROW-7695 - [Release] Update java versions to 0.16-SNAPSHOT
  • ARROW-7696 - [Release] Add support for running unit test on release branch
  • ARROW-7697 - [Release] Add a test for updating Linux packages by 00-prepare.sh
  • ARROW-7710 - [Release][C#] Add support for redirecting .NET download URL
  • ARROW-7711 - [C#] Make Date32 test independent of system timezone
  • ARROW-7715 - [Release][APT] Ignore some arm64 verifications
  • ARROW-7716 - [Packaging][APT] Use the “main” component for Ubuntu 19.10
  • ARROW-7719 - [Python][Dataset] Table equality check occasionally fails
  • ARROW-7724 - [Release][Yum] Ignore some arm64 verifications
  • ARROW-7743 - [Rust] [Parquet] Support reading timestamp micros
  • ARROW-7768 - [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
  • ARROW-8015 - [Python] Build 0.16.0 wheel install for Windows + Python 3.5 and publish to PyPI
  • PARQUET-517 - [C++] Use arrow::MemoryPool for all heap allocations
  • PARQUET-1300 - [C++] Implement encrypted Parquet read and write support
  • PARQUET-1664 - [C++] Provide API to return metadata string from FileMetadata.
  • PARQUET-1678 - [C++] Provide classes for reading/writing using input/output operators
  • PARQUET-1688 - [C++] StreamWriter/StreamReader can't be built with g++ 4.8.5 on CentOS 7
  • PARQUET-1689 - [C++] Stream API: Allow for columns/rows to be skipped when reading
  • PARQUET-1701 - [C++] Stream API: Add support for optional fields
  • PARQUET-1704 - [C++] Add re-usable encryption buffer to SerializedPageWriter
  • PARQUET-1705 - [C++] Disable shrink-to-fit on the re-usable decryption buffer
  • PARQUET-1712 - [C++] Stop using deprecated APIs in examples
  • PARQUET-1721 - [C++][Parquet] Add missing arrow dependency to parquet.pc
  • PARQUET-1734 - [C++] Fix typo
  • PARQUET-1769 - [C++] Update parquet.thrift to parquet-format 2.8.0

Apache Arrow 0.15.1 (2019-11-01)

Bug Fixes

  • ARROW-6464 - [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API (#5293)
  • ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
  • ARROW-6740 - [C++] Unmap MemoryMappedFile as soon as possible
  • ARROW-6762 - [C++] Support reading JSON files with no newline at end
  • ARROW-6795 - [C#] Fix for reading large (2GB+) files
  • ARROW-6806 - [C++][Python] Fix crash validating an IPC-originating empty array
  • ARROW-6809 - [Ruby] Gem does not install on macOS due to glib2 3.3.7 compilation failure
  • ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
  • ARROW-6834 - [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds
  • ARROW-6844 - [C++][Parquet] Fix regression in reading List types with item name that is not “item”
  • ARROW-6857 - [C++] Fix DictionaryEncode for zero-chunk ChunkedArray
  • ARROW-6860 - [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so
  • ARROW-6861 - [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder
  • ARROW-6869 - [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add “FinishDelta” method and “ResetFull” method
  • ARROW-6873 - [Python] Remove stale CColumn references
  • ARROW-6874 - [Python] Fix memory leak when converting to Pandas object data
  • ARROW-6876 - [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns
  • ARROW-6877 - [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions
  • ARROW-6878 - [Python] Fix creating array from list of dicts with bytes keys
  • ARROW-6882 - [C++] Ensure the DictionaryArray indices has no dictionary data
  • ARROW-6886 - [C++] Fix arrow::io nvcc compiler warnings
  • ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
  • ARROW-6903 - [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc
  • ARROW-6905 - [Gandiva][Crossbow] Use xcode9.4 for osx builds, do not build dataset, filesystem
  • ARROW-6910 - [C++][Python] Set jemalloc default configuration to release dirty pages more aggressively back to the OS (dirty_decay_ms and muzzy_decay_ms set to 0 by default), add C++ / Python option to configure this
  • ARROW-6922 - [Python] Compat with pandas for MultiIndex.levels.names
  • ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
  • ARROW-6938 - [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues
  • ARROW-6962 - [C++][CI] Stop compiling with -Weverything
  • ARROW-6977 - [C++] Disable jemalloc background_thread on macOS
  • ARROW-6983 - [C++] Fix ThreadedTaskGroup lifetime issue
  • ARROW-7422 - [Python] Improper CPU flags failing pyarrow install in ARM devices
  • ARROW-7423 - Pyarrow ARM install fails from source with no clear error
  • ARROW-9349 - [Python] parquet.read_table causes crashes on Windows Server 2016 w/ Xeon Processor

New Features and Improvements

  • ARROW-6610 - [C++] Add cmake option to disable filesystem layer
  • ARROW-6661 - [Java] Implement APIs like slice to enhance VectorSchemaRoot (#5470)
  • ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
  • ARROW-6852 - [C++] Fix build issue on memory-benchmark
  • ARROW-6927 - [C++] Add gRPC version check
  • ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

Apache Arrow 0.15.0 (2019-10-05)

New Features and Improvements

  • ARROW-453 - [C++] Filesystem implementation for Amazon S3
  • ARROW-517 - [C++] array comparison, uses D**2 space Myers
  • ARROW-750 - [Format][C++] Add LargeBinary and LargeString types
  • ARROW-1324 - [C++] Add support for bundled Boost with MSVC
  • ARROW-1561 - [C++] Kernel implementations for IsIn
  • ARROW-1566 - [C++] Implement non-materializing sort kernels
  • ARROW-1741 - [C++] Add DictionaryArray::CanCompareIndices
  • ARROW-1786 - [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification
  • ARROW-1789 - [Format] Consolidate specification documents and improve clarity for new implementation authors
  • ARROW-1875 - [Java] Write 64-bit ints as strings in integration test JSON files
  • ARROW-2006 - [C++] Add option to trim excess padding when writing IPC messages
  • ARROW-2431 - [Rust] Schema fidelity
  • ARROW-2769 - [Python] Deprecate and rename add_metadata methods
  • ARROW-2931 - [Crossbow] Windows builds are attempting to run linux and osx packaging tasks
  • ARROW-3032 - [C++] Clean up Numpy-related headers
  • ARROW-3204 - [R] Enable R package to be made available on CRAN
  • ARROW-3243 - [C++] Upgrade jemalloc to version 5
  • ARROW-3246 - [C++][Python][Parquet] Direct writing of DictionaryArray to Parquet columns, automatic decoding to Arrow
  • ARROW-3325 - [Python][FOLLOWUP] In Python 2.7, a class's __doc__ member is not writable (#5018)
  • ARROW-3325 - [Python][Parquet] Add “read_dictionary” argument to parquet.read_table, ParquetDataset to enable direct-to-DictionaryArray reads
  • ARROW-3531 - [Python] add Schema.field() method / deprecate field_by_name
  • ARROW-3538 - [Python] ability to override the automated assignment of uuid for filenames when writing datasets
  • ARROW-3579 - [Crossbow] Unintuitive error message when remote branch has not been pushed
  • ARROW-3643 - [Rust] optimize BooleanBufferBuilder::append_slice
  • ARROW-3710 - [Crossbow][Python] Run nightly tests against pandas master
  • ARROW-3772 - [C++][Parquet] Write Parquet dictionary indices directly to DictionaryBuilder rather than routing through dense form
  • ARROW-3777 - [C++] Add Slow input streams and slow filesystem
  • ARROW-3817 - [R] Extract methods for RecordBatch and Table
  • ARROW-3829 - [Python] add __arrow_array__ protocol to support third-party array classes in conversion to Arrow
  • ARROW-3943 - [R] Write vignette for R package
  • ARROW-4036 - [C++] Pluggable Status message, by exposing an abstract delegate class.
  • ARROW-4095 - [C++] Optimize DictionaryArray::Transpose() for trivial transpositions
  • ARROW-4111 - [Python] Create time types from Python sequences of integers
  • ARROW-4218 - [Rust][Parquet] Initial support for array reader.
  • ARROW-4220 - [Python] Add buffered IO benchmarks with simulated high latency, allow duck-typed files in input_stream/output_stream
  • ARROW-4365 - [Rust][Parquet] Implement arrow record reader.
  • ARROW-4398 - [C++][Python][Parquet] Improve BYTE_ARRAY PLAIN encoding write performance. Add BYTE_ARRAY write benchmarks
  • ARROW-4473 - [Website] Add instructions to do a test-deploy of Arrow website and fix bugs
  • ARROW-4507 - [Format] Create outline and introduction for new document.
  • ARROW-4508 - [Format] Copy content from Layout.rst to new document.
  • ARROW-4509 - [Format] Copy content from Metadata.rst to new document.
  • ARROW-4510 - [Format] copy content from IPC.rst to new document.
  • ARROW-4511 - [Format][Docs] Revamp Format documentation, consolidate columnar format docs into a more coherent single document. Add Versioning/Stability page
  • ARROW-4648 - [Doc] Add documentation about C++ file naming
  • ARROW-4648 - [C++] Use underscores in source file names
  • ARROW-4649 - [C++/CI/R] Add nightly job that tests the homebrew formula
  • ARROW-4752 - [Rust] Add explicit SIMD vectorization for the divide kernel
  • ARROW-4810 - [Format][C++] Add LargeList type
  • ARROW-4841 - [C++] Add arrowOptions.cmake with options used to build arrow
  • ARROW-4860 - [C++] Build AWS C++ SDK for Windows in conda-forge
  • ARROW-5134 - [R][CI] Run nightly tests against multiple R versions
  • ARROW-5211 - [Format] Missing documentation under `Dictionary encoding` section on MetaData page
  • ARROW-5216 - [CI] Add Appveyor badge to README
  • ARROW-5307 - [CI][GLib] Enable GTK-Doc
  • ARROW-5337 - [C++] Add RecordBatch::field method, possibly deprecate “column”
  • ARROW-5343 - [C++] Refactor dictionary unification to incremental interface, and use Buffer for transpose map allocations
  • ARROW-5344 - [C++] Use ArrayDataVisitor in dict-to-anything cast
  • ARROW-5351 - [Rust] Take kernel
  • ARROW-5358 - [Rust] Implement equality check for ArrayData and Array
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5439 - [Java] Utilize stream EOS in File format
  • ARROW-5444 - [Release][Website] After 0.14 release, update what is an “official” release
  • ARROW-5458 - [C++] Apache Arrow parallel CRC32c computation optimization
  • ARROW-5480 - [Python] Add unit test asserting specifically that pandas.Categorical roundtrips to Parquet format without special options
  • ARROW-5483 - [Java] add ValueVector constructors that take Field object
  • ARROW-5494 - [Python] Create FileSystem bindings
  • ARROW-5505 - [R] Normalize file and class names, stop masking base R functions, add vignette, improve documentation
  • ARROW-5527 - [C++] Uses Buffer/Builder in HashTable and MemoTable
  • ARROW-5558 - [C++] Support Array::View on arrays with non-zero offset
  • ARROW-5559 - [C++] Add an IpcOptions structure
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5579 - [Java] Shade flatbuffers
  • ARROW-5580 - [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
  • ARROW-5588 - [C++] Better support for building union arrays
  • ARROW-5594 - [C++] add UnionArrays support to Take/Filter kernels
  • ARROW-5610 - [Python] define extension types in Python
  • ARROW-5646 - [Crossbow][Documentation] Move the user guide to the Sphinx documentation
  • ARROW-5681 - [FlightRPC] Add Flight-specific error APIs
  • ARROW-5686 - [R] Review R Windows CI build
  • ARROW-5716 - [Developer] Improve merge PR script to attribute multiple authors
  • ARROW-5717 - [Python] Unify variable dictionaries when converting to pandas
  • ARROW-5719 - [Java] Support in-place vector sorting
  • ARROW-5722 - [Rust] Implement Debug for List/Struct/BinaryArray
  • ARROW-5734 - [Python] Dispatch to Table.from_arrays from pyarrow.table factory function
  • ARROW-5736 - [Format][C++] Support small bit-width indices in sparse tensor
  • ARROW-5741 - [JS] Make numeric vector from functions consistent with TypedArray.from
  • ARROW-5743 - [C++] Add cmake option and macros for enabling large memory tests
  • ARROW-5746 - [Website] Move website source out of apache/arrow
  • ARROW-5747 - [C++] Improve CSV header and column names options
  • ARROW-5758 - [C++][Gandiva][Java] Support casting decimals to varchar and vice versa
  • ARROW-5762 - [JS] Align Map type impl with the spec
  • ARROW-5777 - [C++] Add microbenchmark for some Decimal128 operations
  • ARROW-5778 - [Java] Extract the logic for vector data copying to the super classes
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5786 - [Release] Use arrow-jni profile to run “mvn release:perform”
  • ARROW-5788 - [Rust] Use both “path” and “version” for internal dependencies
  • ARROW-5789 - [C++] Minor fixes for warnings, remove unused ubsan.cc
  • ARROW-5792 - [Rust] Add TypeVisitor for parquet type.
  • ARROW-5798 - [Packaging][deb] Update doc architecture
  • ARROW-5800 - [R] Dockerize R Travis CI tests so they can be run anywhere via docker-compose
  • ARROW-5803 - [CI] Dockerize C++ with clang 7 Travis CI
  • ARROW-5812 - [Java] Refactor method name and param type in BaseIntVector
  • ARROW-5813 - [C++] Fix TensorEquals for different contiguous tensors
  • ARROW-5814 - [Java] Implement a <Object, int> HashMap for DictionaryEncoder
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5830 - [C++] Stop using memcmp in TensorEquals for tensors with float values
  • ARROW-5832 - [Java] Support search operations for vector data
  • ARROW-5833 - [C++] Factor out Status-enriching code
  • ARROW-5834 - [Java] Apply new hash map in DictionaryEncoder
  • ARROW-5835 - [Java] Support Dictionary Encoding for binary type
  • ARROW-5841 - [Website] Add 0.14.0 release note
  • ARROW-5842 - [Java] Revise the semantic of lastSet in ListVector
  • ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount
  • ARROW-5844 - [Java] Support comparison & sort for more numeric types
  • ARROW-5846 - [Java] Create Avro adapter module and add dependencies
  • ARROW-5853 - [Python] Expose boolean filter kernel on Array
  • ARROW-5861 - [Java] Initial implementation to convert Avro records with primitive types
  • ARROW-5862 - [Java] Provide dictionary builder
  • ARROW-5864 - [Python] Simplify Result class cython wrapper
  • ARROW-5865 - [Release] Helper script to rebase PRs on master
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5867 - [C++][Gandiva] add support for cast int to decimal
  • ARROW-5872 - [C++][Gandiva] Support mod(double, double) function in Gandiva
  • ARROW-5876 - [C++][Python] add basic auth flight proto message to C++ and Python
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5880 - [C++][Parquet] Use TypedBufferBuilder instead of ArrayBuilder in writer.cc
  • ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits
  • ARROW-5883 - [Java] Support dictionary encoding for List and Struct type
  • ARROW-5888 - [C++][Parquet][Python] Restore timezone metadata when original Arrow schema has been stored in Parquet metadata
  • ARROW-5891 - [C++][Gandiva] Remove duplicates in function registry
  • ARROW-5892 - [C++][Gandiva] Support function aliases
  • ARROW-5893 - [C++][Python][GLib][Ruby][MATLAB][R] Remove arrow::Column class
  • ARROW-5897 - [Java] Remove duplicated logic in MapVector
  • ARROW-5898 - [Java] Provide functionality to efficiently compute hash code for arbitrary memory segment
  • ARROW-5900 - [Java] Bounds check for decimal args.
  • ARROW-5901 - [Rust] Add equals to json arrays.
  • ARROW-5902 - [Java] Implement hash table and equals & hashCode API for dictionary encoding
  • ARROW-5903 - [Java] Optimise set methods in decimal vector
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5906 - [CI] Turn off ARROW_VERBOSE_THIRDPARTY_BUILD by default in Docker builds
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5909 - [Java] Optimize ByteFunctionHelpers equals & compare logic
  • ARROW-5911 - [Java] Make ListVector and MapVector create reader lazily
  • ARROW-5917 - [Java] Redesign the dictionary encoder
  • ARROW-5918 - [Java] Add get to BaseIntVector interface
  • ARROW-5919 - [R] Test R-in-conda as a nightly build
  • ARROW-5920 - [Java] Support sort & compare for all variable width vectors
  • ARROW-5924 - [Plasma] return a replica of GpuProcessHandle::ptr when create or get an object
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5943 - [GLib][Gandiva] Add support for function aliases
  • ARROW-5944 - [C++][Gandiva] Remove ‘div’ alias for ‘divide’
  • ARROW-5945 - [Rust][DataFusion] Table trait can now be used to build real queries
  • ARROW-5947 - [Rust][DataFusion] Remove serde crate dependency
  • ARROW-5948 - [Rust] [DataFusion] create_logical_plan should not call optimizer
  • ARROW-5955 - [Plasma] Support setting memory quotas per plasma client for better isolation
  • ARROW-5957 - [C++][Gandiva] Implement div function in Gandiva
  • ARROW-5958 - [Python] Link zlib statically in the wheels
  • ARROW-5961 - [R] Be able to run R-only tests even without C++ library
  • ARROW-5962 - [CI][Python] Remove manylinux1 builds from Travis CI
  • ARROW-5967 - [Java] DateUtility#timeZoneList is not correct
  • ARROW-5970 - [Java] Provide pointer to Arrow buffer
  • ARROW-5974 - [C++] Support reading concatenated compressed streams
  • ARROW-5975 - [C++][Gandiva] support castTIMESTAMP(date)
  • ARROW-5976 - [C++] RETURN_IF_ERROR(ctx) should be namespaced
  • ARROW-5977 - [C++][Python] Allow specifying which columns to include
  • ARROW-5979 - [FlightRPC] Expose opaque (de)serialization of protocol types
  • ARROW-5985 - [Developer] Do not suggest setting Fix Version for patch releases by default
  • ARROW-5986 - [Java] Code cleanup for dictionary encoding
  • ARROW-5988 - [Java] Avro adapter implement simple Record type
  • ARROW-5997 - [Java] Support dictionary encoding for Union type
  • ARROW-5998 - [Java] Open a document to track the API changes
  • ARROW-6000 - [Python] Add support for LargeString and LargeBinary types
  • ARROW-6008 - [Release] Stop parallel binary artifacts upload
  • ARROW-6009 - [JS] Ignore NPM errors in the javascript release script
  • ARROW-6013 - [Java] Support range searcher
  • ARROW-6017 - [FlightRPC] Enable creating Flight Locations for unknown schemes
  • ARROW-6020 - [Java] Refactor ByteFunctionHelper#hash with new added ArrowBufHasher
  • ARROW-6021 - [Java] Extract copyFrom and copyFromSafe methods to ValueVector interface
  • ARROW-6022 - [Java] Support equals API in ValueVector to compare two vectors equal
  • ARROW-6023 - [C++][Gandiva] Add functions in Gandiva
  • ARROW-6024 - [Java] Provide more hash algorithms
  • ARROW-6026 - [Doc] Add CONTRIBUTING.md
  • ARROW-6030 - [Java] Efficiently compute hash code for ArrowBufPointer
  • ARROW-6031 - [Java] Support iterating a vector by ArrowBufPointer
  • ARROW-6034 - [C++][Gandiva] Add string functions in Gandiva
  • ARROW-6035 - [Java] Avro adapter support convert nullable value
  • ARROW-6036 - [GLib] Add support for skip rows and column_names CSV read option
  • ARROW-6037 - [GLib] Add a missing version macro
  • ARROW-6039 - [GLib] Add garrow_array_filter()
  • ARROW-6041 - [Website] Blog post announcing R library availability on CRAN
  • ARROW-6042 - [C++][Parquet] Add Dictionary32Builder that always returns 32-bit dictionary indices
  • ARROW-6045 - [C++] Add benchmark for double and float encoding/decoding, as well as NaN encoding
  • ARROW-6048 - [C++] Add ChunkedArray::View method that dispatches to Array::View
  • ARROW-6049 - [C++] Support view from one dictionary type to another in Array::View
  • ARROW-6053 - [Python] Fix pyarrow's RecordBatchStreamReader::Open2 type signature
  • ARROW-6063 - [FlightRPC] implement half-closed semantics for DoPut
  • ARROW-6065 - [C++][Parquet] Clean up parquet/arrow/reader.cc, reduce code duplication, improve readability
  • ARROW-6069 - [Rust][Parquet] Add converter.
  • ARROW-6070 - [Java] Avoid creating new schema before IPC sending
  • ARROW-6077 - [C++][Parquet] Build Arrow “schema tree” from Parquet schema to help with nested data implementation
  • ARROW-6078 - [Java] Implement dictionary-encoded subfields for List type
  • ARROW-6079 - [Java] Implement/test UnionFixedSizeListWriter for FixedSizeListVector
  • ARROW-6080 - [Java] Support search operation for BaseRepeatedValueVector
  • ARROW-6083 - [Java] Refactor Jdbc adapter consume logic
  • ARROW-6084 - [Python] Support LargeList
  • ARROW-6085 - [Rust][DataFusion] Add traits for physical query plan
  • ARROW-6086 - [Rust][DataFusion] Add support for partitioned Parquet data sources
  • ARROW-6087 - [Rust] [DataFusion] Implement parallel execution for CSV scan
  • ARROW-6088 - [Rust][DataFusion] Projection execution plan
  • ARROW-6089 - [Rust][DataFusion] Implement physical plan for “selection” operator
  • ARROW-6090 - [Rust][DataFusion] Physical plan for HashAggregate
  • ARROW-6093 - [Java] reduce branches in algo for first match in VectorRangeSearcher
  • ARROW-6094 - [FlightRPC] Add Flight RPC method getFlightSchema
  • ARROW-6096 - [C++] conditionally use boost regex for gcc < 4.9
  • ARROW-6097 - [Java] Avro adapter implement unions type
  • ARROW-6100 - [Rust] Pin to specific nightly rust for reproducible/stable builds
  • ARROW-6101 - [Rust][DataFusion] Parallel execution of physical query plan
  • ARROW-6102 - [Testing] Add partitioned CSV file to arrow-testing repo
  • ARROW-6104 - [Rust][DataFusion] Remove use of bare trait objects
  • ARROW-6105 - [C++][Parquet][Python] Add test case showing dictionary-encoded subfields in nested type
  • ARROW-6113 - [Java] Support vector deduplicate function
  • ARROW-6115 - [Python] Support LargeBinary and LargeString in conversion to python
  • ARROW-6118 - [Java] Replace google Preconditions with Arrow Preconditions
  • ARROW-6121 - [Tools] Improve merge tool ergonomics
  • ARROW-6125 - [Python] Remove Python APIs deprecated in 0.14.x and prior
  • ARROW-6127 - [Website] Add favicons and meta tags
  • ARROW-6128 - [C++] Suppress a class-memaccess warning
  • ARROW-6130 - [Release] Use 0.15.0 as the next release
  • ARROW-6134 - [C++][Gandiva] Add concat function in Gandiva
  • ARROW-6137 - [C++][Gandiva] Use snprintf instead of stringstream in castVARCHAR(timestamp)
  • ARROW-6137 - [C++][Gandiva] Change output format of castVARCHAR(timestamp) in Gandiva
  • ARROW-6138 - [C++] Add a basic (single RecordBatch) implementation of Dataset
  • ARROW-6139 - [Documentation][R] Build R docs (pkgdown) site and add to arrow-site
  • ARROW-6141 - [C++] Enable memory-mapping a file region
  • ARROW-6142 - [R] Install instructions on linux could be clearer
  • ARROW-6143 - [Java] Unify the copyFrom and copyFromSafe methods for all vectors
  • ARROW-6144 - [C++][Gandiva] Implement random functions in Gandiva
  • ARROW-6155 - [Java] Extract a super interface for vectors whose elements reside in continuous memory segments
  • ARROW-6156 - [Java] Support compare semantics for ArrowBufPointer
  • ARROW-6161 - [C++][Dataset] Implements ParquetFragment
  • ARROW-6162 - [C++][Gandiva] Do not truncate string in castVARCHAR_utf8 if output length is zero
  • ARROW-6164 - [Docs][Format] Document project versioning schema and forward/backward compatibility policies
  • ARROW-6172 - [Java] Provide benchmarks to set IntVector with different methods
  • ARROW-6177 - [C++] Add Array::Validate()
  • ARROW-6180 - [C++][Parquet] Add RandomAccessFile::GetStream that returns InputStream that reads a file segment independent of the file's state, fix concurrent buffered Parquet column reads
  • ARROW-6181 - [R] Only allow R package to install without libarrow on linux
  • ARROW-6183 - [R] Document that you don't have to use tidyselect if you don't want to
  • ARROW-6185 - [Java] Provide hash table based dictionary builder
  • ARROW-6187 - [C++] Fallback to storage type when writing ExtensionType to Parquet
  • ARROW-6188 - [GLib] Add garrow_array_is_in()
  • ARROW-6192 - [GLib] Use the same SO version as C++
  • ARROW-6194 - [Java] Add non-static approach in DictionaryEncoder making it easy to extend and reuse
  • ARROW-6196 - [Ruby] Add support for building Arrow::TimeNNArray by .new
  • ARROW-6197 - [GLib] Add garrow_decimal128_rescale()
  • ARROW-6199 - [Java] Avro adapter avoid potential resource leak.
  • ARROW-6203 - [GLib] Add garrow_array_sort_to_indices()
  • ARROW-6204 - [GLib] Add garrow_array_is_in_chunked_array()
  • ARROW-6206 - [Java][Docs] Document environment variables/java properties
  • ARROW-6209 - [Java] Extract set null method to the base class for fixed width vectors
  • ARROW-6212 - [Java] Support vector rank operation
  • ARROW-6216 - [C++][Parquet] Expose codec compression level to user, add to Parquet writer properties
  • ARROW-6217 - [Website] Remove needless _site/ directory
  • ARROW-6219 - [Java] Add API for JDBC adapter that can convert less than the full result set at a time
  • ARROW-6220 - [Java] Add API to avro adapter to limit number of rows returned at a time.
  • ARROW-6225 - [Website] Update arrow-site/README and any other places to point website contributors in right direction
  • ARROW-6229 - [C++][Dataset] implement FileSystemBasedDataSource
  • ARROW-6230 - [R] Reading in Parquet files is 20x slower than reading fst files in R
  • ARROW-6231 - [C++] Allow generating CSV column names
  • ARROW-6232 - [C++] Rename Argsort kernel to SortToIndices
  • ARROW-6237 - [R] Allow compilation flags to be passed for R package with ARROW_R_CXXFLAGS
  • ARROW-6238 - [C++][Dataset] Implement SimpleDataSource, SimpleDataFragment and SimpleScanTask
  • ARROW-6240 - [Ruby] Arrow::Decimal128Array#get_value returns BigDecimal
  • ARROW-6242 - [C++][Dataset] Implement Dataset, Scanner and ScannerBuilder
  • ARROW-6243 - [C++][Dataset] Filter expressions
  • ARROW-6244 - [C++][Dataset] Add partition key to DataSource interface
  • ARROW-6246 - [Website] Add link to R documentation site
  • ARROW-6247 - [Java] Provide a common interface for float4 and float8 vectors
  • ARROW-6249 - [Java] Remove useless class ByteArrayWrapper
  • ARROW-6250 - [Java] Implement ApproxEqualsVisitor comparing approx for floating point
  • ARROW-6252 - [C++][Python] Add Array::Diff in C++ and Array.diff in Python to return diff as string
  • ARROW-6253 - [Python] Expose “enable_buffered_stream” option from parquet::ReaderProperties in pyarrow.parquet.read_table
  • ARROW-6258 - [R] Add macOS build scripts
  • ARROW-6260 - [Website] Use deploy key on Travis to build and push to asf-site
  • ARROW-6262 - [Developer] Show JIRA issue before merging
  • ARROW-6264 - [Java] There is no need to consider byte order in ArrowBufHasher
  • ARROW-6265 - [Java] Avro adapter implement Array/Map/Fixed type
  • ARROW-6267 - [Ruby] Add Arrow::Time for Arrow::Time{32,64}DataType value
  • ARROW-6271 - [Rust][DataFusion] Add example for running SQL against Parquet
  • ARROW-6272 - [Rust][DataFusion] Add register_parquet convenience method to ExecutionContext
  • ARROW-6278 - [R] Read parquet files from raw vector
  • ARROW-6279 - [Python] Add Table.slice, __getitem__ support to match RecordBatch, Array, others
  • ARROW-6284 - [C++] Allow references in std::tuple when converting tuple to arrow array
  • ARROW-6287 - [Rust][DataFusion] TableProvider.scan() returns thread-safe BatchIterator
  • ARROW-6288 - [Java] Implement TypeEqualsVisitor comparing vector type equals considering names and metadata
  • ARROW-6289 - [Java] Add empty() in UnionVector to create instance
  • ARROW-6292 - [C++] Add option to use the mimalloc allocator
  • ARROW-6294 - [C++] Use hyphen for plasma-store-server executable
  • ARROW-6295 - [Rust][DataFusion] ExecutionError Cannot compare Float32 with Float64
  • ARROW-6296 - [Java] Cleanup JDBC interfaces and eliminate one memcopy for binary/varchar fields
  • ARROW-6297 - [Java] Compare ArrowBufPointers by unsigned integers
  • ARROW-6300 - [C++] Add Abort() method to streams
  • ARROW-6303 - [Rust] Add a feature to disable SIMD
  • ARROW-6304 - [Java][Doc] Add a description to each module
  • ARROW-6306 - [Java] Support stable sort by stable comparators
  • ARROW-6310 - [C++] Write 64-bit integers as strings in JSON integration test files
  • ARROW-6311 - [Java] Make ApproxEqualsVisitor accept DiffFunction to make it more flexible
  • ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files.
  • ARROW-6314 - [C#] Implement IPC message format alignment changes, provide backwards compatibility and “legacy” option to emit old message format
  • ARROW-6314 - [C++] Implement IPC message format alignment changes, provide backwards compatibility and “legacy” option to emit old message format
  • ARROW-6315 - [Java] Make change to ensure flatbuffer reads are aligned
  • ARROW-6316 - [Go] implement new ARROW format with 32b-aligned buffers
  • ARROW-6317 - [JS] Implement IPC message format alignment changes
  • ARROW-6318 - [Integration] Run tests against pregenerated files
  • ARROW-6319 - [C++] Move the core of NumericTensor::Value() to Tensor::Value()
  • ARROW-6326 - [C++] Nullable fields when converting std::tuple to Table
  • ARROW-6328 - [Developer][crossbow] Click.option-s should have help text
  • ARROW-6329 - [Format] Add a padding for Flatbuffer alignment, use 8-byte EOS
  • ARROW-6331 - [Java] Incorporate ErrorProne into the java build
  • ARROW-6334 - [Java] Improve the dictionary builder API to return the position of the value in the dictionary
  • ARROW-6335 - [Java] Improve the performance of DictionaryHashTable
  • ARROW-6336 - [Python] Add notes to pyarrow.serialize/deserialize to clarify that these functions do not read or write the standard IPC protocol
  • ARROW-6337 - [R] Changed as_tibble to as_dataframe in the R package
  • ARROW-6338 - [R] Type function names don't match type names
  • ARROW-6342 - [Python] Add pyarrow.record_batch factory function with same basic API / semantics as pyarrow.table
  • ARROW-6346 - [GLib] Add garrow_array_view()
  • ARROW-6347 - [GLib] Add garrow_array_diff()
  • ARROW-6350 - [Ruby] Remove Arrow::Struct and use Hash instead
  • ARROW-6351 - [Ruby] Improve Arrow#values performance
  • ARROW-6353 - [Python][C++] Expose compression_level option to parquet.write_table
  • ARROW-6355 - [Java] Make range equal visitor reusable
  • ARROW-6356 - [Java] Avro adapter implement Enum type and nested Record
  • ARROW-6357 - [C++] Issue S3 file writes in the background by default
  • ARROW-6358 - [C++] Add FileSystem::DeleteDirContents
  • ARROW-6360 - [R] Update support for compression
  • ARROW-6362 - [C++] Allow customizing S3 credentials provider
  • ARROW-6365 - [R] Should be able to coerce numeric to integer with schema
  • ARROW-6366 - [Java] Make field vectors final explicitly
  • ARROW-6368 - [C++][Dataset] Add interface for “projecting” RecordBatch from one schema to another, inserting null values where needed
  • ARROW-6373 - [C++] Make FixedWidthBinaryBuilder consistent with other fixed width builders in zeroing memory when appending null batches
  • ARROW-6375 - [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
  • ARROW-6379 - [C++] Write no IPC buffer metadata for NullType
  • ARROW-6381 - [C++] BufferOutputStream::Write does extra work that slows down small writes
  • ARROW-6383 - [Java] Report outstanding child allocators on close
  • ARROW-6384 - [C++] Bump dependency versions
  • ARROW-6385 - [C++] Use xxh3 instead of custom hashing code for non-tiny strings
  • ARROW-6391 - [Python][Flight] Add built-in methods on FlightServerBase to start server and wait for it to be available
  • ARROW-6397 - [C++][CI] Generate minio server connect string
  • ARROW-6401 - [Java] Implement dictionary-encoded subfields for Struct type
  • ARROW-6402 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6403 - [Python] Expose FileReader::ReadRowGroups() to Python
  • ARROW-6408 - [Rust] use “if cfg!” pattern
  • ARROW-6413 - [R] Support autogenerating column names
  • ARROW-6415 - [R] Remove usage of R CMD config CXXCPP
  • ARROW-6416 - [Python] Improve API & documentation regarding chunksizes
  • ARROW-6417 - [C++][Parquet] Miscellaneous optimizations yielding slightly better Parquet binary read performance
  • ARROW-6419 - [Website] Blog post about Parquet dictionary performance work coming in 0.15.x release
  • ARROW-6422 - [Gandiva] Fix double-conversion linker issue
  • ARROW-6426 - [FlightRPC][C++][Java] Expose gRPC configuration knobs
  • ARROW-6427 - [GLib] Add support for column names autogeneration CSV read option
  • ARROW-6438 - [R] Add bindings for filesystem API
  • ARROW-6447 - [C++] Allow rest of arrow_objlib to build in parallel while memory_pool.cc is waiting on jemalloc_ep
  • ARROW-6450 - [C++] Use 2x reallocation strategy in BufferBuilder instead of 1.5x
  • ARROW-6451 - [Format] Add clarifications to Columnar.rst about the contents of “null” slots in Varbinary or List arrays
  • ARROW-6453 - [C++] More informative error messages with S3
  • ARROW-6454 - [LICENSE] Add LLVM's license due to static linkage
  • ARROW-6458 - [Java] Remove value boxing/unboxing for ApproxEqualsVisitor
  • ARROW-6460 - [Java] Add benchmark and large fake data UT for avro adapter
  • ARROW-6462 - [C++] Fix build error on CentOS 6 x86_64 with bundled double-conversion
  • ARROW-6465 - [Python] Improvement to Windows build instructions
  • ARROW-6474 - [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable
  • ARROW-6475 - [C++] Don't try to dictionary encode dictionary arrays
  • ARROW-6477 - [Packaging][Crossbow] Use Azure Pipelines to build linux packages
  • ARROW-6480 - [Crossbow] Summary report e-mailer with polling logic
  • ARROW-6484 - [Java] Enable create indexType for DictionaryEncoding according to dictionary value count
  • ARROW-6487 - [Rust][DataFusion] Introduce common test module
  • ARROW-6489 - [Developer][Documentation] Fix merge script and readme
  • ARROW-6490 - [Java][Memory] Log error for leak in allocator close
  • ARROW-6491 - [Java][Hotfix] fix master fail caused by ErrorProne
  • ARROW-6494 - [C++][Dataset] Implement basic PartitionScheme
  • ARROW-6504 - [Python][Packaging] Add mimalloc to conda packages for better performance
  • ARROW-6505 - [Website] Add new committers
  • ARROW-6518 - [Packaging][Python] Flight failing in OSX Python wheel builds
  • ARROW-6519 - [Java] Use IPC continuation prefix as part of 8-byte EOS
  • ARROW-6524 - [Developer][Packaging] Nightly build report's subject should contain Arrow
  • ARROW-6525 - [C++] Avoid aborting in CloseFromDestructor()
  • ARROW-6526 - [C++] Poison data in debug mode
  • ARROW-6527 - [C++] Add OutputStream::Write(Buffer)
  • ARROW-6531 - [Python] Add detach() method to buffered streams
  • ARROW-6532 - [R] write_parquet() uses writer properties (general and arrow specific)
  • ARROW-6533 - [R] Compression codec should take a “level”
  • ARROW-6534 - [Java] Fix typos and spelling
  • ARROW-6539 - [R] Provide mechanism to write out old format
  • ARROW-6540 - [R] Add Validate() methods
  • ARROW-6541 - [Format][C++] Update Columnar.rst for two-part EOS, update C++ implementation
  • ARROW-6542 - [R] Add View() method to array types
  • ARROW-6544 - [R] Documentation/polishing for 0.15 release
  • ARROW-6545 - [Go] update IPC writer to use two-part EOS
  • ARROW-6546 - [C++] Add missing FlatBuffers source dependency
  • ARROW-6549 - [C++] Switch to jemalloc 5.2.x
  • ARROW-6556 - [Python] Fix warning for pandas SparseDataFrame removal
  • ARROW-6556 - [Python] Handle future removal of pandas SparseDataFrame
  • ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas. Add mechanism to preserve “column names” from RecordBatch, Table as Series.name
  • ARROW-6558 - [C++] Refactor Iterator to type erased handle
  • ARROW-6559 - [Developer][C++] Add option to pass ARROW_PACKAGE_PREFIX when using ‘archery benchmark’
  • ARROW-6563 - [Rust][DataFusion] MergeExec
  • ARROW-6569 - [Website] Add support for auto deployment by GitHub Actions
  • ARROW-6570 - [Python] Use Arrow's allocators for creating NumPy array instead of leaving it to NumPy
  • ARROW-6580 - [Java] Support comparison for unsigned integers
  • ARROW-6584 - [Python][Wheel] Bundle zlib again with the windows wheels
  • ARROW-6588 - [C++] Suppress class-memaccess warning with g++ 9.2.1
  • ARROW-6589 - [C++] Error propagation, tests for /MakeArray(OfNulls|FromScalar)/
  • ARROW-6590 - [C++] Do not require ARROW_JSON to build ARROW_IPC when unit tests are off
  • ARROW-6591 - [R] Ignore .Rhistory files in source control
  • ARROW-6599 - [Rust][DataFusion] Add aggregate traits and SUM implementation to physical query plan
  • ARROW-6601 - [Java] Improve JDBC adapter performance & add benchmark
  • ARROW-6605 - [C++][Filesystem] Add recursion depth control to fs::Selector
  • ARROW-6606 - [C++] Add PathTree tree structure
  • ARROW-6609 - [C++] Add Dockerfile for minimal C++ build
  • ARROW-6613 - [C++] Remove dependency on boost::filesystem
  • ARROW-6614 - [C++][Dataset] Implement FileSystemDataSourceDiscovery
  • ARROW-6616 - [Website] Release announcement blog post for 0.15
  • ARROW-6621 - [Rust][DataFusion] Run DataFusion examples in CI
  • ARROW-6629 - [Doc][C++] Add filesystem docs
  • ARROW-6630 - [Doc] Document C++ file formats
  • ARROW-6644 - [JS] Amend NullType IPC protocol to append no buffers
  • ARROW-6647 - [C++] Stop using member initializer for shared_ptr
  • ARROW-6648 - [Go] Expose the bitutil package
  • ARROW-6649 - [R] print methods for Array, ChunkedArray, Table, RecordBatch
  • ARROW-6653 - [Developer] Add support for auto JIRA link on pull request
  • ARROW-6655 - [Python] Filesystem bindings for S3
  • ARROW-6664 - [C++] Add CMake option to build without SSE4.2 instructions
  • ARROW-6665 - [Rust][DataFusion] Implement physical expression for numeric literal types
  • ARROW-6667 - [Python] remove cyclical object references in pyarrow.parquet
  • ARROW-6668 - [Rust][DataFusion] Implement CAST expression
  • ARROW-6669 - [Rust][DataFusion] Implement binary expression for physical plan
  • ARROW-6675 - [JS] Add scanReverse function to dataFrame and filteredDataframe
  • ARROW-6683 - [Python] Test for fastparquet <-> pyarrow cross-compatibility
  • ARROW-6725 - [CI] Disable 3rdparty fuzzit nightly builds
  • ARROW-6735 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6752 - [Go] implement Stringer for Null array
  • ARROW-6755 - [Release] Improvements to Windows release verification script
  • ARROW-6771 - [Packaging][Python] Missing pytest dependency from conda and wheel builds
  • PARQUET-1468 - [C++] Clean up ColumnReader/internal::RecordReader code duplication

Bug Fixes

  • ARROW-1184 - [Java] Dictionary.equals is not working correctly
  • ARROW-2041 - [Python] pyarrow.serialize has high overhead for list of NumPy arrays
  • ARROW-2248 - [Python] Nightly or on-demand HDFS test builds
  • ARROW-2317 - [Python] Fix C linkage warning with Cython
  • ARROW-2490 - [C++] Normalize input stream concurrency
  • ARROW-3176 - [Python] Overflow in Date32 column conversion to pandas
  • ARROW-3203 - [C++] Build error on Debian Buster
  • ARROW-3651 - [Python] Handle ‘datetime’ logical type when reconstructing pandas columns from custom metadata
  • ARROW-3652 - [Python][Parquet] Add unit test exhibiting that pandas.CategoricalIndex survives roundtrip to Parquet format
  • ARROW-3762 - [Python] Add large_memory unit test exercising BYTE_ARRAY overflow edge cases from ARROW-3762
  • ARROW-3933 - [C++][Parquet] Handle non-nullable struct children when reading Parquet file, better error messages
  • ARROW-4187 - [C++] Enable file-benchmark on Windows
  • ARROW-4746 - [C++/Python] PyDateTime_Date wrongly cast to PyDateTime_DateTime
  • ARROW-4836 - [C++] Support Tell() on compressed streams
  • ARROW-4848 - [C++] Static libparquet not compiled with -DARROW_STATIC on Windows
  • ARROW-4880 - [Python] Rehabilitate ASV benchmark build scripts
  • ARROW-4883 - [Python] read_csv() returns garbage if given file object in text mode
  • ARROW-5028 - [Python] Avoid malformed ListArray types caused by reaching StringBuilder capacity when converting from Python sequence
  • ARROW-5072 - [Python] write_table fails silently on S3 errors
  • ARROW-5085 - [C++][Parquet][Python] Do not allow reading to dictionary type unless we have implemented support for it
  • ARROW-5086 - [Python][Parquet] Opt in to file memory-mapping when reading Parquet files rather than opting out
  • ARROW-5089 - [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size
  • ARROW-5103 - [Python] Segfault when using chunked_array.to_pandas on arrays of different types (edge case)
  • ARROW-5125 - [Python] Round-trip extreme dates on windows
  • ARROW-5161 - [Python] Cannot convert struct type from Pandas object column
  • ARROW-5220 - [Python] Follow-up to improve error messages and docs for from_pandas schema argument
  • ARROW-5220 - [Python] Specified schema in from_pandas also includes the index
  • ARROW-5292 - [C++] Work around symbol visibility issues so building static libraries is not necessary when building unit tests on WIN32 platform
  • ARROW-5300 - [C++] Remove the ARROW_NO_DEFAULT_MEMORY_POOL macro
  • ARROW-5374 - [Python][C++] Improve ipc.read_record_batch docstring, fix IPC message type error messages generated in C++
  • ARROW-5414 - [C++] default to release build on windows
  • ARROW-5450 - [Python] Always return datetime.datetime in TimestampValue.as_py for units other than nanoseconds
  • ARROW-5471 - [C++][Gandiva] Array offset is ignored in Gandiva projector
  • ARROW-5522 - [Packaging][Documentation] Comments out of date in python/manylinux1/build_arrow.sh
  • ARROW-5525 - [C++] Add Continuous Fuzzing Integration setup with Fuzzit
  • ARROW-5560 - [C++][Plasma] Cannot create Plasma object after OutOfMemory error
  • ARROW-5562 - [C++][Parquet] Write negative zero or small epsilons as positive zero when computing Parquet statistics
  • ARROW-5630 - [C++][Parquet] Fix RecordReader accounting for repeated fields with non-nullable leaf
  • ARROW-5638 - [C++][CMake] Fixes for xcode project builds
  • ARROW-5651 - [Python] Fix Incorrect conversion from strided Numpy array
  • ARROW-5682 - [Python] Raise error when trying to convert non-string dtype to string
  • ARROW-5731 - [CI] Switch turbodbc branch for integration testing
  • ARROW-5753 - [Rust] Fix test failure in CI code coverage
  • ARROW-5772 - [GLib][Plasma][CUDA] Fix a bug where data can't be retrieved
  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5776 - [Gandiva][Crossbow] Use commit id instead of fetch head.
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5817 - [Python] Use pytest mark for flight tests
  • ARROW-5823 - [Rust] CI scripts miss --all-targets cargo argument
  • ARROW-5824 - [Gandiva][C++] Fix decimal null literals.
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5848 - [C++] SO versioning schema after release 1.0.0
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5860 - [Java][Vector] Fix decimal utils to handle negative values.
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5870 - [C++][Docs] Refine source build instructions, do not tell people to install flex/bison if they don't need them
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5884 - [Java] Fix the get method of StructVector
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5894 - [Gandiva][C++] Added a linker script for libgandiva.so to restrict libstdc++ symbols.
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5910 - [Python] Support non-seekable streams in ipc.read_tensor, ipc.read_message, add Message.serialize_to method
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • ARROW-5923 - [C++][Parquet] Reword comment about UBSan and Int96 in writer.cc
  • ARROW-5925 - [Gandiva][C++] fix rounding in decimal to int cast
  • ARROW-5930 - [Python] Make Flight server init phase explicit
  • ARROW-5930 - [FlightRPC][Python] Disable Flight test causing segfault in Travis
  • ARROW-5935 - [C++] ArrayBuilder::type() should be kept accurate
  • ARROW-5946 - [Rust][DataFusion] Fix bug in projection push down logic
  • ARROW-5952 - [Python] fix conversion of chunked dictionary array with 0 chunks
  • ARROW-5959 - [CI] report branch+commit to fuzzit
  • ARROW-5960 - [C++] Fix Boost dependencies link order
  • ARROW-5963 - [R] R Appveyor job does not test changes in the C++ library
  • ARROW-5964 - [C++][Gandiva] Remove overflow check after rounding in BasicDecimal128::FromDouble
  • ARROW-5965 - [Python] Regression: segfault when reading hive table with v0.14
  • ARROW-5966 - [Python] Also use ChunkedStringBuilder when converting NumPy string types to Arrow StringType
  • ARROW-5968 - [Java] Remove duplicate Preconditions check in JDBC adapter
  • ARROW-5969 - [R] Fix R lint Failures
  • ARROW-5973 - [Java] Variable width vectors' get methods should return null when the underlying data is null
  • ARROW-5978 - [FlightRPC][Java] Properly release buffers in Flight integration client
  • ARROW-5989 - [C++] Accommodate openjdk-8 path search prefix
  • ARROW-5990 - [Python] add bounds check to RowGroupMetaData.column
  • ARROW-5992 - [C++][Python] Support String->Binary in Array::View. Add Python bindings for Array::View
  • ARROW-5993 - [Python] Reading a dictionary column from Parquet results in disproportionate memory usage
  • ARROW-5996 - [Java] Avoid potential resource leak in flight service
  • ARROW-5999 - [C++] decouple Iterator from ARROW_DATASETS
  • ARROW-6002 - [C++][Gandiva] test casting int64 to decimal
  • ARROW-6004 - [C++] Turn non-ignored empty CSV lines into null/empty values
  • ARROW-6005 - [C++] extend GetRecordBatchReader test to cover reading a single row group
  • ARROW-6006 - [C++] Do not fail to read empty IPC stream with schema having dictionary types
  • ARROW-6012 - [C++] Fall back on known Apache mirror for Thrift downloads
  • ARROW-6015 - [Python] Add note to python/README.md about installing Visual C++ Redistributable on Windows when using pip
  • ARROW-6016 - [Python] Fix get_library_dirs() when Arrow installed as a system package
  • ARROW-6029 - [R] Improve R docs on how to fix library version mismatch
  • ARROW-6032 - [C++] Ensure 64-bit pointer alignment in CountSetBits()
  • ARROW-6038 - [C++] Faster type equality
  • ARROW-6040 - [Java] Dictionary entries are required in IPC streams even when empty
  • ARROW-6046 - [C++] Do not write excess varbinary offsets in IPC messages from sliced BinaryArray
  • ARROW-6047 - [Rust] Rust nightly 1.38.0 builds failing
  • ARROW-6050 - [Java] Update out-of-date java/flight/README.md
  • ARROW-6054 - [Python] Fix the type erasure bug when serializing structured type ndarray.
  • ARROW-6058 - [C++][Parquet] Validate whole ColumnChunk raw data reads so that underlying filesystem issues are caught earlier
  • ARROW-6059 - [Python] Regression memory issue when calling pandas.read_parquet
  • ARROW-6060 - [C++] ChunkedBinaryBuilder should only grow when necessary, address runaway memory use in Parquet binary column read
  • ARROW-6061 - [C++] Add ARROW_JSON feature flag for configuring arrow builds without RapidJSON
  • ARROW-6066 - [Website] Fix blog post author header
  • ARROW-6067 - [Python] Fix failing large memory Python tests
  • ARROW-6068 - [C++] Allow passing Field instances to StructArray::Make
  • ARROW-6073 - [C++] Reset Decimal128Builder in Finish().
  • ARROW-6082 - [Python] check type of the index_type passed to pa.dictionary()
  • ARROW-6092 - [Python] Fix C++ arrow-python-test on Python 2.7
  • ARROW-6095 - [C++] Fix unit test build when only building static libraries, add cpp-static-only to tests.yml
  • ARROW-6108 - [C++] Workaround Windows CRT crash on invalid locale
  • ARROW-6116 - [C++][Gandiva] Fix bug in TimedTestFilterAdd2
  • ARROW-6117 - [Java] Fix the set method of FixedSizeBinaryVector
  • ARROW-6119 - [Python] PyArrow wheel import fails on Windows Python 3.7
  • ARROW-6120 - [C++] Forbid use of <iostream> in public header files
  • ARROW-6126 - [C++] Return error when an IPC stream terminates in the middle of receiving dictionaries
  • ARROW-6132 - [Python] validate result in ListArray.from_arrays
  • ARROW-6135 - [C++] Make KeyValueMetadata::Equals() order-insensitive
  • ARROW-6136 - [FlightRPC][Java] don't double-close response stream
  • ARROW-6145 - [Java] UnionVector created by MinorType#getNewVector could not keep field type info properly
  • ARROW-6148 - [Packaging] Improve aarch64 support
  • ARROW-6152 - [C++][Parquet] Add parquet::ColumnWriter::WriteArrow method, refactor
  • ARROW-6153 - [R] Address parquet deprecation warning
  • ARROW-6158 - [C++/Python] Validate child array types with type fields of StructArray
  • ARROW-6159 - [C++] Properly indent first line of PrettyPrint with Schema
  • ARROW-6160 - [Java] AbstractStructVector#getPrimitiveVectors fails to work with complex child vectors
  • ARROW-6166 - [Go] Fix index out of bounds panic when slicing a slice
  • ARROW-6167 - [R] macOS binary R packages on CRAN don't have arrow_available
  • ARROW-6168 - [C++] IWYU docker-compose job is broken
  • ARROW-6170 - [R] Faster docker-compose build
  • ARROW-6171 - [R][CI] Fix R library search path
  • ARROW-6174 - [C++] Validate chunks in ChunkedArray::Validate. Fix validation of sliced ListArray, values null checks
  • ARROW-6175 - [Java] Fix MapVector#getMinorType and extend AbstractContainerVector addOrGet complex vector API
  • ARROW-6178 - [Developer] Keep prompting for authors in merge script for multi-author PRs if given bad input
  • ARROW-6182 - [R] Add note to README about r-arrow conda installation
  • ARROW-6186 - [Packaging][deb] Add missing headers to libplasma-dev for Ubuntu 16.04
  • ARROW-6190 - [C++] Define and declare functions regardless of NDEBUG
  • ARROW-6193 - [GLib] Add missing require in test
  • ARROW-6200 - [Java] Method getBufferSizeFor in BaseRepeatedValueVector/ListVector not correct
  • ARROW-6202 - [Java] Add unit test for large resultsets
  • ARROW-6205 - [C++] ARROW_DEPRECATED warning when including io/interfaces.h
  • ARROW-6208 - [Java] Correct byte order before comparing in ByteFunctionHelpers
  • ARROW-6210 - [Java] remove equals API from ValueVector
  • ARROW-6211 - [Java] Remove dependency on RangeEqualsVisitor from ValueVector interface
  • ARROW-6214 - [R] Add R sanitizer docker image
  • ARROW-6215 - [Java] Fix case when ZeroVector is compared against other vector types
  • ARROW-6218 - [Java] Add UINT type test in integration to avoid potential overflow
  • ARROW-6223 - [C++] Configuration error with Anaconda Python 3.7.4
  • ARROW-6224 - [Python] fix deprecated usage of .data (previously Column.data)
  • ARROW-6227 - [Python] Apply from_pandas option in pyarrow.array consistently across types
  • ARROW-6234 - [Java] ListVector hashCode() is not correct
  • ARROW-6241 - [Java] Failures on master
  • ARROW-6255 - [Rust] [Parquet] Cannot use any published parquet crate due to parquet-format breaking change
  • ARROW-6259 - [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings
  • ARROW-6263 - [Python] Use RecordBatch::Validate in RecordBatch.from_arrays. Normalize API vs. Table.from_arrays. Add record_batch factory function
  • ARROW-6266 - [Java] Resolve the ambiguous method overload in RangeEqualsVisitor
  • ARROW-6268 - [Java] Empty buffers to have a valid address.
  • ARROW-6269 - [C++] check decimal precision in IPC code
  • ARROW-6270 - [C++] check buffer_index bounds in IpcComponentSource.GetBuffer
  • ARROW-6290 - [Rust][DataFusion] Fix bug in type coercion rule
  • ARROW-6291 - [C++] Do not override ARROW_PARQUET if other PARQUET options are enabled
  • ARROW-6293 - [Rust] datafusion 0.15.0-SNAPSHOT error
  • ARROW-6301 - [C++][Python] Prevent ExtensionType-related race condition in Python process teardown by exposing shared_ptr to global “ExtensionTypeRegistry”
  • ARROW-6302 - [C++][Parquet][Python] Restore ordered type property when reading dictionary type with serialized Arrow schema
  • ARROW-6309 - [C++][Parquet] Stop needless static linking
  • ARROW-6323 - [R] Expand file paths when passing to readers
  • ARROW-6325 - [Python] fix conversion of strided boolean arrays
  • ARROW-6330 - [C++] Include missing API headers
  • ARROW-6332 - [Java][C++][Gandiva] Misc fixes for varwidth vector allocation.
  • ARROW-6339 - [Python] Raise ValueError when accessing unset statistics
  • ARROW-6343 - [Java][Vector] Fix allocation helper.
  • ARROW-6344 - [C++][Gandiva] Handle multibyte characters in substring function
  • ARROW-6345 - [C++][Python] “ordered” flag seemingly not taken into account when comparing DictionaryType values for equality
  • ARROW-6348 - [R] arrow::read_csv_arrow namespace error when package not loaded
  • ARROW-6354 - [C++] Fix failing build when ARROW_PARQUET=OFF
  • ARROW-6363 - [R] segfault in Table__from_dots with unexpected schema
  • ARROW-6364 - [R] Handling unexpected input to time64() et al
  • ARROW-6369 - [C++] Handle Array.to_pandas case for type=list
  • ARROW-6371 - [Doc] Row to columnar conversion example mentions arrow::Column in comments
  • ARROW-6372 - [Rust][DataFusion] Casting from unsigned to signed integers not supported
  • ARROW-6376 - [Developer] Use target ref of PR when merging instead of hard-coding “master”
  • ARROW-6387 - [Archery] Errors with make
  • ARROW-6392 - [FlightRPC][Python] check type of list_flights result
  • ARROW-6395 - [Python] Bug when using bool arrays with stride greater than 1
  • ARROW-6406 - [C++] Fix jemalloc URL for offline build in thirdparty/versions.txt
  • ARROW-6411 - [Python][Parquet] Improve performance of DictEncoder::PutIndices
  • ARROW-6412 - [C++] Improve TCP port allocation in tests
  • ARROW-6418 - [C++][Plasma] Remove cmake project directive for plasma
  • ARROW-6423 - [C++] Fix crash when trying to instantiate Snappy CompressedOutputStream
  • ARROW-6424 - [C++] Fix IPC fuzzing test name
  • ARROW-6425 - [C++] ValidateArray fail for slice of list array
  • ARROW-6428 - [CI][Crossbow] Nightly turbodbc job fails
  • ARROW-6430 - [CI][Crossbow] Nightly R docker job fails
  • ARROW-6431 - [Python] Test suite fails without pandas installed
  • ARROW-6432 - [CI][Crossbow] Remove alpine nightly crossbow jobs
  • ARROW-6433 - [Java][CI] Fix java docker image
  • ARROW-6434 - [CI][Crossbow] Nightly HDFS integration job fails
  • ARROW-6435 - [Python] Use pandas null coding consistently on List and Struct types
  • ARROW-6440 - [Packaging][deb] Follow plasma-store-server name change
  • ARROW-6441 - [Packaging][RPM] Follow plasma-store-server name change
  • ARROW-6442 - [CI][Crossbow] Nightly gandiva jar osx build fails
  • ARROW-6443 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-6444 - [CI][Crossbow] Nightly conda Windows builds fail (time out)
  • ARROW-6446 - [OSX][Python][Wheel] Turn off ORC feature in the wheel building scripts
  • ARROW-6449 - [R] io “tell()” methods are inconsistently named and untested
  • ARROW-6457 - [C++] Always set CMAKE_BUILD_TYPE if it is not defined
  • ARROW-6461 - [Java] Prevent EchoServer from closing the client socket after writing
  • ARROW-6472 - [Java] ValueVector#accept may have a potential cast exception
  • ARROW-6476 - [Java][CI] Fix java docker build script
  • ARROW-6478 - [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues
  • ARROW-6481 - [C++] Avoid copying large ConvertOptions
  • ARROW-6488 - [Python] fix equality with pyarrow.NULL to return NULL
  • ARROW-6492 - [Python] Handle pandas_metadata created by fastparquet with missing field_name
  • ARROW-6502 - [GLib][CI] Pin gobject-introspection gem to 3.3.7
  • ARROW-6506 - [C++] Fix validation of ExtensionArray with struct storage type
  • ARROW-6509 - [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure
  • ARROW-6509 - [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily
  • ARROW-6520 - [Python] More consistent handling of specified schema when creating Table
  • ARROW-6522 - [Python] Fix failing pandas tests on older pandas / older python
  • ARROW-6530 - [CI][Crossbow][R] Nightly R job doesn't install all dependencies
  • ARROW-6550 - [C++] Filter expressions PR failing manylinux package builds
  • ARROW-6551 - [Python] Dask Parquet integration test failure
  • ARROW-6552 - [C++] boost::optional in STL test fails compiling in gcc 4.8.2
  • ARROW-6560 - [Python] Fix nopandas integration tests
  • ARROW-6561 - [Python] Fix python tests to pass on pandas master
  • ARROW-6562 - [GLib] Fix returning wrong sliced data of GArrowBuffer
  • ARROW-6564 - [Python] Do not require pandas for invoking Array.array
  • ARROW-6565 - [Rust][DataFusion] Fix intermittent test failure
  • ARROW-6568 - [C++] ChunkedArray constructor needs type when chunks is empty
  • ARROW-6572 - [C++] Fix Parquet decoding returning uninitialized data
  • ARROW-6573 - [Python] Add test case to probe additional behavior in schema-data mismatch in Table.from_pydict
  • ARROW-6576 - [R] Fix sparklyr integration tests
  • ARROW-6586 - [Python][Packaging] Windows wheel builds failing with “DLL load failure”
  • ARROW-6597 - [Python] Sanitize Python datetime handling
  • ARROW-6618 - [Python] Fix read_message() segfault on end of stream
  • ARROW-6620 - [Python][CI] pandas-master build failing due to removal of “to_sparse” method
  • ARROW-6622 - [R] Normalize paths for filesystem API on Windows
  • ARROW-6623 - [CI][Python] Dask docker integration test broken perhaps by statistics-related change
  • ARROW-6639 - [Packaging][RPM] Add support for CentOS 7 on aarch64
  • ARROW-6640 - [C++] Do not reset buffer_pos_ in BufferedInputStream/OutputStream when enlarging buffer
  • ARROW-6641 - [C++] Remove Deprecated WriteableFile warning
  • ARROW-6642 - [Python] Link parent objects in Parquet's metadata and statistics objects
  • ARROW-6651 - Fix conda R job
  • ARROW-6652 - [Python] Fix ChunkedArray.to_pandas to retain timezone
  • ARROW-6652 - [Python] Fix Array.to_pandas to retain timezone
  • ARROW-6660 - [Rust][DataFusion] Minor docs update for 0.15.0 release
  • ARROW-6670 - [CI][R] Fix fixes for R nightly jobs
  • ARROW-6674 - [Python] Fix or ignore the test warnings
  • ARROW-6677 - [FlightRPC][C++] Document Flight in C++
  • ARROW-6678 - [C++][Parquet] Binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant
  • ARROW-6679 - [RELEASE] Add license info for the autobrew scripts
  • ARROW-6682 - [C#] Ensure file footer block lengths are always 8 byte aligned.
  • ARROW-6687 - [Rust][DataFusion] Add regression tests for np.nan parquet file
  • ARROW-6687 - [Rust][DataFusion] Bug fix in DataFusion Parquet reader
  • ARROW-6701 - [C++][R] Lint failing on R cpp code
  • ARROW-6703 - [Packaging][Linux] Restore ARROW_VERSION environment variable
  • ARROW-6705 - [Rust][DataFusion] README has invalid github URL
  • ARROW-6709 - [JAVA] Jdbc adapter currentIndex should increment when va…
  • ARROW-6714 - [R] Fix untested RecordBatchWriter case
  • ARROW-6716 - [Rust] Bump nightly to nightly-2019-09-25 to fix CI
  • ARROW-6748 - [RUBY] gem compilation error
  • ARROW-6751 - [CI] ccache doesn't cache on Travis-CI
  • ARROW-6760 - [C++] JSON: improve error message when column changed type
  • ARROW-6773 - [C++] Filter kernel returns invalid data when filtering with an Array slice
  • ARROW-6796 - Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take enormous memory and long time to load by pyarrow.parquet.read_table
  • ARROW-7112 - Wrong contents when initializing a pyarrow.Table from boolean DataFrame
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files
  • PARQUET-1631 - [C++] ParquetInputWrapper::GetSize returns Tell
  • PARQUET-1640 - [C++] Fix crash in parquet-encoding-benchmark

Apache Arrow 0.14.1 (2019-07-22)

Bug Fixes

  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5791 - [C++] Fix infinite loop with more than 32768 columns.
  • ARROW-5816 - [Release] Do not curl in background in verify-release-candidate.sh
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files

New Features and Improvements

  • ARROW-5101 - [Packaging] Avoid bundling static libraries in Windows conda packages
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5785 - [Rust] Make the datafusion cli dependencies optional
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicated known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5958 - [Python] Link zlib statically in the wheels

Apache Arrow 0.14.0 (2019-07-04)

New Features and Improvements

  • ARROW-258 - [Format] clarify definition of Buffer in context of RPC, IPC, File
  • ARROW-653 - [Python / C++] Add debugging function to print an array's buffer contents in hexadecimal
  • ARROW-767 - [C++] Filesystem abstraction
  • ARROW-835 - [Format][C++][Java] Create a new Duration type
  • ARROW-840 - [Python] Expose extension types
  • ARROW-973 - [Website] Add FAQ page
  • ARROW-1012 - [C++] Configurable batch size for parquet RecordBatchReader
  • ARROW-1207 - [C++] Implement MapArray, MapBuilder, MapType classes, and IPC support
  • ARROW-1261 - [Java] Add MapVector with reader and writer
  • ARROW-1278 - [Integration] Adding integration tests for fixed_size_list
  • ARROW-1279 - [Integration] Enable MapType integration tests
  • ARROW-1280 - [C++] add fixed size list type
  • ARROW-1349 - [Packaging] Provide APT and Yum repositories
  • ARROW-1496 - [JS] Upload coverage data to codecov.io
  • ARROW-1558 - [C++] Implement boolean filter (selection) kernel, rename comparison kernel-related functions
  • ARROW-1587 - [Format] Add metadata for user-defined logical types
  • ARROW-1774 - [C++] Add Array::View()
  • ARROW-1833 - [Java] Add accessor methods for data buffers that skip null checking
  • ARROW-1957 - [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
  • ARROW-1983 - [C++][Parquet] Add AppendRowGroups and WriteMetaDataFile methods
  • ARROW-2057 - [Python] Expose option to configure data page size threshold in parquet.write_table
  • ARROW-2102 - [C++] Implement Take kernel
  • ARROW-2103 - [C++] Implement take kernel functions - string/binary value type
  • ARROW-2104 - [C++] take kernel functions for nested types
  • ARROW-2105 - [C++] Implement take kernel functions - properly handle special indices
  • ARROW-2186 - [C++] Clean up architecture specific compiler flags
  • ARROW-2217 - [C++] Add option to use dynamic linking for compression library dependencies
  • ARROW-2298 - [Python] Add unit tests to assert that float64 with NaN values can be safely coerced to integer types when converting from pandas
  • ARROW-2412 - [Integration] Add nested dictionary test case, skipped for now
  • ARROW-2467 - [Rust] Add generated IPC code
  • ARROW-2517 - [Java] Add list<decimal> writer
  • ARROW-2618 - [Rust] Bitmap constructor should accept a flag for default state (0 or 1)
  • ARROW-2667 - [C++/Python] Add pandas-like take method to Array
  • ARROW-2707 - [C++] Add Table::Slice
  • ARROW-2709 - [Python] write_to_dataset poor performance when splitting
  • ARROW-2730 - [C++] Set up CMAKE_C_FLAGS more thoughtfully instead of using CMAKE_CXX_FLAGS
  • ARROW-2796 - [C++] Simplify version script used for linking
  • ARROW-2818 - [Python] Better error message when trying to convert sparse pandas data to arrow Table
  • ARROW-2835 - [C++] Make file position undefined after ReadAt()
  • ARROW-2969 - [R] Convert between StructArray and “nested” data.frame column containing data frame in each cell
  • ARROW-2981 - [C++] improve clang-tidy usability
  • ARROW-2984 - [JS] Refactor release verification script to share code with main source release verification script
  • ARROW-3040 - [Go] add support for comparing Arrays
  • ARROW-3041 - [Go] add support for TimeArray
  • ARROW-3052 - [C++] Detect Apache ORC C++ libraries in system/conda toolchain, add to conda requirements
  • ARROW-3087 - [C++] Implement Compare filter kernel
  • ARROW-3144 - [C++/Python] Move “dictionary” member from DictionaryType to ArrayData to allow for variable dictionaries
  • ARROW-3150 - [Python] Enable Flight in Python wheels for Linux and Windows
  • ARROW-3166 - [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp
  • ARROW-3191 - [Java] Make ArrowBuf work with arbitrary underlying memory
  • ARROW-3200 - [C++] Support dictionaries in Flight streams
  • ARROW-3290 - [C++] Toolchain support for secure gRPC
  • ARROW-3294 - [C++][Flight] Support Flight on Windows
  • ARROW-3314 - [R] Set -rpath using pkg-config when building
  • ARROW-3330 - [C++] Spawn multiple Flight performance servers in flight-benchmark to test parallel get performance
  • ARROW-3419 - [C++] Run include-what-you-use checks as nightly build
  • ARROW-3459 - [C++][Gandiva] Add support for variable length output vectors
  • ARROW-3475 - [C++] Allow builders to finish to the corresponding array type
  • ARROW-3570 - [Packaging] Don't bundle test data files with python wheels
  • ARROW-3572 - [Crossbow] Raise more helpful exception if Crossbow queue has an SSH origin URL
  • ARROW-3671 - [Go] implement MonthInterval and DayTimeInterval
  • ARROW-3676 - [Go] implement Decimal128 array
  • ARROW-3679 - [Go] implement read/write IPC for Decimal128
  • ARROW-3680 - [Go] implement Float16 array
  • ARROW-3686 - [Python] support masked arrays in pa.array
  • ARROW-3702 - [R] POSIXct mapped to DateType not TimestampType?
  • ARROW-3714 - [CI] Run RAT checks in pre-commit hooks
  • ARROW-3729 - [C++][Parquet] Use logical annotations in Arrow Parquet reader/writer
  • ARROW-3732 - [R] Add functions to write RecordBatch or Schema to Message value, then read back
  • ARROW-3758 - [R] Build R library and dependencies on Windows in Appveyor CI
  • ARROW-3759 - [R][CI] Build and test (no libarrow) on Windows in Appveyor
  • ARROW-3767 - [C++] Add cast from null to any other type
  • ARROW-3780 - [R] Failed to fetch data: invalid data when collecting int16
  • ARROW-3791 - [C++ / Python] Add boolean type inference to the CSV parser
  • ARROW-3794 - [R] Consider mapping INT8 to integer() not raw()
  • ARROW-3804 - [R] Support older versions of R runtime
  • ARROW-3810 - [R] type= argument for Array and ChunkedArray
  • ARROW-3811 - [R] Support inferring data.frame column as StructArray in array constructors
  • ARROW-3814 - [R] RecordBatch$from_arrays()
  • ARROW-3815 - [R] Refine record batch factory
  • ARROW-3848 - [R] allow nbytes to be missing in RandomAccessFile$Read()
  • ARROW-3897 - [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file
  • ARROW-3904 - [C++/Python] Validate scale and precision of decimal128 type
  • ARROW-4013 - [Docs][C++] Add how to build on MSYS2
  • ARROW-4020 - [Release] Add a post release script to remove RC
  • ARROW-4047 - [Python] Document use of int96 timestamps and options in Parquet docs
  • ARROW-4086 - [Java] Add apis to debug memory alloc failures
  • ARROW-4121 - [C++] Refactor memory allocation from InvertKernel
  • ARROW-4159 - [C++] Build with -Wdocumentation when using clang and BUILD_WARNING_LEVEL=CHECKIN
  • ARROW-4194 - [Format][Docs] Remove duplicated / out-of-date logical type information from documentation
  • ARROW-4302 - [C++] Add OpenSSL to C++ build toolchain (#4384)
  • ARROW-4337 - [C#] Implemented Fluent API for building arrays and record batches
  • ARROW-4343 - [C++] Add docker-compose test for gcc 4.8 / Ubuntu 14.04 (Trusty), expand Xenial/16.04 Dockerfile to test Flight
  • ARROW-4356 - [CI] Add integration (docker) test for turbodbc
  • ARROW-4369 - [Packaging] Release verification script should test linux packages via docker
  • ARROW-4452 - [Python] Serialize sparse torch tensors
  • ARROW-4453 - [Python] Create Cython wrappers for SparseTensor
  • ARROW-4467 - [Rust][DataFusion] Create a REPL & Dockerfile for DataFusion
  • ARROW-4503 - [C#] Eliminate allocations in ArrowStreamReader when reading from a Stream
  • ARROW-4504 - [C++] Reduce number of C++ unit test executables from 128 to 82
  • ARROW-4505 - [C++] adding pretty print for dates, times, and timestamps
  • ARROW-4566 - [Flight] Add option to run Flight benchmark against separate server
  • ARROW-4596 - [Rust][DataFusion] Implement COUNT
  • ARROW-4622 - [C++][Python] MakeDense and MakeSparse in UnionArray should accept a vector of Field
  • ARROW-4625 - [Flight][Java] Add method to await Flight server termination in Java
  • ARROW-4626 - [Flight] Add application-defined metadata to DoGet/DoPut
  • ARROW-4627 - [Flight] Add application metadata field to DoPut
  • ARROW-4701 - [C++] Add JSON chunker benchmarks
  • ARROW-4702 - [C++] Update dependency versions
  • ARROW-4708 - [C++] add multithreaded json reader
  • ARROW-4708 - [C++] refactoring JSON parser to prepare for multithreaded impl
  • ARROW-4714 - [C++][JAVA] Providing JNI interface to Read ORC file via Arrow C++
  • ARROW-4717 - [C#] Consider exposing ValueTask instead of Task
  • ARROW-4719 - [C#] Implement ChunkedArray, Column and Table in C#
  • ARROW-4741 - [Java] Add missing type javadoc and enable checkstyle
  • ARROW-4787 - [C++] Add support for Null in MemoTable and related kernels
  • ARROW-4788 - [C++] Less verbose API for constructing StructArray
  • ARROW-4800 - [C++] Introduce a Result class
  • ARROW-4805 - [Rust] Write temporal arrays to CSV
  • ARROW-4806 - [Rust] Temporal array casts
  • ARROW-4824 - [Python] Fix error checking in read_csv()
  • ARROW-4827 - [C++] Implement benchmark comparison
  • ARROW-4847 - [Python] Add pyarrow.table factory function
  • ARROW-4904 - [C++] Move implementations in arrow/ipc/test-common.h into libarrow_testing
  • ARROW-4911 - [R] Progress towards completing Windows support
  • ARROW-4912 - [C++] add method for easy renaming of a Table's columns
  • ARROW-4913 - [Java][Memory] Add additional methods for observing allocations.
  • ARROW-4945 - [Flight] Enable integration tests in Travis
  • ARROW-4956 - [C#] Allow ArrowBuffers to wrap external Memory
  • ARROW-4959 - [C++][Gandiva][Crossbow] Gandiva crossbow packaging changes.
  • ARROW-4968 - [Rust] Assert that struct array field types match data in…
  • ARROW-4971 - [Go] Add type equality test function
  • ARROW-4972 - [Go] implement ArrayEquals
  • ARROW-4973 - [Go] implement ArraySliceEqual
  • ARROW-4974 - [Go] implement ArrayApproxEqual
  • ARROW-4990 - [C++] Support Array-Array comparison
  • ARROW-4993 - [C++] Add simple build configuration summary
  • ARROW-5000 - [Python] Fix ‘SO’ DeprecationWarning in setup.py
  • ARROW-5007 - [C++] Remove DCHECK in intrinsic headers
  • ARROW-5020 - [CI] Split Gandiva-related packages into separate .yml file
  • ARROW-5027 - [Python] Python bindings for JSON reader
  • ARROW-5037 - [Rust] [DataFusion] Refactor aggregate module
  • ARROW-5038 - [Rust][DataFusion] Implement AVG aggregate function
  • ARROW-5039 - [Rust][DataFusion] Re-implement CAST support
  • ARROW-5040 - [C++] ArrayFromJSON can't parse Timestamp from strings
  • ARROW-5045 - [Rust] Code coverage silently failing in CI
  • ARROW-5053 - [Rust][DataFusion] Use ARROW_TEST_DATA env var
  • ARROW-5054 - [Release][Flight] Test Flight in Linux/macOS release verification scripts
  • ARROW-5056 - [Packaging] Adjust conda recipes to use ORC conda-forge package on unix systems
  • ARROW-5061 - [Release] Improve 03-binary performance
  • ARROW-5062 - [Java][FlightRPC] Shade com.google.guava usage in Flight
  • ARROW-5063 - [FlightRPC][Java] Test that Flight client connections are independent
  • ARROW-5064 - [Release] Pass PKG_CONFIG_PATH to glib in the verification script
  • ARROW-5066 - [Integration] Add flags to enable/disable implementations in integration/integration_test.py
  • ARROW-5071 - [Archery] Implement running benchmark suite
  • ARROW-5076 - [Release] Improve post binary upload performance
  • ARROW-5077 - [Rust] Change Cargo.toml to use release versions
  • ARROW-5078 - [Documentation] Sphinx fails with RemovedInSphinx30Warning
  • ARROW-5079 - [Release] Add a script that releases C# package
  • ARROW-5080 - [Release] Add a script that releases Rust packages
  • ARROW-5081 - [C++] Use PATH_SUFFIXES when searching for dependencies
  • ARROW-5083 - [Developer] PR merge script improvements: set already-released Fix Version, display warning when no components set
  • ARROW-5088 - [C++] Only add -Werror in debug builds. Add C++ documentation about compiler warning levels
  • ARROW-5091 - [Flight] Rename FlightGetInfo message to FlightInfo
  • ARROW-5093 - [Packaging] Add support for selective binary upload
  • ARROW-5094 - [Packaging] Add APT/Yum verification scripts
  • ARROW-5102 - [C++] Reduce header dependencies
  • ARROW-5108 - [Go] implement reading primitive arrays from Arrow file
  • ARROW-5109 - [Go] implement reading binary/string arrays from Arrow file
  • ARROW-5110 - [Go] implement reading struct arrays from Arrow file
  • ARROW-5111 - [Go] implement reading list arrays from Arrow file
  • ARROW-5112 - [Go] implement writing IPC Arrow stream/file
  • ARROW-5113 - [C++] Fix DoPut with dictionary arrays, add tests
  • ARROW-5115 - [JS] Add Vector Builders and high-level stream primitives
  • ARROW-5116 - [Rust] move kernel related files under compute/kernels
  • ARROW-5124 - [C++] Add support for Parquet in MinGW build
  • ARROW-5126 - [Rust][Parquet] Convert parquet column desc to arrow data type
  • ARROW-5127 - [Rust][Parquet] Add page iterator.
  • ARROW-5136 - [Flight] Call options
  • ARROW-5137 - [Flight] Implement auth API
  • ARROW-5145 - [C++] More input validation in release mode
  • ARROW-5150 - [Ruby] Add Arrow::Table#raw_records
  • ARROW-5155 - [GLib][Ruby] Add support for building union arrays from data type
  • ARROW-5157 - [Website] Add MATLAB to powered by Apache Arrow website
  • ARROW-5162 - [Rust][Parquet] Rename mod reader to arrow.
  • ARROW-5163 - [Gandiva] Cast timestamp/date are incorrectly evaluating year 0097 to 1997
  • ARROW-5164 - [Gandiva][C++] Introduce murmur32 for 32 bit types.
  • ARROW-5165 - [Python] update dev installation docs for --build-type + validate in setup.py
  • ARROW-5168 - [GLib] Add garrow_array_take()
  • ARROW-5171 - [C++] Use LESS instead of LOWER in compare enum
  • ARROW-5172 - [Go] implement reading fixed-size binary arrays from Arrow file
  • ARROW-5178 - [Python] Add Table.from_pydict()
  • ARROW-5179 - [Python] Return plain dicts, not OrderedDict, on Python 3.7+
  • ARROW-5185 - [C++] Add support for Boost with CMake configuration file
  • ARROW-5187 - [Rust] Add ability to convert StructArray to RecordBatch
  • ARROW-5188 - [Rust] Add temporal types to struct builders
  • ARROW-5189 - [Rust][Parquet] Format / display individual fields within a parquet row
  • ARROW-5190 - [R] Discussion: tibble dependency in R package
  • ARROW-5191 - [Rust] Expose CSV and JSON reader schemas
  • ARROW-5203 - [GLib] Add support for Compare filter
  • ARROW-5204 - [C++] Improve builder performance
  • ARROW-5212 - [Go] Support reserve for the data buffer in the BinaryBuilder
  • ARROW-5218 - [C++] Improve build when third-party library locations are specified
  • ARROW-5219 - [C++] Build protobuf_ep in parallel when using Ninja build
  • ARROW-5222 - [Python] Revise pyarrow installation instructions for macOS
  • ARROW-5225 - [Java] Improve performance of BaseValueVector#getValidityBufferSizeFromCount
  • ARROW-5226 - [Gandiva] Add cmp functions for decimals
  • ARROW-5238 - [Python] Convert arguments to pyarrow.dictionary
  • ARROW-5241 - [Python] expose option to disable writing statistics to parquet file
  • ARROW-5250 - [Java] Add javadoc comments to public methods, remove style check suppression.
  • ARROW-5252 - [C++] Use standard-compliant std::variant backport
  • ARROW-5256 - [C++] Add support for LLVM 7.1
  • ARROW-5257 - [Website] Update site to use “official” Apache Arrow logo, add clearly marked links to logo
  • ARROW-5258 - [C++/Python] Collect file metadata of dataset pieces
  • ARROW-5261 - [C++] Add missing scalar definitions for Intervals
  • ARROW-5262 - [Python] Fix typo
  • ARROW-5264 - [Java] Allow enabling/disabling boundary checking by environmental variable
  • ARROW-5266 - [Go] implement read/write IPC for Float16
  • ARROW-5268 - [GLib] Add GArrowJSONReader
  • ARROW-5269 - [C++][Archery] Mark relevant benchmarks as regression
  • ARROW-5275 - [C++] Generic filesystem tests
  • ARROW-5281 - [Rust] Extract DataPageBuilder to test common
  • ARROW-5284 - [Rust] Replace libc with std::alloc for memory allocation
  • ARROW-5286 - [Python] support struct type in from_pandas
  • ARROW-5288 - [Documentation] Enhance the contribution guidelines page
  • ARROW-5289 - [C++] Move arrow/util/concatenate* to arrow/array
  • ARROW-5290 - [Java] Provide a flag to enable/disable null-checking in vector's get methods
  • ARROW-5291 - [Python] Add wrapper for take kernel on Array
  • ARROW-5298 - [Rust] Add debug implementation for buffer data.
  • ARROW-5299 - [C++] ListArray comparison is incorrect
  • ARROW-5309 - [Python] clarify that Schema.append returns new object
  • ARROW-5311 - [C++] use more specific error status types in take
  • ARROW-5313 - [Format] Comments on Field table are a bit confusing
  • ARROW-5317 - [Rust][Parquet] impl IntoIterator for SerializedFileReader
  • ARROW-5319 - [C++][CI][travis skip]
  • ARROW-5321 - [Gandiva][C++] add isnull impl for string types
  • ARROW-5323 - [CI][skip travis]
  • ARROW-5328 - [R] Add shell scripts to do a full package rebuild and test locally
  • ARROW-5329 - [MATLAB] Add support for building MATLAB interface to Feather directly within MATLAB
  • ARROW-5334 - [C++] Ensure all type classes end with “Type”
  • ARROW-5335 - [Python] Raise exception on variable dictionaries in conversion to Python/pandas
  • ARROW-5339 - [C++] Add jemalloc URL to thirdparty/versions.txt so download_dependencies.sh gets it
  • ARROW-5341 - [C++][Documentation] developers/cpp.rst should mention documentation warnings
  • ARROW-5342 - [Format] Formalize “extension types” in Arrow protocol metadata
  • ARROW-5346 - [C++] Revert changes to vendored datetime library
  • ARROW-5349 - [C++][Parquet] Add method to set file path in a parquet::FileMetaData instance
  • ARROW-5361 - [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144
  • ARROW-5363 - [GLib] Fix coding styles
  • ARROW-5364 - [C++] Use ASCII rather than UTF-8 in BuildUtils.cmake comment
  • ARROW-5365 - [C++][CI] Enable ASAN/UBSAN in CI
  • ARROW-5368 - [C++] Disable jemalloc by default with MinGW
  • ARROW-5369 - [C++] Add support for glog on Windows
  • ARROW-5370 - [C++] Use system uriparser if available
  • ARROW-5372 - [GLib] Add support for null/boolean values CSV read option
  • ARROW-5378 - [C++] Local filesystem implementation
  • ARROW-5384 - [Go] implement FixedSizeList array
  • ARROW-5389 - [C++] Add Temporary Directory facility
  • ARROW-5392 - [C++][CI] Disable static build with MinGW on AppVeyor
  • ARROW-5393 - [R] Add tests and example for read_parquet()
  • ARROW-5395 - [C++] Utilize stream EOS in File format
  • ARROW-5396 - [JS] Support files and streams with no record batches
  • ARROW-5401 - [CI][skip appveyor]
  • ARROW-5404 - [C++] force usage of nonstd::sv_lite::string_view instead of std::string_view
  • ARROW-5407 - [C++] Allow building only integration test targets
  • ARROW-5413 - [C++] Skip UTF8 BOM in CSV files
  • ARROW-5415 - [Release] Release script should update R version everywhere
  • ARROW-5416 - [Website] Add Homebrew to project installation page
  • ARROW-5418 - [CI][R] Run code coverage and report to codecov.io
  • ARROW-5420 - [Java] Implement or remove getCurrentSizeInBytes in Variab…
  • ARROW-5427 - [Python] pandas conversion preserve_index=True to force RangeIndex serialization
  • ARROW-5428 - [C++] Add option to set “read extent” in arrow::io::BufferedInputStream
  • ARROW-5429 - [Java] Provide alternative buffer allocation policy
  • ARROW-5432 - [Python] Add NativeFile.read_at()
  • ARROW-5433 - [C++][Parquet] Improve parquet-reader columns information, strip trailing whitespace from test case
  • ARROW-5434 - [Memory][Java] Introduce wrappers for backward compatibility.
  • ARROW-5436 - [Python] parquet.read_table add filters keyword
  • ARROW-5438 - [JS] EOS bytes for sequential readers
  • ARROW-5441 - [C++] Implement FindArrowFlight.cmake
  • ARROW-5442 - [Website] Clarify what makes a release artifact “official”
  • ARROW-5443 - [Crossbow] Turn parquet build off for Gandiva.
  • ARROW-5447 - [Ruby] Ensure flushing test gz file
  • ARROW-5449 - [C++] Test extended-length paths on Windows
  • ARROW-5451 - [C++][Gandiva] Support cast/round functions for decimal
  • ARROW-5452 - [R] Add API documentation website (pkgdown)
  • ARROW-5461 - [Java] Add micro-benchmarks for Float8Vector and allocators
  • ARROW-5463 - [Rust] Add AsRef trait for Buffer.
  • ARROW-5464 - [Archery] Fix default diff --benchmark-filter
  • ARROW-5465 - [Crossbow] Support writing submitted job definition yaml to a file
  • ARROW-5466 - [Java] Dockerize Java builds in Travis CI, run multiple JDKs in single entry
  • ARROW-5467 - [Go] implement read/write IPC for Time32/64 arrays
  • ARROW-5468 - [Go] implement read/write IPC for Timestamp arrays
  • ARROW-5469 - [Go] implement read/write IPC for Date32/64 arrays
  • ARROW-5470 - [CI] Fix Travis-CI R job that broke with the local fs patch
  • ARROW-5472 - [Development] Add warning to PR merge tool if no JIRA component is set
  • ARROW-5474 - [C++] Document Boost 1.58 as minimum supported version, add docker-compose entry for it, fix broken cpp/Dockerfile* builds
  • ARROW-5475 - [Python] Add Python binding for arrow::Concatenate
  • ARROW-5476 - [Java][Memory] Fix Netty Arrow Buf.
  • ARROW-5477 - [C++] Check required RapidJSON version
  • ARROW-5478 - [Packaging] Drop Ubuntu 14.04 support
  • ARROW-5481 - [GLib] Add “error” parameter document
  • ARROW-5485 - [C++] Install libraries from googletest_ep into build output directory on non-Windows platforms.
  • ARROW-5485 - [Crossbow] Disable unit tests in Gandiva macOS crossbow job until underlying issue resolved
  • ARROW-5486 - [GLib] Add binding of gandiva::FunctionRegistry and related things
  • ARROW-5488 - [R] Workaround when C++ lib not available
  • ARROW-5490 - [C++] Remove ARROW_BOOST_HEADER_ONLY
  • ARROW-5491 - [C++] Remove unnecessary semicolons following MACRO definitions
  • ARROW-5492 - [R] Add “col_select” argument to read_* functions to read subset of columns
  • ARROW-5495 - [C++] Update some dependency URLs from http to https
  • ARROW-5496 - [R][CI] Fix relative paths in R codecov.io reporting
  • ARROW-5498 - [C++][CI] Fix Flatbuffers related error with MinGW
  • ARROW-5499 - [R] Alternate bindings for when libarrow is not found
  • ARROW-5500 - [R] read_csv_arrow() signature should match readr::read_csv()
  • ARROW-5503 - [R] Add read_json()
  • ARROW-5504 - [R] Move use_threads argument to global option
  • ARROW-5509 - [R] Add basic write_parquet
  • ARROW-5511 - [Packaging] Enable Flight in Conda packages
  • ARROW-5512 - [C++] Rough API skeleton for C++ Datasets API / framework
  • ARROW-5513 - [Java] Refactor method name for getstartOffset to use camel case
  • ARROW-5516 - [Python][Documentation] Development page for pyarrow has a missing dependency in using pip
  • ARROW-5518 - [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear
  • ARROW-5524 - [C++] Turn off PARQUET_BUILD_ENCRYPTION in CMake if OpenSSL not found (#4494)
  • ARROW-5526 - [GitHub] Add more prominent notice to ISSUE_TEMPLATE.md to direct bug reports to JIRA
  • ARROW-5529 - [Flight] Allow serving with multiple TLS certificates
  • ARROW-5531 - [Python] Implement Array.from_buffers for varbinary and nested types, add DataType.num_buffers property
  • ARROW-5533 - [C++][Plasma] make plasma client thread safe
  • ARROW-5534 - [GLib] Add garrow_table_concatenate()
  • ARROW-5535 - [GLib] Add garrow_table_slice()
  • ARROW-5537 - [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
  • ARROW-5538 - [C++] Restrict minimum OpenSSL version to 1.0.2
  • ARROW-5541 - [R] Casts from negative int32 to uint32 and uint64 are now safe
  • ARROW-5544 - [Archery] Don't return non-zero on regressions
  • ARROW-5545 - [C++][Docs] Clarify expectation of UTC values for timestamps with time zones
  • ARROW-5547 - [C++][FlightRPC] Support pkg-config for Arrow Flight
  • ARROW-5552 - [Go] make Schema, Field and simpleRecord implement Stringer
  • ARROW-5554 - [Python] Added a python wrapper for arrow::Concatenate()
  • ARROW-5555 - [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries
  • ARROW-5556 - [Doc][Python] Document JSON reader
  • ARROW-5557 - [C++] Add VisitBits benchmark
  • ARROW-5565 - [Python][Docs] Add instructions how to use gdb to debug C++ libraries when running Python unit tests
  • ARROW-5567 - [C++] Fix build error of memory-benchmark
  • ARROW-5571 - [R] Rework handling of ARROW_R_WITH_PARQUET
  • ARROW-5574 - [R] documentation error for read_arrow()
  • ARROW-5581 - [Java] Provide interfaces and initial implementations for vector sorting
  • ARROW-5582 - [Go] implement RecordEqual
  • ARROW-5586 - [R] convert Array of LIST type to R lists
  • ARROW-5587 - [Java] Add more style check rule for Java code
  • ARROW-5590 - [R] Run “no libarrow” R build in the same CI entry if possible
  • ARROW-5591 - [Go] implement read/write IPC for Duration & Intervals
  • ARROW-5597 - [Packaging] Add Flight deb packages
  • ARROW-5600 - [R] R package namespace cleanup
  • ARROW-5602 - [Java][Gandiva] Add tests for round/cast
  • ARROW-5604 - [Go] improve coverage of TypeTraits
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5612 - [Python][Doc] Add prominent note that date_as_object option changed with Arrow 0.13
  • ARROW-5621 - [Go] implement read/write IPC for Decimal128 arrays
  • ARROW-5622 - [C++][Dataset] Support pkg-config for Arrow Datasets
  • ARROW-5625 - [R] convert Array of struct type to data frame columns
  • ARROW-5632 - [Doc] Basic instructions for using Xcode with Arrow
  • ARROW-5633 - [Python] Enable bz2 in Linux wheels
  • ARROW-5635 - [C++] Added a Compact() method to Table.
  • ARROW-5637 - [Java][C++][Gandiva] Complete In Expression Support
  • ARROW-5639 - [Java] Remove floating point computation from getOffsetBufferValueCapacity
  • ARROW-5641 - [GLib] Remove enums files generated by GNU Autotools from Git targets
  • ARROW-5643 - [FlightRPC] Add ability to override SSL hostname checking
  • ARROW-5650 - [Python] Update manylinux dependency versions
  • ARROW-5652 - [CI] Fix lint docker image
  • ARROW-5653 - [CI] Fix cpp docker image
  • ARROW-5656 - [Python][Packaging] Fix macOS wheel builds, add Flight support
  • ARROW-5659 - [C++] Add support for finding OpenSSL installed by Homebrew
  • ARROW-5660 - [GLib][CI] Use Xcode 10.2
  • ARROW-5661 - [Gandiva][C++] support hash functions for decimals in gandiva
  • ARROW-5662 - [C++] Add support for BOOST_SOURCE=AUTO|BUNDLED|SYSTEM
  • ARROW-5663 - [Packaging][RPM] Update CentOS packages for 0.14.0
  • ARROW-5664 - [Crossbow] Execute nightly crossbow tests on CircleCI instead of Travis
  • ARROW-5668 - [C++/Python] Include ‘not null’ in schema fields pretty print
  • ARROW-5669 - [Python][Packaging] Add ARROW_TEST_DATA env variable to Crossbow Linux Wheel build
  • ARROW-5670 - [Crossbow] get_apache_mirror.py fails with TLS error on macOS with Python 3.5
  • ARROW-5671 - [Crossbow] macOS Python wheels failing
  • ARROW-5672 - [Java] Refactor redundant method modifier
  • ARROW-5683 - [R] Add snappy to Rtools Windows builds
  • ARROW-5684 - [Packaging][deb] Add support for Ubuntu 19.04
  • ARROW-5685 - [Packaging][deb] Add support for Apache Arrow Datasets
  • ARROW-5687 - [C++] Remove remaining uses of ARROW_BOOST_VENDORED
  • ARROW-5690 - [Packaging][Python] Fix macOS wheel building
  • ARROW-5694 - [Python] Support list of Decimals in conversion to pandas
  • ARROW-5695 - [C#][Release] Run sourcelink test in verify-release-candidate.sh
  • ARROW-5696 - [C++][Gandiva] Introduce castVarcharVarchar
  • ARROW-5699 - [C++] Optimize decimal128 parsing
  • ARROW-5701 - [C++][Gandiva] Build expr with specific sv
  • ARROW-5702 - [C++] parquet::arrow::FileReader::GetSchema()
  • ARROW-5704 - [C++] Stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl
  • ARROW-5705 - [Java] Optimize BaseValueVector#computeCombinedBufferSize logic
  • ARROW-5706 - [Java] Remove type conversion in getValidityBufferValueCapacity
  • ARROW-5707 - [Java] Improve the performance and code structure for ArrowRecordBatch
  • ARROW-5710 - [C++] Allow compiling Gandiva with Ninja on Windows
  • ARROW-5715 - [Release] Verify Ubuntu 19.04 APT repository
  • ARROW-5718 - [R] auto splice data frames in record_batch() and table()
  • ARROW-5720 - [C++] Create benchmarks for decimal related classes.
  • ARROW-5721 - [Rust] Move array related code into a separate module
  • ARROW-5724 - [R][CI] AppVeyor build should use ccache
  • ARROW-5725 - [Crossbow] Port conda recipes to azure pipelines
  • ARROW-5726 - [Java] Implement a common interface for int vectors
  • ARROW-5727 - [Python][CI] Install pytest-faulthandler before running tests
  • ARROW-5748 - [Packaging][deb] Add support for Debian GNU/Linux buster
  • ARROW-5749 - [Python] Added python binding for Table::CombineChunks
  • ARROW-5751 - [Python][Packaging] Ensure that c-ares is linked statically in Python wheels
  • ARROW-5752 - [Java] Improve the performance of ArrowBuf#setZero
  • ARROW-5755 - [Rust][Parquet] Derive clone for Type.
  • ARROW-5768 - [Release] Remove needless empty lines at the end of CHANGELOG.md
  • ARROW-5773 - [R] Clean up documentation before release
  • ARROW-5780 - [C++] Add benchmark for Decimal operations
  • ARROW-5782 - [Release] Setup test data for Flight in dev/release/01-perform.sh
  • ARROW-5783 - [Release][C#] Exclude dummy.git from RAT check
  • ARROW-5785 - [Rust] Rust datafusion implementation should not depend on rustyline
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicate known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5818 - [Java][Gandiva] support varlen output vectors
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5826 - [Website] Blog post for 0.14.0 release announcement
  • PARQUET-1243 - [C++] Throw more informative exception when reading a length-0 Parquet file
  • PARQUET-1411 - [C++] Add parameterized logical annotations to Parquet metadata
  • PARQUET-1422 - [C++] Use common Arrow IO interfaces throughout codebase
  • PARQUET-1517 - [C++] Crypto package updates to match the final spec
  • PARQUET-1523 - [C++] Vectorize Comparator interface, remove virtual calls on inner loop. Refactor Statistics to not require PARQUET_EXTERN_TEMPLATE
  • PARQUET-1569 - [C++] Consolidate shared unit testing header files
  • PARQUET-1582 - [C++] Add ToString method to ColumnDescriptor
  • PARQUET-1583 - [C++] Remove superfluous parquet::Vector class
  • PARQUET-1586 - [C++] Add --dump options to parquet-reader tool to dump def/rep levels
  • PARQUET-1603 - [C++] rename parquet::LogicalType to parquet::ConvertedType

Bug Fixes

  • ARROW-61 - [Java] Method can return a value bigger than long MAX_VALUE
  • ARROW-352 - [Format] Interval(DAY_TIME) has no unit
  • ARROW-1837 - [Java][Integration] Fix unsigned round trip integration tests
  • ARROW-2119 - [IntegrationTest] Add test case with a stream having no record batches
  • ARROW-2136 - [Python] Check null counts for non-nullable fields when converting from pandas.DataFrame with supplied schema
  • ARROW-2256 - [C++] Fix libfuzzer builds for clang-7
  • ARROW-2461 - [Python] Build manylinux2010 wheels
  • ARROW-2590 - [Python] Pyspark python_udf serialization error on grouped map (Amazon EMR)
  • ARROW-3344 - [Python] Disable flaky Plasma test
  • ARROW-3399 - [Python] Implementing numpy matrix serialization
  • ARROW-3650 - [Python] warn on converting DataFrame with mixed type column names
  • ARROW-3801 - [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
  • ARROW-4021 - [Ruby] Error building red-arrow on msys2
  • ARROW-4076 - [Python] Validate ParquetDataset schema after filtering
  • ARROW-4139 - [Python][Parquet] Wrap new parquet::LogicalType, cast min/max statistics based on LogicalType
  • ARROW-4301 - [Java] use arrow-jni profile for both gandiva/orc
  • ARROW-4301 - [Java][Gandiva] Update version manually
  • ARROW-4324 - [Python] Triage broken type inference logic in presence of a mix of NumPy dtype-having objects and other scalar values
  • ARROW-4350 - [Python] Fix conversion from Python to Arrow with nested lists and NumPy dtype=object items
  • ARROW-4433 - [R] Segmentation fault when instantiating arrow::table from data frame
  • ARROW-4447 - [C++] Investigate dynamic linking for libthrift
  • ARROW-4516 - [Python] Error while creating a ParquetDataset on a path without `_common_dataset` but with an empty `_tempfile`
  • ARROW-4523 - [JS] Add row proxy generation benchmark
  • ARROW-4651 - [Flight] Use URIs instead of host/port pair
  • ARROW-4665 - [C++] With glog activated, DCHECK macros are redefined
  • ARROW-4675 - [Python] Fix pyarrow.deserialize failure when reading in Python 3 a payload generated in Python 2
  • ARROW-4694 - [CI] Improve detect-changes.py on Travis PRs
  • ARROW-4723 - [Python] Ignore “hidden” files that start with underscore
  • ARROW-4725 - [C++] Enable dictionary builder tests with MinGW build
  • ARROW-4823 - [C++][Python] Do not close raw file handle in ReadaheadSpooler, check that file handles passed to read_csv are not closed
  • ARROW-4832 - [Python] pandas Index metadata for RangeIndex is incorrect
  • ARROW-4845 - [R] Compiler warnings on Windows MingW64
  • ARROW-4851 - [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off
  • ARROW-4877 - [Plasma] CI failure in test_plasma_list
  • ARROW-4884 - [C++] conda-forge thrift-cpp package not available via pkg-config or cmake
  • ARROW-4885 - [C++/Python] Enable Decimal parsing in CSV
  • ARROW-4886 - [Rust] Cast to list with offset
  • ARROW-4923 - [Java] Add methods to set long value at given index in DecimalVector
  • ARROW-4934 - [Python] Address deprecation notice that will be a bug in Python 3.8
  • ARROW-5019 - [C#] ArrowStreamWriter doesn't work on a non-seekable stream
  • ARROW-5049 - [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
  • ARROW-5051 - [GLib][Gandiva] Don't return temporary memory
  • ARROW-5055 - [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
  • ARROW-5058 - [Release] Fix typos in vote e-mail template
  • ARROW-5059 - [C++][Gandiva] cbrt_* floating point tests can fail due to exact comparisons
  • ARROW-5065 - [Rust] cast kernel does not support casting from Int64
  • ARROW-5068 - [Gandiva][Packaging] Fix gandiva nightly builds after the CMake refactor
  • ARROW-5090 - Parquet linking fails on MacOS due to @rpath in dylib
  • ARROW-5092 - [C#] Create a dummy .git directory to download the source files from GitHub with Source Link
  • ARROW-5095 - [Flight][C++] Expose server error message in DoGet
  • ARROW-5096 - [Packaging][deb] Add missing plasma-store-server packages
  • ARROW-5097 - [Packaging][CentOS6] Remove needless dependencies
  • ARROW-5098 - [Website] Update how to install .deb by APT
  • ARROW-5100 - [JS] Remove swap while collapsing contiguous buffers
  • ARROW-5117 - [Go] fix panic when nil or empty slices are appended to builders
  • ARROW-5119 - [Go] fix Boolean stringer implementation
  • ARROW-5122 - [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
  • ARROW-5128 - [Packaging][CentOS][Conda] Numpy not found in nightly builds
  • ARROW-5129 - [Rust] Column writer bug: check dictionary encoder when adding a new data page
  • ARROW-5130 - [C++][Python] Limit exporting of std::* symbols
  • ARROW-5132 - [Java] Errors on building gandiva_jni.dll on Windows with Visual Studio 2017
  • ARROW-5138 - [Python] Add documentation about pandas preserve_index option
  • ARROW-5140 - [Bug?][Parquet] Can write a jagged array column of strings to disk, but hit `ArrowNotImplementedError` on read
  • ARROW-5142, ARROW-5732, ARROW-5735 - [CI] Emergency fixes
  • ARROW-5144 - [Python] ParquetDataset and ParquetPiece not serializable
  • ARROW-5146 - [Dev] Fix project name inference in merge script
  • ARROW-5147 - [C++] Add missing dependencies to Brewfile
  • ARROW-5148 - [Gandiva] Allow linking with RTTI-disabled LLVM builds
  • ARROW-5149 - [Packaging][Wheel] Pin LLVM to version 7 in windows builds
  • ARROW-5152 - [Python] Fix CMake warnings
  • ARROW-5159 - [Rust] Unable to build benches in arrow crate.
  • ARROW-5160 - [C++] Don't evaluate expression twice in ABORT_NOT_OK
  • ARROW-5166 - [Python][Parquet] Statistics for uint64 columns may overflow
  • ARROW-5167 - [C++] Upgrade string-view-light to latest
  • ARROW-5169 - [Python] preserve field nullability of specified schema in Table.from_pandas
  • ARROW-5173 - [Go] handle multiple concatenated record batches
  • ARROW-5174 - [Go] implement Stringer for DataTypes
  • ARROW-5177 - [C++/Python] Check column index when reading Parquet column
  • ARROW-5183 - [CI] Fix AppVeyor failure
  • ARROW-5184 - [Rust] Broken links and other documentation warnings
  • ARROW-5186 - [Plasma] Fix crash caused by improper free on CUDA memory
  • ARROW-5194 - [C++][Plasma] TEST(PlasmaSerialization, GetReply) is failing
  • ARROW-5195 - [C++] Detect null strings in CSV string columns
  • ARROW-5201 - [Python] handle collections.abc deprecation warnings
  • ARROW-5208 - [Python] Add mask argument to pyarrow.infer_type, do not look at masked values when inferring output type in pyarrow.array
  • ARROW-5214 - [C++] Fix thirdparty download script
  • ARROW-5217 - [Rust][DataFusion] Fix failing tests
  • ARROW-5232 - [Java] Avoid runaway doubling of vector size
  • ARROW-5233 - [Go] Migrate to flatbuffers-v1.11.0
  • ARROW-5237 - [Python] populate _pandas_api.version
  • ARROW-5240 - [C++][CI] pin cmake_format
  • ARROW-5242 - [C++] Update vendored HowardHinnant/date to master
  • ARROW-5243 - [Java][Gandiva] Add decimal compare tests
  • ARROW-5245 - [CI][C++] Unpin cmake format (current version is 5.1)
  • ARROW-5246 - [Go] use Go-1.12.x in CI
  • ARROW-5249 - [Java] Add auth capability to Flight async operations (#4238)
  • ARROW-5253 - [C++] Fix snappy external build
  • ARROW-5254 - [Flight][Java] Change Flight doAction to allow multiple responses in Java
  • ARROW-5255 - [Java] Proof-of-concept of Java extension types
  • ARROW-5260 - [Python] Fix crash when deserializing from components in another process
  • ARROW-5274 - [JavaScript] Wrong array type for countBy
  • ARROW-5283 - [C++][Plasma] Erase object id in client when aborting an object
  • ARROW-5285 - [C++][Plasma] Implement releasing of GpuProcessHandle
  • ARROW-5293 - [C++] Take kernel on DictionaryArray does not preserve ordered flag
  • ARROW-5294 - [Python][CI] Fix manylinux1 build
  • ARROW-5296 - [Java] Ignore timeout-based Flight tests for now
  • ARROW-5301 - [Python] update parquet docs on multithreading
  • ARROW-5304 - [C++] Fix thread safety of CudaDeviceManager::GetInstance
  • ARROW-5306 - [CI][GLib] Disable GTK-Doc
  • ARROW-5308 - [Go] remove deprecated Feather format
  • ARROW-5314 - [Go] fix bug for String Arrays with offset
  • ARROW-5314 - [Go] Fix bug for FixedSizeBinary with offset
  • ARROW-5318 - [Python] pyarrow hdfs reader overrequests
  • ARROW-5325 - [Archery][Benchmark] Output properly formatted jsonlines from benchmark diff cli command
  • ARROW-5330 - [CI][skip appveyor]
  • ARROW-5332 - [R] Update R package README with richer installation instructions
  • ARROW-5348 - [Java][CI] Add missing gandiva javadoc
  • ARROW-5360 - [Rust] Update rustyline to fix build
  • ARROW-5362 - [C++] Fix compression test memory usage
  • ARROW-5371 - [Release] Add tests for dev/release/00-prepare.sh
  • ARROW-5373 - [Java] Add missing details for Gandiva Java Build
  • ARROW-5376 - [C++] Workaround for gcc 5.4.0 bug
  • ARROW-5383 - [Go] Update flatbuf for new Duration type
  • ARROW-5387 - [Go] properly handle sub-slice of List
  • ARROW-5388 - [Go] use arrow.TypeEquals in array.NewChunked
  • ARROW-5390 - [CI][skip appveyor]
  • ARROW-5397 - [FlightRPC] Add TLS certificates for testing Flight
  • ARROW-5398 - [Python] Fix Flight tests
  • ARROW-5403 - [C++] Use GTest shared libraries with BUNDLED build, always use BUNDLED with MSVC
  • ARROW-5411 - [C++][Python] Build error building on Mac OS Mojave
  • ARROW-5412 - [Integration] Add Java option for netty reflection
  • ARROW-5419 - [C++] Allow recognizing empty strings as null strings in CSV files
  • ARROW-5421 - [Packaging][Crossbow] Duplicated key in nightly test configuration
  • ARROW-5422 - [CI] [C++] Build failure with Google Benchmark
  • ARROW-5430 - [Python] Raise ArrowInvalid for pyints larger than int64
  • ARROW-5435 - [Java] Add test for IntervalYearVector#getAsStringBuilder
  • ARROW-5437 - [Python] Missing pandas pytest marker from parquet tests
  • ARROW-5446 - [C++][CMake] Install arrow/util/config.h into CMAKE_INSTALL_INCLUDEDIR
  • ARROW-5448 - [C++][CI][MinGW][skip travis]
  • ARROW-5453 - [C++] Update to cmake-format=0.5.2 and pin again
  • ARROW-5455 - [Rust] Build broken by 2019-05-30 Rust nightly
  • ARROW-5456 - [GLib][Plasma] Fix dependency order on building document
  • ARROW-5457 - [GLib][Plasma] Fix environment variable name for test
  • ARROW-5459 - [Go] implement Stringer for float16 DataType
  • ARROW-5462 - [Go] support writing zero-length List arrays
  • ARROW-5479 - [Rust][DataFusion] Use ARROW_TEST_DATA instead of relative path for testing
  • ARROW-5487 - [Docs] Fix Sphinx failure
  • ARROW-5493 - [Go][Integration] add Go support for IPC integration tests
  • ARROW-5507 - [Plasma][CUDA] Fix compile error
  • ARROW-5514 - [C++] Fix pretty-printing uint64 values
  • ARROW-5517 - [C++] Only check header basename for ‘internal’ when collecting public headers
  • ARROW-5520 - [Packaging][deb] Add support for building on arm64
  • ARROW-5521 - [Packaging] Use Apache RAT 0.13
  • ARROW-5528 - [C++] Fix a bug in Concatenate() for arrays with no value buffers
  • ARROW-5532 - [JS] Field Metadata Not Read
  • ARROW-5551 - [Go] implement FixedSizeArrays with 2-buffers layout
  • ARROW-5553 - [Ruby] Use the official packages to install Apache Arrow
  • ARROW-5576 - [C++] Query ASF mirror system for URL and use when downloading Thrift
  • ARROW-5577 - [C++][Alpine] Correct googletest shared library paths on non-Windows to fix Alpine build
  • ARROW-5583 - [Java] When the isSet of a NullableValueHolder is 0, the buffer field should not be used
  • ARROW-5584 - [Java] Add import for link reference in FieldReader javadoc
  • ARROW-5589 - [C++] Add missing nullptr check during flatbuffer decoding
  • ARROW-5592 - [Go] implement Duration array
  • ARROW-5596 - [Python] Fix Python 3-only syntax in test_flight.py
  • ARROW-5601 - [C++][Gandiva] fail if the output type is not supported
  • ARROW-5603 - [Python] Register custom pytest markers to avoid warnings
  • ARROW-5605 - [C++] Verify Flatbuffer messages in more places to prevent crashes due to bad inputs
  • ARROW-5606 - [Python] deal with deprecated RangeIndex._start/_stop/_step
  • ARROW-5608 - [C++][parquet] Fix invalid memory access when using parquet::arrow::ColumnReader
  • ARROW-5615 - [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R literal
  • ARROW-5616 - [C++][Python] Fix -Wwrite-strings warning when building against Python 2.7 headers
  • ARROW-5617 - [C++] thrift_ep 0.12.0 fails to build when using ARROW_BOOST_VENDORED=ON
  • ARROW-5619 - [C++] Make get_apache_mirror.py workable with Python 3.5
  • ARROW-5623 - [GLib][CI] Use system Meson on macOS
  • ARROW-5624 - [C++] Fix typo causing build failure when -Duriparser_SOURCE=BUNDLED
  • ARROW-5626 - [C++] Fix caching of expressions with decimals
  • ARROW-5629 - [C++] Fix Coverity issues
  • ARROW-5631 - [C++] Fix FindBoost targets with cmake3.2
  • ARROW-5644 - [Python] test_flight.py::test_tls_do_get appears to hang
  • ARROW-5647 - [Python] Accessing a file from Databricks using pandas read_parquet with the pyarrow engine fails with: Passed non-file path: /mnt/aa/example.parquet
  • ARROW-5648 - [C++] Avoid using codecvt
  • ARROW-5654 - [C++][Python] Add ChunkedArray::Validate method that checks chunk types for consistency, invoke in Python
  • ARROW-5657 - [C++] “docker-compose run cpp” broken in master
  • ARROW-5674 - [Python] Missing pandas pytest markers from test_parquet.py
  • ARROW-5675 - [Doc] Fix typo in Xcode workflow documentation
  • ARROW-5678 - [R][Lint] Fix hadolint docker linting error
  • ARROW-5693 - [Go] skip IPC integration tests for Decimal128
  • ARROW-5697 - [GLib] Use system pkg-config in c_glib/Dockerfile to correctly find system libraries such as libglib
  • ARROW-5698 - [R] Fix docker-compose build
  • ARROW-5709 - [C++] Fix gandiva-date_time_test failure on Windows
  • ARROW-5714 - [JS] Inconsistent behavior in Int64Builder with/without BigNum
  • ARROW-5723 - [C++][Arrow] Fix crossbow failure
  • ARROW-5728 - [Python] Pin jpype1 version to 0.6.3 due to CI breakage from 0.7.0
  • ARROW-5729 - [Python][Java] ArrowType.Int object has no attribute ‘isSigned’
  • ARROW-5730 - [Python][CI] Selectively skip test cases in the dask integration test
  • ARROW-5732 - [C++] macOS builds failing idiosyncratically on master with warnings from pmmintrin.h
  • ARROW-5735 - [C++] Appveyor builds failing persistently in thrift_ep build
  • ARROW-5737 - [Crossbow] Use Python version 2.7 in the gandiva tasks
  • ARROW-5738 - [Crossbow][Conda] OSX package builds are failing with missing intrinsics
  • ARROW-5739 - [CI] Fix python docker image
  • ARROW-5750 - [Java] Fix java compilation errors
  • ARROW-5754 - [C++] Add override mark for ~GrpcStreamWriter
  • ARROW-5765 - [C++] Fix TestDictionary.Validate in release mode, add docker-compose job for testing C++ release build
  • ARROW-5769 - [Release] Ensure setting up test data in dev/release/00-prepare.sh
  • ARROW-5770 - [C++] Fix -Wpessimizing-move in result.h
  • ARROW-5771 - [Python] Add pytz to conda_env_python.yml to fix python-nopandas build
  • ARROW-5774 - [Java][Documentation] Document the need to checkout git submodules for flight
  • ARROW-5781 - [Archery] Ensure benchmark clone accepts remote in revision
  • ARROW-5791 - [Python] pyarrow.csv.read_csv hangs + eats all RAM
  • ARROW-5816 - [Release] Parallel curl does not work reliably in verify-release-candidate.sh
  • ARROW-5922 - [Python] Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API
  • PARQUET-1402 - [C++] Parquet files with dictionary page offset as 0 are not readable
  • PARQUET-1405 - Fix writing statistics into DataPageHeader
  • PARQUET-1565 - [C++] Add default case to catch all unhandled physical types
  • PARQUET-1571 - [C++] Fix BufferedInputStream when buffer exactly exhausted
  • PARQUET-1574 - [C++] fix parquet-encoding-test
  • PARQUET-1581 - [C++] Fix undefined behavior in encoding.cc

Apache Arrow 0.13.0 (2019-04-01)

Bug Fixes

  • ARROW-295 - [Documentation] Add DOAP file
  • ARROW-1171 - [C++] Segmentation faults on Fedora 24 with pyarrow-manylinux1 and self-compiled turbodbc
  • ARROW-2392 - [C++] Check schema compatibility when writing a RecordBatch
  • ARROW-2399 - [Rust] Builder<T> should not provide a set() method
  • ARROW-2598 - [Python] table.to_pandas segfault
  • ARROW-3086 - [GLib] GISCAN fails due to conda-shipped openblas
  • ARROW-3096 - [Python] Update Python source build instructions given Anaconda/conda-forge toolchain migration
  • ARROW-3133 - [C++] Remove allocation from Binary Boolean Kernels.
  • ARROW-3133 - [C++] Remove allocations from InvertKernel
  • ARROW-3208 - [C++] Fix Cast dictionary to numeric segfault
  • ARROW-3426 - [CI] Java integration test very verbose
  • ARROW-3564 - [C++] Fix dictionary encoding logic for Parquet 2.0
  • ARROW-3578 - [Release] Resolve all hard and symbolic links in tar.gz
  • ARROW-3593 - [R] CI builds failing due to GitHub API rate limits
  • ARROW-3606 - [Crossbow] Fix flake8 crossbow warnings
  • ARROW-3669 - [Python] Raise error on Numpy byte-swapped array
  • ARROW-3843 - [C++][Python] Allow a “degenerate” Parquet file with no columns
  • ARROW-3923 - [Java] JDBC Time Fetches Without Timezone
  • ARROW-4007 - [Java][Plasma] Plasma JNI tests failing
  • ARROW-4050 - [Python][Parquet] core dump on reading parquet file
  • ARROW-4081 - [Go] Sum methods panic when the array is empty
  • ARROW-4104 - [Java] race in AllocationManager during release
  • ARROW-4108 - [Python/Java] Spark integration tests do not work
  • ARROW-4117 - [Python] “asv dev” command fails with latest revision
  • ARROW-4140 - [C++][Gandiva] Compiled LLVM bitcode file path may result in libraries being non-relocatable
  • ARROW-4145 - [C++] Find Windows-compatible strptime implementation
  • ARROW-4181 - [Python] Fixes for Numpy struct array conversion
  • ARROW-4192 - [CI] Fix broken dev/run_docker_compose.sh script
  • ARROW-4213 - [Flight] Fix incompatibilities between C++ and Java
  • ARROW-4244 - [Format] Clarify padding/alignment rationale/recommendation.
  • ARROW-4250 - [C++] adding explicit epsilon for ApproxEquals and corresponding assert macro
  • ARROW-4252 - [C++] Fix missing Status code and newline
  • ARROW-4253 - [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
  • ARROW-4254 - [C++][Gandiva] Build with Boost from Ubuntu Trusty apt
  • ARROW-4255 - [C++] Eagerly initialize name_to_index_ to avoid race
  • ARROW-4261 - [C++] Make CMake paths for IPC, Flight, Thrift, and Plasma subproject compatible
  • ARROW-4264 - [C++] Clarify use of DCHECKs in Kernels
  • ARROW-4267 - [C++/Parquet] Handle duplicate and struct columns in RowGroup reads
  • ARROW-4274 - [C++][Gandiva] split decimal into two parts
  • ARROW-4275 - [C++][Gandiva] Fix slow decimal test
  • ARROW-4280 - Update README.md to reflect parquet deps
  • ARROW-4282 - [Rust] builder benchmark is broken
  • ARROW-4284 - [C#] File / Stream serialization fails due to type mismatch / missing footer
  • ARROW-4295 - [C++][Plasma] Fix incorrect log message
  • ARROW-4296 - [Plasma] Use one mmap file by default, prevent crash with -f
  • ARROW-4308 - [Python] pyarrow has a hard dependency on pandas
  • ARROW-4311 - [Python] Regression on pq.ParquetWriter incorrectly handling source string
  • ARROW-4312 - [C++] Only run 2 * os.cpu_count() clang-format instances at once
  • ARROW-4319 - [C++][Plasma] plasma/store.h pulls in flatbuffer dependency
  • ARROW-4320 - [C++] Add tests for non-contiguous tensors
  • ARROW-4322 - [C++] Don't use _GLIBCXX_USE_CXX11_ABI=0 anymore in docker scripts
  • ARROW-4323 - [Packaging] Fix failing OSX clang conda forge builds
  • ARROW-4326 - [C++] Development instructions in python/development.rst will not work for many Linux distros with new conda-forge toolchain
  • ARROW-4327 - [Python] Add requirements-build.txt convenience file
  • ARROW-4328 - Add an ARROW_USE_OLD_CXXABI configure var to R
  • ARROW-4329 - Python should include the parquet headers
  • ARROW-4342 - [Gandiva][Java] Ignore flaky test.
  • ARROW-4347 - [CI][Python] Also run Python builds when Java affected.
  • ARROW-4349 - [C++] Add static linking option for benchmarks, fix Windows benchmark build failures
  • ARROW-4351 - [C++] Fix CMake errors when neither building shared libraries nor tests
  • ARROW-4355 - [C++] Reorder testing code into src/arrow/testing
  • ARROW-4360 - [C++] Query homebrew for Thrift
  • ARROW-4364 - [C++] Fix CHECKIN warnings
  • ARROW-4366 - [Docs] Change extension from format/README.md to format/README.rst
  • ARROW-4367 - [C++] StringDictionaryBuilder segfaults on Finish with only null entries
  • ARROW-4368 - [Docs] Fix install document for Ubuntu 16.04 or earlier
  • ARROW-4370 - [Python][Bool] to pandas
  • ARROW-4374 - [C++] DictionaryBuilder does not correctly report length and null_count
  • ARROW-4381 - [CI] Update linter container build instructions
  • ARROW-4382 - [C++] Improve new cpplint output readability
  • ARROW-4384 - [C++] Running “format” target on new Windows 10 install opens “how do you want to open this file” dialog
  • ARROW-4385 - [Packaging] Fix PyArrow version update pattern on release
  • ARROW-4389 - [R] Don't install clang-tools in test job
  • ARROW-4395 - [JS] Fix ts-node error running bin/arrow2csv
  • ARROW-4400 - [CI] Switch to https repo for llvm
  • ARROW-4403 - [Rust] Fix format errors
  • ARROW-4404 - [CI] AppVeyor toolchain build does not build anything
  • ARROW-4407 - [C++] Cache compiler for CMake external projects
  • ARROW-4410 - [C++] Fix edge cases in InvertKernel
  • ARROW-4413 - [Python] Fix pa.hdfs.connect() on Python 2
  • ARROW-4414 - [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros
  • ARROW-4417 - [C++] Fix doxygen build
  • ARROW-4420 - [INTEGRATION] Make spark integration test pass and test against spark's master branch
  • ARROW-4421 - [C++][Flight] Handle large RPC messages in Flight
  • ARROW-4434 - [Python] Allow creating trivial StructArray
  • ARROW-4440 - [C++] Revert recent changes to flatbuffers EP causing flakiness
  • ARROW-4457 - [Python] Allow creating Decimal array from Python ints
  • ARROW-4469 - [CI] Pin conda-forge binutils version to 2.31 for now
  • ARROW-4471 - [C++] Pass AR and RANLIB to all external projects
  • ARROW-4474 - Use signed integers in FlightInfo payload size fields
  • ARROW-4480 - [Python] Drive letter removed when writing parquet file
  • ARROW-4487 - [C++] Appveyor toolchain build does not actually build the project
  • ARROW-4494 - [Java] arrow-jdbc JAR is not uploaded on release
  • ARROW-4496 - [Python] Pin to gfortran<4
  • ARROW-4498 - [Plasma] Fix building Plasma with CUDA enabled
  • ARROW-4500 - [C++] Remove pthread / librt hacks causing linking issues in some Linux environments
  • ARROW-4501 - Fix out-of-bounds read in DoubleCrcHash
  • ARROW-4525 - [Rust][Parquet] Enable conversion of ArrowError to ParquetError
  • ARROW-4527 - [Packaging][Linux] Use LLVM 7
  • ARROW-4532 - [Java] fix bug causing very large varchar value buffers
  • ARROW-4533 - [Python] Document how to run hypothesis tests
  • ARROW-4535 - [C++] Fix MakeBuilder to preserve ListType's field name
  • ARROW-4536 - [GLib] Add data_type argument in garrow_list_array_new
  • ARROW-4538 - [Python] Remove index column from subschema in write_to_dataframe
  • ARROW-4549 - [C++] Can't build benchmark code on CUDA enabled build
  • ARROW-4550 - [JS] Fix AMD pattern
  • ARROW-4559 - [Python] Allow Parquet files with special characters in their names
  • ARROW-4563 - [Python] Validate decimal128() precision input
  • ARROW-4571 - [Format] Tensor.fbs file has multiple root_type declarations
  • ARROW-4573 - [Python] Add Flight unit tests
  • ARROW-4576 - [Python] Fix error during benchmarks
  • ARROW-4577 - [C++] Don't set interface link libs on arrow_shared where there are none
  • ARROW-4581 - [C++] Do not require googletest_ep or gbenchmark_ep for library targets
  • ARROW-4582 - [Python/C++] Acquire the GIL on Py_INCREF
  • ARROW-4584 - [Python] Add built wheel to manylinux1 dockerignore
  • ARROW-4585 - [C++] Add protoc dependency to flight_testing
  • ARROW-4587 - [C++] Fix segfaults around DoPut implementation
  • ARROW-4597 - [C++] Targets for system Google Mock shared library are missing
  • ARROW-4601 - [Python] Add license header to dockerignore
  • ARROW-4606 - [Rust] [DataFusion] FilterRelation created RecordBatch with empty schema
  • ARROW-4608 - [C++] cmake script assumes that double-conversion installs static libs
  • ARROW-4617 - [C++] Support double-conversion<3.1
  • ARROW-4624 - [C++] Fix building benchmarks
  • ARROW-4629 - [Python] Pandas arrow conversion slowed down by imports
  • ARROW-4635 - [Java] allocateNew to use last capacity
  • ARROW-4639 - [CI] Switch off GFLAGS_SHARED for osx
  • ARROW-4641 - [C++][Flight] Suppress strict aliasing warnings from “unsafe” casts in client.cc
  • ARROW-4642 - [R] change f to file in read_parquet_file()
  • ARROW-4653 - [C++] Fix bug in decimal multiply
  • ARROW-4654 - [C++] Explicit flight.cc source dependencies
  • ARROW-4657 - Don't build benchmarks in release verify script
  • ARROW-4658 - [C++] Shared gflags is also a run-time conda requirement
  • ARROW-4659 - [CI] ubuntu/debian nightlies fail because of missing gandiva files
  • ARROW-4660 - [C++] Use set_target_properties for defining GFLAGS_IS_A_DLL
  • ARROW-4664 - [C++] Do not execute expressions inside DCHECK macros in release builds
  • ARROW-4669 - [Java] Add validity checks to slice
  • ARROW-4672 - [CI] Fix clang-7 build entry
  • ARROW-4680 - [CI][Rust] Travis CI builds fail with latest Rust 1.34.0…
  • ARROW-4684 - [Python] CI failures in test_cython.py
  • ARROW-4687 - [Python] Stop Flight server on incoming signals
  • ARROW-4688 - [C++][Parquet] Chunk binary column reads at 2^31 - 1 byte boundaries to avoid splitting chunk inside nested string cell
  • ARROW-4696 - Better CUDA detection in release verification script
  • ARROW-4699 - [C++] remove json chunker's requirement of null terminated buffers
  • ARROW-4704 - [GLib][CI] Ensure killing plasma_store_server
  • ARROW-4710 - [C++][R] New linting script skips files with “cpp” extension
  • ARROW-4712 - [C++][CI] Fix build warnings in sum.cc with clang
  • ARROW-4721 - [Rust][DataFusion] Propagate schema in filter
  • ARROW-4724 - [C++][CI] Enable Python build and test in MinGW build
  • ARROW-4728 - [JS] Fix Table#assign when passed zero-length RecordBatches
  • ARROW-4737 - run C# tests in CI
  • ARROW-4744 - [C++][CI] Change mingw builds back to debug. Clean up some version warnings
  • ARROW-4750 - [C++] RapidJSON triggers Wclass-memaccess on GCC 8+
  • ARROW-4760 - [C++] protobuf 3.7 defines EXPECT_OK that clashes with Arrow's macro
  • ARROW-4766 - [C++] Fix empty array cast segfault
  • ARROW-4767 - [C#] ArrowStreamReader crashes while reading the end of a stream
  • ARROW-4768 - [C++][CI] Don't run flaky tests in MinGW build
  • ARROW-4774 - [C++] Fix FileWriter::WriteTable segfault
  • ARROW-4775 - [Site] Site navbar cannot be expanded
  • ARROW-4783 - [C++][CI] Disable arrow thread-pool test on mingw to avoid appveyor timeouts
  • ARROW-4793 - [Ruby] Suppress unused variable warning
  • ARROW-4796 - [Flight/Python] Keep underlying Python object alive in FlightServerBase.do_get
  • ARROW-4802 - [Python] Follow symlinks when deriving Hadoop classpath for HDFS
  • ARROW-4807 - [Rust] Fix csv_writer benchmark
  • ARROW-4811 - [C++] Fix misbehaving CMake dependency on flight_grpc_gen
  • ARROW-4813 - [Ruby] Add tests for == and !=
  • ARROW-4820 - [Python] Derived Hadoop classpath is not correct
  • ARROW-4822 - [C++/Python] Check for None on calls to equals
  • ARROW-4828 - [Python] manylinux1 docker-compose context should be python/manylinux1
  • ARROW-4850 - [CI] Ensure integration_test.py returns non-zero on failures
  • ARROW-4853 - [Rust] Array slice doesn't work on ListArray and StructArray
  • ARROW-4857 - [C++/Python/CI] docker-compose in manylinux1 crossbow jobs too old
  • ARROW-4866 - [C++] Fix zstd_ep build for Debug, static CRT builds. Add separate CMake variable for propagating compiler toolchain to ExternalProjects
  • ARROW-4867 - [Python] Respect ordering of columns argument passed to Table.from_pandas
  • ARROW-4869 - [C++] Fix gmock usage in compute/kernels/util-internal-test.cc
  • ARROW-4870 - [Ruby] Fix msys2_mingw_dependencies
  • ARROW-4871 - [Java/Flight] Handle large Flight messages
  • ARROW-4872 - [Python] Keep backward compatibility for ParquetDatasetPiece
  • ARROW-4879 - [C++] cmake can't use conda's flatbuffers
  • ARROW-4881 - [C++] remove references to ARROW_BUILD_TOOLCHAIN
  • ARROW-4900 - [C++] polyfill __cpuidex on mingw-w64
  • ARROW-4903 - [C++] Fix static/shared-only builds
  • ARROW-4906 - [Format] Document that the SparseMatrixIndexCSR format is sorted
  • ARROW-4918 - [C++] Add cmake-format to pre-commit
  • ARROW-4928 - [Python] Fix Hypothesis test failures
  • ARROW-4931 - [C++] CMake fails on gRPC ExternalProject
  • ARROW-4938 - [GLib] Undefined symbols error occurred when GIR file is being generated.
  • ARROW-4942 - [Ruby] Remove needless omits in tests
  • ARROW-4948 - [JS] Nightly test failure
  • ARROW-4950 - [C++] Fix CMake 3.2 build
  • ARROW-4952 - [C++] Floating-point comparisons should consider NaNs unequal
  • ARROW-4953 - [Ruby] Not loading libarrow-glib
  • ARROW-4954 - [Python] Fix test failure with Flight enabled
  • ARROW-4958 - [C++] Parquet benchmarks depend on its static test libs
  • ARROW-4961 - [C++] Add documentation note that GTest_SOURCE=BUNDLED is currently required on Windows
  • ARROW-4962 - [C++] Warning level to CHECKIN can't compile on modern GCC
  • ARROW-4976 - [JS] Invalidate RecordBatchReader node/dom streams on reset()
  • ARROW-4982 - [GLib][CI] Run tests on AppVeyor
  • ARROW-4984 - Check if Flight gRPC server starts properly
  • ARROW-4986 - [CI] Travis fails to install llvm@7
  • ARROW-4989 - [C++] Find re2 on Ubuntu if asked to
  • ARROW-4991 - [CI] Bump travis node version to 11.12
  • ARROW-4997 - [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read.
  • ARROW-5009 - [C++] Remove using std::.* where I could find them
  • ARROW-5010 - [Release] Fix source release docker
  • ARROW-5012 - [C++] Install testing headers
  • ARROW-5023 - [Release] Fix default value syntax in 02-source.sh
  • ARROW-5024 - [Release] Fix missing variable with --arrow-version
  • ARROW-5025 - [Python][Packaging] Fix gandiva.dll detection
  • ARROW-5026 - [Python][Packaging] Fix gandiva.dll detection on non Windows
  • ARROW-5029 - [C++] Fix compilation warnings in release mode
  • ARROW-5031 - [Dev] Run CUDA Python tests in release verification script
  • ARROW-5042 - [Release] Use the correct dependency source in verification script
  • ARROW-5043 - [Release][Ruby] Fix dependency error in verification script
  • ARROW-5044 - [Release][Rust] Use stable toolchain for format check in verification script
  • ARROW-5046 - [Release][C++] Exclude fragile Plasma test from verification script
  • ARROW-5047 - [Release] Always set up parquet-testing in verification script
  • ARROW-5048 - [Release][Rust] Set up arrow-testing in verification script
  • ARROW-5050 - [C++] cares_ep should build before grpc_ep
  • ARROW-5087 - [Debian] APT repository no longer contains libarrow-dev
  • ARROW-5658 - [JAVA] Provide ability to resync VectorSchemaRoot if types change
  • PARQUET-1482 - [C++] Add branch to TypedRecordReader::ReadNewPage for …
  • PARQUET-1494 - [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards
  • PARQUET-1532 - [C++] Fix build error with MinGW

New Features and Improvements

  • ARROW-47 - [C++] Preliminary arrow::Scalar object model
  • ARROW-331 - [Doc] Add statement about Python 2.7 compatibility
  • ARROW-549 - [C++] Add arrow::Concatenate function to combine multiple arrays into a single Array
  • ARROW-572 - [C++] Apply visitor pattern in IPC metadata
  • ARROW-585 - [C++] Experimental public API for user-defined extension types and arrays
  • ARROW-694 - [C++] Initial parser interface for reading JSON into RecordBatches
  • ARROW-1425 - [Python][Documentation] Examples of converting Timestamps to/from pandas via arrow
  • ARROW-1572 - [C++] Implement “value counts” kernels for tabulating value frequencies
  • ARROW-1639 - [Python] Serialize RangeIndex as metadata via Table.from_pandas instead of converting to a column of integers
  • ARROW-1642 - [GLib] Build GLib using Meson in Appveyor
  • ARROW-1807 - [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers
  • ARROW-1896 - [C++] Do not allocate memory inside CastKernel. Clean up template instantiation to not generate dead identity cast code
  • ARROW-2015 - [Java] Replace Joda time with Java 8 time
  • ARROW-2022 - [Format] Add metadata to message
  • ARROW-2112 - [C++] Enable cpplint to be run on Windows
  • ARROW-2243 - [C++] Enable IPO/LTO
  • ARROW-2409 - [Rust] Deny warnings in CI.
  • ARROW-2460 - [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>
  • ARROW-2487 - [C++] Provide a variant of AppendValues that takes bytemaps for the nullability
  • ARROW-2523 - [Rust] Implement CAST operations for arrays
  • ARROW-2620 - [Rust] Integrate memory pool abstraction with rest of codebase
  • ARROW-2627 - [Python] Add option to pass memory_map argument to ParquetDataset
  • ARROW-2904 - [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc
  • ARROW-3066 - [Wiki] Add “How to contribute” to developer wiki
  • ARROW-3084 - [Python] Do we need to build both unicode variants of pyarrow wheels?
  • ARROW-3107 - [C++] arrow::PrettyPrint for Column instances
  • ARROW-3121 - [C++] Mean aggregate kernel
  • ARROW-3123 - [C++] Implement Count aggregate kernel
  • ARROW-3135 - [C++] Add helper functions for validity bitmap propagation in kernel context
  • ARROW-3149 - [C++] Use gRPC (when it exists) from conda-forge for CI builds
  • ARROW-3162 - [Python][Flight] Enable implementing Flight servers in Python
  • ARROW-3162 - Flight Python bindings
  • ARROW-3239 - [C++] Implement simple random array generation
  • ARROW-3255 - [C++/Python] Migrate Travis CI jobs off Xcode 6.4
  • ARROW-3289 - [C++] Implement Flight DoPut
  • ARROW-3292 - [C++] Test Flight RPC in Travis CI
  • ARROW-3295 - [Packaging] Package gRPC libraries in conda-forge for use in builds, packaging
  • ARROW-3297 - [Python] Python bindings for Flight C++ client
  • ARROW-3311 - [R] Functions for deserializing IPC components from arrow::Buffer or from IO interface
  • ARROW-3328 - [Flight] Allow for optional unique flight identifier to be sent with FlightGetInfo
  • ARROW-3361 - [R] Also run cpplint on Rcpp source files
  • ARROW-3364 - [Docs] Add docker-compose integration documentation
  • ARROW-3367 - [INTEGRATION] Port Spark integration test to the docker-compose setup
  • ARROW-3422 - [C++] Uniformly add ExternalProject builds to the “toolchain” target. Fix gRPC EP build on Linux
  • ARROW-3434 - [Packaging] Add Apache ORC C++ library to conda-forge
  • ARROW-3435 - [C++] Add option to use dynamic linking with re2
  • ARROW-3511 - [Gandiva] Link filter and project operations
  • ARROW-3532 - [Python] Emit warning when looking up duplicate struct or schema fields
  • ARROW-3550 - [C++] use kUnknownNullCount for the default null_count argument
  • ARROW-3554 - [C++] Reverse traits for C++
  • ARROW-3594 - [Packaging] Build “cares” library in conda-forge
  • ARROW-3595 - [Packaging] Build boringssl in conda-forge
  • ARROW-3596 - [Packaging] Build gRPC in conda-forge
  • ARROW-3619 - [R] Expose global thread pool options
  • ARROW-3631 - [C#] Add Appveyor configuration
  • ARROW-3653 - [C++][Python] Support data copying between different GPU devices
  • ARROW-3735 - [Python] Add test for calling cast() with None
  • ARROW-3761 - [R] Bindings for CompressedInputStream, CompressedOutputStream
  • ARROW-3763 - [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
  • ARROW-3769 - [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
  • ARROW-3770 - [C++] Validate schema for each table written with parquet::arrow::FileWriter
  • ARROW-3816 - [R] nrow.RecordBatch method
  • ARROW-3824 - [R] Add basic build and test documentation
  • ARROW-3838 - [Rust] CSV Writer
  • ARROW-3846 - [Gandiva][C++] Build Gandiva C++ libraries and get unit tests passing on Windows
  • ARROW-3882 - [Rust] Cast Kernel for most types
  • ARROW-3903 - [Python] Random array generator for Arrow conversion and Parquet testing
  • ARROW-3926 - [Python] Add Gandiva bindings to Python manylinux1 wheels
  • ARROW-3951 - [Go] implement a CSV writer
  • ARROW-3954 - [Rust] Add Slice to Array and ArrayData
  • ARROW-3965 - [Java] JDBC-To-Arrow Configuration
  • ARROW-3966 - [Java] JDBC Column Metadata in Arrow Field Metadata
  • ARROW-3972 - [C++] Migrate to LLVM 7. Add option to disable using ld.gold
  • ARROW-3981 - [C++] Rename json.h
  • ARROW-3985 - [C++] Let ccache preserve comments
  • ARROW-4012 - [Website] Add documentation how to install Apache Arrow on MSYS2
  • ARROW-4014 - [C++] Fix “LIBCMT” warnings on MSVC
  • ARROW-4023 - [Gandiva] Address long CI times in macOS builds
  • ARROW-4024 - [Python] Raise minimal Cython version to 0.29
  • ARROW-4031 - [C++] Refactor bitmap building
  • ARROW-4040 - [Rust] Add array_ops method for filtering an array
  • ARROW-4056 - [C++] Unpin boost-cpp in conda_env_cpp.yml
  • ARROW-4061 - [Rust][Parquet] Implement spaced version for non-diction…
  • ARROW-4068 - [Gandiva] Support building with Xcode 6.4
  • ARROW-4071 - [Rust] Add rustfmt as a pre-commit hook
  • ARROW-4072 - [Rust] Set default value for PARQUET_TEST_DATA
  • ARROW-4092 - [Rust] Implement common Reader / DataSource trait for CSV and Parquet
  • ARROW-4094 - [Python] Store RangeIndex in Parquet files as metadata rather than a physical data column
  • ARROW-4110 - [C++] Do not generate distinct cast kernels when input and output type are the same
  • ARROW-4123 - [C++] Enable linting tools to be run on Windows
  • ARROW-4124 - [C++] Draft Aggregate and Sum kernels
  • ARROW-4142 - [Java] JDBC Array -> Arrow ListVector
  • ARROW-4165 - [C++] Port cpp/apidoc/Windows.md and other files to Sphinx / rst
  • ARROW-4180 - [Java] Make CI tests use logback.xml
  • ARROW-4196 - [Rust] Add explicit SIMD vectorization for arithmetic ops in “array_ops”
  • ARROW-4198 - [Gandiva] Added support to cast timestamp
  • ARROW-4204 - [Gandiva] add support for decimal subtract
  • ARROW-4205 - [Gandiva] Support for decimal multiply
  • ARROW-4206 - [Gandiva] support decimal divide and mod
  • ARROW-4212 - [C++][Python] CudaBuffer view of arbitrary device memory object
  • ARROW-4230 - [C++] Fix Flight builds with gRPC/Protobuf/c-ares
  • ARROW-4232 - [C++] Follow conda-forge compiler ABI migration
  • ARROW-4234 - [C++] Improve memory bandwidth test
  • ARROW-4235 - [GLib] Use “column_builder” in GArrowRecordBatchBuilder
  • ARROW-4236 - [Java] Distinct plasma client create exceptions
  • ARROW-4245 - [Rust] Add Rustdoc header to source files
  • ARROW-4247 - [Packaging] Update verify script for 0.12.0
  • ARROW-4251 - [C++][Release] Add option to set ARROW_BOOST_VENDORED environment variable in verify-release-candidate.sh
  • ARROW-4262 - [Website] Preview to Spark with Arrow and R improvements
  • ARROW-4263 - [Rust] Donate DataFusion
  • ARROW-4265 - [C++] Automatic conversion between Table and std::vector<std::tuple<..>>
  • ARROW-4268 - [C++] Native C type TypeTraits
  • ARROW-4271 - [Rust] Move Parquet specific info to Parquet Readme
  • ARROW-4273 - [Release] Fix verification script to use cf201901 conda-forge label
  • ARROW-4277 - [C++] Add gmock to the toolchain
  • ARROW-4281 - [CI] Use Ubuntu Xenial VMs on Travis-CI
  • ARROW-4285 - [Python] Use proper builder interface for serialization
  • ARROW-4287 - [C++] Ensure minimal bison version on OSX for Thrift
  • ARROW-4289 - [C++] Forward AR and RANLIB to thirdparty builds
  • ARROW-4290 - [C++/Gandiva] Support detecting correct LLVM version in Homebrew
  • ARROW-4291 - [Dev] Support selecting features in release verification scripts
  • ARROW-4294 - [C++][Plasma] Add support for evicting Plasma objects to external store
  • ARROW-4297 - [C++] Fix build error with MinGW-w64 32-bit
  • ARROW-4298 - [Java] Add javax.annotation-api dependency for JDK >= 9
  • ARROW-4299 - [Ruby] Depend on the same version as Red Arrow
  • ARROW-4300 - [C++] Restore apache-arrow Homebrew recipe and define process for maintaining and updating for releases
  • ARROW-4303 - [Gandiva/Python] Build LLVM with RTTI in manylinux1 container
  • ARROW-4305 - [Rust] Fix parquet version number in README
  • ARROW-4307 - [C++] Fix Doxygen warnings
  • ARROW-4310 - [Website] Update install document for 0.12.0
  • ARROW-4313 - Define general benchmark database schema
  • ARROW-4315 - [Website] Add Go and Rust to list of supported languages
  • ARROW-4318 - [C++] Add Tensor::CountNonZero
  • ARROW-4321 - [CI] Setup conda-forge channel globally in docker containers
  • ARROW-4330 - [C++] More robust discovery of pthreads
  • ARROW-4331 - [C++] Extend Scalar Datum to support more types
  • ARROW-4332 - [Website] Improve documentation for publishing site
  • ARROW-4334 - [CI] Setup conda-forge channel globally in travis builds
  • ARROW-4335 - [C++] Better document sparse tensor support
  • ARROW-4336 - [C++] Change default build type to RELEASE
  • ARROW-4339 - [C++][Python] Developer documentation overhaul for 0.13 release
  • ARROW-4340 - [C++][CI] Build IWYU for LLVM 7 in iwyu docker-compose job
  • ARROW-4341 - [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder
  • ARROW-4344 - [Java] Further cleanup mvn output, upgrade rat plugin
  • ARROW-4345 - [C++] Add Apache 2.0 license file to the Parquet-testing repository
  • ARROW-4346 - [C++] Fix class-memaccess warning on gcc 8.x
  • ARROW-4352 - [C++] Add support for system Google Test
  • ARROW-4353 - [CI] Add MinGW builds
  • ARROW-4358 - [CI] Restore support for trusty in CI
  • ARROW-4361 - [Website] Update committers list
  • ARROW-4362 - [Java] Test OpenJDK 11 in CI
  • ARROW-4363 - [CI][C++] Add CMake format checks
  • ARROW-4372 - [C++] Embed precompiled bitcode in the gandiva library
  • ARROW-4373 - [Packaging] Travis fails to deploy conda packages on OSX
  • ARROW-4375 - [CI] Sphinx dependencies were removed from docs conda environment
  • ARROW-4376 - [Rust] Implement from_buf_reader for csv::Reader
  • ARROW-4377 - [Rust] Implement std::fmt::Debug for PrimitiveArrays
  • ARROW-4379 - [Python] Register serializers for collections.Counter and collections.deque.
  • ARROW-4383 - [C++] Use CMake's standard find features
  • ARROW-4386 - [Rust] Temporal array support
  • ARROW-4388 - [Go] add DimNames() method to tensor Interface
  • ARROW-4393 - [Rust] coding style: apply 90 characters per line limit
  • ARROW-4396 - [JS] Update Typedoc for TypeScript 3.2
  • ARROW-4397 - [C++] Add dim_names in Tensor and SparseTensor
  • ARROW-4399 - [C++] Do not use extern template class with NumericArray and NumericTensor
  • ARROW-4401 - [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency
  • ARROW-4406 - [Python] Exclude HDFS directories in S3 from ParquetManifest
  • ARROW-4408 - [CPP/Doc] Remove outdated Parquet documentation
  • ARROW-4422 - [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
  • ARROW-4423 - [C++] Upgrade vendored gmock/gtest to 1.8.1
  • ARROW-4424 - [Python] Install tensorflow and keras-preprocessing in manylinux1 container
  • ARROW-4425 - Add link to ‘Contributing’ page in the top-level Arrow README
  • ARROW-4430 - [C++] Fix untested TypedByteBuffer::Append method
  • ARROW-4431 - [C++] Fixes for gRPC vendored builds
  • ARROW-4435 - Minor fixups to csharp .sln and .csproj file
  • ARROW-4436 - [Documentation] Update building.rst to reflect pyarrow req
  • ARROW-4442 - [JS] Add explicit type annotation to Chunked typeId getter
  • ARROW-4444 - [Testing] Add DataFusion test files to arrow-testing repo
  • ARROW-4445 - [C++][Gandiva] Run Gandiva-LLVM tests in Appveyor
  • ARROW-4446 - [C++][Python] Run Gandiva C++ unit tests in Appveyor, get build and tests working in Python
  • ARROW-4448 - [Java][Flight] Disable flaky TestBackPressure
  • ARROW-4449 - [Rust] Convert File to T: Read + Seek for schema inference
  • ARROW-4454 - [C++] fix unused parameter warnings
  • ARROW-4455 - [Plasma] Suppress class-memaccess warnings
  • ARROW-4459 - [Testing] Add arrow-testing repo as submodule
  • ARROW-4460 - [Website] DataFusion Blog Post
  • ARROW-4461 - [C++] Expose bit map operations that work with raw pointers
  • ARROW-4462 - [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
  • ARROW-4464 - [Rust][DataFusion] Add support for LIMIT
  • ARROW-4466 - [Rust][DataFusion] Add support for Parquet data source
  • ARROW-4468 - [Rust] Implement BitAnd/BitOr for &Buffer (with SIMD) (#3571)
  • ARROW-4472 - [Website][Python] Blog post about string memory use work in Arrow 0.12
  • ARROW-4475 - [Python] Fix recursive serialization of self-containing objects
  • ARROW-4476 - [Rust][DataFusion] Update README to cover DataFusion and new testing git submodule
  • ARROW-4481 - [Website] Remove generated specification docs from site after docs migration
  • ARROW-4483 - [Website] Add myself to contributors.yaml to fix broken link in blog post
  • ARROW-4485 - [CI] Determine maintenance approach to pinned conda-forge binutils package
  • ARROW-4486 - [Python][CUDA] Add base argument to foreign_buffer
  • ARROW-4488 - [Rust][u8] > for Buffer does not ensure correct padding
  • ARROW-4489 - [Rust] PrimitiveArray.value_slice performs bounds checking when it should not
  • ARROW-4490 - [Rust] Add explicit SIMD vectorization for boolean ops in “array_ops”
  • ARROW-4491 - [Python] Use StringConverter and stringstream instead of std::stoi and std::to_string
  • ARROW-4499 - [CI] Unpin flake8 in lint script, fix warnings in dev/
  • ARROW-4502 - [C#] Add support for zero-copy reads
  • ARROW-4506 - [Ruby] Add Arrow::RecordBatch#raw_records
  • ARROW-4513 - [Rust] Implement BitAnd/BitOr for &Bitmap
  • ARROW-4517 - [JS] remove version number as it is not used
  • ARROW-4518 - [JS] add jsdelivr to package.json
  • ARROW-4528 - [C++] Update lint docker container to LLVM-7
  • ARROW-4529 - [C++] Add test for BitUtil::RoundDown
  • ARROW-4531 - [C++] Support slices for SumKernel
  • ARROW-4537 - [CI] Suppress shell warning on travis-ci
  • ARROW-4539 - [Java] Fix child vector count for lists. (#3625)
  • ARROW-4540 - [Rust] Basic JSON reader
  • ARROW-4543 - [C#] Update Flat Buffers code to latest version
  • ARROW-4546 - Update LICENSE.txt with parquet-cpp licenses
  • ARROW-4547 - [Python][Documentation] Update python/development.rst with instructions for CUDA-enabled builds
  • ARROW-4556 - [Rust] Preserve JSON field order when inferring schema
  • ARROW-4558 - [C++][Flight] Implement gRPC customizations without UB
  • ARROW-4560 - [R] array() needs to take single input, not ...
  • ARROW-4562 - [C++] Avoid copies when serializing Flight data
  • ARROW-4564 - [C++] IWYU docker image silently fails
  • ARROW-4565 - [R] Fix decimal record batches with no null values
  • ARROW-4568 - [C++] Add version macros to headers
  • ARROW-4572 - [C++] Remove memory zeroing from PrimitiveAllocatingUnaryKernel
  • ARROW-4583 - [Plasma] Fix some small bugs reported by code scan tool
  • ARROW-4586 - [Rust] Remove arrow/mod.rs as it is not needed
  • ARROW-4589 - [Rust] Projection push down query optimizer rule
  • ARROW-4590 - [Rust] Add explicit SIMD vectorization for comparison ops in “array_ops”
  • ARROW-4592 - [GLib] Stop configure immediately when GLib isn't available
  • ARROW-4593 - [Ruby][out_of_range] returns nil
  • ARROW-4594 - [Ruby] returns Arrow::Struct instead of Arrow::Array
  • ARROW-4595 - [Rust] Implement Table API (a.k.a DataFrame)
  • ARROW-4598 - [CI] Remove needless LLVM_DIR for macOS
  • ARROW-4599 - [C++] Add support for system GFlags
  • ARROW-4602 - [Rust][DataFusion] Integrate query optimizer with ExecutionContext
  • ARROW-4603 - [Rust] [DataFusion] Execution context should allow in-memory data sources to be registered
  • ARROW-4604 - [Rust] [DataFusion] Add benchmarks for SQL query execution
  • ARROW-4605 - [Rust] Move filter and limit code from DataFusion into compute module
  • ARROW-4609 - [C++] Use google benchmark from toolchain
  • ARROW-4610 - [Plasma] Avoid Crash in Plasma Java Client
  • ARROW-4611 - [C++] Rework CMake logic
  • ARROW-4612 - [Python] Use cython from PyPI for windows wheels build
  • ARROW-4613 - [C++] Set CMAKE_INSTALL_LIBDIR in gtest thirdparty build
  • ARROW-4614 - [C++/CI] Activate flight build in ci/docker_build_cpp.sh
  • ARROW-4615 - [C++] Add checked_pointer_cast
  • ARROW-4616 - [C++] Log message in BuildUtils as STATUS
  • ARROW-4618 - [Docker] Makefile to build dependent docker images
  • ARROW-4619 - [R] Fix the autobrew script
  • ARROW-4620 - [C#] Add unit tests for “Types” in arrow/csharp
  • ARROW-4623 - [R] update Rcpp version
  • ARROW-4628 - [Rust][DataFusion] Implement type coercion query optimizer rule
  • ARROW-4632 - [Ruby] Add BigDecimal#to_arrow
  • ARROW-4634 - [Rust][Parquet] Reorganize test_common
  • ARROW-4637 - [Python] Conditionally import pandas symbols if they are used. Do not require pandas as a test dependency
  • ARROW-4638 - [R] install instructions using brew
  • ARROW-4640 - [Python] Add docker-compose configuration to build and test the project without pandas installed
  • ARROW-4643 - [C++] Force compiler diagnostic colors
  • ARROW-4644 - [C++/Docker] Build Gandiva in the docker containers
  • ARROW-4645 - [C++/Packaging] Ship Gandiva with OSX and Windows wheels
  • ARROW-4646 - [C++/Packaging] Ship gandiva with the conda-forge packages
  • ARROW-4655 - [Packaging] Parallelize binary upload
  • ARROW-4662 - [Python] Add support of type_codes in UnionType
  • ARROW-4667 - [C++] Suppress unused function warnings with MinGW
  • ARROW-4670 - [Rust] array_ops::sum performance optimizations
  • ARROW-4671 - [C++] MakeBuilder doesn't support Type::DICTIONARY
  • ARROW-4673 - [C++] Implement Scalar::Equals and Datum::Equals
  • ARROW-4676 - [C++] Add support for debug build with MinGW
  • ARROW-4678 - [Rust] Minimize unstable feature usage
  • ARROW-4679 - [Rust] Implement in-memory data source for DataFusion
  • ARROW-4681 - [Rust][DataFusion] Partition aware data sources
  • ARROW-4686 - [Dev] Only accept ‘y’ or ‘n’ in merge_arrow_pr.py prompts
  • ARROW-4689 - [Go] Add support for wasm
  • ARROW-4690 - Building TensorFlow compatible wheels for Arrow
  • ARROW-4692 - [Flight] Explain sidecar in a bit more detail
  • ARROW-4693 - [CI] Build boost with multiprecision
  • ARROW-4697 - [C++] Add URI parsing facility
  • ARROW-4703 - [C++] Upgrade dependency versions
  • ARROW-4705 - [Rust] Improve error handling in csv reader
  • ARROW-4707 - [C++] moving BitsetStack to BitUtil::
  • ARROW-4718 - [C#] Add ArrowStreamReader/Writer ctor with bool leaveOpen
  • ARROW-4727 - [Rust] Add equality check for schemas
  • ARROW-4730 - [C++] Add docker-compose entry for testing Fedora build with system packages
  • ARROW-4731 - [C++] Add docker-compose entry for testing Ubuntu Xenial build with system packages
  • ARROW-4732 - [C++] Add docker-compose entry for testing Debian Testing build with system packages
  • ARROW-4733 - [C++] Add CI entry that builds without the conda-forge toolchain but with system packages
  • ARROW-4734 - [Go] Add option to write a header for CSV writer
  • ARROW-4735 - [Go] Optimize CSV writer CPU/Mem performances
  • ARROW-4739 - [Rust] LogicalPlan can now be passed to threads
  • ARROW-4740 - [Java] Upgrade to JUnit 5.
  • ARROW-4743 - [Java] Add javadoc missing in classes and methods in java…
  • ARROW-4745 - [C++][Documentation] Document notes from replicating Static_Crt_Build on windows
  • ARROW-4749 - [Rust] Return Result for RecordBatch::new()
  • ARROW-4751 - [C++] Add pkg-config to conda_env_cpp.yml now that it's available on Windows
  • ARROW-4754 - [Java] Randomize port and retry binding server when bind fails
  • ARROW-4756 - Update readme for triggering docker builds
  • ARROW-4758 - [C++][Flight] Fix intermittent build failure
  • ARROW-4769 - [Rust] Improve array limit fn where max_records >= len
  • ARROW-4772 - [C++] new ORC adapter interface for stripe and row iteration
  • ARROW-4776 - [C++] Add DictionaryBuilder constructor which takes a dictionary array
  • ARROW-4777 - [C++/Python] manylinux1: Update lz4 to 1.8.3
  • ARROW-4778 - [C++/Python] manylinux1: Update Thrift to 0.12.0
  • ARROW-4782 - [C++] Prototype array and scalar expression types to help with building a deferred compute graph
  • ARROW-4786 - [C++/Python] Support better parallelisation in manylinux1 base build
  • ARROW-4789 - [C++] Deprecate and later remove arrow::io::ReadableFileInterface
  • ARROW-4790 - [Python/Packaging] Update manylinux docker image in crossbow task
  • ARROW-4791 - [Rust] Remove unused dependencies
  • ARROW-4794 - [Python] Make pandas an optional test dependency
  • ARROW-4797 - [Plasma] Allow client to check store capacity and avoid server crash
  • ARROW-4801 - [GLib] Suppress Meson warnings
  • ARROW-4808 - [Java][Vector] More util methods to set decimal vector.
  • ARROW-4812 - [Rust] [DataFusion] Table.scan() should return one iterator per partition
  • ARROW-4817 - [Rust] [DataFusion] Small re-org of modules
  • ARROW-4818 - [Rust] [DataFusion] Parquet data source does not support null values
  • ARROW-4826 - [Go] export Flush method for CSV writer
  • ARROW-4831 - [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency
  • ARROW-4833 - [Release] Document how to update the brew formula in the release management guide
  • ARROW-4834 - [R] Feature flag when building parquet
  • ARROW-4835 - [GLib] Add boolean operations
  • ARROW-4837 - [C++] Support c++filt on a custom path in the run-test.sh script
  • ARROW-4839 - [C#] Add NuGet package metadata and instructions.
  • ARROW-4843 - [Rust] [DataFusion] Parquet data source should support DATE
  • ARROW-4846 - [Java] Upgrade to jackson 2.9.8
  • ARROW-4849 - [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages
  • ARROW-4854 - [Rust] Use zero-copy slice for limit kernel
  • ARROW-4855 - [Packaging] Generate default package version based on cpp tags in crossbow.py
  • ARROW-4858 - [Flight/Python] enable FlightDataStream to be implemented in Python
  • ARROW-4859 - [GLib] Add garrow_numeric_array_mean()
  • ARROW-4862 - [C++] Fix gcc warnings in CHECKIN
  • ARROW-4862 - [GLib] Add GArrowCastOptions::allow-invalid-utf8 property
  • ARROW-4865 - [Rust] Support list casts
  • ARROW-4873 - [C++] Clarify documentation about how to use external ARROW_PACKAGE_PREFIX while also using CONDA dependency resolution
  • ARROW-4878 - [C++] Append \Library to CONDA_PREFIX when using ARROW_DEPENDENCY_SOURCE=CONDA
  • ARROW-4882 - [GLib] Add sum functions
  • ARROW-4887 - [GLib] Add garrow_array_count()
  • ARROW-4889 - [C++] Add STATUS messages for Protobuf in CMake
  • ARROW-4891 - [C++] Add zlib headers to include directories
  • ARROW-4892 - [Rust][DataFusion] Move SQL parser and planner into SQL module
  • ARROW-4893 - [C++] conda packages should use inside of conda-build
  • ARROW-4894 - [Rust][DataFusion] Remove all uses of panic! from aggregate.rs
  • ARROW-4895 - [Rust][DataFusion] Move error.rs to root of crate
  • ARROW-4896 - [Rust][DataFusion] Remove all uses of panic! from DataFusion tests
  • ARROW-4897 - [Rust][DataFusion] Improve rustdocs
  • ARROW-4898 - [C++] Old versions of FindProtobuf.cmake use ALL-CAPS for variables
  • ARROW-4899 - [Rust][DataFusion] Remove panic from expression.rs
  • ARROW-4901 - [Go] add AppVeyor CI
  • ARROW-4905 - [C++][Plasma] Remove dlmalloc symbols from client library
  • ARROW-4907 - [CI] Add docker container to inspect docker context
  • ARROW-4908 - [Rust][DataFusion] Add support for date/time parquet types encoded as INT32/INT64
  • ARROW-4909 - [CI] Use hadolint to lint Dockerfiles
  • ARROW-4910 - [Rust][DataFusion] Remove all uses of unimplemented!
  • ARROW-4915 - [GLib][C++] Add arrow::NullBuilder support for GLib
  • ARROW-4922 - [Packaging] Use system libraries for .deb and .rpm
  • ARROW-4924 - [Ruby] Add Decimal128#to_s(scale=nil)
  • ARROW-4925 - [Rust] [DataFusion] Remove duplicate implementations of collect_expr
  • ARROW-4926 - [Rust][DataFusion] Update README for 0.13.0
  • ARROW-4929 - [GLib] Add garrow_array_count_values()
  • ARROW-4932 - [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
  • ARROW-4933 - [R] Autodetect Parquet support using pkg-config
  • ARROW-4937 - [R] Clean pkg-config related logic
  • ARROW-4939 - [Python] Add wrapper for “sum” kernel
  • ARROW-4940 - [Rust] Enable warnings for missing docs, add docs in datafusion
  • ARROW-4944 - [C++] Raise minimal required thrift-cpp to 0.11 in conda environment
  • ARROW-4946 - [C++] Support detection of flatbuffers without FlatbuffersConfig.cmake
  • ARROW-4947 - [Flight/C++] Remove redundant schema parameter to Flight client DoGet
  • ARROW-4951 - [C++] Turn off cpp benchmarks in cpp docker images
  • ARROW-4955 - [GLib] Add garrow_file_is_closed()
  • ARROW-4964 - [Ruby] Add closed check if available on auto close
  • ARROW-4969 - [C++] Set RPATH in correct order for test executables on OSX
  • ARROW-4977 - [Ruby] Add support for building on Windows
  • ARROW-4978 - [Ruby] Fix wrong internal variable name for table data
  • ARROW-4979 - [GLib] Add missing lock to garrow::GIOInputStream
  • ARROW-4980 - [GLib] Use GInputStream as the parent of GArrowInputStream
  • ARROW-4981 - [Ruby] Add support for CSV data encoding conversion
  • ARROW-4983 - [Plasma] Unmap memory upon destruction of the PlasmaClient
  • ARROW-4994 - [Website] Update details for ptgoetz
  • ARROW-4995 - [R] Support for winbuilder for CRAN checks
  • ARROW-4996 - [Plasma] Enable uninstalling of signal handler and fix log_dir
  • ARROW-5003 - [R] remove dependency on withr
  • ARROW-5006 - [R] parquet.cpp does not include enough Rcpp
  • ARROW-5011 - [Release] Add support in source release script for custom git hash
  • ARROW-5013 - [Rust][DataFusion] Refactor runtime expression support
  • ARROW-5014 - [Java] Fix typos in Flight module
  • ARROW-5018 - [Release] Include JavaScript implementation
  • ARROW-5032 - [C++] Install headers in vendored/datetime directory
  • ARROW-5041 - [C++] add GTest_SOURCE=BUNDLED to verify-release-candidate.bat
  • ARROW-5075 - [Release] Add 0.13.0 release note
  • ARROW-5084 - [Website] Blog post / release announcement for 0.13.0
  • PARQUET-1477 - [C++] sync thrift to final crypto spec
  • PARQUET-1508 - [C++] Read ByteArray data directly into arrow::BinaryBuilder and BinaryDictionaryBuilder. Refactor encoders/decoders to use cleaner virtual interfaces
  • PARQUET-1519 - [C++] Hide TypedColumnReader implementation behind virtual interfaces, remove use of “extern template class”
  • PARQUET-1521 - [C++] Use pure virtual interfaces for parquet::TypedColumnWriter, remove use of ‘extern template class’
  • PARQUET-1525 - [C++] remove dependency on getopt in parquet tools

Apache Arrow 0.12.1 (2019-02-25)

Bug Fixes

  • ARROW-3564 - [C++] Fix dictionary encoding logic for Parquet 2.0
  • ARROW-4255 - [C++] Eagerly initialize name_to_index_ to avoid race
  • ARROW-4267 - [C++/Parquet] Handle duplicate and struct columns in RowGroup reads
  • ARROW-4323 - [Packaging] Fix failing OSX clang conda forge builds
  • ARROW-4367 - [C++] StringDictionaryBuilder segfaults on Finish with only null entries
  • ARROW-4374 - [C++] DictionaryBuilder does not correctly report length and null_count
  • ARROW-4492 - [Python] Failure reading Parquet column as pandas Categorical in 0.12
  • ARROW-4501 - Fix out-of-bounds read in DoubleCrcHash
  • ARROW-4582 - [Python/C++] Acquire the GIL on Py_INCREF
  • ARROW-4629 - [Python] Pandas arrow conversion slowed down by imports
  • ARROW-4636 - [Python/Packaging] Crossbow builds for conda-osx fail on upload with Ruby linkage errors
  • ARROW-4647 - [Packaging] dev/release/00-prepare.sh fails for minor version changes

New Features and Improvements

  • ARROW-4291 - [Dev] Support selecting features in release verification scripts
  • ARROW-4298 - [Java] Add javax.annotation-api dependency for JDK >= 9
  • ARROW-4373 - [Packaging] Travis fails to deploy conda packages on OSX

Apache Arrow 0.12.0 (2019-01-20)

New Features and Improvements

  • ARROW-45 - [Python] Add unnest/flatten function for List types
  • ARROW-536 - [C++] Provide non-SSE4 versions of functions that use CPU intrinsics for older processors
  • ARROW-554 - [C++] Add functions to unify dictionary types and arrays
  • ARROW-766 - [C++] Introduce zero-copy “StringPiece” type
  • ARROW-854 - [Format] Add tentative SparseTensor format
  • ARROW-912 - [Python] Recommend that Python developers use -DCMAKE_INSTALL_LIBDIR=lib when building Arrow C++ libraries
  • ARROW-1019 - [C++] Implement compressed streams
  • ARROW-1055 - [C++] GPU support library development
  • ARROW-1262 - [Packaging] Packaging automation in arrow-dist
  • ARROW-1423 - [C++] Create non-owned CudaContext from context handle provided by thirdparty user
  • ARROW-1492 - [C++] Type casting function kernel suite
  • ARROW-1688 - [Java] Fail build on checkstyle warnings
  • ARROW-1696 - [C++] Add (de)compression benchmarks
  • ARROW-1822 - [C++] Add SSE4.2-accelerated hash kernels and use if host CPU supports
  • ARROW-1993 - [Python] Add function for determining implied Arrow schema from pandas.DataFrame
  • ARROW-1994 - [Python] Test against Pandas master
  • ARROW-2183 - [C++] Add helper CMake function for globbing the right header files
  • ARROW-2211 - [C++] Use simpler hash functions for integers
  • ARROW-2216 - [CI] CI descriptions and envars are misleading
  • ARROW-2337 - Use Boost shared libraries in Windows release verification script. Parquet fixes
  • ARROW-2374 - [Rust] Add support for array of List<T>
  • ARROW-2475 - [Format] Confusing array length description
  • ARROW-2476 - [Python/Question] Maximum length of an Array created from ndarray
  • ARROW-2483 - [Rust] use bit-packing for boolean vectors
  • ARROW-2504 - [Website] Add ApacheCon NA link
  • ARROW-2535 - [Python] Provide pre-commit hooks that check flake8
  • ARROW-2560 - [Rust] The Rust README should include Rust-specific information on contributing
  • ARROW-2624 - [Python] Random schema generator for Arrow conversion and Parquet testing
  • ARROW-2637 - [C++/Python] Build support and instructions for development on Alpine Linux
  • ARROW-2648 - [Packaging] Follow up packaging tasks
  • ARROW-2653 - [C++] Refactor hash table support
  • ARROW-2670 - [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
  • ARROW-2673 - [Python] Add documentation + docstring for ARROW-2661
  • ARROW-2684 - [Python] Various documentation improvements
  • ARROW-2712 - [C#] Initial C# .NET library
  • ARROW-2720 - [C++] Defer setting of -std=c++11 compiler option to CMAKE_CXX_STANDARD, use CMake option for -fPIC
  • ARROW-2759 - [Plasma] Export plasma notification socket
  • ARROW-2803 - [C++] Put hashing function into src/arrow/util
  • ARROW-2807 - [Python][Parquet] Add memory_map= option to parquet.read_table, read_pandas, read_schema
  • ARROW-2808 - [Python] Add MemoryPool tests
  • ARROW-2919 - [C++/Python] Improve HdfsFile error messages, fix Python unit test suite
  • ARROW-2968 - [R] Multi-threaded conversion from Arrow table to R data.frame
  • ARROW-2995 - [CI] Build Python libraries in same run when running C++ unit tests so project does not need to be rebuilt again right away
  • ARROW-3020 - [C++/Python] Allow empty arrow::Table objects to be written as empty Parquet row groups
  • ARROW-3038 - [Go] implement String array
  • ARROW-3063 - [Go] remove list of TODOs from go/README
  • ARROW-3070 - [Packaging] Use Bintray
  • ARROW-3108 - [C++] arrow::PrettyPrint for Table instances
  • ARROW-3126 - [Python] Make Buffered* IO classes available to Python, incorporate into input_stream, output_stream factory functions
  • ARROW-3131 - [Go] add Go1.11 to the build matrix
  • ARROW-3161 - [Packaging] Ensure to run pyarrow unit tests in conda and wheel builds
  • ARROW-3169 - [C++] Break up array-test into multiple compilation units
  • ARROW-3184 - [C++] Enable modular builds and installs with ARROW_OPTIONAL_INSTALL option. Remove ARROW_GANDIVA_BUILD_TESTS
  • ARROW-3194 - [JAVA] Use split length in splitAndTransfer to set value count
  • ARROW-3199 - [Plasma] File descriptor send and receive retries
  • ARROW-3209 - [C++] Rename libarrow_gpu to libarrow_cuda
  • ARROW-3230 - [Python] Missing comparisons on ChunkedArray, Table
  • ARROW-3233 - [Python] Add prose documentation for CUDA support
  • ARROW-3248 - [C++] Add “arrow” prefix to Arrow core unit tests, use PREFIX instead of file name for csv, io, ipc tests. Modular target cleanup
  • ARROW-3254 - [C++] Add option to ADD_ARROW_TEST to compose a test executable from multiple .cc files containing unit tests
  • ARROW-3260 - [CI][skip appveyor]
  • ARROW-3272 - [Java][Docs] Add documentation about Java code style
  • ARROW-3273 - [Java] Fix checkstyle for Javadocs
  • ARROW-3278 - [Python] Retrieve StructType's and StructArray's field by name
  • ARROW-3291 - [C++] Add string_view-based constructor for BufferReader
  • ARROW-3293 - [C++] Test Flight RPC in Travis CI
  • ARROW-3296 - [Python] Add Flight support to manylinux1 wheels
  • ARROW-3303 - [C++] API for creating arrays from simple JSON string
  • ARROW-3306 - [R] Objects and support functions for different kinds of arrow::Buffer
  • ARROW-3307 - [R] Convert chunked arrow::Column to R vector
  • ARROW-3310 - [R] Create wrapper classes for various Arrow IO interfaces
  • ARROW-3312 - [R] Use same .clang-format file for both R binding C++ code and main C++ codebase
  • ARROW-3315 - [R] Support for multi-threaded conversions from RecordBatch, Table to R data.frame
  • ARROW-3318 - [C++] Push down read-all-batches operation on RecordBatchReader into C++
  • ARROW-3323 - [Java] Fix checkstyle naming
  • ARROW-3331 - [Gandiva][C++] Add re2 to toolchain
  • ARROW-3340 - [R] support for dates and time classes
  • ARROW-3347 - [Rust] Implement PrimitiveArrayBuilder
  • ARROW-3353 - [Packaging] Build python 3.7 wheels
  • ARROW-3355 - [R] Support for factors
  • ARROW-3358 - [Gandiva][C++] Deprecate Gandiva Status.
  • ARROW-3362 - [R] Guard against null buffers
  • ARROW-3366 - [R] Dockerfile for docker-compose setup
  • ARROW-3368 - [Integration/CI/Python] Add dask integration test to docker-compose setup
  • ARROW-3380 - [Python] Support reading gzipped CSV files
  • ARROW-3381 - [C++] Add bz2 codec
  • ARROW-3383 - [Gandiva][Java] Fix java build
  • ARROW-3384 - [Gandiva] Sync remaining commits from gandiva repo
  • ARROW-3385 - [Gandiva][C++][Java] Crossbow support for deploying gandiva jars
  • ARROW-3387 - [C++] Implement Binary to String cast
  • ARROW-3398 - [Rust] Update existing Builder to use MutableBuffer internally
  • ARROW-3402 - [Gandiva][C++] Utilize common bitmap operation implementations in precompiled IR routines
  • ARROW-3407 - [C++] Add UTF8 handling to CSV conversion
  • ARROW-3409 - [C++] Streaming compression and decompression interfaces
  • ARROW-3421 - [C++] Add include-what-you-use setup to primary docker-compose.yml
  • ARROW-3427 - [C++] Add Windows support, Unix static libs for double-conversion package in conda-forge
  • ARROW-3429 - [Packaging] Add binary upload script
  • ARROW-3430 - [Packaging] Add workaround to verify 0.11.0
  • ARROW-3431 - [GLib] Include Gemfile to archive
  • ARROW-3432 - [Packaging] Expand variables in commit message
  • ARROW-3433 - [C++] Validate re2 with Windows toolchain, EP
  • ARROW-3439 - [R] R language bindings for Feather format
  • ARROW-3440 - [Gandiva] fix readme for builds
  • ARROW-3441 - [Gandiva] Use common unit test creation facilities, do not produce multiple executables for the same unit tests
  • ARROW-3442 - [C++] Allow dynamic linking of (most) unit tests
  • ARROW-3450 - [R] Wrap MemoryMappedFile class
  • ARROW-3451 - [C++/Python] pyarrow and numba CUDA interop
  • ARROW-3455 - [Gandiva][C++] Support pkg-config for Gandiva
  • ARROW-3456 - [CI] Reuse docker images and optimize docker-compose containers
  • ARROW-3460 - [Packaging] Add a script to rebase master on local release branch
  • ARROW-3461 - [Packaging] Add a script to upload RC artifacts as the official release
  • ARROW-3462 - [Packaging] Update CHANGELOG for 0.11.0
  • ARROW-3463 - [Website] Update for 0.11.0
  • ARROW-3464 - [Packaging] Build shared libraries for gandiva fat JAR via crossbow
  • ARROW-3465 - [Documentation] Fix gen_apidocs' docker image
  • ARROW-3469 - [Gandiva] Add gandiva travis OSX entry
  • ARROW-3472 - [Gandiva] remove gandiva_helpers library
  • ARROW-3473 - [Format] Clarify that 64-bit lengths and null counts are permitted, but not recommended
  • ARROW-3474 - [GLib] Extend gparquet API with get_schema and read_column
  • ARROW-3479 - [R] Support to write record_batch as stream
  • ARROW-3482 - [C++] Build with JEMALLOC by default
  • ARROW-3487 - [Gandiva] simplify fns that return errors
  • ARROW-3488 - [Packaging] Separate crossbow task definition files for packaging and tests
  • ARROW-3489 - [Gandiva][C++] Added support for IN expressions
  • ARROW-3490 - [R] streaming of arrow objects to streams
  • ARROW-3492 - [C++] Build jemalloc in parallel
  • ARROW-3493 - [Java] Make sure bound checks are off
  • ARROW-3499 - [R] Expose arrow::ipc::Message type
  • ARROW-3501 - [Gandiva] Enable building with gcc 4.8.x on Ubuntu Trusty, similar distros
  • ARROW-3504 - [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization.
  • ARROW-3505 - [R] Read record batch and table
  • ARROW-3506 - [Packaging] Nightly tests for docker-compose images
  • ARROW-3508 - [C++] Build against double-conversion from conda-forge
  • ARROW-3515 - [C++] Introduce NumericTensor class
  • ARROW-3518 - Detect HOMEBREW_PREFIX automatically
  • ARROW-3519 - [Gandiva] Arena for varlen output fns
  • ARROW-3521 - [GLib] Run Python using find_program in meson.build
  • ARROW-3529 - [Ruby] Import Red Parquet
  • ARROW-3530 - [Java/Python] Add conversion for pyarrow.Schema from org.apache…pojo.Schema
  • ARROW-3533 - [Python/Documentation] Use sphinx_rtd_theme instead of Bootstrap
  • ARROW-3536 - [C++] Add UTF8 validation functions
  • ARROW-3537 - [Rust] Implement Tensor Type
  • ARROW-3539 - [CI/Packaging] Update scripts to build against vendored jemalloc
  • ARROW-3540 - [Rust] Incorporate BooleanArray into PrimitiveArray
  • ARROW-3542 - [C++] Use unsafe appends when building array from CSV
  • ARROW-3545 - [C++/Python] Use “field” terminology with StructType, specify behavior with duplicate field names
  • ARROW-3547 - [R] Protect against Null crash when reading from RecordBatch
  • ARROW-3548 - [Plasma] Add CreateAndSeal object store method for faster puts for small objects.
  • ARROW-3551 - Update MapD to OmniSci on Powered By page
  • ARROW-3553 - [R] Error when losing data on int64, uint64 conversions to double
  • ARROW-3555 - [Plasma] Unify plasma client get function using metadata.
  • ARROW-3556 - [CI] Disable optimizations on Windows
  • ARROW-3557 - [Python] Set Cython language level
  • ARROW-3558 - [Plasma] Remove fatal error when calling get on unsealed object.
  • ARROW-3559 - [Plasma] Static linking for plasma_store_server.
  • ARROW-3562 - [R] Disallow creation of objects with shared_ptr(nullptr), use bits64::integer64
  • ARROW-3563 - [C++] Declare public link dependencies so arrow_static, plasma_static automatically pull in transitive dependencies
  • ARROW-3566 - [Format] Clarify the type of dictionary encoded field
  • ARROW-3567 - [Gandiva][GLib] Add GLib bindings of Gandiva
  • ARROW-3568 - [Packaging] Run pyarrow unittests for windows wheels
  • ARROW-3569 - [Packaging] Run pyarrow unittests when building conda package
  • ARROW-3574 - [Plasma] Use static libraries in plasma library.
  • ARROW-3575 - [Python] New documentation page for CSV reader
  • ARROW-3576 - [Python] Implemented compressed streams
  • ARROW-3577 - [Go] implement Chunked array
  • ARROW-3581 - [Gandiva][C++] Use protobuf as shared library when -DARROW_PROTOBUF_USE_SHARED=ON
  • ARROW-3582 - [CI] fix incantation for C++/Java detection tool
  • ARROW-3583 - [Python/Java] Create RecordBatch from VectorSchemaRoot
  • ARROW-3584 - [Go] Implement Table, Schema and Column
  • ARROW-3587 - [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
  • ARROW-3588 - [Java] Fix checkstyle for header license
  • ARROW-3589 - [Gandiva] Make gandiva JNI wrappers optional
  • ARROW-3591 - [R] Support for collecting decimal types
  • ARROW-3592 - [Python] Allow getting view of a binary scalar
  • ARROW-3597 - [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests
  • ARROW-3600 - [CI/Packaging] Add Ubuntu 18.10
  • ARROW-3601 - [Rust] Add instructions for publishing to crates.io
  • ARROW-3602 - [Gandiva][Python] Initial Gandiva Cython bindings
  • ARROW-3603 - [Gandiva][C++] Support building with ARROW_BOOST_VENDORED=ON
  • ARROW-3605 - [Plasma] Remove dependence of plasma/events.h on ae.h.
  • ARROW-3607 - [Java] delete() method via JNI for plasma
  • ARROW-3608 - [R] Support for time32 and time64 array types
  • ARROW-3609 - [Gandiva] Convert Gandiva benchmark tests as gbenchmark t…
  • ARROW-3610 - [C++] Add interface to turn stl_allocator into arrow::MemoryPool
  • ARROW-3611 - [Python] Give better error message when type_id has wrong type.
  • ARROW-3612 - [Go] implement RecordBatch and RecordBatchReader
  • ARROW-3615 - [R] Support for NaN
  • ARROW-3616 - [Java] Fix remaining checkstyle issues
  • ARROW-3618 - [Packaging/Documentation] Add -c conda-forge option to avoid PackagesNotFoundError
  • ARROW-3620 - [Python] Document pa.cpu_count() in Sphinx API docs
  • ARROW-3621 - [Go] implement Table, Record, RecordReader and TableReader
  • ARROW-3622 - [Go] implement Schema.Equal
  • ARROW-3623 - [Go] implement Field.Equal
  • ARROW-3624 - [Python/C++] Support for zero-sized device buffers and device-to-device copying
  • ARROW-3625 - [Go] add examples for Table, Record and {Table,Record}Reader
  • ARROW-3626 - [Go] implement CSV reader
  • ARROW-3627 - [Go] add RecordBatchBuilder
  • ARROW-3629 - [Python] Add write_to_dataset to Python Sphinx API listing
  • ARROW-3630 - [Plasma][GLib] Add GLib bindings of Plasma
  • ARROW-3632 - [Packaging] Update deb names in dev/tasks/tasks.yml in release process
  • ARROW-3633 - [Packaging] Update deb names in dev/tasks/tasks.yml for 0.12.0
  • ARROW-3636 - [C++/Python] Update arrow/python/pyarrow_api.h
  • ARROW-3638 - [C++][Python] Move reading from Feather as Table feature to C++ from Python
  • ARROW-3639 - [Packaging] Run gandiva nightly packaging tasks
  • ARROW-3640 - [Go] implement Tensors
  • ARROW-3641 - [Python] Remove unneeded public keyword from pyarrow public C APIs
  • ARROW-3642 - [C++] Add arrowConfig.cmake generation
  • ARROW-3644 - [Rust] Implement ListArrayBuilder
  • ARROW-3645 - [Python] Document compression support in Sphinx
  • ARROW-3646 - [Python] High-level IO API
  • ARROW-3647 - [R] Fix R bit64 crash and formatting
  • ARROW-3648 - [Plasma][Java] Add API to get metadata and data at the same time
  • ARROW-3649 - [Rust] Refactor MutableBuffer's resize
  • ARROW-3656 - [C++] Allow whitespace in numeric CSV fields
  • ARROW-3657 - [R] there is no package called bit64
  • ARROW-3659 - [CI] Fix Travis matrix entry 2 documentation to use gcc
  • ARROW-3660 - [C++] Don't unnecessarily lock MemoryMappedFile for resizing in readonly files
  • ARROW-3661 - [Gandiva][GLib] Use “_” as word separator in constant name
  • ARROW-3662 - [C++] Add a const overload to MemoryMappedFile::GetSize
  • ARROW-3664 - [Rust] Add benchmark for PrimitiveArrayBuilder
  • ARROW-3665 - [Rust] Implement StructArrayBuilder
  • ARROW-3666 - [C++] Improve C++ parser performance
  • ARROW-3672 - [Go] add support for time32 and time64 array
  • ARROW-3673 - [Go] implement Time64 array
  • ARROW-3674 - [Go] Implement Date32 and Date64 array types
  • ARROW-3675 - [Go] implement Date64 array
  • ARROW-3677 - [Go] Add fixed-length binary builder and array
  • ARROW-3681 - [Go] Add benchmarks for CSV reader
  • ARROW-3682 - [Go] unexport encoding/csv.Reader from CSV reader
  • ARROW-3683 - [Go] add functional-option style to configure the CSV reader
  • ARROW-3684 - [Go] Add chunking ability to CSV reader
  • ARROW-3692 - [Gandiva][Ruby] Add Ruby bindings of Gandiva
  • ARROW-3693 - [R] Invalid buffer for empty characters with null data
  • ARROW-3694 - [Java] Avoid superfluous string creation when logging level is disabled
  • ARROW-3695 - [Gandiva] Use add_arrow_lib()
  • ARROW-3696 - [C++] Add feather::TableWriter::Write(table)
  • ARROW-3697 - [Ruby]
  • ARROW-3701 - [Gandiva] add op for decimal 128
  • ARROW-3708 - [Packaging] Support CMake files in Linux packages
  • ARROW-3713 - [Rust] Implement BinaryArrayBuilder
  • ARROW-3718 - [Gandiva] Remove spurious gtest include
  • ARROW-3719 - [GLib] Support read/write table to/from Feather
  • ARROW-3720 - [GLib] Use “indices” instead of “indexes”
  • ARROW-3721 - [Gandiva][Python] Support all Gandiva literals
  • ARROW-3722 - [C++] Allow specifying types of CSV columns
  • ARROW-3723 - [Plasma][Ruby] Add Ruby bindings of Plasma
  • ARROW-3724 - [GLib] Update .gitignore
  • ARROW-3725 - [GLib] Add field readers to GArrowStructDataType
  • ARROW-3726 - [Rust] Add CSV reader with example
  • ARROW-3727 - [Python] Document use of foreign_buffer()
  • ARROW-3731 - MVP to read parquet in R library
  • ARROW-3733 - [GLib] Add to_string() to GArrowTable and GArrowColumn
  • ARROW-3736 - [CI/Docker] Ninja test in docker-compose run cpp hangs
  • ARROW-3738 - [C++] Parse ISO8601-like timestamps in CSV columns
  • ARROW-3741 - [R] Add support for arrow::compute::Cast to convert Arrow arrays from one type to another
  • ARROW-3743 - [Ruby] Add support for saving/loading Feather
  • ARROW-3744 - [Ruby] Use garrow_table_to_string() in Arrow::Table#to_s
  • ARROW-3746 - [Gandiva][Python] Print list of functions registered with gandiva
  • ARROW-3747 - [C++] Switch order of struct members in Decimal128
  • ARROW-3748 - [GLib] Add GArrowCSVReader
  • ARROW-3749 - [GLib] Fix typos
  • ARROW-3751 - [Gandiva][Python] Add more cython bindings for gandiva
  • ARROW-3752 - [C++] Remove unused status::ArrowError
  • ARROW-3753 - [Gandiva] Remove debug print
  • ARROW-3755 - [GLib] Add GArrowCompressedInputStream and GArrowCompressedOutputStream
  • ARROW-3760 - [R] Support Arrow CSV reader
  • ARROW-3773 - [C++] Remove redundant AssertArraysEqual function from before monorepo merge
  • ARROW-3778 - [C++] Compile parts of test-util.h that we can once, link with unit tests
  • ARROW-3781 - [C++] Implement BufferedOutputStream::SetBufferSize. Allocate buffer from MemoryPool
  • ARROW-3782 - [C++] Implement BufferedInputStream to pair with BufferedOutputStream
  • ARROW-3784 - [R] Array with type fails with x is not a vector
  • ARROW-3785 - [C++] Enable using double-conversion from $ARROW_BUILD_TOOLCHAIN
  • ARROW-3787 - [Rust] Implement From for BinaryArray
  • ARROW-3788 - [Ruby] Add support for CSV parser written in C++
  • ARROW-3795 - [R] Support for retrieving NAs from INT64 arrays
  • ARROW-3796 - [Rust] Add Example for PrimitiveArrayBuilder
  • ARROW-3798 - [GLib] Add support for column type CSV read option
  • ARROW-3800 - [C++] Vendor a string_view backport
  • ARROW-3803 - [C++/Python] Merge C++ builds and tests, run Python tests in separate CI entries
  • ARROW-3807 - [R] Missing Field API
  • ARROW-3819 - [Packaging] Update conda variant files to conform with feedstock after compiler migration
  • ARROW-3821 - [Format/Documentation] Fix typos and grammar issues in Flight.proto comments
  • ARROW-3823 - [R] + buffer.complex
  • ARROW-3825 - [Python] Document how to run the Python unit tests in python/README.md
  • ARROW-3826 - [C++] Determine if using ccache caching in Travis CI actually improves build times
  • ARROW-3830 - [GLib] Add GArrowCodec
  • ARROW-3834 - [Doc] Merge C++ and Python documentation
  • ARROW-3836 - [C++] Add PREFIX, EXTRA_LINK_LIBS, DEPENDENCIES to ADD_ARROW_BENCHMARK
  • ARROW-3839 - [Rust] Add ability to infer schema in CSV reader
  • ARROW-3841 - [C++] Suppress catching polymorphic type by value warning
  • ARROW-3842 - [R] RecordBatchStreamWriter api
  • ARROW-3844 - [C++] Remove ARROW_USE_SSE and ARROW_SSE3
  • ARROW-3845 - [Gandiva][GLib] Add GGandivaNode
  • ARROW-3847 - [GLib] Remove unnecessary ''
  • ARROW-3849 - [C++] Leverage Armv8 crc32 extension instructions to accelerate the hash computation for Arm64
  • ARROW-3851 - [C++] Run clang-format in parallel
  • ARROW-3852 - [C++] Suppress used uninitialized warning
  • ARROW-3853 - [C++] Implement string to timestamp cast
  • ARROW-3854 - [GLib] Deprecate garrow_gio_{input,output}_stream_get_raw()
  • ARROW-3855 - [Rust] Schema/Field/Datatype now have derived serde traits
  • ARROW-3856 - [Ruby] Support compressed CSV save/load
  • ARROW-3858 - [GLib] Use {class_name}_get_instance_private
  • ARROW-3859 - [Arrow][Java] Fixed backward incompatible change. (#3018)
  • ARROW-3860 - [C++] Add ARROW_GANDIVA_STATIC_LIBSTDCPP option to restore hard-coded behavior prior to ARROW-3437
  • ARROW-3862 - [C++] Improve third-party dependencies download script
  • ARROW-3863 - [GLib] Use travis_retry with brew bundle command
  • ARROW-3864 - [GLib] Add support for allow-float-truncate cast option
  • ARROW-3865 - [Packaging] Add double-conversion dependency to conda forge recipes and the windows wheel build
  • ARROW-3867 - [Documentation] Uploading binary release artifacts to Bintray
  • ARROW-3868 - [Rust] Switch to nightly Rust for required build, stable is now allowed to fail
  • ARROW-3870 - [C++] Add Peek to InputStream abstract interface
  • ARROW-3871 - [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData
  • ARROW-3878 - [Rust] Improve primitive types
  • ARROW-3880 - [Rust] Implement simple math operations for numeric arrays
  • ARROW-3881 - [Rust] PrimitiveArray<T> should support comparison operators
  • ARROW-3883 - [Rust] Update README
  • ARROW-3884 - [Python] Add LLVM6 to manylinux1 base image
  • ARROW-3885 - [Rust] Release prepare step should increment Rust version
  • ARROW-3886 - [C++] Add support for decompressed buffer size check for Snappy
  • ARROW-3891 - [Java] Remove Long.bitCount with simple bitmap operations
  • ARROW-3893 - [C++] Improve adaptive int builder performance
  • ARROW-3895 - [Rust] csv::Reader now returns Result instead of Option
  • ARROW-3899 - [Python] Table.to_pandas converts Arrow date32[day] to pandas datetime64[ns]
  • ARROW-3900 - [GLib] Add garrow_mutable_buffer_set_data()
  • ARROW-3905 - [Ruby]
  • ARROW-3906 - [C++] Break out builder.cc into multiple compilation units
  • ARROW-3908 - [Rust] Update rust dockerfile to use nightly toolchain
  • ARROW-3910 - [Python] Set date_as_objects=True as default in to_pandas methods
  • ARROW-3911 - [Python] Deduplicate datetime.date objects in Table.to_pandas internals
  • ARROW-3912 - [Plasma][GLib] Add support for creating and referring to objects
  • ARROW-3913 - [Gandiva][GLib] Add GGandivaLiteralNode
  • ARROW-3914 - [C++/Python/Packaging] Docker-compose setup for Alpine linux
  • ARROW-3916 - [Python] Add support for filesystem kwarg in ParquetWriter
  • ARROW-3921 - [GLib][CI] Log Homebrew output
  • ARROW-3922 - [C++] Micro-optimizations to BitUtil::GetBit
  • ARROW-3924 - [Packaging][Plasma] Add support for Plasma deb/rpm packages
  • ARROW-3925 - [Python] Add autoconf to conda install instructions
  • ARROW-3928 - [Python] Deduplicate Python objects when converting binary, string, date, time types to object arrays
  • ARROW-3929 - [Go] improve CSV reader memory usage
  • ARROW-3930 - [C++] Avoid using Mersenne Twister for random test data
  • ARROW-3932 - [Python] Include Benchmarks.md in Sphinx docs
  • ARROW-3934 - [Gandiva] Only add precompiled tests if ARROW_GANDIVA_BUILD_TESTS
  • ARROW-3938 - [Packaging] Stop referring to java/pom.xml to get version information
  • ARROW-3939 - [Rust] Remove macro definition for ListArrayBuilder
  • ARROW-3945 - [Website] Update website for Gandiva donation
  • ARROW-3946 - [GLib] Add support for union
  • ARROW-3948 - [GLib][CI] Set timeout for Homebrew
  • ARROW-3950 - [Plasma] Make loading the TensorFlow op optional
  • ARROW-3952 - [Rust] Upgrade to Rust 2018 Edition
  • ARROW-3958 - [Plasma] Reduce number of IPCs
  • ARROW-3959 - [Rust] Add date/time data types
  • ARROW-3960 - [Rust] remove extern crate for Rust 2018
  • ARROW-3963 - [Packaging/Docker] Nightly test for building Sphinx documentation
  • ARROW-3964 - [Go] Refactor examples of csv reader
  • ARROW-3967 - [Gandiva][C++] Make node.h public
  • ARROW-3970 - [Gandiva][C++] Remove unnecessary boost dependencies.
  • ARROW-3971 - [Python] Remove deprecations in 0.11 and prior
  • ARROW-3974 - [C++] Combine field_builders_ and children_ members in array/builder.h
  • ARROW-3982 - [C++] Allow “binary” input in simple JSON format
  • ARROW-3983 - [Gandiva][Crossbow] Link Boost statically in JAR packaging scripts
  • ARROW-3984 - [C++] Exit with error if user hits zstd ExternalProject path
  • ARROW-3986 - [C++] Document memory management and table APIs
  • ARROW-3986 - [C++] Write prose documentation
  • ARROW-3987 - [Java] Benchmark results for ARROW-1807
  • ARROW-3988 - [C++] Do not build unit tests by default, fix building Gandiva unit tests when ARROW_BUILD_TESTS=OFF
  • ARROW-3993 - [JS] CI Jobs Failing
  • ARROW-3994 - [C++] Remove ARROW_GANDIVA_BUILD_TESTS option
  • ARROW-3995 - [CI] Use understandable names on Travis
  • ARROW-3997 - [Documentation] Clarify dictionary index type
  • ARROW-4002 - [C++][Gandiva] Remove needless CMake version check
  • ARROW-4004 - [GLib] Replace GPU with CUDA
  • ARROW-4005 - [Plasma][GLib] Add gplasma_client_disconnect()
  • ARROW-4006 - Add CODE_OF_CONDUCT.md
  • ARROW-4009 - [CI] Run Valgrind and C++ code coverage in different builds
  • ARROW-4010 - [C++] Enable Travis CI scripts to build and install only certain targets
  • ARROW-4015 - [Plasma] remove unused interfaces for plasma manager
  • ARROW-4017 - [C++] Move vendored libraries in dedicated directory
  • ARROW-4026 - [C++] Add *-all, *-tests, *-benchmarks modular CMake targets. Use in Travis CI
  • ARROW-4028 - [Rust] Merge parquet-rs codebase
  • ARROW-4029 - [C++] Exclude headers with ‘internal’ from installation. Document header file conventions in README
  • ARROW-4030 - [CI] Use travis_terminate in more script commands to fail faster
  • ARROW-4035 - [Ruby] Support msys2 mingw dependencies
  • ARROW-4037 - [Packaging] Remove workaround to verify 0.11.0
  • ARROW-4038 - [Rust] Implement boolean AND, OR, NOT array ops
  • ARROW-4039 - [Python] Update link to ‘development.rst’ page from Python README.md
  • ARROW-4042 - [Rust] Rename BinaryArray::get_value to value
  • ARROW-4043 - [Packaging/Docker] Python tests on alpine miss pytest dependency
  • ARROW-4044 - [Packaging/Python] Add hypothesis test dependency to pyarrow conda recipe
  • ARROW-4045 - [Packaging/Python] Add hypothesis test dependency to wheel crossbow tests
  • ARROW-4048 - [GLib] Return ChunkedArray instead of Array in gparquet_arrow_file_reader_read_column
  • ARROW-4051 - [Gandiva][GLib] Add support for null literal
  • ARROW-4054 - [Python] Update gtest, flatbuffers and OpenSSL in manylinux1 base image
  • ARROW-4060 - [Rust] Add parquet arrow converter.
  • ARROW-4069 - [Python] Add tests for casting binary -> string/utf8. Add pyarrow.utf8() type factory alias for readability
  • ARROW-4075 - [Rust] Reuse array builder after calling finish()
  • ARROW-4079 - [C++] Add machine benchmark
  • ARROW-4080 - [Rust] Improving lengthy build times in Appveyor
  • ARROW-4082 - [C++] Allow RelWithDebInfo, improve FindClangTools
  • ARROW-4084 - [C++] Make Status static method support variadic arguments
  • ARROW-4085 - [GLib] Use “field” for struct data type
  • ARROW-4087 - [C++] Make CSV spellings of null values configurable
  • ARROW-4093 - [C++] Fix wrong suggested method name
  • ARROW-4098 - [Python] Deprecate open_file/open_stream top level APIs in favor of using ipc namespace
  • ARROW-4100 - [Gandiva][C++] Fix regex for special character dot.
  • ARROW-4102 - [C++] Return common IdentityCast when casting to equal type
  • ARROW-4103 - [Docs] Move documentation build instructions from source/python/development.rst to docs/README.md
  • ARROW-4105 - [Rust] Add rust-toolchain to enforce use of the nightly toolchain for building
  • ARROW-4107 - [Python] Use ninja in pyarrow manylinux1 build
  • ARROW-4112 - [Packaging] Add support for Gandiva .deb
  • ARROW-4116 - [Python] Add warning to development instructions to avoid virtualenv when using Anaconda/miniconda
  • ARROW-4122 - [C++] Initialize class members based on codebase static analysis
  • ARROW-4127 - [Documentation][Python] Add instructions to build with Docker
  • ARROW-4129 - [Python] Fix syntax problem in benchmark docs
  • ARROW-4132 - [GLib] Add more GArrowTable constructors
  • ARROW-4141 - [Ruby] Add support for creating schema from raw Ruby objects
  • ARROW-4148 - [CI/Python] Disable ORC on nightly Alpine builds
  • ARROW-4150 - [C++] Ensure allocated buffers have non-null data pointer
  • ARROW-4151 - [Rust] Restructure project directories
  • ARROW-4152 - [GLib] Remove an example to show Torch integration
  • ARROW-4153 - [GLib] Add builder_append_value() for consistency
  • ARROW-4154 - [GLib] Add GArrowDecimal128DataType
  • ARROW-4155 - [Rust] Implement array_ops::sum() for PrimitiveArray
  • ARROW-4156 - [C++] Don't use object libs with Xcode
  • ARROW-4158 - Allow committers to set ARROW_GITHUB_API_TOKEN for merge script, better debugging output
  • ARROW-4160 - [Rust] Add README and executable files to parquet
  • ARROW-4161 - [GLib] Add PlasmaClientOptions
  • ARROW-4162 - [Ruby] Add support for creating data types from description
  • ARROW-4166 - [Ruby] Add support for saving to and loading from buffer
  • ARROW-4167 - [Gandiva] switch to arrow/util/variant
  • ARROW-4168 - [GLib] Use property to keep GArrowDataType passed in garrow_field_new()
  • ARROW-4172 - [Rust] more consistent naming in array builders
  • ARROW-4174 - [Ruby] Add support for building composite array from raw Ruby objects
  • ARROW-4175 - [GLib] Add support for decimal compare operators
  • ARROW-4177 - [C++] Add ThreadPool and TaskGroup microbenchmarks
  • ARROW-4183 - [Ruby] Add Arrow::Struct as an element of Arrow::StructArray
  • ARROW-4184 - [Ruby] Add Arrow::RecordBatch#to_table
  • ARROW-4191 - [C++] Use same CC and AR for jemalloc as for the main sources
  • ARROW-4199 - [GLib] Add garrow_seekable_input_stream_peek()
  • ARROW-4207 - [Gandiva][GLib] Add support for IfNode
  • ARROW-4210 - [Python] Mention boost-cpp directly in the conda meta.yaml for pyarrow
  • ARROW-4211 - [GLib] Add GArrowFixedSizeBinaryDataType
  • ARROW-4214 - [Ruby] Add support for building RecordBatch from raw Ruby objects
  • ARROW-4216 - [Python] Add CUDA API docs
  • ARROW-4228 - [GLib] Add garrow_list_data_type_get_field()
  • ARROW-4229 - [Packaging] Set crossbow target explicitly to enable building arbitrary arrow repo
  • ARROW-4233 - [Packaging] Use Docker to build source archive
  • ARROW-4239 - [Packaging] Fix version update for the next version
  • ARROW-4240 - [Packaging] Add missing Plasma GLib and Gandiva GLib documents to source archive
  • ARROW-4241 - [Packaging] Disable crossbow conda OSX clang builds
  • ARROW-4243 - [Python] Fix test failures with pandas 0.24.0rc1
  • ARROW-4249 - [Plasma] Clean up client namespace
  • ARROW-4257 - [Release] Update release verification script to check binaries on Bintray
  • ARROW-4266 - [Python][CI] Disable ORC tests in dask integration test
  • ARROW-4269 - [Python] Fix serialization in pandas 0.22
  • ARROW-4270 - [Packaging][Conda] Update xcode version and remove toolchain builds
  • ARROW-4276 - [Release] Remove needless Bintray authentication from binaries verify script
  • ARROW-4306 - [Release] Update website and add blog post announcing 0.12.0 release
  • PARQUET-690 - [C++] Reuse Thrift resources when serializing metadata structures
  • PARQUET-1271 - [C++] Rename parquet_reader tool to parquet-reader for consistency
  • PARQUET-1439 - Remove PARQUET_ARROW_LINKAGE option, clean up overall library linking configuration
  • PARQUET-1449 - [C++] Support building with ARROW_BOOST_VENDORED=ON
  • PARQUET-1463 - [C++] Utilize common hashing machinery for dictionary encoding
  • PARQUET-1467 - [C++] Remove defunct ChunkedAllocator code
  • PARQUET-1473 - [C++] Add helper function that converts ParquetVersion to human-friendly string
  • PARQUET-1484 - [C++] Improve memory usage of FileMetaDataBuilder

Bug Fixes

  • ARROW-1847 - [Doc] Document the difference between RecordBatch and Table in an FAQ fashion
  • ARROW-2026 - [C++] Enforce use_deprecated_int96_timestamps to all time…
  • ARROW-2038 - [Python] Strip s3:// scheme in S3FSWrapper isdir() and isfile()
  • ARROW-2113 - /3768: [Python] set classpath to all hadoop jars when HADOOP_HOME present
  • ARROW-2591 - [Python] Add Parquet test case writing list-typed column with empty lists that caused segfault on 0.9.0
  • ARROW-2592 - [Python] Add “ignore_metadata” option to Table.to_pandas
  • ARROW-2654 - [Python] Error with errno 22 when loading 3.6 GB Parquet file
  • ARROW-2708 - [C++] Internal GetValues function in arrow::compute should check for nullptr
  • ARROW-2831 - [Plasma] MemoryError in teardown
  • ARROW-2970 - [Python] Support conversions of NumPy string arrays requiring chunked binary output
  • ARROW-2987 - [Python] test_cython_api can fail if run in an environment where vsvarsall.bat has been run more than once
  • ARROW-3048 - [Python] Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue)
  • ARROW-3058 - [Python] Raise more helpful error message when writing a pandas.DataFrame to Feather format that requires a chunked layout
  • ARROW-3186 - [GLib][CI] Use the latest Meson again
  • ARROW-3202 - [C++] Fix compilation on Alpine Linux by using ARROW_WITH_BACKTRACE define
  • ARROW-3225 - [C++/Python] Pandas object conversion of ListType and ListType
  • ARROW-3324 - [Python] Destroy temporary metadata builder classes more eagerly when building files to reduce memory usage
  • ARROW-3343 - [Java] Disable flaky tests
  • ARROW-3405 - [Python] Document CSV reader
  • ARROW-3428 - [Python] Fix from_pandas conversion from float to bool
  • ARROW-3436 - [C++] Boost version required by Gandiva is too new for Ubuntu 14.04
  • ARROW-3437 - [C++] Use older API for boost::optional, remove gtest include from prod code, remove -static-libstdc++ flags
  • ARROW-3438 - [Packaging] Fix too much Markdown escape in CHANGELOG
  • ARROW-3445 - [GLib] Fix libarrow-glib link for libparquet-glib
  • ARROW-3449 - [C++] Fixes to build with CMake 3.2. Document what requires newer CMake
  • ARROW-3466 - [C++] Avoid leaking protobuf symbols
  • ARROW-3467 - [C++] Fix building against external double-conversion
  • ARROW-3470 - [C++] Fix row-wise example
  • ARROW-3477 - [C++] fixes for 32 bit architectures
  • ARROW-3480 - [Website] Fix broken install document for Ubuntu
  • ARROW-3483 - [CI] Python 3.6 build failure on Travis-CI
  • ARROW-3485 - [C++] Examples fail with Protobuf error
  • ARROW-3494 - [Gandiva][C++] fix re2 error in cmake
  • ARROW-3498 - [R] Make IPC APIs consistent
  • ARROW-3516 - [C++] Use unsigned type for difference of pointers in parallel_memcpy
  • ARROW-3517 - [C++] Add a workaround for MinGW-w64 32bit crash
  • ARROW-3524 - [C++] Fix compiler warnings from ARROW-3409 on clang-6
  • ARROW-3527 - [R] remove unused variables
  • ARROW-3528 - [R] Fixed typo in R package documentation
  • ARROW-3535 - [Python] pip install tensorflow installs too new a numpy in manylinux1 build
  • ARROW-3541 - [Rust] Update BufferBuilder to allow for new bit-packed BooleanArray
  • ARROW-3544 - [Gandiva][C++] Create function registry in multiple compilation units to reduce build times
  • ARROW-3549 - [Rust] Replace i64 with usize for some bit utility functions
  • ARROW-3573 - [Rust] with_bitset does not set valid bits correctly
  • ARROW-3580 - [Gandiva][C++] Fix build error with g++ 8.2.0
  • ARROW-3586 - [Python] Add test ensuring no segfault
  • ARROW-3598 - [Plasma] Fix Plasma GPU linking error.
  • ARROW-3613 - [Go] Fix builder downsize
  • ARROW-3613 - [Go] fix builder resize
  • ARROW-3614 - [R] Support for timestamps
  • ARROW-3634 - [GLib] Follow CudaDeviceManager::AllocateHost() API change
  • ARROW-3637 - [Go] implement Stringer for arrays
  • ARROW-3658 - [Rust] Incorrect List<T> tests
  • ARROW-3670 - [C++] Use FindBacktrace to find execinfo.h support
  • ARROW-3687 - [Rust] Anything measuring array slots should be usize
  • ARROW-3698 - [Gandiva] Segmentation fault when using a large table in Gandiva
  • ARROW-3700 - [C++] Ignore empty lines in CSV files
  • ARROW-3703 - [Python] DataFrame.to_parquet crashes if datetime column has time zones
  • ARROW-3704 - [Gandiva][C++] Add missing include
  • ARROW-3707 - [C++] Fix test regression with zstd 1.3.7
  • ARROW-3711 - [C++] Don't pass CXX_FLAGS to C_FLAGS
  • ARROW-3712 - [CI] Quick fix for RAT failure
  • ARROW-3715 - [C++] Fix typo in gflags_ep CMake config
  • ARROW-3716 - [R] Missing cases for ChunkedArray conversion
  • ARROW-3728 - [Python] Ignore differences in schema custom metadata when writing table to ParquetWriter
  • ARROW-3734 - [C++] Linking static zstd library fails on Arch x86-64
  • ARROW-3740 - [C++] Builder should not downsize
  • ARROW-3742 - Fix pyarrow.types & gandiva cython bindings
  • ARROW-3745 - [C++] CMake passes static libraries multiple times to linker
  • ARROW-3754 - [C++] Enable Zstandard by default only when CMake is 3.7 or later
  • ARROW-3756 - [CI/Docker/Java] Java tests are failing in docker-compose setup
  • ARROW-3765 - [Gandiva] Segfault when the validity bitmap has not been allocated
  • ARROW-3766 - [Python] pa.Table.from_pandas doesn't use schema ordering
  • ARROW-3768 - [Python] set classpath to hdfs not hadoop executable
  • ARROW-3775 - [C++] Handling Parquet Arrow reads that overflow a BinaryArray capacity
  • ARROW-3790 - [C++] Fix erroneous safe casting
  • ARROW-3792 - [C++] Writing a list-type chunked column to Parquet fails if any chunk is 0-length
  • ARROW-3793 - [C++] TestScalarAppendUnsafe is not testing unsafe appends
  • ARROW-3797 - [Rust] BinaryArray::value_offset incorrect in offset case
  • ARROW-3805 - [Gandiva] Handle null validity bit-map in if-else
  • ARROW-3831 - [C++] Add support for returning decompressed size
  • ARROW-3835 - [C++] Add missing arrow::io::CompressedOutputStream::raw() implementation
  • ARROW-3837 - [C++] Add GFLAGS_IS_A_DLL define to fix Windows build
  • ARROW-3866 - [Python] Column metadata is not transferred to tables in pyarrow
  • ARROW-3869 - [Rust] “invalid fastbin errors” since Rust nightly-2018-11-03
  • ARROW-3874 - [C++] Add LLVM_DIR to find_package in FindLLVM.cmake
  • ARROW-3879 - [C++] Fix uninitialized member in CudaBufferWriter
  • ARROW-3888 - [C++] Fix various compiler warnings
  • ARROW-3889 - [Python] Crash when creating schema from invalid args
  • ARROW-3890 - [Python] Handle NumPy binary arrays with UTF-8 validation when converting to StringArray
  • ARROW-3894 - [C++] Ensure that IPC file is properly initialized even if no record batches are written
  • ARROW-3898 - [Example] parquet-arrow example has compilation errors
  • ARROW-3909 - [Python] Table.from_pandas call that seemingly should zero copy does not
  • ARROW-3918 - [Python] ParquetWriter.write_table doesn't support coerce_timestamps or allow_truncated_timestamps
  • ARROW-3920 - [plasma] Fix reference counting in custom tensorflow plasma operator.
  • ARROW-3931 - [C++] Make it possible to build regardless of LANG
  • ARROW-3936 - [C++] Add _O_NOINHERIT to the file open flags on Windows
  • ARROW-3937 - [Rust] Fix Rust nightly build (formatting rules changed)
  • ARROW-3940 - [Python/Documentation] Add required packages to the development instruction
  • ARROW-3941 - [R] RecordBatchStreamReader$schema
  • ARROW-3942 - [R] Feather api fixes
  • ARROW-3953 - [Python] Compat with pandas 0.24 rename of MultiIndex labels -> codes
  • ARROW-3955 - [GLib] Add (transfer full) to free when no longer needed
  • ARROW-3957 - [Python] Better error message when user connects to HDFS cluster with wrong port
  • ARROW-3961 - [Python/Documentation] Fix wrong path in the pyarrow README
  • ARROW-3969 - [Rust] Format using stable rustfmt
  • ARROW-3976 - [Ruby] Try to upgrade git to avoid errors caused by Homebrew on older git
  • ARROW-3977 - [Gandiva] fix label during ctest invoc
  • ARROW-3979 - [Gandiva] fix all valgrind reported errors
  • ARROW-3980 - [C++] Fix CRTP use in json-simple.cc
  • ARROW-3989 - [Rust][CSV] Cast bool string to lower case in reader
  • ARROW-3996 - [C++] Add missing packages on Linux
  • ARROW-4008 - [C++] Restore ARROW_BUILD_UTILITIES to fix integration tests
  • ARROW-4011 - [Gandiva] Install irhelpers.bc and use it
  • ARROW-4019 - [C++] Fix Coverity issues
  • ARROW-4033 - [C++] Use readlink -f instead of realpath in dependency download script
  • ARROW-4034 - [Ruby] Add support :append option to FileOutputStream
  • ARROW-4041 - [CI] Python 2.7 run uses Python 3.6
  • ARROW-4049 - [C++] Arrow never uses glog even though glog is linked.
  • ARROW-4052 - [C++] Linker errors with glog and gflags
  • ARROW-4053 - [Python/Integration] HDFS Tests failing with I/O operation on closed file
  • ARROW-4055 - [Python] Fails to convert pytz.utc with versions 2018.3 and earlier
  • ARROW-4058 - [C++] arrow-io-hdfs-test fails when run against HDFS cluster from docker-compose
  • ARROW-4065 - [C++] arrowTargets.cmake is broken
  • ARROW-4066 - [Doc] Instructions to create Sphinx documentation
  • ARROW-4070 - [C++] Enable use of ARROW_BOOST_VENDORED with ninja-build
  • ARROW-4073 - [Python] Fix URI parsing on Windows. Also fix test for get_library_dirs when using ARROW_HOME to develop
  • ARROW-4074 - [Python] test_get_library_dirs_win32 fails if libraries installed someplace different from conda or wheel packages
  • ARROW-4078 - [CI] Detect changes in docs/ directory and build the Linux Python entry if so
  • ARROW-4088 - [Python] Table.from_batches() fails when passed a schema with metadata
  • ARROW-4089 - [Plasma] The tutorial is wrong regarding the parameter type of PlasmaClient.Create
  • ARROW-4101 - [C++] Identity BinaryType cast
  • ARROW-4106 - [Python] Tests fail to run because hypothesis update broke its API
  • ARROW-4109 - [Packaging] Missing glog dependency from arrow-cpp conda recipe
  • ARROW-4113 - [R] Fix version number
  • ARROW-4114 - [C++] Add python to requirements list for running on ubuntu
  • ARROW-4115 - [Gandiva] zero-init boolean data bufs
  • ARROW-4118 - [Python] Fix benchmark setup for “asv run”
  • ARROW-4125 - [Python] Don't fail ASV if Plasma extension is not built (e.g. on Windows)
  • ARROW-4126 - [Go] offset not used when accessing boolean array
  • ARROW-4128 - [C++] Update style guide to reflect NULLPTR and doxygen
  • ARROW-4130 - [Go] offset not used when accessing binary array
  • ARROW-4134 - [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's abort
  • ARROW-4135 - [Python] Can't reload a pandas dataframe containing a list of datetime.time
  • ARROW-4137 - [Rust] Move parquet code into a separate crate
  • ARROW-4138 - [Python] Fix setuptools_scm version customization on Windows
  • ARROW-4147 - [Java] reduce heap usage for varwidth vectors (#3298)
  • ARROW-4149 - [CI/C++] Parquet test misses ZSTD compression codec in CMake 3.2 nightly builds
  • ARROW-4157 - [C++] Fix clang documentation warnings on Ubuntu 18.04
  • ARROW-4171 - [Rust] fix parquet crate release version
  • ARROW-4173 - Fix JIRA library name in error message
  • ARROW-4178 - [C++] Fix TSan and UBSan errors
  • ARROW-4179 - [Python] Use more public API to determine whether a test has a pytest mark or not
  • ARROW-4182 - [Python][CI] SEGV frequency
  • ARROW-4185 - [Rust] Change directory before running Rust examples on Windows
  • ARROW-4186 - [C++] BitmapWriter shouldn't clobber data when length == 0
  • ARROW-4188 - [Rust] Move Rust README to top level rust directory
  • ARROW-4197 - [C++] Better Emscripten support
  • ARROW-4200 - [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst
  • ARROW-4209 - [Gandiva] Avoid struct return param in IR
  • ARROW-4215 - [GLib] Fix typos in documentation
  • ARROW-4227 - [GLib] Fix wrong data type in field of composite data type
  • ARROW-4237 - [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script
  • ARROW-4238 - [Packaging] Fix RC version conflict between crossbow and rake
  • ARROW-4246 - [Plasma][Python][Follow-up] Ensure plasma::ObjectTableEntry always has the same size regardless of whether built with CUDA support
  • ARROW-4246 - [Plasma][Python] PlasmaClient.list returns wrong information with CUDA enabled Plasma
  • ARROW-4256 - [Release] Fix Windows verification script for 0.12 release
  • ARROW-4258 - [Python] Safe cast fails from numpy float64 array with nans to integer
  • ARROW-4260 - [Python] NumPy buffer protocol failure
  • PARQUET-1426 - [C++] parquet-dump-schema has poor usability
  • PARQUET-1458 - [C++] parquet::CompressionToString not recognizing brotli compression
  • PARQUET-1469 - [C++] Fix data corruption bug in parquet::internal::DefinitionLevelsToBitmap that was triggered through random data
  • PARQUET-1471 - [C++] TypedStatistics::UpdateSpaced reads out of bounds value when there are more definition levels than spaced values
  • PARQUET-1481 - [C++] Throw exception when encountering bad Thrift metadata in RecordReader

Apache Arrow 0.11.1 (2018-10-23)

New Features and Improvements

  • ARROW-3353 - [Packaging] Build python 3.7 wheels
  • ARROW-3534 - [Python][skip appveyor]
  • ARROW-3546 - [Python] Provide testing setup to verify wheel binaries work in one or more common Linux distributions
  • ARROW-3565 - [Python] Pin tensorflow to 1.11.0 in manylinux1 container

Bug Fixes

  • ARROW-3514 - [C++] Work around insufficient output size estimate on old zlibs
  • ARROW-3907 - [Python] from_pandas errors when schemas are used with lower resolution timestamps

Apache Arrow 0.11.0 (2018-10-08)

New Features and Improvements

  • ARROW-25 - [C++] Implement CSV reader
  • ARROW-249 - [JAVA] Flight GRPC Implementation
  • ARROW-614 - [C++] Use glog (or some other tool) to print stack traces in debug builds on errors
  • ARROW-1325 - [R] Initial R package that builds against the arrow C++ library
  • ARROW-1424 - [Python] Add CUDA support to pyarrow
  • ARROW-1491 - [C++] Add casting from strings to numbers and booleans
  • ARROW-1521 - [C++] Add BufferOutputStream::Reset method
  • ARROW-1563 - [C++][FOLLOWUP] Use std::function instead of declaring auxiliary helper classes
  • ARROW-1563 - [C++] Implement logical unary and binary kernels for boolean arrays
  • ARROW-1860 - [C++] Add data structure to “stage” a sequence of IPC messages from in-memory data
  • ARROW-1949 - [Python/C++] Add option to Array.from_pandas and pyarrow.array to perform unsafe casts
  • ARROW-1963 - [C++/Python] Create Array from sequence of numpy.datetime64
  • ARROW-1968 - [C++/Python] Add basic unit tests for ORC reader
  • ARROW-2165 - [JAVA] enhance AllocationListener with onChildAdded()/onChildRemoved() calls (#2697)
  • ARROW-2338 - [Scripts] Windows release verification script should create a conda environment
  • ARROW-2352 - [C++/Python] Test OSX packaging in Travis matrix
  • ARROW-2519 - [Rust] Implement min/max for primitive arrays
  • ARROW-2520 - [Rust] CI should also build against nightly Rust
  • ARROW-2555 - [C++/Python] Allow Parquet-Arrow writer to truncate timestamps instead of failing
  • ARROW-2583 - [Rust] Buffer should be typeless
  • ARROW-2617 - [Rust] Schema should contain fields not columns
  • ARROW-2687 - [JS] Example usage in README is outdated
  • ARROW-2734 - [Python] Cython api example doesn't work by default on macOS
  • ARROW-2750 - [MATLAB] Initial MATLAB interface, support for reading numeric types from Feather files
  • ARROW-2799 - [Python] Add safe option to Table.from_pandas to avoid unsafe casts
  • ARROW-2813 - [CI][Followup] Disable gcov output in Travis-CI logs
  • ARROW-2813 - [CI] Mute uninformative lcov warnings
  • ARROW-2817 - [C++] Enable libraries to be installed in msys2 on Windows
  • ARROW-2840 - [C++] See if stream alignment logic can be simplified
  • ARROW-2865 - [C++/Python] Reduce some duplicated code in python/builtin_convert.cc
  • ARROW-2889 - [C++] Add optional argument to ADD_ARROW_TEST CMake function to add unit test prefix
  • ARROW-2900 - [Python] Improve performance of appending nested NumPy arrays in builtin_convert.cc
  • ARROW-2936 - [Python] Implement Table.cast for casting from one schema to another (if possible)
  • ARROW-2948 - [Packaging] Generate changelog with crossbow
  • ARROW-2950 - [C++] Clean up util/bit-util.h
  • ARROW-2952 - [C++] Dockerized include-what-you-use
  • ARROW-2958 - [C++] Bump Flatbuffers EP version to master to build on gcc 8.1
  • ARROW-2960 - [Packaging] Fix verify-release-candidate for binary packages and fix release cutting script for lib64 cmake issue
  • ARROW-2964 - [Go] wire all primitive arrays into array.MakeFromArray
  • ARROW-2971 - [Python] Give some modules in arrow/python more descriptive names
  • ARROW-2972 - [Python] Implement inference logic for uint64 conversions in builtin_convert.cc
  • ARROW-2975 - [Plasma] Fix TensorFlow operator compilation with pip package
  • ARROW-2976 - [Python] Fix pyarrow.get_library_dirs
  • ARROW-2979 - [GLib] Add operator functions in GArrowDecimal128
  • ARROW-2983 - [Packaging] Verify source release and binary artifacts in different scripts
  • ARROW-2989 - [C++/Python] Remove API deprecations in 0.10
  • ARROW-2991 - [CI] Cut down number of AppVeyor jobs
  • ARROW-2994 - [Python] Only include Python and NumPy include directories for libarrow_python targets
  • ARROW-2996 - [C++] Fix typo in cpp/.clang-tidy
  • ARROW-2998 - [C++][Resizable] Buffer
  • ARROW-2999 - [Python] Disable ASV runs in Travis CI for now
  • ARROW-3000 - [C++] Add option to label test groups then only build those unit tests
  • ARROW-3001 - [Packaging] Don't modify PATH during rust release verification
  • ARROW-3002 - [Python] Hash more parts of pyarrow.Field
  • ARROW-3003 - [Doc] Enable Java doc generation
  • ARROW-3005 - [Release] Update website, draft simple release blog post for 0.10.0
  • ARROW-3008 - [Packaging] Verify GPU related modules if available
  • ARROW-3009 - [Python] Fix pyarrow ORC reader
  • ARROW-3010 - [GLib] Update README to use Bundler
  • ARROW-3017 - [C++] Don't throw exception in arrow/util/thread-pool.h
  • ARROW-3018 - [Plasma][FOLLOWUP] Update plasma documentation
  • ARROW-3018 - [Plasma] Remove Mersenne twister
  • ARROW-3019 - [Packaging] Use Bundler to verify Arrow GLib
  • ARROW-3021 - [Go] add support for List arrays
  • ARROW-3022 - [Go] add support for Struct arrays
  • ARROW-3023 - [C++] Add gold linker enabling logic from Apache Kudu
  • ARROW-3024 - [C++] Remove mutex in MemoryPool implementations
  • ARROW-3025 - [C++] Add option to switch between dynamic and static linking in unit test executables
  • ARROW-3026 - [Python][Plasma] Only run Plasma unit tests with valgrind under Python 3.6
  • ARROW-3027 - [Ruby] Stop “git tag” by “rake release”
  • ARROW-3028 - [Python] Do less work to test Python documentation build
  • ARROW-3029 - [Python] Generate version file when building
  • ARROW-3031 - [Go] streamline Release of Arrays and Builders
  • ARROW-3033 - [Dev] docker-compose test tooling does not seem to cache built Docker images
  • ARROW-3034 - [Packaging] Resolve symbolic link in tar.gz
  • ARROW-3035 - [Rust] Examples in README.md do not run
  • ARROW-3036 - [Go] implement array.NewSlice
  • ARROW-3037 - [Go] implement Null array
  • ARROW-3042 - [Go] add godoc badge to README
  • ARROW-3043 - [C++] pthread doesn't exist on MinGW
  • ARROW-3044 - [Python] Remove all occurrences of cython's legacy property definition syntax
  • ARROW-3045 - [Python] Remove nullcheck from ipc Message and MessageReader
  • ARROW-3046 - [GLib] Use rubyish method
  • ARROW-3050 - [C++] Adopt HiveServer2 client codebase from
  • ARROW-3051 - [C++] Status performance optimization from Impala/Kudu
  • ARROW-3057 - [INTEGRATION] Fix spark and hdfs dockerfiles
  • ARROW-3059 - [C++] Remove namespace arrow::test
  • ARROW-3060 - [C++] Factor out string-to-X conversion routines
  • ARROW-3062 - [Python] Fix python package finder to also work in Python 2.7
  • ARROW-3064 - [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables
  • ARROW-3067 - [Packaging] Support dev/rc/release .deb/.rpm builds
  • ARROW-3068 - [Packaging] Bump version to 0.11.0-SNAPSHOT
  • ARROW-3069 - [Release] Stop using SHA1 checksums per ASF policy
  • ARROW-3072 - [C++] Add RETURN_NOT_OK linting rule, use ARROW_RETURN_NOT_OK in header files
  • ARROW-3075 - [C++] Incorporate parquet-cpp codebase into Arrow C++ build
  • ARROW-3076 - [Website] Add Google Analytics scripts to Sphinx, Doxygen API docs
  • ARROW-3088 - [Rust] Use internal Result<T> type instead of Result<T, ArrowError>
  • ARROW-3090 - [Rust] Accompany error messages with assertions
  • ARROW-3094 - [Python] Easier construction of schemas and struct types
  • ARROW-3099 - [C++] Add benchmark for number parsing
  • ARROW-3105 - [Plasma] Improve flushing error message
  • ARROW-3106 - [Website] Update committers and PMC roster on website
  • ARROW-3109 - [Python] Add Python 3.7 virtualenvs to manylinux1 container
  • ARROW-3110 - [C++] Fix warnings with gcc 7.3.0
  • ARROW-3111 - [Java] Adding logback config file to allow running tests with different log level
  • ARROW-3114 - [Website] Add information about user@ mailing list to website / Community page
  • ARROW-3115 - [JAVA] Style checks - fix import ordering
  • ARROW-3116 - [Plasma] Add “ls” to object store
  • ARROW-3117 - [GLib] Add garrow_chunked_array_to_string()
  • ARROW-3119 - [Packaging] Nightly packaging script fails
  • ARROW-3127 - [Doc] Add Tutorial for Sending Tensor from C++ to Python
  • ARROW-3128 - [C++] Support system shared zlib
  • ARROW-3129 - [Packaging] Stop using deprecated BuildRoot and Group in .spec
  • ARROW-3130 - [Go] add initial support for Go modules
  • ARROW-3136 - [C++] Clean up public API
  • ARROW-3142 - [C++] Fetch all libs from toolchain environment
  • ARROW-3143 - [C++] CopyBitmap into existing memory
  • ARROW-3146 - [C++] Prototype Flight RPC client and server implementations
  • ARROW-3147 - [C++] Improve MSVC version detection
  • ARROW-3148 - [C++] Remove needless U+00A0 NO-BREAK SPACE (#2500)
  • ARROW-3152 - [Packaging] Add zlib to runtime dependencies for arrow-cpp conda package
  • ARROW-3153 - [Packaging] Fix broken nightly package builds introduced with recent cmake changes and orc tests
  • ARROW-3157 - [C++] Add Buffer::Wrap, MutableBuffer::Wrap convenience methods for wrapping typed memory, std::vector
  • ARROW-3158 - [C++] Handle float truncation during casting
  • ARROW-3160 - [Python] Improve pathlib.Path support in parquet and filesystem modules
  • ARROW-3163 - [Python] Add missing Cython dependency to source package
  • ARROW-3167 - [CI] Limit clcache cache size
  • ARROW-3168 - [C++] Restore pkgconfig for Parquet C++ libraries
  • ARROW-3170 - [C++] Experimental readahead spooler
  • ARROW-3171 - [Java] Enable checkstyle for line length and indentation
  • ARROW-3172 - [Rust] Update documentation for datatypes.rs
  • ARROW-3174 - [Rust] run examples as part of CI
  • ARROW-3177 - [Rust] Update expected error messages for tests that ‘should panic’
  • ARROW-3180 - [C++] Add docker-compose setup to simulate Travis CI run locally
  • ARROW-3181 - [Packaging] Adjust conda package scripts to account for Parquet codebase migration
  • ARROW-3182 - [Gandiva] Integrate gandiva to arrow build. Update licenses to apache license.
  • ARROW-3187 - [C++] Add support for using glog (Google logging library)
  • ARROW-3195 - [C++] Add missing error check for NumPy initialization in test
  • ARROW-3196 - Add support for merging both ARROW and PARQUET patches
  • ARROW-3197 - [C++] Add instructions for building Parquet libraries and running the unit tests
  • ARROW-3198 - [Website] Blog post for 0.11 release
  • ARROW-3211 - [C++] Disable gold linker with MinGW-w64
  • ARROW-3212 - [C++] Make IPC metadata deterministic, regardless of current stream position. Clean up stream / tensor alignment logic
  • ARROW-3213 - [C++] Use CMake to build vendored Snappy on Windows
  • ARROW-3214 - [C++] Disable insecure warnings in MinGW build
  • ARROW-3215 - [C++] Add support for finding libpython on MSYS2
  • ARROW-3216 - [C++] Add missing libpython link to libarrow_python in MinGW build
  • ARROW-3217 - [C++] Add missing ARROW_STATIC definition in MinGW build
  • ARROW-3218 - [C++] Remove needless links to utilities in MinGW build
  • ARROW-3219 - [C++] Use Win32 API in MinGW build
  • ARROW-3223 - [GLib] Use the same shared object versioning rule in C++
  • ARROW-3229 - [Packaging] Adjust wheel package scripts to account for Parquet codebase migration
  • ARROW-3234 - [C++] Fix libprotobuf shared library link order
  • ARROW-3235 - [Packaging] Update deb names
  • ARROW-3236 - [C++] Fix stream accounting bug causing garbled schema message when writing IPC file format
  • ARROW-3240 - [GLib] Add build instructions using meson
  • ARROW-3242 - [C++] Make CpuInfo a singleton, use coarser-grained dispatch to SSE4 in Parquet dictionary encoding
  • ARROW-3249 - [Python] Run flake8 on integration_test.py and crossbow.py
  • ARROW-3250 - [C++] Buffer implementation which owns memory from a std::string
  • ARROW-3252 - [C++] Do not hard code the “v” part of versions in thirdparty toolchain
  • ARROW-3257 - [C++] Stop using IMPORTED_LINK_INTERFACE_LIBRARIES
  • ARROW-3258 - [GLib] Fix CI failure on macOS
  • ARROW-3259 - [GLib] Rename “writeable” to “writable”
  • ARROW-3261 - [Python] Add “field” method to select fields from StructArray
  • ARROW-3262 - [Python] Implement getitem with integers on pyarrow.Column
  • ARROW-3264 - [Java] Checkstyle fix whitespace
  • ARROW-3267 - [Python] Create empty table from schema
  • ARROW-3268 - [CI][skip travis]
  • ARROW-3269 - [Python] Fix warnings in unit test suite
  • ARROW-3270 - [Release] Adjust release verification scripts to recent parquet migration
  • ARROW-3274 - [Packaging] Missing glog dependency from conda-forge recipes
  • ARROW-3276 - [Packaging] Add support for Parquet deb/rpm packages
  • ARROW-3281 - [Java] Make sure that WritableByteChannel in WriteChannel writes
  • ARROW-3282 - [R] initial R functionality
  • ARROW-3284 - [R][C++] Status code R error
  • ARROW-3285 - [GLib] Add arrow_cpp_build_type and arrow_cpp_build_dir options
  • ARROW-3286 - [C++] Add missing ARROW_EXPORT to RecordBatchBuilder
  • ARROW-3287 - [C++] Suppress “redeclared without dllimport attribute” warning from MinGW
  • ARROW-3288 - [GLib] Add missing new API index for 0.11.0
  • ARROW-3300 - [Release] Update deb package names in preparation
  • ARROW-3301 - [Website] Update Jekyll and Bootstrap 4
  • ARROW-3305 - [JS] Incorrect development documentation link in javascript readme
  • ARROW-3309 - [JS] Missing links from DEVELOP.md
  • ARROW-3313 - [R] Follow-up: install clang-format in R CI entry
  • ARROW-3313 - [R] Move .clang-format to top level. Add r/lint.sh script for linting R C++ files in Travis CI
  • ARROW-3319 - [GLib] Add align() to GArrowInputStream and GArrowOutputStream
  • ARROW-3320 - [C++] Improve float parsing performance
  • ARROW-3321 - [C++] Improve integer parsing performance
  • ARROW-3334 - [Python] Update conda packages to new numpy requirement
  • ARROW-3335 - [Python] Add ccache to manylinux1 container
  • ARROW-3339 - [R] Support for character vectors
  • ARROW-3341 - [R] Support for logical vector
  • ARROW-3349 - [C++] Use aligned_* API in MinGW
  • ARROW-3350 - [Website] Fix powered by links
  • ARROW-3352 - [Packaging] Fix recently failing wheel builds
  • ARROW-3356 - [Python] Document parameters of Table.to_pandas method
  • ARROW-3357 - [Rust] Add a mutable buffer implementation
  • ARROW-3360 - [GLib] Import Parquet GLib
  • ARROW-3363 - [C++/Python] Add helper functions to detect scalar Python types
  • ARROW-3371 - [Python] Remove check_metadata argument for Field.equals docstring
  • ARROW-3375 - [Rust] remove unused mempool
  • ARROW-3376 - [C++] Add double-conversion to cpp/thirdparty/download_dependencies.sh
  • ARROW-3377 - [Gandiva][C++] Replace If statement with bit operations for bitmap
  • ARROW-3382 - [C++] Run Gandiva tests in Travis CI
  • ARROW-3392 - [Python] Support filters in disjunctive normal form in ParquetDataset
  • ARROW-3395 - [C++/Python] Add docker container for linting
  • ARROW-3397 - [C++] Change a CMake relative path for modules
  • ARROW-3400 - [Packaging] Add support for Parquet GLib deb/rpm
  • ARROW-3404 - [C++] Make CSV chunker faster
  • ARROW-3411 - [Packaging] Make dev/release/01-perform.sh executable
  • ARROW-3412 - [Packaging] Update rat exclude files
  • ARROW-3413 - [Packaging] Include Parquet GLib document to source archive
  • ARROW-3415 - [Packaging] Fix “conda activate” failure
  • ARROW-3416 - [Packaging] Use SHA512 instead of SHA1
  • ARROW-3417 - [Packaging] Fix Parquet C++ test failure
  • ARROW-3418 - [C++] Update parquet-cpp version to 1.5.1-SNAPSHOT
  • ARROW-3423 - [Packaging] Remove RC information from deb/rpm packages
  • ARROW-3443 - [Java] Flight reports memory leaks in TestBasicOperation
  • PARQUET-169 - Implement support for bulk reading and writing rep/def levels
  • PARQUET-267 - Detach thirdparty code from build configuration.
  • PARQUET-416 - C++11 compilation, code reorg, libparquet and installation targets
  • PARQUET-418 - Refactored parquet_reader utility for printing file contents.
  • PARQUET-428 - Support INT96 and FIXED_LEN_BYTE_ARRAY types
  • PARQUET-434 - Add a ParquetFileReader class
  • PARQUET-435 - Change column reader methods to be array-oriented rather than scalar
  • PARQUET-436 - Implement basic Write Support
  • PARQUET-437 - Add googletest setup and ADD_PARQUET_TEST helper
  • PARQUET-438 - Update RLE encoding tools and add unit tests from Impala
  • PARQUET-439 - Conform copyright headers to ASF requirements
  • PARQUET-442 - Nested schema conversion, Thrift struct decoupling, dump-schema utility
  • PARQUET-448 - Add cmake options to not build tests and/or executables
  • PARQUET-449 - updated to latest parquet.thrift
  • PARQUET-451 - Add RowGroupReader helper class and refactor parquet_reader.cc into DebugPrint
  • PARQUET-456 - Finish gzip implementation and unit test all compressors
  • PARQUET-463 - Add local DCHECK macros, fix some dcheck bugs exposed
  • PARQUET-468 - Use thirdparty Thrift compiler to compile parquet.thrift at make time
  • PARQUET-477 - Add clang-format / clang-tidy checks to toolchain
  • PARQUET-482 - Organize public API headers
  • PARQUET-485 - Decouple page deserialization from column reader to facilitate unit testing
  • PARQUET-488 - Add SSE cmake toggle, fix build on systems without SSE
  • PARQUET-489 - Shared library symbol visibility
  • PARQUET-494 - Implement DictionaryEncoder and test dictionary decoding
  • PARQUET-496 - Fix cpplint configuration to catch more style errors
  • PARQUET-497 - Decouple serialized file internals from the ParquetFileReader public API
  • PARQUET-499 - Complete PlainEncoder implementation for all primitive types and test end to end
  • PARQUET-501 - Add OutputStream abstract interface, refactor encoding code paths
  • PARQUET-503 - Reenable parquet 2.0 encoding implementations.
  • PARQUET-508 - Add ParquetFilePrinter
  • PARQUET-512 - Add Google benchmark for performance testing
  • PARQUET-515 - Add “SetData” to LevelDecoder
  • PARQUET-518 - Remove -Wno-sign-compare and scrub integer signedness
  • PARQUET-519 - Remove last of suppressed compiler warnings
  • PARQUET-520 - Add MemoryMapSource and add unit tests for both it and LocalFileSource
  • PARQUET-533 - Add a Buffer abstraction, refactor input/output classes to be simpler using Buffers
  • PARQUET-538 - Improve ColumnReader Tests
  • PARQUET-542 - Support custom memory allocators
  • PARQUET-545 - Improve API to support decimal type
  • PARQUET-547 - Refactor templates to all be based on DataType structs
  • PARQUET-551 - Handle compiler warnings due to disabled DCHECKs in relea…
  • PARQUET-556 - Extend RowGroupStatistics to include “min” “max” statistics
  • PARQUET-559 - Enable external RandomAccessSource as input to the ParquetFileReader
  • PARQUET-564 - Add cmake option to run valgrind on each unit test executable
  • PARQUET-566 - Add method to retrieve the full column path
  • PARQUET-568 - Enable top-level column selection.
  • PARQUET-572 - Rename public namespace to parquet from parquet_cpp
  • PARQUET-573 - Create a public API for reading and writing file metadata
  • PARQUET-582 - Conversion functions for Parquet enums to Thrift enums
  • PARQUET-583 - Parquet to Thrift schema conversion
  • PARQUET-587 - Implement BufferReader::Read(int64_t,uint8_t*)
  • PARQUET-589 - Implement BufferedInputStream for better memory usage
  • PARQUET-592 - Support compressed writes
  • PARQUET-593 - Add API for writing Page statistics
  • PARQUET-595 - API for KeyValue metadata
  • PARQUET-597 - Add data rates to benchmark output
  • PARQUET-598 - Test writing all primitive data types
  • PARQUET-600 - Add benchmarks for RLE-Level encoding
  • PARQUET-603 - Implement missing information in schema descriptor
  • PARQUET-605 - Expose schema node in ColumnDescriptor
  • PARQUET-607 - Public writer header
  • PARQUET-610 - Print additional ColumnMetaData for each RowGroup
  • PARQUET-616 - WriteBatch should accept const arrays
  • PARQUET-619 - Add OutputStream for local files
  • PARQUET-625 - Improve RLE read performance
  • PARQUET-633 - Add version to WriterProperties
  • PARQUET-634 - Consistent private linking of dependencies
  • PARQUET-636 - Expose selection for different encodings
  • PARQUET-641 - Instantiate stringstream only if needed in SerializedPageReader::NextPage
  • PARQUET-646 - Add options to make developing with clang and 3rd-party gcc easier
  • PARQUET-666 - Add support for writing dictionaries
  • PARQUET-671 - performance improvements for rle/bit-packed decoding
  • PARQUET-679 - Local Windows build and Appveyor support
  • PARQUET-679 - Fix debug asserts in tests (msvc/debug build)
  • PARQUET-679 - [C++] Resolve unit tests issues on Windows; Run unit tes…
  • PARQUET-681 - Add tool to scan a parquet file
  • PARQUET-687 - C++: Switch to PLAIN encoding if dictionary grows too large
  • PARQUET-689 - C++: Compress DataPages eagerly
  • PARQUET-699 - Update parquet.thrift from https://github.com/apache/parquet-format
  • PARQUET-712 - Add library to read into Arrow memory
  • PARQUET-721 - benchmarks for reading into Arrow
  • PARQUET-724 - Test more advanced properties setting
  • PARQUET-728 - Incorporate upstream Arrow API changes
  • PARQUET-731 - API to return metadata size and Skip reading values
  • PARQUET-737 - Use absolute namespace in macros
  • PARQUET-752 - Account for upstream Arrow API changes
  • PARQUET-762 - C++: Use optimistic allocation instead of Arrow Builders
  • PARQUET-763 - C++: Expose ParquetFileReader through Arrow reader
  • PARQUET-769 - Add support for Brotli compression
  • PARQUET-778 - Standardize the schema output to match the parquet-mr format
  • PARQUET-782 - Support writing to Arrow sinks
  • PARQUET-785 - LIST schema conversion for Arrow lists
  • PARQUET-805 - Read Int96 into Arrow Timestamp(ns)
  • PARQUET-807 - Allow user to retain ownership of parquet::FileMetaData.
  • PARQUET-809 - Add SchemaDescriptor::Equals method
  • PARQUET-813 - Build thirdparty dependencies using ExternalProject
  • PARQUET-820 - Decoders should directly emit arrays with spacing for null entries
  • PARQUET-829 - Make use of ARROW-469
  • PARQUET-830 - Add parquet::arrow::OpenFile with additional properties and metadata args
  • PARQUET-833 - C++: Provide API to write spaced arrays
  • PARQUET-834 - Support I/O of arrow::ListArray
  • PARQUET-835 - Read Arrow columns in parallel with thread pool
  • PARQUET-836 - Bugfix + testcase for column subsetting in arrow::FileReader::ReadFlatTable
  • PARQUET-844 - Schema, compression consolidation / flattening
  • PARQUET-848 - Build Thrift bits as part of main parquet_objlib component
  • PARQUET-857 - Flatten parquet/encodings directory, consolidate code
  • PARQUET-858 - Flatten column directory, minor code consolidation
  • PARQUET-859 - Flatten parquet/file directory, consolidate file reader, file writer code
  • PARQUET-862 - Provide default cache size values
  • PARQUET-866 - API fixes for ARROW-33 patch
  • PARQUET-867 - Support writing sliced Arrow arrays
  • PARQUET-874 - Use default memory allocator from Arrow
  • PARQUET-877 - Update Arrow Hash, update Version in metadata.
  • PARQUET-882 - Improve Application Version parsing
  • PARQUET-890 - Support I/O of DATE columns in parquet_arrow
  • PARQUET-894 - Fix compilation warnings
  • PARQUET-894 - Fix compilation warning
  • PARQUET-897 - Only use designated public headers from libarrow
  • PARQUET-903 - Add option to set RPATH to origin
  • PARQUET-909 - Reduce buffer allocations (mallocs) on critical path
  • PARQUET-911 - [C++] Support nested structs in parquet_arrow
  • PARQUET-928 - Support pkg-config
  • PARQUET-929 - Handle arrow::DictionaryArray when writing Arrow data
  • PARQUET-930 - Add timestamp[us] to schema test
  • PARQUET-934 - Support multiarch on Debian
  • PARQUET-935 - Set version to shared library
  • PARQUET-946 - Add ReadRowGroup and num_row_group methods to arrow::FileReader
  • PARQUET-953 - Add static constructors to arrow::FileWriter for initializing from schema, add WriteTable method
  • PARQUET-967 - Combine libparquet, libparquet_arrow libraries
  • PARQUET-970 - Add Lz4 and Zstd compression codecs
  • PARQUET-978 - [C++] Minimizing footer reads for small(ish) metadata
  • PARQUET-984 - Add abi and so version to pkg-config
  • PARQUET-991 - Resolve msvc warnings; Appveyor treats msvc warnings as …
  • PARQUET-991 - Fix msvc warning C4100: ‘’: unreferenced formal parameter
  • PARQUET-999 - Improve MSVC build - Enable PARQUET_BUILD_BENCHMARKS
  • PARQUET-1008 - [C++] TypedColumnReader::ReadBatch method updated to ac…
  • PARQUET-1035 - Write Int96 from Arrow timestamp(ns)
  • PARQUET-1037 - allow arbitrary size row-groups
  • PARQUET-1041 - Support Arrow's NullArray
  • PARQUET-1043 - Raise minimum CMake version to 3.2, delete cruft.
  • PARQUET-1044 - Use compression libraries from Apache Arrow
  • PARQUET-1045 - Remove code that's being moved to Apache Arrow in ARROW-1154
  • PARQUET-1053 - Fix unused result warnings due to unchecked Statuses
  • PARQUET-1068 - Modify .clang-format to use straight Google format with 90-character line width
  • PARQUET-1072 - Build with ARROW_NO_DEPRECATED_API in Travis CI
  • PARQUET-1078 - Add option to coerce Arrow timestamps to a particular unit
  • PARQUET-1079 - Remove Arrow offset shift unneeded after ARROW-1335
  • PARQUET-1083 - Factor logic in parquet-scan.cc into a library function to help with perf testing
  • PARQUET-1086 - [C++] Remove usage of arrow/util/compiler-util.h
  • PARQUET-1087 - Add ScanContents function to arrow::FileReader that catches Parquet exceptions
  • PARQUET-1092 - Support writing chunked arrow::Table columns
  • PARQUET-1093 - Improve Arrow level generation error message
  • PARQUET-1094 - Add benchmark for boolean Arrow column I/O
  • PARQUET-1095 - [C++] Read and write Arrow decimal values
  • PARQUET-1104 - Upgrade to Apache Arrow 0.7.0 RC0
  • PARQUET-1150 - Hide statically linked boost symbols
  • PARQUET-1160 - [C++] Implement BYTE_ARRAY-backed Decimal reads
  • PARQUET-1164 - [C++] Account for API changes in ARROW-1808
  • PARQUET-1165 - Pin clang-format version to 4.0
  • PARQUET-1166 - Add GetRecordBatchReader in parquet/arrow/reader
  • PARQUET-1177 - Add PARQUET_BUILD_WARNING_LEVEL option and more rigorous Clang warnings
  • PARQUET-1196 - Example parquet_arrow project
  • PARQUET-1200 - Support reading a single Arrow column from a Parquet file
  • PARQUET-1218 - More informative error message on too short pages
  • PARQUET-1225 - NaN values may lead to incorrect filtering under certai…
  • PARQUET-1227 - Thrift crypto metadata structures
  • PARQUET-1256 - Add --print-key-value-metadata option to parquet_reader tool
  • PARQUET-1267 - [C++] replace “unsafe” std::equal by std::memcmp
  • PARQUET-1276 - [C++] Reduce the amount of memory used for writing null decimal values
  • PARQUET-1279 - [C++] Adding use of ASSERT_NO_FATAL_FAILURE in unit tests when calling helper functions that call ASSERT_ macros
  • PARQUET-1301 - [C++] Crypto package in parquet-cpp
  • PARQUET-1308 - [C++] Use Arrow thread pool, not Arrow ParallelFor, fix deprecated APIs, upgrade clang-format version. Fix record delimiting bug
  • PARQUET-1323 - Fix compiler warnings on clang-6
  • PARQUET-1332 - Add bloom filter for parquet
  • PARQUET-1340 - Fix Travis CI valgrind errors related to std::random_de…
  • PARQUET-1346 - [C++] Protect against empty Arrow arrays with null values
  • PARQUET-1348 - Add ability to write FileMetaData in arrow FileWriter
  • PARQUET-1350 - [C++] Use abstract ResizableBuffer instead of concrete PoolBuffer
  • PARQUET-1360 - Use conforming API style, variable names in WriteFileMetaData functions
  • PARQUET-1366 - [C++] Streamline use of Arrow's bit-util.h APIs
  • PARQUET-1372 - Add an API to allow writing RowGroups based on size
  • PARQUET-1378 - Allow RowGroups with zero rows to be written
  • PARQUET-1382 - [C++] Prepare for arrow::test namespace removal
  • PARQUET-1392 - Read multiple RowGroups at once into an Arrow table
  • PARQUET-1398 - [C++] move iv_prefix to Algorithms
  • PARQUET-1401 - [C++] optional RowGroup fields for handling hidden columns
  • PARQUET-1427 - [C++] Incorporate with build system, parquet target. Fix
  • PARQUET-1431 - [C++] Automatically set thrift to use boost for thrift versions before 0.11

Bug Fixes

  • ARROW-1380 - [Plasma] Fix “still reachable” valgrind warnings when PLASMA_VALGRIND=1
  • ARROW-1661 - [Python] Build Python 3.7 in manylinux container
  • ARROW-1799 - [Plasma C++] Make unittest does not create plasma store executable
  • ARROW-1996 - [Python] pyarrow.read_serialized cannot read concatenated records
  • ARROW-2027 - [C++] ipc::Message::SerializeTo does not pad the message body
  • ARROW-2220 - Only suggest default fix version that is a mainline release in merge tool
  • ARROW-2310 - Source release scripts fail with Java8
  • ARROW-2646 - [C++/Python] Pandas roundtrip for date objects
  • ARROW-2775 - [Python] ccache error when building manylinux1 wheels
  • ARROW-2776 - [C++] Do not pass -Wno-noexcept-type for compilers that do not support it
  • ARROW-2782 - [Python] Ongoing Travis CI failures in Plasma unit tests
  • ARROW-2785 - [C++] Crash in json-integration-test
  • ARROW-2814 - [Python] Unify conversion paths for sequences of Python objects
  • ARROW-2854 - [C++/Python] Casting float NaN to int should raise an error on safe cast
  • ARROW-2925 - [JS] Documentation failing in docker container
  • ARROW-2965 - [Python] Guard against overflow when serializing Numpy uint64 scalar
  • ARROW-2966 - [Python] Data type conversion error
  • ARROW-2973 - [Python] pitrou/asv.git@customize_commands does not work with the “new” way of activating conda
  • ARROW-2974 - [Python] Replace usages of “source activate” with “conda activate” in CI scripts
  • ARROW-2986 - [C++] Use /EHsc flag for exception handling on MSVC, disable C4772 compiler warning in arrow/util/logging.h
  • ARROW-2992 - [Python] Fix Parquet benchmark
  • ARROW-2992 - [CI] Remove some AppVeyor build configurations
  • ARROW-3006 - [GLib] Fix a bug that .gir/.typelib for GPU aren't installed
  • ARROW-3007 - [Packaging] Remove needless dependencies
  • ARROW-3011 - [CI] Remove Slack notification
  • ARROW-3012 - [Python] Fix setuptools_scm usage
  • ARROW-3013 - [Website] Fix download links on website for tarballs, checksums
  • ARROW-3015 - [Python] Fix typo in uint8() docstring
  • ARROW-3047 - [C++/Python] Better build instructions with ORC
  • ARROW-3049 - [C++/Python] Fix reading empty ORC file
  • ARROW-3053 - [Python] Add unit test for strided object conversion that was failing in 0.9.0
  • ARROW-3056 - [Python] Add notes to NativeFile docstrings for BufferedIOBase methods that are not implemented
  • ARROW-3061 - [JAVA] Fix BufferAllocator#getHeadroom (#2434)
  • ARROW-3065 - [Python] concat_tables() failing from bad Pandas Metadata
  • ARROW-3083 - [CI][skip appveyor]
  • ARROW-3093 - [C++] Linking errors with ORC enabled
  • ARROW-3095 - [Plasma] Move plasma store
  • ARROW-3098 - [C++/Python] Allow seeking at end of BufferReader and FixedSizeBufferWriter
  • ARROW-3100 - [GLib] Follow Homebrew change that lua splits luarocks
  • ARROW-3125 - [C++] Add support for finding libpython on MSYS2
  • ARROW-3125 - [Python] Update ASV instructions
  • ARROW-3132 - Regenerate 0.10.0 changelog given JIRA metadata updates
  • ARROW-3137 - [Python] pyarrow 0.10 requires newer version of numpy than specified in requirements
  • ARROW-3140 - [Plasma] Fix Plasma build with GPU support
  • ARROW-3141 - [Python] Raise numpy global requirement to 1.14
  • ARROW-3145 - [C++] Thrift compiler reruns in arrow/dbi/hiveserver2/thrift when using Ninja build
  • ARROW-3173 - [Rust] dynamic_types example does not run
  • ARROW-3175 - [Java] Switch to official flatbuffers Java artifact and com.github.icexelloss for flatc executable artifact
  • ARROW-3183 - [Python] Fix get_library_dirs on Windows
  • ARROW-3188 - [Python] Table.from_arrays segfaults if lists and schema are passed
  • ARROW-3190 - [C++] Rename Writeable references to Writable, add backwards compatibility, deprecations
  • ARROW-3206 - [C++] Fix CMake error when ARROW_HIVESERVER2=ON but tests disabled
  • ARROW-3227 - [Python] Require bytes-like input to NativeFile.write
  • ARROW-3228 - [Python] Do not allow PyObject_GetBuffer to obtain non-readonly Py_buffer when pyarrow Buffer is not mutable
  • ARROW-3231 - [Python] Sphinx's autodoc_default_flags is now deprecated
  • ARROW-3237 - [CI] Update linux packaging filenames in rat exclusion list
  • ARROW-3241 - [Plasma] test_plasma_list test failure on Ubuntu 14.04
  • ARROW-3251 - [C++] Fix conversion warnings in cast.cc
  • ARROW-3256, 3304 - [JS] fix file footer inconsistency, yield all messages from the stream reader
  • ARROW-3271 - [Python] Manylinux1 builds timing out in Travis CI
  • ARROW-3279 - [C++] Allow linking Arrow tests dynamically on Windows
  • ARROW-3299 - [C++] Make RecordBatchBuilder non-copyable to appease MSVC
  • ARROW-3322 - [CI] Fix AppVeyor script to skip Rust job when no Rust changes
  • ARROW-3327 - [Python] Use local Arrow checkout instead of separate clone
  • ARROW-3338 - [Python] Crash when schema and columns do not match
  • ARROW-3342 - Appveyor builds have stopped triggering on GitHub
  • ARROW-3348 - [Plasma] Fix bug in which plasma store dies when object created by remo…
  • ARROW-3354 - [Python] Swap cuda.read_record_batch arguments
  • ARROW-3369 - [Packaging] Wheel builds are failing due to wheel 0.32 release
  • ARROW-3370 - [Packaging] Suppress BFD warnings on CentOS 6
  • ARROW-3373 - [Plasma] Fix bug when plasma client requests multiple objects and add test.
  • ARROW-3374 - [Python] Implicitly set from_pandas=True when passing pandas.Categorical to pyarrow.array. Preserve ordered categories
  • ARROW-3390 - [C++] cmake file under windows msys2 system doesn't work
  • ARROW-3393 - [C++] Add missing override on virtual dtor in task-group.cc
  • ARROW-3394 - [Java] Remove duplicate dependency in Flight for grpc-netty
  • ARROW-3403 - [Website] Source tarball link missing from install page
  • ARROW-3420 - [C++] Fix outstanding include-what-you-use issues in src/arrow, src/parquet codebases
  • PARQUET-232 - minor compilation issue
  • PARQUET-446 - Hide Thrift compiled headers and Boost from public API, #include scrubbing
  • PARQUET-454 - Fix inconsistencies with boolean PLAIN encoding
  • PARQUET-455 - Fix OS X / Clang compiler warnings
  • PARQUET-457 - Verify page deserialization for GZIP and SNAPPY codecs, related refactoring
  • PARQUET-469 - Roll back Thrift thirdparty and compiled sources to 0.9.0
  • PARQUET-472 - Changed the ownership of InputStream in ColumnReader.
  • PARQUET-505 - Column reader should automatically handle large data pages
  • PARQUET-507 - Reduce the runtime of rle-test
  • PARQUET-513 - Fail build if valgrind finds error during ctest, fix a core dump
  • PARQUET-525 - Add test coverage for failure modes in ParseMetaData
  • PARQUET-537 - Ensure that LocalFileSource is properly closed.
  • PARQUET-549 - Add column reader tests for dictionary pages
  • PARQUET-555 - Dictionary page metadata handling inconsistencies
  • PARQUET-561 - Add destructor to PIMPL
  • PARQUET-599 - Better size estimation for levels
  • PARQUET-604 - Add writer headers to installation
  • PARQUET-614 - Remove unneeded LZ4-related code
  • PARQUET-620 - Ensure metadata is written only once
  • PARQUET-621 - Add flag to indicate if decimalmetadata is set
  • PARQUET-629 - RowGroupSerializer should only close itself once
  • PARQUET-639 - Do not export DCHECK in public headers
  • PARQUET-643 - Add const modifier to schema pointer reference
  • PARQUET-657 - Do not define DISALLOW_COPY_AND_ASSIGN if already defined
  • PARQUET-658 - Add virtual destructor to ColumnReader
  • PARQUET-659 - Export extern templates for typed column reader/writer classes
  • PARQUET-662 - Compile ParquetException implementation and explicitly export
  • PARQUET-676 - Fix incorrect MaxBufferSize for small bit widths
  • PARQUET-691 - Write ColumnChunk metadata after chunk is complete
  • PARQUET-694 - Revert default data page size back to 1M
  • PARQUET-700 - Disable dictionary encoding for boolean columns
  • PARQUET-701 - Ensure that Close can be called multiple times
  • PARQUET-702 - Add a writer + reader example with detailed comments
  • PARQUET-703 - Validate that ColumnChunk metadata counts nulls in num_values
  • PARQUET-704 - Install scan-all.h
  • PARQUET-708 - account for “worst case scenario” in MaxBufferSize for bit_width > 1
  • PARQUET-710 - Remove unneeded private member variables from RowGroupReader ABI
  • PARQUET-711 - Use metadata builders in parquet writer
  • PARQUET-718 - Fix I/O of non-dictionary encoded pages
  • PARQUET-719 - Fix WriterBatch API to handle NULL values
  • PARQUET-720 - Mark ScanAllValues as inline to prevent link error
  • PARQUET-739 - Don't use a static buffer for data accessed by multiple threads
  • PARQUET-741 - Always allocate fresh buffers while compressing
  • PARQUET-742 - Add missing license headers
  • PARQUET-745 - TypedRowGroupStatistics fails to PlainDecode min and max in ByteArrayType
  • PARQUET-747 - Better hide TypedRowGroupStatistics in public API
  • PARQUET-759 - Fix handling of columns of empty strings
  • PARQUET-760 - Store correct encoding in fallback data pages
  • PARQUET-764 - Support batches for PLAIN boolean writes that aren't a multiple of 8
  • PARQUET-766 - Expose ParquetFileReader through Arrow reader as const
  • PARQUET-775 - Make TrackingAllocator thread-safe
  • PARQUET-779 - Export TypedRowGroupStatistics in libparquet
  • PARQUET-780 - WriterBatch API does not properly handle NULL values for byte array types
  • PARQUET-789 - Catch/translate ParquetExceptions in parquet::arrow::FileReader
  • PARQUET-793 - Do not return incorrect statistics
  • PARQUET-797 - Updates for ARROW-418 header API changes
  • PARQUET-799 - Fix bug in MemoryMapSource::CloseFile
  • PARQUET-812 - Read BYTE_ARRAY with no logical type as arrow::BinaryArray
  • PARQUET-816 - Workaround for incorrect column chunk metadata in parquet-mr <= 1.2.8
  • PARQUET-818 - Refactoring to utilize common IO, buffer, memory management abstractions and implementations
  • PARQUET-819 - Don't try to install no longer existing arrow/utils.h
  • PARQUET-827 - Account for arrow::MemoryPool API change and fix bug in reading Int96 timestamps
  • PARQUET-828 - Do not implicitly cast ParquetVersion enum to int
  • PARQUET-837 - Remove RandomAccessSource::Seek method which can be a source of thread safety problems
  • PARQUET-841 - Version number being incorrectly written for v1 files
  • PARQUET-842 - Do not set unnecessary fields in the parquet::SchemaElement
  • PARQUET-843 - Impala is thrown off by a REPEATED root schema node
  • PARQUET-846 - CpuInfo::Init() is not thread safe
  • PARQUET-880 - Prevent destructors from throwing
  • PARQUET-888 - Add missing virtual dtor.
  • PARQUET-889 - Fix compilation when SSE is enabled
  • PARQUET-892 - Specify public link targets for parquet_static so that transitive dependencies are linked in executables
  • PARQUET-895 - Fix broken reading of nested repeated columns
  • PARQUET-898 - Upgrade to googletest 1.8.0, move back to Xcode 6.4 in Travis CI
  • PARQUET-908 - Fix shared library visibility of some symbols in types.h
  • PARQUET-914 - Rewording exception message in column writer.
  • PARQUET-915 - Support additional Arrow date/time types and metadata
  • PARQUET-918 - Keep ordering in column indices when converting Parquet Schema
  • PARQUET-918 - FromParquetSchema API crashes on nested schemas
  • PARQUET-919 - Account for ARROW-683 changes, but make no functional changes. Set PARQUET_ARROW=on by default
  • PARQUET-923 - Account for Time type changes in Arrow
  • PARQUET-933 - Account for API changes in ARROW-728
  • PARQUET-936 - Return Invalid Status if chunk_size <= 0 when WriteTable in parquet-arrow
  • PARQUET-943 - Fix build error on x86
  • PARQUET-947 - Account for Arrow library consolidation in ARROW-795, API changes in ARROW-782
  • PARQUET-958 - [C++] Print Parquet metadata in JSON format
  • PARQUET-963 - Return NotImplemented when attempting to read a struct field
  • PARQUET-965 - Add FIXED_LEN_BYTE_ARRAY read and write support in parquet-arrow
  • PARQUET-979 - Limit size of min, max or disable stats for long binary types
  • PARQUET-992 - Do not transitively include zlib.h in public API
  • PARQUET-995 - Use sizeof(Int96) instead of Int96Type
  • PARQUET-997 - Fix override compiler warnings
  • PARQUET-1002 - Compute statistics based on Sort Order
  • PARQUET-1003 - Modify DEFAULT_CREATED_BY value for every new release v…
  • PARQUET-1007 - Update parquet.thrift
  • PARQUET-1029 - [C++] Some extern template symbols not being exported in gcc
  • PARQUET-1033 - Improve documentation about WriteBatchSpaced
  • PARQUET-1038 - Key value metadata should be nullptr if not set
  • PARQUET-1040 - Add missing writer methods
  • PARQUET-1042 - Fix Compilation breaks on GCC 4.8
  • PARQUET-1048 - Apache Arrow static transitive dependencies
  • PARQUET-1054 - Fixes for Arrow API changes in ARROW-1199
  • PARQUET-1071 - Check that arrow::FileWriter::Close() is idempotent
  • PARQUET-1085 - [C++] Use namespaced macros from arrow/util/macros.h, work around UNUSED rename
  • PARQUET-1088 - Remove parquet_version.h from version control since it gets auto generated
  • PARQUET-1090 - Add max row group length option, fix int32 overflow
  • PARQUET-1098 - Install util/comparison.h
  • PARQUET-1100 - Introduce RecordReader interface to better support nested data, refactor parquet/arrow/reader
  • PARQUET-1108 - Fix Int96 comparators
  • PARQUET-1114 - Apply changes for ARROW-1601 ARROW-1611, change shared l…
  • PARQUET-1121 - Handle Dictionary[Null] arrays on writing Arrow tables
  • PARQUET-1123 - [C++] Update parquet-cpp to use Arrow's AssertArraysEqual
  • PARQUET-1138 - Fix Arrow 0.7.1 build
  • PARQUET-1167 - [C++] FieldToNode function should return a status when throwing an exception
  • PARQUET-1175 - Fix arrow::ArrayData method rename from ShallowCopy to Copy
  • PARQUET-1179 - Upgrade to Thrift 0.11, use std::shared_ptr instead of boost::shared_ptr
  • PARQUET-1180 - Fix behaviour of num_children element of primitive nodes
  • PARQUET-1193 - [CPP] Implement ColumnOrder to support min_value and max_value
  • PARQUET-1226 - Fixes for CHECKIN compiler warning level with clang 5.0
  • PARQUET-1233 - Enable option to switch between stl classes and boost c…
  • PARQUET-1245 - Fix creating Arrow table with duplicate column names
  • PARQUET-1255 - Fix error message when PARQUET_TEST_DATA isn't defined
  • PARQUET-1265 - Segfault on static ApplicationVersion initialization
  • PARQUET-1268 - Fix conversion of null list Arrow arrays
  • PARQUET-1270 - Install executable tools
  • PARQUET-1272 - Return correct row count for nested columns in ScanFileContents
  • PARQUET-1273 - Properly write dictionary values when writing in chunks
  • PARQUET-1274 - Prevent segfault that was occurring when writing a nanosecond timestamp with arrow writer properties set to coerce timestamps and support deprecated int96 timestamps.
  • PARQUET-1283 - [C++] Remove trailing space for string and int96 statis…
  • PARQUET-1307 - Fix memory-test for newer Arrow
  • PARQUET-1315 - ColumnChunkMetaData.has_dictionary_page() should return…
  • PARQUET-1333 - [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc
  • PARQUET-1334 - [C++] memory_map parameter seems misleading in parquet file opener
  • PARQUET-1357 - FormatStatValue truncates binary statistics on zero character
  • PARQUET-1358 - index_page_offset should be unset as it is not supported
  • PARQUET-1369 - Disregard column sort order if statistics max/min are equal
  • PARQUET-1384 - fix clang build error for bloom_filter-test.cc

Apache Arrow 0.10.0 (2018-08-06)

Bug Fixes

  • ARROW-198 - [Java] OutOfMemoryError for vector test case
  • ARROW-640 - [Python] Implement hash and equality for Arrow scalar values
  • ARROW-2020 - [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
  • ARROW-2059 - [Python] Possible performance regression in Feather read/write path
  • ARROW-2101 - [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string
  • ARROW-2122 - [Python] Pyarrow fails to serialize dataframe with timestamp.
  • ARROW-2182 - [Python] Build C++ libraries in benchmarks build step
  • ARROW-2189 - [C++] Seg. fault on make_shared<PoolBuffer>
  • ARROW-2193 - [Plasma] plasma_store has runtime dependency on Boost shared libraries when ARROW_BOOST_USE_SHARED=on
  • ARROW-2195 - [Plasma] Return auto-releasing buffers
  • ARROW-2247 - [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault
  • ARROW-2273 - [Python] Raise NotImplementedError when pandas Sparse types serializing
  • ARROW-2300 - [C++/Python] Integration test for HDFS
  • ARROW-2305 - [Python] Bump Cython requirement to 0.27+
  • ARROW-2314 - [C++/Python] Fix union array slicing
  • ARROW-2326 - [Python] Use @loader_path/ as rpath instead of @loader_path when bundling C++ libraries in wheels on macOS
  • ARROW-2328 - [C++] Fixed and unit tested feather writing with slice
  • ARROW-2331 - [Python] Fix indexing for negative or out-of-bounds indices
  • ARROW-2333 - [Python] Fix bundling boost with default namespace
  • ARROW-2342 - [Python] Allow pickling more types
  • ARROW-2346 - [Python] Fix PYARROW_CXX_FLAGS with multiple options
  • ARROW-2349 - [Python] Opt in to bundling Boost shared libraries separately
  • ARROW-2351 - [C++] StringBuilder::append(vector...) not impleme…
  • ARROW-2354 - [C++] Make PyDecimal_Check() faster
  • ARROW-2355 - [Python] Unable to import pyarrow [0.9.0] OSX
  • ARROW-2357 - [Python] Add microbenchmark for PandasObjectIsNull()
  • ARROW-2368 - [JAVA] Correctly pad negative values in DecimalVector#setBigEndian (#1809)
  • ARROW-2369 - [Python] Fix reading large Parquet files (> 4 GB)
  • ARROW-2370 - [GLib] Fix include path in .pc on Meson build
  • ARROW-2371 - [GLib] Update “Requires” in .pc on GNU Autotools build
  • ARROW-2372 - [Python] ArrowIOError: Invalid argument when reading Parquet file
  • ARROW-2375 - [Rust] Implement Drop for Buffer so memory is released
  • ARROW-2377 - [GLib] Support old GObject Introspection
  • ARROW-2380 - [Python] Streamline conversions
  • ARROW-2382 - [Rust] Bug fix: List was not using aligned mem
  • ARROW-2383 - [deb] Use system Protocol Buffers
  • ARROW-2387 - [Python] Flip test for rescale loss if value < 0
  • ARROW-2391 - [C++/Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64
  • ARROW-2393 - [C++] Move ARROW_CHECK_OK[_PREPEND] macros from status.h into util/logging.h since they use the logging infrastructure and shouldn't be in the public API.
  • ARROW-2403 - [C++] arrow::CpuInfo::model_name_ destructed twice on exit
  • ARROW-2405 - [C++] <functional> is required for std::function
  • ARROW-2418 - [Rust] BUG FIX: reserve memory when building list
  • ARROW-2419 - [Site] Hard-code timezone
  • ARROW-2420 - [Rust] Fix major memory bug and add benches
  • ARROW-2421 - [C++] Update LLVM version in cpp README
  • ARROW-2423 - [Python] Enable DataType, Field and plasma ObjectID equality checks against no…
  • ARROW-2424 - [Rust] Fix build - add missing import
  • ARROW-2425 - [Rust] BUG FIX: Add u8 mappings for Array::from
  • ARROW-2426 - [GLib] Follow python -> python@3 change in Homebrew
  • ARROW-2432 - [Python] Fix Pandas decimal type conversion with None values
  • ARROW-2437 - [C++] Add ReadMessage without aligned argument.
  • ARROW-2438 - [Rust] memory_pool.rs misses license header
  • ARROW-2441 - [Rust] Builder::slice_mut assertions are too strict
  • ARROW-2443 - [Python] Allow creation of empty Dictionary indices
  • ARROW-2450 - [Python] Test for Parquet roundtrip of null lists
  • ARROW-2452 - [TEST] Spark integration test fails with permission error
  • ARROW-2454 - [C++] Allow zero-array chunked arrays
  • ARROW-2455 - [C++] Initialize the atomic bytes_allocated_ properly
  • ARROW-2457 - [GLib] Support large is_valids in builder's append_values()
  • ARROW-2459 - pyarrow: Segfault with pyarrow.deserialize_pandas
  • ARROW-2462 - [C++] Fix Segfault in UnpackBinaryDictionary
  • ARROW-2465 - [Plasma/GPU] Preserve plasma_store rpath
  • ARROW-2466 - [C++] Fix “append” flag to FileOutputStream
  • ARROW-2468 - [Rust] Builder::slice_mut() should take mut self.
  • ARROW-2471 - [Rust] Builder zero capacity fix
  • ARROW-2473 - [Rust] List empty slice assertion
  • ARROW-2474 - [Rust] Add windows support for memory pool abstraction
  • ARROW-2489 - [Plasma] Fix PlasmaClient ABI variation
  • ARROW-2491 - [Python] raise NotImplementedError on from_buffers with nested types
  • ARROW-2492 - [Python] Prevent segfault on accidental call of pyarrow.Array
  • ARROW-2500 - [Java] IPC Writers/readers are not always setting validity bits correctly
  • ARROW-2502 - [Rust] Restore Windows Compatibility
  • ARROW-2503 - [Python] Prevent trailing space character for string statistics
  • ARROW-2509 - Build for node 9.8
  • ARROW-2510 - [Python] Segmentation fault when converting empty column as categorical
  • ARROW-2511 - [Java] Fix BaseVariableWidthVector.allocateNew to not swallow exception (#1947)
  • ARROW-2514 - [Python] Speed up inferring nested Numpy array
  • ARROW-2515 - [Python] Add DictionaryValue class, fixing bugs with nested dictionaries
  • ARROW-2518 - [Java] Re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7
  • ARROW-2530 - [GLib] Support out-of-source directory build again
  • ARROW-2534 - [C++] Hide all zlib symbols from libarrow.so
  • ARROW-2545 - [Python] Link against required system libraries
  • ARROW-2554 - [Python] fix timestamp unit detection from python lists
  • ARROW-2557 - [Rust] Add badge for code coverage in README
  • ARROW-2561 - [C++] Fix double free in cuda-test under code coverage
  • ARROW-2564 - [C++] Replace deprecated method in documentation
  • ARROW-2565 - [Plasma] new subscriber cannot receive notifications about existing objects
  • ARROW-2570 - [Python] Add support for writing parquet files with LZ4 compression
  • ARROW-2571 - [C++] Lz4Codec doesn't properly handle empty data
  • ARROW-2575 - [Python] Exclude hidden files starting with . in ParquetManifest
  • ARROW-2578 - [Plasma] Use mersenne twister to generate random number
  • ARROW-2589 - [Python] Workaround regression in Pandas 0.23.0
  • ARROW-2593 - [Python] TypeError: data type “mixed-integer” not understood
  • ARROW-2594 - [Java] When realloc Vectors, zero out all unfilled bytes of new buffer
  • ARROW-2599 - [Python] pip install is not working without Arrow C++ being installed
  • ARROW-2601 - [Python] Prevent user from calling *MemoryPool constructors directly
  • ARROW-2603 - [Python] Allow date and datetime subclassing
  • ARROW-2615 - [Rust] Post refactor cleanup
  • ARROW-2622 - [C++] Array methods IsNull and IsValid are not complementary
  • ARROW-2629 - [Plasma] Iterator invalidation for pending_notifications_
  • ARROW-2630 - [JAVA] typo fix
  • ARROW-2632 - [Java] ArrowStreamWriter accumulates ArrowBlock but does not use them
  • ARROW-2640 - [JS] Write schema metadata
  • ARROW-2642 - [Python] Fail building parquet binding on Windows
  • ARROW-2643 - [C++] Travis-CI build failure with cpp toolchain enabled
  • ARROW-2644 - [Python] Fix prototype declaration in Parquet binding
  • ARROW-2655 - [C++] Fix compiler warnings with gcc 7
  • ARROW-2657 - [Python] Import TensorFlow python extension before pyarrow to avoid segfault
  • ARROW-2668 - [C++] Suppress -Wnull-pointer-arithmetic when compiling plasma/malloc.cc on clang
  • ARROW-2669 - [C++] EP_CXX_FLAGS not passed on when building gbenchmark
  • ARROW-2675 - Fix build error with clang-10 (Apple Clang / LLVM)
  • ARROW-2683 - [Python] Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()
  • ARROW-2690 - [Plasma] Use uniform function names in public APIs in Plasma. Add namespace around Flatbuffers
  • ARROW-2691 - [Rust] Update code formatting with latest Rust stable
  • ARROW-2693 - [Python] pa.chunked_array causes a segmentation fault on empty input
  • ARROW-2694 - [Python] ArrayValue string conversion returns the representation instead of the converted python object string
  • ARROW-2698 - [Python] Exception when passing a string to Table.column
  • ARROW-2711 - [Python] Fix inference from Pandas column with first empty list
  • ARROW-2715 - Address apt flakiness with launchpad.net
  • ARROW-2716 - [Python] Make manylinux1 base image independent of Python patch releases
  • ARROW-2721 - [C++] Fix ORC and Protocol Buffers link error
  • ARROW-2722 - [Python] Sanitize dtype number to handle edge cases
  • ARROW-2723 - [C++] Add .pc for arrow orc
  • ARROW-2726 - [C++] Fix the latest Boost version
  • ARROW-2727 - [Java] Fix POM file issue causing build failure in java/adapters/jdbc
  • ARROW-2741 - [Python] pa.array from np.datetime[D] and type=pa.date64 produces invalid results
  • ARROW-2744 - [C++] Avoid creating list arrays with a null values buffer
  • ARROW-2745 - [C++] ORC ExternalProject needs to declare dependency on vendored protobuf
  • ARROW-2747 - [Python] Fix huge pages Plasma test
  • ARROW-2754 - [Python] Change Python setup.py to make release builds by default
  • ARROW-2770 - [Packaging] Account for conda-forge compiler migration in conda recipes
  • ARROW-2773 - [Python] corrected partition_cols parameter name
  • ARROW-2781 - [Python] Download boost using curl in manylinux1 image
  • ARROW-2787 - [Python] Fix Cython usage instructions
  • ARROW-2795 - [Python] Run TensorFlow import workaround only on Linux platforms
  • ARROW-2806 - [C++/Python] More consistent null/nan handling
  • ARROW-2810 - [Plasma] Remove flatbuffers from public API
  • ARROW-2812 - [Ruby] interface for Arrow::StructArray
  • ARROW-2820 - [Python] Check that array lengths in RecordBatch.from_arrays are all the same
  • ARROW-2823 - [C++] Search for flatbuffers in /lib64
  • ARROW-2841 - [Go] support building in forks
  • ARROW-2850 - [C++/Python] Correctly set RPATHs on all binaries
  • ARROW-2851 - [C++] Update RAT excludes for new install file names
  • ARROW-2852 - [Rust] Make Array sync and send
  • ARROW-2856 - [Python/C++] Array constructor should not truncate floats when casting to int
  • ARROW-2862 - [C++] Ensure thirdparty download directory has been created in thirdparty/download_thirdparty.sh
  • ARROW-2867 - [Python] Incorrect example for Cython usage
  • ARROW-2871 - [Python] Raise when calling to_numpy() on boolean array
  • ARROW-2872 - [Python] Add tensorflow mark to opt-in to TF-related unit tests
  • ARROW-2876 - [Packaging] Replace ssh-URLs with https://
  • ARROW-2877 - [Packaging] crossbow submit results in duplicate Travis CI build
  • ARROW-2878 - [Packaging] README.md does not mention setting GitHub API token in user's crossbow repo settings
  • ARROW-2883 - [C++] Fix Clang warnings in code built with -DARROW_GPU=ON
  • ARROW-2891 - [Python] Preserve schema in write_to_dataset
  • ARROW-2894 - [GLib] Adjust tests to format refactor
  • ARROW-2895 - [CI] Add missing Ruby dependency on C++
  • ARROW-2896 - [GLib] Add missing exports
  • ARROW-2901 - [Java] Build is failing on Java9
  • ARROW-2902 - [Python] Clean up after build artifacts created by root docker user in HDFS integration test
  • ARROW-2903 - [C++] Setting -DARROW_HDFS=OFF breaks arrow build when linking against boost libraries
  • ARROW-2911 - [Python] Parquet binary statistics that end in ‘\0’ truncate last byte
  • ARROW-2917 - [Python] Use detach() to avoid PyTorch gradient errors
  • ARROW-2920 - [Python] Fix pytorch segfault
  • ARROW-2926 - [Python] Do not attempt to write tables with invalid schemas in ParquetWriter.write_table
  • ARROW-2930 - [C++] migrated MacOS specific code for shared library target
  • ARROW-2940 - [Python] Fix OSError when trying to load libcaffe2.so in pytorch 0.3.0
  • ARROW-2945 - [Packaging] Update argument check
  • ARROW-2955 - Fix typo in pyarrow's HDFS API result
  • ARROW-2963 - [C++] Make thread pool fork-safe
  • ARROW-2978 - [Rust] Travis CI build is failing
  • ARROW-2982 - The “--show-progress” option is only supported in wget 1.16 and higher
  • ARROW-3210 - [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

New Features and Improvements

  • ARROW-530 - [C++/Python] Provide subpools for better memory allocation …
  • ARROW-564 - [Python] Add Array.to_numpy()
  • ARROW-665 - C++: Move zeroing logic for (re)allocations to the Allocator
  • ARROW-889 - [Python/C++] Unify PrettyPrints between Python and C++
  • ARROW-902 - [C++] Script for downloading all thirdparty build dependencies and configuration for offline builds
  • ARROW-906 - [C++/Python] Read and write field metadata in IPC
  • ARROW-1018 - [C++] Create FileOutputStream, ReadableFile from file descriptor
  • ARROW-1163 - [Java] Java client support for plasma
  • ARROW-1388 - [Python] Add Table.drop method for removing columns
  • ARROW-1454 - [Python] Also match ArrowNotImplementedError in unsupported type conversions from pandas
  • ARROW-1715 - [Python] Implement pickling for Column, ChunkedArray, RecordBatch, Table
  • ARROW-1722 - [C++] Add linting script to find C++/CLI incompatibilities
  • ARROW-1731 - [Python] Add columns selector in Table.from_array
  • ARROW-1744 - [Plasma] Provide TensorFlow operator to transfer Tensors between Plasma and TensorFlow
  • ARROW-1780 - JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects (#1759)
  • ARROW-1858 - [Python] Added documentation for pq.write_dataset
  • ARROW-1868 - [Java] Change vector getMinorType to use MinorType instead of Types.MinorType
  • ARROW-1886 - [C++/Python] Flatten struct columns in table
  • ARROW-1913 - [Java] Disable Javadoc doclint with Java 8
  • ARROW-1928 - [C++] Add BitmapReader/BitmapWriter benchmarks
  • ARROW-1954 - [Python] Add metadata accessor to pyarrow.Field
  • ARROW-1964 - [Python] Expose StringBuilder to Python
  • ARROW-2014 - [Python] Document read_pandas method in pyarrow.parquet
  • ARROW-2055 - [Java] Upgrade to Java 8
  • ARROW-2060 - [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts
  • ARROW-2061 - [C++] Run ASAN builds in Travis CI
  • ARROW-2074 - [Python] Infer lists of dicts as struct arrays
  • ARROW-2097 - [CI, Python] Reduce Travis-CI verbosity
  • ARROW-2100 - [Python] Drop Python 3.4 support
  • ARROW-2140 - [Python] Improve float16 support
  • ARROW-2141 - [Python] Support variable length binary conversion from Pandas
  • ARROW-2147 - [Python] Fix type inference of numpy arrays
  • ARROW-2207 - [GLib] Support GArrowDecimal128
  • ARROW-2222 - handle untrusted inputs
  • ARROW-2224 - [C++] Remove boost-regex dependency
  • ARROW-2241 - [Python] Simple script for running all current ASV benchmarks at a commit or tag
  • ARROW-2264 - [Python] Efficiently serialize numpy arrays with dtype of unicode fixed length string
  • ARROW-2267 - Rust bindings
  • ARROW-2276 - [Python] Expose buffer protocol on Tensor
  • ARROW-2281 - [Python] Add Array.from_buffers()
  • ARROW-2285 - [C++/Python] Can't convert Numpy string arrays
  • ARROW-2286 - [C++/Python] Allow subscripting pyarrow.lib.StructValue
  • ARROW-2287 - [Python] chunked array not iterable, not indexable
  • ARROW-2299 - [Go] Import Go arrow implementation from influxdata/arrow
  • ARROW-2301 - [Python] Build source distribution inside the manylinux1 docker
  • ARROW-2302 - [GLib] Unify GNU Autotools build and Meson build into one Travis CI job
  • ARROW-2308 - [Python] Make deserialized numpy arrays 64-byte aligned.
  • ARROW-2315 - [C++/Python] Flatten struct array
  • ARROW-2319 - [C++] Add BufferedOutputStream class
  • ARROW-2322 - [Java] Document dev environment requirements for publishing Java release artifacts
  • ARROW-2325 - [Python] Update setup.py to use Markdown project description
  • ARROW-2330 - [C++] Optimize delta buffer creation with partially finishable array builders
  • ARROW-2332 - Add Feather Dataset class
  • ARROW-2332 - Feather Reader option to return Table
  • ARROW-2334 - [C++] Update boost to 1.66.0
  • ARROW-2335 - [Go] move README one directory higher
  • ARROW-2340 - [Website] Add blog post about Go code donation
  • ARROW-2341 - [Python] Improve pa.union() mode argument behaviour
  • ARROW-2343 - [Java/Packaging] Run mvn clean in API doc builds
  • ARROW-2344 - [Go] Run Go unit tests in Travis CI
  • ARROW-2345 - [Documentation] Fix bundle exec and set sphinx nosidebar to True
  • ARROW-2348 - [GLib] Remove GLib + Go example
  • ARROW-2350 - Consolidated RUN step in spark_integration Dockerfile
  • ARROW-2353 - [CI] Check correctness of built wheel on AppVeyor
  • ARROW-2361 - [Rust] Starting point for a native Rust implementation of Arrow
  • ARROW-2364 - [Plasma] PlasmaClient::Get() could take vector of object ids
  • ARROW-2376 - [Rust] Travis builds the Rust library
  • ARROW-2378 - [Rust] Rustfmt
  • ARROW-2381 - [Rust] Adds iterator support to Buffer
  • ARROW-2384 - [Rust] Additional test & Trait standardization
  • ARROW-2385 - [Rust] implement to_json for DataType and Field
  • ARROW-2388 - [C++] Use valid_bytes API for StringBuilder::Append
  • ARROW-2389 - [C++] Add CapacityError
  • ARROW-2390 - [C++/Python] Map Python exceptions to Arrow status codes
  • ARROW-2394 - [Python] Correct flake8 errors in benchmarks
  • ARROW-2395 - [Python] Fix flake8 warnings outside of pyarrow/ directory. Check in CI
  • ARROW-2396 - [Rust] Unify Rust Errors
  • ARROW-2397 - [Documentation] Update format documentation to describe tensor alignment.
  • ARROW-2398 - [Rust] Create Builder for building buffers directly in aligned memory
  • ARROW-2400 - [C++] Fix Status destructor performance
  • ARROW-2401 - Support filters on Hive partitioned Parquet files
  • ARROW-2402 - [C++] Avoid spurious copies with FixedSizeBinaryBuilder
  • ARROW-2404 - [C++] Fix “declaration of ‘type_id’ hides class member” w…
  • ARROW-2407 - [GLib] Add garrow_string_array_builder_append_values()
  • ARROW-2408 - [Rust][T] fromBuffer`
  • ARROW-2408 - [Rust] Remove build warnings
  • ARROW-2411 - [C++] Add StringBuilder::Append(const char **values)
  • ARROW-2413 - [Rust] Remove useless calls to format!().
  • ARROW-2414 - Fix a variety of typos.
  • ARROW-2415 - [Rust] Fix clippy ref-match-pats warnings.
  • ARROW-2416 - [C++] Support system libprotobuf
  • ARROW-2417 - [Rust] Fix API safety issues
  • ARROW-2422 - Support more operators for partition filtering
  • ARROW-2427 - [C++] Implement ReadAt properly
  • ARROW-2430 - [Packaging] MVP for branch based packaging automation
  • ARROW-2433 - [Rust][T] )
  • ARROW-2434 - [Rust] Add windows support
  • ARROW-2435 - [Rust] Add memory pool abstraction.
  • ARROW-2436 - [Rust] Add windows CI
  • ARROW-2439 - [Rust] Run license header checks also in Rust CI entry
  • ARROW-2440 - [Rust] Implement ListBuilder
  • ARROW-2442 - [C++] Disambiguate builder Append() overloads
  • ARROW-2445 - [Rust] Add documentation and make some fields private
  • ARROW-2448 - [Plasma] Reference counting for PlasmaClient::Impl
  • ARROW-2451 - [Python] Handle non-object arrays more efficiently in custom serializer.
  • ARROW-2453 - [Python] Improve Table column access
  • ARROW-2458 - [Plasma] Use one thread pool per PlasmaClient
  • ARROW-2463 - [C++] Update flatbuffers to 1.9.0
  • ARROW-2464 - [Python] Use a python_version marker instead of a condition
  • ARROW-2469 - [C++] Make out arguments last in ReadMessage.
  • ARROW-2470 - [C++] Avoid seeking in GetFileSize
  • ARROW-2472 - [Rust] Remove public attributes from Schema and Field and add accessors
  • ARROW-2477 - [Rust] Set up code coverage in CI
  • ARROW-2478 - [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode
  • ARROW-2479 - [C++] Add ThreadPool class
  • ARROW-2480 - [C++] Enable casting the value of a decimal to int32_t or int64_t
  • ARROW-2481 - [Rust] Move all calls to free() into memory.rs
  • ARROW-2482 - [Format] Clarify struct field alignment
  • ARROW-2484 - [C++] Document ABI compliance checking
  • ARROW-2485 - Re-write of run_clang_format.py, such that it outputs the diffs of th…
  • ARROW-2486 - [C++/Python] Provide a Docker image that contains all dependencies for development
  • ARROW-2488 - [C++] Add Boost 1.67 and 1.68 as recognized versions
  • ARROW-2493 - [Python] Add support for pickling to buffers and arrays
  • ARROW-2494 - [C++] Return status codes from PlasmaClient::Seal instead of crashing
  • ARROW-2498 - [Java] Use java 1.8 instead of java 1.7
  • ARROW-2499 - [C++] Factor out Python iteration routines
  • ARROW-2505 - [C++] Disable MSVC warning C4800
  • ARROW-2506 - [Plasma] Build error on macOS
  • ARROW-2507 - [Rust] Don't take a reference when not needed.
  • ARROW-2508 - [Python] Fix pytest.raises msg to message
  • ARROW-2513 - [Python] DictionaryType should give access to index type and dictionary array
  • ARROW-2516 - [CI] Filter changes in AppVeyor builds
  • ARROW-2521 - [Rust] Refactor Rust API to use traits and generic to represent Array instead of enum
  • ARROW-2522 - [C++] Version shared library files
  • ARROW-2525 - [GLib] Add garrow_struct_array_flatten()
  • ARROW-2526 - [GLib] Update .gitignore
  • ARROW-2527 - [GLib] Enable GPU document
  • ARROW-2528 - [Rust] Add trait bounds for T in Buffer/List
  • ARROW-2529 - [C++] Update mention of clang-format to 5.0 in the docs
  • ARROW-2531 - [C++] Update clang bits to 6.0
  • ARROW-2533 - [CI] Fast finish failing AppVeyor builds
  • ARROW-2536 - [Rust] optimize capacity allocation for ListBuilder
  • ARROW-2537 - [Ruby] Import
  • ARROW-2539 - [Plasma] Use unique_ptr instead of raw pointer
  • ARROW-2540 - [Plasma] Create constructors & destructors for ObjectTableEntry
  • ARROW-2541 - [Plasma] Replace macros with constexpr
  • ARROW-2543 - [Rust] Cache dependencies when building our rust library
  • ARROW-2544 - [CI] Run the C++ tests with two jobs
  • ARROW-2547 - Fix off-by-one in List<List<byte>> example
  • ARROW-2548 - Clarify List<Char> Array example
  • ARROW-2549 - [GLib] Apply arrow::StatusCode changes to GArrowError
  • ARROW-2550 - [C++] Add missing status codes into arrow::Status::CodeAsString()
  • ARROW-2551 - [Plasma] Improve notification logic
  • ARROW-2552 - [Plasma] Fix memory error
  • ARROW-2553 - [Python] Set MACOSX_DEPLOYMENT_TARGET in wheel build
  • ARROW-2558 - [Plasma] avoid walk through all the objects when a client disconnects
  • ARROW-2562 - [CI] C++ and Rust code coverage using codecov.io
  • ARROW-2563 - [Rust] Poor caching in Travis-CI
  • ARROW-2566 - [CI] Add codecov.io badge
  • ARROW-2567 - [C++] Not only compare type ids on Array equality
  • ARROW-2568 - [Python] Expose thread pool size setting to Python, and deprecate “nthreads” where possible
  • ARROW-2569 - [C++] Improve thread pool size heuristic
  • ARROW-2574 - [Python] Add Cython and Python code coverage
  • ARROW-2576 - [GLib] Add abs functions for Decimal128
  • ARROW-2577 - [Plasma] Add asv benchmarks for plasma
  • ARROW-2580 - [GLib] Fix abs functions for Decimal128
  • ARROW-2582 - [GLib] Add negate functions for Decimal128
  • ARROW-2585 - [C++] Add Decimal::FromBigEndian, which was formerly a static method in parquet-cpp/src/parquet/arrow/reader.cc
  • ARROW-2586 - [C++] Changing the type of ListBuilder’s and StructBuilder’s children from unique_ptr to shared_ptr so that it can support deserialization from Parquet to Arrow with arbitrary nesting
  • ARROW-2595 - [Plasma] to avoid producing garbage data
  • ARROW-2596 - [GLib] Use the default value of GTK-Doc
  • ARROW-2597 - [Plasma] remove UniqueIDHasher
  • ARROW-2604 - [Java] Add convenience method to VarCharVector to set Text
  • ARROW-2608 - [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer
  • ARROW-2611 - [Python] Fix Python 2 integer serialization
  • ARROW-2612 - [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
  • ARROW-2613 - [Docs] Update the gen_apidocs docker script
  • ARROW-2614 - Remove ‘group: deprecated’ in Travis
  • ARROW-2626 - [Python] Add column name to exception message when writing pandas df fails
  • ARROW-2634 - [Go] Add Go license details to LICENSE.txt
  • ARROW-2635 - [Ruby] Add LICENSE.txt and NOTICE.txt for Apache Arrow Ruby
  • ARROW-2636 - [Ruby] Add missing “unofficial” notes
  • ARROW-2638 - [Python] Prevent calling extension class constructors directly
  • ARROW-2639 - [Python] Remove unnecessary _check_nullptr methods
  • ARROW-2641 - [C++] Avoid spurious memset() calls, improve bitmap write performance
  • ARROW-2645 - [Java] Refactor ArrowWriter to remove all ArrowFileWriter specific logic
  • ARROW-2649 - [C++] Add GenerateBits() function to improve bitmap writing performance
  • ARROW-2656 - [Python] Improve creation time of ParquetManifest for partitioned datasets using thread pool
  • ARROW-2660 - [Python] Experimental zero-copy pickling
  • ARROW-2661 - [Python] Adding the ability to programmatically pass hdfs configuration key/value pairs via pyarrow
  • ARROW-2662 - [Python] Add to_pandas to ChunkedArray
  • ARROW-2663 - [Python] Make dictionary_encode and unique accessible on Column / ChunkedArray
  • ARROW-2664 - [Python] Implement getitem / slicing on Buffer
  • ARROW-2666 - [Python] numpy.asarray should trigger to_pandas on Array/ChunkedArray
  • ARROW-2672 - [Python] Build ORC extension in manylinux1 wheels
  • ARROW-2674 - [Packaging] Start building nightlies
  • ARROW-2676 - [Packaging] Deploy build artifacts to github releases
  • ARROW-2677 - [Python] Expose Parquet ZSTD compression
  • ARROW-2678 - [GLib] Add more common problems compiling c_glib on OSX
  • ARROW-2680 - [Python] Add documentation about type inference in Table.from_pandas
  • ARROW-2682 - [CI] Notify in Slack about broken builds
  • ARROW-2689 - [Python] Remove parameter timestamps_to_ms
  • ARROW-2692 - [Python] Add test for writing dictionary encoded columns to chunked Parquet files
  • ARROW-2695 - [Python] Prevent calling scalar constructors directly
  • ARROW-2696 - [JAVA] enhance AllocationListener with an onFailedAllocation() call (#2133)
  • ARROW-2699 - [C++/Python] Add Table method that replaces a column with a new supplied column
  • ARROW-2700 - [Python] Add simple examples to Array.cast docstring
  • ARROW-2701 - [C++] Make MemoryMappedFile resizable redux
  • ARROW-2704 - [Java] Change MessageReader API to improve custom message handling for streams
  • ARROW-2713 - [Packaging] Fix linux package builds
  • ARROW-2717 - [Packaging] Postfix conda artifacts with target arch
  • ARROW-2718 - [Packaging] GPG sign downloaded artifacts
  • ARROW-2724 - [Packaging] Determine whether all the expected artifacts are uploaded
  • ARROW-2725 - [Java] make Accountant.AllocationOutcome publicly visible (#2149)
  • ARROW-2729 - [GLib] Add decimal128 array builder
  • ARROW-2731 - Add external Orc capability
  • ARROW-2732 - [GLib] Update brew packages for macOS
  • ARROW-2733 - [GLib] Cast garrow_decimal128 to gint64
  • ARROW-2738 - [GLib] Use Brewfile on installation process
  • ARROW-2739 - [GLib] Use G_DECLARE_DERIVABLE_TYPE
  • ARROW-2740 - [Python] Add address property to Buffer
  • ARROW-2742 - [Python] Allow Table.from_batches to use iterator of record batches
  • ARROW-2748 - [GLib] Add garrow_decimal_data_type_get_scale() (and _precision())
  • ARROW-2749 - [GLib] Rename *garrow_decimal128_array_get_value to *garrow_decimal128_array_format_value
  • ARROW-2751 - [GLib] Add garrow_table_replace_column()
  • ARROW-2752 - [GLib] Document garrow_decimal_data_type_new()
  • ARROW-2753 - [GLib] Add garrow_schema_*_field()
  • ARROW-2755 - [Python] Allow using Ninja to build extension
  • ARROW-2756 - [Python] Remove redundant imports and minor fixes in parquet tests
  • ARROW-2758 - [Plasma] Use Scope enum in Plasma
  • ARROW-2760 - [Python] Remove legacy property definition syntax from parquet module and test them
  • ARROW-2761 - [Python] Add support for set operations in hive partition filtering
  • ARROW-2763 - [Python] Make _metadata file accessible in ParquetDataset
  • ARROW-2780 - [Go] Run code coverage analysis
  • ARROW-2784 - [C++] MemoryMappedFile::WriteAt allow writing past the end
  • ARROW-2790 - [C++] Minor style changes from the review
  • ARROW-2790 - [C++] Buffers can contain uninitialized memory
  • ARROW-2791 - [Packaging] Build Ubuntu 18.04 packages
  • ARROW-2792 - [Packaging] Consider uploading tarballs to avoid naming conflicts
  • ARROW-2794 - [Plasma] Add the RPC of a list of Delete Objects in Plasma
  • ARROW-2798 - [Plasma] Use hashing function that takes into account all UniqueID bytes
  • ARROW-2802 - [Docs] Move all release management instructions to Confluence
  • ARROW-2804 - [Website] Link to Developer wiki (Confluence) from front page
  • ARROW-2805 - [Python] Use official way to find TensorFlow module
  • ARROW-2809 - [C++] Only print cpplint and clang-format output for failures by default
  • ARROW-2811 - [Python] Test serialization for determinism
  • ARROW-2815 - [CI] Suppress DEBUG logging when building Java library in C++ CI entries
  • ARROW-2816 - [Python] Make NativeFile BufferedIOBase-compliant
  • ARROW-2821 - [C++] Remove redundant memsets in BooleanBuilder
  • ARROW-2822 - [C++] Remove the unneeded const qualifier and clarify the comments
  • ARROW-2822 - [C++] Zero padding bytes in PoolBuffer
  • ARROW-2824 - [GLib] Add garrow_decimal128_array_get_value()
  • ARROW-2825 - [C++] Add AllocateBuffer / AllocateResizableBuffer variants with default memory pool
  • ARROW-2826 - [C++] Remove ArrayBuilder::Init method, clean up Resize, remove PoolBuffer from public API
  • ARROW-2827 - [C++] Stop using -jN in sub make
  • ARROW-2829 - [GLib] Add GArrowORCFileReader
  • ARROW-2830 - [deb] Enable parallel build again
  • ARROW-2832 - [Python] Pretty-print schema metadata in Schema.__repr__
  • ARROW-2833 - [Python] Column.__repr__ will lock up Jupyter with large datasets
  • ARROW-2834 - [GLib] Remove “enable_” prefix from Meson options
  • ARROW-2836 - [Packaging] Expand build matrices to multiple tasks
  • ARROW-2837 - [C++] ArrayBuilder::null_bitmap returns PoolBuffer
  • ARROW-2838 - [Python] Speed up PandasObjectIsNull
  • ARROW-2844 - [Packaging] Test OSX wheels after build
  • ARROW-2845 - [Packaging] Upload additional debian artifacts
  • ARROW-2846 - [Packaging] Update nightly build in crossbow as well as the sample configuration
  • ARROW-2847 - [Packaging] Fix artifact name matching for conda forge packages
  • ARROW-2848 - [Packaging] Use lib10.deb instead of lib0.deb
  • ARROW-2849 - [Ruby] Arrow::Table#load supports ORC
  • ARROW-2855 - [C++] Blog post that outlines the benefits of using jemalloc
  • ARROW-2859 - [Python] Accept buffer-like objects as sources in open_file, open_stream APIs
  • ARROW-2861 - [Python] Add note about how to not write DataFrame index to Parquet
  • ARROW-2864 - [Plasma] Add deletion cache to delete objects later when they are not in use.
  • ARROW-2868 - [Packaging] Fix Apache Arrow ORC GLib related problems
  • ARROW-2869 - [Python] Add documentation for Array.to_numpy
  • ARROW-2874 - [Packaging] Pass job prefix when putting on Queue
  • ARROW-2875 - [Packaging] Don't attempt to download arrow archive in linux builds
  • ARROW-2881 - [Website] Add community tab to header, add link and callout to dev wiki
  • ARROW-2884 - [Packaging] Support RC
  • ARROW-2886 - [Release] Remove an unused variable
  • ARROW-2890 - [Plasma] Make python client release method private
  • ARROW-2893 - [C++] Remove PoolBuffer class from public API and hide implementation details behind factory functions
  • ARROW-2897 - [Packaging] Organize supported Ubuntu versions
  • ARROW-2898 - [Packaging] Setuptools_scm just shipped a new version which fails to parse `apache-arrow-<version>` tag
  • ARROW-2906 - [Website] Remove the link to slack channel
  • ARROW-2907 - [GitHub] Improve the first paragraph of “How to contribute patches”
  • ARROW-2908 - [Rust] Update version to 0.10.0
  • ARROW-2914 - [Integration] Add WindowPandasUDFTests to Spark integration script
  • ARROW-2915 - [Packaging] Remove artifact from ubuntu-trusty build
  • ARROW-2918 - [C++] Improve formatting of Struct pretty prints
  • ARROW-2921 - [Release] Update .deb/.rpm changelogs in preparation
  • ARROW-2922 - [Release] Make python command name customizable
  • ARROW-2923 - [DOC] Adding Apache Spark integration test instructions
  • ARROW-2924 - [Java] mvn release fails when an older maven javadoc plugin is installed
  • ARROW-2927 - [Packaging] AppVeyor wheel task is failing on initial checkout
  • ARROW-2928 - [Packaging] AppVeyor crossbow conda builds are picking up boost 1.63.0 instead of the installed version
  • ARROW-2929 - [C++] ARROW-2826 Breaks parquet-cpp 1.4.0 builds
  • ARROW-2934 - [Packaging] Add checksums creation to sign subcommand
  • ARROW-2935 - [Packaging] Add verify_binary_artifacts function to verify-release-candidate.sh
  • ARROW-2937 - [Java] Followup to ARROW-2704. Make MessageReader classes immutable and clarify docs
  • ARROW-2943 - [C++] Implement BufferedOutputStream::Flush
  • ARROW-2944 - [Format] Synchronize some metadata changes to columnar format Markdown documents
  • ARROW-2946 - [Packaging] Stop using $PWD
  • ARROW-2947 - [Packaging] Remove Ubuntu Artful
  • ARROW-2949 - [CI] Add retry logic when downloading miniconda to reduce flakiness
  • ARROW-2951 - [CI] Changes in format/ should cause Appveyor builds to run
  • ARROW-2953 - [Plasma] Reduce plasma memory usage
  • ARROW-2954 - [Plasma] Reduce plasma store memory usage
  • ARROW-2962 - [Packaging] Bintray descriptor files are no longer needed
  • ARROW-2977 - [Packaging] Release verification script should check rust too
  • ARROW-2985 - [Ruby] Run unit tests in verify-release-candidate.sh
  • ARROW-2988 - [Release] More automated release verification on Windows
  • ARROW-2990 - [GLib] Fail to build with rpath-ed Arrow C++ on macOS

Apache Arrow 0.9.0 (2018-03-19)

New Features and Improvements

  • ARROW-232 - [Python] Add unit test for writing Parquet file from chunked table
  • ARROW-633 - [Java] Add FixedSizeBinary support in Java and integration tests (Updated)
  • ARROW-634 - Add integration tests for FixedSizeBinary
  • ARROW-760 - [Python] document differences w.r.t. fastparquet
  • ARROW-764 - [C++] Improves performance of CopyBitmap and adds benchmarks
  • ARROW-969 - [C++] Add add/remove field functions for RecordBatch
  • ARROW-1021 - [Python] Add documentation for C++ pyarrow API
  • ARROW-1035 - [Python] Add streaming dataframe reconstruction benchmark
  • ARROW-1394 - [Plasma] Add optional extension for allocating memory on GPUs
  • ARROW-1463 - [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code
  • ARROW-1579 - [Java] Adding containerized Spark Integration tests
  • ARROW-1580 - [Python] Instructions for setting up nightly builds on Linux
  • ARROW-1621 - [JAVA] Reduce Heap Usage per Vector
  • ARROW-1623 - [C++] Add convenience method to construct Buffer from a string that owns its memory
  • ARROW-1632 - [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
  • ARROW-1643 - [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS
  • ARROW-1705 - [Python] allow building array from dicts
  • ARROW-1706 - [Python] Coerce array inputs to StructArray.from_arrays. Flip order of arguments
  • ARROW-1712 - [C++] Add method to BinaryBuilder to reserve space for value data
  • ARROW-1757 - [C++] Add DictionaryArray::FromArrays alternate ctor that can check or sanitize “untrusted” indices
  • ARROW-1815 - [Java] Rename MapVector to StructVector
  • ARROW-1832 - [JS] Implement JSON reader for integration tests
  • ARROW-1835 - [C++] Create Arrow schema from std::tuple types
  • ARROW-1861 - [Python][skip ci]
  • ARROW-1872 - [Website] Minor edits and addition of YAML for versions
  • ARROW-1899 - [Python] Refactor handling of null sentinels in python/numpy_to_arrow.cc
  • ARROW-1920 - [C++/Python] Add experimental reader for Apache ORC files
  • ARROW-1926 - [GLib] Add garrow_timestamp_data_type_get_unit()
  • ARROW-1927 - [Plasma] Add delete function
  • ARROW-1929 - [C++] Copy over testing utility code from PARQUET-1092
  • ARROW-1930 - [C++] Adds Slice operation to ChunkedArray and Column
  • ARROW-1931 - [C++] Suppress C4996 deprecation warning in MSVC builds for now
  • ARROW-1937 - [Python] Document nested array initialization
  • ARROW-1942 - [C++] Hash table specializations for small integers
  • ARROW-1947 - [Plasma] Change Client Create and Get to use Buffers
  • ARROW-1951 - [Python] Add memcopy threads argument to PlasmaClient put.
  • ARROW-1962 - [Java] Adding reset to ValueVector interface
  • ARROW-1965 - [GLib] Add garrow_array_builder_get_value_data_type()
  • ARROW-1969 - [C++] Don't build ORC extension by default
  • ARROW-1970 - [GLib] Add garrow_chunked_array_get_value_data_type() and garrow_chunked_array_get_value_type()
  • ARROW-1977 - [C++] Update windows dev docs
  • ARROW-1978 - [Website] Consolidate Powered By project list, add more visibly to front page
  • ARROW-2004 - [C++] Add shrink_to_fit parameter to BufferBuilder::Resize, add Reserve method
  • ARROW-2007 - [Python] Implement float32 conversions, use NumPy dtype when possible for inner arrays
  • ARROW-2011 - [Python] Allow setting the pickler in the serialization context.
  • ARROW-2012 - [GLib] Support “make distclean”
  • ARROW-2018 - [C++] fix Build instruction on macOS and Homebrew
  • ARROW-2019 - [JAVA] Control the memory allocated for inner vector in LIST (#1497)
  • ARROW-2024 - [Python] Remove torch serialization from default serialization context.
  • ARROW-2028 - [Python] extra_cmake_args needs to be passed through shlex.split
  • ARROW-2031 - [Python] HadoopFileSystem is pickleable
  • ARROW-2035 - [C++] Update vendored cpplint.py to a Py3-compatible one
  • ARROW-2036 - [Python] Support standard IOBase methods on NativeFile
  • ARROW-2042 - [Plasma] Revert API change of plasma::Create to output a MutableBuffer
  • ARROW-2043 - [C++] change description from OS X to macOS
  • ARROW-2046 - [Python] Support path-like objects
  • ARROW-2048 - [Python/C++] Update Thrift pin to 0.11
  • ARROW-2050 - [Python] Support setup.py pytest
  • ARROW-2052 - [C++ / Python] Rework OwnedRef, remove ScopedRef
  • ARROW-2053 - [C++] Build instruction is incomplete
  • ARROW-2054 - [C++] Fix compilation warnings
  • ARROW-2064 - [GLib] Add common build problems link to the install section
  • ARROW-2065 - [Python] Fix bug in SerializationContext.clone().
  • ARROW-2066 - [Python] Document using pyarrow with Azure Blob Store
  • ARROW-2068 - [Python] Expose array's buffers
  • ARROW-2069 - [Python] Add note that Plasma is not supported on Windows
  • ARROW-2071 - [Python] Fix test slowness on Travis-CI
  • ARROW-2071 - [Python] Lighten serialization tests
  • ARROW-2073 - [Python] Create struct array from sequence of tuples
  • ARROW-2076 - [Python] Display slowest test durations
  • ARROW-2083 - [CI] Detect changed components on Travis-CI
  • ARROW-2084 - [C++] Support newer Brotli static library names
  • ARROW-2086 - [Python] Shrink size of arrow_manylinux1_x86_64_base docker image
  • ARROW-2087 - [Python] Binaries of 3rdparty are not stripped in manylinux1 base image
  • ARROW-2088 - [GLib] Add GArrowNumericArray
  • ARROW-2089 - [GLib] Rename to GARROW_TYPE_BOOLEAN for consistency
  • ARROW-2090 - [Python] Add context methods to ParquetWriter
  • ARROW-2093 - [Python] Do not install PyTorch in Travis CI
  • ARROW-2094 - [C++] Install libprotobuf and set PROTOBUF_HOME when using toolchain
  • ARROW-2095 - [C++] Less verbose building 3rd party deps
  • ARROW-2096 - [C++] Turn off Boost_DEBUG to trim build output
  • ARROW-2099 - [Python] Add safe option to DictionaryArray.from_arrays to do boundschecking of indices by default
  • ARROW-2107 - [GLib] Follow arrow::gpu::CudaIpcMemHandle API change
  • ARROW-2108 - [Python] Update instructions for ASV
  • ARROW-2110 - [Python] Only require pytest-runner on test commands
  • ARROW-2111 - [C++] Lint in parallel
  • ARROW-2114 - [Python][skip appveyor]
  • ARROW-2117 - [C++] Update codebase / CI toolchain for clang 5.0
  • ARROW-2118 - [C++] Fix misleading error when memory mapping a zero-length file
  • ARROW-2120 - [C++] Add possibility to use empty _MSVC_STATIC_LIB_SUFFIX for Thirdparties
  • ARROW-2121 - [Python] Handle object arrays directly in pandas serializer.
  • ARROW-2123 - [JS] Upgrade to TS 2.7.1
  • ARROW-2132 - Add link to Plasma in main README
  • ARROW-2134 - [CI] Make Travis-CI commit inspection more robust
  • ARROW-2137 - [Python] Don't print paths that are ignored when reading Parquet files
  • ARROW-2138 - [C++] abort on failed debug check
  • ARROW-2142 - [Python] Allow conversion from Numpy struct array
  • ARROW-2143 - [Python] Provide a manylinux1 wheel for cp27m
  • ARROW-2146 - [GLib] Add Slice api to ChunkedArray
  • ARROW-2149 - [Python] Reorganize test_convert_pandas.py
  • ARROW-2154 - [Python] Implement equality on buffers
  • ARROW-2155 - [Python] frombuffer() should respect mutability of argument
  • ARROW-2156 - [CI] Isolate Sphinx dependencies
  • ARROW-2163 - [CI] Make apt installs explicit
  • ARROW-2166 - [GLib] Add Slice api to Column
  • ARROW-2168 - [C++] Build toolchain on CI with jemalloc
  • ARROW-2169 - [C++] MSVC is complaining about uncaptured variables
  • ARROW-2174 - [JS] export arrow format and schema enums
  • ARROW-2176 - [C++] Extend DictionaryBuilder to support delta dictionaries
  • ARROW-2177 - [C++] Remove support for specifying negative scale values in DecimalType
  • ARROW-2180 - [C++] Remove deprecated APIs from 0.8.0 cycle
  • ARROW-2181 - [PYTHON][DOC] Add doc on usage of concat_tables
  • ARROW-2184 - [C++] Add static constructor for FileOutputStream returning shared_ptr to OutputStream
  • ARROW-2185 - Strip CI directives from commit messages
  • ARROW-2190 - [GLib] Add add/remove field functions for RecordBatch
  • ARROW-2191 - [C++] Only use specific version of jemalloc
  • ARROW-2197 - Document C++ ABI issue and workaround
  • ARROW-2198 - [Python] correct docstring for parquet.read_table
  • ARROW-2199 - [JAVA] Control the memory allocated for inner vectors in containers. (#1646)
  • ARROW-2203 - [C++] StderrStream class
  • ARROW-2204 - Fix TLS errors in manylinux1 build
  • ARROW-2205 - [Python] Option for integer object nulls
  • ARROW-2206 - [JS] Document Perspective project
  • ARROW-2218 - [Python] PythonFile should infer mode when not given
  • ARROW-2231 - [CI] Use clcache on AppVeyor for faster builds
  • ARROW-2238 - [C++] Detect and use clcache in cmake configuration
  • ARROW-2239 - [C++] Update Windows build docs
  • ARROW-2250 - [Python] Do not create a subprocess for plasma but just use existing process
  • ARROW-2252 - [Python] Create buffer from address, size and base
  • ARROW-2253 - [Python] Support __eq__ on scalar values
  • ARROW-2257 - [C++] Add high-level option to toggle CXX11 ABI
  • ARROW-2261 - [GLib] Improve memory management for GArrowBuffer data
  • ARROW-2262 - [Python] Support slicing on pyarrow.ChunkedArray
  • ARROW-2279 - [Python] Better error message if lib cannot be found
  • ARROW-2282 - [Python] Create StringArray from buffers
  • ARROW-2283 - [C++] Support Arrow C++ installed in /usr detection by pkg-config
  • ARROW-2289 - [GLib] Add Numeric, Integer, FloatingPoint data types
  • ARROW-2291 - [C++] Add additional libboost-regex-dev to build instructions in README
  • ARROW-2292 - [Python] Rename frombuffer() to py_buffer()
  • ARROW-2309 - [C++] Use std::make_unsigned
  • ARROW-2321 - [C++] Release verification script fails if CMAKE_INSTALL_LIBDIR is not $ARROW_HOME/lib
  • ARROW-2329 - [Website] 0.9.0 release update
  • ARROW-2336 - [Website] Blog post for 0.9.0 release
  • ARROW-2768 - [Packaging] Support Ubuntu 18.04
  • ARROW-2783 - Importing conda-forge pyarrow fails

Bug Fixes

  • ARROW-1345 - [Python] Test conversion from nested NumPy arrays with smaller int, float types
  • ARROW-1589 - [C++] Fuzzing for certain input formats
  • ARROW-1646 - [Python] Handle NumPy scalar types
  • ARROW-1856 - [Python] Auto-detect Parquet ABI version when using PARQUET_HOME
  • ARROW-1909 - [C++] Enables building with benchmarks on windows
  • ARROW-1912 - [Website] Add committer affiliations and roles to website
  • ARROW-1919 - [Plasma] Test that object ids are 20 bytes
  • ARROW-1924 - [Python] Bring back pickle=True option for serialization
  • ARROW-1933 - [GLib] Fix build error with --with-arrow-cpp-build-dir
  • ARROW-1940 - [Python] Extra metadata gets added after multiple conversions between pd.DataFrame and pa.Table
  • ARROW-1941 - [Python] Fix empty list roundtrip in to_pandas
  • ARROW-1943 - [JAVA] handle setInitialCapacity for deeply nested lists
  • ARROW-1944 - [C++] Fix ARROW_STATIC_LIB in FindArrow
  • ARROW-1945 - [C++] Fix doxygen documentation of array.h
  • ARROW-1946 - [JAVA] Add APIs to decimal vector for writing big endian data
  • ARROW-1948 - [Java] Load ListVector validity buffer with BitVectorHelper to handle all non-null
  • ARROW-1950 - [Python] pandas_type in pandas metadata incorrect for List types
  • ARROW-1953 - [JS] Fix JS build
  • ARROW-1955 - MSVC generates “attempting to reference a deleted function” during build.
  • ARROW-1958 - [Python] Error in pandas conversion for datetimetz row index
  • ARROW-1961 - [Python] Preserve pre-existing schema metadata in Parquet files when passing flavor=‘spark’
  • ARROW-1966 - [C++] Accommodate JAVA_HOME on Linux that includes the jre/ directory, or is the full path to directory with libjvm
  • ARROW-1967 - Python: AssertionError w.r.t Pandas conversion on Parquet files in 0.8.0 dev version
  • ARROW-1971 - [Python] Add pandas serialization to the default
  • ARROW-1972 - [Python] Import pyarrow in DeserializeObject.
  • ARROW-1973 - [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
  • ARROW-1976 - [Python] Handling unicode pandas columns on parquet.read_table
  • ARROW-1979 - [JS] Fix JS builds hanging in es2015
  • ARROW-1980 - [Python] Fix race condition in write_to_dataset
  • ARROW-1982 - [Python] Coerce Parquet statistics as bytes to more useful Python scalar types
  • ARROW-1986 - [Python] HadoopFileSystem is not picklable and cannot currently be used with multiprocessing
  • ARROW-1991 - [Website] Fix Docker documentation build
  • ARROW-1992 - [C++/Python] Fix segfault when string to categorical empty string array
  • ARROW-1997 - [C++/Python] Ignore zero-copy-option in to_pandas when strings_to_categorical is True
  • ARROW-1998 - [Python] fix crash on empty Numpy arrays
  • ARROW-1999 - [Python] Type checking in from_numpy_dtype
  • ARROW-2000 - [Plasma] Deduplicate file descriptors when replying to GetRequest.
  • ARROW-2002 - [Python] Check that write_queue is not full and writer_thread is alive before enqueuing a new record when downloading a file
  • ARROW-2003 - [Python] Remove use of fastpath parameter to pandas.core.internals.make_block
  • ARROW-2005 - [Python] Fix incorrect flake8 config path to Cython lint config
  • ARROW-2008 - [Python] Type inference for int32 NumPy arrays (expecting list<int32>) returns int64 and then conversion fails
  • ARROW-2010 - [C++] Do not suppress shorten-64-to-32 warnings from clang, fix warnings in ORC adapter
  • ARROW-2017 - [Python] Use unsigned PyLong API for uint64 values over int64 range
  • ARROW-2023 - [C++] Fix ASAN failure on malformed / empty stream input, enable ASAN builds, add more dev docs
  • ARROW-2025 - [C++] Creating multiple equivalent HadoopFileSystems works fine
  • ARROW-2029 - [Python] NativeFile.tell errors after close
  • ARROW-2032 - [C++] ORC ep installs on each call to ninja build
  • ARROW-2033 - [Python] Fix pa.array() with iterator input
  • ARROW-2039 - [Python] Avoid crashing on uninitialized Buffer
  • ARROW-2040 - [Python] Deserialized Numpy array must keep ref to underlying tensor
  • ARROW-2047 - [Python] Use sys.executable instead of one in the search path.
  • ARROW-2049 - [Python] Use python -m cython to run Cython, instead of CYTHON_EXECUTABLE
  • ARROW-2062 - [Python] Do not use memory maps in test_serialization.py to try to improve Travis CI flakiness
  • ARROW-2070 - [Python] Fix chdir logic in setup.py
  • ARROW-2072 - [Python] Fix crash in decimal128.byte_width
  • ARROW-2080 - [Python] Update documentation about pandas serialization context.
  • ARROW-2085 - [Python] HadoopFileSystem.isdir/.isfile return False on missing paths
  • ARROW-2106 - [Python] Add conversion for a series of datetime objects
  • ARROW-2109 - [C++] Completely disable boost autolink on MSVC build
  • ARROW-2124 - [Python] Add test for empty item in array
  • ARROW-2128 - [Python] Support arrays of empty lists
  • ARROW-2129 - [Python] Handle conversion of empty tables to Pandas
  • ARROW-2131 - [Python] Prepend module path to PYTHONPATH when spawning subprocess
  • ARROW-2133 - [Python] Fix segfault on conversion of empty nested array to Pandas
  • ARROW-2135 - [Python] Fix NaN conversion when casting from Numpy array
  • ARROW-2139 - [Python] Address Sphinx deprecation warning when building docs
  • ARROW-2145/ARROW-2153/ARROW-2157/ARROW-2160/ARROW-2177 - [Python] Decimal conversion not working for NaN values
  • ARROW-2150 - [Python] Raise NotImplementedError when comparing with pyarrow.Array for now
  • ARROW-2151 - [Python] Fix conversion from np.uint64 scalars
  • ARROW-2153 - [C++/Python] Decimal conversion not working for exponential notation
  • ARROW-2157 - [Python] Decimal arrays cannot be constructed from Python lists
  • ARROW-2158 - [Python] Construction of Decimal array with None or np.nan fails
  • ARROW-2160 - [C++/Python] Fix decimal precision inference
  • ARROW-2161 - [Python] Skip test_cython_api if ARROW_HOME isn't defined
  • ARROW-2162 - [Python/C++] Decimal Values with too-high precision are multiplied by 100
  • ARROW-2167 - [C++] Building Orc extensions fails with the default BUILD_WARNING_LEVEL=Production
  • ARROW-2170 - [Python] construct_metadata fails on reading files where no index was preserved
  • ARROW-2171 - [C++/Python] Make OwnedRef safer
  • ARROW-2172 - [C++/Python] Fix converting from Numpy array with non-natural stride
  • ARROW-2173 - [C++/Python] Hold the GIL in NumPyBuffer destructor
  • ARROW-2175 - [Python] Install Arrow libraries in Travis CI builds when only Python directory is affected
  • ARROW-2178 - [JS] Fix JS html FileReader example
  • ARROW-2179 - [C++] Install omitted headers in arrow/util
  • ARROW-2192 - [CI] Always build on master branch and repository
  • ARROW-2194 - [Python] Pandas columns metadata incorrect for empty string columns
  • ARROW-2208 - [Python] install issues with jemalloc
  • ARROW-2209 - [Python] Partition columns are not correctly loaded in schema of ParquetDataset
  • ARROW-2210 - [C++] Reset ptr on failed memory allocation
  • ARROW-2212 - [C++/Python] Build Protobuf in base manylinux 1 docker image
  • ARROW-2223 - [JS] compile src/bin as es5-cjs to all output targets
  • ARROW-2227 - [Python] Fix off-by-one error in chunked binary conversions
  • ARROW-2228 - [Python] Unsigned int type for arrow Table not supported
  • ARROW-2230 - [Python] Strip catch-all tag matching from git-describe
  • ARROW-2232 - [Python] pyarrow.Tensor constructor segfaults
  • ARROW-2234 - [JS] Read timestamp low bits as Uint32s
  • ARROW-2240 - [Python] Array initialization with leading numpy nan fails with exception
  • ARROW-2244 - [C++] Add unit test to explicitly check that NullArray internal data set correctly in Slice operations
  • ARROW-2245/ARROW-2246 - [Python] Revert static linkage of parquet-cpp in manylinux1 wheel
  • ARROW-2246 - [Python] Use namespaced boost in manylinux1 package
  • ARROW-2251 - [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live
  • ARROW-2254 - [Python] Ignore JS tags in local dev versions
  • ARROW-2258 - [Python] Add additional information to find Boost on windows
  • ARROW-2263 - [Python] Prepend local pyarrow/ path to PYTHONPATH in test_cython.py
  • ARROW-2265 - [Python] Use CheckExact when serializing lists and numpy arrays.
  • ARROW-2268 - Drop usage of md5 checksums for source releases, verification scripts
  • ARROW-2269 - [Python] Make boost namespace selectable in wheels
  • ARROW-2270 - [Python] Fix lifetime of ForeignBuffer base object
  • ARROW-2272 - [Python] Clean up leftovers in test_plasma.py
  • ARROW-2275 - [C++] Guard against bad use of Buffer.mutable_data()
  • ARROW-2280 - [Python] Return the offset for the buffers in pyarrow.Array
  • ARROW-2284 - [Python] Fix error display on test_plasma error
  • ARROW-2288 - [Python] Fix slicing logic
  • ARROW-2297 - [JS] babel-jest is not listed as a dev dependency
  • ARROW-2304 - [C++] Fix HDFS MultipleClients unit test
  • ARROW-2306 - [Python] Fix partitioned Parquet test against HDFS
  • ARROW-2307 - [Python] Allow reading record batch streams with zero record batches
  • ARROW-2311 - [Python/C++] Fix struct array slicing
  • ARROW-2312 - [JS] run test_js before test_integration
  • ARROW-2313 - [C++] Add -NDEBUG flag to arrow.pc
  • ARROW-2316 - [C++] Revert Buffer::mutable_data to inline so that linkers do not have to remember to define NDEBUG for release builds
  • ARROW-2318 - [Plasma] Run plasma store tests with unique socket
  • ARROW-2320 - [C++] Vendored Boost build does not build regex library
  • ARROW-2406 - [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

Apache Arrow 0.8.0 (2017-12-18)

Bug Fixes

  • ARROW-226 - [C++] If opening an HDFS file fails and it does not exist, say so to help with debugging
  • ARROW-641 - [C++] Do not build io-hdfs-test if ARROW_HDFS is off
  • ARROW-1282 - Large memory reallocation by Arrow causes hang in jemalloc
  • ARROW-1298 - C++: Add prefix to jemalloc functions to guard against issues when using multiple allocators in the same process
  • ARROW-1341 - [C++] Deprecate arrow::MakeTable in favor of new ctor from ARROW-1334
  • ARROW-1347 - [JAVA] Return consistent child field name for List Vectors
  • ARROW-1398 - [Python] No support reading columns of type decimal(19,4)
  • ARROW-1409 - [Format] Remove page id from Buffer metadata, increment metadata version number
  • ARROW-1431 - [Java] JsonFileReader doesn't initialize some vectors appropriately
  • ARROW-1436 - PyArrow Timestamps written to Parquet as INT96 appear in Spark as ‘bigint’
  • ARROW-1540 - Add NO_VALGRIND option to ADD_ARROW_TEST and disable valgrind in a few problematic tests
  • ARROW-1541 - [C++] Fix race conditions in arrow_gpu with generated Flatbuffers files. Do not put generated files in source tree
  • ARROW-1543 - [C++] Correct C++ tutorial to use std::unique_ptr instead of std::shared_ptr
  • ARROW-1549 - [JS] Integrate auto-generated Arrow test files
  • ARROW-1555 - [Python] Implement Dask exists function
  • ARROW-1584 - [C++/Python] Support Null type in IPC round trips, fix serialize_pandas on empty DataFrame
  • ARROW-1585/ARROW-1586 - [PYTHON] serialize_pandas roundtrip loses columns name
  • ARROW-1586 - [PYTHON] serialize_pandas roundtrip loses columns name
  • ARROW-1609 - [Plasma] Xcode 9 compilation workaround
  • ARROW-1615 - Added BUILD_WARNING_LEVEL and BUILD_WARNING_FLAGS to Setup…
  • ARROW-1617 - [Python] Do not use symlinks in python/cmake_modules
  • ARROW-1620 - Python: Download Boost in manylinux1 build from bintray
  • ARROW-1622 - [Plasma] Plasma doesn't compile with XCode 9
  • ARROW-1624 - [C++] Fix build on LLVM 4.0, remove some clang warning suppressions
  • ARROW-1625 - [Serialization] Support OrderedDict and defaultdict serialization
  • ARROW-1629 - [C++] Add miscellaneous DCHECKs and minor changes based on infer tool output
  • ARROW-1633 - [Python] Support NumPy string and unicode types in pyarrow.array, Array.from_pandas
  • ARROW-1640 - Fix HTTPS failures in cmake / libcurl caused by ca-certificates clash
  • ARROW-1647 - [Plasma] Make sure to read length header as int64_t instead of size_t.
  • ARROW-1653 - [Plasma] Use static cast to avoid compiler warning.
  • ARROW-1655 - [Java] Add Scale and Precision to ValueVectorTypes.tdd for Decimals
  • ARROW-1656 - [C++] Endianness Macro is Incorrect on Windows And Mac
  • ARROW-1657 - [C++] Multithreaded Read Test Failing on Arch Linux
  • ARROW-1658 - [Python] Add boundschecking of dictionary indices when creating CategoricalBlock
  • ARROW-1663 - [Java] use consistent name for null and not-null in FixedSizeLis…
  • ARROW-1670 - [Serialization] Speed up deserialization by getting rid of smart pointer overhead
  • ARROW-1672 - [Python] Failure to write Feather bytes column
  • ARROW-1673 - [Python] Add support for numpy ‘bool’ type
  • ARROW-1676 - [C++] Only pad null bitmap up to a factor of 8 bytes in Feather format
  • ARROW-1678 - [Python] Implement numpy.float16 SerDe
  • ARROW-1680 - [Python] Timestamp unit change not done in from_pandas() conversion
  • ARROW-1681 - [Python] Error writing with nulls in lists
  • ARROW-1686 - [Docs] rsync contents of apidocs directory into site java directory
  • ARROW-1693 - [JS] Expand JavaScript implementation, build system, fix integration tests
  • ARROW-1694 - [Java] Unclosed VectorSchemaRoot in JsonFileReader#readDictionaryBatches()
  • ARROW-1695 - [Serialization] Fix reference counting of numpy arrays created in custom serializer
  • ARROW-1698 - [JS] File reader attempts to load the same dictionary batch more than once
  • ARROW-1704 - [GLib] Fix Go example failure
  • ARROW-1708 - [JS] Fix linter error
  • ARROW-1709 - [C++] Decimal.ToString is incorrect for negative scale
  • ARROW-1711 - [Python] Fix flake8 calls to lint the right directories
  • ARROW-1714 - [Python] Fix invalid serialization/deserialization None name Series
  • ARROW-1720 - [Python] Implement bounds check in chunk getter
  • ARROW-1723 - [C++] add ARROW_STATIC to mark static libs on Windows
  • ARROW-1730/ARROW-1738 - [Python] Fix wrong datetime conversion
  • ARROW-1732 - [Python] Permit creating record batches with no columns, test pandas roundtrips
  • ARROW-1735 - [C++] Test CastKernel writing into output array with non-zero offset
  • ARROW-1738 - [Python] Wrong datetime conversion when pa.array with unit
  • ARROW-1739 - [Python] Fix broken build due to using unittest.TestCase methods
  • ARROW-1742 - C++: clang-format is not detected correctly on OSX anymore
  • ARROW-1743 - [Python] Avoid non-array writeable-flag check
  • ARROW-1745 - [Plasma] Include gtest after plasma/compat.h in tests.
  • ARROW-1749 - [C++] Handle range of Decimal128 values that require 39 digits to be displayed
  • ARROW-1751 - [Python] Pandas 0.21.0 introduces a breaking API change for MultiIndex construction
  • ARROW-1754 - [Python] Fix buggy Parquet roundtrip when an index name is the same as a column name
  • ARROW-1756 - [Python] Fix large file read/write error
  • ARROW-1762 - [C++] Add note to readme about need to set LC_ALL on some Linux systems
  • ARROW-1764 - [Python] Add -c conda-forge for Windows dev installation instructions
  • ARROW-1766 - [GLib] Fix failing builds on OSX
  • ARROW-1768 - [Python] Fix suppressed exception in ParquetWriter.__del__
  • ARROW-1769 - Python: pyarrow.parquet.write_to_dataset creates cyclic references
  • ARROW-1770 - [GLib] Fix GLib compiler warning
  • ARROW-1771 - [C++] ARROW-1749 Breaks Public API test in parquet-cpp
  • ARROW-1776 - [C++] Define arrow::gpu::CudaContext::bytes_allocated()
  • ARROW-1778 - [Python] Link parquet-cpp statically, privately in manylinux1 wheels
  • ARROW-1781 - Don't use brew when using the toolchain
  • ARROW-1788 - Fix Plasma store abort bug on client disconnection
  • ARROW-1791 - Limit generated data range to physical limits for temporal types
  • ARROW-1793 - fix a typo for README.md
  • ARROW-1800 - [C++] Fix and simplify random_decimals
  • ARROW-1805 - [Python] Ignore special private files when traversing ParquetDataset
  • ARROW-1811 - [C++/Python] Rename all Decimal based APIs to Decimal128
  • ARROW-1812 - [C++] Plasma store modifies hash table while iterating during client disconnect
  • ARROW-1813 - Enforce checkstyle failure in JAVA build and fix all checkstyle
  • ARROW-1821 - [INTEGRATION] Add integration test case for when Field has zero null count and optional validity buffer
  • ARROW-1829 - [Plasma] Fixes to eviction policy.
  • ARROW-1830 - [Python] Relax restriction that Parquet files in a dataset end in .parq or .parquet
  • ARROW-1831 - [Python] Docker-based documentation build does not properly set LD_LIBRARY_PATH
  • ARROW-1836 - [C++] Remove deprecated static_visitor struct to avoid msvc C4996 warning
  • ARROW-1839/ARROW-1871 - [C++/Python] Add Decimal Parquet Read/Write Tests
  • ARROW-1840 - [Website] The installation command failed on Windows10 anaconda envir…
  • ARROW-1845 - [Python] Expose Decimal128Type
  • ARROW-1852 - [C++] Make retrieval of Plasma manager fd a const operation
  • ARROW-1853 - [Plasma] Fix off-by-one error in retry processing
  • ARROW-1863 - [Python] PyObjectStringify could render bytes-like output for more types of objects
  • ARROW-1865 - [C++] Do not alter number of rows attribute when removing last column from Table
  • ARROW-1869 - [JAVA] Fix LowCostIdentityHashMap name
  • ARROW-1871 - [Python/C++] Appending Python Decimals with different scales requires rescaling
  • ARROW-1873 - [Python] Catch more possible Python/OOM errors in to_pandas conversion path
  • ARROW-1877 - [Java] Fix incorrect equals method in JsonStringArrayList
  • ARROW-1879 - [Python] Dask integration tests are not skipped if dask is not installed
  • ARROW-1881 - Ignore JS tags for Python packages
  • ARROW-1882 - [C++] Reintroduce DictionaryBuilder
  • ARROW-1883 - [Python] Fix handling of metadata in to_pandas when not all columns are present
  • ARROW-1889 - [Python] --exclude is not available in older git versions
  • ARROW-1890 - [Python] Fix mask handling for Date32 NumPy conversions
  • ARROW-1891 - [Python] Always use NumPy NaT sentinels to mark nulls when converting to array
  • ARROW-1892 - [Python] Support binaries in lists
  • ARROW-1893 - [Python] Convert memoryview to bytes when loading from pickle in Python 2.7
  • ARROW-1895/ARROW-1897 - [Python] Add field_name to pandas index metadata
  • ARROW-1897 - [Python] Incorrect numpy_type for pandas metadata of Categoricals
  • ARROW-1904 - [C++] Deprecate PrimitiveArray::raw_values
  • ARROW-1906 - [Python] Do not override user-supplied type in pyarrow.array when converting DatetimeTZ pandas data
  • ARROW-1908 - [Python] Construction of arrow table from pandas DataFrame with duplicate column names crashes
  • ARROW-1910 - [C++] Use c_glib Brewfile in README for installing dependencies on macOS (#1407)
  • ARROW-1914 - [C++] Fix build dependency for GPU support build
  • ARROW-1915 - [Python] Add missing parquet decorator to decimal tests
  • ARROW-1916 - [Java] Include java/dev/checkstyle in git archive for source releases
  • ARROW-1917 - Fixes to enable verify-release-candidate.sh to work for 0.8.0
  • ARROW-1935 - Download page must not link to snapshots / nightly builds
  • ARROW-1936 - Broken links to signatures/hashes etc
  • ARROW-1939 - Correct links in release 0.8 blog post

New Features and Improvements

  • ARROW-480 - [Python] Implement RowGroupMetaData.ColumnChunk
  • ARROW-504 - [Python] Add adapter to write pandas.DataFrame in user-selected chunk size to streaming format
  • ARROW-507 - [C++] Complete ListArray::FromArrays implementation, add unit tests
  • ARROW-541 - [JS] Implement JavaScript-compatible implementation
  • ARROW-571 - [Python] Add unit test for incremental Parquet file building, improve docs
  • ARROW-587 - Add fix version to PR merge tool
  • ARROW-609 - [C++] Function for casting from days since UNIX epoch to int64 date
  • ARROW-838 - [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas
  • ARROW-905 - [Docs] Dockerize document generation
  • ARROW-911 - [Python] Expand development.rst with build instructions without conda
  • ARROW-942 - Support running integration tests with both Python 2.7 and 3.6
  • ARROW-950 - [Website] Add Google Analytics tag to site
  • ARROW-972 - UnionArray in pyarrow
  • ARROW-1032 - [JS] Support custom_metadata
  • ARROW-1047 - [Java][FollowUp] Change ArrowMagic to be non-public class
  • ARROW-1047 - [Java] Add Generic Reader Interface for Stream Format
  • ARROW-1087 - [Python] Add pyarrow.get_include function. Bundle includes in all builds
  • ARROW-1114 - [C++] Add simple RecordBatchBuilder class
  • ARROW-1134 - [C++] Support for C++/CLI compilation, add NULLPTR define to avoid using nullptr in public headers
  • ARROW-1178 - [C++/Python] Add option to set chunksize in TableBatchReader, Table.to_batches method
  • ARROW-1226 - [C++] Docs cleaning in arrow/ipc. Doxyfile fixes, move ipc/metadata-internal.h symbols to internal NS
  • ARROW-1250 - [Python] Add pyarrow.types module with useful type checking functions
  • ARROW-1362 - [Integration] Validate vector type layout in IPC messages
  • ARROW-1367 - [Website] Divide CHANGELOG issues by component and add subheaders
  • ARROW-1369 - Support boolean types in the javascript arrow reader library
  • ARROW-1371 - [Website] Add “Powered By” page to the website
  • ARROW-1455 - [Python] Add Dockerfile for validating Dask integration
  • ARROW-1471 - [JAVA] Document requirements and non/requirements for ValueVector updates
  • ARROW-1472 - [JAVA] Design updated ValueVector Object Hierarchy
  • ARROW-1473 - ValueVector new hierarchy prototype (implementation phase 1)
  • ARROW-1474 - [JAVA] ValueVector hierarchy (Implementation Phase 2)
  • ARROW-1476 - [JAVA] Implement Final ValueVector Updates
  • ARROW-1482 - [C++] Implement casts between date32 and date64
  • ARROW-1483 - [C++] Implement casts between time32 and time64
  • ARROW-1484 - [C++/Python] Implement casts between date, time, timestamp units
  • ARROW-1485 - [C++] Implement union-like data type for accommodating kernel arguments which may be scalars or arrays
  • ARROW-1486 - [C++] Make Column, RecordBatch, and Table non-copyable
  • ARROW-1487 - [C++] Implement casts from List to List, where a cast function is defined from any A to B
  • ARROW-1488 - [C++] Implement ArrayBuilder::Finish in terms of FinishInternal based on ArrayData
  • ARROW-1498 - Add CONTRIBUTING.md to .github special directory
  • ARROW-1503 - [Python] Add default serialization context, callbacks for pandas.Series/DataFrame
  • ARROW-1522 - [Python] Zero copy buffer deserialization
  • ARROW-1523 - [C++] Add helper data struct with methods for reading a validity bitmap possibly having a non-zero offset
  • ARROW-1524 - [C++] More graceful solution for handling non-zero offsets on inputs and outputs in compute library
  • ARROW-1525 - [C++] New compare functions that return boolean instead of Status
  • ARROW-1526 - [Python] Add unit test for fix in PARQUET-1100
  • ARROW-1535 - [Python] Enable sdist tarballs to be installed
  • ARROW-1538 - [C++] Support Ubuntu 14.04 in .deb packaging automation
  • ARROW-1539 - [C++] Remove APIs deprecated as of 0.7.0 or prior releases
  • ARROW-1556 - [C++] Move verbose AssertArraysEqual function used in PARQUET-1100 into arrow/test-util.h
  • ARROW-1559 - [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel
  • ARROW-1573 - [C++] Implement stateful kernel function that uses DictionaryBuilder to compute dictionary indices
  • ARROW-1575 - [Python] Add tests for pyarrow.column factory function
  • ARROW-1576 - [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
  • ARROW-1577 - [JS] add ASF release scripts
  • ARROW-1588 - [C++/Format] Harden Decimal Format
  • ARROW-1593 - [Python] Pass through preserve_index to RecordBatch.from_pandas in serialize_pandas
  • ARROW-1594 - [Python] Multithreaded conversions to Arrow in from_pandas
  • ARROW-1600 - [C++] Add Buffer constructor that wraps std::string
  • ARROW-1602 - [C++] Add IsValid method to pair with IsNull
  • ARROW-1603 - [C++] Add BinaryArray::GetString helper method
  • ARROW-1604 - [Python] Support common type aliases in cast(...) and various type= arguments
  • ARROW-1605 - [Python] pyarrow.array should be able to yield smaller integer types without an explicit cast
  • ARROW-1607 - [C++] Implement DictionaryBuilder for Decimals
  • ARROW-1613 - [Java] Alternative ArrowReader close to free resources but leave ReadChannel open
  • ARROW-1616 - [Python] Add unit test for RecordBatchWriter.write dispatching to write_table or write_batch
  • ARROW-1626 - Add make targets to run the inter-procedural static analys…
  • ARROW-1627 - New class to handle collection of BufferLedger(s) within …
  • ARROW-1630 - [Serialization] Support Python datetime objects
  • ARROW-1631 - [C++] Add GRPC to ThirdpartyToolchain
  • ARROW-1635 - Add release management guide
  • ARROW-1637 - [C++] IPC round-trip for null type
  • ARROW-1641 - [C++] Hide std::mutex from public headers
  • ARROW-1648 - C++: Add cast from Dictionary[NullType] to NullType
  • ARROW-1649 - C++: Print number of nulls in PrettyPrint for NullArray
  • ARROW-1651 - [JS] Lazy row accessor in Table
  • ARROW-1652 - [JS] housekeeping, vector cleanup
  • ARROW-1654 - [Python] Implement pickling for DataType, Field, Schema
  • ARROW-1662 - Move to using Homebrew/bundle and Brewfile
  • ARROW-1665 - [Serialization] Support more custom datatypes in the default serialization context
  • ARROW-1666 - [GLib] Enable gtk-doc on Travis CI Mac environment
  • ARROW-1667 - [GLib] Support Meson
  • ARROW-1671 - [C++] Deprecate arrow::MakeArray that returns Status, refactor existing code to new variant
  • ARROW-1675 - [Python] Use RecordBatch.from_pandas in Feather write path
  • ARROW-1677 - [Blog] Post on ray and arrow serialization
  • ARROW-1679 - [GLib] Add garrow_record_batch_reader_read_next()
  • ARROW-1683 - [Python] Restore TimestampType to pyarrow namespace
  • ARROW-1684 - [Python] Support selecting nested Parquet fields by any path prefix
  • ARROW-1685 - [GLib] Add GArrowTableBatchReader
  • ARROW-1687 - [Python] Expose UnionArray to pyarrow
  • ARROW-1689 - [Python] Implement zero-copy conversions for DictionaryArray
  • ARROW-1689 - [Python] Allow user to request no data copies
  • ARROW-1690 - [GLib] Add garrow_array_is_valid()
  • ARROW-1691 - [Java] Conform Java Decimal type implementation to format decisions in ARROW-1588
  • ARROW-1697 - [GitHub] Add ISSUE_TEMPLATE.md
  • ARROW-1701 - [Serialization] Support zero copy PyTorch Tensor serialization
  • ARROW-1702 - Update jemalloc in manylinux1 build
  • ARROW-1703 - [C++] Vendor exact version of jemalloc we depend on
  • ARROW-1707 - Update dev README after movement to GitBox
  • ARROW-1710 - [Java] Remove Non-Nullable Vectors
  • ARROW-1716 - [Format/JSON] Use string integer value for Decimals in JSON
  • ARROW-1717 - [Java] Refactor JsonReader for new class hierarchy and fix
  • ARROW-1718 - [C++/Python][D] -> date32
  • ARROW-1719 - [Java] Remove accessor and mutator interface
  • ARROW-1721 - [Python] Implement null-mask check in places where it isn't supported in numpy_to_arrow.cc
  • ARROW-1724 - [Packaging] Support Ubuntu 17.10
  • ARROW-1725 - [Packaging] Upload .deb for Ubuntu 17.10
  • ARROW-1726 - [GLib] Add setup description to verify C GLib build
  • ARROW-1727 - [Format] Expand Arrow streaming format to permit deltas / additions to existing dictionaries
  • ARROW-1728 - [C++] Run clang-format checks in Travis CI
  • ARROW-1734 - C++/Python: Add cast function on Column-level
  • ARROW-1736 - [GLib] Add GArrowCastOptions:allow-time-truncate
  • ARROW-1737 - [GLib] Use G_DECLARE_DERIVABLE_TYPE
  • ARROW-1740 - C++: Kernel to get unique values of an Array/Column
  • ARROW-1746 - [Python] Add build dependencies for Arch Linux
  • ARROW-1747 - [C++] Don't export symbols of statically linked libraries
  • ARROW-1748 - [GLib] Add GArrowRecordBatchBuilder
  • ARROW-1750 - [C++] Remove the need for arrow/util/random.h
  • ARROW-1752 - [Packaging] Add GPU packages for Debian and Ubuntu
  • ARROW-1753 - [Python] Provide for matching subclasses with register_type in serialization context
  • ARROW-1755 - [C++] CMake option to link msvc crt statically
  • ARROW-1758 - [Python] Remove pickle=True option for object serialization
  • ARROW-1759 - [Python] Add function / property to get implied Arrow schema from Parquet file
  • ARROW-1763 - [Python] Implement hash for DataType
  • ARROW-1765 - [Doc] Use dependencies from conda in C++ docker build
  • ARROW-1767 - [C++] Support file reads and writes over 2GB on Windows
  • ARROW-1772 - [C++] Add public-api-test module in style of parquet-cpp
  • ARROW-1773 - [C++] Add casts from date/time types to compatible signed integers
  • ARROW-1775 - Ability to abort created but unsealed Plasma objects
  • ARROW-1777 - [C++] Add ArrayData::Make static ctor for more convenient construction
  • ARROW-1779 - [Java] Integration test breaks without zeroing out validity vectors
  • ARROW-1782 - [Python] Add pyarrow.compress, decompress APIs
  • ARROW-1783 - [Python] Provide a “component” dict representation of a serialized Python object with minimal allocation
  • ARROW-1784 - [Python] Enable zero-copy serialization, deserialization of pandas.DataFrame via components
  • ARROW-1785 - [Format/C++/Java] Remove VectorLayout from serialized schemas
  • ARROW-1787 - [Python] Support reading parquet files into DataFrames in a backward compatible way
  • ARROW-1794 - [C++/Python] Rename DecimalArray to Decimal128Array
  • ARROW-1795 - [Plasma] Create flag to make Plasma store use a single memory-mapped file.
  • ARROW-1801 - [Docs] Update install instructions to use red-data-tools repos
  • ARROW-1802 - [GLib] Support arrow-gpu
  • ARROW-1806 - [GLib] Add garrow_record_batch_writer_write_table()
  • ARROW-1808 - [C++] Make RecordBatch, Table virtual interfaces for column access
  • ARROW-1809 - [GLib] Use .xml instead of .sgml for GTK-Doc main file
  • ARROW-1810 - [Plasma] Remove unused Plasma test shell scripts
  • ARROW-1816 - [Java] Resolve new vector classes structure for timestamp, date and maybe interval
  • ARROW-1817 - [Java] Configure JsonReader to read floating point NaN values
  • ARROW-1818 - Examine Java Dependencies
  • ARROW-1819 - [Java] Remove legacy vector classes
  • ARROW-1820 - [C++] Create arrow_compute shared library subcomponent
  • ARROW-1826 - [JAVA] Avoid branching in copyFrom for fixed width scalars
  • ARROW-1827 - [Java] Add checkstyle file and license template
  • ARROW-1828 - [C++] Hash kernel specialization for BooleanType
  • ARROW-1834 - [Doc] Build documentation in separate build folders
  • ARROW-1838 - [C++] Conform kernel API to use Datum for input and output
  • ARROW-1841 - [JS] Update text-encoding-utf-8 and tslib for node ESModules support
  • ARROW-1844 - [C++] Add initial Unique benchmarks for int64, variable-length strings
  • ARROW-1849 - [GLib] Add input checks to GArrowRecordBatch
  • ARROW-1850 - [C++] Use void* / const void* for buffers in file APIs
  • ARROW-1854 - [Python] Use pickle to serialize numpy arrays of objects.
  • ARROW-1855 - [GLib] Add workaround for build failure on macOS
  • ARROW-1857 - [Python] Add switch for boost linkage with static parquet in wheels
  • ARROW-1859 - [GLib] Add GArrowDictionaryDataType
  • ARROW-1862 - [GLib] Add GArrowDictionaryArray
  • ARROW-1864 - [Java] Upgrade Netty to 4.1.17
  • ARROW-1866 - [Java] Combine MapVector and NonNullableMapVector Classes
  • ARROW-1867 - [Java] Add missing methods to BitVector from legacy vector class
  • ARROW-1874 - [GLib] Add garrow_array_unique()
  • ARROW-1878 - [GLib] Add garrow_array_dictionary_encode()
  • ARROW-1884 - [C++] Exclude integration test JSON reader/writer classes from public API
  • ARROW-1885 - [Java] Restore MapVector class names prior to ARROW-1710
  • ARROW-1901 - [Python] Support recursive mkdir for DaskFilesystem
  • ARROW-1902 - [Python] Remove mkdir race condition from write_to_dataset
  • ARROW-1905 - [Python] Add more comprehensive list of exact type checking functions to pyarrow.types
  • ARROW-1911 - [JS] Add Graphistry to Arrow JS proof points
  • ARROW-1922 - Blog post on recent improvements/changes in JAVA Vectors
  • ARROW-1932 - [Website] Update site for 0.8.0
  • ARROW-1934 - [Website] Blog post summarizing highlights of 0.8.0 release

Apache Arrow 0.7.1 (2017-10-01)

New Features and Improvements

  • ARROW-559 - Add release verification script for Linux
  • ARROW-1464 - [GLib] Add “Common build problems” section into the README.md of c_glib
  • ARROW-1537 - [C++] Support building with full path install_name on macOS
  • ARROW-1546 - [GLib] Support GLib 2.40 again
  • ARROW-1548 - [GLib] Support bulk append in builder
  • ARROW-1578 - [C++] Run lint checks in Travis CI much earlier at before_script stage to fail faster
  • ARROW-1592 - [GLib] Add GArrowUIntArrayBuilder
  • ARROW-1608 - Support Release verification script on macOS
  • ARROW-1612 - [GLib] Update readme for mac os
  • ARROW-1618 - [JAVA] Reduce Heap Usage(Phase 1): move release listener logic to Allocation Manager
  • ARROW-1634 - [Website] Updates for 0.7.1 release

Bug Fixes

  • ARROW-1497 - [Java] Fix JsonReader to initialize count correctly
  • ARROW-1500 - [C++] Do not ignore return value from truncate in MemoryMa…
  • ARROW-1529 - [GLib] Use Xcode 8.3 on Travis CI
  • ARROW-1533 - [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
  • ARROW-1536 - [C++] Do not transitively depend on libboost_system
  • ARROW-1542 - [C++] Install packages in temporary directory in MSVC build verification script
  • ARROW-1544 - [JS] Export Vector types
  • ARROW-1545 - Remove deprecated args of builder
  • ARROW-1547 - [JAVA] Fix 8x memory over-allocation in BitVector
  • ARROW-1550 - [Python] Followup: fix flake8 warning
  • ARROW-1550 - [Python] Explicitly close owned file handles in ParquetWriter.close to avoid Windows flakiness
  • ARROW-1553 - [JAVA] Implement setInitialCapacity for MapWriter
  • ARROW-1554 - [Python] Update Sphinx install page to note that VC14 runtime may need to be installed on Windows
  • ARROW-1557 - [Python] Validate names length in Table.from_arrays
  • ARROW-1590 - [JS] Flow TS Table method generics
  • ARROW-1591 - C++: Xcode 9 is not correctly detected
  • ARROW-1595 - [Python] Fix package dependency resolution issue causing broken builds
  • ARROW-1598 - [C++] Fix diverged code comment in plasma tutorial
  • ARROW-1601 - [C++] Do not read extra byte from validity bitmap, add internal::BitmapReader in lieu of macros
  • ARROW-1606 - [Python] Copy .lib files in addition to .dll when bundling libraries for Windows
  • ARROW-1610 - C++/Python: Only call python-prefix if the default PYTHON_LIBRARY is not present
  • ARROW-1611 - [C++] Add BitmapWriter, do not perform out of bounds read in BitmapReader when length is 0
  • ARROW-1619 - [Java] Correctly set “lastSet” for variable vectors in JsonReader

Apache Arrow 0.7.0 (2017-09-17)

Bug Fixes

  • ARROW-12 - Get Github activity mirrored to JIRA
  • ARROW-248 - UnionVector.close() should call clear()
  • ARROW-269 - UnionVector getBuffers method does not include typevector
  • ARROW-407 - BitVector.copyFromSafe() should re-allocate if necessary instead of returning false
  • ARROW-801 - Provide direct access to underlying buffer memory addresses
  • ARROW-1302 - C++: Set MAKE to make if not defined
  • ARROW-1332 - [Packaging] Building Windows wheels in Apache repos
  • ARROW-1354 - [Python] Segfault in Table.from_pandas with Mixed-Type Categories
  • ARROW-1357 - [Python] Account for chunked arrays when converting lists back to pandas form
  • ARROW-1363 - [C++] Use buffer layout from dictionary index type in IPC messages
  • ARROW-1365 - [Python] Remove outdated pyarrow.jemalloc_memory_pool example. Update API doc site build instructions
  • ARROW-1373 - Implement getBuffer() methods for ValueVector
  • ARROW-1375 - [C++] Remove dependency on msvc version for Snappy build
  • ARROW-1378 - [Python] whl is not a supported wheel on this platform on Debian/Jessie
  • ARROW-1379 - [Java] adding maven-dependency-plugin and fixing all reported dependency errors
  • ARROW-1390 - [Python] Add more serialization tests
  • ARROW-1407 - Fix bug where DictionaryEncoder can only encode vector le…
  • ARROW-1411 - [Python] Booleans in Float Columns cause Segfault
  • ARROW-1414 - [GLib] Cast after status check
  • ARROW-1421 - [Python] Extend Python serialization API to accept non-list types
  • ARROW-1426 - [Site] Fix the title of the top page.
  • ARROW-1429 - [Python] Open common Parquet metadata file using passed file system
  • ARROW-1430 - [Python] Python CI build outside of a bash function scope, enable flake8 to fail build
  • ARROW-1434 - [Python][D] numpy arrays
  • ARROW-1435 - [Python] Properly handle time zone metadata in Parquet round trips
  • ARROW-1437 - [Python] pa.Array.from_pandas segfaults when given a mixed-type array
  • ARROW-1439 - [Packaging] Automate updating RPM in RPM build
  • ARROW-1443 - [Java] Fixed a small bug on ArrowBuf.setBytes with unsliced ByteBuffers
  • ARROW-1444 - [JAVA] fix last byte copy in BitVector splitAndTransfer
  • ARROW-1446 - [Python] Add (very slow) large memory unit test for int32 overflow in PARQUET-1090
  • ARROW-1450 - [Python] Raise proper error if custom serialization handler fails
  • ARROW-1452 - [C++] Restore DISALLOW_COPY_AND_ASSIGN usages removed in ARROW-1452 patch
  • ARROW-1452 - [C++] Make macros in arrow/util/macros.h more unique
  • ARROW-1453 - [C++/Python] Support non-contiguous Tensors in WriteTensor
  • ARROW-1457 - [C++] Optimize strided WriteTensor
  • ARROW-1458 - [Python] Document that create_parents=False is unsupported in HadoopFileSystem
  • ARROW-1459 - [Python] Use list values length to advance offset when reconstructing array of ndarrays
  • ARROW-1461 - [C++] Restore LLVM apt usage
  • ARROW-1461 - [C++] Disable builds using LLVM apt repo until installation issues resolved
  • ARROW-1467 - [JAVA] Fix reset() and allocateNew() in Nullable Value Vectors t…
  • ARROW-1469 - Segfault when serialize Pandas series with mixed object type
  • ARROW-1490 - [Java] Allow failures for JDK9 for now
  • ARROW-1493 - [C++] Flush stream in PrettyPrint functions
  • ARROW-1495 - [C++] Store shared_ptr to boxed arrays in RecordBatch
  • ARROW-1507 - [C++] Include arrow/array.h for arrow::internal::ArrayData
  • ARROW-1512 - [C++] Fix API change in documentation
  • ARROW-1514 - [C++] Fix a typo in document
  • ARROW-1527 - Fix Travis CI JDK9 build
  • ARROW-1531 - [C++] Return ToBytes by value from Decimal128
  • ARROW-1532 - [Python] Referencing an Empty Schema causes a SegFault

New Features and Improvements

  • ARROW-34 - C++: establish a basic function evaluation model
  • ARROW-229 - [C++] Implement cast functions for numeric types, booleans
  • ARROW-592 - [C++] Provide .deb and .rpm packages
  • ARROW-594 - [C++/Python] Write arrow::Table to stream and file writers
  • ARROW-695 - Add decimal integration test.
  • ARROW-696 - [C++] Support decimals in IPC and JSON reader/writer to enable integration tests
  • ARROW-759 - [Python] Serializing large class of Python objects in Apache Arrow
  • ARROW-786 - [Format] In-memory format for 128-bit Decimals, handling of sign bit
  • ARROW-837 - [Python] Add public pyarrow.allocate_buffer API. Rename FixedSizeBufferOutputStream
  • ARROW-941 - Add “cold start” instructions for running integration tests
  • ARROW-989 - [Python] Write pyarrow.Table to FileWriter or StreamWriter
  • ARROW-1156 - [C++/Python] Expand casting API, add UnaryKernel callable. Use Cast in appropriate places when converting from pandas
  • ARROW-1238 - [Java] Adding Decimal type JSON read and write support
  • ARROW-1286 - PYTHON: support Categorical serialization to/from parquet
  • ARROW-1307 - [Python] Expand IPC section to include object serialization, Feather format. Add Feather functions to API listing
  • ARROW-1317 - [Python] Attempt to set Hadoop CLASSPATH when using JNI
  • ARROW-1331 - [JAVA] include package statement
  • ARROW-1331 - [JAVA] Refactor unit tests
  • ARROW-1339 - [C++] Use of boost::filesystem::path to handle file paths
  • ARROW-1344 - [C++] Do not permit writing to closed BufferOutputStream
  • ARROW-1348 - [C++/Python] Release verification script for Windows
  • ARROW-1351 - Update CHANGELOG.md in 00-prepare.sh when creating release candidate
  • ARROW-1352 - [Integration] Added specific formatting for producer consumer output
  • ARROW-1355 - [Java] Make Arrow buildable with jdk9
  • ARROW-1356 - [Website] Add new committers
  • ARROW-1358 - Update sha{1, 256, 512} checksums per latest ASF release policy
  • ARROW-1359 - [C++] Add flavor=‘spark’ option to write_parquet that sanitizes schema field names
  • ARROW-1364 - [C++] IPC support machinery for record batch roundtrips to GPU device memory
  • ARROW-1366 - [Plasma] Define entry point for the plasma store
  • ARROW-1372 - [Plasma] enable HUGETLB support on Linux to improve plasma put performance
  • ARROW-1376 - [C++] RecordBatchStreamReader::Open API is inconsistent with writer
  • ARROW-1377 - [Python] Add ParquetFile.scan_contents function to use for benchmarking
  • ARROW-1381 - [Python] Use FixedSizeBufferWriter in SerializedPyObject.to_buffer
  • ARROW-1383 - [C++] Add vector append variant to primitive array builders that accepts std::vector
  • ARROW-1384 - [C++] Add SerializeRecordBatch API for writing a record batch as an IPC message to a new buffer
  • ARROW-1386 - [C++] Unpin CMake version in MSVC toolchain builds
  • ARROW-1387 - [C++] Set up GPU leaf library, add unit test module for CUDA tests
  • ARROW-1392 - [C++] Add GPU IO interfaces for CUDA
  • ARROW-1395 - [C++/Python] Remove APIs deprecated from 0.5.0 onward
  • ARROW-1396 - [C++] Add PrettyPrint for schemas that outputs dictionaries
  • ARROW-1397 - [Packaging] Use Docker instead of Vagrant
  • ARROW-1399 - [C++] Add CUDA build version defines in public headers
  • ARROW-1400 - [Python] Adding parquet.write_to_dataset() method for writing partitioned .parquet files
  • ARROW-1401 - [C++] Add note to readme about ARROW_EXTRA_ERROR_CONTEXT
  • ARROW-1401 - [C++] Add ARROW_EXTRA_ERROR_CONTEXT option
  • ARROW-1402 - [C++] Deprecate APIs which return std::shared_ptr<MutableBuffer> in favor of std::shared_ptr<Buffer>
  • ARROW-1404 - [Packaging] Build .deb and .rpm on Travis CI
  • ARROW-1405 - [Python] Expose LoggingMemoryPool in Python API
  • ARROW-1406 - [Python] Harden user API for generating serialized schema and record batch messages as memoryview-compatible objects
  • ARROW-1408 - [C++] IPC public API cleanup, refactoring. Add SerializeSchema, ReadSchema public APIs
  • ARROW-1410 - Remove MAP_POPULATE flag when mmapping files in Plasma store.
  • ARROW-1412 - [Plasma] Add higher level API for putting and getting Python objects
  • ARROW-1413 - [C++] Add include-what-you-use configuration
  • ARROW-1415 - [GLib] Support date32 and date64
  • ARROW-1416 - Clarify memory layout documentation
  • ARROW-1417 - [Python] Allow more generic filesystem objects to be passed to ParquetDataset
  • ARROW-1418 - [Python] Introduce SerializationContext to register custom serialization callbacks
  • ARROW-1419 - [GLib] Suppress sign-conversion warnings
  • ARROW-1427 - [GLib] Add arrow cpp link to readme
  • ARROW-1428 - [C++] Append steps to clone source code to README.md
  • ARROW-1432 - [C++] Build bundled jemalloc functions with private prefix
  • ARROW-1433 - [C++] Simplify Array::Slice to be non-virtual
  • ARROW-1438 - [Python] Pull serialization context through PlasmaClient put and get
  • ARROW-1441 - [Site] Add Ruby to Flexible section
  • ARROW-1442 - [Website] Add note about nightly builds to /install
  • ARROW-1447 - [C++] Fix many include-what-you-use warnings
  • ARROW-1448 - [Packaging] Support uploading built .deb and .rpm to Bintray
  • ARROW-1449 - Implement Decimal using only Int128
  • ARROW-1451 - [C++] Add public API file for IO section in arrow/io/api.h
  • ARROW-1460 - [C++] Pin clang-format at LLVM 4.0
  • ARROW-1462 - [GLib] Add GArrowTime32Array and GArrowTime64Array
  • ARROW-1466 - [C++] Implement PrettyPrint for DecimalArray
  • ARROW-1468 - [C++] Add primitive Append variants that accept std::vector
  • ARROW-1479 - [JS] Expand JavaScript implementation
  • ARROW-1480 - [Python] Improve performance of serializing sets
  • ARROW-1481 - [C++] Expose type casts as generic callable object that can write into pre-allocated memory
  • ARROW-1494 - [C++] Improve doxygen comments in arrow/table.h, note that RecordBatch::column returns new object
  • ARROW-1499 - [Python] Consider adding option to parquet.write_table that sets options for maximum Spark compatibility
  • ARROW-1504 - [GLib] Add GArrowTimestampArray
  • ARROW-1505 - [GLib] Simplify arguments check
  • ARROW-1506 - [C++] Add .pc for compute modules
  • ARROW-1508 - C++: Add support for FixedSizeBinaryType in DictionaryBuilder
  • ARROW-1510 - [GLib] Support cast
  • ARROW-1511 - [C++] Promote ArrayData, MakeArray to public API, deprecate MakePrimitiveArray
  • ARROW-1513 - C++: Add cast from Dictionary to plain arrays
  • ARROW-1515 - [GLib] Detect version directly
  • ARROW-1516 - [GLib] Update document
  • ARROW-1517 - Remove unnecessary temporary in DecimalUtil::ToString function
  • ARROW-1519 - [C++] Move DecimalUtil functions to methods on the Int128 class
  • ARROW-1528 - [GLib] Resolve recursive include dependency
  • ARROW-1530 - [C++] Install arrow/util/parallel.h
  • ARROW-1551 - [Website] Updates for 0.7.0 release
  • ARROW-1597 - [Packaging] arrow-compute.pc is missing in .deb/.rpm file list

Apache Arrow 0.6.0 (2017-08-14)

Bug Fixes

  • ARROW-187 - [C++] Add development style notes to C++ README, note about esoteric exceptions in constructors
  • ARROW-276 - [JAVA] Nullable Vectors should extend BaseValueVector and not Bas…
  • ARROW-573 - [C++/Python] Implement IPC metadata handling for ordered dictionaries, pandas conversions
  • ARROW-884 - [C++] Exclude internal namespaces from generated Doxygen docs
  • ARROW-932 - [Python] Fix MSVC compiler warnings, build Python with /WX and -Werror in CI
  • ARROW-968 - [Python] Support slices in RecordBatch.getitem
  • ARROW-1192 - [JAVA] Use buffer slice for splitAndTransfer in List and Union Vectors.
  • ARROW-1195 - [C++] CpuInfo init with cores number, frequency and cache…
  • ARROW-1204 - [C++] Remove WholeProgramOptimization(/GL) compilation fl…
  • ARROW-1225 - [Python] Decode bytes to utf8 unicode if possible when passing explicit utf8 type to pyarrow.array
  • ARROW-1237 - [JAVA] expose the ability to set lastSet
  • ARROW-1239 - [JAVA] upgrading git-commit-id-plugin
  • ARROW-1240 - [JAVA] security: upgrade logback to address CVE-2017-5929 (take 2)
  • ARROW-1240 - [JAVA] security: upgrade slf4j to 1.7.25 and logback to 1.2.3
  • ARROW-1241 - [C++] Appveyor build matrix extended with Visual Studio 2…
  • ARROW-1242 - [JAVA] - upgrade jackson to mitigate security vulnerabilities (take 2)
  • ARROW-1242 - [JAVA] - upgrade jackson to mitigate security vulnerabilities
  • ARROW-1245 - [Integration] Enable JavaTester in Integration tests
  • ARROW-1248 - [Python] Suppress return-type-c-linkage warning in Cython clang builds
  • ARROW-1249 - [JAVA] expose fillEmpties from Nullable variable length vectors
  • ARROW-1263 - [C++] Get CPU info on Windows; Resolve patching whitespac…
  • ARROW-1265 - [Plasma] Clean up all resources on SIGTERM to keep valgrind output clean
  • ARROW-1267 - [Java] Handle zero length case in BitVector.splitAndTransfer
  • ARROW-1269 - [Packaging] Add Windows wheel build scripts from ARROW-1068 to arrow-dist
  • ARROW-1275 - [C++] Default Snappy static lib suffix updated to “_static”
  • ARROW-1276 - enable parquet serialization of empty DataFrames
  • ARROW-1283 - [JAVA] Allow VectorSchemaRoot to close more than once
  • ARROW-1285 - [Python] Delete any incomplete file when attempt to write single Parquet file fails
  • ARROW-1287 - [Python] Implement whence argument for pyarrow.NativeFile.seek
  • ARROW-1290 - [C++] Double buffer size when exceeding capacity in arrow::BufferBuilder as in array builders
  • ARROW-1291 - [Python] Cast non-string DataFrame columns to strings in RecordBatch/Table.from_pandas
  • ARROW-1294 - [C++] Pin cmake=3.8.0 in MSVC toolchain build
  • ARROW-1296 - [Java] Fix allocationSizeInBytes in FixedValueVectors.res…
  • ARROW-1300 - [JAVA] Fix Tests for ListVector
  • ARROW-1306 - [C++] Use UTF8 filenames in local file error messages
  • ARROW-1308 - [C++] Link utility executables to Arrow shared library if ARROW_BUILD_STATIC=off
  • ARROW-1309 - [Python] Handle nested lists with all None values in Array.from_pandas
  • ARROW-1310 - [JAVA] revert changes made in ARROW-886
  • ARROW-1311 - Python hangs after writing a few Parquet tables
  • ARROW-1312 - [Python] Follow-up: do not use jemalloc in manylinux1 builds
  • ARROW-1312 - [C++] Make ARROW_JEMALLOC OFF by default until ARROW-1282 is resolved
  • ARROW-1326 - [Python] Fix Sphinx Build in Travis CI, treat Sphinx warnings as errors
  • ARROW-1327 - [Python] Always release GIL before calling check_status in Cython
  • ARROW-1328 - [Python] Set correct Arrow type when coercing to milliseconds and passing explicit type
  • ARROW-1330 - [Plasma] Turn on plasma tests on manylinux1
  • ARROW-1335 - [C++] Add offset to PrimitiveArray::raw_values to make consistent with other raw_values
  • ARROW-1338 - [Python] Do not close RecordBatchWriter on dealloc in case sink is no longer valid
  • ARROW-1340 - [Java] Fix NullableMapVector field metadata
  • ARROW-1342 - [Python] Support strided ndarrays in pandas conversion from nested lists
  • ARROW-1343 - [Java] Aligning serialized schema, end of buffers in RecordBatches
  • ARROW-1350 - [C++] Do not exclude Plasma source tree from source release

New Features and Improvements

  • ARROW-439 - [Python] Add option in “to_pandas” conversions to yield Categorical from String/Binary arrays
  • ARROW-622 - [Python] Add coerce_timestamps option to parquet.write_table, deprecate timestamps_to_ms argument
  • ARROW-1076 - [Python] Handle nanosecond timestamps more gracefully when writing to Parquet format
  • ARROW-1093 - [Python] Run flake8 in Travis CI. Add note about development to README
  • ARROW-1104 - Integrate in-memory object store from Ray
  • ARROW-1116 - [Python] Create single external GitHub repo for building wheels for all platforms in one shot
  • ARROW-1121 - [C++] Improve error message when opening OS file fails
  • ARROW-1140 - [C++] Allow optional build of plasma
  • ARROW-1149 - [Plasma] Create Cython client library for Plasma
  • ARROW-1173 - [Plasma] Add blog post describing Plasma object store
  • ARROW-1211 - [C++] Enable builder classes to automatically use the default memory pool
  • ARROW-1213 - [Python] Support s3fs filesystem for Amazon S3 in ParquetDataset
  • ARROW-1219 - [C++] Use Google C++ code formatting
  • ARROW-1224 - [Format] Clarify language around buffer padding and align…
  • ARROW-1230 - [Plasma] Install libraries and headers
  • ARROW-1243 - [JAVA] update all libs to latest versions
  • ARROW-1246 - [Format] Draft Flatbuffer metadata description for Map
  • ARROW-1251 - [C++] Update C++ README to account for toolchain evolution
  • ARROW-1253 - [C++/Python] Speed up C++ / Python builds by using conda-forge toolchain for thirdparty libraries
  • ARROW-1255 - [Plasma] Fix typo in plasma protocol; add DCHECK for ReadXXX in plasma protocol.
  • ARROW-1256 - [Plasma] Fix compile warnings on macOS
  • ARROW-1257 - Plasma documentation
  • ARROW-1258 - [C++] Suppress Clang dlmalloc compiler warnings
  • ARROW-1259 - [Plasma] Speed up plasma tests
  • ARROW-1260 - [Plasma] Use factory method to create Python PlasmaClient
  • ARROW-1264 - [Python] Raise exception in Python instead of aborting if cannot connect to Plasma store
  • ARROW-1268 - [WEBSITE] Added blog post for Spark integration toPandas()
  • ARROW-1270 - [Packaging] Add Python wheel build scripts for macOS to arrow-dist
  • ARROW-1272 - [Python] Add script to arrow-dist to generate and upload manylinux1 Python wheels
  • ARROW-1273 - [Python] Add Parquet read_metadata, read_schema convenience functions
  • ARROW-1274 - [C++] Fix CMake >= 3.3 warning. Also add option to suppress ExternalProject output
  • ARROW-1281 - [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python
  • ARROW-1288 - Fix many license headers to use proper ASF one
  • ARROW-1289 - [Python] Add PYARROW_BUILD_PLASMA CMake option, follow semantics of --with-parquet
  • ARROW-1297 - 0.6.0 Release
  • ARROW-1301 - [C++/Python] More complete filesystem API for HDFS
  • ARROW-1303 - [C++] Support downloading Boost
  • ARROW-1304 - [Java] Fix Indentation, WhitespaceAround and EmptyLineSeparator checkstyle warnings in Java
  • ARROW-1305 - [GLib] Add GArrowIntArrayBuilder
  • ARROW-1315 - [GLib] Add missing status check for arrow::ArrayBuilder::Finish()
  • ARROW-1323 - [GLib] Add garrow_boolean_array_get_values()
  • ARROW-1333 - [Plasma] Example code for using Plasma to sort a DataFrame
  • ARROW-1334 - [C++] Add alternate Table constructor that takes vector of Array
  • ARROW-1336 - [C++] Add arrow::schema factory function, simplify some awkward constructors
  • ARROW-1353 - [Website] Updates + blog post for 0.6.0 release

Apache Arrow 0.5.0 (2017-07-23)

New Features and Improvements

  • ARROW-111 - [C++] Add static analyzer to tool chain to verify checking of Status returns
  • ARROW-195 - [C++] Upgrade clang bits to clang-3.8 and move back to trusty.
  • ARROW-460 - [C++] JSON read/write for dictionaries
  • ARROW-462 - [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent
  • ARROW-575 - Python: Auto-detect nested lists and nested numpy arrays in Pandas
  • ARROW-597 - [Python] Add read_pandas convenience to stream and file reader classes. Add some data type docstrings
  • ARROW-599 - [C++] Lz4 compression codec support
  • ARROW-599 - CMake support of LZ4 compression lib
  • ARROW-600 - ZSTD compression lib support
  • ARROW-692 - Integration test data generator for dictionary types
  • ARROW-693 - [Java] Add dictionary support to JSON reader and writer
  • ARROW-742 - [C++] Use gflags from toolchain; Resolve cmake FindGFlags …
  • ARROW-742 - [C++] std::wstring_convert exceptions handling
  • ARROW-834 - Python Support creating from iterables
  • ARROW-915 - [Python] Struct Array reads limited support
  • ARROW-935 - [Java] Build Javadoc and site with OpenJDK8 in Java CI build
  • ARROW-960 - Add section on how to develop with pip
  • ARROW-962 - [Python] Add schema attribute to RecordBatchFileReader
  • ARROW-964 - [Python] Improve api docs
  • ARROW-966 - [Python] Also accept Field instance in pyarrow.list_
  • ARROW-978 - [Python] - Change python documentation sphinx theme to bootstrap
  • ARROW-1041 - [Python] Support read_pandas on a directory of Parquet files
  • ARROW-1048 - Use existing LD_LIBRARY_PATH in source release script to accommodate non-system toolchain libs
  • ARROW-1052 - Arrow 0.5.0 release
  • ARROW-1071 - [Python] RecordBatchFileReader does not have a schema property
  • ARROW-1073 - C++: Adaptive integer builder
  • ARROW-1095 - Add Arrow logo PNG to website img folder
  • ARROW-1100 - [Python] Add mode property to NativeFile
  • ARROW-1102 - Make MessageSerializer.serializeMessage() public
  • ARROW-1120 - Support for writing timestamp(ns) to Int96
  • ARROW-1122 - [Website] Change timestamp to yield correct Jekyll date
  • ARROW-1122 - [Website] Add turbodbc + arrow blog post
  • ARROW-1123 - Make jemalloc the default allocator
  • ARROW-1135 - [C++] Use clang 4.0 in one of the Linux builds
  • ARROW-1137 - Python: Ensure Pandas roundtrip of all-None column
  • ARROW-1142 - [C++] Port over compression toolchain and interfaces from parquet-cpp, use Arrow-style error handling
  • ARROW-1145 - [GLib] Add get_values()
  • ARROW-1146 - Add .gitignore for *_generated.h files in src/plasma/format
  • ARROW-1148 - [C++] Raise minimum CMake version to 3.2
  • ARROW-1151 - [C++] Add branch prediction to RETURN_NOT_OK
  • ARROW-1154 - [C++] Import miscellaneous computational utility code from parquet-cpp
  • ARROW-1160 - C++: Implement DictionaryBuilder
  • ARROW-1165 - [C++] Refactor PythonDecimalToArrowDecimal to not use templates
  • ARROW-1172 - [C++] Refactor to use unique_ptr for builders
  • ARROW-1183 - [Python] Implement pandas conversions between Time32, Time64 types and datetime.time
  • ARROW-1185 - [C++] Status class cleanup, warn_unused_result attribute and Clang warning fixes
  • ARROW-1187 - Python: Feather: Serialize a DataFrame with None column
  • ARROW-1193 - [C++] Support pkg-config for arrow_python.so
  • ARROW-1196 - [C++] Release, Debug, Toolchain, NMake Generator Appveyor…
  • ARROW-1198 - Python: Add public C++ API to unwrap PyArrow object
  • ARROW-1199 - [C++] Implement mutable POD struct for Array data
  • ARROW-1202 - [C++] Remove semicolons from status macros
  • ARROW-1212 - [GLib] Add garrow_binary_array_get_offsets_buffer()
  • ARROW-1214 - [Python/C++] Add C++ functionality to more easily handle encapsulated IPC messages, Python bindings
  • ARROW-1217 - [GLib] Add GInputStream based arrow::io::RandomAccessFile
  • ARROW-1220 - [C++] Cmake script errors out if lib is not found under *…
  • ARROW-1221 - [C++] Add run_clang_format.py script, exclusions file. Pin clang-format-3.9
  • ARROW-1227 - [GLib] Support GOutputStream
  • ARROW-1229 - [GLib] Use “read” instead of “get” for reading record batch
  • ARROW-1244 - Exclude C++ Plasma source tree when creating source release
  • ARROW-1252 - [Website] Update for 0.5.0 release, add blog post summarizing changes from 0.4.x

Bug Fixes

  • ARROW-288 - Implement Arrow adapter for Spark Datasets
  • ARROW-601 - Some logical types not supported when loading Parquet
  • ARROW-784 - Cleaning up thirdparty toolchain support in Arrow on Windows
  • ARROW-785 - possible issue on writing parquet via pyarrow, subsequently read in Hive
  • ARROW-924 - Setting GTEST_HOME Fails on CMake run
  • ARROW-992 - [Python] Try to set a version in in-place local builds
  • ARROW-1043 - [Python] Make sure pandas metadata created by arrow conforms to the pandas spec
  • ARROW-1074 - Support lists and arrays in pandas DataFrames without explicit schema
  • ARROW-1079 - [Python] Filter out private directories when building Parquet dataset manifest
  • ARROW-1081 - Fill null_bitmap correctly in TestBase
  • ARROW-1096 - [C++] CreateFileMapping maximum size calculation issue
  • ARROW-1097 - Reading tensor needs file to be opened in writeable mode
  • ARROW-1098 - [Format] modify document mistake
  • ARROW-1101 - Implement write(TypeHolder) methods in UnionListWriter
  • ARROW-1103 - [Python] Support read_pandas (with index metadata) on directory of Parquet files
  • ARROW-1107 - [JAVA] Fix getField() for NullableMapVector
  • ARROW-1108 - [JAVA] Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
  • ARROW-1109 - [JAVA] transferOwnership fails when readerIndex is not 0
  • ARROW-1110 - [JAVA] make union vector naming consistent
  • ARROW-1111 - [JAVA] Make aligning buffers optional, and allow -1 for unknown null count
  • ARROW-1112 - [JAVA] Set lastSet for VarLength and List vectors when loading
  • ARROW-1113 - [C++] Upgrade to gflags 2.2.0, use tarball instead of git tag
  • ARROW-1115 - [C++] use CCACHE_FOUND value for ccache path
  • ARROW-1117 - [Docs] Minor issues in GLib README
  • ARROW-1124 - Increase numpy dependency to >=1.10.x
  • ARROW-1125 - Python: Add public C++ API to unwrap PyArrow object
  • ARROW-1125 - partial schemas for Table.from_pandas
  • ARROW-1128 - [Docs] command to build a wheel is not properly rendered
  • ARROW-1129 - [C++] Fix gflags issue in Linux/macOS toolchain builds
  • ARROW-1130 - io-hdfs-test failure
  • ARROW-1131 - [Python] Enable the Parquet unit tests by default if the extension imports
  • ARROW-1132 - [Python] Unable to write pandas DataFrame w/MultiIndex containing duplicate values to parquet
  • ARROW-1136 - [C++] Add null checks for invalid streams
  • ARROW-1138 - Travis: Use OpenJDK7 instead of OracleJDK7
  • ARROW-1139 - Silence dlmalloc warning on clang-4.0
  • ARROW-1141 - on import get libjemalloc.so.2: cannot allocate memory in static TLS block
  • ARROW-1143 - C++: Fix comparison of NullArray
  • ARROW-1144 - [C++] Remove unused variable
  • ARROW-1147 - [C++] Allow optional vendoring of flatbuffers in plasma
  • ARROW-1150 - Silence AdaptiveIntBuilder compiler warning on MSVC
  • ARROW-1152 - [Cython] read_tensor should work with a readable file
  • ARROW-1153 - All non-Pandas column throws NotImplemented: unhandled type
  • ARROW-1155 - [Python] Add null check when user improperly instantiates ArrayValue instances
  • ARROW-1157 - C++/Python: Decimal templates are not correctly exported on OSX
  • ARROW-1159 - [C++] Use dllimport for visibility when not building Arrow library
  • ARROW-1162 - Empty data vector transfer between list vectors should no…
  • ARROW-1164 - C++: Templated functions need ARROW_EXPORT instead of ARROW_TEMPLATE_EXPORT
  • ARROW-1166 - Fix errors in example and missing reference in Layout.md
  • ARROW-1167 - [Python] Support chunking string columns in Table.from_pandas
  • ARROW-1168 - [Python] pandas metadata may contain “mixed” data types
  • ARROW-1169 - [C++] jemalloc externalproject doesn't build with CMake's ninja generator
  • ARROW-1170 - C++: Link to pthread on ARROW_JEMALLOC=OFF
  • ARROW-1174 - [GLib] Fix ListArray test failure
  • ARROW-1177 - [C++] Check for int32 offset overflow in ListBuilder, BinaryBuilder
  • ARROW-1179 - C++: Add missing virtual destructors
  • ARROW-1180 - [GLib] Fix a returning invalid address bug in garrow_tensor_get_dimension_name()
  • ARROW-1181 - [Python] Parquet multiindex test should be optional
  • ARROW-1182 - C++: Specify BUILD_BYPRODUCTS for zlib and zstd
  • ARROW-1186 - [C++] Add support to build only Parquet dependencies
  • ARROW-1188 - [Python] Handle Feather case where category values are null type
  • ARROW-1190 - [JAVA] Fixing VectorLoader for duplicate field names
  • ARROW-1191 - [JAVA] Implement getField() method for complex readers
  • ARROW-1194 - [Python] Expose MockOutputStream in pyarrow.
  • ARROW-1197 - [GLib] Fix a bug that record batch related functions for C++ aren't included
  • ARROW-1200 - C++: Switch DictionaryBuilder to signed integers
  • ARROW-1201 - [Python] Incomplete Python types cause a core dump when repr-ing
  • ARROW-1203 - [C++] Disallow BinaryBuilder to append byte strings larger than the maximum value of int32_t
  • ARROW-1205 - C++: Reference to type objects in ArrayLoader may cause segmentation faults
  • ARROW-1206 - [C++] Add finer grained control of compression library support, do not expose symbols which may not be built in compression.h
  • ARROW-1208 - [C++] Install zstd from conda for Toolchain Appveyor buil…
  • ARROW-1208 - [C++] Temporarily remove conda's build of zstd from Toolcha…
  • ARROW-1215 - [Python] Generate documentation for class members in API Reference
  • ARROW-1216 - [Python] Fix creating numpy array from arrow buffers on python 2
  • ARROW-1218 - [C++] Fix arrow build if no compression library is used
  • ARROW-1222 - [Python] Raise exception when passing unsupported Python object type to pyarrow.array
  • ARROW-1223 - [GLib] Fix function name that returns wrapped object
  • ARROW-1228 - [GLib] Fix test file name
  • ARROW-1233 - [C++] Validate libs availability in conda toolchain
  • ARROW-1235 - [C++] Make operator<< for Array/Status and std::ostream inline
  • ARROW-1236 - Fix lib path in pkg-config file
  • ARROW-1284 - Windows can't install pyarrow 0.4.1 and 0.5.0

Apache Arrow 0.4.1 (2017-06-09)

Bug Fixes

  • ARROW-424 - [C++] Make ReadAt, Write HDFS functions threadsafe
  • ARROW-1039 - Python: pyarrow.Filesystem.read_parquet causing error if nthreads>1
  • ARROW-1050 - [C++] Export arrow::ValidateArray
  • ARROW-1051 - [Python] Opt in to Parquet unit tests to avoid accidental suppression of dynamic linking errors
  • ARROW-1056 - [Python] Ignore pandas index in parquet+hdfs test
  • ARROW-1057 - Fix cmake warning and msvc debug asserts
  • ARROW-1060 - [Python] Add unit tests for reference counts in memoryview interface
  • ARROW-1062 - [GLib] Follow API changes in examples
  • ARROW-1066 - [Python] pandas 0.20.1 deprecation of pd.lib causes a warning on import
  • ARROW-1070 - [C++] Use physical types for Feather date/time types
  • ARROW-1075 - [GLib] Fix build error on macOS
  • ARROW-1082 - [GLib] Add CI on macOS
  • ARROW-1085 - [java] Follow up on template cleanup. Missing method for …
  • ARROW-1086 - include additional pxd files during package build
  • ARROW-1088 - [Python] Only test unicode filenames if system supports them
  • ARROW-1090 - Improve build_ext usability with --bundle-arrow-cpp
  • ARROW-1091 - Decimal scale and precision are flipped
  • ARROW-1092 - More Decimal and scale flipped follow-up
  • ARROW-1094 - [C++] Always truncate buffer read in ReadableFile::Read if actual number of bytes less than request
  • ARROW-1127 - pyarrow 0.4.1 import failure on Travis

New Features and Improvements

  • ARROW-897 - [GLib] Extract CI configuration for GLib
  • ARROW-986 - [Format] Add brief explanation of dictionary batches in IPC.md
  • ARROW-990 - [JS] Add tslint support for linting TypeScript
  • ARROW-1020 - [Format] Revise language for Timestamp type in Schema.fbs to avoid possible confusion about tz-naive timestamps
  • ARROW-1034 - [PYTHON] Resolve wheel build issues on Windows
  • ARROW-1049 - [java] vector template cleanup
  • ARROW-1063 - [Website] Updates for 0.4.0 release, release posting
  • ARROW-1068 - [Python] Create external repo with appveyor.yml configured for building Python wheel installers
  • ARROW-1069 - Add instructions for publishing maven artifacts
  • ARROW-1078 - [Python] Account for Apache Parquet shared library consolidation
  • ARROW-1080 - C++: Add tutorial about converting to/from row-wise representation
  • ARROW-1084 - Implementations of BufferAllocator should handle Netty's OutOfDirectMemoryError
  • ARROW-1118 - [Website] Site updates for 0.4.1

Apache Arrow 0.4.0 (2017-05-22)

Bug Fixes

  • ARROW-813 - [Python] setup.py sdist must also bundle dependent cmake m…
  • ARROW-824 - Date and Time Vectors should reflect timezone-less semantics
  • ARROW-856 - Also read compiler info from stdout
  • ARROW-909 - Link jemalloc statically if built as external project
  • ARROW-939 - fix division by zero if one of the tensor dimensions is zero
  • ARROW-940 - [JS] Generate multiple artifacts
  • ARROW-944 - Python: Compat broken for pandas==0.18.1
  • ARROW-948 - [GLib] Update C++ header file list
  • ARROW-952 - fix regex include from C++ standard library
  • ARROW-958 - [Python] Fix conda source build instructions
  • ARROW-979 - [Python] Fix setuptools_scm version when release tag is not in the master timeline
  • ARROW-991 - [Python] Create new dtype when deserializing from Arrow to NumPy datetime64
  • ARROW-995 - [Website] Fix a typo
  • ARROW-998 - [Format] Clarify that the IPC file footer contains an additional copy of the schema
  • ARROW-1003 - [C++] Check flag _WIN32 instead of __WIN32
  • ARROW-1004 - [Python] Add conversions for numpy object arrays with integers and floats
  • ARROW-1017 - [Python] Fix memory leaks in conversion to pandas.DataFrame
  • ARROW-1023 - Python: Fix bundling of arrow-cpp for macOS
  • ARROW-1033 - [Python] pytest discovers scripts/test_leak.py
  • ARROW-1045 - [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.*
  • ARROW-1046 - [Python] Reconcile pandas metadata spec
  • ARROW-1053 - [Python] Remove unnecessary Py_INCREF in PyBuffer causing memory leak
  • ARROW-1054 - [Python] Test suite fails on pandas 0.19.2
  • ARROW-1061 - [C++] Harden decimal parsing against invalid strings
  • ARROW-1064 - ModuleNotFoundError: No module named ‘pyarrow._parquet’

New Features and Improvements

  • ARROW-29 - [C++] FindRe2 cmake module
  • ARROW-182 - [C++] Factor out Array::Validate into a separate function
  • ARROW-376 - Python: Convert non-range Pandas indices (optionally) to Arrow
  • ARROW-446 - [Python] Expand Sphinx documentation for 0.3
  • ARROW-482 - [Java] Exposing custom field metadata
  • ARROW-532 - [Python] Expand pyarrow.parquet documentation for 0.3 release
  • ARROW-579 - Python: Provide redistributable pyarrow wheels on OSX
  • ARROW-596 - [Python] Add convenience function to convert pandas.DataFrame to pyarrow.Buffer containing a file or stream representation
  • ARROW-629 - [JS] Add unit test suite
  • ARROW-714 - [C++] Add import_pyarrow C API in the style of NumPy for thirdparty C++ users
  • ARROW-819 - Public Cython and C++ API in the style of lxml, arrow::py::import_pyarrow method
  • ARROW-872 - [JS] Read streaming format
  • ARROW-873 - [JS] Implement fixed width list type
  • ARROW-874 - [JS] Read dictionary-encoded vectors
  • ARROW-881 - [Python] Reconstruct Pandas DataFrame indexes using metadata
  • ARROW-891 - [Python] Expand Windows build instructions to not require looking at separate C++ docs
  • ARROW-899 - [Doc] Add 0.3.0 changelog
  • ARROW-901 - [Python] Add Parquet unit test for fixed size binary
  • ARROW-913 - [Python] Only link jemalloc to the Cython extension where it's needed
  • ARROW-923 - Changelog generation Python script, add 0.1.0 and 0.2.0 changelog
  • ARROW-929 - Remove KEYS file from git
  • ARROW-943 - [GLib] Support running unit tests with source archive
  • ARROW-945 - [GLib] Add a Lua example to show Torch integration
  • ARROW-946 - [GLib] Use “new” instead of “open” for constructor name
  • ARROW-947 - [Python] Improve execution time of manylinux1 build
  • ARROW-953 - Use conda-forge cmake, curl in CI toolchain
  • ARROW-954 - Flag for compiling Arrow with header-only boost
  • ARROW-956 - [Python] compat with pandas >= 0.20.0
  • ARROW-957 - [Doc] Add HDFS and Windows documents to doxygen output
  • ARROW-961 - [Python] Rename InMemoryOutputStream to BufferOutputStream
  • ARROW-963 - [GLib] Add equal
  • ARROW-967 - [GLib] Support initializing array with buffer
  • ARROW-970 - [Python] Nicer experience if user accidentally calls pyarrow.Table ctor directly
  • ARROW-977 - [java] Add Timezone aware timestamp vectors
  • ARROW-980 - Fix detection of “msvc” COMPILER_FAMILY
  • ARROW-982 - [Website] Improve website front copy to highlight serialization efficiency benefits
  • ARROW-984 - [GLib] Add Go examples
  • ARROW-985 - [GLib] Update package information
  • ARROW-988 - [JS] Add entry to Travis CI matrix
  • ARROW-993 - [GLib] Add missing error checks in Go examples
  • ARROW-996 - [Website] Add 0.3.0 release announcement in Japanese
  • ARROW-997 - [Java] Implementing transferPair for FixedSizeListVector
  • ARROW-1000 - [GLib] Move install document to Website
  • ARROW-1001 - [GLib] Unify writer files
  • ARROW-1002 - [C++] Fix inconsistency with padding at start of IPC file format
  • ARROW-1008 - [C++] Add abstract stream writer and reader C++ APIs. Give clearer names to IPC reader/writer classes
  • ARROW-1010 - [Website] Provide for translations without repeating blog post in blogroll
  • ARROW-1011 - [FORMAT] fix typo and mistakes in Layout.md
  • ARROW-1014 - 0.4.0 release
  • ARROW-1015 - [Java] Schema-level metadata
  • ARROW-1016 - Python: Include C++ headers (optionally) in wheels
  • ARROW-1022 - [Python] Add multithreaded read option to read_feather
  • ARROW-1024 - Python: Update build time numpy version to 1.10.1
  • ARROW-1025 - [Website] Improved changelog for website, include git shortlog
  • ARROW-1027 - [Python] Allow negative indexing in fields/columns on pyarrow Table and Schema objects
  • ARROW-1028 - [Python] Fix IPC docs per API changes
  • ARROW-1029 - [Python] Fixes for building pyarrow with Parquet support on MSVC. Add to appveyor build
  • ARROW-1030 - Python: Account for library versioning in parquet-cpp
  • ARROW-1031 - [GLib] Support pretty print
  • ARROW-1037 - [GLib] Follow reader name change
  • ARROW-1038 - [GLib] Follow writer name change
  • ARROW-1040 - [GLib] Support tensor IO
  • ARROW-1044 - [GLib] Support Feather
  • ARROW-1126 - Python: Add function to convert NumPy/Pandas dtypes to Arrow DataTypes

Apache Arrow 0.3.0 (2017-05-05)

Bug Fixes

  • ARROW-109 - [C++] Add nesting stress tests up to 500 recursion depth
  • ARROW-208 - Add checkstyle policy to java project
  • ARROW-347 - Add method to pass CallBack when creating a transfer pair
  • ARROW-413 - DATE type is not specified clearly
  • ARROW-431 - [Python] Review GIL release and acquisition in to_pandas conversion
  • ARROW-443 - [Python] Support ingest of strided NumPy arrays from pandas
  • ARROW-451 - [C++] Implement DataType::Equals as TypeVisitor. Add default implementations for TypeVisitor, ArrayVisitor methods
  • ARROW-454 - pojo.Field doesn't implement hashCode()
  • ARROW-526 - [Format] Revise Format documents for evolution in IPC stream / file / tensor formats
  • ARROW-565 - [C++] Examine “Field::dictionary” member
  • ARROW-570 - Determine Java tools JAR location from project metadata
  • ARROW-584 - [C++] Fix compiler warnings exposed with -Wconversion
  • ARROW-586 - Problem with reading parquet files saved by Apache Spark
  • ARROW-588 - [C++] Fix some 32 bit compiler warnings
  • ARROW-595 - [Python] Set schema attribute on StreamReader
  • ARROW-604 - Python: boxed Field instances are missing the reference to their DataType
  • ARROW-611 - [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
  • ARROW-613 - WIP TypeScript Implementation
  • ARROW-617 - [Format] Add additional Time metadata and comments based on discussion in ARROW-617
  • ARROW-619 - [Python] Fixed remaining typo for LD_LIBRARY_PATH
  • ARROW-619 - Fix typos in setup.py args and LD_LIBRARY_PATH
  • ARROW-623 - Fix segfault in repr of empty field
  • ARROW-624 - [C++] Restore MakePrimitiveArray function, use in feather.cc
  • ARROW-627 - [C++] Add compatibility macros for exported extern templates
  • ARROW-628 - [Python] Install nomkl metapackage when building parquet-cpp in Travis CI
  • ARROW-630 - [C++] Create boolean batches for IPC testing, properly account for nonzero offset
  • ARROW-636 - [C++] Update README about Boost system requirement
  • ARROW-639 - [C++] Invalid offset in slices
  • ARROW-642 - [Java] Remove temporary file in java/tools
  • ARROW-644 - Python: Cython should be a setup-only requirement
  • ARROW-652 - Remove trailing f in merge script output
  • ARROW-654 - [C++] Serialize timezone in IPC metadata
  • ARROW-666 - [Python] Error in DictionaryArray __repr__
  • ARROW-667 - build of arrow-master/cpp fails with altivec error?
  • ARROW-668 - [Python] Box timestamp values as pandas.Timestamp if available, attach tzinfo
  • ARROW-671 - [GLib] Install missing license file
  • ARROW-673 - [Java] Support additional Time metadata
  • ARROW-677 - [java] Fix checkstyle jcl-over-slf4j conflict issue
  • ARROW-678 - [GLib] Fix dependencies
  • ARROW-680 - [C++] Support CMake 2 or older again
  • ARROW-682 - [Integration] Check implementations against themselves
  • ARROW-683 - [C++/Python] Refactor to make Date32 and Date64 types for new metadata. Test IPC roundtrip
  • ARROW-685 - [GLib] AX_CXX_COMPILE_STDCXX_11 error running ./configure
  • ARROW-686 - [C++] Account for time metadata changes, add Time32 and Time64 types
  • ARROW-689 - [GLib] Fix install directories
  • ARROW-691 - [Java] Encode dictionary type in message format
  • ARROW-697 - JAVA Throw exception for record batches > 2GB
  • ARROW-699 - [C++] Resolve Arrow and Arrow IPC build issues on Windows;
  • ARROW-702 - fix BitVector.copyFromSafe to reAllocate instead of returning false
  • ARROW-703 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-704 - Fix bad import caused by conflicting changes
  • ARROW-709 - [C++] Restore type comparator for DecimalType
  • ARROW-713 - [C++] Fix cmake linking issue in new IPC benchmark
  • ARROW-715 - [Python] Make pandas not a hard requirement, flake8 fixes
  • ARROW-716 - [Python] Update README build instructions after moving libpyarrow to C++ tree
  • ARROW-720 - arrow should not have a dependency on slf4j bridges in com…
  • ARROW-723 - [Python] Ensure that passing chunk_size=0 when writing Parquet file does not enter infinite loop
  • ARROW-726 - [C++] Fix segfault caused when passing non-buffer object to arrow::py::PyBuffer
  • ARROW-732 - [C++] Schema comparison bugs in struct and union types
  • ARROW-736 - [Python] Mixed-type object DataFrame columns should not silently co…
  • ARROW-738 - Fix manylinux1 build
  • ARROW-739 - Don't install jemalloc in parallel
  • ARROW-740 - FileReader fails for large objects
  • ARROW-747 - [C++] Calling add_dependencies with dl causes spurious CMake warning
  • ARROW-749 - [Python] Delete partially-written Feather file when column write fails
  • ARROW-753 - [Python] Fix linker error for python-test on OS X
  • ARROW-756 - [C++] MSVC build fixes and cleanup, remove -fPIC flag from EP builds on Windows, Dev docs
  • ARROW-757 - [C++] MSVC build fails on googletest when using NMake
  • ARROW-762 - [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS
  • ARROW-776 - [GLib] Fix wrong type name
  • ARROW-777 - restore getObject behavior on Date and Time
  • ARROW-778 - Port merge tool to work on Windows
  • ARROW-780 - PYTHON_EXECUTABLE Required to be set during build
  • ARROW-781 - [C++/Python] Increase reference count of the numpy base array?
  • ARROW-783 - [Java/C++] Fixes for 0-length record batches
  • ARROW-787 - [GLib] Fix compilation error caused by introducing BooleanBuilder::Append overload
  • ARROW-789 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-793 - [GLib] Fix indent
  • ARROW-794 - [C++/Python] Disallow strided tensors in ipc::WriteTensor
  • ARROW-796 - [Java] Checkstyle additions causing build failure in some environments
  • ARROW-797 - [Python] Make more explicitly curated public API page, sphinx cleanup
  • ARROW-800 - [C++] Boost headers being transitively included in pyarrow
  • ARROW-805 - [C++] Don't throw IOError when listing empty HDFS dir
  • ARROW-809 - [C++] Do not write excess bytes in IPC writer after slicing arrays
  • ARROW-812 - Pip install pyarrow on mac failed.
  • ARROW-817 - [Python] Fix comment in date32 conversion
  • ARROW-821 - [Python] Extra file _table_api.h generated during Python build process
  • ARROW-822 - [Python] StreamWriter Wrapper for Socket and File-like Objects without tell()
  • ARROW-826 - [C++/Python] Fix compilation error on Mac with -DARROW_PYTHON=on
  • ARROW-829 - Don't deactivate Parquet dictionary encoding on column-wis…
  • ARROW-830 - [Python] Expose jemalloc memory pool and other memory pool functions in public pyarrow API
  • ARROW-836 - add test for pandas conversion of timedelta, currently unimplemented
  • ARROW-839 - [Python] Use mktime variant that is reliable on MSVC
  • ARROW-847 - Specify BUILD_BYPRODUCTS for gtest
  • ARROW-852 - Also search for ARROW libs when pkg-config provided the path
  • ARROW-853 - [Python] Only set RPATH when bundling the shared libraries
  • ARROW-858 - Remove boost_regex from arrow dependencies
  • ARROW-866 - [Python] Be robust to PyErr_Fetch returning a null exc value
  • ARROW-867 - [Python] pyarrow MSVC fixes
  • ARROW-875 - Avoid setting an extra empty in fillEmpties()
  • ARROW-879 - compat with pandas v0.20.0
  • ARROW-882 - [C++] Rename statically built library on Windows to avoid …
  • ARROW-883 - [JAVA] Introduction of new types has shifted Enumerations
  • ARROW-885 - [Python/C++] Decimal test failure on MSVC
  • ARROW-886 - [Java] Fixing reallocation of VariableLengthVector offsets
  • ARROW-887 - add default value to units for backward compatibility
  • ARROW-888 - Transfer ownership of buffer in BitVector transferTo()
  • ARROW-895 - Fix lastSet in fillEmpties() and copyFrom()
  • ARROW-900 - [Python] Fix UnboundLocalError in ParquetDatasetPiece.read
  • ARROW-903 - [GLib] Remove a needless “.”
  • ARROW-914 - [C++/Python] Fix Decimal ToBytes
  • ARROW-922 - Allow Flatbuffers and RapidJSON to be used locally on Windows
  • ARROW-927 - C++/Python: Add manylinux1 builds to Travis matrix
  • ARROW-928 - [C++] Detect supported MSVC versions
  • ARROW-933 - [Python] Remove debug print statement
  • ARROW-934 - [GLib] Glib sources missing from result of 02-source.sh
  • ARROW-936 - add missing file; revert tag change
  • ARROW-936 - fix release README
  • ARROW-938 - Fix Rat license warnings

New Features and Improvements

  • ARROW-6 - Hope to add development document
  • ARROW-39 - C++: Logical chunked arrays / columns: conforming to fixed chunk sizes
  • ARROW-52 - Set up project blog
  • ARROW-95 - Add Jekyll-based website publishing toolchain, migrate existing arrow-site
  • ARROW-98 - Java: API documentation
  • ARROW-99 - C++: Explore if RapidCheck may be helpful for testing / worth adding to toolchain
  • ARROW-183 - C++: Add storage type to DecimalType
  • ARROW-231 - [C++] Add typed Resize to PoolBuffer
  • ARROW-281 - [C++] IPC/RPC support on Win32 platforms
  • ARROW-316 - [Format] Changes to Date metadata format per discussion in ARROW-316
  • ARROW-341 - [Python] Move pyarrow's C++ code to the main C++ source tree, install libarrow_python and headers
  • ARROW-452 - [C++/Python] Incorporate C++ and Python codebases for Feather file format
  • ARROW-459 - [C++] Dictionary IPC support in file and stream formats
  • ARROW-483 - [C++/Python] Provide access to “custom_metadata” Field attribute in IPC setting
  • ARROW-491 - [Format / C++] Add FixedWidthBinary type to format, C++ implementation
  • ARROW-492 - [C++] Add arrow/arrow.h public API
  • ARROW-493 - [C++] Permit large (length > INT32_MAX) arrays in memory
  • ARROW-502 - [C++/Python] Logging memory pool
  • ARROW-510 - ARROW-582 ARROW-663 ARROW-729: [Java] Added units for Time and Date types, and integration tests
  • ARROW-518 - C++: Make Status::OK method constexpr
  • ARROW-520 - [C++] STL-compliant allocator
  • ARROW-528 - [Python] Utilize improved Parquet writer C++ API, add write_metadata function, test _metadata files
  • ARROW-534 - [C++] Add IPC tests for date/time after ARROW-452, fix bugs
  • ARROW-539 - [Python] Add support for reading partitioned Parquet files with Hive-like directory schemes
  • ARROW-542 - Adding dictionary encoding to FileWriter
  • ARROW-550 - [Format] Draft experimental Tensor flatbuffer message type
  • ARROW-552 - [Python] Implement getitem for DictionaryArray by returning a value from the dictionary
  • ARROW-557 - [Python] Add option to explicitly opt in to HDFS tests, do not implicitly skip
  • ARROW-563 - Support non-standard gcc version strings
  • ARROW-566 - Bundle Arrow libraries in Python package
  • ARROW-568 - [C++] Add default implementations for TypeVisitor, ArrayVisitor methods that return NotImplemented
  • ARROW-569 - [C++] Set version for *.pc
  • ARROW-574 - Python: Add support for nested Python lists in Pandas conversion
  • ARROW-576 - [C++] Complete file/stream implementation for union types
  • ARROW-577 - [C++] Use private implementation pattern in ipc::StreamWriter and ipc::FileWriter
  • ARROW-578 - [C++] Add -DARROW_CXXFLAGS=... option to make CMake more consistent
  • ARROW-580 - C++: Also provide jemalloc_X targets if only a static or shared version is found
  • ARROW-582 - [Java] Added JSON reader/writer unit test for date, time, and timestamp
  • ARROW-589 - C++: Use system provided shared jemalloc if static is unavailable
  • ARROW-591 - [C++] Add round trip testing fixture for JSON format
  • ARROW-593 - [C++] Rename ReadableFileInterface to RandomAccessFile
  • ARROW-598 - [Python] Add support for converting pyarrow.Buffer to a memoryview with zero copy
  • ARROW-603 - [C++] Add RecordBatch::Validate method, call in RecordBatch ctor in debug builds
  • ARROW-605 - [C++] Refactor IPC adapter code into generic ArrayLoader class. Add Date32Type
  • ARROW-606 - [C++] upgrade flatbuffers version to 1.6.0
  • ARROW-608 - [Format] Days since epoch date type
  • ARROW-610 - [C++] Win32 compatibility in file.cc
  • ARROW-612 - [Java] Added not null to Field.toString output
  • ARROW-615 - [Java] Moved ByteArrayReadableSeekableByteChannel to src main o.a.a.vector.util
  • ARROW-616 - [C++] Do not include debug symbols in release builds by default
  • ARROW-618 - [Python/C++] Support timestamp+timezone conversion to pandas
  • ARROW-620 - [C++] Implement JSON integration test support for date, time, timestamp, fixed width binary
  • ARROW-621 - [C++] Start IPC benchmark suite for record batches, implement “inline” visitor. Code reorg
  • ARROW-625 - [C++] Add TimeUnit to TimeType::ToString. Add timezone to TimestampType::ToString if present
  • ARROW-626 - [Python] Replace PyBytesBuffer with zero-copy, memoryview-based PyBuffer
  • ARROW-631 - [GLib] Import
  • ARROW-632 - [Python] Add support for FixedWidthBinary type
  • ARROW-635 - [C++] Add JSON read/write support for FixedWidthBinary
  • ARROW-637 - [Format] Add timezone to Timestamp metadata, comments describing the semantics
  • ARROW-646 - [Python] Conda s3 robustness, set CONDA_PKGS_DIR env variable and add Travis CI caching
  • ARROW-647 - [C++] Use Boost shared libraries for tests and utilities
  • ARROW-648 - [C++] Support multiarch on Debian
  • ARROW-650 - [GLib] Follow ReadableFileInterface -> RandomAccessFile change
  • ARROW-651 - [C++] Set version to shared library
  • ARROW-655 - [C++/Python] Implement DecimalArray
  • ARROW-656 - [C++] Add random access writer for a mutable buffer. Rename WriteableFileInterface to WriteableFile for better consistency
  • ARROW-657 - [C++/Python] Expose Tensor IPC in Python. Add equals method. Add pyarrow.create_memory_map/memory_map functions
  • ARROW-658 - [C++] Implement a prototype in-memory arrow::Tensor type
  • ARROW-659 - [C++] Add multithreaded memcpy implementation
  • ARROW-660 - [C++] Restore function that can read a complete encapsulated record batch message
  • ARROW-661 - [C++] Add LargeRecordBatch metadata type, IPC support, associated refactoring
  • ARROW-662 - [Format] Move Schema flatbuffers into their own file that can be included
  • ARROW-663 - [Java] Support additional Time metadata + vector value accessors
  • ARROW-664 - [C++] Make C++ Arrow serialization deterministic
  • ARROW-669 - [Python] Attach proper tzinfo when computing boxed scalars for TimestampArray
  • ARROW-670 - Arrow 0.3 release
  • ARROW-672 - [Format] Add MetadataVersion::V3 for Arrow 0.3
  • ARROW-674 - [Java] Support additional Timestamp timezone metadata
  • ARROW-675 - [GLib] Update package metadata
  • ARROW-676 - move from MinorType to FieldType in ValueVectors to carry all the relevant type bits
  • ARROW-679 - [Format] Change FieldNode, RecordBatch lengths to long, remove LargeRecordBatch. Refactoring
  • ARROW-681 - [C++] Disable boost's autolinking if shared boost is used …
  • ARROW-684 - [Python] More helpful error message if libparquet_arrow not built
  • ARROW-687 - [C++] Build and run full test suite in Appveyor
  • ARROW-688 - [C++] Use CMAKE_INSTALL_INCLUDEDIR for consistency
  • ARROW-690 - Only send JIRA updates to issues@arrow.apache.org
  • ARROW-698 - Add flag to FileWriter::WriteRecordBatch for writing record batches with lengths over INT32_MAX
  • ARROW-700 - Add headroom interface for allocator
  • ARROW-701 - [Java] Support Additional Date Type Metadata
  • ARROW-706 - [GLib] Add package install document
  • ARROW-707 - [Python] Return NullArray for array of all None in Array.from_pandas. Revert from_numpy -> from_pandas
  • ARROW-708 - [C++] Simplify metadata APIs to all use the Message class, perf analysis
  • ARROW-710 - [Python] Read/write with file-like Python objects from read_feather/write_feather
  • ARROW-711 - [C++] Remove extern template declarations for NumericArray<T> types
  • ARROW-712 - [C++] Reimplement Array::Accept as inline visitor
  • ARROW-717 - [C++] Implement IPC zero-copy round trip for tensors
  • ARROW-718 - [Python] Implement pyarrow.Tensor container, zero-copy NumPy roundtrips
  • ARROW-719 - [GLib] Release source archive
  • ARROW-722 - [Python] Support additional date/time types and metadata, conversion to/from NumPy and pandas.DataFrame
  • ARROW-724 - Add How to Contribute section to README
  • ARROW-725 - [Formats/Java] FixedSizeList message and java implementation
  • ARROW-727 - [Python] Ensure that NativeFile.write accepts any bytes, unicode, or object providing buffer protocol. Rename build_arrow_buffer to pyarrow.frombuffer
  • ARROW-728 - [C++/Python] Add Table::RemoveColumn method, remove name member, some other code cleaning
  • ARROW-729 - [Java] Add vector type for 32-bit date as days since UNIX epoch
  • ARROW-731 - [C++] Add shared library related versions to .pc
  • ARROW-733 - [C++/Python] Rename FixedWidthBinary to FixedSizeBinary for consistency with FixedSizeList
  • ARROW-734 - [C++/Python] Support building PyArrow on MSVC
  • ARROW-735 - [C++] Developer instruction document for MSVC on Windows
  • ARROW-737 - [C++] Enable mutable buffer slices, SliceMutableBuffer function
  • ARROW-741 - [Python] Switch Travis CI to use Python 3.6 instead of 3.5
  • ARROW-743 - [C++] Consolidate all but decimal array tests into array-test, collect some tests in type-test.cc
  • ARROW-744 - [GLib] Re-add an assertion for garrow_table_new() test
  • ARROW-745 - [C++] Allow use of system cpplint
  • ARROW-746 - [GLib] Add garrow_array_get_data_type()
  • ARROW-748 - [Python] Pin runtime library versions in conda-forge packages to force upgrades
  • ARROW-751 - [Python] Make all Cython modules private. Some code tidying
  • ARROW-752 - [Python] Support boxed Arrow arrays as input to DictionaryArray.from_arrays
  • ARROW-754 - [GLib] Add garrow_array_is_null()
  • ARROW-755 - [GLib] Add garrow_array_get_value_type()
  • ARROW-758 - [C++] Build with /WX in Appveyor, fix MSVC compiler warnings
  • ARROW-761 - [C++/Python] Add GetTensorSize method, Python bindings
  • ARROW-763 - C++: Use python-config to find libpythonX.X.dylib
  • ARROW-765 - [Python] Add more natural Exception type hierarchy for thirdparty users
  • ARROW-768 - [Java] Change the “boxed” object representation of date and time types
  • ARROW-769 - [GLib] Support building without installed Arrow C++
  • ARROW-770 - [C++] Move .clang* files back into cpp source tree
  • ARROW-771 - [Python] Add read_row_group / num_row_groups to ParquetFile
  • ARROW-773 - [CPP] Add Table::AddColumn API
  • ARROW-774 - [GLib] Remove needless LICENSE.txt copy
  • ARROW-775 - add simple constructors to value vectors
  • ARROW-779 - [C++] Check for old metadata and raise exception if found
  • ARROW-782 - [C++] API cleanup, change public member access in DataType classes to functions, use class instead of struct
  • ARROW-788 - [C++] Align WriteTensor message
  • ARROW-795 - [C++] Consolidate arrow/arrow_io/arrow_ipc into a single shared and static library
  • ARROW-798 - [Docs] Publish Format Markdown documents somehow on arrow.apache.org
  • ARROW-802 - [GLib] Add read examples
  • ARROW-803 - [GLib] Update package repository URL
  • ARROW-804 - [GLib] Update build document
  • ARROW-806 - [GLib] Support add/remove a column from table
  • ARROW-807 - [GLib] Update “Since” tag
  • ARROW-808 - [GLib] Remove needless ignore entries
  • ARROW-810 - [GLib] Remove io/ipc prefix
  • ARROW-811 - [GLib] Add GArrowBuffer
  • ARROW-815 - [Java] Exposing reAlloc for ValueVector
  • ARROW-816 - [C++] Travis CI script cleanup, add C++ toolchain env with Flatbuffers, RapidJSON
  • ARROW-818 - [Python] Expand Sphinx API docs, pyarrow.* namespace. Add factory functions for time32, time64
  • ARROW-820 - [C++] Build dependencies for Parquet library without arrow…
  • ARROW-825 - [Python] Rename pyarrow.from_pylist to pyarrow.array, test on tuples
  • ARROW-827 - [Python] Miscellaneous improvements to help with Dask support
  • ARROW-828 - [C++] Add new dependency to README
  • ARROW-831 - Switch from boost::regex to std::regex
  • ARROW-832 - [C++] Update to gtest 1.8.0, remove now unneeded test_main.cc
  • ARROW-833 - [Python] Add Developer quickstart for conda users
  • ARROW-841 - [Python] Add pyarrow build to Appveyor
  • ARROW-844 - [Format] Update README documents in format/
  • ARROW-845 - [Python] Sync changes from PARQUET-955; explicit ARROW_HOME will override pkgconfig
  • ARROW-846 - [GLib] Add GArrowTensor, GArrowInt8Tensor and GArrowUInt8Tensor
  • ARROW-848 - [Python] Another pass on conda dev guide
  • ARROW-849 - [C++] Support setting production build dependencies with ARROW_BUILD_TOOLCHAIN
  • ARROW-857 - [Python] Automate publishing Python documentation to arrow-site
  • ARROW-859 - [C++] Do not build unit tests by default?
  • ARROW-860 - [C++] Remove typed Tensor containers
  • ARROW-861 - [Python] Move DEVELOPMENT.md to Sphinx docs
  • ARROW-862 - [Python] Simplify README landing documentation to direct users and developers toward the documentation
  • ARROW-863 - [GLib] Use GBytes to implement zero-copy
  • ARROW-864 - [GLib] Unify Array files
  • ARROW-865 - [Python] Add unit tests validating Parquet date/time type roundtrips
  • ARROW-868 - [GLib] Use GBytes to reduce copy
  • ARROW-869 - [JS] Rename directory to js/
  • ARROW-871 - [GLib] Unify DataType files
  • ARROW-876 - [GLib] Unify ArrayBuilder files
  • ARROW-877 - [GLib] Add garrow_array_get_null_bitmap()
  • ARROW-878 - [GLib] Add garrow_binary_array_get_buffer()
  • ARROW-880 - [GLib] Support getting raw data of primitive arrays
  • ARROW-890 - [GLib] Add GArrowMutableBuffer
  • ARROW-892 - [GLib] Fix GArrowTensor document
  • ARROW-893 - Add GLib document to Web site
  • ARROW-894 - [GLib] Add GArrowResizableBuffer and GArrowPoolBuffer
  • ARROW-896 - Support Jupyter Notebook in Web site
  • ARROW-898 - [C++/Python] Use shared_ptr to avoid copying KeyValueMetadata, add to Field type also
  • ARROW-904 - [GLib] Simplify error check codes
  • ARROW-907 - C++: Construct Table from schema and arrays
  • ARROW-908 - [GLib] Unify OutputStream files
  • ARROW-910 - [C++] Write 0 length at EOS in StreamWriter
  • ARROW-916 - [GLib] Add GArrowBufferOutputStream
  • ARROW-917 - [GLib] Add GArrowBufferReader
  • ARROW-918 - [GLib] Use GArrowBuffer for read buffer
  • ARROW-919 - [GLib] Use “id” to get type enum value from GArrowDataType
  • ARROW-920 - [GLib] Add Lua examples
  • ARROW-925 - [GLib] Fix GArrowBufferReader test
  • ARROW-926 - Add wesm to KEYS
  • ARROW-930 - javadoc generation fails with java 8
  • ARROW-931 - [GLib] Reconstruct input stream
  • ARROW-965 - Website updates for 0.3.0 release

Apache Arrow 0.2.0 (2017-02-18)

Bug Fixes

  • ARROW-112 - Changed constexprs to kValue naming.
  • ARROW-202 - Integrate with appveyor ci for windows
  • ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility
  • ARROW-224 - [C++] Address static linking of boost dependencies
  • ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io)
  • ARROW-239 - Test reading remainder of file in HDFS with read() with no args
  • ARROW-261 - Refactor String/Binary code paths to reflect unnested (non-list-based) structure
  • ARROW-273 - Lists use unsigned offset vectors instead of signed (as defined in the spec)
  • ARROW-275 - Add tests for UnionVector in Arrow File
  • ARROW-294 - [C++] Do not use platform-dependent fopen/fclose functions for MemoryMappedFile
  • ARROW-322 - [C++] Remove ARROW_HDFS option, always build the module
  • ARROW-323 - [Python] Opt-in to pyarrow.parquet extension rather than attempting and failing silently
  • ARROW-334 - [Python] Remove INSTALL_RPATH_USE_LINK_PATH
  • ARROW-337 - UnionListWriter.list() is doing more than it should, this …
  • ARROW-339 - [Dev] Lingering Python 3 fixes
  • ARROW-339 - Python 3 compatibility in merge_arrow_pr.py
  • ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero
  • ARROW-342 - Set Python version on release
  • ARROW-345 - libhdfs integration doesn't work for Mac
  • ARROW-346 - Use conda environment to build API docs
  • ARROW-348 - [Python] Add build-type command line option to setup.py, build CMake extensions in a build type subdirectory
  • ARROW-349 - Add six as a requirement
  • ARROW-351 - Time type has no unit
  • ARROW-354 - Fix comparison of arrays of empty strings
  • ARROW-357 - Use a single RowGroup for Parquet files as default.
  • ARROW-358 - Add explicit environment variable to locate libhdfs in one's environment
  • ARROW-362 - Fix memory leak in zero-copy arrow to NumPy/pandas conversion
  • ARROW-371 - Handle pandas-nullable types correctly
  • ARROW-375 - Fix unicode Python 3 issue in columns argument of parquet.read_table
  • ARROW-384 - Align Java and C++ RecordBatch data and metadata layout
  • ARROW-386 - [Java] Respect case of struct / map field names
  • ARROW-387 - [C++] Verify zero-copy Buffer slices from BufferReader retain reference to parent Buffer
  • ARROW-390 - Only specify dependencies for json-integration-test on ARROW_BUILD_TESTS=ON
  • ARROW-392 - [C++/Java] String IPC integration testing / fixes. Add array / record batch pretty-printing
  • ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector
  • ARROW-395 - Arrow file format writes record batches in reverse order.
  • ARROW-398 - Java file format requires bitmaps of all 1's to be written…
  • ARROW-399 - ListVector.loadFieldBuffers ignores the ArrowFieldNode len…
  • ARROW-400 - set struct length on load
  • ARROW-401 - Floating point vectors should do an approximate comparison…
  • ARROW-402 - Fix reference counting issue with empty buffers. Close #232
  • ARROW-403 - [Java] Create transfer pairs for internal vectors in UnionVector transfer impl
  • ARROW-404 - [Python] Fix segfault caused by HdfsClient getting closed before an HdfsFile
  • ARROW-405 - Use vendored hdfs.h if not found in include/ in $HADOOP_HOME
  • ARROW-406 - [C++] Set explicit 64K HDFS buffer size, test large reads
  • ARROW-408 - Remove defunct conda recipes
  • ARROW-414 - [Java] “Buffer too large to resize to ...” error
  • ARROW-420 - Align DATE type with Java implementation
  • ARROW-421 - [Python] Retain parent reference in PyBytesReader
  • ARROW-422 - IPC should depend on rapidjson_ep if RapidJSON is vendored
  • ARROW-429 - Revert ARROW-379 until git-archive issues are resolved
  • ARROW-433 - Correctly handle Arrow to Python date conversion for timezones west of London
  • ARROW-434 - [Python] Correctly handle Python file objects in Parquet read/write paths
  • ARROW-435 - Fix spelling of RAPIDJSON_VENDORED
  • ARROW-437 - [C++] Fix clang compiler warning
  • ARROW-445 - arrow_ipc_objlib depends on Flatbuffer generated files
  • ARROW-447 - Always return unicode objects for UTF-8 strings
  • ARROW-455 - [C++] Add dtor to BufferOutputStream that calls Close()
  • ARROW-469 - C++: Add option so that resize doesn't decrease the capacity
  • ARROW-481 - [Python] Fix 2.7 regression in Parquet path to open file code path
  • ARROW-486 - [C++] Use virtual inheritance for diamond inheritance
  • ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails
  • ARROW-494 - [C++] Extend lifetime of memory mapped data if any buffers reference it
  • ARROW-499 - Update file serialization to use the streaming serialization format.
  • ARROW-505 - [C++] Fix compiler warning in gcc in release mode
  • ARROW-511 - Python: Implement List conversions for single arrays
  • ARROW-513 - [C++] Fixing Appveyor / MSVC build
  • ARROW-516 - Building pyarrow with parquet
  • ARROW-519 - [C++] Refactor array comparison code into a compare.h / compare.cc in part to resolve Xcode 6.1 linker issue
  • ARROW-523 - Python: Account for changes in PARQUET-834
  • ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor
  • ARROW-535 - [Python] Add type mapping for NPY_LONGLONG
  • ARROW-537 - [C++] Do not compare String/Binary data in null slots when comparing arrays
  • ARROW-540 - [C++] Build fixes after ARROW-33, PARQUET-866
  • ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries
  • ARROW-544 - [C++] Test writing zero-length record batches, zero-length BinaryArray fixes
  • ARROW-545 - [Python] Ignore non .parq/.parquet files when reading directories as Parquet datasets
  • ARROW-548 - [Python] Add nthreads to Filesystem.read_parquet and pass through
  • ARROW-551 - C++: Construction of Column with nullptr Array segfaults
  • ARROW-556 - [Integration] Configure C++ integration test executable with a single environment variable. Update README
  • ARROW-561 - [JAVA][PYTHON] Update java & python dependencies to improve downstream packaging experience
  • ARROW-562 - Mockito should be in test scope

New Features and Improvements

  • ARROW-33 - [C++] Implement zero-copy array slicing, integrate with IPC code paths
  • ARROW-81 - [Format] Augment dictionary encoding metadata to accommodate additional use cases
  • ARROW-96 - Add C++ API documentation
  • ARROW-97 - API documentation via sphinx-apidoc
  • ARROW-108 - [C++] Add Union implementation and IPC/JSON serialization tests
  • ARROW-189 - Build 3rd party with ExternalProject.
  • ARROW-191 - Python: Provide infrastructure for manylinux1 wheels
  • ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types
  • ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet
  • ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
  • ARROW-240 - Installation instructions for pyarrow
  • ARROW-243 - [C++] Add option to switch between libhdfs and libhdfs3 when creating HdfsClient
  • ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC
  • ARROW-303 - [C++] Also build static libraries for leaf libraries
  • ARROW-312 - [Java] IPC file round trip tool for integration testing
  • ARROW-312 - Read and write Arrow IPC file format from Python
  • ARROW-317 - Add Slice, Copy methods to Buffer
  • ARROW-327 - [Python] Remove conda builds from Travis CI setup
  • ARROW-328 - Return shared_ptr by value instead of const-ref
  • ARROW-330 - CMake functions to simplify shared / static library configuration
  • ARROW-332 - Add RecordBatch.to_pandas method
  • ARROW-333 - Make writers update their internal schema even when no data is written
  • ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better
  • ARROW-336 - Run Apache Rat in Travis builds
  • ARROW-338 - Implement visitor pattern for IPC loading/unloading
  • ARROW-344 - Instructions for building with conda
  • ARROW-350 - Added Kerberos to HDFS client
  • ARROW-353 - Arrow release 0.2
  • ARROW-355 - Add tests for serialising arrays of empty strings to Parquet
  • ARROW-356 - Add documentation about reading Parquet
  • ARROW-359 - Document ARROW_LIBHDFS_DIR
  • ARROW-360 - C++: Add method to shrink PoolBuffer using realloc
  • ARROW-361 - Python: Support reading a column-selection from Parquet files
  • ARROW-363 - [Java/C++] integration testing harness, initial integration tests
  • ARROW-365 - Python: Provide Array.to_pandas()
  • ARROW-366 - Java Dictionary Vector
  • ARROW-367 - converter json <=> Arrow file format for Integration tests
  • ARROW-368 - Added note for LD_LIBRARY_PATH in Python README
  • ARROW-369 - [Python] Convert multiple record batches at once to Pandas
  • ARROW-370 - Python: Pandas conversion from `datetime.date` columns
  • ARROW-372 - json vector serialization format
  • ARROW-373 - [C++] JSON serialization format for testing
  • ARROW-374 - More precise handling of bytes vs unicode in Python API
  • ARROW-377 - Python: Add support for conversion of Pandas.Categorical
  • ARROW-379 - Use setuptools_scm for Python versioning
  • ARROW-380 - [Java] optimize null count when serializing vectors
  • ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton
  • ARROW-382 - Extend Python API documentation
  • ARROW-383 - [C++] Integration testing CLI tool
  • ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects
  • ARROW-394 - [Integration] Generate tests cases for numeric types, strings, lists, structs
  • ARROW-396 - [Python] Add pyarrow.schema.Schema.equals
  • ARROW-409 - [Python] Change record batches conversion to Table
  • ARROW-410 - [C++] Add virtual Writeable::Flush
  • ARROW-411 - [Java] Move compactor functions in Integration to a separate Validator module
  • ARROW-415 - C++: Add Equals implementation to compare Tables
  • ARROW-416 - C++: Add Equals implementation to compare Columns
  • ARROW-417 - Add Equals implementation to compare ChunkedArrays
  • ARROW-418 - [C++] Array / Builder class code reorganization, flattening
  • ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory
  • ARROW-423 - Define BUILD_BYPRODUCTS for CMake 3.2+
  • ARROW-425 - Add private API to get python Table from a C++ object
  • ARROW-426 - Python: Conversion from pyarrow.Array to a Python list
  • ARROW-427 - [C++] Implement dictionary array type
  • ARROW-428 - [Python] Multithreaded conversion from Arrow table to pandas.DataFrame
  • ARROW-430 - Improved version handling
  • ARROW-432 - [Python] Construct precise pandas BlockManager structure for zero-copy DataFrame initialization
  • ARROW-438 - [C++/Python] Implement zero-data-copy record batch and table concatenation.
  • ARROW-440 - [C++] Support pkg-config
  • ARROW-441 - [Python] Expose Arrow's file and memory map classes as NativeFile subclasses
  • ARROW-442 - [Python] Inspect Parquet file metadata from Python
  • ARROW-444 - [Python] Native file reads into pre-allocated memory. Some IO API cleanup / niceness
  • ARROW-449 - Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict
  • ARROW-450 - Fixes for PARQUET-818
  • ARROW-456 - Add jemalloc based MemoryPool
  • ARROW-457 - Python: Better control over memory pool
  • ARROW-458 - [Python] Expose jemalloc MemoryPool
  • ARROW-461 - [Python] Add Python interfaces to DictionaryArray data, pandas interop
  • ARROW-463 - C++: Support jemalloc 4.x
  • ARROW-466 - Add ExternalProject for jemalloc
  • ARROW-467 - [Python] Run Python parquet-cpp unit tests in Travis CI
  • ARROW-468 - Python: Conversion of nested data in pd.DataFrames
  • ARROW-470 - [Python] Add “FileSystem” abstraction to access directories of files in a uniform way
  • ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata
  • ARROW-472 - [Python] Expose more C++ IO interfaces. Add equals methods to Parquet schemas. Pass Parquet metadata separately in reader
  • ARROW-474 - [Java] Add initial version of streaming serialized format.
  • ARROW-475 - [Python] Add support for reading multiple Parquet files as a single pyarrow.Table
  • ARROW-476 - Add binary integration test fixture, add Java support
  • ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer
  • ARROW-478 - Consolidate BytesReader and BufferReader to accept PyBytes or Buffer
  • ARROW-479 - Python: Test for expected schema in Pandas conversion
  • ARROW-484 - Revise README to include more detail about software components
  • ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe
  • ARROW-490 - Python: Update manylinux1 build scripts
  • ARROW-495 - [C++] Implement streaming binary format, refactoring
  • ARROW-497 - Integration harness for streaming file format
  • ARROW-498 - [C++] Add command line utilities that convert between stream and file.
  • ARROW-503 - [Python] Implement Python interface to streaming file format
  • ARROW-506 - Java: Implement echo server for integration testing.
  • ARROW-508 - [C++] Add basic threadsafety to normal files and memory maps
  • ARROW-509 - [Python] Add support for multithreaded Parquet reads
  • ARROW-512 - C++: Add method to check for primitive types
  • ARROW-514 - [Python] Automatically wrap pyarrow.io.Buffer in BufferReader
  • ARROW-515 - [Python] Add read_all methods to FileReader, StreamReader
  • ARROW-521 - [C++] Track peak allocations in default memory pool
  • ARROW-524 - provide apis to access nested vectors and buffers
  • ARROW-525 - Python: Add more documentation to the package
  • ARROW-527 - Remove drill-module.conf file
  • ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build
  • ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved
  • ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds
  • ARROW-546 - Python: Account for changes in PARQUET-867
  • ARROW-547 - [Python] Add zero-copy slice methods to Array, RecordBatch
  • ARROW-553 - C++: Faster valid bitmap building
  • ARROW-558 - Add KEYS files

Apache Arrow 0.1.0 (2016-10-10)

New Features and Improvements

  • ARROW-1 - Initial Arrow Code Commit
  • ARROW-2 - Post Simple Website
  • ARROW-3 - This patch includes a WIP draft specification document for the physical Arrow memory layout produced over a series of discussions amongst the to-be Arrow committers during late 2015. There are also a few small PNG diagrams that illustrate some of the Arrow layout concepts.
  • ARROW-4 - This provides a partial C++11 implementation of the Apache Arrow data structures along with a cmake-based build system. The codebase generally follows Google C++ style guide, but more cleaning to be more conforming is needed. It uses googletest for unit testing.
  • ARROW-7 - Add barebones Python library build toolchain
  • ARROW-8 - Add .travis.yml and test script for Arrow C++. OS X build fixes
  • ARROW-9 - Rename some unchanged “Drill” to “Arrow” (follow-up)
  • ARROW-9 - Replace straggler references to Drill
  • ARROW-10 - Fix mismatch of javadoc names and method parameters
  • ARROW-11 - Mirror JIRA activity to dev@arrow.apache.org
  • ARROW-13 - Add PR merge tool from parquet-mr, suitably modified
  • ARROW-14 - Add JIRA components
  • ARROW-15 - Fix a naming typo for memory.AllocationManager.AllocationOutcome
  • ARROW-19 - Add an externalized MemoryPool interface for use in builder classes
  • ARROW-20 - Add null_count_ member to array containers, remove nullable_ member
  • ARROW-21 - Implement a simple in-memory Schema data structure
  • ARROW-22 - [C++] Convert flat Parquet schemas to Arrow schemas
  • ARROW-23 - Add a logical Column data structure
  • ARROW-24 - C++: Implement a logical Table container type
  • ARROW-26 - Add instructions for enabling Arrow C++ Parquet adapter build
  • ARROW-28 - Adding google's benchmark library to the toolchain
  • ARROW-30 - [Python] Routines for converting between arrow::Array/Table and pandas.DataFrame
  • ARROW-31 - Python: prototype user object model, add PyList conversion path with type inference
  • ARROW-35 - Add a short call-to-action in the top level README.md
  • ARROW-37 - [C++ / Python] Implement BooleanArray and BooleanBuilder. Handle Python built-in bool
  • ARROW-42 - Add Python tests to Travis CI build
  • ARROW-43 - Python: format array values in repr for interactive computing
  • ARROW-44 - Python: prototype object model for array slot values (“scalars”)
  • ARROW-48 - Python: Add Schema object wrapper
  • ARROW-49 - [Python] Add Column and Table wrapper interface
  • ARROW-50 - C++: Enable library builds for 3rd-party users without having to build thirdparty googletest
  • ARROW-53 - Python: Fix RPATH and add source installation instructions
  • ARROW-54 - [Python] Rename package to “pyarrow”
  • ARROW-56 - Format: Specify LSB bit ordering in bit arrays
  • ARROW-57 - Format: Draft data headers IDL for data interchange
  • ARROW-58 - Format: Draft type metadata (“schemas”) IDL
  • ARROW-59 - Python: Boolean data support for builtin data structures
  • ARROW-60 - [C++] Struct type builder API
  • ARROW-64 - Add zsh support to C++ build scripts
  • ARROW-66 - Maybe some missing steps in installation guide
  • ARROW-67 - C++ metadata flatbuffer serialization and data movement to memory maps
  • ARROW-68 - Better error handling for not fully setup systems
  • ARROW-70 - Adapt ‘lite’ DCHECK macros from Kudu as also used in Parquet
  • ARROW-71 - [C++] Add clang-tidy and clang-format to the tool chain.
  • ARROW-73 - Support older CMake versions
  • ARROW-76 - Revise format document to include null count, defer non-nullable arrays to the domain of metadata
  • ARROW-78 - C++: Add constructor for DecimalType
  • ARROW-79 - [Python] Add benchmarks
  • ARROW-82 - Initial IPC support for ListArray
  • ARROW-85 - memcmp can be avoided in Equal when comparing with the same …
  • ARROW-86 - [Python] Implement zero-copy Arrow-to-Pandas conversion
  • ARROW-87 - [C++] Add all four possible ways to encode Decimals in Parquet to schema conversion
  • ARROW-89 - [Python] Add benchmarks for Arrow<->Pandas conversion
  • ARROW-90 - [C++] Check for SIMD instruction set support
  • ARROW-91 - Basic Parquet read support
  • ARROW-92 - Arrow to Parquet Schema conversion
  • ARROW-100 - [C++] Computing RowBatch size
  • ARROW-101 - Fix java compiler warnings
  • ARROW-102 - travis-ci support for java project
  • ARROW-106 - [C++] Add IPC to binary/string types
  • ARROW-107 - [C++] Implement IPC for structs
  • ARROW-190 - Python: Provide installable sdist builds
  • ARROW-196 - [C++] Add conda dev recipe for libarrow and libarrow_parquet
  • ARROW-197 - Working first draft of a conda recipe for pyarrow
  • ARROW-199 - [C++] Refine third party dependency
  • ARROW-201 - [C++] Initial ParquetWriter implementation
  • ARROW-203 - Python: Basic filename based Parquet read/write
  • ARROW-204 - Add Travis CI builds that post conda artifacts for Linux and OS X
  • ARROW-206 - Expose a C++ api to compare ranges of slots between two arrays
  • ARROW-207 - Extend BufferAllocator interface to allow decorators around BufferAllocator
  • ARROW-212 - Change contract of PrimitiveArray to reflect its abstractness
  • ARROW-213 - Exposing static arrow build
  • ARROW-214 - C++: Add String support to Parquet I/O
  • ARROW-215 - Support other integer types and strings in Parquet I/O
  • ARROW-218 - Add optional API token authentication option to PR merge tool
  • ARROW-222 - Prototyping an IO interface for Arrow, with initial HDFS target
  • ARROW-233 - Add visibility macros, add static build option
  • ARROW-234 - Build libhdfs IO extension in conda artifacts
  • ARROW-236 - Bridging IO interfaces under the hood in pyarrow
  • ARROW-237 - Implement parquet-cpp's abstract IO interfaces for memory allocation and file reading
  • ARROW-238 - Change InternalMemoryPool::Free() to return Status::Invalid when ther…
  • ARROW-242 - Support Timestamp Data Type
  • ARROW-245 - add endianness to RecordBatch
  • ARROW-251 - Expose APIs for getting code and message of the status
  • ARROW-252 - Add implementation guidelines to the documentation
  • ARROW-253 - restrict ints to 8, 16, 32, or 64 bits in V1
  • ARROW-254 - remove Bit type as it is redundant with Boolean
  • ARROW-255 - Finalize Dictionary representation
  • ARROW-256 - [Format] Add a version number to the IPC/RPC metadata
  • ARROW-257 - Add a typeids Vector to Union type
  • ARROW-262 - Start metadata specification document
  • ARROW-264 - File format
  • ARROW-267 - [C++] Implement file format layout for IPC/RPC
  • ARROW-270 - Define more generic Interval logical type
  • ARROW-271 - Update Field structure to be more explicit
  • ARROW-272 - Arrow release 0.1
  • ARROW-279 - rename vector module to arrow-vector
  • ARROW-280 - [C++] Refactor IPC / memory map IO to use common arrow_io interfaces. Create arrow_ipc leaf library
  • ARROW-282 - Make parquet-cpp an optional dependency of pyarrow
  • ARROW-285 - Optional flatc download
  • ARROW-286 - Build thirdparty dependencies in parallel
  • ARROW-289 - Install test-util.h
  • ARROW-290 - Specialize alloc() in ArrowBuf
  • ARROW-291 - [Python] Update NOTICE file for Python codebase
  • ARROW-292 - [Java] Upgrade Netty to 4.0.41
  • ARROW-293 - [C++] Implement Arrow IO interfaces for operating system files
  • ARROW-296 - [Python / C++] Remove arrow::parquet, make pyarrow link against parquet_arrow
  • ARROW-298 - create release scripts
  • ARROW-299 - Use absolute namespace in macros
  • ARROW-301 - Add user field metadata to IPC schemas
  • ARROW-302 - [C++/Python] Implement C++ IO interfaces for interacting with Python file and bytes objects
  • ARROW-305 - Add compression and use_dictionary options to Parquet
  • ARROW-306 - Add option to pass cmake arguments via environment variable
  • ARROW-315 - finalize timestamp
  • ARROW-318 - Revise python/README.md given recent changes in codebase
  • ARROW-319 - Add canonical Arrow Schema json representation
  • ARROW-324 - Update arrow metadata diagram
  • ARROW-325 - make TestArrowFile not dependent on timezone

Bug Fixes

  • ARROW-5 - Correct Apache Maven repo for maven plugin use
  • ARROW-5 - Update drill-fmpp-maven-plugin to 1.5.0
  • ARROW-16 - Building cpp issues on XCode 7.2.1
  • ARROW-17 - set some vector fields to package level access for Drill compatibility
  • ARROW-18 - Fix decimal precision and scale in MapWriters
  • ARROW-36 - Remove fixVersions from JIRA resolve code path
  • ARROW-46 - ListVector should initialize bits in allocateNew
  • ARROW-51 - Add simple ValueVector tests
  • ARROW-55 - [Python] Fix unit tests in 2.7
  • ARROW-62 - Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction
  • ARROW-63 - [C++] Enable ctest to work on systems with Python 3 as the default Python
  • ARROW-65 - Be less restrictive on PYTHON_LIBRARY search paths
  • ARROW-69 - Change permissions for assignable users
  • ARROW-72 - Search for alternative parquet-cpp header
  • ARROW-75 - Fix handling of empty strings
  • ARROW-77 - [C++] Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls
  • ARROW-80 - Handle len call for pre-init arrays
  • ARROW-83 - [C++] Add basic test infrastructure for DecimalType
  • ARROW-84 - C++: separate test codes
  • ARROW-88 - [C++] Refactor usages of parquet_cpp namespace
  • ARROW-93 - Fix builds when using XCode 7.3
  • ARROW-94 - [Format] Expand list example to clarify null vs empty list
  • ARROW-103 - Add files to gitignore
  • ARROW-104 - [FORMAT] Add alignment and padding requirements + union clarification
  • ARROW-105 - Unit tests fail if assertions are disabled
  • ARROW-113 - TestValueVector test fails if cannot allocate 2GB of memory
  • ARROW-185 - Make padding and alignment for all buffers be 64 bytes
  • ARROW-188 - Add numpy as install requirement
  • ARROW-193 - typos “int his” fix to “in this”
  • ARROW-194 - C++: Allow read-only memory mapped source
  • ARROW-200 - [C++/Python] Return error status on string initialization failure
  • ARROW-205 - builds failing on master branch with apt-get error
  • ARROW-209 - [C++] Triage builds due to unavailable LLVM apt repo
  • ARROW-210 - Cleanup of the string related types in C++ code base
  • ARROW-211 - [Format] Fixed typos in layout examples
  • ARROW-217 - Fix Travis w.r.t conda 4.1.0 changes
  • ARROW-219 - Preserve CMAKE_CXX_FLAGS, fix compiler warnings
  • ARROW-223 - Do not link against libpython
  • ARROW-225 - [C++/Python] master Travis CI build is broken
  • ARROW-244 - Some global APIs of IPC module should be visible to the outside
  • ARROW-246 - [Java] UnionVector doesn't call allocateNew() when creating its vectorType
  • ARROW-247 - Missing explicit destructor in RowBatchReader causes an incomplete type error
  • ARROW-250 - Fix for ARROW-246 may cause memory leaks
  • ARROW-259 - Use Flatbuffer Field type instead of MaterializedField
  • ARROW-260 - Fix flaky oversized tests
  • ARROW-265 - Fix few decimal bugs
  • ARROW-265 - Pad negative decimal values with 1
  • ARROW-266 - [C++] Fix broken build due to Flatbuffers namespace change
  • ARROW-274 - Add NullableMapVector to support nullable maps
  • ARROW-277 - Flatbuf serialization fails for Timestamp type
  • ARROW-278 - [Format] Rename Tuple to Struct_ in flatbuffers IDL
  • ARROW-283 - [C++] Account for upstream changes in parquet-cpp
  • ARROW-284 - Disable arrow_parquet module in Travis CI to triage builds
  • ARROW-287 - Make nullable vectors use a BitVector instead of UInt1Vector for bits
  • ARROW-297 - Fix Arrow pom for release
  • ARROW-304 - NullableMapReaderImpl.isSet() always returns true
  • ARROW-308 - UnionListWriter.setPosition() should not call startList()
  • ARROW-309 - Types.getMinorTypeForArrowType() does not work for Union type
  • ARROW-313 - Build on any version of XCode
  • ARROW-314 - JSONScalar is unnecessary and unused
  • ARROW-320 - ComplexCopier.copy(FieldReader, FieldWriter) should not st…
  • ARROW-321 - fix arrow licenses
  • ARROW-855 - Arrow Memory Leak