Changelog
9.0.2 (2022-02-09)
Full Changelog
Breaking changes:
Implemented enhancements:
- Add async arrow parquet reader #1154 [parquet] [arrow] (tustvold)
- Rename Bitmap::len to Bitmap::bit_len #1233
- Extend CSV schema inference to allow scientific notation for floating point types #1215 [arrow]
- Write Multiple RecordBatch to Parquet Row Group #1211
- Add doc examples for eq_dyn etc. #1202 [arrow]
- Add comparison kernels for BinaryArray #1108
- impl ArrowNativeType for i128 #1098
- Remove Copy trait bound from dyn scalar kernels #1243 [arrow] (matthewmturner)
- Add into_inner for IPC FileWriter #1236 [arrow] (yjshen)
- [Minor] Re-export array::builder::make_builder to make it available for downstream #1235 [arrow] (yjshen)
Fixed bugs:
- Parquet v8.0.0 panics when reading all null column to NullArray #1245 [parquet]
- Get Unknown configuration option rust-version when running the rust format command #1240
- Bitmap Length Validation is Incorrect #1231 [arrow]
- Writing sliced ListArray or MapArray ignore offsets #1226 [parquet]
- Remove broken memory-tracking crate feature #1171
- Revert making parquet::data_type and parquet::arrow::schema experimental #1244 [parquet] (tustvold)
Documentation updates:
Performance improvements:
- Improve performance for arithmetic kernels with simd feature enabled (except for division/modulo) #1221 [arrow] (jhorstmann)
- Do not concatenate identical dictionaries #1219 [arrow] (tustvold)
- Preserve dictionary encoding when decoding parquet into Arrow arrays, 60x perf improvement (#171) #1180 [parquet] (tustvold)
Closed issues:
- UnalignedBitChunkIterator that iterates through already aligned u64 blocks #1227
- Remove unused ArrowArrayReader in parquet #1197 [parquet]
Merged pull requests:
8.0.0 (2022-01-20)
Full Changelog
Breaking changes:
Implemented enhancements:
- Parquet reader should be able to read structs within list #1186 [parquet]
- Disable serde_json arbitrary_precision feature flag #1174 [arrow]
- Simplify and reduce code duplication in arithmetic.rs #1160 [arrow]
- Return Err from JSON writer rather than panic! for unsupported types #1157 [arrow]
- Support scalar mathematics kernels for Array and scalar value #1153 [arrow]
- Support DecimalArray in sort kernel #1137
- Parquet Fuzz Tests #1053
- BooleanBufferBuilder Append Packed #1038 [arrow]
- parquet Performance Optimization: StructArrayReader Redundant Level & Bitmap Computation #1034 [parquet]
- Reduce Public Parquet API #1032 [parquet]
- Add from_iter_values for binary array #1188 [arrow] (Jimexist)
- Add support for MapArray in json writer #1149 [arrow] (helgikrs)
Fixed bugs:
- Empty string arrays with no nulls are not equal #1208 [arrow]
- Pretty print a RecordBatch containing Float16 triggers a panic #1193 [arrow]
- Writing structs nested in lists produces an incorrect output #1184 [parquet]
- Undefined behavior for GenericStringArray::from_iter_values if reported iterator upper bound is incorrect #1144 [arrow]
- Interval comparisons with simd feature asserts #1136 [arrow]
- RecordReader Permits Illegal Types #1132 [parquet]
Security fixes:
Documentation updates:
Performance improvements:
- Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037) #1054 [parquet] [arrow] (tustvold)
- Improve parquet performance: Skip levels computation for required struct arrays in parquet #1035 [parquet] (tustvold)
Closed issues:
- Generify ColumnReaderImpl and RecordReader #1040 [parquet]
- Parquet Preserve BitMask #1037
Merged pull requests:
7.0.0 (2022-01-07)
Full Changelog
Arrow
Breaking changes:
- pretty_format_batches now returns Result<impl Display> rather than String: #975
- MutableBuffer::typed_data_mut is marked unsafe: #1029
- UnionArray updated to match the latest Arrow spec, added UnionMode, UnionArray::new() marked unsafe: #885
New Features:
- Support for Float16Array types #888
- IPC support for UnionArray #654
- Dynamic comparison kernels for scalars (e.g. eq_dyn_scalar), including DictionaryArray: #1113
Enhancements:
- Added Schema::with_metadata and Field::with_metadata #1092
- Support for custom datetime format for inference and parsing csv files #1112
- Implement Array for ArrayRef for easier use #1129
- Pretty printing display support for FixedSizeBinaryArray #1097
- Dependency Upgrades: pyo3, parquet-format, prost, tonic
- Avoid allocating vector of indices in lexicographical_partition_ranges #998
Parquet
Fixed bugs:
- (parquet) Fix reading of dictionary encoded pages with null values: #1130
6.5.0 (2021-12-23)
Full Changelog
6.4.0 (2021-12-10)
Full Changelog
6.3.0 (2021-11-26)
Full Changelog
Changes:
6.2.0 (2021-11-12)
Full Changelog
Features / Fixes:
6.1.0 (2021-10-29)
Full Changelog
Features / Fixes:
Other:
6.0.0 (2021-10-13)
Full Changelog
Breaking changes:
Implemented enhancements:
- Improve parquet binary writer speed by reducing allocations #819
- Expose buffer operations #808
- Add doc examples of writing parquet files using
ArrowWriter
#788
Fixed bugs:
- JSON reader can create null struct children on empty lists #825
- Incorrect null count for cast kernel for list arrays #815
- minute and second temporal kernels do not respect timezone #500
- Fix data corruption in json decoder f64-to-i64 cast #652 [arrow] (xianwill)
Documentation updates:
5.5.0 (2021-09-24)
Full Changelog
Implemented enhancements:
- parquet should depend on a small set of arrow features #800
- Support equality on RecordBatch #735
Fixed bugs:
- Converting from string to timestamp uses microseconds instead of milliseconds #780
- Document has no link to RowColumIter #762
- length on slices with null doesn't work #744
5.4.0 (2021-09-10)
Full Changelog
Implemented enhancements:
- Upgrade lexical-core to 0.8 #747
- append_nulls and append_trusted_len_iter for PrimitiveBuilder #725
- Optimize MutableArrayData::extend for null buffers #397
Fixed bugs:
- Arithmetic with scalars doesn't work on slices #742
- Comparisons with scalar don't work on slices #740
- unary kernel doesn't respect offset #738
- new_null_array creates invalid struct arrays #734
- --no-default-features is broken for parquet #733 [parquet]
- Bitmap::len returns the number of bytes, not bits. #730
- Decimal logical type is formatted incorrectly by print_schema #713
- parquet_derive does not support chrono time values #711
- Numeric overflow when formatting Decimal type #710
- The integration tests are not running #690
Closed issues:
- Question: Is there no way to create a DictionaryArray with a pre-arranged mapping? #729
5.3.0 (2021-08-26)
Full Changelog
Implemented enhancements:
- Add optimized filter kernel for regular expression matching #697
- Can't cast from timestamp array to string array #587
Fixed bugs:
- ‘Encoding DELTA_BYTE_ARRAY is not supported’ with parquet arrow readers #708
- Support reading json string into binary data type. #701
Closed issues:
- Resolve Issues with prettytable-rs dependency #69 [arrow]
5.2.0 (2021-08-12)
Full Changelog
Implemented enhancements:
- Make rand an optional dependency #671
- Remove undefined behavior in value method of boolean and primitive arrays #645
- Avoid materialization of indices in filter_record_batch for single arrays #636
- Add a note about arrow crate security / safety #627
- Allow the creation of String arrays from an iterator of &Option<&str> #598
- Support arrow map datatype #395
Fixed bugs:
- Parquet fixed length byte array columns write byte array statistics #660 [parquet]
- Parquet boolean columns write Int32 statistics #659 [parquet]
- Writing Parquet with a boolean column fails #657
- JSON decoder data corruption for large i64/u64 #653
- Incorrect min/max statistics for strings in parquet files #641 [parquet]
Closed issues:
- Release candidate verifying script seems work on macOS #640
- Update CONTRIBUTING #342
5.1.0 (2021-07-29)
Full Changelog
Implemented enhancements:
- Make FFI_ArrowArray empty() public #602
- exponential sort can be used to speed up lexico partition kernel #586
- Implement sort() for binary array #568
- primitive sorting can be improved and more consistent with and without limit if sorted unstably #553
Fixed bugs:
- Confusing memory usage with CSV reader #623
- FFI implementation deviates from specification for array release #595
- Parquet file content is different if ~/.cargo is in a git checkout #589
- Ensure output of MIRI is checked for success #581
- MIRI failure in array::ffi::tests::test_struct and other ffi tests #580
- ListArray equality check may return wrong result #570
- cargo audit failed #561
- ArrayData::slice() does not work for nested types such as StructArray #554
Documentation updates:
- More examples of how to construct Arrays #301
Closed issues:
- Implement StringBuilder::append_option #263 [arrow]
5.0.0 (2021-07-14)
Full Changelog
Breaking changes:
Implemented enhancements:
Fixed bugs:
- Error building on master - error: cyclic package dependency: package ahash v0.7.4 depends on itself. Cycle #544
- IPC reader panics with out of bounds error #541
- Take kernel doesn't handle nulls and structs correctly #530 [arrow]
- master fails to compile with default-features=false #529
- README developer instructions out of date #523
- Update rustc and packed_simd in CI before 5.0 release #517
- Incorrect memory usage calculation for dictionary arrays #503 [arrow]
- sliced null buffers lead to incorrect result in take kernel (and probably on other places) #502
- Cast of utf8 types and list container types don't respect offset #334 [arrow]
- fix take kernel null handling on structs #531 [arrow] (bjchambers)
- Correct array memory usage calculation for dictionary arrays #505 [arrow] (jhorstmann)
- parquet: improve BOOLEAN writing logic and report error on encoding fail #443 [parquet] (garyanaplan)
- Fix bug with null buffer offset in boolean not kernel #418 [arrow] (jhorstmann)
- respect offset in utf8 and list casts #335 [arrow] (ritchie46)
- Fix comparison of dictionaries with different values arrays (#332) #333 [arrow] (tustvold)
- ensure null-counts are written for all-null columns #307 [parquet] (crepererum)
- fix invalid null handling in filter #296 [arrow] (ritchie46)
- fix NaN handling in parquet statistics #256 (crepererum)
Documentation updates:
Merged pull requests:
4.4.0 (2021-06-24)
Full Changelog
Breaking changes:
- migrate partition kernel to use Iterator trait #437 [arrow]
- Remove DictionaryArray::keys_array #391 [arrow]
Implemented enhancements:
- sort kernel boolean sort can be O(n) #447 [arrow]
- C data interface for decimal128, timestamp, date32 and date64 #413
- Add Decimal to CsvWriter #405
- Use iterators to increase performance of creating Arrow arrays #200 [parquet]
Fixed bugs:
- Release Audit Tool (RAT) is not being triggered #481
- Security Vulnerabilities: flatbuffers: read_scalar and read_scalar_at allow transmuting values without unsafe blocks #476
- Clippy broken after upgrade to rust 1.53 #467
- Pull Request Labeler is not working #462
- Arrow 4.3 release: error[E0658]: use of unstable library feature ‘partition_point’: new API #456
- parquet reading hangs when row_group contains more than 2048 rows of data #349
- Fail to build arrow #247
- JSON reader does not implement iterator #193 [arrow]
Security fixes:
- Ensure a successful MIRI Run on CI #227
Closed issues:
- sort kernel has a lot of unnecessary wrapping #446
- [Parquet] Plain encoded boolean column chunks limited to 2048 values #48 [parquet]
4.3.0 (2021-06-10)
Full Changelog
Implemented enhancements:
- Add partitioning kernel for sorted arrays #428 [arrow]
- Implement sort by float lists #427 [arrow]
- Derive Eq and PartialEq for SortOptions #426 [arrow]
- use prettier and github action to normalize markdown document syntax #399
- window::shift can work for more than just primitive array type #392
- Doctest for ArrayBuilder #366
Fixed bugs:
- Boolean not kernel does not take offset of null buffer into account #417
- my contribution not merged in 4.2 release #394
- window::shift shall properly handle boundary cases #387
- Parquet WriterProperties.max_row_group_size not wired up #257
- Out of bound reads in chunk iterator #198 [arrow]
4.2.0 (2021-05-29)
Full Changelog
Breaking changes:
- DictionaryArray::values() clones the underlying ArrayRef #313 [arrow]
Implemented enhancements:
- Simplify shift kernel using null array #371
- Provide Arc-based constructor for parquet::util::cursor::SliceableCursor #368
- Add badges to crates #361
- Consider inlining PrimitiveArray::value #328
- Implement automated release verification script #327
- Add wasm32 to the list of target architectures of the simd feature #316
- add with_escape for csv::ReaderBuilder #315 [arrow]
- IPC feature gate #310
- csv feature gate #309 [arrow]
- Add shrink_to / shrink_to_fit to MutableBuffer #297
Fixed bugs:
- Incorrect crate setup instructions #364
- Arrow-flight only register rerun-if-changed if file exists #350
- Dictionary Comparison Uses Wrong Values Array #332
- Undefined behavior in FFI implementation #322
- All-null column get wrong parquet null-counts #306 [parquet]
- Filter has inconsistent null handling #295
4.1.0 (2021-05-17)
Full Changelog
Implemented enhancements:
- Add Send to ArrayBuilder #290 [arrow]
- Improve performance of bound checking option #280 [arrow]
- extend compute kernel arity to include nullary functions #276
- Implement FFI / CDataInterface for Struct Arrays #251 [arrow]
- Add support for pretty-printing Decimal numbers #230 [arrow]
- CSV Reader String Dictionary Support #228 [arrow]
- Add Builder interface for adding Arrays to record batches #210 [arrow]
- Support auto-vectorization for min/max #209 [arrow]
- Support LargeUtf8 in sort kernel #25 [arrow]
Fixed bugs:
- no method named select_nth_unstable_by found for mutable reference &mut [T] #283
- Rust 1.52 Clippy error #266
- NaNs can break parquet statistics #255 [parquet]
- u64::MAX does not roundtrip through parquet #254 [parquet]
- Integration tests failing to compile (flatbuffer) #249 [arrow]
- Fix compatibility quirks between arrow and parquet structs #245 [parquet]
- Unable to write non-null Arrow structs to Parquet #244 [parquet]
- schema: missing field metadata when deserialize #241 [arrow]
- Arrow does not compile due to flatbuffers upgrade #238 [arrow]
- Sort with limit panics for the limit includes some but not all nulls, for large arrays #235 [arrow]
- arrow-rs contains a copy of the “format” directory #233 [arrow]
- Fix SEGFAULT/ SIGILL in child-data ffi #206 [arrow]
- Read list field correctly in <struct<list>> #167 [parquet]
- FFI listarray lead to undefined behavior. #20
Security fixes:
Documentation updates:
- Comment out the instructions in the PR template #277
- Update links to datafusion and ballista in README.md #19
- Update “repository” in Cargo.toml #12
Closed issues:
- Arrow Aligned Vec #268
- [Rust]: Tracking issue for AVX-512 #220 [arrow]
- Umbrella issue for clippy integration #217 [arrow]
- Support sort #215 [arrow]
- Support stable Rust #214 [arrow]
- Remove Rust and point integration tests to arrow-rs repo #211 [arrow]
- ArrayData buffers are inconsistent across implementations #207
- 3.0.1 patch release #204
- Document patch release process #202
- Simplify Offset #186 [arrow]
- Typed Bytes #185 [arrow]
- [CI]docker-compose setup should enable caching #175
- Improve take primitive performance #174
- [CI] Try out buildkite #165 [arrow]
- Update assignees in JIRA where missing #160
- [Rust]: From<ArrayDataRef> implementations should validate data type #103 [arrow]
- [DataFusion] Verify that projection push down does not remove aliases columns #99 [arrow]
- [Rust][DataFusion] Implement modulus expression #98 [arrow]
- [DataFusion] Add constant folding to expressions during logically planning #96 [arrow]
- [DataFusion] DataFrame.collect should return RecordBatchReader #95 [arrow]
- [Rust][DataFusion] Add FORMAT to explain plan and an easy to visualize format #94 [arrow]
- [DataFusion] Implement metrics framework #90 [arrow]
- [DataFusion] Implement micro benchmarks for each operator #89 [arrow]
- [DataFusion] Implement pretty print for physical query plan #88 [arrow]
- [Archery] Support rust clippy in the lint command #83
- [rust][datafusion] optimize count(*) queries on parquet sources #75 [arrow]
- [Rust][DataFusion] Improve like/nlike performance #71 [arrow]
- [DataFusion] Implement optimizer rule to remove redundant projections #56 [arrow]
- [DataFusion] Parquet data source does not support complex types #39 [arrow]
- Merge utils from Parquet and Arrow #32 [arrow] [parquet]
- Add benchmarks for Parquet #30 [parquet]
- Mark methods that do not perform bounds checking as unsafe #28 [arrow]
- Test issue #24 [arrow]
- This is a test issue #11
For older versions, see apache/arrow/CHANGELOG.md
* This Changelog was automatically generated by github_changelog_generator