<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## [15.0.0](https://github.com/apache/datafusion/tree/15.0.0) (2022-12-01)
[Full Changelog](https://github.com/apache/datafusion/compare/14.0.0-rc1...15.0.0)
**Breaking changes:**
- Expose remaining parquet config options into ConfigOptions \(try 2\) [\#4427](https://github.com/apache/datafusion/pull/4427) ([alamb](https://github.com/alamb))
- Config Cleanup: Remove TaskProperties and KV structure, keep key=value serialization [\#4382](https://github.com/apache/datafusion/pull/4382) ([alamb](https://github.com/alamb))
- add `{TDigest,ScalarValue,Accumulator}::size` [\#4342](https://github.com/apache/datafusion/pull/4342) ([crepererum](https://github.com/crepererum))
- API-break: Support `SubqueryAlias` and remove `Alias in Projection` [\#4333](https://github.com/apache/datafusion/pull/4333) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- split `try_new_with_schema_alias` from original code [\#4284](https://github.com/apache/datafusion/pull/4284) ([jackwener](https://github.com/jackwener))
- Collapse statistics in normal explain plan [\#4157](https://github.com/apache/datafusion/pull/4157) ([alamb](https://github.com/alamb))
- Linearize binary expressions to reduce proto tree complexity [\#4115](https://github.com/apache/datafusion/pull/4115) ([isidentical](https://github.com/isidentical))
- support `SET Timezone` [\#4107](https://github.com/apache/datafusion/pull/4107) [[sql](https://github.com/apache/datafusion/labels/sql)] ([waitingkuo](https://github.com/waitingkuo))
**Implemented enhancements:**
- Refactor Built-in, Aggregate window functions to increase code reuse. [\#4440](https://github.com/apache/datafusion/issues/4440)
- Helper to get "root" error [\#4435](https://github.com/apache/datafusion/issues/4435)
- Do NOT convert intermediate/source errors to strings. [\#4434](https://github.com/apache/datafusion/issues/4434)
- Estimate the `total_byte_size` of the filter expression's result when selectivity is available [\#4374](https://github.com/apache/datafusion/issues/4374)
- refactor the code of the `HashJoin` [\#4356](https://github.com/apache/datafusion/issues/4356)
- `CoalesceBatchesExec` reports no ordering [\#4331](https://github.com/apache/datafusion/issues/4331)
- Introduce tournament tree to achieve better k-way sort-merging [\#4300](https://github.com/apache/datafusion/issues/4300)
- Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable [\#4299](https://github.com/apache/datafusion/issues/4299)
- Remove the macro rule `unary_scalar_expr` from `expr_fn.rs` [\#4298](https://github.com/apache/datafusion/issues/4298)
- Remove Alias-in-Projection, replace it with `SubqueryAlias` [\#4291](https://github.com/apache/datafusion/issues/4291)
- reimplement `reduce_outer_join` [\#4270](https://github.com/apache/datafusion/issues/4270)
- Reimplement `filter_push_down` [\#4266](https://github.com/apache/datafusion/issues/4266)
- Reimplement `eliminate_limit` [\#4264](https://github.com/apache/datafusion/issues/4264)
- Reimplement `limit_push_down` [\#4263](https://github.com/apache/datafusion/issues/4263)
- Make a data driven SQL testing tool \(so we can reuse duckdb test suite, example\) [\#4248](https://github.com/apache/datafusion/issues/4248)
- upgrade chrono to 0.4.23 [\#4224](https://github.com/apache/datafusion/issues/4224)
- support scan non-string columns partitioned parquet files [\#4218](https://github.com/apache/datafusion/issues/4218)
- Allow optimizer rules to skip optimizing plans [\#4209](https://github.com/apache/datafusion/issues/4209)
- Support specifying schema when creating tables [\#4183](https://github.com/apache/datafusion/issues/4183)
- Improve ergonomics of creating `ListingOptions` [\#4178](https://github.com/apache/datafusion/issues/4178)
- Add ability to specify external sort information for ParquetExec [\#4169](https://github.com/apache/datafusion/issues/4169)
- Add another method to collect referenced columns from an expression [\#4152](https://github.com/apache/datafusion/issues/4152)
- Improve `EXPLAIN ANALYZE` output for parquet exec [\#4144](https://github.com/apache/datafusion/issues/4144)
- `TableProviderFactory::create` should have `Optional<DFSchemaRef>` parameter [\#4142](https://github.com/apache/datafusion/issues/4142)
- Support more expressions in equality join [\#4140](https://github.com/apache/datafusion/issues/4140)
- JoinSelection Rule to choose physical join implementation: HashJoin\(Partitioned or CollectLeft\) or SortMergeJoin based on Stats [\#4139](https://github.com/apache/datafusion/issues/4139)
- Allow TPCH tooling to create a combined result for easier processing by outside tools [\#4127](https://github.com/apache/datafusion/issues/4127)
- Allow additional options when creating an external table [\#4125](https://github.com/apache/datafusion/issues/4125)
- reuse code utils::optimize_children instead of redundant implementation [\#4120](https://github.com/apache/datafusion/issues/4120)
- Add test field to PR template [\#4113](https://github.com/apache/datafusion/issues/4113)
- Allow for automatic registration of `ListingTables` [\#4111](https://github.com/apache/datafusion/issues/4111)
- Add CI check that configs.md is up-to-date [\#4108](https://github.com/apache/datafusion/issues/4108)
- Support `SET` timezone to non-UTC time zone [\#4106](https://github.com/apache/datafusion/issues/4106)
- Parquet predicates contain `and true` expressions [\#4091](https://github.com/apache/datafusion/issues/4091)
- Replace RwLock\<HashMap\> and Mutex\<HashMap\> by using DashMap [\#4077](https://github.com/apache/datafusion/issues/4077)
- add support for `.xz` compressed files [\#4074](https://github.com/apache/datafusion/issues/4074)
- add a feature gate to make support for compressed files optional [\#4073](https://github.com/apache/datafusion/issues/4073)
- Support serializing more deeply nested AND / OR expressions [\#4066](https://github.com/apache/datafusion/issues/4066)
- Use f64::total_cmp instead of OrderedFloat [\#4051](https://github.com/apache/datafusion/issues/4051)
- Add documentation to make it clear that decimal support is still experimental [\#4036](https://github.com/apache/datafusion/issues/4036)
- Simplify Pushed Down Predicates [\#4020](https://github.com/apache/datafusion/issues/4020)
- Improve HashJoinExec metrics [\#4009](https://github.com/apache/datafusion/issues/4009)
- Move physical plan serde from Ballista to DataFusion [\#3949](https://github.com/apache/datafusion/issues/3949)
- Support `SubqueryAlias` better in planner [\#3927](https://github.com/apache/datafusion/issues/3927)
- A framework for expression boundary analysis \(and statistics\) [\#3898](https://github.com/apache/datafusion/issues/3898)
- Replace `Filter: Boolean(false)` with `EmptyRelation` [\#3864](https://github.com/apache/datafusion/issues/3864)
- Implement statistics estimation for `FilterExec` [\#3845](https://github.com/apache/datafusion/issues/3845)
- Support parquet page filtering for more types: String, Binary\(Decimal\), Int96 [\#3833](https://github.com/apache/datafusion/issues/3833)
- Allow configuring parquet filter pushdown dynamically [\#3821](https://github.com/apache/datafusion/issues/3821)
- Unable to register tables in non-cloud S3 servers [\#3640](https://github.com/apache/datafusion/issues/3640)
- support more data type in prune for cast/try_cast [\#3442](https://github.com/apache/datafusion/issues/3442)
- Disable spill to disk globally [\#3264](https://github.com/apache/datafusion/issues/3264)
- Consider to categorize Operator [\#3216](https://github.com/apache/datafusion/issues/3216)
- Replace Projection.alias with SubqueryAlias [\#2212](https://github.com/apache/datafusion/issues/2212)
- \[Optimizer\] Eliminate the distinct [\#2045](https://github.com/apache/datafusion/issues/2045)
- beautify datafusion's site: https://datafusion.apache.org/ [\#1819](https://github.com/apache/datafusion/issues/1819)
- split datafusion-logical-plan sub-module [\#1755](https://github.com/apache/datafusion/issues/1755)
- convert `outer join` to `inner join` to improve performance [\#1585](https://github.com/apache/datafusion/issues/1585)
- Add sqllogictest for datafusion [\#1453](https://github.com/apache/datafusion/issues/1453)
- Add additional simplification rules [\#1406](https://github.com/apache/datafusion/issues/1406)
- support more subqueries [\#1209](https://github.com/apache/datafusion/issues/1209)
- Add baseline metrics for remaining execution plan nodes [\#1019](https://github.com/apache/datafusion/issues/1019)
- Make `ExecutionPlan` implementations immutable [\#987](https://github.com/apache/datafusion/issues/987)
- Architecture overview may be insufficient in README [\#980](https://github.com/apache/datafusion/issues/980)
- Add a separate configuration setting for parallelism of scanning parquet files [\#924](https://github.com/apache/datafusion/issues/924)
- Support hash repartition elimination [\#41](https://github.com/apache/datafusion/issues/41)
**Fixed bugs:**
- `pyarrow` CI failed [\#4448](https://github.com/apache/datafusion/issues/4448)
- `UnwrapCastInComparison` has a bug [\#4430](https://github.com/apache/datafusion/issues/4430)
- The CLI panics when passing an invalid `explain` query [\#4378](https://github.com/apache/datafusion/issues/4378)
- HashJoin should return Err when the right side input stream produces Err [\#4362](https://github.com/apache/datafusion/issues/4362)
- Optimizer check errors if resulting schema has different metadata [\#4346](https://github.com/apache/datafusion/issues/4346)
- Panic with function `to_hex` [\#4339](https://github.com/apache/datafusion/issues/4339)
- `LimitPushDown` pushdown into limit, result is wrong [\#4308](https://github.com/apache/datafusion/issues/4308)
- DESCRIBE statement issue with qualified table references [\#4303](https://github.com/apache/datafusion/issues/4303)
- Panic with window function LAST_VALUE [\#4297](https://github.com/apache/datafusion/issues/4297)
- CI failed in `Compare to postgres` [\#4294](https://github.com/apache/datafusion/issues/4294)
- Field alias can't work in where clause [\#4288](https://github.com/apache/datafusion/issues/4288)
- Some valid filters are not pushed down to parquet scan [\#4282](https://github.com/apache/datafusion/issues/4282)
- The type renaming `pub type NullColumnarValue = ColumnarValue` makes no sense [\#4271](https://github.com/apache/datafusion/issues/4271)
- Current `limit_push_down` can't support cross_join [\#4256](https://github.com/apache/datafusion/issues/4256)
- Cargo test fail [\#4253](https://github.com/apache/datafusion/issues/4253)
- RightSemi/RightAnti HashJoin has bug, the left_indices is never populated, causing failure to apply join filters. [\#4247](https://github.com/apache/datafusion/issues/4247)
- Clippy failures [\#4245](https://github.com/apache/datafusion/issues/4245)
- Cannot query s3 data from datafusion-cli [\#4239](https://github.com/apache/datafusion/issues/4239)
- Bug parsing interval with negative values [\#4237](https://github.com/apache/datafusion/issues/4237)
- `cargo test` reports errors on the master branch. [\#4236](https://github.com/apache/datafusion/issues/4236)
- Doc of the expression function `log2` is incorrect [\#4231](https://github.com/apache/datafusion/issues/4231)
- HashJoin with mode PartitionMode:CollectLeft has bug and can produce wrong result [\#4230](https://github.com/apache/datafusion/issues/4230)
- Add ambiguous check when generate projection plan [\#4210](https://github.com/apache/datafusion/issues/4210)
- What happened for NDJSON support on CLI? [\#4198](https://github.com/apache/datafusion/issues/4198)
- Add ambiguous check when generate join plan [\#4197](https://github.com/apache/datafusion/issues/4197)
- Clippy failing on master : error: use of deprecated associated function `chrono::NaiveDate::from_ymd`: use `from_ymd_opt()` instead [\#4187](https://github.com/apache/datafusion/issues/4187)
- Reimplement the `eliminate_cross_join` [\#4176](https://github.com/apache/datafusion/issues/4176)
- Incorrect handling of column names [\#4166](https://github.com/apache/datafusion/issues/4166)
- Update release scripts to support datafusion-benchmarks [\#4134](https://github.com/apache/datafusion/issues/4134)
- Bug in interpreting correctly parsed SQL with aliases [\#4123](https://github.com/apache/datafusion/issues/4123)
- The percentile argument for ApproxPercentileCont must be Float64, not Decimal128\(2, 1\) [\#4103](https://github.com/apache/datafusion/issues/4103)
- Panic when using array_agg [\#4080](https://github.com/apache/datafusion/issues/4080)
- Wrong result for FIRST_VALUE AND LAST_VALUE window functions [\#4076](https://github.com/apache/datafusion/issues/4076)
- Round error when casting float to decimal [\#4071](https://github.com/apache/datafusion/issues/4071)
- Predicate still has cast when comparing Timestamp\(Nano, None\) to a timestamp literal, so can't be pushed down or used for pruning [\#3938](https://github.com/apache/datafusion/issues/3938)
- Revisit required_child_distribution\(\), output_partitioning\(\), output_ordering\(\) implementations in ExecutionPlan's implementations [\#3653](https://github.com/apache/datafusion/issues/3653)
- Can't push down projection after do type coercion [\#3583](https://github.com/apache/datafusion/issues/3583)
- In some circumstances cast expression is not working [\#3499](https://github.com/apache/datafusion/issues/3499)
- output_partitioning\(\) and output_ordering\(\) implementations are wrong in some physical plan implementations with alias [\#3400](https://github.com/apache/datafusion/issues/3400)
- Interval Literal doesn't work for timeunit less than millisecond [\#3204](https://github.com/apache/datafusion/issues/3204)
- `INTERVAL` literal with duplicated interval types should raise error [\#3183](https://github.com/apache/datafusion/issues/3183)
- Error occurs when only using partition columns in query [\#1999](https://github.com/apache/datafusion/issues/1999)
- regex_match does not compile using the `g` flag [\#1429](https://github.com/apache/datafusion/issues/1429)
- `between` with NULL literals does not work: can't be evaluated because there isn't a common type to coerce the types to [\#1193](https://github.com/apache/datafusion/issues/1193)
- \[Datafusion\] Error with CAST: Unsupported SQL type Time [\#193](https://github.com/apache/datafusion/issues/193)
**Closed issues:**
- SQL level coverage for when memory limit is exceeded [\#4404](https://github.com/apache/datafusion/issues/4404)
- Throw error \(not `panic`\) if a listing table specifies a missing partition column [\#4350](https://github.com/apache/datafusion/issues/4350)
- Page index pruning fail on complex_expr [\#4317](https://github.com/apache/datafusion/issues/4317)
- optimize `limit-full join` in the limit push down rule [\#4275](https://github.com/apache/datafusion/issues/4275)
- `infer_schema` function is not working with s3 Urls or http endpoints [\#4269](https://github.com/apache/datafusion/issues/4269)
- Add support binary boolean operators with nulls [\#4241](https://github.com/apache/datafusion/issues/4241)
- Add additional testing to parquet predicate pushdown integration tests [\#4087](https://github.com/apache/datafusion/issues/4087)
- Add metrics for parquet page level skipping [\#4086](https://github.com/apache/datafusion/issues/4086)
- Add parquet page index pushdown metrics [\#4058](https://github.com/apache/datafusion/issues/4058)
- Throw a runtime error if the memory allocated to GroupByHash exceeds a limit [\#3940](https://github.com/apache/datafusion/issues/3940)
- support unsigned numeric data type in UnwrapCastInBinaryComparison rule [\#3702](https://github.com/apache/datafusion/issues/3702)
- Support type cast in union [\#2125](https://github.com/apache/datafusion/issues/2125)
- \[EPIC\] Memory Limited Sort \(Externalized / Spill\) [\#1568](https://github.com/apache/datafusion/issues/1568)
- Maintain partition information in Union [\#189](https://github.com/apache/datafusion/issues/189)
- Add coercion support for `NULL` literals [\#185](https://github.com/apache/datafusion/issues/185)
**Merged pull requests:**
- Make `datafusion-sql` depend on `arrow-schema` instead of `arrow` [\#4456](https://github.com/apache/datafusion/pull/4456) [[sql](https://github.com/apache/datafusion/labels/sql)] ([mbrobbel](https://github.com/mbrobbel))
- replace the comparator for `decimal array op scalar` using arrow kernel [\#4453](https://github.com/apache/datafusion/pull/4453) ([liukun4515](https://github.com/liukun4515))
- Fix pyarrow test [\#4450](https://github.com/apache/datafusion/pull/4450) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- Replace `&Option<T>` with `Option<&T>` [\#4446](https://github.com/apache/datafusion/pull/4446) [[sql](https://github.com/apache/datafusion/labels/sql)] ([askoa](https://github.com/askoa))
- Improve error handling for array downcasting [\#4445](https://github.com/apache/datafusion/pull/4445) ([retikulum](https://github.com/retikulum))
- Refactor Builtin Window Function Implementation [\#4441](https://github.com/apache/datafusion/pull/4441) ([mustafasrepo](https://github.com/mustafasrepo))
- feat: `DataFusionError::find_root` [\#4437](https://github.com/apache/datafusion/pull/4437) ([crepererum](https://github.com/crepererum))
- fix: do NOT convert errors to strings but keep the type [\#4436](https://github.com/apache/datafusion/pull/4436) ([crepererum](https://github.com/crepererum))
- The CLI panics when passing an invalid explain query [\#4429](https://github.com/apache/datafusion/pull/4429) ([comphead](https://github.com/comphead))
- \[minor\] use arrow kernel concat_batches instead combine_batches [\#4423](https://github.com/apache/datafusion/pull/4423) ([Ted-Jiang](https://github.com/Ted-Jiang))
- fix panic on to_hex function for negative numbers [\#4422](https://github.com/apache/datafusion/pull/4422) ([retikulum](https://github.com/retikulum))
- Optimize filter executor in pull-based executor [\#4421](https://github.com/apache/datafusion/pull/4421) ([xudong963](https://github.com/xudong963))
- optimize limit push for join case [\#4411](https://github.com/apache/datafusion/pull/4411) ([liukun4515](https://github.com/liukun4515))
- Add integration test for erroring when memory limits are hit [\#4406](https://github.com/apache/datafusion/pull/4406) ([alamb](https://github.com/alamb))
- feat: `ResourceExhausted` for memory limit in `AggregateStream` [\#4405](https://github.com/apache/datafusion/pull/4405) ([crepererum](https://github.com/crepererum))
- Update to arrow 28 [\#4400](https://github.com/apache/datafusion/pull/4400) [[sql](https://github.com/apache/datafusion/labels/sql)] ([tustvold](https://github.com/tustvold))
- Update rstest requirement from 0.15.0 to 0.16.0 [\#4399](https://github.com/apache/datafusion/pull/4399) ([dependabot[bot]](https://github.com/apps/dependabot))
- Add sqllogictests \(v0\) [\#4395](https://github.com/apache/datafusion/pull/4395) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- improve hashjoin execution metrics [\#4394](https://github.com/apache/datafusion/pull/4394) ([AssHero](https://github.com/AssHero))
- Add `with_new_inputs` for LogicalPlan [\#4393](https://github.com/apache/datafusion/pull/4393) ([jackwener](https://github.com/jackwener))
- Clean the code in `limit.rs`. [\#4391](https://github.com/apache/datafusion/pull/4391) ([HaoYang670](https://github.com/HaoYang670))
- Move physical plan serde from Ballista to DataFusion [\#4390](https://github.com/apache/datafusion/pull/4390) ([Kikkon](https://github.com/Kikkon))
- Fix page index pruning fail on complex_expr [\#4387](https://github.com/apache/datafusion/pull/4387) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Add check for nested types in equivalent names and types [\#4380](https://github.com/apache/datafusion/pull/4380) ([alamb](https://github.com/alamb))
- refine the code of build schema for ambiguous check, factor this out into a function [\#4379](https://github.com/apache/datafusion/pull/4379) [[sql](https://github.com/apache/datafusion/labels/sql)] ([AssHero](https://github.com/AssHero))
- Refactor the Hash Join [\#4377](https://github.com/apache/datafusion/pull/4377) ([liukun4515](https://github.com/liukun4515))
- Minor: Fix typos in the documentation [\#4376](https://github.com/apache/datafusion/pull/4376) ([martin-g](https://github.com/martin-g))
- Include byte size estimates in the filter statistics [\#4375](https://github.com/apache/datafusion/pull/4375) ([isidentical](https://github.com/isidentical))
- HashJoin should return Err when the right side input stream produces Err, add more join UTs to cover different join types [\#4373](https://github.com/apache/datafusion/pull/4373) [[sql](https://github.com/apache/datafusion/labels/sql)] ([mingmwang](https://github.com/mingmwang))
- feat: `ResourceExhausted` for memory limit in `GroupedHashAggregateStream` [\#4371](https://github.com/apache/datafusion/pull/4371) ([crepererum](https://github.com/crepererum))
- Use limit\(\) function instead of show_limit\(\) in the first example [\#4369](https://github.com/apache/datafusion/pull/4369) ([martin-g](https://github.com/martin-g))
- Update env_logger requirement from 0.9 to 0.10 [\#4367](https://github.com/apache/datafusion/pull/4367) ([dependabot[bot]](https://github.com/apps/dependabot))
- reimplement `push_down_filter` to remove global-state [\#4365](https://github.com/apache/datafusion/pull/4365) ([jackwener](https://github.com/jackwener))
- Support using Scheduler in tpch benchmark [\#4361](https://github.com/apache/datafusion/pull/4361) ([xudong963](https://github.com/xudong963))
- Adding more dataframe example to read csv files [\#4360](https://github.com/apache/datafusion/pull/4360) ([DataPsycho](https://github.com/DataPsycho))
- minor: correct name and typo [\#4359](https://github.com/apache/datafusion/pull/4359) ([jackwener](https://github.com/jackwener))
- Do not log error if page index can not be evaluated [\#4358](https://github.com/apache/datafusion/pull/4358) ([alamb](https://github.com/alamb))
- Clean the `expr_fn` - use `scalar_expr` to create unary scalar expr functions, remove macro `unary_scalar_functions` [\#4357](https://github.com/apache/datafusion/pull/4357) ([HaoYang670](https://github.com/HaoYang670))
- Throw error \(not `panic`\) if a listing table specifies a missing partition column [\#4354](https://github.com/apache/datafusion/pull/4354) ([doki23](https://github.com/doki23))
- Improve error handling and add some more types for proper downcasting [\#4352](https://github.com/apache/datafusion/pull/4352) ([retikulum](https://github.com/retikulum))
- Add check to avoid underflow in memory manager [\#4351](https://github.com/apache/datafusion/pull/4351) ([askoa](https://github.com/askoa))
- Improve error messages when memory is exhausted while sorting [\#4348](https://github.com/apache/datafusion/pull/4348) ([alamb](https://github.com/alamb))
- Do not error in optimizer if resulting schema has different metadata [\#4347](https://github.com/apache/datafusion/pull/4347) ([alamb](https://github.com/alamb))
- minor: improve optimizer logging and do not repeat rule name [\#4345](https://github.com/apache/datafusion/pull/4345) ([alamb](https://github.com/alamb))
- minor: fix typos in test names [\#4344](https://github.com/apache/datafusion/pull/4344) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- Minor: Add docstrings to `EliminateOuterJoins` optimizer pass [\#4343](https://github.com/apache/datafusion/pull/4343) ([alamb](https://github.com/alamb))
- Minor: refactor: isolate common memory accounting utils [\#4341](https://github.com/apache/datafusion/pull/4341) ([crepererum](https://github.com/crepererum))
- minor: make `plan_from_tables` return one plan instead of `Vec` [\#4336](https://github.com/apache/datafusion/pull/4336) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- enhancement: when fetch == 0, pushdown limit 0 instead skip+fetch. [\#4334](https://github.com/apache/datafusion/pull/4334) ([jackwener](https://github.com/jackwener))
- Teach optimizer that `CoalesceBatchesExec` does not destroy output order [\#4332](https://github.com/apache/datafusion/pull/4332) ([alamb](https://github.com/alamb))
- Add ability to disable DiskManager [\#4330](https://github.com/apache/datafusion/pull/4330) ([tustvold](https://github.com/tustvold))
- Update cli.md [\#4329](https://github.com/apache/datafusion/pull/4329) ([psvri](https://github.com/psvri))
- fix bug: right semi join can't support the filter [\#4327](https://github.com/apache/datafusion/pull/4327) ([liukun4515](https://github.com/liukun4515))
- reimplement `eliminate_limit` to remove `global-state`. [\#4324](https://github.com/apache/datafusion/pull/4324) ([jackwener](https://github.com/jackwener))
- Refine Err propagation and avoid unwrap in transform closures [\#4318](https://github.com/apache/datafusion/pull/4318) ([mingmwang](https://github.com/mingmwang))
- Add a checker to confirm physical optimizer rules will keep the physical plan schema immutable [\#4316](https://github.com/apache/datafusion/pull/4316) ([mingmwang](https://github.com/mingmwang))
- Refactor downcasting functions with downcastvalue macro and improve error handling of `ListArray` downcasting [\#4313](https://github.com/apache/datafusion/pull/4313) ([retikulum](https://github.com/retikulum))
- minor: add another test case to cover join ambiguous check [\#4305](https://github.com/apache/datafusion/pull/4305) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- Fix DESCRIBE statement qualified table issue [\#4304](https://github.com/apache/datafusion/pull/4304) [[sql](https://github.com/apache/datafusion/labels/sql)] ([gruuya](https://github.com/gruuya))
- Use tournament loser tree for k-way sort-merging, increase merge speed by 50% [\#4301](https://github.com/apache/datafusion/pull/4301) ([richox](https://github.com/richox))
- Pin Python `setuptools` in the CI to fix integration tests [\#4296](https://github.com/apache/datafusion/pull/4296) ([isidentical](https://github.com/isidentical))
- Support `SubqueryAlias` in optimizer, physical planner. [\#4293](https://github.com/apache/datafusion/pull/4293) ([jackwener](https://github.com/jackwener))
- minor: avoid a clone into string when checking ambiguous [\#4292](https://github.com/apache/datafusion/pull/4292) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- replace the comparison op for decimal array op using the arrow-rs kernel [\#4290](https://github.com/apache/datafusion/pull/4290) ([liukun4515](https://github.com/liukun4515))
- MINOR: replace `{..}` with `(_)`, typo, remove outdated TODO [\#4286](https://github.com/apache/datafusion/pull/4286) ([jackwener](https://github.com/jackwener))
- Reduce Expr copies in `ParquetExec` [\#4283](https://github.com/apache/datafusion/pull/4283) ([alamb](https://github.com/alamb))
- Fix issue in filter pushdown with overloaded projection index [\#4281](https://github.com/apache/datafusion/pull/4281) ([thinkharderdev](https://github.com/thinkharderdev))
- Skip useless pruning predicates in `ParquetExec` [\#4280](https://github.com/apache/datafusion/pull/4280) ([alamb](https://github.com/alamb))
- Push down more predicates into `ParquetExec` [\#4279](https://github.com/apache/datafusion/pull/4279) ([alamb](https://github.com/alamb))
- Fix EXPLAIN plan for ParquetExec to show pruning_predicate [\#4278](https://github.com/apache/datafusion/pull/4278) ([alamb](https://github.com/alamb))
- reimplement `limit_push_down` to remove global-state, enhance optimize and simplify code. [\#4276](https://github.com/apache/datafusion/pull/4276) ([jackwener](https://github.com/jackwener))
- Bump actions/labeler from 4.0.2 to 4.1.0 [\#4274](https://github.com/apache/datafusion/pull/4274) ([dependabot[bot]](https://github.com/apps/dependabot))
- Remove the type alias `NullColumnarValue` [\#4273](https://github.com/apache/datafusion/pull/4273) ([HaoYang670](https://github.com/HaoYang670))
- reimplement `eliminate_outer_join` [\#4272](https://github.com/apache/datafusion/pull/4272) ([jackwener](https://github.com/jackwener))
- Fix bugs in parsing `with header row` and `partitioned by` [\#4268](https://github.com/apache/datafusion/pull/4268) [[sql](https://github.com/apache/datafusion/labels/sql)] ([HaoYang670](https://github.com/HaoYang670))
- improve error messages while downcasting `UInt32Array`, `UInt64Array` and `BooleanArray` [\#4261](https://github.com/apache/datafusion/pull/4261) ([retikulum](https://github.com/retikulum))
- add ambiguous check for projection [\#4260](https://github.com/apache/datafusion/pull/4260) [[sql](https://github.com/apache/datafusion/labels/sql)] ([AssHero](https://github.com/AssHero))
- Add ambiguous check for join [\#4258](https://github.com/apache/datafusion/pull/4258) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- support cross_join in `limit_push_down` [\#4257](https://github.com/apache/datafusion/pull/4257) ([jackwener](https://github.com/jackwener))
- Support parquet page filtering on min_max for `decimal128` and `string` columns [\#4255](https://github.com/apache/datafusion/pull/4255) ([Ted-Jiang](https://github.com/Ted-Jiang))
- fix conflict and UT, cleanup redundant legacy code [\#4252](https://github.com/apache/datafusion/pull/4252) ([jackwener](https://github.com/jackwener))
- Minor: remove unnecessary clone\(\) in planner [\#4249](https://github.com/apache/datafusion/pull/4249) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- Fix nightly clippy failures [\#4246](https://github.com/apache/datafusion/pull/4246) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- Improve Error Handling and Readability for downcasting `Float32Array`, `Float64Array`, `StringArray` [\#4244](https://github.com/apache/datafusion/pull/4244) ([retikulum](https://github.com/retikulum))
- Use defaults for ListingOptions builder [\#4243](https://github.com/apache/datafusion/pull/4243) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- Support binary boolean operators with nulls [\#4242](https://github.com/apache/datafusion/pull/4242) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Fixing doc of the expression [\#4240](https://github.com/apache/datafusion/pull/4240) ([Creampanda](https://github.com/Creampanda))
- Fix negative interval parsing bug [\#4238](https://github.com/apache/datafusion/pull/4238) ([Jefffrey](https://github.com/Jefffrey))
- remove duplicate or redundant code [\#4235](https://github.com/apache/datafusion/pull/4235) ([jackwener](https://github.com/jackwener))
- add a checker to confirm optimizer can keep plan schema immutable. [\#4233](https://github.com/apache/datafusion/pull/4233) ([jackwener](https://github.com/jackwener))
- Fix the percentile argument for ApproxPercentileCont must be Float64, not Decimal128\(2, 1\) [\#4228](https://github.com/apache/datafusion/pull/4228) ([comphead](https://github.com/comphead))
- refactor how we create listing tables [\#4227](https://github.com/apache/datafusion/pull/4227) ([timvw](https://github.com/timvw))
- Update sqlparser requirement from 0.26 to 0.27 [\#4226](https://github.com/apache/datafusion/pull/4226) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- upgrade required chrono version to 0.4.23 [\#4225](https://github.com/apache/datafusion/pull/4225) ([waitingkuo](https://github.com/waitingkuo))
- Support types other than String for partition columns on ListingTables [\#4221](https://github.com/apache/datafusion/pull/4221) ([doki23](https://github.com/doki23))
- \[CBO\] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics, option for SortMergeJoin [\#4219](https://github.com/apache/datafusion/pull/4219) ([mingmwang](https://github.com/mingmwang))
- Remove alias in Union [\#4212](https://github.com/apache/datafusion/pull/4212) ([jackwener](https://github.com/jackwener))
- Add try_optimize method [\#4208](https://github.com/apache/datafusion/pull/4208) ([andygrove](https://github.com/andygrove))
- Provide a builder for ListingOptions with fixups [\#4207](https://github.com/apache/datafusion/pull/4207) ([alamb](https://github.com/alamb))
- Avoid error with empty iterators used for `ScalarValue::iter_to_array` [\#4206](https://github.com/apache/datafusion/pull/4206) ([GrandChaman](https://github.com/GrandChaman))
- Improve error message for regexp_match 'g' flag [\#4203](https://github.com/apache/datafusion/pull/4203) ([Jefffrey](https://github.com/Jefffrey))
- Return `ResourceExhausted` errors when memory limit is exceeded in `GroupedHashAggregateStreamV2` \(Row Hash\) [\#4202](https://github.com/apache/datafusion/pull/4202) ([crepererum](https://github.com/crepererum))
- Add additional expr boolean simplification rules [\#4200](https://github.com/apache/datafusion/pull/4200) ([Jefffrey](https://github.com/Jefffrey))
- Update to arrow and parquet 27.0.0 [\#4199](https://github.com/apache/datafusion/pull/4199) [[sql](https://github.com/apache/datafusion/labels/sql)] ([tustvold](https://github.com/tustvold))
- Support `create table` with explicit column definitions [\#4194](https://github.com/apache/datafusion/pull/4194) [[sql](https://github.com/apache/datafusion/labels/sql)] ([doki23](https://github.com/doki23))
- Support all equality predicates in equality join [\#4193](https://github.com/apache/datafusion/pull/4193) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- add `propagate_empty_relation` optimizer rule [\#4192](https://github.com/apache/datafusion/pull/4192) ([jackwener](https://github.com/jackwener))
- fix clippy [\#4190](https://github.com/apache/datafusion/pull/4190) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- Fix clippy by avoiding deprecated functions in chrono [\#4189](https://github.com/apache/datafusion/pull/4189) ([alamb](https://github.com/alamb))
- Disallow duplicate interval types during parsing [\#4188](https://github.com/apache/datafusion/pull/4188) ([Jefffrey](https://github.com/Jefffrey))
- Parse nanoseconds for intervals [\#4186](https://github.com/apache/datafusion/pull/4186) ([Jefffrey](https://github.com/Jefffrey))
- Add rule to reimplement `Eliminate cross join` and remove it in planner [\#4185](https://github.com/apache/datafusion/pull/4185) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- \[FOLLOWUP\] Enforcement Rule: resolve review comments, refactor adjust_input_keys_ordering\(\) [\#4184](https://github.com/apache/datafusion/pull/4184) ([mingmwang](https://github.com/mingmwang))
- Simplify boolean parquet pushdown predicate [\#4182](https://github.com/apache/datafusion/pull/4182) ([Jefffrey](https://github.com/Jefffrey))
- Minor: consolidate parquet `custom_reader` integration test into parquet_exec [\#4175](https://github.com/apache/datafusion/pull/4175) ([alamb](https://github.com/alamb))
- minor: remove redundant println and cleanup [\#4173](https://github.com/apache/datafusion/pull/4173) ([jackwener](https://github.com/jackwener))
- Add ability to specify external sort information for ListingTables [\#4170](https://github.com/apache/datafusion/pull/4170) ([alamb](https://github.com/alamb))
- Improve Error Handling and Readability for downcasting `Decimal128Array` [\#4168](https://github.com/apache/datafusion/pull/4168) ([retikulum](https://github.com/retikulum))
- Minor: Remove completed comment on parquet row group pruning [\#4167](https://github.com/apache/datafusion/pull/4167) ([alamb](https://github.com/alamb))
- Update hashbrown requirement from 0.12 to 0.13 [\#4164](https://github.com/apache/datafusion/pull/4164) ([dependabot[bot]](https://github.com/apps/dependabot))
- MINOR: enable `dyn_cmp_dict` feature on arrow for physical expr crate [\#4163](https://github.com/apache/datafusion/pull/4163) ([isidentical](https://github.com/isidentical))
- Derive filter statistic estimates from the predicate expression [\#4162](https://github.com/apache/datafusion/pull/4162) ([isidentical](https://github.com/isidentical))
- Minor: pass `ParquetFileMetrics` to `build_row_filter` in parquet [\#4161](https://github.com/apache/datafusion/pull/4161) ([alamb](https://github.com/alamb))
- Minor: Extract parquet row group pruning code into its own module [\#4160](https://github.com/apache/datafusion/pull/4160) ([alamb](https://github.com/alamb))
- Full support for time32 and time64 literal values \(`ScalarValue`\) [\#4156](https://github.com/apache/datafusion/pull/4156) ([andre-cc-natzka](https://github.com/andre-cc-natzka))
- Window frame GROUPS mode support [\#4155](https://github.com/apache/datafusion/pull/4155) ([zembunia](https://github.com/zembunia))
- Improve error messages while downcasting Int64Array [\#4154](https://github.com/apache/datafusion/pull/4154) ([retikulum](https://github.com/retikulum))
- Add another method to collect referenced columns from an expression [\#4153](https://github.com/apache/datafusion/pull/4153) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- Remove BoxedAsyncFileReader [\#4150](https://github.com/apache/datafusion/pull/4150) ([tustvold](https://github.com/tustvold))
- Support unsigned integers in `unwrap_cast_in_comparison` Optimizer rule [\#4149](https://github.com/apache/datafusion/pull/4149) ([alamb](https://github.com/alamb))
- Add support for `DataType::Timestamp` casts in `unwrap_cast_in_comparison` optimizer pass [\#4148](https://github.com/apache/datafusion/pull/4148) ([alamb](https://github.com/alamb))
- Add additional testing for `unwrap_cast_in_comparison` [\#4147](https://github.com/apache/datafusion/pull/4147) ([alamb](https://github.com/alamb))
- improve error messages while downcasting Int32Array [\#4146](https://github.com/apache/datafusion/pull/4146) ([retikulum](https://github.com/retikulum))
- Minor: Update docstring on unwrap_cast_in_comparison [\#4145](https://github.com/apache/datafusion/pull/4145) ([alamb](https://github.com/alamb))
- add schema parameter to table provider factory create method [\#4143](https://github.com/apache/datafusion/pull/4143) ([milenkovicm](https://github.com/milenkovicm))
- fix: shouldn't pass alias through into subquery. [\#4141](https://github.com/apache/datafusion/pull/4141) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- Preserve the `Cast` expression in `columnize_expr` [\#4137](https://github.com/apache/datafusion/pull/4137) [[sql](https://github.com/apache/datafusion/labels/sql)] ([HaoYang670](https://github.com/HaoYang670))
- Set versions to dependencies with path in benchmarks Cargo.toml file [\#4136](https://github.com/apache/datafusion/pull/4136) ([ArkashaJavelin](https://github.com/ArkashaJavelin))
- Fix links [\#4135](https://github.com/apache/datafusion/pull/4135) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- Use f64::total_cmp instead of OrderedFloat [\#4133](https://github.com/apache/datafusion/pull/4133) ([comphead](https://github.com/comphead))
- Add parquet integration tests for explicitly smaller page sizes, page pruning [\#4131](https://github.com/apache/datafusion/pull/4131) ([alamb](https://github.com/alamb))
- Consolidate `ParquetExec` tests in `parquet_exec` integration test [\#4130](https://github.com/apache/datafusion/pull/4130) ([alamb](https://github.com/alamb))
- Minor: Use upstream `BooleanArray::true_count` [\#4129](https://github.com/apache/datafusion/pull/4129) ([alamb](https://github.com/alamb))
- Combined TPCH runs & uniformed summaries for benchmarks [\#4128](https://github.com/apache/datafusion/pull/4128) ([isidentical](https://github.com/isidentical))
- Enable TableProviderFactories to receive additional options when creating an external table [\#4126](https://github.com/apache/datafusion/pull/4126) [[sql](https://github.com/apache/datafusion/labels/sql)] ([timvw](https://github.com/timvw))
- Add CI check that configs.md is up-to-date [\#4124](https://github.com/apache/datafusion/pull/4124) ([mvanschellebeeck](https://github.com/mvanschellebeeck))
- \[Part3\] Partition and Sort Enforcement, Enforcement rule implementation [\#4122](https://github.com/apache/datafusion/pull/4122) ([mingmwang](https://github.com/mingmwang))
- reuse code `utils::optimize_children` but affect inline. [\#4121](https://github.com/apache/datafusion/pull/4121) ([jackwener](https://github.com/jackwener))
- reuse code `utils::optimize_children` instead of redundant implementation [\#4119](https://github.com/apache/datafusion/pull/4119) ([jackwener](https://github.com/jackwener))
- Allow listing tables to be created via TableFactories [\#4112](https://github.com/apache/datafusion/pull/4112) ([avantgardnerio](https://github.com/avantgardnerio))
- Update SQL reference to state that decimal support is currently experimental [\#4109](https://github.com/apache/datafusion/pull/4109) ([andygrove](https://github.com/andygrove))
- Add metrics for parquet page level skipping [\#4105](https://github.com/apache/datafusion/pull/4105) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Add parser option for parsing SQL numeric literals as decimal [\#4102](https://github.com/apache/datafusion/pull/4102) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Replace RwLock\<HashMap\> and Mutex\<HashMap\> by using DashMap [\#4079](https://github.com/apache/datafusion/pull/4079) ([yahoNanJing](https://github.com/yahoNanJing))
- Custom window frame support extended to built-in window functions [\#4078](https://github.com/apache/datafusion/pull/4078) ([mustafasrepo](https://github.com/mustafasrepo))
- Enable tests for page index filtering in parquet filter pushdown test [\#4062](https://github.com/apache/datafusion/pull/4062) ([alamb](https://github.com/alamb))
- \[Part2\] Partition and Sort Enforcement, ExecutionPlan enhancement [\#4043](https://github.com/apache/datafusion/pull/4043) ([mingmwang](https://github.com/mingmwang))
- add support for xz file compression and `compression` feature [\#3993](https://github.com/apache/datafusion/pull/3993) [[sql](https://github.com/apache/datafusion/labels/sql)] ([Jimexist](https://github.com/Jimexist))
- Expression boundary analysis framework [\#3912](https://github.com/apache/datafusion/pull/3912) ([isidentical](https://github.com/isidentical))