<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## [14.0.0](https://github.com/apache/datafusion/tree/14.0.0) (2022-11-04)
[Full Changelog](https://github.com/apache/datafusion/compare/13.0.0-rc1...14.0.0)
**Breaking changes:**
- Improve FieldNotFound errors [\#4084](https://github.com/apache/datafusion/pull/4084) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Refactor: move `simplify_expression.rs` and `expr_simplifier.rs` to a new mod `simplify_expressions` [\#3951](https://github.com/apache/datafusion/pull/3951) ([HaoYang670](https://github.com/HaoYang670))
- Support for non-u64 types for Window Bound [\#3916](https://github.com/apache/datafusion/pull/3916) [[sql](https://github.com/apache/datafusion/labels/sql)] ([mustafasrepo](https://github.com/mustafasrepo))
- Expose parquet reader settings using normal DataFusion `ConfigOptions` [\#3822](https://github.com/apache/datafusion/pull/3822) ([alamb](https://github.com/alamb))
- Add `Filter::try_new` with validation [\#3796](https://github.com/apache/datafusion/pull/3796) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Change public simplify API and add a public coerce API [\#3758](https://github.com/apache/datafusion/pull/3758) ([alamb](https://github.com/alamb))
**Implemented enhancements:**
- Automatically register tables if ObjectStore root is configured [\#4094](https://github.com/apache/datafusion/issues/4094)
- Simplify small `InList` expressions [\#4089](https://github.com/apache/datafusion/issues/4089)
- Support `SET` command [\#4067](https://github.com/apache/datafusion/issues/4067)
- add uuid\(\) function to generate unique uuid per row [\#4045](https://github.com/apache/datafusion/issues/4045)
- Publish benchmark crate so that it can be used as a library in Ballista [\#4016](https://github.com/apache/datafusion/issues/4016)
- Add statistics methods to `TableProvider` trait for use in cost-based optimizations in the logical plan [\#3983](https://github.com/apache/datafusion/issues/3983)
- Implement `current_time` Function [\#3982](https://github.com/apache/datafusion/issues/3982)
- Implement `current_date` Function [\#3981](https://github.com/apache/datafusion/issues/3981)
- Put common code used for testing code into datafusion/test_utils.rs [\#3960](https://github.com/apache/datafusion/issues/3960)
- Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings [\#3952](https://github.com/apache/datafusion/issues/3952)
- Don't make dependants install protoc [\#3947](https://github.com/apache/datafusion/issues/3947)
- Implement right anti join and support it in HashBuildProbeOrder [\#3946](https://github.com/apache/datafusion/issues/3946)
- Implement right semi join and support it in HashBuildProbeOrder [\#3945](https://github.com/apache/datafusion/issues/3945)
- Refactor `simplify_expressions` and `expr_simplifier` [\#3934](https://github.com/apache/datafusion/issues/3934)
- Implement serialization for `ScalarValue::FixedSizeBinary` [\#3928](https://github.com/apache/datafusion/issues/3928)
- Support inlining view / dataframes logical plan [\#3913](https://github.com/apache/datafusion/issues/3913)
- Plans with tables from `TableProviderFactory`s can't be serialized [\#3906](https://github.com/apache/datafusion/issues/3906)
- Simplify `a AND a` and `a OR a`. [\#3895](https://github.com/apache/datafusion/issues/3895)
- Allow configuring statistics on TPC-H benchmarks [\#3888](https://github.com/apache/datafusion/issues/3888)
- CI checks stuck in queued mode [\#3883](https://github.com/apache/datafusion/issues/3883)
- Multiple optimizer passes [\#3879](https://github.com/apache/datafusion/issues/3879)
- datafusion-proto does not support view table scan [\#3874](https://github.com/apache/datafusion/issues/3874)
- TableProviderFactories need to be async and return a Result to be useful [\#3866](https://github.com/apache/datafusion/issues/3866)
- Factorize common AND factors out of OR predicates to support filterPushDown as possible [\#3858](https://github.com/apache/datafusion/issues/3858)
- Replace `concat_ws` with `concat` when the delimiter is empty string [\#3857](https://github.com/apache/datafusion/issues/3857)
- Concatenate contiguous literal arguments of `concat_ws` when doing the expression simplification [\#3856](https://github.com/apache/datafusion/issues/3856)
- Partition and Sort Enforcement [\#3854](https://github.com/apache/datafusion/issues/3854)
- Enable mimalloc by default in benchmarks [\#3851](https://github.com/apache/datafusion/issues/3851)
- Add collect statistics configuration [\#3847](https://github.com/apache/datafusion/issues/3847)
- \[SQL\] - Support cache/uncache table syntax [\#3842](https://github.com/apache/datafusion/issues/3842)
- Filter pushdown doesn't seem to apply for filter on TPC-H Q17 [\#3839](https://github.com/apache/datafusion/issues/3839)
- Support pushdown multi-columns in PageIndex pruning. [\#3834](https://github.com/apache/datafusion/issues/3834)
- Consolidate `Expr` manipulation code so it is more discoverable and make it easier to use [\#3808](https://github.com/apache/datafusion/issues/3808)
- Leverage input array's null buffer for regex replace to optimize sparse arrays [\#3803](https://github.com/apache/datafusion/issues/3803)
- Improve join cardinality estimation when there is no overlap in the min/max values [\#3802](https://github.com/apache/datafusion/issues/3802)
- datafusion-cli up to date check is failing on master [\#3798](https://github.com/apache/datafusion/issues/3798)
- Optimize benchmark q2 subquery filter [\#3789](https://github.com/apache/datafusion/issues/3789)
- Benchmark should infer schema when running against Parquet [\#3776](https://github.com/apache/datafusion/issues/3776)
- Allow specialized physical functions to provide hints for the array adapter [\#3762](https://github.com/apache/datafusion/issues/3762)
- \[User Guide\] Add `EXPLAIN` to SQL reference [\#3755](https://github.com/apache/datafusion/issues/3755)
- move `type coercion` for agg/agg udf [\#3752](https://github.com/apache/datafusion/issues/3752)
- Prevent Cargo.lock for datafusion-cli being out-of-date [\#3744](https://github.com/apache/datafusion/issues/3744)
- Add example of expr apis including simplification and coercion [\#3740](https://github.com/apache/datafusion/issues/3740)
- support `type coercion` for ScalarFunction expr in the logical phase [\#3731](https://github.com/apache/datafusion/issues/3731)
- Add support for DISTINCT projections in `decorrelate_where_exists` [\#3724](https://github.com/apache/datafusion/issues/3724)
- Add type coercion rule for `CONCAT` and `CONCAT_WS` [\#3720](https://github.com/apache/datafusion/issues/3720)
- Expose and document a simpler public API for simplify expressions [\#3709](https://github.com/apache/datafusion/issues/3709)
- Expose + document the type coercion API publicly [\#3708](https://github.com/apache/datafusion/issues/3708)
- Concatenate contiguous literal arguments of `CONCAT` during the expression simplification. [\#3683](https://github.com/apache/datafusion/issues/3683)
- DataFusion 13.0.0 Release [\#3671](https://github.com/apache/datafusion/issues/3671)
- Add division by `0` rules in the expression simplification [\#3663](https://github.com/apache/datafusion/issues/3663)
- Compressed CSV/JSON Read [\#3641](https://github.com/apache/datafusion/issues/3641)
- remove type coercion for agg [\#3623](https://github.com/apache/datafusion/issues/3623)
- extract or clause as predicate for join rels [\#3577](https://github.com/apache/datafusion/issues/3577)
- Improve performance of `regex_replace` [\#3518](https://github.com/apache/datafusion/issues/3518)
- Add benchmarks for parquet queries with filter pushdown enabled [\#3457](https://github.com/apache/datafusion/issues/3457)
- Make type coercion rule more robust [\#3390](https://github.com/apache/datafusion/issues/3390)
- `ViewTable::scan` ignores filters and limits [\#3249](https://github.com/apache/datafusion/issues/3249)
- Add `CREATE VIEW` documentation to user guide [\#3211](https://github.com/apache/datafusion/issues/3211)
- Push additional parquet filtering into the parquet scan \[EPIC\] [\#3147](https://github.com/apache/datafusion/issues/3147)
- Remove `core/logical_plan` module [\#2683](https://github.com/apache/datafusion/issues/2683)
- Datafusion Optimizer Enhancement [\#2255](https://github.com/apache/datafusion/issues/2255)
- \[Optimizer\] Eliminate self compare self [\#2252](https://github.com/apache/datafusion/issues/2252)
- Break datafusion crate into smaller crates [\#1750](https://github.com/apache/datafusion/issues/1750)
- Benchmark `constellation-rs/amadeus`'s parquet implementation [\#1341](https://github.com/apache/datafusion/issues/1341)
- Use `parquet2` async reader in `physical_plan/parquet` [\#1058](https://github.com/apache/datafusion/issues/1058)
- Table Scan Enhancement Plan [\#944](https://github.com/apache/datafusion/issues/944)
- Implement parquet page-level skipping with column index, using min/max stats [\#847](https://github.com/apache/datafusion/issues/847)
- Support min/max statistics in ParquetTable and ParquetExec [\#537](https://github.com/apache/datafusion/issues/537)
**Fixed bugs:**
- Clippy failing on master [\#4100](https://github.com/apache/datafusion/issues/4100)
- Panic when the number of partitions of the pipeline that throws the exception is inconsistent with the number of partitions output by the query [\#4096](https://github.com/apache/datafusion/issues/4096)
- FieldNotFound when field is available [\#4083](https://github.com/apache/datafusion/issues/4083)
- SingleDistinctToGroupBy being applied too broadly [\#4082](https://github.com/apache/datafusion/issues/4082)
- single_distinct_to_groupby strips qualifiers from group-by expressions [\#4049](https://github.com/apache/datafusion/issues/4049)
- Another Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" [\#4046](https://github.com/apache/datafusion/issues/4046)
- Decimal multiplied by Float produces incorrect results [\#4035](https://github.com/apache/datafusion/issues/4035)
- Cannot query external table - TableScan replaced with EmptyExec [\#4027](https://github.com/apache/datafusion/issues/4027)
- benchmark q17 produces incorrect result [\#4026](https://github.com/apache/datafusion/issues/4026)
- benchmark q14 produces incorrect result [\#4025](https://github.com/apache/datafusion/issues/4025)
- benchmark q11 producing incorrect results [\#4023](https://github.com/apache/datafusion/issues/4023)
- Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:" [\#4006](https://github.com/apache/datafusion/issues/4006)
- Incorrect results with parquet filtering pushdown enabled [\#4005](https://github.com/apache/datafusion/issues/4005)
- Wrong results when parquet page index filtering is enabled [\#4002](https://github.com/apache/datafusion/issues/4002)
- Output schema of semi join has invalid projection added after HashBuildProbeOrder [\#4001](https://github.com/apache/datafusion/issues/4001)
- `async` deserialization functions are unintuitive and possibly insecure [\#3977](https://github.com/apache/datafusion/issues/3977)
- `Expr::to_bytes` can produce output that hits `Expr::from_bytes` recursion limit [\#3968](https://github.com/apache/datafusion/issues/3968)
- Bug on propagating arrow field metadata [\#3964](https://github.com/apache/datafusion/issues/3964)
- Predicate still has cast when comparing Timestamp\(Nano, None\) to a timestamp literal, so can't be pushed down or used for pruning [\#3938](https://github.com/apache/datafusion/issues/3938)
- Error using `IN` list on dictionary encoded data: `InList does not support datatype Dictionary(Int32, Utf8).` [\#3936](https://github.com/apache/datafusion/issues/3936)
- Internal error in CAST from Timestamp\[us\] [\#3922](https://github.com/apache/datafusion/issues/3922)
- ScalarValue not implemented for FixedSizeBinary types [\#3910](https://github.com/apache/datafusion/issues/3910)
- \[DOC\] - There are unsupported DDL in the official documentation [\#3904](https://github.com/apache/datafusion/issues/3904)
- datafusion-proto deserialize with Substring\(str \[from int\] \[for int\]\) fails [\#3901](https://github.com/apache/datafusion/issues/3901)
- `count(Literal)` gives wrong column name [\#3891](https://github.com/apache/datafusion/issues/3891)
- `projection_push_down` adds duplicate projections with multiple passes [\#3881](https://github.com/apache/datafusion/issues/3881)
- Default physical planner generates empty relation for DROP TABLE, CREATE MEMORY TABLE, etc [\#3873](https://github.com/apache/datafusion/issues/3873)
- Binary expression canonical names are incorrect in some cases [\#3865](https://github.com/apache/datafusion/issues/3865)
- Using the window function lag causes panic. [\#3830](https://github.com/apache/datafusion/issues/3830)
- chrono crate : specify 0.4.22 as the minimum version due to spurious build failures [\#3827](https://github.com/apache/datafusion/issues/3827)
- datafusion-proto deserialize with q16 sql fails [\#3820](https://github.com/apache/datafusion/issues/3820)
- Filter predicates should not be aliased [\#3795](https://github.com/apache/datafusion/issues/3795)
- Writing CSV does not save all lines of the DataFrame [\#3783](https://github.com/apache/datafusion/issues/3783)
- Regression in simplifying expressions in subqueries [\#3760](https://github.com/apache/datafusion/issues/3760)
- DataFusionError\(Internal\("The size of the sorted batch is larger than the size of the input batch: 2120 \> 2312"\)\) [\#3747](https://github.com/apache/datafusion/issues/3747)
- "labeler" PR check is broken [\#3743](https://github.com/apache/datafusion/issues/3743)
- `DataFrame::select_columns` doesn't work with names containing "." [\#3733](https://github.com/apache/datafusion/issues/3733)
- TPC-H Query 1 has regressed [\#3729](https://github.com/apache/datafusion/issues/3729)
- \[RUST\]\[Datafusion\] What causes "Error: Execution\("file size of 4 is less than footer"\)" error? [\#3800](https://github.com/apache/datafusion/issues/3800)
- Field names containing periods such as f.c cannot work [\#3682](https://github.com/apache/datafusion/issues/3682)
- TableProvider implementation for DataFrame does not support filter pushdown [\#3681](https://github.com/apache/datafusion/issues/3681)
- Using Decimal\(0\) makes the system panic [\#3665](https://github.com/apache/datafusion/issues/3665)
- Cannot query some parquet files in S3, but they work locally [\#3633](https://github.com/apache/datafusion/issues/3633)
- `col / col` returns `1` when `col = 0` [\#3615](https://github.com/apache/datafusion/issues/3615)
- register_csv allow space in table_path [\#3589](https://github.com/apache/datafusion/issues/3589)
- Hardcoded u64 for WindowFrameBound fields [\#3571](https://github.com/apache/datafusion/issues/3571)
- `docs.rs` cannot build `datafusion-proto` crate [\#3538](https://github.com/apache/datafusion/issues/3538)
- Row Hash loads whole aggregation state to memory before sending [\#3460](https://github.com/apache/datafusion/issues/3460)
- approx_percentile_cont return wrong result when scan multi parquet files. [\#3140](https://github.com/apache/datafusion/issues/3140)
- User guide is incorrect regarding using CLI to register CSV files using schema inference [\#3001](https://github.com/apache/datafusion/issues/3001)
- Exception: Internal error, Exception: Schema error [\#2938](https://github.com/apache/datafusion/issues/2938)
- Version 0.6.0 Panic error during SQL execution [\#2738](https://github.com/apache/datafusion/issues/2738)
- wrong result when operation parquet [\#2044](https://github.com/apache/datafusion/issues/2044)
- Local object store accepts file:/// as base path, but LocalStore returns meta without the prefix. [\#1923](https://github.com/apache/datafusion/issues/1923)
- Reading nested parquet files results in `index out of bounds` [\#1383](https://github.com/apache/datafusion/issues/1383)
- `-` \(negation\) with NULL literals does not work: can't be evaluated because the expression's type is Utf8, not signed [\#1192](https://github.com/apache/datafusion/issues/1192)
- Inconsistent cast behavior [\#957](https://github.com/apache/datafusion/issues/957)
- single_distinct_to_groupby no longer drops qualifiers [\#4050](https://github.com/apache/datafusion/pull/4050) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
**Documentation updates:**
- Clarify in docs that Identifiers are made lower-case in SQL query [\#2374](https://github.com/apache/datafusion/issues/2374)
- Fix broken links in contributor guide [\#3956](https://github.com/apache/datafusion/pull/3956) ([Jefffrey](https://github.com/Jefffrey))
- add create view explanation [\#3925](https://github.com/apache/datafusion/pull/3925) ([retikulum](https://github.com/retikulum))
- Update `datafusion-examples` README [\#3814](https://github.com/apache/datafusion/pull/3814) ([alamb](https://github.com/alamb))
- Add Seafowl to list of projects using DataFusion [\#3792](https://github.com/apache/datafusion/pull/3792) ([mildbyte](https://github.com/mildbyte))
**Closed issues:**
- \[QUESTION\] How many times should be the function `create_name` called when executing a query? [\#3900](https://github.com/apache/datafusion/issues/3900)
- Improve the `Expr` string format [\#3878](https://github.com/apache/datafusion/issues/3878)
- Simplify division by zero \(division by one / multiplication by zero / multiplication by one\) for Decimal types as well [\#3643](https://github.com/apache/datafusion/issues/3643)
- InList: merge check branch [\#2833](https://github.com/apache/datafusion/issues/2833)
- Optimization InList: compare the float data type using OrderedFloat\<T\> [\#2831](https://github.com/apache/datafusion/issues/2831)
- Outdated section of the add function of the contribution guide [\#2560](https://github.com/apache/datafusion/issues/2560)
- Optimize InList implementation with native types rather than ScalarValue [\#2165](https://github.com/apache/datafusion/issues/2165)
- Improve testing of optimizers using EXPLAIN [\#1118](https://github.com/apache/datafusion/issues/1118)
- Crash on parsing sql query with Cyrillic letters [\#184](https://github.com/apache/datafusion/issues/184)
- \[EPIC\] Support all TPC-H queries in benchmark [\#158](https://github.com/apache/datafusion/issues/158)
- Implement optional second argument to ltrim and rtrim functions [\#144](https://github.com/apache/datafusion/issues/144)
- Benchmark crate does not have a SIMD feature [\#124](https://github.com/apache/datafusion/issues/124)
- ColumnarValue::into_array should not require batch [\#113](https://github.com/apache/datafusion/issues/113)
- \[Rust\] Parquet data source does not support complex types [\#83](https://github.com/apache/datafusion/issues/83)
**Merged pull requests:**
- Appease new clippy [\#4101](https://github.com/apache/datafusion/pull/4101) ([alamb](https://github.com/alamb))
- minor: Split parquet reader up into smaller modules [\#4099](https://github.com/apache/datafusion/pull/4099) ([alamb](https://github.com/alamb))
- \[MINOR\] Update `SET` in cli.md [\#4098](https://github.com/apache/datafusion/pull/4098) ([waitingkuo](https://github.com/waitingkuo))
- fix: Scheduler panic routing errors [\#4097](https://github.com/apache/datafusion/pull/4097) ([yukkit](https://github.com/yukkit))
- Automatically register tables if ObjectStore root is configured [\#4095](https://github.com/apache/datafusion/pull/4095) ([avantgardnerio](https://github.com/avantgardnerio))
- minor: Use Operator::swap [\#4092](https://github.com/apache/datafusion/pull/4092) ([alamb](https://github.com/alamb))
- Simplify small InListExpr [\#4090](https://github.com/apache/datafusion/pull/4090) ([Dandandan](https://github.com/Dandandan))
- Minor: Add arrow-rs ticket reference and turn some comments into docstrings [\#4088](https://github.com/apache/datafusion/pull/4088) ([alamb](https://github.com/alamb))
- Support Dictionary in InListExpr [\#4070](https://github.com/apache/datafusion/pull/4070) ([tustvold](https://github.com/tustvold))
- support `SET` variable [\#4069](https://github.com/apache/datafusion/pull/4069) [[sql](https://github.com/apache/datafusion/labels/sql)] ([waitingkuo](https://github.com/waitingkuo))
- Add in list bench [\#4068](https://github.com/apache/datafusion/pull/4068) ([tustvold](https://github.com/tustvold))
- Improve Error Handling and Readability for downcasting `StructArray` [\#4061](https://github.com/apache/datafusion/pull/4061) ([retikulum](https://github.com/retikulum))
- Build tests separately from running [\#4060](https://github.com/apache/datafusion/pull/4060) ([alamb](https://github.com/alamb))
- Simplify InListExpr ~20-70% Faster [\#4057](https://github.com/apache/datafusion/pull/4057) ([tustvold](https://github.com/tustvold))
- MINOR: Print unoptimized logical plan in execute_query of tpch benchmark [\#4056](https://github.com/apache/datafusion/pull/4056) ([viirya](https://github.com/viirya))
- Minor: clean the code in `eliminate_filter` [\#4055](https://github.com/apache/datafusion/pull/4055) ([HaoYang670](https://github.com/HaoYang670))
- Implement `current_time` scalar function [\#4054](https://github.com/apache/datafusion/pull/4054) ([naosense](https://github.com/naosense))
- Cleanup hash_utils adding support for decimal256 and f16 [\#4053](https://github.com/apache/datafusion/pull/4053) ([tustvold](https://github.com/tustvold))
- Fix multicolumn parquet predicate pushdown \(\#4046\) [\#4048](https://github.com/apache/datafusion/pull/4048) ([tustvold](https://github.com/tustvold))
- Add CI checks that we can serde all benchmark queries [\#4047](https://github.com/apache/datafusion/pull/4047) ([andygrove](https://github.com/andygrove))
- Enable more benchmark verification tests [\#4044](https://github.com/apache/datafusion/pull/4044) ([andygrove](https://github.com/andygrove))
- Extract common parquet testing code to `parquet-test-util` crate [\#4042](https://github.com/apache/datafusion/pull/4042) ([alamb](https://github.com/alamb))
- add uuid\(\) function [\#4041](https://github.com/apache/datafusion/pull/4041) ([Jimexist](https://github.com/Jimexist))
- Update to arrow 26, change timezones [\#4039](https://github.com/apache/datafusion/pull/4039) [[sql](https://github.com/apache/datafusion/labels/sql)] ([tustvold](https://github.com/tustvold))
- Fix Decimal and Floating type coerce rule [\#4038](https://github.com/apache/datafusion/pull/4038) ([viirya](https://github.com/viirya))
- Reserve the literal expression of `Count` function [\#4031](https://github.com/apache/datafusion/pull/4031) [[sql](https://github.com/apache/datafusion/labels/sql)] ([HaoYang670](https://github.com/HaoYang670))
- Implement current_date scalar function [\#4022](https://github.com/apache/datafusion/pull/4022) ([comphead](https://github.com/comphead))
- Fix predicate pushdown bugs: project columns within DatafusionArrowPredicate \(\#4005\) \(\#4006\) [\#4021](https://github.com/apache/datafusion/pull/4021) ([tustvold](https://github.com/tustvold))
- minor: remove redundant code/TODO [\#4019](https://github.com/apache/datafusion/pull/4019) ([jackwener](https://github.com/jackwener))
- Add CI check to verify that benchmark queries return the expected results [\#4015](https://github.com/apache/datafusion/pull/4015) ([andygrove](https://github.com/andygrove))
- Minor: Add TODO and tracking ticket reference [\#4012](https://github.com/apache/datafusion/pull/4012) ([alamb](https://github.com/alamb))
- Add right anti join support and support it in HashBuildProbeOrder [\#4011](https://github.com/apache/datafusion/pull/4011) ([Dandandan](https://github.com/Dandandan))
- MINOR: Generate expected benchmark query results [\#4010](https://github.com/apache/datafusion/pull/4010) ([andygrove](https://github.com/andygrove))
- Minor: remove unnecessary clippy allow [\#4008](https://github.com/apache/datafusion/pull/4008) ([alamb](https://github.com/alamb))
- Minor: Do what clippy says and clean up some code [\#4007](https://github.com/apache/datafusion/pull/4007) ([alamb](https://github.com/alamb))
- Improve Error Handling and Readability for downcasting `Date32Array` [\#4004](https://github.com/apache/datafusion/pull/4004) ([retikulum](https://github.com/retikulum))
- Don't add projection for semi joins in HashBuildProbeOrder [\#4000](https://github.com/apache/datafusion/pull/4000) ([Dandandan](https://github.com/Dandandan))
- Minor: use `DataType::is_nested` [\#3995](https://github.com/apache/datafusion/pull/3995) ([alamb](https://github.com/alamb))
- \[minor\] bump prettier version [\#3992](https://github.com/apache/datafusion/pull/3992) ([Jimexist](https://github.com/Jimexist))
- Add parquet predicate pushdown metrics [\#3989](https://github.com/apache/datafusion/pull/3989) ([alamb](https://github.com/alamb))
- Pin datafusion-proto build dependencies [\#3987](https://github.com/apache/datafusion/pull/3987) ([tustvold](https://github.com/tustvold))
- Add TableProvider.statistics method [\#3986](https://github.com/apache/datafusion/pull/3986) ([andygrove](https://github.com/andygrove))
- Add Pull Request guidelines to contributor guide [\#3985](https://github.com/apache/datafusion/pull/3985) ([alamb](https://github.com/alamb))
- Update protos [\#3979](https://github.com/apache/datafusion/pull/3979) ([tustvold](https://github.com/tustvold))
- Revert async changes but keep deltalake working [\#3978](https://github.com/apache/datafusion/pull/3978) ([avantgardnerio](https://github.com/avantgardnerio))
- Correctness integration test for parquet filter pushdown [\#3976](https://github.com/apache/datafusion/pull/3976) ([alamb](https://github.com/alamb))
- MINOR: Stop pretty printing batches in benchmark when there are no results [\#3974](https://github.com/apache/datafusion/pull/3974) ([andygrove](https://github.com/andygrove))
- MINOR: Re-export Cast struct [\#3971](https://github.com/apache/datafusion/pull/3971) ([andygrove](https://github.com/andygrove))
- fix: check recursion limit in `Expr::to_bytes` [\#3970](https://github.com/apache/datafusion/pull/3970) ([crepererum](https://github.com/crepererum))
- \[Part1\] Partition and Sort Enforcement, PhysicalExpr enhancement [\#3969](https://github.com/apache/datafusion/pull/3969) ([mingmwang](https://github.com/mingmwang))
- Support pushdown multi-columns in PageIndex pruning. [\#3967](https://github.com/apache/datafusion/pull/3967) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Fix benchmarks README formatting [\#3966](https://github.com/apache/datafusion/pull/3966) ([Jefffrey](https://github.com/Jefffrey))
- Bug fix on DFField to Field conversion: preserve metadata [\#3965](https://github.com/apache/datafusion/pull/3965) ([metesynnada](https://github.com/metesynnada))
- Informative Error Message for LAG and LEAD functions [\#3963](https://github.com/apache/datafusion/pull/3963) ([mustafasrepo](https://github.com/mustafasrepo))
- Minor: Add some docstrings to `FileScanConfig` and `RuntimeEnv` [\#3962](https://github.com/apache/datafusion/pull/3962) ([alamb](https://github.com/alamb))
- Move common code used for testing code into datafusion/test_utils [\#3961](https://github.com/apache/datafusion/pull/3961) ([alamb](https://github.com/alamb))
- Update minimum chrono dependency to 0.4.22 [\#3959](https://github.com/apache/datafusion/pull/3959) ([alamb](https://github.com/alamb))
- Implement right semi join and support in HashBuildProbeOrder [\#3958](https://github.com/apache/datafusion/pull/3958) ([Dandandan](https://github.com/Dandandan))
- Print the configurations of ConfigOptions in an ordered way so that we can directly compare the equality of two ConfigOptions by their debug strings [\#3953](https://github.com/apache/datafusion/pull/3953) ([yahoNanJing](https://github.com/yahoNanJing))
- Vendor Generated Protobuf Code \(\#3947\) [\#3950](https://github.com/apache/datafusion/pull/3950) ([tustvold](https://github.com/tustvold))
- Implement serialization for ScalarValue::FixedSizeBinary [\#3943](https://github.com/apache/datafusion/pull/3943) ([retikulum](https://github.com/retikulum))
- Consolidate physical join code into `datafusion/core/src/physical_plan/joins` [\#3942](https://github.com/apache/datafusion/pull/3942) ([alamb](https://github.com/alamb))
- Add optimizer test for simplifying predicates on timestamps [\#3939](https://github.com/apache/datafusion/pull/3939) ([alamb](https://github.com/alamb))
- Add test for querying predicate on dictionary [\#3937](https://github.com/apache/datafusion/pull/3937) ([alamb](https://github.com/alamb))
- fix: return error for unsupported SQL [\#3933](https://github.com/apache/datafusion/pull/3933) ([Kikkon](https://github.com/Kikkon))
- doc: fix doc about `CREATE TABLE IF NOT EXISTS` [\#3932](https://github.com/apache/datafusion/pull/3932) ([jackwener](https://github.com/jackwener))
- Refactor Expr::Cast to use a struct. [\#3931](https://github.com/apache/datafusion/pull/3931) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jackwener](https://github.com/jackwener))
- minor: fix some typo. [\#3930](https://github.com/apache/datafusion/pull/3930) ([jackwener](https://github.com/jackwener))
- chore: update cranelift-related dependencies [\#3926](https://github.com/apache/datafusion/pull/3926) ([xudong963](https://github.com/xudong963))
- Change cast error from Internal to NotImplemented [\#3924](https://github.com/apache/datafusion/pull/3924) ([alamb](https://github.com/alamb))
- Support inlining view / dataframes logical plan [\#3923](https://github.com/apache/datafusion/pull/3923) ([Dandandan](https://github.com/Dandandan))
- Add test for Simplify redundant predicates [\#3915](https://github.com/apache/datafusion/pull/3915) ([src255](https://github.com/src255))
- Implement ScalarValue for FixedSizeBinary [\#3911](https://github.com/apache/datafusion/pull/3911) ([maxburke](https://github.com/maxburke))
- Add serde for plans with tables from `TableProviderFactory`s [\#3907](https://github.com/apache/datafusion/pull/3907) ([avantgardnerio](https://github.com/avantgardnerio))
- Support filter/limit pushdown for views/dataframes [\#3905](https://github.com/apache/datafusion/pull/3905) ([Dandandan](https://github.com/Dandandan))
- Factorize common AND factors out of OR predicates to support filterPu… [\#3903](https://github.com/apache/datafusion/pull/3903) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Add `Substring(str [from int] [for int])` support in `datafusion-proto` [\#3902](https://github.com/apache/datafusion/pull/3902) ([r4ntix](https://github.com/r4ntix))
- Revert "Factorize common AND factors out of OR predicates to supportfilter Pu… \(\#3859\)" [\#3897](https://github.com/apache/datafusion/pull/3897) ([alamb](https://github.com/alamb))
- MINOR: Add notes on Apache Reporter [\#3893](https://github.com/apache/datafusion/pull/3893) ([andygrove](https://github.com/andygrove))
- Allow configuring collection of statistics during TPC-H benchmarks [\#3889](https://github.com/apache/datafusion/pull/3889) ([isidentical](https://github.com/isidentical))
- Improve formatting of binary expressions [\#3884](https://github.com/apache/datafusion/pull/3884) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Multiple optimizer passes [\#3880](https://github.com/apache/datafusion/pull/3880) ([andygrove](https://github.com/andygrove))
- \[MINOR\] Update docs with newly added configuration values [\#3877](https://github.com/apache/datafusion/pull/3877) ([alamb](https://github.com/alamb))
- \[MINOR\] Add a hint about how to resolve the `Cargo.lock` CI check [\#3876](https://github.com/apache/datafusion/pull/3876) ([alamb](https://github.com/alamb))
- Add `LogicalPlan::ViewTable` support in `datafusion-proto` [\#3875](https://github.com/apache/datafusion/pull/3875) ([r4ntix](https://github.com/r4ntix))
- Optimize the `concat_ws` function [\#3869](https://github.com/apache/datafusion/pull/3869) ([HaoYang670](https://github.com/HaoYang670))
- Implement foundational filter selectivity analysis [\#3868](https://github.com/apache/datafusion/pull/3868) ([isidentical](https://github.com/isidentical))
- Update `TableProviderFactory` trait to support real-world use-cases [\#3867](https://github.com/apache/datafusion/pull/3867) ([avantgardnerio](https://github.com/avantgardnerio))
- put subquery's equal clause into join on clauses instead of filter cl… [\#3862](https://github.com/apache/datafusion/pull/3862) ([AssHero](https://github.com/AssHero))
- Factorize common AND factors out of OR predicates to support filterPu… [\#3859](https://github.com/apache/datafusion/pull/3859) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Enable mimalloc by default in benchmark [\#3853](https://github.com/apache/datafusion/pull/3853) ([Dandandan](https://github.com/Dandandan))
- Refactor `Expr::Between` to use a struct [\#3850](https://github.com/apache/datafusion/pull/3850) [[sql](https://github.com/apache/datafusion/labels/sql)] ([b41sh](https://github.com/b41sh))
- Handle cardinality estimation for disjoint inner and outer joins [\#3848](https://github.com/apache/datafusion/pull/3848) ([isidentical](https://github.com/isidentical))
- Add setting for statistics collection [\#3846](https://github.com/apache/datafusion/pull/3846) ([Dandandan](https://github.com/Dandandan))
- Update to arrow 25.0.0 [\#3844](https://github.com/apache/datafusion/pull/3844) [[sql](https://github.com/apache/datafusion/labels/sql)] ([tustvold](https://github.com/tustvold))
- Tweak list of optimization rules [\#3841](https://github.com/apache/datafusion/pull/3841) ([Dandandan](https://github.com/Dandandan))
- Refactor Expr::GetIndexedField to use a struct [\#3838](https://github.com/apache/datafusion/pull/3838) [[sql](https://github.com/apache/datafusion/labels/sql)] ([ygf11](https://github.com/ygf11))
- Infer the count of maximum distinct values from min/max [\#3837](https://github.com/apache/datafusion/pull/3837) ([isidentical](https://github.com/isidentical))
- Refactor `Expr::Like`, `Expr::ILike`, `Expr::SimilarTo` to use a struct [\#3836](https://github.com/apache/datafusion/pull/3836) [[sql](https://github.com/apache/datafusion/labels/sql)] ([b41sh](https://github.com/b41sh))
- Refactor Expr::BinaryExpr to use a struct [\#3835](https://github.com/apache/datafusion/pull/3835) [[sql](https://github.com/apache/datafusion/labels/sql)] ([zhoudongyan](https://github.com/zhoudongyan))
- update postgres version to 15 in integration test [\#3831](https://github.com/apache/datafusion/pull/3831) ([Jimexist](https://github.com/Jimexist))
- Fix the panic when lpad/rpad parameter is negative [\#3829](https://github.com/apache/datafusion/pull/3829) ([ZuoTiJia](https://github.com/ZuoTiJia))
- MINOR: Document SHOW ALL in the users guide [\#3826](https://github.com/apache/datafusion/pull/3826) ([alamb](https://github.com/alamb))
- MINOR: Add datafusion-cli documentation on showing configuration [\#3825](https://github.com/apache/datafusion/pull/3825) ([alamb](https://github.com/alamb))
- Add/Remove Division Rules [\#3824](https://github.com/apache/datafusion/pull/3824) ([retikulum](https://github.com/retikulum))
- Minor: Sort the output of SHOW ALL by config name [\#3823](https://github.com/apache/datafusion/pull/3823) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- Add `precision != 0` check when making decimal type [\#3818](https://github.com/apache/datafusion/pull/3818) [[sql](https://github.com/apache/datafusion/labels/sql)] ([HaoYang670](https://github.com/HaoYang670))
- Infer schema when running benchmarks against parquet [\#3817](https://github.com/apache/datafusion/pull/3817) ([andygrove](https://github.com/andygrove))
- Finish removing deprecated `datafusion::logical_plan` module [\#3816](https://github.com/apache/datafusion/pull/3816) ([andygrove](https://github.com/andygrove))
- Clarify initial example with respect to capitalization [\#3815](https://github.com/apache/datafusion/pull/3815) ([alamb](https://github.com/alamb))
- Improve expression simplification by running it twice [\#3811](https://github.com/apache/datafusion/pull/3811) ([alamb](https://github.com/alamb))
- Make expression manipulation consistent and easier to use: `combine/split filter` `conjunction`, etc [\#3810](https://github.com/apache/datafusion/pull/3810) ([alamb](https://github.com/alamb))
- Consolidate expression manipulation functions into `datafusion_optimizer` [\#3809](https://github.com/apache/datafusion/pull/3809) ([alamb](https://github.com/alamb))
- Optimize `regexp_replace` when the input is a sparse array [\#3804](https://github.com/apache/datafusion/pull/3804) ([isidentical](https://github.com/isidentical))
- Stop ignoring errors when writing DataFrame to csv, parquet, json [\#3801](https://github.com/apache/datafusion/pull/3801) ([andygrove](https://github.com/andygrove))
- Update datafusion-cli Cargo.lock to fix CI check on master [\#3799](https://github.com/apache/datafusion/pull/3799) ([alamb](https://github.com/alamb))
- MINOR: Benchmark regression tests [\#3790](https://github.com/apache/datafusion/pull/3790) ([andygrove](https://github.com/andygrove))
- MINOR: Optimizer example and docs, deprecate `Expr::name` [\#3788](https://github.com/apache/datafusion/pull/3788) ([andygrove](https://github.com/andygrove))
- Join cardinality computation for cost-based nested join optimizations [\#3787](https://github.com/apache/datafusion/pull/3787) ([isidentical](https://github.com/isidentical))
- Optimizer now simplifies multiplication, division, and modulo when an argument is a literal Decimal zero or one [\#3782](https://github.com/apache/datafusion/pull/3782) ([drrtuy](https://github.com/drrtuy))
- Implement parquet page-level skipping with column index, using min/ma… [\#3780](https://github.com/apache/datafusion/pull/3780) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Bump actions/labeler from 4.0.1 to 4.0.2 [\#3779](https://github.com/apache/datafusion/pull/3779) ([dependabot[bot]](https://github.com/apps/dependabot))
- MINOR: correct `ListingOptions.try_new` docs to include the enabled stat collection [\#3775](https://github.com/apache/datafusion/pull/3775) ([isidentical](https://github.com/isidentical))
- Teach a negative NULL expression to return NULL instead of an error [\#3771](https://github.com/apache/datafusion/pull/3771) ([drrtuy](https://github.com/drrtuy))
- Add benchmarks for testing row filtering [\#3769](https://github.com/apache/datafusion/pull/3769) ([thinkharderdev](https://github.com/thinkharderdev))
- move type coercion of agg and agg_udaf to logical phase [\#3768](https://github.com/apache/datafusion/pull/3768) ([liukun4515](https://github.com/liukun4515))
- User Guide: Add `EXPLAIN` to SQL reference [\#3767](https://github.com/apache/datafusion/pull/3767) ([unvalley](https://github.com/unvalley))
- Allow specialized implementations to produce hints for the array adapter [\#3765](https://github.com/apache/datafusion/pull/3765) ([isidentical](https://github.com/isidentical))
- Fix optimizer regression with simplifying expressions in subquery filters [\#3764](https://github.com/apache/datafusion/pull/3764) ([andygrove](https://github.com/andygrove))
- Run all `datafusion-examples` in CI tests [\#3761](https://github.com/apache/datafusion/pull/3761) ([alamb](https://github.com/alamb))
- MINOR: Remove deprecated module `datafusion::logical_plan::plan` [\#3759](https://github.com/apache/datafusion/pull/3759) ([andygrove](https://github.com/andygrove))
- Refactor `Expr::Case` to use a struct [\#3757](https://github.com/apache/datafusion/pull/3757) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Do not run labeler CI check if it would fail due to permissions [\#3756](https://github.com/apache/datafusion/pull/3756) ([alamb](https://github.com/alamb))
- MINOR: Improvements to `scalar_subquery_to_join` error handling [\#3754](https://github.com/apache/datafusion/pull/3754) ([andygrove](https://github.com/andygrove))
- Always track the final size of the in-mem sorted arrays [\#3753](https://github.com/apache/datafusion/pull/3753) ([isidentical](https://github.com/isidentical))
- Fix DataFrame::select_columns to handle column names with a period [\#3751](https://github.com/apache/datafusion/pull/3751) ([zhoudongyan](https://github.com/zhoudongyan))
- Fix `ListingTableUrl` to decode percent [\#3750](https://github.com/apache/datafusion/pull/3750) ([unvalley](https://github.com/unvalley))
- remove `type coercion` for physical ScalarFunction [\#3749](https://github.com/apache/datafusion/pull/3749) ([liukun4515](https://github.com/liukun4515))
- CI: Add a new run to check whether `datafusion-cli` lock file is up-to-date [\#3745](https://github.com/apache/datafusion/pull/3745) ([isidentical](https://github.com/isidentical))
- Add datafusion example of expression apis [\#3741](https://github.com/apache/datafusion/pull/3741) ([alamb](https://github.com/alamb))
- fix subquery where exists distinct [\#3732](https://github.com/apache/datafusion/pull/3732) ([b41sh](https://github.com/b41sh))
- Remove some unneeded code in `CommonSubexprEliminate` [\#3730](https://github.com/apache/datafusion/pull/3730) ([alamb](https://github.com/alamb))
- Consolidate and improve tests for expression re-rewriting / aliasing [\#3727](https://github.com/apache/datafusion/pull/3727) ([alamb](https://github.com/alamb))
- Fix output schema generated by CommonSubExprEliminate [\#3726](https://github.com/apache/datafusion/pull/3726) ([alex-natzka](https://github.com/alex-natzka))
- Add type coercion rule for `concat` and `concat_ws` [\#3721](https://github.com/apache/datafusion/pull/3721) ([HaoYang670](https://github.com/HaoYang670))
- Expose and document a simpler public API for simplify expressions [\#3719](https://github.com/apache/datafusion/pull/3719) ([ygf11](https://github.com/ygf11))
- Remove dead code in `UnwrapCastExprRewriter` that may mask errors [\#3703](https://github.com/apache/datafusion/pull/3703) ([alamb](https://github.com/alamb))
- Fix `DataFrame::with_column` to handle creating column names with a period [\#3700](https://github.com/apache/datafusion/pull/3700) ([alamb](https://github.com/alamb))
- Add simplification rules for the `CONCAT` function [\#3684](https://github.com/apache/datafusion/pull/3684) ([HaoYang670](https://github.com/HaoYang670))
- Compressed CSV/JSON support [\#3642](https://github.com/apache/datafusion/pull/3642) [[sql](https://github.com/apache/datafusion/labels/sql)] ([Licht-T](https://github.com/Licht-T))
- Simplify serialization by removing redundant `PrimitiveScalarValue` [\#3612](https://github.com/apache/datafusion/pull/3612) ([alamb](https://github.com/alamb))
- Pushdown single column predicates from ON join clauses [\#3578](https://github.com/apache/datafusion/pull/3578) ([AssHero](https://github.com/AssHero))
- Simplify the serialization of `ScalarValue::List` [\#3547](https://github.com/apache/datafusion/pull/3547) ([alamb](https://github.com/alamb))
- Generate hash aggregation output in smaller record batches [\#3461](https://github.com/apache/datafusion/pull/3461) ([milenkovicm](https://github.com/milenkovicm))
- Improve doc on lowercase treatment of columns on SQL [\#3385](https://github.com/apache/datafusion/pull/3385) ([nanicpc](https://github.com/nanicpc))