<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## [5.0.0](https://github.com/apache/datafusion/tree/5.0.0) (2021-08-10)
[Full Changelog](https://github.com/apache/datafusion/compare/4.0.0...5.0.0)
**Breaking changes:**
- Box ScalarValue:Lists, reduce size by half [\#788](https://github.com/apache/datafusion/pull/788) ([alamb](https://github.com/alamb))
- JOIN conditions are order dependent [\#778](https://github.com/apache/datafusion/pull/778) ([seddonm1](https://github.com/seddonm1))
- Show the result of all optimizer passes in EXPLAIN VERBOSE [\#759](https://github.com/apache/datafusion/pull/759) ([alamb](https://github.com/alamb))
- \#723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning [\#749](https://github.com/apache/datafusion/pull/749) ([lvheyang](https://github.com/lvheyang))
- Update API for extension planning to include logical plan [\#643](https://github.com/apache/datafusion/pull/643) ([alamb](https://github.com/alamb))
- Rename MergeExec to CoalescePartitionsExec [\#635](https://github.com/apache/datafusion/pull/635) ([andygrove](https://github.com/andygrove))
- fix 593, reduce cloning by taking ownership in logical planner's `from` fn [\#610](https://github.com/apache/datafusion/pull/610) ([Jimexist](https://github.com/Jimexist))
- fix join column handling logic for `On` and `Using` constraints [\#605](https://github.com/apache/datafusion/pull/605) ([houqp](https://github.com/houqp))
- Rewrite pruning logic in terms of PruningStatistics using Array trait \(option 2\) [\#426](https://github.com/apache/datafusion/pull/426) ([alamb](https://github.com/alamb))
- Support reading from NdJson formatted data sources [\#404](https://github.com/apache/datafusion/pull/404) ([heymind](https://github.com/heymind))
- Add metrics to RepartitionExec [\#398](https://github.com/apache/datafusion/pull/398) ([andygrove](https://github.com/andygrove))
- Use 4.x arrow-rs from crates.io rather than git sha [\#395](https://github.com/apache/datafusion/pull/395) ([alamb](https://github.com/alamb))
- Return Vec\<bool\> from PredicateBuilder rather than an `Fn` [\#370](https://github.com/apache/datafusion/pull/370) ([alamb](https://github.com/alamb))
- Refactor: move RowGroupPredicateBuilder into its own module, rename to PruningPredicateBuilder [\#365](https://github.com/apache/datafusion/pull/365) ([alamb](https://github.com/alamb))
- \[Datafusion\] NOW\(\) function support [\#288](https://github.com/apache/datafusion/pull/288) ([msathis](https://github.com/msathis))
- Implement select distinct [\#262](https://github.com/apache/datafusion/pull/262) ([Dandandan](https://github.com/Dandandan))
- Refactor datafusion/src/physical_plan/common.rs build_file_list to take fewer params and reuse code [\#253](https://github.com/apache/datafusion/pull/253) ([Jimexist](https://github.com/Jimexist))
- Support qualified columns in queries [\#55](https://github.com/apache/datafusion/pull/55) ([houqp](https://github.com/houqp))
- Read CSV format text from stdin or memory [\#54](https://github.com/apache/datafusion/pull/54) ([heymind](https://github.com/heymind))
- Use atomics for SQLMetric implementation, remove unused name field [\#25](https://github.com/apache/datafusion/pull/25) ([returnString](https://github.com/returnString))
**Implemented enhancements:**
- Allow extension nodes to correctly plan physical expressions with relations [\#642](https://github.com/apache/datafusion/issues/642)
- Filters aren't passed down to table scans in a union [\#557](https://github.com/apache/datafusion/issues/557)
- Support pruning for `boolean` columns [\#490](https://github.com/apache/datafusion/issues/490)
- Implement SQLMetrics for RepartitionExec [\#397](https://github.com/apache/datafusion/issues/397)
- DataFusion benchmarks should show executed plan with metrics after query completes [\#396](https://github.com/apache/datafusion/issues/396)
- Use published versions of arrow rather than github shas [\#393](https://github.com/apache/datafusion/issues/393)
- Add Compare to GroupByScalar [\#364](https://github.com/apache/datafusion/issues/364)
- Reusable "row group pruning" logic [\#363](https://github.com/apache/datafusion/issues/363)
- Add an Order Preserving merge operator [\#362](https://github.com/apache/datafusion/issues/362)
- Implement Postgres compatible `now()` function [\#251](https://github.com/apache/datafusion/issues/251)
- COUNT DISTINCT does not support dictionary types [\#249](https://github.com/apache/datafusion/issues/249)
- Use standard make_null_array for CASE [\#222](https://github.com/apache/datafusion/issues/222)
- Implement date_trunc\(\) function [\#203](https://github.com/apache/datafusion/issues/203)
- COUNT DISTINCT does not support `Float64` [\#199](https://github.com/apache/datafusion/issues/199)
- Update SQLMetric to use atomics rather than a Mutex [\#30](https://github.com/apache/datafusion/issues/30)
- Implement PartialOrd for ScalarValue [\#838](https://github.com/apache/datafusion/pull/838) ([viirya](https://github.com/viirya))
- Support date datatypes in max/min [\#820](https://github.com/apache/datafusion/pull/820) ([viirya](https://github.com/viirya))
- Implement vectorized hashing for DictionaryArray types [\#812](https://github.com/apache/datafusion/pull/812) ([alamb](https://github.com/alamb))
- Convert unsupported conditions in left right join to filters [\#796](https://github.com/apache/datafusion/pull/796) [[sql](https://github.com/apache/datafusion/labels/sql)] ([Dandandan](https://github.com/Dandandan))
- Implement streaming versions of Dataframe.collect methods [\#789](https://github.com/apache/datafusion/pull/789) ([andygrove](https://github.com/andygrove))
- impl from str for column and scalar [\#762](https://github.com/apache/datafusion/pull/762) ([Jimexist](https://github.com/Jimexist))
- impl fmt::Display for PlanType [\#752](https://github.com/apache/datafusion/pull/752) ([Jimexist](https://github.com/Jimexist))
- Remove unnecessary projection in logical plan optimization phase [\#747](https://github.com/apache/datafusion/pull/747) ([waynexia](https://github.com/waynexia))
- Support table columns alias [\#735](https://github.com/apache/datafusion/pull/735) ([Dandandan](https://github.com/Dandandan))
- Derive PartialEq for datasource enums [\#734](https://github.com/apache/datafusion/pull/734) ([alamb](https://github.com/alamb))
- Allow filetype to be lowercase, Implement FromStr for FileType [\#728](https://github.com/apache/datafusion/pull/728) ([Jimexist](https://github.com/Jimexist))
- Update to use arrow 5.0 [\#721](https://github.com/apache/datafusion/pull/721) ([alamb](https://github.com/alamb))
- \#554: Lead/lag window function with offset and default value arguments [\#687](https://github.com/apache/datafusion/pull/687) ([jgoday](https://github.com/jgoday))
- dedup using join column in wildcard expansion [\#678](https://github.com/apache/datafusion/pull/678) ([houqp](https://github.com/houqp))
- Implement metrics for HashJoinExec [\#664](https://github.com/apache/datafusion/pull/664) ([andygrove](https://github.com/andygrove))
- Show physical plan with metrics in benchmark [\#662](https://github.com/apache/datafusion/pull/662) ([andygrove](https://github.com/andygrove))
- Allow non-equijoin filters in join condition [\#660](https://github.com/apache/datafusion/pull/660) ([Dandandan](https://github.com/Dandandan))
- Add End-to-end test for parquet pruning + metrics for ParquetExec [\#657](https://github.com/apache/datafusion/pull/657) ([alamb](https://github.com/alamb))
- Add support for leading field in interval [\#647](https://github.com/apache/datafusion/pull/647) ([Dandandan](https://github.com/Dandandan))
- Remove hard-coded PartitionMode from Ballista serde [\#637](https://github.com/apache/datafusion/pull/637) ([andygrove](https://github.com/andygrove))
- Ballista: Implement scalable distributed joins [\#634](https://github.com/apache/datafusion/pull/634) ([andygrove](https://github.com/andygrove))
- implement rank and dense_rank function and refactor built-in window function evaluation [\#631](https://github.com/apache/datafusion/pull/631) ([Jimexist](https://github.com/Jimexist))
- Improve "field not found" error messages [\#625](https://github.com/apache/datafusion/pull/625) ([andygrove](https://github.com/andygrove))
- Support modulus op [\#577](https://github.com/apache/datafusion/pull/577) ([gangliao](https://github.com/gangliao))
- implement `std::default::Default` for execution config [\#570](https://github.com/apache/datafusion/pull/570) ([Jimexist](https://github.com/Jimexist))
- `to_timestamp_millis()`, `to_timestamp_micros()`, `to_timestamp_seconds()` [\#567](https://github.com/apache/datafusion/pull/567) ([velvia](https://github.com/velvia))
- Filter push down for Union [\#559](https://github.com/apache/datafusion/pull/559) ([Dandandan](https://github.com/Dandandan))
- Implement window functions with `partition_by` clause [\#558](https://github.com/apache/datafusion/pull/558) ([Jimexist](https://github.com/Jimexist))
- support table alias in join clause [\#547](https://github.com/apache/datafusion/pull/547) ([houqp](https://github.com/houqp))
- Not equal predicate in physical_planning pruning [\#544](https://github.com/apache/datafusion/pull/544) ([jgoday](https://github.com/jgoday))
- add error handling and boundary checking for window frames [\#530](https://github.com/apache/datafusion/pull/530) ([Jimexist](https://github.com/Jimexist))
- Implement window functions with `order_by` clause [\#520](https://github.com/apache/datafusion/pull/520) ([Jimexist](https://github.com/Jimexist))
- support group by column positions [\#519](https://github.com/apache/datafusion/pull/519) [[sql](https://github.com/apache/datafusion/labels/sql)] ([jychen7](https://github.com/jychen7))
- Implement constant folding for CAST [\#513](https://github.com/apache/datafusion/pull/513) ([msathis](https://github.com/msathis))
- Add window frame constructs - alternative [\#506](https://github.com/apache/datafusion/pull/506) ([Jimexist](https://github.com/Jimexist))
- Add `partition by` constructs in window functions and modify logical planning [\#501](https://github.com/apache/datafusion/pull/501) ([Jimexist](https://github.com/Jimexist))
- Add support for boolean columns in pruning logic [\#500](https://github.com/apache/datafusion/pull/500) ([alamb](https://github.com/alamb))
- \#215 resolve aliases for group by exprs [\#485](https://github.com/apache/datafusion/pull/485) ([jychen7](https://github.com/jychen7))
- Support anti join [\#482](https://github.com/apache/datafusion/pull/482) ([Dandandan](https://github.com/Dandandan))
- Support semi join [\#470](https://github.com/apache/datafusion/pull/470) ([Dandandan](https://github.com/Dandandan))
- add `order by` construct in window function and logical plans [\#463](https://github.com/apache/datafusion/pull/463) ([Jimexist](https://github.com/Jimexist))
- Remove redundant filters \(e.g. c\> 5 AND c\>5 --\> c\>5\) [\#436](https://github.com/apache/datafusion/pull/436) ([jgoday](https://github.com/jgoday))
- fix: display the content of debug explain [\#434](https://github.com/apache/datafusion/pull/434) ([NGA-TRAN](https://github.com/NGA-TRAN))
- implement lead and lag built-in window function [\#429](https://github.com/apache/datafusion/pull/429) ([Jimexist](https://github.com/Jimexist))
- add support for ndjson for datafusion-cli [\#427](https://github.com/apache/datafusion/pull/427) ([Jimexist](https://github.com/Jimexist))
- add `first_value`, `last_value`, and `nth_value` built-in window functions [\#403](https://github.com/apache/datafusion/pull/403) ([Jimexist](https://github.com/Jimexist))
- export both `now` and `random` functions [\#389](https://github.com/apache/datafusion/pull/389) ([Jimexist](https://github.com/Jimexist))
- Function to create `ArrayRef` from an iterator of ScalarValues [\#381](https://github.com/apache/datafusion/pull/381) ([alamb](https://github.com/alamb))
- Sort preserving merge \(\#362\) [\#379](https://github.com/apache/datafusion/pull/379) ([tustvold](https://github.com/tustvold))
- Add support for multiple partitions with SortExec \(\#362\) [\#378](https://github.com/apache/datafusion/pull/378) ([tustvold](https://github.com/tustvold))
- add window expression stream, delegated window aggregation to aggregate functions, and implement `row_number` [\#375](https://github.com/apache/datafusion/pull/375) ([Jimexist](https://github.com/Jimexist))
- Add PartialOrd and Ord to GroupByScalar \(\#364\) [\#368](https://github.com/apache/datafusion/pull/368) ([tustvold](https://github.com/tustvold))
- Implement readable explain plans for physical plans [\#337](https://github.com/apache/datafusion/pull/337) ([alamb](https://github.com/alamb))
- Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only [\#334](https://github.com/apache/datafusion/pull/334) ([Jimexist](https://github.com/Jimexist))
- Use NullArray to Pass row count to ScalarFunctions that take 0 arguments [\#328](https://github.com/apache/datafusion/pull/328) ([Jimexist](https://github.com/Jimexist))
- add --quiet/-q flag and allow timing info to be turned on/off [\#323](https://github.com/apache/datafusion/pull/323) ([Jimexist](https://github.com/Jimexist))
- Implement hash partitioned aggregation [\#320](https://github.com/apache/datafusion/pull/320) ([Dandandan](https://github.com/Dandandan))
- Support COUNT\(DISTINCT timestamps\) [\#319](https://github.com/apache/datafusion/pull/319) ([charlibot](https://github.com/charlibot))
- add random SQL function [\#303](https://github.com/apache/datafusion/pull/303) ([Jimexist](https://github.com/Jimexist))
- allow datafusion cli to take -- comments [\#296](https://github.com/apache/datafusion/pull/296) ([Jimexist](https://github.com/Jimexist))
- Add json print format mode to datafusion cli [\#295](https://github.com/apache/datafusion/pull/295) ([Jimexist](https://github.com/Jimexist))
- Add print format param with support for tsv print format to datafusion cli [\#292](https://github.com/apache/datafusion/pull/292) ([Jimexist](https://github.com/Jimexist))
- Add print format param and support for csv print format to datafusion cli [\#289](https://github.com/apache/datafusion/pull/289) ([Jimexist](https://github.com/Jimexist))
- allow datafusion-cli to take a file param [\#285](https://github.com/apache/datafusion/pull/285) ([Jimexist](https://github.com/Jimexist))
- add param validation for datafusion-cli [\#284](https://github.com/apache/datafusion/pull/284) ([Jimexist](https://github.com/Jimexist))
- \[breaking change\] fix 265, log should be log10, and add ln [\#271](https://github.com/apache/datafusion/pull/271) ([Jimexist](https://github.com/Jimexist))
- Implement count distinct for dictionary arrays [\#256](https://github.com/apache/datafusion/pull/256) ([alamb](https://github.com/alamb))
- Count distinct floats [\#252](https://github.com/apache/datafusion/pull/252) ([pjmore](https://github.com/pjmore))
- Add rule to eliminate `LIMIT 0` and replace it with an `EmptyRelation` [\#213](https://github.com/apache/datafusion/pull/213) ([Dandandan](https://github.com/Dandandan))
- Allow table providers to indicate their type for catalog metadata [\#205](https://github.com/apache/datafusion/pull/205) ([returnString](https://github.com/returnString))
- Use arrow eq kernels in CaseWhen expression evaluation [\#52](https://github.com/apache/datafusion/pull/52) ([Dandandan](https://github.com/Dandandan))
- Re-export Arrow and Parquet crates from DataFusion [\#39](https://github.com/apache/datafusion/pull/39) ([returnString](https://github.com/returnString))
- \[DataFusion\] Optimize hash join inner workings, null handling fix [\#24](https://github.com/apache/datafusion/pull/24) ([Dandandan](https://github.com/Dandandan))
- \[ARROW-12441\] \[DataFusion\] Cross join implementation [\#11](https://github.com/apache/datafusion/pull/11) ([Dandandan](https://github.com/Dandandan))
**Fixed bugs:**
- Projection pushdown removes unqualified column names even when they are used [\#617](https://github.com/apache/datafusion/issues/617)
- Panic while running join datatypes/schema.rs:165:10 [\#601](https://github.com/apache/datafusion/issues/601)
- Indentation is incorrect for joins in formatted physical plans [\#345](https://github.com/apache/datafusion/issues/345)
- Error while running `COUNT DISTINCT (timestamp)`: 'Unexpected DataType for list [\#314](https://github.com/apache/datafusion/issues/314)
- When joining two tables, get Error: Plan\("Schema contains duplicate unqualified field name \'xxx\'"\) [\#311](https://github.com/apache/datafusion/issues/311)
- Incorrect answers with SELECT DISTINCT queries [\#250](https://github.com/apache/datafusion/issues/250)
- Intermittent failure in CI join_with_hash_collision [\#227](https://github.com/apache/datafusion/issues/227)
- `Concat` from Dataframe API no longer accepts multiple expressions [\#226](https://github.com/apache/datafusion/issues/226)
- Fix right, full join handling when having multiple non-matching rows at the left side [\#845](https://github.com/apache/datafusion/pull/845) ([Dandandan](https://github.com/Dandandan))
- Qualified field resolution too strict [\#810](https://github.com/apache/datafusion/pull/810) [[sql](https://github.com/apache/datafusion/labels/sql)] ([seddonm1](https://github.com/seddonm1))
- Better join order resolution logic [\#797](https://github.com/apache/datafusion/pull/797) [[sql](https://github.com/apache/datafusion/labels/sql)] ([seddonm1](https://github.com/seddonm1))
- Produce correct answers for Group BY NULL \(Option 1\) [\#793](https://github.com/apache/datafusion/pull/793) ([alamb](https://github.com/alamb))
- Use consistent version of string_to_timestamp_nanos in DataFusion [\#767](https://github.com/apache/datafusion/pull/767) ([alamb](https://github.com/alamb))
- \#723 limit pruning rule to simple expression [\#764](https://github.com/apache/datafusion/pull/764) ([lvheyang](https://github.com/lvheyang))
- \#699 fix return type conflict when calling builtin math functions [\#716](https://github.com/apache/datafusion/pull/716) ([lvheyang](https://github.com/lvheyang))
- Fix Date32 and Date64 parquet row group pruning [\#690](https://github.com/apache/datafusion/pull/690) ([alamb](https://github.com/alamb))
- Remove qualifiers on pushed down predicates / Fix parquet pruning [\#689](https://github.com/apache/datafusion/pull/689) ([alamb](https://github.com/alamb))
- use `Weak` ptr to break catalog list \<\> info schema cyclic reference [\#681](https://github.com/apache/datafusion/pull/681) ([crepererum](https://github.com/crepererum))
- honor table name for csv/parquet scan in ballista plan serde [\#629](https://github.com/apache/datafusion/pull/629) ([houqp](https://github.com/houqp))
- fix 621, where unnamed window functions shall be differentiated by partition and order by clause [\#622](https://github.com/apache/datafusion/pull/622) ([Jimexist](https://github.com/Jimexist))
- RFC: Do not prune out unnecessary columns with unqualified references [\#619](https://github.com/apache/datafusion/pull/619) ([alamb](https://github.com/alamb))
- \[fix\] select \* on empty table [\#613](https://github.com/apache/datafusion/pull/613) ([rdettai](https://github.com/rdettai))
- fix 592, support alias in window functions [\#607](https://github.com/apache/datafusion/pull/607) ([Jimexist](https://github.com/Jimexist))
- RepartitionExec should not error if output has hung up [\#576](https://github.com/apache/datafusion/pull/576) ([alamb](https://github.com/alamb))
- Fix pruning on not equal predicate [\#561](https://github.com/apache/datafusion/pull/561) ([alamb](https://github.com/alamb))
- hash float arrays using primitive unsigned integer type [\#556](https://github.com/apache/datafusion/pull/556) ([houqp](https://github.com/houqp))
- Return errors properly from RepartitionExec [\#521](https://github.com/apache/datafusion/pull/521) ([alamb](https://github.com/alamb))
- refactor sort exec stream and combine batches [\#515](https://github.com/apache/datafusion/pull/515) ([Jimexist](https://github.com/Jimexist))
- Fix display of execution time in datafusion-cli [\#514](https://github.com/apache/datafusion/pull/514) ([Dandandan](https://github.com/Dandandan))
- Wrong aggregation arguments error. [\#505](https://github.com/apache/datafusion/pull/505) ([jgoday](https://github.com/jgoday))
- fix window aggregation with alias and add integration test case [\#454](https://github.com/apache/datafusion/pull/454) ([Jimexist](https://github.com/Jimexist))
- fix: don't duplicate existing filters [\#409](https://github.com/apache/datafusion/pull/409) ([e-dard](https://github.com/e-dard))
- Fixed incorrect logical type in GroupByScalar. [\#391](https://github.com/apache/datafusion/pull/391) ([jorgecarleitao](https://github.com/jorgecarleitao))
- Fix indented display for multi-child nodes [\#358](https://github.com/apache/datafusion/pull/358) ([alamb](https://github.com/alamb))
- Fix SQL planner to support multibyte column names [\#357](https://github.com/apache/datafusion/pull/357) ([agatan](https://github.com/agatan))
- Fix wrong projection 'optimization' [\#268](https://github.com/apache/datafusion/pull/268) ([Dandandan](https://github.com/Dandandan))
- Fix Left join implementation is incorrect for 0 or multiple batches on the right side [\#238](https://github.com/apache/datafusion/pull/238) ([Dandandan](https://github.com/Dandandan))
- Count distinct boolean [\#230](https://github.com/apache/datafusion/pull/230) ([pjmore](https://github.com/pjmore))
- Fix Filter / where clause without column names is removed in optimization pass [\#225](https://github.com/apache/datafusion/pull/225) ([Dandandan](https://github.com/Dandandan))
**Documentation updates:**
- No way to get to the examples from docs.rs [\#186](https://github.com/apache/datafusion/issues/186)
- Update docs to use vendored version of arrow [\#772](https://github.com/apache/datafusion/pull/772) ([alamb](https://github.com/alamb))
- Fix typo in DEVELOPERS.md [\#692](https://github.com/apache/datafusion/pull/692) ([lvheyang](https://github.com/lvheyang))
- update stale documentation related to window functions [\#598](https://github.com/apache/datafusion/pull/598) ([Jimexist](https://github.com/Jimexist))
- update readme to reflect work on window functions [\#471](https://github.com/apache/datafusion/pull/471) ([Jimexist](https://github.com/Jimexist))
- Add examples section to datafusion crate doc [\#457](https://github.com/apache/datafusion/pull/457) ([mluts](https://github.com/mluts))
- add invariants spec [\#443](https://github.com/apache/datafusion/pull/443) ([houqp](https://github.com/houqp))
- add output field name rfc [\#422](https://github.com/apache/datafusion/pull/422) ([houqp](https://github.com/houqp))
- Update more docs and also the developer.md doc [\#414](https://github.com/apache/datafusion/pull/414) ([Jimexist](https://github.com/Jimexist))
- use prettier to format md files [\#367](https://github.com/apache/datafusion/pull/367) ([Jimexist](https://github.com/Jimexist))
- Add new logo svg with white background [\#313](https://github.com/apache/datafusion/pull/313) ([parthsarthy](https://github.com/parthsarthy))
- Add projects \(Squirtle and Tensorbase\) to list in readme [\#312](https://github.com/apache/datafusion/pull/312) ([parthsarthy](https://github.com/parthsarthy))
- docs - fix the ballista link [\#274](https://github.com/apache/datafusion/pull/274) ([haoxins](https://github.com/haoxins))
- misc\(README\): Replace Cube.js with Cube Store [\#248](https://github.com/apache/datafusion/pull/248) ([ovr](https://github.com/ovr))
- Initial docs for SQL syntax [\#242](https://github.com/apache/datafusion/pull/242) ([Dandandan](https://github.com/Dandandan))
- Deduplicate README.md [\#79](https://github.com/apache/datafusion/pull/79) ([msathis](https://github.com/msathis))
**Performance improvements:**
- Speed up inlist for strings and primitives [\#813](https://github.com/apache/datafusion/pull/813) ([Dandandan](https://github.com/Dandandan))
- perf: improve performance of `SortPreservingMergeExec` operator [\#722](https://github.com/apache/datafusion/pull/722) ([e-dard](https://github.com/e-dard))
- Optimize min/max queries with table statistics [\#719](https://github.com/apache/datafusion/pull/719) ([b41sh](https://github.com/b41sh))
- perf: Improve materialisation performance of SortPreservingMergeExec [\#691](https://github.com/apache/datafusion/pull/691) ([e-dard](https://github.com/e-dard))
- Optimize count\(\*\) with table statistics [\#620](https://github.com/apache/datafusion/pull/620) ([Dandandan](https://github.com/Dandandan))
- optimize window function's `find_ranges_in_range` [\#595](https://github.com/apache/datafusion/pull/595) ([Jimexist](https://github.com/Jimexist))
- Collapse sort into window expr and do sort within logical phase [\#571](https://github.com/apache/datafusion/pull/571) ([Jimexist](https://github.com/Jimexist))
- Use repartition in window functions to speed up [\#569](https://github.com/apache/datafusion/pull/569) ([Jimexist](https://github.com/Jimexist))
- Constant fold / optimize `to_timestamp` function during planning [\#387](https://github.com/apache/datafusion/pull/387) ([msathis](https://github.com/msathis))
- Speed up `create_batch_from_map` [\#339](https://github.com/apache/datafusion/pull/339) ([Dandandan](https://github.com/Dandandan))
- Simplify math expression code \(use unary kernel\) [\#309](https://github.com/apache/datafusion/pull/309) ([Dandandan](https://github.com/Dandandan))
**Closed issues:**
- Confirm git tagging strategy for releases [\#770](https://github.com/apache/datafusion/issues/770)
- arrow::util::pretty::pretty_format_batches missing [\#769](https://github.com/apache/datafusion/issues/769)
- move the `assert_batches_eq!` macros to a non part of datafusion [\#745](https://github.com/apache/datafusion/issues/745)
- fix an issue where aliases are not respected in generating downstream schemas in window expr [\#592](https://github.com/apache/datafusion/issues/592)
- make the planner to print more succinct and useful information in window function explain clause [\#526](https://github.com/apache/datafusion/issues/526)
- move window frame module to be in `logical_plan` [\#517](https://github.com/apache/datafusion/issues/517)
- use a more rust idiomatic way of handling nth_value [\#448](https://github.com/apache/datafusion/issues/448)
- create a test with more than one partition for window functions [\#435](https://github.com/apache/datafusion/issues/435)
- COUNT DISTINCT does not support `Boolean` [\#202](https://github.com/apache/datafusion/issues/202)
- Read CSV format text from stdin or memory [\#198](https://github.com/apache/datafusion/issues/198)
- Fix null handling hash join [\#195](https://github.com/apache/datafusion/issues/195)
- Allow TableProviders to indicate their type for the information schema [\#191](https://github.com/apache/datafusion/issues/191)
- Make DataFrame extensible [\#190](https://github.com/apache/datafusion/issues/190)
- TPC-H Query 19 [\#170](https://github.com/apache/datafusion/issues/170)
- TPC-H Query 7 [\#161](https://github.com/apache/datafusion/issues/161)
- Upgrade hashbrown to 0.10 [\#151](https://github.com/apache/datafusion/issues/151)
- Implement vectorized hashing for hash aggregate [\#149](https://github.com/apache/datafusion/issues/149)
- More efficient LEFT join implementation [\#143](https://github.com/apache/datafusion/issues/143)
- Implement vectorized hashing [\#142](https://github.com/apache/datafusion/issues/142)
- RFC Roadmap for 2021 \(DataFusion\) [\#140](https://github.com/apache/datafusion/issues/140)
- Implement hash partitioning [\#131](https://github.com/apache/datafusion/issues/131)
- Grouping by column position [\#110](https://github.com/apache/datafusion/issues/110)
- \[Datafusion\] GROUP BY with a high cardinality doesn't seem to finish [\#107](https://github.com/apache/datafusion/issues/107)
- \[Rust\] Add support for JSON data sources [\#103](https://github.com/apache/datafusion/issues/103)
- \[Rust\] Implement metrics framework [\#95](https://github.com/apache/datafusion/issues/95)
- Publicly export Arrow crate from datafusion [\#36](https://github.com/apache/datafusion/issues/36)
- Implement hash-partitioned hash aggregate [\#27](https://github.com/apache/datafusion/issues/27)
- Consider using GitHub pages for DataFusion/Ballista documentation [\#18](https://github.com/apache/datafusion/issues/18)
- Update "repository" in Cargo.toml [\#16](https://github.com/apache/datafusion/issues/16)
**Merged pull requests:**
- Use `RawTable` API in hash join [\#827](https://github.com/apache/datafusion/pull/827) ([Dandandan](https://github.com/Dandandan))
- Add test for window functions on dictionary [\#823](https://github.com/apache/datafusion/pull/823) ([alamb](https://github.com/alamb))
- Update dependencies: prost to 0.8 and tonic to 0.5 [\#818](https://github.com/apache/datafusion/pull/818) ([alamb](https://github.com/alamb))
- Move `hash_array` into hash_utils.rs [\#807](https://github.com/apache/datafusion/pull/807) ([alamb](https://github.com/alamb))
- Remove GroupByScalar and use ScalarValue in preparation for supporting null values in GroupBy [\#786](https://github.com/apache/datafusion/pull/786) ([alamb](https://github.com/alamb))
- fix 226, make `concat`, `concat_ws`, and `random` work with `Python` crate [\#761](https://github.com/apache/datafusion/pull/761) ([Jimexist](https://github.com/Jimexist))
- Test for parquet pruning disabling [\#754](https://github.com/apache/datafusion/pull/754) ([alamb](https://github.com/alamb))
- Add explain verbose with limit push down [\#751](https://github.com/apache/datafusion/pull/751) ([Jimexist](https://github.com/Jimexist))
- Move assert_batches_eq! macros to test_utils.rs [\#746](https://github.com/apache/datafusion/pull/746) ([alamb](https://github.com/alamb))
- Show optimized physical and logical plans in EXPLAIN [\#744](https://github.com/apache/datafusion/pull/744) ([alamb](https://github.com/alamb))
- update `python` crate to support latest pyo3 syntax and gil semantics [\#741](https://github.com/apache/datafusion/pull/741) ([Jimexist](https://github.com/Jimexist))
- update `python` crate dependencies [\#740](https://github.com/apache/datafusion/pull/740) ([Jimexist](https://github.com/Jimexist))
- provide more details on required .parquet file extension error message [\#729](https://github.com/apache/datafusion/pull/729) ([Jimexist](https://github.com/Jimexist))
- split up window functions into a dedicated module with separate files [\#724](https://github.com/apache/datafusion/pull/724) ([Jimexist](https://github.com/Jimexist))
- Use pytest in integration test [\#715](https://github.com/apache/datafusion/pull/715) ([Jimexist](https://github.com/Jimexist))
- replace once iter chain with array::IntoIter [\#704](https://github.com/apache/datafusion/pull/704) ([houqp](https://github.com/houqp))
- avoid iterator materialization in column index lookup [\#703](https://github.com/apache/datafusion/pull/703) ([houqp](https://github.com/houqp))
- Fix build with 1.52.1 [\#696](https://github.com/apache/datafusion/pull/696) ([alamb](https://github.com/alamb))
- Fix test output due to logical merge conflict [\#694](https://github.com/apache/datafusion/pull/694) ([alamb](https://github.com/alamb))
- add more integration tests [\#668](https://github.com/apache/datafusion/pull/668) ([Jimexist](https://github.com/Jimexist))
- Bump arrow and parquet versions to 4.4 [\#654](https://github.com/apache/datafusion/pull/654) ([toddtreece](https://github.com/toddtreece))
- Add query 15 to TPC-H queries [\#645](https://github.com/apache/datafusion/pull/645) ([Dandandan](https://github.com/Dandandan))
- Improve error message and comments [\#641](https://github.com/apache/datafusion/pull/641) ([alamb](https://github.com/alamb))
- add integration tests for rank, dense_rank, fix last_value evaluation with rank [\#638](https://github.com/apache/datafusion/pull/638) ([Jimexist](https://github.com/Jimexist))
- round trip TPCH queries in tests [\#630](https://github.com/apache/datafusion/pull/630) ([houqp](https://github.com/houqp))
- use Into\<String\> as argument type wherever applicable [\#615](https://github.com/apache/datafusion/pull/615) ([houqp](https://github.com/houqp))
- reuse alias map in aggregate logical planning and refactor position resolution [\#606](https://github.com/apache/datafusion/pull/606) ([Jimexist](https://github.com/Jimexist))
- fix clippy warnings [\#581](https://github.com/apache/datafusion/pull/581) ([Jimexist](https://github.com/Jimexist))
- Add benchmarks to window function queries [\#564](https://github.com/apache/datafusion/pull/564) ([Jimexist](https://github.com/Jimexist))
- reuse code for now function expr creation [\#548](https://github.com/apache/datafusion/pull/548) ([houqp](https://github.com/houqp))
- turn on clippy rule for needless borrow [\#545](https://github.com/apache/datafusion/pull/545) ([Jimexist](https://github.com/Jimexist))
- Refactor hash aggregate's planner building code [\#539](https://github.com/apache/datafusion/pull/539) ([Jimexist](https://github.com/Jimexist))
- Cleanup Repartition Exec code [\#538](https://github.com/apache/datafusion/pull/538) ([alamb](https://github.com/alamb))
- reuse datafusion physical planner in ballista building from protobuf [\#532](https://github.com/apache/datafusion/pull/532) ([Jimexist](https://github.com/Jimexist))
- remove redundant `into_iter()` calls [\#527](https://github.com/apache/datafusion/pull/527) ([Jimexist](https://github.com/Jimexist))
- Fix 517 - move `window_frames` module to `logical_plan` [\#518](https://github.com/apache/datafusion/pull/518) ([Jimexist](https://github.com/Jimexist))
- Refactor window aggregation, simplify batch processing logic [\#516](https://github.com/apache/datafusion/pull/516) ([Jimexist](https://github.com/Jimexist))
- Add datafusion::test_util, resolve test data paths without env vars [\#498](https://github.com/apache/datafusion/pull/498) ([mluts](https://github.com/mluts))
- Avoid warnings in tests when compiling without default features [\#489](https://github.com/apache/datafusion/pull/489) ([alamb](https://github.com/alamb))
- update cargo.toml in python crate and fix unit test due to hash joins [\#483](https://github.com/apache/datafusion/pull/483) ([Jimexist](https://github.com/Jimexist))
- use prettier check in CI [\#453](https://github.com/apache/datafusion/pull/453) ([Jimexist](https://github.com/Jimexist))
- Optimize `nth_value`, remove `first_value`, `last_value` structs and use idiomatic rust style [\#452](https://github.com/apache/datafusion/pull/452) ([Jimexist](https://github.com/Jimexist))
- Fixed typo / logical merge conflict [\#433](https://github.com/apache/datafusion/pull/433) ([jorgecarleitao](https://github.com/jorgecarleitao))
- include test data and add aggregation tests in integration test [\#425](https://github.com/apache/datafusion/pull/425) ([Jimexist](https://github.com/Jimexist))
- Add some padding around the logo [\#411](https://github.com/apache/datafusion/pull/411) ([parthsarthy](https://github.com/parthsarthy))
- Benchmark subcommand to distinguish between DataFusion and Ballista [\#402](https://github.com/apache/datafusion/pull/402) ([jgoday](https://github.com/jgoday))
- refactor datafusion/`scalar_value` to use more macro and avoid dup code [\#392](https://github.com/apache/datafusion/pull/392) ([Jimexist](https://github.com/Jimexist))
- Update TPC-H benchmark to show physical plan when debug mode is enabled [\#386](https://github.com/apache/datafusion/pull/386) ([andygrove](https://github.com/andygrove))
- Update arrow dependencies again [\#341](https://github.com/apache/datafusion/pull/341) ([alamb](https://github.com/alamb))
- Update arrow-rs deps [\#317](https://github.com/apache/datafusion/pull/317) ([alamb](https://github.com/alamb))
- Update PR template by commenting out instructions [\#315](https://github.com/apache/datafusion/pull/315) ([alamb](https://github.com/alamb))
- fix clippy warning [\#286](https://github.com/apache/datafusion/pull/286) ([Jimexist](https://github.com/Jimexist))
- add integration test to compare datafusion-cli against psql [\#281](https://github.com/apache/datafusion/pull/281) ([Jimexist](https://github.com/Jimexist))
- Update arrow deps [\#269](https://github.com/apache/datafusion/pull/269) ([alamb](https://github.com/alamb))
- Use multi-stage build dockerfile in datafusion-cli and reduce image size from 2.16GB to 89.9MB [\#266](https://github.com/apache/datafusion/pull/266) ([Jimexist](https://github.com/Jimexist))
- Enable redundant_field_names clippy lint [\#261](https://github.com/apache/datafusion/pull/261) ([Dandandan](https://github.com/Dandandan))
- fix clippy lint [\#259](https://github.com/apache/datafusion/pull/259) ([alamb](https://github.com/alamb))
- Move datafusion-cli to new crate [\#231](https://github.com/apache/datafusion/pull/231) ([Dandandan](https://github.com/Dandandan))
- Make test join_with_hash_collision deterministic [\#229](https://github.com/apache/datafusion/pull/229) ([Dandandan](https://github.com/Dandandan))
- Update arrow-rs deps \(to fix build due to flatbuffers update\) [\#224](https://github.com/apache/datafusion/pull/224) ([alamb](https://github.com/alamb))
- Use standard make_null_array for CASE [\#223](https://github.com/apache/datafusion/pull/223) ([alamb](https://github.com/alamb))
- update arrow-rs deps to latest master [\#216](https://github.com/apache/datafusion/pull/216) ([alamb](https://github.com/alamb))
- MINOR: Remove empty rust dir [\#61](https://github.com/apache/datafusion/pull/61) ([andygrove](https://github.com/andygrove))