blob: bc77b14cad0a2b464e7c2831242912bb45d1b8dc [file] [log] [blame] [view]
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## [10.0.0](https://github.com/apache/datafusion/tree/10.0.0) (2022-07-12)
[Full Changelog](https://github.com/apache/datafusion/compare/9.0.0...10.0.0)
**Breaking changes:**
- Convert batch_size to config option [\#2771](https://github.com/apache/datafusion/pull/2771) ([andygrove](https://github.com/andygrove))
- MINOR: Remove Offset struct [\#2734](https://github.com/apache/datafusion/pull/2734) ([andygrove](https://github.com/andygrove))
- feat: async extension planner [\#2713](https://github.com/apache/datafusion/pull/2713) ([waynexia](https://github.com/waynexia))
- Switch to object_store crate \(\#2489\) [\#2677](https://github.com/apache/datafusion/pull/2677) ([tustvold](https://github.com/tustvold))
**Implemented enhancements:**
- update documentation, fix styling to match main Arrow project [\#2864](https://github.com/apache/datafusion/issues/2864)
- Update top-level README [\#2850](https://github.com/apache/datafusion/issues/2850)
- \[Question\]How to call an async function in `ExecutionPlan::exec` method? [\#2847](https://github.com/apache/datafusion/issues/2847)
- Add `DataFrame::with_column` [\#2844](https://github.com/apache/datafusion/issues/2844)
- Improve ergonomics of physical expr `lit` [\#2827](https://github.com/apache/datafusion/issues/2827)
- Add Python examples for reading CSV and query by SQL in Doc [\#2824](https://github.com/apache/datafusion/issues/2824)
- eliminate multi limit-offset nodes to EmptyRelation if possible [\#2822](https://github.com/apache/datafusion/issues/2822)
- Make `LogicalPlan::Union` be consistent with other plans [\#2816](https://github.com/apache/datafusion/issues/2816)
- Use coerced data type from value and list expressions during planning inlist expression [\#2793](https://github.com/apache/datafusion/issues/2793)
- Add configuration option to enable/disalbe `CoalesceBatchesExec` [\#2790](https://github.com/apache/datafusion/issues/2790)
- Simplify FilterNullJoinKeys rule [\#2780](https://github.com/apache/datafusion/issues/2780)
- Allow configuration settings to be specified with environment variables [\#2776](https://github.com/apache/datafusion/issues/2776)
- Automatically update `configs.md` in user guide [\#2770](https://github.com/apache/datafusion/issues/2770)
- Support multiple paths for ListingTableScanNode [\#2768](https://github.com/apache/datafusion/issues/2768)
- Reduce outer joins [\#2757](https://github.com/apache/datafusion/issues/2757)
- support data type coerced and decimal in INLIST expr [\#2755](https://github.com/apache/datafusion/issues/2755)
- Change ExtensionPlanner::plan_extension\(\) to an async function [\#2749](https://github.com/apache/datafusion/issues/2749)
- Add `IsNotNull` filter to join inputs if one side of join condition does not allow null [\#2739](https://github.com/apache/datafusion/issues/2739)
- Sort preserving MergeJoin [\#2698](https://github.com/apache/datafusion/issues/2698)
- Improve readability of table scan projections in query plans [\#2697](https://github.com/apache/datafusion/issues/2697)
- DataFusion 9.0.0 Release [\#2676](https://github.com/apache/datafusion/issues/2676)
- Improve UX for `UNION` vs `UNION ALL` \(introduce a LogicalPlan::Distinct\) [\#2573](https://github.com/apache/datafusion/issues/2573) [[sql](https://github.com/apache/datafusion/labels/sql)]
- Implement some way to show the sql used to create a view [\#2529](https://github.com/apache/datafusion/issues/2529)
- Consider adopting IOx ObjectStore abstraction [\#2489](https://github.com/apache/datafusion/issues/2489)
- Support `sum0` as a built-in agg function [\#2067](https://github.com/apache/datafusion/issues/2067)
- implement grouping sets, cubes, and rollups [\#1327](https://github.com/apache/datafusion/issues/1327)
- Ruby bindings [\#1114](https://github.com/apache/datafusion/issues/1114)
- Support dates in hash join [\#2746](https://github.com/apache/datafusion/pull/2746) ([andygrove](https://github.com/andygrove))
**Fixed bugs:**
- Docker Error [\#2851](https://github.com/apache/datafusion/issues/2851)
- Anti join ignores join filters [\#2842](https://github.com/apache/datafusion/issues/2842)
- Can't test or compile sub-model code after upgrade to arrow-rs 17.0.0 [\#2835](https://github.com/apache/datafusion/issues/2835)
- Not evaluate the set expr in the InList for the optimization [\#2820](https://github.com/apache/datafusion/issues/2820)
- CASE When: result type should be coercible to a common type [\#2818](https://github.com/apache/datafusion/issues/2818)
- IN/NOT IN List: NULL is not equal to NULL [\#2817](https://github.com/apache/datafusion/issues/2817)
- panic when case statement returns null [\#2798](https://github.com/apache/datafusion/issues/2798)
- InList: Can't cast the list expr data type to value expr data type directly [\#2774](https://github.com/apache/datafusion/issues/2774)
- InList Expr: expr and list values must can be converted to a same data type [\#2759](https://github.com/apache/datafusion/issues/2759)
- tpchgen docker syntax change prevents volume from binding [\#2751](https://github.com/apache/datafusion/issues/2751)
- Cannot join on date columns \(Unsupported data type in hasher: Date32\) [\#2744](https://github.com/apache/datafusion/issues/2744)
- `rewrite_expression` does not properly handle `Exists` and `ScalarSubquery` [\#2736](https://github.com/apache/datafusion/issues/2736)
- LocalFileSystem Not sorted by file name, As a result, the data lines queried in multiple files are out of order. [\#2730](https://github.com/apache/datafusion/issues/2730)
- Filter push down need consider alias columns [\#2725](https://github.com/apache/datafusion/issues/2725)
- Recent API change in `GlobalLimitExec` breaks compatibility with Ballista [\#2720](https://github.com/apache/datafusion/issues/2720)
- Common Subexpression Eliminiation pass errors if run twice on some plans: Schema contains duplicate unqualified field name 'IsNull-Column-sys.host' [\#2712](https://github.com/apache/datafusion/issues/2712)
- The data type is not compatible with other system, for example spark or PG database [\#1379](https://github.com/apache/datafusion/issues/1379)
**Documentation updates:**
- Fix docs styling [\#2865](https://github.com/apache/datafusion/pull/2865) ([kmitchener](https://github.com/kmitchener))
- Various updates to top-level README [\#2854](https://github.com/apache/datafusion/pull/2854) ([andygrove](https://github.com/andygrove))
- MINOR: Add documentation for running integration tests [\#2839](https://github.com/apache/datafusion/pull/2839) ([andygrove](https://github.com/andygrove))
- add csv registration and sql query to examples [\#2825](https://github.com/apache/datafusion/pull/2825) ([waitingkuo](https://github.com/waitingkuo))
- \[minor\] refine doc [\#2753](https://github.com/apache/datafusion/pull/2753) ([Ted-Jiang](https://github.com/Ted-Jiang))
**Closed issues:**
- Consider adding a prominent note in the readme about ballista [\#2853](https://github.com/apache/datafusion/issues/2853)
- support decimal in \(NULL\) [\#2800](https://github.com/apache/datafusion/issues/2800)
- InList: Don't treat Null as UTF8\(None\) [\#2782](https://github.com/apache/datafusion/issues/2782)
- InList: don't need to treat Null as UTF8 data type [\#2773](https://github.com/apache/datafusion/issues/2773)
- Implement extensible configuration mechanism [\#138](https://github.com/apache/datafusion/issues/138)
**Merged pull requests:**
- Update CONTRIBUTING.md [\#2876](https://github.com/apache/datafusion/pull/2876) ([waitingkuo](https://github.com/waitingkuo))
- Make LogicalPlan::Union be consistent with other plans [\#2868](https://github.com/apache/datafusion/pull/2868) ([comphead](https://github.com/comphead))
- minor: remove unneeded files from project root [\#2863](https://github.com/apache/datafusion/pull/2863) ([kmitchener](https://github.com/kmitchener))
- chore: make cargo clippy happy in nigtly [\#2860](https://github.com/apache/datafusion/pull/2860) [[sql](https://github.com/apache/datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
- Update to arrow 18.0.0 [\#2856](https://github.com/apache/datafusion/pull/2856) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- chore: remove ballista-related docker-compose file [\#2852](https://github.com/apache/datafusion/pull/2852) ([xudong963](https://github.com/xudong963))
- Adding dataframe with_column function [\#2849](https://github.com/apache/datafusion/pull/2849) ([comphead](https://github.com/comphead))
- anti joins now respect join filters [\#2843](https://github.com/apache/datafusion/pull/2843) ([andygrove](https://github.com/andygrove))
- MINOR: make name meaningful and clean up code [\#2841](https://github.com/apache/datafusion/pull/2841) ([liukun4515](https://github.com/liukun4515))
- Make `lit` implementation more concise [\#2838](https://github.com/apache/datafusion/pull/2838) ([alamb](https://github.com/alamb))
- InList: set/list value must be evaluated to get the values [\#2834](https://github.com/apache/datafusion/pull/2834) ([liukun4515](https://github.com/liukun4515))
- Add SHOW CREATE TABLE with initial support for views [\#2830](https://github.com/apache/datafusion/pull/2830) [[sql](https://github.com/apache/datafusion/labels/sql)] ([mrob95](https://github.com/mrob95))
- Improve ergonomics of physical expr `lit` [\#2828](https://github.com/apache/datafusion/pull/2828) ([alamb](https://github.com/alamb))
- Eliminate multi limit-offset nodes to emptyRelation [\#2823](https://github.com/apache/datafusion/pull/2823) ([AssHero](https://github.com/AssHero))
- Fix the ci [\#2821](https://github.com/apache/datafusion/pull/2821) ([liukun4515](https://github.com/liukun4515))
- CaseWhen: coerce the all then and else data type to a common data type [\#2819](https://github.com/apache/datafusion/pull/2819) ([liukun4515](https://github.com/liukun4515))
- Fix `ScalarValue::isNull` calculation [\#2815](https://github.com/apache/datafusion/pull/2815) ([alamb](https://github.com/alamb))
- Fix nullability calculation for `CASE` expressions [\#2814](https://github.com/apache/datafusion/pull/2814) ([alamb](https://github.com/alamb))
- Bump numpy from 1.21.3 to 1.22.0 in /integration-tests [\#2811](https://github.com/apache/datafusion/pull/2811) ([xudong963](https://github.com/xudong963))
- Fix data type calculation for `CaseExpr` s with `NULLs` [\#2810](https://github.com/apache/datafusion/pull/2810) ([AssHero](https://github.com/AssHero))
- InList: fix bug for comparing with Null in the list using the set optimization [\#2809](https://github.com/apache/datafusion/pull/2809) ([liukun4515](https://github.com/liukun4515))
- Use specialized dictionary kernels \(\#1178\) [\#2808](https://github.com/apache/datafusion/pull/2808) ([tustvold](https://github.com/tustvold))
- fix schema nullability for `information_schema` schema [\#2804](https://github.com/apache/datafusion/pull/2804) ([alamb](https://github.com/alamb))
- fix: correctly calculate join output schema nullability [\#2803](https://github.com/apache/datafusion/pull/2803) ([alamb](https://github.com/alamb))
- Correct schema nullability declaration in tests [\#2802](https://github.com/apache/datafusion/pull/2802) ([alamb](https://github.com/alamb))
- Don't treat Null as UTF8\(None\) and change error info. [\#2801](https://github.com/apache/datafusion/pull/2801) ([liukun4515](https://github.com/liukun4515))
- MINOR: Remove reference to docker image that is no longer available [\#2795](https://github.com/apache/datafusion/pull/2795) ([andygrove](https://github.com/andygrove))
- Use coerced type in inlist expr planning [\#2794](https://github.com/apache/datafusion/pull/2794) ([viirya](https://github.com/viirya))
- Add LogicalPlan::Distinct [\#2792](https://github.com/apache/datafusion/pull/2792) [[sql](https://github.com/apache/datafusion/labels/sql)] ([mrob95](https://github.com/mrob95))
- Add config option for coalesce_batches physical optimization rule, make optional [\#2791](https://github.com/apache/datafusion/pull/2791) ([andygrove](https://github.com/andygrove))
- Improve readability of table scan projections in query plans \(remove `Some` and `None`\) [\#2789](https://github.com/apache/datafusion/pull/2789) [[sql](https://github.com/apache/datafusion/labels/sql)] ([comphead](https://github.com/comphead))
- Simplify FilterNullJoinKeys rule [\#2781](https://github.com/apache/datafusion/pull/2781) ([andygrove](https://github.com/andygrove))
- MINOR: re-export sqlparser from datafusion-sql crate [\#2779](https://github.com/apache/datafusion/pull/2779) [[sql](https://github.com/apache/datafusion/labels/sql)] ([andygrove](https://github.com/andygrove))
- Update to arrow 17.0.0 [\#2778](https://github.com/apache/datafusion/pull/2778) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- Support multiple paths for ListingTableScanNode [\#2775](https://github.com/apache/datafusion/pull/2775) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Remove expr_sub_expressions and rewrite_expression functions [\#2772](https://github.com/apache/datafusion/pull/2772) ([mrob95](https://github.com/mrob95))
- minor: update cranelift related dependencies [\#2769](https://github.com/apache/datafusion/pull/2769) ([xudong963](https://github.com/xudong963))
- minor: panic rather than fail silently on bad dictionary in hash join [\#2767](https://github.com/apache/datafusion/pull/2767) ([alamb](https://github.com/alamb))
- MINOR: make `prettier` use consistent between CI and contributing guide [\#2766](https://github.com/apache/datafusion/pull/2766) ([andygrove](https://github.com/andygrove))
- Rewrite subexpressions of InSubquery in rewrite_expression [\#2765](https://github.com/apache/datafusion/pull/2765) ([mrob95](https://github.com/mrob95))
- Support `DataType::Decimal` for `IN` and `NOT IN` expressions [\#2764](https://github.com/apache/datafusion/pull/2764) ([liukun4515](https://github.com/liukun4515))
- Implement extensible configuration mechanism [\#2754](https://github.com/apache/datafusion/pull/2754) ([andygrove](https://github.com/andygrove))
- Remove redundant docker argument [\#2752](https://github.com/apache/datafusion/pull/2752) ([avantgardnerio](https://github.com/avantgardnerio))
- Add optimizer pass to reduce `left`/`right`/`full` joins to `inner` join if possible [\#2750](https://github.com/apache/datafusion/pull/2750) [[sql](https://github.com/apache/datafusion/labels/sql)] ([AssHero](https://github.com/AssHero))
- MINOR: Remove legacy CLI context enum [\#2748](https://github.com/apache/datafusion/pull/2748) ([andygrove](https://github.com/andygrove))
- CSE unit test for duplicate fields [\#2747](https://github.com/apache/datafusion/pull/2747) ([waynexia](https://github.com/waynexia))
- MINOR: Improve unsupported data type error message [\#2745](https://github.com/apache/datafusion/pull/2745) ([andygrove](https://github.com/andygrove))
- Add optimizer rule to filter out null keys before a join [\#2740](https://github.com/apache/datafusion/pull/2740) ([andygrove](https://github.com/andygrove))
- Sort file names in a directory \#2730 [\#2735](https://github.com/apache/datafusion/pull/2735) ([yourenawo](https://github.com/yourenawo))
- fix: filter push down with `InList` expressions [\#2729](https://github.com/apache/datafusion/pull/2729) ([Ted-Jiang](https://github.com/Ted-Jiang))
- \[Minor\] add debug info in optimizer.rs [\#2726](https://github.com/apache/datafusion/pull/2726) ([Ted-Jiang](https://github.com/Ted-Jiang))
- Add public API for GlobalLimitExec and LocalLimitExec [\#2722](https://github.com/apache/datafusion/pull/2722) ([andygrove](https://github.com/andygrove))
- Add additional data types are supported in hash join [\#2721](https://github.com/apache/datafusion/pull/2721) ([AssHero](https://github.com/AssHero))
- Upgrade to arrow `16.0.0` [\#2718](https://github.com/apache/datafusion/pull/2718) [[sql](https://github.com/apache/datafusion/labels/sql)] ([alamb](https://github.com/alamb))
- Fix clippy warnings with toolchain 1.63 [\#2717](https://github.com/apache/datafusion/pull/2717) [[sql](https://github.com/apache/datafusion/labels/sql)] ([waynexia](https://github.com/waynexia))
- Support for GROUPING SETS/CUBE/ROLLUP [\#2716](https://github.com/apache/datafusion/pull/2716) ([thinkharderdev](https://github.com/thinkharderdev))
- fix: check redundant fields while building projection plan [\#2715](https://github.com/apache/datafusion/pull/2715) ([waynexia](https://github.com/waynexia))
- Sort preserving `SortMergeJoin` [\#2699](https://github.com/apache/datafusion/pull/2699) ([korowa](https://github.com/korowa))
- fix: union schema fix [\#2688](https://github.com/apache/datafusion/pull/2688) [[sql](https://github.com/apache/datafusion/labels/sql)] ([gandronchik](https://github.com/gandronchik))
- Support default precision and scale to`CAST <EXPR> AS DECIMAL` [\#2680](https://github.com/apache/datafusion/pull/2680) [[sql](https://github.com/apache/datafusion/labels/sql)] ([gandronchik](https://github.com/gandronchik))