create datafusion 6.0.0, ballista 0.6.0 and python 0.4.0 releases (#1253)
* create datafusion 6.0.0, ballista 0.6.0 and python 0.4.0 releases
diff --git a/.github_changelog_generator b/.github_changelog_generator
index 6ee6508..45eef2f 100644
--- a/.github_changelog_generator
+++ b/.github_changelog_generator
@@ -18,8 +18,6 @@
# under the License.
#
-# point to the old changelog in apache/arrow
-front-matter=For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)\n
# some issues are just documentation
add-sections={"documentation":{"prefix":"**Documentation updates:**","labels":["documentation"]},"performance":{"prefix":"**Performance improvements:**","labels":["performance"]}}
# uncomment to not show PRs. TBD if we shown them or not.
diff --git a/README.md b/README.md
index 5ca0802..1d2db92 100644
--- a/README.md
+++ b/README.md
@@ -129,7 +129,7 @@
```toml
[dependencies]
-datafusion = "5.0.0"
+datafusion = "6.0.0"
```
## Using DataFusion as a binary
diff --git a/ballista-examples/Cargo.toml b/ballista-examples/Cargo.toml
index e6c15e0..6b99f9b 100644
--- a/ballista-examples/Cargo.toml
+++ b/ballista-examples/Cargo.toml
@@ -31,7 +31,7 @@
[dependencies]
arrow-flight = { version = "6.1.0" }
datafusion = { path = "../datafusion" }
-ballista = { path = "../ballista/rust/client" }
+ballista = { path = "../ballista/rust/client", version = "0.6.0"}
prost = "0.8"
tonic = "0.5"
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] }
diff --git a/ballista/CHANGELOG.md b/ballista/CHANGELOG.md
index 287229b..b8268fc 100644
--- a/ballista/CHANGELOG.md
+++ b/ballista/CHANGELOG.md
@@ -17,10 +17,96 @@
under the License.
-->
-For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
-
# Changelog
+## [ballista-0.6.0](https://github.com/apache/arrow-datafusion/tree/ballista-0.6.0) (2021-11-13)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/ballista-0.5.0...ballista-0.6.0)
+
+**Breaking changes:**
+
+- File partitioning for ListingTable [\#1141](https://github.com/apache/arrow-datafusion/pull/1141) ([rdettai](https://github.com/rdettai))
+- Register tables in BallistaContext using TableProviders instead of Dataframe [\#1028](https://github.com/apache/arrow-datafusion/pull/1028) ([rdettai](https://github.com/rdettai))
+- Make TableProvider.scan\(\) and PhysicalPlanner::create\_physical\_plan\(\) async [\#1013](https://github.com/apache/arrow-datafusion/pull/1013) ([rdettai](https://github.com/rdettai))
+- Reorganize table providers by table format [\#1010](https://github.com/apache/arrow-datafusion/pull/1010) ([rdettai](https://github.com/rdettai))
+- Move CBOs and Statistics to physical plan [\#965](https://github.com/apache/arrow-datafusion/pull/965) ([rdettai](https://github.com/rdettai))
+- Update to sqlparser v 0.10.0 [\#934](https://github.com/apache/arrow-datafusion/pull/934) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- FilePartition and PartitionedFile for scanning flexibility [\#932](https://github.com/apache/arrow-datafusion/pull/932) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([yjshen](https://github.com/yjshen))
+- Improve SQLMetric APIs, port existing metrics [\#908](https://github.com/apache/arrow-datafusion/pull/908) ([alamb](https://github.com/alamb))
+- Add support for EXPLAIN ANALYZE [\#858](https://github.com/apache/arrow-datafusion/pull/858) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- Rename concurrency to target\_partitions [\#706](https://github.com/apache/arrow-datafusion/pull/706) ([andygrove](https://github.com/andygrove))
+
+**Implemented enhancements:**
+
+- Update datafusion-cli to support Ballista, or implement new ballista-cli [\#886](https://github.com/apache/arrow-datafusion/issues/886)
+- Prepare Ballista crates for publishing [\#509](https://github.com/apache/arrow-datafusion/issues/509)
+- Add drop table support [\#1266](https://github.com/apache/arrow-datafusion/pull/1266) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([viirya](https://github.com/viirya))
+- use arrow 6.1.0 [\#1255](https://github.com/apache/arrow-datafusion/pull/1255) ([Jimexist](https://github.com/Jimexist))
+- Add support for `create table as` via MemTable [\#1243](https://github.com/apache/arrow-datafusion/pull/1243) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Dandandan](https://github.com/Dandandan))
+- add values list expression [\#1165](https://github.com/apache/arrow-datafusion/pull/1165) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Jimexist](https://github.com/Jimexist))
+- Multiple files per partitions for CSV Avro Json [\#1138](https://github.com/apache/arrow-datafusion/pull/1138) ([rdettai](https://github.com/rdettai))
+- Implement INTERSECT & INTERSECT DISTINCT [\#1135](https://github.com/apache/arrow-datafusion/pull/1135) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- Simplify file struct abstractions [\#1120](https://github.com/apache/arrow-datafusion/pull/1120) ([rdettai](https://github.com/rdettai))
+- Implement `is [not] distinct from` [\#1117](https://github.com/apache/arrow-datafusion/pull/1117) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Dandandan](https://github.com/Dandandan))
+- add digest\(utf8, method\) function and refactor all current hash digest functions [\#1090](https://github.com/apache/arrow-datafusion/pull/1090) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add `blake3` algorithm to `digest` function [\#1086](https://github.com/apache/arrow-datafusion/pull/1086) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add blake2b and blake2s functions [\#1081](https://github.com/apache/arrow-datafusion/pull/1081) ([Jimexist](https://github.com/Jimexist))
+- Update sqlparser-rs to 0.11 [\#1052](https://github.com/apache/arrow-datafusion/pull/1052) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- remove hard coded partition count in ballista logicalplan deserialization [\#1044](https://github.com/apache/arrow-datafusion/pull/1044) ([xudong963](https://github.com/xudong963))
+- Indexed field access for List [\#1006](https://github.com/apache/arrow-datafusion/pull/1006) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Igosuki](https://github.com/Igosuki))
+- Update DataFusion to arrow 6.0 [\#984](https://github.com/apache/arrow-datafusion/pull/984) ([alamb](https://github.com/alamb))
+- Implement Display for Expr, improve operator display [\#971](https://github.com/apache/arrow-datafusion/pull/971) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([matthewmturner](https://github.com/matthewmturner))
+- ObjectStore API to read from remote storage systems [\#950](https://github.com/apache/arrow-datafusion/pull/950) ([yjshen](https://github.com/yjshen))
+- fixes \#933 replace placeholder fmt\_as fr ExecutionPlan impls [\#939](https://github.com/apache/arrow-datafusion/pull/939) ([tiphaineruy](https://github.com/tiphaineruy))
+- Support `NotLike` in Ballista [\#916](https://github.com/apache/arrow-datafusion/pull/916) ([Dandandan](https://github.com/Dandandan))
+- Avro Table Provider [\#910](https://github.com/apache/arrow-datafusion/pull/910) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Igosuki](https://github.com/Igosuki))
+- Add BaselineMetrics, Timestamp metrics, add for `CoalescePartitionsExec`, rename output\_time -\> elapsed\_compute [\#909](https://github.com/apache/arrow-datafusion/pull/909) ([alamb](https://github.com/alamb))
+- \[Ballista\] Add executor last seen info to the ui [\#895](https://github.com/apache/arrow-datafusion/pull/895) ([msathis](https://github.com/msathis))
+- add cross join support to ballista [\#891](https://github.com/apache/arrow-datafusion/pull/891) ([houqp](https://github.com/houqp))
+- Add Ballista support to DataFusion CLI [\#889](https://github.com/apache/arrow-datafusion/pull/889) ([andygrove](https://github.com/andygrove))
+- Add support for PostgreSQL regex match [\#870](https://github.com/apache/arrow-datafusion/pull/870) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([b41sh](https://github.com/b41sh))
+
+**Fixed bugs:**
+
+- Test execution\_plans::shuffle\_writer::tests::test Fail [\#1040](https://github.com/apache/arrow-datafusion/issues/1040)
+- Integration test fails to build docker images [\#918](https://github.com/apache/arrow-datafusion/issues/918)
+- Ballista: Remove hard-coded concurrency from logical plan serde code [\#708](https://github.com/apache/arrow-datafusion/issues/708)
+- How can I make ballista distributed compute work? [\#327](https://github.com/apache/arrow-datafusion/issues/327)
+- fix subquery alias [\#1067](https://github.com/apache/arrow-datafusion/pull/1067) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- Fix compilation for ballista in stand-alone mode [\#1008](https://github.com/apache/arrow-datafusion/pull/1008) ([Igosuki](https://github.com/Igosuki))
+
+**Documentation updates:**
+
+- Add Ballista roadmap [\#1166](https://github.com/apache/arrow-datafusion/pull/1166) ([andygrove](https://github.com/andygrove))
+- Adds note on compatible rust version [\#1097](https://github.com/apache/arrow-datafusion/pull/1097) ([1nF0rmed](https://github.com/1nF0rmed))
+- implement `approx_distinct` function using HyperLogLog [\#1087](https://github.com/apache/arrow-datafusion/pull/1087) ([Jimexist](https://github.com/Jimexist))
+- Improve User Guide [\#954](https://github.com/apache/arrow-datafusion/pull/954) ([andygrove](https://github.com/andygrove))
+- Update plan\_query\_stages doc [\#951](https://github.com/apache/arrow-datafusion/pull/951) ([rdettai](https://github.com/rdettai))
+- \[DataFusion\] - Add show and show\_limit function for DataFrame [\#923](https://github.com/apache/arrow-datafusion/pull/923) ([francis-du](https://github.com/francis-du))
+- update docs related to protoc and optional syntax [\#902](https://github.com/apache/arrow-datafusion/pull/902) ([Jimexist](https://github.com/Jimexist))
+- Improve Ballista crate README content [\#878](https://github.com/apache/arrow-datafusion/pull/878) ([andygrove](https://github.com/andygrove))
+
+**Performance improvements:**
+
+- optimize build profile for datafusion python binding, cli and ballista [\#1137](https://github.com/apache/arrow-datafusion/pull/1137) ([houqp](https://github.com/houqp))
+
+**Closed issues:**
+
+- InList expr with NULL literals do not work [\#1190](https://github.com/apache/arrow-datafusion/issues/1190)
+- update the homepage README to include values, `approx_distinct`, etc. [\#1171](https://github.com/apache/arrow-datafusion/issues/1171)
+- \[Python\]: Inconsistencies with Python package name [\#1011](https://github.com/apache/arrow-datafusion/issues/1011)
+- Wanting to contribute to project where to start? [\#983](https://github.com/apache/arrow-datafusion/issues/983)
+- delete redundant code [\#973](https://github.com/apache/arrow-datafusion/issues/973)
+- How to build DataFusion python wheel [\#853](https://github.com/apache/arrow-datafusion/issues/853)
+- Produce a design for a metrics framework [\#21](https://github.com/apache/arrow-datafusion/issues/21)
+
+**Merged pull requests:**
+
+- \[nit\] simplify ballista executor `CollectExec` impl codes [\#1140](https://github.com/apache/arrow-datafusion/pull/1140) ([panarch](https://github.com/panarch))
+
+
+For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
+
## [ballista-0.5.0](https://github.com/apache/arrow-datafusion/tree/ballista-0.5.0) (2021-08-10)
[Full Changelog](https://github.com/apache/arrow-datafusion/compare/4.0.0...ballista-0.5.0)
diff --git a/ballista/rust/client/Cargo.toml b/ballista/rust/client/Cargo.toml
index adac150..f444689 100644
--- a/ballista/rust/client/Cargo.toml
+++ b/ballista/rust/client/Cargo.toml
@@ -34,7 +34,7 @@
log = "0.4"
tokio = "1.0"
-datafusion = { path = "../../../datafusion", version = "5.1.0" }
+datafusion = { path = "../../../datafusion", version = "6.0.0" }
[features]
default = []
diff --git a/ballista/rust/core/Cargo.toml b/ballista/rust/core/Cargo.toml
index f90d03b..3d15e21 100644
--- a/ballista/rust/core/Cargo.toml
+++ b/ballista/rust/core/Cargo.toml
@@ -45,7 +45,7 @@
arrow-flight = { version = "6.1.0" }
-datafusion = { path = "../../../datafusion", version = "5.1.0" }
+datafusion = { path = "../../../datafusion", version = "6.0.0" }
[dev-dependencies]
tempfile = "3"
diff --git a/ballista/rust/executor/Cargo.toml b/ballista/rust/executor/Cargo.toml
index 5c01f1c..08116f5 100644
--- a/ballista/rust/executor/Cargo.toml
+++ b/ballista/rust/executor/Cargo.toml
@@ -35,7 +35,7 @@
async-trait = "0.1.36"
ballista-core = { path = "../core", version = "0.6.0" }
configure_me = "0.4.0"
-datafusion = { path = "../../../datafusion", version = "5.1.0" }
+datafusion = { path = "../../../datafusion", version = "6.0.0" }
env_logger = "0.9"
futures = "0.3"
log = "0.4"
diff --git a/ballista/rust/scheduler/Cargo.toml b/ballista/rust/scheduler/Cargo.toml
index ac0d987..a71be40 100644
--- a/ballista/rust/scheduler/Cargo.toml
+++ b/ballista/rust/scheduler/Cargo.toml
@@ -35,7 +35,7 @@
ballista-core = { path = "../core", version = "0.6.0" }
clap = "2"
configure_me = "0.4.0"
-datafusion = { path = "../../../datafusion", version = "5.1.0" }
+datafusion = { path = "../../../datafusion", version = "6.0.0" }
env_logger = "0.9"
etcd-client = { version = "0.7", optional = true }
futures = "0.3"
diff --git a/datafusion-cli/Cargo.toml b/datafusion-cli/Cargo.toml
index b1cc09a..3212b67 100644
--- a/datafusion-cli/Cargo.toml
+++ b/datafusion-cli/Cargo.toml
@@ -30,6 +30,6 @@
clap = "2.33"
rustyline = "9.0"
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] }
-datafusion = { path = "../datafusion", version = "5.1.0" }
+datafusion = { path = "../datafusion", version = "6.0.0" }
arrow = { version = "6.1.0" }
ballista = { path = "../ballista/rust/client", version = "0.6.0" }
diff --git a/datafusion/CHANGELOG.md b/datafusion/CHANGELOG.md
index 41afa28..c22b055 100644
--- a/datafusion/CHANGELOG.md
+++ b/datafusion/CHANGELOG.md
@@ -17,10 +17,197 @@
under the License.
-->
-For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
-
# Changelog
+## [6.0.0](https://github.com/apache/arrow-datafusion/tree/6.0.0) (2021-11-13)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/5.0.0...6.0.0)
+
+**Breaking changes:**
+
+- Removed deprecated with\_concurrency [\#1200](https://github.com/apache/arrow-datafusion/pull/1200) ([rdettai](https://github.com/rdettai))
+- File partitioning for ListingTable [\#1141](https://github.com/apache/arrow-datafusion/pull/1141) ([rdettai](https://github.com/rdettai))
+- Add function volatility to Signature [\#1071](https://github.com/apache/arrow-datafusion/pull/1071) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([pjmore](https://github.com/pjmore))
+- fix: allow duplicate field names in table join, fix output with duplicated names [\#1023](https://github.com/apache/arrow-datafusion/pull/1023) ([houqp](https://github.com/houqp))
+- Make TableProvider.scan\(\) and PhysicalPlanner::create\_physical\_plan\(\) async [\#1013](https://github.com/apache/arrow-datafusion/pull/1013) ([rdettai](https://github.com/rdettai))
+- Reorganize table providers by table format [\#1010](https://github.com/apache/arrow-datafusion/pull/1010) ([rdettai](https://github.com/rdettai))
+- Make Metrics::labels\(\) public [\#999](https://github.com/apache/arrow-datafusion/pull/999) ([alamb](https://github.com/alamb))
+- Rename NthValue::{first\_value,last\_value,nth\_value} to satisfy clippy in Rust 1.55 [\#986](https://github.com/apache/arrow-datafusion/pull/986) ([alamb](https://github.com/alamb))
+- Move CBOs and Statistics to physical plan [\#965](https://github.com/apache/arrow-datafusion/pull/965) ([rdettai](https://github.com/rdettai))
+- Update to sqlparser v 0.10.0 [\#934](https://github.com/apache/arrow-datafusion/pull/934) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- FilePartition and PartitionedFile for scanning flexibility [\#932](https://github.com/apache/arrow-datafusion/pull/932) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([yjshen](https://github.com/yjshen))
+- Improve SQLMetric APIs, port existing metrics [\#908](https://github.com/apache/arrow-datafusion/pull/908) ([alamb](https://github.com/alamb))
+- Add support for EXPLAIN ANALYZE [\#858](https://github.com/apache/arrow-datafusion/pull/858) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- Rename concurrency to target\_partitions [\#706](https://github.com/apache/arrow-datafusion/pull/706) ([andygrove](https://github.com/andygrove))
+
+**Implemented enhancements:**
+
+- Add booleans support to the `CASE` statement [\#1156](https://github.com/apache/arrow-datafusion/issues/1156)
+- Implement General Purpose Constant Folding with the Expression Evaluator [\#1070](https://github.com/apache/arrow-datafusion/issues/1070)
+- Mark volatility categories of functions [\#1069](https://github.com/apache/arrow-datafusion/issues/1069)
+- Add "show" support to DataFrame API [\#937](https://github.com/apache/arrow-datafusion/issues/937)
+- Add support for TRIM BOTH/LEADING/TRAILING [\#935](https://github.com/apache/arrow-datafusion/issues/935)
+- Add "baseline" metrics to all built in operators [\#866](https://github.com/apache/arrow-datafusion/issues/866)
+- Add SQL support for referencing fields in structs [\#119](https://github.com/apache/arrow-datafusion/issues/119)
+- add filename completer for create table statement [\#1278](https://github.com/apache/arrow-datafusion/pull/1278) ([Jimexist](https://github.com/Jimexist))
+- Add drop table support [\#1266](https://github.com/apache/arrow-datafusion/pull/1266) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([viirya](https://github.com/viirya))
+- Dataframe supports except and update readme [\#1261](https://github.com/apache/arrow-datafusion/pull/1261) ([xudong963](https://github.com/xudong963))
+- Implement EXCEPT & EXCEPT DISTINCT [\#1259](https://github.com/apache/arrow-datafusion/pull/1259) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- Add DataFrame support for `INTERSECT` and update readme [\#1258](https://github.com/apache/arrow-datafusion/pull/1258) ([xudong963](https://github.com/xudong963))
+- use arrow 6.1.0 [\#1255](https://github.com/apache/arrow-datafusion/pull/1255) ([Jimexist](https://github.com/Jimexist))
+- fix 1250, add editor support for datafusion cli with validation [\#1251](https://github.com/apache/arrow-datafusion/pull/1251) ([Jimexist](https://github.com/Jimexist))
+- Add support for `create table as` via MemTable [\#1243](https://github.com/apache/arrow-datafusion/pull/1243) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Dandandan](https://github.com/Dandandan))
+- Add cli show columns command to describe tables [\#1231](https://github.com/apache/arrow-datafusion/pull/1231) ([Jimexist](https://github.com/Jimexist))
+- datafusion-cli to add list table command [\#1229](https://github.com/apache/arrow-datafusion/pull/1229) ([Jimexist](https://github.com/Jimexist))
+- datafusion cli to handle EoF and interrupt signal [\#1225](https://github.com/apache/arrow-datafusion/pull/1225) ([Jimexist](https://github.com/Jimexist))
+- add \q as quit command and add \? for help [\#1224](https://github.com/apache/arrow-datafusion/pull/1224) ([Jimexist](https://github.com/Jimexist))
+- Add algebraic simplifications to constant\_folding [\#1208](https://github.com/apache/arrow-datafusion/pull/1208) ([matthewmturner](https://github.com/matthewmturner))
+- Improve GetIndexedFieldExpr adding utf8 key based access for struct v… [\#1204](https://github.com/apache/arrow-datafusion/pull/1204) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Igosuki](https://github.com/Igosuki))
+- Fix `between` in select query [\#1202](https://github.com/apache/arrow-datafusion/pull/1202) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([capkurmagati](https://github.com/capkurmagati))
+- Move code to fold Stable functions like `now()` from `Simplifier` to `ConstEvaluator` [\#1176](https://github.com/apache/arrow-datafusion/pull/1176) ([alamb](https://github.com/alamb))
+- DataFrame supports window function [\#1167](https://github.com/apache/arrow-datafusion/pull/1167) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- add values list expression [\#1165](https://github.com/apache/arrow-datafusion/pull/1165) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Jimexist](https://github.com/Jimexist))
+- Add booleans support to the CASE statement [\#1161](https://github.com/apache/arrow-datafusion/pull/1161) ([xudong963](https://github.com/xudong963))
+- Improve error messages when operations are not supported [\#1158](https://github.com/apache/arrow-datafusion/pull/1158) ([alamb](https://github.com/alamb))
+- Generic constant expression evaluation [\#1153](https://github.com/apache/arrow-datafusion/pull/1153) ([alamb](https://github.com/alamb))
+- python `lit` function to support bool and byte vec [\#1152](https://github.com/apache/arrow-datafusion/pull/1152) ([Jimexist](https://github.com/Jimexist))
+- \[nit\] simplify datafusion optimizer module codes [\#1146](https://github.com/apache/arrow-datafusion/pull/1146) ([panarch](https://github.com/panarch))
+- Add ScalarValue support for arbitrary list elements [\#1142](https://github.com/apache/arrow-datafusion/pull/1142) ([jonmmease](https://github.com/jonmmease))
+- Multiple files per partitions for CSV Avro Json [\#1138](https://github.com/apache/arrow-datafusion/pull/1138) ([rdettai](https://github.com/rdettai))
+- Implement INTERSECT & INTERSECT DISTINCT [\#1135](https://github.com/apache/arrow-datafusion/pull/1135) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- Simplify file struct abstractions [\#1120](https://github.com/apache/arrow-datafusion/pull/1120) ([rdettai](https://github.com/rdettai))
+- Implement `is [not] distinct from` [\#1117](https://github.com/apache/arrow-datafusion/pull/1117) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Dandandan](https://github.com/Dandandan))
+- Clean up spawned task on drop for `RepartitionExec`, `SortPreservingMergeExec`, `WindowAggExec` [\#1112](https://github.com/apache/arrow-datafusion/pull/1112) ([crepererum](https://github.com/crepererum))
+- add hyperloglog implementation \(`add` and `count`\) [\#1095](https://github.com/apache/arrow-datafusion/pull/1095) ([Jimexist](https://github.com/Jimexist))
+- Add ScalarValue::Struct variant [\#1091](https://github.com/apache/arrow-datafusion/pull/1091) ([jonmmease](https://github.com/jonmmease))
+- add digest\(utf8, method\) function and refactor all current hash digest functions [\#1090](https://github.com/apache/arrow-datafusion/pull/1090) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add `blake3` algorithm to `digest` function [\#1086](https://github.com/apache/arrow-datafusion/pull/1086) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add blake2b and blake2s functions [\#1081](https://github.com/apache/arrow-datafusion/pull/1081) ([Jimexist](https://github.com/Jimexist))
+- \[nit\] make schema qualifier error message in field lookup more readable [\#1079](https://github.com/apache/arrow-datafusion/pull/1079) ([Jimexist](https://github.com/Jimexist))
+- \[window function\] add `percent_rank` window function [\#1077](https://github.com/apache/arrow-datafusion/pull/1077) ([Jimexist](https://github.com/Jimexist))
+- \[window function\] add `cume_dist` implementation [\#1076](https://github.com/apache/arrow-datafusion/pull/1076) ([Jimexist](https://github.com/Jimexist))
+- Add a LogicalPlanBuilder::schema\(\) function [\#1075](https://github.com/apache/arrow-datafusion/pull/1075) ([alamb](https://github.com/alamb))
+- Add support for UNION \[DISTINCT\] sql [\#1068](https://github.com/apache/arrow-datafusion/pull/1068) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- fix: fix joins on Float32/Float64 columns bug [\#1054](https://github.com/apache/arrow-datafusion/pull/1054) ([francis-du](https://github.com/francis-du))
+- Update sqlparser-rs to 0.11 [\#1052](https://github.com/apache/arrow-datafusion/pull/1052) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([alamb](https://github.com/alamb))
+- Support querying CSV files without providing the schema [\#1050](https://github.com/apache/arrow-datafusion/pull/1050) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- remove hard coded partition count in ballista logicalplan deserialization [\#1044](https://github.com/apache/arrow-datafusion/pull/1044) ([xudong963](https://github.com/xudong963))
+- feat: add lit\_timestamp\_nanosecond [\#1030](https://github.com/apache/arrow-datafusion/pull/1030) ([NGA-TRAN](https://github.com/NGA-TRAN))
+- Ignore metadata on schema merge [\#1024](https://github.com/apache/arrow-datafusion/pull/1024) ([Smurphy000](https://github.com/Smurphy000))
+- add ExecutionConfig.with\_optimizer\_rules [\#1022](https://github.com/apache/arrow-datafusion/pull/1022) ([seddonm1](https://github.com/seddonm1))
+- Add baseline execution stats to `WindowAggExec` and `UnionExec`, and fixup `CoalescePartitionsExec` [\#1018](https://github.com/apache/arrow-datafusion/pull/1018) ([alamb](https://github.com/alamb))
+- Derive PartialOrd for Expr [\#1015](https://github.com/apache/arrow-datafusion/pull/1015) ([alamb](https://github.com/alamb))
+- Indexed field access for List [\#1006](https://github.com/apache/arrow-datafusion/pull/1006) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Igosuki](https://github.com/Igosuki))
+- Add metrics for Limit and Projection, and CoalesceBatches [\#1004](https://github.com/apache/arrow-datafusion/pull/1004) ([alamb](https://github.com/alamb))
+- Update DataFusion to arrow 6.0 [\#984](https://github.com/apache/arrow-datafusion/pull/984) ([alamb](https://github.com/alamb))
+- Implement Display for Expr, improve operator display [\#971](https://github.com/apache/arrow-datafusion/pull/971) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([matthewmturner](https://github.com/matthewmturner))
+- Add metrics for FilterExec [\#960](https://github.com/apache/arrow-datafusion/pull/960) ([alamb](https://github.com/alamb))
+- Change compound column field name rules [\#952](https://github.com/apache/arrow-datafusion/pull/952) ([waynexia](https://github.com/waynexia))
+- ObjectStore API to read from remote storage systems [\#950](https://github.com/apache/arrow-datafusion/pull/950) ([yjshen](https://github.com/yjshen))
+- Add baseline metrics to `SortPreservingMergeExec` [\#948](https://github.com/apache/arrow-datafusion/pull/948) ([alamb](https://github.com/alamb))
+- Add support for TRIM LEADING/TRAILING/BOTH syntax [\#947](https://github.com/apache/arrow-datafusion/pull/947) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([adsharma](https://github.com/adsharma))
+- fixes \#933 replace placeholder fmt\_as fr ExecutionPlan impls [\#939](https://github.com/apache/arrow-datafusion/pull/939) ([tiphaineruy](https://github.com/tiphaineruy))
+- Add metrics for SortExect + HashAggregateExec [\#938](https://github.com/apache/arrow-datafusion/pull/938) ([alamb](https://github.com/alamb))
+- Add some additional asserts in `utils::from_plan` [\#930](https://github.com/apache/arrow-datafusion/pull/930) ([alamb](https://github.com/alamb))
+- Avro Table Provider [\#910](https://github.com/apache/arrow-datafusion/pull/910) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Igosuki](https://github.com/Igosuki))
+- Add BaselineMetrics, Timestamp metrics, add for `CoalescePartitionsExec`, rename output\_time -\> elapsed\_compute [\#909](https://github.com/apache/arrow-datafusion/pull/909) ([alamb](https://github.com/alamb))
+- add cross join support to ballista [\#891](https://github.com/apache/arrow-datafusion/pull/891) ([houqp](https://github.com/houqp))
+- Add Ballista support to DataFusion CLI [\#889](https://github.com/apache/arrow-datafusion/pull/889) ([andygrove](https://github.com/andygrove))
+- support like on DictionaryArray [\#876](https://github.com/apache/arrow-datafusion/pull/876) ([b41sh](https://github.com/b41sh))
+- Register table based on known schema without file IO [\#872](https://github.com/apache/arrow-datafusion/pull/872) ([Dandandan](https://github.com/Dandandan))
+- Add support for PostgreSQL regex match [\#870](https://github.com/apache/arrow-datafusion/pull/870) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([b41sh](https://github.com/b41sh))
+- Include planning time in datafusion-cli printing [\#860](https://github.com/apache/arrow-datafusion/pull/860) ([Dandandan](https://github.com/Dandandan))
+- Implement basic common subexpression eliminate optimization [\#792](https://github.com/apache/arrow-datafusion/pull/792) ([waynexia](https://github.com/waynexia))
+- Impl `ops::Not` for `expr` [\#763](https://github.com/apache/arrow-datafusion/pull/763) ([Jimexist](https://github.com/Jimexist))
+
+**Fixed bugs:**
+
+- Can not use `between` in the select list: [\#1196](https://github.com/apache/arrow-datafusion/issues/1196)
+- ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' [\#1195](https://github.com/apache/arrow-datafusion/issues/1195)
+- window functions with NULL literals in `partition by` and `order by` do not work: Internal\("Sort operation is not applicable to scalar value NULL"\) [\#1194](https://github.com/apache/arrow-datafusion/issues/1194)
+- Operation name not included in internal errors -- Internal\("Data type Boolean not supported for binary operation on dyn arrays"\) [\#1157](https://github.com/apache/arrow-datafusion/issues/1157)
+- Physical plan explain UNION query says "ExecutionPlan\(PlaceHolder\)" [\#933](https://github.com/apache/arrow-datafusion/issues/933)
+- Can not use LIKE on DictionaryArray encoded strings [\#815](https://github.com/apache/arrow-datafusion/issues/815)
+- physical\_plan::repartition::tests::repartition\_with\_dropping\_output\_stream failing locally [\#614](https://github.com/apache/arrow-datafusion/issues/614)
+- Fix some `BuiltinScalarFunction` panics with zero arguments [\#1249](https://github.com/apache/arrow-datafusion/pull/1249) ([capkurmagati](https://github.com/capkurmagati))
+- fix: not do boolean folding on NULL and/or expr [\#1245](https://github.com/apache/arrow-datafusion/pull/1245) ([NGA-TRAN](https://github.com/NGA-TRAN))
+- ignore case of `with header row` in sql when creating external table [\#1237](https://github.com/apache/arrow-datafusion/pull/1237) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([lichuan6](https://github.com/lichuan6))
+- fix: Min/Max aggregation data type should not be dictionary [\#1235](https://github.com/apache/arrow-datafusion/pull/1235) ([NGA-TRAN](https://github.com/NGA-TRAN))
+- Fix build with `--no-default-features` [\#1219](https://github.com/apache/arrow-datafusion/pull/1219) ([alamb](https://github.com/alamb))
+- Prevent "future cannot be sent between threads safely" compilation error [\#1155](https://github.com/apache/arrow-datafusion/pull/1155) ([jonmmease](https://github.com/jonmmease))
+- Clean up spawned task on drop for `AnalyzeExec`, `CoalescePartitionsExec`, `HashAggregateExec` [\#1121](https://github.com/apache/arrow-datafusion/pull/1121) ([crepererum](https://github.com/crepererum))
+- Clean up spawned task on `SortStream` drop [\#1105](https://github.com/apache/arrow-datafusion/pull/1105) ([crepererum](https://github.com/crepererum))
+- fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 [\#1088](https://github.com/apache/arrow-datafusion/pull/1088) ([xudong963](https://github.com/xudong963))
+- python: fix generated table name in dataframe creation [\#1078](https://github.com/apache/arrow-datafusion/pull/1078) ([houqp](https://github.com/houqp))
+- fix subquery alias [\#1067](https://github.com/apache/arrow-datafusion/pull/1067) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([xudong963](https://github.com/xudong963))
+- fix pattern handling in regexp\_match function [\#1065](https://github.com/apache/arrow-datafusion/pull/1065) ([houqp](https://github.com/houqp))
+- fix: joins on Timestamp columns [\#1055](https://github.com/apache/arrow-datafusion/pull/1055) ([francis-du](https://github.com/francis-du))
+- Fix metric name typo [\#943](https://github.com/apache/arrow-datafusion/pull/943) ([alamb](https://github.com/alamb))
+- EXPLAIN ANALYZE should run all Optimizer passes [\#929](https://github.com/apache/arrow-datafusion/pull/929) ([alamb](https://github.com/alamb))
+
+**Documentation updates:**
+
+- update docs to fix DataFusion User Guide link [\#1238](https://github.com/apache/arrow-datafusion/pull/1238) ([jiangzhx](https://github.com/jiangzhx))
+- \[docs\] datafusion cli run via homebrew [\#1198](https://github.com/apache/arrow-datafusion/pull/1198) ([Jimexist](https://github.com/Jimexist))
+- add support for unary and binary values in values list, update docs [\#1172](https://github.com/apache/arrow-datafusion/pull/1172) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([Jimexist](https://github.com/Jimexist))
+- Add additional docstring comments to `from_plan` [\#1168](https://github.com/apache/arrow-datafusion/pull/1168) ([alamb](https://github.com/alamb))
+- \[nit\] fix document issue for `approx_distinct` [\#1110](https://github.com/apache/arrow-datafusion/pull/1110) ([Jimexist](https://github.com/Jimexist))
+- implement `approx_distinct` function using HyperLogLog [\#1087](https://github.com/apache/arrow-datafusion/pull/1087) ([Jimexist](https://github.com/Jimexist))
+- Remove unused `use` statements from examples [\#1032](https://github.com/apache/arrow-datafusion/pull/1032) ([alamb](https://github.com/alamb))
+- consolidate datafusion docs with sphinx [\#993](https://github.com/apache/arrow-datafusion/pull/993) ([houqp](https://github.com/houqp))
+- Updated user-guide library docs with optimized config [\#976](https://github.com/apache/arrow-datafusion/pull/976) ([matthewmturner](https://github.com/matthewmturner))
+- Improve User Guide [\#954](https://github.com/apache/arrow-datafusion/pull/954) ([andygrove](https://github.com/andygrove))
+- \[MINOR\] Fix typos in doc comments [\#945](https://github.com/apache/arrow-datafusion/pull/945) ([alamb](https://github.com/alamb))
+- \[DataFusion\] - Add show and show\_limit function for DataFrame [\#923](https://github.com/apache/arrow-datafusion/pull/923) ([francis-du](https://github.com/francis-du))
+- Typo fix in DataFusion crate documentation [\#914](https://github.com/apache/arrow-datafusion/pull/914) ([antoinewdg](https://github.com/antoinewdg))
+
+**Performance improvements:**
+
+- Improve avro reader performance by avoiding some cloning on avro\_rs::Value [\#1206](https://github.com/apache/arrow-datafusion/pull/1206) ([Igosuki](https://github.com/Igosuki))
+- optimize build profile for datafusion python binding, cli and ballista [\#1137](https://github.com/apache/arrow-datafusion/pull/1137) ([houqp](https://github.com/houqp))
+- Avoid stack overflow by reducing stack usage of `BinaryExpr::evaluate` in debug builds [\#1047](https://github.com/apache/arrow-datafusion/pull/1047) ([alamb](https://github.com/alamb))
+- Add ScalarValue::eq\_array optimized comparison function [\#844](https://github.com/apache/arrow-datafusion/pull/844) ([alamb](https://github.com/alamb))
+- Rework GroupByHash to for faster performance and support grouping by nulls [\#808](https://github.com/apache/arrow-datafusion/pull/808) ([alamb](https://github.com/alamb))
+
+**Closed issues:**
+
+- InList expr with NULL literals do not work [\#1190](https://github.com/apache/arrow-datafusion/issues/1190)
+- update the homepage README to include values, `approx_distinct`, etc. [\#1171](https://github.com/apache/arrow-datafusion/issues/1171)
+- \[Python\]: Inconsistencies with Python package name [\#1011](https://github.com/apache/arrow-datafusion/issues/1011)
+- Wanting to contribute to project where to start? [\#983](https://github.com/apache/arrow-datafusion/issues/983)
+- delete redundant code [\#973](https://github.com/apache/arrow-datafusion/issues/973)
+- How to build DataFusion python wheel [\#853](https://github.com/apache/arrow-datafusion/issues/853)
+- Add support for partition pruning [\#204](https://github.com/apache/arrow-datafusion/issues/204)
+- \[Datafusion\] Support joins on TimestampMillisecond columns [\#187](https://github.com/apache/arrow-datafusion/issues/187)
+- TPC-H Query 21 [\#173](https://github.com/apache/arrow-datafusion/issues/173)
+- TPC-H Query 13 [\#164](https://github.com/apache/arrow-datafusion/issues/164)
+- TPC-H Query 8 [\#162](https://github.com/apache/arrow-datafusion/issues/162)
+- implement split\_part\(string, delimiter, position\) [\#157](https://github.com/apache/arrow-datafusion/issues/157)
+- Join Statement: Schema contains duplicate unqualified field name [\#155](https://github.com/apache/arrow-datafusion/issues/155)
+- ParquetTable should avoid scanning all files twice [\#136](https://github.com/apache/arrow-datafusion/issues/136)
+- Add support for reading partitioned Parquet files [\#133](https://github.com/apache/arrow-datafusion/issues/133)
+- Add support for Parquet schema merging [\#132](https://github.com/apache/arrow-datafusion/issues/132)
+- Catalog abstraction [\#126](https://github.com/apache/arrow-datafusion/issues/126)
+- Optimizer rules should work with qualified column names [\#125](https://github.com/apache/arrow-datafusion/issues/125)
+- Add optional qualifier to Expr::Column [\#121](https://github.com/apache/arrow-datafusion/issues/121)
+- Implement modulus expression [\#99](https://github.com/apache/arrow-datafusion/issues/99)
+- \[Rust\] Add constant folding to expressions during logically planning [\#98](https://github.com/apache/arrow-datafusion/issues/98)
+- \[Rust\] Implement pretty print for physical query plan [\#93](https://github.com/apache/arrow-datafusion/issues/93)
+- Can not group by boolean columns \(add boolean to valid keys of groupBy\) [\#91](https://github.com/apache/arrow-datafusion/issues/91)
+- improve performance of building literal arrays [\#90](https://github.com/apache/arrow-datafusion/issues/90)
+- \[rust\]\[datafusion\] optimize count\(\*\) queries on parquet sources [\#89](https://github.com/apache/arrow-datafusion/issues/89)
+- Produce a design for a metrics framework [\#21](https://github.com/apache/arrow-datafusion/issues/21)
+
+**Merged pull requests:**
+
+- Add timezome string to stablize test [\#1265](https://github.com/apache/arrow-datafusion/pull/1265) ([viirya](https://github.com/viirya))
+- numerical\_coercion pattern match optimize [\#1256](https://github.com/apache/arrow-datafusion/pull/1256) ([Jimexist](https://github.com/Jimexist))
+- fix and update window function sql tests [\#1059](https://github.com/apache/arrow-datafusion/pull/1059) ([Jimexist](https://github.com/Jimexist))
+- reduce ScalarValue from trait boilerplate with macro [\#989](https://github.com/apache/arrow-datafusion/pull/989) ([houqp](https://github.com/houqp))
+
+
+For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
+
## [5.0.0](https://github.com/apache/arrow-datafusion/tree/5.0.0) (2021-08-10)
[Full Changelog](https://github.com/apache/arrow-datafusion/compare/4.0.0...5.0.0)
diff --git a/datafusion/Cargo.toml b/datafusion/Cargo.toml
index 7df8c8f..f0f368a 100644
--- a/datafusion/Cargo.toml
+++ b/datafusion/Cargo.toml
@@ -18,7 +18,7 @@
[package]
name = "datafusion"
description = "DataFusion is an in-memory query engine that uses Apache Arrow as the memory model"
-version = "5.1.0"
+version = "6.0.0"
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
readme = "../README.md"
diff --git a/dev/release/README.md b/dev/release/README.md
index 7a51573..775678a 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -82,7 +82,11 @@
git checkout apache/master
```
-Update datafusion version in `datafusion/Cargo.toml` to `5.1.0`.
+Update datafusion version in `datafusion/Cargo.toml` to `5.1.0`:
+
+```
+./dev/update_datafusion_versions.py 5.1.0
+```
If there is a ballista release, update versions in ballista Cargo.tomls, run
@@ -101,19 +105,9 @@
### Update CHANGELOG.md
-Create local release rc tags:
-
-```
-git tag -f 5.1.0-rc-local
-# if there is ballista release
-git tag -f ballista-0.5.0-rc-local
-# if there is python binding release
-git tag -f python-0.3.0-rc-local
-```
-
-Manully edit the previous release version tag in
+Manully edit the base version tag argument in
`dev/release/update_change_log-{ballista,datafusion,python}.sh`. Commits
-between the previous verstion tag and the new rc tag will be used to
+between the base verstion tag and the latest upstream master will be used to
populate the changelog content.
```bash
@@ -123,9 +117,6 @@
git commit -a -m 'Create changelog for release'
```
-Note that when reviewing the change log, rather than editing the
-`CHANGELOG.md`, it is preferred to update the issues and their labels.
-
You can add `invalid` or `development-process` label to exclude items from
release notes. Add `datafusion`, `ballista` and `python` labels to group items
into each sub-project's change log.
diff --git a/dev/release/update_change_log-all.sh b/dev/release/update_change_log-all.sh
index 9ef09eb..c5639cc 100755
--- a/dev/release/update_change_log-all.sh
+++ b/dev/release/update_change_log-all.sh
@@ -18,6 +18,8 @@
# under the License.
#
+set -e
+
# Usage:
# CHANGELOG_GITHUB_TOKEN=<TOKEN> ./update_change_log-datafusion.sh
diff --git a/dev/release/update_change_log-ballista.sh b/dev/release/update_change_log-ballista.sh
index 05c5f6f..b5ce827 100755
--- a/dev/release/update_change_log-ballista.sh
+++ b/dev/release/update_change_log-ballista.sh
@@ -25,4 +25,8 @@
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/ballista/rust/client/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh ballista 4.0.0 "ballista-${CURRENT_VER}-rc-local"
+${SOURCE_DIR}/update_change_log.sh \
+ ballista \
+ ballista-0.5.0 \
+ --exclude-tags-regex "python-.+" \
+ --future-release "ballista-${CURRENT_VER}"
diff --git a/dev/release/update_change_log-datafusion.sh b/dev/release/update_change_log-datafusion.sh
index 1570c91..4259c86 100755
--- a/dev/release/update_change_log-datafusion.sh
+++ b/dev/release/update_change_log-datafusion.sh
@@ -25,4 +25,8 @@
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/datafusion/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh datafusion 4.0.0 "${CURRENT_VER}-rc-local"
+${SOURCE_DIR}/update_change_log.sh \
+ datafusion \
+ 5.0.0 \
+ --exclude-tags-regex "(python|ballista)-.+" \
+ --future-release "${CURRENT_VER}"
diff --git a/dev/release/update_change_log-python.sh b/dev/release/update_change_log-python.sh
index 6b864f9..6d428e8 100755
--- a/dev/release/update_change_log-python.sh
+++ b/dev/release/update_change_log-python.sh
@@ -25,4 +25,8 @@
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
CURRENT_VER=$(grep version "${SOURCE_TOP_DIR}/python/Cargo.toml" | head -n 1 | awk '{print $3}' | tr -d '"')
-${SOURCE_DIR}/update_change_log.sh python 4.0.0 "python-${CURRENT_VER}-rc-local"
+${SOURCE_DIR}/update_change_log.sh \
+ python \
+ python-0.3.0 \
+ --exclude-tags-regex "ballista-.+" \
+ --future-release "python-${CURRENT_VER}"
diff --git a/dev/release/update_change_log.sh b/dev/release/update_change_log.sh
index 0c9c233..1d1570d 100755
--- a/dev/release/update_change_log.sh
+++ b/dev/release/update_change_log.sh
@@ -27,34 +27,44 @@
# arrow-datafusion/.github_changelog_generator
#
# Usage:
-# CHANGELOG_GITHUB_TOKEN=<TOKEN> ./update_change_log.sh <PROJECT> <FROM_VER> <TO_VER>
+# CHANGELOG_GITHUB_TOKEN=<TOKEN> ./update_change_log.sh <PROJECT> <SINCE_TAG> <EXTRA_ARGS...>
set -e
SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
-if [[ "$#" -ne 3 ]]; then
- echo "USAGE: $0 PROJECT FROM_VER TO_VER"
+echo $1
+
+if [[ "$#" -lt 2 ]]; then
+ echo "USAGE: $0 PROJECT SINCE_TAG EXTRA_ARGS..."
exit 1
fi
PROJECT=$1
-FROM_VER=$2
-TO_VER=$3
+SINCE_TAG=$2
+shift 2
+
OUTPUT_PATH="${PROJECT}/CHANGELOG.md"
pushd ${SOURCE_TOP_DIR}
+
+# reset content in changelog
+git co "${SINCE_TAG}" "${OUTPUT_PATH}"
+# remove license header so github-changelog-generator has a clean base to append
+sed -i '1,18d' "${OUTPUT_PATH}"
+
docker run -it --rm \
-e CHANGELOG_GITHUB_TOKEN=$CHANGELOG_GITHUB_TOKEN \
-v "$(pwd)":/usr/local/src/your-app \
- githubchangeloggenerator/github-changelog-generator \
+ githubchangeloggenerator/github-changelog-generator:1.16.2 \
--user apache \
--project arrow-datafusion \
- --since-tag "${FROM_VER}" \
+ --since-tag "${SINCE_TAG}" \
--include-labels "${PROJECT}" \
+ --base "${OUTPUT_PATH}" \
--output "${OUTPUT_PATH}" \
- --future-release "${TO_VER}"
+ "$@"
sed -i "s/\\\n/\n\n/" "${OUTPUT_PATH}"
diff --git a/dev/update_ballista_versions.py b/dev/update_ballista_versions.py
index 7023541..57e055e 100755
--- a/dev/update_ballista_versions.py
+++ b/dev/update_ballista_versions.py
@@ -35,10 +35,12 @@
data = f.read()
doc = tomlkit.parse(data)
- doc.get('package')['version'] = new_version
+ if cargo_toml.startswith("ballista/"):
+ doc.get('package')['version'] = new_version
# ballista crates also depend on each other
ballista_deps = (
+ 'ballista',
'ballista-core',
'ballista-executor',
'ballista-scheduler',
@@ -80,6 +82,7 @@
'ballista/rust/scheduler',
'ballista/rust/executor',
'ballista/rust/client',
+ 'datafusion-cli',
]
])
new_version = args.new_version
@@ -89,7 +92,10 @@
for cargo_toml in ballista_crates:
update_cargo_toml(cargo_toml, new_version)
- for path in ("benchmarks/docker-compose.yaml", "docs/user-guide/src/distributed/docker-compose.md"):
+ for path in (
+ "benchmarks/docker-compose.yaml",
+ "docs/source/user-guide/distributed/deployment/docker-compose.md",
+ ):
path = os.path.join(repo_root, path)
update_docker_compose(path, new_version)
diff --git a/dev/update_datafusion_versions.py b/dev/update_datafusion_versions.py
index d312f21..af16b51 100755
--- a/dev/update_datafusion_versions.py
+++ b/dev/update_datafusion_versions.py
@@ -22,6 +22,7 @@
# dependencies:
# pip install tomlkit
+import re
import os
import argparse
from pathlib import Path
@@ -61,6 +62,15 @@
f.write(tomlkit.dumps(doc))
+def update_docs(path: str, new_version: str):
+ print(f"updating docs in {path}")
+ with open(path, 'r+') as fd:
+ content = fd.read()
+ fd.seek(0)
+ content = re.sub(r'datafusion = "(.+)"', f'datafusion = "{new_version}"', content)
+ fd.write(content)
+
+
def main():
parser = argparse.ArgumentParser(
description=(
@@ -79,6 +89,8 @@
for cargo_toml in repo_root.rglob('Cargo.toml'):
update_downstream_versions(cargo_toml, new_version)
+ update_docs("README.md", new_version)
+
if __name__ == "__main__":
main()
diff --git a/python/CHANGELOG.md b/python/CHANGELOG.md
index a4964ab..a07cb00 100644
--- a/python/CHANGELOG.md
+++ b/python/CHANGELOG.md
@@ -17,10 +17,67 @@
under the License.
-->
-For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
-
# Changelog
+## [python-0.4.0](https://github.com/apache/arrow-datafusion/tree/python-0.4.0) (2021-11-13)
+
+[Full Changelog](https://github.com/apache/arrow-datafusion/compare/python-0.3.0...python-0.4.0)
+
+**Breaking changes:**
+
+- Add function volatility to Signature [\#1071](https://github.com/apache/arrow-datafusion/pull/1071) [[sql](https://github.com/apache/arrow-datafusion/labels/sql)] ([pjmore](https://github.com/pjmore))
+- Make TableProvider.scan\(\) and PhysicalPlanner::create\_physical\_plan\(\) async [\#1013](https://github.com/apache/arrow-datafusion/pull/1013) ([rdettai](https://github.com/rdettai))
+- Reorganize table providers by table format [\#1010](https://github.com/apache/arrow-datafusion/pull/1010) ([rdettai](https://github.com/rdettai))
+
+**Implemented enhancements:**
+
+- Build abi3 wheels for python binding [\#921](https://github.com/apache/arrow-datafusion/issues/921)
+- Release documentation for python binding [\#837](https://github.com/apache/arrow-datafusion/issues/837)
+- use arrow 6.1.0 [\#1255](https://github.com/apache/arrow-datafusion/pull/1255) ([Jimexist](https://github.com/Jimexist))
+- python `lit` function to support bool and byte vec [\#1152](https://github.com/apache/arrow-datafusion/pull/1152) ([Jimexist](https://github.com/Jimexist))
+- add python binding for `approx_distinct` aggregate function [\#1134](https://github.com/apache/arrow-datafusion/pull/1134) ([Jimexist](https://github.com/Jimexist))
+- refactor datafusion python `lit` function to allow different types [\#1130](https://github.com/apache/arrow-datafusion/pull/1130) ([Jimexist](https://github.com/Jimexist))
+- \[python\] add digest python function [\#1127](https://github.com/apache/arrow-datafusion/pull/1127) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add `blake3` algorithm to `digest` function [\#1086](https://github.com/apache/arrow-datafusion/pull/1086) ([Jimexist](https://github.com/Jimexist))
+- \[crypto\] add blake2b and blake2s functions [\#1081](https://github.com/apache/arrow-datafusion/pull/1081) ([Jimexist](https://github.com/Jimexist))
+- fix: fix joins on Float32/Float64 columns bug [\#1054](https://github.com/apache/arrow-datafusion/pull/1054) ([francis-du](https://github.com/francis-du))
+- Update DataFusion to arrow 6.0 [\#984](https://github.com/apache/arrow-datafusion/pull/984) ([alamb](https://github.com/alamb))
+- \[Python\] Add support to perform sql query on in-memory datasource. [\#981](https://github.com/apache/arrow-datafusion/pull/981) ([mmuru](https://github.com/mmuru))
+- \[Python\] - Support show function for DataFrame api of python library [\#942](https://github.com/apache/arrow-datafusion/pull/942) ([francis-du](https://github.com/francis-du))
+- Rework the python bindings using conversion traits from arrow-rs [\#873](https://github.com/apache/arrow-datafusion/pull/873) ([kszucs](https://github.com/kszucs))
+
+**Fixed bugs:**
+
+- Error in `python test` check / maturn python build: `function or associated item not found in `proc_macro::Literal` [\#961](https://github.com/apache/arrow-datafusion/issues/961)
+- Use UUID to create unique table names in python binding [\#1111](https://github.com/apache/arrow-datafusion/pull/1111) ([hippowdon](https://github.com/hippowdon))
+- python: fix generated table name in dataframe creation [\#1078](https://github.com/apache/arrow-datafusion/pull/1078) ([houqp](https://github.com/houqp))
+- fix: joins on Timestamp columns [\#1055](https://github.com/apache/arrow-datafusion/pull/1055) ([francis-du](https://github.com/francis-du))
+- register datafusion.functions as a python package [\#995](https://github.com/apache/arrow-datafusion/pull/995) ([houqp](https://github.com/houqp))
+
+**Documentation updates:**
+
+- python: update docs to use new APIs [\#1287](https://github.com/apache/arrow-datafusion/pull/1287) ([houqp](https://github.com/houqp))
+- Fix typo on Python functions [\#1207](https://github.com/apache/arrow-datafusion/pull/1207) ([j-a-m-l](https://github.com/j-a-m-l))
+- fix deadlink in python/readme [\#1002](https://github.com/apache/arrow-datafusion/pull/1002) ([waynexia](https://github.com/waynexia))
+
+**Performance improvements:**
+
+- optimize build profile for datafusion python binding, cli and ballista [\#1137](https://github.com/apache/arrow-datafusion/pull/1137) ([houqp](https://github.com/houqp))
+
+**Closed issues:**
+
+- InList expr with NULL literals do not work [\#1190](https://github.com/apache/arrow-datafusion/issues/1190)
+- update the homepage README to include values, `approx_distinct`, etc. [\#1171](https://github.com/apache/arrow-datafusion/issues/1171)
+- \[Python\]: Inconsistencies with Python package name [\#1011](https://github.com/apache/arrow-datafusion/issues/1011)
+- Wanting to contribute to project where to start? [\#983](https://github.com/apache/arrow-datafusion/issues/983)
+- delete redundant code [\#973](https://github.com/apache/arrow-datafusion/issues/973)
+- \[Python\]: register custom datasource [\#906](https://github.com/apache/arrow-datafusion/issues/906)
+- How to build DataFusion python wheel [\#853](https://github.com/apache/arrow-datafusion/issues/853)
+- Produce a design for a metrics framework [\#21](https://github.com/apache/arrow-datafusion/issues/21)
+
+
+For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANGELOG.md)
+
## [python-0.3.0](https://github.com/apache/arrow-datafusion/tree/python-0.3.0) (2021-08-10)
[Full Changelog](https://github.com/apache/arrow-datafusion/compare/4.0.0...python-0.3.0)
diff --git a/python/Cargo.toml b/python/Cargo.toml
index 3d3ebfa..568f3c7 100644
--- a/python/Cargo.toml
+++ b/python/Cargo.toml
@@ -17,7 +17,7 @@
[package]
name = "datafusion-python"
-version = "0.3.0"
+version = "0.4.0"
homepage = "https://github.com/apache/arrow"
repository = "https://github.com/apache/arrow"
authors = ["Apache Arrow <dev@arrow.apache.org>"]
@@ -31,7 +31,7 @@
tokio = { version = "1.0", features = ["macros", "rt", "rt-multi-thread", "sync"] }
rand = "0.7"
pyo3 = { version = "0.14", features = ["extension-module", "abi3", "abi3-py36"] }
-datafusion = { path = "../datafusion", version = "5.1.0", features = ["pyarrow"] }
+datafusion = { path = "../datafusion", version = "6.0.0", features = ["pyarrow"] }
uuid = { version = "0.8", features = ["v4"] }
[lib]