commit | 1ed681912be7246695cdd938ea632e1751403f67 | |
---|---|---|
author | Andrew Lamb <andrew@nerdnetworks.org> | Tue Apr 13 07:14:45 2021 -0400 |
committer | Andrew Lamb <andrew@nerdnetworks.org> | Tue Apr 13 07:14:45 2021 -0400 |
tree | 53b9e393d93f8b4fb731284988c76acce1eec72b | |
parent | 72249203be90b45a315cf8028536fd72a7f9427b | |
ARROW-12277: [Rust][DataFusion] Implement Sum/Count/Min/Max aggregates for Timestamp(_,_)

# Rationale:
If you try to aggregate (via SUM, for example) a column of a timestamp type, DataFusion generates an error:

```
Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed.
```

For example, from IOx

```
> show columns from t;
+---------------+--------------+------------+-------------+-----------------------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type                   | is_nullable |
+---------------+--------------+------------+-------------+-----------------------------+-------------+
| datafusion    | public       | t          | a           | Utf8                        | NO          |
| datafusion    | public       | t          | b           | Timestamp(Nanosecond, None) | NO          |
+---------------+--------------+------------+-------------+-----------------------------+-------------+
2 row in set. Query took 0 seconds.
> select sum(b) from t;
Plan("Coercion from [Timestamp(Nanosecond, None)] to the signature Uniform(1, [Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64]) failed.")
```

# Changes:
Add support for aggregating timestamp types and tests for same

# Notes
Note this is a follow-on to / more fleshing out of the work done in #9773 by @velvia (👋 thanks for adding Timestamps to `ScalarValue`)

Supporting AVG on timestamps is tracked by https://issues.apache.org/jira/browse/ARROW-12318. It is more involved (as currently Avg assumes the output type is always F64), and not important for my use case at the moment.

Closes #9970 from alamb/alamb/ARROW-12277-aggregate-timestamps

Authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
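To illustrate what Sum/Count/Min/Max over a timestamp column amount to, here is a minimal sketch in plain Rust. It is not the code from this commit and does not use the DataFusion or Arrow APIs. It relies only on the fact that Arrow stores timestamps as 64-bit integers, so the aggregates can run directly over those integers once type coercion accepts them. The names `TimestampNanos`, `TimestampAggregates`, and `aggregate_timestamps` are hypothetical, and the null handling (nulls are skipped) and overflow behavior (wrapping) are assumptions made for this example.

```rust
/// Hypothetical stand-in for a nullable Timestamp(Nanosecond, None) column:
/// each value is nanoseconds since the Unix epoch, and `None` marks a null.
type TimestampNanos = Option<i64>;

/// Results of the four aggregates; `None` means "no non-null input".
#[derive(Debug, PartialEq)]
struct TimestampAggregates {
    count: u64,
    sum: Option<i64>,
    min: Option<i64>,
    max: Option<i64>,
}

/// Computes COUNT, SUM, MIN and MAX in one pass, skipping nulls.
/// Assumption for this sketch: SUM wraps on i64 overflow.
fn aggregate_timestamps(column: &[TimestampNanos]) -> TimestampAggregates {
    let mut agg = TimestampAggregates {
        count: 0,
        sum: None,
        min: None,
        max: None,
    };
    // `flatten` drops the `None` entries and yields `&i64` for the rest.
    for value in column.iter().flatten() {
        agg.count += 1;
        agg.sum = Some(agg.sum.map_or(*value, |s| s.wrapping_add(*value)));
        agg.min = Some(agg.min.map_or(*value, |m| m.min(*value)));
        agg.max = Some(agg.max.map_or(*value, |m| m.max(*value)));
    }
    agg
}
```

Calling `aggregate_timestamps(&[Some(10), None, Some(-5), Some(30)])` counts three non-null values and reports the sum, minimum, and maximum of their raw nanosecond representations. AVG is left out here, matching the commit message's note that it is tracked separately.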
Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.
Major components of the project include:
Arrow is an Apache Software Foundation project. Learn more at arrow.apache.org.
The reference Arrow libraries contain many distinct software components:
The official Arrow libraries in this repository are in different stages of implementing the Arrow format and related features. See our current feature matrix on git master.
Please read our latest project contribution guide.
Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved: