Make a PR to update the rust-toolchain file in the root of the repository.
Implementation
| Function type | Location to implement | Trait to implement | Macros to use | Example |
|---|---|---|---|---|
| Scalar | functions | ScalarUDFImpl | make_udf_function!() and export_functions!() | advanced_udf.rs |
| Nested | functions-nested | ScalarUDFImpl | make_udf_expr_and_func!() | |
| Aggregate | functions-aggregate | AggregateUDFImpl and an Accumulator | make_udaf_expr_and_func!() | advanced_udaf.rs |
| Window | functions-window | WindowUDFImpl and a PartitionEvaluator | define_udwf_and_expr!() | advanced_udwf.rs |
| Table | functions-table | TableFunctionImpl and a TableProvider | create_udtf_function!() | simple_udtf.rs |
- Functions must be registered in the relevant mod.rs or lib.rs.
- Functions should have the #[user_doc(...)] attribute so their documentation can be included in the SQL reference documentation (see below section).
- Aggregate functions can optionally implement a GroupsAccumulator for better performance.

Spark compatible functions are located in a separate crate but otherwise follow the same steps, though all function types (e.g. scalar, nested, aggregate) are grouped together in a single location.
Testing
Prefer adding sqllogictest integration tests, where the function is called via SQL against well-known data and returns an expected result. Check the existing test files for an appropriate file to add test cases to; otherwise create a new file. See the sqllogictest documentation for details on how to construct these tests. Ensure edge cases and null inputs are covered in these tests.
If a behaviour cannot be tested via sqllogictest (e.g. it exercises simplify(), needs to be tested in isolation from the optimizer, or requires exact input that is difficult to construct via sqllogictest), tests can be added as Rust unit tests in the implementation module, though these should be kept minimal where possible.
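As a rough illustration of a Rust unit test placed in the implementation module, the sketch below tests a small piece of logic in isolation, including null inputs and edge cases. The function safe_divide is hypothetical and is not part of DataFusion; it only stands in for whatever internal logic cannot be reached through sqllogictest.

```rust
/// Hypothetical function under test (an assumption for illustration only):
/// integer division that propagates nulls, modelled here as `Option`.
fn safe_divide(numerator: Option<i64>, denominator: Option<i64>) -> Option<i64> {
    // A `None` on either side yields `None`. `checked_div` also returns
    // `None` for division by zero and for the `i64::MIN / -1` overflow.
    numerator?.checked_div(denominator?)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn divides_regular_values() {
        assert_eq!(safe_divide(Some(10), Some(2)), Some(5));
    }

    #[test]
    fn propagates_null_inputs() {
        assert_eq!(safe_divide(None, Some(2)), None);
        assert_eq!(safe_divide(Some(10), None), None);
    }

    #[test]
    fn handles_edge_cases() {
        // Division by zero and overflow both produce `None` in this sketch.
        assert_eq!(safe_divide(Some(1), Some(0)), None);
        assert_eq!(safe_divide(Some(i64::MIN), Some(-1)), None);
    }
}
```

Keeping the test module next to the code it covers lets the tests reach private helpers through use super::*, which is one reason unit tests suit behaviour that SQL-level tests cannot observe.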
Documentation
Run the documentation update script ./dev/update_function_docs.sh, which updates the relevant markdown documents (see the documents for scalar, aggregate and window functions).
The query plans represented by LogicalPlan nodes can be graphically rendered using Graphviz.
To do so, save the output of the display_graphviz function to a file:

    use std::fs::File;
    use std::io::Write;

    // Create plan somehow...
    let mut output = File::create("/tmp/plan.dot")?;
    write!(output, "{}", plan.display_graphviz())?;
Then, use the dot command line tool to render it into a file that can be displayed. For example, the following command creates a /tmp/plan.pdf file:
dot -Tpdf < /tmp/plan.dot > /tmp/plan.pdf
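The file-writing step can also be wrapped in a small helper that returns any I/O error to the caller. This is a minimal sketch using only the standard library; the helper name write_dot is an assumption made for illustration and does not exist in DataFusion.

```rust
use std::fmt::Display;
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not a DataFusion API): writes DOT text to `path`.
/// `dot_text` may be any value implementing `Display`, for example the
/// value returned by `plan.display_graphviz()`.
fn write_dot(path: &Path, dot_text: impl Display) -> io::Result<()> {
    // Create (or truncate) the output file.
    let mut output = File::create(path)?;
    // Format the DOT text into the file, propagating write errors.
    write!(output, "{}", dot_text)?;
    Ok(())
}
```

Under these assumptions, a call such as write_dot(Path::new("/tmp/plan.dot"), plan.display_graphviz())? would produce the input file for the dot command shown above.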
.md documents
We use prettier to format .md files.
You can either use npm i -g prettier to install it globally or use npx to run it as a standalone binary. Using npx requires a working node environment. Upgrading to the latest prettier is recommended (by adding --upgrade to the npm command).

    $ prettier --version
    2.3.0

After you've confirmed your prettier version, you can format all the .md files:
prettier -w {datafusion,datafusion-cli,datafusion-examples,dev,docs}/**/*.md
.toml files
We use taplo to format .toml files.
To install via cargo:
cargo install taplo-cli --locked
Refer to the taplo installation documentation for other ways to install it.

    $ taplo --version
    taplo 0.9.0

After you've confirmed your taplo version, you can format all the .toml files:
taplo fmt
For the proto and proto-common crates, the prost/tonic code is generated by running their respective ./regen.sh scripts, which in turn invoke the Rust binary located in ./gen.
This is necessary after modifying the protobuf definitions or altering the dependencies of ./gen, and requires a valid installation of protoc (see installation instructions for details).

    # From repository root
    # proto-common
    ./datafusion/proto-common/regen.sh
    # proto
    ./datafusion/proto/regen.sh