DataFusion provides a flexible extension system that allows you to customize SQL parsing and planning without modifying the core codebase. This is useful when you need to:
-> for JSON)TABLESAMPLE, PIVOT/UNPIVOT, or MATCH_RECOGNIZEYou can read more about this topic in the Extending SQL in DataFusion: from ->> to TABLESAMPLE blog.
When DataFusion processes a SQL query, it goes through these stages:
┌─────────────┐ ┌─────────┐ ┌──────────────────────┐ ┌─────────────┐ │ SQL String │───▶│ Parser │───▶│ SqlToRel │───▶│ LogicalPlan │ └─────────────┘ └─────────┘ │ (SQL to LogicalPlan) │ └─────────────┘ └──────────────────────┘ │ │ uses ▼ ┌───────────────────────┐ │ Extension Planners │ │ • ExprPlanner │ │ • TypePlanner │ │ • RelationPlanner │ └───────────────────────┘
The extension planners intercept specific parts of the SQL AST during the SqlToRel phase and allow you to customize how they are converted to DataFusion's logical plan.
DataFusion provides three planner traits for extending SQL:
| Trait | Purpose | Registration Method |
|---|---|---|
ExprPlanner | Custom expressions and operators | ctx.register_expr_planner() |
TypePlanner | Custom SQL data types | SessionStateBuilder::with_type_planner() |
RelationPlanner | Custom FROM clause elements (relations) | ctx.register_relation_planner() |
Planner Precedence: Multiple ExprPlanners and RelationPlanners can be registered; they are invoked in reverse registration order (last registered wins). Return Original(...) to delegate to the next planner. Only one TypePlanner can be active at a time.
Use ExprPlanner to customize how SQL expressions are converted to DataFusion logical expressions. This is useful for:
->, ->>, @>, ?)| Category | Methods |
|---|---|
| Operators | plan_binary_op, plan_any |
| Literals | plan_array_literal, plan_dictionary_literal, plan_struct_literal |
| Functions | plan_extract, plan_substring, plan_overlay, plan_position, plan_make_map |
| Identifiers | plan_field_access, plan_compound_identifier |
| Aggregates/Windows | plan_aggregate, plan_window |
See the ExprPlanner API documentation for full method signatures.
This example maps the -> operator to string concatenation:
# use std::sync::Arc; # use datafusion::common::DFSchema; # use datafusion::error::Result; # use datafusion::logical_expr::Operator; # use datafusion::prelude::*; # use datafusion::sql::sqlparser::ast::BinaryOperator; use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr}; # use datafusion_expr::BinaryExpr; #[derive(Debug)] struct MyCustomPlanner; impl ExprPlanner for MyCustomPlanner { fn plan_binary_op( &self, expr: RawBinaryExpr, _schema: &DFSchema, ) -> Result<PlannerResult<RawBinaryExpr>> { match &expr.op { // Map `->` to string concatenation BinaryOperator::Arrow => { Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr { left: Box::new(expr.left.clone()), right: Box::new(expr.right.clone()), op: Operator::StringConcat, }))) } _ => Ok(PlannerResult::Original(expr)), } } } #[tokio::main] async fn main() -> Result<()> { // Use postgres dialect to enable `->` operator parsing let config = SessionConfig::new() .set_str("datafusion.sql_parser.dialect", "postgres"); let mut ctx = SessionContext::new_with_config(config); // Register the custom planner ctx.register_expr_planner(Arc::new(MyCustomPlanner))?; // Now `->` works as string concatenation let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?; // Returns: "helloworld" Ok(()) }
For more details, see the ExprPlanner API documentation and the expr_planner test examples.
Use TypePlanner to map SQL data types to Arrow/DataFusion types. This is useful when you need to support SQL types that aren't natively recognized.
# use std::sync::Arc; # use arrow::datatypes::{DataType, TimeUnit}; # use datafusion::error::Result; # use datafusion::prelude::*; # use datafusion::execution::SessionStateBuilder; use datafusion_expr::planner::TypePlanner; # use sqlparser::ast; #[derive(Debug)] struct MyTypePlanner; impl TypePlanner for MyTypePlanner { fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> { match sql_type { // Map DATETIME(precision) to Arrow Timestamp ast::DataType::Datetime(precision) => { let time_unit = match precision { Some(0) => TimeUnit::Second, Some(3) => TimeUnit::Millisecond, Some(6) => TimeUnit::Microsecond, None | Some(9) => TimeUnit::Nanosecond, _ => return Ok(None), // Let default handling take over }; Ok(Some(DataType::Timestamp(time_unit, None))) } _ => Ok(None), // Return None for types we don't handle } } } #[tokio::main] async fn main() -> Result<()> { let state = SessionStateBuilder::new() .with_default_features() .with_type_planner(Arc::new(MyTypePlanner)) .build(); let ctx = SessionContext::new_with_state(state); // Now DATETIME type is recognized ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?; Ok(()) }
For more details, see the TypePlanner API documentation.
Use RelationPlanner to handle custom relations in the FROM clause. This enables you to implement SQL constructs like:
TABLESAMPLE for sampling dataPIVOT / UNPIVOT for data reshapingMATCH_RECOGNIZE for pattern matchingWhen implementing RelationPlanner, you receive a RelationPlannerContext that provides utilities for planning:
| Method | Purpose |
|---|---|
plan(relation) | Recursively plan a nested relation |
sql_to_expr(expr, schema) | Convert SQL expression to DataFusion Expr |
context_provider() | Access session configuration, tables, functions |
See the RelationPlanner API documentation for additional methods like normalize_ident() and object_name_to_table_reference().
There are two main approaches when implementing a RelationPlanner:
Rewrite to Standard SQL: Transform custom syntax into equivalent standard operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY with CASE expressions). This is the simplest approach when possible.
Custom Logical and Physical Nodes: Create a UserDefinedLogicalNode to represent the operation in the logical plan, along with a custom ExecutionPlan to execute it. Both are required for end-to-end execution.
# use std::sync::Arc; # use datafusion::error::Result; # use datafusion::prelude::*; use datafusion_expr::planner::{ PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning, }; use datafusion_sql::sqlparser::ast::TableFactor; #[derive(Debug)] struct MyRelationPlanner; impl RelationPlanner for MyRelationPlanner { fn plan_relation( &self, relation: TableFactor, ctx: &mut dyn RelationPlannerContext, ) -> Result<RelationPlanning> { match relation { // Handle your custom relation TableFactor::Pivot { table, alias, .. } => { // Plan the input table let input = ctx.plan(*table)?; // Transform or wrap the plan as needed // ... Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias))) } // Return Original for relations you don't handle other => Ok(RelationPlanning::Original(other)), } } } #[tokio::main] async fn main() -> Result<()> { let ctx = SessionContext::new(); // Register the custom planner ctx.register_relation_planner(Arc::new(MyRelationPlanner))?; Ok(()) }
The DataFusion repository includes comprehensive examples demonstrating each approach:
The table_sample.rs example shows a complete end-to-end implementation of how to support queries such as:
SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)
The pivot_unpivot.rs example demonstrates rewriting custom syntax to standard SQL for queries such as:
SELECT * FROM sales PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))
ExprPlanner for custom operators and expression handlingTypePlanner for custom SQL data typesRelationPlanner for custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.)SessionContext or SessionStateBuilderExprPlanner, TypePlanner, RelationPlanner