Extending SQL Syntax

DataFusion provides a flexible extension system that allows you to customize SQL parsing and planning without modifying the core codebase. This is useful when you need to:

  • Support custom operators from other SQL dialects (e.g., PostgreSQL's -> for JSON)
  • Add custom data types not natively supported
  • Implement SQL constructs like TABLESAMPLE, PIVOT/UNPIVOT, or MATCH_RECOGNIZE

You can read more about this topic in the Extending SQL in DataFusion: from ->> to TABLESAMPLE blog.

Architecture Overview

When DataFusion processes a SQL query, it goes through these stages:

┌─────────────┐    ┌─────────┐    ┌──────────────────────┐    ┌─────────────┐
 SQL String  │───▶│ Parser  │───▶│      SqlToRel        │───▶│ LogicalPlan 
└─────────────┘    └─────────┘     (SQL to LogicalPlan)     └─────────────┘
                                  └──────────────────────┘
                                              
                                               uses
                                              
                                  ┌───────────────────────┐
                                    Extension Planners   
                                     ExprPlanner        
                                     TypePlanner        
                                     RelationPlanner    
                                  └───────────────────────┘

The extension planners intercept specific parts of the SQL AST during the SqlToRel phase and allow you to customize how they are converted to DataFusion's logical plan.

Extension Points

DataFusion provides three planner traits for extending SQL:

TraitPurposeRegistration Method
ExprPlannerCustom expressions and operatorsctx.register_expr_planner()
TypePlannerCustom SQL data typesSessionStateBuilder::with_type_planner()
RelationPlannerCustom FROM clause elements (relations)ctx.register_relation_planner()

Planner Precedence: Multiple ExprPlanners and RelationPlanners can be registered; they are invoked in reverse registration order (last registered wins). Return Original(...) to delegate to the next planner. Only one TypePlanner can be active at a time.

ExprPlanner: Custom Expressions and Operators

Use ExprPlanner to customize how SQL expressions are converted to DataFusion logical expressions. This is useful for:

  • Custom binary operators (e.g., ->, ->>, @>, ?)
  • Custom field access patterns
  • Custom aggregate or window function handling

Available Methods

CategoryMethods
Operatorsplan_binary_op, plan_any
Literalsplan_array_literal, plan_dictionary_literal, plan_struct_literal
Functionsplan_extract, plan_substring, plan_overlay, plan_position, plan_make_map
Identifiersplan_field_access, plan_compound_identifier
Aggregates/Windowsplan_aggregate, plan_window

See the ExprPlanner API documentation for full method signatures.

Example: Custom Arrow Operator

This example maps the -> operator to string concatenation:

# use std::sync::Arc;
# use datafusion::common::DFSchema;
# use datafusion::error::Result;
# use datafusion::logical_expr::Operator;
# use datafusion::prelude::*;
# use datafusion::sql::sqlparser::ast::BinaryOperator;
use datafusion_expr::planner::{ExprPlanner, PlannerResult, RawBinaryExpr};
# use datafusion_expr::BinaryExpr;

#[derive(Debug)]
struct MyCustomPlanner;

impl ExprPlanner for MyCustomPlanner {
    fn plan_binary_op(
        &self,
        expr: RawBinaryExpr,
        _schema: &DFSchema,
    ) -> Result<PlannerResult<RawBinaryExpr>> {
        match &expr.op {
            // Map `->` to string concatenation
            BinaryOperator::Arrow => {
                Ok(PlannerResult::Planned(Expr::BinaryExpr(BinaryExpr {
                    left: Box::new(expr.left.clone()),
                    right: Box::new(expr.right.clone()),
                    op: Operator::StringConcat,
                })))
            }
            _ => Ok(PlannerResult::Original(expr)),
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    // Use postgres dialect to enable `->` operator parsing
    let config = SessionConfig::new()
        .set_str("datafusion.sql_parser.dialect", "postgres");
    let mut ctx = SessionContext::new_with_config(config);

    // Register the custom planner
    ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;

    // Now `->` works as string concatenation
    let results = ctx.sql("SELECT 'hello'->'world'").await?.collect().await?;
    // Returns: "helloworld"
    Ok(())
}

For more details, see the ExprPlanner API documentation and the expr_planner test examples.

TypePlanner: Custom Data Types

Use TypePlanner to map SQL data types to Arrow/DataFusion types. This is useful when you need to support SQL types that aren't natively recognized.

Example: Custom DATETIME Type

# use std::sync::Arc;
# use arrow::datatypes::{DataType, TimeUnit};
# use datafusion::error::Result;
# use datafusion::prelude::*;
# use datafusion::execution::SessionStateBuilder;
use datafusion_expr::planner::TypePlanner;
# use sqlparser::ast;

#[derive(Debug)]
struct MyTypePlanner;

impl TypePlanner for MyTypePlanner {
    fn plan_type(&self, sql_type: &ast::DataType) -> Result<Option<DataType>> {
        match sql_type {
            // Map DATETIME(precision) to Arrow Timestamp
            ast::DataType::Datetime(precision) => {
                let time_unit = match precision {
                    Some(0) => TimeUnit::Second,
                    Some(3) => TimeUnit::Millisecond,
                    Some(6) => TimeUnit::Microsecond,
                    None | Some(9) => TimeUnit::Nanosecond,
                    _ => return Ok(None), // Let default handling take over
                };
                Ok(Some(DataType::Timestamp(time_unit, None)))
            }
            _ => Ok(None), // Return None for types we don't handle
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let state = SessionStateBuilder::new()
        .with_default_features()
        .with_type_planner(Arc::new(MyTypePlanner))
        .build();

    let ctx = SessionContext::new_with_state(state);

    // Now DATETIME type is recognized
    ctx.sql("CREATE TABLE events (ts DATETIME(3))").await?;
    Ok(())
}

For more details, see the TypePlanner API documentation.

RelationPlanner: Custom FROM Clause Elements

Use RelationPlanner to handle custom relations in the FROM clause. This enables you to implement SQL constructs like:

  • TABLESAMPLE for sampling data
  • PIVOT / UNPIVOT for data reshaping
  • MATCH_RECOGNIZE for pattern matching
  • Any custom relation syntax parsed by sqlparser

The RelationPlannerContext

When implementing RelationPlanner, you receive a RelationPlannerContext that provides utilities for planning:

MethodPurpose
plan(relation)Recursively plan a nested relation
sql_to_expr(expr, schema)Convert SQL expression to DataFusion Expr
context_provider()Access session configuration, tables, functions

See the RelationPlanner API documentation for additional methods like normalize_ident() and object_name_to_table_reference().

Implementation Strategies

There are two main approaches when implementing a RelationPlanner:

  1. Rewrite to Standard SQL: Transform custom syntax into equivalent standard operations that DataFusion already knows how to execute (e.g., PIVOT → GROUP BY with CASE expressions). This is the simplest approach when possible.

  2. Custom Logical and Physical Nodes: Create a UserDefinedLogicalNode to represent the operation in the logical plan, along with a custom ExecutionPlan to execute it. Both are required for end-to-end execution.

Example: Basic RelationPlanner Structure

# use std::sync::Arc;
# use datafusion::error::Result;
# use datafusion::prelude::*;
use datafusion_expr::planner::{
    PlannedRelation, RelationPlanner, RelationPlannerContext, RelationPlanning,
};
use datafusion_sql::sqlparser::ast::TableFactor;

#[derive(Debug)]
struct MyRelationPlanner;

impl RelationPlanner for MyRelationPlanner {
    fn plan_relation(
        &self,
        relation: TableFactor,
        ctx: &mut dyn RelationPlannerContext,
    ) -> Result<RelationPlanning> {
        match relation {
            // Handle your custom relation
            TableFactor::Pivot { table, alias, .. } => {
                // Plan the input table
                let input = ctx.plan(*table)?;

                // Transform or wrap the plan as needed
                // ...

                Ok(RelationPlanning::Planned(PlannedRelation::new(input, alias)))
            }

            // Return Original for relations you don't handle
            other => Ok(RelationPlanning::Original(other)),
        }
    }
}

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();

    // Register the custom planner
    ctx.register_relation_planner(Arc::new(MyRelationPlanner))?;

    Ok(())
}

Complete Examples

The DataFusion repository includes comprehensive examples demonstrating each approach:

TABLESAMPLE (Custom Logical and Physical Nodes)

The table_sample.rs example shows a complete end-to-end implementation of how to support queries such as:

SELECT * FROM table TABLESAMPLE BERNOULLI(10 PERCENT) REPEATABLE(42)

PIVOT/UNPIVOT (Rewrite Strategy)

The pivot_unpivot.rs example demonstrates rewriting custom syntax to standard SQL for queries such as:

SELECT * FROM sales
  PIVOT (SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4'))

Recap

  1. Use ExprPlanner for custom operators and expression handling
  2. Use [TypePlanner for custom SQL data types
  3. Use RelationPlanner for custom FROM clause syntax (TABLESAMPLE, PIVOT, etc.)
  4. Register planners via SessionContext or SessionStateBuilder

See Also