
DataFusion Examples

This crate includes end-to-end, heavily commented examples of how to use various DataFusion APIs to help you get started.

Prerequisites

Run `git submodule update --init` to download the test files.

Running Examples

To run an example, use the cargo run command, such as:

```bash
git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run all examples in a group
cargo run --example <group> -- all

# Run a specific example within a group
cargo run --example <group> -- <subcommand>

# Run all examples in the `dataframe` group
cargo run --example dataframe -- all

# Run a single example from the `dataframe` group
# (apply the same pattern for any other group)
cargo run --example dataframe -- dataframe
```
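
The `<group> -- <subcommand>` pattern above works because everything after `--` is passed to the example binary as ordinary program arguments. The following is a minimal sketch of how a group binary could dispatch on that argument. It is an illustration only, not the code used in this repository: the `SUBCOMMANDS` constant and the `resolve` function are hypothetical names, and the subcommand list simply mirrors the `dataframe` group.

```rust
use std::env;
use std::process;

/// Hypothetical list of subcommands for a `dataframe`-style group.
const SUBCOMMANDS: &[&str] = &["cache_factory", "dataframe", "deserialize_to_struct"];

/// Resolve one command-line argument into the examples to run.
/// `all` expands to every subcommand; any other value must match a known name.
fn resolve(arg: &str) -> Result<Vec<&'static str>, String> {
    if arg == "all" {
        return Ok(SUBCOMMANDS.to_vec());
    }
    SUBCOMMANDS
        .iter()
        .copied()
        .find(|name| *name == arg)
        .map(|name| vec![name])
        .ok_or_else(|| {
            format!(
                "unknown subcommand `{}`; expected `all` or one of {:?}",
                arg, SUBCOMMANDS
            )
        })
}

fn main() {
    // With `cargo run --example <group> -- <subcommand>`, the subcommand
    // is the first argument after the program name.
    let arg = match env::args().nth(1) {
        Some(arg) => arg,
        None => {
            eprintln!("usage: <group> <all|subcommand>");
            process::exit(2);
        }
    };

    match resolve(&arg) {
        Ok(names) => {
            for name in names {
                // A real group would call the example's entry point here.
                println!("running example: {name}");
            }
        }
        Err(msg) => {
            eprintln!("{msg}");
            process::exit(1);
        }
    }
}
```

In this sketch the lookup lives in `resolve`, separate from `main`, so the handling of `all` and of unknown names can be checked without running any example.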

Builtin Functions Examples

Group: builtin_functions

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| date_time | builtin_functions/date_time.rs | Examples of date-time related functions and queries |
| function_factory | builtin_functions/function_factory.rs | Register CREATE FUNCTION handler to implement SQL macros |
| regexp | builtin_functions/regexp.rs | Examples of using regular expression functions |

Custom Data Source Examples

Group: custom_data_source

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| csv_sql_streaming | custom_data_source/csv_sql_streaming.rs | Run a streaming SQL query against CSV data |
| csv_json_opener | custom_data_source/csv_json_opener.rs | Use low-level FileOpener APIs for CSV/JSON |
| custom_datasource | custom_data_source/custom_datasource.rs | Query a custom TableProvider |
| custom_file_casts | custom_data_source/custom_file_casts.rs | Implement custom casting rules |
| custom_file_format | custom_data_source/custom_file_format.rs | Write to a custom file format |
| default_column_values | custom_data_source/default_column_values.rs | Custom default values using metadata |
| file_stream_provider | custom_data_source/file_stream_provider.rs | Read/write via FileStreamProvider for streams |

Data IO Examples

Group: data_io

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| catalog | data_io/catalog.rs | Register tables into a custom catalog |
| json_shredding | data_io/json_shredding.rs | Implement filter rewriting for JSON shredding |
| parquet_adv_idx | data_io/parquet_advanced_index.rs | Create a secondary index across multiple Parquet files |
| parquet_emb_idx | data_io/parquet_embedded_index.rs | Store a custom index inside Parquet files |
| parquet_enc | data_io/parquet_encrypted.rs | Read & write encrypted Parquet files |
| parquet_enc_with_kms | data_io/parquet_encrypted_with_kms.rs | Encrypted Parquet I/O using a KMS-backed factory |
| parquet_exec_visitor | data_io/parquet_exec_visitor.rs | Extract statistics by visiting an ExecutionPlan |
| parquet_idx | data_io/parquet_index.rs | Create a secondary index |
| query_http_csv | data_io/query_http_csv.rs | Query CSV files via HTTP |
| remote_catalog | data_io/remote_catalog.rs | Interact with a remote catalog |

DataFrame Examples

Group: dataframe

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| cache_factory | dataframe/cache_factory.rs | Custom lazy caching for DataFrames using CacheFactory |
| dataframe | dataframe/dataframe.rs | Query DataFrames from various sources and write output |
| deserialize_to_struct | dataframe/deserialize_to_struct.rs | Convert Arrow arrays into Rust structs |

Execution Monitoring Examples

Group: execution_monitoring

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| mem_pool_exec_plan | execution_monitoring/memory_pool_execution_plan.rs | Memory-aware ExecutionPlan with spilling |
| mem_pool_tracking | execution_monitoring/memory_pool_tracking.rs | Demonstrates memory tracking |
| tracing | execution_monitoring/tracing.rs | Demonstrates tracing integration |

External Dependency Examples

Group: external_dependency

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| dataframe_to_s3 | external_dependency/dataframe_to_s3.rs | Query DataFrames and write results to S3 |
| query_aws_s3 | external_dependency/query_aws_s3.rs | Query S3-backed data using object_store |

Flight Examples

Group: flight

Category: Distributed

| Subcommand | File Path | Description |
| --- | --- | --- |
| server | flight/server.rs | Run DataFusion server accepting FlightSQL/JDBC queries |
| client | flight/client.rs | Execute SQL queries via Arrow Flight protocol |
| sql_server | flight/sql_server.rs | Standalone SQL server for JDBC clients |

Proto Examples

Group: proto

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| composed_extension_codec | proto/composed_extension_codec.rs | Use multiple extension codecs for serialization/deserialization |

Query Planning Examples

Group: query_planning

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| analyzer_rule | query_planning/analyzer_rule.rs | Custom AnalyzerRule to change query semantics |
| expr_api | query_planning/expr_api.rs | Create, execute, analyze, and coerce Exprs |
| optimizer_rule | query_planning/optimizer_rule.rs | Replace predicates via a custom OptimizerRule |
| parse_sql_expr | query_planning/parse_sql_expr.rs | Parse SQL into DataFusion Expr |
| plan_to_sql | query_planning/plan_to_sql.rs | Generate SQL from expressions or plans |
| planner_api | query_planning/planner_api.rs | APIs for logical and physical plan manipulation |
| pruning | query_planning/pruning.rs | Use pruning to skip irrelevant files |
| thread_pools | query_planning/thread_pools.rs | Configure custom thread pools for DataFusion execution |

Relation Planner Examples

Group: relation_planner

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| match_recognize | relation_planner/match_recognize.rs | Implement MATCH_RECOGNIZE pattern matching |
| pivot_unpivot | relation_planner/pivot_unpivot.rs | Implement PIVOT / UNPIVOT |
| table_sample | relation_planner/table_sample.rs | Implement TABLESAMPLE |

SQL Ops Examples

Group: sql_ops

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| analysis | sql_ops/analysis.rs | Analyze SQL queries |
| custom_sql_parser | sql_ops/custom_sql_parser.rs | Implement a custom SQL parser to extend DataFusion |
| frontend | sql_ops/frontend.rs | Build LogicalPlans from SQL |
| query | sql_ops/query.rs | Query data using SQL |

UDF Examples

Group: udf

Category: Single Process

| Subcommand | File Path | Description |
| --- | --- | --- |
| adv_udaf | udf/advanced_udaf.rs | Advanced User Defined Aggregate Function (UDAF) |
| adv_udf | udf/advanced_udf.rs | Advanced User Defined Scalar Function (UDF) |
| adv_udwf | udf/advanced_udwf.rs | Advanced User Defined Window Function (UDWF) |
| async_udf | udf/async_udf.rs | Asynchronous User Defined Scalar Function |
| udaf | udf/simple_udaf.rs | Simple UDAF example |
| udf | udf/simple_udf.rs | Simple UDF example |
| udtf | udf/simple_udtf.rs | Simple UDTF example |
| udwf | udf/simple_udwf.rs | Simple UDWF example |