This crate includes end to end, highly commented examples of how to use various DataFusion APIs to help you get started.
Run git submodule update --init to init test files.
To run an example, use the cargo run command, such as:
git clone https://github.com/apache/datafusion cd datafusion # Download test data git submodule update --init # Change to the examples directory cd datafusion-examples/examples # Run all examples in a group cargo run --example <group> -- all # Run a specific example within a group cargo run --example <group> -- <subcommand> # Run all examples in the `dataframe` group cargo run --example dataframe -- all # Run a single example from the `dataframe` group # (apply the same pattern for any other group) cargo run --example dataframe -- dataframe
builtin_functions| Subcommand | File Path | Description |
|---|---|---|
| date_time | builtin_functions/date_time.rs | Examples of date-time related functions and queries |
| function_factory | builtin_functions/function_factory.rs | Register CREATE FUNCTION handler to implement SQL macros |
| regexp | builtin_functions/regexp.rs | Examples of using regular expression functions |
custom_data_source| Subcommand | File Path | Description |
|---|---|---|
| csv_sql_streaming | custom_data_source/csv_sql_streaming.rs | Run a streaming SQL query against CSV data |
| csv_json_opener | custom_data_source/csv_json_opener.rs | Use low-level FileOpener APIs for CSV/JSON |
| custom_datasource | custom_data_source/custom_datasource.rs | Query a custom TableProvider |
| custom_file_casts | custom_data_source/custom_file_casts.rs | Implement custom casting rules |
| custom_file_format | custom_data_source/custom_file_format.rs | Write to a custom file format |
| default_column_values | custom_data_source/default_column_values.rs | Custom default values using metadata |
| file_stream_provider | custom_data_source/file_stream_provider.rs | Read/write via FileStreamProvider for streams |
data_io| Subcommand | File Path | Description |
|---|---|---|
| catalog | data_io/catalog.rs | Register tables into a custom catalog |
| json_shredding | data_io/json_shredding.rs | Implement filter rewriting for JSON shredding |
| parquet_adv_idx | data_io/parquet_advanced_index.rs | Create a secondary index across multiple parquet files |
| parquet_emb_idx | data_io/parquet_embedded_index.rs | Store a custom index inside Parquet files |
| parquet_enc | data_io/parquet_encrypted.rs | Read & write encrypted Parquet files |
| parquet_enc_with_kms | data_io/parquet_encrypted_with_kms.rs | Encrypted Parquet I/O using a KMS-backed factory |
| parquet_exec_visitor | data_io/parquet_exec_visitor.rs | Extract statistics by visiting an ExecutionPlan |
| parquet_idx | data_io/parquet_index.rs | Create a secondary index |
| query_http_csv | data_io/query_http_csv.rs | Query CSV files via HTTP |
| remote_catalog | data_io/remote_catalog.rs | Interact with a remote catalog |
dataframe| Subcommand | File Path | Description |
|---|---|---|
| cache_factory | dataframe/cache_factory.rs | Custom lazy caching for DataFrames using CacheFactory |
| dataframe | dataframe/dataframe.rs | Query DataFrames from various sources and write output |
| deserialize_to_struct | dataframe/deserialize_to_struct.rs | Convert Arrow arrays into Rust structs |
execution_monitoring| Subcommand | File Path | Description |
|---|---|---|
| mem_pool_exec_plan | execution_monitoring/memory_pool_execution_plan.rs | Memory-aware ExecutionPlan with spilling |
| mem_pool_tracking | execution_monitoring/memory_pool_tracking.rs | Demonstrates memory tracking |
| tracing | execution_monitoring/tracing.rs | Demonstrates tracing integration |
external_dependency| Subcommand | File Path | Description |
|---|---|---|
| dataframe_to_s3 | external_dependency/dataframe_to_s3.rs | Query DataFrames and write results to S3 |
| query_aws_s3 | external_dependency/query_aws_s3.rs | Query S3-backed data using object_store |
flight| Subcommand | File Path | Description |
|---|---|---|
| server | flight/server.rs | Run DataFusion server accepting FlightSQL/JDBC queries |
| client | flight/client.rs | Execute SQL queries via Arrow Flight protocol |
| sql_server | flight/sql_server.rs | Standalone SQL server for JDBC clients |
proto| Subcommand | File Path | Description |
|---|---|---|
| composed_extension_codec | proto/composed_extension_codec.rs | Use multiple extension codecs for serialization/deserialization |
query_planning| Subcommand | File Path | Description |
|---|---|---|
| analyzer_rule | query_planning/analyzer_rule.rs | Custom AnalyzerRule to change query semantics |
| expr_api | query_planning/expr_api.rs | Create, execute, analyze, and coerce Exprs |
| optimizer_rule | query_planning/optimizer_rule.rs | Replace predicates via a custom OptimizerRule |
| parse_sql_expr | query_planning/parse_sql_expr.rs | Parse SQL into DataFusion Expr |
| plan_to_sql | query_planning/plan_to_sql.rs | Generate SQL from expressions or plans |
| planner_api | query_planning/planner_api.rs | APIs for logical and physical plan manipulation |
| pruning | query_planning/pruning.rs | Use pruning to skip irrelevant files |
| thread_pools | query_planning/thread_pools.rs | Configure custom thread pools for DataFusion execution |
relation_planner| Subcommand | File Path | Description |
|---|---|---|
| match_recognize | relation_planner/match_recognize.rs | Implement MATCH_RECOGNIZE pattern matching |
| pivot_unpivot | relation_planner/pivot_unpivot.rs | Implement PIVOT / UNPIVOT |
| table_sample | relation_planner/table_sample.rs | Implement TABLESAMPLE |
sql_ops| Subcommand | File Path | Description |
|---|---|---|
| analysis | sql_ops/analysis.rs | Analyze SQL queries |
| custom_sql_parser | sql_ops/custom_sql_parser.rs | Implement a custom SQL parser to extend DataFusion |
| frontend | sql_ops/frontend.rs | Build LogicalPlans from SQL |
| query | sql_ops/query.rs | Query data using SQL |
udf| Subcommand | File Path | Description |
|---|---|---|
| adv_udaf | udf/advanced_udaf.rs | Advanced User Defined Aggregate Function (UDAF) |
| adv_udf | udf/advanced_udf.rs | Advanced User Defined Scalar Function (UDF) |
| adv_udwf | udf/advanced_udwf.rs | Advanced User Defined Window Function (UDWF) |
| async_udf | udf/async_udf.rs | Asynchronous User Defined Scalar Function |
| udaf | udf/simple_udaf.rs | Simple UDAF example |
| udf | udf/simple_udf.rs | Simple UDF example |
| udtf | udf/simple_udtf.rs | Simple UDTF example |
| udwf | udf/simple_udwf.rs | Simple UDWF example |