This crate includes end-to-end, highly commented examples of how to use various DataFusion APIs to help you get started.
Run `git submodule update --init` to initialize the test data files.
To run an example, use the `cargo run` command, such as:

```bash
git clone https://github.com/apache/datafusion
cd datafusion
# Download test data
git submodule update --init
# Change to the examples directory
cd datafusion-examples/examples
# Run the `dataframe` example
# (use the equivalent for other examples)
cargo run --example dataframe
```
- `examples/udf/advanced_udaf.rs`: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- `examples/udf/advanced_udf.rs`: Define and invoke a more complicated User Defined Scalar Function (UDF)
- `examples/udf/advanced_udwf.rs`: Define and invoke a more complicated User Defined Window Function (UDWF)
- `examples/data_io/parquet_advanced_index.rs`: Create a detailed secondary index that covers the contents of several parquet files
- `examples/udf/async_udf.rs`: Define and invoke an asynchronous User Defined Scalar Function (UDF)
- `examples/query_planning/analyzer_rule.rs`: Use a custom `AnalyzerRule` to change a query's semantics (row level access control)
- `examples/data_io/catalog.rs`: Register the table into a custom catalog
- `examples/data_io/json_shredding.rs`: Shows how to implement custom filter rewriting for JSON shredding
- `examples/proto/composed_extension_codec`: Example of using multiple extension codecs for serialization / deserialization
- `examples/custom_data_source/csv_sql_streaming.rs`: Build and run a streaming query plan from a SQL statement against a local CSV file
- `examples/custom_data_source/csv_json_opener.rs`: Use low level `FileOpener` APIs to read CSV/JSON into Arrow `RecordBatch`es
- `examples/custom_data_source/custom_datasource.rs`: Run queries against a custom datasource (`TableProvider`)
- `examples/custom_data_source/custom_file_casts.rs`: Implement custom casting rules to adapt file schemas
- `examples/custom_data_source/custom_file_format.rs`: Write data to a custom file format
- `dataframe-to-s3.rs`: Run a query using a DataFrame against a parquet file from S3 and write the results back to S3
- `dataframe.rs`: Run a query using the DataFrame API against parquet files, CSV files, and in-memory data, including multiple subqueries. Also demonstrates the various methods to write out a DataFrame to a table, parquet file, CSV file, and JSON file
- `examples/builtin_functions/date_time`: Examples of date-time related functions and queries
- `default_column_values.rs`: Implement custom default value handling for missing columns using field metadata and `PhysicalExprAdapter`
- `deserialize_to_struct.rs`: Convert query results (Arrow `ArrayRef`s) into Rust structs
- `examples/query_planning/expr_api.rs`: Create, execute, simplify, analyze and coerce `Expr`s
- `examples/custom_data_source/file_stream_provider.rs`: Run a query on `FileStreamProvider`, which implements `StreamProvider` for reading and writing to arbitrary stream sources / sinks
- `flight/sql_server.rs`: Run DataFusion as a standalone process and execute SQL queries from Flight and FlightSQL (e.g. JDBC) clients
- `examples/builtin_functions/function_factory.rs`: Register a `CREATE FUNCTION` handler to implement SQL macros
- `examples/execution_monitoring/memory_pool_tracking.rs`: Demonstrates `TrackConsumersPool` for memory tracking and debugging with enhanced error messages
- `examples/execution_monitoring/memory_pool_execution_plan.rs`: Shows how to implement a memory-aware `ExecutionPlan` with memory reservation and spilling
- `examples/execution_monitoring/tracing.rs`: Demonstrates the tracing injection feature for the DataFusion runtime
- `examples/query_planning/optimizer_rule.rs`: Use a custom `OptimizerRule` to replace certain predicates
- `examples/data_io/parquet_embedded_index.rs`: Store a custom index inside a Parquet file and use it to speed up queries
- `examples/data_io/parquet_encrypted.rs`: Read and write encrypted Parquet files using DataFusion
- `examples/data_io/parquet_encrypted_with_kms.rs`: Read and write encrypted Parquet files using an encryption factory
- `examples/data_io/parquet_index.rs`: Create a secondary index over several parquet files and use it to speed up queries
- `examples/data_io/parquet_exec_visitor.rs`: Extract statistics by visiting an `ExecutionPlan` after execution
- `examples/query_planning/parse_sql_expr.rs`: Parse SQL text into a DataFusion `Expr`
- `examples/query_planning/plan_to_sql.rs`: Generate SQL from DataFusion `Expr` and `LogicalPlan`
- `examples/query_planning/planner_api.rs`: Use APIs to manipulate logical and physical plans
- `examples/query_planning/pruning.rs`: Use pruning to rule out files based on statistics
- `examples/query_planning/thread_pools.rs`: Demonstrates how to run DataFusion queries on separate Tokio `Runtime`s (thread pools), isolating CPU-bound work from IO
- `query-aws-s3.rs`: Configure `object_store` and run a query against files stored in AWS S3
- `examples/data_io/query_http_csv.rs`: Configure `object_store` and run a query against files via HTTP
- `examples/builtin_functions/regexp.rs`: Examples of using regular expression functions
- `examples/data_io/remote_catalog.rs`: Examples of interfacing with a remote catalog (e.g. over a network)
- `examples/udf/simple_udaf.rs`: Define and invoke a User Defined Aggregate Function (UDAF)
- `examples/udf/simple_udf.rs`: Define and invoke a User Defined Scalar Function (UDF)
- `examples/udf/simple_udtf.rs`: Define and invoke a User Defined Table Function (UDTF)
- `examples/udf/simple_udwf.rs`: Define and invoke a User Defined Window Function (UDWF)
- `examples/sql_ops/analysis.rs`: Analyse SQL queries with DataFusion structures
- `examples/sql_ops/frontend.rs`: Create `LogicalPlan`s (only) from SQL strings
- `examples/sql_ops/dialect.rs`: Example of implementing a custom SQL dialect on top of `DFParser`
- `examples/sql_ops/query.rs`: Query data using SQL (in memory `RecordBatch`es, local Parquet files)
- `examples/flight/client.rs` and `examples/flight/server.rs`: Run DataFusion as a standalone process and execute SQL queries from a client using the Arrow Flight protocol