This crate includes end-to-end, heavily commented examples of how to use various DataFusion APIs to help you get started.
Run `git submodule update --init` to initialize the test files.
To run an example, use the cargo run command, such as:
    git clone https://github.com/apache/datafusion
    cd datafusion

    # Download test data
    git submodule update --init

    # Change to the examples directory
    cd datafusion-examples/examples

    # Run the `dataframe` example:
    # ... use the equivalent for other examples
    cargo run --example dataframe
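To find names to pass to `--example`, the following is a minimal sketch of a helper that lists candidates. It assumes Cargo's default auto-discovery convention, where each top-level `.rs` file in the examples directory becomes an example named after the file; files in subdirectories such as `examples/udf/` may be wired up differently and are not covered. The function name `list_examples` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: print one candidate example name per line.
# Assumes each top-level .rs file in the directory is a Cargo example
# whose name is the file name without the .rs extension.
list_examples() {
    dir="${1:-.}"
    for f in "$dir"/*.rs; do
        # Skip the unexpanded pattern when the directory has no .rs files.
        [ -e "$f" ] || continue
        basename "$f" .rs
    done
}
```

Run from the examples directory, `list_examples` prints names such as `dataframe`, any of which can then be passed to `cargo run --example`.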
- examples/udf/advanced_udaf.rs: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- examples/udf/advanced_udf.rs: Define and invoke a more complicated User Defined Scalar Function (UDF)
- examples/udf/advanced_udwf.rs: Define and invoke a more complicated User Defined Window Function (UDWF)
- advanced_parquet_index.rs: Create a detailed secondary index that covers the contents of several parquet files
- examples/udf/async_udf.rs: Define and invoke an asynchronous User Defined Scalar Function (UDF)
- analyzer_rule.rs: Use a custom AnalyzerRule to change a query's semantics (row level access control)
- catalog.rs: Register the table into a custom catalog
- composed_extension_codec: Example of using multiple extension codecs for serialization / deserialization
- csv_sql_streaming.rs: Build and run a streaming query plan from a SQL statement against a local CSV file
- csv_json_opener.rs: Use low-level FileOpener APIs to read CSV/JSON into Arrow RecordBatches
- custom_datasource.rs: Run queries against a custom datasource (TableProvider)
- custom_file_casts.rs: Implement custom casting rules to adapt file schemas
- custom_file_format.rs: Write data to a custom file format
- dataframe-to-s3.rs: Run a query using a DataFrame against a parquet file from S3 and write the results back to S3
- dataframe.rs: Run a query using the DataFrame API against parquet files, csv files, and in-memory data, including multiple subqueries. Also demonstrates the various methods to write out a DataFrame to a table, parquet file, csv file, and json file.
- default_column_values.rs: Implement custom default value handling for missing columns using field metadata and PhysicalExprAdapter
- deserialize_to_struct.rs: Convert query results (Arrow ArrayRefs) into Rust structs
- expr_api.rs: Create, execute, simplify, analyze and coerce Exprs
- file_stream_provider.rs: Run a query on FileStreamProvider which implements StreamProvider for reading and writing to arbitrary stream sources / sinks
- flight/sql_server.rs: Run DataFusion as a standalone process and execute SQL queries from Flight and FlightSQL (e.g. JDBC) clients
- function_factory.rs: Register CREATE FUNCTION handler to implement SQL macros
- memory_pool_tracking.rs: Demonstrates TrackConsumersPool for memory tracking and debugging with enhanced error messages
- memory_pool_execution_plan.rs: Shows how to implement a memory-aware ExecutionPlan with memory reservation and spilling
- optimizer_rule.rs: Use a custom OptimizerRule to replace certain predicates
- parquet_embedded_index.rs: Store a custom index inside a Parquet file and use it to speed up queries
- parquet_encrypted.rs: Read and write encrypted Parquet files using DataFusion
- parquet_encrypted_with_kms.rs: Read and write encrypted Parquet files using an encryption factory
- parquet_index.rs: Create a secondary index over several parquet files and use it to speed up queries
- parquet_exec_visitor.rs: Extract statistics by visiting an ExecutionPlan after execution
- parse_sql_expr.rs: Parse SQL text into a DataFusion Expr
- plan_to_sql.rs: Generate SQL from DataFusion Expr and LogicalPlan
- planner_api.rs: APIs to manipulate logical and physical plans
- pruning.rs: Use pruning to rule out files based on statistics
- query-aws-s3.rs: Configure object_store and run a query against files stored in AWS S3
- query-http-csv.rs: Configure object_store and run a query against files via HTTP
- regexp.rs: Examples of using regular expression functions
- remote_catalog.rs: Examples of interfacing with a remote catalog (e.g. over a network)
- examples/udf/simple_udaf.rs: Define and invoke a User Defined Aggregate Function (UDAF)
- examples/udf/simple_udf.rs: Define and invoke a User Defined Scalar Function (UDF)
- examples/udf/simple_udtf.rs: Define and invoke a User Defined Table Function (UDTF)
- examples/udf/simple_udwf.rs: Define and invoke a User Defined Window Function (UDWF)
- sql_analysis.rs: Analyse SQL queries with DataFusion structures
- sql_frontend.rs: Create LogicalPlans (only) from SQL strings
- sql_dialect.rs: Example of implementing a custom SQL dialect on top of DFParser
- sql_query.rs: Query data using SQL (in-memory RecordBatches, local Parquet files)
- date_time_function.rs: Examples of date-time related functions and queries
- examples/flight/client.rs and examples/flight/server.rs: Run DataFusion as a standalone process and execute SQL queries from a client using the Arrow Flight protocol