This crate includes end-to-end, highly commented examples of how to use various DataFusion APIs to help you get started.
Run `git submodule update --init` to initialize the test data files.
To run an example, use the `cargo run` command, such as:
```bash
git clone https://github.com/apache/datafusion
cd datafusion

# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run the `dataframe` example:
# ... use the equivalent for other examples
cargo run --example dataframe
```
- `advanced_udaf.rs`: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- `advanced_udf.rs`: Define and invoke a more complicated User Defined Scalar Function (UDF)
- `advanced_udwf.rs`: Define and invoke a more complicated User Defined Window Function (UDWF)
- `advanced_parquet_index.rs`: Creates a detailed secondary index that covers the contents of several parquet files
- `analyzer_rule.rs`: Use a custom AnalyzerRule to change a query's semantics (row level access control)
- `catalog.rs`: Register the table into a custom catalog
- `composed_extension_codec`: Example of using multiple extension codecs for serialization / deserialization
- `csv_sql_streaming.rs`: Build and run a streaming query plan from a SQL statement against a local CSV file
- `custom_datasource.rs`: Run queries against a custom datasource (TableProvider)
- `custom_file_format.rs`: Write data to a custom file format
- `dataframe-to-s3.rs`: Run a query using a DataFrame against a parquet file from s3 and write the results back to s3
- `dataframe.rs`: Run a query using a DataFrame against a local parquet file
- `dataframe_in_memory.rs`: Run a query using a DataFrame against data in memory
- `dataframe_output.rs`: Examples of methods which write data out from a DataFrame
- `deserialize_to_struct.rs`: Convert query results into Rust structs using serde
- `expr_api.rs`: Create, execute, simplify and analyze `Expr`s
- `file_stream_provider.rs`: Run a query on `FileStreamProvider` which implements `StreamProvider` for reading and writing to arbitrary stream sources / sinks
- `flight_sql_server.rs`: Run DataFusion as a standalone process and execute SQL queries from JDBC clients
- `function_factory.rs`: Register `CREATE FUNCTION` handler to implement SQL macros
- `make_date.rs`: Examples of using the make_date function
- `memtable.rs`: Create and query data in memory using SQL and RecordBatches
- `optimizer_rule.rs`: Use a custom OptimizerRule to replace certain predicates
- `parquet_index.rs`: Create a secondary index over several parquet files and use it to speed up queries
- `parquet_sql_multiple_files.rs`: Build and run a query plan from a SQL statement against multiple local Parquet files
- `parquet_exec_visitor.rs`: Extract statistics by visiting an ExecutionPlan after execution
- `parse_sql_expr.rs`: Parse SQL text into DataFusion `Expr`
- `plan_to_sql.rs`: Generate SQL from DataFusion `Expr` and `LogicalPlan`
- `pruning.rs`: Use pruning to rule out files based on statistics
- `query-aws-s3.rs`: Configure `object_store` and run a query against files stored in AWS S3
- `query-http-csv.rs`: Configure `object_store` and run a query against files via HTTP
- `regexp.rs`: Examples of using regular expression functions
- `simple_udaf.rs`: Define and invoke a User Defined Aggregate Function (UDAF)
- `simple_udf.rs`: Define and invoke a User Defined Scalar Function (UDF)
- `simple_udwf.rs`: Define and invoke a User Defined Window Function (UDWF)
- `sql_analysis.rs`: Analyse SQL queries with DataFusion structures
- `sql_frontend.rs`: Create LogicalPlans (only) from SQL strings
- `sql_dialect.rs`: Example of implementing a custom SQL dialect on top of `DFParser`
- `to_char.rs`: Examples of using the to_char function
- `to_timestamp.rs`: Examples of using to_timestamp functions
- `flight_client.rs` and `flight_server.rs`: Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol
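To give a flavor of what the simpler examples (such as `dataframe.rs`) look like, here is a minimal sketch of a DataFrame query, not the example's exact code. It assumes a local parquet file at the placeholder path `example.parquet` containing an `id` column; swap in your own file and columns:

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Create a session context, the entry point for running queries
    let ctx = SessionContext::new();

    // Read a local parquet file into a DataFrame
    // ("example.parquet" is a placeholder path)
    let df = ctx
        .read_parquet("example.parquet", ParquetReadOptions::default())
        .await?;

    // Build a query with the DataFrame API: filter rows and project a column
    // ("id" is a placeholder column name)
    let df = df
        .filter(col("id").gt(lit(1)))?
        .select(vec![col("id")])?;

    // Execute the plan and print the results to stdout
    df.show().await?;
    Ok(())
}
```

The full, commented versions of this and the other programs above live in `datafusion-examples/examples` and run with `cargo run --example <name>` as shown earlier.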