This crate includes end-to-end, highly commented examples of how to use various DataFusion APIs to help you get started.
Run `git submodule update --init` to initialize the test files.
To run an example, use the `cargo run` command, such as:
```bash
git clone https://github.com/apache/datafusion
cd datafusion

# Download test data
git submodule update --init

# Change to the examples directory
cd datafusion-examples/examples

# Run the `dataframe` example:
# ... use the equivalent for other examples
cargo run --example dataframe
```
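For a sense of what the examples contain, here is a minimal sketch (not itself one of the examples) that reads a CSV file into a DataFrame, filters and projects it with the `Expr` API, and prints the result. The file path and column names are illustrative only:

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Create an execution context
    let ctx = SessionContext::new();

    // Read a CSV file into a DataFrame (path and columns are hypothetical)
    let df = ctx
        .read_csv("data/example.csv", CsvReadOptions::new())
        .await?;

    // Filter and project using the Expr API, then print the results
    df.filter(col("a").gt(lit(10)))?
        .select(vec![col("a"), col("b")])?
        .show()
        .await?;

    Ok(())
}
```

The individual examples are listed below with a short description of each: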
- `advanced_udaf.rs`: Define and invoke a more complicated User Defined Aggregate Function (UDAF)
- `advanced_udf.rs`: Define and invoke a more complicated User Defined Scalar Function (UDF)
- `advanced_udwf.rs`: Define and invoke a more complicated User Defined Window Function (UDWF)
- `advanced_parquet_index.rs`: Creates a detailed secondary index that covers the contents of several parquet files
- `analyzer_rule.rs`: Use a custom AnalyzerRule to change a query's semantics (row level access control)
- `catalog.rs`: Register the table into a custom catalog
- `composed_extension_codec`: Example of using multiple extension codecs for serialization / deserialization
- `csv_sql_streaming.rs`: Build and run a streaming query plan from a SQL statement against a local CSV file
- `custom_datasource.rs`: Run queries against a custom datasource (TableProvider)
- `custom_file_format.rs`: Write data to a custom file format
- `dataframe-to-s3.rs`: Run a query using a DataFrame against a parquet file in S3 and write the results back to S3
- `dataframe.rs`: Run a query using a DataFrame against a local parquet file
- `dataframe_in_memory.rs`: Run a query using a DataFrame against data in memory
- `dataframe_output.rs`: Examples of methods which write data out from a DataFrame
- `deserialize_to_struct.rs`: Convert query results into Rust structs using serde
- `expr_api.rs`: Create, execute, simplify and analyze `Expr`s
- `file_stream_provider.rs`: Run a query on `FileStreamProvider`, which implements `StreamProvider` for reading and writing to arbitrary stream sources / sinks
- `flight_sql_server.rs`: Run DataFusion as a standalone process and execute SQL queries from JDBC clients
- `function_factory.rs`: Register a `CREATE FUNCTION` handler to implement SQL macros
- `make_date.rs`: Examples of using the make_date function
- `memtable.rs`: Create and query data in memory using SQL and `RecordBatch`es (a minimal sketch of this pattern follows this list)
- `optimizer_rule.rs`: Use a custom OptimizerRule to replace certain predicates
- `parquet_index.rs`: Create a secondary index over several parquet files and use it to speed up queries
- `parquet_sql_multiple_files.rs`: Build and run a query plan from a SQL statement against multiple local Parquet files
- `parquet_exec_visitor.rs`: Extract statistics by visiting an ExecutionPlan after execution
- `parse_sql_expr.rs`: Parse SQL text into DataFusion `Expr`
- `plan_to_sql.rs`: Generate SQL from DataFusion `Expr` and `LogicalPlan`
- `pruning.rs`: Use pruning to rule out files based on statistics
- `query-aws-s3.rs`: Configure `object_store` and run a query against files stored in AWS S3
- `query-http-csv.rs`: Configure `object_store` and run a query against files via HTTP
- `regexp.rs`: Examples of using regular expression functions
- `simple_udaf.rs`: Define and invoke a User Defined Aggregate Function (UDAF)
- `simple_udf.rs`: Define and invoke a User Defined Scalar Function (UDF)
- `simple_udwf.rs`: Define and invoke a User Defined Window Function (UDWF)
- `sql_analysis.rs`: Analyse SQL queries with DataFusion structures
- `sql_frontend.rs`: Create `LogicalPlan`s (only) from SQL strings
- `sql_dialect.rs`: Example of implementing a custom SQL dialect on top of `DFParser`
- `to_char.rs`: Examples of using the to_char function
- `to_timestamp.rs`: Examples of using to_timestamp functions
- `flight_client.rs` and `flight_server.rs`: Run DataFusion as a standalone process and execute SQL queries from a client using the Flight protocol