Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.
Early development: the API will change between releases. Bug reports and contributions welcome.
```java
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {
    ctx.registerParquet("orders", "/path/to/orders.parquet");

    try (DataFrame df = ctx.sql(
             "SELECT o_orderpriority, COUNT(*) AS n " +
             "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}
```
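To make the shape of the query result concrete, here is a minimal plain-Java sketch that computes the same grouping (`COUNT(*)` per `o_orderpriority`) over a small in-memory list. It does not use DataFusion or Arrow. The class name `GroupCountSketch`, the method `countByPriority`, and the sample priority values are all hypothetical and exist only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: mirrors the SQL
//   SELECT o_orderpriority, COUNT(*) AS n FROM orders GROUP BY o_orderpriority
// using Java streams instead of DataFusion.
class GroupCountSketch {

    // Returns one entry per distinct priority, mapped to its row count.
    // A TreeMap is used only so that iteration order is deterministic;
    // the SQL query itself has no ORDER BY and promises no row order.
    static Map<String, Long> countByPriority(List<String> priorities) {
        return priorities.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for the o_orderpriority column.
        List<String> sample = List.of("1-URGENT", "2-HIGH", "1-URGENT", "5-LOW");

        // Each map entry corresponds to one row (o_orderpriority, n).
        countByPriority(sample)
                .forEach((priority, n) -> System.out.println(priority + " " + n));
    }
}
```

Each Arrow batch returned by the query would carry rows of this form: one per distinct priority, with the count in column `n`.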
See the User Guide for installation, the DataFrame and SQL APIs, and Parquet ingestion. See the Contributor Guide for build, test, and release workflows.
:maxdepth: 1
:caption: Links
:hidden:

GitHub Repository <https://github.com/apache/datafusion-java>
Issue Tracker <https://github.com/apache/datafusion-java/issues>
Apache DataFusion <https://datafusion.apache.org/>
Code of Conduct <https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md>

:maxdepth: 2
:caption: Documentation
:hidden:

User Guide <user-guide/index>
Contributor Guide <contributor-guide/index>