Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.
Early development: the API will change between releases. Bug reports and contributions welcome.
Released to Maven Central. The JAR bundles the native library for Linux and macOS on x86_64 and aarch64. Windows users need to build from source.
Maven:
<dependency>
  <groupId>org.apache.datafusion</groupId>
  <artifactId>datafusion-java</artifactId>
  <version>0.1.0</version>
</dependency>
Gradle:
implementation("org.apache.datafusion:datafusion-java:0.1.0")
Arrow needs --add-opens=java.base/java.nio=ALL-UNNAMED on the JVM command line. See the installation guide for details and for building from source.
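A quick way to confirm the flag took effect is to ask the module system at startup. The following is only a sketch: the class name `NioOpenCheck` and its helper methods are made up for illustration and are not part of datafusion-java or Arrow. It assumes the application runs from the class path (the unnamed module), which is the case `ALL-UNNAMED` covers.

```java
final class NioOpenCheck {

    /** Returns true if java.base opens the given package to this class's module. */
    static boolean isOpenToCaller(String packageName) {
        Module javaBase = Object.class.getModule();      // the java.base module
        Module caller = NioOpenCheck.class.getModule();  // unnamed module when run from the class path
        return javaBase.isOpen(packageName, caller);
    }

    /** Builds a human-readable message for the check result. */
    static String describe(boolean open) {
        return open
            ? "java.nio is open to this module"
            : "java.nio is NOT open; add --add-opens=java.base/java.nio=ALL-UNNAMED";
    }

    public static void main(String[] args) {
        System.out.println(describe(isOpenToCaller("java.nio")));
    }
}
```

Running `java --add-opens=java.base/java.nio=ALL-UNNAMED NioOpenCheck.java` should report the package as open; running it without the flag should print the hint instead.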
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {
    ctx.registerParquet("orders", "/path/to/orders.parquet");

    try (DataFrame df = ctx.sql(
             "SELECT o_orderpriority, COUNT(*) AS n "
                 + "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}
SessionContext and DataFrame are AutoCloseable and not thread-safe.
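One common way to use a non-thread-safe, closeable object from a multi-threaded application is to confine it to a single worker thread and close it there. The sketch below illustrates that pattern only. `FakeSession` is a hypothetical stand-in, not the real `SessionContext`, and `runConfined` is an invented helper; the example uses nothing beyond `java.util.concurrent` and works on JDK 17.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ConfinedSessionDemo {

    // Hypothetical stand-in for a non-thread-safe AutoCloseable resource.
    // It remembers the creating thread and rejects calls from any other thread.
    static final class FakeSession implements AutoCloseable {
        private final Thread owner = Thread.currentThread();
        private boolean closed;

        String query(String sql) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("used off the owning thread");
            }
            if (closed) {
                throw new IllegalStateException("already closed");
            }
            return "result of: " + sql;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Creates, uses, and closes the session on one dedicated worker thread.
    static String runConfined(String sql) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<String> result = worker.submit(() -> {
                try (FakeSession session = new FakeSession()) {
                    return session.query(sql);
                }
            });
            return result.get(); // block until the worker finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConfined("SELECT 1"));
    }
}
```

The try-with-resources block inside the submitted task guarantees the resource is closed on the same thread that created it, even if the query throws.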
The full documentation lives under docs/source/ and is built with Sphinx; see docs/README.md for the build steps.
Requires JDK 17 or later. To build from source, see docs/source/contributor-guide/development.md.
Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.
Apache License 2.0. See LICENSE.txt and NOTICE.txt.