Apache DataFusion Java

Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.

This project is in early development, and the API will change between releases. Bug reports and contributions are welcome.

Quickstart

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

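// Both resources are closed automatically when the block exits.
try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {

    // Expose the Parquet file to SQL as the table "orders".
    ctx.registerParquet("orders", "/path/to/orders.parquet");

    // collect(allocator) returns the query results as an ArrowReader.
    try (DataFrame df = ctx.sql(
            "SELECT o_orderpriority, COUNT(*) AS n " +
            "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        // Each successful loadNextBatch() loads the next batch into the reader's root.
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }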
    }
}
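
The quickstart relies on nested try-with-resources blocks to release everything it opens. As a minimal, library-free sketch of the order in which that pattern closes resources, the example below uses hypothetical stand-ins (`Tracked` and `CloseOrderSketch` are illustrative names, not DataFusion or Arrow classes) with the same variable names and nesting. Java closes resources in reverse order of declaration, so the reader is closed first and the allocator last.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: these are NOT the DataFusion or Arrow types.
final class CloseOrderSketch {

    /** Records its name in a shared log when closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    /** Mirrors the nesting of the quickstart and returns the observed close order. */
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (var allocator = new Tracked("allocator", log);
             var ctx = new Tracked("ctx", log)) {
            try (var df = new Tracked("df", log);
                 var reader = new Tracked("reader", log)) {
                // Batches would be consumed here while all four resources are open.
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints [reader, df, ctx, allocator]
        System.out.println(closeOrder());
    }
}
```

Declaring the allocator in the outermost block, as the quickstart does, means it is still open while the reader that was created from it is being closed.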

See the User Guide for installation, the DataFrame and SQL APIs, and Parquet ingestion. See the Contributor Guide for build, test, and release workflows.

Links

GitHub Repository <https://github.com/apache/datafusion-java>
Issue Tracker <https://github.com/apache/datafusion-java/issues>
Apache DataFusion <https://datafusion.apache.org/>
Code of Conduct <https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md>

Documentation

User Guide <user-guide/index>
Contributor Guide <contributor-guide/index>