build: Cargo workspace + native-common extraction (1/6) (#104)
33 files changed
tree: a560e58b7d969e8a0c06c6b9d76ad2a2e25f77e7
  1. .cargo/
  2. .github/
  3. .mvn/
  4. core/
  5. dev/
  6. docs/
  7. examples/
  8. native/
  9. native-common/
  10. proto/
  11. .asf.yaml
  12. .gitignore
  13. Cargo.lock
  14. Cargo.toml
  15. CONTRIBUTING.md
  16. LICENSE.txt
  17. Makefile
  18. mvnw
  19. mvnw.cmd
  20. NOTICE.txt
  21. pom.xml
  22. README.md
README.md

Apache DataFusion Java

Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.

This project is in early development, and the API will change between releases. Bug reports and contributions are welcome.

Install

Released to Maven Central. The JAR bundles the native library for Linux and macOS on x86_64 and aarch64. Windows users need to build from source.

Maven:

<dependency>
    <groupId>org.apache.datafusion</groupId>
    <artifactId>datafusion-java</artifactId>
    <version>0.1.0</version>
</dependency>

Gradle:

implementation("org.apache.datafusion:datafusion-java:0.1.0")

Arrow requires the flag --add-opens=java.base/java.nio=ALL-UNNAMED on the JVM command line. See the installation guide for details and for instructions on building from source.
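The two install prerequisites above (a bundled native library for the current platform, and java.nio being opened to the application) can be checked at startup. The following is a minimal sketch using only the JDK. The class name Preflight and its methods are hypothetical and are not part of datafusion-java, and the platform check only mirrors the list in this README rather than the library's actual loading logic.

```java
import java.util.Locale;

/** Hypothetical startup check; not part of datafusion-java. */
final class Preflight {
    private Preflight() {}

    /** True if this README lists a bundled native library for the given OS/arch pair. */
    static boolean hasBundledNative(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean supportedOs = os.contains("linux") || os.contains("mac");
        // The JVM commonly reports x86_64 as "amd64" in the os.arch property.
        boolean supportedArch = arch.equals("amd64") || arch.equals("x86_64")
                || arch.equals("aarch64");
        return supportedOs && supportedArch;
    }

    /**
     * True if java.base opens java.nio to this class's module. When the class
     * is loaded from the classpath, that is the unnamed module, which is what
     * --add-opens=java.base/java.nio=ALL-UNNAMED targets.
     */
    static boolean javaNioIsOpen() {
        Module javaBase = Object.class.getModule();
        return javaBase.isOpen("java.nio", Preflight.class.getModule());
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String arch = System.getProperty("os.arch");
        if (!hasBundledNative(os, arch)) {
            System.err.println("No bundled native library listed for " + os + "/" + arch
                    + "; a build from source may be needed.");
        }
        if (!javaNioIsOpen()) {
            System.err.println("java.nio is not opened; start the JVM with "
                    + "--add-opens=java.base/java.nio=ALL-UNNAMED");
        }
    }
}
```

Running this before creating a SessionContext turns a late reflection or library-loading failure into an early, readable message.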

Quickstart

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {

    ctx.registerParquet("orders", "/path/to/orders.parquet");

    try (DataFrame df = ctx.sql(
            "SELECT o_orderpriority, COUNT(*) AS n " +
            "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}

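SessionContext and DataFrame both implement AutoCloseable, so close them when you are done (try-with-resources works, as in the example above). Neither class is thread-safe.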
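Because these objects are not thread-safe, a multi-threaded application needs some way to keep each instance on one thread. One common pattern, sketched below, is thread confinement: a wrapper creates, uses, and closes the resource on a single dedicated thread. The Confined class and its Op interface are hypothetical helpers written for this illustration. The README does not prescribe this approach, and the sketch uses only the JDK so that it works with any AutoCloseable.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical helper: confines a non-thread-safe resource to one owner thread. */
final class Confined<T extends AutoCloseable> implements AutoCloseable {

    /** An operation on the resource that may throw a checked exception. */
    interface Op<T, R> {
        R run(T resource) throws Exception;
    }

    private final ExecutorService owner = Executors.newSingleThreadExecutor();
    private final T resource;

    Confined(Callable<T> factory) throws Exception {
        T created;
        try {
            // Create the resource on the owner thread as well.
            created = call(factory);
        } catch (Exception e) {
            owner.shutdown();
            throw e;
        }
        this.resource = created;
    }

    /** Runs op on the owner thread and blocks the caller until it finishes. */
    <R> R apply(Op<T, R> op) throws Exception {
        return call(() -> op.run(resource));
    }

    /** Closes the resource on the owner thread, then stops that thread. */
    @Override
    public void close() throws Exception {
        try {
            call(() -> {
                resource.close();
                return null;
            });
        } finally {
            owner.shutdown();
        }
    }

    private <R> R call(Callable<R> task) throws Exception {
        try {
            return owner.submit(task).get();
        } catch (ExecutionException e) {
            // Rethrow the task's own exception where possible.
            if (e.getCause() instanceof Exception) {
                throw (Exception) e.getCause();
            }
            throw e;
        }
    }
}
```

With this library, the factory would create the SessionContext and each apply call would run one query to completion, assuming the results are fully consumed inside the operation so that no DataFrame or reader escapes to another thread.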

Documentation

The full documentation lives under docs/source/ and is built with Sphinx (see docs/README.md for the build steps):

  • User guide — installation, the DataFrame and SQL APIs, Parquet ingestion.
  • Contributor guide — build, test, code style, and how to bump the DataFusion version.

Requirements

JDK 17+. Building from source: see docs/source/contributor-guide/development.md.

Contributing

Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.

License

Apache License 2.0. See LICENSE.txt and NOTICE.txt.