
Apache DataFusion Java

Java bindings for Apache DataFusion. Queries run in native Rust and results return to the JVM as Apache Arrow batches via the Arrow C Data Interface.

Early development: no releases yet, API will change. Bug reports and contributions welcome.

Quickstart

import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.datafusion.DataFrame;
import org.apache.datafusion.SessionContext;

try (var allocator = new RootAllocator();
     var ctx = new SessionContext()) {

    ctx.registerParquet("orders", "/path/to/orders.parquet");

    try (DataFrame df = ctx.sql(
            "SELECT o_orderpriority, COUNT(*) AS n " +
            "FROM orders GROUP BY o_orderpriority");
         ArrowReader reader = df.collect(allocator)) {
        while (reader.loadNextBatch()) {
            var batch = reader.getVectorSchemaRoot();
            // ...
        }
    }
}
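The nested try-with-resources blocks in the Quickstart rely on a standard Java guarantee: resources are closed in the reverse of their declaration order, and an inner block finishes before the outer one. The sketch below demonstrates that ordering with a hypothetical `Tracked` stand-in class; it does not use the real `SessionContext`, `DataFrame`, `ArrowReader`, or `RootAllocator` types, and the names in the log are only labels.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {

    /** Hypothetical stand-in for a closeable resource; records its name when closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add(name);
        }
    }

    /** Mirrors the Quickstart's nesting and returns the order in which resources were closed. */
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        try (Tracked allocator = new Tracked("allocator", log);
             Tracked context = new Tracked("context", log)) {
            try (Tracked dataFrame = new Tracked("dataFrame", log);
                 Tracked reader = new Tracked("reader", log)) {
                // Batches would be consumed here.
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Prints: [reader, dataFrame, context, allocator]
        System.out.println(closeOrder());
    }
}
```

With the Quickstart's layout, this means the reader is released first and the allocator last.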

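SessionContext and DataFrame both implement AutoCloseable, so close them when you are done; try-with-resources, as in the Quickstart, does this automatically. Neither class is thread-safe, so do not share an instance across threads without external synchronization.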
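One common way to work with a non-thread-safe object is to confine it to a single thread: create, use, and close it inside a task on a single-threaded executor, and hand back only the results. The following is a minimal sketch of that pattern, not a recommendation from this project. `FakeSession` is a hypothetical stand-in that throws if it is touched from a thread other than its creator; it is not the real `SessionContext`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConfinedSessionDemo {

    /** Hypothetical stand-in for a non-thread-safe session; rejects use from other threads. */
    static final class FakeSession implements AutoCloseable {
        private final Thread owner = Thread.currentThread();
        private int queries;

        int query(String sql) {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("session used off its owner thread");
            }
            return ++queries; // number of queries run so far
        }

        @Override
        public void close() {
            // A real session would release native resources here.
        }
    }

    /** Runs n queries on one dedicated worker thread and returns the final query count. */
    static int runConfined(int n) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // The session is created, used, and closed entirely on the worker thread.
            Future<Integer> result = worker.submit(() -> {
                try (FakeSession session = new FakeSession()) {
                    int last = 0;
                    for (int i = 0; i < n; i++) {
                        last = session.query("SELECT " + i);
                    }
                    return last;
                }
            });
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runConfined(3)); // prints 3
    }
}
```

Only the plain `int` result crosses the thread boundary; the session itself never leaves the worker thread.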

Documentation

The full documentation lives under docs/source/ and is built with Sphinx (see docs/README.md for the build steps):

  • User guide — installation, the DataFrame and SQL APIs, Parquet ingestion, project status.
  • Contributor guide — build, test, code style, and how to bump the DataFusion version.

Requirements

JDK 17+. Building from source: see docs/source/contributor-guide/development.md.

Contributing

Open an issue to discuss non-trivial changes before sending a PR. See the contributor guide.

License

Apache License 2.0. See LICENSE.txt and NOTICE.txt.