DataFusion Java reads Parquet through two entry points on SessionContext: registerParquet to expose a file as a named table, and readParquet to get a DataFrame directly.
    ctx.registerParquet("orders", "/path/to/orders.parquet");
    try (DataFrame df = ctx.sql("SELECT * FROM orders LIMIT 10")) {
        df.show();
    }
The file's footer is read at registration time. The table remains in the catalog for the lifetime of the SessionContext.
    try (DataFrame df = ctx.readParquet("/path/to/orders.parquet")) {
        df.show();
    }
readParquet skips the catalog and hands back a DataFrame straight away.
Both entry points accept a ParquetReadOptions to tune the underlying read. Construct one directly and chain setters:
    ParquetReadOptions opts = new ParquetReadOptions()
        .fileExtension(".parquet");

    ctx.registerParquet("orders", "/path/to/orders.parquet", opts);

    // or
    try (DataFrame df = ctx.readParquet("/path/to/orders.parquet", opts)) {
        df.show();
    }
The supported setters track what DataFusion exposes on its Rust ParquetReadOptions builder. Inspect the class on the Java side for the exact setters available in the version you are using.
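One way to inspect the class is plain Java reflection. The sketch below lists the public instance methods that take at least one argument and return the class's own type, which is the shape a chainable setter such as `fileExtension(...)` has. It assumes the options class follows that fluent convention throughout. `DemoOptions` is a hypothetical stand-in so the example runs without the library on the classpath; its `flag` setter is invented for illustration and is not a real DataFusion option.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

public class FluentSetterLister {

    // Hypothetical stand-in for an options class; NOT the real ParquetReadOptions.
    public static class DemoOptions {
        private String fileExtension = ".parquet";
        private boolean flag;

        // Chainable setters: each one returns `this`.
        public DemoOptions fileExtension(String ext) { this.fileExtension = ext; return this; }
        public DemoOptions flag(boolean value) { this.flag = value; return this; }

        // Plain getter: takes no arguments and returns String, so it is not listed.
        public String fileExtension() { return fileExtension; }
    }

    // Collects the names of public, non-static methods declared on `type`
    // that take at least one parameter and return `type` itself.
    public static Set<String> fluentSetters(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            int mods = m.getModifiers();
            boolean publicInstance = Modifier.isPublic(mods) && !Modifier.isStatic(mods);
            if (publicInstance
                    && !m.isSynthetic()
                    && m.getParameterCount() > 0
                    && m.getReturnType() == type) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // In a real project, pass ParquetReadOptions.class here instead.
        System.out.println(fluentSetters(DemoOptions.class));
    }
}
```

Swapping `DemoOptions.class` for `ParquetReadOptions.class` should give the setter names for whichever version of the bindings is on the classpath, provided the setters are declared on that class itself rather than inherited.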