SessionContext.registerTable(name, provider) registers a Java-implemented table. SQL queries that reference name call back into your TableProvider to fetch batches. Data flows from Java to native code via the Arrow C Data Interface, so there are no extra copies in the hot path. This is the Java counterpart to DataFusion's Rust SessionContext::register_table.
import org.apache.arrow.memory.BufferAllocator; import org.apache.arrow.vector.ipc.ArrowReader; import org.apache.arrow.vector.types.pojo.Schema; import org.apache.datafusion.TableProvider; public final class MyTable implements TableProvider { private final Schema schema; public MyTable(Schema schema) { this.schema = schema; } @Override public Schema schema() { return schema; } @Override public ArrowReader scan(BufferAllocator allocator) { // Return a fresh ArrowReader. The reader must allocate its buffers // from `allocator` (or a child of it) — the framework needs the // allocator hierarchy to share a root. return openMyReader(allocator); } }
For the common case of “I have a schema and a function that returns an ArrowReader,” SimpleTableProvider packages those two into a ready-made TableProvider without having to subclass:
TableProvider t = new SimpleTableProvider(mySchema(), allocator -> openMyReader(allocator)); ctx.registerTable("t", t);
try (SessionContext ctx = new SessionContext(); BufferAllocator allocator = new RootAllocator()) { ctx.registerTable("t", new MyTable(mySchema())); try (DataFrame df = ctx.sql("SELECT * FROM t WHERE x > 10"); ArrowReader r = df.collect(allocator)) { while (r.loadNextBatch()) { // ... } } }
schema() is called exactly once, on the caller's thread, at registration time. Throwing from it aborts registration with the original exception.scan(allocator) is called once per SQL query that touches the table, on a worker thread. It must return a fresh, independent ArrowReader on every call — this is what makes self-joins and UNION ALL over the same table work.scan must allocate its buffers from the supplied allocator (or a child of it). Arrow Java‘s Data.exportArrayStream requires the reader’s allocator and the export allocator to share a root.schema(). A mismatch fails the query.Exceptions thrown from scan() or from the returned reader surface in the RuntimeException raised by collect(). The error message includes the Java exception class and getMessage(), in the same format used for scalar UDF errors.
SessionContext is single-threaded, but scan(allocator) may be invoked from any DataFusion worker thread. If your implementation maintains mutable state across scans, synchronise it.
deregisterTable. Tables live until the SessionContext is closed.