---
title: Row Format
sidebar_position: 9
id: java_row_format
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0
---
Apache Fory™ provides a random-access row format that enables reading nested fields from binary data without full deserialization. This can greatly reduce overhead for large objects when only part of the data is needed, because reading one field does not require decoding the rest of the object.
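Row format is a cache-friendly, random-access binary format that supports zero-copy field access, partial deserialization, and cross-language data exchange. The following example encodes a large object, reads nested fields directly from the binary row, and deserializes only selected elements: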
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;
import org.apache.fory.format.encoder.Encoders;
import org.apache.fory.format.encoder.RowEncoder;
import org.apache.fory.format.row.binary.BinaryArray;
import org.apache.fory.format.row.binary.BinaryRow;

public class Bar {
  String f1;
  List<Long> f2;
}

public class Foo {
  int f1;
  List<Integer> f2;
  Map<String, Integer> f3;
  List<Bar> f4;
}

RowEncoder<Foo> encoder = Encoders.bean(Foo.class);

// Create a large dataset
Foo foo = new Foo();
foo.f1 = 10;
foo.f2 = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
foo.f3 = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toMap(i -> "k" + i, i -> i));
List<Bar> bars = new ArrayList<>(1_000_000);
for (int i = 0; i < 1_000_000; i++) {
  Bar bar = new Bar();
  bar.f1 = "s" + i;
  bar.f2 = LongStream.range(0, 10).boxed().collect(Collectors.toList());
  bars.add(bar);
}
foo.f4 = bars;

// Encode to row format (cross-language compatible with Python/C++)
BinaryRow binaryRow = encoder.toRow(foo);

// Zero-copy random access without full deserialization
BinaryArray f2Array = binaryRow.getArray(1);        // Access f2 list
BinaryArray f4Array = binaryRow.getArray(3);        // Access f4 list
BinaryRow bar10 = f4Array.getStruct(10);            // Access 11th Bar
long value = bar10.getArray(1).getInt64(5);         // Access 6th element of bar.f2

// Partial deserialization: only deserialize what you need
RowEncoder<Bar> barEncoder = Encoders.bean(Bar.class);
Bar bar1 = barEncoder.fromRow(f4Array.getStruct(10)); // Deserialize 11th Bar only
Bar bar2 = barEncoder.fromRow(f4Array.getStruct(20)); // Deserialize 21st Bar only

// Full deserialization when needed
Foo newFoo = encoder.fromRow(binaryRow);
```
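Accessors such as `getArray`, `getStruct`, and `getInt64` can jump straight to an element because variable-length data is located through offsets rather than by parsing everything before it. The self-contained sketch below illustrates that idea with plain `java.nio`. The layout (a count, an offset table with an end sentinel, then UTF-8 payloads) and the class name `OffsetTableSketch` are hypothetical illustrations, not Fory's actual binary format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical toy layout, NOT Fory's real format:
// [int32 count][int32 offset_0 .. offset_{n-1}][int32 end sentinel][UTF-8 payloads]
class OffsetTableSketch {

  // Encode strings so that element i can later be located with two offset reads.
  static ByteBuffer encode(List<String> values) {
    byte[][] payloads = new byte[values.size()][];
    int payloadSize = 0;
    for (int i = 0; i < values.size(); i++) {
      payloads[i] = values.get(i).getBytes(StandardCharsets.UTF_8);
      payloadSize += payloads[i].length;
    }
    int headerSize = 4 + 4 * (values.size() + 1);
    ByteBuffer buf =
        ByteBuffer.allocate(headerSize + payloadSize).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(values.size());
    int offset = headerSize;
    for (byte[] p : payloads) {
      buf.putInt(offset); // start of this element's bytes
      offset += p.length;
    }
    buf.putInt(offset); // end sentinel: length of element i = offset[i + 1] - offset[i]
    for (byte[] p : payloads) {
      buf.put(p);
    }
    return buf;
  }

  // Random access: read element i's start/end from the offset table and decode only those bytes.
  static String get(ByteBuffer buf, int i) {
    int count = buf.getInt(0);
    if (i < 0 || i >= count) {
      throw new IndexOutOfBoundsException("index " + i);
    }
    int start = buf.getInt(4 + 4 * i);
    int end = buf.getInt(4 + 4 * (i + 1));
    byte[] bytes = new byte[end - start];
    for (int k = 0; k < bytes.length; k++) {
      bytes[k] = buf.get(start + k); // absolute reads; earlier elements are never touched
    }
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ByteBuffer buf = encode(List.of("s0", "s1", "s10", "s100"));
    System.out.println(get(buf, 2)); // prints s10
  }
}
```

The cost of `get` does not depend on how many elements precede index `i`, which is the property the row format's random-access accessors rely on.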
| Feature | Description |
|---|---|
| Zero-Copy Access | Read nested fields without deserializing the entire object |
| Memory Efficiency | Memory-map large datasets directly from disk |
| Cross-Language | The same binary format is readable from Java, Python, and C++ |
| Partial Deserialization | Deserialize only the elements you need |
| High Performance | Skip parsing of unneeded data in analytics workloads |
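The memory-efficiency row relies on a row buffer being plain bytes at known offsets, so it can live in a memory-mapped file. The sketch below shows only the JDK side of that idea: it writes a file of little-endian `int64` values (a hypothetical stand-in for a persisted row buffer) and reads a single element through `FileChannel.map`, so only the touched pages need to be loaded. The class name `MmapSketch` and the file layout are assumptions, and how a mapped region is handed to Fory's `BinaryRow` is not shown here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of memory-mapped random access; not a Fory API.
class MmapSketch {

  // Write n little-endian int64 values (value i = 3 * i) to the given file.
  static Path writeLongs(Path file, int n) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(8 * n).order(ByteOrder.LITTLE_ENDIAN);
    for (int i = 0; i < n; i++) {
      buf.putLong(i * 3L);
    }
    buf.flip();
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      while (buf.hasRemaining()) {
        ch.write(buf);
      }
    }
    return file;
  }

  // Map the file read-only and read element i by offset; the rest of the file is not parsed.
  static long readLong(Path file, int i) throws IOException {
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
      MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      mapped.order(ByteOrder.LITTLE_ENDIAN);
      return mapped.getLong(8 * i);
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("rows", ".bin");
    file.toFile().deleteOnExit();
    writeLongs(file, 1_000_000);
    System.out.println(readLong(file, 123_456)); // prints 370368
  }
}
```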
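Row format is ideal for large objects that are read selectively, such as analytics workloads that touch only a few fields, or large datasets memory-mapped from disk.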
Row format works across languages: the same binary data can be accessed from Python and C++ as well as Java. The examples below define the same `Foo` and `Bar` schema in each language:
```python
from dataclasses import dataclass
from typing import Dict, List

import pyarrow as pa
import pyfory


@dataclass
class Bar:
    f1: str
    f2: List[pa.int64]


@dataclass
class Foo:
    f1: pa.int32
    f2: List[pa.int32]
    f3: Dict[str, pa.int32]
    f4: List[Bar]


encoder = pyfory.encoder(Foo)

# Create a large dataset
foo = Foo(
    f1=10,
    f2=list(range(1_000_000)),
    f3={f"k{i}": i for i in range(1_000_000)},
    f4=[Bar(f1=f"s{i}", f2=list(range(10))) for i in range(1_000_000)],
)
binary: bytes = encoder.to_row(foo).to_bytes()

# Zero-copy access
foo_row = pyfory.RowData(encoder.schema, binary)
print(foo_row.f2[100000])
print(foo_row.f4[100000].f1)
```
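```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

#include "fory/encoder/row_encoder.h"
#include "fory/row/writer.h"

struct Bar {
  std::string f1;
  std::vector<int64_t> f2;
};

FORY_FIELD_INFO(Bar, f1, f2);

struct Foo {
  int32_t f1;
  std::vector<int32_t> f2;
  std::map<std::string, int32_t> f3;
  std::vector<Bar> f4;
};

FORY_FIELD_INFO(Foo, f1, f2, f3, f4);

int main() {
  // Create a large dataset
  Foo foo;
  foo.f1 = 10;
  for (int32_t i = 0; i < 1000000; i++) {
    foo.f2.push_back(i);
    foo.f3["k" + std::to_string(i)] = i;
    foo.f4.push_back(Bar{"s" + std::to_string(i), {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}});
  }

  fory::encoder::RowEncoder<Foo> encoder;
  encoder.Encode(foo);
  auto row = encoder.GetWriter().ToRow();

  // Zero-copy random access
  auto f2_array = row->GetArray(1);
  auto f4_array = row->GetArray(3);
  auto bar10 = f4_array->GetStruct(10);
  int64_t value = bar10->GetArray(1)->GetInt64(5);
  std::string str = bar10->GetString(0);
  return 0;
}
```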
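Cross-language access works because a row layout is defined at the byte level (field positions and byte order) rather than in terms of Java, Python, or C++ objects. The hypothetical sketch below makes that concrete with a two-field record stored in 8-byte little-endian slots: any reader that knows the slot positions and byte order recovers the values, while a reader that assumes the other byte order does not. The slot layout and the class name `ByteOrderSketch` are assumptions for illustration, not Fory's specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical record {int32 f1; int64 f2} in two 8-byte little-endian slots; not Fory's format.
class ByteOrderSketch {
  static final int SLOT = 8;

  static byte[] write(int f1, long f2) {
    ByteBuffer buf = ByteBuffer.allocate(2 * SLOT).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(0, f1);     // slot 0; its upper 4 bytes stay zero
    buf.putLong(SLOT, f2); // slot 1
    return buf.array();
  }

  // A reader needs only the bytes, the slot positions, and the byte order.
  static int readF1(byte[] bytes, ByteOrder order) {
    return ByteBuffer.wrap(bytes).order(order).getInt(0);
  }

  static long readF2(byte[] bytes, ByteOrder order) {
    return ByteBuffer.wrap(bytes).order(order).getLong(SLOT);
  }

  public static void main(String[] args) {
    byte[] sent = write(10, 42L);
    // Copying the bytes stands in for shipping them to another process or language runtime.
    byte[] received = Arrays.copyOf(sent, sent.length);
    System.out.println(readF2(received, ByteOrder.LITTLE_ENDIAN)); // 42
    System.out.println(readF2(received, ByteOrder.BIG_ENDIAN));    // not 42: wrong byte order
  }
}
```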
| Operation | Object Format | Row Format |
|---|---|---|
| Object allocation | Deserialization allocates all objects | No allocation for in-place field reads |
| Single field access | Full deserialization required | Direct offset read |
| Memory usage | Full object graph in memory | Only accessed fields are materialized as objects |
| Suitable for | Small objects, full access | Large objects, selective access |
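The table contrasts how a single value is reached in each format. The hypothetical sketch below shows both strategies on the same buffer of fixed-width `int64` values: decoding everything into a `List<Long>` before reading one element, versus computing the element's offset and reading it in place. It uses plain `java.nio` and a made-up layout (`int32` count followed by the values); the class name `AccessCostSketch` is illustrative, and this is not a benchmark of Fory itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout: [int32 count][int64 value_0 .. value_{n-1}], little-endian.
class AccessCostSketch {

  static ByteBuffer encode(long[] values) {
    ByteBuffer buf = ByteBuffer.allocate(4 + 8 * values.length).order(ByteOrder.LITTLE_ENDIAN);
    buf.putInt(values.length);
    for (long v : values) {
      buf.putLong(v);
    }
    return buf;
  }

  // Object-format style: materialize every element (boxing all n values) before reading one.
  static long fullDecodeThenGet(ByteBuffer buf, int i) {
    int n = buf.getInt(0);
    List<Long> decoded = new ArrayList<>(n);
    for (int k = 0; k < n; k++) {
      decoded.add(buf.getLong(4 + 8 * k));
    }
    return decoded.get(i);
  }

  // Row-format style: compute the element's offset and read it in place, with no allocation.
  static long directGet(ByteBuffer buf, int i) {
    return buf.getLong(4 + 8 * i);
  }

  public static void main(String[] args) {
    long[] values = new long[1_000_000];
    for (int k = 0; k < values.length; k++) {
      values[k] = k * 2L;
    }
    ByteBuffer buf = encode(values);
    System.out.println(fullDecodeThenGet(buf, 999_999) == directGet(buf, 999_999)); // true
  }
}
```

Both methods return the same value; the difference is that `fullDecodeThenGet` does work proportional to the whole array, while `directGet` does a constant amount of work per read.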