Doris Storage Format V3 is a major evolution from the Segment V2 format. Through metadata decoupling and encoding strategy optimization, it specifically improves performance for wide tables, complex data types (such as Variant), and cloud-native storage-compute separation scenarios.
In Segment V2, the metadata for every column (ColumnMetaPB) is stored in the Footer of the Segment file. For wide tables with thousands of columns or auto-scaling Variant scenarios, the Footer can grow to several megabytes.
V3 separates ColumnMetaPB from the Footer and stores it in a separate area within the file (the External Column Meta Area).
For numerical types (such as INT and BIGINT), V3 uses PLAIN_ENCODING (raw binary storage) instead of the traditional BitShuffle.
PLAIN_ENCODING provides higher read throughput and lower CPU overhead. In modern high-speed IO environments, this “trading decompression for performance” strategy offers a clear advantage when scanning large volumes of data.
V3 introduces BINARY_PLAIN_ENCODING_V2, which uses a [length(varuint)][raw_data] streaming layout and replaces the old format that relied on trailing offset tables.
The design philosophy of V3 can be summarized as “Metadata Decoupling, Encoding Simplification, and Streaming Layout”. By reducing metadata processing bottlenecks and exploiting how efficiently modern CPUs process simple encodings, it achieves high-performance analysis under complex schemas.
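The [length(varuint)][raw_data] streaming layout can be illustrated with a small sketch. This is not the Doris implementation and not its exact byte format: the LEB128-style varint below is an assumption about how the length prefix might be encoded, and the function names encode_varuint, encode_stream, and decode_stream are hypothetical. It only shows why a length-prefixed stream can be read in a single forward pass without a trailing offset table.

```python
from typing import Iterator, List


def encode_varuint(n: int) -> bytes:
    """Encode a non-negative int as a LEB128-style varint (assumed scheme)."""
    if n < 0:
        raise ValueError("varuint cannot be negative")
    out = bytearray()
    while True:
        byte = n & 0x7F          # low 7 bits carry the payload
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def encode_stream(values: List[bytes]) -> bytes:
    """Lay out values as [length(varuint)][raw_data] pairs, back to back."""
    out = bytearray()
    for value in values:
        out += encode_varuint(len(value))
        out += value
    return bytes(out)


def decode_stream(buf: bytes) -> Iterator[bytes]:
    """Yield each value in one forward pass over the buffer."""
    pos = 0
    while pos < len(buf):
        # Read the varuint length prefix.
        length = 0
        shift = 0
        while True:
            if pos >= len(buf):
                raise ValueError("truncated length prefix")
            byte = buf[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        # Read exactly `length` bytes of raw data.
        end = pos + length
        if end > len(buf):
            raise ValueError("truncated value")
        yield buf[pos:end]
        pos = end
```

Calling list(decode_stream(encode_stream(values))) returns the original list of byte strings. Because every value carries its own length, a reader can start producing values as soon as the first bytes arrive; a layout with a trailing offset table generally needs the end of the block before the first value can be located.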
Typical use cases include tables that make heavy use of VARIANT or JSON types.
To enable V3 for a new table, specify storage_format as V3 in the PROPERTIES of the CREATE TABLE statement:
CREATE TABLE table_v3 (
    id BIGINT,
    data VARIANT
)
DISTRIBUTED BY HASH(id) BUCKETS 32
PROPERTIES (
    "storage_format" = "V3"
);