title: "FileFormat"
weight: 7
type: docs
aliases:
Paimon currently supports the Parquet, Avro, ORC, CSV, JSON, and Lance file formats.
Parquet is the default file format for Paimon.
The following table lists the type mapping from Paimon type to Parquet type.
Limitations:
The following table lists the type mapping from Paimon type to Avro type.
Note:
In addition to the types listed above, Paimon maps nullable types to Avro `union(something, null)`, where `something` is the Avro type converted from the Paimon type.
You can refer to Avro Specification for more information about Avro types.
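As an illustration of the nullable mapping described in the note, the following minimal Python sketch builds an Avro schema in its JSON form, where a union is written as a JSON array. The helper names (`nullable_avro_type`, `avro_field`) and the record name `ExampleRow` are hypothetical and not part of Paimon. The branch order follows the wording of the note; the schema Paimon actually writes may order the branches differently.

```python
import json


def nullable_avro_type(avro_type):
    """Wrap an Avro type in a union with "null" (a JSON array in Avro schemas)."""
    return [avro_type, "null"]


def avro_field(name, avro_type, nullable):
    # Hypothetical helper: builds one field entry of an Avro record schema.
    field_type = nullable_avro_type(avro_type) if nullable else avro_type
    return {"name": name, "type": field_type}


schema = {
    "type": "record",
    "name": "ExampleRow",  # hypothetical record name
    "fields": [
        avro_field("id", "long", nullable=False),     # non-nullable: plain type
        avro_field("name", "string", nullable=True),  # nullable: union with null
    ],
}

print(json.dumps(schema, indent=2))
```

Only the nullable field is wrapped in a union; the non-nullable field keeps its plain Avro type.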
The following table lists the type mapping from Paimon type to ORC type.
Limitations:
- `TIMESTAMP_LOCAL_ZONE` type: the millis value corresponding to the UTC literal time is saved. Due to compatibility issues, this behavior cannot be modified.

Experimental feature, not recommended for production.
Format Options:
The Paimon CSV format uses the Jackson databind API to parse and generate CSV strings.
The following table lists the type mapping from Paimon type to CSV type.
Experimental feature, not recommended for production.
Format Options:
The Paimon text table contains only one field, and it is of string type.
Experimental feature, not recommended for production.
Format Options:
The Paimon JSON format uses the Jackson databind API to parse and generate JSON strings.
The following table lists the type mapping from Paimon type to JSON type.
Lance is a modern columnar data format optimized for machine learning and vector search workloads. It provides high-performance read and write operations with native support for Apache Arrow.
The following table lists the type mapping from Paimon type to Lance (Arrow) type.
Limitations:
- `MAP` type.
- `TIMESTAMP_LOCAL_ZONE` type.

The BLOB format is a specialized format for storing large binary objects such as images, videos, and other multimodal data. Unlike other formats that store data inline, BLOB format stores large binary data in separate files with an optimized layout for random access.
BLOB files use the .blob extension and have the following structure:

    +------------------+
    | Blob Entry 1     |
    |   Magic Number   |  4 bytes (1481511375, Little Endian)
    |   Blob Data      |  Variable length
    |   Length         |  8 bytes (Little Endian)
    |   CRC32          |  4 bytes (Little Endian)
    +------------------+
    | Blob Entry 2     |
    |   ...            |
    +------------------+
    | Index            |  Variable (Delta-Varint compressed)
    +------------------+
    | Index Length     |  4 bytes (Little Endian)
    | Version          |  1 byte
    +------------------+
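To make the byte layout above concrete, here is a minimal Python sketch that packs one blob entry and locates the index by reading the fixed-size tail of the file. It is not Paimon's implementation. The function names are hypothetical, the index is treated as opaque bytes (Delta-Varint decoding is not shown), and the layout does not state which bytes `Length` and `CRC32` cover, so the sketch takes those two values as parameters instead of deciding.

```python
import struct
import zlib

MAGIC = 1481511375  # magic number given in the layout


def pack_entry(data: bytes, length_value: int, crc_value: int) -> bytes:
    """Lay out one entry: magic, data, 8-byte length, 4-byte CRC32 (little endian)."""
    return (
        struct.pack("<I", MAGIC)
        + data
        + struct.pack("<Q", length_value)
        + struct.pack("<I", crc_value)
    )


def unpack_entry(entry: bytes):
    """Split one entry back into (magic, data, length_value, crc_value)."""
    (magic,) = struct.unpack("<I", entry[:4])
    length_value, crc_value = struct.unpack("<QI", entry[-12:])
    return magic, entry[4:-12], length_value, crc_value


def split_footer(file_bytes: bytes):
    """Return (index_bytes, version) using the last 5 bytes of the file."""
    # Tail: 4-byte little-endian index length, then a 1-byte version.
    index_length, version = struct.unpack("<IB", file_bytes[-5:])
    index_end = len(file_bytes) - 5
    return file_bytes[index_end - index_length:index_end], version


data = b"hello"
# Illustrative values only: what Length and CRC32 cover is an assumption here.
entry = pack_entry(data, length_value=len(data), crc_value=zlib.crc32(data))
placeholder_index = b"\x00"  # stand-in; a real index is Delta-Varint compressed
# Version 1 is a hypothetical value chosen for the example.
blob_file = entry + placeholder_index + struct.pack("<IB", len(placeholder_index), 1)

index_bytes, version = split_footer(blob_file)
magic, payload, length_value, crc_value = unpack_entry(entry)
```

Because the index length and version sit at a fixed offset from the end of the file, a reader can find the index without scanning the blob entries, which fits the random-access goal described above.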
Key features:
Limitations:
For usage details, configuration options, and examples, see [Blob Type]({{< ref "append-table/blob" >}}).