File | Description |
---|---|
delta_byte_array.parquet | string columns with DELTA_BYTE_ARRAY encoding. See delta_byte_array.md for details. |
delta_length_byte_array.parquet | string columns with DELTA_LENGTH_BYTE_ARRAY encoding. |
delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See delta_binary_packed.md for details. |
delta_encoding_required_column.parquet | required INT32 and STRING columns with delta encoding. See delta_encoding_required_column.md for details. |
delta_encoding_optional_column.parquet | optional INT64 and STRING columns with delta encoding. See delta_encoding_optional_column.md for details. |
nested_structs.rust.parquet | Used to test that the Rust Arrow reader can look up the correct field from a nested struct. See ARROW-11452 |
data_index_bloom_encoding_stats.parquet | optional STRING column. Contains optional metadata: bloom filters, column index, offset index and encoding stats. |
null_list.parquet | an empty list. Generated from the JSON {"emptylist":[]} to test correct read/write behaviour of this base case. |
alltypes_tiny_pages.parquet | small page sizes with dictionary encoding and a page index, from Impala. |
alltypes_tiny_pages_plain.parquet | small page sizes with plain encoding and a page index, from Impala. |
rle_boolean_encoding.parquet | optional boolean columns with RLE encoding |
fixed_length_byte_array.parquet | optional FIXED_LENGTH_BYTE_ARRAY column with page index. See fixed_length_byte_array.md for details. |
datapage_v1-uncompressed-checksum.parquet | uncompressed INT32 columns in v1 data pages with a matching CRC |
datapage_v1-snappy-compressed-checksum.parquet | Snappy-compressed INT32 columns in v1 data pages with a matching CRC |
datapage_v1-corrupt-checksum.parquet | uncompressed INT32 columns in v1 data pages with a mismatching CRC |
TODO: Document what each file is in the table above.
Test files with the .parquet.encrypted suffix are encrypted using Parquet Modular Encryption.
A detailed description of the Parquet Modular Encryption specification can be found here:
https://github.com/apache/parquet-format/blob/encryption/Encryption.md
The following keys and key ids (when using key_retriever) are used to encrypt the encrypted columns and footer in all the encrypted files:
The following files are encrypted with AAD prefix “tester”:
Sample code that reads and checks these files can be found in the following tests:
cpp/src/parquet/encryption-read-configurations-test.cc cpp/src/parquet/test-encryption-util.h
The schema for the `datapage_v1-*-checksum.parquet` test files is:
message m { required int32 a; required int32 b; }
The detailed structure for these files is as follows:
`data/datapage_v1-uncompressed-checksum.parquet`:
[ Column "a" [ Page 0 [correct crc] | Uncompressed Contents ][ Page 1 [correct crc] | Uncompressed Contents ]] [ Column "b" [ Page 0 [correct crc] | Uncompressed Contents ][ Page 1 [correct crc] | Uncompressed Contents ]]
`data/datapage_v1-snappy-compressed-checksum.parquet`:
[ Column "a" [ Page 0 [correct crc] | Snappy Contents ][ Page 1 [correct crc] | Snappy Contents ]] [ Column "b" [ Page 0 [correct crc] | Snappy Contents ][ Page 1 [correct crc] | Snappy Contents ]]
`data/datapage_v1-corrupt-checksum.parquet`:
[ Column "a" [ Page 0 [bad crc] | Uncompressed Contents ][ Page 1 [correct crc] | Uncompressed Contents ]] [ Column "b" [ Page 0 [correct crc] | Uncompressed Contents ][ Page 1 [bad crc] | Uncompressed Contents ]]
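The per-page CRC check that these files exercise can be sketched in a few lines. This is a minimal illustration, not the code of any particular reader: it assumes the page CRC is the standard CRC-32 (the gzip polynomial) computed over the page bytes as written to disk, excluding the page header, and stored as a signed 32-bit integer. The helper names `page_crc` and `crc_matches`, the payload bytes, and the `layout` dictionary are hypothetical; the dictionary only mirrors the good/bad pattern of the corrupt file described above and does not read any real file.

```python
import zlib


def page_crc(page_bytes: bytes) -> int:
    """CRC-32 over the page bytes as written (header excluded), as unsigned."""
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def crc_matches(page_bytes: bytes, stored_crc: int) -> bool:
    """Compare against a stored CRC that may be a signed 32-bit value."""
    return page_crc(page_bytes) == (stored_crc & 0xFFFFFFFF)


# Made-up page contents standing in for one uncompressed data page.
payload = bytes(range(16))
good = page_crc(payload)
bad = good ^ 0x1  # flip one bit to simulate a mismatching CRC

# Hypothetical model of datapage_v1-corrupt-checksum.parquet:
# (column, page index) -> stored CRC
layout = {
    ("a", 0): bad,
    ("a", 1): good,
    ("b", 0): good,
    ("b", 1): bad,
}

results = {key: crc_matches(payload, crc) for key, crc in layout.items()}
for (column, page), ok in results.items():
    print(f"column {column!r} page {page}: {'ok' if ok else 'CRC mismatch'}")
```

For the Snappy-compressed file, the same check would be applied to the compressed bytes rather than the decompressed contents, under the assumption stated above that the CRC covers the page as serialized.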