  1. alltypes_dictionary.parquet
  2. alltypes_plain.parquet
  3. alltypes_plain.snappy.parquet
  4. binary.parquet
  5. bloom_filter.bin
  6. byte_array_decimal.parquet
  7. datapage_v2.snappy.parquet
  8. delta_binary_packed.md
  9. delta_binary_packed.parquet
  10. delta_binary_packed_expect.csv
  11. dict-page-offset-zero.parquet
  12. encrypt_columns_and_footer.parquet.encrypted
  13. encrypt_columns_and_footer_aad.parquet.encrypted
  14. encrypt_columns_and_footer_ctr.parquet.encrypted
  15. encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
  16. encrypt_columns_plaintext_footer.parquet.encrypted
  17. fixed_length_decimal.parquet
  18. fixed_length_decimal_legacy.parquet
  19. hadoop_lz4_compressed.parquet
  20. hadoop_lz4_compressed_larger.parquet
  21. int32_decimal.parquet
  22. int64_decimal.parquet
  23. list_columns.parquet
  24. lz4_raw_compressed.parquet
  25. lz4_raw_compressed_larger.parquet
  26. nation.dict-malformed.parquet
  27. nested_lists.snappy.parquet
  28. nested_maps.snappy.parquet
  29. nested_structs.rust.parquet
  30. non_hadoop_lz4_compressed.parquet
  31. nonnullable.impala.parquet
  32. nullable.impala.parquet
  33. nulls.snappy.parquet
  34. README.md
  35. repeated_no_annotation.parquet
  36. single_nan.parquet
  37. uniform_encryption.parquet.encrypted
data/README.md

Test data files for Parquet compatibility and regression testing

| File | Description |
|---|---|
| delta_binary_packed.parquet | INT32 and INT64 columns with DELTA_BINARY_PACKED encoding. See delta_binary_packed.md for details. |
| nested_structs.rust.parquet | Used to test that the Rust Arrow reader can look up the correct field from a nested struct. See ARROW-11452. |

TODO: Document what each file is in the table above.

Encrypted Files

Test files with the .parquet.encrypted suffix are encrypted using Parquet Modular Encryption.

A detailed description of the Parquet Modular Encryption specification can be found here:

 https://github.com/apache/parquet-format/blob/encryption/Encryption.md

The following keys and key ids (the latter apply when using key_retriever) were used to encrypt the columns and footer in all of the encrypted files:

  • Encrypted/Signed Footer:
    • key: {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5}
    • key_id: “kf”
  • Encrypted column named double_field:
    • key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,0}
    • key_id: “kc1”
  • Encrypted column named float_field:
    • key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,1}
    • key_id: “kc2”
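
The key list above can be written down as a small lookup table, which may be handy when wiring up a key retriever in a test. The following is a minimal Python sketch, not taken from any Parquet library: the names `digits_to_key`, `KEYS_BY_ID`, `COLUMN_KEY_IDS`, and `retrieve_key` are hypothetical. This README does not say whether the listed digits are ASCII characters or raw byte values; the sketch assumes ASCII digits by default and offers the raw-byte interpretation as an option. Either way each key comes out at 16 bytes (128 bits).

```python
# Hypothetical helper names; the key digits and key ids are copied from the list above.

def digits_to_key(digits, ascii_digits=True):
    """Turn a list of digits into key bytes.

    Assumption: with ascii_digits=True each digit becomes one ASCII
    character ('0'..'9'); with ascii_digits=False each digit is used
    as a raw byte value (0x00..0x09).
    """
    if ascii_digits:
        return "".join(str(d) for d in digits).encode("ascii")
    return bytes(digits)


# key_id -> key bytes, as a key retriever would look them up.
KEYS_BY_ID = {
    "kf": digits_to_key([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5]),
    "kc1": digits_to_key([1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 0]),
    "kc2": digits_to_key([1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 1]),
}

# Encrypted column name -> key_id of the key that protects it.
COLUMN_KEY_IDS = {
    "double_field": "kc1",
    "float_field": "kc2",
}


def retrieve_key(key_id):
    """Return the key bytes for a key id; raises KeyError for unknown ids."""
    return KEYS_BY_ID[key_id]


if __name__ == "__main__":
    for column, key_id in COLUMN_KEY_IDS.items():
        print(column, key_id, retrieve_key(key_id))
```

Keeping the digit lists separate from the byte conversion makes it easy to switch interpretations if a reader fails to decrypt with one of them.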

The following files are encrypted with AAD prefix “tester”:

  1. encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
  2. encrypt_columns_and_footer_aad.parquet.encrypted
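
A test harness that iterates over this directory needs to know which files are encrypted and which of those were written with the AAD prefix. The sketch below encodes only what this README states (the .parquet.encrypted suffix and the two files listed above); the function names `is_encrypted` and `aad_prefix_for` are hypothetical, and representing the prefix as ASCII bytes is an assumption.

```python
# Hypothetical helpers; file names and the AAD prefix come from this README.

ENCRYPTED_SUFFIX = ".parquet.encrypted"

# Assumption: the AAD prefix "tester" is passed as ASCII bytes.
AAD_PREFIX = b"tester"

FILES_WITH_AAD_PREFIX = {
    "encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted",
    "encrypt_columns_and_footer_aad.parquet.encrypted",
}


def is_encrypted(file_name):
    """True if the file name carries the encrypted-file suffix."""
    return file_name.endswith(ENCRYPTED_SUFFIX)


def aad_prefix_for(file_name):
    """Return the AAD prefix the file was encrypted with, or None."""
    return AAD_PREFIX if file_name in FILES_WITH_AAD_PREFIX else None


if __name__ == "__main__":
    for name in [
        "alltypes_plain.parquet",
        "uniform_encryption.parquet.encrypted",
        "encrypt_columns_and_footer_aad.parquet.encrypted",
    ]:
        print(name, is_encrypted(name), aad_prefix_for(name))
```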

Sample code that reads and checks these files can be found in the following test sources:

cpp/src/parquet/encryption-read-configurations-test.cc
cpp/src/parquet/test-encryption-util.h