Test data files for Parquet compatibility and regression testing

TODO: Document what each file is

Encrypted Files

Tests files with .parquet.encrypted suffix are encrypted using Parquet Modular Encryption.

A detailed description of the Parquet Modular Encryption specification can be found here:

 https://github.com/apache/parquet-format/blob/encryption/Encryption.md

Following are the keys and key ids (when using key_retriever) used to encrypt the encrypted columns and footer in the all the encrypted files:

  • Encrypted/Signed Footer:
    • key: {0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5}
    • key_id: “kf”
  • Encrypted column named double_field:
    • key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,0}
    • key_id: “kc1”
  • Encrypted column named float_field:
    • key: {1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,1}
    • key_id: “kc2”

The following files are encrypted with AAD prefix “tester”:

  1. encrypt_columns_and_footer_disable_aad_storage.parquet.encrypted
  2. encrypt_columns_and_footer_aad.parquet.encrypted

A sample that reads and checks these files can be found at the following tests:

cpp/src/parquet/encryption-read-configurations-test.cc
cpp/src/parquet/test-encryption-util.h