.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.
.. default-domain:: cpp
.. highlight:: cpp
.. cpp:namespace:: parquet
=================================
Reading and writing Parquet files
=================================
.. seealso::
:ref:`Parquet reader and writer API reference <cpp-api-parquet>`.
The `Parquet format <https://parquet.apache.org/docs/>`__
is a space-efficient columnar storage format for complex data. The Parquet
C++ implementation is part of the Apache Arrow project and benefits
from tight integration with the Arrow C++ classes and facilities.
Reading Parquet files
=====================
The :class:`arrow::FileReader` class reads data into Arrow Tables and Record
Batches.
The :class:`StreamReader` class allows for data to be read using a C++ input
stream approach to read fields column by column and row by row. This approach
is offered for ease of use and type-safety. It is of course also useful when
data must be streamed as files are read and written incrementally.
Please note that the performance of the :class:`StreamReader` will not
be as good due to the type checking and the fact that column values
are processed one at a time.
FileReader
----------
To read Parquet data into Arrow structures, use :class:`arrow::FileReader`.
To construct, it requires a :class:`::arrow::io::RandomAccessFile` instance
representing the input file. To read the whole file at once,
use :func:`arrow::FileReader::ReadTable`:
.. literalinclude:: ../../../cpp/examples/arrow/parquet_read_write.cc
:language: cpp
:start-after: arrow::Status ReadFullFile(
:end-before: return arrow::Status::OK();
:emphasize-lines: 9-10,14
:dedent: 2
Finer-grained options are available through the
:class:`arrow::FileReaderBuilder` helper class, which accepts the :class:`ReaderProperties`
and :class:`ArrowReaderProperties` classes.
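For example, a minimal sketch of wiring these property classes together through
:class:`arrow::FileReaderBuilder` might look as follows (the file name, batch size
and buffering choices are illustrative assumptions, not recommendations):

.. code-block:: cpp

   #include "arrow/io/file.h"
   #include "arrow/table.h"
   #include "parquet/arrow/reader.h"

   arrow::Status ReadWithBuilder() {
     ARROW_ASSIGN_OR_RAISE(auto infile,
                           arrow::io::ReadableFile::Open("data.parquet"));

     // Low-level Parquet reader options.
     parquet::ReaderProperties reader_props;
     reader_props.enable_buffered_stream();

     // Arrow-specific reader options.
     parquet::ArrowReaderProperties arrow_props;
     arrow_props.set_batch_size(64 * 1024);

     parquet::arrow::FileReaderBuilder builder;
     ARROW_RETURN_NOT_OK(builder.Open(infile, reader_props));
     std::unique_ptr<parquet::arrow::FileReader> reader;
     ARROW_RETURN_NOT_OK(builder.memory_pool(arrow::default_memory_pool())
                             ->properties(arrow_props)
                             ->Build(&reader));

     std::shared_ptr<arrow::Table> table;
     ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
     return arrow::Status::OK();
   }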
For reading as a stream of batches, use the :func:`arrow::FileReader::GetRecordBatchReader`
method to retrieve a :class:`arrow::RecordBatchReader`. It will use the batch
size set in :class:`ArrowReaderProperties`.
.. literalinclude:: ../../../cpp/examples/arrow/parquet_read_write.cc
:language: cpp
:start-after: arrow::Status ReadInBatches(
:end-before: return arrow::Status::OK();
:emphasize-lines: 25
:dedent: 2
.. seealso::
For reading multi-file datasets or pushing down filters to prune row groups,
see :ref:`Tabular Datasets<cpp-dataset>`.
Performance and Memory Efficiency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For remote filesystems, use read coalescing (pre-buffering) to reduce the number of API calls:
.. code-block:: cpp
auto arrow_reader_props = parquet::ArrowReaderProperties();
arrow_reader_props.set_pre_buffer(true);
The defaults are generally tuned towards good performance, but parallel column
decoding is off by default. Enable it in the constructor of :class:`ArrowReaderProperties`:
.. code-block:: cpp
auto arrow_reader_props = parquet::ArrowReaderProperties(/*use_threads=*/true);
If memory efficiency is more important than performance, then:
#. Do *not* turn on read coalescing (pre-buffering) in :class:`parquet::ArrowReaderProperties`.
#. Read data in batches using :func:`arrow::FileReader::GetRecordBatchReader`.
#. Turn on ``enable_buffered_stream`` in :class:`parquet::ReaderProperties`.
In addition, if you know certain columns contain many repeated values, you can
read them as :term:`dictionary encoded<dictionary-encoding>` columns. This is
enabled with the ``set_read_dictionary`` setting on :class:`ArrowReaderProperties`.
If the files were written with Arrow C++ and the ``store_schema`` was activated,
then the original Arrow schema will be automatically read and will override this
setting.
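Putting these recommendations together, here is a minimal sketch of the relevant
property settings (the batch size and the dictionary-encoded column index are
illustrative assumptions):

.. code-block:: cpp

   // Stream pages through a small buffer instead of loading entire column chunks.
   parquet::ReaderProperties reader_props;
   reader_props.enable_buffered_stream();

   parquet::ArrowReaderProperties arrow_props;
   arrow_props.set_batch_size(64 * 1024);      // rows per record batch
   arrow_props.set_read_dictionary(0, true);   // read column 0 as dictionary-encoded
   // Pre-buffering is deliberately left disabled here.

   // Pass both property objects to parquet::arrow::FileReaderBuilder as shown
   // above, then read incrementally with FileReader::GetRecordBatchReader().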
StreamReader
------------
The :class:`StreamReader` allows for Parquet files to be read using
standard C++ input operators, which ensures type-safety.
Please note that types must match the schema exactly, i.e. if the
schema field is an unsigned 16-bit integer then you must supply a
``uint16_t`` variable to read it into.
Exceptions are used to signal errors. A :class:`ParquetException` is
thrown in the following circumstances:
* Attempt to read field by supplying the incorrect type.
* Attempt to read beyond end of row.
* Attempt to read beyond end of file.
.. code-block:: cpp
#include "arrow/io/file.h"
#include "parquet/stream_reader.h"
{
std::shared_ptr<arrow::io::ReadableFile> infile;
PARQUET_ASSIGN_OR_THROW(
infile,
arrow::io::ReadableFile::Open("test.parquet"));
parquet::StreamReader stream{parquet::ParquetFileReader::Open(infile)};
std::string article;
float price;
uint32_t quantity;
while ( !stream.eof() )
{
stream >> article >> price >> quantity >> parquet::EndRow;
// ...
}
}
Writing Parquet files
=====================
WriteTable
----------
The :func:`arrow::WriteTable` function writes an entire
:class:`::arrow::Table` to an output file.
.. literalinclude:: ../../../cpp/examples/arrow/parquet_read_write.cc
:language: cpp
:start-after: arrow::Status WriteFullFile(
:end-before: return arrow::Status::OK();
:emphasize-lines: 19-21
:dedent: 2
.. note::
Column compression is off by default in C++. See :ref:`below <parquet-writer-properties>`
for how to choose a compression codec in the writer properties.
To write out data batch-by-batch, use :class:`arrow::FileWriter`.
.. literalinclude:: ../../../cpp/examples/arrow/parquet_read_write.cc
:language: cpp
:start-after: arrow::Status WriteInBatches(
:end-before: return arrow::Status::OK();
:emphasize-lines: 23-25,32,36
:dedent: 2
StreamWriter
------------
The :class:`StreamWriter` allows for Parquet files to be written using
standard C++ output operators, similar to reading with the :class:`StreamReader`
class. This type-safe approach also ensures that rows are written without
omitting fields and allows for new row groups to be created automatically
(after a certain volume of data) or explicitly by using the :type:`EndRowGroup`
stream modifier.
Exceptions are used to signal errors. A :class:`ParquetException` is
thrown in the following circumstances:
* Attempt to write a field using an incorrect type.
* Attempt to write too many fields in a row.
* Attempt to skip a required field.
.. code-block:: cpp
#include "arrow/io/file.h"
#include "parquet/stream_writer.h"
{
std::shared_ptr<arrow::io::FileOutputStream> outfile;
PARQUET_ASSIGN_OR_THROW(
outfile,
arrow::io::FileOutputStream::Open("test.parquet"));
parquet::WriterProperties::Builder builder;
std::shared_ptr<parquet::schema::GroupNode> schema;
// Set up builder with required compression type etc.
// Define schema.
// ...
parquet::StreamWriter os{
parquet::ParquetFileWriter::Open(outfile, schema, builder.build())};
// Loop over some data structure which provides the required
// fields to be written and write each row.
for (const auto& a : getArticles())
{
os << a.name() << a.price() << a.quantity() << parquet::EndRow;
}
}
.. _parquet-writer-properties:
Writer properties
-----------------
To configure how Parquet files are written, use the :class:`WriterProperties::Builder`:
.. code-block:: cpp
#include "parquet/arrow/writer.h"
#include "arrow/util/type_fwd.h"
using parquet::WriterProperties;
using parquet::ParquetVersion;
using parquet::ParquetDataPageVersion;
using arrow::Compression;
std::shared_ptr<WriterProperties> props = WriterProperties::Builder()
.max_row_group_length(64 * 1024)
.created_by("My Application")
.version(ParquetVersion::PARQUET_2_6)
.data_page_version(ParquetDataPageVersion::V2)
.compression(Compression::SNAPPY)
.build();
The ``max_row_group_length`` sets an upper bound on the number of rows per row
group that takes precedence over the ``chunk_size`` passed in the write methods.
You can set the version of Parquet to write with ``version``, which determines
which logical types are available. In addition, you can set the data page version
with ``data_page_version``. It is V1 by default; setting it to V2 will allow more
optimal compression (skipping compressing pages where there isn't a space
benefit), but not all readers support this data page version.
Compression is off by default, but to get the most out of Parquet, you should
also choose a compression codec. You can choose one for the whole file or
choose one for individual columns. If you choose a mix, the file-level option
will apply to columns that don't have a specific compression codec. See
:class:`::arrow::Compression` for options.
Column data encodings can likewise be applied at the file-level or at the
column level. By default, the writer will attempt to dictionary encode all
supported columns, unless the dictionary grows too large. This behavior can
be changed at file-level or at the column level with ``disable_dictionary()``.
When not using dictionary encoding, it will fallback to the encoding set for
the column or the overall file; by default ``Encoding::PLAIN``, but this can
be changed with ``encoding()``.
.. code-block:: cpp
#include "parquet/arrow/writer.h"
#include "arrow/util/type_fwd.h"
using parquet::WriterProperties;
using arrow::Compression;
using parquet::Encoding;
std::shared_ptr<WriterProperties> props = WriterProperties::Builder()
.compression(Compression::SNAPPY) // Fallback
->compression("colA", Compression::ZSTD) // Only applies to column "colA"
->encoding(Encoding::BIT_PACKED) // Fallback
->encoding("colB", Encoding::RLE) // Only applies to column "colB"
->disable_dictionary("colB") // Never dictionary-encode column "colB"
->build();
Statistics are enabled by default for all columns. You can disable statistics for
all columns or specific columns using ``disable_statistics`` on the builder.
There is also a ``max_statistics_size`` setting which limits the maximum number of
bytes that may be used for the min and max values, useful for types like strings or
binary blobs. If the page index is enabled for a column using
``enable_write_page_index``, then statistics are not written to the page header,
since they would be duplicated in the ColumnIndex.
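As an illustration, statistics and the page index can be tuned on the builder like
this (the column name and the size limit are illustrative assumptions):

.. code-block:: cpp

   using parquet::WriterProperties;

   std::shared_ptr<WriterProperties> props = WriterProperties::Builder()
       .disable_statistics("colA")    // no statistics for column "colA"
       ->max_statistics_size(64)      // cap the bytes used for min/max values
       ->enable_write_page_index()    // write ColumnIndex/OffsetIndex structures
       ->build();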
There are also Arrow-specific settings that can be configured with
:class:`parquet::ArrowWriterProperties`:
.. code-block:: cpp
#include "parquet/arrow/writer.h"
using parquet::ArrowWriterProperties;
std::shared_ptr<ArrowWriterProperties> arrow_props = ArrowWriterProperties::Builder()
.enable_deprecated_int96_timestamps() // default False
->store_schema() // default False
->build();
These options mostly dictate how Arrow types are converted to Parquet types.
Turning on ``store_schema`` will cause the writer to store the serialized Arrow
schema within the file metadata. Since there is no bijection between Parquet
schemas and Arrow schemas, storing the Arrow schema allows the Arrow reader
to more faithfully recreate the original data. This mapping from Parquet types
back to original Arrow types includes:
* Reading timestamps with original timezone information (Parquet does not
support time zones);
* Reading Arrow types from their storage types (such as Duration from int64
columns);
* Reading string and binary columns back into large variants with 64-bit offsets;
* Reading back columns as dictionary encoded (whether an Arrow column and
the serialized Parquet version are dictionary encoded are independent).
Supported Parquet features
==========================
The Parquet format has many features, and Parquet C++ supports a subset of them.
Page types
----------
+-------------------+---------+
| Page type | Notes |
+===================+=========+
| DATA_PAGE | |
+-------------------+---------+
| DATA_PAGE_V2 | |
+-------------------+---------+
| DICTIONARY_PAGE | |
+-------------------+---------+
*Unsupported page type:* INDEX_PAGE. When reading a Parquet file, pages of
this type are ignored.
Compression
-----------
+-------------------+---------+
| Compression codec | Notes |
+===================+=========+
| SNAPPY | |
+-------------------+---------+
| GZIP | |
+-------------------+---------+
| BROTLI | |
+-------------------+---------+
| LZ4 | \(1) |
+-------------------+---------+
| ZSTD | |
+-------------------+---------+
* \(1) On the read side, Parquet C++ is able to decompress both the regular
LZ4 block format and the ad-hoc Hadoop LZ4 format used by the
`reference Parquet implementation <https://github.com/apache/parquet-mr>`__.
On the write side, Parquet C++ always generates the ad-hoc Hadoop LZ4 format.
*Unsupported compression codec:* LZO.
Encodings
---------
+--------------------------+----------+----------+---------+
| Encoding | Reading | Writing | Notes |
+==========================+==========+==========+=========+
| PLAIN | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
| PLAIN_DICTIONARY | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
| BIT_PACKED | ✓ | ✓ | \(1) |
+--------------------------+----------+----------+---------+
| RLE | ✓ | ✓ | \(1) |
+--------------------------+----------+----------+---------+
| RLE_DICTIONARY | ✓ | ✓ | \(2) |
+--------------------------+----------+----------+---------+
| BYTE_STREAM_SPLIT | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
| DELTA_BINARY_PACKED | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
| DELTA_BYTE_ARRAY | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
| DELTA_LENGTH_BYTE_ARRAY | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
* \(1) Only supported for encoding definition and repetition levels,
and boolean values.
* \(2) On the write path, RLE_DICTIONARY is only enabled if Parquet format version
2.4 or greater is selected in :func:`WriterProperties::version`.
Types
-----
Physical types
~~~~~~~~~~~~~~
+--------------------------+------------------------------------+------------+
| Physical type | Mapped Arrow type | Notes |
+==========================+====================================+============+
| BOOLEAN | Boolean | |
+--------------------------+------------------------------------+------------+
| INT32 | Int32 / other | \(1) |
+--------------------------+------------------------------------+------------+
| INT64 | Int64 / other | \(1) |
+--------------------------+------------------------------------+------------+
| INT96 | Timestamp (nanoseconds) | \(2) |
+--------------------------+------------------------------------+------------+
| FLOAT | Float32 | |
+--------------------------+------------------------------------+------------+
| DOUBLE | Float64 | |
+--------------------------+------------------------------------+------------+
| BYTE_ARRAY | Binary / LargeBinary / BinaryView | \(1) |
+--------------------------+------------------------------------+------------+
| FIXED_LENGTH_BYTE_ARRAY | FixedSizeBinary / other | \(1) |
+--------------------------+------------------------------------+------------+
* \(1) Can be mapped to other Arrow types, depending on the logical type
(see table below).
* \(2) On the write side, :func:`ArrowWriterProperties::support_deprecated_int96_timestamps`
must be enabled.
Logical types
~~~~~~~~~~~~~
Specific logical types can override the default Arrow type mapping for a given
physical type.
+-------------------+-----------------------------+------------------------------+-----------+
| Logical type | Physical type | Mapped Arrow type | Notes |
+===================+=============================+==============================+===========+
| NULL | Any | Null | \(1) |
+-------------------+-----------------------------+------------------------------+-----------+
| INT | INT32 | Int8 / UInt8 / Int16 / | |
| | | UInt16 / Int32 / UInt32 | |
+-------------------+-----------------------------+------------------------------+-----------+
| INT | INT64 | Int64 / UInt64 | |
+-------------------+-----------------------------+------------------------------+-----------+
| DECIMAL | INT32 / INT64 / BYTE_ARRAY | Decimal32/ Decimal64 / | \(2) |
| | / FIXED_LENGTH_BYTE_ARRAY | Decimal128 / Decimal256 | |
+-------------------+-----------------------------+------------------------------+-----------+
| DATE | INT32 | Date32 | \(3) |
+-------------------+-----------------------------+------------------------------+-----------+
| TIME | INT32 | Time32 (milliseconds) | |
+-------------------+-----------------------------+------------------------------+-----------+
| TIME | INT64 | Time64 (micro- or | |
| | | nanoseconds) | |
+-------------------+-----------------------------+------------------------------+-----------+
| TIMESTAMP | INT64 | Timestamp (milli-, micro- | |
| | | or nanoseconds) | |
+-------------------+-----------------------------+------------------------------+-----------+
| STRING | BYTE_ARRAY | String / LargeString / | |
| | | StringView | |
+-------------------+-----------------------------+------------------------------+-----------+
| LIST | Any | List / LargeList | \(4) |
+-------------------+-----------------------------+------------------------------+-----------+
| MAP | Any | Map | \(5) |
+-------------------+-----------------------------+------------------------------+-----------+
| FLOAT16 | FIXED_LENGTH_BYTE_ARRAY | HalfFloat | |
+-------------------+-----------------------------+------------------------------+-----------+
| UUID | FIXED_LENGTH_BYTE_ARRAY | Extension (``arrow.uuid``) | \(6) |
+-------------------+-----------------------------+------------------------------+-----------+
| JSON | BYTE_ARRAY | Extension (``arrow.json``) | \(6) |
+-------------------+-----------------------------+------------------------------+-----------+
| GEOMETRY | BYTE_ARRAY | Extension (``geoarrow.wkb``) | \(6) \(7) |
+-------------------+-----------------------------+------------------------------+-----------+
| GEOGRAPHY | BYTE_ARRAY | Extension (``geoarrow.wkb``) | \(6) \(7) |
+-------------------+-----------------------------+------------------------------+-----------+
* \(1) On the write side, the Parquet physical type INT32 is generated.
* \(2) On the write side, a FIXED_LENGTH_BYTE_ARRAY is always emitted
except if ``store_decimal_as_integer`` is set to true.
* \(3) On the write side, an Arrow Date64 is also mapped to a Parquet DATE INT32.
* \(4) On the write side, an Arrow FixedSizeList is also mapped to a Parquet LIST.
* \(5) On the read side, a key with multiple values does not get deduplicated,
in contradiction with the
`Parquet specification <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps>`__.
* \(6) Requires that ``arrow_extensions_enabled`` in ``ArrowReaderProperties`` is ``true``.
When ``false``, the underlying storage type is read.
* \(7) Requires that the ``geoarrow.wkb`` extension type is registered.
*Unsupported logical types:* BSON. If such a type is encountered
when reading a Parquet file, the default physical type mapping is used (for
example, a Parquet BSON column may be read as Arrow Binary or FixedSizeBinary).
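For note (6), opting in on the read side is a one-line property change; the sketch
below assumes the setter is named after the ``arrow_extensions_enabled`` property,
which may vary between versions:

.. code-block:: cpp

   parquet::ArrowReaderProperties arrow_reader_props;
   // Map UUID/JSON/GEOMETRY/GEOGRAPHY logical types to Arrow extension types.
   arrow_reader_props.set_arrow_extensions_enabled(true);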
Converted types
~~~~~~~~~~~~~~~
While converted types are deprecated in the Parquet format (they are superseded
by logical types), they are recognized and emitted by the Parquet C++
implementation so as to maximize compatibility with other Parquet
implementations.
Special cases
~~~~~~~~~~~~~
An Arrow Extension type is written out as its storage type. It can still
be recreated at read time using Parquet metadata (see "Roundtripping Arrow
types" below). Some extension types have Parquet LogicalType equivalents
(e.g., UUID, JSON, GEOMETRY, GEOGRAPHY). These are created automatically
if the appropriate option is set in the ``ArrowReaderProperties`` even if
there was no Arrow schema stored in the Parquet metadata.
An Arrow Dictionary type is written out as its value type. It can still
be recreated at read time using Parquet metadata (see "Roundtripping Arrow
types" below).
Roundtripping Arrow types and schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
While there is no bijection between Arrow types and Parquet types, it is
possible to serialize the Arrow schema as part of the Parquet file metadata.
This is enabled using :func:`ArrowWriterProperties::store_schema`.
On the read path, the serialized schema will be automatically recognized
and will recreate the original Arrow data, converting the Parquet data as
required.
As an example, when serializing an Arrow LargeList to Parquet:
* The data is written out as a Parquet LIST
* When read back, the Parquet LIST data is decoded as an Arrow LargeList if
:func:`ArrowWriterProperties::store_schema` was enabled when writing the file;
otherwise, it is decoded as an Arrow List.
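A minimal sketch of such a roundtrip, assuming an illustrative file name and an
in-memory table, could look like this:

.. code-block:: cpp

   #include "arrow/io/file.h"
   #include "arrow/table.h"
   #include "parquet/arrow/reader.h"
   #include "parquet/arrow/writer.h"

   arrow::Status RoundTripWithStoredSchema(const std::shared_ptr<arrow::Table>& table) {
     auto arrow_props =
         parquet::ArrowWriterProperties::Builder().store_schema()->build();

     ARROW_ASSIGN_OR_RAISE(auto outfile,
                           arrow::io::FileOutputStream::Open("roundtrip.parquet"));
     ARROW_RETURN_NOT_OK(parquet::arrow::WriteTable(
         *table, arrow::default_memory_pool(), outfile,
         /*chunk_size=*/1024, parquet::default_writer_properties(), arrow_props));

     // On read, the stored Arrow schema is detected automatically, so e.g. a
     // LargeList column comes back as LargeList rather than List.
     ARROW_ASSIGN_OR_RAISE(auto infile,
                           arrow::io::ReadableFile::Open("roundtrip.parquet"));
     std::unique_ptr<parquet::arrow::FileReader> reader;
     ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(
         infile, arrow::default_memory_pool(), &reader));
     std::shared_ptr<arrow::Table> roundtripped;
     ARROW_RETURN_NOT_OK(reader->ReadTable(&roundtripped));
     return arrow::Status::OK();
   }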
Parquet field id
""""""""""""""""
The Parquet format supports an optional integer *field id* which can be assigned
to a given field. This is used for example in the
`Apache Iceberg specification <https://github.com/apache/iceberg/blob/main/format/spec.md#column-projection>`__.
On the writer side, if ``PARQUET:field_id`` is present as a metadata key on an
Arrow field, then its value is parsed as a non-negative integer and is used as
the field id for the corresponding Parquet field.
On the reader side, Arrow will convert such a field id to a metadata key named
``PARQUET:field_id`` on the corresponding Arrow field.
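For example, a field id can be attached to an Arrow field through its metadata as in
the following sketch (the field name and id value are illustrative assumptions):

.. code-block:: cpp

   #include "arrow/type.h"
   #include "arrow/util/key_value_metadata.h"

   std::shared_ptr<arrow::Schema> SchemaWithFieldId() {
     auto metadata = arrow::key_value_metadata({"PARQUET:field_id"}, {"42"});
     auto field = arrow::field("col", arrow::int32(), /*nullable=*/true, metadata);
     return arrow::schema({field});
   }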
Serialization details
"""""""""""""""""""""
The Arrow schema is serialized as a :ref:`Arrow IPC <format-ipc>` schema message,
then base64-encoded and stored under the ``ARROW:schema`` metadata key in
the Parquet file metadata.
Limitations
~~~~~~~~~~~
Writing or reading back FixedSizeList data with null entries is not supported.
Encryption
----------
Parquet C++ implements all features specified in the
`encryption specification <https://github.com/apache/parquet-format/blob/master/Encryption.md>`__,
except for encryption of column index and bloom filter modules.
More specifically, Parquet C++ supports:
* AES_GCM_V1 and AES_GCM_CTR_V1 encryption algorithms.
* AAD suffix for Footer, ColumnMetaData, Data Page, Dictionary Page,
Data PageHeader, Dictionary PageHeader module types. Other module types
(ColumnIndex, OffsetIndex, BloomFilter Header, BloomFilter Bitset) are not
supported.
* EncryptionWithFooterKey and EncryptionWithColumnKey modes.
* Encrypted Footer and Plaintext Footer modes.
Configuration
~~~~~~~~~~~~~
Parquet encryption uses a ``parquet::encryption::CryptoFactory`` that has access to a
Key Management System (KMS), which stores actual encryption keys, referenced by key ids.
The Parquet encryption configuration only uses key ids, no actual keys.
Parquet metadata encryption is configured via ``parquet::encryption::EncryptionConfiguration``:
.. literalinclude:: ../../../cpp/examples/arrow/parquet_column_encryption.cc
:language: cpp
:start-at: // Set write options with encryption configuration
:end-before: encryption_config->column_keys
:dedent: 2
If ``encryption_config->uniform_encryption`` is set to ``true``, then all columns are
encrypted with the same key as the Parquet metadata. Otherwise, individual
columns are encrypted with individual keys as configured via
``encryption_config->column_keys``. This field expects a string of the format
``"columnKeyID1:colName1,colName2;columnKeyID3:colName3..."``.
.. literalinclude:: ../../../cpp/examples/arrow/parquet_column_encryption.cc
:language: cpp
:start-at: // Set write options with encryption configuration
:end-at: encryption_config->column_keys
:emphasize-lines: 4
:dedent: 2
See the full `Parquet column encryption example <examples/parquet_column_encryption.html>`_.
.. note::
Columns with nested fields (struct or map data types) can be encrypted as a whole, or only
individual fields. Configure an encryption key for the root column name to encrypt all nested
fields with this key, or configure a key for individual leaf nested fields.
Conventionally, the key and value fields of a map column ``m`` have the names
``m.key_value.key`` and ``m.key_value.value``, respectively.
An inner field ``f`` of a struct column ``s`` has the name ``s.f``.
In the above example, *all* inner fields are encrypted with the same key by configuring that key
for the columns ``m`` and ``s``, respectively.
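For instance, a hedged sketch of configuring column keys for such nested columns
(the key ids are illustrative assumptions):

.. code-block:: cpp

   #include "parquet/encryption/crypto_factory.h"

   auto encryption_config =
       std::make_shared<parquet::encryption::EncryptionConfiguration>(
           /*footer_key=*/"footerKeyID");
   // Encrypt every nested field of "m" and "s" with one key per column:
   encryption_config->column_keys = "columnKeyID1:m;columnKeyID2:s";
   // Or encrypt only selected leaf fields:
   encryption_config->column_keys = "columnKeyID1:m.key_value.value;columnKeyID2:s.f";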
Miscellaneous
-------------
+--------------------------+----------+----------+---------+
| Feature | Reading | Writing | Notes |
+==========================+==========+==========+=========+
| Column Index | ✓ | ✓ | \(1) |
+--------------------------+----------+----------+---------+
| Offset Index | ✓ | ✓ | \(1) |
+--------------------------+----------+----------+---------+
| Bloom Filter | ✓ | ✓ | \(1) |
+--------------------------+----------+----------+---------+
| CRC checksums | ✓ | ✓ | |
+--------------------------+----------+----------+---------+
* \(1) Access to the Column Index, Offset Index and Bloom Filter structures
is provided, but data read APIs do not currently make any use of them.
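As a hedged sketch, these structures can be inspected through the low-level file
reader along the following lines; the accessor names (``GetPageIndexReader``,
``RowGroup``, ``GetColumnIndex``, ``GetOffsetIndex``) are assumptions that may vary
between Arrow versions, and the file path is illustrative:

.. code-block:: cpp

   #include "parquet/file_reader.h"
   #include "parquet/page_index.h"

   void InspectPageIndex() {
     std::unique_ptr<parquet::ParquetFileReader> reader =
         parquet::ParquetFileReader::OpenFile("data.parquet");
     auto page_index_reader = reader->GetPageIndexReader();
     if (page_index_reader == nullptr) return;  // no page index in this file
     auto row_group_index = page_index_reader->RowGroup(0);
     auto column_index = row_group_index->GetColumnIndex(0);  // per-page min/max statistics
     auto offset_index = row_group_index->GetOffsetIndex(0);  // per-page file offsets
     // Any page pruning based on these structures is currently up to the caller.
   }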