
Apache Avro integration

TODO: Add a description and examples of how to use parquet-avro.

Available options via Hadoop Configuration

Configuration for reading

| Name | Type | Description |
|------|------|-------------|
| `parquet.avro.data.supplier` | `Class` | The implementation of the interface `org.apache.parquet.avro.AvroDataSupplier`. Available implementations in the library: `GenericDataSupplier`, `ReflectDataSupplier`, `SpecificDataSupplier`. The default value is `org.apache.parquet.avro.SpecificDataSupplier`. |
| `parquet.avro.read.schema` | `String` | The Avro schema to be used for reading. It must be compatible with the file schema. If not set, the file schema is used directly. |
| `parquet.avro.projection` | `String` | The Avro schema to be used for projection. |
| `parquet.avro.compatible` | `boolean` | Flag for compatibility mode. `true` materializes Avro `IndexedRecord` objects; `false` materializes the related objects for generic, specific, or reflect records. The default value is `true`. |
| `parquet.avro.readInt96AsFixed` | `boolean` | Flag for handling the `INT96` Parquet type. `true` converts it to the `fixed` Avro type; `false` leaves `INT96` unhandled, so an exception is thrown. The default value is `false`. NOTE: The `INT96` Parquet type is deprecated. This option is only to support old data. |
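
As a rough illustration of how the read options fit together, the sketch below collects a few of them as plain key/value strings using only the Java standard library. The class and method names (`AvroReadOptions`, `genericReaderOptions`) are hypothetical, and the fully qualified name of `GenericDataSupplier` is assumed to sit in the same package as `SpecificDataSupplier`. With Hadoop on the classpath, each pair would typically be copied into a `Configuration` with `conf.set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-avro.
final class AvroReadOptions {
    // Property names as listed in the reading table above.
    static final String DATA_SUPPLIER = "parquet.avro.data.supplier";
    static final String COMPATIBLE = "parquet.avro.compatible";
    static final String READ_INT96_AS_FIXED = "parquet.avro.readInt96AsFixed";

    /** Collects read-side settings as strings, preserving insertion order. */
    static Map<String, String> genericReaderOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        // Assumption: GenericDataSupplier lives in org.apache.parquet.avro.
        options.put(DATA_SUPPLIER, "org.apache.parquet.avro.GenericDataSupplier");
        // Turn compatibility mode off (the default is true).
        options.put(COMPATIBLE, "false");
        // Opt in to reading legacy INT96 columns as Avro fixed values.
        options.put(READ_INT96_AS_FIXED, "true");
        return options;
    }

    public static void main(String[] args) {
        // With Hadoop available, each entry would be applied via conf.set(key, value).
        genericReaderOptions().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the options in a map first is only a convenience for this sketch; setting them directly on the Hadoop configuration object works just as well.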

Configuration for writing

| Name | Type | Description |
|------|------|-------------|
| `parquet.avro.write.data.supplier` | `Class` | The implementation of the interface `org.apache.parquet.avro.AvroDataSupplier`. Available implementations in the library: `GenericDataSupplier`, `ReflectDataSupplier`, `SpecificDataSupplier`. The default value is `org.apache.parquet.avro.SpecificDataSupplier`. |
| `parquet.avro.schema` | `String` | The Avro schema to be used for generating the Parquet schema of the file. |
| `parquet.avro.write-old-list-structure` | `boolean` | Flag whether to write list structures in the old way (2 levels) or the new one (3 levels). When writing at 2 levels, null values are not available at the element level. The default value is `true`. |
| `parquet.avro.add-list-element-records` | `boolean` | Flag whether to assume that any repeated element in the schema is a list element. The default value is `true`. |
| `parquet.avro.write-parquet-uuid` | `boolean` | Flag whether to write the Parquet UUID logical type when an Avro UUID type is present. The default value is `false`. |
| `parquet.avro.writeFixedAsInt96` | `String` | Comma-separated list of paths pointing to Avro schema elements that are to be converted to `INT96` Parquet types. Each path is a `.`-separated list of field names and contains neither the name of the schema nor the namespace. The referenced schema elements must be of type `fixed` with a size of 12 bytes. NOTE: The `INT96` Parquet type is deprecated. This option is only to support old data. |
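
The value format of `parquet.avro.writeFixedAsInt96` is easy to get wrong, so here is a minimal sketch of how such a value could be assembled. The helper class `FixedAsInt96Paths` and the field names `created`, `audit`, and `modified` are hypothetical. The sketch joins entries without surrounding whitespace, which is an assumption rather than a documented requirement.

```java
import java.util.List;

// Hypothetical helper, not part of parquet-avro.
final class FixedAsInt96Paths {
    static final String WRITE_FIXED_AS_INT96 = "parquet.avro.writeFixedAsInt96";

    /** Builds a '.'-separated path of field names, without schema name or namespace. */
    static String fieldPath(String... fieldNames) {
        return String.join(".", fieldNames);
    }

    /** Joins the individual paths into the comma-separated property value. */
    static String propertyValue(List<String> paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Both referenced elements are assumed to be Avro fixed types of 12 bytes.
        String value = propertyValue(List.of(
                fieldPath("created"),              // top-level field
                fieldPath("audit", "modified")));  // field nested in the "audit" record
        System.out.println(WRITE_FIXED_AS_INT96 + "=" + value);
        // prints: parquet.avro.writeFixedAsInt96=created,audit.modified
    }
}
```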