The two Parquet files (legacy_nested.parquet and modern_nested.parquet) were generated
using the Kite-based script located here:
testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java
The Parquet files can be regenerated by running the following commands in the testdata
directory:
mvn package
mvn exec:java \
-Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \
-Dexec.args="--legacy_collection_format
data/schemas/nested/nested.avsc
data/schemas/nested/nested.json
data/schemas/nested/legacy_nested.parquet"
mvn exec:java \
-Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \
-Dexec.args="
data/schemas/nested/nested.avsc
data/schemas/nested/nested.json
data/schemas/nested/modern_nested.parquet"
The script takes three positional arguments: an Avro schema, a JSON file with data, and
the path of the Parquet file to create.
The --legacy_collection_format flag makes the script output a Parquet file that uses the
legacy two-level format for nested types, rather than the modern three-level format.
More information about the Parquet nested types format can be found here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
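As a rough illustration of the difference between the two formats, a list of int32
values might be laid out as shown below. The field name my_list is hypothetical and is
not taken from nested.avsc; the shapes follow the examples in LogicalTypes.md, and the
exact names written by the script may differ.

Modern three-level format:

  optional group my_list (LIST) {
    repeated group list {
      optional int32 element;
    }
  }

Legacy two-level format:

  optional group my_list (LIST) {
    repeated int32 element;
  }

In the two-level form the repeated field holds the element directly, so there is no
intermediate group and individual elements cannot be null.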