The Parquet files AmbiguousList_Modern.parquet and AmbiguousList_Legacy.parquet were generated
using the kite script located here:
testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java
The Parquet files can be regenerated by running the following commands in the testdata
directory:
mvn package
mvn exec:java \
  -Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \
  -Dexec.args="--legacy_collection_format
  data/parquet_nested_types_encodings/AmbiguousList.avsc.avsc
  data/parquet_nested_types_encodings/AmbiguousList.avsc.json
  data/parquet_nested_types_encodings/AmbiguousList_Legacy.parquet"
mvn exec:java \
  -Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \
  -Dexec.args="
  data/parquet_nested_types_encodings/AmbiguousList.avsc.avsc
  data/parquet_nested_types_encodings/AmbiguousList.avsc.json
  data/parquet_nested_types_encodings/AmbiguousList_Modern.parquet"
The script takes an Avro schema and a JSON file with data and creates a Parquet file.
The --legacy_collection_format flag makes the script output a Parquet file that uses the
legacy two-level format for nested types, rather than the modern three-level format.
More information about the Parquet nested types format can be found here:
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
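As a rough illustration of the difference between the two layouts, the sketch
below prints the general schema shape of a list column in each format. It is not
part of the converter and does not read the files above. The helper name
list_schema, the column name my_list, and the element type int32 are
hypothetical. The field names "list" and "element" follow the Parquet
LogicalTypes document linked above, and "array" is assumed here as the repeated
field name that legacy Avro-based writers typically emit.

```python
def list_schema(name: str, element_type: str, legacy: bool) -> str:
    """Return illustrative Parquet schema text for a list column.

    Hypothetical helper for explanation only; it builds a string and does
    not touch any Parquet file.
    """
    if legacy:
        # Two-level layout: the repeated field is itself the element.
        # The name "array" is an assumption about legacy Avro-based writers.
        body = [f"  repeated {element_type} array;"]
    else:
        # Three-level layout: a repeated group named "list" wraps a field
        # named "element", as described in the LogicalTypes document.
        body = [
            "  repeated group list {",
            f"    optional {element_type} element;",
            "  }",
        ]
    return "\n".join([f"optional group {name} (LIST) {{", *body, "}"])


if __name__ == "__main__":
    print("Legacy two-level layout:")
    print(list_schema("my_list", "int32", legacy=True))
    print()
    print("Modern three-level layout:")
    print(list_schema("my_list", "int32", legacy=False))
```

The extra repeated group in the three-level layout is what lets a reader tell a
null list apart from an empty list and a null element apart from a missing one.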