| The Parquet files AmbiguousList_Modern.parquet and AmbiguousList_Legacy.parquet were generated |
| using the Kite-based script located here: |
| testdata/src/main/java/org/apache/impala/datagenerator/JsonToParquetConverter.java |
| |
| The Parquet files can be regenerated by running the following commands in the testdata |
| directory: |
| |
| mvn package |
| |
| mvn exec:java \ |
| -Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \ |
| -Dexec.args="--legacy_collection_format |
| data/parquet_nested_types_encodings/AmbiguousList.avsc |
| data/parquet_nested_types_encodings/AmbiguousList.json |
| data/parquet_nested_types_encodings/AmbiguousList_Legacy.parquet" |
| |
| mvn exec:java \ |
| -Dexec.mainClass="org.apache.impala.datagenerator.JsonToParquetConverter" \ |
| -Dexec.args=" |
| data/parquet_nested_types_encodings/AmbiguousList.avsc |
| data/parquet_nested_types_encodings/AmbiguousList.json |
| data/parquet_nested_types_encodings/AmbiguousList_Modern.parquet" |
| |
| The script takes three positional arguments: an Avro schema file, a JSON file with the |
| data, and the path of the Parquet file to create. The --legacy_collection_format flag |
| makes the script write the Parquet file in the legacy two-level format for nested types, |
| rather than the modern three-level format. |
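
As a rough illustration of the difference between the two layouts, the fragments below
follow the list examples in the Parquet LogicalTypes document linked below. The field name
my_list and the int32 element type are placeholders chosen for illustration; they are not
the actual schema of the AmbiguousList files.

```
// Modern three-level layout: LIST group -> repeated group "list" -> "element"
// (my_list and int32 are hypothetical placeholders)
optional group my_list (LIST) {
  repeated group list {
    optional int32 element;
  }
}

// Legacy two-level layout: LIST group -> repeated element
optional group my_list (LIST) {
  repeated int32 element;
}
```

In the two-level layout the repeated field is itself the element, so there is no place to
record a null element. The three-level layout adds an inner group for that purpose.
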
| |
| More information about the Parquet nested types format can be found here: |
| https://github.com/apache/parquet-format/blob/master/LogicalTypes.md |