The two Parquet files (nullable.parq and nonnullable_orc.parq) were generated
as described in testdata/data/schemas/nested/README.

The two ORC files (nullable.orc and nonnullable.orc) were generated with orc-tools,
which can convert JSON files into ORC format. However, nullable.json and
nonnullable.json must first be modified to match the input format it requires: the
file must not be a single JSON array. Instead, it should contain one JSON object per
row, joined by '\n'. The commands below assume the converted JSON files are named
nullable_orc.json and nonnullable_orc.json.

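The JSON conversion described above could be done with a short script such as the
following. This is only a sketch: it assumes that nullable.json and nonnullable.json
each hold one top-level JSON array of row objects and that no other change is needed.
The function name array_to_json_lines is hypothetical and is not part of orc-tools.

```python
import json
from pathlib import Path


def array_to_json_lines(src: Path, dst: Path) -> int:
    """Rewrite a top-level JSON array as one JSON object per line.

    Returns the number of rows written.
    """
    rows = json.loads(src.read_text(encoding="utf-8"))
    if not isinstance(rows, list):
        raise ValueError(f"{src} does not contain a top-level JSON array")
    # json.dumps without `indent` emits no line breaks, so each row stays on
    # a single line; rows are joined by '\n' as the README describes.
    dst.write_text("\n".join(json.dumps(row) for row in rows), encoding="utf-8")
    return len(rows)


if __name__ == "__main__":
    # Assumed file names, taken from the README text.
    for name in ("nullable", "nonnullable"):
        src = Path(f"{name}.json")
        if src.exists():
            count = array_to_json_lines(src, Path(f"{name}_orc.json"))
            print(f"wrote {count} rows to {name}_orc.json")
```

Run from the directory holding the JSON files, the script writes nullable_orc.json and
nonnullable_orc.json next to the originals and leaves the originals untouched.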
The ORC files can be regenerated by running the following commands in the current directory:
wget "https://search.maven.org/remotecontent?filepath=org/apache/orc/orc-tools/1.5.4/orc-tools-1.5.4-uber.jar" \
-O orc-tools-1.5.4-uber.jar
java -jar orc-tools-1.5.4-uber.jar convert \
-s "struct<id:bigint,int_array:array<int>,int_array_Array:array<array<int>>,int_map:map<string,int>,int_Map_Array:array<map<string,int>>,nested_struct:struct<A:int,b:array<int>,C:struct<d:array<array<struct<E:int,F:string>>>>,g:map<string,struct<H:struct<i:array<double>>>>>>" \
-o nullable.orc \
nullable_orc.json
java -jar orc-tools-1.5.4-uber.jar convert \
-s "struct<ID:bigint,Int_Array:array<int>,int_array_array:array<array<int>>,Int_Map:map<string,int>,int_map_array:array<map<string,int>>,nested_Struct:struct<a:int,B:array<int>,c:struct<D:array<array<struct<e:int,f:string>>>>,G:map<string,struct<h:struct<i:array<double>>>>>>" \
-o nonnullable.orc \
nonnullable_orc.json