Displays the contents of the ORC file as a JSON document. With the columns
argument only the selected columns are printed.
% orc-contents [--columns=1,2,...] <filename>
If you run it on the example file TestOrcFile.test1.orc, you'll see (without the line breaks within each record):
% orc-contents examples/TestOrcFile.test1.orc {"boolean1": false, "byte1": 1, "short1": 1024, "int1": 65536, \\ "long1": 9223372036854775807, "float1": 1, "double1": -15, \\ "bytes1": [0, 1, 2, 3, 4], "string1": "hi", "middle": \\ {"list": [{"int1": 1, "string1": "bye"}, \\ {"int1": 2, "string1": "sigh"}]}, \\ "list": [{"int1": 3, "string1": "good"}, \\ {"int1": 4, "string1": "bad"}], \\ "map": []} {"boolean1": true, "byte1": 100, "short1": 2048, "int1": 65536, "long1": 9223372036854775807, "float1": 2, "double1": -5, \\ "bytes1": [], "string1": "bye", \\ "middle": {"list": [{"int1": 1, "string1": "bye"}, \\ {"int1": 2, "string1": "sigh"}]}, \\ "list": [{"int1": 100000000, "string1": "cat"}, \\ {"int1": -100000, "string1": "in"}, \\ {"int1": 1234, "string1": "hat"}], \\ "map": [{"key": "chani", "value": {"int1": 5, "string1": "chani"}}, \\ {"key": "mauddib", \\ "value": {"int1": 1, "string1": "mauddib"}}]}
Displays the metadata of the ORC file as a JSON document. With the verbose
option additional information about the layout of the file is also printed.
For diagnosing problems, it is useful to use the ‘--raw’ option that prints the protocol buffers from the ORC file directly rather than interpreting them.
% orc-metadata [-v] [--raw] <filename>
If you run it on the example file TestOrcFile.test1.orc, you'll see:
% orc-metadata examples/TestOrcFile.test1.orc { "name": "../examples/TestOrcFile.test1.orc", "type": "struct<boolean1:boolean,byte1:tinyint,short1:smallint, int1:int,long1:bigint,float1:float,double1:double,bytes1:binary, string1:string,middle:struct<list:array<struct<int1:int,string1: string>>>,list:array<struct<int1:int,string1:string>>,map:map< string,struct<int1:int,string1:string>>>", "rows": 2, "stripe count": 1, "format": "0.12", "writer version": "HIVE-8732", "compression": "zlib", "compression block": 10000, "file length": 1711, "content": 1015, "stripe stats": 250, "footer": 421, "postscript": 24, "row index stride": 10000, "user metadata": { }, "stripes": [ { "stripe": 0, "rows": 2, "offset": 3, "length": 1012, "index": 570, "data": 243, "footer": 199 } ] }
Imports CSV file into an Orc file using the specified schema. Compound types are not yet supported. delimiter
option indicates the delimiter in the input CSV file and by default is ,
. stripe
option means the stripe size and set to 128MB by default. block
option is compression block size which is 64KB by default. batch
option is by default 1024 rows for one batch.
% csv-import [--delimiter=<character>] [--stripe=<size>] [--block=<size>] [--batch=<size>] <schema> <inputCSVFile> <outputORCFile>
If you run it on the example file TestCSVFileImport.test10rows.csv, you'll see:
% csv-import "struct<a:bigint,b:string,c:double>" examples/TestCSVFileImport.test10rows.csv /tmp/test.orc [2018-04-11 11:12:16] Start importing Orc file... [2018-04-11 11:12:16] Finish importing Orc file. [2018-04-11 11:12:16] Total writer elasped time: 0.001352s. [2018-04-11 11:12:16] Total writer CPU time: 0.001339s.
Scans and displays the row count of the ORC file. With the batch
option to set the batch size which is 1024 rows by default. It is useful to check if the ORC file is damaged.
% orc-scan [--batch=<size>] <filename>
If you run it on the example file TestOrcFile.test1.orc, you'll see:
% orc-scan examples/TestOrcFile.test1.orc Rows: 2 Batches: 1
Displays the file-level and stripe-level column statistics of the ORC file. With the withIndex
option to include column statistics in each row group.
% orc-statistics [--withIndex] <filename>
If you run it on the example file TestOrcFile.TestOrcFile.columnProjection.orc you'll see:
% orc-statistics examples/TestOrcFile.columnProjection.orc File examples/TestOrcFile.columnProjection.orc has 3 columns *** Column 0 *** Column has 21000 values and has null value: no *** Column 1 *** Data type: Integer Values: 21000 Has null: no Minimum: -2147439072 Maximum: 2147257982 Sum: 268482658568 *** Column 2 *** Data type: String Values: 21000 Has null: no Minimum: 100119c272d7db89 Maximum: fffe9f6f23b287f3 Total length: 334559 File examples/TestOrcFile.columnProjection.orc has 5 stripes *** Stripe 0 *** --- Column 0 --- Column has 5000 values and has null value: no --- Column 1 --- Data type: Integer Values: 5000 Has null: no Minimum: -2145365268 Maximum: 2147025027 Sum: -29841423854 --- Column 2 --- Data type: String Values: 5000 Has null: no Minimum: 1005350489418be2 Maximum: fffbb8718c92b09f Total length: 79644 *** Stripe 1 *** --- Column 0 --- Column has 5000 values and has null value: no --- Column 1 --- Data type: Integer Values: 5000 Has null: no Minimum: -2147115959 Maximum: 2147257982 Sum: 108604887785 --- Column 2 --- Data type: String Values: 5000 Has null: no Minimum: 100119c272d7db89 Maximum: fff0ae41d41e6afc Total length: 79640 *** Stripe 2 *** --- Column 0 --- Column has 5000 values and has null value: no --- Column 1 --- Data type: Integer Values: 5000 Has null: no Minimum: -2145932387 Maximum: 2145877119 Sum: 70064190848 --- Column 2 --- Data type: String Values: 5000 Has null: no Minimum: 10130af874ae036c Maximum: fffe9f6f23b287f3 Total length: 79645 *** Stripe 3 *** --- Column 0 --- Column has 5000 values and has null value: no --- Column 1 --- Data type: Integer Values: 5000 Has null: no Minimum: -2147439072 Maximum: 2147074354 Sum: 104681356482 --- Column 2 --- Data type: String Values: 5000 Has null: no Minimum: 102547d48ed06518 Maximum: fffa47c57dc7b69a Total length: 79689 *** Stripe 4 *** --- Column 0 --- Column has 1000 values and has null value: no --- Column 1 --- Data type: Integer Values: 1000 Has null: no Minimum: -2141222223 Maximum: 2145816096 Sum: 14973647307 --- Column 2 --- Data type: String Values: 1000 Has null: no Minimum: 1059d81c9025a217 Maximum: ffc17f0e35e1a6c0 Total length: 15941