Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.
Currently these tools are available for UN*X systems.
To use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:
cd parquet-tools && mvn clean package -Plocal
To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:
cd parquet-tools && mvn clean package
The resulting jar is target/parquet-tools-<VERSION>.jar. You can copy it to wherever you want to use it.
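
Because the jar name embeds the version, a script that copies it usually has to discover the file name first. The following is a minimal sketch of one way to do that; the function name `find_tools_jar` and the default directory argument are illustrative assumptions, not part of parquet-tools.

```shell
# Hypothetical helper (not shipped with parquet-tools): print the path of the
# first parquet-tools-*.jar found in the given directory (default: target).
find_tools_jar() {
  target_dir="${1:-target}"
  for jar in "$target_dir"/parquet-tools-*.jar; do
    # If the glob matched nothing, $jar is the literal pattern and -f fails.
    if [ -f "$jar" ]; then
      printf '%s\n' "$jar"
      return 0
    fi
  done
  return 1
}

# Example use after a build (the destination directory is an assumption):
#   cp "$(find_tools_jar target)" /opt/tools/
```

If several jars match, the sketch returns only the first one in glob order, so clean the target directory before building if that matters.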
To run in Hadoop mode (see Commands Usage for the available commands):
hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
To run in local mode (see Commands Usage for the available commands):
java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
To see usage instructions for all commands:
java -jar ./parquet-tools-<VERSION>.jar --help
Note: To run it on Hadoop, use hadoop jar instead of java -jar.
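
The only difference between the two modes is the launcher, so a small wrapper can keep scripts mode-agnostic. This is a sketch under stated assumptions: the function `parquet_tools_cmd` and the `PARQUET_TOOLS_JAR` variable are hypothetical names introduced here, and the function only prints the command line rather than running it.

```shell
# Hypothetical wrapper (not part of parquet-tools): print the launch command
# for the requested mode. PARQUET_TOOLS_JAR is an assumed variable that
# points at the built jar; the fallback is the placeholder name used above.
parquet_tools_cmd() {
  mode="$1"
  shift
  jar="${PARQUET_TOOLS_JAR:-./parquet-tools-<VERSION>.jar}"
  case "$mode" in
    hadoop) printf 'hadoop jar %s' "$jar" ;;  # Hadoop mode launcher
    local)  printf 'java -jar %s' "$jar" ;;   # local mode launcher
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
  # Append the tool command and its arguments unchanged.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}
```

For example, with `PARQUET_TOOLS_JAR=pt.jar`, calling `parquet_tools_cmd local <command> my_parquet_file.lzo.parquet` prints the `java -jar` form, and replacing `local` with `hadoop` prints the `hadoop jar` form. Printing instead of executing makes the wrapper easy to check before use.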
Legend for row group totals:

Acronym | Definition |
---|---|
RC | Row Count |
TS | Total Byte Size |

Legend for row group column details:

Acronym | Definition |
---|---|
DO | Dictionary Page Offset |
FPO | First Data Page Offset |
SZ:{x}/{y}/{z} | Size in bytes. x = Compressed total, y = uncompressed total, z = y:x ratio |
VC | Value Count |
RLE | Run-Length Encoding |
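
To make the SZ entry concrete, the sketch below splits a token of the form SZ:{x}/{y}/{z} and recomputes the ratio as y divided by x. The function name `sz_fields` and the sample values are made up for illustration, and the token layout is assumed to match the legend exactly.

```shell
# Hypothetical helper: split an SZ:{x}/{y}/{z} token and recompute the ratio
# z = y / x (uncompressed total over compressed total).
sz_fields() {
  token="${1#SZ:}"   # drop the leading "SZ:" label
  printf '%s\n' "$token" | LC_ALL=C awk -F/ '
    $1 == 0 { exit 1 }   # avoid dividing by a zero compressed size
    { printf "compressed=%d uncompressed=%d ratio=%.2f\n", $1, $2, $2 / $1 }'
}

# Sample token with invented numbers: 100 bytes compressed, 250 uncompressed.
sz_fields "SZ:100/250/2.50"
# prints: compressed=100 uncompressed=250 ratio=2.50
```

A ratio above 1 means the column chunk takes less space compressed than uncompressed.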