Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.
Currently, these tools are available for UN*X systems.

If you want to use parquet-tools in local mode, build with the `local` profile so that the Hadoop client dependency is included:

```shell
cd parquet-tools && mvn clean package -Plocal
```
To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

```shell
cd parquet-tools && mvn clean package
```
The resulting jar is `target/parquet-tools-<VERSION>.jar`; copy it to wherever you want to use it.

To run it in Hadoop mode (see Commands Usage for the available commands):

```shell
hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
```
To run it in local mode (see Commands Usage for the available commands):

```shell
java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
```
To see usage instructions for all commands:

```shell
java -jar ./parquet-tools-<VERSION>.jar --help
```
Note: To run it on Hadoop, use `hadoop jar` instead of `java -jar`.

Row group abbreviations:

| Acronym | Definition |
|---|---|
| RC | Row Count |
| TS | Total Byte Size |

Column chunk abbreviations:

| Acronym | Definition |
|---|---|
| DO | Dictionary Page Offset |
| FPO | First Data Page Offset |
| SZ:{x}/{y}/{z} | Sizes: x = total compressed size in bytes, y = total uncompressed size in bytes, z = y:x ratio (y divided by x) |
| VC | Value Count |
| RLE | Run-Length Encoding |
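
The SZ field packs three numbers into one token, and z is derived from the other two. As a worked example: with x = 2048 compressed bytes and y = 8192 uncompressed bytes, z = 8192 / 2048 = 4.00. The minimal Java sketch below splits such a token and recomputes z. The class `SzField`, its methods, and the two-decimal formatting of z are hypothetical illustrations, not part of the parquet-tools API.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of parquet-tools) that splits an
 * SZ:{x}/{y}/{z} token: x = compressed bytes, y = uncompressed bytes, z = y/x.
 */
public final class SzField {
    final long compressedBytes;   // x
    final long uncompressedBytes; // y
    final double ratio;           // z, as it appears in the token

    SzField(long compressedBytes, long uncompressedBytes, double ratio) {
        this.compressedBytes = compressedBytes;
        this.uncompressedBytes = uncompressedBytes;
        this.ratio = ratio;
    }

    /** Parses a token such as "SZ:2048/8192/4.00". */
    static SzField parse(String token) {
        if (!token.startsWith("SZ:")) {
            throw new IllegalArgumentException("not an SZ token: " + token);
        }
        String[] parts = token.substring(3).split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected x/y/z: " + token);
        }
        return new SzField(
                Long.parseLong(parts[0]),
                Long.parseLong(parts[1]),
                Double.parseDouble(parts[2]));
    }

    /** Recomputes z from x and y (uncompressed divided by compressed). */
    double computedRatio() {
        return (double) uncompressedBytes / compressedBytes;
    }

    /** Builds an SZ token from x and y; two decimals for z is an assumption. */
    static String format(long compressedBytes, long uncompressedBytes) {
        double ratio = (double) uncompressedBytes / compressedBytes;
        return String.format(Locale.ROOT, "SZ:%d/%d/%.2f",
                compressedBytes, uncompressedBytes, ratio);
    }

    public static void main(String[] args) {
        SzField sz = parse("SZ:2048/8192/4.00");
        System.out.println(sz.computedRatio()); // 4.0
        System.out.println(format(2048, 8192)); // SZ:2048/8192/4.00
    }
}
```

A ratio above 1 means the column chunk shrank under compression; a value near 1 suggests compression gained little for that column.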