
Parquet Tools

Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.


If you want to use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:

cd parquet-tools && mvn clean package -Plocal 

To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-<VERSION>.jar. You can copy it to wherever you want to use it.
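The copy step can be sketched as a small shell function. This is only an illustration: the `install_parquet_tools` name is hypothetical, and it assumes the build left exactly one parquet-tools jar in the target directory.

```shell
# Hypothetical helper: copy the built jar from a build directory to a destination.
# Assumes exactly one parquet-tools-*.jar exists in the build directory.
install_parquet_tools() {
  target_dir="$1"
  dest_dir="$2"
  # Expand the glob into the positional parameters so the matches can be counted.
  set -- "$target_dir"/parquet-tools-*.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one parquet-tools jar in $target_dir" >&2
    return 1
  fi
  # Create the destination if needed, then copy the jar into it.
  mkdir -p "$dest_dir" && cp "$1" "$dest_dir"/
}
```

From inside the parquet-tools directory, it could be called as `install_parquet_tools target "$HOME/bin"`, where the destination is whatever directory you choose.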

Run from hadoop

See Commands Usage for the command to use.

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Run locally

See Commands Usage for the command to use.

java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To see usage instructions for all commands:

java -jar ./parquet-tools-<VERSION>.jar --help

Note: To run it on Hadoop, use hadoop jar instead of java -jar.
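If you switch between the two modes often, the choice of launcher can be wrapped in a helper. The sketch below is not part of parquet-tools; the `parquet_tools_launcher` name and the `local`/`hadoop` mode labels are assumptions made for illustration.

```shell
# Hypothetical helper: print the launcher that matches a run mode.
# "local" and "hadoop" are labels chosen here for the two run modes.
parquet_tools_launcher() {
  case "$1" in
    local)  echo "java -jar" ;;
    hadoop) echo "hadoop jar" ;;
    *)      echo "unknown mode: $1 (expected local or hadoop)" >&2; return 1 ;;
  esac
}
```

For example, `$(parquet_tools_launcher hadoop) ./parquet-tools-<VERSION>.jar --help` expands to the hadoop jar form of the help command.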

Meta Legend

Row Group Totals

RC    Row Count
TS    Total Byte Size

Row Group Column Details

DO                Dictionary Page Offset
FPO               First Data Page Offset
SZ:{x}/{y}/{z}    Size in bytes. x = compressed total, y = uncompressed total, z = y:x ratio
VC                Value Count
RLE               Run-Length Encoding
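
To make the SZ field concrete, here is a small sketch that derives the three values from a compressed and an uncompressed byte count. The `sz_field` name and the two-decimal formatting of the ratio are assumptions made for illustration; the tool's exact output format may differ.

```shell
# Hypothetical helper: format an SZ field from two byte counts.
# $1 = compressed total (x), $2 = uncompressed total (y); z is computed as y / x.
sz_field() {
  awk -v x="$1" -v y="$2" 'BEGIN {
    # Guard against dividing by zero when the compressed size is missing or zero.
    if (x <= 0) { print "compressed size must be positive" > "/dev/stderr"; exit 1 }
    printf "SZ:%d/%d/%.2f\n", x, y, y / x
  }'
}

sz_field 1024 4096   # prints SZ:1024/4096/4.00
```

A ratio above 1 means the column data shrank under compression; here 4096 uncompressed bytes were stored in 1024 bytes, a ratio of 4.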