
Parquet Tools

Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.

Build

To use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:

cd parquet-tools && mvn clean package -Plocal 

To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-.jar; you can copy it to wherever you want to use it.

Run from Hadoop

See Commands Usage for the available commands.

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Run locally

See Commands Usage for the available commands.

java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To see usage instructions for all commands:

java -jar ./parquet-tools-<VERSION>.jar --help

Note: To run it on Hadoop, use hadoop jar instead of java -jar.

Meta Legend

Row Group Totals

Acronym   Definition
RC        Row Count
TS        Total Byte Size

Row Group Column Details

Acronym          Definition
DO               Dictionary Page Offset
FPO              First Data Page Offset
SZ:{x}/{y}/{z}   Size in bytes. x = compressed total, y = uncompressed total, z = y:x ratio
VC               Value Count
RLE              Run-Length Encoding
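To make the SZ ratio concrete, here is a minimal Java sketch that parses an SZ token and recomputes z as y divided by x (uncompressed total over compressed total). The class name SizeToken, its methods, and the sample token SZ:100/250/2.50 are hypothetical and not part of parquet-tools; the exact way the tool formats z (two decimals here) is also an assumption.

```java
import java.util.Locale;

// Hypothetical helper (not part of parquet-tools): models one
// "SZ:{x}/{y}/{z}" token from the meta output.
final class SizeToken {
    final long compressed;   // x: compressed total, in bytes
    final long uncompressed; // y: uncompressed total, in bytes
    final double ratio;      // z: the ratio as printed in the token

    SizeToken(long compressed, long uncompressed, double ratio) {
        this.compressed = compressed;
        this.uncompressed = uncompressed;
        this.ratio = ratio;
    }

    // Parses a token such as "SZ:100/250/2.50" into its three fields.
    static SizeToken parse(String token) {
        if (!token.startsWith("SZ:")) {
            throw new IllegalArgumentException("not an SZ token: " + token);
        }
        String[] parts = token.substring("SZ:".length()).split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected x/y/z in: " + token);
        }
        return new SizeToken(
                Long.parseLong(parts[0]),
                Long.parseLong(parts[1]),
                Double.parseDouble(parts[2]));
    }

    // Recomputes z = y / x; returns 0.0 for an empty chunk to avoid dividing by zero.
    double recomputedRatio() {
        return compressed == 0 ? 0.0 : (double) uncompressed / compressed;
    }

    public static void main(String[] args) {
        SizeToken sz = parse("SZ:100/250/2.50"); // hypothetical sample token
        // Prints the recomputed ratio with two decimals ("2.50" for the sample).
        System.out.println(String.format(Locale.ROOT, "%.2f", sz.recomputedRatio()));
    }
}
```

With the sample token, 250 / 100 gives 2.50, so a z value above 1 means the column chunk takes fewer bytes after compression than before.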