
Parquet Tools

Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.

Build

To use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:

cd parquet-tools && mvn clean package -Plocal 

To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-<VERSION>.jar. Copy it to wherever you want to use it.
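
Because the jar name embeds the version, a script that copies it may prefer to locate it by pattern. The following is a minimal sketch, not part of parquet-tools: the function name find_built_jar is invented here, and it assumes the build leaves the jar directly under target/. If the build also produces other jars with the same prefix (for example a sources jar), the first match may not be the one you want.

```shell
# find_built_jar MODULE_DIR   (hypothetical helper, not shipped with parquet-tools)
# Prints the path of the first file matching MODULE_DIR/target/parquet-tools-*.jar.
find_built_jar() {
    for candidate in "$1"/target/parquet-tools-*.jar; do
        # An unmatched glob stays as the literal pattern, so check for a real file.
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no parquet-tools jar found under $1/target" >&2
    return 1
}
```

After a build, the output of find_built_jar parquet-tools can be passed to cp along with a destination directory of your choice.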

Run from hadoop

See Commands Usage for the available commands.

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Run locally

See Commands Usage for the available commands.

java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To see usage instructions for all commands:

java -jar ./parquet-tools-<VERSION>.jar --help

Note: To run it on Hadoop, use hadoop jar instead of java -jar.
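
The choice between the two launchers can be wrapped in a small shell function. This is a minimal sketch, not part of parquet-tools: the function names and the PARQUET_TOOLS_MODE and PARQUET_TOOLS_JAR variables are invented for illustration, and arguments are passed through to the jar unchanged.

```shell
# parquet_tools_launcher MODE   (hypothetical helper)
# Prints the launcher prefix for MODE, which is "local" or "hadoop".
parquet_tools_launcher() {
    case "$1" in
        hadoop) printf '%s\n' "hadoop jar" ;;
        local)  printf '%s\n' "java -jar" ;;
        *)      printf 'unknown mode: %s\n' "$1" >&2; return 1 ;;
    esac
}

# parquet_tools ARGS...   (hypothetical wrapper)
# Runs the jar named by PARQUET_TOOLS_JAR with the launcher selected by
# PARQUET_TOOLS_MODE (assumed default: local).
parquet_tools() {
    if [ -z "${PARQUET_TOOLS_JAR:-}" ]; then
        echo "set PARQUET_TOOLS_JAR to the path of the built jar" >&2
        return 1
    fi
    pt_launcher="$(parquet_tools_launcher "${PARQUET_TOOLS_MODE:-local}")" || return 1
    # Word splitting of $pt_launcher is intentional: it holds two words.
    $pt_launcher "$PARQUET_TOOLS_JAR" "$@"
}
```

With PARQUET_TOOLS_JAR pointing at the built jar, parquet_tools --help prints the usage instructions locally; setting PARQUET_TOOLS_MODE=hadoop runs the same arguments through hadoop jar.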

Meta Legend

Row Group Totals

Acronym    Definition
RC         Row Count
TS         Total Byte Size

Row Group Column Details

Acronym           Definition
DO                Dictionary Page Offset
FPO               First Data Page Offset
SZ:{x}/{y}/{z}    Size in bytes. x = compressed total, y = uncompressed total, z = y:x ratio
VC                Value Count
RLE               Run-Length Encoding
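
As a worked example of the SZ entry, z is the uncompressed total divided by the compressed total. The sketch below reproduces that arithmetic; the function name sz_ratio is invented here, and the two-decimal formatting is an assumption rather than the tool's documented output format.

```shell
# sz_ratio COMPRESSED UNCOMPRESSED   (hypothetical helper)
# Prints z = y / x, where x is the compressed and y the uncompressed byte total.
sz_ratio() {
    awk -v x="$1" -v y="$2" 'BEGIN {
        # A non-positive compressed size has no meaningful ratio.
        if (x <= 0) { exit 1 }
        printf "%.2f\n", y / x
    }'
}

sz_ratio 1024 4096   # prints 4.00
```

A column chunk with 1024 compressed bytes and 4096 uncompressed bytes therefore has a ratio of 4, meaning the data occupies a quarter of its uncompressed size on disk.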