
Parquet Tools

Parquet-Tools contains Java-based command-line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.

Build

To use parquet-tools in local mode, build with the local profile so that the Hadoop client dependency is included:

cd parquet-tools && mvn clean package -Plocal 

To use it in Hadoop mode, build with the default profile, which excludes the Hadoop client dependency:

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-.jar; copy it to wherever you want to use it.

Run from Hadoop

See Commands Usage for the available commands.

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Run locally

See Commands Usage for the available commands.

java -jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To see usage instructions for all commands:

java -jar ./parquet-tools-<VERSION>.jar --help

Note: To run on Hadoop, use hadoop jar instead of java -jar.

Meta Legend

Abbreviations used in the output of the meta command:

Row Group Totals

Acronym   Definition
RC        Row Count
TS        Total Byte Size

Row Group Column Details

Acronym          Definition
DO               Dictionary Page Offset
FPO              First Data Page Offset
SZ:{x}/{y}/{z}   Size in bytes. x = compressed total, y = uncompressed total, z = y:x ratio
VC               Value Count
RLE              Run-Length Encoding
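To make the SZ:{x}/{y}/{z} entry concrete, the following is a minimal sketch that splits such a token into its three parts and recomputes z as y / x (uncompressed total divided by compressed total). The class name SzLegend, its methods, and the sample token "SZ:100/250/2.50" are hypothetical illustrations, not part of parquet-tools, and the exact number formatting the meta command uses for z is an assumption here.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of parquet-tools): parses an SZ:{x}/{y}/{z}
 * token as described in the Meta Legend and recomputes the ratio z = y / x.
 */
public class SzLegend {

    /** x = compressed total bytes, y = uncompressed total bytes, z = y:x ratio. */
    record Sz(long compressed, long uncompressed, double ratio) {}

    /** Splits "SZ:x/y/z" into its three numeric fields. */
    static Sz parse(String token) {
        if (!token.startsWith("SZ:")) {
            throw new IllegalArgumentException("not an SZ token: " + token);
        }
        String[] parts = token.substring(3).split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected x/y/z after SZ: in " + token);
        }
        return new Sz(Long.parseLong(parts[0]),
                      Long.parseLong(parts[1]),
                      Double.parseDouble(parts[2]));
    }

    /** Recomputes z: uncompressed size divided by compressed size. */
    static double expectedRatio(long compressed, long uncompressed) {
        return (double) uncompressed / compressed;
    }

    public static void main(String[] args) {
        // Assumed sample token: 100 bytes compressed, 250 bytes uncompressed.
        Sz sz = parse("SZ:100/250/2.50");
        double z = expectedRatio(sz.compressed(), sz.uncompressed());
        // Locale.ROOT keeps the decimal separator a '.'; prints 2.50
        System.out.println(String.format(Locale.ROOT, "%.2f", z));
    }
}
```

A ratio above 1 means the column chunk shrank under compression; a value below 1 means the compressed form is larger than the raw data.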