tree: a62c30c1e23ed75452adccddd5a8e9f6e7dd7754 [path history] [tgz]
  1. src/
  2. pom.xml
  3. README.md
  4. REVIEWERS.md
parquet-tools/README.md

Parquet Tools

Parquet-Tools contains java based command line tools that aid in the inspection of Parquet files.

Currently these tools are available for UN*X systems.

Build

If you want to use parquet-tools in local mode, you should use the local profile so the hadoop client dependency is included.

cd parquet-tools && mvn clean package -Plocal 

To use it in hadoop mode, the default profile will exclude the hadoop client dependency

cd parquet-tools && mvn clean package 

The resulting jar is target/parquet-tools-.jar, you can copy it to the place where you want to use it

#Run from hadoop

See Commands Usage for command to use

hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

#Run locally

See Commands Usage for command to use

java jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet

Commands Usage

To run it on hadoop, you should use “hadoop jar” instead of “java jar”

usage: java jar ./parquet-tools-<VERSION>.jar cat [option...] <input>
where option is one of:
       --debug     Disable color output even if supported
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: java jar ./parquet-tools-<VERSION>.jar head [option...] <input>
where option is one of:
       --debug          Disable color output even if supported
    -h,--help           Show this help string
    -n,--records <arg>  The number of records to show (default: 5)
       --no-color       Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: java jar ./parquet-tools-<VERSION>.jar schema [option...] <input>
where option is one of:
    -d,--detailed <arg>  Show detailed information about the schema.
       --debug           Disable color output even if supported
    -h,--help            Show this help string
       --no-color        Disable color output even if supported
where <input> is the parquet file containing the schema to show

usage: java jar ./parquet-tools-<VERSION>.jar meta [option...] <input>
where option is one of:
       --debug     Disable color output even if supported
    -h,--help      Show this help string
       --no-color  Disable color output even if supported
where <input> is the parquet file to print to stdout

usage: java jar dump [option...] <input>
where option is one of:
    -c,--column <arg>  Dump only the given column, can be specified more than
                       once
    -d,--disable-data  Do not dump column data
       --debug         Disable color output even if supported
    -h,--help          Show this help string
    -m,--disable-meta  Do not dump row group and page metadata
       --no-color      Disable color output even if supported
where <input> is the parquet file to print to stdout