parquet-cli - parquet-java

Building

You can build this project using Maven:

mvn clean install -DskipTests

Running

The build produces a shaded JAR that can be run using the hadoop command:

hadoop jar parquet-cli-1.9.1-runtime.jar org.apache.parquet.cli.Main

For a shorter command-line invocation, add an alias to your shell like this:

alias parquet="hadoop jar /path/to/parquet-cli-1.9.1-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
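As an alternative to the alias, a shell function can check that the jar exists before calling hadoop. The following is a minimal sketch, not part of parquet-cli: the `PARQUET_CLI_JAR` variable is a hypothetical name chosen for illustration, and the default path is the same placeholder used in the alias above.

```shell
# Sketch of a wrapper function equivalent to the alias.
# PARQUET_CLI_JAR is an assumed, user-defined variable (not read by parquet-cli itself).
parquet() {
  parquet_jar="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.9.1-runtime.jar}"
  # Fail early with a readable message if the jar path is wrong.
  if [ ! -f "$parquet_jar" ]; then
    echo "parquet: runtime jar not found: $parquet_jar" >&2
    return 1
  fi
  # Same invocation as the alias, with the caller's arguments appended.
  hadoop jar "$parquet_jar" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

Placed in a shell startup file, this behaves like the alias but reports a missing jar instead of passing a bad path to hadoop.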

Running without Hadoop

To run from the target directory instead of using the hadoop command, first copy the dependencies to a folder:

mvn dependency:copy-dependencies

Then, run the command-line tool with target/* and target/dependency/* on the classpath:

java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main
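The java invocation above can also be wrapped in a small function so that it works from any directory. This is a sketch under assumptions: `PARQUET_CLI_HOME`, `parquet_classpath`, and `parquet_local` are hypothetical names introduced here, and `PARQUET_CLI_HOME` is expected to point at the parquet-cli module directory that contains target/.

```shell
# PARQUET_CLI_HOME is an assumed variable naming the parquet-cli module directory.
parquet_classpath() {
  parquet_home="${PARQUET_CLI_HOME:-.}"
  # The wildcards stay quoted so that java, not the shell, expands them to jar files.
  printf '%s\n' "$parquet_home/target/*:$parquet_home/target/dependency/*"
}

# Run the CLI main class directly, without the hadoop command.
parquet_local() {
  java -cp "$(parquet_classpath)" org.apache.parquet.cli.Main "$@"
}
```

For example, `parquet_local help` would print the same usage text as the hadoop-based command, provided the build and the dependency copy step have both been run.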

Help

The parquet tool includes built-in help for its commands:

parquet help

Usage: parquet [options] [command] [command options]

  Options:

    -v, --verbose, --debug
        Print extra debugging information

  Commands:

    help
        Retrieves details on the functions of other commands
    meta
        Print a Parquet file's metadata
    pages
        Print page summaries for a Parquet file
    dictionary
        Print dictionaries for a Parquet column
    check-stats
        Check Parquet files for corrupt page and column stats (PARQUET-251)
    schema
        Print the Avro schema for a file
    csv-schema
        Build a schema from a CSV data sample
    convert-csv
        Create a file from CSV data
    convert
        Create a Parquet file from a data file
    to-avro
        Create an Avro file from a data file
    cat
        Print the first N records from a file
    head
        Print the first N records from a file

  Examples:

    # print information for create
    parquet help create

  See 'parquet help <command>' for more information on a specific command.
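As an illustration of how the listed commands fit together, the sketch below runs schema, meta, and head against a single file. It assumes that a `parquet` alias or function is already defined and that each of these commands accepts the file path as its argument, which `parquet help <command>` can confirm. The function name `inspect_parquet` is hypothetical.

```shell
# Print the schema, metadata, and first records of one Parquet file.
# Assumes a working `parquet` command is available in the current shell.
inspect_parquet() {
  parquet_file="${1:-}"
  if [ -z "$parquet_file" ]; then
    echo "usage: inspect_parquet <file>" >&2
    return 2
  fi
  # Stop at the first command that fails, for example on an unreadable file.
  parquet schema "$parquet_file" &&
    parquet meta "$parquet_file" &&
    parquet head "$parquet_file"
}
```

A call such as `inspect_parquet data.parquet` runs the three commands in order and stops at the first one that fails.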