You can build this project using Maven:
mvn clean install -DskipTests
The build produces a shaded JAR that can be run using the hadoop command:
hadoop jar parquet-cli-1.9.1-runtime.jar org.apache.parquet.cli.Main
For a shorter command-line invocation, add an alias to your shell like this:
alias parquet="hadoop jar /path/to/parquet-cli-1.9.1-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
To run from the target directory instead of using the hadoop command, first copy the dependencies to a folder:
mvn dependency:copy-dependencies
Then run the command-line tool with target/* and target/dependency/* on the classpath:
java -cp 'target/*:target/dependency/*' org.apache.parquet.cli.Main
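As a counterpart to the hadoop alias, the java invocation above can be wrapped in a shell function. This is a minimal sketch and not part of the project: PARQUET_CLI_HOME and parquet_classpath are hypothetical names introduced here, and the sketch assumes the project was built and its dependencies copied as described above.

```shell
# Hypothetical variable: root of the parquet-cli checkout (defaults to the current directory).
PARQUET_CLI_HOME="${PARQUET_CLI_HOME:-$PWD}"

# Build the same classpath as the java command above, anchored at PARQUET_CLI_HOME.
# The '*' wildcards are printed literally so that java, not the shell, expands them.
parquet_classpath() {
  printf '%s/target/*:%s/target/dependency/*' "$PARQUET_CLI_HOME" "$PARQUET_CLI_HOME"
}

# Run the CLI main class and forward all arguments unchanged.
parquet() {
  java -cp "$(parquet_classpath)" org.apache.parquet.cli.Main "$@"
}
```

With this function defined, `parquet help` should behave like the direct java command. A function is used instead of an alias so the classpath is resolved at call time and does not depend on the current working directory.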
The parquet tool includes built-in help for its commands:
parquet help
Usage: parquet [options] [command] [command options]

  Options:

    -v, --verbose, --debug
        Print extra debugging information

  Commands:

    help
        Retrieves details on the functions of other commands
    meta
        Print a Parquet file's metadata
    pages
        Print page summaries for a Parquet file
    dictionary
        Print dictionaries for a Parquet column
    check-stats
        Check Parquet files for corrupt page and column stats (PARQUET-251)
    schema
        Print the Avro schema for a file
    csv-schema
        Build a schema from a CSV data sample
    convert-csv
        Create a file from CSV data
    convert
        Create a Parquet file from a data file
    to-avro
        Create an Avro file from a data file
    cat
        Print the first N records from a file
    head
        Print the first N records from a file

  Examples:

    # print information for create
    parquet help create

  See 'parquet help <command>' for more information on a specific command.