You can build this project using Maven:
mvn clean install -DskipTests
The build produces a shaded JAR that can be run using the hadoop command:
hadoop jar parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main
For a shorter command-line invocation, add an alias to your shell like this:
alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
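If you prefer a shell function to an alias, the same invocation can be wrapped as in the sketch below. The PARQUET_CLI_JAR variable and the existence check are illustrative additions, not part of parquet-cli; the path is the same placeholder used in the alias and must be adjusted.

```shell
# Hypothetical variable: point it at the shaded JAR produced by the build.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

# Same command as the alias, but fails early with a clear message
# when the JAR is missing, and forwards all arguments unchanged.
parquet() {
  if [ ! -f "$PARQUET_CLI_JAR" ]; then
    echo "parquet: JAR not found: $PARQUET_CLI_JAR" >&2
    return 1
  fi
  hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

A function forwards arguments through "$@", so file names containing spaces reach the tool intact, and it also works in non-interactive scripts where aliases are usually not expanded.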
To run from the target directory instead of using the hadoop command, first copy the dependencies to the target/dependency folder:
mvn dependency:copy-dependencies
Then, run the command-line tool and add target/dependency/* to the classpath:
java -cp 'target/parquet-cli-1.12.3.jar:target/dependency/*' org.apache.parquet.cli.Main
Note that you shouldn't include the runtime JAR used above in the classpath in this case. In that JAR, the org.apache.avro package
is relocated to avoid conflicts with Hadoop's copy of Avro. That relocation changes method signatures, so it can cause NoSuchMethodError
depending on the class loading order. See PARQUET-2142 for details.
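One way to guard against picking up the runtime JAR by accident is to assemble the classpath explicitly and filter it out. The following is a minimal sketch, not part of parquet-cli: the function name parquet_cli_classpath is hypothetical, and it assumes the build output directory holds the plain parquet-cli JAR alongside the copied dependencies. Any other parquet-cli-*.jar in that directory would be included as well.

```shell
# Print a colon-separated classpath for the given build directory
# (default: target), skipping any *-runtime.jar.
parquet_cli_classpath() {
  target_dir="${1:-target}"
  classpath=""
  for jar in "$target_dir"/parquet-cli-*.jar "$target_dir"/dependency/*.jar; do
    [ -f "$jar" ] || continue        # an unmatched glob stays literal; skip it
    case "$jar" in
      *-runtime.jar) continue ;;     # shaded JAR with the relocated Avro package
    esac
    classpath="${classpath:+$classpath:}$jar"
  done
  printf '%s\n' "$classpath"
}

# Usage, from the module directory after copying the dependencies:
#   java -cp "$(parquet_cli_classpath target)" org.apache.parquet.cli.Main help
```

Listing the JARs individually makes the classpath longer than the wildcard form, but it shows exactly what is loaded.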
The parquet tool includes help for the included commands:
parquet help
Usage: parquet [options] [command] [command options]

  Options:

    -v, --verbose, --debug
        Print extra debugging information

  Commands:

    help
        Retrieves details on the functions of other commands
    meta
        Print a Parquet file's metadata
    pages
        Print page summaries for a Parquet file
    dictionary
        Print dictionaries for a Parquet column
    check-stats
        Check Parquet files for corrupt page and column stats (PARQUET-251)
    schema
        Print the Avro schema for a file
    csv-schema
        Build a schema from a CSV data sample
    convert-csv
        Create a file from CSV data
    convert
        Create a Parquet file from a data file
    to-avro
        Create an Avro file from a data file
    cat
        Print the first N records from a file
    head
        Print the first N records from a file
    column-index
        Prints the column and offset indexes of a Parquet file
    column-size
        Print the column sizes of a parquet file
    prune
        (Deprecated: will be removed in 2.0.0, use rewrite command instead) Prune column(s) in a Parquet file and save it to a new file. The columns left are not changed.
    trans-compression
        (Deprecated: will be removed in 2.0.0, use rewrite command instead) Translate the compression from one to another (It doesn't support bloom filter feature yet).
    masking
        (Deprecated: will be removed in 2.0.0, use rewrite command instead) Replace columns with masked values and write to a new Parquet file
    footer
        Print the Parquet file footer in json format
    bloom-filter
        Check bloom filters for a Parquet column
    scan
        Scan all records from a file
    rewrite
        Rewrite one or more Parquet files to a new Parquet file

  Examples:

    # print information for create
    parquet help meta

  See 'parquet help <command>' for more information on a specific command.