Parquet MR

Parquet-mr is the Java implementation of the Parquet format for use in Hadoop. It uses the record shredding and assembly algorithm described in the Dremel paper. Integration with Pig and Map/Reduce is provided.

Apache Pig integration

A Loader and a Storer are provided to read and write Parquet files with Apache Pig.

Map/Reduce integration


A mapping from Thrift to the Parquet schema is provided for classes that extend TBase. You can read and write Parquet files using Thrift-generated classes.

Create your own objects

  • The ParquetOutputFormat can be provided a WriteSupport to write your own objects to an event-based RecordConsumer.
  • The ParquetInputFormat can be provided a ReadSupport to materialize your own POJOs by implementing a RecordMaterializer.
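
The event-based pattern described above can be illustrated with a minimal, self-contained sketch. Every type in it (SketchRecordConsumer, SketchWriteSupport, User, UserWriteSupport, UserMaterializer) is a hypothetical stand-in invented for this example; none of them is the parquet-mr API, and the real WriteSupport, ReadSupport, RecordConsumer and RecordMaterializer signatures may differ. The sketch only shows the idea: the write side turns an object into a stream of field events, and the read side rebuilds a POJO from those events.

```java
// All types below are hypothetical stand-ins for illustration, not the parquet-mr API.
class EventWriteSketch {

    // Simplified event-based consumer: one call per structural event of a record.
    interface SketchRecordConsumer {
        void startMessage();
        void startField(String name, int index);
        void addInteger(int value);
        void addString(String value);
        void endField(String name, int index);
        void endMessage();
    }

    // Simplified write support: translates one object into consumer events.
    interface SketchWriteSupport<T> {
        void write(T record, SketchRecordConsumer consumer);
    }

    // Example POJO that is written and then materialized again.
    static final class User {
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "User(id=" + id + ", name=" + name + ")";
        }
    }

    // Write side: emits one start/value/end triple per field of the User.
    static final class UserWriteSupport implements SketchWriteSupport<User> {
        @Override
        public void write(User user, SketchRecordConsumer consumer) {
            consumer.startMessage();
            consumer.startField("id", 0);
            consumer.addInteger(user.id);
            consumer.endField("id", 0);
            consumer.startField("name", 1);
            consumer.addString(user.name);
            consumer.endField("name", 1);
            consumer.endMessage();
        }
    }

    // Read side: rebuilds a User from the event stream it receives.
    static final class UserMaterializer implements SketchRecordConsumer {
        private String currentField;
        private int id;
        private String name;
        private User current;

        @Override public void startMessage() { current = null; }
        @Override public void startField(String field, int index) { currentField = field; }
        @Override public void addInteger(int value) {
            if ("id".equals(currentField)) { id = value; }
        }
        @Override public void addString(String value) {
            if ("name".equals(currentField)) { name = value; }
        }
        @Override public void endField(String field, int index) { currentField = null; }
        @Override public void endMessage() { current = new User(id, name); }

        // Returns the record assembled by the most recent endMessage() call.
        User getCurrentRecord() { return current; }
    }

    public static void main(String[] args) {
        UserMaterializer materializer = new UserMaterializer();
        new UserWriteSupport().write(new User(7, "ada"), materializer);
        System.out.println(materializer.getCurrentRecord());
    }
}
```

In this sketch the write support feeds the materializer directly to keep the round trip visible; in an actual job the events would pass through the file format between writing and reading.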

See the APIs:


To run the unit tests: mvn test

To build the jars: mvn package

The build runs in Travis CI.

Add Parquet as a dependency in Maven

Snapshot releases


Official releases

We haven't published a 1.0.0 release yet.

Authors and contributors



Copyright 2012 Twitter, Inc.

Licensed under the Apache License, Version 2.0: