change buffer size to match rle threshold
1 file changed
tree: 68c23f6765e4868b547a901a4a34134bb7118e83
  1. doc/
  2. parquet-avro/
  3. parquet-cascading/
  4. parquet-column/
  5. parquet-hadoop/
  6. parquet-pig/
  7. parquet-scrooge/
  8. parquet-test-hadoop2/
  9. parquet-thrift/
  10. src/
  11. .gitignore
  12. .travis.yml
  13. LICENSE
  14. NOTICE
  15. optimize_cost.py
  16. pom.xml
  17. README.md
README.md

Parquet MR

Parquet-mr is the Java implementation of the Parquet format for use in Hadoop. It uses the record shredding and assembly algorithm described in the Dremel paper. Integration with Pig and Map/Reduce is provided.

Apache Pig integration

A Loader and a Storer are provided to read and write Parquet files with Apache Pig.

Map/Reduce integration

Thrift

A mapping from Thrift to the Parquet schema is provided for classes that extend TBase. You can read and write Parquet files using Thrift-generated classes.

Create your own objects

  • The ParquetOutputFormat can be given a WriteSupport that writes your own objects to an event-based RecordConsumer.
  • The ParquetInputFormat can be given a ReadSupport that materializes your own POJOs by implementing a RecordMaterializer.
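The event-based write path described above can be illustrated with a minimal, self-contained sketch. The interfaces here (SketchRecordConsumer, SketchWriteSupport) and the User class are hypothetical stand-ins defined only for this example; they are not the Parquet classes, whose methods and signatures may differ. The point is only the shape of the idea: a write support turns each object into a stream of start/add/end events.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an event-based record consumer (NOT the Parquet API).
interface SketchRecordConsumer {
    void startMessage();
    void startField(String name, int index);
    void addInteger(int value);
    void addString(String value);
    void endField(String name, int index);
    void endMessage();
}

// Hypothetical stand-in for a write support: converts one object into consumer events.
interface SketchWriteSupport<T> {
    void write(T record, SketchRecordConsumer consumer);
}

// Example POJO, invented for this sketch.
final class User {
    final int id;
    final String name; // may be null, treated as an absent optional field
    User(int id, String name) { this.id = id; this.name = name; }
}

// Emits one message per User; a null name produces no events for that field.
final class UserWriteSupport implements SketchWriteSupport<User> {
    @Override
    public void write(User user, SketchRecordConsumer consumer) {
        consumer.startMessage();
        consumer.startField("id", 0);
        consumer.addInteger(user.id);
        consumer.endField("id", 0);
        if (user.name != null) {
            consumer.startField("name", 1);
            consumer.addString(user.name);
            consumer.endField("name", 1);
        }
        consumer.endMessage();
    }
}

// Consumer that records each event as a string so the stream can be inspected.
final class LoggingConsumer implements SketchRecordConsumer {
    final List<String> events = new ArrayList<>();
    @Override public void startMessage() { events.add("startMessage"); }
    @Override public void startField(String name, int index) { events.add("startField(" + name + "," + index + ")"); }
    @Override public void addInteger(int value) { events.add("addInteger(" + value + ")"); }
    @Override public void addString(String value) { events.add("addString(" + value + ")"); }
    @Override public void endField(String name, int index) { events.add("endField(" + name + "," + index + ")"); }
    @Override public void endMessage() { events.add("endMessage"); }
}

public class WriteSupportSketch {
    // Runs the write support against a logging consumer and returns the event stream.
    static List<String> eventsFor(User user) {
        LoggingConsumer consumer = new LoggingConsumer();
        new UserWriteSupport().write(user, consumer);
        return consumer.events;
    }

    public static void main(String[] args) {
        for (String event : eventsFor(new User(7, "ada"))) {
            System.out.println(event);
        }
    }
}
```

The read side mirrors this: a materializer receives the same kind of events and assembles an object from them. A real implementation would also declare a schema, which this sketch leaves out.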

See the APIs:

Build

To run the unit tests: mvn test

To build the jars: mvn package

The build runs in Travis CI: Build Status

Add Parquet as a dependency in Maven

Snapshot releases

  <repositories>
    <repository>
      <id>sonatype-nexus-snapshots</id>
      <url>https://oss.sonatype.org/content/repositories/snapshots</url>
      <releases>
        <enabled>false</enabled>
      </releases>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>parquet-column</artifactId>
      <version>1.0.0-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>parquet-hadoop</artifactId>
      <version>1.0.0-SNAPSHOT</version>
    </dependency>
  </dependencies>

Official releases

We haven't published a 1.0.0 release yet.

Authors and contributors

Discussions

License

Copyright 2012 Twitter, Inc.

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0