commit | 42e80eeac1fef4a7ac38fa3f2e71699751ef099e | |
---|---|---|
author | Alex Levenson <alexlevenson@twitter.com> | Fri May 31 01:26:19 2013 -0700 |
committer | Alex Levenson <alexlevenson@twitter.com> | Fri May 31 01:26:19 2013 -0700 |
tree | 68c23f6765e4868b547a901a4a34134bb7118e83 | |
parent | e3fb1336587cf7210a9995214bd06fc20189ec16 | |
change buffer size to match rle threshold
Parquet-mr is the Java implementation of the Parquet format, intended for use in Hadoop. It uses the record shredding and assembly algorithm described in the Dremel paper. Integration with Pig and Map/Reduce is provided.
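As a rough illustration of the record shredding idea, the sketch below computes a Dremel-style definition level for one column path: the number of optional fields along the path that are actually present in a record. It is not parquet-mr code. The `DefinitionLevelSketch` and `PathField` names are hypothetical, records are modeled as nested maps, and repeated fields (and therefore repetition levels) are left out to keep the example small.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class DefinitionLevelSketch {

    /** One step of a column path. Hypothetical type, not part of parquet-mr. */
    static final class PathField {
        final String name;
        final boolean optional;

        PathField(String name, boolean optional) {
            this.name = name;
            this.optional = optional;
        }
    }

    /**
     * Walks the path through a record modeled as nested maps and counts
     * the optional fields that are present. Required fields never add to
     * the level. Repeated fields are not handled in this sketch.
     */
    static int definitionLevel(Map<String, Object> record, List<PathField> path) {
        int level = 0;
        Object current = record;
        for (PathField field : path) {
            if (!(current instanceof Map)) {
                break; // reached a leaf value before the path ended
            }
            Object value = ((Map<?, ?>) current).get(field.name);
            if (value == null) {
                break; // field is undefined, so deeper fields are undefined too
            }
            if (field.optional) {
                level++;
            }
            current = value;
        }
        return level;
    }

    public static void main(String[] args) {
        // Column path owner.contact.email: owner is required, the rest optional.
        List<PathField> path = Arrays.asList(
            new PathField("owner", false),
            new PathField("contact", true),
            new PathField("email", true));

        Map<String, Object> full = Collections.<String, Object>singletonMap("owner",
            Collections.<String, Object>singletonMap("contact",
                Collections.<String, Object>singletonMap("email", "a@example.com")));
        Map<String, Object> noEmail = Collections.<String, Object>singletonMap("owner",
            Collections.<String, Object>singletonMap("contact",
                Collections.<String, Object>emptyMap()));
        Map<String, Object> noContact = Collections.<String, Object>singletonMap("owner",
            Collections.<String, Object>emptyMap());

        System.out.println(definitionLevel(full, path));      // prints 2
        System.out.println(definitionLevel(noEmail, path));   // prints 1
        System.out.println(definitionLevel(noContact, path)); // prints 0
    }
}
```

The maximum level for this path is 2, so a stored level below 2 tells the reader at which depth the value became undefined without storing the missing fields themselves.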
A Loader and a Storer are provided to read and write Parquet files with Apache Pig.
A Thrift mapping to the Parquet schema is provided through a class that extends TBase. You can read and write Parquet files using Thrift-generated classes.
See the APIs:
To run the unit tests: mvn test
To build the jars: mvn package
The build runs in Travis CI:
<repositories>
  <repository>
    <id>sonatype-nexus-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>com.twitter</groupId>
    <artifactId>parquet-column</artifactId>
    <version>1.0.0-SNAPSHOT</version>
  </dependency>
  <dependency>
    <groupId>com.twitter</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.0.0-SNAPSHOT</version>
  </dependency>
</dependencies>
We haven't published a 1.0.0 release yet.
Copyright 2012 Twitter, Inc.
Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0