| commit | f75f9c4fdf9b94944b3d51cf6dab256e077899e7 | |
|---|---|---|
| author | Philip (flip) Kromer <flip@infochimps.org> | Mon May 19 09:12:34 2014 -0500 |
| committer | Matthew Hayes <matthew.terence.hayes@gmail.com> | Tue May 20 07:27:18 2014 -0700 |
| tree | 1dcb6a1f27c56e0628a70225f7249dc5ba8e7edc | |
| parent | 3cfcea78d5dfed961625f8dfce0f75579ac0b0ff | |

DATAFU-49 Examples work again

* Removed dependency on Guava (which it didn't actually depend on) and on Piggybank (date functions are now first class)
* Path to datafu jar correct for current repo layout
* I made the quantile examples demonstrate a comparison of the exact vs approx algorithms
* Added the script to generate data for quantile.
* Quantile examples demonstrate both ways of constructing a Quantile UDF (number of partitions vs list of breakpoints)

https://issues.apache.org/jira/browse/DATAFU-49

Signed-off-by: Matthew Hayes <matthew.terence.hayes@gmail.com>

Apache DataFu is a collection of libraries for working with large-scale data in Hadoop. The project was inspired by the need for stable, well-tested libraries for data mining and statistics.
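It consists of two libraries:

* Apache DataFu Pig: a collection of user-defined functions for Apache Pig
* Apache DataFu Hourglass: an incremental processing framework for Apache Hadoop in MapReduce
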
For more information please visit the website:
If you'd like to jump in and get started, check out the corresponding guides for each library:
Bugs and feature requests can be filed here. For other help please see the discussion group.
The Apache DataFu Pig library can be built by running the command below. More information about working with the source code can be found in the DataFu Pig Contributing Guide.
```shell
./gradlew assemble
```
The built JAR can be found under datafu-pig/build/libs by the name datafu-pig-x.y.z.jar, where x.y.z is the version.
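If a script needs the path to this JAR, it can be derived from the naming convention above. The following is a minimal sketch, assuming bash (or another shell that supports `local`) run from the repository root; the function names are hypothetical and not part of DataFu. It skips any JAR whose name carries an extra suffix such as `-sources`, in case the build emits one alongside the main JAR.

```shell
# Hypothetical helper (not part of the DataFu build): prints the path of the
# datafu-pig-x.y.z.jar in the given directory (default: datafu-pig/build/libs),
# skipping jars with an extra classifier such as -sources or -javadoc.
find_datafu_pig_jar() {
  local dir="${1:-datafu-pig/build/libs}"
  local f
  for f in "$dir"/datafu-pig-*.jar; do
    [ -e "$f" ] || continue   # the glob matched nothing
    case "$f" in
      *-sources.jar|*-javadoc.jar|*-tests.jar) continue ;;
    esac
    printf '%s\n' "$f"
    return 0
  done
  echo "no datafu-pig jar found in $dir" >&2
  return 1
}

# Hypothetical helper: extracts x.y.z from a path like .../datafu-pig-x.y.z.jar
datafu_pig_jar_version() {
  local name
  name=$(basename "$1" .jar)      # strip directory and the .jar suffix
  printf '%s\n' "${name#datafu-pig-}"
}
```

For example, `jar=$(find_datafu_pig_jar)` after `./gradlew assemble` yields the path a Pig script would pass to `REGISTER`, and `datafu_pig_jar_version "$jar"` prints the version part. If several versions are present, the first match in glob order is returned.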
This command generates the Eclipse project and classpath files:

```shell
./gradlew eclipse
```

To clean up the Eclipse files:

```shell
./gradlew cleanEclipse
```
To run all the tests:
```shell
./gradlew test
```
To run the tests for a single class, use the test.single property. For example, to run only QuantileTests:

```shell
./gradlew :datafu-pig:test -Dtest.single=QuantileTests
```
The tests can also be run from within Eclipse.
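When several test classes need checking, the single-class command can be repeated in a loop so that each class runs in its own Gradle invocation. This is a minimal sketch, assuming bash run from the repository root; `run_test_classes` is a hypothetical helper, not part of the build.

```shell
# Hypothetical helper (not part of the DataFu build): runs each named test
# class in its own Gradle invocation using the test.single property, then
# reports the classes whose run failed. Run it from the repository root.
run_test_classes() {
  local cls failed=""
  for cls in "$@"; do
    if ! ./gradlew :datafu-pig:test -Dtest.single="$cls"; then
      failed="$failed $cls"
    fi
  done
  if [ -n "$failed" ]; then
    echo "Failed test classes:$failed" >&2
    return 1
  fi
}
```

`run_test_classes QuantileTests` runs the same command as the single-class example; further class names can be appended, and the function returns a non-zero status if any of them failed.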
The Apache DataFu Hourglass library can be built by running the commands below. More information about working with the source code can be found in the DataFu Hourglass Contributing Guide.

```shell
cd contrib/hourglass
ant jar
```
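Both libraries can also be built in one step from the repository root. The following is a minimal sketch, assuming a shell where `./gradlew` and `ant` behave as in the commands above; `build_datafu` is a hypothetical name, not a script shipped with DataFu.

```shell
# Hypothetical helper (not shipped with DataFu): builds the DataFu Pig JAR with
# Gradle, then the DataFu Hourglass JAR with Ant. Run from the repository root.
build_datafu() {
  # Stop here if the Gradle build fails.
  ./gradlew assemble || return 1
  # The subshell keeps the caller in the repository root after the Ant build;
  # its exit status (that of "ant jar", or of a failed cd) is the function's.
  (cd contrib/hourglass && ant jar)
}
```

Running the Ant step in a subshell means the calling shell is never left inside `contrib/hourglass`, whether or not the build succeeds.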