| # 1.3.3 |
| |
| Additions: |
| |
| * UDF for hash functions such as murmur3 and others. (DATAFU-47) |
| * UDF for diffing tuples. (DATAFU-119) |
| * Support for macros in DataFu. Macros count_all_non_distinct and count_distinct_keys were added. (DATAFU-123) |
| * Macro for TFIDF. (DATAFU-61) |
| |
| Improvements: |
| |
| * Added lifecycle hooks to ContextualEvalFunc. (DATAFU-50) |
| * SessionCount and Sessionize now support millisecond precision. (DATAFU-124) |
| * Upgraded to Guava 20.0. (DATAFU-48) |
| * Updated Gradle to 3.5.1. (DATAFU-125) |
| * Rat tasks automatically run during assemble. (DATAFU-118) |
| * Building now works on Windows. (DATAFU-99) |
| |
| # 1.3.2 |
| |
| Improvements: |
| |
| * LICENSE, NOTICE, and DISCLAIMER now included in META-INF of JARs. |
| * Test files now generated to build/test-files within projects. |
| * AliasableEvalFunc now uses getInputSchema. |
| |
| # 1.3.1 |
| |
| Additions: |
| |
| * New UDF CountDistinctUpTo that counts tuples within a bag to a preset limit (DATAFU-117) |
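As a rough illustration of the CountDistinctUpTo behavior described above, the Python sketch below counts distinct items and stops scanning once a preset limit is reached. The name `count_distinct_up_to` and its signature are hypothetical and chosen only for this example; this is not the UDF's actual Java implementation.

```python
def count_distinct_up_to(bag, limit):
    """Count distinct items in `bag`, stopping early once `limit` is reached."""
    seen = set()
    for item in bag:
        seen.add(item)
        if len(seen) >= limit:
            # Cap reached: the rest of the bag does not need to be scanned.
            return limit
    return len(seen)


# Tuples stand in for Pig tuples here.
print(count_distinct_up_to([("a",), ("b",), ("a",), ("c",)], 2))
```

The early return is the point of capping the count: the set of seen items never grows beyond the limit.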
| |
| Improvements: |
| |
| * TupleFromBag and FirstTupleFromBag now implement Accumulator interface as well (DATAFU-114, DATAFU-115) |
| |
| Build System: |
| |
| * IntelliJ IDEA support added to build file (DATAFU-103) |
| * JDK version now validated when building (DATAFU-95) |
| |
| # 1.3.0 |
| |
| Additions: |
| |
| * New UDFs for entropy and weighted sampling algorithms (DATAFU-2, DATAFU-26) |
| * Updated SimpleRandomSample to be consistent with SimpleRandomSampleWithReplacement (DATAFU-5) |
| * Created OpenNLP UDF wrappers (DATAFU-8) |
| * Created RandomUUID UDF (DATAFU-18) |
| * Added LSH implementation (DATAFU-37) |
| * Added Base64Encode/Decode (DATAFU-52) |
| * URLInfo UDF (DATAFU-62) |
| * Created SelectFieldByName UDF (DATAFU-69) |
| * Added generic BagJoin that supports inner, left, and full outer joins (DATAFU-70) |
| * Added ZipBags UDF which can zip an arbitrary number of bags into one (DATAFU-79) |
| * Hadoop 2.0 compatibility (DATAFU-58) |
| * Created TupleFromBag.java file (DATAFU-92) |
| |
| Improvements: |
| |
| * Simplified BagGroup output (DATAFU-42) |
| |
| Changes: |
| |
| * StagedOutputJob no longer writes counters by default (DATAFU-35) |
| |
| Fixes: |
| |
| * ReservoirSample does not behave as expected when grouping by a key other than ALL (DATAFU-11) |
| * DistinctBy does not work correctly on strings containing minuses (DATAFU-31) |
| * Hourglass does not honor "fail on missing" in all cases (DATAFU-35) |
| * Hash UDFs return zero-padded strings of uniform length even when leading bits are zero (DATAFU-46) |
| * UDF examples work again (DATAFU-49) |
| * SampleByKey can throw NullPointerException (DATAFU-68) |
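The zero-padding fix for the hash UDFs listed above can be pictured with a small Python sketch. It shows one common way hex strings lose leading zeros (formatting the digest as an integer) next to a padded alternative. Whether this matches the original defect is an assumption, and both function names are hypothetical.

```python
def to_hex_unpadded(digest):
    """Format digest bytes as hex via an integer, which drops leading zeros."""
    return format(int.from_bytes(digest, "big"), "x")


def to_hex_padded(digest):
    """Format digest bytes as hex, always two characters per byte."""
    width = 2 * len(digest)
    return format(int.from_bytes(digest, "big"), "0{}x".format(width))


digest = bytes([0x00, 0x0F, 0xA1])  # made-up digest with leading zero bits
print(to_hex_unpadded(digest))
print(to_hex_padded(digest))
```

With padding, every digest of a given size yields a string of the same length, so the results are safe to compare or join on as strings.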
| |
| Build System: |
| |
| * Removed legacy checked in jars (DATAFU-55) |
| * Updated to use Pig 0.12.1 (DATAFU-10) |
| * Switched from Ant to Gradle 1.12 (DATAFU-27, DATAFU-44, DATAFU-43, DATAFU-66) |
| * Removed checked in jars, download where necessary (DATAFU-55) |
| * Fixed test.sh to use gradlew (DATAFU-77) |
| |
| Release related: |
| |
| * NOTICE updated with dependencies used or shipped with DataFu. |
| * Apache license headers added to all necessary files (DATAFU-4, DATAFU-75) |
| * Added doap file (DATAFU-36) |
| * Source tarball generation, gradle bootstrapping, and release instructions (DATAFU-57, DATAFU-78, DATAFU-72) |
| * Removed author tags (DATAFU-74) |
| * Resolved issues with build-plugin directory (DATAFU-76) |
| * Used Apache RAT to verify correct file headers (DATAFU-73, DATAFU-84) |
| |
| Documentation related: |
| |
| * New website (DATAFU-20, etc.) |
| * StreamingQuantile PDF link is broken (DATAFU-29) |
| * README file updated |
| |
| # 1.2.0 |
| |
| Additions: |
| |
| * Pair of UDFs for simple random sampling with replacement. |
| * More dependencies now packaged in DataFu so fewer JAR dependencies required. |
| * SetDifference UDF for computing set difference A-B or A-B-C. |
| * HyperLogLogPlusPlus UDF for efficient cardinality estimation. |
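To make the A-B and A-B-C forms of SetDifference mentioned above concrete, here is a minimal Python sketch of the semantics. The function name `set_difference` is hypothetical, and the sketch makes no claim about how the UDF itself is implemented or what input ordering it expects.

```python
def set_difference(first, *others):
    """Return the items of `first` that appear in none of `others`, in order."""
    excluded = set()
    for bag in others:
        excluded.update(bag)
    return [item for item in first if item not in excluded]


a, b, c = [1, 2, 3, 4], [2], [4, 5]
print(set_difference(a, b))     # A-B
print(set_difference(a, b, c))  # A-B-C
```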
| |
| # 1.1.0 |
| |
| This release adds compatibility with Pig 0.12 (courtesy of jarcec). |
| |
| Additions: |
| |
| * Added SHA hash UDF. |
| * InUDF and AssertUDF added for Pig 0.12 compatibility. These are the same as In and Assert. |
| * SimpleRandomSample, which implements a scalable simple random sampling algorithm. |
| |
| Fixes: |
| |
| * Fixed the schema declarations of several UDFs for compatibility with Pig 0.12, which is now stricter with schemas. |
| |
| # 1.0.0 |
| |
| **This is not a backwards compatible release.** |
| |
| Additions: |
| |
| * Added SampleByKey, which provides a way to sample tuples based on certain fields. |
| * Added Coalesce, which returns the first non-null value from a list of arguments like SQL's COALESCE. |
| * Added BagGroup, which performs an in-memory group operation on a bag. |
| * Added ReservoirSample |
| * Added In filter func, which behaves like SQL's IN |
| * Added EmptyBagToNullFields, which enables multi-relation left joins using COGROUP |
| * Sessionize now supports long values for timestamp, in addition to string representation of time. |
| * BagConcat can now operate on a bag of bags, in addition to a tuple of bags |
| * Created TransposeTupleToBag, which creates a bag of key-value pairs from a tuple |
| * SessionCount now implements Accumulator interface |
| * DistinctBy now implements Accumulator interface |
| * Using PigUnit from Maven for testing, instead of checked-in JAR |
| * Added many more test cases to improve coverage |
| * Improved documentation |
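The Coalesce behavior described in the list above (return the first non-null argument, as SQL's COALESCE does) can be sketched in Python as follows. The function is an illustrative stand-in, with `None` playing the role of a Pig null; it is not the UDF's code.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if there is none."""
    for value in values:
        if value is not None:
            return value
    return None


print(coalesce(None, 0, 5))
```

Note that only nulls are skipped: falsy values such as `0` or an empty string still count as present.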
| |
| Changes: |
| |
| * Moved WeightedSample to datafu.pig.sampling |
| * Using Pig 0.11.1 for testing. |
| * Renamed package datafu.pig.numbers to datafu.pig.random |
| * Renamed package datafu.pig.bag.sets to datafu.pig.sets |
| * Renamed TimeCount to SessionCount, moved to datafu.pig.sessions |
| * ASSERT renamed to Assert |
| * MD5Base64 merged into the MD5 implementation; a constructor argument selects the output encoding, with hex as the default |
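The merged MD5 behavior noted above, where one implementation produces either hex or Base64 output and hex is the default, might look like this in Python. The function name `md5_string` and the `encoding` parameter are assumptions made for this sketch; the real UDF takes its option as a constructor argument in Pig.

```python
import base64
import hashlib


def md5_string(text, encoding="hex"):
    """Return the MD5 digest of `text` encoded as hex (default) or Base64."""
    digest = hashlib.md5(text.encode("utf-8")).digest()
    if encoding == "hex":
        return digest.hex()
    if encoding == "base64":
        return base64.b64encode(digest).decode("ascii")
    raise ValueError("unsupported encoding: " + encoding)


print(md5_string("datafu"))
print(md5_string("datafu", "base64"))
```

Both outputs encode the same 16-byte digest; only the textual representation differs.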
| |
| Removals: |
| |
| * Removed ApplyQuantiles |
| * Removed AliasBagFields, since the same result can now be achieved with a nested foreach |
| |
| Fixes: |
| |
| * Quantile now outputs schemas consistent with StreamingQuantile |
| * Necessary fastutil classes now packaged in datafu JAR, so fastutil JAR not needed as dependency |
| * Non-deterministic UDFs are now marked as such |
| |
| # 0.0.10 |
| |
| Additions: |
| |
| * CountEach now implements Accumulator |
| * Added AliasableEvalFunc, a base class to enable UDFs to access fields in tuple by name instead of position |
| * Added BagLeftOuterJoin, which can perform left join on two or more reasonably sized bags without a reduce |
| |
| Fixes: |
| |
| * StreamingQuantile schema fix |
| |
| # 0.0.9 |
| |
| Additions: |
| |
| * WeightedSample can now take a seed |
| |
| Changes: |
| |
| * Test against Pig 0.11.0 |
| |
| Fixes: |
| |
| * Null pointer fix for Enumerate's Accumulator implementation |