Pig Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
IMPROVEMENTS
OPTIMIZATIONS
BUG FIXES
Release 0.1.1 - Unreleased
INCOMPATIBLE CHANGES
NEW FEATURES
IMPROVEMENTS
PIG-253: integration with hadoop-18
BUG FIXES
PIG-342: Fix DistinctDataBag to recalculate size after it has spilled. (bdimcheff via gates)
Release 0.1.0 - 2008-09-11
INCOMPATIBLE CHANGES
PIG-123: requires escape of '\' in chars and string
NEW FEATURES
PIG-20 Added custom comparator functions for order by (phunt via gates)
PIG-94: Streaming implementation
PIG-58: parameter substitution
PIG-55: added custom splitter (groves via olgan)
PIG-59: Add a new ILLUSTRATE command (shubhamc via gates).
PIG-256: Added variable argument support for UDFs (pi_song)
IMPROVEMENTS
PIG-8 added binary comparator (olgan)
PIG-11 Add capability to search for jar file to register. (antmagna via
olgan)
PIG-7: Added use of combiner in some restricted cases. (gates)
PIG-47: Added methods to DataMap to provide access to its content
PIG-30: Rewrote DataBags to better handle decisions of when to spill to
disk and to spill more intelligently. (gates)
PIG-12: Added time stamps to log4j messages (phunt via gates).
PIG-44: Added adaptive decision of the number of records to hold in memory
before spilling (utkarsh)
PIG-56: Made DataBag implement Iterable. (groves via gates)
PIG-39: created more efficient version of read (spullara via olgan)
PIG-32: Abstraction layer (olgan)
PIG-83: Change everything except grunt and Main (PigServer on down) to use
common logging abstraction instead of log4j. By default in grunt, log4j
still used as logging layer. Also converted all System.out/err.println
statements to use logging instead. (francisoud via gates)
PIG-13: adding version to the system (joa23 via olgan)
PIG-113: Make explain output more understandable (pi_song via gates)
PIG-120: Support map reduce in local mode. To do this user needs to
specify execution type as mapreduce and cluster name as local (joa23 via
gates).
PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud
via gates).
PIG-111: Reworked configuration to be settable via properties.
(Contributions from joa23, pi_song, and oae via gates).
BUG FIXES
PIG-24 Files that were incorrectly placed under test/reports have been
removed. ant clean now cleans test/reports. (milindb via gates)
PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
PIG-23 Made pig work with java 1.5. (milindb via gates)
PIG-17 integrated with Hadoop 0.15 (olgan@)
PIG-33 Help was commented out - uncommented (olgan)
PIG-31: second half of concurrent mode problem addressed (olgan)
PIG-14: added heartbeat functionality (olgan)
PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
PIG-29: fixed bag factory to be properly initialized (utkarsh)
PIG-43: fixed problem where using the combiner prevented a pig alias
from being evaluated more than once. (gates)
PIG-45: Fixed pig.pl to not assume hodrc file is named the same as
cluster name (gates).
PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples
instead of Tuples, causing Reducer to crash in some cases.
PIG-41: Added patterns to svn:ignore
PIG-51: Fixed combiner in the presence of flattening
PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the
comparator function instead of Class.forName. (gates)
PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
PIG-77: Added eclipse specific files to svn:ignore
PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arun
via olgan)
PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default
path. Also fix it to not die if pigclient.conf is missing. (craigm via
gates).
PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill
files when they are done spilling (contributions by craigm, breed, and
gates, committed by gates).
PIG-95: Remove System.exit() statements from inside pig (joa23 via gates).
PIG-65: convert tabs to spaces (groves via olgan)
PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when
more than one bag is involved (gates).
PIG-92: Fix NullPointerException in PigContext due to uninitialized conf
reference. (francisoud via gates)
PIG-80: In a number of places stack trace information was being lost by an
exception being caught, and a different exception then thrown. All those
locations have been changed so that the new exception now wraps the old.
(francisoud via gates).
PIG-84: Converted printStackTrace calls to calls to the logger.
(francisoud via gates).
PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates).
PIG-99: Fix to make unit tests not run out of memory. (francisoud via
gates).
PIG-107: enabled several tests. (francisoud via olgan)
PIG-46: abort processing on error for non-interactive mode (olston via
olgan)
PIG-109: improved exception handling (oae via olgan)
PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can
be run w/o access to a hadoop cluster. (xuzh via gates)
PIG-68: improvements to build.xml (joa23 via olgan)
PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled
flattening the combiner case. (gates and oae)
PIG-68 broke the build process by hardwiring hadoop15 jar for the purpose
of compile. Fixed that (olgan)
PIG-124: only run one test (ant test -Dtestcase=TestMapReduce) not the
complete test suite (xuzh via olgan)
PIG-127: changes to build.xml to have description for each target
(francisoud via olgan)
PIG-101: changes in tests to use enum type (francisoud via olgan)
PIG-125: Improve exception handling in cases when an attempt is made to
access a field as a tuple, and it turns out not to be a tuple (oae via
gates).
PIG-13: make the code use svn only if available (joa23 via olgan)
PIG-118: make sure union/join/cross takes 2 params (pi_song via olgan)
PIG-94: M1 for streaming: maps and reduce side support with default
(de)serializer (acmurthy via olgan)
PIG-129: making sure that temp files are stored in task's home dir and
cleaned up
PIG-115: Removed Yahoo specific scripts/pig.pl, replaced with generic
bash script bin/pig. Moved startHOD.expect to bin (joa23 via gates).
PIG-18: changes to make pig work with Hadoop 0.16 and HOD 0.4 (olgan)
PIG-164: Fix memory issue in SpillableMemoryManager to partially clean the list of
bags each time a new bag is added rather than waiting until the garbage
collector tells us we are out of memory (gates).
PIG-154: moving parsing for DEFINE and STORE into QueryParser
PIG-100: improved error handling
PIG-94: changes for M2 of streaming: input/output/ship/cache error
handling
PIG-108: Fixed PigCombine to not do initialization on every call to
reduce, but instead only do it once in the call to configure. (joa23 via
gates).
PIG-172: dealing with NULL error messages in exceptions (olgan)
PIG-170: sort bags so that largest ones are released first
PIG-122: Added build and src-gen to the list of ignore files in
the top level directory (joa23 via gates).
PIG-94: M3 code update for streaming (arunc via olgan)
PIG-179: Changed PigRecordReader to be a static singleton rather than
thread local. (gates).
PIG-174,180: bug fixes in streaming (arunc via olgan)
PIG-181: streaming bug fixing (arunc via olgan)
PIG-182: streaming bug fix (arunc via olgan)
PIG-184: streaming bug fixes
PIG-153: Incorrect result caused by dump in between statements (pi_song
via gates).
PIG-178: Use of schema on secondary output of SPLIT throws
IndexOutOfBoundsException (kali via gates).
PIG-203: Fix bug in parameter substitution code where any pig script over
1k caused pig to freeze. (kali via gates)
PIG-204: Repair broken input splits (acmurthy via gates).
PIG-188: Fix mismatches between pig slicer changes and new streaming
feature (acmurthy via gates).
PIG-149, PIG-150: Fix doc target so that ant can generate docs (xuzh via
gates).
PIG-183: Catch when a UDF has been compiled with the wrong version of
java and give a RuntimeException (pi_song via gates).
PIG-114: store one alias/logicalPlan twice leads to instantiation of
StoreFunc as LoadFunc (pi_song via gates).
PIG-213: Remove non-static references to logger from data bags and tuples,
as it causes significant overhead (vgeschel via gates).
PIG-216: Fix streaming to work with commands that use unix pipes (acmurthy
via gates).
PIG-207: Fix illustrate command to work in mapreduce mode (shubhamc via
gates).
PIG-218: Fixed param generation to work with arbitrary commands
PIG-220: Fixed definition of parameter name for param substitution
PIG-151: fixes to code that handles bzip files
PIG-222: fix build break
PIG-226: fix for streaming optimization bug (acmurthy via olgan)
PIG-228: make multiple streaming outputs adhere to spec (acmurthy via olgan)
PIG-224: fix to error handling code to produce correct error code
PIG-176: Change bag spilling so that bags below a certain threshold are
not spilled, thus avoiding proliferation of small files (pi_song via
gates).
PIG-227: making load/store function optional in stream input/output spec
(acmurthy via olgan)
PIG-215: Cleanup a few dangling ends left by PIG-111 (pi_song via gates).
PIG-229: Proper error handling in case of deserializer failure
PIG-230: Handling shipment for multiple ship/cache commands (acmurthy via
olgan)
PIG-219: Change unit tests to run both local and map reduce modes
(kali via gates).
PIG-202: Fix Order by so that user provided comparator func is used for
quantile determination (kali via gates).
PIG-231: validation for ship, cache, and skippath (acmurthy via olgan)
PIG-232: fix for number of output records when BinaryStorage is used
(acmurthy via olgan)
PIG-232: fix for number of input records when BinaryStorage is used
(acmurthy via olgan)
PIG-232: let valid cache specifications through (acmurthy via olgan)
PIG-237: validation of the output directory (pi_song via olgan)
PIG-236: Fix properties so that values specified via the
command line (-D) are not ignored (pkamath via gates).
PIG-198: integration with hadoop 17 (acmurthy via olgan)
PIG-85: allowing control characters as delimiters for PigStorage (pi_song
via olgan)
PIG-250: disabling speculative execution (olgan)
PIG-250: re-enabling speculative execution and fixing the failure
(acmurthy via olgan)
PIG-85: memory optimization (pi_song via olgan)
PIG-243: Fixing unit tests on windows (daijy via olgan)
PIG-198: Fixed pig script to pick up hadoop 17 instead of 15 (pi_song via gates).
PIG-266: fix warnings caused by HOD (olgan)
PIG-245: added math functions to the piggybank (ajaygarg via olgan)
PIG-255: make non-default constructors work for algebraic functions
(ajaygarg via olgan)
PIG-272: problem with streaming and intermediate store (acmurthy via olgan)
PIG-243: make unit tests work on windows (daijy via olgan)
PIG-271: added tutorial to SVN (olgan)
PIG-235: memory management improvements (pkamath via olgan)
PIG-284: target for building source jar (oae via olgan)
PIG-34: added missing licenses
PIG-34: added LICENSE, NOTICE and README file
PIG-291: hod.param parameters not passed properly (thatha via olgan)
PIG-34: changes to build process to create distribution tar file
PIG-34: updated CHANGES.txt
PIG-472: Added RegExLoader to piggybank, an abstract loader class to parse
text files via regular expressions (spackest via gates)
PIG-473: Added CommonLogLoader, a subclass of RegExLoader to piggybank (spackest via gates)
PIG-474: Added MyRegexLoader, a subclass of RegExLoader, to piggybank (spackest via gates)