| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| Pig Change Log |
| |
| Trunk (unreleased changes) |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-619: Create one InputSplit even when the input file is zero length |
| so that hadoop runs maps and creates output for the next |
| job (gates). |
| |
| PIG-693: Proposed improvements to pig's optimizer (sms) |
| |
| PIG-700: To automate the pig patch test process (gkesavan via sms) |
| |
| PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create empty bags (pradeepkth) |
| |
| BUG FIXES |
| |
| PIG-804: problem with lineage with double map redirection (pradeepkth) |
| |
| PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs |
| "FileSystem closed" error on large input (pradeepkth) |
| |
| PIG-693: Parameter to UDF which is an alias returned in another UDF in nested |
| foreach causes incorrect results (thejas via sms) |
| |
| PIG-725: javadoc: warning - Multiple sources of package comments found for |
| package "org.apache.commons.logging" (gkesavan via sms) |
| |
| PIG-745: Add DataType.toString() to force basic types to chararray, useful |
| for UDFs that want to handle all simple types as strings (ciemo via gates). |
| |
| PIG-514: COUNT returns no results as a result of two filter statements in |
| FOREACH (pradeepkth) |
| |
| Release 0.2.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-157: Add types and rework execution pipeline (gates) |
| |
| PIG-458: integration with Hadoop 18 (olgan) |
| |
| NEW FEATURES |
| PIG-139: command line editing (daijy via olgan) |
| |
| PIG-554 Added fragment replicate map side join (shravanmn via pkamath and gates) |
| |
| PIG-535: added rmf command |
| |
| PIG-704 Added ALIASES command that shows all currently defined ALIASES. |
| Changed semantics of DEFINE to define last used alias if no argument is |
| given (ericg via gates). |
| |
| PIG-713 Added alias completion as part of tab completion in grunt (ericg |
| via gates). |
| |
| IMPROVEMENTS |
| |
| PIG-270: proper line number for parse errors (daijy via olgan) |
| |
| PIG-367: convinience function for UDFs to name schema |
| |
| PIG-443: Illustrate for the Types branch (shubham via olgan) |
| |
| PIG-599: Added buffering to BufferedPositionedInputStream (gates) |
| |
| PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth |
| via olgan) |
| |
| PIG-628: misc performance improvements (pradeepkth via olgan) |
| |
| PIG-589: error handling, phase 1-2 (sms via olgan) |
| |
| PIG-590: error handling, phase 3 (sms) |
| |
| PIG-591: error handling, phase 4 (sms) |
| |
| PIG-545: PERFORMANCE: Sampler for order bys does not produce a good |
| distribution (pradeepkth) |
| |
| PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan) |
| |
| PIG-636: Use lightweight bag implementations which do not register with |
| SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-563: support for multiple combiner invocations (pradeepkth via olgan) |
| |
| PIG-465: performance improvement - removing keys from the value (pradeepkth |
| via olgan) |
| |
| PIG-450: PERFORMANCE: Distinct should make use of combiner to remove |
| duplicate values from keys. (gates) |
| |
| PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth |
| via gates) |
| |
| BUG FIXES |
| |
| PIG-294: string comparator unit tests (sms via pi_song) |
| |
| PIG-258: cleaning up directories on failure (daijy via olgan) |
| |
| PIG-363: fix for describe to produce schema name |
| |
| PIG-368: making JobConf available to Load/Store UDFs |
| |
| PIG-311: cross is broken |
| |
| PIG-369: support for filter UDFs |
| |
| PIG-375: support for implicit split |
| |
| PIG-301: fix for order by descending |
| |
| PIG-378: fix for GENERATE + LIMIT |
| |
| PIG-362: don't push limit above generate with flatten |
| |
| PIG-381: bincond does not handle null data |
| |
| PIG-382: bincond throws typecast exception |
| |
| PIG-352: java.lang.ClassCastException when invalid field is accessed |
| |
| PIG-329: TestStoreOld, 2 unit tests were broken |
| |
| PIG-353: parsing of complex types |
| |
| PIG-392: error handling with multiple MRjobs |
| |
| PIG-397: code defaults to single reducer |
| |
| PIG-373: unconnected load causes problem, |
| |
| PIG-413: problem with float sum |
| |
| PIG-398: Expressions not allowed inside foreach (sms via olgan) |
| |
| PIG-418: divide by 0 problem |
| |
| PIG-402: order by with user comparator (shravanmn via olgan) |
| |
| PIG-415: problem with comparators (shravanmn via olgan) |
| |
| PIG-422: cross is broken (shravanmn via olgan) |
| |
| PIG-407: need to clone operators (pradeepkth via olgan) |
| |
| PIG-428: TypeCastInserter does not replace projects in inner plans |
| correctly (pradeepkth vi olgan) |
| |
| PIG-421: error with complex nested plan (sms via olgan) |
| |
| PIG-429: Self join wth implicit split has the join output in wrong order |
| (pradeepkth via olgan) |
| |
| PIG-434: short-circuit AND and OR (pradeepkth viia olgan) |
| |
| PIG-333: allowing no parethesis with single column alias with flatten (sms |
| via olgan) |
| |
| PIG-426: Adding result of two UDFs gives a syntax error |
| |
| PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan) |
| |
| PIG-436: alias is lost when single column is flattened (pradeepkth via |
| olgan) |
| |
| PIG-364: Limit return incorrect records when we use multiple reducer |
| (daijy via olgan) |
| |
| PIG-439: disallow alias renaming (pradeepkth via olgan) |
| |
| PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth |
| via olgan) |
| |
| PIG-442: Disambiguated alias after a foreach flatten is not accessible a |
| couple of statements after the foreach (sms via olgan) |
| |
| PIG-424: nested foreach with flatten and agg gives an error (sms via |
| olgan) |
| |
| PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD |
| connection is fully established (olgan) |
| |
| PIG-430: Projections in nested filter and inside foreach do not work (sms |
| via olgan) |
| |
| PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries |
| (shravanmn via olgan) |
| |
| PIG-444: job.jar is left behined (pradeepkth via olgan) |
| |
| PIG-447: improved error messages (pradeepkth via olgan) |
| |
| PIG-448: explain broken after load with types (pradeepkth via olgan) |
| |
| PIG-380: invalid schema for databag constant (sms via olgan) |
| |
| PIG-451: If an field is part of group followed by flatten, then referring |
| to it causes a parse error (pradeepkth via olgan) |
| |
| PIG-455: "group" alias is lost after a flatten(group) (pradeepkth vi olgan) |
| |
| PIG-459: increased sleep time before checking for job progress |
| |
| PIG-462: LIMIT N should create one output file with N rows (shravanmn via |
| olgan) |
| |
| PIG-376: set job name (olgan) |
| |
| PIG-463: POCast changes (pradeepkth via olgan) |
| |
| PIG-427: casting input to UDFs |
| |
| PIG-437: as in alias names causing problems (sms via olgan) |
| |
| PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan) |
| |
| PIG-470: TextLoader should produce bytearrays (sms via olgan) |
| |
| PIG-335: lineage (sms vi olgan) |
| |
| PIG-464: bag schema definition (pradeepkth via olgan) |
| |
| PIG-457: report 100% on successful jobs only (shravanmn via olgan) |
| |
| PIG-471: ignoring status errors from hadoop (pradeepkth via olgan) |
| |
| PIG-489: (*) processing (sms via olgan) |
| |
| PIG-475: missing heartbeats (shravanmn via olgan) |
| |
| PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan) |
| |
| PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan) |
| |
| PIG-501: Make branches/types work under cygwin (daijy via olgan) |
| |
| PIG-504: cleanup illustrate not to produce cn= (shubham via olgan) |
| |
| PIG-469: make sure that describe says "int" not "integer" (sms via olgan) |
| |
| PIG-495: projecting of bags only give 1 field (olgan) |
| |
| PIG-500: Load Func for POCast is not being set in some cases (sms via |
| olgan) |
| |
| PIG-499: parser issue with as (sms via olgan) |
| |
| PIG-507: permission error not reported (pradeepkth via olgan) |
| |
| PIG-508: problem with double joins (pradeepkth via olgan) |
| |
| PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan) |
| |
| PIG-505: working with map elements (sms via olgan) |
| |
| PIG-517: load functiin with parameters does not work with cast (pradeepkth |
| via olgan) |
| |
| PIG-525: make sure cast for udf parameters works (olgan) |
| |
| PIG-512: Expressions in foreach lead to errors (sms via olgan) |
| |
| PIG-528: use UDF return in schema computation (sms via olgan) |
| |
| PIG-527: allow PigStorage to write out complex output (sms via olgan) |
| |
| PIG-537: Failure in Hadoop map collect stage due to type mismatch in the |
| keys used in cogroup (pradeepkth vi olgan) |
| |
| PIG-538: support for null constants (pradeepkth via olgan) |
| |
| PIG-385: more null handling (pradeepkth via olgan) |
| |
| PIG-546: FilterFunc calls empty constructor when it should be calling |
| parameterized constructor (sms via olgan) |
| |
| PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via |
| olgan) |
| |
| PIG-501: make unit tests run under windows (daijy via olgan) |
| |
| PIG-543: Restore local mode to truly run locally instead of use map |
| reduce. (shubhamc via gates) |
| |
| PIG-556: Changed FindQuantiles to report progress. Fixed issue with null |
| reporter being passed to EvalFuncs. (gates) |
| |
| PIG-6: Add load support from hbase (hustlmsp via gates). |
| |
| PIG-522: make negation work (pradeepkth via olgan) |
| |
| PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple |
| error (pradeepkth via olgan) |
| |
| PIG-572 A PigServer.registerScript() method, which lets a client |
| programmatically register a Pig Script. (shubhamc via gates) |
| |
| PIG-570: problems with handling bzip data (breed via olgan) |
| |
| PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan) |
| |
| PIG-623: Fix spelling errors in output messages (tomwhite via sms) |
| |
| PIG-622: Include pig executable in distribution (tomwhite via sms) |
| |
| PIG-615: Wrong number of jobs with limit (shravanmn via sms) |
| |
| PIG-635: POCast.java has incorrect formatting (sms) |
| |
| PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext() |
| gives a null pointer exception (pradeepkth) |
| |
| PIG-632: Improved error message for binary operators (sms) |
| |
| PIG-636: Performance improvement: Use lightweight bag implementations which do not |
| register with SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-631: 4 Unit test failures on Windows (daijy) |
| |
| PIG-645: Streaming is broken with the latest trunk (pradeepkth) |
| |
| PIG-646: Distinct UDF should report progress (sms) |
| |
| PIG-647: memory sized passed on pig command line does not get propagated |
| to JobConf (sms) |
| |
| PIG-648: BinStorage fails when it finds markers unexpectedly in the data |
| (pradeepkth) |
| |
| PIG-649: RandomSampleLoader does not handle skipping correctly in |
| getNext() (pradeepkth) |
| |
| PIG-560: UTFDataFormatException (encoded string too long) is thrown when |
| storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms) |
| |
| PIG-642: Limit after FRJ causes problems (daijy) |
| |
| PIG-637: Limit broken after order by in the local mode (shubhamc via |
| olgan) |
| |
| PIG-553: EvalFunc.finish() not getting called (shravanmn via sms) |
| |
| PIG-654: Optimize build.xml (daijy) |
| |
| PIG-574: allowing to run scripts from within grunt shell (hagleitn via |
| olgan) |
| |
| PIG-665: Map key type not correctly set (for use when key is null) when |
| map plan does not have localrearrange (pradeepkth) |
| |
| PIG-590: error handling on the backend (sms via olgan) |
| |
| PIG-590: error handling on the backend (sms) |
| |
| PIG-658: Data type long : When 'L' or 'l' is included with data |
| (123L or 123l) load produces null value. Also the case with Float (thejas |
| via sms) |
| |
| PIG-591: Error handling phase four (sms via pradeepkth) |
| |
| PIG-664: Semantics of * is not consistent (sms) |
| |
| PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms) |
| |
| PIG-655: Comparison of schemas of bincond operands is flawed (sms via |
| pradeepkth) |
| |
| PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth |
| via sms) |
| |
| PIG-577: outer join query looses name information (sms via pradeepkth) |
| |
| PIG-690: UNION doesn't work in the latest code (pradeepkth via sms) |
| |
| PIG-544: Utf8StorageConverter.java does not always produce NULLs when data |
| is malformed(thejas via sms) |
| |
| PIG-532: Casting a field removes its alias.(thejas via sms) |
| |
| PIG-705: Pig should display a better error message when backend error |
| messages cannot be parsed (sms) |
| |
| PIG-650: pig should look for and use the pig specific |
| 'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the |
| HOD case (sms) |
| |
| PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan) |
| |
| PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan) |
| |
| PIG-708: implement releaseaudit tart to use rats on pig (gkesavan via |
| olgan) |
| |
| PIG-703: user documentation (chandec vi olgan) |
| |
| PIG-711: Implement checkstyle for pig (gkesavan via olgan) |
| |
| PIG-715: doc updates (chandec vi olgan) |
| |
| PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates) |
| |
| PIG-692: When running a job from a script, use the name of that script as |
| the default name for the job (vzaliva via gates) |
| |
| PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan) |
| |
| PIG-720: further doc cleanup (gkesavan via olgan) |
| |
| Release 0.1.1 - 2008-12-04 |
| |
| INCOMPATIBLE CHANGES |
| |
| NEW FEATURES |
| |
| IMPROVEMENTS |
| |
| PIG-253: integration with hadoop-18 |
| |
| BUG FIXES |
| |
| PIG-342: Fix DistinctDataBag to recalculate size after it has spilled. |
| (bdimcheff via gates) |
| |
| Release 0.1.0 - 2008-09-11 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-123: requires escape of '\' in chars and string |
| |
| NEW FEATURES |
| |
| PIG-20 Added custom comparator functions for order by (phunt via gates) |
| |
| PIG-94: Streaming implementation (arunc via olgan) |
| |
| PIG-58: parameter substitution |
| |
| PIG-55: added custom splitter (groves via olgan) |
| |
| PIG-59: Add a new ILLUSTRATE command (shubhamc via gates). |
| |
| PIG-256: Added variable argument support for UDFs (pi_song) |
| |
| IMPROVEMENTS: |
| |
| PIG-8 added binary comparator (olgan) |
| |
| PIG-11 Add capability to search for jar file to register. (antmagna via olgan) |
| |
| PIG-7: Added use of combiner in some restricted cases. (gates) |
| |
| PIG-47: Added methods to DataMap to provide access to its content |
| |
| PIG-30: Rewrote DataBags to better handle decisions of when to spill to |
| disk and to spill more intelligently. (gates) |
| |
| PIG-12: Added time stamps to log4j messages (phunt via gates). |
| |
| PIG-44: Added adaptive decision of the number of records to hold in memory |
| before spilling (utkarsh) |
| |
| PIG-56: Made DataBag implement Iterable. (groves via gates) |
| |
| PIG-39: created more efficient version of read (spullara via olgan) |
| |
| PIG-32: ABstraction layer (olgan) |
| |
| PIG-83: Change everything except grunt and Main (PigServer on down) to use |
| common logging abstraction instead of log4j. By default in grunt, log4j |
| still used as logging layer. Also converted all System.out/err.println |
| statements to use logging instead. (francisoud via gates) |
| |
| PIG-13: adding version to the system (joa23 via olgan) |
| |
| PIG-113: Make explain output more understandable (pi_song via gates) |
| |
| PIG-120: Support map reduce in local mode. To do this user needs to |
| specify execution type as mapreduce and cluster name as local (joa23 via gates). |
| |
| PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates). |
| |
| PIG-111: Reworked configuration to be setable via properties. (joa23, pi_song, oae via gates). |
| |
| BUG FIXES |
| PIG-24 Files that were incorrectly placed under test/reports have been |
| removed. ant clean now cleans test/reports. (milindb via gates) |
| |
| PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@) |
| |
| PIG-23 Made pig work with java 1.5. (milindb via gates) |
| |
| PIG-17 integrated with Hadoop 0.15 (olgan@) |
| |
| PIG-33 Help was commented out - uncommented (olgan) |
| |
| PIG-31: second half of concurrent mode problem addressed (olgan) |
| |
| PIG-14: added heartbeat functionality (olgan) |
| |
| PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release |
| |
| PIG-29: fixed bag factory to be properly initialized (utkarsh) |
| |
| PIG-43: fixed problem where using the combiner prevented a pig alias |
| from being evaluated more than once. (gates) |
| |
| PIG-45: Fixed pig.pl to not assume hodrc file is named the same as |
| cluster name (gates). |
| |
| PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples |
| instead of Tuples, causing Reducer to crash in some cases. |
| |
| PIG-41: Added patterns to svn:ignore |
| |
| PIG-51: Fixed combiner in the presence of flattening |
| |
| PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the |
| comparator function instead of Class.forName. (gates) |
| |
| PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@) |
| |
| PIG-77: Added eclipse specific files to svn:ignore |
| |
| PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates) |
| |
| PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates) |
| |
| PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arun |
| via olgan) |
| |
| PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default |
| path. Also fix it to not die if pigclient.conf is missing. (craigm via |
| gates). |
| |
| PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill |
| files when they are done spilling (contributions by craigm, breed, and |
| gates, committed by gates). |
| |
| PIG-95: Remove System.exit() statements from inside pig (joa23 via gates). |
| |
| PIG-65: convert tabs to spaces (groves via olgan) |
| |
| PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when |
| more than one bag is involved (gates). |
| |
| PIG-92: Fix NullPointerException in PIgContext due to uninitialized conf |
| reference. (francisoud via gates) |
| |
| PIG-80: In a number of places stack trace information was being lost by an |
| exception being caught, and a different exception then thrown. All those |
| locations have been changed so that the new exception now wraps the old. |
| (francisoud via gates). |
| |
| PIG-84: Converted printStackTrace calls to calls to the logger. |
| (francisoud via gates). |
| |
| PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates). |
| |
| PIG-99: Fix to make unit tests not run out of memory. (francisoud via |
| gates). |
| |
| PIG-107: enabled several tests. (francisoud via olgan) |
| |
| PIG-46: abort processing on error for non-interactive mode (olston via |
| olgan) |
| |
| PIG-109: improved exception handling (oae via olgan) |
| |
| PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can |
| be run w/o access to a hadoop cluster. (xuzh via gates) |
| |
| PIG-68: improvements to build.xml (joa23 via olgan) |
| |
| PIG-110: Replaced code accidently merged out in PIG-32 fix that handled |
| flattening the combiner case. (gates and oae) |
| |
| PIG-213: Remove non-static references to logger from data bags and tuples, |
| as it causes significant overhead (vgeschel via gates). |
| |
| PIG-284: target for building source jar (oae via olgan) |
| |