| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| Pig Change Log |
| |
| Trunk (unreleased changes) |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-1234: Unable to create input slice for har:// files (pradeepkth) |
| |
| PIG-1200: Using TableInputFormat in HBaseStorage (zjffdu via pradeepkth) |
| |
| PIG-1148: Move splitable logic from pig latin to InputFormat (zjffdu via |
| pradeepkth) |
| |
| PIG-1141: Make streaming work with the new load-store interfaces (rding via |
| pradeepkth) |
| |
| PIG-1110: Handle compressed file formats -- Gz, BZip with the new proposal |
| (rding via pradeepkth) |
| |
| PIG-1088: change merge join and merge join indexer to work with new LoadFunc |
| interface (thejas via pradeepkth) |
| |
| PIG-879: Pig should provide a way for input location string in load statement |
| to be passed as-is to the Loader (rding via pradeepkth) |
| |
| PIG-966: load-store-redesign branch: change SampleLoader and subclasses to |
| work with new LoadFunc interface (thejas via pradeepkth) |
| |
| PIG-1094: Fix unit tests corresponding to source changes so far (pradeepkth) |
| |
| PIG-1090: Update sources to reflect recent changes in load-store interfaces |
| (pradeepkth) |
| |
| PIG-1072: ReversibleLoadStoreFunc interface should be removed to enable |
| different load and store implementation classes to be used in a reversible |
| manner (rding via pradeepkth) |
| |
| IMPROVEMENTS |
| |
| PIG-1226: support for additional jar files (thejas via olgan) |
| |
| PIG-1230: Streaming input in POJoinPackage should use nonspillable bag to |
| collect tuples (ashutoshc) |
| |
| PIG-1224: Collected group should change to use new (internal) bag (ashutoshc) |
| |
| PIG-1046: join algorithm specification is within double quotes (ashutoshc) |
| |
| PIG-1209: Port POJoinPackage to proactively spill (ashutoshc) |
| |
| PIG-1190: Handling of quoted strings in pig-latin/grunt commands (ashutoshc) |
| |
| PIG-1214: Pig 0.6 Docs fixes (chandec via olgan) |
| |
| PIG-977: exit status does not account for JOB_STATUS.TERMINATED (ashutoshc) |
| |
| PIG-1192: Pig 0.6 Docs fixes (chandec via olgan) |
| |
| PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan) |
| |
| PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan) |
| |
| PIG-1102: Collect number of spills per job (sriranjan via olgan) |
| |
| PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs |
| (dvryaboy via pradeepkth) |
| |
| PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan) |
| |
| PIG-1163: Pig/Zebra 0.6.0 release (chandec via olgan) |
| |
| PIG-1156: Add aliases to ExecJobs and PhysicalOperators (dvryaboy via gates) |
| |
| PIG-1161: add missing license headers (dvryaboy via olgan) |
| |
| PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi |
| via olgan) |
| |
| PIG-760: Add a new PigStorageSchema load/store function that |
| stores schemas for text files (dvryaboy via gates) |
| |
| PIG-1106: FR join should not spill (ankit.modi via olgan) |
| |
| PIG-1147: Zebra Docs for Pig 0.6.0 (chandec via olgan) |
| |
| PIG-1129: Pig UDF doc: fieldsToRead function (chandec via olgan) |
| |
| PIG-978: MQ docs update (chandec via olgan) |
| |
| PIG-990: Provide a way to pin LogicalOperator Options (dvryaboy via gates) |
| |
| PIG-1103: refactoring of commit tests (olgan) |
| |
| PIG-1101: Allow argument to limit to be long in addition to int (ashutoshc via |
| gates) |
| |
| PIG-872: use distributed cache for the replicated data set in FR join |
| (sriranjan via olgan) |
| |
| PIG-1053: Consider moving to Hadoop for local mode (ankit.modi via olgan) |
| |
| PIG-1085: Pass JobConf and UDF specific configuration information to UDFs |
| (gates) |
| |
| PIG-1173: pig cannot be built without an internet connection (jmhodges via daijy) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-1216: New load store design does not allow Pig to validate inputs and |
| outputs up front (ashutoshc via pradeepkth) |
| |
| PIG-1239: PigContext.connect() should not create a jobClient and jobClient |
| should be created on demand when needed (pradeepkth) |
| |
| PIG-1169: Top-N queries produce incorrect results when a store statement is added between order by and limit statement (rding) |
| |
| PIG-1131: Pig simple join does not work when it contains empty lines (ashutoshc) |
| |
| PIG-834: incorrect plan when algebraic functions are nested (ashutoshc) |
| |
| PIG-1217: Fix argToFuncMapping in Piggybank Top function (dvryaboy via gates) |
| |
| PIG-1154: Local Mode fails when hadoop config directory is specified in |
| classpath (ankit.modi via gates) |
| |
| PIG-1124: Unable to set Custom Job Name using the -Dmapred.job.name parameter (ashutoshc) |
| |
| PIG-1213: Schema serialization is broken (pradeepkth) |
| |
| PIG-1194: ERROR 2055: Received Error while processing the map plan (rding via ashutoshc) |
| |
| PIG-1204: Pig hangs when joining two streaming relations in local mode |
| (rding) |
| |
| PIG-1191: POCast throws exception for certain sequences of LOAD, FILTER, |
| FOREACH (pradeepkth via gates) |
| |
| PIG-1171: Top-N queries produce incorrect results when followed by a cross statement (rding via olgan) |
| |
| PIG-1159: merge join right side table does not support comma separated paths |
| (rding via olgan) |
| |
| PIG-1158: pig command line -M option doesn't support table union correctly |
| (comma separated paths) (rding via olgan) |
| |
| PIG-1143: Poisson Sample Loader should compute the number of samples required |
| only once (sriranjan via olgan) |
| |
| PIG-1157: Successive replicated joins do not generate Map Reduce plan and fail |
| due to OOM (rding via olgan) |
| |
| PIG-1075: Error in Cogroup when key fields types don't match (rding via olgan) |
| |
| PIG-973: type resolution inconsistency (rding via olgan) |
| |
| PIG-1135: skewed join partitioner returns negative partition index (yinghe |
| via olgan) |
| |
| PIG-1134: Skewed Join sampling job overwhelms the name node (sriranjan via |
| olgan) |
| |
| PIG-1105: COUNT_STAR accumulate interface implementation causes failure |
| (sriranjan via olgan) |
| |
| PIG-1118: expression with aggregate functions returning null, with accumulate |
| interface (yinghe via olgan) |
| |
| PIG-1068: COGROUP fails with 'Type mismatch in key from map: expected |
| org.apache.pig.impl.io.NullableText, recieved |
| org.apache.pig.impl.io.NullableTuple' (rding via gates) |
| |
| PIG-1113: Diamond query optimization throws error in JOIN (rding via olgan) |
| |
| PIG-1116: Remove redundant map-reduce job for merge join (pradeepkth) |
| |
| PIG-1114: MultiQuery optimization throws error when merging 2 level spl (rding |
| via olgan) |
| |
| PIG-1108: Incorrect map output key type in MultiQuery optimiza (rding via |
| olgan) |
| |
| PIG-1022: optimizer pushes filter before the foreach that generates column |
| used by filter (daijy via gates) |
| |
| PIG-1107: PigLineRecordReader bails out on an empty line for compressed data |
| (ankit.modi via olgan) |
| |
| PIG-598: Parameter substitution ($PARAMETER) should not be performed in |
| comments (thejas via olgan) |
| |
| PIG-1064: Behaviour of COGROUP with and without schema when using "*" operator |
| (pradeepkth) |
| |
| PIG-1086: Nested sort by * throws exception (rding via daijy) |
| |
| PIG-1146: Inconsistent column pruning in LOUnion (daijy) |
| |
| PIG-1176: Column Pruner issues in union of loader with and without schema |
| (daijy) |
| |
| PIG-1184: PruneColumns optimization does not handle the case of foreach |
| flatten correctly if flattened bag is not used later (daijy) |
| |
| PIG-1189: StoreFunc UDF should ship to the backend automatically without |
| "register" (daijy) |
| |
| PIG-1212: LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null (daijy) |
| |
| Release 0.6.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-922: Logical optimizer: push up project (daijy) |
| |
| IMPROVEMENTS |
| |
| PIG-1084: Pig 0.6.0 Documentation improvements (chandec via olgan) |
| |
| PIG-1089: Pig 0.6.0 Documentation (chandec via olgan) |
| |
| PIG-958: Splitting output data on key field (ankur via pradeepkth) |
| |
| PIG-1058: FINDBUGS: remaining "Correctness Warnings" (olgan) |
| |
| PIG-1036: Fragment-replicate left outer join (ankit.modi via pradeepkth) |
| |
| PIG-920: optimizing diamond queries (rding via pradeepkth) |
| |
| PIG-1040: FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be (olgan) |
| |
| PIG-1059: FINDBUGS: remaining Bad practice + Multithreaded correctness Warning (olgan) |
| |
| PIG-953: Enable merge join in pig to work with loaders and store functions |
| which can internally index sorted data (pradeepkth) |
| |
| PIG-1055: FINDBUGS: remaining "Dodgy Warnings" (olgan) |
| |
| PIG-1052: FINDBUGS: remaining performance warnings (olgan) |
| |
| PIG-1037: Converted sorted and distinct bags to use the new active spilling |
| paradigm (yinghe via gates) |
| |
| PIG-1051: FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (olgan) |
| |
| PIG-1050: FINDBUGS: DLS_DEAD_LOCAL_STORE: Dead store to local variable (olgan) |
| |
| PIG-1045: Integration with Hadoop 20 New API (rding via pradeepkth) |
| |
| PIG-1043: FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class |
| (olgan) |
| |
| PIG-1047: FINDBUGS: URF_UNREAD_FIELD: Unread field (olgan) |
| |
| PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new |
| String(String) constructor (olgan) |
| |
| PIG-984: Add map side grouping for data that is already collected when |
| it is read into the map (rding via gates) |
| |
| PIG-1025: Add ability to set job priority from Pig Latin script (kevinweil via |
| gates) |
| |
| PIG-1028: FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number |
| constructor; use static valueOf instead (olgan) |
| |
| PIG-1012: FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in |
| serializable class (olgan) |
| |
| PIG-1013: FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on |
| an array (olgan) |
| |
| PIG-1011: FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't |
| define serialVersionUID (olgan) |
| |
| PIG-1009: FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream (olgan) |
| |
| PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan) |
| |
| PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with |
| a lower case letter (olgan) |
| |
| PIG-1023: FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL (olgan) |
| |
| PIG-1019: added findbugs exclusion file (olgan) |
| |
| PIG-983: PERFORMANCE: multi-query optimization on multiple group bys |
| following a join or cogroup (rding via pradeepkth) |
| |
| PIG-975: Need a databag that does not register with SpillableMemoryManager and |
| spill data pro-actively (yinghe via olgan) |
| |
| PIG-891: Fixing dfs statement for Pig (zjffdu via daijy) |
| |
| PIG-956: 10 minute commit tests (olgan) |
| |
| PIG-948: [Usability] Relating pig script with MR jobs (ashutoshc via daijy) |
| |
| PIG-960: Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage (ankit.modi via daijy) |
| |
| PIG-1020: Include an ant target to build pig.jar without hadoop libraries (daijy) |
| |
| PIG-1033: javac warnings: deprecated hadoop APIs (daijy) |
| |
| PIG-1041: javac warnings: cast, fallthrough, serial (daijy) |
| |
| PIG-1042: javac warnings: unchecked (daijy) |
| |
| PIG-1038: Optimize nested distinct/sort to use secondary key (daijy) |
| |
| PIG-979: Accumulator Interface for UDFs (yinghe via daijy) |
| |
| OPTIMIZATIONS |
| |
| PIG-922: Logical optimizer: push up project (daijy) |
| |
| BUG FIXES |
| |
| PIG-1080: PigStorage may miss records when loading a file (rding via olgan) |
| |
| PIG-1071: Support comma separated file/directory names in load statements |
| (rding via pradeepkth) |
| |
| PIG-970: Changes to make HBase loader work with HBase 0.20 (vbarat and zjffdu |
| via gates) |
| |
| PIG-1035: support for skewed outer join (sriranjan via pradeepkth) |
| |
| PIG-1030: explain and dump not working with two UDFs inside inner plan of |
| foreach (rding via pradeepkth) |
| |
| PIG-1048: inner join using 'skewed' produces multiple rows for keys with |
| single row in both input relations (sriranjan via gates) |
| |
| PIG-1063: Pig does not call checkOutSpecs() on OutputFormat provided by |
| StoreFunc in the multistore case (pradeepkth) |
| |
| PIG-746: Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag |
| should never be serialized (rding via pradeepkth) |
| |
| PIG-1027: Number of bytes written are always zero in local mode (zjffdu via gates) |
| |
| PIG-976: Multi-query optimization throws ClassCastException (rding via |
| pradeepkth) |
| |
| PIG-858: Order By followed by "replicated" join fails while compiling MR-plan |
| from physical plan (ashutoshc via gates) |
| |
| PIG-968: Fix findContainingJar to work properly when there is a + in the jar |
| path (tlipcon via gates) |
| |
| PIG-738: Regexp passed from pigscript fails in UDF (pradeepkth) |
| |
| PIG-942: Maps are not implicitly casted (pradeepkth) |
| |
| PIG-513: Removed unnecessary bounds check in DefaultTuple (ashutoshc via |
| gates) |
| |
| PIG-951: Set parallelism explicitly to 1 for indexing job in merge join |
| (ashutoshc via gates) |
| |
| PIG-592: schema inferred incorrectly (daijy) |
| |
| PIG-989: Allow type merge between numerical type and non-numerical type (daijy) |
| |
| PIG-894: order-by fails when input is empty (daijy) |
| |
| PIG-995: Limit Optimizer throws exception "ERROR 2156: Error while fixing projections" (daijy) |
| |
| PIG-1000: InternalCachedBag.java generates javac warning and findbug warning (yinghe via daijy) |
| |
| PIG-921: Strange use case for Join which produces different results in local and map reduce mode (daijy) |
| |
| PIG-1024: Script contains nested limit fail due to "LOLimit does not support multiple outputs" (daijy) |
| |
| PIG-644: Duplicate column names in foreach do not throw parser error (daijy) |
| |
| PIG-927: null should be handled consistently in Join (daijy) |
| |
| PIG-790: Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond) (daijy) |
| |
| PIG-1001: Generate more meaningful error message when one input file does not exist (daijy) |
| |
| PIG-1060: MultiQuery optimization throws error for multi-level splits (rding via daijy) |
| |
| PIG-1128: column pruning causing failure when foreach has user-specified |
| schema (daijy) |
| |
| PIG-1127: Logical operator should contain individual copy of schema object |
| (daijy) |
| |
| PIG-1133: UDFContext should be made available to LoadFunc.bindTo (daijy) |
| |
| PIG-1132: Column Pruner issues in dealing with unprunable loader (daijy) |
| |
| PIG-1142: Got NullPointerException merge join with pruning (daijy) |
| |
| PIG-1155: Need to make sure existing loaders work "as is" (daijy) |
| |
| PIG-1144: set default_parallelism construct does not set the number of |
| reducers correctly (daijy) |
| |
| PIG-1165: Signature of loader does not set correctly for order by (daijy) |
| |
| PIG-761: ERROR 2086 on simple JOIN (daijy) |
| |
| PIG-1172: PushDownForeachFlatten shall not push ForEach below Join if the |
| flattened fields is used in Join (daijy) |
| |
| PIG-1180: Piggybank should compile even if we only have |
| "pig-withouthadoop.jar" but no "pig.jar" in the pig home directory (daijy) |
| |
| PIG-1185: Data bags do not close spill files after using iterator to read |
| tuples (yinghe via daijy) |
| |
| PIG-1186: Pig does not take values in "pig-cluster-hadoop-site.xml" (daijy) |
| |
| PIG-1193: Secondary sort issue on nested desc sort (daijy) |
| |
| PIG-1195: POSort should take care of sort order (daijy) |
| |
| PIG-1210: fieldsToRead send the same fields more than once in some cases (daijy) |
| |
| PIG-1231: DefaultDataBagIterator.hasNext() should be idempotent in all cases |
| (daijy) |
| |
| Release 0.5.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-1039: documentation update (chandec via olgan) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-963: Join in local mode matches null keys (pradeepkth) |
| PIG-660: Integration with Hadoop 20 (sms via olgan) |
| |
| Release 0.4.0 - 2009-09-26 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-892: Make COUNT and AVG deal with nulls in accordance with the SQL standard |
| (olgan) |
| PIG-734: Changed maps to only take strings as keys (gates) |
| |
| IMPROVEMENTS |
| |
| PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan) |
| |
| PIG-578: join ... outer, ... outer semantics are no-ops, should produce |
| corresponding null values (pradeepkth) |
| |
| PIG-936: making dump and PigDump independent from Tuple.toString (daijy) |
| |
| PIG-890: Create a sampler interface and improve the skewed join sampler (sriranjan via daijy) |
| |
| PIG-922: Logical optimizer: push up project part 1 (daijy) |
| |
| PIG-812: COUNT(*) does not work (breed) |
| |
| PIG-923: Allow specifying log file location through pig.properties (dvryaboy via daijy) |
| |
| PIG-926: Merge-Join phase 2 (ashutoshc via pradeepkth) |
| |
| PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth) |
| |
| PIG-893: Added string -> integer, long, float, and double casts (zjffdu via gates) |
| |
| PIG-833: Added Zebra, new columnar storage mechanism for HDFS (rangadi plus many others via gates) |
| |
| PIG-697: Proposed improvements to pig's optimizer, Phase5 (daijy) |
| |
| PIG-895: Default parallel for Pig (daijy) |
| |
| PIG-820: Change RandomSampleLoader to take a LoadFunc instead of extending |
| BinStorage. Added new Samplable interface for loaders to implement |
| allowing them to be used by RandomSampleLoader (ashutoshc via gates) |
| |
| PIG-832: Make import list configurable (daijy) |
| |
| PIG-697: Proposed improvements to pig's optimizer (sms) |
| |
| PIG-753: Allow UDFs with no parameters (zjffdu via gates) |
| |
| PIG-765: jdiff for pig (gkesavan) |
| |
| OPTIMIZATIONS |
| |
| PIG-792: skew join implementation (sriranjan via olgan) |
| |
| BUG FIXES |
| |
| PIG-964: Handling null in skewed join (sriranjan via olgan) |
| |
| PIG-962: Skewed join creates 3 map reduce jobs (sriranjan via olgan) |
| |
| PIG-957: Tutorial is broken with 0.4 branch and trunk (pradeepkth) |
| |
| PIG-955: Skewed join produces invalid results (yinghe via olgan) |
| |
| PIG-954: Skewed join fails when pig.skewedjoin.reduce.memusage is not |
| configured (yinghe via olgan) |
| |
| PIG-882: log level not propagated to loggers - duplicate message (daijy) |
| |
| PIG-943: Pig crash when it cannot get counter from hadoop (daijy) |
| |
| PIG-935: Skewed join throws an exception when used with map keys (sriranjan |
| via pradeepkth) |
| |
| PIG-934: Merge join implementation currently does not seek to right point |
| on the right side input based on the offset provided by the index |
| (ashutoshc via pradeepkth) |
| |
| PIG-925: Fix join in local mode (daijy) |
| |
| PIG-913: Error in Pig script when grouping on chararray column (daijy) |
| |
| PIG-907: Provide multiple version of HashFNV (Piggybank) (daijy) |
| |
| PIG-905: TOKENIZE throws exception on null data (daijy) |
| |
| PIG-901: InputSplit (SliceWrapper) created by Pig is big in size due to |
| serialized PigContext (pradeepkth) |
| |
| PIG-882: log level not propagated to loggers (daijy) |
| |
| PIG-880: Order by is broken with complex fields (sms) |
| |
| PIG-773: Empty complex constants (empty bag, empty tuple and empty map) |
| should be supported (ashutoshc via sms) |
| |
| PIG-695: Pig should not fail when error logs cannot be created (sms) |
| |
| PIG-878: Pig is returning too many blocks in the input split. (arunc via gates) |
| |
| PIG-888: Pig does not pass udf to the backend in some situations (daijy) |
| |
| PIG-728: All backend error messages must be logged to preserve the |
| original error messages (sms) |
| |
| PIG-877: Push up filter does not account for added columns in foreach |
| (sms) |
| |
| PIG-883: udf import list does not send to the backend (daijy) |
| |
| PIG-881: Pig should ship load udfs to the backend (daijy) |
| |
| PIG-876: limit changes order of order-by to ascending (daijy) |
| |
| PIG-851: Map type used as return type in UDFs not recognized at all times |
| (zjffdu via sms) |
| |
| PIG-861: POJoinPackage lose tuple in large dataset (daijy) |
| |
| PIG-797: Limit with ORDER BY producing wrong results (daijy) |
| |
| PIG-850: Dump produce wrong result while "store into" is ok (daijy) |
| |
| PIG-852: pig -version or pig -help returns exit code of 1 (milindb via |
| olgan) |
| |
| PIG-849: Local engine loses records in splits (hagleitn via olgan) |
| |
| PIG-939: Fix checkstyle ivy configuration (gkesavan) |
| |
| Release 0.3.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-817: documentation update (chandec via olgan) |
| |
| PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates) |
| |
| PIG-831: Turned off reporting of records and bytes written for multi-store |
| queries as the returned results are confusing and wrong. (gates) |
| |
| PIG-813: documentation updates (chandec via olgan) |
| |
| PIG-825: PIG_HADOOP_VERSION should be set to 18 (dvryaboy via gates) |
| |
| PIG-795: support for SAMPLE command (ericg via olgan) |
| |
| PIG-619: Create one InputSplit even when the input file is zero length |
| so that hadoop runs maps and creates output for the next |
| job (gates) |
| |
| PIG-697: Proposed improvements to pig's optimizer (sms) |
| |
| PIG-700: To automate the pig patch test process (gkesavan via sms) |
| |
| PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu |
| via gates) |
| |
| PIG-652: Adapt changes in store interface to multi-query changes (hagleitn |
| via gates) |
| |
| PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create |
| empty bags (pradeepkth) |
| |
| PIG-741: Allow limit to be nested in a foreach. |
| |
| PIG-627: multiquery support phase 3 (hagleitn and Richard Ding via olgan) |
| |
| PIG-743: To implement clover (gkesavan) |
| |
| PIG-701: Implement IVY for resolving pig dependencies (gkesavan) |
| |
| PIG-626: Add access to hadoop counters (shubhamc via gates) |
| |
| PIG-627: multiquery support phase 1 and phase 2 (hagleitn and Richard Ding via pradeepkth) |
| |
| BUG FIXES |
| |
| PIG-846: MultiQuery optimization in some cases has an issue when there is a |
| split in the map plan (pradeepkth) |
| |
| PIG-835: Multiquery optimization does not handle the case where the map keys |
| in the split plans have different key types (tuple and non tuple key type) |
| (pradeepkth) |
| |
| PIG-839: incorrect return codes on failure when using -f or -e flags (hagleitn |
| via sms) |
| |
| PIG-796: support conversion from numeric types to chararray (Ashutosh Chauhan |
| via pradeepkth) |
| |
| PIG-564: problem with parameter substitution and special characters (olgan) |
| |
| PIG-802: PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan) |
| |
| PIG-816: PigStorage() does not accept Unicode characters in its constructor (pradeepkth) |
| |
| PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan) |
| |
| PIG-819: run -param -param; is a valid grunt command (milindb via olgan) |
| |
| PIG-656: Use of eval or any other keyword in the package hierarchy of a UDF causes |
| parse exception (milindb via sms) |
| |
| PIG-814: Make Binstorage more robust when data contains record markers (pradeepkth) |
| |
| PIG-811: Globs with "?" in the pattern are broken in local mode (hagleitn via |
| olgan) |
| |
| PIG-810: Fixed NPE in PigStats (gates) |
| |
| PIG-804: problem with lineage with double map redirection (pradeepkth) |
| |
| PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs |
| "FileSystem closed" error on large input (pradeepkth) |
| |
| PIG-693: Parameter to UDF which is an alias returned in another UDF in nested |
| foreach causes incorrect results (thejas via sms) |
| |
| PIG-725: javadoc: warning - Multiple sources of package comments found for |
| package "org.apache.commons.logging" (gkesavan via sms) |
| |
| PIG-745: Add DataType.toString() to force basic types to chararray, useful |
| for UDFs that want to handle all simple types as strings (ciemo via gates) |
| |
| PIG-514: COUNT returns no results as a result of two filter statements in |
| FOREACH (pradeepkth) |
| |
| PIG-789: Fix dump and illustrate to work with new multi-query feature |
| (hagleitn via gates) |
| |
| PIG-774: Pig does not handle Chinese characters (in both the parameter substitution |
| using -param_file or embedded in the Pig script) correctly (daijy) |
| |
| PIG-800: Fix distinct and order in local mode to not go into an infinite loop |
| (gates) |
| |
| PIG-806: to remove author tags in the pig source code (sms) |
| |
| PIG-799: Unit tests on windows are failing after multiquery commit (daijy) |
| |
| PIG-781: Error reporting for failed MR jobs (hagleitn via olgan) |
| |
| Release 0.2.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-157: Add types and rework execution pipeline (gates) |
| |
| PIG-458: integration with Hadoop 18 (olgan) |
| |
| NEW FEATURES |
| PIG-139: command line editing (daijy via olgan) |
| |
| PIG-554: Added fragment replicate map side join (shravanmn via pkamath and gates) |
| |
| PIG-535: added rmf command |
| |
| PIG-704: Added ALIASES command that shows all currently defined ALIASES. |
| Changed semantics of DEFINE to define last used alias if no argument is |
| given (ericg via gates) |
| |
| PIG-713: Added alias completion as part of tab completion in grunt (ericg |
| via gates) |
| |
| IMPROVEMENTS |
| |
| PIG-270: proper line number for parse errors (daijy via olgan) |
| |
| PIG-367: convenience function for UDFs to name schema |
| |
| PIG-443: Illustrate for the Types branch (shubhamc via olgan) |
| |
| PIG-599: Added buffering to BufferedPositionedInputStream (gates) |
| |
| PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth |
| via olgan) |
| |
| PIG-628: misc performance improvements (pradeepkth via olgan) |
| |
| PIG-589: error handling, phase 1-2 (sms via olgan) |
| |
| PIG-590: error handling, phase 3 (sms) |
| |
| PIG-591: error handling, phase 4 (sms) |
| |
| PIG-545: PERFORMANCE: Sampler for order bys does not produce a good |
| distribution (pradeepkth) |
| |
| PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan) |
| |
| PIG-636: Use lightweight bag implementations which do not register with |
| SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-563: support for multiple combiner invocations (pradeepkth via olgan) |
| |
| PIG-465: performance improvement - removing keys from the value (pradeepkth |
| via olgan) |
| |
| PIG-450: PERFORMANCE: Distinct should make use of combiner to remove |
| duplicate values from keys. (gates) |
| |
| PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth |
| via gates) |
| |
| BUG FIXES |
| |
| PIG-294: string comparator unit tests (sms via pi_song) |
| |
| PIG-258: cleaning up directories on failure (daijy via olgan) |
| |
| PIG-363: fix for describe to produce schema name |
| |
| PIG-368: making JobConf available to Load/Store UDFs |
| |
| PIG-311: cross is broken |
| |
| PIG-369: support for filter UDFs |
| |
| PIG-375: support for implicit split |
| |
| PIG-301: fix for order by descending |
| |
| PIG-378: fix for GENERATE + LIMIT |
| |
| PIG-362: don't push limit above generate with flatten |
| |
| PIG-381: bincond does not handle null data |
| |
| PIG-382: bincond throws typecast exception |
| |
| PIG-352: java.lang.ClassCastException when invalid field is accessed |
| |
| PIG-329: TestStoreOld, 2 unit tests were broken |
| |
| PIG-353: parsing of complex types |
| |
| PIG-392: error handling with multiple MRjobs |
| |
| PIG-397: code defaults to single reducer |
| |
| PIG-373: unconnected load causes problem |
| |
| PIG-413: problem with float sum |
| |
| PIG-398: Expressions not allowed inside foreach (sms via olgan) |
| |
| PIG-418: divide by 0 problem |
| |
| PIG-402: order by with user comparator (shravanmn via olgan) |
| |
| PIG-415: problem with comparators (shravanmn via olgan) |
| |
| PIG-422: cross is broken (shravanmn via olgan) |
| |
| PIG-407: need to clone operators (pradeepkth via olgan) |
| |
| PIG-428: TypeCastInserter does not replace projects in inner plans |
| correctly (pradeepkth via olgan) |
| |
| PIG-421: error with complex nested plan (sms via olgan) |
| |
| PIG-429: Self join with implicit split has the join output in wrong order |
| (pradeepkth via olgan) |
| |
| PIG-434: short-circuit AND and OR (pradeepkth via olgan) |
| |
| PIG-333: allowing no parentheses with single column alias with flatten (sms |
| via olgan) |
| |
| PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan) |
| |
| PIG-436: alias is lost when single column is flattened (pradeepkth via |
| olgan) |
| |
| PIG-364: Limit returns incorrect records when we use multiple reducers |
| (daijy via olgan) |
| |
| PIG-439: disallow alias renaming (pradeepkth via olgan) |
| |
| PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth |
| via olgan) |
| |
| PIG-442: Disambiguated alias after a foreach flatten is not accessible a |
| couple of statements after the foreach (sms via olgan) |
| |
| PIG-424: nested foreach with flatten and agg gives an error (sms via |
| olgan) |
| |
| PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD |
| connection is fully established (olgan) |
| |
| PIG-430: Projections in nested filter and inside foreach do not work (sms |
| via olgan) |
| |
| PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries |
| (shravanmn via olgan) |
| |
| PIG-444: job.jar is left behind (pradeepkth via olgan) |
| |
| PIG-447: improved error messages (pradeepkth via olgan) |
| |
| PIG-448: explain broken after load with types (pradeepkth via olgan) |
| |
| PIG-380: invalid schema for databag constant (sms via olgan) |
| |
| PIG-451: If a field is part of group followed by flatten, then referring |
| to it causes a parse error (pradeepkth via olgan) |
| |
| PIG-455: "group" alias is lost after a flatten(group) (pradeepkth via olgan) |
| |
| PIG-459: increased sleep time before checking for job progress |
| |
| PIG-462: LIMIT N should create one output file with N rows (shravanmn via |
| olgan) |
| |
| PIG-376: set job name (olgan) |
| |
| PIG-463: POCast changes (pradeepkth via olgan) |
| |
| PIG-427: casting input to UDFs |
| |
| PIG-437: as in alias names causing problems (sms via olgan) |
| |
| PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan) |
| |
| PIG-470: TextLoader should produce bytearrays (sms via olgan) |
| |
| PIG-335: lineage (sms via olgan) |
| |
| PIG-464: bag schema definition (pradeepkth via olgan) |
| |
| PIG-457: report 100% on successful jobs only (shravanmn via olgan) |
| |
| PIG-471: ignoring status errors from hadoop (pradeepkth via olgan) |
| |
| PIG-489: (*) processing (sms via olgan) |
| |
| PIG-475: missing heartbeats (shravanmn via olgan) |
| |
| PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan) |
| |
| PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan) |
| |
| PIG-501: Make branches/types work under cygwin (daijy via olgan) |
| |
| PIG-504: cleanup illustrate not to produce cn= (shubhamc via olgan) |
| |
| PIG-469: make sure that describe says "int" not "integer" (sms via olgan) |
| |
| PIG-495: projecting of bags only give 1 field (olgan) |
| |
| PIG-500: Load Func for POCast is not being set in some cases (sms via |
| olgan) |
| |
| PIG-499: parser issue with as (sms via olgan) |
| |
| PIG-507: permission error not reported (pradeepkth via olgan) |
| |
| PIG-508: problem with double joins (pradeepkth via olgan) |
| |
| PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan) |
| |
| PIG-505: working with map elements (sms via olgan) |
| |
| PIG-517: load function with parameters does not work with cast (pradeepkth |
| via olgan) |
| |
| PIG-525: make sure cast for udf parameters works (olgan) |
| |
| PIG-512: Expressions in foreach lead to errors (sms via olgan) |
| |
| PIG-528: use UDF return in schema computation (sms via olgan) |
| |
| PIG-527: allow PigStorage to write out complex output (sms via olgan) |
| |
| PIG-537: Failure in Hadoop map collect stage due to type mismatch in the |
| keys used in cogroup (pradeepkth via olgan) |
| |
| PIG-538: support for null constants (pradeepkth via olgan) |
| |
| PIG-385: more null handling (pradeepkth via olgan) |
| |
| PIG-546: FilterFunc calls empty constructor when it should be calling |
| parameterized constructor (sms via olgan) |
| |
| PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via |
| olgan) |
| |
| PIG-501: make unit tests run under windows (daijy via olgan) |
| |
| PIG-543: Restore local mode to truly run locally instead of use map |
| reduce. (shubhamc via gates) |
| |
| PIG-556: Changed FindQuantiles to report progress. Fixed issue with null |
| reporter being passed to EvalFuncs. (gates) |
| |
| PIG-6: Add load support from hbase (hustlmsp via gates) |
| |
| PIG-522: make negation work (pradeepkth via olgan) |
| |
| PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple |
| error (pradeepkth via olgan) |
| |
| PIG-572: A PigServer.registerScript() method, which lets a client |
| programmatically register a Pig Script. (shubhamc via gates) |
| |
| PIG-570: problems with handling bzip data (breed via olgan) |
| |
| PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan) |
| |
| PIG-623: Fix spelling errors in output messages (tomwhite via sms) |
| |
| PIG-622: Include pig executable in distribution (tomwhite via sms) |
| |
| PIG-615: Wrong number of jobs with limit (shravanmn via sms) |
| |
| PIG-635: POCast.java has incorrect formatting (sms) |
| |
| PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext() |
| gives a null pointer exception (pradeepkth) |
| |
| PIG-632: Improved error message for binary operators (sms) |
| |
| PIG-636: Performance improvement: Use lightweight bag implementations which do not |
| register with SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-631: 4 Unit test failures on Windows (daijy) |
| |
| PIG-645: Streaming is broken with the latest trunk (pradeepkth) |
| |
| PIG-646: Distinct UDF should report progress (sms) |
| |
| PIG-647: memory size passed on pig command line does not get propagated |
| to JobConf (sms) |
| |
| PIG-648: BinStorage fails when it finds markers unexpectedly in the data |
| (pradeepkth) |
| |
| PIG-649: RandomSampleLoader does not handle skipping correctly in |
| getNext() (pradeepkth) |
| |
| PIG-560: UTFDataFormatException (encoded string too long) is thrown when |
| storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms) |
| |
| PIG-642: Limit after FRJ causes problems (daijy) |
| |
| PIG-637: Limit broken after order by in the local mode (shubhamc via |
| olgan) |
| |
| PIG-553: EvalFunc.finish() not getting called (shravanmn via sms) |
| |
| PIG-654: Optimize build.xml (daijy) |
| |
| PIG-574: allow running scripts from within the grunt shell (hagleitn via |
| olgan) |
| |
| PIG-665: Map key type not correctly set (for use when key is null) when |
| map plan does not have localrearrange (pradeepkth) |
| |
| PIG-590: error handling on the backend (sms via olgan) |
| |
| PIG-590: error handling on the backend (sms) |
| |
| PIG-658: Data type long : When 'L' or 'l' is included with data |
| (123L or 123l) load produces null value. Also the case with Float (thejas |
| via sms) |
| |
| PIG-591: Error handling phase four (sms via pradeepkth) |
| |
| PIG-664: Semantics of * is not consistent (sms) |
| |
| PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms) |
| |
| PIG-655: Comparison of schemas of bincond operands is flawed (sms via |
| pradeepkth) |
| |
| PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth |
| via sms) |
| |
| PIG-577: outer join query loses name information (sms via pradeepkth) |
| |
| PIG-690: UNION doesn't work in the latest code (pradeepkth via sms) |
| |
| PIG-544: Utf8StorageConverter.java does not always produce NULLs when data |
| is malformed (thejas via sms) |
| |
| PIG-532: Casting a field removes its alias. (thejas via sms) |
| |
| PIG-705: Pig should display a better error message when backend error |
| messages cannot be parsed (sms) |
| |
| PIG-650: pig should look for and use the pig specific |
| 'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the |
| HOD case (sms) |
| |
| PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan) |
| |
| PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan) |
| |
| PIG-708: implement releaseaudit target to use rats on pig (gkesavan via |
| olgan) |
| |
| PIG-703: user documentation (chandec via olgan) |
| |
| PIG-711: Implement checkstyle for pig (gkesavan via olgan) |
| |
| PIG-715: doc updates (chandec via olgan) |
| |
| PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates) |
| |
| PIG-692: When running a job from a script, use the name of that script as |
| the default name for the job (vzaliva via gates) |
| |
| PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan) |
| |
| PIG-720: further doc cleanup (gkesavan via olgan) |
| |
| Release 0.1.1 - 2008-12-04 |
| |
| INCOMPATIBLE CHANGES |
| |
| NEW FEATURES |
| |
| IMPROVEMENTS |
| |
| PIG-253: integration with hadoop-18 |
| |
| BUG FIXES |
| |
| PIG-342: Fix DistinctDataBag to recalculate size after it has spilled. |
| (bdimcheff via gates) |
| |
| Release 0.1.0 - 2008-09-11 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-123: requires escape of '\' in chars and string |
| |
| NEW FEATURES |
| |
| PIG-20: Added custom comparator functions for order by (phunt via gates) |
| |
| PIG-94: Streaming implementation (arunc via olgan) |
| |
| PIG-58: parameter substitution |
| |
| PIG-55: added custom splitter (groves via olgan) |
| |
| PIG-59: Add a new ILLUSTRATE command (shubhamc via gates) |
| |
| PIG-256: Added variable argument support for UDFs (pi_song) |
| |
| IMPROVEMENTS |
| |
| PIG-8: added binary comparator (olgan) |
| |
| PIG-11: Add capability to search for jar file to register. (antmagna via olgan) |
| |
| PIG-7: Added use of combiner in some restricted cases. (gates) |
| |
| PIG-47: Added methods to DataMap to provide access to its content |
| |
| PIG-30: Rewrote DataBags to better handle decisions of when to spill to |
| disk and to spill more intelligently. (gates) |
| |
| PIG-12: Added time stamps to log4j messages (phunt via gates) |
| |
| PIG-44: Added adaptive decision of the number of records to hold in memory |
| before spilling (utkarsh) |
| |
| PIG-56: Made DataBag implement Iterable. (groves via gates) |
| |
| PIG-39: created more efficient version of read (spullara via olgan) |
| |
| PIG-32: Abstraction layer (olgan) |
| |
| PIG-83: Change everything except grunt and Main (PigServer on down) to use |
| common logging abstraction instead of log4j. By default in grunt, log4j |
| is still used as the logging layer. Also converted all System.out/err.println |
| statements to use logging instead. (francisoud via gates) |
| |
| PIG-13: adding version to the system (joa23 via olgan) |
| |
| PIG-113: Make explain output more understandable (pi_song via gates) |
| |
| PIG-120: Support map reduce in local mode. To do this user needs to |
| specify execution type as mapreduce and cluster name as local (joa23 via gates) |
| |
| PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates) |
| |
| PIG-111: Reworked configuration to be settable via properties. (joa23, pi_song, oae via gates) |
| |
| BUG FIXES |
| |
| PIG-24: Files that were incorrectly placed under test/reports have been |
| removed. ant clean now cleans test/reports. (milindb via gates) |
| |
| PIG-25: com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@) |
| |
| PIG-23: Made pig work with java 1.5. (milindb via gates) |
| |
| PIG-17: integrated with Hadoop 0.15 (olgan@) |
| |
| PIG-33: Help was commented out - uncommented (olgan) |
| |
| PIG-31: second half of concurrent mode problem addressed (olgan) |
| |
| PIG-14: added heartbeat functionality (olgan) |
| |
| PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release |
| |
| PIG-29: fixed bag factory to be properly initialized (utkarsh) |
| |
| PIG-43: fixed problem where using the combiner prevented a pig alias |
| from being evaluated more than once. (gates) |
| |
| PIG-45: Fixed pig.pl to not assume hodrc file is named the same as |
| cluster name (gates) |
| |
| PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples |
| instead of Tuples, causing Reducer to crash in some cases. |
| |
| PIG-41: Added patterns to svn:ignore |
| |
| PIG-51: Fixed combiner in the presence of flattening |
| |
| PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the |
| comparator function instead of Class.forName. (gates) |
| |
| PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@) |
| |
| PIG-77: Added eclipse specific files to svn:ignore |
| |
| PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates) |
| |
| PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates) |
| |
| PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arunc |
| via olgan) |
| |
| PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default |
| path. Also fix it to not die if pigclient.conf is missing. (craigm via |
| gates) |
| |
| PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill |
| files when they are done spilling (contributions by craigm, breed, and |
| gates, committed by gates) |
| |
| PIG-95: Remove System.exit() statements from inside pig (joa23 via gates) |
| |
| PIG-65: convert tabs to spaces (groves via olgan) |
| |
| PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when |
| more than one bag is involved (gates) |
| |
| PIG-92: Fix NullPointerException in PigContext due to uninitialized conf |
| reference. (francisoud via gates) |
| |
| PIG-80: In a number of places stack trace information was being lost by an |
| exception being caught, and a different exception then thrown. All those |
| locations have been changed so that the new exception now wraps the old. |
| (francisoud via gates) |
| |
| PIG-84: Converted printStackTrace calls to calls to the logger. |
| (francisoud via gates) |
| |
| PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates) |
| |
| PIG-99: Fix to make unit tests not run out of memory. (francisoud via |
| gates) |
| |
| PIG-107: enabled several tests. (francisoud via olgan) |
| |
| PIG-46: abort processing on error for non-interactive mode (olston via |
| olgan) |
| |
| PIG-109: improved exception handling (oae via olgan) |
| |
| PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can |
| be run w/o access to a hadoop cluster. (xuzh via gates) |
| |
| PIG-68: improvements to build.xml (joa23 via olgan) |
| |
| PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled |
| flattening the combiner case. (gates and oae) |
| |
| PIG-213: Remove non-static references to logger from data bags and tuples, |
| as it causes significant overhead (vgeschel via gates) |
| |
| PIG-284: target for building source jar (oae via olgan) |
| |