/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Pig Change Log
Release 0.11.2 (Unreleased)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3380: Fix e2e test float precision related test failures when run with -Dpig.exec.mapPartAgg=true (anlilin via rohini)
OPTIMIZATIONS
PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (njw45 via dvryaboy) (prev. applied to 0.12)
BUG FIXES
PIG-3455: Pig 0.11.1 OutOfMemory error (rohini)
PIG-3435: Custom Partitioner not working with MultiQueryOptimizer (knoguchi via daijy)
PIG-3385: DISTINCT no longer uses custom partitioner (knoguchi via daijy)
PIG-2507: Semicolon in parameters for UDF results in parsing error (tnachen via daijy)
PIG-3341: Strict datetime parsing and improve performance of loading datetime values (rohini)
PIG-3329: RANK operator failed when working with SPLIT (xalan via cheolsoo)
PIG-3345: Handle null in DateTime functions (rohini)
PIG-3315: Automaton dependency missing from Pig 11.1-h2 POM. (stevel@apache.org via daijy)
PIG-3223: AvroStorage does not handle comma separated input paths (dreambird via rohini)
PIG-3281: Pig version in pig.pom is incorrect in branch-0.11 (cheolsoo)
PIG-3282: Pig pom.xml does not bring in joda-time as dependency (sushanth via daijy)
PIG-3262: Pig contrib 0.11 doesn't compile on certain rpm systems (mgrover via cheolsoo)
PIG-3264: mvn signanddeploy target broken for pigunit, pigsmoke and piggybank (billgraham)
Release 0.11.1
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3256: Upgrade jython to 2.5.3 (legal concern) (daijy)
PIG-2988: start deploying pigunit maven artifact part of Pig release process (njw45 via rohini)
PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag. (knoguchi via rohini)
PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)
PIG-3202: CUBE operator not documented in user docs (prasanth_j via billgraham)
OPTIMIZATIONS
BUG FIXES
PIG-3267: HCatStorer fail in limit query (daijy)
PIG-3252: AvroStorage gives wrong schema for schemas with named records (mwagner via cheolsoo)
PIG-3132: NPE when illustrating a relation with HCatLoader (daijy)
PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 (prkommireddi via dvryaboy)
PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
PIG-3144: Erroneous map entry alias resolution leading to "Duplicate schema alias" errors (jcoveney via cheolsoo)
PIG-3212: Race Conditions in POSort and (Internal)SortedBag during Proactive Spill (kadeng via dvryaboy)
PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase (rohini)
Release 0.11.0 (2013-02-02)
INCOMPATIBLE CHANGES
PIG-3034: Remove Penny code from Pig repository (gates via cheolsoo)
PIG-2931: $ signs in the replacement string make parameter substitution fail (cheolsoo via jcoveney)
PIG-1891: Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates)
IMPROVEMENTS
PIG-3140: Document PigProgressNotificationListener configs (billgraham)
PIG-3139: Document reducer estimation (billgraham)
PIG-2341: Need better documentation on Pig/HBase integration (jthakrar and billgraham via billgraham)
PIG-3044: Trigger POPartialAgg compaction under GC pressure (dvryaboy)
PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini)
PIG-2934: HBaseStorage filter optimizations (billgraham)
PIG-2980: documentation for DateTime datatype (zjshen via thejas)
PIG-2982: add unit tests for DateTime type that test setting timezone (zjshen via thejas)
PIG-2937: generated field in nested foreach does not inherit the variable name as the field name (jcoveney)
PIG-3019: Need a target in build.xml for source releases (gates)
PIG-2832: org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext (prkommireddi via rohini)
PIG-2898: Parallel execution of e2e tests (iveselovsky via rohini)
PIG-2913: org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file (cheolsoo via julien)
PIG-2976: Reduce HBaseStorage logging (billgraham)
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham)
PIG-2579: Support for multiple input schemas in AvroStorage (cheolsoo via sms)
PIG-2946: Documentation of "history" and "clear" commands (xalan via azaroth)
PIG-2877: Make SchemaTuple work in foreach (and thus, in loads) (jcoveney)
PIG-2923: Lazily register bags with SpillableMemoryManager (dvryaboy)
PIG-2929: Improve documentation around AVG, CONCAT, MIN, MAX (cheolsoo via billgraham)
PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney)
PIG-2879: Pig current releases lack a UDF startsWith. This UDF tests if a given string starts with the specified prefix. (initialcontext via azaroth)
PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates)
PIG-2918: Avoid Spillable bag overhead where possible (dvryaboy)
PIG-2900: Streaming should provide conf settings in the environment (dvryaboy)
PIG-2353: RANK function like in SQL (xalan via azaroth)
PIG-2915: Builtin TOP udf is sensitive to null input bags (hazen via dvryaboy)
PIG-2901: Errors and lacks in document "Pig Latin Basics" (miyakawataku via billgraham)
PIG-2905: Improve documentation around REPLACE (cheolsoo via billgraham)
PIG-2882: Use Deque instead of Stack (mkhadikov via dvryaboy)
PIG-2781: LOSort isEqual method (xalan via dvryaboy)
PIG-2835: Optimizing the conversion from bytes to Integer/Long (jay23jack via dvryaboy)
PIG-2886: Add Scan TimeRange to HBaseStorage (ted.m via dvryaboy)
PIG-2895: jodatime jar missing in pig-withouthadoop.jar (thejas)
PIG-2888: Improve performance of POPartialAgg (dvryaboy)
PIG-2708: split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator (analog.sony via daijy)
PIG-2890: Revert PIG-2578 (dvryaboy)
PIG-2850: Pig should support loading macro files as resources stored in JAR files (matterhayes via dvryaboy)
PIG-1314: Add DateTime Support to Pig (zjshen via thejas)
PIG-2785: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0 (matterhayes via sms)
PIG-2556: CSVExcelStorage load: quoted field with newline as first character sees newline as record end (tivv via dvryaboy)
PIG-2875: Add recursive record support to AvroStorage (cheolsoo via sms)
PIG-2662: skew join does not honor its config parameters (rajesh.balamohan via thejas)
PIG-2871: Refactor signature for PigReducerEstimator (billgraham)
PIG-2851: Add flag to ant to run tests with a debugger port (billgraham)
PIG-2862: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier (jcoveney)
PIG-2855: Provide a method to measure time spent in UDFs (dvryaboy)
PIG-2837: AvroStorage throws StackOverFlowError (cheolsoo via sms)
PIG-2856: AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories. (cheolsoo via sms)
PIG-2569: Fix org.apache.pig.test.TestInvoker.testSpeed (aklochkov via dvryaboy)
PIG-2858: Improve PlanHelper to allow finding any PhysicalOperator in a plan (dvryaboy)
PIG-2854: AvroStorage doesn't work with Avro 1.7.1 (cheolsoo via sms)
PIG-2779: Refactoring the code for setting number of reducers (jay23jack via billgraham)
PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator (prasanth_j via dvryaboy)
PIG-2814: Fix issues with Sample operator documentation (prasanth_j via dvryaboy)
PIG-2817: Documentation for Groovy UDFs (herberts via julien)
PIG-2492: AvroStorage should recognize globs and commas (cheolsoo via sms)
PIG-2706: Add clear to list of grunt commands (xalan via azaroth)
PIG-2823: TestPigContext.testImportList() does not pass if another javac is on the PATH (julien)
PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth)
PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via azaroth)
PIG-2820: relToAbsolutePath is not replayed properly when Grunt reparses the script after PIG-2699 (julien)
PIG-2763: Groovy UDFs (herberts via julien)
PIG-2780: MapReduceLauncher should break early when one of the jobs throws an exception (jay23jack via daijy)
PIG-2804: Remove "PIG" exec type (dvryaboy)
PIG-2726: Handling legitimate NULL values in Cube operator (prasanth_j via dvryaboy)
PIG-2808: Add *.project to .gitignore (azaroth)
PIG-2787: change the module name in ivy to lowercase to match the maven repo (julien)
PIG-2632: Create a SchemaTuple which generates efficient Tuples via code gen (jcoveney)
PIG-2750: add artifacts to the ivy.xml for other jars Pig generates (julien)
PIG-2748: Change the names of the jar produced in the build folder to match maven conventions (julien)
PIG-2770: Allow easy inclusion of custom build targets (julien)
PIG-2697: pretty print schema via pig.pretty.print.schema (rangadi via jcoveney)
PIG-2673: Allow Merge join to follow an ORDER statement (dvryaboy)
PIG-2699: Reduce the number of instances of Load and Store Funcs down to 2+1 (julien)
PIG-2166: UDFs to join a bag (hluu via daijy)
PIG-2651: Provide a much easier to use accumulator interface (jcoveney via daijy)
PIG-2658: Add pig.script.submitted.timestamp and pig.job.submitted.timestamp in generated Map-Reduce job conf (billgraham)
PIG-2735: Add a pig.version.suffix property in build.xml to easily override with a build number (julien)
PIG-2705: outputSchema modification from scripting UDFs (levyjoshua via julien)
PIG-2724: Make Tuple Iterable (jcoveney)
PIG-2733: Add *.patch, *.log, *.orig, *.rej, *.class to gitignore (jcoveney)
PIG-2732: Let's get rid of the deprecated Tuple methods (jcoveney)
PIG-2638: Optimize BinInterSedes treatment of longs (jcoveney)
PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy)
PIG-2710: Implement Naive CUBE operator (prasanth_j via dvryaboy)
PIG-2714: Pig documentation on TOP function has issues (daijy)
PIG-2066: Accumulators should be able to early-terminate (jcoveney)
PIG-2600: Better Map support (prkommireddi via jcoveney)
PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy)
PIG-2702: Make Pig local mode (and tests) faster by working around the hard coded sleep(5000) in hadoop's JobControl (julien)
PIG-2659: add source location of the aliases in the physical plan (julien)
PIG-2547: Easier UDFs: Convenient EvalFunc super-classes (billgraham, dvryaboy)
PIG-2639: Utils.getSchemaFromString should automatically give name to all types, but fails on boolean (jcoveney)
PIG-2696: Enhance Job Stat to print out median map and reduce time (hluu via daijy)
PIG-2583: Add Grunt command to list the statements in cache (xalan via daijy)
PIG-2688: Log the aliases being processed for the current job (ddaniels888 via azaroth)
PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney)
PIG-2685: Fix error in EvalFunc ctor when implementing Algebraic UDF whose return type is parameterized (andy schlaikjer via jcoveney)
PIG-2664: Allow PPNL impls to get more job info during the run (billgraham)
PIG-2663: Expose helpful ScriptState methods (billgraham)
PIG-2660: PPNL notified of plan before it gets executed (billgraham)
PIG-2574: Make reducer estimator pluggable (billgraham)
PIG-2677: Add target to build.xml to generate clover summary reports (gates)
PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien)
PIG-2257: AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor (billgraham)
PIG-2587: Compute LogicalPlan signature and store in job conf (billgraham)
PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false
PIG-2604: Pig should print its build info at runtime (traviscrawford via dvryaboy)
PIG-2573: Automagically setting parallelism based on input file size does not work with HCatalog (traviscrawford via julien)
PIG-2538: Add helper wrapper classes for StoreFunc (billgraham via dvryaboy)
PIG-2010: registered jars on distributed cache (traviscrawford and julienledem via dvryaboy)
PIG-2533: Pig MR job exceptions masked on frontend (traviscrawford via dvryaboy)
PIG-2525: Support pluggable PigProgressNotificationListeners on the command line (dvryaboy)
PIG-2515: [piggybank] Make CustomFormatToISO return null on Exception in parsing dates (rjurney via dvryaboy)
PIG-2503: Make @MonitoredUDF inherited (dvryaboy)
PIG-2488: Move Python unit tests to e2e tests (alangates via daijy)
PIG-2456: Pig should have a pigrc to specify default script cache (prkommireddi via daijy)
PIG-2496: Cache resolved classes in PigContext (dvryaboy)
PIG-2482: Integrate HCat DDL command into Pig (daijy)
PIG-2479: changingPattern should be used with checkmodified in ivysettings.xml (abayer via azaroth)
PIG-2349: Ant build repeats ivy-buildJar several times (azaroth)
PIG-2359: Support more efficient Tuples when schemas are known (dvryaboy)
PIG-2282: Automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy (azaroth via daijy)
PIG-2468: Speed up TestBuiltin (dvryaboy)
PIG-2467: Speed up TestCommit (dvryaboy)
PIG-2460: Use guava 11 instead of r06 (dvryaboy)
PIG-2267: Make the name of the columns in schema optional (jcoveney via daijy)
PIG-2453: Fetching schema can be very slow for multi-thousand file LOADs (dvryaboy)
PIG-2443: [Piggybank] Add UDFs to check if a String is an Integer And if a String is Numeric (prkommireddi via daijy)
PIG-2437: Use Ivy to get automaton.jar (azaroth)
PIG-2448: Convert more tests to use LOCAL mode (dvryaboy)
PIG-2438: Do not hardcode commons-lang version in build.xml (azaroth)
PIG-2422: Add log messages for Jython schema definitions (vivekp via gates)
PIG-2403: Reduce code duplication in SUM, MAX, MIN udfs (dvryaboy)
PIG-2245: Add end to end test for tokenize (markroddy via gates)
PIG-2327: bin/pig doesn't have any hooks for picking up ZK installation deployed from tarballs (rvs via hashutosh)
PIG-2382: Modify .gitignore to ignore pig-withouthadoop.jar (azaroth via hashutosh)
PIG-2380: Expose version information more cleanly (jcoveney via azaroth)
PIG-2311: STRSPLIT needs to allow bytearray arguments (xuting via olgan)
PIG-2365: Current TOP implementation needlessly results in a null bag name (jcoveney via dvryaboy)
PIG-2151: Add annotation to specify output schema in Java UDFs (dvryaboy)
PIG-2230: Improved error message for invalid parameter format (xutingz via olgan)
PIG-2328: Add builtin UDFs for building and using bloom filters (gates)
PIG-2338: Need signature for EvalFunc (daijy)
PIG-2337: Provide UDF with input schema (xutingz via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-3147: Spill failing with "java.lang.RuntimeException: InternalCachedBag.spill() should not be called" (knoguchi via dvryaboy)
PIG-2645: PigSplit does not handle the case where SerializationFactory returns null (shami via gates)
PIG-3109: Missing license headers (jarcec via cheolsoo)
PIG-3022: TestRegisteredJarVisibility.testRegisteredJarVisibility fails with hadoop-2.0.x (rohini via cheolsoo)
PIG-3125: Fix zebra compilation error (cheolsoo)
PIG-3029: TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (jgordon via gates)
PIG-3051: java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning (knoguchi via rohini)
PIG-3076: make TestScalarAliases more reliable (julien)
PIG-3020: "Duplicate uid in schema" error when joining two relations derived from the same load statement (jcoveney)
PIG-3044: hotfix to remove divide by 0 error (jcoveney)
PIG-3033: test-patch failed with javadoc warnings (fang fang chen via cheolsoo)
PIG-3058: Upgrade junit to at least 4.8 (fang fang chen via cheolsoo)
PIG-2978: TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x (cheolsoo)
PIG-3039: Not possible to use custom version of jackson jars (rohini)
PIG-2979: Pig.jar doesn't work with hadoop-2.0.x (cheolsoo)
PIG-2405: some unit test case failed with open JDK (fang fang chen via cheolsoo)
PIG-3035: With latest version of hadoop23 pig does not return the correct exception stack trace from backend (rohini)
PIG-3018: Refactor TestScriptLanguage to remove duplication and write script in different files (julien)
PIG-2973: TestStreaming test times out (cheolsoo)
PIG-3001: TestExecutableManager.testAddJobConfToEnv fails randomly (cheolsoo)
PIG-3017: Pig's object serialization should use compression (jcoveney)
PIG-2968: ColumnMapKeyPrune fails to prune a subtree inside foreach (knoguchi via cheolsoo)
PIG-2999: Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing (knoguchi via azaroth)
PIG-2998: Fix TestScriptLanguage and TestMacroExpansion (cheolsoo via jcoveney)
PIG-2975: TestTypedMap.testOrderBy failing with incorrect result (knoguchi via jcoveney)
PIG-1283: COUNT on null bag causes failure (analog.sony via jcoveney)
PIG-2958: Pig tests do not appear to have a logger attached (daijyc via jcoveney)
PIG-2926: TestPoissonSampleLoader failing on rhel environment (jcoveney)
PIG-2971: Add new parameter to specify the streaming environment (jcoveney)
PIG-2963: Illustrate command and POPackageLite (cheolsoo via jcoveney)
PIG-2961: BinInterSedesRawComparator broken by TUPLE_number patch (jcoveney)
PIG-2932: Setting high default_parallel causes IOException in local mode (cheolsoo via gates)
PIG-2737: [piggybank] TestIndexedStorage is failing, should be refactored (jcoveney)
PIG-2935: Catch NoSuchMethodError when StoreFuncInterface's new cleanupOnSuccess method isn't implemented. (gates via dvryaboy)
PIG-2920: e2e tests override PERL5LIB environment variable (azaroth)
PIG-2917: SpillableMemoryManager memory leak for WeakReference (haitao.yao via dvryaboy)
PIG-2938: All unit tests that use MR2 MiniCluster are broken in trunk (cheolsoo via dvryaboy)
PIG-2936: Tuple serialization bug (jcoveney)
PIG-2930: ant test doesn't compile in trunk (cheolsoo via daijy)
PIG-2791: Pig does not work with ViewFileSystem (rohini via daijy)
PIG-2833: org.apache.pig.pigunit.pig.PigServer does not set the default log level of pigContext (cheolsoo via jcoveney)
PIG-2744: Handle Pig command line with XML special characters (lulynn_2008 via daijy)
PIG-2637: Command-line option -e throws TokenMgrError exception (lulynn_2008 via daijy)
PIG-2887: Macro cannot handle negative number (knoguchi via gates)
PIG-2844: ant makepom is misconfigured (julien)
PIG-2896: Pig does not fail anymore if two macros are declared with the same name (julien)
PIG-2848: TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output (julien)
PIG-2884: JobControlCompiler mis-logs after reducer estimation (billgraham)
PIG-2876: Bump up Xerces version (jcoveney)
PIG-2866: PigServer fails with macros without a script file (billgraham)
PIG-2860: [piggybank] TestAvroStorageUtils.testGetConcretePathFromGlob fails on some version of hadoop (cheolsoo via jcoveney)
PIG-2861: PlanHelper imports org.python.google.common.collect.Lists instead of org.google.common.collect.Lists (jcoveney)
PIG-2849: Errors in document Getting Started (miyakawataku via billgraham)
PIG-2843: Typo in Documentation (eric59 via billgraham)
PIG-2841: Inconsistent URL in Docs (eric59 via billgraham)
PIG-2740: get rid of "java[77427:1a03] Unable to load realm info from SCDynamicStore" log lines when running pig tests (julien)
PIG-2839: mock.Storage overwrites output with the last relation written when storing UNION (julien)
PIG-2840: Fix SchemaTuple bugs (jcoveney)
PIG-2842: TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file (julien)
PIG-2827: Unwrap exception swallowing in TOP (haitao.yao via jcoveney)
PIG-2825: StoreFunc signature setting in LogicalPlan broken (jcoveney)
PIG-2815: class loader management in PigContext (rangadi via jcoveney)
PIG-2813: Fix test regressions from PIG-2632 (jcoveney)
PIG-2806: Fix merge join test regression from PIG-2632 (jcoveney)
PIG-2809: TestUDFContext broken by PIG-2699 (julienledem via daijy)
PIG-2807: TestParser TestPigStorage TestNewPlanOperatorPlan broken by PIG-2699 (julienledem via daijy)
PIG-2782: Specifying sorting field(s) at nightly.conf (cheolsoo via daijy)
PIG-2790: After Pig-2699 the script schema (LOAD ... USING ... AS {script schema}) is passed after getSchema is called (daijy)
PIG-2777: Docs are broken due to malformed xml after PIG-2673 (dvryaboy)
PIG-2593: Filter by a boolean value does not work (jay23jack via daijy)
PIG-2665: Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts with embedded Pig Latin (daijy)
PIG-2736: Support implicit cast from bytearray to boolean (jay23jack via daijy)
PIG-2508: PIG can unpredictably ignore deprecated Hadoop config options (thw via dvryaboy)
PIG-2691: Duplicate TOKENIZE schema (jay23jack via azaroth)
PIG-2173: piggybank datetime conversion javadocs not properly formatted (hluu via daijy)
PIG-2709: PigAvroRecordReader should specify which file has a problem when throwing IOException (mpercy via daijy)
PIG-2640: Usage message gives wrong information for Pig additional jars (prkommireddi via daijy)
PIG-2652: Skew join and order by don't trigger reducer estimation (dvryaboy)
PIG-2616: JobControlCompiler.getInputSizeFromLoader must handle exceptions from LoadFunc.getStatistics (billgraham)
PIG-2644: Piggybank's HadoopJobHistoryLoader throws NPE when reading broken history file (herberts via daijy)
PIG-2627: Custom partitioner not set when POSplit is involved in Plan (aniket486 via daijy)
PIG-2596: Jython UDF does not handle boolean output (aniket486 via daijy)
PIG-2649: org.apache.pig.parser.ParserValidationException does not expose the cause exception
PIG-2540: [piggybank] AvroStorage can't read schema on amazon s3 in elastic mapreduce (rjurney via jcoveney)
PIG-2618: e2e local fails to build
PIG-2608: Typo in PigStorage documentation for source tagging (prkommireddi via daijy)
PIG-2590: running ant tar and rpm targets on same copy of pig source results in problems (thejas)
PIG-2581: HashFNV inconsistent/non-deterministic due to default platform encoding (prkommireddi via daijy)
PIG-2514: REGEX_EXTRACT not returning correct group with non greedy regex (romainr via daijy)
PIG-2532: Registered classes fail deserialization in frontend (traviscrawford via julien)
PIG-2549: org.apache.pig.piggybank.storage.avro - Broken documentation link for AvroStorage (chrisas via daijy)
PIG-2322: varargs functions do not get passed the arguments in Python embedding (julien)
PIG-2491: Pig docs still mention hadoop-site.xml (daijy)
PIG-2504: Incorrect sample provided for REGEX_EXTRACT (prkommireddi via daijy)
PIG-2502: Make "hcat.bin" configurable in e2e test (daijy)
PIG-2501: Changes needed to contrib/piggybank/java/build.xml in order to build piggybank.jar with Hadoop 0.23 (ekoontz via daijy)
PIG-2499: Pig TestGrunt.testShellCommand occasionally fails (tomwhite via daijy)
PIG-2326: Pig minicluster tests can not be run from eclipse (julienledem via daijy)
PIG-2432: Eclipse .classpath file is out of date (gates)
PIG-2427: getSchemaFromString throws away the name of the tuple that is in a bag (jcoveney via dvryaboy)
PIG-2425: Aggregate Warning does not work as expected on Embedding Pig in Java 0.9.1 (prkommireddi via thejas)
PIG-2384: Generic Invokers should use PigContext to resolve classes (dvryaboy)
PIG-2379: Bug in Schema.getPigSchema(ResourceSchema rSchema) improperly adds two level access (jcoveney via dvryaboy)
PIG-2355: ant clean does not clean e2e test build artifacts (daijy)
PIG-2352: e2e test harness' use of environment variables causes unintended effects between tests (gates)
Release 0.10.1 - Unreleased
BUG FIXES
PIG-3107: bin and autocomplete are missing in src release (daijy)
PIG-3106: Missing license header in several java file (daijy)
PIG-3099: Pig unit test fixes for TestGrunt(1), TestStore(2), TestEmptyInputDir(3) (vikram.dixit via daijy)
PIG-3045: Specifying sorting field(s) at nightly.conf - fix sortArgs (rohini via cheolsoo)
PIG-2953: "which" utility does not exist on Windows (daijy)
PIG-2960: Increase the timeout for unit test (daijy)
PIG-2942: DevTests, TestLoad has a false failure on Windows (jgordon via daijy)
PIG-2801: grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens (jgordon via daijy)
PIG-2798: pig streaming tests assume interpreters are auto-resolved (jgordon via daijy)
PIG-2795: Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (jgordon via daijy)
PIG-2796: Local temporary paths are not always valid HDFS path names (jgordon via daijy)
Release 0.10.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-2664: Allow PPNL impls to get more job info during the run (billgraham)
PIG-2663: Expose helpful ScriptState methods (billgraham)
PIG-2660: PPNL notified of plan before it gets executed (billgraham)
PIG-2574: Make reducer estimator pluggable (billgraham)
PIG-2601: Additional document for 0.10 (daijy)
PIG-2317: Ruby/Jruby UDFs (jcoveney via daijy)
PIG-1270: Push limit into loader (daijy)
PIG-2589: Additional e2e test for 0.10 new features (daijy)
PIG-2182: Add more append support to DataByteArray (gsingers via daijy)
PIG-438: Handle realiasing of existing Alias (A=B;) (daijy)
PIG-2548: Support for providing parameters to python script (daijy)
PIG-2518: Add ability to clean ivy cache in build.xml (daijy)
PIG-2300: Pig Docs - release 0.10.0 (and 0.9.1) (chandec via daijy)
PIG-2332: JsonLoader/JsonStorage (daijy)
PIG-2334: Set default number of reducers for S3N filesystem (ddaniels888 via daijy)
PIG-1387: Syntactical Sugar for PIG-1385 (azaroth)
PIG-2305: Pig should log the split locations in task logs (vivekp via thejas)
PIG-2293: Pig should support a more efficient merge join against data sources that natively support point lookups or where the join is against large, sparse tables (aklish via daijy)
PIG-2287: add test cases for limit and sample that use expressions with constants only (no scalar variables) (thejas via gates)
PIG-2092: Missing sh command from Grunt shell (olgan)
PIG-2163: Improve nested cross to stream one relation (zjshen via daijy)
PIG-2249: Enable pig e2e testing on EC2 (gates)
PIG-2256: Upgrade Avro dependency to 1.5.3 (tucu00 via dvryaboy)
PIG-604: Kill the Pig job should kill all associated Hadoop Jobs (daijy)
PIG-2096: End to end tests for new Macro feature (gates)
PIG-2242: Allow the delimiter to be specified when calling TOKENIZE (markroddy via hashutosh)
PIG-2240: Allow any compression codec to be specified in AvroStorage (tomwhite via dvryaboy)
PIG-2229: Pig end-to-end tests should test local mode as well as mr mode (gates)
PIG-2235: Several files in e2e tests aren't being run (gates)
PIG-2196: Test harness should be independent of Pig (hashutosh) -- Missed few changes in last commit.
PIG-2196: Test harness should be independent of Pig (hashutosh)
PIG-1429: Add Boolean Data Type to Pig (zjshen via daijy)
PIG-2218: Pig end-to-end tests should be accessible from top level build.xml (gates)
PIG-2176: add logical plan assumption checker (thejas)
PIG-1631: Support to 2 level nested foreach (aniket486 via daijy)
PIG-2191: Reduce amount of log spam generated by UDFs (dvryaboy)
PIG-2200: Piggybank cannot be built from the Git mirror (dvryaboy)
PIG-2168: CubeDimensions UDF (dvryaboy)
PIG-2189: e2e test harness needs to use Pig as a source of truth (gates via daijy)
PIG-1904: Default split destination (azaroth via thejas)
PIG-2143: Make PigStorage optionally store schema; improve docs. (dvryaboy)
PIG-1973: UDFContext.getUDFContext usage of ThreadLocal pattern is not typical (woody via thejas)
PIG-2053: PigInputFormat uses class.isAssignableFrom() where instanceof is more appropriate (woody via thejas)
PIG-2161: TOTUPLE should use no-copy tuple creation (dvryaboy)
PIG-1946: HBaseStorage constructor syntax is error prone (billgraham via dvryaboy)
PIG-2001: DefaultTuple(List) constructor is inefficient, causes List.size() System.arraycopy() calls (though they are 0 byte copies), DefaultTuple(int) constructor is a bit misleading wrt time complexity (woody via thejas)
PIG-1916: Nested cross (zjshen via daijy)
PIG-2121: e2e test harness should use ant instead of make (gates)
PIG-2142: Allow registering multiple jars from DFS via single statement (rangadi via dvryaboy)
PIG-1926: Sample/Limit should take scalar (azaroth via thejas)
PIG-1950: e2e test harness needs to be able to compare to previous version of Pig (gates)
PIG-536: the shell script 'pig' does not work if PIG_HOME has the word 'hadoop' in its directory (miguno via olgan)
PIG-2108: e2e test harness needs to be able to mark certain tests as ignored (gates)
PIG-1825: ability to turn off the write ahead log for pig's HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1824: Support import modules in Jython UDF (woody via rding)
PIG-1994: e2e test harness deployment implementation for existing cluster (gates)
PIG-2036: [piggybank] Set header delimiter in PigStorageSchema (mmoeller via dvryaboy)
PIG-1949: e2e test harness should use bin/pig rather than calling java directly (gates)
PIG-2026: e2e tests in eclipse classpath (azaroth via hashutosh)
PIG-2024: Incorrect jar paths in .classpath template for eclipse (azaroth via hashutosh)
OPTIMIZATIONS
PIG-2011: Speed up TestTypedMap.java (dvryaboy)
PIG-2228: support partial aggregation in map task (thejas)
BUG FIXES
PIG-2940: HBaseStorage store fails in secure cluster (cheolsoo via daijy)
PIG-2821: HBaseStorage should work with secure hbase (rohini via daijy)
PIG-2859: Fix few e2e test failures (rohini via daijy)
PIG-2729: Macro expansion does not use pig.import.search.path - UnitTest borked (johannesch via daijy)
PIG-2783: Fix Iterator_1 e2e test for Hadoop 23 (rohini via daijy)
PIG-2761: With hadoop23 importing modules inside python script does not work (rohini via daijy)
PIG-2759: Typo in document "Built In Functions" (daijy)
PIG-2745: Pig e2e test RubyUDFs fails in MR mode when running from tarball (cheolsoo via daijy)
PIG-2741: Python script throws an NameError: name 'Configuration' is not defined in case cache dir is not created (knoguchi via daijy)
PIG-2669: Pig release should include pig-default.properties after rebuild (daijy)
PIG-2739: PyList should map to Bag automatically in Jython (daijy)
PIG-2730: TFileStorage getStatistics incorrectly throws an exception instead of returning null (traviscrawford via daijy)
PIG-2717: Tuple field mangled during flattening (daijy)
PIG-2721: Wrong output generated while loading bags as input (knoguchi via daijy)
PIG-2578: Multiple Store-commands mess up mapred.output.dir. (daijy)
PIG-2623: Support S3 paths for registering UDFs (nshkrob via daijy)
PIG-2505: AvroStorage won't read any file not ending in .avro (russell.jurney via daijy)
PIG-2585: Enable ignored e2e test cases (daijy)
PIG-2563: IndexOutOfBoundsException: while projecting fields from a bag (daijy)
PIG-2411: AvroStorage UDF in PiggyBank fails to STORE a bag of single-field tuples as Avro arrays (russell.jurney via daijy)
PIG-2565: Support IMPORT for macros stored in S3 Buckets (daijy)
PIG-2570: LimitOptimizer fails with dynamic LIMIT argument (daijy)
PIG-2543: PigStats.isSuccessful returns false if embedded pig script has sh commands (daijy)
PIG-2509: Util.getSchemaFromString fails with java.lang.NullPointerException when a tuple in a bag has no name (as when used in MongoStorage UDF) (jcoveney via daijy)
PIG-2559: Embedded pig in python; invoking sys.exit(0) causes script failure (vivekp via daijy)
PIG-2530: Reusing alias name in nested foreach causes incorrect results (daijy)
PIG-2489: Input Path Globbing{} not working with PigStorageSchema or PigStorage('\t', '-schema') (daijy)
PIG-2484: Fix several e2e test failures/aborts for 23 (daijy)
PIG-2400: Document hash based aggregation support (chandec via daijy)
PIG-2444: Remove the Zebra *.xml documentation files from the TRUNK and Branch-10 (chandec via daijy)
PIG-2430: An EvalFunc which overrides getArgToFuncMapping with FuncSpec
with constructor arguments is not properly instantiated with said arguments (jcoveney via thejas)
PIG-2457: JsonLoaderStorage tests is broken for e2e (daijy)
PIG-2426: ProgressableReporter.progress(String msg) is an empty function (vivekp via daijy)
PIG-2363: _logs for streaming commands bug in new parser (vivekp via daijy)
PIG-2331: BinStorage in LOAD statement failing when input has curly braces (xutingz via thejas)
PIG-2391: Bzip_2 test is broken (xutingz via daijy)
PIG-2358: JobStats.getHadoopCounters() is never set and always returns null (xutingz via daijy)
PIG-2184: Not able to provide positional reference to macro invocations (xutingz via daijy)
PIG-2209: JsonMetadata fails to find schema for glob paths (daijy)
PIG-2165: Need a way to deal with params and param_file in embedded pig in python (daijy)
PIG-2313: NPE in ILLUSTRATE trying to get StatusReporter in STORE (daijy)
PIG-2335: bin/pig does not work with bash 3.0 (azaroth)
PIG-2275: NullPointerException from ILLUSTRATE (daijy)
PIG-2119: DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan (daijy)
PIG-2290: TOBAG wraps tuple parameters in another tuple (ryan.hoegg via thejas)
PIG-2288: Pig 0.9 error message not useful as compared to 0.8 in case
of group by (vivekp via thejas)
PIG-2309: Keyword 'NOT' is wrongly treated as a UDF in split statement (vivekp via thejas)
PIG-2307: Jetty version should be updated in .eclipse.templates/.classpath,
pig-template.xml and pig.pom as well (zjshen via daijy)
PIG-2273: Pig.compileFromFile in embedded python fails when pig script starts with a comment (ddaniels888 via gates)
PIG-2278: Wrong version numbers for libraries in eclipse template classpath (azaroth)
PIG-2115: Fix Pig HBaseStorage configuration and setup issues (gbowyer@fastmail.co.uk via dvryaboy)
PIG-2232: "declare" document contains a typo (daijy)
PIG-2055: inconsistent behavior in parser generated during build (thejas)
PIG-2185: NullPointerException while Accessing Empty Bag in FOREACH { FILTER } (daijy)
PIG-2227: Wrong jars copied into lib directory in e2e tests when invoked from top level (gates)
PIG-2219: Pig tests fail if ${user.home}/pigtest/conf does not already exist (cwsteinbach via gates)
PIG-2215: Newlines in function arguments still cause exceptions to be thrown (awarring via gates)
PIG-2214: InternalSortedBag two-arg constructor doesn't pass bagCount (sallen via gates)
PIG-2174: HBaseStorage column filters miss some fields (billgraham via dvryaboy)
PIG-2090: re-enable TestGrunt test cases (thejas)
PIG-2181: Improvement : for error message when describe misses alias (vivekp via daijy)
PIG-2124: Script never ending when joining from the same source (daijy)
PIG-2170: NPE thrown during illustrate (thejas)
PIG-2186: PigStorage new warnings about missing schema file
can be confusing (thejas)
PIG-2179: tests in TestLoad are failing (thejas)
PIG-2146: POStore.getSchema() returns null because of which PigOutputCommitter
is not storing schema while cleanup (thejas)
PIG-2027: NPE if Pig doesn't have permission for log file (daijy)
PIG-2171: TestScriptLanguage is broken on trunk (daijy and thejas)
PIG-2172: Fix test failure for ant 1.8.x (daijy)
PIG-2162: bin/pig should not modify user args (rangadi via thejas)
PIG-2060: Fix errors in pig grammars reported by ANTLRWorks (azaroth via thejas)
PIG-2156: Limit/Sample with variable does not work if the expression starts
with an integer/double (azaroth via thejas)
PIG-2130: Piggybank:MultiStorage is not compressing output files (vivekp via daijy)
PIG-2147: Support nested tags for XMLLoader (vivekp via daijy)
PIG-1890: Fix piggybank unit test TestAvroStorage (kengoodhope via daijy)
PIG-2110: NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor (dale_jin via daijy)
PIG-2144: ClassCastException when using IsEmpty(DIFF()) (thejas)
PIG-2139: LogicalExpressionSimplifier optimizer rule should check if udf is
deterministic while checking if they are equal (thejas)
PIG-2137: SAMPLE should not be pushed above DISTINCT (dvryaboy and thejas)
PIG-2136: Implementation of Sample should use LessThanExpression
instead of LessThanEqualExpression (azaroth via thejas)
PIG-2140: Usage printed from Main.java gives wrong option for disabling
LogicalExpressionSimplifier (thejas)
PIG-2120: UDFContext.getClientSystemProps() does not respect pig.properties (dvryaboy)
PIG-2129: NOTICE file needs updates (gates)
PIG-2131: Add back test for PIG-1769 (qwertymaniac via gates)
PIG-2112: ResourceSchema.toString does not properly handle maps in the schema (gates)
PIG-1702: Streaming debug output outputs null input-split information (awarring via daijy)
PIG-2109: Ant build continues even if the parser classes fail to be generated. (zjshen via daijy)
PIG-2071: casting numeric type to chararray during schema merge for union
is inconsistent with other schema merge cases (thejas)
PIG-2044: Pattern match bug in org.apache.pig.newplan.optimizer.Rule (knoguchi via daijy)
PIG-2048: Add zookeeper to pig jar (gbowyer via gates)
PIG-2008: Cache outputFormat in HBaseStorage (thedatachef via gates)
PIG-2025: org.apache.pig.test.udf.evalfunc.TOMAP is missing package
declaration (azaroth via gates)
PIG-2019: smoketest-jar target has to depend on pigunit-jar to guarantee
inclusion of test classes (cos via gates)
Release 0.9.3 - Unreleased
BUG FIXES
PIG-2944: ivysettings.xml does not let you override .m2/repository (raluri via daijy)
PIG-2912: Pig should clone JobConf while creating JobContextImpl and TaskAttemptContextImpl in Hadoop23 (rohini via daijy)
PIG-2775: Register jar does not go to classpath in some cases (daijy)
PIG-2693: LoadFunc.setLocation should be called before LoadMetadata.getStatistics (billgraham via julien)
PIG-2666: LoadFunc.setLocation() is not called when pig script only has Order By (daijy)
PIG-2671: e2e harness: Reference local test path via :LOCALTESTPATH: (thw via daijy)
PIG-2642: StoreMetadata.storeSchema can't access files in the output directory (Hadoop 0.23) (thw via daijy)
PIG-2621: Documentation inaccurate regarding Pig Properties in trunk (prkommireddi via daijy)
PIG-2550: Custom tuple results in "Unexpected datatype 110 while reading tuplefrom binary file" while spilling (daijy)
PIG-2442: Multiple Stores in pig streaming causes infinite waiting (daijy)
PIG-2609: e2e harness: make hdfs base path configurable (outside default.conf) (thw via daijy)
PIG-2576: Change in behavior for UDFContext.getUDFContext().getJobConf() in front-end (thw via daijy)
PIG-2588: e2e harness: use pig command for cluster deploy (thw via daijy)
PIG-2572: e2e harness deploy fails when using pig that does not bundle hadoop (thw via daijy)
PIG-2568: PigOutputCommitter hide exception in commitJob (daijy)
PIG-2564: Build fails - Hadoop 0.23.1-SNAPSHOT no longer available (thw via daijy)
PIG-2535: Bug in new logical plan results in no output for join (daijy)
PIG-2534: Pig generating infinite map outputs (daijy)
PIG-2493: UNION causes casting issues (vivekp via daijy)
PIG-2497: Order of execution of fs, store and sh commands in Pig is not maintained (daijy)
Release 0.9.2
IMPROVEMENTS
PIG-2766: Pig-HCat Usability (vikram.dixit via daijy)
PIG-2125: Make Pig work with hadoop .NEXT (daijy)
PIG-2471: Pig Requirements Hadoop (chandec via daijy)
PIG-2431: Upgrade bundled hadoop version to 1.0.0 (daijy)
PIG-2447: piggybank: get hive dependency from maven (thw via azaroth)
PIG-2347: Fix Pig Unit tests for hadoop 23 (daijy)
PIG-2128: Generating the jar file takes a lot of time and is unnecessary when running Pig local mode (julien)
BUG FIXES
PIG-2477: TestBuiltin testLFText/testSFPig failing against 23 due to invalid test setup -- InvalidInputException (phunt via daijy)
PIG-2462: getWrappedSplit is incorrectly returning the first split instead of the current split. (arov via daijy)
PIG-2472: piggybank unit tests write directly to /tmp (thw via daijy)
PIG-2413: e2e test should support testing against two cluster (daijy)
PIG-2342: Pig tutorial documentation needs to update about building tutorial (daijy)
PIG-2458: Can't have spaces in parameter substitution (jcoveney via daijy)
PIG-2410: Piggybank does not compile in 23 (daijy)
PIG-2418: rpm release package does not take PIG_CLASSPATH (daijy)
PIG-2291: PigStats.isSuccessful returns false if embedded pig script has dump (xutingz via daijy)
PIG-2415: A fix for 0.23 local mode: put "yarn-default.xml" into the configuration (daijy)
PIG-2402: inIllustrator condition in PigMapReduce is wrong for hadoop 23 (daijy)
PIG-2370: SkewedPartitioner results in Kerberos error (daijy)
PIG-2374: streaming regression with dotNext (daijy)
PIG-2387: BinStorageRecordReader causes negative progress (xutingz via daijy)
PIG-2354: Several fixes for bin/pig (daijy)
PIG-2385: Store statements not getting processed (daijy)
PIG-2320: Error: "projection with nothing to reference" (daijy)
PIG-2346: TypeCastInsert should not insert Foreach if there is no as statement (daijy)
PIG-2339: HCatLoader loads all the partitions in a partitioned table even though
a filter clause on the partitions is specified in the Pig script (daijy)
PIG-2316: Incorrect results for FILTER *** BY ( *** OR ***) with
FilterLogicExpressionSimplifier optimizer turned on (knoguchi via thejas)
PIG-2271: PIG regression in BinStorage/PigStorage in 0.9.1 (thejas)
Release 0.9.1
IMPROVEMENTS
PIG-2284: Add pig-setup-conf.sh script (eyang via daijy)
PIG-2272: e2e test harness should be able to set HADOOP_HOME (gates via daijy)
PIG-2239: Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself (daijy)
PIG-2213: Pig 0.9.1 Documentation (chandec via daijy)
PIG-2221: Couldn't find documentation for ColumnMapKeyPrune optimization rule (chandec via daijy)
BUG FIXES
PIG-2310: bin/pig fail when both pig-0.9.1.jar and pig.jar are in PIG_HOME (daijy)
PIG-1857: Create a package integration project (eyang via daijy)
PIG-2013: Penny gets a null pointer when no properties are set (breed via daijy)
PIG-2102: MonitoredUDF does not work (dvryaboy)
PIG-2152: Null pointer exception while reporting progress (thejas)
PIG-2183: Pig not working with Hadoop 0.20.203.0 (daijy)
PIG-2193: Using HBaseStorage to scan 2 tables in the same Map job produces bad data (rangadi via dvryaboy)
PIG-2199: Penny throws Exception when netty classes are missing (ddaniels888 via daijy)
PIG-2223: error accessing column in output schema of udf having project-star input (thejas)
PIG-2208: Restrict number of PIG generated Hadoop counters (rding via daijy)
PIG-2299: jetty 6.1.14 startup issue causes unit tests to fail in CI (thw via daijy)
PIG-2301: Some more bin/pig, build.xml cleanup for 0.9.1 (daijy)
PIG-2237: LIMIT generates wrong number of records if pig determines no of reducers as more than 1 (daijy)
PIG-2261: Restore support for parenthesis in Pig 0.9 (rding via daijy)
PIG-2238: Pig 0.9 error message not useful as compared to 0.8 (daijy)
PIG-2286: Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization (daijy)
PIG-2270: Put jython.jar in classpath (daijy)
PIG-2274: remove pig deb package dependency on sun-java6-jre (gkesavan via daijy)
PIG-2264: Change conf/log4j.properties to conf/log4j.properties.template (daijy)
PIG-2231: Limit produce wrong number of records after foreach flatten (daijy)
Release 0.9.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1622: DEFINE streaming options are ill defined and not properly documented (xuefu)
PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy)
PIG-1745: Disable converting bytes loading from BinStorage (daijy)
PIG-1188: Padding nulls to the input tuple according to input schema (daijy)
PIG-1876: Typed map for Pig (daijy)
IMPROVEMENTS
PIG-1938: support project-range as udf argument (thejas)
PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu)
PIG-2062: Script silently ended (xuefu)
PIG-2039: IndexOutOfBoundsException for a case (xuefu)
PIG-2038: Pig fails to parse empty tuple/map/bag constant (xuefu)
PIG-1775: Removal of old logical plan (xuefu)
PIG-1998: Allow macro to return void (rding)
PIG-2003: Using keyword as alias doesn't either emit an error or produce a logical plan (xuefu)
PIG-1981: LoadPushDown.pushProjection should pass alias in addition to position (daijy)
PIG-2006: Regression: NPE when Pig processes an empty script file, fix test case (xuefu)
PIG-2006: Regression: NPE when Pig processes an empty script file (xuefu)
PIG-2007: Parsing error when map key referred directly from udf in nested foreach (xuefu)
PIG-2000: Pig gives incorrect error message dealing with scalar projection (xuefu)
PIG-2002: Regression: Pig gives error "Projection with nothing to reference!" for a valid query (xuefu)
PIG-1921: Improve error messages in new parser (xuefu)
PIG-1996: Pig new parser fails to recognize PARALLEL keywords in a case (xuefu)
PIG-1612: error reporting: PigException needs to have a way to indicate that
its message is appropriate for user (laukik via thejas)
PIG-1782: Add ability to load data by column family in HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1954: Design deployment interface for e2e test harness (gates)
PIG-1881: Need a special interface for Penny (Inspector Gadget) (laukik via
gates)
PIG-1947: Incorrect line number is reported during parsing (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1961: Pig prints "null" as file name in case of grammar error (xuefu)
PIG-1956: Pig parser shouldn't log error code 0 (xuefu)
PIG-1957: Pig parser gives misleading error message when the next foreach block has syntactic errors (xuefu)
PIG-1958: Regression: Pig doesn't log type cast warning messages (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1899: Add end to end test harness for Pig (gates)
PIG-1932: GFCross should allow the user to set the DEFAULT_PARALLELISM value (gates)
PIG-1913: Use a file for excluding tests (tomwhite via gates)
PIG-1693: support project-range expression. (was: There
needs to be a way in foreach to indicate "and all the
rest of the fields" ) (thejas)
PIG-1772: Pig 090 Documentation (chandec via daijy)
PIG-1830: Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable (dvryaboy)
PIG-1566: Support globbing for registering jars in pig script (nrai via daijy)
PIG-1886: Add zookeeper jar to list of jars shipped when HBaseStorage used (dvryaboy)
PIG-1874: Make PigServer work in a multithreading environment (rding)
PIG-1889: bin/pig should pick up HBase configuration from HBASE_CONF_DIR
PIG-1794: Javascript support for Pig embedding and UDFs in scripting languages (julien)
PIG-1853: Using ANTLR jars from maven repository (rding)
PIG-1728: more doc updates (chandec via olgan)
PIG-1793: Add macro expansion to Pig Latin (rding)
PIG-847: Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag (daijy)
PIG-1748: Add load/store function AvroStorage for avro data (guolin2001, jghoman via daijy)
PIG-1769: Consistency for HBaseStorage (dvryaboy)
PIG-1786: Move describe/nested describe to new logical plan (daijy)
PIG-1809: addition of TOMAP function (olgan)
PIG-1749: Update Pig parser so that function arguments can contain newline characters (jghoman via daijy)
PIG-1806: Modify embedded Pig API for usability (rding)
PIG-1799: Provide deployable maven artifacts for pigunit and pig smoke tests
(cos via gates)
PIG-1728: turing complete docs (chandec via olgan)
PIG-1675: allow PigServer to register pig script from InputStream (zjffdu via dvryaboy)
PIG-1479: Embed Pig in scripting languages (rding)
PIG-946: Combiner optimizer does not optimize when limit follow group, foreach (thejas)
PIG-1277: Pig should give error message when cogroup on tuple keys of different inner type (daijy)
PIG-1755: Clean up duplicated code in PhysicalOperators (dvryaboy)
PIG-750: Use combiner when algebraic UDFs are used in expressions (thejas)
PIG-490: Combiner not used when group elements referred to in
tuple notation instead of flatten. (thejas)
PIG-1768: 09 docs: illustrate (chandec via olgan)
PIG-1768: docs reorg (chandec via olgan)
PIG-1712: ILLUSTRATE rework (yanz)
PIG-1758: Deep cast of complex type (daijy)
PIG-1728: doc updates (chandec via olgan)
PIG-1752: Enable UDFs to indicate files to load into the Distributed Cache
(gates)
PIG-1747: pattern match classes for matching patterns in physical plan (thejas)
PIG-1707: Allow pig build to pull from alternate maven repo to enable building
against newer hadoop versions (pradeepkth)
PIG-1618: Switch to new parser generator technology (xuefuz via thejas)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1508: Make 'docs' target (forrest) work with Java 1.6 (cwsteinbach via gates)
PIG-1608: pig should always include pig-default.properties and pig.properties in the pig.jar (nrai via daijy)
OPTIMIZATIONS
PIG-1696: Performance: Use System.arraycopy() instead of manually copying the bytes while reading the data (hashutosh)
BUG FIXES
PIG-2159: New logical plan uses incorrect class for SUM causing ClassCastException (daijy)
PIG-2106: Fix Zebra unit test TestBasicUnion.testNeg3, TestBasicUnion.testNeg4 (daijy)
PIG-2083: bincond ERROR 1025: Invalid field projection when null is used (thejas)
PIG-2089: Javadoc for ResourceFieldSchema.getSchema() is wrong (daijy)
PIG-2084: pig is running validation for a statement at a time batch mode,
instead of running it for whole script (thejas)
PIG-2088: Return alias validation failed when there is single line comment in the macro (rding)
PIG-2081: Dryrun gives wrong line numbers in error message for scripts containing macro (rding)
PIG-2078: POProject.getNext(DataBag) does not handle null (daijy)
PIG-2029: Inconsistency in Pig Stats reports (rding)
PIG-2070: "Unknown" appears in error message for an error case (thejas)
PIG-2069: LoadFunc jar does not ship to backend in MultiQuery case (rding)
PIG-2076: update documentation, help command with correct default value
of pig.cachedbag.memusage (thejas)
PIG-2072: NPE when udf has project-star argument and input schema is null (thejas)
PIG-2075: Bring back TestNewPlanPushUpFilter (daijy)
PIG-1827: When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason (rding)
PIG-2056: Jython error messages should show script name (rding)
PIG-2014: SAMPLE shouldn't be pushed up (dvryaboy)
PIG-2058: Macro missing returns clause doesn't give a good error message (rding)
PIG-2035: Macro expansion doesn't handle multiple expansions of same macro inside another macro (rding)
PIG-2030: Merged join/cogroup does not automatically ship loader (daijy)
PIG-2052: Ship guava.jar to backend (daijy)
PIG-2012: Comments at the beginning of the file throw off line numbers in errors (rding)
PIG-2043: Ship antlr-runtime.jar to backend (daijy)
PIG-2049: Pig should display TokenMgrError message consistently across all parsers (rding)
PIG-2041: Minicluster should make each run independent (daijy)
PIG-2040: Move classloader from QueryParserDriver to PigContext (daijy)
PIG-1999: Macro alias masker should consider schema context (rding)
PIG-1821: UDFContext.getUDFProperties does not handle collisions
in hashcode of udf classname (+ arg hashcodes) (thejas)
PIG-2028: Speed up multiquery unit tests (rding)
PIG-1990: support casting of complex types with empty inner schema
to complex type with non-empty inner schema (thejas)
PIG-2016: -dot option does not work with explain and new logical plan (daijy)
PIG-2018: NPE for co-group with group-by column having complex schema and
different load functions for each input (thejas)
PIG-2015: Explain writes out logical plan twice (alangates)
PIG-2017: consumeMap() fails with EmptyStackException (thedatachef via daijy)
PIG-1989: complex type casting should return null on casting failure (daijy)
PIG-1826: Unexpected data type -1 found in stream error (daijy)
PIG-2004: Incorrect input types passed on to eval function (thejas)
PIG-1814: mapred.output.compress in SET statement does not work (daijy)
PIG-1976: One more TwoLevelAccess to remove (daijy)
PIG-1865: BinStorage/PigStorageSchema cannot load data from a different namenode (daijy)
PIG-1910: incorrect schema shown when project-star is used with other projections (daijy)
PIG-2005: Discrepancy in the way dry run handles semicolon in macro definition (rding)
PIG-1281: Detect org.apache.pig.data.DataByteArray cannot be cast to
org.apache.pig.data.Tuple type of errors at Compile Time during
creation of logical plan (thejas)
PIG-1939: order-by statement should support project-range to-end in
any position among the sort columns if input schema is known (thejas)
PIG-1978: Secondary sort fail when dereferencing two fields inside foreach (daijy)
PIG-1962: Wrong alias assigned to store operator (daijy)
PIG-1975: Need to provide backward compatibility for legacy LoadCaster (without bytesToMap(bytes, fieldSchema)) (daijy)
PIG-1987: -dryrun does not work with set (rding)
PIG-1871: Don't throw exception if partition filters cannot be pushed up. (rding)
PIG-1870: HBaseStorage doesn't project correctly (dvryaboy)
PIG-1788: relation-as-scalar error messages should indicate the field
being used as scalar (laukik via thejas)
PIG-1697: NullPointerException if log4j.properties is Used (laukik via daijy)
PIG-1929: Type checker failed to catch invalid type comparison (thejas)
PIG-1928: Type Checking, incorrect error message (thejas)
PIG-1979: New logical plan failing with ERROR 2229: Couldn't find matching uid -1 (daijy)
PIG-1897: multiple star projection in a statement does not produce
the right plan (thejas)
PIG-1917: NativeMapReduce does not Allow Configuration Parameters
containing Spaces (thejas)
PIG-1974: Lineage need to set for every cast (thejas)
PIG-1988: Importing an empty macro file causing NPE (rding)
PIG-1977: "Stream closed" error while reading Pig temp files (results of intermediate jobs) (rding)
PIG-1963: in nested foreach, accumulative udf taking input from order-by does not get results in order (thejas)
PIG-1911: Infinite loop with accumulator function in nested foreach (thejas)
PIG-1923: Jython UDFs fail to convert Maps of Integer values back to Pig types (julien)
PIG-1944: register javascript UDFs does not work (julien)
PIG-1955: PhysicalOperator has a member variable (non-static) Log object that
is non-transient, this causes serialization errors (woody via rding)
PIG-1964: PigStorageSchema fails if a column value is null (thejas)
PIG-1866: Dereference a bag within a tuple does not work (daijy)
PIG-1984: Wrong stats shown when there are multiple stores but same file names (rding)
PIG-1893: Pig report input size -1 for empty input file (rding)
PIG-1868: New logical plan fails when I have complex data types from udf
(daijy)
PIG-1927: Dereference partial name failed (daijy)
PIG-1934: Fix zebra test TestCheckin1, TestCheckin4 (daijy)
PIG-1931: Integrate Macro Expansion with New Parser (rding)
PIG-1933: Hints such as 'collected' and 'skewed' for "group by" or "join by"
should not be treated as tokens. (xuefuz via thejas)
PIG-1925: Parser error message doesn't show location of the error or show it
as Line 0:0 (xuefuz via gates)
PIG-671: typechecker does not throw an error when multiple arguments are
passed to COUNT (deepujain via gates)
PIG-1152: bincond operator throws parser error (xuefuz via thejas)
PIG-1885: SUBSTRING fails when input length less than start (deepujain via
gates)
PIG-719: store <expr> into 'filename'; should be valid syntax, but does not work (xuefuz via thejas)
PIG-1770: matches clause problem with chars that have special meaning in dk.brics - #, @ .. (thejas)
PIG-1862: Pig returns exit code 0 for the failed Pig script due to non-existing input directory (rding)
PIG-1888: Fix TestLogicalPlanGenerator not use hardcoded path (daijy)
PIG-1837: Error while using IsEmpty function (rding)
PIG-1884: Change ReadToEndLoader.setLocation not throw UnsupportedOperationException (thejas)
PIG-1887: Fix pig-withouthadoop.jar to contain proper jars (daijy)
PIG-1779: Wrong stats shown when there are multiple loads but same file names (rding)
PIG-1861: The pig script stored in the Hadoop History logs is stored as a concatenated string without whitespace this causes problems when attempting to extract and execute the script (rding)
PIG-1829: "0" value seen in PigStat's map/reduce runtime, even when the job is successful (rding)
PIG-1856: Custom jar is not packaged with the new job created by LimitAdjuster (rding)
PIG-1872: Fix bug in AvroStorage (guolin2001, jghoman via daijy)
PIG-1536: use same logic for merging inner schemas in "default union" and
"union onschema" (daijy)
PIG-1304: Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input (laukik via rding)
PIG-1852: Packaging antlr jar with pig.jar (rding via daijy)
PIG-1717: pig needs to call setPartitionFilter if schema is null but
getPartitionKeys is not (gerritjvv via gates)
PIG-313: Error handling aggregate of a computation (daijy)
PIG-496: project of bags from complex data causes failures (daijy)
PIG-730: problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema (daijy)
PIG-767: Schema reported from DESCRIBE and actual schema of inner bags are different (daijy)
PIG-1801: Need better error message for Jython errors (rding)
PIG-1742: org.apache.pig.newplan.optimizer.Rule.java does not work
with plan patterns where leaves/sinks are not siblings (thejas)
Release 0.8.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1518: multi file input format for loaders (yanz via rding)
PIG-1249: Safe-guards against misconfigured Pig scripts without PARALLEL keyword (zjffdu via olgan)
IMPROVEMENTS
PIG-1561: XMLLoader in Piggybank does not support bz2 or gzip compressed XML files (vivekp via daijy)
PIG-1677: modify the repository path of pig artifacts to org/apache/pig instead of org/apache/hadoop/pig (nrai via olgan)
PIG-1600: Docs update (romainr via olgan)
PIG-1632: The core jar in the tarball contains the kitchen sink (eli via olgan)
PIG-1617: 'group all' should always use one reducer (thejas)
PIG-1589: add test cases for mapreduce operator which use distributed cache (thejas)
PIG-1548: Optimize scalar to consolidate the part file (rding)
PIG-1600: Docs update (chandec via olgan)
PIG-1585: Add new properties to help and documentation (olgan)
PIG-1399: Filter expression optimizations (yanz via gates)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1458: aggregate files for replicated join (rding)
PIG-1205: Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc (zjffdu and dvryaboy)
PIG-1568: Optimization rule FilterAboveForeach is too restrictive and doesn't
handle project * correctly (xuefuz via daijy)
PIG-1574: Optimization rule PushUpFilter causes filter to be pushed up out joins (xuefuz via daijy)
PIG-1515: Migrate logical optimization rule: PushDownForeachFlatten (xuefuz via daijy)
PIG-1321: Logical Optimizer: Merge cascading foreach (xuefuz via daijy)
PIG-1483: [piggybank] Add HadoopJobHistoryLoader to the piggybank (rding)
PIG-1555: [piggybank] add CSV Loader (dvryaboy)
PIG-1501: need to investigate the impact of compression on pig performance (yanz via thejas)
PIG-1497: Mandatory rule PartitionFilterOptimizer (xuefuz via daijy)
PIG-1514: Migrate logical optimization rule: OpLimitOptimizer (xuefuz via daijy)
PIG-1551: Improve dynamic invokers to deal with no-arg methods and array parameters (dvryaboy)
PIG-1311: Document audience and stability for remaining interfaces (gates)
PIG-506: Does pig need a NATIVE keyword? (aniket486 via thejas)
PIG-1510: Add `deepCopy` for LogicalExpressions (swati.j via daijy)
PIG-1447: Tune memory usage of InternalCachedBag (thejas)
PIG-1505: support jars and scripts in dfs (anhi via rding)
PIG-1334: Make pig artifacts available through maven (niraj via rding)
PIG-1466: Improve log messages for memory usage (thejas)
PIG-1404: added PigUnit, a framework for building unit tests of Pig Latin scripts (romainr via gates)
PIG-1452: to remove hadoop20.jar from lib and use hadoop from the apache maven
repo. (rding)
PIG-1295: Binary comparator for secondary sort (azaroth via daijy)
PIG-1448: Detach tuple from inner plans of physical operator (thejas)
PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi
via olgan)
PIG-103: Shared Job /tmp location should be configurable (niraj via rding)
PIG-1496: Mandatory rule ImplicitSplitInserter (yanz via daijy)
PIG-346: grunt help command cleanup (olgan)
PIG-1199: help includes obsolete options (olgan)
PIG-1434: Allow casting relations to scalars (aniket486 via rding)
PIG-1461: support union operation that merges based on column names (thejas)
PIG-1517: Pig needs to support keywords in the package name (aniket486 via olgan)
PIG-928: UDFs in scripting languages (aniket486 via daijy)
PIG-1509: Add .gitignore file (cwsteinbach via gates)
PIG-1478: Add progress notification listener to PigRunner API (rding)
PIG-1472: Optimize serialization/deserialization between Map and Reduce and between MR jobs (thejas)
PIG-1389: Implement Pig counter to track number of rows for each input files
(rding)
PIG-1454: Consider clean up backend code (rding)
PIG-1333: API interface to Pig (rding)
PIG-1405: Need to move many standard functions from piggybank into Pig
(aniket486 via daijy)
PIG-1427: Monitor and kill runaway UDFs (dvryaboy)
PIG-1428: Make a StatusReporter singleton available for incrementing counters (dvryaboy)
PIG-972: Make describe work with nested foreach (aniket486 via daijy)
PIG-1438: [Performance] MultiQueryOptimizer should also merge DISTINCT jobs
(rding)
PIG-1441: new test targets (olgan)
PIG-282: Custom Partitioner (aniket486 via daijy)
PIG-283: Allow to set arbitrary jobconf key-value pairs inside pig program (hashutosh)
PIG-1373: We need to add jdiff output to docs on the website (daijy)
PIG-1422: Duplicate code in LOPrinter.java (zjffdu)
PIG-1420: Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple (rjurney via dvryaboy)
PIG-1408: Annotate explain plans with aliases (rding)
PIG-1410: Make PigServer can handle files with parameters (zjffdu)
PIG-1406: Allow to run shell commands from grunt (zjffdu)
PIG-1398: Marking Pig interfaces for org.apache.pig.data package (gates)
PIG-1396: eclipse-files target in build.xml fails to generate necessary classes in src-gen
PIG-1390: Provide a target to generate eclipse-related classpath and files (chaitk via thejas)
PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy)
PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1363: Unnecessary loadFunc instantiations (hashutosh)
PIG-1370: Marking Pig interface for org.apache.pig package (gates)
PIG-1354: UDFs for dynamic invocation of simple Java methods (dvryaboy)
PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that
bzip files can be efficiently processed by splitting the files (pradeepkth)
PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in
subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
(pradeepkth)
PIG-1413: Remove svn:externals reference for test-patch.sh and
create a local copy of test-patch.sh (gkesavan)
PIG-1302: Include zebra's "pigtest" ant target as a part of pig's
ant test target. (gkesavan)
PIG-1582: To upgrade commons-logging
OPTIMIZATIONS
PIG-1353: Map-side joins (ashutoshc)
PIG-1309: Map-side Cogroup (ashutoshc)
BUG FIXES
PIG-2067: FilterLogicExpressionSimplifier removed some branches in some cases (daijy)
PIG-2033: Pig returns success for the failed Pig script (rding)
PIG-1993: PigStorageSchema throw NPE with ColumnPruning (daijy)
PIG-1935: New logical plan: Should not push up filter in front of Bincond (daijy)
PIG-1912: non-deterministic output when a file is loaded multiple times (daijy)
PIG-1892: Bug in new logical plan : No output generated even though there are
valid records (daijy)
PIG-1808: Error message in 0.8 not much helpful as compared to 0.7 (daijy)
PIG-1850: Order by is failing with ClassCastException if schema is undefined
for new logical plan in 0.8 (daijy)
PIG-1831: Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf (daijy)
PIG-1841: TupleSize implemented incorrectly (laukik via daijy)
PIG-1843: NPE in schema generation (daijy)
PIG-1820: New logical plan: FilterLogicExpressionSimplifier fail to deal with UDF (daijy)
PIG-1854: Pig returns exit code 0 for the failed Pig script (rding)
PIG-1812: Problem with DID_NOT_FIND_LOAD_ONLY_MAP_PLAN (daijy)
PIG-1813: Pig 0.8 throws ERROR 1075 while trying to refer a map in the result
of eval udf. Works with 0.7 (daijy)
PIG-1776: changing statement corresponding to alias after explain, then
doing dump gives incorrect result (thejas)
PIG-1800: Missing Signature for maven staging release (rding)
PIG-1815: pig task retains used instances of PhysicalPlan (thejas)
PIG-1785: New logical plan: uid conflict in flattened fields (daijy)
PIG-1787: Error in logical plan generated (daijy)
PIG-1791: System property mapred.output.compress, but pig-cluster-hadoop-site.xml doesn't (daijy)
PIG-1771: New logical plan: Merge schema fail if LoadFunc.getSchema return different schema with "Load...AS" (daijy)
PIG-1766: New logical plan: ImplicitSplitInserter should before DuplicateForEachColumnRewrite (daijy)
PIG-1762: Logical simplification fails on map key referenced values (yanz)
PIG-1761: New logical plan: Exception when bag dereference in the middle of expression (daijy)
PIG-1757: After split combination, the number of maps may vary slightly (yanz)
PIG-1760: Need to report progress in all databags (rding)
PIG-1709: Skewed join use fewer reducer for extreme large key (daijy)
PIG-1751: New logical plan: PushDownForEachFlatten fail in UDF with unknown
output schema (daijy)
PIG-1741: Lineage fail when flatten a bag (daijy)
PIG-1739: zero status is returned when pig script fails (yanz)
PIG-1738: New logical plan: Optimized UserFuncExpression.getFieldSchema (daijy)
PIG-1732: New logical plan: logical plan get confused if we generate the same
field twice in ForEach (daijy)
PIG-1737: New logical plan: Improve error messages when merge schema fail (daijy)
PIG-1725: New logical plan: uidOnlySchema bug in LOGenerate (daijy)
PIG-1729: New logical plan: Dereference does not add into plan after deepCopy (daijy)
PIG-1721: New logical plan: script fail when reuse foreach inner alias (daijy)
PIG-1716: New logical plan: LogToPhyTranslationVisitor should translate the structure for regex optimization (daijy)
PIG-1740: Fix SVN location in setup doc (chandec via olgan)
PIG-1719: New logical plan: FieldSchema generation for BinCond is wrong (daijy)
PIG-1720: java.lang.NegativeArraySizeException during Quicksort (thejas)
PIG-1727: Hadoop default config override pig.properties (rding)
PIG-1731: Stack Overflows where there are composite logical expressions on UDFs using the new logical plan (yanz)
PIG-1723: Need to limit the length of Pig counter names (rding)
PIG-1714: Option mapred.output.compress doesn't work in Pig 0.8 but worked in
0.7 (xuefuz via rding)
PIG-1715: pig-withouthadoop.jar missing automaton.jar (thejas)
PIG-1706: New logical plan: PushDownFlattenForEach fail if flattened field has user defined schema (daijy)
PIG-1705: New logical plan: self-join fail for some queries (daijy)
PIG-1704: Output Compression is not at work if the output path is absolute and there is a trailing / after the compression suffix (yanz)
PIG-1695: MergeForEach does not carry user defined schema if any one of the merged ForEach has user defined schema (daijy)
PIG-1684: Inconsistent usage of store func. (thejas)
PIG-1694: union-onschema projects null schema at parsing stage for some queries (thejas)
PIG-1685: Pig is unable to handle counters for glob paths ? (daijy)
PIG-1683: New logical plan: Nested foreach plan fail if one inner alias is referred more than once (daijy)
PIG-1542: log level not propagated to MR task loggers (nrai via daijy)
PIG-1673: query with consecutive union-onschema statement errors out (thejas)
PIG-1653: Scripting UDF fails if the path to script is an absolute path (daijy)
PIG-1669: PushUpFilter fail when filter condition contains scalar (daijy)
PIG-1672: order of relations in replicated join gets switched in a query where
first relation has two mergeable foreach statements (thejas)
PIG-1666: union onschema fails when the input relation has cast from bytearray to another type (thejas)
PIG-1655: code duplicated for udfs that were moved from piggybank to builtin (nrai via daijy)
PIG-1670: pig throws ExecException instead of FrontEnd exception when the plan validation fails (nrai via daijy)
PIG-1668: Order by failed with RuntimeException (rding)
PIG-1659: sortinfo is not set for store if there is a filter after ORDER BY (daijy)
PIG-1664: leading '_' in directory/file names should be ignored; the "pigtest" build target should include all pig-related zebra tests. (yanz)
PIG-1662: Need better error message for MalFormedProbVecException (rding)
PIG-1656: TOBAG udfs ignores columns with null value; it does not use input type
to determine output schema (thejas)
PIG-1658: ORDER BY does not work properly on integer/short keys that are -1 (yanz)
PIG-1638: sh output gets mixed up with the grunt prompt (nrai via daijy)
PIG-1607: pig should have separate javadoc.jar in the maven
repository (nrai via thejas)
PIG-1651: PIG class loading mishandled (rding)
PIG-1650: pig grunt shell breaks for many commands like perl, awk,
pipe, 'ls -l' etc (nrai via thejas)
PIG-1649: FRJoin fails to compute number of input files for replicated
input (thejas)
PIG-1637: Combiner not used because optimizer inserts a foreach between group
and algebraic function (daijy)
PIG-1648: Split combination may return too many block locations to map/reduce framework (yanz)
PIG-1641: Incorrect counters in local mode (rding)
PIG-1647: Logical simplifier throws a NPE (yanz)
PIG-1642: Order by doesn't use estimation to determine the parallelism (rding)
PIG-1644: New logical plan: Plan.connect with position is misused in some
places (daijy)
PIG-1643: join fails for a query with input having 'load using pigstorage
without schema' + 'foreach' (daijy)
PIG-1645: Using both small split combination and temporary file compression on a query of ORDER BY may cause crash (yanz)
PIG-1635: Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of
AND and OR may get changed (yanz)
PIG-1639: New logical plan: PushUpFilter should not push before group/cogroup
if filter condition contains UDF (xuefuz via daijy)
PIG-1643: join fails for a query with input having 'load using pigstorage
without schema' + 'foreach' (thejas)
PIG-1628: log this message at debug level : 'Pig Internal storage in use' (thejas)
PIG-1636: Scalar fail if the scalar variable is generated by limit (daijy)
PIG-1605: Adding soft link to plan to solve input file dependency
(daijy)
PIG-1598: Pig gobbles up error messages - Part 2 (nrai via daijy)
PIG-1616: 'union onschema' does not create output with correct schema
when udfs are involved (thejas)
PIG-1610: 'union onschema' does not handle some cases involving 'namespaced'
column names in schema (thejas)
PIG-1609: 'union onschema' should give a more useful error message when
schema of one of the relations has null column name (thejas)
PIG-1562: Fix the version for the dependent packages for the maven (nrai via
rding)
PIG-1604: 'relation as scalar' does not work with complex types (thejas)
PIG-1601: Make scalar work for secure hadoop (daijy)
PIG-1602: The .classpath of eclipse template still use hbase-0.20.0 (zjffdu)
PIG-1596: NPE's thrown when attempting to load hbase columns containing null values (zjffdu)
PIG-1597: Development snapshot jar no longer picked up by bin/pig (dvryaboy)
PIG-1599: pig gives generic message for few cases (nrai via rding)
PIG-1595: casting relation to scalar- problem with handling of data from non PigStorage loaders (thejas)
PIG-1591: pig does not create a log file, if the MR job succeeds but front end fails (nrai via daijy)
PIG-1543: IsEmpty returns the wrong value after using LIMIT (daijy)
PIG-1550: better error handling in casting relations to scalars (thejas)
PIG-1572: change default datatype when relations are used as scalar to bytearray (thejas)
PIG-1583: piggybank unit test TestLookupInFiles is broken (daijy)
PIG-1563: some of string functions don't work on bytearrays (olgan)
PIG-1569: java properties not honored in case of properties such as
stop.on.failure (rding)
PIG-1570: native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs (thejas)
PIG-1343: pig_log file missing even though Main tells it is creating one and
an M/R job fails (nrai via rding)
PIG-1482: Pig gets confused when more than one loader is involved (xuefuz via thejas)
PIG-1579: Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput (daijy)
PIG-1557: couple of issue mapping aliases to jobs (rding)
PIG-1552: Nested describe failed when the alias is not referred in the first foreach inner plan (aniket486 via daijy)
PIG-1486: update ant eclipse-files target to include new jar and remove contrib dirs from build path (thejas)
PIG-1524: 'Proactive spill count' is misleading (thejas)
PIG-1546: Incorrect assert statements in operator evaluation (ajaykidave via
pradeepkth)
PIG-1392: Parser fails to recognize valid field (niraj via rding)
PIG-1541: FR Join shouldn't match null values (rding)
PIG-1525: Incorrect data generated by diff of SUM (rding)
PIG-1288: EvalFunc returnType is wrong for generic subclasses (daijy)
PIG-1534: Code discovering UDFs in the script has a bug in a order by case
(pradeepkth)
PIG-1533: Compression codec should be a per-store property (rding)
PIG-1527: No need to deserialize UDFContext on the client side (rding)
PIG-1516: finalize in bag implementations causes pig to run out of memory in reduce (thejas)
PIG-1521: explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used (thejas)
PIG-1513: Pig doesn't handle empty input directory (rding)
PIG-1500: guava.jar should be removed from the lib folder (niraj via rding)
PIG-1034: Pig does not support ORDER ... BY group alias (zjffdu)
PIG-1445: Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented (daijy)
PIG-348: -j command line option doesn't work (rding)
PIG-1487: Replace "bz" with ".bz" in all the LoadFunc
PIG-1489: Pig MapReduceLauncher does not use jars in register statement
(rding)
PIG-1435: make sure dependent jobs fail when a job in multiquery fails (niraj
via rding)
PIG-1492: DefaultTuple and DefaultMemory underestimate their memory footprint (thejas)
PIG-1409: Fix up javadocs for org.apache.pig.builtin (gates)
PIG-1490: Make Pig storers work with remote HDFS in secure mode (rding)
PIG-1469: DefaultDataBag assumes ArrayList as default List type (azaroth via dvryaboy)
PIG-1467: order by fail when set "fs.file.impl.disable.cache" to true (daijy)
PIG-1463: Replace "bz" with ".bz" in setStoreLocation in PigStorage (zjffdu)
PIG-1221: Filter equality does not work for tuples (zjffdu)
PIG-1456: TestMultiQuery takes a long time to run (rding)
PIG-1457: Pig will run complete zebra test even we give -Dtestcase=xxx (daijy)
PIG-1450: TestAlgebraicEvalLocal failures due to OOM (daijy)
PIG-1433: pig should create success file if
mapreduce.fileoutputcommitter.marksuccessfuljobs is true (pradeepkth)
PIG-1347: Clear up output directory for a failed job (daijy)
PIG-1419: Remove "user.name" from JobConf (daijy)
PIG-1359: bin/pig script does not pick up correct jar libraries (zjffdu)
PIG-566: Dump and store outputs do not match for PigStorage (azaroth via daijy)
PIG-1414: Problem with parameter substitution (rding)
PIG-1407: Logging starts before being configured (azaroth via daijy)
PIG-1391: pig unit tests leave behind files in temp directory because
MiniCluster files don't get deleted (thejas)
PIG-1211: Pig script runs half way after which it reports syntax error
(pradeepkth)
PIG-1401: "explain -script <script file>" executes grunt commands like
run/dump/copy etc - explain -script should not execute any grunt command and
only explain the query plans (pradeepkth)
PIG-1303: Inconsistent instantiation of parametrized UDFs (jrussek and dvryaboy)
PIG-740: Incorrect line number is generated when a string with double quotes is
used instead of single quotes and is passed to UDF (pradeepkth)
PIG-1378: har url not usable in Pig scripts (pradeepkth)
PIG-1395: Mapside cogroup runs out of memory (ashutoshc)
PIG-1383: Remove empty svn directories from source tree (rding)
PIG-1348: PigStorage making unnecessary byte array copy when storing data
(rding)
PIG-1372: Restore PigInputFormat.sJob for backward compatibility (pradeepkth)
PIG-1369: POProject does not handle null tuples and non existent fields in
some cases (pradeepkth)
PIG-1364: Public javadoc on apache site still on 0.2, needs to be updated for each version release (gates)
PIG-1338: Pig should exclude hadoop conf in local mode (daijy)
PIG-1299: Implement Pig counter to track number of output rows for each output
files (rding)
PIG-1366: PigStorage's pushProjection implementation results in NPE under
certain data conditions (pradeepkth)
PIG-1365: WrappedIOException is missing from Pig.jar (pradeepkth)
PIG-1313: PigServer leaks memory over time (billgraham via daijy)
PIG-1346: In unit tests Util.executeShellCommand relies on java commands being
in the path and does not consider JAVA_HOME (pradeepkth)
PIG-1352: piggybank UPPER udf throws exception if argument is null
PIG-1560: Fix ant target checkstyle (gkesavan)
Release 0.7.0
INCOMPATIBLE CHANGES
PIG-1292: Interface Refinements (ashutoshc)
PIG-1259: ResourceFieldSchema.setSchema should not allow a bag field without a
Tuple as its only sub field (the tuple itself can have a schema with > 1
subfields) (pradeepkth)
PIG-1265: Change LoadMetadata and StoreMetadata to use Job instead of
Configuration and add a cleanupOnFailure method to StoreFuncInterface
(pradeepkth)
PIG-1250: Make StoreFunc an abstract class and create a mirror interface
called StoreFuncInterface (pradeepkth)
PIG-1234: Unable to create input slice for har:// files (pradeepkth)
PIG-1200: Using TableInputFormat in HBaseStorage (zjffdu via pradeepkth)
PIG-1148: Move splitable logic from pig latin to InputFormat (zjffdu via
pradeepkth)
PIG-1141: Make streaming work with the new load-store interfaces (rding via
pradeepkth)
PIG-1110: Handle compressed file formats -- Gz, BZip with the new proposal
(rding via pradeepkth)
PIG-1088: change merge join and merge join indexer to work with new LoadFunc
interface (thejas via pradeepkth)
PIG-879: Pig should provide a way for input location string in load statement
to be passed as-is to the Loader (rding via pradeepkth)
PIG-966: load-store-redesign branch: change SampleLoader and subclasses to
work with new LoadFunc interface (thejas via pradeepkth)
PIG-1094: Fix unit tests corresponding to source changes so far (pradeepkth)
PIG-1090: Update sources to reflect recent changes in load-store interfaces
(pradeepkth)
PIG-1072: ReversibleLoadStoreFunc interface should be removed to enable
different load and store implementation classes to be used in a reversible
manner (rding via pradeepkth)
IMPROVEMENTS
PIG-1381: Need a way for Pig to take an alternative property file (daijy)
PIG-1330: Move pruned schema tracking logic from LoadFunc to core code (daijy)
PIG-1320: more documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1320: documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1325: Provide a way to exclude a testcase when running "ant test"
(pradeepkth)
PIG-1312: Make Pig work with hadoop security (daijy)
PIG-1308: Infinite loop in JobClient when reading from BinStorage Message:
[org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 2] (pradeepkth)
PIG-1285: Allow SingleTupleBag to be serialized (dvryaboy)
PIG-1117: Pig reading hive columnar rc tables (gerritjvv via dvryaboy)
PIG-1287: Use hadoop-0.20.2 with pig 0.7.0 release (pradeepkth)
PIG-1257: PigStorage per the new load-store redesign should support splitting
of bzip files (pradeepkth)
PIG-1290: WeightedRangePartitioner should not check if input is empty if
quantile file is empty (pradeepkth)
PIG-1262: Additional findbugs and javac warnings (daijy)
PIG-1248: [piggybank] some useful String functions (dvryaboy)
PIG-1251: Move SortInfo calculation earlier in compilation (ashutoshc)
PIG-1233: NullPointerException in AVG (ankur via olgan)
PIG-1218: Use distributed cache to store samples (rding via pradeepkth)
PIG-1226: support for additional jar files (thejas via olgan)
PIG-1230: Streaming input in POJoinPackage should use nonspillable bag to
collect tuples (ashutoshc)
PIG-1224: Collected group should change to use new (internal) bag (ashutoshc)
PIG-1046: join algorithm specification is within double quotes (ashutoshc)
PIG-1209: Port POJoinPackage to proactively spill (ashutoshc)
PIG-1190: Handling of quoted strings in pig-latin/grunt commands (ashutoshc)
PIG-1214: Pig 0.6 Docs fixes (chandec via olgan)
PIG-977: exit status does not account for JOB_STATUS.TERMINATED (ashutoshc)
PIG-1192: Pig 0.6 Docs fixes (chandec via olgan)
PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan)
PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)
PIG-1102: Collect number of spills per job (sriranjan via olgan)
PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs
(dvryaboy via pradeepkth)
PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan)
PIG-1163: Pig/Zebra 0.6.0 release (chandec via olgan)
PIG-1156: Add aliases to ExecJobs and PhysicalOperators (dvryaboy via gates)
PIG-1161: add missing license headers (dvryaboy via olgan)
PIG-760: Add a new PigStorageSchema load/store function that
store schemas for text files (dvryaboy via gates)
PIG-1106: FR join should not spill (ankit.modi via olgan)
PIG-1147: Zebra Docs for Pig 0.6.0 (chandec via olgan)
PIG-1129: Pig UDF doc: fieldsToRead function (chandec via olgan)
PIG-978: MQ docs update (chandec via olgan)
PIG-990: Provide a way to pin LogicalOperator Options (dvryaboy via gates)
PIG-1103: refactoring of commit tests (olgan)
PIG-1101: Allow argument to limit to be long in addition to int (ashutoshc via
gates)
PIG-872: use distributed cache for the replicated data set in FR join
(sriranjan via olgan)
PIG-1053: Consider moving to Hadoop for local mode (ankit.modi via olgan)
PIG-1085: Pass JobConf and UDF specific configuration information to UDFs
(gates)
PIG-1173: pig cannot be built without an internet connection (jmhodges via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-1507: Full outer join fails while doing a filter on joined data (daijy)
PIG-1493: Column Pruner throw exception "inconsistent pruning" (daijy)
PIG-1484: BinStorage should support comma separated path (daijy)
PIG-1443: DefaultTuple underestimate the memory footprint for string (daijy)
PIG-1446: OOME in a query having a bincond in the inner plan of a Foreach. (hashutosh)
PIG-1415: LoadFunc signature is not correct in LoadFunc.getSchema sometimes (daijy)
PIG-1403: Make Pig work with remote HDFS in secure mode (daijy)
PIG-1394: POCombinerPackage hold too much memory for InternalCachedBag (daijy)
PIG-1374: PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in the next statement (daijy)
PIG-1336: Optimize POStore serialized into JobConf (daijy)
PIG-1335: UDFFinder should find LoadFunc used by POCast (daijy)
PIG-1307: when we spill the DefaultDataBag we are not setting the sized changed flag to be true. (breed via daijy)
PIG-1298: Restore file traversal behavior to Pig loaders (rding)
PIG-1289: PIG Join fails while doing a filter on joined data (daijy)
PIG-1266: Show spill count on the pig console at the end of the job (sriranjan
via rding)
PIG-1296: Skewed join fail due to negative partition index (daijy)
PIG-1293: pig wrapper script tends to fail if pig is in the path and PIG_HOME
isn't set (aw via gates)
PIG-1272: Column pruner causes wrong results (daijy)
PIG-1275: empty bag in PigStorage read as null (daijy)
PIG-1252: Diamond splitter does not generate correct results when using
Multi-query optimization (rding)
PIG-1260: Param Substitution results in parser error if there is no EOL after
last line in script (rding)
PIG-1238: Dump does not respect the schema (rding)
PIG-1261: PigStorageSchema broke after changes to ResourceSchema (dvryaboy via
daijy)
PIG-1053: Put pig.properties back into release distribution (gates).
PIG-1273: Skewed join throws error (rding)
PIG-1267: Problems with partition filter optimizer (rding)
PIG-1079: Modify merge join to use distributed cache to maintain the index
(rding)
PIG-1241: Accumulator is turned on when a map is used with a non-accumulative
UDF (yinghe via olgan)
PIG-1215: Make Hadoop jobId more prominent in the client log (ashutoshc)
PIG-1216: New load store design does not allow Pig to validate inputs and
outputs up front (ashutoshc via pradeepkth)
PIG-1239: PigContext.connect() should not create a jobClient and jobClient
should be created on demand when needed (pradeepkth)
PIG-1169: Top-N queries produce incorrect results when a store statement is added between order by and limit statement (rding)
PIG-1131: Pig simple join does not work when it contains empty lines (ashutoshc)
PIG-834: incorrect plan when algebraic functions are nested (ashutoshc)
PIG-1217: Fix argToFuncMapping in Piggybank Top function (dvryaboy via gates)
PIG-1154: Local Mode fails when hadoop config directory is specified in
classpath (ankit.modi via gates)
PIG-1124: Unable to set Custom Job Name using the -Dmapred.job.name parameter (ashutoshc)
PIG-1213: Schema serialization is broken (pradeepkth)
PIG-1194: ERROR 2055: Received Error while processing the map plan (rding via ashutoshc)
PIG-1204: Pig hangs when joining two streaming relations in local mode
(rding)
PIG-1191: POCast throws exception for certain sequences of LOAD, FILTER,
FOREACH (pradeepkth via gates)
PIG-1171: Top-N queries produce incorrect results when followed by a cross statement (rding via olgan)
PIG-1159: merge join right side table does not support comma separated paths
(rding via olgan)
PIG-1158: pig command line -M option doesn't support table union correctly
(comma separated paths) (rding via olgan)
PIG-1143: Poisson Sample Loader should compute the number of samples required
only once (sriranjan via olgan)
PIG-1157: Successive replicated joins do not generate Map Reduce plan and fails
due to OOM (rding via olgan)
PIG-1075: Error in Cogroup when key fields types don't match (rding via olgan)
PIG-973: type resolution inconsistency (rding via olgan)
PIG-1135: skewed join partitioner returns negative partition index (yinghe
via olgan)
PIG-1134: Skewed Join sampling job overwhelms the name node (sriranjan via
olgan)
PIG-1105: COUNT_STAR accumulate interface implementation causes failure
(sriranjan via olgan)
PIG-1118: expression with aggregate functions returning null, with accumulate
interface (yinghe via olgan)
PIG-1068: COGROUP fails with 'Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableText, recieved
org.apache.pig.impl.io.NullableTuple' (rding via gates)
PIG-1113: Diamond query optimization throws error in JOIN (rding via olgan)
PIG-1116: Remove redundant map-reduce job for merge join (pradeepkth)
PIG-1114: MultiQuery optimization throws error when merging 2 level spl (rding
via olgan)
PIG-1108: Incorrect map output key type in MultiQuery optimiza (rding via
olgan)
PIG-1022: optimizer pushes filter before the foreach that generates column
used by filter (daijy via gates)
PIG-1107: PigLineRecordReader bails out on an empty line for compressed data
(ankit.modi via olgan)
PIG-598: Parameter substitution ($PARAMETER) should not be performed in
comments (thejas via olgan)
PIG-1064: Behaviour of COGROUP with and without schema when using "*" operator
(pradeepkth)
PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi
via )
PIG-1086: Nested sort by * throw exception (rding via daijy)
PIG-1146: Inconsistent column pruning in LOUnion (daijy)
PIG-1176: Column Pruner issues in union of loader with and without schema
(daijy)
PIG-1184: PruneColumns optimization does not handle the case of foreach
flatten correctly if flattened bag is not used later (daijy)
PIG-1189: StoreFunc UDF should ship to the backend automatically without
"register" (daijy)
PIG-1212: LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null (daijy)
PIG-1255: Tiny code cleanup for serialization code for PigSplit (daijy)
PIG-613: Casting elements inside a tuple does not take effect (daijy)
Release 0.6.0
INCOMPATIBLE CHANGES
PIG-922: Logical optimizer: push up project (daijy)
IMPROVEMENTS
PIG-1084: Pig 0.6.0 Documentation improvements (chandec via olgan)
PIG-1089: Pig 0.6.0 Documentation (chandec via olgan)
PIG-958: Splitting output data on key field (ankur via pradeepkth)
PIG-1058: FINDBUGS: remaining "Correctness Warnings" (olgan)
PIG-1036: Fragment-replicate left outer join (ankit.modi via pradeepkth)
PIG-920: optimizing diamond queries (rding via pradeepkth)
PIG-1040: FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be (olgan)
PIG-1059: FINDBUGS: remaining Bad practice + Multithreaded correctness Warning (olgan)
PIG-953: Enable merge join in pig to work with loaders and store functions
which can internally index sorted data (pradeepkth)
PIG-1055: FINDBUGS: remaining "Dodgy Warnings" (olgan)
PIG-1052: FINDBUGS: remaining performance warnings (olgan)
PIG-1037: Converted sorted and distinct bags to use the new active spilling
paradigm (yinghe via gates)
PIG-1051: FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (olgan)
PIG-1050: FINDBUGS: DLS_DEAD_LOCAL_STORE: Dead store to local variable (olgan)
PIG-1045: Integration with Hadoop 20 New API (rding via pradeepkth)
PIG-1043: FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class
(olgan)
PIG-1047: FINDBUGS: URF_UNREAD_FIELD: Unread field (olgan)
PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new
String(String) constructor (olgan)
PIG-984: Add map side grouping for data that is already collected when
it is read into the map (rding via gates)
PIG-1025: Add ability to set job priority from Pig Latin script (kevinweil via
gates)
PIG-1028: FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number
constructor; use static valueOf instead (olgan)
PIG-1012: FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in
serializable class (olgan)
PIG-1013: FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on
an array (olgan)
PIG-1011: FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't
define serialVersionUID (olgan)
PIG-1009: FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream (olgan)
PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan)
PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with
a lower case letter (olgan)
PIG-1023: FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL (olgan)
PIG-1019: added findbugs exclusion file (olgan)
PIG-983: PERFORMANCE: multi-query optimization on multiple group bys
following a join or cogroup (rding via pradeepkth)
PIG-975: Need a databag that does not register with SpillableMemoryManager and
spill data pro-actively (yinghe via olgan)
PIG-891: Fixing dfs statement for Pig (zjffdu via daijy)
PIG-956: 10 minute commit tests (olgan)
PIG-948: [Usability] Relating pig script with MR jobs (ashutoshc via daijy)
PIG-960: Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage (ankit.modi via daijy)
PIG-1020: Include an ant target to build pig.jar without hadoop libraries (daijy)
PIG-1033: javac warnings: deprecated hadoop APIs (daijy)
PIG-1041: javac warnings: cast, fallthrough, serial (daijy)
PIG-1042: javac warnings: unchecked (daijy)
PIG-1038: Optimize nested distinct/sort to use secondary key (daijy)
PIG-979: Accumulator Interface for UDFs (yinghe via daijy)
OPTIMIZATIONS
PIG-922: Logical optimizer: push up project (daijy)
BUG FIXES
PIG-1080: PigStorage may miss records when loading a file (rding via olgan)
PIG-1071: Support comma separated file/directory names in load statements
(rding via pradeepkth)
PIG-970: Changes to make HBase loader work with HBase 0.20 (vbarat and zjffdu
via gates)
PIG-1035: support for skewed outer join (sriranjan via pradeepkth)
PIG-1030: explain and dump not working with two UDFs inside inner plan of
foreach (rding via pradeepkth)
PIG-1048: inner join using 'skewed' produces multiple rows for keys with
single row in both input relations (sriranjan via gates)
PIG-1063: Pig does not call checkOutSpecs() on OutputFormat provided by
StoreFunc in the multistore case (pradeepkth)
PIG-746: Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag
should never be serialized (rding via pradeepkth)
PIG-1027: Number of bytes written are always zero in local mode (zjffdu via gates)
PIG-976: Multi-query optimization throws ClassCastException (rding via
pradeepkth)
PIG-858: Order By followed by "replicated" join fails while compiling MR-plan
from physical plan (ashutoshc via gates)
PIG-968: Fix findContainingJar to work properly when there is a + in the jar
path (tlipcon via gates)
PIG-738: Regexp passed from pigscript fails in UDF (pradeepkth)
PIG-942: Maps are not implicitly casted (pradeepkth)
PIG-513: Removed unnecessary bounds check in DefaultTuple (ashutoshc via
gates)
PIG-951: Set parallelism explicitly to 1 for indexing job in merge join
(ashutoshc via gates)
PIG-592: schema inferred incorrectly (daijy)
PIG-989: Allow type merge between numerical type and non-numerical type (daijy)
PIG-894: order-by fails when input is empty (daijy)
PIG-995: Limit Optimizer throw exception "ERROR 2156: Error while fixing projections" (daijy)
PIG-1000: InternalCachedBag.java generates javac warning and findbug warning (yinghe via daijy)
PIG-921: Strange use case for Join which produces different results in local and map reduce mode (daijy)
PIG-1024: Script contains nested limit fail due to "LOLimit does not support multiple outputs" (daijy)
PIG-644: Duplicate column names in foreach do not throw parser error (daijy)
PIG-927: null should be handled consistently in Join (daijy)
PIG-790: Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond) (daijy)
PIG-1001: Generate more meaningful error message when one input file does not exist (daijy)
PIG-1060: MultiQuery optimization throws error for multi-level splits (rding via daijy)
PIG-1128: column pruning causing failure when foreach has user-specified
schema (daijy)
PIG-1127: Logical operator should contain individual copy of schema object
(daijy)
PIG-1133: UDFContext should be made available to LoadFunc.bindTo (daijy)
PIG-1132: Column Pruner issues in dealing with unprunable loader (daijy)
PIG-1142: Got NullPointerException merge join with pruning (daijy)
PIG-1155: Need to make sure existing loaders work "as is" (daijy)
PIG-1144: set default_parallelism construct does not set the number of
reducers correctly (daijy)
PIG-1165: Signature of loader is not set correctly for order by (daijy)
PIG-761: ERROR 2086 on simple JOIN (daijy)
PIG-1172: PushDownForeachFlatten shall not push ForEach below Join if the
flattened fields are used in Join (daijy)
PIG-1180: Piggybank should compile even if we only have
"pig-withouthadoop.jar" but no "pig.jar" in the pig home directory (daijy)
PIG-1185: Data bags do not close spill files after using iterator to read
tuples (yinghe via daijy)
PIG-1186: Pig does not take values in "pig-cluster-hadoop-site.xml" (daijy)
PIG-1193: Secondary sort issue on nested desc sort (daijy)
PIG-1195: POSort should take care of sort order (daijy)
PIG-1210: fieldsToRead sends the same fields more than once in some cases (daijy)
PIG-1231: DefaultDataBagIterator.hasNext() should be idempotent in all cases
(daijy)
Release 0.5.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-1039: documentation update (chandec via olgan)
OPTIMIZATIONS
BUG FIXES
PIG-963: Join in local mode matches null keys (pradeepkth)
PIG-660: Integration with Hadoop 20 (sms via olgan)
Release 0.4.0 - 2009-09-26
INCOMPATIBLE CHANGES
PIG-892: Make COUNT and AVG deal with nulls in accordance with the SQL standard
(olgan)
PIG-734: Changed maps to only take strings as keys (gates)
IMPROVEMENTS
PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)
PIG-578: join ... outer, ... outer semantics are no-ops, should produce
corresponding null values (pradeepkth)
PIG-936: making dump and PigDump independent from Tuple.toString (daijy)
PIG-890: Create a sampler interface and improve the skewed join sampler (sriranjan via daijy)
PIG-922: Logical optimizer: push up project part 1 (daijy)
PIG-812: COUNT(*) does not work (breed)
PIG-923: Allow specifying log file location through pig.properties (dvryaboy via daijy)
PIG-926: Merge-Join phase 2 (ashutoshc via pradeepkth)
PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth)
PIG-893: Added string -> integer, long, float, and double casts (zjffdu via gates)
PIG-833: Added Zebra, new columnar storage mechanism for HDFS (rangadi plus many others via gates)
PIG-697: Proposed improvements to pig's optimizer, Phase5 (daijy)
PIG-895: Default parallel for Pig (daijy)
PIG-820: Change RandomSampleLoader to take a LoadFunc instead of extending
BinStorage. Added new Samplable interface for loaders to implement
allowing them to be used by RandomSampleLoader (ashutoshc via gates)
PIG-832: Make import list configurable (daijy)
PIG-697: Proposed improvements to pig's optimizer (sms)
PIG-753: Allow UDFs with no parameters (zjffdu via gates)
PIG-765: jdiff for pig (gkesavan)
OPTIMIZATIONS
PIG-792: skew join implementation (sriranjan via olgan)
BUG FIXES
PIG-964: Handling null in skewed join (sriranjan via olgan)
PIG-962: Skewed join creates 3 map reduce jobs (sriranjan via olgan)
PIG-957: Tutorial is broken with 0.4 branch and trunk (pradeepkth)
PIG-955: Skewed join produces invalid results (yinghe via olgan)
PIG-954: Skewed join fails when pig.skewedjoin.reduce.memusage is not
configured (yinghe via olgan)
PIG-882: log level not propagated to loggers - duplicate message (daijy)
PIG-943: Pig crashes when it cannot get counter from hadoop (daijy)
PIG-935: Skewed join throws an exception when used with map keys (sriranjan
via pradeepkth)
PIG-934: Merge join implementation currently does not seek to right point
on the right side input based on the offset provided by the index
(ashutoshc via pradeepkth)
PIG-925: Fix join in local mode (daijy)
PIG-913: Error in Pig script when grouping on chararray column (daijy)
PIG-907: Provide multiple versions of HashFNV (Piggybank) (daijy)
PIG-905: TOKENIZE throws exception on null data (daijy)
PIG-901: InputSplit (SliceWrapper) created by Pig is big in size due to
serialized PigContext (pradeepkth)
PIG-882: log level not propagated to loggers (daijy)
PIG-880: Order by is broken with complex fields (sms)
PIG-773: Empty complex constants (empty bag, empty tuple and empty map)
should be supported (ashutoshc via sms)
PIG-695: Pig should not fail when error logs cannot be created (sms)
PIG-878: Pig is returning too many blocks in the input split. (arunc via gates)
PIG-888: Pig does not pass udf to the backend in some situations (daijy)
PIG-728: All backend error messages must be logged to preserve the
original error messages (sms)
PIG-877: Push up filter does not account for added columns in foreach
(sms)
PIG-883: udf import list is not sent to the backend (daijy)
PIG-881: Pig should ship load udfs to the backend (daijy)
PIG-876: limit changes order of order-by to ascending (daijy)
PIG-851: Map type used as return type in UDFs not recognized at all times
(zjffdu via sms)
PIG-861: POJoinPackage loses tuples in large dataset (daijy)
PIG-797: Limit with ORDER BY producing wrong results (daijy)
PIG-850: Dump produces wrong result while "store into" is ok (daijy)
PIG-852: pig -version or pig -help returns exit code of 1 (milindb via
olgan)
PIG-849: Local engine loses records in splits (hagleitn via olgan)
PIG-939: Fix checkstyle ivy configuration (gkesavan)
Release 0.3.0 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-817: documentation update (chandec via olgan)
PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates)
PIG-831: Turned off reporting of records and bytes written for multi-store
queries as the returned results are confusing and wrong. (gates)
PIG-813: documentation updates (chandec via olgan)
PIG-825: PIG_HADOOP_VERSION should be set to 18 (dvryaboy via gates)
PIG-795: support for SAMPLE command (ericg via olgan)
PIG-619: Create one InputSplit even when the input file is zero length
so that hadoop runs maps and creates output for the next
job (gates)
PIG-697: Proposed improvements to pig's optimizer (sms)
PIG-700: To automate the pig patch test process (gkesavan via sms)
PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu
via gates)
PIG-652: Adapt changes in store interface to multi-query changes (hagleitn
via gates)
PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create
empty bags (pradeepkth)
PIG-741: Allow limit to be nested in a foreach.
PIG-627: multiquery support phase 3 (hagleitn and Richard Ding via olgan)
PIG-743: To implement clover (gkesavan)
PIG-701: Implement IVY for resolving pig dependencies (gkesavan)
PIG-626: Add access to hadoop counters (shubhamc via gates)
PIG-627: multiquery support phase 1 and phase 2 (hagleitn and Richard Ding via pradeepkth)
BUG FIXES
PIG-846: MultiQuery optimization in some cases has an issue when there is a
split in the map plan (pradeepkth)
PIG-835: Multiquery optimization does not handle the case where the map keys
in the split plans have different key types (tuple and non tuple key type)
(pradeepkth)
PIG-839: incorrect return codes on failure when using -f or -e flags (hagleitn
via sms)
PIG-796: support conversion from numeric types to chararray (ashutoshc
via pradeepkth)
PIG-564: problem with parameter substitution and special characters (olgan)
PIG-802: PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan)
PIG-816: PigStorage() does not accept Unicode characters in its constructor (pradeepkth)
PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan)
PIG-819: run -param -param; is a valid grunt command (milindb via olgan)
PIG-656: Use of eval or any other keyword in the package hierarchy of a UDF causes
parse exception (milindb via sms)
PIG-814: Make Binstorage more robust when data contains record markers (pradeepkth)
PIG-811: Globs with "?" in the pattern are broken in local mode (hagleitn via
olgan)
PIG-810: Fixed NPE in PigStats (gates)
PIG-804: problem with lineage with double map redirection (pradeepkth)
PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs
"FileSystem closed" error on large input (pradeepkth)
PIG-693: Parameter to UDF which is an alias returned in another UDF in nested
foreach causes incorrect results (thejas via sms)
PIG-725: javadoc: warning - Multiple sources of package comments found for
package "org.apache.commons.logging" (gkesavan via sms)
PIG-745: Add DataType.toString() to force basic types to chararray, useful
for UDFs that want to handle all simple types as strings (ciemo via gates)
PIG-514: COUNT returns no results as a result of two filter statements in
FOREACH (pradeepkth)
PIG-789: Fix dump and illustrate to work with new multi-query feature
(hagleitn via gates)
PIG-774: Pig does not handle Chinese characters (in both the parameter substitution
using -param_file or embedded in the Pig script) correctly (daijy)
PIG-800: Fix distinct and order in local mode to not go into an infinite loop
(gates)
PIG-806: to remove author tags in the pig source code (sms)
PIG-799: Unit tests on windows are failing after multiquery commit (daijy)
PIG-781: Error reporting for failed MR jobs (hagleitn via olgan)
Release 0.2.0
INCOMPATIBLE CHANGES
PIG-157: Add types and rework execution pipeline (gates)
PIG-458: integration with Hadoop 18 (olgan)
NEW FEATURES
PIG-139: command line editing (daijy via olgan)
PIG-554: Added fragment replicate map side join (shravanmn via pkamath and gates)
PIG-535: added rmf command
PIG-704: Added ALIASES command that shows all currently defined ALIASES.
Changed semantics of DEFINE to define last used alias if no argument is
given (ericg via gates)
PIG-713: Added alias completion as part of tab completion in grunt (ericg
via gates)
IMPROVEMENTS
PIG-270: proper line number for parse errors (daijy via olgan)
PIG-367: convenience function for UDFs to name schema
PIG-443: Illustrate for the Types branch (shubhamc via olgan)
PIG-599: Added buffering to BufferedPositionedInputStream (gates)
PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth
via olgan)
PIG-628: misc performance improvements (pradeepkth via olgan)
PIG-589: error handling, phase 1-2 (sms via olgan)
PIG-590: error handling, phase 3 (sms)
PIG-591: error handling, phase 4 (sms)
PIG-545: PERFORMANCE: Sampler for order bys does not produce a good
distribution (pradeepkth)
PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan)
PIG-636: Use lightweight bag implementations which do not register with
SpillableMemoryManager with Combiner (pradeepkth)
PIG-563: support for multiple combiner invocations (pradeepkth via olgan)
PIG-465: performance improvement - removing keys from the value (pradeepkth
via olgan)
PIG-450: PERFORMANCE: Distinct should make use of combiner to remove
duplicate values from keys. (gates)
PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth
via gates)
BUG FIXES
PIG-294: string comparator unit tests (sms via pi_song)
PIG-258: cleaning up directories on failure (daijy via olgan)
PIG-363: fix for describe to produce schema name
PIG-368: making JobConf available to Load/Store UDFs
PIG-311: cross is broken
PIG-369: support for filter UDFs
PIG-375: support for implicit split
PIG-301: fix for order by descending
PIG-378: fix for GENERATE + LIMIT
PIG-362: don't push limit above generate with flatten
PIG-381: bincond does not handle null data
PIG-382: bincond throws typecast exception
PIG-352: java.lang.ClassCastException when invalid field is accessed
PIG-329: TestStoreOld, 2 unit tests were broken
PIG-353: parsing of complex types
PIG-392: error handling with multiple MRjobs
PIG-397: code defaults to single reducer
PIG-373: unconnected load causes problem
PIG-413: problem with float sum
PIG-398: Expressions not allowed inside foreach (sms via olgan)
PIG-418: divide by 0 problem
PIG-402: order by with user comparator (shravanmn via olgan)
PIG-415: problem with comparators (shravanmn via olgan)
PIG-422: cross is broken (shravanmn via olgan)
PIG-407: need to clone operators (pradeepkth via olgan)
PIG-428: TypeCastInserter does not replace projects in inner plans
correctly (pradeepkth via olgan)
PIG-421: error with complex nested plan (sms via olgan)
PIG-429: Self join with implicit split has the join output in wrong order
(pradeepkth via olgan)
PIG-434: short-circuit AND and OR (pradeepkth via olgan)
PIG-333: allowing no parentheses with single column alias with flatten (sms
via olgan)
PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan)
PIG-436: alias is lost when single column is flattened (pradeepkth via
olgan)
PIG-364: Limit returns incorrect records when we use multiple reducers
(daijy via olgan)
PIG-439: disallow alias renaming (pradeepkth via olgan)
PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth
via olgan)
PIG-442: Disambiguated alias after a foreach flatten is not accessible a
couple of statements after the foreach (sms via olgan)
PIG-424: nested foreach with flatten and agg gives an error (sms via
olgan)
PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD
connection is fully established (olgan)
PIG-430: Projections in nested filter and inside foreach do not work (sms
via olgan)
PIG-445: Null Pointer Exceptions in the mappers leading to a lot of retries
(shravanmn via olgan)
PIG-444: job.jar is left behind (pradeepkth via olgan)
PIG-447: improved error messages (pradeepkth via olgan)
PIG-448: explain broken after load with types (pradeepkth via olgan)
PIG-380: invalid schema for databag constant (sms via olgan)
PIG-451: If a field is part of group followed by flatten, then referring
to it causes a parse error (pradeepkth via olgan)
PIG-455: "group" alias is lost after a flatten(group) (pradeepkth via olgan)
PIG-459: increased sleep time before checking for job progress
PIG-462: LIMIT N should create one output file with N rows (shravanmn via
olgan)
PIG-376: set job name (olgan)
PIG-463: POCast changes (pradeepkth via olgan)
PIG-427: casting input to UDFs
PIG-437: as in alias names causing problems (sms via olgan)
PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan)
PIG-470: TextLoader should produce bytearrays (sms via olgan)
PIG-335: lineage (sms via olgan)
PIG-464: bag schema definition (pradeepkth via olgan)
PIG-457: report 100% on successful jobs only (shravanmn via olgan)
PIG-471: ignoring status errors from hadoop (pradeepkth via olgan)
PIG-489: (*) processing (sms via olgan)
PIG-475: missing heartbeats (shravanmn via olgan)
PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan)
PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan)
PIG-501: Make branches/types work under cygwin (daijy via olgan)
PIG-504: cleanup illustrate not to produce cn= (shubhamc via olgan)
PIG-469: make sure that describe says "int" not "integer" (sms via olgan)
PIG-495: projecting of bags only gives 1 field (olgan)
PIG-500: Load Func for POCast is not being set in some cases (sms via
olgan)
PIG-499: parser issue with as (sms via olgan)
PIG-507: permission error not reported (pradeepkth via olgan)
PIG-508: problem with double joins (pradeepkth via olgan)
PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan)
PIG-505: working with map elements (sms via olgan)
PIG-517: load function with parameters does not work with cast (pradeepkth
via olgan)
PIG-525: make sure cast for udf parameters works (olgan)
PIG-512: Expressions in foreach lead to errors (sms via olgan)
PIG-528: use UDF return in schema computation (sms via olgan)
PIG-527: allow PigStorage to write out complex output (sms via olgan)
PIG-537: Failure in Hadoop map collect stage due to type mismatch in the
keys used in cogroup (pradeepkth via olgan)
PIG-538: support for null constants (pradeepkth via olgan)
PIG-385: more null handling (pradeepkth via olgan)
PIG-546: FilterFunc calls empty constructor when it should be calling
parameterized constructor (sms via olgan)
PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via
olgan)
PIG-501: make unit tests run under windows (daijy via olgan)
PIG-543: Restore local mode to truly run locally instead of use map
reduce. (shubhamc via gates)
PIG-556: Changed FindQuantiles to report progress. Fixed issue with null
reporter being passed to EvalFuncs. (gates)
PIG-6: Add load support from hbase (hustlmsp via gates)
PIG-522: make negation work (pradeepkth via olgan)
PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple
error (pradeepkth via olgan)
PIG-572: A PigServer.registerScript() method, which lets a client
programmatically register a Pig Script. (shubhamc via gates)
PIG-570: problems with handling bzip data (breed via olgan)
PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan)
PIG-623: Fix spelling errors in output messages (tomwhite via sms)
PIG-622: Include pig executable in distribution (tomwhite via sms)
PIG-615: Wrong number of jobs with limit (shravanmn via sms)
PIG-635: POCast.java has incorrect formatting (sms)
PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext()
gives a null pointer exception (pradeepkth)
PIG-632: Improved error message for binary operators (sms)
PIG-636: Performance improvement: Use lightweight bag implementations which do not
register with SpillableMemoryManager with Combiner (pradeepkth)
PIG-631: 4 Unit test failures on Windows (daijy)
PIG-645: Streaming is broken with the latest trunk (pradeepkth)
PIG-646: Distinct UDF should report progress (sms)
PIG-647: memory size passed on pig command line does not get propagated
to JobConf (sms)
PIG-648: BinStorage fails when it finds markers unexpectedly in the data
(pradeepkth)
PIG-649: RandomSampleLoader does not handle skipping correctly in
getNext() (pradeepkth)
PIG-560: UTFDataFormatException (encoded string too long) is thrown when
storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms)
PIG-642: Limit after FRJ causes problems (daijy)
PIG-637: Limit broken after order by in the local mode (shubhamc via
olgan)
PIG-553: EvalFunc.finish() not getting called (shravanmn via sms)
PIG-654: Optimize build.xml (daijy)
PIG-574: allowing to run scripts from within grunt shell (hagleitn via
olgan)
PIG-665: Map key type not correctly set (for use when key is null) when
map plan does not have localrearrange (pradeepkth)
PIG-590: error handling on the backend (sms via olgan)
PIG-658: Data type long : When 'L' or 'l' is included with data
(123L or 123l) load produces null value. Also the case with Float (thejas
via sms)
PIG-591: Error handling phase four (sms via pradeepkth)
PIG-664: Semantics of * is not consistent (sms)
PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms)
PIG-655: Comparison of schemas of bincond operands is flawed (sms via
pradeepkth)
PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth
via sms)
PIG-577: outer join query loses name information (sms via pradeepkth)
PIG-690: UNION doesn't work in the latest code (pradeepkth via sms)
PIG-544: Utf8StorageConverter.java does not always produce NULLs when data
is malformed (thejas via sms)
PIG-532: Casting a field removes its alias. (thejas via sms)
PIG-705: Pig should display a better error message when backend error
messages cannot be parsed (sms)
PIG-650: pig should look for and use the pig specific
'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the
HOD case (sms)
PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan)
PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan)
PIG-708: implement releaseaudit target to use rats on pig (gkesavan via
olgan)
PIG-703: user documentation (chandec via olgan)
PIG-711: Implement checkstyle for pig (gkesavan via olgan)
PIG-715: doc updates (chandec via olgan)
PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
PIG-692: When running a job from a script, use the name of that script as
the default name for the job (vzaliva via gates)
PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan)
PIG-720: further doc cleanup (gkesavan via olgan)
Release 0.1.1 - 2008-12-04
INCOMPATIBLE CHANGES
NEW FEATURES
IMPROVEMENTS
PIG-253: integration with hadoop-18
BUG FIXES
PIG-342: Fix DistinctDataBag to recalculate size after it has spilled.
(bdimcheff via gates)
Release 0.1.0 - 2008-09-11
INCOMPATIBLE CHANGES
PIG-123: requires escape of '\' in chars and string
NEW FEATURES
PIG-20: Added custom comparator functions for order by (phunt via gates)
PIG-94: Streaming implementation (arunc via olgan)
PIG-58: parameter substitution
PIG-55: added custom splitter (groves via olgan)
PIG-59: Add a new ILLUSTRATE command (shubhamc via gates)
PIG-256: Added variable argument support for UDFs (pi_song)
IMPROVEMENTS
PIG-8: added binary comparator (olgan)
PIG-11: Add capability to search for jar file to register. (antmagna via olgan)
PIG-7: Added use of combiner in some restricted cases. (gates)
PIG-47: Added methods to DataMap to provide access to its content
PIG-30: Rewrote DataBags to better handle decisions of when to spill to
disk and to spill more intelligently. (gates)
PIG-12: Added time stamps to log4j messages (phunt via gates)
PIG-44: Added adaptive decision of the number of records to hold in memory
before spilling (utkarsh)
PIG-56: Made DataBag implement Iterable. (groves via gates)
PIG-39: created more efficient version of read (spullara via olgan)
PIG-32: Abstraction layer (olgan)
PIG-83: Change everything except grunt and Main (PigServer on down) to use
common logging abstraction instead of log4j. By default in grunt, log4j
still used as logging layer. Also converted all System.out/err.println
statements to use logging instead. (francisoud via gates)
PIG-13: adding version to the system (joa23 via olgan)
PIG-113: Make explain output more understandable (pi_song via gates)
PIG-120: Support map reduce in local mode. To do this user needs to
specify execution type as mapreduce and cluster name as local (joa23 via gates)
PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates)
PIG-111: Reworked configuration to be settable via properties. (joa23, pi_song, oae via gates)
BUG FIXES
PIG-24: Files that were incorrectly placed under test/reports have been
removed. ant clean now cleans test/reports. (milindb via gates)
PIG-25: com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
PIG-23: Made pig work with java 1.5. (milindb via gates)
PIG-17: integrated with Hadoop 0.15 (olgan@)
PIG-33: Help was commented out - uncommented (olgan)
PIG-31: second half of concurrent mode problem addressed (olgan)
PIG-14: added heartbeat functionality (olgan)
PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
PIG-29: fixed bag factory to be properly initialized (utkarsh)
PIG-43: fixed problem where using the combiner prevented a pig alias
from being evaluated more than once. (gates)
PIG-45: Fixed pig.pl to not assume hodrc file is named the same as
cluster name (gates)
PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples
instead of Tuples, causing Reducer to crash in some cases.
PIG-41: Added patterns to svn:ignore
PIG-51: Fixed combiner in the presence of flattening
PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the
comparator function instead of Class.forName. (gates)
PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
PIG-77: Added eclipse specific files to svn:ignore
PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arunc
via olgan)
PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default
path. Also fix it to not die if pigclient.conf is missing. (craigm via
gates)
PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill
files when they are done spilling (contributions by craigm, breed, and
gates, committed by gates)
PIG-95: Remove System.exit() statements from inside pig (joa23 via gates)
PIG-65: convert tabs to spaces (groves via olgan)
PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when
more than one bag is involved (gates)
PIG-92: Fix NullPointerException in PigContext due to uninitialized conf
reference. (francisoud via gates)
PIG-80: In a number of places stack trace information was being lost by an
exception being caught, and a different exception then thrown. All those
locations have been changed so that the new exception now wraps the old.
(francisoud via gates)
PIG-84: Converted printStackTrace calls to calls to the logger.
(francisoud via gates)
PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates)
PIG-99: Fix to make unit tests not run out of memory. (francisoud via
gates)
PIG-107: enabled several tests. (francisoud via olgan)
PIG-46: abort processing on error for non-interactive mode (olston via
olgan)
PIG-109: improved exception handling (oae via olgan)
PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can
be run w/o access to a hadoop cluster. (xuzh via gates)
PIG-68: improvements to build.xml (joa23 via olgan)
PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled
flattening the combiner case. (gates and oae)
PIG-213: Remove non-static references to logger from data bags and tuples,
as it causes significant overhead (vgeschel via gates)
PIG-284: target for building source jar (oae via olgan)