/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Pig Change Log
Release 0.16.0 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4719: Documentation for PIG-4704: Customizable Error Handling for Storers in Pig (daijy)
PIG-4714: Improve logging across multiple components with callerId (daijy)
PIG-4885: Turn off union optimizer if there is PARALLEL clause in union in Tez (rohini)
PIG-4894: Add API for StoreFunc to specify if they are write safe from two different vertices (rohini)
PIG-4884: Tez needs to use DistinctCombiner.Combine (rohini)
PIG-4874: Remove schema tuple reference overhead for replicate join hashmap (rohini)
PIG-4879: Pull latest version of joda-time (rohini)
PIG-4526: Make setting up the build environment easier (nielsbasjes via rohini)
PIG-4641: Print the instance of Object without using toString() (sandyridgeracer via rohini)
PIG-4455: Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter (zjffdu via rohini)
PIG-4866: Do not serialize PigContext in configuration to the backend (rohini)
PIG-4547: Update Jython version to 2.7.0 (erwaman via daijy)
PIG-4862: POProject slow by creating StackTrace repeatedly (knoguchi)
PIG-4853: Fetch inputs before starting outputs (rohini)
PIG-4847: POPartialAgg processing and spill improvements (rohini)
PIG-4840: Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups (rohini)
PIG-4843: Turn off combiner in reducer vertex for Tez if bags are in combine plan (rohini)
PIG-4796: Authenticate with Kerberos using a keytab file (nielsbasjes via daijy)
PIG-4817: Bump HTTP Logparser to version 2.4 (nielsbasjes via daijy)
PIG-4811: Upgrade groovy library to address MethodClosure vulnerability (daijy)
PIG-4803: Improve performance of regex-based builtin functions (eyal via daijy)
PIG-4802: Autoparallelism should estimate less when there is combiner (rohini)
PIG-4761: Add more information to front end error messages (eyal via daijy)
PIG-4792: Do not add java and sun system properties to jobconf (rohini)
PIG-4787: Log JSONLoader exception while parsing records (rohini)
PIG-4763: Insufficient check for the number of arguments in runpigmix.pl (sekikn via rohini)
PIG-4411: Support for vertex level configuration like speculative execution (rohini)
PIG-4775: Better default values for shuffle bytes per reducer (rohini)
PIG-4753: Pigmix should have option to delete outputs after completing the tests (mitdesai via rohini)
PIG-4744: Honor tez.staging-dir setting in tez-site.xml (rohini via daijy)
PIG-4742: Document Pig's Register Artifact Command added in PIG-4417 (akshayrai09 via daijy)
PIG-4417: Pig's register command should support automatic fetching of jars from repo (akshayrai09 via daijy)
PIG-4713: Document Bloom UDF (gliptak via daijy)
PIG-3251: Bzip2TextInputFormat requires double the memory of maximum record size (knoguchi)
PIG-4704: Customizable Error Handling for Storers in Pig (siddhimehta via daijy)
PIG-4717: Update Apache HTTPD LogParser to latest version (nielsbasjes via daijy)
PIG-4468: Pig's jackson version conflicts with that of hadoop 2.6.0 or newer (zjffdu via daijy)
PIG-4708: Upgrade joda-time to 2.8 (rohini)
PIG-4697: Pig needs to serialize only part of the udfcontext for each vertex (rohini)
PIG-4702: Load once for sampling and partitioning in order by for certain LoadFuncs (rohini)
PIG-4699: Print Job stats information in Tez like mapreduce (rohini)
PIG-4554: Compress pig.script before encoding (sandyridgeracer via rohini)
PIG-4670: Embedded Python scripts still parse line by line (rohini)
PIG-4663: HBaseStorage should allow the MaxResultsPerColumnFamily limit to avoid memory or scan timeout issues (pmazak via rohini)
PIG-4673: Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences
of search keys with replacement values (murali.k.h.rao@gmail.com via daijy)
PIG-4674: TOMAP should infer schema (daijy)
PIG-4676: Upgrade Hive to 1.2.1 (daijy)
PIG-4574: Eliminate identity vertex for order by and skewed join right after LOAD (rohini)
PIG-4365: TOP udf should implement Accumulator interface (eyal via rohini)
PIG-4570: Allow AvroStorage to use a class for the schema (pmazak via daijy)
PIG-4405: Adding 'map[]' support to mock/Storage (nielsbasjes via daijy)
PIG-4638: Allow TOMAP to accept dynamically sized input (nielsbasjes via daijy)
PIG-4639: Add better parser for Apache HTTPD access log (nielsbasjes via daijy)
BUG FIXES
PIG-4821: Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez (rohini)
PIG-4734: TOMAP schema inferring breaks some scripts in type checking for bincond (daijy)
PIG-4786: CROSS will not work correctly with Grace Parallelism (daijy)
PIG-3227: SearchEngineExtractor does not work for bing (dannyant via daijy)
PIG-4902: Fix UT failures on 0.16 branch: TestTezGraceParallelism, TestPigScriptParser (daijy)
PIG-4909: PigStorage incompatible with commons-cli-1.3 (knoguchi)
PIG-4908: JythonFunction refers to Oozie launcher script absolute path (rohini)
PIG-4905: Input of empty dir does not produce empty output file in Tez (rohini)
PIG-4576: Nightly test HCat_DDL_2 fails with TDE ON (nmaheshwari via daijy)
PIG-4873: InputSplit.getLocations returns null and results in an NPE in Pig (daijy)
PIG-4895: User UDFs relying on mapreduce.job.maps broken in Tez (rohini)
PIG-4883: MapKeyType of splitter was set wrongly in specific multiquery case (kellyzly via rohini)
PIG-4887: Parameter substitution skipped with glob on register (knoguchi)
PIG-4889: Replacing backslash fails as lexical error (knoguchi)
PIG-4880: Overlapping of parameter substitution names inside and outside a macro fails with NPE (knoguchi)
PIG-4881: TestBuiltin.testUniqueID failing on hadoop-1.x (knoguchi)
PIG-4888: Line number off when reporting syntax error inside a macro (knoguchi)
PIG-3772: Syntax error when casting an inner schema of a bag and line break involved (ssvinarchukhorton via knoguchi)
PIG-4892: removing /tmp/output before UT (daijy)
PIG-4882: Remove hardcoded groovy.grape.report.downloads=true from DownloadResolver (erwaman via daijy)
PIG-4581: thread safe issue in NodeIdGenerator (rcatherinot via rohini)
PIG-4878: Fix issues from PIG-4847 (rohini)
PIG-4877: LogFormat parser fails test (nielsbasjes via daijy)
PIG-4860: Loading data using OrcStorage() accepts only default FileSystem path (beriaanirudh via rohini)
PIG-4868: Low values for bytes.per.reducer configured by user not honored in Tez for inputs (rohini)
PIG-4869: Removing unwanted configuration in Tez broke ConfiguredFailoverProxyProvider (rohini)
PIG-4867: -stop_on_failure does not work with Tez (rohini)
PIG-4844: Tez AM runs out of memory when vertex has high number of outputs (rohini)
PIG-3906: ant site errors out (nielsbasjes via daijy)
PIG-4851: Null not padded when input has less fields than declared schema for some loader (rohini)
PIG-4850: Registered jars do not use submit replication (rdblue via cheolsoo)
PIG-4845: Parallel instantiation of classes in Tez cause tasks to fail (rohini)
PIG-4841: Inline-op with schema declaration fails with syntax error (knoguchi)
PIG-4832: Fix TestPruneColumn NPE failure (kellyzly via daijy)
PIG-4833: TestBuiltin.testURIWithCurlyBrace in TEZ failing after PIG-4819 (knoguchi)
PIG-4819: RANDOM() udf can lead to missing or redundant records (knoguchi)
PIG-4816: Read a null scalar causing a Tez failure (daijy)
PIG-4818: Single quote inside comment in GENERATE is not being ignored (knoguchi)
PIG-4814: AvroStorage does not take namenode HA as part of schema file url (daijy)
PIG-4812: Register Groovy UDF with relative path does not work (daijy)
PIG-4806: UDFContext can be reset in the middle during Tez input and output initialization (rohini)
PIG-4808: PluckTuple overwrites regex if used more than once in the same script (eyal via daijy)
PIG-4801: Provide backward compatibility with mapreduce mapred.task settings (rohini)
PIG-4759: Fix Classresolution_1 e2e failure (rohini)
PIG-4800: EvalFunc.getCacheFiles() fails for different namenode (rohini)
PIG-4790: Join after union fail due to UnionOptimizer (rohini)
PIG-4686: Backend code should not call AvroStorageUtils.getPaths (mitdesai via rohini)
PIG-4795: Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream (emopers via daijy)
PIG-4690: Union with self replicate join will fail in Tez (rohini)
PIG-4791: PORelationToExprProject filters records instead of returning emptybag in nested foreach after union (rohini)
PIG-4779: testBZ2Concatenation[pig.bzip.use.hadoop.inputformat = true] failing due to successful read (knoguchi)
PIG-4587: Applying isFirstReduceOfKey for Skewed left outer join skips records (rohini)
PIG-4782: OutOfMemoryError: GC overhead limit exceeded with POPartialAgg (rohini)
PIG-4737: Check and fix clone implementation for all classes extending PhysicalOperator (rohini)
PIG-4770: OOM with POPartialAgg in some cases (rohini)
PIG-4773: [Pig on Tez] Secondary key descending sort in nested foreach after union does ascending instead (rohini)
PIG-4774: Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input (rohini)
PIG-4757: Job stats on successfully read/output records wrong with multiple inputs/outputs (rohini)
PIG-4769: UnionOptimizer hits errors when merging vertex group into split (rohini)
PIG-4768: EvalFunc reporter is null in Tez (rohini)
PIG-4760: TezDAGStats.convertToHadoopCounters is not used, but impose MR counter limit (daijy)
PIG-4755: Typo in runpigmix script (mitdesai via daijy)
PIG-4736: Removing empty keys in UDFContext broke one LoadFunc (rohini)
PIG-4733: Avoid NullPointerException in JVMReuseImpl for builtin classes (rohini)
PIG-4722: [Pig on Tez] NPE while running Combiner (rohini)
PIG-4730: [Pig on Tez] Total parallelism estimation does not account load parallelism (rohini)
PIG-4689: CSV Writes incorrect header if two CSV files are created in one script (nielsbasjes via daijy)
PIG-4727: Incorrect types table for AVG in docs (nsmith via daijy)
PIG-4725: Typo in FrontendException messages "Incompatable" (nsmith via daijy)
PIG-4721: IsEmpty documentation error (nsmith via daijy)
PIG-4712: [Pig on Tez] NPE in Bloom UDF after Union (rohini)
PIG-4707: [Pig on Tez] Streaming job hangs with pig.exec.mapPartAgg=true (rohini)
PIG-4703: TezOperator.stores shall not ship to backend (daijy)
PIG-4696: Empty map returned by a streaming_python udf wrongly contains a null key (cheolsoo)
PIG-4691: [Pig on Tez] Support for whitelisting storefuncs for union optimization (rohini)
PIG-3957: Refactor out resetting input key in TezDagBuilder (rohini)
PIG-4688: Limit followed by POPartialAgg can give empty or partial results in Tez (rohini)
PIG-4635: NPE while running pig script in tez mode (daijy)
PIG-4683: Nested order is broken after PIG-3591 in some cases (daijy)
PIG-4679: Performance degradation due to InputSizeReducerEstimator since PIG-3754 (daijy)
PIG-4315: MergeJoin or Split followed by order by gives NPE in Tez (rohini)
PIG-4654: Reduce tez memory.reserve-fraction and clear spillables for better memory utilization (rohini)
PIG-4628: Pig 0.14 job with order by fails in mapreduce mode with Oozie (knoguchi)
PIG-4651: Optimize NullablePartitionWritable serialization for skewed join (rohini)
PIG-4627: [Pig on Tez] Self join does not handle null values correctly (rohini)
PIG-4644: PORelationToExprProject.clone() is broken (erwaman via rohini)
PIG-4650: ant mvn-deploy target is broken (daijy)
PIG-4649: [Pig on Tez] Union followed by HCatStorer misses some data (rohini)
PIG-4636: Occurred spelled incorrectly in error message for Launcher and POMergeCogroup (stevenmz via daijy)
PIG-4624: Error on ORC empty file without schema (daijy)
PIG-3622: Allow casting bytearray fields to bytearray type (redisliu via daijy)
PIG-4618: When using tez as the engine, setting pig.user.cache.enabled=true does not take effect (wisgood via rohini)
PIG-4533: Document error: Pig does support concatenated gz file (xhudik via daijy)
PIG-4578: ToDateISO should support optional ' ' space variant used by JDBC (michaelthoward via daijy)
Release 0.15.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4560: Pig 0.15.0 Documentation (daijy)
PIG-4429: Add Pig alias information and Pig script to the DAG view in Tez UI (daijy)
PIG-3994: Implement getting backend exception for Tez (rohini)
PIG-4563: Upgrade to released Tez 0.7.0 (daijy)
PIG-4525: Clarify "Scalar has more than one row in the output." (Niels Basjes via gates)
PIG-4511: Add columns to prune from PluckTuple (jbabcock via cheolsoo)
PIG-4434: Improve auto-parallelism for tez (daijy)
PIG-4495: Better multi-query planning in case of multiple edges (rohini)
PIG-3294: Allow Pig use Hive UDFs (daijy)
PIG-4476: Fix logging in AvroStorage* classes and SchemaTuple class (rdsr via rohini)
PIG-4458: Support UDFs in a FOREACH Before a Merge Join (wattsinabox via daijy)
PIG-4454: Upgrade tez to 0.6.0 (daijy)
PIG-4451: Log partition and predicate filter pushdown information and fix optimizer looping (rohini)
PIG-4430: Pig should support reading log4j.properties file from classpath as well (rdsr via daijy)
PIG-4407: Allow specifying a replication factor for jarcache (jira.shegalov via rohini)
PIG-4401: Add pattern matching to PluckTuple (cheolsoo)
PIG-2692: Make the Pig unit facilities more generalizable and update javadocs (razsapps via daijy)
PIG-4379: Make RoundRobinPartitioner public (daijy)
PIG-4378: Better way to fix tez local mode test hanging (daijy)
PIG-4358: Add test cases for utf8 chinese in Pig (nmaheshwari via daijy)
PIG-4370: HBaseStorage should support delete markers (bridiver via daijy)
PIG-4360: HBaseStorage should support setting the timestamp field (bridiver via daijy)
PIG-4337: Split Types and MultiQuery e2e tests into multiple groups (rohini)
PIG-4333: Split BigData tests into multiple groups (rohini)
BUG FIXES
PIG-4592: Pig 0.15 stopped working with Hadoop 1.x (daijy)
PIG-4580: Fix TestTezAutoParallelism.testSkewedJoinIncreaseParallelism test failure (daijy)
PIG-4571: TestPigRunner.testGetHadoopCounters fail on Windows (daijy)
PIG-4541: Skewed full outer join does not return records if any relation is empty. Outer join does not
return any record if left relation is empty (daijy)
PIG-4564: Pig can deadlock in POPartialAgg if there is a bag (rohini via daijy)
PIG-4569: Fix e2e test Rank_1 failure (rohini)
PIG-4490: MIN/MAX builtin UDFs return wrong results when accumulating for strings (xplenty via rohini)
PIG-4418: NullPointerException in JVMReuseImpl (rohini)
PIG-4562: Typo in DataType.toDateTime (daijy)
PIG-4559: Fix several new tez e2e test failures (daijy)
PIG-4506: binstorage fails to write biginteger (ssavvides via daijy)
PIG-4556: Local mode is broken in some case by PIG-4247 (daijy)
PIG-4523: Tez engine should use tez config rather than mr config whenever possible (daijy)
PIG-4452: Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error (daijy)
PIG-4543: TestEvalPipelineLocal.testRankWithEmptyReduce fail on Hadoop 1 (daijy)
PIG-4544: Upgrade Hbase to 0.98.12 (daijy)
PIG-4481: e2e tests ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce different result on Windows (daijy)
PIG-4496: Fix CBZip2InputStream to close underlying stream (petersla via daijy)
PIG-4528: Fix a typo in src/docs/src/documentation/content/xdocs/basic.xml (namusyaka via daijy)
PIG-4532: Pig Documentation contains typo for AvroStorage (fredericschmaljohann via daijy)
PIG-4377: Skewed outer join produce wrong result in some cases (daijy)
PIG-4538: Pig script fail with CNF in follow up MR job (daijy)
PIG-4537: Fix unit test failure introduced by TEZ-2392: TestCollectedGroup, TestLimitVariable, TestMapSideCogroup, etc (daijy)
PIG-4530: StackOverflow in TestMultiQueryLocal running under hadoop20 (nielsbasjes via rohini)
PIG-4529: Pig on tez hit counter limit imposed by MR (daijy)
PIG-4524: Pig Minicluster unit tests broken by TEZ-2333 (daijy)
PIG-4527: NON-ASCII Characters in Javadoc break 'ant docs' (nielsbasjes via daijy)
PIG-4494: Pig's htrace version conflicts with that of hadoop 2.6.0 (daijy)
PIG-4519: Correct link to Contribute page (gliptak via daijy)
PIG-4514: pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change (thejas)
PIG-4503: [Pig on Tez] NPE in UnionOptimizer with multiple levels of union (rohini)
PIG-4509: [Pig on Tez] Unassigned applications not killed on shutdown (rohini)
PIG-4508: [Pig on Tez] PigProcessor check for commit only on MROutput (rohini)
PIG-4505: [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx (rohini)
PIG-4502: E2E tests build fail with udfs compile (nmaheshwari via daijy)
PIG-4498: AvroStorage in Piggybank does not handle bad records and fails (viraj via rohini)
PIG-4499: mvn-build miss tez classes in pig-h2.jar (daijy)
PIG-4488: Pig on tez mask tez.queue.name (daijy)
PIG-4497: [Pig on Tez] NPE for null scalar (rohini)
PIG-4493: Pig on Tez gives wrong results if Union is followed by Split (rohini)
PIG-4491: Streaming Python Bytearray Bugs (jeremykarn via daijy)
PIG-4487: Pig on Tez gives wrong success message on failure in case of multiple outputs (rohini)
PIG-4483: Pig on Tez output statistics shows storing to same directory twice for union (rohini)
PIG-4480: Pig script failure on Tez with split and order by due to missing sample collection (rohini)
PIG-4484: Ant pull jetty-6.1.26.zip on some platform (daijy)
PIG-4479: Pig script with union within nested splits followed by join failed on Tez (rohini)
PIG-4457: Error is thrown by JobStats.getOutputSize() when storing to a MySql table (rohini)
PIG-4475: Keys in AvroMapWrapper are not proper Pig types (rdsr via daijy)
PIG-4478: TestCSVExcelStorage fails with jdk8 (rohini)
PIG-4474: Increasing intermediate parallelism has issue with default parallelism (rohini)
PIG-4465: Pig streaming ship fails for relative paths on Tez (rohini)
PIG-4461: Use benchmarks for Windows Pig e2e tests (nmaheshwari via daijy)
PIG-4463: AvroMapWrapper still leaks Avro data types and AvroStorageDataConversionUtilities do not handle
Pig maps (rdsr via daijy)
PIG-4460: TestBuiltIn testValueListOutputSchemaComplexType and testValueSetOutputSchemaComplexType tests
create bags whose inner schema is not a tuple (erwaman via daijy)
PIG-4448: AvroMapWrapper leaks Avro data types when the map values are complex avro records (rdsr via daijy)
PIG-4453: Remove test-tez-local target (daijy)
PIG-4443: Write inputsplits in Tez to disk if the size is huge and option to compress pig input splits (rohini)
PIG-4447: Pig Cannot handle nullable values (arrays and records) in avro records (rdsr via daijy)
PIG-4444: Fix unit test failure TestTezAutoParallelism (daijy)
PIG-4445: VALUELIST and VALUESET outputSchema does not match actual schema of data returned when map value schema
is complex (erwaman via daijy)
PIG-4442: Eliminate redundant RPC call to get file information in HPath (cnauroth via daijy)
PIG-4440: Some code samples in documentation use Unicode left/right single quotes, which cause a
parse failure (cnauroth via daijy)
PIG-4264: Port TestAvroStorage to tez local mode (daijy)
PIG-4437: Fix tez unit test failure TestJoinSmoke, TestSkewedJoin (daijy)
PIG-4432: Built-in VALUELIST and VALUESET UDFs do not preserve the schema when the map value type is
a complex type (erwaman via daijy)
PIG-4408: Merge join should support replicated join as a predecessor (bridiver via daijy)
PIG-4389: Flag to run selected test suites in e2e tests (daijy)
PIG-4385: testDefaultBootup fails because it cannot find "pig.properties" (mkudlej via daijy)
PIG-4397: CSVExcelStorage incorrect output if last field value is null (daijy)
PIG-4431: ReadToEndLoader does not close the record reader for the last input split (rdsr via daijy)
PIG-4426: RowNumber(simple) Rank not producing correct results (knoguchi)
PIG-4433: Loading bigdecimal in nested tuple does not work (kpriceyahoo via daijy)
PIG-4410: Fix testRankWithEmptyReduce in tez mode (daijy)
PIG-4392: RANK BY fails when default_parallel is greater than cardinality of field being ranked by (daijy)
PIG-4403: Combining -Dpig.additional.jars.uris with -useHCatalog breaks due to combination
with colon instead of comma (ovlaere via daijy)
PIG-4402: JavaScript UDF example in the doc is broken (cheolsoo)
PIG-4394: Fix Split_9 and Union_5 e2e failures (rohini)
PIG-4391: Fix TestPigStats test failure (rohini)
PIG-4387: Honor yarn settings in tez-site.xml and optimize dag status fetch (rohini)
PIG-4352: Port local mode tests to Tez - TestUnionOnSchema (daijy)
PIG-4359: Port local mode tests to Tez - part4 (daijy)
PIG-4340: PigStorage fails parsing empty map (daijy)
PIG-4366: Port local mode tests to Tez - part5 (daijy)
PIG-4381: PIG grunt shell DEFINE commands fails when it spans multiple lines (daijy)
PIG-4384: TezLauncher thread should be daemon thread (zjffdu via daijy)
PIG-4376: NullPointerException accessing a field of an invalid bag from a nested foreach
(kspringborn via daijy)
PIG-4355: Piggybank: XPath can't handle namespace in xpath, nor can it return more than one match
(cavanaug via daijy)
PIG-4371: Duplicate snappy.version in libraries.properties (daijy)
PIG-4368: Port local mode tests to Tez - TestLoadStoreFuncLifeCycle (daijy)
PIG-4367: Port local mode tests to Tez - TestMultiQueryBasic (daijy)
PIG-4339: e2e test framework assumes default exectype as mapred (rohini)
PIG-2949: JsonLoader only reads arrays of objects (eyal via daijy)
PIG-4213: CSVExcelStorage not quoting texts containing \r (CR) when storing (alfonso.nishikawa via daijy)
PIG-2647: Split Combining drops splits with empty getLocations() (tmwoodruff via daijy)
PIG-4294: Enable unit test "TestNestedForeach" for spark (kellyzly via rohini)
PIG-4282: Enable unit test "TestForEachNestedPlan" for spark (kellyzly via rohini)
PIG-4361: Fix perl script problem in TestStreaming.java (kellyzly via xuefu)
PIG-4354: Port local mode tests to Tez - part3 (daijy)
PIG-4338: Fix test failures with JDK8 (rohini)
PIG-4351: TestPigRunner.simpleTest2 fail on trunk (daijy)
PIG-4350: Port local mode tests to Tez - part2 (daijy)
PIG-4326: AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records (mprim via daijy)
PIG-4345: e2e test "RubyUDFs_13" fails because of the different result of "group a all" in different engines like "spark", "mapreduce" (kellyzly via rohini)
PIG-4332: Remove redundant jars packaged into pig-withouthadoop.jar for hadoop 2 (rohini)
PIG-4331: update README, '-x' option in usage to include tez (thejas via daijy)
PIG-4327: Schema of map with value that has an alias can't be parsed again (mprim via daijy)
PIG-4330: Regression test for PIG-3584 - AvroStorage does not correctly translate arrays of strings (brocknoland via daijy)
PIG-3615: Update the way that JsonLoader/JsonStorage deal with BigDecimal (tyro89 via daijy)
PIG-4329: Fetch optimization should be disabled when limit is not pushed up (lbendig via cheolsoo)
PIG-3413: JsonLoader fails the pig job in case of malformed json input (eyal via daijy)
PIG-4247: S3 properties are not picked up from core-site.xml in local mode (cheolsoo)
PIG-4242: For indented xmls with multiline content (e.g. wikipedia) XMLLoader cuts out the beginning of every line
(holdfenytolvaj via daijy)
Release 0.14.1 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
BUG FIXES
PIG-4409: fs.defaultFS is overwritten in JobConf by replicated join at runtime (cheolsoo)
PIG-4404: LOAD with HBaseStorage on secure cluster is broken in Tez (rohini)
PIG-4375: ObjectCache should use ProcessorContext.getObjectRegistry() (rohini)
PIG-4334: PigProcessor does not set pig.datetime.default.tz (rohini)
PIG-4342: Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT (daijy)
Release 0.14.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-4321: Documentation for 0.14 (daijy)
PIG-4328: Upgrade Hive to 0.14 (daijy)
PIG-4318: Make PigConfiguration naming consistent (rohini)
PIG-4316: Port TestHBaseStorage to tez local mode (rohini)
PIG-4224: Upload Tez payload history string to timeline server (daijy)
PIG-3977: Get TezStats working for Oozie (rohini)
PIG-3979: group all performance, garbage collection, and incremental aggregation (rohini)
PIG-4253: Add a UniqueID UDF (daijy)
PIG-4160: Provide a way to pass local jars in pig.additional.jars when using a remote
url for a script (acoliver via daijy)
PIG-4246: HBaseStorage should implement getShipFiles (rohini)
PIG-3456: Reduce threadlocal conf access in backend for each record (rohini)
PIG-3861: duplicate jars get added to distributed cache (chitnis via rohini)
PIG-4039: New interface for resetting static variables for jvm reuse (rohini)
PIG-3870: STRSPLITTOBAG UDF (cryptoe via daijy)
PIG-4080: Add Preprocessor commands and more to the black/whitelisting feature (prkommireddi via daijy)
PIG-4162: Intermediate reducer parallelism in Tez should be higher (rohini)
PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini)
PIG-3838: Organize tez code into subpackages (rohini)
PIG-4069: Limit reduce task should start as soon as one map task finishes (rohini)
PIG-4141: Ship UDF/LoadFunc/StoreFunc dependent jar automatically (daijy)
PIG-4146: Create a target to run mr and tez unit test in one shot (daijy)
PIG-4144: Make pigunit.PigTest work in tez mode (daijy)
PIG-4128: New logical optimizer rule: ConstantCalculator (daijy)
PIG-4124: Command for Python streaming udf should be configurable (cheolsoo)
PIG-4114: Add Native operator to tez (daijy)
PIG-4117: Implement merge cogroup in Tez (daijy)
PIG-4119: Add message at end of each testcase with timestamp in Pig system tests (nmaheshwari via daijy)
PIG-4008: Pig code change to enable Tez Local mode (airbots via daijy)
PIG-4091: Predicate pushdown for ORC (rohini via daijy)
PIG-4077: Some fixes and e2e test for OrcStorage (rohini)
PIG-4054: Do not create job.jar when submitting job (daijy)
PIG-4047: Break up pig withouthadoop and fat jar (daijy)
PIG-4062: Add ascending order option to builtin TOP function (raj171 via cheolsoo)
PIG-3558: ORC support for Pig (daijy)
PIG-2122: Parameter Substitution doesn't work in the Grunt shell (daijy)
PIG-4031: Provide Counter aggregation for Tez (daijy)
PIG-4028: add a flag to control the ivy resolve/retrieve output (gkesavan via daijy)
PIG-4015: Provide a way to disable auto-parallelism in tez (daijy)
PIG-3846: Implement automatic reducer parallelism (daijy)
PIG-3939: SPRINTF function to format strings using a printf-style template (mrflip via cheolsoo)
PIG-3970: Merge Tez branch into trunk (daijy)
OPTIMIZATIONS
PIG-4657: [Pig on Tez] Optimize GroupBy and Distinct key comparison (rohini)
BUG FIXES
PIG-4335: Pig release tarball miss tez classes (daijy)
PIG-4325: StackOverflow when spilling InternalCachedBag (daijy)
PIG-4324: Remove jsch-LICENSE.txt (daijy)
PIG-4267: ToDate has incorrect timezone offsets (bridiver via daijy)
PIG-4319: Make LoadPredicatePushdown InterfaceAudience.Private till PIG-4093 (rohini)
PIG-4312: TestStreamingUDF tez mode leave orphan process on Windows (daijy)
PIG-4314: BigData_5 hang on some machine (daijy)
PIG-4299: SpillableMemoryManager assumes tenured heap incorrectly (prkommireddi via daijy)
PIG-4298: Descending order-by is broken in some cases when key is bytearrays (cheolsoo)
PIG-4263: Move tez local mode unit tests to a separate target (daijy)
PIG-4257: Fix several e2e tests on secure cluster (daijy)
PIG-4261: Skip shipping local resources in tez local mode (daijy)
PIG-4182: e2e tests Scripting_[1-12] fail on Windows (daijy)
PIG-4259: Fix few issues related to Union, CROSS and auto parallelism in Tez (rohini)
PIG-4250: Fix Security Risks found by Coverity (daijy)
PIG-4258: Fix several e2e tests on Windows (daijy)
PIG-4256: Fix StreamingPythonUDFs e2e test failure on Windows (daijy)
PIG-4166: Collected group drops last record when combined with merge join (bridiver via daijy)
PIG-2495: Using merge JOIN from a HBaseStorage produces an error (bridiver via daijy)
PIG-4235: Fix unit test failures on Windows (daijy)
PIG-4245: 1-1 edge vertices should use same jvm opts (rohini)
PIG-4252: Tez container reuse fail when using script udf (daijy)
PIG-4241: Auto local mode mistakenly converts large jobs to local mode when using with Hive tables (cheolsoo)
PIG-4184: UDF backward compatibility issue after POStatus.STATUS_NULL refactoring (daijy)
PIG-4238: Property 'pig.job.converted.fetch' should be unset when fetch finishes (lbendig)
PIG-4151: Pig Cannot Write Empty Maps to HBase (daijy)
PIG-4181: Cannot launch tez e2e test on Windows (daijy)
PIG-2834: MultiStorage requires unused constructor argument (daijy)
PIG-4230: Documentation fix: first nested foreach example is incomplete (lbendig via daijy)
PIG-4199: Mapreduce ACLs should be translated to Tez ACLs (rohini)
PIG-4227: Streaming Python UDF handles bag outputs incorrectly (cheolsoo)
PIG-4219: When parsing a schema, pig drops tuple inside of Bag if it contains only one field (lbendig via daijy)
PIG-4226: Upgrade Tez to 0.5.1 (daijy)
PIG-4220: MapReduce-based Rank failing with NPE due to missing Counters (knoguchi)
PIG-3985: Multiquery execution of RANK with RANK BY causes NPE (rohini)
PIG-4218: Pig OrcStorage fail to load a map with null key (daijy)
PIG-4164: After Pig job finishes, Pig client spends too much time retrying to connect to AM (daijy)
PIG-4212: Allow LIMIT of 0 for variableLimit (constant 0 is already allowed) (knoguchi)
PIG-4196: Auto ship udf jar is broken (daijy)
PIG-4214: Fix unit test fail TestMRJobStats (daijy)
PIG-4217: Fix documentation in BuildBloom (praveenr019 via daijy)
PIG-4215: Fix unit test failure TestParamSubPreproc and TestMacroExpansion (daijy)
PIG-4175: PIG CROSS operation follow by STORE produces non-deterministic results each run (daijy)
PIG-4202: Reset UDFContext state before OutputCommitter invocations in Tez (rohini)
PIG-4205: e2e test property-check does not check all prerequisites (kellyzly via daijy)
PIG-4180: e2e test Native_3 fail on Hadoop 2 (daijy)
PIG-4178: HCatDDL_[1-3] fail on Windows (daijy)
PIG-4046: PiggyBank DBStorage DATETIME should use setTimestamp with java.sql.Timestamp (sinchii via daijy)
PIG-4050: HadoopShims.getTaskReports() can cause OOM with Hadoop 2 (rohini)
PIG-4176: Fix tez e2e test Bloom_[1-3] (daijy)
PIG-4195: Support loading char/varchar data in OrcStorage (daijy)
PIG-4201: Native e2e tests fail when run against old version of pig (rohini)
PIG-4197: Fix typo in Job Stats header: MinMapTIme => MinMapTime (jmartell7 via daijy)
PIG-4194: ReadToEndLoader does not call setConf on pigSplit in initializeReader (shadanan via rohini)
PIG-4187: Fix Orc e2e tests (daijy)
PIG-4177: BigData_1 fail after PIG-4149 (daijy)
PIG-3507: Pig fails to run in local mode on a Kerberos enabled Hadoop cluster (kellyzly via rohini)
PIG-4171: Streaming UDF fails when direct fetch optimization is enabled (cheolsoo)
PIG-4170: Multiquery with different type of key gives wrong result (daijy)
PIG-4104: Accumulator UDF throws OOM in Tez (rohini)
PIG-4169: NPE in ConstantCalculator (cheolsoo)
PIG-4161: check for latest Hive snapshot dependencies (daijy)
PIG-4102: Adding e2e tests and several improvements for Orc predicate pushdown (daijy)
PIG-4156: [PATCH] fix NPE when running scripts stored on hdfs:// (acoliver via daijy)
PIG-4159: TestGroupConstParallelTez and TestJobSubmissionTez should be excluded in Hadoop 20 unit tests (cheolsoo)
PIG-4154: ScriptState#setScript(File) does not close resources (lars_francke via daijy)
PIG-4155: Quitting grunt shell using CTRL-D character throws exception (abhishek.agarwal via daijy)
PIG-4157: Pig compilation failure due to HIVE-7208 (daijy)
PIG-4158: TestAssert is broken in trunk (cheolsoo)
PIG-4143: Port more mini cluster tests to Tez - part 7 (daijy)
PIG-4149: Rounding issue in FindQuantiles (daijy)
PIG-4145: Port local mode tests to Tez - part1 (daijy)
PIG-4076: Fix pom file (daijy)
PIG-4140: VertexManagerEvent.getUserPayload returns ReadOnlyBuffer after TEZ-1449 (daijy)
PIG-4136: No special handling jythonjar/jrubyjar in e2e tests after PIG-4047 (daijy)
PIG-4137: Fix hadoopversion 23 compilation due to TEZ-1469 (daijy)
PIG-4135: Fetch optimization should be disabled if plan contains no limit (cheolsoo)
PIG-4061: Make Streaming UDF work in Tez (hotfix PIG-4061-3.patch)
PIG-4134: TEZ-1449 broke the build (knoguchi)
PIG-4132: TEZ-1246 and TEZ-1390 broke a build (knoguchi)
PIG-4129: Pig -Dhadoopversion=23 compile fail after TEZ-1426 (daijy)
PIG-4127: Build failure due to TEZ-1132 and TEZ-1416 (lbendig)
PIG-4125: TEZ-1347 broke the build
PIG-4123: Increase memory for TezMiniCluster (daijy)
PIG-4122: Fix hadoopversion 23 compilation due to TEZ-1194 (daijy)
PIG-4061: Make Streaming UDF work in Tez (daijy)
PIG-4118: Fix hadoopversion 23 compilation due to TEZ-1237/TEZ-1407 (daijy)
PIG-4109: register local jar fail on Windows when Pig script is remote (daijy)
PIG-4116: Update Pig doc about Hadoop 2 Streaming Python UDF support (cheolsoo)
PIG-4112: NPE in packager when union + group-by followed by replicated join in Tez (rohini via cheolsoo)
PIG-4113: TEZ-1386 breaks hadoop 2 compilation in trunk (cheolsoo)
PIG-4110: TEZ-1382 breaks Hadoop 2 compilation (cheolsoo)
PIG-4105: Fix TestAvroStorage with ibm jdk (fang fang chen via daijy)
PIG-4108: Pig -Dhadoopversion=23 compile fail after TEZ-1317 (daijy)
PIG-4086: Fix Orc e2e tests for tez (daijy)
PIG-4101: Lower tez.am.task.max.failed.attempts to 2 from 4 in Tez mini cluster (cheolsoo)
PIG-4099: "ant copypom" failed with "could not find file $PIG_HOME/ivy/pig.pom to copy" (fang fang chen via cheolsoo)
PIG-4098: Vertex Location Hint api update after TEZ-1041 (jeagles via cheolsoo)
PIG-4088: TEZ-1346 breaks hadoop 2 compilation in trunk (cheolsoo)
PIG-4089: TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after PIG-4079 in Hadoop 1 (cheolsoo)
PIG-4085: TEZ-1303 broke hadoop 2 compilation in trunk (cheolsoo)
PIG-4082: TEZ-1278 broke hadoop 2 compilation in trunk (cheolsoo)
PIG-4079: Parallel clause is not honored in local mode (cheolsoo)
PIG-4078: Port more mini cluster tests to Tez - part 6 (rohini)
PIG-4071: Fix TestStore.testSetStoreSchema, TestParamSubPreproc.testGruntWithParamSub, TestJobSubmission.testReducerNumEstimation (daijy)
PIG-4074: mapreduce.client.submit.file.replication is not honored in cached files (cheolsoo)
PIG-4052: TestJobControlSleep, TestInvokerSpeed are unreliable (daijy)
PIG-4053: TestMRCompiler succeeded with sun jdk 1.6 while failed with sun jdk 1.7 (daijy)
PIG-3982: ant target test-tez should depend on jackson-pig-3039-test-download (daijy)
PIG-4064: Fix tez auto parallelism test failures (daijy)
PIG-4075: TEZ-1311 broke Hadoop2 compilation (cheolsoo)
PIG-4070: Change from TezJobConfig to TezRuntimeConfiguration (rohini)
PIG-4068: ObjectCache causes ClassCastException (cheolsoo)
PIG-4067: TestAllLoader in piggybank fails with new hive version (rohini)
PIG-4065: Fix failing unit tests in Tez (rohini)
PIG-4060: Refactor TezJob and TezLauncher (cheolsoo)
PIG-2689: JsonStorage fails to find schema when LimitAdjuster runs (rohini)
PIG-4056: Remove PhysicalOperator.setAlias (rohini)
PIG-4058: Use single config in Tez for input and output (rohini)
PIG-3886: UdfDistributedCache_1 fails in tez branch (cheolsoo)
PIG-4055: Build broke after TEZ-1130 API rename (knoguchi)
PIG-3935: Port more mini cluster tests to Tez - part 5 (rohini)
PIG-3984: PigServer.shutdown removes the tez resource folder (daijy via rohini)
PIG-4048: TEZ-692 has an incompatible API change removing TezSession (rohini)
PIG-4044: Pig should use avro-mapred-hadoop2.jar instead of avro-mapred.jar when compile with hadoop 2 (daijy)
PIG-4043: JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks (cheolsoo)
PIG-4036: Fix e2e failures - JobManagement_3, CmdErrors_3 and BigData_4 (daijy)
PIG-4041: org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper compiling error (jeagles via cheolsoo)
PIG-4038: SPRINTF should return NULL on any NULL input (mrflip via daijy)
PIG-4025: TestLoadFuncWrapper, TestLoadFuncMetaDataWrapper, TestStoreFuncWrapper and TestStoreFuncMetadataWrapper fail on IBM JDK (ahireanup via daijy)
PIG-4024: TestPigStreamingUDF and TestPigStreaming fail on IBM JDK (ahireanup via daijy)
PIG-4023: BigDec/Int sort is broken (ahireanup via daijy)
PIG-4003: Error is thrown by JobStats.getOutputSize() when storing to a Hive table (cheolsoo)
PIG-4035: Fix CollectedGroup e2e tests for tez (daijy)
PIG-4034: Exclude TestTezAutoParallelism when -Dhadoopversion=20 (cheolsoo)
PIG-4033: Fix MergeSparseJoin e2e tests on tez (daijy)
PIG-3478: Make StreamingUDF work for Hadoop 2 (lbendig via daijy)
PIG-4032: BloomFilter fails with s3 path in Hadoop 2.4 (cheolsoo)
PIG-4018: Schema validation fails with UNION ONSCHEMA (daijy)
PIG-4022: Fix tez e2e test SkewedJoin_6 (daijy)
PIG-4001: POPartialAgg aggregates too aggressively when multiple values aggregated (tmwoodruff via cheolsoo)
PIG-4027: Always check for latest Tez snapshot dependencies (lbendig via cheolsoo)
PIG-4020: Fix tez e2e tests MapPartialAgg_[2-4], StreamingPerformance_[6-7] (daijy)
PIG-4019: Compilation broken after TEZ-1169 (daijy)
PIG-4014: Fix Rank e2e test failures on tez (daijy)
PIG-4013: Order by multiple column fail on Tez (daijy)
PIG-3983: TestGrunt.testKeepGoigFailed fail on tez mode (daijy)
PIG-3959: Skewed join followed by replicated join fails in Tez (cheolsoo)
PIG-3995: Tez unit tests shouldn't run when -Dhadoopversion=20 (cheolsoo)
PIG-3986: PigSplit to support multiple split class (tongjie via cheolsoo)
PIG-3988: PigStorage: CommandLineParser is not thread safe (tmwoodruff via cheolsoo)
PIG-2409: Pig show wrong tracking URL for hadoop 2 (lbendig via rohini)
PIG-3978: Container reuse does not work across PigServer (daijy)
PIG-3974: E2E test data generation fails in cluster mode (lbendig via cheolsoo)
PIG-3969: Javascript UDF fails if no output schema is defined (lbendig via cheolsoo)
PIG-3971: Pig on tez fails to run in Oozie in secure cluster (rohini)
PIG-3968: OperatorPlan.serialVersionUID is not defined (daijy)
Release 0.13.1 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
OPTIMIZATIONS
BUG FIXES
PIG-4139: pig query throws error java.lang.NoSuchFieldException: jobsInProgress on MRv1 (satish via cheolsoo)
PIG-4133: Need to update the default $HCAT_HOME dir in the PIG script (mnarayan via cheolsoo)
PIG-4106: Describe shouldn't trigger execution in batch mode (cheolsoo)
Release 0.13.0
INCOMPATIBLE CHANGES
PIG-3996: Delete zebra from svn (cheolsoo)
PIG-3898: Refactor PPNL for non-MR execution engine (cheolsoo)
PIG-3485: Remove CastUtils.bytesToMap(byte[] b) method from LoadCaster interface (cheolsoo)
PIG-3419: Pluggable Execution Engine (achalsoni81 via cheolsoo)
PIG-2207: Support custom counters for aggregating warnings from different udfs (aniket486)
IMPROVEMENTS
PIG-3892: Pig distribution for hadoop 2 (daijy)
PIG-4006: Make the interval of DAGStatus report configurable (cheolsoo)
PIG-3999: Document PIG-3388 (lbendig via cheolsoo)
PIG-3954: Document use of user level jar cache (aniket486)
PIG-3752: Fix e2e Parallel test for Windows (daijy)
PIG-3966: Document variable input arguments of UDFs (lbendig via aniket486)
PIG-3963: Documentation for BagToString UDF (mrflip via daijy)
PIG-3929: pig.temp.dir should allow to substitute vars as hadoop configuration does (aniket486)
PIG-3913: Pig should use job's jobClient wherever possible (fixes local mode counters) (aniket486)
PIG-3941: Piggybank's Over UDF returns an output schema with named fields (mrflip via cheolsoo)
PIG-3545: Separate validation rules from optimizer (daijy)
PIG-3745: Document auto local mode for pig (aniket486)
PIG-3932: Document ROUND_TO builtin UDF (mrflip via cheolsoo)
PIG-3926: ROUND_TO function: rounds double/float to fixed number of decimal places (mrflip via cheolsoo)
PIG-3901: Organize the Pig properties file and document all properties (mrflip via cheolsoo)
PIG-3867: Added hadoop home to build classpath for build pig with unit test on windows (Sergey Svinarchuk via gates)
PIG-3914: Change TaskContext to abstract class (cheolsoo)
PIG-3672: Pig should not check for hardcoded file system implementations (rohini)
PIG-3860: Refactor PigStatusReporter and PigLogger for non-MR execution engine (cheolsoo)
PIG-3865: Remodel the XMLLoader to work to be faster and more maintainable (aseldawy via daijy)
PIG-3737: Bundle dependent jars in distribution in %PIG_HOME%/lib folder (daijy)
PIG-3771: Piggybank Avrostorage makes a lot of namenode calls in the backend (rohini)
PIG-3851: Upgrade jline to 2.11 (daijy)
PIG-3884: Move multi store counters to PigStatsUtil from MRPigStatsUtil (rohini)
PIG-3591: Refactor POPackage to separate MR specific code from packaging (mwagner via cheolsoo)
PIG-3449: Move JobCreationException to org.apache.pig.backend.hadoop.executionengine (cheolsoo)
PIG-3765: Ability to disable Pig commands and operators (prkommireddi)
PIG-3731: Ability to specify local-mode specific configuration (useful for local/auto-local mode) (aniket486)
PIG-3793: Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData (prkommireddi)
PIG-3778: Log list of running jobs along with progress (rohini)
PIG-3675: Documentation for AccumuloStorage (elserj via daijy)
PIG-3648: Make the sample size for RandomSampleLoader configurable (cheolsoo)
PIG-259: allow store to overwrite existing directory (nezihyigitbasi via daijy)
PIG-2672: Optimize the use of DistributedCache (aniket486)
PIG-3238: Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point (nezihyigitbasi via daijy)
PIG-3299: Provide support for LazyOutputFormat to avoid creating empty files (lbendig via daijy)
PIG-3642: Direct HDFS access for small jobs (fetch) (lbendig via cheolsoo)
PIG-3730: Performance issue in SelfSpillBag (rajesh.balamohan via rohini)
PIG-3654: Add class cache to PigContext (tmwoodruff via daijy)
PIG-3463: Pig should use hadoop local mode for small jobs (aniket486)
PIG-3573: Provide StoreFunc and LoadFunc for Accumulo (elserj via daijy)
PIG-3653: Add support for pre-deployed jars (tmwoodruff via daijy)
PIG-3645: Move FileLocalizer.setR() calls to unit tests (cheolsoo)
PIG-3637: PigCombiner creating log spam (rohini)
PIG-3632: Add option to configure cacheBlocks in HBaseStorage (rohini)
PIG-3619: Provide XPath function (Saad Patel via gates)
PIG-3590: remove PartitionFilterOptimizer from trunk (aniket486)
PIG-3580: MIN, MAX and AVG functions for BigDecimal and BigInteger (harichinnan via cheolsoo)
PIG-3569: SUM function for BigDecimal and BigInteger (harichinnan via rohini)
PIG-3505: Make AvroStorage sync interval take default from io.file.buffer.size (rohini)
PIG-3563: support adding archives to the distributed cache (jdonofrio via cheolsoo)
PIG-3388: No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage (lbendig via cheolsoo)
PIG-3522: Remove shock from pig (daijy)
PIG-3295: Casting from bytearray failing after Union even when each field is from a single Loader (knoguchi)
PIG-3444: CONCAT with 2+ input parameters fail (lbendig via daijy)
PIG-3117: A debug mode in which pig does not delete temporary files (ihadanny via cheolsoo)
PIG-3484: Make the size of pig.script property configurable (cheolsoo)
OPTIMIZATIONS
PIG-3882: Multiquery off mode execution is not done in batch and very inefficient (rohini)
BUG FIXES
PIG-4037: TestHBaseStorage, TestAccumuloPigCluster has failures with hadoopversion=23 (daijy)
PIG-4005: depend on hbase-hadoop2-compat rather than hbase-hadoop1-compat when hbaseversion is 95 (daijy)
PIG-4021: Fix TestHBaseStorage failure after auto local mode change (PIG-3463) (daijy)
PIG-4029: TestMRCompiler is broken after PIG-3874 (daijy)
PIG-4030: TestGrunt, TestPigRunner fail after PIG-3892 (daijy)
PIG-3975: Multiple Scalar reference calls leading to missing records (knoguchi via rohini)
PIG-4017: NPE thrown from JobControlCompiler.shipToHdfs (cheolsoo)
PIG-3997: Issue on Pig docs: Testing and Diagnostics (zjffdu via cheolsoo)
PIG-3998: Documentation fix: invalid page links, wrong Groovy udf example (lbendig via cheolsoo)
PIG-4000: Minor documentation fix for PIG-3642 (lbendig via cheolsoo)
PIG-3991: TestErrorHandling.tesNegative7 is broken in trunk/branch-0.13 (cheolsoo)
PIG-3990: ant docs is broken in trunk/branch-0.13 (cheolsoo)
PIG-3989: PIG_OPTS does not work with some version of HADOOP (daijy)
PIG-3739: The Warning_4 e2e test is broken in trunk (aniket486)
PIG-3976: Typo correction in JobStats breaks Oozie (rohini)
PIG-3874: FileLocalizer temp path can sometimes be non-unique (chitnis via cheolsoo)
PIG-3967: Grunt fail if we running more statement after first store (daijy)
PIG-3915: MapReduce queries in Pigmix outputs different results than Pig's (keren3000 via daijy)
PIG-3955: Remove url.openStream() file descriptor leak from JCC (aniket486)
PIG-3958: TestMRJobStats is broken in 0.13 and trunk (aniket486)
PIG-3949: HiveColumnarStorage compile failure with Hive 0.14.0 (daijy)
PIG-3960: Compile fail against Hadoop 2.4.0 after PIG-3913 (daijy)
PIG-3956: UDF profile is often misleading (cheolsoo)
PIG-3950: Removing empty file PColFilterExtractor.java speeds up rebuilds (mrflip via cheolsoo)
PIG-3940: NullPointerException writing .pig_header for field with null name in JsonMetadata.java (mrflip via cheolsoo)
PIG-3944: PigNullableWritable toString method throws NPE on null value (mauzhang via cheolsoo)
PIG-3936: DBStorage fails on storing nulls for non varchar columns (jeremykarn via cheolsoo)
PIG-3945: Ant not sending hadoopversion to piggybank sub-ant (mrflip via cheolsoo)
PIG-3942: Util.buildPp() is incompatible with Non-MR execution engine (cheolsoo)
PIG-3902: PigServer creates cycle (thedatachef via cheolsoo)
PIG-3930: "java.io.IOException: Cannot initialize Cluster" in local mode with hadoopversion=23 dependencies (jira.shegalov via cheolsoo)
PIG-3921: Obsolete entries in piggybank javadoc build script (mrflip via cheolsoo)
PIG-3923: Gitignore file should ignore all generated artifacts (mrflip via cheolsoo)
PIG-3922: Increase Forrest heap size to avoid OutOfMemoryError building docs (mrflip via cheolsoo)
PIG-3916: isEmpty should not be early terminating (rohini)
PIG-3859: auto local mode should not modify reducer configuration (aniket486)
PIG-3909: Type Casting issue (daijy)
PIG-3905: 0.12.1 release can't be build for Hadoop2 (daijy)
PIG-3894: Datetime function AddDuration, SubtractDuration and all Between functions don't check for null values in the input tuple (jennythompson via cheolsoo)
PIG-3889: Direct fetch doesn't set job submission timestamps (cheolsoo)
PIG-3895: Pigmix run script has compilation error (rohini)
PIG-3885: AccumuloStorage incompatible with Accumulo 1.6.0 (elserj via daijy)
PIG-3888: Direct fetch doesn't differentiate between frontend and backend sides (lbendig via daijy)
PIG-3887: TestMRJobStats is broken in trunk (cheolsoo)
PIG-3868: Fix Iterator_1 e2e test on windows (ssvinarchukhorton via rohini)
PIG-3871: Replace org.python.google.* with com.google.* in imports (cheolsoo)
PIG-3858: PigLogger/PigStatusReporter is not set for fetch tasks (lbendig via cheolsoo)
PIG-3798: Registered jar in pig script are appended to the classpath multiple times (cheolsoo)
PIG-3844: Make ScriptState InheritableThreadLocal for threads that need it (amatsukawa via cheolsoo)
PIG-3837: ant pigperf target is broken in trunk (cheolsoo)
PIG-3836: Pig signature has guava version dependency (amatsukawa via cheolsoo)
PIG-3832: Fix piggybank test compilation failure after PIG-3449 (rohini)
PIG-3807: Pig creates wrong schema after dereferencing nested tuple fields with sorts (daijy)
PIG-3802: Fix TestBlackAndWhitelistValidator failures (prkommireddi)
PIG-3815: Hadoop bug causes to pig to fail silently with jar cache (aniket486)
PIG-3816: Incorrect Javadoc for launchPlan() method (kyungho via prkommireddi)
PIG-3673: Divide by zero error in runpigmix.pl script (suhassatish via daijy)
PIG-3805: ToString(datetime [, format string]) doesn't work without the second argument (jennythompson via daijy)
PIG-3809: AddForEach optimization doesn't set the alias of the added foreach (cheolsoo)
PIG-3811: PigServer.registerScript() wraps exception incorrectly on parsing errors (prkommireddi)
PIG-3806: PigServer constructor throws NPE after PIG-3765 (aniket486)
PIG-3801: Auto local mode does not call storeSchema (aniket486)
PIG-3754: InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size (aniket486)
PIG-3679: e2e StreamingPythonUDFs_10 fails in trunk (cheolsoo)
PIG-3776: Conflicting versions of jline is present in trunk (cheolsoo)
PIG-3674: Fix TestAccumuloPigCluster on Hadoop 2 (elserj via daijy)
PIG-3740: Document direct fetch optimization (lbendig via cheolsoo)
PIG-3746: NPE is thrown if Pig fails before PigStats is initialized (cheolsoo)
PIG-3747: Update skewed join documentation (cheolsoo)
PIG-3755: auto local mode selection does not check lower bound for size (aniket486)
PIG-3447: Compiler warning message dropped for CastLineageSetter and others with no enum kind (knoguchi via cheolsoo)
PIG-3627: Json storage: Doesn't work in cases where other Store Functions (like PigStorage / AvroStorage) do work (ssvinarchukhorton via daijy)
PIG-3606: Pig script throws error when searching for hcatalog jars in latest hive (deepesh via daijy)
PIG-3623: HBaseStorage: setting loadKey and noWAL to false doesn't have any effect (nezihyigitbasi via rohini)
PIG-3744: SequenceFileLoader does not support BytesWritable (rohini)
PIG-3726: Ranking empty records leads to NullPointerException (jarcec via daijy)
PIG-3652: Pigmix parser (PigPerformanceLoader) deletes chars during parsing (keren3000 via daijy)
PIG-3722: Udf deserialization for registered classes fails in local_mode (aniket486)
PIG-3641: Split "otherwise" producing incorrect output when combined with ColumnPruning (knoguchi)
PIG-3682: mvn-inst target does not install pig-h2.jar into local .m2 (raluri via aniket486)
PIG-3511: Security: Pig temporary directories might have world readable permissions (rohini)
PIG-3664: Piggy Bank XPath UDF can't be called (nezihyigitbasi via daijy)
PIG-3662: Static loadcaster in BinStorage can cause exception (lbendig via rohini)
PIG-3617: problem with temp file deletion in MAPREDUCE operator (nezihyigitbasi via cheolsoo)
PIG-3649: POPartialAgg incorrectly calculates size reduction when multiple values aggregated (tmwoodruff via daijy)
PIG-3650: Fix for PIG-3100 breaks column pruning (tmwoodruff via daijy)
PIG-3643: Nested Foreach with UDF and bincond is broken (cheolsoo)
PIG-3616: TestBuiltIn.testURIwithCurlyBrace() silently fails (lbendig via cheolsoo)
PIG-3608: ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key (rding)
PIG-3639: TestRegisteredJarVisibility is broken in trunk (cheolsoo)
PIG-3640: Retain intermediate files for debugging purpose in batch mode (cheolsoo)
PIG-3609: ClassCastException when calling compareTo method on AvroBagWrapper (rding via cheolsoo)
PIG-3584: AvroStorage does not correctly translate arrays of strings (jadler via cheolsoo)
PIG-3633: AvroStorage tests are failing when running against Avro 1.7.5 (jarcec via cheolsoo)
PIG-3612: Storing schema does not work cross cluster with PigStorage and JsonStorage (rohini)
PIG-3607: PigRecordReader should report progress for each inputsplit processed (rohini)
PIG-3566: Cannot set useMatches of REGEX_EXTRACT_ALL and REGEX_EXTRACT (nezihyigitbasi via cheolsoo)
PIG-2132: [Piggybank] MIN and MAX functions should ignore nulls (rekhajoshm via cheolsoo)
PIG-3581: Incorrect scope resolution with nested foreach (aniket486)
PIG-3285: Jobs using HBaseStorage fail to ship dependency jars (ndimiduk via cheolsoo)
PIG-3582: Document SUM, MIN, MAX, and AVG functions for BigInteger and BigDecimal (harichinnan via cheolsoo)
PIG-3525: PigStats.get() and ScriptState.get() shouldn't return MR-specific objects (cheolsoo)
PIG-3568: Define the semantics of POStatus.STATUS_NULL (mwagner via cheolsoo)
PIG-3561: Clean up PigStats and JobStats after PIG-3419 (cheolsoo)
PIG-3553: HadoopJobHistoryLoader fails to load job history on hadoop v 1.2 (lgiri via cheolsoo)
PIG-3559: Trunk is broken by PIG-3522 (cheolsoo)
PIG-3551: Minor typo on pig latin basics page (elserj via aniket486)
PIG-3526: Unions with Enums do not work with AvroStorage (jadler via cheolsoo)
PIG-3377: New AvroStorage throws NPE when storing untyped map/array/bag (jadler via cheolsoo)
PIG-3542: Javadoc of REGEX_EXTRACT_ALL (nyigitba via daijy)
PIG-3518: Need to ship jruby.jar in the release (daijy)
PIG-3524: Clean up Launcher and MapReduceLauncher after PIG-3419 (cheolsoo)
PIG-3515: Shell commands are limited from OS buffer (andronat via cheolsoo)
PIG-3520: Provide backward compatibility for PigRunner and PPNL after PIG-3419 (daijy via cheolsoo)
PIG-3519: Remove dependency on uber avro-tools jar (jarcec via cheolsoo)
PIG-3451: EvalFunc<T> ctor reflection to determine value of type param T is brittle (hazen via aniket486)
PIG-3509: Exception swallowing in TOP (vrajaram via aniket486)
PIG-3506: FLOOR documentation references CEIL function instead of FLOOR (seshness via daijy)
PIG-3497: JobControlCompiler should only do reducer estimation when the job has a reduce phase (amatsukawa via aniket486)
PIG-3469: Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing (Jarek Jarcec Cecho via xuefuz)
PIG-3496: Propagate HBase 0.95 jars to the backend (Jarek Jarcec Cecho via xuefuz)
Release 0.12.1 (unreleased changes)
IMPROVEMENTS
PIG-3529: Upgrade HBase dependency from 0.95-SNAPSHOT to 0.96 (jarcec via daijy)
PIG-3552: UriUtil used by reducer estimator should support viewfs (amatsukawa via aniket486)
PIG-3549: Print hadoop jobids for failed, killed job (aniket486)
PIG-3047: Check the size of a relation before adding it to distributed cache in Replicated join (aniket486)
PIG-3480: TFile-based tmpfile compression crashes in some cases (dvryaboy via aniket486)
BUG FIXES
PIG-3661: Piggybank AvroStorage fails if used in more than one load or store statement (rohini)
PIG-3819: e2e tests containing "perl -e "print $_;" fails on Hadoop 2 (daijy)
PIG-3813: Rank column is assigned different uids everytime when schema is reset (cheolsoo)
PIG-3833: Relation loaded by AvroStorage with schema is projected incorrectly in foreach statement (jeongjinku via cheolsoo)
PIG-3794: pig -useHCatalog fails using pig command line interface on HDInsight (ehans via daijy)
PIG-3827: Custom partitioner is not picked up with secondary sort optimization (daijy)
PIG-3826: Outer join with PushDownForEachFlatten generates wrong result (daijy)
PIG-3820: TestAvroStorage fail on some OS (daijy)
PIG-3818: PIG-2499 is accidentally reverted (daijy)
PIG-3516: pig does not bring in joda-time as dependency in its pig-template.xml (daijy)
PIG-3753: LOGenerate generates null schema (daijy)
PIG-3782: PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment (knoguchi via daijy)
PIG-3779: Assert constructs ConstantExpression with null when no comment is given (thedatachef via cheolsoo)
PIG-3777: Pig 12.0 Documentation (karinahauser via daijy)
PIG-3774: Piggybank Over UDF get wrong result (daijy)
PIG-3657: New partition filter extractor fails with NPE (cheolsoo)
PIG-3347: Store invocation brings side effect (daijy)
PIG-3670: Fix assert in Pig script (daijy)
PIG-3741: Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage (aniket486)
PIG-3677: ConfigurationUtil.getLocalFSProperties can return an inconsistent property set (rohini)
PIG-3621: Python Avro library can't read Avros made with builtin AvroStorage (rusell.jurney via cheolsoo)
PIG-3592: Should not try to create success file for non-fs schemes like hbase (rohini)
PIG-3572: Fix all unit test for during build pig with Hadoop 2.X on Windows (ssvinarchukhorton via daijy)
PIG-2629: Wrong Usage of Scalar which is null causes high namenode operation (rohini)
PIG-3593: Import jython standard module fail on cluster (daijy)
PIG-3576: NPE due to PIG-3549 when job never gets submitted (lbendig via cheolsoo)
PIG-3567: LogicalPlanPrinter throws OOM for large scripts (aniket486)
PIG-3579: pig.script's deserialized version does not maintain line numbers (jgzhang via aniket486)
PIG-3570: Rollback PIG-3060 (daijy)
PIG-3530: Some e2e tests is broken due to PIG-3480 (daijy)
PIG-3492: ColumnPrune dropping used column due to LogicalRelationalOperator.fixDuplicateUids changes not propagating (knoguchi via daijy)
PIG-3325: Adding a tuple to a bag is slow (dvryaboy via aniket486)
PIG-3512: Reducer estimator is broken by PIG-3497
PIG-3510: New filter extractor fails with more than one filter statement (aniket486 via cheolsoo)
Release 0.12.0
INCOMPATIBLE CHANGES
PIG-3082: outputSchema of a UDF allows two usages when describing a Tuple schema (jcoveney)
PIG-3191: [piggybank] MultiStorage output filenames are not sortable (Danny Antonelli via jcoveney)
PIG-3174: Remove rpm and deb artifacts from build.xml (gates)
IMPROVEMENTS
PIG-3503: More document for Pig 0.12 new features (daijy)
PIG-3445: Make Parquet format available out of the box in Pig (lbendig via aniket486)
PIG-3483: Document ASSERT keyword (aniket486 via daijy)
PIG-3470: Print configuration variables in grunt (lbendig via daijy)
PIG-3493: Add max/min for datetime (tyro89 via daijy)
PIG-3479: Fix BigInt, BigDec, Date serialization. Improve perf of PigNullableWritable deserialization (dvryaboy)
PIG-3461: Rewrite PartitionFilterOptimizer to make it work for all the cases (aniket486)
PIG-2417: Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation. (jeremykarn via daijy)
PIG-3199: Provide a method to retrieve name of loader/storer in PigServer (prkommireddi via daijy)
PIG-3367: Add assert keyword (operator) in pig (aniket486)
PIG-3235: Avoid extra byte array copies in streaming (rohini)
PIG-3065: pig output format/committer should support recovery for hadoop 0.23 (daijy)
PIG-3390: Make pig working with HBase 0.95 (jarcec via daijy)
PIG-3431: Return more information for parsing related exceptions. (jeremykarn via daijy)
PIG-3430: Add xml format for explaining MapReduce Plan. (jeremykarn via daijy)
PIG-3048: Add mapreduce workflow information to job configuration (billie.rinaldi via daijy)
PIG-3436: Make pigmix run with Hadoop2 (rohini)
PIG-3424: Package import list should consider class name as is first even if -Dudf.import.list is passed (rohini)
PIG-3204: Change script parsing to parse entire script instead of line by line (rohini)
PIG-3359: Register Statements and Param Substitution in Macros (jpacker via cheolsoo)
PIG-3182: Pig currently lacks functions to trim the whitespace only on one hand side (sarutak via cheolsoo)
PIG-3163: Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix (sriramkrishnan via cheolsoo)
PIG-3015: Rewrite of AvroStorage (jadler via cheolsoo)
PIG-3361: Improve Hadoop version detection logic for Pig unit test (daijy)
PIG-3280: Document IN operator and CASE expression (cheolsoo)
PIG-3342: Allow conditions in case statement (cheolsoo)
PIG-3327: Pig hits OOM when fetching task reports (rohini)
PIG-3336: Change IN operator to use or-expressions instead of EvalFunc (cheolsoo)
PIG-3339: Move pattern compilation in ToDate as a static variable (rohini)
PIG-3332: Upgrade Avro dependency to 1.7.4 (nielsbasjes via cheolsoo)
PIG-3307: Refactor physical operators to remove methods parameters that are always null (julien)
PIG-3317: disable optimizations via pig properties (traviscrawford via billgraham)
PIG-3321: AVRO: Support user specified schema on load (harveyc via rohini)
PIG-2959: Add a pig.cmd for Pig to run under Windows (daijy)
PIG-3311: add pig-withouthadoop-h2 to mvn-jar (julien)
PIG-2873: Converting bin/pig shell script to python (vikram.dixit via daijy)
PIG-3308: Storing data in hive columnar rc format (maczech via daijy)
PIG-3303: add hadoop h2 artifact to publications in ivy.xml (julien)
PIG-3169: Remove intermediate data after a job finishes (mwagner via cheolsoo)
PIG-3173: Partition filter push down does not happen when partition keys condition include a AND and OR construct (rohini)
PIG-2786: enhance Pig launcher script wrt. HBase/HCat integration (ndimiduk via daijy)
PIG-3198: Let users use any function from PigType -> PigType as if it were builtin (jcoveney)
PIG-3268: Case statement support (cheolsoo)
PIG-3269: In operator support (cheolsoo)
PIG-200: Pig Performance Benchmarks (daijy)
PIG-3261: User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended (qwertymaniac via daijy)
PIG-3141: Giving CSVExcelStorage an option to handle header rows (jpacker via cheolsoo)
PIG-3217: Add support for DateTime type in Groovy UDFs (herberts via daijy)
PIG-3218: Add support for biginteger/bigdecimal type in Groovy UDFs (herberts via daijy)
PIG-3248: Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha (daijy)
PIG-3235: Add log4j.properties for unit tests (cheolsoo)
PIG-3236: parametrize snapshot and staging repo id (gkesavan via daijy)
PIG-3244: Make PIG_HOME configurable (robert.schooley@gmail.com via daijy)
PIG-3233: Deploy a Piggybank Jar (njw45 via cheolsoo)
PIG-3245: Documentation about HBaseStorage (Daisuke Kobayashi via cheolsoo)
PIG-3211: Allow default Load/Store funcs to be configurable (prkommireddi via cheolsoo)
PIG-3136: Introduce a syntax making declared aliases optional (jcoveney via cheolsoo)
PIG-3142: [piggybank] Fixed-width load and store functions for the Piggybank (jpacker via cheolsoo)
PIG-3162: PigTest.assertOutput doesn't allow non-default delimiter (dreambird via cheolsoo)
PIG-3002: Pig client should handle CountersExceededException (jarcec via billgraham)
PIG-3189: Remove ivy/pig.pom and improve build mvn targets (billgraham)
PIG-3192: Better call to action to download Pig in docs (rjurney via jcoveney)
PIG-3167: Job stats are printed incorrectly for map-only jobs (Mark Wagner via jcoveney)
PIG-3131: Document PluckTuple UDF (rjurney via jcoveney)
PIG-3098: Add another test for the self join case (jcoveney)
PIG-3129: Document syntax to refer to previous relation (rjurney via jcoveney)
PIG-2553: Pig shouldn't allow attempts to write multiple relations into same directory (prkommireddi via cheolsoo)
PIG-3179: Task Information Header only prints out the first split for each task (knoguchi via rohini)
PIG-3108: HBaseStorage returns empty maps when mixing wildcard with other columns (christoph.bauer via billgraham)
PIG-3178: Print a stacktrace when ExecutableManager hits an OOM (knoguchi via rohini)
PIG-3160: GFCross uses unnecessary loop (sandyr via cheolsoo)
PIG-3138: Decouple PigServer.executeBatch() from compilation of batch (pkommireddi via cheolsoo)
PIG-2878: Pig current releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This check is case insensitive. (shami via gates)
PIG-2994: Grunt shortcuts (prasanth_j via cheolsoo)
PIG-3140: Document PigProgressNotificationListener configs (billgraham)
PIG-3139: Document reducer estimation (billgraham)
PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney)
PIG-3073: POUserFunc creating log spam for large scripts (jcoveney)
PIG-3124: Push FLATTENs After FILTERs If Possible (nwhite via daijy)
PIG-3086: Allow A Prefix To Be Added To URIs In PigUnit Tests (nwhite via gates)
PIG-3091: Make schema, header and stats file configurable in JsonMetadata (pkommireddi via jcoveney)
PIG-3078: Make a UDF that, given a string, returns just the columns prefixed by that string (jcoveney)
PIG-3090: Introduce a syntax to be able to easily refer to the previously defined relation (jcoveney)
PIG-3057: Make PigStorage.readField() protected (pablomar and billgraham via billgraham)
PIG-2788: improved string interpolation of variables (jcoveney)
PIG-2362: Rework Ant build.xml to use macrodef instead of antcall (azaroth via cheolsoo)
PIG-2857: Add a -tagPath option to PigStorage (prkommireddi via cheolsoo)
PIG-2341: Need better documentation on Pig/HBase integration (jthakrar and billgraham via billgraham)
PIG-3075: Allow AvroStorage STORE Operations To Use Schema Specified By URI (nwhite via cheolsoo)
PIG-3062: Change HBaseStorage to permit overriding pushProjection (billgraham)
PIG-3016: Modernize more tests (jcoveney via cheolsoo)
PIG-2582: Store size in bytes (not mbytes) in ResourceStatistics (prkommireddi via billgraham)
PIG-3006: Modernize a chunk of the tests (jcoveney via cheolsoo)
PIG-2997: Provide a convenience constructor on PigServer that accepts Configuration (prkommireddi via rohini)
PIG-2933: HBaseStorage is using setScannerCaching which is deprecated (prkommireddi via rohini)
PIG-2881: Add SUBTRACT eval function (jocosti via cheolsoo)
PIG-3004: Improve exceptions messages when a RuntimeException is raised in Physical Operators (julien)
PIG-2990: the -secretDebugCmd shouldn't be a secret and should just be...a command (jcoveney)
PIG-2941: Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices (jgordon via azaroth)
PIG-2778: Add 'matches' operator to predicate pushdown (cheolsoo via jcoveney)
PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms)
PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2910: Add function to read schema from output of Schema.toString() (initialcontext via thejas)
OPTIMIZATIONS
PIG-3395: Large filter expression makes Pig hang (cheolsoo)
PIG-3123: Simplify Logical Plans By Removing Unnecessary Identity Projections (njw45 via cheolsoo)
PIG-3013: BinInterSedes improve chararray sort performance (rohini)
BUG FIXES
PIG-3504: Fix e2e Describe_cmdline_12 (cheolsoo via daijy)
PIG-3128: Document the BigInteger and BigDecimal data type (daijy via cheolsoo)
PIG-3495: Streaming udf e2e tests failures on Windows (daijy)
PIG-3292: Logical plan invalid state: duplicate uid in schema during self-join to get cross product (cheolsoo via daijy)
PIG-3491: Fix e2e failure Jython_Diagnostics_4 (daijy)
PIG-3114: Duplicated macro name error when using pigunit (daijy)
PIG-3370: Add New Reserved Keywords To The Pig Docs (cheolsoo)
PIG-3487: Fix syntax errors in nightly.conf (arpitgupta via daijy)
PIG-3458: ScalarExpression lost with multiquery optimization (knoguchi)
PIG-3360: Some intermittent negative e2e tests fail on hadoop 2 (daijy)
PIG-3468: PIG-3123 breaks e2e test Jython_Diagnostics_2 (daijy)
PIG-3471: Add a base abstract class for ExecutionEngine (cheolsoo)
PIG-3466: Race Conditions in InternalDistinctBag during proactive spill (cheolsoo)
PIG-3464: Mark ExecType and ExecutionEngine interfaces as evolving (cheolsoo)
PIG-3454: Update JsonLoader/JsonStorage (tyro89 via daijy)
PIG-3333: Fix remaining Windows core unit test failures (daijy)
PIG-3426: Add support for removing s3 files (jeremykarn via daijy)
PIG-3349: Document ToString(Datetime, String) UDF (cheolsoo)
PIG-3374: CASE and IN fail when expression includes dereferencing operator (cheolsoo)
PIG-2606: union/join operations are not accepting same alias as multiple inputs (hsubramaniyan via daijy)
PIG-3379: Alias reuse in nested foreach causes PIG script to fail (xuefuz via daijy)
PIG-3432: typo in log message in SchemaTupleFrontend (epishkin via cheolsoo)
PIG-3410: LimitOptimizer is applied before PartitionFilterOptimizer (aniket486)
PIG-3405: Top UDF documentation indicates improper use (aniket486 via cheolsoo)
PIG-3425: Hive jdo api jar referenced in pig script throws error (deepesh via cheolsoo)
PIG-3422: AvroStorage failed to read paths separated by commas (yuanlid via rohini)
PIG-3420: Failed to retrieve map values from data loaded by AvroStorage (yuanlid via rohini)
PIG-3414: QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition (cheolsoo)
PIG-3412: jsonstorage breaks when tuple does not have as many columns as schema (aesilberstein via cheolsoo)
PIG-3243: Documentation error (sarutak via cheolsoo)
PIG-3210: Pig fails to start when it cannot write log to log files (mengsungwu via cheolsoo)
PIG-3392: Document STARTSWITH and ENDSWITH UDFs (sriramkrishnan via cheolsoo)
PIG-3393: STARTSWITH udf doesn't override outputSchema method (sriramkrishnan via cheolsoo)
PIG-3389: "Set job.name" does not work with dump command (cheolsoo)
PIG-3387: Misspelling in test code "TestBuiltin.java" (sarutak via cheolsoo)
PIG-3384: Missing negation in UDF doc sample code (ddamours via cheolsoo)
PIG-3369: unit test TestImplicitSplitOnTuple.testImplicitSplitterOnTuple failed when using hadoopversion=23 (dreambird via cheolsoo)
PIG-3375: CASE does not preserve the order of when branches (cheolsoo)
PIG-3364: Case expression fails with an even number of when branches (cheolsoo)
PIG-3354: UDF example does not handle nulls (patc888 via daijy)
PIG-3355: ColumnMapKeyPrune bug with distinct operator (jeremykarn via aniket486)
PIG-3318: AVRO: 'default value' not honored when merging schemas on load with AvroStorage (viraj via rohini)
PIG-3250: Pig dryrun generates wrong output in .expanded file for 'SPLIT....OTHERWISE...' command (dreambird via cheolsoo)
PIG-3331: Default values not stored in avro file when using specific schemas during store in AvroStorage (viraj via rohini)
PIG-3322: AvroStorage give NPE on reading file with union as top level schema (viraj via rohini)
PIG-2828: Handle nulls in DataType.compare (aniket486)
PIG-3335: TestErrorHandling.tesNegative7 fails on MR2 (xuefuz)
PIG-3316: Pig failed to interpret DateTime values in some special cases (xuefuz)
PIG-2956: Invalid cache specification for some streaming statement (daijy)
PIG-3310: ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations (cstenac via daijy)
PIG-3334: Fix Windows piggybank unit test failures (daijy)
PIG-3337: Fix remaining Windows e2e tests (daijy)
PIG-3328: DataBags created with an initial list of tuples don't get registered as spillable (mwagner via daijy)
PIG-3313: pig job hang if the job tracker is bounced during execution (yu.chenjie via daijy)
PIG-3297: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc (nielsbasjes via cheolsoo)
PIG-3069: Native Windows Compatibility for Pig E2E Tests and Harness (anthony.murphy via daijy)
PIG-3291: TestExampleGenerator fails on Windows because of lack of file name escaping (dwann via daijy)
PIG-3026: Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences (dwann via daijy)
PIG-3025: TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification (dwann via daijy)
PIG-2955: Fix bunch of Pig e2e tests on Windows (daijy)
PIG-3302: JSONStorage throws NPE if map has null values (rohini)
PIG-3309: TestJsonLoaderStorage fails with IBM JDK 6/7 (lrangel via daijy)
PIG-3097: HiveColumnarLoader doesn't correctly load partitioned Hive table (maczech via daijy)
PIG-3305: Infinite loop when input path contains empty partition directory (maczech via daijy)
PIG-3286: TestPigContext.testImportList fails in trunk (cheolsoo)
PIG-2970: Nested foreach getting incorrect schema when having unrelated inner query (daijy)
PIG-3304: XMLLoader in piggybank does not work with inline closed tags (aseldawy via daijy)
PIG-3028: testGrunt dev test needs some command filters to run correctly without cygwin (jgordon via gates)
PIG-3290: TestLogicalPlanBuilder.testQuery85 fail in trunk (daijy)
PIG-3027: pigTest unit test needs a newline filter for comparisons of golden multi-line (jgordon via gates)
PIG-2767: Pig creates wrong schema after dereferencing nested tuple fields (daijy)
PIG-3276: change the default value for hcat.bin to hcat instead of /usr/local/hcat/bin/hcat (arpitgupta via daijy)
PIG-3277: fix the path to the benchmarks file in the print statement (arpitgupta via daijy)
PIG-3122: Operators should not implicitly become reserved keywords (jcoveney via cheolsoo)
PIG-3193: Fix "ant docs" warnings (cheolsoo)
PIG-3186: tar/deb/pkg ant targets should depend on piggybank (lbendig via gates)
PIG-3270: Union onschema failing at runtime when merging incompatible types (knoguchi via daijy)
PIG-3271: POSplit ignoring error from input processing giving empty results (knoguchi via daijy)
PIG-2265: Test case TestSecondarySort failure (daijy)
PIG-3060: FLATTEN in nested foreach fails when the input contains an empty bag (daijy)
PIG-3249: Pig startup script prints out a wrong version of hadoop when using fat jar (prkommireddi via daijy)
PIG-3110: pig corrupts chararrays with trailing whitespace when converting them to long (prkommireddi via daijy)
PIG-3253: Misleading comment w.r.t getSplitIndex() method in PigSplit.java (cheolsoo)
PIG-3208: [zebra] TFile should not set io.compression.codec.lzo.buffersize (ekoontz via daijy)
PIG-3172: Partition filter push down does not happen when there is a non partition key map column filter (rohini)
PIG-3205: Passing arguments to python script does not work with -f option (rohini)
PIG-3239: Unable to return multiple values from a macro using SPLIT (dreambird via cheolsoo)
PIG-3077: TestMultiQueryLocal should not write in /tmp (dreambird via cheolsoo)
PIG-3081: Pig progress stays at 0% for the first job in hadoop 23 (rohini)
PIG-3150: e2e Scripting_5 fails in trunk (dreambird via cheolsoo)
PIG-3153: TestScriptUDF.testJavascriptExampleScript fails in trunk (cheolsoo)
PIG-3145: Parameters in core-site.xml and mapred-site.xml are not correctly substituted (cheolsoo)
PIG-3135: HExecutionEngine should look for resources in user passed Properties (prkommireddi via cheolsoo)
PIG-3200: MiniCluster should delete hadoop-site.xml on shutDown (prkommireddi via cheolsoo)
PIG-3158: Errors in the document "Control Structures" (miyakawataku via cheolsoo)
PIG-3161: Update reserved keywords in Pig docs (russell.jurney via cheolsoo)
PIG-3156: TestSchemaTuple fails in trunk (cheolsoo)
PIG-3155: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk (cheolsoo)
PIG-3154: TestPackage.testOperator fails in trunk (dreambird via cheolsoo)
PIG-3168: TestMultiQueryBasic.testMultiQueryWithSplitInMapAndMultiMerge fails in trunk (cheolsoo)
PIG-3137: Fix Piggybank test to not use /tmp dir (dreambird via cheolsoo)
PIG-3149: e2e build.xml still refers to jython 2.5.0 jar even though it's replaced by jython standalone 2.5.2 jar (cheolsoo)
PIG-2266: Bug with input file joining optimization in Pig (jadler via cheolsoo)
PIG-2645: PigSplit does not handle the case where SerializationFactory returns null (shami via gates)
PIG-3031: Update Pig to use a newer version of joda-time (zjshen via cheolsoo)
PIG-3071: Update hcatalog jar and path to hbase storage handler jar in pig script (arpitgupta via cheolsoo)
PIG-3029: TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (jgordon via gates)
PIG-3120: setStoreFuncUDFContextSignature called with null signature (jdler via cheolsoo)
PIG-3115: Distinct Built-in Function Doesn't Handle Null Bags (njw45 via daijy)
PIG-2433: Jython import module not working if module path is in classpath (rohini)
PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (nwhite via gates)
PIG-2251: PIG leaks Zookeeper connections when using HBaseStorage (jamarkha via cheolsoo)
PIG-3112: Errors and lacks in document "User Defined Functions" (miyakawataku via cheolsoo)
PIG-3050: Fix FindBugs multithreading warnings (cheolsoo)
PIG-3066: Fix TestPigRunner in trunk (cheolsoo)
PIG-3101: Increase io.sort.mb in YARN MiniCluster (cheolsoo)
PIG-3100: If a .pig_schema file is present, can get an index out of bounds error (jcoveney)
PIG-3096: Make PigUnit thread safe (cheolsoo)
PIG-3095: "which" is called many, many times for each Pig STREAM statement (nwhite via cheolsoo)
PIG-3085: Errors and lacks in document "Built In Functions" (miyakawataku via cheolsoo)
PIG-3084: Improve exceptions messages in POPackage (julien)
PIG-3072: Pig job reporting negative progress (knoguchi via rohini)
PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo)
PIG-2924: PigStats should not be assuming all Storage classes to be file-based storage (cheolsoo)
PIG-3046: An empty file name in -Dpig.additional.jars throws an error (prkommireddi via cheolsoo)
PIG-2989: Illustrate for Rank Operator (xalan via gates)
PIG-2885: TestJobSubmission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3 (cheolsoo via sms)
PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy)
Release 0.11.2 (Unreleased)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3380: Fix e2e test float precision related test failures when run with -Dpig.exec.mapPartAgg=true (anlilin via rohini)
OPTIMIZATIONS
PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (njw45 via dvryaboy) (prev. applied to 0.12)
BUG FIXES
PIG-3455: Pig 0.11.1 OutOfMemory error (rohini)
PIG-3435: Custom Partitioner not working with MultiQueryOptimizer (knoguchi via daijy)
PIG-3385: DISTINCT no longer uses custom partitioner (knoguchi via daijy)
PIG-2507: Semicolon in parameters for UDF results in parsing error (tnachen via daijy)
PIG-3341: Strict datetime parsing and improve performance of loading datetime values (rohini)
PIG-3329: RANK operator failed when working with SPLIT (xalan via cheolsoo)
PIG-3345: Handle null in DateTime functions (rohini)
PIG-3223: AvroStorage does not handle comma separated input paths (dreambird via rohini)
PIG-3262: Pig contrib 0.11 doesn't compile on certain rpm systems (mgrover via cheolsoo)
PIG-3264: mvn signanddeploy target broken for pigunit, pigsmoke and piggybank (billgraham)
Release 0.11.1
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-3256: Upgrade jython to 2.5.3 (legal concern) (daijy)
PIG-2988: start deploying pigunit maven artifact part of Pig release process (njw45 via rohini)
PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag. (knoguchi via rohini)
PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini)
PIG-3202: CUBE operator not documented in user docs (prasanth_j via billgraham)
OPTIMIZATIONS
BUG FIXES
PIG-3267: HCatStorer fail in limit query (daijy)
PIG-3252: AvroStorage gives wrong schema for schemas with named records (mwagner via cheolsoo)
PIG-3132: NPE when illustrating a relation with HCatLoader (daijy)
PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 (prkommireddi via dvryaboy)
PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy)
PIG-3144: Erroneous map entry alias resolution leading to "Duplicate schema alias" errors (jcoveney via cheolsoo)
PIG-3212: Race Conditions in POSort and (Internal)SortedBag during Proactive Spill (kadeng via dvryaboy)
PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase (rohini)
Release 0.11.0
INCOMPATIBLE CHANGES
PIG-3034: Remove Penny code from Pig repository (gates via cheolsoo)
PIG-2931: $ signs in the replacement string make parameter substitution fail (cheolsoo via jcoveney)
PIG-1891: Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates)
IMPROVEMENTS
PIG-3044: Trigger POPartialAgg compaction under GC pressure (dvryaboy)
PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini)
PIG-2934: HBaseStorage filter optimizations (billgraham)
PIG-2980: documentation for DateTime datatype (zjshen via thejas)
PIG-2982: add unit tests for DateTime type that test setting timezone (zjshen via thejas)
PIG-2937: generated field in nested foreach does not inherit the variable name as the field name (jcoveney)
PIG-3019: Need a target in build.xml for source releases (gates)
PIG-2832: org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext (prkommireddi via rohini)
PIG-2898: Parallel execution of e2e tests (iveselovsky via rohini)
PIG-2913: org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file (cheolsoo via julien)
PIG-2976: Reduce HBaseStorage logging (billgraham)
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham)
PIG-2579: Support for multiple input schemas in AvroStorage (cheolsoo via sms)
PIG-2946: Documentation of "history" and "clear" commands (xalan via azaroth)
PIG-2877: Make SchemaTuple work in foreach (and thus, in loads) (jcoveney)
PIG-2923: Lazily register bags with SpillableMemoryManager (dvryaboy)
PIG-2929: Improve documentation around AVG, CONCAT, MIN, MAX (cheolsoo via billgraham)
PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney)
PIG-2879: Pig current releases lack a UDF startsWith. This UDF tests if a given string starts with the specified prefix. (initialcontext via azaroth)
PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates)
PIG-2918: Avoid Spillable bag overhead where possible (dvryaboy)
PIG-2900: Streaming should provide conf settings in the environment (dvryaboy)
PIG-2353: RANK function like in SQL (xalan via azaroth)
PIG-2915: Builtin TOP udf is sensitive to null input bags (hazen via dvryaboy)
PIG-2901: Errors and lacks in document "Pig Latin Basics" (miyakawataku via billgraham)
PIG-2905: Improve documentation around REPLACE (cheolsoo via billgraham)
PIG-2882: Use Deque instead of Stack (mkhadikov via dvryaboy)
PIG-2781: LOSort isEqual method (xalan via dvryaboy)
PIG-2835: Optimizing the conversion from bytes to Integer/Long (jay23jack via dvryaboy)
PIG-2886: Add Scan TimeRange to HBaseStorage (ted.m via dvryaboy)
PIG-2895: jodatime jar missing in pig-withouthadoop.jar (thejas)
PIG-2888: Improve performance of POPartialAgg (dvryaboy)
PIG-2708: split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator (analog.sony via daijy)
PIG-2890: Revert PIG-2578 (dvryaboy)
PIG-2850: Pig should support loading macro files as resources stored in JAR files (matterhayes via dvryaboy)
PIG-1314: Add DateTime Support to Pig (zjshen via thejas)
PIG-2785: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0 (matterhayes via sms)
PIG-2556: CSVExcelStorage load: quoted field with newline as first character sees newline as record end (tivv via dvryaboy)
PIG-2875: Add recursive record support to AvroStorage (cheolsoo via sms)
PIG-2662: skew join does not honor its config parameters (rajesh.balamohan via thejas)
PIG-2871: Refactor signature for PigReducerEstimator (billgraham)
PIG-2851: Add flag to ant to run tests with a debugger port (billgraham)
PIG-2862: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier (jcoveney)
PIG-2855: Provide a method to measure time spent in UDFs (dvryaboy)
PIG-2837: AvroStorage throws StackOverFlowError (cheolsoo via sms)
PIG-2856: AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories. (cheolsoo via sms)
PIG-2569: Fix org.apache.pig.test.TestInvoker.testSpeed (aklochkov via dvryaboy)
PIG-2858: Improve PlanHelper to allow finding any PhysicalOperator in a plan (dvryaboy)
PIG-2854: AvroStorage doesn't work with Avro 1.7.1 (cheolsoo via sms)
PIG-2779: Refactoring the code for setting number of reducers (jay23jack via billgraham)
PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator (prasanth_j via dvryaboy)
PIG-2814: Fix issues with Sample operator documentation (prasanth_j via dvryaboy)
PIG-2817: Documentation for Groovy UDFs (herberts via julien)
PIG-2492: AvroStorage should recognize globs and commas (cheolsoo via sms)
PIG-2706: Add clear to list of grunt commands (xalan via azaroth)
PIG-2823: TestPigContext.testImportList() does not pass if another javac is on the PATH (julien)
PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth)
PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via azaroth)
PIG-2820: relToAbsolutePath is not replayed properly when Grunt reparses the script after PIG-2699 (julien)
PIG-2763: Groovy UDFs (herberts via julien)
PIG-2780: MapReduceLauncher should break early when one of the jobs throws an exception (jay23jack via daijy)
PIG-2804: Remove "PIG" exec type (dvryaboy)
PIG-2726: Handling legitimate NULL values in Cube operator (prasanth_j via dvryaboy)
PIG-2808: Add *.project to .gitignore (azaroth)
PIG-2787: change the module name in ivy to lowercase to match the maven repo (julien)
PIG-2632: Create a SchemaTuple which generates efficient Tuples via code gen (jcoveney)
PIG-2750: add artifacts to the ivy.xml for other jars Pig generates (julien)
PIG-2748: Change the names of the jar produced in the build folder to match maven conventions (julien)
PIG-2770: Allow easy inclusion of custom build targets (julien)
PIG-2697: pretty print schema via pig.pretty.print.schema (rangadi via jcoveney)
PIG-2673: Allow Merge join to follow an ORDER statement (dvryaboy)
PIG-2699: Reduce the number of instances of Load and Store Funcs down to 2+1 (julien)
PIG-2166: UDFs to join a bag (hluu via daijy)
PIG-2651: Provide a much easier to use accumulator interface (jcoveney via daijy)
PIG-2658: Add pig.script.submitted.timestamp and pig.job.submitted.timestamp in generated Map-Reduce job conf (billgraham)
PIG-2735: Add a pig.version.suffix property in build.xml to easily override with a build number (julien)
PIG-2705: outputSchema modification from scripting UDFs (levyjoshua via julien)
PIG-2724: Make Tuple Iterable (jcoveney)
PIG-2733: Add *.patch, *.log, *.orig, *.rej, *.class to gitignore (jcoveney)
PIG-2732: Let's get rid of the deprecated Tuple methods (jcoveney)
PIG-2638: Optimize BinInterSedes treatment of longs (jcoveney)
PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy)
PIG-2710: Implement Naive CUBE operator (prasanth_j via dvryaboy)
PIG-2714: Pig documentation on TOP function has issues (daijy)
PIG-2066: Accumulators should be able to early-terminate (jcoveney)
PIG-2600: Better Map support (prkommireddi via jcoveney)
PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy)
PIG-2702: Make Pig local mode (and tests) faster by working around the hard coded sleep(5000) in hadoop's JobControl (julien)
PIG-2659: add source location of the aliases in the physical plan (julien)
PIG-2547: Easier UDFs: Convenient EvalFunc super-classes (billgraham, dvryaboy)
PIG-2639: Utils.getSchemaFromString should automatically give name to all types, but fails on boolean (jcoveney)
PIG-2696: Enhance Job Stat to print out median map and reduce time (hluu via daijy)
PIG-2583: Add Grunt command to list the statements in cache (xalan via daijy)
PIG-2688: Log the aliases being processed for the current job (ddaniels888 via azaroth)
PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney)
PIG-2685: Fix error in EvalFunc ctor when implementing Algebraic UDF whose return type is parameterized (andy schlaikjer via jcoveney)
PIG-2664: Allow PPNL impls to get more job info during the run (billgraham)
PIG-2663: Expose helpful ScriptState methods (billgraham)
PIG-2660: PPNL notified of plan before it gets executed (billgraham)
PIG-2574: Make reducer estimator pluggable (billgraham)
PIG-2677: Add target to build.xml to generate clover summary reports (gates)
PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien)
PIG-2257: AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor (billgraham)
PIG-2587: Compute LogicalPlan signature and store in job conf (billgraham)
PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false
PIG-2604: Pig should print its build info at runtime (traviscrawford via dvryaboy)
PIG-2573: Automagically setting parallelism based on input file size does not work with HCatalog (traviscrawford via julien)
PIG-2538: Add helper wrapper classes for StoreFunc (billgraham via dvryaboy)
PIG-2010: registered jars on distributed cache (traviscrawford and julienledem via dvryaboy)
PIG-2533: Pig MR job exceptions masked on frontend (traviscrawford via dvryaboy)
PIG-2525: Support pluggable PigProgressNotificationListeners on the command line (dvryaboy)
PIG-2515: [piggybank] Make CustomFormatToISO return null on Exception in parsing dates (rjurney via dvryaboy)
PIG-2503: Make @MonitoredUDF inherited (dvryaboy)
PIG-2488: Move Python unit tests to e2e tests (alangates via daijy)
PIG-2456: Pig should have a pigrc to specify default script cache (prkommireddi via daijy)
PIG-2496: Cache resolved classes in PigContext (dvryaboy)
PIG-2482: Integrate HCat DDL command into Pig (daijy)
PIG-2479: changingPattern should be used with checkmodified in ivysettings.xml (abayer via azaroth)
PIG-2349: Ant build repeats ivy-buildJar several times (azaroth)
PIG-2359: Support more efficient Tuples when schemas are known (dvryaboy)
PIG-2282: Automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy (azaroth via daijy)
PIG-2468: Speed up TestBuiltin (dvryaboy)
PIG-2467: Speed up TestCommit (dvryaboy)
PIG-2460: Use guava 11 instead of r06 (dvryaboy)
PIG-2267: Make the name of the columns in schema optional (jcoveney via daijy)
PIG-2453: Fetching schema can be very slow for multi-thousand file LOADs (dvryaboy)
PIG-2443: [Piggybank] Add UDFs to check if a String is an Integer And if a String is Numeric (prkommireddi via daijy)
PIG-2437: Use Ivy to get automaton.jar (azaroth)
PIG-2448: Convert more tests to use LOCAL mode (dvryaboy)
PIG-2438: Do not hardcode commons-lang version in build.xml (azaroth)
PIG-2422: Add log messages for Jython schema definitions (vivekp via gates)
PIG-2403: Reduce code duplication in SUM, MAX, MIN udfs (dvryaboy)
PIG-2245: Add end to end test for tokenize (markroddy via gates)
PIG-2327: bin/pig doesn't have any hooks for picking up ZK installation deployed from tarballs (rvs via hashutosh)
PIG-2382: Modify .gitignore to ignore pig-withouthadoop.jar (azaroth via hashutosh)
PIG-2380: Expose version information more cleanly (jcoveney via azaroth)
PIG-2311: STRSPLIT needs to allow bytearray arguments (xuting via olgan)
PIG-2365: Current TOP implementation needlessly results in a null bag name (jcoveney via dvryaboy)
PIG-2151: Add annotation to specify output schema in Java UDFs (dvryaboy)
PIG-2230: Improved error message for invalid parameter format (xuitingz via olgan)
PIG-2328: Add builtin UDFs for building and using bloom filters (gates)
PIG-2338: Need signature for EvalFunc (daijy)
PIG-2337: Provide UDF with input schema (xutingz via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-3147: Spill failing with "java.lang.RuntimeException: InternalCachedBag.spill() should not be called" (knoguchi via dvryaboy)
PIG-3109: Missing license headers (jarcec via cheolsoo)
PIG-3022: TestRegisteredJarVisibility.testRegisteredJarVisibility fails with hadoop-2.0.x (rohini via cheolsoo)
PIG-3125: Fix zebra compilation error (cheolsoo)
PIG-3051: java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning (knoguchi via rohini)
PIG-3076: make TestScalarAliases more reliable (julien)
PIG-3020: "Duplicate uid in schema" error when joining two relations derived from the same load statement (jcoveney)
PIG-3044: hotfix to remove divide by 0 error (jcoveney)
PIG-3033: test-patch failed with javadoc warnings (fang fang chen via cheolsoo)
PIG-3058: Upgrade junit to at least 4.8 (fang fang chen via cheolsoo)
PIG-2978: TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x (cheolsoo)
PIG-3039: Not possible to use custom version of jackson jars (rohini)
PIG-3045: Specifying sorting field(s) at nightly.conf - fix sortArgs (rohini via cheolsoo)
PIG-2979: Pig.jar doesn't work with hadoop-2.0.x (cheolsoo)
PIG-3035: With latest version of hadoop23 pig does not return the correct exception stack trace from backend (rohini)
PIG-2405: some unit test case failed with open JDK (fang fang chen via cheolsoo)
PIG-3018: Refactor TestScriptLanguage to remove duplication and write script in different files (julien)
PIG-2973: TestStreaming test times out (cheolsoo)
PIG-3001: TestExecutableManager.testAddJobConfToEnv fails randomly (cheolsoo)
PIG-3017: Pig's object serialization should use compression (jcoveney)
PIG-2968: ColumnMapKeyPrune fails to prune a subtree inside foreach (knoguchi via cheolsoo)
PIG-2999: Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing (knoguchi via azaroth)
PIG-2998: Fix TestScriptLanguage and TestMacroExpansion (cheolsoo via jcoveney)
PIG-2975: TestTypedMap.testOrderBy failing with incorrect result (knoguchi via jcoveney)
PIG-2950: Fix tiny documentation error in BagToString builtin (initialcontext via daijy)
PIG-2967: Fix Glob_local test failure for Pig E2E Test Framework (sushantj via daijy)
PIG-1283: COUNT on null bag causes failure (analog.sony via jcoveney)
PIG-2958: Pig tests do not appear to have a logger attached (daijyc via jcoveney)
PIG-2926: TestPoissonSampleLoader failing on rhel environment (jcoveney)
PIG-2985: TestRank1,2,3 fail with hadoop-2.0.x (rohini via azaroth)
PIG-2971: Add new parameter to specify the streaming environment (jcoveney)
PIG-2963: Illustrate command and POPackageLite (cheolsoo via jcoveney)
PIG-2961: BinInterSedesRawComparator broken by TUPLE_number patch (jcoveney)
PIG-2932: Setting high default_parallel causes IOException in local mode (cheolsoo via gates)
PIG-2737: [piggybank] TestIndexedStorage is failing, should be refactored (jcoveney)
PIG-2935: Catch NoSuchMethodError when StoreFuncInterface's new cleanupOnSuccess method isn't implemented. (gates via dvryaboy)
PIG-2920: e2e tests override PERL5LIB environment variable (azaroth)
PIG-2917: SpillableMemoryManager memory leak for WeakReference (haitao.yao via dvryaboy)
PIG-2938: All unit tests that use MR2 MiniCluster are broken in trunk (cheolsoo via dvryaboy)
PIG-2936: Tuple serialization bug (jcoveney)
PIG-2930: ant test doesn't compile in trunk (cheolsoo via daijy)
PIG-2791: Pig does not work with ViewFileSystem (rohini via daijy)
PIG-2833: org.apache.pig.pigunit.pig.PigServer does not initialize set default log level of pigContext (cheolsoo via jcoveney)
PIG-2744: Handle Pig command line with XML special characters (lulynn_2008 via daijy)
PIG-2637: Command-line option -e throws TokenMgrError exception (lulynn_2008 via daijy)
PIG-2887: Macro cannot handle negative number (knoguchi via gates)
PIG-2844: ant makepom is misconfigured (julien)
PIG-2896: Pig does not fail anymore if two macros are declared with the same name (julien)
PIG-2848: TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output (julien)
PIG-2884: JobControlCompiler mis-logs after reducer estimation (billgraham)
PIG-2876: Bump up Xerces version (jcoveney)
PIG-2866: PigServer fails with macros without a script file (billgraham)
PIG-2860: [piggybank] TestAvroStorageUtils.testGetConcretePathFromGlob fails on some version of hadoop (cheolsoo via jcoveney)
PIG-2861: PlanHelper imports org.python.google.common.collect.Lists instead of org.google.common.collect.Lists (jcoveney)
PIG-2849: Errors in document Getting Started (miyakawataku via billgraham)
PIG-2843: Typo in Documentation (eric59 via billgraham)
PIG-2841: Inconsistent URL in Docs (eric59 via billgraham)
PIG-2740: get rid of "java[77427:1a03] Unable to load realm info from SCDynamicStore" log lines when running pig tests (julien)
PIG-2839: mock.Storage overwrites output with the last relation written when storing UNION (julien)
PIG-2840: Fix SchemaTuple bugs (jcoveney)
PIG-2842: TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file (julien)
PIG-2827: Unwrap exception swallowing in TOP (haitao.yao via jcoveney)
PIG-2825: StoreFunc signature setting in LogicalPlan broken (jcoveney)
PIG-2815: class loader management in PigContext (rangadi via jcoveney)
PIG-2813: Fix test regressions from PIG-2632 (jcoveney)
PIG-2806: Fix merge join test regression from PIG-2632 (jcoveney)
PIG-2809: TestUDFContext broken by PIG-2699 (julienledem via daijy)
PIG-2807: TestParser TestPigStorage TestNewPlanOperatorPlan broken by PIG-2699 (julienledem via daijy)
PIG-2782: Specifying sorting field(s) at nightly.conf (cheolsoo via daijy)
PIG-2790: After PIG-2699 the script schema (LOAD ... USING ... AS {script schema}) is passed after getSchema is called (daijy)
PIG-2777: Docs are broken due to malformed xml after PIG-2673 (dvryaboy)
PIG-2593: Filter by a boolean value does not work (jay23jack via daijy)
PIG-2665: Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts with embedded Pig Latin (daijy)
PIG-2736: Support implicit cast from bytearray to boolean (jay23jack via daijy)
PIG-2508: PIG can unpredictably ignore deprecated Hadoop config options (thw via dvryaboy)
PIG-2691: Duplicate TOKENIZE schema (jay23jack via azaroth)
PIG-2173: piggybank datetime conversion javadocs not properly formatted (hluu via daijy)
PIG-2709: PigAvroRecordReader should specify which file has a problem when throwing IOException (mpercy via daijy)
PIG-2640: Usage message gives wrong information for Pig additional jars (prkommireddi via daijy)
PIG-2652: Skew join and order by don't trigger reducer estimation (dvryaboy)
PIG-2616: JobControlCompiler.getInputSizeFromLoader must handle exceptions from LoadFunc.getStatistics (billgraham)
PIG-2644: Piggybank's HadoopJobHistoryLoader throws NPE when reading broken history file (herberts via daijy)
PIG-2627: Custom partitioner not set when POSplit is involved in Plan (aniket486 via daijy)
PIG-2596: Jython UDF does not handle boolean output (aniket486 via daijy)
PIG-2649: org.apache.pig.parser.ParserValidationException does not expose the cause exception
PIG-2540: [piggybank] AvroStorage can't read schema on amazon s3 in elastic mapreduce (rjurney via jcoveney)
PIG-2618: e2e local fails to build
PIG-2608: Typo in PigStorage documentation for source tagging (prkommireddi via daijy)
PIG-2590: running ant tar and rpm targets on same copy of pig source results in problems (thejas)
PIG-2581: HashFNV inconsistent/non-deterministic due to default platform encoding (prkommireddi via daijy)
PIG-2514: REGEX_EXTRACT not returning correct group with non greedy regex (romainr via daijy)
PIG-2532: Registered classes fail deserialization in frontend (traviscrawford via julien)
PIG-2549: org.apache.pig.piggybank.storage.avro - Broken documentation link for AvroStorage (chrisas via daijy)
PIG-2322: varargs functions do not get passed the arguments in Python embedding (julien)
PIG-2491: Pig docs still mention hadoop-site.xml (daijy)
PIG-2504: Incorrect sample provided for REGEX_EXTRACT (prkommireddi via daijy)
PIG-2502: Make "hcat.bin" configurable in e2e test (daijy)
PIG-2501: Changes needed to contrib/piggybank/java/build.xml in order to build piggybank.jar with Hadoop 0.23
(ekoontz via daijy)
PIG-2499: Pig TestGrunt.testShellCommand occasionally fails (tomwhite via daijy)
PIG-2326: Pig minicluster tests can not be run from eclipse (julienledem via daijy)
PIG-2432: Eclipse .classpath file is out of date (gates)
PIG-2427: getSchemaFromString throws away the name of the tuple that is in a bag (jcoveney via dvryaboy)
PIG-2425: Aggregate Warning does not work as expected on Embedding Pig in Java 0.9.1 (prkommireddi via thejas)
PIG-2384: Generic Invokers should use PigContext to resolve classes (dvryaboy)
PIG-2379: Bug in Schema.getPigSchema(ResourceSchema rSchema) improperly adds two level access (jcoveney via dvryaboy)
PIG-2355: ant clean does not clean e2e test build artifacts (daijy)
PIG-2352: e2e test harness' use of environment variables causes unintended effects between tests (gates)
Release 0.10.1
BUG FIXES
PIG-3107: bin and autocomplete are missing in src release (daijy)
PIG-3106: Missing license header in several java files (daijy)
PIG-3099: Pig unit test fixes for TestGrunt(1), TestStore(2), TestEmptyInputDir(3) (vikram.dixit via daijy)
PIG-2953: "which" utility does not exist on Windows (daijy)
PIG-2960: Increase the timeout for unit test (daijy)
PIG-2942: DevTests, TestLoad has a false failure on Windows (jgordon via daijy)
PIG-2801: grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens
(jgordon via daijy)
PIG-2798: pig streaming tests assume interpreters are auto-resolved (jgordon via daijy)
PIG-2795: Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (jgordon via daijy)
PIG-2796: Local temporary paths are not always valid HDFS path names (jgordon via daijy)
Release 0.10.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-2664: Allow PPNL impls to get more job info during the run (billgraham)
PIG-2663: Expose helpful ScriptState methods (billgraham)
PIG-2660: PPNL notified of plan before it gets executed (billgraham)
PIG-2574: Make reducer estimator plugable (billgraham)
PIG-2601: Additional document for 0.10 (daijy)
PIG-2317: Ruby/Jruby UDFs (jcoveney via daijy)
PIG-1270: Push limit into loader (daijy)
PIG-2589: Additional e2e test for 0.10 new features (daijy)
PIG-2182: Add more append support to DataByteArray (gsingers via daijy)
PIG-438: Handle realiasing of existing Alias (A=B;) (daijy)
PIG-2548: Support for providing parameters to python script (daijy)
PIG-2518: Add ability to clean ivy cache in build.xml (daijy)
PIG-2300: Pig Docs - release 0.10.0 (and 0.9.1) (chandec via daijy)
PIG-2332: JsonLoader/JsonStorage (daijy)
PIG-2334: Set default number of reducers for S3N filesystem (ddaniels888 via daijy)
PIG-1387: Syntactical Sugar for PIG-1385 (azaroth)
PIG-2305: Pig should log the split locations in task logs (vivekp via thejas)
PIG-2293: Pig should support a more efficient merge join against data sources that natively support point
lookups or where the join is against large, sparse tables (aklish via daijy)
PIG-2287: add test cases for limit and sample that use expressions with
constants only (no scalar variables) (thejas via gates)
PIG-2092: Missing sh command from Grunt shell (olgan)
PIG-2163: Improve nested cross to stream one relation (zjshen via daijy)
PIG-2249: Enable pig e2e testing on EC2 (gates)
PIG-2256: Upgrade Avro dependency to 1.5.3 (tucu00 via dvryaboy)
PIG-604: Kill the Pig job should kill all associated Hadoop Jobs (daijy)
PIG-2096: End to end tests for new Macro feature (gates)
PIG-2242: Allow the delimiter to be specified when calling TOKENIZE (markroddy via hashutosh)
PIG-2240: Allow any compression codec to be specified in AvroStorage (tomwhite via dvryaboy)
PIG-2229: Pig end-to-end tests should test local mode as well as mr mode (gates)
PIG-2235: Several files in e2e tests aren't being run (gates)
PIG-2196: Test harness should be independent of Pig (hashutosh) -- Missed few
changes in last commit.
PIG-2196: Test harness should be independent of Pig (hashutosh)
PIG-1429: Add Boolean Data Type to Pig (zjshen via daijy)
PIG-2218: Pig end-to-end tests should be accessible from top level build.xml (gates)
PIG-2176: add logical plan assumption checker (thejas)
PIG-1631: Support to 2 level nested foreach (aniket486 via daijy)
PIG-2191: Reduce amount of log spam generated by UDFs (dvryaboy)
PIG-2200: Piggybank cannot be built from the Git mirror (dvryaboy)
PIG-2168: CubeDimensions UDF (dvryaboy)
PIG-2189: e2e test harness needs to use Pig as a source of truth (gates via daijy)
PIG-1904: Default split destination (azaroth via thejas)
PIG-2143: Make PigStorage optionally store schema; improve docs. (dvryaboy)
PIG-1973: UDFContext.getUDFContext usage of ThreadLocal pattern
is not typical (woody via thejas)
PIG-2053: PigInputFormat uses class.isAssignableFrom() where
instanceof is more appropriate (woody via thejas)
PIG-2161: TOTUPLE should use no-copy tuple creation (dvryaboy)
PIG-1946: HBaseStorage constructor syntax is error prone (billgraham via dvryaboy)
PIG-2001: DefaultTuple(List) constructor is inefficient, causes List.size()
System.arraycopy() calls (though they are 0 byte copies),
DefaultTuple(int) constructor is a bit misleading wrt time
complexity (woody via thejas)
PIG-1916: Nested cross (zjshen via daijy)
PIG-2121: e2e test harness should use ant instead of make (gates)
PIG-2142: Allow registering multiple jars from DFS via single statement (rangadi via dvryaboy)
PIG-1926: Sample/Limit should take scalar (azaroth via thejas)
PIG-1950: e2e test harness needs to be able to compare to previous version of
Pig (gates)
PIG-536: the shell script 'pig' does not work if PIG_HOME has the word 'hadoop' in its directory (miguno via olgan)
PIG-2108: e2e test harness needs to be able to mark certain tests as ignored
(gates)
PIG-1825: ability to turn off the write ahead log for pig's HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1824: Support import modules in Jython UDF (woody via rding)
PIG-1994: e2e test harness deployment implementation for existing cluster
(gates)
PIG-2036: [piggybank] Set header delimiter in PigStorageSchema (mmoeller via dvryaboy)
PIG-1949: e2e test harness should use bin/pig rather than calling java
directly (gates)
PIG-2026: e2e tests in eclipse classpath (azaroth via hashutosh)
PIG-2024: Incorrect jar paths in .classpath template for eclipse (azaroth via hashutosh)
OPTIMIZATIONS
PIG-2011: Speed up TestTypedMap.java (dvryaboy)
PIG-2228: support partial aggregation in map task (thejas)
BUG FIXES
PIG-2940: HBaseStorage store fails in secure cluster (cheolsoo via daijy)
PIG-2821: HBaseStorage should work with secure hbase (rohini via daijy)
PIG-2859: Fix few e2e test failures (rohini via daijy)
PIG-2729: Macro expansion does not use pig.import.search.path - UnitTest borked (johannesch via daijy)
PIG-2783: Fix Iterator_1 e2e test for Hadoop 23 (rohini via daijy)
PIG-2761: With hadoop23 importing modules inside python script does not work (rohini via daijy)
PIG-2759: Typo in document "Built In Functions" (daijy)
PIG-2745: Pig e2e test RubyUDFs fails in MR mode when running from tarball (cheolsoo via daijy)
PIG-2741: Python script throws a NameError: name 'Configuration' is not defined in case cache dir is not created
(knoguchi via daijy)
PIG-2669: Pig release should include pig-default.properties after rebuild (daijy)
PIG-2739: PyList should map to Bag automatically in Jython (daijy)
PIG-2730: TFileStorage getStatistics incorrectly throws an exception instead of returning null (traviscrawford via daijy)
PIG-2717: Tuple field mangled during flattening (daijy)
PIG-2721: Wrong output generated while loading bags as input (knoguchi via daijy)
PIG-2578: Multiple Store-commands mess up mapred.output.dir. (daijy)
PIG-2623: Support S3 paths for registering UDFs (nshkrob via daijy)
PIG-2505: AvroStorage won't read any file not ending in .avro (russell.jurney via daijy)
PIG-2585: Enable ignored e2e test cases (daijy)
PIG-2563: IndexOutOfBoundsException: while projecting fields from a bag (daijy)
PIG-2411: AvroStorage UDF in PiggyBank fails to STORE a bag of single-field tuples as Avro arrays (russell.jurney via daijy)
PIG-2565: Support IMPORT for macros stored in S3 Buckets (daijy)
PIG-2570: LimitOptimizer fails with dynamic LIMIT argument (daijy)
PIG-2543: PigStats.isSuccessful returns false if embedded pig script has sh commands (daijy)
PIG-2509: Util.getSchemaFromString fails with java.lang.NullPointerException when a tuple in a bag has no name (as when used in MongoStorage UDF) (jcoveney via daijy)
PIG-2559: Embedded pig in python; invoking sys.exit(0) causes script failure (vivekp via daijy)
PIG-2530: Reusing alias name in nested foreach causes incorrect results (daijy)
PIG-2489: Input Path Globbing{} not working with PigStorageSchema or PigStorage('\t', '-schema') (daijy)
PIG-2484: Fix several e2e test failures/aborts for 23 (daijy)
PIG-2400: Document hash based aggregation support (chandec via daijy)
PIG-2444: Remove the Zebra *.xml documentation files from the TRUNK and Branch-10 (chandec via daijy)
PIG-2430: An EvalFunc which overrides getArgToFuncMapping with FuncSpec
with constructor arguments is not properly instantiated with said arguments (jcoveney via thejas)
PIG-2457: JsonLoaderStorage tests are broken for e2e (daijy)
PIG-2426: ProgressableReporter.progress(String msg) is an empty function (vivekp via daijy)
PIG-2363: _logs for streaming commands bug in new parser (vivekp via daijy)
PIG-2331: BinStorage in LOAD statement failing when input has curly braces (xutingz via thejas)
PIG-2391: Bzip_2 test is broken (xutingz via daijy)
PIG-2358: JobStats.getHadoopCounters() is never set and always returns null (xutingz via daijy)
PIG-2184: Not able to provide positional reference to macro invocations (xutingz via daijy)
PIG-2209: JsonMetadata fails to find schema for glob paths (daijy)
PIG-2165: Need a way to deal with params and param_file in embedded pig in python (daijy)
PIG-2313: NPE in ILLUSTRATE trying to get StatusReporter in STORE (daijy)
PIG-2335: bin/pig does not work with bash 3.0 (azaroth)
PIG-2275: NullPointerException from ILLUSTRATE (daijy)
PIG-2119: DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan (daijy)
PIG-2290: TOBAG wraps tuple parameters in another tuple (ryan.hoegg via thejas)
PIG-2288: Pig 0.9 error message not useful as compared to 0.8 in case
of group by (vivekp via thejas)
PIG-2309: Keyword 'NOT' is wrongly treated as a UDF in split statement (vivekp via thejas)
PIG-2307: Jetty version should be updated in .eclipse.templates/.classpath,
pig-template.xml and pig.pom as well (zjshen via daijy)
PIG-2273: Pig.compileFromFile in embedded python fails when pig script starts with a comment (ddaniels888 via gates)
PIG-2278: Wrong version numbers for libraries in eclipse template classpath (azaroth)
PIG-2115: Fix Pig HBaseStorage configuration and setup issues (gbowyer@fastmail.co.uk via dvryaboy)
PIG-2232: "declare" document contains a typo (daijy)
PIG-2055: inconsistent behavior in parser generated during build (thejas)
PIG-2185: NullPointerException while Accessing Empty Bag in FOREACH { FILTER } (daijy)
PIG-2227: Wrong jars copied into lib directory in e2e tests when invoked from top level (gates)
PIG-2219: Pig tests fail if ${user.home}/pigtest/conf does not already exist (cwsteinbach via gates)
PIG-2215: Newlines in function arguments still cause exceptions to be thrown (awarring via gates)
PIG-2214: InternalSortedBag two-arg constructor doesn't pass bagCount (sallen via gates)
PIG-2174: HBaseStorage column filters miss some fields (billgraham via dvryaboy)
PIG-2090: re-enable TestGrunt test cases (thejas)
PIG-2181: Improvement for error message when describe misses alias (vivekp via daijy)
PIG-2124: Script never ending when joining from the same source (daijy)
PIG-2170: NPE thrown during illustrate (thejas)
PIG-2186: PigStorage new warnings about missing schema file
can be confusing (thejas)
PIG-2179: tests in TestLoad are failing (thejas)
PIG-2146: POStore.getSchema() returns null because of which PigOutputCommitter
is not storing schema while cleanup (thejas)
PIG-2027: NPE if Pig doesn't have permission for log file (daijy)
PIG-2171: TestScriptLanguage is broken on trunk (daijy and thejas)
PIG-2172: Fix test failure for ant 1.8.x (daijy)
PIG-2162: bin/pig should not modify user args (rangadi via thejas)
PIG-2060: Fix errors in pig grammars reported by ANTLRWorks (azaroth via thejas)
PIG-2156: Limit/Sample with variable does not work if the expression starts
with an integer/double (azaroth via thejas)
PIG-2130: Piggybank:MultiStorage is not compressing output files (vivekp via daijy)
PIG-2147: Support nested tags for XMLLoader (vivekp via daijy)
PIG-1890: Fix piggybank unit test TestAvroStorage (kengoodhope via daijy)
PIG-2110: NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor (dale_jin via daijy)
PIG-2144: ClassCastException when using IsEmpty(DIFF()) (thejas)
PIG-2139: LogicalExpressionSimplifier optimizer rule should check if udf is
deterministic while checking if they are equal (thejas)
PIG-2137: SAMPLE should not be pushed above DISTINCT (dvryaboy and thejas)
PIG-2136: Implementation of Sample should use LessThanExpression
instead of LessThanEqualExpression (azaroth via thejas)
PIG-2140: Usage printed from Main.java gives wrong option for disabling
LogicalExpressionSimplifier (thejas)
PIG-2120: UDFContext.getClientSystemProps() does not respect pig.properties (dvryaboy)
PIG-2129: NOTICE file needs updates (gates)
PIG-2131: Add back test for PIG-1769 (qwertymaniac via gates)
PIG-2112: ResourceSchema.toString does not properly handle maps in the schema (gates)
PIG-1702: Streaming debug output outputs null input-split information (awarring via daijy)
PIG-2109: Ant build continues even if the parser classes fail to be generated. (zjshen via daijy)
PIG-2071: casting numeric type to chararray during schema merge for union
is inconsistent with other schema merge cases (thejas)
PIG-2044: Pattern match bug in org.apache.pig.newplan.optimizer.Rule (knoguchi via daijy)
PIG-2048: Add zookeeper to pig jar (gbowyer via gates)
PIG-2008: Cache outputFormat in HBaseStorage (thedatachef via gates)
PIG-2025: org.apache.pig.test.udf.evalfunc.TOMAP is missing package
declaration (azaroth via gates)
PIG-2019: smoketest-jar target has to depend on pigunit-jar to guarantee
inclusion of test classes (cos via gates)
Release 0.9.3 - Unreleased
BUG FIXES
PIG-2944: ivysettings.xml does not let you override .m2/repository (raluri via daijy)
PIG-2912: Pig should clone JobConf while creating JobContextImpl and TaskAttemptContextImpl in Hadoop23 (rohini via daijy)
PIG-2775: Register jar does not go to classpath in some cases (daijy)
PIG-2693: LoadFunc.setLocation should be called before LoadMetadata.getStatistics (billgraham via julien)
PIG-2666: LoadFunc.setLocation() is not called when pig script only has Order By (daijy)
PIG-2671: e2e harness: Reference local test path via :LOCALTESTPATH: (thw via daijy)
PIG-2642: StoreMetadata.storeSchema can't access files in the output directory (Hadoop 0.23) (thw via daijy)
PIG-2621: Documentation inaccurate regarding Pig Properties in trunk (prkommireddi via daijy)
PIG-2550: Custom tuple results in "Unexpected datatype 110 while reading tuplefrom binary file" while spilling (daijy)
PIG-2442: Multiple Stores in pig streaming causes infinite waiting (daijy)
PIG-2609: e2e harness: make hdfs base path configurable (outside default.conf) (thw via daijy)
PIG-2576: Change in behavior for UDFContext.getUDFContext().getJobConf() in front-end (thw via daijy)
PIG-2588: e2e harness: use pig command for cluster deploy (thw via daijy)
PIG-2572: e2e harness deploy fails when using pig that does not bundle hadoop (thw via daijy)
PIG-2568: PigOutputCommitter hides exception in commitJob (daijy)
PIG-2564: Build fails - Hadoop 0.23.1-SNAPSHOT no longer available (thw via daijy)
PIG-2535: Bug in new logical plan results in no output for join (daijy)
PIG-2534: Pig generating infinite map outputs (daijy)
PIG-2493: UNION causes casting issues (vivekp via daijy)
PIG-2497: Order of execution of fs, store and sh commands in Pig is not maintained (daijy)
Release 0.9.2
IMPROVEMENTS
PIG-2766: Pig-HCat Usability (vikram.dixit via daijy)
PIG-2125: Make Pig work with hadoop .NEXT (daijy)
PIG-2471: Pig Requirements Hadoop (chandec via daijy)
PIG-2431: Upgrade bundled hadoop version to 1.0.0 (daijy)
PIG-2447: piggybank: get hive dependency from maven (thw via azaroth)
PIG-2347: Fix Pig Unit tests for hadoop 23 (daijy)
PIG-2128: Generating the jar file takes a lot of time and is unnecessary when running Pig local mode (julien)
BUG FIXES
PIG-2477: TestBuiltin testLFText/testSFPig failing against 23 due to invalid test setup -- InvalidInputException (phunt via daijy)
PIG-2462: getWrappedSplit is incorrectly returning the first split instead of the current split. (arov via daijy)
PIG-2472: piggybank unit tests write directly to /tmp (thw via daijy)
PIG-2413: e2e test should support testing against two clusters (daijy)
PIG-2342: Pig tutorial documentation needs to update about building tutorial (daijy)
PIG-2458: Can't have spaces in parameter substitution (jcoveney via daijy)
PIG-2410: Piggybank does not compile in 23 (daijy)
PIG-2418: rpm release package does not take PIG_CLASSPATH (daijy)
PIG-2291: PigStats.isSuccessful returns false if embedded pig script has dump (xutingz via daijy)
PIG-2415: A fix for 0.23 local mode: put "yarn-default.xml" into the configuration (daijy)
PIG-2402: inIllustrator condition in PigMapReduce is wrong for hadoop 23 (daijy)
PIG-2370: SkewedPartitioner results in Kerberos error (daijy)
PIG-2374: streaming regression with dotNext (daijy)
PIG-2387: BinStorageRecordReader causes negative progress (xutingz via daijy)
PIG-2354: Several fixes for bin/pig (daijy)
PIG-2385: Store statements not getting processed (daijy)
PIG-2320: Error: "projection with nothing to reference" (daijy)
PIG-2346: TypeCastInsert should not insert Foreach if there is no as statement (daijy)
PIG-2339: HCatLoader loads all the partitions in a partitioned table even though
a filter clause on the partitions is specified in the Pig script (daijy)
PIG-2316: Incorrect results for FILTER *** BY ( *** OR ***) with
FilterLogicExpressionSimplifier optimizer turned on (knoguchi via thejas)
PIG-2271: PIG regression in BinStorage/PigStorage in 0.9.1 (thejas)
Release 0.9.1
IMPROVEMENTS
PIG-2284: Add pig-setup-conf.sh script (eyang via daijy)
PIG-2272: e2e test harness should be able to set HADOOP_HOME (gates via daijy)
PIG-2239: Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself (daijy)
PIG-2213: Pig 0.9.1 Documentation (chandec via daijy)
PIG-2221: Couldn't find documentation for ColumnMapKeyPrune optimization rule (chandec via daijy)
BUG FIXES
PIG-2310: bin/pig fail when both pig-0.9.1.jar and pig.jar are in PIG_HOME (daijy)
PIG-1857: Create a package integration project (eyang via daijy)
PIG-2013: Penny gets a null pointer when no properties are set (breed via daijy)
PIG-2102: MonitoredUDF does not work (dvryaboy)
PIG-2152: Null pointer exception while reporting progress (thejas)
PIG-2183: Pig not working with Hadoop 0.20.203.0 (daijy)
PIG-2193: Using HBaseStorage to scan 2 tables in the same Map job produces bad data (rangadi via dvryaboy)
PIG-2199: Penny throws Exception when netty classes are missing (ddaniels888 via daijy)
PIG-2223: error accessing column in output schema of udf having project-star input (thejas)
PIG-2208: Restrict number of PIG generated Hadoop counters (rding via daijy)
PIG-2299: jetty 6.1.14 startup issue causes unit tests to fail in CI (thw via daijy)
PIG-2301: Some more bin/pig, build.xml cleanup for 0.9.1 (daijy)
PIG-2237: LIMIT generates wrong number of records if pig determines no of reducers as more than 1 (daijy)
PIG-2261: Restore support for parenthesis in Pig 0.9 (rding via daijy)
PIG-2238: Pig 0.9 error message not useful as compared to 0.8 (daijy)
PIG-2286: Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization (daijy)
PIG-2270: Put jython.jar in classpath (daijy)
PIG-2274: remove pig deb package dependency on sun-java6-jre (gkesavan via daijy)
PIG-2264: Change conf/log4j.properties to conf/log4j.properties.template (daijy)
PIG-2231: Limit produces wrong number of records after foreach flatten (daijy)
Release 0.9.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1622: DEFINE streaming options are ill defined and not properly documented (xuefu)
PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy)
PIG-1745: Disable converting bytes loading from BinStorage (daijy)
PIG-1188: Padding nulls to the input tuple according to input schema (daijy)
PIG-1876: Typed map for Pig (daijy)
IMPROVEMENTS
PIG-1938: support project-range as udf argument (thejas)
PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu)
PIG-2062: Script silently ended (xuefu)
PIG-2039: IndexOutOfBoundsException for a case (xuefu)
PIG-2038: Pig fails to parse empty tuple/map/bag constant (xuefu)
PIG-1775: Removal of old logical plan (xuefu)
PIG-1998: Allow macro to return void (rding)
PIG-2003: Using keyword as alias neither emits an error nor produces a logical plan (xuefu)
PIG-1981: LoadPushDown.pushProjection should pass alias in addition to position (daijy)
PIG-2006: Regression: NPE when Pig processes an empty script file, fix test case (xuefu)
PIG-2006: Regression: NPE when Pig processes an empty script file (xuefu)
PIG-2007: Parsing error when map key referred directly from udf in nested foreach (xuefu)
PIG-2000: Pig gives incorrect error message dealing with scalar projection (xuefu)
PIG-2002: Regression: Pig gives error "Projection with nothing to reference!" for a valid query (xuefu)
PIG-1921: Improve error messages in new parser (xuefu)
PIG-1996: Pig new parser fails to recognize PARALLEL keywords in a case (xuefu)
PIG-1612: error reporting: PigException needs to have a way to indicate that
its message is appropriate for user (laukik via thejas)
PIG-1782: Add ability to load data by column family in HBaseStorage (billgraham via dvryaboy)
PIG-1772: Pig 090 Documentation (chandec via olgan)
PIG-1954: Design deployment interface for e2e test harness (gates)
PIG-1881: Need a special interface for Penny (Inspector Gadget) (laukik via
gates)
PIG-1947: Incorrect line number is reported during parsing (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1961: Pig prints "null" as file name in case of grammar error (xuefu)
PIG-1956: Pig parser shouldn't log error code 0 (xuefu)
PIG-1957: Pig parser gives misleading error message when the next foreach block has syntactic errors (xuefu)
PIG-1958: Regression: Pig doesn't log type cast warning messages (xuefu)
PIG-1918: Line number should be given for logical plan failures (xuefu)
PIG-1899: Add end to end test harness for Pig (gates)
PIG-1932: GFCross should allow the user to set the DEFAULT_PARALLELISM value (gates)
PIG-1913: Use a file for excluding tests (tomwhite via gates)
PIG-1693: support project-range expression. (was: There
needs to be a way in foreach to indicate "and all the
rest of the fields" ) (thejas)
PIG-1772: Pig 090 Documentation (chandec via daijy)
PIG-1830: Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable (dvryaboy)
PIG-1566: Support globbing for registering jars in pig script (nrai via daijy)
PIG-1886: Add zookeeper jar to list of jars shipped when HBaseStorage used (dvryaboy)
PIG-1874: Make PigServer work in a multithreading environment (rding)
PIG-1889: bin/pig should pick up HBase configuration from HBASE_CONF_DIR
PIG-1794: Javascript support for Pig embedding and UDFs in scripting languages (julien)
PIG-1853: Using ANTLR jars from maven repository (rding)
PIG-1728: more doc updates (chandec via olgan)
PIG-1793: Add macro expansion to Pig Latin (rding)
PIG-847: Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag (daijy)
PIG-1748: Add load/store function AvroStorage for avro data (guolin2001, jghoman via daijy)
PIG-1769: Consistency for HBaseStorage (dvryaboy)
PIG-1786: Move describe/nested describe to new logical plan (daijy)
PIG-1809: addition of TOMAP function (olgan)
PIG-1749: Update Pig parser so that function arguments can contain newline characters (jghoman via daijy)
PIG-1806: Modify embedded Pig API for usability (rding)
PIG-1799: Provide deployable maven artifacts for pigunit and pig smoke tests
(cos via gates)
PIG-1728: turing complete docs (chandec via olgan)
PIG-1675: allow PigServer to register pig script from InputStream (zjffdu via dvryaboy)
PIG-1479: Embed Pig in scripting languages (rding)
PIG-946: Combiner optimizer does not optimize when limit follow group, foreach (thejas)
PIG-1277: Pig should give error message when cogroup on tuple keys of different inner type (daijy)
PIG-1755: Clean up duplicated code in PhysicalOperators (dvryaboy)
PIG-750: Use combiner when algebraic UDFs are used in expressions (thejas)
PIG-490: Combiner not used when group elements referred to in
tuple notation instead of flatten. (thejas)
PIG-1768: 09 docs: illustrate (chandec via olgan)
PIG-1768: docs reorg (chandec via olgan)
PIG-1712: ILLUSTRATE rework (yanz)
PIG-1758: Deep cast of complex type (daijy)
PIG-1728: doc updates (chandec via olgan)
PIG-1752: Enable UDFs to indicate files to load into the Distributed Cache
(gates)
PIG-1747: pattern match classes for matching patterns in physical plan (thejas)
PIG-1707: Allow pig build to pull from alternate maven repo to enable building
against newer hadoop versions (pradeepkth)
PIG-1618: Switch to new parser generator technology (xuefuz via thejas)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1508: Make 'docs' target (forrest) work with Java 1.6 (cwsteinbach via gates)
PIG-1608: pig should always include pig-default.properties and pig.properties in the pig.jar (nrai via daijy)
OPTIMIZATIONS
PIG-1696: Performance: Use System.arraycopy() instead of manually copying the bytes while reading the data (hashutosh)
BUG FIXES
PIG-2159: New logical plan uses incorrect class for SUM causing ClassCastException (daijy)
PIG-2106: Fix Zebra unit test TestBasicUnion.testNeg3, TestBasicUnion.testNeg4 (daijy)
PIG-2083: bincond ERROR 1025: Invalid field projection when null is used (thejas)
PIG-2089: Javadoc for ResourceFieldSchema.getSchema() is wrong (daijy)
PIG-2084: pig is running validation for a statement at a time in batch mode,
instead of running it for the whole script (thejas)
PIG-2088: Return alias validation failed when there is single line comment in the macro (rding)
PIG-2081: Dryrun gives wrong line numbers in error message for scripts containing macro (rding)
PIG-2078: POProject.getNext(DataBag) does not handle null (daijy)
PIG-2029: Inconsistency in Pig Stats reports (rding)
PIG-2070: "Unknown" appears in error message for an error case (thejas)
PIG-2069: LoadFunc jar does not ship to backend in MultiQuery case (rding)
PIG-2076: update documentation, help command with correct default value
of pig.cachedbag.memusage (thejas)
PIG-2072: NPE when udf has project-star argument and input schema is null (thejas)
PIG-2075: Bring back TestNewPlanPushUpFilter (daijy)
PIG-1827: When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason (rding)
PIG-2056: Jython error messages should show script name (rding)
PIG-2014: SAMPLE shouldn't be pushed up (dvryaboy)
PIG-2058: Macro missing returns clause doesn't give a good error message (rding)
PIG-2035: Macro expansion doesn't handle multiple expansions of same macro inside another macro (rding)
PIG-2030: Merged join/cogroup does not automatically ship loader (daijy)
PIG-2052: Ship guava.jar to backend (daijy)
PIG-2012: Comments at the beginning of the file throw off line numbers in errors (rding)
PIG-2043: Ship antlr-runtime.jar to backend (daijy)
PIG-2049: Pig should display TokenMgrError message consistently across all parsers (rding)
PIG-2041: Minicluster should make each run independent (daijy)
PIG-2040: Move classloader from QueryParserDriver to PigContext (daijy)
PIG-1999: Macro alias masker should consider schema context (rding)
PIG-1821: UDFContext.getUDFProperties does not handle collisions
in hashcode of udf classname (+ arg hashcodes) (thejas)
PIG-2028: Speed up multiquery unit tests (rding)
PIG-1990: support casting of complex types with empty inner schema
to complex type with non-empty inner schema (thejas)
PIG-2016: -dot option does not work with explain and new logical plan (daijy)
PIG-2018: NPE for co-group with group-by column having complex schema and
different load functions for each input (thejas)
PIG-2015: Explain writes out logical plan twice (alangates)
PIG-2017: consumeMap() fails with EmptyStackException (thedatachef via daijy)
PIG-1989: complex type casting should return null on casting failure (daijy)
PIG-1826: Unexpected data type -1 found in stream error (daijy)
PIG-2004: Incorrect input types passed on to eval function (thejas)
PIG-1814: mapred.output.compress in SET statement does not work (daijy)
PIG-1976: One more TwoLevelAccess to remove (daijy)
PIG-1865: BinStorage/PigStorageSchema cannot load data from a different namenode (daijy)
PIG-1910: incorrect schema shown when project-star is used with other projections (daijy)
PIG-2005: Discrepancy in the way dry run handles semicolon in macro definition (rding)
PIG-1281: Detect org.apache.pig.data.DataByteArray cannot be cast to
org.apache.pig.data.Tuple type of errors at Compile Time during
creation of logical plan (thejas)
PIG-1939: order-by statement should support project-range to-end in
any position among the sort columns if input schema is known (thejas)
PIG-1978: Secondary sort fail when dereferencing two fields inside foreach (daijy)
PIG-1962: Wrong alias assigned to store operator (daijy)
PIG-1975: Need to provide backward compatibility for legacy LoadCaster (without bytesToMap(bytes, fieldSchema)) (daijy)
PIG-1987: -dryrun does not work with set (rding)
PIG-1871: Don't throw exception if partition filters cannot be pushed up. (rding)
PIG-1870: HBaseStorage doesn't project correctly (dvryaboy)
PIG-1788: relation-as-scalar error messages should indicate the field
being used as scalar (laukik via thejas)
PIG-1697: NullPointerException if log4j.properties is Used (laukik via daijy)
PIG-1929: Type checker failed to catch invalid type comparison (thejas)
PIG-1928: Type Checking, incorrect error message (thejas)
PIG-1979: New logical plan failing with ERROR 2229: Couldn't find matching uid -1 (daijy)
PIG-1897: multiple star projection in a statement does not produce
the right plan (thejas)
PIG-1917: NativeMapReduce does not Allow Configuration Parameters
containing Spaces (thejas)
PIG-1974: Lineage need to set for every cast (thejas)
PIG-1988: Importing an empty macro file causing NPE (rding)
PIG-1977: "Stream closed" error while reading Pig temp files (results of intermediate jobs) (rding)
PIG-1963: in nested foreach, accumulative udf taking input from order-by does not get results in order (thejas)
PIG-1911: Infinite loop with accumulator function in nested foreach (thejas)
PIG-1923: Jython UDFs fail to convert Maps of Integer values back to Pig types (julien)
PIG-1944: register javascript UDFs does not work (julien)
PIG-1955: PhysicalOperator has a member variable (non-static) Log object that
is non-transient, this causes serialization errors (woody via rding)
PIG-1964: PigStorageSchema fails if a column value is null (thejas)
PIG-1866: Dereference a bag within a tuple does not work (daijy)
PIG-1984: Wrong stats shown when there are multiple stores but same file names (rding)
PIG-1893: Pig report input size -1 for empty input file (rding)
PIG-1868: New logical plan fails when I have complex data types from udf
(daijy)
PIG-1927: Dereference partial name failed (daijy)
PIG-1934: Fix zebra test TestCheckin1, TestCheckin4 (daijy)
PIG-1931: Integrate Macro Expansion with New Parser (rding)
PIG-1933: Hints such as 'collected' and 'skewed' for "group by" or "join by"
should not be treated as tokens. (xuefuz via thejas)
PIG-1925: Parser error message doesn't show location of the error or show it
as Line 0:0 (xuefuz via gates)
PIG-671: typechecker does not throw an error when multiple arguments are
passed to COUNT (deepujain via gates)
PIG-1152: bincond operator throws parser error (xuefuz via thejas)
PIG-1885: SUBSTRING fails when input length less than start (deepujain via
gates)
PIG-719: store <expr> into 'filename'; should be valid syntax, but does not work (xuefuz via thejas)
PIG-1770: matches clause problem with chars that have special meaning in dk.brics - #, @ .. (thejas)
PIG-1862: Pig returns exit code 0 for the failed Pig script due to non-existing input directory (rding)
PIG-1888: Fix TestLogicalPlanGenerator to not use hardcoded path (daijy)
PIG-1837: Error while using IsEmpty function (rding)
PIG-1884: Change ReadToEndLoader.setLocation to not throw UnsupportedOperationException (thejas)
PIG-1887: Fix pig-withouthadoop.jar to contain proper jars (daijy)
PIG-1779: Wrong stats shown when there are multiple loads but same file names (rding)
PIG-1861: The pig script stored in the Hadoop History logs is stored as a concatenated string without whitespace; this causes problems when attempting to extract and execute the script (rding)
PIG-1829: "0" value seen in PigStat's map/reduce runtime, even when the job is successful (rding)
PIG-1856: Custom jar is not packaged with the new job created by LimitAdjuster (rding)
PIG-1872: Fix bug in AvroStorage (guolin2001, jghoman via daijy)
PIG-1536: use same logic for merging inner schemas in "default union" and
"union onschema" (daijy)
PIG-1304: Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input (laukik via rding)
PIG-1852: Packaging antlr jar with pig.jar (rding via daijy)
PIG-1717: pig needs to call setPartitionFilter if schema is null but
getPartitionKeys is not (gerritjvv via gates)
PIG-313: Error handling aggregate of a computation (daijy)
PIG-496: project of bags from complex data causes failures (daijy)
PIG-730: problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema (daijy)
PIG-767: Schema reported from DESCRIBE and actual schema of inner bags are different (daijy)
PIG-1801: Need better error message for Jython errors (rding)
PIG-1742: org.apache.pig.newplan.optimizer.Rule.java does not work
with plan patterns where leaves/sinks are not siblings (thejas)
Release 0.8.0 - Unreleased
INCOMPATIBLE CHANGES
PIG-1518: multi file input format for loaders (yanz via rding)
PIG-1249: Safe-guards against misconfigured Pig scripts without PARALLEL keyword (zjffdu via olgan)
IMPROVEMENTS
PIG-1561: XMLLoader in Piggybank does not support bz2 or gzip compressed XML files (vivekp via daijy)
PIG-1677: modify the repository path of pig artifacts to org/apache/pig instead of org/apache/hadoop/pig (nrai via olgan)
PIG-1600: Docs update (romainr via olgan)
PIG-1632: The core jar in the tarball contains the kitchen sink (eli via olgan)
PIG-1617: 'group all' should always use one reducer (thejas)
PIG-1589: add test cases for mapreduce operator which use distributed cache (thejas)
PIG-1548: Optimize scalar to consolidate the part file (rding)
PIG-1600: Docs update (chandec via olgan)
PIG-1585: Add new properties to help and documentation (olgan)
PIG-1399: Filter expression optimizations (yanz via gates)
PIG-1531: Pig gobbles up error messages (nrai via hashutosh)
PIG-1458: aggregate files for replicated join (rding)
PIG-1205: Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc (zjffdu and dvryaboy)
PIG-1568: Optimization rule FilterAboveForeach is too restrictive and doesn't
handle project * correctly (xuefuz via daijy)
PIG-1574: Optimization rule PushUpFilter causes filter to be pushed up out joins (xuefuz via daijy)
PIG-1515: Migrate logical optimization rule: PushDownForeachFlatten (xuefuz via daijy)
PIG-1321: Logical Optimizer: Merge cascading foreach (xuefuz via daijy)
PIG-1483: [piggybank] Add HadoopJobHistoryLoader to the piggybank (rding)
PIG-1555: [piggybank] add CSV Loader (dvryaboy)
PIG-1501: need to investigate the impact of compression on pig performance (yanz via thejas)
PIG-1497: Mandatory rule PartitionFilterOptimizer (xuefuz via daijy)
PIG-1514: Migrate logical optimization rule: OpLimitOptimizer (xuefuz via daijy)
PIG-1551: Improve dynamic invokers to deal with no-arg methods and array parameters (dvryaboy)
PIG-1311: Document audience and stability for remaining interfaces (gates)
PIG-506: Does pig need a NATIVE keyword? (aniket486 via thejas)
PIG-1510: Add `deepCopy` for LogicalExpressions (swati.j via daijy)
PIG-1447: Tune memory usage of InternalCachedBag (thejas)
PIG-1505: support jars and scripts in dfs (anhi via rding)
PIG-1334: Make pig artifacts available through maven (niraj via rding)
PIG-1466: Improve log messages for memory usage (thejas)
PIG-1404: added PigUnit, a framework for building unit tests of Pig Latin scripts (romainr via gates)
PIG-1452: to remove hadoop20.jar from lib and use hadoop from the apache maven
repo. (rding)
PIG-1295: Binary comparator for secondary sort (azaroth via daijy)
PIG-1448: Detach tuple from inner plans of physical operator (thejas)
PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi
via olgan)
PIG-103: Shared Job /tmp location should be configurable (niraj via rding)
PIG-1496: Mandatory rule ImplicitSplitInserter (yanz via daijy)
PIG-346: grunt help command cleanup (olgan)
PIG-1199: help includes obsolete options (olgan)
PIG-1434: Allow casting relations to scalars (aniket486 via rding)
PIG-1461: support union operation that merges based on column names (thejas)
PIG-1517: Pig needs to support keywords in the package name (aniket486 via olgan)
PIG-928: UDFs in scripting languages (aniket486 via daijy)
PIG-1509: Add .gitignore file (cwsteinbach via gates)
PIG-1478: Add progress notification listener to PigRunner API (rding)
PIG-1472: Optimize serialization/deserialization between Map and Reduce and between MR jobs (thejas)
PIG-1389: Implement Pig counter to track number of rows for each input files
(rding)
PIG-1454: Consider clean up backend code (rding)
PIG-1333: API interface to Pig (rding)
PIG-1405: Need to move many standard functions from piggybank into Pig
(aniket486 via daijy)
PIG-1427: Monitor and kill runaway UDFs (dvryaboy)
PIG-1428: Make a StatusReporter singleton available for incrementing counters (dvryaboy)
PIG-972: Make describe work with nested foreach (aniket486 via daijy)
PIG-1438: [Performance] MultiQueryOptimizer should also merge DISTINCT jobs
(rding)
PIG-1441: new test targets (olgan)
PIG-282: Custom Partitioner (aniket486 via daijy)
PIG-283: Allow to set arbitrary jobconf key-value pairs inside pig program (hashutosh)
PIG-1373: We need to add jdiff output to docs on the website (daijy)
PIG-1422: Duplicate code in LOPrinter.java (zjffdu)
PIG-1420: Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple (rjurney via dvryaboy)
PIG-1408: Annotate explain plans with aliases (rding)
PIG-1410: Make PigServer handle files with parameters (zjffdu)
PIG-1406: Allow to run shell commands from grunt (zjffdu)
PIG-1398: Marking Pig interfaces for org.apache.pig.data package (gates)
PIG-1396: eclipse-files target in build.xml fails to generate necessary classes in src-gen
PIG-1390: Provide a target to generate eclipse-related classpath and files (chaitk via thejas)
PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy)
PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1363: Unnecessary loadFunc instantiations (hashutosh)
PIG-1370: Marking Pig interface for org.apache.pig package (gates)
PIG-1354: UDFs for dynamic invocation of simple Java methods (dvryaboy)
PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that
bzip files can be efficiently processed by splitting the files (pradeepkth)
PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in
subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema()
(pradeepkth)
PIG-1413: Remove svn:externals reference for test-patch.sh and
create a local copy of test-patch.sh (gkesavan)
PIG-1302: Include zebra's "pigtest" ant target as a part of pig's
ant test target. (gkesavan)
PIG-1582: To upgrade commons-logging
OPTIMIZATIONS
PIG-1353: Map-side joins (ashutoshc)
PIG-1309: Map-side Cogroup (ashutoshc)
BUG FIXES
PIG-2067: FilterLogicExpressionSimplifier removed some branches in some cases (daijy)
PIG-2033: Pig returns success for the failed Pig script (rding)
PIG-1993: PigStorageSchema throw NPE with ColumnPruning (daijy)
PIG-1935: New logical plan: Should not push up filter in front of Bincond (daijy)
PIG-1912: non-deterministic output when a file is loaded multiple times (daijy)
PIG-1892: Bug in new logical plan : No output generated even though there are
valid records (daijy)
PIG-1808: Error message in 0.8 not much helpful as compared to 0.7 (daijy)
PIG-1850: Order by is failing with ClassCastException if schema is undefined
for new logical plan in 0.8 (daijy)
PIG-1831: Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf (daijy)
PIG-1841: TupleSize implemented incorrectly (laukik via daijy)
PIG-1843: NPE in schema generation (daijy)
PIG-1820: New logical plan: FilterLogicExpressionSimplifier fail to deal with UDF (daijy)
PIG-1854: Pig returns exit code 0 for the failed Pig script (rding)
PIG-1812: Problem with DID_NOT_FIND_LOAD_ONLY_MAP_PLAN (daijy)
PIG-1813: Pig 0.8 throws ERROR 1075 while trying to refer a map in the result
of eval udf. Works with 0.7 (daijy)
PIG-1776: changing statement corresponding to alias after explain, then
doing dump gives incorrect result (thejas)
PIG-1800: Missing Signature for maven staging release (rding)
PIG-1815: pig task retains used instances of PhysicalPlan (thejas)
PIG-1785: New logical plan: uid conflict in flattened fields (daijy)
PIG-1787: Error in logical plan generated (daijy)
PIG-1791: System property mapred.output.compress, but pig-cluster-hadoop-site.xml doesn't (daijy)
PIG-1771: New logical plan: Merge schema fail if LoadFunc.getSchema return different schema with "Load...AS" (daijy)
PIG-1766: New logical plan: ImplicitSplitInserter should run before DuplicateForEachColumnRewrite (daijy)
PIG-1762: Logical simplification fails on map key referenced values (yanz)
PIG-1761: New logical plan: Exception when bag dereference in the middle of expression (daijy)
PIG-1757: After split combination, the number of maps may vary slightly (yanz)
PIG-1760: Need to report progress in all databags (rding)
PIG-1709: Skewed join use fewer reducer for extreme large key (daijy)
PIG-1751: New logical plan: PushDownForEachFlatten fail in UDF with unknown
output schema (daijy)
PIG-1741: Lineage fail when flatten a bag (daijy)
PIG-1739: zero status is returned when pig script fails (yanz)
PIG-1738: New logical plan: Optimized UserFuncExpression.getFieldSchema (daijy)
PIG-1732: New logical plan: logical plan get confused if we generate the same
field twice in ForEach (daijy)
PIG-1737: New logical plan: Improve error messages when merge schema fail (daijy)
PIG-1725: New logical plan: uidOnlySchema bug in LOGenerate (daijy)
PIG-1729: New logical plan: Dereference does not add into plan after deepCopy (daijy)
PIG-1721: New logical plan: script fail when reuse foreach inner alias (daijy)
PIG-1716: New logical plan: LogToPhyTranslationVisitor should translate the structure for regex optimization (daijy)
PIG-1740: Fix SVN location in setup doc (chandec via olgan)
PIG-1719: New logical plan: FieldSchema generation for BinCond is wrong (daijy)
PIG-1720: java.lang.NegativeArraySizeException during Quicksort (thejas)
PIG-1727: Hadoop default config override pig.properties (rding)
PIG-1731: Stack Overflows where there are composite logical expressions on UDFs using the new logical plan (yanz)
PIG-1723: Need to limit the length of Pig counter names (rding)
PIG-1714: Option mapred.output.compress doesn't work in Pig 0.8 but worked in
0.7 (xuefuz via rding)
PIG-1715: pig-withouthadoop.jar missing automaton.jar (thejas)
PIG-1706: New logical plan: PushDownFlattenForEach fail if flattened field has user defined schema (daijy)
PIG-1705: New logical plan: self-join fail for some queries (daijy)
PIG-1704: Output Compression is not at work if the output path is absolute and there is a trailing / after the compression suffix (yanz)
PIG-1695: MergeForEach does not carry user defined schema if any one of the merged ForEach has user defined schema (daijy)
PIG-1684: Inconsistent usage of store func. (thejas)
PIG-1694: union-onschema projects null schema at parsing stage for some queries (thejas)
PIG-1685: Pig is unable to handle counters for glob paths ? (daijy)
PIG-1683: New logical plan: Nested foreach plan fail if one inner alias is referred more than once (daijy)
PIG-1542: log level not propagated to MR task loggers (nrai via daijy)
PIG-1673: query with consecutive union-onschema statement errors out (thejas)
PIG-1653: Scripting UDF fails if the path to script is an absolute path (daijy)
PIG-1669: PushUpFilter fail when filter condition contains scalar (daijy)
PIG-1672: order of relations in replicated join gets switched in a query where
first relation has two mergeable foreach statements (thejas)
PIG-1666: union onschema fails when the input relation has cast from bytearray to another type (thejas)
PIG-1655: code duplicated for udfs that were moved from piggybank to builtin (nrai via daijy)
PIG-1670: pig throws ExecException instead of FrontEnd exception when the plan validation fails (nrai via daijy)
PIG-1668: Order by failed with RuntimeException (rding)
PIG-1659: sortinfo is not set for store if there is a filter after ORDER BY (daijy)
PIG-1664: leading '_' in directory/file names should be ignored; the "pigtest" build target should include all pig-related zebra tests. (yanz)
PIG-1662: Need better error message for MalFormedProbVecException (rding)
PIG-1656: TOBAG udfs ignores columns with null value; it does not use input type
to determine output schema (thejas)
PIG-1658: ORDER BY does not work properly on integer/short keys that are -1 (yanz)
PIG-1638: sh output gets mixed up with the grunt prompt (nrai via daijy)
PIG-1607: pig should have separate javadoc.jar in the maven
repository (nrai via thejas)
PIG-1651: PIG class loading mishandled (rding)
PIG-1650: pig grunt shell breaks for many commands like perl, awk,
pipe, 'ls -l' etc (nrai via thejas)
PIG-1649: FRJoin fails to compute number of input files for replicated
input (thejas)
PIG-1637: Combiner not used because optimizer inserts a foreach between group
and algebraic function (daijy)
PIG-1648: Split combination may return too many block locations to map/reduce framework (yanz)
PIG-1641: Incorrect counters in local mode (rding)
PIG-1647: Logical simplifier throws a NPE (yanz)
PIG-1642: Order by doesn't use estimation to determine the parallelism (rding)
PIG-1644: New logical plan: Plan.connect with position is misused in some
places (daijy)
PIG-1643: join fails for a query with input having 'load using pigstorage
without schema' + 'foreach' (daijy)
PIG-1645: Using both small split combination and temporary file compression on a query of ORDER BY may cause crash (yanz)
PIG-1635: Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of
AND and OR may get changed (yanz)
PIG-1639: New logical plan: PushUpFilter should not push before group/cogroup
if filter condition contains UDF (xuefuz via daijy)
PIG-1643: join fails for a query with input having 'load using pigstorage
without schema' + 'foreach' (thejas)
PIG-1628: log this message at debug level : 'Pig Internal storage in use' (thejas)
PIG-1636: Scalar fail if the scalar variable is generated by limit (daijy)
PIG-1605: Adding soft link to plan to solve input file dependency
(daijy)
PIG-1598: Pig gobbles up error messages - Part 2 (nrai via daijy)
PIG-1616: 'union onschema' does not create output with correct schema
when udfs are involved (thejas)
PIG-1610: 'union onschema' does not handle some cases involving 'namespaced'
column names in schema (thejas)
PIG-1609: 'union onschema' should give a more useful error message when
schema of one of the relations has null column name (thejas)
PIG-1562: Fix the version for the dependent packages for the maven (nrai via
rding)
PIG-1604: 'relation as scalar' does not work with complex types (thejas)
PIG-1601: Make scalar work for secure hadoop (daijy)
PIG-1602: The .classpath of eclipse template still use hbase-0.20.0 (zjffdu)
PIG-1596: NPE's thrown when attempting to load hbase columns containing null values (zjffdu)
PIG-1597: Development snapshot jar no longer picked up by bin/pig (dvryaboy)
PIG-1599: pig gives generic message for few cases (nrai via rding)
PIG-1595: casting relation to scalar- problem with handling of data from non PigStorage loaders (thejas)
PIG-1591: pig does not create a log file, if the MR job succeeds but front end fails (nrai via daijy)
PIG-1543: IsEmpty returns the wrong value after using LIMIT (daijy)
PIG-1550: better error handling in casting relations to scalars (thejas)
PIG-1572: change default datatype when relations are used as scalar to bytearray (thejas)
PIG-1583: piggybank unit test TestLookupInFiles is broken (daijy)
PIG-1563: some of string functions don't work on bytearrays (olgan)
PIG-1569: java properties not honored in case of properties such as
stop.on.failure (rding)
PIG-1570: native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs (thejas)
PIG-1343: pig_log file missing even though Main tells it is creating one and
an M/R job fails (nrai via rding)
PIG-1482: Pig gets confused when more than one loader is involved (xuefuz via thejas)
PIG-1579: Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput (daijy)
PIG-1557: couple of issues mapping aliases to jobs (rding)
PIG-1552: Nested describe failed when the alias is not referred in the first foreach inner plan (aniket486 via daijy)
PIG-1486: update ant eclipse-files target to include new jar and remove contrib dirs from build path (thejas)
PIG-1524: 'Proactive spill count' is misleading (thejas)
PIG-1546: Incorrect assert statements in operator evaluation (ajaykidave via
pradeepkth)
PIG-1392: Parser fails to recognize valid field (niraj via rding)
PIG-1541: FR Join shouldn't match null values (rding)
PIG-1525: Incorrect data generated by diff of SUM (rding)
PIG-1288: EvalFunc returnType is wrong for generic subclasses (daijy)
PIG-1534: Code discovering UDFs in the script has a bug in a order by case
(pradeepkth)
PIG-1533: Compression codec should be a per-store property (rding)
PIG-1527: No need to deserialize UDFContext on the client side (rding)
PIG-1516: finalize in bag implementations causes pig to run out of memory in reduce (thejas)
PIG-1521: explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used (thejas)
PIG-1513: Pig doesn't handle empty input directory (rding)
PIG-1500: guava.jar should be removed from the lib folder (niraj via rding)
PIG-1034: Pig does not support ORDER ... BY group alias (zjffdu)
PIG-1445: Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented (daijy)
PIG-348: -j command line option doesn't work (rding)
PIG-1487: Replace "bz" with ".bz" in all the LoadFunc
PIG-1489: Pig MapReduceLauncher does not use jars in register statement
(rding)
PIG-1435: make sure dependent jobs fail when a job in multiquery fails (niraj
via rding)
PIG-1492: DefaultTuple and DefaultMemory underestimate their memory footprint (thejas)
PIG-1409: Fix up javadocs for org.apache.pig.builtin (gates)
PIG-1490: Make Pig storers work with remote HDFS in secure mode (rding)
PIG-1469: DefaultDataBag assumes ArrayList as default List type (azaroth via dvryaboy)
PIG-1467: order by fail when set "fs.file.impl.disable.cache" to true (daijy)
PIG-1463: Replace "bz" with ".bz" in setStoreLocation in PigStorage (zjffdu)
PIG-1221: Filter equality does not work for tuples (zjffdu)
PIG-1456: TestMultiQuery takes a long time to run (rding)
PIG-1457: Pig will run complete zebra test even we give -Dtestcase=xxx (daijy)
PIG-1450: TestAlgebraicEvalLocal failures due to OOM (daijy)
PIG-1433: pig should create success file if
mapreduce.fileoutputcommitter.marksuccessfuljobs is true (pradeepkth)
PIG-1347: Clear up output directory for a failed job (daijy)
PIG-1419: Remove "user.name" from JobConf (daijy)
PIG-1359: bin/pig script does not pick up correct jar libraries (zjffdu)
PIG-566: Dump and store outputs do not match for PigStorage (azaroth via daijy)
PIG-1414: Problem with parameter substitution (rding)
PIG-1407: Logging starts before being configured (azaroth via daijy)
PIG-1391: pig unit tests leave behind files in temp directory because
MiniCluster files don't get deleted (thejas)
PIG-1211: Pig script runs half way after which it reports syntax error
(pradeepkth)
PIG-1401: "explain -script <script file>" executes grunt commands like
run/dump/copy etc - explain -script should not execute any grunt command and
only explain the query plans (pradeepkth)
PIG-1303: Inconsistent instantiation of parametrized UDFs (jrussek and dvryaboy)
PIG-740: Incorrect line number is generated when a string with double quotes is
used instead of single quotes and is passed to UDF (pradeepkth)
PIG-1378: har url not usable in Pig scripts (pradeepkth)
PIG-1395: Mapside cogroup runs out of memory (ashutoshc)
PIG-1383: Remove empty svn directories from source tree (rding)
PIG-1348: PigStorage making unnecessary byte array copy when storing data
(rding)
PIG-1372: Restore PigInputFormat.sJob for backward compatibility (pradeepkth)
PIG-1369: POProject does not handle null tuples and non existent fields in
some cases (pradeepkth)
PIG-1364: Public javadoc on apache site still on 0.2, needs to be updated for each version release (gates)
PIG-1338: Pig should exclude hadoop conf in local mode (daijy)
PIG-1299: Implement Pig counter to track number of output rows for each output
files (rding)
PIG-1366: PigStorage's pushProjection implementation results in NPE under
certain data conditions (pradeepkth)
PIG-1365: WrappedIOException is missing from Pig.jar (pradeepkth)
PIG-1313: PigServer leaks memory over time (billgraham via daijy)
PIG-1346: In unit tests Util.executeShellCommand relies on java commands being
in the path and does not consider JAVA_HOME (pradeepkth)
PIG-1352: piggybank UPPER udf throws exception if argument is null
PIG-1560: Fix ant target checkstyle (gkesavan)
Release 0.7.0
INCOMPATIBLE CHANGES
PIG-1292: Interface Refinements (ashutoshc)
PIG-1259: ResourceFieldSchema.setSchema should not allow a bag field without a
Tuple as its only sub field (the tuple itself can have a schema with > 1
subfields) (pradeepkth)
PIG-1265: Change LoadMetadata and StoreMetadata to use Job instead of
Configuration and add a cleanupOnFailure method to StoreFuncInterface
(pradeepkth)
PIG-1250: Make StoreFunc an abstract class and create a mirror interface
called StoreFuncInterface (pradeepkth)
PIG-1234: Unable to create input slice for har:// files (pradeepkth)
PIG-1200: Using TableInputFormat in HBaseStorage (zjffdu via pradeepkth)
PIG-1148: Move splitable logic from pig latin to InputFormat (zjffdu via
pradeepkth)
PIG-1141: Make streaming work with the new load-store interfaces (rding via
pradeepkth)
PIG-1110: Handle compressed file formats -- Gz, BZip with the new proposal
(rding via pradeepkth)
PIG-1088: change merge join and merge join indexer to work with new LoadFunc
interface (thejas via pradeepkth)
PIG-879: Pig should provide a way for input location string in load statement
to be passed as-is to the Loader (rding via pradeepkth)
PIG-966: load-store-redesign branch: change SampleLoader and subclasses to
work with new LoadFunc interface (thejas via pradeepkth)
PIG-1094: Fix unit tests corresponding to source changes so far (pradeepkth)
PIG-1090: Update sources to reflect recent changes in load-store interfaces
(pradeepkth)
PIG-1072: ReversibleLoadStoreFunc interface should be removed to enable
different load and store implementation classes to be used in a reversible
manner (rding via pradeepkth)
IMPROVEMENTS
PIG-1381: Need a way for Pig to take an alternative property file (daijy)
PIG-1330: Move pruned schema tracking logic from LoadFunc to core code (daijy)
PIG-1320: more documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1320: documentation updates for Pig 0.7.0 (chandec via olgan)
PIG-1325: Provide a way to exclude a testcase when running "ant test"
(pradeepkth)
PIG-1312: Make Pig work with hadoop security (daijy)
PIG-1308: Infinite loop in JobClient when reading from BinStorage Message:
[org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to
process : 2] (pradeepkth)
PIG-1285: Allow SingleTupleBag to be serialized (dvryaboy)
PIG-1117: Pig reading hive columnar rc tables (gerritjvv via dvryaboy)
PIG-1287: Use hadoop-0.20.2 with pig 0.7.0 release (pradeepkth)
PIG-1257: PigStorage per the new load-store redesign should support splitting
of bzip files (pradeepkth)
PIG-1290: WeightedRangePartitioner should not check if input is empty if
quantile file is empty (pradeepkth)
PIG-1262: Additional findbugs and javac warnings (daijy)
PIG-1248: [piggybank] some useful String functions (dvryaboy)
PIG-1251: Move SortInfo calculation earlier in compilation (ashutoshc)
PIG-1233: NullPointerException in AVG (ankur via olgan)
PIG-1218: Use distributed cache to store samples (rding via pradeepkth)
PIG-1226: support for additional jar files (thejas via olgan)
PIG-1230: Streaming input in POJoinPackage should use nonspillable bag to
collect tuples (ashutoshc)
PIG-1224: Collected group should change to use new (internal) bag (ashutoshc)
PIG-1046: join algorithm specification is within double quotes (ashutoshc)
PIG-1209: Port POJoinPackage to proactively spill (ashutoshc)
PIG-1190: Handling of quoted strings in pig-latin/grunt commands (ashutoshc)
PIG-1214: Pig 0.6 Docs fixes (chandec via olgan)
PIG-977: exit status does not account for JOB_STATUS.TERMINATED (ashutoshc)
PIG-1192: Pig 0.6 Docs fixes (chandec via olgan)
PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan)
PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)
PIG-1102: Collect number of spills per job (sriranjan via olgan)
PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs
(dvryaboy via pradeepkth)
PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan)
PIG-1163: Pig/Zebra 0.6.0 release (chandec via olgan)
PIG-1156: Add aliases to ExecJobs and PhysicalOperators (dvryaboy via gates)
PIG-1161: add missing license headers (dvryaboy via olgan)
PIG-760: Add a new PigStorageSchema load/store function that
store schemas for text files (dvryaboy via gates)
PIG-1106: FR join should not spill (ankit.modi via olgan)
PIG-1147: Zebra Docs for Pig 0.6.0 (chandec via olgan)
PIG-1129: Pig UDF doc: fieldsToRead function (chandec via olgan)
PIG-978: MQ docs update (chandec via olgan)
PIG-990: Provide a way to pin LogicalOperator Options (dvryaboy via gates)
PIG-1103: refactoring of commit tests (olgan)
PIG-1101: Allow argument to limit to be long in addition to int (ashutoshc via
gates)
PIG-872: use distributed cache for the replicated data set in FR join
(sriranjan via olgan)
PIG-1053: Consider moving to Hadoop for local mode (ankit.modi via olgan)
PIG-1085: Pass JobConf and UDF specific configuration information to UDFs
(gates)
PIG-1173: pig cannot be built without an internet connection (jmhodges via daijy)
OPTIMIZATIONS
BUG FIXES
PIG-1507: Full outer join fails while doing a filter on joined data (daijy)
PIG-1493: Column Pruner throw exception "inconsistent pruning" (daijy)
PIG-1484: BinStorage should support comma separated path (daijy)
PIG-1443: DefaultTuple underestimate the memory footprint for string (daijy)
PIG-1446: OOME in a query having a bincond in the inner plan of a Foreach. (hashutosh)
PIG-1415: LoadFunc signature is not correct in LoadFunc.getSchema sometimes (daijy)
PIG-1403: Make Pig work with remote HDFS in secure mode (daijy)
PIG-1394: POCombinerPackage hold too much memory for InternalCachedBag (daijy)
PIG-1374: PushDownForeachFlatten shall not push ForEach below Join if the flattened fields is used in the next statement (daijy)
PIG-1336: Optimize POStore serialized into JobConf (daijy)
PIG-1335: UDFFinder should find LoadFunc used by POCast (daijy)
PIG-1307: when we spill the DefaultDataBag we are not setting the sized changed flag to be true. (breed via daijy)
PIG-1298: Restore file traversal behavior to Pig loaders (rding)
PIG-1289: PIG Join fails while doing a filter on joined data (daijy)
PIG-1266: Show spill count on the pig console at the end of the job (sriranjan
via rding)
PIG-1296: Skewed join fail due to negative partition index (daijy)
PIG-1293: pig wrapper script tends to fail if pig is in the path and PIG_HOME
isn't set (aw via gates)
PIG-1272: Column pruner causes wrong results (daijy)
PIG-1275: empty bag in PigStorage read as null (daijy)
PIG-1252: Diamond splitter does not generate correct results when using
Multi-query optimization (rding)
PIG-1260: Param Substitution results in parser error if there is no EOL after
last line in script (rding)
PIG-1238: Dump does not respect the schema (rding)
PIG-1261: PigStorageSchema broke after changes to ResourceSchema (dvryaboy via
daijy)
PIG-1053: Put pig.properties back into release distribution (gates).
PIG-1273: Skewed join throws error (rding)
PIG-1267: Problems with partition filter optimizer (rding)
PIG-1079: Modify merge join to use distributed cache to maintain the index
(rding)
PIG-1241: Accumulator is turned on when a map is used with a non-accumulative
UDF (yinghe via olgan)
PIG-1215: Make Hadoop jobId more prominent in the client log (ashutoshc)
PIG-1216: New load store design does not allow Pig to validate inputs and
outputs up front (ashutoshc via pradeepkth)
PIG-1239: PigContext.connect() should not create a jobClient and jobClient
should be created on demand when needed (pradeepkth)
PIG-1169: Top-N queries produce incorrect results when a store statement is added between order by and limit statement (rding)
PIG-1131: Pig simple join does not work when it contains empty lines (ashutoshc)
PIG-834: incorrect plan when algebraic functions are nested (ashutoshc)
PIG-1217: Fix argToFuncMapping in Piggybank Top function (dvryaboy via gates)
PIG-1154: Local Mode fails when hadoop config directory is specified in
classpath (ankit.modi via gates)
PIG-1124: Unable to set Custom Job Name using the -Dmapred.job.name parameter (ashutoshc)
PIG-1213: Schema serialization is broken (pradeepkth)
PIG-1194: ERROR 2055: Received Error while processing the map plan (rding via ashutoshc)
PIG-1204: Pig hangs when joining two streaming relations in local mode
(rding)
PIG-1191: POCast throws exception for certain sequences of LOAD, FILTER,
FOREACH (pradeepkth via gates)
PIG-1171: Top-N queries produce incorrect results when followed by a cross statement (rding via olgan)
PIG-1159: merge join right side table does not support comma separated paths
(rding via olgan)
PIG-1158: pig command line -M option doesn't support table union correctly
(comma separated paths) (rding via olgan)
PIG-1143: Poisson Sample Loader should compute the number of samples required
only once (sriranjan via olgan)
PIG-1157: Successive replicated joins do not generate Map Reduce plan and fail
due to OOM (rding via olgan)
PIG-1075: Error in Cogroup when key fields types don't match (rding via olgan)
PIG-973: type resolution inconsistency (rding via olgan)
PIG-1135: skewed join partitioner returns negative partition index (yinghe
via olgan)
PIG-1134: Skewed Join sampling job overwhelms the name node (sriranjan via
olgan)
PIG-1105: COUNT_STAR accumulate interface implementation cases failure
(sriranjan via olgan)
PIG-1118: expression with aggregate functions returning null, with accumulate
interface (yinghe via olgan)
PIG-1068: COGROUP fails with 'Type mismatch in key from map: expected
org.apache.pig.impl.io.NullableText, recieved
org.apache.pig.impl.io.NullableTuple' (rding via gates)
PIG-1113: Diamond query optimization throws error in JOIN (rding via olgan)
PIG-1116: Remove redundant map-reduce job for merge join (pradeepkth)
PIG-1114: MultiQuery optimization throws error when merging 2 level spl (rding
via olgan)
PIG-1108: Incorrect map output key type in MultiQuery optimiza (rding via
olgan)
PIG-1022: optimizer pushes filter before the foreach that generates column
used by filter (daijy via gates)
PIG-1107: PigLineRecordReader bails out on an empty line for compressed data
(ankit.modi via olgan)
PIG-598: Parameter substitution ($PARAMETER) should not be performed in
comments (thejas via olgan)
PIG-1064: Behaviour of COGROUP with and without schema when using "*" operator
(pradeepkth)
PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi
via )
PIG-1086: Nested sort by * throw exception (rding via daijy)
PIG-1146: Inconsistent column pruning in LOUnion (daijy)
PIG-1176: Column Pruner issues in union of loader with and without schema
(daijy)
PIG-1184: PruneColumns optimization does not handle the case of foreach
flatten correctly if flattened bag is not used later (daijy)
PIG-1189: StoreFunc UDF should ship to the backend automatically without
"register" (daijy)
PIG-1212: LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null (daijy)
PIG-1255: Tiny code cleanup for serialization code for PigSplit (daijy)
PIG-613: Casting elements inside a tuple does not take effect (daijy)
Release 0.6.0
INCOMPATIBLE CHANGES
PIG-922: Logical optimizer: push up project (daijy)
IMPROVEMENTS
PIG-1084: Pig 0.6.0 Documentation improvements (chandec via olgan)
PIG-1089: Pig 0.6.0 Documentation (chandec via olgan)
PIG-958: Splitting output data on key field (ankur via pradeepkth)
PIG-1058: FINDBUGS: remaining "Correctness Warnings" (olgan)
PIG-1036: Fragment-replicate left outer join (ankit.modi via pradeepkth)
PIG-920: optimizing diamond queries (rding via pradeepkth)
PIG-1040: FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be (olgan)
PIG-1059: FINDBUGS: remaining Bad practice + Multithreaded correctness Warning (olgan)
PIG-953: Enable merge join in pig to work with loaders and store functions
which can internally index sorted data (pradeepkth)
PIG-1055: FINDBUGS: remaining "Dodgy Warnings" (olgan)
PIG-1052: FINDBUGS: remaining performance warnings (olgan)
PIG-1037: Converted sorted and distinct bags to use the new active spilling
paradigm (yinghe via gates)
PIG-1051: FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (olgan)
PIG-1050: FINDBUGS: DLS_DEAD_LOCAL_STORE: Dead store to local variable (olgan)
PIG-1045: Integration with Hadoop 20 New API (rding via pradeepkth)
PIG-1043: FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class
(olgan)
PIG-1047: FINDBUGS: URF_UNREAD_FIELD: Unread field (olgan)
PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new
String(String) constructor (olgan)
PIG-984: Add map side grouping for data that is already collected when
it is read into the map (rding via gates)
PIG-1025: Add ability to set job priority from Pig Latin script (kevinweil via
gates)
PIG-1028: FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number
constructor; use static valueOf instead (olgan)
PIG-1012: FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in
serializable class (olgan)
PIG-1013: FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on
an array (olgan)
PIG-1011: FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't
define serialVersionUID (olgan)
PIG-1009: FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream (olgan)
PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan)
PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with
a lower case letter (olgan)
PIG-1023: FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL (olgan)
PIG-1019: added findbugs exclusion file (olgan)
PIG-983: PERFORMANCE: multi-query optimization on multiple group bys
following a join or cogroup (rding via pradeepkth)
PIG-975: Need a databag that does not register with SpillableMemoryManager and
spill data pro-actively (yinghe via olgan)
PIG-891: Fixing dfs statement for Pig (zjffdu via daijy)
PIG-956: 10 minute commit tests (olgan)
PIG-948: [Usability] Relating pig script with MR jobs (ashutoshc via daijy)
PIG-960: Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage (ankit.modi via daijy)
PIG-1020: Include an ant target to build pig.jar without hadoop libraries (daijy)
PIG-1033: javac warnings: deprecated hadoop APIs (daijy)
PIG-1041: javac warnings: cast, fallthrough, serial (daijy)
PIG-1042: javac warnings: unchecked (daijy)
PIG-1038: Optimize nested distinct/sort to use secondary key (daijy)
PIG-979: Accumulator Interface for UDFs (yinghe via daijy)
OPTIMIZATIONS
PIG-922: Logical optimizer: push up project (daijy)
BUG FIXES
PIG-1080: PigStorage may miss records when loading a file (rding via olgan)
PIG-1071: Support comma separated file/directory names in load statements
(rding via pradeepkth)
PIG-970: Changes to make HBase loader work with HBase 0.20 (vbarat and zjffdu
via gates)
PIG-1035: support for skewed outer join (sriranjan via pradeepkth)
PIG-1030: explain and dump not working with two UDFs inside inner plan of
foreach (rding via pradeepkth)
PIG-1048: inner join using 'skewed' produces multiple rows for keys with
single row in both input relations (sriranjan via gates)
PIG-1063: Pig does not call checkOutSpecs() on OutputFormat provided by
StoreFunc in the multistore case (pradeepkth)
PIG-746: Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag
should never be serialized (rding via pradeepkth)
PIG-1027: Number of bytes written are always zero in local mode (zjffdu via gates)
PIG-976: Multi-query optimization throws ClassCastException (rding via
pradeepkth)
PIG-858: Order By followed by "replicated" join fails while compiling MR-plan
from physical plan (ashutoshc via gates)
PIG-968: Fix findContainingJar to work properly when there is a + in the jar
path (tlipcon via gates)
PIG-738: Regexp passed from pigscript fails in UDF (pradeepkth)
PIG-942: Maps are not implicitly casted (pradeepkth)
PIG-513: Removed unnecessary bounds check in DefaultTuple (ashutoshc via
gates)
PIG-951: Set parallelism explicitly to 1 for indexing job in merge join
(ashutoshc via gates)
PIG-592: schema inferred incorrectly (daijy)
PIG-989: Allow type merge between numerical type and non-numerical type (daijy)
PIG-894: order-by fails when input is empty (daijy)
PIG-995: Limit Optimizer throw exception "ERROR 2156: Error while fixing projections" (daijy)
PIG-1000: InternalCachedBag.java generates javac warning and findbug warning (yinghe via daijy)
PIG-921: Strange use case for Join which produces different results in local and map reduce mode (daijy)
PIG-1024: Script contains nested limit fail due to "LOLimit does not support multiple outputs" (daijy)
PIG-644: Duplicate column names in foreach do not throw parser error (daijy)
PIG-927: null should be handled consistently in Join (daijy)
PIG-790: Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond) (daijy)
PIG-1001: Generate more meaningful error message when one input file does not exist (daijy)
PIG-1060: MultiQuery optimization throws error for multi-level splits (rding via daijy)
PIG-1128: column pruning causing failure when foreach has user-specified
schema (daijy)
PIG-1127: Logical operator should contain individual copy of schema object
(daijy)
PIG-1133: UDFContext should be made available to LoadFunc.bindTo (daijy)
PIG-1132: Column Pruner issues in dealing with unprunable loader (daijy)
PIG-1142: Got NullPointerException merge join with pruning (daijy)
PIG-1155: Need to make sure existing loaders work "as is" (daijy)
PIG-1144: set default_parallelism construct does not set the number of
reducers correctly (daijy)
PIG-1165: Signature of loader does not set correctly for order by (daijy)
PIG-761: ERROR 2086 on simple JOIN (daijy)
PIG-1172: PushDownForeachFlatten shall not push ForEach below Join if the
flattened fields is used in Join (daijy)
PIG-1180: Piggybank should compile even if we only have
"pig-withouthadoop.jar" but no "pig.jar" in the pig home directory (daijy)
PIG-1185: Data bags do not close spill files after using iterator to read
tuples (yinghe via daijy)
PIG-1186: Pig do not take values in "pig-cluster-hadoop-site.xml" (daijy)
PIG-1193: Secondary sort issue on nested desc sort (daijy)
PIG-1195: POSort should take care of sort order (daijy)
PIG-1210: fieldsToRead send the same fields more than once in some cases (daijy)
PIG-1231: DefaultDataBagIterator.hasNext() should be idempotent in all cases
(daijy)
Release 0.5.0
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-1039: documentation update (chandec via olgan)
OPTIMIZATIONS
BUG FIXES
PIG-963: Join in local mode matches null keys (pradeepkth)
PIG-660: Integration with Hadoop 20 (sms via olgan)
Release 0.4.0 - 2009-09-26
INCOMPATIBLE CHANGES
PIG-892: Make COUNT and AVG deal with nulls in accordance with the SQL standard
(olgan)
PIG-734: Changed maps to only take strings as keys (gates)
IMPROVEMENTS
PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)
PIG-578: join ... outer, ... outer semantics are no-ops, should produce
corresponding null values (pradeepkth)
PIG-936: making dump and PigDump independent from Tuple.toString (daijy)
PIG-890: Create a sampler interface and improve the skewed join sampler (sriranjan via daijy)
PIG-922: Logical optimizer: push up project part 1 (daijy)
PIG-812: COUNT(*) does not work (breed)
PIG-923: Allow specifying log file location through pig.properties (dvryaboy via daijy)
PIG-926: Merge-Join phase 2 (ashutoshc via pradeepkth)
PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth)
PIG-893: Added string -> integer, long, float, and double casts (zjffdu via gates)
PIG-833: Added Zebra, new columnar storage mechanism for HDFS (rangadi plus many others via gates)
PIG-697: Proposed improvements to pig's optimizer, Phase5 (daijy)
PIG-895: Default parallel for Pig (daijy)
PIG-820: Change RandomSampleLoader to take a LoadFunc instead of extending
BinStorage. Added new Samplable interface for loaders to implement
allowing them to be used by RandomSampleLoader (ashutoshc via gates)
PIG-832: Make import list configurable (daijy)
PIG-697: Proposed improvements to pig's optimizer (sms)
PIG-753: Allow UDFs with no parameters (zjffdu via gates)
PIG-765: jdiff for pig (gkesavan)
OPTIMIZATIONS
PIG-792: skew join implementation (sriranjan via olgan)
BUG FIXES
PIG-964: Handling null in skewed join (sriranjan via olgan)
PIG-962: Skewed join creates 3 map reduce jobs (sriranjan via olgan)
PIG-957: Tutorial is broken with 0.4 branch and trunk (pradeepkth)
PIG-955: Skewed join produces invalid results (yinghe via olgan)
PIG-954: Skewed join fails when pig.skewedjoin.reduce.memusage is not
configured (yinghe via olgan)
PIG-882: log level not propagated to loggers - duplicate message (daijy)
PIG-943: Pig crash when it cannot get counter from hadoop (daijy)
PIG-935: Skewed join throws an exception when used with map keys (sriranjan
via pradeepkth)
PIG-934: Merge join implementation currently does not seek to right point
on the right side input based on the offset provided by the index
(ashutoshc via pradeepkth)
PIG-925: Fix join in local mode (daijy)
PIG-913: Error in Pig script when grouping on chararray column (daijy)
PIG-907: Provide multiple version of HashFNV (Piggybank) (daijy)
PIG-905: TOKENIZE throws exception on null data (daijy)
PIG-901: InputSplit (SliceWrapper) created by Pig is big in size due to
serialized PigContext (pradeepkth)
PIG-882: log level not propagated to loggers (daijy)
PIG-880: Order by is broken with complex fields (sms)
PIG-773: Empty complex constants (empty bag, empty tuple and empty map)
should be supported (ashutoshc via sms)
PIG-695: Pig should not fail when error logs cannot be created (sms)
PIG-878: Pig is returning too many blocks in the input split. (arunc via gates)
PIG-888: Pig do not pass udf to the backend in some situation (daijy)
PIG-728: All backend error messages must be logged to preserve the
original error messages (sms)
PIG-877: Push up filter does not account for added columns in foreach
(sms)
PIG-883: udf import list does not send to the backend (daijy)
PIG-881: Pig should ship load udfs to the backend (daijy)
PIG-876: limit changes order of order-by to ascending (daijy)
PIG-851: Map type used as return type in UDFs not recognized at all times
(zjffdu via sms)
PIG-861: POJoinPackage lose tuple in large dataset (daijy)
PIG-797: Limit with ORDER BY producing wrong results (daijy)
PIG-850: Dump produce wrong result while "store into" is ok (daijy)
PIG-852: pig -version or pig -help returns exit code of 1 (milindb via
olgan)
PIG-849: Local engine loses records in splits (hagleitn via olgan)
PIG-939: Fix checkstyle ivy configuration (gkesavan)
Release 0.3.0 - Unreleased
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-817: documentation update (chandec via olgan)
PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates)
PIG-831: Turned off reporting of records and bytes written for multi-store
queries as the returned results are confusing and wrong. (gates)
PIG-813: documentation updates (chandec via olgan)
PIG-825: PIG_HADOOP_VERSION should be set to 18 (dvryaboy via gates)
PIG-795: support for SAMPLE command (ericg via olgan)
PIG-619: Create one InputSplit even when the input file is zero length
so that hadoop runs maps and creates output for the next
job (gates)
PIG-697: Proposed improvements to pig's optimizer (sms)
PIG-700: To automate the pig patch test process (gkesavan via sms)
PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu
via gates)
PIG-652: Adapt changes in store interface to multi-query changes (hagleitn
via gates)
PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create
empty bags (pradeepkth)
PIG-741: Allow limit to be nested in a foreach.
PIG-627: multiquery support phase 3 (hagleitn and Richard Ding via olgan)
PIG-743: To implement clover (gkesavan)
PIG-701: Implement IVY for resolving pig dependencies (gkesavan)
PIG-626: Add access to hadoop counters (shubhamc via gates)
PIG-627: multiquery support phase 1 and phase 2 (hagleitn and Richard Ding via pradeepkth)
BUG FIXES
PIG-846: MultiQuery optimization in some cases has an issue when there is a
split in the map plan (pradeepkth)
PIG-835: Multiquery optimization does not handle the case where the map keys
in the split plans have different key types (tuple and non tuple key type)
(pradeepkth)
PIG-839: incorrect return codes on failure when using -f or -e flags (hagleitn
via sms)
PIG-796: support conversion from numeric types to chararray (ashutoshc
via pradeepkth)
PIG-564: problem with parameter substitution and special characters (olgan)
PIG-802: PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan)
PIG-816: PigStorage() does not accept Unicode characters in its constructor (pradeepkth)
PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan)
PIG-819: run -param -param; is a valid grunt command (milindb via olgan)
PIG-656: Use of eval or any other keyword in the package hierarchy of a UDF causes
parse exception (milindb via sms)
PIG-814: Make Binstorage more robust when data contains record markers (pradeepkth)
PIG-811: Globs with "?" in the pattern are broken in local mode (hagleitn via
olgan)
PIG-810: Fixed NPE in PigStats (gates)
PIG-804: problem with lineage with double map redirection (pradeepkth)
PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs
"FileSystem closed" error on large input (pradeepkth)
PIG-693: Parameter to UDF which is an alias returned in another UDF in nested
foreach causes incorrect results (thejas via sms)
PIG-725: javadoc: warning - Multiple sources of package comments found for
package "org.apache.commons.logging" (gkesavan via sms)
PIG-745: Add DataType.toString() to force basic types to chararray, useful
for UDFs that want to handle all simple types as strings (ciemo via gates)
PIG-514: COUNT returns no results as a result of two filter statements in
FOREACH (pradeepkth)
PIG-789: Fix dump and illustrate to work with new multi-query feature
(hagleitn via gates)
PIG-774: Pig does not handle Chinese characters (in both the parameter substitution
using -param_file or embedded in the Pig script) correctly (daijy)
PIG-800: Fix distinct and order in local mode to not go into an infinite loop
(gates)
PIG-806: to remove author tags in the pig source code (sms)
PIG-799: Unit tests on windows are failing after multiquery commit (daijy)
PIG-781: Error reporting for failed MR jobs (hagleitn via olgan)
Release 0.2.0
INCOMPATIBLE CHANGES
PIG-157: Add types and rework execution pipeline (gates)
PIG-458: integration with Hadoop 18 (olgan)
NEW FEATURES
PIG-139: command line editing (daijy via olgan)
PIG-554: Added fragment replicate map side join (shravanmn via pkamath and gates)
PIG-535: added rmf command
PIG-704: Added ALIASES command that shows all currently defined ALIASES.
Changed semantics of DEFINE to define last used alias if no argument is
given (ericg via gates)
PIG-713: Added alias completion as part of tab completion in grunt (ericg
via gates)
IMPROVEMENTS
PIG-270: proper line number for parse errors (daijy via olgan)
PIG-367: convenience function for UDFs to name schema
PIG-443: Illustrate for the Types branch (shubhamc via olgan)
PIG-599: Added buffering to BufferedPositionedInputStream (gates)
PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth
via olgan)
PIG-628: misc performance improvements (pradeepkth via olgan)
PIG-589: error handling, phase 1-2 (sms via olgan)
PIG-590: error handling, phase 3 (sms)
PIG-591: error handling, phase 4 (sms)
PIG-545: PERFORMANCE: Sampler for order bys does not produce a good
distribution (pradeepkth)
PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan)
PIG-636: Use lightweight bag implementations which do not register with
SpillableMemoryManager with Combiner (pradeepkth)
PIG-563: support for multiple combiner invocations (pradeepkth via olgan)
PIG-465: performance improvement - removing keys from the value (pradeepkth
via olgan)
PIG-450: PERFORMANCE: Distinct should make use of combiner to remove
duplicate values from keys. (gates)
PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth
via gates)
BUG FIXES
PIG-294: string comparator unit tests (sms via pi_song)
PIG-258: cleaning up directories on failure (daijy via olgan)
PIG-363: fix for describe to produce schema name
PIG-368: making JobConf available to Load/Store UDFs
PIG-311: cross is broken
PIG-369: support for filter UDFs
PIG-375: support for implicit split
PIG-301: fix for order by descending
PIG-378: fix for GENERATE + LIMIT
PIG-362: don't push limit above generate with flatten
PIG-381: bincond does not handle null data
PIG-382: bincond throws typecast exception
PIG-352: java.lang.ClassCastException when invalid field is accessed
PIG-329: TestStoreOld, 2 unit tests were broken
PIG-353: parsing of complex types
PIG-392: error handling with multiple MRjobs
PIG-397: code defaults to single reducer
PIG-373: unconnected load causes problem
PIG-413: problem with float sum
PIG-398: Expressions not allowed inside foreach (sms via olgan)
PIG-418: divide by 0 problem
PIG-402: order by with user comparator (shravanmn via olgan)
PIG-415: problem with comparators (shravanmn via olgan)
PIG-422: cross is broken (shravanmn via olgan)
PIG-407: need to clone operators (pradeepkth via olgan)
PIG-428: TypeCastInserter does not replace projects in inner plans
correctly (pradeepkth via olgan)
PIG-421: error with complex nested plan (sms via olgan)
PIG-429: Self join with implicit split has the join output in wrong order
(pradeepkth via olgan)
PIG-434: short-circuit AND and OR (pradeepkth via olgan)
PIG-333: allowing no parentheses with single column alias with flatten (sms
via olgan)
PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan)
PIG-436: alias is lost when single column is flattened (pradeepkth via
olgan)
PIG-364: Limit return incorrect records when we use multiple reducer
(daijy via olgan)
PIG-439: disallow alias renaming (pradeepkth via olgan)
PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth
via olgan)
PIG-442: Disambiguated alias after a foreach flatten is not accessible a
couple of statements after the foreach (sms via olgan)
PIG-424: nested foreach with flatten and agg gives an error (sms via
olgan)
PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD
connection is fully established (olgan)
PIG-430: Projections in nested filter and inside foreach do not work (sms
via olgan)
PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries
(shravanmn via olgan)
PIG-444: job.jar is left behind (pradeepkth via olgan)
PIG-447: improved error messages (pradeepkth via olgan)
PIG-448: explain broken after load with types (pradeepkth via olgan)
PIG-380: invalid schema for databag constant (sms via olgan)
PIG-451: If a field is part of group followed by flatten, then referring
to it causes a parse error (pradeepkth via olgan)
PIG-455: "group" alias is lost after a flatten(group) (pradeepkth via olgan)
PIG-459: increased sleep time before checking for job progress
PIG-462: LIMIT N should create one output file with N rows (shravanmn via
olgan)
PIG-376: set job name (olgan)
PIG-463: POCast changes (pradeepkth via olgan)
PIG-427: casting input to UDFs
PIG-437: as in alias names causing problems (sms via olgan)
PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan)
PIG-470: TextLoader should produce bytearrays (sms via olgan)
PIG-335: lineage (sms via olgan)
PIG-464: bag schema definition (pradeepkth via olgan)
PIG-457: report 100% on successful jobs only (shravanmn via olgan)
PIG-471: ignoring status errors from hadoop (pradeepkth via olgan)
PIG-489: (*) processing (sms via olgan)
PIG-475: missing heartbeats (shravanmn via olgan)
PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan)
PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan)
PIG-501: Make branches/types work under cygwin (daijy via olgan)
PIG-504: cleanup illustrate not to produce cn= (shubhamc via olgan)
PIG-469: make sure that describe says "int" not "integer" (sms via olgan)
PIG-495: projecting of bags only give 1 field (olgan)
PIG-500: Load Func for POCast is not being set in some cases (sms via
olgan)
PIG-499: parser issue with as (sms via olgan)
PIG-507: permission error not reported (pradeepkth via olgan)
PIG-508: problem with double joins (pradeepkth via olgan)
PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan)
PIG-505: working with map elements (sms via olgan)
PIG-517: load function with parameters does not work with cast (pradeepkth
via olgan)
PIG-525: make sure cast for udf parameters works (olgan)
PIG-512: Expressions in foreach lead to errors (sms via olgan)
PIG-528: use UDF return in schema computation (sms via olgan)
PIG-527: allow PigStorage to write out complex output (sms via olgan)
PIG-537: Failure in Hadoop map collect stage due to type mismatch in the
keys used in cogroup (pradeepkth via olgan)
PIG-538: support for null constants (pradeepkth via olgan)
PIG-385: more null handling (pradeepkth via olgan)
PIG-546: FilterFunc calls empty constructor when it should be calling
parameterized constructor (sms via olgan)
PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via
olgan)
PIG-501: make unit tests run under windows (daijy via olgan)
PIG-543: Restore local mode to truly run locally instead of use map
reduce. (shubhamc via gates)
PIG-556: Changed FindQuantiles to report progress. Fixed issue with null
reporter being passed to EvalFuncs. (gates)
PIG-6: Add load support from hbase (hustlmsp via gates)
PIG-522: make negation work (pradeepkth via olgan)
PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple
error (pradeepkth via olgan)
PIG-572: A PigServer.registerScript() method, which lets a client
programmatically register a Pig Script. (shubhamc via gates)
PIG-570: problems with handling bzip data (breed via olgan)
PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan)
PIG-623: Fix spelling errors in output messages (tomwhite via sms)
PIG-622: Include pig executable in distribution (tomwhite via sms)
PIG-615: Wrong number of jobs with limit (shravanmn via sms)
PIG-635: POCast.java has incorrect formatting (sms)
PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext()
gives a null pointer exception (pradeepkth)
PIG-632: Improved error message for binary operators (sms)
PIG-636: Performance improvement: Use lightweight bag implementations which do not
register with SpillableMemoryManager with Combiner (pradeepkth)
PIG-631: 4 Unit test failures on Windows (daijy)
PIG-645: Streaming is broken with the latest trunk (pradeepkth)
PIG-646: Distinct UDF should report progress (sms)
PIG-647: memory sized passed on pig command line does not get propagated
to JobConf (sms)
PIG-648: BinStorage fails when it finds markers unexpectedly in the data
(pradeepkth)
PIG-649: RandomSampleLoader does not handle skipping correctly in
getNext() (pradeepkth)
PIG-560: UTFDataFormatException (encoded string too long) is thrown when
storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms)
PIG-642: Limit after FRJ causes problems (daijy)
PIG-637: Limit broken after order by in the local mode (shubhamc via
olgan)
PIG-553: EvalFunc.finish() not getting called (shravanmn via sms)
PIG-654: Optimize build.xml (daijy)
PIG-574: allowing to run scripts from within grunt shell (hagleitn via
olgan)
PIG-665: Map key type not correctly set (for use when key is null) when
map plan does not have localrearrange (pradeepkth)
PIG-590: error handling on the backend (sms via olgan)
PIG-590: error handling on the backend (sms)
PIG-658: Data type long : When 'L' or 'l' is included with data
(123L or 123l) load produces null value. Also the case with Float (thejas
via sms)
PIG-591: Error handling phase four (sms via pradeepkth)
PIG-664: Semantics of * is not consistent (sms)
PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms)
PIG-655: Comparison of schemas of bincond operands is flawed (sms via
pradeepkth)
PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth
via sms)
PIG-577: outer join query loses name information (sms via pradeepkth)
PIG-690: UNION doesn't work in the latest code (pradeepkth via sms)
PIG-544: Utf8StorageConverter.java does not always produce NULLs when data
is malformed (thejas via sms)
PIG-532: Casting a field removes its alias. (thejas via sms)
PIG-705: Pig should display a better error message when backend error
messages cannot be parsed (sms)
PIG-650: pig should look for and use the pig specific
'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the
HOD case (sms)
PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan)
PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan)
PIG-708: implement releaseaudit target to use rats on pig (gkesavan via
olgan)
PIG-703: user documentation (chandec via olgan)
PIG-711: Implement checkstyle for pig (gkesavan via olgan)
PIG-715: doc updates (chandec via olgan)
PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
PIG-692: When running a job from a script, use the name of that script as
the default name for the job (vzaliva via gates)
PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan)
PIG-720: further doc cleanup (gkesavan via olgan)
Release 0.1.1 - 2008-12-04
INCOMPATIBLE CHANGES
NEW FEATURES
IMPROVEMENTS
PIG-253: integration with hadoop-18
BUG FIXES
PIG-342: Fix DistinctDataBag to recalculate size after it has spilled.
(bdimcheff via gates)
Release 0.1.0 - 2008-09-11
INCOMPATIBLE CHANGES
PIG-123: requires escape of '\' in chars and string
NEW FEATURES
PIG-20: Added custom comparator functions for order by (phunt via gates)
PIG-94: Streaming implementation (arunc via olgan)
PIG-58: parameter substitution
PIG-55: added custom splitter (groves via olgan)
PIG-59: Add a new ILLUSTRATE command (shubhamc via gates)
PIG-256: Added variable argument support for UDFs (pi_song)
IMPROVEMENTS
PIG-8: added binary comparator (olgan)
PIG-11: Add capability to search for jar file to register. (antmagna via olgan)
PIG-7: Added use of combiner in some restricted cases. (gates)
PIG-47: Added methods to DataMap to provide access to its content
PIG-30: Rewrote DataBags to better handle decisions of when to spill to
disk and to spill more intelligently. (gates)
PIG-12: Added time stamps to log4j messages (phunt via gates)
PIG-44: Added adaptive decision of the number of records to hold in memory
before spilling (utkarsh)
PIG-56: Made DataBag implement Iterable. (groves via gates)
PIG-39: created more efficient version of read (spullara via olgan)
PIG-32: Abstraction layer (olgan)
PIG-83: Change everything except grunt and Main (PigServer on down) to use
common logging abstraction instead of log4j. By default in grunt, log4j
still used as logging layer. Also converted all System.out/err.println
statements to use logging instead. (francisoud via gates)
PIG-13: adding version to the system (joa23 via olgan)
PIG-113: Make explain output more understandable (pi_song via gates)
PIG-120: Support map reduce in local mode. To do this user needs to
specify execution type as mapreduce and cluster name as local (joa23 via gates)
PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates)
PIG-111: Reworked configuration to be settable via properties. (joa23, pi_song, oae via gates)
BUG FIXES
PIG-24: Files that were incorrectly placed under test/reports have been
removed. ant clean now cleans test/reports. (milindb via gates)
PIG-25: com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@)
PIG-23: Made pig work with java 1.5. (milindb via gates)
PIG-17: integrated with Hadoop 0.15 (olgan@)
PIG-33: Help was commented out - uncommented (olgan)
PIG-31: second half of concurrent mode problem addressed (olgan)
PIG-14: added heartbeat functionality (olgan)
PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release
PIG-29: fixed bag factory to be properly initialized (utkarsh)
PIG-43: fixed problem where using the combiner prevented a pig alias
from being evaluated more than once. (gates)
PIG-45: Fixed pig.pl to not assume hodrc file is named the same as
cluster name (gates)
PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples
instead of Tuples, causing Reducer to crash in some cases.
PIG-41: Added patterns to svn:ignore
PIG-51: Fixed combiner in the presence of flattening
PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the
comparator function instead of Class.forName. (gates)
PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@)
PIG-77: Added eclipse specific files to svn:ignore
PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates)
PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates)
PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arunc
via olgan)
PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default
path. Also fix it to not die if pigclient.conf is missing. (craigm via
gates)
PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill
files when they are done spilling (contributions by craigm, breed, and
gates, committed by gates)
PIG-95: Remove System.exit() statements from inside pig (joa23 via gates)
PIG-65: convert tabs to spaces (groves via olgan)
PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when
more than one bag is involved (gates)
PIG-92: Fix NullPointerException in PigContext due to uninitialized conf
reference. (francisoud via gates)
PIG-80: In a number of places stack trace information was being lost by an
exception being caught, and a different exception then thrown. All those
locations have been changed so that the new exception now wraps the old.
(francisoud via gates)
PIG-84: Converted printStackTrace calls to calls to the logger.
(francisoud via gates)
PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates)
PIG-99: Fix to make unit tests not run out of memory. (francisoud via
gates)
PIG-107: enabled several tests. (francisoud via olgan)
PIG-46: abort processing on error for non-interactive mode (olston via
olgan)
PIG-109: improved exception handling (oae via olgan)
PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can
be run w/o access to a hadoop cluster. (xuzh via gates)
PIG-68: improvements to build.xml (joa23 via olgan)
PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled
flattening the combiner case. (gates and oae)
PIG-213: Remove non-static references to logger from data bags and tuples,
as it causes significant overhead (vgeschel via gates)
PIG-284: target for building source jar (oae via olgan)