| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| Pig Change Log |
| |
| Trunk (unreleased changes) |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-4728: Compilation against hbase 1.x fails with hbase-hadoop1-compat not found (szita via rohini) |
| |
| PIG-4897: Scope of param substitution for run/exec commands (knoguchi) |
| |
| PIG-4923: Drop Hadoop 1.x support in Pig 0.17 (szita via rohini) |
| |
| PIG-5109: Remove HadoopJobHistoryLoader (szita via daijy) |
| |
| PIG-5067: Revisit union on numeric type and chararray to bytearray (knoguchi) |
| |
| IMPROVEMENTS |
| |
| PIG-5120: Let tez_local mode run without a jar file (knoguchi) |
| |
| PIG-3851: Upgrade jline to 2.11 (daijy) |
| |
| PIG-4963: Add a Bloom join (rohini) |
| |
| PIG-3938: Add LoadCaster to EvalFunc (knoguchi) |
| |
| PIG-5105: Tez unit tests failing with "Argument list too long" (rohini) |
| |
| PIG-4901: To use Multistorage for each Group (szita via daijy) |
| |
| PIG-5025: Fix flaky test failures in TestLoad.java (szita via rohini) |
| |
| PIG-4939: QueryParserUtils.setHdfsServers(QueryParserUtils.java:104) should not be called for non-dfs |
| methods (szita via daijy) |
| |
| PIG-5034: Remove org.apache.hadoop.hive.serde2.objectinspector.primitive package (nkollar via daijy) |
| |
| PIG-5036: Remove biggish from e2e input dataset (daijy) |
| |
| PIG-5053: Can't change HDFS user home in e2e tests using Ant (nkollar via daijy) |
| |
| PIG-5037: Add api getDisplayString to PigStats (zjffdu) |
| |
| PIG-5020: Give file location for loadcaster related warning and errors (knoguchi) |
| |
| PIG-5027: Improve SAMPLE Scalar Expression Example (icook via knoguchi) |
| |
| PIG-5023: Documentation for BagToTuple (icook via knoguchi) |
| |
| PIG-5022: Error in TOKENIZE Example (icook via knoguchi) |
| |
| PIG-4931: Document IN operator (dbist13 via daijy) |
| |
| PIG-4852: Add accumulator implementation for MaxTupleBy1stField (szita via daijy) |
| |
| PIG-4925: Support for passing the bloom filter to the Bloom UDF (rohini) |
| |
| PIG-4911: Provide option to disable DAG recovery (rohini) |
| |
| PIG-4906: Add Bigdecimal functions in Over function (cgalan via daijy) |
| |
| PIG-2768: Fix org.apache.hadoop.conf.Configuration deprecation warnings for Hadoop 23 (rohini) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-5127: Test fail when running test-core-mrtez (daijy) |
| |
| PIG-5083: CombinerPackager and LitePackager should not materialize bags (rohini) |
| |
| PIG-5087: e2e Native3 failing after PIG-4923 (knoguchi) |
| |
| PIG-5073: Skip e2e Limit_5 test for Tez (knoguchi) |
| |
| PIG-5072: e2e Union_12 fails on typecast when oldpig=0.11 (knoguchi) |
| |
| PIG-3891: FileBasedOutputSizeReader does not calculate size of files in sub-directories (nkollar via rohini) |
| |
| PIG-5070: Allow Grunt e2e tests to run in parallel (knoguchi) |
| |
| PIG-5061: ant test -Dtestcase=TestBoolean failing (knoguchi) |
| |
| PIG-5066: e2e Jython_Checkin_2 failing due to floating precision difference (knoguchi) |
| |
| PIG-5063: e2e IOErrors_1 on mapreduce is unstable (knoguchi) |
| |
| PIG-5062: Allow Native e2e tests to run in parallel (knoguchi) |
| |
| PIG-5060: TestPigRunner.testDisablePigCounters2 failing with tez (knoguchi) |
| |
| PIG-5056: Fix AvroStorage writing enums (szita via daijy) |
| |
| PIG-5055: Infinite loop with join by fixed index (knoguchi) |
| |
| PIG-5049: Cleanup e2e tests turing_jython.conf (Daniel Dai) |
| |
| PIG-5033: MultiQueryOptimizerTez creates bad plan with union, split and FRJoin (rohini, tmwoodruff via rohini) |
| |
| PIG-4934: SET command does not work well with deprecated settings (szita via daijy) |
| |
| PIG-4798: big integer literals fail to parse (szita via daijy) |
| |
| PIG-5045: CSVExcelStorage Load: A Quoted Field with a Single Escaped Quote """" Becomes "" This should become " instead |
| (szita via daijy) |
| |
| PIG-5026: Remove src/META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider (nkollar via daijy) |
| |
| PIG-5041: RoundRobinPartitioner is not deterministic when order of input records change (rohini) |
| |
| PIG-5040: Order by and CROSS partitioning is not deterministic due to usage of Random (rohini) |
| |
| PIG-5038: Pig Limit_2 e2e test failed with sort check (Konstantin_Harasov via rohini) |
| |
| PIG-5039: TestTypeCheckingValidatorNewLP.TestTypeCheckingValidatorNewLP is failing (nkollar via knoguchi) |
| |
| PIG-3087: Refactor TestLogicalPlanBuilder to be meaningful (szita via daijy) |
| |
| PIG-4976: Streaming job with store clause gets stuck if the script fails (daijy via knoguchi) |
| |
| PIG-5035: killJob API does not work in Tez (zjffdu via rohini) |
| |
| PIG-5032: Output record stats in Tez is wrong when there is split followed by union (rohini) |
| |
| PIG-5031: Tez failing to compile when replicate join is done with a limit vertex on left (knoguchi) |
| |
| PIG-5019: Pig generates tons of warnings for udf with enabled warnings aggregation (murshyd via rohini) |
| |
| PIG-4974: A simple map reference fail to cast (knoguchi) |
| |
| PIG-4975: Map schema shows "Type: null Uid: null" in explain (knoguchi) |
| |
| PIG-4973: Bigdecimal division fails (szita via daijy) |
| |
| PIG-4967: NPE in PigJobControl.run() when job status is null (water via daijy) |
| |
| PIG-4972: StreamingIO_1 fail on perl 5.22 (daijy) |
| |
| PIG-4933: TestDataBagAccess.testBagConstantFlatten1/TestLogicalPlanBuilder.testQuery90 broken after PIG-2315 (knoguchi) |
| |
| PIG-4965: Refactor test/perf/pigmix/bin/runpigmix.pl to delete the output of single test case |
| if we enable cleanup_after_test (kellyzly via daijy) |
| |
| PIG-4966: Fix Pig compatibility with Hive 2.1.0 (zyork via daijy) |
| |
| PIG-4935: TEZ_USE_CLUSTER_HADOOP_LIBS is always set to true (rohini) |
| |
| PIG-4961: CROSS followed by LIMIT inside nested foreach drop data from result (rohini) |
| |
| PIG-4960: Split followed by order by/skewed join is skewed in Tez (rohini) |
| |
| PIG-4957: See "Received kill signal" message for a normal run after PIG-4921 (rohini) |
| |
| PIG-4953: Predicate push-down will not run filters for single unary expressions (rdblue via daijy) |
| |
| PIG-4940: Predicate push-down filtering unary expressions can be pushed (rdblue via daijy) |
| |
| PIG-4938: [PiggyBank] XPath returns empty values when using aggregation method (nkollar via daijy) |
| |
| PIG-4896: Param substitution ignored when redefined (knoguchi) |
| |
| PIG-2315: Make as clause work in generate (daijy via knoguchi) |
| |
| PIG-4921: Kill running jobs on InterruptedException (rohini) |
| |
| PIG-4916: Pig on Tez fail to remove temporary HDFS files in some cases (daijy) |
| |
| Release 0.16.1 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-4945: Update document for conflicting macro params (knoguchi via daijy) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-5119: SkewedJoin_15 is unstable (daijy) |
| |
| PIG-5118: Script fails with Invalid dag containing 0 vertices (rohini) |
| |
| PIG-5111: e2e Utf8Test fails in local mode (rohini) |
| |
| PIG-5112: Cleanup pig-template.xml (daijy) |
| |
| PIG-5046: Skewed join with auto parallelism hangs when right input also has autoparallelism (rohini) |
| |
| PIG-5108: AvroStorage on Tez with exception on nested records (daijy) |
| |
| PIG-4260: SpillableMemoryManager.spill should revert spill on all exception (rohini) |
| |
| PIG-4918: Pig on Tez cannot switch pig.temp.dir to another fs (daijy) |
| |
| PIG-5078: Script fails with error - POStoreTez only accepts MROutput (rohini) |
| |
| PIG-5088: HashValuePartitioner has skew when there is only map fields (rohini) |
| |
| PIG-5043: Slowstart not applied in Tez with PARALLEL clause (rohini) |
| |
| PIG-4930: Skewed Join Breaks On Empty Sampled Input When Key is From Map (nkollar via rohini) |
| |
| PIG-3417: Job fails when skewed join is done on tuple key (nkollar via rohini) |
| |
| PIG-5074: Build broken when hadoopversion=20 in branch 0.16 (szita via daijy) |
| |
| PIG-5064: NPE in TestScriptUDF#testPythonBuiltinModuleImport1 when JAVA_HOME is not set (water via daijy) |
| |
| PIG-5048: HiveUDTF fail if it is the first expression in projection (nkollar via daijy) |
| |
| PIG-4951: Rename PIG_ATS_ENABLED constant (szita via daijy) |
| |
| PIG-4947: LOAD with HBaseStorage using a mix of pure wildcards and prefixed wildcards results |
| in empty maps for the pure wildcarded column families (daijy) |
| |
| PIG-4948: Pig on Tez AM use too much memory on a small cluster (daijy) |
| |
| PIG-4949: Fix registering jar in S3 which was broken by PIG-4417 in Pig 0.16 (yangyishan0901m via daijy) |
| |
| PIG-4950: Fix minor issues with running scripts in non-local FileSystems (petersla via daijy) |
| |
| Release 0.16.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-4719: Documentation for PIG-4704: Customizable Error Handling for Storers in Pig (daijy) |
| |
| PIG-4714: Improve logging across multiple components with callerId (daijy) |
| |
| PIG-4885: Turn off union optimizer if there is PARALLEL clause in union in Tez (rohini) |
| |
| PIG-4894: Add API for StoreFunc to specify if they are write safe from two different vertices (rohini) |
| |
| PIG-4884: Tez needs to use DistinctCombiner.Combine (rohini) |
| |
| PIG-4874: Remove schema tuple reference overhead for replicate join hashmap (rohini) |
| |
| PIG-4879: Pull latest version of joda-time (rohini) |
| |
| PIG-4526: Make setting up the build environment easier (nielsbasjes via rohini) |
| |
| PIG-4641: Print the instance of Object without using toString() (sandyridgeracer via rohini) |
| |
| PIG-4455: Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter (zjffdu via rohini) |
| |
| PIG-4866: Do not serialize PigContext in configuration to the backend (rohini) |
| |
| PIG-4547: Update Jython version to 2.7.0 (erwaman via daijy) |
| |
| PIG-4862: POProject slow by creating StackTrace repeatedly (knoguchi) |
| |
| PIG-4853: Fetch inputs before starting outputs (rohini) |
| |
| PIG-4847: POPartialAgg processing and spill improvements (rohini) |
| |
| PIG-4840: Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups (rohini) |
| |
| PIG-4843: Turn off combiner in reducer vertex for Tez if bags are in combine plan (rohini) |
| |
| PIG-4796: Authenticate with Kerberos using a keytab file (nielsbasjes via daijy) |
| |
| PIG-4817: Bump HTTP Logparser to version 2.4 (nielsbasjes via daijy) |
| |
| PIG-4811: Upgrade groovy library to address MethodClosure vulnerability (daijy) |
| |
| PIG-4803: Improve performance of regex-based builtin functions (eyal via daijy) |
| |
| PIG-4802: Autoparallelism should estimate less when there is combiner (rohini) |
| |
| PIG-4761: Add more information to front end error messages (eyal via daijy) |
| |
| PIG-4792: Do not add java and sun system properties to jobconf (rohini) |
| |
| PIG-4787: Log JSONLoader exception while parsing records (rohini) |
| |
| PIG-4763: Insufficient check for the number of arguments in runpigmix.pl (sekikn via rohini) |
| |
| PIG-4411: Support for vertex level configuration like speculative execution (rohini) |
| |
| PIG-4775: Better default values for shuffle bytes per reducer (rohini) |
| |
| PIG-4753: Pigmix should have option to delete outputs after completing the tests (mitdesai via rohini) |
| |
| PIG-4744: Honor tez.staging-dir setting in tez-site.xml (rohini via daijy) |
| |
| PIG-4742: Document Pig's Register Artifact Command added in PIG-4417 (akshayrai09 via daijy) |
| |
| PIG-4417: Pig's register command should support automatic fetching of jars from repo (akshayrai09 via daijy) |
| |
| PIG-4713: Document Bloom UDF (gliptak via daijy) |
| |
| PIG-3251: Bzip2TextInputFormat requires double the memory of maximum record size (knoguchi) |
| |
| PIG-4704: Customizable Error Handling for Storers in Pig (siddhimehta via daijy) |
| |
| PIG-4717: Update Apache HTTPD LogParser to latest version (nielsbasjes via daijy) |
| |
| PIG-4468: Pig's jackson version conflicts with that of hadoop 2.6.0 or newer (zjffdu via daijy) |
| |
| PIG-4708: Upgrade joda-time to 2.8 (rohini) |
| |
| PIG-4697: Pig needs to serialize only part of the udfcontext for each vertex (rohini) |
| |
| PIG-4702: Load once for sampling and partitioning in order by for certain LoadFuncs (rohini) |
| |
| PIG-4699: Print Job stats information in Tez like mapreduce (rohini) |
| |
| PIG-4554: Compress pig.script before encoding (sandyridgeracer via rohini) |
| |
| PIG-4670: Embedded Python scripts still parse line by line (rohini) |
| |
| PIG-4663: HBaseStorage should allow the MaxResultsPerColumnFamily limit to avoid memory or scan timeout issues (pmazak via rohini) |
| |
| PIG-4673: Built In UDF - REPLACE_MULTI : For a given string, search and replace all occurrences |
| of search keys with replacement values (murali.k.h.rao@gmail.com via daijy) |
| |
| PIG-4674: TOMAP should infer schema (daijy) |
| |
| PIG-4676: Upgrade Hive to 1.2.1 (daijy) |
| |
| PIG-4574: Eliminate identity vertex for order by and skewed join right after LOAD (rohini) |
| |
| PIG-4365: TOP udf should implement Accumulator interface (eyal via rohini) |
| |
| PIG-4570: Allow AvroStorage to use a class for the schema (pmazak via daijy) |
| |
| PIG-4405: Adding 'map[]' support to mock/Storage (nielsbasjes via daijy) |
| |
| PIG-4638: Allow TOMAP to accept dynamically sized input (nielsbasjes via daijy) |
| |
| PIG-4639: Add better parser for Apache HTTPD access log (nielsbasjes via daijy) |
| |
| BUG FIXES |
| |
| PIG-4821: Pig chararray field with special UTF-8 chars as part of tuple join key produces wrong results in Tez (rohini) |
| |
| PIG-4734: TOMAP schema inferring breaks some scripts in type checking for bincond (daijy) |
| |
| PIG-4786: CROSS will not work correctly with Grace Parallelism (daijy) |
| |
| PIG-3227: SearchEngineExtractor does not work for bing (dannyant via daijy) |
| |
| PIG-4902: Fix UT failures on 0.16 branch: TestTezGraceParallelism, TestPigScriptParser (daijy) |
| |
| PIG-4909: PigStorage incompatible with commons-cli-1.3 (knoguchi) |
| |
| PIG-4908: JythonFunction refers to Oozie launcher script absolute path (rohini) |
| |
| PIG-4905: Input of empty dir does not produce empty output file in Tez (rohini) |
| |
| PIG-4576: Nightly test HCat_DDL_2 fails with TDE ON (nmaheshwari via daijy) |
| |
| PIG-4873: InputSplit.getLocations return null and result a NPE in Pig (daijy) |
| |
| PIG-4895: User UDFs relying on mapreduce.job.maps broken in Tez (rohini) |
| |
| PIG-4883: MapKeyType of splitter was set wrongly in specific multiquery case (kellyzly via rohini) |
| |
| PIG-4887: Parameter substitution skipped with glob on register (knoguchi) |
| |
| PIG-4889: Replacing backslash fails as lexical error (knoguchi) |
| |
| PIG-4880: Overlapping of parameter substitution names inside and outside a macro fails with NPE (knoguchi) |
| |
| PIG-4881: TestBuiltin.testUniqueID failing on hadoop-1.x (knoguchi) |
| |
| PIG-4888: Line number off when reporting syntax error inside a macro (knoguchi) |
| |
| PIG-3772: Syntax error when casting an inner schema of a bag and line break involved (ssvinarchukhorton via knoguchi) |
| |
| PIG-4892: removing /tmp/output before UT (daijy) |
| |
| PIG-4882: Remove hardcoded groovy.grape.report.downloads=true from DownloadResolver (erwaman via daijy) |
| |
| PIG-4581: thread safe issue in NodeIdGenerator (rcatherinot via rohini) |
| |
| PIG-4878: Fix issues from PIG-4847 (rohini) |
| |
| PIG-4877: LogFormat parser fails test (nielsbasjes via daijy) |
| |
| PIG-4860: Loading data using OrcStorage() accepts only default FileSystem path (beriaanirudh via rohini) |
| |
| PIG-4868: Low values for bytes.per.reducer configured by user not honored in Tez for inputs (rohini) |
| |
| PIG-4869: Removing unwanted configuration in Tez broke ConfiguredFailoverProxyProvider (rohini) |
| |
| PIG-4867: -stop_on_failure does not work with Tez (rohini) |
| |
| PIG-4844: Tez AM runs out of memory when vertex has high number of outputs (rohini) |
| |
| PIG-4851: Null not padded when input has less fields than declared schema for some loader (rohini) |
| |
| PIG-4850: Registered jars do not use submit replication (rdblue via cheolsoo) |
| |
| PIG-4845: Parallel instantiation of classes in Tez cause tasks to fail (rohini) |
| |
| PIG-4841: Inline-op with schema declaration fails with syntax error (knoguchi) |
| |
| PIG-4832: Fix TestPruneColumn NPE failure (kellyzly via daijy) |
| |
| PIG-4833: TestBuiltin.testURIWithCurlyBrace in TEZ failing after PIG-4819 (knoguchi) |
| |
| PIG-4819: RANDOM() udf can lead to missing or redundant records (knoguchi) |
| |
| PIG-4816: Read a null scalar causing a Tez failure (daijy) |
| |
| PIG-4818: Single quote inside comment in GENERATE is not being ignored (knoguchi) |
| |
| PIG-4814: AvroStorage does not take namenode HA as part of schema file url (daijy) |
| |
| PIG-4812: Register Groovy UDF with relative path does not work (daijy) |
| |
| PIG-4806: UDFContext can be reset in the middle during Tez input and output initialization (rohini) |
| |
| PIG-4808: PluckTuple overwrites regex if used more than once in the same script (eyal via daijy) |
| |
| PIG-4801: Provide backward compatibility with mapreduce mapred.task settings (rohini) |
| |
| PIG-4759: Fix Classresolution_1 e2e failure (rohini) |
| |
| PIG-4800: EvalFunc.getCacheFiles() fails for different namenode (rohini) |
| |
| PIG-4790: Join after union fail due to UnionOptimizer (rohini) |
| |
| PIG-4686: Backend code should not call AvroStorageUtils.getPaths (mitdesai via rohini) |
| |
| PIG-4795: Flushing ObjectOutputStream before calling toByteArray on the underlying ByteArrayOutputStream (emopers via daijy) |
| |
| PIG-4690: Union with self replicate join will fail in Tez (rohini) |
| |
| PIG-4791: PORelationToExprProject filters records instead of returning emptybag in nested foreach after union (rohini) |
| |
| PIG-4779: testBZ2Concatenation[pig.bzip.use.hadoop.inputformat = true] failing due to successful read (knoguchi) |
| |
| PIG-4587: Applying isFirstReduceOfKey for Skewed left outer join skips records (rohini) |
| |
| PIG-4782: OutOfMemoryError: GC overhead limit exceeded with POPartialAgg (rohini) |
| |
| PIG-4737: Check and fix clone implementation for all classes extending PhysicalOperator (rohini) |
| |
| PIG-4770: OOM with POPartialAgg in some cases (rohini) |
| |
| PIG-4773: [Pig on Tez] Secondary key descending sort in nested foreach after union does ascending instead (rohini) |
| |
| PIG-4774: Fix NPE in SUM,AVG,MIN,MAX UDFs for null bag input (rohini) |
| |
| PIG-4757: Job stats on successfully read/output records wrong with multiple inputs/outputs (rohini) |
| |
| PIG-4769: UnionOptimizer hits errors when merging vertex group into split (rohini) |
| |
| PIG-4768: EvalFunc reporter is null in Tez (rohini) |
| |
| PIG-4760: TezDAGStats.convertToHadoopCounters is not used, but impose MR counter limit (daijy) |
| |
| PIG-4755: Typo in runpigmix script (mitdesai via daijy) |
| |
| PIG-4736: Removing empty keys in UDFContext broke one LoadFunc (rohini) |
| |
| PIG-4733: Avoid NullPointerException in JVMReuseImpl for builtin classes (rohini) |
| |
| PIG-4722: [Pig on Tez] NPE while running Combiner (rohini) |
| |
| PIG-4730: [Pig on Tez] Total parallelism estimation does not account load parallelism (rohini) |
| |
| PIG-4689: CSV Writes incorrect header if two CSV files are created in one script (nielsbasjes via daijy) |
| |
| PIG-4727: Incorrect types table for AVG in docs (nsmith via daijy) |
| |
| PIG-4725: Typo in FrontendException messages "Incompatable" (nsmith via daijy) |
| |
| PIG-4721: IsEmpty documentation error (nsmith via daijy) |
| |
| PIG-4712: [Pig on Tez] NPE in Bloom UDF after Union (rohini) |
| |
| PIG-4707: [Pig on Tez] Streaming job hangs with pig.exec.mapPartAgg=true (rohini) |
| |
| PIG-4703: TezOperator.stores shall not ship to backend (daijy) |
| |
| PIG-4696: Empty map returned by a streaming_python udf wrongly contains a null key (cheolsoo) |
| |
| PIG-4691: [Pig on Tez] Support for whitelisting storefuncs for union optimization (rohini) |
| |
| PIG-3957: Refactor out resetting input key in TezDagBuilder (rohini) |
| |
| PIG-4688: Limit followed by POPartialAgg can give empty or partial results in Tez (rohini) |
| |
| PIG-4635: NPE while running pig script in tez mode (daijy) |
| |
| PIG-4683: Nested order is broken after PIG-3591 in some cases (daijy) |
| |
| PIG-4679: Performance degradation due to InputSizeReducerEstimator since PIG-3754 (daijy) |
| |
| PIG-4315: MergeJoin or Split followed by order by gives NPE in Tez (rohini) |
| |
| PIG-4654: Reduce tez memory.reserve-fraction and clear spillables for better memory utilization (rohini) |
| |
| PIG-4628: Pig 0.14 job with order by fails in mapreduce mode with Oozie (knoguchi) |
| |
| PIG-4651: Optimize NullablePartitionWritable serialization for skewed join (rohini) |
| |
| PIG-4627: [Pig on Tez] Self join does not handle null values correctly (rohini) |
| |
| PIG-4644: PORelationToExprProject.clone() is broken (erwaman via rohini) |
| |
| PIG-4650: ant mvn-deploy target is broken (daijy) |
| |
| PIG-4649: [Pig on Tez] Union followed by HCatStorer misses some data (rohini) |
| |
| PIG-4636: Occurred spelled incorrectly in error message for Launcher and POMergeCogroup (stevenmz via daijy) |
| |
| PIG-4624: Error on ORC empty file without schema (daijy) |
| |
| PIG-3622: Allow casting bytearray fields to bytearray type (redisliu via daijy) |
| |
| PIG-4618: When using tez as the engine, setting pig.user.cache.enabled=true does not take effect (wisgood via rohini) |
| |
| PIG-4533: Document error: Pig does support concatenated gz file (xhudik via daijy) |
| |
| PIG-4578: ToDateISO should support optional ' ' space variant used by JDBC (michaelthoward via daijy) |
| |
| Release 0.15.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-4560: Pig 0.15.0 Documentation (daijy) |
| |
| PIG-4429: Add Pig alias information and Pig script to the DAG view in Tez UI (daijy) |
| |
| PIG-3994: Implement getting backend exception for Tez (rohini) |
| |
| PIG-4563: Upgrade to released Tez 0.7.0 (daijy) |
| |
| PIG-4525: Clarify "Scalar has more than one row in the output." (Niels Basjes via gates) |
| |
| PIG-4511: Add columns to prune from PluckTuple (jbabcock via cheolsoo) |
| |
| PIG-4434: Improve auto-parallelism for tez (daijy) |
| |
| PIG-4495: Better multi-query planning in case of multiple edges (rohini) |
| |
| PIG-3294: Allow Pig use Hive UDFs (daijy) |
| |
| PIG-4476: Fix logging in AvroStorage* classes and SchemaTuple class (rdsr via rohini) |
| |
| PIG-4458: Support UDFs in a FOREACH Before a Merge Join (wattsinabox via daijy) |
| |
| PIG-4454: Upgrade tez to 0.6.0 (daijy) |
| |
| PIG-4451: Log partition and predicate filter pushdown information and fix optimizer looping (rohini) |
| |
| PIG-4430: Pig should support reading log4j.properties file from classpath as well (rdsr via daijy) |
| |
| PIG-4407: Allow specifying a replication factor for jarcache (jira.shegalov via rohini) |
| |
| PIG-4401: Add pattern matching to PluckTuple (cheolsoo) |
| |
| PIG-2692: Make the Pig unit facilities more generalizable and update javadocs (razsapps via daijy) |
| |
| PIG-4379: Make RoundRobinPartitioner public (daijy) |
| |
| PIG-4378: Better way to fix tez local mode test hanging (daijy) |
| |
| PIG-4358: Add test cases for utf8 chinese in Pig (nmaheshwari via daijy) |
| |
| PIG-4370: HBaseStorage should support delete markers (bridiver via daijy) |
| |
| PIG-4360: HBaseStorage should support setting the timestamp field (bridiver via daijy) |
| |
| PIG-4337: Split Types and MultiQuery e2e tests into multiple groups (rohini) |
| |
| PIG-4333: Split BigData tests into multiple groups (rohini) |
| |
| BUG FIXES |
| |
| PIG-4592: Pig 0.15 stopped working with Hadoop 1.x (daijy) |
| |
| PIG-4580: Fix TestTezAutoParallelism.testSkewedJoinIncreaseParallelism test failure (daijy) |
| |
| PIG-4571: TestPigRunner.testGetHadoopCounters fail on Windows (daijy) |
| |
| PIG-4541: Skewed full outer join does not return records if any relation is empty. Outer join does not |
| return any record if left relation is empty (daijy) |
| |
| PIG-4564: Pig can deadlock in POPartialAgg if there is a bag (rohini via daijy) |
| |
| PIG-4569: Fix e2e test Rank_1 failure (rohini) |
| |
| PIG-4490: MIN/MAX builtin UDFs return wrong results when accumulating for strings (xplenty via rohini) |
| |
| PIG-4418: NullPointerException in JVMReuseImpl (rohini) |
| |
| PIG-4562: Typo in DataType.toDateTime (daijy) |
| |
| PIG-4559: Fix several new tez e2e test failures (daijy) |
| |
| PIG-4506: binstorage fails to write biginteger (ssavvides via daijy) |
| |
| PIG-4556: Local mode is broken in some case by PIG-4247 (daijy) |
| |
| PIG-4523: Tez engine should use tez config rather than mr config whenever possible (daijy) |
| |
| PIG-4452: Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error (daijy) |
| |
| PIG-4543: TestEvalPipelineLocal.testRankWithEmptyReduce fail on Hadoop 1 (daijy) |
| |
| PIG-4544: Upgrade Hbase to 0.98.12 (daijy) |
| |
| PIG-4481: e2e tests ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce different result on Windows (daijy) |
| |
| PIG-4496: Fix CBZip2InputStream to close underlying stream (petersla via daijy) |
| |
| PIG-4528: Fix a typo in src/docs/src/documentation/content/xdocs/basic.xml (namusyaka via daijy) |
| |
| PIG-4532: Pig Documentation contains typo for AvroStorage (fredericschmaljohann via daijy) |
| |
| PIG-4377: Skewed outer join produce wrong result in some cases (daijy) |
| |
| PIG-4538: Pig script fail with CNF in follow up MR job (daijy) |
| |
| PIG-4537: Fix unit test failure introduced by TEZ-2392: TestCollectedGroup, TestLimitVariable, TestMapSideCogroup, etc (daijy) |
| |
| PIG-4530: StackOverflow in TestMultiQueryLocal running under hadoop20 (nielsbasjes via rohini) |
| |
| PIG-4529: Pig on tez hit counter limit imposed by MR (daijy) |
| |
| PIG-4524: Pig Minicluster unit tests broken by TEZ-2333 (daijy) |
| |
| PIG-4527: NON-ASCII Characters in Javadoc break 'ant docs' (nielsbasjes via daijy) |
| |
| PIG-4494: Pig's htrace version conflicts with that of hadoop 2.6.0 (daijy) |
| |
| PIG-4519: Correct link to Contribute page (gliptak via daijy) |
| |
| PIG-4514: pig trunk compilation is broken - VertexManagerPluginContext.reconfigureVertex change (thejas) |
| |
| PIG-4503: [Pig on Tez] NPE in UnionOptimizer with multiple levels of union (rohini) |
| |
| PIG-4509: [Pig on Tez] Unassigned applications not killed on shutdown (rohini) |
| |
| PIG-4508: [Pig on Tez] PigProcessor check for commit only on MROutput (rohini) |
| |
| PIG-4505: [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx (rohini) |
| |
| PIG-4502: E2E tests build fail with udfs compile (nmaheshwari via daijy) |
| |
| PIG-4498: AvroStorage in Piggybank does not handle bad records and fails (viraj via rohini) |
| |
| PIG-4499: mvn-build miss tez classes in pig-h2.jar (daijy) |
| |
| PIG-4488: Pig on tez mask tez.queue.name (daijy) |
| |
| PIG-4497: [Pig on Tez] NPE for null scalar (rohini) |
| |
| PIG-4493: Pig on Tez gives wrong results if Union is followed by Split (rohini) |
| |
| PIG-4491: Streaming Python Bytearray Bugs (jeremykarn via daijy) |
| |
| PIG-4487: Pig on Tez gives wrong success message on failure in case of multiple outputs (rohini) |
| |
| PIG-4483: Pig on Tez output statistics shows storing to same directory twice for union (rohini) |
| |
| PIG-4480: Pig script failure on Tez with split and order by due to missing sample collection (rohini) |
| |
| PIG-4484: Ant pull jetty-6.1.26.zip on some platform (daijy) |
| |
| PIG-4479: Pig script with union within nested splits followed by join failed on Tez (rohini) |
| |
| PIG-4457: Error is thrown by JobStats.getOutputSize() when storing to a MySql table (rohini) |
| |
| PIG-4475: Keys in AvroMapWrapper are not proper Pig types (rdsr via daijy) |
| |
| PIG-4478: TestCSVExcelStorage fails with jdk8 (rohini) |
| |
| PIG-4474: Increasing intermediate parallelism has issue with default parallelism (rohini) |
| |
| PIG-4465: Pig streaming ship fails for relative paths on Tez (rohini) |
| |
| PIG-4461: Use benchmarks for Windows Pig e2e tests (nmaheshwari via daijy) |
| |
| PIG-4463: AvroMapWrapper still leaks Avro data types and AvroStorageDataConversionUtilities do not handle |
| Pig maps (rdsr via daijy) |
| |
| PIG-4460: TestBuiltIn testValueListOutputSchemaComplexType and testValueSetOutputSchemaComplexType tests |
| create bags whose inner schema is not a tuple (erwaman via daijy) |
| |
| PIG-4448: AvroMapWrapper leaks Avro data types when the map values are complex avro records (rdsr via daijy) |
| |
| PIG-4453: Remove test-tez-local target (daijy) |
| |
| PIG-4443: Write inputsplits in Tez to disk if the size is huge and option to compress pig input splits (rohini) |
| |
| PIG-4447: Pig Cannot handle nullable values (arrays and records) in avro records (rdsr via daijy) |
| |
| PIG-4444: Fix unit test failure TestTezAutoParallelism (daijy) |
| |
| PIG-4445: VALUELIST and VALUESET outputSchema does not match actual schema of data returned when map value schema |
| is complex (erwaman via daijy) |
| |
| PIG-4442: Eliminate redundant RPC call to get file information in HPath (cnauroth via daijy) |
| |
| PIG-4440: Some code samples in documentation use Unicode left/right single quotes, which cause a |
| parse failure (cnauroth via daijy) |
| |
| PIG-4264: Port TestAvroStorage to tez local mode (daijy) |
| |
| PIG-4437: Fix tez unit test failure TestJoinSmoke, TestSkewedJoin (daijy) |
| |
| PIG-4432: Built-in VALUELIST and VALUESET UDFs do not preserve the schema when the map value type is |
| a complex type (erwaman via daijy) |
| |
| PIG-4408: Merge join should support replicated join as a predecessor (bridiver via daijy) |
| |
| PIG-4389: Flag to run selected test suites in e2e tests (daijy) |
| |
| PIG-4385: testDefaultBootup fails because it cannot find "pig.properties" (mkudlej via daijy) |
| |
| PIG-4397: CSVExcelStorage incorrect output if last field value is null (daijy) |
| |
| PIG-4431: ReadToEndLoader does not close the record reader for the last input split (rdsr via daijy) |
| |
| PIG-4426: RowNumber(simple) Rank not producing correct results (knoguchi) |
| |
| PIG-4433: Loading bigdecimal in nested tuple does not work (kpriceyahoo via daijy) |
| |
| PIG-4410: Fix testRankWithEmptyReduce in tez mode (daijy) |
| |
| PIG-4392: RANK BY fails when default_parallel is greater than cardinality of field being ranked by (daijy) |
| |
| PIG-4403: Combining -Dpig.additional.jars.uris with -useHCatalog breaks due to combination |
| with colon instead of comma (ovlaere via daijy) |
| |
| PIG-4402: JavaScript UDF example in the doc is broken (cheolsoo) |
| |
| PIG-4394: Fix Split_9 and Union_5 e2e failures (rohini) |
| |
| PIG-4391: Fix TestPigStats test failure (rohini) |
| |
| PIG-4387: Honor yarn settings in tez-site.xml and optimize dag status fetch (rohini) |
| |
| PIG-4352: Port local mode tests to Tez - TestUnionOnSchema (daijy) |
| |
| PIG-4359: Port local mode tests to Tez - part4 (daijy) |
| |
| PIG-4340: PigStorage fails parsing empty map (daijy) |
| |
| PIG-4366: Port local mode tests to Tez - part5 (daijy) |
| |
| PIG-4381: Pig grunt shell DEFINE command fails when it spans multiple lines (daijy) |
| |
| PIG-4384: TezLauncher thread should be daemon thread (zjffdu via daijy) |
| |
| PIG-4376: NullPointerException accessing a field of an invalid bag from a nested foreach |
| (kspringborn via daijy) |
| |
| PIG-4355: Piggybank: XPath can't handle namespaces in xpath, nor can it return more than one match |
| (cavanaug via daijy) |
| |
| PIG-4371: Duplicate snappy.version in libraries.properties (daijy) |
| |
| PIG-4368: Port local mode tests to Tez - TestLoadStoreFuncLifeCycle (daijy) |
| |
| PIG-4367: Port local mode tests to Tez - TestMultiQueryBasic (daijy) |
| |
| PIG-4339: e2e test framework assumes default exectype as mapred (rohini) |
| |
| PIG-2949: JsonLoader only reads arrays of objects (eyal via daijy) |
| |
| PIG-4213: CSVExcelStorage not quoting texts containing \r (CR) when storing (alfonso.nishikawa via daijy) |
| |
| PIG-2647: Split Combining drops splits with empty getLocations() (tmwoodruff via daijy) |
| |
| PIG-4294: Enable unit test "TestNestedForeach" for spark (kellyzly via rohini) |
| |
| PIG-4282: Enable unit test "TestForEachNestedPlan" for spark (kellyzly via rohini) |
| |
| PIG-4361: Fix perl script problem in TestStreaming.java (kellyzly via xuefu) |
| |
| PIG-4354: Port local mode tests to Tez - part3 (daijy) |
| |
| PIG-4338: Fix test failures with JDK8 (rohini) |
| |
| PIG-4351: TestPigRunner.simpleTest2 fails on trunk (daijy) |
| |
| PIG-4350: Port local mode tests to Tez - part2 (daijy) |
| |
| PIG-4326: AvroStorageSchemaConversionUtilities does not properly convert schema for maps of arrays of records (mprim via daijy) |
| |
| PIG-4345: e2e test "RubyUDFs_13" fails because of the different result of "group a all" in different engines like "spark", "mapreduce" (kellyzly via rohini) |
| |
| PIG-4332: Remove redundant jars packaged into pig-withouthadoop.jar for hadoop 2 (rohini) |
| |
| PIG-4331: update README, '-x' option in usage to include tez (thejas via daijy) |
| |
| PIG-4327: Schema of map with value that has an alias can't be parsed again (mprim via daijy) |
| |
| PIG-4330: Regression test for PIG-3584 - AvroStorage does not correctly translate arrays of strings (brocknoland via daijy) |
| |
| PIG-3615: Update the way that JsonLoader/JsonStorage deal with BigDecimal (tyro89 via daijy) |
| |
| PIG-4329: Fetch optimization should be disabled when limit is not pushed up (lbendig via cheolsoo) |
| |
| PIG-3413: JsonLoader fails the pig job in case of malformed json input (eyal via daijy) |
| |
| PIG-4247: S3 properties are not picked up from core-site.xml in local mode (cheolsoo) |
| |
| PIG-4242: For indented xmls with multiline content (e.g. wikipedia) XMLLoader cuts out the beginning of every line |
| (holdfenytolvaj via daijy) |
| |
| Release 0.14.1 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| BUG FIXES |
| |
| PIG-4409: fs.defaultFS is overwritten in JobConf by replicated join at runtime (cheolsoo) |
| |
| PIG-4404: LOAD with HBaseStorage on secure cluster is broken in Tez (rohini) |
| |
| PIG-4375: ObjectCache should use ProcessorContext.getObjectRegistry() (rohini) |
| |
| PIG-4334: PigProcessor does not set pig.datetime.default.tz (rohini) |
| |
| PIG-4342: Pig 0.14 cannot identify the uppercase of DECLARE and DEFAULT (daijy) |
| |
| Release 0.14.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-4321: Documentation for 0.14 (daijy) |
| |
| PIG-4328: Upgrade Hive to 0.14 (daijy) |
| |
| PIG-4318: Make PigConfiguration naming consistent (rohini) |
| |
| PIG-4316: Port TestHBaseStorage to tez local mode (rohini) |
| |
| PIG-4224: Upload Tez payload history string to timeline server (daijy) |
| |
| PIG-3977: Get TezStats working for Oozie (rohini) |
| |
| PIG-3979: group all performance, garbage collection, and incremental aggregation (rohini) |
| |
| PIG-4253: Add a UniqueID UDF (daijy) |
| |
| PIG-4160: Provide a way to pass local jars in pig.additional.jars when using a remote |
| url for a script (acoliver via daijy) |
| |
| PIG-4246: HBaseStorage should implement getShipFiles (rohini) |
| |
| PIG-3456: Reduce threadlocal conf access in backend for each record (rohini) |
| |
| PIG-3861: duplicate jars get added to distributed cache (chitnis via rohini) |
| |
| PIG-4039: New interface for resetting static variables for jvm reuse (rohini) |
| |
| PIG-3870: STRSPLITTOBAG UDF (cryptoe via daijy) |
| |
| PIG-4080: Add Preprocessor commands and more to the black/whitelisting feature (prkommireddi via daijy) |
| |
| PIG-4162: Intermediate reducer parallelism in Tez should be higher (rohini) |
| |
| PIG-4186: Fix e2e run against new build of pig and some enhancements (rohini) |
| |
| PIG-3838: Organize tez code into subpackages (rohini) |
| |
| PIG-4069: Limit reduce task should start as soon as one map task finishes (rohini) |
| |
| PIG-4141: Ship UDF/LoadFunc/StoreFunc dependent jar automatically (daijy) |
| |
| PIG-4146: Create a target to run mr and tez unit test in one shot (daijy) |
| |
| PIG-4144: Make pigunit.PigTest work in tez mode (daijy) |
| |
| PIG-4128: New logical optimizer rule: ConstantCalculator (daijy) |
| |
| PIG-4124: Command for Python streaming udf should be configurable (cheolsoo) |
| |
| PIG-4114: Add Native operator to tez (daijy) |
| |
| PIG-4117: Implement merge cogroup in Tez (daijy) |
| |
| PIG-4119: Add message at end of each testcase with timestamp in Pig system tests (nmaheshwari via daijy) |
| |
| PIG-4008: Pig code change to enable Tez Local mode (airbots via daijy) |
| |
| PIG-4091: Predicate pushdown for ORC (rohini via daijy) |
| |
| PIG-4077: Some fixes and e2e test for OrcStorage (rohini) |
| |
| PIG-4054: Do not create job.jar when submitting job (daijy) |
| |
| PIG-4047: Break up pig withouthadoop and fat jar (daijy) |
| |
| PIG-4062: Add ascending order option to builtin TOP function (raj171 via cheolsoo) |
| |
| PIG-3558: ORC support for Pig (daijy) |
| |
| PIG-2122: Parameter Substitution doesn't work in the Grunt shell (daijy) |
| |
| PIG-4031: Provide Counter aggregation for Tez (daijy) |
| |
| PIG-4028: add a flag to control the ivy resolve/retrieve output (gkesavan via daijy) |
| |
| PIG-4015: Provide a way to disable auto-parallelism in tez (daijy) |
| |
| PIG-3846: Implement automatic reducer parallelism (daijy) |
| |
| PIG-3939: SPRINTF function to format strings using a printf-style template (mrflip via cheolsoo) |
| |
| PIG-3970: Merge Tez branch into trunk (daijy) |
| |
| OPTIMIZATIONS |
| |
| PIG-4657: [Pig on Tez] Optimize GroupBy and Distinct key comparison (rohini) |
| |
| |
| BUG FIXES |
| |
| PIG-4335: Pig release tarball is missing tez classes (daijy) |
| |
| PIG-4325: StackOverflow when spilling InternalCachedBag (daijy) |
| |
| PIG-4324: Remove jsch-LICENSE.txt (daijy) |
| |
| PIG-4267: ToDate has incorrect timezone offsets (bridiver via daijy) |
| |
| PIG-4319: Make LoadPredicatePushdown InterfaceAudience.Private till PIG-4093 (rohini) |
| |
| PIG-4312: TestStreamingUDF tez mode leaves orphan processes on Windows (daijy) |
| |
| PIG-4314: BigData_5 hangs on some machines (daijy) |
| |
| PIG-4299: SpillableMemoryManager assumes tenured heap incorrectly (prkommireddi via daijy) |
| |
| PIG-4298: Descending order-by is broken in some cases when key is bytearrays (cheolsoo) |
| |
| PIG-4263: Move tez local mode unit tests to a separate target (daijy) |
| |
| PIG-4257: Fix several e2e tests on secure cluster (daijy) |
| |
| PIG-4261: Skip shipping local resources in tez local mode (daijy) |
| |
| PIG-4182: e2e tests Scripting_[1-12] fail on Windows (daijy) |
| |
| PIG-4259: Fix few issues related to Union, CROSS and auto parallelism in Tez (rohini) |
| |
| PIG-4250: Fix Security Risks found by Coverity (daijy) |
| |
| PIG-4258: Fix several e2e tests on Windows (daijy) |
| |
| PIG-4256: Fix StreamingPythonUDFs e2e test failure on Windows (daijy) |
| |
| PIG-4166: Collected group drops last record when combined with merge join (bridiver via daijy) |
| |
| PIG-2495: Using merge JOIN from a HBaseStorage produces an error (bridiver via daijy) |
| |
| PIG-4235: Fix unit test failures on Windows (daijy) |
| |
| PIG-4245: 1-1 edge vertices should use same jvm opts (rohini) |
| |
| PIG-4252: Tez container reuse fails when using script udf (daijy) |
| |
| PIG-4241: Auto local mode mistakenly converts large jobs to local mode when using with Hive tables (cheolsoo) |
| |
| PIG-4184: UDF backward compatibility issue after POStatus.STATUS_NULL refactory (daijy) |
| |
| PIG-4238: Property 'pig.job.converted.fetch' should be unset when fetch finishes (lbendig) |
| |
| PIG-4151: Pig Cannot Write Empty Maps to HBase (daijy) |
| |
| PIG-4181: Cannot launch tez e2e test on Windows (daijy) |
| |
| PIG-2834: MultiStorage requires unused constructor argument (daijy) |
| |
| PIG-4230: Documentation fix: first nested foreach example is incomplete (lbendig via daijy) |
| |
| PIG-4199: Mapreduce ACLs should be translated to Tez ACLs (rohini) |
| |
| PIG-4227: Streaming Python UDF handles bag outputs incorrectly (cheolsoo) |
| |
| PIG-4219: When parsing a schema, pig drops tuple inside of Bag if it contains only one field (lbendig via daijy) |
| |
| PIG-4226: Upgrade Tez to 0.5.1 (daijy) |
| |
| PIG-4220: MapReduce-based Rank failing with NPE due to missing Counters (knoguchi) |
| |
| PIG-3985: Multiquery execution of RANK with RANK BY causes NPE (rohini) |
| |
| PIG-4218: Pig OrcStorage fails to load a map with null key (daijy) |
| |
| PIG-4164: After Pig job finishes, Pig client spends too much time retrying to connect to AM (daijy) |
| |
| PIG-4212: Allow LIMIT of 0 for variableLimit (constant 0 is already allowed) (knoguchi) |
| |
| PIG-4196: Auto ship udf jar is broken (daijy) |
| |
| PIG-4214: Fix unit test fail TestMRJobStats (daijy) |
| |
| PIG-4217: Fix documentation in BuildBloom (praveenr019 via daijy) |
| |
| PIG-4215: Fix unit test failure TestParamSubPreproc and TestMacroExpansion (daijy) |
| |
| PIG-4175: PIG CROSS operation followed by STORE produces non-deterministic results each run (daijy) |
| |
| PIG-4202: Reset UDFContext state before OutputCommitter invocations in Tez (rohini) |
| |
| PIG-4205: e2e test property-check does not check all prerequisites (kellyzly via daijy) |
| |
| PIG-4180: e2e test Native_3 fails on Hadoop 2 (daijy) |
| |
| PIG-4178: HCatDDL_[1-3] fail on Windows (daijy) |
| |
| PIG-4046: PiggyBank DBStorage DATETIME should use setTimestamp with java.sql.Timestamp (sinchii via daijy) |
| |
| PIG-4050: HadoopShims.getTaskReports() can cause OOM with Hadoop 2 (rohini) |
| |
| PIG-4176: Fix tez e2e test Bloom_[1-3] (daijy) |
| |
| PIG-4195: Support loading char/varchar data in OrcStorage (daijy) |
| |
| PIG-4201: Native e2e tests fail when run against old version of pig (rohini) |
| |
| PIG-4197: Fix typo in Job Stats header: MinMapTIme => MinMapTime (jmartell7 via daijy) |
| |
| PIG-4194: ReadToEndLoader does not call setConf on pigSplit in initializeReader (shadanan via rohini) |
| |
| PIG-4187: Fix Orc e2e tests (daijy) |
| |
| PIG-4177: BigData_1 fails after PIG-4149 (daijy) |
| |
| PIG-3507: Pig fails to run in local mode on a Kerberos enabled Hadoop cluster (kellyzly via rohini) |
| |
| PIG-4171: Streaming UDF fails when direct fetch optimization is enabled (cheolsoo) |
| |
| PIG-4170: Multiquery with different type of key gives wrong result (daijy) |
| |
| PIG-4104: Accumulator UDF throws OOM in Tez (rohini) |
| |
| PIG-4169: NPE in ConstantCalculator (cheolsoo) |
| |
| PIG-4161: check for latest Hive snapshot dependencies (daijy) |
| |
| PIG-4102: Adding e2e tests and several improvements for Orc predicate pushdown (daijy) |
| |
| PIG-4156: [PATCH] fix NPE when running scripts stored on hdfs:// (acoliver via daijy) |
| |
| PIG-4159: TestGroupConstParallelTez and TestJobSubmissionTez should be excluded in Hadoop 20 unit tests (cheolsoo) |
| |
| PIG-4154: ScriptState#setScript(File) does not close resources (lars_francke via daijy) |
| |
| PIG-4155: Quitting grunt shell using CTRL-D character throws exception (abhishek.agarwal via daijy) |
| |
| PIG-4157: Pig compilation failure due to HIVE-7208 (daijy) |
| |
| PIG-4158: TestAssert is broken in trunk (cheolsoo) |
| |
| PIG-4143: Port more mini cluster tests to Tez - part 7 (daijy) |
| |
| PIG-4149: Rounding issue in FindQuantiles (daijy) |
| |
| PIG-4145: Port local mode tests to Tez - part1 (daijy) |
| |
| PIG-4076: Fix pom file (daijy) |
| |
| PIG-4140: VertexManagerEvent.getUserPayload returns ReadOnlyBuffer after TEZ-1449 (daijy) |
| |
| PIG-4136: No special handling jythonjar/jrubyjar in e2e tests after PIG-4047 (daijy) |
| |
| PIG-4137: Fix hadoopversion 23 compilation due to TEZ-1469 (daijy) |
| |
| PIG-4135: Fetch optimization should be disabled if plan contains no limit (cheolsoo) |
| |
| PIG-4061: Make Streaming UDF work in Tez (hotfix PIG-4061-3.patch) |
| |
| PIG-4134: TEZ-1449 broke the build (knoguchi) |
| |
| PIG-4132: TEZ-1246 and TEZ-1390 broke a build (knoguchi) |
| |
| PIG-4129: Pig -Dhadoopversion=23 compile fails after TEZ-1426 (daijy) |
| |
| PIG-4127: Build failure due to TEZ-1132 and TEZ-1416 (lbendig) |
| |
| PIG-4125: TEZ-1347 broke the build |
| |
| PIG-4123: Increase memory for TezMiniCluster (daijy) |
| |
| PIG-4122: Fix hadoopversion 23 compilation due to TEZ-1194 (daijy) |
| |
| PIG-4061: Make Streaming UDF work in Tez (daijy) |
| |
| PIG-4118: Fix hadoopversion 23 compilation due to TEZ-1237/TEZ-1407 (daijy) |
| |
| PIG-4109: register local jar fails on Windows when Pig script is remote (daijy) |
| |
| PIG-4116: Update Pig doc about Hadoop 2 Streaming Python UDF support (cheolsoo) |
| |
| PIG-4112: NPE in packager when union + group-by followed by replicated join in Tez (rohini via cheolsoo) |
| |
| PIG-4113: TEZ-1386 breaks hadoop 2 compilation in trunk (cheolsoo) |
| |
| PIG-4110: TEZ-1382 breaks Hadoop 2 compilation (cheolsoo) |
| |
| PIG-4105: Fix TestAvroStorage with ibm jdk (fang fang chen via daijy) |
| |
| PIG-4108: Pig -Dhadoopversion=23 compile fails after TEZ-1317 (daijy) |
| |
| PIG-4086: Fix Orc e2e tests for tez (daijy) |
| |
| PIG-4101: Lower tez.am.task.max.failed.attempts to 2 from 4 in Tez mini cluster (cheolsoo) |
| |
| PIG-4099: "ant copypom" failed with "could not find file $PIG_HOME/ivy/pig.pom to copy" (fang fang chen via cheolsoo) |
| |
| PIG-4098: Vertex Location Hint api update after TEZ-1041 (jeagles via cheolsoo) |
| |
| PIG-4088: TEZ-1346 breaks hadoop 2 compilation in trunk (cheolsoo) |
| |
| PIG-4089: TestMultiQuery.testMultiQueryJiraPig1169 fails in trunk after |
| PIG-4079 in Hadoop 1 (cheolsoo) |
| |
| PIG-4085: TEZ-1303 broke hadoop 2 compilation in trunk (cheolsoo) |
| |
| PIG-4082: TEZ-1278 broke hadoop 2 compilation in trunk (cheolsoo) |
| |
| PIG-4079: Parallel clause is not honored in local mode (cheolsoo) |
| |
| PIG-4078: Port more mini cluster tests to Tez - part 6 (rohini) |
| |
| PIG-4071: Fix TestStore.testSetStoreSchema, TestParamSubPreproc.testGruntWithParamSub, |
| TestJobSubmission.testReducerNumEstimation (daijy) |
| |
| PIG-4074: mapreduce.client.submit.file.replication is not honored in cached files (cheolsoo) |
| |
| PIG-4052: TestJobControlSleep, TestInvokerSpeed are unreliable (daijy) |
| |
| PIG-4053: TestMRCompiler succeeds with sun jdk 1.6 but fails with sun jdk 1.7 (daijy) |
| |
| PIG-3982: ant target test-tez should depend on jackson-pig-3039-test-download (daijy) |
| |
| PIG-4064: Fix tez auto parallelism test failures (daijy) |
| |
| PIG-4075: TEZ-1311 broke Hadoop2 compilation (cheolsoo) |
| |
| PIG-4070: Change from TezJobConfig to TezRuntimeConfiguration (rohini) |
| |
| PIG-4068: ObjectCache causes ClassCastException (cheolsoo) |
| |
| PIG-4067: TestAllLoader in piggybank fails with new hive version (rohini) |
| |
| PIG-4065: Fix failing unit tests in Tez (rohini) |
| |
| PIG-4060: Refactor TezJob and TezLauncher (cheolsoo) |
| |
| PIG-2689: JsonStorage fails to find schema when LimitAdjuster runs (rohini) |
| |
| PIG-4056: Remove PhysicalOperator.setAlias (rohini) |
| |
| PIG-4058: Use single config in Tez for input and output (rohini) |
| |
| PIG-3886: UdfDistributedCache_1 fails in tez branch (cheolsoo) |
| |
| PIG-4055: Build broke after TEZ-1130 API rename (knoguchi) |
| |
| PIG-3935: Port more mini cluster tests to Tez - part 5 (rohini) |
| |
| PIG-3984: PigServer.shutdown removes the tez resource folder (daijy via rohini) |
| |
| PIG-4048: TEZ-692 has an incompatible API change removing TezSession (rohini) |
| |
| PIG-4044: Pig should use avro-mapred-hadoop2.jar instead of avro-mapred.jar when compile with hadoop 2 (daijy) |
| |
| PIG-4043: JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number of tasks (cheolsoo) |
| |
| PIG-4036: Fix e2e failures - JobManagement_3, CmdErrors_3 and BigData_4 (daijy) |
| |
| PIG-4041: org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper compiling error (jeagles via cheolsoo) |
| |
| PIG-4038: SPRINTF should return NULL on any NULL input (mrflip via daijy) |
| |
| PIG-4025: TestLoadFuncWrapper, TestLoadFuncMetaDataWrapper, TestStoreFuncWrapper |
| and TestStoreFuncMetadataWrapper fail on IBM JDK (ahireanup via daijy) |
| |
| PIG-4024: TestPigStreamingUDF and TestPigStreaming fail on IBM JDK (ahireanup via daijy) |
| |
| PIG-4023: BigDec/Int sort is broken (ahireanup via daijy) |
| |
| PIG-4003: Error is thrown by JobStats.getOutputSize() when storing to a Hive table (cheolsoo) |
| |
| PIG-4035: Fix CollectedGroup e2e tests for tez (daijy) |
| |
| PIG-4034: Exclude TestTezAutoParallelism when -Dhadoopversion=20 (cheolsoo) |
| |
| PIG-4033: Fix MergeSparseJoin e2e tests on tez (daijy) |
| |
| PIG-3478: Make StreamingUDF work for Hadoop 2 (lbendig via daijy) |
| |
| PIG-4032: BloomFilter fails with s3 path in Hadoop 2.4 (cheolsoo) |
| |
| PIG-4018: Schema validation fails with UNION ONSCHEMA (daijy) |
| |
| PIG-4022: Fix tez e2e test SkewedJoin_6 (daijy) |
| |
| PIG-4001: POPartialAgg aggregates too aggressively when multiple values aggregated (tmwoodruff via cheolsoo) |
| |
| PIG-4027: Always check for latest Tez snapshot dependencies (lbendig via cheolsoo) |
| |
| PIG-4020: Fix tez e2e tests MapPartialAgg_[2-4], StreamingPerformance_[6-7] (daijy) |
| |
| PIG-4019: Compilation broken after TEZ-1169 (daijy) |
| |
| PIG-4014: Fix Rank e2e test failures on tez (daijy) |
| |
| PIG-4013: Order by multiple columns fails on Tez (daijy) |
| |
| PIG-3983: TestGrunt.testKeepGoigFailed fails in tez mode (daijy) |
| |
| PIG-3959: Skewed join followed by replicated join fails in Tez (cheolsoo) |
| |
| PIG-3995: Tez unit tests shouldn't run when -Dhadoopversion=20 (cheolsoo) |
| |
| PIG-3986: PigSplit to support multiple split class (tongjie via cheolsoo) |
| |
| PIG-3988: PigStorage: CommandLineParser is not thread safe (tmwoodruff via cheolsoo) |
| |
| PIG-2409: Pig shows wrong tracking URL for hadoop 2 (lbendig via rohini) |
| |
| PIG-3978: Container reuse does not work across PigServer (daijy) |
| |
| PIG-3974: E2E test data generation fails in cluster mode (lbendig via cheolsoo) |
| |
| PIG-3969: Javascript UDF fails if no output schema is defined (lbendig via cheolsoo) |
| |
| PIG-3971: Pig on tez fails to run in Oozie in secure cluster (rohini) |
| |
| PIG-3968: OperatorPlan.serialVersionUID is not defined (daijy) |
| |
| |
| Release 0.13.1 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-4139: pig query throws error java.lang.NoSuchFieldException: jobsInProgress on MRv1 (satish via cheolsoo) |
| |
| PIG-4133: Need to update the default $HCAT_HOME dir in the PIG script (mnarayan via cheolsoo) |
| |
| PIG-4106: Describe shouldn't trigger execution in batch mode (cheolsoo) |
| |
| |
| Release 0.13.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-3996: Delete zebra from svn (cheolsoo) |
| |
| PIG-3898: Refactor PPNL for non-MR execution engine (cheolsoo) |
| |
| PIG-3485: Remove CastUtils.bytesToMap(byte[] b) method from LoadCaster interface (cheolsoo) |
| |
| PIG-3419: Pluggable Execution Engine (achalsoni81 via cheolsoo) |
| |
| PIG-2207: Support custom counters for aggregating warnings from different udfs (aniket486) |
| |
| IMPROVEMENTS |
| |
| PIG-3892: Pig distribution for hadoop 2 (daijy) |
| |
| PIG-4006: Make the interval of DAGStatus report configurable (cheolsoo) |
| |
| PIG-3999: Document PIG-3388 (lbendig via cheolsoo) |
| |
| PIG-3954: Document use of user level jar cache (aniket486) |
| |
| PIG-3752: Fix e2e Parallel test for Windows (daijy) |
| |
| PIG-3966: Document variable input arguments of UDFs (lbendig via aniket486) |
| |
| PIG-3963: Documentation for BagToString UDF (mrflip via daijy) |
| |
| PIG-3929: pig.temp.dir should allow substituting vars as hadoop configuration does (aniket486) |
| |
| PIG-3913: Pig should use job's jobClient wherever possible (fixes local mode counters) (aniket486) |
| |
| PIG-3941: Piggybank's Over UDF returns an output schema with named fields (mrflip via cheolsoo) |
| |
| PIG-3545: Separate validation rules from optimizer (daijy) |
| |
| PIG-3745: Document auto local mode for pig (aniket486) |
| |
| PIG-3932: Document ROUND_TO builtin UDF (mrflip via cheolsoo) |
| |
| PIG-3926: ROUND_TO function: rounds double/float to fixed number of decimal places (mrflip via cheolsoo) |
| |
| PIG-3901: Organize the Pig properties file and document all properties (mrflip via cheolsoo) |
| |
| PIG-3867: Added hadoop home to build classpath for building pig with unit tests on windows (Sergey Svinarchuk via gates) |
| |
| PIG-3914: Change TaskContext to abstract class (cheolsoo) |
| |
| PIG-3672: Pig should not check for hardcoded file system implementations (rohini) |
| |
| PIG-3860: Refactor PigStatusReporter and PigLogger for non-MR execution engine (cheolsoo) |
| |
| PIG-3865: Remodel the XMLLoader to be faster and more maintainable (aseldawy via daijy) |
| |
| PIG-3737: Bundle dependent jars in distribution in %PIG_HOME%/lib folder (daijy) |
| |
| PIG-3771: Piggybank Avrostorage makes a lot of namenode calls in the backend (rohini) |
| |
| PIG-3851: Upgrade jline to 2.11 (daijy) |
| |
| PIG-3884: Move multi store counters to PigStatsUtil from MRPigStatsUtil (rohini) |
| |
| PIG-3591: Refactor POPackage to separate MR specific code from packaging (mwagner via cheolsoo) |
| |
| PIG-3449: Move JobCreationException to org.apache.pig.backend.hadoop.executionengine (cheolsoo) |
| |
| PIG-3765: Ability to disable Pig commands and operators (prkommireddi) |
| |
| PIG-3731: Ability to specify local-mode specific configuration (useful for local/auto-local mode) (aniket486) |
| |
| PIG-3793: Provide info on number of LogicalRelationalOperator(s) used in the script through LogicalPlanData (prkommireddi) |
| |
| PIG-3778: Log list of running jobs along with progress (rohini) |
| |
| PIG-3675: Documentation for AccumuloStorage (elserj via daijy) |
| |
| PIG-3648: Make the sample size for RandomSampleLoader configurable (cheolsoo) |
| |
| PIG-259: allow store to overwrite existing directory (nezihyigitbasi via daijy) |
| |
| PIG-2672: Optimize the use of DistributedCache (aniket486) |
| |
| PIG-3238: Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters |
| and inserts another set of characters at a specified starting point (nezihyigitbasi via daijy) |
| |
| PIG-3299: Provide support for LazyOutputFormat to avoid creating empty files (lbendig via daijy) |
| |
| PIG-3642: Direct HDFS access for small jobs (fetch) (lbendig via cheolsoo) |
| |
| PIG-3730: Performance issue in SelfSpillBag (rajesh.balamohan via rohini) |
| |
| PIG-3654: Add class cache to PigContext (tmwoodruff via daijy) |
| |
| PIG-3463: Pig should use hadoop local mode for small jobs (aniket486) |
| |
| PIG-3573: Provide StoreFunc and LoadFunc for Accumulo (elserj via daijy) |
| |
| PIG-3653: Add support for pre-deployed jars (tmwoodruff via daijy) |
| |
| PIG-3645: Move FileLocalizer.setR() calls to unit tests (cheolsoo) |
| |
| PIG-3637: PigCombiner creating log spam (rohini) |
| |
| PIG-3632: Add option to configure cacheBlocks in HBaseStorage (rohini) |
| |
| PIG-3619: Provide XPath function (Saad Patel via gates) |
| |
| PIG-3590: remove PartitionFilterOptimizer from trunk (aniket486) |
| |
| PIG-3580: MIN, MAX and AVG functions for BigDecimal and BigInteger (harichinnan via cheolsoo) |
| |
| PIG-3569: SUM function for BigDecimal and BigInteger (harichinnan via rohini) |
| |
| PIG-3505: Make AvroStorage sync interval take default from io.file.buffer.size (rohini) |
| |
| PIG-3563: support adding archives to the distributed cache (jdonofrio via cheolsoo) |
| |
| PIG-3388: No support for Regex for row filter in org.apache.pig.backend.hadoop.hbase.HBaseStorage (lbendig via cheolsoo) |
| |
| PIG-3522: Remove shock from pig (daijy) |
| |
| PIG-3295: Casting from bytearray failing after Union even when each field is from a single Loader (knoguchi) |
| |
| PIG-3444: CONCAT with 2+ input parameters fail (lbendig via daijy) |
| |
| PIG-3117: A debug mode in which pig does not delete temporary files (ihadanny via cheolsoo) |
| |
| PIG-3484: Make the size of pig.script property configurable (cheolsoo) |
| |
| OPTIMIZATIONS |
| |
| PIG-3882: Multiquery off mode execution is not done in batch and very inefficient (rohini) |
| |
| BUG FIXES |
| |
| PIG-4037: TestHBaseStorage, TestAccumuloPigCluster has failures with hadoopversion=23 (daijy) |
| |
| PIG-4005: depend on hbase-hadoop2-compat rather than hbase-hadoop1-compat when hbaseversion is 95 (daijy) |
| |
| PIG-4021: Fix TestHBaseStorage failure after auto local mode change (PIG-3463) (daijy) |
| |
| PIG-4029: TestMRCompiler is broken after PIG-3874 (daijy) |
| |
| PIG-4030: TestGrunt, TestPigRunner fail after PIG-3892 (daijy) |
| |
| PIG-3975: Multiple Scalar reference calls leading to missing records (knoguchi via rohini) |
| |
| PIG-4017: NPE thrown from JobControlCompiler.shipToHdfs (cheolsoo) |
| |
| PIG-3997: Issue on Pig docs: Testing and Diagnostics (zjffdu via cheolsoo) |
| |
| PIG-3998: Documentation fix: invalid page links, wrong Groovy udf example (lbendig via cheolsoo) |
| |
| PIG-4000: Minor documentation fix for PIG-3642 (lbendig via cheolsoo) |
| |
| PIG-3991: TestErrorHandling.tesNegative7 is broken in trunk/branch-0.13 (cheolsoo) |
| |
| PIG-3990: ant docs is broken in trunk/branch-0.13 (cheolsoo) |
| |
| PIG-3989: PIG_OPTS does not work with some versions of HADOOP (daijy) |
| |
| PIG-3739: The Warning_4 e2e test is broken in trunk (aniket486) |
| |
| PIG-3976: Typo correction in JobStats breaks Oozie (rohini) |
| |
| PIG-3874: FileLocalizer temp path can sometimes be non-unique (chitnis via cheolsoo) |
| |
| PIG-3967: Grunt fails when running more statements after the first store (daijy) |
| |
| PIG-3915: MapReduce queries in Pigmix output different results than Pig's (keren3000 via daijy) |
| |
| PIG-3955: Remove url.openStream() file descriptor leak from JCC (aniket486) |
| |
| PIG-3958: TestMRJobStats is broken in 0.13 and trunk (aniket486) |
| |
| PIG-3949: HiveColumnarStorage compile failure with Hive 0.14.0 (daijy) |
| |
| PIG-3960: Compile fail against Hadoop 2.4.0 after PIG-3913 (daijy) |
| |
| PIG-3956: UDF profile is often misleading (cheolsoo) |
| |
| PIG-3950: Removing empty file PColFilterExtractor.java speeds up rebuilds (mrflip via cheolsoo) |
| |
| PIG-3940: NullPointerException writing .pig_header for field with null name in JsonMetadata.java (mrflip via cheolsoo) |
| |
| PIG-3944: PigNullableWritable toString method throws NPE on null value (mauzhang via cheolsoo) |
| |
| PIG-3936: DBStorage fails on storing nulls for non varchar columns (jeremykarn via cheolsoo) |
| |
| PIG-3945: Ant not sending hadoopversion to piggybank sub-ant (mrflip via cheolsoo) |
| |
| PIG-3942: Util.buildPp() is incompatible with Non-MR execution engine (cheolsoo) |
| |
| PIG-3902: PigServer creates cycle (thedatachef via cheolsoo) |
| |
| PIG-3930: "java.io.IOException: Cannot initialize Cluster" in local mode with hadoopversion=23 dependencies (jira.shegalov via cheolsoo) |
| |
| PIG-3921: Obsolete entries in piggybank javadoc build script (mrflip via cheolsoo) |
| |
| PIG-3923: Gitignore file should ignore all generated artifacts (mrflip via cheolsoo) |
| |
| PIG-3922: Increase Forrest heap size to avoid OutOfMemoryError building docs (mrflip via cheolsoo) |
| |
| PIG-3916: isEmpty should not be early terminating (rohini) |
| |
| PIG-3859: auto local mode should not modify reducer configuration (aniket486) |
| |
| PIG-3909: Type Casting issue (daijy) |
| |
| PIG-3905: 0.12.1 release can't be built for Hadoop2 (daijy) |
| |
| PIG-3894: Datetime function AddDuration, SubtractDuration and all Between functions don't check for null values in the input tuple (jennythompson via cheolsoo) |
| |
| PIG-3889: Direct fetch doesn't set job submission timestamps (cheolsoo) |
| |
| PIG-3895: Pigmix run script has compilation error (rohini) |
| |
| PIG-3885: AccumuloStorage incompatible with Accumulo 1.6.0 (elserj via daijy) |
| |
| PIG-3888: Direct fetch doesn't differentiate between frontend and backend sides (lbendig via daijy) |
| |
| PIG-3887: TestMRJobStats is broken in trunk (cheolsoo) |
| |
| PIG-3868: Fix Iterator_1 e2e test on windows (ssvinarchukhorton via rohini) |
| |
| PIG-3871: Replace org.python.google.* with com.google.* in imports (cheolsoo) |
| |
| PIG-3858: PigLogger/PigStatusReporter is not set for fetch tasks (lbendig via cheolsoo) |
| |
| PIG-3798: Registered jar in pig script are appended to the classpath multiple times (cheolsoo) |
| |
| PIG-3844: Make ScriptState InheritableThreadLocal for threads that need it (amatsukawa via cheolsoo) |
| |
| PIG-3837: ant pigperf target is broken in trunk (cheolsoo) |
| |
| PIG-3836: Pig signature has guava version dependency (amatsukawa via cheolsoo) |
| |
| PIG-3832: Fix piggybank test compilation failure after PIG-3449 (rohini) |
| |
| PIG-3807: Pig creates wrong schema after dereferencing nested tuple fields with sorts (daijy) |
| |
| PIG-3802: Fix TestBlackAndWhitelistValidator failures (prkommireddi) |
| |
| PIG-3815: Hadoop bug causes pig to fail silently with jar cache (aniket486) |
| |
| PIG-3816: Incorrect Javadoc for launchPlan() method (kyungho via prkommireddi) |
| |
| PIG-3673: Divide by zero error in runpigmix.pl script (suhassatish via daijy) |
| |
| PIG-3805: ToString(datetime [, format string]) doesn't work without the second argument (jennythompson via daijy) |
| |
| PIG-3809: AddForEach optimization doesn't set the alias of the added foreach (cheolsoo) |
| |
| PIG-3811: PigServer.registerScript() wraps exception incorrectly on parsing errors (prkommireddi) |
| |
| PIG-3806: PigServer constructor throws NPE after PIG-3765 (aniket486) |
| |
| PIG-3801: Auto local mode does not call storeSchema (aniket486) |
| |
| PIG-3754: InputSizeReducerEstimator.getTotalInputFileSize reports incorrect size (aniket486) |
| |
| PIG-3679: e2e StreamingPythonUDFs_10 fails in trunk (cheolsoo) |
| |
| PIG-3776: Conflicting versions of jline are present in trunk (cheolsoo) |
| |
| PIG-3674: Fix TestAccumuloPigCluster on Hadoop 2 (elserj via daijy) |
| |
| PIG-3740: Document direct fetch optimization (lbendig via cheolsoo) |
| |
| PIG-3746: NPE is thrown if Pig fails before PigStats is initialized (cheolsoo) |
| |
| PIG-3747: Update skewed join documentation (cheolsoo) |
| |
| PIG-3755: auto local mode selection does not check lower bound for size (aniket486) |
| |
| PIG-3447: Compiler warning message dropped for CastLineageSetter and others with no enum kind (knoguchi via cheolsoo) |
| |
| PIG-3627: Json storage: Doesn't work in cases where other Store Functions (like PigStorage / AvroStorage) |
| do work (ssvinarchukhorton via daijy) |
| |
| PIG-3606: Pig script throws error when searching for hcatalog jars in latest hive (deepesh via daijy) |
| |
| PIG-3623: HBaseStorage: setting loadKey and noWAL to false doesn't have any effect (nezihyigitbasi via rohini) |
| |
| PIG-3744: SequenceFileLoader does not support BytesWritable (rohini) |
| |
| PIG-3726: Ranking empty records leads to NullPointerException (jarcec via daijy) |
| |
| PIG-3652: Pigmix parser (PigPerformanceLoader) deletes chars during parsing (keren3000 via daijy) |
| |
| PIG-3722: Udf deserialization for registered classes fails in local_mode (aniket486) |
| |
| PIG-3641: Split "otherwise" producing incorrect output when combined with ColumnPruning (knoguchi) |
| |
| PIG-3682: mvn-inst target does not install pig-h2.jar into local .m2 (raluri via aniket486) |
| |
| PIG-3511: Security: Pig temporary directories might have world readable permissions (rohini) |
| |
| PIG-3664: Piggy Bank XPath UDF can't be called (nezihyigitbasi via daijy) |
| |
| PIG-3662: Static loadcaster in BinStorage can cause exception (lbendig via rohini) |
| |
| PIG-3617: problem with temp file deletion in MAPREDUCE operator (nezihyigitbasi via cheolsoo) |
| |
| PIG-3649: POPartialAgg incorrectly calculates size reduction when multiple values aggregated (tmwoodruff via daijy) |
| |
| PIG-3650: Fix for PIG-3100 breaks column pruning (tmwoodruff via daijy) |
| |
| PIG-3643: Nested Foreach with UDF and bincond is broken (cheolsoo) |
| |
| PIG-3616: TestBuiltIn.testURIwithCurlyBrace() silently fails (lbendig via cheolsoo) |
| |
| PIG-3608: ClassCastException when looking up a value from AvroMapWrapper using a Utf8 key (rding) |
| |
| PIG-3639: TestRegisteredJarVisibility is broken in trunk (cheolsoo) |
| |
| PIG-3640: Retain intermediate files for debugging purpose in batch mode (cheolsoo) |
| |
| PIG-3609: ClassCastException when calling compareTo method on AvroBagWrapper (rding via cheolsoo) |
| |
| PIG-3584: AvroStorage does not correctly translate arrays of strings (jadler via cheolsoo) |
| |
| PIG-3633: AvroStorage tests are failing when running against Avro 1.7.5 (jarcec via cheolsoo) |
| |
| PIG-3612: Storing schema does not work cross cluster with PigStorage and JsonStorage (rohini) |
| |
| PIG-3607: PigRecordReader should report progress for each inputsplit processed (rohini) |
| |
| PIG-3566: Cannot set useMatches of REGEX_EXTRACT_ALL and REGEX_EXTRACT (nezihyigitbasi via cheolsoo) |
| |
| PIG-2132: [Piggybank] MIN and MAX functions should ignore nulls (rekhajoshm via cheolsoo) |
| |
| PIG-3581: Incorrect scope resolution with nested foreach (aniket486) |
| |
| PIG-3285: Jobs using HBaseStorage fail to ship dependency jars (ndimiduk via cheolsoo) |
| |
| PIG-3582: Document SUM, MIN, MAX, and AVG functions for BigInteger and BigDecimal (harichinnan via cheolsoo) |
| |
| PIG-3525: PigStats.get() and ScriptState.get() shouldn't return MR-specific objects (cheolsoo) |
| |
| PIG-3568: Define the semantics of POStatus.STATUS_NULL (mwagner via cheolsoo) |
| |
| PIG-3561: Clean up PigStats and JobStats after PIG-3419 (cheolsoo) |
| |
| PIG-3553: HadoopJobHistoryLoader fails to load job history on hadoop v 1.2 (lgiri via cheolsoo) |
| |
| PIG-3559: Trunk is broken by PIG-3522 (cheolsoo) |
| |
| PIG-3551: Minor typo on pig latin basics page (elserj via aniket486) |
| |
| PIG-3526: Unions with Enums do not work with AvroStorage (jadler via cheolsoo) |
| |
| PIG-3377: New AvroStorage throws NPE when storing untyped map/array/bag (jadler via cheolsoo) |
| |
| PIG-3542: Javadoc of REGEX_EXTRACT_ALL (nyigitba via daijy) |
| |
| PIG-3518: Need to ship jruby.jar in the release (daijy) |
| |
| PIG-3524: Clean up Launcher and MapReduceLauncher after PIG-3419 (cheolsoo) |
| |
| PIG-3515: Shell commands are limited from OS buffer (andronat via cheolsoo) |
| |
| PIG-3520: Provide backward compatibility for PigRunner and PPNL after PIG-3419 (daijy via cheolsoo) |
| |
| PIG-3519: Remove dependency on uber avro-tools jar (jarcec via cheolsoo) |
| |
| PIG-3451: EvalFunc<T> ctor reflection to determine value of type param T is brittle (hazen via aniket486) |
| |
| PIG-3509: Exception swallowing in TOP (vrajaram via aniket486) |
| |
| PIG-3506: FLOOR documentation references CEIL function instead of FLOOR (seshness via daijy) |
| |
| PIG-3497: JobControlCompiler should only do reducer estimation when the job has a reduce phase (amatsukawa via aniket486) |
| |
| PIG-3469: Skewed join can cause unrecoverable NullPointerException when one of its inputs is missing (Jarek Jarcec Cecho via xuefuz) |
| |
| PIG-3496: Propagate HBase 0.95 jars to the backend (Jarek Jarcec Cecho via xuefuz) |
| |
| |
| Release 0.12.1 (unreleased changes) |
| |
| IMPROVEMENTS |
| |
| PIG-3529: Upgrade HBase dependency from 0.95-SNAPSHOT to 0.96 (jarcec via daijy) |
| |
| PIG-3552: UriUtil used by reducer estimator should support viewfs (amatsukawa via aniket486) |
| |
| PIG-3549: Print hadoop jobids for failed, killed job (aniket486) |
| |
| PIG-3047: Check the size of a relation before adding it to distributed cache in Replicated join (aniket486) |
| |
| PIG-3480: TFile-based tmpfile compression crashes in some cases (dvryaboy via aniket486) |
| |
| BUG FIXES |
| |
| PIG-3661: Piggybank AvroStorage fails if used in more than one load or store statement (rohini) |
| |
| PIG-3819: e2e tests containing "perl -e "print $_;" fail on Hadoop 2 (daijy) |
| |
| PIG-3813: Rank column is assigned different uids every time when schema is reset (cheolsoo) |
| |
| PIG-3833: Relation loaded by AvroStorage with schema is projected incorrectly in foreach statement (jeongjinku via cheolsoo) |
| |
| PIG-3794: pig -useHCatalog fails using pig command line interface on HDInsight (ehans via daijy) |
| |
| PIG-3827: Custom partitioner is not picked up with secondary sort optimization (daijy) |
| |
| PIG-3826: Outer join with PushDownForEachFlatten generates wrong result (daijy) |
| |
| PIG-3820: TestAvroStorage fail on some OS (daijy) |
| |
| PIG-3818: PIG-2499 is accidentally reverted (daijy) |
| |
| PIG-3516: pig does not bring in joda-time as dependency in its pig-template.xml (daijy) |
| |
| PIG-3753: LOGenerate generates null schema (daijy) |
| |
| PIG-3782: PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing due to incorrect UID assignment (knoguchi via daijy) |
| |
| PIG-3779: Assert constructs ConstantExpression with null when no comment is given (thedatachef via cheolsoo) |
| |
| PIG-3777: Pig 12.0 Documentation (karinahauser via daijy) |
| |
| PIG-3774: Piggybank Over UDF get wrong result (daijy) |
| |
| PIG-3657: New partition filter extractor fails with NPE (cheolsoo) |
| |
| PIG-3347: Store invocation brings side effect (daijy) |
| |
| PIG-3670: Fix assert in Pig script (daijy) |
| |
| PIG-3741: Utils.setTmpFileCompressionOnConf can cause side effect for SequenceFileInterStorage (aniket486) |
| |
| PIG-3677: ConfigurationUtil.getLocalFSProperties can return an inconsistent property set (rohini) |
| |
| PIG-3621: Python Avro library can't read Avros made with builtin AvroStorage (rusell.jurney via cheolsoo) |
| |
| PIG-3592: Should not try to create success file for non-fs schemes like hbase (rohini) |
| |
| PIG-3572: Fix all unit tests when building Pig with Hadoop 2.X on Windows (ssvinarchukhorton via daijy) |
| |
| PIG-2629: Wrong usage of a scalar which is null causes a high number of namenode operations (rohini) |
| |
| PIG-3593: Importing a Jython standard module fails on cluster (daijy) |
| |
| PIG-3576: NPE due to PIG-3549 when job never gets submitted (lbendig via cheolsoo) |
| |
| PIG-3567: LogicalPlanPrinter throws OOM for large scripts (aniket486) |
| |
| PIG-3579: pig.script's deserialized version does not maintain line numbers (jgzhang via aniket486) |
| |
| PIG-3570: Rollback PIG-3060 (daijy) |
| |
| PIG-3530: Some e2e tests are broken due to PIG-3480 (daijy) |
| |
| PIG-3492: ColumnPrune dropping used column due to LogicalRelationalOperator.fixDuplicateUids changes not propagating (knoguchi via daijy) |
| |
| PIG-3325: Adding a tuple to a bag is slow (dvryaboy via aniket486) |
| |
| PIG-3512: Reducer estimator is broken by PIG-3497 |
| |
| PIG-3510: New filter extractor fails with more than one filter statement (aniket486 via cheolsoo) |
| |
| |
| Release 0.12.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-3082: outputSchema of a UDF allows two usages when describing a Tuple schema (jcoveney) |
| |
| PIG-3191: [piggybank] MultiStorage output filenames are not sortable (Danny Antonelli via jcoveney) |
| |
| PIG-3174: Remove rpm and deb artifacts from build.xml (gates) |
| |
| IMPROVEMENTS |
| |
| PIG-3503: More documentation for new Pig 0.12 features (daijy) |
| |
| PIG-3445: Make Parquet format available out of the box in Pig (lbendig via aniket486) |
| |
| PIG-3483: Document ASSERT keyword (aniket486 via daijy) |
| |
| PIG-3470: Print configuration variables in grunt (lbendig via daijy) |
| |
| PIG-3493: Add max/min for datetime (tyro89 via daijy) |
| |
| PIG-3479: Fix BigInt, BigDec, Date serialization. Improve perf of PigNullableWritable deserialization (dvryaboy) |
| |
| PIG-3461: Rewrite PartitionFilterOptimizer to make it work for all the cases (aniket486) |
| |
| PIG-2417: Streaming UDFs - allow users to easily write UDFs in scripting languages with no |
| JVM implementation. (jeremykarn via daijy) |
| |
| PIG-3199: Provide a method to retrieve the name of loader/storer in PigServer (prkommireddi via daijy) |
| |
| PIG-3367: Add assert keyword (operator) in pig (aniket486) |
| |
| PIG-3235: Avoid extra byte array copies in streaming (rohini) |
| |
| PIG-3065: pig output format/committer should support recovery for hadoop 0.23 (daijy) |
| |
| PIG-3390: Make pig working with HBase 0.95 (jarcec via daijy) |
| |
| PIG-3431: Return more information for parsing related exceptions. (jeremykarn via daijy) |
| |
| PIG-3430: Add xml format for explaining MapReduce Plan. (jeremykarn via daijy) |
| |
| PIG-3048: Add mapreduce workflow information to job configuration (billie.rinaldi via daijy) |
| |
| PIG-3436: Make pigmix run with Hadoop2 (rohini) |
| |
| PIG-3424: Package import list should consider class name as is first even if -Dudf.import.list is passed (rohini) |
| |
| PIG-3204: Change script parsing to parse entire script instead of line by line (rohini) |
| |
| PIG-3359: Register Statements and Param Substitution in Macros (jpacker via cheolsoo) |
| |
| PIG-3182: Pig currently lacks functions to trim whitespace on only one side (sarutak via cheolsoo) |
| |
| PIG-3163: Current Pig releases lack a UDF endsWith, which tests whether a given string ends with the specified suffix (sriramkrishnan via cheolsoo) |
| |
| PIG-3015: Rewrite of AvroStorage (jadler via cheolsoo) |
| |
| PIG-3361: Improve Hadoop version detection logic for Pig unit test (daijy) |
| |
| PIG-3280: Document IN operator and CASE expression (cheolsoo) |
| |
| PIG-3342: Allow conditions in case statement (cheolsoo) |
| |
| PIG-3327: Pig hits OOM when fetching task reports (rohini) |
| |
| PIG-3336: Change IN operator to use or-expressions instead of EvalFunc (cheolsoo) |
| |
| PIG-3339: Move pattern compilation in ToDate as a static variable (rohini) |
| |
| PIG-3332: Upgrade Avro dependency to 1.7.4 (nielsbasjes via cheolsoo) |
| |
| PIG-3307: Refactor physical operators to remove method parameters that are always null (julien) |
| |
| PIG-3317: disable optimizations via pig properties (traviscrawford via billgraham) |
| |
| PIG-3321: AVRO: Support user specified schema on load (harveyc via rohini) |
| |
| PIG-2959: Add a pig.cmd for Pig to run under Windows (daijy) |
| |
| PIG-3311: add pig-withouthadoop-h2 to mvn-jar (julien) |
| |
| PIG-2873: Converting bin/pig shell script to python (vikram.dixit via daijy) |
| |
| PIG-3308: Storing data in hive columnar rc format (maczech via daijy) |
| |
| PIG-3303: add hadoop h2 artifact to publications in ivy.xml (julien) |
| |
| PIG-3169: Remove intermediate data after a job finishes (mwagner via cheolsoo) |
| |
| PIG-3173: Partition filter push down does not happen when the partition key condition includes an AND and OR construct (rohini) |
| |
| PIG-2786: enhance Pig launcher script wrt. HBase/HCat integration (ndimiduk via daijy) |
| |
| PIG-3198: Let users use any function from PigType -> PigType as if it were builtin (jcoveney) |
| |
| PIG-3268: Case statement support (cheolsoo) |
| |
| PIG-3269: In operator support (cheolsoo) |
| |
| PIG-200: Pig Performance Benchmarks (daijy) |
| |
| PIG-3261: User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not |
| appended (qwertymaniac via daijy) |
| |
| PIG-3141: Giving CSVExcelStorage an option to handle header rows (jpacker via cheolsoo) |
| |
| PIG-3217: Add support for DateTime type in Groovy UDFs (herberts via daijy) |
| |
| PIG-3218: Add support for biginteger/bigdecimal type in Groovy UDFs (herberts via daijy) |
| |
| PIG-3248: Upgrade hadoop-2.0.0-alpha to hadoop-2.0.3-alpha (daijy) |
| |
| PIG-3235: Add log4j.properties for unit tests (cheolsoo) |
| |
| PIG-3236: parametrize snapshot and staging repo id (gkesavan via daijy) |
| |
| PIG-3244: Make PIG_HOME configurable (robert.schooley@gmail.com via daijy) |
| |
| PIG-3233: Deploy a Piggybank Jar (njw45 via cheolsoo) |
| |
| PIG-3245: Documentation about HBaseStorage (Daisuke Kobayashi via cheolsoo) |
| |
| PIG-3211: Allow default Load/Store funcs to be configurable (prkommireddi via cheolsoo) |
| |
| PIG-3136: Introduce a syntax making declared aliases optional (jcoveney via cheolsoo) |
| |
| PIG-3142: [piggybank] Fixed-width load and store functions for the Piggybank (jpacker via cheolsoo) |
| |
| PIG-3162: PigTest.assertOutput doesn't allow non-default delimiter (dreambird via cheolsoo) |
| |
| PIG-3002: Pig client should handle CountersExceededException (jarcec via billgraham) |
| |
| PIG-3189: Remove ivy/pig.pom and improve build mvn targets (billgraham) |
| |
| PIG-3192: Better call to action to download Pig in docs (rjurney via jcoveney) |
| |
| PIG-3167: Job stats are printed incorrectly for map-only jobs (Mark Wagner via jcoveney) |
| |
| PIG-3131: Document PluckTuple UDF (rjurney via jcoveney) |
| |
| PIG-3098: Add another test for the self join case (jcoveney) |
| |
| PIG-3129: Document syntax to refer to previous relation (rjurney via jcoveney) |
| |
| PIG-2553: Pig shouldn't allow attempts to write multiple relations into same directory (prkommireddi via cheolsoo) |
| |
| PIG-3179: Task Information Header only prints out the first split for each task (knoguchi via rohini) |
| |
| PIG-3108: HBaseStorage returns empty maps when mixing wildcard with other columns (christoph.bauer via billgraham) |
| |
| PIG-3178: Print a stacktrace when ExecutableManager hits an OOM (knoguchi via rohini) |
| |
| PIG-3160: GFCross uses unnecessary loop (sandyr via cheolsoo) |
| |
| PIG-3138: Decouple PigServer.executeBatch() from compilation of batch (pkommireddi via cheolsoo) |
| |
| PIG-2878: Current Pig releases lack a UDF equalIgnoreCase. This function returns a Boolean value indicating whether string left is equal to string right. This |
| check is case insensitive. (shami via gates) |
| |
| PIG-2994: Grunt shortcuts (prasanth_j via cheolsoo) |
| |
| PIG-3140: Document PigProgressNotificationListener configs (billgraham) |
| |
| PIG-3139: Document reducer estimation (billgraham) |
| |
| PIG-2764: Add a biginteger and bigdecimal type to pig (jcoveney) |
| |
| PIG-3073: POUserFunc creating log spam for large scripts (jcoveney) |
| |
| PIG-3124: Push FLATTENs After FILTERs If Possible (nwhite via daijy) |
| |
| PIG-3086: Allow A Prefix To Be Added To URIs In PigUnit Tests (nwhite via gates) |
| |
| PIG-3091: Make schema, header and stats file configurable in JsonMetadata (pkommireddi via jcoveney) |
| |
| PIG-3078: Make a UDF that, given a string, returns just the columns prefixed by that string (jcoveney) |
| |
| PIG-3090: Introduce a syntax to be able to easily refer to the previously defined relation (jcoveney) |
| |
| PIG-3057: Make PigStorage.readField() protected (pablomar and billgraham via billgraham) |
| |
| PIG-2788: improved string interpolation of variables (jcoveney) |
| |
| PIG-2362: Rework Ant build.xml to use macrodef instead of antcall (azaroth via cheolsoo) |
| |
| PIG-2857: Add a -tagPath option to PigStorage (prkommireddi via cheolsoo) |
| |
| PIG-2341: Need better documentation on Pig/HBase integration (jthakrar and billgraham via billgraham) |
| |
| PIG-3075: Allow AvroStorage STORE Operations To Use Schema Specified By URI (nwhite via cheolsoo) |
| |
| PIG-3062: Change HBaseStorage to permit overriding pushProjection (billgraham) |
| |
| PIG-3016: Modernize more tests (jcoveney via cheolsoo) |
| |
| PIG-2582: Store size in bytes (not mbytes) in ResourceStatistics (prkommireddi via billgraham) |
| |
| PIG-3006: Modernize a chunk of the tests (jcoveney via cheolsoo) |
| |
| PIG-2997: Provide a convenience constructor on PigServer that accepts Configuration (prkommireddi via rohini) |
| |
| PIG-2933: HBaseStorage is using setScannerCaching which is deprecated (prkommireddi via rohini) |
| |
| PIG-2881: Add SUBTRACT eval function (jocosti via cheolsoo) |
| |
| PIG-3004: Improve exceptions messages when a RuntimeException is raised in Physical Operators (julien) |
| |
| PIG-2990: the -secretDebugCmd shouldn't be a secret and should just be...a command (jcoveney) |
| |
| PIG-2941: Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices (jgordon via azaroth) |
| |
| PIG-2778: Add 'matches' operator to predicate pushdown (cheolsoo via jcoveney) |
| |
| PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms) |
| |
| PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates) |
| |
| PIG-2910: Add function to read schema from output of Schema.toString() (initialcontext via thejas) |
| |
| OPTIMIZATIONS |
| |
| PIG-3395: Large filter expression makes Pig hang (cheolsoo) |
| |
| PIG-3123: Simplify Logical Plans By Removing Unnecessary Identity Projections (njw45 via cheolsoo) |
| |
| PIG-3013: BinInterSedes improve chararray sort performance (rohini) |
| |
| BUG FIXES |
| |
| PIG-3504: Fix e2e Describe_cmdline_12 (cheolsoo via daijy) |
| |
| PIG-3128: Document the BigInteger and BigDecimal data type (daijy via cheolsoo) |
| |
| PIG-3495: Streaming udf e2e tests failures on Windows (daijy) |
| |
| PIG-3292: Logical plan invalid state: duplicate uid in schema during self-join to get cross product (cheolsoo via daijy) |
| |
| PIG-3491: Fix e2e failure Jython_Diagnostics_4 (daijy) |
| |
| PIG-3114: Duplicated macro name error when using pigunit (daijy) |
| |
| PIG-3370: Add New Reserved Keywords To The Pig Docs (cheolsoo) |
| |
| PIG-3487: Fix syntax errors in nightly.conf (arpitgupta via daijy) |
| |
| PIG-3458: ScalarExpression lost with multiquery optimization (knoguchi) |
| |
| PIG-3360: Some intermittent negative e2e tests fail on hadoop 2 (daijy) |
| |
| PIG-3468: PIG-3123 breaks e2e test Jython_Diagnostics_2 (daijy) |
| |
| PIG-3471: Add a base abstract class for ExecutionEngine (cheolsoo) |
| |
| PIG-3466: Race Conditions in InternalDistinctBag during proactive spill (cheolsoo) |
| |
| PIG-3464: Mark ExecType and ExecutionEngine interfaces as evolving (cheolsoo) |
| |
| PIG-3454: Update JsonLoader/JsonStorage (tyro89 via daijy) |
| |
| PIG-3333: Fix remaining Windows core unit test failures (daijy) |
| |
| PIG-3426: Add support for removing s3 files (jeremykarn via daijy) |
| |
| PIG-3349: Document ToString(Datetime, String) UDF (cheolsoo) |
| |
| PIG-3374: CASE and IN fail when expression includes dereferencing operator (cheolsoo) |
| |
| PIG-2606: union/join operations are not accepting same alias as multiple inputs (hsubramaniyan via daijy) |
| |
| PIG-3379: Alias reuse in nested foreach causes PIG script to fail (xuefuz via daijy) |
| |
| PIG-3432: typo in log message in SchemaTupleFrontend (epishkin via cheolsoo) |
| |
| PIG-3410: LimitOptimizer is applied before PartitionFilterOptimizer (aniket486) |
| |
| PIG-3405: Top UDF documentation indicates improper use (aniket486 via cheolsoo) |
| |
| PIG-3425: Hive jdo api jar referenced in pig script throws error (deepesh via cheolsoo) |
| |
| PIG-3422: AvroStorage failed to read paths separated by commas (yuanlid via rohini) |
| |
| PIG-3420: Failed to retrieve map values from data loaded by AvroStorage (yuanlid via rohini) |
| |
| PIG-3414: QueryParserDriver.parseSchema(String) silently returns a wrong result when a comma is missing in the schema definition (cheolsoo) |
| |
| PIG-3412: jsonstorage breaks when tuple does not have as many columns as schema (aesilberstein via cheolsoo) |
| |
| PIG-3243: Documentation error (sarutak via cheolsoo) |
| |
| PIG-3210: Pig fails to start when it cannot write log to log files (mengsungwu via cheolsoo) |
| |
| PIG-3392: Document STARTSWITH and ENDSWITH UDFs (sriramkrishnan via cheolsoo) |
| |
| PIG-3393: STARTSWITH udf doesn't override outputSchema method (sriramkrishnan via cheolsoo) |
| |
| PIG-3389: "Set job.name" does not work with dump command (cheolsoo) |
| |
| PIG-3387: Misspelling in test code "TestBuiltin.java" (sarutak via cheolsoo) |
| |
| PIG-3384: Missing negation in UDF doc sample code (ddamours via cheolsoo) |
| |
| PIG-3369: unit test TestImplicitSplitOnTuple.testImplicitSplitterOnTuple failed when using hadoopversion=23 (dreambird via cheolsoo) |
| |
| PIG-3375: CASE does not preserve the order of when branches (cheolsoo) |
| |
| PIG-3364: Case expression fails with an even number of when branches (cheolsoo) |
| |
| PIG-3354: UDF example does not handle nulls (patc888 via daijy) |
| |
| PIG-3355: ColumnMapKeyPrune bug with distinct operator (jeremykarn via aniket486) |
| |
| PIG-3318: AVRO: 'default value' not honored when merging schemas on load with AvroStorage (viraj via rohini) |
| |
| PIG-3250: Pig dryrun generates wrong output in .expanded file for 'SPLIT....OTHERWISE...' command (dreambird via cheolsoo) |
| |
| PIG-3331: Default values not stored in avro file when using specific schemas during store in AvroStorage (viraj via rohini) |
| |
| PIG-3322: AvroStorage gives NPE on reading file with union as top level schema (viraj via rohini) |
| |
| PIG-2828: Handle nulls in DataType.compare (aniket486) |
| |
| PIG-3335: TestErrorHandling.tesNegative7 fails on MR2 (xuefuz) |
| |
| PIG-3316: Pig failed to interpret DateTime values in some special cases (xuefuz) |
| |
| PIG-2956: Invalid cache specification for some streaming statement (daijy) |
| |
| PIG-3310: ImplicitSplitInserter does not generate new uids for nested schema fields, leading to miscomputations (cstenac via daijy) |
| |
| PIG-3334: Fix Windows piggybank unit test failures (daijy) |
| |
| PIG-3337: Fix remaining Windows e2e tests (daijy) |
| |
| PIG-3328: DataBags created with an initial list of tuples don't get registered as spillable (mwagner via daijy) |
| |
| PIG-3313: Pig job hangs if the job tracker is bounced during execution (yu.chenjie via daijy) |
| |
| PIG-3297: Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc (nielsbasjes via cheolsoo) |
| |
| PIG-3069: Native Windows Compatibility for Pig E2E Tests and Harness (anthony.murphy via daijy) |
| |
| PIG-3291: TestExampleGenerator fails on Windows because of lack of file name escaping (dwann via daijy) |
| |
| PIG-3026: Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences (dwann via daijy) |
| |
| PIG-3025: TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification (dwann via daijy) |
| |
| PIG-2955: Fix bunch of Pig e2e tests on Windows (daijy) |
| |
| PIG-3302: JSONStorage throws NPE if map has null values (rohini) |
| |
| PIG-3309: TestJsonLoaderStorage fails with IBM JDK 6/7 (lrangel via daijy) |
| |
| PIG-3097: HiveColumnarLoader doesn't correctly load partitioned Hive table (maczech via daijy) |
| |
| PIG-3305: Infinite loop when input path contains empty partition directory (maczech via daijy) |
| |
| PIG-3286: TestPigContext.testImportList fails in trunk (cheolsoo) |
| |
| PIG-2970: Nested foreach getting incorrect schema when having unrelated inner query (daijy) |
| |
| PIG-3304: XMLLoader in piggybank does not work with inline closed tags (aseldawy via daijy) |
| |
| PIG-3028: testGrunt dev test needs some command filters to run correctly without cygwin (jgordon via gates) |
| |
| PIG-3290: TestLogicalPlanBuilder.testQuery85 fail in trunk (daijy) |
| |
| PIG-3027: pigTest unit test needs a newline filter for comparisons of golden multi-line (jgordon via gates) |
| |
| PIG-2767: Pig creates wrong schema after dereferencing nested tuple fields (daijy) |
| |
| PIG-3276: change the default value for hcat.bin to hcat instead of /usr/local/hcat/bin/hcat (arpitgupta via daijy) |
| |
| PIG-3277: fix the path to the benchmarks file in the print statement (arpitgupta via daijy) |
| |
| PIG-3122: Operators should not implicitly become reserved keywords (jcoveney via cheolsoo) |
| |
| PIG-3193: Fix "ant docs" warnings (cheolsoo) |
| |
| PIG-3186: tar/deb/pkg ant targets should depend on piggybank (lbendig via gates) |
| |
| PIG-3270: Union onschema failing at runtime when merging incompatible types (knoguchi via daijy) |
| |
| PIG-3271: POSplit ignoring error from input processing giving empty results (knoguchi via daijy) |
| |
| PIG-2265: Test case TestSecondarySort failure (daijy) |
| |
| PIG-3060: FLATTEN in nested foreach fails when the input contains an empty bag (daijy) |
| |
| PIG-3249: Pig startup script prints out a wrong version of hadoop when using fat jar (prkommireddi via daijy) |
| |
| PIG-3110: pig corrupts chararrays with trailing whitespace when converting them to long (prkommireddi via daijy) |
| |
| PIG-3253: Misleading comment w.r.t getSplitIndex() method in PigSplit.java (cheolsoo) |
| |
| PIG-3208: [zebra] TFile should not set io.compression.codec.lzo.buffersize (ekoontz via daijy) |
| |
| PIG-3172: Partition filter push down does not happen when there is a non partition key map column filter (rohini) |
| |
| PIG-3205: Passing arguments to python script does not work with -f option (rohini) |
| |
| PIG-3239: Unable to return multiple values from a macro using SPLIT (dreambird via cheolsoo) |
| |
| PIG-3077: TestMultiQueryLocal should not write in /tmp (dreambird via cheolsoo) |
| |
| PIG-3081: Pig progress stays at 0% for the first job in hadoop 23 (rohini) |
| |
| PIG-3150: e2e Scripting_5 fails in trunk (dreambird via cheolsoo) |
| |
| PIG-3153: TestScriptUDF.testJavascriptExampleScript fails in trunk (cheolsoo) |
| |
| PIG-3145: Parameters in core-site.xml and mapred-site.xml are not correctly substituted (cheolsoo) |
| |
| PIG-3135: HExecutionEngine should look for resources in user passed Properties (prkommireddi via cheolsoo) |
| |
| PIG-3200: MiniCluster should delete hadoop-site.xml on shutDown (prkommireddi via cheolsoo) |
| |
| PIG-3158: Errors in the document "Control Structures" (miyakawataku via cheolsoo) |
| |
| PIG-3161: Update reserved keywords in Pig docs (russell.jurney via cheolsoo) |
| |
| PIG-3156: TestSchemaTuple fails in trunk (cheolsoo) |
| |
| PIG-3155: TestTypeCheckingValidatorNewLP.testSortWithInnerPlan3 fails in trunk (cheolsoo) |
| |
| PIG-3154: TestPackage.testOperator fails in trunk (dreambird via cheolsoo) |
| |
| PIG-3168: TestMultiQueryBasic.testMultiQueryWithSplitInMapAndMultiMerge fails in trunk (cheolsoo) |
| |
| PIG-3137: Fix Piggybank test to not use /tmp dir (dreambird via cheolsoo) |
| |
| PIG-3149: e2e build.xml still refers to jython 2.5.0 jar even though it's replaced by jython standalone 2.5.2 jar (cheolsoo) |
| |
| PIG-2266: Bug with input file joining optimization in Pig (jadler via cheolsoo) |
| |
| PIG-2645: PigSplit does not handle the case where SerializationFactory returns null (shami via gates) |
| |
| PIG-3031: Update Pig to use a newer version of joda-time (zjshen via cheolsoo) |
| |
| PIG-3071: Update hcatalog jar and path to hbase storage handler jar in pig script (arpitgupta via cheolsoo) |
| |
| PIG-3029: TestTypeCheckingValidatorNewLP has some path reference issues for cross-platform execution (jgordon via gates) |
| |
| PIG-3120: setStoreFuncUDFContextSignature called with null signature (jdler via cheolsoo) |
| |
| PIG-3115: Distinct Built-in Function Doesn't Handle Null Bags (njw45 via daijy) |
| |
| PIG-2433: Jython import module not working if module path is in classpath (rohini) |
| |
| PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (nwhite via gates) |
| |
| PIG-2251: PIG leaks Zookeeper connections when using HBaseStorage (jamarkha via cheolsoo) |
| |
| PIG-3112: Errors and omissions in document "User Defined Functions" (miyakawataku via cheolsoo) |
| |
| PIG-3050: Fix FindBugs multithreading warnings (cheolsoo) |
| |
| PIG-3066: Fix TestPigRunner in trunk (cheolsoo) |
| |
| PIG-3101: Increase io.sort.mb in YARN MiniCluster (cheolsoo) |
| |
| PIG-3100: If a .pig_schema file is present, can get an index out of bounds error (jcoveney) |
| |
| PIG-3096: Make PigUnit thread safe (cheolsoo) |
| |
| PIG-3095: "which" is called many, many times for each Pig STREAM statement (nwhite via cheolsoo) |
| |
| PIG-3085: Errors and omissions in document "Built In Functions" (miyakawataku via cheolsoo) |
| |
| PIG-3084: Improve exceptions messages in POPackage (julien) |
| |
| PIG-3072: Pig job reporting negative progress (knoguchi via rohini) |
| |
| PIG-3014: CurrentTime() UDF has undesirable characteristics (jcoveney via cheolsoo) |
| |
| PIG-2924: PigStats should not be assuming all Storage classes to be file-based storage (cheolsoo) |
| |
| PIG-3046: An empty file name in -Dpig.additional.jars throws an error (prkommireddi via cheolsoo) |
| |
| PIG-2989: Illustrate for Rank Operator (xalan via gates) |
| |
| PIG-2885: TestJobSubmission and TestHBaseStorage don't work with HBase 0.94 and ZK 3.4.3 (cheolsoo via sms) |
| |
| PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy) |
| |
| |
| Release 0.11.2 (Unreleased) |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-3380: Fix e2e test float precision related test failures when run with -Dpig.exec.mapPartAgg=true (anlilin via rohini) |
| |
| OPTIMIZATIONS |
| |
| PIG-2769: a simple logic causes very long compiling time on pig 0.10.0 (njw45 via dvryaboy) (prev. applied to 0.12) |
| |
| BUG FIXES |
| |
| PIG-3455: Pig 0.11.1 OutOfMemory error (rohini) |
| |
| PIG-3435: Custom Partitioner not working with MultiQueryOptimizer (knoguchi via daijy) |
| |
| PIG-3385: DISTINCT no longer uses custom partitioner (knoguchi via daijy) |
| |
| PIG-2507: Semicolon in parameters for UDF results in parsing error (tnachen via daijy) |
| |
| PIG-3341: Strict datetime parsing and improve performance of loading datetime values (rohini) |
| |
| PIG-3329: RANK operator failed when working with SPLIT (xalan via cheolsoo) |
| |
| PIG-3345: Handle null in DateTime functions (rohini) |
| |
| PIG-3223: AvroStorage does not handle comma separated input paths (dreambird via rohini) |
| |
| PIG-3262: Pig contrib 0.11 doesn't compile on certain rpm systems (mgrover via cheolsoo) |
| |
| PIG-3264: mvn signanddeploy target broken for pigunit, pigsmoke and piggybank (billgraham) |
| |
| |
| Release 0.11.1 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-3256: Upgrade jython to 2.5.3 (legal concern) (daijy) |
| |
| PIG-2988: start deploying pigunit maven artifact part of Pig release process (njw45 via rohini) |
| |
| PIG-3148: OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag. (knoguchi via rohini) |
| |
| PIG-3216: Groovy UDFs documentation has minor typos (herberts via rohini) |
| |
| PIG-3202: CUBE operator not documented in user docs (prasanth_j via billgraham) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-3267: HCatStorer fail in limit query (daijy) |
| |
| PIG-3252: AvroStorage gives wrong schema for schemas with named records (mwagner via cheolsoo) |
| |
| PIG-3132: NPE when illustrating a relation with HCatLoader (daijy) |
| |
| PIG-3194: Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 (prkommireddi via dvryaboy) |
| |
| PIG-3241: ConcurrentModificationException in POPartialAgg (dvryaboy) |
| |
| PIG-3144: Erroneous map entry alias resolution leading to "Duplicate schema alias" errors (jcoveney via cheolsoo) |
| |
| PIG-3212: Race Conditions in POSort and (Internal)SortedBag during Proactive Spill (kadeng via dvryaboy) |
| |
| PIG-3206: HBaseStorage does not work with Oozie pig action and secure HBase (rohini) |
| |
| |
| Release 0.11.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-3034: Remove Penny code from Pig repository (gates via cheolsoo) |
| |
| PIG-2931: $ signs in the replacement string make parameter substitution fail (cheolsoo via jcoveney) |
| |
| PIG-1891: Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates) |
| |
| IMPROVEMENTS |
| |
| PIG-3044: Trigger POPartialAgg compaction under GC pressure (dvryaboy) |
| |
| PIG-2907: Publish pig jars for Hadoop2/23 to maven (rohini) |
| |
| PIG-2934: HBaseStorage filter optimizations (billgraham) |
| |
| PIG-2980: documentation for DateTime datatype (zjshen via thejas) |
| |
| PIG-2982: add unit tests for DateTime type that test setting timezone (zjshen via thejas) |
| |
| PIG-2937: generated field in nested foreach does not inherit the variable name as the field name (jcoveney) |
| |
| PIG-3019: Need a target in build.xml for source releases (gates) |
| |
| PIG-2832: org.apache.pig.pigunit.pig.PigServer does not initialize udf.import.list of PigContext (prkommireddi via rohini) |
| |
| PIG-2898: Parallel execution of e2e tests (iveselovsky via rohini) |
| |
| PIG-2913: org.apache.pig.test.TestPigServerWithMacros fails sometimes because it picks up previous minicluster configuration file (cheolsoo via julien) |
| |
| PIG-2976: Reduce HBaseStorage logging (billgraham) |
| |
| PIG-2947: Documentation for Rank operator (xalan via azaroth) |
| |
| PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy) |
| |
| PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy) |
| |
| PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney) |
| |
| PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham) |
| |
| PIG-2579: Support for multiple input schemas in AvroStorage (cheolsoo via sms) |
| |
| PIG-2946: Documentation of "history" and "clear" commands (xalan via azaroth) |
| |
| PIG-2877: Make SchemaTuple work in foreach (and thus, in loads) (jcoveney) |
| |
| PIG-2923: Lazily register bags with SpillableMemoryManager (dvryaboy) |
| |
| PIG-2929: Improve documentation around AVG, CONCAT, MIN, MAX (cheolsoo via billgraham) |
| |
| PIG-2852: Update documentation regarding parallel local mode execution (cheolsoo via jcoveney) |
| |
| PIG-2879: Pig current releases lack a UDF startsWith. This UDF tests if a given string starts with the specified prefix. (initialcontext via azaroth) |
| |
| PIG-2712: Pig does not call OutputCommitter.abortJob() on the underlying OutputFormat (rohini via gates) |
| |
| PIG-2918: Avoid Spillable bag overhead where possible (dvryaboy) |
| |
| PIG-2900: Streaming should provide conf settings in the environment (dvryaboy) |
| |
| PIG-2353: RANK function like in SQL (xalan via azaroth) |
| |
| PIG-2915: Builtin TOP udf is sensitive to null input bags (hazen via dvryaboy) |
| |
| PIG-2901: Errors and lacks in document "Pig Latin Basics" (miyakawataku via billgraham) |
| |
| PIG-2905: Improve documentation around REPLACE (cheolsoo via billgraham) |
| |
| PIG-2882: Use Deque instead of Stack (mkhadikov via dvryaboy) |
| |
| PIG-2781: LOSort isEqual method (xalan via dvryaboy) |
| |
| PIG-2835: Optimizing the conversion from bytes to Integer/Long (jay23jack via dvryaboy) |
| |
| PIG-2886: Add Scan TimeRange to HBaseStorage (ted.m via dvryaboy) |
| |
| PIG-2895: jodatime jar missing in pig-withouthadoop.jar (thejas) |
| |
| PIG-2888: Improve performance of POPartialAgg (dvryaboy) |
| |
| PIG-2708: split MiniCluster based tests out of org.apache.pig.test.TestInputOutputFileValidator (analog.sony via daijy) |
| |
| PIG-2890: Revert PIG-2578 (dvryaboy) |
| |
| PIG-2850: Pig should support loading macro files as resources stored in JAR files (matterhayes via dvryaboy) |
| |
| PIG-1314: Add DateTime Support to Pig (zjshen via thejas) |
| |
| PIG-2785: NoClassDefFoundError after upgrading to pig 0.10.0 from 0.9.0 (matterhayes via sms) |
| |
| PIG-2556: CSVExcelStorage load: quoted field with newline as first character sees newline as record end (tivv via dvryaboy) |
| |
| PIG-2875: Add recursive record support to AvroStorage (cheolsoo via sms) |
| |
| PIG-2662: skew join does not honor its config parameters (rajesh.balamohan via thejas) |
| |
| PIG-2871: Refactor signature for PigReducerEstimator (billgraham) |
| |
| PIG-2851: Add flag to ant to run tests with a debugger port (billgraham) |
| |
| PIG-2862: Hardcode certain tuple lengths into the TUPLE BinInterSedes byte identifier (jcoveney) |
| |
| PIG-2855: Provide a method to measure time spent in UDFs (dvryaboy) |
| |
| PIG-2837: AvroStorage throws StackOverFlowError (cheolsoo via sms) |
| |
| PIG-2856: AvroStorage doesn't load files in the directories when a glob pattern matches both files and directories. (cheolsoo via sms) |
| |
| PIG-2569: Fix org.apache.pig.test.TestInvoker.testSpeed (aklochkov via dvryaboy) |
| |
| PIG-2858: Improve PlanHelper to allow finding any PhysicalOperator in a plan (dvryaboy) |
| |
| PIG-2854: AvroStorage doesn't work with Avro 1.7.1 (cheolsoo via sms) |
| |
| PIG-2779: Refactoring the code for setting number of reducers (jay23jack via billgraham) |
| |
| PIG-2765: Implementing RollupDimensions UDF and adding ROLLUP clause in CUBE operator (prasanth_j via dvryaboy) |
| |
| PIG-2814: Fix issues with Sample operator documentation (prasanth_j via dvryaboy) |
| |
| PIG-2817: Documentation for Groovy UDFs (herberts via julien) |
| |
| PIG-2492: AvroStorage should recognize globs and commas (cheolsoo via sms) |
| |
| PIG-2706: Add clear to list of grunt commands (xalan via azaroth) |
| |
| PIG-2823: TestPigContext.testImportList() does not pass if another javac is on the PATH (julien) |
| |
| PIG-2800: pig.additional.jars path separator should align with File.pathSeparator instead of being hard-coded to ":" (jgordon via azaroth) |
| |
| PIG-2797: Tests should not create their own file URIs through string concatenation, should use Util.generateURI instead (jgordon via azaroth) |
| |
| PIG-2820: relToAbsolutePath is not replayed properly when Grunt reparses the script after PIG-2699 (julien) |
| |
| PIG-2763: Groovy UDFs (herberts via julien) |
| |
| PIG-2780: MapReduceLauncher should break early when one of the jobs throws an exception (jay23jack via daijy) |
| |
| PIG-2804: Remove "PIG" exec type (dvryaboy) |
| |
| PIG-2726: Handling legitimate NULL values in Cube operator (prasanth_j via dvryaboy) |
| |
| PIG-2808: Add *.project to .gitignore (azaroth) |
| |
| PIG-2787: change the module name in ivy to lowercase to match the maven repo (julien) |
| |
| PIG-2632: Create a SchemaTuple which generates efficient Tuples via code gen (jcoveney) |
| |
| PIG-2750: add artifacts to the ivy.xml for other jars Pig generates (julien) |
| |
| PIG-2748: Change the names of the jar produced in the build folder to match maven conventions (julien) |
| |
| PIG-2770: Allow easy inclusion of custom build targets (julien) |
| |
| PIG-2697: pretty print schema via pig.pretty.print.schema (rangadi via jcoveney) |
| |
| PIG-2673: Allow Merge join to follow an ORDER statement (dvryaboy) |
| |
| PIG-2699: Reduce the number of instances of Load and Store Funcs down to 2+1 (julien) |
| |
| PIG-2166: UDFs to join a bag (hluu via daijy) |
| |
| PIG-2651: Provide a much easier to use accumulator interface (jcoveney via daijy) |
| |
| PIG-2658: Add pig.script.submitted.timestamp and pig.job.submitted.timestamp in generated Map-Reduce job conf (billgraham) |
| |
| PIG-2735: Add a pig.version.suffix property in build.xml to easily override with a build number (julien) |
| |
| PIG-2705: outputSchema modification from scripting UDFs (levyjoshua via julien) |
| |
| PIG-2724: Make Tuple Iterable (jcoveney) |
| |
| PIG-2733: Add *.patch, *.log, *.orig, *.rej, *.class to gitignore (jcoveney) |
| |
| PIG-2732: Let's get rid of the deprecated Tuple methods (jcoveney) |
| |
| PIG-2638: Optimize BinInterSedes treatment of longs (jcoveney) |
| |
| PIG-2727: PigStorage Source tagging does not need pig.splitCombination to be turned off (prkommireddi via dvryaboy) |
| |
| PIG-2710: Implement Naive CUBE operator (prasanth_j via dvryaboy) |
| |
| PIG-2714: Pig documentation on TOP function has issues (daijy) |
| |
| PIG-2066: Accumulators should be able to early-terminate (jcoveney) |
| |
| PIG-2600: Better Map support (prkommireddi via jcoveney) |
| |
| PIG-2711: e2e harness: cache benchmark results between test runs (thw via daijy) |
| |
| PIG-2702: Make Pig local mode (and tests) faster by working around the hard coded sleep(5000) in hadoop's JobControl (julien) |
| |
| PIG-2659: add source location of the aliases in the physical plan (julien) |
| |
| PIG-2547: Easier UDFs: Convenient EvalFunc super-classes (billgraham, dvryaboy) |
| |
| PIG-2639: Utils.getSchemaFromString should automatically give name to all types, but fails on boolean (jcoveney) |
| |
| PIG-2696: Enhance Job Stat to print out median map and reduce time (hluu via daijy) |
| |
| PIG-2583: Add Grunt command to list the statements in cache (xalan via daijy) |
| |
| PIG-2688: Log the aliases being processed for the current job (ddaniels888 via azaroth) |
| |
| PIG-2680: TOBAG output schema reporting (andy schlaikjer via jcoveney) |
| |
| PIG-2685: Fix error in EvalFunc ctor when implementing Algebraic UDF whose return type is parameterized (andy schlaikjer via jcoveney) |
| |
| PIG-2664: Allow PPNL impls to get more job info during the run (billgraham) |
| |
| PIG-2663: Expose helpful ScriptState methods (billgraham) |
| |
| PIG-2660: PPNL notified of plan before it gets executed (billgraham) |
| |
| PIG-2574: Make reducer estimator pluggable (billgraham) |
| |
| PIG-2677: Add target to build.xml to generate clover summary reports (gates) |
| |
| PIG-2650: Convenience mock Loader and Storer to simplify unit testing of Pig scripts (julien) |
| |
| PIG-2257: AvroStorage doesn't recognize schema_file field when JSON isn't used in the constructor (billgraham) |
| |
| PIG-2587: Compute LogicalPlan signature and store in job conf (billgraham) |
| |
| PIG-2619: HBaseStorage constructs a Scan with cacheBlocks = false |
| |
| PIG-2604: Pig should print its build info at runtime (traviscrawford via dvryaboy) |
| |
| PIG-2573: Automagically setting parallelism based on input file size does not work with HCatalog (traviscrawford via julien) |
| |
| PIG-2538: Add helper wrapper classes for StoreFunc (billgraham via dvryaboy) |
| |
| PIG-2010: registered jars on distributed cache (traviscrawford and julienledem via dvryaboy) |
| |
| PIG-2533: Pig MR job exceptions masked on frontend (traviscrawford via dvryaboy) |
| |
| PIG-2525: Support pluggable PigProcessNotificationListeners on the command line (dvryaboy) |
| |
| PIG-2515: [piggybank] Make CustomFormatToISO return null on Exception in parsing dates (rjurney via dvryaboy) |
| |
| PIG-2503: Make @MonitoredUDF inherited (dvryaboy) |
| |
| PIG-2488: Move Python unit tests to e2e tests (alangates via daijy) |
| |
| PIG-2456: Pig should have a pigrc to specify default script cache (prkommireddi via daijy) |
| |
| PIG-2496: Cache resolved classes in PigContext (dvryaboy) |
| |
| PIG-2482: Integrate HCat DDL command into Pig (daijy) |
| |
| PIG-2479: changingPattern should be used with checkmodified in ivysettings.xml (abayer via azaroth) |
| |
| PIG-2349: Ant build repeats ivy-buildJar several times (azaroth) |
| |
| PIG-2359: Support more efficient Tuples when schemas are known (dvryaboy) |
| |
| PIG-2282: Automatically update Eclipse .classpath file when new libs are added to the classpath through Ivy (azaroth via daijy) |
| |
| PIG-2468: Speed up TestBuiltin (dvryaboy) |
| |
| PIG-2467: Speed up TestCommit (dvryaboy) |
| |
| PIG-2460: Use guava 11 instead of r06 (dvryaboy) |
| |
| PIG-2267: Make the name of the columns in schema optional (jcoveney via daijy) |
| |
| PIG-2453: Fetching schema can be very slow for multi-thousand file LOADs (dvryaboy) |
| |
| PIG-2443: [Piggybank] Add UDFs to check if a String is an Integer And if a String is Numeric (prkommireddi via daijy) |
| |
| PIG-2437: Use Ivy to get automaton.jar (azaroth) |
| |
| PIG-2448: Convert more tests to use LOCAL mode (dvryaboy) |
| |
| PIG-2438: Do not hardcode commons-lang version in build.xml (azaroth) |
| |
| PIG-2422: Add log messages for Jython schema definitions (vivekp via gates) |
| |
| PIG-2403: Reduce code duplication in SUM, MAX, MIN udfs (dvryaboy) |
| |
| PIG-2245: Add end to end test for tokenize (markroddy via gates) |
| |
| PIG-2327: bin/pig doesn't have any hooks for picking up ZK installation deployed from tarballs (rvs via hashutosh) |
| |
| PIG-2382: Modify .gitignore to ignore pig-withouthadoop.jar (azaroth via hashutosh) |
| |
| PIG-2380: Expose version information more cleanly (jcoveney via azaroth) |
| |
| PIG-2311: STRSPLIT needs to allow bytearray arguments (xuting via olgan) |
| |
| PIG-2365: Current TOP implementation needlessly results in a null bag name (jcoveney via dvryaboy) |
| |
| PIG-2151: Add annotation to specify output schema in Java UDFs (dvryaboy) |
| |
| PIG-2230: Improved error message for invalid parameter format (xuitingz via olgan) |
| |
| PIG-2328: Add builtin UDFs for building and using bloom filters (gates) |
| |
| PIG-2338: Need signature for EvalFunc (daijy) |
| |
| PIG-2337: Provide UDF with input schema (xutingz via daijy) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-3147: Spill failing with "java.lang.RuntimeException: InternalCachedBag.spill() should not be called" (knoguchi via dvryaboy) |
| |
| PIG-3109: Missing license headers (jarcec via cheolsoo) |
| |
| PIG-3022: TestRegisteredJarVisibility.testRegisteredJarVisibility fails with hadoop-2.0.x (rohini via cheolsoo) |
| |
| PIG-3125: Fix zebra compilation error (cheolsoo) |
| |
| PIG-3051: java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning (knoguchi via rohini) |
| |
| PIG-3076: make TestScalarAliases more reliable (julien) |
| |
| PIG-3020: "Duplicate uid in schema" error when joining two relations derived from the same load statement (jcoveney) |
| |
| PIG-3044: hotfix to remove divide by 0 error (jcoveney) |
| |
| PIG-3033: test-patch failed with javadoc warnings (fang fang chen via cheolsoo) |
| |
| PIG-3058: Upgrade junit to at least 4.8 (fang fang chen via cheolsoo) |
| |
| PIG-2978: TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x (cheolsoo) |
| |
| PIG-3039: Not possible to use custom version of jackson jars (rohini) |
| |
| PIG-3045: Specifying sorting field(s) at nightly.conf - fix sortArgs (rohini via cheolsoo) |
| |
| PIG-2979: Pig.jar doesn't work with hadoop-2.0.x (cheolsoo) |
| |
| PIG-3035: With latest version of hadoop23 pig does not return the correct exception stack trace from backend (rohini) |
| |
| PIG-2405: some unit test cases failed with OpenJDK (fang fang chen via cheolsoo) |
| |
| PIG-3018: Refactor TestScriptLanguage to remove duplication and write script in different files (julien) |
| |
| PIG-2973: TestStreaming test times out (cheolsoo) |
| |
| PIG-3001: TestExecutableManager.testAddJobConfToEnv fails randomly (cheolsoo) |
| |
| PIG-3017: Pig's object serialization should use compression (jcoveney) |
| |
| PIG-2968: ColumnMapKeyPrune fails to prune a subtree inside foreach (knoguchi via cheolsoo) |
| |
| PIG-2999: Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing (knoguchi via azaroth) |
| |
| PIG-2998: Fix TestScriptLanguage and TestMacroExpansion (cheolsoo via jcoveney) |
| |
| PIG-2975: TestTypedMap.testOrderBy failing with incorrect result (knoguchi via jcoveney) |
| |
| PIG-2950: Fix tiny documentation error in BagToString builtin (initialcontext via daijy) |
| |
| PIG-2967: Fix Glob_local test failure for Pig E2E Test Framework (sushantj via daijy) |
| |
| PIG-1283: COUNT on null bag causes failure (analog.sony via jcoveney) |
| |
| PIG-2958: Pig tests do not appear to have a logger attached (daijyc via jcoveney) |
| |
| PIG-2926: TestPoissonSampleLoader failing on rhel environment (jcoveney) |
| |
| PIG-2985: TestRank1,2,3 fail with hadoop-2.0.x (rohini via azaroth) |
| |
| PIG-2971: Add new parameter to specify the streaming environment (jcoveney) |
| |
| PIG-2963: Illustrate command and POPackageLite (cheolsoo via jcoveney) |
| |
| PIG-2961: BinInterSedesRawComparator broken by TUPLE_number patch (jcoveney) |
| |
| PIG-2932: Setting high default_parallel causes IOException in local mode (cheolsoo via gates) |
| |
| PIG-2737: [piggybank] TestIndexedStorage is failing, should be refactored (jcoveney) |
| |
| PIG-2935: Catch NoSuchMethodError when StoreFuncInterface's new cleanupOnSuccess method isn't implemented. (gates via dvryaboy) |
| |
| PIG-2920: e2e tests override PERL5LIB environment variable (azaroth) |
| |
| PIG-2917: SpillableMemoryManager memory leak for WeakReference (haitao.yao via dvryaboy) |
| |
| PIG-2938: All unit tests that use MR2 MiniCluster are broken in trunk (cheolsoo via dvryaboy) |
| |
| PIG-2936: Tuple serialization bug (jcoveney) |
| |
| PIG-2930: ant test doesn't compile in trunk (cheolsoo via daijy) |
| |
| PIG-2791: Pig does not work with ViewFileSystem (rohini via daijy) |
| |
| PIG-2833: org.apache.pig.pigunit.pig.PigServer does not set the default log level of pigContext (cheolsoo via jcoveney) |
| |
| PIG-2744: Handle Pig command line with XML special characters (lulynn_2008 via daijy) |
| |
| PIG-2637: Command-line option -e throws TokenMgrError exception (lulynn_2008 via daijy) |
| |
| PIG-2887: Macro cannot handle negative number (knoguchi via gates) |
| |
| PIG-2844: ant makepom is misconfigured (julien) |
| |
| PIG-2896: Pig does not fail anymore if two macros are declared with the same name (julien) |
| |
| PIG-2848: TestBuiltInBagToTupleOrString fails now that mock.Storage enforces not overwriting output (julien) |
| |
| PIG-2884: JobControlCompiler mis-logs after reducer estimation (billgraham) |
| |
| PIG-2876: Bump up Xerces version (jcoveney) |
| |
| PIG-2866: PigServer fails with macros without a script file (billgraham) |
| |
| PIG-2860: [piggybank] TestAvroStorageUtils.testGetConcretePathFromGlob fails on some versions of hadoop (cheolsoo via jcoveney) |
| |
| PIG-2861: PlanHelper imports org.python.google.common.collect.Lists instead of com.google.common.collect.Lists (jcoveney) |
| |
| PIG-2849: Errors in document Getting Started (miyakawataku via billgraham) |
| |
| PIG-2843: Typo in Documentation (eric59 via billgraham) |
| |
| PIG-2841: Inconsistent URL in Docs (eric59 via billgraham) |
| |
| PIG-2740: get rid of "java[77427:1a03] Unable to load realm info from SCDynamicStore" log lines when running pig tests (julien) |
| |
| PIG-2839: mock.Storage overwrites output with the last relation written when storing UNION (julien) |
| |
| PIG-2840: Fix SchemaTuple bugs (jcoveney) |
| |
| PIG-2842: TestNewPlanOperatorPlan fails when new Configuration() picks up a previous minicluster conf file (julien) |
| |
| PIG-2827: Unwrap exception swallowing in TOP (haitao.yao via jcoveney) |
| |
| PIG-2825: StoreFunc signature setting in LogicalPlan broken (jcoveney) |
| |
| PIG-2815: class loader management in PigContext (rangadi via jcoveney) |
| |
| PIG-2813: Fix test regressions from PIG-2632 (jcoveney) |
| |
| PIG-2806: Fix merge join test regression from PIG-2632 (jcoveney) |
| |
| PIG-2809: TestUDFContext broken by PIG-2699 (julienledem via daijy) |
| |
| PIG-2807: TestParser TestPigStorage TestNewPlanOperatorPlan broken by PIG-2699 (julienledem via daijy) |
| |
| PIG-2782: Specifying sorting field(s) at nightly.conf (cheolsoo via daijy) |
| |
| PIG-2790: After PIG-2699 the script schema (LOAD ... USING ... AS {script schema}) is passed after getSchema is called (daijy) |
| |
| PIG-2777: Docs are broken due to malformed xml after PIG-2673 (dvryaboy) |
| |
| PIG-2593: Filter by a boolean value does not work (jay23jack via daijy) |
| |
| PIG-2665: Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts with embedded Pig Latin (daijy) |
| |
| PIG-2736: Support implicit cast from bytearray to boolean (jay23jack via daijy) |
| |
| PIG-2508: PIG can unpredictably ignore deprecated Hadoop config options (thw via dvryaboy) |
| |
| PIG-2691: Duplicate TOKENIZE schema (jay23jack via azaroth) |
| |
| PIG-2173: piggybank datetime conversion javadocs not properly formatted (hluu via daijy) |
| |
| PIG-2709: PigAvroRecordReader should specify which file has a problem when throwing IOException (mpercy via daijy) |
| |
| PIG-2640: Usage message gives wrong information for Pig additional jars (prkommireddi via daijy) |
| |
| PIG-2652: Skew join and order by don't trigger reducer estimation (dvryaboy) |
| |
| PIG-2616: JobControlCompiler.getInputSizeFromLoader must handle exceptions from LoadFunc.getStatistics (billgraham) |
| |
| PIG-2644: Piggybank's HadoopJobHistoryLoader throws NPE when reading broken history file (herberts via daijy) |
| |
| PIG-2627: Custom partitioner not set when POSplit is involved in Plan (aniket486 via daijy) |
| |
| PIG-2596: Jython UDF does not handle boolean output (aniket486 via daijy) |
| |
| PIG-2649: org.apache.pig.parser.ParserValidationException does not expose the cause exception |
| |
| PIG-2540: [piggybank] AvroStorage can't read schema on amazon s3 in elastic mapreduce (rjurney via jcoveney) |
| |
| PIG-2618: e2e local fails to build |
| |
| PIG-2608: Typo in PigStorage documentation for source tagging (prkommireddi via daijy) |
| |
| PIG-2590: running ant tar and rpm targets on same copy of pig source results in problems (thejas) |
| |
| PIG-2581: HashFNV inconsistent/non-deterministic due to default platform encoding (prkommireddi via daijy) |
| |
| PIG-2514: REGEX_EXTRACT not returning correct group with non greedy regex (romainr via daijy) |
| |
| PIG-2532: Registered classes fail deserialization in frontend (traviscrawford via julien) |
| |
| PIG-2549: org.apache.pig.piggybank.storage.avro - Broken documentation link for AvroStorage (chrisas via daijy) |
| |
| PIG-2322: varargs functions do not get passed the arguments in Python embedding (julien) |
| |
| PIG-2491: Pig docs still mention hadoop-site.xml (daijy) |
| |
| PIG-2504: Incorrect sample provided for REGEX_EXTRACT (prkommireddi via daijy) |
| |
| PIG-2502: Make "hcat.bin" configurable in e2e test (daijy) |
| |
| PIG-2501: Changes needed to contrib/piggybank/java/build.xml in order to build piggybank.jar with Hadoop 0.23 |
| (ekoontz via daijy) |
| |
| PIG-2499: Pig TestGrunt.testShellCommand occasionally fails (tomwhite via daijy) |
| |
| PIG-2326: Pig minicluster tests can not be run from eclipse (julienledem via daijy) |
| |
| PIG-2432: Eclipse .classpath file is out of date (gates) |
| |
| PIG-2427: getSchemaFromString throws away the name of the tuple that is in a bag (jcoveney via dvryaboy) |
| |
| PIG-2425: Aggregate Warning does not work as expected on Embedding Pig in Java 0.9.1 (prkommireddi via thejas) |
| |
| PIG-2384: Generic Invokers should use PigContext to resolve classes (dvryaboy) |
| |
| PIG-2379: Bug in Schema.getPigSchema(ResourceSchema rSchema) improperly adds two level access (jcoveney via dvryaboy) |
| |
| PIG-2355: ant clean does not clean e2e test build artifacts (daijy) |
| |
| PIG-2352: e2e test harness' use of environment variables causes unintended effects between tests (gates) |
| |
| Release 0.10.1 |
| |
| BUG FIXES |
| |
| PIG-3107: bin and autocomplete are missing in src release (daijy) |
| |
| PIG-3106: Missing license header in several Java files (daijy) |
| |
| PIG-3099: Pig unit test fixes for TestGrunt(1), TestStore(2), TestEmptyInputDir(3) (vikram.dixit via daijy) |
| |
| PIG-2953: "which" utility does not exist on Windows (daijy) |
| |
| PIG-2960: Increase the timeout for unit test (daijy) |
| |
| PIG-2942: DevTests, TestLoad has a false failure on Windows (jgordon via daijy) |
| |
| PIG-2801: grunt "sh" command should invoke the shell implicitly instead of calling exec directly with the command tokens |
| (jgordon via daijy) |
| |
| PIG-2798: pig streaming tests assume interpreters are auto-resolved (jgordon via daijy) |
| |
| PIG-2795: Fix test cases that generate pig scripts with "load " + pathStr to encode "\" in the path (jgordon via daijy) |
| |
| PIG-2796: Local temporary paths are not always valid HDFS path names (jgordon via daijy) |
| |
| Release 0.10.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-2664: Allow PPNL impls to get more job info during the run (billgraham) |
| |
| PIG-2663: Expose helpful ScriptState methods (billgraham) |
| |
| PIG-2660: PPNL notified of plan before it gets executed (billgraham) |
| |
| PIG-2574: Make reducer estimator pluggable (billgraham) |
| |
| PIG-2601: Additional documentation for 0.10 (daijy) |
| |
| PIG-2317: Ruby/Jruby UDFs (jcoveney via daijy) |
| |
| PIG-1270: Push limit into loader (daijy) |
| |
| PIG-2589: Additional e2e test for 0.10 new features (daijy) |
| |
| PIG-2182: Add more append support to DataByteArray (gsingers via daijy) |
| |
| PIG-438: Handle realiasing of existing Alias (A=B;) (daijy) |
| |
| PIG-2548: Support for providing parameters to python script (daijy) |
| |
| PIG-2518: Add ability to clean ivy cache in build.xml (daijy) |
| |
| PIG-2300: Pig Docs - release 0.10.0 (and 0.9.1) (chandec via daijy) |
| |
| PIG-2332: JsonLoader/JsonStorage (daijy) |
| |
| PIG-2334: Set default number of reducers for S3N filesystem (ddaniels888 via daijy) |
| |
| PIG-1387: Syntactical Sugar for PIG-1385 (azaroth) |
| |
| PIG-2305: Pig should log the split locations in task logs (vivekp via thejas) |
| |
| PIG-2293: Pig should support a more efficient merge join against data sources that natively support point |
| lookups or where the join is against large, sparse tables (aklish via daijy) |
| |
| PIG-2287: add test cases for limit and sample that use expressions with |
| constants only (no scalar variables) (thejas via gates) |
| |
| PIG-2092: Missing sh command from Grunt shell (olgan) |
| |
| PIG-2163: Improve nested cross to stream one relation (zjshen via daijy) |
| |
| PIG-2249: Enable pig e2e testing on EC2 (gates) |
| |
| PIG-2256: Upgrade Avro dependency to 1.5.3 (tucu00 via dvryaboy) |
| |
| PIG-604: Kill the Pig job should kill all associated Hadoop Jobs (daijy) |
| |
| PIG-2096: End to end tests for new Macro feature (gates) |
| |
| PIG-2242: Allow the delimiter to be specified when calling TOKENIZE (markroddy via hashutosh) |
| |
| PIG-2240: Allow any compression codec to be specified in AvroStorage (tomwhite via dvryaboy) |
| |
| PIG-2229: Pig end-to-end tests should test local mode as well as mr mode (gates) |
| |
| PIG-2235: Several files in e2e tests aren't being run (gates) |
| |
| PIG-2196: Test harness should be independent of Pig (hashutosh) -- Missed few |
| changes in last commit. |
| |
| PIG-2196: Test harness should be independent of Pig (hashutosh) |
| |
| PIG-1429: Add Boolean Data Type to Pig (zjshen via daijy) |
| |
| PIG-2218: Pig end-to-end tests should be accessible from top level build.xml (gates) |
| |
| PIG-2176: add logical plan assumption checker (thejas) |
| |
| PIG-1631: Support to 2 level nested foreach (aniket486 via daijy) |
| |
| PIG-2191: Reduce amount of log spam generated by UDFs (dvryaboy) |
| |
| PIG-2200: Piggybank cannot be built from the Git mirror (dvryaboy) |
| |
| PIG-2168: CubeDimensions UDF (dvryaboy) |
| |
| PIG-2189: e2e test harness needs to use Pig as a source of truth (gates via daijy) |
| |
| PIG-1904: Default split destination (azaroth via thejas) |
| |
| PIG-2143: Make PigStorage optionally store schema; improve docs. (dvryaboy) |
| |
| PIG-1973: UDFContext.getUDFContext usage of ThreadLocal pattern |
| is not typical (woody via thejas) |
| |
| PIG-2053: PigInputFormat uses class.isAssignableFrom() where |
| instanceof is more appropriate (woody via thejas) |
| |
| PIG-2161: TOTUPLE should use no-copy tuple creation (dvryaboy) |
| |
| PIG-1946: HBaseStorage constructor syntax is error prone (billgraham via dvryaboy) |
| |
| PIG-2001: DefaultTuple(List) constructor is inefficient, causes List.size() |
| System.arraycopy() calls (though they are 0 byte copies), |
| DefaultTuple(int) constructor is a bit misleading wrt time |
| complexity (woody via thejas) |
| |
| PIG-1916: Nested cross (zjshen via daijy) |
| |
| PIG-2121: e2e test harness should use ant instead of make (gates) |
| |
| PIG-2142: Allow registering multiple jars from DFS via single statement (rangadi via dvryaboy) |
| |
| PIG-1926: Sample/Limit should take scalar (azaroth via thejas) |
| |
| PIG-1950: e2e test harness needs to be able to compare to previous version of |
| Pig (gates) |
| |
| PIG-536: the shell script 'pig' does not work if PIG_HOME has the word 'hadoop' in its directory (miguno via olgan) |
| |
| PIG-2108: e2e test harness needs to be able to mark certain tests as ignored |
| (gates) |
| |
| PIG-1825: ability to turn off the write ahead log for pig's HBaseStorage (billgraham via dvryaboy) |
| |
| PIG-1772: Pig 090 Documentation (chandec via olgan) |
| |
| PIG-1824: Support import modules in Jython UDF (woody via rding) |
| |
| PIG-1994: e2e test harness deployment implementation for existing cluster |
| (gates) |
| |
| PIG-2036: [piggybank] Set header delimiter in PigStorageSchema (mmoeller via dvryaboy) |
| |
| PIG-1949: e2e test harness should use bin/pig rather than calling java |
| directly (gates) |
| |
| PIG-2026: e2e tests in eclipse classpath (azaroth via hashutosh) |
| |
| PIG-2024: Incorrect jar paths in .classpath template for eclipse (azaroth via hashutosh) |
| |
| OPTIMIZATIONS |
| |
| PIG-2011: Speed up TestTypedMap.java (dvryaboy) |
| |
| PIG-2228: support partial aggregation in map task (thejas) |
| |
| BUG FIXES |
| |
| PIG-2940: HBaseStorage store fails in secure cluster (cheolsoo via daijy) |
| |
| PIG-2821: HBaseStorage should work with secure hbase (rohini via daijy) |
| |
| PIG-2859: Fix few e2e test failures (rohini via daijy) |
| |
| PIG-2729: Macro expansion does not use pig.import.search.path - UnitTest borked (johannesch via daijy) |
| |
| PIG-2783: Fix Iterator_1 e2e test for Hadoop 23 (rohini via daijy) |
| |
| PIG-2761: With hadoop23 importing modules inside python script does not work (rohini via daijy) |
| |
| PIG-2759: Typo in document "Built In Functions" (daijy) |
| |
| PIG-2745: Pig e2e test RubyUDFs fails in MR mode when running from tarball (cheolsoo via daijy) |
| |
| PIG-2741: Python script throws a NameError: name 'Configuration' is not defined in case cache dir is not created |
| (knoguchi via daijy) |
| |
| PIG-2669: Pig release should include pig-default.properties after rebuild (daijy) |
| |
| PIG-2739: PyList should map to Bag automatically in Jython (daijy) |
| |
| PIG-2730: TFileStorage getStatistics incorrectly throws an exception instead of returning null (traviscrawford via daijy) |
| |
| PIG-2717: Tuple field mangled during flattening (daijy) |
| |
| PIG-2721: Wrong output generated while loading bags as input (knoguchi via daijy) |
| |
| PIG-2578: Multiple Store-commands mess up mapred.output.dir. (daijy) |
| |
| PIG-2623: Support S3 paths for registering UDFs (nshkrob via daijy) |
| |
| PIG-2505: AvroStorage won't read any file not ending in .avro (russell.jurney via daijy) |
| |
| PIG-2585: Enable ignored e2e test cases (daijy) |
| |
| PIG-2563: IndexOutOfBoundsException: while projecting fields from a bag (daijy) |
| |
| PIG-2411: AvroStorage UDF in PiggyBank fails to STORE a bag of single-field tuples as Avro arrays (russell.jurney via daijy) |
| |
| PIG-2565: Support IMPORT for macros stored in S3 Buckets (daijy) |
| |
| PIG-2570: LimitOptimizer fails with dynamic LIMIT argument (daijy) |
| |
| PIG-2543: PigStats.isSuccessful returns false if embedded pig script has sh commands (daijy) |
| |
| PIG-2509: Util.getSchemaFromString fails with java.lang.NullPointerException when a tuple in a bag has no name (as when used in MongoStorage UDF) (jcoveney via daijy) |
| |
| PIG-2559: Embedded pig in python; invoking sys.exit(0) causes script failure (vivekp via daijy) |
| |
| PIG-2530: Reusing alias name in nested foreach causes incorrect results (daijy) |
| |
| PIG-2489: Input Path Globbing{} not working with PigStorageSchema or PigStorage('\t', '-schema') (daijy) |
| |
| PIG-2484: Fix several e2e test failures/aborts for 23 (daijy) |
| |
PIG-2400: Document hash based aggregation support (chandec via daijy)
| |
| PIG-2444: Remove the Zebra *.xml documentation files from the TRUNK and Branch-10 (chandec via daijy) |
| |
| PIG-2430: An EvalFunc which overrides getArgToFuncMapping with FuncSpec |
| with constructor arguments is not properly instantiated with said arguments (jcoveney via thejas) |
| |
PIG-2457: JsonLoaderStorage tests are broken for e2e (daijy)
| |
| PIG-2426: ProgressableReporter.progress(String msg) is an empty function (vivekp via daijy) |
| |
| PIG-2363: _logs for streaming commands bug in new parser (vivekp via daijy) |
| |
| PIG-2331: BinStorage in LOAD statement failing when input has curly braces (xutingz via thejas) |
| |
| PIG-2391: Bzip_2 test is broken (xutingz via daijy) |
| |
| PIG-2358: JobStats.getHadoopCounters() is never set and always returns null (xutingz via daijy) |
| |
| PIG-2184: Not able to provide positional reference to macro invocations (xutingz via daijy) |
| |
| PIG-2209: JsonMetadata fails to find schema for glob paths (daijy) |
| |
| PIG-2165: Need a way to deal with params and param_file in embedded pig in python (daijy) |
| |
| PIG-2313: NPE in ILLUSTRATE trying to get StatusReporter in STORE (daijy) |
| |
| PIG-2335: bin/pig does not work with bash 3.0 (azaroth) |
| |
| PIG-2275: NullPointerException from ILLUSTRATE (daijy) |
| |
| PIG-2119: DuplicateForEachColumnRewrite makes assumptions about the position of LOGGenerate in the plan (daijy) |
| |
| PIG-2290: TOBAG wraps tuple parameters in another tuple (ryan.hoegg via thejas) |
| |
| PIG-2288: Pig 0.9 error message not useful as compared to 0.8 in case |
| of group by (vivekp via thejas) |
| |
| PIG-2309: Keyword 'NOT' is wrongly treated as a UDF in split statement (vivekp via thejas) |
| |
| PIG-2307: Jetty version should be updated in .eclipse.templates/.classpath, |
| pig-template.xml and pig.pom as well (zjshen via daijy) |
| |
| PIG-2273: Pig.compileFromFile in embedded python fails when pig script starts with a comment (ddaniels888 via gates) |
| |
| PIG-2278: Wrong version numbers for libraries in eclipse template classpath (azaroth) |
| |
| PIG-2115: Fix Pig HBaseStorage configuration and setup issues (gbowyer@fastmail.co.uk via dvryaboy) |
| |
| PIG-2232: "declare" document contains a typo (daijy) |
| |
| PIG-2055: inconsistent behavior in parser generated during build (thejas) |
| |
| PIG-2185: NullPointerException while Accessing Empty Bag in FOREACH { FILTER } (daijy) |
| |
| PIG-2227: Wrong jars copied into lib directory in e2e tests when invoked from top level (gates) |
| |
| PIG-2219: Pig tests fail if ${user.home}/pigtest/conf does not already exist (cwsteinbach via gates) |
| |
| PIG-2215: Newlines in function arguments still cause exceptions to be thrown (awarring via gates) |
| |
| PIG-2214: InternalSortedBag two-arg constructor doesn't pass bagCount (sallen via gates) |
| |
| PIG-2174: HBaseStorage column filters miss some fields (billgraham via dvryaboy) |
| |
| PIG-2090: re-enable TestGrunt test cases (thejas) |
| |
PIG-2181: Improvement for error message when describe misses alias (vivekp via daijy)
| |
| PIG-2124: Script never ending when joining from the same source (daijy) |
| |
| PIG-2170: NPE thrown during illustrate (thejas) |
| |
| PIG-2186: PigStorage new warnings about missing schema file |
| can be confusing (thejas) |
| |
| PIG-2179: tests in TestLoad are failing (thejas) |
| |
| PIG-2146: POStore.getSchema() returns null because of which PigOutputCommitter |
is not storing schema during cleanup (thejas)
| |
PIG-2027: NPE if Pig doesn't have permission for log file (daijy)
| |
| PIG-2171: TestScriptLanguage is broken on trunk (daijy and thejas) |
| |
| PIG-2172: Fix test failure for ant 1.8.x (daijy) |
| |
| PIG-2162: bin/pig should not modify user args (rangadi via thejas) |
| |
| PIG-2060: Fix errors in pig grammars reported by ANTLRWorks (azaroth via thejas) |
| |
| PIG-2156: Limit/Sample with variable does not work if the expression starts |
| with an integer/double (azaroth via thejas) |
| |
| PIG-2130: Piggybank:MultiStorage is not compressing output files (vivekp via daijy) |
| |
| PIG-2147: Support nested tags for XMLLoader (vivekp via daijy) |
| |
| PIG-1890: Fix piggybank unit test TestAvroStorage (kengoodhope via daijy) |
| |
| PIG-2110: NullPointerException in piggybank.evaluation.util.apachelogparser.SearchTermExtractor (dale_jin via daijy) |
| |
| PIG-2144: ClassCastException when using IsEmpty(DIFF()) (thejas) |
| |
| PIG-2139: LogicalExpressionSimplifier optimizer rule should check if udf is |
| deterministic while checking if they are equal (thejas) |
| |
| PIG-2137: SAMPLE should not be pushed above DISTINCT (dvryaboy and thejas) |
| |
| PIG-2136: Implementation of Sample should use LessThanExpression |
| instead of LessThanEqualExpression (azaroth via thejas) |
| |
| PIG-2140: Usage printed from Main.java gives wrong option for disabling |
| LogicalExpressionSimplifier (thejas) |
| |
| PIG-2120: UDFContext.getClientSystemProps() does not respect pig.properties (dvryaboy) |
| |
| PIG-2129: NOTICE file needs updates (gates) |
| |
| PIG-2131: Add back test for PIG-1769 (qwertymaniac via gates) |
| |
| PIG-2112: ResourceSchema.toString does not properly handle maps in the schema (gates) |
| |
| PIG-1702: Streaming debug output outputs null input-split information (awarring via daijy) |
| |
| PIG-2109: Ant build continues even if the parser classes fail to be generated. (zjshen via daijy) |
| |
| PIG-2071: casting numeric type to chararray during schema merge for union |
| is inconsistent with other schema merge cases (thejas) |
| |
PIG-2044: Pattern match bug in org.apache.pig.newplan.optimizer.Rule (knoguchi via daijy)
| |
| PIG-2048: Add zookeeper to pig jar (gbowyer via gates) |
| |
| PIG-2008: Cache outputFormat in HBaseStorage (thedatachef via gates) |
| |
| PIG-2025: org.apache.pig.test.udf.evalfunc.TOMAP is missing package |
| declaration (azaroth via gates) |
| |
| PIG-2019: smoketest-jar target has to depend on pigunit-jar to guarantee |
| inclusion of test classes (cos via gates) |
| |
| Release 0.9.3 - Unreleased |
| |
| BUG FIXES |
| |
| PIG-2944: ivysettings.xml does not let you override .m2/repository (raluri via daijy) |
| |
| PIG-2912: Pig should clone JobConf while creating JobContextImpl and TaskAttemptContextImpl in Hadoop23 (rohini via daijy) |
| |
PIG-2775: Register jar does not go to classpath in some cases (daijy)
| |
| PIG-2693: LoadFunc.setLocation should be called before LoadMetadata.getStatistics (billgraham via julien) |
| |
| PIG-2666: LoadFunc.setLocation() is not called when pig script only has Order By (daijy) |
| |
| PIG-2671: e2e harness: Reference local test path via :LOCALTESTPATH: (thw via daijy) |
| |
| PIG-2642: StoreMetadata.storeSchema can't access files in the output directory (Hadoop 0.23) (thw via daijy) |
| |
| PIG-2621: Documentation inaccurate regarding Pig Properties in trunk (prkommireddi via daijy) |
| |
| PIG-2550: Custom tuple results in "Unexpected datatype 110 while reading tuplefrom binary file" while spilling (daijy) |
| |
PIG-2442: Multiple Stores in pig streaming cause infinite waiting (daijy)
| |
| PIG-2609: e2e harness: make hdfs base path configurable (outside default.conf) (thw via daijy) |
| |
| PIG-2576: Change in behavior for UDFContext.getUDFContext().getJobConf() in front-end (thw via daijy) |
| |
| PIG-2588: e2e harness: use pig command for cluster deploy (thw via daijy) |
| |
| PIG-2572: e2e harness deploy fails when using pig that does not bundle hadoop (thw via daijy) |
| |
PIG-2568: PigOutputCommitter hides exception in commitJob (daijy)
| |
| PIG-2564: Build fails - Hadoop 0.23.1-SNAPSHOT no longer available (thw via daijy) |
| |
| PIG-2535: Bug in new logical plan results in no output for join (daijy) |
| |
| PIG-2534: Pig generating infinite map outputs (daijy) |
| |
| PIG-2493: UNION causes casting issues (vivekp via daijy) |
| |
| PIG-2497: Order of execution of fs, store and sh commands in Pig is not maintained (daijy) |
| |
| Release 0.9.2 |
| |
| IMPROVEMENTS |
| |
| PIG-2766: Pig-HCat Usability (vikram.dixit via daijy) |
| |
| PIG-2125: Make Pig work with hadoop .NEXT (daijy) |
| |
| PIG-2471: Pig Requirements Hadoop (chandec via daijy) |
| |
| PIG-2431: Upgrade bundled hadoop version to 1.0.0 (daijy) |
| |
| PIG-2447: piggybank: get hive dependency from maven (thw via azaroth) |
| |
| PIG-2347: Fix Pig Unit tests for hadoop 23 (daijy) |
| |
| PIG-2128: Generating the jar file takes a lot of time and is unnecessary when running Pig local mode (julien) |
| |
| BUG FIXES |
| |
| PIG-2477: TestBuiltin testLFText/testSFPig failing against 23 due to invalid test setup -- InvalidInputException (phunt via daijy) |
| |
| PIG-2462: getWrappedSplit is incorrectly returning the first split instead of the current split. (arov via daijy) |
| |
| PIG-2472: piggybank unit tests write directly to /tmp (thw via daijy) |
| |
PIG-2413: e2e test should support testing against two clusters (daijy)
| |
PIG-2342: Pig tutorial documentation needs to be updated about building the tutorial (daijy)
| |
| PIG-2458: Can't have spaces in parameter substitution (jcoveney via daijy) |
| |
| PIG-2410: Piggybank does not compile in 23 (daijy) |
| |
| PIG-2418: rpm release package does not take PIG_CLASSPATH (daijy) |
| |
| PIG-2291: PigStats.isSuccessful returns false if embedded pig script has dump (xutingz via daijy) |
| |
| PIG-2415: A fix for 0.23 local mode: put "yarn-default.xml" into the configuration (daijy) |
| |
| PIG-2402: inIllustrator condition in PigMapReduce is wrong for hadoop 23 (daijy) |
| |
PIG-2370: SkewedPartitioner results in Kerberos error (daijy)
| |
| PIG-2374: streaming regression with dotNext (daijy) |
| |
| PIG-2387: BinStorageRecordReader causes negative progress (xutingz via daijy) |
| |
| PIG-2354: Several fixes for bin/pig (daijy) |
| |
| PIG-2385: Store statements not getting processed (daijy) |
| |
| PIG-2320: Error: "projection with nothing to reference" (daijy) |
| |
| PIG-2346: TypeCastInsert should not insert Foreach if there is no as statement (daijy) |
| |
| PIG-2339: HCatLoader loads all the partitions in a partitioned table even though |
| a filter clause on the partitions is specified in the Pig script (daijy) |
| |
| PIG-2316: Incorrect results for FILTER *** BY ( *** OR ***) with |
| FilterLogicExpressionSimplifier optimizer turned on (knoguchi via thejas) |
| |
| PIG-2271: PIG regression in BinStorage/PigStorage in 0.9.1 (thejas) |
| |
| Release 0.9.1 |
| |
| IMPROVEMENTS |
| |
| PIG-2284: Add pig-setup-conf.sh script (eyang via daijy) |
| |
| PIG-2272: e2e test harness should be able to set HADOOP_HOME (gates via daijy) |
| |
| PIG-2239: Pig should use "bin/hadoop jar pig-withouthadoop.jar" in bin/pig instead of forming java command itself (daijy) |
| |
| PIG-2213: Pig 0.9.1 Documentation (chandec via daijy) |
| |
PIG-2221: Couldn't find documentation for ColumnMapKeyPrune optimization rule (chandec via daijy)
| |
| BUG FIXES |
| |
PIG-2310: bin/pig fails when both pig-0.9.1.jar and pig.jar are in PIG_HOME (daijy)
| |
PIG-1857: Create a package integration project (eyang via daijy)
| |
| PIG-2013: Penny gets a null pointer when no properties are set (breed via daijy) |
| |
| PIG-2102: MonitoredUDF does not work (dvryaboy) |
| |
| PIG-2152: Null pointer exception while reporting progress (thejas) |
| |
| PIG-2183: Pig not working with Hadoop 0.20.203.0 (daijy) |
| |
| PIG-2193: Using HBaseStorage to scan 2 tables in the same Map job produces bad data (rangadi via dvryaboy) |
| |
| PIG-2199: Penny throws Exception when netty classes are missing (ddaniels888 via daijy) |
| |
| PIG-2223: error accessing column in output schema of udf having project-star input (thejas) |
| |
PIG-2208: Restrict number of PIG generated Hadoop counters (rding via daijy)
| |
| PIG-2299: jetty 6.1.14 startup issue causes unit tests to fail in CI (thw via daijy) |
| |
| PIG-2301: Some more bin/pig, build.xml cleanup for 0.9.1 (daijy) |
| |
PIG-2237: LIMIT generates wrong number of records if pig determines the number of reducers as more than 1 (daijy)
| |
| PIG-2261: Restore support for parenthesis in Pig 0.9 (rding via daijy) |
| |
| PIG-2238: Pig 0.9 error message not useful as compared to 0.8 (daijy) |
| |
| PIG-2286: Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization (daijy) |
| |
| PIG-2270: Put jython.jar in classpath (daijy) |
| |
| PIG-2274: remove pig deb package dependency on sun-java6-jre (gkesavan via daijy) |
| |
| PIG-2264: Change conf/log4j.properties to conf/log4j.properties.template (daijy) |
| |
PIG-2231: Limit produces wrong number of records after foreach flatten (daijy)
| |
| Release 0.9.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-1622: DEFINE streaming options are ill defined and not properly documented (xuefu) |
| |
| PIG-1680: HBaseStorage should work with HBase 0.90 (gstathis, billgraham, dvryaboy, tlipcon via dvryaboy) |
| |
| PIG-1745: Disable converting bytes loading from BinStorage (daijy) |
| |
| PIG-1188: Padding nulls to the input tuple according to input schema (daijy) |
| |
| PIG-1876: Typed map for Pig (daijy) |
| |
| IMPROVEMENTS |
| |
| PIG-1938: support project-range as udf argument (thejas) |
| |
| PIG-2059: PIG doesn't validate incomplete query in batch mode even if -c option is given (xuefu) |
| |
| PIG-2062: Script silently ended (xuefu) |
| |
PIG-2039: IndexOutOfBoundsException for a case (xuefu)
| |
| PIG-2038: Pig fails to parse empty tuple/map/bag constant (xuefu) |
| |
| PIG-1775: Removal of old logical plan (xuefu) |
| |
| PIG-1998: Allow macro to return void (rding) |
| |
PIG-2003: Using keyword as alias neither emits an error nor produces a logical plan (xuefu)
| |
| PIG-1981: LoadPushDown.pushProjection should pass alias in addition to position (daijy) |
| |
| PIG-2006: Regression: NPE when Pig processes an empty script file, fix test case (xuefu) |
| |
| PIG-2006: Regression: NPE when Pig processes an empty script file (xuefu) |
| |
| PIG-2007: Parsing error when map key referred directly from udf in nested foreach (xuefu) |
| |
| PIG-2000: Pig gives incorrect error message dealing with scalar projection (xuefu) |
| |
| PIG-2002: Regression: Pig gives error "Projection with nothing to reference!" for a valid query (xuefu) |
| |
| PIG-1921: Improve error messages in new parser (xuefu) |
| |
| PIG-1996: Pig new parser fails to recognize PARALLEL keywords in a case (xuefu) |
| |
| PIG-1612: error reporting: PigException needs to have a way to indicate that |
| its message is appropriate for user (laukik via thejas) |
| |
| PIG-1782: Add ability to load data by column family in HBaseStorage (billgraham via dvryaboy) |
| |
| PIG-1772: Pig 090 Documentation (chandec via olgan) |
| |
| PIG-1954: Design deployment interface for e2e test harness (gates) |
| |
| PIG-1881: Need a special interface for Penny (Inspector Gadget) (laukik via |
| gates) |
| |
PIG-1947: Incorrect line number is reported during parsing (xuefu)
| |
PIG-1918: Line number should be given for logical plan failures (xuefu)
| |
| PIG-1961: Pig prints "null" as file name in case of grammar error (xuefu) |
| |
| PIG-1956: Pig parser shouldn't log error code 0 (xuefu) |
| |
| PIG-1957: Pig parser gives misleading error message when the next foreach block has syntactic errors (xuefu) |
| |
| PIG-1958: Regression: Pig doesn't log type cast warning messages (xuefu) |
| |
PIG-1918: Line number should be given for logical plan failures (xuefu)
| |
| PIG-1899: Add end to end test harness for Pig (gates) |
| |
| PIG-1932: GFCross should allow the user to set the DEFAULT_PARALLELISM value (gates) |
| |
| PIG-1913: Use a file for excluding tests (tomwhite via gates) |
| |
| PIG-1693: support project-range expression. (was: There |
| needs to be a way in foreach to indicate "and all the |
| rest of the fields" ) (thejas) |
| |
| PIG-1772: Pig 090 Documentation (chandec via daijy) |
| |
| PIG-1830: Type mismatch error in key from map, when doing GROUP on PigStorageSchema() variable (dvryaboy) |
| |
| PIG-1566: Support globbing for registering jars in pig script (nrai via daijy) |
| |
| PIG-1886: Add zookeeper jar to list of jars shipped when HBaseStorage used (dvryaboy) |
| |
| PIG-1874: Make PigServer work in a multithreading environment (rding) |
| |
| PIG-1889: bin/pig should pick up HBase configuration from HBASE_CONF_DIR |
| |
| PIG-1794: Javascript support for Pig embedding and UDFs in scripting languages (julien) |
| |
| PIG-1853: Using ANTLR jars from maven repository (rding) |
| |
| PIG-1728: more doc updates (chandec via olgan) |
| |
| PIG-1793: Add macro expansion to Pig Latin (rding) |
| |
| PIG-847: Setting twoLevelAccessRequired field in a bag schema should not be required to access fields in the tuples of the bag (daijy) |
| |
| PIG-1748: Add load/store function AvroStorage for avro data (guolin2001, jghoman via daijy) |
| |
| PIG-1769: Consistency for HBaseStorage (dvryaboy) |
| |
| PIG-1786: Move describe/nested describe to new logical plan (daijy) |
| |
| PIG-1809: addition of TOMAP function (olgan) |
| |
| PIG-1749: Update Pig parser so that function arguments can contain newline characters (jghoman via daijy) |
| |
| PIG-1806: Modify embedded Pig API for usability (rding) |
| |
| PIG-1799: Provide deployable maven artifacts for pigunit and pig smoke tests |
| (cos via gates) |
| |
| PIG-1728: turing complete docs (chandec via olgan) |
| |
| PIG-1675: allow PigServer to register pig script from InputStream (zjffdu via dvryaboy) |
| |
| PIG-1479: Embed Pig in scripting languages (rding) |
| |
| PIG-946: Combiner optimizer does not optimize when limit follow group, foreach (thejas) |
| |
| PIG-1277: Pig should give error message when cogroup on tuple keys of different inner type (daijy) |
| |
| PIG-1755: Clean up duplicated code in PhysicalOperators (dvryaboy) |
| |
| PIG-750: Use combiner when algebraic UDFs are used in expressions (thejas) |
| |
| PIG-490: Combiner not used when group elements referred to in |
| tuple notation instead of flatten. (thejas) |
| |
| PIG-1768: 09 docs: illustrate (changec via olgan) |
| |
| PIG-1768: docs reorg (changec via olgan) |
| |
| PIG-1712: ILLUSTRATE rework (yanz) |
| |
| PIG-1758: Deep cast of complex type (daijy) |
| |
| PIG-1728: doc updates (chandec via olgan) |
| |
| PIG-1752: Enable UDFs to indicate files to load into the Distributed Cache |
| (gates) |
| |
| PIG-1747: pattern match classes for matching patterns in physical plan (thejas) |
| |
| PIG-1707: Allow pig build to pull from alternate maven repo to enable building |
| against newer hadoop versions (pradeepkth) |
| |
| PIG-1618: Switch to new parser generator technology (xuefuz via thejas) |
| |
| PIG-1531: Pig gobbles up error messages (nrai via hashutosh) |
| |
| PIG-1508: Make 'docs' target (forrest) work with Java 1.6 (cwsteinbach via gates) |
| |
| PIG-1608: pig should always include pig-default.properties and pig.properties in the pig.jar (nrai via daijy) |
| |
| OPTIMIZATIONS |
| |
| PIG-1696: Performance: Use System.arraycopy() instead of manually copying the bytes while reading the data (hashutosh) |
| |
| BUG FIXES |
| |
PIG-2159: New logical plan uses incorrect class for SUM causing ClassCastException (daijy)
| |
| PIG-2106: Fix Zebra unit test TestBasicUnion.testNeg3, TestBasicUnion.testNeg4 (daijy) |
| |
| PIG-2083: bincond ERROR 1025: Invalid field projection when null is used (thejas) |
| |
| PIG-2089: Javadoc for ResourceFieldSchema.getSchema() is wrong (daijy) |
| |
PIG-2084: pig is running validation one statement at a time in batch mode,
instead of running it for the whole script (thejas)
| |
| PIG-2088: Return alias validation failed when there is single line comment in the macro (rding) |
| |
| PIG-2081: Dryrun gives wrong line numbers in error message for scripts containing macro (rding) |
| |
| PIG-2078: POProject.getNext(DataBag) does not handle null (daijy) |
| |
| PIG-2029: Inconsistency in Pig Stats reports (rding) |
| |
| PIG-2070: "Unknown" appears in error message for an error case (thejas) |
| |
| PIG-2069: LoadFunc jar does not ship to backend in MultiQuery case (rding) |
| |
| PIG-2076: update documentation, help command with correct default value |
| of pig.cachedbag.memusage (thejas) |
| |
| PIG-2072: NPE when udf has project-star argument and input schema is null (thejas) |
| |
| PIG-2075: Bring back TestNewPlanPushUpFilter (daijy) |
| |
| PIG-1827: When passing a parameter to Pig, if the value contains $ it has to be escaped for no apparent reason (rding) |
| |
| PIG-2056: Jython error messages should show script name (rding) |
| |
| PIG-2014: SAMPLE shouldn't be pushed up (dvryaboy) |
| |
| PIG-2058: Macro missing returns clause doesn't give a good error message (rding) |
| |
| PIG-2035: Macro expansion doesn't handle multiple expansions of same macro inside another macro (rding) |
| |
| PIG-2030: Merged join/cogroup does not automatically ship loader (daijy) |
| |
| PIG-2052: Ship guava.jar to backend (daijy) |
| |
PIG-2012: Comments at the beginning of the file throw off line numbers in errors (rding)
| |
| PIG-2043: Ship antlr-runtime.jar to backend (daijy) |
| |
| PIG-2049: Pig should display TokenMgrError message consistently across all parsers (rding) |
| |
| PIG-2041: Minicluster should make each run independent (daijy) |
| |
| PIG-2040: Move classloader from QueryParserDriver to PigContext (daijy) |
| |
| PIG-1999: Macro alias masker should consider schema context (rding) |
| |
| PIG-1821: UDFContext.getUDFProperties does not handle collisions |
| in hashcode of udf classname (+ arg hashcodes) (thejas) |
| |
| PIG-2028: Speed up multiquery unit tests (rding) |
| |
| PIG-1990: support casting of complex types with empty inner schema |
| to complex type with non-empty inner schema (thejas) |
| |
| PIG-2016: -dot option does not work with explain and new logical plan (daijy) |
| |
| PIG-2018: NPE for co-group with group-by column having complex schema and |
| different load functions for each input (thejas) |
| |
| PIG-2015: Explain writes out logical plan twice (alangates) |
| |
| PIG-2017: consumeMap() fails with EmptyStackException (thedatachef via daijy) |
| |
| PIG-1989: complex type casting should return null on casting failure (daijy) |
| |
| PIG-1826: Unexpected data type -1 found in stream error (daijy) |
| |
| PIG-2004: Incorrect input types passed on to eval function (thejas) |
| |
| PIG-1814: mapred.output.compress in SET statement does not work (daijy) |
| |
| PIG-1976: One more TwoLevelAccess to remove (daijy) |
| |
| PIG-1865: BinStorage/PigStorageSchema cannot load data from a different namenode (daijy) |
| |
| PIG-1910: incorrect schema shown when project-star is used with other projections (daijy) |
| |
| PIG-2005: Discrepancy in the way dry run handles semicolon in macro definition (rding) |
| |
| PIG-1281: Detect org.apache.pig.data.DataByteArray cannot be cast to |
org.apache.pig.data.Tuple type of errors at Compile Time during
| creation of logical plan (thejas) |
| |
| PIG-1939: order-by statement should support project-range to-end in |
| any position among the sort columns if input schema is known (thejas) |
| |
| PIG-1978: Secondary sort fail when dereferencing two fields inside foreach (daijy) |
| |
PIG-1962: Wrong alias assigned to store operator (daijy)
| |
| PIG-1975: Need to provide backward compatibility for legacy LoadCaster (without bytesToMap(bytes, fieldSchema)) (daijy) |
| |
| PIG-1987: -dryrun does not work with set (rding) |
| |
PIG-1871: Don't throw exception if partition filters cannot be pushed up. (rding)
| |
| PIG-1870: HBaseStorage doesn't project correctly (dvryaboy) |
| |
| PIG-1788: relation-as-scalar error messages should indicate the field |
| being used as scalar (laukik via thejas) |
| |
| PIG-1697: NullPointerException if log4j.properties is Used (laukik via daijy) |
| |
PIG-1929: Type checker failed to catch invalid type comparison (thejas)
| |
| PIG-1928: Type Checking, incorrect error message (thejas) |
| |
| PIG-1979: New logical plan failing with ERROR 2229: Couldn't find matching uid -1 (daijy) |
| |
| PIG-1897: multiple star projection in a statement does not produce |
| the right plan (thejas) |
| |
| PIG-1917: NativeMapReduce does not Allow Configuration Parameters |
| containing Spaces (thejas) |
| |
PIG-1974: Lineage needs to be set for every cast (thejas)
| |
| PIG-1988: Importing an empty macro file causing NPE (rding) |
| |
| PIG-1977: "Stream closed" error while reading Pig temp files (results of intermediate jobs) (rding) |
| |
PIG-1963: in nested foreach, accumulative udf taking input from order-by does not get results in order (thejas)
| |
| PIG-1911: Infinite loop with accumulator function in nested foreach (thejas) |
| |
| PIG-1923: Jython UDFs fail to convert Maps of Integer values back to Pig types (julien) |
| |
| PIG-1944: register javascript UDFs does not work (julien) |
| |
| PIG-1955: PhysicalOperator has a member variable (non-static) Log object that |
| is non-transient, this causes serialization errors (woody via rding) |
| |
PIG-1964: PigStorageSchema fails if a column value is null (thejas)
| |
| PIG-1866: Dereference a bag within a tuple does not work (daijy) |
| |
PIG-1984: Wrong stats shown when there are multiple stores but same file names (rding)
| |
PIG-1893: Pig reports input size -1 for empty input file (rding)
| |
| PIG-1868: New logical plan fails when I have complex data types from udf |
| (daijy) |
| |
| PIG-1927: Dereference partial name failed (daijy) |
| |
| PIG-1934: Fix zebra test TestCheckin1, TestCheckin4 (daijy) |
| |
| PIG-1931: Integrate Macro Expansion with New Parser (rding) |
| |
| PIG-1933: Hints such as 'collected' and 'skewed' for "group by" or "join by" |
| should not be treated as tokens. (xuefuz via thejas) |
| |
| PIG-1925: Parser error message doesn't show location of the error or show it |
| as Line 0:0 (xuefuz via gates) |
| |
| PIG-671: typechecker does not throw an error when multiple arguments are |
| passed to COUNT (deepujain via gates) |
| |
| PIG-1152: bincond operator throws parser error (xuefuz via thejas) |
| |
| PIG-1885: SUBSTRING fails when input length less than start (deepujain via |
| gates) |
| |
| PIG-719: store <expr> into 'filename'; should be valid syntax, but does not work (xuefuz via thejas) |
| |
| PIG-1770: matches clause problem with chars that have special meaning in dk.brics - #, @ .. (thejas) |
| |
| PIG-1862: Pig returns exit code 0 for the failed Pig script due to non-existing input directory (rding) |
| |
PIG-1888: Fix TestLogicalPlanGenerator to not use a hardcoded path (daijy)
| |
| PIG-1837: Error while using IsEmpty function (rding) |
| |
PIG-1884: Change ReadToEndLoader.setLocation to not throw UnsupportedOperationException (thejas)
| |
PIG-1887: Fix pig-withouthadoop.jar to contain proper jars (daijy)
| |
| PIG-1779: Wrong stats shown when there are multiple loads but same file names (rding) |
| |
| PIG-1861: The pig script stored in the Hadoop History logs is stored as a concatenated string without whitespace this causes problems when attempting to extract and execute the script (rding) |
| |
PIG-1829: "0" value seen in PigStats map/reduce runtime, even when the job is successful (rding)
| |
| PIG-1856: Custom jar is not packaged with the new job created by LimitAdjuster (rding) |
| |
| PIG-1872: Fix bug in AvroStorage (guolin2001, jghoman via daijy) |
| |
| PIG-1536: use same logic for merging inner schemas in "default union" and |
| "union onschema" (daijy) |
| |
| PIG-1304: Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input (laukik via rding) |
| |
| PIG-1852: Packaging antlr jar with pig.jar (rding via daijy) |
| |
PIG-1717: pig needs to call setPartitionFilter if schema is null but
| getPartitionKeys is not (gerritjvv via gates) |
| |
| PIG-313: Error handling aggregate of a computation (daijy) |
| |
| PIG-496: project of bags from complex data causes failures (daijy) |
| |
| PIG-730: problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema (daijy) |
| |
| PIG-767: Schema reported from DESCRIBE and actual schema of inner bags are different (daijy) |
| |
| PIG-1801: Need better error message for Jython errors (rding) |
| |
| PIG-1742: org.apache.pig.newplan.optimizer.Rule.java does not work |
| with plan patterns where leaves/sinks are not siblings (thejas) |
| |
| Release 0.8.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-1518: multi file input format for loaders (yanz via rding) |
| |
PIG-1249: Safe-guards against misconfigured Pig scripts without PARALLEL keyword (zjffdu via olgan)
| |
| IMPROVEMENTS |
| |
| PIG-1561: XMLLoader in Piggybank does not support bz2 or gzip compressed XML files (vivekp via daijy) |
| |
PIG-1677: modify the repository path of pig artifacts to org/apache/pig instead of org/apache/hadoop/pig (nrai via olgan)
| |
| PIG-1600: Docs update (romainr via olgan) |
| |
| PIG-1632: The core jar in the tarball contains the kitchen sink (eli via olgan) |
| |
| PIG-1617: 'group all' should always use one reducer (thejas) |
| |
| PIG-1589: add test cases for mapreduce operator which use distributed cache (thejas) |
| |
| PIG-1548: Optimize scalar to consolidate the part file (rding) |
| |
| PIG-1600: Docs update (chandec via olgan) |
| |
PIG-1585: Add new properties to help and documentation (olgan)
| |
| PIG-1399: Filter expression optimizations (yanz via gates) |
| |
| PIG-1531: Pig gobbles up error messages (nrai via hashutosh) |
| |
| PIG-1458: aggregate files for replicated join (rding) |
| |
| PIG-1205: Enhance HBaseStorage-- Make it support loading row key and implement StoreFunc (zjffdu and dvryaboy) |
| |
| PIG-1568: Optimization rule FilterAboveForeach is too restrictive and doesn't |
| handle project * correctly (xuefuz via daijy) |
| |
PIG-1574: Optimization rule PushUpFilter causes filter to be pushed up out of joins (xuefuz via daijy)
| |
| PIG-1515: Migrate logical optimization rule: PushDownForeachFlatten (xuefuz via daijy) |
| |
| PIG-1321: Logical Optimizer: Merge cascading foreach (xuefuz via daijy) |
| |
| PIG-1483: [piggybank] Add HadoopJobHistoryLoader to the piggybank (rding) |
| |
| PIG-1555: [piggybank] add CSV Loader (dvryaboy) |
| |
| PIG-1501: need to investigate the impact of compression on pig performance (yanz via thejas) |
| |
| PIG-1497: Mandatory rule PartitionFilterOptimizer (xuefuz via daijy) |
| |
| PIG-1514: Migrate logical optimization rule: OpLimitOptimizer (xuefuz via daijy) |
| |
| PIG-1551: Improve dynamic invokers to deal with no-arg methods and array parameters (dvryaboy) |
| |
| PIG-1311: Document audience and stability for remaining interfaces (gates) |
| |
| PIG-506: Does pig need a NATIVE keyword? (aniket486 via thejas) |
| |
| PIG-1510: Add `deepCopy` for LogicalExpressions (swati.j via daijy) |
| |
| PIG-1447: Tune memory usage of InternalCachedBag (thejas) |
| |
| PIG-1505: support jars and scripts in dfs (anhi via rding) |
| |
| PIG-1334: Make pig artifacts available through maven (niraj via rding) |
| |
| PIG-1466: Improve log messages for memory usage (thejas) |
| |
| PIG-1404: added PigUnit, a framework for building unit tests of Pig Latin scripts (romainr via gates) |
| |
| PIG-1452: to remove hadoop20.jar from lib and use hadoop from the apache maven |
| repo. (rding) |
| |
| PIG-1295: Binary comparator for secondary sort (azaroth via daijy) |
| |
| PIG-1448: Detach tuple from inner plans of physical operator (thejas) |
| |
| PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi |
| via olgan) |
| |
| PIG-103: Shared Job /tmp location should be configurable (niraj via rding) |
| |
| PIG-1496: Mandatory rule ImplicitSplitInserter (yanz via daijy) |
| |
| PIG-346: grunt help command cleanup (olgan) |
| |
| PIG-1199: help includes obsolete options (olgan) |
| |
| PIG-1434: Allow casting relations to scalars (aniket486 via rding) |
| |
| PIG-1461: support union operation that merges based on column names (thejas) |
| |
| PIG-1517: Pig needs to support keywords in the package name (aniket486 via olgan) |
| |
| PIG-928: UDFs in scripting languages (aniket486 via daijy) |
| |
| PIG-1509: Add .gitignore file (cwsteinbach via gates) |
| |
| PIG-1478: Add progress notification listener to PigRunner API (rding) |
| |
| PIG-1472: Optimize serialization/deserialization between Map and Reduce and between MR jobs (thejas) |
| |
| PIG-1389: Implement Pig counter to track number of rows for each input files |
| (rding) |
| |
| PIG-1454: Consider clean up backend code (rding) |
| |
| PIG-1333: API interface to Pig (rding) |
| |
| PIG-1405: Need to move many standard functions from piggybank into Pig |
| (aniket486 via daijy) |
| |
| PIG-1427: Monitor and kill runaway UDFs (dvryaboy) |
| |
| PIG-1428: Make a StatusReporter singleton available for incrementing counters (dvryaboy) |
| |
| PIG-972: Make describe work with nested foreach (aniket486 via daijy) |
| |
| PIG-1438: [Performance] MultiQueryOptimizer should also merge DISTINCT jobs |
| (rding) |
| |
| PIG-1441: new test targets (olgan) |
| |
| PIG-282: Custom Partitioner (aniket486 via daijy) |
| |
| PIG-283: Allow to set arbitrary jobconf key-value pairs inside pig program (hashutosh) |
| |
| PIG-1373: We need to add jdiff output to docs on the website (daijy) |
| |
| PIG-1422: Duplicate code in LOPrinter.java (zjffdu) |
| |
| PIG-1420: Make CONCAT act on all fields of a tuple, instead of just the first two fields of a tuple (rjurney via dvryaboy) |
| |
| PIG-1408: Annotate explain plans with aliases (rding) |
| |
| PIG-1410: Make PigServer able to handle files with parameters (zjffdu) |
| |
| PIG-1406: Allow to run shell commands from grunt (zjffdu) |
| |
| PIG-1398: Marking Pig interfaces for org.apache.pig.data package (gates) |
| |
| PIG-1396: eclipse-files target in build.xml fails to generate necessary classes in src-gen |
| |
| PIG-1390: Provide a target to generate eclipse-related classpath and files (chaitk via thejas) |
| |
| PIG-1384: Adding contrib javadoc to main Pig javadoc (daijy) |
| |
| PIG-1320: final documentation updates for Pig 0.7.0 (chandec via olgan) |
| |
| PIG-1363: Unnecessary loadFunc instantiations (hashutosh) |
| |
| PIG-1370: Marking Pig interface for org.apache.pig package (gates) |
| |
| PIG-1354: UDFs for dynamic invocation of simple Java methods (dvryaboy) |
| |
| PIG-1316: TextLoader should use Bzip2TextInputFormat for bzip files so that |
| bzip files can be efficiently processed by splitting the files (pradeepkth) |
| |
| PIG-1317: LOLoad should cache results of LoadMetadata.getSchema() for use in |
| subsequent calls to LOLoad.getSchema() or LOLoad.determineSchema() |
| (pradeepkth) |
| |
| PIG-1413: Remove svn:externals reference for test-patch.sh and |
| create a local copy of test-patch.sh (gkesavan) |
| |
| PIG-1302: Include zebra's "pigtest" ant target as a part of pig's |
| ant test target. (gkesavan) |
| |
| PIG-1582: To upgrade commons-logging |
| |
| OPTIMIZATIONS |
| |
| PIG-1353: Map-side joins (ashutoshc) |
| |
| PIG-1309: Map-side Cogroup (ashutoshc) |
| |
| BUG FIXES |
| |
| PIG-2067: FilterLogicExpressionSimplifier removed some branches in some cases (daijy) |
| |
| PIG-2033: Pig returns success for the failed Pig script (rding) |
| |
| PIG-1993: PigStorageSchema throws NPE with ColumnPruning (daijy) |
| |
| PIG-1935: New logical plan: Should not push up filter in front of Bincond (daijy) |
| |
| PIG-1912: non-deterministic output when a file is loaded multiple times (daijy) |
| |
| PIG-1892: Bug in new logical plan : No output generated even though there are |
| valid records (daijy) |
| |
| PIG-1808: Error message in 0.8 not much helpful as compared to 0.7 (daijy) |
| |
| PIG-1850: Order by is failing with ClassCastException if schema is undefined |
| for new logical plan in 0.8 (daijy) |
| |
| PIG-1831: Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf (daijy) |
| |
| PIG-1841: TupleSize implemented incorrectly (laukik via daijy) |
| |
| PIG-1843: NPE in schema generation (daijy) |
| |
| PIG-1820: New logical plan: FilterLogicExpressionSimplifier fails to deal with UDF (daijy) |
| |
| PIG-1854: Pig returns exit code 0 for the failed Pig script (rding) |
| |
| PIG-1812: Problem with DID_NOT_FIND_LOAD_ONLY_MAP_PLAN (daijy) |
| |
| PIG-1813: Pig 0.8 throws ERROR 1075 while trying to refer a map in the result |
| of eval udf. Works with 0.7 (daijy) |
| |
| PIG-1776: changing statement corresponding to alias after explain, then |
| doing dump gives incorrect result (thejas) |
| |
| PIG-1800: Missing Signature for maven staging release (rding) |
| |
| PIG-1815: pig task retains used instances of PhysicalPlan (thejas) |
| |
| PIG-1785: New logical plan: uid conflict in flattened fields (daijy) |
| |
| PIG-1787: Error in logical plan generated (daijy) |
| |
| PIG-1791: System property mapred.output.compress, but pig-cluster-hadoop-site.xml doesn't (daijy) |
| |
| PIG-1771: New logical plan: Merge schema fails if LoadFunc.getSchema returns different schema with "Load...AS" (daijy) |
| |
| PIG-1766: New logical plan: ImplicitSplitInserter should run before DuplicateForEachColumnRewrite (daijy) |
| |
| PIG-1762: Logical simplification fails on map key referenced values (yanz) |
| |
| PIG-1761: New logical plan: Exception when bag dereference in the middle of expression (daijy) |
| |
| PIG-1757: After split combination, the number of maps may vary slightly (yanz) |
| |
| PIG-1760: Need to report progress in all databags (rding) |
| |
| PIG-1709: Skewed join uses fewer reducers for extremely large keys (daijy) |
| |
| PIG-1751: New logical plan: PushDownForEachFlatten fails in UDF with unknown |
| output schema (daijy) |
| |
| PIG-1741: Lineage fails when flattening a bag (daijy) |
| |
| PIG-1739: zero status is returned when pig script fails (yanz) |
| |
| PIG-1738: New logical plan: Optimized UserFuncExpression.getFieldSchema (daijy) |
| |
| PIG-1732: New logical plan: logical plan gets confused if we generate the same |
| field twice in ForEach (daijy) |
| |
| PIG-1737: New logical plan: Improve error messages when merge schema fails (daijy) |
| |
| PIG-1725: New logical plan: uidOnlySchema bug in LOGenerate (daijy) |
| |
| PIG-1729: New logical plan: Dereference does not add into plan after deepCopy (daijy) |
| |
| PIG-1721: New logical plan: script fails when reusing foreach inner alias (daijy) |
| |
| PIG-1716: New logical plan: LogToPhyTranslationVisitor should translate the structure for regex optimization (daijy) |
| |
| PIG-1740: Fix SVN location in setup doc (chandec via olgan) |
| |
| PIG-1719: New logical plan: FieldSchema generation for BinCond is wrong (daijy) |
| |
| PIG-1720: java.lang.NegativeArraySizeException during Quicksort (thejas) |
| |
| PIG-1727: Hadoop default config overrides pig.properties (rding) |
| |
| PIG-1731: Stack Overflows where there are composite logical expressions on UDFs using the new logical plan (yanz) |
| |
| PIG-1723: Need to limit the length of Pig counter names (rding) |
| |
| PIG-1714: Option mapred.output.compress doesn't work in Pig 0.8 but worked in |
| 0.7 (xuefuz via rding) |
| |
| PIG-1715: pig-withouthadoop.jar missing automaton.jar (thejas) |
| |
| PIG-1706: New logical plan: PushDownFlattenForEach fails if flattened field has user defined schema (daijy) |
| |
| PIG-1705: New logical plan: self-join fails for some queries (daijy) |
| |
| PIG-1704: Output Compression is not at work if the output path is absolute and there is a trailing / after the compression suffix (yanz) |
| |
| PIG-1695: MergeForEach does not carry user defined schema if any one of the merged ForEach has user defined schema (daijy) |
| |
| PIG-1684: Inconsistent usage of store func. (thejas) |
| |
| PIG-1694: union-onschema projects null schema at parsing stage for some queries (thejas) |
| |
| PIG-1685: Pig is unable to handle counters for glob paths? (daijy) |
| |
| PIG-1683: New logical plan: Nested foreach plan fails if one inner alias is referred to more than once (daijy) |
| |
| PIG-1542: log level not propagated to MR task loggers (nrai via daijy) |
| |
| PIG-1673: query with consecutive union-onschema statement errors out (thejas) |
| |
| PIG-1653: Scripting UDF fails if the path to script is an absolute path (daijy) |
| |
| PIG-1669: PushUpFilter fails when filter condition contains scalar (daijy) |
| |
| PIG-1672: order of relations in replicated join gets switched in a query where |
| first relation has two mergeable foreach statements (thejas) |
| |
| PIG-1666: union onschema fails when the input relation has cast from bytearray to another type (thejas) |
| |
| PIG-1655: code duplicated for udfs that were moved from piggybank to builtin (nrai via daijy) |
| |
| PIG-1670: pig throws ExecException instead of FrontEnd exception when the plan validation fails (nrai via daijy) |
| |
| PIG-1668: Order by failed with RuntimeException (rding) |
| |
| PIG-1659: sortinfo is not set for store if there is a filter after ORDER BY (daijy) |
| |
| PIG-1664: leading '_' in directory/file names should be ignored; the "pigtest" build target should include all pig-related zebra tests. (yanz) |
| |
| PIG-1662: Need better error message for MalFormedProbVecException (rding) |
| |
| PIG-1656: TOBAG udfs ignores columns with null value; it does not use input type |
| to determine output schema (thejas) |
| |
| PIG-1658: ORDER BY does not work properly on integer/short keys that are -1 (yanz) |
| |
| PIG-1638: sh output gets mixed up with the grunt prompt (nrai via daijy) |
| |
| PIG-1607: pig should have separate javadoc.jar in the maven |
| repository (nrai via thejas) |
| |
| PIG-1651: PIG class loading mishandled (rding) |
| |
| PIG-1650: pig grunt shell breaks for many commands like perl, awk, |
| pipe, 'ls -l' etc (nrai via thejas) |
| |
| PIG-1649: FRJoin fails to compute number of input files for replicated |
| input (thejas) |
| |
| PIG-1637: Combiner not used because optimizer inserts a foreach between group |
| and algebraic function (daijy) |
| |
| PIG-1648: Split combination may return too many block locations to map/reduce framework (yanz) |
| |
| PIG-1641: Incorrect counters in local mode (rding) |
| |
| PIG-1647: Logical simplifier throws a NPE (yanz) |
| |
| PIG-1642: Order by doesn't use estimation to determine the parallelism (rding) |
| |
| PIG-1644: New logical plan: Plan.connect with position is misused in some |
| places (daijy) |
| |
| PIG-1643: join fails for a query with input having 'load using pigstorage |
| without schema' + 'foreach' (daijy) |
| |
| PIG-1645: Using both small split combination and temporary file compression on a query of ORDER BY may cause crash (yanz) |
| |
| PIG-1635: Logical simplifier does not simplify away constants under AND and OR; after simplification the ordering of operands of |
| AND and OR may get changed (yanz) |
| |
| PIG-1639: New logical plan: PushUpFilter should not push before group/cogroup |
| if filter condition contains UDF (xuefuz via daijy) |
| |
| PIG-1643: join fails for a query with input having 'load using pigstorage |
| without schema' + 'foreach' (thejas) |
| |
| PIG-1628: log this message at debug level : 'Pig Internal storage in use' (thejas) |
| |
| PIG-1636: Scalar fail if the scalar variable is generated by limit (daijy) |
| |
| PIG-1605: Adding soft link to plan to solve input file dependency |
| (daijy) |
| |
| PIG-1598: Pig gobbles up error messages - Part 2 (nrai via daijy) |
| |
| PIG-1616: 'union onschema' does not create output with correct schema |
| when udfs are involved (thejas) |
| |
| PIG-1610: 'union onschema' does not handle some cases involving 'namespaced' |
| column names in schema (thejas) |
| |
| PIG-1609: 'union onschema' should give a more useful error message when |
| schema of one of the relations has null column name (thejas) |
| |
| PIG-1562: Fix the version for the dependent packages for the maven (nrai via |
| rding) |
| |
| PIG-1604: 'relation as scalar' does not work with complex types (thejas) |
| |
| PIG-1601: Make scalar work for secure hadoop (daijy) |
| |
| PIG-1602: The .classpath of eclipse template still uses hbase-0.20.0 (zjffdu) |
| |
| PIG-1596: NPE's thrown when attempting to load hbase columns containing null values (zjffdu) |
| |
| PIG-1597: Development snapshot jar no longer picked up by bin/pig (dvryaboy) |
| |
| PIG-1599: pig gives generic message for few cases (nrai via rding) |
| |
| PIG-1595: casting relation to scalar- problem with handling of data from non PigStorage loaders (thejas) |
| |
| PIG-1591: pig does not create a log file, if the MR job succeeds but front end fails (nrai via daijy) |
| |
| PIG-1543: IsEmpty returns the wrong value after using LIMIT (daijy) |
| |
| PIG-1550: better error handling in casting relations to scalars (thejas) |
| |
| PIG-1572: change default datatype when relations are used as scalar to bytearray (thejas) |
| |
| PIG-1583: piggybank unit test TestLookupInFiles is broken (daijy) |
| |
| PIG-1563: some of string functions don't work on bytearrays (olgan) |
| |
| PIG-1569: java properties not honored in case of properties such as |
| stop.on.failure (rding) |
| |
| PIG-1570: native mapreduce operator MR job does not follow same failure handling logic as other pig MR jobs (thejas) |
| |
| PIG-1343: pig_log file missing even though Main tells it is creating one and |
| an M/R job fails (nrai via rding) |
| |
| PIG-1482: Pig gets confused when more than one loader is involved (xuefuz via thejas) |
| |
| PIG-1579: Intermittent unit test failure for TestScriptUDF.testPythonScriptUDFNullInputOutput (daijy) |
| |
| PIG-1557: A couple of issues mapping aliases to jobs (rding) |
| |
| PIG-1552: Nested describe failed when the alias is not referred in the first foreach inner plan (aniket486 via daijy) |
| |
| PIG-1486: update ant eclipse-files target to include new jar and remove contrib dirs from build path (thejas) |
| |
| PIG-1524: 'Proactive spill count' is misleading (thejas) |
| |
| PIG-1546: Incorrect assert statements in operator evaluation (ajaykidave via |
| pradeepkth) |
| |
| PIG-1392: Parser fails to recognize valid field (niraj via rding) |
| |
| PIG-1541: FR Join shouldn't match null values (rding) |
| |
| PIG-1525: Incorrect data generated by diff of SUM (rding) |
| |
| PIG-1288: EvalFunc returnType is wrong for generic subclasses (daijy) |
| |
| PIG-1534: Code discovering UDFs in the script has a bug in an order by case |
| (pradeepkth) |
| |
| PIG-1533: Compression codec should be a per-store property (rding) |
| |
| PIG-1527: No need to deserialize UDFContext on the client side (rding) |
| |
| PIG-1516: finalize in bag implementations causes pig to run out of memory in reduce (thejas) |
| |
| PIG-1521: explain plan does not show correct Physical operator in MR plan when POSortedDistinct, POPackageLite are used (thejas) |
| |
| PIG-1513: Pig doesn't handle empty input directory (rding) |
| |
| PIG-1500: guava.jar should be removed from the lib folder (niraj via rding) |
| |
| PIG-1034: Pig does not support ORDER ... BY group alias (zjffdu) |
| |
| PIG-1445: Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented (daijy) |
| |
| PIG-348: -j command line option doesn't work (rding) |
| |
| PIG-1487: Replace "bz" with ".bz" in all the LoadFunc |
| |
| PIG-1489: Pig MapReduceLauncher does not use jars in register statement |
| (rding) |
| |
| PIG-1435: make sure dependent jobs fail when a job in multiquery fails (niraj |
| via rding) |
| |
| PIG-1492: DefaultTuple and DefaultMemory underestimate their memory footprint (thejas) |
| |
| PIG-1409: Fix up javadocs for org.apache.pig.builtin (gates) |
| |
| PIG-1490: Make Pig storers work with remote HDFS in secure mode (rding) |
| |
| PIG-1469: DefaultDataBag assumes ArrayList as default List type (azaroth via dvryaboy) |
| |
| PIG-1467: order by fails when setting "fs.file.impl.disable.cache" to true (daijy) |
| |
| PIG-1463: Replace "bz" with ".bz" in setStoreLocation in PigStorage (zjffdu) |
| |
| PIG-1221: Filter equality does not work for tuples (zjffdu) |
| |
| PIG-1456: TestMultiQuery takes a long time to run (rding) |
| |
| PIG-1457: Pig will run complete zebra test even if we give -Dtestcase=xxx (daijy) |
| |
| PIG-1450: TestAlgebraicEvalLocal failures due to OOM (daijy) |
| |
| PIG-1433: pig should create success file if |
| mapreduce.fileoutputcommitter.marksuccessfuljobs is true (pradeepkth) |
| |
| PIG-1347: Clear up output directory for a failed job (daijy) |
| |
| PIG-1419: Remove "user.name" from JobConf (daijy) |
| |
| PIG-1359: bin/pig script does not pick up correct jar libraries (zjffdu) |
| |
| PIG-566: Dump and store outputs do not match for PigStorage (azaroth via daijy) |
| |
| PIG-1414: Problem with parameter substitution (rding) |
| |
| PIG-1407: Logging starts before being configured (azaroth via daijy) |
| |
| PIG-1391: pig unit tests leave behind files in temp directory because |
| MiniCluster files don't get deleted (thejas) |
| |
| PIG-1211: Pig script runs half way after which it reports syntax error |
| (pradeepkth) |
| |
| PIG-1401: "explain -script <script file>" executes grunt commands like |
| run/dump/copy etc - explain -script should not execute any grunt command and |
| only explain the query plans (pradeepkth) |
| |
| PIG-1303: Inconsistent instantiation of parametrized UDFs (jrussek and dvryaboy) |
| |
| PIG-740: Incorrect line number is generated when a string with double quotes is |
| used instead of single quotes and is passed to UDF (pradeepkth) |
| |
| PIG-1378: har url not usable in Pig scripts (pradeepkth) |
| |
| PIG-1395: Mapside cogroup runs out of memory (ashutoshc) |
| |
| PIG-1383: Remove empty svn directories from source tree (rding) |
| |
| PIG-1348: PigStorage making unnecessary byte array copy when storing data |
| (rding) |
| |
| PIG-1372: Restore PigInputFormat.sJob for backward compatibility (pradeepkth) |
| |
| PIG-1369: POProject does not handle null tuples and non existent fields in |
| some cases (pradeepkth) |
| |
| PIG-1364: Public javadoc on apache site still on 0.2, needs to be updated for each version release (gates) |
| |
| PIG-1338: Pig should exclude hadoop conf in local mode (daijy) |
| |
| PIG-1299: Implement Pig counter to track number of output rows for each output |
| files (rding) |
| |
| PIG-1366: PigStorage's pushProjection implementation results in NPE under |
| certain data conditions (pradeepkth) |
| |
| PIG-1365: WrappedIOException is missing from Pig.jar (pradeepkth) |
| |
| PIG-1313: PigServer leaks memory over time (billgraham via daijy) |
| |
| PIG-1346: In unit tests Util.executeShellCommand relies on java commands being |
| in the path and does not consider JAVA_HOME (pradeepkth) |
| |
| PIG-1352: piggybank UPPER udf throws exception if argument is null |
| |
| PIG-1560: Fix ant target checkstyle (gkesavan) |
| |
| Release 0.7.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-1292: Interface Refinements (ashutoshc) |
| |
| PIG-1259: ResourceFieldSchema.setSchema should not allow a bag field without a |
| Tuple as its only sub field (the tuple itself can have a schema with > 1 |
| subfields) (pradeepkth) |
| |
| PIG-1265: Change LoadMetadata and StoreMetadata to use Job instead of |
| Configuration and add a cleanupOnFailure method to StoreFuncInterface |
| (pradeepkth) |
| |
| PIG-1250: Make StoreFunc an abstract class and create a mirror interface |
| called StoreFuncInterface (pradeepkth) |
| |
| PIG-1234: Unable to create input slice for har:// files (pradeepkth) |
| |
| PIG-1200: Using TableInputFormat in HBaseStorage (zjffdu via pradeepkth) |
| |
| PIG-1148: Move splitable logic from pig latin to InputFormat (zjffdu via |
| pradeepkth) |
| |
| PIG-1141: Make streaming work with the new load-store interfaces (rding via |
| pradeepkth) |
| |
| PIG-1110: Handle compressed file formats -- Gz, BZip with the new proposal |
| (rding via pradeepkth) |
| |
| PIG-1088: change merge join and merge join indexer to work with new LoadFunc |
| interface (thejas via pradeepkth) |
| |
| PIG-879: Pig should provide a way for input location string in load statement |
| to be passed as-is to the Loader (rding via pradeepkth) |
| |
| PIG-966: load-store-redesign branch: change SampleLoader and subclasses to |
| work with new LoadFunc interface (thejas via pradeepkth) |
| |
| PIG-1094: Fix unit tests corresponding to source changes so far (pradeepkth) |
| |
| PIG-1090: Update sources to reflect recent changes in load-store interfaces |
| (pradeepkth) |
| |
| PIG-1072: ReversibleLoadStoreFunc interface should be removed to enable |
| different load and store implementation classes to be used in a reversible |
| manner (rding via pradeepkth) |
| |
| IMPROVEMENTS |
| |
| PIG-1381: Need a way for Pig to take an alternative property file (daijy) |
| |
| PIG-1330: Move pruned schema tracking logic from LoadFunc to core code (daijy) |
| |
| PIG-1320: more documentation updates for Pig 0.7.0 (chandec via olgan) |
| |
| PIG-1320: documentation updates for Pig 0.7.0 (chandec via olgan) |
| |
| PIG-1325: Provide a way to exclude a testcase when running "ant test" |
| (pradeepkth) |
| |
| PIG-1312: Make Pig work with hadoop security (daijy) |
| |
| PIG-1308: Infinite loop in JobClient when reading from BinStorage Message: |
| [org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to |
| process : 2] (pradeepkth) |
| |
| PIG-1285: Allow SingleTupleBag to be serialized (dvryaboy) |
| |
| PIG-1117: Pig reading hive columnar rc tables (gerritjvv via dvryaboy) |
| |
| PIG-1287: Use hadoop-0.20.2 with pig 0.7.0 release (pradeepkth) |
| |
| PIG-1257: PigStorage per the new load-store redesign should support splitting |
| of bzip files (pradeepkth) |
| |
| PIG-1290: WeightedRangePartitioner should not check if input is empty if |
| quantile file is empty (pradeepkth) |
| |
| PIG-1262: Additional findbugs and javac warnings (daijy) |
| |
| PIG-1248: [piggybank] some useful String functions (dvryaboy) |
| |
| PIG-1251: Move SortInfo calculation earlier in compilation (ashutoshc) |
| |
| PIG-1233: NullPointerException in AVG (ankur via olgan) |
| |
| PIG-1218: Use distributed cache to store samples (rding via pradeepkth) |
| |
| PIG-1226: support for additional jar files (thejas via olgan) |
| |
| PIG-1230: Streaming input in POJoinPackage should use nonspillable bag to |
| collect tuples (ashutoshc) |
| |
| PIG-1224: Collected group should change to use new (internal) bag (ashutoshc) |
| |
| PIG-1046: join algorithm specification is within double quotes (ashutoshc) |
| |
| PIG-1209: Port POJoinPackage to proactively spill (ashutoshc) |
| |
| PIG-1190: Handling of quoted strings in pig-latin/grunt commands (ashutoshc) |
| |
| PIG-1214: Pig 0.6 Docs fixes (chandec via olgan) |
| |
| PIG-977: exit status does not account for JOB_STATUS.TERMINATED (ashutoshc) |
| |
| PIG-1192: Pig 0.6 Docs fixes (chandec via olgan) |
| |
| PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan) |
| |
| PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan) |
| |
| PIG-1102: Collect number of spills per job (sriranjan via olgan) |
| |
| PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs |
| (dvryaboy via pradeepkth) |
| |
| PIG-1162: Pig 0.6.0 - UDF doc (chandec via olgan) |
| |
| PIG-1163: Pig/Zebra 0.6.0 release (chandec via olgan) |
| |
| PIG-1156: Add aliases to ExecJobs and PhysicalOperators (dvryaboy via gates) |
| |
| PIG-1161: add missing license headers (dvryaboy via olgan) |
| |
| PIG-760: Add a new PigStorageSchema load/store function that |
| stores schemas for text files (dvryaboy via gates) |
| |
| PIG-1106: FR join should not spill (ankit.modi via olgan) |
| |
| PIG-1147: Zebra Docs for Pig 0.6.0 (chandec via olgan) |
| |
| PIG-1129: Pig UDF doc: fieldsToRead function (chandec via olgan) |
| |
| PIG-978: MQ docs update (chandec via olgan) |
| |
| PIG-990: Provide a way to pin LogicalOperator Options (dvryaboy via gates) |
| |
| PIG-1103: refactoring of commit tests (olgan) |
| |
| PIG-1101: Allow argument to limit to be long in addition to int (ashutoshc via |
| gates) |
| |
| PIG-872: use distributed cache for the replicated data set in FR join |
| (sriranjan via olgan) |
| |
| PIG-1053: Consider moving to Hadoop for local mode (ankit.modi via olgan) |
| |
| PIG-1085: Pass JobConf and UDF specific configuration information to UDFs |
| (gates) |
| |
| PIG-1173: pig cannot be built without an internet connection (jmhodges via daijy) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-1507: Full outer join fails while doing a filter on joined data (daijy) |
| |
| PIG-1493: Column Pruner throw exception "inconsistent pruning" (daijy) |
| |
| PIG-1484: BinStorage should support comma separated path (daijy) |
| |
| PIG-1443: DefaultTuple underestimate the memory footprint for string (daijy) |
| |
| PIG-1446: OOME in a query having a bincond in the inner plan of a Foreach (hashutosh) |
| |
| PIG-1415: LoadFunc signature is not correct in LoadFunc.getSchema sometimes (daijy) |
| |
| PIG-1403: Make Pig work with remote HDFS in secure mode (daijy) |
| |
| PIG-1394: POCombinerPackage hold too much memory for InternalCachedBag (daijy) |
| |
| PIG-1374: PushDownForeachFlatten shall not push ForEach below Join if the flattened fields are used in the next statement (daijy) |
| |
| PIG-1336: Optimize POStore serialized into JobConf (daijy) |
| |
| PIG-1335: UDFFinder should find LoadFunc used by POCast (daijy) |
| |
| PIG-1307: when we spill the DefaultDataBag we are not setting the size changed flag to true. (breed via daijy) |
| |
| PIG-1298: Restore file traversal behavior to Pig loaders (rding) |
| |
| PIG-1289: PIG Join fails while doing a filter on joined data (daijy) |
| |
| PIG-1266: Show spill count on the pig console at the end of the job (sriranjan |
| via rding) |
| |
| PIG-1296: Skewed join fails due to negative partition index (daijy) |
| |
| PIG-1293: pig wrapper script tends to fail if pig is in the path and PIG_HOME |
| isn't set (aw via gates) |
| |
| PIG-1272: Column pruner causes wrong results (daijy) |
| |
| PIG-1275: empty bag in PigStorage read as null (daijy) |
| |
| PIG-1252: Diamond splitter does not generate correct results when using |
| Multi-query optimization (rding) |
| |
| PIG-1260: Param Substitution results in parser error if there is no EOL after |
| last line in script (rding) |
| |
| PIG-1238: Dump does not respect the schema (rding) |
| |
| PIG-1261: PigStorageSchema broke after changes to ResourceSchema (dvryaboy via |
| daijy) |
| |
| PIG-1053: Put pig.properties back into release distribution (gates) |
| |
| PIG-1273: Skewed join throws error (rding) |
| |
| PIG-1267: Problems with partition filter optimizer (rding) |
| |
| PIG-1079: Modify merge join to use distributed cache to maintain the index |
| (rding) |
| |
| PIG-1241: Accumulator is turned on when a map is used with a non-accumulative |
| UDF (yinghe via olgan) |
| |
| PIG-1215: Make Hadoop jobId more prominent in the client log (ashutoshc) |
| |
| PIG-1216: New load store design does not allow Pig to validate inputs and |
| outputs up front (ashutoshc via pradeepkth) |
| |
| PIG-1239: PigContext.connect() should not create a jobClient and jobClient |
| should be created on demand when needed (pradeepkth) |
| |
| PIG-1169: Top-N queries produce incorrect results when a store statement is added between order by and limit statement (rding) |
| |
| PIG-1131: Pig simple join does not work when it contains empty lines (ashutoshc) |
| |
| PIG-834: incorrect plan when algebraic functions are nested (ashutoshc) |
| |
| PIG-1217: Fix argToFuncMapping in Piggybank Top function (dvryaboy via gates) |
| |
| PIG-1154: Local Mode fails when hadoop config directory is specified in |
| classpath (ankit.modi via gates) |
| |
| PIG-1124: Unable to set Custom Job Name using the -Dmapred.job.name parameter (ashutoshc) |
| |
| PIG-1213: Schema serialization is broken (pradeepkth) |
| |
| PIG-1194: ERROR 2055: Received Error while processing the map plan (rding via ashutoshc) |
| |
| PIG-1204: Pig hangs when joining two streaming relations in local mode |
| (rding) |
| |
| PIG-1191: POCast throws exception for certain sequences of LOAD, FILTER, |
| FOREACH (pradeepkth via gates) |
| |
| PIG-1171: Top-N queries produce incorrect results when followed by a cross statement (rding via olgan) |
| |
| PIG-1159: merge join right side table does not support comma separated paths |
| (rding via olgan) |
| |
| PIG-1158: pig command line -M option doesn't support table union correctly |
| (comma separated paths) (rding via olgan) |
| |
| PIG-1143: Poisson Sample Loader should compute the number of samples required |
| only once (sriranjan via olgan) |
| |
| PIG-1157: Successive replicated joins do not generate Map Reduce plan and fails |
| due to OOM (rding via olgan) |
| |
| PIG-1075: Error in Cogroup when key fields types don't match (rding via olgan) |
| |
| PIG-973: type resolution inconsistency (rding via olgan) |
| |
| PIG-1135: skewed join partitioner returns negative partition index (yinghe |
| via olgan) |
| |
| PIG-1134: Skewed Join sampling job overwhelms the name node (sriranjan via |
| olgan) |
| |
| PIG-1105: COUNT_STAR accumulate interface implementation cases failure |
| (sriranjan via olgan) |
| |
| PIG-1118: expression with aggregate functions returning null, with accumulate |
| interface (yinghe via olgan) |
| |
| PIG-1068: COGROUP fails with 'Type mismatch in key from map: expected |
| org.apache.pig.impl.io.NullableText, received |
| org.apache.pig.impl.io.NullableTuple' (rding via gates) |
| |
| PIG-1113: Diamond query optimization throws error in JOIN (rding via olgan) |
| |
| PIG-1116: Remove redundant map-reduce job for merge join (pradeepkth) |
| |
| PIG-1114: MultiQuery optimization throws error when merging 2 level spl (rding |
| via olgan) |
| |
| PIG-1108: Incorrect map output key type in MultiQuery optimiza (rding via |
| olgan) |
| |
| PIG-1022: optimizer pushes filter before the foreach that generates column |
| used by filter (daijy via gates) |
| |
| PIG-1107: PigLineRecordReader bails out on an empty line for compressed data |
| (ankit.modi via olgan) |
| |
| PIG-598: Parameter substitution ($PARAMETER) should not be performed in |
| comments (thejas via olgan) |
| |
| PIG-1064: Behaviour of COGROUP with and without schema when using "*" operator |
| (pradeepkth) |
| |
| PIG-965: PERFORMANCE: optimize common case in matches (PORegex) (ankit.modi |
| via ) |
| |
| PIG-1086: Nested sort by * throw exception (rding via daijy) |
| |
| PIG-1146: Inconsistent column pruning in LOUnion (daijy) |
| |
| PIG-1176: Column Pruner issues in union of loader with and without schema |
| (daijy) |
| |
| PIG-1184: PruneColumns optimization does not handle the case of foreach |
| flatten correctly if flattened bag is not used later (daijy) |
| |
| PIG-1189: StoreFunc UDF should ship to the backend automatically without |
| "register" (daijy) |
| |
| PIG-1212: LogicalPlan.replaceAndAddSucessors produce wrong result when successors are null (daijy) |
| |
| PIG-1255: Tiny code cleanup for serialization code for PigSplit (daijy) |
| |
| PIG-613: Casting elements inside a tuple does not take effect (daijy) |
| |
| Release 0.6.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-922: Logical optimizer: push up project (daijy) |
| |
| IMPROVEMENTS |
| |
| PIG-1084: Pig 0.6.0 Documentation improvements (chandec via olgan) |
| |
| PIG-1089: Pig 0.6.0 Documentation (chandec via olgan) |
| |
| PIG-958: Splitting output data on key field (ankur via pradeepkth) |
| |
| PIG-1058: FINDBUGS: remaining "Correctness Warnings" (olgan) |
| |
| PIG-1036: Fragment-replicate left outer join (ankit.modi via pradeepkth) |
| |
| PIG-920: optimizing diamond queries (rding via pradeepkth) |
| |
| PIG-1040: FINDBUGS: MS_SHOULD_BE_FINAL: Field isn't final but should be (olgan) |
| |
| PIG-1059: FINDBUGS: remaining Bad practice + Multithreaded correctness Warning (olgan) |
| |
| PIG-953: Enable merge join in pig to work with loaders and store functions |
| which can internally index sorted data (pradeepkth) |
| |
| PIG-1055: FINDBUGS: remaining "Dodgy Warnings" (olgan) |
| |
| PIG-1052: FINDBUGS: remaining performance warnings (olgan) |
| |
| PIG-1037: Converted sorted and distinct bags to use the new active spilling |
| paradigm (yinghe via gates) |
| |
| PIG-1051: FINDBUGS: NP_NULL_ON_SOME_PATH_FROM_RETURN_VALUE (olgan) |
| |
| PIG-1050: FINDBUGS: DLS_DEAD_LOCAL_STORE: Dead store to local variable (olgan) |
| |
| PIG-1045: Integration with Hadoop 20 New API (rding via pradeepkth) |
| |
| PIG-1043: FINDBUGS: SIC_INNER_SHOULD_BE_STATIC: Should be a static inner class |
| (olgan) |
| |
| PIG-1047: FINDBUGS: URF_UNREAD_FIELD: Unread field (olgan) |
| |
| PIG-1032: FINDBUGS: DM_STRING_CTOR: Method invokes inefficient new |
| String(String) constructor (olgan) |
| |
| PIG-984: Add map side grouping for data that is already collected when |
| it is read into the map (rding via gates) |
| |
| PIG-1025: Add ability to set job priority from Pig Latin script (kevinweil via |
| gates) |
| |
| PIG-1028: FINDBUGS: DM_NUMBER_CTOR: Method invokes inefficient Number |
| constructor; use static valueOf instead (olgan) |
| |
| PIG-1012: FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in |
| serializable class (olgan) |
| |
| PIG-1013: FINDBUGS: DMI_INVOKING_TOSTRING_ON_ARRAY: Invocation of toString on |
| an array (olgan) |
| |
| PIG-1011: FINDBUGS: SE_NO_SERIALVERSIONID: Class is Serializable, but doesn't |
| define serialVersionUID (olgan) |
| |
| PIG-1009: FINDBUGS: OS_OPEN_STREAM: Method may fail to close stream (olgan) |
| |
| PIG-1008: FINDBUGS: NP_TOSTRING_COULD_RETURN_NULL (olgan) |
| |
| PIG-1018: FINDBUGS: NM_FIELD_NAMING_CONVENTION: Field names should start with |
| a lower case letter (olgan) |
| |
| PIG-1023: FINDBUGS: exclude CN_IDIOM_NO_SUPER_CALL (olgan) |
| |
| PIG-1019: added findbugs exclusion file (olgan) |
| |
| PIG-983: PERFORMANCE: multi-query optimization on multiple group bys |
| following a join or cogroup (rding via pradeepkth) |
| |
| PIG-975: Need a databag that does not register with SpillableMemoryManager and |
| spill data pro-actively (yinghe via olgan) |
| |
| PIG-891: Fixing dfs statement for Pig (zjffdu via daijy) |
| |
| PIG-956: 10 minute commit tests (olgan) |
| |
| PIG-948: [Usability] Relating pig script with MR jobs (ashutoshc via daijy) |
| |
| PIG-960: Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage (ankit.modi via daijy) |
| |
| PIG-1020: Include an ant target to build pig.jar without hadoop libraries (daijy) |
| |
| PIG-1033: javac warnings: deprecated hadoop APIs (daijy) |
| |
| PIG-1041: javac warnings: cast, fallthrough, serial (daijy) |
| |
| PIG-1042: javac warnings: unchecked (daijy) |
| |
| PIG-1038: Optimize nested distinct/sort to use secondary key (daijy) |
| |
| PIG-979: Accumulator Interface for UDFs (yinghe via daijy) |
| |
| OPTIMIZATIONS |
| |
| PIG-922: Logical optimizer: push up project (daijy) |
| |
| BUG FIXES |
| |
| PIG-1080: PigStorage may miss records when loading a file (rding via olgan) |
| |
| PIG-1071: Support comma separated file/directory names in load statements |
| (rding via pradeepkth) |
| |
| PIG-970: Changes to make HBase loader work with HBase 0.20 (vbarat and zjffdu |
| via gates) |
| |
| PIG-1035: support for skewed outer join (sriranjan via pradeepkth) |
| |
| PIG-1030: explain and dump not working with two UDFs inside inner plan of |
| foreach (rding via pradeepkth) |
| |
| PIG-1048: inner join using 'skewed' produces multiple rows for keys with |
| single row in both input relations (sriranjan via gates) |
| |
| PIG-1063: Pig does not call checkOutSpecs() on OutputFormat provided by |
| StoreFunc in the multistore case (pradeepkth) |
| |
| PIG-746: Works in --exectype local, fails on grid - ERROR 2113: SingleTupleBag |
| should never be serialized (rding via pradeepkth) |
| |
| PIG-1027: Number of bytes written are always zero in local mode (zjffdu via gates) |
| |
| PIG-976: Multi-query optimization throws ClassCastException (rding via |
| pradeepkth) |
| |
| PIG-858: Order By followed by "replicated" join fails while compiling MR-plan |
| from physical plan (ashutoshc via gates) |
| |
| PIG-968: Fix findContainingJar to work properly when there is a + in the jar |
| path (tlipcon via gates) |
| |
| PIG-738: Regexp passed from pigscript fails in UDF (pradeepkth) |
| |
| PIG-942: Maps are not implicitly casted (pradeepkth) |
| |
| PIG-513: Removed unnecessary bounds check in DefaultTuple (ashutoshc via |
| gates) |
| |
| PIG-951: Set parallelism explicitly to 1 for indexing job in merge join |
| (ashutoshc via gates) |
| |
| PIG-592: schema inferred incorrectly (daijy) |
| |
| PIG-989: Allow type merge between numerical type and non-numerical type (daijy) |
| |
| PIG-894: order-by fails when input is empty (daijy) |
| |
| PIG-995: Limit Optimizer throws exception "ERROR 2156: Error while fixing projections" (daijy) |
| |
| PIG-1000: InternalCachedBag.java generates javac warning and findbug warning (yinghe via daijy) |
| |
| PIG-921: Strange use case for Join which produces different results in local and map reduce mode (daijy) |
| |
| PIG-1024: Script contains nested limit fail due to "LOLimit does not support multiple outputs" (daijy) |
| |
| PIG-644: Duplicate column names in foreach do not throw parser error (daijy) |
| |
| PIG-927: null should be handled consistently in Join (daijy) |
| |
| PIG-790: Error message should indicate in which line number in the Pig script the error occurred (debugging BinCond) (daijy) |
| |
| PIG-1001: Generate more meaningful error message when one input file does not exist (daijy) |
| |
| PIG-1060: MultiQuery optimization throws error for multi-level splits (rding via daijy) |
| |
| PIG-1128: column pruning causing failure when foreach has user-specified |
| schema (daijy) |
| |
| PIG-1127: Logical operator should contain an individual copy of schema object |
| (daijy) |
| |
| PIG-1133: UDFContext should be made available to LoadFunc.bindTo (daijy) |
| |
| PIG-1132: Column Pruner issues in dealing with unprunable loader (daijy) |
| |
| PIG-1142: Got NullPointerException merge join with pruning (daijy) |
| |
| PIG-1155: Need to make sure existing loaders work "as is" (daijy) |
| |
| PIG-1144: set default_parallelism construct does not set the number of |
| reducers correctly (daijy) |
| |
| PIG-1165: Signature of loader does not set correctly for order by (daijy) |
| |
| PIG-761: ERROR 2086 on simple JOIN (daijy) |
| |
| PIG-1172: PushDownForeachFlatten shall not push ForEach below Join if the |
| flattened fields are used in Join (daijy) |
| |
| PIG-1180: Piggybank should compile even if we only have |
| "pig-withouthadoop.jar" but no "pig.jar" in the pig home directory (daijy) |
| |
| PIG-1185: Data bags do not close spill files after using iterator to read |
| tuples (yinghe via daijy) |
| |
| PIG-1186: Pig does not take values in "pig-cluster-hadoop-site.xml" (daijy) |
| |
| PIG-1193: Secondary sort issue on nested desc sort (daijy) |
| |
| PIG-1195: POSort should take care of sort order (daijy) |
| |
| PIG-1210: fieldsToRead sends the same fields more than once in some cases (daijy) |
| |
| PIG-1231: DefaultDataBagIterator.hasNext() should be idempotent in all cases |
| (daijy) |
| |
| Release 0.5.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-1039: documentation update (chandec via olgan) |
| |
| OPTIMIZATIONS |
| |
| BUG FIXES |
| |
| PIG-963: Join in local mode matches null keys (pradeepkth) |
| |
| PIG-660: Integration with Hadoop 20 (sms via olgan) |
| |
| Release 0.4.0 - 2009-09-26 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-892: Make COUNT and AVG deal with nulls in accordance with the SQL standard |
| (olgan) |
| |
| PIG-734: Changed maps to only take strings as keys (gates) |
| |
| IMPROVEMENTS |
| |
| PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan) |
| |
| PIG-578: join ... outer, ... outer semantics are a no-ops, should produce |
| corresponding null values (pradeepkth) |
| |
| PIG-936: making dump and PigDump independent from Tuple.toString (daijy) |
| |
| PIG-890: Create a sampler interface and improve the skewed join sampler (sriranjan via daijy) |
| |
| PIG-922: Logical optimizer: push up project part 1 (daijy) |
| |
| PIG-812: COUNT(*) does not work (breed) |
| |
| PIG-923: Allow specifying log file location through pig.properties (dvryaboy via daijy) |
| |
| PIG-926: Merge-Join phase 2 (ashutoshc via pradeepkth) |
| |
| PIG-845: PERFORMANCE: Merge Join (ashutoshc via pradeepkth) |
| |
| PIG-893: Added string -> integer, long, float, and double casts (zjffdu via gates) |
| |
| PIG-833: Added Zebra, new columnar storage mechanism for HDFS (rangadi plus many others via gates) |
| |
| PIG-697: Proposed improvements to pig's optimizer, Phase5 (daijy) |
| |
| PIG-895: Default parallel for Pig (daijy) |
| |
| PIG-820: Change RandomSampleLoader to take a LoadFunc instead of extending |
| BinStorage. Added new Samplable interface for loaders to implement |
| allowing them to be used by RandomSampleLoader (ashutoshc via gates) |
| |
| PIG-832: Make import list configurable (daijy) |
| |
| PIG-697: Proposed improvements to pig's optimizer (sms) |
| |
| PIG-753: Allow UDFs with no parameters (zjffdu via gates) |
| |
| PIG-765: jdiff for pig (gkesavan) |
| |
| OPTIMIZATIONS |
| |
| PIG-792: skew join implementation (sriranjan via olgan) |
| |
| BUG FIXES |
| |
| PIG-964: Handling null in skewed join (sriranjan via olgan) |
| |
| PIG-962: Skewed join creates 3 map reduce jobs (sriranjan via olgan) |
| |
| PIG-957: Tutorial is broken with 0.4 branch and trunk (pradeepkth) |
| |
| PIG-955: Skewed join produces invalid results (yinghe via olgan) |
| |
| PIG-954: Skewed join fails when pig.skewedjoin.reduce.memusage is not |
| configured (yinghe via olgan) |
| |
| PIG-882: log level not propagated to loggers - duplicate message (daijy) |
| |
| PIG-943: Pig crash when it cannot get counter from hadoop (daijy) |
| |
| PIG-935: Skewed join throws an exception when used with map keys (sriranjan |
| via pradeepkth) |
| |
| PIG-934: Merge join implementation currently does not seek to right point |
| on the right side input based on the offset provided by the index |
| (ashutoshc via pradeepkth) |
| |
| PIG-925: Fix join in local mode (daijy) |
| |
| PIG-913: Error in Pig script when grouping on chararray column (daijy) |
| |
| PIG-907: Provide multiple version of HashFNV (Piggybank) (daijy) |
| |
| PIG-905: TOKENIZE throws exception on null data (daijy) |
| |
| PIG-901: InputSplit (SliceWrapper) created by Pig is big in size due to |
| serialized PigContext (pradeepkth) |
| |
| PIG-882: log level not propagated to loggers (daijy) |
| |
| PIG-880: Order by is broken with complex fields (sms) |
| |
| PIG-773: Empty complex constants (empty bag, empty tuple and empty map) |
| should be supported (ashutoshc via sms) |
| |
| PIG-695: Pig should not fail when error logs cannot be created (sms) |
| |
| PIG-878: Pig is returning too many blocks in the input split. (arunc via gates) |
| |
| PIG-888: Pig does not pass udf to the backend in some situations (daijy) |
| |
| PIG-728: All backend error messages must be logged to preserve the |
| original error messages (sms) |
| |
| PIG-877: Push up filter does not account for added columns in foreach |
| (sms) |
| |
| PIG-883: udf import list is not sent to the backend (daijy) |
| |
| PIG-881: Pig should ship load udfs to the backend (daijy) |
| |
| PIG-876: limit changes order of order-by to ascending (daijy) |
| |
| PIG-851: Map type used as return type in UDFs not recognized at all times |
| (zjffdu via sms) |
| |
| PIG-861: POJoinPackage lose tuple in large dataset (daijy) |
| |
| PIG-797: Limit with ORDER BY producing wrong results (daijy) |
| |
| PIG-850: Dump produce wrong result while "store into" is ok (daijy) |
| |
| PIG-852: pig -version or pig -help returns exit code of 1 (milindb via |
| olgan) |
| |
| PIG-849: Local engine loses records in splits (hagleitn via olgan) |
| |
| PIG-939: Fix checkstyle ivy configuration (gkesavan) |
| |
| Release 0.3.0 - Unreleased |
| |
| INCOMPATIBLE CHANGES |
| |
| IMPROVEMENTS |
| |
| PIG-817: documentation update (chandec via olgan) |
| |
| PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates) |
| |
| PIG-831: Turned off reporting of records and bytes written for multi-store |
| queries as the returned results are confusing and wrong. (gates) |
| |
| PIG-813: documentation updates (chandec via olgan) |
| |
| PIG-825: PIG_HADOOP_VERSION should be set to 18 (dvryaboy via gates) |
| |
| PIG-795: support for SAMPLE command (ericg via olgan) |
| |
| PIG-619: Create one InputSplit even when the input file is zero length |
| so that hadoop runs maps and creates output for the next |
| job (gates) |
| |
| PIG-697: Proposed improvements to pig's optimizer (sms) |
| |
| PIG-700: To automate the pig patch test process (gkesavan via sms) |
| |
| PIG-712: Added utility functions to create schemas for tuples and bags (zjffdu |
| via gates) |
| |
| PIG-652: Adapt changes in store interface to multi-query changes (hagleitn |
| via gates) |
| |
| PIG-775: PORelationToExprProject should create a NonSpillableDataBag to create |
| empty bags (pradeepkth) |
| |
| PIG-741: Allow limit to be nested in a foreach. |
| |
| PIG-627: multiquery support phase 3 (hagleitn and Richard Ding via olgan) |
| |
| PIG-743: To implement clover (gkesavan) |
| |
| PIG-701: Implement IVY for resolving pig dependencies (gkesavan) |
| |
| PIG-626: Add access to hadoop counters (shubhamc via gates) |
| |
| PIG-627: multiquery support phase 1 and phase 2 (hagleitn and Richard Ding via pradeepkth) |
| |
| BUG FIXES |
| |
| PIG-846: MultiQuery optimization in some cases has an issue when there is a |
| split in the map plan (pradeepkth) |
| |
| PIG-835: Multiquery optimization does not handle the case where the map keys |
| in the split plans have different key types (tuple and non tuple key type) |
| (pradeepkth) |
| |
| PIG-839: incorrect return codes on failure when using -f or -e flags (hagleitn |
| via sms) |
| |
| PIG-796: support conversion from numeric types to chararray (ashutoshc |
| via pradeepkth) |
| |
| PIG-564: problem with parameter substitution and special characters (olgan) |
| |
| PIG-802: PERFORMANCE: not creating bags for ORDER BY (serakesh via olgan) |
| |
| PIG-816: PigStorage() does not accept Unicode characters in its constructor (pradeepkth) |
| |
| PIG-818: Explain doesn't handle PODemux properly (hagleitn via olgan) |
| |
| PIG-819: run -param -param; is a valid grunt command (milindb via olgan) |
| |
| PIG-656: Use of eval or any other keyword in the package hierarchy of a UDF causes |
| parse exception (milindb via sms) |
| |
| PIG-814: Make Binstorage more robust when data contains record markers (pradeepkth) |
| |
| PIG-811: Globs with "?" in the pattern are broken in local mode (hagleitn via |
| olgan) |
| |
| PIG-810: Fixed NPE in PigStats (gates) |
| |
| PIG-804: problem with lineage with double map redirection (pradeepkth) |
| |
| PIG-733: Order by sampling dumps entire sample to hdfs which causes dfs |
| "FileSystem closed" error on large input (pradeepkth) |
| |
| PIG-693: Parameter to UDF which is an alias returned in another UDF in nested |
| foreach causes incorrect results (thejas via sms) |
| |
| PIG-725: javadoc: warning - Multiple sources of package comments found for |
| package "org.apache.commons.logging" (gkesavan via sms) |
| |
| PIG-745: Add DataType.toString() to force basic types to chararray, useful |
| for UDFs that want to handle all simple types as strings (ciemo via gates) |
| |
| PIG-514: COUNT returns no results as a result of two filter statements in |
| FOREACH (pradeepkth) |
| |
| PIG-789: Fix dump and illustrate to work with new multi-query feature |
| (hagleitn via gates) |
| |
| PIG-774: Pig does not handle Chinese characters (in both the parameter substitution |
| using -param_file or embedded in the Pig script) correctly (daijy) |
| |
| PIG-800: Fix distinct and order in local mode to not go into an infinite loop |
| (gates) |
| |
| PIG-806: to remove author tags in the pig source code (sms) |
| |
| PIG-799: Unit tests on windows are failing after multiquery commit (daijy) |
| |
| PIG-781: Error reporting for failed MR jobs (hagleitn via olgan) |
| |
| Release 0.2.0 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-157: Add types and rework execution pipeline (gates) |
| |
| PIG-458: integration with Hadoop 18 (olgan) |
| |
| NEW FEATURES |
| |
| PIG-139: command line editing (daijy via olgan) |
| |
| PIG-554: Added fragment replicate map side join (shravanmn via pkamath and gates) |
| |
| PIG-535: added rmf command |
| |
| PIG-704: Added ALIASES command that shows all currently defined ALIASES. |
| Changed semantics of DEFINE to define last used alias if no argument is |
| given (ericg via gates) |
| |
| PIG-713: Added alias completion as part of tab completion in grunt (ericg |
| via gates) |
| |
| IMPROVEMENTS |
| |
| PIG-270: proper line number for parse errors (daijy via olgan) |
| |
| PIG-367: convenience function for UDFs to name schema |
| |
| PIG-443: Illustrate for the Types branch (shubhamc via olgan) |
| |
| PIG-599: Added buffering to BufferedPositionedInputStream (gates) |
| |
| PIG-629: performance improvement: getting rid of targeted tuple (pradeepkth |
| via olgan) |
| |
| PIG-628: misc performance improvements (pradeepkth via olgan) |
| |
| PIG-589: error handling, phase 1-2 (sms via olgan) |
| |
| PIG-590: error handling, phase 3 (sms) |
| |
| PIG-591: error handling, phase 4 (sms) |
| |
| PIG-545: PERFORMANCE: Sampler for order bys does not produce a good |
| distribution (pradeepkth) |
| |
| PIG-580: using combiner to compute distinct aggs (pradeepkth via olgan) |
| |
| PIG-636: Use lightweight bag implementations which do not register with |
| SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-563: support for multiple combiner invocations (pradeepkth via olgan) |
| |
| PIG-465: performance improvement - removing keys from the value (pradeepkth |
| via olgan) |
| |
| PIG-450: PERFORMANCE: Distinct should make use of combiner to remove |
| duplicate values from keys. (gates) |
| |
| PIG-350: PERFORMANCE: Join optimization for pipeline rework (pradeepkth |
| via gates) |
| |
| BUG FIXES |
| |
| PIG-294: string comparator unit tests (sms via pi_song) |
| |
| PIG-258: cleaning up directories on failure (daijy via olgan) |
| |
| PIG-363: fix for describe to produce schema name |
| |
| PIG-368: making JobConf available to Load/Store UDFs |
| |
| PIG-311: cross is broken |
| |
| PIG-369: support for filter UDFs |
| |
| PIG-375: support for implicit split |
| |
| PIG-301: fix for order by descending |
| |
| PIG-378: fix for GENERATE + LIMIT |
| |
| PIG-362: don't push limit above generate with flatten |
| |
| PIG-381: bincond does not handle null data |
| |
| PIG-382: bincond throws typecast exception |
| |
| PIG-352: java.lang.ClassCastException when invalid field is accessed |
| |
| PIG-329: TestStoreOld, 2 unit tests were broken |
| |
| PIG-353: parsing of complex types |
| |
| PIG-392: error handling with multiple MRjobs |
| |
| PIG-397: code defaults to single reducer |
| |
| PIG-373: unconnected load causes problem |
| |
| PIG-413: problem with float sum |
| |
| PIG-398: Expressions not allowed inside foreach (sms via olgan) |
| |
| PIG-418: divide by 0 problem |
| |
| PIG-402: order by with user comparator (shravanmn via olgan) |
| |
| PIG-415: problem with comparators (shravanmn via olgan) |
| |
| PIG-422: cross is broken (shravanmn via olgan) |
| |
| PIG-407: need to clone operators (pradeepkth via olgan) |
| |
| PIG-428: TypeCastInserter does not replace projects in inner plans |
| correctly (pradeepkth via olgan) |
| |
| PIG-421: error with complex nested plan (sms via olgan) |
| |
| PIG-429: Self join with implicit split has the join output in wrong order |
| (pradeepkth via olgan) |
| |
| PIG-434: short-circuit AND and OR (pradeepkth via olgan) |
| |
| PIG-333: allowing no parentheses with single column alias with flatten (sms |
| via olgan) |
| |
| PIG-426: Adding result of two UDFs gives a syntax error (sms via olgan) |
| |
| PIG-436: alias is lost when single column is flattened (pradeepkth via |
| olgan) |
| |
| PIG-364: Limit return incorrect records when we use multiple reducer |
| (daijy via olgan) |
| |
| PIG-439: disallow alias renaming (pradeepkth via olgan) |
| |
| PIG-440: Exceptions from UDFs inside a foreach are not captured (pradeepkth |
| via olgan) |
| |
| PIG-442: Disambiguated alias after a foreach flatten is not accessible a |
| couple of statements after the foreach (sms via olgan) |
| |
| PIG-424: nested foreach with flatten and agg gives an error (sms via |
| olgan) |
| |
| PIG-411: Pig leaves HOD processes behind if Ctrl-C is used before HOD |
| connection is fully established (olgan) |
| |
| PIG-430: Projections in nested filter and inside foreach do not work (sms |
| via olgan) |
| |
| PIG-445: Null Pointer Exceptions in the mappers leading to lot of retries |
| (shravanmn via olgan) |
| |
| PIG-444: job.jar is left behind (pradeepkth via olgan) |
| |
| PIG-447: improved error messages (pradeepkth via olgan) |
| |
| PIG-448: explain broken after load with types (pradeepkth via olgan) |
| |
| PIG-380: invalid schema for databag constant (sms via olgan) |
| |
| PIG-451: If a field is part of group followed by flatten, then referring |
| to it causes a parse error (pradeepkth via olgan) |
| |
| PIG-455: "group" alias is lost after a flatten(group) (pradeepkth via olgan) |
| |
| PIG-459: increased sleep time before checking for job progress |
| |
| PIG-462: LIMIT N should create one output file with N rows (shravanmn via |
| olgan) |
| |
| PIG-376: set job name (olgan) |
| |
| PIG-463: POCast changes (pradeepkth via olgan) |
| |
| PIG-427: casting input to UDFs |
| |
| PIG-437: as in alias names causing problems (sms via olgan) |
| |
| PIG-54: MIN/MAX don't deal with invalid data (pradeepkth via olgan) |
| |
| PIG-470: TextLoader should produce bytearrays (sms via olgan) |
| |
| PIG-335: lineage (sms via olgan) |
| |
| PIG-464: bag schema definition (pradeepkth via olgan) |
| |
| PIG-457: report 100% on successful jobs only (shravanmn via olgan) |
| |
| PIG-471: ignoring status errors from hadoop (pradeepkth via olgan) |
| |
| PIG-489: (*) processing (sms via olgan) |
| |
| PIG-475: missing heartbeats (shravanmn via olgan) |
| |
| PIG-468: make determine Schema work for BinStorage (pradeepkth via olgan) |
| |
| PIG-494: invalid handling of UTF-8 data in PigStorage (pradeepkth via olgan) |
| |
| PIG-501: Make branches/types work under cygwin (daijy via olgan) |
| |
| PIG-504: cleanup illustrate not to produce cn= (shubhamc via olgan) |
| |
| PIG-469: make sure that describe says "int" not "integer" (sms via olgan) |
| |
| PIG-495: projecting of bags only give 1 field (olgan) |
| |
| PIG-500: Load Func for POCast is not being set in some cases (sms via |
| olgan) |
| |
| PIG-499: parser issue with as (sms via olgan) |
| |
| PIG-507: permission error not reported (pradeepkth via olgan) |
| |
| PIG-508: problem with double joins (pradeepkth via olgan) |
| |
| PIG-497: problems with UTF8 handling in BinStorage (pradeepkth via olgan) |
| |
| PIG-505: working with map elements (sms via olgan) |
| |
| PIG-517: load function with parameters does not work with cast (pradeepkth |
| via olgan) |
| |
| PIG-525: make sure cast for udf parameters works (olgan) |
| |
| PIG-512: Expressions in foreach lead to errors (sms via olgan) |
| |
| PIG-528: use UDF return in schema computation (sms via olgan) |
| |
| PIG-527: allow PigStorage to write out complex output (sms via olgan) |
| |
| PIG-537: Failure in Hadoop map collect stage due to type mismatch in the |
| keys used in cogroup (pradeepkth via olgan) |
| |
| PIG-538: support for null constants (pradeepkth via olgan) |
| |
| PIG-385: more null handling (pradeepkth via olgan) |
| |
| PIG-546: FilterFunc calls empty constructor when it should be calling |
| parameterized constructor (sms via olgan) |
| |
| PIG-449: Schemas for bags should contain tuples all the time (pradeepkth via |
| olgan) |
| |
| PIG-501: make unit tests run under windows (daijy via olgan) |
| |
| PIG-543: Restore local mode to truly run locally instead of use map |
| reduce. (shubhamc via gates) |
| |
| PIG-556: Changed FindQuantiles to report progress. Fixed issue with null |
| reporter being passed to EvalFuncs. (gates) |
| |
| PIG-6: Add load support from hbase (hustlmsp via gates) |
| |
| PIG-522: make negation work (pradeepkth via olgan) |
| |
| PIG-558: Distinct followed by a Join results in Invalid size 0 for a tuple |
| error (pradeepkth via olgan) |
| |
| PIG-572: A PigServer.registerScript() method, which lets a client |
| programmatically register a Pig Script. (shubhamc via gates) |
| |
| PIG-570: problems with handling bzip data (breed via olgan) |
| |
| PIG-597: Fix for how * is treated by UDFs (shravanmn via olgan) |
| |
| PIG-623: Fix spelling errors in output messages (tomwhite via sms) |
| |
| PIG-622: Include pig executable in distribution (tomwhite via sms) |
| |
| PIG-615: Wrong number of jobs with limit (shravanmn via sms) |
| |
| PIG-635: POCast.java has incorrect formatting (sms) |
| |
| PIG-634: When POUnion is one of the roots of a map plan, POUnion.getNext() |
| gives a null pointer exception (pradeepkth) |
| |
| PIG-632: Improved error message for binary operators (sms) |
| |
| PIG-636: Performance improvement: Use lightweight bag implementations which do not |
| register with SpillableMemoryManager with Combiner (pradeepkth) |
| |
| PIG-631: 4 Unit test failures on Windows (daijy) |
| |
| PIG-645: Streaming is broken with the latest trunk (pradeepkth) |
| |
| PIG-646: Distinct UDF should report progress (sms) |
| |
| PIG-647: memory size passed on pig command line does not get propagated |
| to JobConf (sms) |
| |
| PIG-648: BinStorage fails when it finds markers unexpectedly in the data |
| (pradeepkth) |
| |
| PIG-649: RandomSampleLoader does not handle skipping correctly in |
| getNext() (pradeepkth) |
| |
| PIG-560: UTFDataFormatException (encoded string too long) is thrown when |
| storing strings > 65536 bytes (in UTF8 form) using BinStorage() (sms) |
| |
| PIG-642: Limit after FRJ causes problems (daijy) |
| |
| PIG-637: Limit broken after order by in the local mode (shubhamc via |
| olgan) |
| |
| PIG-553: EvalFunc.finish() not getting called (shravanmn via sms) |
| |
| PIG-654: Optimize build.xml (daijy) |
| |
| PIG-574: allowing to run scripts from within grunt shell (hagleitn via |
| olgan) |
| |
| PIG-665: Map key type not correctly set (for use when key is null) when |
| map plan does not have localrearrange (pradeepkth) |
| |
| PIG-590: error handling on the backend (sms via olgan) |
| |
| PIG-590: error handling on the backend (sms) |
| |
| PIG-658: Data type long : When 'L' or 'l' is included with data |
| (123L or 123l) load produces null value. Also the case with Float (thejas |
| via sms) |
| |
| PIG-591: Error handling phase four (sms via pradeepkth) |
| |
| PIG-664: Semantics of * is not consistent (sms) |
| |
| PIG-684: outputSchema method in TOKENIZE is broken (thejas via sms) |
| |
| PIG-655: Comparison of schemas of bincond operands is flawed (sms via |
| pradeepkth) |
| |
| PIG-691: BinStorage skips tuples when ^A is present in data (pradeepkth |
| via sms) |
| |
| PIG-577: outer join query loses name information (sms via pradeepkth) |
| |
| PIG-690: UNION doesn't work in the latest code (pradeepkth via sms) |
| |
| PIG-544: Utf8StorageConverter.java does not always produce NULLs when data |
| is malformed (thejas via sms) |
| |
| PIG-532: Casting a field removes its alias (thejas via sms) |
| |
| PIG-705: Pig should display a better error message when backend error |
| messages cannot be parsed (sms) |
| |
| PIG-650: pig should look for and use the pig specific |
| 'pig-cluster-hadoop-site.xml' in the non HOD case just like it does in the |
| HOD case (sms) |
| |
| PIG-699: Implement forrest docs target in Pig Build (gkesavan via olgan) |
| |
| PIG-706: Implement ant target to use findbugs on PIG (gkesavan via olgan) |
| |
| PIG-708: implement releaseaudit target to use rats on pig (gkesavan via |
| olgan) |
| |
| PIG-703: user documentation (chandec via olgan) |
| |
| PIG-711: Implement checkstyle for pig (gkesavan via olgan) |
| |
| PIG-715: doc updates (chandec via olgan) |
| |
| PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates) |
| |
| PIG-692: When running a job from a script, use the name of that script as |
| the default name for the job (vzaliva via gates) |
| |
| PIG-718: To add standard ant targets to build.xml file (gkesavan via olgan) |
| |
| PIG-720: further doc cleanup (gkesavan via olgan) |
| |
| Release 0.1.1 - 2008-12-04 |
| |
| INCOMPATIBLE CHANGES |
| |
| NEW FEATURES |
| |
| IMPROVEMENTS |
| |
| PIG-253: integration with hadoop-18 |
| |
| BUG FIXES |
| |
| PIG-342: Fix DistinctDataBag to recalculate size after it has spilled. |
| (bdimcheff via gates) |
| |
| Release 0.1.0 - 2008-09-11 |
| |
| INCOMPATIBLE CHANGES |
| |
| PIG-123: requires escape of '\' in chars and string |
| |
| NEW FEATURES |
| |
| PIG-20: Added custom comparator functions for order by (phunt via gates) |
| |
| PIG-94: Streaming implementation (arunc via olgan) |
| |
| PIG-58: parameter substitution |
| |
| PIG-55: added custom splitter (groves via olgan) |
| |
| PIG-59: Add a new ILLUSTRATE command (shubhamc via gates) |
| |
| PIG-256: Added variable argument support for UDFs (pi_song) |
| |
| IMPROVEMENTS |
| |
| PIG-8: added binary comparator (olgan) |
| |
| PIG-11: Add capability to search for jar file to register. (antmagna via olgan) |
| |
| PIG-7: Added use of combiner in some restricted cases. (gates) |
| |
| PIG-47: Added methods to DataMap to provide access to its content |
| |
| PIG-30: Rewrote DataBags to better handle decisions of when to spill to |
| disk and to spill more intelligently. (gates) |
| |
| PIG-12: Added time stamps to log4j messages (phunt via gates) |
| |
| PIG-44: Added adaptive decision of the number of records to hold in memory |
| before spilling (utkarsh) |
| |
| PIG-56: Made DataBag implement Iterable. (groves via gates) |
| |
| PIG-39: created more efficient version of read (spullara via olgan) |
| |
| PIG-32: Abstraction layer (olgan) |
| |
| PIG-83: Change everything except grunt and Main (PigServer on down) to use |
| common logging abstraction instead of log4j. By default in grunt, log4j is |
| still used as the logging layer. Also converted all System.out/err.println |
| statements to use logging instead. (francisoud via gates) |
| |
| PIG-13: adding version to the system (joa23 via olgan) |
| |
| PIG-113: Make explain output more understandable (pi_song via gates) |
| |
| PIG-120: Support map reduce in local mode. To do this, the user needs to |
| specify execution type as mapreduce and cluster name as local (joa23 via gates) |
| |
| PIG-106: Change StringBuffer and String '+' to StringBuilder (francisoud via gates) |
| |
| PIG-111: Reworked configuration to be settable via properties. (joa23, pi_song, oae via gates) |
| |
| BUG FIXES |
| |
| PIG-24 Files that were incorrectly placed under test/reports have been |
| removed. ant clean now cleans test/reports. (milindb via gates) |
| |
| PIG-25 com.yahoo.pig dir left under pig/test by mistake. removed it (olgan@) |
| |
| PIG-23 Made pig work with java 1.5. (milindb via gates) |
| |
| PIG-17 integrated with Hadoop 0.15 (olgan@) |
| |
| PIG-33 Help was commented out - uncommented (olgan) |
| |
| PIG-31: second half of concurrent mode problem addressed (olgan) |
| |
| PIG-14: added heartbeat functionality (olgan) |
| |
| PIG-17: updated hadoop15.jar to match hadoop 0.15.1 release |
| |
| PIG-29: fixed bag factory to be properly initialized (utkarsh) |
| |
| PIG-43: fixed problem where using the combiner prevented a pig alias |
| from being evaluated more than once. (gates) |
| |
| PIG-45: Fixed pig.pl to not assume hodrc file is named the same as |
| cluster name (gates) |
| |
| PIG-7 (more): Fixed bug in PigCombiner where it was writing IndexedTuples |
| instead of Tuples, causing Reducer to crash in some cases. |
| |
| PIG-41: Added patterns to svn:ignore |
| |
| PIG-51: Fixed combiner in the presence of flattening |
| |
| PIG-61: Fixed MapreducePlanCompiler to use PigContext to load up the |
| comparator function instead of Class.forName. (gates) |
| |
| PIG-63: Fix for non-ascii UTF-8 data (breed@ and olgan@) |
| |
| PIG-77: Added eclipse specific files to svn:ignore |
| |
| PIG-57: Fixed NPE in PigContext.fixUpDomain (francisoud via gates) |
| |
| PIG-69: NPE in PigContext.setJobtrackerLocation (francisoud via gates) |
| |
| PIG-78: src/org/apache/pig/builtin/PigStorage.java doesn't compile (arunc |
| via olgan) |
| |
| PIG-87: Fix pig.pl to find java via JAVA_HOME instead of hardcoded default |
| path. Also fix it to not die if pigclient.conf is missing. (craigm via |
| gates) |
| |
| PIG-89: Fix DefaultDataBag, DistinctDataBag, SortedDataBag to close spill |
| files when they are done spilling (contributions by craigm, breed, and |
| gates, committed by gates) |
| |
| PIG-95: Remove System.exit() statements from inside pig (joa23 via gates) |
| |
| PIG-65: convert tabs to spaces (groves via olgan) |
| |
| PIG-97: Turn off combiner in the case of Cogroup, as it doesn't work when |
| more than one bag is involved (gates) |
| |
| PIG-92: Fix NullPointerException in PigContext due to uninitialized conf |
| reference. (francisoud via gates) |
| |
| PIG-80: In a number of places stack trace information was being lost by an |
| exception being caught, and a different exception then thrown. All those |
| locations have been changed so that the new exception now wraps the old. |
| (francisoud via gates) |
| |
| PIG-84: Converted printStackTrace calls to calls to the logger. |
| (francisoud via gates) |
| |
| PIG-88: Remove unused HadoopExe import from Main. (pi_song via gates) |
| |
| PIG-99: Fix to make unit tests not run out of memory. (francisoud via |
| gates) |
| |
| PIG-107: enabled several tests. (francisoud via olgan) |
| |
| PIG-46: abort processing on error for non-interactive mode (olston via |
| olgan) |
| |
| PIG-109: improved exception handling (oae via olgan) |
| |
| PIG-72: Move unit tests to use MiniDFS and MiniMR so that unit tests can |
| be run w/o access to a hadoop cluster. (xuzh via gates) |
| |
| PIG-68: improvements to build.xml (joa23 via olgan) |
| |
| PIG-110: Replaced code accidentally merged out in PIG-32 fix that handled |
| flattening the combiner case. (gates and oae) |
| |
| PIG-213: Remove non-static references to logger from data bags and tuples, |
| as it causes significant overhead (vgeschel via gates) |
| |
| PIG-284: target for building source jar (oae via olgan) |
| |