Zebra Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
PIG-1455 Addition of test-unit as an ant target (yanz)
PIG-1451 Change the build.test property in build to test.build.dir to be consistent with PIG (yanz)
PIG-1444 Addition of test-smoke ant target (gauravj via yanz)
PIG-1357 Addition of Test cases of map-side GROUP-BY (yanz)
PIG-1282 make Zebra's pig test cases run on real cluster (chaow via yanz)
PIG-1164 Addition of smoke tests (gauravj via yanz)
PIG-1122 Changed version number of pig dev core jar used in Zebra build from 0.6.0 to 0.7.0
to match Pig version number (yanz)
PIG-1099 Changed version number to be 0.7.0 to match Pig version number
change (yanz via gates)
IMPROVEMENTS
PIG-1425 support of source table index on unsorted table in the mapred APIs (yanz)
PIG-1375 Support of multiple Zebra table writing through Pig (chaow via yanz)
PIG-1351 Addition of type check when writing to basic table (chaow via yanz)
PIG-1361 Zebra TableLoader.getSchema() should return the projectionSchema specified in the constructor of TableLoader instead of the projection pruned by Pig (gauravj via daijy)
PIG-1291 Support of virtual column "source_table" on unsorted table (yanz)
PIG-1315 Implementing OrderedLoadFunc interface for Zebra TableLoader (xuefux via yanz)
PIG-1306 Support of locally sorted input splits (yanz)
PIG-1268 Need an ant target that runs all pig-related tests in Zebra (xuefuz via yanz)
PIG-1207 Data sanity check should be performed at the end of writing instead of later at query time (yanz)
PIG-1206 Storing descendingly sorted PIG table as unsorted table (yanz)
PIG-1240 zebra manifest file enhancement (gauravj via yanz)
PIG-1140 Support of Hadoop 2.0 API (xuefuz via yanz)
PIG-1170 new end-to-end and stress test cases (jing1234 via yanz)
PIG-1136 Support of map split on hash keys with leading underscore (xuefuz via yanz)
PIG-1125 Map/Reduce API Changes (Chao Wang via yanz)
PIG-1104 Streaming Support (Chao Wang via yanz)
PIG-1119 Support of "group" as a column name (Gaurav Jain via yanz)
PIG-653 Pig Projection Push Down (Gaurav Jain via yanz)
PIG-1111 Multiple Outputs Support (Gaurav Jain via yanz)
PIG-1098 Zebra Performance Optimizations (yanz via gates)
PIG-1074 Zebra store function should allow '::' in column names in output
schema (yanz via gates)
PIG-1077 Support record(row)-based file split in Zebra's
TableInputFormat (chaow via gates)
PIG-1089 Use Pig's version for Zebra's own version (chaow via olgan)
PIG-1069 Order Preserving Sorted Table Union (yanz via gates)
PIG-997 Sorted Table Support by Zebra (yanz via gates)
PIG-996 Add findbugs, checkstyle, and clover to zebra build file (chaow via
gates)
PIG-993 Ability to drop a column group in a table (yanz and rangadi via gates)
PIG-992 Separate schema related files into a schema package (yanz via
gates)
OPTIMIZATIONS
PIG-1198: performance improvements through use of unsorted input splits that span multiple files (yanz)
BUG FIXES
PIG-1453 Intermittent failure in pigtest (yanz)
PIG-1432 Some debugging info is output to STDOUT in Pig's TableStorer call path (yanz)
PIG-1421 Name node calls made by each mapper (xuefuz via yanz)
PIG-1342 Avoid making unnecessary name node calls for writes in Zebra (chaow via yanz)
PIG-1356 TableLoader makes unnecessary calls to build a Job instance that creates a new JobClient in hadoop 0.20.9 (yanz)
PIG-1349 Hudson test failure in test case TestBasicUnion (xuefuz via yanz)
PIG-1340 The zebra version number should be changed from 0.7 to 0.8 (yanz)
PIG-1318 Invalid type for source_table field when using order-preserving Sorted Table Union (gauravj via yanz)
PIG-1258 Number of sorted input splits is unusually high (yanz)
PIG-1269 Restrict schema definition for collection (xuefuz via yanz)
PIG-1253: make map/reduce test cases run on real cluster (chaow via yanz)
PIG-1276: Pig resource schema interface changed, so Zebra needs to catch exception thrown from the new interfaces. (xuefuz via yanz)
PIG-1256: Bag field should always contain a tuple type as the field schema in ResourceSchema object converted from Zebra Schema (xuefuz via yanz)
PIG-1227: Throw exception if column group meta file is missing for an unsorted table (yanz)
PIG-1201: unnecessary name node calls by each mapper; too big input split serialization size by Pig's Slice implementation (yanz)
PIG-1115: cleanup of temp files left by failed tasks (gauravj via yanz)
PIG-1167: Hadoop file glob support (yanz)
PIG-1153: Record split exception fix (yanz)
PIG-1145: Merge Join on Large Table throws an EOF exception (yanz)
PIG-1095: Schema support of anonymous fields in COLLECTION fails (yanz via
gates)
PIG-1078: merge join with empty table failed (yanz via gates)
PIG-1091: Exception when load with projection of map keys on a map column
that is not map split (yanz via gates).
PIG-1026: [zebra] map split returns null (yanz via pradeepkth)
PIG-1057 Zebra does not support concurrent deletions of column groups now
(chaow via gates).
PIG-944 Change schema to be taken from StoreConfig instead of
TableStorer's constructor (yanz via gates).
PIG-918. Fix infinite loop when only columns in the first column group are
specified. (Yan Zhou via rangadi)
PIG-949. If an entire map is placed in a non-default column group,
and a specific key is placed in another CG, the second CG did not
work as expected. (Yan Zhou, Jing Huang (tests) via rangadi)
PIG-987. Access control for column groups. Users can specify the
desired group, owner, and permissions for a column group.
(Yan Zhou via rangadi)
PIG-991. Various minor bugs. (Yan Zhou via rangadi)
PIG-986. Column groups can have explicit names specified in
storage hint. (Yan Zhou via rangadi)