Apache Tez

Apache Tez is a generic data-processing pipeline engine, envisioned as a low-level engine for higher-level abstractions such as Apache Hadoop MapReduce, Apache Pig, Apache Hive, etc.

At its heart, Tez is very simple and has just two components:

  • The data-processing pipeline engine, into which one can plug input, processing and output implementations to perform arbitrary data processing. Every ‘task’ in Tez has the following:
      • Input to consume key/value pairs from.
      • Processor to process them.
      • Output to collect the processed key/value pairs.
  • A master for the data-processing application, with which one can assemble the arbitrary data-processing ‘tasks’ described above into a task-DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
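
The input/processor/output shape of a task, and the way tasks chain into a DAG, can be illustrated with a small toy model. This is only a sketch of the idea and does not use the actual Tez API: the `Input`, `Output` and `Processor` interfaces and the `TaskSketch` class are hypothetical names invented here, and the "edge" between the two tasks is just an in-memory list rather than anything a real master would schedule.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these interfaces are hypothetical, NOT the Tez API.
class TaskSketch {
    /** Hypothetical input: a source of key/value pairs. */
    interface Input<K, V> { Iterable<Map.Entry<K, V>> read(); }

    /** Hypothetical output: a sink that collects key/value pairs. */
    interface Output<K, V> { void write(K key, V value); }

    /** Hypothetical processor: consumes an input and writes to an output. */
    interface Processor<KI, VI, KO, VO> {
        void run(Input<KI, VI> in, Output<KO, VO> out);
    }

    /** Runs a two-task chain (tokenize, then sum) over the given lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        // Task 1 input: (line number, line text) pairs.
        List<Map.Entry<Integer, String>> source = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            source.add(new SimpleEntry<>(i, lines.get(i)));
        }
        Input<Integer, String> lineInput = () -> source;

        // The "edge" between the tasks: task 1 writes here, task 2 reads here.
        List<Map.Entry<String, Integer>> edge = new ArrayList<>();
        Output<String, Integer> edgeOutput = (k, v) -> edge.add(new SimpleEntry<>(k, v));

        // Task 1 processor: emit (word, 1) for every word in every line.
        Processor<Integer, String, String, Integer> tokenizer = (in, out) -> {
            for (Map.Entry<Integer, String> e : in.read()) {
                for (String word : e.getValue().trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.write(word, 1);
                    }
                }
            }
        };
        tokenizer.run(lineInput, edgeOutput);

        // Task 2: read the edge, sum the counts per word, write the totals.
        Input<String, Integer> edgeInput = () -> edge;
        Map<String, Integer> counts = new TreeMap<>();
        Output<String, Integer> sink = (k, v) -> counts.put(k, v);
        Processor<String, Integer, String, Integer> summer = (in, out) -> {
            Map<String, Integer> totals = new TreeMap<>();
            for (Map.Entry<String, Integer> e : in.read()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
            totals.forEach(out::write);
        };
        summer.run(edgeInput, sink);
        return counts;
    }
}
```

Calling `TaskSketch.wordCount(Arrays.asList("a b a", "b c"))` returns a map with `a=2`, `b=2`, `c=1`. In this sketch the calling method plays the role of the master by wiring the two tasks together by hand; in Tez that wiring is described as a task-DAG and carried out by the ApplicationMaster.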