Apache Tez

Apache Tez is a generic data-processing pipeline engine, envisioned as a low-level engine for higher-level abstractions such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive.

At its heart, Tez is very simple and has just two components:

  • The data-processing pipeline engine, wherein one can plug in input, processing, and output implementations to perform arbitrary data processing. Every ‘task’ in Tez has the following:
      • An Input to consume key/value pairs from.
      • A Processor to process them.
      • An Output to collect the processed key/value pairs.
  • A master for the data-processing application, whereby one can put together the arbitrary data-processing ‘tasks’ described above into a task DAG to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
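The two components above can be illustrated with a toy, single-JVM sketch. Everything in it is hypothetical: `KV`, `Task`, and `runDag` are illustrative names, not Tez's real API (Tez's own DAG-building classes, such as `DAG`, `Vertex`, and `Edge` in `org.apache.tez.dag.api`, look different). A `Task` wires an Input, a Processor, and an Output together. The "master" here is swapped for a simple Kahn's-algorithm topological run, whereas the real ApplicationMaster schedules tasks across a YARN cluster.

```java
import java.util.*;
import java.util.function.*;

// Toy illustration only; these types are hypothetical and are NOT the Tez API.
public class TezStyleSketch {

    // A key/value pair, as consumed and produced by every task.
    record KV(String key, long value) {}

    // One 'task': read from the Input, transform with the Processor, hand to the Output.
    record Task(String name,
                Supplier<List<KV>> input,
                UnaryOperator<List<KV>> processor,
                Consumer<List<KV>> output) {
        void run() { output.accept(processor.apply(input.get())); }
    }

    // Hypothetical stand-in for the master: runs tasks in an order consistent
    // with the DAG edges (Kahn's algorithm) and fails if the graph has a cycle.
    static List<String> runDag(Map<String, Task> tasks, Map<String, List<String>> edges) {
        Map<String, Integer> inDegree = new TreeMap<>();   // sorted for a deterministic order
        for (String name : tasks.keySet()) inDegree.put(name, 0);
        edges.values().forEach(targets -> targets.forEach(t -> inDegree.merge(t, 1, Integer::sum)));

        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((name, deg) -> { if (deg == 0) ready.add(name); });

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String name = ready.poll();
            tasks.get(name).run();
            order.add(name);
            // A downstream task becomes runnable once all of its upstream tasks have run.
            for (String next : edges.getOrDefault(name, List.of()))
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
        }
        if (order.size() != tasks.size())
            throw new IllegalStateException("task graph contains a cycle");
        return order;
    }

    public static void main(String[] args) {
        // In-memory lists stand in for the data moved along a DAG edge (assumption).
        List<KV> tokenized = new ArrayList<>();
        List<KV> counts = new ArrayList<>();
        String text = "to be or not to be";

        Task tokenize = new Task("tokenize",
            () -> Arrays.stream(text.split(" ")).map(w -> new KV(w, 1)).toList(), // Input
            in -> in,                                                              // Processor: pass-through
            tokenized::addAll);                                                    // Output: feeds "sum"

        Task sum = new Task("sum",
            () -> tokenized,                                                       // Input: upstream output
            in -> {                                                                // Processor: sum per key
                Map<String, Long> m = new TreeMap<>();
                for (KV kv : in) m.merge(kv.key(), kv.value(), Long::sum);
                return m.entrySet().stream().map(e -> new KV(e.getKey(), e.getValue())).toList();
            },
            counts::addAll);                                                       // Output: final result

        Map<String, Task> tasks = new LinkedHashMap<>();
        tasks.put("sum", sum);
        tasks.put("tokenize", tokenize);

        System.out.println(runDag(tasks, Map.of("tokenize", List.of("sum")))); // prints [tokenize, sum]
        System.out.println(counts); // prints [KV[key=be, value=2], KV[key=not, value=1], KV[key=or, value=1], KV[key=to, value=2]]
    }
}
```

Even though "sum" is registered first, the edge `tokenize -> sum` forces it to run second; that ordering guarantee is the essence of what a DAG master provides, independent of how it is distributed in practice.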