Apache Tez

Apache Tez is a generic data-processing pipeline engine, envisioned as a low-level engine for higher-level abstractions such as Apache Hadoop MapReduce, Apache Pig, and Apache Hive.

At its heart, Tez is very simple and has just two components:

  • The data-processing pipeline engine, into which one can plug input, processing, and output implementations to perform arbitrary data processing. Every ‘task’ in Tez has the following:
      • An Input to consume key/value pairs from.
      • A Processor to process them.
      • An Output to collect the processed key/value pairs.
  • A master for the data-processing application, with which one can assemble the arbitrary data-processing ‘tasks’ described above into a task DAG (directed acyclic graph) to process data as desired. The generic master is implemented as an Apache Hadoop YARN ApplicationMaster.
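
To make the two components more concrete, here is a minimal plain-Java sketch of the idea: tasks built from an input, a processor, and an output, wired into a two-step DAG by a small driver that stands in for the master. Every name in it (TaskDagSketch, KV, Input, Output, Processor, Buffer, runWordCount) is hypothetical and chosen only for illustration. It does not use the Tez API, and it runs in a single JVM rather than under YARN.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical types, NOT the Tez API.
final class TaskDagSketch {

    /** A key/value pair flowing between tasks. */
    static final class KV {
        final String key;
        final int value;
        KV(String key, int value) { this.key = key; this.value = value; }
    }

    /** Input: where a task consumes key/value pairs from. */
    interface Input { List<KV> read(); }

    /** Output: where a task collects the pairs it produces. */
    interface Output { void write(KV kv); }

    /** Processor: reads from the input and writes to the output. */
    interface Processor { void run(Input in, Output out); }

    /** In-memory buffer acting as one task's output and the next task's input. */
    static final class Buffer implements Input, Output {
        private final List<KV> pairs = new ArrayList<>();
        @Override public void write(KV kv) { pairs.add(kv); }
        @Override public List<KV> read() { return pairs; }
    }

    /** First task: treats each key as a line of text and emits (word, 1). */
    static final Processor TOKENIZE = (in, out) -> {
        for (KV line : in.read()) {
            for (String word : line.key.split("\\s+")) {
                if (!word.isEmpty()) {
                    out.write(new KV(word, 1));
                }
            }
        }
    };

    /** Second task: sums the values for each key, keeping first-seen order. */
    static final Processor SUM = (in, out) -> {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (KV kv : in.read()) {
            totals.merge(kv.key, kv.value, Integer::sum);
        }
        totals.forEach((k, v) -> out.write(new KV(k, v)));
    };

    /** Stand-in for the master: runs the DAG source -> TOKENIZE -> SUM -> sink. */
    static List<KV> runWordCount(List<String> lines) {
        Buffer source = new Buffer();
        for (String line : lines) {
            source.write(new KV(line, 0));
        }
        Buffer edge = new Buffer();   // connects the two tasks
        Buffer sink = new Buffer();   // final result
        TOKENIZE.run(source, edge);
        SUM.run(edge, sink);
        return sink.read();
    }

    public static void main(String[] args) {
        for (KV kv : runWordCount(List.of("a b a", "b c"))) {
            System.out.println(kv.key + "=" + kv.value);
        }
    }
}
```

The sketch runs the tasks one after another in dependency order. It is only meant to show how the Input, Processor, and Output roles compose into a graph, not how Tez schedules or executes them.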