 ~~ Licensed to the Apache Software Foundation (ASF) under one or more
 ~~ contributor license agreements.  See the NOTICE file distributed with
 ~~ this work for additional information regarding copyright ownership.
 ~~ The ASF licenses this file to You under the Apache License, Version 2.0
 ~~ (the "License"); you may not use this file except in compliance with
 ~~ the License.  You may obtain a copy of the License at
 ~~
 ~~     http://www.apache.org/licenses/LICENSE-2.0
 ~~
 ~~ Unless required by applicable law or agreed to in writing, software
 ~~ distributed under the License is distributed on an "AS IS" BASIS,
 ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
 ~~
 Introduction

   The Apache Tez project aims to build an application framework that lets data be processed as a complex directed acyclic graph (DAG) of tasks. It is currently built atop {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html}Apache Hadoop YARN}}.

   The two main design themes for Tez are:

   * <<Empowering end users with:>>

     * Expressive dataflow definition APIs

     * Flexible Input-Processor-Output runtime model

     * Data type agnosticism

     * Simplified deployment


   * <<Execution Performance>>

     * Performance gains over MapReduce

     * Optimal resource management

     * Plan reconfiguration at runtime

     * Dynamic physical data flow decisions

   []

   By allowing projects like Apache Hive and Apache Pig to run a complex DAG of tasks, Tez can process in a single Tez job data that previously required multiple MapReduce (MR) jobs, as shown below.

 [./images/PigHiveQueryOnMR.png] Flow for a Hive or Pig Query on MapReduce

 [./images/PigHiveQueryOnTez.png] Flow for a Hive or Pig Query on Tez

 Disclaimer

   Apache Tez is an effort currently undergoing incubation at The Apache Software Foundation (ASF) sponsored by the Apache Incubator PMC.