---
title: "General Architecture and Process Model"
---
## The Processes
When the Flink system is started, it brings up the *JobManager* and one or more *TaskManagers*. The JobManager
is the coordinator of the Flink system, while the TaskManagers are the workers that execute parts of the
parallel programs. When starting the system in *local* mode, a single JobManager and TaskManager are brought
up within the same JVM.
When a program is submitted, a client is created that performs the pre-processing and turns the program
into the parallel data flow form that is executed by the JobManager and TaskManagers. The figure below
gives a coarse overview of the different actors in the system.
<div style="text-align: center;">
<img src="img/ClientJmTm.svg" alt="The Interactions between Client, JobManager and TaskManager" height="400px" style="text-align: center;"/>
</div>
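
To make the roles of client, JobManager, and TaskManager more concrete, the sketch below shows how a program chooses between local and remote execution using the Java DataSet API. It is a minimal, hedged example: the host name, port, jar path, and output path are placeholders, and the exact environment factory methods may differ between Flink versions.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ExecutionModes {
    public static void main(String[] args) throws Exception {
        // Local mode: a JobManager and a TaskManager are brought up inside this JVM.
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        // Remote mode (hypothetical host, port, and jar): the client translates the
        // program and submits it to a running JobManager, which coordinates the
        // TaskManagers that execute it.
        // ExecutionEnvironment remote = ExecutionEnvironment.createRemoteEnvironment(
        //         "jobmanager-host", 6123, "/path/to/program.jar");

        DataSet<String> words = env.fromElements("to", "be", "or", "not", "to", "be");
        words.writeAsText("/tmp/words-out");

        // Triggering execution hands the translated parallel data flow to the JobManager.
        env.execute("Execution mode sketch");
    }
}
```

In local mode the entire pipeline runs inside the submitting JVM, while in remote mode the client only translates and ships the program.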
## Component Stack
An alternative view of the system is given by the stack below. The different layers of the stack build on
top of each other and raise the abstraction level of the program representations they accept:
- The **runtime** layer receives a program in the form of a *JobGraph*. A JobGraph is a generic parallel
data flow with arbitrary tasks that consume and produce data streams.
- The **optimizer** and **common API** layer takes programs in the form of operator DAGs. The operators are
specific (e.g., Map, Join, Filter, Reduce, ...), but are data type agnostic. The concrete types and their
interaction with the runtime are specified by the higher layers.
- The **API layer** implements multiple APIs that create operator DAGs for their programs. Each API needs
to provide utilities (serializers, comparators) that describe the interaction between its data types and
the runtime.
<div style="text-align: center;">
<img src="img/stack.svg" alt="The Flink component stack" width="800px" />
</div>
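
As an illustration of how the layers interact, the following hedged sketch shows a word count written against the Java DataSet API. The class names, `Tokenizer` helper, and output path are made up for this example. The idea it illustrates: the API layer builds an operator DAG (FlatMap, GroupBy, Sum) and derives serializers and comparators for `Tuple2<String, Integer>` from the declared types, the optimizer and common API layer plan that DAG, and the runtime ultimately executes the resulting JobGraph.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<String> text = env.fromElements("the quick brown fox", "the lazy dog");

        // The API layer turns these calls into an operator DAG (FlatMap -> GroupBy/Sum);
        // the type information for Tuple2<String, Integer> supplies the serializers and
        // comparators that the lower layers need.
        DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new Tokenizer())
            .groupBy(0)
            .sum(1);

        counts.writeAsText("/tmp/wordcount-out");
        env.execute("Word count sketch");
    }

    // Splits each line into (word, 1) pairs.
    public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (word.length() > 0) {
                    out.collect(new Tuple2<>(word, 1));
                }
            }
        }
    }
}
```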
## Projects and Dependencies
The Flink system code is divided into multiple sub-projects. The goal is to reduce the number of
dependencies that a project implementing a Flink program needs, as well as to facilitate easier testing
of smaller sub-modules.
The individual projects and their dependencies are shown in the figure below.
<div style="text-align: center;">
<img src="img/projects_dependencies.svg" alt="The Flink sub-projects and their dependencies" height="600px" style="text-align: center;"/>
</div>
In addition to the projects listed in the figure above, Flink currently contains the following sub-projects:
- `flink-dist`: The *distribution* project. It defines how to assemble the compiled code, scripts, and other resources
into the final folder structure that is ready to use.
- `flink-addons`: A series of projects that are at an early stage of development. Currently contains,
among other things, projects for YARN support, JDBC data sources and sinks, Hadoop compatibility,
graph-specific operators, and HBase connectors.
- `flink-quickstart`: Scripts, Maven archetypes, and example programs for the quickstarts and tutorials.