docs/src/site/twiki/Architecture.twiki - incubator-atlas - Git at Google

 ---+ Architecture

 ---++ Introduction

 ---++ Atlas High Level Architecture - Overview
 <img src="images/twiki/architecture.png" height="400" width="600" />

 The components of Atlas can be grouped under the following major categories:

 ---+++ Core

 This category contains the components that implement the core of Atlas functionality, including:

 *Type System*: Atlas allows users to define a model for the metadata objects they want to manage. The model is composed
 of definitions called ‘types’. Instances of ‘types’ called ‘entities’ represent the actual metadata objects that are
 managed. The Type System is a component that allows users to define and manage the types and entities. All metadata
 objects managed by Atlas out of the box (like Hive tables, for e.g.) are modelled using types and represented as
 entities. To store new types of metadata in Atlas, one needs to understand the concepts of the type system component.

 One key point to note is that the generic nature of the modelling in Atlas allows data stewards and integrators to
 define both technical metadata and business metadata. It is also possible to define rich relationships between the
 two using features of Atlas.

 *Ingest / Export*: The Ingest component allows metadata to be added to Atlas. Similarly, the Export component exposes
 metadata changes detected by Atlas to be raised as events. Consumers can consume these change events to react to
 metadata changes in real time.

 *Graph Engine*: Internally, Atlas represents metadata objects it manages using a Graph model. It does this to
 achieve great flexibility and rich relations between the metadata objects. The Graph Engine is a component that is
 responsible for translating between types and entities of the Type System, and the underlying Graph model.
 In addition to managing the Graph objects, The Graph Engine also creates the appropriate indices for the metadata
 objects so that they can be searched for efficiently.

 *Titan*: Currently, Atlas uses the Titan Graph Database to store the metadata objects. Titan is used as a library
 within Atlas. Titan uses two stores: The Metadata store is configured to !HBase by default and the Index store
 is configured to Solr. It is also possible to use the Metadata store as BerkeleyDB and Index store as !ElasticSearch
 by building with corresponding profiles. The Metadata store is used for storing the metadata objects proper, and the
 Index store is used for storing indices of the Metadata properties, that allows efficient search.


 ---+++ Integration

 Users can manage metadata in Atlas using two methods:

 *API*: All functionality of Atlas is exposed to end users via a REST API that allows types and entities to be created,
 updated and deleted. It is also the primary mechanism to query and discover the types and entities managed by Atlas.

 *Messaging*: In addition to the API, users can choose to integrate with Atlas using a messaging interface that is
 based on Kafka. This is useful both for communicating metadata objects to Atlas, and also to consume metadata change
 events from Atlas using which applications can be built. The messaging interface is particularly useful if one wishes
 to use a more loosely coupled integration with Atlas that could allow for better scalability, reliability etc. Atlas
 uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata
 notification events. Events are written by the hooks and Atlas to different Kafka topics.

 ---+++ Metadata sources

 Atlas supports integration with many sources of metadata out of the box. More integrations will be added in future
 as well. Currently, Atlas supports ingesting and managing metadata from the following sources:

    * [[Bridge-Hive][Hive]]
    * [[Bridge-Sqoop][Sqoop]]
    * [[Bridge-Falcon][Falcon]]
    * [[StormAtlasHook][Storm]]

 The integration implies two things:
 There are metadata models that Atlas defines natively to represent objects of these components.
 There are components Atlas provides to ingest metadata objects from these components
 (in real time or in batch mode in some cases).

 ---+++ Applications
 The metadata managed by Atlas is consumed by a variety of applications for satisfying many governance use cases.

 *Atlas Admin UI*: This component is a web based application that allows data stewards and scientists to discover
 and annotate metadata. Of primary importance here is a search interface and SQL like query language that can be
 used to query the metadata types and objects managed by Atlas. The Admin UI uses the REST API of Atlas for
 building its functionality.

 *Tag Based Policies*: [[http://ranger.apache.org/][Apache Ranger]] is an advanced security management solution
 for the Hadoop ecosystem having wide integration with a variety of Hadoop components. By integrating with Atlas,
 Ranger allows security administrators to define metadata driven security policies for effective governance.
 Ranger is a consumer to the metadata change events notified by Atlas.

 *Business Taxonomy*: The metadata objects ingested into Atlas from the Metadata sources are primarily a form
 of technical metadata. To enhance the discoverability and governance capabilities, Atlas comes with a Business
 Taxonomy interface that allows users to first, define a hierarchical set of business terms that represent their
 business domain and associate them to the metadata entities Atlas manages. Business Taxonomy is a web application that
 is part of the Atlas Admin UI currently and integrates with Atlas using the REST API.
	---+ Architecture

	---++ Introduction

	---++ Atlas High Level Architecture - Overview
	<img src="images/twiki/architecture.png" height="400" width="600" />

	The components of Atlas can be grouped under the following major categories:

	---+++ Core

	This category contains the components that implement the core of Atlas functionality, including:

	Type System: Atlas allows users to define a model for the metadata objects they want to manage. The model is composed
	of definitions called ‘types’. Instances of ‘types’ called ‘entities’ represent the actual metadata objects that are
	managed. The Type System is a component that allows users to define and manage the types and entities. All metadata
	objects managed by Atlas out of the box (like Hive tables, for e.g.) are modelled using types and represented as
	entities. To store new types of metadata in Atlas, one needs to understand the concepts of the type system component.

	One key point to note is that the generic nature of the modelling in Atlas allows data stewards and integrators to
	define both technical metadata and business metadata. It is also possible to define rich relationships between the
	two using features of Atlas.

	Ingest / Export: The Ingest component allows metadata to be added to Atlas. Similarly, the Export component exposes
	metadata changes detected by Atlas to be raised as events. Consumers can consume these change events to react to
	metadata changes in real time.

	Graph Engine: Internally, Atlas represents metadata objects it manages using a Graph model. It does this to
	achieve great flexibility and rich relations between the metadata objects. The Graph Engine is a component that is
	responsible for translating between types and entities of the Type System, and the underlying Graph model.
	In addition to managing the Graph objects, The Graph Engine also creates the appropriate indices for the metadata
	objects so that they can be searched for efficiently.

	Titan: Currently, Atlas uses the Titan Graph Database to store the metadata objects. Titan is used as a library
	within Atlas. Titan uses two stores: The Metadata store is configured to !HBase by default and the Index store
	is configured to Solr. It is also possible to use the Metadata store as BerkeleyDB and Index store as !ElasticSearch
	by building with corresponding profiles. The Metadata store is used for storing the metadata objects proper, and the
	Index store is used for storing indices of the Metadata properties, that allows efficient search.


	---+++ Integration

	Users can manage metadata in Atlas using two methods:

	API: All functionality of Atlas is exposed to end users via a REST API that allows types and entities to be created,
	updated and deleted. It is also the primary mechanism to query and discover the types and entities managed by Atlas.

	Messaging: In addition to the API, users can choose to integrate with Atlas using a messaging interface that is
	based on Kafka. This is useful both for communicating metadata objects to Atlas, and also to consume metadata change
	events from Atlas using which applications can be built. The messaging interface is particularly useful if one wishes
	to use a more loosely coupled integration with Atlas that could allow for better scalability, reliability etc. Atlas
	uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata
	notification events. Events are written by the hooks and Atlas to different Kafka topics.

	---+++ Metadata sources

	Atlas supports integration with many sources of metadata out of the box. More integrations will be added in future
	as well. Currently, Atlas supports ingesting and managing metadata from the following sources:

	* [[Bridge-Hive][Hive]]
	* [[Bridge-Sqoop][Sqoop]]
	* [[Bridge-Falcon][Falcon]]
	* [[StormAtlasHook][Storm]]

	The integration implies two things:
	There are metadata models that Atlas defines natively to represent objects of these components.
	There are components Atlas provides to ingest metadata objects from these components
	(in real time or in batch mode in some cases).

	---+++ Applications
	The metadata managed by Atlas is consumed by a variety of applications for satisfying many governance use cases.

	Atlas Admin UI: This component is a web based application that allows data stewards and scientists to discover
	and annotate metadata. Of primary importance here is a search interface and SQL like query language that can be
	used to query the metadata types and objects managed by Atlas. The Admin UI uses the REST API of Atlas for
	building its functionality.

	Tag Based Policies: [[http://ranger.apache.org/][Apache Ranger]] is an advanced security management solution
	for the Hadoop ecosystem having wide integration with a variety of Hadoop components. By integrating with Atlas,
	Ranger allows security administrators to define metadata driven security policies for effective governance.
	Ranger is a consumer to the metadata change events notified by Atlas.

	Business Taxonomy: The metadata objects ingested into Atlas from the Metadata sources are primarily a form
	of technical metadata. To enhance the discoverability and governance capabilities, Atlas comes with a Business
	Taxonomy interface that allows users to first, define a hierarchical set of business terms that represent their
	business domain and associate them to the metadata entities Atlas manages. Business Taxonomy is a web application that
	is part of the Atlas Admin UI currently and integrates with Atlas using the REST API.