<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/twiki/TypeSystem.twiki at 2018-09-06
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20180906" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache Atlas &#x2013; Type System</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.7.min.js"></script>
</head>
<body class="topBarEnabled">
<div id="topbar" class="navbar navbar-fixed-top ">
<div class="navbar-inner">
<div class="container" style="width: 68%;"><div class="nav-collapse">
<ul class="nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Apache Atlas <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="index.html" title="Overview">Overview</a></li>
<li><a href="license.html" title="License">License</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="Downloads">Downloads</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS" title="Wiki">Wiki</a></li>
<li><a href="https://git-wip-us.apache.org/repos/asf/atlas.git" title="Git">Git</a></li>
<li><a href="https://issues.apache.org/jira/browse/ATLAS" title="Jira">Jira</a></li>
<li><a href="https://reviews.apache.org/groups/atlas/?sort=-time_added" title="Review Board">Review Board</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Project Information <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="project-info.html" title="Summary">Summary</a></li>
<li><a href="mail-lists.html" title="Mailing Lists">Mailing Lists</a></li>
<li><a href="team-list.html" title="Team">Team</a></li>
<li><a href="issue-tracking.html" title="Issue Tracking">Issue Tracking</a></li>
<li><a href="source-repository.html" title="Source Repository">Source Repository</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Downloads <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="1.1.0">1.1.0</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="1.0.0">1.0.0</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.8.2">0.8.2</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.8.1">0.8.1</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="http://atlas.apache.org/#/Downloads" target="_blank" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="../index.html" title="latest">latest</a></li>
<li><a href="../1.1.0/index.html" title="1.1.0">1.1.0</a></li>
<li><a href="../1.0.0/index.html" title="1.0.0">1.0.0</a></li>
<li><a href="../0.8.2/index.html" title="0.8.2">0.8.2</a></li>
<li><a href="../0.8.1/index.html" title="0.8.1">0.8.1</a></li>
<li><a href="../0.8.0-incubating/index.html" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="../0.7.1-incubating/index.html" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="../0.7.0-incubating/index.html" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="../0.6.0-incubating/index.html" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="../0.5.0-incubating/index.html" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/foundation/how-it-works.html" title="How Apache Works">How Apache Works</a></li>
<li><a href="https://www.apache.org/events/current-event" title="Events">Events</a></li>
<li><a href="https://www.apache.org/licenses/" title="License">License</a></li>
<li><a href="http://www.apache.org/foundation/" title="Foundation">Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsoring Apache">Sponsoring Apache</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
</ul>
</li>
</ul>
<form id="search-form" action="https://www.google.com/search" method="get" class="navbar-search pull-right" >
<input value="http://atlas.apache.org" name="sitesearch" type="hidden"/>
<input class="search-query" name="q" id="query" type="text" />
</form>
<script type="text/javascript">asyncJs( 'https://cse.google.com/brand?form=search-form' )</script>
</div>
</div>
</div>
</div>
<div class="container">
<div id="banner">
<div class="pull-left"><a href=".." id="bannerLeft"><img src="images/atlas-logo.png" alt="Apache Atlas" width="200px" height="45px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2018-09-18</li>
<li id="projectVersion" class="pull-right">Version: 1.1.0</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2><a name="Type_System"></a>Type System</h2></div>
<div class="section">
<h3><a name="Overview"></a>Overview</h3>
<p>Atlas allows users to define a model for the metadata objects they want to manage. The model is composed of definitions called &#x2018;types&#x2019;. Instances of types, called &#x2018;entities&#x2019;, represent the actual metadata objects that are managed. The Type System is the component that allows users to define and manage types and entities. All metadata objects managed by Atlas out of the box (Hive tables, for example) are modelled using types and represented as entities. To store new types of metadata in Atlas, one needs to understand the concepts of the Type System component.</p></div>
<div class="section">
<h3><a name="Types"></a>Types</h3>
<p>A &#x2018;type&#x2019; in Atlas defines how a particular kind of metadata object is stored and accessed. A type consists of one or more attributes that define the properties of the metadata object. Users with a development background will recognize the similarity of a type to a &#x2018;class&#x2019; definition in object-oriented programming languages, or to a &#x2018;table schema&#x2019; in relational databases.</p>
<p>An example of a type that comes natively defined with Atlas is a Hive table. A Hive table is defined with these attributes:</p>
<div class="source"><pre class="prettyprint">
Name:         hive_table
TypeCategory: Entity
SuperTypes:   DataSet
Attributes:
    name:             string
    db:               hive_db
    owner:            string
    createTime:       date
    lastAccessTime:   date
    comment:          string
    retention:        int
    sd:               hive_storagedesc
    partitionKeys:    array&lt;hive_column&gt;
    aliases:          array&lt;string&gt;
    columns:          array&lt;hive_column&gt;
    parameters:       map&lt;string,string&gt;
    viewOriginalText: string
    viewExpandedText: string
    tableType:        string
    temporary:        boolean
</pre></div>
<p>The following points can be noted from the above example:</p>
<p></p>
<ul>
<li>A type in Atlas is identified uniquely by a &#x2018;name&#x2019;</li>
<li>A type has a metatype. Atlas has the following metatypes:
<ul>
<li>Primitive metatypes: boolean, byte, short, int, long, float, double, biginteger, bigdecimal, string, date</li>
<li>Enum metatypes</li>
<li>Collection metatypes: array, map</li>
<li>Composite metatypes: Entity, Struct, Classification, Relationship</li></ul></li>
<li>Entity and Classification types can &#x2018;extend&#x2019; other types, called &#x2018;supertypes&#x2019;. A type that extends a supertype also includes the attributes defined in that supertype. This allows modellers to define common attributes once for a set of related types, much as object-oriented languages define superclasses for a class. A type in Atlas can also extend multiple supertypes.
<ul>
<li>In this example, every hive table extends a pre-defined supertype called &#x2018;DataSet&#x2019;. More details about these pre-defined types are provided later.</li></ul></li>
<li>Types which have a metatype of &#x2018;Entity&#x2019;, &#x2018;Struct&#x2019;, &#x2018;Classification&#x2019; or &#x2018;Relationship&#x2019; can have a collection of attributes. Each attribute has a name (e.g. &#x2018;name&#x2019;) and some other associated properties. An attribute can be referred to using an expression of the form type_name.attribute_name. Note that attributes themselves are defined using Atlas metatypes.
<ul>
<li>In this example, hive_table.name is a String, hive_table.aliases is an array of Strings, hive_table.db refers to an instance of a type called hive_db and so on.</li></ul></li>
<li>Type references in attributes (like hive_table.db) are particularly interesting. Using such an attribute, we can define arbitrary relationships between two types defined in Atlas and thus build rich models. An attribute can also hold a list of references (e.g. hive_table.columns, which represents a list of references from hive_table to the hive_column type).</li></ul></div>
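<p>The inheritance rule described above can be illustrated with a minimal Python sketch. The TypeDef class, the resolve_attributes helper and the in-memory registry are hypothetical names invented for this illustration and are not part of the Atlas API. The sketch assumes only that a type includes every attribute of its supertypes, and it uses the Referenceable and DataSet system types that are described later on this page.</p>
<div class="source"><pre class="prettyprint">
```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """Hypothetical, simplified stand-in for an Atlas type definition."""
    name: str
    super_types: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # attribute name to type name


def resolve_attributes(type_name, registry):
    """Collect the attributes of a type, including inherited ones."""
    type_def = registry[type_name]
    resolved = {}
    # Visit supertypes first; in this sketch a type's own attributes win.
    for parent in type_def.super_types:
        resolved.update(resolve_attributes(parent, registry))
    resolved.update(type_def.attributes)
    return resolved


registry = {
    "Referenceable": TypeDef("Referenceable", attributes={"qualifiedName": "string"}),
    "DataSet": TypeDef("DataSet", super_types=["Referenceable"]),
    "hive_table": TypeDef(
        "hive_table",
        super_types=["DataSet"],
        attributes={"name": "string", "db": "hive_db", "retention": "int"},
    ),
}

print(resolve_attributes("hive_table", registry))
```
</pre></div>
<p>In this sketch, hive_table ends up with qualifiedName in addition to its own attributes, because the lookup walks the chain of supertypes before adding the attributes declared on the type itself.</p>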
<div class="section">
<h3><a name="Entities"></a>Entities</h3>
<p>An &#x2018;entity&#x2019; in Atlas is a specific value or instance of an Entity &#x2018;type&#x2019; and thus represents a specific metadata object in the real world. Referring back to our analogy of object-oriented programming languages, an entity corresponds to an &#x2018;Object&#x2019; of a certain &#x2018;Class&#x2019;.</p>
<p>An example of an entity is a specific Hive table. Say Hive has a table called &#x2018;customers&#x2019; in the &#x2018;default&#x2019; database. This table will be an &#x2018;entity&#x2019; in Atlas of type hive_table. By virtue of being an instance of an entity type, it will have values for every attribute that is part of the hive_table type, such as:</p>
<div class="source"><pre class="prettyprint">
guid:     &quot;9ba387dd-fa76-429c-b791-ffc338d3c91f&quot;
typeName: &quot;hive_table&quot;
status:   &quot;ACTIVE&quot;
values:
    name:             &quot;customers&quot;
    db:               { &quot;guid&quot;: &quot;b42c6cfc-c1e7-42fd-a9e6-890e0adf33bc&quot;, &quot;typeName&quot;: &quot;hive_db&quot; }
    owner:            &quot;admin&quot;
    createTime:       1490761686029
    updateTime:       1516298102877
    comment:          null
    retention:        0
    sd:               { &quot;guid&quot;: &quot;ff58025f-6854-4195-9f75-3a3058dd8dcf&quot;, &quot;typeName&quot;: &quot;hive_storagedesc&quot; }
    partitionKeys:    null
    aliases:          null
    columns:          [ { &quot;guid&quot;: &quot;65e2204f-6a23-4130-934a-9679af6a211f&quot;, &quot;typeName&quot;: &quot;hive_column&quot; }, { &quot;guid&quot;: &quot;d726de70-faca-46fb-9c99-cf04f6b579a6&quot;, &quot;typeName&quot;: &quot;hive_column&quot; }, ...]
    parameters:       { &quot;transient_lastDdlTime&quot;: &quot;1466403208&quot;}
    viewOriginalText: null
    viewExpandedText: null
    tableType:        &quot;MANAGED_TABLE&quot;
    temporary:        false
</pre></div>
<p>The following points can be noted from the example above:</p>
<p></p>
<ul>
<li>Every instance of an entity type is identified by a unique identifier, a GUID. This GUID is generated by the Atlas server when the object is defined, and remains constant for the entire lifetime of the entity. At any point in time, this particular entity can be accessed using its GUID.
<ul>
<li>In this example, the &#x2018;customers&#x2019; table in the default database is uniquely identified by the GUID &quot;9ba387dd-fa76-429c-b791-ffc338d3c91f&quot;</li></ul></li>
<li>An entity is of a given type, and the name of the type is provided with the entity definition.
<ul>
<li>In this example, the &#x2018;customers&#x2019; table is a &#x2018;hive_table&#x2019;.</li></ul></li>
<li>The values of this entity are a map of all the attribute names and their values for attributes that are defined in the hive_table type definition.</li>
<li>Attribute values conform to the datatype of the attribute. Attributes whose type is an entity type hold values of type <a href="./AtlasObjectId.html">AtlasObjectId</a>.</li></ul>
<p>With this understanding of entities, we can now see the difference between the Entity and Struct metatypes. Entities and Structs both compose attributes of other types. However, instances of Entity types have an identity (a GUID value) and can be referenced from other entities (as a hive_db entity is referenced from a hive_table entity). Instances of Struct types do not have an identity of their own. The value of a Struct type is a collection of attributes that are &#x2018;embedded&#x2019; inside the entity itself.</p></div>
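<p>The distinction between Entity and Struct instances can be sketched in a few lines of Python. The Entity and Struct classes below are hypothetical and are not the Atlas client classes. The GUID is generated locally with uuid4 only as a stand-in for the GUID that the Atlas server would assign, and the retention_policy struct type and retentionPolicy attribute are invented for the example.</p>
<div class="source"><pre class="prettyprint">
```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Instance of an Entity type: has an identity of its own (a GUID)."""
    type_name: str
    values: dict
    # Stand-in for the GUID that the Atlas server would generate.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def object_id(self):
        # Other entities refer to this one by GUID and type name only.
        return {"guid": self.guid, "typeName": self.type_name}


@dataclass
class Struct:
    """Instance of a Struct type: no GUID, embedded in its owning entity."""
    type_name: str
    values: dict


db = Entity("hive_db", {"name": "default"})
# 'retention_policy' is a hypothetical Struct type used for illustration.
policy = Struct("retention_policy", {"days": 30})
table = Entity("hive_table", {
    "name": "customers",
    "db": db.object_id(),       # reference to another entity
    "retentionPolicy": policy,  # embedded value, not a reference
})

print(table.values["db"])
```
</pre></div>
<p>The db value stored on the table is only a pointer (GUID plus type name), so the hive_db entity can be shared by many tables, while the struct value has no identifier and cannot be referenced from anywhere else.</p>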
<div class="section">
<h3><a name="Attributes"></a>Attributes</h3>
<p>We already saw that attributes are defined inside metatypes like Entity, Struct, Classification and Relationship. So far, however, we have simplistically described attributes as having just a name and a metatype value. Attributes in Atlas have some more properties that define further concepts related to the type system.</p>
<p>An attribute has the following properties:</p>
<div class="source"><pre class="prettyprint">
name: string,
typeName: string,
isOptional: boolean,
isIndexable: boolean,
isUnique: boolean,
cardinality: enum
</pre></div>
<p>The properties above have the following meanings:</p>
<p></p>
<ul>
<li>name - the name of the attribute</li>
<li>typeName - the metatype name of the attribute (native, collection or composite)</li>
<li>isComposite -
<ul>
<li>This flag indicates an aspect of modelling. If an attribute is defined as composite, it cannot have a lifecycle independent of the entity it is contained in. A good example of this concept is the set of columns that make up a hive table. Since the columns do not have meaning outside of the hive table, they are defined as composite attributes.</li>
<li>A composite attribute must be created in Atlas along with the entity it is contained in; that is, a hive column must be created along with the hive table.</li></ul></li>
<li>isIndexable -
<ul>
<li>This flag indicates whether the attribute should be indexed, so that lookups using the attribute value as a predicate can be performed efficiently.</li></ul></li>
<li>isUnique -
<ul>
<li>This flag is again related to indexing. If specified to be unique, it means that a special index is created for this attribute in <a href="./JanusGraph.html">JanusGraph</a> that allows for equality based look ups.</li>
<li>Any attribute with a true value for this flag is treated like a primary key to distinguish this entity from other entities. Hence care should be taken to ensure that this attribute does model a unique property in the real world.
<ul>
<li>For example, consider the name attribute of a hive_table. In isolation, a name is not a unique attribute for a hive_table, because tables with the same name can exist in multiple databases. Even a pair of (database name, table name) is not unique if Atlas is storing metadata of hive tables from multiple clusters. Only the combination of cluster location, database name and table name can be deemed unique in the physical world.</li></ul></li></ul></li>
<li>isOptional and cardinality - together these indicate whether the attribute is required, optional, or multi-valued (its multiplicity). If an entity&#x2019;s value for the attribute does not match this declaration in the type definition, it is a constraint violation and adding the entity will fail. These fields can therefore be used to define constraints on the metadata information.</li></ul>
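<p>The constraint check described above can be sketched as follows. This is a simplified illustration and not the validation code used by Atlas: check_attribute is a hypothetical helper, the attribute definitions are plain dictionaries carrying only the isOptional and cardinality properties listed earlier, and the LIST cardinality on the aliases attribute is an assumption made for the example.</p>
<div class="source"><pre class="prettyprint">
```python
def check_attribute(attr_def, entity_values):
    """Return a list of violations for one attribute of one entity."""
    name = attr_def["name"]
    value = entity_values.get(name)
    problems = []
    if value is None:
        # A missing value is only a problem for required attributes.
        if not attr_def["isOptional"]:
            problems.append(name + ": required attribute is missing")
        return problems
    multi_valued = isinstance(value, (list, set, tuple))
    if attr_def["cardinality"] == "SINGLE" and multi_valued:
        problems.append(name + ": expected a single value")
    if attr_def["cardinality"] != "SINGLE" and not multi_valued:
        problems.append(name + ": expected a collection of values")
    return problems


db_attr = {"name": "db", "isOptional": False, "cardinality": "SINGLE"}
# The LIST cardinality here is an assumption made for this sketch.
aliases_attr = {"name": "aliases", "isOptional": True, "cardinality": "LIST"}

table_values = {"name": "customers", "aliases": ["clients"]}
for attr in (db_attr, aliases_attr):
    print(check_attribute(attr, table_values))
```
</pre></div>
<p>With these inputs the sketch reports a violation for db, because the entity omits a required attribute, and none for aliases, which is optional and supplied as a list.</p>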
<p>Using the above, let us expand on the definition of one of the attributes of the hive table. Consider the attribute called &#x2018;db&#x2019;, which represents the database to which the hive table belongs:</p>
<div class="source"><pre class="prettyprint">
db:
    &quot;name&quot;: &quot;db&quot;,
    &quot;typeName&quot;: &quot;hive_db&quot;,
    &quot;isOptional&quot;: false,
    &quot;isIndexable&quot;: true,
    &quot;isUnique&quot;: false,
    &quot;cardinality&quot;: &quot;SINGLE&quot;
</pre></div>
<p>Note the &#x201c;isOptional=false&#x201d; constraint - a table entity cannot be created without a db reference.</p>
<div class="source"><pre class="prettyprint">
columns:
    &quot;name&quot;: &quot;columns&quot;,
    &quot;typeName&quot;: &quot;array&lt;hive_column&gt;&quot;,
    &quot;isOptional&quot;: true,
    &quot;isIndexable&quot;: true,
    &quot;isUnique&quot;: false,
    &quot;constraints&quot;: [ { &quot;type&quot;: &quot;ownedRef&quot; } ]
</pre></div>
<p>Note the &#x201c;ownedRef&#x201d; constraint for columns. By doing this, we are indicating that the defined column entities should always be bound to the table entity they are defined with.</p>
<p>From this description and these examples, you can see that attribute definitions can be used to make the Atlas system enforce specific modelling behavior, such as constraints and indexing.</p></div>
<div class="section">
<h3><a name="System_specific_types_and_their_significance"></a>System specific types and their significance</h3>
<p>Atlas comes with a few pre-defined system types. We saw one example (DataSet) in preceding sections. In this section we will see more of these types and understand their significance.</p>
<p><b>Referenceable</b>: This type represents all entities that can be searched for using a unique attribute called qualifiedName.</p>
<p><b>Asset</b>: This type extends Referenceable and adds attributes like name, description and owner. Name is a required attribute (isOptional=false); the others are optional.</p>
<p>The purpose of Referenceable and Asset is to provide modellers with a way to enforce consistency when defining and querying entities of their own types. Having this fixed set of attributes allows applications and user interfaces to make convention-based assumptions about what attributes they can expect of types by default.</p>
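<p>As a rough illustration of why a single unique attribute such as qualifiedName is useful, the sketch below combines the cluster, database and table names into one string and uses it as the key of a toy equality index. The qualified_name helper, the UniqueIndex class and the &#x2018;database.table@cluster&#x2019; layout are assumptions made for this example; they are not taken from this page and do not represent the index that Atlas maintains.</p>
<div class="source"><pre class="prettyprint">
```python
def qualified_name(cluster, database, table):
    """Combine the three parts that together identify one Hive table."""
    # The 'database.table@cluster' layout is an assumption of this sketch.
    return "{}.{}@{}".format(database, table, cluster)


class UniqueIndex:
    """Toy equality index: maps a unique attribute value to an entity GUID."""

    def __init__(self):
        self._guid_by_value = {}

    def add(self, value, guid):
        # A unique attribute may not be shared by two entities.
        if value in self._guid_by_value:
            raise ValueError("duplicate unique value: " + value)
        self._guid_by_value[value] = guid

    def find(self, value):
        return self._guid_by_value.get(value)


index = UniqueIndex()
index.add(qualified_name("prod", "default", "customers"), "guid-1")
index.add(qualified_name("backup", "default", "customers"), "guid-2")
print(index.find("default.customers@prod"))
```
</pre></div>
<p>Two tables with the same name and database on different clusters receive different keys, so both can be registered, whereas registering the same key twice is rejected.</p>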
<p><b>Infrastructure</b>: This type extends Asset and can typically serve as a common supertype for infrastructural metadata objects such as clusters and hosts.</p>
<p><b>DataSet</b>: This type extends Referenceable. Conceptually, it can be used to represent a type that stores data. In Atlas, hive tables, hbase_tables etc. are all types that extend DataSet. Types that extend DataSet can be expected to have a schema, in the sense that they have an attribute that defines the attributes of that dataset, for example the columns attribute in a hive_table. Entities of types that extend DataSet also participate in data transformation, and this transformation can be captured by Atlas via lineage (or provenance) graphs.</p>
<p><b>Process</b>: This type extends Asset. Conceptually, it can be used to represent any data transformation operation. For example, an ETL process that transforms a hive table with raw data to another hive table that stores some aggregate can be a specific type that extends the Process type. A Process type has two specific attributes, inputs and outputs. Both inputs and outputs are arrays of DataSet entities. Thus an instance of a Process type can use these inputs and outputs to capture how the lineage of a DataSet evolves.</p></div>
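<p>The way Process inputs and outputs capture lineage can be sketched with a small traversal. The upstream function and the dictionaries below are hypothetical and use plain dataset names instead of entity references; the sketch only assumes that each process lists its input and output datasets, as described above.</p>
<div class="source"><pre class="prettyprint">
```python
def upstream(dataset, processes):
    """Return every dataset that the given dataset was derived from."""
    found = set()
    pending = [dataset]
    while pending:
        current = pending.pop()
        for process in processes:
            # A process that writes 'current' makes its inputs upstream of it.
            if current in process["outputs"]:
                for source in process["inputs"]:
                    if source not in found:
                        found.add(source)
                        pending.append(source)
    return found


# Hypothetical processes and dataset names, for illustration only.
processes = [
    {"name": "load_raw", "inputs": ["landing_files"], "outputs": ["raw_sales"]},
    {"name": "aggregate", "inputs": ["raw_sales"], "outputs": ["daily_sales"]},
]

print(sorted(upstream("daily_sales", processes)))
```
</pre></div>
<p>Following outputs back to inputs one process at a time yields both the direct source (raw_sales) and the indirect one (landing_files), which is the kind of information a lineage graph presents.</p>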
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row">
<p><a href="https://www.apache.org/foundation/contributing"><img src="https://www.apache.org/images/SupportApache-small.png" alt="Support the ASF" id="asf-logo" height="20" width="20" /></a>Copyright © 2011-2018 The Apache Software Foundation. Licensed under the <a href="https://www.apache.org/licenses/">Apache License, Version 2.0</a>.<br/>
Apache Atlas, Atlas, Apache, the Apache feather logo are trademarks of the <a href="https://www.apache.org">Apache Software Foundation</a>.<br/>
All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
<p id="poweredBy" class="pull-right"><a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /></a>
</p>
</div>
</footer>
</body>
</html>