<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/twiki/StormAtlasHook.twiki at 2018-10-31
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20181031" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache Atlas &#x2013; Storm Atlas Bridge</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.7.min.js"></script>
</head>
<body class="topBarEnabled">
<div id="topbar" class="navbar navbar-fixed-top ">
<div class="navbar-inner">
<div class="container" style="width: 68%;"><div class="nav-collapse">
<ul class="nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Atlas <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="index.html" title="About">About</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS" title="Wiki">Wiki</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS" title="News">News</a></li>
<li><a href="https://git-wip-us.apache.org/repos/asf/atlas.git" title="Git">Git</a></li>
<li><a href="https://issues.apache.org/jira/browse/ATLAS" title="Jira">Jira</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS/PoweredBy" title="Powered by">Powered by</a></li>
<li><a href="http://blogs.apache.org/atlas/" title="Blog">Blog</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Project Information <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="project-info.html" title="Summary">Summary</a></li>
<li><a href="mail-lists.html" title="Mailing Lists">Mailing Lists</a></li>
<li><a href="http://webchat.freenode.net?channels=apacheatlas&uio=d4" title="IRC">IRC</a></li>
<li><a href="team-list.html" title="Team">Team</a></li>
<li><a href="issue-tracking.html" title="Issue Tracking">Issue Tracking</a></li>
<li><a href="source-repository.html" title="Source Repository">Source Repository</a></li>
<li><a href="license.html" title="License">License</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Releases <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/dyn/closer.cgi/atlas/0.8.2/" title="0.8.2">0.8.2</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.8.1/" title="0.8.1">0.8.1</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.8.0-incubating/" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.7.1-incubating/" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.7.0-incubating/" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.6.0-incubating/" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.5.0-incubating/" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="../index.html" title="latest">latest</a></li>
<li><a href="../0.8.2/index.html" title="0.8.2">0.8.2</a></li>
<li><a href="../0.8.1/index.html" title="0.8.1">0.8.1</a></li>
<li><a href="../0.8.0-incubating/index.html" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="../0.7.1-incubating/index.html" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="../0.7.0-incubating/index.html" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="../0.6.0-incubating/index.html" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="../0.5.0-incubating/index.html" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/foundation/how-it-works.html" title="How Apache Works">How Apache Works</a></li>
<li><a href="http://www.apache.org/foundation/" title="Foundation">Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsoring Apache">Sponsoring Apache</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
</ul>
</li>
</ul>
<form id="search-form" action="https://www.google.com/search" method="get" class="navbar-search pull-right" >
<input value="http://atlas.apache.org" name="sitesearch" type="hidden"/>
<input class="search-query" name="q" id="query" type="text" />
</form>
<script type="text/javascript">asyncJs( 'https://cse.google.com/brand?form=search-form' )</script>
</div>
</div>
</div>
</div>
<div class="container">
<div id="banner">
<div class="pull-left"><a href=".." id="bannerLeft"><img src="images/atlas-logo.png" alt="Apache Atlas" width="200px" height="45px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2018-10-31</li>
<li id="projectVersion" class="pull-right">Version: 0.8.3</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2><a name="Storm_Atlas_Bridge"></a>Storm Atlas Bridge</h2></div>
<div class="section">
<h3><a name="Introduction"></a>Introduction</h3>
<p>Apache Storm is a distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. A Storm application is essentially a directed acyclic graph (DAG) of nodes, called a <b>topology</b>.</p>
<p>Apache Atlas is a metadata repository that enables end-to-end data lineage, search, and association of business classifications.</p>
<p>The goal of this integration is to push the operational topology metadata, along with the underlying data source(s), target(s), derivation processes and any available business context, to Atlas so that Atlas can capture the lineage for this topology.</p>
<p>This process has two parts, detailed below:</p>
<ul>
<li>Data model to represent the concepts in Storm</li>
<li>Storm Atlas Hook to update metadata in Atlas</li></ul></div>
<div class="section">
<h3><a name="Storm_Data_Model"></a>Storm Data Model</h3>
<p>The data model is represented as types in Atlas. It describes the various nodes in the topology graph, such as spouts and bolts, and the corresponding producer and consumer types.</p>
<p>The following types are added to Atlas:</p>
<ul>
<li>storm_topology - represents the coarse-grained topology. A storm_topology derives from the Atlas Process type and hence can be used to inform Atlas about lineage.</li>
<li>Data sets - kafka_topic, jms_topic, hbase_table and hdfs_data_set. These all derive from the Atlas DataSet type and hence form the end points of a lineage graph.</li>
<li>storm_spout - a data producer that emits tuples into the topology, typically sourced from Kafka or JMS.</li>
<li>storm_bolt - a data consumer with inputs and outputs, typically writing to Hive, HBase, HDFS, etc.</li></ul>
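<p>The following Java sketch illustrates how these types relate for lineage: a topology acts as a process whose inputs are the data sets its spouts read from and whose outputs are the data sets its bolts write to. The classes DataSet, Node and TopologyModel are hypothetical simplifications for illustration only; the actual type definitions live in org.apache.atlas.storm.model.StormDataModel and may differ.</p>
<div class="source"><pre class="prettyprint">
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a data set type such as kafka_topic,
// jms_topic, hbase_table or hdfs_data_set.
class DataSet {
    final String typeName;
    final String name;

    DataSet(String typeName, String name) {
        this.typeName = typeName;
        this.name = name;
    }

    @Override
    public String toString() {
        return typeName + ":" + name;
    }
}

// Hypothetical stand-in for storm_spout / storm_bolt. A spout reads from an
// external data set; a bolt may write to one. external is null when the node
// touches no external data set (for example, an intermediate bolt).
class Node {
    final String name;
    final boolean isSpout;
    final DataSet external;

    Node(String name, boolean isSpout, DataSet external) {
        this.name = name;
        this.isSpout = isSpout;
        this.external = external;
    }
}

// Hypothetical stand-in for storm_topology, which behaves like an Atlas
// Process: its inputs and outputs are data sets, the end points of lineage.
class TopologyModel {
    final String name;
    final List&lt;Node&gt; nodes = new ArrayList&lt;&gt;();

    TopologyModel(String name) {
        this.name = name;
    }

    // Data sets read by spouts become the topology's inputs.
    List&lt;DataSet&gt; inputs() {
        return collect(true);
    }

    // Data sets written by bolts become the topology's outputs.
    List&lt;DataSet&gt; outputs() {
        return collect(false);
    }

    private List&lt;DataSet&gt; collect(boolean fromSpouts) {
        List&lt;DataSet&gt; result = new ArrayList&lt;&gt;();
        for (Node node : nodes) {
            if (node.isSpout == fromSpouts &amp;&amp; node.external != null) {
                result.add(node.external);
            }
        }
        return result;
    }
}
</pre></div>
<p>For a topology whose spout reads the kafka_topic events and whose bolt writes the hdfs_data_set /data/events, inputs() returns the Kafka topic and outputs() returns the HDFS data set, which gives Atlas a lineage path from one to the other through the topology.</p>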
<p>The Storm Atlas hook automatically registers dependent models, such as the Hive data model, if it finds that they are not known to the Atlas server.</p>
<p>The data model for each of the types is described in the class definition at org.apache.atlas.storm.model.StormDataModel.</p></div>
<div class="section">
<h3><a name="Storm_Atlas_Hook"></a>Storm Atlas Hook</h3>
<p>Atlas is notified when a new topology is registered successfully in Storm. Storm provides a hook, backtype.storm.ISubmitterHook, in the Storm client that is used to submit a Storm topology.</p>
<p>The Storm Atlas hook is invoked after the topology has been submitted: it extracts the metadata from the topology and updates Atlas using the types defined in the Storm data model. Atlas implements the Storm client hook interface in org.apache.atlas.storm.hook.StormAtlasHook.</p></div>
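<p>The hook flow can be sketched in Java as follows, assuming a much simplified interface. SubmitterHook, MetadataSink and SketchAtlasHook are hypothetical names, and the real backtype.storm.ISubmitterHook receives Storm's own topology objects rather than lists of names. The sketch only shows the order of events: the client calls the hook after a successful submission, and the hook reports the topology, its spouts and its bolts using the type names from the data model.</p>
<div class="source"><pre class="prettyprint">
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for backtype.storm.ISubmitterHook.
// The real interface passes Storm's own topology objects rather than names.
interface SubmitterHook {
    void notify(String topologyName, Map&lt;String, Object&gt; stormConf,
                List&lt;String&gt; spouts, List&lt;String&gt; bolts);
}

// Hypothetical destination for metadata; the real hook updates Atlas.
interface MetadataSink {
    void send(String typeName, String name);
}

class SketchAtlasHook implements SubmitterHook {
    private final MetadataSink sink;

    SketchAtlasHook(MetadataSink sink) {
        this.sink = sink;
    }

    // The Storm client calls the hook only after the topology has been
    // submitted successfully, so only registered topologies are reported.
    @Override
    public void notify(String topologyName, Map&lt;String, Object&gt; stormConf,
                       List&lt;String&gt; spouts, List&lt;String&gt; bolts) {
        sink.send("storm_topology", topologyName);
        for (String spout : spouts) {
            sink.send("storm_spout", spout);
        }
        for (String bolt : bolts) {
            sink.send("storm_bolt", bolt);
        }
        // stormConf is unused in this sketch.
    }
}
</pre></div>
<p>Passing a lambda that records each (type, name) pair as the MetadataSink shows which entities the hook would report without contacting an Atlas server.</p>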
<div class="section">
<h3><a name="Limitations"></a>Limitations</h3>
<p>The following limitations apply to the first version of the integration:</p>
<ul>
<li>Only new topology submissions are registered with Atlas; subsequent lifecycle changes are not reflected in Atlas.</li>
<li>The Atlas server needs to be online when a Storm topology is submitted for the metadata to be captured.</li>
<li>The hook currently does not support capturing lineage for custom spouts and bolts.</li></ul></div>
<div class="section">
<h3><a name="Installation"></a>Installation</h3>
<p>The Storm Atlas Hook must be installed manually in Storm on the client side. The hook artifacts are available at $ATLAS_PACKAGE/hook/storm.</p>
<p>Copy the Storm Atlas hook jars to $STORM_HOME/extlib, where STORM_HOME is the Storm installation path.</p>
<p>Restart all Storm daemons after installing the Atlas hook.</p></div>
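<p>If you want to script the copy step, the following Java sketch copies everything under $ATLAS_PACKAGE/hook/storm, including any subdirectories, into $STORM_HOME/extlib. InstallStormHook is a hypothetical name, and the sketch assumes ATLAS_PACKAGE and STORM_HOME are exported as environment variables; an equivalent shell copy works just as well.</p>
<div class="source"><pre class="prettyprint">
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

class InstallStormHook {

    // Recursively copies source into target and returns the number of files copied.
    // Files.walk visits a directory before its contents, so each destination
    // directory exists before files are copied into it.
    static int copyTree(Path source, Path target) throws IOException {
        int copied = 0;
        try (Stream&lt;Path&gt; paths = Files.walk(source)) {
            Iterator&lt;Path&gt; it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        String atlasPackage = System.getenv("ATLAS_PACKAGE");
        String stormHome = System.getenv("STORM_HOME");
        if (atlasPackage == null || stormHome == null) {
            System.err.println("Set ATLAS_PACKAGE and STORM_HOME first.");
            System.exit(1);
        }
        Path source = Paths.get(atlasPackage, "hook", "storm");
        Path target = Paths.get(stormHome, "extlib");
        int copied = copyTree(source, target);
        System.out.println("Copied " + copied + " files; now restart the Storm daemons.");
    }
}
</pre></div>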
<div class="section">
<h3><a name="Configuration"></a>Configuration</h3></div>
<div class="section">
<h4><a name="Storm_Configuration"></a>Storm Configuration</h4>
<p>Configure the Storm Atlas Hook in the Storm client configuration file <b>$STORM_HOME/conf/storm.yaml</b> as follows:</p>
<div class="source"><pre class="prettyprint">
storm.topology.submission.notifier.plugin.class: &quot;org.apache.atlas.storm.hook.StormAtlasHook&quot;
</pre></div>
<p>Also set a cluster name to be used as a namespace for objects registered in Atlas. This name is used to namespace the Storm topology, spouts and bolts.</p>
<p>Other objects, such as data sets, should ideally be identified with the cluster name of the components that generate them. For example, Hive tables and databases should be identified using the cluster name set in Hive. The Storm Atlas hook picks this up if the Hive configuration is available in the Storm topology jar submitted on the client and the cluster name is defined there. The same applies to HBase data sets. If this configuration is not available, the cluster name set in the Storm configuration is used.</p>
<div class="source"><pre class="prettyprint">
atlas.cluster.name: &quot;cluster_name&quot;
</pre></div>
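<p>The fallback rule described above can be sketched in Java. ClusterNames is a hypothetical helper; it assumes the component configuration (for example, the Hive configuration packaged in the topology jar) uses the same atlas.cluster.name key, and the name@cluster format in qualify() is an assumption modeled on the qualified names Atlas commonly uses, not a format confirmed by this page.</p>
<div class="source"><pre class="prettyprint">
import java.util.Map;

class ClusterNames {
    static final String CLUSTER_NAME_KEY = "atlas.cluster.name";

    // Prefer the cluster name configured for the component that produced the
    // data set (for example, Hive); otherwise fall back to the Storm setting.
    static String resolve(Map&lt;String, ?&gt; componentConf, Map&lt;String, ?&gt; stormConf) {
        Object name = (componentConf == null) ? null : componentConf.get(CLUSTER_NAME_KEY);
        if (name == null) {
            name = stormConf.get(CLUSTER_NAME_KEY);
        }
        if (name == null) {
            throw new IllegalStateException(CLUSTER_NAME_KEY + " is not configured");
        }
        return name.toString();
    }

    // Assumed "name@cluster" format for a namespaced entity name.
    static String qualify(String entityName, String clusterName) {
        return entityName + "@" + clusterName;
    }
}
</pre></div>
<p>For a Hive table written by a bolt, resolve(hiveConf, stormConf) returns Hive's cluster name when one is set; for the topology, spouts and bolts themselves, passing null as componentConf makes the Storm cluster name apply.</p>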
<p>In <b>$STORM_HOME/conf/storm_env.ini</b>, set an environment variable as follows:</p>
<div class="source"><pre class="prettyprint">
STORM_JAR_JVM_OPTS:&quot;-Datlas.conf=$ATLAS_HOME/conf/&quot;
</pre></div>
<p>where ATLAS_HOME points to the Atlas installation directory.</p>
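<p>The -Datlas.conf option tells client-side Atlas code which directory holds its configuration. The Java sketch below shows one way to resolve that directory and load a properties file from it. AtlasConfLocator is hypothetical, and the file name atlas-application.properties is an assumption about the Atlas client configuration file; the sketch only illustrates the lookup, and you do not need to write such code to use the hook.</p>
<div class="source"><pre class="prettyprint">
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class AtlasConfLocator {
    // System property set through STORM_JAR_JVM_OPTS ("-Datlas.conf=...").
    static final String CONF_PROPERTY = "atlas.conf";
    // Assumed name of the Atlas client configuration file.
    static final String CONF_FILE = "atlas-application.properties";

    static Properties load() throws IOException {
        String dir = System.getProperty(CONF_PROPERTY);
        if (dir == null) {
            throw new IllegalStateException("-D" + CONF_PROPERTY + " is not set; check STORM_JAR_JVM_OPTS");
        }
        Path file = Paths.get(dir, CONF_FILE);
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props;
    }
}
</pre></div>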
<p>You can also configure the hook programmatically in the Storm Config:</p>
<div class="source"><pre class="prettyprint">
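import org.apache.storm.Config; // backtype.storm.Config on Storm releases before 1.0

Config stormConf = new Config();
// ... other topology configuration ...
stormConf.put(Config.STORM_TOPOLOGY_SUBMISSION_NOTIFIER_PLUGIN,
        org.apache.atlas.storm.hook.StormAtlasHook.class.getName());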
</pre></div></div>
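<p>Config.STORM_TOPOLOGY_SUBMISSION_NOTIFIER_PLUGIN holds the same key used in storm.yaml, storm.topology.submission.notifier.plugin.class, so a quick check can confirm that a configuration built in code enables the hook. The sketch below is hypothetical (HookConfigCheck is not part of Storm or Atlas) and uses a plain Map, since Storm's Config class extends HashMap&lt;String, Object&gt;. It only inspects the map built in code, so a hook configured only in storm.yaml would not show up here.</p>
<div class="source"><pre class="prettyprint">
import java.util.HashMap;
import java.util.Map;

class HookConfigCheck {
    // Same key as in storm.yaml and Config.STORM_TOPOLOGY_SUBMISSION_NOTIFIER_PLUGIN.
    static final String NOTIFIER_KEY = "storm.topology.submission.notifier.plugin.class";
    static final String ATLAS_HOOK = "org.apache.atlas.storm.hook.StormAtlasHook";

    // True if the topology configuration built in code enables the Atlas hook.
    static boolean atlasHookEnabled(Map&lt;String, Object&gt; conf) {
        return ATLAS_HOOK.equals(conf.get(NOTIFIER_KEY));
    }

    public static void main(String[] args) {
        Map&lt;String, Object&gt; conf = new HashMap&lt;&gt;();
        System.out.println(atlasHookEnabled(conf)); // false: hook not configured
        conf.put(NOTIFIER_KEY, ATLAS_HOOK);
        System.out.println(atlasHookEnabled(conf)); // true
    }
}
</pre></div>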
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row">
Copyright © 2018 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
</div>
<p id="poweredBy" class="pull-right"><a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /></a>
</p>
</div>
</footer>
</body>
</html>