<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia Site Renderer 1.8 from src/site/twiki/Bridge-Sqoop.twiki at 2018-10-31
| Rendered using Apache Maven Fluido Skin 1.7
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20181031" />
<meta http-equiv="Content-Language" content="en" />
<title>Apache Atlas &#x2013; Sqoop Atlas Bridge</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.7.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.7.min.js"></script>
</head>
<body class="topBarEnabled">
<div id="topbar" class="navbar navbar-fixed-top ">
<div class="navbar-inner">
<div class="container" style="width: 68%;"><div class="nav-collapse">
<ul class="nav">
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Atlas <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="index.html" title="About">About</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS" title="Wiki">Wiki</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS" title="News">News</a></li>
<li><a href="https://git-wip-us.apache.org/repos/asf/atlas.git" title="Git">Git</a></li>
<li><a href="https://issues.apache.org/jira/browse/ATLAS" title="Jira">Jira</a></li>
<li><a href="https://cwiki.apache.org/confluence/display/ATLAS/PoweredBy" title="Powered by">Powered by</a></li>
<li><a href="http://blogs.apache.org/atlas/" title="Blog">Blog</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Project Information <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="project-info.html" title="Summary">Summary</a></li>
<li><a href="mail-lists.html" title="Mailing Lists">Mailing Lists</a></li>
<li><a href="http://webchat.freenode.net?channels=apacheatlas&uio=d4" title="IRC">IRC</a></li>
<li><a href="team-list.html" title="Team">Team</a></li>
<li><a href="issue-tracking.html" title="Issue Tracking">Issue Tracking</a></li>
<li><a href="source-repository.html" title="Source Repository">Source Repository</a></li>
<li><a href="license.html" title="License">License</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Releases <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/dyn/closer.cgi/atlas/0.8.2/" title="0.8.2">0.8.2</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.8.1/" title="0.8.1">0.8.1</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.8.0-incubating/" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.7.1-incubating/" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.7.0-incubating/" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.6.0-incubating/" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="http://archive.apache.org/dist/incubator/atlas/0.5.0-incubating/" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="../index.html" title="latest">latest</a></li>
<li><a href="../0.8.2/index.html" title="0.8.2">0.8.2</a></li>
<li><a href="../0.8.1/index.html" title="0.8.1">0.8.1</a></li>
<li><a href="../0.8.0-incubating/index.html" title="0.8-incubating">0.8-incubating</a></li>
<li><a href="../0.7.1-incubating/index.html" title="0.7.1-incubating">0.7.1-incubating</a></li>
<li><a href="../0.7.0-incubating/index.html" title="0.7-incubating">0.7-incubating</a></li>
<li><a href="../0.6.0-incubating/index.html" title="0.6-incubating">0.6-incubating</a></li>
<li><a href="../0.5.0-incubating/index.html" title="0.5-incubating">0.5-incubating</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">ASF <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="http://www.apache.org/foundation/how-it-works.html" title="How Apache Works">How Apache Works</a></li>
<li><a href="http://www.apache.org/foundation/" title="Foundation">Foundation</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html" title="Sponsoring Apache">Sponsoring Apache</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html" title="Thanks">Thanks</a></li>
</ul>
</li>
</ul>
<form id="search-form" action="https://www.google.com/search" method="get" class="navbar-search pull-right" >
<input value="http://atlas.apache.org" name="sitesearch" type="hidden"/>
<input class="search-query" name="q" id="query" type="text" />
</form>
<script type="text/javascript">asyncJs( 'https://cse.google.com/brand?form=search-form' )</script>
<iframe src="https://www.facebook.com/plugins/like.php?href=http://atlas.apache.org/atlas-docs&send=false&layout=button_count&show-faces=false&action=like&colorscheme=dark"
scrolling="no" frameborder="0"
style="border:none; width:100px; height:20px; margin-top: 10px;" class="pull-right" ></iframe>
<script type="text/javascript">asyncJs( 'https://apis.google.com/js/plusone.js' )</script>
<ul class="nav pull-right"><li style="margin-top: 10px;">
<div class="g-plusone" data-href="http://atlas.apache.org/atlas-docs" data-size="medium" width="60px" align="right" ></div>
</li></ul>
</div>
</div>
</div>
</div>
<div class="container">
<div id="banner">
<div class="pull-left"><a href=".." id="bannerLeft"><img src="images/atlas-logo.png" alt="Apache Atlas" width="200px" height="45px"/></a></div>
<div class="pull-right"></div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class=""><a href="http://www.apache.org" class="externalLink" title="Apache">Apache</a><span class="divider">/</span></li>
<li class=""><a href="index.html" title="Atlas">Atlas</a><span class="divider">/</span></li>
<li class="active ">Sqoop Atlas Bridge</li>
<li id="publishDate" class="pull-right"><span class="divider">|</span> Last Published: 2018-10-31</li>
<li id="projectVersion" class="pull-right">Version: 0.8.3</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2><a name="Sqoop_Atlas_Bridge"></a>Sqoop Atlas Bridge</h2></div>
<div class="section">
<h3><a name="Sqoop_Model"></a>Sqoop Model</h3>
<p>The Sqoop model includes the following types:</p>
<ul>
<li>Entity types:
<ul>
<li>sqoop_process
<ul>
<li>super-types: Process</li>
<li>attributes: name, operation, dbStore, hiveTable, commandlineOpts, startTime, endTime, userName</li></ul></li>
<li>sqoop_dbdatastore
<ul>
<li>super-types: DataSet</li>
<li>attributes: name, dbStoreType, storeUse, storeUri, source, description, ownerName</li></ul></li></ul></li></ul>
<p></p>
<ul>
<li>Enum types:
<ul>
<li>sqoop_operation_type
<ul>
<li>values: IMPORT, EXPORT, EVAL</li></ul></li>
<li>sqoop_dbstore_usage
<ul>
<li>values: TABLE, QUERY, PROCEDURE, OTHER</li></ul></li></ul></li></ul>
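<p>As an illustration only, the Python sketch below mirrors the types listed above as plain classes. It is not part of Atlas, which registers these types in its own type system. The class names and the attribute types chosen here (str, int, dict) are assumptions made for readability.</p>
<div class="source"><pre class="prettyprint">
from dataclasses import dataclass
from enum import Enum


class SqoopOperationType(Enum):
    # Values of the sqoop_operation_type enum
    IMPORT = "IMPORT"
    EXPORT = "EXPORT"
    EVAL = "EVAL"


class SqoopDbStoreUsage(Enum):
    # Values of the sqoop_dbstore_usage enum
    TABLE = "TABLE"
    QUERY = "QUERY"
    PROCEDURE = "PROCEDURE"
    OTHER = "OTHER"


@dataclass
class SqoopDbDataStore:
    # Attributes of sqoop_dbdatastore (super-type: DataSet)
    name: str
    dbStoreType: str
    storeUse: SqoopDbStoreUsage
    storeUri: str
    source: str
    description: str = ""
    ownerName: str = ""


@dataclass
class SqoopProcess:
    # Attributes of sqoop_process (super-type: Process)
    name: str
    operation: SqoopOperationType
    dbStore: SqoopDbDataStore
    hiveTable: str
    commandlineOpts: dict  # assumed to be a string-to-string map
    startTime: int         # assumed to be epoch milliseconds
    endTime: int
    userName: str
</pre></div>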
<p>Entities are created and de-duplicated using a unique qualified name. The qualified name also provides a namespace and can be used in queries:</p>
<ul>
<li>sqoop_process.qualifiedName: dbStoreType-storeUri-endTime</li>
<li>sqoop_dbdatastore.qualifiedName: dbStoreType-storeUri-source</li></ul></div>
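<p>The naming scheme above can be sketched as follows, assuming the attribute values are joined with plain hyphens exactly as written. The function names are hypothetical, and the real SqoopHook may build its names differently. Treat this as an illustration of the documented format and of de-duplication by qualified name, not as the hook's code.</p>
<div class="source"><pre class="prettyprint">
def process_qualified_name(db_store_type, store_uri, end_time):
    # sqoop_process.qualifiedName: dbStoreType-storeUri-endTime
    return "-".join([db_store_type, store_uri, str(end_time)])


def dbstore_qualified_name(db_store_type, store_uri, source):
    # sqoop_dbdatastore.qualifiedName: dbStoreType-storeUri-source
    return "-".join([db_store_type, store_uri, source])


def dedupe(entities):
    # Keep only the first entity seen for each qualifiedName.
    unique = {}
    for entity in entities:
        unique.setdefault(entity["qualifiedName"], entity)
    return list(unique.values())


if __name__ == "__main__":
    uri = "jdbc:mysql://dbhost:3306/sales"
    a = {"qualifiedName": dbstore_qualified_name("mysql", uri, "orders")}
    b = {"qualifiedName": dbstore_qualified_name("mysql", uri, "orders")}
    print(len(dedupe([a, b])))  # prints 1
</pre></div>
<p>Because the separator is a hyphen and JDBC URIs often contain hyphens themselves, a name built this way is stable enough for de-duplication but cannot always be split back into its parts.</p>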
<div class="section">
<h3><a name="Sqoop_Hook"></a>Sqoop Hook</h3>
<p>Sqoop added a SqoopJobDataPublisher that publishes data to Atlas after an import job completes. Atlas provides SqoopHook as this publisher and uses it to add entities to Atlas based on the model detailed above. Currently, SqoopHook supports only hiveImport.</p>
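<p>The following Python sketch illustrates the behaviour described above: a publisher is invoked after a Sqoop job completes and ignores everything except Hive imports. The class name, the dict-based job data and its keys (operation, hiveTable, url, storeTable, user) are hypothetical. The real hook is a Java class that receives Sqoop's job data object and sends notifications to Atlas rather than storing them in a list.</p>
<div class="source"><pre class="prettyprint">
class SketchSqoopHook:
    """Hypothetical stand-in for a SqoopJobDataPublisher implementation.

    Job data is modelled as a plain dict; entities are collected in a list
    instead of being sent to Atlas.
    """

    def __init__(self):
        self.published = []  # entity batches that would be sent to Atlas

    def publish(self, data):
        # Only Hive imports are captured; every other job is ignored.
        if data.get("operation") != "import" or not data.get("hiveTable"):
            return False
        db_store = {"typeName": "sqoop_dbdatastore",
                    "storeUri": data["url"],
                    "source": data["storeTable"]}
        process = {"typeName": "sqoop_process",
                   "operation": "IMPORT",
                   "hiveTable": data["hiveTable"],
                   "userName": data["user"]}
        self.published.append([db_store, process])
        return True
</pre></div>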
<p>Follow the instructions below to set up the Atlas hook in Sqoop:</p>
<ul>
<li>Set up the Atlas hook in &lt;sqoop-conf&gt;/sqoop-site.xml by adding the following property:</li></ul>
<div class="source"><pre class="prettyprint">
&lt;property&gt;
&lt;name&gt;sqoop.job.data.publish.class&lt;/name&gt;
&lt;value&gt;org.apache.atlas.sqoop.hook.SqoopHook&lt;/value&gt;
&lt;/property&gt;
</pre></div>
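<p>To check that a sqoop-site.xml contains this property, a small script along the following lines could be used. It is a sketch that assumes the usual Hadoop-style configuration/property/name/value layout; the script and its function name are hypothetical and are not shipped with Atlas or Sqoop.</p>
<div class="source"><pre class="prettyprint">
import sys
import xml.etree.ElementTree as ET

HOOK_PROPERTY = "sqoop.job.data.publish.class"
HOOK_CLASS = "org.apache.atlas.sqoop.hook.SqoopHook"


def hook_configured(root):
    # Scan every property element of the configuration tree.
    # If the property is defined more than once, the last definition is used.
    value = None
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == HOOK_PROPERTY:
            value = (prop.findtext("value") or "").strip()
    return value == HOOK_CLASS


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_sqoop_hook.py path/to/sqoop-site.xml
    tree = ET.parse(sys.argv[1])
    print("configured" if hook_configured(tree.getroot()) else "missing")
</pre></div>
<p>Saved as check_sqoop_hook.py (a hypothetical name) and run with the path to &lt;sqoop-conf&gt;/sqoop-site.xml, it prints either configured or missing.</p>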
<p></p>
<ul>
<li>Copy &lt;atlas-conf&gt;/atlas-application.properties to the Sqoop conf directory, &lt;sqoop-conf&gt;/</li>
<li>Link &lt;atlas-home&gt;/hook/sqoop/*.jar into the Sqoop lib directory</li></ul>
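<p>The two steps above can be scripted. The sketch below assumes a POSIX system where symbolic links are available and interprets "link" as creating one symlink per jar. The function name and its four directory arguments are hypothetical stand-ins for the &lt;atlas-conf&gt;, &lt;atlas-home&gt;, &lt;sqoop-conf&gt; and Sqoop lib locations.</p>
<div class="source"><pre class="prettyprint">
import glob
import os
import shutil


def install_sqoop_hook(atlas_conf, atlas_home, sqoop_conf, sqoop_lib):
    # Step 1: copy atlas-application.properties into the Sqoop conf directory.
    shutil.copy(os.path.join(atlas_conf, "atlas-application.properties"), sqoop_conf)

    # Step 2: symlink every hook jar into the Sqoop lib directory.
    linked = []
    pattern = os.path.join(atlas_home, "hook", "sqoop", "*.jar")
    for jar in sorted(glob.glob(pattern)):
        target = os.path.join(sqoop_lib, os.path.basename(jar))
        if not os.path.lexists(target):  # leave existing entries untouched
            os.symlink(os.path.abspath(jar), target)
        linked.append(target)
    return linked
</pre></div>
<p>Skipping targets that already exist makes the function safe to re-run.</p>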
<p>Refer <a href="./Configuration.html">Configuration</a> for notification related configurations</p></div>
<div class="section">
<h3><a name="NOTES"></a>NOTES</h3>
<p></p>
<ul>
<li>Currently, the Sqoop hook captures only the following Sqoop operation: hiveImport</li></ul></div>
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row">
Copyright © 2018 The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
</div>
<p id="poweredBy" class="pull-right"><a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"><img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /></a>
</p>
</div>
</footer>
</body>
</html>