<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- Generated by Apache Maven Doxia Site Renderer 1.3 at Jul 9, 2015 -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Apache Atlas Hive Bridge - Hive Atlas Bridge</title>
<style type="text/css" media="all">
@import url("./css/maven-base.css");
@import url("./css/maven-theme.css");
@import url("./css/site.css");
</style>
<link rel="stylesheet" href="./css/print.css" type="text/css" media="print" />
<meta name="Date-Revision-yyyymmdd" content="20150709" />
<meta http-equiv="Content-Language" content="en" />
</head>
<body class="composite">
<div id="banner">
<div id="bannerLeft">
Apache Atlas Hive Bridge
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="breadcrumbs">
<div class="xleft">
<span id="publishDate">Last Published: 2015-07-09</span>
&nbsp;| <span id="projectVersion">Version: 0.6-incubating-SNAPSHOT</span>
</div>
<div class="xright"> <a href="./" title="Apache Atlas Hive Bridge">Apache Atlas Hive Bridge</a>
</div>
<div class="clear">
<hr/>
</div>
</div>
<div id="leftColumn">
<div id="navcolumn">
<h5>Parent Project</h5>
<ul>
<li class="none">
<a href="../../index.html" title="apache-atlas">apache-atlas</a>
</li>
</ul>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="poweredBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
</a>
</div>
</div>
<div id="bodyColumn">
<div id="contentBox">
<div class="section"><h2>Hive Atlas Bridge<a name="Hive_Atlas_Bridge"></a></h2><p>Hive metadata can be modelled in Atlas using its Type System. The default modelling is available in org.apache.atlas.hive.model.HiveDataModelGenerator. It defines the following types:</p><ul><li>hive_resource_type(EnumType) - [JAR, FILE, ARCHIVE]</li><li>hive_principal_type(EnumType) - [USER, ROLE, GROUP]</li><li>hive_function_type(EnumType) - [JAVA]</li><li>hive_order(StructType) - [col, order]</li><li>hive_resourceuri(StructType) - [resourceType, uri]</li><li>hive_serde(StructType) - [name, serializationLib, parameters]</li><li>hive_process(ClassType) - [name, startTime, endTime, userName, sourceTableNames, targetTableNames, queryText, queryPlan, queryId, queryGraph]</li><li>hive_function(ClassType) - [functionName, dbName, className, ownerName, ownerType, createTime, functionType, resourceUris]</li><li>hive_type(ClassType) - [name, type1, type2, fields]</li><li>hive_partition(ClassType) - [values, dbName, tableName, createTime, lastAccessTime, sd, parameters]</li><li>hive_storagedesc(ClassType) - [cols, location, inputFormat, outputFormat, compressed, numBuckets, serdeInfo, bucketCols, sortCols, parameters, storedAsSubDirectories]</li><li>hive_index(ClassType) - [indexName, indexHandlerClass, dbName, createTime, lastAccessTime, origTableName, indexTableName, sd, parameters, deferredRebuild]</li><li>hive_role(ClassType) - [roleName, createTime, ownerName]</li><li>hive_column(ClassType) - [name, type, comment]</li><li>hive_db(ClassType) - [name, description, locationUri, parameters, ownerName, ownerType]</li><li>hive_table(ClassType) - [name, dbName, owner, createTime, lastAccessTime, retention, sd, partitionKeys, columns, parameters, viewOriginalText, viewExpandedText, tableType, temporary]</li></ul></div><div class="section"><h3>Importing Hive Metadata<a name="Importing_Hive_Metadata"></a></h3><p>org.apache.atlas.hive.bridge.HiveMetaStoreBridge imports the hive metadata into Atlas 
using the type system defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The import-hive.sh script can be used to run this import. Set the following properties in the hive-site.xml of your Hive set-up, and set the environment variable HIVE_CONFIG to the Hive conf directory:</p><ul><li>Atlas endpoint and cluster name - Add the following properties with the values for your set-up</li></ul><div class="source"><pre>
&lt;property&gt;
&lt;name&gt;atlas.rest.address&lt;/name&gt;
&lt;value&gt;http://localhost:21000/&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;atlas.cluster.name&lt;/name&gt;
&lt;value&gt;primary&lt;/value&gt;
&lt;/property&gt;
</pre></div><p>Usage: &lt;dgi package&gt;/bin/import-hive.sh. The logs are in &lt;dgi package&gt;/logs/import-hive.log</p></div><div class="section"><h3>Hive Hook<a name="Hive_Hook"></a></h3><p>Hive supports listeners on command execution through hive hooks. The Atlas hook uses this mechanism to add, update, and remove entities in Atlas using the model defined in org.apache.atlas.hive.model.HiveDataModelGenerator. The hook submits each request to a thread pool executor to avoid blocking command execution. Follow these instructions in your Hive set-up to add the hive hook for Atlas:</p><ul><li>Add org.apache.atlas.hive.hook.HiveHook as a post-execution hook in hive-site.xml</li></ul><div class="source"><pre>
&lt;property&gt;
&lt;name&gt;hive.exec.post.hooks&lt;/name&gt;
&lt;value&gt;org.apache.atlas.hive.hook.HiveHook&lt;/value&gt;
&lt;/property&gt;
</pre></div><p></p><ul><li>Add the following properties in hive-site.xml with the Atlas endpoint and cluster name for your set-up</li></ul><div class="source"><pre>
&lt;property&gt;
&lt;name&gt;atlas.rest.address&lt;/name&gt;
&lt;value&gt;http://localhost:21000/&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;atlas.cluster.name&lt;/name&gt;
&lt;value&gt;primary&lt;/value&gt;
&lt;/property&gt;
</pre></div><p></p><ul><li>Add 'export HIVE_AUX_JARS_PATH=&lt;dgi package&gt;/hook/hive' in hive-env.sh</li></ul><p>The following properties in hive-site.xml control the thread pool details:</p><ul><li>atlas.hook.hive.minThreads - core number of threads. default 5</li><li>atlas.hook.hive.maxThreads - maximum number of threads. default 5</li><li>atlas.hook.hive.keepAliveTime - keep-alive time in milliseconds. default 10</li><li>atlas.hook.hive.synchronous - boolean, true to run the hook synchronously. default false</li></ul></div><div class="section"><h3>Limitations<a name="Limitations"></a></h3><p></p><ul><li>Database, table, and column names are case-insensitive in Hive, so the corresponding names in Atlas entities are stored in lowercase. Queries through the search APIs should therefore use lowercase entity names</li><li>The hive hook currently captures only the following Hive operations: create database, create table, create view, CTAS, load, import, export, query, alter table rename, and alter view rename</li></ul></div>
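<p>The thread pool properties listed under Hive Hook can be read roughly as follows. This is a minimal sketch, not the actual HiveHook code: the class name HookExecutorSketch, its helper methods, the use of java.util.Properties in place of the Hive configuration, and the unbounded work queue are all assumptions made for illustration. Only the property names and default values come from the list above.</p>

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; it is not part of Atlas.
class HookExecutorSketch {
    // Property names as documented for hive-site.xml.
    static final String MIN_THREADS = "atlas.hook.hive.minThreads";
    static final String MAX_THREADS = "atlas.hook.hive.maxThreads";
    static final String KEEP_ALIVE = "atlas.hook.hive.keepAliveTime";
    static final String SYNCHRONOUS = "atlas.hook.hive.synchronous";

    // Reads an integer property, falling back to the documented default.
    static int intProp(Properties conf, String key, int defaultValue) {
        return Integer.parseInt(conf.getProperty(key, String.valueOf(defaultValue)));
    }

    // Builds an executor from the documented properties (defaults: 5, 5, 10 ms).
    static ThreadPoolExecutor buildExecutor(Properties conf) {
        int minThreads = intProp(conf, MIN_THREADS, 5);
        int maxThreads = intProp(conf, MAX_THREADS, 5);
        long keepAliveMs = intProp(conf, KEEP_ALIVE, 10);
        // Assumption: an unbounded queue; the real hook may use a different queue.
        return new ThreadPoolExecutor(minThreads, maxThreads, keepAliveMs,
                TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    // True when the hook should run on the calling thread (default false).
    static boolean isSynchronous(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SYNCHRONOUS, "false"));
    }

    // Runs the task inline in synchronous mode, otherwise hands it to the pool
    // so that the Hive command is not blocked.
    static void submit(Properties conf, ThreadPoolExecutor executor, Runnable task) {
        if (isSynchronous(conf)) {
            task.run();
        } else {
            executor.execute(task);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Properties conf = new Properties();
        conf.setProperty(MAX_THREADS, "8");
        ThreadPoolExecutor executor = buildExecutor(conf);
        submit(conf, executor, () ->
                System.out.println("notifying Atlas on " + Thread.currentThread().getName()));
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

<p>With an unbounded queue, a ThreadPoolExecutor never grows beyond its core size, so maxThreads only takes effect when the work queue is bounded. The sketch keeps the queue unbounded purely for brevity.</p>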
</div>
</div>
<div class="clear">
<hr/>
</div>
<div id="footer">
<div class="xright">
Copyright &#169; 2015
<a href="http://www.apache.org">Apache Software Foundation</a>.
All Rights Reserved.
</div>
<div class="clear">
<hr/>
</div>
</div>
</body>
</html>