<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia at 2018-03-12
| Rendered using Apache Maven Fluido Skin 1.3.0
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20180312" />
<meta http-equiv="Content-Language" content="en" />
<title>Falcon - Hive Mirroring</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
</head>
<body class="topBarDisabled">
<div class="container">
<div id="banner">
<div class="pull-left">
<div id="bannerLeft">
<img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/>
</div>
</div>
<div class="pull-right"> </div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class="">
<a href="index.html" title="Falcon">
Falcon</a>
</li>
<li class="divider ">/</li>
<li class="">Hive Mirroring</li>
<li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li>
<li id="projectVersion" class="pull-right">Version: 0.11</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2>Hive Mirroring<a name="Hive_Mirroring"></a></h2></div>
<div class="section">
<h3>Overview<a name="Overview"></a></h3>
<p>Falcon can replicate Hive metadata and data events from a source cluster to a destination cluster. Replication is provided through Falcon extensions and is supported on both secure and unsecure clusters.</p></div>
<div class="section">
<h3>Prerequisites<a name="Prerequisites"></a></h3>
<p>The following are prerequisites for using Hive Mirroring:</p>
<p></p>
<ul>
<li><b>Hive 1.2.0+</b></li>
<li><b>Oozie 4.2.0+</b></li></ul>
<p><b>Note:</b> To replicate Hive events, set the following properties in hive-site.xml on both the source and destination Hive clusters:</p>
<div class="source">
<pre>
&lt;property&gt;
  &lt;name&gt;hive.metastore.event.listeners&lt;/name&gt;
  &lt;value&gt;org.apache.hive.hcatalog.listener.DbNotificationListener&lt;/value&gt;
  &lt;description&gt;event listeners that are notified of any metastore changes&lt;/description&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;hive.metastore.dml.events&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
</pre></div></div>
<div class="section">
<h3>Use Case<a name="Use_Case"></a></h3>
<ul>
<li>Replicate the data and metadata of Hive databases and tables from a source cluster to a target cluster.</li></ul></div>
<div class="section">
<h3>Limitations<a name="Limitations"></a></h3>
<ul>
<li>Hive does not currently support replication events for create database, roles, views, offline tables, direct HDFS writes that are not registered with the metadata, or database/table name mapping. The Hive mirroring extension therefore cannot be used to replicate these events between warehouses.</li></ul></div>
<div class="section">
<h3>Usage<a name="Usage"></a></h3></div>
<div class="section">
<h4>Bootstrap<a name="Bootstrap"></a></h4>
<p>Perform an initial bootstrap of the table and database from the source cluster to the destination cluster.</p>
<ul>
<li><b>Database Bootstrap</b></li></ul>
<p>To bootstrap database replication, first create the destination database. This step is required because users can set up database replication definitions only on pre-existing databases. Then export all tables in the source database and import them into the destination database, as described in Table Bootstrap.</p>
<p></p>
<ul>
<li><b>Table Bootstrap</b></li></ul>
<p>To bootstrap table replication, first turn on the DbNotificationListener on the source database. Then perform an Export of the table, distcp the export over to the destination warehouse, and perform an Import there. See <a class="externalLink" href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport">Hive Export-Import</a> for syntax details and examples. This sets up the destination table so that subsequent events on the source cluster that modify the table are replicated.</p></div>
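<div class="section">
<p>The table bootstrap steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not a prescribed procedure: the database name <tt>sales</tt>, the table name <tt>orders</tt>, the NameNode addresses and the staging path are all hypothetical, and the destination database is assumed to exist already.</p>
<div class="source">
<pre>
# Hypothetical names throughout: database "sales", table "orders",
# NameNodes "source-nn" / "dest-nn", staging path "/apps/hive/staging".

# 1. On the source cluster: export the table (metadata and data) to a staging directory.
hive -e "USE sales; EXPORT TABLE orders TO '/apps/hive/staging/orders_export';"

# 2. Copy the export directory from the source warehouse to the destination warehouse.
hadoop distcp \
    hdfs://source-nn:8020/apps/hive/staging/orders_export \
    hdfs://dest-nn:8020/apps/hive/staging/orders_export

# 3. On the destination cluster: import the table from the copied export directory.
hive -e "USE sales; IMPORT TABLE orders FROM '/apps/hive/staging/orders_export';"
</pre></div>
<p>For a database bootstrap, the same three steps would be repeated for each table in the source database.</p></div>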
<div class="section">
<h4>Setup source and destination clusters<a name="Setup_source_and_destination_clusters"></a></h4>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/definition.xml
</pre></div></div>
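<div class="section">
<p>Since mirroring involves two clusters, the submit command is typically run once per cluster definition. A minimal sketch, assuming hypothetical definition files named <tt>source-cluster.xml</tt> and <tt>target-cluster.xml</tt>:</p>
<div class="source">
<pre>
# File names and paths are hypothetical; use the location of your own cluster definitions.
$FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/source-cluster.xml
$FALCON_HOME/bin/falcon entity -submit -type cluster -file /cluster/target-cluster.xml

# List the registered cluster entities to confirm that both were accepted.
$FALCON_HOME/bin/falcon entity -type cluster -list
</pre></div></div>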
<div class="section">
<h4>Hive mirroring extension properties<a name="Hive_mirroring_extension_properties"></a></h4>
<p>Extension artifacts are expected to be installed on HDFS at the path specified by &quot;extension.store.uri&quot; in the startup properties. The hive-mirroring-properties.json file, located at &quot;&lt;extension.store.uri&gt;/hive-mirroring/META/hive-mirroring-properties.json&quot;, lists all the required and optional parameters for scheduling a Hive mirroring job.</p></div>
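<div class="section">
<p>One way to inspect the parameter list is to read the file directly from HDFS. The sketch below assumes a hypothetical value for &quot;extension.store.uri&quot;; substitute the value from your own startup properties.</p>
<div class="source">
<pre>
# Hypothetical store location; replace with the extension.store.uri value from startup properties.
EXTENSION_STORE_URI="hdfs://source-nn:8020/apps/falcon/extensions"

# Print the required and optional parameters of the Hive mirroring extension.
hadoop fs -cat "$EXTENSION_STORE_URI/hive-mirroring/META/hive-mirroring-properties.json"
</pre></div></div>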
<div class="section">
<h4>Submit and schedule Hive mirroring extension<a name="Submit_and_schedule_Hive_mirroring_extension"></a></h4>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hive-mirroring -file /process/definition.xml
</pre></div>
<p>Refer to the <a href="./Falconcli/FalconCLI.html">Falcon CLI</a> and <a href="./Restapi/ResourceList.html">REST API</a> documentation for more details on using the CLI and the REST APIs.</p></div>
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row span12">Copyright &copy; 2013-2018
<a href="http://www.apache.org">Apache Software Foundation</a>.
All Rights Reserved.
</div>
<p id="poweredBy" class="pull-right">
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
</a>
</p>
</div>
</footer>
</body>
</html>