<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia at 2018-03-12
| Rendered using Apache Maven Fluido Skin 1.3.0
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20180312" />
<meta http-equiv="Content-Language" content="en" />
<title>Falcon - HDFS Snapshot based Mirroring</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
</head>
<body class="topBarDisabled">
<div class="container">
<div id="banner">
<div class="pull-left">
<div id="bannerLeft">
<img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/>
</div>
</div>
<div class="pull-right"> </div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class="">
<a href="index.html" title="Falcon">
Falcon</a>
</li>
<li class="divider ">/</li>
<li class="">HDFS Snapshot based Mirroring</li>
<li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li>
<li id="projectVersion" class="pull-right">Version: 0.11</li>
</ul>
</div>
<div id="bodyColumn" >
<div class="section">
<h2>HDFS Snapshot based Mirroring<a name="HDFS_Snapshot_based_Mirroring"></a></h2></div>
<div class="section">
<h3>Overview<a name="Overview"></a></h3>
<p>HDFS snapshots are cheap to create: the cost is O(1), excluding inode lookup time. Once a snapshot exists, it is efficient to find the modifications made relative to it and copy only those modifications to the target for disaster recovery (DR). This makes for cost-effective HDFS mirroring.</p></div>
<div class="section">
<h3>Prerequisites<a name="Prerequisites"></a></h3>
<p>The following are the prerequisites for using HDFS snapshot based mirroring.</p>
<p></p>
<ul>
<li>Hadoop version 2.7.0 or higher.</li>
<li>The user submitting and scheduling the Falcon snapshot based mirroring job must have permission to create and manage snapshots on both the source and target directories.</li></ul></div>
<div class="section">
<h3>Use Case<a name="Use_Case"></a></h3>
<p>Create and manage snapshots on source/target directories. Mirror data from source to target for disaster recovery using these snapshots. Perform retention on the snapshots created on source and target.</p></div>
<div class="section">
<h3>Usage<a name="Usage"></a></h3></div>
<div class="section">
<h4>Setup<a name="Setup"></a></h4>
<p></p>
<ul>
<li>Submit the source cluster and target cluster entities to Falcon.</li></ul>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml
$FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml
</pre></div>
<p></p>
<ul>
<li>Ensure that the source directory on the source cluster and the target directory on the target cluster exist.</li>
<li>Ensure that these directories are snapshottable by the user submitting the extension. You can find more <a class="externalLink" href="https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">information on snapshots here</a>.</li></ul></div>
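<p>As a rough sketch of the directory preparation, the helper below prints (it does not run) the standard HDFS commands that create a directory and mark it snapshottable. The function name print_snapshot_setup is hypothetical, and the path is only an illustration. Allowing snapshots on a directory is an HDFS administrator operation, so the second command normally has to be run by an HDFS superuser.</p>
<div class="source">
<pre>
```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of Falcon): prints the HDFS commands
# that create a directory and allow snapshots on it.
print_snapshot_setup() {
    dir="$1"
    printf 'hdfs dfs -mkdir -p %s\n' "$dir"
    printf 'hdfs dfsadmin -allowSnapshot %s\n' "$dir"
}

# Illustrative path; review the printed commands, then run them on the
# cluster that owns the directory.
print_snapshot_setup /apps/falcon/snapshots/source/
```
</pre></div>
<p>Run the helper once for the source directory and once for the target directory, each against its own cluster.</p>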
<div class="section">
<h4>HDFS Snapshot based mirroring extension properties<a name="HDFS_Snapshot_based_mirroring_extension_properties"></a></h4>
<p>Extension artifacts are expected to be installed on HDFS at the path specified by &quot;extension.store.uri&quot; in the startup properties. The hdfs-snapshot-mirroring-properties.json file, located at &quot;&lt;extension.store.uri&gt;/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json&quot;, lists all the required and optional parameters for scheduling the mirroring job.</p>
<p>Here is a sample set of properties:</p>
<div class="source">
<pre>
## Job Properties
jobName=hdfs-snapshot-test
jobClusterName=backupCluster
jobValidityStart=2016-01-01T00:00Z
jobValidityEnd=2016-04-01T00:00Z
jobFrequency=hours(12)
jobTimezone=UTC
jobTags=consumer=consumer@xyz.com
jobRetryPolicy=periodic
jobRetryDelay=minutes(30)
jobRetryAttempts=3
## Job owner
jobAclOwner=ambari-qa
jobAclGroup=users
jobAclPermission=*
## Source information
sourceCluster=primaryCluster
sourceSnapshotDir=/apps/falcon/snapshots/source/
sourceSnapshotRetentionPolicy=delete
sourceSnapshotRetentionAgeLimit=days(15)
sourceSnapshotRetentionNumber=10
## Target information
targetCluster=backupCluster
targetSnapshotDir=/apps/falcon/snapshots/target/
targetSnapshotRetentionPolicy=delete
targetSnapshotRetentionAgeLimit=months(6)
targetSnapshotRetentionNumber=20
## Distcp properties
distcpMaxMaps=1
distcpMapBandwidth=100
tdeEncryptionEnabled=false
</pre></div>
<p>With the above properties, the Falcon HDFS snapshot based mirroring extension does the following every 12 hours:</p>
<ul>
<li>Create a snapshot of the directory /apps/falcon/snapshots/source/ on primaryCluster.</li>
<li>DistCp the data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster.</li>
<li>Create a snapshot of the directory /apps/falcon/snapshots/target/ on backupCluster.</li>
<li>Run the retention job on the source and target.
<ul>
<li>Keep at least the N most recent snapshots and delete any other snapshots that are older than the specified age limit.</li>
<li>Currently, only the &quot;delete&quot; policy is supported for snapshot retention.</li></ul></li></ul>
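<p>The retention rule can be illustrated with a small sketch. This is not Falcon's implementation. It assumes a hypothetical input format of one &quot;name age-in-days&quot; pair per line, and the function name select_expired_snapshots is made up for this example. The newest snapshots, up to the retention number, are always kept. Of the rest, only those older than the age limit are selected.</p>
<div class="source">
<pre>
```shell
#!/bin/sh
# Hypothetical sketch of the retention rule; not Falcon's implementation.
# Reads "name age_in_days" lines on stdin and prints the names to delete.
#   $1 = number of newest snapshots that are always kept
#   $2 = age limit in days
select_expired_snapshots() {
    keep_count="$1"
    max_age_days="$2"
    # Sort newest first (smallest age first), then walk the list.
    sort -k2,2n | {
        seen=0
        while read -r name age; do
            seen=$((seen + 1))
            # Only snapshots beyond the first keep_count are candidates.
            if [ "$seen" -gt "$keep_count" ]; then
                if [ "$age" -gt "$max_age_days" ]; then
                    printf '%s\n' "$name"
                fi
            fi
        done
    }
}

# Keep the 2 newest; of the rest, select those older than 15 days.
printf '%s\n' 'snap-a 1' 'snap-b 20' 'snap-c 30' | select_expired_snapshots 2 15
# prints: snap-c
```
</pre></div>
<p>In this example snap-b is older than the age limit but is still kept, because it is one of the two newest snapshots. A selected snapshot could then be removed with hdfs dfs -deleteSnapshot.</p>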
<p><b>Note:</b> When TDE encryption is enabled on the source/target directories, DistCp ignores the snapshots and performs a regular replication. Although the user may not get the performance benefit of snapshot based DistCp, the extension is still useful for creating and maintaining snapshots.</p></div>
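<p>For orientation, one mirroring cycle corresponds roughly to the manual commands printed by the sketch below. This is an approximation and not necessarily what Falcon runs internally. The function name print_mirror_cycle, the snapshot names and the NameNode addresses are hypothetical. Mapping distcpMaxMaps to -m and distcpMapBandwidth to -bandwidth is an assumption. The snapshot based copy uses the DistCp -diff option together with -update, which assumes the previous snapshot exists on both source and target.</p>
<div class="source">
<pre>
```shell
#!/bin/sh
# Hypothetical dry-run helper (not part of Falcon): prints the commands for
# one snapshot based mirroring cycle.
#   $1 = source directory URI      $2 = target directory URI
#   $3 = previous snapshot name    $4 = new snapshot name
print_mirror_cycle() {
    src="$1"
    tgt="$2"
    prev="$3"
    new="$4"
    # 1. Snapshot the source directory.
    printf 'hdfs dfs -createSnapshot %s %s\n' "$src" "$new"
    # 2. Copy only the changes between the two source snapshots.
    #    "-m 1 -bandwidth 100" mirrors the sample properties (assumed mapping).
    printf 'hadoop distcp -update -diff %s %s -m 1 -bandwidth 100 %s %s\n' \
        "$prev" "$new" "$src" "$tgt"
    # 3. Snapshot the target directory under the same name.
    printf 'hdfs dfs -createSnapshot %s %s\n' "$tgt" "$new"
}

# Hypothetical NameNode addresses and snapshot names.
print_mirror_cycle \
    hdfs://primary-nn:8020/apps/falcon/snapshots/source \
    hdfs://backup-nn:8020/apps/falcon/snapshots/target \
    snap-1 snap-2
```
</pre></div>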
<div class="section">
<h4>Submit and schedule HDFS snapshot mirroring extension<a name="Submit_and_schedule_HDFS_snapshot_mirroring_extension"></a></h4>
<p>Users can submit the extension using the CLI or the REST API. The CLI command looks as follows:</p>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file properties-file.txt
</pre></div>
<p>Please refer to <a href="./Falconcli/FalconCLI.html">Falcon CLI</a> and <a href="./Restapi/ResourceList.html">REST API</a> for more details on using the CLI and the REST API.</p></div>
</div>
</div>
<hr/>
<footer>
<div class="container">
<div class="row span12">Copyright &copy; 2013-2018
<a href="http://www.apache.org">Apache Software Foundation</a>.
All Rights Reserved.
</div>
<p id="poweredBy" class="pull-right">
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
</a>
</p>
</div>
</footer>
</body>
</html>