blob: cd76edefcdae19ee58170b5ffcf45ad78f646d33 [file] [log] [blame]
<!DOCTYPE html>
<!--
| Generated by Apache Maven Doxia at 2018-03-12
| Rendered using Apache Maven Fluido Skin 1.3.0
-->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta name="Date-Revision-yyyymmdd" content="20180312" />
<meta http-equiv="Content-Language" content="en" />
<title>Falcon - HDFS Snapshot based Mirroring</title>
<link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
<link rel="stylesheet" href="./css/site.css" />
<link rel="stylesheet" href="./css/print.css" media="print" />
<script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script>
<script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script>
</head>
<body class="topBarDisabled">
<div class="container-fluid">
<div id="banner">
<div class="pull-left">
<a href="../index.html" id="bannerLeft">
<img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/>
</a>
</div>
<div class="pull-right"> <a href="http://www.apache.org" id="bannerRight">
<img src="images/apache-feather-tm.gif" alt="Falcon" height="45px"/>
</a>
</div>
<div class="clear"><hr/></div>
</div>
<div id="breadcrumbs">
<ul class="breadcrumb">
<li class="">
<a href="http://www.apache.org" class="externalLink" title="Apache">
Apache</a>
</li>
<li class="divider ">/</li>
<li class="">
<a href="index.html" title="Falcon">
Falcon</a>
</li>
<li class="divider ">/</li>
<li class="">HDFS Snapshot based Mirroring</li>
<li id="publishDate" class="pull-right">Last Published: 2018-03-12</li>
</ul>
</div>
<div class="row-fluid">
<div id="leftColumn" class="span3">
<div class="well sidebar-nav">
<ul class="nav nav-list">
<li class="nav-header">Falcon</li>
<li>
<a href="index.html" title="About">
<i class="none"></i>
About</a>
</li>
<li>
<a href="slides/falcon-overview.html" title="Overview">
<i class="none"></i>
Overview</a>
</li>
<li>
<a href="slides/falcon-user-guide.html" title="User Guide">
<i class="none"></i>
User Guide</a>
</li>
<li>
<a href="GettingStarted.html" title="Getting Started">
<i class="none"></i>
Getting Started</a>
</li>
<li>
<a href="FalconDocumentation.html" title="Architecture">
<i class="none"></i>
Architecture</a>
</li>
<li>
<a href="InstallationSteps.html" title="Installation">
<i class="none"></i>
Installation</a>
</li>
<li>
<a href="OnBoarding.html" title="On Boarding">
<i class="none"></i>
On Boarding</a>
</li>
<li>
<a href="MigrationInstructions.html" title="Migrate to 0.10">
<i class="none"></i>
Migrate to 0.10</a>
</li>
<li>
<a href="Operability.html" title="Operability">
<i class="none"></i>
Operability</a>
</li>
<li>
<a href="EntitySpecification.html" title="Entity Specification">
<i class="none"></i>
Entity Specification</a>
</li>
<li>
<a href="falconcli/FalconCLI.html" title="Client (Falcon CLI)">
<i class="none"></i>
Client (Falcon CLI)</a>
</li>
<li>
<a href="restapi/ResourceList.html" title="Rest API">
<i class="icon-chevron-right"></i>
Rest API</a>
</li>
<li>
<a href="HiveIntegration.html" title="Hive Integration">
<i class="none"></i>
Hive Integration</a>
</li>
<li>
<a href="Extensions.html" title="Server side Extensions">
<i class="none"></i>
Server side Extensions</a>
</li>
<li>
<a href="Security.html" title="Security">
<i class="none"></i>
Security</a>
</li>
<li class="nav-header">Project Information</li>
<li>
<a href="project-info.html" title="Summary">
<i class="none"></i>
Summary</a>
</li>
<li>
<a href="mail-lists.html" title="Mailing Lists">
<i class="none"></i>
Mailing Lists</a>
</li>
<li>
<a href="http://webchat.freenode.net?channels=apachefalcon&uio=d4" class="externalLink" title="IRC">
<i class="none"></i>
IRC</a>
</li>
<li>
<a href="team-list.html" title="Team">
<i class="none"></i>
Team</a>
</li>
<li>
<a href="issue-tracking.html" title="Issue Tracking">
<i class="none"></i>
Issue Tracking</a>
</li>
<li>
<a href="source-repository.html" title="Source Repository">
<i class="none"></i>
Source Repository</a>
</li>
<li>
<a href="https://cwiki.apache.org/confluence/display/FALCON/Index" class="externalLink" title="Wiki">
<i class="none"></i>
Wiki</a>
</li>
<li>
<a href="license.html" title="License">
<i class="none"></i>
License</a>
</li>
<li>
<a href="https://cwiki.apache.org/confluence/display/FALCON/News" class="externalLink" title="News">
<i class="none"></i>
News</a>
</li>
<li>
<a href="https://cwiki.apache.org/confluence/display/FALCON/PoweredBy" class="externalLink" title="Powered by">
<i class="none"></i>
Powered by</a>
</li>
<li>
<a href="https://cwiki.apache.org/confluence/display/FALCON/Acknowledgements" class="externalLink" title="Acknowledgements">
<i class="none"></i>
Acknowledgements</a>
</li>
<li>
<a href="http://blogs.apache.org/falcon/" class="externalLink" title="Blog">
<i class="none"></i>
Blog</a>
</li>
<li class="nav-header">Releases</li>
<li>
<a href="http://www.apache.org/dyn/closer.lua/falcon/0.11" class="externalLink" title="0.11">
<i class="none"></i>
0.11</a>
</li>
<li>
<a href="http://www.apache.org/dyn/closer.lua/falcon/0.10" class="externalLink" title="0.10">
<i class="none"></i>
0.10</a>
</li>
<li>
<a href="http://www.apache.org/dyn/closer.lua/falcon/0.9" class="externalLink" title="0.9">
<i class="none"></i>
0.9</a>
</li>
<li>
<a href="http://www.apache.org/dyn/closer.lua/falcon/0.8" class="externalLink" title="0.8">
<i class="none"></i>
0.8</a>
</li>
<li>
<a href="http://www.apache.org/dyn/closer.lua/falcon/0.7" class="externalLink" title="0.7">
<i class="none"></i>
0.7</a>
</li>
<li>
<a href="http://archive.apache.org/dist/falcon/0.6.1" class="externalLink" title="0.6.1">
<i class="none"></i>
0.6.1</a>
</li>
<li>
<a href="http://archive.apache.org/dist/incubator/falcon/0.6-incubating" class="externalLink" title="0.6-incubating">
<i class="none"></i>
0.6-incubating</a>
</li>
<li>
<a href="http://archive.apache.org/dist/incubator/falcon/0.5-incubating" class="externalLink" title="0.5-incubating">
<i class="none"></i>
0.5-incubating</a>
</li>
<li>
<a href="http://archive.apache.org/dist/incubator/falcon/0.4-incubating" class="externalLink" title="0.4-incubating">
<i class="none"></i>
0.4-incubating</a>
</li>
<li>
<a href="http://archive.apache.org/dist/incubator/falcon/0.3-incubating" class="externalLink" title="0.3-incubating">
<i class="none"></i>
0.3-incubating</a>
</li>
<li>
<a href="https://cwiki.apache.org/confluence/display/FALCON/Roadmap" class="externalLink" title="Coming soon">
<i class="none"></i>
Coming soon</a>
</li>
<li class="nav-header">Documentation</li>
<li>
<a href="0.11/index.html" title="0.11 (Current)">
<i class="none"></i>
0.11 (Current)</a>
</li>
<li>
<a href="0.10/index.html" title="0.10">
<i class="none"></i>
0.10</a>
</li>
<li>
<a href="0.9/index.html" title="0.9">
<i class="none"></i>
0.9</a>
</li>
<li>
<a href="0.8/index.html" title="0.8">
<i class="none"></i>
0.8</a>
</li>
<li>
<a href="0.7/index.html" title="0.7">
<i class="none"></i>
0.7</a>
</li>
<li>
<a href="0.6.1/index.html" title="0.6.1">
<i class="none"></i>
0.6.1</a>
</li>
<li>
<a href="0.6-incubating/index.html" title="0.6-incubating">
<i class="none"></i>
0.6-incubating</a>
</li>
<li>
<a href="0.5-incubating/index.html" title="0.5-incubating">
<i class="none"></i>
0.5-incubating</a>
</li>
<li>
<a href="0.4-incubating/index.html" title="0.4-incubating">
<i class="none"></i>
0.4-incubating</a>
</li>
<li>
<a href="0.3-incubating/index.html" title="0.3-incubating">
<i class="none"></i>
0.3-incubating</a>
</li>
<li class="nav-header">ASF</li>
<li>
<a href="http://www.apache.org/foundation/how-it-works.html" class="externalLink" title="How Apache Works">
<i class="none"></i>
How Apache Works</a>
</li>
<li>
<a href="http://www.apache.org/foundation/" class="externalLink" title="Foundation">
<i class="none"></i>
Foundation</a>
</li>
<li>
<a href="http://www.apache.org/foundation/sponsorship.html" class="externalLink" title="Sponsoring Apache">
<i class="none"></i>
Sponsoring Apache</a>
</li>
<li>
<a href="http://www.apache.org/foundation/thanks.html" class="externalLink" title="Thanks">
<i class="none"></i>
Thanks</a>
</li>
</ul>
<hr class="divider" />
<div id="poweredBy">
<div class="clear"></div>
<div class="clear"></div>
<div class="clear"></div>
<a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy">
<img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" />
</a>
</div>
</div>
</div>
<div id="bodyColumn" class="span9" >
<div class="section">
<h2>HDFS Snapshot based Mirroring<a name="HDFS_Snapshot_based_Mirroring"></a></h2></div>
<div class="section">
<h3>Overview<a name="Overview"></a></h3>
<p>HDFS snapshots are very cost effective to create ( cost is O(1) excluding iNode lookup time). Once created, it is very efficient to find modifications relative to a snapshot and copy over these modifications for disaster recovery (DR). This makes for cost effective HDFS mirroring.</p></div>
<div class="section">
<h3>Prerequisites<a name="Prerequisites"></a></h3>
<p>Following is the prerequisite to use HDFS Snapshot based Mirrroring.</p>
<p></p>
<ul>
<li>Hadoop version 2.7.0 or higher.</li>
<li>User submitting and scheduling falcon snapshot based mirroring job should have permission to create and manage snapshots on both source and target directories.</li></ul></div>
<div class="section">
<h3>Use Case<a name="Use_Case"></a></h3>
<p>Create and manage snapshots on source/target directories. Mirror data from source to target for disaster recovery using these snapshots. Perform retention on the snapshots created on source and target.</p></div>
<div class="section">
<h3>Usage<a name="Usage"></a></h3></div>
<div class="section">
<h4>Setup<a name="Setup"></a></h4>
<p></p>
<ul>
<li>Submit a source cluster and target cluster entities to Falcon.</li></ul>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml
$FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml
</pre></div>
<p></p>
<ul>
<li>Ensure that source directory on source cluster and target directory on target cluster exists.</li>
<li>Ensure that these dirs are snapshot-able by user submitting extension. You can find more <a class="externalLink" href="https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">information on snapshots here</a>.</li></ul></div>
<div class="section">
<h4>HDFS Snapshot based mirroring extension properties<a name="HDFS_Snapshot_based_mirroring_extension_properties"></a></h4>
<p>Extension artifacts are expected to be installed on HDFS at the path specified by &quot;extension.store.uri&quot; in startup properties. hdfs-snapshot-mirroring-properties.json file located at &quot;&lt;extension.store.uri&gt;/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json&quot; lists all the required and optional parameters/arguments for scheduling the mirroring job.</p>
<p>Here is a sample set of properties,</p>
<div class="source">
<pre>
## Job Properties
jobName=hdfs-snapshot-test
jobClusterName=backupCluster
jobValidityStart=2016-01-01T00:00Z
jobValidityEnd=2016-04-01T00:00Z
jobFrequency=hours(12)
jobTimezone=UTC
jobTags=consumer=consumer@xyz.com
jobRetryPolicy=periodic
jobRetryDelay=minutes(30)
jobRetryAttempts=3
## Job owner
jobAclOwner=ambari-qa
jobAclGroup=users
jobAclPermission=*
## Source information
sourceCluster=primaryCluster
sourceSnapshotDir=/apps/falcon/snapshots/source/
sourceSnapshotRetentionPolicy=delete
sourceSnapshotRetentionAgeLimit=days(15)
sourceSnapshotRetentionNumber=10
## Target information
targetCluster=backupCluster
targetSnapshotDir=/apps/falcon/snapshots/target/
targetSnapshotRetentionPolicy=delete
targetSnapshotRetentionAgeLimit=months(6)
targetSnapshotRetentionNumber=20
## Distcp properties
distcpMaxMaps=1
distcpMapBandwidth=100
tdeEncryptionEnabled=false
</pre></div>
<p>The above properties ensure Falcon hdfs snapshot based mirroring extension does the following every 12 hours.</p>
<ul>
<li>Create snapshot on dir /apps/falcon/snapshots/source/ on primaryCluster.</li>
<li>DistCP data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster.</li>
<li>Create snapshot on dir /apps/falcon/snapshots/target/ on backupCluster.</li>
<li>Perform retention job on source and target.
<ul>
<li>Maintain at least N latest snapshots and delete all other snapshots older than specified age limit.</li>
<li>Today, only &quot;delete&quot; policy is supported for snapshot retention.</li></ul></li></ul>
<p><b>Note:</b> When TDE encryption is enabled on source/target directories, DistCP ignores the snapshots and treats it like a regular replication. While user may not get the performance benefit of using snapshot based DistCP, the extension is still useful for creating and maintaining snapshots.</p></div>
<div class="section">
<h4>Submit and schedule HDFS snapshot mirroring extension<a name="Submit_and_schedule_HDFS_snapshot_mirroring_extension"></a></h4>
<p>User can submit extension using CLI or RestAPI. CLI command looks as follows</p>
<div class="source">
<pre>
$FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file propeties-file.txt
</pre></div>
<p>Please Refer to <a href="./Falconcli/FalconCLI.html">Falcon CLI</a> and <a href="./Restapi/ResourceList.html">REST API</a> for more details on usage of CLI and REST API's.</p></div>
</div>
</div>
</div>
<hr/>
<footer>
<div class="container-fluid">
<div class="row span12">Copyright &copy; 2013-2018
<a href="http://www.apache.org">Apache Software Foundation</a>.
All Rights Reserved.
</div>
</div>
</footer>
</body>
</html>