| <!DOCTYPE html> |
| <!-- |
| | Generated by Apache Maven Doxia at 2018-03-12 |
| | Rendered using Apache Maven Fluido Skin 1.3.0 |
| --> |
| <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| <head> |
| <meta charset="UTF-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> |
| <meta name="Date-Revision-yyyymmdd" content="20180312" /> |
| <meta http-equiv="Content-Language" content="en" /> |
| <title>Falcon - HDFS Snapshot based Mirroring</title> |
| <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" /> |
| <link rel="stylesheet" href="./css/site.css" /> |
| <link rel="stylesheet" href="./css/print.css" media="print" /> |
| |
| |
| <script type="text/javascript" src="./js/apache-maven-fluido-1.3.0.min.js"></script> |
| |
| |
| |
| <script type="text/javascript">$( document ).ready( function() { $( '.carousel' ).carousel( { interval: 3500 } ) } );</script> |
| |
| </head> |
| <body class="topBarDisabled"> |
| |
| |
| |
| |
| <div class="container"> |
| <div id="banner"> |
| <div class="pull-left"> |
| <div id="bannerLeft"> |
| <img src="images/falcon-logo.png" alt="Apache Falcon" width="200px" height="45px"/> |
| </div> |
| </div> |
| <div class="pull-right"> </div> |
| <div class="clear"><hr/></div> |
| </div> |
| |
| <div id="breadcrumbs"> |
| <ul class="breadcrumb"> |
| |
| |
| <li class=""> |
| <a href="index.html" title="Falcon"> |
| Falcon</a> |
| </li> |
| <li class="divider ">/</li> |
| <li class="">HDFS Snapshot based Mirroring</li> |
| |
| |
| |
| <li id="publishDate" class="pull-right">Last Published: 2018-03-12</li> <li class="divider pull-right">|</li> |
| <li id="projectVersion" class="pull-right">Version: 0.11</li> |
| |
| </ul> |
| </div> |
| |
| |
| |
| <div id="bodyColumn" > |
| |
| <div class="section"> |
| <h2>HDFS Snapshot based Mirroring<a name="HDFS_Snapshot_based_Mirroring"></a></h2></div> |
| <div class="section"> |
| <h3>Overview<a name="Overview"></a></h3> |
| <p>HDFS snapshots are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is efficient to find the modifications made relative to it and copy only those modifications over for disaster recovery (DR). This makes HDFS mirroring cost-effective.</p></div> |
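| <p>The snapshot-then-copy idea described above can be sketched with the standard Hadoop command-line tools. This is only an illustration of the general technique, not necessarily what the Falcon extension runs internally. The NameNode addresses and the snapshot names snap-1 and snap-2 are assumptions made for the example, and the script only prints the commands unless DRY_RUN=0 is set.</p> |

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the Hadoop commands instead of executing them.
# Set DRY_RUN=0 on a host with Hadoop clients configured to really run them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical endpoints and snapshot names, for illustration only.
SRC="hdfs://primary-nn:8020/apps/falcon/snapshots/source"
TGT="hdfs://backup-nn:8020/apps/falcon/snapshots/target"
PREV="snap-1"   # snapshot that already exists on both source and target
CURR="snap-2"   # snapshot about to be created

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Take a new snapshot of the source directory.
run hdfs dfs -createSnapshot "$SRC" "$CURR"
# 2. List what changed between the previous and the new snapshot.
run hdfs snapshotDiff "$SRC" "$PREV" "$CURR"
# 3. Copy only those changes to the target.
run hadoop distcp -update -diff "$PREV" "$CURR" "$SRC" "$TGT"
# 4. Snapshot the target under the same name so the next run can diff from it.
run hdfs dfs -createSnapshot "$TGT" "$CURR"
```

| <p>Note that "distcp -diff" expects the target to already hold the earlier snapshot under the same name and to be unmodified since that snapshot was taken.</p> |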
| <div class="section"> |
| <h3>Prerequisites<a name="Prerequisites"></a></h3> |
| <p>The following are the prerequisites for using HDFS snapshot based mirroring.</p> |
| <ul> |
| <li>Hadoop version 2.7.0 or higher.</li> |
| <li>The user submitting and scheduling the Falcon snapshot based mirroring job must have permission to create and manage snapshots on both the source and target directories.</li></ul></div> |
| <div class="section"> |
| <h3>Use Case<a name="Use_Case"></a></h3> |
| <p>Create and manage snapshots on source/target directories. Mirror data from source to target for disaster recovery using these snapshots. Perform retention on the snapshots created on source and target.</p></div> |
| <div class="section"> |
| <h3>Usage<a name="Usage"></a></h3></div> |
| <div class="section"> |
| <h4>Setup<a name="Setup"></a></h4> |
| <ul> |
| <li>Submit the source cluster and target cluster entities to Falcon.</li></ul> |
| <div class="source"> |
| <pre> |
| $FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml |
| $FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml |
| |
| </pre></div> |
| <ul> |
| <li>Ensure that the source directory on the source cluster and the target directory on the target cluster exist.</li> |
| <li>Ensure that these directories are snapshottable by the user submitting the extension. You can find more <a class="externalLink" href="https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html">information on snapshots here</a>.</li></ul></div> |
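| <p>As a rough sketch of the directory preparation described above, the commands below create a directory and mark it as snapshottable using the standard HDFS tools. The helper names run and make_snapshottable are made up for this example, and the paths are the ones used in the sample properties on this page. Running "hdfs dfsadmin -allowSnapshot" requires HDFS superuser privileges. The script only prints the commands unless DRY_RUN=0 is set.</p> |

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the HDFS commands instead of executing them.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the directory if needed and allow snapshots on it.
make_snapshottable() {
  run hdfs dfs -mkdir -p "$1"
  run hdfs dfsadmin -allowSnapshot "$1"
}

# Run the first call against the source cluster and the second against the target cluster.
make_snapshottable /apps/falcon/snapshots/source
make_snapshottable /apps/falcon/snapshots/target

# As the submitting user: list the directories this user is allowed to snapshot.
run hdfs lsSnapshottableDir
```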
| <div class="section"> |
| <h4>HDFS Snapshot based mirroring extension properties<a name="HDFS_Snapshot_based_mirroring_extension_properties"></a></h4> |
| <p>Extension artifacts are expected to be installed on HDFS at the path specified by "extension.store.uri" in the startup properties. The hdfs-snapshot-mirroring-properties.json file, located at "&lt;extension.store.uri&gt;/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json", lists all the required and optional parameters for scheduling the mirroring job.</p> |
| <p>Here is a sample set of properties:</p> |
| <div class="source"> |
| <pre> |
| ## Job Properties |
| jobName=hdfs-snapshot-test |
| jobClusterName=backupCluster |
| jobValidityStart=2016-01-01T00:00Z |
| jobValidityEnd=2016-04-01T00:00Z |
| jobFrequency=hours(12) |
| jobTimezone=UTC |
| jobTags=consumer=consumer@xyz.com |
| jobRetryPolicy=periodic |
| jobRetryDelay=minutes(30) |
| jobRetryAttempts=3 |
| |
| ## Job owner |
| jobAclOwner=ambari-qa |
| jobAclGroup=users |
| jobAclPermission=* |
| |
| ## Source information |
| sourceCluster=primaryCluster |
| sourceSnapshotDir=/apps/falcon/snapshots/source/ |
| sourceSnapshotRetentionPolicy=delete |
| sourceSnapshotRetentionAgeLimit=days(15) |
| sourceSnapshotRetentionNumber=10 |
| |
| ## Target information |
| targetCluster=backupCluster |
| targetSnapshotDir=/apps/falcon/snapshots/target/ |
| targetSnapshotRetentionPolicy=delete |
| targetSnapshotRetentionAgeLimit=months(6) |
| targetSnapshotRetentionNumber=20 |
| |
| ## Distcp properties |
| distcpMaxMaps=1 |
| distcpMapBandwidth=100 |
| tdeEncryptionEnabled=false |
| |
| </pre></div> |
| <p>With the above properties, the Falcon HDFS snapshot based mirroring extension does the following every 12 hours:</p> |
| <ul> |
| <li>Create a snapshot of the directory /apps/falcon/snapshots/source/ on primaryCluster.</li> |
| <li>DistCP data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster.</li> |
| <li>Create a snapshot of the directory /apps/falcon/snapshots/target/ on backupCluster.</li> |
| <li>Perform a retention job on source and target. |
| <ul> |
| <li>Keep at least the N latest snapshots and delete any other snapshot older than the specified age limit.</li> |
| <li>Currently, only the "delete" policy is supported for snapshot retention.</li></ul></li></ul> |
| <p><b>Note:</b> When TDE encryption is enabled on the source/target directories, DistCP ignores the snapshots and treats the job like a regular replication. Although the user may not get the performance benefit of snapshot based DistCP, the extension is still useful for creating and maintaining snapshots.</p></div> |
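| <p>The retention rule listed above (keep at least the N latest snapshots, delete the rest once they exceed the age limit) can be illustrated with a small sketch. This is one reading of the rule, not Falcon's actual implementation. The function name select_expired, the snapshot names, and the timestamps are all made up for the example. The script prints the delete commands rather than running them.</p> |

```shell
#!/usr/bin/env bash
# Sketch of a "keep N newest, delete the rest if older than the age limit" rule.
set -eu

# select_expired NOW AGE_LIMIT_SECONDS KEEP_NUMBER
# Reads "name creation_epoch_seconds" lines on stdin and prints the names to delete.
select_expired() {
  # Sort newest first, skip the KEEP_NUMBER newest, then test the rest by age.
  sort -k2,2nr | awk -v now="$1" -v limit="$2" -v keep="$3" \
    'NR > keep && (now - $2) > limit { print $1 }'
}

DAY=86400
NOW=$((100 * DAY))   # hypothetical "current time", 100 days after epoch

# Hypothetical snapshots, aged 40, 20, 10 and 1 days relative to NOW.
snapshots() {
  printf '%s %s\n' \
    snap-a $((60 * DAY)) \
    snap-b $((80 * DAY)) \
    snap-c $((90 * DAY)) \
    snap-d $((99 * DAY))
}

# Keep at least 2 snapshots and expire anything else older than 15 days.
snapshots | select_expired "$NOW" $((15 * DAY)) 2 | while read -r name; do
  echo "hdfs dfs -deleteSnapshot /apps/falcon/snapshots/source $name"
done
```

| <p>With these inputs, the two newest snapshots (snap-d and snap-c) are always kept, and snap-b and snap-a are selected for deletion because they are older than 15 days.</p> |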
| <div class="section"> |
| <h4>Submit and schedule HDFS snapshot mirroring extension<a name="Submit_and_schedule_HDFS_snapshot_mirroring_extension"></a></h4> |
| <p>Users can submit the extension using the CLI or the REST API. The CLI command looks as follows:</p> |
| <div class="source"> |
| <pre> |
| $FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file properties-file.txt |
| |
| </pre></div> |
| <p>Please refer to <a href="./Falconcli/FalconCLI.html">Falcon CLI</a> and <a href="./Restapi/ResourceList.html">REST API</a> for more details on using the CLI and REST APIs.</p></div> |
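| <p>Before submitting, it can help to check that the properties file defines the keys you expect. The sketch below is a hypothetical helper, not part of Falcon. Its key list is simply copied from the sample properties on this page and is an assumption; the authoritative list of required parameters is in hdfs-snapshot-mirroring-properties.json.</p> |

```shell
#!/usr/bin/env bash
# Hypothetical pre-submit check for a mirroring properties file.
set -eu

# Keys copied from the sample on this page; adjust to match
# hdfs-snapshot-mirroring-properties.json on your installation.
KEYS="jobName jobClusterName jobValidityStart jobValidityEnd jobFrequency sourceCluster sourceSnapshotDir targetCluster targetSnapshotDir"

# Prints every key from KEYS that has no non-empty "key=value" line in file $1.
missing_keys() {
  for key in $KEYS; do
    grep -Eq "^${key}=.+" "$1" || echo "$key"
  done
}

# Example: a deliberately incomplete file, so the missing keys are reported.
props="$(mktemp)"
printf '%s\n' 'jobName=hdfs-snapshot-test' 'sourceCluster=primaryCluster' > "$props"
missing_keys "$props"
rm -f "$props"
```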
| </div> |
| </div> |
| |
| <hr/> |
| |
| <footer> |
| <div class="container"> |
| <div class="row span12">Copyright © 2013-2018 |
| <a href="http://www.apache.org">Apache Software Foundation</a>. |
| All Rights Reserved. |
| |
| </div> |
| |
| |
| <p id="poweredBy" class="pull-right"> |
| <a href="http://maven.apache.org/" title="Built by Maven" class="poweredBy"> |
| <img class="builtBy" alt="Built by Maven" src="./images/logos/maven-feather.png" /> |
| </a> |
| </p> |
| |
| </div> |
| </footer> |
| </body> |
| </html> |