| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="utf-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>Apache Aurora</title> |
| <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css"> |
| <link href="/assets/css/main.css" rel="stylesheet"> |
| <!-- Analytics --> |
| <script type="text/javascript"> |
| var _gaq = _gaq || []; |
| _gaq.push(['_setAccount', 'UA-45879646-1']); |
| _gaq.push(['_setDomainName', 'apache.org']); |
| _gaq.push(['_trackPageview']); |
| |
| (function() { |
| var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; |
| ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; |
| var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); |
| })(); |
| </script> |
| </head> |
| <body> |
| <div class="container-fluid section-header"> |
| <div class="container"> |
| <div class="nav nav-bar"> |
| <a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a> |
| <ul class="nav navbar-nav navbar-right"> |
| <li><a href="/documentation/latest/">Documentation</a></li> |
| <li><a href="/community/">Community</a></li> |
| <li><a href="/downloads/">Downloads</a></li> |
| <li><a href="/blog/">Blog</a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| |
| <div class="container-fluid"> |
| <div class="container content"> |
| <div class="col-md-12 documentation"> |
| <h5 class="page-header text-uppercase">Documentation |
| <select onChange="window.location.href='/documentation/' + this.value + '/operations/backup-restore/'" |
| value="0.21.0"> |
| <option value="0.22.0" |
| > |
| 0.22.0 |
| (latest) |
| </option> |
| <option value="0.21.0" |
| selected="selected"> |
| 0.21.0 |
| </option> |
| <option value="0.20.0" |
| > |
| 0.20.0 |
| </option> |
| <option value="0.19.1" |
| > |
| 0.19.1 |
| </option> |
| <option value="0.19.0" |
| > |
| 0.19.0 |
| </option> |
| <option value="0.18.1" |
| > |
| 0.18.1 |
| </option> |
| <option value="0.18.0" |
| > |
| 0.18.0 |
| </option> |
| <option value="0.17.0" |
| > |
| 0.17.0 |
| </option> |
| <option value="0.16.0" |
| > |
| 0.16.0 |
| </option> |
| <option value="0.15.0" |
| > |
| 0.15.0 |
| </option> |
| <option value="0.14.0" |
| > |
| 0.14.0 |
| </option> |
| <option value="0.13.0" |
| > |
| 0.13.0 |
| </option> |
| <option value="0.12.0" |
| > |
| 0.12.0 |
| </option> |
| <option value="0.11.0" |
| > |
| 0.11.0 |
| </option> |
| <option value="0.10.0" |
| > |
| 0.10.0 |
| </option> |
| <option value="0.9.0" |
| > |
| 0.9.0 |
| </option> |
| <option value="0.8.0" |
| > |
| 0.8.0 |
| </option> |
| <option value="0.7.0-incubating" |
| > |
| 0.7.0-incubating |
| </option> |
| <option value="0.6.0-incubating" |
| > |
| 0.6.0-incubating |
| </option> |
| <option value="0.5.0-incubating" |
| > |
| 0.5.0-incubating |
| </option> |
| </select> |
| </h5> |
| <h1 id="recovering-from-a-scheduler-backup">Recovering from a Scheduler Backup</h1> |
| |
| <p><strong>Be sure to read the entire page before attempting to restore from a backup, as it may have |
| unintended consequences.</strong></p> |
| |
| <h2 id="summary">Summary</h2> |
| |
| <p>The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an |
| earlier backed-up version, and requires all schedulers to be taken down temporarily while |
| restoring. Once the restore completes, the scheduler state resets to what it was when the backup was created. |
| This means any jobs or tasks created or updated after the backup are unknown to the scheduler and will |
| be killed shortly after the cluster restarts. All other tasks continue operating as normal.</p> |
| |
| <p>In general, avoid restoring a backup that is more than a few hours old. The scheduler expects the |
| cluster to look exactly as it did when the backup was taken, so any task that has been rescheduled |
| since then will be killed.</p> |
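| <p>Before committing to a candidate backup, you can check how old it is. The sketch below parses the timestamp out of a backup file name (backups are named <code>scheduler-backup-&lt;yyyy-MM-dd-HH-mm&gt;</code>) and prints its age in whole hours. The function name <code>backup_age_hours</code> is hypothetical. The sketch assumes GNU or BusyBox <code>date -d</code>, and it assumes the timestamp is in the local time zone of the host you run it on, so run it on the scheduler host.</p> |
```shell
# Print the age, in whole hours, of a scheduler backup.
# Usage: backup_age_hours BACKUP_PATH [NOW_EPOCH_SECONDS]
# Assumes the scheduler-backup-yyyy-MM-dd-HH-mm naming and local-time timestamps.
backup_age_hours() {
  local name now taken
  name=$(basename "$1")
  now=${2:-$(date +%s)}
  # Split yyyy-MM-dd-HH-mm into five positional fields: $1..$5.
  set -- $(printf '%s\n' "${name#scheduler-backup-}" | tr '-' ' ')
  [ "$#" -eq 5 ] || return 1
  taken=$(date -d "$1-$2-$3 $4:$5" +%s) || return 1
  echo $(( (now - taken) / 3600 ))
}
```
| <p>For example, <code>backup_age_hours /path/to/backup_dir/scheduler-backup-2017-01-02-03-04</code> prints the backup's age relative to the current time. This page does not define an exact cutoff, so compare the result against whatever threshold your team treats as "a few hours".</p> |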
| |
| <p>The instructions below have been verified in the <a href="../../getting-started/vagrant/">Vagrant environment</a> and, with minor |
| syntax and path changes, should apply to any Aurora cluster.</p> |
| |
| <p>Follow these steps to prepare the cluster for restoring from a backup:</p> |
| |
| <h2 id="preparation">Preparation</h2> |
| |
| <ul> |
| <li><p>Stop all scheduler instances.</p></li> |
| <li><p>Pick a backup to use for rehydrating the Mesos replicated log. Backups can be found in the |
| directory given to the scheduler as the <code>-backup_dir</code> argument. Backups are stored in the format |
| <code>scheduler-backup-&lt;yyyy-MM-dd-HH-mm&gt;</code>.</p></li> |
| <li><p>If running the Aurora Scheduler in HA mode, pick a single scheduler instance to rehydrate.</p></li> |
| <li><p>Locate the <code>recovery-tool</code> in your setup. If Aurora was installed using a Debian package |
| generated by our <code>aurora-packaging</code> script, the recovery tool can be found |
| in <code>/usr/share/aurora/bin/recovery-tool</code>.</p></li> |
| </ul> |
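| <p>Part of the preparation can be scripted. The sketch below uses hypothetical function names. It picks the newest backup in the directory passed as <code>-backup_dir</code> and locates <code>recovery-tool</code>, falling back to the Debian package location mentioned above. It relies on the zero-padded timestamp in the backup names, which makes alphabetical order match chronological order. Keep in mind that the newest backup is not the right choice if it was taken after whatever corruption you are recovering from.</p> |
```shell
# Print the newest backup in a backup directory (the scheduler's -backup_dir).
# Backups are named scheduler-backup-yyyy-MM-dd-HH-mm; the zero-padded fields
# make name order match time order, and glob results are sorted, so the
# last match is the newest.
latest_backup() {
  local dir latest f
  dir="$1"
  latest=""
  for f in "$dir"/scheduler-backup-*; do
    [ -e "$f" ] || continue   # no match: the unexpanded pattern is returned
    latest="$f"
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Print the path to recovery-tool: prefer one on PATH, otherwise fall back
# to the location used by Debian packages built with aurora-packaging.
find_recovery_tool() {
  if command -v recovery-tool >/dev/null 2>&1; then
    command -v recovery-tool
  elif [ -x /usr/share/aurora/bin/recovery-tool ]; then
    printf '%s\n' /usr/share/aurora/bin/recovery-tool
  else
    echo "recovery-tool not found" >&2
    return 1
  fi
}
```
| <p>For example, <code>latest_backup /path/to/backup_dir</code> prints the full path of the most recent backup, or fails if the directory holds none.</p> |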
| |
| <h2 id="cleanup">Cleanup</h2> |
| |
| <ul> |
| <li><p>Delete (or move) the Mesos replicated log path for each scheduler instance. The path is the |
| value given to the <code>-native_log_file_path</code> flag for each instance.</p></li> |
| <li><p>Initialize the Mesos replicated log files using the mesos-log tool:</p> |
| <pre><code>sudo -u &lt;USER&gt; mesos-log initialize --path=&lt;native_log_file_path&gt; |
| </code></pre> |
| <p>where <code>USER</code> is the user under which the scheduler instance will be run. For installations using |
| Debian packages, the default user will be <code>aurora</code>. You may also specify |
| a group by passing the <code>-g &lt;GROUP&gt;</code> option to <code>sudo</code>. |
| Note that if the user under which the Aurora scheduler instance is run <em>does not</em> have permission |
| to read and write this directory and the files it contains, the instance will fail to start.</p></li> |
| </ul> |
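| <p>Here is a minimal sketch of the cleanup for one instance. It assumes you run it as root on that scheduler host, that the scheduler runs as the Debian default user <code>aurora</code>, and that you prefer moving the old log aside to deleting it. The function name and the <code>.pre-restore.</code> suffix are hypothetical. Setting <code>RUN=echo</code> prints the <code>mesos-log</code> command instead of running it, but the old log is still moved.</p> |
```shell
# Move an instance's replicated log aside and initialize an empty one.
# Usage: reset_native_log NATIVE_LOG_FILE_PATH [USER]
# RUN=echo makes the mesos-log step a dry run that only prints the command;
# the move itself always happens.
reset_native_log() {
  local log_path user moved_to
  log_path="$1"
  user="${2:-aurora}"   # default scheduler user for the Debian packages
  if [ -e "$log_path" ]; then
    moved_to="$log_path.pre-restore.$(date +%s)"
    mv "$log_path" "$moved_to" || return 1
    echo "moved $log_path to $moved_to" >&2
  fi
  ${RUN:-} sudo -u "$user" mesos-log initialize --path="$log_path"
}
```
| <p>Run it once on every scheduler instance, passing that instance's <code>-native_log_file_path</code> value. Keeping the old log around until the cluster is healthy again leaves a way back if the restore goes wrong.</p> |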
| |
| <h2 id="restore-from-backup">Restore from backup</h2> |
| |
| <ul> |
| <li><p>Run the <code>recovery-tool</code>. Wherever the flags match those used for the scheduler instance, |
| use the same values:</p> |
| <pre><code>$ recovery-tool -from BACKUP \ |
|     -to LOG \ |
|     -backup=&lt;selected_backup_location&gt; \ |
|     -native_log_zk_group_path=&lt;native_log_zk_group_path&gt; \ |
|     -native_log_file_path=&lt;native_log_file_path&gt; \ |
|     -zk_endpoints=&lt;zk_endpoints&gt; |
| </code></pre></li> |
| </ul> |
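| <p>The <code>recovery-tool</code> flags should match the scheduler's own, so it can help to derive them from the scheduler's flag list rather than retyping them. The sketch below assumes the flags are available in <code>-name=value</code> form, for example copied from the scheduler's start-up script. The function names are hypothetical, and <code>RUN=echo</code> prints the command instead of executing it.</p> |
```shell
# Print the value of the first "-NAME=VALUE" argument after NAME; fail if absent.
flag_value() {
  local name arg
  name="$1"; shift
  for arg in "$@"; do
    case "$arg" in
      -"$name"=*) printf '%s\n' "${arg#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Run recovery-tool against BACKUP, reusing the scheduler's own flag values.
# Usage: restore_from_backup BACKUP SCHEDULER_FLAG...
# RUN=echo makes this a dry run that only prints the command.
restore_from_backup() {
  local backup zk_group log_path zk
  backup="$1"; shift
  zk_group=$(flag_value native_log_zk_group_path "$@") || { echo "missing -native_log_zk_group_path" >&2; return 1; }
  log_path=$(flag_value native_log_file_path "$@") || { echo "missing -native_log_file_path" >&2; return 1; }
  zk=$(flag_value zk_endpoints "$@") || { echo "missing -zk_endpoints" >&2; return 1; }
  ${RUN:-} recovery-tool -from BACKUP -to LOG \
    -backup="$backup" \
    -native_log_zk_group_path="$zk_group" \
    -native_log_file_path="$log_path" \
    -zk_endpoints="$zk"
}
```
| <p>For example, suppose a hypothetical <code>SCHEDULER_FLAGS</code> variable holds the scheduler's flags and <code>BACKUP</code> holds the chosen backup path. Then <code>RUN=echo restore_from_backup "$BACKUP" $SCHEDULER_FLAGS</code> prints the full command for review; drop <code>RUN=echo</code> to run it. Because <code>$SCHEDULER_FLAGS</code> is left unquoted and split on whitespace, this assumes no flag value contains spaces.</p> |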
| |
| <h2 id="bring-scheduler-instances-back-online">Bring scheduler instances back online</h2> |
| |
| <h3 id="if-running-in-ha-mode">If running in HA Mode</h3> |
| |
| <ul> |
| <li><p>Start the rehydrated scheduler instance along with enough cleaned-up instances to |
| meet the <code>-native_log_quorum_size</code>. The Mesos replicated log algorithm will replenish |
| the “blank” scheduler instances with the information from the rehydrated instance.</p></li> |
| <li><p>Start any remaining scheduler instances.</p></li> |
| </ul> |
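| <p>You can work out the start order ahead of time. This sketch uses a hypothetical function name and hypothetical host names. It prints the rehydrated instance first, followed by <code>-native_log_quorum_size</code> minus one cleaned-up instances that must join it; start the remaining hosts afterwards. How each instance is actually started (init system, SSH) depends on your deployment and is left out.</p> |
```shell
# Print the hosts to start first: the rehydrated instance plus enough
# cleaned-up ("blank") instances to reach the native log quorum size.
# Usage: first_wave QUORUM_SIZE REHYDRATED_HOST BLANK_HOST...
first_wave() {
  local quorum rehydrated needed host
  quorum="$1"; rehydrated="$2"; shift 2
  needed=$((quorum - 1))
  if [ "$#" -lt "$needed" ]; then
    echo "need $needed blank instances, only $# given" >&2
    return 1
  fi
  echo "$rehydrated"
  for host in "$@"; do
    [ "$needed" -gt 0 ] || break
    echo "$host"
    needed=$((needed - 1))
  done
}
```
| <p>For example, with a quorum size of 3, <code>first_wave 3 sched1 sched2 sched3 sched4 sched5</code> prints <code>sched1</code>, <code>sched2</code> and <code>sched3</code>, one per line. <code>sched4</code> and <code>sched5</code> can be started once those three are up.</p> |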
| |
| <h3 id="if-running-in-singleton-mode">If running in singleton mode</h3> |
| |
| <ul> |
| <li>Start the single scheduler instance.</li> |
| </ul> |
| |
| </div> |
| |
| </div> |
| </div> |
| <div class="container-fluid section-footer buffer"> |
| <div class="container"> |
| <div class="row"> |
| <div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3> |
| <ul> |
| <li><a href="/downloads/">Downloads</a></li> |
| <li><a href="/community/">Mailing Lists</a></li> |
| <li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li> |
| <li><a href="/documentation/latest/contributing/">How To Contribute</a></li> |
| </ul> |
| </div> |
| <div class="col-md-2"><h3>The ASF</h3> |
| <ul> |
| <li><a href="http://www.apache.org/licenses/">License</a></li> |
| <li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li> |
| <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li> |
| <li><a href="http://www.apache.org/security/">Security</a></li> |
| </ul> |
| </div> |
| <div class="col-md-6"> |
| <p class="disclaimer">© 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p> |
| </div> |
| </div> |
| </div> |
| |
| </body> |
| </html> |