<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Apache Aurora</title>
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.1/css/bootstrap.min.css">
<link href="/assets/css/main.css" rel="stylesheet">
<!-- Analytics -->
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-45879646-1']);
_gaq.push(['_setDomainName', 'apache.org']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
</head>
<body>
<div class="container-fluid section-header">
<div class="container">
<div class="nav nav-bar">
<a href="/"><img src="/assets/img/aurora_logo_dkbkg.svg" width="300" alt="Transparent Apache Aurora logo with dark background"/></a>
<ul class="nav navbar-nav navbar-right">
<li><a href="/documentation/latest/">Documentation</a></li>
<li><a href="/community/">Community</a></li>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/blog/">Blog</a></li>
</ul>
</div>
</div>
</div>
<div class="container-fluid">
<div class="container content">
<div class="col-md-12 documentation">
<h5 class="page-header text-uppercase">Documentation
<select onChange="window.location.href='/documentation/' + this.value + '/operations/backup-restore/'"
value="0.21.0">
<option value="0.22.0"
>
0.22.0
(latest)
</option>
<option value="0.21.0"
selected="selected">
0.21.0
</option>
<option value="0.20.0"
>
0.20.0
</option>
<option value="0.19.1"
>
0.19.1
</option>
<option value="0.19.0"
>
0.19.0
</option>
<option value="0.18.1"
>
0.18.1
</option>
<option value="0.18.0"
>
0.18.0
</option>
<option value="0.17.0"
>
0.17.0
</option>
<option value="0.16.0"
>
0.16.0
</option>
<option value="0.15.0"
>
0.15.0
</option>
<option value="0.14.0"
>
0.14.0
</option>
<option value="0.13.0"
>
0.13.0
</option>
<option value="0.12.0"
>
0.12.0
</option>
<option value="0.11.0"
>
0.11.0
</option>
<option value="0.10.0"
>
0.10.0
</option>
<option value="0.9.0"
>
0.9.0
</option>
<option value="0.8.0"
>
0.8.0
</option>
<option value="0.7.0-incubating"
>
0.7.0-incubating
</option>
<option value="0.6.0-incubating"
>
0.6.0-incubating
</option>
<option value="0.5.0-incubating"
>
0.5.0-incubating
</option>
</select>
</h5>
<h1 id="recovering-from-a-scheduler-backup">Recovering from a Scheduler Backup</h1>
<p><strong>Be sure to read the entire page before attempting to restore from a backup, as it may have
unintended consequences.</strong></p>
<h2 id="summary">Summary</h2>
<p>The restoration procedure replaces the existing (possibly corrupted) Mesos replicated log with an
earlier, backed-up version and requires all schedulers to be taken down temporarily while
restoring. Once completed, the scheduler state resets to what it was when the backup was created.
This means any jobs or tasks created or updated after the backup are unknown to the scheduler and will
be killed shortly after the cluster restarts. All other tasks continue operating as normal.</p>
<p>Restoring a backup that is more than a few hours old is usually a bad idea. The scheduler
expects the cluster to look exactly as it did when the backup was taken,
so any tasks that have been rescheduled since then will be killed.</p>
<p>The instructions below have been verified in a <a href="../../getting-started/vagrant/">Vagrant environment</a> and, with minor
syntax and path changes, should be applicable to any Aurora cluster.</p>
<p>Follow these steps to prepare the cluster for restoring from a backup:</p>
<h2 id="preparation">Preparation</h2>
<ul>
<li><p>Stop all scheduler instances.</p></li>
<li><p>Pick a backup to use for rehydrating the Mesos replicated log. Backups can be found in the
directory given to the scheduler as the <code>-backup_dir</code> argument. Backup files are named
<code>scheduler-backup-&lt;yyyy-MM-dd-HH-mm&gt;</code>.</p></li>
<li><p>If running the Aurora Scheduler in HA mode, pick a single scheduler instance to rehydrate.</p></li>
<li><p>Locate the <code>recovery-tool</code> in your setup. If Aurora was installed using a Debian package
generated by our <code>aurora-packaging</code> script, the recovery tool can be found
in <code>/usr/share/aurora/bin/recovery-tool</code>.</p></li>
</ul>
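<p>As a minimal sketch of the backup-selection step: the timestamp in the backup name runs from year down to minute with fixed-width fields, so sorting the file names lexicographically also sorts them chronologically. The helper name <code>latest_backup</code> and the example directory are hypothetical, and the most recent backup is often, but not always, the right one to restore.</p>

```shell
# Print the name of the most recent backup in the given directory.
# Assumes backup names follow the scheduler-backup-yyyy-MM-dd-HH-mm pattern,
# so a lexicographic sort is also a chronological sort.
latest_backup() {
  backup_dir="$1"
  ls -1 "$backup_dir" | grep '^scheduler-backup-' | sort | tail -n 1
}

# Hypothetical usage; replace the path with your -backup_dir value.
# latest_backup /var/lib/aurora/backups
```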
<h2 id="cleanup">Cleanup</h2>
<ul>
<li><p>Delete (or move) the Mesos replicated log directory for each scheduler instance. Its location
is the value given to the <code>-native_log_file_path</code> flag
for each instance.</p></li>
<li><p>Initialize the Mesos replicated log files using the mesos-log tool:
<code>
sudo -u &lt;USER&gt; mesos-log initialize --path=&lt;native_log_file_path&gt;
</code>
where <code>USER</code> is the user under which the scheduler instance will be run. For installations using
Debian packages, the default user will be <code>aurora</code>. You may also specify
a group by passing the <code>-g &lt;GROUP&gt;</code> option to <code>sudo</code>.
Note that if the user under which the Aurora scheduler instance is run <em>does not</em> have permissions
to read this directory and the files it contains, the instance will fail to start.</p></li>
</ul>
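<p>The "move" option from the first cleanup step can be sketched as follows. Moving the directory aside rather than deleting it keeps the old state available if the restore has to be retried. The function name, the <code>.old-</code> suffix, and the example path are illustrative assumptions, not part of Aurora.</p>

```shell
# Move a replicated log directory aside instead of deleting it.
# The ".old-" suffix plus a timestamp is an arbitrary naming choice.
move_log_aside() {
  log_path="$1"
  stamp="$(date +%Y%m%d%H%M%S)"
  mv "$log_path" "$log_path.old-$stamp"
}

# Hypothetical usage; replace the path with your -native_log_file_path value.
# move_log_aside /var/db/aurora
```

<p>Run this once per scheduler instance, before initializing the new log with <code>mesos-log</code>.</p>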
<h2 id="restore-from-backup">Restore from backup</h2>
<ul>
<li>Run the <code>recovery-tool</code>. Wherever the flags match those used for the scheduler instance,
use the same values:
<code>
$ recovery-tool -from BACKUP \
-to LOG \
-backup=&lt;selected_backup_location&gt; \
-native_log_zk_group_path=&lt;native_log_zk_group_path&gt; \
-native_log_file_path=&lt;native_log_file_path&gt; \
-zk_endpoints=&lt;zk_endpoints&gt;
</code></li>
</ul>
<h2 id="bring-scheduler-instances-back-online">Bring scheduler instances back online</h2>
<h3 id="if-running-in-ha-mode">If running in HA Mode</h3>
<ul>
<li><p>Start the rehydrated scheduler instance along with enough cleaned-up instances to
meet the <code>-native_log_quorum_size</code>. The Mesos replicated log algorithm will replenish
the &ldquo;blank&rdquo; scheduler instances with the information from the rehydrated instance.</p></li>
<li><p>Start any remaining scheduler instances.</p></li>
</ul>
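<p>To make the first HA step concrete, here is a small sketch of the arithmetic: only one instance is rehydrated, so the number of blank instances to start with it is the quorum size minus one. The function name is hypothetical.</p>

```shell
# Number of cleaned-up (blank) instances to start alongside the single
# rehydrated instance so that the running set reaches the quorum size.
blank_instances_needed() {
  quorum_size="$1"
  echo $((quorum_size - 1))
}

# Example: with -native_log_quorum_size=2, start the rehydrated instance
# plus this many blank instances first.
blank_instances_needed 2
```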
<h3 id="if-running-in-singleton-mode">If running in singleton mode</h3>
<ul>
<li>Start the single scheduler instance.</li>
</ul>
</div>
</div>
</div>
<div class="container-fluid section-footer buffer">
<div class="container">
<div class="row">
<div class="col-md-2 col-md-offset-1"><h3>Quick Links</h3>
<ul>
<li><a href="/downloads/">Downloads</a></li>
<li><a href="/community/">Mailing Lists</a></li>
<li><a href="http://issues.apache.org/jira/browse/AURORA">Issue Tracking</a></li>
<li><a href="/documentation/latest/contributing/">How To Contribute</a></li>
</ul>
</div>
<div class="col-md-2"><h3>The ASF</h3>
<ul>
<li><a href="http://www.apache.org/licenses/">License</a></li>
<li><a href="http://www.apache.org/foundation/sponsorship.html">Sponsorship</a></li>
<li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
<li><a href="http://www.apache.org/security/">Security</a></li>
</ul>
</div>
<div class="col-md-6">
<p class="disclaimer">&copy; 2014-2017 <a href="http://www.apache.org/">Apache Software Foundation</a>. Licensed under the <a href="http://www.apache.org/licenses/">Apache License v2.0</a>. The <a href="https://www.flickr.com/photos/trondk/12706051375/">Aurora Borealis IX photo</a> displayed on the homepage is available under a <a href="https://creativecommons.org/licenses/by-nc-nd/2.0/">Creative Commons BY-NC-ND 2.0 license</a>. Apache, Apache Aurora, and the Apache feather logo are trademarks of The Apache Software Foundation.</p>
</div>
</div>
</div>
</div>
</body>
</html>