<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<html>
<head>
<title>Accumulo Administration</title>
<link rel='stylesheet' type='text/css' href='documentation.css' media='screen'/>
</head>
<body>
<h1>Apache Accumulo Documentation : Administration</h1>
<h3>Starting accumulo for the first time</h3>
<p>For the most part, accumulo is ready to go out of the box. To start it, first you must distribute and install
the accumulo software to each machine in the cluster that you wish to run on. The software should be installed
in the same directory on each machine and configured identically (or at least similarly... see the configuration
sections for more details). Select one machine to be your bootstrap machine, the one that you will start accumulo
with. Note that you must have passphrase-less ssh access to each machine from your bootstrap machine. On this machine,
create conf/masters and conf/slaves files. In the masters file, type the hostname of the machine you wish to run the master on (probably localhost).
In the slaves file, type the hostnames, separated by newlines, of each machine you wish to participate in accumulo as a tablet server. If you neglect
to create these files, the startup scripts will assume you are trying to run on localhost only, and will instantiate a single-node instance.
It is probably a good idea to back up these files, or distribute them to the other nodes as well, so that you can easily boot up accumulo
from another machine, if necessary. You can also create a <code>conf/accumulo-env.sh</code> file if you want to configure any custom environment variables.
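<p>For example, a small three-node cluster might use files like the following; the hostnames are placeholders for your own machines.
<pre>
$ cat conf/masters
node1.example.com
$ cat conf/slaves
node1.example.com
node2.example.com
node3.example.com
</pre>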
<p>Once properly configured, you can initialize or prepare an instance of accumulo by running: <code>bin/accumulo&nbsp;init</code><br />
Follow the prompts and you are ready to go. This step only prepares accumulo to run; it does not start up accumulo.
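<p>For illustration, an initialization session looks roughly like the following; the exact prompts vary by version and security setup, and the instance name shown is only an example.
<pre>
$ bin/accumulo init
Instance name : myinstance
Enter initial password for root: ******
Confirm initial password for root: ******
</pre>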
<h3>Starting accumulo</h3>
<p>Once you have configured accumulo to your liking, and distributed the appropriate configuration to each machine, you can start accumulo with
<code>bin/start-all.sh</code>. If at any time you wish to bring accumulo servers back online after one or more have been shut down, you can run <code>bin/start-all.sh</code> again.
This step will only start services that are not already running. Be aware that if you run this command on more than one machine, you may unintentionally
start an extra copy of the garbage collector service and the monitoring service, since each of these will run on the server on which you run this script.
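<p>For example, from your bootstrap machine (assuming accumulo is installed under $ACCUMULO_HOME):
<pre>
$ $ACCUMULO_HOME/bin/start-all.sh
</pre>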
<h3>Stopping accumulo</h3>
<p>Similar to the start-all.sh script, we provide a <code>bin/stop-all.sh</code> script to shut down accumulo. This will prompt for the root password so that it can
ask the master to shut down the tablet servers gracefully. If the tablet servers do not respond, or the master takes too long, you can force a shutdown by hitting Ctrl-C
at the password prompt and waiting 15 seconds for the script to force the servers down. During a normal graceful shutdown, any tablet servers that do not respond are
forcibly shut down after 5 seconds.
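<p>For example (you will be prompted for the root password as described above):
<pre>
$ $ACCUMULO_HOME/bin/stop-all.sh
</pre>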
<h3>Adding a Node</h3>
<p>Update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the addition; at a minimum this needs to be on the host(s) being added, but in practice it's good to ensure consistent configuration across all nodes. Then start a tablet server on the new host(s):</p>
<pre>
$ACCUMULO_HOME/bin/accumulo admin start &lt;host(s)&gt; {&lt;host&gt; ...}
</pre>
<p>Alternatively, you can ssh to each of the hosts you want to add and run <code>$ACCUMULO_HOME/bin/start-here.sh</code>.</p>
<p>Make sure the host in question has the new configuration, or else the tablet server won't start.</p>
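<p>For example, to add a single host from your bootstrap machine (the hostname newnode.example.com is hypothetical):
<pre>
$ echo "newnode.example.com" &gt;&gt; $ACCUMULO_HOME/conf/slaves
$ $ACCUMULO_HOME/bin/accumulo admin start newnode.example.com
</pre>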
<h3>Decommissioning a Node</h3>
<p>If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet server. Accumulo will automatically rebalance the tablets across the available tablet servers.</p>
<pre>
$ACCUMULO_HOME/bin/accumulo admin stop &lt;host(s)&gt; {&lt;host&gt; ...}
</pre>
<p>Alternatively, you can ssh to each of the hosts you want to remove and run <code>$ACCUMULO_HOME/bin/stop-here.sh</code>.</p>
<p>Be sure to update your <code>$ACCUMULO_HOME/conf/slaves</code> (or <code>$ACCUMULO_CONF_DIR/slaves</code>) file to account for the removal of these hosts. Bear in mind that the monitor will not re-read the slaves file automatically, so it will report the decommissioned servers as down; it's recommended that you restart the monitor so that the node list is up to date.</p>
<h3>Configuration</h3>
<p>Accumulo configuration information is stored in an XML file and in ZooKeeper. System-wide
configuration information is stored in accumulo-site.xml. In order for accumulo to
find this file, its directory must be on the classpath. Accumulo will log a warning if it cannot find
it, and will use built-in default values. The accumulo scripts try to put the config directory on the classpath.
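<p>As a minimal sketch, an accumulo-site.xml might contain entries like the following; the zookeeper hostnames and the secret are placeholders for your own environment.
<pre>
&lt;property&gt;
&lt;name&gt;instance.zookeeper.host&lt;/name&gt;
&lt;value&gt;zkhost1:2181,zkhost2:2181,zkhost3:2181&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;instance.secret&lt;/name&gt;
&lt;value&gt;CHANGE_ME&lt;/value&gt;
&lt;/property&gt;
</pre>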
<p>Starting with version 1.0, per-table configuration was
introduced. This information is stored in ZooKeeper and can be
manipulated using the config command in the accumulo
shell. ZooKeeper will notify all tablet servers when config properties
are modified. This makes it possible to change major compaction
settings, for example, for a table while accumulo is running.
<p>Per-table configuration settings override system settings.
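<p>For example, the following shell session sets a per-table major compaction ratio for a hypothetical table named mytable and then displays the effective value; the table name and value are only illustrative.
<pre>
root@myinstance&gt; config -t mytable -s table.compaction.major.ratio=3
root@myinstance&gt; config -t mytable -f table.compaction.major.ratio
</pre>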
<p>See the possible configuration options and their default values <a href='config.html'>here</a>
<h3>Managing system resources</h3>
<p>It is very important how disk and memory usage are allocated across the cluster, and how server processes are distributed across its nodes.
<ul>
<li> On larger clusters, run the namenode, secondary namenode, jobtracker, accumulo master, and zookeepers on dedicated nodes. On a smaller cluster you may want to run all master processes on one node. When doing this, ensure that the max total memory that could be used by all master processes does not exceed system memory; swapping on your single master node would severely degrade performance.
<li> Accumulo 1.2 and earlier rely on zookeeper but do not use it heavily. On a large cluster, setting up 3 or 5 zookeepers should be plenty. Since there is no performance gain when running more zookeepers, fault tolerance is the only benefit.
<li> On slave nodes, ensure the memory used by all slave processes is less than system memory. For example, the following slave node configuration could use up to 38G of RAM: tablet server 3G, logger 1G, data node 2G, up to 10 mappers each using 2G, and up to 6 reducers each using 2G. If the slave nodes only have 32G, then using 38G will result in swapping, which could cause tablet servers to lose their locks in zookeeper and die. Even if swapping does not cause tablet servers to die, it will kill performance. One way to cap the map reduce portion of this budget is shown in the sketch after this list.
<li> Accumulo and map reduce will work with less memory, but it has an impact. Accumulo will minor compact more frequently when it has less memory for its in-memory map, resulting in more major compactions. Minor and major compactions both use CPU and HDFS I/O. The same goes for map reduce: the less memory you give it, the more it has to sort and spill. Try to minimize spilling and compactions as much as possible without causing swapping.
<li> Accumulo writes data to disk before it sorts it in memory. This allows data that was in memory when a tablet server crashed to be recovered. Each slave node needs a local directory to write this data to. Ensure the file system holding this directory has at least 100G free on all nodes. Also, if this directory is in a filesystem used by map reduce or hdfs, they may affect each other's performance.
</ul>
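<p>As a sketch of the memory budgeting above, the mapred-site.xml settings below cap a tasktracker at 10 map and 6 reduce slots; the property names apply to classic (MRv1) map reduce and the values are only illustrative.
<pre>
&lt;property&gt;
&lt;name&gt;mapred.tasktracker.map.tasks.maximum&lt;/name&gt;
&lt;value&gt;10&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;mapred.tasktracker.reduce.tasks.maximum&lt;/name&gt;
&lt;value&gt;6&lt;/value&gt;
&lt;/property&gt;
</pre>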
<p>There are a few settings that determine how much memory accumulo tablet
servers use. In accumulo-env.sh there is a setting called
ACCUMULO_TSERVER_OPTS. By default this is set to something like "-Xmx512m
-Xms512m". These are Java jvm options asking Java to use 512 megabytes of
memory. By default accumulo stores data written to it outside of the Java
memory space in order to avoid pauses caused by the Java garbage collector. The
amount of memory it uses for this data is determined by the accumulo setting
"tserver.memory.maps.max". Since this memory is outside of the Java managed
memory, the process can grow larger than the -Xmx setting. So if -Xmx is set
to 512M and tserver.memory.maps.max is set to 1G, a tablet server process can
be expected to use 1.5G. If tserver.memory.maps.native.enabled is set to
false, then accumulo will only use memory managed by Java and the process will
not use more than what -Xmx is set to. In this case the
tserver.memory.maps.max setting should be 75% of the -Xmx setting.
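<p>To illustrate the sizing rule above, the following combination gives the JVM 1G and the native in-memory map 1G, so the tablet server process can be expected to use roughly 2G; the values are examples, not recommendations. In accumulo-env.sh:
<pre>
export ACCUMULO_TSERVER_OPTS="-Xmx1g -Xms1g"
</pre>
<p>And in accumulo-site.xml:
<pre>
&lt;property&gt;
&lt;name&gt;tserver.memory.maps.max&lt;/name&gt;
&lt;value&gt;1G&lt;/value&gt;
&lt;/property&gt;
</pre>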
<h3>Swappiness</h3>
<p>The Linux kernel will swap out memory of running programs to increase
the size of the disk buffers. This tendency to swap out is controlled by
a kernel setting called "swappiness." This behavior does not work well for
large Java server processes. When a Java process runs a garbage collection, it touches
lots of pages, forcing all swapped-out pages back into memory. It is suggested
that swappiness be set to zero.
<pre>
# sysctl -w vm.swappiness=0
# echo "vm.swappiness = 0" &gt;&gt; /etc/sysctl.conf
</pre>
<h3>Zone reclaim mode</h3>
<p>Turn off zone reclaim mode. See this great article for all the
<a href="http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases">
details on why.
</a>
<pre>
# sysctl -w vm.zone_reclaim_mode=0
# echo "vm.zone_reclaim_mode = 0" &gt;&gt; /etc/sysctl.conf
</pre>
<h3>Hadoop timeouts</h3>
<p>In order to detect failed datanodes more quickly, use shorter timeouts. Add the following to your
hdfs-site.xml file:
<pre>
&lt;property&gt;
&lt;name&gt;dfs.socket.timeout&lt;/name&gt;
&lt;value&gt;3000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;dfs.socket.write.timeout&lt;/name&gt;
&lt;value&gt;5000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ipc.client.connect.timeout&lt;/name&gt;
&lt;value&gt;1000&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
&lt;name&gt;ipc.client.connect.max.retries.on.timeouts&lt;/name&gt;
&lt;value&gt;2&lt;/value&gt;
&lt;/property&gt;
</pre>
</body>
</html>