blob: fb8b2415021d56ab1b04354a4754c4ba8493747f [file] [log] [blame] [view]
---
title: Setup
category: getting-started
order: 1
skip_doc_h1: true
---
# User Manual (2.x and 3.x)
Starting with Accumulo 2.0, the user manual now lives on the website as a series
of web pages. Previously, it was one large pdf document that was only generated
during a release. The user manual can now be updated very quickly and indexed
for searching across many webpages.
The manual can now be searched using the [Search link][search] at the top of the
website or navigated by clicking the links to the left. If you are new to
Accumulo, follow the instructions below to get started. For detailed
instructions, see the [in-depth installation guide][in-depth].
## Master/Manager naming
As of release 2.1, all references to "master" have been changed to "manager." If you are using/installing
a release prior to 2.1, substitute "master" in place of "manager" for any property name, file name, or
process name referenced in this documentation.
## Setup for testing or development
If you are setting up Accumulo for **testing or development,** consider using
the following tools:
* [Uno] sets up Accumulo on a single machine for development
* [Muchos] sets up Accumulo on a cluster (optionally launched in Amazon EC2 and
Microsoft Azure VM)
If you are setting up Accumulo for a **production** environment, follow the
instructions below.
## Setup for Production
Either [download] or [build] a binary distribution of Accumulo from source code
and unpack as follows.
```
tar xzf /path/to/accumulo-{{ page.latest_release}}-bin.tar.gz
cd accumulo-{{ page.latest_release }}
```
There are four scripts in the `bin` directory of the tarball distribution that
are used to manage Accumulo:
1. `accumulo` - Runs Accumulo command-line tools and starts Accumulo processes
2. `accumulo-service` - Runs individual Accumulo processes as background
services
3. `accumulo-cluster` - Manages Accumulo cluster on a single node or several
nodes
4. `accumulo-util` - Accumulo utilities for building native libraries, running
jars, etc.
These scripts will be used in the remaining instructions to configure and run
Accumulo. For convenience, consider adding `accumulo-{{ page.latest_release }}/bin/`
to your shell's path.
## Configuring Accumulo
Accumulo requires running [Zookeeper] and [HDFS] instances which should be set
up before configuring Accumulo.
**Important note:** If using [Erasure Coding] (EC), data loss will occur unless
it is configured properly for Accumulo. Please see the [Erasure Coding guide][ec-guide]
for more information.
The primary configuration files for Accumulo are [accumulo.properties],
[accumulo-env.sh], and [accumulo-client.properties] which are located in the
`conf/` directory.
The [accumulo.properties] file configures Accumulo server processes (i.e. tablet
server, manager, monitor, etc). Follow these steps to set it up:
1. Run `accumulo-util build-native` to build native code. If this command fails,
disable native maps by setting {% plink tserver.memory.maps.native.enabled %}
to `false`.
2. Set {% plink instance.volumes %} to HDFS location where Accumulo will store
data. If your namenode is running at 192.168.1.9:8020, and you want to store
data in `/accumulo` in HDFS, then set {% plink instance.volumes %} to
`hdfs://192.168.1.9:8020/accumulo`.
3. Set {% plink instance.zookeeper.host %} to the location of your Zookeepers
4. (Optional) Change {% plink instance.secret %} (which is used by Accumulo
processes to communicate) from the default. This value should match on all
servers.
The [accumulo-env.sh] file sets up environment variables needed by Accumulo:
1. Set `HADOOP_HOME` and `ZOOKEEPER_HOME` to the location of your Hadoop and
Zookeeper installations. Accumulo will use these locations to find Hadoop and
Zookeeper jars and add them to your `CLASSPATH` variable. If you are running
a vendor-specific release of Hadoop or Zookeeper, you may need to modify how
the `CLASSPATH` variable is built in [accumulo-env.sh]. If Accumulo has
problems loading classes when you start it, run `accumulo classpath` to print
Accumulo's classpath.
2. Accumulo tablet servers are configured by default to use 1GB of memory (768MB
is allocated to JVM and 256MB is allocated for native maps). Native maps are
allocated memory equal to 33% of the tserver JVM heap. The table below can be
used if you would like to change tserver memory usage in the `JAVA_OPTS`
section of [accumulo-env.sh]:
| Native? | 512MB | 1GB | 2GB | 3GB |
|---------|-------------------|-------------------|---------------------|---------------|
| Yes | -Xmx384m -Xms384m | -Xmx768m -Xms768m | -Xmx1536m -Xms1536m | -Xmx2g -Xms2g |
| No | -Xmx512m -Xms512m | -Xmx1g -Xms1g | -Xmx2g -Xms2g | -Xmx3g -Xms3g |
3. (Optional) Review the memory settings for the Accumulo manager, garbage collector, and monitor
in the `JAVA_OPTS` section of [accumulo-env.sh].
The [accumulo-client.properties] file is used by the Accumulo shell and can be
passed to Accumulo clients to simplify connecting to Accumulo. Below are steps
to configure it.
1. Set {% plink -c instance.name %} and {% plink -c instance.zookeepers %} to
the Accumulo instance and zookeeper connection string of your instance.
2. Pick an authentication type and set {% plink -c auth.type %} accordingly. The
most common `auth.type` is `password` which requires {% plink -c auth.principal %}
to be set and {% plink -c auth.token %} to be set the password of
`auth.principal`. For the Accumulo shell, `auth.token` can be commented out
and the shell will prompt you for the password of `auth.principal` at login.
## Initialization
Accumulo needs to initialize the locations where it stores data in Zookeeper and HDFS.
Note: Initialization only needs to be performed once for an instance - if you are performing an
upgrade you should not run the initialization command a second time unless you really want a new
instance.
The following command will perform the initialization.
```
accumulo init
```
The initialization command will prompt for the following information.
* **Instance name** : This is the name of the Accumulo instance and its
Accumulo clients need to know it in order to connect.
* **Root password** : Initialization sets up an initial Accumulo root user and
prompts for its password. This information will be needed to later connect to
Accumulo.
## Run Accumulo
There are several methods for running Accumulo:
1. Run Accumulo processes using `accumulo` command which runs processes in
foreground and will not redirect stderr/stdout. Useful for creating init.d
scripts that run Accumulo.
2. Run individual Accumulo processes as services using `accumulo-service` which
uses `accumulo` command but backgrounds processes, redirects stderr/stdout
and manages pid files. This is useful if you are using a cluster management
tool (i.e. Ansible, Salt, etc).
2. Run an Accumulo cluster on one or more nodes using `accumulo-cluster` (which
uses `accumulo-service` to run services). Useful for local development and
testing or if you are not using your own cluster management tool in
production.
Each method above has instructions below.
### Run individual Accumulo processes
Start Accumulo processes (tserver, manager, monitor, etc) using the accumulo
command followed by the service name. For example, to start only the tserver,
run:
```
accumulo tserver
```
The process will run in the foreground. Use ctrl-c to quit.
For a fully operational instance, each individual service will need to be
started.
### Run individual Accumulo services
Start individual Accumulo processes (tserver, master, monitor, etc.) as a
background service using the example accumulo-service script followed by the
service name. For example, to start only the tserver, run:
```
accumulo-service tserver start
```
For a fully operational instance, each individual service will need to be
started.
### Run an Accumulo cluster
Before using the `accumulo-cluster` script to start the cluster, additional
configuration files may need to be created. Use the command below to create them
from provided templates:
```
accumulo-cluster create-config
```
This creates a yaml configuration file in the `conf/` directory named
`cluster.yaml` that contains the node names where Accumulo services are
run on your cluster. By default, all services are configured to `localhost`. If you
are running a single-node Accumulo cluster, these files do not need to be
changed and the next section should be skipped. The external compaction services
exist in the file but are commented out as they are optional.
#### Multi-node configuration
If you are running an Accumulo cluster on multiple nodes, the `conf/cluster.yaml`
file contains sections that should be configured with a list of node names in yaml format:
* [manager] : Accumulo primary coordinating process. Must specify one node. Can
specify a few for fault tolerance.
* [gc] : Accumulo garbage collector. Must specify one node. Can specify a
few for fault tolerance.
* [monitor] : Node where Accumulo monitoring web server is run.
* [tserver] : Accumulo worker processes. List all of the nodes where tablet
servers should run.
* [sserver] : Optional. List of all nodes where scan servers should run.
* [compaction.coordinator] : Optional. Must specify one node. Can specify a few
for fault tolerance.
* [compaction.compactor] : Optional. Accumulo external compactor processes. List of
all nodes where compactors should run.
The Accumulo, Hadoop, and Zookeeper software should be present at the same
location on every node. Also, the files in the `conf` directory must be copied to
every node. There are many ways to replicate the software and configuration, two
possible tools that can help replicate software and/or config are [pdcp] and
[prsync].
The `accumulo-cluster` script uses ssh to start processes on remote nodes.
Before attempting to start Accumulo, [passwordless ssh][pwl] must be setup on
the cluster.
#### Start cluster
After configuring and initializing Accumulo, use the following command to start
the cluster using the provided cluster management script:
```
accumulo-cluster start
```
## First steps
Once you have started Accumulo, use the following command to run the Accumulo
shell:
```
accumulo shell -u root
```
Use your web browser to connect the Accumulo monitor page on port 9995.
```
http://<hostname in conf/monitor>:9995/
```
## Stopping Accumulo
When finished, use the following commands to stop Accumulo:
* Stop an individual Accumulo service: `accumulo-service tserver stop`
* Stop Accumulo cluster: `accumulo-cluster stop`
[search]: {{ site.baseurl }}/search
[in-depth]: {% durl administration/in-depth-install %}
[download]: {{ site.baseurl }}/downloads
[build]: https://github.com/apache/accumulo/blob/main/README.md#building
[Zookeeper]: https://zookeeper.apache.org/
[HDFS]: https://hadoop.apache.org/
[pdcp]: https://code.google.com/p/pdsh/
[prsync]: https://code.google.com/p/parallel-ssh/
[pwl]: https://www.google.com/search?q=hadoop+passwordless+ssh&ie=utf-8&oe=utf-8
[accumulo-env.sh]: {% durl configuration/files#accumulo-envsh %}
[accumulo.properties]: {% durl configuration/files#accumuloproperties %}
[accumulo-client.properties]: {% durl configuration/files#accumulo-clientproperties %}
[gc]: {% durl configuration/files#gc %}
[monitor]: {% durl configuration/files#monitor %}
[manager]: {% durl configuration/files#managers %}
[tserver]: {% durl configuration/files#tservers %}
[tracer]: {% durl configuration/files#tracers %}
[compaction.coordinator]: {% durl configuration/files#compaction%20coordinator %}
[compaction.compactor]: {% durl configuration/files#compaction%20compactor %}
[sserver]: {% durl configuration/files#sserver %}
[Uno]: https://github.com/apache/fluo-uno
[Muchos]: https://github.com/apache/fluo-muchos
[Erasure Coding]: https://hadoop.apache.org/docs/r3.2.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
[ec-guide]: {% durl administration/erasure-coding %}