| ****************************************************************************** |
| 0. Introduction |
| |
| Apache Accumulo is a sorted, distributed key/value store based on Google's |
| BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It |
| features a few novel improvements on the BigTable design in the form of |
| cell-level access labels and a server-side programming mechanism that can modify |
| key/value pairs at various points in the data management process. |
| |
| ****************************************************************************** |
| 1. Building |
| |
| In the normal tarball or RPM release of Apache Accumulo, everything is built and |
| ready to go: there is no build step. |
| |
| However, if you only have source code, or you wish to make changes, you need to |
| have maven configured to get Accumulo pre-requisites from repositories. See |
| the pom.xml file for the necessary components. |
| |
| The libthrift 0.3 jar is no longer available from a repository. This jar will |
| be automatically built from the thrift tag and installed into your local maven |
| repository during the Accumulo build process via the |
| src/assemble/install-thrift-jar.sh script. |
| |
| Run the following commands to build Accumulo. |
| |
| tar xvzf accumulo-1.3.6-src.tar.gz |
| cd accumulo-1.3.6 |
| mvn package && mvn assembly:single |
| |
| ****************************************************************************** |
| 2. Deployment |
| |
| Copy the accumulo tar file produced by "mvn package && mvn assembly:single" from |
| the target/ directory to the desired destination, then untar it (e.g. |
| tar xvzf accumulo-1.3.6-dist.tar.gz). |
| |
| If you are using the RPM, install the RPM on every machine that will run |
| accumulo. |
| |
| ****************************************************************************** |
| 3. Configuring |
| |
| Apache Accumulo has two prerequisites, Hadoop and Zookeeper. Zookeeper must be |
| at least version 3.3.0. Both of these must be installed and configured. |
| |
| Ensure you (or the some special hadoop user account) have accounts on all of |
| the machines in the cluster and that hadoop and accumulo install files can be |
| found in the same location on every machine in the cluster. You will need to |
| have password-less ssh set up as described in the hadoop documentation. |
| |
| You will need to have hadoop installed and configured on your system. |
| Apache Accumulo 1.3.6 has been tested with hadoop version |
| 0.20.1 and 0.20.2. |
| |
| Create a "slaves" file in $ACCUMULO_HOME/conf/. This is a list of machines |
| where tablet servers and loggers will run. |
| |
| Create a "masters" file in $ACCUMULO_HOME/conf/. This is a list of |
| machines where the master server will run. |
| |
| Create conf/accumulo-env.sh following the template of |
| conf/accumulo-env.sh.example. Set JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME. |
| These directories must be at the same location on every node in the cluster. |
| Note that zookeeper must be installed on every machine, but it should not be |
| run on every machine. |
| |
| * Note that you will be specifying the Java heap space in accumulo-env.sh. |
| You should make sure that the total heap space used for the accumulo tserver, |
| logger and the hadoop datanode and tasktracker is less than the available |
| memory on each slave node in the cluster. On large clusters, it is recommended |
| that the accumulo master, hadoop namenode, secondary namenode, and hadoop |
| jobtracker all be run on separate machines to allow them to use more heap |
| space. If you are running these on the same machine on a small cluster, make |
| sure their heap space settings fit within the available memory. The zookeeper |
| instances are also time sensitive and should be on machines that will not be |
| heavily loaded, or over-subscribed for memory. |
| |
| Create conf/accumulo-site.xml. You must set the zookeeper servers in this |
| file (instance.zookeeper.host). Look at docs/config.html to see what |
| additional variables you can modify and what the defaults are. |
| |
| Create the write-ahead log directory on all slaves. The directory is set in |
| the accumulo-site.xml as the "logger.dir.walog" parameter. It is a local |
| directory that will be used to log updates which will be used in the event of |
| tablet server failure, so it is important that it have sufficient space and |
| reliability. |
| |
| Synchronize your accumulo conf directory across the cluster. As a precaution |
| against mis-configured systems, servers using different configuration files |
| will not communicate with the rest of the cluster. |
| |
| ****************************************************************************** |
| 4. Running Apache Accumulo |
| |
| Make sure hadoop is configured on all of the machines in the cluster, including |
| access to a shared hdfs instance. Make sure hdfs is running. |
| |
| Make sure zookeeper is configured and running on at least one machine in the |
| cluster. |
| |
| Run "bin/accumulo init" to create the hdfs directory structure |
| (hdfs:///accumulo/*) and initial zookeeper settings. This will also allow you |
| to also configure the initial root password. Only do this once. |
| |
| Start accumulo using the bin/start-all.sh script. |
| |
| Use the "bin/accumulo shell -u <username>" command to run an accumulo shell |
| interpreter. Within this interpreter, run "createtable <tablename>" to create |
| a table, and run "table <tablename>" followed by "scan" to scan a table. |
| |
| In the example below a table is created, data is inserted, and the table is |
| scanned. |
| |
| $ ./bin/accumulo shell -u root |
| Enter current password for 'root'@'acu13': ****** |
| |
| Shell - Apache Accumulo Interactive Shell |
| - |
| - version: 1.3.6 |
| - instance name: acu13 |
| - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f |
| - |
| - type 'help' for a list of available commands |
| - |
| root@ac> createtable foo |
| root@ac foo> insert row1 colf1 colq1 val1 |
| root@ac foo> insert row1 colf1 colq2 val2 |
| root@ac foo> scan |
| row1 colf1:colq1 [] val1 |
| row1 colf1:colq2 [] val2 |
| |
| The example below start the shell, switches to table foo, and scans for a |
| certain column. |
| |
| $ ./bin/accumulo shell -u root |
| Enter current password for 'root'@'acu13': ****** |
| |
| Shell - Apache Accumulo Interactive Shell |
| - |
| - version: 1.3.6 |
| - instance name: acu13 |
| - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f |
| - |
| - type 'help' for a list of available commands |
| - |
| root@ac> table foo |
| root@ac foo> scan -c colf1:colq2 |
| row1 colf1:colq2 [] val2 |
| |
| |
| |
| |
| ****************************************************************************** |
| 5. Monitoring Apache Accumulo |
| |
| You can point your browser to the master host, on port 50095 to see the status |
| of accumulo across the cluster. You can even do this with the text-based |
| browser "elinks": |
| |
| $ links http://localhost:50095 |
| |
| From this GUI, you can ensure that tablets are assigned, tables are online, |
| tablet servers are up. You can monitor query and ingest rates across the |
| cluster. |
| |
| ****************************************************************************** |
| 6. Stopping Apache Accumulo |
| |
| Do not kill the tabletservers or run bin/tdown.sh unless absolutely necessary. |
| Recovery from a catastrophic loss of servers can take a long time. To shutdown |
| cleanly, run "bin/stop-all.sh" and the master will orchestrate the shutdown of |
| all the tablet servers. Shutdown waits for all writes to finish, so it may |
| take some time for particular configurations. |
| |
| ****************************************************************************** |
| 7. Logging |
| |
| DEBUG and above are logged to the logs/ dir. To modify this behavior change |
| the scripts in conf/. To change the logging dir, set ACCUMULO_LOG_DIR in |
| conf/accumulo-env.sh. Stdout and stderr of each accumulo process is |
| redirected to the log dir. |
| |
| ****************************************************************************** |
| 8. API |
| |
| The public accumulo API is composed of everything in the |
| org.apache.accumulo.core.client package (excluding the |
| org.apache.accumulo.core.client.impl package) and the following classes from |
| org.apache.accumulo.core.data : Key, Mutation, Value, and Range. To get started |
| using accumulo review the example and the javadoc for the packages and classes |
| mentioned above. |
| |
| ****************************************************************************** |
| 9. Performance Tuning |
| |
| Accumulo has exposed several configuration properties that can be changed. |
| These properties and configuration management are described in detail in |
| docs/config.html. While the default value is usually optimal, there are cases |
| where a change can increase query and ingest performance. |
| |
| Before changing a property from its default in a production system, you should |
| develop a good understanding of the property and consider creating a test to |
| prove the increased performance. |
| |
| ****************************************************************************** |