| ****************************************************************************** |
| 0. Introduction |
| |
| Apache Accumulo is a sorted, distributed key/value store based on Google's |
| BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It |
| features a few novel improvements on the BigTable design in the form of |
| cell-level access labels and a server-side programming mechanism that can modify |
| key/value pairs at various points in the data management process. |
| |
| ****************************************************************************** |
| 1. Building |
| |
| In the normal tarball or RPM release of accumulo, everything is built and |
| ready to go on x86 GNU/Linux: there is no build step. |
| |
| However, if you only have source code, or you wish to make changes, you need to |
| have maven configured to get Accumulo prerequisites from repositories. See |
| the pom.xml file for the necessary components. |
| |
| Run "mvn package && mvn assembly:single -N" |
| |
| If you are running on another Unix-like operating system (OSX, etc) then |
| you may wish to build the native libraries. They are not strictly necessary |
| but having them available suppresses a runtime warning: |
| |
| $ ( cd ./src/server/src/main/c++ ; make ) |
| |
| If you want to build the debian release, use the command "mvn package && mvn |
| install" to generate the .deb files in the target/ directory. Please follow |
| the steps at |
| https://cwiki.apache.org/BIGTOP/how-to-install-hadoop-distribution-from-bigtop.html |
| to add bigtop to your debian sources list. This will make it substantially |
| easier to install. |
| |
| ****************************************************************************** |
| 2. Deployment |
| |
| Copy the accumulo tar file produced by "mvn package && mvn assembly:single -N" |
| from the target/ directory to the desired destination, then untar it (e.g. |
| tar xvzf accumulo-1.4.0-dist.tar.gz). |
| |
| If you are using the RPM/deb, install the RPM/deb on every machine that will run |
| accumulo. |
| |
| ****************************************************************************** |
| 3. Upgrading from 1.3 to 1.4 |
| |
| There are no steps for upgrading from 1.3 to 1.4. |
| |
| ****************************************************************************** |
| 4. Configuring |
| |
| Apache Accumulo has two prerequisites, hadoop and zookeeper. Zookeeper must be |
| at least version 3.3.0. Both of these must be installed and configured. |
| Zookeeper normally only allows for 10 connections from one computer. On a |
| single-host install, this number is a little two low. Add the following to the |
| $ZOOKEEPER_HOME/conf/zoo.cfg file: |
| |
| maxClientCnxns=100 |
| |
| Ensure you (or the some special hadoop user account) have accounts on all of |
| the machines in the cluster and that hadoop and accumulo install files can be |
| found in the same location on every machine in the cluster. You will need to |
| have password-less ssh set up as described in the hadoop documentation. |
| |
| You will need to have hadoop installed and configured on your system. |
| Accumulo 1.4.0 has been tested with hadoop version 0.20.2. |
| |
| The example accumulo configuration files are placed in directories based on the |
| memory footprint for the accumulo processes. If you are using native libraries |
| for you tablet server in-memory map, then you can use the files in "native-standalone". |
| If you get warnings about not being able to load the native libraries, you can |
| use the configuration files in "standalone". |
| |
| For testing on a single computer, use a fairly small configuration: |
| |
| $ cp conf/examples/512MB/native-standalone/* conf |
| |
| Please note that the footprints are for only the Accumulo system processes, so |
| ample space should be left for other processes like hadoop, zookeeper, and the |
| accumulo client code. These directories must be at the same location on every |
| node in the cluster. |
| |
| If you are configuring a larger cluster you will need to create the configuration |
| files yourself and propogate the changes to the $ACCUMULO_HOME/conf directories: |
| |
| Create a "slaves" file in $ACCUMULO_HOME/conf/. This is a list of machines |
| where tablet servers and loggers will run. |
| |
| Create a "masters" file in $ACCUMULO_HOME/conf/. This is a list of |
| machines where the master server will run. |
| |
| Create conf/accumulo-env.sh following the template of |
| example/3GB/native-standalone/accumulo-env.sh. |
| |
| However you create your configuration files, you will need to set |
| JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME in conf/accumulo-env.sh |
| |
| Note that zookeeper client jar files must be installed on every machine, but |
| the server should not be run on every machine. |
| |
| Create the $ACCUMULO_LOG_DIR on every machine in the slaves file. |
| |
| * Note that you will be specifying the Java heap space in accumulo-env.sh. |
| You should make sure that the total heap space used for the accumulo tserver, |
| logger and the hadoop datanode and tasktracker is less than the available |
| memory on each slave node in the cluster. On large clusters, it is recommended |
| that the accumulo master, hadoop namenode, secondary namenode, and hadoop |
| jobtracker all be run on separate machines to allow them to use more heap |
| space. If you are running these on the same machine on a small cluster, make |
| sure their heap space settings fit within the available memory. The zookeeper |
| instances are also time sensitive and should be on machines that will not be |
| heavily loaded, or over-subscribed for memory. |
| |
| Edit conf/accumulo-site.xml. You must set the zookeeper servers in this |
| file (instance.zookeeper.host). Look at docs/config.html to see what |
| additional variables you can modify and what the defaults are. |
| |
| It is advisable to change the instance secret (instance.secret) to some new |
| value. Also ensure that the accumulo-site.xml file is not readable by other |
| users on the machine. |
| |
| Create the write-ahead log directory on all slaves. The directory is set in |
| the accumulo-site.xml as the "logger.dir.walog" parameter. It is a local |
| directory that will be used to log updates which will be used in the event of |
| tablet server failure, so it is important that it have sufficient space and |
| reliability. It is possible to specify a comma-separated list of directories |
| to use for write-ahead logs, in which case each directory in the list must be |
| created on all slaves. |
| |
| Synchronize your accumulo conf directory across the cluster. As a precaution |
| against mis-configured systems, servers using different configuration files |
| will not communicate with the rest of the cluster. |
| |
| ****************************************************************************** |
| 5. Running Apache Accumulo |
| |
| Make sure hadoop is configured on all of the machines in the cluster, including |
| access to a shared hdfs instance. Make sure hdfs is running. |
| |
| Make sure zookeeper is configured and running on at least one machine in the |
| cluster. |
| |
| Run "bin/accumulo init" to create the hdfs directory structure |
| (hdfs:///accumulo/*) and initial zookeeper settings. This will also allow you |
| to also configure the initial root password. Only do this once. |
| |
| Start accumulo using the bin/start-all.sh script. |
| |
| Use the "bin/accumulo shell -u <username>" command to run an accumulo shell |
| interpreter. Within this interpreter, run "createtable <tablename>" to create |
| a table, and run "table <tablename>" followed by "scan" to scan a table. |
| |
| In the example below a table is created, data is inserted, and the table is |
| scanned. |
| |
| $ ./bin/accumulo shell -u root |
| Enter current password for 'root'@'accumulo': ****** |
| |
| Shell - Apache Accumulo Interactive Shell |
| - |
| - version: 1.4.0 |
| - instance name: accumulo |
| - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f |
| - |
| - type 'help' for a list of available commands |
| - |
| root@ac> createtable foo |
| root@ac foo> insert row1 colf1 colq1 val1 |
| root@ac foo> insert row1 colf1 colq2 val2 |
| root@ac foo> scan |
| row1 colf1:colq1 [] val1 |
| row1 colf1:colq2 [] val2 |
| |
| The example below start the shell, switches to table foo, and scans for a |
| certain column. |
| |
| $ ./bin/accumulo shell -u root |
| Enter current password for 'root'@'accumulo': ****** |
| |
| Shell - Apache Accumulo Interactive Shell |
| - |
| - version: 1.4.0 |
| - instance name: accumulo |
| - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f |
| - |
| - type 'help' for a list of available commands |
| - |
| root@ac> table foo |
| root@ac foo> scan -c colf1:colq2 |
| row1 colf1:colq2 [] val2 |
| |
| |
| If you are running on top of hdfs with kerberos enabled, then you need to do |
| some extra work. We currently do not internally support kerberos, so you must |
| manually manage the accumulo users tickets. First, create an accumulo principal |
| |
| kadmin.local -q "addprinc -randkey accumulo/<host.domain.name>" |
| |
| where <host.domain.name> is replaced by a fully qualified domain name. Export |
| the principals to a keytab file |
| |
| kadmin.local -q "xst -k accumulo.keytab -glob accumulo*" |
| |
| Place this file in $ACCUMULO_HOME/conf for every host. It should be owned by |
| the accumulo user and chmodded to 400. Add the following to the accumulo-env.sh |
| |
| kinit -kt $ACCUMULO_HOME/conf/accumulo.keytab accumulo/`hostname -f` |
| |
| And set the following crontab for every host |
| |
| 0 5 * * * kinit -kt $ACCUMULO_HOME/conf/accumulo.keytab accumulo/`hostname -f` |
| |
| Additionally, adjust the $ACCUMULO_HOME/conf/monitor.security.policy to change |
| |
| permission java.util.PropertyPermission "*", "read"; |
| |
| to |
| |
| permission java.util.PropertyPermission "*", "read,write"; |
| |
| And add these lines to the end of the policy file |
| |
| permission javax.security.auth.AuthPermission "createLoginContext.hadoop-user-kerberos"; |
| permission java.lang.RuntimePermission "createSecurityManager"; |
| permission javax.security.auth.AuthPermission "doAs"; |
| permission javax.security.auth.AuthPermission "getPolicy"; |
| permission java.security.SecurityPermission "createAccessControlContext"; |
| permission javax.security.auth.AuthPermission "getSubjectFromDomainCombiner"; |
| permission java.lang.RuntimePermission "getProtectionDomain"; |
| permission javax.security.auth.AuthPermission "modifyPrivateCredentials"; |
| permission javax.security.auth.PrivateCredentialPermission "javax.security.auth.kerberos.KerberosTicket javax.security.auth.kerberos.KerberosPrincipal \"*\"", "read"; |
| permission javax.security.auth.kerberos.ServicePermission "krbtgt/<REALM>@<REALM>", "initiate"; |
| permission javax.security.auth.kerberos.ServicePermission "hdfs/<namenode.domain.name>@<REALM>", "initiate"; |
| permission javax.security.auth.kerberos.ServicePermission "mapred/<jobtracker.domain.name>@<REALM>", "initiate"; |
| |
| Where <REALM> is replaced with the kerberos realm for the Hadoop cluster, |
| <namenode.domain.name> is replaced with the fully qualified domain name of the |
| server running the namenode and <jobtracker.domain.name> is replaced with the |
| fully qualified domain name of the server running the job tracker. |
| |
| |
| ****************************************************************************** |
| 6. Monitoring Apache Accumulo |
| |
| You can point your browser to the master host, on port 50095 to see the status |
| of accumulo across the cluster. You can even do this with the text-based |
| browser "links": |
| |
| $ links http://localhost:50095 |
| |
| From this GUI, you can ensure that tablets are assigned, tables are online, |
| tablet servers are up. You can monitor query and ingest rates across the |
| cluster. |
| |
| ****************************************************************************** |
| 7. Stopping Apache Accumulo |
| |
| Do not kill the tabletservers or run bin/tdown.sh unless absolutely necessary. |
| Recovery from a catastrophic loss of servers can take a long time. To shutdown |
| cleanly, run "bin/stop-all.sh" and the master will orchestrate the shutdown of |
| all the tablet servers. Shutdown waits for all writes to finish, so it may |
| take some time for particular configurations. |
| |
| ****************************************************************************** |
| 8. Logging |
| |
| DEBUG and above are logged to the logs/ dir. To modify this behavior change |
| the scripts in conf/. To change the logging dir, set ACCUMULO_LOG_DIR in |
| conf/accumulo-env.sh. Stdout and stderr of each accumulo process is |
| redirected to the log dir. |
| |
| ****************************************************************************** |
| 9. API |
| |
| The public accumulo API is composed of everything in the |
| org.apache.accumulo.core.client package (excluding the |
| org.apache.accumulo.core.client.impl package) and the following classes from |
| org.apache.accumulo.core.data : Key, Mutation, Value, and Range. To get started |
| using accumulo review the example and the javadoc for the packages and classes |
| mentioned above. |
| |
| ****************************************************************************** |
| 10. Performance Tuning |
| |
| Apache Accumulo has exposed several configuration properties that can be |
| changed. These properties and configuration management are described in detail |
| in docs/config.html. While the default value is usually optimal, there are |
| cases where a change can increase query and ingest performance. |
| |
| Before changing a property from its default in a production system, you should |
| develop a good understanding of the property and consider creating a test to |
| prove the increased performance. |
| |
| ****************************************************************************** |