| // Licensed to the Apache Software Foundation (ASF) under one or more |
| // contributor license agreements. See the NOTICE file distributed with |
| // this work for additional information regarding copyright ownership. |
| // The ASF licenses this file to You under the Apache License, Version 2.0 |
| // (the "License"); you may not use this file except in compliance with |
| // the License. You may obtain a copy of the License at |
| // |
| // http://www.apache.org/licenses/LICENSE-2.0 |
| // |
| // Unless required by applicable law or agreed to in writing, software |
| // distributed under the License is distributed on an "AS IS" BASIS, |
| // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| // See the License for the specific language governing permissions and |
| // limitations under the License. |
| |
| == Administration |
| |
| === Hardware |
| |
Because the cluster runs essentially two or three systems simultaneously, layered
across its machines (HDFS, Accumulo, and MapReduce), typical hardware consists
of 4 to 8 cores and 8 to 32 GB of RAM. This allows each running process to have
at least one core and 2 to 4 GB of memory.
| |
One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
have as few as 2 x 300GB disks or as many as 4 x 1TB or 2TB disks.
| |
It is possible to get by with less, such as 1U servers with 2 cores and 4GB of RAM
each, but in that case it is recommended to run at most two processes per
machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker but
not all three. The constraint is having enough available heap space for all the
processes on a machine.
| |
| === Network |
| |
| Accumulo communicates via remote procedure calls over TCP/IP for both passing |
| data and control messages. In addition, Accumulo uses HDFS clients to |
| communicate with HDFS. To achieve good ingest and query performance, sufficient |
| network bandwidth must be available between any two machines. |
| |
| In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will |
| use the following default ports. Please make sure that they are open, or change |
| their value in conf/accumulo-site.xml. |
| |
| .Accumulo default ports |
| [width="75%",cols=">,^2,^2"] |
| [options="header"] |
| |==== |
| |Port | Description | Property Name |
| |4445 | Shutdown Port (Accumulo MiniCluster) | n/a |
| |4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j |
| |9995 | Accumulo HTTP monitor | monitor.port.client |
| |9997 | Tablet Server | tserver.port.client |
| |9998 | Accumulo GC | gc.port.client |
| |9999 | Master Server | master.port.client |
| |12234 | Accumulo Tracer | trace.port.client |
| |42424 | Accumulo Proxy Server | n/a |
| |10001 | Master Replication service | master.replication.coordinator.port |
| |10002 | TabletServer Replication service | replication.receipt.service.port |
| |==== |
| |
In addition, the user can provide +0+ and an ephemeral port will be chosen instead. This
ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
use +0+ instead of an explicit value should, in most cases, work around any issues of
running multiple distinct Accumulo instances (or any other process which tries to use the
same default ports) on the same hardware. Finally, the *.port.client properties accept
port range syntax (M-N), allowing the user to specify a range of ports for the
service to attempt to bind. The ports in the range are tried in ascending order,
from the low end of the range up to and including the high end.
| |
| === Installation |
| |
| Download a binary distribution of Accumulo and install it to a directory on a disk with |
| sufficient space: |
| |
| cd <install directory> |
| tar xzf accumulo-X.Y.Z-bin.tar.gz # Replace 'X.Y.Z' with your Accumulo version |
| cd accumulo-X.Y.Z |
| |
| Repeat this step on each machine in your cluster. Typically, the same +<install directory>+ |
| is chosen for all machines in the cluster. When you configure Accumulo, the +$ACCUMULO_HOME+ |
| environment variable should be set to +/path/to/<install directory>/accumulo-X.Y.Z+. |
| |
| === Dependencies |
| Accumulo requires HDFS and ZooKeeper to be configured and running |
| before starting. Password-less SSH should be configured between at least the |
| Accumulo master and TabletServer machines. It is also a good idea to run Network |
| Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too out of |
| sync, which can cause problems with automatically timestamped data. |
| |
| === Configuration |
| |
| Accumulo is configured by editing several Shell and XML files found in |
| +$ACCUMULO_HOME/conf+. The structure closely resembles Hadoop's configuration |
| files. |
| |
| Logging is primarily controlled using the log4j configuration files, |
| +generic_logger.xml+ and +monitor_logger.xml+ (or their corresponding |
| +.properties+ version if the +.xml+ version is missing). The generic logger is |
| used for most server types, and is typically configured to send logs to the |
| monitor, as well as log files. The monitor logger is used by the monitor, and |
| is typically configured to log only errors the monitor itself generates, |
| rather than all the logs that it receives from other server types. |
| |
| ==== Edit conf/accumulo-env.sh |
| |
| Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh |
| and specify the following: |
| |
| . Enter the location of the installation directory of Accumulo for +$ACCUMULO_HOME+ |
| . Enter your system's Java home for +$JAVA_HOME+ |
| . Enter the location of Hadoop for +$HADOOP_PREFIX+ |
| . Choose a location for Accumulo logs and enter it for +$ACCUMULO_LOG_DIR+ |
| . Enter the location of ZooKeeper for +$ZOOKEEPER_HOME+ |
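
For example, a minimal accumulo-env.sh sketch might contain entries like the
following; every path below is an illustrative assumption and must match your
actual installation layout:

....
export ACCUMULO_HOME=/opt/accumulo/accumulo-X.Y.Z
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export HADOOP_PREFIX=/opt/hadoop
export ACCUMULO_LOG_DIR=$ACCUMULO_HOME/logs
export ZOOKEEPER_HOME=/opt/zookeeper
....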
| |
By default Accumulo TabletServers are set to use 1GB of memory. You may change
this by altering the value of +$ACCUMULO_TSERVER_OPTS+. Note that the syntax is
that of Java JVM command-line options. This value should be less than the physical
memory of the machines running TabletServers.
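
For example, a sketch raising the TabletServer heap to 2GB (illustrative only;
preserve any other options already present in the variable):

....
export ACCUMULO_TSERVER_OPTS="-Xmx2g -Xms2g"
....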
| |
| There are similar options for the master's memory usage and the garbage collector |
| process. Reduce these if they exceed the physical RAM of your hardware and |
| increase them, within the bounds of the physical RAM, if a process fails because of |
| insufficient memory. |
| |
| Note that you will be specifying the Java heap space in accumulo-env.sh. You should |
| make sure that the total heap space used for the Accumulo tserver and the Hadoop |
| DataNode and TaskTracker is less than the available memory on each slave node in |
| the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop |
| NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate |
| machines to allow them to use more heap space. If you are running these on the |
| same machine on a small cluster, likewise make sure their heap space settings fit |
| within the available memory. |
| |
| ==== Native Map |
| |
| The tablet server uses a data structure called a MemTable to store sorted key/value |
| pairs in memory when they are first received from the client. When a minor compaction |
| occurs, this data structure is written to HDFS. The MemTable will default to using |
| memory in the JVM but a JNI version, called the native map, can be used to significantly |
| speed up performance by utilizing the memory space of the native operating system. The |
native map also reduces the impact of JVM garbage collection: because its memory
lives outside the Java heap, the JVM pauses much less frequently.
| |
| ===== Building |
| |
| 32-bit and 64-bit Linux and Mac OS X versions of the native map can be built |
| from the Accumulo bin package by executing |
| +$ACCUMULO_HOME/bin/build_native_library.sh+. If your system's |
| default compiler options are insufficient, you can add additional compiler |
| options to the command line, such as options for the architecture. These will be |
| passed to the Makefile in the environment variable +USERFLAGS+. |
| |
| Examples: |
| |
| . +$ACCUMULO_HOME/bin/build_native_library.sh+ |
| . +$ACCUMULO_HOME/bin/build_native_library.sh -m32+ |
| |
| After building the native map from the source, you will find the artifact in |
| +$ACCUMULO_HOME/lib/native+. Upon starting up, the tablet server will look |
| in this directory for the map library. If the file is renamed or moved from its |
| target directory, the tablet server may not be able to find it. The system can |
| also locate the native maps shared library by setting +LD_LIBRARY_PATH+ |
| (or +DYLD_LIBRARY_PATH+ on Mac OS X) in +$ACCUMULO_HOME/conf/accumulo-env.sh+. |
| |
| ===== Native Maps Configuration |
| |
As mentioned, Accumulo will use the native libraries if they are found in the expected
location and +tserver.memory.maps.native.enabled+ is set to +true+ (which is the default).
Using the native map instead of the JVM map nets a noticeable improvement in ingest rates;
however, certain configuration variables are important to modify when increasing the size
of the native map.
| |
| To adjust the size of the native map, increase the value of +tserver.memory.maps.max+. |
| By default, the maximum size of the native map is 1GB. When increasing this value, it is |
| also important to adjust the values of +table.compaction.minor.logs.threshold+ and |
| +tserver.walog.max.size+. +table.compaction.minor.logs.threshold+ is the maximum |
| number of write-ahead log files that a tablet can reference before they will be automatically |
| minor compacted. +tserver.walog.max.size+ is the maximum size of a write-ahead log. |
| |
| The maximum size of the native maps for a server should be less than the product |
| of the write-ahead log maximum size and minor compaction threshold for log files: |
| |
| +$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= $tserver.memory.maps.max+ |
| |
| This formula ensures that minor compactions won't be automatically triggered before the native |
| maps can be completely saturated. |
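
As a worked sketch, the following hypothetical values satisfy the inequality,
since 3 * 1G = 3G >= 2G (the numbers are illustrative, not recommendations):

[source,xml]
<property>
  <name>table.compaction.minor.logs.threshold</name>
  <value>3</value>
</property>
<property>
  <name>tserver.walog.max.size</name>
  <value>1G</value>
</property>
<property>
  <name>tserver.memory.maps.max</name>
  <value>2G</value>
</property>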
| |
| Subsequently, when increasing the size of the write-ahead logs, it can also be important |
| to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log. |
| This is controlled via +tserver.wal.blocksize+. A basic recommendation is that when |
| +tserver.walog.max.size+ is larger than 2GB in size, set +tserver.wal.blocksize+ to 2GB. |
| Increasing the block size to a value larger than 2GB can result in decreased write |
| performance to the write-ahead log file which will slow ingest. |
| |
| ==== Cluster Specification |
| |
| On the machine that will serve as the Accumulo master: |
| |
| . Write the IP address or domain name of the Accumulo Master to the +$ACCUMULO_HOME/conf/masters+ file. |
| . Write the IP addresses or domain name of the machines that will be TabletServers in +$ACCUMULO_HOME/conf/slaves+, one per line. |
| |
| Note that if using domain names rather than IP addresses, DNS must be configured |
| properly for all machines participating in the cluster. DNS can be a confusing source |
| of errors. |
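
For example, with hypothetical hostnames, the two files might look like:

....
$ cat $ACCUMULO_HOME/conf/masters
master1.example.com

$ cat $ACCUMULO_HOME/conf/slaves
worker1.example.com
worker2.example.com
worker3.example.com
....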
| |
| ==== Accumulo Settings |
| Specify appropriate values for the following settings in |
| +$ACCUMULO_HOME/conf/accumulo-site.xml+ : |
| |
| [source,xml] |
| <property> |
| <name>instance.zookeeper.host</name> |
| <value>zooserver-one:2181,zooserver-two:2181</value> |
| <description>list of zookeeper servers</description> |
| </property> |
| |
This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
settings between processes and to help detect TabletServer failure.
| |
| [source,xml] |
| <property> |
| <name>instance.secret</name> |
| <value>DEFAULT</value> |
| </property> |
| |
The instance needs a secret to enable secure communication between servers. Configure your
secret and make sure that the +accumulo-site.xml+ file is not readable by other users.
| For alternatives to storing the +instance.secret+ in plaintext, please read the |
| +Sensitive Configuration Values+ section. |
| |
| Some settings can be modified via the Accumulo shell and take effect immediately, but |
| some settings require a process restart to take effect. See the configuration documentation |
| (available in the docs directory of the tarball and in <<configuration>>) for details. |
| |
One aspect of Accumulo's configuration that differs from the rest of the Hadoop
ecosystem is that the server-process classpath is determined in part by multiple values. A
bootstrap classpath is based solely on the `accumulo-start.jar`, Log4j and `$ACCUMULO_CONF_DIR`.
| |
| A second classloader is used to dynamically load all of the resources specified by `general.classpaths` |
| in `$ACCUMULO_CONF_DIR/accumulo-site.xml`. This value is a comma-separated list of regular-expression |
| paths which are all loaded into a secondary classloader. This includes Hadoop, Accumulo and ZooKeeper |
| jars necessary to run Accumulo. When this value is not defined, a default value is used which attempts |
to load Hadoop from multiple potential locations depending on how Hadoop was installed. It is strongly
recommended that `general.classpaths` be defined and limited to only the necessary jars, to prevent
extra jars from being unintentionally loaded into Accumulo processes.
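
As a rough sketch (the patterns below are illustrative assumptions that depend
on how Hadoop and ZooKeeper are installed; adjust them to your layout), a
narrowly scoped definition might look like:

[source,xml]
<property>
  <name>general.classpaths</name>
  <value>$ACCUMULO_HOME/lib/[^.].*.jar,$ZOOKEEPER_HOME/zookeeper[^.].*.jar,$HADOOP_PREFIX/share/hadoop/common/[^.].*.jar,$HADOOP_PREFIX/share/hadoop/hdfs/[^.].*.jar</value>
</property>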
| |
| ==== Hostnames in configuration files |
| |
| Accumulo has a number of configuration files which can contain references to other hosts in your |
| network. All of the "host" configuration files for Accumulo (+gc+, +masters+, +slaves+, +monitor+, |
| +tracers+) as well as +instance.volumes+ in accumulo-site.xml must contain some host reference. |
| |
While IP addresses, short hostnames, and fully qualified domain names (FQDNs) are all technically valid, it
is good practice to always use FQDNs for both Accumulo and the other processes in your Hadoop cluster.
Failing to consistently use FQDNs can have unexpected consequences in how Accumulo uses the FileSystem.
| |
A common way this problem manifests is in applications that use bulk ingest. The Accumulo
Master coordinates moving the bulk-ingest input files into an Accumulo-managed directory. However,
Accumulo cannot safely move files across different Hadoop FileSystems, and it cannot reliably
determine that the same FileSystem specified under different names is in fact one FileSystem.
For example, while +127.0.0.1:8020+ might be a valid identifier for an HDFS instance,
Accumulo treats +localhost:8020+ as a different HDFS instance than +127.0.0.1:8020+.
| |
| ==== Deploy Configuration |
| |
| Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml |
| from the +$ACCUMULO_HOME/conf/+ directory on the master to all the machines |
| specified in the slaves file. |
| |
| ==== Sensitive Configuration Values |
| |
Accumulo has a number of properties, specified via the accumulo-site.xml file,
which are sensitive in nature; +instance.secret+ and +trace.token.property.password+
are two common examples. Either of these properties, if compromised, can result in
data being leaked to users who should not have access to that data.
| |
In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common
abstraction for storing and retrieving passwords instead of keeping them in plaintext
configuration files. Any property marked with the +Sensitive+ annotation
is a candidate for use with these CredentialProviders. For versions of Hadoop which lack
these classes, the feature is simply unavailable.
| |
A comma-separated list of CredentialProviders can be configured using the Accumulo property
+general.security.credential.provider.paths+. Each configured URL will be consulted
when the Configuration object for accumulo-site.xml is accessed.
| |
| ==== Using a JavaKeyStoreCredentialProvider for storage |
| |
| One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider. |
| Each entry in the KeyStore is the Accumulo Property key name. For example, to store the |
| `instance.secret`, the following command can be used: |
| |
| hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks |
| |
| The command will then prompt you to enter the secret to use and create a keystore in: |
| |
| /etc/accumulo/conf/accumulo.jceks |
| |
| Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider: |
| |
| [source,xml] |
| <property> |
| <name>general.security.credential.provider.paths</name> |
| <value>jceks://file/etc/accumulo/conf/accumulo.jceks</value> |
| </property> |
| |
This configuration will then transparently extract the +instance.secret+ from
the configured KeyStore, avoiding human-readable storage of the sensitive
property.
| |
| A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to |
| all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server |
| will expect the KeyStore in the same location. |
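
For example, a provider path referencing an HDFS-hosted KeyStore (the namenode
address and path are assumptions) would look like:

[source,xml]
<property>
  <name>general.security.credential.provider.paths</name>
  <value>jceks://hdfs@namenode.example.com:8020/accumulo/accumulo.jceks</value>
</property>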
| |
| [[ClientConfiguration]] |
| ==== Client Configuration |
| |
In version 1.6.0, Accumulo introduced a new type of configuration file known as a client
configuration file. One problem with the traditional "site.xml" file that is prevalent
throughout Hadoop is that it is a single file used by both clients and servers. This makes
it very difficult to protect secrets that are only meant for the server processes while
still allowing the clients to connect to the servers.
| |
The client configuration file is a subset of the information stored in accumulo-site.xml,
meant only for consumption by clients of Accumulo. By default, Accumulo checks the
following locations for a client configuration file:
| |
| * +\${ACCUMULO_CONF_DIR}/client.conf+ |
| * +/etc/accumulo/client.conf+ |
| * +/etc/accumulo/conf/client.conf+ |
| * +~/.accumulo/config+ |
| |
These files are https://en.wikipedia.org/wiki/.properties[Java Properties files]. They
can currently contain information about ZooKeeper servers, RPC properties (such as SSL or SASL
connectors), and distributed tracing properties. Valid properties are defined by the
| https://github.com/apache/accumulo/blob/f1d0ec93d9f13ff84844b5ac81e4a7b383ced467/core/src/main/java/org/apache/accumulo/core/client/ClientConfiguration.java#L54[ClientProperty] |
| enum contained in the client API. |
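
A minimal client configuration sketch (instance name and ZooKeeper hosts are
hypothetical):

....
instance.name=myinstance
instance.zookeeper.host=zooserver-one:2181,zooserver-two:2181
....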
| |
| ==== Custom Table Tags |
| |
Accumulo allows users to add custom tags to tables, so that applications can set
application-level metadata about a table. These tags can be anything from a table
description to administrator notes or a creation date. This is done by naming and
setting a property with the prefix +table.custom.*+, as in the shell sketch below.
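
A brief shell sketch (table name, tag name, and value are hypothetical):

....
root@myinstance> config -t mytable -s table.custom.description=weblogs_for_appA
....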
| |
Currently, table properties are stored in ZooKeeper. This means the number
and size of custom properties should be restricted: on the order of tens of properties
at most, with no property exceeding 1MB in size. ZooKeeper's performance can be
very sensitive to an excessive number of nodes and to the sizes of those nodes.
Applications which make use of custom properties should take these warnings into
consideration; there is no enforcement of them via the API.
| |
| ==== Configuring the ClassLoader |
| |
| Accumulo loads classes from the locations specified in the +general.classpaths+ property. Additionally, Accumulo will load classes |
| from the locations specified in the +general.dynamic.classpaths+ property and will monitor and reload them if they change. The reloading |
| feature is useful during the development and testing of iterators as new or modified iterator classes can be deployed to Accumulo without |
| having to restart the database. |
| |
Accumulo also has an alternate configuration for the classloader which will allow it to load classes from remote locations. This mechanism
uses Apache Commons VFS, which enables locations such as http and hdfs to be used. This alternate configuration also uses the
+general.classpaths+ property in the same manner described above. It differs in that you configure the
+general.vfs.classpaths+ property instead of the +general.dynamic.classpaths+ property. As in the default configuration, this alternate
configuration will also monitor the VFS locations for changes and reload if necessary.
| |
| ===== ClassLoader Contexts |
| |
With the addition of the VFS-based classloader, we introduced the notion of classloader contexts. A context is identified
by a name and references a set of locations from which to load classes; it can be specified in the accumulo-site.xml file or added
using the +config+ command in the shell. Below is an example specifying the app1 context in the accumulo-site.xml file:
| |
| [source,xml] |
| <property> |
| <name>general.vfs.context.classpath.app1</name> |
| <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value> |
| <description>Application A classpath, loads jars from HDFS and local file system</description> |
| </property> |
| |
The default behavior follows the Java ClassLoader contract in that classes, if they exist, are loaded from the parent classloader first.
You can override this behavior by delegating to the parent classloader only after looking in this classloader first. An example of this
configuration is:
| |
| [source,xml] |
| <property> |
| <name>general.vfs.context.classpath.app1.delegation=post</name> |
| <value>hdfs://localhost:8020/applicationA/classpath/.*.jar,file:///opt/applicationA/lib/.*.jar</value> |
| <description>Application A classpath, loads jars from HDFS and local file system</description> |
| </property> |
| |
To use contexts in your application, set +table.classpath.context+ on your tables or use the +setClassLoaderContext()+ method on Scanner
and BatchScanner, passing in the name of the context (app1 in the example above). Setting the property on the table allows your minor
compaction, major compaction, and scan iterators to load classes from the locations defined by the context. Passing the context name to
the scanners allows you to override the table setting to load only scan-time iterators from a different location, as in the sketch below.
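
A brief sketch, assuming a hypothetical table +mytable+ and the +app1+ context
from above:

[source,java]
Scanner scanner = conn.createScanner("mytable", Authorizations.EMPTY);
// Scan-time iterators for this scanner load from the app1 context locations
scanner.setClassLoaderContext("app1");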
| |
| |
| === Initialization |
| |
| Accumulo must be initialized to create the structures it uses internally to locate |
| data across the cluster. HDFS is required to be configured and running before |
| Accumulo can be initialized. |
| |
| Once HDFS is started, initialization can be performed by executing |
| +$ACCUMULO_HOME/bin/accumulo init+ . This script will prompt for a name |
| for this instance of Accumulo. The instance name is used to identify a set of tables |
| and instance-specific settings. The script will then write some information into |
| HDFS so Accumulo can start properly. |
| |
| The initialization script will prompt you to set a root password. Once Accumulo is |
| initialized it can be started. |
| |
| === Running |
| |
| ==== Starting Accumulo |
| |
Make sure Hadoop is configured on all of the machines in the cluster, including
access to a shared HDFS instance, and that ZooKeeper is configured and running
on at least one machine in the cluster. Make sure HDFS and ZooKeeper are running,
then start Accumulo using the +bin/start-all.sh+ script.
| |
| To verify that Accumulo is running, check the Status page as described in |
| <<monitoring>>. In addition, the Shell can provide some information about the status of |
| tables via reading the metadata tables. |
| |
| ==== Stopping Accumulo |
| |
To shut down cleanly, run +bin/stop-all.sh+ and the master will orchestrate the
shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
take some time depending on the configuration.
| |
| ==== Adding a Node |
| |
| Update your +$ACCUMULO_HOME/conf/slaves+ (or +$ACCUMULO_CONF_DIR/slaves+) file to account for the addition. |
| |
| Next, ssh to each of the hosts you want to add and run: |
| |
| $ACCUMULO_HOME/bin/start-here.sh |
| |
| Make sure the host in question has the new configuration, or else the tablet |
| server won't start; at a minimum this needs to be on the host(s) being added, |
| but in practice it's good to ensure consistent configuration across all nodes. |
| |
==== Decommissioning a Node
| |
| If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet |
| server. Accumulo will automatically rebalance the tablets across the available tablet servers. |
| |
| $ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...} |
| |
| Alternatively, you can ssh to each of the hosts you want to remove and run: |
| |
| $ACCUMULO_HOME/bin/stop-here.sh |
| |
Be sure to update your +$ACCUMULO_HOME/conf/slaves+ (or +$ACCUMULO_CONF_DIR/slaves+) file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
slaves file automatically, so it will report the decommissioned servers as down; it is
recommended that you restart the monitor so that the node list is up to date.
| |
| ==== Restarting process on a node |
| |
| Occasionally, it might be necessary to restart the processes on a specific node. In addition |
| to the +start-all.sh+ and +stop-all.sh+ scripts, Accumulo contains scripts to start/stop all processes |
| on a node and start/stop a given process on a node. |
| |
| +start-here.sh+ and +stop-here.sh+ will start/stop all Accumulo processes on the current node. The |
| necessary processes to start/stop are determined via the "hosts" files (e.g. slaves, masters, etc). |
| These scripts expect no arguments. |
| |
| +start-server.sh+ can also be useful in starting a given process on a host. |
| The first argument to the process is the hostname of the machine. Use the same host that |
| you specified in hosts file (if you specified FQDN in the masters file, use the FQDN, not |
| the shortname). The second argument is the name of the process to start (e.g. master, tserver). |
| |
The steps described to decommission a node can also be used (without removing the host
from the +$ACCUMULO_HOME/conf/slaves+ file) to gracefully stop a node. This ensures
that the tabletserver is cleanly stopped and that recovery will not need to be performed
when the tablets are re-hosted.
| |
| ===== A note on rolling restarts |
| |
| For sufficiently large Accumulo clusters, restarting multiple TabletServers within a short window can place significant |
| load on the Master server. If slightly lower availability is acceptable, this load can be reduced by globally setting |
| +table.suspend.duration+ to a positive value. |
| |
| With +table.suspend.duration+ set to, say, +5m+, Accumulo will wait |
| for 5 minutes for any dead TabletServer to return before reassigning that TabletServer's responsibilities to other TabletServers. |
| If the TabletServer returns to the cluster before the specified timeout has elapsed, Accumulo will assign the TabletServer |
| its original responsibilities. |
| |
| It is important not to choose too large a value for +table.suspend.duration+, as during this time, all scans against the |
| data that TabletServer had hosted will block (or time out). |
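
For example, a five-minute suspension window can be set globally from the shell:

....
root@myinstance> config -s table.suspend.duration=5m
....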
| |
| ==== Running multiple TabletServers on a single node |
| |
With very powerful nodes, it may be beneficial to run more than one TabletServer on a given
node. This decision should be made carefully and with much deliberation, as Accumulo is designed
to be able to scale to using tens of GB of RAM and tens of CPU cores.
| |
| To run multiple TabletServers on a single host you will need to change the +NUM_TSERVERS+ property |
| in the accumulo-env.sh file from 1 to the number of TabletServers that you want to run. On NUMA |
| hardware, with numactl installed, the TabletServer will interleave its memory allocations across |
| the NUMA nodes and the processes will be scheduled on all the NUMA cores without restriction. To |
| change this behavior you can uncomment the +TSERVER_NUMA_OPTIONS+ example in accumulo-env.sh and |
| set the numactl options for each TabletServer. |
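
For example, to run two TabletServers on a host (a minimal sketch; the commented
+TSERVER_NUMA_OPTIONS+ example in accumulo-env.sh shows the exact syntax for
per-server numactl options):

....
export NUM_TSERVERS=2
....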
| |
| Accumulo TabletServers bind certain ports on the host to accommodate remote procedure calls to/from |
| other nodes. Running more than one TabletServer on a host requires that you set the following |
| properties in +accumulo-site.xml+: |
| |
| <property> |
| <name>tserver.port.client</name> |
| <value>0</value> |
| </property> |
| <property> |
| <name>replication.receipt.service.port</name> |
| <value>0</value> |
| </property> |
| |
| Accumulo's provided scripts for starting and stopping the cluster should work normally with multiple |
| TabletServers on a host. Sanity checks are provided in the scripts and will output an error when there |
| is a configuration mismatch. |
| |
| [[monitoring]] |
| === Monitoring |
| |
| ==== Accumulo Monitor |
| The Accumulo Monitor provides an interface for monitoring the status and health of |
| Accumulo components. The Accumulo Monitor provides a web UI for accessing this information at |
| +http://_monitorhost_:9995/+. |
| |
| Things highlighted in yellow may be in need of attention. |
| If anything is highlighted in red on the monitor page, it is something that definitely needs attention. |
| |
| The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID. |
There is a table labeled Accumulo Master with current status, a table listing the active ZooKeeper servers, and graphs displaying various metrics over time.
| These include ingest and scan performance and other useful measurements. |
| |
| The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table). |
| Metrics typically include number of entries (key/value pairs), ingest and query rates. |
| The number of running scans, major and minor compactions are in the form _number_running_ (_number_queued_). |
| Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction. |
| |
| The Server Activity page graphically displays tablet server status, with each server represented as a circle or square. |
| Different metrics may be assigned to the nodes' color and speed of oscillation. |
| The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization). |
| Similarly, the Overall Max metric picks the metric with the maximum normalized value. |
| |
| The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle. |
| The Traces page displays data for recent traces performed (see the following section for information on <<tracing>>). |
| The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes. |
| Also, the XML and JSON links provide metrics in XML and JSON formats, respectively. |
| |
The Accumulo monitor makes a best effort not to display any sensitive information to users; however,
the monitor is intended to be a tool used with care. It is not a production-grade webservice. It is
a good idea to whitelist access to the monitor via an authentication proxy or firewall. It
is strongly recommended that the monitor not be exposed to any publicly accessible networks.
| |
| ==== SSL |
| SSL may be enabled for the monitor page by setting the following properties in the +accumulo-site.xml+ file: |
| |
| monitor.ssl.keyStore |
| monitor.ssl.keyStorePassword |
| monitor.ssl.trustStore |
| monitor.ssl.trustStorePassword |
| |
| If the Accumulo conf directory has been configured (in particular the +accumulo-env.sh+ file must be set up), the +generate_monitor_certificate.sh+ script in the Accumulo +bin+ directory can be used to create the keystore and truststore files with random passwords. |
| The script will print out the properties that need to be added to the +accumulo-site.xml+ file. |
| The stores can also be generated manually with the Java +keytool+ command, whose usage can be seen in the +generate_monitor_certificate.sh+ script. |
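
If generating the stores manually, a +keytool+ invocation of roughly this shape creates a self-signed keystore (the alias, filename, and password handling here are assumptions; consult the script for the exact steps it performs):

....
keytool -genkeypair -alias accumulo-monitor -keyalg RSA \
    -keystore keystore.jks -storepass <password>
....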
| |
| If desired, the SSL ciphers allowed for connections can be controlled via the following properties in +accumulo-site.xml+: |
| |
| monitor.ssl.include.ciphers |
| monitor.ssl.exclude.ciphers |
| |
| If SSL is enabled, the monitor URL can only be accessed via https. |
| This also allows you to access the Accumulo shell through the monitor page. |
| The left navigation bar will have a new link to Shell. |
| An Accumulo user name and password must be entered for access to the shell. |
| |
| === Metrics |
| Accumulo is capable of using the Hadoop Metrics2 library and is configured by default to use it. Metrics2 is a library |
| which allows for routing of metrics generated by registered MetricsSources to configured MetricsSinks. Examples of sinks |
| that are implemented by Hadoop include file-based logging, Graphite and Ganglia. All metric sources are exposed via JMX |
| when using Metrics2. |
| |
Prior to Accumulo 1.7.0, JMX endpoints could be exposed in addition to file-based logging of those metrics configured via
the +accumulo-metrics.xml+ file. This mechanism can still be used by setting +general.legacy.metrics+ to +true+ in +accumulo-site.xml+.
| |
| ==== Metrics2 Configuration |
| |
| Metrics2 is configured by examining the classpath for a file that matches +hadoop-metrics2*.properties+. The example configuration |
| files that Accumulo provides for use include +hadoop-metrics2-accumulo.properties+ as a template which can be used to enable |
| file, Graphite or Ganglia sinks (some minimal configuration required for Graphite and Ganglia). Because the Hadoop configuration is |
| also on the Accumulo classpath, be sure that you do not have multiple Metrics2 configuration files. It is recommended to consolidate |
| metrics in a single properties file in a central location to remove ambiguity. The contents of +hadoop-metrics2-accumulo.properties+ |
| can be added to a central +hadoop-metrics2.properties+ in +$HADOOP_CONF_DIR+. |
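
As a sketch of a file sink entry (following Metrics2's +prefix.sink.instance.option+ convention; the +accumulo+ prefix and the filename are assumptions to verify against the shipped example file):

....
accumulo.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
accumulo.sink.file.filename=/var/log/accumulo/metrics.out
....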
| |
When configuring the file sink, the provided path should be absolute. A relative path or bare file name will be created relative
to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files
from growing without bound.
| |
Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check
the Accumulo processes' log files when debugging missing metrics output.
| |
| For additional information on configuring Metrics2, visit the |
| https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html[Javadoc page for Metrics2]. |
| |
| [[tracing]] |
| === Tracing |
| It can be difficult to determine why some operations are taking longer |
| than expected. For example, you may be looking up items with very low |
| latency, but sometimes the lookups take much longer. Determining the |
| cause of the delay is difficult because the system is distributed, and |
| the typical lookup is fast. |
| |
Accumulo has been instrumented to record the time that various
operations take when tracing is turned on. When tracing is enabled, it
follows all the requests made on behalf of the user throughout
the distributed infrastructure of Accumulo, and across all threads of
execution.
| |
| These time spans will be inserted into the +trace+ table in |
| Accumulo. You can browse recent traces from the Accumulo monitor |
| page. You can also read the +trace+ table directly like any |
| other table. |
| |
| The design of Accumulo's distributed tracing follows that of |
| http://research.google.com/pubs/pub36356.html[Google's Dapper]. |
| |
| ==== Tracers |
To collect traces, Accumulo needs at least one server listed in
+$ACCUMULO_HOME/conf/tracers+. The server collects traces
from clients and writes them to the +trace+ table. The Accumulo
user that the tracer uses to connect to Accumulo can be configured with
the following properties
(see the <<configuration,Configuration>> section for setting Accumulo server properties):
| |
| trace.user |
| trace.token.property.password |
| |
| Other tracer configuration properties include |
| |
| trace.port.client - port tracer listens on |
| trace.table - table tracer writes to |
| trace.zookeeper.path - zookeeper path where tracers register |
| |
The ZooKeeper path is configured to +/tracers+ by default. If
multiple Accumulo instances are sharing the same ZooKeeper
quorum, take care to configure each Accumulo instance with a unique
value for this property.
| |
| ==== Configuring Tracing |
| Traces are collected via SpanReceivers. The default SpanReceiver |
| configured is org.apache.accumulo.core.trace.ZooTraceClient, which |
| sends spans to an Accumulo Tracer process, as discussed in the |
| previous section. This default can be changed to a different span |
| receiver, or additional span receivers can be added in a |
| comma-separated list, by modifying the property |
| |
| trace.span.receivers |
| |
| Individual span receivers may require their own configuration |
| parameters, which are grouped under the trace.span.receiver.* |
| prefix. ZooTraceClient uses the following properties. The first |
| three properties are populated from other Accumulo properties, |
| while the remaining ones should be prefixed with |
| trace.span.receiver. when set in the Accumulo configuration. |
| |
| tracer.zookeeper.host - populated from instance.zookeepers |
| tracer.zookeeper.timeout - populated from instance.zookeeper.timeout |
| tracer.zookeeper.path - populated from trace.zookeeper.path |
| tracer.send.timer.millis - timer for flushing send queue (in ms, default 1000) |
| tracer.queue.size - max queue size (default 5000) |
| tracer.span.min.ms - minimum span length to store (in ms, default 1) |
| |
| Note that to configure an Accumulo client for tracing, including |
| the Accumulo shell, the client configuration must be given the same |
| trace.span.receivers, trace.span.receiver.*, and trace.zookeeper.path |
| properties as the servers have. |
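
For example, a client configuration sketch matching a default tracer setup
(values are illustrative) might include:

....
trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient
trace.zookeeper.path=/tracers
....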
| |
| Hadoop can also be configured to send traces to Accumulo, as of |
| Hadoop 2.6.0, by setting properties in Hadoop's core-site.xml |
| file. Instead of using the trace.span.receiver.* prefix, Hadoop |
| uses hadoop.htrace.*. The Hadoop configuration does not have |
| access to Accumulo's properties, so the |
| hadoop.htrace.tracer.zookeeper.host property must be specified. |
| The zookeeper timeout defaults to 30000 (30 seconds), and the |
| zookeeper path defaults to /tracers. An example of configuring |
| Hadoop to send traces to ZooTraceClient is |
| |
| <property> |
| <name>hadoop.htrace.spanreceiver.classes</name> |
| <value>org.apache.accumulo.core.trace.ZooTraceClient</value> |
| </property> |
| <property> |
| <name>hadoop.htrace.tracer.zookeeper.host</name> |
| <value>zookeeperHost:2181</value> |
| </property> |
| <property> |
| <name>hadoop.htrace.tracer.zookeeper.path</name> |
| <value>/tracers</value> |
| </property> |
| <property> |
| <name>hadoop.htrace.tracer.span.min.ms</name> |
| <value>1</value> |
| </property> |
| |
| The accumulo-core, accumulo-tracer, accumulo-fate and libthrift |
| jars must also be placed on Hadoop's classpath. |
| |
| ===== Adding additional SpanReceivers |
https://github.com/openzipkin/zipkin[Zipkin]
has a SpanReceiver supported by HTrace and popularized by Twitter;
users looking for a more graphical trace display may opt to use it.
The following steps configure Accumulo to use +org.apache.htrace.impl.ZipkinSpanReceiver+
in addition to Accumulo's default ZooTraceClient, and they serve as a template
for adding any SpanReceiver to Accumulo:
| |
1. Add the Jar containing the ZipkinSpanReceiver class file to
+$ACCUMULO_HOME/lib/+. It is critical that the Jar is placed in
+lib/+ and NOT in +lib/ext/+, so that the new SpanReceiver class
is visible to the same class loader as htrace-core.
| |
| 2. Add the following to +$ACCUMULO_HOME/conf/accumulo-site.xml+: |
| |
| <property> |
| <name>trace.span.receivers</name> |
| <value>org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver</value> |
| </property> |
| |
| 3. Restart your Accumulo tablet servers. |
| |
| In order to use ZipkinSpanReceiver from a client as well as the Accumulo server, |
| |
| 1. Ensure your client can see the ZipkinSpanReceiver class at runtime. For Maven projects, |
| this is easily done by adding to your client's pom.xml (taking care to specify a good version) |
| |
| <dependency> |
| <groupId>org.apache.htrace</groupId> |
| <artifactId>htrace-zipkin</artifactId> |
| <version>3.1.0-incubating</version> |
| <scope>runtime</scope> |
| </dependency> |
| |
| 2. Add the following to your ClientConfiguration |
| (see the <<ClientConfiguration>> section) |
| |
| trace.span.receivers=org.apache.accumulo.tracer.ZooTraceClient,org.apache.htrace.impl.ZipkinSpanReceiver |
| |
| 3. Instrument your client as in the next section. |
| |
| Your SpanReceiver may require additional properties, and if so these should likewise |
| be placed in the ClientConfiguration (if applicable) and Accumulo's +accumulo-site.xml+. |
| Two such properties for ZipkinSpanReceiver, listed with their default values, are |
| |
| <property> |
| <name>trace.span.receiver.zipkin.collector-hostname</name> |
| <value>localhost</value> |
| </property> |
| <property> |
| <name>trace.span.receiver.zipkin.collector-port</name> |
| <value>9410</value> |
| </property> |
| |
| ==== Instrumenting a Client |
Tracing can be used to measure a client operation, such as a scan, as
the operation traverses the distributed system. To enable tracing for
your application, call
| |
| [source,java] |
| import org.apache.accumulo.core.trace.DistributedTrace; |
| ... |
| DistributedTrace.enable(hostname, "myApplication"); |
| // do some tracing |
| ... |
| DistributedTrace.disable(); |
| |
| Once tracing has been enabled, a client can wrap an operation in a trace. |
| |
| [source,java] |
| import org.apache.htrace.Sampler; |
| import org.apache.htrace.Trace; |
| import org.apache.htrace.TraceScope; |
| ... |
| TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS); |
| BatchScanner scanner = conn.createBatchScanner(...); |
| // Configure your scanner |
for (Entry<Key,Value> entry : scanner) {
| } |
| scope.close(); |
| |
| The user can create additional Spans within a Trace. |
| |
| The sampler (such as +Sampler.ALWAYS+) for the trace should only be specified with a top-level span, |
| and subsequent spans will be collected depending on whether that first span was sampled. |
| Don't forget to specify a Sampler at the top-level span |
| because the default Sampler only samples when part of a pre-existing trace, |
| which will never occur in a client that never specifies a Sampler. |
| |
| [source,java] |
| TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS); |
| ... |
| TraceScope readScope = Trace.startSpan("Read"); |
| ... |
| readScope.close(); |
| ... |
| TraceScope writeScope = Trace.startSpan("Write"); |
| ... |
| writeScope.close(); |
| scope.close(); |
| |
Like Dapper, Accumulo tracing supports user-defined annotations to associate additional data with a trace.
When using a sampler other than +Sampler.ALWAYS+, it is necessary to check whether the client is currently tracing.
| |
| [source,java] |
| ... |
| int numberOfEntriesRead = 0; |
| TraceScope readScope = Trace.startSpan("Read"); |
| // Do the read, update the counter |
| ... |
if (Trace.isTracing())
| readScope.getSpan().addKVAnnotation("Number of Entries Read".getBytes(StandardCharsets.UTF_8), |
| String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8)); |
| |
| It is also possible to add timeline annotations to your spans. |
| This associates a string with a given timestamp between the start and stop times for a span. |
| |
| [source,java] |
| ... |
| writeScope.getSpan().addTimelineAnnotation("Initiating Flush"); |
| |
| Some client operations may have a high volume within your |
| application. As such, you may wish to only sample a percentage of |
| operations for tracing. As seen below, the CountSampler can be used to |
| help enable tracing for 1-in-1000 operations |
| |
| [source,java] |
| import org.apache.htrace.impl.CountSampler; |
| ... |
| Sampler sampler = new CountSampler(HTraceConfiguration.fromMap( |
| Collections.singletonMap(CountSampler.SAMPLER_FREQUENCY_CONF_KEY, "1000"))); |
| ... |
| TraceScope readScope = Trace.startSpan("Read", sampler); |
| ... |
| readScope.close(); |
| |
| Remember to close all spans and disable tracing when finished. |
| |
| [source,java] |
| DistributedTrace.disable(); |
| |
| ==== Viewing Collected Traces |
| To view collected traces, use the "Recent Traces" link on the Monitor |
| UI. You can also programmatically access and print traces using the |
| +TraceDump+ class. |
| |
| ===== Trace Table Format |
| This section is for developers looking to use data recorded in the trace table |
| directly, above and beyond the default services of the Accumulo monitor. |
| Please note the trace table format and its supporting classes |
| are not in the public API and may be subject to change in future versions. |
| |
| Each span received by a tracer's ZooTraceClient is recorded in the trace table |
| in the form of three entries: span entries, index entries, and start time entries. |
| Span and start time entries record full span information, |
| whereas index entries provide indexing into span information |
| useful for quickly finding spans by type or start time. |
| |
| Each entry is illustrated by a description and sample of data. |
| In the description, a token in quotes is a String literal, |
whereas other tokens are span variables.
| Parentheses group parts together, to distinguish colon characters inside the |
| column family or qualifier from the colon that separates column family and qualifier. |
| We use the format +row columnFamily:columnQualifier columnVisibility value+ |
| (omitting timestamp which records the time an entry is written to the trace table). |
| |
| Span entries take the following form: |
| |
| traceId "span":(parentSpanId:spanId) [] spanBinaryEncoding |
| 63b318de80de96d1 span:4b8f66077df89de1:3778c6739afe4e1 [] %18;%09;... |
| |
| The parentSpanId is "" for the root span of a trace. |
| The spanBinaryEncoding is a compact Apache Thrift encoding of the original Span object. |
| This allows clients (and the Accumulo monitor) to recover all the details of the original Span |
| at a later time, by scanning the trace table and decoding the value of span entries |
| via +TraceFormatter.getRemoteSpan(entry)+. |
| |
| The trace table has a formatter class by default (org.apache.accumulo.tracer.TraceFormatter) |
| that changes how span entries appear from the Accumulo shell. |
| Normal scans to the trace table do not use this formatter representation; |
| it exists only to make span entries easier to view inside the Accumulo shell. |
| |
| Index entries take the following form: |
| |
| "idx":service:startTime description:sender [] traceId:elapsedTime |
| idx:tserver:14f3828f58b startScan:localhost [] 63b318de80de96d1:1 |
| |
The service and sender are set by the first call of each Accumulo process
(and instrumented client processes) to +DistributedTrace.enable(...)+
(the sender is autodetected if not specified).
The description is specified in each span.
The start time and the elapsed time (stop - start, 1 millisecond in the example above)
are recorded in milliseconds as long values serialized to a string in hex.
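
For example, a hex-encoded time from an index entry can be decoded with
standard Java (a minimal sketch using the sample values above):

[source,java]
long startMillis = Long.parseLong("14f3828f58b", 16); // start time, ms since the epoch
long elapsedMillis = Long.parseLong("1", 16);         // elapsed time, 1 ms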
| |
| Start time entries take the following form: |
| |
| "start":startTime "id":traceId [] spanBinaryEncoding |
| start:14f3828a351 id:63b318de80de96d1 [] %18;%09;... |
| |
| The following classes may be run from $ACCUMULO_HOME while Accumulo is running |
| to provide insight into trace statistics. These require |
| accumulo-trace-VERSION.jar to be provided on the Accumulo classpath |
| (+$ACCUMULO_HOME/lib/ext+ is fine). |
| |
| $ bin/accumulo org.apache.accumulo.tracer.TraceTableStats -u username -p password -i instancename |
| $ bin/accumulo org.apache.accumulo.tracer.TraceDump -u username -p password -i instancename -r |
| |
| ==== Tracing from the Shell |
| You can enable tracing for operations run from the shell by using the |
| +trace on+ and +trace off+ commands. |
| |
| ---- |
| root@test test> trace on |
| |
| root@test test> scan |
| a b:c [] d |
| |
| root@test test> trace off |
| Waiting for trace information |
| Waiting for trace information |
| Trace started at 2013/08/26 13:24:08.332 |
| Time Start Service@Location Name |
| 3628+0 shell@localhost shell:root |
| 8+1690 shell@localhost scan |
| 7+1691 shell@localhost scan:location |
| 6+1692 tserver@localhost startScan |
| 5+1692 tserver@localhost tablet read ahead 6 |
| ---- |
| |
| === Logging |
Accumulo processes each write to a set of log files. By default these are found under
+$ACCUMULO_HOME/logs/+.
| |
| [[watcher]] |
| === Watcher |
| Accumulo includes scripts to automatically restart server processes in the case |
| of intermittent failures. To enable this watcher, edit +conf/accumulo-env.sh+ |
| to include the following: |
| |
| .... |
| # Should process be automatically restarted |
| export ACCUMULO_WATCHER="true" |
| |
| # What settings should we use for the watcher, if enabled |
| export UNEXPECTED_TIMESPAN="3600" |
| export UNEXPECTED_RETRIES="2" |
| |
| export OOM_TIMESPAN="3600" |
| export OOM_RETRIES="5" |
| |
| export ZKLOCK_TIMESPAN="600" |
| export ZKLOCK_RETRIES="5" |
| .... |
| |
| When an Accumulo process dies, the watcher will look at the logs and exit codes |
| to determine how the process failed and either restart or fail depending on the |
| recent history of failures. The restarting policy for various failure conditions |
| is configurable through the +*_TIMESPAN+ and +*_RETRIES+ variables shown above. |
| |
| === Recovery |
| |
In the event of TabletServer failure, or an error while shutting Accumulo down, some
mutations may not have been minor compacted to HDFS properly. In this case,
Accumulo will automatically reapply such mutations from the write-ahead log,
either when the tablets from the failed server are reassigned by the Master (in the
case of a single TabletServer failure) or the next time Accumulo starts (in the event of
failure during shutdown).
| |
Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
updates. The sort status of each file is displayed on the
Accumulo monitor status page. Once the recovery is complete, any
tablets involved should return to an ``online'' state. Until then, those tablets will be
unavailable to clients.
| |
| The Accumulo client library is configured to retry failed mutations and in many |
| cases clients will be able to continue processing after the recovery process without |
| throwing an exception. |
| |
| === Migrating Accumulo from non-HA Namenode to HA Namenode |
| |
| The following steps will allow a non-HA instance to be migrated to an HA instance. Consider an HDFS URL |
| +hdfs://namenode.example.com:8020+ which is going to be moved to +hdfs://nameservice1+. |
| |
| Before moving HDFS over to the HA namenode, use +$ACCUMULO_HOME/bin/accumulo admin volumes+ to confirm |
| that the only volume displayed is the volume from the current namenode's HDFS URL. |
| |
| Listing volumes referenced in zookeeper |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| |
| Listing volumes referenced in accumulo.root tablets section |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time) |
| |
| Listing volumes referenced in accumulo.metadata tablets section |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| |
| Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time) |
| |
| After verifying the current volume is correct, shut down the cluster and transition HDFS to the HA nameservice. |
| |
Edit +$ACCUMULO_HOME/conf/accumulo-site.xml+ to notify Accumulo that a volume is being replaced. First,
add the new nameservice volume to the +instance.volumes+ property. Next, add the
+instance.volumes.replacements+ property in the form of +old new+. It is important not to include
the volume that is being replaced in +instance.volumes+; otherwise it is possible Accumulo could continue
to write to the volume.
| |
| [source,xml] |
| <!-- instance.dfs.uri and instance.dfs.dir should not be set--> |
| <property> |
| <name>instance.volumes</name> |
| <value>hdfs://nameservice1/accumulo</value> |
| </property> |
| <property> |
| <name>instance.volumes.replacements</name> |
| <value>hdfs://namenode.example.com:8020/accumulo hdfs://nameservice1/accumulo</value> |
| </property> |
| |
Run +$ACCUMULO_HOME/bin/accumulo init --add-volumes+ and start up the Accumulo cluster. Verify that the
new nameservice volume shows up with +$ACCUMULO_HOME/bin/accumulo admin volumes+.
| |
| |
| Listing volumes referenced in zookeeper |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| Volume : hdfs://nameservice1/accumulo |
| |
| Listing volumes referenced in accumulo.root tablets section |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| Volume : hdfs://nameservice1/accumulo |
| Listing volumes referenced in accumulo.root deletes section (volume replacement occurrs at deletion time) |
| |
| Listing volumes referenced in accumulo.metadata tablets section |
| Volume : hdfs://namenode.example.com:8020/accumulo |
| Volume : hdfs://nameservice1/accumulo |
| Listing volumes referenced in accumulo.metadata deletes section (volume replacement occurrs at deletion time) |
| |
| Some erroneous GarbageCollector messages may still be seen for a small period while data is transitioning to |
the new volumes. This is expected and can usually be ignored.

| === Achieving Stability in a VM Environment |
| |
For testing, demonstration, and even operational use, Accumulo is often
installed and run in a virtual machine (VM) environment. The majority of
long-term operational uses of Accumulo are on bare-metal clusters. However, the
core design of Accumulo and its dependencies does not preclude running stably for
long periods within a VM. Many of Accumulo's operational robustness features for
handling failures, such as periodic network partitioning in a large cluster, carry
over well to VM environments. This guide covers general recommendations for
maximizing stability in a VM environment, including some of the failure
modes that are more common when running in VMs.
| |
| ==== Known failure modes: Setup and Troubleshooting |
| In addition to the general failure modes of running Accumulo, VMs can introduce a |
| couple of environmental challenges that can affect process stability. Clock |
| drift is something that is more common in VMs, especially when VMs are |
| suspended and resumed. Clock drift can cause Accumulo servers to assume that |
| they have lost connectivity to the other Accumulo processes and/or lose their |
locks in ZooKeeper. VM environments also frequently have constrained resources,
such as CPU, RAM, network, and disk throughput and capacity. Accumulo generally
deals well with constrained resources from a stability perspective (optimizing
performance will require additional tuning, which is not covered in this
section); however, there are some limits.
| |
| ===== Physical Memory |
| One of those limits has to do with the Linux out of memory killer. A common |
| failure mode in VM environments (and in some bare metal installations) is when |
| the Linux out of memory killer decides to kill processes in order to avoid a |
| kernel panic when provisioning a memory page. This often happens in VMs due to |
| the large number of processes that must run in a small memory footprint. In |
| addition to the Linux core processes, a single-node Accumulo setup requires a |
| Hadoop Namenode, a Hadoop Secondary Namenode a Hadoop Datanode, a Zookeeper |
| server, an Accumulo Master, an Accumulo GC and an Accumulo TabletServer. |
| Typical setups also include an Accumulo Monitor, an Accumulo Tracer, a Hadoop |
| ResourceManager, a Hadoop NodeManager, provisioning software, and client |
| applications. Between all of these processes, it is not uncommon to |
| over-subscribe the available RAM in a VM. We recommend setting up VMs without |
| swap enabled, so rather than performance grinding to a halt when physical |
| memory is exhausted the kernel will randomly* select processes to kill in order |
| to free up memory. |
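
To confirm that swap is off, or to disable it, the following is a minimal
sketch assuming a Linux guest with root access (the +sed+ expression for
/etc/fstab is illustrative and should be checked against your distribution):

....
# turn off all swap devices for the current boot
swapoff -a

# comment out swap entries so swap stays disabled after a reboot
sed -i '/\sswap\s/s/^/#/' /etc/fstab
....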
| |
Calculating the maximum possible memory usage is essential in creating a stable
Accumulo VM setup. Safely engineering memory allocation for stability is then a
matter of bringing the calculated maximum memory usage under the physical
memory by a healthy margin. The margin accounts for operating system-level
operations, such as managing processes, maintaining virtual memory pages, and
file system caching. When the Linux out-of-memory killer finds your process, you
will probably only see evidence of that in /var/log/messages. Out-of-memory
process kills do not show up in Accumulo or Hadoop logs.
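
For example, evidence of a kill can usually be found in the kernel log; the
exact file and message format vary by distribution, so treat the following as
a sketch:

....
# search the kernel log for out-of-memory kills
grep -i "killed process" /var/log/messages

# on systemd-based systems, query the kernel messages in the journal instead
journalctl -k | grep -i "killed process"
....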
| |
To calculate the maximum memory usage of all Java virtual machine (JVM) processes,
add the maximum heap size (often limited by a -Xmx... argument, such as in
accumulo-env.sh) and the off-heap memory usage. Off-heap memory usage
includes the following:
| |
| * "Permanent Space", where the JVM stores Classes, Methods, and other code elements. This can be limited by a JVM flag such as +-XX:MaxPermSize:100m+, and is typically tens of megabytes. |
* Code generation space, where the JVM stores just-in-time compiled code. This is typically small enough to ignore.
| * Socket buffers, where the JVM stores send and receive buffers for each socket. |
| * Thread stacks, where the JVM allocates memory to manage each thread. |
* Direct memory space and JNI code, where applications can allocate memory outside of the JVM-managed space. For Accumulo, this includes the native in-memory maps that are allocated with the +memory.maps.max+ parameter in accumulo-site.xml.
| * Garbage collection space, where the JVM stores information used for garbage collection. |
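
Heap limits for the Accumulo JVM processes are typically set through the
per-process +_OPTS+ variables in accumulo-env.sh. The following is a minimal
sketch, not a recommendation; the variable names match the example
accumulo-env.sh files shipped with Accumulo, and the sizes are illustrative:

....
export ACCUMULO_TSERVER_OPTS="-Xmx1g -Xms1g"
export ACCUMULO_MASTER_OPTS="-Xmx1g -Xms1g"
export ACCUMULO_GC_OPTS="-Xmx256m -Xms256m"
....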
| |
You can assume that each Hadoop and Accumulo process will use ~100-150MB of
off-heap memory, plus the in-memory map of the Accumulo TabletServer process. A
simple calculation for physical memory requirements follows:
| |
| .... |
| Physical memory needed |
| = (per-process off-heap memory) + (heap memory) + (other processes) + (margin) |
| = (number of java processes * 150M + native map) + (sum of -Xmx settings for java process) + (total applications memory, provisioning memory, etc.) + (1G) |
| = (11*150M +500M) + (1G +1G +1G +256M +1G +256M +512M +512M +512M +512M +512M) + (2G) + (1G) |
| = (2150M) + (7G) + (2G) + (1G) |
| = ~12GB |
| .... |
| |
These requirements can add up quickly with the large number of processes,
especially in constrained VM environments. To reduce the physical memory
requirements, it is a good idea to reduce maximum heap limits and turn off
unnecessary processes. If you're not using YARN in your application, you can
turn off the ResourceManager and NodeManager. If you're not expecting to
re-provision the cluster frequently, you can turn off or reduce provisioning
processes such as SaltStack minions and masters.
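
For example, with a typical Hadoop 2.x layout (the path below is an
assumption; adjust it to your installation), the YARN daemons can be stopped
with:

....
# stop the YARN daemons running on this node
$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
....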
| |
| ===== Disk Space |
Disk space is primarily used for two purposes: storing data and storing logs.
While Accumulo generally stores all of its key/value data in HDFS, Accumulo,
Hadoop, and ZooKeeper all store a significant amount of logs in a directory on
a local file system. Care should be taken to make sure that (a) limits on the
amount of logs generated are in place, and (b) enough space is available to
host the generated logs on the partitions to which they are assigned. When space
is not available for logging, processes will hang. This can cause interruptions
in the availability of Accumulo, as well as cascade into failures of various
processes.
| |
Hadoop, Accumulo, and ZooKeeper use log4j as a logging mechanism, and each of
them has a way of limiting the logs and directing them to a particular
directory. Logs are generated independently for each process, so when
considering the total space needed you must add up the maximum logs generated by
each process. A typical setup uses a rolling log appender in which each process
can generate something like ten 100MB files, resulting in a maximum file system
usage of 1GB per process. Default setups for Hadoop and ZooKeeper are often
unbounded, so it is important to set these limits in the logging configuration
files for each subsystem. Consult the user manual for each system for
instructions on how to limit generated logs.
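
As a minimal sketch, a log4j 1.2 properties file can bound per-process disk
usage with a +RollingFileAppender+; the appender name and file path below are
illustrative:

....
# cap each log file at 100MB and keep at most 10 rolled files (~1GB total)
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=/var/log/accumulo/tserver.log
log4j.appender.A1.MaxFileSize=100MB
log4j.appender.A1.MaxBackupIndex=10
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{ISO8601} [%c] %-5p: %m%n
....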
| |
===== ZooKeeper Interaction
Accumulo is designed to scale up to thousands of nodes. At that scale,
intermittent interruptions in network service and other rare failures of
compute nodes become more common. To limit the impact of node failures on
overall service availability, Accumulo uses a heartbeat monitoring system that
leverages ZooKeeper's ephemeral locks. There are several conditions that can
cause Accumulo processes to lose their ZooKeeper locks, some of which
are true interruptions to availability and some of which are false positives.
Several of these conditions become more common in VM environments, where they
can be exacerbated by resource constraints and clock drift.
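
One knob that can reduce such false positives, at the cost of slower detection
of real failures, is the ZooKeeper session timeout. As a hedged sketch, the
+instance.zookeeper.timeout+ property can be raised in accumulo-site.xml (the
value below is illustrative, not a recommendation):

[source,xml]
<!-- a longer session timeout tolerates brief pauses and clock hiccups,
     but delays detection of genuinely failed servers -->
<property>
  <name>instance.zookeeper.timeout</name>
  <value>60s</value>
</property>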
| |
Accumulo includes a mechanism to limit the impact of these false positives, known
as the <<watcher>>. The watcher monitors Accumulo processes and will restart
them when they fail for certain reasons. The watcher can be configured within
the accumulo-env.sh file inside Accumulo's configuration directory. We
recommend using the watcher to monitor Accumulo processes, as it will restore
the system to full capacity without administrator interaction after many of the
common failure modes.
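
A minimal sketch of enabling it, assuming the +ACCUMULO_WATCHER+ variable
recognized by the example accumulo-env.sh files:

....
# wrap server processes in the watcher so they are restarted
# automatically after certain failures
export ACCUMULO_WATCHER="true"
....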
| |
| ==== Tested Versions |
Each release of Accumulo is built with a specific version of Apache
Hadoop, Apache ZooKeeper and Apache Thrift. We expect Accumulo to
work with versions that are API compatible with those versions.
However, this compatibility is not guaranteed because Hadoop, ZooKeeper
and Thrift may not provide guarantees between their own versions. We
have also found that certain versions of Accumulo and Hadoop included
bugs that greatly affected overall stability. Thrift is particularly
prone to compatibility changes between versions, and you must use the
same version your Accumulo is built with.
| |
Please check the release notes for your Accumulo version or use the
mailing lists at https://accumulo.apache.org for more information.