| <?xml version="1.0"?> |
| <chapter |
| xml:id="configuration" |
| version="5.0" |
| xmlns="http://docbook.org/ns/docbook" |
| xmlns:xlink="http://www.w3.org/1999/xlink" |
| xmlns:xi="http://www.w3.org/2001/XInclude" |
| xmlns:svg="http://www.w3.org/2000/svg" |
| xmlns:m="http://www.w3.org/1998/Math/MathML" |
| xmlns:html="http://www.w3.org/1999/xhtml" |
| xmlns:db="http://docbook.org/ns/docbook"> |
| <!-- |
| /** |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| --> |
| <title>Apache HBase Configuration</title> |
  <para>This chapter expands upon the <xref linkend="getting_started" /> chapter to further explain
    configuration of Apache HBase. Please read this chapter carefully, especially <xref
      linkend="basic.prerequisites" />, to ensure that your HBase testing and deployment goes
    smoothly and to prevent data loss.</para>
| |
| <para> Apache HBase uses the same configuration system as Apache Hadoop. All configuration files |
| are located in the <filename>conf/</filename> directory, which needs to be kept in sync for each |
| node on your cluster.</para> |
| |
| <variablelist> |
| <title>HBase Configuration Files</title> |
| <varlistentry> |
| <term><filename>backup-masters</filename></term> |
| <listitem> |
| <para>Not present by default. A plain-text file which lists hosts on which the Master should |
| start a backup Master process, one host per line.</para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>hadoop-metrics2-hbase.properties</filename></term> |
| <listitem> |
        <para>Used to connect HBase to Hadoop's Metrics2 framework. See the <link
            xlink:href="http://wiki.apache.org/hadoop/HADOOP-6728-MetricsV2">Hadoop Wiki
            entry</link> for more information on Metrics2. Contains only commented-out examples by
          default.</para>
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>hbase-env.cmd</filename> and <filename>hbase-env.sh</filename></term> |
| <listitem> |
| <para>Script for Windows and Linux / Unix environments to set up the working environment for |
| HBase, including the location of Java, Java options, and other environment variables. The |
| file contains many commented-out examples to provide guidance.</para> |
| <note> |
| <para>In HBase 0.98.5 and newer, you must set <envar>JAVA_HOME</envar> on each node of |
| your cluster. <filename>hbase-env.sh</filename> provides a handy mechanism to do |
| this.</para> |
| </note> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>hbase-policy.xml</filename></term> |
| <listitem> |
| <para>The default policy configuration file used by RPC servers to make authorization |
| decisions on client requests. Only used if HBase security (<xref |
| linkend="security" />) is enabled.</para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>hbase-site.xml</filename></term> |
| <listitem> |
| <para>The main HBase configuration file. This file specifies configuration options which |
| override HBase's default configuration. You can view (but do not edit) the default |
| configuration file at <filename>docs/hbase-default.xml</filename>. You can also view the |
| entire effective configuration for your cluster (defaults and overrides) in the |
| <guilabel>HBase Configuration</guilabel> tab of the HBase Web UI.</para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>log4j.properties</filename></term> |
| <listitem> |
| <para>Configuration file for HBase logging via <code>log4j</code>.</para> |
| </listitem> |
| </varlistentry> |
| <varlistentry> |
| <term><filename>regionservers</filename></term> |
| <listitem> |
| <para>A plain-text file containing a list of hosts which should run a RegionServer in your |
| HBase cluster. By default this file contains the single entry |
| <literal>localhost</literal>. It should contain a list of hostnames or IP addresses, one |
| per line, and should only contain <literal>localhost</literal> if each node in your |
| cluster will run a RegionServer on its <literal>localhost</literal> interface.</para> |
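        <para>For example, a hypothetical three-node cluster might use a
          <filename>regionservers</filename> file like the following (the hostnames are
          placeholders):</para>
        <screen>node-a.example.com
node-b.example.com
node-c.example.com</screen>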
| </listitem> |
| </varlistentry> |
| </variablelist> |
| |
| <tip> |
| <title>Checking XML Validity</title> |
| <para>When you edit XML, it is a good idea to use an XML-aware editor to be sure that your |
| syntax is correct and your XML is well-formed. You can also use the <command>xmllint</command> |
| utility to check that your XML is well-formed. By default, <command>xmllint</command> re-flows |
| and prints the XML to standard output. To check for well-formedness and only print output if |
| errors exist, use the command <command>xmllint -noout |
| <replaceable>filename.xml</replaceable></command>.</para> |
| </tip> |
| |
| <warning> |
| <title>Keep Configuration In Sync Across the Cluster</title> |
| <para>When running in distributed mode, after you make an edit to an HBase configuration, make |
| sure you copy the content of the <filename>conf/</filename> directory to all nodes of the |
| cluster. HBase will not do this for you. Use <command>rsync</command>, <command>scp</command>, |
      or another secure mechanism for copying the configuration files to your nodes. For most
      configurations, a restart is needed for servers to pick up changes. An exception is dynamic
      configuration, which is described later in this chapter.</para>
| </warning> |
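  <para>As a sketch, assuming the node names are listed one per line in
    <filename>conf/regionservers</filename> and HBase is installed under
    <filename>/usr/local/hbase</filename> on every node (both are assumptions; adjust to your
    layout), the copy might look like this:</para>
  <screen language="bourne">$ for host in $(cat conf/regionservers); do
    rsync -az conf/ "${host}:/usr/local/hbase/conf/"
  done</screen>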
| |
| <section |
| xml:id="basic.prerequisites"> |
| <title>Basic Prerequisites</title> |
| <para>This section lists required services and some required system configuration. </para> |
| |
| <table |
| xml:id="java"> |
| <title>Java</title> |
| <textobject> |
      <para>HBase requires at least Java 6 from <link
          xlink:href="http://www.java.com/download/">Oracle</link>. The following table lists
        which JDK versions are compatible with each version of HBase.</para>
| </textobject> |
| <tgroup |
| cols="4"> |
| <thead> |
| <row> |
| <entry>HBase Version</entry> |
| <entry>JDK 6</entry> |
| <entry>JDK 7</entry> |
| <entry>JDK 8</entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry>1.0</entry> |
| <entry><link |
| xlink:href="http://search-hadoop.com/m/DHED4Zlz0R1">Not Supported</link></entry> |
| <entry>yes</entry> |
| <entry><para>Running with JDK 8 will work but is not well tested.</para></entry> |
| </row> |
| <row> |
| <entry>0.98</entry> |
| <entry>yes</entry> |
| <entry>yes</entry> |
            <entry><para>Running with JDK 8 works but is not well tested. Building with JDK 8 would
                require removal of the deprecated remove() method of the PoolMap class and is under
                consideration. See <link
                  xlink:href="https://issues.apache.org/jira/browse/HBASE-7608">HBASE-7608</link>
                for more information about JDK 8 support.</para></entry>
| </row> |
| <row> |
| <entry>0.96</entry> |
| <entry>yes</entry> |
| <entry>yes</entry> |
| <entry /> |
| </row> |
| <row> |
| <entry>0.94</entry> |
| <entry>yes</entry> |
| <entry>yes</entry> |
| <entry /> |
| </row> |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <note> |
| <para>In HBase 0.98.5 and newer, you must set <envar>JAVA_HOME</envar> on each node of |
| your cluster. <filename>hbase-env.sh</filename> provides a handy mechanism to do |
| this.</para> |
| </note> |
| |
| <variablelist |
| xml:id="os"> |
| <title>Operating System Utilities</title> |
| <varlistentry |
| xml:id="ssh"> |
| <term>ssh</term> |
| <listitem> |
| <para>HBase uses the Secure Shell (ssh) command and utilities extensively to communicate |
| between cluster nodes. Each server in the cluster must be running <command>ssh</command> |
| so that the Hadoop and HBase daemons can be managed. You must be able to connect to all |
| nodes via SSH, including the local node, from the Master as well as any backup Master, |
| using a shared key rather than a password. You can see the basic methodology for such a |
| set-up in Linux or Unix systems at <xref |
| linkend="passwordless.ssh.quickstart" />. If your cluster nodes use OS X, see the |
| section, <link |
| xlink:href="http://wiki.apache.org/hadoop/Running_Hadoop_On_OS_X_10.5_64-bit_%28Single-Node_Cluster%29">SSH: |
| Setting up Remote Desktop and Enabling Self-Login</link> on the Hadoop wiki.</para> |
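          <para>As a minimal sketch of such a set-up (the user and host names are placeholders),
            generate a key pair without a passphrase on the Master and copy the public key to each
            node:</para>
          <screen language="bourne">$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ ssh-copy-id hadoop@node-a.example.com</screen>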
| </listitem> |
| </varlistentry> |
| <varlistentry |
| xml:id="dns"> |
| <term>DNS</term> |
| <listitem> |
| <para>HBase uses the local hostname to self-report its IP address. Both forward and |
| reverse DNS resolving must work in versions of HBase previous to 0.92.0. The <link |
| xlink:href="https://github.com/sujee/hadoop-dns-checker">hadoop-dns-checker</link> |
| tool can be used to verify DNS is working correctly on the cluster. The project |
| README file provides detailed instructions on usage. </para> |
| |
| <para>If your server has multiple network interfaces, HBase defaults to using the |
| interface that the primary hostname resolves to. To override this behavior, set the |
| <code>hbase.regionserver.dns.interface</code> property to a different interface. This |
| will only work if each server in your cluster uses the same network interface |
| configuration.</para> |
| |
| <para>To choose a different DNS nameserver than the system default, set the |
| <varname>hbase.regionserver.dns.nameserver</varname> property to the IP address of |
| that nameserver.</para> |
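          <para>For example, a hypothetical <filename>hbase-site.xml</filename> fragment that pins
            both the interface and the nameserver (the interface name and IP address are
            placeholders) might look like this:</para>
          <programlisting language="xml"><![CDATA[
<property>
  <name>hbase.regionserver.dns.interface</name>
  <value>eth1</value>
</property>
<property>
  <name>hbase.regionserver.dns.nameserver</name>
  <value>192.168.1.53</value>
</property>]]></programlisting>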
| </listitem> |
| </varlistentry> |
| <varlistentry |
| xml:id="loopback.ip"> |
| <term>Loopback IP</term> |
| <listitem> |
          <para>Prior to hbase-0.96.0, HBase only used the IP address
              <systemitem>127.0.0.1</systemitem> to refer to <code>localhost</code>, and this could
            not be configured.</para>
| </listitem> |
| </varlistentry> |
| <varlistentry |
| xml:id="ntp"> |
| <term>NTP</term> |
| <listitem> |
| <para>The clocks on cluster nodes should be synchronized. A small amount of variation is |
| acceptable, but larger amounts of skew can cause erratic and unexpected behavior. Time |
| synchronization is one of the first things to check if you see unexplained problems in |
| your cluster. It is recommended that you run a Network Time Protocol (NTP) service, or |
| another time-synchronization mechanism, on your cluster, and that all nodes look to the |
| same service for time synchronization. See the <link |
| xlink:href="http://www.tldp.org/LDP/sag/html/basic-ntp-config.html">Basic NTP |
| Configuration</link> at <citetitle>The Linux Documentation Project (TLDP)</citetitle> |
| to set up NTP.</para> |
| </listitem> |
| </varlistentry> |
| |
| <varlistentry |
| xml:id="ulimit"> |
| <term>Limits on Number of Files and Processes (<command>ulimit</command>) |
| <indexterm> |
| <primary>ulimit</primary> |
| </indexterm><indexterm> |
| <primary>nproc</primary> |
| </indexterm> |
| </term> |
| |
| <listitem> |
| <para>Apache HBase is a database. It requires the ability to open a large number of files |
| at once. Many Linux distributions limit the number of files a single user is allowed to |
| open to <literal>1024</literal> (or <literal>256</literal> on older versions of OS X). |
| You can check this limit on your servers by running the command <command>ulimit |
| -n</command> when logged in as the user which runs HBase. See <xref |
| linkend="trouble.rs.runtime.filehandles" /> for some of the problems you may |
| experience if the limit is too low. You may also notice errors such as the |
| following:</para> |
          <screen>
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
          </screen>
| <para>It is recommended to raise the ulimit to at least 10,000, but more likely 10,240, |
| because the value is usually expressed in multiples of 1024. Each ColumnFamily has at |
| least one StoreFile, and possibly more than 6 StoreFiles if the region is under load. |
| The number of open files required depends upon the number of ColumnFamilies and the |
| number of regions. The following is a rough formula for calculating the potential number |
| of open files on a RegionServer. </para> |
| <example> |
| <title>Calculate the Potential Number of Open Files</title> |
| <screen>(StoreFiles per ColumnFamily) x (regions per RegionServer)</screen> |
| </example> |
| <para>For example, assuming that a schema had 3 ColumnFamilies per region with an average |
| of 3 StoreFiles per ColumnFamily, and there are 100 regions per RegionServer, the JVM |
| will open 3 * 3 * 100 = 900 file descriptors, not counting open JAR files, configuration |
| files, and others. Opening a file does not take many resources, and the risk of allowing |
| a user to open too many files is minimal.</para> |
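          <para>The arithmetic above can be sketched in the shell (the numbers are the illustrative
            values from this example, not recommendations):</para>
          <programlisting language="bourne">stores_per_cf=3
cfs_per_region=3
regions_per_server=100
# potential open file descriptors, before JARs and configuration files
open_files=$((stores_per_cf * cfs_per_region * regions_per_server))
echo "${open_files}"</programlisting>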
          <para>Another related setting is the number of processes a user is allowed to run at once.
            In Linux and Unix, the number of processes is set using the <command>ulimit -u</command>
            command. This should not be confused with the <command>nproc</command> command, which
            reports the number of processing units available to a given user. Under load, a
              <varname>nproc</varname> limit that is too low can cause OutOfMemoryError exceptions. See
            Jack Levin's <link
              xlink:href="http://thread.gmane.org/gmane.comp.java.hadoop.hbase.user/16374">major
              hdfs issues</link> thread on the hbase-users mailing list, from 2011.</para>
          <para>Configuring the maximum number of file descriptors and processes for the user who is
            running the HBase process is an operating system configuration, rather than an HBase
            configuration. It is also important to be sure that the settings are changed for the
            user that actually runs HBase. To see which user started HBase, and that user's ulimit
            configuration, look at the first line of the HBase log for that instance. A useful read
            on setting configuration for your Hadoop cluster is Aaron Kimball's <link
              xlink:href="http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/"
              >Configuration Parameters: What can you just ignore?</link></para>
| <formalpara xml:id="ulimit_ubuntu"> |
| <title><command>ulimit</command> Settings on Ubuntu</title> |
| <para>To configure <command>ulimit</command> settings on Ubuntu, edit |
| <filename>/etc/security/limits.conf</filename>, which is a space-delimited file with |
| four columns. Refer to the <link |
| xlink:href="http://manpages.ubuntu.com/manpages/lucid/man5/limits.conf.5.html">man |
| page for limits.conf</link> for details about the format of this file. In the |
| following example, the first line sets both soft and hard limits for the number of |
| open files (<literal>nofile</literal>) to <literal>32768</literal> for the operating |
| system user with the username <literal>hadoop</literal>. The second line sets the |
| number of processes to 32000 for the same user.</para> |
| </formalpara> |
| <screen> |
| hadoop - nofile 32768 |
| hadoop - nproc 32000 |
| </screen> |
| <para>The settings are only applied if the Pluggable Authentication Module (PAM) |
| environment is directed to use them. To configure PAM to use these limits, be sure that |
| the <filename>/etc/pam.d/common-session</filename> file contains the following line:</para> |
| <screen>session required pam_limits.so</screen> |
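          <para>After PAM is configured and the user logs in again, the new limits can be verified
            with the following commands, which should report the values configured above:</para>
          <screen language="bourne">$ ulimit -n
$ ulimit -u</screen>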
| </listitem> |
| </varlistentry> |
| |
| <varlistentry |
| xml:id="windows"> |
| <term>Windows</term> |
| |
| <listitem> |
          <para>Prior to HBase 0.96, testing for running HBase on Microsoft Windows was limited.
            Running HBase on Windows nodes is not recommended for production systems.</para>
| |
| <para>To run versions of HBase prior to 0.96 on Microsoft Windows, you must install <link |
| xlink:href="http://cygwin.com/">Cygwin</link> and run HBase within the Cygwin |
            environment. This provides support for Linux/Unix commands and scripts. The full details
            are explained in the <link
              xlink:href="http://hbase.apache.org/cygwin.html">Windows Installation</link> guide. Also <link
              xlink:href="http://search-hadoop.com/?q=hbase+windows&amp;fc_project=HBase&amp;fc_type=mail+_hash_+dev">search
              our user mailing list</link> to pick up the latest fixes contributed by Windows users.</para>
          <para>HBase 0.96.0 and later run natively on Windows, with supporting
            <command>*.cmd</command> scripts bundled. </para></listitem>
| </varlistentry> |
| |
| </variablelist> |
| <!-- OS --> |
| |
| <section |
| xml:id="hadoop"> |
| <title><link |
| xlink:href="http://hadoop.apache.org">Hadoop</link><indexterm> |
| <primary>Hadoop</primary> |
| </indexterm></title> |
| <para>The following table summarizes the versions of Hadoop supported with each version of |
| HBase. Based on the version of HBase, you should select the most |
| appropriate version of Hadoop. You can use Apache Hadoop, or a vendor's distribution of |
| Hadoop. No distinction is made here. See <link |
| xlink:href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support" /> |
| for information about vendors of Hadoop.</para> |
| <tip> |
| <title>Hadoop 2.x is recommended.</title> |
| <para>Hadoop 2.x is faster and includes features, such as short-circuit reads, which will |
| help improve your HBase random read profile. Hadoop 2.x also includes important bug fixes |
| that will improve your overall HBase experience. HBase 0.98 drops support for Hadoop 1.0, deprecates use of Hadoop 1.1+, |
| and HBase 1.0 will not support Hadoop 1.x.</para> |
| </tip> |
| <para>Use the following legend to interpret this table:</para> |
| <simplelist |
| type="vert" |
| columns="1"> |
| <member>S = supported and tested,</member> |
| <member>X = not supported,</member> |
| <member>NT = it should run, but not tested enough.</member> |
| </simplelist> |
| |
| <table> |
| <title>Hadoop version support matrix</title> |
| <tgroup |
| cols="6" |
| align="left" |
| colsep="1" |
| rowsep="1"> |
| <colspec |
| colname="c1" |
| align="left" /> |
| <colspec |
| colname="c2" |
| align="center" /> |
| <colspec |
| colname="c3" |
| align="center" /> |
| <colspec |
| colname="c4" |
| align="center" /> |
| <colspec |
| colname="c5" |
| align="center" /> |
| <colspec |
| colname="c6" |
| align="center" /> |
| <thead> |
| <row> |
| <entry> </entry> |
| <entry>HBase-0.92.x</entry> |
| <entry>HBase-0.94.x</entry> |
| <entry>HBase-0.96.x</entry> |
| <entry><para>HBase-0.98.x (Support for Hadoop 1.1+ is deprecated.)</para></entry> |
| <entry><para>HBase-1.0.x (Hadoop 1.x is NOT supported)</para></entry> |
| </row> |
| </thead> |
| <tbody> |
| <row> |
| <entry>Hadoop-0.20.205</entry> |
| <entry>S</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-0.22.x </entry> |
| <entry>S</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-1.0.x</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-1.1.x </entry> |
| <entry>NT</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| <entry>NT</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-0.23.x </entry> |
| <entry>X</entry> |
| <entry>S</entry> |
| <entry>NT</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.0.x-alpha </entry> |
| <entry>X</entry> |
| <entry>NT</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.1.0-beta </entry> |
| <entry>X</entry> |
| <entry>NT</entry> |
| <entry>S</entry> |
| <entry>X</entry> |
| <entry>X</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.2.0 </entry> |
| <entry>X</entry> |
| <entry><link linkend="hadoop2.hbase-0.94">NT</link></entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| <entry>NT</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.3.x</entry> |
| <entry>X</entry> |
| <entry>NT</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| <entry>NT</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.4.x</entry> |
| <entry>X</entry> |
| <entry>NT</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| </row> |
| <row> |
| <entry>Hadoop-2.5.x</entry> |
| <entry>X</entry> |
| <entry>NT</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| <entry>S</entry> |
| </row> |
| |
| </tbody> |
| </tgroup> |
| </table> |
| |
| <note |
| xml:id="replace.hadoop"> |
| <title>Replace the Hadoop Bundled With HBase!</title> |
| <para> Because HBase depends on Hadoop, it bundles an instance of the Hadoop jar under its |
| <filename>lib</filename> directory. The bundled jar is ONLY for use in standalone mode. |
| In distributed mode, it is <emphasis>critical</emphasis> that the version of Hadoop that |
| is out on your cluster match what is under HBase. Replace the hadoop jar found in the |
| HBase lib directory with the hadoop jar you are running on your cluster to avoid version |
          mismatch issues. Make sure you replace the jar in HBase everywhere on your cluster. Hadoop
          version mismatch issues have various manifestations, but often the cluster simply looks
          like it is hung up. </para>
| </note> |
| <section |
| xml:id="hadoop2.hbase-0.94"> |
| <title>Apache HBase 0.94 with Hadoop 2</title> |
        <para>To get 0.94.x to run on Hadoop 2.2.0, you need to change the Hadoop 2 and Protobuf
          versions in the <filename>pom.xml</filename>, as shown in the following diff: </para>
| <programlisting><![CDATA[$ svn diff pom.xml |
| Index: pom.xml |
| =================================================================== |
| --- pom.xml (revision 1545157) |
| +++ pom.xml (working copy) |
| @@ -1034,7 +1034,7 @@ |
| <slf4j.version>1.4.3</slf4j.version> |
| <log4j.version>1.2.16</log4j.version> |
| <mockito-all.version>1.8.5</mockito-all.version> |
| - <protobuf.version>2.4.0a</protobuf.version> |
| + <protobuf.version>2.5.0</protobuf.version> |
| <stax-api.version>1.0.1</stax-api.version> |
| <thrift.version>0.8.0</thrift.version> |
| <zookeeper.version>3.4.5</zookeeper.version> |
| @@ -2241,7 +2241,7 @@ |
| </property> |
| </activation> |
| <properties> |
| - <hadoop.version>2.0.0-alpha</hadoop.version> |
| + <hadoop.version>2.2.0</hadoop.version> |
| <slf4j.version>1.6.1</slf4j.version> |
| </properties> |
| <dependencies>]]> |
| </programlisting> |
        <para>The next step is to regenerate the Protobuf files, assuming that the Protobuf
          compiler (<command>protoc</command>) has been installed:</para>
| <itemizedlist> |
| <listitem> |
            <para>Change to the HBase root directory on the command line.</para>
| </listitem> |
| <listitem> |
| <para>Type the following commands:</para> |
| <para> |
| <programlisting language="bourne"><![CDATA[$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/hbase.proto]]></programlisting> |
| </para> |
| <para> |
| <programlisting language="bourne"><![CDATA[$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/ErrorHandling.proto]]></programlisting> |
| </para> |
| </listitem> |
| </itemizedlist> |
        <para> Build against the Hadoop 2 profile by running a command similar to the
          following: </para>
| <screen language="bourne">$ mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests</screen> |
| </section> |
| <section |
| xml:id="hadoop.hbase-0.94"> |
| <title>Apache HBase 0.92 and 0.94</title> |
        <para>HBase 0.92 and 0.94 can work with Hadoop versions 0.20.205, 0.22.x, 1.0.x,
          and 1.1.x. HBase 0.94 can additionally work with Hadoop 0.23.x and 2.x, but you may have
          to recompile the code using the specific Maven profile (see the top-level
          <filename>pom.xml</filename>).</para>
| </section> |
| |
| <section |
| xml:id="hadoop.hbase-0.96"> |
| <title>Apache HBase 0.96</title> |
        <para> As of Apache HBase 0.96.x, at least Apache Hadoop 1.0.x is required. Hadoop 2 is
          strongly encouraged (it is faster, and has fixes that help MTTR). HBase will no longer run
          properly on older Hadoop versions such as 0.20.205 or branch-0.20-append. Do not move to
          Apache HBase 0.96.x if you cannot upgrade your Hadoop. See <link
            xlink:href="http://search-hadoop.com/m/7vFVx4EsUb2">HBase, mail # dev - DISCUSS:
            Have hbase require at least hadoop 1.0.0 in hbase 0.96.0?</link></para>
| </section> |
| |
| <section |
| xml:id="hadoop.older.versions"> |
| <title>Hadoop versions 0.20.x - 1.x</title> |
        <para> HBase will lose data unless it is running on an HDFS that has a durable
            <code>sync</code> implementation. DO NOT use Hadoop 0.20.2, Hadoop 0.20.203.0, or
          Hadoop 0.20.204.0, which DO NOT have this attribute. Currently, only Hadoop versions
          0.20.205.x or releases later than that version -- this includes hadoop-1.0.0 -- have
          a working, durable sync. The Cloudera blog post <link
            xlink:href="http://www.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/">An
            update on Apache Hadoop 1.0</link> by Charles Zedlweski has a nice exposition on how all
          the Hadoop versions relate. It is worth checking out if you are having trouble making
          sense of the Hadoop version morass. </para>
        <para>Sync has to be explicitly enabled by setting
            <varname>dfs.support.append</varname> equal to true on both the client side -- in
            <filename>hbase-site.xml</filename> -- and on the server side in
            <filename>hdfs-site.xml</filename> (the sync facility HBase needs is a subset of the
          append code path).</para>
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>dfs.support.append</name> |
| <value>true</value> |
| </property>]]></programlisting> |
| <para> You will have to restart your cluster after making this edit. Ignore the |
| chicken-little comment you'll find in the <filename>hdfs-default.xml</filename> in the |
| description for the <varname>dfs.support.append</varname> configuration. </para> |
| </section> |
| <section |
| xml:id="hadoop.security"> |
| <title>Apache HBase on Secure Hadoop</title> |
| <para>Apache HBase will run on any Hadoop 0.20.x that incorporates Hadoop security features |
| as long as you do as suggested above and replace the Hadoop jar that ships with HBase with |
| the secure version. If you want to read more about how to setup Secure HBase, see <xref |
| linkend="hbase.secure.configuration" />.</para> |
| </section> |
| |
| <section |
| xml:id="dfs.datanode.max.transfer.threads"> |
| <title><varname>dfs.datanode.max.transfer.threads</varname><indexterm> |
| <primary>dfs.datanode.max.transfer.threads</primary> |
| </indexterm></title> |
| |
| <para>An HDFS datanode has an upper bound on the number of files that it will serve |
| at any one time. Before doing any loading, make sure you have configured |
| Hadoop's <filename>conf/hdfs-site.xml</filename>, setting the |
| <varname>dfs.datanode.max.transfer.threads</varname> value to at least the following: |
| </para> |
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>dfs.datanode.max.transfer.threads</name> |
| <value>4096</value> |
| </property> |
| ]]></programlisting> |
| |
| <para>Be sure to restart your HDFS after making the above configuration.</para> |
| |
| <para>Not having this configuration in place makes for strange-looking failures. One |
| manifestation is a complaint about missing blocks. For example:</para> |
| <screen>10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block |
| blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes |
| contain current block. Will get new block locations from namenode and retry...</screen> |
| <para>See also <xref linkend="casestudies.max.transfer.threads" /> and note that this |
| property was previously known as <varname>dfs.datanode.max.xcievers</varname> (e.g. |
| <link |
| xlink:href="http://ccgtech.blogspot.com/2010/02/hadoop-hdfs-deceived-by-xciever.html"> |
| Hadoop HDFS: Deceived by Xciever</link>). |
| </para> |
| |
| |
| </section> |
| </section> |
| <!-- hadoop --> |
| <section xml:id="zookeeper.requirements"> |
| <title>ZooKeeper Requirements</title> |
      <para>ZooKeeper 3.4.x is required as of HBase 1.0.0. HBase makes use of the
        <methodname>multi</methodname> functionality that is only available since ZooKeeper 3.4.0
        (the <property>useMulti</property> configuration defaults to <literal>true</literal> in HBase 1.0.0).
        See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-12241">HBASE-12241 The crash of regionServer when taking deadserver's replication queue breaks replication</link>
        and <link xlink:href="https://issues.apache.org/jira/browse/HBASE-6775">HBASE-6775 Use ZK.multi when available for HBASE-6710 0.92/0.94 compatibility fix</link> for background.</para>
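      <para>If you are running an older HBase against ZooKeeper 3.4 or later, the corresponding
        setting can be enabled explicitly in <filename>hbase-site.xml</filename> (a sketch; in
        HBase 1.0.0 this is already the default):</para>
      <programlisting language="xml"><![CDATA[
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>true</value>
</property>]]></programlisting>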
| </section> |
| </section> |
| |
| <section |
| xml:id="standalone_dist"> |
| <title>HBase run modes: Standalone and Distributed</title> |
| |
| <para>HBase has two run modes: <xref |
| linkend="standalone" /> and <xref |
| linkend="distributed" />. Out of the box, HBase runs in standalone mode. Whatever your mode, |
| you will need to configure HBase by editing files in the HBase <filename>conf</filename> |
| directory. At a minimum, you must edit <code>conf/hbase-env.sh</code> to tell HBase which |
| <command>java</command> to use. In this file you set HBase environment variables such as the |
| heapsize and other options for the <application>JVM</application>, the preferred location for |
| log files, etc. Set <varname>JAVA_HOME</varname> to point at the root of your |
| <command>java</command> install.</para> |
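    <para>For example, a minimal <filename>conf/hbase-env.sh</filename> edit might look like the
      following (the JDK path and heap size are assumptions; point them at your own install and
      sizing):</para>
    <programlisting language="bourne"># The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# The maximum amount of heap to use, in MB.
export HBASE_HEAPSIZE=4000</programlisting>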
| |
| <section |
| xml:id="standalone"> |
| <title>Standalone HBase</title> |
| |
      <para>This is the default mode. Standalone mode is what is described in the <xref
          linkend="quickstart" /> section. In standalone mode, HBase does not use HDFS -- it uses
        the local filesystem instead -- and it runs all HBase daemons and a local ZooKeeper all up
        in the same JVM. ZooKeeper binds to a well-known port so that clients may talk to HBase.</para>
| </section> |
| |
| <section |
| xml:id="distributed"> |
| <title>Distributed</title> |
| |
      <para>Distributed mode can be subdivided into <emphasis>pseudo-distributed</emphasis>, in
        which all daemons run on a single node, and <emphasis>fully-distributed</emphasis>, in which
        the daemons are spread across all nodes in the cluster. The pseudo-distributed versus
        fully-distributed nomenclature comes from Hadoop.</para>
| |
| <para>Pseudo-distributed mode can run against the local filesystem or it can run against an |
| instance of the <emphasis>Hadoop Distributed File System</emphasis> (HDFS). |
| Fully-distributed mode can ONLY run on HDFS. See the Hadoop <link |
| xlink:href="http://hadoop.apache.org/common/docs/r1.1.1/api/overview-summary.html#overview_description"> |
| requirements and instructions</link> for how to set up HDFS for Hadoop 1.x. A good |
| walk-through for setting up HDFS on Hadoop 2 is at <link |
| xlink:href="http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide">http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide</link>.</para> |
| |
      <para>Below we describe the different distributed setups. Starting, verifying, and exploring
        your install, whether a <emphasis>pseudo-distributed</emphasis> or
          <emphasis>fully-distributed</emphasis> configuration, is described in a section that
        follows, <xref
          linkend="confirm" />. The same verification script applies to both deploy types.</para>
| <section |
| xml:id="pseudo"> |
| <title>Pseudo-distributed</title> |
| <note> |
| <title>Pseudo-Distributed Quickstart</title> |
| <para>A quickstart has been added to the <xref |
| linkend="quickstart" /> chapter. See <xref |
| linkend="quickstart-pseudo" />. Some of the information that was originally in this |
| section has been moved there.</para> |
| </note> |
| |
| <para>Pseudo-distributed mode is simply fully-distributed mode run on a single host. Use |
| this configuration for testing and prototyping on HBase. Do not use this configuration for |
| production or for evaluating HBase performance.</para> |
| |
| </section> |
| |
| </section> |
| |
| <section |
| xml:id="fully_dist"> |
| <title>Fully-distributed</title> |
| <para>By default, HBase runs in standalone mode. Both standalone mode and pseudo-distributed |
| mode are provided for the purposes of small-scale testing. For a production environment, |
| distributed mode is appropriate. In distributed mode, multiple instances of HBase daemons |
| run on multiple servers in the cluster.</para> |
| <para>Just as in pseudo-distributed mode, a fully distributed configuration requires that you |
| set the <code>hbase.cluster.distributed</code> property to <literal>true</literal>. |
| Typically, the <code>hbase.rootdir</code> is configured to point to a highly-available HDFS |
| filesystem. </para> |
| <para>In addition, the cluster is configured so that multiple cluster nodes enlist as |
| RegionServers, ZooKeeper QuorumPeers, and backup HMaster servers. These configuration basics |
| are all demonstrated in <xref |
| linkend="quickstart-fully-distributed" />.</para> |
| |
| <formalpara |
| xml:id="regionserver"> |
| <title>Distributed RegionServers</title> |
| <para>Typically, your cluster will contain multiple RegionServers all running on different |
| servers, as well as primary and backup Master and Zookeeper daemons. The |
| <filename>conf/regionservers</filename> file on the master server contains a list of |
| hosts whose RegionServers are associated with this cluster. Each host is on a separate |
| line. All hosts listed in this file will have their RegionServer processes started and |
| stopped when the master server starts or stops.</para> |
| </formalpara> |
| |
| <formalpara |
| xml:id="hbase.zookeeper"> |
| <title>ZooKeeper and HBase</title> |
| <para>See section <xref |
| linkend="zookeeper" /> for ZooKeeper setup for HBase.</para> |
| </formalpara> |
| |
| <example> |
| <title>Example Distributed HBase Cluster</title> |
| <para>This is a bare-bones <filename>conf/hbase-site.xml</filename> for a distributed HBase |
| cluster. A cluster that is used for real-world work would contain more custom |
| configuration parameters. Most HBase configuration directives have default values, which |
| are used unless the value is overridden in the <filename>hbase-site.xml</filename>. See <xref |
| linkend="config.files" /> for more information.</para> |
| <programlisting language="xml"><![CDATA[ |
| <configuration> |
| <property> |
| <name>hbase.rootdir</name> |
| <value>hdfs://namenode.example.org:8020/hbase</value> |
| </property> |
| <property> |
| <name>hbase.cluster.distributed</name> |
| <value>true</value> |
| </property> |
| <property> |
| <name>hbase.zookeeper.quorum</name> |
| <value>node-a.example.com,node-b.example.com,node-c.example.com</value> |
| </property> |
| </configuration> |
| ]]> |
| </programlisting> |
| <para>This is an example <filename>conf/regionservers</filename> file, which contains a list |
| of each node that should run a RegionServer in the cluster. These nodes need HBase |
| installed and they need to use the same contents of the <filename>conf/</filename> |
| directory as the Master server.</para> |
| <programlisting> |
| node-a.example.com |
| node-b.example.com |
| node-c.example.com |
| </programlisting> |
| <para>This is an example <filename>conf/backup-masters</filename> file, which contains a |
| list of each node that should run a backup Master instance. The backup Master instances |
| will sit idle unless the main Master becomes unavailable.</para> |
| <programlisting> |
| node-b.example.com |
| node-c.example.com |
| </programlisting> |
| </example> |
| <formalpara> |
| <title>Distributed HBase Quickstart</title> |
| <para>See <xref |
| linkend="quickstart-fully-distributed" /> for a walk-through of a simple three-node |
| cluster configuration with multiple ZooKeeper, backup HMaster, and RegionServer |
| instances.</para> |
| </formalpara> |
| |
| <procedure |
| xml:id="hdfs_client_conf"> |
| <title>HDFS Client Configuration</title> |
| <step> |
| <para>Of note, if you have made HDFS client configuration changes on your Hadoop cluster, |
| that is, configuration directives for HDFS clients, as opposed to server-side |
| configurations, you must use one of the following methods to enable HBase to see and use |
| these configuration changes:</para> |
| <stepalternatives> |
| <step> |
| <para>Add a pointer to your <varname>HADOOP_CONF_DIR</varname> to the |
| <varname>HBASE_CLASSPATH</varname> environment variable in |
| <filename>hbase-env.sh</filename>.</para> |
| </step> |
| |
| <step> |
| <para>Add a copy of <filename>hdfs-site.xml</filename> (or |
| <filename>hadoop-site.xml</filename>) or, better, symlinks, under |
| <filename>${HBASE_HOME}/conf</filename>, or</para> |
| </step> |
| |
| <step> |
| <para>If only a small set of HDFS client configurations is needed, add them directly to |
| <filename>hbase-site.xml</filename>.</para> |
| </step> |
| </stepalternatives> |
| </step> |
| </procedure> |
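The first two options above can be sketched as shell commands. This is a minimal sketch, not a definitive recipe: the paths used here are hypothetical stand-ins for your actual `HADOOP_CONF_DIR` and `${HBASE_HOME}/conf`, and the symlink step is demonstrated in a scratch directory so it can run anywhere.

```shell
# Option 1: point HBASE_CLASSPATH at the Hadoop conf directory
# (this line belongs in conf/hbase-env.sh; the path is hypothetical).
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}
echo "export HBASE_CLASSPATH=${HADOOP_CONF_DIR}"

# Option 2: symlink hdfs-site.xml into the HBase conf directory.
# Demonstrated here with scratch directories standing in for the real ones.
demo=$(mktemp -d)
mkdir -p "$demo/hadoop/conf" "$demo/hbase/conf"
echo '<configuration/>' > "$demo/hadoop/conf/hdfs-site.xml"
ln -s "$demo/hadoop/conf/hdfs-site.xml" "$demo/hbase/conf/hdfs-site.xml"
# A symlink, unlike a copy, stays in sync with the Hadoop original.
readlink "$demo/hbase/conf/hdfs-site.xml"
```
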
| <para>An example of such an HDFS client configuration is <varname>dfs.replication</varname>. |
| If, for example, you want to run with a replication factor of 5, HBase will create files with |
| the default replication factor of 3 unless you use one of the above methods to make the |
| configuration available to HBase.</para> |
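As a sketch of the third method above, the client-side replication factor could be made visible to HBase with a fragment like the following in `hbase-site.xml` (the value 5 follows the example in the text):

```xml
<property>
  <name>dfs.replication</name>
  <value>5</value>
  <description>Client-side replication factor, so files created
    by HBase use 5 replicas rather than the HDFS default of 3.
  </description>
</property>
```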
| </section> |
| </section> |
| |
| <section |
| xml:id="confirm"> |
| <title>Running and Confirming Your Installation</title> |
| |
| |
| |
| <para>Make sure HDFS is running first. Start and stop the Hadoop HDFS daemons by running |
| <filename>bin/start-dfs.sh</filename> in the <varname>HADOOP_HOME</varname> |
| directory. You can ensure it started properly by testing the <command>put</command> and |
| <command>get</command> of files into the Hadoop filesystem. HBase does not normally use |
| the MapReduce daemons, so they do not need to be started.</para> |
| <para><emphasis>If</emphasis> you are managing your own ZooKeeper, start it and confirm it is |
| running; otherwise, HBase will start up ZooKeeper for you as part of its start process.</para> |
| <para>Start HBase with the following command:</para> |
| <screen>bin/start-hbase.sh</screen> |
| <para>Run the above from the <varname>HBASE_HOME</varname> directory.</para> |
| <para>You should now have a running HBase instance. HBase logs can be found in the |
| <filename>logs</filename> subdirectory. Check them out especially if HBase had trouble |
| starting.</para> |
| |
| <para>HBase also puts up a UI listing vital attributes. By default it is deployed on the Master |
| host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an |
| informational HTTP server at port 16030). If the Master were running on a host named |
| <varname>master.example.org</varname> on the default port, to see the Master's homepage |
| you would point your browser at <filename>http://master.example.org:16010</filename>.</para> |
| |
| <para>Prior to HBase 0.98, the Master UI was deployed on port 60010, and the HBase |
| RegionServers listened on port 60020 by default and put up an informational HTTP server |
| at port 60030. </para> |
| |
| <para>Once HBase has started, see the <xref |
| linkend="shell_exercises" /> for how to create tables, add data, scan your insertions, and |
| finally disable and drop your tables.</para> |
| |
| <para>To stop HBase after exiting the HBase shell enter</para> |
| <screen language="bourne">$ ./bin/stop-hbase.sh |
| stopping hbase...............</screen> |
| <para>Shutdown can take a moment to complete. It can take longer if your cluster is comprised |
| of many machines. If you are running a distributed operation, be sure to wait until HBase |
| has shut down completely before stopping the Hadoop daemons.</para> |
| </section> |
| |
| <!-- run modes --> |
| |
| |
| |
| <section |
| xml:id="config.files"> |
| <title>Configuration Files</title> |
| |
| <section |
| xml:id="hbase.site"> |
| <title><filename>hbase-site.xml</filename> and <filename>hbase-default.xml</filename></title> |
| <para>Just as in Hadoop where you add site-specific HDFS configuration to the |
| <filename>hdfs-site.xml</filename> file, for HBase, site specific customizations go into |
| the file <filename>conf/hbase-site.xml</filename>. For the list of configurable properties, |
| see <xref |
| linkend="hbase_default_configurations" /> below or view the raw |
| <filename>hbase-default.xml</filename> source file in the HBase source code at |
| <filename>src/main/resources</filename>. </para> |
| <para> Not all configuration options make it out to <filename>hbase-default.xml</filename>. |
| Configuration that is thought so rare anyone would change it can exist only in code; the only |
| way to discover such configurations is by reading the source code itself. </para> |
| <para> Currently, changes here will require a cluster restart for HBase to notice the change. </para> |
| <!--The file hbase-default.xml is generated as part of |
| the build of the hbase site. See the hbase pom.xml. |
| The generated file is a docbook section with a glossary |
| in it--> |
| <!--presumes the pre-site target has put the hbase-default.xml at this location--> |
| <xi:include |
| xmlns:xi="http://www.w3.org/2001/XInclude" |
| href="../../../target/docbkx/hbase-default.xml"> |
| <xi:fallback> |
| <section |
| xml:id="hbase_default_configurations"> |
| <title /> |
| <para> |
| <emphasis>This file is fallback content</emphasis>. If you are seeing this, something |
| is wrong with the build of the HBase documentation or you are doing pre-build |
| verification. </para> |
| <para> The file hbase-default.xml is generated as part of the build of the hbase site. |
| See the hbase <filename>pom.xml</filename>. The generated file is a docbook glossary. </para> |
| <section> |
| <title>IDs that are auto-generated and cause validation errors if not present</title> |
| <para> Each of these is a reference to a configuration file parameter which will cause |
| an error if you are using the fallback content here. This is a dirty dirty hack. </para> |
| <section |
| xml:id="fail.fast.expired.active.master"> |
| <title>fail.fast.expired.active.master</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hregion.memstore.flush.size"> |
| <title>hbase.hregion.memstore.flush.size</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hstore.bytes.per.checksum"> |
| <title>hbase.hstore.bytes.per.checksum</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.online.schema.update.enable"> |
| <title>hbase.online.schema.update.enable</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.regionserver.global.memstore.size"> |
| <title>hbase.regionserver.global.memstore.size</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hregion.max.filesize"> |
| <title>hbase.hregion.max.filesize</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hstore.blockingStoreFiles"> |
| <title>hbase.hstore.blockingStoreFiles</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hfile.block.cache.size"> |
| <title>hfile.block.cache.size</title> |
| <para /> |
| </section> |
| <section |
| xml:id="copy.table"> |
| <title>copy.table</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hstore.checksum.algorithm"> |
| <title>hbase.hstore.checksum.algorithm</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.zookeeper.useMulti"> |
| <title>hbase.zookeeper.useMulti</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.hregion.memstore.block.multiplier"> |
| <title>hbase.hregion.memstore.block.multiplier</title> |
| <para /> |
| </section> |
| <section |
| xml:id="hbase.regionserver.global.memstore.size.lower.limit"> |
| <title>hbase.regionserver.global.memstore.size.lower.limit</title> |
| <para /> |
| </section> |
| </section> |
| </section> |
| </xi:fallback> |
| </xi:include> |
| </section> |
| |
| <section |
| xml:id="hbase.env.sh"> |
| <title><filename>hbase-env.sh</filename></title> |
| <para>Set HBase environment variables in this file. Examples include options to pass the JVM |
| on start of an HBase daemon, such as heap size and garbage collector configs. You can also |
| set HBase configuration, log directories, niceness, ssh options, where to |
| locate process pid files, etc. Open the file at <filename>conf/hbase-env.sh</filename> and |
| peruse its content. Each option is fairly well documented. Add your own environment |
| variables here if you want them read by HBase daemons on startup.</para> |
| <para> Changes here will require a cluster restart for HBase to notice the change. </para> |
| </section> |
| |
| <section |
| xml:id="log4j"> |
| <title><filename>log4j.properties</filename></title> |
| <para>Edit this file to change the rate at which HBase log files are rolled and to change the |
| level at which HBase logs messages. </para> |
| <para> Changes here will require a cluster restart for HBase to notice the change though log |
| levels can be changed for particular daemons via the HBase UI. </para> |
| </section> |
| |
| <section |
| xml:id="client_dependencies"> |
| <title>Client configuration and dependencies connecting to an HBase cluster</title> |
| <para>If you are running HBase in standalone mode, you don't need to configure anything for |
| your client to work provided that they are all on the same machine.</para> |
| <para> Since the HBase Master may move around, clients bootstrap by looking to ZooKeeper for |
| current critical locations. ZooKeeper is where all these values are kept. Thus clients |
| require the location of the ZooKeeper ensemble before they can do anything else. |
| Usually the ensemble location is kept in the <filename>hbase-site.xml</filename> |
| and is picked up by the client from the <varname>CLASSPATH</varname>.</para> |
| |
| <para>If you are configuring an IDE to run an HBase client, you should include the |
| <filename>conf/</filename> directory on your classpath so |
| <filename>hbase-site.xml</filename> settings can be found (or add |
| <filename>src/test/resources</filename> to pick up the <filename>hbase-site.xml</filename> used by tests). </para> |
| <para> Minimally, a client of HBase needs several libraries in its |
| <varname>CLASSPATH</varname> when connecting to a cluster, including: |
| <programlisting> |
| commons-configuration (commons-configuration-1.6.jar) |
| commons-lang (commons-lang-2.5.jar) |
| commons-logging (commons-logging-1.1.1.jar) |
| hadoop-core (hadoop-core-1.0.0.jar) |
| hbase (hbase-0.92.0.jar) |
| log4j (log4j-1.2.16.jar) |
| slf4j-api (slf4j-api-1.5.8.jar) |
| slf4j-log4j (slf4j-log4j12-1.5.8.jar) |
| zookeeper (zookeeper-3.4.2.jar)</programlisting> |
| </para> |
| <para> An example basic <filename>hbase-site.xml</filename> for client only might look as |
| follows: <programlisting language="xml"><![CDATA[ |
| <?xml version="1.0"?> |
| <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> |
| <configuration> |
| <property> |
| <name>hbase.zookeeper.quorum</name> |
| <value>example1,example2,example3</value> |
| <description>Comma separated list of servers in the ZooKeeper ensemble. |
| </description> |
| </property> |
| </configuration> |
| ]]></programlisting> |
| </para> |
| |
| <section |
| xml:id="java.client.config"> |
| <title>Java client configuration</title> |
| <para>The configuration used by a Java client is kept in an <link |
| xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HBaseConfiguration">HBaseConfiguration</link> |
| instance. The factory method on HBaseConfiguration, |
| <code>HBaseConfiguration.create();</code>, on invocation, will read in the content of |
| the first <filename>hbase-site.xml</filename> found on the client's |
| <varname>CLASSPATH</varname>, if one is present (Invocation will also factor in any |
| <filename>hbase-default.xml</filename> found; an hbase-default.xml ships inside the |
| <filename>hbase.X.X.X.jar</filename>). It is also possible to specify configuration |
| directly without having to read from a <filename>hbase-site.xml</filename>. For example, |
| to set the ZooKeeper ensemble for the cluster programmatically do as follows: |
| <programlisting language="java">Configuration config = HBaseConfiguration.create(); |
| config.set("hbase.zookeeper.quorum", "localhost"); // Here we are running zookeeper locally</programlisting> |
| If multiple ZooKeeper instances make up your ZooKeeper ensemble, they may be specified in |
| a comma-separated list (just as in the <filename>hbase-site.xml</filename> file). This |
| populated <classname>Configuration</classname> instance can then be passed to an <link |
| xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html">HTable</link>, |
| and so on. </para> |
| </section> |
| </section> |
| |
| </section> |
| <!-- config files --> |
| |
| <section |
| xml:id="example_config"> |
| <title>Example Configurations</title> |
| |
| <section> |
| <title>Basic Distributed HBase Install</title> |
| |
| <para>Here is an example basic configuration for a distributed ten node cluster. The nodes are |
| named <varname>example0</varname>, <varname>example1</varname>, etc., through node |
| <varname>example9</varname> in this example. The HBase Master and the HDFS namenode are |
| running on the node <varname>example0</varname>. RegionServers run on nodes |
| <varname>example1</varname>-<varname>example9</varname>. A 3-node ZooKeeper ensemble runs |
| on <varname>example1</varname>, <varname>example2</varname>, and <varname>example3</varname> |
| on the default ports. ZooKeeper data is persisted to the directory |
| <filename>/export/zookeeper</filename>. Below we show what the main configuration files -- |
| <filename>hbase-site.xml</filename>, <filename>regionservers</filename>, and |
| <filename>hbase-env.sh</filename> -- found in the HBase <filename>conf</filename> |
| directory might look like.</para> |
| |
| <section |
| xml:id="hbase_site"> |
| <title><filename>hbase-site.xml</filename></title> |
| |
| <programlisting language="xml"> |
| <![CDATA[ |
| <?xml version="1.0"?> |
| <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> |
| <configuration> |
| <property> |
| <name>hbase.zookeeper.quorum</name> |
| <value>example1,example2,example3</value> |
| <description>Comma separated list of servers in the ZooKeeper ensemble. |
| </description> |
| </property> |
| <property> |
| <name>hbase.zookeeper.property.dataDir</name> |
| <value>/export/zookeeper</value> |
| <description>Property from ZooKeeper config zoo.cfg. |
| The directory where the snapshot is stored. |
| </description> |
| </property> |
| <property> |
| <name>hbase.rootdir</name> |
| <value>hdfs://example0:8020/hbase</value> |
| <description>The directory shared by RegionServers. |
| </description> |
| </property> |
| <property> |
| <name>hbase.cluster.distributed</name> |
| <value>true</value> |
| <description>The mode the cluster will be in. Possible values are |
| false: standalone and pseudo-distributed setups with managed Zookeeper |
| true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh) |
| </description> |
| </property> |
| </configuration> |
| ]]> |
| </programlisting> |
| </section> |
| |
| <section |
| xml:id="regionservers"> |
| <title><filename>regionservers</filename></title> |
| |
| <para>In this file you list the nodes that will run RegionServers. In our case, these nodes |
| are <varname>example1</varname>-<varname>example9</varname>. </para> |
| |
| <programlisting> |
| example1 |
| example2 |
| example3 |
| example4 |
| example5 |
| example6 |
| example7 |
| example8 |
| example9 |
| </programlisting> |
| </section> |
| |
| <section |
| xml:id="hbase_env"> |
| <title><filename>hbase-env.sh</filename></title> |
| |
| <para>The following lines in the <filename>hbase-env.sh</filename> file show how to set the |
| <envar>JAVA_HOME</envar> environment variable (required for HBase 0.98.5 and newer) and |
| set the heap to 4 GB (rather than the default value of 1 GB). If you copy and paste this |
| example, be sure to adjust the <envar>JAVA_HOME</envar> to suit your environment.</para> |
| |
| <screen language="bourne"> |
| # The java implementation to use. |
| export JAVA_HOME=/usr/java/jdk1.7.0/ |
| |
| # The maximum amount of heap to use, in MB. Default is 1000. |
| export HBASE_HEAPSIZE=4096 |
| </screen> |
| |
| <para>Use <command>rsync</command> to copy the content of the <filename>conf</filename> |
| directory to all nodes of the cluster.</para> |
| </section> |
| </section> |
| </section> |
| <!-- example config --> |
| |
| |
| <section |
| xml:id="important_configurations"> |
| <title>The Important Configurations</title> |
| <para>Below we list the <emphasis>important</emphasis> configurations. We've divided this |
| section into required configurations and worth-a-look recommended configurations. </para> |
| |
| |
| <section |
| xml:id="required_configuration"> |
| <title>Required Configurations</title> |
| <para>Review the <xref |
| linkend="os" /> and <xref |
| linkend="hadoop" /> sections. </para> |
| <section |
| xml:id="big.cluster.config"> |
| <title>Big Cluster Configurations</title> |
| <para>If you have a cluster with a lot of regions, it is possible that an eager-beaver |
| RegionServer which checks in soon after master startup, while all the rest in the cluster |
| lag behind, will be assigned all the regions. With lots of regions, this first |
| server could buckle under the load. To prevent this scenario from happening, raise |
| <varname>hbase.master.wait.on.regionservers.mintostart</varname> from its default value |
| of 1. See <link |
| xlink:href="https://issues.apache.org/jira/browse/HBASE-6389">HBASE-6389 Modify the |
| conditions to ensure that Master waits for sufficient number of Region Servers before |
| starting region assignments</link> for more detail. </para> |
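As a sketch, requiring (say) three RegionServers to check in before region assignment begins would look like the following in `hbase-site.xml`; the value 3 is illustrative, not a recommendation:

```xml
<property>
  <name>hbase.master.wait.on.regionservers.mintostart</name>
  <value>3</value>
</property>
```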
| </section> |
| <section |
| xml:id="backup.master.fail.fast"> |
| <title>If a backup Master exists, make the primary Master fail fast</title> |
| <para>If the primary Master loses its connection with ZooKeeper, it will fall into a loop |
| where it keeps trying to reconnect. Disable this functionality if you are running more |
| than one Master, i.e. a backup Master. Failing to do so, the dying Master may continue to |
| receive RPCs even though another Master has assumed the role of primary. See the configuration <xref |
| linkend="fail.fast.expired.active.master" />. </para> |
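A sketch of the corresponding `hbase-site.xml` entry:

```xml
<property>
  <name>fail.fast.expired.active.master</name>
  <value>true</value>
</property>
```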
| </section> |
| </section> |
| |
| <section |
| xml:id="recommended_configurations"> |
| <title>Recommended Configurations</title> |
| <section |
| xml:id="recommended_configurations.zk"> |
| <title>ZooKeeper Configuration</title> |
| <section |
| xml:id="sect.zookeeper.session.timeout"> |
| <title><varname>zookeeper.session.timeout</varname></title> |
| <para>The default timeout is three minutes (specified in milliseconds). This means that if |
| a server crashes, it will be three minutes before the Master notices the crash and |
| starts recovery. You might want to tune the timeout down to a minute or even less, so the |
| Master notices failures sooner. Before changing this value, be sure you have your |
| JVM garbage collection configuration under control; otherwise, a long garbage collection |
| that lasts beyond the ZooKeeper session timeout will take out your RegionServer. (You |
| might be fine with this -- you probably want recovery to start on the server if a |
| RegionServer has been in GC for a long period of time.)</para> |
| |
| <para>To change this configuration, edit <filename>hbase-site.xml</filename>, copy the |
| changed file around the cluster and restart.</para> |
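For example, tuning the session timeout down to one minute, as suggested above, is a single property in `hbase-site.xml` (the value is in milliseconds):

```xml
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
```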
| |
| <para>We set this value high to save our having to field noob questions up on the mailing |
| lists asking why a RegionServer went down during a massive import. The usual cause is |
| that their JVM is untuned and they are running into long GC pauses. Our thinking is that |
| while users are getting familiar with HBase, we'd save them having to know all of its |
| intricacies. Later when they've built some confidence, then they can play with |
| configuration such as this. </para> |
| </section> |
| <section |
| xml:id="zookeeper.instances"> |
| <title>Number of ZooKeeper Instances</title> |
| <para>See <xref |
| linkend="zookeeper" />. </para> |
| </section> |
| </section> |
| <section |
| xml:id="recommended.configurations.hdfs"> |
| <title>HDFS Configurations</title> |
| <section |
| xml:id="dfs.datanode.failed.volumes.tolerated"> |
| <title>dfs.datanode.failed.volumes.tolerated</title> |
| <para>This is the "...number of volumes that are allowed to fail before a datanode stops |
| offering service. By default any volume failure will cause a datanode to shutdown" from |
| the <filename>hdfs-default.xml</filename> description. If you have more than three or four |
| disks, you might want to set this to 1; if you have many disks, set it to 2 or more. </para> |
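A sketch of the corresponding entry in the Hadoop `hdfs-site.xml` (tolerating a single failed volume; adjust to your disk count):

```xml
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```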
| </section> |
| </section> |
| <section |
| xml:id="hbase.regionserver.handler.count-description"> |
| <title><varname>hbase.regionserver.handler.count</varname></title> |
| <para> This setting defines the number of threads that are kept open to answer incoming |
| requests to user tables. The rule of thumb is to keep this number low when the payload per |
| request approaches the MB (big puts, scans using a large cache) and high when the payload |
| is small (gets, small puts, ICVs, deletes). The total size of the queries in progress is |
| limited by the setting "hbase.ipc.server.max.callqueue.size". </para> |
| <para> It is safe to set that number to the maximum number of incoming clients if their |
| payload is small, the typical example being a cluster that serves a website since puts |
| aren't typically buffered and most of the operations are gets. </para> |
| <para> The reason why it is dangerous to keep this setting high is that the aggregate size |
| of all the puts that are currently happening in a region server may impose too much |
| pressure on its memory, or even trigger an OutOfMemoryError. A region server running on |
| low memory will trigger its JVM's garbage collector to run more frequently up to a point |
| where GC pauses become noticeable (the reason being that all the memory used to keep all |
| the requests' payloads cannot be trashed, no matter how hard the garbage collector tries). |
| After some time, the overall cluster throughput is affected since every request that hits |
| that region server will take longer, which exacerbates the problem even more. </para> |
| <para>You can get a sense of whether you have too little or too many handlers by <xref |
| linkend="rpc.logging" /> on an individual RegionServer then tailing its logs (Queued |
| requests consume memory). </para> |
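As a sketch, a cluster serving mostly small gets and puts, such as the website example above, might raise the handler count in `hbase-site.xml`; the value 100 here is illustrative, not a recommendation:

```xml
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
```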
| </section> |
| <section |
| xml:id="big_memory"> |
| <title>Configuration for large memory machines</title> |
| <para> HBase ships with a reasonable, conservative configuration that will work on nearly |
| all machine types that people might want to test with. If you have larger machines -- |
| where HBase has an 8 GB or larger heap -- you might find the following configuration options |
| helpful. TODO. </para> |
| |
| </section> |
| |
| <section |
| xml:id="config.compression"> |
| <title>Compression</title> |
| <para>You should consider enabling ColumnFamily compression. There are several options that |
| are near-frictionless and in most all cases boost performance by reducing the size of |
| StoreFiles and thus reducing I/O. </para> |
| <para>See <xref |
| linkend="compression" /> for more information.</para> |
| </section> |
| <section |
| xml:id="config.wals"> |
| <title>Configuring the size and number of WAL files</title> |
| <para>HBase uses <xref |
| linkend="wal" /> to recover the memstore data that has not been flushed to disk in case |
| of a RegionServer failure. These WAL files should be configured to be slightly smaller than |
| the HDFS block size (by default, an HDFS block is 64 MB and a WAL file is ~60 MB).</para> |
| <para>HBase also has a limit on the number of WAL files, designed to ensure there is never too |
| much data that needs to be replayed during recovery. This limit needs to be set according |
| to memstore configuration, so that all the necessary data fits. It is recommended to |
| allocate enough WAL files to store at least that much data (when all memstores are close |
| to full). For example, with a 16 GB RegionServer heap, default memstore settings (0.4), and |
| default WAL file size (~60 MB), the starting point for the WAL file count is 16384 MB * 0.4 |
| / 60 MB, or ~109. However, as all memstores are not expected to be full all the time, fewer |
| WAL files can be allocated.</para> |
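The arithmetic above can be reproduced directly; this sketch restates the 16 GB heap example with the default memstore fraction and WAL size:

```shell
heap_mb=16384          # 16 GB RegionServer heap, in MB
memstore_fraction=0.4  # default global memstore fraction
wal_mb=60              # approximate default WAL file size, in MB

# WAL files needed to cover all memstores when they are near full.
awk -v h="$heap_mb" -v f="$memstore_fraction" -v w="$wal_mb" \
  'BEGIN { printf "%d\n", (h * f) / w }'
```

Running this prints `109`, the starting point quoted in the text.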
| </section> |
| <section |
| xml:id="disable.splitting"> |
| <title>Managed Splitting</title> |
| <para>HBase generally handles splitting your regions, based upon the settings in your |
| <filename>hbase-default.xml</filename> and <filename>hbase-site.xml</filename> |
| configuration files. Important settings include |
| <varname>hbase.regionserver.region.split.policy</varname>, |
| <varname>hbase.hregion.max.filesize</varname>, |
| <varname>hbase.regionserver.regionSplitLimit</varname>. A simplistic view of splitting |
| is that when a region grows to <varname>hbase.hregion.max.filesize</varname>, it is split. |
| For most use patterns, most of the time, you should use automatic splitting. See <xref |
| linkend="manual_region_splitting_decisions"/> for more information about manual region |
| splitting.</para> |
| <para>Instead of allowing HBase to split your regions automatically, you can choose to |
| manage the splitting yourself. This feature was added in HBase 0.90.0. Manually managing |
| splits works if you know your keyspace well; otherwise, let HBase figure out where to split for you. |
| Manual splitting can mitigate region creation and movement under load. It also makes |
| region boundaries known and invariant (if you disable region splitting). If you use manual |
| splits, it is easier to do staggered, time-based major compactions to spread out your network IO |
| load.</para> |
| |
| <formalpara> |
| <title>Disable Automatic Splitting</title> |
| <para>To disable automatic splitting, set <varname>hbase.hregion.max.filesize</varname> to |
| a very large value, such as <literal>100 GB</literal>. It is not recommended to set it to |
| its absolute maximum value of <literal>Long.MAX_VALUE</literal>.</para> |
| </formalpara> |
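A sketch of the corresponding `hbase-site.xml` setting, with 100 GB expressed in bytes:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- 100 GB = 100 * 1024^3 bytes -->
  <value>107374182400</value>
</property>
```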
| <note> |
| <title>Automatic Splitting Is Recommended</title> |
| <para>If you disable automatic splits to diagnose a problem or during a period of fast |
| data growth, it is recommended to re-enable them when your situation becomes more |
| stable. The potential benefits of managing region splits yourself are not |
| undisputed.</para> |
| </note> |
| <formalpara> |
| <title>Determine the Optimal Number of Pre-Split Regions</title> |
| <para>The optimal number of pre-split regions depends on your application and environment. |
| A good rule of thumb is to start with 10 pre-split regions per server and watch as data |
| grows over time. It is better to err on the side of too few regions and perform rolling |
| splits later. The optimal number of regions depends upon the largest StoreFile in your |
| region. The size of the largest StoreFile will increase with time if the amount of data |
| grows. The goal is for the largest region to be just large enough that the compaction |
| selection algorithm only compacts it during a timed major compaction. Otherwise, the |
| cluster can be prone to compaction storms, where a large number of regions are under |
| compaction at the same time. It is important to understand that the data growth causes |
| compaction storms, not the manual split decision.</para> |
| </formalpara> |
| <para>If the regions are split into too many large regions, you can increase the major |
| compaction interval by configuring <varname>HConstants.MAJOR_COMPACTION_PERIOD</varname>. |
| HBase 0.90 introduced <classname>org.apache.hadoop.hbase.util.RegionSplitter</classname>, |
| which provides a network-IO-safe rolling split of all regions.</para> |
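| <para>As an illustrative sketch (the table name <literal>test_table</literal> and column |
| family <literal>f1</literal> are hypothetical), a table can be pre-split at explicit row |
| keys from the HBase shell, or created pre-split with |
| <classname>RegionSplitter</classname> from the command line:</para> |
| <screen language="bourne"> |
| # Pre-split at explicit row keys from the HBase shell |
| hbase> create 'test_table', 'f1', SPLITS => ['a', 'm', 'z'] |
|  |
| # Create a table pre-split into 10 regions using the HexStringSplit algorithm |
| $ hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 10 -f f1 |
| </screen> |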
| </section> |
| <section |
| xml:id="managed.compactions"> |
| <title>Managed Compactions</title> |
| <para>By default, major compactions are scheduled to run once per 7-day period. Prior to HBase 0.96.x, major |
| compactions were scheduled to happen once per day by default.</para> |
| <para>If you need to control exactly when and how often major compaction runs, you can |
| disable managed major compactions. See the entry for |
| <varname>hbase.hregion.majorcompaction</varname> in the <xref |
| linkend="compaction.parameters" /> table for details.</para> |
| <warning> |
| <title>Do Not Disable Major Compactions</title> |
| <para>Major compactions are absolutely necessary for StoreFile clean-up. Do not disable |
| them altogether. You can run major compactions manually via the HBase shell or via the <link |
| xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin |
| API</link>.</para> |
| </warning> |
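| <para>For example, a major compaction can be requested from the HBase shell (the table |
| name <literal>t1</literal> here is hypothetical):</para> |
| <screen> |
| hbase> major_compact 't1' |
| </screen> |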
| <para>For more information about compactions and the compaction file selection process, see <xref |
| linkend="compaction" /></para> |
| </section> |
| |
| <section |
| xml:id="spec.ex"> |
| <title>Speculative Execution</title> |
| <para>Speculative Execution of MapReduce tasks is on by default, and for HBase clusters it |
| is generally advised to turn off Speculative Execution at the system level unless you need |
| it for a specific case, where it can be configured per-job. Set the properties |
| <varname>mapreduce.map.speculative</varname> and |
| <varname>mapreduce.reduce.speculative</varname> to false. </para> |
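| <para>For example, in <filename>mapred-site.xml</filename>:</para> |
| <programlisting language="xml"><![CDATA[ |
| <property> |
|   <name>mapreduce.map.speculative</name> |
|   <value>false</value> |
| </property> |
| <property> |
|   <name>mapreduce.reduce.speculative</name> |
|   <value>false</value> |
| </property> |
| ]]></programlisting> |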
| </section> |
| </section> |
| <section xml:id="other_configuration"><title>Other Configurations</title> |
| <section xml:id="balancer_config"><title>Balancer</title> |
| <para>The balancer is a periodic operation which is run on the master to redistribute regions on the cluster. It is configured via |
| <varname>hbase.balancer.period</varname> and defaults to 300000 (5 minutes). </para> |
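| <para>For example, to run the balancer every 10 minutes rather than the default 5, set |
| the period in <filename>hbase-site.xml</filename>:</para> |
| <programlisting language="xml"><![CDATA[ |
| <property> |
|   <name>hbase.balancer.period</name> |
|   <value>600000</value> |
|   <description>Balancer period in milliseconds; 300000 (5 minutes) is the |
|   default.</description> |
| </property> |
| ]]></programlisting> |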
| <para>See <xref linkend="master.processes.loadbalancer" /> for more information on the LoadBalancer. |
| </para> |
| </section> |
| <section xml:id="disabling.blockcache"><title>Disabling Blockcache</title> |
| <para>Do not turn off the block cache (you would do so by setting <varname>hbase.block.cache.size</varname> to zero). |
| Currently, HBase does not do well if you do this, because the regionserver will spend all its time loading hfile |
| indices over and over again. If your working set is such that the block cache does you no good, at least |
| size the block cache such that hfile indices will stay up in the cache (you can get a rough idea |
| of the size you need by surveying the regionserver UIs; you will see the index block size accounted for near the |
| top of the webpage).</para> |
| </section> |
| <section xml:id="nagles"> |
| <title><link xlink:href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle's</link> or the small package problem</title> |
| <para>If a big 40 ms or so occasional delay is seen in operations against HBase, |
| try the Nagle's setting. For example, see the user mailing list thread, |
| <link xlink:href="http://search-hadoop.com/m/pduLg2fydtE/Inconsistent+scan+performance+with+caching+set+&subj=Re+Inconsistent+scan+performance+with+caching+set+to+1">Inconsistent scan performance with caching set to 1</link> |
| and the issue cited therein, where setting <varname>tcpnodelay</varname> improved scan speeds. You might also |
| see the graphs at the tail of <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7008">HBASE-7008 Set scanner caching to a better default</link>, |
| where Lars Hofhansl tried various data sizes with Nagle's on and off, measuring the effect.</para> |
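| <para>As a sketch, Nagle's algorithm can be disabled on the HBase RPC client by setting |
| the following in <filename>hbase-site.xml</filename> (the |
| <varname>hbase.ipc.client.tcpnodelay</varname> property referenced here should be checked |
| against your version's defaults; recent versions already default it to |
| <literal>true</literal>):</para> |
| <programlisting language="xml"><![CDATA[ |
| <property> |
|   <name>hbase.ipc.client.tcpnodelay</name> |
|   <value>true</value> |
|   <description>Set TCP_NODELAY on client connections, disabling Nagle's |
|   algorithm.</description> |
| </property> |
| ]]></programlisting> |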
| </section> |
| <section xml:id="mttr"> |
| <title>Better Mean Time to Recover (MTTR)</title> |
| <para>This section is about configurations that will make servers come back faster after a failure. |
| See the Devaraj Das and Nicolas Liochon blog post |
| <link xlink:href="http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/">Introduction to HBase Mean Time to Recover (MTTR)</link> |
| for a brief introduction.</para> |
| <para>The issue <link xlink:href="https://issues.apache.org/jira/browse/HBASE-8389">HBASE-8354 forces Namenode into loop with lease recovery requests</link> |
| is messy, but has a bunch of good discussion toward the end on low timeouts and how to effect faster recovery, including citation of fixes |
| added to HDFS. Read the Varun Sharma comments. The configurations below are Varun's suggestions, distilled and tested. Make sure you are |
| running on a late-version HDFS so you have the fixes he refers to, and that he himself added to HDFS, that help HBase MTTR |
| (e.g. HDFS-3703, HDFS-3712, and HDFS-4791 -- hadoop 2 for sure has them and late hadoop 1 has some). |
| Set the following in the RegionServer.</para> |
| <programlisting language="xml"> |
| <![CDATA[<property> |
| <name>hbase.lease.recovery.dfs.timeout</name> |
| <value>23000</value> |
| <description>How much time we allow elapse between calls to recover lease. |
| Should be larger than the dfs timeout.</description> |
| </property> |
| <property> |
| <name>dfs.client.socket-timeout</name> |
| <value>10000</value> |
| <description>Down the DFS timeout from 60 to 10 seconds.</description> |
| </property> |
| ]]></programlisting> |
| |
| <para>And on the namenode/datanode side, set the following to enable 'staleness' introduced |
| in HDFS-3703, HDFS-3912. </para> |
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>dfs.client.socket-timeout</name> |
| <value>10000</value> |
| <description>Down the DFS timeout from 60 to 10 seconds.</description> |
| </property> |
| <property> |
| <name>dfs.datanode.socket.write.timeout</name> |
| <value>10000</value> |
| <description>Down the DFS timeout from 8 * 60 to 10 seconds.</description> |
| </property> |
| <property> |
| <name>ipc.client.connect.timeout</name> |
| <value>3000</value> |
| <description>Down from 60 seconds to 3.</description> |
| </property> |
| <property> |
| <name>ipc.client.connect.max.retries.on.timeouts</name> |
| <value>2</value> |
| <description>Down from 45 to 2 (a value of 2 means 3 total attempts).</description> |
| </property> |
| <property> |
| <name>dfs.namenode.avoid.read.stale.datanode</name> |
| <value>true</value> |
| <description>Enable stale state in hdfs</description> |
| </property> |
| <property> |
| <name>dfs.namenode.stale.datanode.interval</name> |
| <value>20000</value> |
| <description>Down from default 30 seconds</description> |
| </property> |
| <property> |
| <name>dfs.namenode.avoid.write.stale.datanode</name> |
| <value>true</value> |
| <description>Enable stale state in hdfs</description> |
| </property> |
| ]]></programlisting> |
| </section> |
| |
| <section |
| xml:id="JMX_config"> |
| <title>JMX</title> |
| <para>JMX (Java Management Extensions) provides built-in instrumentation that enables you |
| to monitor and manage the Java VM. To enable monitoring and management from remote |
| systems, you need to set the system property <varname>com.sun.management.jmxremote.port</varname> (the port |
| number through which you want to enable JMX RMI connections) when you start the Java VM. |
| See the <link |
| xlink:href="http://docs.oracle.com/javase/6/docs/technotes/guides/management/agent.html"> |
| official documentation</link> for more information. Historically, besides the port mentioned above, |
| JMX opens two additional random TCP listening ports, which can lead to port conflicts. |
| (See <link |
| xlink:href="https://issues.apache.org/jira/browse/HBASE-10289">HBASE-10289</link> |
| for details.) |
| </para> |
| <para>As an alternative, you can use the coprocessor-based JMX implementation provided |
| by HBase. To enable it in 0.99 or above, add the following property to |
| <filename>hbase-site.xml</filename>: |
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>hbase.coprocessor.regionserver.classes</name> |
| <value>org.apache.hadoop.hbase.JMXListener</value> |
| </property> |
| ]]></programlisting> |
| NOTE: Do NOT set <varname>com.sun.management.jmxremote.port</varname> for the Java VM at the same time. |
| </para> |
| <para>Currently it supports the Master and RegionServer Java VMs. The reason you only |
| configure the coprocessor for 'regionserver' is that, starting from HBase 0.99, |
| a Master IS also a RegionServer. (See <link |
| xlink:href="https://issues.apache.org/jira/browse/HBASE-10569">HBASE-10569</link> |
| for more information.) |
| By default, JMX listens on TCP port 10102. You can further configure the port |
| using the following properties: |
| |
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>regionserver.rmi.registry.port</name> |
| <value>61130</value> |
| </property> |
| <property> |
| <name>regionserver.rmi.connector.port</name> |
| <value>61140</value> |
| </property> |
| ]]></programlisting> |
| The registry port can be shared with the connector port in most cases, so you only |
| need to configure <varname>regionserver.rmi.registry.port</varname>. However, if you want to use SSL |
| communication, the two ports must be configured to different values. |
| </para> |
| |
| <para>By default, password authentication and SSL communication are disabled. |
| To enable password authentication, update <filename>hbase-env.sh</filename> |
| as below: |
| <screen language="bourne"> |
| export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.authenticate=true \ |
| -Dcom.sun.management.jmxremote.password.file=your_password_file \ |
| -Dcom.sun.management.jmxremote.access.file=your_access_file" |
| |
| export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE " |
| export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE " |
| </screen> |
| See the example password/access files under $JRE_HOME/lib/management. |
| </para> |
| |
| <para>To enable SSL communication with password authentication, follow the steps below: |
| <screen language="bourne"> |
| #1. generate a key pair, stored in myKeyStore |
| keytool -genkey -alias jconsole -keystore myKeyStore |
| |
| #2. export it to file jconsole.cert |
| keytool -export -alias jconsole -keystore myKeyStore -file jconsole.cert |
| |
| #3. copy jconsole.cert to jconsole client machine, import it to jconsoleKeyStore |
| keytool -import -alias jconsole -keystore jconsoleKeyStore -file jconsole.cert |
| </screen> |
| And then update <filename>hbase-env.sh</filename> like below: |
| <screen language="bourne"> |
| export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=true \ |
| -Djavax.net.ssl.keyStore=/home/tianq/myKeyStore \ |
| -Djavax.net.ssl.keyStorePassword=your_password_in_step_1 \ |
| -Dcom.sun.management.jmxremote.authenticate=true \ |
| -Dcom.sun.management.jmxremote.password.file=your_password_file \ |
| -Dcom.sun.management.jmxremote.access.file=your_access_file" |
| |
| export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE " |
| export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE " |
| </screen> |
| |
| Finally start jconsole on client using the key store: |
| <screen language="bourne"> |
| jconsole -J-Djavax.net.ssl.trustStore=/home/tianq/jconsoleKeyStore |
| </screen> |
| </para> |
| <para>NOTE: For HBase 0.98, to enable the HBase JMX implementation on the Master, you also |
| need to add the following property to <filename>hbase-site.xml</filename>: |
| <programlisting language="xml"><![CDATA[ |
| <property> |
| <name>hbase.coprocessor.master.classes</name> |
| <value>org.apache.hadoop.hbase.JMXListener</value> |
| </property> |
| ]]></programlisting> |
| The corresponding properties for port configuration are <varname>master.rmi.registry.port</varname> |
| (by default 10101) and <varname>master.rmi.connector.port</varname> (by default the same as the registry port). |
| </para> |
| </section> |
| |
| </section> |
| |
| </section> |
| <!-- important config --> |
| <section xml:id="dyn_config"> |
| <title>Dynamic Configuration</title> |
| <subtitle>Changing Configuration Without Restarting Servers</subtitle> |
| <para>Since HBase 1.0.0, it is possible to change a subset of the configuration without |
| requiring a server restart. In the HBase shell, there are new commands, |
| <command>update_config</command> and <command>update_all_config</command>, that |
| prompt a server, or all servers, to reload its configuration.</para> |
| <para>Only a subset of all configurations can currently be changed in the running server. |
| Here is an incomplete list: |
| <property>hbase.regionserver.thread.compaction.large</property>, |
| <property>hbase.regionserver.thread.compaction.small</property>, |
| <property>hbase.regionserver.thread.split</property>, |
| <property>hbase.regionserver.thread.merge</property>, as well as compaction |
| policy and configuration, and adjustments to off-peak hours. |
| For the full list consult the patch attached to |
| <link xlink:href="https://issues.apache.org/jira/browse/HBASE-12147">HBASE-12147 Porting Online Config Change from 89-fb</link>. |
| </para> |
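| <para>For example, from the HBase shell (the server name shown is hypothetical; use the |
| output of the <command>status</command> command to find your servers' names):</para> |
| <screen> |
| hbase> update_config 'hostname.example.org,16020,1436931101804' |
| hbase> update_all_config |
| </screen> |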
| |
| </section> |
| </chapter> |