Docbook updates from trunk

git-svn-id: https://svn.apache.org/repos/asf/hbase/tags/0.98.0RC0@1561345 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 3232e2e..33be718 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -1739,41 +1739,49 @@
  </programlisting>
      For a description of what HBase files look like when written to HDFS, see <xref linkend="trouble.namenode.hbase.objects"/>.
             </para>
-
     <section xml:id="arch.regions.size">
-      <title>Region Size</title>
-
-      <para>Determining the "right" region size can be tricky, and there are a few factors
-      to consider:</para>
-
-      <itemizedlist>
-        <listitem>
-          <para>HBase scales by having regions across many servers. Thus if
-          you have 2 regions for 16GB data, on a 20 node machine your data
-          will be concentrated on just a few machines - nearly the entire
-          cluster will be idle.  This really cant be stressed enough, since a
-          common problem is loading 200MB data into HBase then wondering why
-          your awesome 10 node cluster isn't doing anything.</para>
-        </listitem>
-
-        <listitem>
-          <para>On the other hand, high region count has been known to make things slow.
-          This is getting better with each release of HBase, but it is probably better to have
-          700 regions than 3000 for the same amount of data.</para>
-        </listitem>
-
-        <listitem>
-          <para>There is not much memory footprint difference between 1 region
-          and 10 in terms of indexes, etc, held by the RegionServer.</para>
-        </listitem>
-      </itemizedlist>
-
-      <para>When starting off, it's probably best to stick to the default region-size, perhaps going
-      smaller for hot tables (or manually split hot regions to spread the load over
-      the cluster), or go with larger region sizes if your cell sizes tend to be
-      largish (100k and up).</para>
-      <para>See <xref linkend="bigger.regions"/> for more information on configuration.
+<para> In general, HBase is designed to run with a small (20-200) number of relatively large (5-20Gb) regions per server. The considerations for this are as follows:</para>
+<section xml:id="too_many_regions">
+          <title>Why can't I have too many regions?</title>
+          <para>
+              Typically you want to keep your region count low in HBase, for numerous reasons.
+              Usually right around 100 regions per RegionServer has yielded the best results.
+              Here are some of the reasons for keeping the region count low:
+              <orderedlist>
+                  <listitem><para>
+                          MSLAB requires 2MB per memstore (that's 2MB per family per region).
+                          1000 regions with 2 families each use 3.9GB of heap, before storing any data. NB: the 2MB value is configurable.
+                  </para></listitem>
+                  <listitem><para>If you fill all the regions at somewhat the same rate, the global memory usage forces tiny
+                          flushes when you have too many regions, which in turn generates compactions.
+                          Rewriting the same data tens of times is the last thing you want.
+                          As an example, consider filling 1000 regions (with one family) equally, with a lower bound for global memstore
+                          usage of 5GB (the region server would have a big heap).
+                          Once memstore usage reaches 5GB it will force flush the biggest region;
+                          at that point almost all regions should have about 5MB of data, so
+                          it flushes that amount. With another 5MB inserted, it flushes the next
+                          region, which will now have a bit over 5MB of data, and so on.
+                          This is currently the main limiting factor for the number of regions; see <xref linkend="ops.capacity.regions.count" />
+                          for a detailed formula.
+                  </para></listitem>
+                  <listitem><para>The master, as it currently stands, is allergic to tons of regions, and will
+                          take a lot of time assigning them and moving them around in batches.
+                          The reason is that it is heavy on ZK usage, and it is not very async
+                          at the moment (this could really be improved -- and has been improved a bunch
+                          in 0.96 HBase).
+                  </para></listitem>
+                  <listitem><para>
+                          In older versions of HBase (pre-v2 hfile, 0.90 and previous), tons of regions
+                          on a few RSes can cause the store file index to rise, increasing heap usage and potentially
+                          creating memory pressure or OOME on the RSes.
+                  </para></listitem>
+          </orderedlist>
       </para>
+      </section>
+      <para>Another issue is the effect of the number of regions on mapreduce jobs; it is typical to have one mapper per HBase region.
+          Thus, hosting only 5 regions per RS may not be enough to get a sufficient number of tasks for a mapreduce job, while 1000 regions will generate far too many tasks.
+      </para>
+      <para>See <xref linkend="ops.capacity.regions" /> for configuration guidelines.</para>
     </section>
 
       <section xml:id="regions.arch.assignment">
@@ -1833,9 +1841,11 @@
            <orderedlist>
              <listitem>First replica is written to local node
              </listitem>
-             <listitem>Second replica is written to another node in same rack
+             <listitem>Second replica is written to a random node on another rack
              </listitem>
-             <listitem>Third replica is written to a node in another rack (if sufficient nodes)
+             <listitem>Third replica is written on the same rack as the second, but on a different node chosen randomly
+             </listitem>
+             <listitem>Subsequent replicas are written on random nodes on the cluster
              </listitem>
            </orderedlist>
           Thus, HBase eventually achieves locality for a region after a flush or a compaction.
@@ -1849,7 +1859,7 @@
         </para>
       </section>
 
-      <section>
+      <section xml:id="arch.region.splits">
         <title>Region Splits</title>
 
         <para>Splits run unaided on the RegionServer; i.e. the Master does not
@@ -2961,7 +2971,9 @@
   <appendix>
       <title xml:id="ycsb"><link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase</title>
       <para>TODO: Describe how YCSB is poor for putting up a decent cluster load.</para>
-      <para>TODO: Describe setup of YCSB for HBase</para>
+      <para>TODO: Describe setup of YCSB for HBase.  In particular, presplit your tables before you start
+          a run.  See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-4163">HBASE-4163 Create Split Strategy for YCSB Benchmark</link>
+          for an explanation of why, and a little shell command showing how to do it.</para>
       <para>Ted Dunning redid YCSB so it's mavenized and added facility for verifying workloads.  See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
 
   </appendix>
diff --git a/src/main/docbkx/community.xml b/src/main/docbkx/community.xml
index 8eee9d1..0ec7331 100644
--- a/src/main/docbkx/community.xml
+++ b/src/main/docbkx/community.xml
@@ -137,4 +137,12 @@
 </para>
       </section>
     </section>
+      <section xml:id="hbase.commit.msg.format">
+          <title>Commit Message format</title>
+          <para>We <link xlink:href="http://search-hadoop.com/m/Gwxwl10cFHa1">agreed</link>
+          to the following SVN commit message format:
+<programlisting>HBASE-xxxxx &lt;title>. (&lt;contributor>)</programlisting>
+If the person making the commit is the contributor, leave off the '(&lt;contributor>)' element.
+          </para>
+      </section>
     </chapter>
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 9559a66..9e648d8 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -240,16 +240,16 @@
 		 <title>Hadoop version support matrix</title>
 		 <tgroup cols='4' align='left' colsep='1' rowsep='1'><colspec colname='c1' align='left'/><colspec colname='c2' align='center'/><colspec colname='c3' align='center'/><colspec colname='c4' align='center'/>
          <thead>
-	     <row><entry>               </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96.0</entry></row>
+         <row><entry>               </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96.0</entry><entry>HBase-0.98.0</entry></row>
 	     </thead><tbody>
-         <row><entry>Hadoop-0.20.205</entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-0.22.x  </entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-1.0.0-1.0.2<footnote><para>HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.</para></footnote>   </entry><entry>S</entry>          <entry>S</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-1.0.3+</entry><entry>S</entry>          <entry>S</entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-1.1.x   </entry><entry>NT</entry>         <entry>S</entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-0.23.x  </entry><entry>X</entry>          <entry>S</entry>           <entry>NT</entry></row>
-         <row><entry>Hadoop-2.0.x-alpha     </entry><entry>X</entry>          <entry>NT</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-2.1.0-beta     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry></row>
+          <row><entry>Hadoop-0.20.205</entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-0.22.x  </entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.0.0-1.0.2<footnote><para>HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.</para></footnote>   </entry><entry>S</entry>          <entry>S</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.0.3+</entry><entry>S</entry>          <entry>S</entry>           <entry>S</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.1.x   </entry><entry>NT</entry>         <entry>S</entry>           <entry>S</entry><entry>X</entry></row>
+          <row><entry>Hadoop-0.23.x  </entry><entry>X</entry>          <entry>S</entry>           <entry>NT</entry><entry>X</entry></row>
+          <row><entry>Hadoop-2.0.x-alpha     </entry><entry>X</entry>          <entry>NT</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-2.1.0-beta     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry><entry>X</entry></row>
          <row><entry>Hadoop-2.2.0     </entry><entry>X</entry>          <entry>NT<footnote><para>To get 0.94.x to run on hadoop 2.2.0,
                          you need to change the hadoop 2 and protobuf versions in the <filename>pom.xml</filename> and then
                          build against the hadoop 2 profile by running something like the following command:
@@ -278,8 +278,8 @@
          <slf4j.version>1.6.1</slf4j.version>
        </properties>
        <dependencies>]]></programlisting>
-         </para></footnote></entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-2.x     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry></row>
+          </para></footnote></entry>           <entry>S</entry><entry>S</entry></row>
+          <row><entry>Hadoop-2.x     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry><entry>S</entry></row>
 		 </tbody></tgroup></table>
 
         Where
@@ -515,13 +515,16 @@
                 </para>
             	<para>To start up an extra backup master(s) on the same server run...
                        <programlisting>% bin/local-master-backup.sh start 1</programlisting>
-                       ... the '1' means use ports 60001 &amp; 60011, and this backup master's logfile will be at <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>.
+                       ... the '1' means use ports 16001 &amp; 16011, and this backup master's
+		       logfile will be at 
+		       <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>.
                 </para>
                 <para>To startup multiple backup masters run... <programlisting>% bin/local-master-backup.sh start 2 3</programlisting> You can start up to 9 backup masters (10 total).
  				</para>
 				<para>To start up more regionservers...
      			  <programlisting>% bin/local-regionservers.sh start 1</programlisting>
-     			where '1' means use ports 60201 &amp; 60301 and its logfile will be at <filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>.
+			... where '1' means use ports 16201 &amp; 16301 and its logfile will be at 
+			<filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>.
      			</para>
      			<para>To add 4 more regionservers in addition to the one you just started by running... <programlisting>% bin/local-regionservers.sh start 2 3 4 5</programlisting>
      			This supports up to 99 extra regionservers (100 total).
@@ -678,14 +681,18 @@
 
 
         <para>HBase also puts up a UI listing vital attributes. By default its
-        deployed on the Master host at port 60010 (HBase RegionServers listen
-        on port 60020 by default and put up an informational http server at
-        60030). If the Master were running on a host named
+        deployed on the Master host at port 16010 (HBase RegionServers listen
+        on port 16020 by default and put up an informational http server at
+        16030). If the Master were running on a host named
         <varname>master.example.org</varname> on the default port, to see the
         Master's homepage you'd point your browser at
-        <filename>http://master.example.org:60010</filename>.</para>
+        <filename>http://master.example.org:16010</filename>.</para>
 
-
+	<para>Prior to HBase 0.98, the Master UI was deployed
+	on port 60010 by default, the HBase RegionServers listened
+        on port 60020 by default, and they put up an informational http server at
+        60030.
+	</para>
 
     <para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
         create tables, add data, scan your insertions, and finally disable and
@@ -1081,74 +1088,11 @@
       </para>
       <para>See <xref linkend="compression" /> for more information.</para>
       </section>
-      <section xml:id="bigger.regions">
-      <title>Bigger Regions</title>
-      <para>
-      Consider going to larger regions to cut down on the total number of regions
-      on your cluster. Generally less Regions to manage makes for a smoother running
-      cluster (You can always later manually split the big Regions should one prove
-      hot and you want to spread the request load over the cluster).  A lower number of regions is
-       preferred, generally in the range of 20 to low-hundreds
-       per RegionServer.  Adjust the regionsize as appropriate to achieve this number.
-       </para>
-       <para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
-       For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
-       </para>
-       <para>You may need to experiment with this setting based on your hardware configuration and application needs.
-       </para>
-       <para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
-       RegionSize can also be set on a per-table basis via
-       <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
-      </para>
-      <section xml:id="too_many_regions">
-          <title>How many regions per RegionServer?</title>
-          <para>
-              Typically you want to keep your region count low on HBase for numerous reasons.
-              Usually right around 100 regions per RegionServer has yielded the best results.
-              Here are some of the reasons below for keeping region count low:
-              <orderedlist>
-                  <listitem><para>
-                          MSLAB requires 2mb per memstore (that's 2mb per family per region).
-                          1000 regions that have 2 families each is 3.9GB of heap used, and it's not even storing data yet. NB: the 2MB value is configurable.
-                  </para></listitem>
-                  <listitem><para>If you fill all the regions at somewhat the same rate, the global memory usage makes it that it forces tiny
-                          flushes when you have too many regions which in turn generates compactions.
-                          Rewriting the same data tens of times is the last thing you want.
-                          An example is filling 1000 regions (with one family) equally and let's consider a lower bound for global memstore
-                          usage of 5GB (the region server would have a big heap).
-                          Once it reaches 5GB it will force flush the biggest region,
-                          at that point they should almost all have about 5MB of data so
-                          it would flush that amount. 5MB inserted later, it would flush another
-                          region that will now have a bit over 5MB of data, and so on.
-                          A basic formula for the amount of regions to have per region server would
-                          look like this:
-                          Heap * upper global memstore limit = amount of heap devoted to memstore
-                          then the amount of heap devoted to memstore / (Number of regions per RS * CFs).
-                          This will give you the rough memstore size if everything is being written to.
-                          A more accurate formula is
-                          Heap * upper global memstore limit = amount of heap devoted to memstore then the
-                          amount of heap devoted to memstore / (Number of actively written regions per RS * CFs).
-                          This can allot you a higher region count from the write perspective if you know how many
-                          regions you will be writing to at one time.
-                  </para></listitem>
-                  <listitem><para>The master as is is allergic to tons of regions, and will
-                          take a lot of time assigning them and moving them around in batches.
-                          The reason is that it's heavy on ZK usage, and it's not very async
-                          at the moment (could really be improved -- and has been imporoved a bunch
-                          in 0.96 hbase).
-                  </para></listitem>
-                  <listitem><para>
-                          In older versions of HBase (pre-v2 hfile, 0.90 and previous), tons of regions
-                          on a few RS can cause the store file index to rise raising heap usage and can
-                          create memory pressure or OOME on the RSs
-                  </para></listitem>
-          </orderedlist>
-      </para>
-      <para>Another issue is the effect of the number of regions on mapreduce jobs.
-          Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps.
-      </para>
-      </section>
-
+      <section xml:id="config.wals"><title>Configuring the size and number of WAL files</title>
+      <para>HBase uses <xref linkend="wal" /> to recover the memstore data that has not been flushed to disk in case of an RS failure. These WAL files should be configured to be slightly smaller than the HDFS block (by default, an HDFS block is 64Mb and a WAL file is ~60Mb).</para>
+      <para>HBase also has a limit on the number of WAL files, designed to ensure there's never too much data that needs to be replayed during recovery. This limit needs to be set according to memstore configuration, so that all the necessary data would fit. It is recommended to allocate enough WAL files to store at least that much data (when all memstores are close to full).
+      For example, with a 16Gb RS heap, default memstore settings (0.4), and default WAL file size (~60Mb), 16384Mb*0.4/60Mb ~ 109, so the starting point for the WAL file count is ~109.
+      However, as all memstores are not expected to be full all the time, fewer WAL files can be allocated.</para>
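+      <para>As a hedged sketch only (the property names and values below are assumptions used to illustrate the idea; verify them against the defaults shipped with your version), the WAL roll size and the blocking WAL count could be tuned in <filename>hbase-site.xml</filename> along these lines:</para>
+      <programlisting><![CDATA[
+      <!-- Roll a WAL file when it reaches ~95% of the HDFS block size -->
+      <property>
+        <name>hbase.regionserver.logroll.multiplier</name>
+        <value>0.95</value>
+      </property>
+      <!-- Blocking limit on the number of WAL files per region server -->
+      <property>
+        <name>hbase.regionserver.maxlogs</name>
+        <value>32</value>
+      </property>
+    ]]></programlisting>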
       </section>
       <section xml:id="disable.splitting">
       <title>Managed Splitting</title>
diff --git a/src/main/docbkx/developer.xml b/src/main/docbkx/developer.xml
index 268a0b9..1e6b0d4 100644
--- a/src/main/docbkx/developer.xml
+++ b/src/main/docbkx/developer.xml
@@ -963,14 +963,25 @@
        <section xml:id="maven.build.hadoop">
           <title>Building against various hadoop versions.</title>
           <para>As of 0.96, Apache HBase supports building against Apache Hadoop versions: 1.0.3, 2.0.0-alpha and 3.0.0-SNAPSHOT.
-          By default, we will build with Hadoop-1.0.3. To change the version to run with Hadoop-2.0.0-alpha, you would run:</para>
-         <programlisting>mvn -Dhadoop.profile=2.0 ...</programlisting>
+	  By default, in 0.96 and earlier, we will build with Hadoop-1.0.x. 
+          As of 0.98, Hadoop 1.x is deprecated and Hadoop 2.x is the default.
+          To change the version to build against, add a hadoop.profile property when you invoke <command>mvn</command>:</para>
+         <programlisting>mvn -Dhadoop.profile=1.0 ...</programlisting>
          <para>
-         That is, designate build with hadoop.profile 2.0.  Pass 2.0 for hadoop.profile to build against hadoop 2.0.
-         Tests may not all pass as of this writing so you may need to pass <code>-DskipTests</code> unless you are inclined
-          to fix the failing tests.</para>
+         The above will build against whatever explicit hadoop 1.x version we have in our <filename>pom.xml</filename> as our '1.0' version.
+         Tests may not all pass so you may need to pass <code>-DskipTests</code> unless you are inclined to fix the failing tests.</para>
+<note xml:id="maven.build.passing.default.profile">
+<title>'dependencyManagement.dependencies.dependency.artifactId' for org.apache.hbase:${compat.module}:test-jar with value '${compat.module}' does not match a valid id pattern</title>
+<para>You will see ERRORs like the above title if you pass the <emphasis>default</emphasis> profile; e.g. if
+you pass <property>hadoop.profile=1.1</property> when building 0.96 or
+<property>hadoop.profile=2.0</property> when building HBase 0.98; just drop the
+hadoop.profile stipulation in this case to get your build to run again.  This seems to be a maven
+peculiarity that is probably fixable but we've not spent the time trying to figure it out.</para>
+</note>
+
           <para>
-         Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artificat - you will need to build and install your own in your local maven repository if you want to run against this profile.
+         Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a
+         deployed maven artifact - you will need to build and install your own in your local maven repository if you want to run against this profile.
          </para>
          <para>
          In earilier verions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index d55fad6..c12f8d4 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -34,13 +34,106 @@
   <section xml:id="tools">
     <title >HBase Tools and Utilities</title>
 
-    <para>Here we list HBase tools for administration, analysis, fixup, and
-    debugging.</para>
+    <para>Here we list HBase tools for administration, analysis, fixup, and debugging.</para>
+    <section xml:id="canary"><title>Canary</title>
+<para>There is a Canary class that can help users canary-test the HBase cluster status, at the granularity of every column family of every region, or of every regionserver. To see the usage, run
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help</programlisting>
+which will output
+<programlisting>Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
+ where [opts] are:
+   -help          Show this help and exit.
+   -regionserver  replace the table argument to regionserver,
+      which means to enable regionserver mode
+   -daemon        Continuous check at defined intervals.
+   -interval &lt;N>  Interval between checks (sec)
+   -e             Use region/regionserver as regular expression
+      which means the region/regionserver is regular expression pattern
+   -f &lt;B>         stop whole program if first error occurs, default is true
+   -t &lt;N>         timeout for a check, default is 600000 (milisecs)</programlisting>
+This tool will return non-zero error codes so that it can cooperate with other monitoring tools, such as Nagios.
+The error code definitions are...
+<programlisting>private static final int USAGE_EXIT_CODE = 1;
+private static final int INIT_ERROR_EXIT_CODE = 2;
+private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
+private static final int ERROR_EXIT_CODE = 4;</programlisting>
+Here are some examples based on the following case: there are two tables, test-01 and test-02, each with two column families, cf1 and cf2, deployed across 3 regionservers, as shown in the following table.
+	     <table>
+		 <tgroup cols='3' align='center' colsep='1' rowsep='1'><colspec colname='regionserver' align='center'/><colspec colname='test-01' align='center'/><colspec colname='test-02' align='center'/>
+         <thead>
+         <row><entry>RegionServer</entry><entry>test-01</entry><entry>test-02</entry></row>
+	     </thead><tbody>
+          <row><entry>rs1</entry><entry>r1</entry>          <entry>r2</entry></row>
+          <row><entry>rs2</entry><entry>r2</entry>          <entry></entry></row>
+          <row><entry>rs3</entry><entry>r2</entry>          <entry>r1</entry></row>
+		 </tbody></tgroup></table>
+Following are some example runs against this setup.
+</para>
+<section><title>Canary test for every column family (store) of every region of every table</title>
+<para>
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary</programlisting>
+The output log is...
+<programlisting>13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf1 in 2ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf2 in 2ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,0004883,1386230156732.87b55e03dfeade00f441125159f8ca87. column family cf1 in 4ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,0004883,1386230156732.87b55e03dfeade00f441125159f8ca87. column family cf2 in 1ms
+...
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,,1386559511167.aa2951a86289281beee480f107bb36ee. column family cf1 in 5ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,,1386559511167.aa2951a86289281beee480f107bb36ee. column family cf2 in 3ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,0004883,1386559511167.cbda32d5e2e276520712d84eaaa29d84. column family cf1 in 31ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,0004883,1386559511167.cbda32d5e2e276520712d84eaaa29d84. column family cf2 in 8ms
+</programlisting>
+As you can see, table test-01 has two regions and two column families, so the Canary tool will pick 4 small pieces of data from 4 (2 regions * 2 stores) different stores. This is the default behavior of the tool.
+</para>
+    </section>
+
+<section><title>Canary test for every column family (store) of every region of specific table(s)</title>
+<para>
+You can also test one or more specific tables.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02</programlisting>
+</para>
+    </section>
+
+<section><title>Canary test with regionserver granularity</title>
+<para>
+This will pick one small piece of data from each regionserver; you can also pass regionserver names as input options to canary-test specific regionservers.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver</programlisting>
+The output log is...
+<programlisting>13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs2 in 72ms
+13/12/09 06:05:17 INFO tool.Canary: Read from table:test-02 on region server:rs3 in 34ms
+13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs1 in 56ms</programlisting>
+</para>
+    </section>
+<section><title>Canary test with regular expression pattern</title>
+<para>
+This will test both table test-01 and test-02.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -e test-0[1-2]</programlisting>
+</para>
+    </section>
+
+<section><title>Run canary test as daemon mode</title>
+<para>
+This runs repeatedly with the interval defined by the -interval option, whose default value is 6 seconds. The daemon will stop itself and return a non-zero error code if any error occurs, because the default value of the -f option is true.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon</programlisting>
+This run repeats with the specified interval and will not stop itself even if errors occur during the test, because -f is set to false.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon -interval 50000 -f false</programlisting>
+</para>
+    </section>
+
+<section><title>Force timeout if canary test stuck</title>
+<para>In some cases a request can get stuck on a regionserver with no response sent back to the client, and the regionserver in question may not be flagged as dead by the Master, leaving clients hung. The timeout option is provided to kill the canary test forcefully and return a non-zero error code in that case.
+This run explicitly sets the timeout value to 600000 milliseconds (600 seconds), which is also the default.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 600000</programlisting>
+</para>
+    </section>
+
+    </section>
+
     <section xml:id="health.check"><title>Health Checker</title>
         <para>You can configure HBase to run a script on a period and if it fails N times (configurable), have the server exit.
             See <link xlink:ref="">HBASE-7351 Periodic health check script</link> for configurations and detail.
         </para>
     </section>
+
     <section xml:id="driver"><title>Driver</title>
       <para>There is a <code>Driver</code> class that is executed by the HBase jar can be used to invoke frequently accessed utilities.  For example,
 <programlisting>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar
@@ -142,6 +235,10 @@
         <note><title>Scanner Caching</title>
         <para>Caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
         </para>
+	</note>
+	<note><title>Versions</title>
+        <para>By default, the CopyTable utility only copies the latest version of row cells unless <code>--versions=n</code> is explicitly specified in the command.
+        </para>
         </note>
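+        <para>For illustration only (the table names are made up, and <code>--new.name</code> is assumed from the CopyTable usage text), a run that copies up to 3 versions into a new table might look like:
+<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --versions=3 --new.name=MyTableCopy MyTable</programlisting>
+        </para>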
         <para>
         See Jonathan Hsieh's <link xlink:href="http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/">Online HBase Backups with CopyTable</link> blog post for more on <command>CopyTable</command>.
@@ -162,6 +259,10 @@
 <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import &lt;tablename&gt; &lt;inputdir&gt;
 </programlisting>
        </para>
+       <para>To import files exported from 0.94 into a 0.96 or later cluster, you need to set the system property "hbase.import.version" when running the import command, as below:
+<programlisting>$ bin/hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import &lt;tablename&gt; &lt;inputdir&gt;
+</programlisting>
+       </para>
     </section>
     <section xml:id="importtsv">
        <title>ImportTsv</title>
@@ -920,37 +1021,73 @@
     </section>
   </section>  <!--  snapshots -->
 
-  <section xml:id="ops.capacity"><title>Capacity Planning</title>
-    <section xml:id="ops.capacity.storage"><title>Storage</title>
-      <para>A common question for HBase administrators is estimating how much storage will be required for an HBase cluster.
-      There are several apsects to consider, the most important of which is what data load into the cluster.  Start
-      with a solid understanding of how HBase handles data internally (KeyValue).
-      </para>
-      <section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
-        <para>HBase storage will be dominated by KeyValues.  See <xref linkend="keyvalue" /> and <xref linkend="keysize" /> for
-        how HBase stores data internally.
-        </para>
-        <para>It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the
-        rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other
-        factor.
-        </para>
-      </section>
-      <section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
-        <para>KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
-        Blocks are aggregated into StoreFile's.  See <xref linkend="regions.arch" />.
-        </para>
-      </section>
-      <section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
-        <para>Because HBase runs on top of HDFS, factor in HDFS block replication into storage calculations.
-        </para>
-      </section>
-    </section>
-    <section xml:id="ops.capacity.regions"><title>Regions</title>
-      <para>Another common question for HBase administrators is determining the right number of regions per
-      RegionServer.  This affects both storage and hardware planning. See <xref linkend="perf.number.of.regions" />.
-      </para>
-    </section>
-  </section>
+  <section xml:id="ops.capacity"><title>Capacity Planning and Region Sizing</title>
+    <para>There are several considerations when planning the capacity for an HBase cluster and performing the initial configuration. Start with a solid understanding of how HBase handles data internally.</para>
+    <section xml:id="ops.capacity.nodes"><title>Node count and hardware/VM configuration</title>
+      <section xml:id="ops.capacity.nodes.datasize"><title>Physical data size</title>
+<para>Physical data size on disk is distinct from logical size of your data and is affected by the following:
+<itemizedlist>
+<listitem>Increased by HBase overhead
+<itemizedlist>
+<listitem>See <xref linkend="keyvalue" /> and <xref linkend="keysize" />. At least 24 bytes per key-value (cell), and it can be more. Small keys/values mean more relative overhead.</listitem>
+<listitem>KeyValue instances are aggregated into blocks, which are indexed. Indexes also have to be stored. Blocksize is configurable on a per-ColumnFamily basis. See <xref linkend="regions.arch" />.</listitem>
+</itemizedlist></listitem>
+<listitem>Decreased by <xref linkend="compression" xrefstyle="template:compression" /> and data block encoding, depending on data. See also <ulink url="http://search-hadoop.com/m/lL12B1PFVhp1">this thread</ulink>. You might want to test what compression and encoding (if any) make sense for your data.</listitem>
+<listitem>Increased by size of region server <xref linkend="wal" xrefstyle="template:WAL" /> (usually fixed and negligible - less than half of RS memory size, per RS).</listitem>
+<listitem>Increased by HDFS replication - usually x3.</listitem>
+</itemizedlist></para>
+<para>Aside from the disk space necessary to store the data, one RS may not be able to serve arbitrarily large amounts of data due to some practical limits on region count and size (see <xref linkend="ops.capacity.regions" xrefstyle="template:below" />).</para>
+      </section> <!-- ops.capacity.nodes.datasize -->
+      <section xml:id="ops.capacity.nodes.throughput"><title>Read/Write throughput</title>
+<para>The number of nodes can also be driven by required throughput for reads and/or writes. The throughput one can get per node depends a lot on data (especially key/value sizes) and request patterns, as well as node and system configuration. Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count. PerformanceEvaluation and <xref linkend="ycsb" xrefstyle="template:YCSB" /> tools can be used to test a single node or a test cluster.</para>
+<para>For writes, usually 5-15Mb/s per RS can be expected, since every region server has only one active WAL. There's no good estimate for reads, as it depends vastly on data, requests, and cache hit rate. <xref linkend="perf.casestudy" /> might be helpful.</para>
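+<para>For example, a quick write smoke test with the bundled PerformanceEvaluation tool could look like the following (the client count of 10 is arbitrary; treat this as a sketch rather than a benchmarking recipe):
+<programlisting>$ bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 10</programlisting>
+</para>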
+      </section> <!-- ops.capacity.nodes.throughput -->
+      <section xml:id="ops.capacity.nodes.gc"><title>JVM GC limitations</title>
+<para>An RS cannot currently utilize a very large heap due to the cost of GC. There's also no good way of running multiple RSes per server (other than running several VMs per machine). Thus, ~20-24Gb or less memory dedicated to one RS is recommended. GC tuning is required for large heap sizes. See <xref linkend="gcpause" />, <xref linkend="trouble.log.gc" /> and elsewhere (TODO: where?).</para>
+      </section> <!-- ops.capacity.nodes.gc -->
+    </section> <!-- ops.capacity.nodes -->
+    <section xml:id="ops.capacity.regions"><title>Determining region count and size</title>
+<para>Generally, fewer regions make for a smoother running cluster (you can always manually split the big regions later (if necessary) to spread the data, or request load, over the cluster); 20-200 regions per RS is a reasonable range. The number of regions cannot be configured directly (unless you go for fully <xref linkend="disable.splitting" xrefstyle="template:manual splitting" />); adjust the region size to achieve the target region count given the table size.</para>
+<para>When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>, as well as shell commands. These settings will override the ones in <varname>hbase-site.xml</varname>. That is useful if your tables have different workloads/use cases.</para>
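+<para>As a hedged illustration (the table name and value are made up; the <code>MAX_FILESIZE</code> attribute is assumed from the HBase shell's alter help), setting a per-table maximum region size from the shell might look like:
+<programlisting>alter 'mytable', MAX_FILESIZE => '10737418240'</programlisting>
+</para>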
+<para>Also note that in the discussion of region sizes here, <emphasis role="bold">HDFS replication factor is not (and should not be) taken into account, whereas other factors <xref linkend="ops.capacity.nodes.datasize" xrefstyle="template:above" /> should be.</emphasis> So, if your data is compressed and replicated 3 ways by HDFS, "9 Gb region" means 9 Gb of compressed data. HDFS replication factor only affects your disk usage and is invisible to most HBase code.</para>
+      <section xml:id="ops.capacity.regions.count"><title>Number of regions per RS - upper bound</title>
+<para>In production scenarios, where you have a lot of data, you are normally concerned with the maximum number of regions you can have per server. <xref linkend="too_many_regions" /> has technical discussion on the subject; in short, maximum number of regions is mostly determined by memstore memory usage. Each region has its own memstores; these grow up to a configurable size; usually in 128-256Mb range, see <xref linkend="hbase.hregion.memstore.flush.size" />. There's one memstore per column family (so there's only one per region if there's one CF in the table). RS dedicates some fraction of total memory (see <xref linkend="hbase.regionserver.global.memstore.upperLimit" />) to region memstores. If this memory is exceeded (too much memstore usage), undesirable consequences such as unresponsive server, or later compaction storms, can result. Thus, a good starting point for the number of regions per RS (assuming one table) is <programlisting>(RS memory)*(total memstore fraction)/((memstore size)*(# column families))</programlisting>
+E.g. if an RS has 16Gb RAM, with default settings this gives 16384*0.4/128 ~ 51, so ~51 regions per RS is a starting point. The formula can be extended to multiple tables; if they all have the same configuration, just use the total number of families.</para>
+<para>This number can be adjusted; the formula above assumes all your regions are filled at approximately the same rate. If only a fraction of your regions are going to be actively written to, you can divide the result by that fraction to get a larger region count. Even if all regions are written to, region memstores are not all filled evenly, and eventually there is jitter even if they are (due to the limited number of concurrent flushes). Thus, one can have as many as 2-3 times more regions than the starting point; however, increased numbers carry increased risk.</para>
+<para>For write-heavy workloads, the memstore fraction can be increased in configuration at the expense of the block cache; this will also allow one to have more regions.</para>
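+<para>As a hedged sketch (the memstore limit is the setting referenced above; <varname>hfile.block.cache.size</varname> is assumed to be the block cache fraction, and both values are purely illustrative), such a shift in <filename>hbase-site.xml</filename> could look like:</para>
+<programlisting><![CDATA[
+      <property>
+        <name>hbase.regionserver.global.memstore.upperLimit</name>
+        <value>0.45</value>
+      </property>
+      <property>
+        <name>hfile.block.cache.size</name>
+        <value>0.3</value>
+      </property>
+    ]]></programlisting>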
+      </section> <!-- ops.capacity.regions.count -->
+      <section xml:id="ops.capacity.regions.mincount"><title>Number of regions per RS - lower bound</title>
+<para>HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB of data, on a 20-node cluster your data will be concentrated on just a few machines - nearly the entire cluster will be idle. This really can't be stressed enough, since a common problem is loading 200MB of data into HBase and then wondering why your awesome 10-node cluster isn't doing anything.</para>
+<para>On the other hand, if you have a very large amount of data, you may also want to go for a larger number of regions to avoid having regions that are too large.</para>
+      </section> <!-- ops.capacity.regions.mincount -->
+      <section xml:id="ops.capacity.regions.size"><title>Maximum region size</title>
+<para>For large tables in production scenarios, maximum region size is mostly limited by compactions - very large compactions, esp. major, can degrade cluster performance. Currently, the recommended maximum region size is 10-20Gb, and 5-10Gb is optimal. For older 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.</para>
+<para>The size at which the region is split into two is generally configured via <xref linkend="hbase.hregion.max.filesize" />; for details, see <xref linkend="arch.region.splits" />.</para>
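+<para>As a minimal hedged sketch (the 10Gb value is illustrative only, not a recommendation), the cluster-wide split size can be raised in <filename>hbase-site.xml</filename>:</para>
+<programlisting><![CDATA[
+      <property>
+        <name>hbase.hregion.max.filesize</name>
+        <value>10737418240</value>
+      </property>
+    ]]></programlisting>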
+<para>If you cannot estimate the size of your tables well, when starting off, it's probably best to stick to the default region size, perhaps going smaller for hot tables (or manually split hot regions to spread the load over the cluster), or go with larger region sizes if your cell sizes tend to be largish (100k and up).</para>
+<para>In HBase 0.98, an experimental stripe compactions feature was added that allows for larger regions, especially for log data. See <xref linkend="ops.stripe" />.</para>
+      </section> <!-- ops.capacity.regions.size -->
+      <section xml:id="ops.capacity.regions.total"><title>Total data size per region server</title>
+<para>According to the above numbers for region size and number of regions per region server, an optimistic estimate of 10 GB x 100 regions per RS gives up to 1TB served per region server, which is in line with some of the reported multi-PB use cases. However, it is important to think about the data vs. cache size ratio at the RS level. With 1TB of data per server and 10 GB block cache, only 1% of the data will be cached, which may barely cover all block indices.</para>
+      </section> <!-- ops.capacity.regions.total -->
+    </section> <!-- ops.capacity.regions -->
+    <section xml:id="ops.capacity.config"><title>Initial configuration and tuning</title>
+<para>First, see <xref linkend="important_configurations" />. Note that some configurations, more than others, depend on specific scenarios. Pay special attention to 
+<itemizedlist>
+<listitem><xref linkend="hbase.regionserver.handler.count" /> - request handler thread count, vital for high-throughput workloads.</listitem>
+<listitem><xref linkend="config.wals" /> - the blocking number of WAL files depends on your memstore configuration and should be set accordingly to prevent potential blocking when doing high volume of writes.</listitem>
+</itemizedlist></para>
+<para>Then, there are some considerations when setting up your cluster and tables.</para>
+      <section xml:id="ops.capacity.config.compactions"><title>Compactions</title>
+<para>Depending on read/write volume and latency requirements, optimal compaction settings may be different. See <xref linkend="compaction" /> for some details.</para>
+<para>When provisioning for large data sizes, however, it's good to keep in mind that compactions can affect write throughput. Thus, for write-intensive workloads, you may opt for less frequent compactions and more store files per region. The minimum number of files for compactions (<varname>hbase.hstore.compaction.min</varname>) can be set to a higher value; <xref linkend="hbase.hstore.blockingStoreFiles" /> should also be increased, as more files might accumulate in that case. You may also consider manually managing compactions: <xref linkend="managed.compactions" /></para>
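+<para>As a hedged example (the values are illustrative starting points, not recommendations), relaxing compaction pressure for a write-heavy cluster in <filename>hbase-site.xml</filename> could look like:</para>
+<programlisting><![CDATA[
+      <property>
+        <name>hbase.hstore.compaction.min</name>
+        <value>5</value>
+      </property>
+      <property>
+        <name>hbase.hstore.blockingStoreFiles</name>
+        <value>20</value>
+      </property>
+    ]]></programlisting>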
+      </section> <!-- ops.capacity.config.compactions -->
+      <section xml:id="ops.capacity.config.presplit"><title>Pre-splitting the table</title>
+<para>Based on the target number of regions per RS (see <xref linkend="ops.capacity.regions.count" xrefstyle="template:above" />) and the number of RSes, one can pre-split the table at creation time. This both avoids some costly splitting as the table starts to fill up, and ensures that the table starts out already distributed across many servers.</para>
+<para>If the table is expected to grow large enough to justify that, at least one region per RS should be created. It is not recommended to split immediately into the full target number of regions (e.g. 50 * number of RSes), but a low intermediate value can be chosen. For multiple tables, it is recommended to be conservative with presplitting (e.g. pre-split 1 region per RS at most), especially if you don't know how much each table will grow. If you split too much, you may end up with too many regions, with some tables having too many small regions.</para>
+<para>For pre-splitting howto, see <xref linkend="precreate.regions" />.</para>
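+<para>For illustration only (the table/family names, split points, and region count are made up), a table can be pre-split at creation time from the HBase shell, either with explicit split points or with a split algorithm:
+<programlisting>create 'mytable', 'cf', SPLITS => ['a', 'm', 't']
+create 'mytable2', 'cf', {NUMREGIONS => 20, SPLITALGO => 'HexStringSplit'}</programlisting>
+</para>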
+      </section> <!-- ops.capacity.config.presplit -->
+    </section> <!-- ops.capacity.config -->
+  </section> <!-- ops.capacity -->
   <section xml:id="table.rename"><title>Table Rename</title>
       <para>In versions 0.90.x of hbase and earlier, we had a simple script that would rename the hdfs
           table directory and then do an edit of the .META. table replacing all mentions of the old
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index d23554f..58656a7 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -156,15 +156,6 @@
 
     <para>See <xref linkend="recommended_configurations" />.</para>
 
-
-    <section xml:id="perf.number.of.regions">
-      <title>Number of Regions</title>
-
-      <para>The number of regions for an HBase table is driven by the <xref
-              linkend="bigger.regions" />. Also, see the architecture
-          section on <xref linkend="arch.regions.size" /></para>
-    </section>
-
     <section xml:id="perf.compactions.and.splits">
       <title>Managing Compactions</title>
 
@@ -248,7 +239,7 @@
     <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link> in the
     event where certain tables require different regionsizes than the configured default regionsize.
     </para>
-    <para>See <xref linkend="perf.number.of.regions"/> for more information.
+    <para>See <xref linkend="ops.capacity.regions"/> for more information.
     </para>
     </section>
     <section xml:id="schema.bloom">
diff --git a/src/main/docbkx/security.xml b/src/main/docbkx/security.xml
index f54f227..7af0c09 100644
--- a/src/main/docbkx/security.xml
+++ b/src/main/docbkx/security.xml
@@ -378,7 +378,68 @@
 
 </section>  <!-- Simple User Access to Apache HBase -->
 
-
+<section xml:id="hbase.tags">
+<title>Tags</title>
+<para>
+	Every cell can have metadata associated with it.  Adding metadata to the data part of every cell would make things difficult.
+</para>
+<para>
+	The 0.98 version of HBase solves this problem by providing Tags as part of the cell format. 
+	Some of the use cases for tags are Visibility labels, Cell-level ACLs, etc.
+</para>
+<para>
+	HFile V3, available from 0.98 onwards, supports tags, and this feature can be turned on using the following configuration:
+</para>
+<programlisting><![CDATA[
+      <property>
+	    <name>hfile.format.version</name>
+        <value>3</value>
+      </property>
+    ]]></programlisting>
+<para>
+	Every cell can have zero or more tags. Every tag has a type and the actual tag byte array.
+	The types <command>0-31</command> are reserved for System tags.  For example ‘1’ is reserved for ACL and ‘2’ is reserved for Visibility tags.
+</para>
+<para>
+	Just as rowkeys, column families, qualifiers and values can be encoded using different encoding algorithms, tags can also be encoded.  
+	Tag encoding can be turned on per CF; the default is ON.
+	To turn on tag encoding in the HFiles, use
+</para>
+<programlisting><![CDATA[
+    HColumnDescriptor#setCompressTags(boolean compressTags)
+    ]]></programlisting>
+<para>
+	Note that encoding of tags takes place only if the DataBlockEncoder is enabled for the CF.
+</para>
+<para>
+	Just as WAL entries are compressed using a Dictionary, the tags present in the WAL can also be compressed using a Dictionary.  
+	Every tag is compressed individually using the WAL Dictionary.  To turn on tag compression in the WAL, enable the following property:
+</para>
+<programlisting><![CDATA[
+    <property>
+    	<name>hbase.regionserver.wal.tags.enablecompression</name>
+    	<value>true</value>
+	</property>
+    ]]></programlisting>
+<para>
+	To add tags to cells during Puts, the following APIs are provided:
+</para>
+<programlisting><![CDATA[
+	Put#add(byte[] family, byte [] qualifier, byte [] value, Tag[] tag)
+	Put#add(byte[] family, byte[] qualifier, long ts, byte[] value, Tag[] tag)
+    ]]></programlisting>
+<para>
+	Some of the features developed using tags are Cell-level ACLs and Visibility labels.  
+	These features use the tags framework and allow users to gain better security at the cell level.
+</para>
+<para>
+	For details, see:
+</para>
+<para>
+    <link linkend='hbase.accesscontrol.configuration'>Access Control</link>
+    <link linkend='hbase.visibility.labels'>Visibility labels</link>
+</para>
+</section>
 <section xml:id="hbase.accesscontrol.configuration">
     <title>Access Control</title>
     <para>
@@ -423,7 +484,7 @@
     </para>
     <orderedlist>
       <listitem>
-        <para>Row-level or per value (cell): This would require broader changes for storing the ACLs inline with rows. It is a future goal.</para>
+        <para>Row-level or per value (cell): Using Tags in HFile V3</para>
       </listitem>
       <listitem>
         <para>Push down of file ownership to HDFS: HBase is not designed for the case where files may have different permissions than the HBase system principal. Pushing file ownership down into HDFS would necessitate changes to core code. Also, while HDFS file ownership would make applying quotas easy, and possibly make bulk imports more straightforward, it is not clear that it would offer a more secure setup.</para>
@@ -609,6 +670,47 @@
     ]]></programlisting>
     </section>
 
+    <section>
+    <title>Cell level Access Control using Tags</title>
+    <para>
+    	Prior to HBase 0.98, access control was restricted to the table and column family level.  The tags feature in 0.98 allows access control at the cell level.
+		The existing Access Controller coprocessor also supports cell-level access control.
+		For details on configuring it, refer to the <link linkend='hbase.accesscontrol.configuration'>Access Control</link> section.
+    </para>
+    <para>
+    	ACLs can be specified for every mutation using the following APIs:
+    </para>
+    <programlisting><![CDATA[
+    	Mutation.setACL(String user, Permission perms)
+	  	Mutation.setACL(Map<String, Permission> perms)
+    ]]></programlisting>
+    <para>
+    	For example, to provide read permission to a user 'user1':
+    </para>
+    <programlisting><![CDATA[
+    	put.setACL("user1", new Permission(Permission.Action.READ))
+    ]]></programlisting>
+    <para>
+    	Generally, the ACL applied at the table and CF level takes precedence over the cell-level ACL.  To make the cell-level ACL take precedence, use the following API:
+    </para>
+    <programlisting><![CDATA[
+    	Mutation.setACLStrategy(boolean cellFirstStrategy)
+    ]]></programlisting>
+    <para>
+    	Please note that in order to use this feature, HFile V3 should be turned on:
+    </para>
+    <programlisting><![CDATA[
+   		<property>
+			<name>hfile.format.version</name>
+			<value>3</value>
+		</property>
+     ]]></programlisting>
+    <para>
+    	Note that ACLs applied to deletes do not have any effect.
+		To keep things simple, the ACL applied on the current Put does not change the ACL of any previous Put; that is,
+		the ACL on the current Put does not affect older versions of the same row.
+    </para>
+    </section>
     <section><title>Shell Enhancements for Access Control</title>
     <para>
 The HBase shell has been extended to provide simple commands for editing and updating user permissions. The following commands have been added for access control list management:
@@ -616,10 +718,13 @@
     Grant
     <para>
     <programlisting>
-    grant &lt;user&gt; &lt;permissions&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
+    grant &lt;user|@group&gt; &lt;permissions&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
     </programlisting>
     </para>
     <para>
+    <code>&lt;user|@group&gt;</code> is a user or a group (groups start with the character '@'). Groups are created and manipulated via the Hadoop group mapping service.
+    </para>
+    <para>
     <code>&lt;permissions&gt;</code> is zero or more letters from the set "RWCA": READ('R'), WRITE('W'), CREATE('C'), ADMIN('A').
     </para>
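+    <para>
+    For illustration only (the user, group, and table/family names are made up), granting read/write on a column family to a user, and full rights on a table to a group, might look like:
+    <programlisting>
+    grant 'bobsmith', 'RW', 't1', 'f1'
+    grant '@admins', 'RWCA', 't1'
+    </programlisting>
+    </para>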
     <para>
@@ -630,7 +735,7 @@
     </para>
     <para>
     <programlisting>
-    revoke &lt;user&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
+    revoke &lt;user|@group&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
     </programlisting>
     </para>
     <para>
@@ -639,7 +744,7 @@
     <para>
     The <code>alter</code> command has been extended to allow ownership assignment:
     <programlisting>
-      alter 'tablename', {OWNER => 'username'}
+      alter 'tablename', {OWNER => 'username|@group'}
     </programlisting>
     </para>
     <para>
diff --git a/src/main/docbkx/troubleshooting.xml b/src/main/docbkx/troubleshooting.xml
index e722df4..7a9caa4 100644
--- a/src/main/docbkx/troubleshooting.xml
+++ b/src/main/docbkx/troubleshooting.xml
@@ -257,7 +257,8 @@
            <title>Builtin Tools</title>
             <section xml:id="trouble.tools.builtin.webmaster">
               <title>Master Web Interface</title>
-              <para>The Master starts a web-interface on port 60010 by default.
+              <para>The Master starts a web-interface on port 16010 by default.
+	      (Up to and including 0.98 this was port 60010)
               </para>
               <para>The Master web UI lists created tables and their definition (e.g., ColumnFamilies, blocksize, etc.).  Additionally,
               the available RegionServers in the cluster are listed along with selected high-level metrics (requests, number of regions, usedHeap, maxHeap).
@@ -266,7 +267,8 @@
             </section>
             <section xml:id="trouble.tools.builtin.webregion">
               <title>RegionServer Web Interface</title>
-              <para>RegionServers starts a web-interface on port 60030 by default.
+              <para>RegionServers start a web-interface on port 16030 by default.
+              (Up to and including 0.98 this was port 60030.)
               </para>
               <para>The RegionServer web UI lists online regions and their start/end keys, as well as point-in-time RegionServer metrics (requests, regions, storeFileIndexSize, compactionQueueSize, etc.).
               </para>
@@ -1154,4 +1156,29 @@
       </para>
     </section>
 
+    <section xml:id="trouble.crypto">
+      <title>Cryptographic Features</title>
+        <section xml:id="trouble.crypto.HBASE-10132">
+           <title>sun.security.pkcs11.wrapper.PKCS11Exception: CKR_ARGUMENTS_BAD</title>
+<para>This problem manifests as exceptions ultimately caused by:</para>
+<programlisting>
+Caused by: sun.security.pkcs11.wrapper.PKCS11Exception: CKR_ARGUMENTS_BAD
+	at sun.security.pkcs11.wrapper.PKCS11.C_DecryptUpdate(Native Method)
+	at sun.security.pkcs11.P11Cipher.implDoFinal(P11Cipher.java:795)
+</programlisting>
+<para>
+This problem appears to affect some versions of OpenJDK 7 shipped by some Linux vendors. NSS is configured as the default provider. If the host has an x86_64 architecture, depending on whether the vendor packages contain the defect, the NSS provider will not function correctly.
+</para>
+<para>
+To work around this problem, find the JRE home directory and edit the file <filename>lib/security/java.security</filename>. Edit the file to comment out the line:
+</para>
+<programlisting>
+security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/nss.cfg
+</programlisting>
+<para>
+Then renumber the remaining providers accordingly.
+</para>
+        </section>
+    </section>
+
   </chapter>
diff --git a/src/main/docbkx/upgrading.xml b/src/main/docbkx/upgrading.xml
index 869a44d..23e6e2e 100644
--- a/src/main/docbkx/upgrading.xml
+++ b/src/main/docbkx/upgrading.xml
@@ -333,13 +333,14 @@
 </para>
 </section>
 
-<section><title>Experimental off-heap cache
-</title>
+<section xml:id="slabcache"><title>Experimental off-heap cache: SlabCache</title>
 <para>
 A new cache was contributed to 0.92.0 to act as a solution between using the “on-heap” cache which is the current LRU cache the region servers have and the operating system cache which is out of our control.
-To enable, set “-XX:MaxDirectMemorySize” in hbase-env.sh to the value for maximum direct memory size and specify hbase.offheapcache.percentage in hbase-site.xml with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
-See this blog post for additional information on this new experimental feature: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/
+To enable <emphasis>SlabCache</emphasis>, as this feature is being called, set “-XX:MaxDirectMemorySize” in hbase-env.sh to the value for maximum direct memory size and specify
+<property>hbase.offheapcache.percentage</property> in <filename>hbase-site.xml</filename> with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
+See this blog post, <link xlink:href="http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching in Apache HBase: SlabCache</link>, for additional information on this new experimental feature.
 </para>
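+<para>As a hedged sketch (the 2G sizing and the use of <varname>HBASE_REGIONSERVER_OPTS</varname> are illustrative assumptions, not recommendations), this amounts to something like
+<programlisting>export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:MaxDirectMemorySize=2G"</programlisting>
+in <filename>hbase-env.sh</filename>, plus a non-zero <property>hbase.offheapcache.percentage</property> in <filename>hbase-site.xml</filename>.</para>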
+<para>This feature has mostly been eclipsed in later versions of HBase.  See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7404">HBASE-7404 Bucket Cache: A solution about CMS, Heap Fragment and Big Cache on HBASE</link>, etc.</para>
 </section>
 
 <section><title>Changes in HBase replication