Latest docbook and site files from trunk

git-svn-id: https://svn.apache.org/repos/asf/hbase/tags/0.98.1RC0@1578154 13f79535-47bb-0310-9956-ffa450edef68
diff --git a/src/main/docbkx/book.xml b/src/main/docbkx/book.xml
index 3232e2e..43af6ec 100644
--- a/src/main/docbkx/book.xml
+++ b/src/main/docbkx/book.xml
@@ -39,14 +39,14 @@
            </inlinemediaobject>
        </link>
     </subtitle>
-    <copyright><year>2013</year><holder>Apache Software Foundation.
+    <copyright><year>2014</year><holder>Apache Software Foundation.
         All Rights Reserved.  Apache Hadoop, Hadoop, MapReduce, HDFS, Zookeeper, HBase, and the HBase project logo are trademarks of the Apache Software Foundation.
         </holder>
     </copyright>
       <abstract>
     <para>This is the official reference guide of
     <link xlink:href="http://www.hbase.org">Apache HBase&#153;</link>,
-    a distributed, versioned, column-oriented database built on top of
+    a distributed, versioned, big data store built on top of
     <link xlink:href="http://hadoop.apache.org/">Apache Hadoop&#153;</link> and
     <link xlink:href="http://zookeeper.apache.org/">Apache ZooKeeper&#153;</link>.
       </para>
@@ -216,8 +216,8 @@
 #create my_table in my_ns namespace
 create 'my_ns:my_table', 'fam'
 
-#delete namespace
-delete_namespace 'my_ns'
+#drop namespace
+drop_namespace 'my_ns'
 
 #alter namespace
 alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
@@ -1112,7 +1112,7 @@
       <section xml:id="arch.overview.hbasehdfs">
         <title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
           <para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files.
-          It's documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
+          Its documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
           HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
           This can sometimes be a point of conceptual confusion.  HBase internally puts your data in indexed "StoreFiles" that exist
           on HDFS for high-speed lookups.  See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals.
@@ -1251,7 +1251,7 @@
 	      </para>
 	      <para>Note: <code>htable.delete(Delete);</code> does not go in the writebuffer!  This only applies to Puts.
 	      </para>
-	      <para>For additional information on write durability, review the <link xlink:href="acid-semantics.html">ACID semantics</link> page.
+	      <para>For additional information on write durability, review the <link xlink:href="../acid-semantics.html">ACID semantics</link> page.
 	      </para>
        <para>For fine-grained control of batching of
            <classname>Put</classname>s or <classname>Delete</classname>s,
@@ -1739,41 +1739,49 @@
  </programlisting>
      For a description of what HBase files look like when written to HDFS, see <xref linkend="trouble.namenode.hbase.objects"/>.
             </para>
-
     <section xml:id="arch.regions.size">
-      <title>Region Size</title>
-
-      <para>Determining the "right" region size can be tricky, and there are a few factors
-      to consider:</para>
-
-      <itemizedlist>
-        <listitem>
-          <para>HBase scales by having regions across many servers. Thus if
-          you have 2 regions for 16GB data, on a 20 node machine your data
-          will be concentrated on just a few machines - nearly the entire
-          cluster will be idle.  This really cant be stressed enough, since a
-          common problem is loading 200MB data into HBase then wondering why
-          your awesome 10 node cluster isn't doing anything.</para>
-        </listitem>
-
-        <listitem>
-          <para>On the other hand, high region count has been known to make things slow.
-          This is getting better with each release of HBase, but it is probably better to have
-          700 regions than 3000 for the same amount of data.</para>
-        </listitem>
-
-        <listitem>
-          <para>There is not much memory footprint difference between 1 region
-          and 10 in terms of indexes, etc, held by the RegionServer.</para>
-        </listitem>
-      </itemizedlist>
-
-      <para>When starting off, it's probably best to stick to the default region-size, perhaps going
-      smaller for hot tables (or manually split hot regions to spread the load over
-      the cluster), or go with larger region sizes if your cell sizes tend to be
-      largish (100k and up).</para>
-      <para>See <xref linkend="bigger.regions"/> for more information on configuration.
+<para> In general, HBase is designed to run with a small (20-200) number of relatively large (5-20Gb) regions per server. The considerations for this are as follows:</para>
+<section xml:id="too_many_regions">
+          <title>Why can't I have too many regions?</title>
+          <para>
+              Typically you want to keep your region count low on HBase, for numerous reasons.
+              Usually right around 100 regions per RegionServer has yielded the best results.
+              Here are some of the reasons to keep the region count low:
+              <orderedlist>
+                  <listitem><para>
+                          MSLAB requires 2MB per memstore (that's 2MB per family per region).
+                          1000 regions with 2 families each use 3.9GB of heap, before storing any data. NB: the 2MB value is configurable.
+                  </para></listitem>
+                  <listitem><para>If you fill all the regions at somewhat the same rate, the global memory usage forces tiny
+                          flushes when you have too many regions, which in turn generates compactions.
+                          Rewriting the same data tens of times is the last thing you want.
+                          As an example, consider filling 1000 regions (with one family) equally, with a lower bound for global memstore
+                          usage of 5GB (the region server would have a big heap).
+                          Once usage reaches 5GB it will force flush the biggest region;
+                          at that point almost all regions will have about 5MB of data, so
+                          it flushes only that amount. With another 5MB inserted, it flushes another
+                          region that now has a bit over 5MB of data, and so on (see the short arithmetic sketch at the end of this section).
+                          This is currently the main limiting factor for the number of regions; see <xref linkend="ops.capacity.regions.count" />
+                          for a detailed formula.
+                  </para></listitem>
+                  <listitem><para>The master as-is is allergic to tons of regions, and will
+                          take a lot of time assigning them and moving them around in batches.
+                          The reason is that it is heavy on ZK usage, and it is not very async
+                          at the moment (could really be improved -- and has been improved a bunch
+                          in 0.96 HBase).
+                  </para></listitem>
+                  </para></listitem>
+                  <listitem><para>
+                          In older versions of HBase (pre-v2 hfile, 0.90 and previous), tons of regions
+                          on a few RSes can cause the store file index to rise, increasing heap usage and potentially
+                          creating memory pressure or OOME on the RSes.
+                  </para></listitem>
+          </orderedlist>
       </para>
+      </section>
+      <para>Another issue is the effect of the number of regions on mapreduce jobs; it is typical to have one mapper per HBase region.
+          Thus, hosting only 5 regions per RS may not be enough to get a sufficient number of tasks for a mapreduce job, while 1000 regions will generate far too many tasks.
+      </para>
+      <para>See <xref linkend="ops.capacity.regions" /> for configuration guidelines.</para>
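+      <para>The following is a short arithmetic sketch of the tiny-flush problem described above, using the
+          same illustrative assumptions as the list (a ~5GB global memstore lower bound shared by 1000
+          equally-written regions with one family each):
+<programlisting># illustrative only: average flush size per region when the global memstore
+# limit is shared by many equally-written regions
+GLOBAL_MEMSTORE_MB=5120   # ~5GB global memstore lower bound
+REGIONS=1000              # regions per RegionServer, one family each
+echo "approx. flush size per region: $(( GLOBAL_MEMSTORE_MB / REGIONS )) MB"
+# prints: approx. flush size per region: 5 MB</programlisting>
+      </para>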
     </section>
 
       <section xml:id="regions.arch.assignment">
@@ -1833,9 +1841,12 @@
            <orderedlist>
              <listitem>First replica is written to local node
              </listitem>
-             <listitem>Second replica is written to another node in same rack
+             <listitem>Second replica is written to a random node on another rack
              </listitem>
-             <listitem>Third replica is written to a node in another rack (if sufficient nodes)
+             <listitem>Third replica is written on the same rack as the second, but on a different node chosen randomly
+             </listitem>
+             <listitem>Subsequent replicas are written on random nodes on the cluster
+<footnote><para>See <emphasis>Replica Placement: The First Baby Steps</emphasis> on this page: <link xlink:href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS Architecture</link></para></footnote>
              </listitem>
            </orderedlist>
           Thus, HBase eventually achieves locality for a region after a flush or a compaction.
@@ -1844,13 +1855,18 @@
           in the region, or the table is compacted and StoreFiles are re-written, they will become "local"
           to the RegionServer.
         </para>
-        <para>For more information, see <link xlink:href="http://hadoop.apache.org/common/docs/r0.20.205.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps">HDFS Design on Replica Placement</link>
+        <para>For more information, see <emphasis>Replica Placement: The First Baby Steps</emphasis> on this page: <link xlink:href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html">HDFS Architecture</link>
         and also Lars George's blog on <link xlink:href="http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html">HBase and HDFS locality</link>.
         </para>
       </section>
 
-      <section>
+      <section xml:id="arch.region.splits">
         <title>Region Splits</title>
+        <para>Regions split when they reach a configured threshold.
+        Below we treat the topic briefly.  For a longer exposition,
+        see <link xlink:href="http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/">Apache HBase Region Splitting and Merging</link>
+        by our Enis Soztutar.
+        </para>
 
         <para>Splits run unaided on the RegionServer; i.e. the Master does not
         participate. The RegionServer splits a region, offlines the split
@@ -2891,7 +2907,13 @@
             <listitem>
                 <para>
                     Build and install <link xlink:href="http://code.google.com/p/snappy/">snappy</link> on all nodes
-                    of your cluster (see below)
+                    of your cluster (see below).  Neither HBase nor Hadoop can include snappy because of licensing issues (the
+                    hadoop libhadoop.so under its native dir does not include snappy; of note, the shipped .so
+                    may be for 32-bit architectures -- this has tripped up folks in the past who assumed
+                    it was 64-bit).  The notes below are about installing snappy for HBase use.  You may also want snappy
+                    available in your hadoop context; that is not covered here.
+                    HBase and Hadoop currently find the snappy .so in different locations: Hadoop picks up those files in
+                    <filename>./lib</filename> while HBase finds the .so in <filename>./lib/[PLATFORM]</filename>.
                 </para>
             </listitem>
             <listitem>
@@ -2916,6 +2938,8 @@
     <title>
     Installation
     </title>
+    <para>Snappy is used by HBase to compress HFiles on flush and when compacting.
+    </para>
     <para>
         You will find the snappy library file under the .libs directory from your Snappy build (For example
         /home/hbase/snappy-1.0.5/.libs/). The file is called libsnappy.so.1.x.x where 1.x.x is the version of the snappy
@@ -2961,7 +2985,9 @@
   <appendix>
       <title xml:id="ycsb"><link xlink:href="https://github.com/brianfrankcooper/YCSB/">YCSB: The Yahoo! Cloud Serving Benchmark</link> and HBase</title>
       <para>TODO: Describe how YCSB is poor for putting up a decent cluster load.</para>
-      <para>TODO: Describe setup of YCSB for HBase</para>
+      <para>TODO: Describe setup of YCSB for HBase.  In particular, pre-split your tables before you start
+          a run.  See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-4163">HBASE-4163 Create Split Strategy for YCSB Benchmark</link>
+          for an explanation of why, and for a small shell command showing how to do it.</para>
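+      <para>As a rough illustration only (the table name 'usertable', the family name 'family', and the split
+          points below are assumptions, not the strategy recommended in the JIRA), a pre-split table can be
+          created from the HBase shell like so:
+<programlisting>hbase> create 'usertable', 'family', SPLITS => ['user1000', 'user2000', 'user3000', 'user4000']</programlisting>
+      </para>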
       <para>Ted Dunning redid YCSB so it's mavenized and added facility for verifying workloads.  See <link xlink:href="https://github.com/tdunning/YCSB">Ted Dunning's YCSB</link>.</para>
 
   </appendix>
@@ -3446,7 +3472,7 @@
        <section xml:id="other.info.videos"><title>HBase Videos</title>
          <para>Introduction to HBase
             <itemizedlist>
-			  <listitem><link xlink:href="http://www.cloudera.com/videos/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon">Introduction to HBase</link> by Todd Lipcon (Chicago Data Summit 2011).
+              <listitem><link xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/presentation/chicago_data_summit_apache_hbase_an_introduction_todd_lipcon.html">Introduction to HBase</link> by Todd Lipcon (Chicago Data Summit 2011).
 			  </listitem>
 			  <listitem><link xlink:href="http://www.cloudera.com/videos/intorduction-hbase-todd-lipcon">Introduction to HBase</link> by Todd Lipcon (2010).
 			  </listitem>
@@ -3458,7 +3484,7 @@
          </para>
        </section>
        <section xml:id="other.info.pres"><title>HBase Presentations (Slides)</title>
-         <para><link xlink:href="http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-advanced-hbase-schema-design">Advanced HBase Schema Design</link> by Lars George (Hadoop World 2011).
+         <para><link xlink:href="http://www.cloudera.com/content/cloudera/en/resources/library/hadoopworld/hadoop-world-2011-presentation-video-advanced-hbase-schema-design.html">Advanced HBase Schema Design</link> by Lars George (Hadoop World 2011).
          </para>
          <para><link xlink:href="http://www.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction">Introduction to HBase</link> by Todd Lipcon (Chicago Data Summit 2011).
          </para>
@@ -3526,67 +3552,7 @@
        </section>
   </appendix>
 
-  <appendix xml:id="tracing" ><title>Enabling Dapper-like Tracing in HBase</title>
-<para><link xlink:href="https://issues.apache.org/jira/browse/HBASE-6449">HBASE-6449</link> added support
-for tracing requests through HBase, using the open source tracing library,
-<link xlink:href="http://github.com/cloudera/htrace">HTrace</link>. Setting up tracing is quite simple,
-however it currently requires some very minor changes to your client code (it would not be very difficult to remove this requirement).
-</para>
-<section xml:id="tracing.spanreceivers"><title>SpanReceivers</title>
-<para>The tracing system works by collecting information in structs called ‘Spans’.
-It is up to you to choose how you want to receive this information by implementing the
-<classname>SpanReceiver</classname> interface, which defines one method:
-<programlisting>public void receiveSpan(Span span);</programlisting>
-This method serves as a callback whenever a span is completed. HTrace allows you to use
-as many SpanReceivers as you want so you can easily send trace information to multiple destinations.
-</para>
-
-<para>Configure what SpanReceivers you’d like to use by putting a comma separated list of the
-fully-qualified class name of classes implementing <classname>SpanReceiver</classname> in
-<filename>hbase-site.xml</filename> property: <varname>hbase.trace.spanreceiver.classes</varname>.
-</para>
-
-<para>HBase includes a <classname>HBaseLocalFileSpanReceiver</classname> that writes all span
-information to local files in a JSON-based format. The <classname>HBaseLocalFileSpanReceiver</classname>
-looks in <filename>hbase-site.xml</filename> for a <varname>hbase.trace.spanreceiver.localfilespanreceiver.filename</varname>
-property with a value describing the name of the file to which nodes should write their span information.
-</para>
-
-<para>If you do not want to use the included <classname>HBaseLocalFileSpanReceiver</classname>,
-you are encouraged to write your own receiver (take a look at <classname>HBaseLocalFileSpanReceiver</classname>
-for an example). If you think others would benefit from your receiver, file a JIRA or send a pull request to
-<link xlink:href="http://github.com/cloudera/htrace">HTrace</link>.
-</para>
-</section>
-<section xml:id="tracing.client.modifications">
-<title>Client Modifications</title>
-<para>Currently, you must turn on tracing in your client code. To do this, you simply turn on tracing for
-requests you think are interesting, and turn it off when the request is done.
-</para>
-
-<para>For example, if you wanted to trace all of your get operations, you change this:
-<programlisting>HTable table = new HTable(...);
-Get get = new Get(...);</programlisting>
-
-into:
-
-<programlisting>Span getSpan = Trace.startSpan(“doing get”, Sampler.ALWAYS);
-try {
-  HTable table = new HTable(...);
-  Get get = new Get(...);
-...
-} finally {
-  getSpan.stop();
-}</programlisting>
-
-If you wanted to trace half of your ‘get’ operations, you would pass in:
-<programlisting>new ProbabilitySampler(0.5)</programlisting> in lieu of <varname>Sampler.ALWAYS</varname> to <classname>Trace.startSpan()</classname>.
-See the HTrace <filename>README</filename> for more information on Samplers.
-</para>
-</section>
-
-  </appendix>
-
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tracing.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="rpc.xml" />
 
   <index xml:id="book_index">
diff --git a/src/main/docbkx/community.xml b/src/main/docbkx/community.xml
index 8eee9d1..ae5573f 100644
--- a/src/main/docbkx/community.xml
+++ b/src/main/docbkx/community.xml
@@ -99,6 +99,15 @@
               to the issue must happen in a new JIRA.
           </para>
       </section>
+      <section xml:id="no.permanent.state.in.zk">
+          <title>Only transient state in ZooKeeper!</title>
+          <para>
+You should be able to kill the data in ZooKeeper and HBase should ride over it, recreating the ZooKeeper content as it goes.
+This is an old adage around these parts.  We just made note of it now.  We are also currently in violation of this
+basic tenet -- replication, at least, keeps permanent state in ZooKeeper -- but we are working to undo this breaking of a
+golden rule.
+          </para>
+      </section>
     </section>
     <section xml:id="community.roles">
       <title>Community Roles</title>
@@ -137,4 +146,12 @@
 </para>
       </section>
     </section>
+      <section xml:id="hbase.commit.msg.format">
+          <title>Commit Message format</title>
+          <para>We <link xlink:href="http://search-hadoop.com/m/Gwxwl10cFHa1">agreed</link>
+          to the following SVN commit message format:
+<programlisting>HBASE-xxxxx &lt;title>. (&lt;contributor>)</programlisting>
+If the person making the commit is the contributor, leave off the '(&lt;contributor>)' element.
+          </para>
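+          <para>For example (the contributor name below is hypothetical; the issue title is one referenced elsewhere in this book):
+<programlisting>HBASE-4163 Create Split Strategy for YCSB Benchmark. (Jane Doe)</programlisting>
+          </para>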
+      </section>
     </chapter>
diff --git a/src/main/docbkx/configuration.xml b/src/main/docbkx/configuration.xml
index 9559a66..9e648d8 100644
--- a/src/main/docbkx/configuration.xml
+++ b/src/main/docbkx/configuration.xml
@@ -240,16 +240,16 @@
 		 <title>Hadoop version support matrix</title>
 		 <tgroup cols='4' align='left' colsep='1' rowsep='1'><colspec colname='c1' align='left'/><colspec colname='c2' align='center'/><colspec colname='c3' align='center'/><colspec colname='c4' align='center'/>
          <thead>
-	     <row><entry>               </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96.0</entry></row>
+         <row><entry>               </entry><entry>HBase-0.92.x</entry><entry>HBase-0.94.x</entry><entry>HBase-0.96.0</entry><entry>HBase-0.98.0</entry></row>
 	     </thead><tbody>
-         <row><entry>Hadoop-0.20.205</entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-0.22.x  </entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-1.0.0-1.0.2<footnote><para>HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.</para></footnote>   </entry><entry>S</entry>          <entry>S</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-1.0.3+</entry><entry>S</entry>          <entry>S</entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-1.1.x   </entry><entry>NT</entry>         <entry>S</entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-0.23.x  </entry><entry>X</entry>          <entry>S</entry>           <entry>NT</entry></row>
-         <row><entry>Hadoop-2.0.x-alpha     </entry><entry>X</entry>          <entry>NT</entry>           <entry>X</entry></row>
-         <row><entry>Hadoop-2.1.0-beta     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry></row>
+          <row><entry>Hadoop-0.20.205</entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-0.22.x  </entry><entry>S</entry>          <entry>X</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.0.0-1.0.2<footnote><para>HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.</para></footnote>   </entry><entry>S</entry>          <entry>S</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.0.3+</entry><entry>S</entry>          <entry>S</entry>           <entry>S</entry><entry>X</entry></row>
+          <row><entry>Hadoop-1.1.x   </entry><entry>NT</entry>         <entry>S</entry>           <entry>S</entry><entry>X</entry></row>
+          <row><entry>Hadoop-0.23.x  </entry><entry>X</entry>          <entry>S</entry>           <entry>NT</entry><entry>X</entry></row>
+          <row><entry>Hadoop-2.0.x-alpha     </entry><entry>X</entry>          <entry>NT</entry>           <entry>X</entry><entry>X</entry></row>
+          <row><entry>Hadoop-2.1.0-beta     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry><entry>X</entry></row>
          <row><entry>Hadoop-2.2.0     </entry><entry>X</entry>          <entry>NT<footnote><para>To get 0.94.x to run on hadoop 2.2.0,
                          you need to change the hadoop 2 and protobuf versions in the <filename>pom.xml</filename> and then
                          build against the hadoop 2 profile by running something like the following command:
@@ -278,8 +278,8 @@
          <slf4j.version>1.6.1</slf4j.version>
        </properties>
        <dependencies>]]></programlisting>
-         </para></footnote></entry>           <entry>S</entry></row>
-         <row><entry>Hadoop-2.x     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry></row>
+          </para></footnote></entry>           <entry>S</entry><entry>S</entry></row>
+          <row><entry>Hadoop-2.x     </entry><entry>X</entry>          <entry>NT</entry>           <entry>S</entry><entry>S</entry></row>
 		 </tbody></tgroup></table>
 
         Where
@@ -515,13 +515,16 @@
                 </para>
             	<para>To start up an extra backup master(s) on the same server run...
                        <programlisting>% bin/local-master-backup.sh start 1</programlisting>
-                       ... the '1' means use ports 60001 &amp; 60011, and this backup master's logfile will be at <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>.
+                       ... the '1' means use ports 16001 &amp; 16011, and this backup master's
+		       logfile will be at 
+		       <filename>logs/hbase-${USER}-1-master-${HOSTNAME}.log</filename>.
                 </para>
                 <para>To startup multiple backup masters run... <programlisting>% bin/local-master-backup.sh start 2 3</programlisting> You can start up to 9 backup masters (10 total).
  				</para>
 				<para>To start up more regionservers...
      			  <programlisting>% bin/local-regionservers.sh start 1</programlisting>
-     			where '1' means use ports 60201 &amp; 60301 and its logfile will be at <filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>.
+			... where '1' means use ports 16201 &amp; 16301 and its logfile will be at 
+			<filename>logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log</filename>.
      			</para>
      			<para>To add 4 more regionservers in addition to the one you just started by running... <programlisting>% bin/local-regionservers.sh start 2 3 4 5</programlisting>
      			This supports up to 99 extra regionservers (100 total).
@@ -678,14 +681,18 @@
 
 
         <para>HBase also puts up a UI listing vital attributes. By default its
-        deployed on the Master host at port 60010 (HBase RegionServers listen
-        on port 60020 by default and put up an informational http server at
-        60030). If the Master were running on a host named
+        deployed on the Master host at port 16010 (HBase RegionServers listen
+        on port 16020 by default and put up an informational http server at
+        16030). If the Master were running on a host named
         <varname>master.example.org</varname> on the default port, to see the
         Master's homepage you'd point your browser at
-        <filename>http://master.example.org:60010</filename>.</para>
+        <filename>http://master.example.org:16010</filename>.</para>
 
-
+	<para>Prior to HBase 0.98, the master UI was deployed
+	on port 60010, and the HBase RegionServers listened
+        on port 60020 by default and put up an informational http server at
+        60030.
+	</para>
 
     <para>Once HBase has started, see the <xref linkend="shell_exercises" /> for how to
         create tables, add data, scan your insertions, and finally disable and
@@ -1081,74 +1088,11 @@
       </para>
       <para>See <xref linkend="compression" /> for more information.</para>
       </section>
-      <section xml:id="bigger.regions">
-      <title>Bigger Regions</title>
-      <para>
-      Consider going to larger regions to cut down on the total number of regions
-      on your cluster. Generally less Regions to manage makes for a smoother running
-      cluster (You can always later manually split the big Regions should one prove
-      hot and you want to spread the request load over the cluster).  A lower number of regions is
-       preferred, generally in the range of 20 to low-hundreds
-       per RegionServer.  Adjust the regionsize as appropriate to achieve this number.
-       </para>
-       <para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
-       For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
-       </para>
-       <para>You may need to experiment with this setting based on your hardware configuration and application needs.
-       </para>
-       <para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
-       RegionSize can also be set on a per-table basis via
-       <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
-      </para>
-      <section xml:id="too_many_regions">
-          <title>How many regions per RegionServer?</title>
-          <para>
-              Typically you want to keep your region count low on HBase for numerous reasons.
-              Usually right around 100 regions per RegionServer has yielded the best results.
-              Here are some of the reasons below for keeping region count low:
-              <orderedlist>
-                  <listitem><para>
-                          MSLAB requires 2mb per memstore (that's 2mb per family per region).
-                          1000 regions that have 2 families each is 3.9GB of heap used, and it's not even storing data yet. NB: the 2MB value is configurable.
-                  </para></listitem>
-                  <listitem><para>If you fill all the regions at somewhat the same rate, the global memory usage makes it that it forces tiny
-                          flushes when you have too many regions which in turn generates compactions.
-                          Rewriting the same data tens of times is the last thing you want.
-                          An example is filling 1000 regions (with one family) equally and let's consider a lower bound for global memstore
-                          usage of 5GB (the region server would have a big heap).
-                          Once it reaches 5GB it will force flush the biggest region,
-                          at that point they should almost all have about 5MB of data so
-                          it would flush that amount. 5MB inserted later, it would flush another
-                          region that will now have a bit over 5MB of data, and so on.
-                          A basic formula for the amount of regions to have per region server would
-                          look like this:
-                          Heap * upper global memstore limit = amount of heap devoted to memstore
-                          then the amount of heap devoted to memstore / (Number of regions per RS * CFs).
-                          This will give you the rough memstore size if everything is being written to.
-                          A more accurate formula is
-                          Heap * upper global memstore limit = amount of heap devoted to memstore then the
-                          amount of heap devoted to memstore / (Number of actively written regions per RS * CFs).
-                          This can allot you a higher region count from the write perspective if you know how many
-                          regions you will be writing to at one time.
-                  </para></listitem>
-                  <listitem><para>The master as is is allergic to tons of regions, and will
-                          take a lot of time assigning them and moving them around in batches.
-                          The reason is that it's heavy on ZK usage, and it's not very async
-                          at the moment (could really be improved -- and has been imporoved a bunch
-                          in 0.96 hbase).
-                  </para></listitem>
-                  <listitem><para>
-                          In older versions of HBase (pre-v2 hfile, 0.90 and previous), tons of regions
-                          on a few RS can cause the store file index to rise raising heap usage and can
-                          create memory pressure or OOME on the RSs
-                  </para></listitem>
-          </orderedlist>
-      </para>
-      <para>Another issue is the effect of the number of regions on mapreduce jobs.
-          Keeping 5 regions per RS would be too low for a job, whereas 1000 will generate too many maps.
-      </para>
-      </section>
-
+      <section xml:id="config.wals"><title>Configuring the size and number of WAL files</title>
+      <para>HBase uses <xref linkend="wal" /> to recover the memstore data that has not been flushed to disk in case of an RS failure. These WAL files should be configured to be slightly smaller than the HDFS block (by default, an HDFS block is 64Mb and a WAL file is ~60Mb).</para>
+      <para>HBase also has a limit on the number of WAL files, designed to ensure there's never too much data that needs to be replayed during recovery. This limit needs to be set according to memstore configuration, so that all the necessary data would fit. It is recommended to allocate enough WAL files to store at least that much data (when all memstores are close to full).
+      For example, with a 16Gb RS heap, default memstore settings (0.4), and default WAL file size (~60Mb), the starting point for WAL file count is 16Gb*0.4/60Mb ~ 109.
+      However, as all memstores are not expected to be full all the time, fewer WAL files can be allocated.</para>
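+      <para>A minimal sketch of raising the WAL file limit in <filename>hbase-site.xml</filename> follows. The property
+      name (<varname>hbase.regionserver.maxlogs</varname>) and the value are assumptions derived from the example above;
+      verify the property against your version's defaults before relying on it.
+<programlisting><![CDATA[<property>
+  <!-- upper bound on the number of WAL files kept per region server -->
+  <name>hbase.regionserver.maxlogs</name>
+  <value>109</value>
+</property>]]></programlisting>
+      </para>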
       </section>
       <section xml:id="disable.splitting">
       <title>Managed Splitting</title>
diff --git a/src/main/docbkx/developer.xml b/src/main/docbkx/developer.xml
index 268a0b9..8d5167f 100644
--- a/src/main/docbkx/developer.xml
+++ b/src/main/docbkx/developer.xml
@@ -123,7 +123,6 @@
 Description	Resource	Path	Location	Type
 The project cannot be built until build path errors are resolved	hbase		Unknown	Java Problem
 Unbound classpath variable: 'M2_REPO/asm/asm/3.1/asm-3.1.jar' in project 'hbase'	hbase		Build path	Build Path Problem
-Unbound classpath variable: 'M2_REPO/com/github/stephenc/high-scale-lib/high-scale-lib/1.1.1/high-scale-lib-1.1.1.jar' in project 'hbase'	hbase		Build path	Build Path Problem
 Unbound classpath variable: 'M2_REPO/com/google/guava/guava/r09/guava-r09.jar' in project 'hbase'	hbase		Build path	Build Path Problem
 Unbound classpath variable: 'M2_REPO/com/google/protobuf/protobuf-java/2.3.0/protobuf-java-2.3.0.jar' in project 'hbase'	hbase		Build path	Build Path Problem Unbound classpath variable:
             </programlisting>
@@ -262,11 +261,26 @@
          <title>Making a Release Candidate</title>
          <para>I'll explain by running through the process.  See later in this section for more detail on particular steps.
          </para>
+	 <para>If you are making a point release (for example to quickly address a critical incompatibility or security
+             problem) off of a release branch instead of a development branch, the tagging instructions are slightly different.
+             I'll prefix those special steps with "Point Release Only".
+	 </para>
+
          <para>I would advise before you go about making a release candidate, do a practise run by deploying a SNAPSHOT.
              Also, make sure builds have been passing recently for the branch from where you are going to take your
              release.  You should also have tried recent branch tips out on a cluster under load running for instance
              our hbase-it integration test suite for a few hours to 'burn in' the near-candidate bits.
          </para>
+	 <note>
+	 <para>Point Release Only: At this point you should make an svn copy of the previous release branch (ex: 0.96.1) with
+             the new point release tag (e.g. a 0.96.1.1 tag).  Any commits with changes for the point release, or commits mentioned below,
+	     should be applied to the new tag.
+	 </para>
+	 <para><programlisting>
+$ svn copy http://svn.apache.org/repos/asf/hbase/tags/0.96.1 http://svn.apache.org/repos/asf/hbase/tags/0.96.1.1
+$ svn checkout http://svn.apache.org/repos/asf/hbase/tags/0.96.1.1
+	 </programlisting></para>
+	 </note>
          <para>The script <filename>dev-support/make_rc.sh</filename> automates most of this.  It does all but the close of the
              staging repository up in apache maven, the checking of the produced artifacts to ensure they are 'good' -- e.g.
              undoing the produced tarballs, eyeballing them to make sure they look right then starting and checking all is
@@ -333,6 +347,7 @@
 Undo the generated tarball and check it out.  Look at doc. and see if it runs, etc.  Are the set of modules appropriate: e.g. do we have a hbase-hadoop2-compat in the hadoop1 tarball?
 If good, copy the tarball to the above mentioned <emphasis>version directory</emphasis>.
 </para>
+<note><para>Point Release Only: The following step that creates a new tag can be skipped since you've already created the point release tag.</para></note>
 <para>I'll tag the release at this point since its looking good.  If we find an issue later, we can delete the tag and start over.  Release needs to be tagged when we do next step.</para>
 <para>Now deploy hadoop1 hbase to mvn. Do the mvn deploy and tgz for a particular version all together in the one go else if you flip between hadoop1 and hadoop2 builds,
 you might mal-publish poms and hbase-default.xml's (the version interpolations won't match).
@@ -963,14 +978,25 @@
        <section xml:id="maven.build.hadoop">
           <title>Building against various hadoop versions.</title>
           <para>As of 0.96, Apache HBase supports building against Apache Hadoop versions: 1.0.3, 2.0.0-alpha and 3.0.0-SNAPSHOT.
-          By default, we will build with Hadoop-1.0.3. To change the version to run with Hadoop-2.0.0-alpha, you would run:</para>
-         <programlisting>mvn -Dhadoop.profile=2.0 ...</programlisting>
+	  By default, in 0.96 and earlier, we will build with Hadoop-1.0.x. 
+          As of 0.98, Hadoop 1.x is deprecated and Hadoop 2.x is the default.
+          To change the version to build against, add a hadoop.profile property when you invoke <command>mvn</command>:</para>
+         <programlisting>mvn -Dhadoop.profile=1.0 ...</programlisting>
          <para>
-         That is, designate build with hadoop.profile 2.0.  Pass 2.0 for hadoop.profile to build against hadoop 2.0.
-         Tests may not all pass as of this writing so you may need to pass <code>-DskipTests</code> unless you are inclined
-          to fix the failing tests.</para>
+         The above will build against whatever explicit hadoop 1.x version we have in our <filename>pom.xml</filename> as our '1.0' version.
+         Tests may not all pass so you may need to pass <code>-DskipTests</code> unless you are inclined to fix the failing tests.</para>
+<note id="maven.build.passing.default.profile">
+<title>'dependencyManagement.dependencies.dependency.artifactId' for org.apache.hbase:${compat.module}:test-jar with value '${compat.module}' does not match a valid id pattern</title>
+<para>You will see ERRORs like the above title if you pass the <emphasis>default</emphasis> profile; e.g. if
+you pass <property>hadoop.profile=1.1</property> when building 0.96 or
+<property>hadoop.profile=2.0</property> when building HBase 0.98; just drop the
+hadoop.profile stipulation in this case to get your build to run again.  This seems to be a maven
+peculiarity that is probably fixable but we've not spent the time trying to figure it out.</para>
+</note>
+
           <para>
-         Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a deployed maven artificat - you will need to build and install your own in your local maven repository if you want to run against this profile.
+         Similarly, for 3.0, you would just replace the profile value. Note that Hadoop-3.0.0-SNAPSHOT does not currently have a
+         deployed maven artifact - you will need to build and install your own in your local maven repository if you want to run against this profile.
          </para>
          <para>
          In earilier verions of Apache HBase, you can build against older versions of Apache Hadoop, notably, Hadoop 0.22.x and 0.23.x.
diff --git a/src/main/docbkx/getting_started.xml b/src/main/docbkx/getting_started.xml
index e3e471d..cd47284 100644
--- a/src/main/docbkx/getting_started.xml
+++ b/src/main/docbkx/getting_started.xml
@@ -41,22 +41,27 @@
 
     <para>This guide describes setup of a standalone HBase instance. It will
         run against the local filesystem.  In later sections we will take you through
-        how to run HBase on HDFS, a distributed filesystem.  This section
-        leads you through creating a table, inserting
-    rows via the HBase <command>shell</command>, and then cleaning
-    up and shutting down your standalone, local filesystem  HBase instance. The below exercise
+        how to run HBase on Apache Hadoop's HDFS, a distributed filesystem.  This section
+        shows you how to create a table in HBase, insert
+    rows into your new HBase table via the HBase <command>shell</command>, and then clean
+    up and shut down your standalone, local filesystem-based HBase instance. The below exercise
     should take no more than ten minutes (not including download time).
     </para>
     <note xml:id="local.fs.durability"><title>Local Filesystem and Durability</title>
         <para>Using HBase with a LocalFileSystem does not currently guarantee durability.
+        The HDFS local filesystem implementation will lose edits if files are not properly
+        closed -- which is very likely to happen when experimenting with a new download.
             You need to run HBase on HDFS to ensure all writes are preserved.  Running
             against the local filesystem though will get you off the ground quickly and get you
             familiar with how the general system works so lets run with it for now. See
             <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3696"/> and its associated issues for more details.</para></note>
     <note xml:id="loopback.ip.getting.started">
         <title>Loopback IP</title>
-        <para>The below advice is for hbase-0.94.0 (and older) versions; we believe this fixed in hbase-0.96.0 and beyond (let us know if we have it wrong) -- there should be no need of modification to
-        <filename>/etc/hosts</filename>.</para>
+        <note>
+        <para><emphasis>The below advice is for hbase-0.94.x and older versions only. We believe this is fixed in hbase-0.96.0 and beyond
+(let us know if we have it wrong).</emphasis>  There should be no need of the below modification to <filename>/etc/hosts</filename> in
+later versions of HBase.</para>
+       </note>
         <para>HBase expects the loopback IP address to be 127.0.0.1.  Ubuntu and some other distributions,
             for example, will default to 127.0.1.1 and this will cause problems for you
             <footnote><para>See <link xlink:href="http://blog.devving.com/why-does-hbase-care-about-etchosts/">Why does HBase care about /etc/hosts?</link> for detail.</para></footnote>.
diff --git a/src/main/docbkx/ops_mgt.xml b/src/main/docbkx/ops_mgt.xml
index c02c079..6b669f6 100644
--- a/src/main/docbkx/ops_mgt.xml
+++ b/src/main/docbkx/ops_mgt.xml
@@ -34,13 +34,106 @@
   <section xml:id="tools">
     <title >HBase Tools and Utilities</title>
 
-    <para>Here we list HBase tools for administration, analysis, fixup, and
-    debugging.</para>
+    <para>Here we list HBase tools for administration, analysis, fixup, and debugging.</para>
+    <section xml:id="canary"><title>Canary</title>
+<para>There is a Canary class that can help users canary-test the HBase cluster status, at the granularity of every column family of every region, or of every regionserver. To see the usage, run
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -help</programlisting>
+which will output
+<programlisting>Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
+ where [opts] are:
+   -help          Show this help and exit.
+   -regionserver  replace the table argument to regionserver,
+      which means to enable regionserver mode
+   -daemon        Continuous check at defined intervals.
+   -interval &lt;N>  Interval between checks (sec)
+   -e             Use region/regionserver as regular expression
+      which means the region/regionserver is regular expression pattern
+   -f &lt;B>         stop whole program if first error occurs, default is true
+   -t &lt;N>         timeout for a check, default is 600000 (milisecs)</programlisting>
+This tool returns non-zero error codes so that it can cooperate with other monitoring tools, such as Nagios (a trivial wrapper sketch appears at the end of this overview).
+The error code definitions are...
+<programlisting>private static final int USAGE_EXIT_CODE = 1;
+private static final int INIT_ERROR_EXIT_CODE = 2;
+private static final int TIMEOUT_ERROR_EXIT_CODE = 3;
+private static final int ERROR_EXIT_CODE = 4;</programlisting>
+Here are some examples based on the following case: there are two tables, test-01 and test-02, each with two column families, cf1 and cf2, deployed on 3 regionservers, as shown in the following table.
+	     <table>
+		 <tgroup cols='3' align='center' colsep='1' rowsep='1'><colspec colname='regionserver' align='center'/><colspec colname='test-01' align='center'/><colspec colname='test-02' align='center'/>
+         <thead>
+         <row><entry>RegionServer</entry><entry>test-01</entry><entry>test-02</entry></row>
+	     </thead><tbody>
+          <row><entry>rs1</entry><entry>r1</entry>          <entry>r2</entry></row>
+          <row><entry>rs2</entry><entry>r2</entry>          <entry></entry></row>
+          <row><entry>rs3</entry><entry>r2</entry>          <entry>r1</entry></row>
+		 </tbody></tgroup></table>
+The following examples are based on this scenario.
+</para>
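+<para>As a trivial illustration (a sketch only; the alert command and address are placeholders, not part of the tool),
+a cron-driven wrapper could alert on a non-zero canary exit code like so:
+<programlisting>#!/usr/bin/env bash
+# run a single canary check with a 60 second timeout, then alert if it failed
+${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 60000
+rc=$?
+if [ $rc -ne 0 ]; then
+  echo "HBase canary check failed with exit code $rc" | mail -s "HBase canary alert" admin@example.com
+fi</programlisting>
+</para>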
+<section><title>Canary test for every column family (store) of every region of every table</title>
+<para>
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary</programlisting>
+The output log is...
+<programlisting>13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf1 in 2ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,,1386230156732.0e3c7d77ffb6361ea1b996ac1042ca9a. column family cf2 in 2ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,0004883,1386230156732.87b55e03dfeade00f441125159f8ca87. column family cf1 in 4ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-01,0004883,1386230156732.87b55e03dfeade00f441125159f8ca87. column family cf2 in 1ms
+...
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,,1386559511167.aa2951a86289281beee480f107bb36ee. column family cf1 in 5ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,,1386559511167.aa2951a86289281beee480f107bb36ee. column family cf2 in 3ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,0004883,1386559511167.cbda32d5e2e276520712d84eaaa29d84. column family cf1 in 31ms
+13/12/09 03:26:32 INFO tool.Canary: read from region test-02,0004883,1386559511167.cbda32d5e2e276520712d84eaaa29d84. column family cf2 in 8ms
+</programlisting>
+As you can see, table test-01 has two regions and two column families, so the Canary tool will pick 4 small pieces of data from 4 (2 regions * 2 stores) different stores. This is the default behavior of the tool.
+</para>
+    </section>
+
+<section><title>Canary test for every column family (store) of every region of specific table(s)</title>
+<para>
+You can also test one or more specific tables.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary test-01 test-02</programlisting>
+</para>
+    </section>
+
+<section><title>Canary test with regionserver granularity</title>
+<para>
+This will pick one small piece of data from each regionserver; you can also pass regionserver names as arguments to canary-test specific regionservers.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver</programlisting>
+The output log is...
+<programlisting>13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs2 in 72ms
+13/12/09 06:05:17 INFO tool.Canary: Read from table:test-02 on region server:rs3 in 34ms
+13/12/09 06:05:17 INFO tool.Canary: Read from table:test-01 on region server:rs1 in 56ms</programlisting>
+</para>
+    </section>
+<section><title>Canary test with regular expression pattern</title>
+<para>
+This will test both tables, test-01 and test-02.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -e test-0[1-2]</programlisting>
+</para>
+    </section>
+
+<section><title>Run the canary test in daemon mode</title>
+<para>
+This runs the check repeatedly at the interval defined by the -interval option, whose default value is 6 seconds. The daemon stops itself and returns a non-zero error code if any error occurs, because the default value of the -f option is true.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon</programlisting>
+To run repeatedly at a given interval without stopping when errors occur in the test:
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -daemon -interval 50000 -f false</programlisting>
+</para>
+    </section>
+
+<section><title>Force a timeout if the canary test gets stuck</title>
+<para>In some cases a request gets stuck on a regionserver and never responds to the client, yet the regionserver in trouble is not marked as dead by the Master, which leaves clients hanging. The timeout option kills the canary test forcefully in such cases and likewise returns a non-zero error code.
+This run sets the timeout value to 60 seconds (60000 milliseconds); the default is 600000 milliseconds.
+<programlisting>$ ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.tool.Canary -t 60000</programlisting>
+</para>
+    </section>
+
+    </section>
+
     <section xml:id="health.check"><title>Health Checker</title>
         <para>You can configure HBase to run a script on a period and if it fails N times (configurable), have the server exit.
             See <link xlink:ref="">HBASE-7351 Periodic health check script</link> for configurations and detail.
         </para>
     </section>
+
     <section xml:id="driver"><title>Driver</title>
       <para>There is a <code>Driver</code> class that is executed by the HBase jar can be used to invoke frequently accessed utilities.  For example,
 <programlisting>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar
@@ -142,6 +235,10 @@
         <note><title>Scanner Caching</title>
         <para>Caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
         </para>
+	</note>
+	<note><title>Versions</title>
+        <para>By default, the CopyTable utility copies only the latest version of row cells unless <code>--versions=n</code> is explicitly specified in the command.
+        </para>
         </note>
         <para>
         See Jonathan Hsieh's <link xlink:href="http://www.cloudera.com/blog/2012/06/online-hbase-backups-with-copytable-2/">Online HBase Backups with CopyTable</link> blog post for more on <command>CopyTable</command>.
@@ -162,6 +259,10 @@
 <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.Import &lt;tablename&gt; &lt;inputdir&gt;
 </programlisting>
        </para>
+       <para>To import files exported from an 0.94 cluster into a 0.96 or later cluster, you need to set the system property "hbase.import.version" when running the import command, as below:
+<programlisting>$ bin/hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import &lt;tablename&gt; &lt;inputdir&gt;
+</programlisting>
+       </para>
     </section>
     <section xml:id="importtsv">
        <title>ImportTsv</title>
@@ -920,37 +1021,73 @@
     </section>
   </section>  <!--  snapshots -->
 
-  <section xml:id="ops.capacity"><title>Capacity Planning</title>
-    <section xml:id="ops.capacity.storage"><title>Storage</title>
-      <para>A common question for HBase administrators is estimating how much storage will be required for an HBase cluster.
-      There are several apsects to consider, the most important of which is what data load into the cluster.  Start
-      with a solid understanding of how HBase handles data internally (KeyValue).
-      </para>
-      <section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
-        <para>HBase storage will be dominated by KeyValues.  See <xref linkend="keyvalue" /> and <xref linkend="keysize" /> for
-        how HBase stores data internally.
-        </para>
-        <para>It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the
-        rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other
-        factor.
-        </para>
-      </section>
-      <section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
-        <para>KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
-        Blocks are aggregated into StoreFile's.  See <xref linkend="regions.arch" />.
-        </para>
-      </section>
-      <section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
-        <para>Because HBase runs on top of HDFS, factor in HDFS block replication into storage calculations.
-        </para>
-      </section>
-    </section>
-    <section xml:id="ops.capacity.regions"><title>Regions</title>
-      <para>Another common question for HBase administrators is determining the right number of regions per
-      RegionServer.  This affects both storage and hardware planning. See <xref linkend="perf.number.of.regions" />.
-      </para>
-    </section>
-  </section>
+  <section xml:id="ops.capacity"><title>Capacity Planning and Region Sizing</title>
+    <para>There are several considerations when planning the capacity for an HBase cluster and performing the initial configuration. Start with a solid understanding of how HBase handles data internally.</para>
+    <section xml:id="ops.capacity.nodes"><title>Node count and hardware/VM configuration</title>
+      <section xml:id="ops.capacity.nodes.datasize"><title>Physical data size</title>
+<para>Physical data size on disk is distinct from logical size of your data and is affected by the following:
+<itemizedlist>
+<listitem>Increased by HBase overhead
+<itemizedlist>
+<listitem>See <xref linkend="keyvalue" /> and <xref linkend="keysize" />. At least 24 bytes per key-value (cell), can be more. Small keys/values means more relative overhead.</listitem>
+<listitem>KeyValue instances are aggregated into blocks, which are indexed. Indexes also have to be stored. Blocksize is configurable on a per-ColumnFamily basis. See <xref linkend="regions.arch" />.</listitem>
+</itemizedlist></listitem>
+<listitem>Decreased by <xref linkend="compression" xrefstyle="template:compression" /> and data block encoding, depending on data. See also <ulink url="http://search-hadoop.com/m/lL12B1PFVhp1">this thread</ulink>. You might want to test what compression and encoding (if any) make sense for your data.</listitem>
+<listitem>Increased by size of region server <xref linkend="wal" xrefstyle="template:WAL" /> (usually fixed and negligible - less than half of RS memory size, per RS).</listitem>
+<listitem>Increased by HDFS replication - usually x3.</listitem>
+</itemizedlist></para>
+<para>Aside from the disk space necessary to store the data, one RS may not be able to serve arbitrarily large amounts of data due to some practical limits on region count and size (see <xref linkend="ops.capacity.regions" xrefstyle="template:below" />).</para>
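+<para>As a small illustrative calculation (all inputs are assumptions, not recommendations): one billion cells with
+~40 byte keys, ~100 byte values, ~24 bytes of KeyValue overhead each, 2x compression, and 3x HDFS replication:
+<programlisting># logical cell size = key + value + ~24 bytes of KeyValue overhead; then apply compression and replication
+$ echo $(( 1000000000 * (40 + 100 + 24) / 2 * 3 / 1024 / 1024 / 1024 ))GB
+229GB</programlisting>
+</para>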
+      </section> <!-- ops.capacity.nodes.datasize -->
+      <section xml:id="ops.capacity.nodes.throughput"><title>Read/Write throughput</title>
+<para>The number of nodes can also be driven by required throughput for reads and/or writes. The throughput one can get per node depends a lot on data (esp. key/value sizes) and request patterns, as well as node and system configuration. Planning should be done for peak load if it is likely that the load would be the main driver of the increase of the node count. PerformanceEvaluation and <xref linkend="ycsb" xrefstyle="template:YCSB" /> tools can be used to test a single node or a test cluster.</para>
+<para>For writes, usually 5-15Mb/s per RS can be expected, since every region server has only one active WAL. There's no good estimate for reads, as it depends vastly on data, requests, and cache hit rate. <xref linkend="perf.casestudy" /> might be helpful.</para>
+      </section> <!-- ops.capacity.nodes.throughput -->
+      <section xml:id="ops.capacity.nodes.gc"><title>JVM GC limitations</title>
+<para>An RS cannot currently utilize a very large heap due to the cost of GC. There's also no good way of running multiple RSes per server (other than running several VMs per machine). Thus, ~20-24Gb or less memory dedicated to one RS is recommended. GC tuning is required for large heap sizes. See <xref linkend="gcpause" />, <xref linkend="trouble.log.gc" /> and elsewhere (TODO: where?)</para>
+      </section> <!-- ops.capacity.nodes.gc -->
+    </section> <!-- ops.capacity.nodes -->
+    <section xml:id="ops.capacity.regions"><title>Determining region count and size</title>
+<para>Generally, fewer regions make for a smoother-running cluster (you can always manually split the big regions later, if necessary, to spread the data or request load over the cluster); 20-200 regions per RS is a reasonable range. The number of regions cannot be configured directly (unless you go for fully <xref linkend="disable.splitting" xrefstyle="template:manual splitting" />); adjust the region size to achieve the target region count given the table size.</para>
+<para>When configuring regions for multiple tables, note that most region settings can be set on a per-table basis via <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>, as well as shell commands. These settings will override the ones in <varname>hbase-site.xml</varname>. That is useful if your tables have different workloads/use cases.</para>
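+<para>For illustration only, here is a minimal sketch of overriding region size and memstore flush size for a single table at creation time (the table name, column family, and sizes are made up, and <code>admin</code> is assumed to be an <classname>HBaseAdmin</classname> instance):</para>
+<programlisting><![CDATA[
+  // Per-table overrides travel with the table descriptor and win over hbase-site.xml.
+  HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("example_table"));
+  desc.addFamily(new HColumnDescriptor("fam"));
+  desc.setMaxFileSize(10L * 1024 * 1024 * 1024);    // ~10Gb regions for this table only
+  desc.setMemStoreFlushSize(256L * 1024 * 1024);    // 256Mb memstore flush size
+  admin.createTable(desc);
+]]></programlisting>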
+<para>Also note that in the discussion of region sizes here, <emphasis role="bold">HDFS replication factor is not (and should not be) taken into account, whereas other factors <xref linkend="ops.capacity.nodes.datasize" xrefstyle="template:above" /> should be.</emphasis> So, if your data is compressed and replicated 3 ways by HDFS, "9 Gb region" means 9 Gb of compressed data. HDFS replication factor only affects your disk usage and is invisible to most HBase code.</para>
+      <section xml:id="ops.capacity.regions.count"><title>Number of regions per RS - upper bound</title>
+<para>In production scenarios, where you have a lot of data, you are normally concerned with the maximum number of regions you can have per server. <xref linkend="too_many_regions" /> has a technical discussion of the subject; in short, the maximum number of regions is mostly determined by memstore memory usage. Each region has its own memstores; these grow up to a configurable size, usually in the 128-256Mb range; see <xref linkend="hbase.hregion.memstore.flush.size" />. There is one memstore per column family (so there is only one per region if there is one CF in the table). The RS dedicates some fraction of total memory (see <xref linkend="hbase.regionserver.global.memstore.size" />) to region memstores. If this memory is exceeded (too much memstore usage), undesirable consequences such as an unresponsive server or, later, compaction storms can result. Thus, a good starting point for the number of regions per RS (assuming one table) is <programlisting>(RS memory)*(total memstore fraction)/((memstore size)*(# column families))</programlisting>
+For example, if an RS has 16Gb of RAM, then with default settings 16384*0.4/128 ~ 51 regions per RS is a starting point. The formula can be extended to multiple tables; if they all have the same configuration, just use the total number of families.</para>
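+<para>As a purely illustrative sketch, the same back-of-the-envelope arithmetic expressed in code (the values mirror the example above and are not recommendations):</para>
+<programlisting><![CDATA[
+  double rsHeapMb = 16 * 1024;          // 16Gb region server heap
+  double memstoreFraction = 0.4;        // hbase.regionserver.global.memstore.size
+  double flushSizeMb = 128;             // hbase.hregion.memstore.flush.size
+  int columnFamilies = 1;               // one table with a single CF
+  double startingRegions = rsHeapMb * memstoreFraction / (flushSizeMb * columnFamilies);
+  // => ~51 regions per RS as a starting point
+]]></programlisting>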
+<para>This number can be adjusted; the formula above assumes all your regions are filled at approximately the same rate. If only a fraction of your regions are going to be actively written to, you can divide the result by that fraction to get a larger region count. Even if all regions are written to, the region memstores are not filled evenly, and even if they were, jitter would eventually appear (due to the limited number of concurrent flushes). Thus, one can have as many as 2-3 times more regions than the starting point; however, increased numbers carry increased risk.</para>
+<para>For write-heavy workloads, the memstore fraction can be increased in configuration at the expense of the block cache; this will also allow one to have more regions.</para>
+      </section> <!-- ops.capacity.regions.count -->
+      <section xml:id="ops.capacity.regions.mincount"><title>Number of regions per RS - lower bound</title>
+<para>HBase scales by having regions across many servers. Thus if you have 2 regions for 16GB of data on a 20 node cluster, your data will be concentrated on just a few machines - nearly the entire cluster will be idle. This really can't be stressed enough, since a common problem is loading 200MB of data into HBase and then wondering why your awesome 10 node cluster isn't doing anything.</para>
+<para>On the other hand, if you have a very large amount of data, you may also want to go for a larger number of regions to avoid having regions that are too large.</para>
+      </section> <!-- ops.capacity.regions.mincount -->
+      <section xml:id="ops.capacity.regions.size"><title>Maximum region size</title>
+<para>For large tables in production scenarios, maximum region size is mostly limited by compactions - very large compactions, especially major ones, can degrade cluster performance. Currently, the recommended maximum region size is 10-20Gb, and 5-10Gb is optimal. For the older 0.90.x codebase, the upper bound of region size is about 4Gb, with a default of 256Mb.</para>
+<para>The size at which the region is split into two is generally configured via <xref linkend="hbase.hregion.max.filesize" />; for details, see <xref linkend="arch.region.splits" />.</para>
+<para>When starting off, if you cannot estimate the size of your tables well, it's probably best to stick to the default region size, perhaps going smaller for hot tables (or manually splitting hot regions to spread the load over the cluster), or going with larger region sizes if your cell sizes tend to be largish (100k and up).</para>
+<para>In HBase 0.98, an experimental stripe compactions feature was added that allows for larger regions, especially for log data. See <xref linkend="ops.stripe" />.</para>
+      </section> <!-- ops.capacity.regions.size -->
+      <section xml:id="ops.capacity.regions.total"><title>Total data size per region server</title>
+<para>Based on the above numbers for region size and number of regions per region server, an optimistic estimate of 10 GB x 100 regions per RS gives up to 1TB served per region server, which is in line with some of the reported multi-PB use cases. However, it is important to think about the ratio of data size to cache size at the RS level. With 1TB of data per server and a 10 GB block cache, only 1% of the data will be cached, which may barely cover all block indices.</para>
+      </section> <!-- ops.capacity.regions.total -->
+    </section> <!-- ops.capacity.regions -->
+    <section xml:id="ops.capacity.config"><title>Initial configuration and tuning</title>
+<para>First, see <xref linkend="important_configurations" />. Note that some configurations, more than others, depend on specific scenarios. Pay special attention to 
+<itemizedlist>
+<listitem><xref linkend="hbase.regionserver.handler.count" /> - request handler thread count, vital for high-throughput workloads.</listitem>
+<listitem><xref linkend="config.wals" /> - the blocking number of WAL files depends on your memstore configuration and should be set accordingly to prevent potential blocking when doing high volume of writes.</listitem>
+</itemizedlist></para>
+<para>Then, there are some considerations when setting up your cluster and tables.</para>
+      <section xml:id="ops.capacity.config.compactions"><title>Compactions</title>
+<para>Depending on read/write volume and latency requirements, optimal compaction settings may be different. See <xref linkend="compaction" /> for some details.</para>
+<para>When provisioning for large data sizes, however, it's good to keep in mind that compactions can affect write throughput. Thus, for write-intensive workloads, you may opt for less frequent compactions and more store files per region. The minimum number of files for compactions (<varname>hbase.hstore.compaction.min</varname>) can be set to a higher value; <xref linkend="hbase.hstore.blockingStoreFiles" /> should also be increased, as more files might accumulate in that case. You may also consider manually managing compactions; see <xref linkend="managed.compactions" />.</para>
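+<para>As one possible sketch of applying such overrides per table (the property values are hypothetical, this assumes per-table overrides via <classname>HTableDescriptor#setConfiguration</classname> are available in your HBase version, and <code>admin</code> is an <classname>HBaseAdmin</classname> instance):</para>
+<programlisting><![CDATA[
+  HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("write_heavy_table"));
+  desc.addFamily(new HColumnDescriptor("fam"));
+  desc.setConfiguration("hbase.hstore.compaction.min", "6");        // compact less often
+  desc.setConfiguration("hbase.hstore.blockingStoreFiles", "20");   // tolerate more store files
+  admin.createTable(desc);
+]]></programlisting>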
+      </section> <!-- ops.capacity.config.compactions -->
+      <section xml:id="ops.capacity.config.presplit"><title>Pre-splitting the table</title>
+<para>Based on the target number of regions per RS (see <xref linkend="ops.capacity.regions.count" xrefstyle="template:above" />) and the number of RSes, one can pre-split the table at creation time. This both avoids some costly splitting as the table starts to fill up and ensures that the table starts out already distributed across many servers.</para>
+<para>If the table is expected to grow large enough to justify that, at least one region per RS should be created. It is not recommended to split immediately into the full target number of regions (e.g. 50 * number of RSes), but a low intermediate value can be chosen. For multiple tables, it is recommended to be conservative with presplitting (e.g. pre-split 1 region per RS at most), especially if you don't know how much each table will grow. If you split too much, you may end up with too many regions, with some tables having too many small regions.</para>
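+<para>As a hypothetical sketch (the table name, key range, and region count are made up, and <code>admin</code> is assumed to be an <classname>HBaseAdmin</classname> instance), a table can be pre-split at creation time:</para>
+<programlisting><![CDATA[
+  // Create the table pre-split into 20 regions, with split points spread
+  // evenly across the given start/end key range.
+  HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("presplit_table"));
+  desc.addFamily(new HColumnDescriptor("fam"));
+  admin.createTable(desc, Bytes.toBytes("0000000000"), Bytes.toBytes("9999999999"), 20);
+]]></programlisting>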
+<para>For a pre-splitting how-to, see <xref linkend="precreate.regions" />.</para>
+      </section> <!-- ops.capacity.config.presplit -->
+    </section> <!-- ops.capacity.config -->
+  </section> <!-- ops.capacity -->
   <section xml:id="table.rename"><title>Table Rename</title>
       <para>In versions 0.90.x of hbase and earlier, we had a simple script that would rename the hdfs
           table directory and then do an edit of the .META. table replacing all mentions of the old
diff --git a/src/main/docbkx/performance.xml b/src/main/docbkx/performance.xml
index 50de8aa1..a282aca 100644
--- a/src/main/docbkx/performance.xml
+++ b/src/main/docbkx/performance.xml
@@ -156,15 +156,6 @@
 
     <para>See <xref linkend="recommended_configurations" />.</para>
 
-
-    <section xml:id="perf.number.of.regions">
-      <title>Number of Regions</title>
-
-      <para>The number of regions for an HBase table is driven by the <xref
-              linkend="bigger.regions" />. Also, see the architecture
-          section on <xref linkend="arch.regions.size" /></para>
-    </section>
-
     <section xml:id="perf.compactions.and.splits">
       <title>Managing Compactions</title>
 
@@ -184,15 +175,15 @@
         A memory setting for the RegionServer process.
         </para>
     </section>
-    <section xml:id="perf.rs.memstore.upperlimit">
-        <title><varname>hbase.regionserver.global.memstore.upperLimit</varname></title>
-        <para>See <xref linkend="hbase.regionserver.global.memstore.upperLimit"/>.
+    <section xml:id="perf.rs.memstore.size">
+        <title><varname>hbase.regionserver.global.memstore.size</varname></title>
+        <para>See <xref linkend="hbase.regionserver.global.memstore.size"/>.
         This memory setting is often adjusted for the RegionServer process depending on needs.
         </para>
     </section>
-    <section xml:id="perf.rs.memstore.lowerlimit">
-        <title><varname>hbase.regionserver.global.memstore.lowerLimit</varname></title>
-        <para>See <xref linkend="hbase.regionserver.global.memstore.lowerLimit"/>.
+    <section xml:id="perf.rs.memstore.size.lower.limit">
+        <title><varname>hbase.regionserver.global.memstore.size.lower.limit</varname></title>
+        <para>See <xref linkend="hbase.regionserver.global.memstore.size.lower.limit"/>.
         This memory setting is often adjusted for the RegionServer process depending on needs.
         </para>
     </section>
@@ -248,7 +239,7 @@
     <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link> in the
     event where certain tables require different regionsizes than the configured default regionsize.
     </para>
-    <para>See <xref linkend="perf.number.of.regions"/> for more information.
+    <para>See <xref linkend="ops.capacity.regions"/> for more information.
     </para>
     </section>
     <section xml:id="schema.bloom">
@@ -259,7 +250,7 @@
         <varname>NONE</varname> for no bloom filters. If
         <varname>ROW</varname>, the hash of the row will be added to the bloom
         on each insert. If <varname>ROWCOL</varname>, the hash of the row +
-        column family + column family qualifier will be added to the bloom on
+        column family name + column family qualifier will be added to the bloom on
         each key insert.</para>
     <para>See <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html">HColumnDescriptor</link> and
     <xref linkend="blooms"/> for more information or this answer up in quora,
@@ -442,7 +433,9 @@
 
   <section xml:id="perf.reading">
     <title>Reading from HBase</title>
-
+    <para>The mailing list can help if you are having performance issues.
+    For example, here is a good general thread on what to look at when addressing
+    read-time issues: <link xlink:href="http://search-hadoop.com/m/qOo2yyHtCC1">HBase Random Read latency > 100ms</link>.</para>
     <section xml:id="perf.hbase.client.caching">
       <title>Scan Caching</title>
 
@@ -552,7 +545,7 @@
      <section xml:id="blooms">
      <title>Bloom Filters</title>
          <para>Enabling Bloom Filters can save your having to go to disk and
-         can help improve read latencys.</para>
+         can help improve read latencies.</para>
          <para><link xlink:href="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</link> were developed over in <link
     xlink:href="https://issues.apache.org/jira/browse/HBASE-1200">HBase-1200
     Add bloomfilters</link>.<footnote>
diff --git a/src/main/docbkx/preface.xml b/src/main/docbkx/preface.xml
index 5308037..7d05abe 100644
--- a/src/main/docbkx/preface.xml
+++ b/src/main/docbkx/preface.xml
@@ -48,7 +48,7 @@
   xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>.</para>
 
   <note xml:id="headsup">
-      <title>Heads-up</title>
+      <title>Heads-up if this is your first foray into the world of distributed computing...</title>
       <para>
           If this is your first foray into the wonderful world of
           Distributed Computing, then you are in for
@@ -65,6 +65,7 @@
           computing has been bound to a single box.  Here is one good
           starting point:
           <link xlink:href="http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing">Fallacies of Distributed Computing</link>.
+          That said, you are welcome.  It's a fun place to be.  Yours, the HBase Community.
       </para>
   </note>
 </preface>
diff --git a/src/main/docbkx/schema_design.xml b/src/main/docbkx/schema_design.xml
index 765a8f7..3c41ab0 100644
--- a/src/main/docbkx/schema_design.xml
+++ b/src/main/docbkx/schema_design.xml
@@ -753,7 +753,7 @@
 	  </para>
 	  <section xml:id="schema.smackdown.rowsversions"><title>Rows vs. Versions</title>
 	    <para>A common question is whether one should prefer rows or HBase's built-in-versioning.  The context is typically where there are
-	    "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 3 max versions).  The
+	    "a lot" of versions of a row to be retained (e.g., where it is significantly above the HBase default of 1 max version).  The
 	    rows-approach would require storing a timstamp in some portion of the rowkey so that they would not overwite with each successive update.
 	    </para>
 	    <para>Preference:  Rows (generally speaking).
diff --git a/src/main/docbkx/security.xml b/src/main/docbkx/security.xml
index f54f227..633bdcd 100644
--- a/src/main/docbkx/security.xml
+++ b/src/main/docbkx/security.xml
@@ -241,6 +241,66 @@
     </para>
    </section>
 
+    <section><title>REST Gateway Impersonation Configuration</title>
+    <para>
+        By default, the REST gateway doesn't support impersonation. It accesses
+        HBase on behalf of clients as the user configured in the previous
+        section. To the HBase server, all requests are from the REST gateway user,
+        and the actual users are unknown. You can turn on impersonation support.
+        With impersonation, the REST gateway user is a proxy user. The HBase server
+        knows the actual/real user of each request, so it can apply proper
+        authorizations.
+    </para>
+    <para>
+        To turn on REST gateway impersonation, you need to configure the HBase servers
+        (masters and region servers) to allow proxy users, and configure the REST gateway
+        to enable impersonation.
+    </para>
+    <para>
+        To allow proxy users, add the following to the <code>hbase-site.xml</code>
+        file for every HBase server:
+    <programlisting><![CDATA[
+   <property>
+      <name>hadoop.security.authorization</name>
+      <value>true</value>
+   </property>
+   <property>
+      <name>hadoop.proxyuser.$USER.groups</name>
+      <value>$GROUPS</value>
+   </property>
+   <property>
+      <name>hadoop.proxyuser.$USER.hosts</name>
+      <value>$HOSTS</value>
+   </property>
+    ]]></programlisting>
+    </para>
+    <para>
+        Substitute the REST gateway proxy user for $USER, the allowed
+        group list for $GROUPS, and the allowed host list for $HOSTS.
+    </para>
+    <para>
+        To enable REST gateway impersonation, add the following to the
+        <code>hbase-site.xml</code> file for every REST gateway.
+    <programlisting><![CDATA[
+   <property>
+      <name>hbase.rest.authentication.type</name>
+      <value>kerberos</value>
+   </property>
+   <property>
+      <name>hbase.rest.authentication.kerberos.principal</name>
+      <value>HTTP/_HOST@HADOOP.LOCALDOMAIN</value>
+   </property>
+   <property>
+      <name>hbase.rest.authentication.kerberos.keytab</name>
+      <value>$KEYTAB</value>
+   </property>
+    ]]></programlisting>
+    </para>
+    <para>
+        Substitute the keytab for HTTP for $KEYTAB.
+    </para>
+   </section>
+
 </section>  <!-- Secure Client Access to HBase -->
 
 <section xml:id="hbase.secure.simpleconfiguration">
@@ -378,7 +438,68 @@
 
 </section>  <!-- Simple User Access to Apache HBase -->
 
-
+<section xml:id="hbase.tags">
+<title>Tags</title>
+<para>
+	Every cell can have metadata associated with it.  Adding metadata in the data part of every cell would make things difficult.
+</para>
+<para>
+	The 0.98 version of HBase solves this problem by providing Tags along with the cell format.
+	Some of the use cases that use tags are visibility labels, cell-level ACLs, etc.
+</para>
+<para>
+	HFile V3, available from 0.98 onwards, supports tags, and this feature can be turned on using the following configuration:
+</para>
+<programlisting><![CDATA[
+      <property>
+	    <name>hfile.format.version</name>
+        <value>3</value>
+      </property>
+    ]]></programlisting>
+<para>
+	Every cell can have zero or more tags. Every tag has a type and the actual tag byte array.
+	The types <command>0-31</command> are reserved for system tags.  For example, '1' is reserved for ACL and '2' is reserved for visibility tags.
+</para>
+<para>
+	Just as rowkeys, column families, qualifiers and values can be encoded using different encoding algorithms, tags can also be encoded.
+	Tag encoding can be turned on or off per CF, and it is ON by default.
+	To turn tag encoding on or off for the HFiles of a CF, use
+</para>
+<programlisting><![CDATA[
+    HColumnDescriptor#setCompressTags(boolean compressTags)
+    ]]></programlisting>
+<para>
+	Note that encoding of tags takes place only if the DataBlockEncoder is enabled for the CF.
+</para>
+<para>
+	Just as WAL entries are compressed using a dictionary, the tags present in the WAL can also be dictionary-compressed.
+	Every tag is compressed individually using the WAL dictionary.  To turn on tag compression in the WAL, enable the following property:
+</para>
+<programlisting><![CDATA[
+    <property>
+    	<name>hbase.regionserver.wal.tags.enablecompression</name>
+    	<value>true</value>
+	</property>
+    ]]></programlisting>
+<para>
+	To add tags to cells during Puts, the following APIs are provided:
+</para>
+<programlisting><![CDATA[
+	Put#add(byte[] family, byte [] qualifier, byte [] value, Tag[] tag)
+	Put#add(byte[] family, byte[] qualifier, long ts, byte[] value, Tag[] tag)
+    ]]></programlisting>
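+<para>
+	For example, a hypothetical custom tag might be attached to a Put as follows (the tag type 65 is an arbitrary value outside the reserved 0-31 range, and <code>table</code> is assumed to be an HTable instance):
+</para>
+<programlisting><![CDATA[
+	Tag[] tags = new Tag[] { new Tag((byte) 65, Bytes.toBytes("my-tag-data")) };
+	Put put = new Put(Bytes.toBytes("row1"));
+	put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual"), Bytes.toBytes("value"), tags);
+	table.put(put);
+    ]]></programlisting>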
+<para>
+	Some of the features developed using tags are cell-level ACLs and visibility labels.
+	These features build on the tag framework and allow users to apply finer-grained security at the cell level.
+</para>
+<para>
+	For details, see:
+</para>
+<para>
+    <link linkend='hbase.accesscontrol.configuration'>Access Control</link>
+    <link linkend='hbase.visibility.labels'>Visibility labels</link>
+</para>
+</section>
 <section xml:id="hbase.accesscontrol.configuration">
     <title>Access Control</title>
     <para>
@@ -423,7 +544,7 @@
     </para>
     <orderedlist>
       <listitem>
-        <para>Row-level or per value (cell): This would require broader changes for storing the ACLs inline with rows. It is a future goal.</para>
+        <para>Row-level or per value (cell): Using Tags in HFile V3</para>
       </listitem>
       <listitem>
         <para>Push down of file ownership to HDFS: HBase is not designed for the case where files may have different permissions than the HBase system principal. Pushing file ownership down into HDFS would necessitate changes to core code. Also, while HDFS file ownership would make applying quotas easy, and possibly make bulk imports more straightforward, it is not clear that it would offer a more secure setup.</para>
@@ -609,6 +730,47 @@
     ]]></programlisting>
     </section>
 
+    <section>
+    <title>Cell level Access Control using Tags</title>
+    <para>
+    	Prior to HBase 0.98, access control was restricted to the table and column family level.  The tags feature in 0.98 makes access control at the cell level possible.
+		The existing Access Controller coprocessor also handles cell-level access control.
+		For details on configuring it, refer to the <link linkend='hbase.accesscontrol.configuration'>Access Control</link> section.
+    </para>
+    <para>
+    	ACLs can be specified for every mutation using the following APIs:
+    </para>
+    <programlisting><![CDATA[
+    	Mutation.setACL(String user, Permission perms)
+	  	Mutation.setACL(Map<String, Permission> perms)
+    ]]></programlisting>
+    <para>
+    	For example, to grant read permission to the user 'user1':
+    </para>
+    <programlisting><![CDATA[
+    	put.setACL("user1", new Permission(Permission.Action.READ))
+    ]]></programlisting>
+    <para>
+    	Generally, the ACL applied on the table and CF takes precedence over the cell-level ACL.  To make the cell-level ACL take precedence instead, use the following API:
+    </para>
+    <programlisting><![CDATA[
+    	Mutation.setACLStrategy(boolean cellFirstStrategy)
+    ]]></programlisting>
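+    <para>
+    	Putting the above APIs together, a hypothetical Put carrying a cell-level ACL that takes precedence over the table/CF ACL might look like this (<code>table</code> is assumed to be an HTable instance):
+    </para>
+    <programlisting><![CDATA[
+    	Put put = new Put(Bytes.toBytes("row1"));
+    	put.add(Bytes.toBytes("fam"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
+    	put.setACL("user1", new Permission(Permission.Action.READ));
+    	put.setACLStrategy(true);   // cell-first strategy, as described above
+    	table.put(put);
+    ]]></programlisting>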
+    <para>
+    	Please note that in order to use this feature, HFile V3 should be turned on:
+    </para>
+    <programlisting><![CDATA[
+   		<property>
+			<name>hfile.format.version</name>
+			<value>3</value>
+		</property>
+     ]]></programlisting>
+    <para>
+    	Note that ACLs attached to Deletes do not have any effect.
+		To keep things simple, the ACL applied on the current Put does not change the ACL of any previous Put, in the sense
+		that the ACL on the current Put does not affect older versions for the same row.
+    </para>
+    </section>
     <section><title>Shell Enhancements for Access Control</title>
     <para>
 The HBase shell has been extended to provide simple commands for editing and updating user permissions. The following commands have been added for access control list management:
@@ -616,10 +778,13 @@
     Grant
     <para>
     <programlisting>
-    grant &lt;user&gt; &lt;permissions&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
+    grant &lt;user|@group&gt; &lt;permissions&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
     </programlisting>
     </para>
     <para>
+    <code class="code">&lt;user|@group&gt;</code> is a user or a group (groups start with the character '@'). Groups are created and manipulated via the Hadoop group mapping service.
+    </para>
+    <para>
     <code>&lt;permissions&gt;</code> is zero or more letters from the set "RWCA": READ('R'), WRITE('W'), CREATE('C'), ADMIN('A').
     </para>
     <para>
@@ -630,7 +795,7 @@
     </para>
     <para>
     <programlisting>
-    revoke &lt;user&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
+    revoke &lt;user|@group&gt; [ &lt;table&gt; [ &lt;column family&gt; [ &lt;column qualifier&gt; ] ] ]
     </programlisting>
     </para>
     <para>
@@ -639,7 +804,7 @@
     <para>
     The <code>alter</code> command has been extended to allow ownership assignment:
     <programlisting>
-      alter 'tablename', {OWNER => 'username'}
+      alter 'tablename', {OWNER => 'username|@group'}
     </programlisting>
     </para>
     <para>
diff --git a/src/main/docbkx/tracing.xml b/src/main/docbkx/tracing.xml
new file mode 100644
index 0000000..6614096
--- /dev/null
+++ b/src/main/docbkx/tracing.xml
@@ -0,0 +1,205 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<appendix xml:id="tracing"
+      version="5.0" xmlns="http://docbook.org/ns/docbook"
+      xmlns:xlink="http://www.w3.org/1999/xlink"
+      xmlns:xi="http://www.w3.org/2001/XInclude"
+      xmlns:svg="http://www.w3.org/2000/svg"
+      xmlns:m="http://www.w3.org/1998/Math/MathML"
+      xmlns:html="http://www.w3.org/1999/xhtml"
+      xmlns:db="http://docbook.org/ns/docbook">
+  <!--/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+-->
+  <title>Enabling Dapper-like Tracing in HBase</title>
+
+  <para>
+    <link xlink:href="https://issues.apache.org/jira/browse/HBASE-6449">HBASE-6449</link>
+    added support for tracing requests through HBase, using the open source tracing library,
+    <link xlink:href="http://github.com/cloudera/htrace">HTrace</link>.
+    Setting up tracing is quite simple;
+    however, it currently requires some very minor changes to your client code
+    (it would not be very difficult to remove this requirement).
+  </para>
+
+  <section xml:id="tracing.spanreceivers">
+    <title>SpanReceivers</title>
+    <para>
+      The tracing system works by collecting information in structs called 'Spans'.
+      It is up to you to choose how you want to receive this information
+      by implementing the <classname>SpanReceiver</classname> interface,
+      which defines one method:
+<programlisting><![CDATA[
+  public void receiveSpan(Span span);
+]]></programlisting>
+      This method serves as a callback whenever a span is completed.
+      HTrace allows you to use as many SpanReceivers as you want
+      so you can easily send trace information to multiple destinations.
+    </para>
+
+    <para>
+      Configure which SpanReceivers you'd like to use
+      by putting a comma-separated list of the
+      fully-qualified class names of classes implementing
+      <classname>SpanReceiver</classname> in the <filename>hbase-site.xml</filename>
+      property <varname>hbase.trace.spanreceiver.classes</varname>.
+    </para>
+
+    <para>
+      HTrace includes a <classname>LocalFileSpanReceiver</classname>
+      that writes all span information to local files in a JSON-based format.
+      The <classname>LocalFileSpanReceiver</classname>
+      looks in <filename>hbase-site.xml</filename>
+      for a <varname>hbase.local-file-span-receiver.path</varname>
+      property with a value describing the name of the file
+      to which nodes should write their span information.
+<programlisting><![CDATA[
+  <property>
+    <name>hbase.trace.spanreceiver.classes</name>
+    <value>org.cloudera.htrace.impl.LocalFileSpanReceiver</value>
+  </property>
+  <property>
+    <name>hbase.local-file-span-receiver.path</name>
+    <value>/var/log/hbase/htrace.out</value>
+  </property>
+]]></programlisting>
+    </para>
+
+    <para>
+      HTrace also includes a <classname>ZipkinSpanReceiver</classname>
+      that converts all span information to the
+      <link xlink:href="http://github.com/twitter/zipkin">Zipkin</link>
+      span format and sends it to a Zipkin server.
+      You need to install the htrace-zipkin jar and add it to your HBase classpath
+      in order to use this receiver.
+      The <classname>ZipkinSpanReceiver</classname>
+      looks in <filename>hbase-site.xml</filename>
+      for the <varname>hbase.zipkin.collector-hostname</varname>
+      and <varname>hbase.zipkin.collector-port</varname>
+      properties, whose values describe the Zipkin server
+      to which span information is sent.
+<programlisting><![CDATA[
+  <property>
+    <name>hbase.trace.spanreceiver.classes</name>
+    <value>org.cloudera.htrace.impl.ZipkinSpanReceiver</value>
+  </property> 
+  <property>
+    <name>hbase.zipkin.collector-hostname</name>
+    <value>localhost</value>
+  </property> 
+  <property>
+    <name>hbase.zipkin.collector-port</name>
+    <value>9410</value>
+  </property> 
+]]></programlisting>
+    </para>
+
+    <para>
+      If you do not want to use the included span receivers,
+      you are encouraged to write your own receiver
+      (take a look at <classname>LocalFileSpanReceiver</classname> for an example).
+      If you think others would benefit from your receiver,
+      file a JIRA or send a pull request to
+      <link xlink:href="http://github.com/cloudera/htrace">HTrace</link>.
+    </para>
+  </section>
+
+  <section xml:id="tracing.client.modifications">
+    <title>Client Modifications</title>
+    <para>
+      In order to turn on tracing in your client code,
+      you must initialize the module that sends spans to the receivers
+      once per client process.
+      (Because <classname>SpanReceiverHost</classname> is included in the hbase-server jar,
+      you need it on the client classpath in order to run this example.)
+<programlisting><![CDATA[
+  private SpanReceiverHost spanReceiverHost;
+  
+  ...
+  
+    Configuration conf = HBaseConfiguration.create();
+    SpanReceiverHost spanReceiverHost = SpanReceiverHost.getInstance(conf);
+]]></programlisting>
+      Then you simply start a tracing span before requests you think are interesting,
+      and close it when the request is done.
+      For example, if you wanted to trace all of your get operations,
+      you would change this:
+<programlisting><![CDATA[
+  HTable table = new HTable(conf, "t1");
+  Get get = new Get(Bytes.toBytes("r1"));
+  Result res = table.get(get);
+]]></programlisting>
+      into:
+<programlisting><![CDATA[
+  TraceScope ts = Trace.startSpan("Gets", Sampler.ALWAYS);
+  try {
+    HTable table = new HTable(conf, "t1");
+    Get get = new Get(Bytes.toBytes("r1"));
+    Result res = table.get(get);
+  } finally {
+    ts.close();
+  }
+]]></programlisting>
+      If you wanted to trace half of your 'get' operations, you would pass in:
+<programlisting><![CDATA[
+  new ProbabilitySampler(0.5)
+]]></programlisting>
+      in lieu of <varname>Sampler.ALWAYS</varname>
+      to <classname>Trace.startSpan()</classname>.
+      See the HTrace <filename>README</filename> for more information on Samplers.
+    </para>
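+    <para>
+      For example, combining the two snippets above, the call would then read:
+<programlisting><![CDATA[
+  TraceScope ts = Trace.startSpan("Gets", new ProbabilitySampler(0.5));
+]]></programlisting>
+    </para>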
+  </section>
+
+  <section xml:id="tracing.client.shell">
+    <title>Tracing from HBase Shell</title>
+    <para>
+      You can use the <command>trace</command> command
+      to trace requests from the HBase Shell.
+      The <command>trace 'start'</command> command turns on tracing and the
+      <command>trace 'stop'</command> command turns off tracing.
+<programlisting><![CDATA[
+  hbase(main):001:0> trace 'start'
+  hbase(main):002:0> put 'test', 'row1', 'f:', 'val1'   # traced commands
+  hbase(main):003:0> trace 'stop'
+]]></programlisting>
+    </para>
+    <para>
+      <command>trace 'start'</command> and
+      <command>trace 'stop'</command> always
+      return a boolean value representing
+      whether or not there is ongoing tracing.
+      As a result, <command>trace 'stop'</command>
+      returns false on success.
+      <command>trace 'status'</command>
+      simply returns whether or not tracing is turned on.
+<programlisting><![CDATA[
+  hbase(main):001:0> trace 'start'
+  => true
+  
+  hbase(main):002:0> trace 'status'
+  => true
+  
+  hbase(main):003:0> trace 'stop'
+  => false
+  
+  hbase(main):004:0> trace 'status'
+  => false
+]]></programlisting>
+    </para>
+  </section>
+
+</appendix>
diff --git a/src/main/docbkx/troubleshooting.xml b/src/main/docbkx/troubleshooting.xml
index e722df4..7a9caa4 100644
--- a/src/main/docbkx/troubleshooting.xml
+++ b/src/main/docbkx/troubleshooting.xml
@@ -257,7 +257,8 @@
            <title>Builtin Tools</title>
             <section xml:id="trouble.tools.builtin.webmaster">
               <title>Master Web Interface</title>
-              <para>The Master starts a web-interface on port 60010 by default.
+              <para>The Master starts a web-interface on port 16010 by default.
+              (Up to and including 0.98, this was port 60010.)
               </para>
               <para>The Master web UI lists created tables and their definition (e.g., ColumnFamilies, blocksize, etc.).  Additionally,
               the available RegionServers in the cluster are listed along with selected high-level metrics (requests, number of regions, usedHeap, maxHeap).
@@ -266,7 +267,8 @@
             </section>
             <section xml:id="trouble.tools.builtin.webregion">
               <title>RegionServer Web Interface</title>
-              <para>RegionServers starts a web-interface on port 60030 by default.
+              <para>RegionServers start a web-interface on port 16030 by default.
+              (Up to and including 0.98, this was port 60030.)
               </para>
               <para>The RegionServer web UI lists online regions and their start/end keys, as well as point-in-time RegionServer metrics (requests, regions, storeFileIndexSize, compactionQueueSize, etc.).
               </para>
@@ -1154,4 +1156,29 @@
       </para>
     </section>
 
+    <section xml:id="trouble.crypto">
+      <title>Cryptographic Features</title>
+        <section xml:id="trouble.crypto.HBASE-10132">
+           <title>sun.security.pkcs11.wrapper.PKCS11Exception: CKR_ARGUMENTS_BAD</title>
+<para>This problem manifests as exceptions ultimately caused by:</para>
+<programlisting>
+Caused by: sun.security.pkcs11.wrapper.PKCS11Exception: CKR_ARGUMENTS_BAD
+	at sun.security.pkcs11.wrapper.PKCS11.C_DecryptUpdate(Native Method)
+	at sun.security.pkcs11.P11Cipher.implDoFinal(P11Cipher.java:795)
+</programlisting>
+<para>
+This problem appears to affect some versions of OpenJDK 7 shipped by some Linux vendors. NSS is configured as the default provider. If the host has an x86_64 architecture and the vendor packages contain the defect, the NSS provider will not function correctly.
+</para>
+<para>
+To work around this problem, find the JRE home directory and edit the file <filename>lib/security/java.security</filename>, commenting out the line:
+</para>
+<programlisting>
+security.provider.1=sun.security.pkcs11.SunPKCS11 ${java.home}/lib/security/nss.cfg
+</programlisting>
+<para>
+Then renumber the remaining providers accordingly.
+</para>
+        </section>
+    </section>
+
   </chapter>
diff --git a/src/main/docbkx/upgrading.xml b/src/main/docbkx/upgrading.xml
index 869a44d..ff603f9 100644
--- a/src/main/docbkx/upgrading.xml
+++ b/src/main/docbkx/upgrading.xml
@@ -30,6 +30,10 @@
     <para>You cannot skip major verisons upgrading.  If you are upgrading from
     version 0.90.x to 0.94.x, you must first go from 0.90.x to 0.92.x and then go
     from 0.92.x to 0.94.x.</para>
+    <note><para>It may be possible to skip across versions -- for example, go from
+    0.92.2 straight to 0.98.0 by just following the 0.96.x upgrade instructions --
+    but we have not tried it, so we cannot say whether it works or not.</para>
+    </note>
     <para>
         Review <xref linkend="configuration" />, in particular the section on Hadoop version.
     </para>
@@ -81,6 +85,11 @@
         </section>
 </section>
 
+    <section xml:id="upgrade0.98">
+      <title>Upgrading from 0.96.x to 0.98.x</title>
+      <para>A rolling upgrade from 0.96.x to 0.98.x works.  The two versions are not binary compatible.
+      TODO: List of changes.</para> 
+    </section>
     <section xml:id="upgrade0.96">
       <title>Upgrading from 0.94.x to 0.96.x</title>
       <subtitle>The Singularity</subtitle>
@@ -200,7 +209,9 @@
          </programlisting>
          </para>
          <para>
-             If the output from the execute step looks good, start hbase-0.96.0.
+             If the output from the execute step looks good, stop the zookeeper instance you started
+             to do the upgrade: <programlisting>$ ./hbase/bin/hbase-daemon.sh stop zookeeper</programlisting>
+             Now start up hbase-0.96.0.
          </para>
      </section>
      <section xml:id="096.migration.troubleshooting"><title>Troubleshooting</title>
@@ -333,13 +344,14 @@
 </para>
 </section>
 
-<section><title>Experimental off-heap cache
-</title>
+<section xml:id="slabcache"><title>Experimental off-heap cache: SlabCache</title>
 <para>
 A new cache was contributed to 0.92.0 to act as a solution between using the “on-heap” cache which is the current LRU cache the region servers have and the operating system cache which is out of our control.
-To enable, set “-XX:MaxDirectMemorySize” in hbase-env.sh to the value for maximum direct memory size and specify hbase.offheapcache.percentage in hbase-site.xml with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
-See this blog post for additional information on this new experimental feature: http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/
+To enable <emphasis>SlabCache</emphasis>, as this feature is being called, set "-XX:MaxDirectMemorySize" in hbase-env.sh to the value for the maximum direct memory size and specify
+<property>hbase.offheapcache.percentage</property> in <filename>hbase-site.xml</filename> with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
+See this blog post, <link xlink:href="http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching in Apache HBase: SlabCache</link>, for additional information on this new experimental feature.
 </para>
+<para>This feature has mostly been eclipsed in later HBase versions.  See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7404">HBASE-7404 Bucket Cache: A solution about CMS, Heap Fragment and Big Cache on HBASE</link>, etc.</para>
 </section>
 
 <section><title>Changes in HBase replication
diff --git a/src/main/site/xdoc/export_control.xml b/src/main/site/xdoc/export_control.xml
new file mode 100644
index 0000000..0bd00e1
--- /dev/null
+++ b/src/main/site/xdoc/export_control.xml
@@ -0,0 +1,55 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document xmlns="http://maven.apache.org/XDOC/2.0"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
+  <properties>
+    <title>
+      Export Control
+    </title>
+  </properties>
+  <body>
+  <section name="Export Control">
+<p>
+This distribution uses or includes cryptographic software. The country in
+which you currently reside may have restrictions on the import, possession,
+use, and/or re-export to another country, of encryption software. BEFORE
+using any encryption software, please check your country's laws, regulations
+and policies concerning the import, possession, or use, and re-export of
+encryption software, to see if this is permitted. See the
+<a href="http://www.wassenaar.org/">Wassenaar Arrangement</a> for more
+information.</p>
+<p>
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
+(BIS), has classified this software as Export Commodity Control Number (ECCN) 
+5D002.C.1, which includes information security software using or performing 
+cryptographic functions with asymmetric algorithms. The form and manner of this
+Apache Software Foundation distribution makes it eligible for export under the 
+License Exception ENC Technology Software Unrestricted (TSU) exception (see the
+BIS Export Administration Regulations, Section 740.13) for both object code and
+source code.</p>
+<p>
+Apache HBase uses the built-in java cryptography libraries. See Oracle's
+information regarding
+<a href="http://www.oracle.com/us/products/export/export-regulations-345813.html">Java cryptographic export regulations</a>
+for more details.</p>
+  </section>
+  </body>
+</document>
diff --git a/src/main/site/xdoc/index.xml b/src/main/site/xdoc/index.xml
index aa971f2..b110a95 100644
--- a/src/main/site/xdoc/index.xml
+++ b/src/main/site/xdoc/index.xml
@@ -29,7 +29,7 @@
     <p>
     Use Apache HBase when you need random, realtime read/write access to your Big Data.
     This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
-Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's <a href="http://research.google.com/archive/bigtable.html">Bigtable: A Distributed Storage System for Structured Data</a> by Chang et al.
+Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's <a href="http://research.google.com/archive/bigtable.html">Bigtable: A Distributed Storage System for Structured Data</a> by Chang et al.
  Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
     </p>
     <h4>Features</h4>
@@ -63,17 +63,16 @@
    <p>See the <a href="http://hbase.apache.org/book/architecture.html#arch.overview">Architecture Overview</a>, the <a href="http://hbase.apache.org/book/faq.html">Apache HBase Reference Guide FAQ</a>,
     and the other documentation links on the left!
    </p>
+     <h4>Export Control</h4>
+   <p>The HBase distribution includes cryptographic software. See the export control notice <a href="export_control.html">here</a>.
+   </p>
  </section>
      <section name="News">
-         <p>October 24th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/140759692/">HBase User and <a href="http://www.meetup.com/hackathon/events/144366512/">Developer</a> Meetup at HortonWorks</a>.in Palo Alto</p>
-         <p>September 26, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/135862292/">HBase Meetup at Arista Networks</a>.in San Francisco</p>
-         <p>August 20th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/120534362/">HBase Meetup at Flurry</a>.in San Francisco</p>
-         <p>July 16th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/119929152/">HBase Meetup at Twitter</a>.in San Francisco</p>
-         <p>June 25th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/119154442/">Hadoop Summit Meetup</a>.at San Jose Convention Center</p>
-         <p>June 14th, 2013 <a href="http://kijicon.eventbrite.com/">KijiCon: Building Big Data Apps</a> in San Francisco.</p>
-         <p>June 13th, 2013 <a href="http://www.hbasecon.com/">HBaseCon2013</a> in San Francisco.  Submit an Abstract!</p>
-         <p>June 12th, 2013 <a href="http://www.meetup.com/hackathon/events/123403802/">HBaseConHackAthon</a> at the Cloudera office in San Francisco.</p>
-
+         <p>May 5th, 2014 <a href="http://www.hbasecon.com">HBaseCon2014</a> at the Hilton San Francisco on Union Square</p>
+         <p>March 12th, 2014 <a href="http://www.meetup.com/hbaseusergroup/events/160757912/">HBase Meetup @ Ancestry.com</a> in San Francisco</p>
+         <p>February 10th, 2014 <a href="http://www.meetup.com/hbaseusergroup/events/163139322/">HBase Meetup @ Continuuity</a> in Palo Alto</p>
+         <p>January 30th, 2014 <a href="http://www.meetup.com/hbaseusergroup/events/158491762/">HBase Meetup @ Apple</a> in Cupertino</p>
+         <p>January 30th, 2014 <a href="http://www.meetup.com/Los-Angeles-HBase-User-group/events/160560282/">Los Angeles HBase User Group</a> in El Segundo</p>
       <p><small><a href="old_news.html">Old News</a></small></p>
     </section>
   </body>
diff --git a/src/main/site/xdoc/old_news.xml b/src/main/site/xdoc/old_news.xml
index 343d370..803d17d 100644
--- a/src/main/site/xdoc/old_news.xml
+++ b/src/main/site/xdoc/old_news.xml
@@ -27,6 +27,14 @@
   </properties>
   <body>
   <section name="Old News">
+         <p>October 24th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/140759692/">HBase User and <a href="http://www.meetup.com/hackathon/events/144366512/">Developer</a> Meetup at HortonWorks</a> in Palo Alto</p>
+         <p>September 26, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/135862292/">HBase Meetup at Arista Networks</a> in San Francisco</p>
+         <p>August 20th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/120534362/">HBase Meetup at Flurry</a> in San Francisco</p>
+         <p>July 16th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/119929152/">HBase Meetup at Twitter</a> in San Francisco</p>
+         <p>June 25th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/119154442/">Hadoop Summit Meetup</a> at San Jose Convention Center</p>
+         <p>June 14th, 2013 <a href="http://kijicon.eventbrite.com/">KijiCon: Building Big Data Apps</a> in San Francisco.</p>
+         <p>June 13th, 2013 <a href="http://www.hbasecon.com/">HBaseCon2013</a> in San Francisco.  Submit an Abstract!</p>
+         <p>June 12th, 2013 <a href="http://www.meetup.com/hackathon/events/123403802/">HBaseConHackAthon</a> at the Cloudera office in San Francisco.</p>
          <p>April 11th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/103587852/">HBase Meetup at AdRoll</a> in San Francisco</p>
          <p>February 28th, 2013 <a href="http://www.meetup.com/hbaseusergroup/events/96584102/">HBase Meetup at Intel Mission Campus</a></p>
          <p>February 19th, 2013 <a href="http://www.meetup.com/hackathon/events/103633042/">Developers PowWow</a> at HortonWorks' new digs</p>