 <?xml version="1.0"?>
     <chapter xml:id="upgrading"
       version="5.0" xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:xi="http://www.w3.org/2001/XInclude"
       xmlns:svg="http://www.w3.org/2000/svg"
       xmlns:m="http://www.w3.org/1998/Math/MathML"
       xmlns:html="http://www.w3.org/1999/xhtml"
       xmlns:db="http://docbook.org/ns/docbook">
 <!--
 /**
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
  * regarding copyright ownership.  The ASF licenses this file
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
  *
  *     http://www.apache.org/licenses/LICENSE-2.0
  *
  * Unless required by applicable law or agreed to in writing, software
  * distributed under the License is distributed on an "AS IS" BASIS,
  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
 -->
     <title>Upgrading</title>
     <para>You cannot skip major versions when upgrading.  If you are upgrading from
     version 0.90.x to 0.94.x, you must first go from 0.90.x to 0.92.x and then go
     from 0.92.x to 0.94.x.</para>
     <note><para>It may be possible to skip across versions -- for example go from
     0.92.2 straight to 0.98.0 just following the 0.96.x upgrade instructions --
     but we have not tried it so cannot say whether it works or not.</para>
     </note>
     <para>
         Review <xref linkend="configuration" />, in particular the section on Hadoop version.
     </para>
     <section xml:id="hbase.versioning">
         <title>HBase version numbers</title>
         <para>HBase has not walked a straight line where version numbers are concerned.
             Since we came up out of hadoop itself, we originally tracked hadoop versioning.
             Later we left hadoop versioning behind because we were moving at a different rate
             to that of our parent.  If you are into the arcane, check out our old wiki page
             on <link xlink:href="http://wiki.apache.org/hadoop/Hbase/HBaseVersions">HBase Versioning</link>
             which tries to connect the HBase version dots.</para>
         <section xml:id="hbase.development.series"><title>Odd/Even Versioning or "Development" Series Releases</title>
             <para>Ahead of big releases, we have been putting up preview versions to start the
                 feedback cycle turning over earlier.  These "Development" Series releases,
                 always odd-numbered, come with no guarantees, not even that you will be able
                 to upgrade between two sequential releases (we reserve the right to break compatibility across
                 "Development" Series releases).  Needless to say, these releases are not for
                 production deploys.  They are a preview of what is coming, in the hope that
                 interested parties will take the release for a test drive and flag us early if
                 there are issues we've missed ahead of our rolling a production-worthy release.
             </para>
             <para>Our first "Development" Series was the 0.89 set that came out ahead of
                 HBase 0.90.0.  HBase 0.95 is another "Development" Series that portends
                 HBase 0.96.0.
             </para>
         </section>
         <section xml:id="hbase.binary.compatibility">
             <title>Binary Compatibility</title>
             <para>When we say two HBase versions are compatible, we mean that the versions
                 are wire and binary compatible.  Compatible HBase versions mean that
                 clients can talk to compatible but differently versioned servers.
                 It means too that you can just swap out the jars of one version and replace
                 them with the jars of another, compatible version and all will just work.
                 Unless otherwise specified, HBase point versions are binary compatible.
                 You can safely do rolling upgrades between binary compatible versions; i.e.
                 across point versions: e.g. from 0.94.5 to 0.94.6<footnote><para>See
                         <link xlink:href="http://search-hadoop.com/m/bOOvwHGW981/Does+compatibility+between+versions+also+mean+binary+compatibility%253F&amp;subj=Re+Does+compatibility+between+versions+also+mean+binary+compatibility+">Does compatibility between versions also mean binary compatibility?</link>
                         discussion on the hbase dev mailing list.
                 </para></footnote>.
             </para>
         </section>
         <section xml:id="hbase.rolling.restart">
             <title>Rolling Upgrade between versions/Binary compatibility</title>
             <para>Unless otherwise specified, HBase point versions are binary compatible.
                 You can do a rolling upgrade between HBase point versions.
                 For example, you can go to 0.94.6 from 0.94.5 by doing a rolling upgrade across the cluster,
                 replacing the 0.94.5 binary with a 0.94.6 binary.
             </para>
         </section>
 </section>

     <section xml:id="upgrade0.98">
       <title>Upgrading from 0.96.x to 0.98.x</title>
       <para>A rolling upgrade from 0.96.x to 0.98.x works.  The two versions are not binary compatible.
       TODO: List of changes.</para>
     </section>
     <section xml:id="upgrade0.96">
       <title>Upgrading from 0.94.x to 0.96.x</title>
       <subtitle>The Singularity</subtitle>
       <para>You will have to stop your old 0.94.x cluster completely to upgrade.  If you are replicating
      between clusters, both clusters will have to go down to upgrade.  Make sure it is a clean shutdown.
      The fewer WAL files around, the faster the upgrade will run (the upgrade will split any log files it
      finds in the filesystem as part of the upgrade process).  All clients must be upgraded to 0.96 too.
  </para>
  <para>The API has changed.  You will need to recompile your code against 0.96 and you may need to
      adjust applications to go against new APIs (TODO: List of changes).
  </para>
  <section>
      <title>Executing the 0.96 Upgrade</title>
      <note>
      <para>HDFS and ZooKeeper should be up and running during the upgrade process.</para>
  </note>
  <para>hbase-0.96.0 comes with an upgrade script.  Run
      <programlisting>$ bin/hbase upgrade</programlisting> to see its usage.
      The script has two main modes: -check, and -execute.
  </para>
      <section><title>check</title>
          <para>The <emphasis>check</emphasis> step is run against a running 0.94 cluster.
              Run it from a downloaded 0.96.x binary.  The <emphasis>check</emphasis> step
              is looking for the presence of <filename>HFileV1</filename> files.  These are
              unsupported in hbase-0.96.0.  To purge them -- have them rewritten as HFileV2 --
              you must run a compaction.
          </para>
          <para>The <emphasis>check</emphasis> step prints stats at the end of its run
              (grep for “Result:” in the log).  It prints the absolute paths of the tables it scanned,
              any HFileV1 files found, the regions containing said files (the regions we
              need to major compact to purge the HFileV1s), and any corrupted files
              found. A corrupt file is unreadable, and so its version is undefined (neither HFileV1 nor HFileV2).
          </para>
          <para>To run the check step, run <programlisting>$ bin/hbase upgrade -check</programlisting>.
           Here is sample output:
 <programlisting>
              Tables Processed:
              hdfs://localhost:41020/myHBase/.META.
              hdfs://localhost:41020/myHBase/usertable
              hdfs://localhost:41020/myHBase/TestTable
              hdfs://localhost:41020/myHBase/t

              Count of HFileV1: 2
              HFileV1:
              hdfs://localhost:41020/myHBase/usertable    /fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
              hdfs://localhost:41020/myHBase/usertable    /ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512

              Count of corrupted files: 1
              Corrupted Files:
              hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
              Count of Regions with HFileV1: 2
              Regions to Major Compact:
              hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
              hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af

              There are some HFileV1, or corrupt files (files with incorrect major version)
 </programlisting>
              In the above sample output, there are two HFileV1 in two regions, and one corrupt file.
              Corrupt files should probably be removed.  The regions that have HFileV1s need to be major
              compacted.  To major compact, start up the hbase shell and review how to compact an individual
              region.  After the major compaction is done, rerun the check step and the HFileV1s should be
              gone, replaced by HFileV2 instances.
          </para>
          <para>By default, the check step scans the hbase root directory (defined as hbase.rootdir in the configuration).
              To scan a specific directory only, pass the <emphasis>-dir</emphasis> option.
              <programlisting>$ bin/hbase upgrade -check -dir /myHBase/testTable</programlisting>
              The above command would detect HFileV1s in the /myHBase/testTable directory.
          </para>
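          <para>For orientation, the root directory the check step scans by default is the one configured in
              <filename>hbase-site.xml</filename>.  The fragment below is only a sketch: the value is taken from the
              sample output above and will differ on your cluster.
<programlisting><![CDATA[
<!-- Example only: value borrowed from the sample check output above -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://localhost:41020/myHBase</value>
</property>
]]></programlisting>
          </para>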
          <para>
              Once the check step reports all the HFileV1 files have been rewritten, it is safe to proceed with the
              upgrade.
           </para>
      </section>
      <section><title>execute</title>
          <para>After the check step shows the cluster is free of HFileV1, it is safe to proceed with the upgrade.
              Next is the <emphasis>execute</emphasis> step.  You must <emphasis>SHUTDOWN YOUR 0.94.x CLUSTER</emphasis>
              before you can run the <emphasis>execute</emphasis> step.  The execute step will not run if it
              detects running HBase masters or regionservers.
          <note>
          <para>HDFS and ZooKeeper should be up and running during the upgrade process.
              If zookeeper is managed by HBase, then you can start zookeeper so it is available to the upgrade
              by running <programlisting>$ ./hbase/bin/hbase-daemon.sh  start zookeeper</programlisting>
          </para></note>
          </para>
          <para>
              The <emphasis>execute</emphasis> upgrade step is made of three substeps.

              <itemizedlist>
              <listitem> <para>Namespaces: HBase 0.96.0 has support for namespaces.  The upgrade needs to reorder directories in the filesystem for namespaces to work.</para> </listitem>
              <listitem> <para>ZNodes: All znodes are purged so that new ones can be written in their place using a new protobuf'ed format.  A few, such as the replication and table state znodes, are migrated in place.</para> </listitem>
              <listitem> <para>WAL Log Splitting: If the 0.94.x cluster shutdown was not clean, we'll split WAL logs as part of migration before
                      we start up on 0.96.0.  This WAL splitting runs slower than the native distributed WAL splitting because it all happens inside the
                      single upgrade process (so try to get a clean shutdown of the 0.94.x cluster if you can).
              </para> </listitem>
          </itemizedlist>
          </para>
          <para>
              To run the <emphasis>execute</emphasis> step, first make sure you have copied the hbase-0.96.0
              binaries to all servers and all clients.  Make sure the 0.94.x cluster is down.
              Then do as follows:
          <programlisting>$ bin/hbase upgrade -execute</programlisting>
          Here is some sample output:
          <programlisting>
              Starting Namespace upgrade
              Created version file at hdfs://localhost:41020/myHBase with version=7
              Migrating table testTable to hdfs://localhost:41020/myHBase/.data/default/testTable
              …..
              Created version file at hdfs://localhost:41020/myHBase with version=8
              Successfully completed NameSpace upgrade.
              Starting Znode upgrade
              ….
              Successfully completed Znode upgrade

              Starting Log splitting
              …
              Successfully completed Log splitting
          </programlisting>
          </para>
          <para>
              If the output from the execute step looks good, stop the zookeeper instance you started
              to do the upgrade: <programlisting>$ ./hbase/bin/hbase-daemon.sh stop zookeeper</programlisting>
              Now start up hbase-0.96.0.
          </para>
      </section>
      <section xml:id="096.migration.troubleshooting"><title>Troubleshooting</title>
      <section xml:id="096.migration.troubleshooting.old.client"><title>Old Client connecting to 0.96 cluster</title>
          <para>An old client connecting to a 0.96 cluster fails with an exception like the one below.  Upgrade the client.
              <programlisting>17:22:15  Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
 17:22:15  *
 17:22:15   api-compat-8.ent.cloudera.com ��  ���(
 17:22:15    at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
 17:22:15    at org.apache.hadoop.hbase.ServerName.&lt;init&gt;(ServerName.java:101)
 17:22:15    at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
 17:22:15    at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
 17:22:15    at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
 17:22:15    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
 17:22:15    at org.apache.hadoop.hbase.client.HBaseAdmin.&lt;init&gt;(HBaseAdmin.java:126)
 17:22:15    at Client_4_3_0.setup(Client_4_3_0.java:716)
 17:22:15    at Client_4_3_0.main(Client_4_3_0.java:63)</programlisting>
          </para>
      </section>
      </section>
  </section>


     </section>

     <section xml:id="upgrade0.94">
       <title>Upgrading from 0.92.x to 0.94.x</title>
       <para>We used to think that 0.92 and 0.94 were interface compatible and that you could do a
           rolling upgrade between these versions, but then we figured out that
           <link xlink:href="https://issues.apache.org/jira/browse/HBASE-5357">HBASE-5357 Use builder pattern in HColumnDescriptor</link>
           changed method signatures so that, rather than returning void, they instead return HColumnDescriptor.  Code
           compiled against 0.92 that runs against 0.94 jars will throw <programlisting>java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HColumnDescriptor.setMaxVersions(I)V</programlisting>
           so 0.92 and 0.94 are NOT compatible.  You cannot do a rolling upgrade between them.
     </para>
     </section>
     <section xml:id="upgrade0.92">
       <title>Upgrading from 0.90.x to 0.92.x</title>
       <subtitle>Upgrade Guide</subtitle>
 <para>You will find that 0.92.0 runs a little differently from 0.90.x releases.  Here are a few things to watch out for when upgrading from 0.90.x to 0.92.0.
 <note><title>tl;dr</title>
 <para>
 If you have no patience, here are the important things to know when upgrading.
 <orderedlist>
 <listitem><para>Once you upgrade, you can’t go back.</para>
 </listitem>
 <listitem><para>
 MSLAB is on by default. Watch your heap usage if you have a lot of regions.</para>
 </listitem>
 <listitem><para>
 Distributed splitting is on by default.  It should make region server failover faster.</para>
 </listitem>
 <listitem><para>
 There’s a separate tarball for security.</para>
 </listitem>
 <listitem><para>
 If -XX:MaxDirectMemorySize is set in your hbase-env.sh, it’s going to enable the experimental off-heap cache (you may not want this).</para>
 </listitem>
 </orderedlist>
 </para>
 </note>
 </para>

 <section>
 <title>You can’t go back!
 </title>
 <para>To move to 0.92.0, all you need to do is shut down your cluster, replace your HBase 0.90.x binaries with HBase 0.92.0 binaries (be sure you clear out all 0.90.x instances) and restart.  You cannot do a rolling restart from 0.90.x to 0.92.x; you must do a full restart.
 On startup, the <varname>.META.</varname> table content is rewritten removing the table schema from the <varname>info:regioninfo</varname> column.
 Also, any flushes done post first startup will write out data in the new 0.92.0 file format, <link xlink:href="http://hbase.apache.org/book.html#hfilev2">HFile V2</link>.
 This means you cannot go back to 0.90.x once you’ve started HBase 0.92.0 over your HBase data directory.
 </para>
 </section>

 <section>
 <title>MSLAB is ON by default
 </title>
 <para>In 0.92.0, the <link xlink:href="http://hbase.apache.org/book.html#hbase.hregion.memstore.mslab.enabled">hbase.hregion.memstore.mslab.enabled</link> flag is set to <constant>true</constant>
 (See <xref linkend="mslab" />).  In 0.90.x it was <constant>false</constant>.  When it is enabled, memstores allocate memory in 2MB MSLAB chunks even if the
 memstore has zero or just a few small elements.  This is usually fine, but if you had lots of regions per regionserver in a 0.90.x cluster (and MSLAB was off),
 you may find yourself hitting OOMEs on upgrade because the <mathphrase>thousands of regions * number of column families * 2MB MSLAB (at a minimum)</mathphrase>
 puts your heap over the top.  If so, set <varname>hbase.hregion.memstore.mslab.enabled</varname> to
 <constant>false</constant>, or set the MSLAB size down from 2MB by setting <varname>hbase.hregion.memstore.mslab.chunksize</varname> to something less.
 </para>
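 <para>As a rough, hypothetical illustration of the arithmetic above: a regionserver carrying 1000 regions with 3 column families each
 would reserve at least 1000 * 3 * 2MB = 6000MB of heap for MSLAB chunks alone.  The <filename>hbase-site.xml</filename> fragment below is a
 minimal sketch of the two alternatives just described.  The 1MB chunk size is an assumed example value, not a recommendation, and the
 fragment assumes the chunk size is given in bytes.  Use one option or the other.
<programlisting><![CDATA[
<!-- Option 1: turn MSLAB off, matching the 0.90.x default -->
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>false</value>
</property>

<!-- Option 2: keep MSLAB on but shrink the chunk (example value: 1MB, assumed to be in bytes) -->
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>1048576</value>
</property>
]]></programlisting>
 </para>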
 </section>

 <section><title>Distributed splitting is on by default
 </title>
 <para>Previously, WAL logs on crash were split by the Master alone.  In 0.92.0, log splitting is done by the cluster (See “HBASE-1364 [performance] Distributed splitting of regionserver commit logs”).  This should cut down significantly on the amount of time it takes to split logs and get regions back online again.
 </para>
 </section>

 <section><title>Memory accounting is different now
 </title>
 <para>In 0.92.0, <xref linkend="hfilev2" /> indices and bloom filters take up residence in the same LRU cache used for caching blocks that come from the filesystem.
 In 0.90.x, the HFile v1 indices lived outside of the LRU so they took up space even if the index was on a ‘cold’ file, one that wasn’t being actively used.  With the indices now in the LRU, you may find you
 have less space for block caching.  Adjust your block cache accordingly.  See <xref linkend="block.cache" /> for more detail.
 The default block cache size has been changed in 0.92.0 from 0.2 (20 percent of heap) to 0.25.
 </para>
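 <para>A sketch of how the block cache fraction might be raised in <filename>hbase-site.xml</filename> to make room for the indices
 and bloom filters.  It assumes the setting in question is the <varname>hfile.block.cache.size</varname> property, which the text above does
 not name; the value 0.3 is an arbitrary example, not a tuned recommendation.
<programlisting><![CDATA[
<!-- Fraction of the regionserver heap given to the LRU block cache.
     0.25 is the 0.92.0 default; 0.3 here is only an example. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.3</value>
</property>
]]></programlisting>
 </para>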
 </section>


 <section><title>On the Hadoop version to use
 </title>
 <para>Run 0.92.0 on Hadoop 1.0.x (or CDH3u3 when it ships).  The performance benefits are worth making the move.  Otherwise, our Hadoop prescription is as it has been; you need a Hadoop that supports a working sync.  See <xref linkend="hadoop" />.
 </para>

 <para>If running on Hadoop 1.0.x (or CDH3u3), enable local read.  See the <link xlink:href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf">Practical Caching</link> presentation for ruminations on the performance benefits of ‘going local’ (and for how to enable local reads).
 </para>
 </section>
 <section><title>HBase 0.92.0 ships with ZooKeeper 3.4.2
 </title>
 <para>If you can, upgrade your ZooKeeper.  If you can’t, 3.4.2 clients should work against 3.3.X ensembles (HBase makes use of the 3.4.2 API).
 </para>
 </section>
 <section>
 <title>Online alter is off by default
 </title>
 <para>In 0.92.0, we’ve added an experimental online schema alter facility (See <xref linkend="hbase.online.schema.update.enable" />).  It is off by default.  Enable it at your own risk.  Online alter and splitting tables do not play well together, so be sure your cluster is quiescent when using this feature (for now).
 </para>
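 <para>If you do decide to try it, the facility is switched on in <filename>hbase-site.xml</filename>.  A minimal sketch, using
 the property referenced above:
<programlisting><![CDATA[
<!-- Experimental: off by default; enable at your own risk -->
<property>
  <name>hbase.online.schema.update.enable</name>
  <value>true</value>
</property>
]]></programlisting>
 </para>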
 </section>
 <section>
 <title>WebUI
 </title>
 <para>The web UI has had a few additions made in 0.92.0.  It now shows a list of the regions currently transitioning, recent compactions/flushes, and a process list of running processes (usually empty if all is well and requests are being handled promptly).  Other additions include requests by region, a debugging servlet dump, etc.
 </para>
 </section>
 <section>
 <title>Security tarball
 </title>
 <para>We now ship two tarballs: secure and insecure HBase.  Documentation on how to set up a secure HBase is on the way.
 </para>
 </section>

 <section xml:id="slabcache"><title>Experimental off-heap cache: SlabCache</title>
 <para>
 A new cache was contributed to 0.92.0 to act as a middle ground between the “on-heap” cache, which is the current LRU cache the region servers have, and the operating system cache, which is out of our control.
 To enable <emphasis>SlabCache</emphasis>, as this feature is being called, set “-XX:MaxDirectMemorySize” in hbase-env.sh to the value for maximum direct memory size and specify
 <property>hbase.offheapcache.percentage</property> in <filename>hbase-site.xml</filename> with the percentage that you want to dedicate to off-heap cache. This should only be set for servers and not for clients. Use at your own risk.
 See this blog post, <link xlink:href="http://www.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/">Caching in Apache HBase: SlabCache</link>, for additional information on this new experimental feature.
 </para>
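 <para>A minimal sketch of the <filename>hbase-site.xml</filename> half of the setup described above.  The value 0.85 is an arbitrary
 example, and the fragment assumes the property is read as a fraction of the configured direct memory size (so 0.85 would mean 85 percent).
 -XX:MaxDirectMemorySize must still be set separately in <filename>hbase-env.sh</filename>.
<programlisting><![CDATA[
<!-- Servers only, not clients. Example value; assumed to be a fraction of MaxDirectMemorySize. -->
<property>
  <name>hbase.offheapcache.percentage</name>
  <value>0.85</value>
</property>
]]></programlisting>
 </para>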
 <para>This feature has mostly been eclipsed in later HBase versions.  See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7404">HBASE-7404 Bucket Cache:A solution about CMS,Heap Fragment and Big Cache on HBASE</link>, etc.</para>
 </section>

 <section><title>Changes in HBase replication
 </title>
 <para>0.92.0 adds two new features: multi-slave and multi-master replication. The way to enable this is the same as adding a new peer, so in order to have multi-master you would just run add_peer for each cluster that acts as a master to the other slave clusters. Collisions are handled at the timestamp level, which may or may not be what you want; this needs to be evaluated on a per use case basis. Replication is still experimental in 0.92 and is disabled by default.  Run it at your own risk.
 </para>
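 <para>Since replication is disabled by default, it has to be switched on before any add_peer call has an effect.  The fragment below is a
 sketch that assumes the cluster-wide switch is the <varname>hbase.replication</varname> property in <filename>hbase-site.xml</filename>;
 the text above does not name the property, so check it against your release.
<programlisting><![CDATA[
<!-- Assumed property name; experimental in 0.92, run at your own risk -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
]]></programlisting>
 </para>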
 </section>


 <section><title>RegionServer now aborts if OOME
 </title>
 <para>On an OOME, we now have the JVM kill -9 the regionserver process so it goes down fast.  Previously, a RegionServer might stick around after incurring an OOME, limping along in some wounded state.  To disable this facility (we recommend you leave it in place), you’d need to edit the bin/hbase file.  Look for the addition of the -XX:OnOutOfMemoryError="kill -9 %p" arguments (See [HBASE-4769] - ‘Abort RegionServer Immediately on OOME’).
 </para>
 </section>


 <section><title>HFile V2 and the “Bigger, Fewer” Tendency
 </title>
 <para>0.92.0 stores data in a new format, <xref linkend="hfilev2" />.   As HBase runs, it will move all your data from HFile v1 to HFile v2 format.  This auto-migration will run in the background as flushes and compactions run.
 HFile V2 allows HBase to run with larger regions/files.  In fact, we encourage all HBasers going forward to tend toward Facebook axiom #1: run with larger, fewer regions.
 If you have lots of regions now (hundreds or more per host), you should look into setting your region size up after you move to 0.92.0 (in 0.92.0, the default size is now 1G, up from 256M), and then running the online merge tool (See “HBASE-1621 merge tool should work on online cluster, but disabled table”).
 </para>
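 <para>A sketch of raising the region size in <filename>hbase-site.xml</filename>.  It assumes the region size is governed by the
 <varname>hbase.hregion.max.filesize</varname> property, which the text above does not name, and that the value is in bytes; 4GB is an
 arbitrary example, not a recommendation.
<programlisting><![CDATA[
<!-- Example only: 4GB in bytes (the 0.92.0 default is 1G) -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>4294967296</value>
</property>
]]></programlisting>
 </para>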
 </section>
     </section>
     <section xml:id="upgrade0.90">
     <title>Upgrading to HBase 0.90.x from 0.20.x or 0.89.x</title>
           <para>This version of 0.90.x HBase can be started on data written by
               HBase 0.20.x or HBase 0.89.x.  There is no need of a migration step.
               HBase 0.89.x and 0.90.x do write out the name of region directories
               differently -- it names them with a md5 hash of the region name rather
               than a jenkins hash -- so this means that once started, there is no
               going back to HBase 0.20.x.
           </para>
           <para>
              Be sure to remove the <filename>hbase-default.xml</filename> from
              your <filename>conf</filename>
              directory on upgrade.  A 0.20.x version of this file will have
              sub-optimal configurations for 0.90.x HBase.  The
              <filename>hbase-default.xml</filename> file is now bundled into the
              HBase jar and read from there.  If you would like to review
              the content of this file, see it in the src tree at
              <filename>src/main/resources/hbase-default.xml</filename> or
              see <xref linkend="hbase_default_configurations" />.
           </para>
           <para>
             Finally, if upgrading from 0.20.x, check your
             <varname>.META.</varname> schema in the shell.  In the past we would
             recommend that users run with a 16kb
             <varname>MEMSTORE_FLUSHSIZE</varname>.
             Run <code>hbase> scan '-ROOT-'</code> in the shell. This will output
             the current <varname>.META.</varname> schema.  Check
             <varname>MEMSTORE_FLUSHSIZE</varname> size.  Is it 16kb (16384)?  If so, you will
             need to change this (The 'normal'/default value is 64MB (67108864)).
             Run the script <filename>bin/set_meta_memstore_size.rb</filename>.
             This will make the necessary edit to your <varname>.META.</varname> schema.
             Failure to run this change will make for a slow cluster <footnote>
             <para>
             See <link xlink:href="https://issues.apache.org/jira/browse/HBASE-3499">HBASE-3499 Users upgrading to 0.90.0 need to have their .META. table updated with the right MEMSTORE_SIZE</link>
             </para>
             </footnote>
             .

           </para>
           </section>
     </chapter>