<?xml version="1.0" encoding="UTF-8"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

<document>
  <header>
    <title>Source Installation</title>
  </header>
  <body>

  <section>
    <title>Server Installation from Source</title>

    <p><strong>Prerequisites</strong></p>
    <ul>
        <li>Machine to build the installation tar on</li>
        <li>Machine on which the server can be installed - this should have
        access to the Hadoop cluster in question, and be accessible from
        the machines you launch jobs from</li>
        <li>An RDBMS; we recommend MySQL and provide instructions for it</li>
        <li>Hadoop cluster</li>
        <li>Unix user that the server will run as, and, if you are running your
        cluster in secure mode, an associated Kerberos service principal and keytabs.</li>
    </ul>

    <p>Throughout these instructions when you see a word in <em>italics</em> it
    indicates a place where you should replace the word with a locally 
    appropriate value such as a hostname or password.</p>

    <p><strong>Building a tarball </strong></p>

    <p>If you downloaded HCatalog from Apache or another site as a source release,
    you will need to first build a tarball to install.  You can tell if you have
    a source release by looking at the name of the file you downloaded.  If
    it is named hcatalog-src-0.4.0-incubating.tar.gz (notice the
    <strong>src</strong> in the name) then you have a source release.</p>
    
    <p>If you do not already have Apache Ant installed on your machine, you 
    will need to obtain it.  You can get it from the <a href="http://ant.apache.org/">
    Apache Ant website</a>.  Once you download it, you will need to unpack it
    somewhere on your machine.  The directory where you unpack it will be referred
    to as <em>ant_home</em> in this document.</p>

    <p>If you do not already have Apache Forrest installed on your machine, you 
    will need to obtain it.  You can get it from the <a href="http://forrest.apache.org/">
    Apache Forrest website</a>.  Once you download it, you will need to unpack 
    it somewhere on your machine.  The directory where you unpack it will be referred
    to as <em>forrest_home</em> in this document.</p>
    
    <p>To build the installation tarball from the source release, do the following:</p>

    <p>Create a directory to expand the source release in.  Copy the source
    release to that directory and unpack it.</p>
    <p><code>mkdir /tmp/hcat_source_release</code></p>
    <p><code>cp hcatalog-src-0.4.0-incubating.tar.gz /tmp/hcat_source_release</code></p>
    <p><code>cd /tmp/hcat_source_release</code></p>
    <p><code>tar xzf hcatalog-src-0.4.0-incubating.tar.gz</code></p>

    <p>Change directories into the unpacked source release and build the
    installation tarball.</p>
    <p><code>cd hcatalog-src-0.4.0-incubating</code></p>
    <p><em>ant_home</em><code>/bin/ant -Dhcatalog.version=0.4.0
    -Dforrest.home=</code><em>forrest_home</em><code> tar </code></p>

    <p>The installation tarball should now be at
    <code>build/hcatalog-0.4.0.tar.gz</code>.</p>

    <p><strong>Database Setup</strong></p>

    <p>If you do not already have Hive installed with MySQL, the following will
    walk you through how to do so.  If you have already set this up, you can skip
    this step.</p>

    <p>Select a machine to install the database on.  This need not be the same
    machine as the Thrift server, which we will set up later.  For large
    clusters we recommend using separate machines.  For the
    purposes of these instructions we will refer to this machine as
    <em>hivedb.acme.com</em>.</p>

    <p>Install MySQL server on <em>hivedb.acme.com</em>.  You can obtain
    packages for MySQL from <a href="http://www.mysql.com/downloads/">MySQL's
    download site</a>.  We have developed and tested with versions 5.1.46
    and 5.1.48.  We suggest you use these versions or later.
    Once you have MySQL up and running, use the <code>mysql</code> command line
    tool to add the <code>hive</code> user and <code>hivemetastoredb</code>
    database.  You will need to pick a password for your <code>hive</code>
    user, and replace <em>dbpassword</em> in the following commands with it.</p>

    <p><code>mysql -u root</code></p>
    <p><code>mysql> CREATE USER 'hive'@'</code><em>hivedb.acme.com</em><code>' IDENTIFIED BY '</code><em>dbpassword</em><code>';</code></p>
    <p><code>mysql> CREATE DATABASE hivemetastoredb DEFAULT CHARACTER SET latin1 DEFAULT COLLATE latin1_swedish_ci;</code></p>
    <p><code>mysql> GRANT ALL PRIVILEGES ON hivemetastoredb.* TO 'hive'@'</code><em>hivedb.acme.com</em><code>' WITH GRANT OPTION;</code></p>
    <p><code>mysql> flush privileges;</code></p>
    <p><code>mysql> quit;</code></p>

    <p>Use the database installation script found in the Hive package to create the
    metastore tables in this database.  <em>hive_home</em> in the line below refers to the directory
    where you have installed Hive.  If you are using Hive rpms, then this will
    be <code>/usr/lib/hive</code>.</p>

    <p><code>mysql -u hive -D hivemetastoredb -h </code><em>hivedb.acme.com</em><code> -p &lt; </code><em>hive_home</em><code>/scripts/metastore/upgrade/mysql/hive-schema-0.9.0.mysql.sql</code></p>

    <p><strong>Thrift Server Setup</strong></p>

    <p>If you do not already have Hive running a metastore server using Thrift,
    you can use the following instructions to set up and run one.  You may skip
    this step if you are already using a Hive metastore server.</p>

    <p>Select a machine to install your Thrift server on.  For smaller and test
    installations this can be the same machine as the database.  For the
    purposes of these instructions we will refer to this machine as
    <em>hcatsvr.acme.com</em>.</p>

    <p>If you have not already done so, install Hive 0.9 on this machine.  You
    can use the
    <a href="http://hive.apache.org/releases.html">binary distributions</a>
    provided by Hive or rpms available from
    <a href="http://incubator.apache.org/bigtop/">Apache Bigtop</a>.  If you use
    the Apache Hive binary distribution, select a directory, henceforth
    referred to as <em>hive_home</em>, and untar the distribution there.
    If you use the rpms, <em>hive_home</em> will be
    <code>/usr/lib/hive</code>.</p>

    <p>Install the MySQL Java connector libraries on <em>hcatsvr.acme.com</em>.
    You can obtain these from
    <a href="http://www.mysql.com/downloads/connector/j/5.1.html">MySQL's
    download site</a>.</p>

    <p>Select a user to run the Thrift server as.  This user should not be a
    human user, and must be able to act as a proxy for other users.  We suggest
    the name "hive" for the user.  Throughout the rest of this documentation 
    we will refer to this user as <em>hive</em>.  If necessary, add the user to 
    <em>hcatsvr.acme.com</em>.</p>

    <p>Select a <em>root</em> directory for your installation of HCatalog.  This 
    directory must be owned by the <em>hive</em> user.  We recommend
    <code>/usr/local/hive</code>.  If necessary, create the directory.  You will
    need to be the <em>hive</em> user for the operations described in the remainder
    of this Thrift Server Setup section.</p>

    <p>Copy the HCatalog installation tarball into a temporary directory, and untar
    it.  Then change directories into the new distribution and run the HCatalog
    server installation script.  You will need to know the directory you chose
    as <em>root</em> and the
    directory you installed the MySQL Java connector libraries into (referred
    to in the command below as <em>dbroot</em>).  You will also need your
    <em>hadoop_home</em>, the directory where you have Hadoop installed, and
    <em>portnum</em>, the port number you want the HCatalog server to listen on.</p>

    <p><code>tar zxf hcatalog-0.4.0.tar.gz</code></p>
    <p><code>cd hcatalog-0.4.0</code></p>
    <p><code>share/hcatalog/scripts/hcat_server_install.sh -r </code><em>root</em><code> -d </code><em>dbroot</em><code> -h </code><em>hadoop_home</em><code> -p </code><em>portnum</em></p>

    <p>Now you need to edit your <em>hive_home</em><code>/conf/hive-site.xml</code> file.
    Open this file in your favorite text editor.  The following table shows the
    values you need to configure.</p>

    <table>
        <tr>
            <th>Parameter</th>
            <th>Value</th>
        </tr>
        <tr>
            <td>hive.metastore.local</td>
            <td>false</td>
        </tr>
        <tr>
            <td>javax.jdo.option.ConnectionURL</td>
            <td>jdbc:mysql://<em>hostname</em>/hivemetastoredb?createDatabaseIfNotExist=true where <em>hostname</em> is the name of the machine you installed MySQL on.</td>
        </tr>
        <tr>
            <td>javax.jdo.option.ConnectionDriverName</td>
            <td>com.mysql.jdbc.Driver</td>
        </tr>

        <tr>
            <td>javax.jdo.option.ConnectionUserName</td>
            <td>hive</td>
        </tr>
        <tr>
            <td>javax.jdo.option.ConnectionPassword</td>
            <td><em>dbpassword</em> value you used in setting up the MySQL server
            above.</td>
        </tr>
        <tr>
            <td>hive.semantic.analyzer.factory.impl</td>
            <td>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</td>
        </tr>
        <tr>
            <td>hadoop.clientside.fs.operations</td>
            <td>true</td>
        </tr>
        <tr>
            <td>hive.metastore.warehouse.dir</td>
            <td>The directory can be a URI or an absolute file path. If it is an absolute file path, it will be resolved to a URI by the metastore:
            <p>-- If default hdfs was specified in core-site.xml, path resolves to HDFS location. </p>
            <p>-- Otherwise, path is resolved as local file: URI.</p>
            <p>This setting becomes effective when creating new tables (it takes precedence over default DBS.DB_LOCATION_URI at the time of table creation).</p>
            <p>You only need to set this if you have not yet configured Hive to run on your system.</p>
            </td>
        </tr>
        <tr>
            <td>hive.metastore.uris</td>
            <td>thrift://<em>hostname</em>:<em>portnum</em> where <em>hostname</em> is the name of the machine hosting the Thrift server, and <em>portnum</em> is the port number
            used above in the installation script.</td>
        </tr>
        <tr>
            <td>hive.metastore.execute.setugi</td>
            <td>true</td>
        </tr>
        <tr>
            <td>hive.metastore.sasl.enabled</td>
            <td>Set to true if you are using Kerberos security with your Hadoop
            cluster, false otherwise.</td>
        </tr>
        <tr>
            <td>hive.metastore.kerberos.keytab.file</td>
            <td>The path to the Kerberos keytab file containing the metastore
            Thrift server's service principal.  Only required if you set
            hive.metastore.sasl.enabled above to true.</td>
        </tr>
        <tr>
            <td>hive.metastore.kerberos.principal</td>
            <td>The service principal for the metastore Thrift server.  You can
            reference your host as _HOST and it will be replaced with your
            actual hostname.  Only required if you set
            hive.metastore.sasl.enabled above to true.</td>
        </tr>
    </table>
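
    <p>As an illustration of the table above, a <code>hive-site.xml</code> for a
    cluster that does not use Kerberos might contain the following properties.
    This is a sketch rather than a complete file: the host names
    <em>hivedb.acme.com</em> and <em>hcatsvr.acme.com</em>, <em>dbpassword</em>, and
    <em>portnum</em> are the placeholders used earlier in these instructions and
    must be replaced with your own values.  The warehouse directory and the
    Kerberos keytab and principal properties are omitted here.</p>

    <source><![CDATA[
<configuration>
  <!-- Use a remote metastore server rather than an embedded one -->
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
  </property>

  <!-- MySQL connection settings; replace the host name and dbpassword -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hivedb.acme.com/hivemetastoredb?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>dbpassword</value>
  </property>

  <!-- HCatalog settings from the table above -->
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>
  <property>
    <name>hadoop.clientside.fs.operations</name>
    <value>true</value>
  </property>

  <!-- Thrift server location; replace the host name and portnum -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hcatsvr.acme.com:portnum</value>
  </property>
  <property>
    <name>hive.metastore.execute.setugi</name>
    <value>true</value>
  </property>

  <!-- Set to true only if the Hadoop cluster uses Kerberos security -->
  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>
</configuration>
]]></source>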

    <p>You can now proceed to starting the server.</p>
  </section>

  <section>
    <title>Starting the Server</title>
            
    <p>To start your server, HCatalog needs to know where Hive is installed.
    You communicate this by setting the environment variable <code>HIVE_HOME</code>
    to the directory where you installed Hive.  Start the HCatalog server by changing directories to
    <em>root</em> and invoking <code>HIVE_HOME=</code><em>hive_home</em><code> sbin/hcat_server.sh start</code>.</p>

  </section>

  <section>
    <title>Logging</title>

    <p>Server activity logs are located in
    <em>root</em><code>/var/log/hcat_server</code>.  Logging configuration is located at
    <em>root</em><code>/conf/log4j.properties</code>.  Server logging uses
    <code>DailyRollingFileAppender</code> by default.  It generates a new
    file each day and does not delete old log files automatically.</p>
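
    <p>For orientation, a log4j 1.2 configuration that uses
    <code>DailyRollingFileAppender</code> typically looks like the sketch below.
    The appender name <code>DRFA</code>, the log file name <code>hcat.log</code>,
    the <code>/usr/local/hive</code> root directory, and the layout pattern are
    assumptions made for illustration; check the <code>log4j.properties</code>
    shipped with your installation for the values actually in use.</p>

    <source><![CDATA[
# Illustrative sketch only; names and paths below are assumptions.
log4j.rootLogger=INFO, DRFA

# Roll the log file over once per day
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
# Assumes root is /usr/local/hive and a hypothetical file name hcat.log
log4j.appender.DRFA.File=/usr/local/hive/var/log/hcat_server/hcat.log
# Suffix appended to each rolled file, for example hcat.log.2012-05-01
log4j.appender.DRFA.DatePattern='.'yyyy-MM-dd

log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
]]></source>

    <p>Because this appender has no setting that limits the number of rolled
    files, old log files have to be removed by hand or by a scheduled job.</p>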

  </section>

  <section>
    <title>Stopping the Server</title>

    <p>To stop the HCatalog server, change directories to the <em>root</em>
    directory and invoke <code>HIVE_HOME=</code><em>hive_home</em><code> sbin/hcat_server.sh stop</code>.</p>

  </section>

  <section>
    <title>Client Installation</title>

    <p>Select a <em>root</em> directory for your installation of HCatalog client.
    We recommend <code>/usr/local/hcat</code>.  If necessary, create the directory.</p>

    <p>Copy the HCatalog installation tarball into a temporary directory, and untar
    it.</p>

    <p><code>tar zxf hcatalog-0.4.0.tar.gz</code></p>

    <p>Now you need to edit your <em>hive_home</em><code>/conf/hive-site.xml</code> file.
    You can use the same file as on the server <strong>except the value of
    </strong><code>javax.jdo.option.ConnectionPassword</code><strong> should be
    removed</strong>.  This avoids having the password available in plain text on
    all of your clients.</p>

    <p>The HCatalog command line interface (CLI) can now be invoked as
    <code>HIVE_HOME=</code><em>hive_home</em><code> </code><em>root</em><code>/bin/hcat</code>.</p>

  </section>

  </body>
</document>
