src/docs/src/documentation/content/xdocs/rest_server_install.xml - hcatalog - Git at Google

 <?xml version="1.0" encoding="UTF-8"?>
 <!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

 <document>
   <header>
     <title>Installation </title>
   </header>
   <body>

   <section>
   <title>Procedure</title>
   <ol>
   <li>Ensure that the
        <a href="rest_server_install.html#Requirements">required related installations</a>
       are in place, and place required files into the
       <a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop distributed cache.</a></li>
   <li>Download and unpack the HCatalog distribution.</li>
   <li>Set the <code>TEMPLETON_HOME</code> environment variable to the base of the
       HCatalog REST server installation.  This will usually be same as
        <code>HCATALOG_HOME</code>.
       This is used to find the Templeton
       configuration.</li>
   <li>Set JAVA_HOME,HADOOP_PREFIX, and HIVE_HOME environment variables</li>
   <li>Review the <a href="configuration.html">configuration</a>
       and update or create <code>webhcat-site.xml</code> as required. Ensure that
       site specific component installation locations are accurate, especially
       the Hadoop configuration path.  Configuration variables that use a filesystem
       path try to have reasonable defaults, but it's always safe to specify a full
       and complete path.</li>
   <li>Verify that HCatalog is installed and that the <code>hcat</code>
       executable is in the PATH.</li>
   <li>Build HCatalog using the command <code>ant jar</code> from the
       top level HCatalog directory.</li>
   <li>Start the REST server with the command
       <code>sbin/webhcat_server.sh start</code>.</li>
   <li>Check that your local install works.  Assuming that the server is running
       on port 50111, the following command would give output similar to that shown.
 <source>
 % curl -i http://localhost:50111/templeton/v1/status
 HTTP/1.1 200 OK
 Content-Type: application/json
 Transfer-Encoding: chunked
 Server: Jetty(7.6.0.v20120127)

 {"status":"ok","version":"v1"}
 %
 </source></li>
   </ol>
   </section>

   <section>
   <title>Server Commands</title>
   <ul>
    <li><strong>Start the server:</strong> <code>sbin/webhcat_server.sh start</code></li>
    <li><strong>Stop the server:</strong> <code>sbin/webhcat_server.sh stop</code></li>
    <li><strong>End-to-end build, run, test:</strong> <code>ant e2e</code></li>
   </ul>
   </section>

   <section>
   <title>Requirements</title>
   <ul>
    <li><a href="http://ant.apache.org/">Ant</a>, version 1.8 or higher</li>
    <li><a href="http://hadoop.apache.org/">Hadoop</a>, version 1.0.3 or higher</li>
    <li><a href="http://zookeeper.apache.org/">ZooKeeper</a> is required if you are using
     the ZooKeeper storage class. (Be sure to review
     and update the ZooKeeper related <a href="configuration.html">Templeton
     configuration</a>.)</li>
    <li>
      <a href="http://incubator.apache.org/hcatalog/">HCatalog</a>.
      Version 0.5.0 or higher.  The hcat executable must be both in the
      PATH and properly configured in the <a
      href="configuration.html">Templeton configuration</a>.
    </li>
    <li>
      Permissions must be given to the user running the server.
      (see below)
    </li>
    <li>
      If running a secure cluster, Kerberos keys and principals must be
      created.  (see below)
    </li>
    <li><a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop Distributed Cache</a>.
     To use the <a href="http://hive.apache.org/">Hive</a>,
     <a href="http://pig.apache.org/">Pig</a>, or
     <a href="http://hadoop.apache.org/common/docs/r1.0.0/streaming.html">hadoop/streaming</a>
     resources, see instructions below for placing the required files in the
     Hadoop Distributed Cache.</li>
   </ul>
   </section>

   <section>
   <title>Hadoop Distributed Cache</title>

   <p>The server requires some files be accessible on the
   <a href="http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#DistributedCache">
    Hadoop distributed cache</a>.  For example, to avoid the installation of Pig and Hive
    everywhere on the cluster, the server gathers a version of Pig or Hive from the
    Hadoop distributed cache whenever those resources are invoked. After placing the
    following components into HDFS please update the site configuration as required for
    each.</p>

   <ul>
   <li><strong>Hive</strong>:
    <a href="http://www.apache.org/dyn/closer.cgi/hive/">Download</a>
    the hive tar.gz file and place it in HDFS.
 <source>
 hadoop fs -put /tmp/hive-0.10.0.tar.gz /apps/templeton/hive-0.10.0.tar.gz
 </source>
   </li>

   <li><strong>Pig</strong>:
    <a href="http://www.apache.org/dyn/closer.cgi/pig">Download</a>
    the Pig tar.gz file and place it into HDFS.  For example:
 <source>
 hadoop fs -put /tmp/pig-0.10.1.tar.gz /apps/templeton/pig-0.10.1.tar.gz
 </source>
   </li>

   <li><strong>Hadoop Streaming</strong>:
    Place <code>hadoop-streaming.jar</code> into HDFS. For example, use the
    following command, substituting your path to the jar for the one below.
 <source>
 hadoop fs -put $HADOOP_PREFIX/hadoop-0.20.205.0/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  /apps/templeton/hadoop-streaming.jar
 </source>
   </li>

   <li><strong>Override Jars</strong>:
    Place override jars required (if any) into HDFS. <em>Note</em>: Hadoop versions
    prior to 1.0.3 require a patch (HADOOP-7987) to properly run Templeton.  This patch is
    distributed with Templeton (located at
    <code>templeton/src/hadoop_temp_fix/ugi.jar</code>)
    and should be placed into HDFS, as reflected in the current default configuration.
 <source>
 hadoop fs -put ugi.jar /apps/templeton/ugi.jar
 </source>
   </li>
   </ul>

   <p>The location of these files in the cache, and the location
      of the installations inside the archives, can be specified using the following
      Templeton configuration variables.  (See the
      <a href="configuration.html">Configuration</a> documentation for more information
      on changing Templeton configuration parameters.)</p>

   <table>
   <tr><th>Name</th><th>Default</th><th>Description</th></tr>

   <tr>
     <td><strong>templeton.pig.archive</strong></td>
     <td><code>hdfs:///apps/templeton/pig-0.10.1.tar.gz</code></td>
     <td>The path to the Pig archive.</td>
   </tr>

   <tr>
     <td><strong>templeton.pig.path</strong></td>
     <td><code>pig-0.10.1.tar.gz/pig-0.10.1/bin/pig</code></td>
     <td>The path to the Pig executable.</td>
   </tr>

   <tr>
     <td><strong>templeton.hive.archive</strong></td>
     <td><code>hdfs:///apps/templeton/hive-0.10.0.tar.gz</code></td>
     <td>The path to the Hive archive.</td>
   </tr>

   <tr>
     <td><strong>templeton.hive.path</strong></td>
     <td><code>hive-0.10.0.tar.gz/hive-0.10.0/bin/hive</code></td>
     <td>The path to the Hive executable.</td>
   </tr>

   <tr>
     <td><strong>templeton.streaming.jar</strong></td>
     <td><code>hdfs:///apps/templeton/hadoop-streaming.jar</code></td>
     <td>The path to the Hadoop streaming jar file.</td>
   </tr>

   <tr>
     <td><strong>templeton.override.jars</strong></td>
     <td><code>hdfs:///apps/templeton/ugi.jar</code></td>
     <td>Jars to add to the HADOOP_CLASSPATH for all Map Reduce jobs.
         These jars must exist on HDFS.</td>
   </tr>
   </table>
   </section>

   <section>
   <title>Permissions</title>
   <p>
     Permission must given for the user running the templeton
     executable to run jobs for other users.  That is, the templeton
     server will impersonate users on the Hadoop cluster.
   </p>

   <p>
     Create (or assign) a Unix user who will run the Templeton server.
     Call this USER.  See the Secure Cluster section below for choosing
     a user on a Kerberos cluster.
   </p>

   <p>
     Modify the Hadoop core-site.xml file and set these properties:
   </p>

   <table>
     <tr><th>Variable</th><th>Value</th></tr>
     <tr>
       <td>hadoop.proxyuser.USER.groups</td>
       <td>
         A comma separated list of the Unix groups whose users will be
         impersonated.
       </td>
     </tr>
     <tr>
       <td>hadoop.proxyuser.USER.hosts</td>
       <td>
         A comma separated list of the hosts that will run the hcat and
         JobTracker servers.
       </td>
     </tr>
   </table>
   </section>

   <section>
   <title>Secure Cluster</title>
   <p>
     To run Templeton on a secure cluster follow the Permissions
     instructions above but create a Kerberos principal for the
     Templeton server with the name <code>USER/host@realm</code>
   </p>

   <p>
     Also, set the templeton configuration variables
     <code>templeton.kerberos.principal</code> and
     <code>templeton.kerberos.keytab</code>
   </p>
   </section>

   </body>
 </document>
	<?xml version="1.0" encoding="UTF-8"?>
	<!--
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.
	-->
	<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">

	<document>
	<header>
	<title>Installation </title>
	</header>
	<body>

	<section>
	<title>Procedure</title>
	<ol>
	<li>Ensure that the
	<a href="rest_server_install.html#Requirements">required related installations</a>
	are in place, and place required files into the
	<a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop distributed cache.</a></li>
	<li>Download and unpack the HCatalog distribution.</li>
	<li>Set the <code>TEMPLETON_HOME</code> environment variable to the base of the
	HCatalog REST server installation. This will usually be same as
	<code>HCATALOG_HOME</code>.
	This is used to find the Templeton
	configuration.</li>
	<li>Set JAVA_HOME,HADOOP_PREFIX, and HIVE_HOME environment variables</li>
	<li>Review the <a href="configuration.html">configuration</a>
	and update or create <code>webhcat-site.xml</code> as required. Ensure that
	site specific component installation locations are accurate, especially
	the Hadoop configuration path. Configuration variables that use a filesystem
	path try to have reasonable defaults, but it's always safe to specify a full
	and complete path.</li>
	<li>Verify that HCatalog is installed and that the <code>hcat</code>
	executable is in the PATH.</li>
	<li>Build HCatalog using the command <code>ant jar</code> from the
	top level HCatalog directory.</li>
	<li>Start the REST server with the command
	<code>sbin/webhcat_server.sh start</code>.</li>
	<li>Check that your local install works. Assuming that the server is running
	on port 50111, the following command would give output similar to that shown.
	<source>
	% curl -i http://localhost:50111/templeton/v1/status
	HTTP/1.1 200 OK
	Content-Type: application/json
	Transfer-Encoding: chunked
	Server: Jetty(7.6.0.v20120127)

	{"status":"ok","version":"v1"}
	%
	</source></li>
	</ol>
	</section>

	<section>
	<title>Server Commands</title>
	<ul>
	<li><strong>Start the server:</strong> <code>sbin/webhcat_server.sh start</code></li>
	<li><strong>Stop the server:</strong> <code>sbin/webhcat_server.sh stop</code></li>
	<li><strong>End-to-end build, run, test:</strong> <code>ant e2e</code></li>
	</ul>
	</section>

	<section>
	<title>Requirements</title>
	<ul>
	<li><a href="http://ant.apache.org/">Ant</a>, version 1.8 or higher</li>
	<li><a href="http://hadoop.apache.org/">Hadoop</a>, version 1.0.3 or higher</li>
	<li><a href="http://zookeeper.apache.org/">ZooKeeper</a> is required if you are using
	the ZooKeeper storage class. (Be sure to review
	and update the ZooKeeper related <a href="configuration.html">Templeton
	configuration</a>.)</li>
	<li>
	<a href="http://incubator.apache.org/hcatalog/">HCatalog</a>.
	Version 0.5.0 or higher. The hcat executable must be both in the
	PATH and properly configured in the <a
	href="configuration.html">Templeton configuration</a>.
	</li>
	<li>
	Permissions must be given to the user running the server.
	(see below)
	</li>
	<li>
	If running a secure cluster, Kerberos keys and principals must be
	created. (see below)
	</li>
	<li><a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop Distributed Cache</a>.
	To use the <a href="http://hive.apache.org/">Hive</a>,
	<a href="http://pig.apache.org/">Pig</a>, or
	<a href="http://hadoop.apache.org/common/docs/r1.0.0/streaming.html">hadoop/streaming</a>
	resources, see instructions below for placing the required files in the
	Hadoop Distributed Cache.</li>
	</ul>
	</section>

	<section>
	<title>Hadoop Distributed Cache</title>

	<p>The server requires some files be accessible on the
	<a href="http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#DistributedCache">
	Hadoop distributed cache</a>. For example, to avoid the installation of Pig and Hive
	everywhere on the cluster, the server gathers a version of Pig or Hive from the
	Hadoop distributed cache whenever those resources are invoked. After placing the
	following components into HDFS please update the site configuration as required for
	each.</p>

	<ul>
	<li><strong>Hive</strong>:
	<a href="http://www.apache.org/dyn/closer.cgi/hive/">Download</a>
	the hive tar.gz file and place it in HDFS.
	<source>
	hadoop fs -put /tmp/hive-0.10.0.tar.gz /apps/templeton/hive-0.10.0.tar.gz
	</source>
	</li>

	<li><strong>Pig</strong>:
	<a href="http://www.apache.org/dyn/closer.cgi/pig">Download</a>
	the Pig tar.gz file and place it into HDFS. For example:
	<source>
	hadoop fs -put /tmp/pig-0.10.1.tar.gz /apps/templeton/pig-0.10.1.tar.gz
	</source>
	</li>

	<li><strong>Hadoop Streaming</strong>:
	Place <code>hadoop-streaming.jar</code> into HDFS. For example, use the
	following command, substituting your path to the jar for the one below.
	<source>
	hadoop fs -put $HADOOP_PREFIX/hadoop-0.20.205.0/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
	/apps/templeton/hadoop-streaming.jar
	</source>
	</li>

	<li><strong>Override Jars</strong>:
	Place override jars required (if any) into HDFS. <em>Note</em>: Hadoop versions
	prior to 1.0.3 require a patch (HADOOP-7987) to properly run Templeton. This patch is
	distributed with Templeton (located at
	<code>templeton/src/hadoop_temp_fix/ugi.jar</code>)
	and should be placed into HDFS, as reflected in the current default configuration.
	<source>
	hadoop fs -put ugi.jar /apps/templeton/ugi.jar
	</source>
	</li>
	</ul>

	<p>The location of these files in the cache, and the location
	of the installations inside the archives, can be specified using the following
	Templeton configuration variables. (See the
	<a href="configuration.html">Configuration</a> documentation for more information
	on changing Templeton configuration parameters.)</p>

	<table>
	<tr><th>Name</th><th>Default</th><th>Description</th></tr>

	<tr>
	<td><strong>templeton.pig.archive</strong></td>
	<td><code>hdfs:///apps/templeton/pig-0.10.1.tar.gz</code></td>
	<td>The path to the Pig archive.</td>
	</tr>

	<tr>
	<td><strong>templeton.pig.path</strong></td>
	<td><code>pig-0.10.1.tar.gz/pig-0.10.1/bin/pig</code></td>
	<td>The path to the Pig executable.</td>
	</tr>

	<tr>
	<td><strong>templeton.hive.archive</strong></td>
	<td><code>hdfs:///apps/templeton/hive-0.10.0.tar.gz</code></td>
	<td>The path to the Hive archive.</td>
	</tr>

	<tr>
	<td><strong>templeton.hive.path</strong></td>
	<td><code>hive-0.10.0.tar.gz/hive-0.10.0/bin/hive</code></td>
	<td>The path to the Hive executable.</td>
	</tr>

	<tr>
	<td><strong>templeton.streaming.jar</strong></td>
	<td><code>hdfs:///apps/templeton/hadoop-streaming.jar</code></td>
	<td>The path to the Hadoop streaming jar file.</td>
	</tr>

	<tr>
	<td><strong>templeton.override.jars</strong></td>
	<td><code>hdfs:///apps/templeton/ugi.jar</code></td>
	<td>Jars to add to the HADOOP_CLASSPATH for all Map Reduce jobs.
	These jars must exist on HDFS.</td>
	</tr>
	</table>
	</section>

	<section>
	<title>Permissions</title>
	<p>
	Permission must given for the user running the templeton
	executable to run jobs for other users. That is, the templeton
	server will impersonate users on the Hadoop cluster.
	</p>

	<p>
	Create (or assign) a Unix user who will run the Templeton server.
	Call this USER. See the Secure Cluster section below for choosing
	a user on a Kerberos cluster.
	</p>

	<p>
	Modify the Hadoop core-site.xml file and set these properties:
	</p>

	<table>
	<tr><th>Variable</th><th>Value</th></tr>
	<tr>
	<td>hadoop.proxyuser.USER.groups</td>
	<td>
	A comma separated list of the Unix groups whose users will be
	impersonated.
	</td>
	</tr>
	<tr>
	<td>hadoop.proxyuser.USER.hosts</td>
	<td>
	A comma separated list of the hosts that will run the hcat and
	JobTracker servers.
	</td>
	</tr>
	</table>
	</section>

	<section>
	<title>Secure Cluster</title>
	<p>
	To run Templeton on a secure cluster follow the Permissions
	instructions above but create a Kerberos principal for the
	Templeton server with the name <code>USER/host@realm</code>
	</p>

	<p>
	Also, set the templeton configuration variables
	<code>templeton.kerberos.principal</code> and
	<code>templeton.kerberos.keytab</code>
	</p>
	</section>

	</body>
	</document>