| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> |
| |
| <document> |
| <header> |
| <title>Installation </title> |
| </header> |
| <body> |
| |
| <section> |
| <title>Procedure</title> |
| <ol> |
| <li>Ensure that the |
| <a href="rest_server_install.html#Requirements">required related installations</a> |
| are in place, and place required files into the |
| <a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop distributed cache.</a></li> |
| <li>Download and unpack the HCatalog distribution.</li> |
| <li>Set the <code>TEMPLETON_HOME</code> environment variable to the base of the |
| HCatalog REST server installation. This will usually be same as |
| <code>HCATALOG_HOME</code>. |
| This is used to find the Templeton |
| configuration.</li> |
| <li>Set JAVA_HOME,HADOOP_PREFIX, and HIVE_HOME environment variables</li> |
| <li>Review the <a href="configuration.html">configuration</a> |
| and update or create <code>webhcat-site.xml</code> as required. Ensure that |
| site specific component installation locations are accurate, especially |
| the Hadoop configuration path. Configuration variables that use a filesystem |
| path try to have reasonable defaults, but it's always safe to specify a full |
| and complete path.</li> |
| <li>Verify that HCatalog is installed and that the <code>hcat</code> |
| executable is in the PATH.</li> |
| <li>Build HCatalog using the command <code>ant jar</code> from the |
| top level HCatalog directory.</li> |
| <li>Start the REST server with the command |
| <code>sbin/webhcat_server.sh start</code>.</li> |
| <li>Check that your local install works. Assuming that the server is running |
| on port 50111, the following command would give output similar to that shown. |
| <source> |
| % curl -i http://localhost:50111/templeton/v1/status |
| HTTP/1.1 200 OK |
| Content-Type: application/json |
| Transfer-Encoding: chunked |
| Server: Jetty(7.6.0.v20120127) |
| |
| {"status":"ok","version":"v1"} |
| % |
| </source></li> |
| </ol> |
| </section> |
| |
| <section> |
| <title>Server Commands</title> |
| <ul> |
| <li><strong>Start the server:</strong> <code>sbin/webhcat_server.sh start</code></li> |
| <li><strong>Stop the server:</strong> <code>sbin/webhcat_server.sh stop</code></li> |
| <li><strong>End-to-end build, run, test:</strong> <code>ant e2e</code></li> |
| </ul> |
| </section> |
| |
| <section> |
| <title>Requirements</title> |
| <ul> |
| <li><a href="http://ant.apache.org/">Ant</a>, version 1.8 or higher</li> |
| <li><a href="http://hadoop.apache.org/">Hadoop</a>, version 1.0.3 or higher</li> |
| <li><a href="http://zookeeper.apache.org/">ZooKeeper</a> is required if you are using |
| the ZooKeeper storage class. (Be sure to review |
| and update the ZooKeeper related <a href="configuration.html">Templeton |
| configuration</a>.)</li> |
| <li> |
| <a href="http://incubator.apache.org/hcatalog/">HCatalog</a>. |
| Version 0.5.0 or higher. The hcat executable must be both in the |
| PATH and properly configured in the <a |
| href="configuration.html">Templeton configuration</a>. |
| </li> |
| <li> |
| Permissions must be given to the user running the server. |
| (see below) |
| </li> |
| <li> |
| If running a secure cluster, Kerberos keys and principals must be |
| created. (see below) |
| </li> |
| <li><a href="rest_server_install.html#Hadoop+Distributed+Cache">Hadoop Distributed Cache</a>. |
| To use the <a href="http://hive.apache.org/">Hive</a>, |
| <a href="http://pig.apache.org/">Pig</a>, or |
| <a href="http://hadoop.apache.org/common/docs/r1.0.0/streaming.html">hadoop/streaming</a> |
| resources, see instructions below for placing the required files in the |
| Hadoop Distributed Cache.</li> |
| </ul> |
| </section> |
| |
| <section> |
| <title>Hadoop Distributed Cache</title> |
| |
| <p>The server requires some files be accessible on the |
| <a href="http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#DistributedCache"> |
| Hadoop distributed cache</a>. For example, to avoid the installation of Pig and Hive |
| everywhere on the cluster, the server gathers a version of Pig or Hive from the |
| Hadoop distributed cache whenever those resources are invoked. After placing the |
| following components into HDFS please update the site configuration as required for |
| each.</p> |
| |
| <ul> |
| <li><strong>Hive</strong>: |
| <a href="http://www.apache.org/dyn/closer.cgi/hive/">Download</a> |
| the hive tar.gz file and place it in HDFS. |
| <source> |
| hadoop fs -put /tmp/hive-0.10.0.tar.gz /apps/templeton/hive-0.10.0.tar.gz |
| </source> |
| </li> |
| |
| <li><strong>Pig</strong>: |
| <a href="http://www.apache.org/dyn/closer.cgi/pig">Download</a> |
| the Pig tar.gz file and place it into HDFS. For example: |
| <source> |
| hadoop fs -put /tmp/pig-0.10.1.tar.gz /apps/templeton/pig-0.10.1.tar.gz |
| </source> |
| </li> |
| |
| <li><strong>Hadoop Streaming</strong>: |
| Place <code>hadoop-streaming.jar</code> into HDFS. For example, use the |
| following command, substituting your path to the jar for the one below. |
| <source> |
| hadoop fs -put $HADOOP_PREFIX/hadoop-0.20.205.0/contrib/streaming/hadoop-streaming-0.20.205.0.jar \ |
| /apps/templeton/hadoop-streaming.jar |
| </source> |
| </li> |
| |
| <li><strong>Override Jars</strong>: |
| Place override jars required (if any) into HDFS. <em>Note</em>: Hadoop versions |
| prior to 1.0.3 require a patch (HADOOP-7987) to properly run Templeton. This patch is |
| distributed with Templeton (located at |
| <code>templeton/src/hadoop_temp_fix/ugi.jar</code>) |
| and should be placed into HDFS, as reflected in the current default configuration. |
| <source> |
| hadoop fs -put ugi.jar /apps/templeton/ugi.jar |
| </source> |
| </li> |
| </ul> |
| |
| <p>The location of these files in the cache, and the location |
| of the installations inside the archives, can be specified using the following |
| Templeton configuration variables. (See the |
| <a href="configuration.html">Configuration</a> documentation for more information |
| on changing Templeton configuration parameters.)</p> |
| |
| <table> |
| <tr><th>Name</th><th>Default</th><th>Description</th></tr> |
| |
| <tr> |
| <td><strong>templeton.pig.archive</strong></td> |
| <td><code>hdfs:///apps/templeton/pig-0.10.1.tar.gz</code></td> |
| <td>The path to the Pig archive.</td> |
| </tr> |
| |
| <tr> |
| <td><strong>templeton.pig.path</strong></td> |
| <td><code>pig-0.10.1.tar.gz/pig-0.10.1/bin/pig</code></td> |
| <td>The path to the Pig executable.</td> |
| </tr> |
| |
| <tr> |
| <td><strong>templeton.hive.archive</strong></td> |
| <td><code>hdfs:///apps/templeton/hive-0.10.0.tar.gz</code></td> |
| <td>The path to the Hive archive.</td> |
| </tr> |
| |
| <tr> |
| <td><strong>templeton.hive.path</strong></td> |
| <td><code>hive-0.10.0.tar.gz/hive-0.10.0/bin/hive</code></td> |
| <td>The path to the Hive executable.</td> |
| </tr> |
| |
| <tr> |
| <td><strong>templeton.streaming.jar</strong></td> |
| <td><code>hdfs:///apps/templeton/hadoop-streaming.jar</code></td> |
| <td>The path to the Hadoop streaming jar file.</td> |
| </tr> |
| |
| <tr> |
| <td><strong>templeton.override.jars</strong></td> |
| <td><code>hdfs:///apps/templeton/ugi.jar</code></td> |
| <td>Jars to add to the HADOOP_CLASSPATH for all Map Reduce jobs. |
| These jars must exist on HDFS.</td> |
| </tr> |
| </table> |
| </section> |
| |
| <section> |
| <title>Permissions</title> |
| <p> |
| Permission must given for the user running the templeton |
| executable to run jobs for other users. That is, the templeton |
| server will impersonate users on the Hadoop cluster. |
| </p> |
| |
| <p> |
| Create (or assign) a Unix user who will run the Templeton server. |
| Call this USER. See the Secure Cluster section below for choosing |
| a user on a Kerberos cluster. |
| </p> |
| |
| <p> |
| Modify the Hadoop core-site.xml file and set these properties: |
| </p> |
| |
| <table> |
| <tr><th>Variable</th><th>Value</th></tr> |
| <tr> |
| <td>hadoop.proxyuser.USER.groups</td> |
| <td> |
| A comma separated list of the Unix groups whose users will be |
| impersonated. |
| </td> |
| </tr> |
| <tr> |
| <td>hadoop.proxyuser.USER.hosts</td> |
| <td> |
| A comma separated list of the hosts that will run the hcat and |
| JobTracker servers. |
| </td> |
| </tr> |
| </table> |
| </section> |
| |
| <section> |
| <title>Secure Cluster</title> |
| <p> |
| To run Templeton on a secure cluster follow the Permissions |
| instructions above but create a Kerberos principal for the |
| Templeton server with the name <code>USER/host@realm</code> |
| </p> |
| |
| <p> |
| Also, set the templeton configuration variables |
| <code>templeton.kerberos.principal</code> and |
| <code>templeton.kerberos.keytab</code> |
| </p> |
| </section> |
| |
| </body> |
| </document> |