| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="install"> |
| |
| <title><ph audience="standalone">Installing Impala</ph><ph audience="integrated">Impala Installation</ph></title> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Installing"/> |
| <data name="Category" value="Administrators"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p> |
| <indexterm audience="hidden">installation</indexterm> |
| <indexterm audience="hidden">pseudo-distributed cluster</indexterm> |
| <indexterm audience="hidden">cluster</indexterm> |
| <indexterm audience="hidden">DataNodes</indexterm> |
| <indexterm audience="hidden">NameNode</indexterm> |
| <indexterm audience="hidden">impalad</indexterm> |
| <indexterm audience="hidden">impala-shell</indexterm> |
| <indexterm audience="hidden">statestored</indexterm> |
| Impala is an open-source analytic database for Apache Hadoop |
| that returns rapid responses to queries. |
| </p> |
| |
| <p> |
| Follow these steps to set up Impala on a cluster by building from source: |
| </p> |
| |
| <!-- Steps adapted from http://impala.apache.org/downloads.html --> |
| |
| <ul> |
| <li> |
| <p> |
| Download the latest release from |
| <xref href="http://impala.apache.org/downloads.html" scope="external" format="html">the Impala downloads page</xref>. |
| </p> |
| </li> |
| <li> |
| <p> |
| Check the <filepath>README.md</filepath> file for a pointer |
| to the build instructions. |
| </p> |
| </li> |
| <li> |
| <p> |
| Verify the MD5 and SHA1 checksums and the GPG signature of the download. To check the GPG signature, use the code signing keys of the release managers. |
| </p> |
| </li> |
| <li> |
| <p> |
| Developers interested in working on Impala can clone the Impala source repository: |
| <codeblock> |
| git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git |
| </codeblock> |
| </p> |
| </li> |
| </ul> |
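| |
| <p> |
| As a minimal sketch of the verification step above, the following commands assume a GNU/Linux host with |
| <cmdname>md5sum</cmdname>, <cmdname>sha1sum</cmdname>, and <cmdname>gpg</cmdname> installed. The file names |
| are placeholders, not actual release artifact names, and the <filepath>KEYS</filepath> file is assumed to be |
| the release managers' public keys as published by the project. Substitute the names of the files you downloaded. |
| </p> |
| <codeblock> |
| # Placeholder file names; replace X.Y.Z with the version you downloaded. |
| # Compare the printed digests with the values published for the release. |
| md5sum apache-impala-X.Y.Z.tar.gz |
| sha1sum apache-impala-X.Y.Z.tar.gz |
| |
| # Import the release managers' public keys (assumed to be in a file named KEYS), |
| # then check the detached signature against the downloaded archive. |
| gpg --import KEYS |
| gpg --verify apache-impala-X.Y.Z.tar.gz.asc apache-impala-X.Y.Z.tar.gz |
| </codeblock> |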
| |
| </conbody> |
| |
| <concept id="install_details"> |
| |
| <title>What Is Included in an Impala Installation</title> |
| |
| <conbody> |
| |
| <p> |
| Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster. |
| The key installation step for performance is to install the <cmdname>impalad</cmdname> daemon (which does |
| most of the query processing work) on <i>all</i> DataNodes in the cluster. |
| </p> |
| |
| <p> |
| The Impala package installs these binaries: |
| </p> |
| |
| <ul> |
| <li> |
| <p> |
| <cmdname>impalad</cmdname> - The Impala daemon. Plans and executes queries against HDFS, HBase, <ph rev="2.2.0">and Amazon S3 data</ph>. |
| <xref href="impala_processes.xml#processes">Run one impalad process</xref> on each node in the cluster |
| that has a DataNode. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| <cmdname>statestored</cmdname> - Name service that tracks the location and status of all |
| <codeph>impalad</codeph> instances in the cluster. <xref href="impala_processes.xml#processes">Run one |
| instance of this daemon</xref> on a node in your cluster. Most production deployments run this daemon |
| on the NameNode. |
| </p> |
| </li> |
| |
| <li rev="1.2"> |
| <p> |
| <cmdname>catalogd</cmdname> - Metadata coordination service that broadcasts changes from Impala DDL and |
| DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are |
| immediately visible to queries submitted through any Impala node. |
| <!-- Consider removing this when 1.2 gets far in the past. --> |
| (Prior to Impala 1.2, you had to run the <codeph>REFRESH</codeph> or <codeph>INVALIDATE |
| METADATA</codeph> statement on each node to synchronize changed metadata. Now those statements are only |
| required if you perform the DDL or DML through an external mechanism such as Hive <ph rev="2.2.0">or by uploading |
| data to the Amazon S3 filesystem</ph>.) |
| <xref href="impala_processes.xml#processes">Run one instance of this daemon</xref> on a node in your cluster, |
| preferably on the same host as the <codeph>statestored</codeph> daemon. |
| </p> |
| </li> |
| |
| <li> |
| <p> |
| <cmdname>impala-shell</cmdname> - <xref href="impala_impala_shell.xml#impala_shell">Command-line |
| interface</xref> for issuing queries to the Impala daemon. You install this on one or more hosts |
| anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can |
| connect remotely to any instance of the Impala daemon. |
| </p> |
| </li> |
| </ul> |
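| |
| <p> |
| As a brief illustration of the remote connection described for <cmdname>impala-shell</cmdname>, the following |
| sketch runs a single query against an <cmdname>impalad</cmdname> instance from another host. The hostname |
| <codeph>impalad-host.example.com</codeph> is a placeholder, and 21000 is assumed to be the port on which that |
| daemon accepts <cmdname>impala-shell</cmdname> connections (the usual default). Adjust both for your cluster. |
| </p> |
| <codeblock> |
| # Placeholder hostname; -i names the impalad instance to connect to, -q runs one query and exits. |
| impala-shell -i impalad-host.example.com:21000 -q 'SELECT version()' |
| </codeblock> |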
| |
| <p> |
| Before doing the installation, ensure that you have all necessary prerequisites. See |
| <xref href="impala_prereqs.xml#prereqs"/> for details. |
| </p> |
| </conbody> |
| </concept> |
| |
| </concept> |