blob: f6a3c17380968c550b3639733654ab0d4288b37b [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="connecting">
<title>Connecting to Impala Daemon from impala-shell</title>
<titlealts audience="PDF"><navtitle>Connecting to impalad</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="impala-shell"/>
<data name="Category" value="Network"/>
<data name="Category" value="DataNode"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p> Within an <cmdname>impala-shell</cmdname> session, you can only issue
queries while connected to an instance of the <cmdname>impalad</cmdname>
daemon. You can specify the connection information: <ul>
<li> Through command-line options when you run the
<cmdname>impala-shell</cmdname> command. </li>
<li> Through a configuration file that is read when you run the
<cmdname>impala-shell</cmdname> command. </li>
<li> During an <cmdname>impala-shell</cmdname> session, by issuing a
<codeph>CONNECT</codeph> command. </li>
</ul><note>You cannot connect to the 3.2 or earlier versions of Impala
using the <codeph>'hs2'</codeph> or <codeph>'hs2-http'</codeph> protocol
(<codeph>--protocol</codeph> option).</note> See <xref
href="impala_shell_options.xml"/> for the command-line and configuration
file options you can use. </p>
<p> You can connect to any Impala daemon (<cmdname>impalad</cmdname>), and
that daemon coordinates the execution of all queries sent to it. </p>
<p>
For simplicity during development, you might always connect to the same host, perhaps running <cmdname>impala-shell</cmdname> on
the same host as <cmdname>impalad</cmdname> and specifying the hostname as <codeph>localhost</codeph>.
</p>
<p> In a production environment, you might enable load balancing, in which
you connect to specific host/port combination but queries are forwarded to
arbitrary hosts. This technique spreads the overhead of acting as the
coordinator node among all the Impala daemons in the cluster. See <xref
href="impala_proxy.xml"/> for details. </p>
<p>
<b>To connect to an Impala during shell startup:</b>
</p>
<ol>
<li> Locate the hostname that is running an instance of the
<cmdname>impalad</cmdname> daemon. If that <cmdname>impalad</cmdname>
uses a non-default port (something other than port 21000) for
<cmdname>impala-shell</cmdname> connections, find out the port number
also. </li>
<li> Use the <codeph>-i</codeph> option to the
<cmdname>impala-shell</cmdname> interpreter to specify the connection
information for that instance of <cmdname>impalad</cmdname>:
<codeblock># When you are connecting to an impalad running on the same machine.
# The prompt will reflect the current hostname.
$ impala-shell
# When you are connecting to an impalad running on a remote machine, and impalad is listening
# on a non-default port over the HTTP HiveServer2 protocol.
$ impala-shell -i <varname>some.other.hostname</varname>:<varname>port_number</varname> --protocol='hs2-http'
# When you are connecting to an impalad running on a remote machine, and impalad is listening
# on a non-default port.
$ impala-shell -i <varname>some.other.hostname</varname>:<varname>port_number</varname>
</codeblock>
</li>
</ol>
<p>
<b>To connect to an Impala in the<cmdname>impala-shell</cmdname>
session:</b>
</p>
<ol>
<li> Start the Impala shell with no connection:
<codeblock>$ impala-shell</codeblock></li>
<li> Locate the hostname that is running the <cmdname>impalad</cmdname>
daemon. If that <cmdname>impalad</cmdname> uses a non-default port
(something other than port 21000) for <cmdname>impala-shell</cmdname>
connections, find out the port number also. </li>
<li> Use the <codeph>connect</codeph> command to connect to an Impala
instance. Enter a command of the form:
<codeblock>[Not connected] &gt; connect <varname>impalad-host</varname></codeblock><note>
Replace <varname>impalad-host</varname> with the hostname you have
configured to run Impala in your environment. The changed prompt
indicates a successful connection. </note>
</li>
</ol>
<p>
<b>To start <cmdname>impala-shell</cmdname> in a specific database:</b>
</p>
<p> You can use all the same connection options as in previous examples. For
simplicity, these examples assume that you are logged into one of the
Impala daemons. </p>
<ol>
<li>
Find the name of the database containing the relevant tables, views, and so
on that you want to operate on.
</li>
<li>
Use the <codeph>-d</codeph> option to the
<cmdname>impala-shell</cmdname> interpreter to connect and immediately
switch to the specified database, without the need for a <codeph>USE</codeph>
statement or fully qualified names:
<codeblock># Subsequent queries with unqualified names operate on
# tables, views, and so on inside the database named 'staging'.
$ impala-shell -i localhost -d staging
# It is common during development, ETL, benchmarking, and so on
# to have different databases containing the same table names
# but with different contents or layouts.
$ impala-shell -i localhost -d parquet_snappy_compression
$ impala-shell -i localhost -d parquet_gzip_compression
</codeblock>
</li>
</ol>
<p>
<b>To run one or several statements in non-interactive mode:</b>
</p>
<p> You can use all the same connection options as in previous examples. For
simplicity, these examples assume that you are logged into one of the
Impala daemons. </p>
<ol>
<li>
Construct a statement, or a file containing a sequence of statements,
that you want to run in an automated way, without typing or copying
and pasting each time.
</li>
<li>
Invoke <cmdname>impala-shell</cmdname> with the <codeph>-q</codeph> option to run a single statement, or
the <codeph>-f</codeph> option to run a sequence of statements from a file.
The <cmdname>impala-shell</cmdname> command returns immediately, without going into
the interactive interpreter.
<codeblock># A utility command that you might run while developing shell scripts
# to manipulate HDFS files.
$ impala-shell -i localhost -d database_of_interest -q 'show tables'
# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements
# can go into a file to make the setup process repeatable.
$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql
</codeblock>
</li>
</ol>
</conbody>
</concept>