blob: 7ef6612046df3c9cac28fde5f7fe828c7f80599a [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="config_options">
<title>Modifying Impala Startup Options</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
</metadata>
</prolog>
<conbody>
<p>
The configuration options for the Impala daemons let you choose which hosts and ports to
use for the services that run on a single host, specify directories for logging, control
resource usage and security, and specify other aspects of the Impala software.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="config_options_noncm">
<title>Configuring Impala Startup Options through the Command Line</title>
<conbody>
<p>
The Impala server, <codeph>statestore</codeph>, and catalog services start up using
values provided in a defaults file, <filepath>/etc/default/impala</filepath>.
</p>
<p>
This file includes information about many resources used by Impala. Most of the defaults
included in this file should be effective in most cases. For example, typically you
would not change the definition of the <codeph>CLASSPATH</codeph> variable, but you
would always set the address used by the <codeph>statestore</codeph> server. Some of the
content you might modify includes:
</p>
<!-- Note: Update the following example for each release with the associated lines from /etc/default/impala. -->
<codeblock rev="ver">IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=...
IMPALA_STATE_STORE_HOST=...
export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
<p>
To use alternate values, edit the defaults file, then restart all the Impala-related
services so that the changes take effect. Restart the Impala server using the following
commands:
</p>
<codeblock>$ sudo service impala-server restart
Stopping Impala Server: [ OK ]
Starting Impala Server: [ OK ]</codeblock>
<p>
Restart the Impala StateStore using the following commands:
</p>
<codeblock>$ sudo service impala-state-store restart
Stopping Impala State Store Server: [ OK ]
Starting Impala State Store Server: [ OK ]</codeblock>
<p>
Restart the Impala Catalog Service using the following commands:
</p>
<codeblock>$ sudo service impala-catalog restart
Stopping Impala Catalog Server: [ OK ]
Starting Impala Catalog Server: [ OK ]</codeblock>
<p>
Some common settings to change include:
</p>
<ul>
<li>
<p>
StateStore address. Where practical, put the <codeph>statestored</codeph> on a
separate host not running the <cmdname>impalad</cmdname> daemon. In that recommended
configuration, the <cmdname>impalad</cmdname> daemon cannot refer to the
<codeph>statestored</codeph> server using the loopback address. If the
<codeph>statestored</codeph> is hosted on a machine with an IP address of
192.168.0.27, change:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=127.0.0.1</codeblock>
<p>
to:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=192.168.0.27</codeblock>
</li>
<li rev="1.2">
<p>
Catalog server address (including both the hostname and the port number). Update the
value of the <codeph>IMPALA_CATALOG_SERVICE_HOST</codeph> variable. Where practical,
run the catalog server on the same host as the <codeph>statestore</codeph>. In that
recommended configuration, the <cmdname>impalad</cmdname> daemon cannot refer to the
catalog server using the loopback address. If the catalog service is hosted on a
machine with an IP address of 192.168.0.27, add the following line:
</p>
<codeblock>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</codeblock>
<p>
The <filepath>/etc/default/impala</filepath> defaults file currently does not define
an <codeph>IMPALA_CATALOG_ARGS</codeph> environment variable, but if you add one it
will be recognized by the service startup/shutdown script. Add a definition for this
variable to <filepath>/etc/default/impala</filepath> and add the option
<codeph>&#8209;&#8209;catalog_service_host=<varname>hostname</varname></codeph>. If
the port is different than the default 26000, also add the option
<codeph>&#8209;&#8209;catalog_service_port=<varname>port</varname></codeph>.
</p>
</li>
<li id="mem_limit">
<p>
Memory limits. You can limit the amount of memory available to Impala. For example,
to allow Impala to use no more than 70% of system memory, change:
</p>
<!-- Note: also needs to be updated for each release to reflect latest /etc/default/impala. -->
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}}</codeblock>
<p>
to:
</p>
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</codeblock>
<p>
You can specify the memory limit using absolute notation such as
<codeph>500m</codeph> or <codeph>2G</codeph>, or as a percentage of physical memory
such as <codeph>60%</codeph>.
</p>
<note>
Queries that exceed the specified memory limit are aborted. Percentage limits are
based on the physical memory of the machine and do not consider cgroups.
</note>
</li>
<li>
<p>
Core dump enablement. To enable core dumps, change:
</p>
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
<p>
to:
</p>
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</codeblock>
<note
conref="../shared/impala_common.xml#common/core_dump_considerations"/>
</li>
<li>
<p>
Authorization. Specify the
<codeph>&#8209;&#8209;server_name</codeph> option as part of the
<codeph>IMPALA_SERVER_ARGS</codeph> and
<codeph>IMPALA_CATALOG_ARGS</codeph> settings to enable the core
Impala support for authorization. See <xref
href="impala_authorization.xml#secure_startup"/> for details.
</p>
</li>
<li>
<p>
Auditing for successful or blocked Impala queries, another aspect of security.
Specify the
<codeph>&#8209;&#8209;audit_event_log_dir=<varname>directory_path</varname></codeph>
option and optionally the
<codeph>&#8209;&#8209;max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph>
and <codeph>&#8209;&#8209;abort_on_failed_audit_event</codeph> options as part of
the <codeph>IMPALA_SERVER_ARGS</codeph> settings, for each Impala node, to enable
and customize auditing. See
<xref
href="impala_auditing.xml#auditing"/> for details.
</p>
</li>
<li>
<p>
Password protection for the Impala web UI, which listens on port 25000 by default.
This feature involves adding some or all of the
<codeph>&#8209;&#8209;webserver_password_file</codeph>,
<codeph>&#8209;&#8209;webserver_authentication_domain</codeph>, and
<codeph>&#8209;&#8209;webserver_certificate_file</codeph> options to the
<codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph>
settings. See
<xref
href="impala_security_guidelines.xml#security_guidelines"/> for
details.
</p>
</li>
<li id="default_query_options">
<p rev="DOCS-677">
Another setting you might add to <codeph>IMPALA_SERVER_ARGS</codeph> is a
comma-separated list of query options and values:
<codeblock>&#8209;&#8209;default_query_options='<varname>option</varname>=<varname>value</varname>,<varname>option</varname>=<varname>value</varname>,...'
</codeblock>
These options control the behavior of queries performed by this
<cmdname>impalad</cmdname> instance. The option values you specify here override the
default values for
<xref
href="impala_query_options.xml#query_options">Impala query
options</xref>, as shown by the <codeph>SET</codeph> statement in
<cmdname>impala-shell</cmdname>.
</p>
</li>
<li>
<p>
During troubleshooting, <keyword keyref="support_org"/> might direct you to change
other values, particularly for <codeph>IMPALA_SERVER_ARGS</codeph>, to work around
issues or gather debugging information.
</p>
</li>
</ul>
<note>
<p>
These startup options for the <cmdname>impalad</cmdname> daemon are different from the
command-line options for the <cmdname>impala-shell</cmdname> command. For the
<cmdname>impala-shell</cmdname> options, see
<xref href="impala_shell_options.xml#shell_options"/>.
</p>
</note>
<p outputclass="toc inpage"/>
</conbody>
</concept>
<concept id="config_options_checking">
<title>Checking the Values of Impala Configuration Options</title>
<conbody>
<p>
You can check the current runtime value of all these settings through the Impala web
interface, available by default at
<codeph>http://<varname>impala_hostname</varname>:25000/varz</codeph> for the
<cmdname>impalad</cmdname> daemon,
<codeph>http://<varname>impala_hostname</varname>:25010/varz</codeph> for the
<cmdname>statestored</cmdname> daemon, or
<codeph>http://<varname>impala_hostname</varname>:25020/varz</codeph> for the
<cmdname>catalogd</cmdname> daemon.
</p>
</conbody>
</concept>
<concept rev="1.2" id="config_options_catalogd">
<title>Startup Options for catalogd Daemon</title>
<conbody>
<p>
The <cmdname>catalogd</cmdname> daemon implements the Impala Catalog service, which
broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts
data, or performs other kinds of DDL and DML operations.
</p>
<p conref="../shared/impala_common.xml#common/load_catalog_in_background"/>
</conbody>
</concept>
</concept>