blob: 72f63ae8fa08edb2bf0098854eff9553c0611751 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="config_options">
<title>Modifying Impala Startup Options</title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Configuring"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
</metadata>
</prolog>
<conbody>
<p>
<indexterm audience="hidden">defaults file</indexterm>
<indexterm audience="hidden">configuration file</indexterm>
<indexterm audience="hidden">options</indexterm>
<indexterm audience="hidden">IMPALA_STATE_STORE_PORT</indexterm>
<indexterm audience="hidden">IMPALA_BACKEND_PORT</indexterm>
<indexterm audience="hidden">IMPALA_LOG_DIR</indexterm>
<indexterm audience="hidden">IMPALA_STATE_STORE_ARGS</indexterm>
<indexterm audience="hidden">IMPALA_SERVER_ARGS</indexterm>
<indexterm audience="hidden">ENABLE_CORE_DUMPS</indexterm>
<indexterm audience="hidden">core dumps</indexterm>
<indexterm audience="hidden">restarting services</indexterm>
<indexterm audience="hidden">services</indexterm>
The configuration options for the Impala-related daemons let you choose which hosts and
ports to use for the services that run on a single host, specify directories for logging,
control resource usage and security, and specify other aspects of the Impala software.
</p>
<p outputclass="toc inpage"/>
</conbody>
<concept id="config_options_noncm">
<title>Configuring Impala Startup Options through the Command Line</title>
<conbody>
<p> The Impala server, statestore, and catalog services start up using values provided in a
defaults file, <filepath>/etc/default/impala</filepath>. </p>
<p>
This file includes information about many resources used by Impala. Most of the defaults
included in this file should be effective in most cases. For example, typically you
would not change the definition of the <codeph>CLASSPATH</codeph> variable, but you
would always set the address used by the statestore server. Some of the content you
might modify includes:
</p>
<!-- Note: Update the following example for each release with the associated lines from /etc/default/impala. -->
<codeblock rev="ver">IMPALA_STATE_STORE_HOST=127.0.0.1
IMPALA_STATE_STORE_PORT=24000
IMPALA_BACKEND_PORT=22000
IMPALA_LOG_DIR=/var/log/impala
IMPALA_CATALOG_SERVICE_HOST=...
IMPALA_STATE_STORE_HOST=...
export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
IMPALA_SERVER_ARGS=" \
-log_dir=${IMPALA_LOG_DIR} \
-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}"
export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
<p>
To use alternate values, edit the defaults file, then restart all the Impala-related
services so that the changes take effect. Restart the Impala server using the following
commands:
</p>
<codeblock>$ sudo service impala-server restart
Stopping Impala Server: [ OK ]
Starting Impala Server: [ OK ]</codeblock>
<p>
Restart the Impala statestore using the following commands:
</p>
<codeblock>$ sudo service impala-state-store restart
Stopping Impala State Store Server: [ OK ]
Starting Impala State Store Server: [ OK ]</codeblock>
<p>
Restart the Impala catalog service using the following commands:
</p>
<codeblock>$ sudo service impala-catalog restart
Stopping Impala Catalog Server: [ OK ]
Starting Impala Catalog Server: [ OK ]</codeblock>
<p>
Some common settings to change include:
</p>
<ul>
<li>
<p>
Statestore address. Where practical, put the statestore on a separate host not
running the <cmdname>impalad</cmdname> daemon. In that recommended configuration,
the <cmdname>impalad</cmdname> daemon cannot refer to the statestore server using
the loopback address. If the statestore is hosted on a machine with an IP address of
192.168.0.27, change:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=127.0.0.1</codeblock>
<p>
to:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=192.168.0.27</codeblock>
</li>
<li rev="1.2">
<p>
Catalog server address (including both the hostname and the port number). Update the
value of the <codeph>IMPALA_CATALOG_SERVICE_HOST</codeph> variable. Where
practical, run the catalog server on the same host as the statestore. In that
recommended configuration, the <cmdname>impalad</cmdname> daemon cannot refer to the
catalog server using the loopback address. If the catalog service is hosted on a
machine with an IP address of 192.168.0.27, add the following line:
</p>
<codeblock>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</codeblock>
<p>
The <filepath>/etc/default/impala</filepath> defaults file currently does not define
an <codeph>IMPALA_CATALOG_ARGS</codeph> environment variable, but if you add one it
will be recognized by the service startup/shutdown script. Add a definition for this
variable to <filepath>/etc/default/impala</filepath> and add the option
<codeph>-catalog_service_host=<varname>hostname</varname></codeph>. If the port is
different than the default 26000, also add the option
<codeph>-catalog_service_port=<varname>port</varname></codeph>.
</p>
</li>
<li id="mem_limit">
<p>
Memory limits. You can limit the amount of memory available to Impala. For example,
to allow Impala to use no more than 70% of system memory, change:
</p>
<!-- Note: also needs to be updated for each release to reflect latest /etc/default/impala. -->
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}}</codeblock>
<p>
to:
</p>
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</codeblock>
<p>
You can specify the memory limit using absolute notation such as
<codeph>500m</codeph> or <codeph>2G</codeph>, or as a percentage of physical memory
such as <codeph>60%</codeph>.
</p>
<note>
Queries that exceed the specified memory limit are aborted. Percentage limits are
based on the physical memory of the machine and do not consider cgroups.
</note>
</li>
<li>
<p> Core dump enablement. To enable core dumps, change: </p>
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
<p>
to:
</p>
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</codeblock>
<note conref="../shared/impala_common.xml#common/core_dump_considerations"/>
</li>
<li>
<p>
Authorization using the open source Sentry plugin. Specify the
<codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph>
options as part of the <codeph>IMPALA_SERVER_ARGS</codeph> and
<codeph>IMPALA_STATE_STORE_ARGS</codeph> settings to enable the core Impala support
for authentication. See <xref href="impala_authorization.xml#secure_startup"/> for
details.
</p>
</li>
<li>
<p>
Auditing for successful or blocked Impala queries, another aspect of security.
Specify the <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph>
option and optionally the
<codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph>
and <codeph>-abort_on_failed_audit_event</codeph> options as part of the
<codeph>IMPALA_SERVER_ARGS</codeph> settings, for each Impala node, to enable and
customize auditing. See <xref href="impala_auditing.xml#auditing"/> for details.
</p>
</li>
<li>
<p>
Password protection for the Impala web UI, which listens on port 25000 by default.
This feature involves adding some or all of the
<codeph>--webserver_password_file</codeph>,
<codeph>--webserver_authentication_domain</codeph>, and
<codeph>--webserver_certificate_file</codeph> options to the
<codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph>
settings. See <xref href="impala_security_guidelines.xml#security_guidelines"/> for
details.
</p>
</li>
<li id="default_query_options">
<p rev="DOCS-677">
Another setting you might add to <codeph>IMPALA_SERVER_ARGS</codeph> is a
comma-separated list of query options and values:
<codeblock>-default_query_options='<varname>option</varname>=<varname>value</varname>,<varname>option</varname>=<varname>value</varname>,...'
</codeblock>
These options control the behavior of queries performed by this
<cmdname>impalad</cmdname> instance. The option values you specify here override the
default values for <xref href="impala_query_options.xml#query_options">Impala query
options</xref>, as shown by the <codeph>SET</codeph> statement in
<cmdname>impala-shell</cmdname>.
</p>
</li>
<!-- Removing this reference now that the options are de-emphasized / desupported in Impala 2.3 and up.
<li rev="1.2">
<p>
Options for resource management, in conjunction with the YARN component. These options include
<codeph>-enable_rm</codeph> and <codeph>-cgroup_hierarchy_path</codeph>.
<ph rev="1.4.0">Additional options to help fine-tune the resource estimates are
<codeph>-—rm_always_use_defaults</codeph>,
<codeph>-—rm_default_memory=<varname>size</varname></codeph>, and
<codeph>-—rm_default_cpu_cores</codeph>.</ph> For details about these options, see
<xref href="impala_resource_management.xml#rm_options"/>. See
<xref href="impala_resource_management.xml#resource_management"/> for information about resource
management in general.
</p>
</li>
-->
<li>
<p>
During troubleshooting, <keyword keyref="support_org"/> might direct you to change other values,
particularly for <codeph>IMPALA_SERVER_ARGS</codeph>, to work around issues or
gather debugging information.
</p>
</li>
</ul>
<!-- Removing this reference now that the options are de-emphasized / desupported in Impala 2.3 and up.
<p conref="impala_resource_management.xml#rm_options/resource_management_impalad_options"/>
-->
<note>
<p>
These startup options for the <cmdname>impalad</cmdname> daemon are different from the
command-line options for the <cmdname>impala-shell</cmdname> command. For the
<cmdname>impala-shell</cmdname> options, see
<xref href="impala_shell_options.xml#shell_options"/>.
</p>
</note>
<p audience="hidden" outputclass="toc inpage"/>
</conbody>
<concept audience="hidden" id="config_options_impalad_details">
<title>Configuration Options for impalad Daemon</title>
<conbody>
<p>
Some common settings to change include:
</p>
<ul>
<li>
<p>
Statestore address. Where practical, put the statestore on a separate host not
running the <cmdname>impalad</cmdname> daemon. In that recommended configuration,
the <cmdname>impalad</cmdname> daemon cannot refer to the statestore server using
the loopback address. If the statestore is hosted on a machine with an IP address
of 192.168.0.27, change:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=127.0.0.1</codeblock>
<p>
to:
</p>
<codeblock>IMPALA_STATE_STORE_HOST=192.168.0.27</codeblock>
</li>
<li rev="1.2">
<p>
Catalog server address. Update the <codeph>IMPALA_CATALOG_SERVICE_HOST</codeph>
variable, including both the hostname and the port number in the value. Where
practical, run the catalog server on the same host as the statestore. In that
recommended configuration, the <cmdname>impalad</cmdname> daemon cannot refer to
the catalog server using the loopback address. If the catalog service is hosted on
a machine with an IP address of 192.168.0.27, add the following line:
</p>
<codeblock>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</codeblock>
<p>
The <filepath>/etc/default/impala</filepath> defaults file currently does not
define an <codeph>IMPALA_CATALOG_ARGS</codeph> environment variable, but if you
add one it will be recognized by the service startup/shutdown script. Add a
definition for this variable to <filepath>/etc/default/impala</filepath> and add
the option <codeph>-catalog_service_host=<varname>hostname</varname></codeph>. If
the port is different than the default 26000, also add the option
<codeph>-catalog_service_port=<varname>port</varname></codeph>.
</p>
</li>
<li id="mem_limit">
Memory limits. You can limit the amount of memory available to Impala. For example,
to allow Impala to use no more than 70% of system memory, change:
<!-- Note: also needs to be updated for each release to reflect latest /etc/default/impala. -->
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} \
-state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT}}</codeblock>
<p>
to:
</p>
<codeblock>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
-log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
-state_store_host=${IMPALA_STATE_STORE_HOST} \
-be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</codeblock>
<p>
You can specify the memory limit using absolute notation such as
<codeph>500m</codeph> or <codeph>2G</codeph>, or as a percentage of physical
memory such as <codeph>60%</codeph>.
</p>
<note>
Queries that exceed the specified memory limit are aborted. Percentage limits are
based on the physical memory of the machine and do not consider cgroups.
</note>
</li>
<li>
Core dump enablement. To enable core dumps, change:
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</codeblock>
<p>
to:
</p>
<codeblock>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</codeblock>
<note>
The location of core dump files may vary according to your operating system
configuration. Other security settings may prevent Impala from writing core dumps
even when this option is enabled.
</note>
</li>
<li>
Authorization using the open source Sentry plugin. Specify the
<codeph>-server_name</codeph> and <codeph>-authorization_policy_file</codeph>
options as part of the <codeph>IMPALA_SERVER_ARGS</codeph> and
<codeph>IMPALA_STATE_STORE_ARGS</codeph> settings to enable the core Impala support
for authentication. See <xref href="impala_authorization.xml#secure_startup"/> for
details.
</li>
<li>
Auditing for successful or blocked Impala queries, another aspect of security.
Specify the <codeph>-audit_event_log_dir=<varname>directory_path</varname></codeph>
option and optionally the
<codeph>-max_audit_event_log_file_size=<varname>number_of_queries</varname></codeph>
and <codeph>-abort_on_failed_audit_event</codeph> options as part of the
<codeph>IMPALA_SERVER_ARGS</codeph> settings, for each Impala node, to enable and
customize auditing. See <xref href="impala_auditing.xml#auditing"/> for details.
</li>
<li>
Password protection for the Impala web UI, which listens on port 25000 by default.
This feature involves adding some or all of the
<codeph>--webserver_password_file</codeph>,
<codeph>--webserver_authentication_domain</codeph>, and
<codeph>--webserver_certificate_file</codeph> options to the
<codeph>IMPALA_SERVER_ARGS</codeph> and <codeph>IMPALA_STATE_STORE_ARGS</codeph>
settings. See <xref href="impala_security_webui.xml"/> for details.
</li>
<li id="default_query_options">
Another setting you might add to <codeph>IMPALA_SERVER_ARGS</codeph> is:
<codeblock>-default_query_options='<varname>option</varname>=<varname>value</varname>,<varname>option</varname>=<varname>value</varname>,...'
</codeblock>
These options control the behavior of queries performed by this
<cmdname>impalad</cmdname> instance. The option values you specify here override the
default values for <xref href="impala_query_options.xml#query_options">Impala query
options</xref>, as shown by the <codeph>SET</codeph> statement in
<cmdname>impala-shell</cmdname>.
</li>
<!-- Removing this reference now that the options are de-emphasized / desupported in Impala 2.3 and up.
<li rev="1.2">
Options for resource management, in conjunction with the YARN component. These options
include <codeph>-enable_rm</codeph> and <codeph>-cgroup_hierarchy_path</codeph>.
<ph rev="1.4.0">Additional options to help fine-tune the resource estimates are
<codeph>-—rm_always_use_defaults</codeph>,
<codeph>-—rm_default_memory=<varname>size</varname></codeph>, and
<codeph>-—rm_default_cpu_cores</codeph>.</ph> For details about these options, see
<xref href="impala_resource_management.xml#rm_options"/>. See
<xref href="impala_resource_management.xml#resource_management"/> for information about resource
management in general.
</li>
-->
<li>
During troubleshooting, <keyword keyref="support_org"/> might direct you to change other values,
particularly for <codeph>IMPALA_SERVER_ARGS</codeph>, to work around issues or
gather debugging information.
</li>
</ul>
<!-- Removing this reference now that the options are de-emphasized / desupported in Impala 2.3 and up.
<p conref="impala_resource_management.xml#rm_options/resource_management_impalad_options"/>
-->
<note>
<p>
These startup options for the <cmdname>impalad</cmdname> daemon are different from
the command-line options for the <cmdname>impala-shell</cmdname> command. For the
<cmdname>impala-shell</cmdname> options, see
<xref href="impala_shell_options.xml#shell_options"/>.
</p>
</note>
</conbody>
</concept>
<concept audience="hidden" id="config_options_statestored_details">
<title>Configuration Options for statestored Daemon</title>
<conbody>
<p></p>
</conbody>
</concept>
<concept audience="hidden" id="config_options_catalogd_details">
<title>Configuration Options for catalogd Daemon</title>
<conbody>
<p></p>
</conbody>
</concept>
</concept>
<concept id="config_options_checking">
<title>Checking the Values of Impala Configuration Options</title>
<conbody>
<p>
You can check the current runtime value of all these settings through the Impala web
interface, available by default at
<codeph>http://<varname>impala_hostname</varname>:25000/varz</codeph> for the
<cmdname>impalad</cmdname> daemon,
<codeph>http://<varname>impala_hostname</varname>:25010/varz</codeph> for the
<cmdname>statestored</cmdname> daemon, or
<codeph>http://<varname>impala_hostname</varname>:25020/varz</codeph> for the
<cmdname>catalogd</cmdname> daemon.
</p>
</conbody>
</concept>
<concept id="config_options_impalad">
<title>Startup Options for impalad Daemon</title>
<conbody>
<p>
The <codeph>impalad</codeph> daemon implements the main Impala service, which performs
query processing and reads and writes the data files.
</p>
</conbody>
</concept>
<concept id="config_options_statestored">
<title>Startup Options for statestored Daemon</title>
<conbody>
<p>
The <cmdname>statestored</cmdname> daemon implements the Impala statestore service,
which monitors the availability of Impala services across the cluster, and handles
situations such as nodes becoming unavailable or becoming available again.
</p>
</conbody>
</concept>
<concept rev="1.2" id="config_options_catalogd">
<title>Startup Options for catalogd Daemon</title>
<conbody>
<p>
The <cmdname>catalogd</cmdname> daemon implements the Impala catalog service, which
broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts
data, or performs other kinds of DDL and DML operations.
</p>
<p conref="../shared/impala_common.xml#common/load_catalog_in_background"/>
</conbody>
</concept>
</concept>