<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="ver" id="known_issues">
<title><ph audience="standalone">Known Issues and Workarounds in Impala</ph><ph audience="integrated">Apache Impala (incubating) Known Issues</ph></title>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Release Notes"/>
<data name="Category" value="Known Issues"/>
<data name="Category" value="Troubleshooting"/>
<data name="Category" value="Upgrading"/>
<data name="Category" value="Administrators"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p>
The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
whether a fix is in the pipeline.
</p>
<note>
The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
you are experiencing has already been reported, or which release an issue is fixed in, search on the
<xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org JIRA tracker</xref>.
</note>
<p outputclass="toc inpage"/>
<p>
For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
</p>
<!-- Use as a template for new issues.
<concept id="">
<title></title>
<conbody>
<p>
</p>
<p><b>Bug:</b> <xref keyref=""></xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> </p>
<p><b>Workaround:</b> </p>
</conbody>
</concept>
-->
</conbody>
<!-- New known issues for Impala 2.3.
Title: Server-to-server SSL and Kerberos do not work together
Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2598
Severity: Medium. Server-to-server SSL is practically unusable but this is a new feature.
Workaround: No known workaround.
Title: Queries may hang on server-to-server exchange errors
Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2592
Severity: Low. This does not occur frequently.
Workaround: No known workaround.
Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
Description: Incremental stats use up about 400 bytes per partition X column. So for a table with 20K partitions and 100 columns this is about 800 MB. When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
Severity: Low. This does not occur frequently.
Workaround: Reduce the number of partitions.
More from the JIRA report of blocker/critical issues:
IMPALA-2093
Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
IMPALA-1652
Incorrect results with basic predicate on CHAR typed column.
IMPALA-1459
Incorrect assignment of predicates through an outer join in an inline view.
IMPALA-2665
Incorrect assignment of On-clause predicate inside inline view with an outer join.
IMPALA-2603
Crash: impala::Coordinator::ValidateCollectionSlots
IMPALA-2375
Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
IMPALA-1862
Invalid bool value not reported as a scanner error
IMPALA-1792
ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
IMPALA-1578
Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
IMPALA-2643
Duplicated column in inline view causes dropping null slots during scan
IMPALA-2005
A failed CTAS does not drop the table if the insert fails.
IMPALA-1821
Casting scenarios with invalid/inconsistent results
Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
https://issues.apache.org/jira/browse/IMPALA-2665 - Already have
https://issues.apache.org/jira/browse/IMPALA-2643 - Already have
https://issues.apache.org/jira/browse/IMPALA-1459 - Already have
https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
-->
<concept id="known_issues_crash">
<title>Impala Known Issues: Crashes and Hangs</title>
<conbody>
<p>
These issues can cause Impala to quit or become unresponsive.
</p>
</conbody>
<concept id="IMPALA-4828">
<title>Altering Kudu table schema outside of Impala may result in crash on read</title>
<conbody>
<p>
Creating a table in Impala, changing the column schema outside of Impala,
and then reading again in Impala may result in a crash. Neither Impala nor
the Kudu client validates the schema immediately before reading, so Impala may attempt to
dereference pointers that aren't there. This happens if a string column is dropped
and then a new, non-string column is added with the old string column's name.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Workaround:</b> Run the statement <codeph>REFRESH <varname>table_name</varname></codeph>
after any occasion when the table structure, such as the number, names, or data types
of columns, is modified outside of Impala using the Kudu API.
</p>
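<p>
As an illustration of the workaround, the following statement reloads the table metadata.
The table name <codeph>kudu_events</codeph> is hypothetical; substitute the name of the
table whose schema was changed through the Kudu API.
</p>
<codeblock>
-- Hypothetical table name, used only for illustration.
REFRESH kudu_events;
</codeblock>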
</conbody>
</concept>
<concept id="IMPALA-1972" rev="IMPALA-1972">
<title>Queries that take a long time to plan can cause webserver to block other queries</title>
<conbody>
<p>
Trying to get the details of a query through the debug web page
while the query is planning will block new queries that had not
started when the web page was requested. The web UI becomes
unresponsive until the planning phase is finished.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1972">IMPALA-1972</xref>
</p>
<p>
<b>Severity:</b> High
</p>
</conbody>
</concept>
<concept id="IMPALA-4595">
<title>Linking IR UDF module to main module crashes Impala</title>
<conbody>
<p>
A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash
when executed.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
<p><b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead of a
<codeph>.ll</codeph> IR module.</p>
</conbody>
</concept>
<concept id="IMPALA-3069" rev="IMPALA-3069">
<title>Setting BATCH_SIZE query option too large can cause a crash</title>
<conbody>
<p>
Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
columns, could cause a memory allocation of more than 2 GB resulting in a crash.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3069">IMPALA-3069</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
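<p>
On releases without the fix, one way to avoid the problem might be to keep <codeph>BATCH_SIZE</codeph>
at a modest value. The sketch below assumes that a value of 0 selects the built-in default batch
size; verify this against the documentation for your release before relying on it.
</p>
<codeblock>
-- Assumption: 0 means "use the default batch size".
set batch_size=0;
</codeblock>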
</conbody>
</concept>
<concept id="IMPALA-3441" rev="IMPALA-3441">
<title>Impala should not crash for invalid avro serialized data</title>
<conbody>
<p>
Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3441">IMPALA-3441</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala262"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-2592" rev="IMPALA-2592">
<title>Queries may hang on server-to-server exchange errors</title>
<conbody>
<p>
The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> function does not close the channel on an error. As a result, the node on
the other side of the channel waits indefinitely and the query hangs.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2592">IMPALA-2592</xref>
</p>
<p>
<b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
</p>
</conbody>
</concept>
<concept id="IMPALA-2365" rev="IMPALA-2365">
<title>Impalad is crashing if udf jar is not available in hdfs location for first time</title>
<conbody>
<p>
If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
issued, the <cmdname>impalad</cmdname> daemon crashes.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2365">IMPALA-2365</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_performance">
<title id="ki_performance">Impala Known Issues: Performance</title>
<conbody>
<p>
These issues involve the performance of operations such as queries or DDL statements.
</p>
</conbody>
<concept id="IMPALA-1480" rev="IMPALA-1480">
<!-- Not part of Alex's spreadsheet. Spreadsheet has IMPALA-1423 which mentions it's similar to this one but not a duplicate. -->
<title>Slow DDL statements for tables with large number of partitions</title>
<conbody>
<p>
DDL statements for tables with a large number of partitions might be slow.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1480">IMPALA-1480</xref>
</p>
<p>
<b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_usability">
<title id="ki_usability">Impala Known Issues: Usability</title>
<conbody>
<p>
These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
</p>
</conbody>
<concept id="IMPALA-4570">
<title>Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string</title>
<conbody>
<p>
For example, this issue could occur on a system using setuptools version 20.7.0.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
<p><b>Workaround:</b> Change to a setuptools version that does not have <codeph>0.7</codeph> as
a substring.
</p>
</conbody>
</concept>
<concept id="IMPALA-3133" rev="IMPALA-3133">
<title>Unexpected privileges in show output</title>
<conbody>
<p>
Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
not represent a security issue for other statements.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3133">IMPALA-3133</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and <keyword keyref="impala251"/>. Fixes have been issued for some but
not all Impala releases. Check the JIRA for details of fix releases.
</p>
</conbody>
</concept>
<concept id="IMPALA-1776" rev="IMPALA-1776">
<title>Less than 100% progress on completed simple SELECT queries</title>
<conbody>
<p>
Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1776">IMPALA-1776</xref>
</p>
</conbody>
</concept>
<concept id="concept_lmx_dk5_lx">
<title>Unexpected column overflow behavior with INT datatypes</title>
<conbody>
<p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
<p>
<b>Bug:</b>
<xref keyref="IMPALA-3123">IMPALA-3123</xref>
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_drivers">
<title id="ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</title>
<conbody>
<p>
These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
in languages such as Java or C++.
</p>
</conbody>
<concept id="IMPALA-1792" rev="IMPALA-1792">
<!-- Not part of Alex's spreadsheet -->
<title>ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</title>
<conbody>
<p>
If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the
columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
<codeph>NULL</codeph>.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1792">IMPALA-1792</xref>
</p>
<p>
<b>Workaround:</b> Fetch columns in the same order they are defined in the table.
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_security">
<title id="ki_security">Impala Known Issues: Security</title>
<conbody>
<p>
These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
redaction.
</p>
</conbody>
<concept id="renewable_kerberos_tickets">
<!-- Not part of Alex's spreadsheet. Not associated with a JIRA number AFAIK. -->
<title>Kerberos tickets must be renewable</title>
<conbody>
<p>
In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
</p>
<p>
<b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
renewable tickets.
</p>
</conbody>
</concept>
<!-- To do: Fixed in 2.5.0, 2.3.2. Commenting out until I see how it can fix into "known issues now fixed" convention.
That set of fix releases looks incomplete so probably have to do some detective work with the JIRA.
https://issues.apache.org/jira/browse/IMPALA-2598
<concept id="IMPALA-2598" rev="IMPALA-2598">
<title>Server-to-server SSL and Kerberos do not work together</title>
<conbody>
<p>
If SSL is enabled between internal Impala components (with <codeph>ssl_client_ca_certificate</codeph>), and Kerberos
authentication is used between servers, the cluster fails to start.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2598">IMPALA-2598</xref>
</p>
<p>
<b>Workaround:</b> Do not use the new <codeph>ssl_client_ca_certificate</codeph> setting on Kerberos-enabled clusters until this
issue is resolved.
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala232"/>.</p>
</conbody>
</concept>
-->
</concept>
<!--
<concept id="known_issues_supportability">
<title id="ki_supportability">Impala Known Issues: Supportability</title>
<conbody>
<p>
These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
shown in monitoring applications.
</p>
</conbody>
</concept>
-->
<concept id="known_issues_resources">
<title id="ki_resources">Impala Known Issues: Resources</title>
<conbody>
<p>
These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
features.
</p>
</conbody>
<concept id="IMPALA-5605">
<title>Configuration to prevent crashes caused by thread resource limits</title>
<conbody>
<p>
Impala could encounter a serious error due to resource usage under very high concurrency.
The error message is similar to:
</p>
<codeblock><![CDATA[
F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
]]>
</codeblock>
<p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Workaround:</b>
To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
daemon with the following settings:
</p>
<codeblock>
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
</codeblock>
<p>
Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
</p>
<codeblock>
impala soft nproc 262144
impala hard nproc 262144
</codeblock>
</conbody>
</concept>
<concept id="flatbuffers_mem_usage">
<title>Memory usage when compact_catalog_topic flag enabled</title>
<conbody>
<p>
The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref>
can cause an increase in size of the updates to Impala catalog metadata
that are broadcast to the <cmdname>impalad</cmdname> daemons
by the <cmdname>statestored</cmdname> daemon.
The increase in catalog update topic size results in higher CPU and network
utilization. By default, the increase in topic size is about 5-7%. If the
<codeph>compact_catalog_topic</codeph> flag is used, the
size increase is more substantial, with a topic size approximately twice as
large as in previous versions.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref></p>
<p><b>Severity:</b> Medium</p>
<p>
<b>Workaround:</b> Consider leaving the <codeph>compact_catalog_topic</codeph>
configuration setting at its default value of <codeph>false</codeph> until
this issue is resolved.
</p>
<p><b>Resolution:</b> A fix is in the pipeline. Check the status of
<xref keyref="IMPALA-5500">IMPALA-5500</xref> for the release where the fix is available.</p>
</conbody>
</concept>
<concept id="IMPALA-2294">
<title>Kerberos initialization errors due to high memory usage</title>
<conbody>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
<p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Workaround:</b></p>
<p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="vm_overcommit_memory_end"/>
</conbody>
</concept>
<concept id="drop_table_purge_s3a">
<title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
<conbody>
<p>
A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data files
behind, if the table directory and the data files were created with a combination of
<cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> The underlying issue with the S3A connector depends on the resolution of <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.</p>
</conbody>
</concept>
<concept id="catalogd_heap">
<title>Impala catalogd heap issues when upgrading to <keyword keyref="impala25"/></title>
<conbody>
<p>
The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher:
</p>
<ul>
<li>
<p>
Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default heap size, which is the smaller of 1/4th of the
physical memory or 32 GB.
</p>
</li>
<li>
<p>
Starting with <keyword keyref="impala250"/>, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
</p>
</li>
</ul>
<p>
For example, on a host with 128 GB of physical memory, the <cmdname>catalogd</cmdname> heap decreases from 32 GB to 4 GB. This can
result in out-of-memory errors in <cmdname>catalogd</cmdname>, leading to query failures.
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
<!-- Including full details here via conref, for benefit of PDF readers or anyone else
who might have trouble seeing or following the link. -->
</p>
<p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
</conbody>
</concept>
<concept id="IMPALA-3509" rev="IMPALA-3509">
<title>Breakpad minidumps can be very large when the thread count is high</title>
<conbody>
<p>
The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3509">IMPALA-3509</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
than 20 MB.
</p>
</conbody>
</concept>
<concept id="IMPALA-3662" rev="IMPALA-3662">
<title>Parquet scanner memory increase after IMPALA-2736</title>
<conbody>
<p>
The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading
Parquet files.
</p>
<p>
<keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
<ul>
<li>
<p>
Very wide rows due to projecting many columns in a scan.
</p>
</li>
<li>
<p>
Very large rows due to big column values, for example, long strings or nested collections with many items.
</p>
</li>
<li>
<p>
Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
plan nodes.
</p>
</li>
</ul>
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3662">IMPALA-3662</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
<ul>
<li>
Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
</li>
<li>
Reduce the batch size, for example: <codeph>set batch_size=512</codeph>
</li>
<li>
Increase the memory limit, for example: <codeph>set mem_limit=64g</codeph>
</li>
</ul>
</p>
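<p>
A minimal sketch of how these options might be combined in a single <cmdname>impala-shell</cmdname> session. The values are the
illustrative ones from the list above, and the table and column names are hypothetical; appropriate settings depend on your
workload and cluster.
</p>
<codeblock>
-- Query options apply to subsequent statements in the same session.
set num_scanner_threads=30;
set batch_size=512;
set mem_limit=64g;
-- Hypothetical Parquet table and columns, used only for illustration.
select c1, c2, c3 from wide_parquet_table where c1 is not null;
</codeblock>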
</conbody>
</concept>
<concept id="IMPALA-691" rev="IMPALA-691">
<title>Process mem limit does not account for the JVM's memory usage</title>
<!-- Supposed to be resolved for Impala 2.3.0. -->
<conbody>
<p>
Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
<cmdname>impalad</cmdname> daemon.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-691">IMPALA-691</xref>
</p>
<p>
<b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
</p>
</conbody>
</concept>
<concept id="IMPALA-2375" rev="IMPALA-2375">
<!-- Not part of Alex's spreadsheet -->
<title>Fix issues with the legacy join and agg nodes using --enable_partitioned_hash_join=false and --enable_partitioned_aggregation=false</title>
<conbody>
<p></p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2375">IMPALA-2375</xref>
</p>
<p>
<b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_correctness">
<title id="ki_correctness">Impala Known Issues: Correctness</title>
<conbody>
<p>
These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
</p>
</conbody>
<concept id="IMPALA-4513">
<title>ABS(n) where n is the lowest bound for the int types returns negative values</title>
<conbody>
<p>
If the <codeph>abs()</codeph> function evaluates a number that is right at the lower bound for
an integer data type, the positive result cannot be represented in the same type, and the
result is returned as a negative number. For example, <codeph>abs(-128)</codeph> returns -128
because the argument is interpreted as a <codeph>TINYINT</codeph> and the return value is also
a <codeph>TINYINT</codeph>.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
<codeph>abs(<varname>tinyint_col</varname>)</codeph> as <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.</p>
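<p>
For example, the following statements contrast the two forms, using the literal from the description above. This is a sketch
only; the exact result type might differ between releases.
</p>
<codeblock>
-- The argument is treated as a TINYINT, so the positive result cannot be represented.
select abs(-128);
-- Widening the argument first lets the positive result fit in the return type.
select abs(cast(-128 as smallint));
</codeblock>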
</conbody>
</concept>
<concept id="IMPALA-4266">
<title>Java udf expression returning string in group by can give incorrect results.</title>
<conbody>
<p>
If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a string value,
the UDF could return an incorrect result.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
<p><b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF with an
empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
<codeph>concat(my_hive_udf(), '')</codeph>.
</p>
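<p>
The following sketch applies the rewrite to a complete query. The table name <codeph>t1</codeph> and column name
<codeph>s</codeph> are hypothetical, and <codeph>my_hive_udf</codeph> stands for any Java UDF that returns a string.
</p>
<codeblock>
-- Wrap the Java UDF call in concat() wherever it appears, including the GROUP BY clause.
select concat(my_hive_udf(s), '') as k, count(*)
from t1
group by concat(my_hive_udf(s), '');
</codeblock>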
</conbody>
</concept>
<concept id="IMPALA-3084" rev="IMPALA-3084">
<title>Incorrect assignment of NULL checking predicate through an outer join of a nested collection.</title>
<conbody>
<p>
A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
<codeph>&lt;=&gt;</codeph> operator) in the <codeph>WHERE</codeph> clause.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3084">IMPALA-3084</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-3094" rev="IMPALA-3094">
<title>Incorrect result due to constant evaluation in query with outer join</title>
<conbody>
<p>
An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
another join clause. For example:
</p>
<codeblock><![CDATA[
explain SELECT 1 FROM alltypestiny a1
INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
| |
| 00:EMPTYSET |
+---------------------------------------------------------+
]]>
</codeblock>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3094">IMPALA-3094</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Resolution:</b>
</p>
<p>
<b>Workaround:</b>
</p>
</conbody>
</concept>
<concept id="IMPALA-3126" rev="IMPALA-3126">
<title>Incorrect assignment of an inner join On-clause predicate through an outer join.</title>
<conbody>
<p>
Impala may return incorrect results for queries that have the following properties:
</p>
<ul>
<li>
<p>
There is an INNER JOIN following a series of OUTER JOINs.
</p>
</li>
<li>
<p>
The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
preceding OUTER JOINs.
</p>
</li>
</ul>
<p>
The following query demonstrates the issue:
</p>
<codeblock>
select 1 from functional.alltypes a left outer join
functional.alltypes b on a.id = b.id left outer join
functional.alltypes c on b.id = c.id right outer join
functional.alltypes d on c.id = d.id inner join functional.alltypes e
on b.int_col = c.int_col;
</codeblock>
<p>
The following listing shows the incorrect <codeph>EXPLAIN</codeph> plan:
</p>
<codeblock><![CDATA[
+-----------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
| |
| 14:EXCHANGE [UNPARTITIONED] |
| | |
| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
| | |
| |--13:EXCHANGE [BROADCAST] |
| | | |
| | 04:SCAN HDFS [functional.alltypes e] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
| | hash predicates: c.id = d.id |
| | runtime filters: RF000 <- d.id |
| | |
| |--12:EXCHANGE [HASH(d.id)] |
| | | |
| | 03:SCAN HDFS [functional.alltypes d] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
| | hash predicates: b.id = c.id |
| | other predicates: b.int_col = c.int_col <--- incorrect placement; should be at node 07 or 08
| | runtime filters: RF001 <- c.int_col |
| | |
| |--11:EXCHANGE [HASH(c.id)] |
| | | |
| | 02:SCAN HDFS [functional.alltypes c] |
| | partitions=24/24 files=24 size=478.45KB |
| | runtime filters: RF000 -> c.id |
| | |
| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
| | hash predicates: b.id = a.id |
| | runtime filters: RF002 <- a.id |
| | |
| |--10:EXCHANGE [HASH(a.id)] |
| | | |
| | 00:SCAN HDFS [functional.alltypes a] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 09:EXCHANGE [HASH(b.id)] |
| | |
| 01:SCAN HDFS [functional.alltypes b] |
| partitions=24/24 files=24 size=478.45KB |
| runtime filters: RF001 -> b.int_col, RF002 -> b.id |
+-----------------------------------------------------------+
]]>
</codeblock>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3126">IMPALA-3126</xref>
</p>
<p>
<b>Severity:</b> High
</p>
<p>
<b>Workaround:</b> For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause
predicate in the <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to
<codeph>INNER JOIN</codeph>s (if the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to
fix the problematic query above:
</p>
<codeblock><![CDATA[
select 1 from functional.alltypes a
left outer join functional.alltypes b
on a.id = b.id
left outer join functional.alltypes c
on b.id = c.id
right outer join functional.alltypes d
on c.id = d.id
inner join functional.alltypes e
where b.int_col = c.int_col
+-----------------------------------------------------------+
| Explain String |
+-----------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=480.04MB VCores=4 |
| |
| 14:EXCHANGE [UNPARTITIONED] |
| | |
| 08:NESTED LOOP JOIN [CROSS JOIN, BROADCAST] |
| | |
| |--13:EXCHANGE [BROADCAST] |
| | | |
| | 04:SCAN HDFS [functional.alltypes e] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 07:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
| | hash predicates: c.id = d.id |
| | other predicates: b.int_col = c.int_col <-- correct assignment
| | runtime filters: RF000 <- d.id |
| | |
| |--12:EXCHANGE [HASH(d.id)] |
| | | |
| | 03:SCAN HDFS [functional.alltypes d] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 06:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] |
| | hash predicates: b.id = c.id |
| | |
| |--11:EXCHANGE [HASH(c.id)] |
| | | |
| | 02:SCAN HDFS [functional.alltypes c] |
| | partitions=24/24 files=24 size=478.45KB |
| | runtime filters: RF000 -> c.id |
| | |
| 05:HASH JOIN [RIGHT OUTER JOIN, PARTITIONED] |
| | hash predicates: b.id = a.id |
| | runtime filters: RF001 <- a.id |
| | |
| |--10:EXCHANGE [HASH(a.id)] |
| | | |
| | 00:SCAN HDFS [functional.alltypes a] |
| | partitions=24/24 files=24 size=478.45KB |
| | |
| 09:EXCHANGE [HASH(b.id)] |
| | |
| 01:SCAN HDFS [functional.alltypes b] |
| partitions=24/24 files=24 size=478.45KB |
| runtime filters: RF001 -> b.id |
+-----------------------------------------------------------+
]]>
</codeblock>
</conbody>
</concept>
<concept id="IMPALA-3006" rev="IMPALA-3006">
<title>Impala may use incorrect bit order with BIT_PACKED encoding</title>
<conbody>
<p>
Impala implements Parquet <codeph>BIT_PACKED</codeph> encoding as LSB (least significant bit) first, whereas the Parquet specification defines it as MSB (most significant bit) first.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3006">IMPALA-3006</xref>
</p>
<p>
<b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
in Parquet 2.0.
</p>
</conbody>
</concept>
<concept id="IMPALA-3082" rev="IMPALA-3082">
<title>BST between 1972 and 1995</title>
<conbody>
<p>
The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
as:
</p>
<codeblock>
select
extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
</codeblock>
<p>
<b>Bug:</b> <xref keyref="IMPALA-3082">IMPALA-3082</xref>
</p>
<p>
<b>Severity:</b> High
</p>
</conbody>
</concept>
<concept id="IMPALA-1170" rev="IMPALA-1170">
<title>parse_url() returns incorrect result if @ character in URL</title>
<conbody>
<p>
If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
the hostname field.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1170">IMPALA-1170</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-2422" rev="IMPALA-2422">
<title>% escaping does not work correctly when it occurs at the end in a LIKE clause</title>
<conbody>
<p>
If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
does not match a <codeph>%</codeph> final character of the LHS argument.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2422">IMPALA-2422</xref>
</p>
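<p>
As a hedged illustration of the pattern described above, a comparison of the following shape is the kind that is affected. The literal
values are hypothetical, and the exact backslash escaping can depend on how your client quotes string literals:
</p>
<codeblock>
-- The pattern ends with an escaped percent sign, intended to match a literal % at the end of the left-hand string.
select 'discount 10%' like 'discount 10\%';
</codeblock>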
</conbody>
</concept>
<concept id="IMPALA-397" rev="IMPALA-397">
<title>ORDER BY rand() does not work.</title>
<conbody>
<p>
Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
involving a call to <codeph>rand()</codeph> does not actually randomize the results.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-397">IMPALA-397</xref>
</p>
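<p>
For illustration, a query of the following shape is affected. The table <codeph>t1</codeph> and column <codeph>id</codeph> are
hypothetical names:
</p>
<codeblock>
-- Intended to return 10 rows in random order. Because rand() is computed early in the query,
-- the ordering is not actually randomized.
select id from t1 order by rand() limit 10;
</codeblock>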
</conbody>
</concept>
<concept id="IMPALA-2643" rev="IMPALA-2643">
<title>Duplicated column in inline view causes dropping null slots during scan</title>
<conbody>
<p>
If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2643">IMPALA-2643</xref>
</p>
<p>
<b>Workaround:</b> Avoid selecting the same column twice within an inline view.
</p>
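<p>
A minimal sketch of the affected pattern and the workaround, assuming a hypothetical table <codeph>t1</codeph> with a nullable column
<codeph>c1</codeph>:
</p>
<codeblock>
-- Affected pattern: the inline view selects c1 twice, so rows where c1 is NULL could be omitted from the count.
select count(*) from (select c1 as x, c1 as y from t1) v;

-- Workaround: select the column only once inside the inline view.
select count(*) from (select c1 as x from t1) v;
</codeblock>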
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-1459" rev="IMPALA-1459">
<!-- Not part of Alex's spreadsheet -->
<title>Incorrect assignment of predicates through an outer join in an inline view.</title>
<conbody>
<p>
A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
from the <codeph>ON</codeph> clause incorrectly.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1459">IMPALA-1459</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-2603" rev="IMPALA-2603">
<title>Crash: impala::Coordinator::ValidateCollectionSlots</title>
<conbody>
<p>
A query could encounter a serious error if it includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
subqueries.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2603">IMPALA-2603</xref>
</p>
</conbody>
</concept>
<concept id="IMPALA-2665" rev="IMPALA-2665">
<title>Incorrect assignment of On-clause predicate inside inline view with an outer join.</title>
<conbody>
<p>
A query might return incorrect results due to wrong predicate assignment in the following scenario:
</p>
<ol>
<li>
There is an inline view that contains an outer join
</li>
<li>
That inline view is joined with another table in the enclosing query block
</li>
<li>
That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
the inline view
</li>
</ol>
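<p>
The scenario above can be sketched as follows. All table and column names are hypothetical, and the query only illustrates the general
shape rather than a confirmed reproduction of the bug:
</p>
<codeblock>
select t3.id
from t3
  -- (2) The inline view is joined with another table in the enclosing query block.
  left outer join
    -- (1) The inline view contains an outer join.
    (select a.id, b.int_col
     from t1 a left outer join t2 b on a.id = b.id) v
  -- (3) The ON clause includes a predicate that references only v.int_col,
  --     which originates from the outer-joined table t2 inside the inline view.
  on t3.id = v.id and v.int_col = 10;
</codeblock>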
<p>
<b>Bug:</b> <xref keyref="IMPALA-2665">IMPALA-2665</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-2144" rev="IMPALA-2144">
<title>Wrong assignment of having clause predicate across outer join</title>
<conbody>
<p>
In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
clause might be applied at the wrong stage of query processing, leading to incorrect results.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2144">IMPALA-2144</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-2093" rev="IMPALA-2093">
<title>Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</title>
<conbody>
<p>
A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
SUM(...))</codeph>, could return incorrect results.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2093">IMPALA-2093</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_metadata">
<title id="ki_metadata">Impala Known Issues: Metadata</title>
<conbody>
<p>
These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
</p>
</conbody>
<concept id="IMPALA-2648" rev="IMPALA-2648">
<title>Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats</title>
<conbody>
<p>
Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
this metadata can exceed the 2 GB Java array size limit, leading to a <codeph>catalogd</codeph> crash.
</p>
<p>
<b>Bugs:</b> <xref keyref="IMPALA-2647">IMPALA-2647</xref>,
<xref keyref="IMPALA-2648">IMPALA-2648</xref>,
<xref keyref="IMPALA-2649">IMPALA-2649</xref>
</p>
<p>
<b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
scalability of incremental stats computation is a continuing work item.
</p>
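<p>
As a sketch of this workaround, assuming a hypothetical partitioned table named <codeph>sales_data</codeph> that previously used
incremental stats:
</p>
<codeblock>
-- Drop the existing statistics before switching from incremental stats to full stats.
drop stats sales_data;

-- Compute full stats for the table instead of running COMPUTE INCREMENTAL STATS.
compute stats sales_data;
</codeblock>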
</conbody>
</concept>
<concept id="IMPALA-1420" rev="IMPALA-1420 2.0.0">
<!-- Not part of Alex's spreadsheet -->
<title>Can't update stats manually via alter table after upgrading to <keyword keyref="impala20"/></title>
<conbody>
<p></p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1420">IMPALA-1420</xref>
</p>
<p>
<b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics manually by setting the <codeph>numRows</codeph> property, you must also
enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
set both properties with a single <codeph>ALTER TABLE</codeph> statement:
</p>
<codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
<p>
<b>Resolution:</b> The underlying cause is the issue
<xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into a release of <keyword keyref="distro"/>.
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_interop">
<title id="ki_interop">Impala Known Issues: Interoperability</title>
<conbody>
<p>
These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
and file formats.
</p>
</conbody>
<!-- Opened based on internal JIRA. Not part of Alex's spreadsheet AFAIK. -->
<concept id="describe_formatted_avro">
<title>DESCRIBE FORMATTED gives error on Avro table</title>
<conbody>
<p>
This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
</p>
<p>
As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate
the Hive metastore database with the correct column definitions.
</p>
<note type="warning">
Only use this for external tables, or Impala will remove the data files. In case of an internal table, set it to external first:
<codeblock>
ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
</codeblock>
(The part in parentheses is case sensitive.) Make sure to pick the right choice between internal and external when recreating the
table. See <xref href="impala_tables.xml#tables"/> for the differences between internal and external tables.
</note>
<p>
<b>Severity:</b> High
</p>
</conbody>
</concept>
<concept id="IMP-469">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
<title>Deviation from Hive behavior: Impala does not do implicit casts between string and numeric and boolean types.</title>
<conbody>
<p>
<b>Anticipated Resolution:</b> None
</p>
<p>
<b>Workaround:</b> Use explicit casts.
</p>
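<p>
For example, a brief sketch that uses literal values only:
</p>
<codeblock>
-- Convert the string explicitly before using it in arithmetic.
select cast('123' as int) + 1;

-- Convert the number explicitly before passing it to a string function.
select concat('id-', cast(123 as string));
</codeblock>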
</conbody>
</concept>
<concept id="IMP-175">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
<title>Deviation from Hive behavior: Out of range float/double values are returned as maximum allowed value of type (Hive returns NULL)</title>
<conbody>
<p>
Impala behavior differs from Hive with respect to out-of-range float/double values. Impala returns such values as the maximum
allowed value of the type, whereas Hive returns <codeph>NULL</codeph>.
</p>
<p>
<b>Workaround:</b> None
</p>
</conbody>
</concept>
<concept id="flume_writeformat_text">
<!-- Not part of Alex's spreadsheet. From a non-public JIRA. -->
<title>Configuration needed for Flume to be compatible with Impala</title>
<conbody>
<p>
For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
<codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
Impala or Hive.
</p>
<p>
<b>Resolution:</b> This information has been requested to be added to the upstream Flume documentation.
</p>
</conbody>
</concept>
<concept id="IMPALA-635" rev="IMPALA-635">
<!-- Not part of Alex's spreadsheet -->
<title>Avro Scanner fails to parse some schemas</title>
<conbody>
<p>
Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-635">IMPALA-635</xref>
</p>
<p>
<b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
instead of <codeph>["string", "null"]</codeph>.
</p>
<p>
<b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
crashing issue is resolved.
</p>
</conbody>
</concept>
<concept id="IMPALA-1024" rev="IMPALA-1024">
<!-- Not part of Alex's spreadsheet -->
<title>Impala BE cannot parse Avro schema that contains a trailing semi-colon</title>
<conbody>
<p>
If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1024">IMPALA-1024</xref>
</p>
<p>
<b>Workaround:</b> Remove the trailing semicolon from the Avro schema.
</p>
</conbody>
</concept>
<concept id="IMPALA-2154" rev="IMPALA-2154">
<!-- Not part of Alex's spreadsheet -->
<title>Fix decompressor to allow parsing gzips with multiple streams</title>
<conbody>
<p>
Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
streams, the Impala query only processes the data from the first stream.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2154">IMPALA-2154</xref>
</p>
<p>
<b>Workaround:</b> Use a different gzip tool to compress the file into a single-stream file.
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-1578" rev="IMPALA-1578">
<!-- Not part of Alex's spreadsheet -->
<title>Impala incorrectly handles text data when the new line character \n\r is split between different HDFS blocks</title>
<conbody>
<p>
If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
the row following the <codeph>\n\r</codeph> pair twice.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1578">IMPALA-1578</xref>
</p>
<p>
<b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-1862" rev="IMPALA-1862">
<!-- Not part of Alex's spreadsheet -->
<title>Invalid bool value not reported as a scanner error</title>
<conbody>
<p>
In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
overlooking the presence of invalid data.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1862">IMPALA-1862</xref>
</p>
</conbody>
</concept>
<concept id="IMPALA-1652" rev="IMPALA-1652">
<!-- To do: Isn't this more a correctness issue? -->
<title>Incorrect results with basic predicate on CHAR typed column.</title>
<conbody>
<p>
When comparing a <codeph>CHAR</codeph> column value to a string literal, the literal value is not blank-padded and so the
comparison might fail when it should match.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1652">IMPALA-1652</xref>
</p>
<p>
<b>Workaround:</b> Use the <codeph>RPAD()</codeph> function to blank-pad literals compared with <codeph>CHAR</codeph> columns to
the expected length.
</p>
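<p>
A minimal sketch of this workaround, assuming a hypothetical table <codeph>t1</codeph> with a column <codeph>c</codeph> declared as
<codeph>CHAR(10)</codeph>:
</p>
<codeblock>
-- Pad the literal with spaces to the declared length of the CHAR column before comparing.
select * from t1 where c = rpad('abc', 10, ' ');
</codeblock>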
</conbody>
</concept>
</concept>
<concept id="known_issues_limitations">
<title>Impala Known Issues: Limitations</title>
<conbody>
<p>
These issues are current limitations of Impala that require evaluation as you plan how to integrate Impala into your data management
workflow.
</p>
</conbody>
<concept id="IMPALA-4551">
<title>Set limits on size of expression trees</title>
<conbody>
<p>
Very deeply nested expressions within queries can exceed internal Impala limits,
leading to excessive memory usage.
</p>
<p><b>Bug:</b> <xref keyref="IMPALA-4551">IMPALA-4551</xref></p>
<p><b>Severity:</b> High</p>
<p><b>Resolution:</b> </p>
<p><b>Workaround:</b> Avoid queries with extremely large expression trees. Setting the query option
<codeph>disable_codegen=true</codeph> may reduce the impact, at a cost of longer query runtime.</p>
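<p>
For example, in <cmdname>impala-shell</cmdname>, the query option can be set for the current session before running the affected query:
</p>
<codeblock>
set disable_codegen=true;
</codeblock>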
</conbody>
</concept>
<concept id="IMPALA-77" rev="IMPALA-77">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
<title>Impala does not support running on clusters with federated namespaces</title>
<conbody>
<p>
Impala does not support running on clusters with federated namespaces. The <codeph>impalad</codeph> process will not start on a
node running such a filesystem based on the <codeph>org.apache.hadoop.fs.viewfs.ViewFs</codeph> class.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-77">IMPALA-77</xref>
</p>
<p>
<b>Anticipated Resolution:</b> Limitation
</p>
<p>
<b>Workaround:</b> Use standard HDFS on all Impala nodes.
</p>
</conbody>
</concept>
</concept>
<concept id="known_issues_misc">
<title>Impala Known Issues: Miscellaneous / Older Issues</title>
<conbody>
<p>
These issues do not fall into one of the above categories or have not been categorized yet.
</p>
</conbody>
<concept id="IMPALA-2005" rev="IMPALA-2005">
<!-- Not part of Alex's spreadsheet -->
<title>A failed CTAS does not drop the table if the insert fails.</title>
<conbody>
<p>
If a <codeph>CREATE TABLE AS SELECT</codeph> operation successfully creates the target table but an error occurs while querying
the source table or copying the data, the new table is left behind rather than being dropped.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-2005">IMPALA-2005</xref>
</p>
<p>
<b>Workaround:</b> Drop the new table manually after a failed <codeph>CREATE TABLE AS SELECT</codeph>.
</p>
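<p>
For example, where <codeph>new_table</codeph> is a hypothetical name for the target table of the failed statement:
</p>
<codeblock>
-- Remove the table left behind by the failed CREATE TABLE AS SELECT.
drop table if exists new_table;
</codeblock>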
</conbody>
</concept>
<concept id="IMPALA-1821" rev="IMPALA-1821">
<!-- Not part of Alex's spreadsheet -->
<title>Casting scenarios with invalid/inconsistent results</title>
<conbody>
<p>
Using a <codeph>CAST()</codeph> function to convert large literal values to smaller types, or to convert special values such as
<codeph>NaN</codeph> or <codeph>Inf</codeph>, produces values not consistent with other database systems. This could lead to
unexpected results from queries.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1821">IMPALA-1821</xref>
</p>
<!-- <p><b>Workaround:</b> Doublecheck that <codeph>CAST()</codeph> operations work as expect. The issue applies to expressions involving literals, not values read from table columns.</p> -->
</conbody>
</concept>
<concept id="IMPALA-1619" rev="IMPALA-1619">
<!-- Not part of Alex's spreadsheet -->
<title>Support individual memory allocations larger than 1 GB</title>
<conbody>
<p>
The largest single block of memory that Impala can allocate during a query is 1 GiB. Therefore, a query could fail or Impala could
crash if a compressed text file resulted in more than 1 GiB of data in uncompressed form, or if a string function such as
<codeph>group_concat()</codeph> returned a value greater than 1 GiB.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-1619">IMPALA-1619</xref>
</p>
<p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala263"/>.</p>
</conbody>
</concept>
<concept id="IMPALA-941" rev="IMPALA-941">
<!-- Not part of Alex's spreadsheet. Maybe this is interop? -->
<title>Impala Parser issue when using fully qualified table names that start with a number.</title>
<conbody>
<p>
A fully qualified table name starting with a number could cause a parsing error. In a name such as <codeph>db.571_market</codeph>,
the decimal point followed by digits is interpreted as a floating-point number.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-941">IMPALA-941</xref>
</p>
<p>
<b>Workaround:</b> Surround each part of the fully qualified name with backticks (<codeph>``</codeph>).
</p>
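<p>
For example, using the table name from the description above:
</p>
<codeblock>
-- Backticks around each part keep .571 from being read as a floating-point literal.
select * from `db`.`571_market`;
</codeblock>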
</conbody>
</concept>
<concept id="IMPALA-532" rev="IMPALA-532">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
<title>Impala should tolerate bad locale settings</title>
<conbody>
<p>
If the <codeph>LC_*</codeph> environment variables specify an unsupported locale, Impala does not start.
</p>
<p>
<b>Bug:</b> <xref keyref="IMPALA-532">IMPALA-532</xref>
</p>
<p>
<b>Workaround:</b> Add <codeph>LC_ALL="C"</codeph> to the environment settings for both the Impala daemon and the Statestore
daemon. See <xref href="impala_config_options.xml#config_options"/> for details about modifying these environment settings.
</p>
<p>
<b>Resolution:</b> Fixing this issue would require an upgrade to Boost 1.47 in the Impala distribution.
</p>
</conbody>
</concept>
<concept id="IMP-1203">
<!-- Not part of Alex's spreadsheet. Perhaps it really is a permanent limitation and nobody is tracking it? -->
<title>Log Level 3 Not Recommended for Impala</title>
<conbody>
<p>
The extensive logging produced by log level 3 can cause serious performance overhead and capacity issues.
</p>
<p>
<b>Workaround:</b> Reduce the log level to its default value of 1, that is, <codeph>GLOG_v=1</codeph>. See
<xref href="impala_logging.xml#log_levels"/> for details about the effects of setting different logging levels.
</p>
</conbody>
</concept>
</concept>
</concept>
</concept>