<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="char" rev="2.0.0">
<title>CHAR Data Type (<keyword keyref="impala20"/> or higher only)</title>
<titlealts audience="PDF">
<navtitle>CHAR</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Data Types"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p rev="2.0.0">
A fixed-length character type, padded with trailing spaces if necessary to achieve the
specified length. If values are longer than the specified length, Impala truncates any
trailing characters.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<p>
In the column definition of a <codeph>CREATE TABLE</codeph> statement:
</p>
<codeblock><varname>column_name</varname> CHAR(<varname>length</varname>)</codeblock>
<p>
The maximum <varname>length</varname> you can specify is 255.
</p>
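<p>
As a minimal sketch of the syntax above, the following statement declares two
<codeph>CHAR</codeph> columns. The table and column names are hypothetical and chosen
only for illustration:
</p>
<codeblock>-- Hypothetical table; names are illustrative only.
CREATE TABLE char_demo
(
  country_code CHAR(2),   -- Every value is returned as exactly 2 characters.
  status_note  CHAR(255)  -- 255 is the maximum length you can specify.
);</codeblock>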
<p>
<b>Semantics of trailing spaces:</b>
</p>
<ul>
<li>
When you store a <codeph>CHAR</codeph> value shorter than the specified length in a
table, queries return the value padded with trailing spaces if necessary; the resulting
value has the same length as specified in the column definition.
</li>
<li>
Leading spaces in <codeph>CHAR</codeph> values are preserved within the data file.
</li>
<li>
If you store a <codeph>CHAR</codeph> value containing trailing spaces in a table, those
trailing spaces are not stored in the data file. When the value is retrieved by a query,
the result could have a different number of trailing spaces. That is, the value includes
however many spaces are needed to pad it to the specified length of the column.
</li>
<li>
If you compare two <codeph>CHAR</codeph> values that differ only in the number of
trailing spaces, those values are considered identical.
</li>
<li>
When comparing or processing <codeph>CHAR</codeph> values:
<ul>
<li>
<codeph>CAST()</codeph> truncates any longer string to fit within
the defined length and pads any shorter string, so values that differ
only in trailing spaces compare as equal. For example:
<codeblock>SELECT CAST('x' AS CHAR(4)) = CAST('x ' AS CHAR(4)); -- Returns TRUE.
</codeblock>
</li>
<li>
If a <codeph>CHAR</codeph> value is shorter than the specified
length, it is padded on the right with spaces until it matches the
specified length.
</li>
<li>
<codeph>CHAR_LENGTH()</codeph> returns the length including any
trailing spaces.
</li>
<li>
<codeph>LENGTH()</codeph> returns the length excluding trailing
spaces.
</li>
<li>
The result of <codeph>CONCAT()</codeph> includes any trailing
spaces.
</li>
</ul>
</li>
</ul>
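<p>
The following queries sketch how these rules might be checked interactively. The
expected results in the comments follow from the rules listed above rather than from
captured output, so verify them on your own Impala version:
</p>
<codeblock>-- A longer string is truncated to the declared length.
SELECT CAST('hello world' AS CHAR(5));           -- Expected: 'hello'

-- A shorter string is padded on the right with spaces.
SELECT CHAR_LENGTH(CAST('ab' AS CHAR(5)));       -- Expected: 5 (padding counted)
SELECT LENGTH(CAST('ab' AS CHAR(5)));            -- Expected: 2 (padding not counted)

-- Per the rules above, the padding should appear before the closing bracket.
SELECT CONCAT('[', CAST('ab' AS CHAR(5)), ']');</codeblock>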
<p conref="../shared/impala_common.xml#common/partitioning_bad"/>
<p conref="../shared/impala_common.xml#common/hbase_no"/>
<p conref="../shared/impala_common.xml#common/parquet_blurb"/>
<ul>
<li>
This type can be read from and written to Parquet files.
</li>
<li>
There is no requirement for a particular level of Parquet.
</li>
<li>
Parquet files generated by Impala and containing this type can be freely interchanged
with other components such as Hive and MapReduce.
</li>
<li>
Any trailing spaces, whether implicitly or explicitly specified, are not written to the
Parquet data files.
</li>
<li>
Parquet data files might contain values that are longer than allowed by the
<codeph>CHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra
trailing characters when it processes those values during a query.
</li>
</ul>
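<p>
A minimal sketch of writing <codeph>CHAR</codeph> values to a Parquet table follows. The
table name is hypothetical, and the explicit <codeph>CAST()</codeph> calls give each
literal the <codeph>CHAR</codeph> type of the column:
</p>
<codeblock>-- Hypothetical table; names are illustrative only.
CREATE TABLE char_parquet_demo (c CHAR(5)) STORED AS PARQUET;

-- 'ab' is shorter than the column; 'abcdefgh' is truncated to 'abcde' by the cast.
INSERT INTO char_parquet_demo VALUES
  (CAST('ab' AS CHAR(5))),
  (CAST('abcdefgh' AS CHAR(5)));

-- Queries return each value padded to the declared length of 5.
SELECT c, CHAR_LENGTH(c) FROM char_parquet_demo;</codeblock>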
<p conref="../shared/impala_common.xml#common/text_blurb"/>
<p>
Text data files might contain values that are longer than allowed for a particular
<codeph>CHAR(<varname>n</varname>)</codeph> column. Any extra trailing characters are
ignored when Impala processes those values during a query. Text data files can also
contain values that are shorter than the defined length limit, and Impala pads them with
trailing spaces up to the specified length. Any text data files produced by Impala
<codeph>INSERT</codeph> statements do not include any trailing blanks for
<codeph>CHAR</codeph> columns.
</p>
<p>
<b>Avro considerations:</b>
</p>
<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
<p>
This type is available using <keyword keyref="impala20_full"/> or higher.
</p>
<p>
Some other database systems make the length specification optional. For Impala, the length
is required.
</p>
<!--
<p>
The Impala maximum length is larger than for the <codeph>CHAR</codeph> data type in Hive.
If a Hive query encounters a <codeph>CHAR</codeph> value longer than 255 during processing,
it silently treats the value as length 255.
</p>
-->
<p conref="../shared/impala_common.xml#common/internals_max_bytes"/>
<p conref="../shared/impala_common.xml#common/added_in_20"/>
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
<p conref="../shared/impala_common.xml#common/udf_blurb_no"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p>
<b>Performance considerations:</b>
</p>
<p>
The <codeph>CHAR</codeph> type currently does not have Impala Codegen support. Prefer
<codeph>VARCHAR</codeph> or <codeph>STRING</codeph> over <codeph>CHAR</codeph>, because
the performance gain from Codegen outweighs the benefits of fixed-width
<codeph>CHAR</codeph>.
</p>
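<p>
As a sketch of this recommendation, columns that might otherwise be declared as
<codeph>CHAR</codeph> can be declared as <codeph>VARCHAR</codeph> or
<codeph>STRING</codeph> instead. The table and column names are hypothetical:
</p>
<codeblock>-- Hypothetical table; names are illustrative only.
CREATE TABLE codegen_friendly_demo
(
  country_code VARCHAR(2),  -- Instead of CHAR(2).
  description  STRING       -- Instead of a wide CHAR(n) column.
);</codeblock>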
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
<p>
The blank-padding behavior requires allocating the maximum length for each value in
memory. For scalability, avoid declaring <codeph>CHAR</codeph> columns that are much
longer than the typical values in that column.
</p>
<p conref="../shared/impala_common.xml#common/blobs_are_strings"/>
<p>
When an expression compares a <codeph>CHAR</codeph> with a <codeph>STRING</codeph> or
<codeph>VARCHAR</codeph>, the <codeph>CHAR</codeph> value is implicitly converted to
<codeph>STRING</codeph> first, with trailing spaces preserved.
</p>
<p>
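Because the padding is preserved, a <codeph>CHAR</codeph> value shorter than its declared
length does not compare equal to the same text held as a <codeph>STRING</codeph> or
<codeph>VARCHAR</codeph>. This behavior differs from other popular database systems. To
get the expected result of <codeph>TRUE</codeph>, cast the expressions on both sides to
<codeph>CHAR</codeph> values of the appropriate length. For example: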
</p>
<codeblock>SELECT CAST('foo ' AS CHAR(5)) = CAST('foo' AS CHAR(3)); -- Returns TRUE.</codeblock>
<p>
This behavior is subject to change in future releases.
</p>
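<p>
The following queries sketch the difference between the two kinds of comparison. The
expected results in the comments are derived from the implicit conversion rule for
<codeph>CHAR</codeph> and <codeph>STRING</codeph> comparisons, not from captured output:
</p>
<codeblock>-- The CHAR value is converted to a STRING with its padding preserved,
-- so it no longer matches the unpadded literal.
SELECT CAST('foo' AS CHAR(5)) = 'foo';                   -- Expected: FALSE

-- With both sides cast to CHAR, trailing spaces are not significant.
SELECT CAST('foo' AS CHAR(5)) = CAST('foo' AS CHAR(5));  -- Expected: TRUE</codeblock>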
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_string.xml#string"/>, <xref href="impala_varchar.xml#varchar"/>,
<xref href="impala_literals.xml#string_literals"/>,
<xref href="impala_string_functions.xml#string_functions"/>
</p>
</conbody>
</concept>