| <?xml version="1.0" encoding="UTF-8"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| <!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> |
| <concept id="char" rev="2.0.0"> |
| |
| <title>CHAR Data Type (<keyword keyref="impala20"/> or higher only)</title> |
| <titlealts audience="PDF"><navtitle>CHAR</navtitle></titlealts> |
| <prolog> |
| <metadata> |
| <data name="Category" value="Impala"/> |
| <data name="Category" value="Impala Data Types"/> |
| <data name="Category" value="SQL"/> |
| <data name="Category" value="Data Analysts"/> |
| <data name="Category" value="Developers"/> |
| <data name="Category" value="Schemas"/> |
| </metadata> |
| </prolog> |
| |
| <conbody> |
| |
| <p rev="2.0.0"> |
| <indexterm audience="hidden">CHAR data type</indexterm> |
| A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If |
| values are longer than the specified length, Impala truncates any trailing characters. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/syntax_blurb"/> |
| |
| <p> |
| In the column definition of a <codeph>CREATE TABLE</codeph> statement: |
| </p> |
| |
| <codeblock><varname>column_name</varname> CHAR(<varname>length</varname>)</codeblock> |
| |
| <p> |
| The maximum length you can specify is 255. |
| </p> |
| |
| <p> |
| <b>Semantics of trailing spaces:</b> |
| </p> |
| |
| <ul> |
| <li> |
| When you store a <codeph>CHAR</codeph> value shorter than the specified length in a table, queries return |
| the value padded with trailing spaces if necessary; the resulting value has the same length as specified in |
| the column definition. |
| </li> |
| |
| <li> |
| If you store a <codeph>CHAR</codeph> value containing trailing spaces in a table, those trailing spaces are |
| not stored in the data file. When the value is retrieved by a query, the result could have a different |
| number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the |
| specified length of the column. |
| </li> |
| |
| <li> |
| If you compare two <codeph>CHAR</codeph> values that differ only in the number of trailing spaces, those |
| values are considered identical. |
| </li> |
| </ul> |
| |
| <p conref="../shared/impala_common.xml#common/partitioning_bad"/> |
| |
| <p conref="../shared/impala_common.xml#common/hbase_no"/> |
| |
| <p conref="../shared/impala_common.xml#common/parquet_blurb"/> |
| |
| <ul> |
| <li> |
| This type can be read from and written to Parquet files. |
| </li> |
| |
| <li> |
| There is no requirement for a particular level of Parquet. |
| </li> |
| |
| <li> |
| Parquet files generated by Impala and containing this type can be freely interchanged with other components |
| such as Hive and MapReduce. |
| </li> |
| |
| <li> |
| Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files. |
| </li> |
| |
| <li> |
| Parquet data files might contain values that are longer than allowed by the |
| <codeph>CHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra trailing characters when |
| it processes those values during a query. |
| </li> |
| </ul> |
| |
| <p conref="../shared/impala_common.xml#common/text_blurb"/> |
| |
| <p> |
| Text data files might contain values that are longer than allowed for a particular |
| <codeph>CHAR(<varname>n</varname>)</codeph> column. Any extra trailing characters are ignored when Impala |
| processes those values during a query. Text data files can also contain values that are shorter than the |
| defined length limit, and Impala pads them with trailing spaces up to the specified length. Any text data |
| files produced by Impala <codeph>INSERT</codeph> statements do not include any trailing blanks for |
| <codeph>CHAR</codeph> columns. |
| </p> |
| |
| <p><b>Avro considerations:</b></p> |
| <p conref="../shared/impala_common.xml#common/avro_2gb_strings"/> |
| |
| <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> |
| |
| <p> |
| This type is available using <keyword keyref="impala20_full"/> or higher. |
| </p> |
| |
| <p> |
| Some other database systems make the length specification optional. For Impala, the length is required. |
| </p> |
| |
| <!-- |
| <p> |
| The Impala maximum length is larger than for the <codeph>CHAR</codeph> data type in Hive. |
| If a Hive query encounters a <codeph>CHAR</codeph> value longer than 255 during processing, |
| it silently treats the value as length 255. |
| </p> |
| --> |
| |
| <p conref="../shared/impala_common.xml#common/internals_max_bytes"/> |
| |
| <p conref="../shared/impala_common.xml#common/added_in_20"/> |
| |
| <p conref="../shared/impala_common.xml#common/column_stats_constant"/> |
| |
| <!-- Seems like a logical design decision but don't think it's currently implemented like this. |
| <p> |
| Because both the maximum and average length are always known and always the same for |
| any given <codeph>CHAR(<varname>n</varname>)</codeph> column, those fields are always filled |
| in for <codeph>SHOW COLUMN STATS</codeph> output, even before you run |
| <codeph>COMPUTE STATS</codeph> on the table. |
| </p> |
| --> |
| |
| <p conref="../shared/impala_common.xml#common/udf_blurb_no"/> |
| |
| <p conref="../shared/impala_common.xml#common/example_blurb"/> |
| |
| <p> |
| These examples show how trailing spaces are not considered significant when comparing or processing |
| <codeph>CHAR</codeph> values. <codeph>CAST()</codeph> truncates any longer string to fit within the defined |
| length. If a <codeph>CHAR</codeph> value is shorter than the specified length, it is padded on the right with |
| spaces until it matches the specified length. Therefore, <codeph>LENGTH()</codeph> represents the length |
| including any trailing spaces, and <codeph>CONCAT()</codeph> also treats the column value as if it has |
| trailing spaces. |
| </p> |
| |
| <codeblock>select cast('x' as char(4)) = cast('x ' as char(4)) as "unpadded equal to padded"; |
| +--------------------------+ |
| | unpadded equal to padded | |
| +--------------------------+ |
| | true | |
| +--------------------------+ |
| |
| create table char_length(c char(3)); |
| insert into char_length values (cast('1' as char(3))), (cast('12' as char(3))), (cast('123' as char(3))), (cast('123456' as char(3))); |
| select concat("[",c,"]") as c, length(c) from char_length; |
| +-------+-----------+ |
| | c | length(c) | |
| +-------+-----------+ |
| | [1 ] | 3 | |
| | [12 ] | 3 | |
| | [123] | 3 | |
| | [123] | 3 | |
| +-------+-----------+ |
| </codeblock> |
| |
| <p> |
| This example shows a case where data values are known to have a specific length, where <codeph>CHAR</codeph> |
| is a logical data type to use. |
| <!-- |
| Because all the <codeph>CHAR</codeph> values have a constant predictable length, |
| Impala can efficiently analyze how best to use these values in join queries, |
| aggregation queries, and other contexts where column length is significant. |
| --> |
| </p> |
| |
| <codeblock>create table addresses |
| (id bigint, |
| street_name string, |
| state_abbreviation char(2), |
| country_abbreviation char(2)); |
| </codeblock> |
| |
| <p> |
| The following example shows how values written by Impala do not physically include the trailing spaces. It |
| creates a table using text format, with <codeph>CHAR</codeph> values much shorter than the declared length, |
| and then prints the resulting data file to show that the delimited values are not separated by spaces. The |
| same behavior applies to binary-format Parquet data files. |
| </p> |
| |
| <codeblock>create table char_in_text (a char(20), b char(30), c char(40)) |
| row format delimited fields terminated by ','; |
| |
| insert into char_in_text values (cast('foo' as char(20)), cast('bar' as char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' as char(30)), cast('aloha' as char(40))); |
| |
| -- Running this Linux command inside impala-shell using the ! shortcut. |
| !hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*'; |
| foo,bar,baz |
| hello,goodbye,aloha |
| </codeblock> |
| |
| <p> |
| The following example further illustrates the treatment of spaces. It replaces the contents of the previous |
| table with some values including leading spaces, trailing spaces, or both. Any leading spaces are preserved |
| within the data file, but trailing spaces are discarded. Then when the values are retrieved by a query, the |
| leading spaces are retrieved verbatim while any necessary trailing spaces are supplied by Impala. |
| </p> |
| |
| <codeblock>insert overwrite char_in_text values (cast('trailing ' as char(20)), cast(' leading and trailing ' as char(30)), cast(' leading' as char(40))); |
| !hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*'; |
| trailing, leading and trailing, leading |
| |
| select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c from char_in_text; |
| +------------------------+----------------------------------+--------------------------------------------+ |
| | a | b | c | |
| +------------------------+----------------------------------+--------------------------------------------+ |
| | [trailing ] | [ leading and trailing ] | [ leading ] | |
| +------------------------+----------------------------------+--------------------------------------------+ |
| </codeblock> |
| |
| <p conref="../shared/impala_common.xml#common/kudu_blurb"/> |
| <p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/> |
| |
| <p conref="../shared/impala_common.xml#common/restrictions_blurb"/> |
| |
| <p> |
| Because the blank-padding behavior requires allocating the maximum length for each value in memory, for |
| scalability reasons avoid declaring <codeph>CHAR</codeph> columns that are much longer than typical values in |
| that column. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/blobs_are_strings"/> |
| |
| <p> |
| When an expression compares a <codeph>CHAR</codeph> with a <codeph>STRING</codeph> or |
| <codeph>VARCHAR</codeph>, the <codeph>CHAR</codeph> value is implicitly converted to <codeph>STRING</codeph> |
| first, with trailing spaces preserved. |
| </p> |
| |
| <codeblock>select cast("foo " as char(5)) = 'foo' as "char equal to string"; |
| +----------------------+ |
| | char equal to string | |
| +----------------------+ |
| | false | |
| +----------------------+ |
| </codeblock> |
| |
| <p> |
| This behavior differs from other popular database systems. To get the expected result of |
| <codeph>TRUE</codeph>, cast the expressions on both sides to <codeph>CHAR</codeph> values of the appropriate |
| length: |
| </p> |
| |
| <codeblock>select cast("foo " as char(5)) = cast('foo' as char(3)) as "char equal to string"; |
| +----------------------+ |
| | char equal to string | |
| +----------------------+ |
| | true | |
| +----------------------+ |
| </codeblock> |
| |
| <p> |
| This behavior is subject to change in future releases. |
| </p> |
| |
| <p conref="../shared/impala_common.xml#common/related_info"/> |
| |
| <p> |
| <xref href="impala_string.xml#string"/>, <xref href="impala_varchar.xml#varchar"/>, |
| <xref href="impala_literals.xml#string_literals"/>, |
| <xref href="impala_string_functions.xml#string_functions"/> |
| </p> |
| </conbody> |
| </concept> |