<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="varchar" rev="2.0.0">
<title>VARCHAR Data Type (<keyword keyref="impala20"/> or higher only)</title>
<titlealts audience="PDF"><navtitle>VARCHAR</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Data Types"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p rev="2.0.0">
<indexterm audience="hidden">VARCHAR data type</indexterm>
A variable-length character type, truncated during processing if necessary to fit within the specified
length.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<p>
In the column definition of a <codeph>CREATE TABLE</codeph> statement:
</p>
<codeblock><varname>column_name</varname> VARCHAR(<varname>max_length</varname>)</codeblock>
<p>
The maximum length you can specify is 65,535.
</p>
<p conref="../shared/impala_common.xml#common/partitioning_bad"/>
<!--
<p>
This type can be used for partition key columns.
Because of the efficiency advantage of numeric values over character-based values,
if the partition key is a string representation of a number,
prefer to use an integer data type with sufficient range (<codeph>INT</codeph>,
<codeph>BIGINT</codeph>, and so on) rather than this type.
</p>
-->
<p conref="../shared/impala_common.xml#common/hbase_no"/>
<p conref="../shared/impala_common.xml#common/parquet_blurb"/>
<ul>
<li>
This type can be read from and written to Parquet files.
</li>
<li>
There is no requirement for a particular level of Parquet.
</li>
<li>
Parquet files generated by Impala and containing this type can be freely interchanged with other components
such as Hive and MapReduce.
</li>
<li>
Parquet data files can contain values that are longer than allowed by the
<codeph>VARCHAR(<varname>n</varname>)</codeph> length limit. Impala ignores any extra trailing characters
when it processes those values during a query.
</li>
</ul>
<p conref="../shared/impala_common.xml#common/text_blurb"/>
<p>
Text data files can contain values that are longer than allowed by the
<codeph>VARCHAR(<varname>n</varname>)</codeph> length limit. Any extra trailing characters are ignored when
Impala processes those values during a query.
</p>
<p><b>Avro considerations:</b></p>
<p conref="../shared/impala_common.xml#common/avro_2gb_strings"/>
<p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>
<p>
You can use <codeph>ALTER TABLE ... CHANGE</codeph> to switch column data types to and from
<codeph>VARCHAR</codeph>. You can convert from <codeph>STRING</codeph> to
<codeph>VARCHAR(<varname>n</varname>)</codeph>, or from <codeph>VARCHAR(<varname>n</varname>)</codeph> to
<codeph>STRING</codeph>, or from <codeph>CHAR(<varname>n</varname>)</codeph> to
<codeph>VARCHAR(<varname>n</varname>)</codeph>, or from <codeph>VARCHAR(<varname>n</varname>)</codeph> to
<codeph>CHAR(<varname>n</varname>)</codeph>. When switching back and forth between <codeph>VARCHAR</codeph>
and <codeph>CHAR</codeph>, you can also change the length value. This schema evolution works the same for
tables using any file format. If a table contains values longer than the maximum length defined for a
<codeph>VARCHAR</codeph> column, Impala does not return an error. Any extra trailing characters are ignored
when Impala processes those values during a query.
</p>
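<p>
The type changes described above might look like the following sketch. The table name
<codeph>varchar_evolution</codeph> and its column are hypothetical, chosen only for illustration, and the
statements assume a table whose file format and metastore settings permit these type changes. No query
output is shown.
</p>
<codeblock>-- Hypothetical table, for illustration only.
create table varchar_evolution (s string);
insert into varchar_evolution values ('hello'), ('world');

-- Switch the column from STRING to VARCHAR(3).
alter table varchar_evolution change s s varchar(3);

-- Values longer than 3 characters do not cause an error;
-- the extra trailing characters are ignored during the query.
select s from varchar_evolution;

-- Switch between VARCHAR and CHAR, changing the length at the same time.
alter table varchar_evolution change s s char(5);
alter table varchar_evolution change s s varchar(20);

-- Switch the column back from VARCHAR to STRING.
alter table varchar_evolution change s s string;
</codeblock>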
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
<p>
This type is available in <keyword keyref="impala20_full"/> or higher.
</p>
<p conref="../shared/impala_common.xml#common/internals_min_bytes"/>
<p conref="../shared/impala_common.xml#common/added_in_20"/>
<p conref="../shared/impala_common.xml#common/column_stats_variable"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
<p conref="../shared/impala_common.xml#common/blobs_are_strings"/>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
The following examples show how long and short <codeph>VARCHAR</codeph> values are treated. Values longer
than the maximum specified length are truncated, either by <codeph>CAST()</codeph> or when queried from
existing data files. Values shorter than the maximum specified length keep their actual length, without the
trailing padding that is applied to <codeph>CHAR</codeph> values.
</p>
<codeblock>create table varchar_1 (s varchar(1));
create table varchar_4 (s varchar(4));
create table varchar_20 (s varchar(20));
insert into varchar_1 values (cast('a' as varchar(1))), (cast('b' as varchar(1))), (cast('hello' as varchar(1))), (cast('world' as varchar(1)));
insert into varchar_4 values (cast('a' as varchar(4))), (cast('b' as varchar(4))), (cast('hello' as varchar(4))), (cast('world' as varchar(4)));
insert into varchar_20 values (cast('a' as varchar(20))), (cast('b' as varchar(20))), (cast('hello' as varchar(20))), (cast('world' as varchar(20)));
select * from varchar_1;
+---+
| s |
+---+
| a |
| b |
| h |
| w |
+---+
select * from varchar_4;
+------+
| s    |
+------+
| a    |
| b    |
| hell |
| worl |
+------+
select * from varchar_20;
+-------+
| s     |
+-------+
| a     |
| b     |
| hello |
| world |
+-------+
select concat('[',s,']') as s from varchar_20;
+---------+
| s       |
+---------+
| [a]     |
| [b]     |
| [hello] |
| [world] |
+---------+
</codeblock>
<p>
The following example shows how identical <codeph>VARCHAR</codeph> values compare as equal, even if the
columns are defined with different maximum lengths. Both tables contain <codeph>'a'</codeph> and
<codeph>'b'</codeph> values, so those rows match in the join. The longer <codeph>'hello'</codeph> and
<codeph>'world'</codeph> values in the <codeph>VARCHAR_20</codeph> table have no match, because the same
strings were truncated to <codeph>'h'</codeph> and <codeph>'w'</codeph> when inserted into the
<codeph>VARCHAR_1</codeph> table.
</p>
<codeblock>select s from varchar_1 join varchar_20 using (s);
+-------+
| s     |
+-------+
| a     |
| b     |
+-------+
</codeblock>
<p>
The following examples show how <codeph>VARCHAR</codeph> values are freely interchangeable with
<codeph>STRING</codeph> values in contexts such as comparison operators and built-in functions:
</p>
<codeblock>select length(cast('foo' as varchar(100))) as length;
+--------+
| length |
+--------+
| 3 |
+--------+
select cast('xyz' as varchar(5)) &gt; cast('abc' as varchar(10)) as greater;
+---------+
| greater |
+---------+
| true |
+---------+
</codeblock>
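<p>
The preceding queries use only <codeph>VARCHAR</codeph> operands. As a further sketch, assuming the
interchangeability with <codeph>STRING</codeph> described above, a <codeph>VARCHAR</codeph> value could
also be compared directly with a <codeph>STRING</codeph> literal or combined with one in a built-in function.
The column aliases are arbitrary, and no query output is shown.
</p>
<codeblock>-- Compare a VARCHAR value with an unadorned STRING literal.
select cast('abc' as varchar(10)) = 'abc' as equal;

-- Pass a VARCHAR value together with STRING literals to a built-in function.
select concat('[', cast('abc' as varchar(10)), ']') as bracketed;
</codeblock>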
<p conref="../shared/impala_common.xml#common/udf_blurb_no"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_string.xml#string"/>, <xref href="impala_char.xml#char"/>,
<xref href="impala_literals.xml#string_literals"/>,
<xref href="impala_string_functions.xml#string_functions"/>
</p>
</conbody>
</concept>