blob: 6aa8b1e9cb69056ec1e206e9329fa3d50127cfc7 [file] [log] [blame]
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="1.4.0" id="decimal">
<title>DECIMAL Data Type (<keyword keyref="impala14"/> or higher only)</title>
<titlealts audience="PDF"><navtitle>DECIMAL</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Data Types"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p>
A numeric data type with fixed scale and precision, used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER
TABLE</codeph> statements. Suitable for financial and other arithmetic calculations where the imprecise
representation and rounding behavior of <codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> make those types
impractical.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<p>
In the column definition of a <codeph>CREATE TABLE</codeph> statement:
</p>
<codeblock><varname>column_name</varname> DECIMAL[(<varname>precision</varname>[,<varname>scale</varname>])]</codeblock>
<p>
<codeph>DECIMAL</codeph> with no precision or scale values is equivalent to <codeph>DECIMAL(9,0)</codeph>.
</p>
<p>
<b>Precision and Scale:</b>
</p>
<p>
<varname>precision</varname> represents the total number of digits that can be represented by the column,
regardless of the location of the decimal point. This value must be between 1 and 38. For example,
representing integer values up to 9999, and floating-point values up to 99.99, both require a precision of 4.
You can also represent corresponding negative values, without any change in the precision. For example, the
range -9999 to 9999 still only requires a precision of 4.
</p>
<p>
<varname>scale</varname> represents the number of fractional digits. This value must be less than or equal to
<varname>precision</varname>. A scale of 0 produces integral values, with no fractional part. If precision
and scale are equal, all the digits come after the decimal point, making all the values between 0 and
0.999... or 0 and -0.999...
</p>
<p>
When <varname>precision</varname> and <varname>scale</varname> are omitted, a <codeph>DECIMAL</codeph> value
is treated as <codeph>DECIMAL(9,0)</codeph>, that is, an integer value ranging from
<codeph>-999,999,999</codeph> to <codeph>999,999,999</codeph>. This is the largest <codeph>DECIMAL</codeph>
value that can still be represented in 4 bytes. If precision is specified but scale is omitted, Impala uses a
value of zero for the scale.
</p>
<p>
Both <varname>precision</varname> and <varname>scale</varname> must be specified as integer literals, not any
other kind of constant expressions.
</p>
<p>
To check the precision or scale for arbitrary values, you can call the
<xref href="impala_math_functions.xml#math_functions"><codeph>precision()</codeph> and
<codeph>scale()</codeph> built-in functions</xref>. For example, you might use these values to figure out how
many characters are required for various fields in a report, or to understand the rounding characteristics of
a formula as applied to a particular <codeph>DECIMAL</codeph> column.
</p>
<p>
<b>Range:</b>
</p>
<p>
The maximum precision value is 38. Thus, the largest integral value is represented by
<codeph>DECIMAL(38,0)</codeph> (999... with 9 repeated 38 times). The most precise fractional value (between
0 and 1, or 0 and -1) is represented by <codeph>DECIMAL(38,38)</codeph>, with 38 digits to the right of the
decimal point. The value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
would be .999... (9 repeated 38 times).
</p>
<p>
For a given precision and scale, the range of <codeph>DECIMAL</codeph> values is the same in the positive and
negative directions. For example, <codeph>DECIMAL(4,2)</codeph> can represent from -99.99 to 99.99. This is
different from other integral numeric types where the positive and negative bounds differ slightly.
</p>
<p>
When you use <codeph>DECIMAL</codeph> values in arithmetic expressions, the precision and scale of the result
value are determined as follows:
</p>
<ul>
<li>
<p>
For addition and subtraction, the precision and scale are based on the maximum possible result, that is,
if all the digits of the input values were 9s and the absolute values were added together.
</p>
<!-- Seems like buggy output from this first query, so hiding the example for the time being. -->
<codeblock audience="hidden"><![CDATA[[localhost:21000] > select 50000.5 + 12.444, precision(50000.5 + 12.444), scale(50000.5 + 12.444);
+------------------+-----------------------------+-------------------------+
| 50000.5 + 12.444 | precision(50000.5 + 12.444) | scale(50000.5 + 12.444) |
+------------------+-----------------------------+-------------------------+
| 50012.944 | 9 | 3 |
+------------------+-----------------------------+-------------------------+
[localhost:21000] > select 99999.9 + 99.999, precision(99999.9 + 99.999), scale(99999.9 + 99.999);
+------------------+-----------------------------+-------------------------+
| 99999.9 + 99.999 | precision(99999.9 + 99.999) | scale(99999.9 + 99.999) |
+------------------+-----------------------------+-------------------------+
| 100099.899 | 9 | 3 |
+------------------+-----------------------------+-------------------------+
]]>
</codeblock>
</li>
<li>
<p>
For multiplication, the precision is the sum of the precisions of the input values. The scale is the sum
of the scales of the input values.
</p>
</li>
<!-- Need to add some specifics to discussion of division. Details here: http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx -->
<li>
<p>
For division, Impala sets the precision and scale to values large enough to represent the whole and
fractional parts of the result.
</p>
</li>
<li>
<p>
For <codeph>UNION</codeph>, the scale is the larger of the scales of the input values, and the precision
is increased if necessary to accommodate any additional fractional digits. If the same input value has
the largest precision and the largest scale, the result value has the same precision and scale. If one
value has a larger precision but smaller scale, the scale of the result value is increased. For example,
<codeph>DECIMAL(20,2) UNION DECIMAL(8,6)</codeph> produces a result of type
<codeph>DECIMAL(24,6)</codeph>. The extra 4 fractional digits of scale (6-2) are accommodated by
extending the precision by the same amount (20+4).
</p>
</li>
<li>
<p>
To doublecheck, you can always call the <codeph>PRECISION()</codeph> and <codeph>SCALE()</codeph>
functions on the results of an arithmetic expression to see the relevant values, or use a <codeph>CREATE
TABLE AS SELECT</codeph> statement to define a column based on the return type of the expression.
</p>
</li>
</ul>
<p conref="../shared/impala_common.xml#common/compatibility_blurb"/>
<ul>
<li>
Using the <codeph>DECIMAL</codeph> type is only supported under <keyword keyref="impala14_full"/> and higher.
</li>
<li>
Use the <codeph>DECIMAL</codeph> data type in Impala for applications where you used the
<codeph>NUMBER</codeph> data type in Oracle. The Impala <codeph>DECIMAL</codeph> type does not support the
Oracle idioms of <codeph>*</codeph> for scale or negative values for precision.
</li>
</ul>
<p>
<b>Conversions and casting:</b>
</p>
<p>
<ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
</p>
<p>
Impala automatically converts between <codeph>DECIMAL</codeph> and other numeric types where possible. A
<codeph>DECIMAL</codeph> with zero scale is converted to or from the smallest appropriate integral type. A
<codeph>DECIMAL</codeph> with a fractional part is automatically converted to or from the smallest
appropriate floating-point type. If the destination type does not have sufficient precision or scale to hold
all possible values of the source type, Impala raises an error and does not convert the value.
</p>
<p>
For example, these statements show how expressions of <codeph>DECIMAL</codeph> and other types are reconciled
to the same type in the context of <codeph>UNION</codeph> queries and <codeph>INSERT</codeph> statements:
</p>
<codeblock><![CDATA[[localhost:21000] > select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+----------------+
| x |
+----------------+
| 1.5000 |
| 1.0000 |
+----------------+
[localhost:21000] > create table int_vs_decimal as select cast(1 as int) as x union select cast(1.5 as decimal(9,4)) as x;
+-------------------+
| summary |
+-------------------+
| Inserted 2 row(s) |
+-------------------+
[localhost:21000] > desc int_vs_decimal;
+------+---------------+---------+
| name | type | comment |
+------+---------------+---------+
| x | decimal(14,4) | |
+------+---------------+---------+
]]>
</codeblock>
<p>
To avoid potential conversion errors, you can use <codeph>CAST()</codeph> to convert <codeph>DECIMAL</codeph>
values to <codeph>FLOAT</codeph>, <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, <codeph>INT</codeph>,
<codeph>BIGINT</codeph>, <codeph>STRING</codeph>, <codeph>TIMESTAMP</codeph>, or <codeph>BOOLEAN</codeph>.
You can use exponential notation in <codeph>DECIMAL</codeph> literals or when casting from
<codeph>STRING</codeph>, for example <codeph>1.0e6</codeph> to represent one million.
</p>
<p>
If you cast a value with more fractional digits than the scale of the destination type, any extra fractional
digits are truncated (not rounded). Casting a value to a target type with not enough precision produces a
result of <codeph>NULL</codeph> and displays a runtime warning.
</p>
<codeblock><![CDATA[[localhost:21000] > select cast(1.239 as decimal(3,2));
+-----------------------------+
| cast(1.239 as decimal(3,2)) |
+-----------------------------+
| 1.23 |
+-----------------------------+
[localhost:21000] > select cast(1234 as decimal(3));
+----------------------------+
| cast(1234 as decimal(3,0)) |
+----------------------------+
| NULL |
+----------------------------+
WARNINGS: Expression overflowed, returning NULL
]]>
</codeblock>
<p>
When you specify integer literals, for example in <codeph>INSERT ... VALUES</codeph> statements or arithmetic
expressions, those numbers are interpreted as the smallest applicable integer type. You must use
<codeph>CAST()</codeph> calls for some combinations of integer literals and <codeph>DECIMAL</codeph>
precision. For example, <codeph>INT</codeph> has a maximum value that is 10 digits long,
<codeph>TINYINT</codeph> has a maximum value that is 3 digits long, and so on. If you specify a value such as
123456 to go into a <codeph>DECIMAL</codeph> column, Impala checks if the column has enough precision to
represent the largest value of that integer type, and raises an error if not. Therefore, use an expression
like <codeph>CAST(123456 TO DECIMAL(9,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 9 or
less, <codeph>CAST(50 TO DECIMAL(2,0))</codeph> for <codeph>DECIMAL</codeph> columns with precision 2 or
less, and so on. For <codeph>DECIMAL</codeph> columns with precision 10 or greater, Impala automatically
interprets the value as the correct <codeph>DECIMAL</codeph> type; however, because
<codeph>DECIMAL(10)</codeph> requires 8 bytes of storage while <codeph>DECIMAL(9)</codeph> requires only 4
bytes, only use precision of 10 or higher when actually needed.
</p>
<codeblock><![CDATA[[localhost:21000] > create table decimals_9_0 (x decimal);
[localhost:21000] > insert into decimals_9_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
ERROR: AnalysisException: Possible loss of precision for target table 'decimal_testing.decimals_9_0'.
Expression '1' (type: INT) would need to be cast to DECIMAL(9,0) for column 'x'
[localhost:21000] > insert into decimals_9_0 values (cast(1 as decimal)), (cast(2 as decimal)), (cast(4 as decimal)), (cast(8 as decimal)), (cast(16 as decimal)), (cast(1024 as decimal)), (cast(32768 as decimal)), (cast(65536 as decimal)), (cast(1000000 as decimal));
[localhost:21000] > create table decimals_10_0 (x decimal(10,0));
[localhost:21000] > insert into decimals_10_0 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000);
]]>
</codeblock>
<p>
Be aware that in memory and for binary file formats such as Parquet or Avro, <codeph>DECIMAL(10)</codeph> or
higher consumes 8 bytes while <codeph>DECIMAL(9)</codeph> (the default for <codeph>DECIMAL</codeph>) or lower
consumes 4 bytes. Therefore, to conserve space in large tables, use the smallest-precision
<codeph>DECIMAL</codeph> type that is appropriate and <codeph>CAST()</codeph> literal values where necessary,
rather than declaring <codeph>DECIMAL</codeph> columns with high precision for convenience.
</p>
<p>
To represent a very large or precise <codeph>DECIMAL</codeph> value as a literal, for example one that
contains more digits than can be represented by a <codeph>BIGINT</codeph> literal, use a quoted string or a
floating-point value for the number, and <codeph>CAST()</codeph> to the desired <codeph>DECIMAL</codeph>
type:
</p>
<codeblock>insert into decimals_38_5 values (1), (2), (4), (8), (16), (1024), (32768), (65536), (1000000),
(cast("999999999999999999999999999999" as decimal(38,5))),
(cast(999999999999999999999999999999. as decimal(38,5)));
</codeblock>
<ul>
<li>
<p> The result of the <codeph>SUM()</codeph> aggregate function on
<codeph>DECIMAL</codeph> values is promoted to a precision of 38,
with the same precision as the underlying column. Thus, the result can
represent the largest possible value at that particular precision. </p>
</li>
<li>
<p>
<codeph>STRING</codeph> columns, literals, or expressions can be converted to <codeph>DECIMAL</codeph> as
long as the overall number of digits and digits to the right of the decimal point fit within the
specified precision and scale for the declared <codeph>DECIMAL</codeph> type. By default, a
<codeph>DECIMAL</codeph> value with no specified scale or precision can hold a maximum of 9 digits of an
integer value. If there are more digits in the string value than are allowed by the
<codeph>DECIMAL</codeph> scale and precision, the result is <codeph>NULL</codeph>.
</p>
<p>
The following examples demonstrate how <codeph>STRING</codeph> values with integer and fractional parts
are represented when converted to <codeph>DECIMAL</codeph>. If the scale is 0, the number is treated
as an integer value with a maximum of <varname>precision</varname> digits. If the precision is greater than
0, the scale must be increased to account for the digits both to the left and right of the decimal point.
As the precision increases, output values are printed with additional trailing zeros after the decimal
point if needed. Any trailing zeros after the decimal point in the <codeph>STRING</codeph> value must fit
within the number of digits specified by the precision.
</p>
<codeblock><![CDATA[[localhost:21000] > select cast('100' as decimal); -- Small integer value fits within 9 digits of scale.
+-----------------------------+
| cast('100' as decimal(9,0)) |
+-----------------------------+
| 100 |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(3,0)); -- Small integer value fits within 3 digits of scale.
+-----------------------------+
| cast('100' as decimal(3,0)) |
+-----------------------------+
| 100 |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(2,0)); -- 2 digits of scale is not enough!
+-----------------------------+
| cast('100' as decimal(2,0)) |
+-----------------------------+
| NULL |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(3,1)); -- (3,1) = 2 digits left of the decimal point, 1 to the right. Not enough.
+-----------------------------+
| cast('100' as decimal(3,1)) |
+-----------------------------+
| NULL |
+-----------------------------+
[localhost:21000] > select cast('100' as decimal(4,1)); -- 4 digits total, 1 to the right of the decimal point.
+-----------------------------+
| cast('100' as decimal(4,1)) |
+-----------------------------+
| 100.0 |
+-----------------------------+
[localhost:21000] > select cast('98.6' as decimal(3,1)); -- (3,1) can hold a 3 digit number with 1 fractional digit.
+------------------------------+
| cast('98.6' as decimal(3,1)) |
+------------------------------+
| 98.6 |
+------------------------------+
[localhost:21000] > select cast('98.6' as decimal(15,1)); -- Larger scale allows bigger numbers but still only 1 fractional digit.
+-------------------------------+
| cast('98.6' as decimal(15,1)) |
+-------------------------------+
| 98.6 |
+-------------------------------+
[localhost:21000] > select cast('98.6' as decimal(15,5)); -- Larger precision allows more fractional digits, outputs trailing zeros.
+-------------------------------+
| cast('98.6' as decimal(15,5)) |
+-------------------------------+
| 98.60000 |
+-------------------------------+
[localhost:21000] > select cast('98.60000' as decimal(15,1)); -- Trailing zeros in the string must fit within 'scale' digits (1 in this case).
+-----------------------------------+
| cast('98.60000' as decimal(15,1)) |
+-----------------------------------+
| NULL |
+-----------------------------------+
]]>
</codeblock>
</li>
<li>
Most built-in arithmetic functions such as <codeph>SIN()</codeph> and <codeph>COS()</codeph> continue to
accept only <codeph>DOUBLE</codeph> values because they are so commonly used in scientific context for
calculations of IEEE 954-compliant values. The built-in functions that accept and return
<codeph>DECIMAL</codeph> are:
<!-- List from Skye: positive, negative, least, greatest, fnv_hash, if, nullif, zeroifnull, isnull, coalesce -->
<!-- Nong had already told me about abs, ceil, floor, round, truncate -->
<ul>
<li>
<codeph>ABS()</codeph>
</li>
<li>
<codeph>CEIL()</codeph>
</li>
<li>
<codeph>COALESCE()</codeph>
</li>
<li>
<codeph>FLOOR()</codeph>
</li>
<li>
<codeph>FNV_HASH()</codeph>
</li>
<li>
<codeph>GREATEST()</codeph>
</li>
<li>
<codeph>IF()</codeph>
</li>
<li>
<codeph>ISNULL()</codeph>
</li>
<li>
<codeph>LEAST()</codeph>
</li>
<li>
<codeph>NEGATIVE()</codeph>
</li>
<li>
<codeph>NULLIF()</codeph>
</li>
<li>
<codeph>POSITIVE()</codeph>
</li>
<li>
<codeph>PRECISION()</codeph>
</li>
<li>
<codeph>ROUND()</codeph>
</li>
<li>
<codeph>SCALE()</codeph>
</li>
<li>
<codeph>TRUNCATE()</codeph>
</li>
<li>
<codeph>ZEROIFNULL()</codeph>
</li>
</ul>
See <xref href="impala_functions.xml#builtins"/> for details.
</li>
<li>
<p>
<codeph>BIGINT</codeph>, <codeph>INT</codeph>, <codeph>SMALLINT</codeph>, and <codeph>TINYINT</codeph>
values can all be cast to <codeph>DECIMAL</codeph>. The number of digits to the left of the decimal point
in the <codeph>DECIMAL</codeph> type must be sufficient to hold the largest value of the corresponding
integer type. Note that integer literals are treated as the smallest appropriate integer type, meaning
there is sometimes a range of values that require one more digit of <codeph>DECIMAL</codeph> scale than
you might expect. For integer values, the precision of the <codeph>DECIMAL</codeph> type can be zero; if
the precision is greater than zero, remember to increase the scale value by an equivalent amount to hold
the required number of digits to the left of the decimal point.
</p>
<p>
The following examples show how different integer types are converted to <codeph>DECIMAL</codeph>.
</p>
<!-- According to Nong, it's a bug that so many integer digits can be converted to a DECIMAL
value with small (s,p) spec. So expect to re-do this example. -->
<codeblock><![CDATA[[localhost:21000] > select cast(1 as decimal(1,0));
+-------------------------+
| cast(1 as decimal(1,0)) |
+-------------------------+
| 1 |
+-------------------------+
[localhost:21000] > select cast(9 as decimal(1,0));
+-------------------------+
| cast(9 as decimal(1,0)) |
+-------------------------+
| 9 |
+-------------------------+
[localhost:21000] > select cast(10 as decimal(1,0));
+--------------------------+
| cast(10 as decimal(1,0)) |
+--------------------------+
| 10 |
+--------------------------+
[localhost:21000] > select cast(10 as decimal(1,1));
+--------------------------+
| cast(10 as decimal(1,1)) |
+--------------------------+
| 10.0 |
+--------------------------+
[localhost:21000] > select cast(100 as decimal(1,1));
+---------------------------+
| cast(100 as decimal(1,1)) |
+---------------------------+
| 100.0 |
+---------------------------+
[localhost:21000] > select cast(1000 as decimal(1,1));
+----------------------------+
| cast(1000 as decimal(1,1)) |
+----------------------------+
| 1000.0 |
+----------------------------+
]]>
</codeblock>
</li>
<li>
<p>
When a <codeph>DECIMAL</codeph> value is converted to any of the integer types, any fractional part is
truncated (that is, rounded towards zero):
</p>
<codeblock><![CDATA[[localhost:21000] > create table num_dec_days (x decimal(4,1));
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
[localhost:21000] > insert into num_dec_days values (cast(0.1 as decimal(4,1))), (cast(.9 as decimal(4,1))), (cast(9.1 as decimal(4,1))), (cast(9.9 as decimal(4,1)));
[localhost:21000] > select cast(x as int) from num_dec_days;
+----------------+
| cast(x as int) |
+----------------+
| 1 |
| 2 |
| 4 |
| 0 |
| 0 |
| 9 |
| 9 |
+----------------+
]]>
</codeblock>
</li>
<li>
<p>
You cannot directly cast <codeph>TIMESTAMP</codeph> or <codeph>BOOLEAN</codeph> values to or from
<codeph>DECIMAL</codeph> values. You can turn a <codeph>DECIMAL</codeph> value into a time-related
representation using a two-step process, by converting it to an integer value and then using that result
in a call to a date and time function such as <codeph>from_unixtime()</codeph>.
</p>
<codeblock><![CDATA[[localhost:21000] > select from_unixtime(cast(cast(1000.0 as decimal) as bigint));
+-------------------------------------------------------------+
| from_unixtime(cast(cast(1000.0 as decimal(9,0)) as bigint)) |
+-------------------------------------------------------------+
| 1970-01-01 00:16:40 |
+-------------------------------------------------------------+
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- x is a DECIMAL column.
[localhost:21000] > create table num_dec_days (x decimal(4,1));
[localhost:21000] > insert into num_dec_days values (1), (2), (cast(4.5 as decimal(4,1)));
[localhost:21000] > select now() + interval cast(x as int) days from num_dec_days; -- The 4.5 value is truncated to 4 and becomes '4 days'.
+--------------------------------------+
| now() + interval cast(x as int) days |
+--------------------------------------+
| 2014-05-13 23:11:55.163284000 |
| 2014-05-14 23:11:55.163284000 |
| 2014-05-16 23:11:55.163284000 |
+--------------------------------------+
]]>
</codeblock>
</li>
<li>
<p>
Because values in <codeph>INSERT</codeph> statements are checked rigorously for type compatibility, be
prepared to use <codeph>CAST()</codeph> function calls around literals, column references, or other
expressions that you are inserting into a <codeph>DECIMAL</codeph> column.
</p>
</li>
</ul>
<p conref="../shared/impala_common.xml#common/null_bad_numeric_cast"/>
<p>
<b>DECIMAL differences from integer and floating-point types:</b>
</p>
<p>
With the <codeph>DECIMAL</codeph> type, you are concerned with the number of overall digits of a number
rather than powers of 2 (as in <codeph>TINYINT</codeph>, <codeph>SMALLINT</codeph>, and so on). Therefore,
the limits with integral values of <codeph>DECIMAL</codeph> types fall around 99, 999, 9999, and so on rather
than 32767, 65535, 2
<sup>32</sup>
-1, and so on. For fractional values, you do not need to account for imprecise representation of the
fractional part according to the IEEE-954 standard (as in <codeph>FLOAT</codeph> and
<codeph>DOUBLE</codeph>). Therefore, when you insert a fractional value into a <codeph>DECIMAL</codeph>
column, you can compare, sum, query, <codeph>GROUP BY</codeph>, and so on that column and get back the
original values rather than some <q>close but not identical</q> value.
</p>
<p>
<codeph>FLOAT</codeph> and <codeph>DOUBLE</codeph> can cause problems or unexpected behavior due to inability
to precisely represent certain fractional values, for example dollar and cents values for currency. You might
find output values slightly different than you inserted, equality tests that do not match precisely, or
unexpected values for <codeph>GROUP BY</codeph> columns. <codeph>DECIMAL</codeph> can help reduce unexpected
behavior and rounding errors, at the expense of some performance overhead for assignments and comparisons.
</p>
<p>
<b>Literals and expressions:</b>
<ul>
<li>
<p>
When you use an integer literal such as <codeph>1</codeph> or <codeph>999</codeph> in a SQL statement,
depending on the context, Impala will treat it as either the smallest appropriate
<codeph>DECIMAL</codeph> type, or the smallest integer type (<codeph>TINYINT</codeph>,
<codeph>SMALLINT</codeph>, <codeph>INT</codeph>, or <codeph>BIGINT</codeph>). To minimize memory usage,
Impala prefers to treat the literal as the smallest appropriate integer type.
</p>
</li>
<li>
<p>
When you use a floating-point literal such as <codeph>1.1</codeph> or <codeph>999.44</codeph> in a SQL
statement, depending on the context, Impala will treat it as either the smallest appropriate
<codeph>DECIMAL</codeph> type, or the smallest floating-point type (<codeph>FLOAT</codeph> or
<codeph>DOUBLE</codeph>). To avoid loss of accuracy, Impala prefers to treat the literal as a
<codeph>DECIMAL</codeph>.
</p>
</li>
</ul>
</p>
<p>
<b>Storage considerations:</b>
</p>
<ul>
<li>
Only the precision determines the storage size for <codeph>DECIMAL</codeph> values; the scale setting has
no effect on the storage size.
</li>
<li>
Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these text-based file formats,
leading zeros are not stored, but trailing zeros are stored. In these tables, each <codeph>DECIMAL</codeph>
value takes up as many bytes as there are digits in the value, plus an extra byte if the decimal point is
present and an extra byte for negative values. Once the values are loaded into memory, they are represented
in 4, 8, or 16 bytes as described in the following list items. The on-disk representation varies depending
on the file format of the table.
</li>
<!-- Next couple of points can be conref'ed with identical list bullets farther down under File Format Considerations. -->
<li>
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in as few bytes as
possible
<!-- 4, 8, or 16 bytes -->
depending on the precision specified for the <codeph>DECIMAL</codeph> column.
<ul>
<li>
In memory, <codeph>DECIMAL</codeph> values with precision of 9 or less are stored in 4 bytes.
</li>
<li>
In memory, <codeph>DECIMAL</codeph> values with precision of 10 through 18 are stored in 8 bytes.
</li>
<li>
In memory, <codeph>DECIMAL</codeph> values with precision greater than 18 are stored in 16 bytes.
</li>
</ul>
</li>
</ul>
<p conref="../shared/impala_common.xml#common/file_format_blurb"/>
<ul>
<li>
The <codeph>DECIMAL</codeph> data type can be stored in any of the file formats supported by Impala, as
described in <xref href="impala_file_formats.xml#file_formats"/>. Impala only writes to tables that use the
Parquet and text formats, so those formats are the focus for file format compatibility.
</li>
<li>
Impala can query Avro, RCFile, or SequenceFile tables containing <codeph>DECIMAL</codeph> columns, created
by other Hadoop components.
</li>
<li>
You can use <codeph>DECIMAL</codeph> columns in Impala tables that are mapped to HBase tables. Impala can
query and insert into such tables.
</li>
<li>
Text, RCFile, and SequenceFile tables all use ASCII-based formats. In these tables, each
<codeph>DECIMAL</codeph> value takes up as many bytes as there are digits in the value, plus an extra byte
if the decimal point is present. The binary format of Parquet or Avro files offers more compact storage for
<codeph>DECIMAL</codeph> columns.
</li>
<li>
Parquet and Avro tables use binary formats, In these tables, Impala stores each value in 4, 8, or 16 bytes
depending on the precision specified for the <codeph>DECIMAL</codeph> column.
</li>
</ul>
<p>
<b>UDF considerations:</b> When writing a C++ UDF, use the <codeph>DecimalVal</codeph> data type defined in
<filepath>/usr/include/impala_udf/udf.h</filepath>.
</p>
<p conref="../shared/impala_common.xml#common/partitioning_blurb"/>
<p>
You can use a <codeph>DECIMAL</codeph> column as a partition key. Doing so provides a better match between
the partition key values and the HDFS directory names than using a <codeph>DOUBLE</codeph> or
<codeph>FLOAT</codeph> partitioning column:
</p>
<p conref="../shared/impala_common.xml#common/schema_evolution_blurb"/>
<ul>
<li>
For text-based formats (text, RCFile, and SequenceFile tables), you can issue an <codeph>ALTER TABLE ...
REPLACE COLUMNS</codeph> statement to change the precision and scale of an existing
<codeph>DECIMAL</codeph> column. As long as the values in the column fit within the new precision and
scale, they are returned correctly by a query. Any values that do not fit within the new precision and
scale are returned as <codeph>NULL</codeph>, and Impala reports the conversion error. Leading zeros do not
count against the precision value, but trailing zeros after the decimal point do.
<codeblock><![CDATA[[localhost:21000] > create table text_decimals (x string);
[localhost:21000] > insert into text_decimals values ("1"), ("2"), ("99.99"), ("1.234"), ("000001"), ("1.000000000");
[localhost:21000] > select * from text_decimals;
+-------------+
| x |
+-------------+
| 1 |
| 2 |
| 99.99 |
| 1.234 |
| 000001 |
| 1.000000000 |
+-------------+
[localhost:21000] > alter table text_decimals replace columns (x decimal(4,2));
[localhost:21000] > select * from text_decimals;
+-------+
| x |
+-------+
| 1.00 |
| 2.00 |
| 99.99 |
| NULL |
| 1.00 |
| NULL |
+-------+
ERRORS:
Backend 0:Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.234)
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/634d4bd3aa0
e8420-b4b13bab7f1be787_56794587_data.0
record: 1.234
Error converting column: 0 TO DECIMAL(4, 2) (Data is: 1.000000000)
file: hdfs://127.0.0.1:8020/user/hive/warehouse/decimal_testing.db/text_decimals/cd40dc68e20
c565a-cc4bd86c724c96ba_311873428_data.0
record: 1.000000000
]]>
</codeblock>
</li>
<li>
For binary formats (Parquet and Avro tables), although an <codeph>ALTER TABLE ... REPLACE COLUMNS</codeph>
statement that changes the precision or scale of a <codeph>DECIMAL</codeph> column succeeds, any subsequent
attempt to query the changed column results in a fatal error. (The other columns can still be queried
successfully.) This is because the metadata about the columns is stored in the data files themselves, and
<codeph>ALTER TABLE</codeph> does not actually make any updates to the data files. If the metadata in the
data files disagrees with the metadata in the metastore database, Impala cancels the query.
</li>
</ul>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<codeblock>CREATE TABLE t1 (x DECIMAL, y DECIMAL(5,2), z DECIMAL(25,0));
INSERT INTO t1 VALUES (5, 99.44, 123456), (300, 6.7, 999999999);
SELECT x+y, ROUND(y,1), z/98.6 FROM t1;
SELECT CAST(1000.5 AS DECIMAL);
</codeblock>
<!-- <p conref="../shared/impala_common.xml#common/partitioning_good"/> -->
<p conref="../shared/impala_common.xml#common/hbase_ok"/>
<p conref="../shared/impala_common.xml#common/parquet_ok"/>
<p conref="../shared/impala_common.xml#common/text_bulky"/>
<!-- <p conref="../shared/impala_common.xml#common/compatibility_blurb"/> -->
<!-- <p conref="../shared/impala_common.xml#common/internals_blurb"/> -->
<!-- <p conref="../shared/impala_common.xml#common/added_in_20"/> -->
<p conref="../shared/impala_common.xml#common/column_stats_constant"/>
<p conref="../shared/impala_common.xml#common/kudu_blurb"/>
<p conref="../shared/impala_common.xml#common/kudu_unsupported_data_type"/>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_literals.xml#numeric_literals"/>, <xref href="impala_tinyint.xml#tinyint"/>,
<xref href="impala_smallint.xml#smallint"/>, <xref href="impala_int.xml#int"/>,
<xref href="impala_bigint.xml#bigint"/>, <xref href="impala_decimal.xml#decimal"/>,
<xref href="impala_math_functions.xml#math_functions"/> (especially <codeph>PRECISION()</codeph> and
<codeph>SCALE()</codeph>)
</p>
</conbody>
</concept>