<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept rev="2.1.0" id="drop_stats">
<title>DROP STATS Statement</title>
<titlealts audience="PDF"><navtitle>DROP STATS</navtitle></titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="SQL"/>
<data name="Category" value="DDL"/>
<data name="Category" value="ETL"/>
<data name="Category" value="Ingest"/>
<data name="Category" value="Tables"/>
<data name="Category" value="Performance"/>
<data name="Category" value="Scalability"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Data Analysts"/>
</metadata>
</prolog>
<conbody>
<p rev="2.1.0">
<indexterm audience="hidden">DROP STATS statement</indexterm>
Removes the specified statistics from a table or partition. The statistics were originally created by the
<codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph> statement.
</p>
<p conref="../shared/impala_common.xml#common/syntax_blurb"/>
<codeblock rev="2.1.0">DROP STATS [<varname>database_name</varname>.]<varname>table_name</varname>
DROP INCREMENTAL STATS [<varname>database_name</varname>.]<varname>table_name</varname> PARTITION (<varname>partition_spec</varname>)
<varname>partition_spec</varname> ::= <varname>partition_col</varname>=<varname>constant_value</varname>
</codeblock>
<p conref="../shared/impala_common.xml#common/incremental_partition_spec"/>
<p>
<codeph>DROP STATS</codeph> removes all statistics from the table, whether created by <codeph>COMPUTE
STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph>.
</p>
<p rev="2.1.0">
<codeph>DROP INCREMENTAL STATS</codeph> affects only the incremental statistics for a single partition, specified
through the <codeph>PARTITION</codeph> clause. The incremental stats are marked as outdated, so that they are
recomputed by the next <codeph>COMPUTE INCREMENTAL STATS</codeph> statement.
</p>
<!-- To do: what release was this added in? -->
<p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
<p>
You typically use this statement when the statistics for a table or a partition have become stale because data
files were added to or removed from the associated HDFS data directories, whether through manual HDFS
operations or through <codeph>INSERT</codeph>, <codeph>INSERT OVERWRITE</codeph>, or <codeph>LOAD DATA</codeph>
statements, or because partitions were added or dropped.
</p>
<p>
When a table or partition has no associated statistics, Impala treats it as essentially zero-sized when
constructing the execution plan for a query. In particular, the statistics influence the order in which
tables are joined in a join query. To ensure proper query planning and good query performance and
scalability, make sure to run <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph> on
the table or partition after removing any stale statistics.
</p>
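<p>
As a minimal sketch of that sequence, the following statements drop the statistics for a table, confirm that
none remain, and then recompute them. The table name <codeph>sales_staging</codeph> is hypothetical and is used
only for illustration.
</p>
<codeblock>-- sales_staging is a hypothetical table name.
drop stats sales_staging;
-- With no statistics present, the #Rows column is expected to show -1.
show table stats sales_staging;
-- Recompute so that the planner has table and column statistics again.
compute stats sales_staging;
show table stats sales_staging;
</codeblock>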
<p>
Dropping the statistics is not required for an unpartitioned table or a partitioned table covered by the
original type of statistics. A subsequent <codeph>COMPUTE STATS</codeph> statement replaces any existing
statistics with new ones, for all partitions, regardless of whether the old ones were outdated. Therefore,
this statement was rarely used before the introduction of incremental statistics.
</p>
<p>
Dropping the statistics is required for a partitioned table containing incremental statistics, to make a
subsequent <codeph>COMPUTE INCREMENTAL STATS</codeph> statement rescan an existing partition. See
<xref href="impala_perf_stats.xml#perf_stats"/> for information about incremental statistics, a feature
available in Impala 2.1.0 and higher.
</p>
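<p>
For illustration, a sketch of that workflow using the <codeph>item_partitioned</codeph> table from the examples
in this topic: drop the incremental statistics for one partition, then let
<codeph>COMPUTE INCREMENTAL STATS</codeph> rescan it. Treating the <codeph>Sports</codeph> partition as the one
whose data changed is an assumption made only for this example.
</p>
<codeblock>-- Mark the statistics for one partition as outdated.
drop incremental stats item_partitioned partition (i_category='Sports');
-- Only partitions without current incremental stats are expected to be rescanned.
compute incremental stats item_partitioned;
</codeblock>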
<p conref="../shared/impala_common.xml#common/ddl_blurb"/>
<p conref="../shared/impala_common.xml#common/cancel_blurb_no"/>
<p conref="../shared/impala_common.xml#common/permissions_blurb"/>
<p rev="">
The user ID that the <cmdname>impalad</cmdname> daemon runs under,
typically the <codeph>impala</codeph> user, does not need any
particular HDFS permissions to perform this statement.
All read and write operations are on the metastore database,
not HDFS files and directories.
</p>
<p conref="../shared/impala_common.xml#common/example_blurb"/>
<p>
The following example shows a partitioned table that has associated statistics produced by the
<codeph>COMPUTE INCREMENTAL STATS</codeph> statement, and how the situation evolves as statistics are dropped
from specific partitions, then from the entire table.
</p>
<p>
Initially, all table and column statistics are filled in.
</p>
<!-- Note: chopped off any excess characters at position 87 and after,
to avoid weird wrapping in PDF.
Applies to any subsequent examples with output from SHOW ... STATS too. -->
<codeblock>show table stats item_partitioned;
+-------------+-------+--------+----------+--------------+---------+-----------------
| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
+-------------+-------+--------+----------+--------------+---------+-----------------
| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
| Total | 17957 | 10 | 2.25MB | 0B | |
+-------------+-------+--------+----------+--------------+---------+-----------------
show column stats item_partitioned;
+------------------+-----------+------------------+--------+----------+--------------
| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
+------------------+-----------+------------------+--------+----------+--------------
| i_item_sk | INT | 19443 | -1 | 4 | 4
| i_item_id | STRING | 9025 | -1 | 16 | 16
| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
| i_current_price | FLOAT | 2807 | -1 | 4 | 4
| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
| i_brand_id | INT | 965 | -1 | 4 | 4
| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
| i_class_id | INT | 16 | -1 | 4 | 4
| i_class | STRING | 101 | -1 | 15 | 7.76749992370
| i_category_id | INT | 10 | -1 | 4 | 4
| i_manufact_id | INT | 1857 | -1 | 4 | 4
| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
| i_size | STRING | 8 | -1 | 11 | 4.33459997177
| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
| i_color | STRING | 92 | -1 | 10 | 5.38089990615
| i_units | STRING | 22 | -1 | 7 | 4.18690013885
| i_container | STRING | 2 | -1 | 7 | 6.99259996414
| i_manager_id | INT | 105 | -1 | 4 | 4
| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
| i_category | STRING | 10 | 0 | -1 | -1
+------------------+-----------+------------------+--------+----------+--------------
</codeblock>
<p>
To remove statistics for particular partitions, use the <codeph>DROP INCREMENTAL STATS</codeph> statement.
After removing statistics for two partitions, the table-level statistics reflect that change in the
<codeph>#Rows</codeph> and <codeph>Incremental stats</codeph> fields. The counts, maximums, and averages of
the column-level statistics are unaffected.
</p>
<note>
The row count might be preserved in the future after a <codeph>DROP INCREMENTAL
STATS</codeph> statement. Check the resolution of the issue
<xref keyref="IMPALA-1615">IMPALA-1615</xref>.
</note>
<codeblock>drop incremental stats item_partitioned partition (i_category='Sports');
drop incremental stats item_partitioned partition (i_category='Electronics');
show table stats item_partitioned;
+-------------+-------+--------+----------+--------------+---------+-----------------
| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
+-------------+-------+--------+----------+--------------+---------+-----------------
| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
| Total | 17957 | 10 | 2.25MB | 0B | |
+-------------+-------+--------+----------+--------------+---------+-----------------
show column stats item_partitioned;
+------------------+-----------+------------------+--------+----------+--------------
| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
+------------------+-----------+------------------+--------+----------+--------------
| i_item_sk | INT | 19443 | -1 | 4 | 4
| i_item_id | STRING | 9025 | -1 | 16 | 16
| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
| i_current_price | FLOAT | 2807 | -1 | 4 | 4
| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
| i_brand_id | INT | 965 | -1 | 4 | 4
| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
| i_class_id | INT | 16 | -1 | 4 | 4
| i_class | STRING | 101 | -1 | 15 | 7.76749992370
| i_category_id | INT | 10 | -1 | 4 | 4
| i_manufact_id | INT | 1857 | -1 | 4 | 4
| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
| i_size | STRING | 8 | -1 | 11 | 4.33459997177
| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
| i_color | STRING | 92 | -1 | 10 | 5.38089990615
| i_units | STRING | 22 | -1 | 7 | 4.18690013885
| i_container | STRING | 2 | -1 | 7 | 6.99259996414
| i_manager_id | INT | 105 | -1 | 4 | 4
| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
| i_category | STRING | 10 | 0 | -1 | -1
+------------------+-----------+------------------+--------+----------+--------------
</codeblock>
<p>
To remove all statistics from the table, whether produced by <codeph>COMPUTE STATS</codeph> or
<codeph>COMPUTE INCREMENTAL STATS</codeph>, use the <codeph>DROP STATS</codeph> statement without the
<codeph>INCREMENTAL</codeph> clause. Now, both table-level and column-level statistics are reset.
</p>
<codeblock>drop stats item_partitioned;
show table stats item_partitioned;
+-------------+-------+--------+----------+--------------+---------+------------------
| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
+-------------+-------+--------+----------+--------------+---------+------------------
| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
| Total | -1 | 10 | 2.25MB | 0B | |
+-------------+-------+--------+----------+--------------+---------+------------------
show column stats item_partitioned;
+------------------+-----------+------------------+--------+----------+----------+
| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
+------------------+-----------+------------------+--------+----------+----------+
| i_item_sk | INT | -1 | -1 | 4 | 4 |
| i_item_id | STRING | -1 | -1 | -1 | -1 |
| i_rec_start_date | TIMESTAMP | -1 | -1 | 16 | 16 |
| i_rec_end_date | TIMESTAMP | -1 | -1 | 16 | 16 |
| i_item_desc | STRING | -1 | -1 | -1 | -1 |
| i_current_price | FLOAT | -1 | -1 | 4 | 4 |
| i_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
| i_brand_id | INT | -1 | -1 | 4 | 4 |
| i_brand | STRING | -1 | -1 | -1 | -1 |
| i_class_id | INT | -1 | -1 | 4 | 4 |
| i_class | STRING | -1 | -1 | -1 | -1 |
| i_category_id | INT | -1 | -1 | 4 | 4 |
| i_manufact_id | INT | -1 | -1 | 4 | 4 |
| i_manufact | STRING | -1 | -1 | -1 | -1 |
| i_size | STRING | -1 | -1 | -1 | -1 |
| i_formulation | STRING | -1 | -1 | -1 | -1 |
| i_color | STRING | -1 | -1 | -1 | -1 |
| i_units | STRING | -1 | -1 | -1 | -1 |
| i_container | STRING | -1 | -1 | -1 | -1 |
| i_manager_id | INT | -1 | -1 | 4 | 4 |
| i_product_name | STRING | -1 | -1 | -1 | -1 |
| i_category | STRING | 10 | 0 | -1 | -1 |
+------------------+-----------+------------------+--------+----------+----------+
</codeblock>
<p conref="../shared/impala_common.xml#common/related_info"/>
<p>
<xref href="impala_compute_stats.xml#compute_stats"/>, <xref href="impala_show.xml#show_table_stats"/>,
<xref href="impala_show.xml#show_column_stats"/>, <xref href="impala_perf_stats.xml#perf_stats"/>
</p>
</conbody>
</concept>