| //// |
| /** |
| * @@@ START COPYRIGHT @@@ |
| * |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, |
| * software distributed under the License is distributed on an |
| * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| * KIND, either express or implied. See the License for the |
| * specific language governing permissions and limitations |
| * under the License. |
| * |
| * @@@ END COPYRIGHT @@@ |
| */ |
| //// |
| |
| [[manage-histograms]] |
| = Manage Histograms |
| |
| This section describes CQDs that are used to manage histograms. |
| |
| [[cache-histograms-refresh-interval]] |
| == CACHE_HISTOGRAMS_REFRESH_INTERVAL |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Defines the time interval after which timestamps for cached histograms are checked for refresh processing. |
| | *Values* | Unsigned integer in seconds. + |
| + |
| The default value is *'3600'* (1 hour). |
| | *Usage* | Histogram statistics are cached so that the compiler can avoid access to the metadata tables, thereby reducing compile times. |
| The timestamp of the tables are checked against those of the cached histograms at an interval specified by this CQD, in order to see if the cached histograms |
| need to be refreshed. + |
| + |
| You can increase the interval to reduce the impact on compile times as long as you do not need to obtain fresh statistics more frequently in order to improve |
| query performance. It may be that the default interval is too long and you would rather refresh the statistics more frequently than the default one hour, |
| in order to improve query performance at the cost of increased compile times. + |
| + |
| This setting depends on how frequently you are updating statistics on tables. There is no point in refreshing statistics frequently when statistics are not |
| being updated during that time. On the other hand, if you are updating statistics, or generating them for the first time on freshly loaded tables frequently |
| enough, and you want these to be picked up immediately by the compiler because you have seen this to have a dramatic impact on plan quality, then you can |
| make the refresh more frequent. |
| | *Production Usage* | Not applicable. |
| | *Impact* | Longer histogram refresh intervals can improve compile times. However, longer refresh intervals yield more obsolete |
| the histograms. More obsolete histograms may result in poor performance for queries that could leverage recently updated statistics. |
| | *Level* | System or Service. |
| | *Conflicts/Synergies* | Frequency of update statistics run using MAINTAIN. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |
| |
| <<< |
| [[hist-missing-stats-warning-level]] |
| == HIST_MISSING_STATS_WARNING_LEVEL |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Controls the level of missing statistics warnings that should be displayed. The warnings impacted are 6007, 6008, 6010, and 6011. |
| | *Values* | |
| *'0'*: Display no warnings. + |
| + |
| *'1'*: Display only missing single column statistics warnings. These include 6008 and 6011. + |
| + |
| *'2'*: Display all missing single and multi-column statistics warnings for scans only. + |
| + |
| *'3'*: Display all missing single and multi-column statistics warnings for scans and join operators only. + |
| + |
| *'4'*: Display all missing single and multi-column statistics warnings. + |
| + |
| The default value is *'4'*. |
| | *Usage* | |
| If you do not want to see these warnings, then change the setting to '0'. If you want to track the warnings, then you have a |
| choice of which warnings you want to track. Each setting gives you the ability to filter the warnings seen for missing single |
| or multi-column statistics for join or scan operations. This controls the resulting number of warning messages. + |
| + |
| If poor query plans are being caused by cardinality estimations that seem to be off, then you can check the histogram statistics |
| to see if statistics are being collected for those columns and how accurate they are. If you don't find statistics being collected, |
| then you could look for the warnings by setting this CQD to the appropriate setting. Based on that you could take appropriate action: |
| either find out why USAS is not collecting appropriate statistics, or if USAS is not being used then ensure that update statistics |
| is being run to generate those statistics. |
| | *Production Usage* | Many tools divide a query into several steps. During the first phases volatile tables are created |
| and populated, the last phase usually joins all the volatile tables created in the previous steps. Usually statistics are not needed |
| for those volatile tables because the final join is straight forward and the optimizer has no big choices. Nevertheless the log is |
| flooded with useless warnings if you don't set the warning level to 0. If possible, try to direct queries from those tools to a |
| dedicated service where you set the warning level to 0. |
| | *Impact* | Though the warnings give information about all statistics that are missing, it can be overwhelming |
| to get several warnings. Not all warnings may contribute to plan improvements. The optimizer issues multi-column statistics warnings |
| based on the search path, some of which may not even impact the plan quality. Also, the cost of gathering statistics on those columns |
| may not bring commensurate benefit to a large number of queries. |
| | *Level* | System. |
| | *Conflicts/Synergies* | Not applicable. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |
| |
| <<< |
| [[hist-no-stats-refresh-interval]] |
| == HIST_NO_STATS_REFRESH_INTERVAL |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Defines the time interval after which the fake histograms in the cache should be refreshed unconditionally. |
| | *Values* | Unsigned integer. Unit is seconds. + |
| + |
| The default value is *'3600'* (1 hour). |
| | *Usage* | Histogram statistics are "fake" when update statistics is not being run, but instead the customer is |
| updating the histogram tables directly with statistics to guide the optimizer. This may be done if the data in the table is very |
| volatile (such as for temporary tables), update statistics is not possible because of constant flush and fill of the table occurring, |
| and statistics are manually set to provide some guidance to the optimizer to generate a good plan. + |
| + |
| If these fake statistics are updated constantly to reflect the data churn, then this default can be set to 0. This would ensure that |
| the histograms with fake statistics are not cached and are always refreshed. If these fake statistics are set and not touched again, |
| then this interval could be set very high. |
| | *Production Usage* | Not applicable. |
| | *Impact* | Setting a high interval improves compilation time. However, if statistics are being updated, |
| then the compiler may be working with obsolete histogram statistics, potentially resulting in poorer plans. |
| | *Level* | Service. |
| | *Conflicts/Synergies* | Not applicable. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |
| |
| <<< |
| [[hist-prefetch]] |
| == HIST_PREFETCH |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Influences the compiler to pre-fetch the histograms and save them in cache. |
| | *Values* | |
| *'ON'*: Pre-fetches the histograms. + |
| *'OFF'*: Does not pre-fetch the histograms. + |
| + |
| The default value is *'ON'*. |
| | *Usage* | You may want to turn this off if you don't want to pre-fetch a large number of histograms, many of which may not be used. |
| | *Production Usage* | Not applicable. |
| | *Impact* | Though it makes compilation time faster, it may result in the histogram cache to be filled with histograms that may never be used. |
| | *Level* | System or Service. |
| | *Conflicts/Synergies* | Use this CQD with <<cache-histograms,CACHE_HISTOGRAMS>>. If CACHE_HISTOGRAMS is OFF, then this CQD has no effect. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |
| |
| <<< |
| [[hist-rowcount-requiring-stats]] |
| == HIST_ROWCOUNT_REQUIRING_STATS |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Specifies the minimum row count for which the {project-name} Optimizer needs histograms, in order to compute better cardinality estimates. |
| The Optimizer does not issue any missing statistics warnings for tables whose size is smaller than the value of this CQD. |
| | *Values* | Integer. + |
| + |
| The default value is *'50000'*. |
| | *Usage* | Use this CQD to reduce the number of statistics warnings. |
| | *Production Usage* | Not applicable. |
| | *Impact* | Missing statistics warnings are not displayed for smaller tables, which in most cases don't impact plan quality much. |
| However, there may be some exceptions where missing statistics on small tables could result in less than optimal plans. |
| | *Level* | System. |
| | *Conflicts/Synergies* | Use this CQD with <<hist-missing-stats-warning-level,HIST_MISSING_STATS_WARNING_LEVEL>>. If the warning level CQD is 0, |
| then this CQD does not have any effect. Also, for tables having fewer rows than set in this CQD, no warnings are displayed irrespective of the warning level. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |
| |
| <<< |
| [[hist-use-sample-for-cardinality-estimation]] |
| == HIST_USE_SAMPLE_FOR_CARDINALITY_ESTIMATION |
| |
| [cols="25%h,75%"] |
| |=== |
| | *Description* | Enables the Compile Time Stats feature. Compile Time Stats are produced during query plan generation by executing a |
| subset of the query on a subset of data to gather more accurate cardinality estimations. |
| | *Values* | |
| *'ON'*: Compile Time Statistics is enabled. + |
| *'OFF'*: Compile Time Statistics is disabled. + |
| + |
| The default value is *'ON'*. |
| | *Usage* | The feature is very helpful for cases when the query contains complex predicates on a table. These predicates include |
| LIKE, CASE, any other expressions or more than one range predicates and equality on large character columns. + |
| + |
| It can be disabled if most of the queries are single table or at most two-way joins. + |
| + |
| It can also be disabled if the extra collection of statistics seems to be adversely affecting the total query compile and execution time. |
| | *Production Usage* | Not applicable. |
| | *Impact* | The feature improves cardinality estimates for Scan operators thus improving the plan quality. |
| However, it can also increase the compile time. |
| | *Level* | Any. |
| | *Conflicts/Synergies* | In order to use the feature in its default form, sample tables should exist in `public_access_schema`. |
| | *Real Problem Addressed* | Not applicable. |
| | *Introduced In Release* | {project-name} 1.3.0. |
| | *Deprecated In Release* | Not applicable. |
| |=== |