////
/**
* @@@ START COPYRIGHT @@@
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* @@@ END COPYRIGHT @@@
*/
////
[[sql_utilities]]
= SQL Utilities
A utility is a tool that runs within {project-name} SQL and performs tasks.
This section describes the {project-name} SQL utilities:
[cols=","]
|===
| <<cleanup_statement,CLEANUP Statement>> | Cleans up objects whose information is inconsistent in the metadata.
| <<load_statement,LOAD Statement>> | Uses the {project-name} Bulk Loader to load data from a source table, either
a {project-name} table or a Hive table, into a target {project-name} table.
| <<populate_index_utility,POPULATE INDEX Utility>> | Loads indexes.
| <<purgedata_utility,PURGEDATA Utility>> | Purges data from tables and indexes.
| <<unload_statement,UNLOAD Statement>> | Unloads data from {project-name} tables into an HDFS location that you
specify.
| <<update_statistics_statement,UPDATE STATISTICS Statement>> | Updates the histogram statistics for one or more groups of columns
within a table. These statistics are used to devise optimized access plans.
|===
NOTE: {project-name} SQL utilities are entered interactively or from script
files using a client-based tool, such as the {project-name} Command Interface
(TrafCI). To install and configure a client application that enables you
to connect to and issue SQL utilities, see the
http://trafodion.apache.org/docs/client_install/index.html[Trafodion Client Installation Guide].
<<<
[[cleanup_statement]]
== CLEANUP Statement
The CLEANUP statement cleans up objects whose information is inconsistent in the metadata. It can be used to clean up individual objects or all objects in a schema, and it can also locate all inconsistent objects in the Trafodion metadata and then clean them up.
Inconsistent objects can be cleaned up and removed from the metadata and HBase by using one of the following options.
```
CLEANUP object-type object-name [ ,UID {object-uid} ]
CLEANUP [ PRIVATE | SHARED ] SCHEMA {schema-name}
CLEANUP UID {object-uid}
CLEANUP METADATA [ , CHECK ] [ , RETURN DETAILS ]
```
```
object-type is: TABLE | INDEX | SEQUENCE | OBJECT
object-name is: Name of the object that needs to be cleaned up
object-uid is: UID (Unique ID) of the object that is stored in metadata
schema-name is: Name of the schema whose contents need to be cleaned up
```
[[cleanup_syntax]]
=== Syntax Description of CLEANUP Statement
* `CLEANUP object-type object-name [ , UID {object-uid} ]`
+
This command will clean up the specified object from all Trafodion metadata tables
and HBase. If an object cannot be found in the metadata but exists in HBase,
then it will be removed from HBase. All dependent objects, such as indexes,
views, LOBs, internal sequences, and constraints, are also removed from the
metadata and HBase.
** `object-type`
*** If an object type is specified as TABLE, INDEX, or SEQUENCE and that object exists
in the metadata, then a check will be done to make sure that the specified `object-name` is of the specified type. An error is returned if they are not the same.
*** If an object type is specified as OBJECT, then any object that matches the
specified name is removed from the metadata and HBase.
** `object-name`
+
Specifies the name of the object to be removed.
** `object-uid`
+
If an object exists in the OBJECTS metadata table, then the stored object UID is compared with the specified object UID. If they do not match, then cleanup is not done. If an object does not exist in the OBJECTS metadata table, then all entries matching the specified object UID are removed from the metadata tables.
* `CLEANUP [ PRIVATE | SHARED ] SCHEMA {schema-name}`
+
This command will clean up all objects that are created in the specified schema.
* `CLEANUP UID {object-uid}`
+
This command will remove all entries from all metadata tables that match the specified object UID. If the specified object UID exists in the OBJECTS metadata table and the object name associated with that UID can be retrieved, then that object is also removed from HBase.
* `CLEANUP METADATA [ , CHECK ] [ , RETURN DETAILS ]`
+
This command will look for inconsistent entries in the metadata, return information about them, and clean them up. Multiple checks are done to validate metadata consistency.
** `CHECK`
+
You can check for inconsistent metadata entries without actually cleaning them up
by specifying the `CHECK` option. If `CHECK` is specified, the following checks are performed, and a summary of inconsistent objects belonging to the categories listed below is returned.
*** Orphan Metadata Objects
+
This check is done to validate that objects that exist in metadata also exist in HBase. If the object does not exist in HBase, then it is marked as an orphan metadata object.
*** Orphan HBase Objects
+
This check is done to validate that objects that exist in HBase also exist in metadata. If the object does not exist in metadata, then it is marked as an orphan HBase object.
*** Inconsistent Object Entries
+
This check is done to make sure that all entries in the OBJECTS table also exist in
the OBJECTS unique index (OBJECTS_UNIQ_IDX) and that all entries in OBJECTS_UNIQ_IDX also exist in the OBJECTS table. It also checks to make sure that all entries in the COLUMNS table have a corresponding entry in the OBJECTS table. If any of the previous checks fail, then that entry is marked as inconsistent.
*** Inconsistent View Entries
+
This check is done to make sure that all view entries are consistent among the views and the OBJECTS table.
** `RETURN DETAILS`
+
If `RETURN DETAILS` is specified, then details about which objects are inconsistent are also returned.
<<<
[[cleanup_considerations]]
=== Considerations for CLEANUP Statement
[[security_and_privileges]]
==== Security and Privileges
To clean up an object, you must have DROP privileges. If object privileges cannot be retrieved from the metadata, or if metadata cleanup is being done, then you must be DB_ROOT to perform the cleanup.
[[consistency_checks_during_access]]
==== Consistency Checks During Access
If an object exists in the metadata but does not exist in HBase, then error 4254 will be returned if that object is accessed during any DML or DDL operation.
An object is also marked as inconsistent during any complex DDL operation. For example, if a CREATE TABLE also creates implicit indexes or constraints, then the object will be marked as inconsistent until the complex DDL operation is completed. If an abnormal error happens (such as a process crash or a node failure), then that object remains inconsistent. Any attempt to access that object will return error 4254.
An inconsistent object will need to be cleaned up before it can be accessed.
<<<
[[cleanup_examples]]
=== Examples of CLEANUP Statement
* Accessing Inconsistent Objects
+
Table T026T1 has been marked as inconsistent from a previous session. Any operation on it will return error 4254. For this test case, a metadata update was done, and the table was marked as invalid in the OBJECTS table.
+
```
>>invoke t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
--- SQL operation failed with errors.
>>drop table t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
--- SQL operation failed with errors.
>>select * from t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
*** ERROR[8822] The statement was not prepared.
>>
```
* Accessing Incorrect Objects
+
If an object type is specified, then the object stored in the metadata must match it. In this example, object 'TI' is an index that cannot be cleaned up as a table.
+
```
>>create index ti on t(b);
--- SQL operation complete.
>>cleanup table ti;
*** ERROR[1389] Object TI does not exist in Trafodion.
*** ERROR[4256] Object type specified for this object does not match the object type stored in metadata.
--- SQL operation failed with errors.
>>
```
* Cleaning Up an Inconsistent Object
+
A CLEANUP command must be used to clean up an object in an invalid state.
+
```
>>drop table t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
--- SQL operation failed with errors.
>>cleanup table t026t1;
--- SQL operation complete.
>>invoke t026t1;
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
--- SQL operation failed with errors.
>>
```
+
DROP TABLE, CREATE TABLE, DROP INDEX, and CREATE INDEX failures will call cleanup.
* Cleaning Up an Object by Specifying its UID
+
Specify object T by its object UID and clean it up.
+
```
>>select object_uid from "_MD_".objects where schema_name = 'SCH026' and object_name = 'T';
OBJECT_UID
--------------------
127287511636506969
--- 1 row(s) selected.
>>cleanup uid 127287511636506969;
--- SQL operation complete.
>>invoke t;
*** ERROR[4082] Object TRAFODION.SCH026.T does not exist or is inaccessible.
--- SQL operation failed with errors.
>>
```
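* Cleaning Up a Schema
+
The following is a minimal sketch of the CLEANUP SCHEMA variant, which removes all objects created in the specified schema. The schema name SCH026 and the output are illustrative:
+
```
>>cleanup schema sch026;
--- SQL operation complete.
>>
```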
* Metadata Cleanup
+
This example shows that the metadata is consistent and that there are no issues with it.
+
```
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
End: Cleanup Orphan Objects Entries (0 entries found)
Start: Cleanup Orphan Hbase Entries
End: Cleanup Orphan Hbase Entries (0 entries found)
Start: Cleanup Inconsistent Objects Entries
End: Cleanup Inconsistent Objects Entries (0 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
>>
```
+
This example shows inconsistencies in the metadata:
+
```
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T10
End: Cleanup Orphan Objects Entries (1 entry found)
Start: Cleanup Orphan Hbase Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T11
End: Cleanup Orphan Hbase Entries (1 entry found)
Start: Cleanup Inconsistent Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T12
Entry #2(UID): 29899934458095403
End: Cleanup Inconsistent Objects Entries (2 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
>>
```
+
Running the CLEANUP METADATA command cleans up the inconsistent metadata:
+
```
>>cleanup metadata, return details;
Metadata Cleanup: started
Start: Cleanup Orphan Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T10
End: Cleanup Orphan Objects Entries (1 entry cleaned up)
Start: Cleanup Orphan Hbase Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T11
End: Cleanup Orphan Hbase Entries (1 entry cleaned up)
Start: Cleanup Inconsistent Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T12
Entry #2(UID): 29899934458095403
End: Cleanup Inconsistent Objects Entries (2 entries cleaned up)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries cleaned up)
Metadata Cleanup: done
--- SQL operation complete.
>>
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
End: Cleanup Orphan Objects Entries (0 entries found)
Start: Cleanup Orphan Hbase Entries
End: Cleanup Orphan Hbase Entries (0 entries found)
Start: Cleanup Inconsistent Objects Entries
End: Cleanup Inconsistent Objects Entries (0 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
```
<<<
[[load_statement]]
== LOAD Statement
The LOAD statement uses the {project-name} Bulk Loader to load data from a
source table, either a {project-name} table or a Hive table, into a target
{project-name} table. The {project-name} Bulk Loader prepares and loads HFiles
directly in the region servers and bypasses the write path and the cost
associated with it. The write path begins at a client, moves to a region
server, and ends when data eventually is written to an HBase data file
called an HFile.
The {project-name} bulk load process takes place in the following phases:
* *Disable Indexes* (if incremental index build is not used)
* *Prepare* (takes most time, heart of the bulk load operation)
** Read source files ({project-name} Table, Hive table, or Hive external table)
** Data encoded in {project-name} encoding
** Data repartitioned and sorted to match regions of target table
** Data written to HFiles
** Data repartitioned and written to index HFiles (if incremental index build IS used)
* *Complete* (with or without Snapshot recovery)
** Take a snapshot of the table
** Merge HFiles into HBase table (very fast – move, not a copy)
** Delete snapshot or restore from snapshot if merge fails
* *Populate Indexes* (if incremental index build is NOT used)
* *Cleanup*
** HFiles temporary space cleanup
LOAD is a {project-name} SQL extension.
```
LOAD [WITH option[[,] option]...] INTO target-table SELECT ... FROM source-table
option is:
CONTINUE ON ERROR
| LOG ERROR ROWS [TO error-location-name]
| STOP AFTER num ERROR ROWS
| TRUNCATE TABLE
| NO POPULATE INDEXES
| NO DUPLICATE CHECK
| NO OUTPUT
| NO RECOVERY
| INDEX TABLE ONLY
| REBUILD INDEXES
| UPSERT USING LOAD
```
[[load_syntax]]
=== Syntax Description of LOAD
* `_target-table_`
+
is the name of the target {project-name} table where the data will be loaded.
See <<database_object_names,Database Object Names>>.
* `_source-table_`
+
is the name of either a {project-name} table or a Hive table that has the
source data. Hive tables can be accessed in {project-name} using the
HIVE.HIVE schema (for example, hive.hive.orders). The Hive table needs
to already exist in Hive before {project-name} can access it. If you want to
load data that is already in an HDFS folder, then you need to create an
external Hive table with the right fields and pointing to the HDFS
folder containing the data. You can also specify a WHERE clause on the
source data as a filter.
* `[WITH _option_[[,] _option_]&#8230;]`
+
is a set of options that you can specify for the load operation. You can
specify one or more of these options:
** `CONTINUE ON ERROR`
+
The LOAD statement continues after ignorable errors encountered while scanning rows from the source table or loading into the target table. Ignorable errors are usually data conversion errors.
+
Errors during the load or sort phase will cause the LOAD statement to abort.
+
This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified.
** `LOG ERROR ROWS [TO _error-location-name_]`
*** Error rows
+
If error rows must be written to a specified location, then specify TO _error-location-name_; otherwise, they are written to the default location.
`_error-location-name_` must be an HDFS directory to which the trafodion user has write access.
+
Error rows will be logged in HDFS files in the *directory* `/user/trafodion/bulkload/logs` if the error log location is not specified.
+
The default name of the *subdirectory* is `_ERR_catalog.schema.target_table_date_id_`, where `_id_` is a timestamp (YYYYMMDD_HHMMSS) unique to the process where the error was seen.
+
The default name of the *error file* is `_loggingFileNamePrefix_catalog.schema.target_table_instanceID_`, where `_loggingFileNamePrefix_` is hive_scan_err or traf_upsert_err depending on the source table, and `_instanceID_` is the instance ID starting from 0; generally, there is only one instance.
+
For example, the full error-file path for the table test_load_log is `/user/trafodion/bulkload/logs/test/ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918/traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0`,
+
where:
+
1. `/user/trafodion/bulkload/logs/test` is the default *directory* name.
+
2. `ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918` is the default *subdirectory* name.
+
3. `traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0` is the default *error file* name.
*** Error logs
+
Error logs are written in separate files by the processes involved in the load command, under a subdirectory representing the load command in the given location.
+
The actual log file location is displayed in the LOAD command output. It is recommended that you use the same location across loads, as that makes the error logs easier to find.
** `STOP AFTER _num_ ERROR ROWS`
+
Use this option to stop the LOAD statement after _num_ error rows have been encountered during this statement.
+
Duplicate-check errors and errors ignored because of the previous options are included in the count.
*** _num_
+
specifies the number of error rows.
** `TRUNCATE TABLE`
+
causes the Bulk Loader to truncate the target table before starting the
load operation. By default, the Bulk Loader does not truncate the target
table before loading data.
** `NO POPULATE INDEXES`
+
specifies that the Bulk Loader not handle index maintenance or populate
the indexes. By default, the Bulk Loader handles index maintenance,
disabling indexes before starting the load operation and populating them
after the load operation is complete.
** `NO DUPLICATE CHECK`
+
causes the Bulk Loader to ignore duplicates in the source data. By
default, the Bulk Loader checks if there are duplicates in the source
data and generates an error when it detects duplicates.
** `NO OUTPUT`
+
prevents the LOAD statement from displaying status messages. By default,
the LOAD statement prints status messages listing the steps that the
Bulk Loader is executing.
** `NO RECOVERY`
+
specifies that the Bulk Loader not use HBase snapshots for recovery. By default, the Bulk Loader handles recovery using the HBase snapshots mechanism.
** `INDEX TABLE ONLY`
+
specifies that the target table, which is an index, be populated with
data from the parent table.
** `REBUILD INDEXES`
+
specifies that indexes of the target table will be updated automatically when the source table
is updated.
+
This is the default behavior of the LOAD statement; that is, even if this option is not
specified, the LOAD statement rebuilds indexes unless the
CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE` is turned *ON*. This CQD is turned *OFF* by default.
Consequently, with the CQD turned *OFF*, the elapsed time of a LOAD statement without options
is nearly equal to that of LOAD WITH REBUILD INDEXES, whereas with the CQD turned *ON*,
the former takes less time than the latter.
** `UPSERT USING LOAD`
+
specifies that data is inserted into target tables using rowset inserts without transactions.
+
In this case, {project-name} uses the regular HBase interface (Put) rather than the HBase bulk loader when inserting multiple rows at a time.
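Options can be combined in a single WITH clause, separated by commas. The following is a minimal sketch (the table names are illustrative):

```
LOAD WITH TRUNCATE TABLE, NO DUPLICATE CHECK, NO OUTPUT
INTO target_table1 SELECT * FROM source_table;
```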
<<<
[[load_considerations]]
=== Considerations for LOAD
[[load_required_privileges]]
==== Required Privileges
To issue a LOAD statement, one of the following must be true:
* You are DB ROOT.
* You are the owner of the target table.
* You have these privileges:
** SELECT and INSERT privileges on the target table
** DELETE privilege on the target table if TRUNCATE TABLE is specified
* You have the MANAGE_LOAD component privilege for the SQL_OPERATIONS component.
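For example, a user other than the table owner could be granted the required privileges as sketched below. The user name LOADER1 is illustrative; see the GRANT and GRANT COMPONENT PRIVILEGE statements for details:

```
GRANT SELECT, INSERT ON TABLE trafodion.seabase.target_table1 TO loader1;
-- DELETE is required only if TRUNCATE TABLE is specified:
GRANT DELETE ON TABLE trafodion.seabase.target_table1 TO loader1;
-- Alternatively, grant the bulk-load component privilege:
GRANT COMPONENT PRIVILEGE MANAGE_LOAD ON SQL_OPERATIONS TO loader1;
```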
[[load_configuration_before_running_load]]
==== Configuration Before Running LOAD
Before running the LOAD statement, make sure that you have configured
the staging folder, source table, and HBase according to these
guidelines.
==== Staging Folder for HFiles
The Bulk Loader uses an HDFS folder as a staging area for the HFiles
before calling HBase APIs to merge them into the {project-name} table.
By default, {project-name} uses /bulkload as the staging folder. This folder
must be owned by the same user as the one under which {project-name} runs. {project-name}
also must have full permissions on this folder. The HBase user (that is,
the user under which HBase runs) must have read/write access to this
folder.
Example:
```
drwxr-xr-x - trafodion trafodion 0 2014-07-07 09:49 /bulkload
```
<<<
==== Improving Throughput
The following CQD (Control Query Default) settings help improve the Bulk Loader
throughput:
* `TRAF_LOAD_MAX_HFILE_SIZE`
+
Specifies the HFile size limit beyond which the current file is closed and a
new file is created for the same partition. Adjust this setting to minimize
HBase splitting/merging.
+
*Default*: 10GB
* `TRAF_LOAD_PREP_TMP_LOCATION`
+
Specifies the HDFS directory where HFiles are created during load.
+
*Default*: `/bulkload`
Also, consider using `NO DUPLICATE CHECK` to improve throughput if your
source data is clean.
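For example, both settings can be adjusted in the same session before running the LOAD statement. This is a sketch; the staging path and table names are illustrative:

```
-- stage HFiles in a custom HDFS directory (illustrative path)
CQD TRAF_LOAD_PREP_TMP_LOCATION '/bulkload_staging';
-- skip the duplicate check because the source data is known to be clean
LOAD WITH NO DUPLICATE CHECK INTO target_table1 SELECT * FROM source_table;
```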
==== Hive Source Table
To load data stored in HDFS, you need to create a Hive table with
the right fields and types pointing to the HDFS folder containing the
data before you start the load.
==== HBase Snapshots
If you do not specify the NO RECOVERY option in the LOAD statement, the
Bulk Loader uses HBase snapshots as a mechanism for recovery. Snapshots
are a lightweight operation where some metadata is copied. (Data is not
copied.)
A snapshot is taken before the load starts and is removed after
the load completes successfully. If something goes wrong and it is
possible to recover, the snapshot is used to restore the table to its
initial state before the load started. To use this recovery mechanism,
HBase needs to be configured to allow snapshots.
==== Incremental Loads
The Bulk Loader allows for incremental loads by default. Snapshots are
taken before the second phase starts and deleted once the bulk load completes.
If something goes wrong with the load, then the snapshot is used to restore
the table to its previous state.
<<<
==== Non-Incremental Loads
The following bulk load options can be used to do a non-incremental load (see the sketch after this list):
* `NO RECOVERY`: Do not take a snapshot of the table.
* `TRUNCATE TABLE`: Truncates the table before starting the load.
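For example, the following sketch combines both options to replace the contents of a target table without snapshot overhead (the table names are illustrative):

```
LOAD WITH NO RECOVERY, TRUNCATE TABLE
INTO target_table1 SELECT * FROM source_table;
```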
==== Space Usage
The target table values for SYSKEY, SALT, identity, and divisioning columns
are created automatically during the transformation step. The size of the
HBase files is determined based on encoding, compression, HDFS replication
factor, and row format. The target table can be pre-split into regions using
salting, a Java program, or by seeding the table with data.
==== Performance
The overall throughput is influenced by row format, row length, number of
columns, skew in data, and so on. LOAD has upsert semantics (duplicate
constraints are not checked against existing data). LOAD has lower CPU and
disk activity than a similar trickle load (INSERT, UPSERT, or UPSERT USING
LOAD). Also, LOAD has lower compaction activity after completion than a trickle load.
==== Hive Scans
Direct access for Hive table data supports:
* Only text input format and sequence files.
* Only structured data types.
Tables must be created/dropped/altered through Hive itself.
{project-name}:
* Reads Hive metadata to determine information about the table.
* UPDATE STATISTICS can be performed on Hive tables, which improves performance (see the sketch below).
* Can write to Hive tables in both Text and Sequence formats (used by UNLOAD).
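For example, statistics for a Hive source table can be collected as sketched below. The table name is illustrative; see <<update_statistics_statement,UPDATE STATISTICS Statement>>:

```
UPDATE STATISTICS FOR TABLE hive.hive.customer_demographics ON EVERY COLUMN;
```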
<<<
[[load_examples]]
=== Examples of LOAD
[[continue_on_error_examples]]
==== Examples of `CONTINUE ON ERROR`
Suppose that we have the following tables:
_source_table_:
```
SQL>SELECT * FROM source_table;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SHOWDDL source_table;
CREATE TABLE TRAFODION.SEABASE.SOURCE_TABLE
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table1_:
```
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SQL>SHOWDDL target_table1;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE1
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(3) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
* The examples below demonstrate how the LOAD Statement behaves without and with `CONTINUE ON ERROR`, when ignorable data conversion errors occur.
** When loading data from _source_table_ into _target_table1_, if `CONTINUE ON ERROR` is not specified, the operation fails with a data conversion error.
+
```
SQL>LOAD INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:15:53.222441
Task: CLEANUP Status: Ended Time: 2018-01-03 16:15:53.250826
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.028
Task: LOADING DATA Status: Started Time: 2018-01-03 16:15:53.250909
*** ERROR[8402] A string overflow occurred during the evaluation of a character expression. Conversion of Source Type:CHAR(REC_BYTE_F_ASCII,4 BYTES,ISO88591) Source Value:aaaa to Target Type:CHAR(REC_BYTE_F_ASCII,3 BYTES,ISO88591). [2018-01-03 16:15:54]
```
** When loading data from _source_table_ into _target_table1_, if `CONTINUE ON ERROR` is specified, the operation succeeds after ignorable data conversion errors.
+
```
SQL>LOAD WITH CONTINUE ON ERROR INTO target_table1 SELECT * FROM source_table;
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Rows Processed: 3
Task: PREPARATION Status: Ended ET: 00:00:03.151
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: COMPLETION Status: Ended ET: 00:00:01.137
--- 3 row(s) loaded.
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:19:43.543405
Task: CLEANUP Status: Ended Time: 2018-01-03 16:19:43.568828
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.025
Task: LOADING DATA Status: Started Time: 2018-01-03 16:19:43.568899
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 16:19:44.211150
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.642
Task: COMPLETION Status: Started Time: 2018-01-03 16:19:44.211192
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 16:19:45.171458
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.960
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
```
[[log_error_rows_examples]]
==== Examples of `LOG ERROR ROWS [TO error-location-name]`
Suppose that we have the same two tables (_source_table_ and _target_table1_) as shown in the <<continue_on_error_examples,Examples of `CONTINUE ON ERROR`>>.
** This example explains how the LOAD statement loads data and logs error rows to the default directory `/user/trafodion/bulkload/logs`.
+
```
SQL>LOAD WITH LOG ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:23:03.142862
Task: CLEANUP Status: Ended Time: 2018-01-03 16:23:03.151725
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.009
Logging Location: /user/trafodion/bulkload/logs/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_082303
Task: LOADING DATA Status: Started Time: 2018-01-03 16:23:03.151815
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 16:23:03.920270
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.768
Task: COMPLETION Status: Started Time: 2018-01-03 16:23:03.920313
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 16:23:04.301579
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.381
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
[root@cent-1 bin]$ hdfs dfs -ls /user/trafodion/bulkload/logs/
Found 1 items
drwxr-xr-x - trafodion trafodion 0 2018-01-13 16:23
/user/trafodion/bulkload/logs/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_082303
```
** This example shows how the LOAD statement loads and logs error rows to the specified directory `/bulkload/error_log`.
+
```
SQL>LOAD WITH LOG ERROR ROWS TO '/BULKLOAD/ERROR_LOG' INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 17:19:43.436689
Task: CLEANUP Status: Ended Time: 2018-01-03 17:19:43.456761
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.020
Logging Location: /bulkload/error_log/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_091943
Task: LOADING DATA Status: Started Time: 2018-01-03 17:19:43.456804
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 17:19:43.722825
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.266
Task: COMPLETION Status: Started Time: 2018-01-03 17:19:43.722868
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 17:19:44.591544
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.869
--- SQL operation complete.
[root@cent-1 bin]$ hdfs dfs -ls /bulkload/error_log
Found 1 items
drwxr-xr-x - trafodion trafodion 0 2018-01-03 17:19
/bulkload/error_log/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_091943
```
[[stop_after_num_error_rows_examples]]
==== Examples of `STOP AFTER num ERROR ROWS`
Suppose that we have the same two tables (_source_table_ and _target_table1_) as shown in the <<continue_on_error_examples,Examples of `CONTINUE ON ERROR`>>.
The examples below illustrate how the LOAD statement behaves depending on the value of `num`.
```
SQL>LOAD WITH STOP AFTER 2 ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
---------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-05 10:53:52.20569
Task: CLEANUP Status: Ended Time: 2018-01-05 10:53:52.45689
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.025
Task: LOADING DATA Status: Started Time: 2018-01-05 10:53:52.45757
*** WARNING[8114] The number of error rows is 3 [2018-01-05 10:53:53]
*** ERROR[8113] The maximum number of error rows is exceeded. [2018-01-05 10:53:53]
*** WARNING[8114] The number of error rows is 3 [2018-01-05 10:53:53]
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
```
```
SQL>LOAD WITH STOP AFTER 3 ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
---------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-05 15:55:58.975459
Task: CLEANUP Status: Ended Time: 2018-01-05 15:55:59.20219
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.045
Task: LOADING DATA Status: Started Time: 2018-01-05 15:55:59.20322
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-05 15:55:59.960109
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.940
Task: COMPLETION Status: Started Time: 2018-01-05 15:55:59.960180
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-05 15:56:00.448496
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.488
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
```
[[index_table_only_examples]]
==== Examples of `INDEX TABLE ONLY`
Suppose that we have the following tables:
_source_table_:
```
SQL>SELECT * FROM source_table;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SHOWDDL source_table;
CREATE TABLE TRAFODION.SEABASE.SOURCE_TABLE
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table1_:
```
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SQL>SHOWDDL target_table1;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE1
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(3) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table2_:
```
SQL>SELECT * FROM target_table2;
--- 0 row(s) selected.
SQL>SHOWDDL target_table2;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE2
(
A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, PRIMARY KEY (A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table3_:
```
SELECT * FROM target_table3;
--- 0 row(s) selected.
SHOWDDL target_table3;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE3
(
A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, PRIMARY KEY (A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table4_:
```
SELECT * FROM target_table4;
--- 0 row(s) selected.
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE4
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
* The examples below demonstrate how the index table and the target table get populated.
** The index table gets populated, while the target table does not get populated if `NO POPULATE` is specified.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(b) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table1) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SELECT * FROM table(index_table index_target_table1);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
** The index table gets populated, and the target table gets populated as well if `NO POPULATE` is not specified.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(b);
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table1) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SELECT * FROM table(index_table index_target_table1);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
* The examples below illustrate how to populate index tables for different target tables.
** _target_table2_ has columns A (primary key column) and B (index column) defined. In this case, populate the index table with columns B and A from _source_table_.
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(b) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table2);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE2
(
"B@" CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT NO DEFAULT NOT SERIALIZED
, A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, PRIMARY KEY ("B@" ASC, A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table2) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table2;
--- 0 row(s) selected.
SQL>SELECT * FROM TABLE(INDEX_TABLE index_target_table2);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
** _target_table3_ has columns A (primary key column and index column) and B defined. In this case, populate the index table with column A from _source_table_.
+
```
SQL>CREATE INDEX index_target_table3 ON target_table3(a) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table3);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE3
(
"A@" INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, PRIMARY KEY ("A@" ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table3) SELECT a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table3;
--- 0 row(s) selected.
SQL> SELECT * FROM TABLE(INDEX_TABLE index_target_table3);
A@
-----------
1
2
3
4
5
6
--- 6 row(s) selected.
```
** _target_table4_ has columns A (index column) and B defined. In this case, populate the index table with column A and SYSKEY from _source_table_.
+
```
SQL> CREATE INDEX index_target_table4 ON target_table4(a) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table4);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE4
(
"A@" INT NO DEFAULT NOT SERIALIZED
, SYSKEY LARGEINT NO DEFAULT NOT NULL NOT DROPPABLE
NOT SERIALIZED
, PRIMARY KEY ("A@" ASC, SYSKEY ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table4) SELECT a,syskey FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table4;
--- 0 row(s) selected.
SQL>SELECT * FROM TABLE(INDEX_TABLE index_target_table4);
A@ SYSKEY
----------- --------------------
1 4239726128363214004
2 4239726128363256924
3 4239726128363258834
4 4239726128363260240
5 4239726128363261628
6 4239726128363263088
--- 6 row(s) selected.
```
NOTE: At this time, if you want to drop an index created above, such as _index_target_table2_, _index_target_table3_, or _index_target_table4_, populate the index from its parent table before dropping it, as shown in the example below. For more information, see <<populate_index_utility,POPULATE INDEX Utility>>.
```
SQL> DROP INDEX index_target_table4;
*** ERROR[4254] Object TRAFODION.SEABASE.INDEX_TARGET_TABLE4 has invalid state and cannot be accessed. Use cleanup command to drop it.
SQL> POPULATE INDEX index_target_table4 ON target_table4;
--- SQL operation complete.
SQL> DROP INDEX index_target_table4;
--- SQL operation complete.
```
[[rebuild_indexes_examples]]
==== Examples of `REBUILD INDEXES`
Suppose that we have the following tables:
_source_table_:
```
SQL>select count(*) from source_table;
(EXPR)
--------------------
             1000000
--- 1 row(s) selected.
```
_target_table1_ and _target_table2_ both have the following structure:
```
SQL>CREATE TABLE target_table1
  (
    ID                               INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
      SERIALIZED
  , NUM                              INT DEFAULT NULL NOT SERIALIZED
  , CARD_ID                          LARGEINT DEFAULT NULL NOT SERIALIZED
  , PRICE                            DECIMAL(11, 3) DEFAULT NULL NOT SERIALIZED
  , START_DATE                       DATE DEFAULT NULL NOT SERIALIZED
  , START_TIME                       TIME(0) DEFAULT NULL NOT SERIALIZED
  , END_TIME                         TIMESTAMP(6) DEFAULT NULL NOT SERIALIZED
  , B_YEAR                           INTERVAL YEAR(10) DEFAULT NULL NOT
      SERIALIZED
  , B_YM                             INTERVAL YEAR(5) TO MONTH DEFAULT NULL NOT
      SERIALIZED
  , B_DS                             INTERVAL DAY(10) TO SECOND(3) DEFAULT NULL
      NOT SERIALIZED
  , PRIMARY KEY (ID ASC)
  )
  SALT USING 9 PARTITIONS
ATTRIBUTES ALIGNED FORMAT NAMESPACE 'TRAF_1500000'
  HBASE_OPTIONS
  (
    MEMSTORE_FLUSH_SIZE = '1073741824'
  )
;
```
* This example compares the execution time of the LOAD statement without options against
`LOAD WITH REBUILD INDEXES` when the CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE`
is turned *OFF* (the default). The two statements take almost the same time.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:33:52.310
Task: CLEANUP Status: Ended Time: 2018-01-18 13:33:52.328
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.019
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:33:52.328
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:34:04.709
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.381
Task: LOADING DATA Status: Started Time: 2018-01-18 13:34:04.709
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:34:21.629
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:16.919
Task: COMPLETION Status: Started Time: 2018-01-18 13:34:21.629
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:34:22.436
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.808
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:34:22.436
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:34:31.116
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:08.680
--- SQL operation complete.
Start Time 2018/01/18 13:33:51.478782
End Time 2018/01/18 13:34:31.549491
Elapsed Time 00:00:40.070709
Compile Time 00:00:00.510024
Execution Time 00:00:39.559433
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:35:01.804
Task: CLEANUP Status: Ended Time: 2018-01-18 13:35:01.823
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.018
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:35:01.823
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:35:13.840
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.017
Task: LOADING DATA Status: Started Time: 2018-01-18 13:35:13.840
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:19.720
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:05.879
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:19.720
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:22.436
Task: COMPLETION Status: Ended Elapsed Time: 00:00:02.717
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:22.436
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:33.346
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:10.910
--- SQL operation complete.
Start Time 2018/01/18 13:35:00.624490
End Time 2018/01/18 13:35:33.779394
Elapsed Time 00:00:33.154904
Compile Time 00:00:00.825703
Execution Time 00:00:32.321890
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table1);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:34:37.836
Task: CLEANUP Status: Ended Time: 2018-01-18 13:34:37.847
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.011
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:34:37.847
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:34:45.445
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:07.598
Task: LOADING DATA Status: Started Time: 2018-01-18 13:34:45.445
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:03.576
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:18.131
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:03.577
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:04.873
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.296
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:04.873
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:12.461
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:07.589
--- SQL operation complete.
Start Time 2018/01/18 13:34:37.053647
End Time 2018/01/18 13:35:12.893891
Elapsed Time 00:00:35.840244
Compile Time 00:00:00.435855
Execution Time 00:00:35.402620
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:35:25.480
Task: CLEANUP Status: Ended Time: 2018-01-18 13:35:25.493
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.013
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:35:25.493
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:35:38.844
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.591
Task: LOADING DATA Status: Started Time: 2018-01-18 13:35:38.845
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:43.491
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:05.407
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:43.491
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:45.920
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.601
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:45.920
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:56.322
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:11.230
--- SQL operation complete.
Start Time 2018/01/18 13:35:24.693410
End Time 2018/01/18 13:35:56.754441
Elapsed Time 00:00:32.061031
Compile Time 00:00:00.449236
Execution Time 00:00:31.611112
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table2);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
* This example compares the execution time of the LOAD statement without options against
`LOAD WITH REBUILD INDEXES` when the CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE`
is turned *ON*. The former takes less time than the latter.
+
```
SQL>CQD TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE 'ON';
--- SQL operation complete.
SQL>CREATE INDEX index_target_table1 ON target_table1(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:01.730
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:01.756
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.027
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:01.756
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:46:22.415
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:20.659
Task: COMPLETION Status: Started Time: 2018-01-18 13:46:22.415
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:46:26.353
Task: COMPLETION Status: Ended Elapsed Time: 00:00:03.938
--- SQL operation complete.
Start Time 2018/01/18 13:46:00.954518
End Time 2018/01/18 13:46:26.795757
Elapsed Time 00:00:25.841239
Compile Time 00:00:00.455681
Execution Time 00:00:25.384158
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:57.811
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:57.870
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.058
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:57.870
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:47:12.411
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:14.541
Task: COMPLETION Status: Started Time: 2018-01-18 13:47:12.411
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:16.292
Task: COMPLETION Status: Ended Elapsed Time: 00:00:03.881
--- SQL operation complete.
Start Time 2018/01/18 13:46:57.400422
End Time 2018/01/18 13:47:16.738970
Elapsed Time 00:00:19.338548
Compile Time 00:00:00.010545
Execution Time 00:00:19.321781
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table1);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:28.303
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:28.796
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.049
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:46:28.796
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:46:38.479
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:10.400
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:38.479
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:46:58.143
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:19.665
Task: COMPLETION Status: Started Time: 2018-01-18 13:46:58.144
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:00.186
Task: COMPLETION Status: Ended Elapsed Time: 00:00:02.043
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:47:00.187
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:47:09.966
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:08.910
--- SQL operation complete.
Start Time 2018/01/18 13:46:27.063642
End Time 2018/01/18 13:47:09.529257
Elapsed Time 00:00:42.465615
Compile Time 00:00:00.532541
Execution Time 00:00:41.928812
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:47:18.187
Task: CLEANUP Status: Ended Time: 2018-01-18 13:47:18.198
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.011
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:47:18.198
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:47:30.670
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.472
Task: LOADING DATA Status: Started Time: 2018-01-18 13:47:30.670
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:47:39.311
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:08.641
Task: COMPLETION Status: Started Time: 2018-01-18 13:47:39.311
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:40.497
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.186
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:47:40.497
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:47:52.367
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:11.539
--- SQL operation complete.
Start Time 2018/01/18 13:47:17.447093
End Time 2018/01/18 13:47:52.469190
Elapsed Time 00:00:35.022097
Compile Time 00:00:00.412718
Execution Time 00:00:34.608571
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table2);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
[[loading_data_from_hive_table_examples]]
==== Examples of Loading Data from a Hive Table
* For customer demographics data residing in
`/hive/tpcds/customer_demographics`, create an external Hive table using
the following Hive SQL:
+
```
create external table customer_demographics
(
cd_demo_sk int
, cd_gender string
, cd_marital_status string
, cd_education_status string
, cd_purchase_estimate int
, cd_credit_rating string
, cd_dep_count int
, cd_dep_employed_count int
, cd_dep_college_count int
)
row format delimited fields terminated by '|' location
'/hive/tpcds/customer_demographics';
```
* The {project-name} table where you want to load the data is defined using
this DDL:
+
```
create table customer_demographics_salt
(
cd_demo_sk int not null
, cd_gender char(1)
, cd_marital_status char(1)
, cd_education_status char(20)
, cd_purchase_estimate int
, cd_credit_rating char(10)
, cd_dep_count int
, cd_dep_employed_count int
, cd_dep_college_count int
, primary key (cd_demo_sk)
)
salt using 4 partitions on (cd_demo_sk);
```
* This example shows how the LOAD statement loads the
customer_demographics_salt table from the Hive table,
`hive.hive.customer_demographics`:
+
```
>>load into customer_demographics_salt
+>select * from hive.hive.customer_demographics where cd_demo_sk <= 5000;
Task: LOAD Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Ended Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: PREPARATION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Rows Processed: 5000
Task: PREPARATION Status: Ended ET: 00:00:03.199
Task: COMPLETION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: COMPLETION Status: Ended ET: 00:00:00.331
Task: POPULATE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: POPULATE INDEX Status: Ended ET: 00:00:05.262
```
<<<
[[populate_index_utility]]
== POPULATE INDEX Utility
The POPULATE INDEX utility performs a fast INSERT of data into an index
from the parent table. You can execute this utility in a client-based
tool like TrafCI.
```
POPULATE INDEX index ON table [index-option]
index-option is:
ONLINE | OFFLINE
```
[[populate_index_syntax]]
=== Syntax Description of POPULATE INDEX
* `_index_`
+
is an SQL identifier that specifies the simple name for the index. You
cannot qualify _index_ with its schema name. Indexes have their own
name space within a schema, so an index name might be the same as a table
or constraint name. However, no two indexes in a schema can have the
same name.
* `_table_`
+
is the name of the table for which to populate the index. See
<<database_object_names,Database Object Names>>.
* `ONLINE`
+
specifies that the populate operation should be done on-line. That is,
ONLINE allows read and write DML access on the base table while the
populate operation occurs. Additionally, ONLINE reads the audit trail to
replay updates to the base table during the populate phase. If a lot of
audit is generated and you perform many CREATE INDEX operations, we
recommend that you avoid ONLINE operations because they can add more
contention to the audit trail. The default is ONLINE.
* `OFFLINE`
+
specifies that the populate should be done off-line. OFFLINE allows only
read DML access to the base table. The base table is unavailable for
write operations at this time. OFFLINE must be specified explicitly.
SELECT is allowed.
<<<
[[populate_index_considerations]]
=== Considerations for POPULATE INDEX
When POPULATE INDEX is executed, the following steps occur:
* The POPULATE INDEX operation runs in many transactions.
* The actual data load operation is run outside of a transaction.
If a failure occurs, the rollback is faster because it does not have to
process a lot of audit. Also, if a failure occurs, the index remains
empty, unaudited, and not attached to the base table (off-line).
* When an off-line POPULATE INDEX is being executed, the base table is
accessible for read DML operations. When an on-line POPULATE INDEX is
being executed, the base table is accessible for read and write DML
operations during that time period, except during the commit phase at
the very end.
* If the POPULATE INDEX operation fails unexpectedly, you may need to
drop the index, re-create it, and repopulate it, as sketched below.
* On-line POPULATE INDEX reads the audit trail to replay updates by
allowing read/write access. If you plan to create many indexes in
parallel or if you have a high level of activity on the audit trail, you
should consider using the OFFLINE option.
Errors can occur if the source base table or target index cannot be
accessed, or if the load fails due to some resource problem or problem
in the file system.
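If a populate does fail, a recovery sequence along these lines can be
used (a hedged sketch; the names are hypothetical, and the CREATE INDEX
column list must match your original index definition):

```
-- Drop the partially built index, re-create it without populating,
-- then repopulate it explicitly.
DROP INDEX myschema.myindex;
CREATE INDEX myindex ON myschema.mytable (mycolumn) NO POPULATE;
POPULATE INDEX myindex ON myschema.mytable;
```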
[[populate_index_required_privileges]]
==== Required Privileges
To perform a POPULATE INDEX operation, one of the following must be
true:
* You are DB ROOT.
* You are the owner of the table.
* You have the SELECT and INSERT (or ALL) privileges on the associated table.
[[populate_index_examples]]
=== Examples of POPULATE INDEX
* This example loads the specified index from the specified table:
+
```
POPULATE INDEX myindex ON myschema.mytable;
```
* This example loads the specified index from the specified table, which
uses the default schema:
+
```
POPULATE INDEX index2 ON table2;
```
<<<
[[purgedata_utility]]
== PURGEDATA Utility
The PURGEDATA utility performs a fast DELETE of data from a table and
its related indexes. You can execute this utility in a client-based tool
like TrafCI.
```
PURGEDATA object
```
[[purgedata_syntax]]
=== Syntax Description of PURGEDATA
* `_object_`
+
is the name of the table from which to purge the data. See
<<database_object_names,Database Object Names>>.
[[purgedata_considerations]]
=== Considerations for PURGEDATA
* The _object_ can be a table name.
* Errors are returned if _object_ cannot be accessed or if a resource or
file-system problem causes the delete to fail.
* PURGEDATA is not supported for volatile tables.
[[purgedata_required_privileges]]
==== Required Privileges
To perform a PURGEDATA operation, one of the following must be true:
* You are DB ROOT.
* You are the owner of the table.
* You have the SELECT and DELETE (or ALL) privileges on the associated
table.
[[purgedata_availability]]
==== Availability
PURGEDATA marks the table OFFLINE and sets the corrupt bit while
processing. If PURGEDATA fails before it completes, the table and its
dependent indexes will be unavailable, and you must run PURGEDATA again
to complete the operation and remove the data. Error 8551 with an
accompanying file system error 59 or error 1071 is returned in this
case.
[[purgedata_examples]]
=== Examples of PURGEDATA
* This example purges the data in the specified table. If the table has
indexes, their data is also purged.
+
```
PURGEDATA myschema.mytable;
```
<<<
[[unload_statement]]
== UNLOAD Statement
The UNLOAD statement unloads data from {project-name} tables into an HDFS
location that you specify. Extracted data can be written either compressed
or uncompressed, depending on the options you choose.
UNLOAD is a {project-name} SQL extension.
```
UNLOAD [WITH option[ option]...] INTO 'target-location' SELECT ... FROM source-table ...
option is:
DELIMITER { 'delimiter-string' | delimiter-ascii-value }
| RECORD_SEPARATOR { 'separator-literal' | separator-ascii-value }
| NULL_STRING 'string-literal'
| PURGEDATA FROM TARGET
| COMPRESSION GZIP
| MERGE FILE merged-file-path [OVERWRITE]
| NO OUTPUT
| { NEW | EXISTING } SNAPSHOT HAVING SUFFIX 'string'
```
[[unload_syntax]]
=== Syntax Description of UNLOAD
* `'_target-location_'`
+
is the full pathname of the target HDFS folder where the extracted data
will be written. Enclose the name of the folder in single quotes. Specify
the folder name as a full pathname and not as a relative path. You must
have write permissions on the target HDFS folder. If you run UNLOAD in
parallel, multiple files will be produced under the _target-location_.
The number of files created will equal the number of ESPs.
* `SELECT &#8230; FROM _source-table_ &#8230;`
+
is either a simple query or a complex one that contains GROUP BY, JOIN,
or UNION clauses. _source-table_ is the name of a {project-name} table that
has the source data. See <<database_object_names,Database Object Names>>.
* `[WITH _option_[ _option_]&#8230;]`
+
is a set of options that you can specify for the unload operation. If
you specify an option more than once, {project-name} returns an error with
SQLCODE -4489. You can specify one or more of these options:
** `DELIMITER { '_delimiter-string_' | _delimiter-ascii-value_ }`
+
specifies the delimiter as either a delimiter string or an ASCII value.
If you do not specify this option, {project-name} uses the character "|" as
the delimiter.
*** _delimiter-string_ can be any ASCII or Unicode string. You can also
specify the delimiter as an ASCII value. Valid values range from 1 to 255.
Specify the value in decimal notation; hexadecimal or octal
notation are currently not supported. If you are using an ASCII value,
the delimiter can be only one character wide. Do not use quotes when
specifying an ASCII value for the delimiter.
** `RECORD_SEPARATOR { '_separator-literal_' | _separator-ascii-value_ }`
+
specifies the character that will be used to separate consecutive
records or rows in the output file. You can specify either a literal
or an ASCII value for the separator. The default value is a newline character.
*** _separator-literal_ can be any ASCII or Unicode character. You can also
specify the separator as an ASCII value. Valid values range from 1 to 255.
Specify the value in decimal notation; hexadecimal or octal
notation are currently not supported. If you are using an ASCII value,
the separator can be only one character wide. Do not use quotes when
specifying an ASCII value for the separator.
** `NULL_STRING '_string-literal_'`
+
specifies the string that will be used to indicate a NULL value. The
default value is the empty string ''.
** `PURGEDATA FROM TARGET`
+
causes files in the target HDFS folder to be deleted before the unload
operation.
** `COMPRESSION GZIP`
+
uses gzip compression in the extract node, writing the data to disk in
this compressed format. GZIP is currently the only supported type of
compression. If you do not specify this option, the extracted data will
be uncompressed.
** `MERGE FILE _merged-file-path_ [OVERWRITE]`
+
merges the unloaded files into a single file in the specified
_merged-file-path_. If you specify compression, the unloaded data will
be in compressed format, and the merged file will also be in compressed
format. If you specify the optional OVERWRITE keyword, the file is
overwritten if it already exists; otherwise, {project-name} raises an error
if the file already exists.
** `NO OUTPUT`
+
prevents the UNLOAD statement from displaying status messages. By
default, the UNLOAD statement prints status messages listing the steps
that the Bulk Unloader is executing.
<<<
* `{ NEW | EXISTING } SNAPSHOT HAVING SUFFIX '_string_'`
+
initiates an HBase snapshot scan during the unload operation. During a
snapshot scan, the Bulk Unloader will get a list of the {project-name} tables
from the query explain plan and will create and verify snapshots for the
tables. Specify a suffix string, '_string_', which will be appended to
each table name.
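As an illustration, the following sketch (the target path and table name
are hypothetical) combines several WITH options to produce
comma-delimited, gzip-compressed output:

```
-- Unload comma-delimited, gzip-compressed data, emptying the
-- target folder first and using 'NULL' to represent NULL values.
UNLOAD
WITH DELIMITER ','
NULL_STRING 'NULL'
PURGEDATA FROM TARGET
COMPRESSION GZIP
INTO '/bulkload/customer_export'
SELECT * FROM trafodion.sch.customer;
```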
[[unload_considerations]]
=== Considerations for UNLOAD
* You must have write permissions on the target HDFS folder.
* If a WITH option is specified more than once, {project-name} returns an
error with SQLCODE -4489.
[[unload_required_privileges]]
==== Required Privileges
To issue an UNLOAD statement, one of the following must be true:
* You are DB ROOT.
* You are the owner of the target table.
* You have the SELECT privilege on the target table.
* You have the MANAGE_LOAD component privilege for the SQL_OPERATIONS
component.
[[unload_examples]]
=== Examples of UNLOAD
* This example shows how the UNLOAD statement extracts data from a
{project-name} table, `TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS`, into an HDFS
folder, `/bulkload/customer_demographics`:
+
```
>>UNLOAD
+>WITH PURGEDATA FROM TARGET
+>MERGE FILE 'merged_customer_demogs.gz' OVERWRITE
+>COMPRESSION GZIP
+>INTO '/bulkload/customer_demographics'
+>select * from trafodion.hbase.customer_demographics
+><<+ cardinality 10e10 ,+ cardinality 10e10 >>;
Task: UNLOAD Status: Started
Task: EMPTY TARGET Status: Started
Task: EMPTY TARGET Status: Ended ET: 00:00:00.014
Task: EXTRACT Status: Started
Rows Processed: 200000
Task: EXTRACT Status: Ended ET: 00:00:04.743 Task: MERGE FILES Status: Started
Task: MERGE FILES Status: Ended ET: 00:00:00.063
--- 200000 row(s) unloaded.
```
<<<
[[update_statistics_statement]]
== UPDATE STATISTICS Statement
The UPDATE STATISTICS statement updates the histogram statistics for one
or more groups of columns within a table. These statistics are used to
devise optimized access plans.
UPDATE STATISTICS is a {project-name} SQL extension.
```
UPDATE STATISTICS FOR TABLE table [CLEAR | on-clause | sample-table-clause ]
on-clause is:
ON column-group-list CLEAR
| ON column-group-list [histogram-option]...
| ON column-group-list INCREMENTAL WHERE predicate
column-group-list is:
column-list [,column-list]...
| EVERY COLUMN [,column-list]...
| EVERY KEY [,column-list]...
| EXISTING COLUMN[S] [,column-list]...
| NECESSARY COLUMN[S] [,column-list]...
column-list for a single-column group is:
column-name
| (column-name)
| column-name TO column-name
| (column-name) TO (column-name)
| column-name TO (column-name)
| (column-name) TO column-name
column-list for a multicolumn group is:
(column-name, column-name [,column-name]...)
histogram-option is:
GENERATE n INTERVALS
| SAMPLE [sample-option]
sample-option is:
[r ROWS]
| RANDOM percent PERCENT [PERSISTENT]
| PERIODIC size ROWS EVERY period ROWS
sample-table-clause is:
CREATE SAMPLE RANDOM percent PERCENT
| REMOVE SAMPLE
```
[[update_statistics_syntax]]
=== Syntax Description of UPDATE STATISTICS
* `_table_`
+
names the table for which statistics are to be updated. To refer to a
table, use the ANSI logical name.
See <<database_object_names,Database Object Names>>.
* `CLEAR`
+
deletes some or all histograms for the table _table_. Use this option
when new applications no longer use certain histogram statistics.
+
If you do not specify _column-group-list_, all histograms for _table_
are deleted. If you specify _column-group-list_, only the histograms for
the column groups in the list are deleted.
* `ON _column-group-list_`
+
specifies one or more groups of columns for which to generate histogram
statistics with the option of clearing the histogram statistics. You
must use the ON clause to generate statistics stored in histogram
tables.
* `_column-list_`
+
specifies the columns that make up _column-group-list_. A column list
can define either a single-column group or a multicolumn group.
** Single-column group:
*** `_column-name_ | (_column-name_) | _column-name_ TO _column-name_ |
(_column-name_) TO (_column-name_)`
+
specifies how you can specify individual columns or a group of
individual columns.
+
To generate statistics for individual columns, list each column. You can
list each single column name with or without parentheses.
** Multicolumn group:
*** `(_column-name_, _column-name_ [,_column-name_]&#8230;)`
+
specifies a multi-column group.
+
To generate multi-column statistics, group a set of columns within
parentheses, as shown. You cannot specify the name of a column more than
once in the same group of columns.
+
<<<
+
One histogram is generated for each unique column group. Duplicate
groups, meaning any permutation of the same group of columns, are
ignored and processing continues. When you run UPDATE STATISTICS again
for the same user table, the new data for that table replaces the data
previously generated and stored in the table’s histogram tables.
Histograms of column groups not specified in the ON clause remain
unchanged in histogram tables.
+
For more information about specifying columns, see
<<generating_and_clearing_statistics_for_columns,Generating and Clearing Statistics for Columns>>.
* `EVERY COLUMN`
+
The EVERY COLUMN keyword indicates that histogram statistics are to be
generated for each individual column of _table_ and for the multi-column
groups that make up the primary key and indexes. For example, suppose
that _table_ has columns A, B, C, D defined, where A, B, C compose the
primary key. In this case, the ON EVERY COLUMN option generates a
single-column histogram for each of the columns A, B, C, D, and two
multi-column histograms, (A, B, C) and (A, B).
+
The EVERY COLUMN option does what EVERY KEY does, with additional
statistics on the individual columns.
* `EVERY KEY`
+
The EVERY KEY keyword indicates that histogram statistics are to be
generated for columns that make up the primary key and indexes. For
example, _table_ has columns A, B, C, D defined. If the primary key
comprises columns A, B, statistics are generated for (A, B), A and B. If
the primary key comprises columns A, B, C, statistics are generated for
(A,B,C), (A,B), A, B, C. If the primary key comprises columns A, B, C,
D, statistics are generated for (A, B, C, D), (A, B, C), (A, B), and A,
B, C, D.
* `EXISTING COLUMN[S]`
+
The EXISTING COLUMN keyword indicates that all existing histograms of
the table are to be updated. Statistics must have been captured
previously to establish the existing columns.
* `NECESSARY COLUMN[S]`
+
The NECESSARY COLUMN[S] keyword generates statistics for histograms that
the optimizer has requested but do not exist. Update statistics
automation must be enabled for NECESSARY COLUMN[S] to generate
statistics.
<<<
* `_histogram-option_`
** `GENERATE _n_ INTERVALS`
+
The GENERATE _n_ INTERVALS option for UPDATE STATISTICS accepts values
between 1 and 10,000. Keep in mind that increasing the number of
intervals per histogram may have a negative impact on compile time.
+
Increasing the number of intervals is useful for columns with a small
set of possible values and a large variance in the frequency of those
values. For example, consider a column CITY in table SALES, which
stores the code of the city where the item was sold, and suppose the
number of distinct cities in the sales data is 1538. Setting the number
of intervals to a number greater than or equal to the number of cities
(for example, 1600) guarantees that the generated histogram captures the
number of rows for each city. If the specified value _n_ exceeds the
number of unique values in the column, the system generates only as many
intervals as the number of unique values. (A statement illustrating this
scenario appears in the sketch following this syntax description.)
** `SAMPLE [_sample-option_]`
+
is a clause that specifies that sampling is to be used to gather a
subset of the data from the table. UPDATE STATISTICS stores the sample
results and generates histograms.
+
If you specify the SAMPLE clause without additional options, the result
depends on the number of rows in the table. If the table contains no
more than 10,000 rows, the entire table will be read (no sampling). If
the number of rows is greater than 10,000 but less than 1 million,
10,000 rows are randomly sampled from the table. If there are more than
1 million rows in the table, a random row sample is used to read 1
percent of the rows in the table, with a maximum of 1 million rows
sampled.
+
TIP: As a guideline, the default sample of 1 percent of the rows in the
table, with a maximum of 1 million rows, provides adequate statistics for
the optimizer to generate good plans.
+
If you do not specify the SAMPLE clause, if the table has fewer rows
than specified, or if the sample size is greater than the system limit,
{project-name} SQL reads all rows from _table_. See <<sample_clause,SAMPLE Clause>>.
*** `_sample-option_`
**** `_r_ ROWS`
+
A row sample is used to read _r_ rows from the table. The value _r_ must
be an integer that is greater than zero (_r_ > 0).
**** `RANDOM _percent_ PERCENT`
+
directs {project-name} SQL to choose rows randomly from the table. The
value _percent_ must be between zero and 100 (0 < _percent_ &#60;= 100).
In addition, only the first four digits to the right of the decimal
point are significant. For example, the value 0.00001 is treated as
0.0000, and the value 1.23456 is treated as 1.2345.
***** `PERSISTENT`
+
directs {project-name} SQL to create a persistent sample table and store the
random sample in it. This table can then be used later for updating statistics
incrementally.
**** `PERIODIC _size_ ROWS EVERY _period_ ROWS`
+
directs {project-name} SQL to choose the first _size_ number of rows from
each _period_ of rows. The value _size_ must be an integer that is
greater than zero and less than or equal to the value _period_
(0 < _size_ &#60;= _period_). The size of the _period_ is defined by the
number of rows specified for _period_. The value _period_ must be an
integer that is greater than zero (_period_ > 0).
* `INCREMENTAL WHERE _predicate_`
+
directs {project-name} SQL to update statistics incrementally. That is, instead
of taking a fresh sample of the entire table, {project-name} SQL will use a previously
created persistent sample table. {project-name} SQL will update the persistent sample
by replacing any rows satisfying the _predicate_ with a fresh sample of rows from
the original table satisfying the _predicate_. The sampling rate used is the
_percent_ specified when the persistent sample table was created. Statistics
are then generated from this updated sample. See also
<<update_statistics_incremental_update_statistics,
Incremental Update Statistics>>.
* `CREATE SAMPLE RANDOM _percent_ PERCENT`
+
Creates a persistent sample table associated with this table. The sample is
created using a random sample of _percent_ percent of the rows. The table
can then be used for later incremental statistics update.
* `REMOVE SAMPLE`
+
Drops the persistent sample table associated with this table.
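To tie these options together, here is a hedged sketch (the table and
column names are hypothetical) of several of the forms described above:

```
-- Key-based histograms only (primary key and index columns).
UPDATE STATISTICS FOR TABLE sales ON EVERY KEY;

-- Histograms the optimizer has requested but that do not yet exist
-- (requires update statistics automation to be enabled).
UPDATE STATISTICS FOR TABLE sales ON NECESSARY COLUMNS;

-- Capture per-city row counts for a low-cardinality column,
-- as in the GENERATE n INTERVALS discussion above.
UPDATE STATISTICS FOR TABLE sales ON city GENERATE 1600 INTERVALS;

-- Periodic sample: read the first 1000 rows of every 10000 rows.
UPDATE STATISTICS FOR TABLE sales ON (store_id)
SAMPLE PERIODIC 1000 ROWS EVERY 10000 ROWS;

-- Create (and later drop) a persistent sample table for
-- incremental statistics updates.
UPDATE STATISTICS FOR TABLE sales CREATE SAMPLE RANDOM 5 PERCENT;
UPDATE STATISTICS FOR TABLE sales REMOVE SAMPLE;
```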
[[update_statistics_considerations]]
=== Considerations for UPDATE STATISTICS
[[update_statistics_using_statistics]]
==== Using Statistics
Use UPDATE STATISTICS to collect and save statistics on columns. The SQL
compiler uses histogram statistics to determine the selectivity of
predicates, indexes, and tables. Because selectivity directly influences
the cost of access plans, regular collection of statistics increases the
likelihood that {project-name} SQL chooses efficient access plans.
While UPDATE STATISTICS is running on a table, the table is active and
available for query access.
When a user table changes significantly, either in its data or in its
definition, re-execute the UPDATE STATISTICS statement for the
table.
<<<
[[update_statistics_histogram_statistics]]
==== Histogram Statistics
Histogram statistics are used by the compiler to produce the best plan
for a given SQL query. When histograms are not available, default
assumptions are made by the compiler and the resultant plan might not
perform well. Histograms that reflect the latest data in a table are
optimal.
The compiler does not need histogram statistics for every column of a
table. For example, if a column is only in the select list, its
histogram statistics will be irrelevant. A histogram statistic is useful
when a column appears in:
* A predicate
* A GROUP BY clause
* An ORDER BY clause
* A HAVING clause
* A similar clause
In addition to single-column histogram statistics, the compiler needs
multi-column histogram statistics, such as when `GROUP BY column-5,
column-3, column-19` appears in a query. In that case, histogram
statistics for the combination (column-5, column-3, column-19) are
needed.
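For instance, a minimal sketch of the matching statement (using
placeholder column names C5, C3, and C19 for the columns mentioned
above):

```
UPDATE STATISTICS FOR TABLE mytable ON (c5, c3, c19);
```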
[[update_statistics_required-privileges]]
==== Required Privileges
To perform an UPDATE STATISTICS operation, one of the following must be
true:
* You are DB ROOT.
* You are the owner of the target table.
* You have the MANAGE_STATISTICS component privilege for the
SQL_OPERATIONS component.
[[update_statistics_locking]]
==== Locking
UPDATE STATISTICS momentarily locks the definition of the user table
during the operation but not the user table itself. The UPDATE
STATISTICS statement uses READ UNCOMMITTED isolation level for the user
table.
<<<
[[update_statistics_transactions]]
==== Transactions
Do not start a transaction before executing UPDATE STATISTICS. UPDATE
STATISTICS runs multiple transactions of its own, as needed. Starting
your own transaction in which UPDATE STATISTICS runs could cause the
transaction auto abort time to be exceeded during processing.
[[update_statistics_generating_and_clearing_statistics_for_columns]]
==== Generating and Clearing Statistics for Columns
To generate statistics for particular columns, name each column, or name
the first and last columns of a sequence of columns in the table. For
example, suppose that a table has consecutive columns CITY, STATE, ZIP.
This list gives a few examples of possible options you can specify:
[cols="25%,37%,37%",options="header"]
|===
| Single-Column Group | Single-Column Group Within Parentheses | Multicolumn Group
| ON CITY, STATE, ZIP | ON (CITY),(STATE),(ZIP) | ON (CITY, STATE) or ON (CITY,STATE,ZIP)
| ON CITY TO ZIP | ON (CITY) TO (ZIP) |
| ON ZIP TO CITY | ON (ZIP) TO (CITY) |
| ON CITY, STATE TO ZIP | ON (CITY), (STATE) TO (ZIP) |
| ON CITY TO STATE, ZIP | ON (CITY) TO (STATE), (ZIP) |
|===
The TO specification is useful when a table has many columns, and you
want histograms on a subset of columns. Do not confuse (CITY) TO (ZIP)
with (CITY, STATE, ZIP), which refers to a multi-column histogram.
You can clear statistics for any combination of columns you specify, not
necessarily the _column-group-list_ you used to create the statistics.
Statistics that you do not clear remain in place until you clear them.
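For instance, a short sketch using the CITY, STATE, ZIP columns from the
table above (the table name is hypothetical):

```
-- Single-column histograms for every column from CITY through ZIP.
UPDATE STATISTICS FOR TABLE mytable ON city TO zip;

-- One multi-column histogram over the same columns instead.
UPDATE STATISTICS FOR TABLE mytable ON (city, state, zip);

-- Later, clear only the CITY histogram.
UPDATE STATISTICS FOR TABLE mytable ON city CLEAR;
```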
<<<
[[update_statistics_column_lists_and_access_plans]]
==== Column Lists and Access Plans
Generate statistics for columns most often used in data access plans for
a table—that is, the primary key, indexes defined on the table, and any
other columns frequently referenced in predicates in WHERE or GROUP BY
clauses of queries issued on the table. Use the EVERY COLUMN option to
generate histograms for every individual column plus the multi-column
groups that make up the primary key and indexes.
The EVERY KEY option generates histograms only for the columns and
multi-column groups that make up the primary key and indexes.
If you often perform a GROUP BY over specific columns in a table, use
multi-column lists in the UPDATE STATISTICS statement (consisting of the
columns in the GROUP BY clause) to generate histogram statistics that
enable the optimizer to choose a better plan. Similarly, when a query
joins two tables by two or more columns, multi-column lists (consisting
of the columns being joined) help the optimizer choose a better plan.
[[update_statistics_incremental_update_statistics]]
==== Incremental Update Statistics
UPDATE STATISTICS processing time can be lengthy for very large tables.
One strategy for reducing the time is to create histograms only for
columns that actually need them (for example, using the ON NECESSARY COLUMNS
column group). Another strategy is to update statistics incrementally. These
strategies can be used together if desired.
To use the incremental update statistics feature, you must first create
statistics for the table and create a persistent sample table. One way to
do this is to perform a normal update statistics command, adding the
PERSISTENT keyword to the _sample-option_. Another way to do this if you
already have reasonably up-to-date statistics for the table, is to create
a persistent sample table separately using the CREATE SAMPLE option.
You can then perform update statistics incrementally by using the INCREMENTAL
WHERE _predicate_ syntax in the on-clause. The _predicate_ should be chosen
to describe the set of rows that have changed since the last statistics update
was performed. For example, if your table contains a column with a timestamp
giving the date and time of last update, this is a particularly useful column
to use in the _predicate_.
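For example, assuming a hypothetical last_updated timestamp column, an
incremental update might look like this sketch:

```
-- Refresh the persistent sample only for rows changed since the last
-- statistics run, then regenerate histograms from the updated sample.
UPDATE STATISTICS FOR TABLE sales
ON EXISTING COLUMNS
INCREMENTAL WHERE last_updated > TIMESTAMP '2018-06-01 00:00:00';
```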
If you decide later that you wish to change the _percent_ sampling rate used
for the persistent sample table, you can do so by dropping the persistent
sample table (using REMOVE SAMPLE) and creating a new one (by using the
CREATE SAMPLE option). Using a higher _percent_ results in more accurate
histograms, but at the price of a longer-running operation.
<<<
[[update_statistics_examples]]
=== Examples of UPDATE STATISTICS
* This example generates four histograms for the columns jobcode,
empnum, deptnum, and (empnum, deptnum) for the table EMPLOYEE. Depending
on the table’s size and data distribution, each histogram should contain
ten intervals.
+
```
UPDATE STATISTICS FOR TABLE employee
ON (jobcode),(empnum, deptnum) GENERATE 10 INTERVALS;
--- SQL operation complete.
```
* This example generates histogram statistics using the ON EVERY COLUMN
option for the table DEPT. This statement performs a full scan, and
{project-name} SQL determines the default number of intervals.
+
```
UPDATE STATISTICS FOR TABLE dept ON EVERY COLUMN;
--- SQL operation complete.
```
* Suppose that a construction company has an ADDRESS table of potential
sites and a DEMOLITION_SITES table that contains some of the columns of
the ADDRESS table. The primary key is ZIP. Join these two tables on two
of the columns in common:
+
```
SELECT COUNT(AD.number), AD.street,
AD.city, AD.zip, AD.state
FROM address AD, demolition_sites DS
WHERE AD.zip = DS.zip AND AD.type = DS.type
GROUP BY AD.street, AD.city, AD.zip, AD.state;
```
+
To generate statistics specific to this query, enter these statements:
+
```
UPDATE STATISTICS FOR TABLE address
ON (street), (city), (state), (zip, type);
UPDATE STATISTICS FOR TABLE demolition_sites ON (zip, type);
```
* This example removes all histograms for table DEMOLITION_SITES:
+
```
UPDATE STATISTICS FOR TABLE demolition_sites CLEAR;
```
<<<
* This example selectively removes the histogram for column STREET in
table ADDRESS:
+
```
UPDATE STATISTICS FOR TABLE address ON street CLEAR;
```
* This example generates statistics but also creates a persistent
sample table for use when updating statistics incrementally:
+
```
UPDATE STATISTICS FOR TABLE address
ON (street), (city), (state), (zip, type)
SAMPLE RANDOM 5 PERCENT PERSISTENT;
```
* This example updates statistics incrementally. It assumes that
a persistent sample table has already been created. The predicate
in the WHERE clause describes the set of rows that have changed
since statistics were last updated. Here we assume that rows
with a state of California are the only rows that have changed:
+
```
UPDATE STATISTICS FOR TABLE address
ON EXISTING COLUMNS
INCREMENTAL WHERE state = 'CA';
```