////
/**
* @@@ START COPYRIGHT @@@
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* @@@ END COPYRIGHT @@@
*/
////
[[sql_utilities]]
= SQL Utilities
A utility is a tool that runs within {project-name} SQL and performs tasks.
This section describes the {project-name} SQL utilities:
[cols=","]
|===
| <<cleanup_statement,CLEANUP Statement>> | Cleans up objects whose information is inconsistent in the metadata.
| <<load_statement,LOAD Statement>> | Uses the {project-name} Bulk Loader to load data from a source table, either
a {project-name} table or a Hive table, into a target {project-name} table.
| <<populate_index_utility,POPULATE INDEX Utility>> | Loads indexes.
| <<purgedata_utility,PURGEDATA Utility>> | Purges data from tables and indexes.
| <<unload_statement,UNLOAD Statement>> | Unloads data from {project-name} tables into an HDFS location that you
specify.
| <<update_statistics_statement,UPDATE STATISTICS Statement>> | Updates the histogram statistics for one or more groups of columns
within a table. These statistics are used to devise optimized access plans.
|===
NOTE: {project-name} SQL utilities are entered interactively or from script
files using a client-based tool, such as the {project-name} Command Interface
(TrafCI). To install and configure a client application that enables you
to connect to and issue SQL utilities, see the
http://trafodion.apache.org/docs/client_install/index.html[Trafodion Client Installation Guide].
<<<
[[cleanup_statement]]
== CLEANUP Statement
The CLEANUP statement cleans up objects whose information is inconsistent in the metadata. It can be used to clean up individual objects or all objects in a schema, and it can also locate all inconsistent objects in the Trafodion metadata and then clean them up.
Inconsistent objects can be cleaned up and removed from the metadata and HBase by using one of the following options.
```
CLEANUP object-type object-name [ ,UID {object-uid} ]
CLEANUP [ PRIVATE | SHARED ] SCHEMA {schema-name}
CLEANUP UID {object-uid}
CLEANUP METADATA [ , CHECK ] [ , RETURN DETAILS ]
```
```
object-type is: TABLE | INDEX | SEQUENCE | OBJECT
object-name is: Name of the object that needs to be cleaned up
object-uid is: UID (Unique ID) of the object that is stored in metadata
schema-name is: Name of the schema whose contents need to be cleaned up
```
[[cleanup_syntax]]
=== Syntax Description of CLEANUP Statement
* `CLEANUP object-type object-name [ , UID {object-uid} ]`
+
This command will clean up the specified object from all Trafodion metadata tables
and HBase. If an object cannot be found in the metadata but exists in HBase,
then it will be removed from HBase. All dependent objects, such as indexes,
views, LOBs, internal sequences, and constraints, are also removed from the
metadata and HBase.
** `object-type`
*** If an object type is specified as TABLE, INDEX, or SEQUENCE and that object exists
in the metadata, then a check will be done to make sure that the specified `object-name` is of the specified type. An error is returned if they are not the same.
*** If an object type is specified as OBJECT, then any object that matches the
specified name is removed from the metadata and HBase.
** `object-name`
+
Specifies the name of the object to be removed.
** `object-uid`
+
If an object exists in the OBJECTS metadata table, then the stored object UID is compared with the specified object UID. If they do not match, then cleanup is not done. If an object does not exist in the OBJECTS metadata table, then all entries matching the specified object UID are removed from the metadata tables.
* `CLEANUP [ PRIVATE | SHARED ] SCHEMA {schema-name}`
+
This command will clean up all objects that are created in the specified schema.
* `CLEANUP UID {object-uid}`
+
This command will remove all entries from all metadata tables that match the specified object UID. If the specified object UID exists in the OBJECTS metadata table and the object name associated with that UID can be retrieved, then that object is also removed from HBase.
* `CLEANUP METADATA [ , CHECK ] [ , RETURN DETAILS ]`
+
This command will look for inconsistent entries in the metadata, return information about them, and clean them up. Multiple checks are done to validate metadata consistency.
** `CHECK`
+
You can check for inconsistent metadata entries without actually cleaning them up
by specifying the `CHECK` option. If `CHECK` is specified, the following checks are performed, and a summary of inconsistent objects belonging to the categories listed below is returned.
*** Orphan Metadata Objects
+
This check is done to validate that objects that exist in metadata also exist in HBase. If the object does not exist in HBase, then it is marked as an orphan metadata object.
*** Orphan HBase Objects
+
This check is done to validate that objects that exist in HBase also exist in metadata. If the object does not exist in metadata, then it is marked as an orphan HBase object.
*** Inconsistent Object Entries
+
This check is done to make sure that all entries in the OBJECTS table also exist in
the OBJECTS unique index (OBJECTS_UNIQ_IDX) and that all entries in OBJECTS_UNIQ_IDX also exist in the OBJECTS table. It also checks to make sure that all entries in the COLUMNS table have a corresponding entry in the OBJECTS table. If any of the previous checks fail, then that entry is marked as inconsistent.
*** Inconsistent View Entries
+
This check is done to make sure that all view entries are consistent among the views and the OBJECTS table.
** `RETURN DETAILS`
+
If `RETURN DETAILS` is specified, then details about which objects are inconsistent are also returned.
<<<
[[cleanup_considerations]]
=== Considerations for CLEANUP Statement
[[security_and_privileges]]
==== Security and Privileges
To clean up an object, you must have DROP privileges. If object privileges cannot be retrieved from the metadata, or if metadata cleanup is being done, then you must be DB_ROOT to perform the cleanup.
[[consistency_checks_during_access]]
==== Consistency Checks During Access
If an object exists in the metadata but does not exist in HBase, then error 4254 will be returned if that object is accessed during any DML or DDL operation.
An object is also marked as inconsistent during any complex DDL operation. For example, if a CREATE TABLE also creates implicit indexes or constraints, then the object will be marked as inconsistent until the complex DDL operation is completed. If an abnormal error happens (such as a process crash or a node failure), then that object remains inconsistent. Any attempt to access that object will return error 4254.
An inconsistent object will need to be cleaned up before it can be accessed.
<<<
[[cleanup_examples]]
=== Examples of CLEANUP Statement
* Accessing Inconsistent Objects
+
Table T026T1 has been marked as inconsistent from a previous session. Any operation on it will return error 4254. For this test case, a metadata update was done, and the table was marked as invalid in the OBJECTS table.
+
```
>>invoke t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
--- SQL operation failed with errors.
>>drop table t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
--- SQL operation failed with errors.
>>select * from t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
*** ERROR[8822] The statement was not prepared.
>>
```
* Accessing Incorrect Objects
+
If an object type is specified, then the object stored in the metadata must match it. In this example, object 'TI' is an index that cannot be cleaned up as a table.
+
```
>>create index ti on t(b);
--- SQL operation complete.
>>cleanup table ti;
*** ERROR[1389] Object TI does not exist in Trafodion.
*** ERROR[4256] Object type specified for this object does not match the object type stored in metadata.
--- SQL operation failed with errors.
>>
```
* Cleaning Up an Inconsistent Object
+
A CLEANUP command must be used to clean up an object in an invalid state.
+
```
>>drop table t026t1;
*** ERROR[4254] Object TRAFODION.SCH026.T026T1 has invalid state and cannot be accessed. Use cleanup command to drop it.
--- SQL operation failed with errors.
>>cleanup table t026t1;
--- SQL operation complete.
>>invoke t026t1;
*** ERROR[4082] Object TRAFODION.SCH026.T026T1 does not exist or is inaccessible.
--- SQL operation failed with errors.
>>
```
+
DROP TABLE, CREATE TABLE, DROP INDEX, and CREATE INDEX failures will call cleanup.
* Cleaning Up an Object by Specifying its UID
+
Specify object T by its object UID and clean it up.
+
```
>>select object_uid from "_MD_".objects where schema_name = 'SCH026' and object_name = 'T';
OBJECT_UID
--------------------
127287511636506969
--- 1 row(s) selected.
>>cleanup uid 127287511636506969;
--- SQL operation complete.
>>invoke t;
*** ERROR[4082] Object TRAFODION.SCH026.T does not exist or is inaccessible.
--- SQL operation failed with errors.
>>
```
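* Cleaning Up a Schema
+
The following is a minimal sketch of the CLEANUP SCHEMA variant, which removes all objects created in the specified schema. The schema name SCH026 and the output are illustrative:
+
```
>>cleanup schema sch026;
--- SQL operation complete.
>>
```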
* Metadata Cleanup
+
This example shows that the metadata is consistent and that there are no issues with it.
+
```
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
End: Cleanup Orphan Objects Entries (0 entries found)
Start: Cleanup Orphan Hbase Entries
End: Cleanup Orphan Hbase Entries (0 entries found)
Start: Cleanup Inconsistent Objects Entries
End: Cleanup Inconsistent Objects Entries (0 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
>>
```
+
This example shows inconsistencies in the metadata:
+
```
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T10
End: Cleanup Orphan Objects Entries (1 entry found)
Start: Cleanup Orphan Hbase Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T11
End: Cleanup Orphan Hbase Entries (1 entry found)
Start: Cleanup Inconsistent Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T12
Entry #2(UID): 29899934458095403
End: Cleanup Inconsistent Objects Entries (2 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
>>
```
+
Running the CLEANUP METADATA command cleans up the inconsistent metadata:
+
```
>>cleanup metadata, return details;
Metadata Cleanup: started
Start: Cleanup Orphan Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T10
End: Cleanup Orphan Objects Entries (1 entry cleaned up)
Start: Cleanup Orphan Hbase Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T11
End: Cleanup Orphan Hbase Entries (1 entry cleaned up)
Start: Cleanup Inconsistent Objects Entries
Entry #1(OBJECT): TRAFODION.SCH026.T026T12
Entry #2(UID): 29899934458095403
End: Cleanup Inconsistent Objects Entries (2 entries cleaned up)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries cleaned up)
Metadata Cleanup: done
--- SQL operation complete.
>>
>>cleanup metadata, check, return details;
Metadata Cleanup: started, check only
Start: Cleanup Orphan Objects Entries
End: Cleanup Orphan Objects Entries (0 entries found)
Start: Cleanup Orphan Hbase Entries
End: Cleanup Orphan Hbase Entries (0 entries found)
Start: Cleanup Inconsistent Objects Entries
End: Cleanup Inconsistent Objects Entries (0 entries found)
Start: Cleanup Inconsistent Views Entries
End: Cleanup Inconsistent Views Entries (0 entries found)
Metadata Cleanup: done
--- SQL operation complete.
```
<<<
[[load_statement]]
== LOAD Statement
The LOAD statement uses the {project-name} Bulk Loader to load data from a
source table, either a {project-name} table or a Hive table, into a target
{project-name} table. The {project-name} Bulk Loader prepares and loads HFiles
directly in the region servers and bypasses the write path and the cost
associated with it. The write path begins at a client, moves to a region
server, and ends when data eventually is written to an HBase data file
called an HFile.
The {project-name} bulk load process takes place in the following phases:
* *Disable Indexes* (if incremental index build is not used)
* *Prepare* (takes most time, heart of the bulk load operation)
** Read source files ({project-name} Table, Hive table, or Hive external table)
** Data encoded in {project-name} encoding
** Data repartitioned and sorted to match regions of target table
** Data written to HFiles
** Data repartitioned and written to index HFiles (if incremental index build IS used)
* *Complete* (with or without Snapshot recovery)
** Take a snapshot of the table
** Merge HFiles into HBase table (very fast – move, not a copy)
** Delete snapshot or restore from snapshot if merge fails
* *Populate Indexes* (if incremental index build is NOT used)
* *Cleanup*
** HFiles temporary space cleanup
LOAD is a {project-name} SQL extension.
```
LOAD [WITH option[[,] option]...] INTO target-table SELECT ... FROM source-table
option is:
CONTINUE ON ERROR
| LOG ERROR ROWS [TO error-location-name]
| STOP AFTER num ERROR ROWS
| TRUNCATE TABLE
| NO POPULATE INDEXES
| NO DUPLICATE CHECK
| NO OUTPUT
| NO RECOVERY
| INDEX TABLE ONLY
| REBUILD INDEXES
| UPSERT USING LOAD
```
[[load_syntax]]
=== Syntax Description of LOAD
* `_target-table_`
+
is the name of the target {project-name} table where the data will be loaded.
See <<database_object_names,Database Object Names>>.
* `_source-table_`
+
is the name of either a {project-name} table or a Hive table that has the
source data. Hive tables can be accessed in {project-name} using the
HIVE.HIVE schema (for example, hive.hive.orders). The Hive table needs
to already exist in Hive before {project-name} can access it. If you want to
load data that is already in an HDFS folder, then you need to create an
external Hive table with the right fields and pointing to the HDFS
folder containing the data. You can also specify a WHERE clause on the
source data as a filter.
* `[WITH _option_[[,] _option_]&#8230;]`
+
is a set of options that you can specify for the load operation. You can
specify one or more of these options:
** `CONTINUE ON ERROR`
+
The LOAD statement continues after ignorable errors encountered while scanning rows from the source table or loading into the target table. Ignorable errors are usually data conversion errors.
+
Errors during the load or sort phase will cause the LOAD statement to abort.
+
This option is implied if `LOG ERROR ROWS [TO _error-location-name_]` or `STOP AFTER _num_ ERROR ROWS` is specified.
** `LOG ERROR ROWS [TO _error-location-name_]`
*** Error rows
+
If error rows must be written to a specified location, then specify TO _error-location-name_; otherwise, they are written to the default location.
`_error-location-name_` must be an HDFS directory to which the trafodion user has write access.
+
Error rows will be logged in HDFS files in the *directory* `/user/trafodion/bulkload/logs` if the error log location is not specified.
+
The default name of the *subdirectory* is `_ERR_catalog.schema.target_table_date_id_`, where `_id_` is a timestamp (YYYYMMDD_HHMMSS) unique to the process where the error was seen.
+
The default name of the *error file* is `_loggingFileNamePrefix_catalog.schema.target_table_instanceID_`, where `_loggingFileNamePrefix_` is hive_scan_err or traf_upsert_err depending on the source table, and `_instanceID_` is the instance ID starting from 0; generally, there is only one instance.
+
For example, the full error-file path for the table test_load_log is `/user/trafodion/bulkload/logs/test/ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918/traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0`,
+
where:
+
1. `/user/trafodion/bulkload/logs/test` is the default *directory* name.
+
2. `ERR_TRAFODION.SEABASE.TEST_LOAD_LOG_20171218_035918` is the default *subdirectory* name.
+
3. `traf_upsert_err_TRAFODION.SEABASE.TEST_LOAD_LOG_0` is the default *error file* name.
*** Error logs
+
Error logs are written in separate files by the processes involved in the load command, under a subdirectory representing the load command in the given location.
+
The actual log file location is displayed in the LOAD command output. It is recommended that you use the same location across loads, as that makes the error logs easier to find.
** `STOP AFTER _num_ ERROR ROWS`
+
Use this option to stop the LOAD statement after _num_ error rows have been encountered during this statement.
+
Duplicate-check errors and errors ignored because of the previous options are included in the count.
*** _num_
+
specifies the number of error rows.
** `TRUNCATE TABLE`
+
causes the Bulk Loader to truncate the target table before starting the
load operation. By default, the Bulk Loader does not truncate the target
table before loading data.
** `NO POPULATE INDEXES`
+
specifies that the Bulk Loader not handle index maintenance or populate
the indexes. By default, the Bulk Loader handles index maintenance,
disabling indexes before starting the load operation and populating them
after the load operation is complete.
** `NO DUPLICATE CHECK`
+
causes the Bulk Loader to ignore duplicates in the source data. By
default, the Bulk Loader checks if there are duplicates in the source
data and generates an error when it detects duplicates.
** `NO OUTPUT`
+
prevents the LOAD statement from displaying status messages. By default,
the LOAD statement prints status messages listing the steps that the
Bulk Loader is executing.
** `NO RECOVERY`
+
specifies that the Bulk Loader not use HBase snapshots for recovery. By default, the Bulk Loader handles recovery using the HBase snapshots mechanism.
** `INDEX TABLE ONLY`
+
specifies that the target table, which is an index, be populated with
data from the parent table.
** `REBUILD INDEXES`
+
specifies that indexes of the target table will be updated automatically when the source table
is updated.
+
This is the default behavior of the LOAD statement; that is, even if this option is not
specified, the LOAD statement rebuilds indexes unless the
CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE` is turned *ON*. This CQD is turned *OFF* by default.
Consequently, with the CQD turned *OFF*, the elapsed time of a LOAD statement without options
is nearly equal to that of LOAD WITH REBUILD INDEXES, whereas with the CQD turned *ON*,
the former takes less time than the latter.
** `UPSERT USING LOAD`
+
specifies that data is inserted into target tables using rowset inserts without transactions.
+
In this case, {project-name} uses the regular HBase interface (Put) rather than the HBase bulk loader when inserting multiple rows at a time.
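Options can be combined in a single WITH clause, separated by commas. The following is a minimal sketch (the table names are illustrative):

```
LOAD WITH TRUNCATE TABLE, NO DUPLICATE CHECK, NO OUTPUT
INTO target_table1 SELECT * FROM source_table;
```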
<<<
[[load_considerations]]
=== Considerations for LOAD
[[load_required_privileges]]
==== Required Privileges
To issue a LOAD statement, one of the following must be true:
* You are DB ROOT.
* You are the owner of the target table.
* You have these privileges:
** SELECT and INSERT privileges on the target table
** DELETE privilege on the target table if TRUNCATE TABLE is specified
* You have the MANAGE_LOAD component privilege for the SQL_OPERATIONS component.
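For example, a user other than the table owner could be granted the required privileges as sketched below. The user name LOADER1 is illustrative; see the GRANT and GRANT COMPONENT PRIVILEGE statements for details:

```
GRANT SELECT, INSERT ON TABLE trafodion.seabase.target_table1 TO loader1;
-- DELETE is required only if TRUNCATE TABLE is specified:
GRANT DELETE ON TABLE trafodion.seabase.target_table1 TO loader1;
-- Alternatively, grant the bulk-load component privilege:
GRANT COMPONENT PRIVILEGE MANAGE_LOAD ON SQL_OPERATIONS TO loader1;
```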
[[load_configuration_before_running_load]]
==== Configuration Before Running LOAD
Before running the LOAD statement, make sure that you have configured
the staging folder, source table, and HBase according to these
guidelines.
==== Staging Folder for HFiles
The Bulk Loader uses an HDFS folder as a staging area for the HFiles
before calling HBase APIs to merge them into the {project-name} table.
By default, {project-name} uses /bulkload as the staging folder. This folder
must be owned by the same user as the one under which {project-name} runs. {project-name}
also must have full permissions on this folder. The HBase user (that is,
the user under which HBase runs) must have read/write access to this
folder.
Example:
```
drwxr-xr-x - trafodion trafodion 0 2014-07-07 09:49 /bulkload
```
<<<
==== Improving Throughput
The following CQD (Control Query Default) settings help improve the Bulk Loader
throughput:
* `TRAF_LOAD_MAX_HFILE_SIZE`
+
Specifies the HFile size limit beyond which the current file is closed and a
new file is created for the same partition. Adjust this setting to minimize
HBase splitting/merging.
+
*Default*: 10GB
* `TRAF_LOAD_PREP_TMP_LOCATION`
+
Specifies the HDFS directory where HFiles are created during load.
+
*Default*: `/bulkload`
Also, consider using `NO DUPLICATE CHECK` to improve throughput if your
source data is clean.
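For example, both settings can be adjusted in the same session before running the LOAD statement. This is a sketch; the staging path and table names are illustrative:

```
-- stage HFiles in a custom HDFS directory (illustrative path)
CQD TRAF_LOAD_PREP_TMP_LOCATION '/bulkload_staging';
-- skip the duplicate check because the source data is known to be clean
LOAD WITH NO DUPLICATE CHECK INTO target_table1 SELECT * FROM source_table;
```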
==== Hive Source Table
To load data stored in HDFS, you need to create a Hive table with
the right fields and types pointing to the HDFS folder containing the
data before you start the load.
==== HBase Snapshots
If you do not specify the NO RECOVERY option in the LOAD statement, the
Bulk Loader uses HBase snapshots as a mechanism for recovery. Snapshots
are a lightweight operation where some metadata is copied. (Data is not
copied.)
A snapshot is taken before the load starts and is removed after
the load completes successfully. If something goes wrong and it is
possible to recover, the snapshot is used to restore the table to its
initial state before the load started. To use this recovery mechanism,
HBase needs to be configured to allow snapshots.
==== Incremental Loads
The Bulk Loader allows for incremental loads by default. Snapshots are
taken before the second phase starts and deleted once the bulk load completes.
If something goes wrong with the load, then the snapshot is used to restore
the table to its previous state.
<<<
==== Non-Incremental Loads
The following bulk load options can be used to do a non-incremental load (see the sketch after this list):
* `NO RECOVERY`: Do not take a snapshot of the table.
* `TRUNCATE TABLE`: Truncates the table before starting the load.
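For example, the following sketch combines both options to replace the contents of a target table without snapshot overhead (the table names are illustrative):

```
LOAD WITH NO RECOVERY, TRUNCATE TABLE
INTO target_table1 SELECT * FROM source_table;
```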
==== Space Usage
The target table values for SYSKEY, SALT, identity, and divisioning columns
are created automatically during the transformation step. The size of the
HBase files is determined based on encoding, compression, HDFS replication
factor, and row format. The target table can be pre-split into regions using
salting, a Java program, or by seeding the table with data.
==== Performance
The overall throughput is influenced by row format, row length, number of
columns, skew in data, and so on. LOAD has upsert semantics (duplicate
constraints are not checked against existing data). LOAD has lower CPU and
disk activity than a similar trickle load (INSERT, UPSERT, or UPSERT USING
LOAD). Also, LOAD has lower compaction activity after completion than a trickle load.
==== Hive Scans
Direct access for Hive table data supports:
* Only text input format and sequence files.
* Only structured data types.
Tables must be created/dropped/altered through Hive itself.
{project-name}:
* Reads Hive metadata to determine information about the table.
* UPDATE STATISTICS can be performed on Hive tables, which improves performance (see the sketch below).
* Can write to Hive tables in both Text and Sequence formats (used by UNLOAD).
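For example, statistics for a Hive source table can be collected as sketched below. The table name is illustrative; see <<update_statistics_statement,UPDATE STATISTICS Statement>>:

```
UPDATE STATISTICS FOR TABLE hive.hive.customer_demographics ON EVERY COLUMN;
```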
<<<
[[load_examples]]
=== Examples of LOAD
[[continue_on_error_examples]]
==== Examples of `CONTINUE ON ERROR`
Suppose that we have the following tables:
_source_table_:
```
SQL>SELECT * FROM source_table;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SHOWDDL source_table;
CREATE TABLE TRAFODION.SEABASE.SOURCE_TABLE
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table1_:
```
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SQL>SHOWDDL target_table1;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE1
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(3) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
* The examples below demonstrate how the LOAD Statement behaves without and with `CONTINUE ON ERROR`, when ignorable data conversion errors occur.
** When loading data from _source_table_ into _target_table1_, if `CONTINUE ON ERROR` is not specified, the operation fails with a data conversion error.
+
```
SQL>LOAD INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:15:53.222441
Task: CLEANUP Status: Ended Time: 2018-01-03 16:15:53.250826
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.028
Task: LOADING DATA Status: Started Time: 2018-01-03 16:15:53.250909
*** ERROR[8402] A string overflow occurred during the evaluation of a character expression. Conversion of Source Type:CHAR(REC_BYTE_F_ASCII,4 BYTES,ISO88591) Source Value:aaaa to Target Type:CHAR(REC_BYTE_F_ASCII,3 BYTES,ISO88591). [2018-01-03 16:15:54]
```
** When loading data from _source_table_ into _target_table1_, if `CONTINUE ON ERROR` is specified, the operation succeeds after ignorable data conversion errors.
+
```
SQL>LOAD WITH CONTINUE ON ERROR INTO target_table1 SELECT * FROM source_table;
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Ended Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: PREPARATION Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Rows Processed: 3
Task: PREPARATION Status: Ended ET: 00:00:03.151
Task: COMPLETION Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: COMPLETION Status: Ended ET: 00:00:01.137
--- 3 row(s) loaded.
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:19:43.543405
Task: CLEANUP Status: Ended Time: 2018-01-03 16:19:43.568828
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.025
Task: LOADING DATA Status: Started Time: 2018-01-03 16:19:43.568899
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 16:19:44.211150
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.642
Task: COMPLETION Status: Started Time: 2018-01-03 16:19:44.211192
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 16:19:45.171458
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.960
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
```
[[log_error_rows_examples]]
==== Examples of `LOG ERROR ROWS [TO error-location-name]`
Suppose that we have the same two tables (_source_table_ and _target_table1_) as shown in the <<continue_on_error_examples,Examples of `CONTINUE ON ERROR`>>.
** This example explains how the LOAD statement loads data and logs error rows to the default directory `/user/trafodion/bulkload/logs`.
+
```
SQL>LOAD WITH LOG ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 16:23:03.142862
Task: CLEANUP Status: Ended Time: 2018-01-03 16:23:03.151725
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.009
Logging Location: /user/trafodion/bulkload/logs/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_082303
Task: LOADING DATA Status: Started Time: 2018-01-03 16:23:03.151815
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 16:23:03.920270
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.768
Task: COMPLETION Status: Started Time: 2018-01-03 16:23:03.920313
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 16:23:04.301579
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.381
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
[root@cent-1 bin]$ hdfs dfs -ls /user/trafodion/bulkload/logs/
Found 1 items
drwxr-xr-x - trafodion trafodion 0 2018-01-13 16:23
/user/trafodion/bulkload/logs/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_082303
```
** This example shows how the LOAD statement loads and logs error rows to the specified directory `/bulkload/error_log`.
+
```
SQL>LOAD WITH LOG ERROR ROWS TO '/BULKLOAD/ERROR_LOG' INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
-------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-03 17:19:43.436689
Task: CLEANUP Status: Ended Time: 2018-01-03 17:19:43.456761
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.020
Logging Location: /bulkload/error_log/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_091943
Task: LOADING DATA Status: Started Time: 2018-01-03 17:19:43.456804
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-03 17:19:43.722825
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.266
Task: COMPLETION Status: Started Time: 2018-01-03 17:19:43.722868
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-03 17:19:44.591544
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.869
--- SQL operation complete.
[root@cent-1 bin]$ hdfs dfs -ls /bulkload/error_log
Found 1 items
drwxr-xr-x - trafodion trafodion 0 2018-01-03 17:19
/bulkload/error_log/ERR_TRAFODION.SEABASE.TARGET_TABLE_20180103_091943
```
[[stop_after_num_error_rows_examples]]
==== Examples of `STOP AFTER num ERROR ROWS`
Suppose that we have the same two tables (_source_table_ and _target_table1_) as shown in the <<continue_on_error_examples,Examples of `CONTINUE ON ERROR`>>.
The examples below illustrate how the LOAD statement behaves depending on the value of `num`.
```
SQL>LOAD WITH STOP AFTER 2 ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
---------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-05 10:53:52.20569
Task: CLEANUP Status: Ended Time: 2018-01-05 10:53:52.45689
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.025
Task: LOADING DATA Status: Started Time: 2018-01-05 10:53:52.45757
*** WARNING[8114] The number of error rows is 3 [2018-01-05 10:53:53]
*** ERROR[8113] The maximum number of error rows is exceeded. [2018-01-05 10:53:53]
*** WARNING[8114] The number of error rows is 3 [2018-01-05 10:53:53]
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
```
```
SQL>LOAD WITH STOP AFTER 3 ERROR ROWS INTO target_table1 SELECT * FROM source_table;
UTIL_OUTPUT
---------------------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-05 15:55:58.975459
Task: CLEANUP Status: Ended Time: 2018-01-05 15:55:59.20219
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.045
Task: LOADING DATA Status: Started Time: 2018-01-05 15:55:59.20322
Rows Processed: 6
Error Rows: 3
Task: LOADING DATA Status: Ended Time: 2018-01-05 15:55:59.960109
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:00.940
Task: COMPLETION Status: Started Time: 2018-01-05 15:55:59.960180
Rows Loaded: 3
Task: COMPLETION Status: Ended Time: 2018-01-05 15:56:00.448496
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.488
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
4 dd
5 ee
6 fff
--- 3 row(s) selected.
```
[[index_table_only_examples]]
==== Examples of `INDEX TABLE ONLY`
Suppose that we have the following tables:
_source_table_:
```
SQL>SELECT * FROM source_table;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SHOWDDL source_table;
CREATE TABLE TRAFODION.SEABASE.SOURCE_TABLE
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table1_:
```
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SQL>SHOWDDL target_table1;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE1
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(3) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table2_:
```
SQL>SELECT * FROM target_table2;
--- 0 row(s) selected.
SQL>SHOWDDL target_table2;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE2
(
A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, PRIMARY KEY (A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table3_:
```
SELECT * FROM target_table3;
--- 0 row(s) selected.
SHOWDDL target_table3;
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE3
(
A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
, PRIMARY KEY (A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
_target_table4_:
```
SELECT * FROM target_table4;
--- 0 row(s) selected.
CREATE TABLE TRAFODION.SEABASE.TARGET_TABLE4
(
A INT DEFAULT NULL NOT SERIALIZED
, B CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT DEFAULT NULL NOT SERIALIZED
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
```
* The examples below demonstrate how the index table and the target table get populated.
** The index table gets populated, while the target table does not get populated if `NO POPULATE` is specified.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(b) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table1) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
--- 0 row(s) selected.
SELECT * FROM table(index_table index_target_table1);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
** The index table gets populated, and the target table gets populated as well if `NO POPULATE` is not specified.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(b);
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table1) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table1;
A B
----------- ----
1 aaaa
2 bbbb
3 cccc
4 dd
5 ee
6 fff
--- 6 row(s) selected.
SQL>SELECT * FROM table(index_table index_target_table1);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
* The examples below illustrate how to populate index tables for different target tables.
** _target_table2_ has columns A (primary key column) and B (index column) defined. In this case, populate the index table with columns B and A from _source_table_.
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(b) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table2);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE2
(
"B@" CHAR(4) CHARACTER SET ISO88591 COLLATE
DEFAULT NO DEFAULT NOT SERIALIZED
, A INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, PRIMARY KEY ("B@" ASC, A ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table2) SELECT b,a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table2;
--- 0 row(s) selected.
SQL>SELECT * FROM TABLE(INDEX_TABLE index_target_table2);
B@ A
---- --------------------
aaaa 1
bbbb 2
cccc 3
dd 4
ee 5
fff 6
--- 6 row(s) selected.
```
** _target_table3_ has columns A (primary key column and index column) and B defined. In this case, populate the index table with column A from _source_table_.
+
```
SQL>CREATE INDEX index_target_table3 ON target_table3(a) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table3);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE3
(
"A@" INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
SERIALIZED
, PRIMARY KEY ("A@" ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table3) SELECT a FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table3;
--- 0 row(s) selected.
SQL> SELECT * FROM TABLE(INDEX_TABLE index_target_table3);
A@
-----------
1
2
3
4
5
6
--- 6 row(s) selected.
```
** _target_table4_ has columns A (index column) and B defined. In this case, populate the index table with column A and SYSKEY from _source_table_.
+
```
SQL> CREATE INDEX index_target_table4 ON target_table4(a) NO POPULATE;
--- SQL operation complete.
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SHOWDDL TABLE(INDEX_TABLE index_target_table4);
CREATE TABLE TRAFODION.SEABASE.INDEX_TARGET_TABLE4
(
"A@" INT NO DEFAULT NOT SERIALIZED
, SYSKEY LARGEINT NO DEFAULT NOT NULL NOT DROPPABLE
NOT SERIALIZED
, PRIMARY KEY ("A@" ASC, SYSKEY ASC)
)
ATTRIBUTES ALIGNED FORMAT
;
--- SQL operation complete.
SQL>LOAD WITH INDEX TABLE ONLY INTO TABLE(INDEX_TABLE index_target_table4) SELECT a,syskey FROM source_table;
--- SQL operation complete.
SQL>SELECT * FROM target_table4;
--- 0 row(s) selected.
SQL>SELECT * FROM TABLE(INDEX_TABLE index_target_table4);
A@ SYSKEY
----------- --------------------
1 4239726128363214004
2 4239726128363256924
3 4239726128363258834
4 4239726128363260240
5 4239726128363261628
6 4239726128363263088
--- 6 row(s) selected.
```
NOTE: At this time, if you want to drop an index created above, such as _index_target_table2_, _index_target_table3_, or _index_target_table4_, populate the index from its parent table before dropping it, as shown in the example below. For more information, see <<populate_index_utility,POPULATE INDEX Utility>>.
```
SQL> DROP INDEX index_target_table4;
*** ERROR[4254] Object TRAFODION.SEABASE.INDEX_TARGET_TABLE4 has invalid state and cannot be accessed. Use cleanup command to drop it.
SQL> POPULATE INDEX index_target_table4 ON target_table4;
--- SQL operation complete.
SQL> DROP INDEX index_target_table4;
--- SQL operation complete.
```
[[rebuild_indexes_examples]]
==== Examples of `REBUILD INDEXES`
Suppose that we have the following tables:
_source_table_:
```
SQL>select count(*) from source_table;
(EXPR)
--------------------
             1000000
--- 1 row(s) selected.
```
_target_table1_ and _target_table2_ both have the following structure:
```
SQL>CREATE TABLE target_table1
  (
    ID                               INT NO DEFAULT NOT NULL NOT DROPPABLE NOT
      SERIALIZED
  , NUM                              INT DEFAULT NULL NOT SERIALIZED
  , CARD_ID                          LARGEINT DEFAULT NULL NOT SERIALIZED
  , PRICE                            DECIMAL(11, 3) DEFAULT NULL NOT SERIALIZED
  , START_DATE                       DATE DEFAULT NULL NOT SERIALIZED
  , START_TIME                       TIME(0) DEFAULT NULL NOT SERIALIZED
  , END_TIME                         TIMESTAMP(6) DEFAULT NULL NOT SERIALIZED
  , B_YEAR                           INTERVAL YEAR(10) DEFAULT NULL NOT
      SERIALIZED
  , B_YM                             INTERVAL YEAR(5) TO MONTH DEFAULT NULL NOT
      SERIALIZED
  , B_DS                             INTERVAL DAY(10) TO SECOND(3) DEFAULT NULL
      NOT SERIALIZED
  , PRIMARY KEY (ID ASC)
  )
  SALT USING 9 PARTITIONS
ATTRIBUTES ALIGNED FORMAT NAMESPACE 'TRAF_1500000'
  HBASE_OPTIONS
  (
    MEMSTORE_FLUSH_SIZE = '1073741824'
  )
;
```
* This example compares the execution time of the LOAD statement without options against
`LOAD WITH REBUILD INDEXES` when the CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE`
is turned *OFF* (the default). The two statements take almost the same time.
+
```
SQL>CREATE INDEX index_target_table1 ON target_table1(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:33:52.310
Task: CLEANUP Status: Ended Time: 2018-01-18 13:33:52.328
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.019
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:33:52.328
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:34:04.709
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.381
Task: LOADING DATA Status: Started Time: 2018-01-18 13:34:04.709
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:34:21.629
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:16.919
Task: COMPLETION Status: Started Time: 2018-01-18 13:34:21.629
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:34:22.436
Task: COMPLETION Status: Ended Elapsed Time: 00:00:00.808
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:34:22.436
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:34:31.116
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:08.680
--- SQL operation complete.
Start Time 2018/01/18 13:33:51.478782
End Time 2018/01/18 13:34:31.549491
Elapsed Time 00:00:40.070709
Compile Time 00:00:00.510024
Execution Time 00:00:39.559433
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:35:01.804
Task: CLEANUP Status: Ended Time: 2018-01-18 13:35:01.823
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.018
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:35:01.823
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:35:13.840
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.017
Task: LOADING DATA Status: Started Time: 2018-01-18 13:35:13.840
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:19.720
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:05.879
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:19.720
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:22.436
Task: COMPLETION Status: Ended Elapsed Time: 00:00:02.717
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:22.436
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:33.346
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:10.910
--- SQL operation complete.
Start Time 2018/01/18 13:35:00.624490
End Time 2018/01/18 13:35:33.779394
Elapsed Time 00:00:33.154904
Compile Time 00:00:00.825703
Execution Time 00:00:32.321890
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table1);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:34:37.836
Task: CLEANUP Status: Ended Time: 2018-01-18 13:34:37.847
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.011
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:34:37.847
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:34:45.445
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:07.598
Task: LOADING DATA Status: Started Time: 2018-01-18 13:34:45.445
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:03.576
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:18.131
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:03.577
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:04.873
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.296
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:04.873
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:12.461
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:07.589
--- SQL operation complete.
Start Time 2018/01/18 13:34:37.053647
End Time 2018/01/18 13:35:12.893891
Elapsed Time 00:00:35.840244
Compile Time 00:00:00.435855
Execution Time 00:00:35.402620
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:35:25.480
Task: CLEANUP Status: Ended Time: 2018-01-18 13:35:25.493
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.013
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:35:25.493
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:35:38.844
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.591
Task: LOADING DATA Status: Started Time: 2018-01-18 13:35:38.845
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:35:43.491
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:05.407
Task: COMPLETION Status: Started Time: 2018-01-18 13:35:43.491
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:35:45.920
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.601
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:35:45.920
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:35:56.322
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:11.230
--- SQL operation complete.
Start Time 2018/01/18 13:35:24.693410
End Time 2018/01/18 13:35:56.754441
Elapsed Time 00:00:32.061031
Compile Time 00:00:00.449236
Execution Time 00:00:31.611112
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table2);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
* This example compares the execution time of the LOAD statement without options against
`LOAD WITH REBUILD INDEXES` when the CQD `TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE`
is turned *ON*. The former takes less time than the latter.
+
```
SQL>CQD TRAF_LOAD_ALLOW_RISKY_INDEX_MAINTENANCE 'ON';
--- SQL operation complete.
SQL>CREATE INDEX index_target_table1 ON target_table1(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:01.730
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:01.756
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.027
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:01.756
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:46:22.415
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:20.659
Task: COMPLETION Status: Started Time: 2018-01-18 13:46:22.415
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:46:26.353
Task: COMPLETION Status: Ended Elapsed Time: 00:00:03.938
--- SQL operation complete.
Start Time 2018/01/18 13:46:00.954518
End Time 2018/01/18 13:46:26.795757
Elapsed Time 00:00:25.841239
Compile Time 00:00:00.455681
Execution Time 00:00:25.384158
SQL>LOAD INTO target_table1 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE1
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:57.811
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:57.870
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.058
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:57.870
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:47:12.411
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:14.541
Task: COMPLETION Status: Started Time: 2018-01-18 13:47:12.411
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:16.292
Task: COMPLETION Status: Ended Elapsed Time: 00:00:03.881
--- SQL operation complete.
Start Time 2018/01/18 13:46:57.400422
End Time 2018/01/18 13:47:16.738970
Elapsed Time 00:00:19.338548
Compile Time 00:00:00.010545
Execution Time 00:00:19.321781
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table1);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
+
```
SQL>CREATE INDEX index_target_table2 ON target_table2(id);
--- SQL operation complete.
SQL>SET STATISTICS ON;
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id < 301;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:46:28.303
Task: CLEANUP Status: Ended Time: 2018-01-18 13:46:28.796
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.049
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:46:28.796
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:46:38.479
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:10.400
Task: LOADING DATA Status: Started Time: 2018-01-18 13:46:38.479
Rows Processed: 300
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:46:58.143
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:19.665
Task: COMPLETION Status: Started Time: 2018-01-18 13:46:58.144
Rows Loaded: 300
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:00.186
Task: COMPLETION Status: Ended Elapsed Time: 00:00:02.043
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:47:00.187
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:47:09.966
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:08.910
--- SQL operation complete.
Start Time 2018/01/18 13:46:27.063642
End Time 2018/01/18 13:47:09.529257
Elapsed Time 00:00:42.465615
Compile Time 00:00:00.532541
Execution Time 00:00:41.928812
SQL>LOAD WITH REBUILD INDEXES INTO target_table2 SELECT * FROM source_table WHERE id > 300;
UTIL_OUTPUT
---------------------------------------------------------------------------------
Task: LOAD Status: Started Object: TRAFODION.SEABASE.TARGET_TABLE2
Task: CLEANUP Status: Started Time: 2018-01-18 13:47:18.187
Task: CLEANUP Status: Ended Time: 2018-01-18 13:47:18.198
Task: CLEANUP Status: Ended Elapsed Time: 00:00:00.011
Task: DISABLE INDEXE Status: Started Time: 2018-01-18 13:47:18.198
Task: DISABLE INDEXE Status: Ended Time: 2018-01-18 13:47:30.670
Task: DISABLE INDEXE Status: Ended Elapsed Time: 00:00:12.472
Task: LOADING DATA Status: Started Time: 2018-01-18 13:47:30.670
Rows Processed: 999700
Error Rows: 0
Task: LOADING DATA Status: Ended Time: 2018-01-18 13:47:39.311
Task: LOADING DATA Status: Ended Elapsed Time: 00:00:08.641
Task: COMPLETION Status: Started Time: 2018-01-18 13:47:39.311
Rows Loaded: 999700
Task: COMPLETION Status: Ended Time: 2018-01-18 13:47:40.497
Task: COMPLETION Status: Ended Elapsed Time: 00:00:01.186
Task: POPULATE INDEX Status: Started Time: 2018-01-18 13:47:40.497
Task: POPULATE INDEX Status: Ended Time: 2018-01-18 13:47:52.367
Task: POPULATE INDEX Status: Ended Elapsed Time: 00:00:11.539
--- SQL operation complete.
Start Time 2018/01/18 13:47:17.447093
End Time 2018/01/18 13:47:52.469190
Elapsed Time 00:00:35.022097
Compile Time 00:00:00.412718
Execution Time 00:00:34.608571
SQL>SET PARSERFLAGS 1;
--- SQL operation complete.
SQL>SELECT COUNT(*) FROM TABLE(INDEX_TABLE index_target_table2);
(EXPR)
--------------------
1000000
--- 1 row(s) selected.
```
[[loading_data_from_hive_table_examples]]
==== Examples of Loading Data from a Hive Table
* For customer demographics data residing in
`/hive/tpcds/customer_demographics`, create an external Hive table using
the following Hive SQL:
+
```
create external table customer_demographics
(
cd_demo_sk int
, cd_gender string
, cd_marital_status string
, cd_education_status string
, cd_purchase_estimate int
, cd_credit_rating string
, cd_dep_count int
, cd_dep_employed_count int
, cd_dep_college_count int
)
row format delimited fields terminated by '|' location
'/hive/tpcds/customer_demographics';
```
* The {project-name} table where you want to load the data is defined using
this DDL:
+
```
create table customer_demographics_salt
(
cd_demo_sk int not null
, cd_gender char(1)
, cd_marital_status char(1)
, cd_education_status char(20)
, cd_purchase_estimate int
, cd_credit_rating char(10)
, cd_dep_count int
, cd_dep_employed_count int
, cd_dep_college_count int
, primary key (cd_demo_sk)
)
salt using 4 partitions on (cd_demo_sk);
```
* This example shows how the LOAD statement loads the
customer_demographics_salt table from the Hive table,
`hive.hive.customer_demographics`:
+
```
>>load into customer_demographics_salt
+>select * from hive.hive.customer_demographics where cd_demo_sk <= 5000;
Task: LOAD Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: DISABLE INDEX Status: Ended Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: PREPARATION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Rows Processed: 5000
Task: PREPARATION Status: Ended ET: 00:00:03.199
Task: COMPLETION Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: COMPLETION Status: Ended ET: 00:00:00.331
Task: POPULATE INDEX Status: Started Object: TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS_SALT
Task: POPULATE INDEX Status: Ended ET: 00:00:05.262
```
<<<
[[populate_index_utility]]
== POPULATE INDEX Utility
The POPULATE INDEX utility performs a fast INSERT of data into an index
from the parent table. You can execute this utility in a client-based
tool like TrafCI.
```
POPULATE INDEX index ON table [index-option]
index-option is:
ONLINE | OFFLINE
```
[[populate_index_syntax]]
=== Syntax Description of POPULATE INDEX
* `_index_`
+
is an SQL identifier that specifies the simple name for the index. You
cannot qualify _index_ with its schema name. Indexes have their own
name space within a schema, so an index name might be the same as a table
or constraint name. However, no two indexes in a schema can have the
same name.
* `_table_`
+
is the name of the table for which to populate the index. See
<<database_object_names,Database Object Names>>.
* `ONLINE`
+
specifies that the populate operation should be done on-line. That is,
ONLINE allows read and write DML access on the base table while the
populate operation occurs. Additionally, ONLINE reads the audit trail to
replay updates to the base table during the populate phase. If a lot of
audit is generated and you perform many CREATE INDEX operations, we
recommend that you avoid ONLINE operations because they can add more
contention to the audit trail. The default is ONLINE.
* `OFFLINE`
+
specifies that the populate should be done off-line. OFFLINE allows only
read DML access to the base table. The base table is unavailable for
write operations at this time. OFFLINE must be specified explicitly.
SELECT is allowed.
<<<
[[populate_index_considerations]]
=== Considerations for POPULATE INDEX
When POPULATE INDEX is executed, the following steps occur:
* The POPULATE INDEX operation runs in many transactions.
* The actual data load operation is run outside of a transaction.
If a failure occurs, the rollback is faster because it does not have to
process a lot of audit. Also, if a failure occurs, the index remains
empty, unaudited, and not attached to the base table (off-line).
* When an off-line POPULATE INDEX is being executed, the base table is
accessible for read DML operations. When an on-line POPULATE INDEX is
being executed, the base table is accessible for read and write DML
operations during that time period, except during the commit phase at
the very end.
* If the POPULATE INDEX operation fails unexpectedly, you may need to
drop the index, re-create it, and repopulate it, as sketched below.
* On-line POPULATE INDEX reads the audit trail to replay updates by
allowing read/write access. If you plan to create many indexes in
parallel or if you have a high level of activity on the audit trail, you
should consider using the OFFLINE option.
Errors can occur if the source base table or target index cannot be
accessed, or if the load fails due to some resource problem or problem
in the file system.
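If a populate does fail, a recovery sequence along these lines can be
used (a hedged sketch; the names are hypothetical, and the CREATE INDEX
column list must match your original index definition):

```
-- Drop the partially built index, re-create it without populating,
-- then repopulate it explicitly.
DROP INDEX myschema.myindex;
CREATE INDEX myindex ON myschema.mytable (mycolumn) NO POPULATE;
POPULATE INDEX myindex ON myschema.mytable;
```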
[[populate_index_required_privileges]]
==== Required Privileges
To perform a POPULATE INDEX operation, one of the following must be
true:
* You are DB ROOT.
* You are the owner of the table.
* You have the SELECT and INSERT (or ALL) privileges on the associated table.
[[populate_index_examples]]
=== Examples of POPULATE INDEX
* This example loads the specified index from the specified table:
+
```
POPULATE INDEX myindex ON myschema.mytable;
```
* This example loads the specified index from the specified table, which
uses the default schema:
+
```
POPULATE INDEX index2 ON table2;
```
<<<
[[purgedata_utility]]
== PURGEDATA Utility
The PURGEDATA utility performs a fast DELETE of data from a table and
its related indexes. You can execute this utility in a client-based tool
like TrafCI.
```
PURGEDATA object
```
[[purgedata_syntax]]
=== Syntax Description of PURGEDATA
* `_object_`
+
is the name of the table from which to purge the data. See
<<database_object_names,Database Object Names>>.
[[purgedata_considerations]]
=== Considerations for PURGEDATA
* The _object_ can be a table name.
* Errors are returned if _object_ cannot be accessed or if a resource or
file-system problem causes the delete to fail.
* PURGEDATA is not supported for volatile tables.
[[purgedata_required_privileges]]
==== Required Privileges
To perform a PURGEDATA operation, one of the following must be true:
* You are DB ROOT.
* You are the owner of the table.
* You have the SELECT and DELETE (or ALL) privileges on the associated
table.
[[purgedata_availability]]
==== Availability
PURGEDATA marks the table OFFLINE and sets the corrupt bit while
processing. If PURGEDATA fails before it completes, the table and its
dependent indexes will be unavailable, and you must run PURGEDATA again
to complete the operation and remove the data. Error 8551 with an
accompanying file system error 59 or error 1071 is returned in this
case.
[[purgedata_examples]]
=== Examples of PURGEDATA
* This example purges the data in the specified table. If the table has
indexes, their data is also purged.
+
```
PURGEDATA myschema.mytable;
```
<<<
[[unload_statement]]
== UNLOAD Statement
The UNLOAD statement unloads data from {project-name} tables into an HDFS
location that you specify. Extracted data can be written either compressed
or uncompressed, depending on the options you choose.
UNLOAD is a {project-name} SQL extension.
```
UNLOAD [WITH option[ option]...] INTO 'target-location' SELECT ... FROM source-table ...
option is:
DELIMITER { 'delimiter-string' | delimiter-ascii-value }
| RECORD_SEPARATOR { 'separator-literal' | separator-ascii-value }
| NULL_STRING 'string-literal'
| PURGEDATA FROM TARGET
| COMPRESSION GZIP
| MERGE FILE merged-file-path [OVERWRITE]
| NO OUTPUT
| { NEW | EXISTING } SNAPSHOT HAVING SUFFIX 'string'
```
[[unload_syntax]]
=== Syntax Description of UNLOAD
* `'_target-location_'`
+
is the full pathname of the target HDFS folder where the extracted data
will be written. Enclose the name of the folder in single quotes. Specify
the folder name as a full pathname and not as a relative path. You must
have write permissions on the target HDFS folder. If you run UNLOAD in
parallel, multiple files will be produced under the _target-location_.
The number of files created will equal the number of ESPs.
* `SELECT &#8230; FROM _source-table_ &#8230;`
+
is either a simple query or a complex one that contains GROUP BY, JOIN,
or UNION clauses. _source-table_ is the name of a {project-name} table that
has the source data. See <<database_object_names,Database Object Names>>.
* `[WITH _option_[ _option_]&#8230;]`
+
is a set of options that you can specify for the unload operation. If
you specify an option more than once, {project-name} returns an error with
SQLCODE -4489. You can specify one or more of these options:
** `DELIMITER { '_delimiter-string_' | _delimiter-ascii-value_ }`
+
specifies the delimiter as either a delimiter string or an ASCII value.
If you do not specify this option, {project-name} uses the character "|" as
the delimiter.
*** _delimiter-string_ can be any ASCII or Unicode string. You can also
specify the delimiter as an ASCII value. Valid values range from 1 to 255.
Specify the value in decimal notation; hexadecimal or octal
notation are currently not supported. If you are using an ASCII value,
the delimiter can be only one character wide. Do not use quotes when
specifying an ASCII value for the delimiter.
** `RECORD_SEPARATOR { '_separator-literal_' | _separator-ascii-value_ }`
+
specifies the character that will be used to separate consecutive
records or rows in the output file. You can specify either a literal
or an ASCII value for the separator. The default value is a newline character.
*** _separator-literal_ can be any ASCII or Unicode character. You can also
specify the separator as an ASCII value. Valid values range from 1 to 255.
Specify the value in decimal notation; hexadecimal or octal
notation are currently not supported. If you are using an ASCII value,
the separator can be only one character wide. Do not use quotes when
specifying an ASCII value for the separator.
** `NULL_STRING '_string-literal_'`
+
specifies the string that will be used to indicate a NULL value. The
default value is the empty string ''.
** `PURGEDATA FROM TARGET`
+
causes files in the target HDFS folder to be deleted before the unload
operation.
** `COMPRESSION GZIP`
+
uses gzip compression in the extract node, writing the data to disk in
this compressed format. GZIP is currently the only supported type of
compression. If you do not specify this option, the extracted data will
be uncompressed.
** `MERGE FILE _merged-file-path_ [OVERWRITE]`
+
merges the unloaded files into a single file in the specified
_merged-file-path_. If you specify compression, the unloaded data will
be in compressed format, and the merged file will also be in compressed
format. If you specify the optional OVERWRITE keyword, the file is
overwritten if it already exists; otherwise, {project-name} raises an error
if the file already exists.
** `NO OUTPUT`
+
prevents the UNLOAD statement from displaying status messages. By
default, the UNLOAD statement prints status messages listing the steps
that the Bulk Unloader is executing.
<<<
* `{ NEW | EXISTING } SNAPSHOT HAVING SUFFIX '_string_'`
+
initiates an HBase snapshot scan during the unload operation. During a
snapshot scan, the Bulk Unloader will get a list of the {project-name} tables
from the query explain plan and will create and verify snapshots for the
tables. Specify a suffix string, '_string_', which will be appended to
each table name.
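As an illustration, the following sketch (the target path and table name
are hypothetical) combines several WITH options to produce
comma-delimited, gzip-compressed output:

```
-- Unload comma-delimited, gzip-compressed data, emptying the
-- target folder first and using 'NULL' to represent NULL values.
UNLOAD
WITH DELIMITER ','
NULL_STRING 'NULL'
PURGEDATA FROM TARGET
COMPRESSION GZIP
INTO '/bulkload/customer_export'
SELECT * FROM trafodion.sch.customer;
```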
[[unload_considerations]]
=== Considerations for UNLOAD
* You must have write permissions on the target HDFS folder.
* If a WITH option is specified more than once, {project-name} returns an
error with SQLCODE -4489.
[[unload_required_privileges]]
==== Required Privileges
To issue an UNLOAD statement, one of the following must be true:
* You are DB ROOT.
* You are the owner of the target table.
* You have the SELECT privilege on the target table.
* You have the MANAGE_LOAD component privilege for the SQL_OPERATIONS
component.
[[unload_examples]]
=== Examples of UNLOAD
* This example shows how the UNLOAD statement extracts data from a
{project-name} table, `TRAFODION.HBASE.CUSTOMER_DEMOGRAPHICS`, into an HDFS
folder, `/bulkload/customer_demographics`:
+
```
>>UNLOAD
+>WITH PURGEDATA FROM TARGET
+>MERGE FILE 'merged_customer_demogs.gz' OVERWRITE
+>COMPRESSION GZIP
+>INTO '/bulkload/customer_demographics'
+>select * from trafodion.hbase.customer_demographics
+><<+ cardinality 10e10 ,+ cardinality 10e10 >>;
Task: UNLOAD Status: Started
Task: EMPTY TARGET Status: Started
Task: EMPTY TARGET Status: Ended ET: 00:00:00.014
Task: EXTRACT Status: Started
Rows Processed: 200000
Task: EXTRACT Status: Ended ET: 00:00:04.743 Task: MERGE FILES Status: Started
Task: MERGE FILES Status: Ended ET: 00:00:00.063
--- 200000 row(s) unloaded.
```
<<<
[[update_statistics_statement]]
== UPDATE STATISTICS Statement
The UPDATE STATISTICS statement updates the histogram statistics for one
or more groups of columns within a table. These statistics are used to
devise optimized access plans.
UPDATE STATISTICS is a {project-name} SQL extension.
```
UPDATE STATISTICS FOR TABLE table [CLEAR | on-clause | sample-table-clause ]
on-clause is:
ON column-group-list CLEAR
| ON column-group-list [histogram-option]...
| ON column-group-list INCREMENTAL WHERE predicate
column-group-list is:
column-list [,column-list]...
| EVERY COLUMN [,column-list]...
| EVERY KEY [,column-list]...
| EXISTING COLUMN[S] [,column-list]...
| NECESSARY COLUMN[S] [,column-list]...
column-list for a single-column group is:
column-name
| (column-name)
| column-name TO column-name
| (column-name) TO (column-name)
| column-name TO (column-name)
| (column-name) TO column-name
column-list for a multicolumn group is:
(column-name, column-name [,column-name]...)
histogram-option is:
GENERATE n INTERVALS
| SAMPLE [sample-option]
sample-option is:
[r ROWS]
| RANDOM percent PERCENT [PERSISTENT]
| PERIODIC size ROWS EVERY period ROWS
sample-table-clause is:
CREATE SAMPLE RANDOM percent PERCENT
| REMOVE SAMPLE
```
[[update_statistics_syntax]]
=== Syntax Description of UPDATE STATISTICS
* `_table_`
+
names the table for which statistics are to be updated. To refer to a
table, use the ANSI logical name.
See <<database_object_names,Database Object Names>>.
* `CLEAR`
+
deletes some or all histograms for the table _table_. Use this option
when new applications no longer use certain histogram statistics.
+
If you do not specify _column-group-list_, all histograms for _table_
are deleted. If you specify _column-group-list_, only the histograms for
the column groups in the list are deleted.
* `ON _column-group-list_`
+
specifies one or more groups of columns for which to generate histogram
statistics with the option of clearing the histogram statistics. You
must use the ON clause to generate statistics stored in histogram
tables.
* `_column-list_`
+
specifies the columns that make up _column-group-list_. A column list
can define either a single-column group or a multicolumn group.
** Single-column group:
*** `_column-name_ | (_column-name_) | _column-name_ TO _column-name_ |
(_column-name_) TO (_column-name_)`
+
specifies how you can specify individual columns or a group of
individual columns.
+
To generate statistics for individual columns, list each column. You can
list each single column name with or without parentheses.
** Multicolumn group:
*** `(_column-name_, _column-name_ [,_column-name_]&#8230;)`
+
specifies a multi-column group.
+
To generate multi-column statistics, group a set of columns within
parentheses, as shown. You cannot specify the name of a column more than
once in the same group of columns.
+
<<<
+
One histogram is generated for each unique column group. Duplicate
groups, meaning any permutation of the same group of columns, are
ignored and processing continues. When you run UPDATE STATISTICS again
for the same user table, the new data for that table replaces the data
previously generated and stored in the table’s histogram tables.
Histograms of column groups not specified in the ON clause remain
unchanged in histogram tables.
+
For more information about specifying columns, see
<<generating_and_clearing_statistics_for_columns,Generating and Clearing Statistics for Columns>>.
* `EVERY COLUMN`
+
The EVERY COLUMN keyword indicates that histogram statistics are to be
generated for each individual column of _table_ and for the multi-column
groups that make up the primary key and indexes. For example, suppose
that _table_ has columns A, B, C, D defined, where A, B, C compose the
primary key. In this case, the ON EVERY COLUMN option generates a
single-column histogram for each of the columns A, B, C, D, and two
multi-column histograms, (A, B, C) and (A, B).
+
The EVERY COLUMN option does what EVERY KEY does, with additional
statistics on the individual columns.
* `EVERY KEY`
+
The EVERY KEY keyword indicates that histogram statistics are to be
generated for columns that make up the primary key and indexes. For
example, _table_ has columns A, B, C, D defined. If the primary key
comprises columns A, B, statistics are generated for (A, B), A and B. If
the primary key comprises columns A, B, C, statistics are generated for
(A,B,C), (A,B), A, B, C. If the primary key comprises columns A, B, C,
D, statistics are generated for (A, B, C, D), (A, B, C), (A, B), and A,
B, C, D.
* `EXISTING COLUMN[S]`
+
The EXISTING COLUMN keyword indicates that all existing histograms of
the table are to be updated. Statistics must have been captured
previously to establish the existing columns.
* `NECESSARY COLUMN[S]`
+
The NECESSARY COLUMN[S] keyword generates statistics for histograms that
the optimizer has requested but do not exist. Update statistics
automation must be enabled for NECESSARY COLUMN[S] to generate
statistics.
<<<
* `_histogram-option_`
** `GENERATE _n_ INTERVALS`
+
The GENERATE _n_ INTERVALS option for UPDATE STATISTICS accepts values
between 1 and 10,000. Keep in mind that increasing the number of
intervals per histogram may have a negative impact on compile time.
+
Increasing the number of intervals is useful for columns with a small
set of possible values and a large variance in the frequency of those
values. For example, consider a column CITY in table SALES, which
stores the code of the city where the item was sold, and suppose the
number of distinct cities in the sales data is 1538. Setting the number
of intervals to a number greater than or equal to the number of cities
(for example, 1600) guarantees that the generated histogram captures the
number of rows for each city. If the specified value _n_ exceeds the
number of unique values in the column, the system generates only as many
intervals as the number of unique values. (A statement illustrating this
scenario appears in the sketch following this syntax description.)
** `SAMPLE [_sample-option_]`
+
is a clause that specifies that sampling is to be used to gather a
subset of the data from the table. UPDATE STATISTICS stores the sample
results and generates histograms.
+
If you specify the SAMPLE clause without additional options, the result
depends on the number of rows in the table. If the table contains no
more than 10,000 rows, the entire table will be read (no sampling). If
the number of rows is greater than 10,000 but less than 1 million,
10,000 rows are randomly sampled from the table. If there are more than
1 million rows in the table, a random row sample is used to read 1
percent of the rows in the table, with a maximum of 1 million rows
sampled.
+
TIP: As a guideline, the default sample of 1 percent of the rows in the
table, with a maximum of 1 million rows, provides adequate statistics for
the optimizer to generate good plans.
+
If you do not specify the SAMPLE clause, if the table has fewer rows
than specified, or if the sample size is greater than the system limit,
{project-name} SQL reads all rows from _table_. See <<sample_clause,SAMPLE Clause>>.
*** `_sample-option_`
**** `_r_ ROWS`
+
A row sample is used to read _r_ rows from the table. The value _r_ must
be an integer that is greater than zero (_r_ > 0).
**** `RANDOM _percent_ PERCENT`
+
directs {project-name} SQL to choose rows randomly from the table. The
value _percent_ must be between zero and 100 (0 < _percent_ &#60;= 100).
In addition, only the first four digits to the right of the decimal
point are significant. For example, the value 0.00001 is treated as
0.0000, and the value 1.23456 is treated as 1.2345.
***** `PERSISTENT`
+
directs {project-name} SQL to create a persistent sample table and store the
random sample in it. This table can then be used later for updating statistics
incrementally.
**** `PERIODIC _size_ ROWS EVERY _period_ ROWS`
+
directs {project-name} SQL to choose the first _size_ number of rows from
each _period_ of rows. The value _size_ must be an integer that is
greater than zero and less than or equal to the value _period_
(0 < _size_ &#60;= _period_). The size of the _period_ is defined by the
number of rows specified for _period_. The value _period_ must be an
integer that is greater than zero (_period_ > 0).
* `INCREMENTAL WHERE _predicate_`
+
directs {project-name} SQL to update statistics incrementally. That is, instead
of taking a fresh sample of the entire table, {project-name} SQL will use a previously
created persistent sample table. {project-name} SQL will update the persistent sample
by replacing any rows satisfying the _predicate_ with a fresh sample of rows from
the original table satisfying the _predicate_. The sampling rate used is the
_percent_ specified when the persistent sample table was created. Statistics
are then generated from this updated sample. See also
<<update_statistics_incremental_update_statistics,
Incremental Update Statistics>>.
* `CREATE SAMPLE RANDOM _percent_ PERCENT`
+
Creates a persistent sample table associated with this table. The sample is
created using a random sample of _percent_ percent of the rows. The table
can then be used for later incremental statistics update.
* `REMOVE SAMPLE`
+
Drops the persistent sample table associated with this table.
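To tie these options together, here is a hedged sketch (the table and
column names are hypothetical) of several of the forms described above:

```
-- Key-based histograms only (primary key and index columns).
UPDATE STATISTICS FOR TABLE sales ON EVERY KEY;

-- Histograms the optimizer has requested but that do not yet exist
-- (requires update statistics automation to be enabled).
UPDATE STATISTICS FOR TABLE sales ON NECESSARY COLUMNS;

-- Capture per-city row counts for a low-cardinality column,
-- as in the GENERATE n INTERVALS discussion above.
UPDATE STATISTICS FOR TABLE sales ON city GENERATE 1600 INTERVALS;

-- Periodic sample: read the first 1000 rows of every 10000 rows.
UPDATE STATISTICS FOR TABLE sales ON (store_id)
SAMPLE PERIODIC 1000 ROWS EVERY 10000 ROWS;

-- Create (and later drop) a persistent sample table for
-- incremental statistics updates.
UPDATE STATISTICS FOR TABLE sales CREATE SAMPLE RANDOM 5 PERCENT;
UPDATE STATISTICS FOR TABLE sales REMOVE SAMPLE;
```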
[[update_statistics_considerations]]
=== Considerations for UPDATE STATISTICS
[[update_statistics_using_statistics]]
==== Using Statistics
Use UPDATE STATISTICS to collect and save statistics on columns. The SQL
compiler uses histogram statistics to determine the selectivity of
predicates, indexes, and tables. Because selectivity directly influences
the cost of access plans, regular collection of statistics increases the
likelihood that {project-name} SQL chooses efficient access plans.
While UPDATE STATISTICS is running on a table, the table is active and
available for query access.
When a user table changes significantly, either in its data or in its
definition, re-execute the UPDATE STATISTICS statement for the
table.
<<<
[[update_statistics_histogram_statistics]]
==== Histogram Statistics
Histogram statistics are used by the compiler to produce the best plan
for a given SQL query. When histograms are not available, default
assumptions are made by the compiler and the resultant plan might not
perform well. Histograms that reflect the latest data in a table are
optimal.
The compiler does not need histogram statistics for every column of a
table. For example, if a column is only in the select list, its
histogram statistics will be irrelevant. A histogram statistic is useful
when a column appears in:
* A predicate
* A GROUP BY clause
* An ORDER BY clause
* A HAVING clause
* A similar clause
In addition to single-column histogram statistics, the compiler needs
multi-column histogram statistics, such as when `GROUP BY column-5,
column-3, column-19` appears in a query. In that case, histogram
statistics for the combination (column-5, column-3, column-19) are
needed.
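For instance, a minimal sketch of the matching statement (using
placeholder column names C5, C3, and C19 for the columns mentioned
above):

```
UPDATE STATISTICS FOR TABLE mytable ON (c5, c3, c19);
```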
[[update_statistics_required-privileges]]
==== Required Privileges
To perform an UPDATE STATISTICS operation, one of the following must be
true:
* You are DB ROOT.
* You are the owner of the target table.
* You have the MANAGE_STATISTICS component privilege for the
SQL_OPERATIONS component.
[[update_statistics_locking]]
==== Locking
UPDATE STATISTICS momentarily locks the definition of the user table
during the operation but not the user table itself. The UPDATE
STATISTICS statement uses READ UNCOMMITTED isolation level for the user
table.
<<<
[[update_statistics_transactions]]
==== Transactions
Do not start a transaction before executing UPDATE STATISTICS. UPDATE
STATISTICS runs multiple transactions of its own, as needed. Starting
your own transaction in which UPDATE STATISTICS runs could cause the
transaction auto abort time to be exceeded during processing.
[[update_statistics_generating_and_clearing_statistics_for_columns]]
==== Generating and Clearing Statistics for Columns
To generate statistics for particular columns, name each column, or name
the first and last columns of a sequence of columns in the table. For
example, suppose that a table has consecutive columns CITY, STATE, ZIP.
This list gives a few examples of possible options you can specify:
[cols="25%,37%,37%",options="header"]
|===
| Single-Column Group | Single-Column Group Within Parentheses | Multicolumn Group
| ON CITY, STATE, ZIP | ON (CITY),(STATE),(ZIP) | ON (CITY, STATE) or ON (CITY,STATE,ZIP)
| ON CITY TO ZIP | ON (CITY) TO (ZIP) |
| ON ZIP TO CITY | ON (ZIP) TO (CITY) |
| ON CITY, STATE TO ZIP | ON (CITY), (STATE) TO (ZIP) |
| ON CITY TO STATE, ZIP | ON (CITY) TO (STATE), (ZIP) |
|===
The TO specification is useful when a table has many columns, and you
want histograms on a subset of columns. Do not confuse (CITY) TO (ZIP)
with (CITY, STATE, ZIP), which refers to a multi-column histogram.
You can clear statistics for any combination of columns you specify, not
necessarily the _column-group-list_ you used to create the statistics.
Statistics that you do not clear remain in place until you clear them.
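For instance, a short sketch using the CITY, STATE, ZIP columns from the
table above (the table name is hypothetical):

```
-- Single-column histograms for every column from CITY through ZIP.
UPDATE STATISTICS FOR TABLE mytable ON city TO zip;

-- One multi-column histogram over the same columns instead.
UPDATE STATISTICS FOR TABLE mytable ON (city, state, zip);

-- Later, clear only the CITY histogram.
UPDATE STATISTICS FOR TABLE mytable ON city CLEAR;
```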
<<<
[[update_statistics_column_lists_and_access_plans]]
==== Column Lists and Access Plans
Generate statistics for columns most often used in data access plans for
a table—that is, the primary key, indexes defined on the table, and any
other columns frequently referenced in predicates in WHERE or GROUP BY
clauses of queries issued on the table. Use the EVERY COLUMN option to
generate histograms for every individual column plus the multi-column
groups that make up the primary key and indexes.
The EVERY KEY option generates histograms only for the columns and
multi-column groups that make up the primary key and indexes.
If you often perform a GROUP BY over specific columns in a table, use
multi-column lists in the UPDATE STATISTICS statement (consisting of the
columns in the GROUP BY clause) to generate histogram statistics that
enable the optimizer to choose a better plan. Similarly, when a query
joins two tables by two or more columns, multi-column lists (consisting
of the columns being joined) help the optimizer choose a better plan.
[[update_statistics_incremental_update_statistics]]
==== Incremental Update Statistics
UPDATE STATISTICS processing time can be lengthy for very large tables.
One strategy for reducing the time is to create histograms only for
columns that actually need them (for example, using the ON NECESSARY COLUMNS
column group). Another strategy is to update statistics incrementally. These
strategies can be used together if desired.
To use the incremental update statistics feature, you must first create
statistics for the table and create a persistent sample table. One way to
do this is to perform a normal update statistics command, adding the
PERSISTENT keyword to the _sample-option_. Another way to do this if you
already have reasonably up-to-date statistics for the table, is to create
a persistent sample table separately using the CREATE SAMPLE option.
You can then perform update statistics incrementally by using the INCREMENTAL
WHERE _predicate_ syntax in the on-clause. The _predicate_ should be chosen
to describe the set of rows that have changed since the last statistics update
was performed. For example, if your table contains a column with a timestamp
giving the date and time of last update, this is a particularly useful column
to use in the _predicate_.
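For example, assuming a hypothetical last_updated timestamp column, an
incremental update might look like this sketch:

```
-- Refresh the persistent sample only for rows changed since the last
-- statistics run, then regenerate histograms from the updated sample.
UPDATE STATISTICS FOR TABLE sales
ON EXISTING COLUMNS
INCREMENTAL WHERE last_updated > TIMESTAMP '2018-06-01 00:00:00';
```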
If you decide later that you wish to change the _percent_ sampling rate used
for the persistent sample table, you can do so by dropping the persistent
sample table (using REMOVE SAMPLE) and creating a new one (by using the
CREATE SAMPLE option). Using a higher _percent_ results in more accurate
histograms, but at the price of a longer-running operation.
<<<
[[update_statistics_examples]]
=== Examples of UPDATE STATISTICS
* This example generates four histograms for the columns jobcode,
empnum, deptnum, and (empnum, deptnum) for the table EMPLOYEE. Depending
on the table’s size and data distribution, each histogram should contain
ten intervals.
+
```
UPDATE STATISTICS FOR TABLE employee
ON (jobcode),(empnum, deptnum) GENERATE 10 INTERVALS;
--- SQL operation complete.
```
* This example generates histogram statistics using the ON EVERY COLUMN
option for the table DEPT. This statement performs a full scan, and
{project-name} SQL determines the default number of intervals.
+
```
UPDATE STATISTICS FOR TABLE dept ON EVERY COLUMN;
--- SQL operation complete.
```
* Suppose that a construction company has an ADDRESS table of potential
sites and a DEMOLITION_SITES table that contains some of the columns of
the ADDRESS table. The primary key is ZIP. Join these two tables on two
of the columns in common:
+
```
SELECT COUNT(AD.number), AD.street,
AD.city, AD.zip, AD.state
FROM address AD, demolition_sites DS
WHERE AD.zip = DS.zip AND AD.type = DS.type
GROUP BY AD.street, AD.city, AD.zip, AD.state;
```
+
To generate statistics specific to this query, enter these statements:
+
```
UPDATE STATISTICS FOR TABLE address
ON (street), (city), (state), (zip, type);
UPDATE STATISTICS FOR TABLE demolition_sites ON (zip, type);
```
* This example removes all histograms for table DEMOLITION_SITES:
+
```
UPDATE STATISTICS FOR TABLE demolition_sites CLEAR;
```
<<<
* This example selectively removes the histogram for column STREET in
table ADDRESS:
+
```
UPDATE STATISTICS FOR TABLE address ON street CLEAR;
```
* This example generates statistics but also creates a persistent
sample table for use when updating statistics incrementally:
+
```
UPDATE STATISTICS FOR TABLE address
ON (street), (city), (state), (zip, type)
SAMPLE RANDOM 5 PERCENT PERSISTENT;
```
* This example updates statistics incrementally. It assumes that
a persistent sample table has already been created. The predicate
in the WHERE clause describes the set of rows that have changed
since statistics were last updated. Here we assume that rows
with a state of California are the only rows that have changed:
+
```
UPDATE STATISTICS FOR TABLE address
ON EXISTING COLUMNS
INCREMENTAL WHERE state = 'CA';
```