blob: 4434227c74eb4c7e39139dd3deedc59d9e05342c [file] [log] [blame]
////
/**
* @@@ START COPYRIGHT @@@
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*
* @@@ END COPYRIGHT @@@
*/
////
[[introduction]]
= Introduction
The {project-name} SQL database software allows you to use SQL statements, which comply closely to
ANSI SQL:1999, to access data in {project-name} SQL tables, which map to HBase tables, and to access
native HBase tables and Hive tables.
This introduction describes:
* <<sql_language,SQL Language>>
* <<using_trafodion_sql_to_access_hbase_tables,Using {project-name} SQL to Access HBase Tables>>
* <<using_trafodion_sql_to_access_hive_tables,Using {project-name} SQL to Access Hive Tables>>
* <<data_consistency_and_access_options,Data Consistency and Access Options>>
* <<transaction_management,Transaction Management>>
* <<ansi_compliance_and_trafodion_sql_extensions,ANSI Compliance and {project-name} SQL Extensions>>
* <<trafodion_sql_error_messages,{project-name} SQL Error Messages>>
Other sections of this manual describe the syntax and semantics of individual statements, commands, and language elements.
[[sql_language]]
== SQL Language
The SQL language consists of statements and other language elements that you can use to access SQL
databases. For descriptions of individual SQL statements, see <<"sql_statements","SQL Statements">>.
SQL language elements are part of statements and commands and include data types, expressions, functions,
identifiers, literals, and predicates. For more information, see:
* <<sql_language,SQL Language>>
* <<elements,Elements>>
* <<sql_clauses,SQL Clauses>>
For information on specific functions and expressions, see:
* <<sql_functions_and_expressions,SQL Functions and Expressions>>
* <<olap_functions,OLAP Functions>>
<<<
[[using_trafodion_sql_to_access_hbase_tables]]
== Using {project-name} SQL to Access HBase Tables
You can use {project-name} SQL statements to read, update, and create HBase tables.
* <<initializing_the_trafodion_metadata,Initializing the Trafodion Metadata>>
* <<ways_to_access_hbase_tables,Ways to Access HBase Tables>>
* <<trafodion_sql_tables_versus_native_hbase_tables,{project-name} SQL Tables Versus Native HBase Tables>>
* <<supported_sql_statements_with_hbase_tables,Supported SQL Statements With HBase Tables>>
For a list of Control Query Default (CQD) settings for the HBase environment, see the
{docs-url}/cqd_reference/index.hmtl[{project-name} Control Query Default (CQD) Reference Guide].
[[initializing_the_trafodion_metadata]]
=== Initializing the Trafodion Metadata
Before using SQL statements for the first time to access HBase tables, you will need to initialize the Trafodion metadata.
To initialize the Trafodion metadata, run this command:
```
initialize trafodion;
```
[[ways_to_access_hbase_tables]]
=== Ways to Access HBase Tables
{project-name} SQL supports these ways to access HBase tables:
* <<accessing_trafodion_sql_tables,Accessing {project-name} SQL Tables>>
* <<cell_per_row_access_to_hbase_tables,Cell-Per-Row Access to HBase Tables (Technology Preview)>>
* <<rowwise_access_to_hbase_tables,Rowwise Access to HBase Tables (Technology Preview)>>
<<<
[[accessing_trafodion_sql_tables]]
==== Accessing {project-name} SQL Tables
A {project-name} SQL table is a relational SQL table generated by a `CREATE TABLE` statement and mapped
to an HBase table. {project-name} SQL tables have regular ANSI names in the catalog `TRAFODION`.
A {project-name} SQL table name can be a fully qualified ANSI name of the form
`TRAFODION._schema-name.object-name_`.
To access a {project-name} SQL table, specify its ANSI table name in a {project-name} SQL statement, similar
to how you would specify an ANSI table name when running SQL statements in a relational database.
*Example*
```
CREATE TABLE trafodion.sales.odetail
( ordernum NUMERIC (6) UNSIGNED NO DEFAULT NOT NULL
, partnum NUMERIC (4) UNSIGNED NO DEFAULT NOT NULL
, unit_price NUMERIC (8,2) NO DEFAULT NOT NULL
, qty_ordered NUMERIC (5) UNSIGNED NO DEFAULT NOT NULL
, PRIMARY KEY (ordernum, partnum)
);
INSERT INTO trafodion.sales.odetail VALUES ( 900000, 7301, 425.00, 100 );
SET SCHEMA trafodion.sales;
SELECT * FROM odetail;
```
For more information about {project-name} SQL tables, see
<<trafodion_sql_tables_versus_native_hbase_tables,{project-name} SQL Tables Versus Native HBase Tables>>.
<<<
[[cell_per_row_access_to_hbase_tables]]
==== Cell-Per-Row Access to HBase Tables (Technology Preview)
NOTE: This is a _Technology Preview (Complete But Not Tested)_ feature, meaning that it is functionally
complete but has not been tested or debugged.
To access HBase data using cell-per-row mode, specify the schema `HBASE."_CELL_"` and the full ANSI
name of the table as a delimited table name. You can specify the name of any HBase table, regardless of whether
it was created through {project-name} SQL.
*Example*
```
select * from hbase."_CELL_"."TRAFODION.MYSCH.MYTAB";
select * from hbase."_CELL_"."table_created_in_HBase";
```
All tables accessed through this schema have the same column layout:
```
>>invoke hbase."_CELL_"."table_created_in_HBase";
(
ROW_ID VARCHAR(100) ...
, COL_FAMILY VARCHAR(100) ...
, COL_NAME VARCHAR(100) ...
, COL_TIMESTAMP LARGEINT ...
, COL_VALUE VARCHAR(1000) ...
)
PRIMARY KEY (ROW_ID)
>>select * from hbase."_CELL_"."mytab";
```
<<<
[[rowwise_access_to_hbase_tables]]
==== Rowwise Access to HBase Tables (Technology Preview)
NOTE: This is a _Technology Preview (Complete But Not Tested)_ feature, meaning that it is functionally
complete but has not been tested or debugged.
To access HBase data using rowwise mode, specify the schema `HBASE."_ROW_"` and the full ANSI name of the
table as a delimited table name. You can specify the name of any HBase table, regardless of whether
it was created through {project-name} SQL.
*Example*
```
select * from hbase."_ROW_"."TRAFODION.MYSCH.MYTAB";
select * from hbase."_ROW_"."table_created_in_HBase";
```
All column values of the row are returned as a single, big varchar:
```
>>invoke hbase."_ROW_"."mytab";
(
ROW_ID VARCHAR(100) ...
, COLUMN_DETAILS VARCHAR(10000) ...
)
PRIMARY KEY (ROW_ID)
>>select * from hbase."_ROW_"."mytab";
```
<<<
[[trafodion_sql_tables_versus_native_hbase_tables]]
=== {project-name} SQL Tables Versus Native HBase Tables
{project-name} SQL tables have many advantages over regular HBase tables:
* They can be made to look like regular, structured SQL tables with fixed columns.
* They support the usual SQL data types supported in relational databases.
* They support compound keys, unlike HBase tables that have a single row key (a string).
* They support indexes.
* They support _salting_, which is a technique of adding a hash value of the row key as a
key prefix to avoid hot spots for sequential keys. For the syntax,
see the <<create_table_statement,CREATE TABLE Statement>>.
The problem with {project-name} SQL tables is that they use a fixed format to represent column values,
making it harder for native HBase applications to access them. Also, they have a fixed structure,
so users lose the flexibility of dynamic columns that comes with HBase.
[[supported_sql_statements_with_hbase_tables]]
=== Supported SQL Statements With HBase Tables
You can use these SQL statements with HBase tables:
|===
| <<select_statement,SELECT Statement>> | <<insert_statement,INSERT Statement>>
| <<update_statement,UPDATE Statement>> | <<delete_statement,DELETE Statement>>
| <<merge_statement,MERGE Statement>> | <<get_statement,GET Statement>>
| <<invoke_statement,INVOKE Statement>> | <<alter_table_statement,ALTER TABLE Statement>>
| <<create_index_statement,CREATE INDEX Statement>> | <<create_table_statement,CREATE TABLE Statement>>
| <<create_view_statement,CREATE VIEW Statement>> | <<drop_index_statement,DROP INDEX Statement>>
| <<drop_table_statement,DROP TABLE Statement>> | <<drop_view_statement,DROP VIEW Statement>>
| <<grant_statement,GRANT Statement>> | <<revoke_statement,REVOKE Statement>>
|===
<<<
[[using_trafodion_sql_to_access_hive_tables]]
== Using {project-name} SQL to Access Hive Tables
You can use {project-name} SQL statements to access Hive tables.
* <<ansi_names_for_hive_tables,ANSI Names for Hive Tables>>
* <<type_mapping_from_hive_to_trafodion_sql,Type Mapping From Hive to {project-name} SQL>>
* <<supported_sql_statements_with_hive_tables,Supported SQL Statements With Hive Tables>>
For a list of Control Query Default (CQD) settings for the Hive environment, see the
{docs-url}/cqd_reference/index.hmtl[{project-name} Control Query Default (CQD) Reference Guide].
[[ansi_names_for_hive_tables]]
=== ANSI Names for Hive Tables
Hive tables appear in the {project-name} Hive ANSI name space in a special catalog and schema named `HIVE.HIVE`.
To select from a Hive table named `T`, specify an implicit or explicit name, such as `HIVE.HIVE.T`,
in a {project-name} SQL statement.
*Example*
This example should work if a Hive table named `T` has already been defined:
```
set schema hive.hive;
CQD HIVE_MAX_STRING_LENGTH '20'; -- creates a more readable display
select * from t; -- implicit table name
set schema trafodion.seabase;
select * from hive.hive.t; -- explicit table name
```
<<<
[[type_mapping_from_hive_to_trafodion_sql]]
=== Type Mapping From Hive to {project-name} SQL
{project-name} performs the following data-type mappings:
[cols="2*",options="header"]
|===
| Hive Type | {project-name} SQL Type
| `tinyint` | `tinyint`
| `tinyint` | `smallint`
| `smallint` | `smallint`
| `int` | `int`
| `bigint` | `largeint`
| `string` | `varchar(_n_ bytes) character set utf8`^1^
| `varchar` | `varchar`
| `char` | `char`
| `float` | `real`
| `decimal (precision, scale)` | `decimal (precision, scale)`^2^ +
`numeric (precision, scale)`^3^
| `double` | `float(54)`
| `timestamp` | `timestamp(6)`^4^
| `date` | `date`
|===
1. The value `_n_` is determined by `CQD HIVE_MAX_STRING_LENGTH`. See the
{docs-url}/cqd_reference/index.hmtl[{project-name} Control Query Default (CQD) Reference Guide].
2. If p is less than or equal to 18, decimal (precision, scale) is mapped to decimal (precision, scale).
3. If p is greater than 18, decimal (precision, scale) is mapped to numeric (precision, scale).
4. Hive supports timestamps with nanosecond resolution (precision of 9). {project-name} SQL supports only microsecond resolution (precision 6).
[[supported_sql_statements_with_hive_tables]]
=== Supported SQL Statements With Hive Tables
You can use these SQL statements with Hive tables:
* <<select_statement,SELECT Statement>>
* <<load_statement,LOAD Statement>>
* GET TABLES (See the <<get_statement,GET Statement>>.)
* <<invoke_statement,INVOKE Statement>>
<<<
[[data_consistency_and_access_options]]
== Data Consistency and Access Options
Access options for DML statements affect the consistency of the data that your query accesses.
For any DML statement, you specify access options by using the `FOR _option_ ACCESS` clause and,
for a `SELECT` statement, by using this same clause, you can also specify access options for individual
tables and views referenced in the FROM clause.
The possible settings for `_option_` in a DML statement are:
* <<read_committed,READ COMMITTED>>
Specifies that the data accessed by the DML statement must be from committed rows.
The SQL default access option for DML statements is `READ COMMITTED`.
For related information about transactions, see
<<transaction_isolation_levels,Transaction Isolation Levels>>.
[[read_committed]]
=== READ COMMITTED
This option allows you to access only committed data.
The implementation requires that a lock can be acquired on the data requested by the DML statement—but
does not actually lock the data, thereby reducing lock request conflicts. If a lock cannot be granted
(implying that the row contains uncommitted data), the DML statement request waits until the lock in
place is released.
READ COMMITTED provides the next higher level of data consistency (compared to READ UNCOMMITTED).
A statement executing with this access option does not allow dirty reads, but both non-repeatable reads
and phantoms are possible.
READ COMMITTED provides sufficient consistency for any process that does not require a repeatable read
capability.
READ COMMITTED is the default isolation level.
<<<
[[transaction_management]]
== Transaction Management
A transaction (a set of database changes that must be completed as a group) is the basic recoverable unit
in case of a failure or transaction interruption. Transactions are controlled through client tools that
interact with the database using ODBC or JDBC.
The typical order of events is:
1. Transaction is started.
2. Database changes are made.
3. Transaction is committed.
If, however, the changes cannot be made or if you do not want to complete the transaction, then you can abort
the transaction so that the database is rolled back to its original state.
This subsection discusses these considerations for transaction management:
* <<user_defined_and_system_defined_transactions,User-Defined and System-Defined Transactions>>
* <<rules_for_dml_statements,Rules for DML Statements>>
* <<effect_of_autocommit_option,Effect of AUTOCOMMIT Option>>
* <<concurrency,Concurrency>>
* <<transaction_isolation_levels,Transaction Isolation Levels>>
[[user_defined_and_system_defined_transactions]]
=== User-Defined and System-Defined Transactions
Transactions you define are called _user-defined transactions_. To be sure that a sequence of statements executes
successfully or not at all, you can define one transaction consisting of these statements by using the BEGIN WORK
statement and COMMIT WORK statement. You can abort a transaction by using the ROLLBACK WORK statement.
If AUTOCOMMIT is on, then you do not have to end the transaction explicitly as {project-name} SQL will end the transaction
automatically. Sometimes an error occurs that requires the user-defined transaction to be aborted. {project-name} SQL
will automatically abort the transaction and return an error indicating that the transaction was rolled back.
<<<
[[system_defined_transactions]]
==== System-Defined Transactions
In some cases, {project-name} SQL defines transactions for you. These transactions are called _system-defined transactions_.
Most DML statements initiate transactions implicitly at the start of execution.
See <<implicit_transactions,Implicit Transactions>>.
However, even if a transaction is initiated implicitly, you must end a transaction explicitly with the COMMIT WORK
statement or the ROLLBACK WORK statement. If AUTOCOMMIT is on, you do not need to end a transaction explicitly.
[[rules_for_dml_statements]]
=== Rules for DML Statements
If deadlock occurs, the DML statement times out and receives an error.
[[effect_of_autocommit_option]]
=== Effect of AUTOCOMMIT Option
AUTOCOMMIT is an option that can be set in a SET TRANSACTION statement. It specifies whether {project-name} SQL will commit
automatically, or roll back if an error occurs, at the end of statement execution. This option applies to any statement
for which the system initiates a transaction. See <<set_transaction_statement,SET TRANSACTION Statement>>.
If this option is set to ON, {project-name} SQL automatically commits any changes, or rolls back any changes, made to the
database at the end of statement execution.
[[concurrency]]
=== Concurrency
Concurrency is defined by two or more processes accessing the same data at the same time. The degree of concurrency
available &#8212; whether a process that requests access to data that is already being accessed is given access or placed
in a wait queue &#8212; depends on the purpose of the access mode (read or update) and the isolation level. Currently, the only
isolation level is READ COMMITTED.
{project-name} SQL provides concurrent database access for most operations and controls database access through concurrency
control and the mechanism for opening and closing tables. For DML operations, the access option affects the degree of
concurrency. See <<data_consistency_and_access_options,Data Consistency and Access Options>>.
<<<
[[transaction_isolation_levels]]
=== Transaction Isolation Levels
A transaction has an isolation level that is <<read_committed,READ COMMITTED>>.
[[read_committed]]
==== READ COMMITTED
This option, which is ANSI compliant, allows your transaction to access only committed data. No row locks are acquired
when READ COMMITTED is the specified isolation level.
READ COMMITTED provides the next level of data consistency. A transaction executing with this isolation level does not
allow dirty reads, but both non-repeatable reads and phantoms are possible.
READ COMMITTED provides sufficient consistency for any transaction that does not require a repeatable-read capability.
The default isolation level is READ COMMITTED.
<<<
[[ansi_compliance_and_trafodion_sql_extensions]]
== ANSI Compliance and {project-name} SQL Extensions
{project-name} SQL complies most closely with Core SQL 99. {project-name} SQL also includes some features from SQL 99 and part of
the SQL 2003 standard, and special {project-name} SQL extensions to the SQL language.
Statements and SQL elements in this manual are ANSI compliant unless specified as {project-name} SQL extensions.
[[ansi_compliant_statements]]
=== ANSI-Compliant Statements
These statements are ANSI compliant, but some might contain {project-name} SQL extensions:
|===
| <<alter_table_statement,ALTER TABLE Statement>> | <<call_statement,CALL Statement>>
| <<commit_work_statement,COMMIT WORK Statement>> | <<create_function_statement,CREATE FUNCTION Statement>>
| <<create_procedure_statement,CREATE PROCEDURE Statement>> | <<create_role_statement,CREATE ROLE Statement>>
| <<create_schema_statement,CREATE SCHEMA Statement>> | <<create_table_statement,CREATE TABLE Statement>>
| <<create_view_statement,CREATE VIEW Statement>> | <<delete_statement,DELETE Statement>>
| <<drop_function_statement,DROP FUNCTION Statement>> | <<drop_procedure_statement,DROP PROCEDURE Statement>>
| <<drop_role_statement,DROP ROLE Statement>> | <<drop_schema_statement,DROP SCHEMA Statement>>
| <<drop_table_statement,DROP TABLE Statement>> | <<drop_view_statement,DROP VIEW Statement>>
| <<execute_statement,EXECUTE Statement>> | <<grant_statement,GRANT Statement>>
| <<grant_role_statement,GRANT ROLE Statement>> | <<insert_statement,INSERT Statement>>
| <<merge_statement,MERGE Statement>> | <<prepare_statement,PREPARE Statement>>
| <<revoke_statement,REVOKE Statement>> | <<revoke_role_statement,REVOKE ROLE Statement>>
| <<rollback_work_statement,ROLLBACK WORK Statement>> | <<select_statement,SELECT Statement>>
| <<set_schema_statement,SET SCHEMA Statement>> | <<set_transaction_statement,SET TRANSACTION Statement>>
| <<table_statement,TABLE Statement>> | <<update_statement,UPDATE Statement>>
| <<values_statement,VALUES Statement>>
|===
<<<
[[statements_that_are_trafodion_sql_extensions]]
=== Statements That Are {project-name} SQL Extensions
These statements are {project-name} SQL extensions to the ANSI standard.
|===
| <<alter_library_statement,ALTER LIBRARY Statement>> | <<alter_user_statement,ALTER USER Statement>>
| <<begin_work_statement,BEGIN WORK Statement>> | <<control_query_cancel_statement,CONTROL QUERY CANCEL Statement>>
| <<control_query_default_statement,CONTROL QUERY DEFAULT Statement>> | <<create_index_statement,CREATE INDEX Statement>>
| <<create_library_statement,CREATE LIBRARY Statement>> | <<drop_index_statement,DROP INDEX Statement>>
| <<drop_library_statement,DROP LIBRARY Statement>> | <<explain_statement,EXPLAIN Statement>>
| <<get_statement,GET Statement>> | <<get_hbase_objects_statement,GET HBASE OBJECTS Statement>>
| <<get_version_of_metadata_statement,GET VERSION OF METADATA Statement>> | <<get_version_of_software_statement,GET VERSION OF SOFTWARE Statement>>
| <<grant_component_privilege_statement,GRANT COMPONENT PRIVILEGE Statement>> | <<invoke_statement,INVOKE Statement>>
| <<load_statement,LOAD Statement>> | <<register_user_statement,REGISTER USER Statement>>
| <<revoke_component_privilege_statement,REVOKE COMPONENT PRIVILEGE Statement>> | <<showcontrol_statement,SHOWCONTROL Statement>>
| <<showddl_statement,SHOWDDL Statement>> | <<showddl_schema_statement,SHOWDDL SCHEMA Statement>>
| <<showstats_statement,SHOWSTATS Statement>> | <<unload_statement,UNLOAD Statement>>
| <<unregister_user_statement,UNREGISTER USER Statement>> | <<update_statistics_statement,UPDATE STATISTICS Statement>>
| <<upsert_statement,UPSERT Statement>>
|===
<<<
[[ansi_compliant_functions]]
=== ANSI-Compliant Functions
These functions are ANSI compliant, but some might contain {project-name} SQL extensions:
|===
| <<avg,AVG function>> | <<case, CASE expression>>
| <<cast,CAST expression>> | <<char_length,CHAR_LENGTH>>
| <<coalesce,COALESCE>> | <<count,COUNT Function>>
| <<current,CURRENT>> | <<current_date,CURRENT_DATE>>
| <<current_time,CURRENT_TIME>> | <<current_timestamp,CURRENT_TIMESTAMP>>
| <<current_user,CURRENT_USER>> | <<extract,EXTRACT>>
| <<lower,LOWER>> | <<max,MAX>>
| <<min,MIN>> | <<nullif,NULLIF>>
| <<octet_length,OCTET_LENGTH>> | <<position,POSITION>>
| <<session_user,SESSION_USER>> | <<substring,SUBSTRING>>
| <<sum,SUM>> | <<trim,TRIM>>
| <<upper,UPPER>>
|===
All other functions are {project-name} SQL extensions.
== {project-name} SQL Error Messages
{project-name} SQL reports error messages and exception conditions. When an error condition occurs,
{project-name} SQL returns a message number and a brief description of the condition.
*Example*
{project-name} SQL might display this error message:
```
*** ERROR[1000] A syntax error occurred.
```
The message number is the SQLCODE value (without the sign). In this example, the SQLCODE value is `1000`.