docs/src/site/sphinx/Connectors.rst - sqoop - Git at Google

 .. Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.


 ==================
 Sqoop 2 Connectors
 ==================

 This document describes how to use the built-in connectors. This includes a detailed description of how connectors partition, format their output, extract data, and load data.

 .. contents::
    :depth: 3

 ++++++++++++++++++++++
 Generic JDBC Connector
 ++++++++++++++++++++++

 The Generic JDBC Connector can connect to any data source that adheres to the **JDBC 4** specification.

 -----
 Usage
 -----

 To use the Generic JDBC Connector, create a link for the connector and a job that uses the link.

 **Link Configuration**
 ++++++++++++++++++++++

 Inputs associated with the link configuration include:

 +-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
 | Input                       | Type    | Description                                                           | Example                                  |
 +=============================+=========+=======================================================================+==========================================+
 | JDBC Driver Class           | String  | The full class name of the JDBC driver.                               | com.mysql.jdbc.Driver                    |
 |                             |         | *Required* and accessible by the Sqoop server.                        |                                          |
 +-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
 | JDBC Connection String      | String  | The JDBC connection string to use when connecting to the data source. | jdbc:mysql://localhost/test              |
 |                             |         | *Required*. Connectivity upon creation is optional.                   |                                          |
 +-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
 | Username                    | String  | The username to provide when connecting to the data source.           | sqoop                                    |
 |                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
 +-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
 | Password                    | String  | The password to provide when connecting to the data source.           | sqoop                                    |
 |                             |         | *Optional*. Connectivity upon creation is optional.                   |                                          |
 +-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
 | JDBC Connection Properties  | Map     | A map of JDBC connection properties to pass to the JDBC driver        | profileSQL=true&useFastDateParsing=false |
 |                             |         | *Optional*.                                                           |                                          |
 +-----------------------------+---------------------------------------------------------------------------------+------------------------------------------+

 **FROM Job Configuration**
 ++++++++++++++++++++++++++

 Inputs associated with the Job configuration for the FROM direction include:

 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Input                       | Type    | Description                                                             | Example                                     |
 +=============================+=========+=========================================================================+=============================================+
 | Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                       |
 |                             |         | *Optional*                                                              |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Table name                  | String  | The table name to import data from.                                     | test                                        |
 |                             |         | *Optional*. See note below.                                             |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``SELECT COUNT(*) FROM test ${CONDITIONS}`` |
 |                             |         | *Optional*. See notes below.                                            |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Table column names          | String  | Columns to extract from the JDBC data source.                           | col1,col2                                   |
 |                             |         | *Optional* Comma separated list of columns.                             |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Partition column name       | Map     | The column name used to partition the data transfer process.            | col1                                        |
 |                             |         | *Optional*.  Defaults to primary key of table.                          |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Null value allowed for      | Boolean | True or false depending on whether NULL values are allowed in data      | true                                        |
 | the partition column        |         | of the Partition column. *Optional*.                                    |                                             |
 +-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
 | Boundary query              | String  | The query used to define an upper and lower boundary when partitioning. |                                             |
 |                             |         | *Optional*.                                                             |                                             |
 +-----------------------------+-----------------------------------------------------------------------------------+---------------------------------------------+

 **Notes**
 =========

 1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
 2. *Table column names* should be provided only if *Table name* is provided.
 3. If there are columns with similar names, column aliases are required. For example: ``SELECT table1.id as "i", table2.id as "j" FROM table1 INNER JOIN table2 ON table1.id = table2.id``.

 **TO Job Configuration**
 ++++++++++++++++++++++++

 Inputs associated with the Job configuration for the TO direction include:

 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Input                       | Type    | Description                                                             | Example                                         |
 +=============================+=========+=========================================================================+=================================================+
 | Schema name                 | String  | The schema name the table is part of.                                   | sqoop                                           |
 |                             |         | *Optional*                                                              |                                                 |
 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Table name                  | String  | The table name to import data from.                                     | test                                            |
 |                             |         | *Optional*. See note below.                                             |                                                 |
 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Table SQL statement         | String  | The SQL statement used to perform a **free form query**.                | ``INSERT INTO test (col1, col2) VALUES (?, ?)`` |
 |                             |         | *Optional*. See note below.                                             |                                                 |
 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Table column names          | String  | Columns to insert into the JDBC data source.                            | col1,col2                                       |
 |                             |         | *Optional* Comma separated list of columns.                             |                                                 |
 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Stage table name            | String  | The name of the table used as a *staging table*.                        | staging                                         |
 |                             |         | *Optional*.                                                             |                                                 |
 +-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
 | Should clear stage table    | Boolean | True or false depending on whether the staging table should be cleared  | true                                            |
 |                             |         | after the data transfer has finished. *Optional*.                       |                                                 |
 +-----------------------------+-----------------------------------------------------------------------------------+-------------------------------------------------+

 **Notes**
 =========

 1. *Table name* and *Table SQL statement* are mutually exclusive. If *Table name* is provided, the *Table SQL statement* should not be provided. If *Table SQL statement* is provided then *Table name* should not be provided.
 2. *Table column names* should be provided only if *Table name* is provided.

 -----------
 Partitioner
 -----------

 The Generic JDBC Connector partitioner generates conditions to be used by the extractor.
 It varies in how it partitions data transfer based on the partition column data type.
 Though, each strategy roughly takes on the following form:
 ::

   (upper boundary - lower boundary) / (max partitions)

 By default, the *primary key* will be used to partition the data unless otherwise specified.

 The following data types are currently supported:

 1. TINYINT
 2. SMALLINT
 3. INTEGER
 4. BIGINT
 5. REAL
 6. FLOAT
 7. DOUBLE
 8. NUMERIC
 9. DECIMAL
 10. BIT
 11. BOOLEAN
 12. DATE
 13. TIME
 14. TIMESTAMP
 15. CHAR
 16. VARCHAR
 17. LONGVARCHAR

 ---------
 Extractor
 ---------

 During the *extraction* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.

 - If *Table name* is provided, then the SQL statement generated will take on the form ``SELECT * FROM <table name>``.
 - If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``SELECT <columns> FROM <table name>``.
 - If *Table SQL statement* is provided, then the provided SQL statement will be used.

 The conditions generated by the *partitioner* are appended to the end of the SQL query to query a section of data.

 The Generic JDBC connector extracts CSV data usable by the *CSV Intermediate Data Format*.

 ------
 Loader
 ------

 During the *loading* phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.

 - If *Table name* is provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (col1, col2, ...) VALUES (?,?,..)``.
 - If *Table name* and *Columns* are provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (<columns>) VALUES (?,?,..)``.
 - If *Table SQL statement* is provided, then the provided SQL statement will be used.

 This connector expects to receive CSV data consumable by the *CSV Intermediate Data Format*.

 ----------
 Destroyers
 ----------

 The Generic JDBC Connector performs two operations in the destroyer in the TO direction:

 1. Copy the contents of the staging table to the desired table.
 2. Clear the staging table.

 No operations are performed in the FROM direction.
	.. Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.


	==================
	Sqoop 2 Connectors
	==================

	This document describes how to use the built-in connectors. This includes a detailed description of how connectors partition, format their output, extract data, and load data.

	.. contents::
	:depth: 3

	++++++++++++++++++++++
	Generic JDBC Connector
	++++++++++++++++++++++

	The Generic JDBC Connector can connect to any data source that adheres to the JDBC 4 specification.

	-----
	Usage
	-----

	To use the Generic JDBC Connector, create a link for the connector and a job that uses the link.

	Link Configuration
	++++++++++++++++++++++

	Inputs associated with the link configuration include:

	+-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
	\| Input \| Type \| Description \| Example \|
	+=============================+=========+=======================================================================+==========================================+
	\| JDBC Driver Class \| String \| The full class name of the JDBC driver. \| com.mysql.jdbc.Driver \|
	\| \| \| Required and accessible by the Sqoop server. \| \|
	+-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
	\| JDBC Connection String \| String \| The JDBC connection string to use when connecting to the data source. \| jdbc:mysql://localhost/test \|
	\| \| \| Required. Connectivity upon creation is optional. \| \|
	+-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
	\| Username \| String \| The username to provide when connecting to the data source. \| sqoop \|
	\| \| \| Optional. Connectivity upon creation is optional. \| \|
	+-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
	\| Password \| String \| The password to provide when connecting to the data source. \| sqoop \|
	\| \| \| Optional. Connectivity upon creation is optional. \| \|
	+-----------------------------+---------+-----------------------------------------------------------------------+------------------------------------------+
	\| JDBC Connection Properties \| Map \| A map of JDBC connection properties to pass to the JDBC driver \| profileSQL=true&useFastDateParsing=false \|
	\| \| \| Optional. \| \|
	+-----------------------------+---------------------------------------------------------------------------------+------------------------------------------+

	FROM Job Configuration
	++++++++++++++++++++++++++

	Inputs associated with the Job configuration for the FROM direction include:

	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Input \| Type \| Description \| Example \|
	+=============================+=========+=========================================================================+=============================================+
	\| Schema name \| String \| The schema name the table is part of. \| sqoop \|
	\| \| \| Optional \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Table name \| String \| The table name to import data from. \| test \|
	\| \| \| Optional. See note below. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Table SQL statement \| String \| The SQL statement used to perform a free form query. \| ``SELECT COUNT(*) FROM test ${CONDITIONS}`` \|
	\| \| \| Optional. See notes below. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Table column names \| String \| Columns to extract from the JDBC data source. \| col1,col2 \|
	\| \| \| Optional Comma separated list of columns. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Partition column name \| Map \| The column name used to partition the data transfer process. \| col1 \|
	\| \| \| Optional. Defaults to primary key of table. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Null value allowed for \| Boolean \| True or false depending on whether NULL values are allowed in data \| true \|
	\| the partition column \| \| of the Partition column. Optional. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+---------------------------------------------+
	\| Boundary query \| String \| The query used to define an upper and lower boundary when partitioning. \| \|
	\| \| \| Optional. \| \|
	+-----------------------------+-----------------------------------------------------------------------------------+---------------------------------------------+

	Notes
	=========

	1. Table name and Table SQL statement are mutually exclusive. If Table name is provided, the Table SQL statement should not be provided. If Table SQL statement is provided then Table name should not be provided.
	2. Table column names should be provided only if Table name is provided.
	3. If there are columns with similar names, column aliases are required. For example: ``SELECT table1.id as "i", table2.id as "j" FROM table1 INNER JOIN table2 ON table1.id = table2.id``.

	TO Job Configuration
	++++++++++++++++++++++++

	Inputs associated with the Job configuration for the TO direction include:

	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Input \| Type \| Description \| Example \|
	+=============================+=========+=========================================================================+=================================================+
	\| Schema name \| String \| The schema name the table is part of. \| sqoop \|
	\| \| \| Optional \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Table name \| String \| The table name to import data from. \| test \|
	\| \| \| Optional. See note below. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Table SQL statement \| String \| The SQL statement used to perform a free form query. \| ``INSERT INTO test (col1, col2) VALUES (?, ?)`` \|
	\| \| \| Optional. See note below. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Table column names \| String \| Columns to insert into the JDBC data source. \| col1,col2 \|
	\| \| \| Optional Comma separated list of columns. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Stage table name \| String \| The name of the table used as a staging table. \| staging \|
	\| \| \| Optional. \| \|
	+-----------------------------+---------+-------------------------------------------------------------------------+-------------------------------------------------+
	\| Should clear stage table \| Boolean \| True or false depending on whether the staging table should be cleared \| true \|
	\| \| \| after the data transfer has finished. Optional. \| \|
	+-----------------------------+-----------------------------------------------------------------------------------+-------------------------------------------------+

	Notes
	=========

	1. Table name and Table SQL statement are mutually exclusive. If Table name is provided, the Table SQL statement should not be provided. If Table SQL statement is provided then Table name should not be provided.
	2. Table column names should be provided only if Table name is provided.

	-----------
	Partitioner
	-----------

	The Generic JDBC Connector partitioner generates conditions to be used by the extractor.
	It varies in how it partitions data transfer based on the partition column data type.
	Though, each strategy roughly takes on the following form:
	::

	(upper boundary - lower boundary) / (max partitions)

	By default, the primary key will be used to partition the data unless otherwise specified.

	The following data types are currently supported:

	1. TINYINT
	2. SMALLINT
	3. INTEGER
	4. BIGINT
	5. REAL
	6. FLOAT
	7. DOUBLE
	8. NUMERIC
	9. DECIMAL
	10. BIT
	11. BOOLEAN
	12. DATE
	13. TIME
	14. TIMESTAMP
	15. CHAR
	16. VARCHAR
	17. LONGVARCHAR

	---------
	Extractor
	---------

	During the extraction phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.

	- If Table name is provided, then the SQL statement generated will take on the form ``SELECT * FROM <table name>``.
	- If Table name and Columns are provided, then the SQL statement generated will take on the form ``SELECT <columns> FROM <table name>``.
	- If Table SQL statement is provided, then the provided SQL statement will be used.

	The conditions generated by the partitioner are appended to the end of the SQL query to query a section of data.

	The Generic JDBC connector extracts CSV data usable by the CSV Intermediate Data Format.

	------
	Loader
	------

	During the loading phase, the JDBC data source is queried using SQL. This SQL will vary based on your configuration.

	- If Table name is provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (col1, col2, ...) VALUES (?,?,..)``.
	- If Table name and Columns are provided, then the SQL statement generated will take on the form ``INSERT INTO <table name> (<columns>) VALUES (?,?,..)``.
	- If Table SQL statement is provided, then the provided SQL statement will be used.

	This connector expects to receive CSV data consumable by the CSV Intermediate Data Format.

	----------
	Destroyers
	----------

	The Generic JDBC Connector performs two operations in the destroyer in the TO direction:

	1. Copy the contents of the staging table to the desired table.
	2. Clear the staging table.

	No operations are performed in the FROM direction.