////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
[[connectors]]
Notes for specific connectors
-----------------------------
MySQL JDBC Connector
~~~~~~~~~~~~~~~~~~~~
This section contains information specific to the MySQL JDBC Connector.
Upsert functionality
^^^^^^^^^^^^^^^^^^^^
The MySQL JDBC Connector supports upsert functionality using the argument
+\--update-mode allowinsert+. To achieve this, Sqoop uses the MySQL clause INSERT INTO
... ON DUPLICATE KEY UPDATE. This clause does not allow the user to specify which
columns should be used to decide whether to update an existing row or add a new row.
Instead, the clause relies on the table's unique keys (the primary key belongs to this
set). MySQL will try to insert the new row and, if the insertion fails with a duplicate
unique key error, it will update the appropriate row instead. As a result, Sqoop
ignores the values specified in the +\--update-key+ parameter; however, the user needs
to specify at least one valid column to turn on update mode itself.
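For illustration, a minimal sketch of an upsert export follows; the connection string,
export directory and the +id+ column are placeholders:
----
$ sqoop export --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --export-dir /user/someuser/employees \
    --update-key id --update-mode allowinsert
----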
MySQL Direct Connector
~~~~~~~~~~~~~~~~~~~~~~
The MySQL Direct Connector allows faster import and export to/from MySQL by using the
+mysqldump+ and +mysqlimport+ tools instead of SQL SELECT and INSERT statements.
To use the MySQL Direct Connector, specify the +\--direct+ argument for your import or export job.
Example:
----
$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
--direct
----
Passing additional parameters to mysqldump:
----
$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
--direct -- --default-character-set=latin1
----
Requirements
^^^^^^^^^^^^
The +mysqldump+ and +mysqlimport+ utilities should be present in the shell path of the
user running the Sqoop command on all nodes. To validate, SSH to each node as this user
and execute these commands. If you get an error, so will Sqoop.
Limitations
^^^^^^^^^^^^
* Currently the direct connector does not support import of large object columns (BLOB and CLOB).
* Importing to HBase and Accumulo is not supported
* Use of a staging table when exporting data is not supported
* Import of views is not supported
Direct-mode Transactions
^^^^^^^^^^^^^^^^^^^^^^^^
For performance, each writer will commit the current transaction
approximately every 32 MB of exported data. You can control this
by specifying the following argument _before_ any tool-specific arguments: +-D
sqoop.mysql.export.checkpoint.bytes=size+, where _size_ is a value in
bytes. Set _size_ to 0 to disable intermediate checkpoints,
but individual files being exported will continue to be committed
independently of one another.
Sometimes you need to export large data with Sqoop to a live MySQL cluster that
is under a high load serving random queries from the users of your application.
While data consistency issues during the export can be easily solved with a
staging table, there is still a problem with the performance impact caused by
the heavy export.
First off, the resources of MySQL dedicated to the import process can affect
the performance of the live product, both on the master and on the slaves.
Second, even if the servers can handle the import with no significant
performance impact (mysqlimport should be relatively "cheap"), importing big
tables can cause serious replication lag in the cluster risking data
inconsistency.
With +-D sqoop.mysql.export.sleep.ms=time+, where _time_ is a value in
milliseconds, you can let the server relax between checkpoints and the replicas
catch up by pausing the export process after transferring the number of bytes
specified in +sqoop.mysql.export.checkpoint.bytes+. Experiment with different
settings of these two parameters to achieve an export pace that doesn't
endanger the stability of your MySQL cluster.
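As an illustrative sketch, both settings can be combined on one export; the byte and
millisecond values, connection string and export directory below are placeholders to
experiment with:
----
$ sqoop export -D sqoop.mysql.export.checkpoint.bytes=16777216 \
    -D sqoop.mysql.export.sleep.ms=500 \
    --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --export-dir /user/someuser/employees --direct
----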
IMPORTANT: Note that any arguments to Sqoop that are of the form +-D
parameter=value+ are Hadoop _generic arguments_ and must appear before
any tool-specific arguments (for example, +\--connect+, +\--table+, etc).
Don't forget that these parameters are only supported with the +\--direct+
flag set.
Microsoft SQL Connector
~~~~~~~~~~~~~~~~~~~~~~~
Extra arguments
^^^^^^^^^^^^^^^
The list of all extra arguments supported by the Microsoft SQL Connector is shown below:
.Supported Microsoft SQL Connector extra arguments:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--identity-insert+                      Set IDENTITY_INSERT to ON before \
export insert.
+\--resilient+ Attempt to recover failed \
export operations.
+\--schema <name>+                        Schema name that Sqoop should use. \
Default is "dbo".
+\--table-hints <hints>+ Table hints that Sqoop should use for \
data movement.
---------------------------------------------------------------------------------
Allow identity inserts
^^^^^^^^^^^^^^^^^^^^^^
You can allow inserts on columns that have identity. For example:
----
$ sqoop export ... --export-dir custom_dir --table custom_table -- --identity-insert
----
Resilient operations
^^^^^^^^^^^^^^^^^^^^
You can override the default and use resilient operations during export or import.
This will retry failed operations; for example, if the connection gets dropped by
SQL Server, the mapper will try to reconnect and continue from where it was before.
In case of export, the +\--resilient+ option will ensure that Sqoop will try to recover
from connection resets.
In case of import, however, one has to use both the +\--resilient+ option and specify
the +\--split-by+ column to trigger the retry mechanism. An important requirement is
that the data must be unique and ordered ascending by the split-by column, otherwise
records could be either lost or duplicated.
Example commands using resilient operations:
----
$ sqoop export ... --export-dir custom_dir --table custom_table -- --resilient
----
Importing from a table:
----
$ sqoop import ... --table custom_table --split-by id -- --resilient
----
Importing via a query:
----
$ sqoop import ... --query "SELECT ... WHERE $CONDITIONS" --split-by ordered_column -- --resilient
----
Schema support
^^^^^^^^^^^^^^
If you need to work with tables that are located in non-default schemas, you can
specify schema names via the +\--schema+ argument. Custom schemas are supported for
both import and export jobs. For example:
----
$ sqoop import ... --table custom_table -- --schema custom_schema
----
Table hints
^^^^^^^^^^^
Sqoop supports table hints in both import and export jobs. Table hints are used only
for queries that move data from/to Microsoft SQL Server, but they cannot be used for
metadata queries. You can specify a comma-separated list of table hints in the
+\--table-hints+ argument. For example:
----
$ sqoop import ... --table custom_table -- --table-hints NOLOCK
----
PostgreSQL Connector
~~~~~~~~~~~~~~~~~~~~~
Extra arguments
^^^^^^^^^^^^^^^
The list of all extra arguments supported by the PostgreSQL Connector is shown below:
.Supported PostgreSQL extra arguments:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--schema <name>+                        Schema name that Sqoop should use. \
Default is "public".
---------------------------------------------------------------------------------
Schema support
^^^^^^^^^^^^^^
If you need to work with a table that is located in a schema other than the default
one, you need to specify the extra argument +\--schema+. Custom schemas are supported
for both import and export jobs (the optional staging table, however, must be present
in the same schema as the target table). Example invocation:
----
$ sqoop import ... --table custom_table -- --schema custom_schema
----
PostgreSQL Direct Connector
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The PostgreSQL Direct Connector allows faster import and export to/from PostgreSQL by using the "COPY" command.
To use the PostgreSQL Direct Connector, specify the +\--direct+ argument for your import or export job.
When importing from PostgreSQL in conjunction with direct mode, you
can split the import into separate files after
individual files reach a certain size. This size limit is controlled
with the +\--direct-split-size+ argument.
The direct connector also offers additional extra arguments:
.Additional supported PostgreSQL extra arguments in direct mode:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--boolean-true-string <str>+ String that will be used to encode \
+true+ value of +boolean+ columns.
Default is "TRUE".
+\--boolean-false-string <str>+ String that will be used to encode \
+false+ value of +boolean+ columns.
Default is "FALSE".
---------------------------------------------------------------------------------
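As an illustrative sketch (the connection string, table, split size and the chosen
boolean encodings are placeholders), a direct-mode import using these options could
look like:
----
$ sqoop import --connect jdbc:postgresql://postgresql.example.com/testdb \
    --table some_table --direct --direct-split-size 536870912 \
    -- --boolean-true-string 1 --boolean-false-string 0
----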
Requirements
^^^^^^^^^^^^
The +psql+ utility should be present in the shell path of the user running the Sqoop
command on all nodes. To validate, SSH to each node as this user and execute the
command. If you get an error, so will Sqoop.
Limitations
^^^^^^^^^^^^
* Currently the direct connector does not support import of large object columns (BLOB and CLOB).
* Importing to HBase and Accumulo is not supported
* Import of views is not supported
pg_bulkload connector
~~~~~~~~~~~~~~~~~~~~~
Purpose
^^^^^^^
The pg_bulkload connector is a direct connector for exporting data into PostgreSQL.
This connector uses
http://pgbulkload.projects.postgresql.org/index.html[pg_bulkload].
Users benefit from the functionality of pg_bulkload such as
fast exports that bypass shared buffers and WAL,
flexible error record handling,
and ETL features with filter functions.
Requirements
^^^^^^^^^^^^
The pg_bulkload connector requires the following conditions for export job execution:
* The link:http://pgbulkload.projects.postgresql.org/index.html[pg_bulkload]
must be installed on DB server and all slave nodes.
An RPM for RedHat or CentOS is available on the
link:http://pgfoundry.org/frs/?group_id=1000261[download page].
* The link:http://jdbc.postgresql.org/index.html[PostgreSQL JDBC]
driver is required on the client node.
* The superuser role of the PostgreSQL database is required for execution of pg_bulkload.
Syntax
^^^^^^
Use +--connection-manager+ option to specify connection manager classname.
----
$ sqoop export (generic-args) --connection-manager org.apache.sqoop.manager.PGBulkloadManager (export-args)
$ sqoop-export (generic-args) --connection-manager org.apache.sqoop.manager.PGBulkloadManager (export-args)
----
This connector supports the export arguments shown below.
.Supported export control arguments:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--export-dir <dir>+ HDFS source path for the export
+-m,\--num-mappers <n>+ Use 'n' map tasks to export in\
parallel
+\--table <table-name>+ Table to populate
+\--input-null-string <null-string>+ The string to be interpreted as\
null for string columns
---------------------------------------------------------------------------------
There is additional configuration for pg_bulkload execution,
specified via Hadoop Configuration properties,
which can be given with the +-D <property=value>+ option.
Because Hadoop Configuration properties are generic arguments of Sqoop,
they must precede any export control arguments.
.Supported export control properties:
[grid="all"]
`------------------------------`----------------------------------------------
Property Description
-----------------------------------------------------------------------------
mapred.reduce.tasks Number of reduce tasks for staging. \
                                The default value is 1. \
                                Each task does staging in a single transaction.
pgbulkload.bin                  Path of the pg_bulkload binary \
                                installed on each slave node.
pgbulkload.check.constraints Specify whether CHECK constraints are checked \
during the loading. \
The default value is YES.
pgbulkload.parse.errors         The maximum number of ignored records \
that cause errors during parsing, \
encoding, filtering, constraints checking, \
and data type conversion. \
Error records are recorded \
in the PARSE BADFILE. \
The default value is INFINITE.
pgbulkload.duplicate.errors     Number of ignored records \
that violate unique constraints. \
Duplicated records are recorded in the \
DUPLICATE BADFILE on DB server. \
The default value is INFINITE.
pgbulkload.filter Specify the filter function \
to convert each row in the input file. \
See the pg_bulkload documentation to know \
how to write FILTER functions.
pgbulkload.clear.staging.table Indicates that any data present in\
the staging table can be dropped.
------------------------------------------------------------------------------
Here is an example of a complete command line:
----
$ sqoop export \
    -Dmapred.reduce.tasks=2 \
-Dpgbulkload.bin="/usr/local/bin/pg_bulkload" \
-Dpgbulkload.input.field.delim=$'\t' \
-Dpgbulkload.check.constraints="YES" \
-Dpgbulkload.parse.errors="INFINITE" \
-Dpgbulkload.duplicate.errors="INFINITE" \
--connect jdbc:postgresql://pgsql.example.net:5432/sqooptest \
--connection-manager org.apache.sqoop.manager.PGBulkloadManager \
--table test --username sqooptest --export-dir=/test -m 2
----
Data Staging
^^^^^^^^^^^^
Each map task of the pg_bulkload connector's export job creates
its own staging table on the fly.
The name of the staging table is derived from the destination table
and the task attempt id.
For example, the name of the staging table for the "test" table is like
+test_attempt_1345021837431_0001_m_000000_0+ .
Staging tables are automatically dropped if tasks complete successfully
or if map tasks fail.
When a reduce task fails,
the staging table for that task is left behind for manual retry and
users must take care of it.
Netezza Connector
~~~~~~~~~~~~~~~~~
Extra arguments
^^^^^^^^^^^^^^^
The list of all extra arguments supported by the Netezza Connector is shown below:
.Supported Netezza extra arguments:
[grid="all"]
`-------------------------------------`----------------------------------------
Argument Description
-------------------------------------------------------------------------------
+--partitioned-access+ Whether each mapper acts on a subset\
                                      of data slices of a table or all. \
Default is "false" for standard mode\
and "true" for direct mode.
+--max-errors+ Applicable only for direct mode export.\
This option specifies the error threshold\
per mapper while transferring data. If\
                                      the number of errors encountered exceeds\
                                      this threshold then the job will fail. \
Default value is 1.
+--log-dir+ Applicable only for direct mode export.\
Specifies the directory where Netezza\
external table operation logs are stored\
on the hadoop filesystem. Logs are\
stored under this directory with one\
directory for the job and sub-directories\
for each task number and attempt.\
Default value is the user home directory.\
                                      The nzlog and nzbad files will be under \
(logdir)/job-id/job-attempt-id.
+--trunc-string+ Applicable only for direct mode export.\
Specifies whether the system \
truncates strings to the declared\
storage and loads the data. By default\
truncation of strings is reported as an\
error.
+--ctrl-chars+ Applicable only for direct mode export.\
Specifies whether control characters \
(ASCII chars 1 - 31) can be allowed \
to be part of char/nchar/varchar/nvarchar\
columns. Default is false.
+--crin-string+ Applicable only for direct mode export.\
Specifies whether carriage return \
(ASCII char 13) can be allowed \
to be part of char/nchar/varchar/nvarchar\
columns. Note that CR can no longer \
be a record delimiter with this option.\
Default is false.
+--ignore-zero+ Applicable only for direct mode export.\
Specifies whether NUL character \
(ASCII char 0) should be scanned \
and ignored as part of the data loaded\
into char/nchar/varchar/nvarchar \
columns.\
Default is false.
--------------------------------------------------------------------------------
Direct Mode
^^^^^^^^^^^
The Netezza connector supports an optimized data transfer facility using the
Netezza external tables feature. Each map task of the Netezza connector's import
job works on a subset of the Netezza partitions and transparently creates and
uses an external table to transport data. Similarly, export jobs use an
external table to push data quickly onto the NZ system. Direct mode does
not support staging tables, upsert options, etc.
Here is an example of complete command line for import using the Netezza
external table feature.
----
$ sqoop import \
--direct \
--connect jdbc:netezza://nzhost:5480/sqoop \
--table nztable \
--username nzuser \
--password nzpass \
--target-dir hdfsdir
----
Here is an example of complete command line for export with tab as the field
terminator character.
----
$ sqoop export \
--direct \
--connect jdbc:netezza://nzhost:5480/sqoop \
--table nztable \
--username nzuser \
--password nzpass \
--export-dir hdfsdir \
--input-fields-terminated-by "\t"
----
Null string handling
^^^^^^^^^^^^^^^^^^^^
Netezza direct connector supports the null-string features of Sqoop. The null
string values are converted to appropriate external table options during export
and import operations.
.Supported export control arguments:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--input-null-string <null-string>+ The string to be interpreted as\
null for string columns.
+\--input-null-non-string <null-string>+ The string to be interpreted as\
null for non string columns.
---------------------------------------------------------------------------------
In the case of the Netezza direct mode connector, both arguments must be
left at their default values or explicitly set to the same value. Furthermore,
the null string value is restricted to 0-4 UTF-8 characters.
On export, for non-string columns, if the chosen null value is a valid
representation in the column domain, then the column might not be loaded as
null. For example, if the null string value is specified as "1", then on
export, any occurrence of "1" in the input file will be loaded as value 1
instead of NULL for int columns.
It is suggested that the null value be specified as empty string for
performance and consistency.
.Supported import control arguments:
[grid="all"]
`----------------------------------------`---------------------------------------
Argument Description
---------------------------------------------------------------------------------
+\--null-string <null-string>+ The string to be interpreted as\
null for string columns.
+\--null-non-string <null-string>+ The string to be interpreted as\
null for non string columns.
---------------------------------------------------------------------------------
In the case of the Netezza direct mode connector, both arguments must be
left at their default values or explicitly set to the same value. Furthermore,
the null string value is restricted to 0-4 UTF-8 characters.
On import, for non-string columns, the current implementation ignores the
chosen null value representation for non-character columns. For example, if
the null string value is specified as "\N", then on import, any occurrence of
NULL for non-char columns in the table will be imported as an empty string
instead of '\N', the chosen null string representation.
It is suggested that the null value be specified as empty string for
performance and consistency.
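As a sketch reusing the placeholder connection details from the earlier Netezza
examples, an import that sets both null values to the recommended empty string could
be invoked as:
----
$ sqoop import \
    --direct \
    --connect jdbc:netezza://nzhost:5480/sqoop \
    --table nztable \
    --username nzuser \
    --password nzpass \
    --null-string '' \
    --null-non-string '' \
    --target-dir hdfsdir
----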
Data Connector for Oracle and Hadoop
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
About
^^^^^
The Data Connector for Oracle and Hadoop is now included in Sqoop.
It can be enabled by specifying the +\--direct+ argument for your import or
export job.
Jobs
++++
The Data Connector for Oracle and Hadoop inspects each Sqoop job and assumes
responsibility for the ones it can perform better than the Oracle manager built
into Sqoop.
Data Connector for Oracle and Hadoop accepts responsibility for the following
Sqoop Job types:
- *Import* jobs that are *Non-Incremental*.
- *Export* jobs
- The Data Connector for Oracle and Hadoop does not accept responsibility for
other Sqoop job types. For example, it does not accept *eval* jobs.
Data Connector for Oracle and Hadoop accepts responsibility for those Sqoop Jobs
with the following attributes:
- Oracle-related
- Table-Based - Jobs where the table argument is used and the specified object
is a table.
+
NOTE: Data Connector for Oracle and Hadoop does not process index-organized
tables unless the table is partitioned and +oraoop.chunk.method+ is set
to +PARTITION+
- There are at least 2 mappers — Jobs where the Sqoop command-line does not
include: +--num-mappers 1+
How The Standard Oracle Manager Works for Imports
+++++++++++++++++++++++++++++++++++++++++++++++++
The Oracle manager built into Sqoop uses a range-based query for each mapper.
Each mapper executes a query of the form:
----
SELECT * FROM sometable WHERE id >= lo AND id < hi
----
The *lo* and *hi* values are based on the number of mappers and the minimum and
maximum values of the data in the column the table is being split by.
If no suitable index exists on the table then these queries result in full
table-scans within Oracle. Even with a suitable index, multiple mappers may
fetch data stored within the same Oracle blocks, resulting in redundant IO
calls.
How The Data Connector for Oracle and Hadoop Works for Imports
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The Data Connector for Oracle and Hadoop generates queries for the mappers of
the form:
----
SELECT *
FROM sometable
WHERE rowid >= dbms_rowid.rowid_create(1, 893, 1, 279, 0) AND
rowid <= dbms_rowid.rowid_create(1, 893, 1, 286, 32767)
----
The Data Connector for Oracle and Hadoop queries ensure that:
- No two mappers read data from the same Oracle block. This minimizes
redundant IO.
- The table does not require indexes.
- The Sqoop command line does not need to specify a +--split-by+ column.
Data Connector for Oracle and Hadoop Exports
++++++++++++++++++++++++++++++++++++++++++++
Benefits of the Data Connector for Oracle and Hadoop:
- *Merge-Export facility* - Update Oracle tables by modifying changed rows AND
inserting rows from the HDFS file that did not previously exist in the Oracle
table. The Connector for Oracle and Hadoop's Merge-Export is unique - there is
no Sqoop equivalent.
- *Lower impact on the Oracle database* - Update the rows in the Oracle table
that have changed, not all rows in the Oracle table. This has performance
benefits and reduces the impact of the query on Oracle (for example, the Oracle
redo logs).
- *Improved performance* - With partitioned tables, mappers utilize temporary
Oracle tables which allow parallel inserts and direct path writes.
Requirements
^^^^^^^^^^^^
Ensure The Oracle Database JDBC Driver Is Setup Correctly
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
You may want to ensure the Oracle Database 11g Release 2 JDBC driver is set up
correctly on your system. This driver is required for Sqoop to work with Oracle.
The Oracle Database 11g Release 2 JDBC driver file is +ojdbc6.jar+ (3.2 MB).
If this file is not on your system then download it from:
http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html
This file should be put into the +$SQOOP_HOME/lib+ directory.
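For example, a minimal sketch of copying the driver into place (the source path of the
downloaded jar is a placeholder):
----
$ cp /path/to/ojdbc6.jar $SQOOP_HOME/lib/
----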
Oracle Roles and Privileges
+++++++++++++++++++++++++++
The Oracle user for The Data Connector for Oracle and Hadoop requires the
following roles and privileges:
- +create session+
In addition, the user must have the select any dictionary privilege or
select_catalog_role role or all of the following object privileges:
- +select on v_$instance+
- +select on dba_tables+
- +select on dba_tab_columns+
- +select on dba_objects+
- +select on dba_extents+
- +select on dba_segments+ — Required for Sqoop imports only
- +select on dba_constraints+ — Required for Sqoop imports only
- +select on v_$database+ — Required for Sqoop imports only
- +select on v_$parameter+ — Required for Sqoop imports only
NOTE: The user also requires the alter session privilege to make use of session
tracing functionality. See "oraoop.oracle.session.initialization.statements"
for more information.
Additional Oracle Roles And Privileges Required for Export
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The Oracle user for Data Connector for Oracle and Hadoop requires:
- Quota on the tablespace in which the Oracle export tables are located.
+
An example Oracle command to achieve this is
+
----
alter user username quota unlimited on tablespace
----
- The following privileges:
+
[grid="all"]
`-----------------------------------------------------------------------------------------------`--------------------------------------------------------------------------
Type of Export Privileges Required
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
All Export +create table+
+select on dba_tab_partitions+
+select on dba_tab_subpartitions+
+select on dba_indexes+
+select on dba_ind_columns+
Insert-Export with a template table into another schema +select any table+
+create any table+
+insert any table+
+alter any table+ (partitioning)
Insert-Export without a template table into another schema +select,insert on table+ (no partitioning)
+select,alter on table+ (partitioning)
Update-Export into another schema +select,update on table+ (no partitioning)
+select,delete,alter,insert on table+ (partitioning)
Merge-Export into another schema +select,insert,update on table+ (no partitioning)
+select,insert,delete,alter on table+ (partitioning)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Supported Data Types
++++++++++++++++++++
The following Oracle data types are supported by the Data Connector for
Oracle and Hadoop:
[grid="all"]
`----------------------------------------`-----------------------------------------------------------------
BINARY_DOUBLE NCLOB
BINARY_FLOAT NUMBER
BLOB NVARCHAR2
CHAR RAW
CLOB ROWID
DATE TIMESTAMP
FLOAT TIMESTAMP WITH TIME ZONE
INTERVAL DAY TO SECOND TIMESTAMP WITH LOCAL TIME ZONE
INTERVAL YEAR TO MONTH URITYPE
LONG VARCHAR2
NCHAR
-----------------------------------------------------------------------------------------------------------
All other Oracle column types are NOT supported. Example Oracle column types NOT
supported by Data Connector for Oracle and Hadoop include:
[grid="all"]
`-----------------------------------------------------`-------------------------------
All of the ANY types BFILE
All of the MEDIA types LONG RAW
All of the SPATIAL types MLSLABEL
Any type referred to as UNDEFINED UROWID
All custom (user-defined) URI types XMLTYPE
--------------------------------------------------------------------------------------
NOTE: Data types RAW, LONG and LOB (BLOB, CLOB and NCLOB) are supported for
Data Connector for Oracle and Hadoop imports. They are not supported for Data
Connector for Oracle and Hadoop exports.
Execute Sqoop With Data Connector for Oracle and Hadoop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Connect to Oracle / Oracle RAC
++++++++++++++++++++++++++++++
The Sqoop +--connect+ parameter defines the Oracle instance or Oracle RAC to
connect to. It is required with all Sqoop import and export commands.
Data Connector for Oracle and Hadoop expects the associated connection string
to be of a specific format dependent on whether the Oracle SID, Service
or TNS name is defined. The TNS name based URL scheme can be used to enable
authentication using Oracle wallets.
+--connect jdbc:oracle:thin:@OracleServer:OraclePort:OracleSID+
+--connect jdbc:oracle:thin:@//OracleServer:OraclePort/OracleService+
+--connect jdbc:oracle:thin:@TNSName+
Connect to An Oracle Database Instance
++++++++++++++++++++++++++++++++++++++
[grid="all"]
`---------------------------------------------`----------------------------------------
Parameter / Component Description
---------------------------------------------------------------------------------------
+jdbc:oracle:thin+ The Data Connector for Oracle and\
Hadoop requires the connection string\
starts with jdbc:oracle.
The Data Connector for Oracle and\
Hadoop has been tested with the thin\
driver however it should work equally\
well with other drivers such as OCI.
+OracleServer+ The host name of the Oracle server.
+OraclePort+ The port to connect to the Oracle server.
+OracleSID+ The Oracle instance.
+OracleService+ The Oracle Service.
+TNSName+ The TNS name for the entry describing\
the connection to the Oracle server.
----------------------------------------------------------------------------------------
[NOTE]
================================================================================
The Hadoop mappers connect to the Oracle database using a dynamically
generated JDBC URL. This is designed to improve performance however it can be
disabled by specifying:
+-D oraoop.jdbc.url.verbatim=true+
================================================================================
Connect to An Oracle RAC
++++++++++++++++++++++++
Use the +--connect+ parameter as above. The connection string should point to
one instance of the Oracle RAC. The listener of the host of this Oracle
instance will locate the other instances of the Oracle RAC.
NOTE: To improve performance, The Data Connector for Oracle and Hadoop
identifies the active instances of the Oracle RAC and connects each Hadoop
mapper to them in a round-robin manner.
If services are defined for this Oracle RAC then use the following parameter
to specify the service name:
+-D oraoop.oracle.rac.service.name=ServiceName+
[grid="all"]
`--------------------------------------------------------------------`------------------------------------------------
Parameter / Component Description
----------------------------------------------------------------------------------------------------------------------
+OracleServer:OraclePort:OracleInstance+ Name one instance of the Oracle RAC.\
The Data Connector for Oracle and\
Hadoop assumes the same port number for\
all instances of the Oracle RAC.
The listener of the host of this Oracle\
instance is used to locate other\
instances of the Oracle RAC. For more\
information enter this command on the\
host command line:
+lsnrctl status+
+-D oraoop.oracle.rac.service.name=ServiceName+ The service to connect to in the Oracle RAC.
A connection is made to all instances\
of the Oracle RAC associated with the\
service given by +ServiceName+.
If omitted, a connection is made to all\
instances of the Oracle RAC.
The listener of the host of this Oracle\
instance needs to know the +ServiceName+\
and all instances of the Oracle RAC. For\
more information enter this command on\
the host command line:
+lsnrctl status+
----------------------------------------------------------------------------------------------------------------------
Login to The Oracle Instance
++++++++++++++++++++++++++++
Login to the Oracle instance on the Sqoop command line:
+--connect jdbc:oracle:thin:@OracleServer:OraclePort:OracleInstance --username
UserName -P+
[grid="all"]
`-------------------------------------`----------------------------------------
Parameter / Component Description
-------------------------------------------------------------------------------
+--username UserName+ The username to login to the Oracle\
instance (SID).
+-P+ You will be prompted for the password\
to login to the Oracle instance.
--------------------------------------------------------------------------------
Kill Data Connector for Oracle and Hadoop Jobs
++++++++++++++++++++++++++++++++++++++++++++++
Use the Hadoop Job Tracker to kill the Sqoop job, just as you would kill any
other Map-Reduce job.
$ +hadoop job -kill jobid+
To allow an Oracle DBA to kill a Data Connector for Oracle and Hadoop
job (via killing the sessions in Oracle) you need to prevent Map-Reduce from
re-attempting failed jobs. This is done via the following Sqoop
command-line switch:
+-D mapred.map.max.attempts=1+
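For example, a sketch of an import run with re-attempts disabled (the connection
details and table name are placeholders):

$ +sqoop import -D mapred.map.max.attempts=1 --direct --connect ...
--table OracleTableName+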
This sends instructions similar to the following to the console:
----
14/07/07 15:24:51 INFO oracle.OraOopManagerFactory:
Note: This Data Connector for Oracle and Hadoop job can be killed via Oracle
by executing the following statement:
begin
for row in (select sid,serial# from v$session where
module='Data Connector for Oracle and Hadoop' and
action='import 20140707152451EST') loop
execute immediate 'alter system kill session ''' || row.sid ||
',' || row.serial# || '''';
end loop;
end;
----
Import Data from Oracle
^^^^^^^^^^^^^^^^^^^^^^^
Execute Sqoop. Following is an example command:
$ +sqoop import --direct --connect ... --table OracleTableName+
If The Data Connector for Oracle and Hadoop accepts the job then the following
text is output:
----
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
----
[NOTE]
================================================================================
- More information is available on the +--connect+ parameter. See "Connect to
Oracle / Oracle RAC" for more information.
- If Java runs out of memory the workaround is to specify each mapper's
JVM memory allocation. Add the following parameter for example to allocate 4GB:
+
+-Dmapred.child.java.opts=-Xmx4000M+
- An Oracle optimizer hint is included in the SELECT statement by default.
See "oraoop.import.hint" for more information.
+
You can alter the hint on the command line as follows:
+
+-Doraoop.import.hint="NO_INDEX(t)"+
+
You can turn off the hint on the command line as follows (notice the space
between the double quotes):
+
+-Doraoop.import.hint=" "+
================================================================================
Match Hadoop Files to Oracle Table Partitions
+++++++++++++++++++++++++++++++++++++++++++++
+-Doraoop.chunk.method={ROWID|PARTITION}+
To import data from a partitioned table in such a way that the resulting HDFS
folder structure in Hadoop will match the table’s partitions, set the chunk
method to PARTITION. The alternative (default) chunk method is ROWID.
[NOTE]
================================================================================
- For the number of Hadoop files to match the number of Oracle partitions,
set the number of mappers to be greater than or equal to the number of
partitions.
- If the table is not partitioned then value PARTITION will lead to an error.
================================================================================
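For example, a sketch of a partition-matched import (the connection details, table
name and mapper count are placeholders):

$ +sqoop import -Doraoop.chunk.method=PARTITION --direct --connect ...
--table OracleTableName --num-mappers 8+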
Specify The Partitions To Import
++++++++++++++++++++++++++++++++
+-Doraoop.import.partitions=PartitionA,PartitionB --table OracleTableName+
Imports +PartitionA+ and +PartitionB+ of +OracleTableName+.
[NOTE]
================================================================================
- You can enclose an individual partition name in double quotes to retain the
letter case or if the name has special characters.
+
+-Doraoop.import.partitions=\'"PartitionA",PartitionB' --table OracleTableName+
+
If the partition name is not double quoted then its name will be automatically
converted to upper case, PARTITIONB for above.
+
When using double quotes the entire list of partition names must be enclosed in
single quotes.
+
If the last partition name in the list is double quoted then there must be a
comma at the end of the list.
+
+-Doraoop.import.partitions=\'"PartitionA","PartitionB",' --table
OracleTableName+
- Name each partition to be included. There is no facility to provide a range of
partition names.
- There is no facility to define sub partitions. The entire partition is
included/excluded as per the filter.
================================================================================
Consistent Read: All Mappers Read From The Same Point In Time
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+-Doraoop.import.consistent.read={true|false}+
When set to +false+ (by default) each mapper runs a select query. This will
return potentially inconsistent data if there are a lot of DML operations on
the table at the time of import.
Set to +true+ to ensure all mappers read from the same point in time. The
System Change Number (SCN) is passed down to all mappers, which use the Oracle
Flashback Query to query the table as at that SCN.
[NOTE]
================================================================================
- Values +true+ | +false+ are case sensitive.
- By default the SCN is taken from V$database. You can specify the SCN in the
following command
+
+-Doraoop.import.consistent.read.scn=12345+
================================================================================
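For example, a sketch of an import with consistent read enabled (the connection
details and table name are placeholders):

$ +sqoop import -Doraoop.import.consistent.read=true --direct --connect ...
--table OracleTableName+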
Export Data into Oracle
^^^^^^^^^^^^^^^^^^^^^^^
Execute Sqoop. Following is an example command:
$ +sqoop export --direct --connect ... --table OracleTableName --export-dir
/user/username/tablename+
The Data Connector for Oracle and Hadoop accepts all jobs that export data to
Oracle. You can verify The Data Connector for Oracle and Hadoop is in use by
checking the following text is output:
----
**************************************************
*** Using Data Connector for Oracle and Hadoop ***
**************************************************
----
[NOTE]
================================================================================
- +OracleTableName+ is the Oracle table the data will export into.
- +OracleTableName+ can be in a schema other than that for the connecting user.
Prefix the table name with the schema, for example +SchemaName.OracleTableName+.
- Hadoop tables are picked up from the +/user/username/tablename+ directory.
- The export will fail if the Hadoop file contains any fields of a data type
not supported by The Data Connector for Oracle and Hadoop. See
"Supported Data Types" for more information.
- The export will fail if the column definitions in the Hadoop table do not
exactly match the column definitions in the Oracle table.
- The Data Connector for Oracle and Hadoop indicates if it finds temporary
tables that it created more than a day ago that still exist. Usually these
tables can be dropped. The only circumstance when these tables should not be
dropped is when a Data Connector for Oracle and Hadoop job has been
running for more than 24 hours and is still running.
- More information is available on the +--connect+ parameter. See
"Connect to Oracle / Oracle RAC" for more information.
================================================================================
Insert-Export
+++++++++++++
Appends data to +OracleTableName+. It does not modify existing data in
+OracleTableName+.
Insert-Export is the default method, executed in the absence of the
+--update-key+ parameter. All rows in the HDFS file in
+/user/UserName/TableName+ are inserted into +OracleTableName+. No
change is made to pre-existing data in +OracleTableName+.
$ +sqoop export --direct --connect ... --table OracleTableName --export-dir
/user/username/tablename+
[NOTE]
================================================================================
- If +OracleTableName+ was previously created by The Data Connector for Oracle
and Hadoop with partitions then this export will create a new partition for the
data being inserted.
- When creating +OracleTableName+ specify a template. See
"Create Oracle Tables" for more information.
================================================================================
Update-Export
+++++++++++++
+--update-key OBJECT+
Updates existing rows in +OracleTableName+.
Rows in the HDFS file in +/user/UserName/TableName+ are matched to rows in
+OracleTableName+ by the +OBJECT+ column. Rows that match are copied from the
HDFS file to the Oracle table. No action is taken on rows that do not match.
$ +sqoop export --direct --connect ... --update-key OBJECT --table
OracleTableName --export-dir /user/username/tablename+
[NOTE]
================================================================================
- If +OracleTableName+ was previously created by The Data Connector for Oracle
and Hadoop with partitions then this export will create a new partition for the
data being inserted. Updated rows will be moved to the new partition that was
created for the export.
- For performance reasons it is strongly recommended that, where more than a few
rows are involved, column +OBJECT+ be an indexed column of +OracleTableName+.
- Ensure the column name defined with +--update-key OBJECT+ is specified in the
correct letter case. Sqoop will show an error if the letter case is incorrect.
- It is possible to match rows via multiple columns. See "Match Rows Via
Multiple Columns" for more information.
================================================================================
Merge-Export
++++++++++++
+--update-key OBJECT -Doraoop.export.merge=true+
Updates existing rows in +OracleTableName+. Copies across rows from the HDFS
file that do not exist within the Oracle table.
Rows in the HDFS file in +/user/UserName/TableName+ are matched to rows in
+OracleTableName+ by the +OBJECT+ column. Rows that match are copied from the
HDFS file to the Oracle table. Rows in the HDFS file that do not exist in
+OracleTableName+ are added to +OracleTableName+.
$ +sqoop export --direct --connect ... --update-key OBJECT
-Doraoop.export.merge=true --table OracleTableName --export-dir
/user/username/tablename+
[NOTE]
================================================================================
- Merge-Export is unique to The Data Connector for Oracle and Hadoop. It is
not a standard Sqoop feature.
- If +OracleTableName+ was previously created by The Data Connector for Oracle
and Hadoop with partitions, then this export will create a new partition for
the data being inserted. Updated rows will be moved to the new partition that
was created for the export.
- For performance reasons it is strongly recommended that, where more than a
few rows are involved, column +OBJECT+ be an indexed column of +OracleTableName+.
- Ensure the column name defined with +--update-key OBJECT+ is specified in the
correct letter case. Sqoop will show an error if the letter case is incorrect.
- It is possible to match rows via multiple columns. See "Match Rows Via
Multiple Columns" for more information.
================================================================================
Create Oracle Tables
++++++++++++++++++++
+-Doraoop.template.table=TemplateTableName+
Creates +OracleTableName+ by replicating the structure and data types of
+TemplateTableName+. +TemplateTableName+ is a table that exists in Oracle prior
to executing the Sqoop command.
[NOTE]
================================================================================
- The export will fail if the Hadoop file contains any fields of a data type
not supported by The Data Connector for Oracle and Hadoop. See "Supported
Data Types" for more information.
- The export will fail if the column definitions in the Hadoop table do not
exactly match the column definitions in the Oracle table.
- This parameter is specific to creating an Oracle table. The export will fail
if +OracleTableName+ already exists in Oracle.
================================================================================
Example command:
$ +sqoop export --direct --connect.. --table OracleTableName --export-dir
/user/username/tablename -Doraoop.template.table=TemplateTableName+
NOLOGGING
+++++++++
+-Doraoop.nologging=true+
Assigns the NOLOGGING option to +OracleTableName+.
NOLOGGING may enhance performance but you will be unable to back up the table.
Partitioning
++++++++++++
+-Doraoop.partitioned=true+
Partitions the table with the following benefits:
- The speed of the export is improved by allowing each mapper to insert data
into a separate Oracle table using direct path writes. (An alter table exchange
subpartition SQL statement is subsequently executed to swap the data into the
export table.)
- You can selectively query or delete the data inserted by each Sqoop export
job. For example, you can delete old data by dropping old partitions from
the table.
The partition value is the SYSDATE of when the Sqoop export job was performed.
The partitioned table created by The Data Connector for Oracle and Hadoop
includes the following columns that don't exist in the template table:
- +oraoop_export_sysdate+ - This is the Oracle SYSDATE when the Sqoop export
job was performed. The created table will be partitioned by this column.
- +oraoop_mapper_id+ - This is the id of the Hadoop mapper that was used to
process the rows from the HDFS file. Each partition is subpartitioned by this
column. This column exists merely to facilitate the exchange subpartition
mechanism that is performed by each mapper during the export process.
- +oraoop_mapper_row+ - A unique row id within the mapper / partition.
NOTE: If a unique row id is required for the table it can be formed by a
combination of oraoop_export_sysdate, oraoop_mapper_id and oraoop_mapper_row.
Match Rows Via Multiple Columns
+++++++++++++++++++++++++++++++
+-Doraoop.update.key.extra.columns="ColumnA,ColumnB"+
Used with Update-Export and Merge-Export to match on more than one column. The
first column to be matched on is +--update-key OBJECT+. To match on additional
columns, specify those columns on this parameter.
[NOTE]
================================================================================
- Letter case for the column names on this parameter is not important.
- All columns used for matching should be indexed. The first three items on the
index should be +ColumnA+, +ColumnB+ and the column specified on
+--update-key+ - but the order in which the columns are specified is not
important.
================================================================================
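For example, a sketch of an Update-Export that matches on +OBJECT+ plus two extra
columns (the connection details, table, directory and column names are placeholders):

$ +sqoop export -Doraoop.update.key.extra.columns="ColumnA,ColumnB" --direct
--connect ... --update-key OBJECT --table OracleTableName --export-dir
/user/username/tablename+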
Storage Clauses
+++++++++++++++
+-Doraoop.temporary.table.storage.clause="StorageClause"+
+-Doraoop.table.storage.clause="StorageClause"+
Use these to customize storage with Oracle clauses such as TABLESPACE or COMPRESS.
+-Doraoop.table.storage.clause+ applies to the export table that is created
from +-Doraoop.template.table+. See "Create Oracle Tables" for more
information. +-Doraoop.temporary.table.storage.clause+ applies to all other
working tables that are created during the export process and then dropped at
the end of the export job.
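For example, a sketch that places the table created from the template in a specific
tablespace (the tablespace name and other details are placeholders):

$ +sqoop export -Doraoop.template.table=TemplateTableName
-Doraoop.table.storage.clause="TABLESPACE users" --direct --connect ...
--table OracleTableName --export-dir /user/username/tablename+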
Manage Date And Timestamp Data Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Import Date And Timestamp Data Types from Oracle
++++++++++++++++++++++++++++++++++++++++++++++++
This section lists known differences in the data obtained by performing a
Data Connector for Oracle and Hadoop import of an Oracle table versus a native
Sqoop import of the same table.
The Data Connector for Oracle and Hadoop Does Not Apply A Time Zone to DATE / TIMESTAMP Data Types
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data stored in a DATE or TIMESTAMP column of an Oracle table is not associated
with a time zone. Sqoop without the Data Connector for Oracle and Hadoop
inappropriately applies time zone information to this data.
Take for example the following timestamp in an Oracle DATE or TIMESTAMP column:
+2am on 3rd October, 2010+.
Request that Sqoop without the Data Connector for Oracle and Hadoop import this
data using a system located in Melbourne, Australia. The data is adjusted to Melbourne
Daylight Saving Time. The data is imported into Hadoop as:
+3am on 3rd October, 2010.+
The Data Connector for Oracle and Hadoop does not apply time zone information to
these Oracle data-types. Even from a system located in Melbourne, Australia, The
Data Connector for Oracle and Hadoop ensures the Oracle and Hadoop timestamps
match. The Data Connector for Oracle and Hadoop correctly imports this
timestamp as:
+2am on 3rd October, 2010+.
NOTE: In order for The Data Connector for Oracle and Hadoop to ensure data
accuracy, Oracle DATE and TIMESTAMP values must be represented by a String,
even when +--as-sequencefile+ is used on the Sqoop command-line to produce a
binary file in Hadoop.
The Data Connector for Oracle and Hadoop Retains Time Zone Information in TIMEZONE Data Types
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data stored in a TIMESTAMP WITH TIME ZONE column of an Oracle table is
associated with a time zone. This data consists of two distinct parts: when the
event occurred and where the event occurred.
When Sqoop without The Data Connector for Oracle and Hadoop is used to import
data it converts the timestamp to the time zone of the system running Sqoop and
omits the component of the data that specifies where the event occurred.
Take for example the following timestamps (with time zone) in an Oracle
TIMESTAMP WITH TIME ZONE column:
----
2:59:00 am on 4th April, 2010. Australia/Melbourne
2:59:00 am on 4th April, 2010. America/New York
----
Request that Sqoop without The Data Connector for Oracle and Hadoop import this
data using a system located in Melbourne, Australia. From the data imported into
Hadoop we know when the events occurred, assuming we know the Sqoop command was
run from a system located in the Australia/Melbourne time zone, but we have lost
the information regarding where the event occurred.
----
2010-04-04 02:59:00.0
2010-04-04 16:59:00.0
----
Sqoop with The Data Connector for Oracle and Hadoop imports the example
timestamps as follows. The Data Connector for Oracle and Hadoop retains the
time zone portion of the data.
----
2010-04-04 02:59:00.0 Australia/Melbourne
2010-04-04 02:59:00.0 America/New_York
----
Data Connector for Oracle and Hadoop Explicitly States Time Zone for LOCAL TIMEZONE Data Types
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Data stored in a TIMESTAMP WITH LOCAL TIME ZONE column of an Oracle table is
associated with a time zone. Multiple end-users in differing time zones
(locales) will each have that data expressed as a timestamp within their
respective locale.
When Sqoop without the Data Connector for Oracle and Hadoop is used to import
data it converts the timestamp to the time zone of the system running Sqoop and
omits the component of the data that specifies location.
Take for example the following two timestamps (with time zone) in an Oracle
TIMESTAMP WITH LOCAL TIME ZONE column:
----
2:59:00 am on 4th April, 2010. Australia/Melbourne
2:59:00 am on 4th April, 2010. America/New York
----
Request that Sqoop without the Data Connector for Oracle and Hadoop import this
data using a system located in Melbourne, Australia. The timestamps are imported
correctly but the local time zone has to be guessed. If multiple systems in
different locales were executing the Sqoop import it would be very difficult to
diagnose the cause of the data corruption.
----
2010-04-04 02:59:00.0
2010-04-04 16:59:00.0
----
Sqoop with the Data Connector for Oracle and Hadoop explicitly states the time
zone portion of the data imported into Hadoop. The local time zone is GMT by
default. You can set the local time zone with parameter:
+-Doracle.sessionTimeZone=Australia/Melbourne+
The Data Connector for Oracle and Hadoop would import these two timestamps as:
----
2010-04-04 02:59:00.0 Australia/Melbourne
2010-04-04 16:59:00.0 Australia/Melbourne
----
java.sql.Timestamp
++++++++++++++++++
To use Sqoop's handling of date and timestamp data types when importing data
from Oracle use the following parameter:
+-Doraoop.timestamp.string=false+
NOTE: Sqoop's handling of date and timestamp data types does not store the
timezone. However, some developers may prefer Sqoop's handling as the Data
Connector for Oracle and Hadoop converts date and timestamp data types to
string. This may not work for some developers as the string will require
parsing later in the workflow.
Export Date And Timestamp Data Types into Oracle
++++++++++++++++++++++++++++++++++++++++++++++++
Ensure the data in the HDFS file fits the required format exactly before using
Sqoop to export the data into Oracle.
[NOTE]
================================================================================
- The Sqoop export command will fail if the data is not in the required format.
- ff = Fractional second
- TZR = Time Zone Region
================================================================================
[grid="all"]
`-------------------------------------`-------------------------------------------------------
Oracle Data Type Required Format of The Data in the HDFS File
----------------------------------------------------------------------------------------------
DATE +yyyy-mm-dd hh24:mi:ss+
TIMESTAMP +yyyy-mm-dd hh24:mi:ss.ff+
TIMESTAMPTZ +yyyy-mm-dd hh24:mi:ss.ff TZR+
TIMESTAMPLTZ +yyyy-mm-dd hh24:mi:ss.ff TZR+
----------------------------------------------------------------------------------------------
Configure The Data Connector for Oracle and Hadoop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
oraoop-site-template.xml
++++++++++++++++++++++++
The oraoop-site-template.xml file is supplied with the Data Connector for
Oracle and Hadoop. It contains a number of ALTER SESSION statements that are
used to initialize the Oracle sessions created by the Data Connector for Oracle
and Hadoop.
If you need to customize these initializations to your environment then:
1. Find +oraoop-site-template.xml+ in the Sqoop configuration directory.
2. Copy +oraoop-site-template.xml+ to +oraoop-site.xml+.
3. Edit the +ALTER SESSION+ statements in +oraoop-site.xml+.
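For example, a minimal sketch of steps 1 and 2, assuming the Sqoop configuration
directory is +$SQOOP_HOME/conf+ (yours may differ):
----
$ cd $SQOOP_HOME/conf
$ cp oraoop-site-template.xml oraoop-site.xml
----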
oraoop.oracle.session.initialization.statements
+++++++++++++++++++++++++++++++++++++++++++++++
The value of this property is a semicolon-delimited list of Oracle SQL
statements. These statements are executed, in order, for each Oracle session
created by the Data Connector for Oracle and Hadoop.
The default statements include:
+alter session set time_zone = \'{oracle.sessionTimeZone|GMT}';+::
This statement initializes the timezone of the JDBC client. This ensures that
data from columns of type TIMESTAMP WITH LOCAL TIMEZONE are correctly adjusted
into the timezone of the client and not kept in the timezone of the Oracle
database.
+
[NOTE]
================================================================================
- There is an explanation of the text within the curly braces. See
"Expressions in oraoop-site.xml" for more information.
- A list of the time zones supported by your Oracle database is available by
executing the following query: +SELECT TZNAME FROM V$TIMEZONE_NAMES;+
================================================================================
+alter session disable parallel query;+::
This statement instructs Oracle to not parallelize SQL statements executed by
the Data Connector for Oracle and Hadoop sessions. This Oracle feature is
disabled because the Map/Reduce job launched by Sqoop is the mechanism used
for parallelization.
+
It is recommended that you not enable parallel query because it can have an
adverse effect on the load on the Oracle instance and on the balance between
the Data Connector for Oracle and Hadoop mappers.
+
Some export operations are performed in parallel where deemed appropriate by
the Data Connector for Oracle and Hadoop. See "Parallelization" for
more information.
+alter session set "_serial_direct_read"=true;+::
This statement instructs Oracle to bypass the buffer cache. This is used to
prevent Oracle from filling its buffers with the data being read by the Data
Connector for Oracle and Hadoop, therefore diminishing its capacity to cache
higher prioritized data. Hence, this statement is intended to minimize the
Data Connector for Oracle and Hadoop's impact on the immediate future
performance of the Oracle database.
+--alter session set events \'10046 trace name context forever, level 8';+::
This statement has been commented-out. To allow tracing, remove the comment
token "--" from the start of the line.
[NOTE]
================================================================================
- These statements are placed on separate lines for readability. They do not
need to be placed on separate lines.
- A statement can be commented-out via the standard Oracle double-hyphen
token: "--". The comment takes effect until the next semicolon.
================================================================================
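As an illustrative sketch only (the exact contents of the shipped template may
differ), the property could appear in +oraoop-site.xml+ with the default
statements listed above supplied as a single semicolon-delimited value:

----
<property>
  <name>oraoop.oracle.session.initialization.statements</name>
  <value>
    alter session set time_zone = '{oracle.sessionTimeZone|GMT}';
    alter session disable parallel query;
    alter session set "_serial_direct_read"=true;
    --alter session set events '10046 trace name context forever, level 8';
  </value>
</property>
----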
oraoop.table.import.where.clause.location
+++++++++++++++++++++++++++++++++++++++++
SUBSPLIT (default)::
When set to this value, the where clause is applied to each subquery used to
retrieve data from the Oracle table.
+
A Sqoop command like:
+
+sqoop import -D oraoop.table.import.where.clause.location=SUBSPLIT --table
JUNK --where "owner like \'G%'"+
+
Generates a SQL query of the form:
+
----
SELECT OWNER,OBJECT_NAME
FROM JUNK
WHERE ((rowid >=
dbms_rowid.rowid_create(1, 113320, 1024, 4223664, 0)
AND rowid <=
dbms_rowid.rowid_create(1, 113320, 1024, 4223671, 32767)))
AND (owner like 'G%')
UNION ALL
SELECT OWNER,OBJECT_NAME
FROM JUNK
WHERE ((rowid >=
dbms_rowid.rowid_create(1, 113320, 1024, 4223672, 0)
AND rowid <=
dbms_rowid.rowid_create(1, 113320, 1024, 4223679, 32767)))
AND (owner like 'G%')
----
SPLIT::
When set to this value, the where clause is applied to the entire SQL
statement used by each split/mapper.
+
A Sqoop command like:
+
+sqoop import -D oraoop.table.import.where.clause.location=SPLIT --table
JUNK --where "rownum <= 10"+
+
Generates a SQL query of the form:
+
----
SELECT OWNER,OBJECT_NAME
FROM (
SELECT OWNER,OBJECT_NAME
FROM JUNK
WHERE ((rowid >=
dbms_rowid.rowid_create(1, 113320, 1024, 4223664, 0)
AND rowid <=
dbms_rowid.rowid_create(1, 113320, 1024, 4223671, 32767)))
UNION ALL
SELECT OWNER,OBJECT_NAME
FROM JUNK
WHERE ((rowid >=
dbms_rowid.rowid_create(1, 113320, 1024, 4223672, 0)
AND rowid <=
dbms_rowid.rowid_create(1, 113320, 1024, 4223679,32767)))
)
WHERE rownum <= 10
----
+
[NOTE]
================================================================================
- In this example, there are up to 10 rows imported per mapper.
- The SPLIT clause may result in greater overhead than the SUBSPLIT
clause because the UNION statements need to be fully materialized
before the data can be streamed to the mappers. However, you may
wish to use SPLIT in the case where you want to limit the total
number of rows processed by each mapper.
================================================================================
oracle.row.fetch.size
+++++++++++++++++++++
The value of this property is an integer specifying the number of rows the
Oracle JDBC driver should fetch in each network round-trip to the database.
The default value is 5000.
If you alter this setting, confirmation of the
change is displayed in the logs of the mappers during the Map-Reduce job.
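For example, to request a larger fetch size, the property can be passed before
the tool-specific arguments. The fetch size, connection string, username and
table below are placeholders:

----
$ sqoop import -D oracle.row.fetch.size=10000 \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES
----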
oraoop.import.hint
++++++++++++++++++
The Oracle optimizer hint is added to the SELECT statement for IMPORT jobs
as follows:
----
SELECT /*+ NO_INDEX(t) */ * FROM employees;
----
The default hint is +NO_INDEX(t)+.
[NOTE]
================================================================================
- The hint can be added to the command line. See "Import Data from Oracle" for
more information.
- See the Oracle Database Performance Tuning Guide (Using Optimizer Hints)
for more information on Oracle optimizer hints.
- To turn the hint off, insert a space between the <value> elements.
+
----
<property>
<name>oraoop.import.hint</name>
<value> </value>
</property>
----
================================================================================
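As a sketch, the hint can also be supplied with the generic +-D+ argument on
the command line; the connection details below are placeholders and the exact
syntax is covered in "Import Data from Oracle":

----
$ sqoop import -D oraoop.import.hint="NO_INDEX(t)" \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES
----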
oraoop.oracle.append.values.hint.usage
++++++++++++++++++++++++++++++++++++++
The value of this property is one of: AUTO / ON / OFF.
AUTO::
AUTO is the default value.
+
Currently AUTO is equivalent to OFF.
ON::
During export the Data Connector for Oracle and Hadoop uses direct path
writes to populate the target Oracle table, bypassing the buffer cache.
Oracle only allows a single session to perform direct writes against a specific
table at any time, so this has the effect of serializing the writes to the
table. This may reduce throughput, especially if the number of mappers is high.
However, for databases where DBWR is very busy, or where the IO bandwidth to
the underlying table is narrow (for instance, the table resides on a single
disk spindle), setting +oraoop.oracle.append.values.hint.usage+ to ON may
reduce the load on the Oracle database and possibly increase throughput.
OFF::
During export the Data Connector for Oracle and Hadoop does not use the
+APPEND_VALUES+ Oracle hint.
NOTE: This parameter is only effective on Oracle 11g Release 2 and above.
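A sketch of turning the hint on for an export job; the connection details,
table and export directory are placeholders:

----
$ sqoop export -D oraoop.oracle.append.values.hint.usage=ON \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES \
    --export-dir /user/sqoop/employees
----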
mapred.map.tasks.speculative.execution
++++++++++++++++++++++++++++++++++++++
By default speculative execution is disabled for the Data Connector for
Oracle and Hadoop. This avoids placing redundant load on the Oracle database.
If speculative execution is enabled, Hadoop may initiate multiple mappers
to read the same blocks of data, increasing the overall load on the database.
oraoop.block.allocation
+++++++++++++++++++++++
This setting determines how Oracle's data-blocks are assigned to Map-Reduce mappers.
NOTE: Applicable to import. Not applicable to export.
ROUNDROBIN (default)::
Each chunk of Oracle blocks is allocated to the mappers in a round-robin
manner. This helps prevent one of the mappers from being
allocated a large proportion of typically small-sized blocks from the
start of Oracle data-files. In doing so it also helps prevent one of the
other mappers from being allocated a large proportion of typically
larger-sized blocks from the end of the Oracle data-files.
+
Use this method to help ensure all the mappers are allocated a similar
amount of work.
RANDOM::
The list of Oracle blocks is randomized before being allocated to the
mappers via a round-robin approach. This has the benefit of increasing
the chance that, at any given instant in time, each mapper is reading
from a different Oracle data-file. If the Oracle data-files are located on
separate spindles, this should increase the overall IO throughput.
SEQUENTIAL::
Each chunk of Oracle blocks is allocated to the mappers sequentially.
This tends to make each mapper sequentially read a large, contiguous
section of an Oracle data-file. The performance of this method is unlikely
to exceed that of the round-robin method, and it is more likely to produce
a large imbalance in the work allocated to the mappers.
+
Use of this method is generally not recommended.
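For example, to select the RANDOM method described above for an import (the
connection details are placeholders):

----
$ sqoop import -D oraoop.block.allocation=RANDOM \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES
----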
oraoop.import.omit.lobs.and.long
++++++++++++++++++++++++++++++++
This setting can be used to omit all LOB columns (BLOB, CLOB and NCLOB) and
LONG columns from an Oracle table being imported. This is advantageous in
troubleshooting, as it provides a convenient way to exclude all LOB-based data
from the import.
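A sketch of enabling this setting for an import, assuming the property accepts
a boolean value (the connection details are placeholders):

----
$ sqoop import -D oraoop.import.omit.lobs.and.long=true \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES
----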
oraoop.locations
++++++++++++++++
NOTE: Applicable to import. Not applicable to export.
By default, four mappers are used for a Sqoop import job. The number of mappers
can be altered via the Sqoop +--num-mappers+ parameter.
If the data nodes in your Hadoop cluster have 4 task slots (that is, they are
4-CPU-core machines) it is likely that all four mappers will execute on the
same machine. Therefore, IO may be concentrated between the Oracle database
and a single machine.
This setting allows you to control which DataNodes in your Hadoop cluster each
mapper executes on. By assigning each mapper to a separate machine you may
improve the overall IO performance for the job. This also has the
side-effect of spreading the imported data more evenly across the machines in
the cluster. (HDFS replication spreads the data across the cluster anyway.)
Specify the machine names as a comma separated list. The locations are
allocated to each of the mappers in a round-robin manner.
If using EC2, specify the internal name of the machines. Here is an example
of using this parameter from the Sqoop command-line:
$ +sqoop import -D
oraoop.locations=ip-10-250-23-225.ec2.internal,ip-10-250-107-32.ec2.internal,ip-10-250-207-2.ec2.internal,ip-10-250-27-114.ec2.internal
--direct --connect...+
sqoop.connection.factories
++++++++++++++++++++++++++
This setting determines behavior if the Data Connector for Oracle and Hadoop
cannot accept the job. By default Sqoop accepts the jobs that the Data Connector
for Oracle and Hadoop rejects.
Set the value to +org.apache.sqoop.manager.oracle.OraOopManagerFactory+ when you
want the job to fail if the Data Connector for Oracle and Hadoop cannot
accept the job.
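As a sketch, assuming you want every job to fail rather than fall back to
another connection manager, the property could be placed in +sqoop-site.xml+:

----
<property>
  <name>sqoop.connection.factories</name>
  <value>org.apache.sqoop.manager.oracle.OraOopManagerFactory</value>
</property>
----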
Expressions in oraoop-site.xml
++++++++++++++++++++++++++++++
Text contained within curly braces { and } is an expression that is evaluated
before the SQL statement is executed. The expression contains the name of a
configuration property, optionally followed by a default value to use if the
property has not been set. A pipe | character delimits the property name and
the default value.
For example:
When this Sqoop command is executed::
$ +sqoop import -D oracle.sessionTimeZone=US/Hawaii --direct --connect+
The statement within oraoop-site.xml::
+alter session set time_zone = \'{oracle.sessionTimeZone|GMT}';+
Becomes::
+alter session set time_zone = \'US/Hawaii'+
If the oracle.sessionTimeZone property had not been set, then this statement would use the specified default value and would become::
+alter session set time_zone = \'GMT'+
NOTE: The +oracle.sessionTimeZone+ property can be specified within the
+sqoop-site.xml+ file if you want this setting to be used all the time.
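For example, a persistent setting could be added to +sqoop-site.xml+ along
these lines (the time zone value is only an illustration):

----
<property>
  <name>oracle.sessionTimeZone</name>
  <value>US/Hawaii</value>
</property>
----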
Troubleshooting The Data Connector for Oracle and Hadoop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Quote Oracle Owners And Tables
++++++++++++++++++++++++++++++
[frame="topbot",cols="2*v"]
|===============================================================================
|If the owner of the Oracle table needs to be
quoted, use:
|$ +sqoop import ... --table
"\"\"Scott\".customers\""+
This is the equivalent of:
"Scott".customers
|If the Oracle table needs to be quoted, use:
|$ +sqoop import ... --table
"\"scott.\"Customers\"\""+
This is the equivalent of:
scott."Customers"
|If both the owner of the Oracle table and the
table itself needs to be quoted, use:
|$ +sqoop import ... --table
"\"\"Scott\".\"Customers\"\""+
This is the equivalent of:
"Scott"."Customers"
|===============================================================================
[NOTE]
================================================================================
- The HDFS output directory is called something like:
/user/username/"Scott"."Customers"
- If a table name contains a $ character, it may need to be escaped within your
Unix shell. For example, the dr$object table in the ctxsys schema would be
referred to as: $ +sqoop import ... --table "ctxsys.dr\$object"+
================================================================================
Quote Oracle Columns
++++++++++++++++++++
If a column name of an Oracle table needs to be quoted, use::
$ +sqoop import ... --table customers --columns "\"\"first name\"\""+
+
This is the equivalent of: `select "first name" from customers`
If multiple columns need to be quoted, use::
$ +sqoop import ... --table customers --columns "\"\"first name\",\"last
name\",\"region name\"\""+
+
This is the equivalent of: `select "first name", "last name", "region name"
from customers`
Confirm The Data Connector for Oracle and Hadoop Can Initialize The Oracle Session
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If the Sqoop output includes feedback such as the following, then the
configuration properties contained within +oraoop-site-template.xml+ and
+oraoop-site.xml+ have been loaded by Hadoop and can be accessed by the Data
Connector for Oracle and Hadoop.
+14/07/08 15:21:13 INFO oracle.OracleConnectionFactory:
Initializing Oracle session with SQL+
Check The Sqoop Debug Logs for Error Messages
+++++++++++++++++++++++++++++++++++++++++++++
For more information about any errors encountered during the Sqoop import,
refer to the log files generated by each of the (by default 4) mappers that
performed the import.
The logs can be obtained via your Map-Reduce Job Tracker's web page.
Include these log files with any requests you make for assistance on the Sqoop
User Group web site.
Export: Check Tables Are Compatible
+++++++++++++++++++++++++++++++++++
Check that the tables are compatible, particularly in the case of a parsing error.
- Ensure the fields contained within the HDFS file and the columns within the
Oracle table are identical. If they are not identical, the Java code
dynamically generated by Sqoop to parse the HDFS file will throw an error when
reading the file, causing the export to fail. When creating a table in Oracle,
ensure the definitions for the table template are identical to the definitions
for the HDFS file.
- Ensure the data types in the table are supported. See "Supported Data Types"
for more information.
- Are date and time zone based data types used? See "Export Date And Timestamp
Data Types into Oracle" for more information.
Export: Parallelization
+++++++++++++++++++++++
If you see a parallelization error, you may decide to disable parallelization
on Oracle queries by specifying the following argument:

+-D oraoop.export.oracle.parallelization.enabled=false+
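A sketch of an export with parallelization disabled; the connection details,
table and export directory are placeholders:

----
$ sqoop export -D oraoop.export.oracle.parallelization.enabled=false \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES \
    --export-dir /user/sqoop/employees
----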
Export: Check oraoop.oracle.append.values.hint.usage
++++++++++++++++++++++++++++++++++++++++++++++++++++
The oraoop.oracle.append.values.hint.usage parameter should not be set to ON
if the Oracle table contains either a BINARY_DOUBLE or BINARY_FLOAT column and
the HDFS file being exported contains a NULL value in either of these column
types. Doing so will result in the error: +ORA-12838: cannot read/modify an
object after modifying it in parallel+.
Turn On Verbose
+++++++++++++++
Turn on verbose output on the Sqoop command line:
+--verbose+
Check Sqoop stdout (standard output) and the mapper logs for information as to
where the problem may be.
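A sketch of a verbose import invocation (the connection details are
placeholders):

----
$ sqoop import --verbose \
    --direct --connect jdbc:oracle:thin:@//oracle.example.com:1521/ORCL \
    --username SQOOP -P --table EMPLOYEES
----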