| |
| //// |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| //// |
| |
| Sqoop-HCatalog Integration |
| -------------------------- |
| |
| HCatalog Background |
| ~~~~~~~~~~~~~~~~~~~ |
| |
| HCatalog is a table and storage management service for Hadoop that enables |
| users of different data processing tools (Pig, MapReduce, and Hive) |
| to more easily read and write data on the grid. HCatalog's table abstraction |
| presents users with a relational view of data in the Hadoop distributed |
| file system (HDFS) and ensures that users need not worry about where or |
| in what format their data is stored: RCFile format, text files, or |
| SequenceFiles. |
| |
| HCatalog supports reading and writing files in any format for which a Hive |
| SerDe (serializer-deserializer) has been written. By default, HCatalog |
| supports RCFile, CSV, JSON, and SequenceFile formats. To use a custom |
| format, you must provide the InputFormat and OutputFormat as well as the SerDe. |
| |
| Sqoop uses HCatalog's ability to abstract these storage formats to provide |
| RCFile support (and support for future file types). |
| |
| Exposing HCatalog Tables to Sqoop |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| HCatalog integration with Sqoop is patterned on the existing feature set that |
| supports Avro and Hive tables. Eight new command line options are introduced, |
| and some command line options defined for Hive have been reused. |
| |
| New Command Line Options |
| ^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| +--hcatalog-database+:: |
| Specifies the database name for the HCatalog table. If not specified, |
| the default database name +default+ is used. Providing the |
| +--hcatalog-database+ option without +--hcatalog-table+ is an error. |
| This is not a required option. |
| |
| +--hcatalog-table+:: |
| The argument value for this option is the HCatalog table name. |
| The presence of the +--hcatalog-table+ option signifies that the import |
| or export job is done using HCatalog tables, and it is a required option for |
| HCatalog jobs. |
| |
| +--hcatalog-home+:: |
| The home directory for the HCatalog installation. The directory is |
| expected to have a +lib+ subdirectory and a +share/hcatalog+ subdirectory |
| with the necessary HCatalog libraries. If not specified, the system property |
| +hcatalog.home+ is checked and, failing that, the environment variable |
| +HCAT_HOME+ is checked. If neither is set, the default value is used; |
| currently the default is +/usr/lib/hcatalog+. |
| This is not a required option. |
| |
| +--create-hcatalog-table+:: |
| |
| This option specifies whether an HCatalog table should be created |
| automatically when importing data. By default, HCatalog tables are assumed |
| to exist. The table name will be the same as the database table name |
| translated to lower case. Further described in +Automatic Table Creation+ |
| below. |
| |
| +--drop-and-create-hcatalog-table+:: |
| |
| Same as +--create-hcatalog-table+, but does a +drop if exists+ before creating |
| the table. |
| |
| +--hcatalog-storage-stanza+:: |
| |
| This option specifies the storage stanza to be appended to the table. |
| Further described in +Automatic Table Creation+ below. |
| |
| +--hcatalog-partition-keys+ and +--hcatalog-partition-values+:: |
| |
| These two options are used to specify multiple static partition key/value |
| pairs. In prior releases, the +--hive-partition-key+ and |
| +--hive-partition-value+ options were used to specify a static partition |
| key/value pair, but only one level of static partition keys could be |
| provided. The +--hcatalog-partition-keys+ and +--hcatalog-partition-values+ |
| options allow multiple keys and values to be provided as static partitioning |
| keys. Multiple option values are separated by ',' (comma). |
| |
| For example, if the Hive partition keys for the table to export/import from |
| are defined with the partition key names year, month, and day, and a specific |
| partition with year=1999, month=12, day=31 is the desired partition, then the |
| values for the two options are as follows: |
| |
| * +--hcatalog-partition-keys+ year,month,day |
| * +--hcatalog-partition-values+ 1999,12,31 |
| |
| To provide backward compatibility, if the +--hcatalog-partition-keys+ or |
| +--hcatalog-partition-values+ options are not provided, then |
| +--hive-partition-key+ and +--hive-partition-value+ will be used if provided. |
| |
| It is an error to specify only one of the +--hcatalog-partition-keys+ and |
| +--hcatalog-partition-values+ options; either both options should be |
| provided or neither. |
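| |
| For example, a complete import using this partition specification might look |
| like the following sketch, where the JDBC URL, credentials, and the +txn+ |
| table name are placeholders: |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:postgresql://db.example.com/sales \ |
|   --username someuser -P \ |
|   --table txn \ |
|   --hcatalog-table txn \ |
|   --hcatalog-partition-keys year,month,day \ |
|   --hcatalog-partition-values 1999,12,31 |
| ---- |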
| |
| Supported Sqoop Hive Options |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| The following Sqoop options are also used along with the +--hcatalog-table+ |
| option to provide additional input to the HCatalog jobs. Some of the existing |
| Hive import job options are reused with HCatalog jobs instead of creating |
| HCatalog-specific options for the same purpose. |
| |
| +--map-column-hive+:: |
| This option maps a database column to HCatalog with a specific HCatalog |
| type. |
| |
| +--hive-home+:: |
| The Hive home location. |
| |
| +--hive-partition-key+:: |
| Used for static partitioning filter. The partitioning key should be of |
| type STRING. There can be only one static partitioning key. |
| Please see the discussion about +--hcatalog-partition-keys+ and |
| +--hcatalog-partition-values+ options. |
| |
| +--hive-partition-value+:: |
| The value associated with the partition. |
| Please see the discussion about +--hcatalog-partition-keys+ and |
| +--hcatalog-partition-values+ options. |
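| |
| For illustration, a single-level static partition can still be specified |
| with the reused Hive options (the connection details below are |
| placeholders); this is equivalent to passing one key/value pair through |
| +--hcatalog-partition-keys+ and +--hcatalog-partition-values+: |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:mysql://db.example.com/sales \ |
|   --table txn \ |
|   --hcatalog-table txn \ |
|   --hive-partition-key cust_id \ |
|   --hive-partition-value C1001 |
| ---- |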
| |
| Direct Mode Support |
| ^^^^^^^^^^^^^^^^^^^ |
| |
| HCatalog integration in Sqoop has been enhanced to support direct mode |
| connectors (high performance connectors specific to a database). The |
| Netezza direct mode connector has been enhanced to take advantage of this |
| feature. |
| |
| IMPORTANT: Only the Netezza direct mode connector is currently enabled to |
| work with HCatalog. |
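| |
| A minimal sketch of a direct mode HCatalog export with the Netezza |
| connector follows; the host, port, database, credentials, and table names |
| are placeholders, and the generic Sqoop +--direct+ option selects the |
| direct mode connector: |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop export \ |
|   --connect jdbc:netezza://nz-host:5480/sales \ |
|   --username someuser -P \ |
|   --table TXN \ |
|   --hcatalog-table txn \ |
|   --direct |
| ---- |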
| |
| Unsupported Sqoop Options |
| ^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Unsupported Sqoop Hive Import Options |
| +++++++++++++++++++++++++++++++++++++ |
| |
| The following Sqoop Hive import options are not supported with HCatalog jobs. |
| |
| * +--hive-import+ |
| * +--hive-overwrite+ |
| |
| Unsupported Sqoop Export and Import Options |
| +++++++++++++++++++++++++++++++++++++++++++ |
| |
| The following Sqoop export and import options are not supported with HCatalog jobs. |
| |
| * +--export-dir+ |
| * +--target-dir+ |
| * +--warehouse-dir+ |
| * +--append+ |
| * +--as-sequencefile+ |
| * +--as-avrodatafile+ |
| * +--as-parquetfile+ |
| |
| Ignored Sqoop Options |
| ^^^^^^^^^^^^^^^^^^^^^ |
| |
| The following options are ignored with HCatalog jobs. |
| |
| * All input delimiter options are ignored. |
| |
| * Output delimiters are generally ignored unless either |
| +--hive-drop-import-delims+ or +--hive-delims-replacement+ is used. When the |
| +--hive-drop-import-delims+ or +--hive-delims-replacement+ option is |
| specified, all +CHAR+ type database table columns will be post-processed |
| to either remove or replace the delimiters, respectively. See +Delimited Text |
| Formats and Field and Line Delimiter Characters+ below. This is only needed |
| if the HCatalog table uses text formats. |
| |
| Automatic Table Creation |
| ~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| One of the key features of Sqoop is to manage and create the table metadata |
| when importing into Hadoop. HCatalog import jobs also provide this |
| feature with the option +--create-hcatalog-table+. Furthermore, one of the |
| important benefits of the HCatalog integration is to provide storage |
| agnosticism to Sqoop data movement jobs. To provide that feature, |
| HCatalog import jobs offer an option that lets a user specify the |
| storage format for the created table. |
| |
| The option +--create-hcatalog-table+ is used as an indicator that a table |
| has to be created as part of the HCatalog import job. If the option |
| +--create-hcatalog-table+ is specified and the table exists, then the |
| table creation will fail and the job will be aborted. |
| |
| The option +--hcatalog-storage-stanza+ can be used to specify the storage |
| format of the newly created table. The default value for this option is |
| +stored as rcfile+. The value specified for this option is assumed to be a |
| valid Hive storage format expression. It will be appended to the +create table+ |
| command generated by the HCatalog import job as part of automatic table |
| creation. Any error in the storage stanza will cause the table creation to |
| fail and the import job will be aborted. |
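| |
| As an illustration, the following sketch creates the target HCatalog table |
| during the import and stores it as ORC instead of the default RCFile. The |
| connection details are placeholders, and +stored as orc+ assumes a Hive |
| version that supports the ORC format: |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:postgresql://db.example.com/sales \ |
|   --table TXN \ |
|   --hcatalog-table txn \ |
|   --create-hcatalog-table \ |
|   --hcatalog-storage-stanza "stored as orc" |
| ---- |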
| |
| Any additional resources needed to support the storage format referenced in |
| the option +--hcatalog-storage-stanza+ should be provided to the job either |
| by placing them in +$HIVE_HOME/lib+ or by providing them in |
| +HADOOP_CLASSPATH+ and via the +-libjars+ option. |
| |
| If the option +--hive-partition-key+ is specified, then the value of this |
| option is used as the partitioning key for the newly created table. Only |
| one partitioning key can be specified with this option. |
| |
| Object names are mapped to their lowercase equivalents, as described below, |
| when mapped to an HCatalog table. This includes the table name (which is |
| the same as the external store table name converted to lower case) and |
| field names. |
| |
| Delimited Text Formats and Field and Line Delimiter Characters |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| HCatalog supports delimited text format as one of the table storage formats. |
| However, when delimited text is used and the imported data has fields that |
| contain those delimiters, the data may be parsed into a different number of |
| fields and records by Hive, thereby losing data fidelity. |
| |
| For this case, one of these existing Sqoop import options can be used: |
| |
| * +--hive-delims-replacement+ |
| |
| * +--hive-drop-import-delims+ |
| |
| If either of these options is provided on import, then any column of type |
| STRING will be processed with the Hive delimiter handling and then written |
| to the HCatalog table. |
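| |
| For example, to replace any embedded delimiter characters in STRING columns |
| with a space before writing to a text-format HCatalog table (the connection |
| details below are placeholders): |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:mysql://db.example.com/sales \ |
|   --table txn \ |
|   --hcatalog-table txn \ |
|   --hive-delims-replacement ' ' |
| ---- |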
| |
| HCatalog Table Requirements |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The HCatalog table should be created before using it as part of a Sqoop job |
| if the default table creation options (with the optional storage stanza) are |
| not sufficient. All storage formats supported by HCatalog can be used when |
| creating HCatalog tables. This allows the feature to readily adopt new |
| storage formats that come into the Hive project, such as ORC files. |
| |
| Support for Partitioning |
| ~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The Sqoop HCatalog feature supports the following table types: |
| |
| * Unpartitioned tables |
| |
| * Partitioned tables with a static partitioning key specified |
| |
| * Partitioned tables with dynamic partition keys from the database |
| result set |
| |
| * Partitioned tables with a combination of a static key and additional |
| dynamic partitioning keys |
| |
| Schema Mapping |
| ~~~~~~~~~~~~~~ |
| |
| Sqoop currently does not support column name mapping. However, the user |
| is allowed to override the type mapping. Type mapping loosely follows |
| the Hive type mapping already present in Sqoop, except that the SQL types |
| FLOAT and REAL are mapped to the HCatalog type float (in the Sqoop type |
| mapping for Hive, these two are mapped to double). Type mapping is primarily |
| used for checking the correctness of column definitions and can be |
| overridden with the +--map-column-hive+ option. |
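| |
| For example, assuming a hypothetical REAL column named +amount+, the default |
| float mapping could be overridden to double as follows (connection details |
| are placeholders): |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:postgresql://db.example.com/sales \ |
|   --table txn \ |
|   --hcatalog-table txn \ |
|   --map-column-hive amount=double |
| ---- |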
| |
| All types except binary are assignable to a String type. |
| |
| Any field of a number type (tinyint, smallint, int, bigint, decimal, |
| float, and double) is assignable to another field of any number type during |
| exports and imports. Depending on the precision and scale of the target type |
| of the assignment, truncation can occur. |
| |
| Furthermore, date/time/timestamp columns are mapped to the Hive |
| date/timestamp types (the full date/time/timestamp representation). |
| Date/time/timestamp columns can also be mapped to the Hive bigint type, in |
| which case the value will be the number of milliseconds since the epoch. |
| |
| BLOBs and CLOBs are only supported for imports. When imported, BLOB/CLOB |
| objects are stored in a Sqoop-specific format, and knowledge of this format |
| is needed to process these objects in a Pig/Hive job or another MapReduce |
| job. |
| |
| Database column names are mapped to their lowercase equivalents when mapped |
| to the HCatalog fields. Currently, case-sensitive database object names are |
| not supported. |
| |
| Projection of a set of columns from a table to an HCatalog table or loading |
| to a column projection is allowed, subject to table constraints. The dynamic |
| partitioning columns, if any, must be part of the projection when importing |
| data into HCatalog tables. |
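| |
| As a sketch, for a table partitioned on +cust_id+ (as in the example section |
| below), a dynamic partitioning import must include the partition column in |
| the column projection: |
| |
| ---- |
| $SQOOP_HOME/bin/sqoop import \ |
|   --connect jdbc:postgresql://db.example.com/sales \ |
|   --table txn \ |
|   --columns "txn_date,amount,store_id,cust_id" \ |
|   --hcatalog-table txn |
| ---- |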
| |
| Dynamic partitioning fields should be mapped to database columns that are |
| defined with the NOT NULL attribute (although this is not enforced during |
| schema mapping). A null value during import for a dynamic partitioning |
| column will abort the Sqoop job. |
| |
| Support for HCatalog Data Types |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| All the primitive Hive types that are part of Hive 0.13 are supported. |
| Complex HCatalog types are currently not supported. |
| |
| BLOB/CLOB database types are only supported for imports. |
| |
| Providing Hive and HCatalog Libraries for the Sqoop Job |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| With the support for HCatalog added to Sqoop, any HCatalog job depends on a |
| set of jar files being available both on the Sqoop client host and where the |
| MapReduce tasks run. To run HCatalog jobs, the environment variable |
| +HADOOP_CLASSPATH+ must be set up as shown below before launching the Sqoop |
| HCatalog jobs. |
| |
| +HADOOP_CLASSPATH=$(hcat -classpath)+ |
| +export HADOOP_CLASSPATH+ |
| |
| |
| The necessary HCatalog dependencies will be copied to the distributed cache |
| automatically by the Sqoop job. |
| |
| Examples |
| ~~~~~~~~ |
| |
| Create an HCatalog table, such as the following (note that the partition |
| column +cust_id+ must not also appear in the column list): |
| |
| +hcat -e "create table txn(txn_date string, amount float, store_id int) |
| partitioned by (cust_id string) stored as rcfile;"+ |
| |
| |
| Then Sqoop import and export of the "txn" HCatalog table can be invoked as |
| follows: |
| |
| Import |
| ^^^^^^ |
| |
| +$SQOOP_HOME/bin/sqoop import --connect <jdbc-url> --table <table-name> --hcatalog-table txn <other sqoop options>+ |
| |
| Export |
| ^^^^^^ |
| |
| +$SQOOP_HOME/bin/sqoop export --connect <jdbc-url> --table <table-name> --hcatalog-table txn <other sqoop options>+ |