////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Saved Jobs
----------
Imports and exports can be performed repeatedly by issuing the same command
multiple times. This is especially common when using the incremental import
capability.
Sqoop allows you to define _saved jobs_ which make this process easier. A
saved job records the configuration information required to execute a
Sqoop command at a later time. The section on the +sqoop-job+ tool
describes how to create and work with saved jobs.
By default, job descriptions are saved to a private repository stored
in +$HOME/.sqoop/+. You can configure Sqoop to instead use a shared
_metastore_, which makes saved jobs available to multiple users across a
shared cluster. Starting the metastore is covered by the section on the
+sqoop-metastore+ tool.
+sqoop-job+
-----------
Purpose
~~~~~~~
include::job-purpose.txt[]
Syntax
~~~~~~
----
$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
----
Although the Hadoop generic arguments must precede any job arguments,
the job arguments can be entered in any order with respect to one
another.
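For example, a Hadoop property passed with the generic +-D+ option has to
appear before the job arguments (the property shown here is only
illustrative):
----
$ sqoop job -D mapreduce.job.queuename=default --list
----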
.Job management options:
[grid="all"]
`---------------------------`------------------------------------------
Argument                     Description
-----------------------------------------------------------------------
+\--create <job-id>+         Define a new saved job with the specified \
                             job-id (name). A second Sqoop \
                             command-line, separated by a +\--+, should \
                             be specified; this defines the saved job.
+\--delete <job-id>+         Delete a saved job.
+\--exec <job-id>+           Given a job defined with +\--create+, run \
                             the saved job.
+\--show <job-id>+           Show the parameters for a saved job.
+\--list+                    List all saved jobs.
-----------------------------------------------------------------------
Creating saved jobs is done with the +\--create+ action. This operation
requires a +\--+ followed by a tool name and its arguments. The tool and
its arguments will form the basis of the saved job. Consider:
----
$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable
----
This creates a job named +myjob+ which can be executed later. The job is not
run. This job is now available in the list of saved jobs:
----
$ sqoop job --list
Available jobs:
myjob
----
We can inspect the configuration of a job with the +show+ action. As the
example below shows, even if the password is stored in the metastore, the
+show+ action redacts its value in the output:
----
$ sqoop job --show myjob
Job: myjob
Tool: import
Options:
----------------------------
direct.import = false
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = mytable
db.password = ********
...
----
And if we are satisfied with it, we can run the job with +exec+:
----
$ sqoop job --exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
...
----
The +exec+ action allows you to override arguments of the saved job
by supplying them after a +\--+. For example, if the database were
changed to require a username, we could specify the username and
password with:
----
$ sqoop job --exec myjob -- --username someuser -P
Enter password:
...
----
.Metastore connection options:
[grid="all"]
`----------------------------`-----------------------------------------
Argument                      Description
-----------------------------------------------------------------------
+\--meta-connect <jdbc-uri>+  Specifies the JDBC connect string used \
                              to connect to the metastore
+\--meta-username <username>+ Specifies the username for the \
                              metastore database
+\--meta-password <password>+ Specifies the password for the \
                              metastore database
-----------------------------------------------------------------------
By default, a private metastore is instantiated in +$HOME/.sqoop+. If
you have configured a hosted metastore with the +sqoop-metastore+
tool, you can connect to it by specifying the +\--meta-connect+
argument. This is a JDBC connect string just like the ones used to
connect to databases for import.
In +conf/sqoop-site.xml+, you can configure
+sqoop.metastore.client.autoconnect.url+ with this address, so you do not have
to supply +\--meta-connect+ to use a remote metastore. This parameter can
also be modified to move the private metastore to a location on your
filesystem other than your home directory.
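For example, a shared metastore URL could be recorded in
+conf/sqoop-site.xml+ roughly as follows (the host, port, and database name
are illustrative; the file uses the usual Hadoop XML configuration format):
----
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop</value>
</property>
----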
If you configure +sqoop.metastore.client.enable.autoconnect+ with the
value +false+, then you must explicitly supply +\--meta-connect+.
If the +\--meta-connect+ option is present, Sqoop will try to connect to the
metastore database specified in this parameter value. It will use the username
and password specified in the +\--meta-username+ and +\--meta-password+
parameters. If they are not present, Sqoop will use an empty username and
password. If the database in the connection string is not supported, Sqoop
will throw an exception.
If the +\--meta-connect+ parameter is not present and the
+sqoop.metastore.client.enable.autoconnect+ configuration parameter is set to
+false+ (the default value is +true+), Sqoop will throw an error, since there
is no applicable metastore implementation.
Job data can be stored in MySQL, PostgreSQL, DB2, SQL Server, and Oracle via
the +\--meta-connect+ argument. The +\--meta-username+ and +\--meta-password+
arguments are necessary if the database containing the saved jobs requires a
username and password. When using any of these implementations, you have to
ensure that the database is online and accessible when Sqoop tries to access
it.
Examples
~~~~~~~~
Listing available jobs in the metastore:
----
$ sqoop job --list --meta-connect jdbc:oracle:thin:@//myhost:1521/ORCLCDB \
    --meta-username ms_user --meta-password ms_password
----
Creating a new job in the metastore:
----
$ sqoop job --create myjob1 --meta-connect jdbc:oracle:thin:@//myhost:1521/ORCLCDB \
    --meta-username ms_user --meta-password ms_password \
    -- import --connect jdbc:mysql://mysqlhost:3306/sqoop --username sqoop \
    --password sqoop --table "TestTable" -m 1
----
Executing an existing job:
----
$ sqoop job --exec myjob1 --meta-connect jdbc:oracle:thin:@//myhost:1521/ORCLCDB \
    --meta-username ms_user --meta-password ms_password
----
Showing the definition of an existing job:
----
$ sqoop job --show myjob2 --meta-connect jdbc:oracle:thin:@//myhost:1521/ORCLCDB \
    --meta-username ms_user --meta-password ms_password
----
Deleting an existing job:
----
$ sqoop job --delete myjob1 --meta-connect jdbc:oracle:thin:@//myhost:1521/ORCLCDB \
    --meta-username ms_user --meta-password ms_password
----
Using an HSQLDB metastore:
----
$ sqoop job --exec myjob --meta-connect jdbc:hsqldb:hsql://localhost:3000/ --meta-username *username* --meta-password *password*
----
.Common options:
[grid="all"]
`---------------------------`------------------------------------------
Argument                     Description
-----------------------------------------------------------------------
+\--help+                    Print usage instructions
+\--verbose+                 Print more information while working
-----------------------------------------------------------------------
Saved jobs and passwords
~~~~~~~~~~~~~~~~~~~~~~~~
The Sqoop metastore is not a secure resource. Multiple users can access
its contents. For this reason, Sqoop does not store passwords in the
metastore. If you create a job that requires a password, you will be
prompted for that password each time you execute the job.
You can enable passwords in the metastore by setting
+sqoop.metastore.client.record.password+ to +true+ in the configuration.
Note that you have to set +sqoop.metastore.client.record.password+ to +true+
if you are executing saved jobs via Oozie because Sqoop cannot prompt the user
to enter passwords while being executed as Oozie tasks.
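A minimal sketch of the corresponding entry in +conf/sqoop-site.xml+,
assuming the standard Hadoop XML configuration format:
----
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>
----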
Saved jobs and incremental imports
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Incremental imports are performed by comparing the values in a _check column_
against a reference value for the most recent import. For example, if the
+\--incremental append+ argument was specified, along with +\--check-column
id+ and +\--last-value 100+, all rows with +id > 100+ will be imported.
If an incremental import is run from the command line, the value which
should be specified as +\--last-value+ in a subsequent incremental import
will be printed to the screen for your reference. If an incremental import is
run from a saved job, this value will be retained in the saved job. Subsequent
runs of +sqoop job \--exec someIncrementalJob+ will continue to import only
newer rows than those previously imported.
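For example, a saved job for such an incremental import could be defined
roughly like this (the connect string, table, and column names are
illustrative); each subsequent +\--exec+ run then picks up the stored last
value automatically:
----
$ sqoop job --create someIncrementalJob -- import \
    --connect jdbc:mysql://example.com/db --table mytable \
    --incremental append --check-column id --last-value 100
----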
+sqoop-metastore+
-----------------
Purpose
~~~~~~~
include::metastore-purpose.txt[]
Syntax
~~~~~~
----
$ sqoop metastore (generic-args) (metastore-args)
$ sqoop-metastore (generic-args) (metastore-args)
----
Although the Hadoop generic arguments must precede any metastore arguments,
the metastore arguments can be entered in any order with respect to one
another.
.Metastore management options:
[grid="all"]
`---------------------------`------------------------------------------
Argument                     Description
-----------------------------------------------------------------------
+\--shutdown+                Shuts down a running metastore instance \
                             on the same machine.
-----------------------------------------------------------------------
Running +sqoop-metastore+ launches a shared HSQLDB database instance on
the current machine. Clients can connect to this metastore and create jobs
which can be shared between users for execution.
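For example, the shared metastore could be started on its host with:
----
$ sqoop metastore
----
and later stopped from the same machine with:
----
$ sqoop metastore --shutdown
----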
The location of the metastore's files on disk is controlled by the
+sqoop.metastore.server.location+ property in +conf/sqoop-site.xml+.
This should point to a directory on the local filesystem.
The metastore is available over TCP/IP. The port is controlled by the
+sqoop.metastore.server.port+ configuration parameter, and defaults to 16000.
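For example, both server properties could be set in +conf/sqoop-site.xml+
along these lines (the storage directory is illustrative):
----
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/var/lib/sqoop/metastore</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>
----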
Clients should connect to the metastore by specifying
+sqoop.metastore.client.autoconnect.url+ or +\--meta-connect+ with a
JDBC-URI string. For example,
+jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop+.
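For example, a client on any machine could list the shared jobs with (host
name as above):
----
$ sqoop job --list --meta-connect jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop
----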
Alternatively, one can start an RDBMS to host the metastore and pass the
connection parameters to Sqoop. This metastore may be hosted on a machine
within the Hadoop cluster, or elsewhere on the network. Sqoop supports the
following database implementations: MySQL, Oracle, PostgreSQL, Microsoft SQL
Server, and DB2.
+sqoop-merge+
-------------
Purpose
~~~~~~~
include::merge-purpose.txt[]
Syntax
~~~~~~
----
$ sqoop merge (generic-args) (merge-args)
$ sqoop-merge (generic-args) (merge-args)
----
Although the Hadoop generic arguments must precede any merge arguments,
the merge arguments can be entered in any order with respect to one
another.
.Merge options:
[grid="all"]
`---------------------------`------------------------------------------
Argument                     Description
-----------------------------------------------------------------------
+\--class-name <class>+      Specify the name of the record-specific \
                             class to use during the merge job.
+\--jar-file <file>+         Specify the name of the jar to load the \
                             record class from.
+\--merge-key <col>+         Specify the name of a column to use as \
                             the merge key.
+\--new-data <path>+         Specify the path of the newer dataset.
+\--onto <path>+             Specify the path of the older dataset.
+\--target-dir <path>+       Specify the target path for the output \
                             of the merge job.
-----------------------------------------------------------------------
The +merge+ tool runs a MapReduce job that takes two directories as
input: a newer dataset, and an older one. These are specified with
+\--new-data+ and +\--onto+ respectively. The output of the MapReduce
job will be placed in the directory in HDFS specified by +\--target-dir+.
When merging the datasets, it is assumed that there is a unique primary
key value in each record. The column for the primary key is specified
with +\--merge-key+. Multiple rows in the same dataset should not
have the same primary key, or else data loss may occur.
To parse the dataset and extract the key column, the auto-generated
class from a previous import must be used. You should specify the
class name and jar file with +\--class-name+ and +\--jar-file+. If
this is not available, you can recreate the class using the +codegen+
tool.
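For example, the class and jar could be regenerated roughly as follows (the
connect string and table name are illustrative), and the resulting jar and
class name then passed to +\--jar-file+ and +\--class-name+:
----
$ sqoop codegen --connect jdbc:mysql://example.com/db --table mytable
----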
The merge tool is typically run after an incremental import with the
date-last-modified mode (+sqoop import --incremental lastmodified ...+).
Supposing two incremental imports were performed, where some older data
is in an HDFS directory named +older+ and newer data is in an HDFS
directory named +newer+, these could be merged like so:
----
$ sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id
----
This would run a MapReduce job where the value in the +id+ column
of each row is used to join rows; rows in the +newer+ dataset will
be used in preference to rows in the +older+ dataset.
This can be used with SequenceFile-, Avro-, and text-based
incremental imports. The file types of the newer and older datasets
must be the same.