////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Basic Usage
-----------

With Sqoop, you can _import_ data from a relational database system or a
mainframe into HDFS. The input to the import process is either a database
table or a set of mainframe datasets. For databases, Sqoop will read the
table row-by-row
into HDFS. For mainframe datasets, Sqoop will read records from each mainframe
dataset into HDFS. The output of this import process is a set of files
containing a copy of the imported table or datasets.
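For example, a basic import of a single table might be invoked as follows;
the connection string, credentials, and table name here are placeholders
to replace with your own:

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser -P \
    --table EMPLOYEES
----
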
The import process is performed in parallel. For this reason, the
output will be in multiple files. These files may be delimited text
files (for example, with commas or tabs separating each field), or
binary Avro or SequenceFiles containing serialized record data.
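To illustrate, the degree of parallelism and the on-disk format can be
selected with arguments such as the following (a sketch; the database and
table names are placeholders):

----
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --num-mappers 8 \
    --as-sequencefile
----
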
A by-product of the import process is a generated Java class which
can encapsulate one row of the imported table. This class is used
during the import process by Sqoop itself. The Java source code for
this class is also provided to you, for use in subsequent MapReduce
processing of the data. This class can serialize and deserialize data
to and from the SequenceFile format. It can also parse the
delimited-text form of a record. These abilities allow you to quickly
develop MapReduce applications that use the HDFS-stored records in
your processing pipeline. You are also free to parse the delimited
record data yourself, using any other tools you prefer.
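If you want this class without performing an import (for example, to
regenerate it later), the +sqoop-codegen+ tool can produce it on its own.
A sketch, again with placeholder connection details:

----
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES
----
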
After manipulating the imported records (for example, with MapReduce
or Hive) you may have a result data set which you can then _export_
back to the relational database. Sqoop's export process will read
a set of delimited text files from HDFS in parallel, parse them into
records, and insert them as new rows in a target database table, for
consumption by external applications or users.
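For example, an export back into a database table might look like the
following; the table and directory names are placeholders, and the target
table must already exist in the database:

----
$ sqoop export --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEE_SUMMARY \
    --export-dir /results/employee_data
----
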
Sqoop includes some other commands which allow you to inspect the
database you are working with. For example, you can list the available
database schemas (with the +sqoop-list-databases+ tool) and tables
within a schema (with the +sqoop-list-tables+ tool). Sqoop also
includes a primitive SQL execution shell (the +sqoop-eval+ tool).
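For example, with placeholder connection details:

----
$ sqoop list-databases --connect jdbc:mysql://db.example.com/
$ sqoop list-tables --connect jdbc:mysql://db.example.com/corp
$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM EMPLOYEES LIMIT 10"
----
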
Most aspects of the import, code generation, and export processes can
be customized. For databases, you can control the specific row range or
columns imported. You can specify particular delimiters and escape characters
for the file-based representation of the data, as well as the file format
used. You can also control the class or package names used in
generated code. Subsequent sections of this document explain how to
specify these and other arguments to Sqoop.
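As a sketch of the kinds of arguments involved (the column names, filter
condition, delimiter, and class name below are illustrative):

----
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --columns "emp_id,first_name,salary" \
    --where "salary > 40000" \
    --fields-terminated-by '\t' \
    --class-name com.example.Employee
----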