////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Basic Usage
-----------

With Sqoop, you can _import_ data from a relational database system or a
mainframe into HDFS. The input to the import process is either a database
table or a set of mainframe datasets. For databases, Sqoop will read the
table row-by-row
into HDFS. For mainframe datasets, Sqoop will read records from each mainframe
dataset into HDFS. The output of this import process is a set of files
containing a copy of the imported table or datasets.
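For example, a basic import of a single table might be invoked as follows;
the connection string, credentials, and table name here are placeholders
to replace with your own:

----
$ sqoop import \
    --connect jdbc:mysql://db.example.com/corp \
    --username someuser -P \
    --table EMPLOYEES
----
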
The import process is performed in parallel. For this reason, the
output will be in multiple files. These files may be delimited text
files (for example, with commas or tabs separating each field), or
binary Avro or SequenceFiles containing serialized record data.
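To illustrate, the degree of parallelism and the on-disk format can be
selected with arguments such as the following (a sketch; the database and
table names are placeholders):

----
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --num-mappers 8 \
    --as-sequencefile
----
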
A by-product of the import process is a generated Java class which
can encapsulate one row of the imported table. This class is used
during the import process by Sqoop itself. The Java source code for
this class is also provided to you, for use in subsequent MapReduce
processing of the data. This class can serialize and deserialize data
to and from the SequenceFile format. It can also parse the
delimited-text form of a record. These abilities allow you to quickly
develop MapReduce applications that use the HDFS-stored records in
your processing pipeline. You are also free to parse the delimited
record data yourself, using any other tools you prefer.
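If you want this class without performing an import (for example, to
regenerate it later), the +sqoop-codegen+ tool can produce it on its own.
A sketch, again with placeholder connection details:

----
$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES
----
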
After manipulating the imported records (for example, with MapReduce
or Hive) you may have a result data set which you can then _export_
back to the relational database. Sqoop's export process will read
a set of delimited text files from HDFS in parallel, parse them into
records, and insert them as new rows in a target database table, for
consumption by external applications or users.
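For example, an export back into a database table might look like the
following; the table and directory names are placeholders, and the target
table must already exist in the database:

----
$ sqoop export --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEE_SUMMARY \
    --export-dir /results/employee_data
----
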
Sqoop includes some other commands which allow you to inspect the
database you are working with. For example, you can list the available
database schemas (with the +sqoop-list-databases+ tool) and tables
within a schema (with the +sqoop-list-tables+ tool). Sqoop also
includes a primitive SQL execution shell (the +sqoop-eval+ tool).
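For example, with placeholder connection details:

----
$ sqoop list-databases --connect jdbc:mysql://db.example.com/
$ sqoop list-tables --connect jdbc:mysql://db.example.com/corp
$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM EMPLOYEES LIMIT 10"
----
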
Most aspects of the import, code generation, and export processes can
be customized. For databases, you can control the specific row range or
columns imported. You can specify particular delimiters and escape characters
for the file-based representation of the data, as well as the file format
used. You can also control the class or package names used in
generated code. Subsequent sections of this document explain how to
specify these and other arguments to Sqoop.
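As a sketch of the kinds of arguments involved (the column names, filter
condition, delimiter, and class name below are illustrative):

----
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --columns "emp_id,first_name,salary" \
    --where "salary > 40000" \
    --fields-terminated-by '\t' \
    --class-name com.example.Employee
----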