////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Importing Data Into HBase
^^^^^^^^^^^^^^^^^^^^^^^^^
In addition to HDFS and Hive, Sqoop can import records into a table
in HBase.
By specifying +\--hbase-table+, you instruct Sqoop to import
to a table in HBase rather than a directory in HDFS. Sqoop will
import data to the table specified as the argument to +\--hbase-table+.
Each row of the input table will be transformed into an HBase
+Put+ operation to a row of the output table. The key for each row is
taken from a column of the input. By default Sqoop will use the split-by
column as the row key column. If that is not specified, it will try to
identify the primary key column, if any, of the source table. You can
manually specify the row key column with +\--hbase-row-key+. Each output
column will be placed in the same column family, which must be specified
with +\--column-family+.

NOTE: This function is incompatible with direct import (parameter
+\--direct+).

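
As a minimal sketch of the options described above, the following shell
function assembles an HBase import. The JDBC URL, the source table
+EMPLOYEES+, the HBase table +employees+, the column family +info+, and the
row key column +EMPLOYEE_ID+ are hypothetical placeholders. The function name
and the +SQOOP+ variable are conventions of this example only, not part of
Sqoop. By default the function only prints the command it would run.

```shell
# Prints the command by default (dry run); export SQOOP=sqoop to execute it.
hbase_import() {
  ${SQOOP:-echo sqoop} import \
    --connect "$1" \
    --table EMPLOYEES \
    --hbase-table employees \
    --column-family info \
    --hbase-row-key EMPLOYEE_ID
}

# Hypothetical JDBC URL, used only for illustration.
hbase_import jdbc:mysql://db.example.com/corp
```

Omitting +\--hbase-row-key+ here would leave the choice of row key to the
split-by column or the primary key, as described above.
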
If the input table has a composite key, +\--hbase-row-key+ must be
given as a comma-separated list of the composite key attributes.
In this case, the row key for the HBase row is generated by combining
the values of the composite key attributes, using an underscore as the
separator.

NOTE: Sqoop import for a table with a composite key works only if the
+\--hbase-row-key+ parameter has been specified.

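
To make the composite key rule concrete, suppose a hypothetical source table
is keyed on +ORDER_ID+ and +LINE_NO+ and is imported with
+\--hbase-row-key ORDER_ID,LINE_NO+. The helper below is not Sqoop code; it
only illustrates the shape of the row key that results from joining the key
values with an underscore.

```shell
# Illustration only: join composite key values with "_" the way the
# text above describes. compose_row_key is a hypothetical helper name.
compose_row_key() {
  # In a subshell, set IFS so that "$*" joins the arguments with "_".
  (IFS=_; printf '%s\n' "$*")
}

compose_row_key 1001 3   # prints 1001_3
```

Because the separator is a plain underscore, key values that themselves
contain underscores can make two different composite keys look alike in
HBase, which is worth checking before relying on this layout.
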
If the target table and column family do not exist, the Sqoop job will
exit with an error. You should create the target table and column family
before running an import. If you specify +\--hbase-create-table+, Sqoop
will create the target table and column family if they do not exist,
using the default parameters from your HBase configuration.
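
A sketch of an import that lets Sqoop create a missing target, assuming the
same hypothetical table and column family names as placeholders
(+EMPLOYEES+, +employees+, +info+). The function name and the +SQOOP+
variable belong to this example only; by default the command is printed
rather than run.

```shell
# Prints the command by default (dry run); export SQOOP=sqoop to execute it.
hbase_import_create() {
  ${SQOOP:-echo sqoop} import \
    --connect "$1" \
    --table EMPLOYEES \
    --hbase-table employees \
    --column-family info \
    --hbase-create-table
}

# Hypothetical JDBC URL, used only for illustration.
hbase_import_create jdbc:mysql://db.example.com/corp
```

If the table needs non-default settings, creating it in HBase beforehand and
leaving out +\--hbase-create-table+ is the safer route.
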
Sqoop currently serializes all values to HBase by converting each field
to its string representation (as if you were importing to HDFS in text
mode), and then inserts the UTF-8 bytes of this string in the target
cell. Sqoop skips any row in which every column other than the row key
column is null.
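
One consequence of this string serialization is that a numeric value is
stored as the bytes of its digits rather than as a fixed-width binary
number. The helper below is a hypothetical illustration, not part of Sqoop:
it prints the UTF-8 bytes of a value's string form in hexadecimal. The exact
string Sqoop produces for a given SQL type is not covered here.

```shell
# Illustration only: show the bytes of a value's string form as hex.
cell_bytes() {
  # od prints one hex pair per byte; tr strips the padding and line breaks.
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

cell_bytes 42    # prints 3432 (the characters "4" and "2")
cell_bytes 3.5   # prints 332e35
```

Readers of the HBase table therefore need to parse cell contents as text
before treating them as numbers.
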
By default Sqoop will retain the previously imported value for columns
updated to null during incremental imports. This can be changed to
delete all previous versions of the column by using
+\--hbase-null-incremental-mode delete+.
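
A sketch of an incremental import that deletes earlier versions of columns
that have become null. The table names, the +LAST_UPDATED+ check column, and
the timestamp are hypothetical placeholders, and the function name and
+SQOOP+ variable are conventions of this example only. By default the
command is printed rather than run.

```shell
# Prints the command by default (dry run); export SQOOP=sqoop to execute it.
hbase_incremental_import() {
  ${SQOOP:-echo sqoop} import \
    --connect "$1" \
    --table EMPLOYEES \
    --hbase-table employees \
    --column-family info \
    --hbase-row-key EMPLOYEE_ID \
    --incremental lastmodified \
    --check-column LAST_UPDATED \
    --last-value "$2" \
    --hbase-null-incremental-mode delete
}

# Hypothetical JDBC URL and timestamp, used only for illustration.
hbase_incremental_import jdbc:mysql://db.example.com/corp "2024-01-01 00:00:00"
```
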
To decrease the load on HBase, Sqoop can bulk load data instead of writing
it directly. To enable bulk loading, specify +\--hbase-bulkload+.
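
A sketch of a bulk-load import, again with hypothetical table and column
family names (+EMPLOYEES+, +employees+, +info+). The function name and the
+SQOOP+ variable belong to this example only; by default the command is
printed rather than run.

```shell
# Prints the command by default (dry run); export SQOOP=sqoop to execute it.
hbase_bulk_import() {
  ${SQOOP:-echo sqoop} import \
    --connect "$1" \
    --table EMPLOYEES \
    --hbase-table employees \
    --column-family info \
    --hbase-bulkload
}

# Hypothetical JDBC URL, used only for illustration.
hbase_bulk_import jdbc:mysql://db.example.com/corp
```
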