 ////
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
 ////


 Importing Data Into Accumulo
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Sqoop supports importing records into a table in Accumulo.

 By specifying +\--accumulo-table+, you instruct Sqoop to import
 to a table in Accumulo rather than a directory in HDFS; the argument
 to +\--accumulo-table+ names the target table.
 Each row of the input table will be transformed into an Accumulo
 +Mutation+ operation to a row of the output table. The key for each row is
 taken from a column of the input. By default, Sqoop will use the split-by
 column as the row key column. If that is not specified, it will try to
 identify the primary key column, if any, of the source table. You can
 manually specify the row key column with +\--accumulo-row-key+. Each output
 column will be placed in the same column family, which must be specified
 with +\--accumulo-column-family+.
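
As a minimal sketch of the options described above, the following script
assembles an import that names the target table, the column family, and the
row key column. The JDBC URL, the source table +EMPLOYEES+, the Accumulo table
+employees+, the column family +info+, and the row key column +id+ are all
hypothetical placeholders, and the Accumulo connection parameters are left out
for brevity, so treat this as an illustration rather than a complete command.

```shell
#!/bin/sh
# Hypothetical source database and names; replace them with your own.
# Accumulo connection parameters are omitted to keep the sketch short.
set -- import \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEES \
  --accumulo-table employees \
  --accumulo-column-family info \
  --accumulo-row-key id

# Dry run by default: print the command for review.
# Set RUN=1 in the environment to execute it.
if [ "${RUN:-0}" = "1" ]; then
  sqoop "$@"
else
  echo "sqoop $*"
fi
```

If +\--accumulo-row-key+ were dropped from this sketch, Sqoop would fall back
to the split-by column and then to the primary key of the source table.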

 NOTE: This function is incompatible with direct import (parameter
 +\--direct+), and cannot be used in the same operation as an HBase import.

 If the target table does not exist, the Sqoop job will
 exit with an error unless the +\--accumulo-create-table+ parameter is
 specified. If you do not use that parameter, create the target table
 before running an import.

 Sqoop currently serializes all values to Accumulo by converting each field
 to its string representation (as if you were importing to HDFS in text
 mode), and then inserts the UTF-8 bytes of this string in the target
 cell.

 By default, no visibility is applied to the resulting cells in Accumulo,
 so the data will be visible to any Accumulo user. Use the
 +\--accumulo-visibility+ parameter to specify a visibility token to
 apply to all rows in the import job.
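
The table creation and visibility options can be combined in one job. The
sketch below assumes a hypothetical visibility token +private+ and the same
placeholder database and table names as any other example; none of these
values come from a real deployment, and the Accumulo connection parameters are
again omitted.

```shell
#!/bin/sh
# Hypothetical names and visibility token; replace them with your own.
set -- import \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEES \
  --accumulo-table employees \
  --accumulo-column-family info \
  --accumulo-create-table \
  --accumulo-visibility private

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
  sqoop "$@"
else
  echo "sqoop $*"
fi
```

With these options, a missing +employees+ table would be created by the job,
and every imported cell would carry the +private+ visibility instead of being
readable by any Accumulo user.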

 For performance tuning, use the optional +\--accumulo-buffer-size+ and
 +\--accumulo-max-latency+ parameters. See Accumulo's documentation for
 an explanation of the effects of these parameters.

 To connect to an Accumulo instance, you must specify the location
 of a ZooKeeper ensemble using the +\--accumulo-zookeepers+ parameter,
 the name of the Accumulo instance (+\--accumulo-instance+), and the
 username and password to connect with (+\--accumulo-user+ and
 +\--accumulo-password+ respectively).
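
Putting the connection parameters together, an import might look like the
following sketch. The ZooKeeper hosts, the instance name +accumulo+, the user
+sqoop_user+, and the password placeholder are assumptions made for
illustration, as are the database and table names. The comma-separated
+host:port+ form of the ZooKeeper list is also an assumption here, so check it
against your installation.

```shell
#!/bin/sh
# Hypothetical hosts, instance name, user, and password placeholder;
# replace all of them with values from your own environment.
set -- import \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEES \
  --accumulo-table employees \
  --accumulo-column-family info \
  --accumulo-zookeepers zk1.example.com:2181,zk2.example.com:2181 \
  --accumulo-instance accumulo \
  --accumulo-user sqoop_user \
  --accumulo-password REPLACE_ME

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
  sqoop "$@"
else
  echo "sqoop $*"
fi
```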
