blob: 11e3c6e4659d525c59722e0968dfbaa2b701911b [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
Developing Sqoop Plugins
------------------------
Sqoop allows users to develop their own plugins. Users can develop their
plugins as separate jars, deploy them in $SQOOP_LIB and register with
sqoop. Infact, Sqoop architecture is a plugin based architecture and all
the internal tools like import, export, merge etc are also supported as
tool plugins. Users can also develop their own custom tool plugins. Once
deployed and registered with sqoop, these plugins will work like any
other internal tool. They will also get listed in the tools when you run
+sqoop help+ command.
BaseSqoopTool - Base class for User defined Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
BaseSqoopTool is the base class for all Sqoop Tools. If you want to develop
a cusom tool, you need to inherit your tool from BaseSqoopTool and override
the following methods:
- +public int run(SqoopOptions options)+ : This is the main method for the
tool and acts as entry point for execution for your custom tool.
- +public void configureOptions(ToolOptions toolOptions)+ : Configures the
command-line arguments we expect to receive. You can also specify the
description of all the command line arguments. When a user executes
+sqoop help <your tool>+, the information which is provided in this
method will be output to the user.
- +public void applyOptions(CommandLine in, SqoopOptions out)+ : parses all
options and populates SqoopOptions which acts as a data transfer object
during the complete execution.
- +public void validateOptions(SqoopOptions options)+ : provide any
validations required for your options.
Supporting User defined custom options
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sqoop parses the arguments which are passed by users and are stored in
SqoopOptions object. This object then acts as data transfer object. This
object is passed to various phases of processing like preprocessing before
running the actual MapReduce, MapReduce phase and even postprocessing phase.
This class has a lot of members. The options are parsed and populated in the
respective member. Now lets say that a user creates a new user defined tool
and this tool has some new options which don't map to any of the existing
members of the SqoopOptions class. Either user can add a new member to
SqoopOption class which means users will have to make changes in sqoop and
compile it, which mght not be possible always for all users. Other option
is to use +extraArgs+ member. This is a string array which contains the
options for thirdparty tools which could be passed directly to the third
party tool like mysqldump etc. This array string needs parsing every time
to understand the parameters.
The most elegant way of supporting custom options for user defined tool is
+customToolOptions+ map. This is a map member of SqoopOption class.
Developer can parse the user defined parameters and populate this map with
appropriate key/value pairs. When SqoopOption object is passed to various
phases of processing these values will be readily available and parsing is
not required for every access.
Lets take an example to understand the usage better. Lets say you want to
develop a custom tool to merge two hive tables and it will take the following
parameters :
- +--hive-updates-database+
- +--hive-updates-table+
- +--merge-keys+
- +--retain-updates-tbl+
None of these options are available in SqoopOption object. Tool Developer
can override the +applyOptions+ method and in this method the user options
can be parsed and populated in the customToolOptions map. Once that is done,
SqoopOption object can be passed throughout program and these values will
be available for users.
These option names will be stored as keys and the values passed by users
will be stored as values. Lets define these options as static finals :
....
public static final String MERGE_KEYS = "merge-keys";
public static final String HIVE_UPDATES_TABLE = "hive-updates-table";
public static final String HIVE_UPDATES_TABLE_DB = "hive-updates-database";
public static final String RETAIN_UPDATES_TBL = "retain-updates-tbl";
....
A sample applyOptions example which parses the above said options and
populates the customToolOptions map is below :
....
public void applyOptions(CommandLine in, SqoopOptions out)
throws InvalidOptionsException {
if (in.hasOption(VERBOSE_ARG)) {
LoggingUtils.setDebugLevel();
log.debug("Enabled debug logging.");
}
if (in.hasOption(HELP_ARG)) {
ToolOptions toolOpts = new ToolOptions();
configureOptions(toolOpts);
printHelp(toolOpts);
throw new InvalidOptionsException("");
}
Map<String, String> mergeOptionsMap = new HashMap<String, String>();
if (in.hasOption(MERGE_KEYS)) {
mergeOptionsMap.put(MERGE_KEYS, in.getOptionValue(MERGE_KEYS));
}
if (in.hasOption(HIVE_UPDATES_TABLE)) {
mergeOptionsMap.put(HIVE_UPDATES_TABLE,
in.getOptionValue(HIVE_UPDATES_TABLE));
}
if (in.hasOption(HIVE_UPDATES_TABLE_DB)) {
mergeOptionsMap.put(HIVE_UPDATES_TABLE_DB,
in.getOptionValue(HIVE_UPDATES_TABLE_DB));
}
if (in.hasOption(RETAIN_UPDATES_TBL)) {
mergeOptionsMap.put(RETAIN_UPDATES_TBL, "");
}
if (in.hasOption(HIVE_TABLE_ARG)) {
out.setHiveTableName(in.getOptionValue(HIVE_TABLE_ARG));
}
if (in.hasOption(HIVE_DATABASE_ARG)) {
out.setHiveDatabaseName(in.getOptionValue(HIVE_DATABASE_ARG));
}
if (out.getCustomToolOptions() == null) {
out.setCustomToolOptions(mergeOptionsMap);
}
}
....
ToolPlugin - Base class for the plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the tool is developed, you need to wrap it with a plugin class and
register that plugin class with Sqoop. Your plugin class should extend from
+org.apache.sqoop.tool.ToolPlugin+ and override +getTools()+ method.
Example: Lets say that you have developed a tool called hive-merge which
merges 2 hive tables and your Tool class is HiveMergeTool, the plugin
implementation will look like
....
public class HiveMergePlugin extends ToolPlugin {
@Override
public List<ToolDesc> getTools() {
return Collections
.singletonList(new ToolDesc(
"hive-merge",
HiveMergeTool.class,
"This tool is used to perform the merge data from a tmp hive table into a destination hive table."));
}
}
....
Registering User defined plugin with Sqoop
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Finally you need to copy your plugin jar to $SQOOP_LIB directory and register the
plugin class with sqoop in sqoop-site.xml :
....
<property>
<name>sqoop.tool.plugins</name>
<value>com.expedia.sqoop.tool.HiveMergePlugin</value>
<description>A comma-delimited list of ToolPlugin implementations
which are consulted, in order, to register SqoopTool instances which
allow third-party tools to be used.
</description>
</property>
....