| <noautolink> |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| ----- |
| |
| ---+!! Oozie Sqoop Action Extension |
| |
| %TOC% |
| |
| ---++ Sqoop Action |
| |
| *IMPORTANT:* The Sqoop action requires Apache Hadoop 1.x or 2.x. |
| |
| The =sqoop= action runs a Sqoop job. |
| |
| The workflow job will wait until the Sqoop job completes before |
| continuing to the next action. |
| |
| To run the Sqoop job, configure the =sqoop= action with the |
| =job-tracker= and =name-node= elements, the Sqoop =command= or =arg= |
| elements, and any required configuration. |
| |
| A =sqoop= action can be configured to create or delete HDFS directories |
| before starting the Sqoop job. |
| |
| Sqoop configuration can be specified in a file, using the =job-xml= |
| element, and inline, using the =configuration= element. |
| |
| Oozie EL expressions can be used in the inline configuration. Property |
| values specified in the =configuration= element override values specified |
| in the =job-xml= file. |
| |
| Note that Hadoop =mapred.job.tracker= and =fs.default.name= properties |
| must not be present in the inline configuration. |
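| |
| As a minimal sketch of the override rule, assume a hypothetical =sqoop-site.xml= file that sets |
| =mapred.compress.map.output= to =false=. The inline property below would take precedence over it, and its |
| value is resolved through EL from a hypothetical =compressMapOutput= workflow parameter: |
| |
| <verbatim> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.2"> |
|     <job-tracker>${jobTracker}</job-tracker> |
|     <name-node>${nameNode}</name-node> |
|     <!-- hypothetical file, assumed to set mapred.compress.map.output=false --> |
|     <job-xml>sqoop-site.xml</job-xml> |
|     <configuration> |
|         <property> |
|             <!-- inline value overrides the value from the job-xml file --> |
|             <name>mapred.compress.map.output</name> |
|             <value>${compressMapOutput}</value> |
|         </property> |
|     </configuration> |
|     <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command> |
| </sqoop> |
| </verbatim> |
| |
| The =jobTracker= and =nameNode= parameter names are also assumptions; they would normally be supplied in the |
| job properties at submission time. |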
| |
| As with Hadoop =map-reduce= jobs, it is possible to add files and |
| archives in order to make them available to the Sqoop job. Refer to the |
| [[WorkflowFunctionalSpec#FilesArchives][Adding Files and Archives for the Job]] |
| section for more information about this feature. |
| |
| *Syntax:* |
| |
| <verbatim> |
| <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1"> |
| ... |
| <action name="[NODE-NAME]"> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.2"> |
| <job-tracker>[JOB-TRACKER]</job-tracker> |
| <name-node>[NAME-NODE]</name-node> |
| <prepare> |
| <delete path="[PATH]"/> |
| ... |
| <mkdir path="[PATH]"/> |
| ... |
| </prepare> |
| <job-xml>[JOB-XML-FILE]</job-xml> |
| <configuration> |
| <property> |
| <name>[PROPERTY-NAME]</name> |
| <value>[PROPERTY-VALUE]</value> |
| </property> |
| ... |
| </configuration> |
| <command>[SQOOP-COMMAND]</command> |
| <arg>[SQOOP-ARGUMENT]</arg> |
| ... |
| <file>[FILE-PATH]</file> |
| ... |
| <archive>[FILE-PATH]</archive> |
| ... |
| </sqoop> |
| <ok to="[NODE-NAME]"/> |
| <error to="[NODE-NAME]"/> |
| </action> |
| ... |
| </workflow-app> |
| </verbatim> |
| |
| The =prepare= element, if present, indicates a list of paths to delete |
| or create before starting the job. Specified paths must start with =hdfs://HOST:PORT=. |
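| |
| For illustration, a =prepare= block that removes a previous output directory and creates a staging directory |
| could look like the following sketch. The host, port and the staging path are placeholders, not values from a |
| real cluster: |
| |
| <verbatim> |
| <prepare> |
|     <!-- each path carries the full hdfs://HOST:PORT prefix --> |
|     <delete path="hdfs://localhost:8020/user/tucu/foo"/> |
|     <!-- hypothetical staging directory --> |
|     <mkdir path="hdfs://localhost:8020/user/tucu/staging"/> |
| </prepare> |
| </verbatim> |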
| |
| The =job-xml= element, if present, specifies a file containing configuration |
| for the Sqoop job. As of schema 0.3, multiple =job-xml= elements are allowed in order to |
| specify multiple =job.xml= files. |
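| |
| A sketch of an action that uses the 0.3 schema to load two configuration files follows. Both file names are |
| hypothetical, and the =jobTracker= and =nameNode= parameters are assumed to be defined in the job properties: |
| |
| <verbatim> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.3"> |
|     <job-tracker>${jobTracker}</job-tracker> |
|     <name-node>${nameNode}</name-node> |
|     <!-- hypothetical files; more than one job-xml requires schema 0.3 --> |
|     <job-xml>sqoop-common.xml</job-xml> |
|     <job-xml>sqoop-import.xml</job-xml> |
|     <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command> |
| </sqoop> |
| </verbatim> |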
| |
| The =configuration= element, if present, contains configuration |
| properties that are passed to the Sqoop job. |
| |
| *Sqoop command* |
| |
| The Sqoop command can be specified either using the =command= element or multiple =arg= |
| elements. |
| |
| When using the =command= element, Oozie will split the command on every space |
| into multiple arguments. |
| |
| When using the =arg= elements, Oozie will pass each argument value as an argument to Sqoop. |
| |
| The =arg= variant should be used when there are spaces within a single argument. |
| |
| Consult the Sqoop documentation for a complete list of valid Sqoop commands. |
| |
| All the above elements can be parameterized (templatized) using EL |
| expressions. |
| |
| *Examples:* |
| |
| Using the =command= element: |
| |
| <verbatim> |
| <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1"> |
| ... |
| <action name="myfirstsqoopjob"> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.2"> |
| <job-tracker>foo:8021</job-tracker> |
| <name-node>bar:8020</name-node> |
| <prepare> |
| <delete path="${jobOutput}"/> |
| </prepare> |
| <configuration> |
| <property> |
| <name>mapred.compress.map.output</name> |
| <value>true</value> |
| </property> |
| </configuration> |
| <command>import --connect jdbc:hsqldb:file:db.hsqldb --table TT --target-dir hdfs://localhost:8020/user/tucu/foo -m 1</command> |
| </sqoop> |
| <ok to="myotherjob"/> |
| <error to="errorcleanup"/> |
| </action> |
| ... |
| </workflow-app> |
| </verbatim> |
| |
| The same Sqoop action using =arg= elements: |
| |
| <verbatim> |
| <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1"> |
| ... |
| <action name="myfirstsqoopjob"> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.2"> |
| <job-tracker>foo:8021</job-tracker> |
| <name-node>bar:8020</name-node> |
| <prepare> |
| <delete path="${jobOutput}"/> |
| </prepare> |
| <configuration> |
| <property> |
| <name>mapred.compress.map.output</name> |
| <value>true</value> |
| </property> |
| </configuration> |
| <arg>import</arg> |
| <arg>--connect</arg> |
| <arg>jdbc:hsqldb:file:db.hsqldb</arg> |
| <arg>--table</arg> |
| <arg>TT</arg> |
| <arg>--target-dir</arg> |
| <arg>hdfs://localhost:8020/user/tucu/foo</arg> |
| <arg>-m</arg> |
| <arg>1</arg> |
| </sqoop> |
| <ok to="myotherjob"/> |
| <error to="errorcleanup"/> |
| </action> |
| ... |
| </workflow-app> |
| </verbatim> |
| |
| NOTE: The =arg= element syntax, while more verbose, allows spaces within a single argument, which is useful when |
| using free-form queries. |
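| |
| As a sketch of that case, the import below passes a query containing spaces as a single =arg= value. The query |
| text is illustrative only. Sqoop's =--query= option expects the literal =$CONDITIONS= token in the =WHERE= |
| clause, and because it is not written as =${...}= it is not treated as an Oozie EL expression: |
| |
| <verbatim> |
| <sqoop xmlns="uri:oozie:sqoop-action:0.2"> |
|     <job-tracker>foo:8021</job-tracker> |
|     <name-node>bar:8020</name-node> |
|     <arg>import</arg> |
|     <arg>--connect</arg> |
|     <arg>jdbc:hsqldb:file:db.hsqldb</arg> |
|     <arg>--query</arg> |
|     <!-- illustrative query; the whole statement is one argument --> |
|     <arg>SELECT * FROM TT WHERE $CONDITIONS</arg> |
|     <arg>--target-dir</arg> |
|     <arg>hdfs://localhost:8020/user/tucu/foo</arg> |
|     <arg>-m</arg> |
|     <arg>1</arg> |
| </sqoop> |
| </verbatim> |
| |
| With the =command= element the same query would be split on every space and reach Sqoop as several separate |
| arguments. |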
| |
| ---+++ Sqoop Action Counters |
| |
| The counters of the map-reduce job run by the Sqoop action are available to be used in the workflow via the |
| [[WorkflowFunctionalSpec#HadoopCountersEL][hadoop:counters() EL function]]. |
| |
| If the Sqoop action runs an import-all command (such as Sqoop's =import-all-tables=), the =hadoop:counters()= EL |
| function will return the aggregated counters of all map-reduce jobs run by that command. |
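| |
| One possible use is a =decision= node that branches on whether the import produced any records. This is a |
| sketch only: the node names are hypothetical, and it assumes the =RECORDS= and =MAP_OUT= EL constants described |
| in the Workflow Functional Specification resolve to the counter group and counter name used by the Hadoop |
| version in your cluster: |
| |
| <verbatim> |
| <decision name="check-import"> |
|     <switch> |
|         <!-- 'sqoop-import' is the hypothetical name of the preceding sqoop action --> |
|         <case to="process-data">${hadoop:counters('sqoop-import')[RECORDS][MAP_OUT] gt 0}</case> |
|         <default to="no-data"/> |
|     </switch> |
| </decision> |
| </verbatim> |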
| |
| ---+++ Sqoop Action Logging |
| |
| Sqoop action logs are redirected to the STDOUT/STDERR of the Oozie Launcher map-reduce job task that runs Sqoop. |
| |
| In the Oozie web-console, the 'Console URL' link in the Sqoop action pop-up leads to the Hadoop job-tracker |
| web-console, from which the Oozie Launcher map-reduce job task logs can be reached. |
| |
| The logging level of the Sqoop action can be set in the Sqoop action configuration using the |
| property =oozie.sqoop.log.level=. The default value is =INFO=. |
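| |
| For example, the following inline configuration fragment would raise the level to =DEBUG=. The choice of |
| =DEBUG= is only an illustration: |
| |
| <verbatim> |
| <configuration> |
|     <property> |
|         <!-- overrides the default level of INFO for this action --> |
|         <name>oozie.sqoop.log.level</name> |
|         <value>DEBUG</value> |
|     </property> |
| </configuration> |
| </verbatim> |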
| |
| ---++ Appendix, Sqoop XML-Schema |
| |
| ---+++ AE.A Appendix A, Sqoop XML-Schema |
| |
| ---++++ Sqoop Action Schema Version 0.3 |
| <verbatim> |
| <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" |
| xmlns:sqoop="uri:oozie:sqoop-action:0.3" elementFormDefault="qualified" |
| targetNamespace="uri:oozie:sqoop-action:0.3"> |
| |
| <xs:element name="sqoop" type="sqoop:ACTION"/> |
| |
| <xs:complexType name="ACTION"> |
| <xs:sequence> |
| <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/> |
| <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> |
| <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/> |
| <xs:choice> |
| <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/> |
| </xs:choice> |
| <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> |
| <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="CONFIGURATION"> |
| <xs:sequence> |
| <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> |
| <xs:complexType> |
| <xs:sequence> |
| <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/> |
| <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> |
| <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> |
| </xs:sequence> |
| </xs:complexType> |
| </xs:element> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="PREPARE"> |
| <xs:sequence> |
| <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/> |
| <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="DELETE"> |
| <xs:attribute name="path" type="xs:string" use="required"/> |
| </xs:complexType> |
| |
| <xs:complexType name="MKDIR"> |
| <xs:attribute name="path" type="xs:string" use="required"/> |
| </xs:complexType> |
| |
| </xs:schema> |
| </verbatim> |
| |
| ---++++ Sqoop Action Schema Version 0.2 |
| <verbatim> |
| <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" |
| xmlns:sqoop="uri:oozie:sqoop-action:0.2" elementFormDefault="qualified" |
| targetNamespace="uri:oozie:sqoop-action:0.2"> |
| |
| <xs:element name="sqoop" type="sqoop:ACTION"/> |
| |
| <xs:complexType name="ACTION"> |
| <xs:sequence> |
| <xs:element name="job-tracker" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="name-node" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="prepare" type="sqoop:PREPARE" minOccurs="0" maxOccurs="1"/> |
| <xs:element name="job-xml" type="xs:string" minOccurs="0" maxOccurs="1"/> |
| <xs:element name="configuration" type="sqoop:CONFIGURATION" minOccurs="0" maxOccurs="1"/> |
| <xs:choice> |
| <xs:element name="command" type="xs:string" minOccurs="1" maxOccurs="1"/> |
| <xs:element name="arg" type="xs:string" minOccurs="1" maxOccurs="unbounded"/> |
| </xs:choice> |
| <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> |
| <xs:element name="archive" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="CONFIGURATION"> |
| <xs:sequence> |
| <xs:element name="property" minOccurs="1" maxOccurs="unbounded"> |
| <xs:complexType> |
| <xs:sequence> |
| <xs:element name="name" minOccurs="1" maxOccurs="1" type="xs:string"/> |
| <xs:element name="value" minOccurs="1" maxOccurs="1" type="xs:string"/> |
| <xs:element name="description" minOccurs="0" maxOccurs="1" type="xs:string"/> |
| </xs:sequence> |
| </xs:complexType> |
| </xs:element> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="PREPARE"> |
| <xs:sequence> |
| <xs:element name="delete" type="sqoop:DELETE" minOccurs="0" maxOccurs="unbounded"/> |
| <xs:element name="mkdir" type="sqoop:MKDIR" minOccurs="0" maxOccurs="unbounded"/> |
| </xs:sequence> |
| </xs:complexType> |
| |
| <xs:complexType name="DELETE"> |
| <xs:attribute name="path" type="xs:string" use="required"/> |
| </xs:complexType> |
| |
| <xs:complexType name="MKDIR"> |
| <xs:attribute name="path" type="xs:string" use="required"/> |
| </xs:complexType> |
| |
| </xs:schema> |
| </verbatim> |
| |
| [[index][::Go back to Oozie Documentation Index::]] |
| |
| </noautolink> |