<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="author" content="Apache Software Foundation">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Schedulers - Apache Gobblin</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link href="../../css/extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Schedulers";
var mkdocs_page_input_path = "user-guide/Gobblin-Schedulers.md";
var mkdocs_page_url = null;
</script>
<script src="../../js/jquery-2.1.1.min.js" defer></script>
<script src="../../js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Apache Gobblin</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="/">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Powered-By/">Companies Powered By Gobblin</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Getting-Started/">Getting Started</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Gobblin-Architecture/">Architecture</a>
</li>
<li class="toctree-l1">
<span class="caption-text">User Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../Working-with-Job-Configuration-Files/">Job Configuration Files</a>
</li>
<li class="">
<a class="" href="../Gobblin-Deployment/">Deployment</a>
</li>
<li class="">
<a class="" href="../Gobblin-as-a-Library/">Gobblin as a Library</a>
</li>
<li class="">
<a class="" href="../Gobblin-CLI/">Gobblin CLI</a>
</li>
<li class="">
<a class="" href="../Gobblin-Compliance/">Gobblin Compliance</a>
</li>
<li class="">
<a class="" href="../Gobblin-on-Yarn/">Gobblin on Yarn</a>
</li>
<li class="">
<a class="" href="../Compaction/">Compaction</a>
</li>
<li class="">
<a class="" href="../State-Management-and-Watermarks/">State Management and Watermarks</a>
</li>
<li class="">
<a class="" href="../Working-with-the-ForkOperator/">Fork Operator</a>
</li>
<li class="">
<a class="" href="../Configuration-Properties-Glossary/">Configuration Glossary</a>
</li>
<li class="">
<a class="" href="../Source-schema-and-Converters/">Source schema and Converters</a>
</li>
<li class="">
<a class="" href="../Partitioned-Writers/">Partitioned Writers</a>
</li>
<li class="">
<a class="" href="../Monitoring/">Monitoring</a>
</li>
<li class="">
<a class="" href="../Gobblin-template/">Template</a>
</li>
<li class=" current">
<a class="current" href="./">Schedulers</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#table-of-contents">Table of Contents</a></li>
<li class="toctree-l3"><a href="#introduction">Introduction</a></li>
<li class="toctree-l3"><a href="#quartz">Quartz</a></li>
<li class="toctree-l3"><a href="#azkaban">Azkaban</a></li>
<li class="toctree-l3"><a href="#oozie">Oozie</a></li>
<ul>
<li><a class="toctree-l4" href="#launching-gobblin-in-local-mode">Launching Gobblin in Local Mode</a></li>
<li><a class="toctree-l4" href="#launching-gobblin-in-mapreduce-mode">Launching Gobblin in MapReduce Mode</a></li>
<li><a class="toctree-l4" href="#debugging-tips">Debugging Tips</a></li>
</ul>
</ul>
</li>
<li class="">
<a class="" href="../Job-Execution-History-Store/">Job Execution History Store</a>
</li>
<li class="">
<a class="" href="../Building-Gobblin/">Building Gobblin</a>
</li>
<li class="">
<a class="" href="../Gobblin-genericLoad/">Generic Configuration Loading</a>
</li>
<li class="">
<a class="" href="../Hive-Registration/">Hive Registration</a>
</li>
<li class="">
<a class="" href="../Config-Management/">Config Management</a>
</li>
<li class="">
<a class="" href="../Docker-Integration/">Docker Integration</a>
</li>
<li class="">
<a class="" href="../Troubleshooting/">Troubleshooting</a>
</li>
<li class="">
<a class="" href="../FAQs/">FAQs</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sources</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sources/AvroFileSource/">Avro files</a>
</li>
<li class="">
<a class="" href="../../sources/CopySource/">File copy</a>
</li>
<li class="">
<a class="" href="../../sources/QueryBasedSource/">Query based</a>
</li>
<li class="">
<a class="" href="../../sources/RestApiSource/">Rest Api</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleAnalyticsSource/">Google Analytics</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleDriveSource/">Google Drive</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleWebmaster/">Google Webmaster</a>
</li>
<li class="">
<a class="" href="../../sources/HadoopTextInputSource/">Hadoop Text Input</a>
</li>
<li class="">
<a class="" href="../../sources/HelloWorldSource/">Hello World</a>
</li>
<li class="">
<a class="" href="../../sources/HiveAvroToOrcSource/">Hive Avro-to-ORC</a>
</li>
<li class="">
<a class="" href="../../sources/HivePurgerSource/">Hive compliance purging</a>
</li>
<li class="">
<a class="" href="../../sources/SimpleJsonSource/">JSON</a>
</li>
<li class="">
<a class="" href="../../sources/KafkaSource/">Kafka</a>
</li>
<li class="">
<a class="" href="../../sources/MySQLSource/">MySQL</a>
</li>
<li class="">
<a class="" href="../../sources/OracleSource/">Oracle</a>
</li>
<li class="">
<a class="" href="../../sources/SalesforceSource/">Salesforce</a>
</li>
<li class="">
<a class="" href="../../sources/SftpSource/">SFTP</a>
</li>
<li class="">
<a class="" href="../../sources/SqlServerSource/">SQL Server</a>
</li>
<li class="">
<a class="" href="../../sources/TeradataSource/">Teradata</a>
</li>
<li class="">
<a class="" href="../../sources/WikipediaSource/">Wikipedia</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sinks (Writers)</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sinks/AvroHdfsDataWriter/">Avro HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/ParquetHdfsDataWriter/">Parquet HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/SimpleBytesWriter/">HDFS Byte array</a>
</li>
<li class="">
<a class="" href="../../sinks/ConsoleWriter/">Console</a>
</li>
<li class="">
<a class="" href="../../sinks/CouchbaseWriter/">Couchbase</a>
</li>
<li class="">
<a class="" href="../../sinks/Http/">HTTP</a>
</li>
<li class="">
<a class="" href="../../sinks/Gobblin-JDBC-Writer/">JDBC</a>
</li>
<li class="">
<a class="" href="../../sinks/Kafka/">Kafka</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Adaptors</span>
<ul class="subnav">
<li class="">
<a class="" href="../../adaptors/Gobblin-Distcp/">Gobblin Distcp</a>
</li>
<li class="">
<a class="" href="../../adaptors/Hive-Avro-To-ORC-Converter/">Hive Avro-To-Orc Converter</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Case Studies</span>
<ul class="subnav">
<li class="">
<a class="" href="../../case-studies/Kafka-HDFS-Ingestion/">Kafka-HDFS Ingestion</a>
</li>
<li class="">
<a class="" href="../../case-studies/Publishing-Data-to-S3/">Publishing Data to S3</a>
</li>
<li class="">
<a class="" href="../../case-studies/Writing-ORC-Data/">Writing ORC Data</a>
</li>
<li class="">
<a class="" href="../../case-studies/Hive-Distcp/">Hive Distcp</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Data Management</span>
<ul class="subnav">
<li class="">
<a class="" href="../../data-management/Gobblin-Retention/">Retention</a>
</li>
<li class="">
<a class="" href="../../data-management/DistcpNgEvents/">Distcp-NG events</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Metrics</span>
<ul class="subnav">
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics/">Quick Start</a>
</li>
<li class="">
<a class="" href="../../metrics/Existing-Reporters/">Existing Reporters</a>
</li>
<li class="">
<a class="" href="../../metrics/Metrics-for-Gobblin-ETL/">Metrics for Gobblin ETL</a>
</li>
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics-Architecture/">Gobblin Metrics Architecture</a>
</li>
<li class="">
<a class="" href="../../metrics/Implementing-New-Reporters/">Implementing New Reporters</a>
</li>
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics-Performance/">Gobblin Metrics Performance</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Developer Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../../developer-guide/Customization-for-New-Source/">Customization for New Source</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Customization-for-Converter-and-Operator/">Customization for Converter and Operator</a>
</li>
<li class="">
<a class="" href="../../developer-guide/CodingStyle/">Code Style Guide</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Gobblin-Compliance-Design/">Gobblin Compliance Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/IDE-setup/">IDE setup</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Monitoring-Design/">Monitoring Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Documentation-Architecture/">Documentation Architecture</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Contributing/">Contributing</a>
</li>
<li class="">
<a class="" href="../../developer-guide/GobblinModules/">Gobblin Modules</a>
</li>
<li class="">
<a class="" href="../../developer-guide/HighLevelConsumer/">High Level Consumer</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Project</span>
<ul class="subnav">
<li class="">
<a class="" href="../../project/Feature-List/">Feature List</a>
</li>
<li class="">
<a class="" href="/people">Contributors and Team</a>
</li>
<li class="">
<a class="" href="../../project/Talks-and-Tech-Blogs/">Talks and Tech Blog Posts</a>
</li>
<li class="">
<a class="" href="../../project/Posts/">Posts</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Miscellaneous</span>
<ul class="subnav">
<li class="">
<a class="" href="../../miscellaneous/Camus-to-Gobblin-Migration/">Camus to Gobblin Migration</a>
</li>
<li class="">
<a class="" href="../../miscellaneous/Exactly-Once-Support/">Exactly Once Support</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Apache Gobblin</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="main">
<div class="section">
<h1 id="table-of-contents">Table of Contents</h1>
<div class="toc">
<ul>
<li><a href="#table-of-contents">Table of Contents</a></li>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#quartz">Quartz</a></li>
<li><a href="#azkaban">Azkaban</a></li>
<li><a href="#oozie">Oozie</a><ul>
<li><a href="#launching-gobblin-in-local-mode">Launching Gobblin in Local Mode</a><ul>
<li><a href="#example-config-files">Example Config Files</a></li>
<li><a href="#uploading-files-to-hdfs">Uploading Files to HDFS</a><ul>
<li><a href="#adding-gobblin-jar-dependencies">Adding Gobblin jar Dependencies</a></li>
</ul>
</li>
<li><a href="#launching-the-job">Launching the Job</a></li>
</ul>
</li>
<li><a href="#launching-gobblin-in-mapreduce-mode">Launching Gobblin in MapReduce Mode</a><ul>
<li><a href="#example-config-files_1">Example Config Files</a></li>
<li><a href="#further-steps">Further steps</a></li>
</ul>
</li>
<li><a href="#debugging-tips">Debugging Tips</a></li>
</ul>
</li>
</ul>
</div>
<h1 id="introduction">Introduction</h1>
<p>Gobblin jobs can be scheduled on a recurring basis using several different tools. Gobblin ships with a built-in <a href="https://quartz-scheduler.org/" rel="nofollow">Quartz Scheduler</a> and also integrates with third-party schedulers such as Azkaban and Oozie.</p>
<h1 id="quartz">Quartz</h1>
<p>Gobblin has a built-in Quartz scheduler as part of the <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/JobScheduler.java" rel="nofollow"><code>JobScheduler</code></a> class. This class integrates with the Gobblin <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/SchedulerDaemon.java" rel="nofollow"><code>SchedulerDaemon</code></a>, which can be run using the Gobblin <a href="https://github.com/apache/incubator-gobblin/blob/master/bin/gobblin-standalone.sh" rel="nofollow"><code>bin/gobblin-standalone.sh</code></a> script.</p>
<p>Taking advantage of the Quartz scheduler requires two steps:</p>
<ul>
<li>Launch Gobblin with the <code>bin/gobblin-standalone.sh</code> script</li>
<li>Add the property <code>job.schedule</code> to the <code>.pull</code> file<ul>
<li>The value of this property should be a cron expression in the format accepted by the Quartz <a href="http://quartz-scheduler.org/api/2.2.0/org/quartz/CronTrigger.html" rel="nofollow">CronTrigger</a></li>
</ul>
</li>
</ul>
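<p>As a minimal sketch of the second step, a <code>.pull</code> file might carry a schedule like the one below. The job name is a hypothetical placeholder, and the cron expression is only an illustration: Quartz cron expressions list seconds, minutes, hours, day of month, month, and day of week, so this one would fire every five minutes.</p>
<pre><code># Hypothetical excerpt from a .pull file; the job name is a placeholder.
job.name=ExampleScheduledJob

# Quartz cron expression: second 0 of every 5th minute, every hour, every day.
job.schedule=0 0/5 * * * ?
</code></pre>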
<h1 id="azkaban">Azkaban</h1>
<p>Gobblin can be launched via <a href="https://azkaban.github.io/" rel="nofollow">Azkaban</a>, an open-source workflow manager for scheduling and launching Hadoop jobs. Gobblin's <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-modules/gobblin-azkaban/src/main/java/org/apache/gobblin/azkaban/AzkabanJobLauncher.java" rel="nofollow"><code>AzkabanJobLauncher</code></a> can be used to launch a Gobblin job through Azkaban.</p>
<p>Follow the typical Azkaban setup to create a zip file that can be uploaded to Azkaban. The zip file should include all dependent jars, which can be found in <code>gobblin-dist.tar.gz</code>. The <code>.job</code> file for the Azkaban job should contain all configuration properties that would be put in a <code>.pull</code> file (for example, the <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull" rel="nofollow">Wikipedia Example</a> <code>.pull</code> file). All Gobblin system-dependent properties (e.g. <a href="https://github.com/apache/incubator-gobblin/blob/master/conf/gobblin-mapreduce.properties" rel="nofollow"><code>conf/gobblin-mapreduce.properties</code></a> or <a href="https://github.com/apache/incubator-gobblin/blob/master/conf/gobblin-standalone-v2.properties" rel="nofollow"><code>conf/gobblin-standalone.properties</code></a>) should also be in the zip file.</p>
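<p>In the Azkaban <code>.job</code> file, the <code>type</code> parameter should be set to <code>hadoopJava</code> (see <a href="http://azkaban.github.io/azkaban/docs/latest/#hadoopjava-type" rel="nofollow">here</a> for more information about the <code>hadoopJava</code> Job Type). The <code>job.class</code> parameter should be set to <code>gobblin.azkaban.AzkabanJobLauncher</code>. If the Gobblin build in use places the class in the <code>org.apache.gobblin</code> package, as the source file linked above does, use <code>org.apache.gobblin.azkaban.AzkabanJobLauncher</code> instead.</p>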
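<p>Put together, an Azkaban <code>.job</code> file might look roughly like the sketch below. The file name and the job-level values are hypothetical placeholders. The remaining properties would be whatever the corresponding <code>.pull</code> file contains.</p>
<pre><code># gobblin-example.job (hypothetical file name)

# Azkaban job type and the Gobblin launcher class described above.
# Adjust the package prefix of job.class to match the Gobblin version in use.
type=hadoopJava
job.class=gobblin.azkaban.AzkabanJobLauncher

# Properties that would otherwise live in the .pull file (placeholder values).
job.name=ExampleAzkabanJob
job.group=Examples
job.description=Placeholder description for a Gobblin job launched through Azkaban
</code></pre>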
<h1 id="oozie">Oozie</h1>
<p><a href="https://oozie.apache.org/">Oozie</a> is a very popular scheduler for the Hadoop environment. It allows users to define complex workflows using XML files. A workflow can be composed of a series of actions, such as Java jobs, Pig jobs, Spark jobs, etc. Gobblin has two integration points with Oozie. It can be run as a standalone Java process via Oozie's <code>&lt;java&gt;</code> tag, or it can be run as a MapReduce job via Oozie.</p>
<p>The following guides assume Oozie is already set up and running on some machine. If this is not the case, consult the Oozie documentation to get everything set up.</p>
<p>These tutorials only outline how to launch a basic Oozie job that runs a Gobblin Java job a single time. For information on how to build more complex flows, and how to run jobs on a schedule, check out the Oozie documentation online.</p>
<h3 id="launching-gobblin-in-local-mode">Launching Gobblin in Local Mode</h3>
<p>This guide focuses on getting Gobblin to run as a standalone Java process. This means it will not launch a separate MR job to distribute its workload. It is important to understand how the current version of Oozie launches a Java process: it first starts a MapReduce job and then runs Gobblin as a Java process inside a single map task. The Gobblin job will then ingest all the data it is configured to pull and shut down.</p>
<h4 id="example-config-files">Example Config Files</h4>
<p><a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-oozie/src/test/resources/local" rel="nofollow"><code>gobblin-oozie/src/test/resources/local</code></a> contains sample configuration files for launching Gobblin with Oozie. There are a number of important files in this directory:</p>
<p><a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-oozie/src/test/resources/local/gobblin-oozie-example-system.properties" rel="nofollow"><code>gobblin-oozie-example-system.properties</code></a> contains default system-level properties for Gobblin. When launched with Oozie, Gobblin will run inside a map task; it is thus recommended to configure Gobblin to write directly to HDFS rather than the local file system. The property <code>fs.uri</code> in this file should be changed to point to the NameNode of the Hadoop File System the job should write to. By default, all data is written under a folder called <code>gobblin-out</code>; to change this, modify the <code>gobblin.work.dir</code> parameter in this file.</p>
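<p>As an illustration, the two properties mentioned above might be set as follows. The host name, port, and directory are hypothetical placeholders and must be replaced with values for the target cluster.</p>
<pre><code># Hypothetical values; point fs.uri at the NameNode of the target HDFS.
fs.uri=hdfs://namenode.example.com:8020

# Placeholder working directory, used instead of the default gobblin-out folder.
gobblin.work.dir=/user/example/gobblin-work
</code></pre>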
<p><a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-oozie/src/test/resources/local/gobblin-oozie-example-workflow.properties" rel="nofollow"><code>gobblin-oozie-example-workflow.properties</code></a> contains default Oozie properties for any job launched. It is also the entry point for launching an Oozie job (e.g. to launch an Oozie job from the command line you execute <code>oozie job -config gobblin-oozie-example-workflow.properties -run</code>). In this file, the <code>name.node</code> and <code>resource.manager</code> properties need to be updated to the values specific to the target environment. Another important property in this file is <code>oozie.wf.application.path</code>; it points to a folder on HDFS that contains any workflows to be run. Note that the <code>workflow.xml</code> files must be on HDFS in order for Oozie to pick them up (this is because Oozie typically runs on a different machine from any client process).</p>
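<p>A sketch of how these three properties might be filled in is shown below. All host names, ports, and paths are hypothetical placeholders. The example file in the repository may contain additional properties that are not shown here.</p>
<pre><code># Hypothetical cluster endpoints; replace with the values for your environment.
name.node=hdfs://namenode.example.com:8020
resource.manager=resourcemanager.example.com:8032

# Placeholder HDFS folder that holds the workflow.xml to be run.
oozie.wf.application.path=hdfs://namenode.example.com:8020/user/example/gobblin-oozie
</code></pre>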
<p><a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-oozie/src/test/resources/local/gobblin-oozie-example-workflow.xml" rel="nofollow"><code>gobblin-oozie-example-workflow.xml</code></a> contains an example Oozie workflow. This example simply launches a Java process that invokes the main method of the <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-runtime/src/main/java/org/apache/gobblin/runtime/local/CliLocalJobLauncher.java" rel="nofollow"><code>CliLocalJobLauncher</code></a>. The main method of this class expects two file paths to be passed to it (once again these files need to be on HDFS). The <code>jobconfig</code> arg should point to a file on HDFS containing all job configuration parameters. An example <code>jobconfig</code> file can be found <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-example/src/main/resources/wikipedia.pull" rel="nofollow">here</a>. The <code>sysconfig</code> arg should point to a file on HDFS containing all system configuration parameters. An example <code>sysconfig</code> file for Oozie can be found <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-oozie/src/test/resources/local/gobblin-oozie-example-system.properties" rel="nofollow">here</a>.</p>
<h4 id="uploading-files-to-hdfs">Uploading Files to HDFS</h4>
<p>Oozie only reads a job properties file from the local file system (e.g. <code>gobblin-oozie-example-workflow.properties</code>); it expects all other configuration and dependent files to be uploaded to HDFS. Specifically, it looks for these files under the directory specified by <code>oozie.wf.application.path</code>. Make sure this is the case before trying to launch an Oozie job.</p>
<h5 id="adding-gobblin-jar-dependencies">Adding Gobblin <code>jar</code> Dependencies</h5>
<p>Gobblin has a number of <code>jar</code> dependencies that need to be available when launching a Gobblin job. These dependencies can be taken from the <code>gobblin-dist.tar.gz</code> file that is created after building Gobblin. The tarball should contain a <code>lib</code> folder with the necessary dependencies. Its contents should be placed into a <code>lib</code> folder under the same directory specified by <code>oozie.wf.application.path</code> in the <code>gobblin-oozie-example-workflow.properties</code> file.</p>
<h4 id="launching-the-job">Launching the Job</h4>
<p>Assuming one has the <a href="https://oozie.apache.org/docs/3.1.3-incubating/DG_CommandLineTool.html">Oozie CLI</a> installed, the job can be launched using the following command: <code>oozie job -config gobblin-oozie-example-workflow.properties -run</code>.</p>
<h3 id="launching-gobblin-in-mapreduce-mode">Launching Gobblin in MapReduce Mode</h3>
<p>Launching Gobblin in MapReduce mode works much like launching it in local mode. In this mode, the Oozie launcher action spawns a second MapReduce process in which Gobblin processes its tasks in distributed mode across the cluster. Since each of the Mappers needs access to the Gobblin libraries, the jars must be provided via the <code>job.hdfs.jars</code> variable.</p>
<h4 id="example-config-files_1">Example Config Files</h4>
<p><a href="https://github.com/apache/incubator-gobblin/tree/master/gobblin-oozie/src/test/resources/mapreduce" rel="nofollow"><code>gobblin-oozie/src/test/resources/mapreduce</code></a> contains sample configuration files for launching Gobblin with Oozie in MapReduce mode. The main differences from local mode are a few extra MapReduce-related configuration variables in the <code>sysconfig.properties</code> file and the use of <code>CliMRJobLauncher</code> instead of <code>CliLocalJobLauncher</code>.</p>
<h4 id="further-steps">Further steps</h4>
<p>Everything else should work the same way as in local mode (see above).</p>
<h3 id="debugging-tips">Debugging Tips</h3>
<p>Once the job has been launched, its status can be queried with <code>oozie job -info &lt;oozie-job-id&gt;</code>, and its logs can be shown with <code>oozie job -log &lt;oozie-job-id&gt;</code>.</p>
<p>To see the standard output of Gobblin, check the logs of the Map task running the Gobblin process. <code>oozie job -info &lt;oozie-job-id&gt;</code> should show the Hadoop <code>job_id</code> of the Hadoop job launched to run the Gobblin process. Using this ID, one should be able to find the logs of the Map tasks through the UI or other command line tools (e.g. <code>yarn logs</code>).</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../Job-Execution-History-Store/" class="btn btn-neutral float-right" title="Job Execution History Store">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../Gobblin-template/" class="btn btn-neutral" title="Template"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org" rel="nofollow">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme" rel="nofollow">theme</a> provided by <a href="https://readthedocs.org" rel="nofollow">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js" defer></script>
<script src="../../js/extra.js" defer></script>
<script src="../../search/main.js" defer></script>
</body>
</html>