<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="author" content="Apache Software Foundation">
<link rel="shortcut icon" href="../../img/favicon.ico">
<title>Gobblin CLI - Apache Gobblin</title>
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/styles/github.min.css">
<link href="../../css/extra.css" rel="stylesheet">
<script>
// Current page data
var mkdocs_page_name = "Gobblin CLI";
var mkdocs_page_input_path = "user-guide/Gobblin-CLI.md";
var mkdocs_page_url = null;
</script>
<script src="../../js/jquery-2.1.1.min.js" defer></script>
<script src="../../js/modernizr-2.8.3.min.js" defer></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/9.12.0/highlight.min.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../.." class="icon icon-home"> Apache Gobblin</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" title="Type search term here" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li class="toctree-l1">
<a class="" href="/">Home</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Powered-By/">Companies Powered By Gobblin</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Getting-Started/">Getting Started</a>
</li>
<li class="toctree-l1">
<a class="" href="../../Gobblin-Architecture/">Architecture</a>
</li>
<li class="toctree-l1">
<span class="caption-text">User Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../Working-with-Job-Configuration-Files/">Job Configuration Files</a>
</li>
<li class="">
<a class="" href="../Gobblin-Deployment/">Deployment</a>
</li>
<li class="">
<a class="" href="../Gobblin-as-a-Library/">Gobblin as a Library</a>
</li>
<li class=" current">
<a class="current" href="./">Gobblin CLI</a>
<ul class="subnav">
<li class="toctree-l3"><a href="#table-of-contents">Table of Contents</a></li>
<li class="toctree-l3"><a href="#gobblin-commands-execution-modes">Gobblin Commands &amp; Execution Modes</a></li>
<li class="toctree-l3"><a href="#gobblin-commands">Gobblin Commands</a></li>
<li class="toctree-l3"><a href="#the-distcp-quick-app">The Distcp Quick App</a></li>
<li class="toctree-l3"><a href="#the-oneshot-quick-app">The OneShot Quick App</a></li>
<li class="toctree-l3"><a href="#developing-quick-apps-for-the-cli">Developing quick apps for the CLI</a></li>
<li class="toctree-l3"><a href="#implementing-new-gobblin-commands">Implementing new Gobblin commands</a></li>
<li class="toctree-l3"><a href="#gobblin-service-execution-modes-as-daemon">Gobblin Service Execution Modes ( as Daemon )</a></li>
<li class="toctree-l3"><a href="#gobblin-system-configurations">Gobblin System Configurations</a></li>
</ul>
</li>
<li class="">
<a class="" href="../Gobblin-Compliance/">Gobblin Compliance</a>
</li>
<li class="">
<a class="" href="../Gobblin-on-Yarn/">Gobblin on Yarn</a>
</li>
<li class="">
<a class="" href="../Compaction/">Compaction</a>
</li>
<li class="">
<a class="" href="../State-Management-and-Watermarks/">State Management and Watermarks</a>
</li>
<li class="">
<a class="" href="../Working-with-the-ForkOperator/">Fork Operator</a>
</li>
<li class="">
<a class="" href="../Configuration-Properties-Glossary/">Configuration Glossary</a>
</li>
<li class="">
<a class="" href="../Source-schema-and-Converters/">Source schema and Converters</a>
</li>
<li class="">
<a class="" href="../Partitioned-Writers/">Partitioned Writers</a>
</li>
<li class="">
<a class="" href="../Monitoring/">Monitoring</a>
</li>
<li class="">
<a class="" href="../Gobblin-template/">Template</a>
</li>
<li class="">
<a class="" href="../Gobblin-Schedulers/">Schedulers</a>
</li>
<li class="">
<a class="" href="../Job-Execution-History-Store/">Job Execution History Store</a>
</li>
<li class="">
<a class="" href="../Building-Gobblin/">Building Gobblin</a>
</li>
<li class="">
<a class="" href="../Gobblin-genericLoad/">Generic Configuration Loading</a>
</li>
<li class="">
<a class="" href="../Hive-Registration/">Hive Registration</a>
</li>
<li class="">
<a class="" href="../Config-Management/">Config Management</a>
</li>
<li class="">
<a class="" href="../Docker-Integration/">Docker Integration</a>
</li>
<li class="">
<a class="" href="../Troubleshooting/">Troubleshooting</a>
</li>
<li class="">
<a class="" href="../FAQs/">FAQs</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sources</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sources/AvroFileSource/">Avro files</a>
</li>
<li class="">
<a class="" href="../../sources/CopySource/">File copy</a>
</li>
<li class="">
<a class="" href="../../sources/QueryBasedSource/">Query based</a>
</li>
<li class="">
<a class="" href="../../sources/RestApiSource/">Rest Api</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleAnalyticsSource/">Google Analytics</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleDriveSource/">Google Drive</a>
</li>
<li class="">
<a class="" href="../../sources/GoogleWebmaster/">Google Webmaster</a>
</li>
<li class="">
<a class="" href="../../sources/HadoopTextInputSource/">Hadoop Text Input</a>
</li>
<li class="">
<a class="" href="../../sources/HelloWorldSource/">Hello World</a>
</li>
<li class="">
<a class="" href="../../sources/HiveAvroToOrcSource/">Hive Avro-to-ORC</a>
</li>
<li class="">
<a class="" href="../../sources/HivePurgerSource/">Hive compliance purging</a>
</li>
<li class="">
<a class="" href="../../sources/SimpleJsonSource/">JSON</a>
</li>
<li class="">
<a class="" href="../../sources/KafkaSource/">Kafka</a>
</li>
<li class="">
<a class="" href="../../sources/MySQLSource/">MySQL</a>
</li>
<li class="">
<a class="" href="../../sources/OracleSource/">Oracle</a>
</li>
<li class="">
<a class="" href="../../sources/SalesforceSource/">Salesforce</a>
</li>
<li class="">
<a class="" href="../../sources/SftpSource/">SFTP</a>
</li>
<li class="">
<a class="" href="../../sources/SqlServerSource/">SQL Server</a>
</li>
<li class="">
<a class="" href="../../sources/TeradataSource/">Teradata</a>
</li>
<li class="">
<a class="" href="../../sources/WikipediaSource/">Wikipedia</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Sinks (Writers)</span>
<ul class="subnav">
<li class="">
<a class="" href="../../sinks/AvroHdfsDataWriter/">Avro HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/ParquetHdfsDataWriter/">Parquet HDFS</a>
</li>
<li class="">
<a class="" href="../../sinks/SimpleBytesWriter/">HDFS Byte array</a>
</li>
<li class="">
<a class="" href="../../sinks/ConsoleWriter/">Console</a>
</li>
<li class="">
<a class="" href="../../sinks/CouchbaseWriter/">Couchbase</a>
</li>
<li class="">
<a class="" href="../../sinks/Http/">HTTP</a>
</li>
<li class="">
<a class="" href="../../sinks/Gobblin-JDBC-Writer/">JDBC</a>
</li>
<li class="">
<a class="" href="../../sinks/Kafka/">Kafka</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Adaptors</span>
<ul class="subnav">
<li class="">
<a class="" href="../../adaptors/Gobblin-Distcp/">Gobblin Distcp</a>
</li>
<li class="">
<a class="" href="../../adaptors/Hive-Avro-To-ORC-Converter/">Hive Avro-To-Orc Converter</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Case Studies</span>
<ul class="subnav">
<li class="">
<a class="" href="../../case-studies/Kafka-HDFS-Ingestion/">Kafka-HDFS Ingestion</a>
</li>
<li class="">
<a class="" href="../../case-studies/Publishing-Data-to-S3/">Publishing Data to S3</a>
</li>
<li class="">
<a class="" href="../../case-studies/Writing-ORC-Data/">Writing ORC Data</a>
</li>
<li class="">
<a class="" href="../../case-studies/Hive-Distcp/">Hive Distcp</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Data Management</span>
<ul class="subnav">
<li class="">
<a class="" href="../../data-management/Gobblin-Retention/">Retention</a>
</li>
<li class="">
<a class="" href="../../data-management/DistcpNgEvents/">Distcp-NG events</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Gobblin Metrics</span>
<ul class="subnav">
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics/">Quick Start</a>
</li>
<li class="">
<a class="" href="../../metrics/Existing-Reporters/">Existing Reporters</a>
</li>
<li class="">
<a class="" href="../../metrics/Metrics-for-Gobblin-ETL/">Metrics for Gobblin ETL</a>
</li>
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics-Architecture/">Gobblin Metrics Architecture</a>
</li>
<li class="">
<a class="" href="../../metrics/Implementing-New-Reporters/">Implementing New Reporters</a>
</li>
<li class="">
<a class="" href="../../metrics/Gobblin-Metrics-Performance/">Gobblin Metrics Performance</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Developer Guide</span>
<ul class="subnav">
<li class="">
<a class="" href="../../developer-guide/Customization-for-New-Source/">Customization for New Source</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Customization-for-Converter-and-Operator/">Customization for Converter and Operator</a>
</li>
<li class="">
<a class="" href="../../developer-guide/CodingStyle/">Code Style Guide</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Gobblin-Compliance-Design/">Gobblin Compliance Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/IDE-setup/">IDE setup</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Monitoring-Design/">Monitoring Design</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Documentation-Architecture/">Documentation Architecture</a>
</li>
<li class="">
<a class="" href="../../developer-guide/Contributing/">Contributing</a>
</li>
<li class="">
<a class="" href="../../developer-guide/GobblinModules/">Gobblin Modules</a>
</li>
<li class="">
<a class="" href="../../developer-guide/HighLevelConsumer/">High Level Consumer</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Project</span>
<ul class="subnav">
<li class="">
<a class="" href="../../project/Feature-List/">Feature List</a>
</li>
<li class="">
<a class="" href="/people">Contributors and Team</a>
</li>
<li class="">
<a class="" href="../../project/Talks-and-Tech-Blogs/">Talks and Tech Blog Posts</a>
</li>
<li class="">
<a class="" href="../../project/Posts/">Posts</a>
</li>
</ul>
</li>
<li class="toctree-l1">
<span class="caption-text">Miscellaneous</span>
<ul class="subnav">
<li class="">
<a class="" href="../../miscellaneous/Camus-to-Gobblin-Migration/">Camus to Gobblin Migration</a>
</li>
<li class="">
<a class="" href="../../miscellaneous/Exactly-Once-Support/">Exactly Once Support</a>
</li>
</ul>
</li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../..">Apache Gobblin</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="main">
<div class="section">
<h2 id="table-of-contents">Table of Contents</h2>
<div class="toc">
<ul>
<li><a href="#table-of-contents">Table of Contents</a></li>
<li><a href="#gobblin-commands-execution-modes">Gobblin Commands &amp; Execution Modes</a></li>
<li><a href="#gobblin-commands">Gobblin Commands</a></li>
<li><a href="#the-distcp-quick-app">The Distcp Quick App</a></li>
<li><a href="#the-oneshot-quick-app">The OneShot Quick App</a></li>
<li><a href="#developing-quick-apps-for-the-cli">Developing quick apps for the CLI</a></li>
<li><a href="#implementing-new-gobblin-commands">Implementing new Gobblin commands</a></li>
<li><a href="#gobblin-service-execution-modes-as-daemon">Gobblin Service Execution Modes ( as Daemon )</a></li>
<li><a href="#gobblin-system-configurations">Gobblin System Configurations</a></li>
</ul>
</div>
<h2 id="gobblin-commands-execution-modes">Gobblin Commands &amp; Execution Modes</h2>
<p>The Gobblin distribution comes with a script <code>./bin/gobblin</code> for all commands and services.
Here is the usage: </p>
<pre><code class="bash">Usage:
gobblin.sh cli &lt;cli-command&gt; &lt;params&gt;
gobblin.sh service &lt;execution-mode&gt; &lt;start|stop|status&gt;
Use &quot;gobblin &lt;cli|service&gt; --help&quot; for more information. (Gobblin Version: 0.15.0)
</code></pre>
<p>To list the Gobblin CLI commands and options, run <code>bin/gobblin cli --help</code>:</p>
<pre><code class="bash">Usage:
gobblin.sh cli &lt;cli-command&gt; &lt;params&gt;
options:
cli-commands:
passwordManager Encrypt or decrypt strings for the password manager.
decrypt Decryption utilities
run Run a Gobblin application.
config Query the config library
jobs Command line job info and operations
stateMigration Command line tools for migrating state store
job-state-to-json To convert Job state to JSON
cleaner Data retention utility
keystore Examine JCE Keystore files
watermarks Inspect streaming watermarks
job-store-schema-manager Database job history store schema manager
--conf-dir &lt;gobblin-conf-dir-path&gt; Gobblin config path. default is '$GOBBLIN_HOME/conf/&lt;exe-mode-name&gt;'.
--log4j-conf &lt;path-of-log4j-file&gt; default is '&lt;gobblin-conf-dir-path&gt;/&lt;execution-mode&gt;/log4j.properties'.
--jvmopts &lt;jvm or gc options&gt; String containing JVM flags to include, in addition to &quot;-Xmx1g -Xms512m&quot;.
--jars &lt;csv list of extra jars&gt; Comma-separated list of extra jars to put on the CLASSPATH.
--enable-gc-logs enables gc logs &amp; dumps.
--show-classpath prints gobblin runtime classpath.
--help Display this help.
--verbose Display full command used to start the process.
Gobblin Version: 0.15.0
</code></pre>
<p>Argument details:</p>
<ul>
<li><code>--conf-dir</code>: specifies the path to the directory containing the Gobblin system configuration files, such as <code>application.conf</code> or <code>reference.conf</code>, <code>log4j.properties</code> and <code>quartz.properties</code>.</li>
<li><code>--log4j-conf</code>: specifies the path of a log4j config file that overrides the one in the config directory (default is <code>&lt;conf&gt;/&lt;gobblin-mode&gt;/log4j.properties</code>). Gobblin uses <a href="http://www.slf4j.org/" rel="nofollow">SLF4J</a> and the <a href="http://mvnrepository.com/artifact/org.slf4j/slf4j-log4j12" rel="nofollow">slf4j-log4j12</a> binding for logging.</li>
<li><code>--jvmopts</code>: specifies any JVM parameters; the default is <code>-Xmx1g -Xms512m</code>.</li>
<li><code>--enable-gc-logs</code>: adds the following GC options to the JVM parameters: <code>-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$GOBBLIN_LOGS/ -Xloggc:$GOBBLIN_LOGS/gobblin-$GOBBLIN_MODE-gc.log</code></li>
<li><code>--show-classpath</code>: prints the full classpath that Gobblin uses.</li>
<li>All other arguments are self-explanatory.</li>
</ul>
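<p>As a minimal sketch of how these options combine, the following shell function assembles a <code>gobblin cli</code> invocation and prints it for review instead of running it. The <code>GOBBLIN_BIN</code> and <code>GOBBLIN_CONF_DIR</code> variables, the <code>/opt/gobblin</code> install path, and the <code>-Dfile.encoding=UTF-8</code> flag are assumptions made for illustration. Placing the global options after the command arguments simply mirrors the oneShot examples in this guide.</p>
<pre><code class="bash"># Sketch only: GOBBLIN_BIN, the config directory and the JVM flag are placeholders.
GOBBLIN_BIN="${GOBBLIN_BIN:-bin/gobblin}"

build_cli_cmd() {
  # $1 = cli command (for example: run); remaining arguments pass through unchanged.
  cli_command="$1"
  shift
  conf_dir="${GOBBLIN_CONF_DIR:-/opt/gobblin/conf/standalone}"
  # Global options go last, after the command's own arguments.
  printf '%s\n' "$GOBBLIN_BIN cli $cli_command $* --conf-dir $conf_dir --jvmopts -Dfile.encoding=UTF-8 --verbose"
}

# Print the command for review rather than executing it.
build_cli_cmd run listQuickApps
</code></pre>
<p>Printing the command first makes it easy to check which configuration directory and JVM flags would be used before starting a JVM.</p>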
<h2 id="gobblin-commands">Gobblin Commands</h2>
<p>Gobblin provides the following CLI commands:</p>
<pre><code>Available commands:
job-state-to-json To convert Job state to JSON
jobs Command line job info and operations
passwordManager Encrypt or decrypt strings for the password manager.
run Run a Gobblin application.
decrypt Decryption utilities
job-store-schema-manager Database job history store schema manager
stateMigration Command line tools for migrating state store
keystore Examine JCE Keystore files
config Query the config library
watermarks Inspect streaming watermarks
cleaner Data retention utility
</code></pre>
<p>Details on how to use the <code>run</code> command:</p>
<p>Gobblin ingestion applications can be accessed through the following command:</p>
<pre><code>gobblin cli run [listQuickApps] [&lt;quick-app&gt;] -jobName &lt;jobName&gt; [OPTIONS]
</code></pre>
<p>For usage, run <code>./bin/gobblin cli run</code>.</p>
<p><code>gobblin cli run</code> uses <a href="../Gobblin-as-a-Library/">Embedded Gobblin</a> and subclasses to run Gobblin ingestion jobs, giving CLI access to most functionality that could be achieved using <code>EmbeddedGobblin</code>. For example, the following command will run a Hello World job (it will print "Hello World 1 !" somewhere in the logs).</p>
<pre><code>gobblin cli run -jobName helloWorld -setTemplate resource:///templates/hello-world.template
</code></pre>
<p>Having to know the path to templates and exactly which configurations to set is daunting. The alternative is to use a quick app. Running:</p>
<pre><code>gobblin cli run listQuickApps
</code></pre>
<p>will provide a list of available quick apps. To run a quick app:</p>
<pre><code>gobblin cli run &lt;quick-app-name&gt;
</code></pre>
<p>Quick apps may require additional arguments. For the usage of a particular app, run <code>bin/gobblin cli run &lt;quick-app-name&gt; -h</code>.</p>
<h2 id="the-distcp-quick-app">The Distcp Quick App</h2>
<p>For example, consider the quick app distcp:</p>
<pre><code class="bash">$ gobblin cli run distcp -h
usage: gobblin cli run distcp [OPTIONS] &lt;source&gt; &lt;target&gt;
-delete Delete files in target that don't exist
on source.
-deleteEmptyParentDirectories If deleting files on target, also delete
newly empty parent directories.
-distributeJar &lt;arg&gt;
-h,--help
-l Uses log to print out errors in the base CLI code.
-mrMode
-setConfiguration &lt;arg&gt;
-setJobTimeout &lt;arg&gt;
-setLaunchTimeout &lt;arg&gt;
-setShutdownTimeout &lt;arg&gt;
-simulate
-update Specifies files should be updated if they're different in the source.
-useStateStore &lt;arg&gt;
</code></pre>
<p>This provides usage for the app distcp, as well as listing all available options. Distcp could then be run:</p>
<pre><code class="bash">gobblin cli run distcp file:///source/path file:///target/path
</code></pre>
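<p>The options in the usage listing can be combined with the source and target arguments. The sketch below composes such a command and prints it. The usage listing gives no description for <code>-simulate</code>, so treating it as a dry-run switch is an assumption based on its name, and the helper name <code>distcp_cmd</code> is hypothetical.</p>
<pre><code class="bash"># Sketch only: composes a distcp quick-app command from options in the usage listing.
distcp_cmd() {
  # $1 = source URI, $2 = target URI, $3 = "simulate" to add -simulate (optional)
  flags="-update"
  if [ "${3:-}" = "simulate" ]; then
    flags="$flags -simulate"
  fi
  # Options come before the source and target, as in the usage line.
  printf '%s\n' "gobblin cli run distcp $flags $1 $2"
}

distcp_cmd file:///source/path file:///target/path simulate
</code></pre>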
<h2 id="the-oneshot-quick-app">The OneShot Quick App</h2>
<p>The Gobblin CLI also ships with a generic job runner, the <strong>oneShot</strong> quick app. You can use it to run a single job using a standard config file. This is very useful during development and testing, and it makes it easy to integrate with schedulers that just need to fire off a command line job. The <strong>oneShot</strong> app allows you to run a job in standalone mode or in map-reduce mode.</p>
<pre><code class="bash">$ gobblin cli run oneShot -baseConf &lt;base-config-file&gt; -appConf &lt;path-to-job-conf-file&gt;
# The Base Config file is an optional parameter and contains defaults for your mode of
# execution (e.g. standalone modes would typically use
# gobblin-dist/conf/standalone/application.conf and
# mapreduce mode would typically use gobblin-dist/conf/mapreduce/application.conf)
#
# The Job Config file is your regular .pull or .conf file and is a required parameter.
# You should use a fully qualified URI to your pull file. Otherwise Gobblin will pick the
# default FS configured in the environment, which may not be what you want.
# e.g. file:///gobblin-conf/my-job/wikipedia.pull or hdfs:///gobblin-conf/my-job/kafka-hdfs.pull
</code></pre>
<p>The <strong>oneShot</strong> app comes with certain hardcoded defaults (that it inherits from EmbeddedGobblin <a href="https://github.com/apache/incubator-gobblin/blob/master/gobblin-runtime/src/main/resources/embedded/embedded.conf" rel="nofollow">here</a>), that you may not be expecting. Make sure you understand what they do and override them in your baseConf or appConf files if needed.</p>
<p>Notable differences at the time of this writing include:</p>
<ul>
<li>state.store.enabled = false (set this to true in your appConf or baseConf file if you want state storage for repeated oneShot runs)</li>
<li>data.publisher.appendExtractToFinalDir = false (set this to true in your appConf or baseConf file if you want to see the extract name appended to the job output directory)</li>
</ul>
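<p>To override the two defaults listed above, the corresponding lines can be added to an existing baseConf or appConf file. The following sketch appends them with a small helper. The helper name <code>append_oneshot_overrides</code> and the scratch file are hypothetical, and only the property names come from the list above.</p>
<pre><code class="bash"># Sketch only: appends the two overrides to a config file and echoes what was written.
append_oneshot_overrides() {
  # $1 = path to an existing baseConf or appConf file
  printf '%s\n' \
    'state.store.enabled = true' \
    'data.publisher.appendExtractToFinalDir = true' | tee -a "$1"
}

# Demonstrate on a scratch file; point this at your own config file instead.
conf_file="$(mktemp)"
append_oneshot_overrides "$conf_file"
</code></pre>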
<p>The <strong>oneShot</strong> app lets you specify the log4j file for your job execution, which can be very helpful while debugging pesky failures.
You can launch the job in MR mode by using the <code>-mrMode</code> switch.</p>
<ul>
<li>oneShot execution in standalone mode with a log4j file:</li>
</ul>
<pre><code class="bash">$ gobblin cli run oneShot -baseConf /app/gobblin-dist/conf/standalone/application.conf -appConf file:///app/kafkaConfDir/kafka-simple-hdfs.pull --log4j-conf /app/gobblin-dist/conf/standalone/log4j.properties
</code></pre>
<ul>
<li>oneShot execution of a map-reduce job with a log4j file:</li>
</ul>
<pre><code class="bash">$ gobblin cli run oneShot -mrMode -baseConf /app/gobblin-dist/conf/standalone/application.conf -appConf file:///app/kafkaConfDir/kafka-simple-hdfs.pull --log4j-conf /app/gobblin-dist/conf/standalone/log4j.properties
</code></pre>
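<p>For schedulers that fire off a command line job, the two invocations above can be parameterized. This is a minimal sketch with a hypothetical helper, <code>oneshot_cmd</code>, that prints the command for a given mode. It only rearranges the flags already shown in this section.</p>
<pre><code class="bash"># Sketch only: builds a oneShot command line for standalone or mapreduce mode.
oneshot_cmd() {
  # $1 = mode (standalone or mapreduce), $2 = baseConf path,
  # $3 = appConf URI, $4 = log4j properties file
  mr_flag=""
  if [ "$1" = "mapreduce" ]; then
    mr_flag=" -mrMode"
  fi
  printf '%s\n' "gobblin cli run oneShot${mr_flag} -baseConf $2 -appConf $3 --log4j-conf $4"
}

oneshot_cmd standalone \
  /app/gobblin-dist/conf/standalone/application.conf \
  file:///app/kafkaConfDir/kafka-simple-hdfs.pull \
  /app/gobblin-dist/conf/standalone/log4j.properties
</code></pre>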
<h2 id="developing-quick-apps-for-the-cli">Developing quick apps for the CLI</h2>
<p>It is very easy to convert a subclass of <code>EmbeddedGobblin</code> into a quick application for the Gobblin CLI. All that is needed is to implement an <code>EmbeddedGobblinCliFactory</code> that knows how to instantiate the <code>EmbeddedGobblin</code> from a <code>CommandLine</code> object, and to annotate it with the <code>Alias</code> annotation. There are two utility classes that make this very easy:</p>
<ul>
<li><code>PublicMethodsGobblinCliFactory</code>: this class will automatically infer CLI options from the public methods of a subclass of <code>EmbeddedGobblin</code>. All the developer has to do is implement the method <code>constructEmbeddedGobblin(CommandLine)</code> that calls the appropriate constructor of the desired <code>EmbeddedGobblin</code> subclass with parameters extracted from the CLI. Additionally, it is a good idea to override <code>getUsageString()</code> with the appropriate usage string. For an example, see <code>gobblin.runtime.embedded.EmbeddedGobblinDistcp.CliFactory</code>.</li>
<li><code>ConstructorAndPublicMethodsGobblinCliFactory</code>: this class does everything <code>PublicMethodsGobblinCliFactory</code> does, but it additionally automatically infers how to construct the <code>EmbeddedGobblin</code> object from a constructor annotated with <code>EmbeddedGobblinCliSupport</code>. For an example, see <code>gobblin.runtime.embedded.EmbeddedGobblin.CliFactory</code>.</li>
</ul>
<h2 id="implementing-new-gobblin-commands">Implementing new Gobblin commands</h2>
<p>To add a new Gobblin command that can be listed and executed through <code>./bin/gobblin</code>, implement <code>gobblin.runtime.cli.CliApplication</code> and annotate the implementation with the <code>Alias</code> annotation. The Gobblin CLI will automatically find the command, and users can invoke it by the Alias value.</p>
<h2 id="gobblin-service-execution-modes-as-daemon">Gobblin Service Execution Modes ( as Daemon )</h2>
<p>For more info on Gobblin service execution modes, run <code>bin/gobblin service --help</code>: </p>
<pre><code class="bash">Usage:
gobblin.sh service &lt;execution-mode&gt; &lt;start|stop|status&gt;
Argument Options:
&lt;execution-mode&gt; standalone, cluster-master, cluster-worker, aws,
yarn, mapreduce, service-manager.
--conf-dir &lt;gobblin-conf-dir-path&gt; Gobblin config path. default is '$GOBBLIN_HOME/conf/&lt;exe-mode-name&gt;'.
--log4j-conf &lt;path-of-log4j-file&gt; default is '&lt;gobblin-conf-dir-path&gt;/&lt;execution-mode&gt;/log4j.properties'.
--jvmopts &lt;jvm or gc options&gt; String containing JVM flags to include, in addition to &quot;-Xmx1g -Xms512m&quot;.
--jars &lt;csv list of extra jars&gt; Comma-separated list of extra jars to put on the CLASSPATH.
--enable-gc-logs enables gc logs &amp; dumps.
--show-classpath prints gobblin runtime classpath.
--cluster-name Name of the cluster to be used by helix &amp; other services. ( default: gobblin_cluster).
--jt &lt;resource manager URL&gt; Only for mapreduce mode: Job submission URL, if not set, taken from ${HADOOP_HOME}/conf.
--fs &lt;file system URL&gt; Only for mapreduce mode: Target file system, if not set, taken from ${HADOOP_HOME}/conf.
--help Display this help.
--verbose Display full command used to start the process.
Gobblin Version: 0.15.0
</code></pre>
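<p>The usage above implies a simple start, status, stop lifecycle per execution mode. As a sketch, the hypothetical helper below validates the mode and action against the values in the usage listing and prints the resulting command. It does not start anything itself.</p>
<pre><code class="bash"># Sketch only: validates arguments against the usage listing, then prints the command.
service_cmd() {
  # $1 = execution mode, $2 = action
  case "$1" in
    standalone|cluster-master|cluster-worker|aws|yarn|mapreduce|service-manager) ;;
    *) echo "unknown execution mode: $1"; return 1 ;;
  esac
  case "$2" in
    start|stop|status) ;;
    *) echo "unknown action: $2"; return 1 ;;
  esac
  printf '%s\n' "gobblin service $1 $2"
}

service_cmd standalone start
service_cmd standalone status
</code></pre>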
<ol>
<li>
<p>Standalone:
This mode starts all Gobblin services in a single JVM on a single node. It is useful for development and lightweight usage:
<code>gobblin service standalone start</code>
For more details on each execution mode and its architecture, refer to <a href="/gobblin-docs/user-guide/Gobblin-Deployment.md">Standalone-Deployment</a>.</p>
</li>
<li>
<p>Mapreduce:</p>
<p>This mode depends on Hadoop (both MapReduce and HDFS) running locally or on a remote cluster. Before launching any Gobblin jobs on Hadoop MapReduce, check the Gobblin system configuration file located at <code>conf/mapreduce/application.properties</code> for the property <code>fs.uri</code>, which defines the file system URI used. The default value is <code>hdfs://localhost:8020</code>, which points to the local HDFS on the default port 8020. Change it to the right value for your Hadoop/HDFS setup. For example, if you have HDFS set up somewhere on port 9000, set the property as follows:
<code>fs.uri=hdfs://&lt;namenode host name&gt;:9000/</code></p>
<ul>
<li><code>--jt</code>: resource manager URL</li>
<li><code>--fs</code>: file system URL, the value for <code>fs.uri</code></li>
</ul>
<p>This mode uses the minimum set of Gobblin jars, selected using <code>libs/gobblin-&lt;module_name&gt;-$GOBBLIN_VERSION.jar</code>, which is passed as <code>-libjar</code> to the hadoop command when running the job. The same set of jars is also added to the Hadoop <code>DistributedCache</code> for use in the mappers. If a job needs additional jars for task execution (in the mappers), those jars can be included with the <code>--jars</code> option or with the following property in the job configuration file:
<code>job.jars=&lt;comma-separated list of jars the job depends on&gt;</code>
If <code>HADOOP_HOME</code> is set in the environment, Gobblin adds the result of <code>hadoop classpath</code> ahead of the default <code>GOBBLIN_CLASSPATH</code>, giving those entries precedence when running <code>bin/gobblin</code>.</p>
<p>All job data and persisted job/task states will be written to the specified file system. Before launching any jobs, make sure the environment variable <code>HADOOP_HOME</code> is set so that Gobblin can access the hadoop binaries under <code>{HADOOP_HOME}/bin</code>, and set the working directory with the configuration <code>{gobblin.cluster.work.dir}</code>. Note that the Gobblin working directory will be created on the file system specified above.</p>
<p>An important side effect of this is that (depending on the application) non-fully-qualified paths (like <code>/my/path</code>) will default to the local file system if <code>HADOOP_HOME</code> is not set, while they will default to HDFS if the variable is set. When referring to local paths, it is always a good idea to use the fully qualified path (e.g. <code>file:///my/path</code>).</p>
</li>
<li>
<p>Cluster Mode (master &amp; worker)
This cluster mode consists of master and worker processes.
<code>gobblin service cluster-master start
gobblin service cluster-worker start</code></p>
</li>
<li>
<p>AWS
This mode starts Gobblin on an AWS cloud cluster.
<code>gobblin service aws start</code></p>
</li>
<li>
<p>YARN
This mode starts Gobblin on a YARN cluster.
<code>gobblin service yarn start</code></p>
</li>
</ol>
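<p>The MapReduce notes above recommend fully qualified paths, because unqualified ones resolve differently depending on <code>HADOOP_HOME</code>. The following sketch shows one way to guard against that before launching a job. The helper names <code>qualify_local_path</code> and <code>default_fs_note</code> are hypothetical and are not part of Gobblin.</p>
<pre><code class="bash"># Sketch only: helpers to make local paths explicit before passing them to Gobblin.
qualify_local_path() {
  case "$1" in
    # Already carries a URI scheme (file://, hdfs://, ...): leave untouched.
    *://*) printf '%s\n' "$1" ;;
    # Absolute local path: prefix with the file scheme.
    /*)    printf '%s\n' "file://$1" ;;
    # Relative path: resolve against the current directory first.
    *)     printf '%s\n' "file://$(pwd)/$1" ;;
  esac
}

default_fs_note() {
  # Reports which file system an unqualified path is likely to resolve to.
  if [ -n "${HADOOP_HOME:-}" ]; then
    printf '%s\n' "HADOOP_HOME is set: unqualified paths may default to HDFS"
  else
    printf '%s\n' "HADOOP_HOME is not set: unqualified paths may default to the local file system"
  fi
}

default_fs_note
qualify_local_path /my/path
qualify_local_path hdfs:///gobblin-conf/my-job/kafka-hdfs.pull
</code></pre>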
<h2 id="gobblin-system-configurations">Gobblin System Configurations</h2>
<p>The following values can be overridden by setting them in <code>gobblin-env.sh</code>:</p>
<p><code>GOBBLIN_LOGS</code>: by default, logs are written to <code>$GOBBLIN_HOME/logs</code>; this can be overridden by setting <code>GOBBLIN_LOGS</code>.<br/>
<code>GOBBLIN_VERSION</code>: by default, the Gobblin version is set by the build process; this can be overridden by setting <code>GOBBLIN_VERSION</code>.</p>
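<p>A <code>gobblin-env.sh</code> that overrides both values might look like the sketch below. The log directory is a placeholder, and using <code>export</code> is an assumption about how the launcher script picks the values up.</p>
<pre><code class="bash"># Hypothetical gobblin-env.sh fragment; adjust the values for your installation.
# Write logs somewhere other than $GOBBLIN_HOME/logs (placeholder path).
export GOBBLIN_LOGS="/var/log/gobblin"
# Pin the version instead of relying on the value set by the build process.
export GOBBLIN_VERSION="0.15.0"
</code></pre>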
<p>Details of all Gobblin system configurations can be found in the <a href="user-guide/Configuration-Properties-Glossary">Configuration Properties Glossary</a>.</p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../Gobblin-Compliance/" class="btn btn-neutral float-right" title="Gobblin Compliance">Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../Gobblin-as-a-Library/" class="btn btn-neutral" title="Gobblin as a Library"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org" rel="nofollow">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme" rel="nofollow">theme</a> provided by <a href="https://readthedocs.org" rel="nofollow">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<span><a href="../Gobblin-as-a-Library/" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../Gobblin-Compliance/" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
<script>var base_url = '../..';</script>
<script src="../../js/theme.js" defer></script>
<script src="../../js/extra.js" defer></script>
<script src="../../search/main.js" defer></script>
</body>
</html>