blob: d6ad16b7ad004d90f8b8036f6cae9b876e4d8b20 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Getting Started</title>
<meta name="description" content="Prerequisites Java 8 Maven Latest REEF snapshot YARN settings Download Hadoop 2.7.4 at http://apache.tt.co.kr/hadoop/common/hadoop-2.7.4/ S...">
<link rel="stylesheet" href="/css/main.css">
<link rel="stylesheet" href="/css/font-awesome.min.css">
<link rel="shortcut icon" href="/favicon.ico?1">
<!-- Begin Jekyll SEO tag v2.3.0 -->
<title>Getting Started | Nemo</title>
<meta property="og:title" content="Getting Started" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="Prerequisites Java 8 Maven Latest REEF snapshot YARN settings Download Hadoop 2.7.4 at http://apache.tt.co.kr/hadoop/common/hadoop-2.7.4/ Set the shell profile as following: export HADOOP_HOME=/path/to/hadoop-2.7.4 export YARN_HOME=$HADOOP_HOME export PATH=$PATH:$HADOOP_HOME/bin Protobuf 2.5.0 Downloadable at https://github.com/google/protobuf/releases/tag/v2.5.0 On Ubuntu: Run sudo apt-get install autoconf automake libtool curl make g++ unzip Extract the downloaded tarball and run sudo ./configure sudo make sudo make check sudo make install 3. To check for a successful installation of version 2.5.0, run protoc --version" />
<meta property="og:description" content="Prerequisites Java 8 Maven Latest REEF snapshot YARN settings Download Hadoop 2.7.4 at http://apache.tt.co.kr/hadoop/common/hadoop-2.7.4/ Set the shell profile as following: export HADOOP_HOME=/path/to/hadoop-2.7.4 export YARN_HOME=$HADOOP_HOME export PATH=$PATH:$HADOOP_HOME/bin Protobuf 2.5.0 Downloadable at https://github.com/google/protobuf/releases/tag/v2.5.0 On Ubuntu: Run sudo apt-get install autoconf automake libtool curl make g++ unzip Extract the downloaded tarball and run sudo ./configure sudo make sudo make check sudo make install 3. To check for a successful installation of version 2.5.0, run protoc --version" />
<link rel="canonical" href="http://nemo.apache.org//docs/getting_started/" />
<meta property="og:url" content="http://nemo.apache.org//docs/getting_started/" />
<meta property="og:site_name" content="Nemo" />
<meta property="og:type" content="article" />
<meta property="article:published_time" content="2018-03-23T11:27:52+09:00" />
<script type="application/ld+json">
{"name":null,"description":"Prerequisites Java 8 Maven Latest REEF snapshot YARN settings Download Hadoop 2.7.4 at http://apache.tt.co.kr/hadoop/common/hadoop-2.7.4/ Set the shell profile as following: export HADOOP_HOME=/path/to/hadoop-2.7.4 export YARN_HOME=$HADOOP_HOME export PATH=$PATH:$HADOOP_HOME/bin Protobuf 2.5.0 Downloadable at https://github.com/google/protobuf/releases/tag/v2.5.0 On Ubuntu: Run sudo apt-get install autoconf automake libtool curl make g++ unzip Extract the downloaded tarball and run sudo ./configure sudo make sudo make check sudo make install 3. To check for a successful installation of version 2.5.0, run protoc --version","author":null,"@type":"WebPage","url":"http://nemo.apache.org//docs/getting_started/","publisher":null,"image":null,"headline":"Getting Started","dateModified":"2018-03-23T11:27:52+09:00","datePublished":"2018-03-23T11:27:52+09:00","sameAs":null,"mainEntityOfPage":null,"@context":"http://schema.org"}</script>
<!-- End Jekyll SEO tag -->
<link rel="canonical" href="http://nemo.apache.org//docs/getting_started/">
<link rel="alternate" type="application/rss+xml" title="Nemo" href="http://nemo.apache.org//feed.xml" />
</head>
<body>
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container navbar-container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="/">
<span><img src="/img/nemo-logo.png"></span>
</a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li class="active" ><a href="/docs/home/">Docs</a></li>
<li ><a href="/apidocs">APIs</a></li>
<li ><a href="/pages/downloads">Downloads</a></li>
<li ><a href="/pages/talks">Talks</a></li>
<li ><a href="/pages/team">Team</a></li>
<li ><a href="/pages/license">License</a></li>
<li ><a href="/blog/2018/03/23/shuffle-on-nemo/">Blog</a></li>
</ul>
<div class="navbar-right">
<form class="navbar-form navbar-left">
<div class="form-group has-feedback">
<input id="search-box" type="text" class="form-control" placeholder="Search...">
<i class="fa fa-search form-control-feedback"></i>
</div>
</form>
<ul class="nav navbar-nav">
<li><a href="https://github.com/apache/incubator-nemo-website"><i class="fa fa-github" aria-hidden="true"></i></a></li>
</ul>
</div>
</div>
</div>
</nav>
<div class="page-content">
<div class="wrapper">
<div class="container">
<div class="row">
<div class="col-md-4">
<div class="panel-group" id="accordion" role="tablist" aria-multiselectable="true">
<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse-1" aria-expanded="false" aria-controls="collapse-1">
Getting Started
</a>
</h4>
</div>
<div id="collapse-1" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne">
<ul class="list-group">
<a class="list-group-item " href="/docs/home/">Overview</a>
<a class="list-group-item active" href="/docs/getting_started/">Getting Started</a>
</ul>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse-2" aria-expanded="false" aria-controls="collapse-2">
Optimizations
</a>
</h4>
</div>
<div id="collapse-2" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne">
<ul class="list-group">
<a class="list-group-item " href="/docs/ir/">Nemo Intermediate Representation (IR)</a>
<a class="list-group-item " href="/docs/passes_and_policies/">Passes and Policies</a>
</ul>
</div>
</div>
<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a role="button" data-toggle="collapse" data-parent="#accordion" href="#collapse-3" aria-expanded="false" aria-controls="collapse-3">
System Designs
</a>
</h4>
</div>
<div id="collapse-3" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingOne">
<ul class="list-group">
<a class="list-group-item " href="/docs/compiler_design/">Compiler Design</a>
<a class="list-group-item " href="/docs/runtime_design/">Runtime Design</a>
</ul>
</div>
</div>
</div>
</div>
<div class="col-md-8">
<h1>Getting Started</h1>
<div id="markdown-content-container"><h3 id="prerequisites">Prerequisites</h3>
<ul>
<li>Java 8</li>
<li>Maven</li>
<li>Latest REEF snapshot</li>
<li>YARN settings
<ul>
<li>Download Hadoop 2.7.4 at http://apache.tt.co.kr/hadoop/common/hadoop-2.7.4/</li>
<li>Set the shell profile as following:
<div class="language-bash highlighter-rouge"><pre class="highlight"><code> <span class="nb">export </span><span class="nv">HADOOP_HOME</span><span class="o">=</span>/path/to/hadoop-2.7.4
<span class="nb">export </span><span class="nv">YARN_HOME</span><span class="o">=</span><span class="nv">$HADOOP_HOME</span>
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="nv">$PATH</span>:<span class="nv">$HADOOP_HOME</span>/bin
</code></pre>
</div>
</li>
</ul>
</li>
<li>Protobuf 2.5.0
<ul>
<li>Downloadable at https://github.com/google/protobuf/releases/tag/v2.5.0</li>
<li>On Ubuntu:
<ol>
<li>Run <code class="highlighter-rouge">sudo apt-get install autoconf automake libtool curl make g++ unzip</code></li>
<li>Extract the downloaded tarball and run</li>
</ol>
<ul>
<li><code class="highlighter-rouge">sudo ./configure</code></li>
<li><code class="highlighter-rouge">sudo make</code></li>
<li><code class="highlighter-rouge">sudo make check</code></li>
<li><code class="highlighter-rouge">sudo make install</code>
3. To check for a successful installation of version 2.5.0, run <code class="highlighter-rouge">protoc --version</code></li>
</ul>
</li>
</ul>
</li>
</ul>
<h3 id="installing-nemo">Installing Nemo</h3>
<ul>
<li>Run all tests and install: <code class="highlighter-rouge">mvn clean install -T 2C</code></li>
<li>Run only unit tests and install: <code class="highlighter-rouge">mvn clean install -DskipITs -T 2C</code></li>
</ul>
<h2 id="running-beam-applications">Running Beam applications</h2>
<h3 id="running-an-external-beam-application">Running an external Beam application</h3>
<ul>
<li>Use run_external_app.sh instead of run.sh</li>
<li>Set the first argument the path to the external Beam application jar</li>
</ul>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>./bin/run_external_app.sh <span class="se">\</span>
<span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/nemo_app/target/bd17f-1.0-SNAPSHOT.jar <span class="se">\</span>
-job_id mapreduce <span class="se">\</span>
-executor_json <span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/nemo_runtime/config/default.json <span class="se">\</span>
-user_main MapReduce <span class="se">\</span>
-user_args <span class="s2">"</span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span><span class="s2">/mr_input_data </span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span><span class="s2">/nemo_output/output_data"</span>
</code></pre>
</div>
<h3 id="configurable-options">Configurable options</h3>
<ul>
<li><code class="highlighter-rouge">-job_id</code>: ID of the Beam job</li>
<li><code class="highlighter-rouge">-user_main</code>: Canonical name of the Beam application</li>
<li><code class="highlighter-rouge">-user_args</code>: Arguments that the Beam application accepts</li>
<li><code class="highlighter-rouge">-optimization_policy</code>: Canonical name of the optimization policy to apply to a job DAG in Nemo Compiler</li>
<li><code class="highlighter-rouge">-deploy_mode</code>: <code class="highlighter-rouge">yarn</code> is supported(default value is <code class="highlighter-rouge">local</code>)</li>
</ul>
<h3 id="examples">Examples</h3>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code><span class="c">## MapReduce example</span>
./bin/run.sh <span class="se">\</span>
-job_id mr_default <span class="se">\</span>
-user_main edu.snu.nemo.examples.beam.MapReduce <span class="se">\</span>
-optimization_policy edu.snu.nemo.compiler.optimizer.policy.DefaultPolicy <span class="se">\</span>
-user_args <span class="s2">"</span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span><span class="s2">/src/main/resources/sample_input_mr </span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span><span class="s2">/src/main/resources/sample_output"</span>
<span class="c">## YARN cluster example</span>
./bin/run.sh <span class="se">\</span>
-deploy_mode yarn <span class="se">\</span>
-job_id mr_pado <span class="se">\</span>
-user_main edu.snu.nemo.examples.beam.MapReduce <span class="se">\</span>
-optimization_policy edu.snu.nemo.compiler.optimizer.policy.PadoPolicy <span class="se">\</span>
-user_args <span class="s2">"hdfs://v-m:9000/sample_input_mr hdfs://v-m:9000/sample_output_mr"</span>
</code></pre>
</div>
<h2 id="resource-configuration">Resource Configuration</h2>
<p><code class="highlighter-rouge">-executor_json</code> command line option can be used to provide a path to the JSON file that describes resource configuration for executors. Its default value is <code class="highlighter-rouge">config/default.json</code>, which initializes one of each <code class="highlighter-rouge">Transient</code>, <code class="highlighter-rouge">Reserved</code>, and <code class="highlighter-rouge">Compute</code> executor, each of which has one core and 1024MB memory.</p>
<h3 id="configurable-options-1">Configurable options</h3>
<ul>
<li><code class="highlighter-rouge">num</code> (optional): Number of containers. Default value is 1</li>
<li><code class="highlighter-rouge">type</code>: Three container types are supported:
<ul>
<li><code class="highlighter-rouge">Transient</code> : Containers that store eviction-prone resources. When batch jobs use idle resources in <code class="highlighter-rouge">Transient</code> containers, they can be arbitrarily evicted when latency-critical jobs attempt to use the resources.</li>
<li><code class="highlighter-rouge">Reserved</code> : Containers that store eviction-free resources. <code class="highlighter-rouge">Reserved</code> containers are used to reliably store intermediate data which have high eviction cost.</li>
<li><code class="highlighter-rouge">Compute</code> : Containers that are mainly used for computation.</li>
</ul>
</li>
<li><code class="highlighter-rouge">memory_mb</code>: Memory size in MB</li>
<li><code class="highlighter-rouge">capacity</code>: Number of <code class="highlighter-rouge">TaskGroup</code>s that can be run in an executor. Set this value to be the same as the number of CPU cores of the container.</li>
</ul>
<h3 id="examples-1">Examples</h3>
<div class="language-json highlighter-rouge"><pre class="highlight"><code><span class="p">[</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nt">"num"</span><span class="p">:</span><span class="w"> </span><span class="mi">12</span><span class="p">,</span><span class="w">
</span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Transient"</span><span class="p">,</span><span class="w">
</span><span class="nt">"memory_mb"</span><span class="p">:</span><span class="w"> </span><span class="mi">1024</span><span class="p">,</span><span class="w">
</span><span class="nt">"capacity"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="p">{</span><span class="w">
</span><span class="nt">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Reserved"</span><span class="p">,</span><span class="w">
</span><span class="nt">"memory_mb"</span><span class="p">:</span><span class="w"> </span><span class="mi">1024</span><span class="p">,</span><span class="w">
</span><span class="nt">"capacity"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre>
</div>
<p>This example configuration specifies</p>
<ul>
<li>12 transient containers with 4 cores and 1024MB memory each</li>
<li>1 reserved container with 2 cores and 1024MB memory</li>
</ul>
<h2 id="monitoring-your-job-using-web-ui">Monitoring your job using web UI</h2>
<p>Nemo Compiler and Runtime can store JSON representation of intermediate DAGs.</p>
<ul>
<li><code class="highlighter-rouge">-dag_dir</code> command line option is used to specify the directory where the JSON files are stored. The default directory is <code class="highlighter-rouge">./dag</code>.
Using our <a href="https://service.jangho.io/Nemo-dag/">online visualizer</a>, you can easily visualize a DAG. Just drop the JSON file of the DAG as an input to it.</li>
</ul>
<h3 id="examples-2">Examples</h3>
<div class="language-bash highlighter-rouge"><pre class="highlight"><code>./bin/run.sh <span class="se">\</span>
-job_id als <span class="se">\</span>
-user_main edu.snu.nemo.examples.beam.AlternatingLeastSquare <span class="se">\</span>
-optimization_policy edu.snu.nemo.compiler.optimizer.policy.PadoPolicy <span class="se">\</span>
-dag_dir <span class="s2">"./dag/als"</span> <span class="se">\</span>
-user_args <span class="s2">"</span><span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span><span class="s2">/src/main/resources/sample_input_als 10 3"</span>
</code></pre>
</div>
</div>
<p class="text-center">
<br />
<a target="_blank" href="https://github.com/apache/incubator-nemo-website/tree/asf-site/_docs/getting_started.md" class="btn btn-default btn-sm githubEditButton" role="button">
<i class="fa fa-pencil"></i> Improve this page
</a>
</p>
<hr>
<ul class="pager">
<li class="previous">
<a href="/docs/home/">
<span aria-hidden="true">&larr;</span> Previous
</a>
</li>
<li class="next">
<a href="/docs/ir/">
Next <span aria-hidden="true">&rarr;</span>
</a>
</li>
</div>
<div class="clear"></div>
</div>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="container">
<p class="text-center">
Nemo 2018 |
Powered by <a href="https://github.com/aksakalli/jekyll-doc-theme">Jekyll Doc Theme</a>
</p>
<!-- <p class="text-muted">Place sticky footer content here.</p> -->
</div>
</footer>
<script>
var baseurl = ''
</script>
<script src="//code.jquery.com/jquery-1.10.2.min.js"></script>
<script src="/js/bootstrap.min.js "></script>
<script src="/js/typeahead.bundle.min.js "></script>
<script src="/js/main.js "></script>
</body>
</html>