<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags -->
<title>Apache Flink 0.9.0 Documentation: Quickstart: Setup</title>
<link rel="shortcut icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon">
<link rel="icon" href="http://flink.apache.org/docs/0.9/page/favicon.ico" type="image/x-icon">
<!-- Bootstrap -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/css/bootstrap.min.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/flink.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/syntax.css">
<link rel="stylesheet" href="http://flink.apache.org/docs/0.9/page/css/codetabs.css">
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<!-- Top navbar. -->
<nav class="navbar navbar-default navbar-fixed-top">
<div class="container">
<!-- The logo. -->
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#bs-example-navbar-collapse-1">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<div class="navbar-logo">
<a href="http://flink.apache.org"><img alt="Apache Flink" src="http://flink.apache.org/docs/0.9/page/img/navbar-brand-logo.jpg"></a>
</div>
</div><!-- /.navbar-header -->
<!-- The navigation links. -->
<div class="collapse navbar-collapse" id="bs-example-navbar-collapse-1">
<ul class="nav navbar-nav">
<li><a href="http://flink.apache.org/docs/0.9/index.html">Overview<span class="hidden-sm hidden-xs"> 0.9.0</span></a></li>
<!-- Setup -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/setup" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Setup <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/setup/building.html">Get Flink 0.9-SNAPSHOT</a></li>
<li class="divider"></li>
<li role="presentation" class="dropdown-header"><strong>Deployment</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/local_setup.html" class="active">Local</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/cluster_setup.html">Cluster (Standalone)</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/yarn_setup.html">YARN</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/gce_setup.html">GCloud</a></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/flink_on_tez.html">Flink on Tez <span class="badge">Beta</span></a></li>
<li class="divider"></li>
<li><a href="http://flink.apache.org/docs/0.9/setup/config.html">Configuration</a></li>
</ul>
</li>
<!-- Programming Guides -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/apis" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Programming Guides <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/apis/programming_guide.html"><strong>Batch: DataSet API</strong></a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/streaming_guide.html"><strong>Streaming: DataStream API</strong> <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/python.html">Python API <span class="badge">Beta</span></a></li>
<li class="divider"></li>
<li><a href="scala_shell.html">Interactive Scala Shell</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/dataset_transformations.html">Dataset Transformations</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/best_practices.html">Best Practices</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/example_connectors.html">Connectors</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/examples.html">Examples</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/local_execution.html">Local Execution</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/cluster_execution.html">Cluster Execution</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/cli.html">Command Line Interface</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/web_client.html">Web Client</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/iterations.html">Iterations</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/java8.html">Java 8</a></li>
<li><a href="http://flink.apache.org/docs/0.9/apis/hadoop_compatibility.html">Hadoop Compatibility <span class="badge">Beta</span></a></li>
</ul>
</li>
<!-- Libraries -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/libs" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Libraries <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li><a href="http://flink.apache.org/docs/0.9/libs/spargel_guide.html">Graphs: Spargel</a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/gelly_guide.html">Graphs: Gelly <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/ml/">Machine Learning <span class="badge">Beta</span></a></li>
<li><a href="http://flink.apache.org/docs/0.9/libs/table.html">Relational: Table <span class="badge">Beta</span></a></li>
</ul>
</li>
<!-- Internals -->
<li class="dropdown">
<a href="http://flink.apache.org/docs/0.9/internals" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-expanded="false">Internals <span class="caret"></span></a>
<ul class="dropdown-menu" role="menu">
<li role="presentation" class="dropdown-header"><strong>Contribute</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/how_to_contribute.html">How to Contribute</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/coding_guidelines.html">Coding Guidelines</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/ide_setup.html">IDE Setup</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/logging.html">Logging</a></li>
<li class="divider"></li>
<li role="presentation" class="dropdown-header"><strong>Internals</strong></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/general_arch.html">Architecture &amp; Process Model</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/types_serialization.html">Type Extraction &amp; Serialization</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/job_scheduling.html">Jobs &amp; Scheduling</a></li>
<li><a href="http://flink.apache.org/docs/0.9/internals/add_operator.html">How-To: Add an Operator</a></li>
</ul>
</li>
</ul>
<form class="navbar-form navbar-right hidden-sm hidden-md" role="search" action="http://flink.apache.org/docs/0.9/search-results.html">
<div class="form-group">
<input type="text" class="form-control" name="q" placeholder="Search all pages">
</div>
<button type="submit" class="btn btn-default">Search</button>
</form>
</div><!-- /.navbar-collapse -->
</div><!-- /.container -->
</nav>
<!-- Main content. -->
<div class="container">
<div class="row">
<div class="col-sm-10 col-sm-offset-1">
<h1>Quickstart: Setup</h1>
<ul id="markdown-toc">
<li><a href="#requirements" id="markdown-toc-requirements">Requirements</a></li>
<li><a href="#download" id="markdown-toc-download">Download</a></li>
<li><a href="#start" id="markdown-toc-start">Start</a></li>
<li><a href="#run-example" id="markdown-toc-run-example">Run Example</a></li>
<li><a href="#cluster-setup" id="markdown-toc-cluster-setup">Cluster Setup</a></li>
<li><a href="#flink-on-yarn" id="markdown-toc-flink-on-yarn">Flink on YARN</a></li>
</ul>
<p>Get Flink up and running in a few simple steps.</p>
<h2 id="requirements">Requirements</h2>
<p>Flink runs on <strong>Linux, Mac OS X, and Windows</strong>. The only requirement is a working
<strong>Java 6.x</strong> (or higher) installation. Windows users should consult the
<a href="local_setup.html#flink-on-windows">Flink on Windows</a> guide, which describes
how to run Flink on Windows for local setups.</p>
<h2 id="download">Download</h2>
<p>Download the ready-to-run binary package. Choose the Flink distribution that <strong>matches your Hadoop version</strong>. If you are unsure which version to choose, or you just want to run locally, pick the package for Hadoop 1.2.</p>
<ul class="nav nav-tabs">
<li class="active"><a href="#bin-hadoop1" data-toggle="tab">Hadoop 1.2</a></li>
<li><a href="#bin-hadoop2" data-toggle="tab">Hadoop 2 (YARN)</a></li>
</ul>
<p>
<div class="tab-content text-center">
<div class="tab-pane active" id="bin-hadoop1">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-1',this.href]);" href="http://www.apache.org/dyn/closer.cgi/flink/flink-0.9.0/flink-0.9.0-bin-hadoop1.tgz"><i class="icon-download"> </i> Download Flink for Hadoop 1.2</a>
</div>
<div class="tab-pane" id="bin-hadoop2">
<a class="btn btn-info btn-lg" onclick="_gaq.push(['_trackEvent','Action','download-quickstart-setup-2',this.href]);" href="http://www.apache.org/dyn/closer.cgi/flink/flink-0.9.0/flink-0.9.0-bin-hadoop2.tgz"><i class="icon-download"> </i> Download Flink for Hadoop 2</a>
</div>
</div>
</p>
<h2 id="start">Start</h2>
<ol>
<li>Go to the download directory.</li>
<li>Unpack the downloaded archive.</li>
<li>Start Flink.</li>
</ol>
<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span><span class="nb">cd</span> ~/Downloads <span class="c"># Go to download directory</span>
<span class="nv">$ </span>tar xzf flink-*.tgz <span class="c"># Unpack the downloaded archive</span>
<span class="nv">$ </span><span class="nb">cd </span>flink-0.9.0
<span class="nv">$ </span>bin/start-local.sh <span class="c"># Start Flink</span></code></pre></div>
<p>Check the <strong>JobManager’s web frontend</strong> at <a href="http://localhost:8081">http://localhost:8081</a> and make
sure everything is up and running.</p>
<p>Instead of starting Flink with <code>bin/start-local.sh</code>, you can also start it in a streaming-optimized
mode using <code>bin/start-local-streaming.sh</code>.</p>
<h2 id="run-example">Run Example</h2>
<p>Run the <strong>Word Count example</strong> to see Flink at work.</p>
<ul>
<li>
<p><strong>Download test data</strong>:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>wget -O hamlet.txt http://www.gutenberg.org/cache/epub/1787/pg1787.txt</code></pre></div>
</li>
<li>You now have a text file called <em>hamlet.txt</em> in your working directory.</li>
<li>
<p><strong>Start the example program</strong>:</p>
<div class="highlight"><pre><code class="language-bash"><span class="nv">$ </span>bin/flink run ./examples/flink-java-examples-0.9.0-WordCount.jar file://<span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/hamlet.txt file://<span class="sb">`</span><span class="nb">pwd</span><span class="sb">`</span>/wordcount-result.txt</code></pre></div>
</li>
<li>You will find a file called <strong>wordcount-result.txt</strong> in your current directory.</li>
</ul>
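<p>To take a quick look at the result, the following minimal sketch defines a small helper that lists the most frequent words. The function name <code>top_words</code> is hypothetical, and the sketch assumes that the example wrote a single file with one space-separated <em>word count</em> pair per line. If the output was written as a directory of part files instead, the helper would need to be pointed at those files.</p>
<div class="highlight"><pre><code class="language-bash"># top_words FILE [N]: print the N most frequent words (default: 10).
# Assumes each line of FILE has the form "word count".
top_words() {
  # Sort numerically and in descending order on the second field (the count).
  sort -k2,2nr "$1" | head -n "${2:-10}"
}</code></pre></div>
<p>For example, <code>top_words wordcount-result.txt 5</code> prints the five lines with the highest counts.</p>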
<h2 id="cluster-setup">Cluster Setup</h2>
<p><strong>Running Flink on a cluster</strong> is as easy as running it locally. Having <strong>passwordless SSH</strong> and
<strong>the same directory structure</strong> on all your cluster nodes lets you use our scripts to control
everything.</p>
<ol>
<li>Copy the unpacked <strong>flink</strong> directory from the downloaded archive to the same file system path
on each node of your setup.</li>
<li>Choose a <strong>master node</strong> (JobManager) and set the <code>jobmanager.rpc.address</code> key in
<code>conf/flink-conf.yaml</code> to its IP or hostname. Make sure that all nodes in your cluster have the same
<code>jobmanager.rpc.address</code> configured.</li>
<li>Add the IPs or hostnames (one per line) of all <strong>worker nodes</strong> (TaskManager) to the slaves file
in <code>conf/slaves</code>.</li>
</ol>
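<p>Once the configuration files are edited, the copy in step 1 can be scripted. The sketch below is one way to do it, assuming that <code>rsync</code> is installed on all machines, that passwordless SSH is already set up, and that <code>conf/slaves</code> lists one worker per line. The function name <code>distribute_flink</code> and the <code>RSYNC</code> override variable are hypothetical helpers, not part of Flink.</p>
<div class="highlight"><pre><code class="language-bash"># distribute_flink FLINK_DIR SLAVES_FILE
# Copy FLINK_DIR to the same path on every host listed in SLAVES_FILE.
# Set RSYNC to override the copy command (hypothetical knob for dry runs).
distribute_flink() {
  # Word splitting on the file contents skips blank lines.
  for host in $(cat "$2"); do
    ${RSYNC:-rsync} -a "$1/" "$host:$1/"
  done
}</code></pre></div>
<p>Running <code>distribute_flink /path/to/flink /path/to/flink/conf/slaves</code> on the master copies the directory to every worker. Prefixing the call with <code>RSYNC="echo rsync"</code> only prints the commands it would run.</p>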
<p>You can now <strong>start the cluster</strong> at your master node with <code>bin/start-cluster.sh</code>. If you are planning
to run only streaming jobs with Flink, you can also use an optimized streaming mode: <code>bin/start-cluster-streaming.sh</code>.</p>
<p>The following <strong>example</strong> illustrates the setup with three nodes (with IP addresses from <em>10.0.0.1</em>
to <em>10.0.0.3</em> and hostnames <em>master</em>, <em>worker1</em>, <em>worker2</em>) and shows the contents of the
configuration files, which need to be accessible at the same path on all machines:</p>
<div class="row">
<div class="col-md-6 text-center">
<img src="http://flink.apache.org/docs/0.9/page/img/quickstart_cluster.png" style="width: 85%" />
</div>
<div class="col-md-6">
<div class="row">
<p class="lead text-center">
/path/to/<strong>flink/conf/<br />flink-conf.yaml</strong>
<pre>jobmanager.rpc.address: 10.0.0.1</pre>
</p>
</div>
<div class="row" style="margin-top: 1em;">
<p class="lead text-center">
/path/to/<strong>flink/<br />conf/slaves</strong>
<pre>
10.0.0.2
10.0.0.3</pre>
</p>
</div>
</div>
</div>
<p>Have a look at the <a href="config.html">Configuration</a> section of the documentation to see the other available configuration options.
For Flink to run efficiently, a few configuration values need to be set. The most important ones are:</p>
<ul>
<li>the amount of available memory per TaskManager (<code>taskmanager.heap.mb</code>),</li>
<li>the number of available CPUs per machine (<code>taskmanager.numberOfTaskSlots</code>),</li>
<li>the total number of CPUs in the cluster (<code>parallelism.default</code>) and</li>
<li>the temporary directories (<code>taskmanager.tmp.dirs</code>).</li>
</ul>
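<p>As a rough sketch, these keys could be set in <code>conf/flink-conf.yaml</code> as shown below. The values are illustrative assumptions for the two-worker example above (each worker with 4 CPU cores and about 4 GB of memory available to Flink), not recommendations. Adjust them to your hardware.</p>
<pre># Illustrative values only; adjust to your machines.
jobmanager.rpc.address: 10.0.0.1
# Memory per TaskManager, in megabytes
taskmanager.heap.mb: 4096
# CPUs per machine
taskmanager.numberOfTaskSlots: 4
# Total CPUs in the cluster: 2 workers x 4 slots
parallelism.default: 8
# Temporary directories
taskmanager.tmp.dirs: /tmp</pre>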
<h2 id="flink-on-yarn">Flink on YARN</h2>
<p>You can easily deploy Flink on your existing <strong>YARN cluster</strong>.</p>
<ol>
<li>Download the <strong>Flink Hadoop2 package</strong>: <a href="http://www.apache.org/dyn/closer.cgi/flink/flink-0.9.0/flink-0.9.0-bin-hadoop2.tgz">Flink with Hadoop 2</a></li>
<li>Make sure your <strong>HADOOP_HOME</strong> (or <em>YARN_CONF_DIR</em> or <em>HADOOP_CONF_DIR</em>) <strong>environment variable</strong> is set so that Flink can read your YARN and HDFS configuration.</li>
<li>Run the <strong>YARN client</strong> with <code>./bin/yarn-session.sh</code>. For example, the options <code>-n 10 -tm 8192</code> allocate 10 TaskManagers with 8 GB of memory each.</li>
</ol>
<p>For <strong>more detailed instructions</strong>, check out the Programming Guides and examples.</p>
</div>
<div class="col-sm-10 col-sm-offset-1">
<!-- Disqus thread and some vertical offset -->
<div style="margin-top: 75px; margin-bottom: 50px" id="disqus_thread"></div>
</div>
</div>
</div><!-- /.container -->
<!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.2/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<script src="http://flink.apache.org/docs/0.9/page/js/codetabs.js"></script>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-52545728-1', 'auto');
ga('send', 'pageview');
</script>
<!-- Disqus -->
<script type="text/javascript">
var disqus_shortname = 'stratosphere-eu';
(function() {
var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
dsq.src = '//' + disqus_shortname + '.disqus.com/embed.js';
(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
})();
</script>
</body>
</html>