blob: 4cd31206205b0a41fed14a85337a7ee113151763 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Multi-Lang Protocol</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.3.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Multi-Lang Protocol</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><p>This page explains the multilang protocol as of Storm 0.7.1. Versions prior to 0.7.1 used a somewhat different protocol, documented [here](Storm-multi-language-protocol-(versions-0.7.0-and-below).html).</p>
<h1 id="storm-multi-language-protocol">Storm Multi-Language Protocol</h1>
<h2 id="supported-lanugages">Supported Lanugages</h2>
<p>Storm Multi-Language has implementation in the following languages:</p>
<ul>
<li><a href="https://github.com/apache/storm/tree/master/storm-multilang/javascript">JavaScript</a></li>
<li><a href="https://github.com/apache/storm/tree/master/storm-multilang/python">Python</a></li>
<li><a href="https://github.com/apache/storm/tree/master/storm-multilang/ruby">Ruby</a></li>
</ul>
<p>Third party libraries are available for the following languages:</p>
<ul>
<li><a href="https://github.com/Azure/net-storm-multilang-adapter">c# (on .net core 2.0)</a></li>
</ul>
<h2 id="shell-components">Shell Components</h2>
<p>Support for multiple languages is implemented via the ShellBolt,
ShellSpout, and ShellProcess classes. These classes implement the
IBolt and ISpout interfaces and the protocol for executing a script or
program via the shell using Java&#39;s ProcessBuilder class.</p>
<h3 id="packaging-of-shell-scripts">Packaging of shell scripts</h3>
<p>By default the ShellProcess assumes that your code is packaged inside of your topology jar under the resources subdirectory of your jar and by default will change the current working directory of
the executable process to be that resources directory extracted from the jar.
A jar file does not store permissions of the files in it. This includes the execute bit that would allow a shell script to be laoded and run by the operating systme.
As such in most examples the scripts are of the form <code>python mybolt.py</code> because the python executable is already on the supervisor and mybolt is packaged in the resources directory of the jar.</p>
<p>If you want to package something more complicated, like a new version of python itself, you need to instead use the blob store for this and a <code>.tgz</code> archive that does support permissions.</p>
<p>See the docs on the <a href="distcache-blobstore.html">Blob Store</a> for more details on how to ship a jar.</p>
<p>To make a ShellBolt/ShellSpout work with executables + scripts shipped in the blob store dist cache add</p>
<div class="highlight"><pre><code class="language-" data-lang="">changeChildCWD(false);
</code></pre></div>
<p>in the constructor of your ShellBolt/ShellSpout. The shell command will then be relative to the cwd of the worker. Where the sym-links to the resources are.</p>
<p>So if I shipped python with a symlink named <code>newPython</code> and a python ShellSpout I shipped into <code>shell_spout.py</code> I would have a something like</p>
<div class="highlight"><pre><code class="language-" data-lang="">public MyShellSpout() {
super("./newPython/bin/python", "./shell_spout.py");
changeChildCWD(false);
}
</code></pre></div>
<h2 id="output-fields">Output fields</h2>
<p>Output fields are part of the Thrift definition of the topology. This means that when you multilang in Java, you need to create a bolt that extends ShellBolt, implements IRichBolt, and declare the fields in <code>declareOutputFields</code> (similarly for ShellSpout).</p>
<p>You can learn more about this on <a href="Concepts.html">Concepts</a></p>
<h2 id="protocol-preamble">Protocol Preamble</h2>
<p>A simple protocol is implemented via the STDIN and STDOUT of the
executed script or program. All data exchanged with the process is
encoded in JSON, making support possible for pretty much any language.</p>
<h1 id="packaging-your-stuff">Packaging Your Stuff</h1>
<p>To run a shell component on a cluster, the scripts that are shelled
out to must be in the <code>resources/</code> directory within the jar submitted
to the master.</p>
<p>However, during development or testing on a local machine, the resources
directory just needs to be on the classpath.</p>
<h2 id="the-protocol">The Protocol</h2>
<p>Notes:</p>
<ul>
<li>Both ends of this protocol use a line-reading mechanism, so be sure to
trim off newlines from the input and to append them to your output.</li>
<li>All JSON inputs and outputs are terminated by a single line containing &quot;end&quot;. Note that this delimiter is not itself JSON encoded.</li>
<li>The bullet points below are written from the perspective of the script writer&#39;s
STDIN and STDOUT.</li>
</ul>
<h3 id="initial-handshake">Initial Handshake</h3>
<p>The initial handshake is the same for both types of shell components:</p>
<ul>
<li>STDIN: Setup info. This is a JSON object with the Storm configuration, a PID directory, and a topology context, like this:</li>
</ul>
<div class="highlight"><pre><code class="language-" data-lang="">{
"conf": {
"topology.message.timeout.secs": 3,
// etc
},
"pidDir": "...",
"context": {
"task-&gt;component": {
"1": "example-spout",
"2": "__acker",
"3": "example-bolt1",
"4": "example-bolt2"
},
"taskid": 3,
// Everything below this line is only available in Storm 0.10.0+
"componentid": "example-bolt"
"stream-&gt;target-&gt;grouping": {
"default": {
"example-bolt2": {
"type": "SHUFFLE"}}},
"streams": ["default"],
"stream-&gt;outputfields": {"default": ["word"]},
"source-&gt;stream-&gt;grouping": {
"example-spout": {
"default": {
"type": "FIELDS",
"fields": ["word"]
}
}
}
"source-&gt;stream-&gt;fields": {
"example-spout": {
"default": ["word"]
}
}
}
}
</code></pre></div>
<p>Your script should create an empty file named with its PID in this directory. e.g.
the PID is 1234, so an empty file named 1234 is created in the directory. This
file lets the supervisor know the PID so it can shutdown the process later on.</p>
<p>As of Storm 0.10.0, the context sent by Storm to shell components has been
enhanced substantially to include all aspects of the topology context available
to JVM components. One key addition is the ability to determine a shell
component&#39;s source and targets (i.e., inputs and outputs) in the topology via
the <code>stream-&gt;target-&gt;grouping</code> and <code>source-&gt;stream-&gt;grouping</code> dictionaries. At
the innermost level of these nested dictionaries, groupings are represented as
a dictionary that minimally has a <code>type</code> key, but can also have a <code>fields</code> key
to specify which fields are involved in a <code>FIELDS</code> grouping.</p>
<ul>
<li>STDOUT: Your PID, in a JSON object, like <code>{&quot;pid&quot;: 1234}</code>. The shell component will log the PID to its log.</li>
</ul>
<p>What happens next depends on the type of component:</p>
<h3 id="spouts">Spouts</h3>
<p>Shell spouts are synchronous. The rest happens in a while(true) loop:</p>
<ul>
<li>STDIN: Either a next, ack, activate, deactivate or fail command.</li>
</ul>
<p>&quot;next&quot; is the equivalent of ISpout&#39;s <code>nextTuple</code>. It looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{"command": "next"}
</code></pre></div>
<p>&quot;ack&quot; looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{"command": "ack", "id": "1231231"}
</code></pre></div>
<p>&quot;activate&quot; is the equivalent of ISpout&#39;s <code>activate</code>:
<code>
{&quot;command&quot;: &quot;activate&quot;}
</code></p>
<p>&quot;deactivate&quot; is the equivalent of ISpout&#39;s <code>deactivate</code>:
<code>
{&quot;command&quot;: &quot;deactivate&quot;}
</code></p>
<p>&quot;fail&quot; looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{"command": "fail", "id": "1231231"}
</code></pre></div>
<ul>
<li>STDOUT: The results of your spout for the previous command. This can
be a sequence of emits and logs.</li>
</ul>
<p>An emit looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "emit",
// The id for the tuple. Leave this out for an unreliable emit. The id can
// be a string or a number.
"id": "1231231",
// The id of the stream this tuple was emitted to. Leave this empty to emit to default stream.
"stream": "1",
// If doing an emit direct, indicate the task to send the tuple to
"task": 9,
// All the values in this tuple
"tuple": ["field1", 2, 3]
}
</code></pre></div>
<p>If not doing an emit direct, you will immediately receive the task ids to which the tuple was emitted on STDIN as a JSON array.</p>
<p>A &quot;log&quot; will log a message in the worker log. It looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "log",
// the message to log
"msg": "hello world!"
}
</code></pre></div>
<ul>
<li>STDOUT: a &quot;sync&quot; command ends the sequence of emits and logs. It looks like:</li>
</ul>
<div class="highlight"><pre><code class="language-" data-lang="">{"command": "sync"}
</code></pre></div>
<p>After you sync, ShellSpout will not read your output until it sends another next, ack, or fail command.</p>
<p>Note that, similarly to ISpout, all of the spouts in the worker will be locked up after a next, ack, or fail, until you sync. Also like ISpout, if you have no tuples to emit for a next, you should sleep for a small amount of time before syncing. ShellSpout will not automatically sleep for you.</p>
<h3 id="bolts">Bolts</h3>
<p>The shell bolt protocol is asynchronous. You will receive tuples on STDIN as soon as they are available, and you may emit, ack, and fail, and log at any time by writing to STDOUT, as follows:</p>
<ul>
<li>STDIN: A tuple! This is a JSON encoded structure like this:</li>
</ul>
<div class="highlight"><pre><code class="language-" data-lang="">{
// The tuple's id - this is a string to support languages lacking 64-bit precision
"id": "-6955786537413359385",
// The id of the component that created this tuple
"comp": "1",
// The id of the stream this tuple was emitted to
"stream": "1",
// The id of the task that created this tuple
"task": 9,
// All the values in this tuple
"tuple": ["snow white and the seven dwarfs", "field2", 3]
}
</code></pre></div>
<ul>
<li>STDOUT: An ack, fail, emit, or log. Emits look like:</li>
</ul>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "emit",
// The ids of the tuples this output tuples should be anchored to
"anchors": ["1231231", "-234234234"],
// The id of the stream this tuple was emitted to. Leave this empty to emit to default stream.
"stream": "1",
// If doing an emit direct, indicate the task to send the tuple to
"task": 9,
// All the values in this tuple
"tuple": ["field1", 2, 3]
}
</code></pre></div>
<p>If not doing an emit direct, you will receive the task ids to which
the tuple was emitted on STDIN as a JSON array. Note that, due to the
asynchronous nature of the shell bolt protocol, when you read after
emitting, you may not receive the task ids. You may instead read the
task ids for a previous emit or a new tuple to process. You will
receive the task id lists in the same order as their corresponding
emits, however.</p>
<p>An ack looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "ack",
// the id of the tuple to ack
"id": "123123"
}
</code></pre></div>
<p>A fail looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "fail",
// the id of the tuple to fail
"id": "123123"
}
</code></pre></div>
<p>A &quot;log&quot; will log a message in the worker log. It looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"command": "log",
// the message to log
"msg": "hello world!"
}
</code></pre></div>
<ul>
<li>Note that, as of version 0.7.1, there is no longer any need for a
shell bolt to &#39;sync&#39;.</li>
</ul>
<h3 id="handling-heartbeats-0-9-3-and-later">Handling Heartbeats (0.9.3 and later)</h3>
<p>As of Storm 0.9.3, heartbeats have been between ShellSpout/ShellBolt and their
multi-lang subprocesses to detect hanging/zombie subprocesses. Any libraries
for interfacing with Storm via multi-lang must take the following actions
regarding hearbeats:</p>
<h4 id="spout">Spout</h4>
<p>Shell spouts are synchronous, so subprocesses always send <code>sync</code> commands at the
end of <code>next()</code>, so you should not have to do much to support heartbeats for
spouts. That said, you must not let subprocesses sleep more than the worker
timeout during <code>next()</code>.</p>
<h4 id="bolt">Bolt</h4>
<p>Shell bolts are asynchronous, so a ShellBolt will send heartbeat tuples to its
subprocess periodically. Heartbeat tuple looks like:</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"id": "-6955786537413359385",
"comp": "1",
"stream": "__heartbeat",
// this shell bolt's system task id
"task": -1,
"tuple": []
}
</code></pre></div>
<p>When subprocess receives heartbeat tuple, it must send a <code>sync</code> command back to
ShellBolt.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>