blob: 39c1e116b30d3b6531332f598bed5dafc261b7fc [file] [log] [blame]
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]> <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]> <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"/>
<title>Reliable Message Delivery - Gearpump 0.7.6 Documentation</title>
<link rel="stylesheet" href="css/bootstrap-3.3.5.min.css">
<style>
body {
padding-top: 60px;
padding-bottom: 40px;
}
</style>
<link rel="stylesheet" href="css/main.css">
<link rel="stylesheet" href="css/pygments-default.css">
<script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
</head>
<body>
<!--[if lt IE 7]>
<p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
<![endif]-->
<div class="navbar navbar-inverse navbar-fixed-top" id="topbar">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar" aria-expanded="false" aria-controls="navbar">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="http://gearpump.io">Gearpump
<span class="label label-primary" style="font-size: .6em">0.7.6</span>
</a>
</div>
<div id="navbar" class="collapse navbar-collapse">
<ul class="nav navbar-nav">
<li><a href="index.html">Overview</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Introduction<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="submit-your-1st-application.html">Submit Your 1st Application</a></li>
<li><a href="commandline.html">Client Command Line</a></li>
<li class="divider"></li>
<li><a href="basic-concepts.html">Basic Concepts</a></li>
<li><a href="features.html">Technical Highlights</a></li>
<li><a href="message-delivery.html">Reliable Message Delivery</a></li>
<li><a href="performance-report.html">Performance</a></li>
<li><a href="gearpump-internals.html">Gearpump Internals</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
<ul class="dropdown-menu">
<li class="dropdown-header">Deployment</li>
<li><a href="deployment-local.html">Local Mode</a><li>
<li><a href="deployment-standalone.html">Standalone Mode</a></li>
<li><a href="deployment-yarn.html">YARN Mode</a></li>
<li><a href="deployment-docker.html">Docker Mode</a><li>
<li class="divider"></li>
<li><a href="deployment-ui-authentication.html">UI Authentication</a></li>
<li><a href="deployment-ha.html">High Availability</a></li>
<li><a href="deployment-msg-delivery.html">Reliable Message Delivery</a></li>
<li><a href="deployment-configuration.html">Configuration</a></li>
<li><a href="deployment-resource-isolation.html">Resource Isolation</a></li>
<li class="divider"></li>
<li><a href="deployment-security.html">YARN Security Guide</a></li>
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guide<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="dev-write-1st-app.html">Write Your 1st App</a></li>
<li><a href="dev-custom-serializer.html">Customized Message Passing</a></li>
<li class="divider"></li>
<li><a href="api/scala/index.html">Scala API</a></li>
<li><a href="api/java/index.html">Java API</a></li>
<li><a href="dev-rest-api.html">RESTful API</a></li>
<li class="divider"></li>
<li><a href="dev-connectors.html">Gearpump Connectors</a></li>
<li class="divider"></li>
<li><a href="dev-storm.html">Storm Compatibility</a></li>
<!--
<li><a href="dev-samoa.html">Samoa Compatibility</a></li>
<li class="divider"></li>
<li><a href="dev-iot.html">Gearpump with IoT</a></li>
-->
</ul>
</li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="how-to-contribute.html">How to Contribute</a></li>
<li><a href="coding-style.html">Coding Style</a></li>
<li class="divider"></li>
<li><a href="faq.html">FAQ</a><li>
<li><a href="about.html">About</a></li>
</ul>
</li>
</ul>
</div>
</div>
</div>
<div class="container" id="content">
<h1 class="title">Reliable Message Delivery</h1>
<h2 id="what-is-at-least-once-message-delivery">What is At Least Once Message Delivery?</h2>
<p>Messages could be lost on delivery due to network partitions. <strong>At Least Once Message Delivery</strong> (at least once) means the lost messages are delivered one or more times such that at least one is processed and acked by the whole flow.</p>
<p>Gearpump guarantees at least once for any source that is able to replay message from a past timestamp. In Gearpump, each message is tagged with a timestamp, and the system tracks the minimum timestamp of all pending messages (the global minimum clock). On message loss, application will be restarted to the global minimum clock. Since the source is able to replay from the gloabl minimum clock, all pending messages before the restart will be replayed. Gearpump calls that kind of source <code>TimeReplayableSource</code> and already provides a built in
<a href="gearpump-internals.html#at-least-once-message-delivery-and-kafka">KafkaSource</a>. With the KafkaSource to ingest data into Gearpump, users are guaranteed at least once message delievery.</p>
<h2 id="what-is-exactly-once-message-delivery">What is Exactly Once Message Delivery?</h2>
<p>At least once delivery doesn&#8217;t guarantee the correctness of the application result. For instance, for a task keeping the count of received messages, there could be overcount with duplicated messages and the count is lost on task failure.
In that case, <strong>Exactly Once Message Delivery</strong> (exactly once) is required, where state is updated by a message exactly once. This further requires that duplicated messages are filtered out and in-memory states are persisted.</p>
<p>Users are guaranteed exactly once in Gearpump if they use both a <code>TimeReplayableSource</code> to ingest data and the Persistent API to manage their in memory states. With the Persistent API, user state is periodically checkpointed by the system to a persistent store (e.g HDFS) along with its checkpointed time. Gearpump tracks the global minimum checkpoint timestamp of all pending states (global minimum checkpoint clock), which is persisted as well. On application restart, the system restores states at the global minimum checkpoint clock and source replays messages from that clock. This ensures that a message updates all states exactly once.</p>
<h3 id="persistent-api">Persistent API</h3>
<p>Persistent API consists of <code>PersistentTask</code> and <code>PersistentState</code>.</p>
<p>Here is an example of using them to keep count of incoming messages.</p>
<div class="highlight"><pre><code class="language-scala"><span class="k">class</span> <span class="nc">CountProcessor</span><span class="o">(</span><span class="n">taskContext</span><span class="k">:</span> <span class="kt">TaskContext</span><span class="o">,</span> <span class="n">conf</span><span class="k">:</span> <span class="kt">UserConfig</span><span class="o">)</span>
<span class="k">extends</span> <span class="nc">PersistentTask</span><span class="o">[</span><span class="kt">Long</span><span class="o">](</span><span class="n">taskContext</span><span class="o">,</span> <span class="n">conf</span><span class="o">)</span> <span class="o">{</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">persistentState</span><span class="k">:</span> <span class="kt">PersistentState</span><span class="o">[</span><span class="kt">Long</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
<span class="k">import</span> <span class="nn">com.twitter.algebird.Monoid.longMonoid</span>
<span class="k">new</span> <span class="nc">NonWindowState</span><span class="o">[</span><span class="kt">Long</span><span class="o">](</span><span class="k">new</span> <span class="nc">AlgebirdMonoid</span><span class="o">(</span><span class="n">longMonoid</span><span class="o">),</span> <span class="k">new</span> <span class="nc">ChillSerializer</span><span class="o">[</span><span class="kt">Long</span><span class="o">])</span>
<span class="o">}</span>
<span class="k">override</span> <span class="k">def</span> <span class="n">processMessage</span><span class="o">(</span><span class="n">state</span><span class="k">:</span> <span class="kt">PersistentState</span><span class="o">[</span><span class="kt">Long</span><span class="o">],</span> <span class="n">message</span><span class="k">:</span> <span class="kt">Message</span><span class="o">)</span><span class="k">:</span> <span class="kt">Unit</span> <span class="o">=</span> <span class="o">{</span>
<span class="n">state</span><span class="o">.</span><span class="n">update</span><span class="o">(</span><span class="n">message</span><span class="o">.</span><span class="n">timestamp</span><span class="o">,</span> <span class="mi">1L</span><span class="o">)</span>
<span class="o">}</span>
<span class="o">}</span></code></pre></div>
<p>The <code>CountProcessor</code> creates a customized <code>PersistentState</code> which will be managed by <code>PersistentTask</code> and overrides the <code>processMessage</code> method to define how the state is updated on a new message (each new message counts as <code>1</code>, which is added to the existing value)</p>
<p>Gearpump has already offered two types of states</p>
<ol>
<li>NonWindowState - state with no time or other boundary</li>
<li>WindowState - each state is bounded by a time window</li>
</ol>
<p>They are intended for states that satisfy monoid laws.</p>
<ol>
<li>has binary associative operation, like <code>+</code></li>
<li>has an identity element, like <code>0</code></li>
</ol>
<p>In the above example, we make use of the <code>longMonoid</code> from <a href="https://github.com/twitter/algebird">Twitter&#8217;s Algebird</a> library which provides a bunch of useful monoids.</p>
</div> <!-- /container -->
<script src="js/vendor/jquery-2.1.4.min.js"></script>
<script src="js/vendor/bootstrap-3.3.5.min.js"></script>
<script src="js/vendor/anchor-1.1.1.min.js"></script>
<script src="js/main.js"></script>
<!-- MathJax Section -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: { equationNumbers: { autoNumber: "AMS" } }
});
</script>
<script>
// Note that we load MathJax this way to work with local file (file://), HTTP and HTTPS.
// We could use "//cdn.mathjax...", but that won't support "file://".
(function(d, script) {
script = d.createElement('script');
script.type = 'text/javascript';
script.async = true;
script.onload = function(){
MathJax.Hub.Config({
tex2jax: {
inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
displayMath: [ ["$$","$$"], ["\\[", "\\]"] ],
processEscapes: true,
skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
}
});
};
script.src = ('https:' == document.location.protocol ? 'https://' : 'http://') +
'cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
d.getElementsByTagName('head')[0].appendChild(script);
}(document));
</script>
</body>
</html>