blob: 859d99fdf78c95dee3219d52da06feea2d9e7c5c [file] [log] [blame]
<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Reliable Message Delivery - Apache Gearpump(incubating)</title>
<link rel="shortcut icon" href="../../img/favicon.ico">
<link href='https://fonts.googleapis.com/css?family=Lato:400,700|Roboto+Slab:400,700|Inconsolata:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="../../css/theme.css" type="text/css" />
<link rel="stylesheet" href="../../css/theme_extra.css" type="text/css" />
<link rel="stylesheet" href="../../css/highlight.css">
<script>
// Current page data
var mkdocs_page_name = "Reliable Message Delivery";
</script>
<script src="../../js/jquery-2.1.1.min.js"></script>
<script src="../../js/modernizr-2.8.3.min.js"></script>
<script type="text/javascript" src="../../js/highlight.pack.js"></script>
<script src="../../js/theme.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
<div class="wy-side-nav-search">
<a href="../../index.html" class="icon icon-home"> Apache Gearpump(incubating)</a>
<div role="search">
<form id ="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul class="current">
<li>
<li class="toctree-l1 ">
<a class="" href="../../index.html">Overview</a>
</li>
<li>
<li>
<ul class="subnav">
<li><span>Introduction</span></li>
<li class="toctree-l1 ">
<a class="" href="../submit-your-1st-application/index.html">Submit Your 1st Application</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../commandline/index.html">Client Command Line</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../basic-concepts/index.html">Basic Concepts</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../features/index.html">Technical Highlights</a>
</li>
<li class="toctree-l1 current">
<a class="current" href="index.html">Reliable Message Delivery</a>
<ul>
<li class="toctree-l3"><a href="#what-is-at-least-once-message-delivery">What is At Least Once Message Delivery?</a></li>
<li class="toctree-l3"><a href="#what-is-exactly-once-message-delivery">What is Exactly Once Message Delivery?</a></li>
<li><a class="toctree-l4" href="#persistent-api">Persistent API</a></li>
</ul>
</li>
<li class="toctree-l1 ">
<a class="" href="../performance-report/index.html">Performance</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../gearpump-internals/index.html">Gearpump Internals</a>
</li>
</ul>
<li>
<li>
<ul class="subnav">
<li><span>Deployment</span></li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-local/index.html">Local Mode</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-standalone/index.html">Standalone Mode</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-yarn/index.html">YARN Mode</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-docker/index.html">Docker Mode</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-ui-authentication/index.html">UI Authentication</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-ha/index.html">High Availability</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-msg-delivery/index.html">Reliable Message Delivery</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-configuration/index.html">Configuration</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-resource-isolation/index.html">Resource Isolation</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/deployment-security/index.html">YARN Security Guide</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/get-gearpump-distribution/index.html">How to Get Your Gearpump Distribution</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../deployment/hardware-requirement/index.html">Hardware Requirement</a>
</li>
</ul>
<li>
<li>
<ul class="subnav">
<li><span>Programming Guide</span></li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-write-1st-app/index.html">Write Your 1st App</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-custom-serializer/index.html">Customized Message Passing</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-connectors/index.html">Gearpump Connectors</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-storm/index.html">Storm Compatibility</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-ide-setup/index.html">IDE Setup</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-non-streaming-example/index.html">Non Streaming Examples</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../dev/dev-rest-api/index.html">REST API</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../api/scala/index.html">Scala API</a>
</li>
<li class="toctree-l1 ">
<a class="" href="../../api/java/index.html">Java API</a>
</li>
</ul>
<li>
</ul>
</div>
&nbsp;
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">Apache Gearpump(incubating)</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html">Docs</a> &raquo;</li>
<li>Introduction &raquo;</li>
<li>Reliable Message Delivery</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/apache/incubator-gearpump" class="icon icon-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main">
<div class="section">
<h2 id="what-is-at-least-once-message-delivery">What is At Least Once Message Delivery?</h2>
<p>Messages could be lost on delivery due to network partitions. <strong>At Least Once Message Delivery</strong> (at least once) means the lost messages are delivered one or more times such that at least one is processed and acknowledged by the whole flow. </p>
<p>Gearpump guarantees at least once for any source that is able to replay message from a past timestamp. In Gearpump, each message is tagged with a timestamp, and the system tracks the minimum timestamp of all pending messages (the global minimum clock). On message loss, application will be restarted to the global minimum clock. Since the source is able to replay from the global minimum clock, all pending messages before the restart will be replayed. Gearpump calls that kind of source <code>TimeReplayableSource</code> and already provides a built in
<a href="../gearpump-internals#at-least-once-message-delivery-and-kafka">KafkaSource</a>. With the KafkaSource to ingest data into Gearpump, users are guaranteed at least once message delivery.</p>
<h2 id="what-is-exactly-once-message-delivery">What is Exactly Once Message Delivery?</h2>
<p>At least once delivery doesn't guarantee the correctness of the application result. For instance, for a task keeping the count of received messages, there could be overcount with duplicated messages and the count is lost on task failure.
In that case, <strong>Exactly Once Message Delivery</strong> (exactly once) is required, where state is updated by a message exactly once. This further requires that duplicated messages are filtered out and in-memory states are persisted.</p>
<p>Users are guaranteed exactly once in Gearpump if they use both a <code>TimeReplayableSource</code> to ingest data and the Persistent API to manage their in memory states. With the Persistent API, user state is periodically checkpointed by the system to a persistent store (e.g HDFS) along with its checkpointed time. Gearpump tracks the global minimum checkpoint timestamp of all pending states (global minimum checkpoint clock), which is persisted as well. On application restart, the system restores states at the global minimum checkpoint clock and source replays messages from that clock. This ensures that a message updates all states exactly once.</p>
<h3 id="persistent-api">Persistent API</h3>
<p>Persistent API consists of <code>PersistentTask</code> and <code>PersistentState</code>.</p>
<p>Here is an example of using them to keep count of incoming messages.</p>
<pre class="codehilite"><code class="language-scala">class CountProcessor(taskContext: TaskContext, conf: UserConfig)
extends PersistentTask[Long](taskContext, conf) {
override def persistentState: PersistentState[Long] = {
import com.twitter.algebird.Monoid.longMonoid
new NonWindowState[Long](new AlgebirdMonoid(longMonoid), new ChillSerializer[Long])
}
override def processMessage(state: PersistentState[Long], message: Message): Unit = {
state.update(message.timestamp, 1L)
}
}</code></pre>
<p>The <code>CountProcessor</code> creates a customized <code>PersistentState</code> which will be managed by <code>PersistentTask</code> and overrides the <code>processMessage</code> method to define how the state is updated on a new message (each new message counts as <code>1</code>, which is added to the existing value)</p>
<p>Gearpump has already offered two types of states</p>
<ol>
<li>NonWindowState - state with no time or other boundary</li>
<li>WindowState - each state is bounded by a time window</li>
</ol>
<p>They are intended for states that satisfy monoid laws.</p>
<ol>
<li>has binary associative operation, like <code>+</code> </li>
<li>has an identity element, like <code>0</code></li>
</ol>
<p>In the above example, we make use of the <code>longMonoid</code> from <a href="https://github.com/twitter/algebird">Twitter's Algebird</a> library which provides a bunch of useful monoids. </p>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="../performance-report/index.html" class="btn btn-neutral float-right" title="Performance"/>Next <span class="icon icon-circle-arrow-right"></span></a>
<a href="../features/index.html" class="btn btn-neutral" title="Technical Highlights"><span class="icon icon-circle-arrow-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<!-- Copyright etc -->
</div>
Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<div class="rst-versions" role="note" style="cursor: pointer">
<span class="rst-current-version" data-toggle="rst-current-version">
<a class="icon icon-github" style="float: left; color: #fcfcfc"> GitHub</a>
<span><a href="../features/index.html" style="color: #fcfcfc;">&laquo; Previous</a></span>
<span style="margin-left: 15px"><a href="../performance-report/index.html" style="color: #fcfcfc">Next &raquo;</a></span>
</span>
</div>
</body>
</html>