<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>Structure of the Codebase</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.3.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">Structure of the Codebase</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><p>There are three distinct layers to Storm&#39;s codebase.</p>
<p>First, Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.</p>
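<p>Second, all of Storm&#39;s interfaces are specified as Java interfaces. This means that every feature of Storm is always available via Java.</p>
<p>Third, Storm&#39;s implementation, which was originally written in Clojure, is now mostly Java.</p>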
<p>The following sections explain each of these layers in more detail.</p>
<h3 id="storm-thrift">storm.thrift</h3>
<p>The first place to look to understand the structure of Storm&#39;s codebase is the <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/storm.thrift">storm.thrift</a> file.</p>
<p>Every spout or bolt in a topology is given a user-specified identifier called the &quot;component id&quot;. The component id is used to specify subscriptions from a bolt to the output streams of other spouts or bolts. A <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/storm.thrift">StormTopology</a> structure contains a map from component id to component for each type of component (spouts and bolts).</p>
<p>Spouts and bolts have the same Thrift definition, so let&#39;s just take a look at the <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/storm.thrift">Thrift definition for bolts</a>. It contains a <code>ComponentObject</code> struct and a <code>ComponentCommon</code> struct.</p>
<p>The <code>ComponentObject</code> defines the implementation for the bolt. It can be one of three types:</p>
<ol>
<li>A serialized Java object (that implements <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/task/IBolt.java">IBolt</a>)</li>
<li>A <code>ShellComponent</code> object that indicates the implementation is in another language. Specifying a bolt this way will cause Storm to instantiate a <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/task/ShellBolt.java">ShellBolt</a> object to handle the communication between the JVM-based worker process and the non-JVM-based implementation of the component.</li>
<li>A <code>JavaObject</code> structure which tells Storm the classname and constructor arguments to use to instantiate that bolt. This is useful if you want to define a topology in a non-JVM language. This way, you can make use of JVM-based spouts and bolts without having to create and serialize a Java object yourself.</li>
</ol>
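<p>The three alternatives above behave like a tagged union: exactly one is populated for any given bolt. The following is a minimal plain-Java sketch of that idea, not the generated Thrift class. <code>ComponentObjectSketch</code>, its field names, and the example script and class names are all hypothetical.</p>

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the ComponentObject union: exactly one variant is populated.
final class ComponentObjectSketch {
    enum Kind { SERIALIZED_JAVA, SHELL, JAVA_OBJECT }

    final Kind kind;
    final byte[] serializedJava;      // SERIALIZED_JAVA: bytes of a serialized bolt object
    final List<String> shellCommand;  // SHELL: executable followed by script
    final String className;           // JAVA_OBJECT: class to instantiate
    final List<Object> ctorArgs;      // JAVA_OBJECT: constructor arguments

    private ComponentObjectSketch(Kind kind, byte[] serializedJava, List<String> shellCommand,
                                  String className, List<Object> ctorArgs) {
        this.kind = kind;
        this.serializedJava = serializedJava;
        this.shellCommand = shellCommand;
        this.className = className;
        this.ctorArgs = ctorArgs;
    }

    static ComponentObjectSketch serializedJava(byte[] bytes) {
        return new ComponentObjectSketch(Kind.SERIALIZED_JAVA, bytes.clone(), null, null, null);
    }

    static ComponentObjectSketch shell(String executable, String script) {
        return new ComponentObjectSketch(Kind.SHELL, null, Arrays.asList(executable, script), null, null);
    }

    static ComponentObjectSketch javaObject(String className, Object... args) {
        return new ComponentObjectSketch(Kind.JAVA_OBJECT, null, null, className,
                Collections.unmodifiableList(Arrays.asList(args)));
    }

    // Describes what a worker would have to do to obtain a runnable bolt from this variant.
    String describe() {
        switch (kind) {
            case SERIALIZED_JAVA:
                return "deserialize " + serializedJava.length + " byte(s) into a bolt";
            case SHELL:
                return "run " + String.join(" ", shellCommand) + " behind a shell bolt";
            default:
                return "instantiate " + className + " with " + ctorArgs.size() + " constructor argument(s)";
        }
    }
}
```

<p>For example, <code>ComponentObjectSketch.shell("python", "splitsentence.py")</code> models the second case, and <code>ComponentObjectSketch.javaObject("com.example.MyBolt", 3)</code> models the third.</p>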
<p><code>ComponentCommon</code> defines everything else for this component. This includes:</p>
<ol>
<li>What streams this component emits and the metadata for each stream (whether it&#39;s a direct stream, the fields declaration)</li>
<li>What streams this component consumes (specified as a map from component_id:stream_id to the stream grouping to use)</li>
<li>The parallelism for this component</li>
<li>The component-specific <a href="Configuration.html">configuration</a> for this component</li>
</ol>
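<p>The four items above can be pictured as one small data structure. The sketch below is a plain-Java illustration under stated assumptions, not the generated Thrift class. Every class and field name is hypothetical, and groupings are reduced to plain strings for brevity.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical key for the inputs map: one stream of one upstream component.
final class StreamKeySketch {
    final String componentId;
    final String streamId;

    StreamKeySketch(String componentId, String streamId) {
        this.componentId = componentId;
        this.streamId = streamId;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof StreamKeySketch)) return false;
        StreamKeySketch other = (StreamKeySketch) o;
        return componentId.equals(other.componentId) && streamId.equals(other.streamId);
    }

    @Override public int hashCode() { return Objects.hash(componentId, streamId); }
}

// Hypothetical metadata for one emitted stream.
final class StreamDeclSketch {
    final List<String> fields;
    final boolean direct;

    StreamDeclSketch(List<String> fields, boolean direct) {
        this.fields = fields;
        this.direct = direct;
    }
}

// Hypothetical stand-in for ComponentCommon.
final class ComponentCommonSketch {
    final Map<String, StreamDeclSketch> outputStreams = new LinkedHashMap<>(); // streams emitted
    final Map<StreamKeySketch, String> inputs = new LinkedHashMap<>();         // streams consumed -> grouping
    int parallelism = 1;                                                       // parallelism for the component
    String jsonConf = "{}";                                                    // component-specific configuration

    ComponentCommonSketch emits(String streamId, boolean direct, String... fields) {
        outputStreams.put(streamId, new StreamDeclSketch(Arrays.asList(fields), direct));
        return this;
    }

    ComponentCommonSketch consumes(String componentId, String streamId, String grouping) {
        inputs.put(new StreamKeySketch(componentId, streamId), grouping);
        return this;
    }
}
```

<p>A word-counting bolt would then be described by something like <code>new ComponentCommonSketch().emits("default", false, "word", "count").consumes("split", "default", "fields:word")</code>, where the component id <code>split</code> and the grouping string are made up for the example.</p>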
<p>Note that the spout structure also has a <code>ComponentCommon</code> field, so spouts can also carry declarations to consume other input streams. However, the Storm Java API does not provide a way for spouts to consume other streams, and if you put any input declarations there for a spout you will get an error when you try to submit the topology. The input declarations field on spouts exists not for users, but for Storm itself. Storm adds implicit streams and bolts to the topology to set up the <a href="Acking-framework-implementation.html">acking framework</a>, and two of these implicit streams are from the acker bolt to each spout in the topology. The acker sends &quot;ack&quot; or &quot;fail&quot; messages along these streams whenever a tuple tree is detected to be completed or failed. The code that transforms the user&#39;s topology into the runtime topology is located <a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java">here</a>.</p>
<h3 id="java-interfaces">Java interfaces</h3>
<p>The interfaces for Storm are generally specified as Java interfaces. The main interfaces are:</p>
<ol>
<li><a href="javadocs/org/apache/storm/topology/IRichBolt.html">IRichBolt</a></li>
<li><a href="javadocs/org/apache/storm/topology/IRichSpout.html">IRichSpout</a></li>
<li><a href="javadocs/org/apache/storm/topology/TopologyBuilder.html">TopologyBuilder</a></li>
</ol>
<p>The strategy for the majority of the interfaces is to:</p>
<ol>
<li>Specify the interface using a Java interface</li>
<li>Provide a base class that provides default implementations when appropriate</li>
</ol>
<p>You can see this strategy at work with the <a href="javadocs/org/apache/storm/topology/base/BaseRichSpout.html">BaseRichSpout</a> class.</p>
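<p>The interface-plus-base-class strategy can be sketched in a few lines of plain Java. The types below are hypothetical and much smaller than the real Storm interfaces; they only illustrate how a base class lets an implementer override just the methods that matter.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down spout interface; the real interface has more methods.
interface SpoutSketch {
    void open();
    void nextTuple(List<String> out);
    void ack(Object msgId);
    void fail(Object msgId);
    void close();
}

// Base class supplying no-op defaults wherever a default is sensible.
abstract class BaseSpoutSketch implements SpoutSketch {
    @Override public void open() {}
    @Override public void ack(Object msgId) {}
    @Override public void fail(Object msgId) {}
    @Override public void close() {}
}

// A concrete spout only has to implement the one method without a default.
final class CountingSpoutSketch extends BaseSpoutSketch {
    private int next = 0;

    @Override public void nextTuple(List<String> out) {
        out.add("tuple-" + next++);
    }
}

final class BaseClassDemo {
    // Drives the spout for the given number of calls and returns everything it emitted.
    static List<String> drain(SpoutSketch spout, int calls) {
        List<String> out = new ArrayList<>();
        spout.open();
        for (int i = 0; i < calls; i++) {
            spout.nextTuple(out);
        }
        spout.close();
        return out;
    }
}
```

<p>The design choice is that the interface stays complete, so nothing is hidden from advanced users, while the base class keeps simple implementations short.</p>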
<p>Spouts and bolts are serialized into the Thrift definition of the topology as described above. </p>
<p>One subtle aspect of the interfaces is the difference between <code>IBolt</code> and <code>ISpout</code> vs. <code>IRichBolt</code> and <code>IRichSpout</code>. The main difference between them is the addition of the <code>declareOutputFields</code> method in the &quot;Rich&quot; versions of the interfaces. The reason for the split is that the output fields declaration for each output stream needs to be part of the Thrift struct (so it can be specified from any language), but as a user you want to be able to declare the streams as part of your class. What <code>TopologyBuilder</code> does when constructing the Thrift representation is call <code>declareOutputFields</code> to get the declaration and convert it into the Thrift structure. The conversion happens in the <code>TopologyBuilder</code> code.</p>
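<p>The conversion step can be sketched as follows: the builder hands the component a declarer, records whatever the component declares, and stores the result as plain data that could then be copied into a Thrift struct. This is an illustration only. All names are hypothetical, and the real <code>TopologyBuilder</code> code differs in detail.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical declarer handed to the component.
interface DeclarerSketch {
    void declareStream(String streamId, boolean direct, String... fields);
}

// Hypothetical "rich" component: it declares its own output streams.
interface RichComponentSketch {
    void declareOutputFields(DeclarerSketch declarer);
}

// Hypothetical plain-data form of one stream declaration.
final class StreamInfoSketch {
    final List<String> outputFields;
    final boolean direct;

    StreamInfoSketch(List<String> outputFields, boolean direct) {
        this.outputFields = outputFields;
        this.direct = direct;
    }
}

final class BuilderSketch {
    // Calls declareOutputFields and converts the declarations into a stream-id -> metadata map.
    static Map<String, StreamInfoSketch> toStreams(RichComponentSketch component) {
        final Map<String, StreamInfoSketch> streams = new LinkedHashMap<>();
        component.declareOutputFields((streamId, direct, fields) ->
                streams.put(streamId, new StreamInfoSketch(Arrays.asList(fields), direct)));
        return streams;
    }
}
```

<p>With this split, the component author writes only <code>declareOutputFields</code>, while the language-neutral representation is produced by the builder.</p>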
<h3 id="implementation">Implementation</h3>
<p>Specifying all the functionality via Java interfaces ensures that every feature of Storm is available via Java. Moreover, the focus on Java interfaces ensures that the user experience from Java-land is pleasant as well.</p>
<p>Storm was originally implemented in Clojure, but most of the code has since been ported to Java.</p>
<p>Here&#39;s a summary of the purpose of the main Java packages:</p>
<h4 id="java-packages">Java packages</h4>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/coordination">org.apache.storm.coordination</a>: Implements the pieces required to coordinate batch-processing on top of Storm, which DRPC uses. <code>CoordinatedBolt</code> is the most important class here.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/drpc">org.apache.storm.drpc</a>: Implementation of the DRPC higher-level abstraction.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/generated">org.apache.storm.generated</a>: The generated Thrift code for Storm.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/grouping">org.apache.storm.grouping</a>: Contains the interface for making custom stream groupings.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/hooks">org.apache.storm.hooks</a>: Interfaces for hooking into various events in Storm, such as when tasks emit tuples, when tuples are acked, etc. User guide for hooks is <a href="Hooks.html">here</a>.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/serialization">org.apache.storm.serialization</a>: Implementation of how Storm serializes/deserializes tuples. Built on top of <a href="https://github.com/EsotericSoftware/kryo">Kryo</a>.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/spout">org.apache.storm.spout</a>: Definition of spout and associated interfaces (like the <code>SpoutOutputCollector</code>). Also contains <code>ShellSpout</code> which implements the protocol for defining spouts in non-JVM languages.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/task">org.apache.storm.task</a>: Definition of bolt and associated interfaces (like <code>OutputCollector</code>). Also contains <code>ShellBolt</code> which implements the protocol for defining bolts in non-JVM languages. Finally, <code>TopologyContext</code> is defined here as well, which is provided to spouts and bolts so they can get data about the topology and its execution at runtime.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/testing">org.apache.storm.testing</a>: Contains a variety of test bolts and utilities used in Storm&#39;s unit tests.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/topology">org.apache.storm.topology</a>: Java layer over the underlying Thrift structure to provide a clean, pure-Java API to Storm (users don&#39;t have to know about Thrift). <code>TopologyBuilder</code> is here as well as the helpful base classes for the different spouts and bolts. The slightly-higher level <code>IBasicBolt</code> interface is here, which is a simpler way to write certain kinds of bolts.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/tuple">org.apache.storm.tuple</a>: Implementation of Storm&#39;s tuple data model.</p>
<p><a href="http://github.com/apache/storm/tree/v2.3.0/storm-client/src/jvm/org/apache/storm/utils">org.apache.storm.utils</a>: Data structures and miscellaneous utilities used throughout the codebase. This includes utilities for time simulation.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-core/src/jvm/org/apache/storm/command">org.apache.storm.command.*</a>: These implement various commands for the <code>storm</code> command line client. These implementations are very short.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/cluster">org.apache.storm.cluster</a>: This code manages how cluster state (like what tasks are running where, what spout/bolt each task runs as) is stored, typically in ZooKeeper.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/daemon/Acker.java">org.apache.storm.daemon.Acker</a>: Implementation of the &quot;acker&quot; bolt, which is a key part of how Storm guarantees data processing.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-webapp/src/jvm/org/apache/storm/daemon/DrpcServer.java">org.apache.storm.daemon.DrpcServer</a>: Implementation of the DRPC server for use with DRPC topologies.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-server/src/jvm/org/apache/storm/event">org.apache.storm.event</a>: Implements a simple asynchronous function executor. Used in various places in Nimbus and Supervisor to make functions execute in serial to avoid any race conditions.</p>
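<p>The idea of running functions in serial to avoid races can be sketched with a single worker thread from the JDK. This is not Storm&#39;s implementation; <code>SerialEventRunnerSketch</code> is a hypothetical name and the sketch only shows the technique.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical asynchronous executor: callers enqueue work and return immediately,
// while a single worker thread runs the events one at a time, in submission order.
final class SerialEventRunnerSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Enqueues an event without blocking the caller.
    void add(Runnable event) {
        worker.execute(event);
    }

    // Stops accepting events and waits for the queued ones to finish.
    // Returns true if everything completed within the timeout.
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

<p>Because only one thread ever runs the events, state touched exclusively by events needs no further locking.</p>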
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-server/src/main/java/org/apache/storm/LocalCluster.java">org.apache.storm.LocalCluster</a>: Utility to boot up Storm inside an existing Java process. Often used in conjunction with <code>Testing.java</code> to implement integration tests.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/messaging">org.apache.storm.messaging.*</a>: Defines a higher-level interface for implementing point-to-point messaging. In local mode Storm uses in-memory Java queues to do this; on a cluster, it uses Netty, but it is pluggable.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/stats">org.apache.storm.stats</a>: Implementation of stats rollup routines used when sending stats to ZK for use by the UI. Does things like windowed and rolling aggregations at multiple granularities.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/Thrift.java">org.apache.storm.Thrift</a>: Wrappers around the generated Thrift API to make working with Thrift structures more pleasant.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/StormTimer.java">org.apache.storm.StormTimer</a>: Implementation of a background timer to execute functions in the future or on a recurring interval. Storm couldn&#39;t use the <a href="http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Timer.html">Timer</a> class because it needed integration with time simulation in order to be able to unit test Nimbus and the Supervisor.</p>
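<p>Why time simulation requires a custom timer can be seen in a small sketch: if the timer reads the time from an injected clock instead of the system clock, a test can advance time by hand. The code below is a hypothetical, single-threaded illustration (no background thread), not the <code>StormTimer</code> implementation.</p>

```java
import java.util.PriorityQueue;
import java.util.function.LongSupplier;

// Hypothetical timer whose notion of "now" comes from an injected clock.
final class TimerSketch {
    private static final class Entry implements Comparable<Entry> {
        final long dueMillis;
        final long seq;       // tie-breaker so equal due times run in scheduling order
        final Runnable task;

        Entry(long dueMillis, long seq, Runnable task) {
            this.dueMillis = dueMillis;
            this.seq = seq;
            this.task = task;
        }

        @Override public int compareTo(Entry other) {
            int byTime = Long.compare(dueMillis, other.dueMillis);
            return byTime != 0 ? byTime : Long.compare(seq, other.seq);
        }
    }

    private final LongSupplier clock;
    private final PriorityQueue<Entry> queue = new PriorityQueue<>();
    private long nextSeq = 0;

    TimerSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Schedules a task to become due delayMillis after the clock's current time.
    void schedule(long delayMillis, Runnable task) {
        queue.add(new Entry(clock.getAsLong() + delayMillis, nextSeq++, task));
    }

    // Runs every task that is due according to the injected clock; returns how many ran.
    int runDueTasks() {
        int ran = 0;
        while (!queue.isEmpty() && queue.peek().dueMillis <= clock.getAsLong()) {
            queue.poll().task.run();
            ran++;
        }
        return ran;
    }
}
```

<p>In production the clock could be <code>System::currentTimeMillis</code>; in a test it can be a variable that the test increments, so a task scheduled for "one second from now" fires as soon as the test moves the clock forward.</p>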
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-server/src/jvm/org/apache/storm/daemon/nimbus/Nimbus.java">org.apache.storm.daemon.nimbus</a>: Implementation of Nimbus.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-server/src/jvm/org/apache/storm/daemon/supervisor/Supervisor.java">org.apache.storm.daemon.supervisor</a>: Implementation of Supervisor.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/daemon/Task.java">org.apache.storm.daemon.task</a>: Implementation of an individual task for a spout or bolt. Handles message routing, serialization, stats collection for the UI, as well as the spout-specific and bolt-specific execution implementations.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java">org.apache.storm.daemon.worker</a>: Implementation of a worker process (which will contain many tasks within). Implements message transferring and task launching.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-server/src/main/java/org/apache/storm/Testing.java">org.apache.storm.Testing</a>: Various utilities for working with local clusters during tests, e.g. <code>completeTopology</code> for running a fixed set of tuples through a topology for capturing the output, tracker topologies for having fine grained control over detecting when a cluster is &quot;idle&quot;, and other utilities.</p>
<h4 id="clojure-namespaces">Clojure namespaces</h4>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-clojure/src/clj/org/apache/storm/clojure.clj">org.apache.storm.clojure</a>: Implementation of the Clojure DSL for Storm.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-clojure/src/clj/org/apache/storm/config.clj">org.apache.storm.config</a>: Creates Clojure symbols for the config names in <a href="javadocs/org/apache/storm/Config.html">Config.java</a>.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-clojure/src/clj/org/apache/storm/log.clj">org.apache.storm.log</a>: Defines the functions used to log messages to log4j.</p>
<p><a href="http://github.com/apache/storm/blob/v2.3.0/storm-core/src/clj/org/apache/storm/ui">org.apache.storm.ui.*</a>: Implementation of Storm UI. Completely independent from the rest of the codebase and uses the Nimbus Thrift API to get data.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>