blob: 95e240834b12cdc6ad455847e57db8b9b3debd94 [file] [log] [blame]
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<title>CGroup Enforcement</title>
<!-- Bootstrap core CSS -->
<link href="/assets/css/bootstrap.min.css" rel="stylesheet">
<!-- Bootstrap theme -->
<link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css">
<link href="/css/style.css" rel="stylesheet">
<link href="/assets/css/owl.theme.css" rel="stylesheet">
<link href="/assets/css/owl.carousel.css" rel="stylesheet">
<script type="text/javascript" src="/assets/js/jquery.min.js"></script>
<script type="text/javascript" src="/assets/js/bootstrap.min.js"></script>
<script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script>
<script type="text/javascript" src="/assets/js/storm.js"></script>
<!-- Just for debugging purposes. Don't actually copy these 2 lines! -->
<!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]-->
<!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->
<!--[if lt IE 9]>
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
<script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<div class="container-fluid">
<div class="row">
<div class="col-md-5">
<a href="/index.html"><img src="/images/logo.png" class="logo" /></a>
</div>
<div class="col-md-5">
<h1>Version: 2.2.0</h1>
</div>
<div class="col-md-2">
<a href="/downloads.html" class="btn-std btn-block btn-download">Download</a>
</div>
</div>
</div>
</header>
<!--Header End-->
<!--Navigation Begin-->
<div class="navbar" role="banner">
<div class="container-fluid">
<div class="navbar-header">
<button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
</div>
<nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation">
<ul class="nav navbar-nav">
<li><a href="/index.html" id="home">Home</a></li>
<li><a href="/getting-help.html" id="getting-help">Getting Help</a></li>
<li><a href="/about/integrates.html" id="project-info">Project Information</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/releases/2.3.0/index.html">2.3.0</a></li>
<li><a href="/releases/2.2.0/index.html">2.2.0</a></li>
<li><a href="/releases/2.1.0/index.html">2.1.0</a></li>
<li><a href="/releases/2.0.0/index.html">2.0.0</a></li>
<li><a href="/releases/1.2.3/index.html">1.2.3</a></li>
</ul>
</li>
<li><a href="/talksAndVideos.html">Talks and Slideshows</a></li>
<li class="dropdown">
<a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a>
<ul class="dropdown-menu">
<li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li>
<li><a href="/contribute/People.html">People</a></li>
<li><a href="/contribute/BYLAWS.html">ByLaws</a></li>
</ul>
</li>
<li><a href="/2021/09/27/storm230-released.html" id="news">News</a></li>
</ul>
</nav>
</div>
</div>
<div class="container-fluid">
<h1 class="page-title">CGroup Enforcement</h1>
<div class="row">
<div class="col-md-12">
<!-- Documentation -->
<p class="post-meta"></p>
<div class="documentation-content"><h1 id="cgroups-in-storm">CGroups in Storm</h1>
<p>CGroups are used by Storm to limit the resource usage of workers to guarantee fairness and QOS. </p>
<p><strong>Please note: CGroups is currently supported only on Linux platforms (kernel version 2.6.24 and above)</strong> </p>
<h2 id="setup">Setup</h2>
<p>To use CGroups make sure to install cgroups and configure cgroups correctly. For more information about setting up and configuring, please visit:</p>
<p><a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html">https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html</a></p>
<p>A sample/default cgconfig.conf file is supplied in the <stormroot>/conf directory. The contents are as follows:</p>
<div class="highlight"><pre><code class="language-" data-lang="">mount {
cpuset = /cgroup/cpuset;
cpu = /cgroup/storm_resources;
cpuacct = /cgroup/storm_resources;
memory = /cgroup/storm_resources;
devices = /cgroup/devices;
freezer = /cgroup/freezer;
net_cls = /cgroup/net_cls;
blkio = /cgroup/blkio;
}
group storm {
perm {
task {
uid = 500;
gid = 500;
}
admin {
uid = 500;
gid = 500;
}
}
cpu {
}
memory {
}
cpuacct {
}
}
</code></pre></div>
<p>For a more detailed explanation of the format and configs for the cgconfig.conf file, please visit:</p>
<p><a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html#The_cgconfig.conf_File">https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html#The_cgconfig.conf_File</a></p>
<p>To let storm manage the cgroups for individual workers you need to make sure that the resources you want to control are mounted under the same directory as in the example above.
If they are not in the same directory the supervisor will throw an exception.</p>
<p>The perm section needs to be configured so that the user the supervisor is running as can modify the group.</p>
<p>If &quot;run as user&quot; is enabled so that the supervisor spawns other processes as the user that launched the topology, make sure that the permissions are such that individual users have read access but not write access.</p>
<h1 id="settings-related-to-cgroups-in-storm">Settings Related To CGroups in Storm</h1>
<table><thead>
<tr>
<th>Setting</th>
<th>Function</th>
</tr>
</thead><tbody>
<tr>
<td>storm.resource.isolation.plugin.enable</td>
<td>This config is used to set whether or not cgroups will be used. Set &quot;true&quot; to enable use of cgroups. Set &quot;false&quot; to not use cgroups. When this config is set to false, unit tests related to cgroups will be skipped. Default set to &quot;false&quot;</td>
</tr>
<tr>
<td>storm.cgroup.hierarchy.dir</td>
<td>The path to the cgroup hierarchy that storm will use. Default set to &quot;/cgroup/storm_resources&quot;</td>
</tr>
<tr>
<td>storm.cgroup.resources</td>
<td>A list of subsystems that will be regulated by CGroups. Default set to cpu and memory. Currently only cpu and memory are supported</td>
</tr>
<tr>
<td>storm.supervisor.cgroup.rootdir</td>
<td>The root cgroup used by the supervisor. The path to the cgroup will be &lt;storm.cgroup.hierarchy.dir&gt;/&lt;storm.supervisor.cgroup.rootdir&gt;. Default set to &quot;storm&quot;</td>
</tr>
<tr>
<td>storm.cgroup.cgexec.cmd</td>
<td>Absolute path to the cgexec command used to launch workers within a cgroup. Default set to &quot;/bin/cgexec&quot;</td>
</tr>
<tr>
<td>storm.worker.cgroup.memory.mb.limit</td>
<td>The memory limit in MB for each worker. This can be set on a per supervisor node basis. This config is used to set the cgroup config memory.limit_in_bytes. For more details about memory.limit_in_bytes, please visit: <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html">https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-memory.html</a>. Please note, if you are using the Resource Aware Scheduler, please do NOT set this config as this config will override the values calculated by the Resource Aware Scheduler</td>
</tr>
<tr>
<td>storm.worker.cgroup.cpu.limit</td>
<td>The cpu share for each worker. This can be set on a per supervisor node basis. This config is used to set the cgroup config cpu.share. For more details about cpu.share, please visit: <a href="https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html">https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html</a>. Please note, if you are using the Resource Aware Scheduler, please do NOT set this config as this config will override the values calculated by the Resource Aware Scheduler.</td>
</tr>
</tbody></table>
<p>Since limiting CPU usage via cpu.shares only limits the proportional CPU usage of a process, to limit the amount of CPU usage of all the worker processes on a supervisor node, please set the config supervisor.cpu.capacity. Where each increment represents 1% of a core thus if a user sets supervisor.cpu.capacity: 200, the user is indicating the use of 2 cores.</p>
<h2 id="integration-with-resource-aware-scheduler">Integration with Resource Aware Scheduler</h2>
<p>CGroups can be used in conjunction with the Resource Aware Scheduler. CGroups will then enforce the resource usage of workers as allocated by the Resource Aware Scheduler. To use cgroups with the Resource Aware Scheduler, simply enable cgroups and be sure NOT to set storm.worker.cgroup.memory.mb.limit and storm.worker.cgroup.cpu.limit configs.</p>
<h1 id="cgroup-metrics">CGroup Metrics</h1>
<p>CGroups not only can limit the amount of resources a worker has access to, but it can also help monitor the resource consumption of a worker. There are several metrics enabled by default that will check if the worker is a part of a CGroup and report corresponding metrics.</p>
<h2 id="cgroupcpu">CGroupCPU</h2>
<p>org.apache.storm.metric.cgroup.CGroupCPU reports back metrics similar to org.apache.storm.metrics.sigar.CPUMetric, except for everything within the CGroup. It reports both user and system CPU usage in ms as a map</p>
<div class="highlight"><pre><code class="language-" data-lang="">{
"user-ms": number
"sys-ms": number
}
</code></pre></div>
<p>CGroup reports these as CLK_TCK counts, and not milliseconds so the accuracy is determined by what CLK_TCK is set to. On most systems it is 100 times a second so at most the accuracy is 10 ms.</p>
<p>To make this metric work cpuacct must be mounted.</p>
<h2 id="cgroupcpuguarantee">CGroupCpuGuarantee</h2>
<p>org.apache.storm.metric.cgroup.CGroupCpuGuarantee reports back an approximate number of ms of CPU time that this worker is guaranteed to get. This is calculated from the resources requested by the tasks in that given worker.</p>
<h2 id="cgroupmemory">CGroupMemory</h2>
<p>org.apache.storm.metric.cgroup.CGroupMemoryUsage reports the current memory usage of all processes in the cgroup in bytes</p>
<h2 id="cgroupmemorylimit">CGroupMemoryLimit</h2>
<p>org.apache.storm.metric.cgroup.CGroupMemoryLimit report the current limit in bytes for all of the processes in the cgroup. If running with CGroups enabled in storm this is the on-heap request + the off-heap request for all tasks within the worker + any extra slop space given to workers.</p>
<h2 id="usage-debugging-cgroups-in-your-topology">Usage/Debugging CGroups in your topology</h2>
<p>These metrics can be very helpful in debugging what has happened or is happening to your code when it is running under a CGroup.</p>
<h3 id="cpu">CPU</h3>
<p>CPU guarantees under storm are soft. It means that a worker can ea sly go over their guarantee if there is free CPU available. To detect that your worker is using more CPU then it requested you can sum up the values in CGroupCPU and compare them to CGroupCpuGuarantee.<br>
If CGroupCPU is consistently higher then or equal to CGroupCpuGuarantee you probably want to look at requesting more CPU as your worker may be starved for CPU if more load is placed on the cluster. Being equal to CGroupCpuGuarantee means your worker may already
be throttled. If the used CPU is much smaller than CGroupCpuGuarantee then you are probably wasting resources and may want to reduce your CPU ask.</p>
<p>If you do have high CPU you probably also want to check out the GC metrics and/or the GC log for your worker. Memory pressure on the heap can result in increased CPU as garbage collection happens.</p>
<h3 id="memory">Memory</h3>
<p>Memory debugging of java under a cgroup can be difficult for multiple reasons.</p>
<ol>
<li>JVM memory management is complex</li>
<li>As of the writing of this documentation only experimental support for cgroups is in a few JVMs</li>
<li>JNI and other processes can use up memory within the cgroup that the JVM is not always aware of.</li>
<li>Memory pressure within the heap can result in increased CPU load instead of increased memory allocation.</li>
</ol>
<p>There are several metrics that storm provides by default that can help you understand what is happening within your worker.</p>
<p>If CGroupMemory gets close to CGroupMemoryLimit then you know that bad things are likely to start happening soon with this worker. Memory is not a soft guarantee like CPU.
If you go over the OOM killer on Linux will start to shoot processes withing your worker. Please pay attention to these metrics. If you are running a version of java that
is cgroup aware then going over the limit typically means that you will need to increase your off heap request. If you are not, it could be that you need more off heap
memory or it could be that java has allocated more memory then it should have as part of the garbage collection process. Figuring out which is typically best done with
trial and error (sorry).</p>
<p>Storm also reports the JVM&#39;s on heap and off heap usage through the &quot;memory/heap&quot; and &quot;memory/nonHeap&quot; metrics respectively. These can be used to give you a hint on
which to increase. Looking at the &quot;usedBytes&quot; field under each can help you understand how much memory the JVM is currently using. Although, like I said the off heap
portion is not always accurate and when the heap grows it can result in unrecorded off heap memory that will cause the cgroup to shoot processes.</p>
<p>The name of the GC metrics vary based off of the garbage collector you use, but they all start with &quot;GC/&quot;. If you sum up all of the &quot;GC/*.timeMs&quot; metrics for a given worker/window pair
you should be able to see how much of the CPU guarantee went to GC. By default java allows 98% of cpu time to go towards GC before it throws an OutOfMemoryError. This is far from ideal
for a near real time streaming system so pay attention to this ratio.</p>
<p>If the ratio is at a fairly steady state and your memory usage is not even close to the limit you might want to look at reducing your memory request. This too can be complicated to figure
out.</p>
<h2 id="future-work">Future Work</h2>
<p>There is a lot of work on adding in elasticity to storm. Eventually we hope to be able to do all of the above analysis for you and grow/shrink your topology on demand.</p>
</div>
</div>
</div>
</div>
<footer>
<div class="container-fluid">
<div class="row">
<div class="col-md-3">
<div class="footer-widget">
<h5>Meetups</h5>
<ul class="latest-news">
<li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li>
<li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li>
<li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li>
<li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li>
<li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li>
<li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li>
<!-- <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Seatle, WA</a> <span class="small">(27 Jun 2015)</span></li> -->
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>About Apache Storm</h5>
<p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>First Look</h5>
<ul class="footer-list">
<li><a href="/releases/current/Rationale.html">Rationale</a></li>
<li><a href="/releases/current/Tutorial.html">Tutorial</a></li>
<li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li>
<li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li>
</ul>
</div>
</div>
<div class="col-md-3">
<div class="footer-widget">
<h5>Documentation</h5>
<ul class="footer-list">
<li><a href="/releases/current/index.html">Index</a></li>
<li><a href="/releases/current/javadocs/index.html">Javadoc</a></li>
<li><a href="/releases/current/FAQ.html">FAQ</a></li>
</ul>
</div>
</div>
</div>
<hr/>
<div class="row">
<div class="col-md-12">
<p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved.
<br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation.
<br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p>
</div>
</div>
</div>
</footer>
<!--Footer End-->
<!-- Scroll to top -->
<span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span>
</body>
</html>