<!--
Documentation/_templates/layout.html
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership. The
ASF licenses this file to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance with the
License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE html>
<html class="writer-html5" lang="en">
<head>
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Work Queue Deadlocks &mdash; NuttX latest documentation</title>
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../../_static/css/theme.css" />
<link rel="stylesheet" type="text/css" href="../../_static/copybutton.css" />
<link rel="stylesheet" type="text/css" href="../../_static/custom.css" />
<link rel="shortcut icon" href="../../_static/favicon.ico"/>
<script src="../../_static/jquery.js"></script>
<script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
<script src="../../_static/doctools.js"></script>
<script src="../../_static/sphinx_highlight.js"></script>
<script src="../../_static/clipboard.min.js"></script>
<script src="../../_static/copybutton.js"></script>
<script src="../../_static/js/theme.js"></script>
<link rel="index" title="Index" href="../../genindex.html" />
<link rel="search" title="Search" href="../../search.html" />
<link rel="next" title="Memory Management" href="../mm/index.html" />
<link rel="prev" title="SLIP" href="slip.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="../../index.html" class="icon icon-home"> NuttX
</a>
<!-- this version selector is quite ugly, should be probably replaced by something
more modern -->
<div class="version-selector">
<select onchange="javascript:location.href = this.value;">
<option value="../../../latest" selected="selected">latest</option>
<option value="../../../10.0.0" >10.0.0</option>
<option value="../../../10.0.1" >10.0.1</option>
<option value="../../../10.1.0" >10.1.0</option>
<option value="../../../10.2.0" >10.2.0</option>
<option value="../../../10.3.0" >10.3.0</option>
<option value="../../../11.0.0" >11.0.0</option>
<option value="../../../12.0.0" >12.0.0</option>
<option value="../../../12.1.0" >12.1.0</option>
<option value="../../../12.2.0" >12.2.0</option>
<option value="../../../12.2.1" >12.2.1</option>
<option value="../../../12.3.0" >12.3.0</option>
<option value="../../../12.4.0" >12.4.0</option>
<option value="../../../12.5.0" >12.5.0</option>
<option value="../../../12.5.1" >12.5.1</option>
<option value="../../../12.6.0" >12.6.0</option>
<option value="../../../12.7.0" >12.7.0</option>
<option value="../../../12.8.0" >12.8.0</option>
<option value="../../../12.9.0" >12.9.0</option>
<option value="../../../12.10.0" >12.10.0</option>
<option value="../../../12.11.0" >12.11.0</option>
</select>
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="../../search.html" method="get">
<input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
<p class="caption" role="heading"><span class="caption-text">Table of Contents</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../index.html">Home</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../introduction/index.html">Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../quickstart/index.html">Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../contributing/index.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../introduction/inviolables.html">The Inviolable Principles of NuttX</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../platforms/index.html">Supported Platforms</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">OS Components</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../binfmt.html">Binary Loader</a></li>
<li class="toctree-l2"><a class="reference internal" href="../drivers/index.html">Device Drivers</a></li>
<li class="toctree-l2"><a class="reference internal" href="../nxflat.html">NXFLAT</a></li>
<li class="toctree-l2"><a class="reference internal" href="../nxgraphics/index.html">NX Graphics Subsystem</a></li>
<li class="toctree-l2"><a class="reference internal" href="../paging.html">On-Demand Paging</a></li>
<li class="toctree-l2"><a class="reference internal" href="../audio/index.html">Audio Subsystem</a></li>
<li class="toctree-l2"><a class="reference internal" href="../filesystem/index.html">NuttX File System</a></li>
<li class="toctree-l2"><a class="reference internal" href="../libs/index.html">NuttX libraries</a></li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">Network Support</a><ul class="current">
<li class="toctree-l3"><a class="reference internal" href="sixlowpan.html">6LoWPAN</a></li>
<li class="toctree-l3"><a class="reference internal" href="socketcan.html">SocketCAN Device Drivers</a></li>
<li class="toctree-l3"><a class="reference internal" href="pkt.html">“Raw” packet socket support</a></li>
<li class="toctree-l3"><a class="reference internal" href="ipfilter.html">IP Packet Filter</a></li>
<li class="toctree-l3"><a class="reference internal" href="nat.html">Network Address Translation (NAT)</a></li>
<li class="toctree-l3"><a class="reference internal" href="netdev.html">Network Devices</a></li>
<li class="toctree-l3"><a class="reference internal" href="netdriver.html">Network Drivers</a></li>
<li class="toctree-l3"><a class="reference internal" href="netguardsize.html">CONFIG_NET_GUARDSIZE</a></li>
<li class="toctree-l3"><a class="reference internal" href="netlink.html">Netlink Route support</a></li>
<li class="toctree-l3"><a class="reference internal" href="slip.html">SLIP</a></li>
<li class="toctree-l3 current"><a class="current reference internal" href="#">Work Queue Deadlocks</a><ul>
<li class="toctree-l4"><a class="reference internal" href="#use-of-work-queues">Use of Work Queues</a></li>
<li class="toctree-l4"><a class="reference internal" href="#high-and-low-priority-work-queues">High and Low Priority Work Queues</a></li>
<li class="toctree-l4"><a class="reference internal" href="#downsides-of-work-queues">Downsides of Work Queues</a></li>
<li class="toctree-l4"><a class="reference internal" href="#networking-on-work-queues">Networking on Work Queues</a></li>
<li class="toctree-l4"><a class="reference internal" href="#deadlocks">Deadlocks</a></li>
<li class="toctree-l4"><a class="reference internal" href="#iobs">IOBs</a></li>
<li class="toctree-l4"><a class="reference internal" href="#alternatives-to-work-queues">Alternatives to Work Queues</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../mm/index.html">Memory Management</a></li>
<li class="toctree-l2"><a class="reference internal" href="../syscall.html">Syscall Layer</a></li>
<li class="toctree-l2"><a class="reference internal" href="../tools/index.html"><code class="docutils literal notranslate"><span class="pre">/tools</span></code> Host Tools</a></li>
<li class="toctree-l2"><a class="reference internal" href="../arch/index.html">Architecture-Specific Code</a></li>
<li class="toctree-l2"><a class="reference internal" href="../boards.html">Boards Support</a></li>
<li class="toctree-l2"><a class="reference internal" href="../cmake.html">CMake Support</a></li>
<li class="toctree-l2"><a class="reference internal" href="../openamp.html">OpenAMP Support</a></li>
<li class="toctree-l2"><a class="reference internal" href="../video.html">Video Subsystem</a></li>
<li class="toctree-l2"><a class="reference internal" href="../crypto.html">Crypto API Subsystem</a></li>
<li class="toctree-l2"><a class="reference internal" href="../wireless.html">Wireless Subsystem</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../applications/index.html">Applications</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../implementation/index.html">Implementation Details</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../reference/index.html">API Reference</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../faq/index.html">FAQ</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../guides/index.html">Guides</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../glossary.html">Glossary</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../logos/index.html">NuttX Logos</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="../../index.html">NuttX</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="Page navigation">
<ul class="wy-breadcrumbs">
<li><a href="../../index.html" class="icon icon-home" aria-label="Home"></a></li>
<li class="breadcrumb-item"><a href="../index.html">OS Components</a></li>
<li class="breadcrumb-item"><a href="index.html">Network Support</a></li>
<li class="breadcrumb-item active">Work Queue Deadlocks</li>
<li class="wy-breadcrumbs-aside">
<a href="../../_sources/components/net/wqueuedeadlocks.rst.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<section id="work-queue-deadlocks">
<h1>Work Queue Deadlocks<a class="headerlink" href="#work-queue-deadlocks" title="Permalink to this heading"></a></h1>
<section id="use-of-work-queues">
<h2>Use of Work Queues<a class="headerlink" href="#use-of-work-queues" title="Permalink to this heading"></a></h2>
<p>Most network drivers use a work queue to handle network events. This is done for
two reasons: (1) most of the example code to leverage from does it that way, and (2)
it is easier, and a more efficient use of memory resources, to use the work queue
rather than creating a dedicated task/thread to service the network.</p>
</section>
<section id="high-and-low-priority-work-queues">
<h2>High and Low Priority Work Queues<a class="headerlink" href="#high-and-low-priority-work-queues" title="Permalink to this heading"></a></h2>
<p>There are two kinds of work queue: a single, high priority work queue that is intended
only to service back end interrupt processing in a semi-normal, tasking
context, and one or more low priority work queues that are similar but, as the name implies,
are lower in priority and not dedicated to time-critical back end interrupt
processing.</p>
</section>
<section id="downsides-of-work-queues">
<h2>Downsides of Work Queues<a class="headerlink" href="#downsides-of-work-queues" title="Permalink to this heading"></a></h2>
<p>There are two important downsides to the use of work queues. First, work queues
are inherently non-deterministic. The delay between the point at which you
schedule work and the time at which the work is performed is highly variable, and
that delay is due not only to strict priority scheduling but also to whatever
work has been queued ahead of you.</p>
<p>Why do you bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?</p>
<p>A second problem is related: Only one work queue job can be performed at a time.
That job should be brief so that it can make the work queue available again for
the next work queue job as soon as possible. And that job should never block
waiting for resources! If the job blocks, then it blocks the entire work queue
and makes the whole work queue unavailable for the duration of the wait.</p>
</section>
<section id="networking-on-work-queues">
<h2>Networking on Work Queues<a class="headerlink" href="#networking-on-work-queues" title="Permalink to this heading"></a></h2>
<p>As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue… YIKES!). Most
network operations are not really suited for execution on a work queue: the
networking operations can be quite extended and can also block waiting for
the availability of resources. So, at a minimum, networking should never use
the high priority work queue.</p>
</section>
<section id="deadlocks">
<h2>Deadlocks<a class="headerlink" href="#deadlocks" title="Permalink to this heading"></a></h2>
<p>If there is only a single instance of a work queue, then it is easy to create a
deadlock on the work queue if a work job blocks on the work queue. Here is the
generic work queue deadlock scenario:</p>
<ul class="simple">
<li><p>A job runs on a work queue and waits for the availability of a resource.</p></li>
<li><p>The operation that provides that resource also runs on the same work queue.</p></li>
<li><p>But since the work queue is blocked waiting for the resource, the job that
provides the resource cannot run and a deadlock results.</p></li>
</ul>
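<p>The scenario above can be reproduced with a small, deterministic model. The sketch
below is not NuttX code: <code class="docutils literal notranslate"><span class="pre">simulate_work_queue()</span></code>, the job kinds and <code class="docutils literal notranslate"><span class="pre">MAX_WORKERS</span></code> are
hypothetical names invented for illustration. It steps a FIFO of jobs through a configurable
number of worker threads and reports whether every job completes.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not the NuttX work queue implementation. */

enum job_kind
{
  JOB_WAITS_FOR_RESOURCE,   /* Blocks its worker until a resource exists */
  JOB_PROVIDES_RESOURCE     /* Makes one resource available and returns */
};

#define MAX_WORKERS 8

/* Returns true if every job completes, false if the queue deadlocks. */

bool simulate_work_queue(const enum job_kind *jobs, size_t njobs,
                         size_t nworkers)
{
  bool blocked[MAX_WORKERS] = { false };  /* Worker parked inside a job */
  size_t next = 0;                        /* Next job to dequeue (FIFO) */
  size_t done = 0;                        /* Jobs that have completed */
  int resources = 0;                      /* Resources currently available */
  bool progress = true;

  assert(nworkers >= 1 && nworkers <= MAX_WORKERS);

  while (done < njobs && progress)
    {
      progress = false;

      for (size_t w = 0; w < nworkers; w++)
        {
          if (blocked[w])
            {
              /* A blocked worker can only resume once a resource exists. */

              if (resources > 0)
                {
                  resources--;
                  blocked[w] = false;
                  done++;
                  progress = true;
                }
            }
          else if (next < njobs)
            {
              if (jobs[next] == JOB_PROVIDES_RESOURCE)
                {
                  resources++;
                  done++;
                }
              else
                {
                  blocked[w] = true;  /* Worker is now stuck in this job */
                }

              next++;
              progress = true;
            }
        }
    }

  return done == njobs;
}
```

<p>With a waiting job queued ahead of the job that provides the resource, the model reports a
deadlock for one worker and completion for two workers, because the second worker is free to
run the providing job.</p>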
</section>
<section id="iobs">
<h2>IOBs<a class="headerlink" href="#iobs" title="Permalink to this heading"></a></h2>
<p>IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains to
efficiently buffer variable sized network packet data. This is a much more
efficient use of buffering space than full packet buffers since the packet
content is often much smaller than the full packet size (the MSS).</p>
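<p>The space saving can be illustrated with a little arithmetic. The helpers below are
hypothetical and the buffer sizes used with them are made-up values, not NuttX defaults. They
compute how many fixed-size buffers a chain needs for a given payload and how many bytes
that chain reserves.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of the NuttX IOB API. */

/* Number of fixed-size buffers needed to hold 'payload' bytes. */

size_t chain_buffers_needed(size_t payload, size_t bufsize)
{
  assert(bufsize > 0);
  return (payload + bufsize - 1) / bufsize;  /* Round up */
}

/* Total bytes reserved by a chain that holds 'payload' bytes. */

size_t chain_bytes_reserved(size_t payload, size_t bufsize)
{
  return chain_buffers_needed(payload, bufsize) * bufsize;
}
```

<p>For example, with an assumed 64-byte buffer a 100-byte payload occupies two buffers
(128 bytes), whereas a buffer sized for a full packet would reserve the full packet size
regardless of the payload length.</p>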
<p>The network allocates IOBs to support TCP and UDP read-ahead buffering and write
buffering. Read-ahead buffering is used when TCP/UDP data is received and there is
no receiver in place waiting to accept the data. In this case, the received
payload is buffered in the IOB-based read-ahead buffers. When the application
next calls <code class="docutils literal notranslate"><span class="pre">recv()</span></code> or <code class="docutils literal notranslate"><span class="pre">recvfrom()</span></code>, the data will be removed from the read-ahead
buffer and returned to the caller immediately.</p>
<p>Write-buffering refers to the similar feature on the outgoing side. When the application
calls <code class="docutils literal notranslate"><span class="pre">send()</span></code> or <code class="docutils literal notranslate"><span class="pre">sendto()</span></code> and the driver is not available to accept the new packet
data, the data is buffered in IOBs in the write buffer chain. When the network
driver is finally available to take more data, packet data is removed from
the write buffer and provided to the driver.</p>
<p>The IOBs are allocated with a fixed size. A fixed number of IOBs are pre-allocated
when the system starts. If the network runs out of IOBs, additional IOBs will not
be allocated dynamically. Instead, the IOB allocator, <code class="docutils literal notranslate"><span class="pre">iob_alloc()</span></code>, will block
until an IOB is finally returned to the pool of free IOBs. There is also a non-blocking
IOB allocator, <code class="docutils literal notranslate"><span class="pre">iob_tryalloc()</span></code>.</p>
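<p>The difference between a blocking and a non-blocking allocator is easiest to see on a toy
pool. The sketch below is a simplified stand-in, not the NuttX IOB implementation: the
<code class="docutils literal notranslate"><span class="pre">pool_*</span></code> names and sizes are hypothetical, and it is not thread-safe. It pre-allocates a fixed
number of buffers and offers only a non-blocking allocator that returns <code class="docutils literal notranslate"><span class="pre">NULL</span></code> when the pool
is empty.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size buffer pool; sizes are arbitrary examples. */

#define POOL_NBUFFERS 4
#define POOL_BUFSIZE  64

struct buf
{
  struct buf *next;                    /* Link in the free list */
  unsigned char data[POOL_BUFSIZE];
};

struct pool
{
  struct buf storage[POOL_NBUFFERS];   /* All buffers exist up front */
  struct buf *free_list;
};

/* Put every pre-allocated buffer on the free list. */

void pool_init(struct pool *p)
{
  p->free_list = NULL;
  for (size_t i = 0; i < POOL_NBUFFERS; i++)
    {
      p->storage[i].next = p->free_list;
      p->free_list = &p->storage[i];
    }
}

/* Non-blocking allocation: returns NULL instead of waiting. */

struct buf *pool_tryalloc(struct pool *p)
{
  struct buf *b = p->free_list;
  if (b != NULL)
    {
      p->free_list = b->next;
      b->next = NULL;
    }

  return b;
}

/* Return a buffer to the free list. */

void pool_free(struct pool *p, struct buf *b)
{
  assert(b != NULL);
  b->next = p->free_list;
  p->free_list = b;
}
```

<p>A caller running on a work queue can handle the <code class="docutils literal notranslate"><span class="pre">NULL</span></code> result (drop, retry later, or
reschedule) without parking the worker thread, which is what a blocking allocator would do.</p>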
<p>Under conditions of high utilization, such as sending large amounts of data at high
rates or receiving large amounts of data at high rates, it is inevitable that the
system will run out of pre-allocated IOBs. For read-ahead buffering, the packets
are simply dropped in this case. For TCP this means that there will be a subsequent
timeout on the remote peer because no ACK will be received and the remote peer will
eventually re-transmit the packet. UDP is a lossy transfer and handling of lost or
dropped datagrams must be included in any UDP design.</p>
<p>For write-buffering, there are three possible behaviors that can occur when the
IOB pool has been exhausted. First, if there are no available IOBs at the beginning
of a <code class="docutils literal notranslate"><span class="pre">send()</span></code> or <code class="docutils literal notranslate"><span class="pre">sendto()</span></code> transfer, then the operation will block until IOBs are again
available if <code class="docutils literal notranslate"><span class="pre">O_NONBLOCK</span></code> is not selected. This delay can be a substantial amount
of time.</p>
<p>Second, if <code class="docutils literal notranslate"><span class="pre">O_NONBLOCK</span></code> is selected, the send will, of course, return immediately,
failing with errno set to <code class="docutils literal notranslate"><span class="pre">EAGAIN</span></code> if we cannot allocate the first IOB for the transfer.</p>
<p>The third behavior occurs if we run out of IOBs in the middle of the transfer.
Then the send operation will not wait but will instead send the number of bytes that
it has successfully buffered. Applications should always check the return value from
<code class="docutils literal notranslate"><span class="pre">send()</span></code> or <code class="docutils literal notranslate"><span class="pre">sendto()</span></code>. If it is a byte count less than the requested transfer
size, then the send function should be called again for the remaining data.</p>
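<p>A minimal sketch of that retry loop, using only the standard <code class="docutils literal notranslate"><span class="pre">send()</span></code> call, is shown below.
The helper name <code class="docutils literal notranslate"><span class="pre">send_all()</span></code> is hypothetical. On a non-blocking socket it simply reports the
<code class="docutils literal notranslate"><span class="pre">EAGAIN</span></code> failure to the caller, which is one possible policy among several.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call send() repeatedly until all 'len' bytes have
 * been accepted.  Returns 0 on success or -1 on failure (errno is set by
 * send() where applicable).
 */

int send_all(int sockfd, const void *buf, size_t len)
{
  const unsigned char *p = buf;

  assert(buf != NULL || len == 0);

  while (len > 0)
    {
      ssize_t n = send(sockfd, p, len, 0);
      if (n < 0)
        {
          if (errno == EINTR)
            {
              continue;   /* Interrupted by a signal; just retry */
            }

          return -1;      /* Includes EAGAIN on a non-blocking socket */
        }

      if (n == 0)
        {
          return -1;      /* No progress; give up rather than spin */
        }

      /* A short count means only part of the data was buffered. */

      p   += n;
      len -= (size_t)n;
    }

  return 0;
}
```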
<p>The blocking <code class="docutils literal notranslate"><span class="pre">iob_alloc()</span></code> call is also a common cause of work queue deadlocks.
The scenario again is:</p>
<ul class="simple">
<li><p>Some logic in the OS runs on a work queue and blocks waiting for an IOB to
become available.</p></li>
<li><p>The logic that releases the IOB also runs on the same work queue.</p></li>
<li><p>The logic that releases the IOB cannot execute, however, because the other job
is blocked waiting for the IOB on the same work queue.</p></li>
</ul>
</section>
<section id="alternatives-to-work-queues">
<h2>Alternatives to Work Queues<a class="headerlink" href="#alternatives-to-work-queues" title="Permalink to this heading"></a></h2>
<p>To avoid network deadlocks, here is the rule: Never run the network on a singleton
work queue!</p>
<p>Most network implementations do just that! Here are a couple of alternatives:</p>
<ol class="arabic">
<li><p>Use Multiple Low Priority Work Queues
Unlike the high priority work queue, the low priority work queues utilize a
thread pool. The number of threads in the pool is controlled by
<code class="docutils literal notranslate"><span class="pre">CONFIG_SCHED_LPNTHREADS</span></code>. If <code class="docutils literal notranslate"><span class="pre">CONFIG_SCHED_LPNTHREADS</span></code> is greater than one,
then such deadlocks should not be possible: if a thread is busy with
some other job (even if it is only waiting for a resource), then the next job will be
assigned to a different thread and the deadlock will be broken. The cost of each
additional low priority work queue thread is primarily the memory set aside for
the thread’s stack.</p></li>
<li><p>Use a Dedicated Network Thread
The best solution would be to write a custom kernel thread to handle driver
network operations. This would be the highest performing and the most manageable.
It would also, however, be substantially more work.</p></li>
<li><p>Interactions with Network Locks
The network lock is a re-entrant mutex that enforces mutually exclusive access to
the network. The network lock can also cause deadlocks and can also interact with
the work queues to degrade performance. Consider this scenario:</p>
<blockquote>
<div><ul class="simple">
<li><p>Some network logic, perhaps running on the application thread, takes the network
lock then waits for an IOB to become available (on the application thread, not a
work queue).</p></li>
<li><p>Some network related event runs on the work queue but is blocked waiting for
the network lock.</p></li>
<li><p>Another job is queued behind that network job. This is the one that provides the
IOB, but it cannot run because the other thread is blocked waiting for the network
lock on the work queue.</p></li>
</ul>
</div></blockquote>
<p>But the network will not be unlocked because the application logic holds the network
lock and is waiting for the IOB which can never be released.</p>
<p>Within the network, this deadlock condition is avoided using a special function
<code class="docutils literal notranslate"><span class="pre">net_ioballoc()</span></code>. <code class="docutils literal notranslate"><span class="pre">net_ioballoc()</span></code> is a wrapper around the blocking <code class="docutils literal notranslate"><span class="pre">iob_alloc()</span></code>
that momentarily releases the network lock while waiting for the IOB to become available.</p>
<p>Similarly, the network functions <code class="docutils literal notranslate"><span class="pre">net_lockedwait()</span></code> and <code class="docutils literal notranslate"><span class="pre">net_timedwait()</span></code> are wrappers
around <code class="docutils literal notranslate"><span class="pre">nxsem_wait()</span></code> and <code class="docutils literal notranslate"><span class="pre">nxsem_timedwait()</span></code>, respectively, and also release the network
lock for the duration of the wait.</p>
<p>Caution should be used with any of these wrapper functions. Because the network lock is
relinquished during the wait, there could be changes in the network state that occur before
the lock is recovered. Your design should account for this possibility.</p>
</li>
</ol>
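<p>The "release the lock while waiting" pattern used by the wrapper functions described above can
be sketched with POSIX primitives. This is an illustration only, not the NuttX implementation:
<code class="docutils literal notranslate"><span class="pre">wait_with_lock_released()</span></code> is a hypothetical name, a plain mutex stands in for the network
lock, and a counting semaphore stands in for the pool of free IOBs. A re-entrant lock would
additionally have to drop every nested hold before waiting and restore them afterwards.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical sketch: 'lock' plays the role of the network lock and
 * 'free_count' counts the buffers available in the pool.  The caller
 * must hold 'lock' on entry and holds it again on return.
 */

int wait_with_lock_released(pthread_mutex_t *lock, sem_t *free_count)
{
  int ret;

  assert(lock != NULL && free_count != NULL);

  /* Drop the lock so that whoever frees a buffer is not shut out. */

  pthread_mutex_unlock(lock);

  /* May block until another thread posts the semaphore. */

  ret = sem_wait(free_count);

  /* Re-acquire the lock.  State protected by the lock may have changed
   * while it was released, so the caller must re-validate it.
   */

  pthread_mutex_lock(lock);
  return ret;
}
```

<p>After the call returns, any state that was read under the lock before the wait should be
treated as stale and checked again.</p>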
</section>
</section>
</div>
</div>
<footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
<a href="slip.html" class="btn btn-neutral float-left" title="SLIP" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
<a href="../mm/index.html" class="btn btn-neutral float-right" title="Memory Management" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
</div>
<hr/>
<div role="contentinfo">
<p>&#169; Copyright 2023, The Apache Software Foundation.</p>
</div>
</footer>
</div>
</div>
</section>
</div>
<script>
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>