====================
Work Queue Deadlocks
====================

Use of Work Queues
==================

Most network drivers use a work queue to handle network events. This is done
for two reasons: (1) most of the example code available to leverage from does
it that way, and (2) it is easier, and a more efficient use of memory
resources, to use the work queue rather than to create a dedicated task/thread
to service the network.

High and Low Priority Work Queues
=================================

There are two types of work queue: a single, high priority work queue that is
intended only to service back-end interrupt processing in a semi-normal,
tasking context; and the low priority work queue(s) that are similar but, as
the name implies, lower in priority and not dedicated to time-critical
back-end interrupt processing.

Downsides of Work Queues
========================

There are two important downsides to the use of work queues. First, the work
queues are inherently non-deterministic. The time delay from the point at
which you schedule work to the time at which the work is performed is highly
random, and that delay is due not only to the strict priority scheduling but
also to whatever work has been queued ahead of you.

Why bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?

A second problem is related: only one work queue job can be performed at a
time. Each job should be brief so that it makes the work queue available again
for the next job as soon as possible. And a job should never block waiting for
resources! If a job blocks, then it blocks the entire work queue and makes the
whole work queue unavailable for the duration of the wait.

Networking on Work Queues
=========================

As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue... YIKES!).
Most network operations are not really suited for execution on a work queue:
networking operations can be quite extended and can also block waiting for the
availability of resources. So, at a minimum, networking should never use the
high priority work queue.

Deadlocks
=========

If there is only a single instance of a work queue, then it is easy to create
a deadlock on the work queue if a job blocks on it. Here is the generic work
queue deadlock scenario:

* A job runs on a work queue and waits for the availability of a resource.
* The operation that provides that resource also runs on the same work queue.
* But since the work queue is blocked waiting for the resource, the job that
  provides the resource cannot run, and a deadlock results.

IOBs
====

IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains
to efficiently buffer variable-sized network packet data. This is a much more
efficient use of buffering space than full packet buffers, since the packet
content is often much smaller than the full packet size (the MSS).

The network allocates IOBs to support TCP and UDP read-ahead buffering and
write buffering. Read-ahead buffering is used when TCP/UDP data is received
and there is no receiver in place waiting to accept the data. In this case,
the received payload is buffered in the IOB-based read-ahead buffers. When the
application next calls ``recv()`` or ``recvfrom()``, the data will be removed
from the read-ahead buffer and returned to the caller immediately.

Write buffering refers to the similar feature on the outgoing side. When the
application calls ``send()`` or ``sendto()`` and the driver is not available
to accept the new packet data, the data is buffered in IOBs in the write
buffer chain. When the network driver is finally available to take more data,
packet data is removed from the write buffer and provided to the driver.

The IOBs are allocated with a fixed size. A fixed number of IOBs are
pre-allocated when the system starts. If the network runs out of IOBs,
additional IOBs will not be allocated dynamically; rather, the blocking IOB
allocator, ``iob_alloc()``, will wait until an IOB is finally returned to the
pool of free IOBs. There is also a non-blocking IOB allocator,
``iob_tryalloc()``.

Under conditions of high utilization, such as sending large amounts of data at
high rates or receiving large amounts of data at high rates, it is inevitable
that the system will run out of pre-allocated IOBs. For read-ahead buffering,
the packets are simply dropped in this case. For TCP this means that there
will be a subsequent timeout on the remote peer because no ACK will be
received, and the remote peer will eventually re-transmit the packet. UDP is a
lossy transport, and handling of lost or dropped datagrams must be included in
any UDP design.

For write buffering, there are three possible behaviors that can occur when
the IOB pool has been exhausted. First, if there are no available IOBs at the
beginning of a ``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not
selected, then the operation will block until IOBs are again available. This
delay can be a substantial amount of time.

Second, if ``O_NONBLOCK`` is selected, the send will, of course, return
immediately, failing with errno set to ``EAGAIN``, if the first IOB for the
transfer cannot be allocated.

The third behavior occurs if we run out of IOBs in the middle of the transfer.
In that case, the send operation will not wait but will instead report the
number of bytes that it has successfully buffered. Applications should always
check the return value from ``send()`` or ``sendto()``: if it is a byte count
less than the requested transfer size, then the send function should be called
again.

The blocking ``iob_alloc()`` call is also a common cause of work queue
deadlocks. The scenario again is:

* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
  become available,
* The logic that releases the IOB also runs on the same work queue, but
* The job that provides the IOB cannot execute because the other job is
  blocked on the same work queue waiting for the IOB.

Alternatives to Work Queues
===========================

To avoid network deadlocks, here is the rule: never run the network on a
singleton work queue!

Yet most network implementations do just that! Here are a couple of
alternatives:

#. Use Multiple Low Priority Work Queues
   Unlike the high priority work queue, the low priority work queue utilizes a
   thread pool. The number of threads in the pool is controlled by
   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater than
   one, then such deadlocks should not be possible: in that case, if a thread
   is busy with some other job (even if it is only waiting for a resource),
   then the job will be assigned to a different thread and the deadlock will
   be broken. The cost of each additional low priority work queue thread is
   primarily the memory set aside for the thread's stack.
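A configuration fragment along these lines would enable a two-thread low
priority work queue pool. This is only a sketch; the complete option set
(stack sizes, priorities, and so on) depends on your particular configuration:

```
CONFIG_SCHED_LPWORK=y
CONFIG_SCHED_LPNTHREADS=2
```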

#. Use a Dedicated Network Thread
   The best solution would be to write a custom kernel thread to handle driver
   network operations. This would be the highest performing and the most
   manageable. It would also, however, be substantially more work.

#. Interactions with Network Locks
   The network lock is a re-entrant mutex that enforces mutually exclusive
   access to the network. The network lock can also cause deadlocks and can
   also interact with the work queues to degrade performance. Consider this
   scenario:

   * Some network logic, perhaps running on the application thread, takes the
     network lock and then waits for an IOB to become available (on the
     application thread, not a work queue).
   * Some network-related event runs on the work queue but is blocked waiting
     for the network lock.
   * Another job is queued behind that network job. This is the one that
     provides the IOB, but it cannot run because the preceding job on the work
     queue is blocked waiting for the network lock.

   The network will never be unlocked, because the application logic holds the
   network lock while waiting for an IOB that can never be released.

Within the network, this deadlock condition is avoided using a special
function, ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the
blocking ``iob_alloc()`` that momentarily releases the network lock while
waiting for the IOB to become available.

Similarly, the network functions ``net_lockedwait()`` and ``net_timedwait()``
are wrappers around ``nxsem_wait()`` and ``nxsem_timedwait()``, respectively,
and also release the network lock for the duration of the wait.

Caution should be used with any of these wrapper functions. Because the
network lock is relinquished during the wait, there could be changes in the
network state that occur before the lock is recovered. Your design should
account for this possibility.