====================
Work Queue Deadlocks
====================

Use of Work Queues
==================

Most network drivers use a work queue to handle network events. This is done
for two reasons: (1) most of the example code available to leverage from does
it that way, and (2) it is easier, and a more efficient use of memory
resources, to use the work queue rather than to create a dedicated task/thread
to service the network.

High and Low Priority Work Queues
=================================

There are two types of work queue: a single, high priority work queue that is
intended only to service back-end interrupt processing in a semi-normal,
tasking context; and the low priority work queue(s) that are similar but, as
the name implies, lower in priority and not dedicated to time-critical
back-end interrupt processing.

Downsides of Work Queues
========================

There are two important downsides to the use of work queues. First, the work
queues are inherently non-deterministic. The time delay from the point at
which you schedule work to the time at which the work is performed is highly
random, and that delay is due not only to the strict priority scheduling but
also to whatever work has been queued ahead of you.

Why bother to use an RTOS if you rely on non-deterministic work queues to do
most of the work?

A second problem is related: only one work queue job can be performed at a
time. Each job should be brief so that it makes the work queue available again
for the next job as soon as possible. And a job should never block waiting for
resources! If a job blocks, then it blocks the entire work queue and makes the
whole work queue unavailable for the duration of the wait.

Networking on Work Queues
=========================

As mentioned, most network drivers use a work queue to handle network events
(some are even configurable to use the high priority work queue... YIKES!).
Most network operations are not really suited for execution on a work queue:
networking operations can be quite extended and can also block waiting for the
availability of resources. So, at a minimum, networking should never use the
high priority work queue.

Deadlocks
=========

If there is only a single instance of a work queue, then it is easy to create
a deadlock on the work queue if a job blocks on it. Here is the generic work
queue deadlock scenario:

* A job runs on a work queue and waits for the availability of a resource.
* The operation that provides that resource also runs on the same work queue.
* But since the work queue is blocked waiting for the resource, the job that
  provides the resource cannot run, and a deadlock results.

IOBs
====

IOBs (I/O Blocks) are small I/O buffers that can be linked together in chains
to efficiently buffer variable-sized network packet data. This is a much more
efficient use of buffering space than full packet buffers, since the packet
content is often much smaller than the full packet size (the MSS).

The network allocates IOBs to support TCP and UDP read-ahead buffering and
write buffering. Read-ahead buffering is used when TCP/UDP data is received
and there is no receiver in place waiting to accept the data. In this case,
the received payload is buffered in the IOB-based read-ahead buffers. When the
application next calls ``recv()`` or ``recvfrom()``, the data will be removed
from the read-ahead buffer and returned to the caller immediately.

Write buffering refers to the similar feature on the outgoing side. When the
application calls ``send()`` or ``sendto()`` and the driver is not available
to accept the new packet data, the data is buffered in IOBs in the write
buffer chain. When the network driver is finally available to take more data,
packet data is removed from the write buffer and provided to the driver.

The IOBs are allocated with a fixed size. A fixed number of IOBs are
pre-allocated when the system starts. If the network runs out of IOBs,
additional IOBs will not be allocated dynamically; rather, the blocking IOB
allocator, ``iob_alloc()``, will wait until an IOB is finally returned to the
pool of free IOBs. There is also a non-blocking IOB allocator,
``iob_tryalloc()``.

Under conditions of high utilization, such as sending large amounts of data at
high rates or receiving large amounts of data at high rates, it is inevitable
that the system will run out of pre-allocated IOBs. For read-ahead buffering,
the packets are simply dropped in this case. For TCP this means that there
will be a subsequent timeout on the remote peer because no ACK will be
received, and the remote peer will eventually re-transmit the packet. UDP is a
lossy transport, and handling of lost or dropped datagrams must be included in
any UDP design.

For write buffering, there are three possible behaviors that can occur when
the IOB pool has been exhausted. First, if there are no available IOBs at the
beginning of a ``send()`` or ``sendto()`` transfer and ``O_NONBLOCK`` is not
selected, then the operation will block until IOBs are again available. This
delay can be a substantial amount of time.

Second, if ``O_NONBLOCK`` is selected, the send will, of course, return
immediately, failing with errno set to ``EAGAIN``, if the first IOB for the
transfer cannot be allocated.

The third behavior occurs if we run out of IOBs in the middle of the transfer.
In that case, the send operation will not wait but will instead report the
number of bytes that it has successfully buffered. Applications should always
check the return value from ``send()`` or ``sendto()``: if it is a byte count
less than the requested transfer size, then the send function should be called
again.

The blocking ``iob_alloc()`` call is also a common cause of work queue
deadlocks. The scenario again is:

* Some logic in the OS runs on a work queue and blocks waiting for an IOB to
  become available,
* The logic that releases the IOB also runs on the same work queue, but
* The job that provides the IOB cannot execute because the other job is
  blocked on the same work queue waiting for the IOB.

Alternatives to Work Queues
===========================

To avoid network deadlocks, here is the rule: never run the network on a
singleton work queue!

Yet most network implementations do just that! Here are a couple of
alternatives:

#. Use Multiple Low Priority Work Queues
   Unlike the high priority work queue, the low priority work queue utilizes a
   thread pool. The number of threads in the pool is controlled by
   ``CONFIG_SCHED_LPNTHREADS``. If ``CONFIG_SCHED_LPNTHREADS`` is greater than
   one, then such deadlocks should not be possible: in that case, if a thread
   is busy with some other job (even if it is only waiting for a resource),
   then the job will be assigned to a different thread and the deadlock will
   be broken. The cost of each additional low priority work queue thread is
   primarily the memory set aside for the thread's stack.
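A configuration fragment along these lines would enable a two-thread low
priority work queue pool. This is only a sketch; the complete option set
(stack sizes, priorities, and so on) depends on your particular configuration:

```
CONFIG_SCHED_LPWORK=y
CONFIG_SCHED_LPNTHREADS=2
```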

#. Use a Dedicated Network Thread
   The best solution would be to write a custom kernel thread to handle driver
   network operations. This would be the highest performing and the most
   manageable. It would also, however, be substantially more work.

#. Interactions with Network Locks
   The network lock is a re-entrant mutex that enforces mutually exclusive
   access to the network. The network lock can also cause deadlocks and can
   also interact with the work queues to degrade performance. Consider this
   scenario:

   * Some network logic, perhaps running on the application thread, takes the
     network lock and then waits for an IOB to become available (on the
     application thread, not a work queue).
   * Some network-related event runs on the work queue but is blocked waiting
     for the network lock.
   * Another job is queued behind that network job. This is the one that
     provides the IOB, but it cannot run because the preceding job on the work
     queue is blocked waiting for the network lock.

   The network will never be unlocked, because the application logic holds the
   network lock while waiting for an IOB that can never be released.

Within the network, this deadlock condition is avoided using a special
function, ``net_ioballoc()``. ``net_ioballoc()`` is a wrapper around the
blocking ``iob_alloc()`` that momentarily releases the network lock while
waiting for the IOB to become available.

Similarly, the network functions ``net_lockedwait()`` and ``net_timedwait()``
are wrappers around ``nxsem_wait()`` and ``nxsem_timedwait()``, respectively,
and also release the network lock for the duration of the wait.

Caution should be used with any of these wrapper functions. Because the
network lock is relinquished during the wait, there could be changes in the
network state that occur before the lock is recovered. Your design should
account for this possibility.