
 .. _ondemandpaging:

 ================
 On-Demand Paging
 ================

 Kernel Build Implementation
 ===========================

 On-demand paging and lazy loading are techniques used to manage physical
 memory. The basic idea is to allow a program to execute even though the
 entire program is not resident in memory. The program is loaded into
 memory on demand. This is a technique that is used in many operating
 systems to allow large programs to execute on small memory systems.
 Commonly, a Memory Management Unit (MMU) is used to map virtual memory
 into physical memory. Applications are then loaded into virtual memory
 address spaces, and access to physical memory is managed by the MMU. If
 an accessed virtual page is not resident in physical memory, a page fault
 occurs. The operating system then loads the missing page into memory and
 resumes execution.

 Requirements and Assumptions
 ----------------------------

 On-demand paging requires *Kernel Build* (``CONFIG_BUILD_KERNEL=y``) mode.
 In this mode, no applications are built within the NuttX kernel. Instead,
 the applications are built as separate programs that are loaded into memory
 (``CONFIG_ELF=y`` and ``CONFIG_BINFMT_LOADABLE=y``). In this mode, each
 process has its own address environment (``CONFIG_ARCH_ADDRENV=y``).

 Logic Design Description
 ------------------------

 When an application is being loaded, ``up_addrenv_create`` is called to create
 the process's address environment. This includes mapping the commonly used
 ``text``, ``data`` and ``heap`` sections within the virtual memory space.
 Without on-demand paging, the physical memory is then allocated and mapped
 accordingly before the process is started. When on-demand paging is enabled,
 usually only a single page of each section is allocated and mapped.

 The process starts executing within its address environment, accessing the
 virtual memory. Whenever it tries to access a virtual memory address that is
 not mapped in the MMU, a page fault occurs. The MMU then triggers an
 exception that is handled by the kernel. The kernel then checks whether a
 free physical page is available, allocates it, and maps the faulting virtual
 address to it. Finally, execution resumes at the same point where the page
 fault occurred.
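
 The toy model below sketches the fault, allocate, map, and resume sequence
 just described. It is not NuttX code: the page table (``g_pte``), the
 physical page map, and the ``vm_*`` functions are hypothetical, and a failed
 allocation simply returns ``PTE_INVALID`` where a real kernel would have to
 evict a page or terminate the process.

 .. code-block:: c

     #include <stdbool.h>

     /* Hypothetical toy model: 8 virtual pages backed by at most 4 physical
      * pages. PTE_INVALID marks a virtual page with no physical mapping.
      */
     #define NVPAGES      8
     #define NPPAGES      4
     #define PTE_INVALID  (-1)

     static int  g_pte[NVPAGES];    /* virtual page -> physical page */
     static bool g_pused[NPPAGES];  /* physical page allocation map */
     static int  g_nfaults;         /* number of page faults taken */

     void vm_init(void)
     {
       for (int i = 0; i < NVPAGES; i++)
         {
           g_pte[i] = PTE_INVALID;
         }

       for (int i = 0; i < NPPAGES; i++)
         {
           g_pused[i] = false;
         }

       g_nfaults = 0;
     }

     /* "Page fault handler": find a free physical page and map it. Returns
      * false if no physical page is free.
      */
     static bool vm_fault(int vpage)
     {
       g_nfaults++;
       for (int p = 0; p < NPPAGES; p++)
         {
           if (!g_pused[p])
             {
               g_pused[p] = true;
               g_pte[vpage] = p;
               return true;
             }
         }

       return false;
     }

     /* Translate a virtual page to a physical page, faulting it in on first
      * access. After a successful fault the lookup is simply repeated, which
      * models resuming at the faulting instruction.
      */
     int vm_access(int vpage)
     {
       if (g_pte[vpage] == PTE_INVALID && !vm_fault(vpage))
         {
           return PTE_INVALID;
         }

       return g_pte[vpage];
     }

 After ``vm_init()``, the first ``vm_access()`` to a page takes a fault and
 later accesses to the same page do not; once all four physical pages are in
 use, further faults fail.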

 Example: RISC-V
 ^^^^^^^^^^^^^^^

 RISC-V's ``up_addrenv_create`` calls ``create_region`` (both defined in
 ``arch/risc-v/src/common/riscv_addrenv.c``). ``create_region`` maps a single
 region into the MMU, allocating physical memory for its page tables. When
 ``CONFIG_PAGING`` is not selected, physical pages for the whole region are
 allocated from the physical memory space and mapped into the virtual
 memory space up front. When ``CONFIG_PAGING=y`` is selected, only the first
 page of each section is mapped into the virtual memory space; the remaining
 pages are mapped only when a page fault occurs.
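
 As a rough sketch, assuming 4 KiB pages, the difference between the two
 ``create_region`` behaviours reduces to how many pages receive physical
 memory when the region is created. ``create_region_sketch()`` is a
 hypothetical helper, and its ``paging`` flag stands in for the
 ``CONFIG_PAGING`` compile-time option.

 .. code-block:: c

     #include <stdbool.h>
     #include <stddef.h>

     #define SKETCH_PAGESIZE 4096u  /* assumed 4 KiB pages */

     /* Hypothetical stand-in for create_region(): returns how many pages of
      * a region of region_size bytes get physical memory at creation time.
      */
     size_t create_region_sketch(size_t region_size, bool paging)
     {
       size_t npages = (region_size + SKETCH_PAGESIZE - 1) / SKETCH_PAGESIZE;

       if (!paging)
         {
           return npages;            /* eager: every page allocated and mapped */
         }

       return npages > 0 ? 1 : 0;    /* lazy: only the first page up front */
     }

 For an 8MiB region, the eager path backs 2048 pages immediately, while the
 lazy path backs only one and leaves the other 2047 to the fault handler.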

 The page fault is handled by the ``riscv_fillpage`` function (defined in
 ``arch/risc-v/src/common/riscv_exception.c``), which is called from the
 exception handler whenever a page fault occurs. This function allocates a
 physical page, maps it at the virtual address that triggered the page fault
 exception, and then resumes execution from the same point where the page
 fault occurred.
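
 Before a page can be mapped, the handler has to recognize the exception as a
 page fault and find the page that contains the faulting address, which
 RISC-V reports in the ``stval`` CSR. The sketch below shows only that
 arithmetic and does not reproduce ``riscv_fillpage``. The cause codes are the
 RISC-V privileged-specification values for instruction, load, and store/AMO
 page faults; the 4 KiB page size and the helper names are assumptions.

 .. code-block:: c

     #include <stdbool.h>
     #include <stdint.h>

     #define SKETCH_PAGESIZE   4096u                              /* assumed */
     #define SKETCH_PAGE_MASK  (~(uintptr_t)(SKETCH_PAGESIZE - 1))

     /* scause exception codes for page faults (RISC-V privileged spec). */
     #define CAUSE_INST_PAGE_FAULT   12
     #define CAUSE_LOAD_PAGE_FAULT   13
     #define CAUSE_STORE_PAGE_FAULT  15

     /* Hypothetical helper: is this exception one that paging can resolve? */
     bool is_page_fault(uintptr_t cause)
     {
       return cause == CAUSE_INST_PAGE_FAULT ||
              cause == CAUSE_LOAD_PAGE_FAULT ||
              cause == CAUSE_STORE_PAGE_FAULT;
     }

     /* Hypothetical helper: round the faulting address (from stval) down to
      * the base of its page; the new physical page is mapped at this address.
      */
     uintptr_t fault_page_base(uintptr_t stval)
     {
       return stval & SKETCH_PAGE_MASK;
     }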

 :ref:`knsh_paging` simulates a device with 4MiB physical memory with 8MiB
 of virtual heap memory allocated for each process. This is possible by
 enabling on-demand paging.

 Legacy Implementation
 =====================

 This legacy implementation runs on *Flat Build* (*Kernel Build* did not
 even exist at that time).

 What kind of platforms can support NuttX legacy on-demand paging?

   #. The MCU should have some large, probably low-cost non-volatile
      storage such as serial FLASH or an SD card. This storage probably
      does not support random access (otherwise, why not just execute
      the program directly on the storage media). SD and serial FLASH are
      inexpensive, do not require many pins, and SPI support is
      prevalent in just about all MCUs. This large serial FLASH would
      contain a big program, perhaps several megabytes in size.
   #. The MCU must have a (relatively) small block of fast SRAM from which
      it can execute code. A size of, say, 256K (or 192K as in the NXP
      LPC3131) would be sufficient for many applications.
   #. The MCU has an MMU (again like the NXP LPC3131).

 If the platform meets these requirements, then NuttX can provide
 on-demand paging: It can copy .text from the large program in
 non-volatile media into RAM as needed to execute a huge program from the
 small RAM.

 Terminology
 -----------

   ``g_waitingforfill``:
      An OS list that is used to hold the TCBs of tasks that are waiting
      for a page fill.
   ``g_pftcb``:
      A variable that holds a reference to the TCB of the thread that is
      currently being re-filled.
   ``g_pgworker``:
      The *process* ID of the thread that will perform the page fills.
   ``pg_callback()``:
      The callback function that is invoked from a driver when the fill is
      complete.
   ``pg_miss()``:
      The function that is called from architecture-specific code to handle
      a page fault.
   ``TCB``:
      Task Control Block

 NuttX Common Logic Design Description
 -------------------------------------

 Initialization
 ^^^^^^^^^^^^^^

 The following declarations will be added.

 -  ``g_waitingforfill``. A doubly linked list that will be used to
    implement a prioritized list of the TCBs of tasks that are waiting
    for a page fill.
 -  ``g_pgworker``. The *process* ID of the thread that will perform
    the page fills.

 During OS initialization in ``sched/init/nx_start.c``, the following
 steps will be performed:

 -  The ``g_waitingforfill`` queue will be initialized.
 -  The special page fill worker thread will be started. The ``pid`` of
    the page fill worker thread will be saved in ``g_pgworker``. Note
    that we need a special worker thread to perform fills; we cannot use
    the "generic" worker thread facility because we cannot be assured
    that all actions called by that worker thread will always be resident
    in memory.

 Declarations for ``g_waitingforfill``, ``g_pgworker``, and other
 internal, private definitions will be provided in
 ``sched/paging/paging.h``. All public definitions that should be used by
 the architecture-specific code will be available in
 ``include/nuttx/page.h``. Most architecture-specific functions are
 declared in ``include/nuttx/arch.h``, but for the case of this paging
 logic, those architecture specific functions are instead declared in
 ``include/nuttx/page.h``.
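
 A minimal sketch of these declarations, and of the priority-ordered
 insertion that ``g_waitingforfill`` needs, might look as follows. The
 simplified ``struct tcb_s``, ``struct tcb_list_s``,
 ``pg_initialize_sketch()``, and ``pg_addwaiter()`` are hypothetical; the
 real NuttX TCB and list types are considerably richer.

 .. code-block:: c

     #include <stddef.h>
     #include <sys/types.h>

     /* Hypothetical, simplified TCB: only the fields this sketch needs. */
     struct tcb_s
     {
       struct tcb_s *flink;     /* forward link */
       struct tcb_s *blink;     /* backward link */
       int sched_priority;
     };

     struct tcb_list_s
     {
       struct tcb_s *head;
       struct tcb_s *tail;
     };

     struct tcb_list_s g_waitingforfill; /* TCBs waiting for a fill, by priority */
     pid_t g_pgworker;                   /* pid of the page fill worker thread */

     void pg_initialize_sketch(pid_t worker)
     {
       g_waitingforfill.head = NULL;
       g_waitingforfill.tail = NULL;
       g_pgworker = worker;
     }

     /* Insert in descending priority order; equal priorities stay FIFO. */
     void pg_addwaiter(struct tcb_list_s *list, struct tcb_s *tcb)
     {
       struct tcb_s *next = list->head;

       while (next != NULL && next->sched_priority >= tcb->sched_priority)
         {
           next = next->flink;
         }

       tcb->flink = next;
       tcb->blink = (next != NULL) ? next->blink : list->tail;

       if (tcb->blink != NULL)
         {
           tcb->blink->flink = tcb;
         }
       else
         {
           list->head = tcb;
         }

       if (next != NULL)
         {
           next->blink = tcb;
         }
       else
         {
           list->tail = tcb;
         }
     }

 Tasks of equal priority keep their arrival order, so waiters at the same
 priority are served first-come, first-served.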

 Page Faults
 ^^^^^^^^^^^

 **Page fault exception handling**. Page fault handling is performed by
 the function ``pg_miss()``. This function is called from
 architecture-specific memory segmentation fault handling logic. This
 function will perform the following operations:

 #. **Sanity checking**. This function will ASSERT if the currently
    executing task is the page fill worker thread. The page fill worker
    thread is how the page fault is resolved, and all logic associated
    with the page fill worker must be "locked" and always present in
    memory.
 #. **Block the currently executing task**. This function will call
    ``up_switch_context()`` to block the task at the head of the ready-to-run
    list. This should cause an interrupt-level context switch to the next
    highest priority task. The blocked task will be marked with state
    ``TSTATE_WAIT_PAGEFILL`` and will be retained in the
    ``g_waitingforfill`` prioritized task list.
 #. **Boost the page fill worker thread priority**. Check the priority of
    the task at the head of the ``g_waitingforfill`` list. If the
    priority of that task is higher than the current priority of the page
    fill worker thread, then boost the priority of the page fill worker
    thread to that priority. Thus, the page fill worker thread will
    always run at the priority of the highest priority task that is
    waiting for a fill.
 #. **Signal the page fill worker thread**. Is there a page already being
    filled? If not, then signal the page fill worker thread to start
    working on the queued page fill requests.

 When signaled from ``pg_miss()``, the page fill worker thread will be
 awakened and will initiate the fill operation.

 **Input Parameters.** None -- The head of the ready-to-run list is
 assumed to be the task that caused the exception. The current task
 context should already be saved in the TCB of that task. No additional
 inputs are required.

 **Assumptions**.

 -  It is assumed that this function is called from the level of an
    exception handler and that all interrupts are disabled.
 -  ``pg_miss()`` must be "locked" in memory.
    Calling ``pg_miss()`` cannot cause a nested page fault.
 -  It is assumed that the currently executing task (the one at the head of
    the ready-to-run list) is the one that caused the fault. This will
    always be true unless the page fault occurred in an interrupt
    handler. Interrupt handling logic must always be available and
    "locked" into memory so that page faults never come
    from interrupt handling.
 -  The architecture-specific page fault exception handling has already
    verified that the exception did not occur from interrupt/exception
    handling logic.
 -  As mentioned above, the task causing the page fault must not be the
    page fill worker thread because that is the only way to complete the
    page fill.
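
 The four steps of ``pg_miss()`` can be modelled in plain C as shown below.
 This is a hypothetical, single-threaded sketch: the context switch performed
 by ``up_switch_context()`` is omitted, the boolean ``g_worker_signaled``
 stands in for actually signalling the worker thread, and all names other
 than ``g_waitingforfill`` and ``g_pftcb`` are invented for illustration.

 .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>
     #include <stddef.h>

     enum tstate_e
     {
       TSTATE_RUNNING,
       TSTATE_WAIT_PAGEFILL
     };

     /* Hypothetical, simplified TCB. */
     struct tcb_s
     {
       struct tcb_s *flink;
       int pid;
       int sched_priority;
       enum tstate_e task_state;
     };

     struct tcb_s *g_waitingforfill;  /* prioritized, highest first */
     struct tcb_s *g_pftcb;           /* non-NULL while a fill is in progress */
     struct tcb_s *g_pgworker_tcb;    /* the page fill worker thread */
     bool g_worker_signaled;          /* stands in for signalling the worker */

     static void add_prioritized(struct tcb_s *tcb)
     {
       struct tcb_s **pp = &g_waitingforfill;

       while (*pp != NULL && (*pp)->sched_priority >= tcb->sched_priority)
         {
           pp = &(*pp)->flink;
         }

       tcb->flink = *pp;
       *pp = tcb;
     }

     void pg_miss_sketch(struct tcb_s *ftcb)
     {
       /* 1. Sanity check: the worker itself must never fault. */
       assert(ftcb != g_pgworker_tcb);

       /* 2. Block the faulting task on the prioritized wait list. */
       ftcb->task_state = TSTATE_WAIT_PAGEFILL;
       add_prioritized(ftcb);

       /* 3. Boost the worker to the highest waiting priority. */
       if (g_waitingforfill->sched_priority > g_pgworker_tcb->sched_priority)
         {
           g_pgworker_tcb->sched_priority = g_waitingforfill->sched_priority;
         }

       /* 4. Wake the worker only if no fill is already in progress. */
       if (g_pftcb == NULL)
         {
           g_worker_signaled = true;
         }
     }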

 Fill Initiation
 ^^^^^^^^^^^^^^^

 The page fill worker thread will be awakened on one of three conditions:

 -  When signaled by ``pg_miss()``, the page fill worker thread will be
    awakened (see above),
 -  From ``pg_callback()`` after completing the last fill (when
    ``CONFIG_PAGING_BLOCKINGFILL`` is *not* defined... see below), or
 -  A configurable timeout expires with no activity. This timeout can be
    used to detect failure conditions such as fills that never
    complete.

 The page fill worker thread will maintain a static variable called
 ``struct tcb_s *g_pftcb``. If no fill is in progress, ``g_pftcb`` will
 be NULL. Otherwise, it will point to the TCB of the task which is
 receiving the fill that is in progress.

 When awakened from ``pg_miss()``, no fill will be in progress and
 ``g_pftcb`` will be NULL. In this case, the page fill worker thread will
 call ``pg_startfill()``. That function will perform the following
 operations:

 -  Call the architecture-specific function ``up_checkmapping()`` to see
    if the page fill still needs to be performed. In certain conditions,
    the page fault may occur on several threads and be queued multiple
    times. In this corner case, the blocked task will simply be restarted
    (see the logic below for the case of normal completion of the fill
    operation).
 -  Call ``up_allocpage(tcb, &vpage)``. This architecture-specific
    function will set aside a page in memory and map it to the virtual
    address (``vpage``). If all available pages are in use (the typical
    case), this function will select an in-use page, un-map it, and make it
    available.
 -  Call the architecture-specific function ``up_fillpage()``. Two
    versions of ``up_fillpage()`` are supported, a blocking and a
    non-blocking version, selected by the configuration setting
    ``CONFIG_PAGING_BLOCKINGFILL``.

    -  If ``CONFIG_PAGING_BLOCKINGFILL`` is defined, then ``up_fillpage()``
       is a blocking call. In this case, ``up_fillpage()`` will accept only
       (1) a reference to the TCB that requires the fill (architecture-specific
       context information within the TCB will be sufficient to perform the
       fill) and (2) the (virtual) address of the allocated page to be
       filled. The resulting status of the fill will be provided by the
       return value of ``up_fillpage()``.
    -  If ``CONFIG_PAGING_BLOCKINGFILL`` is not defined, then
       ``up_fillpage()`` is a non-blocking call. In this case,
       ``up_fillpage()`` will accept an additional argument: a callback
       function, ``pg_callback``, provided by the page fill worker thread.
       ``up_fillpage()`` does not block; it starts an asynchronous page
       fill. After calling the non-blocking ``up_fillpage()``, the page fill
       worker thread will wait to be signaled for the next event -- the
       fill completion event. The callback function will be called when
       the page fill is finished (or an error occurs). The resulting
       status of the fill will be provided as an argument to the callback
       function. This callback will probably occur from interrupt level.

 In any case, while the fill is in progress, other tasks may execute. If
 another page fault occurs during this time, the faulting task will be
 blocked, its TCB will be added (in priority order) to
 ``g_waitingforfill``, and the priority of the page worker task may be
 boosted. But no action will be taken until the current page fill
 completes. NOTE: The IDLE task must also be fully "locked" in memory.
 The IDLE task cannot be blocked. In the case where all tasks are blocked
 waiting for a page fill, the IDLE task must still be available to run.

 The architecture-specific functions ``up_checkmapping()``,
 ``up_allocpage(tcb, &vpage)``, and ``up_fillpage()`` will be prototyped
 in ``include/nuttx/page.h``.
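
 For the blocking configuration (``CONFIG_PAGING_BLOCKINGFILL`` defined), the
 control flow of ``pg_startfill()`` can be sketched as follows. The
 ``stub_*`` functions are hypothetical stand-ins for ``up_checkmapping()``,
 ``up_allocpage()``, and ``up_fillpage()`` with simplified signatures; here
 ``stub_checkmapping()`` is assumed to return true when the page is already
 mapped and no fill is needed.

 .. code-block:: c

     #include <stdbool.h>
     #include <stddef.h>

     /* Hypothetical, simplified TCB: 'mapped' records whether the page the
      * task faulted on is currently mapped.
      */
     struct tcb_s
     {
       int pid;
       bool mapped;
     };

     static char g_page[1024];  /* the single "physical page" of the toy */
     int g_nfills;              /* number of fills actually performed */

     static bool stub_checkmapping(struct tcb_s *tcb)
     {
       return tcb->mapped;
     }

     static int stub_allocpage(struct tcb_s *tcb, void **vpage)
     {
       (void)tcb;
       *vpage = g_page;         /* a real port may evict an in-use page here */
       return 0;
     }

     static int stub_fillpage(struct tcb_s *tcb, void *vpage)
     {
       (void)vpage;             /* a real port copies data from storage here */
       g_nfills++;
       tcb->mapped = true;
       return 0;
     }

     /* Returns 0 when the task can be restarted, or a negative error. */
     int pg_startfill_sketch(struct tcb_s *tcb)
     {
       void *vpage;
       int ret;

       /* The page may already have been filled for another queued fault. */
       if (stub_checkmapping(tcb))
         {
           return 0;
         }

       ret = stub_allocpage(tcb, &vpage);
       if (ret < 0)
         {
           return ret;
         }

       /* Blocking variant: the fill status is the return value. */
       return stub_fillpage(tcb, vpage);
     }

 Calling ``pg_startfill_sketch()`` twice for the same task performs only one
 fill, which is the duplicate-fault case that ``up_checkmapping()`` exists to
 catch.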

 Fill Complete
 ^^^^^^^^^^^^^

 For the blocking ``up_fillpage()``, the result of the fill will be
 returned directly from the call to ``up_fillpage``.

 For the non-blocking ``up_fillpage()``, the architecture-specific driver
 calls the ``pg_callback()`` that was provided to ``up_fillpage()`` when
 the fill completes. In this case, ``pg_callback()`` will probably be
 called from driver interrupt-level logic. The driver will provide the
 result of the fill as an argument to the callback function. NOTE:
 ``pg_callback()`` must also be "locked" in memory.

 In this non-blocking case, the callback ``pg_callback()`` will perform
 the following operations when it is notified that the fill has
 completed:

 -  Verify that ``g_pftcb`` is non-NULL.
 -  Find the higher priority between the task waiting for the fill to
    complete in ``g_pftcb`` and the task waiting at the head of the
    ``g_waitingforfill`` list. That will be the priority of the highest
    priority task waiting for a fill.
 -  If this priority is higher than the current priority of the page fill
    worker thread, then boost the worker thread's priority to that level.
    Thus, the page fill worker thread will always run at the priority of
    the highest priority task that is waiting for a fill.
 -  Save the result of the fill operation.
 -  Signal the page fill worker thread.
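
 The callback steps can be sketched as below. This is a hypothetical model,
 not NuttX's ``pg_callback()``: the worker priority is a plain integer,
 ``g_fill_done`` stands in for signalling the worker, and the check that
 ``tcb`` matches ``g_pftcb`` is an extra consistency test added here. In a
 real port this function runs at interrupt level, which the sketch does not
 model.

 .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>
     #include <stddef.h>

     /* Hypothetical, simplified TCB. */
     struct tcb_s
     {
       struct tcb_s *flink;
       int sched_priority;
     };

     struct tcb_s *g_pftcb;           /* task receiving the current fill */
     struct tcb_s *g_waitingforfill;  /* head = highest-priority waiter */
     int g_worker_priority = 50;      /* current worker priority (assumed) */
     int g_fillresult;                /* saved fill status */
     bool g_fill_done;                /* stands in for signalling the worker */

     void pg_callback_sketch(struct tcb_s *tcb, int result)
     {
       int priority;

       /* Verify that a fill is in progress (and, as an extra check, that it
        * is the fill this completion belongs to).
        */
       assert(g_pftcb != NULL && tcb == g_pftcb);

       /* Highest priority among the filled task and the next waiter. */
       priority = g_pftcb->sched_priority;
       if (g_waitingforfill != NULL &&
           g_waitingforfill->sched_priority > priority)
         {
           priority = g_waitingforfill->sched_priority;
         }

       /* Boost the worker if needed, save the result, and wake the worker. */
       if (priority > g_worker_priority)
         {
           g_worker_priority = priority;
         }

       g_fillresult = result;
       g_fill_done = true;
     }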

 Task Resumption
 ^^^^^^^^^^^^^^^

 For the non-blocking ``up_fillpage()``, the page fill worker thread will
 detect that the page fill is complete when it is awakened with
 ``g_pftcb`` non-NULL and fill completion status from ``pg_callback``. In
 the blocking case, the page fill worker thread will know that the
 page fill is complete when ``up_fillpage()`` returns.

 In either case, the page fill worker thread will:

 -  Verify consistency of state information and ``g_pftcb``.
 -  Verify that the page fill completed successfully, and if so,
 -  Call ``up_unblocktask(g_pftcb)`` to make the task that just received
    the fill ready-to-run.
 -  Check if the ``g_waitingforfill`` list is empty. If not:

    -  Remove the highest priority task waiting for a page fill from
       ``g_waitingforfill``,
    -  Save the task's TCB in ``g_pftcb``,
    -  If the priority of the thread in ``g_pftcb`` is higher than the
       default priority of the page fill worker thread, then set the
       priority of the page fill worker thread to that priority.
    -  Call ``pg_startfill()`` which will start the next fill (as
       described above).

 -  Otherwise,

    -  Set ``g_pftcb`` to NULL.
    -  Restore the default priority of the page fill worker thread.
    -  Wait for the next fill related event (a new page fault).
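
 The resumption logic above can be condensed into a single hypothetical
 function, ``pg_fillcomplete_sketch()``, that the worker would call once a
 fill has finished. ``startfill_stub()`` stands in for ``pg_startfill()``,
 the ``ready`` flag stands in for ``up_unblocktask()``, and
 ``WORKER_DEFAULT_PRIORITY`` is an assumed value. Failure handling is left
 out because the text above does not specify it.

 .. code-block:: c

     #include <stddef.h>

     /* Hypothetical, simplified TCB. */
     struct tcb_s
     {
       struct tcb_s *flink;
       int sched_priority;
       int ready;               /* set instead of calling up_unblocktask() */
     };

     #define WORKER_DEFAULT_PRIORITY 50   /* assumed default */

     struct tcb_s *g_pftcb;
     struct tcb_s *g_waitingforfill;      /* head = highest priority */
     int g_worker_priority = WORKER_DEFAULT_PRIORITY;
     int g_nstarted;                      /* calls to the pg_startfill() stand-in */

     static void startfill_stub(struct tcb_s *tcb)
     {
       (void)tcb;
       g_nstarted++;
     }

     void pg_fillcomplete_sketch(int result)
     {
       if (g_pftcb == NULL)
         {
           return;                        /* inconsistent state: nothing to do */
         }

       if (result == 0)
         {
           g_pftcb->ready = 1;            /* make the filled task ready-to-run */
         }

       if (g_waitingforfill != NULL)
         {
           /* Dequeue the highest-priority waiter and start its fill. */
           g_pftcb = g_waitingforfill;
           g_waitingforfill = g_pftcb->flink;
           g_pftcb->flink = NULL;

           g_worker_priority = WORKER_DEFAULT_PRIORITY;
           if (g_pftcb->sched_priority > WORKER_DEFAULT_PRIORITY)
             {
               g_worker_priority = g_pftcb->sched_priority;
             }

           startfill_stub(g_pftcb);
         }
       else
         {
           /* Nothing queued: go idle at the default priority. */
           g_pftcb = NULL;
           g_worker_priority = WORKER_DEFAULT_PRIORITY;
         }
     }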

 Architecture-Specific Support Requirements
 ------------------------------------------

 Memory Organization
 ^^^^^^^^^^^^^^^^^^^

 **Memory Regions**. Chip specific logic will map the virtual and
 physical address spaces into three general regions:

 #. A .text region containing "locked-in-memory" code
    that is always available and will never cause a page fault. This
    locked memory is loaded at boot time and remains resident for all
    time. This memory region must include:

    -  All logic for all interrupt paths. All interrupt logic must be
       locked in memory because the design presented here will not support
       page faults from interrupt handlers. This includes the page fault
       handling logic and ``pg_miss()`` that is called
       from the page fault handler. It also includes the
       ``pg_callback()`` function that wakes up the
       page fill worker thread and whatever architecture-specific logic
       calls ``pg_callback()``.
    -  All logic for the IDLE thread. The IDLE thread must always be
       ready to run and cannot be blocked for any reason.
    -  All of the page fill worker thread must be locked in memory. This
       thread must execute in order to unblock any thread waiting for a
       fill. If this thread were to block, there would be no way to
       complete the fills!

 #. A .text region containing pages that can be allocated,
    mapped to various virtual addresses, and filled from some mass
    storage medium.
 #. And a fixed RAM space for .data, .bss, and .heap.

 This memory organization is illustrated in the following table. Notice
 that:

 -  There is a one-to-one relationship between pages in the virtual
    address space and pages of .text in the non-volatile mass
    storage device.
 -  There are, however, far fewer physical pages available than virtual
    pages. Only a subset of physical pages will be mapped to virtual
    pages at any given time. This mapping will be performed on-demand as
    needed for program execution.

 =============================  ============================  ====================
 SRAM                           Virtual Address Space         Non-Volatile Storage
 =============================  ============================  ====================
 .                              DATA                          .
 .                              Virtual Page *n* (*n* > *m*)  Stored Page *n*
 .                              Virtual Page *n-1*            Stored Page *n-1*
 DATA                           ...                           ...
 Physical Page *m* (*m* < *n*)  ...                           ...
 Physical Page *m-1*            ...                           ...
 ...                            ...                           ...
 Physical Page *1*              Virtual Page *1*              Stored Page *1*
 Locked Memory                  Locked Memory                 Memory Resident
 =============================  ============================  ====================
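
 Because virtual pages and stored pages correspond one-to-one, locating the
 stored copy of a page needs only address arithmetic. The sketch below
 assumes a hypothetical virtual .text base address, 1K pages, and a 32K
 locked region placed first in both the virtual address space and the stored
 image; none of these constants come from a real port.

 .. code-block:: c

     #include <stdint.h>

     #define SKETCH_PAGESIZE     1024u          /* 1K managed pages */
     #define SKETCH_TEXT_VBASE   0x20000000u    /* hypothetical virtual .text start */
     #define SKETCH_LOCKED_SIZE  (32u * 1024u)  /* locked, memory-resident code */
     #define SKETCH_PAGED_VBASE  (SKETCH_TEXT_VBASE + SKETCH_LOCKED_SIZE)

     /* Offset in the stored .text image of the page that holds vaddr. With a
      * one-to-one mapping this is simply the page-aligned offset of vaddr
      * from the start of the virtual .text region.
      */
     uint32_t storage_offset(uint32_t vaddr)
     {
       return (vaddr - SKETCH_TEXT_VBASE) & ~(SKETCH_PAGESIZE - 1u);
     }

     /* 1-based number (as in the table) of the paged virtual page holding
      * vaddr; the same number identifies the stored page.
      */
     uint32_t virtual_page_number(uint32_t vaddr)
     {
       return (vaddr - SKETCH_PAGED_VBASE) / SKETCH_PAGESIZE + 1u;
     }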

 **Example**. As an example, suppose that the size of the SRAM is 192K
 (as in the NXP LPC3131). And suppose further that:

 -  The size of the locked, memory-resident .text area is 32K.
 -  The size of the DATA area is 64K.
 -  The size of one managed page is 1K.
 -  The size of the whole .text image on the non-volatile, mass storage
    device is 1024K.

 Then, the size of the locked, memory-resident code is 32K (32 pages).
 The size of the physical page region is 192K - 32K - 64K = 96K
 (*m*\ =96 pages), and the size of the data region is 64 pages. And the
 size of the virtual paged region must then be greater than or equal to
 (1024-32) or 992 pages (*n*).
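
 The page counts in this example follow from simple subtraction, as the
 short sketch below shows. The macro and function names are hypothetical;
 all sizes are in KiB and are taken from the example.

 .. code-block:: c

     /* Sizes in KiB, from the LPC3131-style example above. */
     #define SRAM_K     192
     #define LOCKED_K    32
     #define DATA_K      64
     #define PAGE_K       1
     #define IMAGE_K   1024

     /* Physical page frames left for paging after locked code and data. */
     int physical_pages(void)
     {
       return (SRAM_K - LOCKED_K - DATA_K) / PAGE_K;
     }

     /* Minimum number of virtual pages needed for the non-locked .text. */
     int virtual_pages(void)
     {
       return (IMAGE_K - LOCKED_K) / PAGE_K;
     }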

 **Building the Locked, In-Memory Image**. One way to accomplish this
 would be a two phase link:

 -  In the first phase, create a partially linked object containing all
    interrupt/exception handling logic, the page fill worker thread, plus
    all parts of the IDLE thread (which must always be available for
    execution).
 -  All of the ``.text`` and ``.rodata`` sections of this partial link
    should be collected into a single section.
 -  The second link would link the partially linked object along with the
    remaining objects to produce the final binary. The linker script
    should position the "special" section so that it lies in a reserved,
    "non-swappable" region.

 Architecture-Specific Functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Most standard, architecture-specific functions are declared in
 ``include/nuttx/arch.h``. However, for the case of this paging logic,
 the architecture specific functions are declared in
 ``include/nuttx/page.h``. The standard, architecture-specific function that
 should already be provided in the architecture port is
 :c:func:`up_switch_context`. New, additional functions that must be
 implemented just for on-demand paging support are:

 .. c:function:: int up_checkmapping(FAR struct tcb_s *tcb)

   The function ``up_checkmapping()`` returns an indication of whether the
   page fill still needs to be performed. In certain conditions, the page
   fault may occur on several threads and be queued multiple times. This
   function prevents the same page from being filled multiple times.
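
 A minimal sketch of such a check, assuming a toy one-level page table and a
 TCB field that holds the faulting address, could look like this.
 ``checkmapping_sketch()``, ``g_pte``, the ``fault_vaddr`` field, and the
 base address are all hypothetical, and the sketch returns ``bool`` rather
 than ``int`` for clarity.

 .. code-block:: c

     #include <stdbool.h>
     #include <stdint.h>

     #define SKETCH_PAGESIZE     1024u
     #define SKETCH_NPAGES       8u
     #define SKETCH_PAGED_VBASE  0x20008000u   /* hypothetical */
     #define SKETCH_PTE_VALID    0x1u

     /* Hypothetical TCB field holding the miss address saved on the fault. */
     struct tcb_s
     {
       uintptr_t fault_vaddr;
     };

     uint32_t g_pte[SKETCH_NPAGES];  /* toy page table, one entry per page */

     /* Returns true if the faulting page is already mapped, i.e. no fill is
      * needed. Addresses outside the paged region are reported as unmapped.
      */
     bool checkmapping_sketch(const struct tcb_s *tcb)
     {
       uintptr_t index = (tcb->fault_vaddr - SKETCH_PAGED_VBASE) /
                         SKETCH_PAGESIZE;

       return index < SKETCH_NPAGES && (g_pte[index] & SKETCH_PTE_VALID) != 0;
     }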

 .. c:function:: int up_allocpage(FAR struct tcb_s *tcb, FAR void **vpage)

   This architecture-specific function will set aside a page in memory and
   map it to its correct virtual address. Architecture-specific context
   information saved within the TCB will provide the function with the
   information needed to identify the virtual miss address. This function
   will return the allocated physical page address in ``vpage``. The size
   of the underlying physical page is determined by the configuration
   setting ``CONFIG_PAGING_PAGESIZE``. NOTE: This function must *always*
   return a page allocation. If all available pages are in-use (the typical
   case), then this function will select a page in-use, un-map it, and make
   it available.
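
 One simple way to satisfy the "always return a page" requirement is
 first-in, first-out (FIFO) replacement: use a free physical page if one
 exists, otherwise evict the oldest allocation. The sketch below is a
 hypothetical illustration of that policy only; ``allocpage_sketch()`` and
 its globals are invented, it tracks which virtual page owns each frame
 without touching real page tables, and NuttX ports may use a different
 replacement policy.

 .. code-block:: c

     #define NPHYS     4      /* physical page frames (hypothetical) */
     #define NO_VPAGE  (-1)   /* frame is free */

     /* Virtual page currently mapped in each physical frame. */
     int g_owner[NPHYS] = { NO_VPAGE, NO_VPAGE, NO_VPAGE, NO_VPAGE };
     int g_next_victim;       /* FIFO replacement pointer */

     /* Always returns a frame index: a free frame if available, otherwise
      * the oldest allocation is evicted. The evicted virtual page (or
      * NO_VPAGE) is returned through *evicted.
      */
     int allocpage_sketch(int vpage, int *evicted)
     {
       int victim;

       for (int f = 0; f < NPHYS; f++)
         {
           if (g_owner[f] == NO_VPAGE)
             {
               g_owner[f] = vpage;
               *evicted = NO_VPAGE;
               return f;
             }
         }

       /* All frames in use: un-map the oldest one and reuse it. */
       victim = g_next_victim;
       g_next_victim = (g_next_victim + 1) % NPHYS;
       *evicted = g_owner[victim];
       g_owner[victim] = vpage;
       return victim;
     }

 Evicting a page in a real port also means invalidating its page-table entry
 and flushing any stale TLB entry before the frame is reused.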

 .. c:function:: int up_fillpage(FAR struct tcb_s *tcb, FAR const void *vpage, void (*pg_callback)(FAR struct tcb_s *tcb, int result))

   The actual filling of the page with data from the non-volatile storage
   must be performed by a separate call to the architecture-specific
   function ``up_fillpage()``. This starts an asynchronous page fill. The
   common paging logic will provide a callback function, ``pg_callback``,
   that will be called when the page fill is finished (or an error occurs).
   This callback is assumed to occur from an interrupt level when the
   device driver completes the fill operation.
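
 The non-blocking contract (start the fill, return immediately, report
 completion through the callback) can be modelled as below. The request
 record, ``fillpage_sketch()``, ``storage_complete_isr()``, and
 ``pg_callback_stub()`` are hypothetical; the explicit ``offset`` argument
 replaces the TCB-derived source location of the real interface, and
 ``memcpy`` stands in for the transfer a real storage driver would perform.

 .. code-block:: c

     #include <errno.h>
     #include <stddef.h>
     #include <string.h>

     #define SKETCH_PAGESIZE 1024u

     /* Hypothetical, simplified TCB. */
     struct tcb_s
     {
       int pid;
     };

     /* Same shape as the pg_callback parameter of up_fillpage(). */
     typedef void (*sketch_pgcallback_t)(struct tcb_s *tcb, int result);

     static unsigned char g_flash[4 * SKETCH_PAGESIZE];  /* toy storage image */

     /* The single outstanding request (hypothetical driver state). */
     static struct
     {
       struct tcb_s *tcb;
       void *vpage;
       size_t offset;
       sketch_pgcallback_t callback;
     } g_req;

     /* Non-blocking: record the request and return at once. */
     int fillpage_sketch(struct tcb_s *tcb, void *vpage, size_t offset,
                         sketch_pgcallback_t callback)
     {
       if (g_req.callback != NULL)
         {
           return -EBUSY;       /* only one fill in flight in this model */
         }

       g_req.tcb      = tcb;
       g_req.vpage    = vpage;
       g_req.offset   = offset;
       g_req.callback = callback;
       return 0;
     }

     /* Stands in for the storage driver's completion interrupt. */
     void storage_complete_isr(void)
     {
       sketch_pgcallback_t callback = g_req.callback;

       if (callback == NULL)
         {
           return;              /* spurious completion: no request pending */
         }

       memcpy(g_req.vpage, g_flash + g_req.offset, SKETCH_PAGESIZE);
       g_req.callback = NULL;
       callback(g_req.tcb, 0);  /* report success to the paging logic */
     }

     /* Hypothetical callback that records what it was told. */
     struct tcb_s *g_done_tcb;
     int g_done_result = -1;

     void pg_callback_stub(struct tcb_s *tcb, int result)
     {
       g_done_tcb    = tcb;
       g_done_result = result;
     }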