| .. _ondemandpaging: |
| |
| ================ |
| On-Demand Paging |
| ================ |
| |
| Kernel Build Implementation |
| =========================== |
| |
| On-demand paging and lazy loading are techniques used to manage physical |
| memory. The basic idea is to allow a program to execute even though the |
| entire program is not resident in memory. The program is loaded into |
| memory on demand. This is a technique that is used in many operating |
| systems to allow large programs to execute on small memory systems. |
| Commonly, a Memory Management Unit (MMU) is used to map virtual memory |
| into physical memory. Applications are then loaded into virtual memory |
| address spaces and access to physical memory is managed by the MMU. If |
| the virtual memory is not resident in physical memory, then a page fault |
| occurs. The operating system then loads the missing page into memory and |
| resumes execution. |
| |
| Requirements and Assumptions |
| ---------------------------- |
| |
| On-demand paging requires *Kernel Build* (``CONFIG_BUILD_KERNEL=y``) mode. |
| In this mode, no applications are built within the NuttX kernel. Instead, |
| the applications are built as separate programs that are loaded into memory |
| (``CONFIG_ELF=y`` and ``CONFIG_BINFMT_LOADABLE=y``). In this mode, each |
| process has its own address environment (``CONFIG_ARCH_ADDRENV=y``). |
| |
| Logic Design Description |
| ------------------------ |
| |
| When an application is being loaded ``up_addrenv_create`` is called to create |
| the process's address environment. This includes mapping the commonly used |
| ``text``, ``data`` and ``heap`` sections within the virtual memory space. |
| Without on-demand paging, the physical memory is then allocated and mapped |
| accordingly, before the process is started. When on-demand paging is enabled, |
| usually only one single page for each section is allocated and mapped. |
| |
| The process starts executing within its address environment, accessing the |
| virtual memory. Whenever it tries to access a virtual memory address that is |
| not mapped in the MMU, a page fault occurs. The MMU then triggers an |
| exception that is handled by the kernel. The kernel then checks if there are |
| enough free physical pages available and maps the virtual memory address to |
| it. Finally, execution is resumed from the same point where the page fault |
| first occurred. |
| |
| Example: RISC-V |
| ^^^^^^^^^^^^^^^ |
| |
| RISC-V's ``up_addrenv_create`` calls ``create_region`` (both defined in |
| ``arch/risc-v/src/common/riscv_addrenv.c``). ``create_region`` maps a single |
| region to MMU by allocating physical memory for the page tables. When |
| ``CONFIG_PAGING=y`` is not selected, all the physical page tables are |
| allocated from the physical memory space and then mapped to the virtual |
| memory space. When ``CONFIG_PAGING=y`` is selected, only the first page of |
| each section is mapped to the virtual memory space. The rest of the pages are |
| mapped to the virtual memory space only when a page fault occurs. |
| |
| The page fault is handled by the ``riscv_fillpage`` function in the exception |
| handler (defined in ``arch/risc-v/src/common/riscv_exception.c``). Whenever |
| a page fault occurs, the ``riscv_fillpage`` function is called. This function |
| allocates a physical page and maps it to the virtual memory space that |
| triggered the page fault exception and then resumes execution from the same |
| point where the page fault first occurred. |
| |
| :ref:`knsh_paging` simulates a device with 4MiB physical memory with 8MiB |
| of virtual heap memory allocated for each process. This is possible by |
| enabling on-demand paging. |
| |
| Legacy Implementation |
| ===================== |
| |
| This legacy implementation runs on *Flat Build* (*Kernel Build* did not |
| even exist at that time). |
| |
| What kind of platforms can support NuttX legacy on-demand paging? |
| |
| #. The MCU should have some large, probably low-cost non-volatile |
| storage such as serial FLASH or an SD card. This storage probably |
| does not support non-random access (otherwise, why not just execute |
| the program directly on the storage media). SD and serial FLASH are |
| inexpensive and do not require very many pins and SPI support is |
| prevalent in just about all MCUs. This large serial FLASH would |
| contain a big program. Perhaps a program of several megabytes in |
| size. |
| #. The MCU must have a (relatively) small block of fast SRAM from which |
| it can execute code. A size of, say 256K (or 192K as in the NXP |
| LPC3131) would be sufficient for many applications. |
| #. The MCU has an MMU (again like the NXP LPC3131). |
| |
| If the platform meets these requirements, then NuttX can provide |
| on-demand paging: It can copy .text from the large program in |
| non-volatile media into RAM as needed to execute a huge program from the |
| small RAM. |
| |
| Terminology |
| ----------- |
| |
| ``g_waitingforfill``: |
| An OS list that is used to hold the TCBs of tasks that are waiting |
| for a page fill. |
| ``g_pftcb``: |
| A variable that holds a reference to the TCB of the thread that is |
| currently be re-filled. |
| ``g_pgworker``: |
| The *process* ID of the thread that will perform the page fills. |
| ``pg_callback()``: |
| The callback function that is invoked from a driver when the fill is |
| complete. |
| ``pg_miss()``: |
| The function that is called from architecture-specific code to handle |
| a page fault. |
| ``TCB``: |
| Task Control Block |
| |
| NuttX Common Logic Design Description |
| ------------------------------------- |
| |
| Initialization |
| ^^^^^^^^^^^^^^ |
| |
| The following declarations will be added. |
| |
| - ``g_waitingforfill``. A doubly linked list that will be used to |
| implement a prioritized list of the TCBs of tasks that are waiting |
| for a page fill. |
| - ``g_pgworker``. The *process* ID of the thread that will perform |
| the page fills |
| |
| During OS initialization in ``sched/init/nx_start.c``, the following |
| steps will be performed: |
| |
| - The ``g_waitingforfill`` queue will be initialized. |
| - The special, page fill worker thread, will be started. The ``pid`` of |
| the page will worker thread will be saved in ``g_pgworker``. Note |
| that we need a special worker thread to perform fills; we cannot use |
| the "generic" worker thread facility because we cannot be assured |
| that all actions called by that worker thread will always be resident |
| in memory. |
| |
| Declarations for ``g_waitingforfill``, ``g_pgworker``, and other |
| internal, private definitions will be provided in |
| ``sched/paging/paging.h``. All public definitions that should be used by |
| the architecture-specific code will be available in |
| ``include/nuttx/page.h``. Most architecture-specific functions are |
| declared in ``include/nuttx/arch.h``, but for the case of this paging |
| logic, those architecture specific functions are instead declared in |
| ``include/nuttx/page.h``. |
| |
| Page Faults |
| ^^^^^^^^^^^ |
| |
| **Page fault exception handling**. Page fault handling is performed by |
| the function ``pg_miss()``. This function is called from |
| architecture-specific memory segmentation fault handling logic. This |
| function will perform the following operations: |
| |
| #. **Sanity checking**. This function will ASSERT if the currently |
| executing task is the page fill worker thread. The page fill worker |
| thread is how the page fault is resolved and all logic associated |
| with the page fill worker must be "`locked <#MemoryOrg>`__" and |
| always present in memory. |
| #. **Block the currently executing task**. This function will call |
| ``up_switch_context()`` to block the task at the head of the ready-to-run |
| list. This should cause an interrupt level context switch to the next |
| highest priority task. The blocked task will be marked with state |
| ``TSTATE_WAIT_PAGEFILL`` and will be retained in the |
| ``g_waitingforfill`` prioritized task list. |
| #. **Boost the page fill worker thread priority**. Check the priority of |
| the task at the head of the ``g_waitingforfill`` list. If the |
| priority of that task is higher than the current priority of the page |
| fill worker thread, then boost the priority of the page fill worker |
| thread to that priority. Thus, the page fill worker thread will |
| always run at the priority of the highest priority task that is |
| waiting for a fill. |
| #. **Signal the page fill worker thread**. Is there a page already being |
| filled? If not then signal the page fill worker thread to start |
| working on the queued page fill requests. |
| |
| When signaled from ``pg_miss()``, the page fill worker thread will be |
| awakenend and will initiate the fill operation. |
| |
| **Input Parameters.** None -- The head of the ready-to-run list is |
| assumed to be that task that caused the exception. The current task |
| context should already be saved in the TCB of that task. No additional |
| inputs are required. |
| |
| **Assumptions**. |
| |
| - It is assumed that this function is called from the level of an |
| exception handler and that all interrupts are disabled. |
| - The ``pg_miss()`` must be "`locked <#MemoryOrg>`__" in memory. |
| Calling ``pg_miss()`` cannot cause a nested page fault. |
| - It is assumed that currently executing task (the one at the head of |
| the ready-to-run list) is the one that cause the fault. This will |
| always be true unless the page fault occurred in an interrupt |
| handler. Interrupt handling logic must always be available and |
| "`locked <#MemoryOrg>`__" into memory so that page faults never come |
| from interrupt handling. |
| - The architecture-specific page fault exception handling has already |
| verified that the exception did not occur from interrupt/exception |
| handling logic. |
| - As mentioned above, the task causing the page fault must not be the |
| page fill worker thread because that is the only way to complete the |
| page fill. |
| |
| Fill Initiation |
| ^^^^^^^^^^^^^^^ |
| |
| The page fill worker thread will be awakened on one of three conditions: |
| |
| - When signaled by ``pg_miss()``, the page fill worker thread will be |
| awakenend (see above), |
| - From ``pg_callback()`` after completing last fill (when |
| ``CONFIG_PAGING_BLOCKINGFILL`` is defined... see below), or |
| - A configurable timeout expires with no activity. This timeout can be |
| used to detect failure conditions such things as fills that never |
| complete. |
| |
| The page fill worker thread will maintain a static variable called |
| ``struct tcb_s *g_pftcb``. If no fill is in progress, ``g_pftcb`` will |
| be NULL. Otherwise, it will point to the TCB of the task which is |
| receiving the fill that is in progress. |
| |
| When awakened from ``pg_miss()``, no fill will be in progress and |
| ``g_pftcb`` will be NULL. In this case, the page fill worker thread will |
| call ``pg_startfill()``. That function will perform the following |
| operations: |
| |
| - Call the architecture-specific function ``up_checkmapping()`` to see |
| if the page fill still needs to be performed. In certain conditions, |
| the page fault may occur on several threads and be queued multiple |
| times. In this corner case, the blocked task will simply be restarted |
| (see the logic below for the case of normal completion of the fill |
| operation). |
| - Call ``up_allocpage(tcb, &vpage)``. This architecture-specific |
| function will set aside page in memory and map to virtual address |
| (vpage). If all available pages are in-use (the typical case), this |
| function will select a page in-use, un-map it, and make it available. |
| - Call the architecture-specific function ``up_fillpage()``. Two |
| versions of the up_fillpage function are supported -- a blocking and |
| a non-blocking version based upon the configuration setting |
| ``CONFIG_PAGING_BLOCKINGFILL``. |
| |
| - If ``CONFIG_PAGING_BLOCKINGFILL`` is defined, then up_fillpage is |
| blocking call. In this case, ``up_fillpage()`` will accept only |
| (1) a reference to the TCB that requires the fill. |
| Architecture-specific context information within the TCB will be |
| sufficient to perform the fill. And (2) the (virtual) address of |
| the allocated page to be filled. The resulting status of the fill |
| will be provided by return value from ``up_fillpage()``. |
| - If ``CONFIG_PAGING_BLOCKINGFILL`` is defined, then up_fillpage is |
| non-blocking call. In this case ``up_fillpage()`` will accept an |
| additional argument: The page fill worker thread will provide a |
| callback function, ``pg_callback``. This function is non-blocking, |
| it will start an asynchronous page fill. After calling the |
| non-blocking ``up_fillpage()``, the page fill worker thread will |
| wait to be signaled for the next event -- the fill completion |
| event. The callback function will be called when the page fill is |
| finished (or an error occurs). The resulting status of the fill |
| will be providing as an argument to the callback functions. This |
| callback will probably occur from interrupt level. |
| |
| In any case, while the fill is in progress, other tasks may execute. If |
| another page fault occurs during this time, the faulting task will be |
| blocked, its TCB will be added (in priority order) to |
| ``g_waitingforfill``, and the priority of the page worker task may be |
| boosted. But no action will be taken until the current page fill |
| completes. NOTE: The IDLE task must also be fully |
| `locked <#MemoryOrg>`__ in memory. The IDLE task cannot be blocked. It |
| the case where all tasks are blocked waiting for a page fill, the IDLE |
| task must still be available to run. |
| |
| The architecture-specific functions, ``up_checkmapping()``, |
| ``up_allocpage(tcb, &vpage)`` and ``up_fillpage(page, pg_callback)`` |
| will be prototyped in ``include/nuttx/arch.h`` |
| |
| Fill Complete |
| ^^^^^^^^^^^^^ |
| |
| For the blocking ``up_fillpage()``, the result of the fill will be |
| returned directly from the call to ``up_fillpage``. |
| |
| For the non-blocking ``up_fillpage()``, the architecture-specific driver |
| call the ``pg_callback()`` that was provided to ``up_fillpage()`` when |
| the fill completes. In this case, the ``pg_callback()`` will probably be |
| called from driver interrupt-level logic. The driver will provide the |
| result of the fill as an argument to the callback function. NOTE: |
| ``pg_callback()`` must also be `locked <#MemoryOrg>`__ in memory. |
| |
| In this non-blocking case, the callback ``pg_callback()`` will perform |
| the following operations when it is notified that the fill has |
| completed: |
| |
| - Verify that ``g_pftcb`` is non-NULL. |
| - Find the higher priority between the task waiting for the fill to |
| complete in ``g_pftcb`` and the task waiting at the head of the |
| ``g_waitingforfill`` list. That will be the priority of he highest |
| priority task waiting for a fill. |
| - If this higher priority is higher than current page fill worker |
| thread, then boost worker thread's priority to that level. Thus, the |
| page fill worker thread will always run at the priority of the |
| highest priority task that is waiting for a fill. |
| - Save the result of the fill operation. |
| - Signal the page fill worker thread. |
| |
| Task Resumption |
| ^^^^^^^^^^^^^^^ |
| |
| For the non-blocking ``up_fillpage()``, the page fill worker thread will |
| detect that the page fill is complete when it is awakened with |
| ``g_pftcb`` non-NULL and fill completion status from ``pg_callback``. In |
| the non-blocking case, the page fill worker thread will know that the |
| page fill is complete when ``up_fillpage()`` returns. |
| |
| In this either, the page fill worker thread will: |
| |
| - Verify consistency of state information and ``g_pftcb``. |
| - Verify that the page fill completed successfully, and if so, |
| - Call ``up_unblocktask(g_pftcb)`` to make the task that just received |
| the fill ready-to-run. |
| - Check if the ``g_waitingforfill`` list is empty. If not: |
| |
| - Remove the highest priority task waiting for a page fill from |
| ``g_waitingforfill``, |
| - Save the task's TCB in ``g_pftcb``, |
| - If the priority of the thread in ``g_pftcb``, is higher in |
| priority than the default priority of the page fill worker thread, |
| then set the priority of the page fill worker thread to that |
| priority. |
| - Call ``pg_startfill()`` which will start the next fill (as |
| described above). |
| |
| - Otherwise, |
| |
| - Set ``g_pftcb`` to NULL. |
| - Restore the default priority of the page fill worker thread. |
| - Wait for the next fill related event (a new page fault). |
| |
| Architecture-Specific Support Requirements |
| ------------------------------------------ |
| |
| Memory Organization |
| ^^^^^^^^^^^^^^^^^^^ |
| |
| **Memory Regions**. Chip specific logic will map the virtual and |
| physical address spaces into three general regions: |
| |
| #. A .text region containing "`locked-in-memory <#MemoryOrg>`__" code |
| that is always available and will never cause a page fault. This |
| locked memory is loaded at boot time and remains resident for all |
| time. This memory regions must include: |
| |
| - All logic for all interrupt paths. All interrupt logic must be |
| locked in memory because the design present here will not support |
| page faults from interrupt handlers. This includes the page fault |
| handling logic and ```pg_miss()`` <#PageFaults>`__ that is called |
| from the page fault handler. It also includes the |
| ```pg_callback()`` <#FillComplete>`__ function that wakes up the |
| page fill worker thread and whatever architecture-specific logic |
| that calls ``pg_callback()``. |
| - All logic for the IDLE thread. The IDLE thread must always be |
| ready to run and cannot be blocked for any reason. |
| - All of the page fill worker thread must be locked in memory. This |
| thread must execute in order to unblock any thread waiting for a |
| fill. It this thread were to block, there would be no way to |
| complete the fills! |
| |
| #. A .text region containing pages that can be assigned allocated, |
| mapped to various virtual addresses, and filled from some mass |
| storage medium. |
| #. And a fixed RAM space for .bss, .text, and .heap. |
| |
| This memory organization is illustrated in the following table. Notice |
| that: |
| |
| - There is a one-to-one relationship between pages in the virtual |
| address space and between pages of .text in the non-volatile mass |
| storage device. |
| - There are, however, far fewer physical pages available than virtual |
| pages. Only a subset of physical pages will be mapped to virtual |
| pages at any given time. This mapping will be performed on-demand as |
| needed for program execution. |
| |
| ============================= ============================ ==================== |
| SRAM Virtual Address Space Non-Volatile Storage |
| ============================= ============================ ==================== |
| . DATA . |
| . Virtual Page *n* (*n* > *m*) Stored Page *n* |
| . Virtual Page *n-1* Stored Page *n-1* |
| DATA ... ... |
| Physical Page *m* (*m* < *n*) ... ... |
| Physical Page *m-1* ... ... |
| ... ... ... |
| Physical Page *1* Virtual Page *1* Stored Page *1* |
| Locked Memory Locked Memory Memory Resident |
| ============================= ============================ ==================== |
| |
| **Example**. As an example, suppose that the size of the SRAM is 192K |
| (as in the NXP LPC3131). And suppose further that: |
| |
| - The size of the locked, memory resident .text area is 32K, and |
| - The size of the DATA area is 64K. |
| - The size of one, managed page is 1K. |
| - The size of the whole .text image on the non-volatile, mass storage |
| device is 1024K. |
| |
| Then, the size of the locked, memory resident code is 32K (*m*\ =32 |
| pages). The size of the physical page region is 96K (96 pages), and the |
| size of the data region is 64 pages. And the size of the virtual paged |
| region must then be greater than or equal to (1024-32) or 992 pages |
| (*n*). |
| |
| **Building the Locked, In-Memory Image**. One way to accomplish this |
| would be a two phase link: |
| |
| - In the first phase, create a partially linked objected containing all |
| interrupt/exception handling logic, the page fill worker thread plus |
| all parts of the IDLE thread (which must always be available for |
| execution). |
| - All of the ``.text`` and ``.rodata`` sections of this partial link |
| should be collected into a single section. |
| - The second link would link the partially linked object along with the |
| remaining object to produce the final binary. The linker script |
| should position the "special" section so that it lies in a reserved, |
| "non-swappable" region. |
| |
| Architecture-Specific Functions |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Most standard, architecture-specific functions are declared in |
| ``include/nuttx/arch.h``. However, for the case of this paging logic, |
| the architecture specific functions are declared in |
| ``include/nuttx/page.h``. Standard, architecture-specific functions that |
| should already be provided in the architecture port are |
| :c:func:`up_switch_context`. New, additional functions that must be |
| implemented just for on-demand paging support are: |
| |
| .. c:function:: int up_checkmapping(FAR struct tcb_s *tcb) |
| |
| The function ``up_checkmapping()`` returns an indication if the page |
| fill still needs to performed or not. In certain conditions, the page |
| fault may occur on several threads and be queued multiple times. This |
| function will prevent the same page from be filled multiple times. |
| |
| .. c:function:: int up_allocpage(FAR struct tcb_s *tcb, FAR void *vpage) |
| |
| This architecture-specific function will set aside page in memory and |
| map to its correct virtual address. Architecture-specific context |
| information saved within the TCB will provide the function with the |
| information needed to identify the virtual miss address. This function |
| will return the allocated physical page address in ``vpage``. The size |
| of the underlying physical page is determined by the configuration |
| setting ``CONFIG_PAGING_PAGESIZE``. NOTE: This function must *always* |
| return a page allocation. If all available pages are in-use (the typical |
| case), then this function will select a page in-use, un-map it, and make |
| it available. |
| |
| .. c:function:: int up_fillpage(FAR struct tcb_s *tcb, FAR const void *vpage, void (*pg_callback)(FAR struct tcb_s *tcb, int result)) |
| |
| The actual filling of the page with data from the non-volatile, must be |
| performed by a separate call to the architecture-specific function, |
| ``up_fillpage()``. This will start asynchronous page fill. The common |
| paging logic will provide a callback function, ``pg_callback``, that |
| will be called when the page fill is finished (or an error occurs). This |
| callback is assumed to occur from an interrupt level when the device |
| driver completes the fill operation. |