| ======= |
| SMARTFS |
| ======= |
| |
| This page contains information about the implementation of the NuttX |
| Sector Mapped Allocation for Really Tiny (SMART) FLASH file system, SMARTFS. |
| |
| Features |
| ======== |
| |
| This implementation is a full-featured file system from the perspective of |
| file and directory access (i.e. not considering low-level details like the |
| lack of bad block management). The SMART File System was designed specifically |
| for small SPI-based FLASH parts (1-8 MByte, for example), though this is not |
| a limitation. It can be used with FLASH of any size and with any MTD device |
| by binding the device to the SMART MTD layer. It has been tested with devices |
| as large as 128 MByte (using a 2048 byte sector size with 65534 sectors). |
| The FS includes support for: |
| |
| - Multiple open files from different threads. |
| - Open for read/write access with seek capability. |
| - Appending to end of files in either write, append or read/write open modes. |
| - Directory support. |
| - Support for multiple mount points on a single volume / partition (see details |
| below). |
| - Selectable FLASH wear leveling algorithm. |
| - Selectable CRC-8 or CRC-16 error detection for sector data. |
| - Reduced RAM model for FLASH geometries with a large number of sectors (16K-64K). |
| |
| General operation |
| ================= |
| |
| The SMART File System divides the FLASH device or partition into equal |
| sized sectors which are allocated and "released" as needed to perform file |
| read/write and directory management operations. Sectors are then "chained" |
| together to build files and directories. The operations are split into two |
| layers: |
| |
| 1. The MTD block layer (nuttx/drivers/mtd/smart.c). This layer manages |
| all low-level FLASH access operations including sector allocations, |
| logical to physical sector mapping, erase operations, etc. |
| 2. The FS layer (nuttx/fs/smart/smartfs_smart.c). This layer manages |
| high-level file and directory creation, read/write, deletion, sector |
| chaining, etc. |
| |
| SMART MTD Block layer |
| ===================== |
| |
| The SMART MTD block layer divides the erase blocks of the FLASH device into |
| "sectors". Sectors have both physical and logical number assignments. |
| The physical sector number represents the actual offset from the beginning |
| of the device, while the logical sector number is assigned as needed. |
| A physical sector can have any logical sector assignment, and as files |
| are created, modified and destroyed, the logical sector number assignment |
| for a given physical sector will change over time. The logical sector |
| number is saved in the physical sector header as the first 2 bytes, and |
| the MTD layer maintains an in-memory map of the logical to physical mapping. |
| Only physical sectors that are in use will have a logical assignment. |
| |
| Also contained in the sector header are a flags byte and a sequence number. |
| When a sector is allocated, the COMMITTED flag will be "set" (changed from |
| erase state to non-erase state) to indicate the sector data is valid. When |
| a sector's data needs to be deleted, the RELEASED flag will be "set" to |
| indicate the sector is no longer in use. This is done because the erase |
| block containing the sector cannot necessarily be erased until all sectors |
| in that block have been "released". This allows sectors in the erase |
| block to remain active while others are inactive until a "garbage collection" |
| operation is needed on the volume to reclaim released sectors. |
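| |
| The flag handling described above can be sketched as follows. This is a |
| minimal illustration, not the driver's code: the bit positions and the |
| ``sketch_`` names are hypothetical, and the erased state is assumed to be |
| 0xFF (all bits one), so that "setting" a flag means clearing its bit. |
| |
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this sketch. */

#define SKETCH_ERASED_BYTE     0xffu  /* assumed erased state of a byte */
#define SKETCH_FLAG_COMMITTED  0x80u
#define SKETCH_FLAG_RELEASED   0x40u

/* Programming FLASH can only move bits away from the erased state, so a
 * flag is "set" by clearing its bit in the status byte.
 */

uint8_t sketch_set_flag(uint8_t status, uint8_t flag)
{
  return (uint8_t)(status & (uint8_t)~flag);
}

bool sketch_flag_is_set(uint8_t status, uint8_t flag)
{
  return (status & flag) == 0;
}

/* The status byte of a free sector is still in the erased state. */

bool sketch_sector_is_free(uint8_t status)
{
  return status == SKETCH_ERASED_BYTE;
}

/* An active sector has been committed but not yet released. */

bool sketch_sector_is_active(uint8_t status)
{
  return sketch_flag_is_set(status, SKETCH_FLAG_COMMITTED) &&
         !sketch_flag_is_set(status, SKETCH_FLAG_RELEASED);
}
```
| |
| Because both flags are recorded by clearing bits, a sector can go from |
| free to committed to released without an intervening erase. |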
| |
| The sequence number is used when a logical sector's data needs to be |
| updated with new information. When this happens, a new physical sector |
| will be allocated which has a duplicate logical sector number but a |
| higher sequence number. This allows maintaining flash consistency in the |
| event of a power failure by writing new data prior to releasing the old. |
| In the event of a power failure causing duplicate logical sector numbers, |
| the sector with the higher sequence number will win, and the older logical |
| sector will be released. |
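| |
| As a sketch of the rule above, the following code builds a logical to |
| physical map from a list of scanned sector headers and keeps the copy with |
| the higher sequence number when a logical sector number appears twice. The |
| structure and function names are hypothetical, only committed and |
| not-yet-released sectors are assumed to be in the list, and sequence number |
| wrap-around is deliberately ignored. |
| |
```c
#include <stdint.h>

#define SKETCH_UNMAPPED 0xffffu  /* erased value, never a valid sector */

/* Simplified view of the header fields named in the text. */

struct sketch_sector_hdr
{
  uint16_t logical;  /* logical sector number (0xffff if unassigned) */
  uint16_t seq;      /* sequence number */
};

/* Fill map[] (indexed by logical sector) with physical sector numbers.
 * Returns the number of stale duplicates found; those are the sectors
 * that would be released after a power failure.
 */

unsigned sketch_build_map(const struct sketch_sector_hdr *hdr,
                          uint16_t nphys, uint16_t *map, uint16_t nlogical)
{
  unsigned stale = 0;
  uint16_t i;

  for (i = 0; i < nlogical; i++)
    {
      map[i] = SKETCH_UNMAPPED;
    }

  for (i = 0; i < nphys; i++)
    {
      uint16_t l = hdr[i].logical;

      if (l >= nlogical)
        {
          continue;  /* unassigned or out of range */
        }

      if (map[l] == SKETCH_UNMAPPED)
        {
          map[l] = i;
        }
      else
        {
          stale++;
          if (hdr[i].seq > hdr[map[l]].seq)
            {
              map[l] = i;  /* the newer copy wins */
            }
        }
    }

  return stale;
}
```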
| |
| The SMART MTD block layer reserves some logical sector numbers for internal |
| use, including:: |
| |
|   Sector 0:     The Format Sector. Has a format signature, format version, etc. |
|                 Also contains wear leveling information if enabled. |
|   Sector 1-2:   Additional wear-leveling info storage if needed. |
|   Sector 3:     The 1st (or only) Root Directory entry |
|   Sector 4-10:  Additional root directories when Multi-Mount points are supported. |
|   Sector 11-12: Reserved |
| |
| To perform allocations, the SMART MTD block layer searches each erase block |
| on the device to identify the one with the most free sectors. Free sectors |
| are those that have all bytes in the "erased state", meaning they have not |
| been previously allocated/released since the last block erase. Not all |
| sectors on the device can be allocated: the SMART MTD block driver must |
| reserve at least one erase block's worth of unused sectors to perform |
| garbage collection, which will be performed automatically when no free |
| sectors are available. When wear leveling is enabled, the allocator also takes |
| into account the wear level of each erase block to keep wear even. |
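| |
| The block selection step can be sketched as below, assuming a hypothetical |
| per-block table of free sector counts. Wear leveling and the reserved |
| erase block are left out to keep the example short. |
| |
```c
#include <stdint.h>

/* Return the index of the erase block with the most free sectors, or -1
 * if no block has a free sector (the point at which garbage collection
 * would be needed).  free_count[] is a hypothetical per-block table.
 */

int sketch_pick_alloc_block(const uint8_t *free_count, int nblocks)
{
  int best = -1;
  int i;

  for (i = 0; i < nblocks; i++)
    {
      if (free_count[i] > 0 &&
          (best < 0 || free_count[i] > free_count[best]))
        {
          best = i;
        }
    }

  return best;
}
```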
| |
| Garbage collection is performed by identifying the erase block with the most |
| "released" sectors (those that were previously allocated but are no longer |
| in use) and moving all still-active sectors to a different erase block. Then |
| the now "vacant" erase block is erased, thus changing a group of released |
| sectors into free sectors. This may occur several times, depending on the |
| number of released sectors on the volume, so that better "wear leveling" |
| is achieved. |
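| |
| A minimal simulation of one garbage collection pass is sketched below. It |
| models each sector only by its state (free, active or released); the state |
| table and all ``sketch_`` names are hypothetical, and the copy of the sector |
| data itself is not modeled. |
| |
```c
#include <stdint.h>

enum sketch_state
{
  SK_FREE = 0,   /* erased, not allocated since the last block erase */
  SK_ACTIVE,     /* committed and still in use */
  SK_RELEASED    /* no longer in use, waiting for a block erase */
};

static int sketch_count(const uint8_t *state, int first, int n, int what)
{
  int count = 0;
  int i;

  for (i = first; i < first + n; i++)
    {
      count += (state[i] == what);
    }

  return count;
}

/* Pick the erase block holding the most released sectors; -1 if none. */

int sketch_pick_gc_block(const uint8_t *state, int nblocks, int per_block)
{
  int best = -1;
  int best_count = 0;
  int b;

  for (b = 0; b < nblocks; b++)
    {
      int count = sketch_count(state, b * per_block, per_block, SK_RELEASED);

      if (count > best_count)
        {
          best = b;
          best_count = count;
        }
    }

  return best;
}

/* Collect one block.  Returns the number of released sectors turned into
 * free sectors, or -1 if there is nothing to collect or no room elsewhere
 * for the still-active sectors.
 */

int sketch_collect(uint8_t *state, int nblocks, int per_block)
{
  int victim = sketch_pick_gc_block(state, nblocks, per_block);
  int total = nblocks * per_block;
  int first;
  int active;
  int released;
  int room;
  int i;

  if (victim < 0)
    {
      return -1;
    }

  first    = victim * per_block;
  active   = sketch_count(state, first, per_block, SK_ACTIVE);
  released = sketch_count(state, first, per_block, SK_RELEASED);
  room     = sketch_count(state, 0, total, SK_FREE) -
             sketch_count(state, first, per_block, SK_FREE);

  if (room < active)
    {
      return -1;
    }

  /* Move the active sectors into free sectors of other blocks. */

  for (i = 0; i < total && active > 0; i++)
    {
      if (i / per_block != victim && state[i] == SK_FREE)
        {
          state[i] = SK_ACTIVE;
          active--;
        }
    }

  /* Erase the vacated block: every sector in it becomes free again. */

  for (i = first; i < first + per_block; i++)
    {
      state[i] = SK_FREE;
    }

  return released;
}
```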
| |
| Standard MTD block layer functions are provided for block read, block write, |
| etc. so that system utilities such as the "dd" command can be used. |
| However, all SMART operations are performed using SMART-specific ioctl |
| codes to perform sector allocate, sector release, sector write, etc. |
| |
| Two config items that the SMART MTD layer can take advantage of |
| in the underlying MTD drivers are SUBSECTOR_ERASE and BYTE_WRITE. Most |
| flash devices have a 32K to 128K erase block size, but some of them |
| also have a smaller erase size available. Vendors have different names |
| for the smaller erase size; in the NuttX MTD layer it is called |
| SUBSECTOR_ERASE. For FLASH devices that support the smaller erase size, |
| this configuration item can be added to the underlying MTD driver, and |
| SMART will use it. As of the writing of this page, only the |
| drivers/mtd/m25px.c driver had support for SUBSECTOR_ERASE. |
| |
| The BYTE_WRITE config option enables use of the underlying MTD driver's |
| ability to write data a byte or a few bytes at a time vs. a full page |
| at a time (which is typically 256 bytes). For FLASH devices that support |
| byte write mode, support for this config item can be added to the MTD |
| driver. Enabling and supporting this feature reduces the traffic on the |
| SPI bus considerably because SMARTFS performs many operations that affect |
| only a few bytes on the device. Without BYTE_WRITE, the code must |
| perform a full page read-modify-write operation on a 256 or even 512 |
| byte page. |
| |
| Wear Leveling |
| ============= |
| |
| When wear leveling is enabled, the code automatically writes data across |
| the entire FLASH device in a manner that causes each erase block to be |
| worn (i.e. erased) evenly. This is accomplished by maintaining a 4-bit |
| wear level count for each erase block and forcing less worn blocks to be |
| used for writing new data. The code keeps the erase counts of all blocks |
| within 16 erases of each other, though in testing so far the span has |
| never been greater than 10 erases. |
| |
| As the data in a block is modified repeatedly, its erase count will |
| increase. When the wear level reaches a value of 8 or higher and the block |
| needs to be erased (because the data in it has been modified, etc.), the code |
| will select an erase block with the lowest wear count and relocate its data |
| to this block (with the higher wear count). The idea is that a block with |
| the lowest wear count contains more "static" data and should require fewer |
| additional erase operations. This relocation is repeated for the worn |
| block each time it needs to be erased again. |
| |
| When the wear level of all erase blocks has increased to a level of |
| SMART_WEAR_MIN_LEVEL (currently set to 5), then the wear level counts |
| will all be reduced by this value. This keeps the wear counts normalized |
| so they fit in a 4-bit value. Note that theoretically, it *IS* possible to |
| write data to the flash in a manner that causes the wear count of a single |
| erase block to increment beyond its maximum value of 15. This would have |
| to be a very specific and unpredictable write sequence though, |
| as data is always spread out across the sectors and relocated dynamically. |
| In the extremely rare event this does occur, the code will automatically |
| cap the maximum wear level at 15 and increment an "uneven wear count" |
| variable to indicate the number of times this event has occurred. So far, I |
| have not been able to get the wear count above 10 through my testing. |
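| |
| The bookkeeping described in this section can be sketched as follows. The |
| nibble packing order, the table layout and the ``sketch_`` names are |
| assumptions made for illustration; only the 4-bit width, the cap at 15, the |
| uneven wear count and the normalization by SMART_WEAR_MIN_LEVEL come from |
| the text. |
| |
```c
#include <stdint.h>

#define SKETCH_WEAR_MAX        15  /* largest value a 4-bit count holds */
#define SKETCH_WEAR_MIN_LEVEL  5   /* mirrors SMART_WEAR_MIN_LEVEL above */

/* Two 4-bit wear counts per byte, even blocks in the low nibble.  The
 * nibble order is an assumption of this sketch.
 */

uint8_t sketch_wear_get(const uint8_t *tbl, int block)
{
  uint8_t b = tbl[block / 2];

  return (block & 1) ? (uint8_t)(b >> 4) : (uint8_t)(b & 0x0f);
}

void sketch_wear_set(uint8_t *tbl, int block, uint8_t level)
{
  uint8_t *b = &tbl[block / 2];

  if (block & 1)
    {
      *b = (uint8_t)((*b & 0x0f) | ((level & 0x0f) << 4));
    }
  else
    {
      *b = (uint8_t)((*b & 0xf0) | (level & 0x0f));
    }
}

/* Record one erase of 'block'.  The count saturates at 15 and the
 * overflow is tallied in *uneven.  Once every block has reached the
 * minimum level, all counts are reduced by that amount.
 */

void sketch_wear_erase(uint8_t *tbl, int nblocks, int block,
                       unsigned *uneven)
{
  uint8_t level = sketch_wear_get(tbl, block);
  int i;

  if (level < SKETCH_WEAR_MAX)
    {
      sketch_wear_set(tbl, block, (uint8_t)(level + 1));
    }
  else
    {
      (*uneven)++;
    }

  for (i = 0; i < nblocks; i++)
    {
      if (sketch_wear_get(tbl, i) < SKETCH_WEAR_MIN_LEVEL)
        {
          return;  /* at least one block is still below the minimum */
        }
    }

  for (i = 0; i < nblocks; i++)
    {
      uint8_t cur = sketch_wear_get(tbl, i);

      sketch_wear_set(tbl, i, (uint8_t)(cur - SKETCH_WEAR_MIN_LEVEL));
    }
}
```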
| |
| The wear level status bits are saved in the format sector (logical sector |
| number zero) with overflow saved in the reserved logical sectors one and |
| two. Additionally, the uneven wear count (and total block erases if |
| PROCFS is enabled) are stored in the format sector. When the PROCFS file |
| system is enabled and a SMARTFS volume is mounted, the SMART block driver |
| details and / or wear level details can be viewed with a command such as:: |
| |
|   cat /proc/fs/smartfs/smart0/status |
|   Format version: 1 |
|   Name Len: 16 |
|   Total Sectors: 2048 |
|   Sector Size: 512 |
|   Format Sector: 1487 |
|   Dir Sector: 8 |
|   Free Sectors: 67 |
|   Released Sectors: 572 |
|   Unused Sectors: 817 |
|   Block Erases: 5680 |
|   Sectors Per Block: 8 |
|   Sector Utilization:98% |
|   Uneven Wear Count: 0 |
| |
|   cat /proc/fs/smartfs/smart0/erasemap |
|   DDDCGCCDDCDCCDCBDCCDDGBBDBCDCCDDDCDDDDCCDDCCCGCGDCCDBCDDGBDBDCDD |
|   BCCCDDCCDDDCBCCDGCCCBDDCCGBBCBCCGDCCDCBDBCCCDCDDCDDGCDCGDCBCDBDG |
|   BCDDCDCBGCCCDDCGBCCGBCCBDDBDDCGDCDDDCGCDDBCDCBDDBCDCGDDCCBCGBCCC |
|   GCBCCGCCCDDDBGCCCCGDCCCCCDCDDGBBDACABDBBABCAABCCCDAACBADADDDAECB |
| |
| Enabling wear leveling can increase the total number of block erases on the |
| device in favor of even wearing (erasing). This is caused by writing / |
| moving sectors that otherwise don't need to be written, in order to move |
| static data to the more highly worn blocks. This additional write requirement |
| is known as write amplification. To get an idea of the amount of write |
| amplification incurred by enabling wear leveling, I ran the smart_test |
| example using four different configurations (each combination of wear |
| leveling on/off and CRC-8 on/off); the results are shown below. This was |
| done on a 1 MByte simulated FLASH with a 4K erase block size and 512 bytes |
| per sector. The smart_test creates a 700K file and then performs 20,000 |
| random seek, write, verify tests. The seek/write forces a multitude of sector |
| relocation operations (with or without CRC enabled), causing a boatload of |
| block erases. |
| |
| In this test, enabling wear leveling actually decreased the number of erase |
| operations, with CRC enabled or disabled. This is only a single test point |
| based on one testing method; results will likely vary based on the method the data |
| is written, the amount of static vs. dynamic data, the amount of free space |
| on the volume, and the volume geometry (erase block size, sector size, etc.). |
| |
| The results of the tests are:: |
| |
|   Case                      Total Block erases |
|   ============================================ |
|   No wear leveling CRC-8    6632 |
|   Wear leveling CRC-8       5585 |
| |
|   No wear leveling no CRC   6658 |
|   Wear leveling no CRC      5398 |
| |
| Reduced RAM model |
| ================= |
| |
| On devices with a larger number of logical sectors (i.e. a lot of erase |
| blocks with a small selected sector size), the RAM requirement can become |
| fairly significant. This is caused by the in-memory sector map which |
| keeps track of the logical to physical mapping of all sectors. This is |
| a RAM array which is 2 * totalsectors bytes in size. For a device with 64K |
| sectors, this means 128K of RAM is required just for the sector map, not |
| counting RAM for read/write buffers, erase block management, etc. |
| |
| So a reduced RAM model has been added which only keeps track of which |
| logical sectors have been used (a table which is totalsectors / 8 bytes in size) |
| and a configurable sized sector map cache. Each entry in the sector map |
| cache is 6 bytes (logical sector, physical sector and cache entry age). |
| ON DEVICES WITH SMALLER TOTAL SECTOR COUNT, ENABLING THIS OPTION COULD |
| ACTUALLY INCREASE THE RAM FOOTPRINT INSTEAD OF REDUCE IT. |
| |
| The sector map cache size should be selected to balance the desired RAM |
| usage and the file system performance. When a logical to physical sector |
| mapping is not found in the cache, the code must perform a physical search |
| of the FLASH to find the requested logical sector. This involves reading |
| the 5-byte header from each sector on the device until the sector is |
| found. Performing a full read, seek or open for append on a large file |
| can cause the sector map cache to flush completely if the file is larger |
| than (cache entries * sector size). For example, in a configuration with |
| 256 cache entries and a 512 byte sector size, a full read, seek or open for |
| append on a 128K file will flush the cache. |
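| |
| The sizing trade-offs in this section reduce to a little arithmetic, |
| sketched below. The function names are hypothetical; the formulas are the |
| ones given in the text (2 bytes per sector for the full map, one bit per |
| sector plus 6 bytes per cache entry for the reduced model). Read/write |
| buffers and erase block management are not counted. |
| |
```c
#include <stdint.h>

#define SKETCH_CACHE_ENTRY_BYTES 6u  /* per-entry size given in the text */

/* Full model: one 16-bit physical sector number per logical sector. */

uint32_t sketch_full_map_bytes(uint32_t totalsectors)
{
  return 2u * totalsectors;
}

/* Reduced model: one bit per logical sector plus the sector map cache. */

uint32_t sketch_reduced_map_bytes(uint32_t totalsectors,
                                  uint32_t cache_entries)
{
  return totalsectors / 8u + SKETCH_CACHE_ENTRY_BYTES * cache_entries;
}

/* File size at which a full read, seek or open for append has touched
 * as many sectors as the cache has entries.
 */

uint32_t sketch_cache_span_bytes(uint32_t cache_entries,
                                 uint32_t sector_size)
{
  return cache_entries * sector_size;
}
```
| |
| With 64K sectors and 256 cache entries the reduced model needs 9728 bytes |
| instead of 131072. With only 512 sectors and the same cache it needs 1600 |
| bytes instead of 1024, which is the case the warning above refers to. |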
| |
| An additional RAM savings is realized on FLASH parts that contain 16 or |
| fewer logical sectors per erase block by packing the free and released |
| sector counts into a single byte (plus a little extra for 16 sectors per |
| erase block). A device with a 64K erase block size can benefit from this |
| savings by selecting a 4096 or 8192 byte logical sector size, for example. |
| |
| SMART FS Layer |
| ============== |
| |
| This layer interfaces with the SMART MTD block layer to allocate / release |
| logical sectors, create and destroy sector chains, and perform directory and |
| file I/O operations. Each directory and file on the volume is represented |
| as a chain or "linked list" of logical sectors. Thus the actual physical |
| sectors that a given file or directory uses do not need to be contiguous |
| and in fact can (and will) move around over time. To manage the sector |
| chains, the SMARTFS layer adds a "chain header" after the sector's "sector |
| header". This is a 5-byte header which contains the chain type (file or |
| directory), a "next logical sector" entry and the count of bytes actually |
| used within the sector. |
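| |
| A sketch of decoding the chain header and walking a chain is given below. |
| The field order follows the description above, but the little-endian |
| encoding, the 0xFFFF end-of-chain marker, the in-memory array of sector |
| images and all ``sketch_`` names are assumptions made for this example. |
| |
```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SECTOR_HDR_SIZE 5u       /* MTD-layer sector header */
#define SKETCH_END_OF_CHAIN    0xffffu  /* assumed end marker */

struct sketch_chain_hdr
{
  uint8_t  type;  /* file or directory */
  uint16_t next;  /* next logical sector in the chain */
  uint16_t used;  /* bytes used in this sector */
};

/* Decode the 5-byte chain header that follows the sector header. */

struct sketch_chain_hdr sketch_parse_chain_hdr(const uint8_t *sector)
{
  const uint8_t *p = sector + SKETCH_SECTOR_HDR_SIZE;
  struct sketch_chain_hdr h;

  h.type = p[0];
  h.next = (uint16_t)(p[1] | (p[2] << 8));
  h.used = (uint16_t)(p[3] | (p[4] << 8));
  return h;
}

/* Sum the used bytes along a chain.  'sectors' is a hypothetical array
 * of sector images indexed by logical sector number.  The hop limit
 * guards against a corrupted chain that loops.
 */

uint32_t sketch_chain_length(const uint8_t *sectors, size_t sector_size,
                             uint16_t nsectors, uint16_t first)
{
  uint32_t total = 0;
  uint16_t cur = first;
  uint16_t hops = 0;

  while (cur != SKETCH_END_OF_CHAIN && cur < nsectors && hops++ < nsectors)
    {
      struct sketch_chain_hdr h =
        sketch_parse_chain_hdr(sectors + (size_t)cur * sector_size);

      total += h.used;
      cur = h.next;
    }

  return total;
}
```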
| |
| Files are stored in directories, which are sector chains that have a |
| specific data format to track file names and "first" logical sector |
| numbers. Each file in the directory has a fixed-size "directory entry" |
| that has bits to indicate if it is still active or has been deleted, file |
| permission bits, first sector number, date (utc stamp), and filename. The |
| filename length is set from the CONFIG_SMARTFS_NAMLEN config value at the |
| time the mksmartfs command is executed. Changes to the |
| CONFIG_SMARTFS_NAMLEN parameter will not be reflected on the volume |
| unless it is reformatted. The same is true of the sector size parameter. |
| |
| Subdirectories are supported by creating a new sector chain (of type |
| directory) and creating a standard directory entry for it in its parent |
| directory. Then files and additional sub-directories can be added to |
| that directory chain. As such, each directory on the volume will occupy |
| a minimum of one sector on the device. Subdirectories can be deleted |
| only if they are "empty" (i.e. they reference no active entries). There |
| is no provision for performing a recursive directory delete. |
| |
| New files and subdirectories can be added to a directory without needing |
| to copy and release the original directory sector. This is done by |
| writing only the new entry data to the sector and ignoring the "bytes |
| used" field of the chain header for directories. Updating (modifying |
| existing data) or appending to a sector of a regular file requires copying |
| the file data to a new sector and releasing the old one. |
| |
| SMARTFS organization |
| ==================== |
| |
| The following example assumes 2 logical sectors per FLASH erase block. The |
| actual relationship is determined by the FLASH geometry reported by the MTD |
| driver:: |
| |
|   ERASE LOGICAL                     Sectors begin with a sector header. Sectors may |
|   BLOCK SECTOR        CONTENTS      be marked as "released," pending garbage collection |
|     n   2*n     --+---------------+ |
|        Sector Hdr |LLLLLLLLLLLLLLL| Logical sector number (2 bytes) |
|                   |QQQQQQQQQQQQQQQ| Sequence number (2 bytes) |
|                   |SSSSSSSSSSSSSSS| Status bits (1 byte) |
|                   +---------------+ |
|            FS Hdr |TTTTTTTTTTTTTTT| Sector Type (dir or file) (1 byte) |
|                   |NNNNNNNNNNNNNNN| Number of next logical sector in chain |
|                   |UUUUUUUUUUUUUUU| Number of bytes used in this sector |
|                   |               | |
|                   |               | |
|                   | (Sector Data) | |
|                   |               | |
|                   |               | |
|         2*n+1   --+---------------+ |
|        Sector Hdr |LLLLLLLLLLLLLLL| Logical sector number (2 bytes) |
|                   |QQQQQQQQQQQQQQQ| Sequence number (2 bytes) |
|                   |SSSSSSSSSSSSSSS| Status bits (1 byte) |
|                   +---------------+ |
|            FS Hdr |TTTTTTTTTTTTTTT| Sector Type (dir or file) (1 byte) |
|                   |NNNNNNNNNNNNNNN| Number of next logical sector in chain |
|                   |UUUUUUUUUUUUUUU| Number of bytes used in this sector |
|                   |               | |
|                   |               | |
|                   | (Sector Data) | |
|                   |               | |
|                   |               | |
|    n+1  2*(n+1) --+---------------+ |
|        Sector Hdr |LLLLLLLLLLLLLLL| Logical sector number (2 bytes) |
|                   |QQQQQQQQQQQQQQQ| Sequence number (2 bytes) |
|                   |SSSSSSSSSSSSSSS| Status bits (1 byte) |
|                   +---------------+ |
|            FS Hdr |TTTTTTTTTTTTTTT| Sector Type (dir or file) (1 byte) |
|                   |NNNNNNNNNNNNNNN| Number of next logical sector in chain |
|                   |UUUUUUUUUUUUUUU| Number of bytes used in this sector |
|                   |               | |
|                   |               | |
|                   | (Sector Data) | |
|                   |               | |
|                   |               | |
|                 --+---------------+ |
| |
| Headers |
| ======= |
| |
| ``SECTOR HEADER`` |
|   Each sector contains a header (currently 5 bytes) for identifying the |
|   status of the sector. The header contains the sector's logical sector |
|   number mapping, an incrementing sequence number to manage changes to |
|   logical sector data, and sector flags (committed, released, version, etc.). |
|   At the block level, there is no notion of sector chaining, only |
|   allocated sectors within erase blocks. |
| |
| ``FORMAT HEADER`` |
|   Contains information regarding the format on the volume, including |
|   a format signature, formatted block size, name length within the directory |
|   chains, etc. |
| |
| ``CHAIN HEADER`` |
|   The file system header (next 5 bytes) tracks file and directory sector |
|   chains and actual sector usage (number of bytes that are valid in the |
|   sector). Also indicates the type of chain (file or directory). |
| |
| Multiple Mount Points |
| ===================== |
| |
| Typically, a volume contains a single root directory entry (logical sector |
| number 3) and all files and subdirectories are "children" of that root |
| directory. This is a traditional scheme and allows the volume to |
| be mounted in a single location within the VFS. As a configuration |
| option, when the volume is formatted via the mksmartfs command, multiple |
| root directory entries can be created instead. The number of entries to |
| be created is an added parameter to the mksmartfs command in this |
| configuration. |
| |
| When this option has been enabled in the configuration and specified |
| during the format, then the volume will have multiple root directories |
| and can support a mount point in the VFS for each. In this mode, |
| the device entries reported in the /dev directory will have a directory |
| number postfixed to the name, such as:: |
| |
|   /dev/smart0d1 |
|   /dev/smart0d2 |
|   /dev/smart1p1d1 |
|   /dev/smart1p2d2 |
|   etc. |
| |
| Each device entry can then be mounted at different locations, such as:: |
| |
|   /dev/smart0d1 --> /usr |
|   /dev/smart0d2 --> /home |
|   etc. |
| |
| Using multiple mount points is slightly different from using partitions |
| on the volume in that each mount point has the potential to use the |
| entire space on the volume vs. having a pre-allocated reservation of |
| space defined by the partition sizes. Also, all files and directories |
| of all mount-points will be physically "mixed in" with data from the |
| other mount-points (though files from one will never logically "appear" |
| in the others). Each directory structure is isolated from the others, |
| they simply share the same physical media for storage. |
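| |
| The device naming pattern shown above can be reproduced with a small helper |
| such as the one below. The function name and its parameters are hypothetical |
| and exist only to illustrate the pattern; on a NuttX target the resulting |
| name would then typically be handed to ``mount()`` with the ``smartfs`` file |
| system type. |
| |
```c
#include <stdio.h>

/* Format a SMART device name following the pattern shown above:
 * /dev/smart<minor>[p<partition>]d<dirnum>.  Pass partition 0 when the
 * device is not partitioned.  Returns the snprintf() result.
 */

int sketch_smart_devname(char *buf, size_t len, int minor, int partition,
                         int dirnum)
{
  if (partition > 0)
    {
      return snprintf(buf, len, "/dev/smart%dp%dd%d",
                      minor, partition, dirnum);
    }

  return snprintf(buf, len, "/dev/smart%dd%d", minor, dirnum);
}
```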
| |
| SMARTFS Limitations |
| =================== |
| |
| This implementation has several limitations that you should be aware |
| of before opting to use SMARTFS: |
| |
| 1. There is currently no FLASH bad-block management code. The reason for |
| this is that the FS was geared for Serial NOR FLASH parts. To use |
| SMARTFS with a NAND FLASH, bad block management would need to be added, |
| along with a few minor changes to eliminate single bit writes to release |
| a sector, etc. |
| |
| 2. The implementation can support CRC-8 or CRC-16 error detection, and can |
| relocate a failed write operation to a new sector. However, with no bad |
| block management implementation, the code will continue its attempts at |
| using a failing block / sector, reducing efficiency and possibly successfully |
| saving data in a block with questionable integrity. |
| |
| 3. The released-sector garbage collection process occurs only during a write |
| when there are no free FLASH sectors. Thus, occasionally, file writing |
| may take a long time. This typically isn't noticeable unless the volume |
| is very full and multiple copy / erase cycles must be performed to |
| complete the garbage collection. |
| |
| 4. The total number of logical sectors on the device must be 65534 or less. |
| The number of logical sectors is based on the total device / partition |
| size and the selected sector size. For larger flash parts, a larger |
| sector size would need to be used to meet this requirement. Creating a |
| geometry which results in 65536 sectors (a 32MByte FLASH with 512 byte |
| logical sector, for example) will cause the code to automatically reduce |
| the total sector count to 65534, thus "wasting" the last two logical |
| sectors on the device (they will never be used). |
| |
| This restriction exists because: |
| |
| a. The logical sector number is a 16-bit field (i.e. 65535 is the max). |
| b. Logical sector number 65535 (0xFFFF) is reserved as this is typically |
| the "erased state" of the FLASH. |
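| |
| The sector count limit in item 4 can be sketched as follows. The function |
| names are hypothetical. The text only describes the 65536-sector case, so |
| how the driver treats larger counts is not covered here and the sketch |
| simply clamps. |
| |
```c
#include <stdint.h>

#define SKETCH_MAX_SECTORS 65534u  /* 0xffff is reserved (erased state) */

/* Usable logical sector count for a volume, clamped to the limit. */

uint32_t sketch_sector_count(uint32_t volume_bytes, uint32_t sector_size)
{
  uint32_t n = volume_bytes / sector_size;

  return n > SKETCH_MAX_SECTORS ? SKETCH_MAX_SECTORS : n;
}

/* Bytes at the end of the volume that are never used due to the clamp. */

uint32_t sketch_wasted_bytes(uint32_t volume_bytes, uint32_t sector_size)
{
  uint32_t n = volume_bytes / sector_size;

  return (n - sketch_sector_count(volume_bytes, sector_size)) * sector_size;
}
```
| |
| For the 32 MByte example with 512 byte sectors this gives 65534 usable |
| sectors and 1024 unused bytes. |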
| |
| ioctls |
| ====== |
| |
| ``BIOC_LLFORMAT`` |
|   Performs a SMART low-level format on the volume. This erases the volume |
|   and writes the FORMAT HEADER to the first physical sector on the volume. |
| |
| ``BIOC_GETFORMAT`` |
|   Returns information about the format found on the volume during the |
|   "scan" operation which is performed when the volume is mounted. |
| |
| ``BIOC_ALLOCSECT`` |
|   Allocates a logical sector on the device. |
| |
| ``BIOC_FREESECT`` |
|   Frees a logical sector that had been previously allocated. This |
|   causes the sector to be marked as "released" and possibly causes the |
|   erase block to be erased if it is the last active sector in |
|   its erase block. |
| |
| ``BIOC_READSECT`` |
|   Reads data from a logical sector. This uses a structure to identify |
|   the offset and count of data to be read. |
| |
| ``BIOC_WRITESECT`` |
|   Writes data to a logical sector. This uses a structure to identify |
|   the offset and count of data to be written. May cause a logical |
|   sector to be physically relocated and may cause garbage collection |
|   if needed when moving data to a new physical sector. |
| |
| Things to Do |
| ============ |
| |
| - Add file permission checking to open / read / write routines. |
| - Add reporting of actual FLASH usage for directories (each directory |
| occupies one or more physical sectors, yet the size is reported as |
| zero for directories). |