On disk, each area is prefixed with the following header:
/** On-disk representation of an area header. */ struct nffs_disk_area { uint32_t nda_magic[4]; /* NFFS_AREA_MAGIC{0,1,2,3} */ uint32_t nda_length; /* Total size of area, in bytes. */ uint8_t nda_ver; /* Current nffs version: 0 */ uint8_t nda_gc_seq; /* Garbage collection count. */ uint8_t reserved8; uint8_t nda_id; /* 0xff if scratch area. */ };
Beyond its header, an area contains a sequence of disk objects, representing the contents of the file system. There are two types of objects: inodes and data blocks. An inode represents a file or directory; a data block represents part of a file's contents.
/** On-disk representation of an inode (file or directory). */ struct nffs_disk_inode { uint32_t ndi_magic; /* NFFS_INODE_MAGIC */ uint32_t ndi_id; /* Unique object ID. */ uint32_t ndi_seq; /* Sequence number; greater supersedes lesser. */ uint32_t ndi_parent_id; /* Object ID of parent directory inode. */ uint8_t reserved8; uint8_t ndi_filename_len; /* Length of filename, in bytes. */ uint16_t ndi_crc16; /* Covers rest of header and filename. */ /* Followed by filename. */ };
An inode filename's length cannot exceed 256 bytes. The filename is not null-terminated. The following ASCII characters are not allowed in a filename:
/** On-disk representation of a data block. */ struct nffs_disk_block { uint32_t ndb_magic; /* NFFS_BLOCK_MAGIC */ uint32_t ndb_id; /* Unique object ID. */ uint32_t ndb_seq; /* Sequence number; greater supersedes lesser. */ uint32_t ndb_inode_id; /* Object ID of owning inode. */ uint32_t ndb_prev_id; /* Object ID of previous block in file; NFFS_ID_NONE if this is the first block. */ uint16_t ndb_data_len; /* Length of data contents, in bytes. */ uint16_t ndb_crc16; /* Covers rest of header and data. */ /* Followed by 'ndb_data_len' bytes of data. */ };
Each data block contains the ID of the previous data block in the file. Together, the set of blocks in a file form a reverse singly-linked list.
The maximum number of data bytes that a block can contain is determined at initialization-time. The result is the greatest number which satisfies all of the following restrictions:
The 2048 number was chosen somewhat arbitrarily, and may change in the future.
All disk objects have a unique 32-bit ID. The ID space is partitioned as follows:
ID range | Node type |
---|---|
0x00000000 - 0x0fffffff | Directory inodes |
0x10000000 - 0x7fffffff | File inodes |
0x80000000 - 0xfffffffe | Data blocks |
0xffffffff | Reserved (NFFS_ID_NONE) |
A valid nffs file system must contain a single “scratch area.” The scratch area does not contain any objects of its own, and is only used during garbage collection. The scratch area must have a size greater than or equal to each of the other areas in flash.
Every object in the file system is stored in a 256-entry hash table. An object's hash key is derived from its 32-bit ID. Each list in the hash table is sorted by time of use; most-recently-used is at the front of the list. All objects are represented by the following structure:
/** * What gets stored in the hash table. Each entry represents a data block or * an inode. */ struct nffs_hash_entry { SLIST_ENTRY(nffs_hash_entry) nhe_next; uint32_t nhe_id; /* 0 - 0x7fffffff if inode; else if block. */ uint32_t nhe_flash_loc; /* Upper-byte = area idx; rest = area offset. */ };
For each data block, the above structure is all that is stored in RAM. To acquire more information about a data block, the block header must be read from flash.
Inodes require a fuller RAM representation to capture the structure of the file system. There are two types of inodes: files and directories. Each inode hash entry is actually an instance of the following structure:
/** Each inode hash entry is actually one of these. */ struct nffs_inode_entry { struct nffs_hash_entry nie_hash_entry; SLIST_ENTRY(nffs_inode_entry) nie_sibling_next; union { struct nffs_inode_list nie_child_list; /* If directory */ struct nffs_hash_entry *nie_last_block_entry; /* If file */ }; uint8_t nie_refcnt; };
A directory inode contains a list of its child files and directories (fie_child_list). These entries are sorted alphabetically using the ASCII character set.
A file inode contains a pointer to the last data block in the file (nie_last_block_entry). For most file operations, the reversed block list must be walked backwards. This introduces a number of speed inefficiencies:
Furthermore, obtaining information about any constituent data block requires a separate flash read.
The speed issues are addressed by a pair of caches. Cached inodes entries contain the file length and a much more convenient doubly-linked list of cached data blocks. The benefit of using caches is that the size of the caches need not be proportional to the size of the file system. In other words, caches can address speed efficiency concerns without negatively impacting the file system's scalability.
nffs requires both caches during normal operation, so it is not possible to disable them. However, the cache sizes are configurable, and both caches can be configured with a size of one if RAM usage must be minimized.
The following data structures are used in the inode and data block caches.
/** Full data block representation; not stored permanently in RAM. */ struct nffs_block { struct nffs_hash_entry *nb_hash_entry; /* Points to real block entry. */ uint32_t nb_seq; /* Sequence number; greater supersedes lesser. */ struct nffs_inode_entry *nb_inode_entry; /* Owning inode. */ struct nffs_hash_entry *nb_prev; /* Previous block in file. */ uint16_t nb_data_len; /* # of data bytes in block. */ uint16_t reserved16; };
/** Represents a single cached data block. */ struct nffs_cache_block { TAILQ_ENTRY(nffs_cache_block) ncb_link; /* Next / prev cached block. */ struct nffs_block ncb_block; /* Full data block. */ uint32_t ncb_file_offset; /* File offset of this block. */ };
/** Full inode representation; not stored permanently in RAM. */ struct nffs_inode { struct nffs_inode_entry *ni_inode_entry; /* Points to real inode entry. */ uint32_t ni_seq; /* Sequence number; greater supersedes lesser. */ struct nffs_inode_entry *ni_parent; /* Points to parent directory. */ uint8_t ni_filename_len; /* # chars in filename. */ uint8_t ni_filename[NFFS_SHORT_FILENAME_LEN]; /* First 3 bytes. */ };
/** Doubly-linked tail queue of cached blocks; contained in cached inodes. */ TAILQ_HEAD(nffs_block_cache_list, nffs_block_cache_entry); /** Represents a single cached file inode. */ struct nffs_cache_inode { TAILQ_ENTRY(nffs_cache_inode) nci_link; /* Sorted; LRU at tail. */ struct nffs_inode nci_inode; /* Full inode. */ struct nffs_cache_block_list nci_block_list; /* List of cached blocks. */ uint32_t nci_file_size; /* Total file size. */ };
Only file inodes are cached; directory inodes are never cached.
Within a cached inode, all cached data blocks are contiguous. E.g., if the start and end of a file are cached, then the middle must also be cached. A data block is only cached if its owning file is also cached.
Internally, cached inodes are stored in a singly-linked list, ordered by time of use. The most-recently-used entry is the first element in the list. If a new inode needs to be cached, but the inode cache is full, the least-recently-used entry is freed to make room for the new one. The following operations cause an inode to be cached:
The following operations cause a data block to be cached:
If one of the above operations is applied to a data block that is not currently cached, nffs uses the following procedure to cache the necessary block:
If the system is unable to allocate a cached block entry at any point during the above procedure, the system frees up other blocks currently in the cache. This is accomplished as follows:
Because the system imposes a minimum block cache size of one, the above procedure will always reclaim at least one cache block entry. The above procedure may result in the freeing of the block list that belongs to the very inode being operated on. This is OK, as the final block to get cached is always the block being requested.
The file system detection process consists of scanning a specified set of flash regions for valid nffs areas, and then populating the RAM representation of the file system with the detected objects. Detection is initiated with the nffs_detect() function.
Not every area descriptor passed to nffs_detect()
needs to reference a valid nffs area. Detection is successful as long as a complete file system is detected somewhere in the specified regions of flash. If an application is unsure where a file system might be located, it can initiate detection across the entire flash region.
A detected file system is valid if:
During detection, each indicated region of flash is checked for a valid area header. The contents of each valid non-scratch area are then restored into the nffs RAM representation. The following procedure is applied to each object in the area:
If nffs encounters an object that cannot be identified (i.e., its magic number is not valid), it scans the remainder of the flash area for the next valid magic number. Upon encountering a valid object, nffs resumes the procedure described above.
After all areas have been restored, a sweep is performed across the entire RAM representation so that invalid inodes can be deleted from memory.
For each directory inode:
For each file inode:
When an object is deleted during this sweep, it is only deleted from the RAM representation; nothing is written to disk.
When objects are migrated to the lost+found directory, their parent inode reference is permanently updated on the disk.
In addition, a single scratch area is identified during the detection process. The first area whose fda_id value is set to 0xff is designated as the file system scratch area. If no valid scratch area is found, the cause could be that the system was restarted while a garbage collection cycle was in progress. Such a condition is identified by the presence of two areas with the same ID. In such a case, the shorter of the two areas is erased and designated as the scratch area.
A new nffs file system is created via formatting. Formatting is achieved via the nffs_format() function.
During a successful format, an area header is written to each of the specified locations. One of the areas in the set is designated as the initial scratch area.
The nffs implementation always writes in a strictly sequential fashion within an area. For each area, the system keeps track of the current offset. Whenever an object gets written to an area, it gets written to that area‘s current offset, and the offset is increased by the object’s disk size.
When a write needs to be performed, the nffs implementation selects the appropriate destination area by iterating though each area until one with sufficient free space is encountered.
There is no write buffering. Each call to a write function results in a write operation being sent to the flash hardware.
Whenever a new object is written to disk, it is assigned the following properties:
When a new file or directory is created, a corresponding inode is written to flash. Likewise, a new data block also results in the writing of a corresponding disk object.
When a file or directory is moved or renamed, its corresponding inode is rewritten to flash with the following properties:
Because the inode's ID is unchanged, all dependent objects remain valid.
When a file or directory is unlinked from its parent directory, a deletion record for the unlinked inode gets written to flash. The deletion record is an inode with the following properties:
When an inode is unlinked, no deletion records need to be written for the inode's dependent objects (constituent data blocks or child inodes). During the next file system detection, it is recognized that the objects belong to a deleted inode, so they are not restored into the RAM representation.
If a file has an open handle at the time it gets unlinked, application code can continued to use the file handle to read and write data. All files retain a reference count, and a file isn't deleted from the RAM representation until its reference code drops to 0. Any attempt to open an unlinked file fails, even if the file is referenced by other file handles.
The following procedure is used whenever the application code writes to a file. First, if the write operation specifies too much data to fit into a single block, the operation is split into several separate write operations. Then, for each write operation:
Appended data can only be written to the end of the file. That is, “holes” are not supported.
When the file system is too full to accommodate a write operation, the system must perform garbage collection to make room. The garbage collection procedure is described below: