Journaled Soft-Updates

Marshall Kirk McKusick, Author and Consultant
Jeff Roberson, Consultant

ABSTRACT

This paper describes the work to add "journaling lite" to soft updates and its incorporation into the FreeBSD fast filesystem. Because soft updates prevent most inconsistencies, the journal need only track those inconsistencies that soft updates fails to address. Specifically, the journal contains the information needed to recover the block and inode resources that have been freed but whose freed status failed to make it to disk before a system failure. After a crash, a variant of the venerable fsck program runs through the journal to identify and free the lost resources. Only if an inconsistency between the log and filesystem is detected is it necessary to run fsck. The journal is tiny; 16Mb is usually enough, independent of filesystem size. Although journal processing needs to be done before restarting, the processing time is typically just a few seconds and in the worst case a minute. It is not necessary to build a new filesystem to use soft-updates journaling. The addition or deletion of soft-updates journaling to existing fast filesystems is done using the tunefs program.

1. Background and Introduction

The soft updates dependency tracking system was adopted by FreeBSD in 1998 as an alternative to the popular journaled-filesystem technique [Ganger & Patt, 1994; McKusick, Bostic, Karels, & Quarterman, 1996]. While the runtime performance and consistency guarantees of soft updates are comparable to journaled filesystems [Seltzer et al, 2000], it relies on an expensive and time-consuming background filesystem recovery operation after a crash [McKusick, 2002]. This paper outlines a method for eliminating the necessity of an expensive background or foreground whole-filesystem check operation through the use of a small journal which logs the only two inconsistencies possible in soft updates. The first is allocated but unreferenced blocks; the second is incorrectly high link counts. Incorrectly high link counts include unreferenced inodes that were being deleted and files that were unlinked but open [Ganger, McKusick, & Patt, 2000]. This journal allows a journal-analysis program to complete recovery in just a few seconds independent of filesystem size.

2. Compatibility with Other Implementations

Journaling is enabled via tunefs and only requires a few spare superblock fields and 16Mb of free blocks for the journal. These minimal requirements make it easily enabled on existing FreeBSD filesystems. The journal's filesystem blocks are placed in an inode named .sujournal in the root of the filesystem and filesystem flags are set such that older non-journaling kernels will trigger a full filesystem check upon mounting a previously journaled volume. When mounting a journaled filesystem, older kernels clear a flag indicating that journaling is being done so that when the filesystem is next encountered by a kernel that does journaling, it will know that the journal is invalid and will ensure that the filesystem is consistent and clear the journal before resuming use of the filesystem.

3. Journal Format

The journal is kept as a circular log of segments containing records which describe metadata operations. If the journal fills, the filesystem must complete enough operations to expire journal entries before allowing new operations. In practice, the journal almost never fills.

Each journal segment contains a unique sequence number and a timestamp which identifies the filesystem mount instance so old segments can be discarded during journal processing. Journal entries are aggregated into segments to minimize the number of writes to the journal. Each segment contains the last valid sequence number at the time it was written to allow fsck to recover the head and tail by scanning the entire journal. Segments are variably sized as some multiple of the disk block size and are written atomically to avoid read/modify/write cycles in running filesystems.

The journal analysis has been incorporated into the fsck program. This incorporation into the existing fsck program has several benefits. The existing startup scripts already call fsck to see if it needs to be run in foreground or background. For filesystems running with journaled soft updates, fsck can request to run in foreground and do the needed journaled operations before the filesystem is brought online. If the journal fails for some reason, it can instead report that a full fsck needs to be run as the traditional fallback. Thus, this new functionality can be introduced without any need for system administrators to change the way that they start up their systems. Finally, the invoking of fsck means that after the journal has been processed, it is possible for debugging purposes to fall through and run a complete check of the filesystem to ensure that the journal is working properly.

The journal entry size is 32 bytes, providing quite a dense representation allowing for 16 entries per-sector. The journal is created in a single area of the filesystem in as contiguous an allocation as is available. We considered spreading it out across cylinder groups to optimize locality for writes but it ended up being so small that this approach was not practical and would make scanning the entire journal during cleanup too slow.

The journal blocks are claimed by a named immutable inode. This approach allows user-level access to the journal for debugging and statistics gathering purposes as well as providing backwards compatibility with older kernels that do not support journaling. We have found that a journal size of 16Mb is sufficient in even the most tortuous and worst-case benchmarks. A 16Mb journal can cover over 500,000 namespace operations or 8Gb of outstanding allocations (assuming a standard 16Kb block size).

4. Modifications that Require Journaling

The next subsections describe the operations that must be journaled so that the information needed to clean up the filesystem is available to fsck.

4.1. Increased Link Count

A link count may be increased through a hard link or file creation. The link count is temporarily increased during a rename. Here, the operation is the same. The inode number, parent inode number, directory offset, and initial link count are all recorded in the journal. Soft updates guarantees that the inode link count will be increased and stable on disk prior to any directory write. The journal write must occur prior to the inode write that updates the link count and prior to the bitmap write that allocates the inode if it is newly allocated.

4.2. Decreased Link Count

The inode link count is decreased through unlink or rename. The inode number, parent inode, directory offset, and initial link count are all recorded in the journal. The deleted directory entry is guaranteed to be written before the link is adjusted down. As with increasing the link count, the journal write must happen prior to all other writes.

4.3. Unlink While Referenced

Unlinked yet referenced files pose a unique problem for journaled filesystems. In UNIX, an inode's storage is not reclaimed until after the final name is removed and the last reference is closed. Simply leaving the journal entry valid while waiting for applications to close their dangling references is untenable as it will easily exhaust journal space. A solution which scales to the total number of inodes in the filesystem is required. At least two approaches are possible: a replication of the inode allocation bitmap, or a linked list of inodes to be freed. We have chosen to use the linked-list approach.

In the linked-list case, which is employed by several filesystems (xfs, ext4, etc.), the superblock contains the inode number that serves as the head of a singly linked list of inodes to be freed, with each inode storing a pointer to the next inode in the list. The advantage of this approach is that at recovery time you need only examine a single pointer in the superblock which will already be in memory. The disadvantage is that you must keep an in-memory doubly-linked list so that you can rapidly remove an inode once it is unreferenced. This approach ingrains a filesystem-wide lock in the design and incurs non-local writes when maintaining the list. In practice we have found that unreferenced inodes occur rarely enough that this approach is not a bottleneck.

Removal from the list may be done lazily but must be completed prior to any re-use of the inode. Additions to the list must be stable prior to reclaiming journal space for the final unlink but otherwise may be delayed long enough to avoid needing the write at all if the file is quickly closed. Addition and removal involve only a single write to update the preceding pointer to the subsequent inode.

4.4. Change of Directory Offset

Any time a directory compaction moves an entry, a journal entry must be created indicating the old and new locations of the entry. The kernel does not know at the time of the move whether a remove will follow it, so at this time all offset changes are journaled. Without this information fsck would be unable to disambiguate multiple revisions of the same directory block.

5.1. Cylinder Group Rollbacks

Soft updates previously did not require any rollbacks of cylinder groups as they were always the first or last write in a group of changes. When a block or inode has been allocated but its journal record has not yet been written to disk, it is not safe to write the updated bitmaps and associated allocation information. The routines which write blocks with bmsafemap dependencies now roll back any allocations with unwritten journal operations.

5.2. Inode Rollbacks

The inode link count must be rolled back to the link count as it existed prior to any unwritten journal entries. Allowing it to grow beyond this count would not cause filesystem corruption but it would prohibit the journal recovery from adjusting the link count properly. Soft updates already prevents the link count from decreasing before the directory entry is removed as a premature decrement could cause filesystem corruption.
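
The inode rollback rule of Section 5.2 can be illustrated with a small model. The following Python sketch is not the FreeBSD implementation; the class and field names (LinkAddRecord, InMemoryInode, on_disk, nlink_for_disk_write) are hypothetical and chosen only for illustration. It assumes that journal records reach disk in the order they were created and models link-count increases only, using the "initial link count" field that Section 4.1 says each record carries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkAddRecord:
    """Hypothetical journal record for a link-count increase (Section 4.1)."""
    inode: int
    parent: int
    dir_offset: int
    initial_nlink: int     # link count before this operation was applied
    on_disk: bool = False  # set once the journal segment holding it is written


@dataclass
class InMemoryInode:
    """Hypothetical in-memory inode with its outstanding journal records."""
    number: int
    nlink: int = 0
    pending: List[LinkAddRecord] = field(default_factory=list)

    def add_link(self, parent: int, dir_offset: int) -> LinkAddRecord:
        # Record the pre-operation link count, then bump the in-memory count.
        rec = LinkAddRecord(self.number, parent, dir_offset, self.nlink)
        self.pending.append(rec)
        self.nlink += 1
        return rec

    def nlink_for_disk_write(self) -> int:
        """Link count that may appear in the on-disk inode right now.

        Roll back to the count that existed before the oldest journal
        record that has not yet been written; if every record is on
        disk, the in-memory count can be written unchanged.
        """
        for rec in self.pending:  # records are kept oldest first
            if not rec.on_disk:
                return rec.initial_nlink
        return self.nlink
```

With an inode that starts at a link count of 1 and receives two new links, the sketch writes 1 to disk until the first record is journaled, then 2, and finally 3 once both records are on disk, so the on-disk count never runs ahead of what the journal can describe.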
