International Journal of Trend in Research and Development, Volume 3(2), ISSN: 2394-9333, www.ijtrd.com

Survey of Cleaning Policies in Log Structured File Systems

Abhishek Marathe, Prof. Pravin Patil, Akshay Borse, Chetan Satpute and Gopal Heda
Computer Engineering Department, Pune Institute of Computer Technology, Pune, India

Abstract: Over the last decade, CPU speeds have increased dramatically while disk access times have improved only slowly. This trend is likely to continue, and it will cause more and more applications to become disk bound. To lessen the impact of this problem, a disk storage management technique called the log structured file system was devised, which uses the disk an order of magnitude more efficiently than conventional file systems. Segment cleaning policies have been a topic of research for over two decades, yet the most effective segment cleaning policy, cost-benefit, was invented in the late '80s. The cost-benefit policy takes only throughput into account, but there are other metrics that need to be considered. The purpose of this paper is therefore to compare the performance of log structured file systems, namely NILFS and F2FS, and to discuss the cleaning policies used in these file systems. After analyzing their performance and identifying their limitations, we conclude with several promising directions for future research.

Keywords: File-Systems Management, Garbage Collection, Storage.

I. INTRODUCTION

In a log structured file system, data is laid out on disk in an append-only manner. Since writes happen only at the cursor position, disk space has to be reclaimed through a garbage collection mechanism called segment cleaning. Segment cleaning has to be performed continuously in the background, unlike the defragmentation of conventional file systems, which is usually user invoked. Continuous segment cleaning is a necessity for a log structured file system to function, but at the same time it is a costly operation and requires intelligent policies and an efficient design.

A. New Implementation of Log Structured File Systems: NILFS

NILFS is a log structured file system implementation for the Linux kernel. NILFS supports continuous snapshotting; snapshots can be used to restore past states of the file system. NILFS is a B-tree based file system which uses the B-tree data structure for inode and file management. The cleaning policies used for segment cleaning are the Cost-Benefit and Greedy policies: background cleaning uses the Cost-Benefit policy, whereas foreground cleaning uses the Greedy policy.

A NILFS volume is equally divided into a number of segments, except for the super block (SB). A segment is the container of logs. Each log is composed of summary information blocks, payload blocks, and an optional super root block (SR).

Figure 1: NILFS Volume Structure

The payload blocks are organized per file, and each file consists of data blocks and B-tree node blocks:

Figure 2: NILFS Payload Block Structure

The organization of the blocks is recorded in the summary information blocks, which contain a header structure, per-file structures and per-block structures.

Figure 3: NILFS Block Organization

The logs include regular files, directory files, and several meta data files. The meta data files are used to maintain file system meta data. The current version of NILFS2 uses the following meta data files:

1. Inode file (ifile)
2. Checkpoint file (cpfile)
3. Segment usage file (sufile)
4. Data address translation file (DAT)

The following figure shows a typical organization of the logs:

Figure 4: NILFS Log Structure
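To make the log layout above concrete, the following is a minimal sketch of the header carried by a log's summary information blocks. The structure and all field names are hypothetical illustrations of the description above, not the actual NILFS2 on-disk format.

```c
#include <stdint.h>

/* Illustrative only -- not the real NILFS2 on-disk format. A log starts
 * with summary information (a header plus per-file and per-block
 * records), followed by payload blocks, and optionally ends with a
 * super root block. */
struct log_summary_header {
    uint64_t seq;      /* sequence number of this log */
    uint64_t ctime;    /* creation time of the log */
    uint32_t nblocks;  /* total number of blocks in the log */
    uint32_t flags;    /* e.g. marks whether a super root follows */
    /* per-file structures and per-block structures follow */
};
```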
B. Flash Friendly File System: F2FS

F2FS (Flash-Friendly File System) is a Linux file system designed to perform well on modern flash storage devices. The file system is built on the principles used in LFS, i.e. append-only logging. The considerations made while designing F2FS were:

1. Flash-friendly on-disk layout: F2FS employs three configurable units: segment, section and zone. It allocates storage blocks in the unit of segments from a number of individual zones. It performs cleaning in the unit of a section. These units are introduced to align with the underlying FTL's operational units and to avoid unnecessary (yet costly) data copying.

2. Cost-effective index structure: LFS writes data and index blocks to newly allocated free space. If a leaf data block is updated (and written elsewhere), its direct index block must be updated too; once the direct index block is written, its indirect index block must be updated in turn. Such recursive updates result in a chain of writes, creating the "wandering tree" problem. To attack this problem, a novel index table called the node address table is proposed (see the sketch at the end of this subsection).

3. Multi-head logging: F2FS uses an effective hot/cold data separation scheme applied during logging time (i.e., block allocation time). It runs multiple active log segments concurrently and appends data and metadata to separate log segments based on their anticipated update frequency. Since flash storage devices exploit media parallelism, multiple active segments can run simultaneously without frequent management operations, making the performance degradation due to multi-segment logging (vs. single-segment logging) insignificant.

4. Adaptive logging: F2FS builds basically on append-only logging to turn random writes into sequential ones. At high storage utilization, however, it changes the logging strategy to threaded logging to avoid long write latency. In essence, threaded logging writes new data into the free space of a dirty segment without cleaning it in the foreground. This strategy works well on modern flash devices but may not do so on HDDs.

5. fsync acceleration with roll-forward recovery: F2FS optimizes small synchronous writes to reduce the latency of fsync requests, by minimizing the required metadata writes and recovering synchronized data with an efficient roll-forward mechanism.

Design of the Flash-Friendly File System (F2FS): The on-disk data structures of F2FS are carefully laid out to match how the underlying NAND is organized and managed. As illustrated in Figure 5, F2FS divides the whole volume into fixed-size segments. The segment is the basic unit of management in F2FS and is used to determine the initial file system metadata layout. A section is comprised of consecutive segments, and a zone consists of a series of sections. These units are important during logging and cleaning. F2FS splits the entire volume into six parts as follows:

1. Superblock (SB): The superblock has basic partition information and the default parameters of F2FS, which are given at format time and are not changeable.

2. Checkpoint (CP): The checkpoint keeps the file system status, bitmaps for the valid NAT/SIT sets (see below), orphan inode lists and summary entries of currently active segments. A successful "checkpoint pack" stores a consistent F2FS status at a given point in time, serving as a recovery point after a sudden power-off event. The CP area stores two checkpoint packs alternately across two segments (0 and 1): one for the last stable version and the other for the intermediate (obsolete) version.

3. Segment Information Table (SIT): The SIT contains per-segment information such as the number of valid blocks and a bitmap for the validity of all blocks in the "Main" area (see below). The SIT information is retrieved to select victim segments and to identify their valid blocks during the cleaning process.

4. Node Address Table (NAT): The NAT is a block address table used to locate all the "node blocks" stored in the Main area.

5. Segment Summary Area (SSA): The SSA stores summary entries representing the owner information of all blocks in the Main area, such as the parent inode number and its node/data offsets. The SSA entries identify parent node blocks before valid blocks are migrated during cleaning.

6. Main Area: The Main area is filled with 4KB blocks. Each block is allocated and typed as either node or data. A node block contains an inode or indices of data blocks, while a data block contains either directory or user file data. Note that a section does not store data and node blocks simultaneously.

Figure 5: On-Disk Layout of F2FS

Figure 6: File Structure of F2FS
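The following minimal sketch shows how a node address table of the kind described in design consideration 2 and part 4 above stops the wandering-tree update chain. All names and sizes are assumptions for illustration, not the actual F2FS implementation.

```c
#include <stdint.h>

#define NR_NODES 1024

/* Node Address Table: maps a stable node id to the node block's
 * current address in the Main area. Names here are illustrative. */
static uint32_t nat[NR_NODES];

/* Index blocks reference children by node id, not by block address. */
static uint32_t nat_lookup(uint32_t nid)
{
    return nat[nid];
}

/* When a node block is rewritten to a new location, only its NAT entry
 * changes. Parents that name the node by id need no rewrite, so the
 * recursive chain of index-block updates (the "wandering tree") stops
 * at the NAT. */
static void node_relocated(uint32_t nid, uint32_t new_blkaddr)
{
    nat[nid] = new_blkaddr;
}
```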

II. PROBLEM DEFINITION

The problem addressed is a comparison between the performance of the cleaning policies of the two log structured file systems, NILFS and F2FS. Throughput is taken as the metric of comparison, and the compile-bench tool [3] is used to measure it.

III. SURVEY WITH CRITICAL ANALYSIS

A. Cleaning in F2FS:

Cleaning is the process that reclaims scattered and invalidated blocks and secures free segments for further logging. Because cleaning occurs constantly once the underlying storage capacity has been filled up, limiting the costs related to cleaning is extremely important for the sustained performance of F2FS. In F2FS, cleaning is done in the unit of a section. F2FS performs cleaning in two distinct manners, foreground and background. Foreground cleaning is triggered only when there are not enough free sections, while a kernel thread wakes up periodically to conduct cleaning in the background. F2FS uses the greedy strategy for foreground cleaning and the cost-benefit policy for background cleaning.
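As a minimal sketch of this split, the dispatch might look as follows. The names and the threshold are assumptions for illustration, not the actual f2fs kernel code.

```c
/* Illustrative sketch of the two cleaning modes described above. */
enum gc_type { FG_GC, BG_GC };

/* Foreground cleaning runs on the write path, but only when free
 * sections run short. */
static int need_fg_gc(unsigned int free_secs, unsigned int reserved_secs)
{
    return free_secs <= reserved_secs;
}

/* The background cleaner is a kernel thread that wakes up periodically
 * and cleans only when no normal I/O or foreground cleaning is in
 * progress, so it never competes with applications. */
static enum gc_type choose_gc_mode(unsigned int free_secs,
                                   unsigned int reserved_secs)
{
    return need_fg_gc(free_secs, reserved_secs) ? FG_GC : BG_GC;
}
```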

A whole cleaning process in F2FS can be described as follows:

1. Victim Selection: The cleaning process first identifies a victim section among the non-empty sections. There are two well-known policies for victim selection during F2FS cleaning: greedy and cost-benefit (both are sketched in code after these steps). The greedy policy selects the section with the smallest number of valid blocks, which intuitively controls the overhead of migrating valid blocks. F2FS adopts the greedy policy for its foreground cleaning to minimize the latency visible to applications. Moreover, F2FS reserves a small unused capacity (5% of the storage space by default) so that the cleaning process has room for adequate operation at high storage utilization levels. The cost-benefit policy, on the other hand, is practiced in the background cleaning process of F2FS. This policy selects a victim section based not only on its utilization but also on its "age". F2FS infers the age of a section by averaging the ages of the segments in the section, which in turn are obtained from their last modification times recorded in the SIT. With the cost-benefit policy, F2FS gets another chance to separate hot and cold data.

2. Valid block identification and migration: After selecting a victim section, F2FS must identify the valid blocks in the section quickly. To this end, F2FS maintains a validity bitmap per segment in the SIT. Once all valid blocks have been identified by scanning the bitmaps, F2FS retrieves the parent node blocks containing their indices from the SSA information. If the blocks are valid, F2FS migrates them to other free logs. For background cleaning, F2FS does not issue actual I/Os to migrate valid blocks. Instead, F2FS loads the blocks into the page cache and marks them as dirty, then leaves them in the page cache for the kernel worker thread to flush to storage later. This lazy migration not only alleviates the performance impact on foreground I/O activities, but also allows small writes to be combined. Background cleaning does not kick in when normal I/O or foreground cleaning is in progress.

3. Post-cleaning process: After all valid blocks have been migrated, the victim section is registered as a candidate to become a new free section (called a "pre-free" section in F2FS). Only after a checkpoint is made does the section finally become a free section eligible for reallocation. This is because, if a pre-free section were reused before checkpointing, the file system could lose data referenced by a previous checkpoint when an unexpected power outage occurs.
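The two victim-selection policies can be made concrete with a short sketch. The cost-benefit score follows the formula from the original LFS paper [4], benefit/cost = (1 - u) * age / (1 + u), where u is the fraction of live blocks in the segment; the data structures and function names here are illustrative, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-segment bookkeeping, as kept in structures like the SIT or
 * SUFILE (illustrative layout). */
struct seg_info {
    uint32_t valid_blocks;  /* live blocks still referenced */
    uint32_t total_blocks;  /* capacity of the segment */
    uint64_t mtime;         /* last modification time */
};

/* Greedy: pick the segment with the fewest valid blocks, minimizing
 * the migration work done in the latency-critical foreground path. */
static size_t pick_greedy(const struct seg_info *s, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (s[i].valid_blocks < s[best].valid_blocks)
            best = i;
    return best;
}

/* Cost-benefit: maximize (1 - u) * age / (1 + u). Reading the segment
 * costs 1, writing back the live fraction costs u, and (1 - u) space
 * is reclaimed; weighting by age favors cold segments, which is how
 * the policy separates hot and cold data. */
static size_t pick_cost_benefit(const struct seg_info *s, size_t n,
                                uint64_t now)
{
    size_t best = 0;
    double best_score = -1.0;
    for (size_t i = 0; i < n; i++) {
        double u = (double)s[i].valid_blocks / (double)s[i].total_blocks;
        double age = (double)(now - s[i].mtime);
        double score = (1.0 - u) * age / (1.0 + u);
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```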
B. Cleaning in NILFS:

One of the biggest performance problems of NILFS is its inefficient timestamp GC policy. A patch set for NILFS introduces two new GC policies, namely Cost-Benefit and Greedy. The Cost-Benefit policy is nothing new; it has been around for a long time with log-structured file systems. However, it relies on accurate information about the number of live blocks in a segment, which NILFS currently does not provide. The patch set therefore extends the entries in the SUFILE with a counter for the number of live blocks, which is decremented whenever a file is deleted or overwritten. Apart from some tricky parts, counting live blocks is quite trivial. The real problem is snapshots: at any time, a checkpoint can be turned into a snapshot or vice versa, so blocks that are reclaimable at one point in time may be protected by a snapshot a moment later. The patch set does not attempt to track this exactly; instead it uses a heuristic approach to prevent the worst case scenario. The performance is still significantly better than the timestamp policy in the reported benchmarks.

The worst case scenario is the following:

1. Segment 1 is written.
2. A snapshot is created.
3. The GC tries to reclaim Segment 1, but all of its blocks are protected by the snapshot. The GC has to set the number of live blocks to the maximum to avoid reclaiming this segment again in the near future.
4. The snapshot is deleted.
5. Segment 1 is reclaimable, but its counter is so high that the GC will never try to reclaim it again.

To prevent this kind of starvation, another field in the SUFILE entry is used to store the number of blocks that are protected by a snapshot. This value is just a heuristic and is usually 0; it is written to the SUFILE entry only when the GC attempts to reclaim a segment. The GC has to check for snapshots anyway, so this information comes for free. By storing it in the SUFILE, starvation can be avoided in the following way (a code sketch of the key steps follows the list):

1. Segment 1 is written.
2. A snapshot is created.
3. The GC tries to reclaim Segment 1, but all of its blocks are protected by the snapshot. The GC has to set the number of live blocks to the maximum to avoid reclaiming this segment again in the near future.
4. The GC records the number of snapshot-protected blocks of Segment 1 in its SUFILE entry.
5. The snapshot is deleted.
6. On snapshot deletion, every entry in the SUFILE is walked, and the number of live blocks is reduced to half the maximum wherever the number of snapshot blocks is bigger than half the maximum.
7. Segment 1 is reclaimable and its live-block counter is at half the maximum, so the GC will try to reclaim this segment as soon as there are no better choices.
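A minimal sketch of steps 4 and 6 above, assuming a SUFILE entry layout with the two counters described; the names and the segment capacity are illustrative, not the actual NILFS2 format.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS_PER_SEGMENT 2048u  /* assumed segment capacity */

/* Extended SUFILE entry, per the patch description above. */
struct sufile_entry {
    uint32_t nlive_blocks;     /* decremented on delete/overwrite */
    uint32_t nsnapshot_blocks; /* blocks found snapshot-protected the
                                * last time the GC scanned the segment;
                                * usually 0 */
};

/* Step 4: the GC found the segment fully protected, so it maxes out
 * the live-block counter and records how many blocks the snapshot
 * protects. */
static void gc_mark_protected(struct sufile_entry *e, uint32_t nprotected)
{
    e->nlive_blocks = MAX_BLOCKS_PER_SEGMENT;
    e->nsnapshot_blocks = nprotected;
}

/* Step 6: on snapshot deletion, walk all SUFILE entries and cut the
 * inflated live-block counters down to half the maximum, making those
 * segments eligible for reclamation again. */
static void on_snapshot_delete(struct sufile_entry *e, size_t nsegments)
{
    for (size_t i = 0; i < nsegments; i++) {
        if (e[i].nsnapshot_blocks > MAX_BLOCKS_PER_SEGMENT / 2)
            e[i].nlive_blocks = MAX_BLOCKS_PER_SEGMENT / 2;
    }
}
```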

As discussed above, the performance of these file systems depends largely on their cleaning policies. The graphical statistics of the performance of the cleaning policies, measured with compile-bench in terms of throughput, are shown below:

Figure 7: Time Analysis

Based on these results, the following graph has been created to compare the two file systems:

Figure 8: User Time Analysis

CONCLUSION

The survey conducted on NILFS and F2FS on the basis of throughput shows that the performance of NILFS is considerably slower than that of F2FS. This is because NILFS maintains snapshots, which increase the time needed for updates under its cleaning policy, whereas F2FS has no such structure and therefore achieves higher performance.

References

[1] Changman Lee, Dongho Sim, Joo-Young Hwang and Sangyeun Cho, "F2FS: A New File System for Flash Storage".
[2] Yuanting Wei and Dongkun Shin, School of ICE, Sungkyunkwan University, Suwon, South Korea, "NAND Flash Storage Device Performance in Linux File System".
[3] Compilebench, https://oss.oracle.com/~mason/compilebench/
[4] Mendel Rosenblum and John K. Ousterhout, University of California at Berkeley, "The Design and Implementation of a Log-Structured File System".
