11.1 File-System Structure

„ File structure z Logical storage unit z Collection of related information „ resides on secondary storage (disks) „ File system organized into layers Chapter 11: „ File control block – storage structure consisting Chapter 11: of information about a file Implementing File Systems

Figure 11.2 A typical file-control block Figure 11.1 Layered File System 11.3 Chapter 11: Implementing File Systems In-Memory File-System Structures

11.1 File-System Structure 11.2 File-System Implementation 11.3 Directory Implementation 11.4 Allocation Methods 11.5 Free-Space Management 11.6 Efficiency and Performance 11.7 Recovery 11.8 Log-Structured File Systems 11.9 NFS Figure 11.3 (a) File open (b) File read

11.2 11.4 Virtual File Systems 11.4 Allocation Methods „ Virtual File Systems (VFS) provide an object-oriented way of implementing file systems. „ VFS allows the same system call interface (API) to be used for different types of An allocation method refers to how disk blocks are file systems. allocated for files: „ The API is to the VFS interface, rather than any specific type of file system.

„ 11.4.1 Contiguous allocation

„ 11.4.2 Linked allocation

„ 11.4.3 Indexed allocation

Figure 11.4 Schematic View of

11.5 11.7 11.3 Directory Implementation 11.4.1 Contiguous Allocation

„ Linear list of file names with pointer to the data blocks. „ Each file occupies a set of contiguous blocks on the disk z advantage: simple to program „ Advantages z disadvantage: time-consuming to execute z Simple – only require  starting location (block #) and „ Hash Table – linear list with hash data structure.  length (number of blocks) z Immediate access block i z advantage: decreases directory search time „ Difficulties z disadvantage: z Finding space for a new file  collisions – situations where two file names hash to the same location  first fit, best fit  external fragmentation  fixed size z File size is unknown while created „ Disadvantages z Wasteful of space (dynamic storage- allocation problem) z Files cannot grow

Figure 11.5 contiguous allocation of disk space

11.6 11.8 -Based Systems File-Allocation Table (FAT)

„ Many newer file systems (I.e. ) use a modified contiguous allocation scheme „ Extent-based file systems allocate disk blocks in extents z An extent is a contiguous block of disks z A file consists of one or more extents. „ Design z initially, a contiguous chunk of space is allocated z if not enough, another chunk (called extent) is added  the location of a file’s blocks is recorded as a location and a block, plus  a link to the first block of the next extent „ Issues z Internal fragmentation becomes a problem if the extents are too large z External fragmentation becomes a problem as extents of varying sizes are allocated and deallocated Figure 11.7 File-allocation table

11.9 11.11 11.4.2 Linked Allocation 11.4.3 Indexed Allocation

„Each file is a linked list of disk blocks: blocks may be scattered anywhere on disk „ Brings all pointers per file together into the index block. „ Logical view. block = pointer Example: 512 bytes / block = 508 bytes of data + 4 bytes of pointer

„Simple – need only starting address index table „Free-space management system – no waste of space „ Advantages „Disadvantages z solve external fragmentation z space require for the pointers z no size-declaration problem  unit of cluster (multiple blocks) z support direct access – one pointer for multiple blocks „ Disadvantages – fast disk access time z Internal fragmentation – internal fragmentation z greater pointer overhead than z slow access to block i linked allocation Figure 11.6 Linked allocation of disk space z low reliability due to crash z Need index table: how large? Figure 11.8 Indexed allocation of disk space „Examples: File-allocation table (FAT) – disk- space allocation used by MS-DOS and OS/2

11.10 11.12 Index Block Allocation 11.5 Free-Space Management

„ Index block is normally one disk block „ Bit vector (n blocks) „ Mapping from logical to physical in a file of unbounded length (block of 512 words) 01 2 n-1 „ Linked scheme – Link together several index blocks (no limit on size) … z small header giving file name and a set o the first disk-block addresses Block number calculation z The next address (the last word in the index block) is  nil (for a small file) (number of bits per word) * 0 ⇒ block[i] free  a pointer to another index block (for a large file) (number of 0-value words) + bit[i] = 1 ⇒ block[i] occupied „ Multilevel index offset of first 1 bit 678 z a first-level index block to point to a set of second-level index blocks „ Bit map requires extra space z a second-level index block point to the file blocks z Example: block size = 212 bytes disk size = 230 bytes (1 gigabyte) n = 230/212 = 218 bits (or 32K bytes) z Easy to get contiguous files M „ Linked list (free list): link together all the free disk blocks Two-level Index Example z Cannot get contiguous space easily z No waste of space „ Grouping outer-index „ Counting index table file

11.13 11.15 Index Block Allocation: Combined Scheme UNIX (4K bytes per block) Free-Space Management (Cont.)

„ Need to protect: z Pointer to free list „ keep first 15 pointers of the z Bit map index block in the file’s  Must be kept on disk z 12 pointers: direct blocks  Copy in memory and disk may differ z indirect block  Cannot allow for block[i] to have a situation where bit[i] = 1 in memory and bit[i] = 0 on disk z Solution:  Set bit[i] = 1 in disk  Allocate block[i]  Set bit[i] = 1 in memory

11.14 11.16 Linked Free Space List on Disk I/O Without a Unified Buffer Cache

11.17 11.19 11.6 Efficiency and Performance Unified Buffer Cache

„ Efficiency dependent on: „ A unified buffer cache uses the same page cache to cache both z disk allocation and directory algorithms memory-mapped pages and ordinary file system I/O z types of data kept in file’s directory entry „ Performance z disk cache – separate section of main memory for frequently used blocks z free-behind and read-ahead – techniques to optimize sequential access z improve PC performance by dedicating section of memory as virtual disk, or RAM disk „ Page cache z A page cache caches pages rather than disk blocks using virtual memory techniques z Memory-mapped I/O uses a page cache z Routine I/O through the file system uses the buffer (disk) cache z This leads to the following figure

11.18 11.20 11.7 Recovery 11.9 The Sun (NFS)

„ Consistency checking – compares data in directory structure with „ An implementation and a specification of a software system for data blocks on disk, and tries to fix inconsistencies accessing remote files across LANs (or WANs)

„ Use system programs to back up data from disk to another storage „ The implementation is part of the Solaris and SunOS operating device (floppy disk, magnetic tape, other magnetic disk, optical) systems running on Sun workstations using an unreliable datagram protocol (UDP/IP protocol and Ethernet „ Recover lost file or disk by restoring data from backup

11.21 11.23 11.8 Log Structured File Systems NFS (Cont.)

„ Interconnected workstations viewed as a set of independent machines with „ Log structured (or journaling) file systems record each update to independent file systems, which allows sharing among these file systems in a the file system as a transaction transparent manner z A remote directory is mounted over a local file system directory  The mounted directory looks like an integral subtree of the local file „ All transactions are written to a log system, replacing the subtree descending from the local directory z A transaction is considered committed once it is written to the z Specification of the remote directory for the mount operation is log nontransparent; the host name of the remote directory has to be provided  Files in the remote directory can then be accessed in a transparent manner z However, the file system may not yet be updated z Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory „ The transactions in the log are asynchronously written to the file „ NFS is designed to operate in a heterogeneous environment of different system machines, operating systems, and network architectures; the NFS specifications independent of these media z When the file system is modified, the transaction is removed „ This independence is achieved through the use of RPC primitives built on top of from the log an External Data Representation (XDR) protocol used between two implementation-independent interfaces „ If the file system crashes, all remaining transactions in the log must „ The NFS specification distinguishes between the services provided by a mount still be performed mechanism and the actual remote-file-access services

11.22 11.24 Three Independent File Systems NFS Protocol

„ Provides a set of remote procedure calls for remote file operations. The procedures support the following operations: z searching for a file within a directory z reading a set of directory entries z manipulating links and directories z accessing file attributes z reading and writing files „ NFS servers are stateless; each request has to provide a full set of arguments (NFS V4 is just coming available – very different, stateful) „ Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching) „ The NFS protocol does not provide concurrency-control mechanisms (a) mounts (b) Cascading mounts

11.25 11.27 Three Major Layers of NFS Architecture NFS Mount Protocol

„ Establishes initial logical connection between server and client „ UNIX file-system interface (based on the open, read, write, and „ Mount operation includes name of remote directory to be mounted and close calls, and file descriptors) name of server machine storing it „ Virtual File System (VFS) layer – distinguishes local files from z Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine remote ones, and local files are further distinguished according to their file-system types z Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to z The VFS activates file-system-specific operations to handle mount them local requests according to their file-system types „ Following a mount request that conforms to its export list, the server z Calls the NFS protocol procedures for remote requests returns a file handle—a key for further accesses „ File handle – a file-system identifier, and an inode number to identify „ NFS service layer – bottom layer of the architecture the mounted directory within the exported file system z Implements the NFS protocol „ The mount operation changes only the user’s view and does not affect the server side

11.26 11.28 Schematic View of NFS Architecture NFS Remote Operations

„ Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files) „ NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance „ File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes z Cached file blocks are used only if the corresponding cached attributes are up to date „ File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server „ Clients do not free delayed-write blocks until the server confirms that the data have been written to disk

11.29 11.31 NFS Path-Name Translation Summary

„ Performed by breaking the path into component names and Chapter 11: Implementing File Systems performing a separate NFS lookup call for every pair of component name and directory vnode 11.1 File-System Structure

„ To make lookup faster, a directory name lookup cache on the 11.2 File-System Implementation client’s side holds the vnodes for remote directory names 11.3 Directory Implementation 11.4 Allocation Methods 11.5 Free-Space Management 11.6 Efficiency and Performance 11.7 Recovery 11.8 Log-Structured File Systems 11.9 NFS

11.30 11.32