CS162: Operating Systems and Systems Programming
Lecture 21: Filesystems 3: Filesystem Case Studies (Con't), Buffering, Reliability, and Transactions
November 9th, 2020
Prof. John Kubiatowicz
http://cs162.eecs.Berkeley.edu

Recall: Components of a File System
• File path → directory structure → file number ("inumber") → file header structure (inode) → data blocks
• One block = multiple sectors (ex: 512-byte sector, 4K block)


Recall: FAT Properties
• File is a collection of disk blocks (called "clusters" in FAT terminology)
• FAT is an array of integers mapped 1-1 with disk blocks
 – Each integer is either:
  » Pointer to next block in file; or
  » End of file flag; or
  » Free block flag
• File Number is the index of the root of the block list for the file
 – Follow the list to get each block #
 – Directory must map name to the block number at the start of the file
• But: Where is the FAT stored?
 – Beginning of disk, before the data blocks
 – Usually 2 copies (to handle errors)
• (figure: FAT array indexed 0…N-1; entry 31 starts the chain for File 31's blocks 0–3; other entries marked free)

Recall: Multilevel Indexed Files (Original 4.1 BSD)
• Sample file in multilevel indexed format:
 – 10 direct ptrs, 1K blocks
 – How many accesses for block #23? (assume file header accessed on open)
  » Two: One for indirect block, one for data
 – How about block #5?
  » One: One for data
 – Block #340?
  » Three: double indirect block, indirect block, and data
• 4.1 Pros and cons
 – Pros: Simple (more or less); files can easily expand (up to a point); small files particularly cheap and easy
 – Cons: Lots of seeks; very large files must read many indirect blocks (four I/Os per block!)
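A small worked example of the access counts quoted above (a sketch, assuming 4-byte block pointers so each 1K indirect block holds 256 pointers, and assuming the inode itself is already in memory):

#include <stdio.h>

#define NDIRECT   10    /* direct pointers in the inode                */
#define NINDIRECT 256   /* pointers per 1K indirect block (4B each)    */

/* Disk accesses needed to reach a given data block of a file,
 * assuming the file header (inode) was loaded at open time. */
int accesses_for_block(long blockno) {
    if (blockno < NDIRECT)
        return 1;                              /* data block only                    */
    blockno -= NDIRECT;
    if (blockno < NINDIRECT)
        return 2;                              /* indirect + data                    */
    blockno -= NINDIRECT;
    if (blockno < (long)NINDIRECT * NINDIRECT)
        return 3;                              /* double indirect + indirect + data  */
    return 4;                                  /* triple indirect chain              */
}

int main(void) {
    printf("block #5   -> %d access(es)\n", accesses_for_block(5));    /* 1 */
    printf("block #23  -> %d access(es)\n", accesses_for_block(23));   /* 2 */
    printf("block #340 -> %d access(es)\n", accesses_for_block(340));  /* 3 */
    return 0;
}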

CASE STUDY: BERKELEY FAST FILE SYSTEM (FFS)

Fast File System (BSD 4.2, 1984)
• Same inode structure as in BSD 4.1
 – Same file header and triply indirect blocks we just studied
 – Some changes to block sizes: 1024 → 4096 bytes for performance
• Paper on FFS: "A Fast File System for UNIX"
 – Marshall McKusick, William Joy, Samuel Leffler, and Robert Fabry
 – Off the "resources" page of the course website – take a look!
• Optimization for Performance and Reliability:
 – Distribute inodes among different tracks to be closer to data
 – Uses bitmap allocation in place of freelist
 – Attempt to allocate files contiguously
 – 10% reserved disk space
 – Skip-sector positioning (mentioned later)


FFS Changes in Inode Placement: Motivation
• In early UNIX and DOS/Windows' FAT file system, headers stored in special array in outermost cylinders
 – Fixed size, set when disk is formatted
  » At formatting time, a fixed number of inodes are created
  » Each is given a unique number, called an "inumber"
• Problem #1: Inodes all in one place (outer tracks)
 – Head crash potentially destroys all files by destroying inodes
 – Inodes not close to the data that they point to
  » To read a small file, seek to get header, seek back to data
• Problem #2: When you create a file, you don't know how big it will become (in UNIX, most writes are by appending)
 – How much contiguous space do you allocate for a file?
 – Makes it hard to optimize for performance

FFS Locality: Block Groups
• BSD 4.2 (FFS) distributed the header information (inodes) closer to the data blocks
 – Often, inode for a file stored in same "cylinder group" as parent directory of the file
 – Makes an "ls" of that directory run very fast
• File system divided into a set of block groups
 – Each a set of tracks
• Data blocks, metadata, and free space interleaved within block group
 – Avoid huge seeks between user data and system structure
• Put directory and its files in common block group

FFS Locality: Block Groups (Con't)
• First-Free allocation of new file blocks
 – To expand a file, first try successive blocks in bitmap, then choose a new range of blocks
 – Few little holes at start, big sequential runs at end of group
 – Avoids fragmentation
 – Sequential layout for big files
• Important: keep 10% or more free!
 – Reserve space in the block group
• Summary: FFS Inode Layout Pros
 – For small directories, can fit all data, file headers, etc. in same cylinder → no seeks!
 – File headers much smaller than a whole block (a few hundred bytes), so multiple headers fetched from disk at the same time
 – Reliability: whatever happens to the disk, you can find many of the files (even if directories disconnected)

UNIX 4.2 BSD FFS First Fit Block Allocation
• Fills in the small holes at the start of the block group
• Avoids fragmentation, leaves contiguous free space at end


Attack of the Rotational Delay
• Problem #3: Missing blocks due to rotational delay
 – Issue: read one block, do processing, and read the next block. In the meantime, the disk has continued turning: missed next block! Need 1 revolution/block!
 – Solution 1: Skip-sector positioning ("interleaving")
  » Place the blocks from one file on every other block of a track: give time for processing to overlap rotation
  » Can be done by OS or in modern drives by the disk controller
 – Solution 2: Read ahead: read next block right after first, even if application hasn't asked for it yet
  » Can be done either by OS (read ahead), or by disk itself (track buffers): many disk controllers have internal RAM that allows them to read a complete track
• Modern disks + controllers do many things "under the covers"
 – Track buffers, elevator algorithms, bad block filtering

UNIX 4.2 BSD FFS
• Pros
 – Efficient storage for both small and large files
 – Locality for both small and large files
 – Locality for metadata and data
 – No defragmentation necessary!
• Cons
 – Inefficient for tiny files (a 1-byte file requires both an inode and a data block)
 – Inefficient encoding when file is mostly contiguous on disk
 – Need to reserve 10-20% of free space to prevent fragmentation

Administrivia
• Midterm 2: Graded!
 – Mean: 55.34, Stdev: 15.09
 – Historical offset: +26
• No class on Wednesday!
 – It is a holiday
• If you have any group issues going on, make sure you:
 – Make sure that your TA understands what is happening
 – Make sure that you reflect these issues on your group evaluations

Example: Ext2/3 Disk Layout
• Disk divided into block groups
 – Provides locality
 – Each group has two block-sized bitmaps (free blocks/inodes)
 – Block sizes settable at format time: 1K, 2K, 4K, 8K…
• Actual inode structure similar to 4.2 BSD
 – with 12 direct pointers
• Ext3: Ext2 with Journaling
 – Several degrees of protection with comparable overhead
 – We will talk about journaling later
• Example: create a file1.dat under /dir1/ in Ext3


Recall: Directory Abstraction
• Directories are specialized files
 – Contents: list of <file name, file number> pairs
• System calls to access directories
 – open / creat traverse the structure
 – mkdir / rmdir add/remove entries
 – link / unlink (rm)
• libc support
 – DIR * opendir (const char *dirname)
 – struct dirent * readdir (DIR *dirstream)
 – int readdir_r (DIR *dirstream, struct dirent *entry, struct dirent **result)
• (figure: directory tree /usr → /usr/lib → /usr/lib4.3 → /usr/lib4.3/foo)

Hard Links
• Hard link
 – Mapping from name to file number in the directory structure
 – First hard link to a file is made when the file is created
 – Create extra hard links to a file with the link() system call
 – Remove links with the unlink() system call
• When can file contents be deleted?
 – When there are no more hard links to the file
 – Inode maintains a reference count for this purpose
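A minimal sketch of the link()/unlink() behavior described above, using standard POSIX calls; the file names are illustrative only, and st_nlink is the inode's reference count:

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Demonstrate hard links and the inode reference count (st_nlink). */
int main(void) {
    struct stat st;

    int fd = open("original.txt", O_WRONLY | O_CREAT, 0644);   /* first link */
    if (fd < 0) { perror("open"); return 1; }
    close(fd);

    if (link("original.txt", "extra.txt") < 0) {                /* second link */
        perror("link");
        return 1;
    }

    stat("original.txt", &st);
    printf("links to inode: %lu\n", (unsigned long) st.st_nlink);   /* prints 2 */

    unlink("original.txt");    /* name removed, contents still reachable via extra.txt */
    unlink("extra.txt");       /* last link removed: contents may now be freed         */
    return 0;
}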

Soft Links (Symbolic Links)
• Soft link, or symbolic link, or shortcut
 – Directory entry contains the path and name of the file
 – Map one name to another name
• Contrast these two different types of directory entries:
 – Normal directory entry: <file name, file #>
 – Symbolic link: <file name, destination file path>
• OS looks up the destination file name each time a program accesses the source file name
 – Lookup can fail (error result from open)
• Unix: create soft links with the symlink syscall

Directory Traversal
• What happens when we open /home/cs162/stuff.txt?
• "/" – inumber for root inode configured into kernel, say 2
 – Read inode 2 from its position in the inode array on disk
 – Extract the direct and indirect block pointers
 – Determine the block that holds the root directory (say block 49358)
 – Read that block, scan it for "home" to get the inumber for this directory (say 8086)
• Read inode 8086 for /home, extract its blocks, read a block (say 7756), scan it for "cs162" to get its inumber (say 732)
• Read inode 732 for /home/cs162, extract its blocks, read a block (say 12132), scan it for "stuff.txt" to get its inumber, say 9909
• Read inode 9909 for /home/cs162/stuff.txt
• Set up file description to refer to this inode so reads / writes can access the data blocks referenced by its direct and indirect pointers
• Check permissions on the final inode and each directory's inode…
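A small sketch using the standard symlink() and readlink() calls, showing that a soft link stores a destination path rather than an inumber (the file names here are hypothetical):

#include <stdio.h>
#include <unistd.h>

/* Create a soft link and show that it stores a path, not an inumber. */
int main(void) {
    char target[256];

    /* hypothetical names, for illustration only */
    if (symlink("/home/cs162/stuff.txt", "mylink") < 0)
        perror("symlink");              /* may already exist; keep going */

    ssize_t n = readlink("mylink", target, sizeof(target) - 1);
    if (n < 0) { perror("readlink"); return 1; }
    target[n] = '\0';

    /* Prints the stored destination path; an open("mylink", ...) would
     * resolve this path at that moment and fails if it is dangling.   */
    printf("mylink -> %s\n", target);
    return 0;
}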


Large Directories: B-Trees (dirhash)

in FreeBSD, NetBSD, OpenBSD

CASE STUDY: WINDOWS NTFS

New Technology File System (NTFS)
• Default on modern Windows systems
• Variable length extents
 – Rather than fixed blocks
• Instead of FAT or inode array: Master File Table
 – Like a database, with max 1 KB size for each table entry
 – Everything (almost) is a sequence of <attribute:value> pairs
  » Metadata and data
• Each entry in MFT contains metadata and:
 – File's data directly (for small files)
 – A list of extents (start block, size) for the file's data
 – For big files: pointers to other MFT entries with more lists

NTFS
• Master File Table
 – Database with flexible 1KB entries for metadata/data
 – Variable-sized attribute records (data or metadata)
 – Extend with variable depth (non-resident)
• Extents – variable length contiguous regions
 – Block pointers cover runs of blocks
 – Similar approach in Linux (ext4)
 – File create can provide a hint as to the size of the file
• Journaling for reliability
 – Discussed later

See: http://ntfs.com/ntfs-mft.htm
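To illustrate the extent idea, here is a hypothetical in-memory extent list and a lookup that maps a file-relative block to a disk block; the actual NTFS (and ext4) on-disk encodings differ, and struct extent / extent_lookup are made-up names:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory form of an extent list: each extent covers a
 * contiguous run of blocks (real on-disk formats encode this differently). */
struct extent {
    uint64_t file_block;   /* first file-relative block this extent covers */
    uint64_t disk_block;   /* where that run starts on disk                */
    uint64_t length;       /* run length in blocks                         */
};

/* Map a file-relative block number to a disk block; return 0 on a hole. */
uint64_t extent_lookup(const struct extent *ext, int n, uint64_t fblock) {
    for (int i = 0; i < n; i++)
        if (fblock >= ext[i].file_block &&
            fblock <  ext[i].file_block + ext[i].length)
            return ext[i].disk_block + (fblock - ext[i].file_block);
    return 0;
}

int main(void) {
    struct extent file[] = {          /* a mostly contiguous file in 3 runs */
        {   0, 1000, 64 },
        {  64, 5000, 64 },
        { 128, 9000, 16 },
    };
    printf("file block 70 -> disk block %llu\n",
           (unsigned long long) extent_lookup(file, 3, 70));   /* prints 5006 */
    return 0;
}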

NTFS Small File: Data Stored with Metadata / NTFS Medium File: Extents for File Data
• (figures: MFT records showing a standard information attribute – create time, modify time, access time, owner id, security specifier, flags (RO, hidden, sys) – a data attribute, and an attribute list; a small file's contents live directly in the data attribute, while a medium file's data attribute holds extents pointing to data blocks)

NTFS Large File: Pointers to Other MFT Records / NTFS Huge, Fragmented File: Many MFT Records
• (figures only: the base MFT record's attribute list points to additional MFT records, each holding further extent lists)


NTFS Directories

• Directories implemented as B-trees
• File's number identifies its entry in the MFT
• MFT entry always has a file name attribute
 – Human readable name, file number of parent dir
• Hard link? Multiple file name attributes in the MFT entry

MEMORY MAPPED FILES

Memory Mapped Files
• Traditional I/O involves explicit transfers between buffers in process address space to/from regions of a file
 – This involves multiple copies into caches in memory, plus system calls
• What if we could "map" the file directly into an empty region of our address space?
 – Implicitly "page it in" when we read it
 – Write it and "eventually" page it out
• Executable files are treated this way when we exec the process!!

Recall: Who Does What, When?
• (figure: a process issues an instruction with a virtual address (page# + offset); the MMU translates through the page table to a frame#, or raises a page fault exception; the OS page fault handler loads the page from disk, updates the PT entry, the scheduler runs something else in the meantime, and the faulting instruction is retried)


Using Paging to mmap() Files
• (figure: on access to the mapped region the MMU takes a page fault; the OS page fault handler creates PT entries for the mapped region as "backed" by the file, so file contents are then read from memory; mmap() maps the file to a region of the VAS)

mmap() system call
• May map a specific region or let the system find one for you
 – Tricky to know where the holes are
• Used both for manipulating files and for sharing between processes

An mmap() Example

#include <sys/mman.h>  /* also stdio.h, stdlib.h, string.h, fcntl.h, unistd.h */

int something = 162;

int main(int argc, char *argv[]) {
  int myfd;
  char *mfile;

  printf("Data at: %16lx\n", (long unsigned int) &something);
  printf("Heap at : %16lx\n", (long unsigned int) malloc(1));
  printf("Stack at: %16lx\n", (long unsigned int) &mfile);

  /* Open the file */
  myfd = open(argv[1], O_RDWR | O_CREAT);
  if (myfd < 0) { perror("open failed!"); exit(1); }

  /* map the file */
  mfile = mmap(0, 10000, PROT_READ|PROT_WRITE, MAP_FILE|MAP_SHARED, myfd, 0);
  if (mfile == MAP_FAILED) { perror("mmap failed"); exit(1); }
  printf("mmap at : %16lx\n", (long unsigned int) mfile);

  puts(mfile);
  strcpy(mfile+20, "Let's write over it");
  close(myfd);
  return 0;
}

Sample run:
$ ./mmap test
Data at: 105d63058
Heap at : 7f8a33c04b70
Stack at: 7fff59e9db10
mmap at : 105d97000
This is line one
This is line two
This is line three
This is line four

$ cat test
This is line one
ThiLet's write over its line three
This is line four

Sharing through Mapped Files
• (figure: two processes, VAS 1 and VAS 2, each with instructions, data, heap, and stack regions from 0x000… to 0xFFF…; both map the same file into their address spaces, so the file's pages in memory are shared)
• Also: anonymous memory between parents and children
 – no file backing – just swap space
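The "anonymous memory" case can be sketched with a shared anonymous mapping (standard POSIX mmap() with MAP_ANONYMOUS, spelled MAP_ANON on some systems); the parent sees the child's write because both share the same physical pages:

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child share one small anonymous mapping (no file backing). */
int main(void) {
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (shared == MAP_FAILED) { perror("mmap"); return 1; }

    *shared = 0;
    if (fork() == 0) {           /* child writes through the shared mapping */
        *shared = 162;
        return 0;
    }
    wait(NULL);                  /* parent waits, then sees the child's update */
    printf("parent reads %d\n", *shared);   /* prints 162 */
    munmap(shared, sizeof(int));
    return 0;
}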

THE BUFFER CACHE

Buffer Cache
• Kernel must copy disk blocks to main memory to access their contents, and write them back if modified
 – Could be data blocks, inodes, directory contents, etc.
 – Possibly dirty (modified and not yet written back)
• Key Idea: exploit locality by caching disk data in memory
 – Name translations: mapping from paths → inodes
 – Disk blocks: mapping from block address → disk content
• Buffer Cache: memory used to cache kernel resources, including disk blocks and name translations
 – Can contain "dirty" blocks (with modifications not on disk)

File System Buffer Cache
• OS implements a cache of disk blocks for efficient access to data, directories, inodes, and the freemap
• (figure: disk holds data blocks, inodes, directory data blocks, and the free bitmap; memory holds cached copies of these blocks, referenced from per-process PCBs and open file descriptors; reads and writes go through the cached blocks)

File System Buffer Cache: open
• Directory lookup, repeat as needed:
 – Load block of directory
 – Search for the mapping (file name → inumber)
• Create reference via open file descriptor

File System Buffer Cache: Read?
• From inode, traverse index structure to find data block
• Load data block
• Copy all or part to the read data buffer

File System Buffer Cache: Write?
• Process similar to read, but may allocate new blocks (update free map)
• Blocks need to be written back to disk; inode too?

File System Buffer Cache: Eviction?
• Blocks being written back to disk go through a transient state
• (figure: across these steps, cached block slots move from free to holding directory blocks, inodes, data blocks, and dirty blocks awaiting writeback)


Buffer Cache Discussion
• Implemented entirely in OS software
 – Unlike memory caches and TLB
• Blocks go through transitional states between free and in-use
 – Being read from disk, being written to disk
 – Other processes can run, etc.
• Blocks are used for a variety of purposes
 – inodes, data for dirs and files, freemap
 – OS maintains pointers into them
• Termination – e.g., process exit – open, read, write
• Replacement – what to do when it fills up?

File System Caching
• Replacement policy? LRU
 – Can afford the overhead of a full LRU implementation
 – Advantages:
  » Works very well for name translation
  » Works well in general as long as memory is big enough to accommodate a host's working set of files
 – Disadvantages:
  » Fails when some application scans through the file system, thereby flushing the cache with data used only once
  » Example: find . –exec grep foo {} \;
• Other Replacement Policies?
 – Some systems allow applications to request other policies
 – Example, 'Use Once':
  » File system can discard blocks as soon as they are used
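To make the LRU discussion concrete, here is a minimal user-level sketch of an LRU-managed block cache (not the real kernel buffer cache; cache_init, get_block, and the stubbed readblock/writeblock are made-up names standing in for real disk I/O):

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NBUF       64                  /* cache capacity in blocks */

struct buf {
    int64_t  blockno;                  /* -1 means the slot is free          */
    int      dirty;                    /* modified and not yet written back  */
    uint64_t last_used;                /* larger = more recently used (LRU)  */
    char     data[BLOCK_SIZE];
};

static struct buf cache[NBUF];
static uint64_t clock_ticks;

/* Stubs standing in for real disk I/O. */
static void readblock(int64_t blockno, void *dst)        { (void)blockno; memset(dst, 0, BLOCK_SIZE); }
static void writeblock(int64_t blockno, const void *src) { (void)blockno; (void)src; }

void cache_init(void) {
    for (int i = 0; i < NBUF; i++)
        cache[i].blockno = -1;
}

/* Return the cached copy of a block, evicting the LRU slot if needed. */
struct buf *get_block(int64_t blockno) {
    struct buf *victim = &cache[0];
    for (int i = 0; i < NBUF; i++) {
        if (cache[i].blockno == blockno) {             /* hit */
            cache[i].last_used = ++clock_ticks;
            return &cache[i];
        }
        if (cache[i].last_used < victim->last_used)    /* remember LRU slot */
            victim = &cache[i];
    }
    if (victim->dirty)                                 /* write back before reuse */
        writeblock(victim->blockno, victim->data);
    readblock(blockno, victim->data);                  /* miss: fetch from "disk" */
    victim->blockno   = blockno;
    victim->dirty     = 0;
    victim->last_used = ++clock_ticks;
    return victim;
}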

File System Caching (con't)
• Cache Size: How much memory should the OS allocate to the buffer cache vs. virtual memory?
 – Too much memory to the file system cache → won't be able to run many applications
 – Too little memory to the file system cache → many applications may run slowly (disk caching not effective)
 – Solution: adjust the boundary dynamically so that the disk access rates for paging and file access are balanced

File System Prefetching
• Read Ahead Prefetching: fetch sequential blocks early
 – Key Idea: exploit the fact that the most common file access is sequential, by prefetching subsequent disk blocks ahead of the current read request
 – Elevator algorithm can efficiently interleave prefetches from concurrent applications
• How much to prefetch?
 – Too much prefetching imposes delays on requests by other applications
 – Too little prefetching causes many seeks (and rotational delays) among concurrent file requests
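An application can also request sequential read-ahead explicitly. A small sketch using the standard posix_fadvise() hint (the kernel is free to treat the advice as best-effort):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hint that we will read this file sequentially, so the OS can prefetch
 * aggressively; then stream through it.                                  */
int main(int argc, char *argv[]) {
    if (argc < 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);   /* whole file, sequential */

    char buf[65536];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;   /* consume the data; reads mostly hit prefetched cache blocks */
    close(fd);
    return 0;
}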


Delayed Writes
• Buffer cache is a writeback cache (writes are termed "Delayed Writes")
• write() copies data from user space to the kernel buffer cache
 – Quick return to user space
• read() is fulfilled by the cache, so reads see the results of writes
 – Even if the data has not reached disk
• When does data from a write syscall finally reach disk?
 – When the buffer cache is full (e.g., we need to evict something)
 – When the buffer cache is flushed periodically (in case we crash)

Delayed Writes (Advantages)
• Performance advantage: return to user quickly without writing to disk!
• Disk scheduler can efficiently order lots of requests
 – Elevator Algorithm can rearrange writes to avoid random seeks
• Delay block allocation:
 – May be able to allocate multiple blocks at the same time for a file, keep them contiguous
• Some files never actually make it all the way to disk
 – Many short-lived files!
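When an application needs a particular write to reach disk sooner than the periodic flush, it can force the delayed write out with fsync(). A minimal sketch using standard POSIX calls:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* write() returns once data is in the kernel buffer cache; fsync() blocks
 * until the cached blocks (data and metadata) actually reach the disk.   */
int main(void) {
    int fd = open("log.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char *msg = "important record\n";
    if (write(fd, msg, strlen(msg)) < 0) { perror("write"); return 1; }
    /* At this point a crash could still lose the record: it may exist
     * only as a dirty block in the buffer cache.                       */

    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    /* Now the record (and the inode update) has been forced to stable storage. */

    close(fd);
    return 0;
}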

Buffer Caching vs. Demand Paging
• Replacement Policy?
 – Demand Paging: LRU is infeasible; use approximation (like NRU/Clock)
 – Buffer Cache: LRU is OK
• Eviction Policy?
 – Demand Paging: evict not-recently-used pages when memory is close to full
 – Buffer Cache: write back dirty blocks periodically, even if used recently
  » Why? To minimize data loss in case of a crash

Dealing with Persistent State
• Buffer Cache: write back dirty blocks periodically, even if used recently
 – Why? To minimize data loss in case of a crash
 – Linux does a periodic flush every 30 seconds
• Not foolproof! Can still crash with dirty blocks in the cache
 – What if the dirty block was for a directory?
  » Lose pointer to a file's inode (leak space)
  » File system now in an inconsistent state!


Important "ilities"
• Availability: the probability that the system can accept and process requests
 – Measured in "nines" of probability: e.g. 99.9% probability is "3-nines of availability"
 – Key idea here is independence of failures
• Durability: the ability of a system to recover data despite faults
 – This idea is fault tolerance applied to data
 – Doesn't necessarily imply availability: information on the pyramids was very durable, but could not be accessed until discovery of the Rosetta Stone
• Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition)
 – Usually stronger than simply availability: means that the system is not only "up", but also working correctly
 – Includes availability, security, fault tolerance/durability
 – Must make sure data survives system crashes, disk crashes, other problems

HOW TO MAKE FILE SYSTEMS MORE DURABLE?

How to Make File Systems More Durable?
• Disk blocks contain Reed-Solomon error correcting codes (ECC) to deal with small defects in the disk drive
 – Can allow recovery of data from small media defects
• Make sure writes survive in the short term
 – Either abandon delayed writes, or
 – Use special, battery-backed RAM (called non-volatile RAM or NVRAM) for dirty blocks in the buffer cache
• Make sure that data survives in the long term
 – Need to replicate! More than one copy of data!
 – Important element: independence of failure
  » Could put copies on one disk, but if disk head fails…
  » Could put copies on different disks, but if server fails…
  » Could put copies on different servers, but if building is struck by lightning…
  » Could put copies on servers in different continents…

RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its "shadow" (the pair forms a recovery group)
 – For high I/O rate, high availability environments
 – Most expensive solution: 100% capacity overhead
• Bandwidth sacrificed on write:
 – Logical write = two physical writes
 – Highest bandwidth when disk heads and rotation synchronized (challenging)
• Reads may be optimized
 – Can have two independent reads to the same data
• Recovery:
 – Disk failure → replace disk and copy data to the new disk
 – Hot Spare: idle disk attached to system for immediate replacement

RAID 5+: High I/O Rate Parity
• Data striped across multiple disks
 – Successive blocks stored on successive (non-parity) disks
 – Increased bandwidth over a single disk
• Parity block constructed by XORing the data blocks in its stripe
 – P0 = D0 ⊕ D1 ⊕ D2 ⊕ D3
 – Can destroy any one disk and still reconstruct the data
 – Suppose Disk 3 fails; then we can reconstruct: D2 = D0 ⊕ D1 ⊕ D3 ⊕ P0
• (figure: stripe units D0–D23 and parity blocks P0–P5 rotated across Disks 1–5, with logical disk addresses increasing down the stripes)
• Can spread information widely across the Internet for durability
 – RAID algorithms work over geographic scale

RAID 6 and other Erasure Codes
• In general: RAID-X is an "erasure code"
 – Must have the ability to know which disks are bad – treat a missing disk as an "erasure"
• Today, disks are so big that RAID 5 is not sufficient!
 – Time to repair a disk is sooooo long, another disk might fail in the process!
 – "RAID 6" – allow 2 disks in a replication stripe to fail
 – Requires a more complex erasure code, such as the EVENODD code (see readings)
• More general option for a general erasure code: Reed-Solomon codes
 – Based on polynomials in GF(2^k) (i.e., k-bit symbols)
 – m data points define a degree m−1 polynomial; the encoding is n points on the polynomial
 – Any m points can be used to recover the polynomial; n−m failures tolerated
• Erasure codes are not just for disk arrays. For example, geographic replication:
 – E.g., split data into m = 4 chunks, generate n = 16 fragments, and distribute across the Internet
 – Any 4 fragments can be used to recover the original data – very durable!
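A tiny sketch of the parity arithmetic above, using 8-byte "blocks" purely for illustration: build P0 by XORing D0 through D3, then rebuild a lost D2 from the survivors and the parity.

#include <stdio.h>
#include <string.h>

#define STRIPE_UNIT 8      /* tiny blocks, just for illustration */

/* XOR a data block into the running parity block. */
void xor_into(unsigned char *parity, const unsigned char *block) {
    for (int i = 0; i < STRIPE_UNIT; i++)
        parity[i] ^= block[i];
}

int main(void) {
    unsigned char d[4][STRIPE_UNIT] = {"D0data", "D1data", "D2data", "D3data"};
    unsigned char p0[STRIPE_UNIT] = {0};

    /* P0 = D0 xor D1 xor D2 xor D3 */
    for (int i = 0; i < 4; i++)
        xor_into(p0, d[i]);

    /* Disk holding D2 fails: rebuild it from the survivors and the parity. */
    unsigned char rebuilt[STRIPE_UNIT] = {0};
    xor_into(rebuilt, d[0]);
    xor_into(rebuilt, d[1]);
    xor_into(rebuilt, d[3]);
    xor_into(rebuilt, p0);

    printf("recovered D2: %s (matches: %s)\n", rebuilt,
           memcmp(rebuilt, d[2], STRIPE_UNIT) == 0 ? "yes" : "no");
    return 0;
}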

Use of Erasure Coding for High Durability/Overhead Ratio!
• Exploit the law of large numbers for durability!
• 6-month repair time, Fraction of Blocks Lost Per Year (FBLPY) with a 4x increase in total size of data:
 – Replication (4 copies): 0.03
 – Fragmentation (16 of 64 fragments needed): 10^-35
• (figure: FBLPY vs. repair time for replication and fragmentation)

Higher Durability through Geographic Replication
• Highly durable – hard to destroy all copies
• Highly available for reads
 – Simple replication: read any copy
 – Erasure coded: read m of n
• Low availability for writes
 – Can't write if any one replica is not up
 – Or – need relaxed consistency model
• Reliability? – availability, security, durability, fault-tolerance
• (figure: Replica/Frag #1, #2, …, #n spread across geographically separate sites)

File System Reliability: (Difference from Block-level Reliability)
• What can happen if the disk loses power or software crashes?
 – Some operations in progress may complete
 – Some operations in progress may be lost
 – Overwrite of a block may only partially complete
• Having RAID doesn't necessarily protect against all such failures
 – No protection against writing bad state
 – What if one disk of a RAID group is not written?
• File system needs durability (as a minimum!)
 – Data previously stored can be retrieved (maybe after some recovery step), regardless of failure
• But durability is not quite enough…!

HOW TO MAKE FILE SYSTEMS MORE RELIABLE?

Storage Reliability Problem
• A single logical file operation can involve updates to multiple physical disk blocks
 – inode, indirect block, data block, bitmap, …
 – With sector remapping, a single update to a physical disk block can require multiple (even lower-level) updates to sectors
• At a physical level, operations complete one at a time
 – Want concurrent operations for performance
• How do we guarantee consistency regardless of when a crash occurs?

Threats to Reliability
• Interrupted operation
 – Crash or power failure in the middle of a series of related updates may leave stored data in an inconsistent state
 – Example: transfer funds from one bank account to another – what if the transfer is interrupted after withdrawal and before deposit?
• Loss of stored data
 – Failure of non-volatile storage media may cause previously stored data to disappear or be corrupted


Two Reliability Approaches
• Careful Ordering and Recovery (FAT & FFS + fsck)
 – Each step builds structure: data block → inode → free → directory
 – Last step links it in to the rest of the FS
 – Recovery scans the structure looking for incomplete actions
• Versioning and Copy-on-Write (ZFS, …)
 – Version files at some granularity
 – Create new structure linking back to unchanged parts of old
 – Last step is to declare that the new version is ready

Reliability Approach #1: Careful Ordering
• Sequence operations in a specific order
 – Careful design to allow the sequence to be interrupted safely
• Post-crash recovery
 – Read data structures to see if there were any operations in progress
 – Clean up/finish as needed
• Approach taken by
 – FAT and FFS (fsck) to protect filesystem structure/metadata
 – Many app-level recovery schemes (e.g., Word autosaves)

Berkeley FFS: Create a File
• Normal operation:
 – Allocate data block
 – Write data block
 – Allocate inode
 – Write inode block
 – Update bitmap of free blocks and inodes
 – Update directory with file name → inode number
 – Update modify time for directory
• Recovery:
 – Scan inode table
 – If any unlinked files (not in any directory), delete or put in lost & found dir
 – Compare free block bitmap against inode trees
 – Scan directories for missing update/access times
 – Time proportional to disk size

Reliability Approach #2: Copy on Write File Layout
• Recall: the multi-level index structure lets us find the data blocks of a file
• Instead of over-writing existing data blocks and updating the index structure:
 – Create a new version of the file with the updated data
 – Reuse blocks that don't change – much of what is already in place
 – This is called: Copy On Write (COW)
• Seems expensive! But
 – Updates can be batched
 – Almost all disk writes can occur in parallel
• Approach taken in network file server appliances
 – NetApp's Write Anywhere File Layout (WAFL)
 – ZFS (Sun/Oracle) and OpenZFS


COW with Smaller-Radix Blocks
• (figure: old version and new version of a file's block tree; a write copies only the path from the modified leaf up to the new root, sharing all unchanged subtrees with the old version)
• If the file is represented as a tree of blocks, just need to update the leading fringe

Example: ZFS and OpenZFS
• Variable sized blocks: 512 B – 128 KB
• Symmetric tree
 – Know if it is large or small when we make the copy
• Store version number with pointers
 – Can create a new version by adding blocks and new pointers
• Buffers a collection of writes before creating a new version with them
• Free space represented as a tree of extents in each block group
 – Delay updates to freespace (in log) and do them all when the block group is activated

More General Reliability Solutions
• Use Transactions for atomic updates
 – Ensure that multiple related updates are performed atomically
 – i.e., if a crash occurs in the middle, the state of the system reflects either all or none of the updates
 – Most modern file systems use transactions internally to update filesystem structures and metadata
 – Many applications implement their own transactions
• Provide redundancy for media failures
 – Redundant representation on media (Error Correcting Codes)
 – Replication across media (e.g., RAID disk array)

Transactions
• Closely related to critical sections for manipulating shared data structures
• They extend the concept of atomic updates from memory to stable storage
 – Atomically update multiple persistent data structures
• Many ad-hoc approaches
 – FFS carefully ordered the sequence of updates so that if a crash occurred while manipulating directories or inodes, the disk scan on reboot would detect and recover the error (fsck)
 – Applications use temporary files and rename
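The "temporary files and rename" approach mentioned above works because rename() atomically replaces the destination name within a filesystem. A minimal sketch (file names are illustrative only):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Atomically replace config.txt: write a complete new copy to a temp file,
 * force it to disk, then rename() it over the old name. Readers see either
 * the old file or the new one, never a half-written mix.                  */
int main(void) {
    const char *newdata = "setting = 42\n";

    int fd = open("config.txt.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, newdata, strlen(newdata)) < 0) { perror("write"); return 1; }
    if (fsync(fd) < 0) { perror("fsync"); return 1; }   /* make the new data durable first */
    close(fd);

    if (rename("config.txt.tmp", "config.txt") < 0) {   /* atomic commit point */
        perror("rename");
        return 1;
    }
    return 0;
}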


Key Concept: Transaction
• A transaction is an atomic sequence of reads and writes that takes the system from one consistent state to another.
 – consistent state 1 → [transaction] → consistent state 2
• Recall: code in a critical section appears atomic to other threads
• Transactions extend the concept of atomic updates from memory to persistent storage

Typical Structure
• Begin a transaction – get transaction id
• Do a bunch of updates
 – If any fail along the way, roll back
 – Or, if any conflict with other transactions, roll back
• Commit the transaction

"Classic" Example: Transaction

BEGIN; --BEGIN TRANSACTION
UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice';
UPDATE branches SET balance = balance - 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Alice');
UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob';
UPDATE branches SET balance = balance + 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name = 'Bob');
COMMIT; --COMMIT WORK

Transfer $100 from Alice's account to Bob's account

Concept of a Log
• One simple action is atomic – write/append a basic item
• Use that to seal the commitment to a whole series of actions
• (figure: a log containing records such as "Start Tran N", "Get 10$ from account A", "Get 13$ from account C", "Get 7$ from account B", "Put 15$ into account X", "Put 15$ into account Y", …, "Commit Tran N")


Transactional File Systems
• Better reliability through use of a log
 – Changes are treated as transactions
 – A transaction is committed once it is written to the log
  » Data forced to disk for reliability
  » Process can be accelerated with NVRAM
 – Although the file system may not be updated immediately, data is preserved in the log
• Difference between "Log Structured" and "Journaled"
 – In a Log-Structured filesystem, data stays in log form
 – In a Journaled filesystem, the log is used for recovery

Journaling File Systems
• Don't modify data structures on disk directly
• Write each update as a transaction recorded in a log
 – Commonly called a journal or intention list
 – Also maintained on disk (allocate blocks for it when formatting)
• Once changes are in the log, they can be safely applied to the file system
 – e.g., modify inode pointers and directory mapping
• Garbage collection: once a change is applied, remove its entry from the log
• Linux took the original FFS-like file system (ext2) and added a journal to get ext3!
 – Some options: whether or not to write all data to the journal or just metadata
• Other examples: NTFS, Apple HFS+, Linux XFS, JFS, ext4
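A toy user-level sketch of the journaling idea (the log file name, record format, and journal_write helper are made up, and a real journal would fsync() rather than fflush() at the commit point): updates are appended to the log, a COMMIT record seals the transaction, and recovery replays only transactions whose COMMIT reached the log.

#include <stdio.h>
#include <string.h>

/* Toy journal: each line is a record; a transaction is only considered
 * committed (and thus replayed) if its COMMIT record made it to the log. */

void journal_write(FILE *log, const char *rec) {
    fprintf(log, "%s\n", rec);
    fflush(log);                 /* a real journal would fsync() here */
}

int main(void) {
    FILE *log = fopen("journal.txt", "w+");
    if (!log) { perror("fopen"); return 1; }

    /* Log the whole multi-block update, then commit it. */
    journal_write(log, "START  tx1");
    journal_write(log, "UPDATE free-map: mark block 1021 used");
    journal_write(log, "UPDATE inode 9909: add pointer to block 1021");
    journal_write(log, "UPDATE dir 732: add entry file1.dat -> 9909");
    journal_write(log, "COMMIT tx1");
    /* Only after COMMIT is durable may the real inode/dir/bitmap blocks be
     * written in place; afterwards tx1 can be trimmed from the log.       */

    /* Recovery: scan the log; replay committed transactions, discard others. */
    char line[128];
    int committed = 0;
    rewind(log);
    while (fgets(line, sizeof(line), log))
        if (strncmp(line, "COMMIT tx1", 10) == 0)
            committed = 1;
    printf("tx1 %s\n", committed ? "replayed" : "discarded");

    fclose(log);
    return 0;
}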

Creating a File (No Journaling Yet)
• Find free data block(s)
• Find free inode entry
• Find dirent insertion point
• Write map (i.e., mark used)
• Write inode entry to point to block(s)
• Write dirent to point to inode
• (figure: free space map, inode table, directory entries, and data blocks all updated in place on disk)

Creating a File (With Journaling)
• Find free data block(s)
• Find free inode entry
• Find dirent insertion point
• [log] Write map (i.e., mark used)
• [log] Write inode entry to point to block(s)
• [log] Write dirent to point to inode
• (figure: the log lives in non-volatile storage (Flash or on disk); between the transaction's start and commit records, entries move from pending to done as the head and tail pointers advance)


After Commit, Eventually Replay Transaction
• All accesses to the file system first look in the log
 – Actual on-disk data structure might be stale
• Eventually, copy changes to disk and discard the transaction from the log
• (figure: as logged changes are applied, the log's tail pointer advances past the completed transaction)

Crash Recovery: Discard Partial Transactions
• Upon recovery, scan the log
• Detect a transaction start with no commit
• Discard those log entries
• Disk remains unchanged


Why go through all this trouble? • Updates atomic, even if we crash: • Scan log, find start – Update either gets fully applied or discarded Free space – All physical operations treated as a logical unit • Find matching commit map … Data blocks Inode table Isn’t this expensive? • Redo it as usual • Yes! We're now writing all data twice (once to log, once to actual data – Or just let it happen later Directory blocks in target file) entries • Modern filesystems journal metadata updates only tail head – Record modifications to file system data structures – But apply updates to a file’s contents directly done pending start commit Log: in non-volatile storage (Flash or on Disk)


File System Summary (1/3)
• File System:
 – Transforms blocks into Files and Directories
 – Optimize for size, access and usage patterns
 – Maximize sequential access, allow efficient random access
 – Projects the OS protection and security regime (UGO vs ACL)
• File defined by header, called "inode"
• Naming: translating from user-visible names to actual system resources
 – Directories used for naming for local file systems
 – Linked or tree structure stored in files
• 4.2 BSD Multilevel Indexed Scheme
 – inode contains file info, direct pointers to blocks, indirect blocks, doubly indirect, etc.
 – NTFS: variable extents, not fixed blocks; tiny files' data is in the header

File System Summary (2/3)
• File layout driven by freespace management
 – Optimizations for sequential access: start new files in open ranges of free blocks, rotational optimization
 – Integrate freespace, inode table, file blocks and dirs into block group
• Deep interactions between memory management, file system, sharing
 – mmap(): map file or anonymous segment to memory
• Buffer Cache: memory used to cache kernel resources, including disk blocks and name translations
 – Can contain "dirty" blocks (blocks not yet on disk)

11/9/20 Kubiatowicz CS162 © UCB Fall 2020 Lec 21.83 11/9/20 Kubiatowicz CS162 © UCB Fall 2020 Lec 21.84 File System Summary (3/3) • File system operations involve multiple distinct updates to blocks on disk – Need to have all or nothing semantics – Crash may occur in the midst of the sequence • Traditional file system perform check and recovery on boot – Along with careful ordering so partial operations result in loose fragments, rather than loss • Copy-on-write provides richer function (versions) with much simpler recovery – Little performance impact since sequential write to storage device is nearly free • Transactions over a log provide a general solution – Commit sequence to durable log, then update the disk – Log takes precedence over disk – Replay committed transactions, discard partials
