Distributed File Systems

Recall: A Little Queuing Theory: Some Results • Assumptions: CS162 – System in equilibrium; No limit to the queue Operating Systems and – Time between successive arrivals is random and memoryless Systems Programming Queue Server Lecture 19 Arrival Rate Service Rate μ=1/Tser • Parameters that describe our system: File Systems (Con’t), – : mean number of arriving customers/second –Tser: mean time to service a customer (“m1”) MMAP, Buffer Cache –C: squared coefficient of variance = 2/m12 – μ: service rate = 1/Tser –u: server utilization (0u1): u = /μ = Tser April 7th, 2020 • Parameters we wish to compute: –T: Time spent in queue Prof. John Kubiatowicz q –Lq: Length of queue = Tq (by Little’s law) http://cs162.eecs.Berkeley.edu • Results: –Memoryless service distribution (C = 1): (an “M/M/1 queue”): »Tq = Tser x u/(1 – u) –General service distribution (no restrictions), 1 server (an “M/G/1 queue”): »Tq = Tser x ½(1+C) x u/(1 – u) 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.2 Recall: A Little Queuing Theory: Some Results Components of a File System • Assumptions: – System in equilibrium; No limit to theWhy queue does response/queueing File path – Time between successive arrivals isdelay random grow and unboundedly memoryless even One file system block though the utilization is < 1 ? Directory File Index Queue Server usually = multiple sectors Arrival Rate Service300 RateResponse Structure Structure Time (ms) File number Ex: 512 sector, 4K block μ=1/Tser • Parameters that describe our system: “inumber” 200 – : mean number of arriving customers/second Data blocks –Tser: mean time to service a customer (“m1”) –C: squared coefficient of variance100 = 2/m12 – μ: service rate = 1/Tser –u: server utilization (0u1): u = /μ = Tser 0 100% • Parameters we wish to compute: 0% “inode” … –Tq: Time spent in queue Throughput (Utilization) Note: we have an unfortunate conflicting –Lq: Length of queue = Tq (by Little’s law) (% total BW) terminology for FLASH: • Results: • Block 256KB group of pages –Memoryless service distribution (C = 1): (an “M/M/1 queue”): • Page 4KB chunk inside block »Tq = Tser x u/(1 – u) A FLASH page corresponds to a disk –General service distribution (no restrictions), 1 server (an “M/G/1 queue”): sector (minimum addressable unit) and »Tq = Tser x ½(1+C) x u/(1 – u) block (file system unit), i.e. sector=block 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.3 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.4 Components of a file system Directories file name file number Index Storage block offset directory offset structure • Open performs Name Resolution – Translates pathname into a “file number” » Used as an “index” to locate the blocks – Creates a file descriptor in PCB within kernel – Returns a “handle” (another integer) to user process • Read, Write, Seek, and Sync operate on handle – Mapped to file descriptor and to blocks 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.5 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.6 Directory Directory Structure • Basically a hierarchical structure • How many disk accesses to resolve “/my/book/count”? – Read in file header for root (fixed spot on disk) • Each directory entry is a collection of – Read in first data block for root – Files » Table of file name/index pairs. Search linearly – ok since – Directories directories typically very small » A link to another entries – Read in file header for “my” – Read in first data block for “my”; search for “book” • Each has a name and attributes – Read in file header for “book” – Files have data – Read in first data block for “book”; search for “count” – Read in file header for “count” • Links (hard links) make it a DAG, not just a tree • Current working directory: Per-address-space pointer to – Softlinks (aliases) are another name for an entry a directory (inode) used for resolving file names – Allows user to specify relative filename instead of absolute path (say CWD=“/my/book” can resolve “count”) 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.7 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.8 File In-Memory File System Structures • Named permanent storage Data blocks • Contains –Data » Blocks on disk somewhere – Metadata (Attributes) … » Owner, size, last opened, … » Access rights •R, W, X • Owner, Group, Other (in Unix systems) • Open system call: • Access control list in Windows system – Resolves file name, finds file control block (inode) – Makes entries in per-process and system-wide tables – Returns index (called “file handle”) in open-file table 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.9 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.10 In-Memory File System Structures Our first filesystem: FAT (File Allocation Table) • The most commonly used filesystem in the world! • Assume (for now) we have a FAT Disk Blocks way to translate a path to 0: 0: a “file number” File number – i.e., a directory structure 31: File 31, Block 0 • Disk Storage is a collection of Blocks File 31, Block 1 – Just hold file data (offset o = < B, x >) • Example: file_read 31, < 2, x > – Index into FAT with file number – Follow linked list to block • Read/write system calls: – Read the block from disk into memory File 31, Block 2 –Use file handle to locate inode –Perform appropriate reads or writes N-1: N-1: memory 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.11 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.12 FAT Properties FAT Properties • File is collection of disk blocks • File is collection of disk blocks • FAT is linked list 1-1 with blocks FAT Disk Blocks • FAT is linked list 1-1 with blocks FAT Disk Blocks • File Number is index of root 0: 0: • File Number is index of root 0: 0: of block list for the file File number of block list for the file File number • File offset (o = < B, x >) 31: File 31, Block 0 • File offset (o = < B, x > ) 31: File 31, Block 0 File 31, Block 1 File 31, Block 1 • Follow list to get block # • Follow list to get block # • Unused blocks Marked free (no • Unused blocks Marked free (no ordering, must scan to find) ordering, must scan to find) • Ex: file_write(31, < 3, y >) File 31, Block 3 free free – Grab free block File 31, Block 2 – Linking them into file File 31, Block 2 N-1: N-1: N-1: N-1: memory memory 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.13 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.14 FAT Properties FAT Assessment • File is collection of disk blocks • FAT32 (32 instead of 12 bits) used in Windows, USB drives, • FAT is linked list 1-1 with blocks FAT Disk Blocks SD cards, … FAT Disk Blocks • File Number is index of root 0: 0: • Where is FAT stored? 0: 0: File 1 number File 1 number of block list for the file – On Disk, on boot cache in memory, • Grow file by allocating free blocks 31: File 31, Block 0 second (backup) copy on disk 31: File 31, Block 0 File 31, Block 1 File 31, Block 1 and linking them in free • What happens when you format a disk? File 63, Block 1 File 63, Block 1 • Ex: Create file, write, write – Zero the blocks, Mark FAT entries “free” • What happens when you File 31, Block 3 quick format a disk? File 31, Block 3 File 63, Block 0 File 63, Block 0 63: – Mark all entries in FAT as free 63: File 2 number File 31, Block 2 File 2 number File 31, Block 2 • Simple – Can implement in N-1: N-1: N-1: N-1: device firmware memory memory 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.15 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.16 FAT Assessment – Issues What about FAT directories? • Time to find block (large files) ?? FAT Disk Blocks • Block layout for file ??? 0: 0: File #1 • Sequential Access ??? 31: File 31, Block 0 • Directory is a file containing <file_name: file_number> File 31, Block 1 mappings • Random Access ??? File 63, Block 1 – Free space for new/deleted entries • Fragmentation ??? – In FAT: file attributes are kept in directory (!!!) File 31, Block 3 – MSDOS defrag tool – Each directory is a linked list of entries File 63, Block 0 63: • Where do you find root directory ( “/” )? • Small files ??? File #2 File 31, Block 2 – At well-defined place on disk – For FAT, this is at block 2 (there are no blocks 0 or 1) • Big files ??? N-1: N-1: – Remaining directories are accessed via their file_number 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.17 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.18 Many Huge FAT Security Holes! Characteristics of Files • FAT has no access rights – No way, even in principle, to track ownership of data • FAT has no header in the file blocks – No way to enforce control over data, since all processes have access of FAT table – Just follow pointer to disk blocks • Just gives an index into the FAT to read data – (file number = block number) – Could start in middle of file or access deleted data 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.19 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.20 Characteristics of Files Characteristics of Files • Most files are small, growing numbers of files • Most of the space is occupied by the rare big ones over time 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.21 4/7/20 Kubiatowicz CS162 © UCB Spring 2020 Lec 19.22 Unix File System (1/2) Unix File System (2/2) • Original inode format appeared in BSD 4.1 • Metadata associated with the file – Berkeley Standard Distribution Unix – Rather than in the directory that points to it – Part of your heritage! • UNIX Fast File System (FFS) BSD 4.2 Locality Heuristics: – Similar structure for Linux Ext2/3 – Block group placement • File Number is index into inode arrays – Reserve space • Multi-level index structure

Distributed File Systems

Some Results Recall: FAT Properties FAT Assessment – Issues

Recent Filesystem Optimisations in Freebsd

Sys/Sys/Bio.H 1 1 /* 65 Void Bio Caller2; / Private Use by the Caller

Operating Systems in Depth This Page Intentionally Left Blank OPERATING SYSTEMS in DEPTH

Conference Reports From

Freebsd Documentation Release 10.1

Debugging Kernel Problems

ZFS and Freebsd

Debugging Kernel Problems

Scaleengine and Freebsd

A Dynamic Aspect-Oriented System for Data-Driven Profiling of OS Kernels

Windows+BSD+Linux Installation Guide, V1.1 Windows+BSD+Linux Installation Guide