Operating Systems 2020 File Systems
Total Page:16
File Type:pdf, Size:1020Kb
Operating Systems 2020 1 Operating Systems 2020 File systems November 10, 2020 November 10, 2020 Operating Systems 2020 2 Motivation RAM ... to small to contain all the data ... not persistent beyond the lifetime of a process ... does not easily allow sharing over time... Solution: store stuff on devices (HD, SSD,...) information becomes persistent – no longer dependent on processes, disappears only if deleted administration by operating system November 10, 2020 Operating Systems 2020 3 Main Points File systems Useful abstractions on top of physical devices File system usage patterns File- and Directory Layout November 10, 2020 Operating Systems 2020 4 Users view User does not want to see, know and understand where and how data is stored November 10, 2020 Operating Systems 2020 5 What users want Reliability data survives software crashes etc. data survives device failure Large capacity at low cost High performance Easy Access to data (“finding” data - organisation) Controlled sharing November 10, 2020 Operating Systems 2020 6 Storage Devices Magnetic disks Storage that rarely becomes corrupted Large capacity at low cost Block level random access Slow performance for random access Better performance for streaming access November 10, 2020 Operating Systems 2020 7 Storage Devices Flash memory Storage that rarely becomes corrupted Capacity at intermediate cost (~3 x disk) Block level random access Good performance for reads; worse for random writes November 10, 2020 Operating Systems 2020 8 File System as Illusionist Hides Limitations of Physical Storage Persistence of data stored in file system: Even if crash happens during an update Even if disk block becomes corrupted Even if disk has to be replaced Even if flash memory wears out Naming: Named data instead of disk block numbers Directory organization on top of flat storage Byte addressable data even though devices are block-oriented November 10, 2020 Operating Systems 2020 9 File System as Illusionist (2) Improve performance: Caching data Data placement and data structure organization Security Controlled access to shared data November 10, 2020 Operating Systems 2020 10 types of files regular files text (ASCII or UTF-data) binary data (usually with some internal structure) directories special files (/dev/. , /proc/...) block special files character special files November 10, 2020 Operating Systems 2020 11 accessing data tape drive model sequential access one byte after the other at the end: wind back introduction of floppy disks: any sequence possible random access seek-system-call added allowing to change access-position within file November 10, 2020 Operating Systems 2020 12 File System Workload File sizes Are most files small or large? small Which accounts for more total storage: small or large files? large November 10, 2020 Operating Systems 2020 13 File System Workload File access Are most accesses to small or large files? small Which accounts for more total I/O bytes: small or large files? large November 10, 2020 Operating Systems 2020 14 File System Workload How are files used? Most files are read/written sequentially Some files are read/written randomly Ex: database files, swap files Some files have a pre-defined size at creation Some files start small and grow over time Ex: program stdout, system logs November 10, 2020 Operating Systems 2020 15 Impact on File System Design For small files: Small blocks for storage efficiency Files used together should be stored together For large files: Large blocks for storage efficiciency Contiguous allocation for sequential access Efficient lookup supporting random access November 10, 2020 Operating Systems 2020 16 Impact on File System Design But: we may not know at file creation Whether file will become small or large Whether file is persistent or temporary Whether file will be used sequentially or randomly November 10, 2020 Operating Systems 2020 17 Accessing files - API (Syscalls) Access to files language dependent Implementations build on syscalls creating and deleting files: create(), unlink() reading/writing/positioning from a file: read()/write()/seek() start accessing a file: open() end accessing the file: close() ... November 10, 2020 Operating Systems 2020 18 Design Challenges How do we locate the blocks of a file? Index structure What block size do we use? Index granularity How do we find unused blocks on disk? Free space management November 10, 2020 Operating Systems 2020 19 Design Challenges How can we ensure that blocks are stored “near” each other preserving spatial locality What if machine crashes in middle of a file system op? Reliability November 10, 2020 Operating Systems 2020 20 FS Design Examples FAT FFS NTFS Index structure Linked list Tree (fixed, assym) Tree (dynamic) granularity block block extent free space allocation FAT array Bitmap (fixed location) Bitmap (file) Locality defragmentation Block groups Extents reserve space Best Fit, defragment November 10, 2020 Operating Systems 2020 21 How to find Named Data File Name Directory File Number Index Structure Storage Offset Offset Block Data represented by Filename and Offset (and how many bytes) Directory maps this to FIle number and Offset Index-Structure maps this to Storage and Blocknumber(and likely some offset within that block) November 10, 2020 Operating Systems 2020 22 Directory are files music 320 work 219 foo.txt 871 Directory: File that contains a collection of File Name → file number mappings November 10, 2020 Operating Systems 2020 23 Directories How to find a file in the directory tree Find root directory e.g. stored in a file with a fixed number recursive lookup speed up by caching principle of locality November 10, 2020 Operating Systems 2020 24 Recursive File Name Lookup File 2 bin 737 ˝/˝ usr 924 home 158 File 158 mike 682 ˝/home˝ ada 818 tom 830 File 830 music 320 ˝/home/tom˝ work 219 foo.txt 871 File 871 The quick ˝/home/tom/foo.txt˝ brown fox jumped over the lazy dog. November 10, 2020 Operating Systems 2020 25 Directory API We could use open/read/write/close (from user program) Requires knowing the structure we shall not corrupt directories Operating System must have control over file numbers part of open/create syscall functionality special system calls for directory operations read by user process possible special library functions or syscalls (e.g. getdents - „get directory entries“) helpful November 10, 2020 Operating Systems 2020 26 Directory Layout Directory stored as a file linear search to find filename OK for small directories November 10, 2020 Operating Systems 2020 27 Large Directories: B Trees (XFS) November 10, 2020 Operating Systems 2020 28 Large Directories: Storage November 10, 2020 Operating Systems 2020 29 Linking A file may have multiple names in multiple directories Hard-Link: multiple file directory entries that map different path names to the same file number Soft-Link: map one name to another name (including path) November 10, 2020 Operating Systems 2020 30 Linking - Hard Link November 10, 2020 Operating Systems 2020 31 Linking - Soft Link November 10, 2020 Operating Systems 2020 32 Finding Data We must be able to find the blocks belonging to a file Other goals: support sequential data placement to maximize sequential file access provide efficient random access to any block be efficient for small files be scalable to support large files provide a place to store Metadata November 10, 2020 Operating Systems 2020 33 Case Studies FAT Unix FFS NTFS COW/ZFS November 10, 2020 Operating Systems 2020 34 FAT MS FAT file system implemented late 70ies main file system for MS-DOS and early windows many enhancements FAT-32: 228blocks and 232 − 1bytes Today: Variant still in use for SD-Cards etc. November 10, 2020 Operating Systems 2020 35 FAT-32 November 10, 2020 Operating Systems 2020 36 FAT-32 FAT - File Allocation Table array of 32 bit entries reserved area of volume file: linked list of FAT entries, each entry points to successor or „end of file“ One entry for each block in the volume File number: index of the first entry in the FAT November 10, 2020 Operating Systems 2020 37 FAT free space tracking/locality FAT[i] == 0 → block is free Find free block: search FAT allocation: scan through FAT from last entry allocated may fragment file-system run defragmentation tool November 10, 2020 Operating Systems 2020 38 FAT discussionI simple, widely in use / supported Poor locality → fragmented files Poor random access → need to traverse FAT Limited metadata and access control No hard links limitation of volume and file size max. 228 blocks block size up to 256 KB - internal fragmentation lack of modern reliability techniques November 10, 2020 Operating Systems 2020 39 UNIX FFS Unix Fast File System - released mid 1980 many data-structures identical to Ritchie/Thomposon’s original UNIX FS (1970ies) Tree-based multilevel index locality heuristics Today: ext2/ext3 based on FFS November 10, 2020 Operating Systems 2020 40 Unix Figure: Disk layout, classical example November 10, 2020 Operating Systems 2020 41 Implementation („classical Unix“)I boot block: Boot Loader, to boot system super block: Infos on file system, e.g. size of partition block size pointer to free block list numberings of the root directory „magic number“ (identifies file system type) November 10, 2020 Operating Systems 2020 42 Implementation („classical Unix“) index nodes (inodes) 1:1-relation between inodes and files data blocks November 10, 2020 Operating Systems 2020 43 Index structure - inodes November 10, 2020 Operating Systems 2020 44 inodes November 10, 2020 Operating Systems 2020 45 InodeI Attributes: type (file, directory, character special