CS162: Operating Systems and Systems Programming
Lecture 20: Filesystems (Con't), Reliability, Transactions
April 14th, 2020
Prof. John Kubiatowicz
http://cs162.eecs.Berkeley.edu

Recall: Multilevel Indexed Files (Original 4.1 BSD)
• Sample file in multilevel indexed format:
  – 10 direct ptrs, 1K blocks
  – How many accesses for block #23 (assume the file header is accessed on open)? (see the sketch below)
    » Two: one for the indirect block, one for the data
  – How about block #5?
    » One: just the data block
  – Block #340?
    » Three: double indirect block, indirect block, and data
• UNIX 4.1 pros and cons
  – Pros: Simple (more or less); files can easily expand (up to a point); small files are particularly cheap and easy
  – Cons: Lots of seeks (led to the 4.2 Fast File System optimizations)
• Ext2/3 (Linux):
  – 12 direct ptrs, triply-indirect blocks, settable block size (4K is common)
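The access counts above can be checked mechanically. Here is a minimal sketch (not from the lecture) that computes the number of disk accesses for a logical block number, assuming 1K blocks and 4-byte block pointers (so 256 pointers per indirect block) and assuming the inode is already in memory; accesses_for_block is a hypothetical helper name:

    /* Sketch: count disk accesses to reach logical block N of a
     * 4.1 BSD-style multilevel indexed file. */
    #include <stdio.h>

    #define NDIRECT      10     /* direct pointers in the inode (assumed) */
    #define PTRS_PER_BLK 256    /* 1K block / 4-byte pointers (assumed)   */

    static int accesses_for_block(long n)
    {
        if (n < NDIRECT)
            return 1;                               /* data block only           */
        n -= NDIRECT;
        if (n < PTRS_PER_BLK)
            return 2;                               /* indirect + data           */
        n -= PTRS_PER_BLK;
        if (n < (long)PTRS_PER_BLK * PTRS_PER_BLK)
            return 3;                               /* dbl indirect + ind + data */
        return 4;                                   /* triple indirect chain     */
    }

    int main(void)
    {
        printf("block #5:   %d\n", accesses_for_block(5));    /* prints 1 */
        printf("block #23:  %d\n", accesses_for_block(23));   /* prints 2 */
        printf("block #340: %d\n", accesses_for_block(340));  /* prints 3 */
        return 0;
    }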

Recall: Buffer Cache
• Kernel must copy disk blocks to main memory to access their contents, and write them back if modified
  – Could be data blocks, inodes, directory contents, etc.
  – Possibly dirty (modified and not yet written back)
• Key Idea: exploit locality by caching disk data in memory
  – Name translations: mapping from paths → inodes
  – Disk blocks: mapping from block address → disk content
• Buffer Cache: memory used to cache kernel resources, including disk blocks and name translations
  – Can contain "dirty" blocks (blocks not yet written to disk)

File System Buffer Cache
• OS implements a cache of disk blocks for efficient access to data, directories, inodes, and the freemap
[Figure: disk regions (data blocks, inodes, directory data blocks, free bitmap) cached in memory block frames; a process's open file descriptor (via its PCB) refers to cached blocks during reading and writing]

File System Buffer Cache: open
• {Load a block of the directory; search it for the <name>:inumber mapping}+
• Load the inode
• Create a reference via the open file descriptor
[Figure: a directory block, then an inode block, move from "free" to in-use cache frames as open proceeds]

File System Buffer Cache: Read?
• From the inode, traverse the index structure to find the data block; load the data block; copy all or part of it into the user's read buffer

File System Buffer Cache: Write?
• Process is similar to read, but may allocate new blocks (updating the free map); dirty blocks need to be written back to disk; the inode, too?

File System Buffer Cache: Eviction?
• Blocks being written back to disk go through a transient "dirty" state
[Figure: the cache now holds dir, inode, and dirty data blocks; a dirty block must be written back before its frame can be reused]

Buffer Cache Discussion
• Implemented entirely in OS software
  – Unlike hardware memory caches and the TLB
• Blocks go through transitional states between free and in-use
  – Being read from disk, being written to disk
  – Other processes can run in the meantime, etc.
• Blocks are used for a variety of purposes
  – inodes, data for directories and files, the freemap
  – OS maintains pointers into them
• Termination – e.g., process exit – open, read, write
• Replacement – what to do when it fills up? (see the sketch below)
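To make these states and the replacement step concrete, here is a minimal single-threaded buffer-cache sketch (an illustration, not the kernel's actual code). disk_read()/disk_write() are hypothetical drivers, stubbed out so the example compiles; a real kernel adds per-buffer locking, hashing, and asynchronous I/O:

    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 1024
    #define NBUF       64

    enum buf_state { B_FREE, B_CLEAN, B_DIRTY };

    struct buf {
        enum buf_state state;
        long           blockno;      /* which disk block is cached here */
        unsigned long  last_used;    /* logical timestamp for LRU       */
        char           data[BLOCK_SIZE];
    };

    static struct buf    cache[NBUF];
    static unsigned long ticks;

    static void disk_read(long blockno, char *dst)
    {
        printf("disk read  block %ld\n", blockno);   /* stub */
        memset(dst, 0, BLOCK_SIZE);
    }

    static void disk_write(long blockno, const char *src)
    {
        printf("disk write block %ld\n", blockno);   /* stub */
        (void)src;
    }

    /* Return a cached copy of blockno, reading from disk on a miss. */
    static struct buf *bread(long blockno)
    {
        struct buf *victim = &cache[0];

        for (int i = 0; i < NBUF; i++) {
            struct buf *b = &cache[i];
            if (b->state != B_FREE && b->blockno == blockno) {
                b->last_used = ++ticks;        /* hit: bump LRU clock   */
                return b;
            }
            if (b->last_used < victim->last_used)
                victim = b;                    /* track LRU candidate   */
        }

        /* Miss: evict the least recently used buffer, writing it back
         * first if dirty (the transient state discussed above). */
        if (victim->state == B_DIRTY)
            disk_write(victim->blockno, victim->data);

        disk_read(blockno, victim->data);
        victim->blockno   = blockno;
        victim->state     = B_CLEAN;
        victim->last_used = ++ticks;
        return victim;
    }

    /* Delayed write: mark dirty now; a flusher writes it back later. */
    static void bwrite(struct buf *b, const char *src)
    {
        memcpy(b->data, src, BLOCK_SIZE);
        b->state = B_DIRTY;
    }

    int main(void)
    {
        char payload[BLOCK_SIZE] = "hello";
        bwrite(bread(42), payload);   /* read block 42, then dirty it */
        bread(42);                    /* second access hits in cache  */
        return 0;
    }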
File System Caching
• Replacement policy? LRU
  – Can afford the overhead of a full LRU implementation
  – Advantages:
    » Works very well for name translation
    » Works well in general, as long as memory is big enough to accommodate a host's working set of files
  – Disadvantages:
    » Fails when some application scans through the file system, thereby flushing the cache with data used only once
    » Example: find . -exec grep foo {} \;
• Other replacement policies?
  – Some systems allow applications to request other policies
  – Example, "Use Once": the file system can discard blocks as soon as they are used

File System Caching (con't)
• Cache size: how much memory should the OS allocate to the buffer cache vs. virtual memory?
  – Too much memory to the file system cache ⇒ won't be able to run many applications at once
  – Too little memory to the file system cache ⇒ many applications may run slowly (disk caching not effective)
  – Solution: adjust the boundary dynamically so that the disk access rates for paging and file access are balanced
• Read-ahead prefetching: fetch sequential blocks early (sketched below)
  – Key Idea: exploit the fact that the most common file access is sequential by prefetching subsequent disk blocks ahead of the current read request (if they are not already in memory)
  – Elevator algorithm can efficiently interleave groups of prefetches from concurrent applications
  – How much to prefetch?
    » Too many blocks imposes delays on requests by other applications
    » Too few causes many seeks (and rotational delays) among concurrent file requests
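A minimal sketch of sequential read-ahead follows. cache_read() and prefetch_async() are hypothetical helpers (a demand fetch through the buffer cache, and a queued background disk read), and the window of 4 blocks stands in for the "how much to prefetch?" tunable discussed above:

    #include <stdio.h>

    #define RA_WINDOW 4   /* assumed tunable prefetch depth */

    struct ra_state {
        long next_expected;   /* block we predict the app asks for next */
    };

    static void cache_read(long blockno)
    {
        printf("demand read block %ld\n", blockno);   /* stub */
    }

    static void prefetch_async(long blockno)
    {
        printf("prefetch    block %ld\n", blockno);   /* stub */
    }

    static void file_read_block(struct ra_state *ra, long blockno)
    {
        cache_read(blockno);

        /* Only prefetch when the pattern looks sequential; prefetching
         * on random access would just waste disk bandwidth. */
        if (blockno == ra->next_expected)
            for (long i = 1; i <= RA_WINDOW; i++)
                prefetch_async(blockno + i);

        ra->next_expected = blockno + 1;
    }

    int main(void)
    {
        struct ra_state ra = { 0 };
        for (long b = 0; b < 3; b++)
            file_read_block(&ra, b);   /* sequential: triggers read-ahead */
        return 0;
    }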
Delayed Writes
• Delayed writes: writes to files are not immediately sent to disk
  – So the buffer cache is a write-back cache
• write() copies data from a user-space buffer to a kernel buffer
  – Enabled by the presence of the buffer cache: written file blocks can stay in the cache for a while
  – Other applications read the data from the cache instead of the disk
  – The cache is transparent to user programs
• Flushed to disk periodically
  – In Linux: kernel threads flush the buffer cache every 30 sec. in the default setup
• Disk scheduler can efficiently order lots of requests
  – Elevator algorithm can rearrange writes to avoid random seeks

Delayed Writes (con't)
• Delay block allocation: may be able to allocate multiple blocks at the same time for a file, keeping them contiguous
• Some files never actually make it all the way to disk
  – Many short-lived files
• But what if the system crashes before a buffer cache block is flushed to disk?
• And what if it was a directory file?
  – Lose the pointer to an inode
• ⇒ File systems need recovery mechanisms

Important "ilities"
• Availability: the probability that the system can accept and process requests
  – Often measured in "nines" of probability: a 99.9% probability is "3-nines of availability"
  – Key idea here is independence of failures
• Durability: the ability of a system to recover data despite faults
  – This is fault tolerance applied to data
  – Doesn't necessarily imply availability: the information on the pyramids was very durable, but could not be accessed until the discovery of the Rosetta Stone
• Reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time (IEEE definition)
  – Usually stronger than simple availability: means the system is not only "up", but also working correctly
  – Includes availability, security, fault tolerance/durability
  – Must make sure data survives system crashes, disk crashes, and other problems

How to Make the File System Durable?
• Disk blocks contain Reed-Solomon error-correcting codes (ECC) to deal with small defects in the disk drive
  – Can allow recovery of data from small media defects
• Make sure writes survive in the short term
  – Either abandon delayed writes, or
  – Use special battery-backed RAM (non-volatile RAM, or NVRAM) for dirty blocks in the buffer cache
• Make sure data survives in the long term
  – Need to replicate! More than one copy of the data!
  – Important element: independence of failures
    » Could put copies on one disk, but if the disk head fails…
    » Could put copies on different disks, but if the server fails…
    » Could put copies on different servers, but if the building is struck by lightning…
    » Could put copies on servers on different continents…

RAID: Redundant Arrays of Inexpensive Disks
• Classified by David Patterson, Garth A. Gibson, and Randy Katz here at UCB in 1987
  – The classic paper was the first to evaluate multiple schemes
  – Berkeley researchers were looking for alternatives to big expensive disks
  – Redundancy was necessary because cheap disks were more error-prone
• Data stored on multiple disks (redundancy)
• Either in software or hardware
  – In the hardware case, done by the disk controller; the file system may not even know that there is more than one disk in use
• Initially five levels of RAID (more now)

RAID 1: Disk Mirroring/Shadowing
• Each disk is fully duplicated onto its "shadow" (a recovery group)
  – For high-I/O-rate, high-availability environments
  – Most expensive solution: 100% capacity overhead
• Bandwidth is sacrificed on write (see the sketch below):
  – Logical write = two physical writes
  – Highest bandwidth when disk heads and rotation are fully synchronized (hard to do exactly)
• Reads may be optimized
  – Can have two independent reads to the same data
• Recovery:
  – Disk failure ⇒ replace the disk and copy the data to the new disk
  – Hot spare: an idle disk already attached to the system, to be used for immediate replacement

RAID 5+: High I/O Rate Parity
• Allow more disks to fail!
• Stripe …
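To make the RAID 1 write penalty above concrete, here is a minimal mirroring sketch: a logical write becomes two physical writes, while reads alternate between the two copies. disk_write()/disk_read() are hypothetical per-disk drivers, stubbed out so the example compiles:

    #include <stdio.h>

    struct mirror {
        int primary;    /* disk ids of the mirrored pair */
        int shadow;
    };

    static void disk_write(int disk, long blockno)
    {
        printf("write disk %d, block %ld\n", disk, blockno);   /* stub */
    }

    static void disk_read(int disk, long blockno)
    {
        printf("read  disk %d, block %ld\n", disk, blockno);   /* stub */
    }

    /* Logical write = two physical writes (the bandwidth cost of RAID 1). */
    static void mirror_write(struct mirror *m, long blockno)
    {
        disk_write(m->primary, blockno);
        disk_write(m->shadow,  blockno);
    }

    /* Alternating reads between the copies means two independent reads
     * of the same data can proceed in parallel on real hardware. */
    static void mirror_read(struct mirror *m, long blockno)
    {
        static int turn;
        disk_read((turn++ & 1) ? m->shadow : m->primary, blockno);
    }

    int main(void)
    {
        struct mirror m = { 0, 1 };
        mirror_write(&m, 7);   /* hits both disks               */
        mirror_read(&m, 7);    /* served by one copy            */
        mirror_read(&m, 7);    /* next read uses the other copy */
        return 0;
    }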
