Storage Devices, Basic File System Design

CS162 Operating Systems and Systems Programming
Lecture 17: Data Storage to File Systems
October 29, 2019
Prof. David Culler
http://cs162.eecs.Berkeley.edu
Read: A&D Ch 12, 13.1-3.2

Recall: OS Storage abstractions

Key Unix I/O Design Concepts
• Uniformity – everything is a file
  – File operations, device I/O, and interprocess communication through open, read/write, close
  – Allows simple composition of programs
    • find | grep | wc …
• Open before use
  – Provides opportunity for access control and arbitration
  – Sets up the underlying machinery, i.e., data structures
• Byte-oriented
  – Even if blocks are transferred, addressing is in bytes
• Kernel buffered reads
  – Streaming and block devices look the same; read blocks, yielding processor to other tasks
• Kernel buffered writes
  – Completion of out-going transfer decoupled from the application, allowing it to continue
• Explicit close

What's below the surface ??
[Figure: I/O layers from top to bottom, with the handle used at each level]
• Application / Service
• High Level I/O: streams
• Low Level I/O: handles
• Syscall: registers
• File System: descriptors
• I/O Driver: Commands and Data Transfers
• Disks, Flash, Controllers, DMA
Notes on the figure: a file descriptor number is an int; File Descriptors are a struct with all the info about the files.

The file system abstraction
• File
  – Named collection of data in a file system
  – POSIX File data: sequence of bytes
    • Could be text, binary, serialized objects, …
  – File Metadata: information about the file
    • Size, Modification Time, Owner, Security info
    • Basis for access control
• Directory
  – "Folder" containing files & Directories
  – Hierarchical (graphical) naming
    • Path through the directory graph
    • Uniquely identifies a file or directory
    • /home/ff/cs162/public_html/fa14/index.html
  – Links and Volumes (later)

10/29/19 CS162 © UCB Fa19

Review: Memory-Mapped Display Controller
• Memory-Mapped:
  – Hardware maps control registers and display memory into physical address space
    » Addresses set by HW jumpers or at boot time
  – Simply writing to display memory (also called the "frame buffer") changes image on screen
    » Addr: 0x8000F000-0x8000FFFF
  – Writing graphics description to cmd queue
    » Say enter a set of triangles describing some scene
    » Addr: 0x80010000-0x8001FFFF
  – Writing to the command register may cause on-board graphics hardware to do something
    » Say render the above scene
    » Addr: 0x0007F004
• Can protect with address translation
[Figure: physical address space. Graphics command queue from 0x80010000 up to 0x80020000, display memory at 0x8000F000, command register at 0x0007F004, status register at 0x0007F000.]

Review: Transferring Data To/From Controller
• Programmed I/O:
  – Each byte transferred via processor in/out or load/store
  – Pro: Simple hardware, easy to program
  – Con: Consumes processor cycles proportional to data size
• Direct Memory Access:
  – Give controller access to memory bus
  – Ask it to transfer data blocks to/from memory directly
• Sample interaction with DMA controller (from OSC book):
[Figure: numbered steps 1-6 of a DMA transfer]

Recall: I/O Device Notifying the OS
• The OS needs to know when:
  – The I/O device has completed an operation
  – The I/O operation has encountered an error
• I/O Interrupt:
  – Device generates an interrupt whenever it needs service
  – Pro: handles unpredictable events well
  – Con: interrupts have relatively high overhead
• Polling:
  – OS periodically checks a device-specific status register
    » I/O device puts completion information in status register
  – Pro: low overhead
  – Con: may waste many cycles on polling if I/O operations are infrequent or unpredictable
• Actual devices combine both polling and interrupts
  – For instance, a high-bandwidth network adapter:
    » Interrupt for first incoming packet
    » Poll for following packets until hardware queues are empty

Recall: Device Drivers
• Device Driver: Device-specific code in the kernel that interacts directly with the device hardware
  – Supports a standard, internal interface
  – Same kernel I/O system can interact easily with different device drivers
  – Special device-specific configuration supported with the ioctl() system call
• Device Drivers typically divided into two pieces:
  – Top half: accessed in call path from system calls
    » Implements a set of standard, cross-device calls like open(), close(), read(), write(), ioctl(), strategy()
    » This is the kernel's interface to the device driver
    » Top half will start I/O to device, may put thread to sleep until finished
  – Bottom half: run as interrupt routine
    » Gets input or transfers next block of output
    » May wake sleeping threads if I/O now complete

Recall: Kernel Device Structure
[Figure: kernel subsystems below the System Call Interface]
• Process Management: concurrency, multitasking; Architecture Dependent Code
• Memory Management: virtual memory; Memory Manager
• Filesystems: files and dirs, the VFS; File System Types; Block Devices
• Device Control: TTYs and device access; Device Control
• Networking: connectivity; Network Subsystem; IF drivers

Review: Life Cycle of An I/O Request
[Figure: request path through User Program, Kernel I/O Subsystem, Device Driver Top Half, Device Driver Bottom Half, Device Hardware]

Basic I/O Performance Concepts
• Response Time or Latency: Time to perform an operation
• Bandwidth or Throughput: Rate at which operations are performed (op/s)
  – Files: MB/s, Networks: Mb/s, Arithmetic: GFLOP/s
• Start up or "Overhead": time to initiate an operation
• Most I/O operations are roughly linear in b bytes
  – Latency(b) = Overhead + b / TransferCapacity

Example (Fast Network or SSD)
• Consider a 1 Gb/s link (B = 125 MB/s) w/ startup cost S = 1 ms
  – Latency(b) = S + b/B
  – Bandwidth = b/(S + b/B) = B*b/(B*S + b) = B/(B*S/b + 1)
  – Half-power Bandwidth ⇒ B/(B*S/b + 1) = B/2
  – Half-power point occurs at b = S*B = 125,000 bytes

Example: at 10 ms startup (like Disk)
[Figure: "Performance of gbps link with 10 ms startup". Latency (us) and Bandwidth (MB/s) plotted against Length (b), for b from 0 to 500,000 bytes.]
• Half-power b = 1,250,000 bytes!

What Determines Peak BW for I/O ?
• Bus Speed
  – PCI-X: 1064 MB/s = 133 MHz x 64 bit (per lane)
  – ULTRA WIDE SCSI: 40 MB/s
  – Serial Attached SCSI & Serial ATA & IEEE 1394 (firewire): 1.6 Gb/s full duplex (200 MB/s)
  – USB 3.0: 5 Gb/s
  – Thunderbolt 3: 40 Gb/s
• Device Transfer Bandwidth
  – Rotational speed of disk
  – Write / Read rate of NAND flash
  – Signaling rate of network link
• Whatever is the bottleneck in the path…

The Amazing Magnetic Disk
[Figure: disk anatomy, labeling Spindle, Head, Arm, Arm Assembly, Platter, Surface, Track, Sector, Motor]
• Unit of Transfer: Sector
  – Ring of sectors forms a track
  – Stack of tracks forms a cylinder
  – Heads position on cylinders
• Disk Tracks ~ 1µm (micron) wide
  – Wavelength of light is ~ 0.5µm
  – Resolution of human eye: 50µm
  – 100K tracks on a typical 2.5" disk
• Separated by unused guard regions
  – Reduces likelihood neighboring tracks are corrupted during writes (still a small non-zero chance)
• Track length varies across disk
  – Outside: More sectors per track, higher bandwidth
  – Disk is organized into regions of tracks with same # of sectors/track
  – Only outer half of radius is used
    » Most of the disk area is in the outer regions of the disk
• Disks so big that some companies (like Google) reportedly only use part of disk for active data
  – Rest is archival data

Shingled Magnetic Recording (SMR)
• Overlapping tracks yield greater density, capacity
• Restrictions on writing, complex DSP for reading
• Examples: Seagate (8TB), Hitachi (10TB)

Magnetic Disks - Performance
[Figure: platter stack labeling Track, Sector, Head, Cylinder, Platter]
• Cylinders: all the tracks under the head at a given point on all surfaces
• Read/write data is a three-stage process:
  – Seek time: position the head/arm over the proper track
  – Rotational latency: wait for desired sector to rotate under r/w head
  – Transfer time: transfer a block of bits (sector) under r/w head
• Seek time = 4-8ms
• One rotation = 8-16ms (3600-7200 RPM)

Magnetic Disks - Queuing
• Disk Latency = Queueing Time + Controller time + Seek Time + Rotation Time + Xfer Time
[Figure: Request → Software Queue (Device Driver) → Hardware Controller → Media Time (Seek+Rot+Xfer) → Result]

Typical Numbers for Magnetic Disk
Parameter | Info / Range
Space/Density | Space: 14TB (Seagate), 8 platters, in 3½ inch form factor! Areal Density: ≥ 1 Terabit/square inch! (PMR, Helium, …)
Average seek time | Typically 4-6 milliseconds.
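
The linear latency model and the disk latency sum from the slides above can be checked numerically. The following is a minimal Python sketch, not code from the lecture: all function names are illustrative, the seek and transfer values in the disk example are hypothetical, and the average rotational latency is assumed to be half a rotation, a common convention that the slides do not state.

```python
def latency(b, overhead_s, capacity_Bps):
    """Latency(b) = Overhead + b / TransferCapacity, in seconds."""
    return overhead_s + b / capacity_Bps


def effective_bandwidth(b, overhead_s, capacity_Bps):
    """Bytes per second actually achieved for a transfer of b bytes."""
    return b / latency(b, overhead_s, capacity_Bps)


def half_power_point(overhead_s, capacity_Bps):
    """Transfer size at which effective bandwidth is B/2, i.e. b = S * B."""
    return overhead_s * capacity_Bps


def rotation_time_ms(rpm):
    """Time for one full rotation: 60,000 ms per minute divided by RPM."""
    return 60_000 / rpm


def disk_latency_ms(queue_ms, controller_ms, seek_ms, rpm, xfer_ms):
    """Queueing + controller + seek + rotation + transfer time.

    Assumption (not from the slides): the rotation term is the average
    case, half of one full rotation.
    """
    avg_rotation_ms = rotation_time_ms(rpm) / 2
    return queue_ms + controller_ms + seek_ms + avg_rotation_ms + xfer_ms


B = 125e6  # 1 Gb/s link expressed in bytes per second

# Startup costs from the two slide examples: 1 ms and 10 ms.
for S in (1e-3, 10e-3):
    b_half = half_power_point(S, B)
    bw = effective_bandwidth(b_half, S, B)
    print(f"S = {S * 1e3:.0f} ms: half-power b = {b_half:,.0f} bytes, "
          f"bandwidth there = {bw / 1e6:.1f} MB/s")

# Hypothetical disk request: 5 ms seek, 7200 RPM, 0.1 ms transfer,
# empty queue, negligible controller time.
print(f"disk latency = {disk_latency_ms(0, 0, 5, 7200, 0.1):.2f} ms")
```

The half-power sizes computed this way agree with the slide values (b = S*B for each startup cost). The rotation helper also shows why rotation time sits in the same range as seek time for 3600-7200 RPM drives.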
