Lecture 22: Mass Storage

Fall 2018 Jason Tang

Slides based upon Operating System Concepts slides, http://codex.cs.yale.edu/avi/os-book/OS9/slide-dir/index.html, Copyright Silberschatz, Galvin, and Gagne, 2013

Topics

• Mass Storage Systems

• Disk Scheduling

• Disk Management

• Boot Process

Mass Storage Systems

• After startup, OS loads applications from secondary storage into main memory:

• ROM / EPROM / EEPROM / FPGA

• Magnetic hard disk

• Non-volatile random-access memory (NVRAM)

• Tape drive

• Or others: boot CD, Zip drive, punch card, …

EPROM

• Erasable programmable read-only memory

• Manufacturer or OEM burns image into EPROM

• Used in older systems and in modern embedded systems

Magnetic Hard Disk (HDD)

• Spindle motor spins a stack of platters coated with magnetic material

• Spins at 5400 to over 10,000 RPM

• Actuator motor moves a disk head over the platters, to sense polarity of the track underneath

https://www.technobuffalo.com/2012/11/24/western-digital-expands-high-performance-wd-black-hard-drive-line-to-4tb/

Magnetic Hard Disk (HDD)

• Transfer rate: rate at which data flow between drive and computer

• Positioning time (random-access time): time to move disk arm to desired cylinder (seek time) plus time for desired sector to rotate under disk head (rotational latency)

• Head crash: when disk head hits platter

• Attached to computer via an I/O bus: SCSI, IDE, SATA, Fibre Channel, USB, Thunderbolt, others

• Host controller in computer uses bus to talk to disk controller

Hard Disk Performance

• Average access time = average seek time + average latency

• Server-grade hard disks average 5 ms (3 ms seek time + 2 ms latency)

• Average I/O time = average access time + (transfer size / transfer rate) + controller overhead

• Example: transfer 4 KB data with 9 ms average access time, 1 Gb/s transfer rate, 0.1 ms controller overhead

= 9 ms + (4 KB × 8 b/B ÷ 1 Gb/s) + 0.1 ms = 9 ms + (32,000 b ÷ 1,000,000,000 b/s) + 0.1 ms

= 9 ms + 0.032 ms + 0.1 ms ≈ 9.13 ms (or about 100,000 times slower than modern RAM)
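
• A minimal C sketch of the same calculation (the numbers come from the example above; the decimal 1 KB = 1000 B convention follows the formula, and the function name is illustrative):

#include <stdio.h>

/* Average I/O time = average access time + (transfer size / transfer rate)
 * + controller overhead.  Sizes in bytes, rate in bits per second, times in ms. */
static double avg_io_time_ms(double access_ms, double bytes,
                             double rate_bits_per_sec, double overhead_ms)
{
    double transfer_ms = (bytes * 8.0 / rate_bits_per_sec) * 1000.0;
    return access_ms + transfer_ms + overhead_ms;
}

int main(void)
{
    /* Example from the slide: 9 ms access, 4 KB transfer, 1 Gb/s, 0.1 ms overhead */
    double t = avg_io_time_ms(9.0, 4000.0, 1e9, 0.1);
    printf("average I/O time = %.3f ms\n", t);   /* prints 9.132 ms */
    return 0;
}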

Non-Volatile RAM (NVRAM)

• Used in modern systems for secondary storage, like a hard drive

• Also known as flash memory, flash drive, or solid-state drive (SSD)

• Two variants: NAND flash (most common) or NOR flash

• Requires less power, much faster than magnetic hard disk

• Does not suffer from head crash

• Block erasure: must erase entire block at a time

• Memory wear: a given block can only be erased a finite number of times (usually over 10,000)

Tape Drive

• Early read/write secondary storage medium

• Linear search: tape drive had to fast-forward or rewind spool of tape to correct place; very slow

• Can hold up to 200 TB

• Transfer rate on order of 140 MB/s (40 times slower than hard disk)

• Origin of tar (“tape archive”) command

Disk Structure

• Addressed as large 1-dimensional array of logical blocks

• Block is smallest unit of transfer; HDD is usually 512 or 4096 bytes, NAND flash anywhere from 512 bytes to 128 KiB

• On HDD, sector 0 is first sector on first track on outermost cylinder

• Logical to physical addressing tricky, due to bad sectors

• For HDD, the number of sectors per track is not constant: at a constant rotational speed, the longer outer tracks hold more sectors than the inner tracks
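
• Below is a sketch of the classic logical-block-to-cylinder/head/sector mapping under an idealized geometry with a fixed number of sectors per track; real drives violate this (zoned recording, remapped bad sectors), which is what makes the real mapping tricky. The geometry constants are illustrative.

#include <stdio.h>

/* Idealized geometry: every track holds the same number of sectors.
 * Real HDDs violate this (zoned recording, spare/bad sectors). */
#define HEADS              16      /* tracks per cylinder  (illustrative) */
#define SECTORS_PER_TRACK  63      /* sectors per track    (illustrative) */

struct chs { unsigned cylinder, head, sector; };

static struct chs lba_to_chs(unsigned long lba)
{
    struct chs a;
    a.cylinder = lba / (HEADS * SECTORS_PER_TRACK);
    a.head     = (lba / SECTORS_PER_TRACK) % HEADS;
    a.sector   = lba % SECTORS_PER_TRACK;       /* 0-based here */
    return a;
}

int main(void)
{
    struct chs a = lba_to_chs(123456);
    printf("cyl=%u head=%u sec=%u\n", a.cylinder, a.head, a.sector);
    return 0;
}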

I/O Scheduling

• Just as the OS has a process scheduler to decide which process to run, the OS has an I/O scheduler to decide which disk operation to perform next

• On HDD, minimize access time by reducing both the distance the disk arm must move to the correct cylinder (seek time) and the distance the platter must rotate (rotational latency)

• On SSD, combine write requests to same block

• Disk bandwidth: total number of bytes transferred, divided by total time from the start of the first request to the completion of the last transfer

• While data being transferred via DMA, OS can do other things

Disk Scheduling

• Disk I/O request includes input or output mode, disk address, memory address, and number of sectors to transfer (see the sketch below)

• OS maintains queue of requests

• Idle disk can immediately work on I/O request, while requests are queued for a busy disk

• Optimization algorithms only make sense when a queue exists

• HDD controllers have small buffers and can manage a queue of I/O requests
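
• A sketch of what such a request might look like as a C structure, with a simple FIFO queue; the field names, types, and queue are illustrative, not any particular OS's definitions:

#include <stddef.h>
#include <stdio.h>

/* One pending disk I/O request, per the fields listed above. */
enum io_mode { IO_READ, IO_WRITE };

struct disk_request {
    enum io_mode mode;           /* input or output                  */
    unsigned long disk_addr;     /* starting logical block (LBA)     */
    void *mem_addr;              /* buffer in main memory            */
    size_t nsectors;             /* number of sectors to transfer    */
    struct disk_request *next;   /* OS keeps requests in a queue     */
};

/* Append a request to the tail of a simple FIFO queue. */
static void enqueue(struct disk_request **head, struct disk_request *req)
{
    req->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = req;
}

int main(void)
{
    struct disk_request *queue = NULL;
    struct disk_request r1 = { IO_READ,  100, NULL,  8, NULL };
    struct disk_request r2 = { IO_WRITE, 4096, NULL, 16, NULL };

    enqueue(&queue, &r1);
    enqueue(&queue, &r2);

    for (struct disk_request *r = queue; r; r = r->next)
        printf("%s %lu sectors at LBA %lu\n",
               r->mode == IO_READ ? "read" : "write",
               (unsigned long)r->nsectors, r->disk_addr);
    return 0;
}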

FCFS Scheduling (HDD)

• Example: requests for cylinders 98, 183, 37, 122, 14, 124, 65, 67; head is currently on cylinder 53

• Requests serviced first-come, first-served

• Note the wild swing from cylinder 37 to 122 and back to 14; it would be faster if 37 and 14 were serviced consecutively
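
• A small C sketch that replays this example under FCFS and totals the head movement (640 cylinders for this queue):

#include <stdio.h>
#include <stdlib.h>

/* FCFS: service requests in arrival order; total head movement is the
 * sum of distances between consecutive cylinders.  Queue and starting
 * position are the example from the slide. */
int main(void)
{
    int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = sizeof(queue) / sizeof(queue[0]);
    int head = 53, total = 0;

    for (int i = 0; i < n; i++) {
        total += abs(queue[i] - head);
        head = queue[i];
    }
    printf("FCFS total head movement: %d cylinders\n", total);  /* 640 */
    return 0;
}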

SSTF (HDD)

• Shortest Seek Time First selects the request with minimum seek time from the current head position

• Form of shortest-job first scheduling, but may starve a request
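
• The same example queue under SSTF, as a sketch; it repeatedly picks the closest pending cylinder, giving 236 cylinders of head movement here:

#include <stdio.h>
#include <stdlib.h>

/* SSTF: repeatedly pick the pending request closest to the current head
 * position.  Same example queue and starting position as the FCFS slide. */
int main(void)
{
    int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = sizeof(queue) / sizeof(queue[0]);
    int done[8] = { 0 };                     /* 8 matches the queue length */
    int head = 53, total = 0;

    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)
            if (!done[i] && (best < 0 ||
                             abs(queue[i] - head) < abs(queue[best] - head)))
                best = i;
        total += abs(queue[best] - head);
        head = queue[best];
        done[best] = 1;
    }
    printf("SSTF total head movement: %d cylinders\n", total);  /* 236 */
    return 0;
}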

SCAN (HDD)

• Disk arm starts at one end of disk, moves towards other end, servicing requests until it reaches other end; head then reverses direction

• Also known as elevator algorithm

• Works well if requests are uniformly distributed; when the head reverses at one end, the heaviest density of waiting requests is at the other end of the disk, and those requests have waited the longest
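
• A sketch of SCAN on the same example queue, assuming the head starts at cylinder 53 moving toward cylinder 0 (the initial direction is an assumption, since the slide does not give one):

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* SCAN (elevator): the head sweeps in one direction, servicing requests
 * in order, continues to the edge of the disk, then reverses. */
int main(void)
{
    int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = sizeof(queue) / sizeof(queue[0]);
    int start = 53, head = 53, total = 0;

    qsort(queue, n, sizeof(int), cmp);

    /* first sweep: toward cylinder 0, servicing requests below the head */
    for (int i = n - 1; i >= 0; i--)
        if (queue[i] <= start) { total += head - queue[i]; head = queue[i]; }

    total += head;          /* continue all the way to cylinder 0, then reverse */
    head = 0;

    /* second sweep: upward, servicing the remaining requests */
    for (int i = 0; i < n; i++)
        if (queue[i] > start) { total += queue[i] - head; head = queue[i]; }

    printf("SCAN total head movement: %d cylinders\n", total);  /* 236 */
    return 0;
}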

Circular SCAN (C-SCAN) (HDD)

• More uniform wait time than SCAN

• When head reaches one end, it returns to the beginning of the disk instead of reversing direction, servicing no requests on the return trip

• Treats cylinders as circular list that wraps around from last cylinder to first
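
• A sketch of C-SCAN on the same example queue, assuming cylinders 0-199 and counting the return seek as head movement (whether to count the return trip is a modeling choice):

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* C-SCAN: service requests only while sweeping toward higher cylinders;
 * at the last cylinder, seek back to cylinder 0 and sweep again. */
int main(void)
{
    int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = sizeof(queue) / sizeof(queue[0]);
    int start = 53, head = 53, total = 0, last = 199;  /* assume cylinders 0..199 */

    qsort(queue, n, sizeof(int), cmp);

    /* upward sweep from the starting position */
    for (int i = 0; i < n; i++)
        if (queue[i] >= start) { total += queue[i] - head; head = queue[i]; }

    total += (last - head) + last;   /* go to the end, then jump back to cylinder 0 */
    head = 0;

    /* second upward sweep for the requests below the starting position */
    for (int i = 0; i < n; i++)
        if (queue[i] < start) { total += queue[i] - head; head = queue[i]; }

    printf("C-SCAN total head movement: %d cylinders\n", total);  /* 382 */
    return 0;
}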

LOOK and Circular-LOOK (C-LOOK) (HDD)

• Arm only goes as far as last request in each direction, then reverses direction immediately, without going all of the way to end of disk

• C-LOOK is LOOK with circular list
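
• A C-LOOK sketch on the same example queue; the jump from the highest pending request back to the lowest is counted as head movement here (again a modeling choice):

#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b) { return *(const int *)a - *(const int *)b; }

/* C-LOOK: sweep upward only as far as the highest pending request, then
 * jump back to the lowest pending request and continue upward. */
int main(void)
{
    int queue[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = sizeof(queue) / sizeof(queue[0]);
    int start = 53, head = 53, total = 0;

    qsort(queue, n, sizeof(int), cmp);

    for (int i = 0; i < n; i++)              /* upward sweep */
        if (queue[i] >= start) { total += queue[i] - head; head = queue[i]; }

    total += head - queue[0];                /* jump to the lowest pending request */
    head = queue[0];

    for (int i = 0; i < n; i++)              /* finish the remaining requests */
        if (queue[i] < start && queue[i] != head) { total += queue[i] - head; head = queue[i]; }

    printf("C-LOOK total head movement: %d cylinders\n", total);  /* 322 */
    return 0;
}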

Disk Management (HDD)

• Low-level formatting: dividing a disk into sectors that disk controller can read and write

• Each sector holds header information, data, plus an error-correcting code (ECC)

• Usually done by manufacturer

• Usually, disk is partitioned into one or more groups of cylinders, where each partition treated as a logical disk

• Logical formatting: creating a file system

Disk Management (SSD)

• On flash drives, a single bit can always be flipped from 1 to 0, but the reverse is not physically possible

• Must instead erase entire flash sector (flip all bits in that sector from 0 to 1)

• Erasing is slow, from 3 ms per sector for NAND flash up to 5 s for NOR flash

• No such thing as low-level formatting a flash drive, though file systems still exist

• File systems optimized for SSDs operate very differently than ones designed for HDDs
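
• A toy C sketch of why in-place overwrite is impossible: programming can only clear bits (the new data is ANDed into the old contents), so restoring any bit to 1 requires erasing the whole block. Real flash is accessed through a controller, not byte-by-byte like this; sizes and names are illustrative.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 16   /* illustrative; real erase blocks are KiB to MiB */

static uint8_t block[BLOCK_SIZE];

/* "Program": flash can only clear bits, so the new data is ANDed in.
 * Any bit already 0 stays 0 even if the new data wants a 1. */
static void program_byte(int off, uint8_t value)
{
    block[off] &= value;
}

/* "Erase": the only way to get bits back to 1, and it hits the whole block. */
static void erase_block(void)
{
    memset(block, 0xFF, sizeof(block));
}

int main(void)
{
    erase_block();                 /* fresh block: all bits 1 (0xFF)        */
    program_byte(0, 0xA5);         /* works: only clears bits               */
    program_byte(0, 0xFF);         /* attempt to "rewrite" byte 0 as 0xFF   */
    printf("byte 0 = 0x%02X\n", block[0]);        /* still 0xA5, not 0xFF   */

    erase_block();                 /* must erase the whole block to recover */
    printf("byte 0 after erase = 0x%02X\n", block[0]);   /* 0xFF            */
    return 0;
}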

Bad Blocks

• For HDD, bad blocks discovered during low-level initialization

• Disk controller automatically skips over that block when formatting disk

• For SSD, bad blocks discovered during erase

• OS maintains list of bad blocks and skips them during operations

• Bad blocks can also be found during operations, when controller/OS calculates a different ECC than one currently stored

• Spare sectors: extra space not normally allocated, used when replacing a bad sector
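
• A minimal sketch of sector sparing: keep a remap table from bad sectors to spare sectors and consult it on every access. The table, its size, and the function names are illustrative; a real controller stores the table persistently.

#include <stdio.h>

#define NSPARE 4

/* Remap table: bad logical sector -> spare sector.  Filled during
 * formatting or when an ECC mismatch reveals a new bad sector. */
static struct { unsigned long bad, spare; } remap[NSPARE];
static int nremap = 0;

static void mark_bad(unsigned long bad, unsigned long spare)
{
    if (nremap < NSPARE) {
        remap[nremap].bad = bad;
        remap[nremap].spare = spare;
        nremap++;
    }
}

/* Translate a requested sector, substituting a spare if it was marked bad. */
static unsigned long resolve(unsigned long sector)
{
    for (int i = 0; i < nremap; i++)
        if (remap[i].bad == sector)
            return remap[i].spare;
    return sector;
}

int main(void)
{
    mark_bad(87, 1000001);                        /* sector 87 failed its ECC check */
    printf("sector 87 -> %lu\n", resolve(87));    /* 1000001 (spare)                */
    printf("sector 88 -> %lu\n", resolve(88));    /* 88 (unchanged)                 */
    return 0;
}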

BIOS versus UEFI

• At startup, ROM firmware loads OS from a known location within secondary storage into main memory

Feature                  BIOS                         UEFI
Storage System           Floppy, hard disk, CD-ROM    Any, including PXE boot
Partition Table          MBR                          GPT
Maximum Hard Disk Size   2.1 TB                       9.4 ZB
Security                 None                         Secure Boot

BIOS versus UEFI

https://phoenixts.com/blog/uefi-vs-legacy-bios/

http://teck-in.blogspot.com/2013/09/who-invented-uefi.html