Storage Systems Main Points
Total Page:16
File Type:pdf, Size:1020Kb
Storage Systems Main Points • File systems – Useful abstractions on top of physical devices • Storage hardware characteristics – Magnetic disks and flash memory File Systems • Abstraction on top of persistent storage – Magnetic disk – Flash memory (e.g., USB thumb drive) • Devices provide – Storage that (usually) survives across machine crashes – Block level (random) access – Large capacity at low cost – Relatively slow performance • Magnetic disk read takes 10-20M processor instructions File System as Illusionist: Hide Limitations of Physical Storage • Persistence of data stored in file system: – Even if crash happens during an update – Even if disk block becomes corrupted – Even if flash memory wears out • Naming: – Named data instead of disk block numbers – Directories instead of flat storage – Byte addressable data even though devices are block- oriented • Performance: – Cached data – Data placement and data structure organization • Controlled access to shared data File System Abstraction • File system – Hierarchical organization (directories, subdirectories) – Access control on data • File: named collection of data – Linear sequence of bytes (or a set of sequences) – Read/write or memory mapped • Crash and storage error tolerance – Operating system crashes (and disk errors) leave file system in a valid state • Performance – Achieve close to the hardware limit* in the average case File System Abstraction • Directory – Group of named files or subdirectories – Mapping from file name to file metadata location • Path – String that uniquely identifies file or directory – Ex: /cse/www/education/courses/cse451/12au • Links – Hard link: link from name to metadata location – Soft link: link from name to alternate name • Mount – Mapping from name in one file system to root of another File System: tree structure Root directory sys user dev ker Mario Paolo Emma prog dati dati txt sor bin File System: acyclic graph structure Root directory sys user dev ker Mario Paolo Emma prog dati Dati di Mario txt1 txt2 sor bin f --> Link to file f Data structures: Directories • Each file (and directory) belongs to a directory; • Each directory is a data structure that links file names to file attributes • Example of attributes: file size, address on disk, access rights, time of last access, time of creation, … A possible implementation (FAT file systems): • The directory is a table, it associates each file name to its file descriptor (which includes all attributes of the file) mario name descriptor prog.c Attribute: tipo, indirizzi, ecc. prog.exe Attribute: tipo, indirizzi, ecc. prog.c prog.exe dati dati Attribute: tipo, indirizzi, ecc. Implementation od directory mario Data structures: Directories Implementation of directories in Unix: – The directory is a table that includes the references to the file descriptors (i-nodes), which are stored in a separate data structure in the disk Attributes of prog.c mario name prog.c prog.exe Attributes of data prog.c prog.exe data data Implementation Attributes of prog.exe of directory mario Table (i-list) of file descriptors (i-nodes) Access to files • File access operations: • Read a logical record from a file. • Write logical records on a file To execute these operations on a file it is necessary to retrieve the attributes of the file: The addresses on disk of the logical records Access rights etc…….. Too expensive to read these data from the disk at each access: • they are read once for all before accessing the file, by means of the open system call • the close system call is then necessary to deallocate such information from memory Access methods • File access can be: – Sequential – Direct The access method: • independent of the physical device • independent of the allocation method of the files on the device Sequential access • The file is a sequence of logical records [R1, R2,.. RN] : • To access each logical record Ri it is necessary to access first all the previous (i-1) records: R1 R2 Ri-1 Ri • Access operations: • readnext: read next logical record • writenext: write next logical record • Each operation (read/write) positions an access pointer on the next logical record Direct Access • The file is a set {R1, R2,.. RN} of logical records • The user can directly access each logical record • Access operations: • read(f, i, &V): read logical record i of file f ; the read data is stored in buffer V; • write(f, i, &V): write the content of buffer V on the logical record i of file f . UNIX File System API • create, link, unlink, createdir, rmdir – Create file, link to file, remove link – Create directory, remove directory • open, close, read, write, seek – Open/close a file for reading/writing – Seek resets current position • fsync – File modifications can be cached – fsync forces modifications to disk (like a memory barrier) File System Interface • UNIX file open is a Swiss Army knife: – Open the file, return file descriptor – Options: • if file doesn’t exist, return an error • If file doesn’t exist, create file and open it • If file does exist, return an error • If file does exist, open file • If file exists but isn’t empty, nix it then open • If file exists but isn’t empty, return an error • … Interface Design Question • Why not separate syscalls for open/create/exists? – Would be more modular! – But rather boring as well… if (!exists(name)) create(name); // can create fail? fd = open(name); // does the file exist? Device Access • A software stack that provides ways to access a wide range of I/O devices • A common interface – In posix equivalent to file access – In terms of open/close, read/write system calls • Device drivers for each specific device – Upper part HW-independent – Lower part HW-dependent Device Access Application User-level Library File system Device access in Device driver the system Kernel Memory mapped I/O; Hardware DMA; Interrupts Device Access • The file system: – Defines a name space for devices – Implements caching policies – It is HW-independent • The device driver: – Interfaces with the hardware (HW-dependent) – Manages synchronization with the device – Manages data transfer to/from device/process – Manages device failures Memory mapped I/O • The internal registers CPU CPU of the devices are mapped to memory Memory bus locations: – Read/write ops. to CPU MEMORY those location are redirected to the I/O bus device controller Disk Keyboard Audio controller controller controller A simple device controller commands signals Control register Status register device CPU status Data register data data controller bus A simple device controller Interrupt enable bit Start bit i s Control register Error condition bit Flag bit e f Status register Direct Memory Access (DMA) • Many devices transfer data in bulk – Disks, printers, … • Data transfer operated by the CPU at byte-level is inefficient • DMA can manage bulk data transfer without the CPU – The device driver programs the DMA – Once the transfer is complete the DMA issues an interrupt – The interrupt resumes the device driver that concludes the I/O Direct Memory Access (DMA) buffer Pointer buffer cont Data register Counter cont Device CPU Memory DMA Controller bus Storage Devices • Magnetic disks – Storage that rarely becomes corrupted – Large capacity at low cost – Block level random access – Slow performance for random access – Better performance for streaming access • Flash memory – Storage that rarely becomes corrupted – Capacity at intermediate cost (2 x disk) – Block level random access – Good performance for reads; worse for random writes Magnetic Disk Disk Tracks • ~ 1 micron wide – Wavelength of light is ~ 0.5 micron – Resolution of human eye: 50 microns – 100K on a typical 2.5” disk • Separated by unused guard regions – Reduces likelihood neighboring tracks are corrupted during writes (still a small non-zero chance) • Track length varies across disk – Outside: More sectors per track, higher bandwidth – Disk is organized into regions of tracks with same # of sectors/track – Only outer half of radius is used • Most of the disk area in the outer regions of the disk Sectors Sectors contain sophisticated error correcting codes – Disk head magnet has a field wider than track – Hide corruptions due to neighboring track writes • Sector sparing – Remap bad sectors transparently to spare sectors on the same surface • Slip sparing – Remap all sectors (when there is a bad sector) to preserve sequential behavior • Track skewing – Sector numbers offset from one track to the next, to allow for disk head movement for sequential ops Sectors & blocks • At low level the controller accesses individual sectors • Typical sector size is 256/512 bytes • Identified by a triple: <#cylinder, #face, #sector> • The disk driver group a set of contiguous sectors in a block – Typical block size is 2/4/8 Kbytes – identified by a pointer on a contiguous address space (from 0 to max-blocks) Sectors & blocks Given a sector number , and a triple < >: • is the number of faces in the disk • is the number of sectors per trace Consequently: Disk Performance Disk Latency = Seek Time + Rotation Time + Transfer Time Seek Time: time to move disk arm over track (1-20ms) Fine-grained position adjustment necessary for head to “settle” Head switch time ~ track switch time (on modern disks) Rotation Time: time to wait for disk to rotate under disk head Disk rotation: 4 – 15ms (depending on price of disk) Transfer Time: time to transfer data onto/off of disk Disk head transfer rate: 50-100MB/s (5-10 usec/sector) Host transfer rate dependent on I/O connector (USB, SATA,