Storage Systems Main Points

Storage Systems Main Points

Storage Systems Main Points • File systems – Useful abstractions on top of physical devices • Storage hardware characteristics – Magnetic disks and flash memory File Systems • Abstraction on top of persistent storage – Magnetic disk – Flash memory (e.g., USB thumb drive) • Devices provide – Storage that (usually) survives across machine crashes – Block level (random) access – Large capacity at low cost – Relatively slow performance • Magnetic disk read takes 10-20M processor instructions File System as Illusionist: Hide Limitations of Physical Storage • Persistence of data stored in file system: – Even if crash happens during an update – Even if disk block becomes corrupted – Even if flash memory wears out • Naming: – Named data instead of disk block numbers – Directories instead of flat storage – Byte addressable data even though devices are block- oriented • Performance: – Cached data – Data placement and data structure organization • Controlled access to shared data File System Abstraction • File system – Hierarchical organization (directories, subdirectories) – Access control on data • File: named collection of data – Linear sequence of bytes (or a set of sequences) – Read/write or memory mapped • Crash and storage error tolerance – Operating system crashes (and disk errors) leave file system in a valid state • Performance – Achieve close to the hardware limit* in the average case File System Abstraction • Directory – Group of named files or subdirectories – Mapping from file name to file metadata location • Path – String that uniquely identifies file or directory – Ex: /cse/www/education/courses/cse451/12au • Links – Hard link: link from name to metadata location – Soft link: link from name to alternate name • Mount – Mapping from name in one file system to root of another File System: tree structure Root directory sys user dev ker Mario Paolo Emma prog dati dati txt sor bin File System: acyclic graph structure Root directory sys user dev ker Mario Paolo Emma prog dati Dati di Mario txt1 txt2 sor bin f --> Link to file f Data structures: Directories • Each file (and directory) belongs to a directory; • Each directory is a data structure that links file names to file attributes • Example of attributes: file size, address on disk, access rights, time of last access, time of creation, … A possible implementation (FAT file systems): • The directory is a table, it associates each file name to its file descriptor (which includes all attributes of the file) mario name descriptor prog.c Attribute: tipo, indirizzi, ecc. prog.exe Attribute: tipo, indirizzi, ecc. prog.c prog.exe dati dati Attribute: tipo, indirizzi, ecc. Implementation od directory mario Data structures: Directories Implementation of directories in Unix: – The directory is a table that includes the references to the file descriptors (i-nodes), which are stored in a separate data structure in the disk Attributes of prog.c mario name prog.c prog.exe Attributes of data prog.c prog.exe data data Implementation Attributes of prog.exe of directory mario Table (i-list) of file descriptors (i-nodes) Access to files • File access operations: • Read a logical record from a file. • Write logical records on a file To execute these operations on a file it is necessary to retrieve the attributes of the file: The addresses on disk of the logical records Access rights etc…….. Too expensive to read these data from the disk at each access: • they are read once for all before accessing the file, by means of the open system call • the close system call is then necessary to deallocate such information from memory Access methods • File access can be: – Sequential – Direct The access method: • independent of the physical device • independent of the allocation method of the files on the device Sequential access • The file is a sequence of logical records [R1, R2,.. RN] : • To access each logical record Ri it is necessary to access first all the previous (i-1) records: R1 R2 Ri-1 Ri • Access operations: • readnext: read next logical record • writenext: write next logical record • Each operation (read/write) positions an access pointer on the next logical record Direct Access • The file is a set {R1, R2,.. RN} of logical records • The user can directly access each logical record • Access operations: • read(f, i, &V): read logical record i of file f ; the read data is stored in buffer V; • write(f, i, &V): write the content of buffer V on the logical record i of file f . UNIX File System API • create, link, unlink, createdir, rmdir – Create file, link to file, remove link – Create directory, remove directory • open, close, read, write, seek – Open/close a file for reading/writing – Seek resets current position • fsync – File modifications can be cached – fsync forces modifications to disk (like a memory barrier) File System Interface • UNIX file open is a Swiss Army knife: – Open the file, return file descriptor – Options: • if file doesn’t exist, return an error • If file doesn’t exist, create file and open it • If file does exist, return an error • If file does exist, open file • If file exists but isn’t empty, nix it then open • If file exists but isn’t empty, return an error • … Interface Design Question • Why not separate syscalls for open/create/exists? – Would be more modular! – But rather boring as well… if (!exists(name)) create(name); // can create fail? fd = open(name); // does the file exist? Device Access • A software stack that provides ways to access a wide range of I/O devices • A common interface – In posix equivalent to file access – In terms of open/close, read/write system calls • Device drivers for each specific device – Upper part HW-independent – Lower part HW-dependent Device Access Application User-level Library File system Device access in Device driver the system Kernel Memory mapped I/O; Hardware DMA; Interrupts Device Access • The file system: – Defines a name space for devices – Implements caching policies – It is HW-independent • The device driver: – Interfaces with the hardware (HW-dependent) – Manages synchronization with the device – Manages data transfer to/from device/process – Manages device failures Memory mapped I/O • The internal registers CPU CPU of the devices are mapped to memory Memory bus locations: – Read/write ops. to CPU MEMORY those location are redirected to the I/O bus device controller Disk Keyboard Audio controller controller controller A simple device controller commands signals Control register Status register device CPU status Data register data data controller bus A simple device controller Interrupt enable bit Start bit i s Control register Error condition bit Flag bit e f Status register Direct Memory Access (DMA) • Many devices transfer data in bulk – Disks, printers, … • Data transfer operated by the CPU at byte-level is inefficient • DMA can manage bulk data transfer without the CPU – The device driver programs the DMA – Once the transfer is complete the DMA issues an interrupt – The interrupt resumes the device driver that concludes the I/O Direct Memory Access (DMA) buffer Pointer buffer cont Data register Counter cont Device CPU Memory DMA Controller bus Storage Devices • Magnetic disks – Storage that rarely becomes corrupted – Large capacity at low cost – Block level random access – Slow performance for random access – Better performance for streaming access • Flash memory – Storage that rarely becomes corrupted – Capacity at intermediate cost (2 x disk) – Block level random access – Good performance for reads; worse for random writes Magnetic Disk Disk Tracks • ~ 1 micron wide – Wavelength of light is ~ 0.5 micron – Resolution of human eye: 50 microns – 100K on a typical 2.5” disk • Separated by unused guard regions – Reduces likelihood neighboring tracks are corrupted during writes (still a small non-zero chance) • Track length varies across disk – Outside: More sectors per track, higher bandwidth – Disk is organized into regions of tracks with same # of sectors/track – Only outer half of radius is used • Most of the disk area in the outer regions of the disk Sectors Sectors contain sophisticated error correcting codes – Disk head magnet has a field wider than track – Hide corruptions due to neighboring track writes • Sector sparing – Remap bad sectors transparently to spare sectors on the same surface • Slip sparing – Remap all sectors (when there is a bad sector) to preserve sequential behavior • Track skewing – Sector numbers offset from one track to the next, to allow for disk head movement for sequential ops Sectors & blocks • At low level the controller accesses individual sectors • Typical sector size is 256/512 bytes • Identified by a triple: <#cylinder, #face, #sector> • The disk driver group a set of contiguous sectors in a block – Typical block size is 2/4/8 Kbytes – identified by a pointer on a contiguous address space (from 0 to max-blocks) Sectors & blocks Given a sector number , and a triple < >: • is the number of faces in the disk • is the number of sectors per trace Consequently: Disk Performance Disk Latency = Seek Time + Rotation Time + Transfer Time Seek Time: time to move disk arm over track (1-20ms) Fine-grained position adjustment necessary for head to “settle” Head switch time ~ track switch time (on modern disks) Rotation Time: time to wait for disk to rotate under disk head Disk rotation: 4 – 15ms (depending on price of disk) Transfer Time: time to transfer data onto/off of disk Disk head transfer rate: 50-100MB/s (5-10 usec/sector) Host transfer rate dependent on I/O connector (USB, SATA,

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    68 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us