CS 590 Introduction to Operating Systems

Lecture 11 : File-System Implementation

File-System Structure

• Most file systems are maintained on disks.
• Disk I/O is performed in units of blocks.
• Disks are convenient for storing files because:
  – they can be rewritten in place.
  – data can be accessed directly.

File-System Organization

Application Programs

Logical File System

File-Organization Module

Basic File System

I/O Control

I/O Devices

An Open File Table

Index  File Name  Permissions  Access Dates  Disk Address
0      test.c     rw-r--r--    …             →
1      mbox       rw-------    …             →
2      test.o     rw-r--r--    …             →
3      test       rwxr-xr-x    …             →
4      msg.txt    rw-------    …             →

File-System Mounting

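[Figure: mounting the file system on /dev/dk01 into a directory tree rooted at / (with /bin, /etc and /usr); the other directory labels shown are local, joe, mary, src and bin]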

Allocation Methods

• Storing more than one file on a disk means that we need to allocate space for each file on the disk.
• The most common allocation methods are:
  – Contiguous allocation
  – Linked allocation
  – Indexed allocation

Contiguous Allocation

Block Allocation
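First fit, best fit and worst fit differ only in which free run of blocks they choose for a request. The following is a minimal sketch, assuming free space is kept as a list of (start block, length) holes; the function name `pick_hole` and the hole values are made up for illustration.

```python
def pick_hole(holes, size, strategy="first"):
    """Return the index of the hole chosen for a request of `size` blocks.

    holes: list of (start_block, length) tuples in disk order (hypothetical data).
    Returns None when no hole is large enough.
    """
    candidates = [i for i, (_, length) in enumerate(holes) if length >= size]
    if not candidates:
        return None
    if strategy == "first":
        return candidates[0]                               # first hole that fits
    if strategy == "best":
        return min(candidates, key=lambda i: holes[i][1])  # smallest hole that fits
    if strategy == "worst":
        return max(candidates, key=lambda i: holes[i][1])  # largest hole
    raise ValueError(f"unknown strategy: {strategy}")


holes = [(10, 5), (30, 12), (50, 8), (70, 20)]
for strategy in ("first", "best", "worst"):
    start, length = holes[pick_hole(holes, 6, strategy)]
    print(strategy, "fit: hole at block", start, "of length", length)
```

For a 6-block request, first fit takes the 12-block hole, best fit the 8-block hole, and worst fit the 20-block hole.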

There are several different ways that we can determine which collection of blocks to assign when we are writing a file:
• First fit
• Best fit
• Worst fit

Linked Allocation
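With linked allocation, each block of a file records where the next block is; a file allocation table keeps those links in one table indexed by block number. The sketch below is only illustrative: the block numbers, file names and the `EOF` marker value are assumptions, not values taken from the slide.

```python
EOF = -1  # hypothetical end-of-file marker


def file_blocks(fat, first_block):
    """Follow a file's chain through the table and return its blocks in order."""
    blocks, seen = [], set()
    block = first_block
    while block != EOF:
        if block in seen:                 # guard against a corrupted, looping chain
            raise ValueError("cycle in allocation chain")
        seen.add(block)
        blocks.append(block)
        block = fat[block]                # fat[b] is the block that follows b
    return blocks


fat = {4: 7, 7: 2, 2: EOF, 9: 5, 5: EOF}   # made-up table contents
directory = {"notes.txt": 4, "a.out": 9}   # file name -> first block
print(file_blocks(fat, directory["notes.txt"]))   # [4, 7, 2]
```

Reaching the k-th block of a file means following k links, which is why linked allocation suits sequential access better than direct access.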

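File Allocation Table

[Figure: directory entries for catalog, mbox and test alongside a file allocation table whose entries (numbered from 0) hold block numbers such as 43, 86, 110, 116, 160 and 168, with one entry marked end-of-file]

Indexed Allocation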

[Figure: disk blocks numbered 1 to 28]

Index:
FileA: 1, 2, 3, 7, 10, 11
FileB: 4, 5, 6, 8, 12, 13
FileC: 9, 14, 15
Free: 16, 17, 18, 19
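The index above can be read as a lookup table from a file's logical block number to a disk block. A minimal sketch using the slide's index entries follows; the function name `physical_block` and the 0-based logical numbering are assumptions.

```python
# Index entries copied from the slide: file name -> list of disk blocks
index_blocks = {
    "FileA": [1, 2, 3, 7, 10, 11],
    "FileB": [4, 5, 6, 8, 12, 13],
    "FileC": [9, 14, 15],
}


def physical_block(name, logical_block):
    """Map a file's logical block number (0-based) to a disk block via its index."""
    entries = index_blocks[name]
    if not 0 <= logical_block < len(entries):
        raise IndexError("logical block is past the end of the file")
    return entries[logical_block]


print(physical_block("FileA", 3))   # 7
print(physical_block("FileC", 0))   # 9
```

Unlike linked allocation, any block is reached with a single index lookup, so direct access costs the same for every block of the file.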

Indexed Allocation In UNIX

[Figure: indexed allocation in UNIX]

Allocation Method Performance

• The performance of different allocation methods depends on how the file system is being used.
• A system with mostly sequential access has different needs than one with mostly direct access.
• Some systems use contiguous allocation for direct-access files and linked allocation for sequential-access files.

Free-Space Management

Free space can be managed by using:
• a bit vector
• a linked list
• grouping
• counting

Bit Vectors

Block # 15 is in use

001111001111110001100000011100000……

Block # 2 is free
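Reading the two labels against the bit string, bit i describes block i (counting from 0), with 1 meaning free and 0 meaning in use. A small sketch under that reading, using only the prefix of the vector shown on the slide (the helper names are made up):

```python
BITMAP = "001111001111110001100000011100000"  # prefix of the vector shown above


def is_free(bitmap, block):
    """A block is free when its bit is 1 (convention inferred from the slide)."""
    return bitmap[block] == "1"


def first_free(bitmap):
    """Return the lowest-numbered free block, or None if every block is in use."""
    i = bitmap.find("1")
    return None if i < 0 else i


print(is_free(BITMAP, 2))     # True
print(is_free(BITMAP, 15))    # False
print(first_free(BITMAP))     # 2
```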

Linked List Management of Free Space

Grouping

• A variation of the linked-list approach in which the first free block contains the addresses of n free blocks.
• The first n-1 of these blocks are actually free; the nth block contains the addresses of the next n free blocks, and so on.
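One way to picture grouping is as a chain of "group" blocks, each holding n addresses. The sketch below assumes n = 3 and models the disk as a dictionary; the block numbers and the use of `None` to end the chain are illustrative choices.

```python
def free_blocks(disk, head):
    """Yield every free block reachable from the first group block `head`.

    disk[b] is the list of addresses stored in group block b: all but the
    last are ordinary free blocks, the last is the next group block
    (None marks the end of the chain in this sketch).
    """
    block = head
    while block is not None:
        *free, next_group = disk[block]
        yield block            # the group block itself is free space too
        yield from free
        block = next_group


# Hypothetical layout with n = 3 addresses per group block
disk = {2: [3, 4, 8], 8: [9, 12, 13], 13: [17, 18, None]}
print(list(free_blocks(disk, 2)))   # [2, 3, 4, 8, 9, 12, 13, 17, 18]
```

A single read of a group block yields many free addresses at once, which is the advantage over following a plain linked list one block at a time.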

Counting

• We can take advantage of the fact that several contiguous blocks are often allocated or freed at the same time.
• We record the address of the first free block and the number of contiguous free blocks following it.

Directory Implementation

• The choice of a directory implementation method has a profound impact on the performance of a file system.
• The most obvious choices are a linear list and a hash table.
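The difference between the two choices shows up in how a name is looked up. A minimal sketch, reusing the file names from the open file table earlier in the lecture and using Python's built-in `dict` as a stand-in for a hash table (the function names are made up):

```python
# (file name, table index) pairs, taken from the open file table example
entries = [("test.c", 0), ("mbox", 1), ("test.o", 2), ("test", 3), ("msg.txt", 4)]


def linear_lookup(name):
    """Linear list: compare against each entry in turn, O(n) in the worst case."""
    for entry_name, index in entries:
        if entry_name == name:
            return index
    return None


# Hash table: the name is hashed to locate its entry without a scan.
table = dict(entries)


def hashed_lookup(name):
    return table.get(name)


print(linear_lookup("msg.txt"), hashed_lookup("msg.txt"))   # 4 4
print(linear_lookup("nope"), hashed_lookup("nope"))         # None None
```

The linear list is simpler to maintain, while the hash table trades extra bookkeeping for faster searches in large directories.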

Linear List

Hash Table

Efficiency

• Several issues come up in designing a file-system implementation that can have an impact on the efficiency of the file system.
• These include:
  – disk allocation and directory algorithms
  – use of clustering
  – data written in directory entries
  – file pointer size

Disk Caching

[Figure: disk caching, with labels for the CPU, RAM, the disk cache and the disk]

RAM Disks

• RAM disks are areas of main memory that are organized like disk-based file systems.
• RAM disks are accessible by file-system commands but work at the speed of RAM.
• Unlike disk caches, RAM disks are user-controlled.

Consistency Checking

• Data is stored in a cache before being written to disk.
• If the computer crashes before this data can be written to disk, the file system is left in an inconsistent state.
• A consistency checker compares the directory data with what is stored on disk and, if it finds any inconsistencies, tries to fix them.
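The detection half of such a checker can be sketched as a comparison between the blocks that files claim and the blocks on the free list. This is a simplified illustration, not how any particular checker works; the data layout, function name and messages are assumptions, and repair is left out.

```python
def check(files, free_list, total_blocks):
    """Report blocks whose state disagrees between the files and the free list."""
    used, problems = {}, []
    for name, blocks in files.items():
        for b in blocks:
            if b in used:
                problems.append(f"block {b} claimed by both {used[b]} and {name}")
            used[b] = name
    free = set(free_list)
    for b in range(total_blocks):
        if b in used and b in free:
            problems.append(f"block {b} is in use by {used[b]} but marked free")
        elif b not in used and b not in free:
            problems.append(f"block {b} is neither in use nor free")
    return problems


# Hypothetical 6-block disk with three planted inconsistencies
files = {"a": [0, 1, 2], "b": [2, 3]}
for problem in check(files, free_list=[3, 5], total_blocks=6):
    print(problem)
```

On this data the sketch reports block 2 as doubly claimed, block 3 as both used and free, and block 4 as lost.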

Backup and Restore

• To avoid losing data in the event of a disk failure, data is archived on other storage media such as magnetic tape. This is called a backup.
• Recovering data after a loss is called restoring.

Disk Structure

[Figure: a disk surface with a sector and a track labeled]

Disk Structure

[Figure: physical view and logical view of a disk]

Access Time

Rotational delay or latency

Seek time

Scheduling A Disk Operation

• When a process issues a system call requesting input/output from disk, it must include:
  – whether the operation is a read or a write
  – the disk address
  – the memory address
  – the number of bytes being transferred

Disk Scheduling Algorithms

• Disk scheduling algorithms have two main considerations:
  – minimizing seek time
  – guaranteeing that all disk operations will be performed.
• We will examine several algorithms, assuming that processes request operations involving the following tracks:
  – 55, 58, 39, 18, 90, 160, 150, 38, 184

FCFS Scheduling

[Figure: head position (track 0 to 200) plotted against time under FCFS]

FCFS Performance (starts at track 100)

Next Track Accessed    # of Tracks Traversed
55                     45
58                     3
39                     19
18                     21
90                     72
160                    70
150                    10
38                     112
184                    146
Average Seek Length    55.3
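The FCFS figures can be reproduced with a few lines: serve the requests in arrival order and add up the head movement. A minimal sketch (the function name is made up; the track list and starting track come from the slides):

```python
def fcfs(start, requests):
    """Serve requests in arrival order; return (track, tracks_traversed) pairs."""
    moves, head = [], start
    for track in requests:
        moves.append((track, abs(track - head)))
        head = track
    return moves


requests = [55, 58, 39, 18, 90, 160, 150, 38, 184]
moves = fcfs(100, requests)
total = sum(distance for _, distance in moves)
print(moves)
print(total, round(total / len(moves), 1))   # 498 55.3
```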

SSTF Scheduling

[Figure: head position (track 0 to 200) plotted against time under SSTF]

SSTF Performance (starts at track 100)

Next Track Accessed    # of Tracks Traversed
90                     10
58                     32
55                     3
39                     16
38                     1
18                     20
150                    132
160                    10
184                    24
Average Seek Length    27.6
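SSTF always serves the pending request closest to the current head position. A minimal sketch that reproduces the service order in the table, assuming ties (none occur here) are broken by arrival order:

```python
def sstf(start, requests):
    """Repeatedly serve the pending request nearest to the head."""
    pending, head, moves = list(requests), start, []
    while pending:
        track = min(pending, key=lambda t: abs(t - head))
        pending.remove(track)
        moves.append((track, abs(track - head)))
        head = track
    return moves


requests = [55, 58, 39, 18, 90, 160, 150, 38, 184]
moves = sstf(100, requests)
total = sum(distance for _, distance in moves)
print([track for track, _ in moves])   # [90, 58, 55, 39, 38, 18, 150, 160, 184]
print(total, round(total / len(moves), 1))   # 248 27.6
```

The long 132-track move from 18 to 150 shows the cost of the greedy choice: requests far from the head wait while nearby ones are served.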

SCAN Scheduling

[Figure: head position (track 0 to 200) plotted against time (1 to 9) under SCAN]

SCAN Performance (starts at track 100)

Next Track Accessed    # of Tracks Traversed
150                    50
160                    10
184                    24
90                     94
58                     32
55                     3
39                     16
38                     1
18                     20
Average Seek Length    27.8
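The SCAN order in the table can be sketched as one upward sweep followed by one downward sweep. Note the assumptions: the head starts out moving toward higher tracks, and, as in the table, it turns around at track 184 (the last pending request) rather than travelling to the edge of the disk.

```python
def scan_order(start, requests):
    """Sweep upward from `start`, then reverse and sweep downward."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


def seek_lengths(start, order):
    """Return (track, tracks_traversed) pairs for a given service order."""
    moves, head = [], start
    for track in order:
        moves.append((track, abs(track - head)))
        head = track
    return moves


requests = [55, 58, 39, 18, 90, 160, 150, 38, 184]
moves = seek_lengths(100, scan_order(100, requests))
total = sum(distance for _, distance in moves)
print([track for track, _ in moves])   # [150, 160, 184, 90, 58, 55, 39, 38, 18]
print(total, round(total / len(moves), 1))   # 250 27.8
```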

C-SCAN Scheduling

[Figure: head position (track 0 to 200) plotted against time (1 to 9) under C-SCAN]

C-SCAN Performance (starts at track 100)

Next Track Accessed    # of Tracks Traversed
150                    50
160                    10
184                    24
18                     166
38                     20
39                     1
55                     16
58                     3
90                     32
Average Seek Length    35.8
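C-SCAN serves requests in one direction only: after the upward sweep the head jumps back to the lowest pending request and sweeps upward again. The sketch below follows the order in the table and counts the return jump as ordinary head movement, which is an assumption about how the table was computed. With that accounting the step from track 55 to track 58 is 3 tracks and the total is 322 tracks.

```python
def cscan_order(start, requests):
    """Sweep upward from `start`, then wrap to the lowest request and sweep up again."""
    up = sorted(t for t in requests if t >= start)
    wrapped = sorted(t for t in requests if t < start)
    return up + wrapped


def seek_lengths(start, order):
    """Return (track, tracks_traversed) pairs for a given service order."""
    moves, head = [], start
    for track in order:
        moves.append((track, abs(track - head)))
        head = track
    return moves


requests = [55, 58, 39, 18, 90, 160, 150, 38, 184]
moves = seek_lengths(100, cscan_order(100, requests))
total = sum(distance for _, distance in moves)
print([track for track, _ in moves])   # [150, 160, 184, 18, 38, 39, 55, 58, 90]
print(total, round(total / len(moves), 1))   # 322 35.8
```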

LOOK Scheduling

• SCAN and C-SCAN scheduling require that the read/write head go all the way to the last track before reversing direction.
• LOOK and C-LOOK scheduling require that the read/write head go only as far as the last track with a scheduled disk operation.

Random Scheduling

[Figure: head position (track 0 to 200) plotted against time (1 to 9) under random scheduling]

Random Scheduling Performance (starts at track 100)

Next Track Accessed    # of Tracks Traversed
55                     45
18                     37
58                     40
90                     32
184                    94
39                     145
160                    121
58                     102
38                     20
Average Seek Length    70.7

Choosing a Disk-Scheduling Algorithm

• SSTF is popular because of its short average seek length.
• SCAN and C-SCAN are better for systems with greater disk activity.
• Directory and index block locations should be considered in choosing an algorithm.

Disk Formatting

• Before a disk can be used, it must be formatted, which includes dividing each track into sectors and creating the data structures needed for reading and writing files.
• Low-level formatting creates a structure for each sector that includes a header, a data area and a trailer.
• High-level formatting involves creating partitions, maps of free and allocated space, and a directory.

Boot Blocks

Boot Block

FAT

Root Directory

Data Blocks

Bad Blocks

• Like anything else with moving parts, disks can be damaged, which usually results in bad blocks.
• Bad blocks are handled in different ways:
  – MS-DOS writes a special value in the FAT entry to indicate that the block is damaged.
  – Some systems use sector sparing, where spare blocks replace bad blocks in the disk’s logical map.
  – Other systems use sector slipping, where every block after the bad block is remapped one position along, and the spare block is included at the end.

Swap Space

Swap-Space Use

• Swap space can be used for:
  – holding an image of the entire process
  – storing pages not in physical memory
• The amount of swap space needed depends on how it is used.
• UNIX and some other operating systems can use more than one swap space on different disks to share the load among them.
• It is better to overestimate needed swap space than to underestimate it.

Swap-Space Location

• Swap space can be one huge file within the file system or a separate partition.
• A large swap file is easy to implement but can take a very long time to access.
• A separate partition can be accessed very quickly but ties up a large amount of storage, and its size is difficult to change.

Text Page Swap Map in 4.3 BSD

[Figure: swap map with entries of 512K, 512K, 512K, 512K and 71K]

Data Page Swap Map in 4.3 BSD

[Figure: swap map with entries of 16K, 32K, 64K, 128K and 256K]
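The sizes in the two figures suggest two different rules: fixed 512K chunks with a smaller final piece for text pages, and blocks that double in size starting at 16K for data pages. The sketch below only reproduces those sequences; both rules are inferred from the figures, the function names and segment sizes are made up, and any upper limit on block size is ignored.

```python
def text_swap_map(size_kb, chunk_kb=512):
    """Fixed-size chunks, with whatever is left over in a final smaller piece."""
    full, rest = divmod(size_kb, chunk_kb)
    return [chunk_kb] * full + ([rest] if rest else [])


def data_swap_map(size_kb, first_kb=16):
    """Blocks that double in size until the segment is covered."""
    sizes, block = [], first_kb
    while sum(sizes) < size_kb:
        sizes.append(block)
        block *= 2
    return sizes


print(text_swap_map(2119))   # [512, 512, 512, 512, 71]
print(data_swap_map(400))    # [16, 32, 64, 128, 256]
```

Doubling lets a small data segment use little swap space while a growing one needs only a few map entries.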