File Systems
Chapter 4
What do we need to know?
• How are files viewed on different operating systems?
• What is a file system from the programmer’s viewpoint?
  – You mostly know this, but we’ll review the main points.
• How are file systems put together?
  – How is the disk laid out for directories? For files? What kinds of in-memory structures are needed?
• What do some real file systems look like?
  – CP/M, MS-DOS (FAT-12/16/32), NTFS, NFS, ext2, …
• In what directions are file systems going?
Long-term Information Storage
1. Must store large amounts of data
2. Information stored must survive the termination of the process using it
3. Multiple processes must be able to access the information concurrently
File Naming Issues
• Character set
• Length
• Extensions
File Naming
Typical file extensions.
File Structure
Files can be structured in many ways. Here are three:
– byte sequence, record sequence, tree
Sample Files
(a) An executable file (b) An archive
File Access
• Sequential access
  – read all bytes/records from the beginning
  – cannot jump around, but could rewind or back up
  – convenient when the medium was magnetic tape
• Random access
  – bytes/records read in any order
  – essential for database systems
  – a read can either
    • move the file marker (seek), then read, or
    • read, then move the file marker
File Attributes
Possible file attributes
File Operations
1. Create
2. Delete
3. Open
4. Close
5. Read
6. Write
7. Append
8. Seek
9. Get attributes
10. Set attributes
11. Rename
An Example Program Using Unix File System Calls (1/2)
An Example Program Using File System Calls (2/2)
Memory-Mapped Files
(a) Segmented process before mapping files into its address space (b) Process after mapping existing file abc into one segment and creating a new segment for xyz
Directories
Single-Level Directory Systems
• A single-level directory system
  – contains 4 files
  – owned by 3 different people, A, B, and C
Two-Level Directory Systems
Letters indicate owners of the directories and files
Hierarchical Directory Systems
A hierarchical directory system
Directory Operations
1. Create
2. Delete
3. Opendir
4. Closedir
5. Readdir
6. Rename
7. Link
8. Unlink
File System Implementation
A possible file system layout
Implementing Files (1)
(a) Contiguous allocation of disk space for 7 files (b) State of the disk after files D and E have been removed
Implementing Files (2)
Storing a file as a linked list of disk blocks
Implementing Files (3)
A File Allocation Table (FAT) keeps the linked list of disk blocks in memory
Implementing Files (4)
Combination of direct and indirect block pointers
Note: This is a simplified version of the Unix i-node.
The UNIX V7 File System
A UNIX i-node
Implementing Directories (1)
(a) A simple directory: fixed-size entries, with disk addresses and attributes in the directory entry
(b) A directory in which each entry just refers to an i-node
Implementing Directories (2)
• Two ways of handling long file names in a directory
  – (a) In-line
  – (b) In a heap
Linking (1)
File system containing a file that is “shared” between two directories
Links (2)
(a) Situation prior to linking (b) After the link is created (c) After the original owner removes the file
Disk Space Management (1)
Block size
• Dark line (left-hand scale): data rate of a disk
• Dotted line (right-hand scale): disk-space efficiency
• All files here are 2 KB
Disk Space Management (2)
(a) Storing the free list on a linked list (b) A bitmap
File System Checking
• Possible results while running fsck:
  (a) consistent
  (b) missing block
  (c) duplicate block in free list
  (d) duplicate data block
File System Performance (1)
The block cache data structures
File System Writes
• Unix
  – “Critical blocks” are written immediately.
  – Data blocks are written periodically, or when the block is evicted from the block cache.
• MS-DOS
  – Uses a “write-through cache”: all writes go to disk immediately.
Read Ahead
• When block N is requested, the file system can also issue a read for block N+1.
• What if the file is not being read sequentially?
  – Initially assume it is, but monitor disk accesses and set a flag to “non-sequential” if needed. This flag can be used to disable read-ahead.
File System Performance (2)
• I-nodes placed at the start of the disk
• Disk divided into cylinder groups
  – each with its own blocks and i-nodes
Log-Structured File Systems
• With CPUs faster and memory larger
  – disk caches can also be larger
  – an increasing number of read requests can be satisfied from the cache
  – thus, most disk accesses will be writes
• LFS strategy: structure the entire disk as a log
  – have all writes initially buffered in memory
  – periodically write these to the end of the disk log
  – when a file is opened, locate its i-node, then find its blocks
Journaling
• What happens when you remove a file?
  – Remove the directory entry
  – Release the i-node
  – Free the disk blocks
• What happens if there is a crash after the first or second step?
• How can you minimize the damage?
The CP/M File System (1)
Memory layout of CP/M
The CP/M File System (2)
The CP/M directory entry format
File Allocation Table (FAT) Partition Layout
• Partition layout:
  – Boot block
  – FAT
  – FAT copy
  – Root directory
    • In FAT-12 and FAT-16, enough space is preassigned for 256 directory entries
  – Other directories and files
FAT Table Sizes
• FAT-12
  – 2^12 clusters
  – Cluster size: 512 bytes to 8 KB
  – Partition size up to 32 MB (4K clusters × 8 KB/cluster)
  – Windows default for volumes < 16 MB, such as floppies
• FAT-16
  – 2^16 clusters
  – Cluster size: 512 bytes to 64 KB
  – Partition size up to 4 GB (64K clusters × 64 KB/cluster)
• FAT-32
  – 2^28 clusters
  – Cluster size: 512 bytes to 32 KB
  – Partition size in principle up to 8 TB (2^28 clusters × 32 KB/cluster), but Windows will only create FAT-32 partitions up to 32 GB.
• Note that all FAT systems reserve the first two and last sixteen clusters in a partition, so actual partition sizes are slightly smaller than listed above.
Directory Entries
Original MS-DOS directory entry:
Directory entry used in Windows:
(Field sizes are given in bytes.)
The Windows 98 File System
An example of how a long name is stored in Windows 98
Disk layout in classical UNIX systems
The UNIX File System
A UNIX V7 directory entry (old)
UNIX File System
Directory entry fields.
Attributes in the i-node
The UNIX File System
Path names
The UNIX File System
Some important directories found in most UNIX systems
Pathnames
• Absolute pathname
  – Begins at the root directory: /
  – Contains all subdirectory names, separated by slashes, followed by the filename
  – ~jsterling
    • ~ is recognized (by the shell) as a shortcut for the absolute path to a home directory: ~ alone is your own, ~jsterling is that of user jsterling.
• Relative pathname
  – Does not begin at the root directory.
  – Starts in the current working directory. Sometimes the current working directory is shown explicitly using: .
  – May move up the file tree using .. to refer to a parent directory.
The UNIX File System
The steps in looking up /usr/ast/mbox
The UNIX File System
(a) Before linking. (b) After linking.

Note that hard links
• must refer to other files in the same file system.
• are not permitted to refer to directories.
• are indistinguishable from the original file name.

Note that soft links (aka symbolic links)
• contain only a pathname.
• result in a “dangling pointer” if the original filename is deleted.
The UNIX File System
(a) Before mounting: separate file systems. (b) After mounting.
UNIX File System (3)
The relation between the file descriptor table, the open file description and the i-node
The Linux File System
• Super block:
  – # of i-nodes, # of blocks, etc.
• Group descriptor:
  – # of free i-nodes, # of free blocks, # of directories
• Bitmaps:
  – Each is one block long
Record Locking in Unix
• Can lock any range of bytes in a file
• There can be multiple, overlapping locks on a file
• Locks can be
  – Exclusive (write): no other process can have a lock on the range.
  – Shared (read): other shared locks may exist on the same bytes.
• A lock request that cannot be granted can either block or fail immediately (with fcntl, the choice of F_SETLKW or F_SETLK).
• A process’s locks are released when
  a) the process terminates
  b) the process closes any descriptor for the file, even if it still has the file open through another descriptor!
• Locks are not inherited across fork.
• Locks can carry across exec.
• Deadlock possibility:
  – Competing blocking lock requests could in principle result in deadlock,
  – but the kernel can detect this and fail the request (EDEADLK) instead of letting the processes hang.
System Calls for File Management
• s is an error code
• fd is a file descriptor
• position is a file offset
The stat System Call
• Mode: includes type and protection
• I-node number
• Device
• Link count
• Owner’s ID
• Group ID
• File size in bytes
• Access time
• Modification time
• Status change time
• Block size
• Block count (512-byte blocks)
Note that not all of these fields are stored in the i-node itself.
From the Solaris man page
System Calls for Directory Management
• s is an error code
• dir identifies a directory stream
• dirent is a directory entry
UNIX File System (4)
(a) A BSD directory with three files. (b) The same directory after the file voluminous has been removed.
Unix File Protection
• Divide the world into categories (owner / group / world) and specify read / write / execute access for each.
• Add read/write permissions for the owner/user and the group with:
  – chmod ug+rw file
  – u for user/owner; g for group; o for other/world.
  – + means add and - means remove.
  – r for read; w for write; x for execute.
• Adding / removing files requires write permission on their directory!
• Setuid
  – Executable files may have their setuid bit set to indicate that they execute with the permissions of their owner.
  – An example is the program that changes a user’s password, which requires write access to the password file.
NTFS Goals
• File / disk security
• Disk quotas
• File compression
• Encryption
NTFS Features
• 64-bit cluster indices
• Logging of metadata
• Multiple data streams
• Unicode-based names
• Hard links
• Sparse files
• Dynamic bad-cluster remapping
NTFS Master File Table (MFT)
• Can be anywhere on the disk; the boot sector says where
• Contains up to 2^48 records
• The first 16 records are reserved for metadata, describing
  – the MFT itself
  – a copy of the MFT
  – the logging file
  – the root directory
  – the bitmap of free/used blocks
  – the bootstrap code
  – bad blocks
  – quotas, etc.
NTFS MFT Record
• Describes one file or directory
• Has a length of 1 KB
• Header
  – Magic number
  – Sequence number
  – Reference count
  – Size
  – Etc.
• Attribute/value pairs
  – E.g., name, additional MFT records, and, for a directory, how the entries are stored.
• More than one record may be needed to describe a file.
NTFS MFT Data Attributes
• The data attribute keeps track of runs of consecutive blocks
• There may be holes.
• Each entry consists of a header and a sequence of runs.
• If the file is small, its data may be stored in the MFT record itself
NTFS Small Directory
• A small directory is an MFT record containing directory entries.
• The entries contain the file’s name and the MFT index of the file (plus some additional flags).
• Note: Large directories are stored as B+ trees.
NTFS File Name Lookup
• The file name in the example is C:\maria\web.htm
• The filename lookup first prepends \?? to the name.
• The name “\??\C:” is a symbolic link to an object that points to the root directory of the C: drive
• From there the filename lookup is similar to the process in Unix, except that MFT index numbers replace the i-node numbers.
NTFS File Protection
• Access Control Lists
  – Discretionary (DACL) says who has what permissions
  – System (SACL) says whose actions get audited.
Network File System (NFS)
• Goal: to allow file systems across a network to appear as one logical whole
• Client-server model
• Server “exports” one or more directories
• In the example:
  – Client 1 is replacing its /bin directory
  – Client 1 is also mounting the projects directory as /usr/ast/work.
  – Client 2 is mounting the projects directory as /mnt.
NFS Protocols
• Mounting
  – The client sends a pathname to the server, requesting permission to mount it.
  – If the directory is listed as exported, the server returns a file handle.
  – Remote directories might be mounted at boot time or automounted.
• Directory and file access
  – Most Unix system calls are supported.
  – Stateless connection:
    • The server does not support open.
    • Instead, the client sends a lookup, which returns a handle.
    • Each read has to say where to start reading from.
    • File locks are not supported.
NFS Implementation
• System call layer
  – open, read, close, etc.
• Virtual file system layer
  – Maintains a table of open files, using v-nodes
  – A v-node may point to a standard i-node or to an NFS client r-node.
• NFS client code
  – On mount, creates an r-node for the NFS directory.
  – On open, issues a lookup and creates an r-node.
  – Transfers in 8 KB chunks.
  – The client does read-ahead.
  – Client caching is used for efficiency:
    • Cached blocks are discarded after 3 seconds for data and 30 seconds for directories.
    • Writes are synced after 30 seconds.
Example File Systems
CD-ROM File Systems
• ISO 9660
  – Designed in 1988 for the lowest common denominator (i.e., MS-DOS).
  – The disc begins with 16 blocks that are up for grabs (e.g., for a bootstrap block)
  – Multi-byte numeric fields in directories are stored twice, once in big-endian and once in little-endian order.
  – File names were 8.3 plus a version number
• The Rock Ridge extension allowed various Unix features, such as
  – file protections,
  – longer names,
  – more time stamps,
  – arbitrary-depth hierarchies
• The Joliet extension (from Microsoft) added Unicode.