File Systems: Fundamentals

Files

What is a file?
  - A named collection of related information recorded on secondary storage (e.g., disks)

File attributes
  - Name, type, location, size, protection, creator, creation time, last-modified time, …

File operations
  - Create, Open, Read, Write, Seek, Delete, …

How does the OS allow users to use files?
  - “Open” a file before use
  - OS maintains an open file table per process; a file descriptor is an index into this table.
  - Allow sharing by maintaining a system-wide open file table
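The two-table arrangement can be sketched in a few lines. This is a toy model only, not how any particular kernel does it: `ToyKernel`, `OpenFileEntry`, and the `share` helper are hypothetical names made up for the illustration. It shows a file descriptor as a small integer indexing a per-process table, whose slots in turn index the system-wide table.

```python
from dataclasses import dataclass


@dataclass
class OpenFileEntry:
    """One entry in the (hypothetical) system-wide open file table."""
    inode_number: int
    offset: int = 0      # current read/write position
    ref_count: int = 0   # number of descriptors that refer to this entry


class ToyKernel:
    """Toy model: per-process tables hold indices into one system-wide table."""

    def __init__(self):
        self.system_table = []    # system-wide open file table
        self.process_tables = {}  # pid -> list of indices into system_table

    def open(self, pid, inode_number):
        """Open a file; the returned descriptor indexes the per-process table."""
        self.system_table.append(OpenFileEntry(inode_number, ref_count=1))
        table = self.process_tables.setdefault(pid, [])
        table.append(len(self.system_table) - 1)
        return len(table) - 1

    def share(self, from_pid, fd, to_pid):
        """Give to_pid a descriptor for the same system-wide entry."""
        sys_index = self.process_tables[from_pid][fd]
        self.system_table[sys_index].ref_count += 1
        table = self.process_tables.setdefault(to_pid, [])
        table.append(sys_index)
        return len(table) - 1

    def entry(self, pid, fd):
        """Follow descriptor -> per-process table -> system-wide table."""
        return self.system_table[self.process_tables[pid][fd]]


kernel = ToyKernel()
fd = kernel.open(pid=1, inode_number=42)
shared_fd = kernel.share(1, fd, to_pid=2)
kernel.entry(1, fd).offset = 100
# Both processes see the same offset because they share one system-wide entry.
assert kernel.entry(2, shared_fd).offset == 100
```

Keeping the offset in the shared entry rather than in the per-process table is one possible design choice; it is what makes the sharing visible in this sketch.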

Fundamental Duality of File Systems

Metadata
  - The index node (inode) is the fundamental data structure
  - The superblock also has important metadata, like block size
Data
  - The contents that users actually care about
Files
  - Contain data and have metadata like creation time, length, etc.
Directories
  - Map file names to inode numbers

Block vs. Sector

The OS may choose to use a larger block size than the sector size of the physical disk. Each block consists of consecutive sectors. Why?
  - A larger block size increases the transfer efficiency (why?)
  - It can be convenient to have the block size match (a multiple of) the machine's page size (why?)

Some systems allow transferring of many sectors between interrupts.

Some systems interrupt after each sector operation (rare these days)
  - “Consecutive” sectors may mean “every other physical sector” to allow time for the CPU to start the next transfer before the head moves over the desired sector
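As a small illustration of "each block consists of consecutive sectors", the mapping from a block number to its sectors is plain arithmetic. The 4096-byte block and 512-byte sector sizes below are assumed example values, and `sectors_for_block` is a name chosen for this sketch.

```python
def sectors_for_block(block_number, block_size=4096, sector_size=512):
    """Return the consecutive logical sector numbers that make up one block."""
    if block_size % sector_size != 0:
        raise ValueError("block size must be a multiple of the sector size")
    sectors_per_block = block_size // sector_size
    first = block_number * sectors_per_block
    return list(range(first, first + sectors_per_block))


# With the assumed sizes, each block spans 8 sectors.
assert sectors_for_block(0) == [0, 1, 2, 3, 4, 5, 6, 7]
assert sectors_for_block(3)[0] == 24
```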

File System Functionality and Implementation

File system functionality:
  - Pick the blocks that constitute a file.
    - Must balance locality with expandability.
    - Must manage free space.
  - Provide file naming organization, such as a hierarchical name space.

File system implementation:
  - File header (descriptor, inode): owner id, size, last modified time, and location of all data blocks.
    - OS should be able to find metadata block number N without a disk access (e.g., by using math or a cached data structure).
  - Data blocks.
    - Directory data blocks (human readable names, permissions)
    - File data blocks (data).
  - Superblocks, group descriptors, other metadata…

File System Properties

Most files are small.
  - Need strong support for small files.
  - Block size can’t be too big.

Some files are very large.
  - Must allow large files (64-bit file offsets).
  - Large file access should be reasonably efficient.

Most systems fit the following profile:
  1. Most files are small.
  2. Most disk space is taken up by large files.
  3. I/O operations target both small and large files.
--> The per-file cost must be low, but large files must also have good performance.
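The implementation note about finding metadata "by using math" can be made concrete. The sketch below assumes a single contiguous inode table; the starting block (8), inode size (128 bytes), and block size (4096 bytes) are hypothetical values picked for the example, not properties of any particular file system.

```python
def locate_inode(inumber, inode_table_start=8, inode_size=128, block_size=4096):
    """Compute (disk block, byte offset) of an inode with no disk access.

    Assumes inodes are packed back to back in a contiguous table that
    begins at block inode_table_start, with inode 0 stored first.
    """
    inodes_per_block = block_size // inode_size
    block = inode_table_start + inumber // inodes_per_block
    offset = (inumber % inodes_per_block) * inode_size
    return block, offset


# 32 inodes fit in each block under these assumptions.
assert locate_inode(2) == (8, 256)
assert locate_inode(32) == (9, 0)
```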

How do we find and organize files on the disk?

The information that we need:
  file header points to data blocks
    fileID 0, Block 0 --> Disk block 19
    fileID 0, Block 1 --> Disk block 4,528
    …

Key performance issues:
  1. We need to support sequential and random access.
  2. What is the right data structure in which to maintain file location information?
  3. How do we lay out the files on the physical disk?

If my file system only has lots of big video files, what block size do I want?
  1. Large
  2. Small
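One way to reason about the block-size question is to count blocks and wasted bytes for a given file size. This is only a rough sketch: it looks at space in the last block and the number of blocks to track, and ignores transfer efficiency. The function name and the sizes tried are assumptions for the example.

```python
def blocks_and_waste(file_size, block_size):
    """Return (blocks needed, bytes wasted in the last block)."""
    blocks = -(-file_size // block_size)  # ceiling division
    waste = blocks * block_size - file_size
    return blocks, waste


# A 100-byte file wastes almost the whole block, and more so with big blocks.
assert blocks_and_waste(100, 4096) == (1, 3996)
assert blocks_and_waste(100, 65536) == (1, 65436)

# A large file needs far fewer blocks (and block pointers) with big blocks.
small_blocks, _ = blocks_and_waste(10**9, 4096)
large_blocks, _ = blocks_and_waste(10**9, 65536)
assert large_blocks < small_blocks
```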

File Allocation Methods: Contiguous allocation

File header specifies starting block & length

Placement/Allocation policies
  - First-fit, best-fit, ...

Pluses
  - Best file read performance
  - Efficient sequential & random access

Minuses
  - Fragmentation!
  - Problems with file growth
    - Pre-allocation?
    - On-demand allocation?

File Allocation Methods: Linked allocation

Files stored as a linked list of blocks

File header contains a pointer to the first and last file blocks

Pluses
  - Easy to create, grow & shrink files
  - No external fragmentation

Minuses
  - Impossible to do true random access
  - Reliability
    - Break one link in the chain and...
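The difference in random access between the two schemes shows up directly in how a logical block number is turned into a disk block. The sketch below models the on-disk "next" pointers of linked allocation as a Python dict, which is an assumption made purely for illustration; the function names are hypothetical.

```python
def contiguous_lookup(start_block, length, logical_block):
    """Contiguous allocation: one addition, no disk reads needed."""
    if not 0 <= logical_block < length:
        raise IndexError("logical block outside the file")
    return start_block + logical_block


def linked_lookup(next_pointer, first_block, logical_block):
    """Linked allocation: follow the chain one block at a time.

    next_pointer maps a disk block to the next disk block of the file
    (None at the end). Returns (disk block, pointer hops); on a real
    disk each hop means reading the previous block to learn the link.
    """
    block = first_block
    for _ in range(logical_block):
        block = next_pointer[block]
        if block is None:
            raise IndexError("logical block outside the file")
    return block, logical_block


chain = {19: 4528, 4528: 7, 7: None}  # file occupies blocks 19 -> 4528 -> 7
assert contiguous_lookup(start_block=100, length=5, logical_block=2) == 102
assert linked_lookup(chain, first_block=19, logical_block=2) == (7, 2)
```

If one entry in `chain` is lost, everything after it becomes unreachable, which is the reliability concern listed under Minuses.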

File Allocation Methods: Linked allocation: File Allocation Table (FAT) (Win9x, OS2)

Maintain linked list in a separate table
  - A table entry for each block on disk
  - Each table entry in a file has a pointer to the next entry in that file (with a special “eof” marker)
  - A “0” in the table entry --> free block

Comparison with linked allocation
  - If FAT is cached --> better sequential and random access performance
    - How much memory is needed to cache the entire FAT?
      - 400GB disk, 4KB/block --> 100M entries in FAT --> 400MB
    - Solution approaches
      - Allocate larger clusters of storage space
      - Allocate different parts of the file near each other --> better locality for FAT

File Allocation Methods: Direct allocation

File header points to each data block

Pluses
  - Easy to create, grow & shrink files
  - Little fragmentation
  - Supports direct access

Minuses
  - Inode is big or variable size
  - How to handle large files?
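A minimal sketch of the FAT idea, assuming the table is a Python list indexed by block number, 0 marks a free block as on the slide, and -1 stands in for the “eof” marker (the actual marker value is not given in the slides). The 4-byte entry size in `fat_size_bytes` is also an assumption; it is the value that makes 100M entries come to 400MB.

```python
FREE = 0   # a "0" in the table entry means the block is free
EOF = -1   # stand-in for the special "eof" marker (assumed value)


def file_blocks(fat, first_block):
    """Follow the chain in the table and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != EOF:
        if fat[block] == FREE:
            raise ValueError("chain runs into a free block")
        blocks.append(block)
        block = fat[block]
    return blocks


def fat_size_bytes(disk_bytes, block_bytes, entry_bytes=4):
    """Memory needed to cache the whole table: one entry per disk block."""
    return (disk_bytes // block_bytes) * entry_bytes


# Toy 8-block disk: one file occupies blocks 2 -> 5 -> 4.
fat = [FREE, FREE, 5, FREE, EOF, 4, FREE, FREE]
assert file_blocks(fat, first_block=2) == [2, 5, 4]

# 400GB disk with 4KB blocks (decimal units): 100M entries, 400MB of table.
assert fat_size_bytes(400 * 10**9, 4 * 10**3) == 400 * 10**6
```

Because the links live in the table rather than in the data blocks, a cached table lets the chain be followed in memory without touching the disk.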

File Allocation Methods: Indexed allocation

[Figure: file header (I) pointing to an index block (IB)]

Create a non-data block for each file called the index block
  - A list of pointers to file blocks

File header contains the index block

Pluses
  - Easy to create, grow & shrink files
  - Little fragmentation
  - Supports direct access

Minuses
  - Overhead of storing index when files are small
  - How to handle large files?

Indexed Allocation: Handling large files

Linked index blocks (IB+IB+…)

[Figure: file header (I) pointing to a chain of index blocks, IB --> IB --> IB]

Multilevel index blocks (IB*IB*…)

[Figure: file header (I) pointing to an index block whose entries point to further index blocks]

Multi-level Indirection in Unix

File header contains 13 pointers
  - 10 pointers to data blocks; 11th pointer --> indirect block; 12th pointer --> doubly-indirect block; and 13th pointer --> triply-indirect block

Implications
  - Upper limit on file size (~2 TB)
  - Blocks are allocated dynamically (allocate indirect blocks only for large files)

Features
  - Pros
    - Simple
    - Files can easily expand
    - Small files are cheap
  - Cons
    - Large files require a lot of seeks to access indirect blocks

Why bother with index blocks?
  - A. Allows greater file size.
  - B. Faster to create files.
  - C. Simpler to grow files.
  - D. Simpler to prepend and append to files.
  - E. Scott Summers is the X-men’s Cyclops
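To see why small files are cheap and large files cost extra seeks, it helps to work out which pointer covers a given logical block. The sketch below follows the 10 + 1 + 1 + 1 pointer layout from the slide; the figure of 1024 pointers per index block is an assumption (it would correspond to 4 KB blocks with 4-byte pointers), and the function names are hypothetical.

```python
def classify_block(logical_block, direct=10, pointers_per_block=1024):
    """Return (pointer kind, index blocks to read before the data block)."""
    n = pointers_per_block
    if logical_block < 0:
        raise ValueError("negative block number")
    if logical_block < direct:
        return "direct", 0
    logical_block -= direct
    if logical_block < n:
        return "indirect", 1
    logical_block -= n
    if logical_block < n * n:
        return "doubly-indirect", 2
    logical_block -= n * n
    if logical_block < n ** 3:
        return "triply-indirect", 3
    raise ValueError("beyond the largest file this layout can address")


def max_file_blocks(direct=10, pointers_per_block=1024):
    """Largest number of data blocks the 13 pointers can reach."""
    n = pointers_per_block
    return direct + n + n ** 2 + n ** 3


assert classify_block(9) == ("direct", 0)
assert classify_block(10) == ("indirect", 1)
assert classify_block(10 + 1024) == ("doubly-indirect", 2)
```

The byte limit that follows from `max_file_blocks` depends on the block size and pointer width, and other limits (such as the width of file offsets or block numbers) may apply first, so this sketch does not try to reproduce the ~2 TB figure.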

Indexed Allocation in UNIX

Multilevel, indirection, index blocks

[Figure: the inode points directly to 10 data blocks; its 1st-level indirection block points to n data blocks; its 2nd-level indirection block points to index blocks (IB) that reach n^2 data blocks; its 3rd-level indirection block points through two levels of index blocks (IB) to reach n^3 data blocks]

How big is an inode?
  - A. 1 byte
  - B. 16 bytes
  - C. 128 bytes
  - D. 1 KB
  - E. 16 KB

Allocate from a free list

Need a data block
  - Consult list of free data blocks

Need an inode
  - Consult a list of free inodes

Why do inodes have their own free list?
  - A. Because they are fixed size
  - B. Because they exist at fixed locations
  - C. Because there are a fixed number of them

Free list representation

Represent the list of free blocks as a bit vector:
  111111111111111001110101011101111...
  - If bit i = 0 then block i is free; if bit i = 1 then it is allocated

Simple to use and the vector is compact: a 1TB disk with 4KB blocks needs 2^28 bits, or 32 MB

If free sectors are uniformly distributed across the disk, then the expected number of bits that must be scanned before finding a “0” is n/r, where n = total number of blocks on the disk and r = number of free blocks

If a disk is 90% full, then the average number of bits to be scanned is 10, independent of the size of the disk
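The bit-vector arithmetic above can be checked with a short sketch. The bitmap is modeled as a Python list of 0/1 values for readability (a real implementation would pack bits into bytes), and all function names are made up for this example.

```python
def find_free_block(bitmap):
    """Scan for the first 0 bit; return (block number, bits scanned)."""
    for i, bit in enumerate(bitmap):
        if bit == 0:
            return i, i + 1
    raise RuntimeError("no free block")


def bitmap_size_bytes(disk_bytes, block_bytes):
    """One bit per block, eight bits per byte."""
    return (disk_bytes // block_bytes) // 8


def expected_bits_scanned(total_blocks, free_blocks):
    """n/r from the slide, assuming free blocks are uniformly distributed."""
    return total_blocks / free_blocks


assert find_free_block([1, 1, 1, 0, 1, 0]) == (3, 4)

# 1TB disk with 4KB blocks: 2^28 bits, which is 2^25 bytes = 32 MB.
assert bitmap_size_bytes(2**40, 2**12) == 32 * 2**20

# 90% full: 10 bits on average, whatever the disk size.
assert expected_bits_scanned(1000, 100) == 10
assert expected_bits_scanned(10**9, 10**8) == 10
```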

Other Free List Representations

In-situ linked lists

Grouped lists

[Figure: free blocks chained in place as a linked list, and free blocks grouped with a pointer to the next group block; legend: allocated block, empty block]

Naming and Directories

Files are organized in directories
  - Directories are themselves files
  - Contain a table that maps file names to file headers (inode numbers)

Only OS can modify a directory
  - Ensure integrity of the mapping
  - Application programs can read a directory (e.g., ls)

Directory operations:
  - List contents of a directory
  - Search (find a file)
    - Linear search
    - Binary search
    - Hash table
  - Create a file
  - Delete a file

Every directory has an inode
  - A. True
  - B. False

Given only the inode number (inumber) the OS can find the inode on disk
  - A. True
  - B. False

Directory Hierarchy and Traversal

Directories are often organized in a hierarchy

Directory traversal:
  - How do you find blocks of a file? Let’s start at the bottom
    - Find file header (inode): it contains pointers to file blocks
    - To find file header (inode), we need its I-number
    - To find I-number, read the directory that contains the file
    - But wait, the directory itself is a file
    - Recursion !!
  - Example: Read file /A/B/C
    - C is a file
    - B/ is a directory that contains the I-number for file C
    - A/ is a directory that contains the I-number for file B
    - How do you find I-number for A?
      - “/” is a directory that contains the I-number for file A
      - What is the I-number for “/”? In Unix, it is 2
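The recursion in the /A/B/C example unrolls into a simple loop that starts at the root. In the sketch below, directories are modeled as an in-memory dict from I-number to a name table, which is an assumption for illustration; the I-numbers 11, 12, and 13 are made up, and only the root I-number 2 comes from the slide.

```python
ROOT_INUMBER = 2  # I-number of "/" in Unix, per the slide


def resolve(path, directories):
    """Walk an absolute path and return the I-number of its last component.

    directories maps a directory's I-number to its {name: I-number} table.
    """
    if not path.startswith("/"):
        raise ValueError("absolute path expected")
    inumber = ROOT_INUMBER
    for name in [part for part in path.split("/") if part]:
        entries = directories.get(inumber)
        if entries is None:
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        inumber = entries[name]
    return inumber


# Hypothetical I-numbers for A, B, and C.
directories = {2: {"A": 11}, 11: {"B": 12}, 12: {"C": 13}}
assert resolve("/A/B/C", directories) == 13
assert resolve("/", directories) == 2
```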

Directory Traversal (Cont’d.)

How many disk accesses are needed to access file /A/B/C?
  1. Read I-node for “/” (root) from a fixed location
  2. Read the first data block for root
  3. Read the I-node for A
  4. Read the first data block of A
  5. Read the I-node for B
  6. Read the first data block of B
  7. Read I-node for C
  8. Read the first data block of C

Optimization:
  - Maintain the notion of a current working directory (CWD)
  - Users can now specify relative file names
  - OS can cache the data blocks of CWD

Naming and Directories

Once you have the file header, you can access all blocks within a file
  - How to find the file header? Inode number + layout.

Where are file headers stored on disk?
  - In early Unix:
    - Special reserved array of sectors
    - Files are referred to with an index into the array (I-node number)
    - Limitations: (1) Header is not near data; (2) fixed size of array --> fixed number of files on disk (determined at the time of formatting the disk)
  - Berkeley fast file system (FFS):
    - Distribute file header array across cylinders.
  - Ext2 (Linux):
    - Put inodes in block group header.

How do we find the I-node number for a file?
  - Solution: directories and name lookup
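The count of eight accesses for /A/B/C generalizes to two reads per path component plus two for the root. The sketch below encodes that rule under simplifying assumptions that are not stated in the slides: every directory fits in its first data block, nothing is cached except (optionally) the data blocks of the current working directory, and a relative path is resolved starting from that cached directory. `disk_accesses` is a hypothetical helper name.

```python
def disk_accesses(path, cwd_cached=False):
    """Count disk reads needed to reach the first data block of a file."""
    components = [part for part in path.split("/") if part]
    if path.startswith("/"):
        # Root inode + root data block, then inode + data block per component.
        return 2 + 2 * len(components)
    if not cwd_cached:
        raise ValueError("relative path needs a cached current working directory")
    # The CWD's data blocks are already in memory, so the walk starts there.
    return 2 * len(components)


assert disk_accesses("/A/B/C") == 8
# With CWD = /A/B cached, the relative name "C" needs only its inode and data block.
assert disk_accesses("C", cwd_cached=True) == 2
```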

A corrupt directory can make a file system useless
  - A. True
  - B. False