11/11/19

COS 318: Operating Systems
File Structure

Jaswinder Pal Singh and a Fabulous Course Staff
Computer Science Department
Princeton University
(http://www.cs.princeton.edu/courses/cos318/)

Where Are We?

• Covered:
● Management of CPU & concurrency
● Management of main memory & virtual memory
● Management of I/O devices
• Currently --- File Systems
● This lecture: File Structure
• Then:
● Naming and directories
● Efficiency and performance
● Reliability and protection


The Abstraction

Named files, arranged in folders or directories

Physical Reality            File System Abstraction
block oriented              byte oriented (char stream)
physical sector #’s         named files
no protection               users protected from each other
data might be corrupted     robust to machine failures
if machine crashes

File System

◆ Naming
● File name and directories
◆ File access
● Read, write, other operations
◆ Buffer cache
● Reduce client/server disk I/Os
◆ Disk allocation
● Layout, mapping files to blocks
◆ Security, protection, reliability, durability
◆ Management tools

[Figure: file system layers: File naming, File access, Buffer cache, Disk allocation, Management, Disk Drivers]


Topics

◆ File system structure
◆ Disk allocation and i-nodes
◆ Directory and link implementations
◆ Physical layout for performance

Typical File Attributes

• Name
• Type – needed for systems that support different types
• Location – pointer to file location on device
• Size – current file size
• Protection – controls who can read, write, execute
• Time, date, and user identification – data for protection, security, and usage monitoring
• Information about files is kept in the directory structure, which is maintained on the disk


Master Boot Record

• Starts at first sector of disk
• End of record lists the partitions on the disk
● Every partition can have a different file system
• Upon boot:
● BIOS reads in and executes MBR
● Finds active disk partition from MBR
● First block of active partition (boot block) is loaded and executed
● That loads in the OS from that partition
• What does a partition, and the file layout on it, look like?

Typical Layout of a Disk Partition

◆ Boot block
● Code to load and boot OS
◆ Super-block defines a file system
● File system info: type, no of blocks, ...
● File metadata area
● Information about / ptr to free blocks
● Location of descriptor of root directory
◆ File metadata
● Each descriptor describes a file
◆ Directories
● Directory data (directory and file names)
◆ File data
● Data blocks

[Figure: partition layout, in order: Boot block, Superblock, File metadata (i-nodes in Unix), Directory data, File data]
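The "end of record lists the partitions" step from the Master Boot Record slide can be sketched in a few lines. This is a minimal illustration, assuming the classic PC MBR layout (four 16-byte partition entries at byte offset 446, boot signature 0x55 0xAA at the end of the 512-byte sector, status byte 0x80 for the active partition). The function name `parse_mbr` and the sample sector built here are hypothetical and exist only for the example.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # classic MBR: four 16-byte entries start here
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR sector
ENTRY_FORMAT = "<B3xB3xII"     # status, (CHS skipped), type, (CHS skipped), first LBA, sector count


def parse_mbr(sector):
    """Return the non-empty partition entries found in a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        offset = PARTITION_TABLE_OFFSET + 16 * i
        status, ptype, first_lba, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        if ptype == 0:          # type 0 marks an unused slot
            continue
        partitions.append({
            "index": i,
            "active": status == 0x80,
            "type": ptype,
            "first_lba": first_lba,   # sector number of the partition's boot block
            "sectors": count,
        })
    return partitions


# Build a made-up MBR with one active partition starting at sector 2048.
sample = bytearray(SECTOR_SIZE)
struct.pack_into(ENTRY_FORMAT, sample, PARTITION_TABLE_OFFSET, 0x80, 0x83, 2048, 204800)
sample[510:512] = BOOT_SIGNATURE

partitions = parse_mbr(bytes(sample))
active = [p for p in partitions if p["active"]]
```

The boot code would then load the first block of the active partition (`first_lba`), which is the boot block described in the partition layout slide.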


File Types – Name, Extension

File Type        Usual extension          Function
Executable       exe, com, bin or none    ready-to-run machine-language program
Object           obj, o                   compiled, machine language, not linked
Source code      c, p, pas, f77, asm, a   source code in various languages
Batch            bat, sh                  commands to the command interpreter
Text             txt, doc                 textual data, documents
Word processor   wp, tex, rrf, etc.       various word-processor formats
Library          lib, a                   libraries of routines
Print or view    ps, dvi, gif             ASCII or ...
Archive          arc, zip, tar            related files grouped into one file, sometimes compressed

Typical File Operations

• Create
• Write
• Read
• Reposition within file – file seek
• Delete
• Truncate
• Open(Fi) – search the directory structure on disk for entry Fi, and move the content of the entry to memory.
• Close(Fi) – move the content of entry Fi in memory to the directory structure on disk.


Open A File: Open(fd, name, access)

◆ Various checking (directory and file name lookup, authenticate)
◆ Copy the file descriptors into the in-memory data structure
◆ Create an entry in the open file table (system wide)
◆ Create an entry in PCB
◆ Return user a pointer to the open-file info

[Figure: Process control block (open file pointer array) → Open-file table (system-wide) → File descriptors (Metadata) → File system info on disk: File metadata, Directories, File data]

Translating from user to system view

• User wants to read 10 bytes from file starting at byte 2?
● Seek byte 2, fetch the block, read 10 bytes
• User wants to write 10 bytes to file starting at byte 2?
● Seek byte 2, fetch the block, write 10 bytes, write out block
• Everything inside file system is in whole size blocks
● Even getc and putc buffers 4096 bytes
• From now on, file is collection of blocks.
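The byte-to-block translation in the slide above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the 4096-byte block size is taken from the getc/putc remark, while the class name `BlockFile` and its I/O counters are hypothetical and stand in for a real disk.

```python
BLOCK_SIZE = 4096  # block size mentioned on the slide for getc/putc buffering


class BlockFile:
    """A file modeled as a list of whole blocks; counts block-level I/Os."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.block_reads = 0
        self.block_writes = 0

    def _fetch(self, n):
        self.block_reads += 1
        return bytearray(self.blocks[n])      # copy, as if read from disk

    def _store(self, n, data):
        self.block_writes += 1
        self.blocks[n] = bytearray(data)

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            blk, start = divmod(offset, BLOCK_SIZE)
            chunk = min(length, BLOCK_SIZE - start)
            out += self._fetch(blk)[start:start + chunk]
            offset += chunk
            length -= chunk
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            blk, start = divmod(offset + pos, BLOCK_SIZE)
            chunk = min(len(data) - pos, BLOCK_SIZE - start)
            block = self._fetch(blk)          # fetch the whole block first
            block[start:start + chunk] = data[pos:pos + chunk]
            self._store(blk, block)           # then write the whole block back
            pos += chunk


f = BlockFile(num_blocks=2)
f.write(2, b"0123456789")     # write 10 bytes starting at byte 2
result = f.read(2, 10)        # read them back
```

Even a 10-byte write costs one full block read plus one full block write in this model, which is the point of "everything inside the file system is in whole size blocks".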


File usage patterns

• How do users access files?
● Sequential: bytes read in order
● “Random”: read/write element out of middle of file
● Content-based access: find me next byte starting with “COS318”
• How are files used?
● Most files are small
● Large files use up most of the disk space
● Most transfers are small
● Large files account for most of the bytes transferred
• Bad news
● Need everything to be efficient

File system design constraints

• For small files:
● Small enough blocks for storage efficiency
● Files used together should be stored together
• For large files:
● Contiguous allocation for sequential access
● Efficient lookup for random access
• May not know at file creation whether file will become small or large


File system design

• Data structures
● Directories: file name -> file metadata
• Store directories as files
● File metadata: used to find file data blocks of the file
● Free map: list of free disk blocks
• How do we organize these data structures?

Data structures for disk management

• A file header for each file (part of the file meta-data)
● Disk sectors associated with each file
• A data structure to track free space on disk
● Bit map
• 1 bit per block (sector)
• blocks numbered in cylinder-major order, why?
● Linked list
● Others?
• What about allocation for the blocks associated with a file?
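The bit map option from the disk management slide (1 bit per block) can be sketched as follows. This is an illustrative model only; the class `FreeBitmap` and its method names are hypothetical, and a first-fit scan is just one possible search policy.

```python
class FreeBitmap:
    """Free-space map with 1 bit per block: 0 = free, 1 = in use."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)

    def is_used(self, block):
        return bool((self.bits[block // 8] >> (block % 8)) & 1)

    def mark(self, block, used):
        if used:
            self.bits[block // 8] |= 1 << (block % 8)
        else:
            self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def find_contiguous(self, n):
        """First-fit search for n consecutive free blocks; None if no such run."""
        run_start, run_len = 0, 0
        for block in range(self.num_blocks):
            if self.is_used(block):
                run_start, run_len = block + 1, 0
            else:
                run_len += 1
                if run_len == n:
                    return run_start
        return None

    def allocate(self, n):
        start = self.find_contiguous(n)
        if start is not None:
            for block in range(start, start + n):
                self.mark(block, True)
        return start

    def free(self, start, n):
        for block in range(start, start + n):
            self.mark(block, False)

    def free_count(self):
        return sum(not self.is_used(b) for b in range(self.num_blocks))


disk = FreeBitmap(10)
a = disk.allocate(3)      # blocks 0-2
b = disk.allocate(3)      # blocks 3-5
c = disk.allocate(2)      # blocks 6-7
disk.free(b, 3)           # blocks 3-5 become free again
total_free = disk.free_count()
four_block_file = disk.allocate(4)
```

After the `free` call there are five free blocks in total (3-5 and 8-9), yet a request for four contiguous blocks fails in this model, which is one way to see why the question of how to allocate a file's blocks matters.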


Contiguous Allocation

◆ Allocate contiguous blocks of storage
● Bitmap: find N contiguous 0’s
● Linked list: find a region (size >= N)
◆ File metadata
● First block in file
● Number of blocks
◆ Pros
● Fast sequential access
● Easy random access
◆ Cons
● External fragmentation (what if file C needs 4 blocks)
● Hard to grow files

[Figure: File A and File B stored contiguously with a 3-block gap]

Linked Files

◆ File structure (Alto)
● File metadata points to 1st block on storage
● A block points to the next
● Last block has a NULL pointer
◆ Pros
● Can grow files dynamically
● File data tracked similarly to free list of blocks
● Doesn’t waste space
◆ Cons
● Random access: bad
● Unreliable: losing a block means losing the rest

[Figure: File header pointing to a chain of blocks ending in null]

Linked files (cont’d)

• Idea is to keep the linked list metadata (pointers) in memory, rather than on disk

File Allocation Table (FAT)

• Allocation table at beginning of each volume
◆ N entries for N blocks
◆ Want to keep it in memory
• File structure (MS-DOS)
● A file is a linked list of blocks
● File metadata points to first block of file
● The entry of first block points to next, …
• Pros
● Simple
• Cons
● Random access: still not good
● Wastes space - table for each file expensive to keep in memory

[Figure: directory entry foo → 217; FAT entries 217 → 619, 619 → 399, 399 → EOF]
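The FAT figure (foo starts at block 217, then 619, then 399, then EOF) can be traced with a short sketch. The table is modeled here as a sparse dictionary for brevity, whereas a real FAT is an array with one entry per block; the sentinel value `EOF = -1` and the function names are assumptions made for the illustration.

```python
EOF = -1  # hypothetical end-of-chain marker for this sketch

# Entries taken from the slide's figure: each block's entry names the next block.
fat = {217: 619, 619: 399, 399: EOF}
directory = {"foo": 217}   # file metadata points to the first block


def block_chain(name):
    """Follow the FAT entries from the file's first block to EOF."""
    chain = []
    block = directory[name]
    while block != EOF:
        chain.append(block)
        block = fat[block]
    return chain


def nth_block(name, n):
    """Random access: reaching logical block n takes n table lookups."""
    block = directory[name]
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("logical block past end of file")
    return block


chain = block_chain("foo")
third = nth_block("foo", 2)
```

With the table in memory the hops are cheap compared with following on-disk pointers, but reaching block n still costs n lookups, which matches the "random access: still not good" remark.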


DEMOS (Cray-1)

◆ Idea
● Try contiguous allocation
● Allow non-contiguous
◆ File structure
● Small file metadata has 10 (base,size) pointers
● Big file has 10 indirect pointers
◆ Pros & Cons
● Can grow
● Fragmentation

[Figure: file metadata with (base, size) entries size0 … size9, each pointing to a contiguous run of disk blocks; for big files, each entry points to a further table of size0 … size9 entries]

Single-level Indexed File

• User declares max size
• File header holds array of pointers to disk blocks
• Pros:
● Can grow up to a limit
● Random access is fast
● No external fragmentation
• Cons:
● Clumsy to grow beyond limit
● Still lots of seeks

[Figure: File header with pointers to disk blocks]

Single-level indexed files (cont’d)

Multi-level Indexed Files

[Figure: outer-index → index table → file]


Hybrid Multi-level Indexed Files (Unix)

◆ 13 Pointers in a header
● 10 direct pointers
● 11: 1-level indirect
● 12: 2-level indirect
● 13: 3-level indirect
◆ Pros & Cons
● In favor of small files
● Can grow
● Limit is 16G
● Can have lots of seeking

[Figure: header pointers 1, 2, … pointing directly to data; pointers 11, 12, 13 pointing through one, two, and three levels of index blocks to data]

Original Unix i-node

◆ Mode: file type, protection bits, setuid, setgid bits
◆ Link count: no. of directory entries pointing to this file
◆ Uid: uid of the file owner
◆ Gid: gid of the file owner
◆ File size
◆ Times (access, modify, change)
◆ 10 pointers to data blocks
◆ Single indirect pointer
◆ Double indirect pointer
◆ Triple indirect pointer
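To make the 13-pointer scheme concrete, the sketch below maps a logical block number to the pointer level that covers it and computes the maximum file size. The slides do not state the block or pointer size; 1 KB blocks and 4-byte pointers (256 pointers per index block) are assumptions chosen here because they give a limit of roughly 16 GB, consistent with the "Limit is 16G" bullet. The function names are hypothetical.

```python
NUM_DIRECT = 10  # from the slide: 10 direct pointers, then 1-, 2-, 3-level indirect


def locate(logical_block, ptrs_per_block=256):
    """Return (level, indices): level 0 = direct, 1-3 = levels of indirection.

    indices are the slots to follow at each level, outermost first.
    """
    n = logical_block
    p = ptrs_per_block
    if n < NUM_DIRECT:
        return 0, (n,)
    n -= NUM_DIRECT
    if n < p:
        return 1, (n,)
    n -= p
    if n < p * p:
        return 2, (n // p, n % p)
    n -= p * p
    if n < p ** 3:
        return 3, (n // (p * p), (n // p) % p, n % p)
    raise ValueError("logical block beyond maximum file size")


def max_file_bytes(block_size=1024, ptr_size=4):
    """Maximum file size reachable through all 13 pointers."""
    p = block_size // ptr_size
    return (NUM_DIRECT + p + p ** 2 + p ** 3) * block_size


small = locate(9)          # last block reachable by a direct pointer
first_indirect = locate(10)
first_double = locate(266)
limit = max_file_bytes()
```

A block in the direct range needs no extra index reads, while a block under the triple-indirect pointer needs three, which is the "in favor of small files" and "can have lots of seeking" trade-off on the slide.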

Extents

◆ An extent is a variable number of blocks
◆ Main idea
● A file is a number of extents
● XFS uses 8Kbyte blocks
● Max extent size is 2M blocks
◆ Index nodes need to have
● Block offset
● Length
● Starting block
◆ Used in NTFS, Linux EXT4, …
◆ Pros: little metadata, fast seq access, can grow over time, less fragmentation
◆ Cons: external fragmentation still problem

[Figure: extent record with fields Block offset, length, Starting block]

Naming Files

Can name files via:
◆ Index (i-node number): Not easy for users to specify
◆ Text name: Need to map it to index
◆ Icon: Need to map it to index or to text and then to index

Directories
◆ Table of file name, file index pairs
◆ Map name to file index (where to find the header)
◆ A directory is itself stored as a file
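The three fields an index node keeps per extent (block offset, length, starting block) are enough to translate a logical block to a disk block. The sketch below assumes the extent list is sorted by block offset and uses made-up block numbers; `Extent` and `lookup` are hypothetical names for the illustration.

```python
from bisect import bisect_right
from typing import NamedTuple


class Extent(NamedTuple):
    block_offset: int   # first logical block of the file covered by this extent
    length: int         # number of contiguous blocks
    start_block: int    # first physical block on disk


def lookup(extents, logical_block):
    """Translate a logical block to a physical block; extents sorted by offset."""
    offsets = [e.block_offset for e in extents]
    i = bisect_right(offsets, logical_block) - 1
    if i < 0:
        raise IndexError("logical block before first extent")
    e = extents[i]
    if logical_block >= e.block_offset + e.length:
        raise IndexError("logical block not covered by any extent")
    return e.start_block + (logical_block - e.block_offset)


# A 16-block file described by three extents (example values only).
extents = [Extent(0, 4, 100), Extent(4, 2, 500), Extent(6, 10, 220)]
physical = [lookup(extents, n) for n in (0, 5, 6, 15)]
```

Three small records describe sixteen blocks here, which is the "little metadata" advantage; a block-pointer scheme would need sixteen pointers for the same file.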


Naming Tricks

• Bootstrapping: Where do you start looking?
● Root directory
● inode #2 on the system
● 0 and 1 used for other purposes
• Special names:
● Root directory: “/” (bootstrap name system for users)
● Current directory: “.”
● Parent directory: “..”
● User’s home directory: “~”
• Using the given names, only need two operations to navigate the entire name space:
● cd ‘name’: move into (change context to) directory “name”
● ls: enumerate all names in current directory (context)

Directory Organization Examples

◆ Flat (CP/M)
● All files are in one directory
◆ Hierarchical (Unix)
● /u/cos318/foo
● Directory is stored in a file containing (name, i-node) pairs
● The name can be either a file or a directory
◆ Hierarchical (Windows)
● C:\windows\temp\foo
● File extensions have meaning (unlike in Unix). Use the extension to indicate whether the entry is a directory

Mapping File Names to i-nodes

Need to support the following types of operations:
◆ Create/delete
● Create/delete a directory
◆ Open/close
◆ Link/unlink
● Link/unlink a file
◆ Rename
● Rename the directory

Linear List

◆ Method
● (FileName, i-node) pairs are linearly stored in a file
● Create a file
• Append
● Delete a file
• Compact by moving the rest
◆ Pros
● Space efficient
◆ Cons
● Linear search
● Need to deal with fragmentation

[Figure: directory /u/jps stored as a list of entries: foo, bar, …, veryLongFileName]
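The linear-list method from the slide above (append on create, compact on delete, linear search on lookup) can be sketched with a plain list. The class `LinearDirectory` is hypothetical, the i-node numbers are made up, and an in-memory list stands in for the directory file.

```python
class LinearDirectory:
    """Directory stored as a linear list of (name, i-node number) pairs."""

    def __init__(self):
        self.entries = []

    def lookup(self, name):
        # Linear search: cost grows with the number of entries.
        for entry_name, inode in self.entries:
            if entry_name == name:
                return inode
        return None

    def create(self, name, inode):
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        self.entries.append((name, inode))   # create = append at the end

    def delete(self, name):
        for i, (entry_name, _) in enumerate(self.entries):
            if entry_name == name:
                del self.entries[i]          # compacts by shifting the rest down
                return
        raise FileNotFoundError(name)


d = LinearDirectory()
d.create("foo", 11)
d.create("bar", 12)
d.create("veryLongFileName", 13)
d.delete("bar")
remaining = [name for name, _ in d.entries]
```

Deleting from the middle moves every later entry, and each lookup may scan the whole list, which is why the slides go on to consider trees and hashing for large directories.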


Tree Data Structure

◆ Method
● Store a tree data structure such as B-tree
● Create/delete/search in the tree data structure
◆ Pros
● Good for a large number of files
◆ Cons
● Inefficient for a small number of files
● More space
● Complex

Hashing

◆ Method
● Use a hash table to map FileName to i-node
● Space for name and metadata is variable sized
● Create/delete will trigger space allocation and free
◆ Pros
● Fast searching and relatively simple
◆ Cons
● Not as efficient as trees for very large directory (wasting space for the hash table)

[Figure: hash table entries foo → 1234, bar → 1235, …, foobar → 4567]

Number of I/O operations

◆ I/Os to access a byte of /u/cos318/foo
● Read the i-node and first data block of “/”
● Read the i-node and first data block of “u”
● Read the i-node and first data block of “cos318”
● Read the i-node and first data block of “foo”
◆ I/Os to write a file
● Read the i-node of the directory and the directory file (as above)
● Read or create the i-node of the file
● Read or create the file itself
● Write back the directory and the file
◆ Too many I/Os to traverse the directory
● Solution is to use Current Working Directory (e.g. ./foo)

Hard Links

◆ Approach
● A link to a file with the same i-node
  ln source target
● i.e. the name points to the same i-node as that of the file being linked to
● Delete may or may not remove the target depending on whether it is the last one (link reference count)
◆ Main issue with hard links?

[Figure: entries in Directory A and Directory B both pointing to one i-node with Ref=2]
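The I/O count for /u/cos318/foo can be reproduced with a toy model that charges one I/O per i-node read and one per first-data-block read. Everything here is a simplification: there is no buffer cache, each directory fits in one block, the i-node numbers other than the root (2) are made up, and the current working directory's i-node is assumed to be already in memory.

```python
ROOT_INODE = 2  # root directory i-node number, per the Naming Tricks slide


class Disk:
    """Toy disk: every i-node read and every data-block read costs one I/O."""

    def __init__(self, inodes):
        self.inodes = inodes
        self.ios = 0

    def read_inode(self, number):
        self.ios += 1
        return self.inodes[number]

    def read_data(self, inode):
        self.ios += 1
        return inode["data"]


def read_first_block(disk, path, cwd_inode=None):
    """Resolve path and return the file's first data block."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if path.startswith("/"):
        inode = disk.read_inode(ROOT_INODE)
    else:
        inode = cwd_inode                    # assumed cached in memory
    for name in parts:
        entries = disk.read_data(inode)      # directory contents: name -> i-node
        inode = disk.read_inode(entries[name])
    return disk.read_data(inode)


inodes = {
    2: {"data": {"u": 3}},
    3: {"data": {"cos318": 4}},
    4: {"data": {"foo": 5}},
    5: {"data": b"hello"},
}

disk = Disk(inodes)
read_first_block(disk, "/u/cos318/foo")
absolute_ios = disk.ios

disk = Disk(inodes)
read_first_block(disk, "./foo", cwd_inode=inodes[4])
relative_ios = disk.ios
```

In this model the absolute path costs eight I/Os (an i-node and a data block for each of the four names), while the lookup relative to the current working directory costs three.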


Symbolic Links

◆ Approach
● A symbolic link is a pointer to a file
● Use a new i-node for the link
  ln –s source target
● Carries pathname of original file
◆ Main issue with symbolic links?
◆ Performance?
◆ What if you delete the link?
◆ What if you delete the original file?

[Figure: entry in Directory A pointing to the file’s i-node; entry in Directory B pointing to a separate Link i-node]

Original Unix File System Disk Layout

◆ Simple disk layout
● Block size is sector size (512 bytes)
● i-nodes are on outermost cylinders
● Data blocks are on inner cylinders
● Use linked list for free blocks
◆ Issues
● Index is large due to small block size
● Fixed max number of files
● i-nodes far from data blocks
● i-nodes for directory not close together
● Consecutive blocks of file can be anywhere on disk
● Poor bandwidth (20Kbytes/sec even for sequential access!)

[Figure: i-node array]
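The questions on the hard-link and symbolic-link slides can be explored with Python's standard `os` module, which exposes both kinds of link on POSIX systems. This is a small experiment rather than a definitive answer to the slide questions; it assumes a file system that supports hard and symbolic links, and the file names are arbitrary.

```python
import os
import tempfile


def link_demo():
    """Create a file, a hard link, and a symbolic link; then delete the original."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        source = os.path.join(d, "source")
        hard = os.path.join(d, "hard")
        soft = os.path.join(d, "soft")
        with open(source, "w") as f:
            f.write("data")

        os.link(source, hard)       # like: ln source hard
        os.symlink(source, soft)    # like: ln -s source soft

        # Hard link: a second name for the same i-node; link count becomes 2.
        results["same_inode"] = os.stat(source).st_ino == os.stat(hard).st_ino
        results["link_count"] = os.stat(source).st_nlink

        # Symbolic link: its own i-node, whose content is the target pathname.
        results["soft_own_inode"] = os.lstat(soft).st_ino != os.stat(source).st_ino
        results["soft_carries_path"] = os.readlink(soft) == source

        os.remove(source)           # delete the original name

        with open(hard) as f:       # data survives: link count dropped to 1, not 0
            results["hard_survives"] = f.read() == "data"
        # The symbolic link still exists but now points at nothing.
        results["soft_dangles"] = os.path.islink(soft) and not os.path.exists(soft)
    return results


results = link_demo()
```

Deleting the original name leaves the hard link fully usable, while the symbolic link dangles, which is one of the issues the slide's questions point at.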


BSD FFS (Fast File System)

◆ Use a larger block size: 4KB or 8KB
● Allow large blocks to be chopped into fragments, used for small files and pieces at ends of files
◆ Use bitmap instead of a free list
● Try to allocate contiguously

FFS Disk Layout

◆ i-nodes are grouped together
● A portion of the i-node array on each cylinder
● In same cylinder group as data for the files
● 10% reserved disk space, to keep room
◆ Do you ever read i-nodes without reading any file blocks?
● 4 times more often than reading together
● examples: ls, make
◆ Overcome rotational delays
● Skip sector positioning to avoid the context switch delay
● Read ahead: read next block right after the first

[Figure: cylinder groups, each holding an i-node subarray next to the data for files such as foo and bar]


FFS block groups for better locality

[Figure: Block Group 0, Block Group 1, and Block Group 2, each containing a free space bitmap, inodes, and data blocks for the files in a set of directories (e.g. /a, /d, and /b/c)]

What Has FFS Achieved?

◆ Performance improvements
● 20-40% of disk bandwidth for large files (10-20x original)
● Better small file performance (why?)
◆ We can do better
● Extent based instead of block based
• Use a pointer and size for all contiguous blocks (XFS, Veritas file system, etc)
● Synchronous metadata writes hurt small file performance

Summary

◆ File system structure
● Boot block, super block, file metadata, file data
◆ File metadata
● Consider efficiency, space and fragmentation
◆ Directories
● Consider the number of files
◆ Links
● Soft vs. hard
◆ Physical layout
● Where to put metadata and data
