<<

File Systems Main Points

• File layout • layout • Reliability/durability Named in a

index !le name directory !le number structure storage o"set o"set block Last Time: File System Design Constraints • For small files: – Small blocks for storage efficiency – Files used together should be stored together • For large files: – Conguous allocaon for sequenal access – Efficient lookup for random access • May not know at file creaon – Whether file will become small or large File System Design Opons

FAT FFS NTFS Index Linked list Tree structure (fixed, asym) (dynamic) granularity block block free space FAT array Bitmap Bitmap allocaon (fixed (file) locaon) Locality defragmentaon Block groups Extents + reserve Best fit space defrag Microso File Allocaon Table (FAT)

• Linked list index structure – Simple, easy to implement – Sll widely used (e.g., thumb drives) • File table: – Linear map of all blocks on disk – Each file a linked list of blocks FAT MFT Data Blocks 0 1 2 3 !le 9 block 3 4 5 6 7 8 9 !le 9 block 0 10 !le 9 block 1 11 !le 9 block 2 12 !le 12 block 0 13 14 15 16 !le 12 block 1 17 18 !le 9 block 4 19 20 FAT

• Pros: – Easy to find free block – Easy to append to a file – Easy to delete a file • Cons: – Small file access is slow – Random access is very slow – Fragmentaon • File blocks for a given file may be scaered • Files in the same directory may be scaered • Problem becomes worse as disk fills Berkeley FFS (Fast File System)

table – Analogous to FAT table • inode – Metadata • File owner, access permissions, access mes, … – Set of 12 data pointers – With 4KB blocks => max size of 48KB files FFS inode

• Metadata – File owner, access permissions, access mes, … • Set of 12 data pointers – With 4KB blocks => max size of 48KB files • Indirect block pointer – pointer to disk block of data pointers • Indirect block: 1K data blocks => 4MB (+48KB) FFS inode

• Metadata – File owner, access permissions, access mes, … • Set of 12 data pointers – With 4KB blocks => max size of 48KB • Indirect block pointer – pointer to disk block of data pointers – 4KB block size => 1K data blocks => 4MB • Doubly indirect block pointer – Doubly indirect block => 1K indirect blocks – 4GB (+ 4MB + 48KB) FFS inode

• Metadata – File owner, access permissions, access mes, … • Set of 12 data pointers – With 4KB blocks => max size of 48KB • Indirect block pointer – pointer to disk block of data pointers – 4KB block size => 1K data blocks => 4MB • Doubly indirect block pointer – Doubly indirect block => 1K indirect blocks – 4GB (+ 4MB + 48KB) • Triply indirect block pointer – Triply indirect block => 1K doubly indirect blocks – 4TB (+ 4GB + 4MB + 48KB) Inode Array Triple Double Indirect Indirect Indirect Data Inode Blocks Blocks Blocks Blocks

File Metadata ......

Direct ... Pointers ...... Indirect Pointer Dbl. Indirect Ptr. Tripl. Indrect Ptr...... FFS Asymmetric Tree

• Small files: shallow tree – Efficient storage for small files • Large files: deep tree – Efficient lookup for random access in large files FFS Locality

• Block group allocaon – Block group is a set of nearby cylinders – Files in same directory located in same group – Subdirectories located in different block groups • inode table spread throughout disk – , bitmap near file blocks • First fit allocaon – Small files fragmented, large files conguous Block Group 0

Block Group 1

s

e

d

o

n

Block Group 2 I

p

/

a

/

d

n

p a

a ,

c

c

/

/ m D

t

b

,

i

a

/

q t B

/ F a d

e r d n e B / c

a l e a o s ,

S e p c i d p k r S /

, a s o e t a c f c e / e o e r r r F s B ! i e s i d i e t le r m in o d s D s t a a le o in ta ! c n p Blocks for e I d ir ir e d ct n o i rie s s / le b, / ! I a/g, /z r no fo des ks loc a B ap Dat m ce Spa Free FFS First Fit Block Allocaon

In-Use Free Start of Block Block Block ... Group FFS First Fit Block Allocaon

Start of Write Two Block File Block ... Group FFS First Fit Block Allocaon

Start of Write Large File Block ... Group FFS

• Pros – Efficient storage for both small and large files – Locality for both small and large files – Locality for metadata and data • Cons – Inefficient for ny files (a 1 file requires both an inode and a data block) – Inefficient encoding when file is mostly conguous on disk (no equivalent to superpages) – Need to reserve 10-20% of free space to prevent fragmentaon NTFS

• Master File Table – Flexible 1KB storage for metadata and data • Extents – Block pointers cover runs of blocks – Similar approach in () – File create can provide hint as to size of file • Journalling for reliability – Discussed next me NTFS Small File

Master File Table

MFT Record (small !le) Std. Info. File Name Data (resident) (free) NTFS Medium File

Start Master File Table Length + Data Extent Data MFT Record Start + Length Std. Info. File Name Data (nonresident) (free)

Start Length + Data Extent Data

Start + Length NTFS Indirect Block

Master File Table MFT Record (part 1) name Std. Info. Attr. List name File Name File Name (free) data

MFT Record (part 2) Std. Info. Data (nonresident) (free) ...

+ + +

Data Extent Data ... Data Extent Data Data Extent Data NTFS Mulple Indirect Blocks Master File Table MFT Record (huge/badly-fragmented !le) Std. Info. Attr. List (nonresident) ...... Extent with part of attribute list

Data (nonresident) ...

Data (nonresident) ...

Data (nonresident) ...

... Extent with part of attribute list

Data (nonresident) ...

Data (nonresident) ...

... Extent with part of attribute list

Data (nonresident) ...

Data (nonresident) ... Named Data in a File System

index !le name directory !le number structure storage o"set o"set block Directories

!le 5268830 end of “/home/tom” !le . .. Name Music Work Free foo.txt Free File Number 5268830 88026158 35002320 85200219 Space 66212871 Space Next Directories

• Directories can be files – Map file name to file number (MFT #, inode num) • Table of file name -> file number – Small directories: linear search

!le 5268830 end of “/home/tom” !le . .. Name Music Work Free foo.txt Free File Number 5268830 88026158 35002320 85200219 Space 66212871 Space Next Large Directories: B-Trees

Search for hash(”out2”) = 0x0000c194 B+Tree Root Before 00ad1102 b0bf8201 ... c"1a412 Child Pointer

B+Tree Node B+Tree Node B+Tree Node Before 0000c195 00018201 ...... Child Pointer

B+Tree Leaf B+Tree Leaf B+Tree Leaf Hash 0000a0d1 0000b971 ... 0000c194 ... Entry Pointer

Name . .. !le1 !le2 ... !le9841 out1 out2 ... out16341 File Number 36210429 983211 239341 231121 ... 243212 841013 841014 ... 324114 “out2” is !le 841014 File System Reliability

• What can happen if disk loses power or machine soware crashes? – Some operaons in progress may complete – Some operaons in progress may be lost – Overwrite of a block may only parally complete • File system wants durability (as a minimum!) – Data previously stored can be retrieved (maybe aer some recovery step), regardless of failure Storage Reliability Problem

• File operaons oen involve updates to mulple disk blocks – inode, indirect block, data block, bitmap, … • At a physical level, operaons complete one at a me – Hard to guarantee that they will all complete • Many disk devices have an extra capacitor to allow writes on current track to complete FAT Reconsidered

MFT Data Blocks • To append to a 0 1 2 file: 3 !le 9 block 3 4 – Add data block 5 6 7 – Add pointer to 8 9 !le 9 block 0 data block 10 !le 9 block 1 11 !le 9 block 2 – Update access 12 !le 12 block 0 13 14 me for entry 15 16 !le 12 block 1 at head of file 17 18 !le 9 block 4 19 20 Reliability Approach #1

• Sequence operaons in a specific order • Careful design to allow sequence to be interrupted safely – Write data block before updang FAT entry • Requires post-crash recovery – Write file before updang directory – May leave file created, without being in any directory