11/8/17

Where Are We?

• Covered: COS 318: Operating Systems Management of CPU & concurrency Management of main memory & virtual memory Management of I/O devices File Structure • Currently --- File Systems This lecture: File Structure Jaswinder Pal Singh Computer Science Department • Then: Princeton University • Naming and directories (http://www.cs.princeton.edu/courses/cos318/) • Efficiency and performance • Reliability and protection

The Abstraction File System

, , , … named files, arranged in folders or directories ◆ Naming File name and File naming Physical Reality File System Abstraction ◆ File access File access Read, write, other operations block oriented byte oriented (char stream) Buffer cache ◆ Buffer cache Reduce client/server disk I/Os Management physical sector #’s named files Disk allocation ◆ Disk allocation Disk Drivers no protection users protected from ◆ Layout, mapping files to blocks each other ◆ Security, protection, reliability, durability

data might be corrupted robust to machine failures ◆ Management tools if machine crashes 4

1 11/8/17

Topics Typical File Attributes

◆ File system structure • Name ◆ Disk allocation and i-nodes • Type – needed for systems that support different types ◆ Directory and link implementations • Location – pointer to file location on device. ◆ Physical layout for performance • Size – current . • Protection – controls who can read, write, execute • Time, date, and user identification – data for protection, security, and usage monitoring

• Information about files are kept in the directory structure, which is maintained on the disk

2

Typical Layout of a Disk Partition File Types – Name, Extension

◆ Boot block Boot block Code to load and boot OS File Type Usual extension Function Executable exe, com, bin or ready-to-run machine- ◆ Super-block defines a file system Superblock none language program File system info: type, no of blocks, ... Object obj, o complied, machine language, not linked File metadata File metadata area Source code c, p, pas, 177, source code in various (i-nodes in ) Information about / ptr to free blocks asm, a languages Batch bat, sh commands to the Location of descriptor of root directory Directory data command interpreter ◆ File metadata Text txt, doc textual data doc uments Each descriptor describes a file Word processor wp, tex, rrf, etc. various word-processor formats ◆ Directories File data Library lib, a libraries of routines Directory data (directory and file names) Print or view ps, dvi, gif ASCII or ◆ File data Archive arc, zip, tar related files grouped Data blocks into one file, sometimes compressed. 3

2 11/8/17

Typical File Operations Open A File: Open(fd, name, access)

◆ Various checking (directory and file name lookup, authenticate) • Create ◆ Copy the file descriptors into the in-memory data structure • Write ◆ Create an entry in the open file table (system wide) • Read ◆ Create an entry in PCB File system • Reposition within file – file seek ◆ Return user a pointer to “” info Process File • Delete control Open-file table descriptors File block (system-wide) (Metadata) • Truncate metadata

• Open(Fi) – search the directory structure on disk for Directories entry Fi, and move the content of entry to memory.

• Close (Fi) – move the content of entry Fi in memory to Open directory structure on disk. file File data pointer . array .

5

Translating from user to system view File system design constraints

• User wants to read 10 bytes from file starting at byte 2? • For small files: Seek byte 2, fetch the block, read 10 bytes Small blocks for storage efficiency Files used together should be stored together • User wants to write 10 bytes to file starting at byte 2? Seek byte 2, fetch the block, write 10 bytes, write out block • For large files: Contiguous allocation for sequential access • Everything inside file system is in whole size blocks Efficient lookup for random access Even getc and putc buffers 4096 bytes • May not know at file creation whether file will become • From now on, file is collection of blocks. small or large

3 11/8/17

File usage patterns File system design

• How do users access files? • Data structures Sequential: bytes read in order Directories: file name -> file metadata Random: read/write element out of middle of arrays • Store directories as files Content-based access: find me next byte starting with “COS318” File metadata: used to find file data blocks • How are files used? Free map: list of free disk blocks Most files are small

Large files use up most of the disk space • How do we organize these data structures? Most transfers are small Large files account for most of the bytes transferred • Bad news Need everything to be efficient

Data Structures for Storage Allocation Data structures for disk management

• A file header for each file (part of the file meta-data) ◆ A File Disk sectors associated with each file Metadata A list of data blocks • A data structure to track free space on disk ◆ Free space data structure Bit map Bit map indicating the status of 11111111111111111000000000000000 • 1 bit per block (sector) disk blocks 00000111111110000000000111111111 … • blocks numbered in cylinder-major order, why? Linked list that chains free blocks together 11000001111000111100000000000000 Linked list Others? Buddy system Free … link • What about allocation for the blocks associated with a file? • Let’s look at some ways of keeping addr link … track of file data size addr size 6

4 11/8/17

Contiguous Allocation Linked Files

◆ Allocate contiguous blocks on storage ◆ File structure (Alto) File header Bitmap: find N contiguous 0’s 3 File metadata points to 1st block on storage Linked list: find a region (size >= N) A block points to the next ◆ File metadata Last block has a NULL pointer First block in file ◆ Pros Number of blocks . . . Can grow files dynamically ◆ Pros File data tracked similarly to free Fast sequential access list of blocks Easy random access File A File B Doesn’t waste space ◆ Cons ◆ Cons null External fragmentation Random access: bad (what if file C needs 4 blocks) Unreliable: losing a block means Hard to grow files losing the rest

7 8

Linked files (cont’d) File Allocation Table (FAT)

• Idea is to keep the linked list metadata foo 217 (pointers) in memory, rather than on disk • Allocation table at beginning of each volume 0 ◆ N entries for N blocks ◆ Want to keep it in memory • File structure (MS-DOS) 217 619 A file is a linked list of blocks File metadata points to first block of file 399 The entry of first block points to next, … EOF • Pros Simple 619 399 • Cons Random access: still not good Wastes space - table for each file expensive to keep in memory FAT Allocation Table 8

5 11/8/17

DEMOS (Cray-1) Single-level Indexed File

File metadata ◆ Idea • User declares max size size0 Try contiguous allocation size1 • File header holds array of pointers Allow non-contiguous Disk size to disk blocks 9 File header blocks ◆ File structure Small file metadata has 10 (base,size) pointers • Pros: Big file has 10 indirect pointers Can grow up to a limit

◆ Pros & Cons size0 Random access is fast size Can grow (max 10GB) 1 No external fragmentation

Fragmentation size9 • Cons: size 0 Clumsy to grow beyond limit size1 size0 size1 Still lots of seeks size9 size9 11

Single-level indexed files (cont’d) Multi-level Indexed Files

!

outer-index

index table file

6 11/8/17

Hybrid Multi-level Indexed Files (Unix) Original Unix i-node

◆ 13 Pointers in a header ◆ Mode: file type, protection bits, setuid, setgid bits 10 direct pointers data ◆ Link count: no. of directory entries pointing to this file 11: 1-level indirect ◆ Uid: uid of the file owner 12: 2-level indirect data ◆ Gid: gid of the file owner 13: 3-level indirect 1 .2 . ◆ File size ◆ Pros & Cons 11 . 12 . data ◆ Times (access, modify, change) In favor of small files 13 Can grow

. . Limit is 16G . . data ◆ 10 pointers to data blocks Can have lots of seeking ◆ Single indirect pointer

. . . ◆ Double indirect pointer . . . data ◆ Triple indirect pointer

12 13

Extents Naming Files

◆ An extent is a variable number of Can name files via: blocks Block offset ◆ Main idea length ◆ Index (i-node number): Not easy for users to specify A file is a number of extents Starting block ◆ Text name: Need to map it to index XFS uses 8Kbyte blocks ◆ Max extent size is 2M blocks Icon: Need to map it to index or to text and then to index ◆ Index nodes need to have . . . Block offset ◆ Directories Length ◆ Table of file name, file index pairs Starting block ◆ Map name to file index (where to find the header) • NTFS, Linux EXT4, … ◆ Pros: little metadata, fast seq ◆ A directory is itself stored as a file access, can grow over time, less fragmentation ◆ Cons: external fragmentation still problem 14 15

7 11/8/17

Naming Tricks Directory Organization Examples

• Bootstrapping: Where do you start looking? ◆ Flat (CP/M) Root directory All files are in one directory inode #2 on the system ◆ Hierarchical (Unix) 0 and 1 used for other purposes /u/cos318/foo • Special names: Directory is stored in a file containing (name, i-node) pairs Root directory: “/” (bootstrap name system for users) The name can be either a file or a directory Current directory: “.” ◆ Hierarchical (Windows) Parent directory: “..” (otherwise how to go up??) C:\windows\temp\foo user’s home directory: “~” File extensions have meaning (unlike in Unix). Use the • Using the given names, only need two operations to extension to indicate whether the entry is a directory navigate the entire name space: ‘name’: move into (change context to) directory “name” ls : enumerate all names in current directory (context) 16

Mapping File Names to i-nodes Linear List

Need to support the following types of operations: ◆ Method /u/jps pairs are foo bar … ◆ Create/delete linearly stored in a file veryLongFileName Create a file Create/delete a directory • Append ◆ Open/close Delete a file Open/close a directory for read and write • Search for FileName Link/unlink a file • Compact by moving the rest ◆ Rename ◆ Pros Rename the directory Space efficient ◆ Cons Linear search Need to deal with fragmentation

17 18

8 11/8/17

Tree Data Structure Hashing

◆ Method ◆ Method Use a hash table to map Store a tree data structure such as B-tree FileName to i-node Create/delete/search in the tree data structure Space for name and metadata foo 1234 ◆ Pros is variable sized bar 1235 Good for a large number of files Create/delete will trigger space allocation and free

◆ Cons … 4567 Inefficient for a small number of files ◆ Pros foobar Fast searching and relatively More space … simple Complex ◆ Cons Not as efficient as trees for very large directory (wasting space for the hash table) 19 20

Number of I/O operations Hard Links

Directory A Directory B

◆ I/Os to access a byte of /u/cos318/foo ◆ Approach Read the i-node and first data block of “/” A link to a file with the same i-node Read the i-node and first data block of “u” source target Read the i-node and first data block of “cos318” i.e. the name points to the same i-node as that of the file being linked to i-node Read the i-node and first data block of “foo” Ref=2 Delete may or may not remove the target ◆ I/Os to write a file depending on whether it is the last one Read the i-node of the directory and the directory file (as (link reference count) above) Read or create the i-node of the file ◆ Main issue with hard links? Read or create the file itself Write back the directory and the file

◆ Too many I/Os to traverse the directory Solution is to use Current Working Directory (e.g. ./foo) 21 23

9 11/8/17

Symbolic Links Original Unix File System Disk Layout

Directory B Directory A ◆ Approach ◆ Simple disk layout Block size is sector size (512 bytes) A is a pointer to a file i-nodes are on outermost cylinders Use a new i-node for the link Data blocks are on inner cylinders ln –s source target Use linked list for free blocks Carries pathname of original file ◆ Issues Link Index is large ◆ Main issue with symbolic links? Fixed max number of files ◆ Performance? i-nodes far from data blocks i-nodes for directory not close together ◆ What if you delete the link? Consecutive blocks can be anywhere ◆ What if you delete the original file? Poor bandwidth (20Kbytes/sec even for sequential access!) i-node array

23 24

BSD FFS (Fast File System) FFS Disk Layout

foo ◆ Use a larger block size: 4KB or 8KB ◆ i-nodes are grouped together Allow large blocks to be chopped into A portion of the i-node array on each cylinder fragments In same cylinder group as data for the files 10% reserved disk space, to keep room Used for small files and pieces at ends of files bar ◆ Do you ever read i-nodes without ◆ Use bitmap instead of a free list reading any file blocks? Try to allocate contiguously 4 times more often than reading together examples: ls, make ◆ Overcome rotational delays Skip sector positioning to avoid the context switch delay Read ahead: read next block right after the first i-node subarray

25 26

10 11/8/17

FFS block groups for better locality What Has FFS Achieved?

Block Group 0 ◆ Performance improvements 20-40% of disk bandwidth for large files (10-20x original)

Block Group 1 Better small file performance (why?)

F

D

r

e

a ◆ We can do better

e t

a

S

B

p Block Group 2

l

o a

c c

k

e

s Extent based instead of block based

B

f

o

i

t

r

p

m

/

a

a

l

/

e s

p • Use a pointer and size for all contiguous blocks (XFS, Veritas d s e

n

d

i

n

a o

,

n

d

I

c

i /

r file system, etc)

, e

c q

/ t

o d r / i e s s Synchronous metadata writes hurt small file performance e i / r b to I , n / c o a D e / a ir d g ta d p e , B in a s /z lo es m cks for fil it B e ac Sp Free

D a ta B lo ck s for fil es i Fr n di /b/c ee rectories /a, /d, and Sp ace des Bi Ino tmap 27

Summary

◆ File system structure Boot block, super block, file metadata, file data ◆ File metadata Consider efficiency, space and fragmentation ◆ Directories Consider the number of files ◆ Links Soft vs. hard ◆ Physical layout Where to put metadata and data

28

11