• Assume as a reference point: Ext2 (very much like a classic UNIX file system)

• “Small” vs “large” filesystems
  • Size of files?
  • Number of files?
• Large number of files?
  • Files are organized by directories
  • If directories get too long, it is a problem to find things
  • Solution: something like a B-tree
• Small number of files?
  • B-trees optimize search across blocks
  • If it all fits in a block, there is no benefit
  • And there is overhead in indexes

• “Small” vs “large” files (size of files in bytes)
  • Allocated blocks are organized by inodes
• Relatively small files
  • Small block size
    • Minimizes “slack”: unused space within a block at the end of a file
  • Simple data structure for organizing them, like direct, indirect, double-indirect, etc. pointers
• Relatively large files
  • Larger block size
    • Means fewer blocks to manage
  • B-tree or more sophisticated organization for faster search
  • “Extents”

• Extent: a collection of blocks logically adjacent on the storage device (we don’t know if they are physically adjacent), managed as a unit by the file system
  • A classic inode maps block# -> block#
  • An inode that supports extents maps a (start#, end#) range
• What are the benefits of extents?
  • Fewer entries needed to cover larger spaces
  • We can have smaller block sizes
    • Which are good for small files
  • We can have fewer inode entries
    • Which are good for larger files
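
To make the difference between a per-block map and an extent map concrete, here is a minimal sketch. It assumes a file's blocks are given as a list of device block numbers in file order; the names `to_extents` and `lookup` are hypothetical and do not reflect how Ext2 or any real file system lays out its inode.

```python
def to_extents(block_map):
    """Collapse a per-block map into inclusive (start, end) extents.

    block_map[i] is the device block number holding the file's i-th block.
    """
    extents = []
    for block in block_map:
        if extents and block == extents[-1][1] + 1:
            # This block continues the current run, so extend the extent.
            extents[-1] = (extents[-1][0], block)
        else:
            # Start a new extent of length one.
            extents.append((block, block))
    return extents


def lookup(extents, index):
    """Return the device block holding the file's index-th block."""
    for start, end in extents:
        length = end - start + 1
        if index < length:
            return start + index
        index -= length
    raise IndexError("block index is beyond the end of the file")


# A 12-block file stored as two contiguous runs on the device.
block_map = list(range(100, 108)) + list(range(300, 304))
extents = to_extents(block_map)  # [(100, 107), (300, 303)]
```

The per-block map needs 12 entries here, while the extent map needs 2, and `lookup` still finds any block. The saving disappears if the blocks are scattered: a fully fragmented file produces one extent per block.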

• Extents: key challenge
  • Grow files all at once?
  • Or slowly over time?

• How is space allocated?
  • As needed, as files grow
  • Often one block at a time
• How do we ever have a large extent?
  • Preallocation
    • Grow the file in large chunks, rather than as needed

• Sparse files vs preallocation
  • Sparse file
    • Created to support core dumps
    • Hash tables, etc.
    • When we write, we allocate the block we are writing to, unless it has already been allocated
    • Blocks which are never written aren’t allocated; reads just return 0
    • To create: seek to the end address and write one byte
  • Preallocation requires
    • A new call in the API (explicitly ask), or
    • Large single writes, or
    • Delayed allocation
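
The two approaches can be tried from user space. The sketch below is illustrative only: `make_sparse_file` and `preallocate` are hypothetical helper names. It assumes Python's `os.posix_fallocate`, which exists only on some Unix-like platforms and can fail on file systems that do not support it, so the code treats it as optional.

```python
import os


def make_sparse_file(path, size):
    """Create a file of `size` bytes by seeking to the end and writing one byte.

    Whether the skipped blocks are really left unallocated depends on the
    underlying file system; reads of them return zeros either way.
    """
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


def preallocate(path, size):
    """Explicitly ask the file system to reserve `size` bytes, if possible.

    Returns True on success and False if the call is unavailable or refused.
    """
    if not hasattr(os, "posix_fallocate"):
        return False  # Platform does not provide the call.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError:
        return False  # File system could not satisfy the request.
    finally:
        os.close(fd)
```

On a file system that supports sparse files, comparing `os.stat(path).st_blocks` for the two files should show the sparse file occupying far less space than its length suggests, while the preallocated file has its space reserved up front.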

• Immediate vs delayed allocation
  • Immediate allocation
    • The first write to a block allocates the block and caches it
  • Delayed allocation
    • Doesn’t allocate until flushed
    • If we delete or truncate the file, we never need to allocate, but this is a really tiny benefit, if that
    • The big benefit comes in coordination with extents
      • Allows automatic recognition of logically contiguous blocks as a file grows, so that they can be represented as an extent and kept physically contiguous
      • This should reduce access overhead, e.g. disk seeks
      • As well as reducing the overhead for mapping in the inode

• Small file optimization
  • Optimization for the “tail”
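
Returning to the delayed allocation point above, the toy simulation below sketches why waiting until flush time helps extents. Two files are written one block at a time, interleaved. Everything here is an assumption made for illustration: the `Disk` allocator simply hands out the next free block numbers, and `immediate` and `delayed` are hypothetical names, not a model of any particular file system.

```python
def coalesce(blocks):
    """Collapse block numbers into sorted, inclusive (start, end) runs."""
    runs = []
    for b in sorted(blocks):
        if runs and b == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], b)
        else:
            runs.append((b, b))
    return runs


class Disk:
    """Toy allocator: hands out the next free physical blocks in order."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, count):
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def immediate(writes):
    """Allocate a physical block on the first write to each logical block."""
    disk, files = Disk(), {}
    for name, logical in writes:
        mapping = files.setdefault(name, {})
        if logical not in mapping:
            mapping[logical] = disk.allocate(1)[0]
    return {name: coalesce(m.values()) for name, m in files.items()}


def delayed(writes):
    """Only record dirty logical blocks; allocate per file at flush time."""
    disk, dirty = Disk(), {}
    for name, logical in writes:
        dirty.setdefault(name, set()).add(logical)
    result = {}
    for name, blocks in dirty.items():  # the "flush"
        physical = []
        for start, end in coalesce(blocks):
            # Each logically contiguous run gets one contiguous allocation.
            physical.extend(disk.allocate(end - start + 1))
        result[name] = coalesce(physical)
    return result


# Files "a" and "b" each grow by one block at a time, interleaved.
writes = [("a", 0), ("b", 0), ("a", 1), ("b", 1), ("a", 2), ("b", 2)]
```

With these writes, `immediate(writes)` gives each file three single-block extents because the two files' blocks interleave on disk, while `delayed(writes)` gives each file one three-block extent.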

• The small file or “tail” problem
  • One byte takes up a whole block
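
The cost of the tail is simple arithmetic. A minimal sketch follows; the function names and the example block sizes are assumptions chosen for illustration.

```python
def blocks_needed(file_size, block_size):
    """Number of whole blocks required to hold file_size bytes."""
    return -(-file_size // block_size)  # ceiling division


def slack_bytes(file_size, block_size):
    """Unused bytes in the last block of a file of file_size bytes."""
    return (-file_size) % block_size


one_byte = slack_bytes(1, 4096)          # 4095 bytes wasted
small_blocks = slack_bytes(5000, 1024)   # 120 bytes wasted, 5 blocks used
large_blocks = slack_bytes(5000, 4096)   # 3192 bytes wasted, 2 blocks used
```

The last two lines show the trade-off from earlier in the lecture: the smaller block size wastes less space at the tail but leaves more blocks to manage.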

• Keep really small files and/or the tail of files in the inode
  • Small files and tails don’t waste a whole block
  • Another benefit, especially for small files
    • No cache impact to read a block
    • No latency to wait for a block
  • Small tails still require reading a whole block, and storing a whole block in cache
• What does storing tails in the inode cost us?
  • Doesn’t it make the inode larger, wasting the space in cases where it isn’t used?
  • Don’t want to have inodes “span blocks”, or some inodes will take 2 reads
  • This means the inode size needs to be designed such that a multiple of the inode size = the block size (or space will be left over)
  • Take the extra space that would be left over, divide it up among the inodes, and now we have a little bit of space for tails, for FREE

• 2 options for the midterm retake:
  • 1 week from today, after class, or
  • 7:30-8:50pm on the same day as the final exam
• Pick one sitting; you can’t do both
• If you come for a sitting
  • Turn in the exam
    • Graded, and counted, for better or for worse
  • Don’t turn in the exam
    • It won’t be graded and it won’t be counted
    • But you can’t take the next retake
• You can bail out
• But you can’t take it twice
• You can’t hedge your bets