Case Study: The Ext2 File System

The ext2 file system
• Second Extended Filesystem
  – The main Linux FS before ext3
  – Evolved from the Minix filesystem (via the "Extended Filesystem")
• Features
  – Block size (1024, 2048, or 4096 bytes) configured at FS creation
  – Inode-based FS
  – Performance optimisations to improve locality (from BSD FFS)
• Main problem: an unclean unmount means running e2fsck
  – Ext3fs keeps a journal of (meta-data) updates
  – Journal is a file where updates are logged
  – Compatible with ext2fs

Recap: i-nodes
• Each file is represented by an inode on disk
• Inode contains all of a file's metadata
  – Access rights, owner, accounting info
  – (Partial) block index table of a file
• Each inode has a unique number
  – System-oriented name
  – Try 'ls -i' on Unix (Linux)
• Directories map file names to inode numbers
  – Map human-oriented to system-oriented names

[Figure: inode fields: mode, uid, gid, atime, ctime, mtime, size, block count, reference count, direct blocks (12), single indirect, double indirect, triple indirect]

Ext2 i-nodes
• Mode
  – Type
    • Regular file or directory
  – Access mode
    • rwxrwxrwx
• Uid
  – User ID
• Gid
  – Group ID

Inode Contents
• atime
  – Time of last access
• ctime
  – Time of the last change to the inode itself (e.g. permissions or ownership), not the creation time
• mtime
  – Time when file was last modified

Inode Contents
• Size
  – Size of the file in bytes
• Block count
  – Number of disk blocks used by the file
• Note that the number of blocks can be much less than expected given the file size
  – Files can be sparsely populated
    • E.g. write(f, "hello"); lseek(f, 1000000, SEEK_SET); write(f, "world");
    • Only needs to store the start and end of the file, not all the empty blocks in between
  – Size = 1000005
  – Blocks = 2 + overheads

Inode Contents
• Direct Blocks
  – Block numbers of the first 12 blocks in the file
  – Most files are small
    • We can find the blocks of a file directly from the inode

[Figure: example inode whose 12 direct blocks are disk blocks 40, 58, 26, 8, 12, 44, 62, 30, 10, 42, 3, 21]

Problem
• How do we store files greater than 12 blocks in size?
  – Adding significantly more direct entries to the inode results in many unused entries most of the time.

Inode Contents
• Single Indirect Block
  – Block number of a block containing block numbers

[Figure: the same inode with single indirect: 32; block 32 (SI) holds the numbers of further data blocks]

Single Indirection
• Requires two disk accesses to read
  – One for the indirect block; one for the target block
• Max File Size
  – Assume 1 Kbyte block size, 4 byte block numbers
  – 12 * 1K + 1K/4 * 1K = 268 Kbytes
• For the large majority of files (< 268 K), given the inode, only one or two further accesses are required to read any block in the file.

Inode Contents
• Double Indirect Block
  – Block number of a block containing block numbers of blocks containing block numbers
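The cost of following direct, single indirect and double indirect entries can be illustrated with a toy disk that counts block reads. This is only a sketch under stated assumptions: `ToyDisk`, `read_file_block` and `PTRS_PER_BLOCK` are hypothetical names, indirect blocks hold just 4 block numbers so the example stays readable, and the contents of blocks 32, 50 and 51 are invented. Only the direct block list and "single indirect: 32" come from the slide figure.

```python
PTRS_PER_BLOCK = 4  # hypothetical: tiny indirect blocks keep the toy disk readable


class ToyDisk:
    """A dict of block number -> contents that counts every block read."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


def read_file_block(disk, inode, n):
    """Return the contents of file block n, following indirection as needed."""
    direct = inode["direct"]
    if n < len(direct):                      # direct: block number is in the inode
        return disk.read(direct[n])
    n -= len(direct)
    if n < PTRS_PER_BLOCK:                   # single indirect: one extra read
        table = disk.read(inode["single"])
        return disk.read(table[n])
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:              # double indirect: two extra reads
        outer = disk.read(inode["double"])
        inner = disk.read(outer[n // PTRS_PER_BLOCK])
        return disk.read(inner[n % PTRS_PER_BLOCK])
    raise ValueError("file block out of range for this sketch")


# Direct block numbers and "single indirect: 32" follow the slide figure;
# block 50 and the contents of blocks 32, 50 and 51 are invented.
inode = {
    "direct": [40, 58, 26, 8, 12, 44, 62, 30, 10, 42, 3, 21],
    "single": 32,
    "double": 50,
}
disk = ToyDisk({
    32: [61, 43, 56, 1],   # single indirect block
    50: [51, 0, 0, 0],     # double indirect block (0 marks an unused slot)
    51: [63, 0, 0, 0],     # second-level block
    40: "data 0",
    61: "data 12",
    63: "data 16",
})

for n in (0, 12, 16):
    disk.reads = 0
    data = read_file_block(disk, inode, n)
    print(n, data, disk.reads)
```

With the inode already in memory, file block 0 costs one read, block 12 (the first single indirect entry) costs two, and block 16 (the first double indirect entry in this toy setup) costs three.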
Inode Contents
• Triple Indirect
  – Block number of a block containing block numbers of blocks containing block numbers of blocks containing block numbers

[Figure: UNIX Inode Block Addressing Scheme]

Max File Size
• Assume 4 byte block numbers and 1K blocks
• The number of addressable blocks
  – Direct Blocks = 12
  – Single Indirect Blocks = 256
  – Double Indirect Blocks = 256 * 256 = 65536
  – Triple Indirect Blocks = 256 * 256 * 256 = 16777216
• Max File Size
  – 12 + 256 + 65536 + 16777216 = 16843020 blocks ≈ 16 GB

Where is the data block number stored?
• Assume 4K blocks, 4 byte block numbers, 12 direct blocks
• A file containing 1 byte of data, produced by
    lseek(fd, 1048576, SEEK_SET) /* 1 megabyte */
    write(fd, "x", 1)
• What if we add
    lseek(fd, 5242880, SEEK_SET) /* 5 megabytes */
    write(fd, "x", 1)

Solution
  block #                                       location
  0 through 11                                  Direct block
  12 through (11 + 1024 = 1035)                 Single-indirect block
  1036 through (1035 + 1024*1024 = 1049611)     Double-indirect blocks
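The block ranges and the max-file-size arithmetic can be checked with a short sketch. The function names `locate_block` and `max_file_blocks` are hypothetical, and the defaults (12 direct entries, 4 byte block numbers) are the assumptions stated on the slides rather than values read from a real file system.

```python
NUM_DIRECT = 12  # direct block entries in the inode, as assumed on the slides


def locate_block(offset, block_size=4096, ptr_size=4, num_direct=NUM_DIRECT):
    """Return (level, indices) for the file block containing byte `offset`.

    level is "direct", "single", "double" or "triple"; indices lists the
    slot to follow at each step, starting from the inode.
    """
    ptrs = block_size // ptr_size      # block numbers per indirect block
    block = offset // block_size       # logical block number within the file

    if block < num_direct:
        return "direct", (block,)
    block -= num_direct

    if block < ptrs:
        return "single", (block,)
    block -= ptrs

    if block < ptrs ** 2:
        return "double", (block // ptrs, block % ptrs)
    block -= ptrs ** 2

    if block < ptrs ** 3:
        return "triple", (block // ptrs ** 2, (block // ptrs) % ptrs, block % ptrs)
    raise ValueError("offset beyond the reach of the triple indirect block")


def max_file_blocks(block_size=1024, ptr_size=4, num_direct=NUM_DIRECT):
    """Number of data blocks addressable through all four levels."""
    ptrs = block_size // ptr_size
    return num_direct + ptrs + ptrs ** 2 + ptrs ** 3


print(locate_block(1048576))   # ('single', (244,))
print(locate_block(5242880))   # ('double', (0, 244))
print(max_file_blocks())       # 16843020
```

Subtracting the size of each level in turn avoids hard-coding the range boundaries (12, 1036, ...), so the same routine works for other block sizes.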
Solution (continued)
Address = 1048576 ==> block number = 1048576/4096 = 256
Block number = 256 ==> index in the single-indirect block = 256 - 12 = 244
Address = 5242880 ==> block number = 5242880/4096 = 1280
Block number = 1280 ==> double-indirect block number = (1280 - 1036)/1024 = 244/1024 = 0 (integer division)
Index in the double indirect block = 244

Some Best and Worst Case Access Patterns
Assume inode already in memory
• To read 1 byte
  – Best:
    • 1 access via direct block
  – Worst:
    • 4 accesses via the triple indirect block
• To write 1 byte
  – Best:
    • 1 write via direct block (with no previous content)
  – Worst:
    • 4 reads (to get previous contents of block via triple indirect) + 1 write (to write modified block back)

Worst Case Access Patterns with Unallocated Indirect Blocks
• Worst to write 1 byte
  – 4 writes (3 indirect blocks; 1 data)
  – 1 read, 4 writes (read-write 1 indirect, write 2; write 1 data)
  – 2 reads, 3 writes (read 1 indirect, read-write 1 indirect, write 1; write 1 data)
  – 3 reads, 2 writes (read 2, read-write 1; write 1 data)
• Worst to read 1 byte
  – If reading writes a zero-filled block on disk
    • Worst case is same as write 1 byte
  – If not, the worst case depends on how deep the current indirect block tree is.

Inode Summary
• The inode contains the on-disk data associated with a file
  – Contains mode, owner, and other bookkeeping
  – Efficient random and sequential access via indexed allocation
  – Small files (the majority of files) require only a single access
  – Larger files require progressively more disk accesses for random access
    • Sequential access is still efficient
  – Can support really large files via increasing levels of indirection

Where/How are Inodes Stored
  Boot Block | Super Block | Inode Array | Data Blocks
• System V Disk Layout (s5fs)
  – Boot Block
    • Contains code to bootstrap the OS
  – Super Block
    • Contains attributes of the file system itself
      – e.g. size, number of inodes, start block of inode array, start of data block area, free inode list, free data block list
  – Inode Array
  – Data blocks

Some problems with s5fs
• Inodes at start of disk; data blocks at end
  – Long seek times
    • Must read inode before reading data blocks
• Only one superblock
  – Corrupt the superblock and the entire file system is lost
• Block allocation was suboptimal
  – Consecutive free block list created at FS format time
    • Allocation and de-allocation eventually randomises the list, resulting in random allocation
• Inodes also allocated randomly
  – Directory listing resulted in random inode access patterns

Berkeley Fast Filesystem (FFS)
• Historically followed s5fs
  – Addressed many limitations with s5fs
  – ext2fs mostly similar

Layout of an Ext2 FS
  Boot Block | Block Group 0 | …. | Block Group n
• Partition:
  – Reserved boot block,
  – Collection of equally sized block groups
  – All block groups have the same structure

Layout of a Block Group
  Super Block | Group Descriptors | Data Block Bitmap | Inode Bitmap | Inode Table | Data blocks
  1 blk       | n blks            | 1 blk             | 1 blk        | m blks      | k blks
• Replicated super block
  – For e2fsck
• Group descriptors
• Bitmaps identify used inodes/blocks
• All block groups have the same number of data blocks
• Advantages of this structure:
  – Replication simplifies recovery
  – Proximity of inode tables and data blocks (reduces seek time)

Superblocks
• Size of the file system, block size and similar

Group Descriptors
• Location of the bitmaps
• Counter for free blocks and inodes in this group
• Number of directories in the group

Performance considerations
• EXT2 optimisations
  – Block groups cluster related inodes and data blocks
  – Pre-allocation of blocks on write (up to 8 blocks)
    • 8 bits in bit tables
    • Better contiguity when there are concurrent writes
  – Aim to store files within a directory in the same group

Thus far…
• Inodes representing files laid out on disk.
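As a recap of the block group layout (super block, group descriptors, two one-block bitmaps, inode table, data blocks), the following sketch computes where each region would start inside one group. The function names and the values chosen for n, m and k are hypothetical; real values come from the superblock and group descriptors and are not given in the slides.

```python
def block_group_layout(first_block, desc_blocks, inode_table_blocks, data_blocks):
    """Return {region: (start_block, length)} for one block group.

    Region order and the fixed one-block sizes follow the slide;
    desc_blocks, inode_table_blocks and data_blocks stand for n, m and k.
    """
    regions = [
        ("super block", 1),
        ("group descriptors", desc_blocks),      # n blks
        ("data block bitmap", 1),
        ("inode bitmap", 1),
        ("inode table", inode_table_blocks),     # m blks
        ("data blocks", data_blocks),            # k blks
    ]
    layout = {}
    start = first_block
    for name, length in regions:
        layout[name] = (start, length)
        start += length
    return layout


def blocks_tracked_by_bitmap(block_size):
    """One bit per block: a one-block bitmap can track 8 * block_size blocks."""
    return 8 * block_size


# Hypothetical sizes, chosen only to make the arithmetic visible.
layout = block_group_layout(first_block=1, desc_blocks=2,
                            inode_table_blocks=64, data_blocks=1000)
for name, (start, length) in layout.items():
    print(f"{name:18} starts at block {start:5}, length {length}")
print(blocks_tracked_by_bitmap(1024))
```

Because every group has the same structure, the same offsets apply to each group once its first block is known. Since each bitmap occupies a single block, the block size also bounds how many blocks one group's bitmap can describe.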
