File Systems Ext2, Ext3 and Ext4 Table of Contents

File systems ext2, ext3 and ext4 Table of contents • Ext2 file system – Directories – Data structures on disk – Data structures in memory – Disk space management • Ext3 file system – Journaling – Storing directory entries in H-trees – Data structures • Ext4 file system – Extents – Block allocation – Journal – Nanosecond time stamps • Tests 2 File systems of the ext* family Linux supports many file systems, but ext* family systems are native to it. Ext2 (second extended file system) • Introduced in 1993. Main developer is Rémy Card. • The maximum file size allowed is from 16 GB to 2 TB. • The total file system size is between 2 TB and 32 TB. • A directory can contain 32,000 subdirectories. • Recommended on flash drives and USB, because it does not introduce overhead associated with journaling. • A journaling extension to the ext2 has been developed. It is possible to add a journal to an existing ext2 filesystem without the need for data conversion. 3 General features of ext2 The main features of ext2 affecting its performance: • When creating the system, the administrator can choose the optimal block size (in the range of 1 KB to 4 KB), depending on the expected average file size. • When creating a system, the administrator can set the number of inodes for a particular partition size, depending on the number of files expected. • Disk blocks are divided into groups including adjacent tracks, thanks to which reading a file located within a single group is associated with a short seek time. • The file system preallocates disk blocks for regular files, so as the file grows, blocks are already reserved for it in physically adjacent areas, which reduces file fragmentation. • Thanks to the careful implementation, it is stable and flexible. • Defined in /fs/ext2. 4 Directories in ext2 Directory – consists of blocks of type ext2_dir_entry_2. #define EXT2_NAME_LEN 255 struct ext2_dir_entry_2 { __le32 inode; /* Inode number */ __le16 rec_len; /* Directory entry length */ __u8 name_len; /* Name length */ __u8 file_type; char name[]; /* File name, up to EXT2_NAME_LEN */ }; The file name is limited to 255 characters (constant EXT2_NAME_LEN). Depending on the setting of the NO_TRUNCATE, flag, longer names may be truncated or treated as incorrect. The structure has variable length because the last field is an array of variable length. Each entry is supplemented with \0 to multiples of 4. The name_len field stores the actual length of the file name. 5 Directories in ext2 The rec_len field can be interpreted as a pointer to the next correct directory entry. To remove an entry, just reset the inode field and increase the rec_len value. The file is added by means of a linear search to the first structure, in which the inode number is 0 and there is enough space. If one is not found, the new file will be appended at the end. User processes can read the directory as a file, but only the kernel can write the directory, which guarantees the correctness of the data. Example directory (source: Bovet, Cesati, Understanding the Linux Kernel) 6 Ext2 file system data structures on disk The basic physical unit of data on a disk is a block. The block size is constant throughout the entire file system. These constants limit the block size: #define EXT2_MIN_BLOCK_SIZE 1024 #define EXT2_MAX_BLOCK_SIZE 4096 /* in bytes*/ Ext2 on a disk consists of many groups of disk blocks (of the same size, located sequentially one after the other). Block groups reduce file fragmentation. Source: https://computing.ece.vt.edu/~changwoo/ECE-LKP-2019F/l/lec21-fs.pdf 7 Ext2 file system data structures on disk Superblock — Superblocks in all groups have the same content*. Group descriptors — As with superblocks, their content is copied to all groups*. * Originally, the superblock and group descriptors were replicated in every block group with those located in block group 0 designated as the primary copies. This is no longer common practice due to the Sparse SuperBlock Option. The Sparse SuperBlock Option only replicates the file system superblock and group descriptors in a fraction of the block groups. The kernel only uses the superblock and group descriptors from group 0. When e2fsck checks the consistency of the file system, it reaches into the superblock and descriptors from group 0 and copies them to other groups. If as a result of the failure the structure data stored in block 0 are unusable, the administrator can order e2fsck to reach older copies in the other groups. During system initialization, blocks with group descriptors from group 0 are read into memory. Unless there are exceptional situations, the system does not use blocks with descriptors and a superblock from other groups 8 Ext2 file system data structures on disk Number of block groups depends on the partition size and the block size. The block bitmap must be stored in a single block, so if the block size in bytes is b, there can be at most 8 * b blocks in each block group, thus if s is the partition size in blocks, the total number of block groups is roughly s/(8 * b). If the blocks are 1 KB in size, then we can describe 8 K blocks in a single bitmap, so in the group we have 8 MB for data blocks. If the blocks are 4 KB, then we have sixteen times more space for data blocks, i.e. 128 MB. Let’s consider a 32 GB ext2 partition with a 4 KB block size. In this case, each 4 KB block bitmap describes 32K data blocks—that is, 128 MB. Therefore, at most 256 block groups are needed. Dividing the file system into block groups is designed to increase security and optimize data writing to disk. Increased security is achieved by maintaining redundant information about the file system (superblock and group descriptors) in each block group. Optimization of data writing is ensured by algorithms for the allocation of new inodes and disk blocks. 9 Superblock Stores information describing file system (block size, number of inodes and blocks, number of free blocks, etc.) struct ext2_super_block { __le32 s_inodes_count; /* Inodes count */ __le32 s_blocks_count; /* Blocks count */ __le32 s_free_blocks_count; /* Free blocks count */ __le32 s_free_inodes_count; /* Free inodes count */ __le32 s_first_data_block; /* First Data Block */ __le32 s_blocks_per_group; /* # blocks per group */ __le32 s_inodes_per_group; /* # inodes per group */ __le16 s_inode_size; /* size of inode structure */ /* * Performance hints. Directory preallocation should only * happen if the EXT2_COMPAT_PREALLOC flag is on. */ __u8 s_prealloc_blocks; /* Nr of blocks to try to preallocate*/ __u8 s_prealloc_dir_blocks; /* Nr to preallocate for dirs */ ... }; 10 Group descriptors and bitmaps It occupes 32 bytes, which means that when a disk block has a size of 1 KB, one block contains 32 group descriptors. Bitmaps are bit arrays in which a value of 0 means that the corresponding inode or data block is free, and 1 means it is busy. struct ext2_group_desc { __le32 bg_block_bitmap; /* Blocks bitmap block */ __le32 bg_inode_bitmap; /* Inodes bitmap block */ __le32 bg_inode_table; /* Inodes table block */ __le16 bg_free_blocks_count; /* Free blocks count */ __le16 bg_free_inodes_count; /* Free inodes count */ __le16 bg_used_dirs_count; /* Directories count */ ... }; 11 Inode table It consists of a sequence of adjacent blocks, each containing a fixed number of inodes. All inodes have the same size of 128 bytes. One 1 KB block contains 8 inodes, and a 4 KB block contains 32 inodes. Inode number = starting inode number of a block group + inode location in the inode table. The inode in the ext2 file system is described by the structure ext2_inode. struct ext2_inode { __le16 i_mode; /* File mode */ __le32 i_size; /* Size in bytes */ __le16 i_links_count; /* Links count */ __le32 i_blocks; /* Blocks count */ __le32 i_block[EXT2_N_BLOCKS];/* Pointers to blocks */ __le32 i_file_acl; /* File ACL */ __le32 i_dir_acl; /* Directory ACL */ ... }; 12 Inode table The i_size field has 32 bits, i.e. the field size limits the file size to 4 GB. In fact, the highest bit of the field is not used, so the maximum file size is even smaller — 2 GB. Ext2 uses different methods to increase the allowable file size. Some architectures use the i_dir_acl field as a 32-bit extension of the i_size field. In ext2 there is no need to store on the disk the mapping between the inode number and the number of the block containing it, because this block number can be calculated based on the group number and relative position of the inode in the inode table. Let's assume that each group has 4,096 inodes and that we want to find the disk address of the inode 13,021. The inode therefore belongs to the third group and its disk address is stored in the 733th position of the corresponding inode table. 13 Ext2 file system data structures in memory Most of the information stored in disk data structures is copied to RAM during file system mounting, thanks to which the kernel has quick access to them. Mapping ext2 data structures to VFS structures Type Data structure on disk Data structure in memory Buffering mode Superblock ext2_super_block ext2_sb_info Always cache Group descriptor ext2_group_desc ext2_group_desc Always cache Block bitmap Bitmap in a block Bitmap in a buffer Dynamically Inode bitmap Bitmap in a block Bitmap in a buffer Dynamically Inode ext2_inode ext2_inode_info Dynamically Data block Byte array VFS buffer Dynamically Free inode ext2_inode ------------ Never Free block Byte array ------------ Never 14 Superblock in memory In VFS, the superblock is described in the super_block structure. The s_fs_info field of this structure for the ext2 filesystem points to the ext2_sb_info structure, and the s_es field of this structure to ext2_super_block. The kernel initializes these data structures during mounting and loads the superblock and group descriptors to memory. It releases them when unmounting the file. The ext2_sb_info structure (source: Bovet, Cesati, Understanding the Linux Kernel) 15 Inode in memory An ext2_inode_info inode descriptor is created for a file from the ext2 system.

Load more