
Global Journal of Computer Science & Technology, Volume 11 Issue 6 Version 1.0, April 2011. Type: Double Blind Peer Reviewed International Research Journal. Publisher: Global Journals Inc. (USA). Online ISSN: 0975-4172 & Print ISSN: 0975-4350

A Quick Review of On-Disk Layout of Some Popular Disk File Systems
By Wasim Ahmad Bhat, S. M. K. Quadri, Kashmir University

Abstract- Disk file systems have been researched since the inception of the first magnetic disk by IBM in 1956. As such, many good disk file system designs have been drafted and implemented. Every file system design addressed a problem at the time of its development and efficiently mitigated it. The augmented or new designs rectified the flaws in previous designs or provided a new concept in file system design. As such, there are many file systems that have been successfully implemented and incorporated in operating systems. Among these designs, some file systems have made an influential impact on file system design because of their capability to cope with changes in hardware technology and/or user requirements, because of their innovation in file system design, or because time favored them, which allowed them to find space in popular operating systems. In this paper, we provide a quick review of the on-disk layout of some popular disk file systems across popular platforms like Windows, Linux & Macintosh. The goal of this paper is to explore the on-disk layout of these file systems to identify the various layout policies and data structures they exploit, which led to their adoption by their native and other operating systems.

Keywords: File System, On-Disk, Design, Popular, Review.

Classification: GJCST Classification: FOR Code: 100699,100604



© 2011 Wasim Ahmad Bhat, S. M. K. Quadri. This is a research/review paper, distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Wasim Ahmad Bhat α, S. M. K. Quadri Ω

About α- Research scholar in P. G. Department of Computer Sciences, Kashmir University, India. He did his Bachelor's degree in Computer Applications from Islamia College of Science & Commerce, India and Master's degree in Computer Applications from Kashmir University, India.

About Ω- Head, P. G. Department of Computer Sciences, Kashmir University, India. He did his M. Tech. in Computer Applications from Indian School of Mines, India and Ph. D. in Computer Sciences from Kashmir University.

I. INTRODUCTION

Since the advent of computers, a mechanism for persistent storage of data and/or programs has been needed. On the timeline, magnetic disks are the earliest [1] (introduced in 1956 as data storage for an IBM accounting computer) and still the most widely used secondary storage device. The magnetic disk drive is the most primitive and cost-effective storage device. There has been continuous improvement in its hardware technology to increase its performance and capacity [2]. Although performance has improved less than capacity, the tremendous drop in cost per unit, reliability over solid state storage and increase in capacity have made disk drives everybody's choice [3]. Hence, disk file systems have attracted researchers across the globe to exploit their pros and minimize their cons.

A file system is a way to organize, store, retrieve, and manage information on a permanent storage medium such as a disk [4]. The file system is an important part of an operating system as it provides a way by which data can be stored, organized, navigated, accessed and retrieved in the form of files and directories from the storage subsystem. It is generally a kernel module which consists of algorithms to maintain the logical data structures residing on the storage subsystem. The basic functions that every file system incorporates are basic file operations like copy, move, create, delete and rename, efficient organization of data for quick storage and retrieval, and efficient use of disk space. Apart from these basic functions, some file systems also provide additional functions such as compression, encryption, file streams and others. Keeping all the hardware parameters and the workload constant, the performance of a hard disk depends on the type of file system used. In general, file systems were developed incrementally by individual efforts of researchers and industry, in close step with the hardware limitations and requirements of the time. Later, existing file systems were refined [5] and new file systems were developed to keep pace with hardware enhancement and, of course, need.

To understand file system design in general and on-disk layout specifically, we need to review the history of its invention a bit so that we can get some overview of the environment and situations in which the first file systems were drafted and implemented. Further, this history will give us some idea about the incremental file system design that has been followed since the inception of the first file system. In the early days of computers, file systems were simply considered part of the operating system that ran the computer, and in those days operating systems themselves were rather new and fancy. One of the first file systems to have a name was DEC Tape [6], named after the company that made it (Digital Equipment Corporation) and the physical system the files were stored on (reel-to-reel tape recorders). The tapes acted like very slow disk drives. DEC Tape stored an astoundingly small 184 kilobytes of data per tape on the PDP-8 [7], DEC's popular early minicomputer. It was called a minicomputer only because, while the size of a refrigerator, it was still smaller than IBM's mainframes that took up entire rooms. Of course, the invention of the transistor and integrated circuit allowed another

©2011 Global Journals Inc. (US) A Quick Review of On-Disk Layout of Some Popular Disk File Systems

whole round of miniaturization. DEC slowly became extinct while the rest of the world moved to microcomputers.

In 1972, Gary Kildall [8] got interested in working with microprocessors and got involved with Intel. His research was related to code optimization. While working as a consultant at Intel, Kildall developed the Programming Language/Microprocessor (PL/M) [9] and the Control Program/Monitor (CP/M) [10]. He wrote CP/M to test out PL/M. CP/M allowed him to store files on and retrieve them from an 8-inch floppy. He was able to run and test programs from it, modify them and check their portability by putting the floppy in another machine's drive. CP/M got very popular because it required only a small amount of memory to run, approximately 3 ½ K, and had a file system, although that file system did not have a name. It was very simple, as it stored files in a completely flat hierarchy with no directories. File names were limited to eight characters plus a three-character "extension" that determined the file's type. This was perfectly sensible because it was exactly the same limitation as the computer Kildall was working with. Gary Kildall and the company he founded to sell CP/M, Digital Research, soon became very wealthy and the usage of CP/M was tripling every year. It turned out that a lot of microcomputer companies needed an operating system, and Gary had designed it in a way that separated all the BIOS from the rest of the OS. Unfortunately for Kildall, others soon got the same idea he had.

A programmer named Tim Paterson [11] wrote his own OS called "QDOS" (Quick and Dirty Operating System) [12] that was a quick and dirty clone of everything CP/M did, because he needed to run an OS on a new 16-bit computer, and Gary hadn't bothered to write a 16-bit version of CP/M yet. QDOS had a slightly different file system than CP/M, although it did basically the same thing and didn't have directories either. Paterson's file system was based on a 1977 program called Microsoft Disk Basic, which was basically a version of Basic that could write its files to floppy disks. It used an organization method called the File Allocation Table (FAT).

Bill Gates bought Tim Paterson's QDOS for $50,000 and renamed it MS-DOS [13]. He was now able to sell it to IBM and every company making an IBM clone, and Gary found himself quickly escorted from the personal computing stage. As it was originally a quick and dirty clone of a file system designed for 8-bit microcomputers in the 1970s that was itself a quick-and-dirty hack that mimicked the minicomputers of a decade earlier, FAT was not really up for very much. It retained CP/M's "8 and 3" file name limit, and the way it stored files was designed around the physical structure of the floppy drive, the primary storage device of the day. The introduction of hard disks soon made FAT-12 obsolete, but file systems got attention, and every individual researcher and software industry professional recognized their importance and started either enhancing and augmenting the older designs or designing new file systems from scratch.

In this paper, we look at the on-disk layout of some of the most popular file systems. The popularity of the file systems selected is solely based on the popularity of the operating systems that support them natively. The goal of this paper is to look at the layout policies they exploit and the data structures they use to mitigate the challenges for which they were designed. In this paper, we will review the native file systems of Windows, Linux and Macintosh operating systems.

II. FAT File Systems

The design of the FAT [14] file system is very simple as it uses simple data structures. This simplicity in design has made the FAT file system popular and supported by almost every operating system. In today's world, several digital devices, such as mini mp3 players, smart phones, digital cameras, etc., are becoming part of our life. These devices frequently exchange data with desktop computers. The PC discovers these devices as standard USB mass storage devices and automatically mounts the file system inside them. This is possible only if the file system used in the device is supported by the PC's operating system. That is why the conventional FAT file system is useful for solid state memory cards: it provides a convenient way to share data by being supported by almost all operating systems [15]. As mentioned before, FAT12 was the first FAT file system but was able to address only a limited number of sectors as it was developed for floppy disks. Later, with the introduction of hard disks, FAT16 was introduced and, with higher capacity drives, FAT32 and now exFAT [16] (unofficially called FAT64). Almost all the flavors of the FAT file system follow the same design with the exception of the pointer width in bits that is used to access the sectors (or clusters) and which gives the FAT suffix 12, 16, 32 and 64. FAT12 and FAT16 are obsolete now whereas exFAT is not widely used yet, in contrast to FAT32 which is supported by almost every operating system.

The FAT32 file system consists of 4 different data structures to allow the semantics of hierarchical file systems to be implemented on a volume.

a) Boot Sector

The Boot Sector is located at the beginning of the volume. It includes an area called the BPB (BIOS Parameter Block) at offset 11 of length 49 and contains some basic file system information. The rest of the sector usually contains boot code, with the boot signature bytes 0x55 and 0xAA at offsets 510 and 511.
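As a minimal sketch of how such a boot sector might be read, the Python fragment below checks the boot signature and decodes a few BPB entries using the byte offsets given in Table 1. It assumes a 512-byte sector, little-endian fields and, as in the standard FAT layout, signature bytes 0x55 and 0xAA at offsets 510 and 511. The names `parse_fat32_boot_sector` and `make_sample_sector`, the dictionary keys and the synthetic sector contents are illustrative assumptions, not part of the paper or of any driver.

```python
import struct


def parse_fat32_boot_sector(sector: bytes) -> dict:
    """Decode a few FAT32 BPB fields from a raw boot sector (illustrative helper)."""
    if len(sector) < 512:
        raise ValueError("a boot sector of at least 512 bytes is expected")
    # Assumption: signature bytes 0x55, 0xAA sit at offsets 510 and 511.
    if bytes(sector[510:512]) != b"\x55\xaa":
        raise ValueError("boot signature not found")
    # Offsets 11, 13, 14 and 16 (Table 1), read as little-endian fields.
    bytes_per_sec, sec_per_clus, rsvd_sec_cnt, num_fats = struct.unpack_from(
        "<HBHB", sector, 11
    )
    (fat_sz32,) = struct.unpack_from("<I", sector, 36)   # BPB_FATSz32
    (root_clus,) = struct.unpack_from("<I", sector, 44)  # BPB_RootClus
    return {
        "BPB_BytsPerSec": bytes_per_sec,
        "BPB_SecPerClus": sec_per_clus,
        "BPB_RsvdSecCnt": rsvd_sec_cnt,
        "BPB_NumFATs": num_fats,
        "BPB_FATSz32": fat_sz32,
        "BPB_RootClus": root_clus,
    }


def make_sample_sector() -> bytes:
    """Build a synthetic boot sector with made-up values for demonstration."""
    sector = bytearray(512)
    struct.pack_into("<HBHB", sector, 11, 512, 8, 32, 2)
    struct.pack_into("<I", sector, 36, 1000)
    struct.pack_into("<I", sector, 44, 2)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


bpb = parse_fat32_boot_sector(make_sample_sector())
print(bpb)
```

On a real volume the sector would be read from the first 512 bytes of the partition rather than synthesized; the values packed here are placeholders chosen only to exercise the decoder.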

The BPB is a one-dimensional table that contains variable length entries. Each entry in the BPB stores file system layout information except one (BPB_Reserved), which is kept reserved for future extension. Different versions of FAT file systems differ in BPB size and contain different entries. Table 1 shows the BPB for the FAT32 file system. Each entry has been given a name to identify its role along with its offset and size.

Name              Offset (byte)   Size (bytes)
BS_jmpBoot        0               3
BS_OEMName        3               8
BPB_BytsPerSec    11              2
BPB_SecPerClus    13              1
BPB_RsvdSecCnt    14              2
BPB_NumFATs       16              1
BPB_RootEntCnt    17              2
BPB_TotSec16      19              2
BPB_Media         21              1
BPB_FATSz16       22              2
BPB_SecPerTrk     24              2
BPB_NumHeads      26              2
BPB_HiddSec       28              4
BPB_TotSec32      32              4
BPB_FATSz32       36              4
BPB_ExtFlags      40              2
BPB_FSVer         42              2
BPB_RootClus      44              4
BPB_FSInfo        48              2
BPB_BkBootSec     50              2
BPB_Reserved      52              12
BS_DrvNum         64              1
BS_Reserved1      65              1
BS_BootSig        66              1
BS_VolID          67              4
BS_VolLab         71              11
BS_FilSysType     82              8

Table 1. Description of FAT32 BPB

Reserved Sectors immediately follow the Boot Sector. The number of reserved sectors for the volume includes the Boot Sector and is indicated by the BPB at offset 14 of the Boot Sector. Typically, reserved sectors include the FSInfo sector at sector 1 and the BkBoot sector at sector 6 of the volume. The FSInfo sector further qualifies the FAT32 volume, while BkBoot is a replica of the boot sector and is used for recovery purposes.

b) File Allocation Table (FAT)

The File Allocation Table (FAT) is an array of n-bit wide entries and spans a number of sectors indicated by the BPB at offset 36 of the Boot Sector. A FAT32 volume generally has 2 consecutive copies of the FAT data structure; this is called FAT Mirroring. Mirroring is done to recover from FAT corruption in case one copy of the FAT gets corrupted. In case of solid state storage devices, the FAT is not mirrored, to prolong the life of the device by reducing write cycles. Bit 7 of the BPB entry at offset 40 of the boot sector indicates whether the FAT is mirrored or not. This data structure gives the FAT file system its name and is the heart of the file system. The suffixes used by various FAT file systems indicate the bit width of entries in the FAT data structure. Thus, in FAT32, the FAT entries are 32 bits wide.

The FAT data structure is a table that stores information about which clusters are free, used or possibly unusable. A cluster is a fixed length group of consecutive data sectors; clusters are located immediately after the FAT data structure and occupy the rest of the volume. The number of sectors per cluster is indicated by the BPB at offset 13 of the boot sector. The FAT file system always allocates space on the storage device in terms of clusters. This is done to increase the performance of the file system by avoiding multiple individual accesses to the disk. Thus, the file system may suffer from high internal fragmentation if the cluster is too large and there are many small files, and performance may degrade if the cluster is small and the volume has large files. Depending upon the type of file system and the size of the volume, the cluster size varies, but the number of sectors per cluster is restricted to a power of 2, i.e. 1, 2, 4, 8, 16, 32, 64, etc. In addition to keeping track of used and unused clusters, the FAT data structure also keeps track of the chain of clusters allocated to a file. The technique used by the FAT32 file system is simple. Every file and directory except the root directory of the volume has an entry in its parent directory that contains its name, attributes and a 32-bit wide entry that indicates the first cluster number allocated to it. The FAT data structure entries are 32 bits wide and each entry corresponds uniquely and sequentially to a cluster on the volume, i.e. the first entry corresponds to cluster 0, the second entry to cluster 1, etc. The formula used to locate the cluster entry in the FAT data structure for any valid cluster number N is

\[
\begin{aligned}
\text{FATOffset} &= N \times 4 \\
\text{ThisFATSecNum} &= \text{BPB\_RsvdSecCnt} + \left\lfloor \frac{\text{FATOffset}}{\text{BPB\_BytsPerSec}} \right\rfloor \\
\text{ThisFATEntOffset} &= \text{FATOffset} \bmod \text{BPB\_BytsPerSec}
\end{aligned}
\]

where ThisFATSecNum is the logical sector number of the volume and ThisFATEntOffset is the offset in the sector where the 32-bit FAT entry corresponding to cluster number N exists. The contents of any valid cluster entry in the FAT can have the values shown in Table 2.

FAT32 Cluster Entry Values   Description
0x00000000                   Is free cluster
0x00000001                   Reserved value
0x00000002 – 0x0FFFFFEF      Is used cluster and value points to next cluster in the chain allocated to file/directory
0x0FFFFFF0 – 0x0FFFFFF6      Reserved values
0x0FFFFFF7                   Some bad sector in cluster, unusable
0x0FFFFFF8 – 0x0FFFFFFF      Is last cluster in file/directory or EOC (End Of Cluster chain) marker

Table 2. Description of Valid FAT Entries

Let's suppose two files, say MYFILE1.TXT and MYFILE2.TXT, currently reside on a FAT32 volume such that the former is fragmented and is 3 clusters long while the latter is not fragmented and is 2 clusters long, as shown in Figure 1. MYFILE1.TXT has first cluster 0x00000029 allocated; the FAT contents for that cluster show another cluster, 0x0000002A, and then 0x0000002D, whose FAT contents show that this cluster is the last cluster in the chain. Similarly, for MYFILE2.TXT the first cluster allocated is 0x0000002B, whose FAT contents point to the next cluster in the chain, 0x0000002C, which is the last cluster in the chain as indicated by its FAT content.

Each file/directory may occupy one or more clusters depending upon its size. Thus, a file/directory is represented by a chain of these clusters. However, these clusters are not necessarily stored adjacent to one another on the disk's surface but are often fragmented throughout the volume, as shown in Figure 1 where MYFILE1.TXT is fragmented while MYFILE2.TXT is not.

Figure 1. A Snapshot of FAT Data Structure.

As memory cost per unit capacity is decreasing dramatically every year and storage size is increasing, the maximum number of clusters has increased dramatically, and so has the cluster size. In FAT32, the FAT entry is 32 bits wide and points to the next cluster in the chain, but only its lower 28 bits are used to address clusters. Thus, FAT entry values of, say, 0xA0000000 and 0xB0000000 point to the same cluster on the volume. As such, $2^{28}$ clusters can exist on a FAT32 volume. As mentioned before, the successive major versions of FAT file systems are named after the number of table entry bits: FAT12, FAT16, FAT32 & FAT64; the goal of every new version is to address larger volumes and larger files. The KFAT [17], TFAT [18] and FATTY [19] versions of the FAT file system have also been designed, but their goal was reliability. Because the number of bytes per sector, as indicated by the BPB at offset 11 of the Boot Sector, is always divisible by 4, a FAT32 FAT entry never spans a sector boundary.

The first two entries in the FAT store special values:

• The first entry contains a copy of the BPB value at offset 21 of the Boot Sector, which is 8 bits long and indicates the type of storage media. The remaining 20 bits between the high 4 and low 8 bits of this entry are set to 1.
• The second entry stores the EOC marker. The high order two bits of this entry are sometimes used for dirty volume management: the high order bit, if set to 1, indicates that the last shutdown was clean, otherwise abnormal. The next highest bit, if set to 1, indicates that during the previous mount no disk I/O errors were detected, otherwise there were.

Because the first two FAT entries store special values, there is no cluster 0 or 1. The first addressable cluster in the FAT32 FAT data structure is cluster 2, which is why the BPB value at offset 44 of the Boot Sector, which indicates the Root Directory cluster number, cannot be less than 2 and is usually 2, i.e., the Root Directory is at the start of the file/directory region.

c) Directory Structure

The semantics of a hierarchical file system lives on the notion of files and directories. The hierarchical file system is like a tree where every non-leaf node is a subdirectory containing any number of non-leaf nodes (sub-directories) or leaf nodes (files) or both. The tree begins at a root node called the root directory. In FAT32, the root directory is of variable size and is assigned the first cluster, whose address is indicated in the BPB at offset 44. Among all the files and directories that may reside on a FAT32 volume, the root directory is the only directory that does not have a name or attributes; more precisely, it does not have any entry like other files and directories have. In case of FAT12 and FAT16, the root directory is located at a fixed location after the FAT copy and is of a fixed size indicated in the BPB.

A directory is an array of 32-byte wide structures where each structure represents a file or directory, either existing or deleted, and, in case of long name support, the remaining parts of a long name. The structure of the 32-byte wide directory entry is shown in Table 3.

Name               Offset (byte)   Size (bytes)
DIR_Name           0               11
DIR_Attr           11              1
DIR_NTRes          12              1
DIR_CrtTimeTenth   13              1
DIR_CrtTime        14              2
DIR_CrtDate        16              2
DIR_LstAccDate     18              2
DIR_FstClusHI      20              2
DIR_WrtTime        22              2
DIR_WrtDate        24              2
DIR_FstClusLO      26              2
DIR_FileSize       28              4

Table 3. Description of FAT32 Directory Entry Structure
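To make the directory entry layout of Table 3 concrete, the following Python sketch decodes a short (8.3) entry. It assumes little-endian fields, an ASCII name padded with spaces, and that the first cluster number is formed from DIR_FstClusHI (upper 16 bits) and DIR_FstClusLO (lower 16 bits). The function name `parse_dir_entry`, the dictionary keys and the sample entry (including its attribute byte and file size) are made up for illustration.

```python
import struct


def parse_dir_entry(entry: bytes) -> dict:
    """Decode the main fields of a 32-byte short directory entry (illustrative)."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    raw_name, attr = struct.unpack_from("<11sB", entry, 0)     # DIR_Name, DIR_Attr
    (clus_hi,) = struct.unpack_from("<H", entry, 20)           # DIR_FstClusHI
    clus_lo, file_size = struct.unpack_from("<HI", entry, 26)  # DIR_FstClusLO, DIR_FileSize
    # The 11-byte name holds 8 name characters and 3 extension characters;
    # the separating dot is not stored on disk.
    base = raw_name[:8].decode("ascii").rstrip(" ")
    ext = raw_name[8:].decode("ascii").rstrip(" ")
    name = base + "." + ext if ext else base
    return {
        "name": name,
        "attr": attr,
        "first_cluster": (clus_hi << 16) | clus_lo,
        "size": file_size,
    }


# Synthetic entry for MYFILE1.TXT starting at cluster 0x29 (values are made up).
sample = bytearray(32)
sample[0:11] = b"MYFILE1 TXT"
sample[11] = 0x20
struct.pack_into("<H", sample, 20, 0)
struct.pack_into("<HI", sample, 26, 0x29, 9000)

info = parse_dir_entry(bytes(sample))
print(info["name"], hex(info["first_cluster"]), info["size"])
```

The sketch deliberately ignores long file name entries and deleted-entry markers, which need additional handling not shown here.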

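The FAT lookup formula and the cluster-chain example with MYFILE1.TXT and MYFILE2.TXT from the File Allocation Table subsection can be sketched as follows. This is a simplified illustration under stated assumptions: the FAT is modeled as a Python dictionary mapping a cluster number to its 32-bit entry instead of being read from disk, only the lower 28 bits of an entry are interpreted, and the names `fat_entry_location` and `cluster_chain` as well as the reserved-sector count of 32 are hypothetical.

```python
EOC_MIN = 0x0FFFFFF8      # entries at or above this value mark end of chain
MAX_USED = 0x0FFFFFEF     # largest value that still points to a next cluster


def fat_entry_location(n: int, rsvd_sec_cnt: int, bytes_per_sec: int) -> tuple:
    """Return (sector number, byte offset in sector) of the FAT entry for cluster n."""
    fat_offset = n * 4
    sector = rsvd_sec_cnt + fat_offset // bytes_per_sec
    offset = fat_offset % bytes_per_sec
    return sector, offset


def cluster_chain(fat: dict, first_cluster: int) -> list:
    """Follow the FAT from first_cluster until an EOC marker is reached."""
    chain = []
    seen = set()
    cluster = first_cluster
    while True:
        if cluster < 2 or cluster in seen:
            raise ValueError("invalid or looping cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        nxt = fat[cluster] & 0x0FFFFFFF  # only the lower 28 bits are used
        if nxt >= EOC_MIN:
            return chain
        if not 2 <= nxt <= MAX_USED:
            raise ValueError("free, reserved or bad cluster inside a chain")
        cluster = nxt


# FAT contents matching the example: MYFILE1.TXT is fragmented, MYFILE2.TXT is not.
fat = {
    0x29: 0x0000002A,
    0x2A: 0x0000002D,
    0x2B: 0x0000002C,
    0x2C: 0x0FFFFFF8,
    0x2D: 0x0FFFFFFF,
}

file1 = cluster_chain(fat, 0x29)
file2 = cluster_chain(fat, 0x2B)
location = fat_entry_location(0x29, 32, 512)
print([hex(c) for c in file1], [hex(c) for c in file2], location)
```

The loop guard against revisiting a cluster is a defensive addition of this sketch; it is not described in the text but prevents an endless walk over a corrupted table.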

The name and other metadata about a file are all stored in the 32-byte directory entry for the file. The list of characters that cannot be used in a file name, . " / \ [ ] ; : | = or 0x20, is really an operating system issue, not a file system issue. Linux, via its FAT support, can create files with some of these characters in their names. This may cause problems with portability if that disk is later used in a Windows environment. Dating back to the creation of the first FAT12 volumes in the 70's, all files were given a name in the 8.3 naming convention, that is, eight characters for the name and three characters for an extension that identified the type of file; the 'dot' is never saved. Long file name support was later introduced but not in any semblance of an elegant way.

Usually, FAT32 places the root directory in the first available cluster, which places it right behind the FAT area. All other directories in all the FAT file systems will be allocated clusters as they need them and can reside anywhere on the disk.

III. NT File System

NTFS was designed to quickly perform standard file operations such as read, write & search. It was developed from scratch although some concepts were borrowed from OS/2's HPFS [20]. The design of the NTFS file system is a bit complex but very nicely drafted and crafted. It includes many new features of modern file systems like transparent compression and encryption, sparse files, multiple data streams, reliability, fast recovery, security features, privileges and permissions, and the representation of everything as a file and everything belonging to a file as a collection of attribute/value pairs, from the filename attribute to the data attribute [21]. The design of the NTFS file system is such that every sector of the volume belongs to some file, unlike FAT. Even the file system metadata that describes the file system is part of some file.

When a volume is formatted with the NTFS file system, several system files are created, which the file system uses to store volume metadata and implement the file system. These files are not directly accessible to the user. These system files have an entry just like other regular volume files and directories have, and have been given reserved names prefixed by a $ sign. The standard configuration of the NTFS file system has 16 system files, of which the last 4 entries are reserved [22]. Table 4 lists these system files along with their $MFT name, $MFT entry offset (explained later) and the purpose of the file.

System File            File Name   MFT Record   Purpose of the File
Master file table      $Mft        0            Contains one base file record for each file and folder on an NTFS volume.
Master file table 2    $MftMirr    1            A duplicate image of the first four records of the $MFT.
Log file               $LogFile    2            Contains a list of transaction steps used by NTFS for recoverability.
Volume                 $Volume     3            Contains information about the volume.
Attribute definitions  $AttrDef    4            A table of attribute names, numbers, and descriptions.
Root file name index   $           5            The root folder.
Cluster bitmap         $Bitmap     6            A representation of the volume showing which clusters are in use.
Boot sector            $Boot       7            Includes the BPB used to mount the volume and additional bootstrap loader code used if the volume is bootable.
Bad cluster file       $BadClus    8            Contains bad clusters for the volume.
Security file          $Secure     9            Contains unique security descriptors for all files within a volume.
Upcase table           $Upcase     10           Converts lowercase characters to matching uppercase characters.
NTFS extension file    $Extend     11           Used for various optional extensions such as quotas, reparse point data, and object identifiers.
                                   12-15        Reserved for future use.

Table 4. $MFT Entry Name, Offset & Purpose of NTFS System Files

a) $BOOT

The location of the $BOOT file is fixed: it resides on the first 16 sectors of the NTFS volume. The first sector is called the Boot Sector as it contains the bootstrap code, and the following 15 sectors are the boot sector's IPL (Initial


Program Loader). The boot sector is duplicated at the last sector of the volume. The boot sector of the $BOOT file contains two data structures: the BPB followed by the Extended BPB. Table 5 describes the BPB and Extended BPB of the NTFS boot sector (Offset, Length & Field Name).

Byte Offset   Field Length   Field Name
0x0B          WORD           Bytes Per Sector
0x0D          BYTE           Sectors Per Cluster
0x0E          WORD           Reserved Sectors
0x10          3 BYTES        always 0
0x13          WORD           not used by NTFS
0x15          BYTE           Media Descriptor
0x16          WORD           always 0
0x18          WORD           Sectors Per Track
0x1A          WORD           Number Of Heads
0x1C          DWORD          Hidden Sectors
0x20          DWORD          not used by NTFS
0x24          DWORD          not used by NTFS
0x28          LONGLONG       Total Sectors
0x30          LONGLONG       Logical Cluster Number for the file $MFT
0x38          LONGLONG       Logical Cluster Number for the file $MFTMirr
0x40          DWORD          Clusters Per File Record Segment
0x44          DWORD          Clusters Per Index Block
0x48          LONGLONG       Volume Serial Number
0x50          DWORD          Checksum

Table 5. BPB & Extended BPB of NTFS file system

Among other things, the two data structures contain sectors per cluster, bytes per sector, total sectors, logical cluster number of the $MFT file, logical cluster number of the $MFTMirr file, clusters per file record segment and clusters per index block.

b) $MFT

The $MFT file or Master File Table file is an array of fixed size records where each record uniquely represents a file or directory of the volume, even the system files including the $MFT file itself. The first 16 records are reserved for system files. Table 4 shows the list of the first 16 records ordered as per their position and the corresponding system files they represent along with a short description. The first entry represents the $MFT file itself while the second entry represents the mirrored copy of the $MFT file, named $MFTMirr, whose first record is identical to the first record of $MFT. Actually, $MFTMirr duplicates the first 4 records of $MFT for recovery purposes. In case the first record of $MFT, which defines $MFT, is corrupted, the file system code should read the second record of $MFT to locate $MFTMirr and read its first record to build $MFT, or should directly read the $MFTMirr file's first record by locating its position from the logical cluster number in the BPB to build $MFT. As $MFT actually defines the NTFS layout, the logical cluster number of $MFT is kept in the BPB so that the file system driver can locate $MFT at boot time. $MFT is not fixed like FAT and hence can be relocated in case it is damaged; the same is true for other system files.

A record in $MFT is a 1 KB structure that stores the attributes of the file/directory to which it corresponds. NTFS stores everything belonging to a file or directory as a collection of attribute/value pairs including filename, security information, time stamps, data, etc. [23]. Each $MFT record corresponds to a unique file. If a file has a large number of attributes, more than one record is allocated to the file. In this case, the first record, which stores the location of the others in its Attribute List attribute, is called the Base File Record. Whether a file consumes one or more $MFT records, if the value of any particular attribute is completely stored in the record, such an attribute is called a Resident Attribute. Several attributes are defined as always being resident so that NTFS can locate non-resident attributes, e.g. $STANDARD_INFORMATION, $INDEX_ROOT, $ATTRIBUTE_LIST, etc. A non-resident attribute is one whose value cannot be completely stored in an $MFT record. In such a case, NTFS allocates clusters for the attribute's data separate from $MFT. This area is called a run or, technically, an extent. If a resident attribute's value grows, it is converted to a non-resident attribute and allocated a run. The $DATA attribute of files greater than 1 KB, $BOOT, $MFTMirr and $LogFile is always non-resident. Table 6 shows the standard attribute names and their description [24]. Actually, attributes correspond to numeric codes, which NTFS uses to order (in ascending order) the attributes within an $MFT record, with the same attribute type appearing more than once in case a file has multiple values for that attribute. Most attributes never have names, though index related attributes and the $DATA attribute often do. Names distinguish among multiple attributes of the same type that a file can include. The value of an attribute is a byte stream and is stored as a separate stream in a file. NTFS does not read and write files but attribute streams. The read and write operations exported by the file system driver normally operate on the file's unnamed $DATA attribute.

Attribute Type          Description
Standard Information    Includes information such as timestamp and link count.
Attribute List          Lists the location of all attribute records that do not fit in the base MFT record.
File Name               A repeatable attribute for both long and short file names. The long name of the file can be up to 255 Unicode characters. The short name is the 8.3, case-insensitive name for the file. Additional names, or hard links, required by POSIX can be included as additional file name attributes.
Security Descriptor     Describes who owns the file and who can access it.
Data                    Contains file data. NTFS allows multiple data attributes per file. Each file typically has one unnamed data attribute. A file can also have one or more named data attributes, each using a particular syntax.
Object ID               A volume-unique file identifier. Used by the distributed link tracking service. Not all files have object identifiers.
Logged Utility Stream   Similar to a data stream, but operations are logged to the NTFS log file just like NTFS metadata changes. This is used by EFS.
Reparse Point           Used for volume mount points. They are also used by Installable File System (IFS) filter drivers to mark certain files as special to that driver.
Index Root              Used to implement folders and other indexes.
Index Allocation        Used to implement folders and other indexes.
Bitmap                  Used to implement folders and other indexes.
Volume Information      Used only in the $Volume system file. Contains the volume version.
Volume Name             Used only in the $Volume system file. Contains the volume label.

Table 6. Standard Attribute Types & their Description

Each $MFT record begins with an entry header which is 42 bytes long. This standard header contains a magic number "FILE", the number of entries in the fix up array, the $LogFile sequence number, the sequence number, the link count, the offset to the first attribute, flags that indicate whether the record is in use or not, the used and allocated size of the MFT entry, a file reference to the base file record in case it is not the base record, attributes and fix up values. Each attribute begins with a standard header containing information about the attribute like type and length of the attribute, length of name and offset to name, non-resident flag, etc. The header of every attribute is always resident and records whether the value is resident or non-resident.

For resident attributes, the header also contains the offset from the header to the attribute's value and the length of the attribute's value.

NTFS refers to physical locations on a disk by means of logical cluster numbers (LCNs). LCNs are simply the numbering of all clusters from the beginning of the volume to the end. To convert an LCN to a physical disk address, NTFS multiplies the LCN by the cluster factor (i.e. number of sectors per cluster) to get the sector offset, and hence the physical byte offset, on the volume. NTFS refers to the data within a file by means of virtual cluster numbers (VCNs). VCNs number the clusters belonging to a particular file from 0 through m. VCNs aren't necessarily physically contiguous, however; they can be mapped to any number of LCNs on the volume. When an attribute is non-resident, as the data attribute of a large file might be, its header contains the information NTFS needs to locate the attribute's value on the disk. This information is typically the VCN-to-LCN mapping pairs. Figure 3 shows the data attribute header containing VCN-to-LCN mappings for the two runs, which allows NTFS to easily find the allocations on the disk. Other attributes can be stored in runs if there isn't enough room in the $MFT file record to contain them.

Figure 3. Non-Resident $DATA attribute of File.

A file on an NTFS volume is identified by a 64-bit value called a File Reference. The file reference consists of a file number and a sequence number. The file number corresponds to the file's $MFT record entry offset (or to that of the base file record if the file has more than one file record entry). The file reference sequence number, which is incremented each time an
Figure 2 shows the $MFT file record position is reused, enables NTFS to typical structure of a $MFT entry record [25]. perform internal consistency checks. If a particular file has too many attributes to fit in the $MFT record, a second $MFT record is used to contain the additional Global Journal of Computer Science and Technology Volume XI Issue VI Version I attributes (or attribute headers for nonresident attributes). In this case, an attribute called the Attribute List is added to file in base record. The attribute list attribute contains the name and type code of each of the file’s attributes and the file reference of the $MFT record where the attribute is located. The attribute list attribute is also provided for those cases in which a file

grows so large or so fragmented that a single $MFT Figure 2. Typical MFT Entry Record record can’t contain the multitude of VCN-to-LCN


mappings needed to find all its runs. Files with more than 200 runs typically require an attribute list.

In NTFS, a file directory is simply an index of filenames, i.e., a collection of filenames along with their file references organized in a particular way (B-tree) for quick access [26]. To create a directory, NTFS indexes the filename attributes of the files in the directory. Conceptually, an $MFT entry for a directory contains in its Index Root attribute a sorted list of the files and/or directories in the directory. For each file/directory, it also contains the file reference in the MFT where it is described, along with its time stamp and size information. A large directory can also have nonresident attributes (or parts of attributes), as Figure 4 shows.

Figure 4. Root Directory [21]

In this example, the $MFT file record doesn't have enough room to store the index of files that make up this large directory. A part of the index is stored in the Index Root attribute, and the rest of the index is stored in non-resident runs called Index Buffers. For large directories, the filenames are actually stored in 4-KB fixed-size index buffers that contain and organize the filenames. Index Buffers implement a B+ tree data structure, which minimizes the number of disk accesses needed to find a particular file, especially for large directories. The index root attribute contains the first level of the B+ tree (root subdirectories) and points to index buffers containing the next level (more subdirectories, perhaps, or files). Figure 4 shows only filenames in the index root attribute and the index buffers (file6, for example), but each entry in an index also contains the file reference in the $MFT where the file is described, and time stamp and file size information for the file. NTFS duplicates the time stamp and file size information from the file's $MFT record. This technique, which is used by FAT and NTFS, requires updated information to be written in two places. Even so, it's a significant speed optimization for directory browsing because it enables the file system to display each file's time stamps and size without opening every file in the directory.

The index allocation attribute maps the VCNs of the index buffer runs to the LCNs that indicate where the index buffers reside on the disk, and the bitmap attribute keeps track of which VCNs in the index buffers are in use and which are free. Figure 4 shows one file entry per VCN (that is, per cluster), but filename entries are actually packed into each cluster. Each 4-KB index buffer can contain about 20 to 30 filename entries. The B+ tree data structure is a type of balanced tree that is ideal for organizing sorted data stored on a disk because it minimizes the number of disk accesses needed to find an entry. In the $MFT, a directory's index root attribute contains several filenames that act as indexes into the second level of the B+ tree. Each filename in the index root attribute has an optional pointer associated with it that points to an index buffer. The index buffer contains filenames with lexicographic values less than its own. In Figure 4, for example, file4 is a first-level entry in the B+ tree. It points to an index buffer containing filenames that are (lexicographically) less than itself: the filenames file0, file1, and file3. Note that the names file1, file2, and so on that are used in this example are not literal filenames but names intended to show the relative placement of files that are lexicographically ordered according to the displayed sequence.

c) $LogFile
The internal structure of the $LogFile is not well understood. Once the log is full, the first entry is overwritten with the next new entry. What gets logged are the individual transactions that make up each file access or file write. For instance, when modifying a file the following steps might occur:
• read $MFT entry for the directory the file is in
• read the directory entry for the file
• read $MFT record for file
• write file
• update Atime in file's MFT record
• update Mtime in file's MFT record
• update Atime in directory entry for that file
• update Mtime in directory entry for that file

This list gets considerably longer if the file is encrypted or compressed. If the command fails before the entire string of transactions is completed, due to a system crash or any other reason, the file system has to have a way to change each of the transactions involved back to its previous value in order to maintain consistency of the file system. In this way the file system provides a reliable, crash-resilient environment.

d) $Volume
The file $Volume contains the name of the volume. That is its most important function. This file also holds volume information data that contains a version number and a set of flags. The version number

is broken into two pieces: a major and a minor version number.

e) $AttrDef
This file contains the list of attributes available to the file system in this version of NTFS. It is because of this file that we know the names of the attributes that we are using. The entry for each attribute also contains some information about the allowable sizes and location (resident or not) of the attribute.

f) $Bitmap
The $BitMap is a special file within the NTFS file system. This file keeps track of all of the used and unused clusters on an NTFS volume. When a file takes up space on the NTFS volume, the location it uses is marked in the $BitMap. The method of keeping track of cluster allocation is relatively simple. Each bit in the Bitmap represents 1 cluster; if that bit is "1" then the cluster is in use.

g) $BadClus
This file is the size of the NTFS volume, but is a sparse file of all zeros. Since zeros in sparse files are counted instead of saved, this file takes up no space on the disk. If a cluster is ever deemed 'bad', data will be written to this file at the same offset into this file as the offset of the bad cluster into the volume. This causes this file to allocate clusters in the $Bitmap file, which in turn prevents other files from trying to use the bad cluster in the future.

h) $Secure
In Windows NT, every file had a $Security_Descriptor attribute that did this job. Since many files had the same values in that attribute, it was moved to this file so that the data wasn't repeated.

i) $UpCase
Case in the file name is preserved, but the name is converted to all uppercase for sorting as the directory entry is created. This file contains the uppercase characters of 'every' UNICODE alphabet so that NTFS knows the proper alphabetical order of each code page of UNICODE without having to inherently know every code page of UNICODE.

j) $Extend
$Extend is a directory that contains other system files. This allows more system files to be added without pushing the limit of the 16 $MFT records reserved for system files.

IV. Extended File Systems

In response to these problems, two new file systems were developed: "Xia" and "Second Extended File System" [31]. Xia file system was based on Minix file system and provided long filenames, support for large volume size and 3 timestamps, while Ext2 file system was based on Ext file system with many reorganizations and improvements. It was designed with evolution in mind and contained space for future extension. Due to its minimal design, Xia was more stable than Ext2 file system. Later, bugs were fixed in Ext2 file system and lots of improvements and new features were integrated. Ext2 file system became stable and the de facto standard Linux file system. Ext2 uses VFS to extend the maximum volume from 2 GB to 4 TB. It allows the root user to recover from incidents where other users overfill the file system. It uses variable length directory entries, while filename length could be extended to 1012. Ext2 file system may use synchronous updates like BSD FFS [32]. This is the maximum reliability support provided by Ext2 file system. In synchronous updates, any modification to file system metadata like I-nodes, bitmap blocks, indirect blocks and directory blocks is synchronously written to the disk. Although this mechanism provides reliability, it leads to poor performance. Ext2 file system allows the administrator to choose the logical block size when creating the file system. Block sizes can typically be 1024, 2048 and 4096 bytes. Ext2 implements fast symbolic links, which do not use any data block on the file system because the target name is stored not in a data block but in the I-node itself.

Andrew S. Tanenbaum wrote the Minix operating system in 1987 [27]. Tanenbaum created it for teaching purposes. Later, he published a textbook that included the source code of Minix. This code was taken and published on Usenet, where thousands of readers were able to examine and further develop Minix. As Minix was simple and bug free, Torvalds decided to incorporate its architecture into the operating system he was developing. Torvalds named his operating system Linux. One shortcoming of Torvalds' first Linux release was that it only supported Minix file system. Minix file system was an efficient and relatively bug free piece of software. However, the restrictions in the design of Minix file system were too limiting, so people started thinking about and working on the implementation of a new file system in Linux [28]. In order to add more file systems to the Linux operating system, Torvalds modified a VFS written by Chris Provenzano and integrated it into the kernel [29]. After integration, a new file system called "Extended File System" was implemented which removed two big Minix limitations, maximum volume size and maximum filename length, but some problems remained: there was no support for separate access, I-node and data modification timestamps. This file system used linked lists to keep track of free blocks and I-nodes and thus


resulted in bad performance with aging [30].

Ext3 file system was designed to eliminate enormously long file system recovery times after a crash. Ext3 is a journaling file system [33]. A journaling file system differs from a traditional file system in that it keeps transient data in a new location, independent of the permanent data and metadata on disk. Because of this, such a file system does not dictate that the permanent data has to be stored in any particular way. As such, it is quite possible for the Ext2 on-disk structure, influenced by the layout of the BSD file system, to be used in this file system. The on-disk layout of journaled Ext2 (or Ext3) file system is entirely compatible with the existing Ext2 file system. Ext2 file system design already includes a number of reserved I-node numbers; one among them is used for the file system journal. The features that separate Ext3 from being a valid Ext2 file system are journaling, h-tree indexing, and file system growth while the system is online. Ext4 [34] is the most recent version of the extended file system. This latest release hosts many new features such as a maximum volume size of one Exabyte, backwards compatibility with Ext2 and Ext3, online defragmentation, and nanosecond timestamps. Among the extended file systems, the nanosecond timestamp is unique to Ext4 and allows applications that utilize file creation and modification times to track their timing in nanoseconds rather than seconds.

As there has been a large drift in the on-disk layout of Linux file systems from Extended file system to Extended 2 file system, while later versions support the Ext2 on-disk layout, we will review only the Ext and Ext2 on-disk layouts in detail.

Extended File System is based on concepts derived from the UNIX operating system. In Extended File Systems, every file is represented by an I-node (Index Node) and everything is a file; a directory is a special file that contains a list of entries for the files it contains along with their corresponding I-nodes. When a volume is formatted with Extended File System, 4 data structures are created as shown in figure 5.

Figure 5. Data Structures of Extended File System

a) Data Blocks
Data Blocks immediately follow the I-node list and occupy the rest of the volume. A data block is a set of consecutive sectors which is allocated to a file in its entirety. Data blocks are internally represented by numbers corresponding to their position in the volume. A file may be allocated one or more data blocks, consecutive or fragmented over the volume.

b) I-node List
The I-node list structure immediately follows the Super block. The size of the I-node list depends upon the volume size; it is calculated at initial format and recorded in the Super block. The I-node is the basic building block; every file and directory in the file system is described by one and only one I-node. Each I-node contains the description of the file it represents: file type, access permissions, owner, access times, link count, file size and a table of pointers to data blocks. I-nodes are internally represented by an I-node number given by their position in the I-node list. The numbering begins from 1; I-node 0 does not exist on a newly formatted volume. An I-node with Type=0 and number of links=0 is free; otherwise it represents a file.

The table of pointers to data blocks is an array of entries in which the first 9 direct entries contain the addresses (index numbers) of data blocks containing data of the file, while the next entry, the single indirect entry, contains the address of a data block that contains direct entries for data blocks containing the data of the file. The next entry in the table is a double indirect entry that points to a data block which contains single indirect entries. Similarly, a triple indirect entry in the table points to a data block that contains double indirect entries. These levels of indirection allow the I-node structure to remain small while still allowing large files to be addressed. This scheme is shown in figure 6.

Figure 6. Levels of indirection to address data blocks.
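The pointer table described above can be illustrated with a short sketch that reports which level of indirection serves a given logical block of a file and how many blocks one I-node can address. The 9 direct entries come from the text; the 1024-byte block size, the 4-byte pointer size and the function names are assumptions chosen only for illustration.

```python
DIRECT_ENTRIES = 9  # from the text: the first 9 table entries are direct


def classify_block(logical_block, block_size=1024, pointer_size=4):
    """Return (level, indices) for a logical block number within a file.

    block_size and pointer_size are hypothetical values, not taken from
    the paper; they determine how many pointers fit in one indirect block.
    """
    per_block = block_size // pointer_size
    if logical_block < 0:
        raise ValueError("logical block must be non-negative")
    if logical_block < DIRECT_ENTRIES:
        return ("direct", [logical_block])
    n = logical_block - DIRECT_ENTRIES
    if n < per_block:
        return ("single", [n])
    n -= per_block
    if n < per_block ** 2:
        return ("double", [n // per_block, n % per_block])
    n -= per_block ** 2
    if n < per_block ** 3:
        return ("triple", [n // per_block ** 2,
                           (n // per_block) % per_block,
                           n % per_block])
    raise ValueError("logical block lies beyond the addressable file size")


def max_file_blocks(block_size=1024, pointer_size=4):
    """Number of data blocks addressable through one I-node."""
    per_block = block_size // pointer_size
    return DIRECT_ENTRIES + per_block + per_block ** 2 + per_block ** 3
```

Under these assumed sizes each indirect block holds 256 pointers, so the triple indirect entry dominates the addressable file size while the I-node itself stores only 12 table entries.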

Several block entries in an I-node can be 0, meaning that the corresponding logical blocks contain no data. This happens if no process ever wrote data into the file offsets corresponding to those blocks, and hence the block numbers remain at their initial value 0. This way Extended File System supports Sparse files.

Figure 7. A typical directory file content.

Directories are implemented as special files containing a list of fixed sized entries. Each entry contains an I-node number and the fixed length filename it represents. Any entry that contains 0 as its I-node number but has a valid filename represents a deleted file that existed previously on the volume. The first 2 entries of every directory file are '.' and '..', representing its own I-node number and its parent directory's I-node number respectively. For the root '/' directory both entries have the same value. A typical directory file content is shown in figure 7.

c) Boot Block & Super Block
The Boot block is located at the first sector of the volume and contains the bootstrap code. The Super block immediately follows the Boot block and contains the information that describes the state of a file system. The information contained in the Super block includes:
• Size of the file system,
• Number of free blocks in the file system,
• A list of free blocks in the file system,
• Index of next free block in the free block list,
• Size of I-node list,
• Number of free I-nodes in file system,
• List of free I-nodes in file system,
• Index of next free I-node in free I-node list,
• Lock fields for free block and free I-node list, and
• Flag indicating that Super block has been modified.

Extended file system stores in the Super block the information that is needed to maintain I-nodes and data blocks. When the volume is created, the Super block list of free I-nodes is empty; the kernel searches the I-node list structure for I-nodes whose Type=0 and populates the list to its full capacity, remembering the highest numbered I-node it finds. The next time the kernel searches the disk for free I-nodes, it uses this remembered I-node as its starting I-node. Keeping track of free I-nodes is easy, since they can be located in the I-node list at any time by searching on the type field; the list is used only to avoid searching the I-node list every time an I-node is needed. The free data blocks, by contrast, must be tracked explicitly, because the kernel has no way to tell from the content of a data block whether it is free or allocated. The Super block contains the list of free blocks populated at the time of volume creation. The free data blocks are organized in a linked list fashion. The Super block list holds free block numbers up to its capacity. One entry in the list points to a block that contains another such list, again filled to its capacity. During volume creation, the kernel tries to organize the list in such a manner that block numbers allocated to a file are nearby, but later on no such effort is made. The structure of the metadata about the free data blocks is shown in figure 8.

Figure 8. Free data block management.
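The free data block bookkeeping described above (a list in the Super block in which one entry points to a block holding the next list) can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the link entry is assumed to sit at index 0 of each list, a link value of 0 is assumed to mark the end of the chain, and a Python dict stands in for the disk.

```python
class FreeBlockList:
    """Toy model of a Super block free list chained through free blocks."""

    def __init__(self, superblock_list, disk, capacity):
        # cache[0] is the link entry: the block that stores the next list
        # (assumption: 0 means there is no further list).
        self.cache = list(superblock_list)
        self.disk = disk          # dict: block number -> list stored in it
        self.capacity = capacity  # maximum entries the Super block can hold

    def alloc(self):
        """Hand out one free block number."""
        if not self.cache or self.cache == [0]:
            raise RuntimeError("file system is out of free blocks")
        block = self.cache.pop()
        if not self.cache:
            # 'block' was the link entry: copy the list it stores into the
            # Super block; the link block itself is then handed out.
            self.cache = list(self.disk.pop(block))
        return block

    def free(self, block):
        """Return a block to the free list."""
        if len(self.cache) >= self.capacity:
            # Super block list is full: spill it into the freed block,
            # which becomes the new link entry.
            self.disk[block] = list(self.cache)
            self.cache = [block]
        else:
            self.cache.append(block)
```

For example, with `FreeBlockList([10, 20, 21], {10: [0, 11, 12]}, capacity=3)` the third allocation consumes the link block 10 and refills the Super block list from it, so only one extra block read is needed per `capacity` allocations.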


The Extended 2 file system on-disk layout is strongly influenced by the BSD file system and is similar to that of the Extended file system. Ext2 file system is divided into block groups, which contain a fixed number of blocks, where each block is a fixed number of sectors. Block groups immediately follow the boot sector and are numbered from 0 onwards. Every block group contains a Super block (1 block in size), Group descriptors (n blocks in size), Data block bitmap (1 block in size), I-node bitmap (1 block in size), I-node table (n blocks in size) and data blocks (n blocks in size). The typical structure of Ext2 file system is shown in figure 9.

Figure 9. Ext2 data structures.

Using block groups has 3 advantages:
• Each block group contains a redundant copy of the Super block and block group descriptors that actually define the file system. As such, it is easy to recover if any Super block gets corrupted.
• This arrangement gives good performance by reducing the distance between the I-node table and the data blocks, which reduces head seeks during file I/O.
• It reduces fragmentation by keeping the data blocks belonging to a file in the same block group.

d) Super Block
The Super block of Ext2 contains the following information:
• Magic number which validates whether the block is a Super block or not.
• Revision level which indicates the features it supports.
• Mount count and maximum mount count.
• Block group number that holds this copy of Super block.
• Block size, fixed at volume creation.
• Blocks per group, fixed at volume creation.
• Free blocks, which indicates the number of free blocks.
• Free I-nodes, which indicates the number of free I-nodes.
• First I-node, which indicates the first I-node usable for ordinary files (the root '/' directory uses a reserved I-node).

The Ext2 Super block does not contain the list of free data blocks and I-nodes. This information is maintained individually by the Block bitmap and I-node bitmap of each block group.

e) Block Group Descriptor
The block group descriptor consumes one block and contains the following information:
• The block number of the block allocation bitmap for this block group, used during block allocation and deallocation.
• The block number of the I-node allocation bitmap for this block group, used during I-node allocation and deallocation.
• I-node table, which contains the starting block number of the I-node table for this block group.
• Number of free blocks in group.
• Number of free I-nodes in group.
• Number of directories in group.

Only the first copy of the Super block and group descriptors is updated by Ext2 file system, while the copies in other block groups are left untouched. When a consistency check is executed, the information is copied to the other block groups.

f) Block & I-node Bitmaps
Both of these bitmaps occupy one block each, and the number of blocks they address depends upon the fixed number of blocks per group. In these bitmaps, each bit corresponds to a block (or I-node) of the group and its state indicates whether it is allocated or not.

g) I-node Table
The I-node table is an array of fixed sized I-nodes and occupies many blocks, depending upon the size of an I-node, the total number of I-nodes in a group and the block size, all indicated by the Super block.

The Ext2 I-node is almost the same as the Extended I-node in that it uses multiple levels of indirection, but Ext2 directories contain variable length entries unlike the Ext file system directory. Each directory entry contains the I-node number, name length and name of the file.

Ext3 on-disk data structures are identical to those of an Ext2 file system. As a matter of fact, if an Ext3 file system has been cleanly un-mounted, it can be remounted as an Ext2 file system; conversely, creating a journal on an Ext2 file system and remounting it as Ext3 is a simple and fast operation.
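The block group arithmetic described for Ext2 can be made concrete with a small sketch. Because each bitmap occupies exactly one block, a group can cover at most 8 bits per byte times the block size in blocks; an I-node number can then be mapped to its group and to a position in that group's I-node table. The function names, the sample sizes used below, the assumption that I-node numbering starts at 1, and the least-significant-bit-first bitmap order are illustrative assumptions rather than details given in the text.

```python
def max_blocks_per_group(block_size):
    """A one-block bitmap has 8 * block_size bits, one per block in the group."""
    return 8 * block_size


def locate_inode(ino, inodes_per_group, inode_size, block_size):
    """Map an I-node number to (group, block within I-node table, byte offset).

    Assumes I-node numbers start at 1 and I-nodes are packed back to back
    in each group's I-node table.
    """
    if ino < 1:
        raise ValueError("I-node numbers start at 1")
    group, index = divmod(ino - 1, inodes_per_group)
    byte_offset = index * inode_size
    return group, byte_offset // block_size, byte_offset % block_size


def is_allocated(bitmap, n):
    """Test bit n of a bitmap held as bytes (assumed bit order: LSB first)."""
    return bool(bitmap[n // 8] & (1 << (n % 8)))
```

With an assumed 1024-byte block, a group holds at most 8192 blocks; the same bit test serves both the block bitmap and the I-node bitmap, which is why the Super block only needs to record counts rather than free lists.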


V. Hierarchical File Systems

Macintosh File System (MFS) was introduced around 1983 with the first Mac computer. MFS was optimized to be used on very small and slow media [35]. With the introduction of larger media, the time taken to display the contents of a folder became a concern, as MFS used a single flat file to store all of the file and directory listing information. As such, the system had to do a complete search of this file in order to build a list of files stored in a particular folder.

Hierarchical File System (HFS), also called Mac OS Standard, was introduced in 1985 to mitigate this problem. HFS replaced the flat file of MFS with the Catalog File, which uses a B-tree structure that could be searched very quickly regardless of size. HFS was introduced with the 20 MB hard disk drive and was hard coded into the 128 KB ROM. HFS file system divides the volume into 512-byte sectors and allocates to files allocation blocks, which contain one or more consecutive sectors. HFS contains 5 data structures that make up the volume:
• Boot blocks occupy sectors 0 and 1 of the volume and contain system startup information.
• Master Directory Block (MDB) occupies sector 2 and defines the volume layout and other information like the location and size of other structures. MDB is duplicated at the opposite end of the volume in the second to last sector. This copy is used to recover the volume in case of corruption and is updated only when the size of either the Catalog file or the Extent Overflow file increases.
• Volume Bitmap starts at sector 3 and keeps track of which allocation blocks are free. The size of the Volume Bitmap depends upon the size of the volume.
• Catalog file is a B-tree that contains records for all files and folders which exist on the volume. Files and folders are uniquely identified in the Catalog file by Catalog Node ID (CNID). Each node represents a file or folder and may contain any 2 types of records among the 4 possible types. For a file node, a File Thread Record stores the filename and CNID of its parent directory, and a File Record stores 16 byte attributes used by Finder, timestamps, its CNID, the first 3 extents of the file for both data and resource fork, and a pointer to the first data and resource fork extent records in the Extent Overflow file (in case it has any). For a directory node, a Directory Thread Record stores the name of the directory and CNID of the parent directory, and a Directory Record stores 16 byte attributes used by Finder, timestamps, its CNID and the number of files stored in it.
• Extent Overflow file is a B-tree structured file that contains extra extents pertaining to any file if the initial 3 extents of that file's record in the Catalog file are used up. Later versions allowed bad blocks to be recorded as extents.

An extent is a contiguous range of allocation blocks allocated to some fork, represented by a pair of numbers: the first allocation block number and the number of allocation blocks.

The general on-disk layout of HFS file system is shown in figure 10.

Figure 10. HFS On-Disk Layout.

Under HFS (also in HFS+) files are not monolithic and do not consist of one single element [36]. They may be composed of two or more pieces, called Forks. NTFS also supports this concept by supporting multiple data streams in general and multiple values for the same attribute type identified by names. HFS files have 2 named forks (Data & Resource) and can logically have any number of un-named forks. A Data fork contains the actual data pertaining to the file, like text for a word processor, etc. A Resource fork contains metadata pertaining to the file, like pictures, etc. In other words, the Data fork is used to store the unstructured data while the Resource fork is used to store the structured data. The Resource fork was designed to store metadata that would be used by the GUI. HFS+ supports any arbitrary number of custom named forks in addition to data & resource forks.

As the Catalog file stores all the file and directory records in a single data structure, only one program can write to this structure at a time, forcing other programs to wait in a queue to get their turn. This raises both a performance and a reliability issue. Also, due to the 16 bit pointers used to address allocation blocks, HFS is able to address only 65535 allocation


blocks. This means that the minimum size of an allocation block is 1/65535th of the volume size. Since every non-empty file occupies at least one allocation block, only 65535 files are possible, and large volumes suffer high internal fragmentation.

Hierarchical File System Plus (HFS+), also called Mac OS Extended, was introduced in 1998 to overcome the problems of HFS and has become the primary file system used in Mac computers [37]. HFS+ is an improved version of HFS supporting larger files and volumes by using 32-bit allocation block addresses and Unicode for filenames. It also supports multiple named forks for files, Journaling, inline attribute data records, access control list based file security and compatibility with file permission models on other platforms such as Windows.

Like HFS, HFS+ divides the volume into 512 byte sectors and groups them into allocation blocks (usually 8 sectors each) to be allocated to a file. Allocation blocks are addressed by 32-bit pointers [38]. In an HFS+ volume everything is a part of one or more allocation blocks, with the possible exception of the Alternate Volume Header, unlike HFS where the Boot blocks, Master Directory Block and Volume Bitmap are not part of any allocation block. To reduce file fragmentation, contiguous allocation blocks called Clumps are allocated to files. The number of allocation blocks per Clump is fixed and is specified in the Volume Header. The first 1024 bytes and last 512 bytes of the volume are reserved. The Volume Header is located immediately after the first 1024 bytes and its position is fixed. The Alternate Volume Header, which is a replica of the Volume Header, is located 1024 bytes before the end of the volume and its position is also fixed. The on-disk layout of an HFS+ volume is shown in figure 11.

The Volume Header is the equivalent of the Master Directory Block of HFS. It stores timestamps, the number of files on the volume, the location of other structures on the volume, the size of allocation blocks, the size of clumps, etc. Formatting a volume with HFS+ file system creates 5 special files in addition to the reserved allocation blocks, Volume Header and Alternate Volume Header.

a) Allocation File
The Allocation file keeps track of which allocation blocks are free and which are in use by representing every block by a bit. It is equivalent to the Volume Bitmap of HFS. The main difference between the Volume Bitmap and the Allocation File is that the Allocation file is a regular file which can exist anywhere on the volume, can shrink or grow in size and need not be contiguous, while the Volume Bitmap always resides in a reserved area and its size is fixed. The location of the first extent of the Allocation file is stored in the Volume Header. This architecture of the Allocation file provides flexibility in HFS+ file system not found in HFS.

the volume including the special files and the hierarchy in the volume. It is similar to the Catalog file of HFS. The Catalog file is organized as a B-tree to allow quick and efficient searches through a large hierarchy. This file contains vital information about every file and folder along with the catalog information. The main difference between the records in the HFS and HFS+ Catalog files is that in HFS+ the nodes of the B-tree pertaining to files and folders contain more information and can have varying size, unlike HFS. The location of the first extent of the Catalog file is stored in the Volume Header. The Catalog file contains a Header node, Index nodes, Leaf nodes and, if necessary, Map nodes. Each file or folder in the Catalog file is given a unique Catalog Node ID (CNID). For folders the CNID is called FolderID and for files FileID. Like HFS Catalog nodes, HFS+ Catalog nodes also store a File Record and File Thread Record for files and a Folder Record and Folder Thread Record for folders, in addition to some more information. The main difference between the HFS File Record and Directory Record and the HFS+ File Record and Folder Record is that in HFS the records contain information about the first 3 extents belonging to the file or folder, while in HFS+ it is 8.

c) Extent Overflow File
Special files only have one fork, i.e. the Data fork. The Catalog file does not store any extent for special files; rather, the first 8 extents of special files are stored in the Volume Header. User files can have both data and resource forks and, if necessary, other named forks. The first 8 extents of both data and resource forks for user files are stored in the Catalog file. In both types of files, if there is need for additional extents for the data and resource forks and/or for named forks, the extents are recorded in the Extent Overflow file. It is a B-tree structured file that stores additional extents of the standard forks and the extents of named forks for user files. It does not store any additional data fork extent for itself.

d) Bad Block File
The Bad Block file is used to mark and record the areas of the volume that contain bad blocks. The Extent Overflow file is used to hold information about the Bad Block file extents.

e) Attributes File
An Attributes file is a special file which does not have an entry in the Catalog file. An Attributes file is a complex file. A volume can have no Attributes file, in which case its description in the Volume Header for allocation blocks is 0. The Attributes file is a B-tree structured file where nodes can contain records known as Attributes. An Attributes file can have 3 types of attributes:
• Inline Data Attributes which contain small attributes.

b) Catalog File • Fork Data Attributes which contain references Catalog file describes every file and folder of to a maximum of 8 extents.


• Extended Attributes which contain references to 8 more extents for data attributes.

f) Startup File

The Startup file is a special file used to hold information needed when booting a system that does not have built-in ROM support for HFS Plus. The boot loader can find the location of the Startup file from the Volume Header, which contains the first 8 extents of the Startup file. The Startup file should not have any additional extents for its data fork as this will complicate things for the boot loader.

Figure 11. HFS+ On-Disk Layout.

VI. Discussion And Conclusion

We observed that the on-disk layout of each file system reviewed in this paper was driven by specific design objectives. In case of FAT file systems, the new versions were developed to address the issue of large file size and large volume size support. Similarly, in case of Hierarchical file systems, the augmented versions addressed Unicode support in filenames, relocatable system metadata structures and large file and volume size. In both cases, the actual design remained the same. We also observed, in case of NTFS, that the design was drafted from scratch, which yielded an elegant file system having almost all features which a modern file system should have. Further, in case of Extended file systems, we observed a large drift in on-disk layout from Extended file system to Extended 2 file system to increase performance and reliability. Again, the design of Extended 3 file system, which is mount compatible with Extended 2 file system, is an excellent example of flexibility in the design of Extended 2 file system.

We also observed some similarity in heterogeneous file systems. The concept of treating everything residing on the volume as a file is the basic building block of both NTFS and Hierarchical file systems.

References

1. Grochowski, E. (1998), “Emerging trends in data storage on magnetic hard disk drives”, Datatech, pages 11–16, Sep 1998.
2. Dahlin, M.D. (1996), “The Impact of Trends in Technology on File System Design”, University of California, Berkeley.
3. Gibson, G.A. (1992), “Redundant Disk Arrays: Reliable, Parallel Secondary Storage”, ACM Distinguished Dissertations. MIT Press, Cambridge, Massachusetts.
4. Giampaolo, D., “Practical File System Design with the ”, Be, Inc.
5. Zadok, E., Iyer, R., Joukov, N., Sivathanu, G. and Wright, C.P. (2006), “On Incremental File System Development”, ACM Transactions on Storage (TOS), 2(2):161–196, May 2006.
6. DEC Tape, http://en.wikipedia.org/wiki/DECtape, Accessed on November 2010.
7. http://www.pdp8.net, Accessed on November 2010.
8. http://en.wikipedia.org/wiki/Gary_Kildall, Accessed on November 2010.
9. http://en.wikipedia.org/wiki/PL/M, Accessed on November 2010.
10. http://www.digitalresearch.biz/cpm.htm, Accessed on November 2010.
11. http://en.wikipedia.org/wiki/Tim_Paterson, Accessed on November 2010.
12. http://en.wikipedia.org/wiki/86-DOS, Accessed on November 2010.
13. “The Man Who Could Have Been Bill Gates”, http://www.businessweek.com/magazine/content/04_43/b3905109_mz063.htm, Accessed on November 2010.
14. FAT32 File System Specification, http://microsoft.com/whdc/system/platform/firmware/fatgen.mspx, Accessed in 2009.
15. Bhat, W.A., Quadri, S.M.K., (2010), “Review of FAT Data Structure of FAT32 file system”, Oriental Journal of Computer Science & Technology, Volume 3, No 1.
16. Extended FAT File System, http://msdn.microsoft.com/enus/library/aa914353.aspx, Accessed on November 2010.
17. Kwon, M.S., Bae, S.H., Jung, S.S., Seo, D.Y. and Kim, C.K., (2005), “KFAT: Log-based Transactional FAT File system for Embedded Mobile Systems”, In Proceedings of 2005 US-Korea Conference, ZCTS-142, 2005.
18. Alei, L., Kejia, L., Xiaoyong, L., (2007), “FATTY: A reliable FAT File System”, In Proceedings of the 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools, Pages: 390-395, 2007.
19. Microsoft Corporation, “Transaction-Safe FAT File System”, http://msdn2.microsoft.com/en-us/library/aa911939.aspx, Accessed in 2010.
20. Duncan, R., (1989), “Design goals and implementation of the new High Performance File System”, Microsoft Systems Journal, September 1989 v4 n5 p1 (13).
21. Russinovich, M., Solomon, D.A. and Ionescu, A. (2009), “File Systems”, Windows Internals (5th edition), ISBN 0735625301.
22. Nagar, R., (1997), “Windows NT File System Internals: A Developer's Guide”, O'Reilly. ISBN 9781565922495.
23. NTFS Concepts, http://www.priscilla.com/Courses/ComputerForensics/pdfslides/03-NTFSConcepts., Accessed on November 2010.
24. NTFS Documentation, http://www.scribd.com/doc/2187280/NTFS-Documentation, Accessed on November 2010.
25. http://www.ntfs.com, Accessed on November 2010.
26. Probert, D.B., “Windows Kernel Development”, Microsoft Corporation, http://i-web.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures/08-NTFS/NTFS.ppt, Accessed on November 2010.
27. http://www.minix3.org/, Accessed on November 2010.
28. http://en.wikipedia.org/wiki/MINIX_file_system, Accessed on November 2010.
29. The in Linux, http://www.linux.it/~rubini/docs/vfs/vfs.html, Accessed on November 2010.
30. Bach, M.J. (1986), “The Design of the UNIX Operating System”, Prentice Hall, 1986.
31. Card, R., Ts’o, T. and Tweedie, S., (1994), “Design and Implementation of the Second Extended Filesystem”, In Proceedings of the First Dutch International Symposium on Linux, Amsterdam, Holland, 1994.
32. McKusick, M.K., Joy, W.N., Leffler, S.J. and Fabry, R.S., (1984), “A Fast File System for UNIX”, In ACM Transactions on Computer Systems, 2, No. 3, 1984.
33. Tweedie, S. (1998), “Journaling the Linux ext2fs filesystem”, In LinuxExpo ’98, 1998.
34. Ts’o, T. (2006), “Proposal and plan for ext2/3 future development work”, Linux kernel mailing list, http://lkml.org/lkml/2006/6/28/454
35. HFS, http://en.wikipedia.org/wiki/Hierarchical_File_System, Accessed on November 2010.
36. HFS Plus, http://en.wikipedia.org/wiki/HFS_Plus, Accessed on November 2010.
37. TN1150, “HFS Plus Volume Format”, http://developer.apple.com/library/mac/#technotes/tn/tn1150., Accessed on November 2010.
38. Mac OS X: Mac OS Extended format (HFS Plus) volume and file limits, http://support.apple.com/kb/HT2422, Accessed on November 2010.
39. Overview of FAT, HPFS, and NTFS File Systems, http://support.microsoft.com/kb/100108, Accessed on November 2010.
40. Daily, S. (1996), “NTFS vs. FAT”, Windows NT Magazine, October 1996: 95.
41. NTFS Directories and Files, http://www.pcguide.com/ref/hdd/file/ntfs/files.htm, Accessed on November 2010.
42. Janes, M., “Progression of Linux File Systems”, http://mjanes.public.iastate.edu/Engl314/indiv_doc.pdf
43. Anjoy, R.G., Chakraborty, S.K., (2009), “Feature Based Comparison of Modern File Systems”, http://www.idt.mdh.se/kurser/ct3340/ht09/ADMINISTRATION/IRCSE09_submissions/ircse09_submission_16.pdf
44. Mitchell, S. (1997), “Inside the File System”, O'Reilly. ISBN 156592200X.
45. Tanenbaum, A.S., Woodhull, A.S., (2006), “File Systems”, Operating Systems: Design and Implementation (3rd edition), Prentice Hall. ISBN 0131429388.
46. Pate, S.D. (2003), “UNIX Filesystems: Evolution, Design, and Implementation”, Wiley. ISBN 0471164836.
47. Leffler, S.J. and McKusick, M.K., “The design and implementation of the 4.3BSD UNIX operating System Answer Book”, Addison-Wesley, ISBN 0201546299.
48. Bar, M., “Linux File Systems”, McGraw-Hill, ISBN 0072129557.
49. Bovet, D.P. and Cesati, M. (2005), “Understanding the Linux Kernel”, O'Reilly Media, 3rd edition, 2005. ISBN 0596005652.
50. Silberschatz, A., Galvin, P.B. and Gagne, G., (2004), “Storage Management”, Operating System Concepts, 7th Edition, Wiley. ISBN 0471694665.
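The HFS+ layout rules reviewed in this paper can be illustrated with a small worked example. The Python sketch below is illustrative only and is not taken from any HFS+ implementation. It computes where the Volume Header and Alternate Volume Header sit on a volume of a given size, tests one bit of an allocation bitmap, and maps a file-relative allocation block to a volume allocation block by walking the first 8 extents and then any additional extents. The names `Extent`, `volume_header_offsets`, `block_byte_offset`, `block_in_use` and `locate_block` are hypothetical, and three points are assumptions made for the sketch: 8 sectors per allocation block, most-significant-bit-first ordering inside each bitmap byte, and a flat list standing in for the B-tree of the Extent Overflow file.

```python
from typing import NamedTuple, Sequence, Tuple

SECTOR_SIZE = 512           # bytes per sector, as stated for HFS and HFS+
SECTORS_PER_BLOCK = 8       # "usually 8"; assumed default for this sketch
RESERVED_HEAD = 1024        # first 1024 bytes of the volume are reserved
ALT_HEADER_FROM_END = 1024  # Alternate Volume Header: 1024 bytes before the end
INLINE_EXTENTS = 8          # extents held in a Catalog record or Volume Header


class Extent(NamedTuple):
    """Hypothetical extent descriptor: a run of contiguous allocation blocks."""
    start_block: int
    block_count: int


def volume_header_offsets(volume_size: int) -> Tuple[int, int]:
    """Return byte offsets of the Volume Header and Alternate Volume Header."""
    if volume_size < RESERVED_HEAD + ALT_HEADER_FROM_END:
        raise ValueError("volume too small to hold both headers")
    return RESERVED_HEAD, volume_size - ALT_HEADER_FROM_END


def block_byte_offset(block: int,
                      sectors_per_block: int = SECTORS_PER_BLOCK) -> int:
    """Byte offset of an allocation block, counting from the volume start."""
    return block * sectors_per_block * SECTOR_SIZE


def block_in_use(bitmap: bytes, block: int) -> bool:
    """Test the bit of one allocation block (assumed MSB-first in each byte)."""
    return bool(bitmap[block // 8] & (0x80 >> (block % 8)))


def locate_block(file_block: int,
                 inline: Sequence[Extent],
                 overflow: Sequence[Extent] = ()) -> int:
    """Map a file-relative block number to a volume allocation block."""
    if file_block < 0:
        raise IndexError("file block must be non-negative")
    if len(inline) > INLINE_EXTENTS:
        raise ValueError("only 8 extents fit outside the Extent Overflow file")
    remaining = file_block
    # Walk the first 8 extents, then the (flattened) overflow extents.
    for extent in list(inline) + list(overflow):
        if remaining < extent.block_count:
            return extent.start_block + remaining
        remaining -= extent.block_count
    raise IndexError("file block lies beyond the last extent")


if __name__ == "__main__":
    inline = [Extent(100, 4), Extent(200, 2)]
    overflow = [Extent(300, 5)]
    print(volume_header_offsets(1024 * 1024))
    print(block_byte_offset(2))
    print(block_in_use(bytes([0b10100000]), 2))
    print(locate_block(6, inline, overflow))
```

In the usage lines at the bottom, file block 6 falls past the two inline extents (6 blocks in total), so it is resolved through the first overflow extent. A real implementation would instead search the Extent Overflow B-tree by file ID, fork type and starting block.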
