The extended-2 filesystem overview
Gadi Oxman, tgud@tochnapc2.technion.ac.il
v0.1, August 3 1995
Contents
1 Preface
2 Introduction
3 A filesystem - Why do we need it?
4 The Linux VFS layer
5 About blocks and block groups
6 The view of inodes from the point of view of a blocks group
7 The group descriptors
8 The block bitmap allocation block
9 The inode allocation bitmap
10 On the inode and the inode tables
    10.1 The allocated blocks
    10.2 The i_mode variable
        10.2.1 The rightmost 4 octal digits
        10.2.2 The leftmost two octal digits
    10.3 Time and date
    10.4 i_size
    10.5 User and group id
    10.6 Hard links
    10.7 The Ext2fs extended flags
    10.8 Symbolic links
        10.8.1 Fast symbolic links
        10.8.2 Slow symbolic links
    10.9 i_version
    10.10 Reserved variables
    10.11 Special reserved inodes
11 Directories
12 The superblock
    12.1 superblock identification
    12.2 Filesystem fixed parameters
    12.3 Ext2fs error handling
    12.4 Additional parameters used by e2fsck
    12.5 Additional user tunable parameters
    12.6 Filesystem current state
13 Copyright
14 Acknowledgments
1 Preface
This document attempts to present an overview of the internal structure of the ext2 filesystem. It was written
in summer 95, while I was working on the ext2 filesystem editor project (EXT2ED).
In the process of constructing EXT2ED, I acquired knowledge of the various design aspects of the ext2
filesystem. This document is a result of an effort to document this knowledge.
This is only the initial version of this document. It is obviously neither error-free nor complete, but at least
it provides a starting point.
In the process of learning the subject, I have used the following sources / tools:
* Experimenting with EXT2ED, as it was developed.
* The ext2 kernel sources:
  - The main ext2 include file, /usr/include/linux/ext2_fs.h
  - The contents of the directory /usr/src/linux/fs/ext2.
  - The VFS layer sources (only a bit).
* The slides: The Second Extended File System, Current State, Future Development, by Remy Card.
* The slides: Optimisation in File Systems, by Stephen Tweedie.
* The various ext2 utilities.
2 Introduction
The Second Extended File System (Ext2fs) is very popular among Linux users. If you use Linux, chances
are that you are using the ext2 filesystem.
Ext2fs was designed by Remy Card and Wayne Davison. It was implemented by Remy Card and was further
enhanced by Stephen Tweedie and Theodore Ts'o.
The ext2 filesystem is still under development. I will document here version 0.5a, which is distributed along
with Linux 1.2.x. At this time of writing, the most recent version of Linux is 1.3.13, and the version of the
ext2 kernel source is 0.5b. A lot of fancy enhancements are planned for the ext2 filesystem in Linux 1.3, so
stay tuned.
3 A filesystem - Why do we need it?
I thought that before we dive into the various small details, I'll reserve a few minutes for the discussion of
filesystems from a general point of view.
A filesystem consists of two words - file and system.
Everyone knows the meaning of the word file - A bunch of data put somewhere. Where? This is an important
question. I, for example, usually throw almost everything into a single drawer, and have difficulties finding
something later.
This is where the system comes in - Instead of just throwing the data to the device, we generalize and
construct a system which will virtualize for us a nice and ordered structure in which we could arrange our
data in much the same way as books are arranged in a library. The purpose of the filesystem, as I understand
it, is to make it easy for us to update and maintain our data.
Normally, by mounting filesystems, we just use the nice and logical virtual structure. However, the disk knows
nothing about that - The device driver views the disk as a large continuous paper on which we can write notes
wherever we wish. It is the task of the filesystem management code to store bookkeeping information which
will serve the kernel for showing us the nice and ordered virtual structure.
In this document, we consider one particular administrative structure - The Second Extended File system.
4 The Linux VFS layer
When Linux was first developed, it supported only one filesystem - The Minix filesystem. Today, Linux has
the ability to support several filesystems concurrently. This was done by the introduction of another layer
between the kernel and the filesystem code - The Virtual File System (VFS).
The kernel "speaks" with the VFS layer. The VFS layer passes the kernel's request to the proper filesystem
management code. I haven't learned much of the VFS layer, as I didn't need it for the construction of EXT2ED,
so I can't elaborate on it. Just be aware that it exists.
5 About blocks and block groups
In order to ease management, the ext2 filesystem logically divides the disk into small units called blocks [1]. A
block is the smallest unit which can be allocated. Each block in the filesystem can be allocated or free.
The block size can be selected to be 1024, 2048 or 4096 bytes when creating the filesystem.
Ext2fs groups together a fixed number of sequential blocks into a group block. The resulting situation is
that the filesystem is managed as a series of group blocks. This is done in order to keep related information
physically close on the disk and to ease the management task. As a result, much of the filesystem management
reduces to management of a single blocks group.
[1] The Ext2fs source code refers to the concept of fragments, which I believe are supposed to be sub-block allocations.
As far as I know, fragments are currently unsupported in Ext2fs.
6 The view of inodes from the point of view of a blocks group
Each file in the filesystem is reserved a special inode. I don't want to explain inodes now. Rather, I would
like to treat it as another resource, much like a block - Each blocks group contains a limited number of
inodes, while any specific inode can be allocated or unallocated.
7 The group descriptors
Each blocks group is accompanied by a group descriptor. The group descriptor summarizes some necessary
information about the specific group block. Follows the definition of the group descriptor, as defined in
/usr/include/linux/ext2_fs.h:
struct ext2_group_desc
{
    __u32 bg_block_bitmap;        /* Blocks bitmap block */
    __u32 bg_inode_bitmap;        /* Inodes bitmap block */
    __u32 bg_inode_table;         /* Inodes table block */
    __u16 bg_free_blocks_count;   /* Free blocks count */
    __u16 bg_free_inodes_count;   /* Free inodes count */
    __u16 bg_used_dirs_count;     /* Directories count */
    __u16 bg_pad;
    __u32 bg_reserved[3];
};
The last three variables: bg_free_blocks_count, bg_free_inodes_count and bg_used_dirs_count provide
statistics about the use of the three resources in a blocks group - The blocks, the inodes and the directories.
I believe that they are used by the kernel for balancing the load between the various blocks groups.
bg_block_bitmap contains the block number of the block allocation bitmap block. This is used to
allocate / deallocate each block in the specific blocks group.
bg_inode_bitmap is fully analogous to the previous variable - It contains the block number of the inode
allocation bitmap block, which is used to allocate / deallocate each specific inode in the filesystem.
bg_inode_table contains the block number of the start of the inode table of the current blocks group.
The inode table is just the actual inodes which are reserved for the current blocks group.
The block bitmap block, inode bitmap block and the inode table are created when the filesystem is created.
The group descriptors are placed one after the other. Together they make the group descriptors table.
Each blocks group contains the entire table of group descriptors in its second block, right after the superblock.
However, only the first copy (in group 0) is actually used by the kernel. The other copies are there for backup
purposes and can be of use if the main copy gets corrupted.
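To make the layout concrete, here is a minimal sketch of how the main copy of a group descriptor could be located on the device. It assumes that the table starts in the block right after the one holding the superblock (that block's number is what the superblock variable s_first_data_block holds), and that each descriptor takes 32 bytes, which is the sum of the field sizes in the structure above. The function name group_desc_offset is hypothetical and is not part of the kernel source.

```c
#include <stdint.h>

/* Size of one on-disk group descriptor: 3*4 + 4*2 + 3*4 bytes. */
#define GROUP_DESC_SIZE 32u

/*
 * Byte offset, from the start of the device, of the main-copy descriptor
 * of blocks group `group`. Hypothetical helper for illustration only.
 * Assumption: the descriptors table begins in the block which follows
 * the block containing the superblock.
 */
uint64_t group_desc_offset(uint32_t first_data_block,
                           uint32_t block_size, uint32_t group)
{
    uint64_t table_start = ((uint64_t)first_data_block + 1) * block_size;

    return table_start + (uint64_t)group * GROUP_DESC_SIZE;
}
```

With a block size of 1024 the superblock sits in block 1, so under these assumptions the descriptor of group 0 would be found at byte offset 2048.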
8 The block bitmap allocation block
Each blocks group contains one special block which is actually a map of all the blocks in the group, with
respect to their allocation status. Each bit in the block bitmap indicates whether a specific block in the
group is used or free.
The format is actually quite simple - Just view the entire block as a series of bits. For example,
suppose the block size is 1024 bytes. As such, there is a place for 1024*8=8192 blocks in a group block. This
number is one of the fields in the filesystem's superblock, which will be explained later.
* Block 0 in the blocks group is managed by bit 0 of byte 0 in the bitmap block.
* Block 7 in the blocks group is managed by bit 7 of byte 0 in the bitmap block.
* Block 8 in the blocks group is managed by bit 0 of byte 1 in the bitmap block.
* Block 8191 in the blocks group is managed by bit 7 of byte 1023 in the bitmap block.
A value of "1" in the appropriate bit signals that the block is allocated, while a value of "0" signals that the
block is unallocated.
You will probably notice that typically, all the bits in a byte contain the same value, making the byte's value
0 or 0ffh. This is done by the kernel on purpose in order to group related data in physically close blocks,
since the physical device is usually optimized to handle such a close relationship.
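The bit numbering above can be sketched in a few lines of C. This is only an illustration of the mapping between a block number inside the group and its bit in the bitmap block; the function names are hypothetical and the bitmap is assumed to be already read into memory.

```c
#include <stdint.h>

/*
 * Returns 1 if block `n` of the blocks group is marked as allocated in
 * `bitmap`, 0 otherwise. Block n is managed by bit (n % 8) of byte (n / 8).
 */
int block_is_allocated(const uint8_t *bitmap, uint32_t n)
{
    return (bitmap[n / 8] >> (n % 8)) & 1;
}

/* Number of blocks which a single bitmap block is able to describe. */
uint32_t blocks_covered(uint32_t block_size)
{
    return block_size * 8;
}
```

For a 1024 byte block, blocks_covered gives 8192, and block 8191 ends up in bit 7 of byte 1023, as in the list above.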
9 The inode allocation bitmap
The format of the inode allocation bitmap block is exactly like the format of the block allocation bitmap
block. The explanation above is valid here, with the word block replaced by inode. Typically, there are
far fewer inodes than blocks in a blocks group and thus only part of the inode bitmap block is used. The
number of inodes in a blocks group is another variable which is listed in the superblock.
10 On the inode and the inode tables
An inode is a main resource in the ext2 filesystem. It is used for various purposes, but the main two are:
* Support of files
* Support of directories
Each file, for example, will allocate one inode from the filesystem resources.
An ext2 filesystem has a total number of available inodes which is determined while creating the filesystem.
When all the inodes are used, for example, you will not be able to create an additional file even though there
will still be free blocks on the filesystem.
Each inode takes up 128 bytes in the filesystem. By default, mke2fs reserves an inode for each 4096 bytes
of the filesystem space.
The inodes are placed in several tables, each of which contains the same number of inodes and is placed at
a different blocks group. The goal is to place inodes and their related files in the same blocks group because
of locality arguments.
The number of inodes in a blocks group is available in the superblock variable s_inodes_per_group. For
example, if there are 2000 inodes per group, group 0 will contain the inodes 1-2000, group 1 will contain the
inodes 2001-4000, and so on.
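The numbering rule can be written down as a small calculation. The sketch below assumes that inode numbers start at 1 and that groups are counted from 0, so that with 2000 inodes per group the inodes 2001-4000 land in the group whose index is 1. The function names are hypothetical; the 128 byte inode size is the one mentioned above.

```c
#include <stdint.h>

/* Index of the blocks group whose inode table holds inode `ino`. */
uint32_t inode_group(uint32_t ino, uint32_t inodes_per_group)
{
    /* Inode numbers start at 1, so subtract 1 before dividing. */
    return (ino - 1) / inodes_per_group;
}

/* Position of inode `ino` inside the inode table of its group. */
uint32_t inode_index_in_group(uint32_t ino, uint32_t inodes_per_group)
{
    return (ino - 1) % inodes_per_group;
}

/* Byte offset of the inode inside that table (128 bytes per inode). */
uint32_t inode_table_offset(uint32_t ino, uint32_t inodes_per_group)
{
    return inode_index_in_group(ino, inodes_per_group) * 128u;
}
```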
Each inode table is accessed from the group descriptor of the specific blocks group which contains the table.
Follows the structure of an inode in Ext2fs:
struct ext2_inode {
    __u16 i_mode;                   /* File mode */
    __u16 i_uid;                    /* Owner Uid */
    __u32 i_size;                   /* Size in bytes */
    __u32 i_atime;                  /* Access time */
    __u32 i_ctime;                  /* Creation time */
    __u32 i_mtime;                  /* Modification time */
    __u32 i_dtime;                  /* Deletion Time */
    __u16 i_gid;                    /* Group Id */
    __u16 i_links_count;            /* Links count */
    __u32 i_blocks;                 /* Blocks count */
    __u32 i_flags;                  /* File flags */
    union {
        struct {
            __u32 l_i_reserved1;
        } linux1;
        struct {
            __u32 h_i_translator;
        } hurd1;
        struct {
            __u32 m_i_reserved1;
        } masix1;
    } osd1;                         /* OS dependent 1 */
    __u32 i_block[EXT2_N_BLOCKS];   /* Pointers to blocks */
    __u32 i_version;                /* File version (for NFS) */
    __u32 i_file_acl;               /* File ACL */
    __u32 i_dir_acl;                /* Directory ACL */
    __u32 i_faddr;                  /* Fragment address */
    union {
        struct {
            __u8 l_i_frag;          /* Fragment number */
            __u8 l_i_fsize;         /* Fragment size */
            __u16 i_pad1;
            __u32 l_i_reserved2[2];
        } linux2;
        struct {
            __u8 h_i_frag;          /* Fragment number */
            __u8 h_i_fsize;         /* Fragment size */
            __u16 h_i_mode_high;
            __u16 h_i_uid_high;
            __u16 h_i_gid_high;
            __u32 h_i_author;
        } hurd2;
        struct {
            __u8 m_i_frag;          /* Fragment number */
            __u8 m_i_fsize;         /* Fragment size */
            __u16 m_pad1;
            __u32 m_i_reserved2[2];
        } masix2;
    } osd2;                         /* OS dependent 2 */
};
10.1 The allocated blocks
The basic functionality of an inode is to group together a series of allocated blocks. There is no limitation on
the allocated blocks - Any block can be allocated to any inode. Nevertheless, block allocation will usually
be done in series to take advantage of the locality principle.
The inode is not always used in that way. I will now explain the allocation of blocks, assuming that the
current inode type indeed refers to a list of allocated blocks.
It was found experimentally that many of the files in the filesystem are actually quite small. To take advantage
of this effect, the kernel provides storage of up to 12 block numbers in the inode itself. Those blocks are
called direct blocks. The advantage is that once the kernel has the inode, it can directly access the file's
blocks, without an additional disk access. Those 12 blocks are directly specified in the variables i_block[0]
to i_block[11].
i_block[12] is the indirect block - The block pointed to by i_block[12] will not be a data block. Rather, it
will just contain a list of block numbers of data blocks. For example, if the block size is 1024 bytes, since each
block number is 4 bytes long, there will be place for 256 block numbers. That is, block 13 till block 268 in the
file will be accessed by the indirect block method. The penalty in this case, compared to the direct blocks case, is
that an additional access to the device is needed - We need two accesses to reach the required data block.
In much the same way, i_block[13] is the double indirect block and i_block[14] is the triple
indirect block.
i_block[13] points to a block which contains pointers to indirect blocks. Each one of them is handled in
the way described above.
In much the same way, the triple indirect block is just an additional level of indirection - It will point to a
list of double indirect blocks.
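As a rough illustration of the scheme above, the following sketch computes how many levels of indirection are needed to reach a given block of a file. Note that it counts the file's blocks from 0, while the text above counts them from 1, so "block 13 till block 268" corresponds to the indexes 12 till 267 here. The function name and the return convention are hypothetical and are not taken from the kernel source.

```c
#include <stdint.h>

#define N_DIRECT 12u   /* i_block[0] to i_block[11] */

/*
 * Number of indirection levels needed to reach logical block `lblk`
 * (0-based) of a file: 0 = direct, 1 = indirect, 2 = double indirect,
 * 3 = triple indirect, -1 = beyond what a single inode can address.
 */
int indirection_level(uint64_t lblk, uint32_t block_size)
{
    uint64_t per_block = block_size / 4;   /* block numbers per block */

    if (lblk < N_DIRECT)
        return 0;
    lblk -= N_DIRECT;
    if (lblk < per_block)
        return 1;
    lblk -= per_block;
    if (lblk < per_block * per_block)
        return 2;
    lblk -= per_block * per_block;
    if (lblk < per_block * per_block * per_block)
        return 3;
    return -1;
}
```

Each additional level costs one more access to the device before the data block itself can be read.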
10.2 The i_mode variable
The i_mode variable is used to determine the inode type and the associated permissions. It is best
described by representing it as an octal number. Since it is a 16 bit variable, there will be 6 octal digits.
Those are divided into two parts - The rightmost 4 digits and the leftmost 2 digits.
10.2.1 The rightmost 4 octal digits
The rightmost 4 digits are bit options - Each bit has its own purpose.
The last 3 digits (octal digits 0, 1 and 2) are just the usual permissions, in the known form rwxrwxrwx. Digit
2 refers to the user, digit 1 to the group and digit 0 to everyone else. They are used by the kernel to grant
or deny access to the object presented by this inode [2].
Bit number 9 signals that the file (I'll refer to the object presented by the inode as file even though it can be
a special device, for example) is set VTX. I still don't know what is the meaning of "VTX".
Bit number 10 signals that the file is set group id - I don't know exactly the meaning of the above either.
Bit number 11 signals that the file is set user id, which means that the file will run with the effective user
id of the file's owner (root, for a file owned by root).
[2] A smarter permissions control is one of the enhancements planned for Linux 1.3 - The ACL (Access Control
Lists). Actually, from browsing of the kernel source, some of the ACL handling is already done.
10.2.2 The leftmost two octal digits
Note that the leftmost octal digit can only be 0 or 1, since the total number of bits is 16.
Those digits, as opposed to the rightmost 4 digits, are not bit mapped options. They determine the type of
the "file" to which the inode belongs:
* 01 - The file is a FIFO.
* 02 - The file is a character device.
* 04 - The file is a directory.
* 06 - The file is a block device.
* 10 - The file is a regular file.
* 12 - The file is a symbolic link.
* 14 - The file is a socket.
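The split of i_mode into a type part and a permissions part can be sketched as follows. The two leftmost octal digits are the upper 4 bits of the 16 bit value, and the three rightmost octal digits are the lower 9 bits. The function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* File type, taken from the two leftmost octal digits (bits 12-15). */
const char *mode_type_name(uint16_t mode)
{
    switch (mode >> 12) {
    case 001: return "FIFO";
    case 002: return "character device";
    case 004: return "directory";
    case 006: return "block device";
    case 010: return "regular file";
    case 012: return "symbolic link";
    case 014: return "socket";
    default:  return "unknown";
    }
}

/* Fills out[10] with the rwxrwxrwx form of the three rightmost digits. */
void mode_perm_string(uint16_t mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++)
        out[i] = (mode & (1u << (8 - i))) ? letters[i] : '-';
    out[9] = '\0';
}
```

For example, an i_mode of 0100644 (octal) describes a regular file with the permissions rw-r--r--.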
10.3 Time and date
Linux records the last time in which various operations occurred with the file. The time and date are saved in
the standard C library format - The number of seconds which passed since 00:00:00 GMT, January 1, 1970.
The following times are recorded:
* i_ctime - The time in which the inode was last allocated. In other words, the time in which the file
  was created.
* i_mtime - The time in which the file was last modified.
* i_atime - The time in which the file was last accessed.
* i_dtime - The time in which the inode was deallocated. In other words, the time in which the file was
  deleted.
10.4 i_size
i_size contains information about the size of the object presented by the inode. If the inode corresponds
to a regular file, this is just the size of the file in bytes. In other cases, the interpretation of the variable is
different.
10.5 User and group id
The user and group id of the file are just saved in the variables i_uid and i_gid.
10.6 Hard links
Later, when we'll discuss the implementation of directories, it will be explained that each directory entry
points to an inode. It is quite possible that a single inode will be pointed to from several directories. In
that case, we say that there exist hard links to the file - The file can be accessed from each of the directories.
The kernel keeps track of the number of hard links in the variable i_links_count. The variable is set to "1"
when first allocating the inode, and is incremented with each additional link. Deletion of a file will delete
the current directory entry and will decrement the number of links. Only when this number reaches zero, the
inode will be actually deallocated.
The name hard link is used to distinguish between the alias method described above and another alias
method called symbolic linking, which will be described later.
10.7 The Ext2fs extended flags
The ext2 filesystem associates additional flags with an inode. The extended attributes are stored in the
variable i_flags. i_flags is a 32 bit variable. Only the 7 rightmost bits are defined. Of them, only 5
bits are used in version 0.5a of the filesystem. Specifically, the undelete and the compress features are not
implemented, and are to be introduced in Linux 1.3 development.
The currently available flags are:
* bit 0 - Secure deletion.
  When this bit is on, the file's blocks are zeroed when the file is deleted. With this bit off, they will just
  be left with their original data when the inode is deallocated.
* bit 1 - Undelete.
  This bit is not supported yet. It will be used to provide an undelete feature in future Ext2fs developments.
* bit 2 - Compress file.
  This bit is also not supported. The plan is to offer "compression on the fly" in future releases.
* bit 3 - Synchronous updates.
  With this bit on, the meta-data will be written synchronously to the disk, as if the filesystem was
  mounted with the "sync" mount option.
* bit 4 - Immutable file.
  When this bit is on, the file will stay as it is - Can not be changed, deleted, renamed, no hard links,
  etc, before the bit is cleared.
* bit 5 - Append only file.
  With this option active, data will only be appended to the file.
* bit 6 - Do not dump this file.
  I think that this bit is used by the port of dump to linux (ported by Remy Card) to check if the file
  should not be dumped.
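The bit positions above translate directly into masks. The sketch below uses hypothetical FL_ names chosen for this document rather than the names from the kernel header, and shows how the immutable and append only bits, as described above, would affect a write request. It is an illustration of the described semantics, not the kernel's actual check.

```c
#include <stdint.h>

/* Illustrative names for the 7 defined bits of i_flags. */
enum {
    FL_SECRM     = 1 << 0,   /* Secure deletion */
    FL_UNRM      = 1 << 1,   /* Undelete (not supported yet) */
    FL_COMPR     = 1 << 2,   /* Compress file (not supported yet) */
    FL_SYNC      = 1 << 3,   /* Synchronous updates */
    FL_IMMUTABLE = 1 << 4,   /* Immutable file */
    FL_APPEND    = 1 << 5,   /* Append only file */
    FL_NODUMP    = 1 << 6    /* Do not dump this file */
};

/*
 * Returns 1 if a write is acceptable with respect to the flags alone:
 * never on an immutable file, and only appends on an append only file.
 */
int may_write(uint32_t i_flags, int is_append)
{
    if (i_flags & FL_IMMUTABLE)
        return 0;
    if ((i_flags & FL_APPEND) && !is_append)
        return 0;
    return 1;
}
```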
10.8 Symbolic links
The hard links presented above are just additional pointers to the same inode. The important aspect is that
the inode number is fixed when the link is created. This means that the implementation details of the
filesystem are visible to the user - In a pure abstract usage of the filesystem, the user should not care about
inodes.
The above causes several limitations:
* Hard links can be done only in the same filesystem. This is obvious, since a hard link is just an inode
  number in some directory entry, and the above elements are filesystem specific.
* You can not "replace" the file which is pointed to by the hard link after the link creation. "Replacing"
  the file in one directory will still leave the original file in the other directory - The "replacement" will not
  deallocate the original inode, but rather allocate another inode for the new version, and the directory
  entry at the other place will just point to the old inode number.
A symbolic link, on the other hand, is analyzed at run time. A symbolic link is just a pathname which is
accessible from an inode. As such, it "speaks" in the language of the abstract filesystem. When the kernel
reaches a symbolic link, it will follow it at run time using its normal way of reaching directories.
As such, a symbolic link can be made across different filesystems, and a replacement of a file with a new
version will automatically be active on all its symbolic links.
The disadvantage is that a hard link doesn't consume space except for a small directory entry. A symbolic link,
on the other hand, consumes at least an inode, and can also consume one block.
When the inode is identified as a symbolic link, the kernel needs to find the path to which it points.
10.8.1 Fast symbolic links
When the pathname is short enough to fit in the 60 bytes occupied by the variables i_block[0] to i_block[14],
it can be saved directly in the inode, in those variables, since they are not needed for block numbers in that
case. This is called a fast symbolic link. It is fast
because the pathname resolution can be done using the inode itself, without accessing additional blocks. It is
also economical, since it allocates only an inode. The length of the pathname is stored in the i_size variable.
10.8.2 Slow symbolic links
For longer pathnames, an additional block is allocated by the use of i_block[0] and the pathname is stored
in it. It is called slow because the kernel needs to read an additional block to resolve the pathname. The length
is again saved in i_size.
10.9 i_version
i_version is used with regard to Network File System. I don't know its exact use.
10.10 Reserved variables
As far as I know, the variables which are connected to ACL and fragments are not currently used. They will
be supported in future versions.
Ext2fs is being ported to other operating systems. As far as I know, at least in linux, the os dependent
variables are also not used.
10.11 Special reserved inodes
The first ten inodes on the filesystem are special inodes:
* Inode 1 is the bad blocks inode - I believe that its data blocks contain a list of the bad blocks in the
  filesystem, which should not be allocated.
* Inode 2 is the root inode - The inode of the root directory. It is the starting point for reaching a
  known path in the filesystem.
* Inode 3 is the acl index inode. Access control lists are currently not supported by the ext2 filesystem,
  so I believe this inode is not used.
* Inode 4 is the acl data inode. Of course, the above applies here too.
* Inode 5 is the boot loader inode. I don't know its usage.
* Inode 6 is the undelete directory inode. It is also a foundation for future enhancements, and is
  currently not used.
* Inodes 7-10 are reserved and currently not used.
11 Directories
A directory is implemented in the same way as files are implemented (with the direct blocks, indirect blocks,
etc) - It is just a file which is formatted with a special format - A list of directory entries.
Follows the definition of a directory entry:
struct ext2_dir_entry {
    __u32 inode;                /* Inode number */
    __u16 rec_len;              /* Directory entry length */
    __u16 name_len;             /* Name length */
    char name[EXT2_NAME_LEN];   /* File name */
};
Ext2fs supports file names of varying lengths, up to 255 bytes. The name field above just contains the file
name. Note that it is not zero terminated; Instead, the variable name_len contains the length of the file
name.
The variable rec_len is provided because the directory entries are padded with zeroes so that the next entry
will be at an offset which is a multiple of 4. The resulting directory entry size is stored in rec_len. If
the directory entry is the last in the block, it is padded with zeroes till the end of the block, and rec_len is
updated accordingly.
The inode variable points to the inode of the above file.
Deletion of directory entries is done by appending the deleted entry space to the previous (or next, I am
not sure) entry.
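To show how rec_len chains the entries together, here is a minimal sketch which searches a single directory block, already read into memory, for a name. It rests on a few assumptions which are not stated above: the fields are stored in little-endian byte order, the fixed part of an entry is 8 bytes (inode, rec_len, name_len), and an entry whose inode number is 0 is unused. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint32_t le32(const uint8_t *p)
{
    return p[0] | (p[1] << 8) | (p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Smallest rec_len for a name: 8 byte header + name, rounded up to 4. */
uint16_t min_rec_len(uint16_t name_len)
{
    return (uint16_t)((8u + name_len + 3u) & ~3u);
}

/* Returns the inode number of `name` in the block, or 0 if not found. */
uint32_t dir_block_lookup(const uint8_t *block, uint32_t block_size,
                          const char *name)
{
    size_t want = strlen(name);
    uint32_t pos = 0;

    while (pos + 8 <= block_size) {
        uint32_t ino = le32(block + pos);
        uint16_t rec_len = le16(block + pos + 4);
        uint16_t name_len = le16(block + pos + 6);

        if (rec_len < 8 || rec_len > block_size - pos)
            break;                          /* corrupted entry, stop */
        if (ino != 0 && name_len == want && 8u + name_len <= rec_len &&
            memcmp(block + pos + 8, name, want) == 0)
            return ino;
        pos += rec_len;                     /* jump to the next entry */
    }
    return 0;
}
```

Note that the comparison uses name_len and memcmp rather than strcmp, since the name is not zero terminated.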
12 The superblock
The superblock is a block which contains information which describes the state of the internal filesystem.
The superblock is located at the fixed offset 1024 in the device. Its length is 1024 bytes also.
The superblock, like the group descriptors, is copied on each blocks group boundary for backup purposes.
However, only the main copy is used by the kernel.
The superblock contains three types of information:
* Filesystem parameters which are fixed and which were determined when this specific filesystem was
  created. Some of those parameters can be different in different installations of the ext2 filesystem, but
  can not be changed once the filesystem was created.
* Filesystem parameters which are tunable - Can always be changed.
* Information about the current filesystem state.
Follows the superblock definition:
struct ext2_super_block {
    __u32 s_inodes_count;        /* Inodes count */
    __u32 s_blocks_count;        /* Blocks count */
    __u32 s_r_blocks_count;      /* Reserved blocks count */
    __u32 s_free_blocks_count;   /* Free blocks count */
    __u32 s_free_inodes_count;   /* Free inodes count */
    __u32 s_first_data_block;    /* First Data Block */
    __u32 s_log_block_size;      /* Block size */
    __s32 s_log_frag_size;       /* Fragment size */
    __u32 s_blocks_per_group;    /* Blocks per group */
    __u32 s_frags_per_group;     /* Fragments per group */
    __u32 s_inodes_per_group;    /* Inodes per group */
    __u32 s_mtime;               /* Mount time */
    __u32 s_wtime;               /* Write time */
    __u16 s_mnt_count;           /* Mount count */
    __s16 s_max_mnt_count;       /* Maximal mount count */
    __u16 s_magic;               /* Magic signature */
    __u16 s_state;               /* File system state */
    __u16 s_errors;              /* Behaviour when detecting errors */
    __u16 s_pad;
    __u32 s_lastcheck;           /* time of last check */
    __u32 s_checkinterval;       /* max. time between checks */
    __u32 s_creator_os;          /* OS */
    __u32 s_rev_level;           /* Revision level */
    __u16 s_def_resuid;          /* Default uid for reserved blocks */
    __u16 s_def_resgid;          /* Default gid for reserved blocks */
    __u32 s_reserved[235];       /* Padding to the end of the block */
};
12.1 superblock identification
The ext2 filesystem's superblock is identified by the s_magic field. The current ext2 magic number is 0xEF53.
I presume that "EF" means "Extended File system". In versions of the ext2 filesystem prior to 0.2B, the magic
number was 0xEF51. Those filesystems are not compatible with the current versions; Specifically, the group
descriptors definition is different. I doubt if there still exists such an installation.
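A check of the magic number could look like the sketch below. It assumes that the 1024 bytes of the superblock were already read from offset 1024 of the device into a buffer, and that the field is stored in little-endian byte order. The offset 56 is simply counted from the structure above (13 variables of 4 bytes and 2 variables of 2 bytes come before s_magic). The names used here are hypothetical and are not the ones from the kernel header.

```c
#include <stdint.h>

#define SB_MAGIC_CURRENT 0xEF53u   /* current ext2 magic number */
#define SB_MAGIC_OLD     0xEF51u   /* versions prior to 0.2B */
#define SB_MAGIC_OFFSET  56u       /* 13 * 4 + 2 * 2 bytes before s_magic */

/*
 * `sb` points to the 1024 bytes of the superblock.
 * Returns 1 if it carries the current ext2 magic number, 0 otherwise.
 */
int superblock_has_ext2_magic(const uint8_t *sb)
{
    uint16_t magic = (uint16_t)(sb[SB_MAGIC_OFFSET] |
                                (sb[SB_MAGIC_OFFSET + 1] << 8));

    return magic == SB_MAGIC_CURRENT;
}
```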
12.2 Filesystem fixed parameters
By using the word fixed, I mean fixed with respect to a particular installation. Those variables are usually
not fixed with respect to different installations.
The block size is determined by using the s_log_block_size variable. The block size is
1024*pow(2,s_log_block_size) and should be between 1024 and 4096. The available options are 1024, 2048 and 4096.
s_inodes_count contains the total number of available inodes.
s_blocks_count contains the total number of available blocks.
s_first_data_block specifies in which of the device blocks the superblock is present. The superblock is
always present at the fixed offset 1024, but the device block numbering can differ. For example, if the block
size is 1024, the superblock will be at block 1 with respect to the device. However, if the block size is 4096,
offset 1024 is included in block 0 of the device, and in that case s_first_data_block will contain 0. At
least this is how I understood this variable.
s_blocks_per_group contains the number of blocks which are grouped together as a blocks group.
s_inodes_per_group contains the number of inodes available in a group block. I think that this is always the
total number of inodes divided by the number of blocks groups.
s_creator_os contains a code number which specifies the operating system which created this specific filesystem:
* Linux is specified by the value 0.
* Hurd is specified by the value 1.
* Masix is specified by the value 2.
s_rev_level contains the major version of the ext2 filesystem. Currently this is always 0, as the most recent
version is 0.5B. It will probably take some time until we reach version 1.0.
As far as I know, fragments (sub-block allocations) are currently not supported and hence a block is equal
to a fragment. As a result, s_log_frag_size and s_frags_per_group are always equal to s_log_block_size
and s_blocks_per_group, respectively.
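The block size rule and the s_first_data_block example from this section can be checked with a short calculation. This is a sketch under the understanding described above, with hypothetical function names; multiplying 1024 by a power of two is written as a shift.

```c
#include <stdint.h>

/* Block size in bytes: 1024 * 2^s_log_block_size. */
uint32_t block_size_from_log(uint32_t s_log_block_size)
{
    return 1024u << s_log_block_size;
}

/* Device block which contains byte offset 1024, i.e. the superblock. */
uint32_t superblock_block(uint32_t block_size)
{
    return 1024u / block_size;
}
```

With a 1024 byte block the superblock is in block 1, while for 2048 and 4096 byte blocks offset 1024 falls inside block 0.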
12.3 Ext2fs error handling
The ext2 filesystem error handling is based on the following philosophy:
1. Identification of problems is done by the kernel code.
2. The correction task is left to an external utility, such as e2fsck by Theodore Ts'o for automatic
   analysis and correction, or perhaps debugfs by Theodore Ts'o and EXT2ED by myself, for hand
   analysis and correction.
The s_state variable is used by the kernel to pass the identification result to third party utilities:
* bit 0 of s_state is reset when the partition is mounted and set when the partition is unmounted. Thus,
  a value of 0 on an unmounted filesystem means that the filesystem was not unmounted properly - The
  filesystem is not "clean" and probably contains errors.
* bit 1 of s_state is set by the kernel when it detects an error in the filesystem. A value of 0 doesn't
  mean that there isn't an error in the filesystem, just that the kernel didn't find any.
The kernel behavior when an error is found is determined by the user tunable parameter s_errors:
* The kernel will ignore the error and continue if s_errors=1.
* The kernel will remount the filesystem in read-only mode if s_errors=2.
* A kernel panic will be issued if s_errors=3.
The default behavior is to ignore the error.
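The two s_state bits and the three s_errors values described above can be decoded as in the following sketch. It is meant for looking at an unmounted filesystem, where bit 0 is expected to be set. The function names and the returned strings are hypothetical.

```c
#include <stdint.h>

/*
 * Returns 1 if s_state describes a filesystem which was unmounted
 * properly (bit 0 set) and on which the kernel flagged no error
 * (bit 1 clear), 0 otherwise.
 */
int state_is_clean(uint16_t s_state)
{
    return (s_state & 0x1) && !(s_state & 0x2);
}

/* Describes the kernel behavior selected by s_errors. */
const char *errors_policy(uint16_t s_errors)
{
    switch (s_errors) {
    case 1:  return "continue";
    case 2:  return "remount read-only";
    case 3:  return "panic";
    default: return "unknown";
    }
}
```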
12.4 Additional parameters used by e2fsck
Of course, e2fsck will check the filesystem if errors were detected or if the filesystem is not clean.
In addition, each time the filesystem is mounted, s_mnt_count is incremented. When s_mnt_count reaches
s_max_mnt_count, e2fsck will force a check on the filesystem even though it may be clean. It will then zero
s_mnt_count. s_max_mnt_count is a tunable parameter.
E2fsck also records the last time in which the filesystem was checked in the s_lastcheck variable. The
user tunable parameter s_checkinterval will contain the number of seconds which are allowed to pass since
s_lastcheck until a check is forced again. A value of 0 disables the time-based check.
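The two conditions above can be combined into a single decision, as in the sketch below. It is not the actual e2fsck code. Two details are assumptions of mine: a non-positive s_max_mnt_count is treated as "no mount count limit", and the time-based check triggers once the elapsed time is equal to or larger than s_checkinterval. The function name fsck_forced is hypothetical.

```c
#include <stdint.h>

/*
 * Returns 1 if a check should be forced on a clean filesystem, based on
 * the mount count and on the time which passed since the last check.
 * `now` is the current time, in seconds since January 1, 1970.
 */
int fsck_forced(uint16_t mnt_count, int16_t max_mnt_count,
                uint32_t lastcheck, uint32_t checkinterval, uint32_t now)
{
    /* Assumption: a non-positive maximum disables this condition. */
    if (max_mnt_count > 0 && mnt_count >= max_mnt_count)
        return 1;
    /* A checkinterval of 0 disables the time-based check. */
    if (checkinterval != 0 && now > lastcheck &&
        now - lastcheck >= checkinterval)
        return 1;
    return 0;
}
```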
12.5 Additional user tunable parameters
s_r_blocks_count contains the number of disk blocks which are reserved for root, the user whose id number
is s_def_resuid and the group whose id number is s_def_resgid. The kernel will refuse to allocate those
last s_r_blocks_count blocks if the user is not one of the above. This is done so that the filesystem will usually not
be 100% full, since 100% full filesystems can affect various aspects of operation.
s_def_resuid and s_def_resgid contain the id of the user and of the group who can use the reserved blocks
in addition to root.
12.6 Filesystem current state
s_free_blocks_count contains the current number of free blocks in the filesystem.
s_free_inodes_count contains the current number of free inodes in the filesystem.
s_mtime contains the time at which the system was last mounted.
s_wtime contains the last time at which something was changed in the filesystem.
13 Copyright
This document contains source code which was taken from the Linux ext2 kernel source code, mainly from
/usr/include/linux/ext2_fs.h. Follows the original copyright:
/*
* linux/include/linux/ext2_fs.h
*
* Copyright (C) 1992, 1993, 1994, 1995
* Remy Card [email protected]
* Laboratoire MASI - Institut Blaise Pascal
* Universite Pierre et Marie Curie Paris VI
*
* from
*
* linux/include/linux/minix_fs.h
*
* Copyright (C) 1991, 1992 Linus Torvalds
*/
14 Acknowledgments
I would like to thank the following people, who were involved in the design and implementation of the ext2
filesystem kernel code and support utilities:
* Remy Card
  Who designed, implemented and maintains the ext2 filesystem kernel code, and some of the ext2 utilities.
  Remy Card is also the author of several helpful slides concerning the ext2 filesystem. Specifically, he is
  the author of File Management in the Linux Kernel and of The Second Extended File System
  - Current State, Future Development.
* Wayne Davison
  Who designed the ext2 filesystem.
* Stephen Tweedie
  Who helped in designing the ext2 filesystem kernel code and wrote the slides Optimizations in File
  Systems.
* Theodore Ts'o
  Who is the author of several ext2 utilities and of the ext2 library libext2fs (which I didn't use, simply
  because I didn't know it existed when I started to work on my project).
Lastly, I would like to thank, of course, Linus Torvalds and the Linux community for providing all of us
with such a great operating system.
Please contact me in case of an error report, suggestions, or just about anything concerning this document.
Enjoy,
Gadi Oxman
Haifa, August 95