The extende d-2 le system overview

Gadi Oxman, tgud@tochnap c2.technion.ac.il v0.1, August 3 1995

Contents

1 Pref ace 2

2 Intro duction 3

3 A le system - Whydowenee d it ? 3

4 The VFS layer 3

5 Aboutblo cks andblo ck group s 4

6 The view of ino de s f rom the p oint of view of a blo cks group 4

7 The group de scr iptors 4

8 Theblo ck bitmap allo cation blo ck 5

9 The ino de allo cation bitmap 6

10 On the ino deandthe ino detable s 6

10.1 The allo cate d blo cks : :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 7

10.2 Thei mo devar iable : :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 8

10.2.1 The r ightmost 4 o ctal digits :::: ::: :::: ::: :::: ::: :::: ::: :::: : 8

10.2.2 The leftmost twooctal digits ::: ::: :::: ::: :::: ::: :::: ::: :::: : 8

10.3 Timeanddate : ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 9

s ize ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 9 10.4 i

10.5 Us er and group id :: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 9

10.6 Hard links :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 10

10.7 The Ext2fs extende d ags :: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 10

10.8 Symb olic links :: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 11

1. Pref ace 2

10.8.1 Fast symb olic links :: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 11

10.8.2 Slow symb olic links :: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 11

10.9 i vers ion : :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 11

10.10Re s erved var iable s :: :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 12

10.11Sp ecial re s erve d ino des :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 12

11 Director ie s 12

12 The sup erblo ck 13

12.1 sup erblo ckidenti cation ::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 14

12.2 File system xe d parameters : ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 14

12.3 Ext2fs error handling : :::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 15

12.4 Additional parameters us e d by e2fsck ::: ::: :::: ::: :::: ::: :::: ::: :::: : 15

12.5 Additional us er tunable parameters :::: ::: :::: ::: :::: ::: :::: ::: :::: : 16

12.6 File system currentstate ::: ::: :::: ::: :::: ::: :::: ::: :::: ::: :::: : 16

13 Copyr ight 16

14 Acknowle dgments 16

1 Pref ace

Thi s do cumentattemptsto presentanoverview of theinter nal structure of the le system. It was wr itten

in summer 95, while I was workingonthe ext2 filesystem editor project EXT2ED.

In the pro ce ss of constructing EXT2ED, I acquire d knowle dge of thevar ious de s ign asp ectsofthethe ext2

le system. Thi s do cument i s a re sult of an e ort todocumentthi s knowle dge.

Thi s i s only the initial vers ion of thi s do cument. It i s obviously neither error-prone nor complete, butat least

it providesastarting p oint.

In the pro ce ss of lear ningthesub ject, I have used the following source s / to ols:

 Exp er imenting with EXT2ED, as it was developed.

 The ext2 ker nel source s:

fs.h { Themain ext2 include le, /usr/include/linux/ext2

{ The contentsofthe directory /usr/src/linux/fs/ext2.

{ The VFS layer source s only a bit.

 The slides: The Second Extende d File System, CurrentState, Future Development, by Remy Card.

2. Intro duction 3

 The slide s: Optimi sation in File Systems, by Stephen Tweedie.

 Thevar ious ext2 utilitie s.

2 Intro duction

The Second Extended File System Ext2fs is very p opular among Linux us ers. If you us e Linux, chance s

are thatyou are us ingthe ext2 le system.

Ext2fs was de s igned by Remy Card and Wayne Davison.Itwas implemented by Remy Card andwas further

enhance d by Stephen Tweedie and Theodore Ts'o.

The ext2 le system i s still under development. I will do cumenthere vers ion 0.5a, which i s di str ibute d along

with Linux 1.2.x. Atthi s timeofwriting, the most recentvers ion of Linux i s 1.3.13, andthevers ion of the

ext2 ker nel source i s 0.5b. A lot of f ancy enhancements are planne d for the ext2 le system in Linux 1.3, so

stay tuned.

3 A le system - Whydoweneedit?

Ithoughtthat b efore we diveintothevar ious small details, I'll re s erve a few minute s for the di scuss ion of

le systems f rom a general p oint of view.

A filesystem cons i stsoftwoword - file and system.

Everyone knows themeaningoftheword file -Abunchofdataput somewhere. where ? Thi s i s an imp ortant

que stion. I, for example, usually throw almost everythingintoasingle drawer, andhave dicultie s nding

somethinglater.

Thi s i s where the system come s in - Instead of just throwingthedatatothedevice, we generalize and

construct a system which will virtualize for us a nice and ordere d structure in whichwe could arrange our

datainmuchthe sameway as b o oks are arrange d in a library.The purp os e of the le system, as I understand

it, i s tomake it easy for us toupdateandmaintain our data.

Normally,by mounting le systems, we just us e the nice and logical virtual structure. However, the di sk knows

nothingaboutthat-Thedevice dr iver views the di sk as a large continuous pap er in whichwe can wr ite notes

wherever we wi sh. It i s thetask of the le system managementcodetostore b o okkeeping information which

will s ervetheker nel for showingusthe nice and ordere d virtual structure.

In thi s do cument, we cons ider one particular admini strative structure - The Second Extende d File system.

4 The Linux VFS layer

When Linux was rst developed, it supp orte d only one le system-TheMinix le system. Today, Linux has

theabilityto supp ort s everal le systems concurrently. Thi s was donebytheintro duction of another layer

between theker nel andthe le system co de-The Virtual File System VFS.

5. Aboutblo cks andblo ck group s 4

Theker nel "sp eaks" withthe VFS layer. The VFS layer pass e s theker nel's reque st tothe prop er le system

managementcode. I haven't lear ned muchofthe VFS layer as I didn't nee d it for the construction of EXT2ED

so that I can't elab orate on it. Just b e aware that it exi sts.

5 Aboutblo cks andblo ck group s

In order to eas e management, the ext2 le system logically divides thediskintosmall units calle d blocks.A

1

blo ckisthesmalle st unit which can b e allo cate d. Each blo ckinthe le system can b e allocated or free.

The blo ck s ize can b e s elected to b e 1024, 2048 or 4096 bytes when creatingthe le system.

Ext2fs groups together a xe d numb er of s equential blo cks intoa group block.The re sultingsituation i s

thatthe le system i s manage d as a s er ie s of group blo cks. Thi s i s done in order tokeep relate d information

phys ically clos e on thediskandto eas e themanagementtask. As a re sult, muchofthe le system management

re duce s tomanagementofa single blo cks group.

6 The view of ino de s f rom the p oint of view of a blo cks group

Each le in the le system i s re s erve d a sp ecial inode. I don't wantto explain ino des now. Rather, I would

liketo treat it as another re source, much likeablock - Each blo cks group contains a limited number of

ino de, while any sp eci c ino de can b e allocated or unallocated.

7 The group de scr iptors

Each blo cks group i s accompanie d byagroup descriptor.The group de scr iptor summar ize s somenece ssary

information aboutthe sp eci c group blo ck. Follows thede nition of the group de scr iptor, as de ned in

/usr/include/linux/ext2 fs.h:

struct ext2_group_desc

{

__u32 bg_block_bitmap; /* Blocks bitmap block */

__u32 bg_inode_bitmap; /* Inodes bitmap block */

__u32 bg_inode_table; /* Inodes table block */

__u16 bg_free_blocks_count; /* Free blocks count */

__u16 bg_free_inodes_count; /* Free inodes count */

__u16 bg_used_dirs_count; /* Directories count */

__u16 bg_pad;

__u32 bg_reserved[3];

};

1

The Ext2fs source co de refers tothe concept of fragments, which I b elieve are supposed tobesub-blo ck allo cations.

As f ar as I know, f ragments are currently unsupp orte d in Ext2fs.

8. Theblo ck bitmap allo cation blo ck 5

The last three var iable s: bg free blocks count, bg free inodes count and bg used dirs count provide

statistics aboutthe us e of thethree re source s in a blo cks group - The blocks,the inodes andthe directories.

I b elievethatthey are us e d bytheker nel for balancingthe load b etween thevar ious blo cks groups.

bg block bitmap contains the blo cknumberofthe block allocation bitmap block. Thi s i s us e d to

allo cate/ deallo cate each blo ckinthe sp eci c blo cks group.

inode bitmap i s fully analogous tothe previous var iable - It contains the blo cknumberofthe inode bg

allocation bitmap block, whichisusedto allo cate/ deallo cate each sp eci c ino deinthe le system.

inode table contains the blo cknumberofthestart of the inode table of the current blocks group. bg

The inode table i s just the actual ino de s which are re s erve d for the current blo ck.

The blo ck bitmap blo ck, ino de bitmap blo ckandthe ino detable are created when the le system i s created.

The group de scr iptors are place d one after theother. Together they makethe group descriptors table.

Each blo cks group contains theentire table of group de scr iptors in its s econd blo ck, r ight after the sup erblo ck.

However, only the rst copy in group 0 i s actually us e d bytheker nel. Theother copie s are there for backup

purp os e s and can b e of us e if themain copy gets corrupted.

8 Theblo ck bitmap allo cation blo ck

Each blo cks group contains one sp ecial blo ck whichisactually a map of theentire blo cks in the group, with

re sp ect tothe ir allo cation status. Each bit in the blo ck bitmap indicated whether a sp eci c blo ckinthe

group i s us e d or f ree.

The format i s actually quite s imple - Just view theentire blo ck as a series of bits. For example,

Suppose the blo ck s ize i s 1024 bytes. As such, there i s a place for 1024*8=8192 blo cks in a group blo ck. Thi s

number is oneofthe elds in the le system's superblock, which will b e explained later.

 Blo ck 0 in the blo cks group i s manage d by bit 0 of byte0inthe bitmap blo ck.

 Blo ck 7 in the blo cks group i s manage d by bit 7 of byte0inthe bitmap blo ck.

 Blo ck 8 in the blo cks group i s manage d by bit 0 of byte1inthe bitmap blo ck.

 Blo ck 8191 in the blo cks group i s manage d by bit 7 of byte 1023 in the bitmap blo ck.

Avalue of "1"intheappropr iate bit s ignals thatthe blo ck i s allo cate d, while a value of "0" s ignals thatthe

blo ckisunallo cated.

You will probably notice thattypically, all the bitsinabyte contain the samevalue, makingthebyte's value

0 or 0ffh. Thi s i s donebytheker nel on purp os e in order to group related datainphys ically clos e blo cks,

s ince thephys ical device i s usually optimize d tohandle such a clos e relationship.

9. The ino de allo cation bitmap 6

9 The ino de allo cation bitmap

The formatofthe ino de allo cation bitmap blo ck i s exactly likethe formatofthe blo ck allo cation bitmap

blo ck. The explanation aboveisvalid here, withthework block replace d by inode.Typically,there are

much le ss ino des then blo cks in a blo cks group andthus only part of the ino de bitmap blo ck is used. The

numb er of ino de s in a blo cks group i s another var iable whichisliste d in the superblock.

10 On the ino deandthe ino detable s

An ino deisamain re source in the ext2 le system. It i s us e d for var ious purp os e s, butthemain two are:

 Supp ort of le s

 Supp ort of director ie s

Each le, for example, will allo cateone ino de f rom the le system re source s.

An ext2 le system has a total number of available ino de s whichisdetermine d while creatingthe le system.

When all the ino de s are us e d, for example, you will not b e able to create an additional le even though there

will still b e f ree blo cks on the le system.

Each ino detake s up 128 byte s in the le system. By def ault, mke2fs re s erve s an ino de for each 4096 bytes

of the le system space.

The ino de s are place d in s everal table s, each of which contains the samenumb er of ino des and i s place d at

a di erent blo cks group. The goal i s to place ino des andthe ir relate d le s in the same blo cks group b ecaus e

of lo cality arguments.

Thenumb er of ino de s in a blo cks group i s available in the sup erblo ckvar iable s inodes per group.For

example, if there are 2000 ino de s p er group, group 0 will contain the ino de s 1-2000, group 2 will contain the

ino de s 2001-4000, and so on.

Each ino detable i s acce ss e d f rom the group de scr iptor of the sp eci c blo cks group which contains thetable.

Follows the structure of an ino de in Ext2fs:

struct ext2_inode {

__u16 i_mode; /* File mode */

__u16 i_uid; /* Owner Uid */

__u32 i_size; /* Size in bytes */

__u32 i_atime; /* Access time */

__u32 i_ctime; /* Creation time */

__u32 i_mtime; /* Modification time */

__u32 i_dtime; /* Deletion Time */

__u16 i_gid; /* Group Id */

__u16 i_links_count; /* Links count */

__u32 i_blocks; /* Blocks count */

__u32 i_flags; /* File flags */ union {

10. On the ino deandthe ino detable s 7

struct {

__u32 l_i_reserved1;

} linux1;

struct {

__u32 h_i_translator;

} hurd1;

struct {

__u32 m_i_reserved1;

} masix1;

} osd1; /* OS dependent 1 */

__u32 i_block[EXT2_N_BLOCKS];/* Pointers to blocks */

__u32 i_version; /* File version for NFS */

__u32 i_file_acl; /* File ACL */

__u32 i_dir_acl; /* Directory ACL */

__u32 i_faddr; /* Fragment address */

union {

struct {

__u8 l_i_frag; /* Fragment number */

__u8 l_i_fsize; /* Fragment size */

__u16 i_pad1;

__u32 l_i_reserved2[2];

} linux2;

struct {

__u8 h_i_frag; /* Fragment number */

__u8 h_i_fsize; /* Fragment size */

__u16 h_i_mode_high;

__u16 h_i_uid_high;

__u16 h_i_gid_high;

__u32 h_i_author;

} hurd2;

struct {

__u8 m_i_frag; /* Fragment number */

__u8 m_i_fsize; /* Fragment size */

__u16 m_pad1;

__u32 m_i_reserved2[2];

} masix2;

} osd2; /* OS dependent 2 */

};

10.1 The allo cated blo cks

The bas ic functionality of an ino deisto group together a s er ie s of allo cate d blo cks. There i s no limitation on

the allo cate d blo cks - Each blo ck can b e allo cated to each ino de. Neverthele ss, blo ck allo cation will usually

b e done in s er ie s totake advantage of the lo cality pr inciple.

The ino de i s not always us e d in thatway. I will now explain the allo cation of blo cks, assumingthatthe

current ino detyp e indee d refers to a li st of allo cate d blo cks.

10. On the ino deandthe ino detable s 8

It was found exp er imently thatmanyofthe le s in the le system are actually quitesmall. Totake advantage

of thi s e ect, theker nel provides storage of up to 12 blo cknumb ers in the ino deits elf. Thos e blo cks are

calle d direct blocks.The advantage i s that once theker nel has the ino de, it can directly acce ss the le's

blo cks, without an additional di sk acce ss. Thos e 12 blo cks are directly sp eci e d in thevar iable s i block[0]

to i block[11].

i block[12] is the indirect block -The blo ck p ointed byiblo ck12 will not beadata blo ck. Rather, it

will just contain a li st of direct blo cks. For example, if the blo ck s ize i s 1024 byte s, s ince each blo cknumber

is4byte s long, there will b e place for 256 indirect blo cks. That i s, blo ck13till blo ck 268 in the le will b e

acce ss e d bythe indirect block method. Thepenaltyinthi s cas e, compare d tothe direct blo cks cas e, i s

that an additional acce ss tothedevice i s nee ded - Wenee d two acce ss e s to reachthe require d data blo ck.

In muchthe sameway, i block[13] is the double indirect block and i block[14] is the triple

indirect block.

block[13] p ointsto a blo ck which contains p ointers toindirect blo cks. Eachoneofthem i s handle d in i

theway de scr ib e d above.

In muchthe sameway,the tr iple indirect blo ck i s just an additional level of indirection - It will p ointtoa

li st of double indirect blo cks.

mo devar iable 10.2 Thei

Theimo devar iable is used todeterminetheinode type andthe asso ciated permissions. It is best

de scr ib e d by repre s enting itasanoctal numb er. Since it i s a 16 bit var iable, there will b e 6 o ctal digits.

Thos e are divided intotwo parts-The r ightmost 4 digitsandthe leftmost 2 digits.

10.2.1 The r ightmost 4 o ctal digits

The r ightmost 4 digits are bit options - Each bit has itsown purp os e.

The last 3 digits Octal digits 0,1 and 2 are just the usual p ermi ss ions, in the known form rwxrwxrwx. Digit

2 refers tothe us er, digit 1 tothe group and digit 2 toeveryone els e. They are us e d bytheker nel to grant

2

or deny acce ss tothe ob ject pre s ented bythi s ino de.

Bit number9signals thatthe le I'll refer tothe ob ject pre s ented bythe ino de as le even though it can b e

a sp ecial device, for example i s set VTX.I still don't knowwhatisthemeaning of "VTX".

Bit numb er 10 s ignals thatthe le i s set group id - I don't know exactly themeaningoftheaboveeither.

Bit number11signals thatthe le i s set user id, whichmeans thatthe le will run with an e ectiveuser

id ro ot.

10.2.2 The leftmo st twooctal digits

Notethethe leftmost o ctal digit can only b e 0 or 1, s ince thetotal numb er of bits i s 16.

2

A smarter p ermi ss ions control i s oneofthe enhancements planne d for Linux 1.3 - TheACL Acce ss Control

Li sts. Actually, f rom brows ingoftheker nel source, someoftheACL handling i s already done.

10. On the ino deandthe ino detable s 9

Thos e digits, as opposed tothe r ightmost 4 digits, are not bit mapped options. They determinethetyp e of

the " le" to whichthe ino de b elongs:

 01 -The le i s a FIFO.

 02 -The le i s a character device.

 04 -The le i s a directory.

 06 -The le i s a block device.

 10 -The le i s a regular file.

 12 -The le i s a symbolic link.

 14 -The le i s a socket.

10.3 Timeanddate

Linux records the last time in whichvar ious op erations o ccure d withthe le. Thetimeanddate are saved in

thestandard C library format- Thenumb er of s econds which pass e d s ince 00:00:00 GMT, January 1, 1970.

The followingtime s are recorded:

ctime -Thetime in whichthe ino dewas last allo cated. In other words, thetime in whichthe le  i

was created.

mtime -Thetime in whichthe le was last mo di e d.  i

 i atime -Thetime in whichthe le was last acce ss e d.

 i dtime -Thetime in whichthe ino dewas deallo cated. In other words, thetime in whichthe le was

deleted.

s ize 10.4 i

i size contains information aboutthe s ize of the ob ject pre s ented bythe ino de. If the ino de corre sp onds

to a regular le, thi s i s just the s ize of the le in bytes. In other cas e s, theinterpretation of thevar iable i s

di erent.

10.5 Us er and group id

Theuserand group id of the le are just save d in thevar iable s i uid and i gid.

10. On the ino deandthe ino detable s 10

10.6 Hard links

Later, when we'll di scuss the implementation of director ie s, it will b e explained that each directory entry

p ointsto an ino de. It i s quite p oss ible thatasingle inode will b e p ointed to f rom several director ie s. In

that cas e, we say thatthere exi st hard links tothe le-The le can b e acce ss e d f rom eachofthe director ie s.

links count.Thevar iable is set to "1" Theker nel keeps trackofthenumber of hard links in thevar iable i

when rst allo catingthe ino de, and i s incremente d with each additional link. Deletion of a le will delete

the current directory entry and will decrementthenumb er of links. Only when thi s numb er reache s zero, the

ino de will b e actually deallo cated.

Thename hard link is used todistingui sh b etween the alias method de scr ib e d above, to another alias

metho d calle d symbolic linking, which will b e de scr ib e d later.

10.7 The Ext2fs extende d ags

The ext2 le system asso ciate s additional ags with an ino de. The extended attr ibute s are store d in the

var iable i flags. i flags i s a 32 bit var iable. Only the 7 r ightmost bits are de ned. Of them, only 5

bits are us e d in vers ion 0.5a of the le system. Sp eci cally,the undelete andthe compress feature s are not

implemented, and are tobeintro duce d in Linux 1.3 development.

The currently available ags are:

 bit 0 - Secure deletion.

When thi s bit i s on, the le's blo cks are zero e d when the le i s delete d. Withthi s bit o , they will just

b e left withthe ir or iginal datawhen the ino deisdeallo cated.

 bit 1 - Undelete.

Thi s bit i s not supp orted yet. It will b e us e d to provideanundelete feature in future Ext2fs develop-

ments.

 bit 2 - Compre ss le.

Thi s bit i s also not supp orted. The plan i s to o er "compre ss ion on the y" in future releas e s.

 bit 3 - Synchronous up dates.

Withthi s bit on, themeta-data will b e wr itten synchronously tothe di sk, as if the le system was

mounte d withthe "sync" mountoption.

 bit 4 - Immutable le.

When thi s bit i s on, the le will stay as it i s - Can not b e change d, delete d, renamed, no hard links,

etc, b efore the bit i s cleare d.

 bit 5 - Append only le.

Withthi s option active, data will only b e appended tothe le.

 bit 6 - Do not dump thi s le.

Ithink thatthi s bit i s us e d bythe p ort of dump to linux p orted by Remy Cardtocheckifthe le

should not b e dumped.

10. On the ino deandthe ino detable s 11

10.8 Symbolic links

The hard links presented above are just another p ointers tothe same ino de. The imp ortant asp ect i s that

the ino denumber is fixed when the link i s created. Thi s means thatthe implementation details of the

le system are vi s ible tothe us er - In a pure abstract usage of the le system, theusershould not care about

ino des.

Theabovecauses several limitations:

 Hard links can b e done only in the same le system. Thi s i s obvious, s ince a hard link i s just an ino de

numb er in some directory entry,andtheabove elements are le system sp eci c.

 You can not "replace" the le which i s p ointed tobythehard link after the link creation. "Replacing"

the le in one directory will still leavethe or iginal le in theother directory - The "replacement" will not

deallo catethe or iginal ino de, butrather allo cate another ino de for thenew vers ion, andthe directory

entry attheother place will just p ointtothe old ino denumb er.

Symbolic link,ontheother hand, i s analyze d at run time. A symb olic link i s just a pathname whichis

acce ss ible f rom an ino de. As such, it "sp eaks" in the language of theabstract le system. When theker nel

reache s a symb olic link, it will follow it in run time us ingits normal way of reaching director ie s.

As such, symb olic link can b e made across different filesystems and a replacement of a le withanew

vers ion will automatically b e active on all its symb olic links.

The di sadvantage i s thathard link do e sn't consume space except toasmall directory entry. Symb olic link,

on theother hand, consumes at least an ino de, and can also consumeone blo ck.

When the ino deisidenti e d as a symb olic link, theker nel nee ds to ndthepathto which it p oints.

10.8.1 Fast symbolic links

When thepathname contains up to64byte s, it can b e save d directly in the ino de, on the i block[0] -

i block[15] var iable s, s ince thos e are not nee de d in that cas e. Thi s i s calle d fast symb olic link. It i s f ast

b ecaus e thepathname re solution can b e doneusingthe ino deits elf, without acce ss ing additional blo cks. It i s

also economical, s ince it allo cate s only an ino de. The lengthofthepathnameisstore d in the i size var iable.

10.8.2 Slow symbolic links

Starting f rom 65 byte s, additional blo ck i s allo cated bytheuseofiblock[0]andthepathnameisstore d

in it. It i s calle d slow b ecaus e theker nel nee ds to read additional blo ckto re solvethepathname. The length

i s again saved in i size.

10.9 i vers ion

i version is used with regard to Network File System. I don't knowits exact us e.

11. Director ie s 12

10.10 Re s erved var iable s

As f ar as I know, thevar iable s which are connected toACL and f ragments are not currently us e d. They will

b e supp orte d in future vers ions.

Ext2fs i s b e ing p orted toother op erating systems. As f ar as I know, at least in linux, theosdep endent

var iable s are also not us e d.

10.11 Sp ecial re s erve d ino des

The rst ten ino de s on the le system are sp ecial ino des:

 Ino de1isthebad blocks inode - I b elievethatitsdata blo cks containalistofthe bad blo cks in the

le system, whichshould not b e allo cated.

 Ino de2istheroot inode -The ino deofthe ro ot directory.Itisthestarting p oint for reachinga

known pathinthe le system.

 Ino de3istheacl index inode. Acce ss control li sts are currently not supp orted bythe ext2 le system,

so I b elievethi s ino de i s not us e d.

 Ino de4isthe acl data inode. Of cours e, theaboveapplie s here too.

 Ino de5isthe boot loader inode. I don't knowits usage.

 Ino de6isthe undelete directory inode. It i s also a foundation for future enhancements, andis

currently not us e d.

 Ino de s 7-10 are reserved and currently not us e d.

11 Director ie s

A directory i s implemente d in the sameway as le s are implemente d withthe direct blo cks, indirect blo cks,

etc - It i s just a le which i s formatte d with a sp ecial format - A li st of directory entr ie s.

Follows thede nition of a directory entry:

struct ext2_dir_entry {

__u32 inode; /* Inode number */

__u16 rec_len; /* Directory entry length */

__u16 name_len; /* Name length */

char name[EXT2_NAME_LEN]; /* File name */

};

Ext2fs supp orts le name s of varying lengths, up to 255 bytes. The name eld above just contains the le

name. Notethatitisnot zero terminated; Instead, thevar iable name len contains the lengthofthe le

name.

12. The sup erblo ck 13

Thevar iable rec len i s provided becaus e the directory entr ie s are padde d with zero e s so thatthenext entry

will b e in an o s et which isamultiplition of 4. The re sulting directory entry s ize i s store d in rec len.If

len i s the directory entry i s the last in the blo ck, it i s padde d with zero e s till theendofthe blo ck, and rec

up date d accordingly.

The inode var iable p ointstothe ino deoftheabove le.

Deletion of directory entr ie s i s donebyappendingofthedeleted entry space tothe previous or next, I am

not sure entry.

12 The sup erblo ck

The superblock i s a blo ck which contains information whichde scr ib e s thestateoftheinter nal le system.

The sup erblo ckislocated atthe fixed offset 1024 in thedevice. Its length i s 1024 byte s also.

The sup erblo ck, likethe group de scr iptors, i s copie d on each blo cks group b oundary for backup purp os e s.

However, only themain copy is used bytheker nel.

The sup erblo ck contain three typ e s of information:

 File system parameters which are xe d and whichwere determined when thi s sp eci c le system was

create d. Someofthos e parameters can b e di erent in di erent installations of the ext2 le system, but

can not b e change d once the le system was created.

 File system parameters which are tunable - Can always b e change d.

 Information aboutthe current le system state.

Follows the sup erblo ckde nition:

struct ext2_super_block{

__u32 s_inodes_count; /* Inodes count */

__u32 s_blocks_count; /* Blocks count */

__u32 s_r_blocks_count; /* Reserved blocks count */

__u32 s_free_blocks_count; /* Free blocks count */

__u32 s_free_inodes_count; /* Free inodes count */

__u32 s_first_data_block; /* First Data Block */

__u32 s_log_block_size; /* Block size */

__s32 s_log_frag_size; /* Fragment size */

__u32 s_blocks_per_group; /*  Blocks per group */

__u32 s_frags_per_group; /*  Fragments per group */

__u32 s_inodes_per_group; /*  Inodes per group */

__u32 s_mtime; /* Mount time */

__u32 s_wtime; /* Write time */

__u16 s_mnt_count; /* Mount count */

__s16 s_max_mnt_count; /* Maximal mount count */

__u16 s_magic; /* Magic signature */

__u16 s_state; /* File system state */

12. The sup erblo ck 14

__u16 s_errors; /* Behaviour when detecting errors */

__u16 s_pad;

__u32 s_lastcheck; /* time of last check */

__u32 s_checkinterval; /* max. time between checks */

__u32 s_creator_os; /* OS */

__u32 s_rev_level; /* Revision level */

__u16 s_def_resuid; /* Default uid for reserved blocks */

__u16 s_def_resgid; /* Default gid for reserved blocks */

__u32 s_reserved[235]; /* Padding to the end of the block */

};

12.1 sup erblo ckidenti cation

The ext2 le system's sup erblo ckisidenti e d bythe s magic eld. The current ext2 magic numb er i s 0xEF53.

I pre sumethat "EF" means "Extende d File system". In vers ions of the ext2 le system pr ior to 0.2B, themagic

number was 0xEF51. Thos e le systems are not compatible withthe currentvers ions; Sp eci cally,the group

de scr iptors de nition i s di erent. I doubt if there still exi stssuch a installation.

12.2 File system xe d parameters

By us ingtheword fixed,Imean xe d with re sp ect to a particular installation. Thos e var iable s are usually

not xe d with re sp ect to di erent installations.

The block size is determined byusingthe s log block size var iable. The blo ck s ize i s 1024*p ow

2,s log blo ck s ize andshould b e b etween 1024 and 4096. Theavailable options are 1024, 2048 and 4096.

s inodes count contains thetotal number of available ino des.

s blocks count contains thetotal number of available blo cks.

s first data block sp eci e s in whichofthe device block the superblock is present. The sup erblo ckis

always pre s entatthe xe d o s et 1024, butthedevice blo cknumbering can di er. For example, if the blo ck

s ize i s 1024, the sup erblo ck will b e at block 1 with re sp ect tothedevice. However, if the blo ck s ize i s 4096,

first data block will contain 0. At o s et 1024 i s included in block 0 of thedevice, andinthat cas e s

least thi s i s howI understood thi s var iable.

s blocks per group contains thenumb er of blo cks which are group e d together as a blo cks group.

s inodes per group contains thenumb er of ino des available in a group blo ck. I think thatthi s i s always the

total numb er of ino de s divided bythenumb er of blo cks groups.

s creator os contains a co denumb er which sp eci e s theop erating system which created thi s sp eci c le sys-

tem:

 Linux :- i s sp eci e d bythevalue 0.

 Hurd i s sp eci e d bythevalue 1.

 Masix i s sp eci e d bythevalue 2.

12. The sup erblo ck 15

s rev level contains thema jor vers ion of the ext2 le system. Currently thi s i s always 0,asthe most recent

vers ion i s 0.5B. It will probably take sometimeuntil we reachvers ion 1.0.

As f ar as I know, f ragments sub-blo ck allo cations are currently not supp orted andhence a blo ck i s equal

to a f ragment. As a re sult, s log frag size and s frags per group are always equal to s log block size

blocks per group, respectively. and s

12.3 Ext2fs error handling

The ext2 le system error handling i s bas e d on the following philosophy:

1. Identi cation of problems i s donebytheker nel co de.

2. The correction task i s left to an exter nal utility,suchase2fsck by Theodore Ts'o for automatic

analys i s and correction, or p erhaps debugfs by Theodore Ts'o and EXT2ED by myself, for hand

analys i s and correction.

The s state var iable is used bytheker nel to pass theidenti cation re sulttothird partyutilitie s:

 bit 0 of s state is reset when the partition i s mounted and s et when the partition i s unmounted. Thus,

avalue of 0 on an unmounte d le system means thatthe le system was not unmounte d prop erly - The

le system i s not "clean" and probably contains errors.

 bit 1 of s state i s s et bytheker nel when it detects an error in the le system. A value of 0 do e sn't

mean thatthere i sn't an error in the le system, just thattheker nel didn't ndany.

Theker nel b ehavior when an error i s foundisdetermined bytheusertunable parameter s errors:

 Theker nel will ignore the error and continue if s errors=1.

 Theker nel will remountthe le system in read-only mo deifs errors=2.

errors=3.  Aker nel panic will b e i ssue d if s

Thedef aultbehavior i s to ignore the error.

12.4 Additional parameters us e d by e2fsck

Of-cours e, e2fsck will checkthe le system if errors were detecte d or if the le system i s not clean.

In addition, eachtimethe le system i s mounted, s mnt count i s incremented. When s mnt count reaches

max mnt count, e2fsck will force a checkonthe le system even though it may b e clean. It will then zero s

s mnt count. s max mnt count is a tunable parameter.

E2fsck also records the last time in whichthe le system was checke d in the s lastcheck var iable. The

us er tunable parameter s checkinterval will contain thenumb er of s econds which are allowed to pass s ince

s lastcheck until a check i s reforce d. A value of 0 di sable s time-bas e d check.

13. Copyr ight 16

12.5 Additional us er tunable parameters

s r blocks count contains thenumber of disk blocks which are re s erve d for ro ot, theuserwhos e id number

is s def resuid andthe group whos e id number is s deg resgid.Theker nel will refus e to allo catethos e

last s r blocks count if the us er i s not oneoftheabove. Thi s i s donesothatthe le system will usually not

b e 100 full, s ince 100 full le systems can a ect var ious asp ectsofop eration.

s def resuid and s def resgid contain theidoftheuserandofthe group who can us e the re s erve d blo cks

in addition to ro ot.

12.6 File system currentstate

sfree blocks count contains the currentnumb er of f ree blo cks in the le system.

s free inodes count contains the currentnumb er of f ree ino de s in the le system.

s mtime contains thetimeat whichthe system was last mounted.

wtime contains the last timeat which somethingwas changedinthe le system. s

13 Copyr ight

Thi s do cument contains source co de whichwas taken f rom the Linux ext2 ker nel source co de, mainly f rom

/usr/include/linux/ext2 fs.h. Follows the or iginal copyr ight:

/*

* linux/include/linux/ext2_fs.h

*

* Copyright C 1992, 1993, 1994, 1995

* Remy Card [email protected]

* Laboratoire MASI - Institut Blaise Pascal

* Universite Pierre et Marie Curie Paris VI

*

* from

*

* linux/include/linux/minix_fs.h

*

* Copyright C 1991, 1992

*/

14 Acknowle dgments

Iwould liketothank the followingpeople, whowere involve d in thede s ign and implementation of the ext2

le system ker nel co deand supp ort utilitie s:

14. Acknowle dgments 17

 Remy Card

Whode s igne d, implemented andmaintains the ext2 le system ker nel co de, and someofthe ext2 utilitie s.

Remy Card i s also theauthor of s everal helpful slide s concer ningthe ext2 le system. Sp eci cally,heis

theauthor of File Management in the andofThe Second Extended File System

- Current State, Future Development.

 Wayne Davison

Whode s igned the ext2 le system.

 Stephen Tweedie

Whohelped de s igningthe ext2 le system ker nel co deand wrotethe slides Optimizations in File

Systems.

 Theodore Ts'o

Whoistheauthor of s everal ext2 utilitie s andofthe ext2 library libext2fs which I didn't us e, s imply

b ecaus e I didn't know it exi stswhenIstarted toworkonmy pro ject.

Lastly,I would liketothank, of-cours e, Linus Torvalds andtheLinux community for providing all of us

withsuch a greatop erating system.

Pleas e contact me in a cas e of an error rep ort, sugge stions, or just aboutanything concer ningthi s do cument.

Enjoy,

Gadi Oxman

Haif a, August 95