EROFS File System Introduction


EROFS file system
Gao Xiang <[email protected]>

Self-introduction & Background

1. Graduated from university and joined HUAWEI 3 years ago, working in the OS Kernel Lab Dept., mainly focusing on Linux file system improvement for HUAWEI mobile phones:
● F2FS support & bugfixes;
● New solutions for requirements from our products:
1) HUAWEI sdcardfs; 2) EROFS; 3) other useful solutions which are under development...

2. User-available space becomes tight with the increase of the Android ROM, especially for our low-end devices (some of them still ship with 16GB total storage; saving 1~2GB more storage space is useful for users, with the help of EROFS).

3. Some solution is needed to leave more space for end users while maintaining high performance.

Questions

1. What kinds of file types can be compressed practically, and where are they stored?
a. Photos, pictures, movies? ❌ (no gain from compressing losslessly again)
b. Text material? ❓ (compressible, but the total amount depends on end-user behaviors)
c. Databases? ❌ (some overwrite IO patterns could kill the performance)
d. Binaries, shared libraries, some APKs, preloaded configuration files? ✔️ (mostly read-only files)

2. Is there some solution that resolves our compression requirements?
squashfs is not good enough for real-time decompression (especially on embedded devices with limited memory, such as mobile phones) since:
1. extra memory overhead;
2. noticeable read amplification (* based on its default 128KB block size);
3. some metadata parsing has to be done synchronously, limited by its on-disk design.

3. The performance can be even better by compressing effectively, in addition to resolving the above squashfs issues, as shown on the following pages.

Fixed-output compression

Advantages:
1. improved storage density (higher CR → less IO for most patterns → read performance improvement);
2. decompression in-place (no memcpy) compared with fixed-input compacted compression;
3. can be made on-disk compatible with block-boundary-aligned compression (e.g. btrfs) in order to achieve zero IO read amplification (however, in my opinion, that depends on actual IO patterns and is less useful for fixed-output decompression, since all the compressed data in the block can be decompressed anyway).

[Figure: on-disk layouts of block-boundary compression (btrfs), fixed-input compacted compression (squashfs, cramfs) and fixed-output compression (erofs); annotations: "it is only useful if CR >> 2 (more than 50% off)", "it is tough for a 4k compress block".]

Comparison of erofs/squashfs/btrfs image sizes

Only 4k compressed clusters are supported in the current erofs kernel implementation, due to our requirements:

dataset     | testcase                      | output size    | CR
enwik9      | enwik9                        | 1,000,000,000B | 1
enwik9      | enwik9_4k.squashfs.img        | 621,211,648B   | 1.61
enwik9      | enwik9_4k.erofs.img           | 558,133,248B   | 1.79
enwik9      | enwik9_8k.squashfs.img        | 556,191,744B   | 1.80
enwik9      | enwik9_128k.squashfs.img      | 398,204,928B   | 2.51
enwik9      | enwik9_128k.btrfs.img*        | 657,608,704B   | 1.52
silesia.tar | silesia.tar                   | 211,957,760B   | 1
silesia.tar | silesia.tar_4k.squashfs.img   | 114,524,160B   | 1.85
silesia.tar | silesia_8k.squashfs.img       | 106,094,592B   | 2.00
silesia.tar | silesia.tar_4k.erofs.img      | 105,2??,2??B   | 2.01
silesia.tar | silesia.tar_128k.squashfs.img | 81,768,448B    | 2.59
silesia.tar | silesia.tar_128k.btrfs.img*   | 104,628,224B   | 2.03

Compressed with lz4 1.8.3 using the command lines:
1) mksquashfs enwik9 enwik9_4k.squashfs.img -comp lz4 -Xhc -b 4096 -noappend
2) mkfs.erofs -zlz4hc enwik9_4k.erofs.img enwik9
* for btrfs, mounted with "nodatasum,compress-force=lzo" and measured with compsize, so no metadata is counted at all.
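To make the fixed-output idea concrete, below is a minimal userspace sketch (not mkfs.erofs code) that packs as much input as will fit into each fixed 4KiB output block using liblz4's LZ4_compress_destSize(); the presence of liblz4 and the toy input are assumptions of the sketch.

```c
/* fixed_output.c - sketch of fixed-output (4KiB physical block) compression.
 * Build (assuming liblz4 is available):  cc fixed_output.c -llz4 -o fixed_output
 */
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>

#define BLK 4096                      /* one compressed (physical) block */

int main(void)
{
    /* Toy input standing in for a file's contents. */
    size_t in_size = 64 * 1024;
    char *in = malloc(in_size);
    if (!in)
        return 1;
    for (size_t i = 0; i < in_size; i++)
        in[i] = "erofs"[i % 5];

    char dst[BLK];
    size_t consumed = 0;
    int blocks = 0;

    /* Fixed-output: each round consumes as many input bytes as will fit into
     * exactly one 4KiB output block, so every on-disk block ends up full.
     * Fixed-input (squashfs-style) would instead compress a fixed-size input
     * chunk and produce variable-sized, compacted output. */
    while (consumed < in_size) {
        int src_sz = (int)(in_size - consumed);
        int out_sz = LZ4_compress_destSize(in + consumed, dst, &src_sz, BLK);
        if (out_sz <= 0) {
            fprintf(stderr, "compression failed\n");
            return 1;
        }
        printf("block %d: consumed %d input bytes -> %d output bytes\n",
               blocks, src_sz, out_sz);
        consumed += (size_t)src_sz;
        blocks++;
    }
    printf("total: %zu input bytes packed into %d x %dB blocks\n",
           in_size, blocks, BLK);
    free(in);
    return 0;
}
```

Because every physical block comes out full, a higher CR translates directly into fewer blocks to read, which is the storage-density and IO argument made above.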
Fixed-output compression (cont.)

Typically, there are 2 ways for a compressed file system to decompress:
1. read all compressed data into the bdev page cache (like all metadata reads do) and decompress ⇒ squashfs;
2. read compressed data into a temporary buffer and decompress ⇒ e.g. btrfs.

However, both ways take extra overhead:
● For 1), it could cause page cache thrashing: imagine that compressed data amounting to about 50% of the original file size is added to the page cache inactive LRU list, at least on embedded devices; many of those pages are used once or at a very low frequency. Furthermore, it is hard to decompress directly if some of the compressed pages have already been reclaimed.
● For 2), the remaining compressed data can hardly be reused, since the temporary buffer could be freed immediately.

Fixed-output compression can make full use of the data in a compressed block in principle (all compressed data can be decompressed if needed).

Our solution

EROFS has decompression strategies in 2 dimensions:

[Dimension 1] Cached or in-place IO decompression
● Cached decompression, currently for incomplete compression reads;
● In-place IO decompression for complete compression reads.
"Complete or incomplete" means whether or not the data in the compressed block are all requested. The policy can be refined later, since for small CR the block could be fully decompressed in advance anyway: it takes a similar amount of memory to keeping the compressed data cached in memory.

[Dimension 2] Sync or async decompression
● Sync decompression for a small number of page requests, which decompresses in the caller thread;
● Async decompression for large read requests, which decompresses in workqueue context.
The policy can be refined later as well, since erofs cannot tell async readahead apart from sync readahead in .readpages().

In-place IO decompression

Why do we need this? Mainly for three reasons:
1. doing cached decompression only could cause cache thrashing;
2. considering that many read requests are pending for IO completion, temporary buffers cannot be freed; allocating immediate buffers will cause more memory reclaim as well, compared to an uncompressed generic file system such as ext4, especially under the HUAWEI camera heavy-memory workload;
3. it will later be used for decompressing in-place.

In this way, compressed data are read into the last decompressed file pages as much as possible, as illustrated below:

[Figure: the compressed cluster Bp is read directly into the tail of the file pages Lm, Lm+1, Lm+2, ..., Ln.]

Decompression in-place

As mentioned before, decompression in-place can be implemented with fixed-output decompression, which is hard for legacy fixed-input decompression.

[Figure: IO submission reads the compressed block Bp directly into the tail of the file pages Lm .. Ln plus some margin; decompression then runs with the per-CPU buffer holding the decompress input and the file pages Lm .. Ln receiving the decompress output.]

Limited bounce buffers

Since all LZ-based compression algorithms use sliding-window technology, EROFS can use a limited number of bounce buffers (64KB for lz4), which minimizes memory consumption as well. Limited bounce buffers are also implemented in the new decompression backend.

[Figure: the page array before and after bounce buffer setup, Lm, Lm+1, Lm+2, Lm+3, ..., Ln.]
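To show the decompression-in-place principle from the slides above in a self-contained way, here is a userspace sketch assuming liblz4 (version 1.9 or later documents this in-place layout and margin); it only illustrates the idea and is not the kernel erofs code path, which works on page arrays with per-CPU and bounce buffers.

```c
/* inplace_decompress.c - sketch of lz4 decompression in-place.
 * Build (assuming liblz4 is available):  cc inplace_decompress.c -llz4 -o inplace
 */
#include <lz4.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Margin mirroring the formula documented by lz4 (>= 1.9) for in-place
 * decompression; treat it as an assumption of this sketch. */
#define INPLACE_MARGIN(csize) (((csize) >> 8) + 32)

int main(void)
{
    /* Prepare compressible "file data" and compress it normally first. */
    const int orig_size = 4 * 4096;
    char *orig = malloc(orig_size);
    if (!orig)
        return 1;
    for (int i = 0; i < orig_size; i++)
        orig[i] = "erofs"[i % 5];

    char *comp = malloc(LZ4_compressBound(orig_size));
    if (!comp)
        return 1;
    int csize = LZ4_compress_default(orig, comp, orig_size,
                                     LZ4_compressBound(orig_size));
    if (csize <= 0)
        return 1;

    /* Decompression in-place: the compressed bytes sit at the *end* of the
     * destination buffer (as if the IO path had read the compressed block
     * straight into the last file pages plus a small margin), and are then
     * decompressed forward into the same buffer. No extra copy of the
     * compressed data is needed; the margin guarantees the write pointer
     * never catches up with the read pointer. */
    int margin = INPLACE_MARGIN(csize);
    char *buf = malloc((size_t)orig_size + margin);
    if (!buf)
        return 1;
    memcpy(buf + orig_size + margin - csize, comp, csize);  /* "read" into tail */

    int n = LZ4_decompress_safe(buf + orig_size + margin - csize, buf,
                                csize, orig_size);
    printf("decompressed %d bytes in place, identical to original: %s\n",
           n, (n == orig_size && !memcmp(buf, orig, orig_size)) ? "yes" : "no");

    free(orig);
    free(comp);
    free(buf);
    return 0;
}
```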
On-disk brief introduction

• Little-endian;
• 4k block size currently (nobh);
• Metadata mixed with data (in other words, flexible enough for mkfs to play with it);
• 32 (v1) or 64 (v2)-byte inode base size;
• 64-bit + ns timestamps; each file can have its own timestamp with v2;
• Supports tail-end data inline;
• Supports XATTR, POSIX ACL (5.2+);
• Supports statx (5.3+);
• Supports compacted indexes (5.3+).

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/staging/erofs/Documentation/filesystems/erofs.txt?h=v5.1

Microbenchmark

CPU: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz (4 cores, 8 threads)
DDR: 8GB
SSD: INTEL SSDPEKKF360G7H

[Figure: enwik9 FIO results (psync, 1 thread) comparing erofs_4k, squashfs_4k, squashfs_8k, squashfs_16k and squashfs_128k for seq, rand and rand1m patterns.]

Real app launching

CPU: MT6765 (8 Cortex-A53 cores)
DDR: 2GB
eMMC: 32GB

App. #           | 1     | 2     | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10    | 11    | 12    | 13
w/o FIO workload | -16.4 | -3.5  | +4.2 | -4.0 | -7.5 | -1.4 | -6.8 | +6.3 | -2.2 | -18.4 | -3.3  | -7.6  | -4.5
w/ FIO workload  | -2.8  | -12.9 | -5.4 | +3.9 | -7.6 | +3.7 | +4.4 | -2.6 | +9.9 | +4.0  | -11.1 | -10.3 | -15.1

Relative boot time of thirteen applications.
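As a footnote to the on-disk notes above, the following is a purely hypothetical sketch of a 32-byte little-endian "compact" (v1-style) base inode; the field names, order and widths are illustrative only and are not the actual erofs_fs.h definitions (the kernel uses __le16/__le32 types for its on-disk little-endian fields).

```c
/* demo_inode.c - hypothetical 32-byte compact inode layout (illustrative only). */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct demo_inode_compact {
    uint16_t i_format;        /* layout / data-mapping hints                 */
    uint16_t i_xattr_icount;  /* size of the inline xattr area, in slots     */
    uint16_t i_mode;          /* file type + permission bits                 */
    uint16_t i_nlink;         /* hard link count                             */
    uint32_t i_size;          /* file size in bytes (32-bit in the compact
                               * form; a 64-byte extended form widens it)    */
    uint32_t i_reserved;
    uint32_t i_u;             /* raw block address, rdev, etc.               */
    uint32_t i_ino;           /* inode number (for stat purposes)            */
    uint16_t i_uid;
    uint16_t i_gid;
    uint32_t i_reserved2;
};

/* The point of the compact form is the fixed 32-byte base size; an extended
 * (v2-style) form doubles it to 64 bytes to hold a 64-bit size, 32-bit
 * uid/gid and a per-inode 64-bit + ns timestamp. Tail-end data can be
 * inlined right after the base inode and its inline xattrs. */
static_assert(sizeof(struct demo_inode_compact) == 32,
              "compact inode base must stay 32 bytes");

int main(void)
{
    printf("compact inode base size: %zu bytes\n",
           sizeof(struct demo_inode_compact));
    return 0;
}
```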