EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices
Xiang Gao#, Mingkai Dong*, Xie Miao#, Wei Du#, Chao Yu#, Haibo Chen*,#
#Huawei Technologies Co., Ltd.  *IPADS, Shanghai Jiao Tong University
USENIX ATC 2019, Renton, WA, USA

System resources in Android
[Figure: size of the read-only /system partition (MB) across Android versions 2.3.6 to 8.0, growing from 512 MB to 3072 MB; other read-only system partitions include /vendor, /oem, and /odm]
System resources consume significant storage!
Read-only system partitions → compressed read-only file systems

Existing solution
Squashfs is the state-of-the-art compressed read-only file system.
We tried to use Squashfs for system resources in Android, mounting /system, /oem, /odm, and /vendor each as a Squashfs image.
Result: the system lagged, even froze for seconds, and then rebooted.
Why does Squashfs fail?
Fixed-sized input compression:
1. Divide the input into fixed-sized chunks
2. Compress each chunk
3. Concatenate the compressed chunks
Why does Squashfs fail? Read amplification
To read even 1 byte, the whole containing chunk must be fetched from storage (I/O amplification) and fully decompressed (decompression amplification).

Why does Squashfs fail? Massive memory consumption
Each read allocates buffer_heads and temporary pages for the compressed data, allocates again to decompress, and then copies the result into the page cache, so every read pays for extra allocations and copies.
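The amplification can be reproduced in a few lines of Python. This is an illustration only: zlib stands in for the LZ4/LZO compressors Squashfs actually uses, the 128KB chunk size matches the largest configuration evaluated later, and the data and offset are arbitrary.

```python
import zlib

CHUNK = 128 * 1024                       # fixed-sized input chunk (Squashfs-128K)
data = bytes(range(256)) * 2048          # 512 KB of sample file data

# Fixed-sized input compression: divide, compress each chunk, concatenate.
chunks = [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

# Reading a single byte at an arbitrary offset still forces the whole
# containing chunk to be fetched and decompressed.
off = 300_000
compressed_chunk = chunks[off // CHUNK]  # all of this must be read from storage
plain = zlib.decompress(compressed_chunk)  # 128 KB decompressed for 1 byte
assert plain[off % CHUNK] == data[off]
```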
EROFS
Fixed-sized output compression:
1. Prepare a large amount of data
2. Compress it into one fixed-sized block
3. Repeat
ü Reduces read amplification
ü Better compression ratio
ü Allows in-place decompression
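The contrast between the two schemes can be sketched in Python. This is a toy illustration, not the EROFS on-disk algorithm: zlib stands in for LZ4, and the binary search over the input window merely assumes that compressed size grows with input size.

```python
import zlib

BLOCK = 4096  # fixed on-disk block size, as in EROFS's 4KB-sized output

def fixed_input_compress(data, chunk=BLOCK):
    """Squashfs-style: fixed-sized input chunks, variable-sized outputs."""
    return [zlib.compress(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def fixed_output_compress(data, block=BLOCK):
    """EROFS-style: variable-sized input, each piece compressed to <= one block."""
    blocks, pos = [], 0
    while pos < len(data):
        # find the largest input window that still fits in one output block
        lo, hi, best = 1, len(data) - pos, 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if len(zlib.compress(data[pos:pos + mid])) <= block:
                best, lo = mid, mid + 1
            else:
                hi = mid - 1
        blocks.append(zlib.compress(data[pos:pos + best]))
        pos += best
    return blocks
```

Because each output block is filled up to the block size, fixed-sized output compression packs more input per on-disk block and wastes no tail space, which is where the better compression ratio comes from.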
Choosing the page for I/O

Cached I/O (for blocks that will be partially decompressed):
Allocate a page for I/O in a dedicated page cache for compressed blocks.
ü Following decompressions can reuse the cached page

In-place I/O (for blocks that will be fully decompressed):
Reuse the page already allocated by VFS in the requested file's page cache, if possible.
ü Memory allocation and consumption are reduced
Decompression
Vmap decompression:
1. Count the data blocks to decompress.
2. Allocate temporary physical pages, or choose pages already allocated by VFS.
3. Allocate a contiguous VM area via vmap() and map the physical pages into it.
4. For in-place I/O, copy the compressed block to a temporary per-CPU page.
5. Decompress into the VM area.
Vmap decompression properties:
✓ Works for all cases
✘ Frequent vmap/vunmap
✘ Unbounded physical page allocations
✘ Data copy for in-place I/O
Buffer decompression: pre-allocate four-page per-CPU buffers
✘ Only for decompression < 4 pages
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O

Rolling decompression (details in the paper):
✓ For decompression < a pre-allocated VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O

In-place decompression (details in the paper):
✓ No data copy for in-place I/O

Also in the paper:
Ø Decompression policy
Ø Optimizations
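A rough sketch of how a policy might dispatch among the four approaches. The function and its thresholds are my own illustration of the trade-offs above; the real decompression policy is described in the paper.

```python
PAGE = 4096
BUFFER_PAGES = 4        # per-CPU buffer size (buffer decompression)
VM_AREA_PAGES = 17      # per-CPU VM area size (rolling decompression)

def choose_decompression(out_pages, in_place_io, safe_in_place):
    """Pick a decompression approach for one compressed extent.

    out_pages:     number of decompressed output pages
    in_place_io:   compressed data was read into the file's own page cache
    safe_in_place: decompressing over the compressed data cannot corrupt it
    """
    if in_place_io and safe_in_place:
        return "in-place"    # no copy, no extra pages, no vmap
    if out_pages < BUFFER_PAGES:
        return "buffer"      # fits the pre-allocated per-CPU buffer
    if out_pages < VM_AREA_PAGES:
        return "rolling"     # reuse 17 per-CPU pages in a fixed VM area
    return "vmap"            # general fallback for large extents
```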
Evaluation Setup

Platform             CPU                                           DRAM   Storage
HiKey 960            Kirin 960 (Cortex-A73 x 4 + Cortex-A53 x 4)   3 GB   32 GB UFS
Low-end smartphone   MT6765 (Cortex-A53 x 8)                       2 GB   32 GB eMMC
High-end smartphone  Kirin 980 (Cortex-A76 x 4 + Cortex-A55 x 4)   6 GB   64 GB UFS

Micro-benchmarks
• Platform: HiKey 960
• Tool: FIO
• Workloads: enwik9, silesia.tar
Application benchmarks
• Platform: smartphones
• Workload: 13 popular applications

Evaluated file systems
• EROFS: LZ4, 4KB-sized output
• Squashfs: LZ4, {4,8,16,128}KB chunk size
• Btrfs: LZO, 128KB chunk size, readonly mode w/o integrity checks
• Ext4: no compression
• F2FS: no compression
More results in the paper.

Micro-benchmark: FIO Throughput
[Figure: throughput (MB/s) of sequential, random, and stride reads; FIO, enwik9, HiKey 960 A73 @ 2362 MHz; file systems: EROFS, Squashfs-{4,8,16,128}K, Ext4, F2FS, Btrfs]
• Btrfs performs worst in all cases.
• A larger chunk size brings better performance for Squashfs, if the cached results are reused.
• EROFS performs comparably to or even better than Ext4.

Read Amplification and Resource Consumption
                 I/O (MB)                Consumption (MB)
                 Seq.   Random  Stride   Storage  Memory
Requested/Ext4   16     16      16       953.67   988.51
Squashfs-4K      10.65  26.19   26.23    592.43   1597.50
Squashfs-8K      9.82   33.52   34.08    530.43   1534.09
Squashfs-16K     9.05   46.42   48.32    479.38   1481.12
Squashfs-128K    7.25   165.27  203.91   379.76   1379.84
EROFS            10.14  26.12   25.93    533.67   1036.88

Highlights: EROFS issues 84% / 87% less random / stride I/O than Squashfs-128K, saves 44% storage vs. Ext4, and adds 92% less extra memory (over Ext4) than Squashfs-4K.
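The highlighted percentages follow from the table; pairing them with specific cells is my reading of the slide animation, but the arithmetic checks out:

```python
# I/O (MB) from the table: EROFS vs. Squashfs-128K
erofs_rand, erofs_stride = 26.12, 25.93
sq128_rand, sq128_stride = 165.27, 203.91
assert round((1 - erofs_rand / sq128_rand) * 100) == 84      # 84% less random I/O
assert round((1 - erofs_stride / sq128_stride) * 100) == 87  # 87% less stride I/O

# Storage (MB): EROFS vs. uncompressed Ext4
assert round((1 - 533.67 / 953.67) * 100) == 44              # 44% storage saved

# Extra memory over the Ext4 baseline: EROFS vs. Squashfs-4K
extra_erofs = 1036.88 - 988.51
extra_sq4k = 1597.50 - 988.51
assert round((1 - extra_erofs / extra_sq4k) * 100) == 92     # 92% less extra memory
```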
Real-world Application Boot Time
[Figure: boot time of 13 popular applications on EROFS, relative to Ext4. Low-end smartphone: 85%–110% per application, 3.2% faster on average. High-end smartphone: 77%–101% per application, 10.9% faster on average.]

Deployment
ü Deployed in HUAWEI EMUI 9.1 as a top feature
ü Upstreamed to Linux 4.19
ü System storage consumption decreased by >30%
ü Performance comparable to or even better than Ext4
ü Running on 10,000,000+ smartphones
Conclusion
EROFS: an Enhanced Read-Only File System with compression support
Fixed-sized output compression with four decompression approaches:
• Vmap decompression
• Buffer decompression
• Rolling decompression
• In-place decompression
ü Reduces system storage consumption by >30%
ü Provides comparable or even better performance than Ext4
Running on 10,000,000+ smartphones

Thank you & questions?
Backup Slides
Throughput and space savings
[Figure: sequential and random read throughput (MB/s) of EROFS vs. Ext4 as space savings grow from 0% to 96%; FIO, customized workload generated from enwik9, HiKey 960 A73 @ 2362 MHz. At low space savings, computation outweighs I/O; at high space savings, I/O outweighs computation, and less I/O means better throughput.]

Decompression: Rolling decompression
Observation: during decompression, LZ4 looks backward at most 64KB into the decompressed data.
Pre-allocate a large VM area and 17 physical pages per CPU, and map the 17 physical pages into the area in turn.
✓ For decompression < the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✘ Data copy for in-place I/O
[Figure: output pages are backed by the 17 physical pages cycling: 1…17, then 1, 2, …]
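The key invariant, that 17 cycling pages are enough for LZ4's 64KB lookback window, can be checked with a small sketch (my own model of the slide's diagram, not kernel code):

```python
PAGE = 4096
RING = 17                   # per-CPU physical pages, reused in turn
LOOKBACK = 64 * 1024        # LZ4 matches reach back at most 64 KB (16 pages)

def physical_page(i, ring=RING):
    """Logical output page i is backed by physical page i mod ring."""
    return i % ring

def window_intact(i, ring=RING):
    """While page i is being written, every page inside the lookback window
    must still be backed by a distinct physical page (none recycled yet)."""
    window = range(max(0, i - LOOKBACK // PAGE), i + 1)   # at most 17 pages
    return len({physical_page(j, ring) for j in window}) == len(window)
```

With 17 pages, the page being written plus the 16 pages of lookback behind it always map to distinct physical pages; a 16-page ring would recycle a page still inside the window.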
Decompression: In-place decompression
[Figure: sequence A decompressed over itself in the page cache; if the output would overwrite not-yet-consumed compressed data, that data is corrupted]
Decompress in place only if no such corruption will happen:
✓ For decompression < the VM area size
✓ No vmap/vunmap
✓ No physical page allocation
✓ No data copy for in-place I/O
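A toy model of the safety condition, using a minimal run-length codec of my own rather than LZ4: the point is only that in-place decompression is safe exactly when the write cursor never overtakes the unread compressed input.

```python
def rle_compress(data):
    """Byte-level run-length encoding: a stream of (run_length, byte) pairs."""
    out, i = bytearray(), 0
    while i < len(data):
        j = i
        while j < len(data) and j - i < 255 and data[j] == data[i]:
            j += 1
        out += bytes([j - i, data[i]])
        i = j
    return bytes(out)

def rle_decompress_inplace(buf, comp_len):
    """buf holds the compressed stream right-aligned; decompress from the
    left of the same buffer. Raises if output would clobber unread input."""
    r = len(buf) - comp_len      # read cursor (compressed data)
    w = 0                        # write cursor (decompressed data)
    while r < len(buf):
        n, b = buf[r], buf[r + 1]
        r += 2
        if r < len(buf) and w + n > r:   # write would overwrite unread input
            raise ValueError("in-place decompression would corrupt input")
        buf[w:w + n] = bytes([b]) * n
        w += n
    return w
```

In the same spirit, EROFS decompresses in place only when the check proves no corruption can happen, and otherwise falls back to an approach that copies the compressed data aside first.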