HP-Mapper: A High Performance Storage Driver for Docker Containers

Fan Guo1, Yongkun Li1, Min Lv1, Yinlong Xu1, John C.S. Lui2

1 University of Science and Technology of China   2 The Chinese University of Hong Kong

Outline

1 Background & Motivation

2 HP-Mapper Design

3 Evaluation

4 Conclusion

Container and Docker

- Container
  - Process-level virtualization
    - Shares the host kernel
    - Namespaces, cgroups
  - Better performance (vs. VMs)
    - Fast deployment
    - Low resource usage
    - Near bare-metal performance
- Docker
  - Most popular container engine
  - Extensively used in production

(Figure: VM vs. container stack)

Storage Management of Containers

- Container images
  - Store all requirements for running the containers
  - Hierarchical, read-only, sharable
- Storage drivers
  - Support cross-level lookup and copy-on-write (COW)

(Figure: Docker storage drivers give each container a unified view by performing cross-level lookup and copy-on-write over a writable layer stacked on shared, read-only image layers, e.g. Apache/nginx/emacs images on a base image)

Docker Storage Drivers

- File-based drivers
  - File-level COW
  - Share cached data
  - Overlay2, AUFS
- Block-based drivers
  - Block-level COW (≥64 KB)
  - Cannot share cached data
  - DeviceMapper, ZFS, BtrFS

(Figure: cross-level lookup — file-based drivers resolve path→file across a writable layer and read-only layers, copying whole files up on COW; block-based drivers resolve virtual block (vblk)→physical block (blk) across the layers)

High COW Latency

- File-based storage drivers incur large COW latency, especially for large files (a back-of-the-envelope sketch follows the table)
  - Incur large write overhead
  - Degrade write performance

Copy-on-write latency (ms):

  File size                    4KB     64KB    1MB     16MB
  DeviceMapper (block-based)   0.12    0.74    0.96    1.39
  BtrFS        (block-based)   0.09    0.09    0.09    0.10
  Overlay2     (file-based)    1.99    2.49    7.14    61.7
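To make the write-amplification gap concrete, here is a minimal standalone sketch (not part of HP-Mapper; the 64 KB granularity is simply the block-based COW unit mentioned on the earlier slide) that computes how much data each COW strategy copies for a small write into a large shared file:

```c
#include <stdio.h>

int main(void) {
    long file_size  = 16L * 1024 * 1024;  /* 16 MB shared file in a lower layer */
    long block_size = 64L * 1024;         /* 64 KB COW granularity (block-based) */
    long write_size = 4L  * 1024;         /* a 4 KB write issued by the container */

    /* File-level COW (e.g. Overlay2): the whole file is copied up first. */
    printf("file-level  COW copies %ld bytes\n", file_size);

    /* Block-level COW (e.g. DeviceMapper): only the touched block is copied. */
    long blocks = (write_size + block_size - 1) / block_size;
    printf("block-level COW copies %ld bytes\n", blocks * block_size);
    return 0;
}
```

This is why the file-based column grows with file size while the block-based columns stay nearly flat.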

Disk I/O Redundancy

- Block-based drivers introduce many redundant I/Os when reading data from a shared file
  - Degrade I/O performance
  - Waste I/O bandwidth

(Figure: total amount of data read during the startup of 64 containers, block-based vs. file-based drivers)

Cache Redundancy

- Both kinds of storage drivers generate a lot of redundant cached data
  - Block-based: read multiple copies of the same data (the cache cannot be shared)
  - File-based: unchanged data in a file are also copied when performing copy-on-write

Motivation

- Limitations of current storage drivers
  - Tradeoff between write and read performance
  - Low cache efficiency
- Our goal: develop a new storage driver for Docker containers
  - Low COW overhead (i.e., high write performance)
  - High read I/O performance
  - High cache efficiency

Outline

1 Background & Motivation

2 HP-Mapper Design

3 Evaluation

4 Conclusion

HP-Mapper: A High Performance Storage Driver

- Key idea: HP-Mapper works at the block level and manages physical blocks
  - Why block level: low COW overhead
  - Why physical blocks: able to detect redundant I/Os and redundant cached data
- Three modules
  - Address mapper
  - I/O interceptor
  - Cache manager

Address Mapper

- A tradeoff exists in block-based management
  - Large blocks: high COW latency
  - Small blocks: high lookup and storage overhead due to large metadata size
- HP-Mapper uses a two-level mapping tree
  - Supports two different block sizes
  - Differentiates request types with on-demand allocation
    - New write: large sequential I/O (large block size)
    - COW: block size depends on the request size

Address Mapper

- Two-level mapping tree design (a lookup sketch follows the figure)

(Figure: two-level mapping tree — the level-1 tree is indexed by the high bits of the virtual block number (VBN); a level-1 entry either holds a large-block PBN directly or points to the root of a level-2 tree whose entries hold small-block PBNs)

- Storage efficiency: separated block placement + defragmentation
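A minimal sketch of how such a two-level lookup could work. The structure layouts, field names, block sizes, and the flag used to tell a direct large-block mapping from a level-2 subtree are illustrative assumptions, not HP-Mapper's actual format:

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed block sizes: large blocks for new sequential writes,
 * small blocks for fine-grained COW. */
#define LARGE_BLOCK_SHIFT 20   /* 1 MB (assumed) */
#define SMALL_BLOCK_SHIFT 12   /* 4 KB (assumed) */
#define L2_ENTRIES (1u << (LARGE_BLOCK_SHIFT - SMALL_BLOCK_SHIFT))

struct l2_node {               /* level-2 tree: small-block mappings */
    uint64_t small_pbn[L2_ENTRIES];
};

struct l1_entry {              /* level-1 entry, indexed by VBN high bits */
    int is_large;              /* 1: maps a whole large block directly    */
    uint64_t large_pbn;        /* valid when is_large == 1                */
    struct l2_node *l2;        /* valid when is_large == 0                */
};

/* Translate a virtual block number (in small-block units) to a PBN.
 * Returns 0 when the block is unmapped in this layer. */
static uint64_t lookup(struct l1_entry *l1, size_t l1_len, uint64_t vbn)
{
    uint64_t idx = vbn >> (LARGE_BLOCK_SHIFT - SMALL_BLOCK_SHIFT);
    uint64_t off = vbn & (L2_ENTRIES - 1);

    if (idx >= l1_len)
        return 0;
    if (l1[idx].is_large)          /* one mapping covers the whole block */
        return l1[idx].large_pbn + off;
    if (l1[idx].l2)                /* descend into the level-2 tree      */
        return l1[idx].l2->small_pbn[off];
    return 0;
}
```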

I/O Interceptor – Metadata Management

- How to detect redundant I/O
  - Index requests by physical block number (PBN)
- PBN-indexed hash table (an entry sketch follows these two slides)
  - Entries are linked in a two-dimensional LRU list
    - Head entry (latest copy) + tail entries (other copies)

(Figure: hash buckets of head entries, each chaining tail entries; an entry records the PBN key, the number of copies, the superblock, the inode ID, and next/tail pointers)

I/O Interceptor – Workflow

- Detect redundant I/O with the PBN index
- Avoid unnecessary checks (write I/Os to the writable layer are never redundant)
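The fields below follow the ones visible in the slide's entry layout (u64 PBN key, u16 copy count, superblock, inode ID, next/tail pointers); the bucket count and the lookup helper are illustrative assumptions:

```c
#include <stdint.h>

#define PBN_HT_BUCKETS 1024          /* assumed table size */

/* One entry in the PBN-indexed hash table. A head entry represents the
 * latest copy of a physical block; tail entries represent other copies. */
struct pbn_entry {
    uint64_t pbn;                    /* physical block number (hash key)   */
    uint16_t copies_num;             /* number of known copies             */
    uint64_t super_block;            /* identifies the owning filesystem   */
    uint64_t inode_id;               /* file this copy belongs to          */
    struct pbn_entry *next;          /* next head entry in the same bucket */
    struct pbn_entry *tail;          /* list of other copies of this PBN   */
};

static struct pbn_entry *buckets[PBN_HT_BUCKETS];

/* Look up the head entry for a PBN; a hit means this physical block has
 * already been read (or cached), so another disk read would be redundant. */
static struct pbn_entry *pbn_lookup(uint64_t pbn)
{
    struct pbn_entry *e = buckets[pbn % PBN_HT_BUCKETS];
    while (e && e->pbn != pbn)
        e = e->next;
    return e;
}
```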

Cache Manager

- How to remove redundant copies in the cache
  - Periodically scan the hash table to locate cached pages
  - Maintain the hotness of each page (multiple LRU lists)
- Page eviction: cache hit ratio vs. cache usage
  - Limit the number of copies: utilization-aware adjustment
  - Hotness-aware eviction (see the sketch below)
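A minimal sketch of the hotness-aware, copy-limited eviction described above. The data layout, the copies_limit parameter, the hot flag, and the evict_page() helper are assumptions for illustration, not HP-Mapper's actual kernel code:

```c
#include <stdbool.h>

struct cached_copy {
    struct cached_copy *next;        /* other cached copies of the same PBN */
    bool hot;                        /* hotness tracked by per-list LRU     */
    void *page;                      /* the cached page for this copy       */
};

/* Hypothetical helper: release one cached page back to the system. */
void evict_page(void *page);

/* Keep at most copies_limit cached copies of one physical block: walk the
 * copy list, keep hot copies up to the limit, and evict the rest. */
static void trim_copies(struct cached_copy *head, unsigned copies_limit)
{
    unsigned kept = 0;

    for (struct cached_copy *c = head; c; c = c->next) {
        if (c->hot && kept < copies_limit) {
            kept++;                  /* hot copy within the limit: keep it  */
        } else {
            evict_page(c->page);     /* cold copy, or over the limit: evict */
            c->page = NULL;
        }
    }
}
```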

(Figure: the cache manager scans the hash table, locates the cached copies of each PBN through its head and tail entries, monitors their hotness, and with copies_limit = 3 evicts the surplus cold copies before moving on to the next block)

Outline

1 Background & Motivation

2 HP-Mapper Design

3 Evaluation

4 Conclusion

Experiment Setup

- Prototype
  - Implemented as a plugin module in kernel 3.10.0
  - Backing:
- Workloads
  - Container images: Tomcat, Nextcloud, Vault
- Overhead of HP-Mapper

(Figure: overhead of HP-Mapper; MT = mapping tree, HT = hash table)

Reduction of COW Latency

- Copy-on-write (COW) latency

Copy-on-write latency (ms):

  File size      4KB     64KB    1MB     16MB
  DeviceMapper   0.13    0.74    0.96    1.39
  BtrFS          0.09    0.09    0.09    0.10
  Overlay2       1.99    2.49    7.14    61.7
  HP-Mapper      0.07    0.12    0.55    0.57

HP-Mapper reduces COW latency by up to more than 90% compared with DeviceMapper and Overlay2.

Reduction of Redundant I/Os

- Total amount of data read/written when launching 64 containers from a single image

HP-Mapper efficiently removes redundant read I/Os and also reduces the amount of written data by more than 50% on average.

Improvement of Cache Efficiency

- Cache usage when starting 64 containers from a single image

HP-Mapper reduces cache usage by more than 65% on average.

Improvement of Startup Time

- Total startup time when launching 64 containers from a single image, on SSD and HDD

HP-Mapper starts containers 2.0×–7.2× faster than the other three storage drivers.

Improvement of Startup Time

- Total startup time when launching 64 containers in memory-scarce systems

(Figure: total startup time of DM, BtrFS, Overlay2, and HP-Mapper when launching 64 Nextcloud containers with 12/8/4 GB of available memory, and 64 Vault containers with 6/4/2 GB)

HP-Mapper achieves larger improvements as the available memory decreases.

Outline

1 Background & Motivation

2 HP-Mapper Design

3 Evaluation

4 Conclusion

Conclusion

- Tradeoffs exist for Docker storage drivers
  - File-based: high COW overhead
  - Block-based: low cache efficiency and redundant I/O
- We develop HP-Mapper, which achieves
  - Low COW overhead, by following a block-based design with differentiated block sizes
  - High I/O efficiency, by intercepting redundant I/Os
  - High cache efficiency, by enabling cache sharing and hotness-aware management

Thanks!

Q&A

Yongkun Li [email protected] http://staff.ustc.edu.cn/~ykli
