MEGA: Overcoming Traditional Problems with OS Huge Page Management

Theodore Michailidis (University of Athens, Athens, Greece, [email protected])
Alex Delis (University of Athens, Athens, Greece, [email protected])
Mema Roussopoulos (University of Athens, Athens, Greece, [email protected])

ABSTRACT
Modern computer systems now feature memory banks whose aggregate size ranges from tens to hundreds of GBs. In this context, contemporary workloads can and do often consume vast amounts of main memory. This upsurge in memory consumption routinely results in increased virtual-to-physical address translations, and consequently and more importantly, more translation misses. Both of these aspects collectively hamper the performance of workload execution. A solution aimed at dramatically reducing the number of address translation misses has been to provide hardware support for pages with bigger sizes, termed huge pages. In this paper, we empirically demonstrate the benefits and drawbacks of using such huge pages. In particular, we show that it is essential for modern OSes to refine their software mechanisms to manage huge pages more effectively. Based on our empirical observations, we propose and implement MEGA, a framework for huge page support in the Linux kernel. MEGA deploys basic tracking mechanisms and a novel memory compaction algorithm that jointly provide for the effective management of huge pages. We experimentally evaluate MEGA using an array of both synthetic and real workloads and demonstrate that our framework tackles known problems associated with huge pages, including increased page fault latency, memory bloating, and memory fragmentation, while at the same time delivering all huge page benefits.

CCS CONCEPTS
• Software and its engineering → Operating systems; Memory management; Allocation / deallocation strategies;

KEYWORDS
Huge pages, Address Translation, Memory Compaction

ACM Reference Format:
Theodore Michailidis, Alex Delis, and Mema Roussopoulos. 2019. MEGA: Overcoming Traditional Problems with OS Huge Page Management. In The 12th ACM International Systems and Storage Conference (SYSTOR '19), June 3–5, 2019, Haifa, Israel. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3319647.3325839

1 INTRODUCTION
Computer memory capacities have been increasing significantly over the past decades, leading to the development of memory-hungry workloads that consume vast amounts of main memory. This upsurge in memory consumption routinely results in increased virtual-to-physical address translations and, consequently and more importantly, an increase in TLB translation misses. Increased TLB translation misses can cause execution time overheads of up to 50% [2, 4, 13]. Some proposed solutions to this problem, such as increasing the TLB size or adding more page table levels, simply increase average and worst-case translation costs. However, one proposed solution, dating back to the 1990s, aims to dramatically reduce the number of address translation misses by providing hardware support for pages with bigger sizes, known as huge pages. Depending on the architecture, multiple huge page sizes exist, with the most common being 2MB and 1GB. As a result of hardware changes to add huge pages, software techniques in modern OSes have been developed to exploit huge pages and their benefits.

Huge pages can significantly reduce address translations, and consequently address translation misses in the TLB.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
SYSTOR '19, June 3–5, 2019, Haifa, Israel
© 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6749-3/19/06...$15.00
https://doi.org/10.1145/3319647.3325839

Unfortunately, due to poor huge page management in modern operating systems, huge pages also bring a number of problems, such as increased page fault latency, increased memory fragmentation and memory bloating, and high CPU usage. These problems, combined with the fact that for many years TLBs provided limited entries for huge pages, led multiple software vendors to recommend disabling huge page support [11, 20, 24, 33, 34]. However, in recent years, hardware vendors have dramatically increased the number of TLB entries provided for huge pages, making them a more attractive option again. Thus, it is imperative for the OS community to design new sophisticated algorithms that allow operating systems to efficiently exploit huge pages, while minimizing as much as possible their associated shortcomings.

In this paper, we empirically demonstrate the benefits and drawbacks of using huge pages. Based on our empirical observations, we design, implement, and evaluate MEGA: Managing Efficiently Huge Pages, a memory management framework for huge page support in the Linux kernel. The key insight of MEGA is to provide huge pages only when they are needed and when their use does not impair system and workload performance. We achieve this by tracking the locality of virtual-to-physical mappings and their utilization throughout execution. We introduce a novel compaction algorithm that distinguishes physical memory blocks based on their utilization and the age of their last virtual-to-physical mapping.

We experimentally evaluate MEGA using an array of both synthetic and real workloads and demonstrate that, compared to previous work, MEGA tackles more effectively the problems associated with the use of huge pages. Specifically, compared with the mainstream Linux kernel with huge page support, MEGA achieves an order of magnitude lower page fault tail latencies. We show that both Linux and Ingens, a recent, state-of-the-art huge page management framework, suffer from gratuitous use of huge pages due to aggressive policies. On the contrary, MEGA avoids this via intelligent selection of huge page placement. Moreover, our memory compaction algorithm, based on a novel cost-benefit approach, consistently achieves up to 2X the amount of available free memory compared with Ingens, thus greatly facilitating memory allocation.

2 BACKGROUND
In this section, we briefly present how the Linux kernel currently handles some aspects of huge pages and compaction, and then discuss its benefits and drawbacks.

2.1 Hugepage/Superpage OS support
Initial huge page support in Linux was through hugetlbfs [35]; users were able to use huge pages by explicitly requesting them in their application, either via the mmap system call with the MAP_HUGETLB flag or the shmget system call with the SHM_HUGETLB flag [21, 22]. However, this policy burdens the developer, who must explicitly state how and when huge pages must be used, and also does not cope well with non-database workloads [10]. Despite these disadvantages, this approach is currently followed by the macOS and Windows operating systems. Specifically, in macOS, the user asks for a huge page by using the VM_FLAGS_SUPERPAGE_* flag in the mmap system call [23]. On Windows, the user must use MEM_LARGE_PAGES as the allocation type in the VirtualAlloc function [8] to allocate non-pageable memory using large pages [30].

To avoid the drawbacks of hugetlbfs, the Transparent Huge Pages (THP) feature was developed in Linux 2.6.38 [28], released in March 2011. With THP, the handling of huge pages is done transparently by the kernel, without user involvement; however, users can force the kernel to use huge pages by using mmap with the MAP_POPULATE flag. Initially, the kernel uses base pages, i.e., 4KB pages, to satisfy memory requests from the user level. We say the kernel promotes a sequence of 512 properly aligned base pages to a huge page (and demotes a huge page into 512 base pages). Currently, Linux's policy is to promote even a single faulted base page to a huge page. Correspondingly, it demotes a huge page immediately if any portion of its memory is freed.

FreeBSD also uses a transparent model to handle superpages, which is based on reservation-based allocations with variable-sized superpages and incremental promotions/demotions [19].

2.2 Memory compaction
Linux divides physical memory into four memory zones [6]; we briefly introduce how they are used and which subset of memory they occupy in the x86_64 architecture. ZONE_DMA: used for Direct Memory Access (DMA), primarily by Industry Standard Architecture (ISA) devices, which are limited to 24-bit addresses; it contains the first 16MB of memory. ZONE_DMA32: exists only in the x86_64 architecture and is primarily used by hardware that uses 32-bit DMA; it contains physical pages that lie in [16MB, 4GB]. ZONE_NORMAL: contains normal addressable pages and, in the 64-bit kernel, it contains addresses that lie in [4GB, end of memory]. In x86_64, the kernel tries to satisfy user-level memory requests from ZONE_NORMAL, and if there is no available memory, it tries to allocate memory from ZONE_DMA32.

Memory compaction is the process that the kernel follows to defragment the memory.
The current kernel memory compaction algorithm [12] performs the following steps in each zone:
• Initially, two scanners are used; the first one (the migration scanner) starts from the beginning of the zone and keeps track of allocated pages, while the second (the free scanner) starts from the end of the zone and keeps track of free pages. The scanners continue until they meet.
• After the scanners meet, the page migration algorithm is invoked. Initially, for each group of pages that will be