Understanding the Linux® Virtual Memory Manager BRUCE PERENS’ OPEN SOURCE SERIES

Understanding the Linux® Virtual Memory Manager BRUCE PERENS’ OPEN SOURCE SERIES http://www.phptr.com/perens ♦ C++ GUI Programming with Qt 3 Jasmin Blanchette, Mark Summerfield ♦ Managing Linux Systems with Webmin: System Administration and Module Development Jamie Cameron ♦ Understanding the Linux Virtual Memory Manager Mel Gorman ♦ Implementing CIFS: The Common Internet File System Christopher R. Hertel ♦ Embedded Software Development with eCos Anthony J. Massa ♦ Rapid Application Development with Mozilla Nigel McFarlane ♦ The Linux Development Platform: Configuring, Using, and Maintaining a Complete Programming Environment Rafeeq Ur Rehman, Christopher Paul ♦ Intrusion Detection Systems with Snort: Advanced IDS Techniques with Snort, Apache, MySQL, PHP, and ACID Rafeeq Ur Rehman ♦ The Official Samba-3 HOWTO and Reference Guide John H. Terpstra, Jelmer R. Vernooij, Editors ♦ Samba-3 by Example: Practical Exercises to Successful Deployment John H. Terpstra Understanding the Linux® Virtual Memory Manager Mel Gorman PRENTICE HALL PROFESSIONAL TECHNICAL REFERENCE UPPER SADDLE RIVER,NJ07458 WWW.PHPTR.COM Library of Congress Cataloging-in-Publication Data Gorman, Mel. Understanding the Linux Virtual Memory Manager / Mel Gorman. p. cm.—(Bruce Perens’ Open source series) Includes bibliographical references and index. ISBN 0-13-145348-3 1. Linux. 2. Virtual computer systems. 3. Virtual storage (Computer science) I. Title. II. Series. QA76.9.V5G67 2004 005.4’3—dc22 2004043864 Editorial/production supervision: Jane Bonnell Composition: TechBooks Cover design director: Jerry Votta Manufacturing buyer: Maura Zaldivar Executive Editor: Mark L. Taub Editorial assistant: Noreen Regina Marketing manager: Dan DePasquale c 2004 Pearson Education, Inc. Publishing as Prentice Hall Professional Technical Reference Upper Saddle River, New Jersey 07458 This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/). Prentice Hall PTR offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales. For more information, please contact: U.S. Corporate and Government Sales, 1-800-382-3419, [email protected]. For sales outside of the U.S., please contact: International Sales, 1-317-581-3793, [email protected]. Company and product names mentioned herein are the trademarks or registered trademarks of their respective owners. Printed in the United States of America First Printing ISBN 0-13-145348-3 Pearson Education LTD. Pearson Education Australia PTY, Limited Pearson Education South Asia Pte. Ltd. Pearson Education Asia Ltd. Pearson Education Canada, Ltd. Pearson Educación de Mexico, S.A. de C.V. Pearson Education—Japan Pearson Malaysia SDN BHD To John O’Gorman (RIP) for teaching me the joys of operating systems and for making memory management interesting. To my parents and family for their continuous support of my work. To Karen for making all the work seem worthwhile. Contents PREFACE xiii 1INTRODUCTION 1 1.1 Getting Started 2 1.2 Managing the Source 4 1.3 Browsing the Code 9 1.4 Reading the Code 11 1.5 Submitting Patches 12 2DESCRIBING PHYSICAL MEMORY 15 2.1 Nodes 16 2.2 Zones 18 2.3 Zone Initialization 23 2.4 Initializing mem map 24 2.5 Pages 24 2.6 Mapping Pages to Zones 29 2.7 High Memory 29 2.8 What’s New in 2.6 30 3PAGETABLE MANAGEMENT 33 3.1 Describing the Page Directory 33 3.2 Describing a Page Table Entry 36 3.3 Using Page Table Entries 37 3.4 Translating and Setting Page Table Entries 39 3.5 Allocating and Freeing Page Tables 39 3.6 Kernel Page Tables 40 3.7 Mapping Addresses to a struct page 42 3.8 Translation Lookaside Buffer (TLB) 43 vii viii Contents 3.9 Level 1 CPU Cache Management 44 3.10 What’s New in 2.6 47 4PROCESS ADDRESS SPACE 53 4.1 Linear Address Space 53 4.2 Managing the Address Space 55 4.3 Process Address Space Descriptor 57 4.4 Memory Regions 61 4.5 Exception Handling 79 4.6 Page Faulting 80 4.7 Copying to/from Userspace 87 4.8 What’s New in 2.6 90 5BOOT MEMORY ALLOCATOR 95 5.1 Representing the Boot Map 96 5.2 Initializing the Boot Memory Allocator 98 5.3 Initializing bootmem data 98 5.4 Allocating Memory 99 5.5 Freeing Memory 100 5.6 Retiring the Boot Memory Allocator 101 5.7 What’s New in 2.6 102 6 PHYSICAL PAGE ALLOCATION 105 6.1 Managing Free Blocks 105 6.2 Allocating Pages 106 6.3 Free Pages 109 6.4 Get Free Page (GFP) Flags 110 6.5 Process Flags 111 6.6 Avoiding Fragmentation 112 6.7 What’s New in 2.6 113 7 NONCONTIGUOUS MEMORY ALLOCATION 117 7.1 Describing Virtual Memory Areas 117 7.2 Allocating a Noncontiguous Area 118 7.3 Freeing a Noncontiguous Area 120 7.4 What’s New in 2.6 121 8SLABALLOCATOR 123 8.1 Caches 125 8.2 Slabs 137 Contents ix 8.3 Objects 144 8.4 Sizes Cache 146 8.5 Per-CPU Object Cache 148 8.6 Slab Allocator Initialization 150 8.7 Interfacing With the Buddy Allocator 151 8.8 What’s New in 2.6 151 9HIGHMEMORYMANAGEMENT 153 9.1 Managing the PKMap Address Space 153 9.2 Mapping High Memory Pages 154 9.3 Unmapping Pages 156 9.4 Mapping High Memory Pages Atomically 156 9.5 Bounce Buffers 157 9.6 Emergency Pools 159 9.7 What’s New in 2.6 160 10 PAGE FRAME RECLAMATION 163 10.1 Page Replacement Policy 164 10.2 Page Cache 165 10.3 LRU Lists 169 10.4 Shrinking All Caches 173 10.5 Swapping Out Process Pages 173 10.6 Pageout Daemon (kswapd) 175 10.7 What’s New in 2.6 177 11 SWAP MANAGEMENT 179 11.1 Describing the Swap Area 180 11.2 Mapping Page Table Entries to Swap Entries 183 11.3 Allocating a Swap Slot 184 11.4 Swap Cache 185 11.5 Reading Pages From Backing Storage 189 11.6 Writing Pages to Backing Storage 189 11.7 Reading/Writing Swap Area Blocks 192 11.8 Activating a Swap Area 192 11.9 Deactivating a Swap Area 193 11.10 What’s New in 2.6 194 12 SHARED MEMORY VIRTUAL FILESYSTEM 195 12.1 Initializing the Virtual Filesystem 196 12.2 Using shmem Functions 197 x Contents 12.3 Creating Files in tmpfs 199 12.4 Page Faulting Within a Virtual File 201 12.5 File Operations in tmpfs 203 12.6 Inode Operations in tmpfs 203 12.7 Setting Up Shared Regions 204 12.8 System V IPC 204 12.9 What’s New in 2.6 207 13 OUT OF MEMORY MANAGEMENT 209 13.1 Checking Available Memory 209 13.2 Determining OOM Status 210 13.3 Selecting a Process 211 13.4 Killing the Selected Process 211 13.5 Is That It? 211 13.6 What’s New in 2.6 211 14 THE FINAL WORD 213 CODE COMMENTARY AINTRODUCTION 217 BDESCRIBING PHYSICAL MEMORY 219 B.1 Initializing Zones 220 B.2 Page Operations 234 CPAGETABLE MANAGEMENT 239 C.1 Page Table Initialization 240 C.2 Page Table Walking 248 DPROCESS ADDRESS SPACE 251 D.1 Process Memory Descriptors 254 D.2 Creating Memory Regions 261 D.3 Searching Memory Regions 309 D.4 Locking and Unlocking Memory Regions 315 D.5 Page Faulting 328 D.6 Page-Related Disk I/O 355 EBOOT MEMORY ALLOCATOR 395 E.1 Initializing the Boot Memory Allocator 396 Contents xi E.2 Allocating Memory 399 E.3 Freeing Memory 409 E.4 Retiring the Boot Memory Allocator 411 F PHYSICAL PAGE ALLOCATION 419 F.1 Allocating Pages 420 F.2 Allocation Helper Functions 433 F.3 Free Pages 435 F.4 Free Helper Functions 440 G NONCONTIGUOUS MEMORY ALLOCATION 441 G.1 Allocating a Noncontiguous Area 442 G.2 Freeing a Noncontiguous Area 452 HSLABALLOCATOR 457 H.1 Cache Manipulation 459 H.2 Slabs 479 H.3 Objects 486 H.4 Sizes Cache 501 H.5 Per-CPU Object Cache 504 H.6 Slab Allocator Initialization 511 H.7 Interfacing with the Buddy Allocator 512 IHIGHMEMORYMANAGEMENT 513 I.1 Mapping High Memory Pages 514 I.2 Mapping High Memory Pages Atomically 519 I.3 Unmapping Pages 521 I.4 Unmapping High Memory Pages Atomically 523 I.5 Bounce Buffers 524 I.6 Emergency Pools 532 JPAGEFRAME RECLAMATION 535 J.1 Page Cache Operations 537 J.2 LRU List Operations 547 J.3 Refilling inactive list 552 J.4 Reclaiming Pages From the LRU Lists 554 J.5 Shrinking All Caches 562 J.6 Swapping Out Process Pages 566 J.7 Page Swap Daemon 577 xii Contents KSWAPMANAGEMENT 583 K.1 Scanning for Free Entries 585 K.2 Swap Cache 590 K.3 Swap Area I/O 597 K.4 Activating a Swap Area 607 K.5 Deactivating a Swap Area 619 L SHARED MEMORY VIRTUAL FILESYSTEM 633 L.1 Initializing shmfs 635 L.2 Creating Files in tmpfs 641 L.3 File Operations in tmpfs 645 L.4 Inode Operations in tmpfs 659 L.5 Page Faulting Within a Virtual File 668 L.6 Swap Space Interaction 679 L.7 Setting Up Shared Regions 686 L.8 System V IPC 689 MOUTOFMEMORY MANAGEMENT 697 M.1 Determining Available Memory 698 M.2 Detecting and Recovering From OOM 700 REFERENCES 707 CODE COMMENTARY INDEX 711 INDEX 717 ABOUT THE AUTHOR 729 Preface Linux is developed with a stronger practical emphasis than a theoretical one. When new algorithms or changes to existing implementations are suggested, it is common to request code to match the argument.

Understanding the Linux® Virtual Memory Manager BRUCE PERENS’ OPEN SOURCE SERIES

Analysing Aliasing in Java Applications

Better Performance Through a Disk/Persistent-RAM Hybrid Design

Virtual Memory

Automatic Detection of Bad Smells in Code: an Experimental Assessment

Preliminary Report on Empirical Study of Repeated Fragments in Internal Documentation

SFLC V Conservancy

Memory Protection at Option

Chapter 3. Booting Operating Systems

Evaluating Dependency Based Package-Level Metrics for Multi-Objective Maintenance Tasks

ELF1 7D Virtual Memory

Linux at 25 PETERHISTORY H

HALO: Post-Link Heap-Layout Optimisation