Copyright by Emery David Berger 2002

Copyright by Emery David Berger 2002 The Dissertation Committee for Emery David Berger certifies that this is the approved version of the following dissertation: Memory Management for High-Performance Applications Committee: Kathryn S. McKinley, Supervisor James C. Browne Michael D. Dahlin Stephen W. Keckler Benjamin G. Zorn Memory Management for High-Performance Applications by Emery David Berger, M.S., B.S. Dissertation Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy The University of Texas at Austin August 2002 To my wife and family. Acknowledgments I’d like to thank my advisor, Kathryn McKinley, and my de facto co-advisor Ben Zorn, for their invaluable guidance, insights, and help along the way. If not for Kathryn and Ben, it wouldn’t have happened. Really. I’d also like to thank my first advisor, Bobby Blumofe, for helping me to develop a rigorous analytical and experimental approach to systems work. Thanks to the Novell Corporation for supporting my doctoral work with a fellowship, and special thanks to Ben Zorn and Microsoft Research for supporting my doctorate with two internships, a grant, and a Microsoft Research Fellowship. Thanks to Bruce Porter for his advice and support through the years, going above and beyond in rescuing certain unnamed people’s chestnuts from fires, and to Steve Keckler and Calvin Lin, for great advice and hilarious conversation. Systems work requires lots of resources, and the CS staff have been tremendously helpful if not downright necessary, especially Patti Spencer, Fletcher Mattox, Boyd Merworth, and Kay Nettle. It would have been possible, but far less enjoyable, were it not for the life support system that was “the lunch bunch”, who made graduate school arguably a little too fun, except for the grueling spectacle of practice talks: Brendon Cahoon (thanks for bringing me your advisor!), Rich Cardone, Sam Guyer, Daniel Jimenez,´ Ram Mettu, and Phoebe Weidmann. The fact that we were never kicked out of any restaurant/cafe/bar´ is testament v to Austin’s reputation as a tolerant city. It’s been a blast. Thanks to Scott Kaplan and Yannis Smaragdakis for their friendship, support of my work, and advice that probably saved my career multiple times. Thanks to Tim Collins for all of the above, plus for the zeal of his unending quest to help get me through to the other side. Thanks to my mom and dad and to my mother-in-law Joyce for helping us out in so many ways. Finally, I dedicate this thesis to my wife Elayne, the love of my life. El, you are my density. And thanks to our babies, Sophia and Benjamin, just for being wonderful. EMERY DAVID BERGER The University of Texas at Austin August 2002 vi Memory Management for High-Performance Applications Publication No. Emery David Berger, Ph.D. The University of Texas at Austin, 2002 Supervisor: Kathryn S. McKinley Memory managers are a source of performance and robustness problems for ap- plication software. Current general-purpose memory managers do not scale on multipro- cessors, cause false sharing of heap objects, and systematically leak memory. Even on uniprocessors, the memory manager is often a performance bottleneck. General-purpose memory managers also do not provide the bulk deletion semantics required by many applications, including web servers and compilers. The approaches taken to date to address these and other memory management problems have been largely ad hoc. Programmers often attempt to work around these problems by writing custom memory managers. This approach leads to further difficulties, including data corruption caused when programmers inadvertently free custom-allocated objects to the general-purpose memory manager. In this thesis, we develop a framework for analyzing and designing high-quality memory managers. We develop a memory management infrastructure called heap layers that allows programmers to compose efficient memory managers from reusable and inde- vii pendently testable components. We conduct the first comprehensive examination of custom memory managers and show that most of these achieve slight or no performance improve- ments over a state-of-the-art general-purpose memory manager. Building on the knowledge gained in this study, we develop a hybrid memory management abstraction called reaps that combines the best of both approaches, allowing server applications to manage memory quickly and flexibly while avoiding memory leaks. We identify a number of previously unnoticed problems with concurrent memory management and analyze previous work in the light of these discoveries. We then present a concurrent memory manager called Hoard and prove that it avoids these problems. viii Table of Contents Acknowledgments v Abstract vii List of Tables xiv List of Figures xvii Chapter 1 Introduction 1 1.1 Problems with Current Memory Managers . 1 1.2 Contributions . 3 1.3 Summary of Contributions . 5 1.4 Outline . 6 Chapter 2 Background and Related Work 8 2.1 Basic Concepts . 8 2.2 General-Purpose Memory Management . 10 2.3 Memory Management Infrastructures . 11 2.3.1 Vmalloc . 11 ix 2.3.2 CMM . 12 2.4 Custom Memory Management . 13 2.4.1 Construction and Use of Custom Memory Managers . 13 2.4.2 Evaluation of Custom Memory Management . 15 Chapter 3 Experimental Methodology 17 3.1 Benchmarks . 17 3.1.1 Memory-Intensive Benchmarks . 17 3.1.2 General-Purpose Benchmarks . 19 3.2 Platforms . 20 3.3 Execution Environment . 20 Chapter 4 Composing High-Performance Memory Managers 22 4.1 Heap Layers . 24 4.1.1 Example: Composing a Per-Class Allocator . 27 4.1.2 A Library of Heap Layers . 29 4.2 Building Special-Purpose Allocators . 31 4.2.1 197.parser . 31 4.2.2 176.gcc . 33 4.3 Building General-Purpose Allocators . 35 4.3.1 The Kingsley Allocator . 36 4.3.2 The Lea Allocator . 38 4.3.3 Experimental Results . 40 4.4 Software Engineering Benefits . 41 4.5 Heap Layers as an Experimental Infrastructure . 43 x 4.6 Conclusion . 44 Chapter 5 Reconsidering Custom Memory Management 51 5.1 Benchmarks . 52 5.1.1 Emulating Custom Semantics . 54 5.2 Custom Memory Managers . 55 5.2.1 Why Programmers Use Custom Memory Managers . 55 5.2.2 A Taxonomy of Custom Memory Managers . 59 5.3 Evaluating Custom Memory Managers . 61 5.3.1 Evaluating Regions . 63 5.4 Results . 64 5.4.1 Runtime Performance . 64 5.4.2 Memory Consumption . 65 5.4.3 Evaluating the Memory Consumption of Region Allocation . 67 5.5 Discussion . 68 5.6 Conclusions . 70 Chapter 6 Memory Management for Servers 71 6.1 Drawbacks of Regions . 72 6.2 Desiderata . 73 6.3 Reaps: Generalizing Regions and Heaps . 73 6.3.1 Design and Implementation . 74 6.4 Results . 76 6.4.1 Runtime Performance . 77 6.4.2 Memory Consumption . 78 xi 6.4.3 Experimental Comparison to Previous Work . 78 6.4.4 Reap in Apache . 79 6.5 Conclusion . 80 Chapter 7 Scalable Concurrent Memory Management 82 7.1 Motivation . 85 7.1.1 Allocator-Induced False Sharing of Heap Objects . 86 7.1.2 Blowup . 88 7.2 The Hoard Memory Allocator . 90 7.2.1 Bounding Blowup . 91 7.2.2 Example . 92 7.2.3 Avoiding False Sharing . 94 7.3 Analytical Results . 96 7.4 Bounds on Blowup . 96 7.4.1 Proof . 97 7.5 Bounds on Synchronization . 98 7.5.1 Per-processor Heap Contention . 98 7.5.2 Global Heap Contention . 99 7.6 Experimental Results . 100 7.6.1 Speed . 102 7.6.2 Scalability . 103 7.7 False sharing . 107 7.8 Fragmentation . 109 7.8.1 Single-threaded Applications . 109 7.8.2 Multithreaded Applications . 110 xii 7.8.3 Sensitivity Study . 110 7.9 Related Work . 112 7.9.1 Taxonomy of Memory Allocator Algorithms . 112 7.9.2 Single Heap Allocation . 113 7.9.3 Multiple Heap Allocation . 114 7.10 Conclusion . 117 Chapter 8 Conclusion 119 8.1 Contributions . 119 8.2 Future Work . 120 8.3 Availability of Software . 122 Bibliography 123 Vita 134 xiii List of Tables 3.1 Memory-intensive benchmarks. All programs are written in C, except as noted. 18 3.2 Statistics for the memory-intensive benchmarks. We divide by runtime with the Lea allocator to obtain memory operations per second. 18 3.3 General-purpose benchmarks. All programs are written in C, except as noted. 19 3.4 Statistics for the General-Purpose Benchmark suite. 19 3.5 Platform characteristics. The number in parenthesis after CPU clock speed indicates the number of processors. In every case, the L2 caches are unified and the L1 caches are not. 20 4.1 Executable sizes for variants of 197.parser. 32 4.2 A library of heap layers, divided by category. 46 4.3 Runtime (in seconds) for the general-purpose allocators described in this chapter. See Figure 4.9(a) for the graph of this data normalized to the Lea allocator. 47 xiv 4.4 Memory consumption (in bytes) for the general-purpose allocators.

Load more