Optimizing Main Memory Usage in Modern Computing Systems to Improve Overall System Performance Daniel Jose Campello Florida International University, [email protected]
Florida International University
FIU Digital Commons

FIU Electronic Theses and Dissertations
University Graduate School

6-20-2016

Optimizing Main Memory Usage in Modern Computing Systems to Improve Overall System Performance

Daniel Jose Campello
Florida International University, [email protected]

DOI: 10.25148/etd.FIDC000755

Part of the Data Storage Systems Commons, OS and Networks Commons, Systems Architecture Commons, and the Theory and Algorithms Commons

Recommended Citation
Campello, Daniel Jose, "Optimizing Main Memory Usage in Modern Computing Systems to Improve Overall System Performance" (2016). FIU Electronic Theses and Dissertations. 2568. https://digitalcommons.fiu.edu/etd/2568

This work is brought to you for free and open access by the University Graduate School at FIU Digital Commons. It has been accepted for inclusion in FIU Electronic Theses and Dissertations by an authorized administrator of FIU Digital Commons.

FLORIDA INTERNATIONAL UNIVERSITY
Miami, Florida

OPTIMIZING MAIN MEMORY USAGE IN MODERN COMPUTING SYSTEMS TO IMPROVE OVERALL SYSTEM PERFORMANCE

A dissertation submitted in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
in
COMPUTER SCIENCE
by
Daniel Campello
2016

To: Interim Dean Ranu Jung
College of Engineering and Computing

This dissertation, written by Daniel Campello, and entitled Optimizing Main Memory Usage in Modern Computing Systems to Improve Overall System Performance, having been approved in respect to style and intellectual content, is referred to you for judgment.

We have read this dissertation and recommend that it be approved.

Giri Narasimhan
Jason Liu
Gang Quan
Ming Zhao
Raju Rangaswami, Major Professor

Date of Defense: June 20, 2016

The dissertation of Daniel Campello is approved.

Interim Dean Ranu Jung
College of Engineering and Computing

Andrés G. Gil
Vice President for Research and Economic Development and Dean of the University Graduate School

Florida International University, 2016

DEDICATION

To Isaac, Tabata, Marina and Juan José.

ACKNOWLEDGMENTS

I wish to acknowledge my research sponsors, including NSF (via grants CNS-1320426, CNS-1018262, and CNS-0747038), gifts from Intel and NetApp, and USENIX's travel scholarships.

ABSTRACT OF THE DISSERTATION

OPTIMIZING MAIN MEMORY USAGE IN MODERN COMPUTING SYSTEMS TO IMPROVE OVERALL SYSTEM PERFORMANCE
by
Daniel Campello
Florida International University, 2016
Miami, Florida
Professor Raju Rangaswami, Major Professor

Operating systems use fast, CPU-addressable main memory to maintain an application's temporary data as anonymous data and to cache copies of persistent data stored in slower block-based storage devices. However, this faster memory comes at a high cost. Therefore, the literature has proposed several techniques to use main memory more efficiently. In this thesis we introduce three distinct approaches to improve overall system performance by optimizing main memory usage.

First, DRAM and host-side caching of file system data are used to speed up virtual machine performance in today's virtualized data centers. Clustering VM images that share identical pages, coupled with data deduplication, has the potential to improve main memory usage, because it creates more opportunities to share resources across processes and across different VMs. In our first approach, we study the use of content and semantic similarity metrics and a new algorithm to cluster VM images and place them on hosts where deduplication improves main memory usage.

Second, while careful VM placement can improve memory usage by eliminating duplicate data, caches in current systems employ complex machinery to manage the cached data.
Writing data to a page not present in the file system page cache causes the operating system to synchronously fetch the page into memory, blocking the writing process. In this thesis, we address this limitation with a new approach to managing page writes that buffers the written data elsewhere in memory and unblocks the writing process immediately. This buffering allows the system to service file writes faster and with fewer memory resources.

In our last approach, we investigate the use of emerging byte-addressable persistent memory technology to extend main memory as a less costly alternative to using expensive DRAM exclusively. We motivate and build a tiered memory system wherein persistent memory and DRAM coexist and provide improved application performance at lower cost and power consumption, with the goal of placing the right data in the right memory tier at the right time. The proposed approach seamlessly migrates pages across memory tiers as access patterns change and/or to relieve memory pressure within a tier.

TABLE OF CONTENTS

1. INTRODUCTION
2. PROBLEM STATEMENT
  2.1 Thesis Statement
  2.2 Thesis Statement Description
  2.3 Thesis Impact
3. BACKGROUND
  3.1 Cache Deduplication
  3.2 Operating System Caching
  3.3 Emerging Memory Technologies
4. CORIOLIS: SCALABLE VM CLUSTERING FOR CACHE DEDUPLICATION
  4.1 VM Clustering: An Overview
  4.2 VM Similarity: Types and Applications
    4.2.1 Content Similarity
    4.2.2 Semantic Similarity
    4.2.3 Harnessing Image Similarity
  4.3 Similarity-based VM Clustering
    4.3.1 A Representative Clustering Algorithm
    4.3.2 A Similarity Function for Images
    4.3.3 Scaling Challenge
  4.4 CORIOLIS
    4.4.1 Solution Idea: Asymmetric Clustering
    4.4.2 CORIOLIS Architecture
    4.4.3 CORIOLIS' Tree-based Clustering
    4.4.4 Scalability Evaluation
  4.5 Summary
  4.6 Credits
5. NON-BLOCKING WRITES TO FILES
  5.1 Motivating Non-blocking Writes to Files
    5.1.1 Addressing the fetch-before-write problem
    5.1.2 Addressing Correctness
  5.2 Approach Overview
    5.2.1 Write Handling
    5.2.2 Patch Management
    5.2.3 Non-blocking Reads
  5.3 Alternative Page Fetch Modes
    5.3.1 Asynchronous Page Fetch (NBW-Async)
    5.3.2 Lazy Page Fetch (NBW-Lazy)
  5.4 Implementation
    5.4.1 Overview
    5.4.2 Implementation Insights
  5.5 Evaluation
    5.5.1 Filebench Micro-benchmark
    5.5.2 SPECsfs2008 Macro-benchmark
    5.5.3 MobiBench Trace Replay
  5.6 Summary
  5.7 Credits
6. MANAGING TIERED MEMORY SYSTEMS WITH MULTI-CLOCK
  6.1 Motivation
    6.1.1 Swapping vs. Tiering
    6.1.2 Static Tiering
    6.1.3 Dynamic Tiering
  6.2 MULTI-CLOCK
    6.2.1 Life Cycle of a Page
    6.2.2 Promotion Mechanism
    6.2.3 Demotion Mechanism
  6.3 Implementation
  6.4 Evaluation
    6.4.1 Emulation Platform
    6.4.2 Micro-benchmark
    6.4.3 GraphLab
    6.4.4 Memcached YCSB benchmark
    6.4.5 VoltDB TPC-C benchmark
  6.5 Discussion
  6.6 Summary
  6.7 Credits
7. RELATED WORK
  7.1 CORIOLIS: Scalable VM Clustering in Clouds
  7.2 Non-blocking Writes to Files
  7.3 Managing Tiered Memory Systems with MULTI-CLOCK
8. CONCLUSIONS
BIBLIOGRAPHY
VITA

LIST OF TABLES

3.1 Comparison of Memory Technologies [DRZ+16]
4.1 Similarity types relevant for each use case
4.2 Time for Similarity and Merge operations
5.1 Workloads traced and their descriptions
5.2 SPECsfs2008 write sizes
6.1 Comparison of Memory Technologies [DRZ+16]
6.2 Linux source code modifications in number of lines

LIST OF FIGURES

3.1 Anatomy of a write
4.1 Distribution of content and semantic similarity for 25 VM image pairs
4.2 CORIOLIS System Context
4.3 CORIOLIS architecture
4.4 Tree-based clustering
4.5 Clustering a new image F
4.6 Scalability of k-medoids and CORIOLIS' tree-based clustering algorithms
5.1 A non-blocking write employing asynchronous fetch
5.2 Breakdown of write operations by amount of page data overwritten
5.3 Non-blocking writes as a percentage of total write operations
5.4 A non-blocking write employing lazy fetch