Eth-28647-02.Pdf

Research Collection Doctoral Thesis Adaptive main memory compression Author(s): Tuduce, Irina Publication Date: 2006 Permanent Link: https://doi.org/10.3929/ethz-a-005180607 Rights / License: In Copyright - Non-Commercial Use Permitted This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use. ETH Library Doctoral Thesis ETH No. 16327 Adaptive Main Memory Compression A dissertation submitted to the Swiss Federal Institute of Technology Zurich (ETH ZÜRICH) for the degree of Doctor of Technical Sciences presented by Irina Tuduce Engineer, TU Cluj-Napoca born April 9, 1976 citizen of Romania accepted on the recommendation of Prof. Dr. Thomas Gross, examiner Prof. Dr. Lothar Thiele, co-examiner 2005 Seite Leer / Blank leaf Seite Leer / Blank leaf Seite Leer lank Abstract Computer pioneers correctly predicted that programmers would want unlimited amounts of fast memory. Since fast memory is expensive, an economical solution to that desire is a memory hierarchy organized into several levels. Each level has greater capacity than the preceding but it is less quickly accessible. The goal of the memory hierarchy is to provide a memory system with cost almost as low as the cheapest level of memory and speed almost as fast as the fastest level. In the last decades, the processor performance improved much faster than the performance of the memory levels. The memory hierarchy proved to be a scalable solution, i.e., the bigger the perfomance gap between processor and memory, the more levels are used in the memory hierarchy. For instance, in 1980 microprocessors were often designed without caches, while in 2005 most of them come with two levels of caches on the chip. Since the fast upper memory levels are small, programs with poor locality tend to access data from the lower levels of the memory hierarchy. Therefore, these programs run slower than programs with good locality. The sizes of all memory levels have increased continuously. Fol¬ lowing this trend, the applications would fit in higher (and faster) memory levels. However, the application developers have even more agresively increased their memory demands. Appli¬ cations with large memory requirements and poor locality are becoming increasingly popular, as people attend to solve large problems (e.g., network simulators, traffic simulators, model checking, databases). Given the technology and application trends, efficiently executing large applications on a hierarchy of memories remains a challenge. Given that the fast memory levels are close to the processor, bringing the working set of an application closer to the processor tends to improve the performance of the application. An approach to bringing the application's data closer to the processor is compressing one of the existing memory levels. This approach is becoming increasigly attractive as processors become faster and more cycles can be dedicated to (de)compressing data. This thesis provides an example of efficiently designing and implementing a compressed- memory system. We choose to investigate compression at the main memory level (RAM) be¬ cause the management of this level is done in software (thus allowing for rapid prototyping and validation). Although our design and implementation are specific to main memory compression, the concepts described are general and can be applied to any level in the memory hierarchy. The key idea of main memory compression is to set aside part of main memory to hold compressed data. By compressing some of the data space, the effective memory size available to the applications is made larger and disk accesses are avoided. One of the thomy issues is that sizing the region that holds compressed data is difficult and if not done right (i.e., the region is too large or too small) memory compression slows down the application. There are two claims that make up the core of the thesis. First, the thesis shows that it is possible to implement main memory compression in an efficient way, meaning that while v vi applications execute the size of memory that stores compressed data can be changed easily if it advisable to do so. We describe a practical design for an adaptive compressed-memory system and demonstrate that it can be integrated into an existing general-purpose operating system. The key idea of our design is to keep compressed pages in self-contained zones, and to grow and shrink the compressed region by adding and removing zones. Second, the thesis shows that it is possible to estimate - during an application's execution - how much data should be compressed such that compression improves the application's perfor¬ mance. The technique we proposed is based on an application's execution profile and finds the compressed region size such that the application's working set fits into uncompressed and com¬ pressed regions. If such a size does not exist or compression hurts performance, our technique turns off compression. This way, if compression cannot improve an application's performance, it will not hurt it either. To determine whether compression hurts performance, we compare - at runtime - an application's performance on the compressed-memory system with an estimation of its performance without compression. The performance estimation is based on the memory access pattern of the application, efficiency of the compression algorithm and amount of data being compressed. The design proposed in this thesis is implemented in Linux OS, runs on both 32-bit and 64-bit architectures, and has been demonstrated to work in practice under complex workload conditions and memory pressure. To assess the benefits of main memory compression, we use benchmarks and real applications that have large memory requirements. The applications used are: Symbolic Model Verifier (SMV), NS2 network simulator, and qsim car traffic simulator. For the selected benchmarks and applications, the system shows an increase in performance by a factor of 1.3 to 55. To sum up, this thesis shows that a compressed-memory level is a beneficial addition to the classical memory hierarchy, and this addition can be provided without significant effort. The compressed-memory level exploits the tremendous advances in processor speed that have not been matched by corresponding increases in memory performance. Therefore, if access times to memory and disk continue to improve over the next decade at the same rate as they did during the last decade (the likely scenario), compressed-memory systems are an attractive approach to improve total system performance. Zusammenfassung Pioniere der Informatik haben korrekt vorhergesehen, dass Programmierer unbegrenzten Bedarf nach schnellem Speicher haben würden. Schneller Speicher ist jedoch teuer. Eine ökonomische Lösung zur Zufriedenstellung des Bedarfs ist eine in mehrere Ebenen aufgeteilte Speicherhier¬ archie. Jede Ebene bietet eine grössere Kapazität als die vorgängige, jedoch eine geringere Zugriffsgeschwindigkeit. Das Ziel der Speicherhierarchie ist die Bereitstellung eines Speicher¬ systems zu Kosten, die fast der günstigsten Ebene entsprechen, und mit einer Geschwindigkeit, die fast der schnellsten Ebene entspricht. Die Leistungsfähigkeit von Prozessoren hat in den letzten Jahrzehnten markant schneller zugenommen als diejenige der Speicherebenen. Die Idee des hierarchischen Speichersystems hat sich dabei als skalierbare Lösung erwiesen; die Anzahl der verwendeten Ebenen wird vergrössert, umso grösser die Differenz zwischen der Leistungsfähigkeit von Prozessoren und Speicher wird. 1980 wurden Mikroprozessoren zum Beispiel oft ohne Caches entworfen; 2005 besitzen jedoch die meisten Prozessoren zwei Cacheebenen direkt auf dem Prozessorchip. Da die schnellen höheren Speicherebenen klein sind, tendieren Programme mit schlechter Lokalität dazu, auf Daten in den tieferen Speicherebenen zuzugreifen. Das führt zu Geschwindigkeitseinbussen im Vergleich zu Programmen mit guter Lokalität. Trotz des kon¬ tinuierlichen Wachstums aller Speicherebenen hat sich diese Situation nicht verbessert, da die Nachfrage nach Speicher durch Anwendungsentwickler in noch stärkerem Masse zugenommen hat. Das Lösen immer grösserer Problemstellungen (z.B. Netzwerksimulation, Verkehrssimu¬ lation, Model-Checking, Datenbanken) führt vermehrt zu Applikationen mit hohen Speicheran¬ forderungen und schlechter Lokalität. Unter diesen Technologie- und Anwendungstrends bleibt die effiziente Ausführung von grossen Applikationen eine Herausforderung. Wenn das Working Set einer Applikation näher zum Prozessor gebracht wird, kann damit tendenziell die Ausführungsgeschwindigkeit erhöht werden, da die schnelleren Speicherebenen nahe beim Prozessor liegen. Ein Ansatz, um die Daten einer Applikation näher zum Prozessor zu bringen, besteht in der Komprimierung einer bestehenden Speicherebene. Dieser Ansatz wird zunehmend attraktiver, bedingt durch die Zunahme der Prozessorgeschwindigkeit, die es erlaubt einen Teil der Prozessorzeit zur Komprimierung und Entkomprimierung von Daten aufzuwenden. Diese Doktorarbeit beschreibt den Entwurf und die Implementation eines effizienten Spe¬ ichersystems mit Speicherkomprimierung. Wir fokussieren uns dabei auf die Komprimierung auf Hauptspeicherebene (RAM). Diese Ebene wird softwareseitig verwaltet, was eine schnelle Prototypisierung und Validation ermöglicht. Obwohl unser Entwurf und die Implementation spezifisch auf die Hauptspeicherkomprimierung ausgerichtet sind, können die beschriebenen Konzepte auf alle Ebenen der Speicherhierarchie

Eth-28647-02.Pdf

Memory Interference Characterization Between CPU Cores and Integrated Gpus in Mixed-Criticality Platforms

A Cache-Efficient Sorting Algorithm for Database and Data Mining

How to Solve the Current Memory Access and Data Transfer Bottlenecks: at the Processor Architecture Or at the Compiler Level?

Prefetching for Complex Memory Access Patterns

Detecting False Sharing Efficiently and Effectively

A Hybrid Analytical DRAM Performance Model

Performance Analysis of Cache Memory

Classifying Memory Access Patterns for Prefetching

STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud

Embedded Memory Architecture for Low-Power Application Processor

A Visual Performance Analysis Tool for Memory Bound GPU Kernels

Predicting Multiprocessor Memory Access Patterns with Learning Models