Proceedings of the Linux Symposium


Volume One

June 27th–30th, 2007
Ottawa, Ontario, Canada

Contents

The Price of Safety: Evaluating IOMMU Performance  9
    Ben-Yehuda, Xenidis, Ostrowski, Rister, Bruemmer, & Van Doorn
Linux on Cell Broadband Engine status update  21
    Arnd Bergmann
Linux Kernel Debugging on Google-sized clusters  29
    M. Bligh, M. Desnoyers, & R. Schultz
Ltrace Internals  41
    Rodrigo Rubira Branco
Evaluating effects of cache memory compression on embedded systems  53
    Anderson Briglia, Allan Bezerra, Leonid Moiseichuk, & Nitin Gupta
ACPI in Linux – Myths vs. Reality  65
    Len Brown
Cool Hand Linux – Handheld Thermal Extensions  75
    Len Brown
Asynchronous System Calls  81
    Zach Brown
Frysk 1, Kernel 0?  87
    Andrew Cagney
Keeping Kernel Performance from Regressions  93
    T. Chen, L. Ananiev, and A. Tikhonov
Breaking the Chains—Using LinuxBIOS to Liberate Embedded x86 Processors  103
    J. Crouse, M. Jones, & R. Minnich
GANESHA, a multi-usage with large cache NFSv4 server  113
    P. Deniel, T. Leibovici, & J.-C. Lafoucrière
Why Virtualization Fragmentation Sucks  125
    Justin M. Forbes
A New Network File System is Born: Comparison of SMB2, CIFS, and NFS  131
    Steven French
Supporting the Allocation of Large Contiguous Regions of Memory  141
    Mel Gorman
Kernel Scalability—Expanding the Horizon Beyond Fine Grain Locks  153
    Corey Gough, Suresh Siddha, & Ken Chen
Kdump: Smarter, Easier, Trustier  167
    Vivek Goyal
Using KVM to run Xen guests without Xen  179
    R.A. Harper, A.N. Aliguori, & M.D. Day
Djprobe—Kernel probing with the smallest overhead  189
    M. Hiramatsu and S. Oshima
Desktop integration of Bluetooth  201
    Marcel Holtmann
How virtualization makes power management different  205
    Yu Ke
Ptrace, Utrace, Uprobes: Lightweight, Dynamic Tracing of User Apps  215
    J. Keniston, A. Mavinakayanahalli, P. Panchamukhi, & V. Prasad
kvm: the Linux Virtual Machine Monitor  225
    A. Kivity, Y. Kamay, D. Laor, U. Lublin, & A. Liguori
Linux Telephony  231
    Paul P. Komkoff, A. Anikina, & R. Zhnichkov
Linux Kernel Development  239
    Greg Kroah-Hartman
Implementing Democracy  245
    Christopher James Lahey
Extreme High Performance Computing or Why Microkernels Suck  251
    Christoph Lameter
Performance and Availability Characterization for Linux Servers  263
    Linkov & Koryakovskiy
“Turning the Page” on Hugetlb Interfaces  277
    Adam G. Litke
Resource Management: Beancounters  285
    Pavel Emelianov, Denis Lunev, and Kirill Korotaev
Manageable Virtual Appliances  293
    D. Lutterkort
Everything is a virtual filesystem: libferris  303
    Ben Martin

Conference Organizers

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering
C. Craig Ross, Linux Symposium

Review Committee

Andrew J. Hutton, Steamballoon, Inc., Linux Symposium, Thin Lines Mountaineering
Dirk Hohndel, Intel
Martin Bligh, Google
Gerrit Huizenga, IBM
Dave Jones, Red Hat, Inc.
C. Craig Ross, Linux Symposium

Proceedings Formatting Team

John W. Lockhart, Red Hat, Inc.
Gurhan Ozen, Red Hat, Inc.
John Feeney, Red Hat, Inc.
Len DiMaggio, Red Hat, Inc.
John Poelstra, Red Hat, Inc.

Authors retain copyright to all submitted papers, but have granted unlimited redistribution rights to all as a condition of submission.
The Price of Safety: Evaluating IOMMU Performance

Muli Ben-Yehuda, IBM Haifa Research Lab
Jimi Xenidis, IBM Research
Michal Ostrowski, IBM Research
Karl Rister, IBM LTC
Alexis Bruemmer, IBM LTC
Leendert Van Doorn, AMD

Abstract

An I/O Memory Management Unit (IOMMU) creates one or more unique address spaces which can be used to control how a DMA operation, initiated by a device, accesses host memory. This functionality was originally introduced to increase the addressability of a device or bus, particularly when 64-bit host CPUs were being introduced while most devices were designed to operate in a 32-bit world. The uses of IOMMUs were later extended to restrict the host memory pages that a device can actually access, thus providing an increased level of isolation, protecting the system from user-level device drivers and eventually virtual machines. Unfortunately, this additional logic does impose a performance penalty.

The widespread introduction of IOMMUs by Intel [1] and AMD [2] and the proliferation of virtual machines will make IOMMUs a part of nearly every computer system. There is no doubt with regard to the benefits IOMMUs bring... but how much do they cost? We seek to quantify, analyze, and eventually overcome the performance penalties inherent in the introduction of this new technology.

Although they provide valuable services, IOMMUs can impose a performance penalty due to the extra memory accesses required to perform DMA operations. The exact performance degradation depends on the IOMMU design, its caching architecture, the way it is programmed, and the workload. This paper presents the performance characteristics of the Calgary and DART IOMMUs in Linux, both on bare metal and in a hypervisor environment. The throughput and CPU utilization of several IO workloads, with and without an IOMMU, are measured and the results are analyzed. The potential strategies for mitigating the IOMMU's costs are then discussed. In conclusion, a set of optimizations and resulting performance improvements are presented.

1 Introduction

IOMMUs, IO Memory Management Units, are hardware devices that translate device DMA addresses to machine addresses. An isolation-capable IOMMU restricts a device so that it can only access parts of memory it has been explicitly granted access to. Isolation-capable IOMMUs perform a valuable system service by preventing rogue devices from performing errant or malicious DMAs, thereby substantially increasing the system's reliability and availability. Without an IOMMU, a peripheral device could be programmed to overwrite any part of the system's memory. Operating systems utilize IOMMUs to isolate device drivers; hypervisors utilize IOMMUs to grant secure direct hardware access to virtual machines. With the imminent publication of the PCI-SIG's IO Virtualization standard, as well as Intel and AMD's introduction of isolation-capable IOMMUs in all new servers, IOMMUs will become ubiquitous.

1.1 IOMMU design

A broad description of current and future IOMMU hardware and software designs from various companies can be found in the OLS '06 paper entitled Utilizing IOMMUs for Virtualization in Linux and Xen [3]. The design of a system with an IOMMU can be broadly broken down into the following areas:

• IOMMU hardware architecture and design.

• Hardware ↔ software interfaces.
• Pure software interfaces (e.g., between userspace and kernelspace, or between kernelspace and hypervisor).

It should be noted that these areas can and do affect each other: the hardware/software interface can dictate some aspects of the pure software interfaces, and the hardware design dictates certain aspects of the hardware/software interfaces.

This paper focuses on two different implementations of the same IOMMU architecture that revolves around the basic concept of a Translation Control Entry (TCE). TCEs are described in detail in Section 1.1.2.

1.1.1 IOMMU hardware architecture and design

Just as a CPU-MMU requires a TLB with a very high hit-rate in order to not impose an undue burden on the system, so does an IOMMU require a translation cache to avoid excessive memory lookups. These translation caches are commonly referred to as IOTLBs.

The performance of the system is affected by several cache-related factors:

• The cache size and associativity [13].

• The cache replacement policy.

• The cache invalidation mechanism, and the frequency and cost of invalidations.

The optimal cache replacement policy for an IOTLB is probably significantly different than for an MMU-TLB. MMU-TLBs rely on spatial and temporal locality to achieve a very high hit-rate. DMA addresses from devices, however, do not necessarily have temporal or spatial locality. Consider for example a NIC which DMAs received packets directly into application buffers: packets [...]

[...] IOMMU (which will be discussed later in detail) does not provide a software mechanism for invalidating a single cache entry—one must flush the entire cache to invalidate an entry. We present a related optimization in Section 4.

It should be mentioned that the PCI-SIG IOV (IO Virtualization) working group is working on an Address Translation Services (ATS) standard. ATS brings in another level of caching, by defining how I/O endpoints (i.e., adapters) inter-operate with the IOMMU to cache translations on the adapter and communicate invalidation requests from the IOMMU to the adapter. This adds another level of complexity to the system, which needs to be overcome in order to find the optimal caching strategy.

1.1.2 Hardware ↔ Software Interface

The main hardware/software interface in the TCE family of IOMMUs is the Translation Control Entry (TCE). TCEs are organized in TCE tables. TCE tables are analogous to page tables in an MMU, and TCEs are similar to page table entries (PTEs). Each TCE identifies a 4KB page of host memory and the access rights that the bus (or device) has to that page. The TCEs are arranged in a contiguous series of host memory pages that comprise the TCE table. The TCE table creates a single unique IO address space (DMA address space) for all the devices that share it.

The translation from a DMA address to a host memory address occurs by computing an index into the TCE table by simply extracting the page number from the DMA address. The index is used to compute a direct offset into the TCE table that results in a TCE that translates that IO page. The access control bits are then used to validate both the translation and the access rights to the host memory page. Finally, the translation is used by the bus to direct a DMA transaction to a specific location in host memory.
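To make the translation walk concrete, the following is a minimal sketch in C of the lookup just described. It is illustrative only: the names (tce_t, tce_translate, TCE_READ, TCE_WRITE, TCE_RPN_MASK) and the exact bit layout are invented for exposition and do not correspond to the real Calgary or DART table formats. The sketch assumes 4KB IO pages and a flat, contiguous TCE table indexed directly by IO page number, as the text describes.

```c
/*
 * Illustrative sketch of a TCE lookup; NOT the actual Calgary/DART
 * driver code. Field layout and names are invented for exposition.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IO_PAGE_SHIFT  12                            /* 4KB IO pages */
#define IO_PAGE_SIZE   (UINT64_C(1) << IO_PAGE_SHIFT)
#define IO_PAGE_MASK   (IO_PAGE_SIZE - 1)

/* Hypothetical access-control bits kept in the low bits of a TCE. */
#define TCE_READ       UINT64_C(0x1)
#define TCE_WRITE      UINT64_C(0x2)
/* Host page frame (real page number) occupies the high bits. */
#define TCE_RPN_MASK   (~(uint64_t)0 << IO_PAGE_SHIFT)

typedef uint64_t tce_t;                              /* one TCE per IO page */

/*
 * Translate a device-visible DMA address into a host memory address
 * the way the text describes the hardware doing it: extract the IO
 * page number, use it as a direct index into the TCE table, validate
 * the access bits, then splice the page offset back on.
 */
static bool tce_translate(const tce_t *tce_table, size_t nr_entries,
                          uint64_t dma_addr, bool is_write,
                          uint64_t *host_addr)
{
    uint64_t index = dma_addr >> IO_PAGE_SHIFT;      /* IO page number */

    if (index >= nr_entries)
        return false;                                /* outside the IO address space */

    tce_t tce = tce_table[index];                    /* direct offset into the table */

    if (!(tce & (is_write ? TCE_WRITE : TCE_READ)))
        return false;                                /* errant/malicious DMA rejected */

    *host_addr = (tce & TCE_RPN_MASK) | (dma_addr & IO_PAGE_MASK);
    return true;
}
```

Note how a failed check corresponds to the isolation property discussed in the Introduction: a DMA to a page the device was never granted simply does not translate. Note also that each translation touches the TCE table in host memory, which is exactly the extra memory access the IOTLB exists to avoid.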