Porting Cilk to the Barrelfish OS

Master of Science Thesis
KTH Information and Communication Technology
Stockholm, Sweden 2013
TRITA-ICT-EX-2013:66

KTH Royal Institute of Technology
Dept. of Software and Computer Systems
Degree project for the degree of Master of Science in Information and Communications Technology

Author: Chau Ho Bao Le
Supervisor: Georgios Varisteas, MSc
Examiner: Prof. Mats Brorsson, KTH, Sweden

Abstract

The Barrelfish operating system is an experimental instance of the multikernel structure, which offers desirable properties such as support for hardware heterogeneity, scalability, and dynamicity. Barrelfish is still under development and lacks applications. There is therefore a need to investigate how efficiently applications run on Barrelfish, and shared-memory applications are one candidate. For this empirical study, Cilk is chosen because its runtime library is designed for shared-memory architectures and it is known to deliver good performance. This thesis focuses on making Cilk run on top of Barrelfish in order to reach two goals: portability, which Barrelfish is described as supporting, and, subsequently, good execution speed. The porting involves compiling the Cilk runtime source code with its pthread subroutines replaced by a set of Barrelfish APIs, and then changing the way the Cilk scheduler spawns worker threads on multiple cores. The main point of the porting, however, is to make different cores access the same virtual address space. Fortunately, Barrelfish provides the notion of a domain, which specifies the number of cores used by an application so that these cores can share the same memory space. This thesis also benchmarks several Cilk programs and finds that Cilk does not perform as well as expected. In addition, measurements on parallel workers show that Cilk on Barrelfish takes more cycles to perform the same computation.
Although Cilk still maintains the work-first principle, it does not achieve its theoretical time bound. The cost of spanning a domain grows in proportion to the number of cores, but it matters mainly for applications with short running times.

Key words: Barrelfish, Cilk, porting, multikernel, shared-memory, work-stealing, message passing

Acknowledgment

I would like to express my gratitude to Professor Mats Brorsson, who gave me good advice at the very first steps of this thesis and was patient with my silly questions.

This thesis has benefited from the close supervision of PhD researcher Georgios Varisteas. I am deeply thankful to him for his suggestions and instructions, and for sharing his experience in this field.

Finally, special thanks go to my family and friends for their love and support throughout my studies.

Stockholm, March 20, 2013
Chau Ho Bao Le

Contents

1 Introduction
  1.1 Overview
  1.2 Problem statement
  1.3 Related work
  1.4 Report layout
2 Background
  2.1 OSes on multiple processors
    2.1.1 Factored Operating System (fos)
    2.1.2 Tessellation
    2.1.3 Barrelfish
  2.2 Parallel programming models
    2.2.1 Shared memory
      2.2.1.1 Task-centric or task-based model
      2.2.1.2 Explicit threading
    2.2.2 Message passing
  2.3 Porting
3 Barrelfish OS
  3.1 Introduction
    3.1.1 Overview
    3.1.2 Multikernel structure
  3.2 Conceptions and Notions
  3.3 Building, Compiling and Booting
    3.3.1 Building
    3.3.2 Compiling
    3.3.3 Booting
  3.4 Summary
4 Cilk
  4.1 Brief Overview
  4.2 Compiling
    4.2.1 Compilation process
    4.2.2 Compilation strategy
  4.3 Scheduling
    4.3.1 Work-stealing scheduler
    4.3.2 Implementation
  4.4 Summary
5 Porting Cilk to Barrelfish
  5.1 Challenges
  5.2 Multithreaded Model
    5.2.1 Cilk on the original platform
    5.2.2 Cilk on Barrelfish OS
  5.3 Modifications on Cilk
    5.3.1 Compile time
    5.3.2 Runtime
  5.4 Modifications on Barrelfish
    5.4.1 Hake
    5.4.2 Makefile
  5.5 Summary
6 Benchmarks
  6.1 Environment settings
  6.2 Measurements
    6.2.1 Measurements of serial applications
    6.2.2 Measurements of Cilk applications
  6.3 Experiments
  6.4 Evaluation
7 Conclusion
  7.1 Contribution
  7.2 Future Work

List of Figures

3.1 The multikernel structure
3.2 Barrelfish structure
4.1 A serial C and a Cilk program to compute the nth Fibonacci number
4.2 The Cilk dag that computes the 3rd Fibonacci number
4.3 The compilation process of a Fibonacci program
4.4 Runtime data structures of a deque
4.5 Interactions between thief and worker in the three cases
5.1 Model of shared-memory scheduler in Cilk
5.2 Multithreaded model of Cilk scheduler in its original platform
5.3 The model of Cilk scheduler in Barrelfish
5.4 Model of Cilk scheduler in Barrelfish with a domain
5.5 Multithreading model of Cilk scheduler in Barrelfish
5.6 Compilation progress of a Cilk program in Barrelfish
5.7 wait/notify mechanism to replace POSIX create and join
6.1 Cilk application invokes the runtime library
6.2 Spanning domain overheads over cores
6.3 Comparison of TW of cilksort on Barrelfish and Linux
6.4 Speedup vs. serial versions of cilksort
6.5 Thread distribution over 8 cores of cilksort
6.6 Comparison of TW of FFT on Barrelfish and Linux
6.7 Speedup vs. serial versions of FFT
6.8 Thread distribution over 8 cores of FFT
6.9 Comparison of TW of fib on Barrelfish and Linux
6.10 Speedup vs. serial versions of fib
6.11 Thread distribution over 8 cores of fib
6.12 Comparison of TW of LU on Barrelfish and Linux
6.13 Speedup vs. serial versions of LU
6.14 Thread distribution over 8 cores of LU
6.15 Comparison of TW of matmul on Barrelfish and Linux
6.16 Speedup vs. serial versions of matmul
6.17 Thread distribution over 8 cores of matmul
6.18 Comparison of TW of strassen on Barrelfish and Linux
6.19 Speedup vs. serial versions of strassen
6.20 Thread distribution over 8 cores of strassen

List of Tables

6.1 Hardware configurations of the virtual machine
6.2 Spanning domain cost
6.3 Execution time of 6 serial Cilk programs
6.4 Measurements of cilksort on Barrelfish and Linux
6.5 Number of steals with 8 workers of cilksort
6.6 Number of threads spawned in 8 workers of cilksort
6.7 Measurements of FFT on Barrelfish and Linux
6.8 Number of steals with 8 workers of FFT
6.9 Number of threads spawned in 8 workers of FFT
6.10 Measurements of fib on Barrelfish and Linux
6.11 Number of steals with 8 workers of fib
6.12 Number of threads spawned in 8 workers of fib
6.13 Measurements of LU on Barrelfish and Linux
6.14 Number of steals with 8 workers of LU
6.15 Number of threads spawned in 8 workers of LU
6.16 Measurements of matmul on Barrelfish and Linux
6.17 Number of steals with 8 workers of matmul
6.18 Number of threads spawned in 8 workers of matmul
6.19 Measurements of strassen on Barrelfish and Linux
6.20 Number of steals with 8 workers of strassen
6.21 Number of threads spawned in 8 workers of strassen
Chapter 1
Introduction

1.1 Overview

With advances in technology, computer hardware has evolved steadily over the years: computers have shrunk from big machines to small ones, processors have moved from single-core to many-core designs, and hardware has become more diverse. New kinds of processors (cores) and heterogeneous hardware have created a demand for scalable operating systems (OSes) that can operate in such environments; the multikernel architecture [8] has therefore arisen as one OS design for scalable parallel computers. In this architecture, the OS is structured as a distributed system, so it inherits the benefits of distributed systems, such as support for heterogeneity, scalability to large systems, and reduced communication latency. The Barrelfish OS is an instance of the multikernel model.

Since the von Neumann model of sequential programming is not well suited to HPC, parallel programming models have emerged that exploit the underlying hardware and scalable OSes and let programmers express parallelism at a high level in the programming language. One well-known model is the task-centric model, in which threads interact through shared memory. Multithreaded, shared-memory programming models are known to deliver good performance on multicore machines running the operating systems they were developed for. One representative of this model is Cilk [19]. In Cilk, the unit of execution is a task, and tasks are distributed across cores by a work-stealing scheduler designed for shared-memory machines. To benefit from both a scalable OS and a parallel programming model, the idea is to combine the two into one paradigm: running a multithreaded shared-memory application on a multikernel.