Towards Scalable Synchronization on Multi-Cores

THÈSE NO 7246 (2016)

PRÉSENTÉE LE 21 OCTOBRE 2016
À LA FACULTÉ INFORMATIQUE ET COMMUNICATIONS
LABORATOIRE DE PROGRAMMATION DISTRIBUÉE
PROGRAMME DOCTORAL EN INFORMATIQUE ET COMMUNICATIONS

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

POUR L'OBTENTION DU GRADE DE DOCTEUR ÈS SCIENCES

PAR

Vasileios TRIGONAKIS

acceptée sur proposition du jury:

Prof. J. R. Larus, président du jury
Prof. R. Guerraoui, directeur de thèse
Dr T. Harris, rapporteur
Dr G. Muller, rapporteur
Prof. W. Zwaenepoel, rapporteur

Suisse 2016

“We can only see a short distance ahead, but we can see plenty there that needs to be done.” — Alan Turing

To my parents, Eirini and Charalampos

Acknowledgements

“The whole is greater than the sum of its parts.” — Aristotle

To date, my education (i.e., diploma, M.Sc., and Ph.D.) has lasted for 13 years. I could not possibly be here and sustain all the pressure (and of course the financial expenses) without the help and support of my family, Eirini (my mother), Charalampos (my father), and Eleni (my sister). I want to deeply thank them for being there for me throughout these years.

In my experience, a successful Ph.D. thesis in the area called “systems” (i.e., with a focus on software systems) requires either 10 years of solo work, or 5–6 years of fruitful collaborations. I was lucky enough to belong in the latter category and to have the chance to collaborate with many amazing people in producing the research that is included in this dissertation. This is actually the main reason why in the main body of this dissertation I use “we” instead of “I.”

First and foremost, I would like to thank my advisor, Rachid Guerraoui. Naturally, without him this dissertation would not exist. Rachid is among the most clever and optimistic people I have ever met. To him, I owe three of the most important attributes that I gained during my Ph.D.: (i) being laconic, (ii) optimism, and (iii) my “marketing skills.” The first one can be simply explained by his single-sentence answers to e-mails that are several paragraphs long. My optimism towards research and my marketing skills can be summarized by my current belief that there are no “bad” research results, but there are definitely bad ways to present those results. In other words, Rachid taught me that if the research work is solid, people will always be interested in reading about it.

As I mentioned earlier, my work is the result of several fruitful collaborations. I first want to thank Tudor David, with whom I traveled, partied, but also wrote the first two papers of my thesis. Similarly, Javier Picorel has been a close friend throughout the five-plus years of my Ph.D. After a couple of years of friendship, we could not resist any longer and decided to collaborate on a very cool project (Chapter 5), where we combined my software with his hardware expertise. On that same project, I also had the chance to collaborate with Babak Falsafi, whom I deeply appreciate for his advice on how to present and improve my research results.

Additionally, I wish to thank Tim Harris, who offered me the great opportunity to join Oracle Labs for a three-month internship in Cambridge and who was also one of the five members of my Ph.D. defense jury. Despite the short duration of that internship, we were able to finalize the project and, with some additional work here at EPFL, to write a paper (Chapter 7). For that same project, I want to thank my friend and collaborator Georgios Chatzopoulos for his help.

Furthermore, I must thank the three other committee members for my private Ph.D. defense, namely Gilles Muller, Willy Zwaenepoel, and James Larus. Our discussions during the defense, as well as their written comments, helped me improve the quality of my dissertation.

Naturally, as a Ph.D. student my main/only interest was to solve problems. Still, I had zero interest in solving problems unrelated to research, such as traveling expenses and other bureaucratic stuff, or installing and configuring processors and software. Luckily, I did not really have to handle any of these distractions, thanks to the help of the two secretaries of our lab, Kristine Verhamme and France Faille, and of our system administrator Fabien Salvi. Similarly, I want to thank Peva Blanchard for translating my thesis’ abstract into French.

Of course, many more people contributed in one way or another to this Ph.D. I would like to thank my colleagues and my friends (many people belong to both groups) for making my everyday life simpler and happier. Without David, Javier, Iraklis, Nadia, Christina, Iris, Matt, George, Tudor, Adi, Matej, Karolos, ... my life during these five years would have been unbearable. In particular, Iraklis Psaroudakis, Javier Picorel, David Kozhaya, and I started the Ph.D. in the same year. I am lucky and grateful to have them as my friends, as they are the people with whom I “cried” about the difficulties of the Ph.D., celebrated happy moments, etc. Similarly, my “non-Lausannois” friends were always there for me, to listen to my problems over Skype and to spend awesome vacations together. Last but not least, I want to give special thanks to my girlfriend Bojana Paunovic, who arrived in my life almost a year ago and made this past complicated year of my life significantly more pleasant.

Lausanne, EPFL, 29 September 2016 Vasileios Trigonakis

Preface

This dissertation presents part of the Ph.D. work performed under the supervision of Prof. Rachid Guerraoui at EPFL in Switzerland. The main results of this thesis appeared originally in the following conference publications (author names are in alphabetical order).

1. Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. “Everything you always wanted to know about synchronization but were afraid to ask.” Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP). ACM, 2013. (Chapter 4)
2. Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. “Asynchronized concurrency: The secret to scaling concurrent search data structures.” Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 2015. (Chapter 6)
3. Rachid Guerraoui and Vasileios Trigonakis. “Optimistic concurrency with OPTIK.” Proceedings of the Twenty-First ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). ACM, 2016. (Chapter 6)
4. Babak Falsafi, Rachid Guerraoui, Javier Picorel, and Vasileios Trigonakis. “Unlocking energy.” Proceedings of the 2016 USENIX Annual Technical Conference (USENIX ATC). USENIX, 2016. (Chapter 5)
5. George Chatzopoulos, Rachid Guerraoui, Tim Harris, and Vasileios Trigonakis. “Abstracting multi-core topologies with MCTOP.” Under submission. (Chapter 7)

Besides the above-mentioned publications that constitute the backbone of this thesis, I further worked on the following conference publications (author names are in alphabetical order):

1. Vincent Gramoli, Rachid Guerraoui, and Vasileios Trigonakis. “TM2C: A software transactional memory for many-cores.” Proceedings of the Seventh European Conference on Computer Systems (EuroSys). ACM, 2012.
2. Jelena Antic, Georgios Chatzopoulos, Rachid Guerraoui, and Vasileios Trigonakis. “Locking made easy.” Proceedings of the Seventeenth Annual Middleware Conference (Middleware). ACM, 2016.


Abstract

The shift of commodity hardware from single- to multi-core processors in the early 2000s compelled software developers to take advantage of the available parallelism of multi-cores. Unfortunately, only a few—so-called embarrassingly parallel—applications can leverage this available parallelism in a straightforward manner. The remaining—non-embarrassingly parallel—applications require that their processes coordinate their possibly interleaved executions to ensure overall correctness—they require synchronization. Synchronization is achieved by constraining or even prohibiting parallel execution. Thus, per Amdahl’s law, synchronization limits software scalability.

In this dissertation, we explore how to minimize the effects of synchronization on software scalability. We show that scalability of synchronization is mainly a property of the underlying hardware. This means that synchronization directly hampers the cross-platform performance portability of concurrent software. Nevertheless, we can achieve portability without sacrificing performance, by creating design patterns and abstractions, which implicitly leverage hardware details without exposing them to software developers.

We first perform an exhaustive analysis of the performance behavior of synchronization on several modern platforms. This analysis clearly shows that the performance and scalability of synchronization are highly dependent on the characteristics of the underlying platform. We then focus on lock-based synchronization and analyze the energy/performance trade-offs of various waiting techniques. We show that the performance and the energy efficiency of locks go hand in hand on modern x86 multi-cores. This correlation is again due to the characteristics of the hardware, which does not provide practical tools for reducing the power consumption of locks without sacrificing throughput.

We then propose two approaches for developing portable and scalable concurrent software, hence hiding the limitations that the underlying multi-cores impose. First, we introduce OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. We illustrate the power of our OPTIK pattern by devising five new algorithms and by optimizing four state-of-the-art algorithms for linked lists, skip lists, hash tables, and queues. Second, we introduce MCTOP, a multi-core topology abstraction which includes low-level information, such as memory bandwidths. MCTOP enables developers to accurately and portably define high-level optimization policies. We illustrate several such policies through four examples, including automated backoff schemes for locks, and illustrate the performance and portability of these policies on five platforms.


Keywords: multi-cores, concurrency, synchronization, locking, message passing, concurrent data structures, scalability, energy efficiency

Résumé

L’évolution des architectures matérielles des processeurs mono-coeur aux multi-coeurs, au début des années 2000, a incité les développeurs de logiciels à profiter du parallélisme fourni par ces processeurs multi-coeurs. Cependant, très peu d’applications (dites trivialement parallélisables) peuvent bénéficier de ce parallélisme d’une manière simple et directe. Les autres applications (non trivialement parallélisables) demandent à ce que les processus se coordonnent au cours de leurs exécutions entrelacées afin de garantir l’exactitude de leur calcul : ces applications nécessitent de la synchronisation. La synchronisation des processus est obtenue en contraignant, voire même en empêchant certaines exécutions parallèles. En conséquence, conformément à la loi d’Amdahl, la synchronisation limite la scalabilité logicielle.

Dans ce mémoire, nous examinons les moyens de minimiser les effets de synchronisation sur la scalabilité. Nous montrons que la relation entre scalabilité et synchronisation dépend en grande partie de l’architecture matérielle sous-jacente. Dit autrement, la synchronisation affecte directement la portabilité de la performance des programmes concurrents à travers les différentes plate-formes matérielles. Néanmoins, il est possible de garantir la portabilité sans sacrifier les performances, en forgeant des schémas de conception (design patterns) et des abstractions, qui tirent profit implicitement des détails matériels sans les exposer aux développeurs.

Nous présentons d’abord une analyse exhaustive des performances de synchronisation sur différentes plateformes récentes. Cette analyse montre clairement que la performance et la scalabilité de la synchronisation dépendent fortement des caractéristiques de la plateforme sous-jacente. Nous nous concentrons ensuite sur la synchronisation à base de verrous (lock-based) et analysons les compromis entre consommation énergétique et performance que présentent les diverses techniques d’attente. Nous montrons que la performance et l’efficacité énergétique évoluent conjointement sur les récentes architectures multi-coeurs x86. Cette corrélation est due, encore une fois, aux caractéristiques du matériel qui ne fournit pas d’outils pratiques permettant de réduire la consommation énergétique des verrous sans affecter le débit.

Nous proposons deux approches pour le développement de logiciel concurrent qui soit à la fois portable et scalable, masquant ainsi les limitations que l’architecture multi-coeurs sous-jacente impose. En premier lieu, nous présentons OPTIK, un nouveau schéma de conception (design pattern) pratique permettant de forger et d’implémenter des structures de données concurrentes à la fois rapides et scalables. Nous illustrons la pertinence de notre schéma OPTIK en présentant cinq nouveaux algorithmes, et en améliorant quatre des plus récents algorithmes, pour les listes chaînées, les listes à enjambement (skip list), les tables de hachage, et les files. En second lieu, nous présentons MCTOP, une abstraction de la topologie multi-coeurs qui offre des informations bas-niveau, telles que la bande-passante mémoire. MCTOP permet au développeur de définir, de manière précise et portable, des politiques d’optimisation haut-niveau. Nous illustrons plusieurs de ces politiques à travers quatre exemples, dont des schémas de retrait automatique pour les verrous, et analysons la performance et la portabilité de ces politiques sur cinq plateformes.

Mots-clés : multi-coeurs, concurrence, synchronisation, verrou, passage de message, structure de données concurrente, scalabilité, efficacité énergétique.

Contents

Acknowledgments

Preface

Abstract (English/Français)

Table of Contents

List of Figures

List of Tables

1 Introduction
1.1 Synchronization Primer
1.1.1 Synchronization Techniques
1.1.2 Towards Scalable Synchronization
1.2 Hardware Dictates the Scalability of Synchronization
1.3 Hiding Hardware Limitations to Achieve Portable Scalability
1.4 Contributions
1.5 Thesis Roadmap

I Preliminaries

2 Background and Target Platforms
2.1 Characteristics of Modern Multi-Core Processors
2.2 Synchronization
2.3 Concurrent Data Structures
2.4 Target Platforms

3 Related Work
3.1 Scaling Synchronization on Multi-Cores
3.2 Scaling Concurrent Data Structures on Multi-Cores
3.3 Scaling Software Systems on Multi-Cores


II Analyzing Synchronization

4 A Performance Analysis of Synchronization on Multi-Cores
4.1 Introduction
4.2 Target Platforms in Detail
4.2.1 Multi-Socket – Directory-Based: Opteron
4.2.2 Multi-Socket – Broadcast-Based: Westmere
4.2.3 Single-Socket – Uniform: SPARC-T2
4.2.4 Single-Socket – Non-Uniform: Tilera
4.3 SSYNC
4.3.1 Libraries
4.3.2 Microbenchmarks and Concurrent Software
4.4 Hardware-Level Analysis
4.4.1 Local Accesses
4.4.2 Remote Accesses
4.4.3 Enforcing Locality
4.4.4 Stressing Atomic Operations
4.5 Software-Level Analysis
4.5.1 Locks
4.5.2 Message Passing
4.5.3 Hash Table (ssht)
4.5.4 Key-Value Store (Memcached)
4.5.5 Discussion: Beyond Locks and Message Passing
4.6 Conclusions

5 An Energy Efficiency Analysis of Locking on Multi-Cores
5.1 Introduction
5.2 Methodology
5.3 Power Consumption of Target Platforms
5.3.1 Estimating Maximum Power Consumption
5.4 Reducing Power Consumption in Synchronization
5.4.1 Power: The Price of Busy Waiting
5.4.2 Reducing the Price of Busy Waiting
5.4.3 Latency: The Price of Sleeping
5.4.4 Reducing the Price of Sleeping
5.5 Energy Efficiency of Locks
5.5.1 MUTEXEE: An Optimized Mutex Lock
5.5.2 Evaluating Lock Algorithms
5.5.3 Implications
5.6 Energy Efficiency of Lock-Based Systems
5.6.1 Results
5.7 Conclusions

III Scaling Synchronization

6 Designing Concurrent Data Structures with OPTIK
6.1 Introduction
6.2 Optimistic Concurrency in Cache-Line Hash Table (CLHT)
6.2.1 Discussion
6.3 OPTIK
6.3.1 The OPTIK Pattern
6.3.2 The OPTIK-Lock Abstraction
6.3.3 Practical Considerations
6.4 Concrete OPTIK Examples
6.4.1 OPTIK-Based Array Map
6.4.2 OPTIK-Based Linked List
6.5 OPTIK in Concurrent Data Structures
6.5.1 OPTIK in Linked Lists
6.5.2 OPTIK in Hash Tables
6.5.3 OPTIK in Skip Lists
6.5.4 OPTIK in Binary Search Trees (BSTs)
6.5.5 OPTIK in Queues
6.5.6 Summary
6.6 Conclusions

7 Abstracting Multi-Core Topologies with MCTOP
7.1 Introduction
7.2 The MCTOP Topology Abstraction
7.2.1 Examples of MCTOP Topologies
7.3 MCTOP-ALG: Inferring Topologies
7.3.1 Context-to-Context Latencies
7.3.2 Latency Normalization
7.3.3 Component Creation
7.3.4 Topology Creation
7.3.5 Practical Considerations
7.4 Enriching MCTOP Topologies
7.5 Portable Optimizations with MCTOP
7.6 Thread Placement with MCTOP
7.7 Examples of Portable Optimizations
7.7.1 Using Latencies to Optimize Locking
7.7.2 Using libmctop in Parallel Mergesort
7.7.3 Using libmctop to Improve Metis
7.7.4 Using libmctop to Enrich OpenMP
7.8 Conclusions


8 Concluding Remarks
8.1 Implications
8.2 Future Research

Bibliography

Curriculum Vitae

List of Figures

2.1 Typical configuration of a single-socket multi-core processor.
2.2 Typical configuration of a multi-socket multi-core processor.
2.3 Topology representation of an 8-socket Intel Xeon processor—Westmere.
2.4 Topology representation of a 4-socket Intel Xeon processor—Haswell.
2.5 Topology representation of a 4-socket Oracle processor—SPARC-T44.

4.1 Methodology of our performance analysis of synchronization.
4.2 Latency of acquiring different implementations of a ticket lock.
4.3 Throughput of different atomic operations on a single memory location.
4.4 Uncontested lock acquisition latencies.
4.5 Throughput of different lock algorithms using a single lock.
4.6 Throughput of different lock algorithms using 512 locks.
4.7 Throughput and scalability of locks depending on the number of locks.
4.8 One-to-one communication latencies of message passing.
4.9 Total throughput of client-server communication.
4.10 Throughput and scalability of the hash table (ssht) on different configurations.
4.11 Throughput of Memcached using a set-only test.

5.1 Power consumption and energy efficiency of CopyOnWriteArrayList.
5.2 Power-consumption breakdown on Ivy.
5.3 Power consumption and CPI while waiting.
5.4 Power consumption and CPI while spinning.
5.5 Power consumption of busy waiting using DVFS and monitor/mwait.
5.6 Latency of different operations.
5.7 Power and communication throughput of sleeping, spinning, and spin-then-sleep.
5.8 Throughput and energy-efficiency ratios of MUTEXEE over MUTEX.
5.9 Tail latencies of a single MUTEX and MUTEXEE locks.
5.10 Throughput and energy-efficiency ratios of MUTEXEE without over with timeouts.
5.11 Throughput and energy efficiency of a single (global) lock.
5.12 Correlation of throughput with energy efficiency on various contention levels.
5.13 Normalized throughput of various systems with different locks.
5.14 Normalized energy efficiency of various systems with different locks.
5.15 Normalized tail latency of various systems with different locks.


6.1 The OPTIK pattern: Naive implementation.
6.2 The OPTIK pattern implemented with OPTIK locks.
6.3 CLHT with 4096 elements on 20 threads for various update rates.
6.4 The basic building block of the OPTIK pattern.
6.5 Code for OPTIK locks on top of versioned locks.
6.6 Locking and validation with and without OPTIK locks.
6.6 An OPTIK-based concurrent array map data structure.
6.7 Performance comparison of lock-based vs. OPTIK-based maps.
6.8 An OPTIK-based linked-list data structure.
6.9 Throughput of linked-list algorithms on various workloads.
6.10 Throughput of hash-table algorithms on various workloads.
6.11 Throughput of skip-list algorithms on various workloads.
6.12 Throughput of BST algorithms on various workloads.
6.13 Throughput and latency distribution of queue algorithms on various workloads.

7.1 Topology representation of an 8-socket AMD processor—Opteron.
7.2 Coherence traffic for a multi-core RFO request.
7.3 Lock-step execution of MCTOP-ALG’s threads.
7.4 The four steps of MCTOP-ALG.
7.5 Example output of MCTOP-PLACE.
7.6 Throughput of different lock algorithms using educated backoffs with MCTOP.
7.7 Performance of sorting 1GB worth of integers on various platforms.
7.8 Relative execution time and energy efficiency of Metis with/without libmctop.
7.9 Relative execution time of MCTOP_MP compared to default OpenMP.

List of Tables

2.1 Short description of the lock algorithms that we consider in this thesis.

4.1 Details about the hardware and the OS characteristics of the target platforms.
4.2 Local caches and memory latencies (cycles).
4.3 Latencies (cycles) of the cache coherence to access a cache line.

5.1 Differences between MUTEX and MUTEXEE.
5.2 Single-threaded lock throughput and energy efficiency.
5.3 Target software systems and configurations for our energy-efficiency analysis.

7.1 The main structures of MCTOP.
7.2 The set of policies offered by MCTOP-PLACE.


1 Introduction

Moore’s law [157] and Dennard’s scaling rule [53] are the two principles that used to govern processor design for almost four decades. Moore’s law suggests that the number of transistors that can be inexpensively placed on an integrated circuit can double approximately every two years. Dennard’s rule claims that every technology generation brings a doubled integration capacity of transistors that are 40% faster in the same power envelope as the previous generation. While Moore’s law is still effective [114], Dennard’s scaling rule came to an end in the early 2000s, mainly due to sub-threshold voltage leakage and heat dissipation limitations [26]. The end of Dennard’s rule signaled the transition from single- to multi-core processors. Today, embedded, desktop, and server processors are all multi-cores. As the name suggests, a multi-core processor includes several CPU cores that can execute software threads in parallel.

This shift of commodity hardware from single- to multi-cores also triggered a transition towards the need for software parallelism. In other words, in order to improve performance of a software application on multi-cores, developers have to employ parallelism for utilizing the multiple cores of the underlying machines. As it was simply put in 2005 by Herb Sutter, “The free lunch is over [203],” meaning that developers have to explicitly harvest the available hardware parallelism in their software.

For a certain class of applications, called embarrassingly parallel, extracting parallelism is relatively straightforward. For example, data-parallel problems require applying certain computations to large amounts of data (e.g., MapReduce-type computations [52]). The most important characteristic of embarrassingly parallel applications is that they do not include accesses to shared state.

In contrast, non-embarrassingly parallel software, such as operating systems and databases, must access shared state. For instance, modern operating-system schedulers include shared thread queues. Similarly, most components of database systems, such as indexes and the transaction manager, are also shared. Concurrent processes have to coordinate their accesses to this shared state in order to ensure correctness—they must synchronize. For example, consider the simple code for decrementing an unsigned counter by one if the counter is greater than zero: if (c > 0) { c = c - 1; }. If c is a shared counter and two processes execute this code without synchronization, it can happen that both processes simultaneously read c = 1 and decide to decrease the value of c, potentially resulting in the incorrect state c = -1. The goal of synchronization is to disallow such erroneous executions.
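To make the race concrete, the following minimal C sketch (illustrative, not code from this thesis) runs the decrement in two threads and eliminates the race with a POSIX mutex; removing the lock/unlock calls re-introduces the erroneous interleaving described above.

    /* Two threads decrement a shared counter. Without the mutex, both can
     * read c == 1 and both decrement, leaving the counter in an incorrect
     * state; with the mutex, at most one thread executes the check-then-act
     * sequence at a time. */
    #include <pthread.h>
    #include <stdio.h>

    static unsigned c = 1;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    static void* decrement(void* arg) {
        (void) arg;
        pthread_mutex_lock(&m);
        if (c > 0) {
            c = c - 1;
        }
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, decrement, NULL);
        pthread_create(&t2, NULL, decrement, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("c = %u\n", c);    /* always 0 with the mutex in place */
        return 0;
    }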

1.1 Synchronization Primer

Synchronization stems from the Greek words “syn” (with) and “chronos” (time) and denotes the act of coordinating the execution of a set of processes. As the only purpose of synchronization is to ensure correctness, synchronization can be seen as mere overhead (i.e., it does not contribute to the forward progress of the high-level application computation). Typically, synchronization is achieved by limiting or even prohibiting parallelism. Thus, per Amdahl’s law [8, 106], synchronization hinders software scalability. In practice, synchronization often imposes scalability bottlenecks in concurrent systems [17, 28, 29, 33, 35, 42, 43, 44, 136, 145], which can be notoriously difficult to remove (e.g., the global locks in the kernel [21, 134]).

1.1.1 Synchronization Techniques

Synchronization on top of coherent shared memory can be implemented in various ways. Still, regardless of the implementation specifics, all synchronization approaches are bound to rely on the low-level primitives that are available on cache-coherent multi-cores, such as loads, stores, memory barriers, and compare-and-swap. Additionally, all synchronization techniques must adhere to the memory consistency model of the processor, which dictates how hardware can re-order the memory operations of a core [201].

In most practical systems, synchronization is implemented with locks (i.e., mutual exclusion). As the purpose of locks is to serialize the accesses to shared resources, locks directly hinder scalability. Additionally, locks are often criticized for several correctness and performance issues, such as deadlocks, priority inversion, and lock convoying [41, 103]. Still, locks are used in essentially every modern concurrent software system, mainly due to their simplicity compared to other synchronization techniques.

Alternatively, there is a trend towards lock-free designs for increasing parallelism and alleviating the problems associated with locks in concurrent systems [27, 54, 144]. However, lock-free programming is cumbersome and is thus impractical for widespread use. Instead, data sharing is commonly “hidden” behind the interface of concurrent data structures, such as linked lists and hash tables. Concurrent data structures offer certain benefits: (i) they are a higher-level abstraction than locks, (ii) they are usually designed by concurrency experts, and (iii) they can be implemented using lock-free or lock-based techniques.

Of course, there are more approaches to synchronization. Two other prominent ones are message passing and transactional memory. With message passing, processes communicate through messages, although they might execute on a shared-memory multi-core. The main benefit of message passing stems from the explicit inter-process communication, which allows for better control of process and data placement [17]. Message passing is the main synchronization mechanism in various programming languages, such as Erlang, Go, or D.

Finally, transactional memory (TM) [102, 197] provides the transactional construct to programmers. The programmer has to wrap her sequential code within the delimiters of a transaction. The TM runtime is then responsible for synchronizing concurrent transactions. Although TM has gained momentum due to the recent introduction of a restricted version of TM in Intel processors [113], TM has not been widely used in practice due to its limited availability.
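As a rough illustration of such restricted hardware TM (Intel TSX/RTM), the sketch below wraps the counter decrement of the previous example in a hardware transaction with a lock-based fallback path. It assumes a TSX-capable processor and compilation with -mrtm, and it is only a sketch: a complete lock-elision scheme would also check, inside the transaction, that the fallback lock is free.

    #include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */
    #include <pthread.h>

    static unsigned counter = 1;
    static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;

    static void decrement_tx(void) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {        /* transaction started */
            if (counter > 0)
                counter = counter - 1;
            _xend();                            /* commit */
        } else {                                /* aborted: fall back to a lock */
            pthread_mutex_lock(&fallback);
            if (counter > 0)
                counter = counter - 1;
            pthread_mutex_unlock(&fallback);
        }
    }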

1.1.2 Towards Scalable Synchronization

As synchronization can be seen as a mere execution overhead, a synchronization scheme is said to scale if its performance does not degrade as the number of processes increases. For example, acquiring a lock should ideally take the same amount of time regardless of the number of processes sharing that lock.

There is an immense amount of work regarding synchronization (e.g., [1, 10, 17, 24, 27, 28, 33, 43, 46, 56, 76, 136, 139, 145, 149, 150, 187, 193, 195, 209, 214]). This prior work mainly involves the design of new algorithms [1, 46, 56, 136, 149], fixing synchronization-related issues in specific systems [28, 33, 187, 193, 209], or analyzing specific aspects of synchronization [10, 24, 150]. As a result, the conclusions of existing work cannot be properly generalized to understand the root cause of synchronization problems in new platforms, workloads, or systems. In most cases, when software systems face scalability issues, it is not clear if these issues are due to the underlying hardware, to the synchronization algorithm itself, to its usage of specific atomic operations, to the application context, or to the workload.

Additionally, prior work has focused on traditional performance metrics, namely throughput and latency. These have been the main metrics for measuring the efficiency of computing systems for several decades. However, this state of affairs started changing in the past few years as energy has become a very important factor [16]. Reducing the power consumption of systems is considered crucial today [61, 92]. Synchronization is a very appealing target for saving energy, mainly because synchronization is a building block for software and typically translates to processes waiting for each other. Consequently, we need to analyze and understand the energy efficiency trade-offs in synchronization.

In this dissertation, we take an in-depth approach to analyzing synchronization both in terms of performance and energy efficiency, focusing on lock-based synchronization. In summary, we show that scalability of synchronization is mainly a property of the underlying hardware. This means that synchronization directly hampers the cross-platform performance portability of concurrent software. Nevertheless, we can achieve portability without sacrificing performance, by creating design patterns and abstractions, which implicitly leverage hardware details without exposing them to software developers.


In what follows, we first explain why and how hardware dictates the scalability of synchronization (Section 1.2) and then show how we can potentially hide these hardware intricacies to achieve portable scalability (Section 1.3).

1.2 Hardware Dictates the Scalability of Synchronization

Part II of this dissertation (Chapters 4 and 5) includes two analyses of synchronization: The first analysis focuses on the performance of synchronization on various multi-core processors, while the second analysis revolves around the energy efficiency of locks.

Chapter 4 presents an exhaustive study of synchronization. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket—uniform and non-uniform—to multi-socket—directory- and broadcast-based—multi-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware. In brief, we show that non-uniformity of memory accesses is the main inhibitor of scalability. In particular, synchronizing across sockets of multi-socket processors is detrimental for scalability. Unfortunately, we also observe that trying to limit synchronization within a socket is not always feasible, again due to limitations imposed by the underlying hardware.

Chapter 5 presents the first study of the energy efficiency of lock-based synchronization on x86 multi-cores. Intuitively, locks are a natural place for improving the energy efficiency of software. First, as we mentioned earlier, concurrent systems are mainstream and when their threads synchronize, they typically do it with locks. Second, locks are well-defined abstractions, hence changing the algorithm implementing them can be achieved without modifying the software system. Third, some locking strategies consume more power than others, thus the strategy choice can make a difference. Last but not least, as we show in Chapter 5, improving the energy efficiency of locks goes hand in hand with improving their throughput.

We make our case for this throughput/energy-efficiency correlation through a series of observations obtained from an exhaustive analysis of the energy efficiency of locks on two processors and six software systems. Essentially, these observations show that modern hardware does not provide adequate tools for reducing the power consumption of locks without destroying throughput. Naturally, without the ability to affect power consumption, we have to continue focusing on throughput optimizations in order to also improve energy efficiency.

Overall, both of these synchronization studies show that the underlying multi-core hardware largely dictates the scalability of synchronization that can be achieved by software.


1.3 Hiding Hardware Limitations to Achieve Portable Scalability

Multi-core hardware dictating the behavior of synchronization is very bad news for the portability of concurrent systems. Fine-tuning software for every single platform is intractable. In Part III of this dissertation (Chapters 6 and 7), we introduce two approaches that can lead to both portability and scalability of synchronization (and of concurrent software).

Chapter 6 introduces OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. OPTIK relies on the commonly-used technique of version numbers for detecting conflicting concurrent operations. We show how to implement the OPTIK pattern using the novel concept of OPTIK locks. These locks extend the traditional lock interface for efficiently implementing the OPTIK pattern. Existing state-of-the-art lock-based data structures acquire the lock and then check for conflicts (e.g., [97, 105]). In contrast, with OPTIK locks, we merge the lock acquisition with the detection of conflicting concurrency in a single atomic step, similarly to lock-free algorithms. We illustrate the power of our OPTIK pattern and its implementation by introducing five new algorithms and by optimizing four state-of-the-art algorithms for linked lists, skip lists, hash tables, and queues. Our results show that concurrent data structures built using OPTIK are more scalable than the state of the art.
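The following is a minimal sketch of the general idea of merging lock acquisition with conflict detection by means of a versioned lock; the names and the interface are illustrative only and do not correspond to the actual OPTIK-lock API of Chapter 6.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef _Atomic uint64_t vlock_t;   /* even value: free, odd value: locked */

    /* Version observed at the beginning of the optimistic phase. */
    static inline uint64_t vlock_version(vlock_t* l) {
        return atomic_load(l);
    }

    /* Acquire the lock only if the version is still the one observed during
     * the optimistic phase; a failure means a conflicting update occurred. */
    static inline bool vlock_acquire_if_unchanged(vlock_t* l, uint64_t seen) {
        if (seen & 1)                   /* the lock was held when observed */
            return false;
        uint64_t expected = seen;
        return atomic_compare_exchange_strong(l, &expected, seen + 1);
    }

    static inline void vlock_release(vlock_t* l) {
        atomic_fetch_add(l, 1);         /* back to even: a new version */
    }

In this sketch, an operation reads the version, performs its optimistic work, and then calls vlock_acquire_if_unchanged; if the call fails, a concurrent update took place and the operation restarts.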

Chapter 7 introduces libmctop, a library that relies on the determinism of cache-coherence protocols for inferring the basic topology of multi-cores using only latency measurements. libmctop then augments this basic representation with low-level information, such as memory bandwidths, to deliver the MCTOP topology abstraction. MCTOP enables developers to accurately and portably define high-level performance policies. For example, using MCTOP, we can easily define policies such as “use the two closest cores,” or “use two sockets with maximum communication bandwidth.” These MCTOP policies utilize low-level characteristics of multi-cores, such as latencies and bandwidth, without exposing them to the developer, resulting in portable software optimizations. We illustrate several such policies through four examples: (i) automatic backoff schemes for locks, (ii-iii) thread placement in OpenMP and the Metis MapReduce library, as well as (iv) a topology-aware mergesort algorithm. We illustrate the performance benefits and the portability of these policies across five processors from Intel, AMD, and Oracle, with minimal development effort.
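To give a flavor of such a policy, the snippet below selects the pair of hardware contexts with the lowest communication latency from a measured latency matrix. It is a hypothetical illustration of the “use the two closest cores” policy over plain arrays; it does not use the actual libmctop interface.

    #include <stdint.h>

    /* lat[i * n + j]: measured communication latency (in cycles) between
     * hardware contexts i and j; the matrix is assumed to be symmetric. */
    static void closest_pair(const uint64_t* lat, int n, int* a, int* b) {
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (lat[i * n + j] < best) {
                    best = lat[i * n + j];
                    *a = i;
                    *b = j;
                }
            }
        }
    }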

Overall, these two approaches prove that we can achieve portable scalability of concurrent software. To this end, we need to create design patterns, abstractions, or other high-level constructs which encapsulate synchronization and other hardware characteristics without exposing low-level details to software developers.

1.4 Contributions

As the number of cores and the complexity of multi-core systems keep increasing, designing and implementing scalable synchronization becomes ever more challenging. This dissertation offers (i) a better understanding of the behavior of synchronization on modern multi-cores, and (ii) two approaches to deliver portable scalability of synchronization (and of concurrent software). In detail, this dissertation makes the following intellectual contributions:

1. A clear understanding of how hardware limits and dictates the performance and energy efficiency of synchronization, with ramifications on both software and hardware design.
2. OPTIK: A design pattern for devising scalable concurrent data structures.
3. MCTOP: An abstraction of multi-core topologies and a policy-based approach towards portable optimizations.

Furthermore, this dissertation contributes several novel algorithms and synchronization libraries (implementing both our new and existing state-of-the-art algorithms):

1. SSYNC: A cross-platform synchronization suite; SSYNC works on x86, SPARC, and Tilera processors. SSYNC is available at http://lpd.epfl.ch/site/ssync.
2. LOCKIN: A locking library with more than 10 state-of-the-art lock algorithm implementations. LOCKIN includes MUTEXEE, our novel variant of the pthread mutex lock. LOCKIN is available at http://lpd.epfl.ch/site/lockin.
3. OPTIK: A concurrent data structure library. OPTIK includes the implementation of OPTIK locks (i.e., our extension of the traditional lock interface for efficiently implementing the OPTIK pattern) and the five new data structure algorithms we design with OPTIK. OPTIK is available at http://lpd.epfl.ch/site/optik.
4. CLHT: A very fast and scalable concurrent hash table. CLHT is a part of ASCYLIB and is available at http://lpd.epfl.ch/site/ascylib.
5. libmctop: A library for designing portable software based on our MCTOP multi-core abstraction. libmctop includes the implementation of MCTOP-ALG (i.e., an algorithm for inferring the topology of any multi-core solely based on cache-coherence measurements). libmctop will soon be available at http://lpd.epfl.ch/site/mctop.

1.5 Thesis Roadmap

The rest of the dissertation is organized in three parts as follows:

Part I
• Chapter 2 introduces some background regarding shared-memory synchronization and describes the platforms used for the experiments throughout this dissertation.
• Chapter 3 discusses existing related work.

Part II
• Chapter 4 includes an exhaustive analysis of synchronization on four multi-cores.
• Chapter 5 presents the first study of the energy-efficiency trade-offs of lock-based synchronization on modern x86 multi-cores.

Part III
• Chapter 6 introduces OPTIK, a novel design pattern for designing and implementing scalable optimistic concurrent data structures.
• Chapter 7 presents libmctop, a novel library that enables portable optimizations of concurrent software by abstracting multi-core topologies.

• Chapter 8 concludes this dissertation and discusses potential avenues for future work.

Part I

Preliminaries

2 Background and Target Platforms

In this chapter, we provide the necessary background related to the main chapters of this thesis. We start by describing important characteristics of modern multi-cores that affect the design and implementation of concurrent software. We continue by introducing the basic concepts that are used in synchronization and in the design of concurrent data structures. We conclude this chapter by describing the eight platforms that we use in our experiments.

2.1 Characteristics of Modern Multi-Core Processors

From Single- to Multi-Core Processors. In the early 2000s, processor manufacturers suddenly switched from single- to multi-core processors for commodity hardware. This switch was motivated by technological reasons related to power consumption and heat dissipation of microprocessors [25]. Nowadays, embedded, desktop, and server processors are multi-cores. As the name suggests, a multi-core processor includes several CPU cores that can execute software threads in parallel.

This shift of commodity hardware from single- to multi-cores also triggered a transition towards the need for concurrent programming. In other words, on multi-cores, in order to improve performance of a software application, one needs to use concurrent programming to utilize the multiple cores of the underlying machine. As it was simply put in 2005 by Herb Sutter, “The free lunch is over [203],” meaning that developers must employ concurrency in their software in order to harvest the available hardware parallelism. On single-core processors, technology scaling used to ensure that approximately every two years processors would become almost twice as fast as the processors of the previous generation. Consequently, every single-threaded piece of software was also “becoming twice as fast” every two years—motivating the name free lunch.

Cache Coherence. Traditionally, even on single-core processors, the gap between CPU core performance and the memory latency has been increasing for the past decades [100]. This problem is mitigated by multiple levels of caches. On single-core processors, caching is “straightforward,” given that there is a single core with multiple levels of caches, hence the core can freely cache and update data in the caches.

In contrast, multi-core processors bring the need for cache coherence between the (private) caches of different cores. Cache coherence is the hardware protocol responsible for maintaining the consistency of data in caches. The cache-coherence protocol implements the two fundamental operations of an architecture: load (read) and store (write). In addition to these two operations, cache coherence typically also provides other, more sophisticated atomic operations: compare-and-swap, fetch-and-increment, atomic swap, etc. Atomic operations are essential for synchronization as many synchronization problems are impossible with just load and store operations [14].
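For instance, the check-then-decrement of the counter from Chapter 1 can be implemented correctly with a single compare-and-swap retry loop; the following is a generic C11 sketch, not code from this thesis.

    #include <stdatomic.h>
    #include <stdbool.h>

    static _Atomic unsigned counter = 1;

    /* Decrement the counter if it is greater than zero. The compare-and-swap
     * validates that the counter has not changed since it was read; on
     * failure, cur is refreshed and the loop retries. */
    static bool decrement_if_positive(void) {
        unsigned cur = atomic_load(&counter);
        while (cur > 0) {
            if (atomic_compare_exchange_weak(&counter, &cur, cur - 1))
                return true;
        }
        return false;
    }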


Figure 2.1 – Typical configuration of a single-socket multi-core processor.

Figure 2.1 depicts a typical configuration of one multi-core socket (i.e., one processor chip/die). For performance reasons, each core has two levels of private caches. Intuitively, if core 1 intends to modify a cache line of data1 that is cached in the private caches of core 2, core 1 must inform core 2’s caches about the update in order to avoid data inconsistencies. These situations are handled by the cache-coherence protocol of the processor.

Modern processors implement the MESI coherence protocol [174], or variants of MESI. With MESI, each cache line can be in one of the following four states:

• Modified: This is the only and the latest copy of this block of data in the cache hierarchy.2 This block of data is stale in main memory.
• Exclusive: This is the only copy of data in the cache hierarchy. This block of data is valid in main memory.
• Shared: This is one of possibly many copies of this block of data (i.e., other cores might hold copies in their caches). This block of data is valid in main memory.
• Invalid: This block of data holds invalid data that cannot be used by the core.

1 A cache line, also known as a cache block, is the granularity of data transfer and coherence on modern processors. The size of a cache line is typically 64 bytes on modern multi-cores.
2 Multiple copies of this modified cache line might exist in the caches of the same core (e.g., L1 and L2), depending on whether the caches are inclusive or exclusive [100].


As we describe in Chapter 4, cache coherence can be viewed as synchronization at the hard- ware level and dictates how fast two threads can communicate in software.

Coherence is usually implemented by either snooping or using a directory [201]. With snooping, the individual caches monitor any traffic on the addresses they hold in order to ensure coherence. A directory keeps approximate or precise information of which caches hold copies of a memory location. Any operation that requires coherence has to consult the directory, enforce consistency, and update the directory.

Simultaneous Multi-Threading (SMT). The multiple cores of a multi-core can be used to take advantage of the thread-level parallelism of software—different threads can, in parallel, perform work on different cores. Still, many workloads include irregular memory accesses, resulting in poor utilization of the core’s resources due to memory stalls.3 In order to alleviate this problem, and at the same time deliver even higher thread-level parallelism, many modern processors (e.g., Intel x86 and SPARC) include simultaneous multi-threading (SMT) [212]. With SMT, each CPU core contains more than one hardware context, which share many of the resources of the core, such as the caches and the pipelines. Typically, when a hardware context is stalled, another context is scheduled by the hardware.

Operating systems, such as Linux and Solaris, expose hardware contexts as the unit of execution for software. A developer can pin software threads to specific hardware contexts, so that threads execute only on these hardware contexts. Note that although the OS essentially exposes hardware contexts as cores, executing two threads on the hardware contexts of the same or of different cores can make a huge performance difference due to the heavy resource sharing of hardware contexts of the same core. In this thesis, the term thread refers to software threads, while the term hardware context refers to either a hardware context of an SMT-enabled processor, or a core of a non-SMT processor.
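For example, on Linux a software thread can be pinned to hardware context 0 as follows (a minimal, Linux-specific sketch):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to hardware context 0 (as numbered by the OS). */
    static int pin_to_context_zero(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }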

Dynamic Voltage and Frequency Scaling (DVFS). DVFS is a commonly-used power-management technique that adjusts the frequency and the voltage of the CPU to save power. Most modern processors expose control registers to adjust the desired frequency and voltage points. For instance, Linux uses the cpufreq driver to adjust these control registers based on CPU utilization. When the driver detects that the processor utilization increases, the frequency and voltage of the CPU are increased accordingly. When the processor is mostly idle, the frequency and voltage are scaled down to save power. For a comprehensive explanation of how DVFS operates on modern multi-cores, we refer the reader to [213]. In Chapter 5, we consider DVFS for optimizing the energy efficiency of synchronization.
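On Linux, the cpufreq settings are visible through sysfs; for example, the sketch below reads the scaling governor of the first CPU (the path is assumed to exist on a cpufreq-enabled system).

    #include <stdio.h>

    int main(void) {
        char gov[64];
        FILE* f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor", "r");
        if (f != NULL && fgets(gov, sizeof gov, f) != NULL)
            printf("cpu0 scaling governor: %s", gov);   /* e.g., "ondemand" */
        if (f != NULL)
            fclose(f);
        return 0;
    }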

3 Note that the same problem is attacked by out-of-order execution [100], which re-orders instructions of a single execution context in order to take advantage of instruction-level parallelism and hide long-latency instructions.


Non-Uniform Memory Access (NUMA). Modern server processors are typically packaged in multi-socket configurations. Multi-socket multi-cores, also known as multi-processors, consist of several multi-cores that are interconnected with a fast network (e.g., Intel’s QuickPath Interconnect [112] or AMD’s HyperTransport [45]). A typical 4-socket configuration is shown in Figure 2.2, where each socket is directly connected to all other sockets and to a memory node. Multi-socket servers typically provide fully-coherent globally shared memory (i.e., threads have coherent accesses to all memory nodes, local or not). The cross-socket interconnection network serves two main purposes: (i) transferring memory across sockets, and (ii) propagating cross-socket coherence messages.

Accessing memory from the local node of a socket is faster than accessing remote memory, thus multi-socket processors are said to offer non-uniform memory accesses (NUMA). As we show in Chapter 4, NUMA effects play an important role in the performance and scalability of synchronization. Additionally, in Chapter 7, we introduce a multi-core topology abstraction that can simplify programming on NUMA architectures.
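For instance, with libnuma on Linux, memory can be explicitly allocated on a given memory node (a minimal sketch; link with -lnuma):

    #include <numa.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return EXIT_FAILURE;
        }
        size_t size = 1 << 20;
        /* Allocate 1 MB backed by memory node 0: accesses from cores on
         * socket 0 are local, accesses from other sockets are remote. */
        void* buf = numa_alloc_onnode(size, 0);
        if (buf != NULL)
            numa_free(buf, size);
        return EXIT_SUCCESS;
    }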


Figure 2.2 – Typical configuration of a multi-socket multi-core processor.

2.2 Synchronization

Synchronization denotes the act of coordinating the execution of a set of processes. In concur- rent systems where processes share data, synchronization is necessary for correctness [14]. Essentially, synchronization does not contribute any useful work to concurrent software, hence, it can be considered as mere overhead. Therefore, as we show in this thesis, synchro- nization must be reduced to the bare minimum.

As we mentioned earlier, cache-coherence protocols essentially implement synchronization at the hardware level. Although these protocols are not directly exposed to software, as we illustrate throughout this thesis, they play a crucial role in the performance and scalability of concurrent software. Essentially, all software synchronization techniques directly build on top of this hardware synchronization (as they make use of loads, stores, and atomic operations). In particular, in Chapter 7, we show that the characteristics of cache-coherence protocols are so explicit that they can be used to infer the topology of any multi-core.

In software, synchronization can be implemented in various ways. In what follows, we describe the two most prominent synchronization techniques, which we consider in this dissertation.

Locking. Locking is by far the most commonly-used approach to synchronization. Practically all modern software systems employ locks in their design and implementation. Prominent examples include operating systems (e.g., Linux [72, 73]), databases (e.g., MySQL [168]), and key-value stores (e.g., Memcached [70], RocksDB [64]).

The main reason behind the popularity of locking is that it offers an intuitive abstraction. Locks ensure mutual exclusion; only the holder/owner of the lock can proceed with its execution. Executions that are protected by locks are known as critical sections. The remaining threads wait until the holder releases the lock. This waiting is implemented with either sleeping (blocking), or busy waiting (spinning) [172].

With sleeping, the thread is put in a per-lock wait queue and the hardware context is released to the OS. When the lock is released, the OS might wake up the thread (triggered by the release function). With busy waiting, threads remain active, polling the lock in a spin-wait loop. Sleeping is employed by the pthread mutex lock (MUTEX). On Linux, MUTEX builds on top of futex system calls, which allow a thread to wait for a value change on an address. MUTEX might first perform busy waiting for a limited amount of time and if the lock cannot be acquired, the thread makes the futex call.
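The sketch below shows a simplified spin-then-sleep wait using the Linux futex system call directly; it is illustrative only (the actual MUTEX implementation is more involved, and the thread releasing the lock must issue a FUTEX_WAKE to wake sleepers).

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <stdatomic.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Wait until *word becomes 0: spin briefly, then sleep in the kernel. */
    static void spin_then_sleep_wait(_Atomic int* word) {
        for (int i = 0; i < 1024; i++)      /* bounded busy waiting */
            if (atomic_load(word) == 0)
                return;
        while (atomic_load(word) != 0)      /* sleep while the value is still 1 */
            syscall(SYS_futex, word, FUTEX_WAIT, 1, NULL, NULL, 0);
    }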

The lock algorithms that employ busy waiting are called spinlocks [10, 103]. In simple algorithms (e.g., TAS, TTAS, TICKET), processes spin on a common memory location until they acquire the lock. Simple spinlocks are generally considered to scale poorly because they involve high contention on a single cache line [10], an issue which is addressed by queue-based spinlocks [46, 149]. Queue-based locks remove the single cache line bottleneck of simple spinlocks by generating a queue of nodes, so that each process spins on a unique memory location.

Spinlocks mostly differ in their busy-waiting implementation. For example, TAS spins with an atomic operation, continuously trying to acquire the lock (global spinning). In contrast, all other spinlocks (e.g., TTAS, TICKET, MCS, CLH, HCLH) spin with a load until the lock becomes free and only then try to acquire the lock with an atomic operation (local spinning). To acquire a queue-based lock (e.g., MCS, CLH), a thread adds an entry to a queue and spins until the previous holder hands over the lock. Hierarchical locks [56, 139] (e.g., HCLH, HTICKET) are tailored to today’s NUMA architectures by using node-local data structures and minimizing accesses to remote data. Table 2.1 describes the lock algorithms that we consider in this thesis.


MUTEX [87] (sleeping): The standard pthread mutex lock algorithm. Threads (might) spin for a while before releasing their hardware context to the OS (if they do not manage to acquire the lock during this busy-waiting period).
MUTEXEE [66] (sleeping): Our optimized MUTEX lock algorithm, presented in Section 5.5.1. MUTEXEE tries to avoid the frequent sleep invocations of MUTEX, using spinning in both the lock and unlock functions.
TAS [10] (global spinning): Test-and-set lock. Threads busy wait on the lock with atomic operations (e.g., compare-and-swap) until they manage to acquire the lock.
TTAS [10] (local spinning): Test-and-test-and-set lock. Threads busy wait by polling (loading) the lock value until the lock becomes free. Once the lock is free, threads perform atomic operations to try to acquire it. If unsuccessful, they return again to local spinning.
TICKET [149] (local spinning): Contains two counters: ticket and current. To acquire the lock, threads atomically fetch and increase the ticket counter. If the acquired ticket equals the current counter, the thread has acquired the lock; otherwise, the thread spins until its ticket becomes equal to current.
ARRAY [103] (queue based): Includes an array of flags. To acquire the lock, threads atomically fetch and increment a counter in order to find which slot in the array to use. Only one flag can be free at a time. To release the lock, the lock owner transfers the free flag to the next slot.
MCS [149] (queue based): Threads create a linked queue of lock requests by appending their local node with an atomic swap to the tail of the lock. Threads spin on their local queue nodes and release the lock by following the successor node of their local node.
CLH [46, 140] (queue based): CLH is similar to MCS. However, threads “inherit” and spin on their predecessor’s node, not on their own local node. Essentially, the queue with CLH is implicit (every thread knows of just two nodes).
HCLH [139] (hierarchical queue): A hierarchical version of CLH. Threads create implicit per-NUMA-node queues and splice them into a global queue with a compare-and-swap.
HTICKET [49, 56] (hierarchical local): A hierarchical version of TICKET. When a thread acquires the top-level lock (across nodes), it fetches several tickets. These tickets are used for transferring the lock among threads of the same node. When no local thread intends to acquire the lock or all tickets are consumed, the lock is released to other nodes.

Table 2.1 – Short description of various lock algorithms.
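To make the difference between global and local spinning concrete, here are minimal C11 sketches of a test-and-set lock and a ticket lock; they are illustrative and are not the SSYNC/LOCKIN implementations.

    #include <stdatomic.h>
    #include <stdint.h>

    /* TAS: global spinning with an atomic exchange on the lock word. */
    typedef struct { _Atomic int locked; } tas_lock_t;

    static void tas_lock(tas_lock_t* l)   { while (atomic_exchange(&l->locked, 1)) { } }
    static void tas_unlock(tas_lock_t* l) { atomic_store(&l->locked, 0); }

    /* TICKET: take a ticket atomically, then spin with plain loads until served. */
    typedef struct { _Atomic uint32_t ticket; _Atomic uint32_t current; } ticket_lock_t;

    static void ticket_lock(ticket_lock_t* l) {
        uint32_t mine = atomic_fetch_add(&l->ticket, 1);   /* take the next ticket */
        while (atomic_load(&l->current) != mine) { }       /* local spinning */
    }

    static void ticket_unlock(ticket_lock_t* l) {
        atomic_fetch_add(&l->current, 1);                  /* serve the next ticket */
    }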

Message Passing. An alternative to locks that we consider in this thesis is to partition the system resources between processes. In this view, synchronization is achieved through message passing, which is either provided by the hardware or implemented in software [13]. Software implementations are generally built over cache-coherence protocols and impose a single writer and a single reader for the used cache lines. The main goal of using message passing on a shared-memory multi-core system is to achieve explicit inter-process communication, which is easier to control and debug than implicit communication over shared memory. Additionally, message-passing-based software systems can be easily ported and deployed as distributed systems and can thus easily leverage new technologies such as RDMA.
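A minimal sketch of such a software channel is a single cache line with one writer and one reader; real implementations add batching, backoff, and more careful memory ordering (the 64-byte padding and alignment below are GCC-style and assume a 64-byte cache line).

    #include <stdatomic.h>
    #include <stdint.h>

    /* One message slot, padded and aligned to a 64-byte cache line. */
    typedef struct {
        _Atomic uint64_t full;      /* 0: empty, 1: holds a message */
        uint64_t payload;
        char pad[48];
    } __attribute__((aligned(64))) channel_t;

    static void channel_send(channel_t* ch, uint64_t msg) {   /* single writer */
        while (atomic_load(&ch->full)) { }                     /* wait until consumed */
        ch->payload = msg;
        atomic_store(&ch->full, 1);
    }

    static uint64_t channel_recv(channel_t* ch) {              /* single reader */
        while (!atomic_load(&ch->full)) { }                    /* wait for a message */
        uint64_t msg = ch->payload;
        atomic_store(&ch->full, 0);
        return msg;
    }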

In principle, synchronization on top of shared memory or through message passing are duals of each other—i.e., concurrent software built with shared memory has a counterpart with message passing, and vice versa [13, 126]. Still, as we show in Chapter 4, modern multi-cores favor shared memory unless there is extremely high contention.


2.3 Concurrent Data Structures

Data structures allow for efficient storage and retrieval of data elements. These elements are typically identified by unique keys. In particular, search data structures (e.g., lists, hash tables) include three main operations: (i) search, for searching for an element with a given key, (ii) insert, for inserting a new element in the structure if the key is not already there, and (iii) delete, for deleting an existing element. Other data structures, such as queues, offer a different interface. Queues are first-in first-out (FIFO) structures with two main operations: (i) enqueue, to place an element at the head of the queue, and (ii) dequeue, to remove the current tail element (if any).
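The interfaces sketched below summarize these operations; the type and function names are illustrative choices of ours, not those of any particular library used in this thesis.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative interfaces only: all names are ours. */
typedef uintptr_t skey_t;   /* unique key identifying an element */
typedef void     *sval_t;   /* the associated value              */

/* Search data structure (e.g., linked list, hash table, skip list). */
typedef struct set set_t;
bool   set_search(set_t *s, skey_t key);              /* true iff key is present        */
bool   set_insert(set_t *s, skey_t key, sval_t val);  /* false if key is already there  */
bool   set_delete(set_t *s, skey_t key);              /* false if key is not found      */

/* First-in first-out queue. */
typedef struct queue queue_t;
void   queue_enqueue(queue_t *q, sval_t val);         /* add an element at one end                 */
sval_t queue_dequeue(queue_t *q);                     /* remove from the other end; NULL if empty  */
```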

Concurrent data structures (CDSs) can be simultaneously accessed by multiple threads through their interface. The consistency of CDSs is typically measured with respect to linearizability [104]. Linearizable CDS algorithms are commonly classified based on the progress guarantees they offer. It is common to distinguish between blocking [103], lock-free, and wait-free [101] algorithms. Blocking algorithms typically rely on locking and might block, waiting for a lock to be released. Lock-free algorithms are non-blocking in the sense that (i) they do not use locks, and (ii) they ensure that at least one process in a system can make progress. Finally, wait-free algorithms, in addition to being non-blocking, guarantee that every process in a system eventually makes progress. In practice, wait-free algorithms are slower than their lock-based and lock-free counterparts and are thus not very commonly used [48]. In Chapter 6, we introduce several novel CDS algorithms, including lock-free and lock-based designs.

Most state-of-the-art CDS algorithms are optimistic, regardless of whether they are lock-based or lock-free. They are optimistic in the sense that they first optimistically perform some work, without synchronizing with other threads, and then synchronize to validate the consistency of the optimistic work and to modify the state of the structure. Lock-based algorithms perform validation in critical sections (i.e., in the execution parts which are protected by locks), while lock-free algorithms use atomic operations, such as compare-and-swap, to simultaneously validate and update the target nodes of the data structure.
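To illustrate the optimistic pattern, the sketch below inserts a key into a sorted lock-free linked list: the traversal runs without any synchronization, and a single compare-and-swap then both validates that the predecessor still points to the observed successor and publishes the new node. This is our simplified, Harris-style illustration that ignores concurrent deletions and memory reclamation; it is not one of the algorithms of Chapter 6.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    long key;
    _Atomic(struct node *) next;
} node_t;

/* Simplified optimistic insert into a sorted list with a sentinel head.
 * Ignores concurrent deletions and memory reclamation for brevity.      */
static bool list_insert(node_t *head, long key) {
    for (;;) {
        /* 1. Optimistic phase: traverse without any synchronization. */
        node_t *pred = head;
        node_t *curr = atomic_load(&pred->next);
        while (curr != NULL && curr->key < key) {
            pred = curr;
            curr = atomic_load(&curr->next);
        }
        if (curr != NULL && curr->key == key)
            return false;                  /* key already present */

        /* 2. Synchronization phase: a single CAS validates that pred
         *    still points to curr and, if so, links in the new node.  */
        node_t *n = malloc(sizeof(*n));
        n->key = key;
        atomic_init(&n->next, curr);
        if (atomic_compare_exchange_strong(&pred->next, &curr, n))
            return true;                   /* validation succeeded        */
        free(n);                           /* someone interfered; retry   */
    }
}
```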

2.4 Target Platforms

In what follows, we describe the eight multi-core processors that we employ in collecting the experimental results of this thesis. Each chapter utilizes a different subset of these multi-cores, depending on the requirements of the chapter (i.e., what the chapter aims to illustrate). The topology representations that we depict in the figures of this chapter are automatically generated by libmctop, a tool that we introduce in Chapter 7 (we do not include graphs for the single-socket processors).

Opteron. The 48-core AMD Opteron (we use Opteron as an example for libmctop in Chapter 7–Figure 7.1) contains four Opteron 6172 multi-chip modules (MCMs). Each MCM has two 6-core dies, for a total of 8 memory nodes. It operates at 2.1 GHz and has 64 KB, 512 KB, and 5 MB (per die) L1, L2, and LLC data caches, respectively.



(a) Intra-socket topology of a socket. (b) Cross-socket topology.

Figure 2.3 – Topology representation of an 8-socket Intel Xeon processor—Westmere.

Westmere. The 80-core Intel Xeon (Figure 2.3) consists of eight sockets of 10-core Intel Xeon Westmere E7-8867L processors (two hardware contexts per core). Westmere operates at 1.1-2.1 GHz and has 32 KB, 256 KB, and 30 MB (per die) L1, L2, and LLC data caches, respectively.

Haswell. The 48-core (two hardware contexts per core) Intel Xeon (Figure 2.4) comprises four Haswell E7-4830 v3 sockets. Haswell operates at 1.2-2.7 GHz and has 32 KB, 256 KB, and 30 MB (per die) L1, L2, and LLC data caches, respectively.

Ivy. The 20-core Intel Xeon (we use Ivy as an example for libmctop in Chapter 7–Figure 7.4) consists of two sockets of 10-core Ivy Bridge E5-2680 v2 processors (20 hardware contexts per socket). Ivy runs at 1.2-2.8 GHz and includes 32 KB, 256 KB, and 25 MB (per die) L1, L2, and LLC, respectively.


(a) Intra-socket topology of a socket. (b) Cross-socket topology.

Figure 2.4 – Topology representation of a 4-socket Intel Xeon processor—Haswell.


Ivy-desktop. The 4-core Intel Core i7 is a desktop Ivy Bridge 3770K processor (8 hardware contexts). Ivy-desktop runs at 1.6-3.5 GHz and includes 32 KB, 256 KB, and 8 MB (per die) L1, L2, and LLC, respectively.

SPARC-T2. The Sun Niagara 2 is a single-die processor (SUN UltraSPARC-T2) that incorporates 8 cores clocked at 1.2 GHz. It is based on the chip multi-threading architecture; it provides 8 hardware contexts per core, totaling 64 hardware threads. Each L1 cache (8 KB) is shared among the 8 hardware contexts of a core, while the shared LLC is 4 MB.

SPARC-T44. The Oracle SPARC T4-4 (Figure 2.5) consists of four SPARC T4 sockets with eight cores per socket and a total of 256 hardware contexts (eight hardware contexts per core). SPARC-T44 operates at 3 GHz and has 16 KB, 256 KB, and 4 MB (per die) L1, L2, and LLC data caches, respectively.

Tilera. The Tilera TILE-Gx36 [205] is a 36-core chip multi-processor. It clocks at 1.2 GHz and has 32 KB, 256 KB, and 9 MB L1, L2, and L3 data caches, respectively.


(a) Intra-socket topology of a socket. (b) Cross-socket topology.

Figure 2.5 – Topology representation of a 4-socket Oracle processor—SPARC-T44.


3 Related Work

In this chapter, we discuss the work that is the most related to the topics covered by this dissertation. We focus on synchronization based on locks and message passing (Section 3.1). Still, we describe alternative synchronization techniques for optimistic concurrency in the context of concurrent data structures (Section 3.2). Finally, we describe the implications of synchronization on the scalability of software systems (Section 3.3).

3.1 Scaling Synchronization on Multi-Cores

Leveraging Cache Coherence. Cache-coherence protocols guarantee the consistency of data across the multiple caches of a multi-core and are thus also responsible for transferring data within the memory hierarchy. Given that today’s multi-cores provide communication latencies with large non-uniformity, analyzing and leveraging cache coherence is essential for the scalability of synchronization primitives. The characteristics of cache-coherence protocols on x86 multi-sockets have been investigated by a number of studies [90, 156], whose focus is on bandwidth limitations and which measure cache-coherence latencies only from the point of view of loading data. We extend these studies by also measuring stores and atomic operations—which are essential for most synchronization constructs [14]—and by analyzing the impact on higher-level concurrent software.

Molka et al. [156] consider the effect of the AMD and Intel memory hierarchy characteristics on various workloads from SPEC OMP2001 and conclude that throughput is mainly dictated by memory limitations. The results are thus of limited relevance for systems involving high contention. Moses et al. [160] use simulations to show that increasing non-uniformity entails a decrease in the performance of the TTAS lock under high contention. However, the conclusions are limited to spinlocks and one specific hardware model. We generalize and quantify such observations on commonly used architectures and synchronization schemes, while also analyzing their implications.


Scaling Lock-Based Synchronization. Lock-based synchronization has been thoroughly analyzed in the past. Many studies [3, 10, 117, 136, 149] point out scalability problems due to excessive coherence traffic with traditional spinlocks and propose alternatives. Note that some of these studies pre-date multi-core processors with MESI cache coherence; instead, they address concerns regarding the caching of lines in shared mode at multiple processors. In more detail, Agarwal and Cherian [3] propose various adaptive backoff mechanisms for reducing the coherence traffic of spinlocks. In Chapter 7 we design “educated” lock backoff mechanisms that build on top of the cache-coherence latencies of multi-cores. Mellor-Crummey et al. [149] and Anderson [10] introduce and test several alternatives to simple spinlocks, such as queue locks. Their evaluation is performed on large-scale multi-processors, on which the memory-latency distribution is significantly different from that of today’s multi-cores. Throughout this thesis, we thoroughly evaluate queue-based spinlocks, such as MCS [149] and CLH [46, 140], on modern multi-core processors.

Goodman et al. [81] propose various architectural primitives for implementing synchronization, based on the idea of synchronization bits that offer mutual exclusion. Kägi et al. [117] build on the idea of synchronization bits and propose the design of a hardware queue-based lock for minimizing coherence traffic. Naturally, these techniques require hardware modifications and are not available on commercial processors.

Lim and Agarwal [130] propose reactive locks, an adaptive synchronization scheme that switches between different protocols and waiting strategies. The idea of reactive locks is fully aligned with the findings of this thesis; however, (i) the authors report modest performance benefits, and (ii) reactive locks remain to be implemented and evaluated on modern multi-cores.

More recently, Radovic and Hagersten [184, 185] were the first to propose hierarchical locks, tailored for NUMA architectures. Hierarchical locks trade short-term fairness for performance, by letting the threads of a socket exchange the lock within the socket (for some fixed number of lock handovers) before releasing the lock to the threads of other sockets. Based on this idea, many other hierarchical locks have been designed. Luchangco et al. [139] study a NUMA-aware hierarchical CLH lock (HCLH) and compare its performance with a number of well-known locks. Dice et al. [56] propose lock cohorting, a generic technique for designing hierarchical locks. Similarly, Chabbi et al. [38] introduce a hierarchical MCS lock that supports deep hierarchies. Our analysis in Chapter 4 shows that hierarchical locks can be beneficial in the presence of (i) large non-uniformity and (ii) very high lock contention, so that the lock can indeed remain busy within a single socket. Additionally, in Chapter 5, we show how reducing fairness can result in significant energy-efficiency benefits.
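To make the hierarchical idea concrete, the sketch below combines a per-socket ticket lock with a global test-and-set lock in the spirit of lock cohorting. It is our simplified illustration, not the code of any of the cited locks; NUM_SOCKETS, MAX_HANDOVERS, and the get_socket() placeholder are assumptions, and per-socket padding against false sharing is omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_SOCKETS    4
#define MAX_HANDOVERS  64   /* fairness knob: bound on intra-socket handovers */

typedef struct {
    atomic_uint next, current;   /* per-socket ticket lock                          */
    bool        global_held;     /* does this socket currently hold the global lock?
                                    (accessed only by the local lock holder)        */
    unsigned    handovers;       /* consecutive intra-socket handovers              */
} socket_lock_t;

typedef struct {
    atomic_flag   global;                  /* global test-and-set lock */
    socket_lock_t local[NUM_SOCKETS];
} hier_lock_t;

/* Placeholder: real code would query the topology for the caller's socket. */
static int get_socket(void) { return 0; }

static void hier_init(hier_lock_t *l) {
    atomic_flag_clear(&l->global);
    for (int i = 0; i < NUM_SOCKETS; i++) {
        atomic_init(&l->local[i].next, 0);
        atomic_init(&l->local[i].current, 0);
        l->local[i].global_held = false;
        l->local[i].handovers   = 0;
    }
}

static void hier_lock(hier_lock_t *l) {
    socket_lock_t *s = &l->local[get_socket()];
    unsigned me = atomic_fetch_add(&s->next, 1);
    while (atomic_load(&s->current) != me)
        ;                                  /* wait for the local (socket) lock */
    if (!s->global_held) {                 /* first of this socket's cohort    */
        while (atomic_flag_test_and_set(&l->global))
            ;                              /* acquire the global lock          */
        s->global_held = true;
        s->handovers   = 0;
    }
}

/* Assumes the thread releases on the socket where it acquired. */
static void hier_unlock(hier_lock_t *l) {
    socket_lock_t *s = &l->local[get_socket()];
    bool waiter = atomic_load(&s->next) > atomic_load(&s->current) + 1;
    if (waiter && s->handovers < MAX_HANDOVERS) {
        s->handovers++;                    /* hand over within the socket, keep global */
    } else {
        s->global_held = false;            /* release globally: let other sockets in   */
        s->handovers   = 0;
        atomic_flag_clear(&l->global);
    }
    atomic_fetch_add(&s->current, 1);      /* pass or release the local lock */
}
```

The MAX_HANDOVERS constant is exactly the fairness knob mentioned above: a larger value keeps the lock within one socket for longer, trading short-term fairness for locality.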

Inspired by our analysis of synchronization in Chapter 4, Guiroux et al. [89] performed an extensive analysis of 27 lock algorithms on x86 multi-cores with applications from the PARSEC, Phoenix, and SPLASH2 suites. To be able to easily modify lock algorithms in these applications, they develop an interposition library that builds on our CLHT hash table (Section 6.2).


Their results corroborate our findings that “every lock algorithm has its fifteen minutes of fame” and that the underlying hardware platform plays a big role in the performance of locking.

Other lock-related studies focus on the Linux kernel [29] and conclude that the default ticket lock implementation causes significant performance bottlenecks in the OS on a multi-core. Performance is improved in a number of different scenarios by replacing the ticket locks with more complex locks. We confirm that plain spinlocks do not scale across sockets and present some optimizations that alleviate the issue.

Similarly to our MUTEXEE lock (Section 5.5.1), Solaris’ mutex locks [154] offer the option of an “adaptive unlock,” where the lock owner does not wake up any sleeping threads if the lock can be handed over. Moreshet et al. [158] share some preliminary results suggesting that transactional memory can be more energy efficient than locks. Wamhoff et al. [213] evaluate the overheads of using DVFS in locks and show how to improve performance by boosting the lock owner. Our work extends prior synchronization work with a complete study of the energy efficiency of lock-based synchronization.

Spin-Then-Sleep Trade-Off in Locks. The spin-then-sleep strategy was first proposed by Ousterhout [172] in order to avoid wasteful context switches when processes are likely to wait for only a short amount of time. Various studies [24, 118, 129] analyze this trade-off and show that just spinning or just sleeping is typically suboptimal. Franke et al. [74] recognize the need for fast user-space locking and describe the first implementation of futex in Linux. Our MUTEXEE lock (Section 5.5.1) uses the futex system calls for putting threads to sleep.
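A minimal sketch of the spin-then-sleep pattern on Linux, using the futex system call directly, is shown below. It is our illustration of the general strategy (with the classic free/locked/contended state encoding), not the MUTEX or MUTEXEE code; the SPIN_TRIES bound is an arbitrary assumption.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define SPIN_TRIES 1024   /* arbitrary: how long to busy-wait before sleeping */

/* 0 = free, 1 = locked, 2 = locked with (possible) sleeping waiters. */
typedef atomic_int futex_lock_t;

static long sys_futex(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void futex_lock(futex_lock_t *l) {
    int c = 0;
    /* Phase 1: spin for a while, hoping the lock is released soon. */
    for (int i = 0; i < SPIN_TRIES; i++) {
        c = 0;
        if (atomic_compare_exchange_strong(l, &c, 1))
            return;                              /* acquired while spinning */
    }
    /* Phase 2: mark the lock as contended (2) and sleep until woken. */
    if (c != 2)
        c = atomic_exchange(l, 2);
    while (c != 0) {
        sys_futex(l, FUTEX_WAIT_PRIVATE, 2);     /* sleep while value == 2  */
        c = atomic_exchange(l, 2);
    }
}

static void futex_unlock(futex_lock_t *l) {
    /* If the value was 2, some thread may be sleeping: wake one up. */
    if (atomic_exchange(l, 0) == 2)
        sys_futex(l, FUTEX_WAKE_PRIVATE, 1);
}
```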

Johnson et al. [116] advocate decoupling the lock-contention strategy from thread scheduling. At first glance, our MUTEXEE lock might look similar to their load-control lock (LC). LC builds on top of TP-MCS [96], an MCS lock with support for timeouts, allowing threads to be dequeued from the lock-waiting queue. However, LC and MUTEXEE have some notable differences. LC relies on a global view of the system for load control (threads are put to sleep when the system load is high), while MUTEXEE performs per-lock load control. LC’s global load control can result in “unlucky” locks having their few waiting threads sleep for at least 100 ms, even though lock contention is low—sleeping threads are not woken up by a lock release, but only because of a decrease in load or the 100 ms timeout (we could say that LC is unfair at a 100 ms “granularity”). Finally, in contrast to MUTEXEE, LC might waste energy: under low system load, no thread is blocked, even if the waiting times are hundreds of milliseconds.

Scaling Message-Passing-Based Synchronization. As we mentioned earlier, synchronization over shared memory and synchronization through message passing are in principle duals of each other [13, 126]. A number of efforts (e.g., Barrelfish [17], fos [214]) point out the difficulty of scaling traditional shared-memory operating systems on multi-cores. This difficulty stems from the fact that, generation after generation, multi-cores tend to offer a larger number of cores and more heterogeneous resources. To achieve scalability, these systems avoid resource sharing altogether by using message passing (often implemented on top of shared memory).


In our prior work [84], we introduced TM2C, the first software transactional memory for non-coherent multi-core processors. TM2C builds on top of a distributed lock service and offers starvation-free transactions. Analyzing the portability of TM2C on various multi-core processors gave us the inspiration for the synchronization analysis of Chapter 4.

Various techniques have been proposed in order to improve the performance of highly-contended locks, especially on multi-socket processors. For example, flat combining [99] is an approach in which a thread can execute critical sections on behalf of others. With flat combining, an operation translates to a message to a dedicated server thread that performs the operation on locally-held data without employing any synchronization. The immediate benefits are that (i) the server threads perform unsynchronized accesses to (likely) locally cached data, and (ii) requests are serialized by message passing, hence avoiding the single point of memory contention that a lock induces.

Fatourou and Kallimanis [68] propose three optimized implementations of flat combining with the goal of minimizing coherence traffic to improve performance. The main idea of their solution is to piggyback thread requests to the server in the queue node of an MCS lock. That way, executing an operation boils down to “acquiring a lock.” RCL [136, 137] replaces the “lock, execute, and unlock” pattern with remote procedure calls to a dedicated server core. For highly-contended critical sections this approach hides the contention behind messages and enables the server to locally access the protected data.

Similarly, Petrovic et al. [177] devise server-based and combining algorithms tailored for hardware message passing available on platforms such as the Tilera (see Section 2.1). They use these two synchronization approaches to design queue and stack algorithms and show that for these highly-contended data structures their solutions significantly outperform their shared memory counterparts. Overall, the scope of flat-combining solutions is limited to high contention and a large number of cores.

Our results regarding message-passing-based synchronization in Chapter 4 corroborate prior work and confirm that message passing can provide significant scalability benefits over locking under very high contention. Nevertheless, we show that when contention is low, message passing is significantly slower than solutions that build directly on top of shared memory.

3.2 Scaling Concurrent Data Structures on Multi-Cores

This section mainly discusses the work that is related to Chapter 6, our OPTIK design pattern for concurrent data structures. Essentially, variants of the OPTIK pattern can be found wherever optimistic concurrency is used (e.g., databases, distributed systems). In the following, we highlight work that is the most related to OPTIK.


Designing Concurrent Data Structures (CDSs). There has been a large amount of work on designing efficient and scalable CDSs, for linked lists [93, 97, 151, 183], hash tables [127, 151], skip lists [75, 183, 202], binary search trees [30, 57, 60, 161], queues [153, 159, 177, 210], and stacks [98, 177, 206]. Every new CDS design typically introduces a new technique for detecting and handling concurrency. In Chapter 6, we introduce OPTIK, a generic design pattern that provides a way of detecting conflicting concurrency via version numbers in different CDSs.

Gramoli et al. [85] utilize version numbers to design a concurrent linked list which reduces synchronization over the lazy linked list [97] and is similar to our OPTIK-based fine-grained linked list. Our BST-TK binary-search-tree algorithm is the first occurrence of the OPTIK pattern. In this thesis, we further generalize the usage of version numbers to a design pattern and show how to use it in various CDSs.

OPTIK is largely inspired by our “asynchronized concurrency” (ASCY) paradigm [51]. ASCY consists of four complementary programming patterns that call for the design of concurrent search data structures to resemble that of their sequential counterparts. ASCY is a result of an extensive performance analysis of several algorithms on four multi-core processors. Similarly to our ASCY work, Gramoli [83] analyzed lock-free, lock-based, and transaction-based concurrent data structure algorithms on multi-cores. One of his main conclusions is that lock-free designs are more scalable than lock-based ones. Our results with ASCY and OPTIK clearly show that lock-based algorithms are as scalable as their lock-free counterparts (and are usually faster).

Alistarh et al. [6] recently proved that, even in the presence of high contention for some memory locations, lock-free concurrent algorithms are wait-free under stochastic scheduling of shared memory accesses. Similarly, David and Guerraoui [48] empirically showed that lock-based concurrent search data structures, such as lists, practically behave as wait-free algorithms. Essentially, they show that lock-related issues (e.g., priority inversions, lock convoying) do not manifest in search data structures, thus the simplicity and high performance of lock-based designs should be preferred over wait-free algorithms. Our experience with OPTIK corroborates their findings and confirms that we can easily and efficiently design concurrent data structures.

Optimistic Concurrency Techniques. Several concepts and tools have been proposed for designing and implementing optimistic concurrency.

Read-copy update (RCU) [146] is a technique that was introduced in the Linux kernel for easily designing CDSs with (i) wait-free reads and (ii) memory reclamation. Nevertheless, RCU targets read-mostly workloads. Relativistic programming [208] extends RCU to support infrequent, but expensive operations, such as hash-table resize. Arbel and Attiya [11] extend RCU to better support concurrent updates. Still, their binary-search-tree design is slower than other state-of-the-art trees, especially on write-intensive workloads. Predicate RCU (PRCU) [12] reduces the granularity of waiting in RCU. PRCU offers a trade-off between the amount of work that search operations must do and the amount of waiting in updates. RLU [145] improves the usability of RCU by offering concurrency of reads with multiple writers. With OPTIK, we decouple memory reclamation from concurrency control, thus we are able to achieve designs that incur none of the aforementioned overheads of RCU.

Transactional memory offers the concept of transactions for implementing synchronization. Software transactional memory (STM) [197] implements transactions in software. STM can be used in the design of CDSs, but due to the instrumentation overheads of STMs, the resulting implementations are typically slower than their lock-free or lock-based counterparts [36]. Hardware transactional memory (HTM) [102] implements transactions in hardware and thus avoids the instrumentation and the metadata overhead of STMs. Unfortunately, HTMs are currently neither ubiquitous nor robust enough to be extensively used by CDS designers.

Speculative lock elision [186, 190] aims at reducing the overhead of locking when concurrent critical sections do not actually conflict. A thread might elide a lock, meaning that it optimistically executes its critical section without acquiring that lock. If a true data conflict appears, then the thread rolls back and executes the critical section normally. The main goal of lock elision is to enable writing concurrent applications with coarse-grained locking that perform well. In contrast, OPTIK’s main goal is to enable the design of high-performance CDSs in a methodical way. As we discussed earlier, flat combining [99] is another technique that appears promising for optimizing coarse-grained lock-based CDSs (e.g., queues). Unlike OPTIK, flat combining is not suitable for highly-concurrent data structures, such as hash tables.

Sequence locks (seqlocks) [125] resemble OPTIK locks as they include a lock and a version number. With seqlocks, readers ensure that they read consistent data by double checking the version number. However, unlike OPTIK, seqlocks assume distinct readers/writers and keep the lock and the version separately. In fact, OPTIK locks can be used in implementing the seqlock functionality.
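For reference, a minimal seqlock sketch: the writer makes the sequence number odd while it updates the data, and readers retry if they observed an odd number or if the number changed during their read. The names are ours, and sequentially consistent atomics are used for simplicity.

```c
#include <stdatomic.h>

/* Minimal seqlock sketch (sequentially consistent atomics for simplicity). */
typedef struct {
    atomic_uint seq;      /* even: stable; odd: a write is in progress           */
    atomic_int  a, b;     /* the protected data (atomic to keep the sketch race-free) */
} seqlock_t;

/* Assumes a single writer (or writers serialized by a separate lock). */
static void seq_write(seqlock_t *s, int a, int b) {
    atomic_fetch_add(&s->seq, 1);   /* seq becomes odd: readers will retry */
    atomic_store(&s->a, a);
    atomic_store(&s->b, b);
    atomic_fetch_add(&s->seq, 1);   /* seq becomes even again              */
}

static void seq_read(seqlock_t *s, int *a, int *b) {
    unsigned before, after;
    do {
        before = atomic_load(&s->seq);
        *a = atomic_load(&s->a);
        *b = atomic_load(&s->b);
        after = atomic_load(&s->seq);
    } while ((before & 1) || before != after);   /* retry on a concurrent write */
}
```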

Version Numbers in Concurrency. Optimistic concurrency control was introduced for optimizing database transactions [123] in 1981. It relied on transaction numbers for detecting conflicting concurrency. In concurrent programming, many STM systems (e.g., TL2 [55], TinySTM [69], NOrec [47], SpecTM [58]) rely on version numbers for validating the optimistic results of transactions. Version numbers have also been employed in distributed transactions (e.g., [4, 59]) for detecting conflicts. To the best of our knowledge, we are the first to extend the traditional lock interface, with OPTIK locks, so that we merge validation with locking.
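The sketch below illustrates the idea of merging validation with locking through a version number, following the description above; the names and the even/odd encoding are our illustrative choices, not necessarily those of the OPTIK implementation in Chapter 6.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* A version-number lock in the spirit of OPTIK: even = free, odd = locked. */
typedef atomic_uint optik_t;

static unsigned optik_read(optik_t *l) {
    return atomic_load(l);                 /* snapshot taken before the optimistic work */
}

/* Acquire the lock only if the version is still the one we observed:
 * locking and validation are merged into a single compare-and-swap.   */
static bool optik_trylock_version(optik_t *l, unsigned observed) {
    if (observed & 1)                      /* was locked when we read it: fail fast */
        return false;
    return atomic_compare_exchange_strong(l, &observed, observed + 1);
}

static void optik_unlock(optik_t *l) {
    atomic_fetch_add(l, 1);                /* becomes even again, with a new version */
}

/* Typical usage:
 *   unsigned v = optik_read(&node->lock);
 *   ...optimistically read/compute using node...
 *   if (!optik_trylock_version(&node->lock, v)) { retry; }
 *   ...apply the update...
 *   optik_unlock(&node->lock);
 */
```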

3.3 Scaling Software Systems on Multi-Cores

Scaling System Performance. In order to improve OS scalability on multi-cores, a number of approaches deviate from traditional kernel designs. The OS is typically restructured to either improve locality (e.g., Tornado [76]), limit sharing (e.g., Corey [27]), or avoid resource sharing altogether by using message passing (e.g., Barrelfish [17], fos [214]). Boyd-Wickizer et al. [28] aim at verifying whether these scalability issues are indeed inherent to the Linux kernel design. The authors show that, by applying various concurrent programming techniques, several scalability issues can be removed from both the kernel and the applications. By doing so, they conclude that it is not necessary to give up the traditional kernel structure just yet. Lozi et al. [138] analyze the behavior of the Linux scheduler and show that certain bugs can result in significant performance degradation of synchronization-heavy applications.

Clements et al. [43] link commutative interfaces to the existence of scalable implementations. In essence, they argue that commutative operations can lead to cache conflict-free implementations that are inherently scalable from the memory-system point of view. This conclusion can help developers avoid unnecessary synchronization that can hinder system scalability.

Numerous key-value stores, such as Memcached [70], RocksDB [64], LevelDB [82], SILT [131], or Masstree [143], are based on concurrent data structures. In some cases, these structures have been shown to be scalability bottlenecks, as for example in Memcached [28, 67, 163]. Fan et al. [67] achieve a 3-fold performance increase over the traditional Memcached, mainly by optimizing its hash table’s synchronization. Similarly, Lim et al. [132] obtain significant scalability improvements on key-value stores, largely due to synchronization optimizations. Golan-Gueta et al. [80] propose an architecture for log-structured data stores with optimized synchronization. They redesign LevelDB and show up to 2.5x higher throughput than state-of-the-art log-structured systems.

Our results confirm these papers’ observation that synchronization can be an important bottleneck. We go a step further: We study the roots of the scalability problems and observe that many of the previously reported issues are in fact hardware-related and would manifest differently on different platforms.

Scaling System Energy Efficiency. There are many hardware techniques for reducing the energy footprint of systems, including clock gating [128], power gating [179], as well as voltage and frequency scaling [71, 189].

Additionally, there is a body of work that points out the importance of energy-efficient software. For instance, Linux has rules to manage frequency and voltage settings [173]. Further work proposes OS facilities for managing and estimating power [188, 198, 199, 219, 221]. Other frameworks approximate loops and functions to reduce energy [15, 62, 192]. Moreover, compiler-based [216, 218] and decoupled access-execute DVFS [122] frameworks trade off performance for energy. In servers, consolidation [19, 39] collocates workloads on a subset of servers, and fast transitioning between active and idle power states allows for low idle power [147, 148]. Basically, those techniques require hardware changes, new schedulers or runtime systems, or even rebuilding the entire system.

Psaroudakis et al. [181] achieve up to 4x energy-efficiency improvements in database analytical workloads, using hardware models for power-aware scheduling. Similarly, Tsirogiannis et al. [211] analyze a DB system and conclude that the most energy-efficient point is also the best performing one. Our POLY conjecture (Chapter 5) is a similar result for locks. Nevertheless, while they evaluate various DB configurations, we study the spin vs. sleep trade-off. To the best of our knowledge, this is the first work to consider the energy trade-offs of lock-based synchronization on modern multi-cores.

Optimizing Systems for Multi-Cores. As corroborated by a large amount of work in operating systems [17, 18, 27, 28, 76, 176, 220], databases [77, 115, 178, 182, 222], programming languages [78, 165], parallel runtimes [2, 23, 94, 141], key-value stores [20, 132], and synchronization [34, 37, 38, 56], system developers need to optimize software for the target platform to achieve good performance. Below, we discuss selected examples of multi-core optimizations.

Giceva et al. [77] explore the efficient deployment of database query plans on multi-cores for improving database performance. Psaroudakis et al. [182] describe how data placement and access patterns can significantly affect performance in databases. In the same vein, Gidra et al. [78, 79] improve the performance of Java’s garbage collection by mainly optimizing memory placement and removing memory bottlenecks.

In synchronization, optimizing for the underlying platform is unavoidable (as we clearly show in Chapter 4). To this end, the various hierarchical lock algorithms and techniques [37, 38, 56] that we discussed earlier are designed to be topology aware. libmctop (Chapter 7) is built on the realization that multi-core optimizations are necessary for good system performance. With libmctop, we offer the MCTOP topology abstraction that enables portable optimizations.

Tools for Multi-Cores. Developing concurrent software is an onerous task. To simplify programming, software developers rely on tools and libraries that offer thread pinning, memory placement, etc. Accordingly, libraries with similar functionality to our libmctop (Chapter 7) already exist. The most prominent ones are libnuma [120], liblgrp [167], and hwloc [31]. Similarly to libmctop, all three provide some form of topology abstraction, as well as APIs for thread and memory placement. In contrast to libmctop, all three libraries rely on the OS for the topology of the machine (which, as we show, can lead to inaccuracies). They also lack the low-level measurements that the enriched MCTOP abstraction offers.

Additionally, libnuma and liblgrp offer relative “distances” between resources. These depend on the OS and can be very inaccurate, as we noticed during our experiments. Both libnuma and liblgrp are also OS-specific (libnuma works on Linux, while liblgrp works on Solaris). hwloc is portable across platforms (i.e., it can load the topology from various different operating systems), but it is also missing the detailed latency and bandwidth measurements of MCTOP, which, as we show, are crucial to optimizing software. hwloc also offers an API that can be used across platforms. Unfortunately, it focuses mainly on locality and the available cache hierarchies of the platforms. In contrast, with MCTOP, we have both the portable abstraction of the topology, as well as the enriched measurements, which can be used either directly or indirectly to optimize software across platforms.


LIKWID [207] is a set of command-line tools that visualize the thread and cache topology of a multi-core, as well as control the thread affinities of an application. LIKWID relies on the OS for its topology information (currently it supports only Linux) and focuses mainly on performance-counter measurements.

There also exist tools for collecting measurements similar to the ones in MCTOP. These include Intel’s performance counter monitor [215], which can be used to measure the memory bandwidth from the memory controllers on an Intel platform.

As we show in Chapter 7, libmctop contains all the necessary components (i.e., topology and low-level measurement abstractions) to achieve portable optimizations on multi-cores.


Part II

Analyzing Synchronization

As the number of cores and the complexity of multi-core systems keep increasing, designing, implementing, debugging, and optimizing synchronization becomes more and more challenging. Additionally, emerging performance metrics, such as energy consumption, contribute further to this difficulty. Today, software developers must not only optimize their systems for throughput and latency, but also for energy efficiency. In this part, we analyze synchronization in terms of throughput, latency, and energy efficiency, in order to assist developers with designing and implementing scalable synchronization on modern multi-cores.

4 A Performance Analysis of Synchronization on Multi-Cores¹

In this chapter, we present an exhaustive study of synchronization in terms of traditional performance metrics, such as throughput and latency. Our goal with this study is to offer a better understanding of how synchronization behaves on modern multi-core processors. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket—uniform and non-uniform—to multi-socket—directory- and broadcast-based—multi-cores. We draw a set of observations which, roughly speaking, imply that the scalability of synchronization is mainly a property of the hardware.

4.1 Introduction

Scaling software systems to multi-core architectures is one of the most important challenges in computing today. A major impediment to scalability is synchronization. As we discussed in Chapter 3, in the past few decades, a large body of work has been devoted to the design, implementation, evaluation, and application of synchronization schemes. Yet, the designer of a concurrent system still has little indication, a priori, of whether a given synchronization scheme will scale on a given modern multi-core architecture and, a posteriori, about exactly why a given scheme did, or did not, scale.

One of the reasons for this state of affairs is that there are no results on modern architectures connecting the low-level details of the underlying hardware (e.g., the cache-coherence protocol) with synchronization at the software level. Recent work that evaluated the scalability of synchronization on modern hardware (e.g., [28, 136]) was typically put in a specific application and architecture context, making the evaluation hard to generalize. In most cases, when scalability issues are faced, it is not clear if they are due to the underlying hardware, to the synchronization algorithm itself, to its usage of specific atomic operations, to the application context, or to the workload.

1 Appeared in: Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. “Everything you always wanted to know about synchronization but were afraid to ask.” SOSP 2013.


Of course, getting the complete picture of how synchronization schemes behave, in every single context, is very difficult. Nevertheless, in an attempt to shed some light on such a picture, we present the most exhaustive study of synchronization on multi-cores to date. Our analysis seeks completeness in two directions (Figure 4.1).


Figure 4.1 – Methodology of our performance analysis of synchronization.

1. We consider multiple synchronization layers, from basic multi-core hardware up to complex concurrent software. First, we dissect the latencies of cache-coherence protocols. Then, we study the performance of various atomic operations, such as compare-and-swap, test-and-set, and fetch-and-increment. Next, we proceed with locking and message passing techniques. Finally, we examine a concurrent hash table and an in-memory key-value store (Memcached).

2. We vary a set of important architectural attributes to better understand their effect on synchronization. We explore both single-socket (chip multi-processor) and multi-socket (multi-processor) multi-cores. In the former category, we consider uniform (e.g., Sun Niagara 2—SPARC-T2) and non-uniform (e.g., Tilera TILE-Gx36—Tilera) designs. In the latter category, we consider platforms that implement coherence based on a directory (e.g., AMD Opteron—Opteron) or on broadcast (e.g., Intel Xeon—Westmere).

We focus our analysis on traditional performance metrics—i.e., throughput and latency. Our set of experiments, covering what we believe constitute the most commonly used synchronization schemes and hardware architectures today, induces the following set of observations.

Crossing sockets is a killer. The latency of performing any operation on a cache line (e.g., a store or a compare-and-swap) simply does not scale across sockets. Our results indicate an increase from 2 to 7.5 times compared to intra-socket latencies, even under no contention. These differences amplify with contention at all synchronization layers and suggest that cross-socket sharing should be avoided.

Sharing within a socket is necessary but not sufficient. If threads are not explicitly placed on the same socket, the operating system might try to load balance them across sockets, inducing expensive communication. However, surprisingly, even with explicit placement within the same socket, an incomplete cache directory, combined with a non-inclusive last-level cache (LLC), might still induce cross-socket communication. On Opteron for instance, this phenomenon entails a 3-fold increase compared to the actual intra-socket latencies. We discuss one way to alleviate this problem by circumventing certain access patterns.

Intra-socket (non-)uniformity does matter. Within a socket, the fact that the distance from the cores to the LLC is the same, or differs among cores, even only slightly, impacts the scalability of synchronization. For instance, under high contention, SPARC-T2 (uniform) enables approximately 1.7 times higher scalability than Tilera (non-uniform) for all locking schemes. The developer of a concurrent system should thus be aware that highly-contended data pose a higher threat in the presence of even the slightest non-uniformity, such as non-uniformity inside a socket.

Loads and stores can be as expensive as atomic operations. In the context of synchronization, where memory operations are often accompanied by memory fences, loads and stores are generally not significantly cheaper than atomic operations with higher consensus numbers [101]. Even without fences, on data that are not locally cached, a compare-and-swap is roughly only 1.35 (on Opteron) and 1.15 (on Westmere) times more expensive than a load.

Message passing shines when contention is very high. Structuring an application with message passing reduces sharing and proves beneficial when a large number of threads contend for a few data. However, under low contention and/or a small number of cores, locks perform better on higher-layer concurrent testbeds (i.e., a hash table) even when message passing is provided in hardware (e.g., Tilera). This suggests the exclusive use of message passing for optimizing certain highly contended parts of a system.

Every locking scheme has its fifteen minutes of fame. None of the nine locking schemes we consider consistently outperforms any other one, on all target architectures or workloads. Strictly speaking, to seek optimality, a lock algorithm should thus be selected based on the hardware platform and the expected workload.

Simple locks are powerful. Overall, an efficient implementation of a ticket lock is the best performing synchronization scheme in most low-contention workloads. Even under rather high contention, the ticket lock performs comparably to more complex locks, in particular within a socket. Consequently, given their small memory footprint, ticket locks should be preferred, unless it is certain that a specific lock will be very highly contended.

A high-level ramification of many of these observations is that the scalability of synchronization appears, first and above all, to be a property of the hardware, in the following sense. Basically, in order to be able to scale, synchronization should be confined to a single socket, ideally a uniform one. On certain platforms (e.g., Opteron), this is simply impossible due to the hardware. Within a socket, sophisticated synchronization schemes are generally not worthwhile. Even if, strictly speaking, no size fits all, a proper implementation of a simple ticket lock seems enough.


In summary, the main contribution of this chapter is the most exhaustive study of synchronization to date. The results of this study can be used to help predict the cost of a synchronization scheme, explain its behavior, design better schemes, as well as possibly improve future hardware design. SSYNC, the cross-platform synchronization suite we built to perform the study, is, we believe, a contribution of independent interest. SSYNC abstracts various lock algorithms behind a common interface: It not only includes many state-of-the-art algorithms, but also provides platform-specific optimizations with substantial performance improvements. SSYNC also contains a library that abstracts message passing on various platforms, and a set of microbenchmarks for measuring the latencies of the cache-coherence protocols, the locks, and the message passing. SSYNC is available at http://lpd.epfl.ch/site/ssync.

The rest of the chapter is organized as follows. We present more details about our target platforms in Section 4.2. We describe SSYNC in Section 4.3. We present our analyses of synchronization from the hardware and software perspectives in Sections 4.4 and 4.5, respectively. Finally, in Section 4.6, we conclude this chapter.

4.2 Target Platforms in Detail

This section describes in detail the four platforms considered in our experiments. Each is representative of a specific type of multi-core architecture. In Section 2.1, we provide a higher-level overview of these platforms.

In this chapter, we consider two large-scale multi-socket multi-cores, henceforth called the multi-sockets, and two large-scale chip multi-processors (CMPs), henceforth called the single-sockets. The multi-sockets are the 4-socket AMD Opteron (Opteron) and the 8-socket Intel Xeon (Westmere), whereas the CMPs are the 8-core Sun Niagara 2 (SPARC-T2) and the 36-core Tilera TILE-Gx36 (Tilera). The characteristics of the four platforms are detailed in Table 4.1.

Name               | Opteron                          | Westmere                              | SPARC-T2          | Tilera
System             | AMD Magny Cours                  | Intel Westmere-EX                     | SUN SPARC-T5120   | Tilera TILE-Gx36
Processors         | 4× Opteron 6172                  | 8× Xeon E7-8867L                      | UltraSPARC-T2     | TILE-Gx CPU
# Cores            | 48                               | 80 (SMT disabled)                     | 8 (64 contexts)   | 36
Core Clock         | 2.1 GHz                          | 2.13 GHz                              | 1.2 GHz           | 1.2 GHz
DVFS               | –                                | disabled                              | –                 | –
L1 Cache           | 64/64 KB I/D                     | 32/32 KB I/D                          | 16/8 KB I/D       | 32/32 KB I/D
L2 Cache           | 512 KB                           | 256 KB                                | –                 | 256 KB
Last-level Cache   | 2× 6 MB (per die)                | 30 MB (shared)                        | 4 MB (shared)     | 9 MB (distributed)
Interconnect       | 6.4 GT/s HyperTransport (HT) 3.0 | 6.4 GT/s QuickPath Interconnect (QPI) | Niagara2 Crossbar | Tilera iMesh
Memory             | 128 GB DDR3-1333                 | 192 GB DDR3-1067                      | 32 GB FB-DIMM-400 | 16 GB DDR3-800
#Channels / #Nodes | 4 per socket / 8                 | 4 per socket / 8                      | 8 / 1             | 4 / 2
OS / Kernel        | Ubuntu 12.04.2 / 3.4.2           | Red Hat EL 6.3 / 2.6.32               | Solaris 10 u7     | Tilera EL 6.3 / 2.6.40

Table 4.1 – Details about the hardware and the OS characteristics of the target platforms.


All platforms have a single die per socket, aside from the Opteron, which has two. Given that these two dies are actually organized in a 2-socket topology, we use the term socket to refer to a single die for simplicity.

4.2.1 Multi-Socket – Directory-Based: Opteron

The 48-core AMD Opteron contains four multi-chip modules (MCMs). Each MCM has two 6-core dies with independent memory controllers. Hence, the system comprises, overall, eight memory nodes. The topology of the system is depicted in Figure 7.1b. The maximum distance between two dies is two hops. The dies of an MCM are situated at a 1-hop distance, but they share more bandwidth than two dies of different MCMs.

The caches of Opteron are write-back and non-inclusive [45]. In other words, every new cache fill goes into the L1 cache, but not into the L2/L3, which are victim caches, filled by the evictions of the L1 and the L2, respectively. Nevertheless, the hierarchy is not strictly exclusive; on an LLC hit, the data is pulled into the L1 but may or may not be removed from the LLC (decided by the hardware [7]). Opteron uses the MOESI protocol for cache coherence. ‘O’ stands for the owned state, which indicates that this cache line has been modified (by the owner) but there might be more shared copies on other cores. This state allows a core to load a modified line of another core without the need to invalidate the modified line. The modified cache line simply changes to owned and the new core receives the line in shared state. Cache coherence is implemented with a broadcast-based protocol, assisted by what is called the HyperTransport Assist (also known as the probe filter) [45]. The probe filter is, essentially, a directory residing in the LLC.2 An entry in the directory holds the owner of the cache line, which can be used to directly probe or invalidate the copy in the local caches of the owner core.

4.2.2 Multi-Socket – Broadcast-Based: Westmere

The 80-core Intel Xeon consists of eight 10-core sockets.3 These form a twisted hypercube, as depicted in Figure 2.3b, bounding the maximum distance between two nodes to two hops. Westmere uses inclusive caches [111]—every new cache-line fill occurs in all three levels of the hierarchy. The LLC is write-back; the data is written to the memory only upon an eviction of a modified line due to space or coherence. Within a socket, Westmere implements coherence by snooping. Across sockets, it broadcasts snoop requests to the other sockets. Within the socket, the LLC keeps track of which cores might have a copy of a cache line. Additionally, Westmere extends the MESI protocol with the forward state [112]. This state is a special form of the shared state and indicates the only cache that will respond to a load request for that line (thus reducing bandwidth usage).

2 Typically, the probe filter occupies 1 MB of the LLC. 3 SMT is disabled for the experiments in this chapter.


4.2.3 Single-Socket – Uniform: SPARC-T2

The Sun Niagara 2 is a single-die processor that incorporates 8 cores. It is based on the chip multi-threading architecture; it provides 8 hardware threads per core, totaling 64 hardware threads. Each L1 cache is shared among the 8 hardware threads of a core and is write-through to the LLC. The 4 MB LLC (L2) cache is divided into eight banks of 512 KB each and is shared by the 8 cores. The 8 cores communicate with the shared LLC through a crossbar [162], which means that each core is equidistant from the LLC (uniform). The cache-coherence implementation is directory-based and uses duplicate tags [155] (i.e., the LLC cache holds a directory of all the L1 lines).

4.2.4 Single-Socket – Non-Uniform: Tilera

The Tilera TILE-Gx36 [205] is a 36-core chip multi-processor. The cores, also called tiles, are allocated on a 2-dimensional mesh and are connected with Tilera’s iMesh on-chip network. iMesh handles the coherence of the data and also provides hardware message passing to the applications. Tilera implements the dynamic distributed cache technology [205]. All L2 caches are accessible by every core on the chip, thus, the L2s are used as a 9 MB coherent LLC. The hardware uses a distributed directory to implement coherence. Each cache line has a home tile (i.e., the actual L2 cache where the data reside if cached by the distributed LLC). Consider, for example, the case of core x loading an address homed on tile y. If the data is not cached in the local L1 and L2 caches of x, a request for the data is sent to the L2 cache of y, which plays the role of the LLC for this address. Clearly, the latency of accessing the LLC depends on the distance between x and y, hence Tilera is a non-uniform cache architecture.

4.3 SSYNC: A Cross-Platform Synchronization Library

SSYNC is our cross-platform synchronization suite; it works on x86_64, SPARC, and Tilera processors. SSYNC contains libslock, a library that abstracts lock algorithms behind a common interface, and libssmp, a library with fine-tuned implementations of message passing for each of the four platforms. SSYNC also includes microbenchmarks for measuring the latencies of cache coherence, the locks, and message passing, as well as a cache-efficient hash table (ssht).

4.3.1 Libraries

libslock. This library contains a common interface and optimized implementations of a number of widely used locks (see Table 2.1 for a short description of the algorithms). libslock includes three simple spinlocks, namely TAS, TTAS with exponential backoff [10, 103], and TICKET. The queue locks are the MCS lock and the CLH lock. We also employ an array-based lock (ARRAY). libslock also contains hierarchical locks, such as HCLH and the hierarchical ticket lock (HTICKET).4 Finally, libslock abstracts the pthread mutex interface (MUTEX). libslock also contains a cross-platform interface for atomic instructions and other architecture-dependent operations, such as fences and thread and memory placement functions.

libssmp. libssmp is our implementation of message passing over cache coherence (similar to the one in Barrelfish [17]).5 It uses cache-line-sized buffers (messages) in order to complete message transmissions with single cache-line transfers. Each buffer is one-directional and includes a byte flag to designate whether the buffer is empty or contains a message. For client-server communication, libssmp implements functions for receiving from any other thread, or from a specific subset of the threads. Even though the design of libssmp is identical on all platforms, we leverage the results of Section 4.4 to tailor libssmp to the specifics of each platform individually.
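The single-writer/single-reader buffer described above can be sketched as follows. This is our simplified illustration of the general idea, not the libssmp code; the 64-byte cache-line size is an assumption (it matches x86), and messages must fit in one line.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed cache-line size */

/* One-directional, single-writer/single-reader message buffer that fits in
 * one cache line: a flag byte plus the payload. Simplified illustration.   */
typedef struct {
    _Alignas(CACHE_LINE) atomic_char flag;            /* 0: empty, 1: contains a message */
    char payload[CACHE_LINE - sizeof(atomic_char)];   /* message body (len <= payload)   */
} msg_buf_t;

static void msg_send(msg_buf_t *b, const void *msg, size_t len) {
    while (atomic_load_explicit(&b->flag, memory_order_acquire) != 0)
        ;                                              /* wait until the receiver drained it */
    memcpy(b->payload, msg, len);
    atomic_store_explicit(&b->flag, 1, memory_order_release);  /* publish the message */
}

static void msg_recv(msg_buf_t *b, void *msg, size_t len) {
    while (atomic_load_explicit(&b->flag, memory_order_acquire) != 1)
        ;                                              /* wait for a message */
    memcpy(msg, b->payload, len);
    atomic_store_explicit(&b->flag, 0, memory_order_release);  /* mark the buffer empty */
}
```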

4.3.2 Microbenchmarks and Concurrent Software

ccbench. ccbench is a tool for measuring the cost of operations on a cache line, depending on the line’s MESI state and placement in the system. ccbench brings the cache line to the desired state and then accesses it from either a local or a remote core. ccbench supports 30 cases, such as store on modified and test-and-set on shared lines.
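For illustration, the core of such a measurement on x86 can be sketched as follows; this is our simplification, not the ccbench code, and it assumes the target line has already been brought to the desired state.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_mfence */

/* Measure the cost (in cycles) of one operation on a cache line, assuming the
 * line has already been placed in the desired MESI state elsewhere.           */
static inline uint64_t time_store(volatile uint64_t *line, uint64_t value) {
    _mm_mfence();                  /* make sure earlier memory traffic is done  */
    uint64_t start = __rdtsc();
    *line = value;                 /* the measured operation (here: a store)    */
    _mm_mfence();                  /* wait for the store to become visible      */
    uint64_t end = __rdtsc();
    return end - start;
}
```

In practice, such an operation is repeated many times and the latencies are averaged, as in the tables later in this chapter.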

Stress Tests. SSYNC provides tests for the primitives in libslock and libssmp. These tests can be used to measure the primitives’ latency or throughput under various conditions (e.g., number and placement of threads, level of contention).

Hash Table (ssht). ssht is a concurrent hash table that exports three operations: put, get, and remove. It is designed to place the data as efficiently as possible in the caches in order to (i) allow for efficient prefetching and (ii) avoid false sharing. ssht can be configured to use any of the locks of libslock or the message passing of libssmp.

4.4 Hardware-Level Analysis

In this section, we report on the latencies incurred by the hardware cache-coherence protocols and discuss how to reduce them in certain cases. These latencies constitute a good estimation of the cost of sharing a cache line in a multi-core platform and have a significant impact on the scalability of any synchronization scheme. We use ccbench to measure basic operations such as load and store, as well as compare-and-swap (CAS), fetch-and-increment (FAI), test-and-set (TAS), and swap (SWAP).
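In C11 terms, these atomic operations correspond to the following primitives; the snippet is purely illustrative, and compilers map these calls to the corresponding hardware instructions of each platform.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* The atomic operations measured in this section, expressed with C11 atomics. */
static void atomic_ops_demo(atomic_int *x, atomic_flag *f) {
    int expected = 0;
    /* CAS: write 1 only if *x still equals expected. */
    bool ok = atomic_compare_exchange_strong(x, &expected, 1);
    /* FAI: atomically increment and return the previous value. */
    int prev = atomic_fetch_add(x, 1);
    /* TAS: set the flag and return whether it was already set. */
    bool was_set = atomic_flag_test_and_set(f);
    /* SWAP: atomically exchange the value. */
    int old = atomic_exchange(x, 42);
    (void)ok; (void)prev; (void)was_set; (void)old;
}
```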

4 In fact, based on the results of Section 4.4 and without being aware of [56], we designed and implemented the HTICKET algorithm. 5 On Tilera, it is an interface to the hardware message passing.


4.4.1 Local Accesses

Table 4.2 contains the latencies for accessing the local caches of a core. In the context of synchronization, the values for the LLCs are worth highlighting. On Westmere, the 44 cycles is the local latency to the LLC, but also corresponds to the fastest communication between two cores on the same socket. The LLC plays the same role for the single-socket platforms, however, it is directly accessible by all the cores of the system. On Opteron, the non-inclusive LLC holds both data and the cache directory, so the 40 cycles is the latency to both. However, the LLC is filled with data only upon an eviction from the L2, hence the access to the directory is more relevant to synchronization.

     | Opteron | Westmere | SPARC-T2 | Tilera
L1   |       3 |        5 |        3 |      2
L2   |      15 |       11 |        – |     11
LLC  |      40 |       44 |       24 |     45
RAM  |     136 |      355 |      176 |    118

Table 4.2 – Local caches and memory latencies (cycles).

4.4.2 Remote Accesses

Table 4.3 contains the latencies to load, store, or perform an atomic operation on a cache line based on the cache line’s previous state and location. Notice that the accesses to an invalid line are accesses to the main memory. In the following, we discuss all cache-coherence states, except for the invalid. We do not explicitly measure the effects of the forward state of Westmere: There is no direct way to bring a cache line to this state. The effects of forward state are included in the load from shared case.

Loads. On Opteron, a load has basically the same latency regardless of the previous state of the line; essentially, the steps taken by the cache-coherence protocol are always the same. Interestingly, although the two dies of an MCM are tightly coupled, the benefits are rather small. The latencies between two dies in an MCM and two dies that are simply directly connected differ by roughly 12 cycles. One extra hop adds an additional overhead of 80 cycles. Overall, an access over two hops is approximately 3 times more expensive than an access within a die.

The results in Table 4.3 represent the best-case scenario for Opteron: At least one of the involved cores resides on the memory node of the directory. If the directory is remote to both cores, the latencies increase proportionally to the distance. In the worst case, where two cores are on different nodes, both 2-hops away from the directory, the latencies are 312 cycles. Even worse, even if both cores reside on the same node, they still have to access the remote directory, wasting any locality benefits.


State     | Opteron (same die / same MCM / one hop / two hops) | Westmere (same die / one hop / two hops) | SPARC-T2 (same core / other core) | Tilera (one hop / max hops)

loads
Modified  | 81 / 161 / 172 / 252  | 109 / 289 / 400 | 3 / 24    | 45 / 65
Owned     | 83 / 163 / 175 / 254  | –               | –         | –
Exclusive | 83 / 163 / 175 / 253  | 92 / 273 / 383  | 3 / 24    | 45 / 65
Shared    | 83 / 164 / 176 / 254  | 44 / 223 / 334  | 3 / 24    | 45 / 65
Invalid   | 136 / 237 / 247 / 327 | 355 / 492 / 601 | 176 / 176 | 118 / 162

stores
Modified  | 83 / 172 / 191 / 273  | 115 / 320 / 431 | 24 / 24   | 57 / 77
Owned     | 244 / 255 / 286 / 291 | –               | –         | –
Exclusive | 83 / 171 / 191 / 271  | 115 / 315 / 425 | 24 / 24   | 57 / 77
Shared    | 246 / 255 / 286 / 296 | 116 / 318 / 428 | 24 / 24   | 86 / 106

atomic operations: CAS (C), FAI (F), TAS (T), SWAP (S); the single-socket cells list C/F/T/S
Modified  | 110 / 197 / 216 / 296 | 120 / 324 / 430 | 71/108/64/95, 66/99/55/90 | 77/51/70/63, 98/71/89/84
Shared    | 272 / 283 / 312 / 332 | 113 / 312 / 423 | 76/99/67/93, 66/99/55/90  | 124/82/121/95, 142/102/141/115

Table 4.3 – Latencies (cycles) of the cache coherence to load/store/CAS/FAI/TAS/SWAP a cache line depending on the MESI state and the distance. The values are the average of 10000 repetitions with < 3% standard deviation.

In contrast, Westmere does not have the locality issues of Opteron. If the data is present within the socket, a load can be completed locally due to the inclusive LLC. Loading from the shared state is particularly interesting, because the LLC can directly serve the data without needing to probe the local caches of the previous owner (unlike the modified and exclusive states). However, the overhead of going off-socket on Westmere is very high. For instance, loading from the shared state is 7.5 times more expensive over two hops than loading within the socket.

Unlike the large variability on multi-sockets, the results are more stable on the single-sockets. On SPARC-T2, a load costs an L1 or L2 access, depending on whether the two threads reside on the same core. On Tilera, the LLC is distributed, hence the latencies depend on the distance of the requesting core from the home tile of the cache line. The cost for two adjacent cores is 45 cycles, whereas for the two most remote cores,6 it is 20 cycles higher (2 cycles per hop).

Stores. On Opteron, both loads and stores on a modified or an exclusive cache line have similar latencies (no write-back to memory). However, a store on a shared line is different.7 Every store on a shared or owned cache line incurs a broadcast invalidation to all nodes. This happens because the cache directory is incomplete (it does not keep track of the sharers) and cannot detect whether sharing is limited to a single node.8 Therefore, even if all sharers reside on the same node, a store pays the overhead of a broadcast, increasing the cost from around 83 to 244 cycles. Obviously, the problem is aggravated if the directory is not local to any of the cores involved in the store. Finally, storing on a cache line shared by all 48 cores costs 296 cycles.

6 10 hops distance on the 6-by-6 2-dimensional mesh of Tilera.
7 For the store-on-shared test, we place two different sharers at the indicated distance from a third core that performs the store.
8 On Westmere, the inclusive LLC is able to detect if there is sharing solely within the socket.

Again, Westmere has the advantage of locality: It locally completes any operation that involves solely cores of a single socket. In general, stores behave similarly regardless of the previous state of the cache line. Finally, storing on a cache line shared by all 80 cores on Westmere costs 445 cycles.

Similarly to a load, the results for a store exhibit much lower variability on the single-sockets. A store on SPARC-T2 has essentially the latency of the L2, regardless of the previous state of the cache line and the number of sharers. On Tilera, stores on a shared line are a bit more expensive due to the invalidation of the cache lines of the sharers. The cost of a store reaches a maximum of 200 cycles when all 36 cores share that line.

Atomic Operations. On the two multi-sockets, CAS, TAS, FAI, and SWAP have essentially the same latencies. These latencies are close to those of a store on a line in the same state and at the same distance (see Table 4.3). On the single-sockets, some operations clearly have different hardware implementations. For instance, on Tilera, the FAI operation is faster than the others. Another interesting point is the latencies for performing an operation when the line is shared by all the cores of the system. On all platforms, these latencies follow the exact same trends as a store in the same scenario.

Implications. The latencies of cache coherence reveal some important issues that should be addressed in order to implement efficient synchronization. Cross-socket communication is 2 to 7.5 times more expensive than intra-socket communication. This problem is larger on the broadcast-based design of Westmere than on Opteron. However, within the socket, the inclusive LLC of Westmere provides strong locality, which in turn translates into efficient intra-socket synchronization. In terms of locality, the incomplete directory of Opteron is problematic in two ways. First, a read-write sharing pattern causes stores on owned and shared cache lines to exhibit the latency of a cross-socket operation, even if all sharers reside on the same socket. We thus expect intra-socket synchronization to behave similarly to cross-socket synchronization. Second, the location of the directory is crucial: If the cores that use some memory are remote to the directory, they pay the remote-access overhead. To achieve good synchronization performance, the data has to originate from the local memory node (or be migrated to it). Overall, an Opteron MCM should be treated as a two-node platform.

The single-sockets exhibit quite a different behavior: They both use their LLCs for sharing. The latencies (to the LLC) on SPARC-T2 are uniform (i.e., they are affected by neither the distance nor the number of the involved cores). We expect this uniformity to translate to synchronization that is not prone to contention. The non-uniform Tilera is affected both by the distance and the number of involved cores, therefore we expect scalability to be affected by contention. Regarding the atomic operations, both single-sockets have faster implementations for some of the operations (see Table 4.3). These should be preferred to achieve the best synchronization performance.


4.4.3 Enforcing Locality

A store to a shared or owned cache line on Opteron induces an unnecessary broadcast of invalidations, even if all the involved cores reside on the same node (see Table 4.3). This results in a 3-fold increase of the latency of the store operation. To avoid this issue, we propose to explicitly keep the cache line in the modified state. This can be easily achieved by calling the prefetchw x86 instruction before any load reference to that line. Of course, this optimization should be used with care, because it prevents two cores from simultaneously holding a copy of the line.
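Concretely, the optimization amounts to issuing a prefetchw (prefetch with intent to write) right before reading the line; the snippet below is a minimal sketch with an illustrative function name, using inline assembly for the instruction.

    #include <stdint.h>

    /* Sketch: request the line with write intent before loading it, so
     * that a later store by this core does not find a shared line and
     * trigger the broadcast of invalidations discussed above. */
    static inline uint32_t load_after_prefetchw(volatile uint32_t *line)
    {
        __asm__ __volatile__("prefetchw %0" : : "m"(*line));
        return *line;   /* the load typically finds the line modified
                         * (prefetchw is only a hint)                  */
    }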

To illustrate the potential of this optimization, we engineer an efficient implementation of a ticket lock. As we describe in Table 2.1, a ticket lock consists of two counters: the ticket and the current. To acquire the lock, a thread atomically fetches and increases the ticket counter—it obtains a ticket tick. If tick equals the current counter, the thread has acquired the lock; otherwise, the thread spins until this becomes true. To release the lock, the thread increases the value of the current counter.

A particularly appealing characteristic of the ticket lock is that the ticket minus the current counter is the number of threads queued ahead of the current thread. Accordingly, it is intuitive to spin with a backoff proportional to the number of queued threads [149]. We use this backoff technique with and without the prefetchw optimization and compare the results with a non-optimized implementation of the ticket lock. Figure 4.2 depicts the latencies for acquiring and immediately releasing a single lock. Obviously, the non-optimized version scales terribly, delivering a latency of 720K cycles on 48 cores. In contrast, the versions with the proportional backoff scale significantly better. The prefetchw optimization gives an extra performance boost, performing up to 2 times better on 48 cores.
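The following is a minimal sketch of such a ticket lock with proportional backoff, assuming GCC atomic builtins and an illustrative backoff constant (not the tuned values used in SSYNC); the prefetchw optimization of Section 4.4.3 can additionally be issued in the wait loop.

    #include <stdint.h>

    #define BACKOFF_BASE 512   /* illustrative, not the tuned SSYNC value */

    typedef struct {
        volatile uint32_t ticket;    /* next ticket to hand out  */
        volatile uint32_t current;   /* ticket currently served  */
    } ticket_lock_t;

    static inline void ticket_lock_acquire(ticket_lock_t *l)
    {
        uint32_t tick = __sync_fetch_and_add(&l->ticket, 1);
        while (1) {
            /* Optionally: prefetchw on &l->current here (Section 4.4.3). */
            uint32_t cur = l->current;
            if (cur == tick)
                break;                            /* lock acquired */
            /* Backoff proportional to the number of queued threads. */
            uint32_t wait = (tick - cur) * BACKOFF_BASE;
            while (wait--)
                __asm__ __volatile__("pause");
        }
        __asm__ __volatile__("" : : : "memory");  /* compiler barrier */
    }

    static inline void ticket_lock_release(ticket_lock_t *l)
    {
        __asm__ __volatile__("" : : : "memory");  /* compiler barrier */
        l->current++;   /* only the lock holder writes current */
    }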

SSYNC uses the aforementioned optimization wherever possible. For example, our message passing implementation on Opteron with the prefetchw technique is up to 2.5 times faster than without it.

[Figure: latency (Kcycles) vs. number of threads (0–48) on Opteron for the non-optimized, backoff, and backoff & prefetchw variants of the ticket lock.]

Figure 4.2 – Latency of acquiring different implementations of a ticket lock on Opteron.


4.4.4 Stressing Atomic Operations

In this test, we stress the various atomic operations of our four platforms. Each thread repeatedly tries to perform an atomic operation on a single shared location. For FAI, SWAP, and CAS_FAI these calls are always eventually successful (i.e., they write to the target memory), whereas for TAS and CAS they are not. CAS_FAI implements a FAI operation based on CAS. CAS_FAI enables us to highlight both the costs of spinning until CAS is successful and the benefits of having a FAI instruction supported by the hardware. After completing a call, the thread pauses for a sufficient number of cycles to prevent the same thread from completing consecutive operations locally (long runs [153]). This delay is proportional to the maximum latency across the involved cores and does not otherwise affect the total throughput.
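For clarity, CAS_FAI is simply a fetch-and-increment emulated with a CAS retry loop; a minimal sketch using GCC’s __sync builtins is shown below (the hardware FAI, in contrast, maps directly to an instruction such as lock xadd on x86).

    #include <stdint.h>

    /* Fetch-and-increment emulated with CAS (CAS_FAI in the text).
     * Under contention the CAS may fail and the loop retries, which is
     * exactly the cost the CAS_FAI experiment exposes. */
    static inline uint32_t cas_fai(volatile uint32_t *counter)
    {
        uint32_t old;
        do {
            old = *counter;
        } while (!__sync_bool_compare_and_swap(counter, old, old + 1));
        return old;
    }

    /* Hardware FAI, for comparison (compiles to lock xadd on x86). */
    static inline uint32_t hw_fai(volatile uint32_t *counter)
    {
        return __sync_fetch_and_add(counter, 1);
    }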

On the multi-sockets, we allocate threads on the same socket and continue on the next socket once all cores of the previous one have been used. On SPARC-T2, we divide threads evenly among the eight physical cores. On all platforms, we ensure that each thread allocates its local data from the local node. We repeat each experiment five times and show the average value.

Figure 4.3 shows the results of this experiment. The multi-sockets exhibit a very steep decrease in the throughput once the location is accessed by more than one core. The latency of the operations increases from approximately 20 to 120 cycles. In contrast, the single-sockets generally show an increase in the throughput on the first few cores. This can be attributed to the cost of the local parts of the benchmark (e.g., a while loop) that consume time comparable to the latency of the operations. For more than six cores, however, the results stabilize (with a few exceptions).

Both Opteron and Westmere exhibit a stable throughput close to 20 Mops/s within a socket, which drops once there are cores on a second socket. Not surprisingly (see Table 4.3), the drop on Westmere is larger than on Opteron. The throughput on these platforms is dictated by the cache-coherence latencies, given that an atomic operation actually brings the data into its local cache. In contrast, on the single-sockets the throughput converges to a maximum value and exhibits no subsequent decrease. Some further interesting points are worth highlighting.

[Figure: throughput (Mops/s) of CAS, TAS, CAS-FAI, FAI, and SWAP vs. number of threads, one panel per platform (Opteron up to 48, Westmere up to 80, SPARC-T2 up to 64, Tilera up to 36 threads).]

Figure 4.3 – Throughput of different atomic operations on a single memory location.

First, SPARC-T2 (SPARC architecture) does not provide an atomic increment or swap instruction. Their implementations are based on CAS, therefore the behaviors of FAI and CAS_FAI are practically identical. SWAP shows some fluctuations on SPARC-T2, which we believe are caused by the scheduling of the hardware threads. However, SPARC provides a hardware TAS implementation that proves to be highly efficient. Likewise, the FAI implementation on Tilera slightly outperforms the other operations.

Implications. Both multi-sockets have a very fast single-thread performance that drops on two or more cores and decreases further when there is cross-socket communication. In contrast, both single-sockets have a lower single-thread throughput, but scale to a maximum value that is subsequently maintained regardless of the number of cores. This behavior indicates that globally stressing a cache line with atomic operations will introduce performance bottlenecks on the multi-sockets, while being somewhat less of a problem on the single-sockets. Finally, a system designer should take advantage of the best-performing atomic operations available on each platform, such as the TAS on SPARC-T2.

4.5 Software-Level Analysis

This section describes the software-oriented part of this study. We start by analyzing the behavior of locks under different levels of contention and continue with message passing. We use the same methodology as in Section 4.4.4 for the experiments of this section. In addition, the globally shared data is allocated from the first participating memory node. We finally report on our findings on higher-level concurrent software and discuss the implications of the results of this chapter on other synchronization approaches, such as combiners and lock-free programming.

4.5.1 Locks

We evaluate the locks in SSYNC under various degrees of contention on our platforms: (i) uncontested locking, (ii) extreme and very low contention (within each platform), and (iii) intermediate contention (across platforms).

Uncontested Locking

In this experiment we measure the latency to acquire a lock based on the location of the previous owner. Although in a number of cases acquiring a lock does involve contention, a large portion of acquisitions in applications are uncontested, hence they have a similar behavior to this experiment.

Initially, we place a single thread that repeatedly acquires and releases the lock. We then add a second thread, as close as possible to the first one, and pin it further away in subsequent runs.


[Figure: uncontested acquisition latency (cycles, up to 1500) of TAS, TTAS, TICKET, ARRAY, MUTEX, MCS, CLH, HCLH, and HTICKET on Opteron, Westmere, SPARC-T2, and Tilera, for previous owners at increasing distances (single thread, same core, same die, same MCM, one hop, two hops, max hops).]

Figure 4.4 – Uncontested lock acquisition latency based on the location of the previous owner of the lock.

Figure 4.4 contains the latencies of the different locks when the previous owner is at various distances. Latencies increase significantly on the multi-sockets as the second thread moves further from the first. In general, acquisitions that need to transfer data across sockets have a high cost. Remote acquisitions can be up to 12.5 and 11 times more expensive than local ones on Opteron and Westmere, respectively. In contrast, due to their shared and distributed LLCs, SPARC-T2 and Tilera suffer no performance decrease and only a slight one, respectively, as the location of the second thread changes. The latencies of the locks are in accordance with the cache-coherence latencies presented in Table 4.3.

Moreover, the differences in the latencies are significantly larger between locks on the multi-sockets than on the single-sockets, making the choice of the lock algorithm in an uncontested scenario paramount to performance. More precisely, while simple spinlocks (i.e., TAS, TTAS, TICKET) closely follow the cache-coherence latencies, more complex locks generally introduce some additional overhead.

Implications. Using a lock, even if no contention is involved, is up to one order of magnitude more expensive when crossing sockets. The 350-450 cycles on a multi-socket, and the roughly 200 cycles on a single-socket, are not negligible, especially if the critical sections are short. Moreover, the latency penalties induced when crossing sockets tend to be higher for complex locks than for simple locks. Therefore, regardless of the platform, simple locks should be preferred when contention is very low.

Lock Algorithm Behavior

We study the behavior of locks under extreme and very low contention. On the one hand, highly contended locks are often the main scalability bottleneck of software systems. On the other hand, many systems use locking strategies, such as fine-grained locks, that induce low contention. Therefore, good performance in these two scenarios is essential. We measure the total throughput of lock acquisitions that can be performed using each lock algorithm. Each thread acquires a randomly chosen lock, reads and writes one corresponding cache line of data, and releases the lock (a sketch of this loop is shown below). Similarly to the atomic-operations stress test (Section 4.4.4), in the single-lock experiment a thread pauses after it releases the lock, in order to ensure that the release becomes visible to the other cores before retrying to acquire the lock. Given the uniform structure of the platforms, we do not use hierarchical locks on the single-sockets.
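The per-thread loop can be sketched as follows; lock_acquire/lock_release stand for whichever lock algorithm is being evaluated, and all names and constants here are illustrative rather than the actual SSYNC interface.

    #include <stdint.h>
    #include <stdlib.h>

    extern void lock_acquire(void *lock);   /* placeholder lock API */
    extern void lock_release(void *lock);

    /* Sketch of the per-thread benchmark loop: pick a lock at random,
     * touch one cache line of protected data, release, then pause (in the
     * single-lock experiment) so the release becomes globally visible
     * before the next acquisition. */
    void lock_stress_loop(void **locks, volatile uint64_t **data,
                          int num_locks, long iterations, int pause_cycles)
    {
        unsigned int seed = 42;
        for (long it = 0; it < iterations; it++) {
            int i = rand_r(&seed) % num_locks;
            lock_acquire(locks[i]);
            uint64_t v = *data[i];          /* read the protected line  */
            *data[i] = v + 1;               /* write the protected line */
            lock_release(locks[i]);
            for (int p = 0; p < pause_cycles; p++)
                __asm__ __volatile__("pause");
        }
    }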


[Figure: throughput (Mops/s, up to 8) vs. number of threads for each lock algorithm (TAS, TTAS, TICKET, ARRAY, MUTEX, MCS, CLH, HCLH, HTICKET), one panel per platform; the annotated best single-thread throughputs are 22 Mops/s and 34 Mops/s on the Opteron and Westmere panels, respectively.]

Figure 4.5 – Throughput of different lock algorithms using a single lock.

Extreme Contention. Figure 4.5 depicts the results of the maximum contention experiment (one lock). As we mentioned earlier, our microbenchmark employs sufficient delays after releasing the lock so that we avoid long runs. As we described in Section 4.4, Westmere exhibits very strong intra-socket locality. Accordingly, hierarchical locks (i.e., HTICKET and HCLH) perform the best. Although there is a very big drop from one to two cores on the multi-sockets, within the socket both Opteron and Westmere manage to keep a rather stable performance. However, once a second socket is involved the throughput decreases again.

Not surprisingly, the CLH and the MCS locks are the most resilient to contention. They both guarantee that a single thread is spinning on each cache line and use the globally shared data only to enqueue for acquiring the lock. TICKET proves to be the best simple spinlock on this workload. Overall, the throughput on two or more cores on the multi-sockets is an order of magnitude lower than the single-core performance. In contrast, the single-sockets maintain a comparable performance on multiple cores.

Very Low Contention. The very low contention results (512 locks) are shown in Figure 4.6. Once again, one can observe the strong intra-socket locality of Westmere. In general, simple locks match or even outperform the more complex queue locks. While on Westmere the differences between locks become insignificant for a large number of cores, it is generally TICKET that performs the best on Opteron, SPARC-T2, and Tilera. In a low-contention scenario it is thus difficult to justify the memory requirements of complex lock algorithms. It should be noted that, aside from the acquisitions and releases, the load and the store on the protected data also contribute to the lack of scalability of the multi-sockets, for the reasons pointed out in Section 4.4.


[Figure: throughput (Mops/s, up to 160) vs. number of threads for each lock algorithm, one panel per platform (Opteron, Westmere, SPARC-T2, Tilera).]

Figure 4.6 – Throughput of different lock algorithms using 512 locks.

Implications. None of the locks is consistently the best on all platforms. Moreover, no lock is consistently the best within a platform. While complex locks are generally the best under extreme contention, simple locks perform better under low contention. Under high contention, hierarchical locks should be used on multi-sockets with strong intra-socket locality, such as Westmere. Opteron, due to the previously discussed locality issues, and the single-sockets favor queue locks. In the case of low contention, simple locks are better than complex implementations within a socket. Under extreme contention, while not as good as more complex locks, a ticket lock can avoid performance collapse within a socket. On Westmere, the best performance is achieved when all threads run on the same socket, both for high and for low contention. Therefore, synchronization between sockets should be limited to the absolute minimum on such platforms. Finally, we observe that when each core is dedicated to a single thread, there is no scenario in which pthread mutexes perform the best. Mutexes are, however, useful when threads contend for a core. Therefore, unless multiple threads run on the same core, alternative implementations should be preferred. In Section 5.5.1, we introduce MUTEXEE, an optimized implementation of the pthread mutex.

Cross-Platform Lock Behavior

In this experiment, we compare lock behavior under various degrees of contention across architectures. In order to have a straightforward cross-platform comparison, we run the tests on up to 36 cores. Having already explored the behavior of the different lock algorithms, we only report the highest throughput achieved by any of the locks on each platform. We vary the contention by running experiments with 4, 16, 32, and 128 locks, thus examining high, intermediate, and low degrees of contention.

The results are shown in Figure 4.7. In all cases, the differences between the single- and multi-sockets are noticeable. Under high contention, single-sockets prevent performance collapse from one thread to two or more, whereas in the lower-contention cases these platforms scale well. As noted in Section 4.4, stores and atomic operations are affected by contention on Tilera, resulting in slightly less scalability than on SPARC-T2: On high-contention workloads, the uniformity of SPARC-T2 delivers up to 1.7 times more scalability (i.e., the rate at which performance increases) than Tilera. In contrast, multi-sockets exhibit a significantly lower throughput under high contention, compared to their single-thread performance.


[Figure: four panels (4, 16, 32, and 128 locks) showing lock throughput (Mops/s) for 1–36 threads on Opteron, Westmere, SPARC-T2, and Tilera; each bar is annotated with the best-performing lock and its scalability over the single-thread execution.]

Figure 4.7 – Throughput and scalability of locks depending on the number of locks. The “X : Y” labels on top of each bar indicate the best-performing lock (Y) and the scalability over the single-thread execution (X).

Multi-sockets provide limited scalability even in the low-contention scenarios. The direct cause of this contrasting behavior is the higher latencies of the cache-coherence transitions on multi-sockets, as well as the differences in the throughput of the atomic operations. It is worth noticing that Westmere scales well when all the threads are within a socket. Performance, however, severely degrades even with one thread on a remote socket. In contrast, Opteron shows poor scalability regardless of the number of threads. The reason for this difference is the limited locality of Opteron we discussed in Section 4.4.

Implications. There is a significant difference in scalability trends between multi- and single-sockets across various degrees of contention. Moreover, even a small degree of non-uniformity can have an impact on scalability. As contention drops, simple locks should be used in order to achieve high throughput on all architectures. Overall, we argue that synchronization-intensive systems should favor platforms that provide locality (i.e., that can prevent cross-socket communication).


4.5.2 Message Passing

We evaluate the message passing implementations of SSYNC. To capture the most prominent communication patterns of a message passing application, we evaluate both one-to-one and client-server communication. The size of a message is 64 bytes (a cache line).

One-to-One Communication. Figure 4.8 depicts the latencies of two cores that exchange one-way and round-trip messages. As expected, Tilera’s hardware message passing performs the best. Not surprisingly, a one-way message over cache coherence costs roughly twice the latency of transferring a cache line. Once a core x receives a message, it brings the receive buffer (i.e., a cache line) into its local caches. Consequently, the second core y has to fetch the buffer (first cache-line transfer) in order to write a new message. Afterwards, x has to re-fetch the buffer (second transfer) to get the message. Accordingly, the round-trip case takes approximately four times the cost of a cache-line transfer: The same reasoning applies to both directions, send and then receive.
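The cache-line transfers counted above can be made concrete with the following sketch of a one-way message over shared memory; the struct layout and function names are illustrative and not the exact SSYNC implementation, but the two-transfer pattern is the same.

    #include <stdint.h>

    #define CACHE_LINE 64

    /* One cache-line-sized single-producer/single-consumer buffer. */
    typedef struct {
        volatile uint64_t flag;   /* 0: buffer free, 1: message present */
        uint8_t payload[CACHE_LINE - sizeof(uint64_t)];
    } __attribute__((aligned(CACHE_LINE))) msg_buf_t;

    /* Sender: fetches the buffer (first cache-line transfer) to write. */
    void send_one_way(msg_buf_t *b, const uint8_t *msg, int len)
    {
        while (b->flag != 0)                 /* wait until receiver is done */
            __asm__ __volatile__("pause");
        for (int i = 0; i < len; i++)        /* len <= sizeof(b->payload)   */
            b->payload[i] = msg[i];
        __asm__ __volatile__("" : : : "memory");
        b->flag = 1;                         /* publish the message         */
    }

    /* Receiver: re-fetches the buffer (second transfer) to read. */
    void recv_one_way(msg_buf_t *b, uint8_t *out, int len)
    {
        while (b->flag != 1)
            __asm__ __volatile__("pause");
        __asm__ __volatile__("" : : : "memory");
        for (int i = 0; i < len; i++)
            out[i] = b->payload[i];
        __asm__ __volatile__("" : : : "memory");
        b->flag = 0;                         /* free the buffer             */
    }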

Client-Server Communication. Figure 4.9 depicts the one-way and round-trip throughput for a client-server pattern with a single server. Again, the hardware message passing of Tilera performs the best. With 35 clients, one server delivers up to 16 Mops/s (round-trip) on Tilera (less on the other platforms). In this benchmark, the server does not perform any computation between messages, therefore these 16 Mops/s constitute an upper bound on the performance of a single server. It is interesting to note that if we reduce the size of the message to a single word, the throughput on Tilera reaches 27 Mops/s for round-trip and more than 100 Mops/s for one-way messages.

Two additional observations are worth mentioning. First, Westmere performs very well within a socket, especially for one-way messages. The inclusive LLC plays the role of the buffer for exchanging messages.

[Figure: one-way and round-trip message-passing latencies (cycles) for two cores at the various distances of Table 4.3 (same core/die/MCM, one hop, two hops, max hops) on Opteron, Westmere, SPARC-T2, and Tilera.]

Figure 4.8 – One-to-one communication latencies of message passing depending on the distance between the two cores.


[Figure: one-way and round-trip client-server throughput (Mops/s) vs. number of client threads (up to 36) on Opteron, Westmere, SPARC-T2, and Tilera.]

Figure 4.9 – Total throughput of client-server communication.

However, even with a single client on a remote socket, the throughput drops from 25 to 8 Mops/s. The second point is that, as the number of cores increases, the round-trip throughput becomes higher than the one-way throughput on Westmere. We also observe this effect on Opteron, once the length of the local computation of the server increases (not shown in the graph). This happens because the request-response model enables the server to efficiently prefetch the incoming messages. With one-way messages, the clients keep trying to send messages, saturating the incoming queues of the server. This leads to the clients busy-waiting on cache lines that already contain a message. Therefore, even if the server prefetches a message, the client will soon bring the cache line back to its own caches (or transform it to shared), making the subsequent operations of the server more expensive.

Implications. Message passing can achieve latencies similar to transferring a cache line from one core to another. This behavior is essentially not affected by contention, because each pair of cores uses individual cache lines for communication. These observations apply both to one-to-one and client-server communication. However, a single server has a rather low upper bound on the throughput it can achieve, even when not executing any computation. In a sense, we have to trade performance for scalability.

4.5.3 Hash Table (ssht)

We evaluate ssht (i.e., the concurrent hash table implementation of SSYNC) under low (512 buckets) and high (12 buckets) contention, as well as with short (12 elements) and long (48 elements) buckets. We use 80% get, 10% put, and 10% remove operations, so as to keep the size of the hash table constant. We configure ssht so that each bucket is protected by a single lock, the keys are 64-bit integers, and the payload size is 64 bytes. The scalability trends hold for other configurations as well. Finally, we configure the message passing (mp) version to use (i) one server per three cores9 and (ii) round-trip operations (i.e., all operations block, waiting for a response from the server). It should be noted that dedicating some threads as servers reduces the contention induced on the shared data of the application.

9 This configuration achieves the highest throughput.
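For reference, the lock-based variant of such a hash table boils down to one lock per bucket; the sketch below shows a get path with illustrative names (not the actual ssht code), where lock_acquire/lock_release stand for the evaluated lock algorithms.

    #include <stddef.h>
    #include <stdint.h>

    typedef struct entry {
        uint64_t key;
        void *payload;
        struct entry *next;
    } entry_t;

    typedef struct bucket {
        void *lock;                 /* one lock per bucket */
        entry_t *head;
    } bucket_t;

    extern void lock_acquire(void *lock);   /* placeholder lock API */
    extern void lock_release(void *lock);

    /* Sketch of a lock-based get: lock the bucket, walk its list. */
    void *ht_get(bucket_t *buckets, int num_buckets, uint64_t key)
    {
        bucket_t *b = &buckets[key % num_buckets];
        void *result = NULL;
        lock_acquire(b->lock);
        for (entry_t *e = b->head; e != NULL; e = e->next) {
            if (e->key == key) {
                result = e->payload;
                break;
            }
        }
        lock_release(b->lock);
        return result;
    }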


[Figure: four panels (512 or 12 buckets, with 12 or 48 entries per bucket) showing ssht throughput (Mops/s) for 1–36 threads on Opteron, Westmere, SPARC-T2, and Tilera; bars are annotated with the best-performing lock and its scalability, and “mp” marks the message-passing version.]

Figure 4.10 – Throughput and scalability of the hash table (ssht) on different configurations. The “X : Y” labels on top of each bar indicate the best-performing lock (Y) and the scalability over the single-thread execution (X).

Figure 4.10 depicts the results on the four target platforms for the aforementioned scenarios.10

Low Contention. Increasing the length of the critical sections increases the scalability of the lock-based ssht on all platforms, except for Tilera. The multi-sockets benefit from the efficient prefetching of the data of a bucket. All three systems benefit from the lower single-thread performance, which leads to higher scalability ratios. On Tilera, the local data contend with the shared data for the L2 cache space, reducing scalability. On this workload, the message passing implementation is strictly slower than the lock-based ones, even on Tilera. It is interesting to note that Westmere scales slightly even outside the 10 cores of a socket, thus delivering the highest throughput among all platforms. Finally, the best performance in this scenario is achieved by simple spinlocks.

High Contention. The results are radically different for high contention. First, the message passing version not only outperforms the lock-based ones on three out of the four platforms (for high core counts), but it also delivers by far the highest throughput. The hardware threads of SPARC-T2 do not favor client-server solutions; the servers are delayed due to the sharing of the core’s resources with other threads.

10 The single-thread throughput for message passing is actually a result of a one server / one client execution.

However, with locks, SPARC-T2 achieves a 10-fold performance increase on 36 threads, which is the best scalability among the lock-based versions and approaches the optimal 12-fold scalability. It is worth mentioning that if we do not explicitly pin the threads on cores, the multi-sockets deliver 4 to 6 times lower maximum throughput on this workload.

Summary. These experiments illustrate two major points. First, increasing the length of a critical section can partially hide the costs of synchronization under low contention. This, of course, assumes that the data accessed in the critical section are mostly being read (so they can be shared) and follow a specific access pattern (so they can be prefetched). Second, the results illustrate how message passing can provide better scalability and performance than locking under extreme contention.

4.5.4 Key-Value Store (Memcached)

Memcached (v. 1.4.15) [70] is an in-memory key-value store based on a hash table. Memcached’s hash table has a large number of buckets and is protected by fine-grained locks. However, during certain rebalancing and maintenance tasks, Memcached dynamically switches to a global lock for short periods of time. Since we are interested in the effect of synchronization on performance and scalability, we replace the default pthread mutexes that protect the hash table, as well as the global locks, with the interface provided by libslock. In order to stress Memcached, we use the memslap tool from the libmemcached library [5] (v. 1.0.15). We deploy memslap on a remote server and use its default configuration. We use 500 client threads and run a get-only and a set-only test.

Get. The get test does not cause any switches to global locks. Due to the essentially non- existent contention, the lock algorithm has little effect in this test. In fact, even completely removing the locks of the hash table does not result in any performance difference. This indicates that scalability is limited by bottlenecks other than synchronization.

Set. A write-intensive workload, however, stresses a number of global locks, which introduces contention. In the set test, the differences in lock behavior translate into differences in the performance of the application as well. Figure 4.11 shows the throughput on the various platforms using different locks. We do not present the results on more than 18 cores, since none of the platforms scales further. Changing the MUTEX to TICKET, MCS, or TAS locks achieves speedups between 29% and 50% on three of the four platforms.11 Moreover, the cache-coherence implementation of Opteron proves again problematic. Due to the periodic accesses to global locks, the previously presented issues strongly manifest themselves, resulting in a maximum speedup of 3.9. On Westmere, the throughput increases while all threads are running within a socket, after which it starts to decrease. Finally, thread scheduling has an important impact on performance: Not allocating threads to the appropriate cores decreases performance by 20% on the multi-sockets.

11 The bottleneck on SPARC-T2 is due to network and OS issues.


[Figure: Memcached set-only throughput (Kops/s, up to 250) with MUTEX, TAS, TICKET, and MCS on 1–18 threads; maximum speed-ups: Opteron 3.9x, Westmere 6x, SPARC-T2 6.03x, Tilera 5.9x.]

Figure 4.11 – Throughput of Memcached using a set-only test. The maximum speed-up vs. single thread is indicated under the platform names.

Summary. Even in an application where the main limitations to performance are networking and the main memory, when contention is involved, the impact of the cache coherence and synchronization primitives is still important. When there is no contention and the data is either prefetched or read from the main memory, synchronization is less of an issue.

4.5.5 Discussion: Beyond Locks and Message Passing

In this chapter, we focused on analyzing lock- and message-passing-based synchronization. Arguably, these are the two most commonly deployed synchronization approaches. Still, other synchronization techniques have been shown to be promising for optimizing certain coordination patterns (e.g., highly-contended critical sections). In what follows, we discuss how the results of this chapter relate to lock-free and combiner-based synchronization.

Lock-Free Synchronization. As we discuss in Chapters 1 and 2, lock-free programming avoids using locks by directly employing hardware synchronization primitives, such as compare-and-swap. As such, the scalability of lock-free designs directly depends on the performance and scalability of atomic operations, which we thoroughly analyze in Section 4.4. For instance, crossing sockets significantly increases the latencies of atomic operations and will thus inevitably be problematic for lock-free synchronization as well.

Additionally, in Chapter 6, we summarize the results of our analysis of lock-free and lock-based synchronization in concurrent search data structures [51]. Our results clearly indicate that both lock-free and lock-based data structures follow the exact same scalability trends, which are of course dictated by the underlying hardware/workload combination. Regardless of the design, achieving scalability requires minimizing synchronization.


Combiner-Based Synchronization. As we discuss in Chapter 3, there is a large body of recent work on combiner-based approaches [68, 99, 136, 137, 177]. Combiners promise to improve the scalability of highly-contended critical sections by letting “server” threads execute critical sections on behalf of other threads. In our discussion, we focus on RCL, which has been shown to be the best-performing combiner on several configurations [136, 137].

RCL replaces the “lock, execute, and unlock” pattern with remote procedure calls to a dedicated server core. The RCL approach hides the contention behind messages and enables the server to locally access the protected data. RCL is designed for improving the scalability of highly-contended locks. Our message-passing concurrent hash table of Section 4.3.2 essentially implements the RCL approach.

Intuitively, for critical sections with no or low contention, RCL still has to pay the cost of a round-trip message for synchronization. In contrast, with locks, synchronization is typically achieved with at most one cache-line transfer. If the lock is already present in the requesting core’s L1 cache (or if it is successfully prefetched), there is no cache-line transfer at all. This behavior is reflected in the hash-table results (Figure 4.10): Locks are faster than message passing under low contention.

However, for highly-contended critical sections (which are the target of RCL), we indeed see that RCL can deliver much better scalability than lock-based synchronization. As we show in Figure 4.9, client-server communication delivers stable throughput regardless of the contention level (i.e., the number of requesting threads). For RCL, this behavior translates into critical sections that do not suffer from congestion collapse as the number of threads increases.

Overall, our results corroborate that on multi-socket multi-cores, under very high contention, combiner-based approaches deliver better scalability than traditional lock algorithms.

4.6 Conclusions

In this chapter, we dissected the cost of synchronization and studied its scalability along different directions. Our analysis extended from basic hardware synchronization protocols and primitives all the way to complex concurrent software. We also considered different representative hardware architectures. The results of our experiments and our cross-platform synchronization suite, SSYNC, can be used to evaluate the potential for scaling synchronization on different platforms and to develop concurrent applications and systems. In fact, the remainder of this dissertation is heavily inspired by the synchronization study of this chapter.

Our experimentation induced various observations about synchronization on multi-cores. The first obvious one is that crossing sockets significantly impacts synchronization, regardless of the layer (e.g., cache coherence, atomic operations, locks). Synchronization scales much better within a single socket, irrespective of the contention level. Systems with heavy sharing should reduce cross-socket synchronization to the minimum. As we pointed out, this is not always possible (e.g., on the multi-socket Opteron), for the hardware can still induce cross-socket traffic, even if sharing is explicitly restricted within a socket.

Message passing can be viewed as a way to reduce sharing, as it enforces partitioning of the shared data. However, it comes at the cost of lower performance (compared to locks) on a few cores or under low contention.

Another observation is that non-uniformity affects scalability even within a single-socket multi-core—synchronization on SPARC-T2 scales better than on Tilera. Consequently, even on a single-socket multi-core such as the Tilera, a system should reduce the amount of highly-contended data to avoid performance degradation (due to the hardware).

We also noticed that each of the nine state-of-the-art lock algorithms that we evaluated performs the best on at least one workload/platform combination. Nevertheless, if we reduce the context of synchronization to a single socket (either one socket of a multi-socket, or a single-socket multi-core), then our results indicate that simple spinlocks should be preferred over more complex locks. Complex locks (e.g., queue-based locks) have a lower uncontested performance, a larger memory footprint, and only outperform simple spinlocks under relatively high contention.

At a high level, we showed that, roughly speaking, the scalability of synchronization is largely a property of the underlying multi-core. Different platforms offer different cache-coherence implementations, atomic operations, non-uniformity profiles, intra-socket locality, etc. Consequently, optimizing the synchronization of a concurrent system to the maximum is not portable, because it requires platform-specific fine-tuning.

5 An Energy Efficiency Analysis of Locking on Multi-Cores1

In Chapter 4, we analyzed synchronization in terms of throughput and latency. Nevertheless, in the past few years, energy has also become a very important factor in computing. In this chapter, we present an extensive study of the energy/performance trade-offs of lock-based synchronization on modern x86 hardware. Locks are a natural place for improving the energy efficiency of software systems. Intuitively, some locking strategies consume more power than others, thus the choice of strategy can have a significant effect on concurrent systems. In summary, this chapter illustrates that improving the energy efficiency of locks goes hand in hand with improving their throughput. The main reason behind this result is that current hardware does not provide adequate tools for reducing the power consumption of synchronization without negatively impacting throughput.

5.1 Introduction

For several decades, the main metric to measure the efficiency of computing systems has been throughput. This state of affairs started changing in the past few years as energy has become a very important factor [16]. Reducing the power consumption of systems is considered crucial today [61, 92]. Various studies estimate that datacenters have contributed over 2% of the total US electricity usage in 2010 [121], and project that the energy footprint of datacenters will double by 2020 [164].

We argue that optimizing lock-based synchronization is an effective approach to saving energy in concurrent software. The rationale is the following. First, concurrent systems are now mainstream and need to synchronize their activities. In most cases, synchronization is achieved through locking. Hence, designing locking schemes that reduce energy consumption can affect many software systems. Second, locks are well-defined abstractions and one can usually replace the lock implementation without any modification to the rest of the system. Third, the choice of the locking scheme can have a significant effect on energy consumption. Indeed, the main consequence of synchronization is having some threads wait for one another, which is an opportunity for saving energy.

1 Appeared in: Babak Falsafi, Rachid Guerraoui, Javier Picorel and Vasileios Trigonakis. “Unlocking energy.” USENIX ATC 2016.


[Figure: (a) power consumption (lower is better) and (b) energy efficiency (higher is better) of the mutex and spinlock versions, relative to mutex, on 10 and 20 threads.]

Figure 5.1 – Power consumption and energy efficiency of CopyOnWriteArrayList with the mutex and spinlock lock algorithms.

To illustrate this opportunity, consider the average power consumption of two versions of a java.util.concurrent.CopyOnWriteArrayList [166] stress test over a long-running execution (Figure 5.1(a)). The two versions differ in how the lock handles contention: Mutexes use sleeping, while spinlocks employ busy waiting. With sleeping, the waiting thread is put to sleep by the OS until the lock is released. With busy waiting, the thread remains active, polling the lock until it is finally released. Choosing sleeping as the waiting strategy reduces power consumption by up to 33%. Hence, as we pointed out, the choice of locking strategy can have a significant effect on power consumption.

Accordingly, privileging sleeping with mutex locks seems like the systematic way to go. This choice, however, is not as simple as it looks. What really matters is not only the power consumption, but the amount of energy consumed for performing some work, namely energy efficiency. In the Figure 5.1 example, although the spinlock version consumes 50% more power than mutex, it delivers 25% higher energy efficiency (Figure 5.1(b)) for it achieves twice the throughput. Hence, indeed, locking is a natural place to look for saving energy. Yet, choosing the best lock algorithm is not straightforward.

To finalize the argument that optimizing locks is a good approach to improve the energy efficiency of systems, we need locks that not only reduce power, but also do not hurt throughput. Is that even possible?

We show that the answer to this question is positive. We argue for the POLY2 conjecture: Energy efficiency and throughput go hand in hand in the context of lock algorithms. POLY suggests that we can optimize locks to improve energy efficiency without degrading throughput; the two go hand in hand. Consequently, we can apply prior throughput-oriented research on lock algorithms almost as is in the design of energy-efficient locks as well.

2 POLY stands for “Pareto optimality in locks for energy efficiency.”


We argue for our POLY conjecture through a thorough analysis of the energy efficiency of locks on two modern Intel processors and six software systems (i.e., Memcached, MySQL, SQLite, RocksDB, HamsterDB, and Kyoto Cabinet). We conduct our analysis in three layers. We start by analyzing the hardware and software artifacts available for synchronization (e.g., pausing instructions, the Linux futex system call). Then, we evaluate optimized variants of lock algorithms in terms of throughput and energy efficiency. Finally, we apply our results to the six software systems. We derive from our analysis the following observations that underlie POLY:

Busy waiting inherently hurts power consumption. With busy waiting, the underlying hardware context remains active. On Intel machines, for example, it is not practically feasible to reduce the power consumption of busy waiting. First, there is no power-friendly pause instruction to be used in busy-wait loops. The conventional way of reducing the cost of these loops, namely the x86 pause instruction, actually increases power consumption. Second, the monitor/mwait instructions require kernel-level privileges, thus using them in user space incurs high overheads. Third, traditional DVFS techniques for decreasing the voltage and frequency of the cores (hence lowering their power consumption) are too coarse-grained and too slow to use. Consequently, the power consumption of busy waiting can simply not be reduced. The only way is to look into sleeping.

Sleeping can indeed save power. Our Xeon (Ivy in Section 2.1) server has approximately 55 Watts idle power and a max total power consumption of 206 Watts. Once a hardware context is active, it draws power, regardless of the type of work it executes. We can save this power if threads are put to sleep while waiting behind a busy lock. The OS can then put the core(s) in one of the low-power idle states [110]. Furthermore, when there are more software threads than hardware contexts in a system, sleeping is the only way to go in locks, because busy waiting kills throughput.

However, going to sleep hurts energy efficiency. The futex system call implements sleeping in Linux and is used by pthread mutex locks. In most realistic scenarios, the futex-call overheads offset the energy benefits of sleeping over busy waiting, if any, resulting in worse energy efficiency. Additionally, the spin-then-sleep policy of mutex is not tuned to account for these overheads. The mutex spins for up to a few hundred cycles before employing futex, while waking up a sleeping thread takes at least 7000 cycles. As a result, it is common that a thread makes the costly futex call to sleep, only to be immediately woken up, thus wasting both time and energy. We design MUTEXEE, an optimized version of mutex that takes the futex overheads into account.
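To make the spin-then-sleep pattern concrete, the sketch below shows a minimal futex-based lock in this style (not the actual MUTEX or MUTEXEE code; the spin threshold and state encoding are illustrative).

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define SPIN_TRIES 1024   /* illustrative; the point of MUTEXEE is to
                               * tune this against the >= 7000-cycle cost
                               * of waking a sleeping thread */

    /* Lock states: 0 = free, 1 = taken, 2 = taken and maybe waiters. */
    void spin_then_sleep_lock(int *lock)
    {
        for (int i = 0; i < SPIN_TRIES; i++) {      /* busy-wait phase */
            if (__atomic_load_n(lock, __ATOMIC_RELAXED) == 0) {
                int expected = 0;
                if (__atomic_compare_exchange_n(lock, &expected, 1, 0,
                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
                    return;
            }
            __asm__ __volatile__("pause");
        }
        /* Sleeping phase: mark possible waiters, then futex-wait. */
        while (__atomic_exchange_n(lock, 2, __ATOMIC_ACQUIRE) != 0)
            syscall(SYS_futex, lock, FUTEX_WAIT, 2, NULL, NULL, 0);
    }

    void spin_then_sleep_unlock(int *lock)
    {
        if (__atomic_exchange_n(lock, 0, __ATOMIC_RELEASE) == 2)
            syscall(SYS_futex, lock, FUTEX_WAKE, 1, NULL, NULL, 0);
    }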

Thus, some threads have to go to sleep for long. An unfair lock can put threads to sleep for long durations in the presence of high contention. Doing so results in lower power consumption, as fewer threads (hardware contexts) are active during the execution. In addition, lower fairness brings (i) better throughput, as the contention on the lock is decreased, and (ii) higher tail latencies, as the latency distribution of acquiring the lock might include some large values.


Overall, on current hardware, every power trade-off is also a throughput and a latency trade-off (motivating the name POLY): (i) sleeping vs. busy waiting, (ii) busy waiting with vs. without DVFS or monitor/mwait, and (iii) low vs. high fairness. The main reason for these trade-offs is that hardware does not provide adequate tools for reducing the power consumption in locks without destroying throughput.

Interestingly, in our quest to substantiate POLY, we optimize state-of-the-art locking techniques to increase the energy efficiency of our considered systems. We improve the energy efficiency of our target systems by 33% on average, driven by a 31% increase in throughput. These improvements are either due to completely avoiding sleeping by using spinlocks, or due to reducing the frequency of sleep/wake-up invocations using our new MUTEXEE scheme.

We conduct our analysis on two modern Intel platforms as they provide tools (i.e., RAPL interface [111]) for accurately measuring the energy consumption of the processor. Still, we believe that POLY holds on most modern multi-cores. On the one hand, without explicit hardware support, busy waiting on any multi-core exhibits similar behavior. On the other hand, futex implementations are alike regardless of the underlying platform, thus the overheads of sleeping will always be significant. However, should the hardware provide adequate tools for fine-grained energy optimizations in software, POLY might need to be revised. We discuss the topic further in Section 5.7.

In summary, the main contributions of this chapter are:

• An extensive analysis of the energy efficiency of locks. The results of this analysis can be used to optimize lock algorithms for energy efficiency.
• The POLY conjecture, stating that we can simply, yet effectively, optimize lock-based synchronization to improve the energy efficiency of software systems.
• Our lock libraries and benchmarks, available at: http://lpd.epfl.ch/site/lockin.
• MUTEXEE, an improved variant of the pthread mutex lock. MUTEXEE delivers on average 28% higher energy efficiency than mutex on six modern systems.

It is worth noting that POLY might not seem surprising to a portion of the multi-core community. Yet, we believe it is important to clearly state POLY and to quantify, through a thorough analysis, the reasons why it is valid on current hardware. As we discuss in Section 5.7, our results have important software and hardware ramifications.

The rest of the chapter is organized as follows. In Section 5.2, we describe our experimental methodology. We provide some extended details regarding our target platforms in Section 5.3 and explore techniques for reducing the power of synchronization in Section 5.4. We analyze the energy efficiency of locks in Section 5.5 and use our results to improve various software systems in Section 5.6. We conclude the chapter in Section 5.7.


5.2 Methodology

Lock-Based Synchronization. As we describe in Chapter 2, locks ensure mutual exclusion; only the holder of the lock can proceed with its execution. The remaining threads wait until the holder releases the lock. This waiting is implemented with either sleeping (blocking) or busy waiting (spinning) [172].

In this chapter, we analyze the energy-efficiency trade-offs between sleeping and spinning. We use the pthread mutex lock (MUTEX) as our baseline sleeping lock and we consider various spinlock algorithms (i.e., TAS, TTAS, TICKET, MCS, CLH) for representing busy waiting—see Table 2.1 for a description of these algorithms.

Energy Efficiency of Software. Energy efficiency represents the amount of work produced for a fixed amount of energy and can be defined as throughput per power (TPP, operations/Joule). Higher TPP represents a more energy-efficient execution. We use the terms energy efficiency and TPP interchangeably. Alternatively, energy efficiency can be defined as the energy spent on a single operation, namely energy per operation (EPO, Joules/operation). Note that TPP = 1/EPO.
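As a purely illustrative example (the numbers are hypothetical, not measurements), a system that sustains 10 Mops/s while drawing 100 Watts has

\[
\text{TPP} \;=\; \frac{10^{7}\ \text{operations/s}}{100\ \text{J/s}} \;=\; 10^{5}\ \text{operations/Joule},
\qquad
\text{EPO} \;=\; \frac{1}{\text{TPP}} \;=\; 10\ \mu\text{J/operation}.
\]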

Experimental Methodology. We prefer TPP over EPO because both throughput and TPP are “higher-is-better” metrics. Recent Intel processors include the RAPL [111] interface for accurately measuring energy consumption. RAPL provides counters for measuring the cores’, package, and DRAM energy. We use these energy measurements to calculate average power. Our microbenchmark results are the median of 11 repetitions of 10 seconds. When we vary the number of threads, we first use the cores within a socket, then the cores of the second socket, and finally, the remaining (second, third, etc.) hardware threads of each core.

5.3 Power Consumption of Target Platforms

In this chapter, we use two modern Intel x86 platforms, namely Ivy and Ivy-desktop of Section 2.1. We focus on these two platforms as they both offer hardware counters for accurately measuring their energy consumption. For brevity, we only present the experimental results of our Ivy server; the results on Ivy-desktop are in accordance with the ones on Ivy. We first provide more details about our two target platforms and then estimate their maximum power consumption.

Platforms. Ivy runs on frequencies scaling from 1.2 to 2.8 GHz due to DVFS and uses the Linux kernel 3.13 and glibc 2.13. Ivy-desktop runs on frequencies scaling from 1.6 to 3.5 GHz due to DVFS and runs the Linux kernel 3.2 and glibc 2.15. We disable Intel Turbo Boost [111].

5.3.1 Estimating Maximum Power Consumption

We estimate the maximum power that Ivy can consume, using a memory-intensive benchmark that consists of threads sequentially accessing large chunks of memory from their local node.


[Figure: power (Watts, up to 200) vs. number of active hardware contexts (0–40) at the minimum and maximum frequency settings, broken down into total, package, cores, and DRAM power.]

Figure 5.2 – Power-consumption breakdown on Ivy.

Figure 5.2 depicts the total power and the power of different components on Ivy, depending on the number of active hardware contexts and the voltage-frequency (VF) setting.

Idle Power Consumption. The 0-thread points represent the idle power consumption, which accounts for the static power in cores and caches, and DRAM background power, and is the power that is consumed when all cores are inactive.3 In both min and max frequency settings the total idle power is 55.5 Watts as the VF setting only affects the active power.

Power of Active Cores. Activating the first core of a socket is more expensive than activating any subsequent core, due to the activation of the uncore package components. In particular, it costs 6.4 and 13.6 Watts of package power on the min and max VF settings, respectively. The second core costs 2.3 and 5.6 Watts. We perform more experiments (not shown in the graphs) with data sets that fit in the L1, L2, and LLC. The results show that the package power is not vastly reduced on any of these workloads, indicating that once a core is active, it consumes a certain amount of power that cannot be avoided.

Attribution of Power to Cores, Package, and Memory. Notice the breakdown of total power to package/core4 and DRAM power. DRAM power has a smaller dynamic range than package and core power. On the max VF setting, DRAM power ranges from 25 to 74 Watts, while the range of package power is from 30 to 132 Watts, and core power from 4 up to 96 Watts.

Implications. The power consumption of Ivy ranges from 55 up to 206 Watts. Out of the 206 Watts, 74 Watts are spent on the DRAM memory. Locks are typically transferred within the processor by the cache-coherence protocol, thus limiting the opportunities for reducing power to package power (30-132 Watts). Additionally, once a core is active, the core draws power, regardless of the type of work performed. Consequently, the opportunity for reducing power consumption in software is relatively low and mostly has to do with (i) using fewer cores, by, for example, putting threads to sleep, or (ii) reducing the frequency of a core.

3 Still, the OS briefly enables a few cores during the measurements.
4 The package power includes the core power.


5.4 Reducing Power Consumption in Synchronization

In this section, we evaluate the costs of busy waiting and sleeping, and examine different ways of reducing them.

5.4.1 Power: The Price of Busy Waiting

We measure the total power consumption of the three main waiting techniques (i.e., sleeping, global spinning, and local spinning—see Section 5.2) when all threads are waiting for a lock that is never released. Figure 5.3 shows the power consumption and the cycles per instruction (CPI). CPI represents the average number of CPU cycles that an instruction takes to execute. CPI is typically used to show how efficiently some software executes on hardware (Ivy is able to deliver as low as 0.25 cycles per instruction).

Two main points stand out. First, in this extreme scenario, sleeping is very efficient because the waiting threads do not consume any CPU resources. In this benchmark, as nothing else executes on the processor, the OS puts all cores to low-power states. Second, local spinning consumes up to 3% more power than global spinning. This behavior is explained by the CPI graph: Global spinning performs atomic operations on the shared memory address of the lock, resulting in a very high CPI (up to 530 cycles). In local spinning, every thread executes an L1 load each cycle, whereas in global spinning, a store over coherence occurs only once the atomic operation completes, every 530 cycles on average.

5.4.2 Reducing the Price of Busy Waiting

We reduce the power consumption of busy waiting in different ways: (i) we examine various ways of pausing in spin-wait loops, (ii) we employ DVFS, and (iii) we use monitor/mwait to “block” the waiting threads.

[Plot: power (Watts) and CPI vs. number of threads for sleeping, global spinning, and local spinning.]

Figure 5.3 – Power consumption and CPI while waiting.


Pausing Techniques. Busy waiting with local spinning is power hungry, because threads execute with low CPI. Essentially, it is one of the rare cases in computing where the efficient execution of a piece of software on hardware is a problem. Hence, to reduce the power consumption of busy waiting, we must increase the spin-loop’s CPI. We take several approaches to this end (Figure 5.4).

Any instruction that the out-of-order core can hide cannot reduce the power of the spin loop. For example, using the nop instruction to add a “bubble” in the pipeline of the core decreases CPI from 0.33 to 0.25 and slightly increases power consumption compared to the empty loop (not shown in the graph). According to Intel’s Software Developer’s Manual [111], “Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.” A pause (local-pause) increases CPI to 4.6. However, not only does it not “greatly reduce” power, but it even increases the power consumption by up to 4%. We speculate that one of the reasons for this increase in power is that pause gives a hint to the core to prioritize the other hardware context.

In general, the reason behind the very low CPI of local spinning is the aggressive execution mechanisms of modern processors that allow instructions to execute both speculatively and out of the program order. This results in one out of three of the retired operations being a memory load (the other two are a test and a conditional jump). Without appropriate pausing, the spin loop retires one memory load per cycle.

A way to avoid the speculative execution of the load is to insert a full, or a load, memory barrier. That way, a load only executes once the previous load retires, and the instructions that depend on it, test and jump, are stalled as well. The results (local-mbar) show that the barrier reduces the power consumption of local spinning to the point that it becomes less expensive than global spinning (global). Additionally, local-mbar consumes up to 7% less power than local-pause. It is worth noting that local-mbar consumes less power than local-pause even for low thread counts (e.g., 5% on 10 threads). In the rest of the chapter, we use a memory barrier for pausing in spin loops.
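To make the comparison concrete, the following simplified spin loop contrasts the pausing variants discussed above; it is a sketch, not the exact code of our benchmark.

    /* Local spinning on a cached copy of the lock word. */
    #include <stdint.h>
    #include <emmintrin.h>  /* _mm_pause, _mm_mfence */

    static inline void wait_until_free(volatile uint32_t *lock)
    {
      while (*lock != 0) {
        /* local:       empty body -- retires one load per cycle (CPI ~0.33)  */
        /* local-pause: _mm_pause(); increases CPI to ~4.6, but not the power */
        /* local-mbar:  a memory barrier stalls the next load until the       */
        /*              previous one retires, reducing power the most         */
        _mm_mfence();
      }
    }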

[Plot: power (Watts) and CPI vs. number of threads for global, local, local-pause, and local-mbar.]

Figure 5.4 – Power consumption and CPI while spinning.


Dynamic Voltage and Frequency Scaling (DVFS). An intuitive way of lowering the power consumption of an active core is to reduce the voltage-frequency (VF) point via DVFS (see Section 5.3). Figure 5.5 shows that spinning on VF-min consumes up to 1.7x less power than on the VF-max setting. Still, DVFS is currently impractical for dynamically reducing power in busy waiting.

First, to trigger the VF change with DVFS, we need to write to a per-hardware-context file under the /sys/devices directory (more details about DVFS can be found in [213]). Hence, the VF-switch operation is slow: We measure that it takes 5300 cycles on Ivy. If DVFS is used while busy waiting, this overhead will be on the critical path when the lock is acquired and the thread must switch back to the maximum VF point. Second, both hardware contexts of a physical core share the same VF setting—the higher of the two. If a context lowers its VF setting, the action will have no effect unless the second context has the same or a lower VF setting. Consequently, using DVFS with SMT is tricky, and as the DVFS-normal line shows, the power consumption drops only when both hardware contexts lower their VF points.
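As an illustration, a VF switch from user space can be sketched as a write to the cpufreq sysfs files; the exact file (here, scaling_setspeed under the userspace governor) and whether per-context scaling is honored are assumptions that depend on the kernel and the cpufreq driver.

    /* Sketch: lower the VF point of one hardware context via cpufreq sysfs. */
    #include <stdio.h>

    static int set_frequency_khz(int cpu, long khz)
    {
      char path[128];
      snprintf(path, sizeof(path),
               "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed", cpu);
      FILE *f = fopen(path, "w");
      if (f == NULL)
        return -1;
      int ok = fprintf(f, "%ld", khz) > 0; /* this write costs ~5300 cycles on Ivy */
      fclose(f);
      return ok ? 0 : -1;
    }

    /* e.g., set_frequency_khz(4, 1200000) before waiting,
     *       set_frequency_khz(4, 2800000) once the lock is acquired */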

Monitor/mwait. The monitor/mwait [111] instructions allow a hardware context to block and enter an implementation-dependent optimized state while waiting for a store event. In detail, monitor allows a thread to declare a memory range to monitor. The hardware thread then uses mwait to enter an optimized state until a store is performed within the address range. Essentially, mwait implements sleeping in hardware and can be used in spin-wait loops: The hardware sleeps, yet the thread does not release its context.
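The following sketch shows the shape of such a hardware-sleeping wait loop using the monitor/mwait intrinsics; on our platforms these instructions execute only in kernel mode, so in practice the loop lives behind the kernel interface described next, and the helper name is ours.

    /* Sketch: wait for a store to *addr using monitor/mwait (kernel mode only). */
    #include <pmmintrin.h>  /* _mm_monitor, _mm_mwait */
    #include <stdint.h>

    static inline void mwait_until_set(volatile uint32_t *addr)
    {
      while (*addr == 0) {
        _mm_monitor((const void *)addr, 0, 0); /* arm monitoring of addr's cache line */
        if (*addr != 0)                        /* re-check to avoid missing a store   */
          break;
        _mm_mwait(0, 0);                       /* optimized state until the line is written */
      }
    }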

These instructions require kernel privileges. We develop a virtual device and overload its file_operations functions to allow a user program to declare and wait on an address, similar to [9]. A thread can wake up others with a user-level store. However, threads pay the user-to-kernel switch and system-call overheads for waiting.

Figure 5.5 includes the power of busy waiting with monitor/mwait. These instructions can reduce power consumption over conventional spinning by up to 1.5x. However, similarly to DVFS, using monitor/mwait has two shortcomings. First, monitor/mwait can only be used in kernel space.

[Plot: power (Watts) vs. number of threads for VF-max, VF-min, DVFS-normal, and monitor/mwait.]

Figure 5.5 – Power consumption of busy waiting using DVFS and monitor/mwait.

The overloaded file operation takes roughly 700 cycles. The best-case wake-up latency from mwait, with just one core “sleeping,” is 1600 cycles. In comparison, “waking up” a locally-spinning thread takes two cache-line transfers (i.e., 280 cycles). Second, programming with monitor/mwait on Intel processors can be elaborate and limiting. The mwait instruction blocks the hardware context until the thread is awakened. In oversubscribed environments (i.e., more threads than hardware contexts), monitor/mwait will likely exacerbate the “livelock” issues of spinlocks (see Section 5.6). Blocked threads might occupy most hardware contexts, thus preventing other threads from performing useful work.

Implications. Busy waiting drains a lot of power because cores execute at full speed. Neither of the two platforms provides sufficient tools for reducing power consumption in a systematic way. Pausing techniques, such as pause, can even increase the power of busy waiting. Techniques that can significantly reduce power, such as DVFS and monitor/mwait, are not designed for user-space usage as they require expensive kernel operations. Hence, sleeping is currently the only practical way of reducing the power consumption in locks.

5.4.3 Latency: The Price of Sleeping

In Linux, sleeping is implemented with futex system calls. A futex-sleep call puts the thread to sleep on a given memory address. A futex wake-up call wakes up the first N threads sleeping on an address (N = 1 in locks). The futex calls are protected by kernel locks. In particular, the kernel holds a hash table (array) of locks, and futex operations calculate the particular lock to use by hashing the address. Given that the array is large (approximately 256 ∗ #cores locks), the probability of false contention is low. However, operations on the same address (same MUTEX) do contend at the kernel level.
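For reference, minimal wrappers around the two futex operations used by locks can be sketched as follows; glibc exposes no futex() wrapper, so the raw system call is used, and error handling is omitted.

    /* Minimal futex wrappers (Linux). */
    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>

    /* Sleep iff *addr still equals val (re-checked by the kernel under its lock). */
    static long futex_wait(volatile uint32_t *addr, uint32_t val)
    {
      return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
    }

    /* Wake up at most n threads sleeping on addr (n = 1 in locks). */
    static long futex_wake(volatile uint32_t *addr, int n)
    {
      return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
    }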

We use a microbenchmark where two threads run in lock-step execution (synchronized at each round with barriers)—Figure 5.6. One thread makes futex-sleep calls, while the second thread makes wake-up calls on the same futex, after waiting for some time. A futex-sleep call (i.e., enqueuing behind the lock and descheduling the thread) takes around 2100 cycles (estimated as the required delay between sleep and wake-up calls for the wake-up calls to almost always find the other thread sleeping).

[Plot: latency (Kcycles) of the wake-up call and of the turnaround (95th percentile) vs. the delay between futex sleep and wake-up calls (cycles, log10).]

Figure 5.6 – Latency of different futex operations.

This sleep latency is not necessarily on the critical path: The thread sleeps because the lock is occupied. However, the latency to wake up a thread and the one for the woken-up thread to be ready to execute are on the critical path. Figure 5.6 contains the wake-up call and the turnaround latencies, depending on the delay between the invocation of the sleep and the wake-up calls. The turnaround latency is the time from the wake-up invocation until the woken-up thread is running.5

The turnaround time is at least 7000 cycles and is higher than the wake-up call latency. Apart from the approximately 2700 cycles of the wake-up call, the woken-up thread requires at least 4000 more cycles before executing. Concretely, once the wake-up call finishes, the woken-up thread pays the cost of idle-to-active switching and the cost of scheduling.6

Figure 5.6 further includes two interesting points. First, for low delays between the two calls, the wake-up call is more expensive as it waits behind a kernel lock for the completion of the sleep call. Second, when the delay between the calls is very large (>600K cycles), the turnaround latency explodes, because the hardware context sleeps in a deeper idle state [135].

Finally, the results in Figure 5.6 use just two threads and thus represent the best-case latencies, with minimal or no contention at the kernel level. With more threads, a wake-up invocation is likely to contend with futex sleep calls, all serialized using a single kernel lock.

Implications. futex operations have high latencies and consume energy, as a non-negligible number of instructions are executed. Handing over a lock with a futex wake-up call requires at least 7000 cycles. Even on rather lengthy critical sections (e.g., 10000 cycles), this latency is prohibitive; it almost doubles the execution time of the critical section. In this case, the energy benefits of sleeping will not easily compensate for the performance losses. In short critical sections, invoking futex calls will have detrimental effects on performance.

5.4.4 Reducing the Price of Sleeping

Sleeping can save energy over long waiting durations. We estimate when sleeping reduces power consumption with two threads:

Period between wake-up calls (cycles)    1024     2048     4096     8192
Power (Watts)                            72.03    69.18    68.75    68.02

The first thread sleeps on a location, while the second periodically wakes up the first thread. We vary the period between the wake-up invocations, which essentially represents the critical-section duration in locks. The results confirm that if a thread is woken up before its futex-sleep call completes (i.e., the period is shorter than the futex-sleep latency), power consumption is not reduced. The thread goes to sleep only to be immediately woken up by a concurrent wake-up call. When these “sleep misses” happen, we lose performance without any power reduction. Once the delay becomes larger than the sleep latency (i.e., approximately 2100 cycles on Ivy), we start observing power reductions.

5 The wake-up call latency is directly measured in our microbenchmark, while the turnaround time is estimated as the duration of the sleep call, reduced by the delay between the sleep and wake-up calls.


[Plot: power (Watts) and communication throughput (Mops/s) vs. number of threads for sleep, spin, ss-1, ss-10, ss-100, and ss-1000.]

Figure 5.7 – Power consumption and communication throughput of sleeping, spinning, and spin-then-sleep for various Ts.8

Reducing Fairness. We show two problems with futex-based sleeping: (i) turnaround latencies are high, and (ii) frequent sleeps and wake-ups do not reduce power. To fix both problems simultaneously, we recognize the following trade-off: We can let some threads sleep for long periods, while the rest coordinate with busy waiting. If the communication is mostly done via busy waiting, we almost remove the futex wake-up calls from the critical path. Additionally, we let threads sleep for long periods, a requirement for reducing power consumption.

This optimization comes at the expense of fairness. The longer a thread sleeps while some others progress, the more unfair the lock becomes. We experiment with the extreme case where only two threads communicate via busy waiting, while the rest sleep. Each active thread has a “quota” T of busy-waiting repetitions, after which it wakes up another thread to take its turn. Figure 5.7 shows the power and the communication rate (similar to a lock handover) of sleeping, busy waiting, and spin-then-sleep (ss-T) with various Ts on a single futex. T is the ratio of busy-waiting over futex handovers.

Figure 5.7 clearly shows that the more unfair an execution (i.e., the larger the T), the better the energy efficiency. First, larger Ts result in lower power, because the sleep and wake-up futex calls become infrequent, hence sleeping threads sleep for a long duration. For example, on 10 threads with T = 1000, threads sleep for about 2M cycles. In comparison, with only sleeping, the sleep duration is less than 90000 cycles. Second, spin handovers face minimal contention, as only two threads attempt to “acquire” the cache line. Consequently, because most handovers (99.9%) happen with spinning, the latency is very low, resulting in high throughput.

Implications. Frequent futex calls will hurt the energy efficiency of a lock. A way around this problem is to reduce lock fairness in the face of high contention, by letting only a few threads use the lock as a spinlock, while the remaining threads are asleep.

6 When the core is active due to multiprogramming, the turnaround latency only includes the scheduling delays.
8 The performance collapse of spin is due to contention, while that of ss-10 and ss-100 is due to the high idle-to-active switching costs (see Figure 5.6).


5.5 Energy Efficiency of Locks

We evaluate the behavior of various locks in terms of energy efficiency and throughput, relying on the results of Section 5.4. We first introduce MUTEXEE, an optimized version of MUTEX.

5.5.1 MUTEXEE: An Optimized MUTEX Lock

In Section 5.4, we analyze the overhead of futex calls. Additionally, we show how we can trade fairness for energy efficiency. MUTEX does not explicitly take these trade-offs into account, although it is an unfair lock.

In particular, MUTEX by default attempts to acquire the lock once before employing futex. MUTEX can be configured (with the PTHREAD_MUTEX_ADAPTIVE_NP initialization attribute) to perform up to 100 acquire attempts before sleeping with futex.9 Still, threads spin up to a few hundred cycles on the lock before sleeping with futex (the exact duration depends on the contention on the cache line of the lock). This behavior can result in very poor performance for critical sections of up to 4000 cycles. In brief, threads are put to sleep, although the queuing time behind the lock is less than the futex-sleep latency. Additionally, to release a lock, MUTEX first sets the lock to “free” in user space and then wakes up one sleeping thread (if any).

However, a third concurrent thread can acquire the lock before the newly awakened thread Taw is ready to execute. Taw will then find the lock occupied and sleep again, thus wasting energy, creating unnecessary contention, and breaking lock fairness.
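As a side note, the adaptive variant mentioned above is selected at initialization time; a minimal sketch using the glibc-specific attribute:

    #define _GNU_SOURCE
    #include <pthread.h>

    pthread_mutex_t lock;

    void init_adaptive_mutex(void)
    {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      /* spin for a bounded number of attempts before sleeping with futex */
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
      pthread_mutex_init(&lock, &attr);
      pthread_mutexattr_destroy(&attr);
    }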

To fix these two shortcomings, we design an optimized version of MUTEX, called MUTEXEE. Table 5.1 details how MUTEXEE differs from the traditional MUTEX. The “wait in user space” step of unlock requires further explanation. MUTEXEE, after releasing the lock in user space, but before invoking futex, waits for a short period to detect whether the lock is acquired by another thread in user space. In that case, the unlock operation returns without invoking futex. The waiting duration must be proportional to the maximum coherence latency of the processor (e.g., 384 cycles on Ivy).
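The unlock path just described can be sketched as follows; the type, field names, and the futex_wake helper are illustrative and do not correspond to the exact MUTEXEE code.

    #include <stdint.h>

    typedef struct {
      volatile uint32_t locked;
      volatile uint32_t has_sleepers;
    } mutexee_t;

    extern long futex_wake(volatile uint32_t *addr, int n); /* FUTEX_WAKE wrapper */

    #define UNLOCK_WAIT 384  /* ~ maximum coherence latency on Ivy (cycles) */

    void mutexee_unlock(mutexee_t *m)
    {
      m->locked = 0;                          /* release in user space */
      for (int i = 0; i < UNLOCK_WAIT; i++) { /* wait in user space */
        if (m->locked != 0)
          return;                             /* handed over without futex */
        asm volatile("" ::: "memory");        /* force re-reading the lock word */
      }
      if (m->has_sleepers)
        futex_wake(&m->locked, 1);            /* wake up one sleeping thread */
    }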

9 For brevity, in our graphs we show the default MUTEX configuration (i.e., without PTHREAD_MUTEX_ADAPTIVE_NP). We choose the default MUTEX version because: (i) it is the default in our systems (Section 5.6), and (ii) we thoroughly compare the two versions and conclude that for most configurations MUTEX is slightly faster without the adaptive attribute.

         MUTEX                                       MUTEXEE
lock     spin with pause for up to ∼1000 cycles;     spin with mfence for up to ∼8000 cycles;
         if still busy, sleep with futex             if still busy, sleep with futex
unlock   release in user space (lock->locked = 0);   release in user space (lock->locked = 0);
         wake up a thread with futex                 wait in user space;
                                                     wake up a thread with futex

Table 5.1 – Differences between MUTEX and MUTEXEE.


Moreover, MUTEXEE operates in one of two modes: (i) spin, with ∼ 8000 cycles of spinning in the lock function and ∼ 384 in unlock, and (ii) mutex, with ∼ 256 cycles in lock and ∼ 128 in unlock (used to avoid useless spinning). MUTEXEE keeps track of statistics regarding how many handovers occur with busy waiting and with futex. Based on those statistics, MUTEXEE periodically decides on which mode to operate in: If the futex-to-busy-waiting handovers ratio is high (>30%), MUTEXEE uses mutex, otherwise it remains in spin mode.
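The periodic mode decision can be sketched as follows; the bookkeeping fields and the exact form of the 30% test are illustrative rather than the actual MUTEXEE code.

    /* Periodically decide the operating mode of MUTEXEE (sketch). */
    enum mutexee_mode { MODE_SPIN, MODE_MUTEX };

    enum mutexee_mode mutexee_choose_mode(unsigned futex_handovers,
                                          unsigned spin_handovers)
    {
      /* futex-to-busy-waiting handover ratio above ~30%: long spinning is wasted */
      if (spin_handovers == 0 ||
          100 * futex_handovers > 30 * spin_handovers)
        return MODE_MUTEX;  /* ~256 cycles in lock, ~128 in unlock */
      return MODE_SPIN;     /* ~8000 cycles in lock, ~384 in unlock */
    }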

Our design sensitivity analysis for MUTEXEE (not shown in the graphs) highlights three main points. First, spinning for more than 4000 cycles is crucial for throughput: MUTEXEE with 500 cycles of spinning behaves similarly to MUTEX. Second, the “wait in user space” functionality is crucial for power consumption (and improves throughput): If we remove it, MUTEXEE consumes similar power to MUTEX. Finally, the spin and mutex modes of MUTEXEE can save power on lengthy critical sections.

Fine-Tuning MUTEXEE. The default configuration parameters of MUTEXEE should be suitable for most x86 processors. Still, these parameters are based on the latencies of the various events that happen in a futex-based lock, such as the latency of sleeping or waking up. Accordingly, in order to allow developers to fine-tune MUTEXEE for a platform, we provide a script which runs the necessary microbenchmarks and reports the configuration parameters that can be used for that platform.

Comparing MUTEXEE to MUTEX. Figure 5.8 depicts the ratios of throughput and energy efficiency of MUTEXEE over MUTEX on various configurations on a single lock. MUTEXEE indeed fixes the problematic behavior of MUTEX for critical sections of up to 4000 cycles. While MUTEX continuously puts threads to sleep and wakes them up shortly after, MUTEXEE lets the threads sleep for larger periods and keeps most lock handovers futex free. Of course, the latter behavior of MUTEXEE results in lower fairness, as shown in Figure 5.9. Up to 4000 cycles, MUTEXEE achieves much lower 95th percentile latencies than MUTEX, because most lock handovers are fast with busy waiting. However, the price of this behavior is a few extremely high latencies, as shown in the 99.99th percentile graph. These values are caused by the long-sleeping threads and represent the trade-off between lock fairness and energy efficiency. As the critical section size increases, the behavior of the two locks converges: Both locks are highly unfair as they allow very high tail latencies (the main reason for this unfairness is that, as we show in Figure 5.6, waking up with futex takes a lot of time, hence the just-woken-up threads find the lock occupied by another thread that acquired the lock in the meantime).

[Plot: throughput and TPP ratios of MUTEXEE over MUTEX vs. number of threads and critical-section size (Kcycles).]

Figure 5.8 – Throughput and TPP ratios of MUTEXEE over MUTEX on various configurations with a single lock.


[Plot: 95th and 99.99th percentile latency (Mcycles) of MUTEX and MUTEXEE vs. critical-section duration (Kcycles).]

Figure 5.9 – 95th/99.99th percentile latency of a MUTEX and a MUTEXEE on various configurations.

Reducing MUTEXEE’s Tail Latencies. MUTEXEE purposefully reduces the number of futex invocations by handing the lock over in user space whenever possible. Therefore, it might let some threads sleep while the rest keep the lock busy, resulting in high tail latencies. A straightforward way to limit these tail latencies, so that threads are not allowed to remain “indefinitely” asleep, is to use a timeout for the futex sleep call. Once a thread is woken up due to a timeout, the thread spins until it acquires the lock, without the possibility to sleep again. Of course, one can design more elaborate variants of this protocol. Controlling this timeout essentially controls the maximum latency of the lock (given that the sleep duration is significantly larger than the critical sections protected by that lock).
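The timeout itself maps directly to the futex interface, which accepts a relative timeout for FUTEX_WAIT; a sketch of such a bounded sleep (error handling omitted):

    #define _GNU_SOURCE
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <time.h>

    /* Sleep on addr for at most timeout_ns; returns -1 with errno == ETIMEDOUT
     * when the timeout expires, after which the caller spins until it acquires
     * the lock, bounding how long a thread may be left asleep. */
    static long futex_wait_timeout(volatile uint32_t *addr, uint32_t val,
                                   long timeout_ns)
    {
      struct timespec ts = { .tv_sec  = timeout_ns / 1000000000L,
                             .tv_nsec = timeout_ns % 1000000000L };
      return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, &ts, NULL, 0);
    }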

Figure 5.10 depicts the relative performance of MUTEXEE without timeouts over MUTEXEE with timeouts, for a single lock with 2000-cycle critical sections. With an 8 μs timeout, MUTEXEE delivers up to 14x lower throughput and 24x lower TPP than without timeouts.

[Plot: throughput and TPP ratios vs. number of threads and MUTEXEE timeout (ns, 8K to 512M).]

Figure 5.10 – Throughput and TPP ratios of MUTEXEE without over with timeouts.

Threads are continuously sleeping and waking up with futex calls, thus significantly reducing throughput and increasing power consumption compared to MUTEXEE without timeouts. In general, for timeouts shorter than 16-32 ms, both throughput and TPP suffer, representing the clear trade-off between fairness and performance.

For example, with 20 threads, MUTEXEE with a 4 ms timeout compares to the rest as follows:

Lock               Throughput    TPP             Max Latency
                   (Kacq/s)      (Kacq/Joule)    (Mcycles)
MUTEX              317           4.0             2.0
MUTEXEE            855           10.9            206.5
MUTEXEE timeout    474           6.5             12.0

Depending on the application, the developer can decide whether to use timeouts and choose the timeout duration for MUTEXEE. For brevity, in the rest of the chapter, we use MUTEXEE without timeouts. As we show in Section 5.6, we do not observe significant tail-latency increases due to MUTEXEE in real systems.

5.5.2 Evaluating Lock Algorithms

We evaluate various lock algorithms under different contention levels in terms of throughput and energy efficiency (TPP).

Uncontested Locking. As mentioned earlier, it is common in systems that a lock is mostly used by a single thread, and both the acquire and the release operations are almost always uncontested. Table 5.2 includes the throughput (Macq/s) and the TPP (Kacq/Joule) of various lock algorithms when a thread continuously acquires and releases a single lock. We use short critical sections of 100 cycles.

The trends in throughput and TPP are identical as there is no contention. The locks perform inversely to their complexity. The simple spinlocks (TAS, TTAS, and TICKET) acquire and release the lock with just a few instructions. MUTEX performs several sanity checks and also has to handle the case of some threads sleeping when a lock is released. MUTEXEE is also more complex than simple spinlocks due to its periodic adaptation. The queue-based lock, MCS, is even more complex, because threads must find and access per-thread queue nodes.

                       MUTEX     TAS       TTAS      TICKET    MCS       MUTEXEE
Throughput (Macq/s)    11.88     16.88     16.98     16.97     12.04     13.32
TPP (Kacq/Joule)       174.31    248.14    249.41    249.24    176.72    195.48

Table 5.2 – Single-threaded lock throughput and TPP.


Contention – Single (Global) Lock. We experiment with a single lock accessed by a varying number of threads. This experiment captures the behavior of highly-contended coarse-grained locks. We use a fixed critical section of 1000 cycles.

Figure 5.11 contains the throughput and the TPP results. On 40 threads, MUTEX delivers 73% lower TPP than TICKET: 63% less throughput and 5.8% more power. The throughput difference is due to (i) the global spinning of MUTEX, and (ii) the futex calls, even if they are infrequent. The power consumption difference is mainly because of the pausing technique. MUTEX spins with pause, while TICKET uses a memory barrier. With pause instead of a barrier, TICKET consumes 4 Watts more.

Moreover, MUTEXEE keeps the contention levels and the frequency of futex calls low, regardless of the number of threads. This results in stable throughput and TPP, because neither contention nor the number of active hardware contexts increases with the number of threads. This behavior comes at the expense of high tail latency: On 40 threads, MUTEXEE has an 80x higher 99.9th percentile latency than MUTEX.

Regarding spinlocks, TAS is the worst in this workload. This behavior is due to the stress on the lock, which makes the release of TAS very expensive. Moreover, for up to 40 threads, the queue-based lock (MCS) delivers the best throughput and TPP. Queue-based locks are designed to avoid the burst of requests on a single cache line when the lock is released. On more than 40 threads, fairness shows its teeth. As Ivy has 40 hardware threads, there is oversubscription of threads to cores. TICKET and MCS, the two fair locks, suffer the most: If the thread that is the next to acquire the lock is not scheduled, the lock remains free until that thread is scheduled.

Finally, throughput and TPP are directly correlated: The higher the throughput, the higher the energy efficiency. Still, MUTEXEE delivers higher TPP by achieving both better throughput and lower power than the rest.

[Plot: throughput (Macqs/s) and TPP (Kacqs/Joule) vs. number of threads for TAS, TTAS, TICKET, MCS, and MUTEXEE.]

Figure 5.11 – Throughput and energy efficiency of a single (global) lock.


[Plot: normalized TPP vs. normalized throughput for MUTEX, TAS, TTAS, TICKET, MCS, and MUTEXEE, with the linear (y = x) reference line.]

Figure 5.12 – Correlation of throughput with energy efficiency on various contention levels.

Variable Contention. Figure 5.12 plots the correlation of throughput with TPP on a diverse set of configurations. We vary the number of threads from 1 to 16, the critical-section size from 0 to 8000 cycles, and the number of locks from 1 to 512. At every iteration within a configuration, each thread selects one of the locks at random. The results are normalized to the overall maximum throughput and TPP, respectively.

Most data points fall on, or very close to, the linear line. In other words, most executions have an almost one-to-one correlation of throughput with TPP. The bottom-left cluster of values represents highly-contended points. On high contention, there is a trend below the linear line, which represents executions where throughput is relatively higher compared to energy efficiency. These results are expected, as on very high contention sleeping can save power compared to busy waiting, but still, busy waiting might result in higher throughput.

If we zoom into the per-configuration best throughput and TPP, the correlation of the two is even more pronounced. On 85% of the 2084 configurations, the lock with the best throughput achieves the best energy efficiency as well. On the remaining 15%, the highest throughput is on average 8% better than the throughput of the highest-TPP lock, while the highest TPP is 5% better than the TPP of the highest-throughput lock.

Finally, MUTEXEE delivers much higher throughput and TPP than MUTEX; on average, 25% and 32% higher throughput and TPP, respectively. MUTEX is better than MUTEXEE in just 4% of the configurations (by 9% on average, both in terms of throughput and TPP).

5.5.3 Implications

The POLY conjecture states that energy efficiency and throughput go hand in hand in locks. Our evaluation of POLY with six state-of-the-art locks on various contention levels shows that, with a few exceptions, POLY is indeed valid. The exceptions to POLY are high contention scenarios, where sleeping is able to reduce power, but still results in slightly lower throughput than busy waiting on the contended locks.


For low contention levels, energy efficiency depends only on throughput, as there are no opportunities for saving energy. In these scenarios, even infrequent futex calls reduce both throughput and energy efficiency.

For high contention, sleeping can reduce power consumption. However, the frequent futex calls of MUTEX hinder the potential energy-efficiency benefits due to throughput degradation. MUTEXEE is able to reduce the frequency of futex calls either by avoiding the ones that are purposeless, or by reducing fairness. MUTEXEE achieves both higher throughput and lower power than spinlocks or MUTEX for high contention levels.

Our POLY conjecture also highlights that the energy efficiency of lock-based synchronization is largely dictated by the underlying multi-core hardware. First, the main reason behind POLY is that hardware does not provide adequate tools for reducing the power consumption of busy waiting without destroying throughput. Second, as energy efficiency mostly depends on throughput, our results of Chapter 4 directly apply for the energy efficiency of (lock-based) synchronization as well.

5.6 Energy Efficiency of Lock-Based Systems

In this section, we modify the locks of various concurrent systems to improve their energy efficiency. We choose the set of systems so that they use the pthread library in diverse ways, such as using mutexes or reader-writer locks, building on top of mutexes, or relying on conditionals. Note that we do not modify anything else other than the pthread locks and conditionals in these systems.

Table 5.3 contains the description and the different configurations of the six systems that we evaluate. All benchmarks use a dataset size of approximately 10 GB (in memory), except for the MySQL SSD configuration that uses 100 GB. We set the number of threads for each system according to its throughput scalability.

5.6.1 Results

Figures 5.13 and 5.14 show the throughput and the energy efficiency (TPP) of the target systems with different locks. For brevity, we show results with MUTEX, TICKET, and MUTEXEE. The remaining local-spinning locks are similar to TICKET (TAS is less efficient—see Section 5.5).

Throughput and Energy Efficiency. In 16 out of the 17 experiments, avoiding the overheads of MUTEX improves energy efficiency from 2% to 184%. On average, changing MUTEX for either TICKET or MUTEXEE improves throughput by 31% and TPP by 33%. The results include three distinct trends.


HamsterDB [191] (version 2.1.7, 4 threads): An embedded key-value store. We run three tests with random reads and writes, varying the read-to-write ratio from 10% (WT), 50% (WT/RD), to 90% (RD).
Kyoto [124] (version 1.2.76, 4 threads): An embedded NoSQL store. We stress Kyoto with a mix of operations for three database versions (CACHE, HT DB, B-TREE).
Memcached [70] (version 1.4.22, 8 threads): An in-memory cache. We evaluate Memcached using a Twitter-like workload [133]. We vary the get-to-set ratio from 10% (WT), 50% (WT/RD), to 90% (RD). The server and the clients run on separate sockets.
MySQL [168] (version 5.6.19): An RDBMS. We use Facebook’s LinkBench and tuning guidelines [63] for an in-memory (MEM) and an SSD-drive (SSD) configuration.
RocksDB [64] (version 3.3.0, 12 threads): A persistent embedded store. We use the benchmark suite and guidelines of Facebook for an in-memory configuration [65]. We run 3 tests with random reads and writes, varying the read-to-write ratio from 10% (WT), 50% (WT/RD), to 90% (RD).
SQLite [204] (version 3.8.5): A relational DB engine. We use TPC-C with 100 warehouses varying the number of concurrent connections (i.e., 8, 32, and 64).

Table 5.3 – Software systems and configurations.

First, in some systems/configurations (i.e., Memcached and HamsterDB) sleeping can “kill” throughput. For instance, on the SET workload on Memcached, MUTEXEE allows for a few sleep invocations, resulting in lower throughput than TICKET.

Second, in some systems/configurations (i.e., MySQL and RocksDB) MUTEX is less of a problem. Both of these systems build more complex synchronization patterns on top of MUTEX. MySQL handles most low-level synchronization with custom-designed locks. Similarly, RocksDB

[Bar chart: throughput of each system/configuration with MUTEX, TICKET, and MUTEXEE, normalized to MUTEX.]

Figure 5.13 – Normalized (to MUTEX) throughput of various systems. (Higher is better)

[Bar chart: energy efficiency (TPP) of each system/configuration with MUTEX, TICKET, and MUTEXEE, normalized to MUTEX.]

Figure 5.14 – Normalized (to MUTEX) energy efficiency of various systems. (Higher is better)

employs a write queue where threads enqueue their operations (i.e., a combiner-based approach—see Chapter 3) and mostly relies on a conditional variable. Therefore, altering MUTEX with another algorithm does not make a big difference.

Finally, in MySQL and SQLite sleeping is necessary. Both these systems oversubscribe threads to cores, thus spinlocks, such as TICKET, result in very low throughput. A spinning thread can occupy the context of a thread that could do useful work. Additionally, on the SSD, TICKET consumes 40% more power than the other two, as it keeps all cores active. The fairness of TICKET exacerbates the problems of busy waiting in the presence of thread oversubscription: TTAS (not shown in the graph) has roughly 6x higher throughput than TICKET, but it is still much slower than MUTEX and MUTEXEE.

Overall, in five out of the six systems, the energy-efficiency improvements are mostly driven by the increased throughput. SQLite is the only system where the lock plays a significant role in terms of both throughput and power consumption. With MUTEXEE, SQLite consumes 15% and 18% less power than with MUTEX with 32 and 64 connections, respectively.

Tail Latency. MUTEXEE can become more unfair than MUTEX (see Section 5.5). Figure 5.15 includes the QoS of four systems in terms of tail latency. For most configurations, the results are intuitive: Better throughput comes with a lower tail latency. However, there are a few configurations that are worth analyzing.

First, MUTEXEE’s unfairness appears in the RD configuration of HamsterDB, resulting in almost 20x higher tail latency than MUTEX, but also in 46% higher TPP. Second, TICKET has high tail latencies on all oversubscribed executions as a result of low performance.

Finally, MUTEXEE on SQLite achieves better throughput and lower power than MUTEX, without increasing tail latencies. TPC-C transactions on SQLite have latencies on the scale of tens of ms. Each transaction consists of multiple accesses to shared data protected by various locks. MUTEXEE does indeed increase the tail latency of individual locks, but these latencies are on the scale of hundreds of μs and do not appear in the transaction latencies.

[Bar chart: 99th percentile latency of each system/configuration with MUTEX, TICKET, and MUTEXEE, normalized to MUTEX.]

Figure 5.15 – Normalized (to MUTEX) tail latency of various systems. (Lower is better)

However, this low-level unfairness brings huge contention reductions. For instance, on 64 CON, the SQLite server with MUTEX puts threads to sleep for 472 μs on average, compared to 913 μs with MUTEXEE. The result is that with MUTEX, SQLite spends more than 40% of the CPU time on the _raw_spin_lock function of the kernel due to contention on futex calls. In contrast, MUTEXEE spends just 4% of the time on kernel locks, and 21% on the user-space lock functions.

Implications. Changing MUTEX in six modern systems results in 33% higher energy efficiency, driven by a 31% increase in throughput on average. Clearly, the POLY conjecture (i.e., throughput and energy efficiency go hand in hand in locks) holds in software systems and implies that we can continue business as usual: To optimize a system for energy efficiency, we can still optimize the system’s locks for throughput.

Additionally, we show that MUTEX locks must be redesigned to take the latency overheads of futex calls into account. MUTEXEE, our optimized implementation of MUTEX, achieves 26% higher throughput and 28% better energy efficiency than MUTEX. Furthermore, the unfairness of MUTEXEE might not be a major issue in real systems: MUTEXEE can lead to high tail latencies only under extreme contention scenarios, which must be avoided in well-engineered systems.

In conclusion, we see that optimizing lock-based synchronization is a good candidate for improving the energy efficiency of real systems. We can modify the locks with minimal effort, without affecting the behavior of other system components, and, more importantly, without degrading throughput.

5.7 Conclusions

In this chapter, we thoroughly analyzed the energy/performance trade-offs in lock-based synchronization in order to improve the energy efficiency of systems. Our results support the POLY conjecture: Energy efficiency and throughput go hand in hand in lock algorithms. POLY has important software and hardware ramifications.

For software, POLY conveys the ability to improve the energy efficiency of systems in a simple and systematic way, without hindering throughput. We indeed improved the energy efficiency of six popular software systems by 33% on average, driven by a 31% increase in throughput. These improvements are mainly due to MUTEXEE, our redesigned version of the pthread mutex lock, that builds on the results of our analysis.

We considered the energy-efficiency trade-offs of lock-based synchronization. Nevertheless, most of our results directly or indirectly apply to other forms of synchronization. In particular, any type of waiting can be implemented either with sleeping (via futex, signals, interprocess interrupts, etc.) or with busy waiting. For instance, thread barriers, conditional variables, and reader-writer locks essentially offer the same trade-offs as mutually exclusive locks. Similarly, lock-free synchronization frequently resorts to backoff techniques, which again fall under the same performance/energy trade-off described in this chapter.


For hardware, POLY highlights the lack of adequate tools for reducing the power consumption of synchronization, without significantly degrading throughput. We performed our analysis on two modern Intel platforms that are representative of a large portion of the processors used nowadays. We argue that our results apply in most multi-core processors, because without explicit hardware support for synchronization, the power behavior of both busy waiting and sleeping will be similar regardless of the underlying hardware.

Overall, similarly to Chapter 4, we conclude that multi-core hardware largely dictates the energy efficiency of (lock-based) synchronization. Current hardware only enables developers to optimize the throughput of synchronization: As we showed, any effort to reduce power con- sumption is either futile or significantly degrades throughput. Consequently, hardware (i) does not allow for power-related optimizations, and (ii) dictates the scalability of synchronization in terms of throughput and latency (per Chapter 4).


Part III

Scaling Synchronization

In Part II, we empirically showed that scalability of synchronization, in terms of throughput, latency, and energy, is mainly dictated by the underlying hardware. This finding entails that the performance portability of concurrent software across platforms is significantly hampered by synchronization: Software developers have to fine-tune synchronization for the specifics of the underlying multi-core. In this part, we show that it is still feasible to design portable and scalable concurrent software by hiding the intricacies of multi-cores behind design patterns and abstractions.

6 Designing Concurrent Data Structures with OPTIK1

An effective approach to abstracting synchronization away from software developers is to encapsulate data sharing with concurrent data structures. Designing and implementing fast, scalable, and portable concurrent data structures is far from trivial. This chapter introduces OPTIK, a new practical design pattern for designing and implementing portable and scalable concurrent data structures. OPTIK relies on the commonly-used technique of version numbers for detecting conflicting concurrent operations. We illustrate the power of our OPTIK pattern and its implementation by introducing five new algorithms and by optimizing four state-of-the-art algorithms for linked lists, skip lists, hash tables, and queues. Our results show that concurrent data structures built using OPTIK are more scalable than the state of the art.

6.1 Introduction

Building concurrent data structures (CDSs) in a pessimistic manner is easy, but typically does not lead to good performance. For example, one can design a linked list protected by a global lock in a few minutes, but inevitably, this list will be non-scalable. Accordingly, optimistic concurrency is deployed in every state-of-the-art data structure algorithm (e.g., lists [93, 105], hash tables [151, 196], trees [30, 161], queues [153, 159]). With optimistic concurrency, operations perform some non-synchronized work, before employing synchronization (i) for validating this optimistic work, and (ii) for possibly modifying the data structure. Performing non-synchronized work allows concurrent threads to execute truly in parallel (modern hardware is very good at executing read-only code).

Nevertheless, optimistic concurrency additionally introduces the need for validating the non-synchronized parts of the operation in order to detect conflicting concurrent operations. Validating this optimistic work is far from being trivial. Every new scalable CDS algorithm introduces a new neat technique for efficiently handling validation. Concrete examples are the linked list by Tim Harris [93] that marks the least significant bit of a pointer to indicate deletions, as well as the binary search tree by Natarajan et al. [161] that marks edges instead of nodes to minimize the number of stores. These techniques are great, but are very specific to the corresponding data structure and are thus hardly generalizable to other structures.

1 Appeared in: (i) Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. “Asynchronized concurrency: The secret to scaling concurrent search data structures.” ASPLOS 2015, (ii) Rachid Guerraoui, and Vasileios Trigonakis. “Optimistic concurrency with OPTIK.” PPoPP 2016.



Figure 6.1 – The OPTIK pattern (high-level view).

We begin this chapter by showcasing the complexities of designing lock-based and lock-free optimistic concurrent data structures. In detail, we use the results of Chapter 4 and the ideas that we develop for asynchronized concurrency [51] to design CLHT, a concurrent hash table that places each hash-table bucket on a single cache line and performs in-place updates so that operations complete with at most one cache-line transfer. CLHT outperforms state-of-the-art hash tables in virtually every scenario. CLHT showcases that (i) we can use the results of Part II of this thesis to design scalable synchronization, and (ii) that ad-hoc validation in optimistic concurrency is indeed a very difficult problem.

Ideally, general-purpose design patterns could assist developers in creating efficient CDSs. Design patterns are commonplace in software engineering as they allow for easy and efficient solutions to recurring problems. In concurrent programming, commonly-used software constructs such as locks, semaphores, monitors, and thread pools can be viewed as design patterns. However, these patterns are very low level. Higher-level patterns are required for systematically designing and implementing efficient CDSs. Of course, as we discuss in Chapter 3, there are various techniques for simplifying the design of optimistic CDSs that could be viewed as high-level design patterns. However, all existing patterns trade off performance for programming simplicity.

In this chapter, we introduce OPTIK,2 a new pattern for devising and implementing fast and scalable concurrent data structures. OPTIK relies on version numbers for detecting concurrency. A version number is coupled with a lock that protects a set of data (e.g., one list node). The version number has the same granularity as the lock, thus we can devise both coarse- and fine-grained algorithms with OPTIK. An optimistic operation, such as an insertion in a hash-table bucket, uses the version number in the following steps (Figure 6.1): (i) it locally stores the current value of the version in order to later use it for validation, (ii) it performs some optimistic, non-synchronized work, (iii) it grabs the corresponding lock, (iv) it validates that the version number has not changed, (v-a) if validation fails, it releases the lock and restarts the operation, otherwise (v-b) it performs the critical-section work, and then (vi) it increments the version number to indicate to other threads that the protected data has been modified, and finally, (vii) it releases the lock.

2 The name OPTIK stands for “optimistic concurrency with ticket locks,” as our first implementation of OPTIK locks builds on top of ticket locks [50, 51].



Figure 6.2 – The OPTIK pattern implemented with OPTIK locks.
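As a concrete illustration of these steps with a plain lock and a separate version counter (corresponding to Figure 6.1), consider the following sketch on a toy piece of protected data; the type and helper names are ours.

    #include <pthread.h>
    #include <stdint.h>

    typedef struct {
      pthread_mutex_t lock;
      volatile uint64_t version;
      volatile uint64_t data;   /* stands in for the protected structure */
    } protected_t;

    void optimistic_update(protected_t *p, uint64_t delta)
    {
      while (1) {
        uint64_t v = p->version;             /* (i)   record the version          */
        uint64_t snapshot = p->data;         /* (ii)  optimistic, unsynchronized  */
        pthread_mutex_lock(&p->lock);        /* (iii) grab the lock               */
        if (p->version != v) {               /* (iv)  validate                    */
          pthread_mutex_unlock(&p->lock);    /* (v-a) validation failed: restart  */
          continue;
        }
        p->data = snapshot + delta;          /* (v-b) critical-section work       */
        p->version++;                        /* (vi)  indicate the modification   */
        pthread_mutex_unlock(&p->lock);      /* (vii) release the lock            */
        return;
      }
    }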

Intuitively, the validation in step (iv) can fail because the version number has been incremented between steps (i) and (iii). This alteration indicates that a concurrent thread completed a modification on the protected data, rendering the optimistically accessed data inconsistent. Naturally, the reader might wonder about (a) the genuineness of OPTIK, and (b) why it has not been recognized in the past as a pattern for designing CDSs. Version numbers have been extensively used in databases [123], STMs [47, 55], and distributed systems [4, 59]. However, we are the first to recognize that the underlying idea can be expressed in a general way that offers a fast technique for detecting concurrency in CDSs. We argue that the main reason why the OPTIK pattern has not appeared in the past is the lack of an efficient implementation.

Consider the steps taken in Figure 6.1. To detect concurrency with versions, we must include the “overhead steps” (i), (iv), and (vi). To make things even worse, if validation in step (iv) fails, the thread has acquired the lock, possibly after contending for it, just to fail the validation and restart. To the best of our knowledge, most existing state-of-the-art lock-based algorithms, such as the linked list by Heller et al. [97] and the skip list by Herlihy et al. [105], include exclusively the overhead for step (iv), namely for validating that the optimistic results are still consistent. Consequently, implementing the OPTIK pattern as described above would not only include the same overheads as existing algorithms, but also the ones for keeping track of and incrementing the version numbers.

We solve the aforementioned limitations of the OPTIK pattern by introducing the OPTIK-lock abstraction that merges locking with validation. OPTIK locks rely on the simple observation that existing lock algorithms, such as ticket locks, employ version numbers in their implementation. Accordingly, we design the OPTIK-lock abstraction that offers an extended interface to traditional locks. In particular, OPTIK locks offer a trylock-with-version function that acquires the lock iff (a) the lock is free, and (b) the current version of the lock is the same as the previously observed version. We concretely implement the OPTIK-lock abstraction on top of ticket and versioned locks. As the unlock function of ticket locks simply increments the version, we can also merge unlocking with incrementing the version number.

Accordingly, as we show in this chapter, we can efficiently implement the OPTIK pattern using OPTIK locks (Figure 6.2). The resulting implementation guarantees that if the lock is acquired,

then the critical section will be performed. Therefore, we are able to reduce contention behind the lock and to avoid the wasted work of waiting for the lock only to fail the validation. Locking and validation are performed with a single compare-and-swap. In a sense, OPTIK locks bring lock-based algorithms closer to their lock-free counterparts, where validation and the actual modifications are performed in single steps with atomic operations.
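A minimal sketch of the versioned-lock flavor of an OPTIK lock follows: the version is even when the lock is free and odd while it is held, so locking and validation collapse into one compare-and-swap, and unlocking merges with the version increment. The function names follow the spirit of the interface but are not necessarily those of our library.

    #include <stdint.h>

    typedef volatile uint64_t optik_t;   /* even: free, odd: locked */

    static inline uint64_t optik_get_version(optik_t *l) { return *l; }

    /* Acquire iff the lock is free and its version equals the observed one. */
    static inline int optik_trylock_version(optik_t *l, uint64_t observed)
    {
      if (observed & 1)   /* the lock was held when the version was observed */
        return 0;
      return __sync_bool_compare_and_swap(l, observed, observed + 1);
    }

    /* Unlock and increment the version in a single step (back to even). */
    static inline void optik_unlock(optik_t *l) { __sync_fetch_and_add(l, 1); }

An update then records the version, performs its optimistic work, and calls the trylock-with-version function; a failed call means a concurrent modification occurred, so the operation simply restarts without ever having waited behind the lock.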

We illustrate the effectiveness of OPTIK by (a) designing new algorithms and by (b) optimizing existing state-of-the-art ones for linked lists, hash tables, skip lists, and queues. In particular, we design two new linked list algorithms, one based on global and one on fine-grained locks, and we introduce the concept of node caching for speeding up list traversals. Based on these lists, we design two corresponding hash tables. Additionally, we design a new concurrent array map and use it in a hash table, and we employ OPTIK locks in optimizing existing hash tables. Furthermore, we use OPTIK locks to simplify validation in the optimistic skip-list algorithm [105] and we design a novel, simple skip-list algorithm based on the OPTIK pattern. Finally, we design three variants of the classic Michael-Scott queues [153] and we also introduce the concept of victim queues for reducing enqueue contention. Our OPTIK-lock library, together with the data structures we design and optimize with OPTIK, is available at http://lpd.epfl.ch/site/optik.

The main contributions of this chapter are as follows:

• We identify the OPTIK design pattern that can be used to easily design and optimize concurrent data structures.

• We introduce OPTIK locks that offer an efficient implementation of the OPTIK pattern.

• We design six new highly-efficient data-structure algorithms (CLHT and five based on OPTIK) and optimize four existing state-of-the-art algorithms.

We focus in this chapter on using OPTIK in CDSs. Nevertheless, we could imagine using OPTIK, instead of the classic lock interface, wherever a lock can be used. The only requirement is that the critical section must include a read-only prefix that can be optimistically performed before acquiring and validating the OPTIK lock. Of course, we do not claim that OPTIK is a silver bullet for all concurrency problems, but rather that it is an efficient design pattern for various use cases. For example, OPTIK locks are not very suitable for protecting large chunks of data that can be independently updated (e.g., the next pointers of a node of a large skip list). In these cases, OPTIK can lead to false validation failures due to updates on unrelated data (e.g., Section 6.5.3). Additionally, an OPTIK lock comprises a single memory location, thus, as every lock algorithm, it can become a scalability bottleneck if heavily stressed (e.g., Section 6.5.5).

The rest of the chapter is organized as follows. In Section 6.2, we introduce a new concurrent hash table and illustrate the difficulties correlated to optimistic concurrency in data structures. We describe the OPTIK pattern/lock and use them in two concrete examples in Sections 6.3 and 6.4, respectively. We then illustrate how to use OPTIK in designing and optimizing various CDSs in Section 6.5. We conclude the chapter in Section 6.6.


6.2 Optimistic Concurrency in Cache-Line Hash Table (CLHT)

In Part II, we clearly show that scalability of synchronization is impacted by the underlying hardware. This result raises the immediate question of how we can design portably scalable concurrent software. Asynchronized concurrency (ASCY) [51] answers this question for concurrent search data structures (CSDSs), such as lists and hash tables. ASCY comprises four guidelines on how to design CSDSs that are portably scalable (i.e., scale across workloads, performance metrics, and hardware platforms). The four ASCY guidelines are as follows:

ASCY1: The search operation should not involve any waiting, retries, or stores.
ASCY2: The parse phase3 of an update operation should not perform any stores other than for cleaning-up purposes and should not involve any waiting or retries.

ASCY3: An update operation whose parse is unsuccessful (i.e., the element not found in case of a removal, the element already present in case of an insertion) should not perform any stores, besides those used for cleaning-up in the parse phase.

ASCY4: The number and region of memory stores in a successful update should be close to those of a standard sequential implementation.

Essentially, ASCY suggests that CSDS designs must resemble their sequential counterparts in order to reduce synchronization to the minimum. In this section, we employ ASCY, as well as the results of Chapter 4, in the design of a new hash table algorithm, namely CLHT. For brevity, we only present the high-level ideas of CLHT and refer the reader to [50] for further details.

CLHT captures the basic idea behind the results of Chapter 4: Cache-line transfers degrade scalability, hence avoid cache-line transfers as much as possible. To this end, CLHT uses cache-line-sized buckets and, of course, follows the four ASCY patterns. As a cache-line block is the granularity of cache-coherence protocols, CLHT ensures that most operations are completed with at most one cache-line transfer. CLHT uses the 8 words of a cache line as:

concurrency | k1 | k2 | k3 | v1 | v2 | v3 | next

The first word is used for concurrency-control; the next six are the key/value pairs; the last is a pointer that can be used to link buckets. Updates synchronize based on the concurrency word and do in-place modifications of the key/value pairs of the bucket. To support in-place updates, the basic idea behind CLHT is that a search/parse does not simply traverse the keys, but obtains an atomic snapshot of each key/value pair:

val_t val = bucket->val[i];
if (bucket->key[i] == key && bucket->val[i] == val)
  /* atomic snapshot of key/value */

For an atomic snapshot to be possible, the memory allocator of the values must guarantee that the same address cannot appear twice during the lifespan of an operation. Additionally, the implementation has to handle possible compiler and CPU re-orderings (not shown in the pseudo-code).

3 The parse phase of an update operation refers to the traversal of the set towards the vicinity of the target node.


CLHT supports operations with keys of up to 64 bits. To support longer keys, the 64-bit keys in CLHT can be used as a first filter: the operation has to compare the full key, which is stored separately, only if there is a match with the 64-bit filter. This technique has already been shown to work well in practice [67].
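For concreteness, a cache-line-sized bucket can be declared roughly as follows. This is a sketch based on the description above; the field names and the exact value type are ours, not necessarily those of the actual CLHT implementation.

  #include <stdint.h>

  #define ENTRIES_PER_BUCKET 3

  typedef volatile uint64_t clht_lock_t;

  /* One bucket occupies exactly one 64-byte cache line: 8 words in total. */
  typedef struct bucket {
    clht_lock_t    lock;                      /* word 0: concurrency control  */
    uint64_t       key[ENTRIES_PER_BUCKET];   /* words 1-3: keys              */
    volatile void* val[ENTRIES_PER_BUCKET];   /* words 4-6: values            */
    struct bucket* next;                      /* word 7: link to extra bucket */
  } __attribute__((aligned(64))) bucket_t;

With such a layout, an operation that hits in the bucket touches a single cache line, which is precisely the property CLHT is built around.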

We design and implement two variants of CLHT, lock-based (CLHT-LB) and lock-free (CLHT-LF).

CLHT-LB. The lock-based variant of CLHT uses the concurrency word as a lock. Search operations traverse the key/value pairs and return the value if a match is found. Updates (i.e., insertions and deletions) first perform a search to check whether the operation is at all feasible (recall ASCY3) and, if so, they grab the lock, recheck if the operation is feasible, apply the update, and release the lock. If there is not enough space for an insertion, the operation either links a new bucket by using the next pointer, or resizes the hash table.

CLHT-LF. The lock-free variant of CLHT is more elaborate than the lock-based, because key/value-pair insertions have to appear atomic. With locks, we implement atomicity by allowing for a single concurrent writer per bucket. However, without locks, several updates can concurrently alter the same key or value. Even worse, if concurrent insertions on the same bucket do not synchronize, there is no way to avoid duplicate keys on different slots.

In order to solve these complications, we devise the snapshot_t object. snapshot_t handles a word (8 bytes) as an array of bytes (map) with a version number:

struct snapshot_t {
  uint32_t version;  /* a 4-byte integer    */
  uint8_t  map[4];   /* an array of 4 bytes */
};

Naturally, snapshot_t occupies the concurrency word of a bucket. snapshot_t provides an interface to atomically get or set the value of an index in the map. The version number is used to enable sets/gets to do atomic changes with respect to the other spots in the map. In short, atomicity is implemented by reading the value of the snapshot_t object before the atomic section and by using the version number to get/set the target index in the map using a CAS on the whole object. For instance, if another concurrent insertion has already been completed, the current operation will fail the CAS, because the version number will be different. We then use the fields of the map as flags that indicate whether a given key/value pair is valid, invalid, or is being inserted. Note that, due to the 32-bit long version number, if a thread reads the version number and then “sleeps” for 2^32 lock acquisitions, the version number could overflow, resulting in a potentially incorrect validation.
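As an illustration of how such an atomic set can be realized with a single CAS on the whole word, consider the following sketch. The helper name and the exact handling of the version field are our assumptions, not necessarily the actual CLHT-LF code.

  #include <stdbool.h>
  #include <stdint.h>

  typedef union {
    struct {
      uint32_t version;  /* a 4-byte integer    */
      uint8_t  map[4];   /* an array of 4 bytes */
    } s;
    uint64_t raw;        /* the whole object as one 8-byte word */
  } snapshot_t;

  /* Set map[idx] to 'flag' only if the object has not changed since 'expected'
   * was read; the version is bumped so that concurrent sets fail their CAS. */
  static bool snapshot_set_atomic(volatile uint64_t* obj, snapshot_t expected,
                                  int idx, uint8_t flag) {
    snapshot_t desired = expected;
    desired.s.map[idx] = flag;
    desired.s.version  = expected.s.version + 1;
    return __sync_bool_compare_and_swap(obj, expected.raw, desired.raw);
  }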

Evaluation. We compare CLHT to a concurrent hash table comprising per-bucket pointer reversal lists by Pugh [183] (pugh), one of the best performing hash tables (as we showed in [51]). In contrast to the linked-list-based hash tables, CLHT performs in-place updates, thus avoiding memory allocation and garbage collection of hash-table nodes. Nevertheless, for fairness, we use memory-allocated values.


Figure 6.3 – CLHT with 4096 elements on 20 threads for various update rates (0–100%). Throughput (Mops/s) of pugh, clht-lb, and clht-lf on Ivy, Opteron, Westmere, Tilera, and SPARC-T44.

Figure 6.3 includes the results. Noticeably, clht-lb and clht-lf outperform pugh by 23% and 13% on average, respectively. CLHT’s design significantly reduces the number of cache-line transfers. For example, on the Opteron for 20% updates, clht-lb requires 4.06 cycles per instruction, clht-lf 4.24, while pugh operates with 6.57. Interestingly, clht-lb is consistently better than clht-lf on 20 threads. On more threads (e.g., 40), however, clht-lf often outperforms clht-lb.

6.2.1 Discussion

The CLHT algorithm clearly relies on elaborate, context-specific concurrency/validation techniques: (i) bucket traversals proceed by acquiring atomic snapshots of each key/value pair, (ii) CLHT-LB traverses the bucket twice in order to validate the optimistic results after locking, (iii) CLHT-LF relies on the snapshot_t object for concurrency control. Evidently, these techniques combined together deliver a fast and scalable concurrent hash table. However, the aforementioned techniques are quite specific to the context of CLHT and cannot be easily generalized and reused in other algorithms. In the rest of this chapter, we detail the OPTIK pattern, which offers a generic approach to designing state-of-the-art concurrent data structures.

6.3 OPTIK

In this section, we detail the OPTIK pattern, we present the OPTIK-lock abstraction, and we describe two concrete implementations of the OPTIK-lock abstraction. We then discuss practical considerations regarding implementing and using OPTIK, such as lock nesting, memory barriers, and memory reclamation.

6.3.1 The OPTIK Pattern

As we point out in Section 6.1, the OPTIK pattern relies on version numbers to detect potentially conflicting concurrency (see Figure 6.1). As Figure 6.4 shows, this version number is coupled with a lock and shares the same granularity as that lock (i.e., it protects the same data). The version number is incremented upon every successful critical section that modifies the shared protected state.

Figure 6.4 – The basic building block of the OPTIK pattern.

Thus, intuitively, we can detect whether there were concurrent modifications on the protected state if we observe a version change.

Accordingly, with OPTIK we can implement some sort of a transaction (we discuss this resemblance with transactions below), where we read the version number before starting the optimistic part of the transaction. Then, whenever we want to modify the protected data, we acquire the lock and check whether the version number is still the same. If that is the case, then no other thread could have completed a concurrent operation. Otherwise, we know that at least one thread has concurrently committed a modification.

Because the version number has the same granularity as the corresponding lock, we might have false conflicts. For example, in a linked list protected by a global lock (see Section 6.5.1), every committed modification conflicts with any concurrent one, although they might operate on completely unrelated parts of the list. In practice, in most cases we can control the granularity of the lock, hence the granularity of the version number.4

The OPTIK pattern has three main strengths. First, it offers a concrete way of “thinking” about optimistic concurrency, similar to STMs. With an STM, the designer makes use of transactions, but then it is up to the STM runtime to optimistically execute and coordinate these transactions. In contrast, with OPTIK, the designer must explicitly delimit the optimistic and the synchronized parts of an operation. Still, she does not need to rely on ad-hoc techniques, such as marking pointers, for validating the optimistic results. Second, the OPTIK pattern has a concrete, fast implementation based on OPTIK locks. If the pattern is appropriately employed, the resulting CDS will be efficient and scalable (as we show in Section 6.5). Third, in our experience (see Sections 6.4 and 6.5), OPTIK-based CDSs are simpler and easier to prove correct than the state of the art. In many OPTIK-based CDSs, the linearization point of an insertion or deletion is the actual write that makes a node physically linked or unlinked.

OPTIK vs. STM Transactions. The OPTIK pattern can be viewed as a transaction. OPTIK shares some common characteristics with traditional STM transactions, especially those that defer synchronization to the commit phase (e.g., [47, 55, 69]). First, they are both explicitly delimited (i.e., we know where the transaction begins and where it ends). Second, they both include an optimistic phase. Finally, the optimistic phase is followed by a validation/commit phase where conflicting concurrency is typically detected. If there are conflicts, then both OPTIK and STM transactions are restarted, otherwise they commit their modifications.

4 Skip lists are somewhat of an exception to this rule (see Section 6.5.3).


For instance, OPTIK transactions are very similar to the ones of NOrec STM [47]. NOrec employs a global lock that is further used as a version number for validation, in a way similar to OPTIK.

However, in contrast with STMs, OPTIK does not offer isolation or atomicity guarantees. STM transactions are typically opaque [88] (i.e., they are serializable and they disallow even non-committed transactions from accessing inconsistent state). OPTIK allows transactions to access the intermediate results of other ongoing transactions. Additionally, STM transactions typically provide all-or-nothing semantics (i.e., atomicity). With OPTIK, a transaction can partially complete and then restart. The atomicity control is fully up to the programmer. Precisely because of this lack of guarantees, OPTIK can operate with zero instrumentation overhead and with minimal synchronization.

6.3.2 The OPTIK-Lock Abstraction

The OPTIK-lock abstraction merges locking with version-number validation in a single atomic step.5 By doing so, we can implement the OPTIK pattern without the extravagant overhead of locking and then failing the validation (compare Figures 6.1 and 6.2). OPTIK locks extend the traditional lock interface with various functions. The most important ones are listed and explained below:

• optik_trylock_version(lock, targetv) [non-blocking]: acquires the lock iff the lock is free and the version of the lock is the same as in targetv. Returns a boolean indicating whether the lock was acquired.

• optik_lock_version(lock, targetv) [blocking]: acquires the lock and returns a boolean that shows whether the version that was acquired is the same as in targetv.

• optik_unlock(lock) [non-blocking]: increments the lock’s version number and releases the lock.

• optik_revert(lock) [non-blocking]: reverts the version of the lock to the one before acquiring the lock. It can be used to release the lock when no modifications were performed in the critical section.

• optik_get_version(lock) [non-blocking]: returns the current version of the lock.

• optik_get_version_wait(lock) [blocking]: waits until the lock is free and returns its current free version.

• optik_is_same_version(v0, v1) [non-blocking]: returns a boolean on whether versions v0 and v1 are the same.

• optik_is_locked(v) [non-blocking]: returns a boolean on whether version v is locked.

5 Of course, it is up to the corresponding implementation of the abstraction to guarantee this single-step locking and validation.
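To make the intended usage of this interface concrete, the following sketch applies the OPTIK pattern to a trivial shared object: a register that records the maximum value seen so far. The object and helper names are ours; the optik_t type and the functions above are assumed to come from the OPTIK-lock library.

  #include <stdint.h>

  typedef struct {
    optik_t  lock;  /* OPTIK lock: version number and lock in one word */
    uint64_t max;   /* protected state */
  } max_reg_t;

  /* Record 'v' if it is larger than the current maximum. The read-only check
   * runs without any locking; the lock is acquired (and the version validated)
   * in a single step only when a store is actually needed. */
  int max_reg_update(max_reg_t* m, uint64_t v) {
    while (1) {
      optik_t ver = optik_get_version(&m->lock);   /* 1. read the version     */
      if (m->max >= v)                             /* 2. optimistic phase     */
        return 0;                                  /*    nothing to do        */
      if (optik_trylock_version(&m->lock, ver)) {  /* 3. lock and validate    */
        m->max = v;                                /*    short critical sect. */
        optik_unlock(&m->lock);                    /*    bumps the version    */
        return 1;
      }
      /* a concurrent update committed in the meantime: restart */
    }
  }

If the validation fails, the thread restarts without ever having waited behind the lock, which is precisely what distinguishes OPTIK locks from locking followed by validation.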


We provide two implementations of the OPTIK-lock abstraction, one on top of ticket locks and one on top of versioned locks. For brevity, we detail the versioned-lock-based implementation (as it is simpler than the one on top of ticket locks) and discuss the additional functionality that OPTIK locks on top of ticket locks offer. In principle, the OPTIK-lock abstraction can be implemented on top of more lock algorithms. Nevertheless, optik_trylock_version is at the heart of the OPTIK pattern, thus we argue that every OPTIK-lock implementation must provide atomic (i.e., single compare-and-swap) locking and validation. Such an implementation requires base lock algorithms that incorporate version numbers.

OPTIK Locks Using Versioned Locks

An OPTIK lock (optik_t) is just an 8-byte unsigned counter (uint64_t in C). An odd value for the counter indicates that the lock is locked, while an even value means unlocked. The acquire function tries, until successful, to compare-and-swap (CAS) the current (even) value v with v + 1. The release function simply increments the counter value. Figure 6.5 includes the concrete implementation of the OPTIK abstraction on top of versioned locks. We briefly discuss this implementation.

First, optik_trylock_version, the most important OPTIK function, returns false (lines 6-7) if the lock is already locked or if the current lock version is not the same as the target version (targetv). The former check is necessary for correctness, otherwise the operation might try to erroneously CAS an odd value to an even one. The latter check is an optimization for avoiding unnecessary CAS invocations.

Similarly, optik_lock_version spins while the lock is locked and then tries to acquire the lock with a CAS. The unlock and revert functions increment and decrement the lock value, respectively, to indicate that the lock is now free and that a modification was (not) performed. The optik_is_locked function simply checks whether the given version is an odd number. The get_version and get_version_wait functions return the current version of the lock. The latter spins while the lock is locked and only then returns the version number. Finally, optik_is_same_version compares whether two version numbers are equivalent.

OPTIK Locks Using Ticket Locks

Ticket locks have a number of unique properties. First, although they typically occupy just 8 bytes:

  struct ticket_t { uint32_t ticket, current; };

they are fair. To acquire the lock, the thread grabs a ticket t with an atomic fetch-and-increment and waits until t == lock->current. To unlock, the owner simply increments the lock's current field.


 1 typedef volatile uint64_t optik_t;
 2 #define OPTIK_INIT   0
 3 #define OPTIK_LOCKED 0x1LL  // odd values -> locked

 5 int optik_trylock_version(optik_t* l, optik_t targetv) {
 6   if (optik_is_locked(targetv) || *l != targetv)
 7     return false;
 8   return CAS(l, targetv, targetv + 1) == targetv;
 9 }

11 int optik_lock_version(optik_t* lock, optik_t targetv) {
12   optik_t ol_cur;
13   do {
14     do {
15       ol_cur = *lock;
16     } while (optik_is_locked(ol_cur));
17   } while (CAS(lock, ol_cur, ol_cur + 1) != ol_cur);
18   return ol_cur == targetv;
19 }

21 void optik_unlock(optik_t* lock) {
22   (*lock)++;              // mem-release
23 }

25 void optik_revert(optik_t* lock) {
26   (*lock)--;              // mem-release
27 }

29 int optik_is_locked(optik_t v) {
30   return (v & OPTIK_LOCKED);
31 }

33 optik_t optik_get_version(optik_t* lock) {
34   return *lock;           // mem-acquire
35 }

37 optik_t optik_get_version_wait(optik_t* lock) {
38   do {
39     optik_t olv = *lock;  // mem-acquire
40     if (!optik_is_locked(olv))
41       return olv;
42   } while (1);
43 }

45 int optik_is_same_version(optik_t v1, optik_t v2) {
46   return v1 == v2;
47 }

Figure 6.5 – Code for OPTIK locks on top of versioned locks.


Additionally, as we explain in Section 4.4.3, ticket locks show the amount of queuing behind the lock. We can use this information to implement efficient backoff schemes and to take decisions depending on the levels of contention (see Section 6.5.5 for an example). Based on these properties of ticket locks, OPTIK locks offer the optik_num_queued and the optik_lock[_version]_backoff extensions. The former returns the number of threads waiting for the lock, while the latter implements waiting with backoff that is proportional to the distance of the thread from acquiring the lock.
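A minimal sketch of how such a ticket-based OPTIK lock can look is given below; the field layout, the suffixed names, and the exact semantics of the queue count are our assumptions, only meant to illustrate the single-CAS lock-and-validate step and the optik_num_queued extension.

  #include <stdbool.h>
  #include <stdint.h>

  typedef union {
    struct { uint32_t current; uint32_t ticket; } s;  /* 'current' doubles as the version */
    uint64_t raw;
  } optik_ticket_t;

  /* Number of threads that have requested the lock and not yet been served. */
  static inline uint32_t optik_ticket_num_queued(optik_ticket_t snapshot) {
    return snapshot.s.ticket - snapshot.s.current;
  }

  /* Acquire the lock iff it is free and its version still equals 'targetv',
   * in a single CAS over the whole 8-byte lock word. */
  static inline bool optik_ticket_trylock_version(volatile optik_ticket_t* l,
                                                  optik_ticket_t targetv) {
    if (targetv.s.ticket != targetv.s.current)  /* it was locked or queued */
      return false;
    optik_ticket_t desired = targetv;
    desired.s.ticket++;                         /* become the lock holder  */
    return __sync_bool_compare_and_swap(&l->raw, targetv.raw, desired.raw);
  }

  static inline void optik_ticket_unlock(volatile optik_ticket_t* l) {
    l->s.current++;   /* releases the lock and advances the version */
  }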

A shortcoming of OPTIK on top of ticket locks compared to the implementation over versioned locks is the 32-bit version number of the former. If a thread stores the version number and then “sleeps” for 2^32 lock acquisitions, then the version number could overflow, resulting in a potentially incorrect validation.6 In contrast, OPTIK locks on top of versioned locks require 2^63 acquisitions while the thread is sleeping (two increments per acquisition).

The OPTIK Pattern with and without OPTIK Locks

We illustrate the necessity of OPTIK locks with an experiment. We compare the throughput of a single OPTIK lock with the throughput of implementing version validation without OPTIK locks. As we explain earlier, to validate the version number without OPTIK locks the thread must always acquire the lock. We implement this behavior using 8 bytes: 4 bytes for a test-and-test-and-set (TTAS) lock and 4 bytes for the version number. The version number is validated and incremented while holding the lock.
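The baseline can be sketched as follows (illustrative code, not the exact benchmark source): validation is only possible after the TTAS lock has been acquired, even when the validation is doomed to fail.

  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
    volatile uint32_t lock;     /* 0 = free, 1 = taken */
    volatile uint32_t version;
  } ttas_versioned_t;

  static bool lock_then_validate(ttas_versioned_t* l, uint32_t targetv) {
    while (1) {                                   /* test-and-test-and-set   */
      while (l->lock != 0) { /* spin on reads */ }
      if (__sync_val_compare_and_swap(&l->lock, 0, 1) == 0)
        break;
    }
    if (l->version != targetv) {                  /* validate under the lock */
      l->lock = 0;                                /* validation failed       */
      return false;
    }
    l->version++;                                 /* "commit" the update     */
    l->lock = 0;
    return true;
  }

In contrast, optik_trylock_version folds the locking and the validation into one CAS, so a doomed validation costs at most a single atomic operation.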

Figure 6.6 – Locking and validation with and without OPTIK locks (TTAS vs. optik-ticket and optik-versioned; throughput in Mops/s and number of CAS per validation vs. number of threads).

Figure 6.6 depicts the validated lock-acquisition throughput with and without OPTIK locks, as well as the average number of CAS operations that are executed per successful validation on Ivy—an Intel Xeon server (see Section 2.1 for platform details and Section 6.5 for our experimental settings). The two OPTIK-lock implementations behave identically and deliver significantly higher throughput than validating with normal locking. OPTIK locks are more than 10 times faster than TTAS on average, explained by the number of CAS invocations per validation that grows significantly with TTAS due to lock contention.7 As we explain earlier, without OPTIK locks the threads might wait behind the lock to later fail the validation.

6 If the lock delivers 100M acquires/s, which is almost impossible on modern hardware (see Figure 4.3), the thread must sleep for ∼40s for the overflow to happen.
7 If we use a test-and-set lock instead of a TTAS, the number of CAS per validation “explodes.”


6.3.3 Practical Considerations

OPTIK with Transactional Memory. OPTIK locks can be implemented using TM in a straightforward manner. In brief, optik_trylock_version can start a transaction and check if the version has been modified in order to possibly restart the operation. optik_unlock needs to increment the version number and commit the transaction. Given that optik_unlock writes on the lock, we do not expect TM to bring significant benefits to OPTIK.

OPTIK with Lock Nesting. The OPTIK pattern offers the “read then lock-validate” functionality for a single OPTIK lock. Lock nesting (i.e., acquiring and holding more than one lock at a time) requires acquiring the locks one after the other. Therefore, although the validation of an earlier lock succeeds, the validation of a later one might fail. For example, it might happen that:

optik_trylock_version(l1,v1) → true;

optik_trylock_version(l2,v2) → false.

Depending on the semantics of the algorithm, failing the second optik_trylock_version can have different outcomes. For example, in the delete operation of a linked list (see Section 6.4.2 for details), failing the second trylock results in restarting the whole operation after reverting the first lock. In our novel OPTIK-based skip-list algorithm (see Section 6.5.3), we perform incremental insertions: once the OPTIK lock for a skip-list level is acquired, the new node is linked to that level. If a subsequent trylock fails, the operation is restarted, but the locks for the already inserted levels are not reacquired.

OPTIK and Memory Fences. As we show in Figure 6.5, implementing OPTIK locks requires certain memory ordering guarantees when loading and storing on the shared word of the lock. In short, loading the version number (e.g., in optik_get_version) requires acquire semantics: No other memory access of the same thread can be reordered before this load. Similarly, storing on the memory of the lock (e.g., in optik_revert) requires release semantics: No other memory access of the same thread can be reordered after this store. Notice that on x86 architectures the implementation of these memory-ordering semantics does not require any memory fences.
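For illustration, the required ordering can be expressed with C11 atomics as follows; the _c11-suffixed names are ours, whereas Figure 6.5 only annotates the same intent with mem-acquire/mem-release comments.

  #include <stdatomic.h>
  #include <stdint.h>

  typedef _Atomic uint64_t optik_c11_t;

  static inline uint64_t optik_get_version_c11(optik_c11_t* l) {
    /* acquire: no later access of this thread is reordered before this load */
    return atomic_load_explicit(l, memory_order_acquire);
  }

  static inline void optik_unlock_c11(optik_c11_t* l) {
    /* only the lock holder writes the word, so a plain release store suffices */
    uint64_t v = atomic_load_explicit(l, memory_order_relaxed);
    atomic_store_explicit(l, v + 1, memory_order_release);
  }

  static inline void optik_revert_c11(optik_c11_t* l) {
    uint64_t v = atomic_load_explicit(l, memory_order_relaxed);
    atomic_store_explicit(l, v - 1, memory_order_release);
  }

On x86, both the acquire load and the release store compile to plain memory accesses, which matches the observation above.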

OPTIK and Memory Reclamation. OPTIK decouples concurrency control from memory reclamation. Accordingly, OPTIK can be used with practically any memory-reclamation scheme, such as hazard pointers [152], RCU [146], or quiescent states [93, 95]. Our concurrent data structure implementations with OPTIK use ssmem,8 a simple memory allocator with quiescence-based memory reclamation.

8 ssmem is available at https://github.com/LPD-EPFL/ssmem.


6.4 Concrete OPTIK Examples

We illustrate in detail how to use the OPTIK pattern on two examples: (i) a map structure (abstract data type), and (ii) a novel concurrent linked-list algorithm.

6.4.1 OPTIK-Based Array Map

A map contains key-value pairs and exports the three main operations of search data structures, namely search, insert, and delete (see Section 2.3). We implement the map as a fixed-size array, hence insertions that do not find an empty spot return false (we do not employ array resizing for simplicity). In Section 6.5.2, we use our map design in a concurrent hash table.

We first briefly describe a lock-based array map that protects every operation with a global lock and then show how to optimize this array map using the OPTIK pattern.

Lock-Based Map. The design of a pessimistic, lock-based array map is straightforward: All three operations grab the lock and then traverse the array. If search or delete operations find the target key while traversing, they complete the operation (i.e., read the value of the key-value pair and, for deletions only, delete the key), unlock the lock, and return. If insertions find the key while traversing, they release the lock and return false. If they do not, they insert the new key-value pair in a free spot (if any), release the lock, and return true. If no spot is empty, insertions return false.

OPTIK-Based Map. We use the OPTIK pattern/lock to introduce optimism in the pessimistic lock-based map. Intuitively, search operations, as well as updates that return false, do not modify the data structure. Therefore, ideally, they must complete without locking. Of course, the actual insertions or deletions in the map have to synchronize for correctness.

The OPTIK pattern splits an operation into three main phases: (i) optimistic, non-synchronized (read-only) work, (ii) validation and locking, and (iii) pessimistic, synchronized (write-mostly) work. We transform the map operations to follow these phases. Figure 6.6 contains the code for the concurrent OPTIK-based array map. In what follows, we describe the code step by step.

Delete. The delete operation (Figure 6.6(a)) follows the three phases of OPTIK. It first stores the current OPTIK version number (line 9) and traverses the array without synchronization (lines 10-18), looking for the target key (line 11). If the key is not found in the array, it just returns NULL without ever locking (line 19). If the key is found in line 11, then the operation tries to acquire the lock using optik_trylock_version with the version that was earlier stored. If the validation is successful, it deletes the key, releases the lock, and returns the value (lines 13-16). If optik_trylock_version fails, the operation is restarted (line 12).


Insert. Insertions (Figure 6.6(b)) follow very similar logic to deletions. If the key is found while traversing the array, the operation returns false without ever acquiring the lock. If not, it tries to acquire the lock with optik_trylock_version and, if successful, it performs the insertion (if there is a free array spot).

typedef struct {
  key_t key;
  val_t val;
} key_val_t;

typedef struct {
  key_val_t* array;
  size_t size;
  optik_t* lock;
} map_t;

 7 val_t optik_map_delete(map_t* map, key_t key) {
 8  restart:
 9   optik_t vn = optik_get_version(map->lock);
10   for (int i = 0; i < map->size; i++) {
11     if (map->array[i].key == key) {
12       if (!optik_trylock_version(map->lock, vn)) { goto restart; }
13       map->array[i].key = NULL;
14       val_t val = map->array[i].val;
15       optik_unlock(map->lock);
16       return val;
17     }
18   }
19   return NULL;
20 }

(a) Delete operation of OPTIK-based concurrent map.

 1 int optik_map_insert(map_t* map, key_t key, val_t val) {
 2  restart:
 3   optik_t vn = optik_get_version(map->lock);
 4   int free_idx = -1;
 5   for (int i = 0; i < map->size; i++) {
 6     key_t curr_key = map->array[i].key;
 7     if (curr_key == key) { return false; }
 8     else if (curr_key == 0) { free_idx = i; }
 9   }

11   if (!optik_trylock_version(map->lock, vn)) { goto restart; }
12   int res = false;
13   if (free_idx >= 0) {
14     map->array[free_idx].key = key;
15     map->array[free_idx].val = val;
16     res = true;
17   }
18   optik_unlock(map->lock);
19   return res;
20 }

(b) Insert operation of OPTIK-based concurrent map.


 1 val_t optik_map_search(map_t* map, key_t key) {
 2  restart:
 3   optik_t vn = optik_get_version_wait(map->lock);
 4   for (int i = 0; i < map->size; i++) {
 5     if (map->array[i].key == key) {
 6       val_t val = map->array[i].val;
 7       optik_t vnc = optik_get_version(map->lock);
 8       if (optik_is_same_version(vn, vnc))
 9         return val;
10       goto restart;
11     }
12   }
13   return NULL;
14 }

(c) Search operation of OPTIK-based concurrent map.

Figure 6.6 – An OPTIK-based concurrent array map data structure.

Search. We want the search operation to not acquire the lock, otherwise, the total throughput of the map will be dictated by the maximum lock throughput. Nevertheless, we must guarantee the atomicity of reading key-value pairs. In other words, we have to ensure that between matching an array key with the target key and reading the value, there was no concurrent modification on this key-value pair.

We achieve this guarantee using the OPTIK version number. The search operation (Figure 6.6(c)) reads the version number in the beginning of the operation (line 3), like update operations do. This time, however, we employ the optik_get_version_wait function that blocks until the lock is free. Once the key is matched (line 5), we read the corresponding value and check whether the version has changed (lines 6-8). If it did change, then the operation is restarted, otherwise the value is returned. The reason for acquiring an unlocked version in line 3 is that we need to ensure that the search operation was not concurrent with any update operations on the same key during the execution of lines 5-6.

We could decrease the “granularity” of the version number for search operations, by reading the version before line 5. We would still be able to acquire atomic snapshots of the key-value pairs. However, doing so puts a lot of stress on the cache line of the OPTIK lock, resulting in lower performance than the design in Figure 6.6. (Actually, we can devise various schemes for validating the key-value snapshot using the version number.)

In terms of correctness, successful updates are serialized behind the lock. Successful searches are trivially correct, as they complete iff there were no concurrent modifications. Unsuccessful search operations can be linearized so that they never observe the target key. Similarly, unsuccessful updates can be linearized so that they do (not) observe the target key.


Lock-Based vs. OPTIK-Based Array Map. We compare the two map implementations on two workloads on Ivy (see Section 2.1 for platform details and Section 6.5 for our experimental settings). Figure 6.7 depicts the results, where MCS represents the lock-based map protected by an MCS lock. On both the small and the large maps, the OPTIK version (optik) is faster than the lock-based one. optik has two main benefits compared to MCS. First, search operations (80% of the workload) do not acquire the lock. Second, unsuccessful updates (∼10% of the operations) also do not need to synchronize.

If we exclude the results on multiprogramming (i.e., more threads than hardware contexts), where MCS suffers, optik is on average 4.7 and 1.4 times faster than MCS on the small and the large map, respectively. On the small workload, since there are just four spots in the map array, many operations fail (e.g., deletions do not find the key they are looking for). For example, on 10 threads, only 25% of the updates are successful, resulting in 5% total effective updates. Overall, the results can be largely explained by the latency distributions. optik significantly reduces the latencies for search operations and for unsuccessful deletions. The reduction is less pronounced on unsuccessful insertions, as a portion of those failures is due to insufficient space in the array. In these cases, both optik and MCS acquire and release the lock before returning false. Additionally, the effects of failing optik_trylock_version and restarting are visible in the tail latencies of successful updates.

Figure 6.7 – Lock-based (MCS) vs. OPTIK-based (optik) map on a small map (4 elements, 10% updates) and a large map (1024 elements, 10% updates): throughput (Mops/s) vs. number of threads, and latency distributions (Kcycles; 5th, 25th, 50th, 75th, and 95th percentiles) per operation class (srch-suc, insr-suc, delt-suc, srch-fal, insr-fal, delt-fal). The latency-distribution results use 10 threads.


6.4.2 OPTIK-Based Linked List

The main idea behind a sorted OPTIK-based linked list is to keep track of the necessary version numbers while traversing the list. In a sense, similarly to hand-over-hand locking [103], the OPTIK-based list performs hand-over-hand version tracking. Figure 6.8 includes the code of our implementation. We defer the evaluation of our list to Section 6.5.

Delete. The delete operation (Figure 6.8(a)) is the most complex operation of the OPTIK-based linked list, because it requires locking two nodes: the one being deleted and its predecessor node. Traversing the list (lines 11-14) keeps track of these two version numbers that are later used for locking with optik_trylock_version (lines 17-20). If locking the predecessor node fails, the operation is restarted, otherwise the node to be deleted is locked. If this latter optik_trylock_version fails, the predecessor's OPTIK lock is reverted, instead of unlocked, in order to avoid false conflicts with other concurrent operations. Notice that due to OPTIK, (i) no deleted flag is required (as in [97]), and (ii) the OPTIK lock of the deleted node is never released, which prohibits updates from reusing this node. Essentially, the linearization point of a deletion is the actual write on the pred->next pointer in line 21.

typedef struct node {
  key_t key;
  val_t val;
  optik_t lock;
  struct node* next;
} node_t;

typedef struct {
  node_t* head;
} ll_t;

 7 val_t optik_ll_delete(ll_t* list, key_t key) {
 8  restart:
 9   node_t *pred, *cur = list->head;
10   optik_t predv, curv; predv = curv = optik_get_version(&cur->lock);
11   do {
12     pred = cur; predv = curv; cur = cur->next;
13     curv = optik_get_version(&cur->lock);
14   } while (cur->key < key);
15   if (cur->key != key) { return NULL; }

17   if (!optik_trylock_version(&pred->lock, predv)) { goto restart; }
18   if (!optik_trylock_version(&cur->lock, curv)) {
19     optik_revert(&pred->lock); goto restart;
20   }
21   pred->next = cur->next;
22   val_t result = cur->val;
23   optik_unlock(&pred->lock);
24   node_gc_free(cur);
25   return result;
26 }

(a) Delete operation of OPTIK-based concurrent linked list.


 1 int optik_ll_insert(ll_t* list, key_t key, val_t val) {
 2  restart:
 3   node_t *pred, *cur = list->head;
 4   optik_t predv, curv; predv = curv = OPTIK_INIT;
 5   do {
 6     curv = optik_get_version(&cur->lock);
 7     pred = cur; predv = curv; cur = cur->next;
 8   } while (cur->key < key);
 9   if (cur->key == key) { return false; }

11   if (!optik_trylock_version(&pred->lock, predv)) { goto restart; }
12   node_t* newnode = new_node(key, val, cur);
13   pred->next = newnode;
14   optik_unlock(&pred->lock);
15   return true;
16 }

(b) Insert operation of OPTIK-based concurrent linked list.

 1 val_t optik_ll_search(ll_t* list, key_t key) {
 2   node_t* cur = list->head;
 3   while (cur->key < key) { cur = cur->next; }
 4   if (cur->key == key) { return cur->val; }
 5   return NULL;
 6 }

(c) Search operation of OPTIK-based concurrent linked list.

Figure 6.8 – An OPTIK-based linked-list data structure.

Insert. Inserting in the OPTIK-based linked list (Figure 6.8(b)) requires locking and validating only the predecessor node (line 11). This OPTIK lock ensures that there are no concurrently completed modifications on the predecessor node p or on p->next.

Search. The search operation (Figure 6.8(c)) of the OPTIK-based linked list is completely oblivious to concurrency. We can support this 100% sequential search design because the linearization points of updates are the actual stores on the predecessor node.

6.5 OPTIK in Concurrent Data Structures

In this section, we illustrate the power and usefulness of OPTIK for optimizing and designing concurrent data structures (CDSs) (i.e., linked lists, hash tables, skip lists, and queues). In contrast to Section 6.4, we keep the CDS descriptions high-level for brevity. Before that, we describe the evaluation settings that we use in our experiments and the two platforms that we evaluate our data structures on.

Experimental Methodology. We evaluate various algorithms via microbenchmarks. Unless stated otherwise, all OPTIK implementations use OPTIK locks on top of versioned locks.


Similarly, unless stated otherwise, non-OPTIK implementations use test-and-set locks. (Notice that for highly contended locks, such as the locks in concurrent queues, we use MCS locks.) We take the non-OPTIK algorithm implementations from our ASCYLIB library [51] (we use the optimized versions of the algorithms). Additionally, we use the memory allocator of ASCYLIB that provides garbage collection, and we use 8-byte keys and values. Backoff schemes can significantly affect the performance of CDSs (e.g., when an operation fails and must be restarted). For fairness, all data structures use the exact same backoff function. We use exponentially increasing backoff times with a maximum backoff of up to 16k cycles. Furthermore, after every iteration, threads wait for a short duration, in order to avoid long runs [153].

On every run, we set the initial size of the data structure and the key range that the threads operate on. On every iteration, each thread selects a key at random within the given range. We keep the range double the initial size and the percentages of insertions and deletions the same, so that the size of the structure remains close to the initial one. Because the key range is double the initial size, roughly half of the update operations on search data structures return false. The update rate that we report on the graphs represents the effective percentage of updates, namely the ones that alter the data structure. For our skewed workloads, we use a zipfian distribution of keys with a = 0.9, where the largest keys are the most popular. Our results are the median value of 11 repetitions of 5 seconds each. We do not pin threads to cores, but let the OS do the scheduling.

For our latency measurements, we use the per-core timestamp counter [111] for accurately measuring the duration of an operation in cycles. In detail, every thread holds an array of 16K latency measurements that, at the end of each experiment, are collected and translated into a latency distribution (boxplots reporting the 5th, 25th, 50th, 75th, and 95th percentile latencies).

We use the Ivy and Opteron multi-core processors, described in Section 2.1.

6.5.1 OPTIK in Linked Lists

We use OPTIK in the design of concurrent (sorted) linked lists. The simplest algorithm is of course a sequential list protected by a scalable global lock, such as an MCS lock. Naturally, this algorithm does not offer any concurrency as all operations are serialized behind the lock. An easy optimization on the global-lock algorithm is to implement the search operation so that it does not acquire the lock (given that memory reclamation is properly handled). The linearization point of updates is then the actual memory writes that access the predecessor node of the one being updated.

Nevertheless, updates are fully serialized behind the global lock, resulting in low scalability. We use OPTIK to introduce optimism to the update operations. The transformation is very similar to that of the concurrent map in Section 6.4.1. Note that concurrent modifications might not be conflicting; still, using a global lock will result in false conflicts. Because of this limitation and of the high load on the global lock, this linked-list design is not expected to scale well on contended scenarios. We can resolve these limitations using fine-grained locking (see Section 6.4.2 for the design of the fine-grained OPTIK-based linked list).

Additionally, inspired by the fact that version numbers reveal whether a list node has been modified, we develop the idea of node caching. In short, each thread keeps track of the last accessed node after each operation, accompanied by the version number that the thread observed. This node can be subsequently used as the entry point for the next operation on the list, given that (i) it has not been deleted, and (ii) it is a correct entry point (i.e., in a sorted list, the key of the cached node is less than the target key). Of course, we must ensure that the memory of deleted nodes is not re-used while the node is still referenced by any node cache. Node caching can also be applied to non-OPTIK algorithms, given that we can avoid the ABA problem and that we can detect whether a node is valid.
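A per-thread node cache for the fine-grained OPTIK-based list can be sketched as follows; node_t, ll_t, and the optik_* functions are those of Figures 6.5 and 6.8, while the cache structure and the helper names are illustrative.

  typedef struct {
    node_t* node;     /* last node accessed by this thread              */
    optik_t version;  /* version of that node when it was last observed */
  } node_cache_t;

  static __thread node_cache_t nc = { NULL, OPTIK_INIT };

  /* Choose the entry point for a traversal towards 'key'. The cached node is
   * usable only if its version is unchanged (it has not been updated or
   * deleted) and its key is still smaller than the target key. The memory of
   * deleted nodes must not be reused while any thread may still cache them. */
  static node_t* list_entry_point(ll_t* list, key_t key) {
    node_t* c = nc.node;
    if (c != NULL
        && optik_is_same_version(optik_get_version(&c->lock), nc.version)
        && c->key < key)
      return c;
    return list->head;
  }

  /* Called at the end of each operation with the last node traversed. */
  static void node_cache_set(node_t* n, optik_t version) {
    nc.node = n;
    nc.version = version;
  }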

Correctness. The OPTIK-based global-lock list is trivially correct as it disallows concurrency of modifications. The linearization point of both insertions and deletions can be set to the actual write on the predecessor’s next pointer. Search operations either observe the concurrent modifications in the vicinity of the target key, or not.

Evaluation. Figure 6.9 depicts the throughput of the aforementioned linked-list algorithms on various workloads. For comparison, we include the results of the lazy linked-list algorithm [97] (lazy), which we have shown to be very efficient [51], as well as the lock-free list by Harris [93] (harris). We implement the node-caching idea on the lazy list (lazy-cache) and on the fine-grained OPTIK-based list (optik and optik-cache in the graph). mcs-gl-opt represents a global-lock list protected by an MCS lock, including the non-synchronized search optimization we describe earlier.

Figure 6.9 – Throughput of linked-list algorithms (harris, mcs-gl-opt, optik, lazy-cache, lazy, optik-gl, optik-cache) on Ivy and Opteron on various workloads: large (8192 elements), medium (1024), small (64), large skewed (8192), and small skewed (64).

Clearly, the node-cache optimization (optik-cache, lazy-cache) brings important performance benefits as it probabilistically reduces the list-traversal duration. For instance, on the large list, 49.8% of the operations make use of the node cache, while on the small list the hit rate drops to approximately 40%. On these two workloads, optik-cache delivers 50% and 15% higher average throughput than the version without the cache (optik).

Additionally, the OPTIK-based global-lock list (optik-gl) delivers higher throughput than mcs-gl-opt in all workloads. optik-gl mostly benefits from the fact that for 20% of the operations (the unsuccessful ones) it returns without acquiring the lock.

Finally, the fine-grained OPTIK-based list (optik) performs similarly to lazy and harris on the low-contention workloads (i.e., large, large-skewed, and medium). However, optik is more scalable than lazy at high contention levels. On 64 elements, optik is on average 22% faster than lazy. Note that optik stresses the locks less than lazy, because the operations do not acquire the lock if they are going to fail the validation. This difference is clear on the small-skewed workload, where neither lazy nor lazy-cache can sustain the contention on the highly contended nodes.9 Additionally, optik behaves much better than lazy on multiprogramming and is, on average, just 5% slower than harris even on the small workloads.

6.5.2 OPTIK in Hash Tables

We adapt and use the two OPTIK-based linked lists (Section 6.4.2, Section 6.5.1) in the design of two novel hash tables. Intuitively, the list protected by a global lock, resulting in per-bucket locking, is more suitable for hash tables. We also use the array map of Section 6.4.1 in the design of a third hash table.

We further use OPTIK locks to optimize existing hash tables. In a hash table, an update operation (i.e., an insertion or a deletion) might not be feasible: delete (resp. insert) operations return false if the corresponding key is not found (resp. is found). Many hash-table algorithms (e.g., Java ConcurrentHashMap [127]) implement updates by directly locking the corresponding bucket, regardless of whether the operation is feasible. This unnecessary locking hinders scalability. In these algorithms, in order to return false without locking if an update is not feasible, we must add an extra read-only traversal of the bucket. If the operation cannot be performed, no lock is acquired and the operation simply returns false after this first traversal. Otherwise, if the operation can be performed, we must acquire the bucket lock and then re-traverse the bucket to ensure that no concurrent modification operated on the target key. Consequently, for every successful update, we have two traversals of the bucket. We can avoid the second traversal with OPTIK locks, using either optik_lock_version or optik_trylock_version. At the beginning of the operation, we keep track of the version number of the bucket and use this version in the OPTIK-lock call. If the version is validated, no concurrent modification has completed on this bucket, hence we do not need to re-traverse the bucket.
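The following sketch shows this optimization for an insertion into a bucket implemented as a linked list of nodes; the bucket/node layout and the new_ht_node helper are illustrative assumptions, while the optik_* calls are those of the OPTIK-lock interface.

  typedef struct ht_node {
    key_t key;
    val_t val;
    struct ht_node* next;
  } ht_node_t;

  typedef struct {
    optik_t    lock;  /* per-bucket OPTIK lock */
    ht_node_t* head;
  } ht_bucket_t;

  int ht_optik_insert(ht_bucket_t* b, key_t key, val_t val) {
   restart: ;
    optik_t vn = optik_get_version(&b->lock);
    for (ht_node_t* n = b->head; n != NULL; n = n->next)
      if (n->key == key)
        return false;                  /* not feasible: no locking at all */
    if (!optik_trylock_version(&b->lock, vn))
      goto restart;                    /* a concurrent update committed   */
    /* the version validated: the bucket is unchanged since the traversal
     * above, so the second traversal is unnecessary */
    ht_node_t* newn = new_ht_node(key, val, b->head);  /* assumed helper  */
    b->head = newn;
    optik_unlock(&b->lock);
    return true;
  }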

9 The most contended node is accessed by 15% of the requests.


Correctness. The three hash tables that are based on the two OPTIK lists and the map are correct because of the correctness of these base data structures. The optimizations for avoiding double traversal with OPTIK are correct because the bucket cannot be modified without increasing the version number of the bucket lock.

Evaluation. Figure 6.10 includes the results of various hash tables. We set the number of buckets to be equal to the number of initial elements, so that initially every bucket contains on average one element. For brevity, we only show the results with the medium and small-skewed hash tables. On the missing graphs, the behavior of the hash tables is in accordance with the results shown in Figure 6.10. Apart from the three OPTIK-based hash tables (optik, optik-gl with per-bucket locking, and optik-map), we create a hash table with lazy linked lists adapted to use per-bucket locking (lazy-gl). Additionally, we evaluate Java's util.concurrent.ConcurrentHashMap [127] (java), as well as a modified version that avoids double parsing using optik_trylock_version, as we describe above. The ConcurrentHashMap algorithm uses lock striping: it partitions the buckets into n segments. Each segment (and its buckets) is protected by a single lock and can be individually resized. We configure n to be 128, based on Java's documentation [170]: “Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table.”

Optimizing java with OPTIK (java-optik) brings benefits only in the presence of (high) contention. On the large hash table (65536 elements, not shown in the graph), the improvement is just 1.9%, because there are practically no validation failures. Additionally, the second bucket pass of java is very fast, as the first pass brings the bucket data into the L1 cache.

Figure 6.10 – Throughput (Mops/s) of hash-table algorithms (lazy-gl, java-optik, optik-gl, java, optik, optik-map) on Ivy and Opteron: medium (8192 elements, 20% updates) and small-skewed (512 elements, 20% updates) workloads.


Furthermore, optik-map does not scale well on the small workloads on Ivy due to the hardware. In brief, the buckets of optik-map are allocated in consecutive memory locations, thus occupying a few contiguous cache lines, resulting in increased hardware prefetching on Ivy in our experiments. For example, on 20 threads, the small hash table triggers three orders of magnitude more last-level-cache prefetches than the medium one. This inaccurate prefetching leads to low scalability due to high coherence traffic. Once the size of the hash table is large enough, optik-map becomes the fastest hash table on both platforms. The other hash tables do not face the aforementioned problem, because they dynamically allocate each node that is inserted in the hash table.

Regarding the remaining three hash tables, optik-gl is the fastest. optik-gl is 2-times faster than lazy-gl on average (31% faster on the non-skewed workloads). optik is on average 9% slower than optik-gl, as for some operations optik acquires two locks instead of the one lock in optik-gl. On the small-skewed workload, we see the power of the OPTIK pattern compared to normal locking: optik-gl and optik are both 3.7-times faster than lazy-gl on average. Even on the large-skewed workload (not in the graph), lazy-gl is on average more than 2-times slower than the OPTIK-based hash tables.

6.5.3 OPTIK in Skip Lists

In theory, OPTIK is not very suitable for skip lists. With per-node lock granularity, the same version protects all the next pointers of the node. Consequently, validating the node with OPTIK results in false conflicts. Still, using OPTIK in skip lists results in simpler designs than the existing state-of-the-art ones [75, 105]. We first simplify validation in the optimistic skip list by Herlihy et al. [105], using optik_lock_version. If the validation is successful, then the corresponding node has not been modified, thus we do not need to validate the optimistic results in another way. This specific skip list checks that the node is not logically deleted and that the next pointer at the corresponding level has not been altered.

We also use OPTIK in the design of a new skip-list algorithm. As in any skip list, update operations parse the list and keep track of the predecessor and successor nodes at each level. Due to OPTIK, parsing also keeps track of the version number of each predecessor node. These version numbers are later used for validation. Once the parsing finds the spot to modify, it locks and validates the predecessor nodes and then performs the modifications. If the validation fails, the locks are released and the operation is restarted. We implement two variants of the OPTIK-based skip list. The first one performs more fine-grained validation (the same as in [105]) in case the OPTIK validation fails. The second one immediately restarts the operation if an OPTIK validation fails.

Correctness. The modified Herlihy skip list maintains the correctness of the initial algorithm. Our modifications only involve reducing validation in case the optik_lock_version function is able to validate the previously observed version.


For brevity, we only describe the correctness sketch of the OPTIK-based skip list that immediately restarts on a trylock failure. Both insertions and deletions traverse the list and keep track of the predecessor nodes and their version at each level. As the OPTIK lock protects the whole predecessor node p, we do not need to keep track of the successor nodes for validating p->next. Insertions try to acquire the lock and perform the insertion of the new node eagerly (i.e., they do the physical linking of the node immediately after acquiring the lock of that level). If an optik_trylock_version call fails, the operation is restarted and, after re-parsing the list, the insertion continues from the level that failed. A flag, similar to the fully-linked flag in the Herlihy skip list, ensures that a partially inserted node will not be concurrently deleted. Similarly, a deletion atomically sets the flag of the target node to deleted and unlinks the node after acquiring all predecessor locks. We can devise a variant of the algorithm where deletions proceed progressively like insertions. However, the coordination overhead between insertions and deletions on the same node surpasses the benefits of being eager.

Evaluation. Figure 6.11 compares the Herlihy skip list (herlihy) and the lock-free one by Fraser [75] (fraser) with the three lists that we describe above. For brevity, we only show the results on large-skewed and small-skewed lists. On low-contention levels (large and medium non-skewed, not shown in the graph), all algorithms behave similarly. Intuitively, all five implementations follow almost identical code paths in the absence of conflicts: most of the time is spent traversing the list.

Figure 6.11 – Throughput (Mops/s) of skip-list algorithms (fraser, herlihy, herl-optik, optik1, optik2) on Ivy and Opteron: large-skewed (65536 elements, 20% updates) and small-skewed (1024 elements, 20% updates) workloads.

Using optik_lock_version in the Herlihy skip list (herl-optik) slightly affects the performance on the Opteron, but has a large effect on Ivy. In brief, the faster validation with OPTIK results in an important reduction of operation restarts. For instance, on the small-skewed workloads, on 20 threads on Ivy, 30% of the update operations have to restart due to concurrency without OPTIK, compared to 24% with OPTIK. On the Opteron, in contrast, due to the overall lower throughput than Ivy, both herlihy and herl-optik have 50% operation restarts on 20 threads.

On skewed workloads, we also notice the benefits of using OPTIK, even though it can introduce unnecessary operation restarts. In particular, optik2, which is the variant that immediately restarts if there is a trylock failure, is more scalable than optik1, which uses optik_lock_version and does fine-grained validation if the version is not validated. For example, under very high contention, on 20 threads, 40% of the operations have to restart with optik2, while just 20% with fraser. Still, optik2 delivers 10% higher throughput than fraser on 20 threads. The main reason for optik2 being more scalable than the rest is the important property of OPTIK that we have already extensively discussed: threads fail the validation with a single atomic operation, without waiting behind the occupied lock. The other three lock-based skip lists do not include false restarts; they do, however, include false contention behind the per-node locks. optik2 also benefits from (i) a simpler implementation than the rest, as it does not include the fine-grained validations, and (ii) the eager node insertion. Overall, optik2 is faster than fraser. However, optik2's throughput significantly drops on multiprogramming, while fraser is able to sustain its throughput.

6.5.4 OPTIK in Binary Search Trees (BSTs)

In our asynchronized concurrency (ASCY) work [51], we observe that none of the existing lock-based BST algorithms follows all four ASCY guidelines: Most algorithms tend to acquire (on average) a large number of locks. We use OPTIK to design BST Ticket (BST-TK), an external tree, where every internal router node is protected by an OPTIK lock. The BST-TK algorithm is essentially the same as the fine-grained OPTIK-based linked list—traversing and then locking nodes is based on hand-over-hand version tracking. Overall, BST-TK acquires one lock for successful insertions and two locks for successful removals.

Correctness. The correctness sketch for BST-TK follows the same ideas as the OPTIK-based linked list. We provide a complete correctness proof of BST-TK in our technical report [50].

Evaluation. Figure 6.12 compares BST-TK to the state-of-the-art lock-free BST by Natarajan et al. [161] (natarajan), as well as the fastest pre-existing lock-based BST by Bronson et al. [30] (bronson). Clearly, bronson is significantly less scalable than the other two, because it includes much heavier synchronization. BST-TK and natarajan show similar performance; however, the OPTIK-based BST is on average 3% faster than natarajan.

Figure 6.12 – Throughput (Mops/s) of BST algorithms (bronson, natarajan, optik (BST-TK)) on Ivy and Opteron on various workloads: large (65536 elements), medium (16384), small (1024), large skewed, and small skewed.

6.5.5 OPTIK in Queues

We use OPTIK in various concurrent queue designs. First, we optimize the classic Michael-Scott queues [153] (MS-queue) using OPTIK locks. The first lock-based MS-queue variant
employs the optik_lock_version function to optimize the dequeue function: the operation is optimistically prepared so that, if the validation succeeds, only a single store is performed in the critical section. If the validation fails, the dequeue operation is prepared and performed in the critical section, as usual. The second (lock-based) variant is very similar to the first one; however, it uses optik_trylock_version instead of the lock function. If the validation fails, the operation is restarted. The third variant is a lock-based/lock-free MS-queue hybrid. We use the lock-free enqueue implementation of the MS-queue unaltered. We opt for this approach because the enqueue operations do not offer any opportunities for optimism. For the dequeue function, we use the OPTIK trylock implementation.
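A sketch of the optimistic, trylock-based dequeue shared by these variants is shown below; the queue layout and the helper names are illustrative, and memory reclamation is assumed to be handled by the quiescence-based allocator of Section 6.3.3.

  typedef struct q_node {
    val_t val;
    struct q_node* next;
  } q_node_t;

  typedef struct {
    optik_t   head_lock;
    q_node_t* head;        /* dummy node, as in the MS-queue */
    optik_t   tail_lock;
    q_node_t* tail;
  } queue_t;

  val_t optik_queue_dequeue(queue_t* q) {
    while (1) {
      optik_t vn = optik_get_version(&q->head_lock);
      q_node_t* head = q->head;            /* optimistic preparation        */
      q_node_t* next = head->next;
      if (next == NULL)
        return NULL;                       /* the queue is empty            */
      if (optik_trylock_version(&q->head_lock, vn)) {
        q->head = next;                    /* single store in the section   */
        optik_unlock(&q->head_lock);
        val_t val = next->val;
        node_gc_free(head);                /* retire the old dummy node     */
        return val;
      }
      /* a concurrent dequeue committed: restart */
    }
  }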

The final variant of MS-queue introduces the idea of victim queues. The dequeue function uses the same trylock implementation as the last two designs. The enqueue implementation utilizes the optik_num_queued function of OPTIK locks (on top of ticket locks—see Section 6.3). If the number of waiting nodes is large (e.g., more than two in our implementation), then the thread performs the insertion in a secondary victim queue, instead of waiting behind the lock. The first thread to put a node in the empty victim queue is responsible for linking the victim queue to the main one. The results are (i) lower contention behind the lock, and (ii) a simple victim-queue design as it does not interact with dequeue operations.
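The enqueue-side decision can be sketched as follows, reusing the queue_t layout from the previous sketch and assuming (i) that tail_lock is an OPTIK lock on top of ticket locks, so that optik_num_queued is available, and (ii) a secondary victim list with a victim_enqueue helper whose linking into the main queue is not shown; all of these names are illustrative.

  #define VICTIM_THRESHOLD 2   /* "more than two" waiting threads, as above */

  void optik_queue_enqueue(queue_t* q, val_t val) {
    q_node_t* node = new_q_node(val, NULL);        /* assumed allocator     */
    if (optik_num_queued(&q->tail_lock) > VICTIM_THRESHOLD) {
      victim_enqueue(q, node);                     /* assumed helper: the   */
      return;                                      /* node is linked to the */
    }                                              /* main queue later      */
    optik_lock(&q->tail_lock);                     /* classic path          */
    q->tail->next = node;
    q->tail = node;
    optik_unlock(&q->tail_lock);
  }

As described above, the first thread to place a node in the empty victim queue is the one responsible for eventually linking it to the main queue, so dequeue operations remain unaware of the mechanism.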

Correctness. The first three variants of MS-queue do not fundamentally alter the correctness arguments of the original designs. The fourth design employs the victim-queue idea. Enqueue operations either wait behind the lock to perform their operation normally, or insert the element in the secondary victim queue. This secondary queue is linked to the main one once the first thread that used it acquires the lock. This same thread is also responsible for emptying the victim queue so that it can be reused. Operations that utilize the victim queue have to wait until the victim queue has been emptied and their elements are thus visible in the main queue; this waiting ensures that they can be linearized properly.


Figure 6.13 – Throughput and latency distribution of queue algorithms (ms-lf, ms-lb, optik0–optik3) on Ivy and Opteron on various workloads: decreasing size (40% enqueues, 60% dequeues), stable size (50%/50%), increasing size (60% enqueues, 40% dequeues), and latency distribution on 10 threads.

Evaluation. We evaluate the lock-based (ms-lb) and the lock-free (ms-lf) MS-queues. We use ms-lb with MCS locks. We also evaluate the three OPTIK-based MS-queue variants (optik0, optik1, optik2), as well as the one using victim queues (optik3). We initialize the queues with 65k elements. The results include several interesting points (Figure 6.13).

First, ms-lb delivers stable performance, regardless of the contention levels, due to the MCS locks. If we use any simple spinlock algorithm (e.g., test-and-set) instead of MCS, the throughput of ms-lb degrades as we increase contention. However, when the number of threads exceeds the number of hardware contexts, the combination of locking and the fairness of MCS kills throughput.10

Second, the remaining queue algorithms do not scale and do not even maintain stable performance as we increase contention, especially on Opteron. Unlike ms-lb with MCS locks, all other designs have two single points—cache lines—of contention, namely the head and the tail of the queue. Opteron is an 8-socket machine; thus, increasing the number of threads increases the non-uniformity as well, resulting in more expensive cache-coherence traffic (Table 4.3). Still, on both platforms, ms-lb is slower than the rest on fewer than 6–7 threads.

Third, it is worth comparing the two MS-queues with the different OPTIK-based queue implementations. optik2 (lock-free enqueue, OPTIK-based dequeue) behaves practically the same as ms-lf, showing that the simple CAS validation of OPTIK locks does resemble lock-freedom. Then, the victim-queue technique of optik3 does bring some benefits that are mostly visible on the increasing-size workload, which stresses enqueues. optik3 is on average 28% faster than ms-lf on this workload, while overall it is 7% faster.

Regarding optik1, on the one hand, it contains the enqueue implementation of ms-lb; thus, on the increasing-size workload, it behaves similarly to ms-lb. On the other hand, it uses the optik_trylock_version implementation for dequeuing, showing similar performance to optik2 and ms-lf. Furthermore, optik0 on the Opteron shows that using OPTIK locks with the lock/unlock interface under high contention is not a good idea. At the end of the day, OPTIK locks are simple spinlocks.

10 There are techniques, such as time-published queue-based locks [96], for alleviating this problem.

Finally, the latency-distribution graphs reveal the strengths and weaknesses of each implementation. For example, dequeuing an element is very fast with ms-lf, whereas enqueuing is very expensive. Similarly, enqueuing with optik3 is fast because of the victim-queue approach, but dequeuing is slow.

6.5.6 Summary

The combination of the OPTIK pattern with OPTIK locks is a very strong concurrency tool. The resulting algorithms are simple, include minimal synchronization, and follow our ASCY patterns. We illustrate the power of OPTIK by:

• designing five new CDSs: (i) an array map with a corresponding hash table, (ii–iii) a global-lock and a fine-grained linked list with two corresponding hash tables, (iv) a concurrent BST (BST-TK), and (v) a skip list;
• optimizing four state-of-the-art CDSs: (i) the ConcurrentHashMap algorithm in Java [127], (ii) the optimistic skip list by Herlihy et al. [105], and (iii–iv) both the lock-free and lock-based Michael-Scott queues [153];
• introducing two concurrency techniques: (i) node caching for list structures, and (ii) victim queues for concurrent queues.

Of course, OPTIK is not always a suitable solution. The most prominent example of such a case is stack data structures. We briefly experiment with stacks (not shown in the graphs). More precisely, we redesign the classic lock-free stack by Treiber [206] using OPTIK. The original and the OPTIK-based variants behave similarly. Still, the contention levels that can be induced on a highly parallel stack cannot be sustained by either the “simple” OPTIK lock or the lock-free solution. There are ways to alleviate this problem, such as aggressive backoff mechanisms or elimination [98]. Note that large backoff times might result in large tail latencies.

6.6 Conclusions

Concurrent data structures (CDSs) facilitate concurrent programming by abstracting data sharing behind an interface, thus removing the need for developers to implement synchronization. Consequently, a well-designed and implemented concurrent data structure offers portability and scalability of data sharing in concurrent systems, without any effort from the system developer. Nevertheless, implementing these concurrent data structures in a portably scalable manner is a daunting task. As we illustrated with our novel concurrent hash table, namely CLHT, designing scalable data structure algorithms requires devising complicated concurrency/synchronization techniques. Such techniques are, of course, not ubiquitous,

leaving the design of CDSs to concurrency experts only. Typically, every new scalable CDS algorithm results in a publication in one of the top concurrency conferences.

In this chapter, we simplified the design of scalable concurrent data structures by introducing the OPTIK design pattern and the underlying OPTIK locks. The OPTIK pattern offers a concrete and simple way of detecting conflicting concurrent operations in concurrent data structures. Therefore, it can be used to methodically design portable and scalable data structures. OPTIK locks provide a concrete and efficient implementation of the pattern. OPTIK-based algorithms are simple, include minimal synchronization, and follow our asynchronized concurrency paradigm. We illustrated the power of OPTIK by (a) designing five novel concurrent-data-structure algorithms, and by (b) optimizing four existing state-of-the-art ones.

7 Abstracting Multi-Core Topologies with MCTOP

As we highlighted earlier in this thesis, portability and efficiency are usually antagonists in multi-core computing. In order to develop efficient code, one needs to take into account the topology of the target multi-cores (e.g., for locality). This observation is valid for any aspect of concurrent programming (not only synchronization) and clearly hampers code portability. In this chapter, we show that one can achieve portable optimizations (i.e., optimize concurrent software for the underlying multi-core while maintaining portability), using MCTOP, an abstraction of multi-core topologies. MCTOP enables developers to accurately and portably define high-level performance policies. We illustrate that if these policies are designed properly, they result in portable optimizations.

7.1 Introduction

Since 2000, computing systems have become increasingly diverse in terms of the number of threads per core and cores per socket, as well as the on-chip and off-chip interconnects. As we showed in Part II of this dissertation, this tendency complicates synchronization in concurrent systems. Apart from synchronization, this tendency generally makes concurrent programming very challenging, for developers need to fine-tune every aspect of their software for the underlying hardware in order to achieve performance (e.g., [17, 27, 28, 78]). However, optimizing for specific multi-core topologies hinders the portability of software. In fact, optimizing software for multi-cores raises two main questions: (i) how to harvest and expose the details of multi-cores in software, and (ii) how to fine-tune according to those details, while ensuring portability.

Traditionally, developers have relied on libraries, such as libnuma [120] on Linux, liblgrp [167] on Solaris, and hwloc [28], for abstracting the topology of multi-cores. These libraries offer a topology representation of multi-cores as well as a companion interface for placing threads (and data). However, the provided multi-core representations are low-level and offer only the limited view of the OS. Developers do not have access to the performance characteristics of the underlying multi-core processor and still need to manually optimize their software for each platform.


We present in this chapter an easier, more portable approach to optimizing software for multi-cores. We introduce libmctop, a library that generates what we call MCTOP, a multi-core topology abstraction of important low-level information, such as communication latencies and memory bandwidths. Figure 7.1 depicts the visual representation of the MCTOP of Opteron (see Section 2.1 for more examples). Of course, a developer could directly use this low-level information to fine-tune her software for Opteron. For instance, she could decide to use sockets 0 and 1 as they provide minimum latency. Such optimizations—that rely on the specifics of a processor—are not portable. Instead, she could write a policy that uses any two sockets (if available) that minimize latency.

As we show in this chapter, MCTOP enables the design of easy, portable, and efficient optimizations using such high-level performance policies. In turn, these policies make use of the actual numbers generated by libmctop. Essentially, MCTOP allows developers to accurately capture high-level semantics that utilize the low-level performance details of multi-cores, thus delivering portable optimizations. For instance, using MCTOP, we can easily define policies such as “use one hardware context per core,” “use two sockets with maximum bandwidth,” or even “use as many threads as required to saturate the memory bandwidth of node n,” or “use the maximum number of threads, in the two most remote sockets, so that each thread has at least 3 MB of LLC.” If designed properly, such policies result in portable optimizations.

libmctop is based on MCTOP-ALG, our novel algorithm for inferring the topology of multi-cores, which relies on two fundamental observations: (i) cache-coherence protocols are deterministic by design, and (ii) communication latencies characterize the topology. MCTOP-ALG leverages these two observations by collecting accurate core-to-core communication latencies. These latencies are used to infer the topology of the processor. On top of this topology, libmctop (via plugins) collects additional low-level measurements, such as cache latencies and sizes, as well as memory latencies and bandwidths. The end result is an automatically-generated MCTOP representation of the underlying multi-core.


(a) Intra-socket topology of a socket. (b) Cross-socket topology.

Figure 7.1 – Topology representation of an 8-socket AMD processor—Opteron.


We argue that MCTOP-ALG’s measurement-based approach is superior to loading multi-core topologies from the underlying OS or hardware (e.g., using CPUID) for various reasons: (i) portability—collecting measurements is almost identical on any architecture or OS, unlike reading topology info from the OS or the hardware; (ii) forward/backward compatibility—measurements do not depend on the OS version; (iii) correctness—numbers do not lie, while the OS can be misconfigured1; (iv) extensibility—independence from the information that vendors do or do not want to expose; and (v) accuracy—a measurement-based approach automatically collects the accurate low-level measurements that we need in MCTOP.

We illustrate portable optimizations with MCTOP with four examples on five processors from Intel, AMD, and Oracle. First, we automate backing off in lock implementations using the communication latencies of MCTOP. Our optimized TAS, TTAS, and TICKET spinlocks deliver up to 39% average throughput improvements. Second, we design a topology-aware mergesort algorithm that builds an optimal cross-socket merge tree on top of MCTOP. This mergesort algorithm is 19% faster on average than the parallel sort algorithm of the C++ standard library, which is topology-agnostic.

Furthermore, we design a thread placement library, called MCTOP-PLACE, on top of MCTOP and use it in optimizing the Metis MapReduce library and OpenMP. MCTOP-PLACE includes 12 high-level performance policies that enable thread placement with locality, bandwidth, or even power optimizations. We plug MCTOP-PLACE in Metis and achieve 17% better performance, while consuming 14% less energy in four workloads. Similarly, we extend OpenMP’s thread placement functionality with runtime support for adaptation of placement policies. Consequently, our OpenMP version allows for portable, high-level, and dynamic thread placement. We evaluate Green-Marl’s [108] OpenMP-based graph workloads and improve the performance of various graph analytics, such as PageRank, by 22% on average.

To summarize, the main contributions of this chapter are as follows:

1. MCTOP, a rich multi-core topology abstraction which enables portable concurrent software optimizations;
2. MCTOP-ALG, a portable algorithm for inferring the topology of multi-cores without relying on the topology information of the OS or the hardware;
3. libmctop and the software we build using libmctop, both available at http://lpd.epfl.ch/site/mctop.

As we discuss in Section 7.3, libmctop has certain limitations. We have evaluated libmctop only on x86 and SPARC architectures, and we cannot yet guarantee the effectiveness of MCTOP-ALG on other architectures (e.g., ARM, POWER). Additionally, in order to collect accurate measurements, libmctop requires an offline, solo execution on the target processor for the one run that infers the topology (this means stopping all other applications for the duration of libmctop’s first execution).

1 On Opteron, the OS has an incorrect mapping of cores to memory nodes (all 48 cores are assigned to four out of the eight nodes), while MCTOP-ALG infers the correct mapping.


The rest of the chapter is organized as follows. In Section 7.2, we describe the programming interface of MCTOP. In Sections 7.3 and 7.4, we show how to create and enrich MCTOPs in a portable manner. We then describe examples of high-level policies that result in portable efficiency in Section 7.5 and use these ideas in designing a thread placement library in Section 7.6. Finally, we present practical examples and conclude the chapter in Sections 7.7 and 7.8, respectively.

7.2 The MCTOP Topology Abstraction

The first step for achieving portable optimizations is to provide a programming abstraction of multi-core topologies. That way, software can build on this abstraction and avoid using the limited view of the multi-core that is exposed by the OS. Accordingly, libmctop includes a portable topology abstraction, called MCTOP (shorthand for multi-core topology).

MCTOP has two important characteristics. First, MCTOP is generic, so that it can be used to describe any modern multi-core processor. Second, MCTOP is extensible, in that it supports the low-level details of multi-cores, which are necessary to achieve fine-tuning of software and portability at the same time.

In the remainder of this section, we first describe the programming interface of MCTOP and then illustrate several examples of MCTOP topologies. In Section 2.1, we have described important multi-core characteristics that affect the design of libmctop. In Section 7.3, we introduce a generic algorithm for harvesting MCTOP topologies of multi-cores, while in Section 7.4, we present the plugin system of libmctop for extending MCTOP topologies.

libmctop Programming Interface. MCTOP is a multi-core topology abstraction which includes the basic topology of multi-cores (e.g., how cores or sockets are interconnected), as well as low-level performance measurements (e.g., memory latencies and bandwidths). MCTOP topologies are stored in description files, which are created by libmctop once, as we describe later, and are then used to load the topology. Once a topology is loaded, the developer can either use libmctop’s programming interface to access the MCTOP topology, or visualize the topology on screen or as a graph. libmctop represents MCTOP topologies as a set of structures that are linked together to describe the processor. The most important structures of MCTOP are shown in Table 7.1.

These structures essentially represent graphs which are interconnected (i) vertically, in order to represent the actual hierarchical topology, and (ii) horizontally, for simplifying the traversal of all objects at each level. For instance, a hw_context holds pointers to its parent hwc_group, its parent socket, as well as its successor (in terms of proximity) hw_context. Additionally, every structure holds a pointer to additional low-level information, such as memory latencies.

hw_context    The lowest scheduling unit of the processor. If SMT exists, hw_context represents a hardware context; otherwise, it represents an actual core.
hwc_group     A group of hw_contexts. This could, for example, be a core that contains two hardware contexts, or a group of cores that share the L2 cache. There might be multiple levels of hwc_group within a socket.
socket        A hwc_group with additional information about the NUMA memory nodes and the interconnection with other sockets.
node          A memory node with information such as capacity.
interconnect  The interconnection between two sockets. Contains info such as the communication latencies.
mctop         The structure that represents a processor and links everything together. Contains info about latency levels, SMT, the number of sockets and cores, etc.

Table 7.1 – The main structures of MCTOP.

We opt for a representation that uses terms that match any modern processor and are extensible for future designs. That way, the interface of libmctop uses terms which are familiar to system designers, such as:
• mctop_get_local_node(hw_ctx) to get the local node of a hw_context;
• mctop_socket_get_cores(socket) to get the cores of a socket; and
• mctop_get_latency(id0, id1) to get the latency between any two components.
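As a small illustration of how such queries can be composed, the sketch below (in C) finds the hardware context that is closest to a given one in terms of communication latency. Only mctop_get_latency is taken from the interface above; its exact signature and the way the total number of hardware contexts is obtained are assumptions of this example.

#include <stdint.h>

uint64_t mctop_get_latency(int id0, int id1);   /* listed above; exact signature assumed */

/* Return the id of the hardware context with the lowest latency to my_id. */
int closest_hw_context(int my_id, int n_hwcs)
{
  int best_id = -1;
  uint64_t best_lat = UINT64_MAX;
  for (int id = 0; id < n_hwcs; id++) {
    if (id == my_id)
      continue;
    uint64_t lat = mctop_get_latency(my_id, id);
    if (lat < best_lat) {
      best_lat = lat;
      best_id = id;
    }
  }
  return best_id;
}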

7.2.1 Examples of MCTOP Topologies

libmctop can generate a simplified visual representation of the processor in order to make the topology more accessible to developers. libmctop uses the Graphviz [86] visualization library for graphs. libmctop’s visual representation includes two main graphs, depicting the intra- and the cross-socket topologies, respectively. We illustrate libmctop on various x86 and SPARC processors. In Section 2.1, we present the automatically generated graphs and provide details of each platform. In Section 7.7, we use five of these platforms (i.e., Ivy, Opteron, Haswell, Westmere, SPARC-T44) in our experiments.

7.3 MCTOP-ALG: Inferring Topologies

The first step for creating the enriched MCTOP representation of a processor is to generate the basic topology of the hardware. Essentially, this basic representation describes how hardware contexts are placed in sockets, how these sockets are interconnected, and how contexts are connected within each socket.

We name MCTOP-ALG our algorithm that infers the basic topology of cache-coherent shared-memory processors using only latency measurements. In Section 7.4, we describe how these basic MCTOP topology abstractions are augmented with additional measurements, such as memory latencies and bandwidths, in order to enable portable optimizations. MCTOP-ALG relies on two simple, yet important observations regarding the cache-coherence protocols of modern multi-core processors.


Figure 7.2 – Coherence traffic for an RFO request.

OBSERVATION 1: Cache-coherence protocols are deterministic. Cache-coherence protocols are responsible for keeping data consistent in the various caches of the multi-core. Most modern processors implement (variants of) the MESI coherence protocol [174].

Hardware cache-coherence protocols are deterministic by design. Still, non-deterministic schedules can appear, but only under contention (e.g., if multiple threads contend for a cache line, then the schedule of coherence messages is naturally not deterministic). In the absence of contention, hardware coherence protocols deliver deterministic schedules. In simple terms, a given request type (e.g., a request for writing), on a given multi-core, for a block of data in a specific MESI state and with the same placement, always takes the same steps.

Consider the simplified example of Figure 7.2, where a cache line cl is in the modified state2 in the caches of core o and another core r is requesting the data for writing—a request known as request for ownership (RFO). The RFO request for cl misses in the private caches of r. The request finds that o has the only copy of cl through the last-level cache (LLC) (or using a directory, depending on the specific implementation of MESI). Once the copy is found, an invalidation request is sent to o’s private caches to discard their copy of cl, after which the RFO request is granted to r. If r is not in the same socket as o, the RFO request is propagated to the correct socket.

Overall, the coherence request takes deterministic steps. Hence, we can devise thread schedules that accurately measure the communication latency between any two hardware contexts.

OBSERVATION 2: Communication latencies characterize the topology. Multi-cores include several cache levels for minimizing the latency to data. The latency of a request largely defines the distance between the source of the request and the placement of the data. For instance, on Ivy (i.e., the 2-socket Intel Xeon Ivy Bridge), 4, 12, and 42 cycles are the latencies to access the three levels of caches, while 112 and 308 cycles represent the latencies to access data that are in the private caches of another core within the same socket and across sockets, respectively.

Two communicating threads can potentially detect their relative placement based on their communication latency. For example, on Ivy, if two threads communicate in approximately 4 cycles, they must reside on the same core, as the L1 cache delivers this latency. In contrast, a communication latency of 300+ cycles reveals that the two threads are on different sockets.

2 The modified state means that this cache line is the only fresh copy of this data and this data is stale in memory.

MCTOP-ALG Algorithm. MCTOP-ALG takes advantage of the aforementioned observations by collecting accurate hardware-context-to-hardware-context communication latency measurements and using them to infer the topology of the machine. The implementation of MCTOP-ALG in libmctop requires three functionalities from the underlying OS: a way to read the number of available hardware contexts, a way to read the number of memory nodes, and a way to pin threads to specific contexts.

MCTOP-ALG takes the following four steps:

1. Collects context-to-context latency measurements → latency table;
2. Clusters close values into groups and normalizes the latencies accordingly → normalized latency table;
3. For each latency value l, categorizes hardware contexts into groups of contexts that communicate with latency l with each other and with the same latency with other groups → per-latency-level components;
4. Creates the multi-core representation by assigning roles to components → topology.

We detail below these four steps using Ivy as an example—Figure 7.4. We then discuss several practical considerations regarding MCTOP-ALG.

7.3.1 Context-to-Context Latencies

MCTOP-ALG uses two threads that move from hardware context to hardware context and fill up an N × N latency table, where N is the number of hardware contexts of the processor. For each data point, the two threads execute in lock step as shown in Figure 7.3. Thread y brings the data in a modified state in its local caches and then thread x measures the latency of its own access to the shared data using the timestamp counter of the core [111].

The use of an atomic operation, such as compare-and-swap (CAS), is crucial for two reasons. First, CAS includes a memory fence, hence it precludes the effects of memory consistency models [201]. Second, CAS brings the data into the modified MESI state.

Thread x                      Thread y
thread_barrier()
                              CAS(shared_line)
thread_barrier()
s = rdtsc()
CAS(shared_line)
lat[x][y] = rdtsc() - s

Figure 7.3 – Lock-step execution of MCTOP-ALG’s threads.
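The lock-step execution of Figure 7.3 can be sketched in C roughly as follows. The pthread barrier, the cache-line-aligned shared word, and the use of __rdtsc and the GCC/Clang __sync builtins are assumptions of this sketch (and x86/compiler specific); the barrier is assumed to be initialized for two threads with pthread_barrier_init.

#include <stdint.h>
#include <pthread.h>
#include <x86intrin.h>                    /* __rdtsc(), x86-specific */

static volatile uint64_t shared_line __attribute__((aligned(64)));
static pthread_barrier_t barrier;         /* pthread_barrier_init(&barrier, NULL, 2) */

/* Thread y: the locked RMW brings the cache line into the Modified state in
   y's private caches (whether or not the CAS actually succeeds). */
void thread_y_step(void)
{
  pthread_barrier_wait(&barrier);
  __sync_val_compare_and_swap(&shared_line, 0, 1);
  pthread_barrier_wait(&barrier);
}

/* Thread x: once y has written the line, time x's own atomic access to it. */
uint64_t thread_x_step(void)
{
  pthread_barrier_wait(&barrier);
  pthread_barrier_wait(&barrier);         /* y's CAS has completed by now */
  uint64_t s = __rdtsc();
  __sync_val_compare_and_swap(&shared_line, 0, 1);
  return __rdtsc() - s;                   /* one sample of lat[x][y] */
}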


Figure 7.4 – The four steps of MCTOP-ALG: from latency measurements to Ivy’s MCTOP multi-core topology.

The modified state is necessary for avoiding potential whole-machine communication when broadcasting invalidations for a shared cache line (e.g., in Opteron—see Section 4.4.2).

The outcome of this step is a latency table (Figure 7.4, step 1). Note that in practice we only need to take measurements for either the upper or the lower triangle of the table, as the latency measurements are symmetric.

In order to improve the accuracy of each latency measurement, the two threads repeat the lock-step execution n times on each pair of hardware contexts (n = 2000 by default). These n measurements are stored in a local array. Thereafter, the median latency and the standard deviation (stdev) of these latencies, expressed as a percentage of the median, are calculated. If the stdev is higher than a threshold (7% by default), the execution is repeated on this configuration, while the maximum allowed stdev is slightly increased. Retrying the execution for high stdevs ensures stable values, while increasing the maximum stdev on a retry ensures that the measurements complete, even if they are not very stable.
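A sketch of this per-pair retry loop, in C: measure_once, median_u64, and stdev_pct are assumed helpers (one lock-step sample, median, and standard deviation as a percentage of the median, respectively), and the amount by which the threshold is relaxed on a retry is an assumption; the defaults n = 2000 and 7% follow the text.

#include <stdint.h>

#define N_REPS        2000      /* default number of repetitions */
#define STDEV_MAX_PCT 7.0       /* default stdev threshold, in percent */

uint64_t measure_once(int x, int y);                             /* assumed: one lock-step sample */
uint64_t median_u64(uint64_t *v, int n);                         /* assumed helper */
double   stdev_pct(const uint64_t *v, int n, uint64_t median);   /* assumed helper */

uint64_t measure_pair(int x, int y)
{
  uint64_t samples[N_REPS];
  double max_stdev = STDEV_MAX_PCT;

  while (1) {
    for (int i = 0; i < N_REPS; i++)
      samples[i] = measure_once(x, y);
    uint64_t med = median_u64(samples, N_REPS);
    if (stdev_pct(samples, N_REPS, med) <= max_stdev)
      return med;               /* stable enough: keep the median as lat[x][y] */
    max_stdev += 1.0;           /* slightly relax the threshold so retries terminate */
  }
}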

7.3.2 Latency Normalization

As the heatmap of Figure 7.4, step 1, shows, the relations between hardware contexts are rather clear. The white diagonal represents the individual contexts, the two light-gray diagonals represent the hardware contexts of the same core, and the gray and dark-gray rectangles are the intra- and cross-socket latencies, respectively. To extract these relations, MCTOP-ALG calculates the cumulative distribution function (CDF) of the latency-table values (Figure 7.4, step 2a). The value clusters of the CDF represent the aforementioned relations. MCTOP-ALG detects these clusters and, for each cluster, generates a triplet with the minimum, median, and maximum latencies.

MCTOP-ALG uses the latency clusters for normalizing the latency table (Figure 7.4, step 2b). Each value of the table is replaced with the median value of the cluster that it belongs to. Normalization is required for detecting relations among hardware contexts.

7.3.3 Component Creation

MCTOP-ALG uses the normalized latency table to extract the relations among hardware contexts for each latency level within the socket (e.g., the first three values of Figure 7.4, step 2a) and assigns them to components. We recursively define a component C_l of level l > 0 as a set of components of level l-1 such that any two components in C_l communicate with the latency of level l and have exactly the same communication latencies with all the other components of level l-1. At level 0, with latency 0, every hardware context belongs to its own C_0 component.

Using this definition of components, MCTOP-ALG recursively groups hardware contexts together by performing classification and reduction of the latency table. For example, in Figure 7.4, step 3, the first step is to group the hardware contexts (components C_0) of each core with each other and reduce the table by keeping only the C_1 components (i.e., the cores). Then, the cores of each socket are reduced to C_2 components, and we end up with only the cross-socket latency table.

The outcome is a set of components for each latency level. This assignment of hardware contexts into components describes the relations between contexts.

7.3.4 Topology Creation

In this last step, MCTOP-ALG assigns “roles” to the components of different levels according to the MCTOP abstraction of Section 7.2. The result is an abstraction of the actual topology of the processor, as shown in Figure 7.4, step 4 (the memory measurements are described in Section 7.4). MCTOP-ALG first detects (via measurements—see below) whether the target multi-core includes SMT. If the multi-core has SMT, the components of the first non-zero latency group represent the physical cores of the processor. Similarly, MCTOP-ALG classifies as the socket level the level with as many components as the number of nodes. Every relation higher than sockets represents cross-socket connectivity.

7.3.5 Practical Considerations

Removing the Effects of DVFS. Dynamic Voltage and Frequency Scaling (DVFS) is a common hardware technique for reducing power consumption, where underutilized cores can execute at various voltage/frequency settings. Our implementation of MCTOP-ALG explicitly waits for the frequency of both cores to reach the maximum before proceeding to the lock-step execution. To achieve this, MCTOP-ALG repeatedly measures the execution time of a long-duration empty loop, until the execution time stabilizes.
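A minimal sketch of this warm-up, in C: the spin-loop length and the ~2% stability tolerance are assumptions, and __rdtsc is x86-specific (on modern x86, the TSC runs at a constant rate, so the timed spin loop shortens as the core frequency ramps up and stabilizes once the maximum frequency is reached).

#include <stdint.h>
#include <x86intrin.h>               /* __rdtsc(), x86-specific */

static uint64_t time_spin_loop(uint64_t iterations)
{
  uint64_t start = __rdtsc();
  for (volatile uint64_t i = 0; i < iterations; i++)
    ;                                /* empty loop; volatile prevents elision */
  return __rdtsc() - start;
}

void wait_until_max_frequency(void)
{
  uint64_t prev = time_spin_loop(1 << 22);
  while (1) {
    uint64_t cur = time_spin_loop(1 << 22);
    uint64_t diff = cur > prev ? cur - prev : prev - cur;
    if (diff < prev / 50)            /* within ~2%: execution time has stabilized */
      return;
    prev = cur;
  }
}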

Detecting SMT. libmctop detects if the processor has simultaneous multi-threading (SMT) using the same idea as for removing the effects of DVFS. A thread first executes a spin loop solo on a core and measures the time of this execution. Then, two threads execute the same loop on the two contexts with minimum latency. If these are the hardware contexts of the same core due to SMT, then the duration of the spin loop will increase.

Performance and Failures. MCTOP-ALG is designed to work solo on the target processor, as it relies on the accuracy of latency measurements. In a few experiments with libmctop on a utilized machine, we observe that MCTOP-ALG is still often able to detect the topology, mostly depending on whether the threads of the other executing applications are pinned on specific cores. Nevertheless, the additional measurements of libmctop in Section 7.4 always require an idle processor.

Even on an idle processor, MCTOP-ALG might fail to infer the topology (due to a few spurious measurements). These spurious measurements are mostly because of (i) effects of DVFS, and (ii) effects of SMT, where other OS/system processes might execute on the other hardware contexts of the core that is being used by MCTOP-ALG. When MCTOP-ALG is not able to infer the topology, an error message is printed and the user must retry the execution, possibly with different settings (e.g., more repetitions). MCTOP-ALG relies on the uniform hierarchy of topologies to detect failures: If, during step 3 (component creation), the reduction of the table cannot be performed, some of the measurements cannot be clustered properly.

In terms of performance, MCTOP-ALG needs ∼3 seconds to infer the topology of our smallest platform (Ivy), while it takes 96 seconds to infer the topology of Westmere (160 contexts with DVFS enabled). MCTOP-ALG is more stable and faster when DVFS is disabled.

Dynamic Changes of Multi-Cores. libmctop does not currently support the detection of dynamic changes to the topology of a multi-core. If, after the execution of MCTOP-ALG, SMT is disabled through the BIOS, or a hardware context is disabled via the OS, MCTOP-ALG must be re-executed in order to detect the new configuration.

7.4 Enriching MCTOP Topologies

The basic topology representation, which is created with MCTOP-ALG, includes the communication latencies of the processor by design (see Section 7.3). These latencies are sufficient for defining locality-oriented performance policies, such as “find the socket that is the closest to socket x.” Although locality optimizations are very important on NUMA multi-cores, we argue that with access to further low-level information in the MCTOP abstraction, we can implement a broader set of performance policies (see Section 7.5 for examples).

Therefore, MCTOP includes (i) the basic topology representation that is created by MCTOP-ALG, and (ii) a set of additional multi-core measurements. We design libmctop to be extensible, so that developers can write plugins to enrich MCTOP. Essentially, plugins build on top of libmctop and collect measurements. We have implemented four essential plugins that measure memory latencies and bandwidths, cache-related information, and power-related information (only available on modern Intel processors).

Memory Latency. To estimate the memory latency, we design a plugin that uses a single thread T. T allocates a large chunk of memory on the target memory node and creates a linked list with randomly linked cache-line-sized nodes. T then measures the time it takes to traverse the list. Because the size of the list is large and the access patterns are random, most accesses are served by the main memory. T performs this measurement for all socket-node combinations on top of MCTOP.
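The core of this plugin is a dependent-load pointer chase, sketched below in C. The NUMA-aware allocation (e.g., with numa_alloc_onnode from libnuma) and the random shuffling of the chain are elided; the node layout and helper name are assumptions.

#include <stdint.h>
#include <x86intrin.h>                       /* __rdtsc(), x86-specific */

typedef struct node {
  struct node *next;
  char pad[64 - sizeof(struct node *)];      /* pad to one cache line */
} node_t;

/* Follow n_loads dependent loads through a randomly linked chain and return
   the average latency per load, in cycles. */
uint64_t chase_avg_latency(node_t *head, uint64_t n_loads)
{
  uint64_t start = __rdtsc();
  node_t *cur = head;
  for (uint64_t i = 0; i < n_loads; i++)
    cur = cur->next;                         /* each load depends on the previous one */
  uint64_t cycles = __rdtsc() - start;
  __asm__ volatile("" :: "r"(cur));          /* keep 'cur' alive; prevents dead-code elimination */
  return cycles / n_loads;
}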

Memory Bandwidth. A second plugin estimates the bandwidth from sockets to nodes, using as many threads as the number of hardware contexts per socket. These threads allocate and access large chunks of memory sequentially, thus they are able to saturate the memory bandwidth. The maximum bandwidth is estimated as the sum of the per-thread bandwidths. Additionally, this plugin estimates the single-thread read bandwidth, as well as the single-thread and maximum write bandwidths.

Cache Latency and Size. The cache plugin estimates both the size and the latency of the various levels in the cache hierarchy. To estimate latency, the plugin uses the same technique as the memory latency measurements. The cache size estimation is based on those latency measurements (i.e., it estimates the size of each level by detecting the data size that causes latency to increase). Additionally, the plugin loads and includes the cache sizes from the operating system.
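The size-estimation idea can be sketched as follows (in C): time a pointer chase for increasing working-set sizes and report a cache-level boundary whenever the per-load latency jumps. The 4 KB–256 MB range and the 1.5x jump threshold are assumptions; chase_avg_latency_for_size is an assumed helper that builds a random cyclic chain of the given size and returns the average per-load latency in cycles.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

uint64_t chase_avg_latency_for_size(size_t bytes);   /* assumed helper */

void estimate_cache_level_boundaries(void)
{
  uint64_t prev_lat = 0;
  for (size_t size = 4 * 1024; size <= 256u * 1024 * 1024; size *= 2) {
    uint64_t lat = chase_avg_latency_for_size(size);
    if (prev_lat != 0 && lat > prev_lat + prev_lat / 2)     /* >1.5x jump */
      printf("latency jump at ~%zu KB: likely a cache-level boundary\n", size / 1024);
    prev_lat = lat;
  }
}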

Power Consumption. The latest Intel processors include Intel’s running average power limit (RAPL) [111] interface for accurately measuring the power consumption of the cores, the package, and the DRAM. We design a libmctop plugin that uses RAPL to gather power measurements that indicate the breakdown of power to hardware contexts. In order to estimate the maximum power consumption, we use the same memory-intensive workload that we use for bandwidth measurements. We measure and include in MCTOP measurements such as: idle processor power, full power (all hardware contexts are active), power of the first hardware context, and power of the second context of one core.

7.5 Portable Optimizations with MCTOP

Optimizing a concurrent system for the underlying hardware hinders the portability to other processors. In certain cases, this lack of portability is inevitable. For example, using Intel’s transactional memory [111] results in software that can only execute on specific processor models. However, for more traditional topology-oriented optimizations, such as locality or bandwidth optimizations, we can achieve portable optimizations (i.e., fine-tuning the system in a portable manner) on top of MCTOP. We can do so because these traditional notions are accurately defined on MCTOP. For instance, “use the cores that are the closest to core x,” is a policy that can be easily, accurately, and portably defined on MCTOP. Consequently, we can define high-level performance policies on top of MCTOP that leverage, but at the same time abstract the low-level details of the topology of multi-cores.

Essentially, MCTOP provides a query engine for the topology of multi-cores. Of course, software should not rely on any assumptions regarding the underlying multi-core, but rather build on top of libmctop’s interface to access information in a portable manner. For instance, an algorithm that explicitly allocates memory on nodes 0 and 1 will not work on a single-node processor. Instead, the developer can use mctop_get_num_nodes to provision for the available resources of any multi-core.

In what follows, we highlight several examples of portable optimizations on top of MCTOP with policies. We dedicate Section 7.6 to a detailed description of thread placement policies with MCTOP. In Section 7.7, we implement and evaluate several of these examples.


Optimal Work Stealing. Work stealing [22] is a commonly used technique in parallel runtimes that aims to minimize the imbalance of work across worker threads. In brief, worker threads have access to work queues from which they dequeue and execute chunks of work. In order to avoid imbalance, workers with no work must steal work from other worker threads. Ideally, work stealing must be performed in a way that (i) reduces the overhead of accessing the non-local queue, and (ii) optimizes the locality/bandwidth of the stealer to the work chunk. → MCTOP Policies. Assume a parallel runtime with per-thread work queues. Then, on top of MCTOP, we can easily implement the following work-stealing policy: If the local work queue is empty, steal from the queues of the hardware contexts that are the closest in terms of latency. If unsuccessful, continue with the contexts at the next closest distance. Continue this process until either work is found, or there is no work to steal. This policy defines work stealing with optimal locality.
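Assuming a runtime with per-hardware-context work queues, a minimal sketch of this policy could look as follows (in C). work_t, work_queue_t, and try_steal are assumed runtime types and helpers; only mctop_get_latency comes from the MCTOP interface, and its exact signature is assumed. The steal order is computed once per thread and then used on every steal attempt.

#include <stdint.h>
#include <stdlib.h>

typedef struct work work_t;                      /* assumed runtime types */
typedef struct work_queue work_queue_t;
work_t  *try_steal(work_queue_t *q);             /* assumed: non-blocking steal attempt */
uint64_t mctop_get_latency(int id0, int id1);    /* from the MCTOP interface (signature assumed) */

typedef struct { int victim_id; uint64_t latency; } victim_t;

static int cmp_by_latency(const void *a, const void *b)
{
  const victim_t *va = a, *vb = b;
  return (va->latency > vb->latency) - (va->latency < vb->latency);
}

/* Order all other hardware contexts by increasing latency from my_id (done once). */
void build_steal_order(int my_id, int n_hwcs, victim_t *order)
{
  int k = 0;
  for (int id = 0; id < n_hwcs; id++)
    if (id != my_id)
      order[k++] = (victim_t){ id, mctop_get_latency(my_id, id) };
  qsort(order, n_hwcs - 1, sizeof(victim_t), cmp_by_latency);
}

/* Try the closest victims first; NULL means there is no work to steal. */
work_t *steal_closest_first(work_queue_t **queues, const victim_t *order, int n_victims)
{
  for (int i = 0; i < n_victims; i++) {
    work_t *w = try_steal(queues[order[i].victim_id]);
    if (w != NULL)
      return w;
  }
  return NULL;
}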

Optimal Reduction Trees. Many parallel algorithms and frameworks rely on the fork-join computation model. The most notable example is of course the MapReduce paradigm [52]. In the fork-join model, computation is split in chunks that are processed in parallel by multiple processes. The local results of each process are then reduced (joined) to get the final result. Intuitively, on a multi-core processor, when these local results represent a sizable amount of data, the thread and data placement of the reduction process can have a large effect on the performance of the computation. → MCTOP Policies. We describe policies for cross-socket reduction trees that we believe are broadly applicable. The following steps assume that multiple threads can operate concurrently on reducing the same chunk of data. Within sockets, all threads of a socket cooperate on reducing the same chunks. Across sockets, we build a binary reduction tree such that (i) the final destination socket/node is the one that requires the final data, and (ii) at each level of the tree, we choose the sockets to cooperate so that we maximize the bandwidth to data. We use these policies in a sorting algorithm in Section 7.7.

Estimate Power Consumption. Power consumption and energy efficiency have gained attention in the past few years [25, 32]. MCTOP’s power-related measurements on modern Intel processors can be used to estimate the power consumption of a specific execution (i.e., a fixed thread placement) before the actual execution. → MCTOP Policies. Being able to estimate the power consumption of an execution gives us the opportunity to trade performance off for lower power. For example, a low-power policy (e.g., Section 7.6) prioritizes the use of hardware contexts that minimize power consumption.

Educated Backoffs. Waiting for a short period of time before retrying an operation, namely backing off, is an essential technique for alleviating congestion in software. For example, backoffs are used in lock implementations [3]. Estimating the correct amount of time to back off is difficult: Small backoffs miss a window for further optimization, while large backoffs could hurt performance by inducing idle periods of no work. → MCTOP Policies. We define the granularity of backing off on top of MCTOP based on the intuition that “messages” on multi-cores travel as fast as the coherence protocol allows. Accordingly, we set the backoff quantum to be the maximum (or minimum, depending on the workload) latency between any threads that are involved in the execution. We use this policy in lock algorithms in Section 7.7.
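A small sketch of this policy in C: the backoff quantum is derived from MCTOP’s communication latencies (here, the maximum latency among the participating hardware contexts), and a failed attempt backs off for a bounded multiple of that quantum. mctop_get_latency comes from the MCTOP interface (signature assumed); the cap of 16 quanta and the use of _mm_pause are assumptions of this sketch.

#include <stdint.h>
#include <x86intrin.h>                          /* __rdtsc(), _mm_pause(), x86-specific */

uint64_t mctop_get_latency(int id0, int id1);   /* from the MCTOP interface (signature assumed) */

/* Backoff quantum: the maximum pairwise latency among the participating contexts. */
uint64_t backoff_quantum(const int *hwc_ids, int n)
{
  uint64_t max_lat = 0;
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++) {
      uint64_t l = mctop_get_latency(hwc_ids[i], hwc_ids[j]);
      if (l > max_lat)
        max_lat = l;
    }
  return max_lat;                               /* in cycles */
}

/* Back off for a bounded multiple of the quantum after a failed attempt. */
void backoff(uint64_t quantum, unsigned attempt)
{
  unsigned mult = attempt < 16 ? attempt : 16;
  uint64_t until = __rdtsc() + (uint64_t)mult * quantum;
  while (__rdtsc() < until)
    _mm_pause();
}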

7.6 Thread Placement with MCTOP

A prominent way of using libmctop’s MCTOP is by developing higher-level libraries that rely on performance policies. As we describe in Section 7.5, libraries such as locality-aware work queues and reduction trees can deliver practically optimal performance without exposing the details of the underlying hardware.

Another natural construction on top of MCTOP is a library that abstracts the placement of threads to hardware contexts given some placement policy. For instance, we might need to place threads so that they are as close as possible to one specific node where some data reside. We can easily implement such functionality on top of libmctop because of the topology abstraction and the low-level measurements that MCTOP provides.

We develop MCTOP-PLACE, a thread placement library with 12 policies and runtime support for changing policies. MCTOP-PLACE is portable: It can completely abstract the underlying multi-core topology for any platform that libmctop supports. MCTOP-PLACE comprises two main components: (i) the creation of individual thread placements for given configurations (i.e., policies and number of threads), and (ii) a placements pool that supports runtime modification of configurations.

MCTOP-PLACE. libmctop thread placement (MCTOP-PLACE) creates a mapping of threads to hardware contexts given a placement policy. Optionally, the user can provide the number of threads and the number of sockets of the placement. The basic interface of MCTOP-PLACE includes functions for: (i) initializing a new MCTOP-PLACE object with a given policy, (ii) pinning a thread to the next available context of an MCTOP-PLACE object (if any), and (iii) unpinning a thread from its context and returning the context to MCTOP-PLACE.
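The following usage sketch illustrates this interface; the function names and signatures are assumptions for illustration, not libmctop's actual API.

#include <stddef.h>

typedef struct mctop_place mctop_place_t;                        /* opaque handle */
extern mctop_place_t *mctop_place_create(int policy, int n_threads); /* (i)   */
extern int  mctop_place_pin(mctop_place_t *p);     /* (ii): returns <0 if no context */
extern void mctop_place_unpin(mctop_place_t *p);   /* (iii): hand the context back */

void run_pinned(mctop_place_t *placement, void (*work)(void))
{
    if (mctop_place_pin(placement) < 0)   /* no available hardware context */
        return;
    work();                               /* execute while pinned */
    mctop_place_unpin(placement);         /* return the context to MCTOP-PLACE */
}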

We implement 12 placement policies (Table 7.2). In non-SMT multi-cores, the CON_HWC, CON_CORE_HWC, and CON_CORE policies are equivalent. We believe that these policies cover the most prominent placement choices a programmer can make, such as compacting or spreading threads as much as possible. Still, if none of these policies covers a required thread placement, implementing a new policy is straightforward, since the basic data structures for doing so are already in place.

Apart from the mapping of threads to hardware contexts, MCTOP-PLACE provides a plethora of additional information and function calls to leverage libmctop’s topology. Figure 7.5 shows an example output of mctop_place_print on our Ivy platform (see Section 2.1). MCTOP-PLACE calculates and exports details such as the number of cores that will be used, the bandwidth proportions of each socket according to the allocation policy, and an estimation of the maximum power consumption with and without DRAM.


Name          Short description
NONE          Threads are not pinned to hardware contexts.
SEQUENTIAL    Use the sequential OS numbering.
CON_HWC       Choose the socket with maximum local memory bandwidth. Starting from this socket, place threads as compactly as possible on all hardware contexts of this socket and then continue to the next-best connected neighboring socket (i.e., with minimum latency and maximum bandwidth).
CON_CORE_HWC  Same goal as CON_HWC. Instead of using all hardware contexts, use all unique cores of the socket before using the second hardware context. Still, fill the first socket before using the next one.
CON_CORE      Same goal as CON_HWC. Instead of using all hardware contexts, use all unique cores of all used sockets. Once all cores are used, use the second, third, etc. hardware context of each core.
BALANCE       Balanced version of the CON_* placements. Instead of first filling up a socket before using the next one, balance threads across sockets.
RR            Place threads round robin to sockets. Prioritizes the sockets with maximum bandwidth to their local memory. Uses unique cores first (RR_CORE) or all hardware contexts of the core (RR_HWC).
POWER         Place threads so that the estimated maximum power consumption is minimized (only for Intel processors that support RAPL [111]).
RR_SCALE      Same as RR_CORE, but also re-adjusts the number of threads in order to provide the number of threads per socket that is enough to saturate the memory bandwidth to their local node.

Table 7.2 – The set of policies offered by MCTOP-PLACE.

Additionally, once a thread is pinned, it has access to information such as its local node and its hardware context and core IDs within the socket. In Section 7.7 we use MCTOP-PLACE in various examples.

## MCTOP Placement   : MCTOP_PLACE_CON_HWC
#  Cores             : 15
#  HW contexts (30)  : 0 20 1 21 2 22 3 ...
#  Sockets (2)       : 20000 20001
#  HW ctx / socket   : 20 10
#  Cores / socket    : 10 5
#  BW proportions    : 0.655 0.345
#  Max pow no DRAM   : 66.7 43.4 = 110.1 Watt
#  Max pow with DRAM : 111.9 88.7 = 200.6 Watt
#  Max latency       : 308 cycles
#  Min bandwidth     : 24.28 GB/s

Figure 7.5 – Example output of MCTOP-PLACE.


MCTOP-PLACE Pool. MCTOP-PLACE is sufficient to place threads according to a single placement policy. However, software systems might require different placement policies in different execution phases. To support this functionality, we build an MCTOP-PLACE pool object that offers runtime selection of placement policies. In Section 7.7 we show how we use MCTOP-PLACE’s pool to extend the thread placement capabilities of OpenMP.

7.7 Examples of Portable Optimizations

We experimentally show how the performance policies on top of MCTOP achieve portable optimizations in software. Our goal for this section is to illustrate (i) the usefulness of the low-level measurements of MCTOP, (ii) the ability to optimize existing software using MCTOP, and (iii) the portable efficiency of the resulting software.

Experimental Setup. We execute our experiments on all five platforms described in Section 7.2.1. We perform 11 runs of each experiment and present the median performance. We do not show error bars for readability, as our experiments have small variance. Whenever there is variability across runs, we discuss it in the text. The duration of each of our lock experiments is 5 seconds.

7.7.1 Using Latencies to Optimize Locking

Traditional spinlock algorithms, such as ticket locks, resort to busy waiting when the lock is not free [10, 149]. While busy waiting, it can be beneficial to back off before re-accessing the shared memory location of the lock [3]. As discussed in Section 7.5, with MCTOP it is straightforward to make educated backoff decisions for such algorithms. We use MCTOP to optimize three lock algorithms: TAS, TTAS, and TICKET locks (see Table 2.1). We use as backoff quantum the maximum communication latency between any two threads involved in the execution. Different lock algorithms employ the backoff quantum in different ways. With ticket locks, we set the backoff to be proportional to the position of the thread in the “queue” [119, 149]. With TAS and TTAS, threads simply back off for one quantum before accessing the lock again.
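As a concrete illustration, the sketch below shows a ticket lock whose waiters back off proportionally to their distance from the lock holder, using an MCTOP-derived quantum; the lock itself is a textbook ticket lock [149], and backoff_cycles() stands for a pause-based waiting loop.

#include <stdatomic.h>
#include <stdint.h>

extern void backoff_cycles(uint64_t cycles);  /* e.g., pause invoked in a loop */

/* Initialize both fields to 0 before use. */
typedef struct { _Atomic uint32_t next, cur; } ticket_lock_t;

void ticket_lock(ticket_lock_t *l, uint64_t quantum)
{
    uint32_t mine = atomic_fetch_add(&l->next, 1);   /* take a ticket */
    for (;;) {
        uint32_t cur = atomic_load(&l->cur);
        if (cur == mine)
            return;                                  /* lock acquired */
        /* Back off proportionally to our position in the "queue". */
        backoff_cycles((uint64_t)(mine - cur) * quantum);
    }
}

void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add(&l->cur, 1);                    /* pass the lock on */
}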

Evaluation. Figure 7.6 shows the relative throughput of the three lock algorithms with and without our MCTOP-based backoff schemes. The experiment involves multiple threads competing for the same lock, performing 1000 cycles of work in the critical section, and then releasing the lock. Threads pause after each iteration to avoid long runs [153]. On both the x86 and the SPARC processors, we use the pause instruction for pausing [111, 169] as the baseline. On x86, we invoke pause in a loop to implement our backoff quantum.

[Figure 7.6: relative throughput (y-axis) as a function of the number of threads (x-axis) for the TAS, TTAS, and TICKET locks on Ivy, Opteron, Haswell, Westmere, and SPARC-T44.]

Figure 7.6 – Throughput of different lock algorithms using educated backoffs with MCTOP.

Backing off with the “correct” backoff granularity significantly improves performance: On average, we improve the performance of TAS, TTAS, and TICKET by 25%, 8%, and 39%, respectively. These performance gains are consistent across platforms, without requiring reconfiguration or re-compilation of the applications.

Conclusion. We show how MCTOP’s low-level information can be used to optimize the performance of locking algorithms. The optimization is portable across platforms, as libmctop’s interface provides us with the necessary latencies on each platform.

7.7.2 Using libmctop in Parallel Mergesort

We use libmctop to devise a very fast, portable mergesort algorithm. Our novelty lies in the way we perform NUMA-aware merging. The starting point for our algorithm is the parallel sort algorithm of the C++ standard library (gnu_parallel::sort) [200]. gnu_parallel::sort involves two main steps: (i) it breaks the target array into n chunks and sorts these chunks with the standard sequential quicksort algorithm (n is the number of available threads), and (ii) it iteratively performs parallel merging on the sorted chunks until the result is a single sorted array.

Our mergesort algorithm, namely mctop_sort, takes the same first step as gnu_parallel::sort. However, mctop_sort merges the sorted arrays using the optimal reduction tree presented in Section 7.5.

Using SMT Cleverly. Merging two sorted arrays using traditional comparison instructions is suboptimal: The aggressive out-of-order cores are not able to predict the direction of the merge branch (i.e., which of the two arrays will give the next element). Recent projects [40, 109] show how to use SIMD instructions for efficient merging. Using 128-bit instructions, we can create a bitonic merge network that merges 8 elements at a time.

However, SIMD registers are shared across the SMT hardware contexts of a core, hence we cannot simply let all contexts perform merging. We implement a variant of mctop_sort, namely mctop_sort_sse, that bypasses this limitation. Once the sequential sorting is over, we let the first hardware context of each core use SIMD for merging, while the remaining contexts perform traditional non-SIMD merging. To compensate for the faster merging with SIMD, threads using non-SIMD merging are assigned one-third of the data of SIMD threads.
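A small sketch of the resulting data split, assuming one SIMD thread per core (the first hardware context) and the 3:1 ratio described above:

#include <stddef.h>

/* Number of elements assigned to hardware context `ctx_in_core`
 * (0 = the SIMD thread) out of `smt` contexts sharing the core. */
size_t merge_share(size_t total, size_t ctx_in_core, size_t smt)
{
    /* Weights: 3 for the SIMD thread, 1 for each of the (smt - 1) others. */
    size_t weights = 3 + (smt - 1);
    size_t unit = total / weights;
    /* The SIMD thread takes its three units plus any rounding remainder. */
    return ctx_in_core == 0 ? total - (smt - 1) * unit : unit;
}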


[Figure 7.7: bar chart of execution time (y-axis: Time (s)) for gnu_parallel::sort, mctop_sort, and mctop_sort_sse, with the sequential part shown separately, on Ivy, Haswell, Opteron, Westmere, and SPARC-T44, using 16 cores and the full machine.]

Figure 7.7 – Sorting 1GB worth of integers on various platforms. The labels on top of each bar indicate the execution time relative to gnu_parallel::sort.

Evaluation. Figure 7.7 includes a comparison between mctop_sort and gnu_parallel::sort. With mctop_sort, threads are spread to sockets in order to benefit from the LLC of each socket (using an MCTOP-PLACE placement policy). The performance of our algorithm is stable across runs, since the placement of threads and data is deterministic. In contrast, we notice big variance for gnu_parallel::sort, depending on the thread placement of the OS scheduler.

mctop_sort is consistently faster than gnu_parallel::sort (we observe the same behavior on different data sizes, not shown in the graph), because it always chooses the optimal placement of threads. On average, mctop_sort is 19% faster than gnu_parallel::sort, which translates to 41% faster merging if we exclude the sequential part of the sorting, which is the same in both algorithms. mctop_sort_sse delivers similar performance to mctop_sort on average, but it can be up to 11% faster than mctop_sort (on Haswell).

Conclusion. Building optimal merging is straightforward on MCTOP. We leverage low-level details (e.g., bandwidth and latency between nodes) of multi-cores without the need for any platform-specific optimizations.

7.7.3 Using MCTOP-PLACE to Improve Metis

We use MCTOP-PLACE to optimize the Metis MapReduce library for multi-cores [28, 142]. Metis pins worker threads to hardware contexts sequentially. The reason for this choice is that offering multiple placement policies for every platform is cumbersome, given the diverse characteristics of different platforms. Thus, we build a new version of Metis by linking it with libmctop and using the different placement policies of MCTOP-PLACE (see Section 7.6). As a result, we can choose any of the high-level placement policies offered by MCTOP-PLACE at runtime.

Evaluation. We evaluate Metis in terms of performance and energy efficiency (wherever available), using a set of four representative workloads. We identify the needs of each application, and then use MCTOP-PLACE with the placement policy that best expresses these needs. We also select the best-performing number of threads for both versions of Metis. MCTOP-enabled Metis always uses fewer or an equal number of threads than default Metis.

[Figure 7.8: bar chart of relative execution time and energy on Ivy, Haswell, Opteron, Westmere, and SPARC-T44 for the four workloads and their selected policies: K-Means (CON_CORE_HWC), Mean (CON_HWC), Word Count (RR), and Matrix Mult (CON_CORE).]

Figure 7.8 – Relative execution time and energy efficiency of Metis with versus without MCTOP-PLACE. All workloads are optimized for performance. (*SPARC-T44 for Word Count uses a different MCTOP-PLACE placement.)

As Figure 7.8 reveals, different workloads have different placement needs. Thus, using the default sequential policy of Metis delivers suboptimal performance in all workloads on different platforms. Our version of Metis delivers 17% better performance across the five platforms, with 14% less energy on the two Intel processors. It is worth noting that in one workload, namely Word Count, our SPARC-T44 has different placement requirements than the other platforms, delivering the best performance with the cores of a single socket. Our performance analysis shows that Word Count has heavy memory allocation and synchronization that benefit from intra-socket locality. Finally, note that in this example we aim at performance, although we do achieve energy gains in some cases. In several Metis workloads, we can trade performance for energy using different thread placements (not shown in the graphs).

Conclusion. By modifying the Metis library, we show how a complex software system can easily take advantage of MCTOP in order to achieve portable optimizations. General-purpose frameworks, such as Metis, can get out-of-the-box benefits from using libmctop.

7.7.4 Using libmctop to Enrich OpenMP

The GNU OpenMP runtime [23, 180] does not pin threads to cores by default. However, OpenMP allows users to set the available places of parallel threads on the available hardware contexts, as well as coarse-grain strategies for assigning parallel threads to places (e.g., keep them close, or spread them as much as possible). OpenMP’s thread placement capabilities are: (i) offline—they are set through environment variables before the execution, (ii) inflexible—placements cannot be modified at runtime and are dependent on the number of threads used during initialization, (iii) not fully portable—in many cases placements must be defined differently across platforms to achieve the same effects, and (iv) not optimized—placements do not rely on latency or bandwidth numbers.


[Figure 7.9: bar chart of execution time relative to default OpenMP on Ivy, Haswell, Opteron, and Westmere for the workloads and selected policies: Communities (CON_CORE_HWC), Hop Distance (CON_CORE_HWC), PageRank (BALANCE), Potential Friends (CON_CORE_HWC), Rand Degr. Samp. (CON_CORE_HWC), and Combination (COMBINATION).]

Figure 7.9 – Relative execution time of MCTOP_MP compared to default OpenMP for various workloads.

We extend the thread placement capabilities of OpenMP (in gcc v4.9.3) using libmctop (and MCTOP-PLACE) in order to offer richer and higher-level placement policies. In detail, we plug MCTOP-PLACE into OpenMP and add a placement-policy selection function to the OpenMP interface. Doing so, we enable developers to (i) choose placement policies at runtime, (ii) change placement policies between parallel regions, and (iii) leverage the high-level semantics of the MCTOP-PLACE placement policies that generate portable thread bindings.
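The sketch below illustrates the intended usage; mctop_mp_set_policy() is a hypothetical name for the added function, and the policy constants are illustrative placeholders for MCTOP-PLACE's policies.

#include <omp.h>

extern void mctop_mp_set_policy(int policy);              /* hypothetical */
enum { MCTOP_PLACE_CON_CORE_HWC, MCTOP_PLACE_BALANCE };   /* illustrative values */

void pipeline(void)
{
    mctop_mp_set_policy(MCTOP_PLACE_CON_CORE_HWC);  /* compact, locality-friendly */
    #pragma omp parallel
    { /* locality-sensitive phase */ }

    mctop_mp_set_policy(MCTOP_PLACE_BALANCE);       /* spread across sockets */
    #pragma omp parallel
    { /* bandwidth-bound phase */ }
}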

Evaluation. We evaluate our extended OpenMP (MCTOP_MP) runtime against the vanilla OpenMP library on various graph-algorithm workloads produced by Green-Marl [108] (Figure 7.9).3 We use large datasets (e.g., 100M nodes with 800M edges).

We use MCTOP_MP to enable automatic thread placement policy selection, by running small parts of the workload using different policies and identifying the optimal policy for each parallel section.4 In contrast, such online decisions are not possible with OpenMP, since it does not offer the same high-level semantics and also cannot dynamically adjust the thread placement at runtime. Overall, our MCTOP_MP version of the algorithms is on average 22% faster across platforms and workloads.
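A simplified sketch of this selection step, reusing the hypothetical mctop_mp_set_policy() from the previous sketch and a placeholder run_sample() that executes a short part of the parallel section:

#include <omp.h>
#include <stddef.h>

extern void mctop_mp_set_policy(int policy);   /* hypothetical */
extern void run_sample(void);                  /* small part of the workload */

int pick_best_policy(const int *policies, size_t n)
{
    int best = policies[0];
    double best_t = 1e30;
    for (size_t i = 0; i < n; i++) {
        mctop_mp_set_policy(policies[i]);      /* try the candidate policy */
        double t0 = omp_get_wtime();
        run_sample();                          /* e.g., a few iterations */
        double t = omp_get_wtime() - t0;
        if (t < best_t) { best_t = t; best = policies[i]; }
    }
    return best;                               /* policy used for the full run */
}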

We further port MCTOP_MP’s thread placements to OpenMP, in order to estimate the amount of work required to reproduce these placements using the default OpenMP capabilities. We observe that in order to reproduce the exact same configurations (with a possibly different number of threads per platform) we had to design one policy per platform per workload with OpenMP. In terms of performance (not shown in the graphs), our automatic MCTOP_MP solution delivers very similar results to OpenMP with fixed placements.

Finally, we combine two kernels (PageRank and Potential Friends) into a single application, namely Combination. With OpenMP, trying to recreate MCTOP_MP’s placement proves impossible: We have to choose the optimal placement policy for either of the kernels, while the performance of the other suffers. This results in 9% lower performance for OpenMP.

3 The available implementation of Green-Marl [107] does not support SPARC.
4 Even if the configuration were selected manually, the developer would need to find which placement policy matches the characteristics of each algorithm and use this policy across platforms.


Conclusion. MCTOP_MP shows that it is straightforward to offer portable optimizations through libmctop in software libraries such as OpenMP. MCTOP_MP offers high-level placement policies and runtime support for policy selection and adaptation—characteristics that we believe are useful to OpenMP developers.

7.8 Conclusions

Optimizing concurrent software systems for multi-cores involves fine-tuning not only synchronization, but every aspect of concurrent programming (e.g., thread and memory placement). Inevitably, performing such optimization is typically platform-specific, hence non-portable.

In this chapter, we introduced libmctop, a library that enables developers to optimize their software on multi-cores in a portable manner. libmctop is based on the MCTOP topology representation that abstracts both the topology and important low-level performance information of the processor. We showed how developers can define high-level performance policies on top of MCTOP to achieve portable optimizations. We illustrated these high-level policies on various examples, including an extended OpenMP runtime with dynamic support for thread placement policy selection based on libmctop.


8 Concluding Remarks

In this dissertation, we studied how to minimize the effects of synchronization on the scalability of concurrent software. To this end, we performed two analyses of synchronization on modern hardware, one involving traditional performance metrics, such as throughput and latency, and another focusing on the energy efficiency of locks. The results of these analyses indicate that scalability of synchronization is mainly a property of the underlying hardware, meaning that hardware imposes certain limitations that software cannot bypass.

Nevertheless, we further showed that although these hardware limitations cannot be bypassed by software, they can be hidden in generic and portable ways. In detail, we introduced OPTIK, a design pattern for devising scalable concurrent data structures and illustrated the effectiveness of OPTIK by designing five new algorithms. We also showed that these novel OPTIK-based algorithms are more scalable than the state of the art. Additionally, we introduced MCTOP, a multi-core topology abstraction which enables developers to portably define high-level performance policies. These policies utilize low-level characteristics of multi-cores, such as latencies and bandwidth, without exposing them to the programmer, resulting in portable software optimizations. We illustrated several such policies, such as automatic lock backoffs, and the portability of these policies across five processors.

8.1 Implications

Hardware. Modern multi-core hardware is clearly not optimized for synchronization (neither in terms of performance nor in terms of energy efficiency). Software developers either employ busy-waiting techniques, which burn the cores at full speed, or rely on thread sleeping, which is implemented in software and is thus particularly slow.

Recent efforts, such as the introduction of transactional synchronization extensions (TSX) [113] in Intel processors, attempted to simplify concurrent programming while delivering reasonable performance. However, our brief experience with TSX and related research [194, 217] indicates that programming with TSX is not straightforward, mainly because fall-back mechanisms are required in case transactions repeatedly fail to commit.1

As we have shown in this thesis, hardware largely dictates the expected behavior of synchronization. Accordingly, based on the findings of this dissertation, we argue for several potential hardware improvements.

First, monitor/mwait must be exposed in user space.2 Using these instructions properly can bring several benefits: (i) the waiting thread does not saturate the core with busy waiting, hence the other hardware thread(s) of the core can execute unobstructed, (ii) waiting threads consume less power, as the underlying core can enter a low-power state, (iii) waiting threads do not flood the memory subsystem with requests, hence a lock release (i.e., a memory store) can propagate faster, and (iv) mwait offers a performance/energy trade-off, as programmers can hint to the processor how deep the sleep state used by mwait should be. Of course, monitor/mwait is not a panacea, because threads still keep hardware contexts occupied, causing significant problems, especially in oversubscribed configurations.

Second, the x86 pause instruction must be redesigned to offer energy efficiency and richer functionality. We argue that the duration of pause on Intel processors (i.e., a couple of cycles) is very low and does not bring any actual benefits. Ideally, the pause instruction should accept the pause duration as a parameter (similar to wrpause on recent SPARC processors [169]). Such functionality could bring two benefits: (i) developers can directly map backoff techniques to pause, and (ii) hardware can save energy proportionally to the amount of time that a thread requests to pause for.
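Until then, the loop below sketches the workaround mentioned in Section 7.7.1: approximating a parameterized pause on current x86 by issuing pause repeatedly for a requested number of cycles.

#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(), _mm_pause() */

/* Spin for approximately `cycles` cycles, issuing pause hints; each pause
 * lasts only a few cycles, hence the loop. (On recent SPARC processors,
 * wrpause accepts the duration directly [169].) */
static inline void backoff_cycles(uint64_t cycles)
{
    const uint64_t start = __rdtsc();
    while (__rdtsc() - start < cycles)
        _mm_pause();
}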

Furthermore, synchronization can benefit from other energy-related hardware advances, such as fast per-core DVFS transitions.3 If cores can transition between voltage-frequency (VF) settings within a few cycles, then any type of busy waiting (e.g., in locks or thread barriers) can be immediately optimized for lower power consumption. Of course, this type of power optimization should preferably be hidden behind more specialized instructions, such as monitor/mwait and pause, which we previously discussed.

Software. General-purpose software systems, such as MySQL and Memcached, require portability across different platforms. Inevitably, this portability comes at the price of both wasted time and energy for synchronization. As we showed in this thesis, busy waiting wastes power and does not play well when many threads (from one or many applications) execute on the same processor, while sleeping significantly degrades performance. Additionally, we clearly showed that different synchronization algorithms perform well under different configurations.

1 Actually, in our experience on Haswell (not described in this dissertation), there are cases where small transactions deterministically abort without any actual contention.
2 Recent AMD and Oracle processors already support user-space monitor/mwait.
3 As we describe in Chapter 2, DVFS stands for “dynamic voltage and frequency scaling.”


Consequently, there is a need for algorithmic adaptivity in synchronization. Synchronization schemes must be able to adapt to the current configuration in order to perform well under varying platforms and workloads. Of course, implementing adaptivity is not straightforward, for various reasons: (i) Which algorithms do we need to include? (ii) Which algorithms are suitable for a specific hardware platform? (iii) How frequently should we adapt? And (iv) how do we keep the overhead of adaptivity low?

Similarly, we optimized software using thread placement based on policies. These policies are deterministically defined on top of our MCTOP topology abstraction. For example, locality can be precisely defined on any platform, as the latencies among cores can be accurately measured. However, thread and data placement is a subset of a larger, more complicated problem, namely thread scheduling. In detail, if every software application optimizes its thread placement locally (e.g., using libmctop), two or more applications on the same processor would probably contend for the same resources, resulting in suboptimal performance. Accordingly, we argue that OS schedulers should offer functionality similar to libmctop, allowing applications to hint the scheduler about their needs. The OS scheduler can then leverage the global view of the system and low-level information to improve scheduling.

8.2 Future Research

As multi-core processors keep growing in terms of processing and memory capabilities, more scalable concurrent systems are necessary for leveraging the full potential of this hardware. Additionally, emerging memory technologies promise unprecedented software scalability. We would be very interested in taking the work of this dissertation one significant step further. Therefore, our future research directions revolve around “How can we build software systems that scale both in terms of performance and energy efficiency?” Below, we briefly describe three potential directions for answering this question.

Scaling Synchronization Further. Our experiences with optimizing systems (Chapters 4, 6 and 7) showed that (i) we can improve scalability with generic solutions (e.g., replacing locks), and (ii) we can significantly improve scalability with specialized solutions (e.g., our Metis optimizations). Accordingly, we would first like to further push the limits of locking. We are currently investigating how to design adaptive lock algorithms that adapt fast to the current contention levels and have very low overhead. In the future, we would also like to investigate adaptiveness to the specifics of the underlying hardware using libmctop (e.g., on large NUMA processors, under high contention, the adaptive lock should turn into a hierarchical lock). Additionally, we would like to further investigate how to design and apply specialized solutions (i.e., data structures, locks, and other patterns) for improving the scalability of system software. In this process, hardware transactional memory could help, as it allows for simple solutions to complex synchronization patterns which are not in the critical path (e.g., tree rebalancing).


Scaling Performance and Energy Efficiency Using Emerging Hardware. Emerging mem- ory technologies, such as non-volatile memories and near-memory computing, open up great opportunities for improving the scalability of software systems. To achieve this scalability, we must develop new techniques and models for taking advantage of their capabilities.

In detail, non-volatile memory (NVM) provides persistent memory with latencies in between non-volatile flash storage and traditional volatile DRAM (NVMs are expected to be approximately 10x slower than DRAM and 100x faster than flash memory). These characteristics of NVMs enable persistence without the performance costs induced by traditional techniques, such as write-ahead logging. Nevertheless, NVMs are still slower than DRAM, thus we cannot simply move all data to NVM. We intend to work on models for exposing the persistence of NVM to the programmer. Two concrete approaches are to design persistent software transactional memory systems and persistent concurrent data structures. These new models will still need to use DRAM for performance, but will have access to fast persistent NVM storage.

Recent advances in memory technologies have enabled the integration of low-power logic into conventional DRAM chips, reviving the old idea of near-memory processing (NMP) [91, 171, 175]. The logic inside the memory chips can leverage the abundant internal memory bandwidth and the proximity to the data to perform memory-intensive operations, minimizing energy consumption by avoiding slow and energy-hungry off-chip communication. An interesting direction for NMP would be to offload complex access patterns that are ill-suited for modern processors (e.g., traversing graphs and linked data structures) and that fully expose the memory latency. NMP thus raises the questions of (i) how to design data structures that achieve the best efficiency on NMP-based systems, and (ii) how to efficiently implement operations given the limited capabilities of programmable NMP logic.

Reducing Energy Consumption. Modern multi-core servers (from Intel and Oracle) include accurate energy measurements and voltage-frequency (VF) control. The granularity of these measurements and of the control is still coarse, but hardware trends lean towards more fine-grained control. These capabilities will enable fine-grained power-consumption monitoring and control. We would be interested in designing tools for accurate power profiling. We can then leverage these tools to improve the energy efficiency of software via, for example, thread scheduling. Additionally, as the latency of VF control reaches the nanosecond scale, we will be able to fine-tune energy efficiency by trading performance for lower power. For example, spinning behind a lock can be performed in the lowest-consumption state, while traversing a large linked data structure could deliver the maximum energy efficiency in an intermediate VF setting.

Bibliography

[1] José L. Abellán, Juan Fernández, and Manuel E. Acacio. GLocks: Efficient Support for Highly-Contended Locks In Many-Core CMPs. In IPDPS, pages 893–905. IEEE, 2011. [Page 3]

[2] Umut A. Acar, Arthur Charguéraud, and Mike Rainey. Scheduling Parallel Programs By Work Stealing with Private Deques. In PPOPP, pages 219–228. ACM, 2013. [Page 26]

[3] Anant Agarwal and Mathews Cherian. Adaptive Backoff Synchronization Techniques. In ISCA, pages 396–406. ACM, 1989. [Pages 20, 123,and126]

[4] Marcos Kawazoe Aguilera, Arif Merchant, Mehul A. Shah, Alistair C. Veitch, and Chris- tos T. Karamanolis. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. In SOSP, pages 159–174. ACM, 2007. [Pages 24 and 83]

[5] Brian Aker. libmemcached. http://libmemcached.org. Accessed: 2016-07-29. [Page 51]

[6] Dan Alistarh, Keren Censor-Hillel, and Nir Shavit. Are Lock-Free Concurrent Algorithms Practically Wait-Free? In STOC, pages 714–723. ACM, 2014. [Page 23]

[7] AMD. Software Optimization Guide for AMD Family 10h and 12h Processors. https://support.amd.com/TechDocs/40546.pdf. Accessed: 2016-07-29. [Page 35]

[8] Gene M. Amdahl. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. In AFIPS Spring Joint Computing Conference, pages 483–485. AFIPS / ACM / Thomson Book Company, 1967. [Page 2]

[9] Nikos Anastopoulos and Nectarios Koziris. Facilitating Efficient Synchronization of Asymmetric Threads On Hyper-Threaded Processors. In IPDPS, pages 1–8. IEEE, 2008. [Page 63]

[10] Thomas E. Anderson. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. IEEE Trans. Parallel Distrib. Syst., 1(1):6–16, 1990. [Pages 3, 13, 14, 20, 36, and 126]

[11] Maya Arbel and Hagit Attiya. Concurrent Updates with RCU: Search Tree as an Example. In PODC, pages 196–205. ACM, 2014. [Page 23]

137 Bibliography

[12] Maya Arbel and Adam Morrison. Predicate RCU: an RCU for Scalable Concurrent Updates. In PPOPP, pages 21–30. ACM, 2015. [Page 23]

[13] Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. Sharing Memory Robustly In Message- Passing Systems. In PODC, pages 363–375. ACM, 1990. [Pages 14 and 21]

[14] Hagit Attiya, Rachid Guerraoui, Danny Hendler, Petr Kuznetsov, Maged M. Michael, and Martin T. Vechev. Laws of Order: Expensive Synchronization in Concurrent Algorithms Cannot Be Eliminated. In POPL, pages 487–498. ACM, 2011. [Pages 10, 12,and19]

[15] Woongki Baek and Trishul M. Chilimbi. Green: A Framework for Supporting Energy- Conscious Programming Using Controlled Approximation. In PLDI, pages 198–209. ACM, 2010. [Page 25]

[16] Luiz André Barroso and Urs Hölzle. The Case for Energy-Proportional Computing. IEEE Computer, 40(12):33–37, 2007. [Pages 3 and 55]

[17] Andrew Baumann, Paul Barham, Pierre-Évariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. The Multikernel: A New OS Architecture for Scalable Multicore Systems. In SOSP, pages 29–44. ACM, 2009. [Pages 2, 3, 21, 24, 26, 37, and 111]

[18] Adam Belay, George Prekas, Ana Klimovic, Samuel Grossman, Christos Kozyrakis, and Edouard Bugnion. IX: A Protected Dataplane Operating System for High Throughput and Low Latency. In OSDI, pages 49–65. USENIX, 2014. [Page 26]

[19] Luca Benini, Mahmut Kandemir, and J Ramanujam. Compilers and Operating Systems for Low Power. Springer Science & Business Media, 2011. [Page 25]

[20] Mateusz Berezecki, Eitan Frachtenberg, Mike Paleczny, and Kenneth Steele. Power and Performance Evaluation of Memcached on the TilePro64 Architecture. Sustainable Computing: Informatics and Systems, Elsevier, 2(2):81–90, 2012. [Page 26]

[21] Arnd Bergmann. BKL: That’s All, Folks. https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ba8216cd90560bc402f52076f64d8546e8aefcb. Accessed: 2016-07-15. [Page 2]

[22] Robert D. Blumofe. Scheduling Multithreaded Computations By Work Stealing. In FOCS, pages 356–368. IEEE, 1994. [Page 123]

[23] OpenMP Architecture Review Board. OpenMP Application Program Interface, Version 4.0. http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf, 2013. Accessed: 2016-07-29. [Pages 26 and 129]

[24] Leonid B. Boguslavsky, Karim Harzallah, Alexander Y. Kreinin, Kenneth C. Sevcik, and Alexander Vainshtein. Optimal Strategies for Spinning and Blocking. J. Parallel Distrib. Comput., 21(2):246–254, 1994. [Pages 3 and 21]

138 Bibliography

[25] Shekhar Borkar. Design Challenges of Technology Scaling. IEEE Micro, 19(4):23–29, 1999. [Pages 9 and 123]

[26] Shekhar Borkar and Andrew A. Chien. The Future of Microprocessors. Commun. ACM, 54(5):67–77, 2011. [Page 1]

[27] Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, M. Frans Kaashoek, Robert Morris, Aleksey Pesterev, Lex Stein, Ming Wu, Yue-hua Dai, Yang Zhang, and Zheng Zhang. Corey: An Operating System for Many Cores. In OSDI, pages 43–57. USENIX, 2008. [Pages 2, 3, 24, 26,and111]

[28] Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. An Analysis of Linux Scalability to Many Cores. In OSDI, pages 1–16. USENIX, 2010. [Pages 2, 3, 25, 26, 31, 111,and128]

[29] Silas Boyd-Wickizer, M Frans Kaashoek, Robert Morris, and Nickolai Zeldovich. Non- scalable Locks Are Dangerous. In Proceedings of the Linux Symposium, pages 119–130, 2012. [Pages 2 and 21]

[30] Nathan Grasso Bronson, Jared Casper, Hassan Chafi, and Kunle Olukotun. A Practical Concurrent Binary Search Tree. In PPOPP, pages 257–268. ACM, 2010. [Pages 23, 81,and106]

[31] François Broquedis, Jérôme Clet-Ortega, Stephanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault, and Raymond Namyst. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. In PDP, pages 180–186. IEEE, 2010. [Page 26]

[32] David J. Brown and Charles Reams. Toward Energy-Efficient Computing. Commun. ACM, 53(3):50–58, 2010. [Page 123]

[33] Davidlohr Bueso. Scalability Techniques For Practical Synchronization Primitives. Com- mun. ACM, 58(1):66–74, 2015. [Pages 2 and 3]

[34] Irina Calciu, David Dice, Yossi Lev, Victor Luchangco, Virendra J. Marathe, and Nir Shavit. NUMA-Aware Reader-Writer Locks. In PPOPP, pages 157–166. ACM, 2013. [Page 26]

[35] Bryan Cantrill and Jeff Bonwick. Real-World Concurrency. Commun. ACM, 51(11):34–39, 2008. [Page 2]

[36] Calin Cascaval, Colin Blundell, Maged M. Michael, Harold W. Cain, Peng Wu, Stefanie Chiras, and Siddhartha Chatterjee. Software Transactional Memory: Why Is It Only a Research Toy? ACM Queue, 6(5):46–58, 2008. [Page 24]

[37] Milind Chabbi and John M. Mellor-Crummey. Contention-Conscious, Locality- Preserving Locks. In PPOPP, pages 1–14. ACM, 2016. [Page 26]

139 Bibliography

[38] Milind Chabbi, Michael W. Fagan, and John M. Mellor-Crummey. High Performance Locks for Multi-Level NUMA Systems. In PPOPP, pages 215–226. ACM, 2015. [Pages 20 and 26]

[39] Jeffrey S. Chase, Darrell C. Anderson, Prachi N. Thakar, Amin M. Vahdat, and Ronald P. Doyle. Managing Energy and Server Resources in Hosting Centers. In SOSP, pages 103–116. ACM, 2001. [Page 25]

[40] Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, William Macy, Mostafa Hagog, Yen-Kuang Chen, Akram Baransi, Sanjeev Kumar, and Pradeep Dubey. Efficient Imple- mentation of Sorting on Multi-Core SIMD CPU Architecture. PVLDB, 1(2):1313–1324, 2008. [Page 127]

[41] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An Empirical Study of Operating Systems Errors. In SOSP, pages 73–88. ACM, 2001. [Page 2]

[42] Austin T. Clements, M. Frans Kaashoek, and Nickolai Zeldovich. Scalable Address Spaces Using RCU Balanced Trees. In ASPLOS, pages 199–210. ACM, 2012. [Page 2]

[43] Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert Tappan Morris, and Eddie Kohler. The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors. In SOSP, pages 1–17. ACM, 2013. [Pages 2, 3,and25]

[44] Clem Cole and Russell Williams. Photoshop Scalability: Keeping it Simple. Commun. ACM, 53(10):32–38, 2010. [Page 2]

[45] Pat Conway, Nathan Kalyanasundharam, Gregg Donley, Kevin Lepak, and Bill Hughes. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor. IEEE Micro, 30(2):16–29, 2010. [Pages 12 and 35]

[46] Travis Craig. Building FIFO and Priority-Queuing Spin Locks From Atomic Swap. Tech- nical report, University of Washington, Seattle, 1993. [Pages 3, 13, 14, and 20]

[47] Luke Dalessandro, Michael F.Spear, and Michael L. Scott. NOrec: Streamlining STM By Abolishing Ownership Records. In PPOPP, pages 67–78. ACM, 2010. [Pages 24, 83, 88,and89]

[48] Tudor David and Rachid Guerraoui. Concurrent Search Data Structures Can Be Blocking and Practically Wait-Free. In SPAA, pages 337–348. ACM, 2016. [Pages 15 and 23]

[49] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask. In SOSP, pages 33–48. ACM, 2013. [Page 14]

[50] Tudor David, Rachid Guerraoui, Tong Che, and Vasileios Trigonakis. Designing ASCY- compliant Concurrent Search Data Structures. Technical report, EPFL, Lausanne, 2014. [Pages 82, 85,and106]

140 Bibliography

[51] Tudor David, Rachid Guerraoui, and Vasileios Trigonakis. Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures. In ASPLOS, pages 631–644. ACM, 2015. [Pages 23, 52, 82, 85, 86, 100, 101,and106]

[52] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In OSDI, pages 137–150. USENIX, 2004. [Pages 1 and 123]

[53] Robert H Dennard, Fritz H Gaensslen, V Leo Rideout, Ernest Bassous, and Andre R LeBlanc. Design of Ion-Implanted MOSFET’s With Very Small Physical Dimensions. IEEE Solid-State Circuits, 9(5):256–268, 1974. [Page 1]

[54] David Dice and Alex Garthwaite. Mostly Lock-Free Malloc. In MSP/ISMM, pages 269–280. ACM, 2002. [Page 2]

[55] David Dice, Ori Shalev, and Nir Shavit. Transactional Locking II. In DISC, pages 194–208. Springer, 2006. [Pages 24, 83,and88]

[56] David Dice, Virendra J. Marathe, and Nir Shavit. Lock Cohorting: A General Technique for Designing NUMA Locks. In PPOPP, pages 247–256. ACM, 2012. [Pages 3, 13, 14, 20, 26, and 37]

[57] Dana Drachsler, Martin T. Vechev, and Eran Yahav. Practical Concurrent Binary Search Trees Via Logical Ordering. In PPOPP, pages 343–356. ACM, 2014. [Page 23]

[58] Aleksandar Dragojevic and Tim Harris. STM in the Small: Trading Generality for Perfor- mance in Software Transactional Memory. In EuroSys, pages 1–14. ACM, 2012. [Page 24]

[59] Aleksandar Dragojevic, Dushyanth Narayanan, Miguel Castro, and Orion Hodson. FaRM: Fast Remote Memory. In NSDI, pages 401–414. USENIX, 2014. [Pages 24 and 83]

[60] Faith Ellen, Panagiota Fatourou, Eric Ruppert, and Franck Van Breugel. Non-Blocking Binary Search Trees. In PODC, pages 131–140. ACM, 2010. [Page 23]

[61] Hadi Esmaeilzadeh, Emily R. Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark Silicon and the End of Multicore Scaling. In ISCA, pages 365–376. ACM, 2011. [Pages 3 and 55]

[62] Hadi Esmaeilzadeh, Adrian Sampson, Luis Ceze, and Doug Burger. Architecture Support for Disciplined Approximate Programming. In ASPLOS, pages 301–312. ACM, 2012. [Page 25]

[63] Facebook. Facebook LinkBench Benchmark. https://github.com/facebook/linkbench. Accessed: 2016-07-15. [Page 74]

[64] Facebook. RocksDB. http://rocksdb.org. Accessed: 2016-07-29. [Pages 13, 25,and74]

141 Bibliography

[65] Facebook. RocksDB In-memory Workload Performance Benchmarks. https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks. Accessed: 2016-07-29. [Page 74]

[66] Babak Falsafi, Rachid Guerraoui, Javier Picorel, and Vasileios Trigonakis. Unlocking Energy. In ATC, pages 393–406. USENIX, 2016. [Page 14]

[67] Bin Fan, David G. Andersen, and Michael Kaminsky. MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing. In NSDI, pages 371–384. USENIX, 2013. [Pages 25 and 86]

[68] Panagiota Fatourou and Nikolaos D. Kallimanis. Revisiting the Combining Synchroniza- tion Technique. In PPOPP, pages 257–266. ACM, 2012. [Pages 22 and 53]

[69] Pascal Felber, Christof Fetzer, and Torvald Riegel. Dynamic Performance Tuning of Word-Based Software Transactional Memory. In PPOPP, pages 237–246. ACM, 2008. [Pages 24 and 88]

[70] Brad Fitzpatrick. Memcached. http://www.memcached.org. Accessed: 2016-07-29. [Pages 13, 25, 51,and74]

[71] Marc Fleischmann. LongRun Power Management. Technical report, White Paper of Transmeta Corporation, 2001. [Page 25]

[72] The Linux Foundation. Linux Kernel MCS lock. http://lxr.free-electrons.com/source/kernel/locking/mcs_spinlock.h. Accessed: 2016-07-29. [Page 13]

[73] The Linux Foundation. Linux Kernel x86 spinlock. http://lxr.free-electrons.com/source/arch/x86/include/asm/spinlock.h. Accessed: 2016-07-29. [Page 13]

[74] Hubertus Franke, Rusty Russell, and Matthew Kirkwood. Fuss, and Furwocks: Fast Userlevel Locking in Linux. In AUUG, pages 85–98, 2002. [Page 21]

[75] Keir Fraser. Practical Lock-Freedom. PhD thesis, University of Cambridge, 2004. [Pages 23, 104,and105]

[76] Benjamin Gamsa, Orran Krieger, Jonathan Appavoo, and Michael Stumm. Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System. In OSDI, pages 87–100. USENIX, 1999. [Pages 3, 24, and 26]

[77] Jana Giceva, Gustavo Alonso, Timothy Roscoe, and Tim Harris. Deployment of Query Plans on Multicores. PVLDB, 8(3):233–244, 2014. [Page 26]

[78] Lokesh Gidra, Gaël Thomas, Julien Sopena, and Marc Shapiro. A Study of the Scalability of Stop-The-World Garbage Collectors On Multicores. In ASPLOS, pages 229–240. ACM, 2013. [Pages 26 and 111]

142 Bibliography

[79] Lokesh Gidra, Gaël Thomas, Julien Sopena, Marc Shapiro, and Nhan Nguyen. NumaGiC: A Garbage Collector for Big Data on Big NUMA Machines. In ASPLOS, pages 661–673. ACM, 2015. [Page 26]

[80] Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. Scaling Concurrent Log-Structured Data Stores. In EuroSys, pages 1–14. ACM, 2015. [Page 25]

[81] James R. Goodman, Mary K. Vernon, and Philip J. Woest. Efficient Synchronization Primitives for Large-scale Cache-coherent Multiprocessors. In ASPLOS, pages 64–75. ACM, 1989. [Page 20]

[82] Google. LevelDB. http://leveldb.org. Accessed: 2016-07-29. [Page 25]

[83] Vincent Gramoli. More Than You Ever Wanted to Know About Synchronization: Syn- chrobench, Measuring the Impact of the Synchronization on Concurrent Algorithms. In PPOPP, pages 1–10. ACM, 2015. [Page 23]

[84] Vincent Gramoli, Rachid Guerraoui, and Vasileios Trigonakis. TM2C: A Software Trans- actional Memory for Many-Cores. In EuroSys, pages 351–364. ACM, 2012. [Page 22]

[85] Vincent Gramoli, Petr Kuznetsov, Srivatsan Ravi, and Di Shang. Brief Announcement: A Concurrency-Optimal List-Based Set. In DISC, 2015. [Page 23]

[86] Graphviz. Graphviz - Graph Visualization Software. http://www.graphviz.org. [Page 115]

[87] The Open Group. Pthread Mutex Lock. http://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_mutex_lock.html, 1997. Accessed: 2016-07-29. [Page 14]

[88] Rachid Guerraoui and Michal Kapalka. On the Correctness of Transactional Memory. In PPOPP, pages 175–184. ACM, 2008. [Page 89]

[89] Hugo Guiroux, Renaud Lachaize, and Vivien Quéma. Multicore Locks: The Case Is Not Closed Yet. In ATC, pages 649–662. USENIX, 2016. [Page 20]

[90] Daniel Hackenberg, Daniel Molka, and Wolfgang E. Nagel. Comparing Cache Architec- tures and Coherency Protocols on x86-64 Multicore SMP Systems. In MICRO, pages 413–422. ACM, 2009. [Page 19]

[91] Mary W. Hall, Peter M. Kogge, Jefferey G. Koller, Pedro C. Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John J. Granacki, Jay B. Brockman, Apoorv Srivastava, William C. Athas, Vincent W. Freeh, Jaewook Shin, and Joonseok Park. Mapping Irregular Appli- cations to DIVA, a PIM-Based Data-Intensive Architecture. In SC, page 57. ACM, 1999. [Page 136]

[92] Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Toward Dark Silicon in Servers. IEEE Micro, 31(4):6–15, 2011. [Pages 3 and 55]

143 Bibliography

[93] Tim Harris. A Pragmatic Implementation of Non-Blocking Linked-Lists. In DISC, pages 300–314. Springer, 2001. [Pages 23, 81, 82, 93,and101]

[94] Tim Harris and Stefan Kaestle. Callisto-RTS: Fine-Grain Parallel Loops. In ATC, pages 45–56. USENIX, 2015. [Page 26]

[95] Thomas E. Hart, Paul E. McKenney, Angela Demke Brown, and Jonathan Walpole. Per- formance of Memory Reclamation for Lockless Synchronization. J. Parallel Distrib. Comput., 67(12):1270–1285, 2007. [Page 93]

[96] Bijun He, William N. Scherer III, and Michael L. Scott. Preemption Adaptivity in Time- Published Queue-Based Spin Locks. In HiPC, pages 7–18. Springer, 2005. [Pages 21 and 108]

[97] Steve Heller, Maurice Herlihy, Victor Luchangco, Mark Moir, William N. Scherer III, and Nir Shavit. A Lazy Concurrent List-Based Set Algorithm. In OPODIS, pages 3–16. Springer, 2005. [Pages 5, 23, 83, 98, and 101]

[98] Danny Hendler, Nir Shavit, and Lena Yerushalmi. A Scalable Lock-Free Stack Algorithm. In SPAA, pages 206–215. ACM, 2004. [Pages 23 and 109]

[99] Danny Hendler, Itai Incze, Nir Shavit, and Moran Tzafrir. Flat Combining and the Synchronization-Parallelism Tradeoff. In SPAA, pages 355–364. ACM, 2010. [Pages 22, 24, and 53]

[100] John L. Hennessy and David A. Patterson. Computer Architecture - A Quantitative Approach (5. ed.). Morgan Kaufmann, 2012. [Pages 9, 10, and 11]

[101] Maurice Herlihy. Wait-Free Synchronization. ACM Trans. Program. Lang. Syst., 13(1): 124–149, 1991. [Pages 15 and 33]

[102] Maurice Herlihy and J. Eliot B. Moss. Transactional Memory: Architectural Support for Lock-Free Data Structures. In ISCA, pages 289–300. ACM, 1993. [Pages 3 and 24]

[103] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming, Revised First Edition. Elsevier, 2012. [Pages 2, 13, 14, 15, 36,and98]

[104] Maurice Herlihy and Jeannette M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990. [Page 15]

[105] Maurice Herlihy, Yossi Lev, Victor Luchangco, and Nir Shavit. A Simple Optimistic Skiplist Algorithm. In SIROCCO, pages 124–138. Springer, 2007. [Pages 5, 81, 83, 84, 104, and 109]

[106] Mark D. Hill and Michael R. Marty. Amdahl’s Law in the Multicore Era. IEEE Computer, 41(7):33–38, 2008. [Page 2]

[107] Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun. Green-Marl. https://github.com/stanford-ppl/Green-Marl. Accessed: 2016-07-29. [Page 130]

144 Bibliography

[108] Sungpack Hong, Hassan Chafi, Eric Sedlar, and Kunle Olukotun. Green-Marl: A DSL for Easy and Efficient Graph Analysis. In ASPLOS, pages 349–362. ACM, 2012. [Pages 113 and 130]

[109] Hiroshi Inoue and Kenjiro Taura. SIMD- and Cache-Friendly Algorithm for Sorting an Array of Structures. PVLDB, 8(11):1274–1285, 2015. [Page 127]

[110] Intel. Intel Xeon Processor E5-1600/E5-2600/E5-4600 Product Families. http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-1600-2600-vol-1-datasheet.pdf. Accessed: 2016-07-29. [Page 57]

[111] Intel. Intel 64 and IA-32 Architectures Software Developer Manuals. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. Accessed: 2016-07-29. [Pages 35, 58, 59, 62, 63, 100, 117, 122, 125, and 127]

[112] Intel. An Introduction to the Intel QuickPath Interconnect. https://www-ssl.intel.com/content/www/us/en/io/quickpath-technology/quick-path-interconnect-introduction-paper.html, 2009. Accessed: 2016-07-29. [Pages 12 and 35]

[113] Intel. Transactional Synchronization Extensions Overview. https://software.intel.com/en-us/node/524022, 2013. Accessed: 2016-07-29. [Pages 3 and 133]

[114] ITRS. 2015 International Technology Roadmap for Semiconductors (ITRS). http://www.semiconductors.org/main/2015_international_technology_roadmap_for_semiconductors_itrs. Accessed: 2016-07-26. [Page 1]

[115] Ryan Johnson, Ippokratis Pandis, Nikos Hardavellas, Anastasia Ailamaki, and Babak Falsafi. Shore-MT: A Scalable Storage Manager for the Multicore Era. In EDBT, pages 24–35. ACM, 2009. [Page 26]

[116] Ryan Johnson, Radu Stoica, Anastasia Ailamaki, and Todd C. Mowry. Decoupling Con- tention Management From Scheduling. In ASPLOS, pages 117–128. ACM, 2010. [Page 21]

[117] Alain Kägi, Doug Burger, and James R. Goodman. Efficient Synchronization: Let Them Eat QOLB. In ISCA, pages 170–180. ACM, 1997. [Page 20]

[118] Anna R. Karlin, Kai Li, Mark S. Manasse, and Susan S. Owicki. Empirical Studies of Competitive Spinning for a Shared-Memory Multiprocessor. In SOSP, pages 41–55. ACM, 1991. [Page 21]

[119] Sanidhya Kashyap, Changwoo Min, and Taesoo Kim. Scalability in the Clouds!: A Myth Or Reality? In APSys, pages 1–7. ACM, 2015. [Page 126]

[120] Andi Kleen. A NUMA API for Linux. SUSE Labs white paper, 2004. [Pages 26 and 111]

145 Bibliography

[121] Jonathan Koomey. Growth in Data Center Electricity Use 2005 to 2010. In Analytics Press Report, 2011. [Page 55]

[122] Konstantinos Koukos, David Black-Schaffer, Vasileios Spiliopoulos, and Stefanos Kaxiras. Towards More Efficient Execution: A Decoupled Access-Execute Approach. In ICS, pages 253–262. ACM, 2013. [Page 25]

[123] H. T. Kung and John T. Robinson. On Optimistic Methods for Concurrency Control. ACM Trans. Database Syst., 6(2):213–226, 1981. [Pages 24 and 83]

[124] FAL Labs. Kyoto Cabinet. http://fallabs.com/kyotocabinet. Accessed: 2016-07-29. [Page 74]

[125] Christoph Lameter. Effective Synchronization on Linux/NUMA Systems. In Gelato Federation Meeting, 2005. [Page 24]

[126] Hugh C. Lauer and Roger M. Needham. On the Duality of Operating System Structures. Operating Systems Review, 13(2):3–19, 1979. [Pages 14 and 21]

[127] Doug Lea. Overview of Package util.concurrent Release 1.3.4. http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html, 2003. Accessed: 2016-07-29. [Pages 23, 102, 103, and 109]

[128] Hai Li, Swarup Bhunia, Yiran Chen, Kaushik Roy, and T. N. Vijaykumar. DCG: Determin- istic Clock-Gating for Low-Power Microprocessor Design. IEEE Trans. VLSI Syst., 12(3): 245–254, 2004. [Page 25]

[129] Beng-Hong Lim and Anant Agarwal. Waiting Algorithms for Synchronization in Large- Scale Multiprocessors. ACM Trans. Comput. Syst., 11(3):253–294, 1993. [Page 21]

[130] Beng-Hong Lim and Anant Agarwal. Reactive Synchronization Algorithms for Multipro- cessors. In ASPLOS, pages 25–35. ACM, 1994. [Page 20]

[131] Hyeontaek Lim, Bin Fan, David G. Andersen, and Michael Kaminsky. SILT: a Memory- Efficient, High-Performance Key-Value Store. In SOSP, pages 1–13. ACM, 2011. [Page 25]

[132] Hyeontaek Lim, Dongsu Han, David G. Andersen, and Michael Kaminsky. MICA: A Holistic Approach to Fast In-Memory Key-Value Storage. In NSDI, pages 429–444. USENIX, 2014. [Pages 25 and 26]

[133] Kevin T. Lim, David Meisner, Ali G. Saidi, Parthasarathy Ranganathan, and Thomas F. Wenisch. Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached. In ISCA, pages 36–47. ACM, 2013. [Page 74]

[134] Rick Lindsley and Dave Hansen. Bkl: One Lock to Bind Them All. In Ottawa Linux Symposium, pages 301–309, 2002. [Page 2]

146 Bibliography

[135] David Lo, Liqun Cheng, Rama Govindaraju, Luiz Andre Barroso, and Christos Kozyrakis. Towards Energy Proportionality for Large-Scale Latency-Critical Workloads. In ISCA, pages 301–312. IEEE, 2014. [Page 65]

[136] Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia L. Lawall, and Gilles Muller. Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications. In ATC, pages 65–76. USENIX, 2012. [Pages 2, 3, 20, 22, 31,and53]

[137] Jean-Pierre Lozi, Florian David, Gaël Thomas, Julia L. Lawall, and Gilles Muller. Fast and Portable Locking for Multicore Architectures. ACM Trans. Comput. Syst., 33(4):13, 2016. [Pages 22 and 53]

[138] Jean-Pierre Lozi, Baptiste Lepers, Justin R. Funston, Fabien Gaud, Vivien Quéma, and Alexandra Fedorova. The Linux Scheduler: A Decade of Wasted Cores. In EuroSys, pages 1–16. ACM, 2016. [Page 25]

[139] Victor Luchangco, Daniel Nussbaum, and Nir Shavit. A Hierarchical CLH Queue Lock. In Euro-Par, volume 4128, pages 801–810. Springer, 2006. [Pages 3, 13, 14, and 20]

[140] Peter S. Magnusson, Anders Landin, and Erik Hagersten. Queue Locks on Cache Coher- ent Multiprocessors. In IPPS, pages 165–171. IEEE, 1994. [Pages 14 and 20]

[141] Zoltan Majo and Thomas R. Gross. A Library for Portable and Composable Data Locality Optimizations for NUMA Systems. In PPOPP, pages 227–238. ACM, 2015. [Page 26]

[142] Yandong Mao, Robert Morris, and M Frans Kaashoek. Optimizing MapReduce for Multicore Architectures. Technical report, MIT-CSAIL-TR-2010-020, MIT, 2010. [Page 128]

[143] Yandong Mao, Eddie Kohler, and Robert Tappan Morris. Cache Craftiness for Fast Multicore Key-Value Storage. In EuroSys, pages 183–196. ACM, 2012. [Page 25]

[144] Henry Massalin and Calton Pu. A Lock-Free Multiprocessor OS Kernel. Operating Systems Review, 26(2):8, 1992. [Page 2]

[145] Alexander Matveev, Nir Shavit, Pascal Felber, and Patrick Marlier. Read-Log-Update: A Lightweight Synchronization Mechanism For Concurrent Programming. In SOSP, pages 168–183, 2015. [Pages 2, 3,and24]

[146] Paul E McKenney and John D Slingwine. Read-Copy Update: Using Execution History to Solve Concurrency Problems. In PDCS, pages 509–518, 1998. [Pages 23 and 93]

[147] David Meisner and Thomas F.Wenisch. DreamWeaver: Architectural Support for Deep Sleep. In ASPLOS, pages 313–324. ACM, 2012. [Page 25]

[148] David Meisner, Brian T. Gold, and Thomas F.Wenisch. PowerNap: Eliminating Server Idle Power. In ASPLOS, pages 205–216. ACM, 2009. [Page 25]

147 Bibliography

[149] John M. Mellor-Crummey and Michael L. Scott. Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors. ACM Trans. Comput. Syst., 9(1):21–65, 1991. [Pages 3, 13, 14, 20, 41, and 126]

[150] John M. Mellor-Crummey and Michael L. Scott. Synchronization without Contention. In ASPLOS, pages 269–278. ACM, 1991. [Page 3]

[151] Maged M. Michael. High Performance Dynamic Lock-Free Hash Tables and List-Based Sets. In SPAA, pages 73–82. ACM, 2002. [Pages 23 and 81]

[152] Maged M. Michael. Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects. IEEE Trans. Parallel Distrib. Syst., 15(6):491–504, 2004. [Page 93]

[153] Maged M. Michael and Michael L. Scott. Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms. In PODC, pages 267–275. ACM, 1996. [Pages 23, 42, 81, 84, 100, 106, 109, and 127]

[154] Sun Microsystems. Multithreading in the Solaris Operating Environment. http://home.mit.bme.hu/~meszaros/edu/oprendszerek/segedlet/unix/2_folyamatok_es_utemezes/solaris_multithread.pdf. Accessed: 2016-07-29. [Page 21]

[155] Sun Microsystems. UltraSPARC T2 Supplement to the UltraSPARC Architecture. http://www.oracle.com/technetwork/systems/opensparc/t2-13-ust2-uasuppl-draft-p-ext-1537760.html, 2007. Accessed: 2016-07-29. [Page 36]

[156] Daniel Molka, Robert Schöne, Daniel Hackenberg, and Matthias S. Müller. Memory Performance and SPEC OpenMP Scalability on Quad-Socket x86_64 Systems. In ICA3PP, pages 170–181. Springer, 2011. [Page 19]

[157] Gordon E. Moore. Cramming More Components Onto Integrated Circuits. Electronics, 38(8):114, 1965. [Page 1]

[158] Tali Moreshet, R. Iris Bahar, and Maurice Herlihy. Energy Implications of Multiprocessor Synchronization. In SPAA, page 329. ACM, 2006. [Page 21]

[159] Adam Morrison and Yehuda Afek. Fast Concurrent Queues for x86 Processors. In PPOPP, pages 103–112. ACM, 2013. [Pages 23 and 81]

[160] Jaideep Moses, Ramesh Illikkal, Li Zhao, Srihari Makineni, and Don Newell. Effects of Locking and Synchronization on Future Large Scale CMP Platforms. In CAECW Workshop, along with HPCA, 2006. [Page 19]

[161] Aravind Natarajan and Neeraj Mittal. Fast Concurrent Lock-Free Binary Search Trees. In PPOPP, pages 317–328. ACM, 2014. [Pages 23, 81, 82, and 106]

[162] Umesh Gajanan Nawathe, Mahmudul Hassan, King C. Yen, Ashok Kumar, Aparna Ramachandran, and David Greenhill. Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip. IEEE SSC, 43(1):6–20, 2008. [Page 36]

[163] Rajesh Nishtala, Hans Fugal, Steven Grimm, Marc Kwiatkowski, Herman Lee, Harry C. Li, Ryan McElroy, Mike Paleczny, Daniel Peek, Paul Saab, David Stafford, Tony Tung, and Venkateshwaran Venkataramani. Scaling Memcache at Facebook. In NSDI, pages 385–398. USENIX, 2013. [Page 25]

[164] NRDC. America’s Data Centers Consuming and Wasting Growing Amounts of Energy. http://www.nrdc.org/energy/data-center-efficiency-assessment.asp. Accessed: 2016-07-29. [Page 55]

[165] Takeshi Ogasawara. NUMA-Aware Memory Manager with Dominant-Thread-Based Copying GC. In OOPSLA, pages 377–390. ACM, 2009. [Page 26]

[166] Oracle. Java CopyOnWriteArrayList Data Structure. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CopyOnWriteArrayList.html. Accessed: 2016-07-29. [Page 56]

[167] Oracle. Memory and Thread Placement Optimization Developer’s Guide. https://docs.oracle.com/cd/E26502_01/html/E35301/toc.html. Accessed: 2016-07-15. [Pages 26 and 111]

[168] Oracle. MySQL. http://www.mysql.com. Accessed: 2016-07-29. [Pages 13 and 74]

[169] Oracle. SPARC T4 Supplement to the Oracle SPARC Architecture 2011. http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/120214-t4-d04-p-ext-2305974.pdf, 2012. Accessed: 2016-07-29. [Pages 127 and 134]

[170] Oracle. ConcurrentHashMap in Java Docs. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ConcurrentHashMap.html, 2015. Accessed: 2016-07-29. [Page 103]

[171] Mark Oskin, Frederic T. Chong, and Timothy Sherwood. Active Pages: A Computation Model for Intelligent Memory. In ISCA, pages 192–203. IEEE / ACM, 1998. [Page 136]

[172] John K. Ousterhout. Scheduling Techniques for Concurrent Systems. In ICDCS, pages 22–30, 1982. [Pages 13, 21, and 59]

[173] Venkatesh Pallipadi and Alexey Starikovskiy. The Ondemand Governor. In Ottawa Linux Symposium, pages 223–238, 2006. [Page 25]

[174] Mark S. Papamarcos and Janak H. Patel. A Low-Overhead Coherence Solution for Multiprocessors with Private Cache Memories. In ISCA, pages 348–354. ACM, 1984. [Pages 10 and 116]

[175] David Patterson, Thomas Anderson, Neal Cardwell, Richard Fromm, Kimberly Keeton, Christoforos Kozyrakis, Randi Thomas, and Katherine Yelick. A Case for Intelligent RAM: IRAM. IEEE Micro, 17(2):34–44, 1997. [Page 136]

[176] Simon Peter, Jialin Li, Irene Zhang, Dan R. K. Ports, Doug Woos, Arvind Krishnamurthy, Thomas E. Anderson, and Timothy Roscoe. Arrakis: The Operating System Is the Control Plane. In OSDI, pages 1–16. USENIX, 2014. [Page 26]

[177] Darko Petrovic, Thomas Ropars, and André Schiper. Leveraging Hardware Message Passing for Efficient Thread Synchronization. In PPOPP, pages 143–154. ACM, 2014. [Pages 22, 23, and 53]

[178] Danica Porobic, Ippokratis Pandis, Miguel Branco, Pinar Tözün, and Anastasia Ailamaki. OLTP on Hardware Islands. PVLDB, 5(11):1447–1458, 2012. [Page 26]

[179] Michael Powell, Se-Hyun Yang, Babak Falsafi, Kaushik Roy, and T. N. Vijaykumar. Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories. In ISLPED, pages 90–95, 2000. [Page 25]

[180] The GNU Project. GNU libgomp. https://gcc.gnu.org/onlinedocs/libgomp. Accessed: 2016-07-15. [Page 129]

[181] Iraklis Psaroudakis, Thomas Kissinger, Danica Porobic, Thomas Ilsche, Erietta Liarou, Pinar Tözün, Anastasia Ailamaki, and Wolfgang Lehner. Dynamic Fine-Grained Scheduling for Energy-Efficient Main-Memory Queries. In DaMoN, pages 1–7. ACM, 2014. [Page 25]

[182] Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-Aware Data and Task Placement. PVLDB, 8(12):1442–1453, 2015. [Page 26]

[183] William Pugh. Concurrent Maintenance of Skip Lists. Technical Report UMIACS-TR-90-80, University of Maryland, 1990. [Pages 23 and 86]

[184] Zoran Radovic and Erik Hagersten. Efficient Synchronization for Nonuniform Communication Architectures. In SC, pages 1–13. IEEE, 2002. [Page 20]

[185] Zoran Radovic and Erik Hagersten. Hierarchical Backoff Locks for Nonuniform Communication Architectures. In HPCA, pages 241–252. IEEE, 2003. [Page 20]

[186] Ravi Rajwar and James R. Goodman. Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution. In MICRO, pages 294–305. ACM/IEEE, 2001. [Page 24]

[187] Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary R. Bradski, and Christos Kozyrakis. Evaluating MapReduce for Multi-Core and Multiprocessor Systems. In HPCA, pages 13–24. IEEE, 2007. [Page 3]

[188] Haris Ribic and Yu David Liu. Energy-Efficient Work-Stealing Language Runtimes. In ASPLOS, pages 513–528. ACM, 2014. [Page 25]

[189] Efi Rotem, Alon Naveh, Micha Moffie, and Avi Mendelson. Analysis of Thermal Monitor Features of the Intel Pentium M Processor. In TACS Workshop at ISCA, 2004. [Page 25]

[190] Amitabha Roy, Steven Hand, and Tim Harris. A Runtime System for Software Lock Elision. In EuroSys, pages 261–274. ACM, 2009. [Page 24]

[191] Christoph Rupp. HamsterDB. http://hamsterdb.com. Accessed: 2016-07-29. [Page 74]

[192] Adrian Sampson, Werner Dietl, Emily Fortuna, Danushen Gnanapragasam, Luis Ceze, and Dan Grossman. EnerJ: Approximate Data Types for Safe and General Low-Power Computation. In PLDI, pages 164–174. ACM, 2011. [Page 25]

[193] Michael D. Schroeder and Michael Burrows. Performance of Firefly RPC. In SOSP, pages 83–90. ACM, 1989. [Page 3]

[194] Michael L. Scott. Transactional Memory Today. SIGACT News, 46(2):96–104, 2015. [Page 134]

[195] Michael L. Scott and William N. Scherer III. Scalable Queue-Based Spin Locks with Timeout. In PPOPP, pages 44–52. ACM, 2001. [Page 3]

[196] Ori Shalev and Nir Shavit. Split-Ordered Lists: Lock-Free Extensible Hash Tables. In PODC, pages 102–111. ACM, 2003. [Page 81]

[197] Nir Shavit and Dan Touitou. Software Transactional Memory. In PODC, pages 204–213. ACM, 1995. [Pages 3 and 24]

[198] Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao Zhang, and Zhuan Chen. Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers. In ASPLOS, pages 65–76. ACM, 2013. [Page 25]

[199] Karan Singh, Major Bhadauria, and Sally A. McKee. Real Time Power Estimation and Thread Scheduling Via Performance Counters. SIGARCH Computer Architecture News, 37(2):46–55, 2009. [Page 25]

[200] Johannes Singler and Benjamin Konsik. The GNU libstdc++ Parallel Mode: Software Engineering Considerations. In Workshop on Multicore Software Engineering, pages 15–22. ACM, 2008. [Page 127]

[201] Daniel J. Sorin, Mark D. Hill, and David A. Wood. A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture, 6(3):1–212, 2011. [Pages 2, 11, and 117]

[202] Håkan Sundell and Philippas Tsigas. Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. In IPDPS, page 84. IEEE, 2003. [Page 23]

[203] Herb Sutter. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software. Dr. Dobb’s Journal, 30(3):202–210, 2005. [Pages 1 and 9]

[204] SQLite Development Team. SQLite. http://sqlite.org. Accessed: 2016-07-29. [Page 74]

[205] Tilera. Tilera TILE-Gx. https://www.mellanox.com/related-docs/prod_multi_core/PB_TILE-Gx36.pdf. Accessed: 2016-07-29. [Pages 17 and 36]

[206] R. Kent Treiber. Systems Programming: Coping with Parallelism. Technical report, 1986. [Pages 23 and 109]

[207] Jan Treibig, Georg Hager, and Gerhard Wellein. LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments. In ICPP Workshops, pages 207–216. IEEE, 2010. [Page 27]

[208] Josh Triplett, Paul E. McKenney, and Jonathan Walpole. Resizable, Scalable, Concurrent Hash Tables Via Relativistic Programming. In ATC, pages 145–158. USENIX, 2011. [Page 23]

[209] Jessica H. Tseng, Hao Yu, Shailabh Nagar, Niteesh Dubey, Hubertus Franke, Pratap Pattnaik, Hiroshi Inoue, and Toshio Nakatani. Performance Studies of Commercial Workloads on a Multi-Core System. In IISWC, pages 57–65. IEEE, 2007. [Page 3]

[210] Philippas Tsigas and Yi Zhang. A Simple, Fast and Scalable Non-Blocking Concurrent FIFO Queue For Shared Memory Multiprocessor Systems. In SPAA, pages 134–143. ACM, 2001. [Page 23]

[211] Dimitris Tsirogiannis, Stavros Harizopoulos, and Mehul A. Shah. Analyzing the Energy Efficiency of a Database Server. In SIGMOD, pages 231–242. ACM, 2010. [Page 25]

[212] Dean M. Tullsen, Susan J. Eggers, and Henry M. Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. In ISCA, pages 392–403. ACM, 1995. [Page 11]

[213] Jons-Tobias Wamhoff, Stephan Diestelhorst, Christof Fetzer, Patrick Marlier, Pascal Felber, and Dave Dice. The TURBO Diaries: Application-Controlled Frequency Scaling Explained. In ATC, pages 193–204. USENIX, 2014. [Pages 11, 21, and 63]

[214] David Wentzlaff and Anant Agarwal. Factored Operating Systems (fos): The Case for a Scalable Operating System for Multicores. Operating Systems Review, 43(2):76–85, 2009. [Pages 3, 21, and 24]

[215] Thomas Willhalm, Roman Dementiev, and Patrick Fay. Intel Performance Counter Monitor: A Better Way to Measure CPU Utilization. http://www.intel.com/software/pcm. Accessed: 2016-07-29. [Page 27]

[216] Qiang Wu, Margaret Martonosi, Douglas W. Clark, Vijay Janapa Reddi, Dan Connors, Youfeng Wu, Jin Lee, and David M. Brooks. A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance. In MICRO, pages 271–282. IEEE, 2005. [Page 25]

[217] Lingxiang Xiang and Michael L. Scott. Software Partitioning of Hardware Transactions. In PPOPP, pages 76–86. ACM, 2015. [Page 134]

[218] Fen Xie, Margaret Martonosi, and Sharad Malik. Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits. In PLDI, pages 49–62. ACM, 2003. [Page 25]

[219] Chao Xu, Felix Xiaozhu Lin, Yuyang Wang, and Lin Zhong. Automated OS-Level Device Runtime Power Management. In ASPLOS, pages 239–252. ACM, 2015. [Page 25]

[220] Gerd Zellweger, Simon Gerber, Kornilios Kourtis, and Timothy Roscoe. Decoupling Cores, Kernels, and Operating Systems. In OSDI, pages 17–31. USENIX, 2014. [Page 26]

[221] Yan Zhai, Xiao Zhang, Stéphane Eranian, Lingjia Tang, and Jason Mars. HaPPy: Hyperthread-Aware Power Profiling Dynamically. In ATC, pages 211–217. USENIX, 2014. [Page 25]

[222] Wenting Zheng, Stephen Tu, Eddie Kohler, and Barbara Liskov. Fast Databases with Fast Durability and Recovery Through Multicore Parallelism. In OSDI, pages 465–477. USENIX, 2014. [Page 26]

Vasileios Trigonakis

Contact Information
EPFL IC LPD, Bat INR 312, Station 14, 1015 Lausanne, Switzerland
Tel: 0041 216 938 121 | E-mail: vasileios.trigonakis@epfl.ch | Web: people.epfl.ch/vasileios.trigonakis

Research Interests
software/hardware systems, concurrent programming, synchronization, data structures, distributed systems, transactional memory

Education
EPFL, Lausanne, Switzerland
Ph.D. Candidate, Computer Science, September 2011–October 2016
• Dissertation Title: “Towards Scalable Synchronization on Multi-Cores”
• Supervisor: Rachid Guerraoui

KTH, Stockholm, Sweden
M.Sc., Software Engineering of Distributed Systems, September 2009–August 2011
• Graduated first in my class
• Thesis: “Design of a Distributed Transactional Memory for Many-core Systems”
• Supervisor: Rachid Guerraoui | Examiner: Seif Haridi

NTUA, Athens, Greece
Dipl.-Ing., Electrical and Computer Engineering, September 2003–October 2008
• Thesis: “Design of a GIS System for Automated Filing and Presentation of Electromagnetic Field Power Measurements”
• Supervisor: Philippos Konstantinou

Publications
Unlocking Energy
Babak Falsafi, Rachid Guerraoui, Javier Picorel, Vasileios Trigonakis (author names in alphabetical order)
USENIX ATC ’16 (USENIX Annual Technical Conference)

Optimistic Concurrency with OPTIK
Rachid Guerraoui, Vasileios Trigonakis
PPoPP ’16 (Symposium on Principles and Practice of Parallel Programming)

Locking Made Easy
Jelena Antic, Georgios Chatzopoulos, Rachid Guerraoui, Vasileios Trigonakis
Middleware ’16 (Annual Middleware Conference)

Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
Tudor David, Rachid Guerraoui, Vasileios Trigonakis
ASPLOS ’15 (International Conference on Architectural Support for Programming Languages and Operating Systems)

Everything You Always Wanted to Know about Synchronization but Were Afraid to Ask
Tudor David, Rachid Guerraoui, Vasileios Trigonakis
SOSP ’13 (Symposium on Operating Systems Principles)

TM2C: a Software Transactional Memory for Many-Cores
Vincent Gramoli, Rachid Guerraoui, Vasileios Trigonakis
EuroSys ’12 (European Conference on Computer Systems)

Internships
Oracle Labs, Cambridge, UK
Research Intern, August 2014–November 2014
• Project: “Designing Tools and Libraries for Better Leveraging Multi-Cores”
• Supervisor: Tim Harris

EPFL, Lausanne, Switzerland
Research Intern, February 2011–August 2011
• Conducted my M.Sc. thesis

Presentations & Invited Talks
Unlocking Energy
• Conference presentation at USENIX ATC, Denver, Colorado, USA, June 2016

Optimistic Concurrency with OPTIK
• Conference presentation at PPoPP, Barcelona, Spain, March 2016
• Invited talk at Cisco, Lausanne, Switzerland, November 2015

Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
• Invited talk at MSR, Cambridge, UK, June 2015
• Conference presentation at ASPLOS, Istanbul, Turkey, April 2015

Everything You Always Wanted to Know about Synchronization but Were Afraid to Ask
• Winter School on Hot Topics in Distributed Computing, La Plagne, France, March 2014
• Conference presentation at SOSP, Farmington, Pennsylvania, USA, November 2013
• Invited talk at MIT, Boston, Massachusetts, USA, November 2013
• Invited talk at University of Pennsylvania, Philadelphia, Pennsylvania, USA, November 2013
• EcoCloud Annual Event, Lausanne, Switzerland, June 2013

TM2C: a Software Transactional Memory for Many-Cores
• School on Research Directions in Distributed Computing, Heraklion, Greece, June 2013
• Conference presentation at EuroSys, Bern, Switzerland, April 2012
• ASPLOS Doctoral Workshop, London, UK, March 2012
• Euro-TM Workshop on Distributed Transactional Memory, Lisbon, Portugal, February 2012
• Winter School on Hot Topics in Distributed Computing, La Plagne, France, March 2011 & 2012

Software Projects
Available at github.com/trigonak and github.com/LPD-EPFL:
• lockin, a library with various lock algorithms optimized for energy efficiency
• ascylib & optik, a concurrent data structure library with over 40 implementations
• clht, a very efficient and scalable concurrent hash table
• ccbench, a tool for measuring the cache-coherence latencies of a processor
• libslock, a cross-platform atomic operation and lock algorithm library
• ssmp, a message passing library built on top of cache-coherent shared memory
• TM2C, a software transactional memory built on top of message passing

Peer Reviews
• External reviewer, SPAA (Symposium on Parallelism in Algorithms and Architectures), 2016
• PC member for artifact evaluation, PPoPP (Symposium on Principles and Practice of Parallel Programming), 2015 & 2016
• External reviewer, IPDPS (International Parallel and Distributed Processing Symposium), 2015
• External reviewer, ICDCS (International Conference on Distributed Computing Systems), 2014
• External reviewer, ICDCN (International Conference on Distributed Computing and Networking), 2014
• Shadow PC member, EuroSys (European Conference on Computer Systems), 2013
• External reviewer, DISC (International Symposium on Distributed Computing), 2011

Teaching & Mentoring
Teaching
• Concurrent Algorithms (CS-453), Graduate class, EPFL, 2012–2015
• Real-time Networks (COM-413), Graduate class, EPFL, 2014
• Distributed Algorithms (CS-451), Graduate class, EPFL, 2012

Mentoring
• M.Sc. thesis, Egeyar Bagcioglu, “Using Hardware Transactional Memory in Concurrent Data Structures,” February 2016–June 2016
• Semester project, Daniel Vargas, “Porting ASCYLIB to C++,” February 2016–June 2016
• Semester project, Sebastien Rouault, “Porting ASCYLIB to Go,” February 2016–June 2016
• Semester project, Alexandru Ciprian, “gcmalloc: Memory Allocation with Garbage Collection,” September 2015–January 2016
• M.Sc. thesis, Karolos Antoniadis, “Devising a Theory for Asynchronized Concurrency,” April 2015–November 2015
• M.Sc. thesis, Jelena Antic, “GLS: A Generic Locking Service,” February 2015–June 2015
• Semester project, Ivo Mihailovic, “Improving the Scalability of Memcached with CLHT,” February 2015–June 2015
• Semester project, Nemanja Djurkic, “wlock: Doing Work Instead of Waiting in Locks,” February 2015–June 2015
• Semester project, Egeyar Bagcioglu, “Implementing Randomized Concurrent Data Structures,” February 2015–June 2015
• Semester project, Chengzhen Wu, “Cross-Platform Implementations of Barrier Algorithms,” February 2014–June 2014
• Semester projects, Radmila Popovic:
  • “Optimizing Memcached with CLHT,” September 2015–January 2016
  • “Cross-platform Implementations of Reader-Writer Locks,” February 2014–June 2014
• Semester project, David Cervini, “Porting TM2C to Pthreads,” September 2013–January 2014
• Semester projects, Oana Balmau & Igor Zablotchi:
  • “Increasing Concurrency of RocksDB,” February 2014–June 2014
  • “Concurrent Binary Search Trees on Many-Cores,” September 2013–January 2014
• Research project, Ugur Gurel, “Designing Scalable Concurrent Hash Tables,” September 2012–February 2013

Language Skills
Greek: native speaker | English: fluent | French: basic communication skills

Professional Experience
Pansystems S.A., Athens, Greece
Technical Consultant, July 2009
Undertook the software setup of a GIS-based system for the national intelligence service (EYP) of Greece. Designed, installed, and tuned an Oracle Database 11g on a Windows Server 2008 cluster of two servers with Oracle Failsafe. Installed and configured two Oracle application servers and the GIS software running on top.

Geo Information S.A., Athens, Greece
Technical Consultant, February 2009–June 2009
Directed the promotion, installation, and modification of the enterprise series of GIS products of Erdas Inc. The products use Oracle Database with the Spatial option, Oracle Application Server, Oracle MapViewer, and Java SE & EE. Improved the GIS of the cities of Zografou, Giannitsa, Koropi, and Perama in Greece.

Ministry of Transportation and Communication, Athens, Greece
Call Services, April 2007–March 2008
Organized the appointments for the national vehicle control centers of Athens, Greece. Collaborated with a team of 16 people and often supervised the team.

Extracurricular Activities
• I love photography and football, I like cooking, and I have recently taken up biking
• In the past, I played football and volleyball on various (local, school, and university) teams
• Beta tester for Wolfram|Alpha
• Member of Stockholm’s Erlang Users’ Group (2010–2011)
