Thèse n° 7108

Operating System and Network Co-Design for Latency-Critical Datacenter Applications

Présentée le 28 août 2020

à la Faculté informatique et communications
Laboratoire des Systèmes de Centres de Calcul
Programme doctoral en informatique et communications
pour l'obtention du grade de Docteur ès Sciences
par

Evangelos Marios KOGIAS

Acceptée sur proposition du jury

Prof. J. Larus, président du jury
Prof. E. Bugnion, directeur de thèse
Prof. C. Kozyrakis, rapporteur
Dr R. Bianchini, rapporteur
Prof. K. Argyraki, rapporteuse

2020

Νὰ εὔχεσαι νά ῾ναι μακρὺς ὁ δρόμος. Πολλὰ τὰ καλοκαιρινὰ πρωϊὰ νὰ εἶναι ποὺ μὲ τί εὐχαρίστηση, μὲ τί χαρὰ θὰ μπαίνεις σὲ λιμένας πρωτοειδωμένους· — Κ. Π. Καβάφης

Hope the voyage is a long one. May there be many a summer morning when, with what pleasure, what joy, you come into harbors seen for the first time;

— C. P. Cavafy

To my family…

Acknowledgements

This thesis is the final stop of a journey that started five years ago, a journey that formed me as a researcher and as a person. Throughout this journey several people stood by me and supported me, including advisors, collaborators, friends, and family. This journey would not have been possible without them, and I wholeheartedly want to say a big thank you to them.

First and foremost, I would like to thank my advisor, Ed Bugnion. His deep systems insight profoundly affected the way I approach and think about problems. Ed trusted me from the early days of my PhD and allowed me to take academic initiatives. Despite the pressure it put on me, this trust helped me grow, form my identity as a researcher, make choices and stand by them, and advise many students. I will never forget our heated whiteboard sessions and my midnight Slack outbursts, which he always handled with profound patience.

Then, I would like to thank my thesis committee: James Larus, Katerina Argyraki, Christos Kozyrakis, and Ricardo Bianchini. They gave me a lot of constructive feedback that made this thesis much better, and they also influenced me throughout my PhD. Jim made me understand that simple questions are the hardest ones. Katerina served as my academic "aunt" during my time at EPFL and was my go-to person during the downs of this journey. I had the luck to work with Christos during my first year at EPFL, and he also deeply affected the way I approach research by making it clear that "it's about the projects you finish, not the projects you start". Ricardo broadened my cloud horizons while serving as my mentor during my MSR internship and taught me how to be a team player in a big project.

DCSL was never a big lab, but I never felt alone on this journey. I am grateful to my labmates and our friends from VLSC for all the discussions we had throughout my PhD that helped this thesis come to its final form. Thank you very much Sam, Jonas, Mia, Dmitrii, Stuart, Sahand, David, Mayy. I want to especially thank my officemates George and Adrien. George was the senior student at DCSL when I joined and I learnt a lot from him. Adrien has always been there to discuss anything, with his great ability to be direct, caring, and judgemental at the same time, while knowing which of these attributes to use in which discussion. I want to thank DCSL's mom, Maggy, for being extremely helpful with any issue I might have and for always dealing with our steam with a big smile.

Having great mentors inspired me to try to become a good mentor myself. For that I would like to thank all my students, both those I taught as a POCS TA and those who did projects with me. Konstantine, Martin, Antoine, Ogier, Lisa, Paul, Guillaume, Mikael, Kosta, Adi, Matthieu, and Charline, thank you for working with me. I also want to thank my students in the Greek dance class and the Greek school. Teaching dance to such a diverse group of people taught me how to explain things differently and adjust to my audience.

Doing a PhD is an extremely stressful period in a student's life, and having great friends is the only way out. I want to thank Lefteris and George for being close to me in every step, from day zero till the end. I feel grateful to all my Lausanne friends, and especially to Irene, Patricia, Helena, Hermina, Lana, Kiril, Dusan, Mark, Rishabh, Arseniy, Arash, Bogdan, and the Greek gang Stella, Thanos, Panagiotis, Panos, Aggelos, Pavlos, Natali, Vangelis, Prodromos, Tolis, Melivia, Georgia. Thanks for all the discussions (academic and non-), the BBQs, the beers at Great Escape, and the ski trips. I want to specifically thank my friend Matt for luring me into the world of motorbikes. I also want to thank my friends in Greece who, despite being far, always make me feel as if I never left. Special thanks to my NTUA and SourceLair friends Antonis M, Antonis K, Paris, Thanasis, and Christos, my friend Anastasis for all the songs we've played together, and my great childhood friends George and Yiannis.

I wouldn't be the person I am today without the unconditional love and support of my family. I feel deeply grateful to my parents, Thanasis and Dimitra, for the way they raised me: my dad for showing me that every obstacle can be overcome if you stay calm, and my mom for letting me dream without borders. I thank my sister Alexia for reminding me of the importance of people skills, which can easily be lost during a PhD, and my sister Evita for empowering me with her strong and unconventional personality.

Last but not least, I want to thank Vivi. Vivi came into my life when I was 14 and has never left since. She has been next to me in all the ups and downs, holding different roles but being my best friend above all, while never ceasing to show me that home is where they love you. Vivi has left indelible marks on who I am and I hope she will continue doing so in the years to come. Thank you very much for riding along.


I would also like to thank the people and the organisations that financially supported this thesis and allowed me to work on problems I found challenging. My thesis was supported by a VMware research grant and the Microsoft Swiss Joint Research Center. I am also a grateful recipient of the IBM PhD Fellowship.

Lausanne, August 2020 M. K.

Abstract

Datacenters are the heart of our digital lives. Online applications, such as social networking and e-commerce, run inside datacenters under strict Service Level Objectives (SLOs) for their tail latency. Tight latency SLOs are necessary for such services to remain interactive and keep users engaged. At the same time, datacenters operate under a single administrative domain, which enables the deployment of customized network and hardware solutions based on specific application requirements. Customization enables the design of datacenter-tailored, SLO-aware mechanisms that are more efficient and perform better.

In this thesis we focus on three main datacenter challenges. First, latency-critical, in-memory datacenter applications have µs-scale service times and run on top of hardware infrastructure that is also capable of µs-scale inter-node round-trip times. Existing operating systems, though, were designed under completely different assumptions and are not ready for µs-scale computing. Second, the basis of datacenter communication is Remote Procedure Calls (RPCs), which follow a message-oriented paradigm, while TCP still remains widely used for intra-datacenter communication. The mismatch between TCP's bytestream-oriented abstraction and RPCs causes several inefficiencies and deteriorates tail latency. Finally, datacenter applications follow a scale-out paradigm based on large fan-out communication schemes. In such a scenario, tail latency becomes a critical metric due to the tail-at-scale problem. The two main factors that affect tail latency are interference and scheduling/load-balancing decisions.

To deal with the above challenges, we advocate for a co-design of network and operating system mechanisms targeting µs-scale tail optimisations for latency-critical datacenter applications. Our approach investigates the potential of pushing functionality into the network by leveraging emerging in-network programmability features. Whenever existing abstractions fail to meet the µs-scale requirements or restrict our design space, we propose new ones, taking advantage of the design and deployment freedom the datacenter offers.

The contributions of this thesis can be split into three main parts. We first design and build tools and methodologies for µs-scale latency measurements and system evaluation. Our approach rests on a robust theoretical background in statistics and queueing theory. We then revisit existing operating system and networking mechanisms for TCP-based datacenter applications. We design an OS scheduler for µs-scale tasks, modify TCP to improve L4 load balancing, and provide an SLO-aware flow control mechanism. Finally, after identifying the problems related to TCP-based RPC services, we introduce a new transport protocol for datacenter RPCs and in-network policy enforcement that enables us to push functionality to

the network. We showcase how the new protocol improves the performance and simplifies the implementation of in-network RPC load balancing, SLO-aware RPC flow control, and application-agnostic fault-tolerant RPCs.

Keywords: datacenters, datacenter networking, remote procedure call, in-network compute, µs-scale computing, scheduling, load balancing, P4, state machine replication

Résumé

Les centres de données sont au cœur de notre vie digitale. Les services en ligne, tels que les réseaux sociaux et le commerce en ligne, tournent dans des centres de données sous des objectifs de niveau de service (SLO) très stricts sur leur latence de traîne (tail latency). Des SLO étroits sur la latence de traîne sont nécessaires pour que de tels services restent interactifs et gardent leurs utilisateurs engagés. En même temps, les centres de données opèrent sous un seul domaine administratif, ce qui permet de déployer des solutions personnalisées au niveau du réseau ou du matériel basées sur les besoins spécifiques de certaines applications. La personnalisation permet la conception de mécanismes adaptés au centre de données et tenant compte du SLO, permettant ainsi de meilleures performances et plus d'efficacité.

Dans cette thèse, nous nous concentrons sur trois principaux défis dans les centres de données. Premièrement, les applications tenant en mémoire et ayant une latence critique ont des SLO au niveau de la µs et tournent sur une infrastructure matérielle qui est aussi capable d'effectuer des allers-retours inter-nœuds au niveau de la µs. Cependant, les systèmes d'exploitation ont été conçus sous des hypothèses très différentes et ne sont pas prêts pour le calcul au niveau de la µs. Deuxièmement, la base de toute communication dans un centre de données est le Remote Procedure Call (RPC), suivant un paradigme de communication orienté sur les messages, tandis que TCP reste largement utilisé pour la communication intra-centre de données. Le décalage entre l'abstraction d'un flot d'octets fournie par TCP et les RPC cause plusieurs inefficacités et détériore la latence de traîne. Finalement, les applications dans les centres de données suivent une architecture de mise à l'échelle horizontale basée sur des modèles de communication à large fan-out. Dans un tel scénario, la latence de traîne devient une mesure critique à cause du problème de traîne à l'échelle. Deux des facteurs principaux qui affectent la latence de traîne sont l'interférence ainsi que les décisions d'ordonnancement et de répartition de charge.

Afin de résoudre les défis précédents, nous proposons de co-concevoir des mécanismes au niveau des réseaux et des systèmes d'exploitation visant des optimisations de la latence de traîne au niveau de la µs pour des applications ayant une latence critique. Notre approche étudie le potentiel de pousser des fonctionnalités dans le réseau en tirant parti des capacités émergentes de programmabilité du réseau. Quand les abstractions existantes n'arrivent pas à atteindre nos exigences au niveau de la µs ou restreignent notre espace de conception, nous en proposons de nouvelles grâce à la liberté de conception et de déploiement offerte dans un centre de données.


Les contributions de cette thèse peuvent être réparties en trois parties principales. Premièrement, nous concevons et construisons des outils et des méthodologies pour des mesures de latence et l'évaluation de systèmes au niveau de la µs. Notre approche repose sur une base théorique robuste en statistiques et en théorie des files d'attente. Nous revisitons ensuite des mécanismes existants au niveau des systèmes d'exploitation et du réseau pour des applications basées sur TCP. Nous construisons un ordonnanceur de système d'exploitation visant des tâches au niveau de la µs, nous modifions TCP afin d'améliorer la répartition de charge L4, et nous fournissons un mécanisme de contrôle de flux tenant compte du SLO. Finalement, après avoir identifié les problèmes liés aux services RPC basés sur TCP, nous introduisons un nouveau protocole de transport pour les RPC dans les centres de données ainsi qu'un mécanisme de mise en application de politiques dans le réseau qui nous permet de pousser des fonctionnalités dans le réseau. Nous démontrons comment le nouveau protocole améliore la performance et simplifie l'implémentation de la répartition de charge des RPC dans le réseau, du contrôle de flux des RPC tenant compte du SLO, ainsi que des RPC tolérants aux pannes et agnostiques à l'application.

Mots clés : centres de données, réseaux des centres de données, RPC, calcul dans le réseau, calcul au niveau de la µs, ordonnancement, répartition de charge, P4, réplication de machine d'état.

Contents

Acknowledgements i

Abstract (English/Français) iii

List of Figures xiii

List of Tables xix

1 Introduction 1
1.1 RPCs in the Epicenter 2
1.2 The Need for Tail-Tolerant Systems 2
1.3 The Advent of In-Network Compute 3
1.4 Thesis Statement 3
1.5 Thesis Contributions 4
1.6 Thesis Organisation 8
1.6.1 Bibliographic Notes 9

2 Background and Challenges 11
2.1 Datacenter Infrastructure and Applications 11
2.2 Datacenter Remote Procedure Calls 15
2.3 Flow Control 17
2.4 Replication 18
2.4.1 Replication for Fault-Tolerance 18
2.4.2 Replication for Scalability 19
2.5 Internode Load Balancing 20

I Understanding and Measuring Latency Accurately 21

3 Thesis Methodology 23
3.1 System Modelling 23
3.2 Synthetic Microbenchmarks with Known Boundaries 25

4 Lancet: A Self-Correcting Latency Measuring Tool 27
4.1 Introduction 27
4.2 Background 29
4.2.1 Taxonomy of tools 30
4.2.2 Configuration burden 32
4.2.3 Statistics Background 33
4.3 Experiment Decomposition 34
4.4 Design 35
4.4.1 LANCET infrastructure 35
4.4.2 Measurement options 36
4.4.3 LANCET's self-correcting methodology 37
4.5 Implementation 39
4.6 Evaluation 42
4.6.1 Experimental setup 43
4.6.2 Benefits of hardware timestamping 43
4.6.3 LANCET self-controlling dynamics 44
4.6.4 Inter-Arrival Distribution Impact 45
4.6.5 Server characterization 46
4.7 Related Work 46
4.8 Chapter Summary 47

5 Switch LOad Generation (SLOG) 49
5.1 Introduction 49
5.2 Design 50

II Latency-Critical Systems Using Existing Abstractions 55

6 Zygos: Achieving Low Tail Latency for Microsecond-scale Networked Tasks 57
6.1 Introduction 57
6.2 Queueing Analysis 59
6.3 Experimental Methodology 62
6.3.1 Approach and metrics 63
6.3.2 Experimental Environment 63
6.3.3 Evaluated Systems 64
6.3.4 Baseline results 64
6.4 Design 65
6.4.1 Requirements 65
6.4.2 ZYGOS High-level Design 66
6.4.3 Ordering semantics in multi-threaded applications 68
6.4.4 Eliminating Head-of-Line Blocking 68
6.4.5 Inter-processor Interrupts 69
6.5 Implementation 70
6.6 Evaluation 71
6.6.1 Synthetic micro-benchmarks 71
6.6.2 Overheads of ZYGOS on tiny tasks: memcached 74
6.6.3 A real application: Silo running TPC-C 75
6.7 Discussion 77
6.7.1 The impact of SLO on systems 77
6.7.2 Using Application Hints 77
6.8 Related Work 79
6.9 Chapter Summary 81

7 CRAB: Bypassing the Load Balancer Without Regrets 83
7.1 Introduction 83
7.2 Motivation and Background 85
7.2.1 Load Balancing Flavors 86
7.2.2 Load Balancing Policies 89
7.3 Design 91
7.4 Implementation 93
7.4.1 Load Balancing Middlebox 94
7.4.2 Connection Redirection 94
7.5 Evaluation 95
7.5.1 CRAB Load Balancer Implementations 96
7.5.2 Latency Overhead 97
7.5.3 Throughput Scaling 98
7.5.4 Load Balancing Policies 98
7.6 Discussion 100
7.7 Related Work 102
7.8 Chapter Summary 103

8 SLO-Aware TCP Flow Control 105
8.1 Introduction 105
8.2 Motivation and Background 106
8.2.1 Existing TCP Flow Control 107
8.2.2 TCP Flow Control in KBNets 107
8.3 Design 108
8.4 Implementation 111
8.5 Evaluation 113
8.6 Related Work 115
8.7 Chapter Summary 116

III New Abstractions for RPCs 117

9 Request Response Pair Protocol (R2P2) 119
9.1 Introduction 119
9.2 R2P2: A transport protocol for RPCs 120
9.2.1 Transport considerations 122
9.2.2 API 123
9.3 Implementation 124
9.4 Related Work 124
9.5 Chapter Summary 125

10 In-network RPC Load Balancing with JBSQ 127
10.1 Introduction 127
10.2 Queueing Analysis 128
10.3 The Join-Bounded-Shortest-Queue Policy 129
10.4 JBSQ router design considerations 130
10.5 Implementation 131
10.5.1 Router - software implementation 132
10.5.2 P4/Tofino implementation 132
10.6 Evaluation 133
10.6.1 Router characterization 134
10.6.2 Synthetic Time Microbenchmarks 136
10.6.3 Multi-packet Requests Microbenchmark 137
10.6.4 Using R2P2 for server work conservation 137
10.6.5 Lucene++ 138
10.6.6 Redis 140
10.7 Related Work 140
10.8 Chapter Summary 141

11 In-network SLO-Aware RPC Flow Control 143
11.1 Introduction 143
11.2 Motivation 144
11.3 Design 145
11.3.1 SVEN Dataplane 145
11.3.2 Control plane 146
11.3.3 Client and server applications 147
11.4 Evaluation 148
11.5 Discussion and Future Work 150
11.6 Related Work 151
11.7 Chapter Summary 152

12 HovercRaft: Scalability and Fault-tolerance for µs-scale Services 153
12.1 Introduction 153
12.2 Motivation 155
12.3 Design 156
12.3.1 SMR-aware RPC layer 156
12.3.2 Separating RPC Replication from Ordering 157
12.3.3 Load balancing replies 158
12.3.4 Bounded Queues 159
12.3.5 Load balancing Read-only Operations 160
12.3.6 Join-Bounded-Shortest-Queue 160
12.3.7 Communication in HovercRaft 161
12.4 HovercRaft++ 161
12.5 HovercRaft vs Raft 162
12.6 Implementation 164
12.6.1 R2P2 protocol extensions 165
12.6.2 Raft extensions 165
12.6.3 Multicast Flow Control and Recovery 165
12.6.4 Aggregator implementation 166
12.7 Evaluation 168
12.7.1 One million SMR operations per second 169
12.7.2 Scaling Cluster Sizes Without Regret