A Distributed Compute Fabric for Edge Intelligence

DISSERTATION

zur Erlangung des akademischen Grades

Doktor der Technischen Wissenschaften

eingereicht von

Dipl.-Ing. Thomas Rausch, BSc
Matrikelnummer 00726439

an der Fakultät für Informatik der Technischen Universität Wien

Betreuung: Univ.Prof. Dr. Schahram Dustdar

Diese Dissertation haben begutachtet:

Weisong Shi Ming Zhao

Wien, 5. Mai 2021 Thomas Rausch


A Distributed Compute Fabric for Edge Intelligence

DISSERTATION

submitted in partial fulfillment of the requirements for the degree of

Doktor der Technischen Wissenschaften

by

Dipl.-Ing. Thomas Rausch, BSc
Registration Number 00726439

to the Faculty of Informatics at the TU Wien

Advisor: Univ.Prof. Dr. Schahram Dustdar

The dissertation has been reviewed by:

Weisong Shi Ming Zhao

Vienna, 5th May, 2021 Thomas Rausch


Erklärung zur Verfassung der Arbeit

Dipl.-Ing. Thomas Rausch, BSc

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

Wien, 5. Mai 2021 Thomas Rausch


Acknowledgements

This thesis is the culmination of over four years of research, tinkering, personal growth, and wonderful collaborations. I have lots of things to be grateful for, have been immensely privileged to be where I am today, and I owe thanks to numerous people whose involvement in this thesis I want to acknowledge.

First I want to thank my supervisor Schahram Dustdar, who gave me the opportunity to pursue my PhD in his group. He has given me the two things I needed most to grow as a researcher and person: the freedom to pursue my own path, and the trust that this path will lead to great outcomes. I owe thanks to Waldemar Hummer, whose advice has helped me get through the many stages of the PhD. Waldemar has been a teacher, mentor, colleague, and friend, and I’m extremely grateful to him. His tenacity led to an internship at IBM Research, where we, together with Vinod Muthusamy, developed key ideas of this thesis. I also want to thank Mahadev Satyanarayanan from Carnegie Mellon University, and Padmanabhan Pillai from Intel Labs, who made possible my research stay at Satya’s lab at CMU. I owe thanks to my two thesis reviewers Ming Zhao and Weisong Shi, with whom I’ve also enjoyed stimulating conversations at edge computing conferences. I also want to thank Wolfgang Kastner and Uwe Zdun for agreeing to be on my proficiency evaluation committee, giving me valuable feedback in the earlier stages of my research.

I’ve had many colleagues at the Distributed Systems Group who have helped me directly or indirectly with my thesis. Renate, Christine, and Margret have been immensely helpful with administrative issues, and I’m very grateful for their patience. Alexander Knoll, who has become my friend, was always eager to help with acquiring hardware for testbeds, building compute clusters, and providing cute videos of cats. Moreover, I wish to thank all my students who have contributed to the research presented in this thesis, and who helped me grow as a teacher: Andreas Bruckner, Manuel Geier, Cynthia Marcelino, Jacob Palecek, Philipp Raith, Alexander Rashed, David Schneiderbauer, and Silvio Vasiljevic.

Finally, I’m immensely grateful to Katharina Krösl, who has been a fixed point in the tumultuous past decade of my life. Her never-ending patience, tolerance, and helpful character have significantly contributed to my life and my career. Through many fruitful discussions, we have found exciting topics at the intersection of our research that have opened up new avenues I hope to continue to pursue together.

The research presented in this thesis was funded by TU Wien research funds, as well as the Austrian Federal Ministry of Science through the Austrian infrastructure program (HRSM 2016) as part of the CPS/IoT Ecosystem project.


Kurzfassung

Edge Intelligence ist ein Post-Cloud-Paradigma, und die Konsequenz des letzten Jahrzehnts an Entwicklungen in den Bereichen der Künstlichen Intelligenz (KI), des Internets der Dinge (IoT), und der Human Augmentation. An der Schnittstelle dieser Domänen entstehen Anwendungsszenarien mit herausfordernden Anforderungen, etwa den Echtzeitzugriff auf Sensordaten aus der Umgebung, KI-Modell-Inferenz mit geringer Latenz, oder den sicheren Zugriff auf Daten aus privaten Edge Netzwerken, um KI-Modelle zu trainieren. Diese Anforderungen stehen in klarem Widerspruch zum zentralisierten Cloud Computing Paradigma, und haben weitreichende Auswirkungen auf das Design der unterstützenden Computersysteme. Edge Intelligence erfordert neuartige Systeme, die explizit für die Charakteristika von KI-Anwendungen und Edge Computing Umgebungen entwickelt sind. Diese Systeme verweben, mit den entsprechenden Abstraktionen, Cloud- und Edge-Ressourcen zu einem neuartigen Rechnerverbund, dem “Distributed Compute Fabric”. Die Ziele der vorliegenden Arbeit sind die Untersuchung der mit diesen neuen Systemen verbundenen Herausforderungen, und die Entwicklung und Evaluierung von Prototypen, um die Anwendbarkeit zu demonstrieren.

Um das Konzept der intelligenten Edge Netzwerke, oder “Edge Intelligence”, zu untermauern, analysieren wir zuerst aktuelle Trends in Human Augmentation, Edge Computing und KI. Wir erarbeiten Szenarien, in denen dezentrale IT-Ressourcen einen Rechnerverbund bilden, der für die verteilte Ausführung von Anwendungen dienlich sein kann. Das Analogon zum Cloud Server Computer ist der “multi-purpose Edge Computer Cluster”, welcher die Infrastruktureinheit für Edge Intelligence bildet. Anhand von Experimenten mit einem Prototypen, den wir entwickelt haben, verdeutlichen wir die Herausforderungen an Orchestrierungsmechanismen in solchen Systemen. Wir entwickeln neue Evaluierungsmethodologien für Edge Computing. Insbesondere entwickeln wir anhand einer Analyse neuartiger Edge Systeme ein Framework, um synthetische aber plausible Cluster- und Infrastruktur-Konfigurationen zu generieren, die als Input für einen Simulator verwendet werden.

Um verteilte Infrastruktureinheiten zu einem Verbund zu verweben, entwickeln wir zwei orthogonale Systeme: eine elastisch skalierende Message-Oriented Middleware, und eine Serverless Edge Computing Plattform. Von einem zentralen Deployment in der Cloud diffundieren Nachrichten-Broker nach Bedarf und Ressourcenverfügbarkeit in Edge Netzwerke. Das System optimiert kontinuierlich Kommunikationslatenzen, indem es die Distanz zwischen Clients und Brokern überwacht und das Netzwerk entsprechend neu konfiguriert. Unsere Serverless Plattform erweitert existierende Systeme mit einem neuartigen Scheduler, der Zielkonflikte zwischen Datentransfer und Codemobilität zur Laufzeit auflöst, und spezielle Anforderungen von KI-Anwendungen analysiert und entsprechend zu spezialisierten IT-Ressourcen, wie etwa GPUs, zuordnet. Weiters präsentieren wir eine Methode für das automatische Feineinstellen von Scheduler-Parametern, um Betriebsziele der Infrastruktur zu optimieren.

Abstract

Edge intelligence is a post-cloud computing paradigm, and a consequence of the past decade of developments in Artificial Intelligence (AI), the Internet of Things (IoT), and human augmentation. At the intersection of these domains, new applications have emerged that require real-time access to sensor data from the environment, low-latency AI model inferencing, or access to data isolated in edge networks for training AI models, all while operating in highly dynamic and heterogeneous computing environments. These requirements have profound implications for the scale and design of supporting computing platforms, and are clearly at odds with the centralized nature of cloud computing. Instead, edge intelligence necessitates a new operational layer that is designed for the characteristics of AI and edge computing systems. This layer weaves both cloud and federated edge resources together using appropriate platform abstractions to form a distributed compute fabric. The main goals of this thesis are to examine the associated challenges, and to provide evidence for the efficacy of the idea.

To further develop the concept of Edge Intelligence, we first discuss emerging technology trends at the intersection of edge computing and AI. We then examine scenarios where resources can be federated and served as a utility, and argue that multi-purpose edge computer clusters will be a fundamental infrastructural component. Through experiments on prototypes we have built, we highlight the challenges faced by operational mechanisms such as load balancers in these environments. We extend the body of evaluation methods for edge computing, which is, compared to cloud computing research, still underdeveloped. Most notably, we present a toolkit to generate synthetic infrastructure configurations and network topologies that are grounded in our examination of existing edge systems, and serve as input for a trace-driven simulator we have built.

To create the distributed compute fabric, we develop two orthogonal systems: an elastic message-oriented middleware, and a serverless edge computing platform. From a static centralized deployment in the cloud, we bootstrap a network of brokers that diffuse to the edge based on resource availability, and the number of clients and their proximity to edge resources. The system continuously optimizes communication latencies by monitoring client–broker proximity, and reconfiguring connections as necessary. Our serverless platform builds on existing container orchestration systems. The core is a custom scheduler that can make tradeoffs between data and computation movement, and is aware of workload heterogeneity and device capabilities such as GPUs. Furthermore, we demonstrate a method to automatically fine-tune scheduler parameters and optimize high-level operational goals.


Contents

Kurzfassung ix

Abstract xi

Contents xiii

Publications 1

1 Introduction 3
  1.1 Motivation 3
  1.2 Problem statement 4
  1.3 Contributions 9

2 Edge Intelligence 13
  2.1 The convergence of humans, things, and AI 13
  2.2 Operationalizing edge intelligence 17
  2.3 A distributed compute fabric for edge intelligence 25
  2.4 The role of serverless computing 30
  2.5 Summary 32

3 Edge infrastructure architectures 33
  3.1 The computing continuum 33
  3.2 Portable energy-aware edge clusters 38
  3.3 Edge infrastructure scenarios 52
  3.4 Summary 55

4 Edge system analysis and evaluation methods 57
  4.1 Profiling edge computer systems 58
  4.2 Synthesizing plausible edge infrastructure configurations 63
  4.3 Trace-driven simulation framework 70
  4.4 Related work 79
  4.5 Summary 80

5 Elastic message-oriented middleware for edge computing 83

  5.1 Motivation 84
  5.2 System architecture 88
  5.3 Broker network orchestration 91
  5.4 Proximity awareness 94
  5.5 Elastic diffusion 96
  5.6 Evaluation 100
  5.7 Related work 108
  5.8 Summary 110

6 A serverless platform for edge computing 113
  6.1 Introduction 114
  6.2 Background & application scenarios 116
  6.3 Design and implementation 122
  6.4 Automatic tuning of scheduling parameters 131
  6.5 Evaluation 133
  6.6 Related work 141
  6.7 Summary 143

7 Conclusion 147
  7.1 Summary 147
  7.2 Research questions 148
  7.3 Future work 152

Bibliography 157

Curriculum Vitae 181

Publications

The research presented in this thesis is partly based on the following peer-reviewed publications and tech reports:

• Thomas Rausch, Alexander Rashed, and Schahram Dustdar. Optimized container scheduling for data-intensive serverless edge computing. Future Generation Computer Systems, 114:259–271, 2021

• Thomas Rausch, Clemens Lachner, Pantelis A. Frangoudis, Philipp Raith, and Schahram Dustdar. Synthesizing plausible infrastructure configurations for evaluating edge computing systems. In 3rd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’20, 2020

• Thomas Rausch, Waldemar Hummer, and Vinod Muthusamy. An experimentation and analytics framework for large-scale AI operations platforms. In 2020 USENIX Conference on Operational Machine Learning, OpML’20, 2020

• Waldemar Hummer, Vinod Muthusamy, Thomas Rausch, Parijat Dube, and Kaoutar El Maghraoui. ModelOps: Cloud-based lifecycle management for reliable and trusted AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019

• Thomas Rausch, Waldemar Hummer, Vinod Muthusamy, Alexander Rashed, and Schahram Dustdar. Towards a serverless platform for edge AI. In 2nd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’19, 2019

• Thomas Rausch and Schahram Dustdar. Edge intelligence: The convergence of humans, things, and AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019

• Thomas Rausch, Stefan Nastic, and Schahram Dustdar. EMMA: Distributed QoS-aware MQTT middleware for edge computing applications. In 2018 IEEE International Conference on Cloud Engineering, IC2E’18, pages 191–197. IEEE, 2018

• Thomas Rausch, Schahram Dustdar, and Rajiv Ranjan. Osmotic message-oriented middleware for the internet of things. IEEE Cloud Computing, 5(2):17–25, 2018


• Thomas Rausch, Philipp Raith, Padmanabhan Pillai, and Schahram Dustdar. A system for operating energy-aware cloudlets. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC’19, pages 307–309, 2019

• Thomas Rausch, Cosmin Avasalcai, and Schahram Dustdar. Portable energy-aware cluster-based edge computers. In 2018 IEEE/ACM Symposium on Edge Computing, SEC’18, pages 260–272, 2018

• S. Nastic, T. Rausch, O. Scekic, S. Dustdar, M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Ristov, and R. Prodan. A serverless real-time data analytics platform for edge computing. IEEE Internet Computing, 21(4):64–71, 2017

Furthermore, selected topics were investigated together with students, reported in the following Master’s theses under my supervision:

• Andreas Bruckner. Self-adaptive distributed MQTT middleware for edge computing applications. Master’s thesis, TU Wien, 2021

• Cynthia Marcelino. Locality-aware data management for serverless edge computing. Master’s thesis, TU Wien, 2021

• Philipp Raith. Container scheduling on heterogeneous clusters using machine learning-based workload characterization. Master’s thesis, TU Wien, 2021

• Alexander Rashed. Optimized container scheduling for serverless edge computing. Master’s thesis, TU Wien, 2020

• David Schneiderbauer. Predictive analytics for smart homes using serverless edge computing. Master’s thesis, TU Wien, 2019

• Manuel Geier. Enabling mobility and message delivery guarantees in distributed MQTT networks. Master’s thesis, TU Wien, 2019

CHAPTER 1

Introduction

1.1 Motivation

The development of artificial intelligence (AI) has taken spectacular leaps over the past decade. Even in problem areas long thought to be unattainable without human reasoning and intuition, AI has achieved superhuman capabilities, as impressively demonstrated by the Jeopardy-winning IBM Watson [FBCC+10] or Google DeepMind’s AlphaGo [SHM+16] and AlphaZero [SHS+18]. The increase in data availability, computing power, and specialized AI hardware, as well as the advancement of machine learning (ML) techniques, has moved us into the fast lane towards a society that is shaped by AI in all its aspects. Although researchers and thinkers seem divided into AI optimists and pessimists, one thing seems clear: the optimal strategy for us humans, for whatever future awaits us with the development of AI, is to continue to foster a close partnership with AI. This partnership seems particularly important as the number of Internet connected sensors, devices, and autonomous agents that can sense and manipulate our physical surroundings continues to grow. Cisco reports that the number of networked devices per person will grow from 2.4 in 2018 to 3.6 by 2023, resulting in 28 billion Internet connected devices [Cis21]. Moreover, they estimated that by 2021, 850 ZB of data will be generated by all people, machines, and things [Cis19]. With the appropriate human–computer interfaces, and a cyber-physical fabric that permeates our environment, together with the unprecedented amount of computing power and digitally persisted knowledge, there is an opportunity for human cognition to further evolve beyond its current limitations.

Edge computing [GLME+15, SD16, Sat17] will play a fundamental role in this development. The premise of edge computing is that the centralized model of cloud computing is impractical to support emerging Internet of Things (IoT), smart city, and mobile computing applications. First, there are physical and architectural limitations to what cloud-based solutions can deliver in terms of latency and bandwidth [SBCD09, Sat17]. For example, in [RHS+21] we showed that, even in simple cases such as detecting objects in a video feed using deep neural networks, current cloud-based solutions incur between 400 and 800 milliseconds of additional latency compared to edge computing solutions, which is impractical for real-time applications. In [RRD21] we showed that moving data for training ML models from the edge to the cloud causes serious network scalability issues, and decreases throughput by an order of magnitude. Second, there are legitimate concerns about privacy and trust [SD16], particularly as we become more interconnected with, and dependent on, personalized AI. Moreover, as we move towards artificial general intelligence (AGI), previously isolated AI programs will become increasingly interconnected and begin to collaboratively solve increasingly complex tasks [MG18]. While we can bootstrap the development of these systems using cloud-based solutions, a decentralized model where programs make use of geographically dispersed edge computing resources is likely to follow. Computational intelligence and data will gradually be pushed from the centralized data centers closer to the edge of the network.

Edge intelligence is the fusion of edge computing and artificial intelligence [ZCL+19, ZWL+19, RD19]. It is the ability of edge computing systems to: support AI-based applications, provide efficient access to data and computation at the edge to train and serve ML models, and use AI to optimize their own operation at runtime. In this new paradigm that sits at the intersection of human augmentation, IoT, AI, and edge computing, new families of applications emerge that require real-time, location-based data about environments at different levels of fidelity; and appropriate computing resources to both process and store data close to where they are generated. We hypothesize that such applications will be supported by a computing model much like modern enterprise applications are supported by cloud computing systems today, where a platform provider takes care of efficiently operating applications on the underlying utility computing infrastructure. Researchers have put forward several candidates for this infrastructure [PDT18], such as geo-distributed cloudlets [SBCD09], single-board computer (SBC) clusters [EPR+17, BJP+20], or even clusters of mobile devices [HAHZ15]. Although there is currently no consensus on what the dominant infrastructure architecture will be, any one of them will be characterized by extreme heterogeneity in terms of computing capabilities, and the physical proximity to data producers and consumers. This has profound implications for the scale and design of supporting computing platforms, and necessitates a new operational layer that is expressly designed for the characteristics of edge intelligence. This layer weaves both cloud and federated edge resources together, and provides appropriate platform abstractions, to form a distributed compute fabric.

1.2 Problem statement

A distributed compute fabric for edge intelligence is an edge computing system that (1) federates geographically dispersed and computationally diverse computing infrastructure to form a single transparent computing substrate, (2) provides a computing platform tailored to edge AI workloads, which abstracts the increased complexity of this infrastructure by elevating concepts from edge computing and AI into the system’s operational mechanisms (such as data access, scheduling, or autoscaling), and (3) provides a sensing and communication middleware to enable real-time data exchange and IoT integration. The main goal of this thesis is to examine the challenges of building, operating, and evaluating such a distributed compute fabric for edge intelligence, and to provide evidence for the efficacy of the idea. Moreover, we want to examine to which extent existing systems abstractions from cloud computing can be re-used or extended to solve the associated challenges.

1.2.1 Research questions

The research presented in this thesis is driven by three overarching research questions:

RQ1. Which are appropriate infrastructure architectures that enable a distributed compute fabric for edge intelligence?

Cloud computing architecture, and the computing infrastructure used for cloud computing, is extremely well understood. Server computers and high-performance networking appliances are consolidated into massive data centers, abstracted by hardware virtualization using hypervisors, and then managed and made accessible as a utility by virtual infrastructure management software [EPM13]. Edge resources are much more diverse and provide a variety of computational platforms, such as GPUs or AI accelerators, to support the equally diverse range of use cases [SGL19]. At the present time, there is no consensus on what edge computing infrastructure is precisely. The following three examples illustrate the disparity:

1. In Mobile Edge Computing (MEC) scenarios, telecommunication companies operate small data centers in logical proximity to LTE/5G base stations, thereby allowing providers to deliver cloud services directly at the last mile [MB17].

2. In a smart city scenario, a network of sensing nodes with limited compute capabilities (such as an Array of Things [CBSG17] or similar product), may be complemented by an equally distributed, but lower density network of cloudlets [SBCD09], which are small-scale compute clusters with diverse computing capabilities.

3. Many envision a more generic, but strictly three-tiered architecture where: tier one is the cloud acting as a centralized coordinator and configurator, tier two is the Fog [SSBL16] (a metaphor for a cloud closer to the ground) that provides limited compute and storage closer to the edge, and tier three is made up of IoT sensors and actuators, or edge devices such as mobile phones [SGL19].

We hypothesize that, to enable edge intelligence systems at scale, there will not be a one-size-fits-all model. Instead, geographically dispersed and heterogeneous edge computers will be federated with cloud resources to form a multi-purpose computing substrate. Novel middleware platforms and operational mechanisms will weave edge and cloud infrastructure into a single distributed compute fabric. However, it is currently unclear whether (a) a utility computing model like cloud computing works for edge computing applications, and (b) which infrastructure architecture would be the best candidate for implementing such a model to support a wide range of use cases.

RQ2. Which aspects of edge intelligence systems should be abstracted as first-class citizens into platforms that manage such systems?

The characteristics of edge intelligence systems significantly increase the complexity of the mechanisms required to manage these systems, when compared to cloud computing. Consider key operational mechanisms such as virtualization, auto scaling, and load balancing. These concepts are so fundamental to the cloud computing model that most cloud platforms provide them as native platform functions. This is possible because cloud computing provides concepts such as virtual machine specifications, SLAs, data center regions, etc., as platform abstractions. These abstractions in turn allow platform operators to develop complex operational mechanisms, such as auto scaling, VM consolidation, or energy-aware machine provisioning, that work transparently to the platform user. In edge computing, concepts such as device proximity, data locality, or energy consumption play a critical role in the effectiveness of these mechanisms, and may therefore have a similar level of significance. Elevating them to first-class citizens may allow us to develop reusable operational mechanisms that work for a variety of edge applications, rather than something ad-hoc tailored to a specific application. Examples derived from the presented scenarios include: elastically expanding a broker network to the edge [RND18], data-locality-aware resource allocation [RRD21], or energy-aware load balancing [RAD18].
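To make the idea concrete, the following sketch shows how such concepts could surface as declarative, first-class placement constraints, analogous to how cloud platforms expose zones and SLAs. It is a hypothetical illustration; the class names, label keys, and thresholds are ours and not part of any existing platform.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch: data locality and energy as first-class placement
# constraints. All names and label keys here are illustrative.

@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)  # e.g. {"locality/zone": "edge-1"}
    battery_soc: float = 1.0                    # battery state of charge, 0..1

@dataclass
class PlacementConstraint:
    required_zone: Optional[str] = None  # data locality: run where the data is
    min_battery_soc: float = 0.0         # energy awareness: spare drained nodes

    def admits(self, node: Node) -> bool:
        if self.required_zone and node.labels.get("locality/zone") != self.required_zone:
            return False
        return node.battery_soc >= self.min_battery_soc

nodes = [Node("nuc-0", {"locality/zone": "edge-1"}, battery_soc=0.9),
         Node("rpi-3", {"locality/zone": "edge-2"}, battery_soc=0.2)]
constraint = PlacementConstraint(required_zone="edge-1", min_battery_soc=0.5)
print([n.name for n in nodes if constraint.admits(n)])  # -> ['nuc-0']
```

Because the constraints are declared against platform-level abstractions rather than hard-coded into an application, the same mechanism can be reused by schedulers, autoscalers, and load balancers alike.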

RQ3. What are appropriate methods for developing and evaluating edge computing systems to allow more generalizable conclusions?

In cloud computing, data sets like Google’s Borg cluster traces [RTG+12] are used extensively by researchers to evaluate cloud operations mechanisms, because they provide real cluster configurations (e.g., density and type of compute hardware), and real workload traces (e.g., VM scheduling requests). Moreover, systems like CloudLab [RET14] allow free and dynamic provisioning of compute resources to build cloud computing testbeds. Evaluating edge computing systems is much more difficult due to the lack of established benchmarks, reference architectures, trace data sets, or even real-world applications. We attribute this to RQ1, and the subsequent lack of available testbeds. Consequently, edge computing systems researchers often have to rely on simulators such as iFogSim [GVDGB17] or EdgeCloudSim [SOE17], or emulation techniques such as EmuFog [MGG+17]. The benchmarks executed with these tools are then often tuned to the specific approach, because there are no available trace data or cluster configurations from real edge computing systems. This lack of trace and profiling data of real edge computing hardware is problematic for edge computing research that relies on simulators. Overall, we observe that there are serious gaps in the available methodologies to develop and validate operational mechanisms for edge computing systems.


1.2.2 Challenges and open problems

There are many open challenges to realize the type of system we have described. In this section, we outline the key challenges which we aim to address with our contributions.

Edge intelligence requirements

Edge intelligence is a nascent field of research, and the community is still building consensus on the fundamentals. Around 2019, several papers appeared that discussed applications, and different but orthogonal aspects of edge intelligence. For example, Zhou et al. [ZCL+19] focused on deep learning and other ML algorithms tailored to edge computing, as well as fundamental architectures for ML training and inferencing at the edge. Zhang et al. [ZWL+19] presented the design of OpenEI, a framework for enabling edge intelligence applications. It introduces a layer for selecting and accessing models over REST APIs, as well as an abstraction layer over ML frameworks. So while there has been some conceptual groundwork, concrete requirements for platforms that manage edge AI workflows, as well as the supporting infrastructure for edge intelligence, are still underdeveloped and require further investigation.

Building a distributed compute fabric

Infrastructure for edge intelligence

As we described in RQ1, there is currently no consensus on what the supporting computing infrastructure for edge intelligence will look like exactly. Several proposals have been put forth that are either tailored to specific use cases like wearable augmented reality (AR) [HCH+14] or smart city monitoring [CBSG17]; or simply propose smaller, but more geo-distributed data centers [MB17]. This disparity makes it very difficult to develop generalizable operational mechanisms, such as autoscaling, load balancing, or scheduling, for edge systems. Since developing such mechanisms is one of the major goals of this thesis, a better understanding of the underlying infrastructure and its runtime behavior, as well as architecture models for edge intelligence, is necessary.

Sensing and communication middleware

Components across the entire edge intelligence stack, from IoT sensors, actuators, and edge computing resources, to running applications, require low-latency event-driven communication [HKM+17, PDT18]. Currently, device communication in IoT and smart city environments is facilitated by cloud-based message-oriented middleware (MOM) based on the publish-subscribe model [Bar15, Sta19]. Using cloud-based MOM requires routing messages through the public Internet, which incurs significant latencies [KKY+10] and inter-network traffic, and causes privacy issues. Edge intelligence instead necessitates the use of edge resources for message brokering, while also allowing centralized configuration and management. It is particularly challenging to maintain low latencies in edge environments where both clients and edge resources are mobile, and regularly leave and enter the network [RDR18, RND18]. Two key objectives are therefore: (1) elastically expanding and retracting brokers from the cloud to the edge, and (2) re-configuring client–broker connections at runtime to maintain low response times. For (1), a promising approach lies in novel biology-inspired computing paradigms such as Osmotic Computing [VFD+16], where services diffuse to the edge or back to the cloud in order to meet some operational goal. This differs from cloud-based autoscaling in MOM, which does not consider the proximity of clients and brokers [GSGKK15]. For (2), existing IoT message brokers have configuration approaches that can bridge edge brokers, but these are typically static [Sca14, Gar16]. Systems such as PADRES [JCL+10] implement overlay network techniques, but it is unclear how well these work in dynamic edge environments under client and broker mobility.
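A minimal sketch of objective (1) is shown below: per edge region, it decides whether to diffuse a broker to the edge or retract it back to the cloud based on client proximity rather than classic utilization metrics. The thresholds, names, and signature are hypothetical, not the actual policy of the system presented in Chapter 5.

```python
# Minimal sketch of proximity-driven broker diffusion. Assumes the platform
# monitors, per edge region, the average client-broker round-trip latency
# and the number of connected clients. All values are illustrative.

SPAWN_LATENCY_MS = 50   # diffuse a broker to the edge above this latency
RETRACT_CLIENTS = 2     # retract when fewer clients remain in the region

def diffusion_step(has_edge_broker: bool, avg_latency_ms: float,
                   num_clients: int, edge_capacity_free: bool) -> str:
    """Return 'spawn', 'retract', or 'keep' for one edge region."""
    if not has_edge_broker:
        if avg_latency_ms > SPAWN_LATENCY_MS and edge_capacity_free:
            return "spawn"    # clients are far from their broker: diffuse
        return "keep"         # the cloud broker is still adequate
    if num_clients < RETRACT_CLIENTS:
        return "retract"      # osmotic pressure dropped: flow back to cloud
    return "keep"

print(diffusion_step(False, 120.0, 8, True))  # -> spawn
```

Scaling on proximity instead of CPU load is precisely what distinguishes this style of elasticity from conventional MOM autoscaling.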

A computing platform for edge intelligence

Another key component of our distributed compute fabric is an application runtime environment that is tailored to edge AI workloads. The complexity and scalability of cloud computing applications has led to sophisticated new system abstractions that we can borrow from, which simplify application deployment and operation. In the serverless model, for example, application developers define application code as cloud functions, and the events that trigger their execution [JSSS+19]. Developers do not have to know anything about the underlying infrastructure. The serverless platform is responsible for resource allocation (scheduling), autoscaling, discovery, and executing function code [HFG+18]. Analogous to the idea of cloud functions, we hypothesize that edge functions can serve both AI model training and inferencing, while enabling complete transparency of edge infrastructure. This is particularly challenging given the geo-distributed and heterogeneous nature of the infrastructure [SGL19]. Two key challenges are: (1) providing the correct abstractions for developing and running edge AI functions on heterogeneous infrastructure, and (2) resource allocation strategies that can make tradeoffs between data and computation movement and generalize to different infrastructure scenarios. For (1), research has shown that serverless can be a good abstraction for serving AI models [IMS18], but has also revealed several limitations for data-centric computing [HFG+18] in general. Existing platforms cannot reason over the dataflow of functions, or the type of workload they are executing and which computing platform it is optimized for. For (2), there is a wealth of research on cluster scheduling [SKAEMW13] and service placement [ASDL19] algorithms, but it is unclear how well these work in serverless edge computing environments. As we described in RQ1, edge infrastructure may be very different depending on the use case, so schedulers must generalize across different infrastructure scenarios.
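To illustrate challenge (2), the sketch below scores candidate nodes by weighing estimated data-transfer time against capability match and container image locality, in the spirit of multi-criteria function scheduling. The weights, feature names, and numbers are invented for the example and are not the scheduler presented in Chapter 6.

```python
# Sketch of multi-criteria node scoring for serverless edge scheduling.
# Nodes and functions are plain dicts here; a real platform would use
# richer models. All weights and values are illustrative.

def score_node(node, fn):
    """Higher is better: combine data locality, capability, image locality."""
    # Data locality: estimated seconds to pull the function's input data.
    transfer_s = fn["input_mb"] * 8 / node["bandwidth_to_data_mbps"]
    locality = 1.0 / (1.0 + transfer_s)
    # Capability match: reward nodes with the accelerator the workload wants.
    capability = 1.0 if fn.get("wants_gpu") and node.get("has_gpu") else 0.5
    # Code locality: prefer nodes that already cache the container image.
    image = 1.0 if fn["image"] in node["cached_images"] else 0.0
    return 0.5 * locality + 0.3 * capability + 0.2 * image

nodes = [
    {"name": "cloud-vm", "bandwidth_to_data_mbps": 40, "has_gpu": False,
     "cached_images": {"ai/train"}},
    {"name": "edge-tx2", "bandwidth_to_data_mbps": 800, "has_gpu": True,
     "cached_images": set()},
]
fn = {"image": "ai/train", "input_mb": 500, "wants_gpu": True}
best = max(nodes, key=lambda n: score_node(n, fn))
print(best["name"])  # edge-tx2: proximity to data and a GPU outweigh the cache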

Evaluating edge computing systems

A key challenge in building the type of edge computing systems we described is performing robust evaluations that allow generalizable conclusions. As we described in RQ3, given the lack of real-world edge infrastructure, edge computing researchers often rely on simulators [GVDGB17, SOE17] or emulation techniques [MGG+17]. However, the benchmarks executed with these tools have several issues. First, the lack of trace datasets for edge systems requires the simulators to make certain assumptions when simulating workload execution on edge infrastructure. We have found tentative evidence that these assumptions, for example for energy models [CRB+11], may have limited applicability for edge computing infrastructure [RAD18]. Second, simulating or emulating a system requires a description of the system under test, i.e., the nodes in a cluster, their hardware specifications, and the network topology that connects them. Most tools provide ways to either create topologies manually [GVDGB17], which typically results in small topologies with few nodes, or randomly based on graph properties such as degree distributions [MLMB01, MM08], which may not produce representative topologies.
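The toy example below conveys the alternative idea pursued in this thesis: instead of sampling random graphs, compose parameterizable building blocks (here, SBC-based cloudlets behind a shared WAN uplink). Node names, latency ranges, and parameters are made up; the actual generator is presented in Chapter 4.

```python
import random

# Toy topology synthesis from parameterizable building blocks rather than
# random graphs. A "cloudlet" is a set of single-board computers (SBCs)
# behind a local switch, uplinked to the cloud over a WAN.

def synthesize(num_cloudlets=3, nodes_per_cloudlet=4, seed=42):
    rng = random.Random(seed)
    links = []  # (node_a, node_b, latency_ms)
    for c in range(num_cloudlets):
        uplink = f"switch_{c}"
        # Intra-cloudlet links: fast local network.
        for n in range(nodes_per_cloudlet):
            links.append((f"sbc_{c}_{n}", uplink, rng.uniform(0.2, 1.0)))
        # Uplink to the cloud over a WAN with higher, varying latency.
        links.append((uplink, "cloud", rng.uniform(20, 80)))
    return links

topology = synthesize(num_cloudlets=2, nodes_per_cloudlet=3)
for a, b, lat in topology:
    print(f"{a} <-> {b}: {lat:.1f} ms")
```

Varying the parameters (cluster size, dispersion, link quality) yields families of plausible configurations against which an operational mechanism can be tested for generalizability.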

1.3 Contributions

This thesis has five main contributions that address the presented open challenges. We now explain the relationship between the respective thesis chapters in which the contributions are presented, and the previously published material. The following figure shows an overview of how the contributions are connected across the different systems.

[Figure: Overview of how the contributions are connected across the different systems. The distributed compute fabric comprises a platform (scheduler, autoscaler, load balancer, function repository), a computing substrate with cluster middleware, and a sensing and communication middleware serving the AI workflow (prep, train, serve). The analysis framework contributes profiling (C3.1), infrastructure synthesis (C3.2), and simulation (C3.3), which drive a parameter optimizer for the platform (C5). Legend: C1 edge intelligence requirements; C2 edge infrastructure architectures; C3 edge system analysis and evaluation methods; C4 elastic MOM for edge computing; C5 a serverless platform for edge computing.]


C1. Edge intelligence: applications and requirements analysis

This contribution builds on [RD19], where we examined technological trends at the intersection of human augmentation, Internet of Things (IoT), smart cities, and AI systems; and discussed how these trends are leading us to a new paradigm of Edge Intelligence systems. In the paper, we develop a theory about the key elements to technologically realize edge intelligence systems. In Chapter 2 we present a revised version of the argument, which lays out the foundations for the problem statement presented in Section 1.2. The parts focusing on AI workflows, edge AI applications, and their operationalization were previously published in [HMR+19, RHM+19, RHM20b]. We thereby contribute to the existing body of knowledge on Edge Intelligence [ZWL+19, ZCL+19] by focusing in particular on general computer systems support for edge intelligence, and deriving requirements for the underlying edge infrastructure.

C2. Edge infrastructure architectures

In Chapter 3 we present a comprehensive discussion of different candidates for edge infrastructure that is a synthesis of the research presented in [RD19, RRD21, RLF+20]. To better understand the emerging edge infrastructure architectures, we have built several testbeds that comprise a selection of edge computers from the computing continuum. We identified a concrete gap in this continuum, which we addressed in [RAD18]. We presented a prototype for a portable cluster-based edge computer, and performed several experiments that highlight the challenges that systems using such computers face. In [RLF+20], we examined different distribution models of edge infrastructure, and gathered concrete numbers on existing or emerging deployments. The contribution is a significant step towards a better understanding of edge computing infrastructure, and the challenges in enabling the system abstractions we have become accustomed to from cloud computing.

C3. Edge system analysis and evaluation methods

Over the course of the research presented in this thesis, we have developed various evaluation methodologies tailored to edge computing systems. In Chapter 4 we present the synthesis of the results as a separate contribution in three parts.

C3.1. Profiling edge computers

The cluster-based edge computer we presented in [RAD18] was the first system for which we created comprehensive profiling experiments. In subsequent research [RRPD19] we systematized the process and the tooling into an experimentation and analytics framework called Galileo, which allows researchers to (a) describe and perform reproducible profiling experiments, (b) gather and store system monitoring data in a common data model, and (c) enable ad-hoc and statistical data analysis. We have published Galileo as open source software [Rau20b], and used it to gather profiling data of a broad range of edge devices and workloads, which we have published as open data sets [RR20]. Furthermore, we have used profiling data to drive trace-driven simulations in [RRD21, Ras20, Rai21].


C3.2. Generating plausible edge system topologies

Edge system simulators or emulators require as input a description of the infrastructure and the network topology under test. In [RLF+20] we presented Ether, a tool that generates such synthetic infrastructure configurations and network topologies, based on reference scenarios derived from real-world edge computing systems. The scenarios are parameterizable to vary characteristics such as cluster size, geographic dispersion, device heterogeneity, and network topology. We demonstrate how our tool can generate synthetic infrastructure configurations for the reference scenarios, and how these configurations can be used to evaluate different aspects of edge computing systems. We have since used the tool in the evaluations of [RRD21, Rai21, Bru21, Pal21].

C3.3. Trace-driven simulation framework

Some problems of existing simulation models for edge computing were uncovered in [RAD18] and by [Rai21]. Key ideas of trace-driven simulation for AI systems were initially developed in the context of [RHM20b] and then extended for edge computing in the evaluation of [RRD21]. We developed a standalone simulation framework for edge computing [RRR20] that extends existing simulation models with (i) a high-level flow-based network simulation, (ii) improved modeling of edge system energy consumption, and (iii) improved modeling of resource contention and workload interference. For our work, the simulator serves two purposes: first, it allows us to perform scalability and sensitivity analyses; and second, by gathering system traces at runtime, we can use the simulator to run optimization algorithms that tune parameters of the system’s orchestration mechanisms.

C4. Elastic message-oriented middleware for edge computing

In Chapter 5 we present our contribution to the sensing and communication middleware layer of a distributed compute fabric. In [RDR18] we presented a system design for a messaging middleware based on the principles of osmotic computing. The system bootstraps a broker network from the cloud that dynamically diffuses additional brokers to the edge as necessary. In the thesis we extend the ideas of the paper by introducing concrete mathematical models for the diffusion process. In [RND18] we presented a prototype of this system that implements algorithms to orchestrate a broker–client network based on proximity and network QoS. We showed that our system can significantly reduce end-to-end latencies even in the face of high client mobility and dynamic resource availability. The initial prototype revealed several limitations, which were later investigated in [Gei19] and [Bru21]. First, subscriber mobility would lead to message loss, which we addressed by introducing a transactional re-configuration mechanism. Second, our network QoS monitoring protocol puts significant stress on the network, which we addressed by extending the Vivaldi virtual network coordinate system. The two principles we developed that apply to edge computing systems in general are (i) proximity as a first-class citizen in the orchestration model, and techniques to monitor proximity efficiently; and (ii) elastic diffusion, i.e., the idea of auto-scaling based on proximity rather than classic system metrics such as resource utilization.


C5. A serverless platform for edge computing

In Chapter 6 we present our contribution to the design and implementation of key mechanisms to realize a computing platform for edge intelligence. The platform builds on the principle of serverless computing, but is tailored to edge AI workloads and the operational challenges of edge infrastructure. In [NRS+17] we introduced the concept of a serverless edge computing platform, and its principal components. To investigate existing platforms and compare cloud and edge-based solutions, in [Sch19] we developed a smart home data analytics application using the serverless model, which was implemented in both a cloud-based and an edge-based platform. The evaluation revealed the importance of co-designing data and computation management, i.e., making the tradeoff between moving function code to the edge, or moving data to the cloud. We presented the design and fundamentals of a serverless edge computing platform in [RHM+19], where we focused specifically on supporting data-intensive applications, such as running AI workflows. We introduced a high-level programming model that elevates concepts from AI workflows to first-class citizens. We then developed a prototype of this platform in [RRD21], the core of which is a custom function scheduler that makes tradeoffs between code and data movement, and considers the heterogeneity of workloads and compute resources. Moreover, we developed a method to automatically fine-tune scheduler parameters to different edge infrastructure scenarios. The evaluation revealed several limitations, which are the subject of subsequent research in [Rai21, Pal21, Mar21].

CHAPTER 2

Edge Intelligence

This chapter outlines our vision of Edge Intelligence based on current development trends and their associated challenges for building the supporting computer systems. To solve the challenges we identify, we propose a distributed compute fabric for edge intelligence that federates geographically dispersed and heterogeneous edge computing resources, provides a sensing and communication layer, and employs advanced operational mechanisms to deploy and manage applications on this federated infrastructure. The presented analysis and the challenges we elicit are the foundation for the remainder of this thesis. We begin by analyzing the current development trends of human augmentation in Section 2.1 and edge AI in Section 2.2. In Section 2.3, we give a detailed discussion of the critical system components we have identified for a distributed compute fabric: sensing and communication middleware, a computing substrate, and intelligent operational mechanisms. In Section 2.4, we examine existing computing models and how they apply to such a system. The summary in Section 2.5 concludes the chapter.

2.1 The convergence of humans, things, and AI

Edge AI and Human Augmentation are currently considered to be two of the major emerging technology trends [Pan18], which we attribute to three key technological developments. First, the increasing number of Internet connected sensors and smart things embedded in our surroundings, i.e., the Internet of Things (IoT), and wearable smart devices [Cis21], generate a wealth of opportunities to create many types of applications that can enhance our everyday experience and quality of life. Second, the increase in data availability [Cis19], consolidated computing power, as well as the advancement of machine learning (ML) techniques, have fostered the development of AI supported applications that continue to transform virtually every aspect of our daily life. Third, the growing friction between cloud-based AI solutions and the need of many applications to analyze high volume and high velocity data streams in near real-time [SSX+15, NRS+17] has pushed hardware developers to create miniaturized AI accelerators that promise to bring AI to the extreme edge. In this section we discuss these developments in more detail.

[Figure 2.1: The currently observable evolution of the cyber-human, in four steps: (1) the smartphone as gateway to the Internet; (2) VR/AR devices connected to city sensors, cameras, and cloudlets via gateways; (3) immersive edge-served experiences (the current transition); and (4) neural-link interfaces.]

2.1.1 The cyber-human evolution

We take a step back and discuss the evolutionary steps of the cyber-human, as illustrated in Figure 2.1. The Internet allows us to store and access information at immense scales, and smartphones have become our main gateway to this world. For many, the smartphone is already an extension of their self, representing the first step in the cyber-human evolution. Typing a search term into your smartphone and parsing the information it displays takes us a long time compared to the bandwidth available to a superhuman AI. Similarly, we control our smart things via manual gateways, such as the apps on our phone, at a very low bandwidth. With the development of increasingly sophisticated smart devices such as mixed reality smart glasses, combined with augmented reality (AR) and AI techniques, step two has nudged humans and things closer together [HCH+14]. AI applications help us, or our cars, to recognize objects and overlay our field of vision with contextual information. As edge devices become more intelligent, and more and more sensor data in our surroundings can be aggregated, processed, and served via edge resources at high bandwidth, step three will fully realize a transparently immersive experience. However, for the seamless augmentation of human cognition, the bandwidth requirements will be even higher. Imagine a world in which we perceive and interact with our cyber-physical surroundings via neural-interface connected gateways and AI assisted control mechanisms. The requirements of step four in the cyber-human evolution are fundamentally challenging our way of building systems.

An important aspect to highlight in this evolution is the trend towards artificial general intelligence (AGI), i.e., the development of AI that can solve the intellectual tasks that a human being can [PG07]. From a system’s perspective, it is likely that AGI will not be achieved by a single method, but rather by a context-sensitive ensemble of different task-specific AI agents that permeate our environment via an underlying computing substrate [HJO+10]. Some projects that work towards AGI are based on the idea of an interconnected and self-governing network of AI solution agents that can cooperate to solve increasingly complex problems as the network grows [MG18]. In such a system, edge computing will play a fundamental enabling role. Compared to the cloud, the edge can provide trust and application-specific hardware in close proximity to data and users, and thereby handle the immense bandwidth and low-latency requirements.

2.1.2 How Edge Intelligence can increase the fidelity of our perception of reality

Augmenting human cognition with computation is an idea that dates back decades and has been explored by science-fiction writers and researchers alike. In 2004, Satyanarayanan described a wearable cognitive assistance use case that he said at the time was “clearly science fiction” [Sat04], but which was made a reality almost ten years later [HCH+14]. This development is the transition between step two and three in our cyber-human evolution timeline. Today, products such as Google Lens¹ attempt to bring these systems to the masses by integrating cloud-based visual analytics AI systems with personal edge devices. Going a step further, consider a high-fidelity personalized visual discovery assistant. You use your smartphone to visually discover things that your assistant knows are important to you. By pointing your camera along a shopping street, the application recognizes streets, buildings and objects, and displays an AR overlay with retailers that have offers you may be interested in, restaurants that have food you like, or points of interest you may want to visit, combined with real-time information such as the number of guests currently in a restaurant or shop. With current state-of-the-art technology, we would realize such applications via a direct link between the camera app and the cloud. There are however use cases where momentary information is important, such as high-speed manufacturing lines, or virtual assistants for fast-paced interactions, like navigating through a smart city, where cloud-based analytics become infeasible. There are numerous other wearable cognitive assistance applications that require such momentary information [Sat17]. These scenarios focus mostly on analyzing the video stream of a camera, and annotating it with contextual information. Future applications will move beyond isolated sensing, and synthesize data from many sources that cannot be streamed into the cloud. Consider an urban cognitive assistance application that guides you through a public space shared with other people and autonomous agents such as self-driving cars and mobile robots. Suppose each agent broadcasts its sensor data (gyroscope, accelerometer, location, etc.). Edge resources aggregate sensor data sources in proximity to create a hyper-reality overlay that gives you a personalized experience and guides you through the space. Figure 2.2 illustrates this. As the fidelity of this overlay depends on available proximate resources and capabilities [LH18], an approach is needed for a seamless edge/cloud AI system that can transparently trade off accuracy and computation, where accuracy increases with computational power. Suppose these applications could be served by a continuum of edge resources and the cloud. If models are served via edge resources, the fidelity of recognition can be much higher, because models can be trained for specific domains and then deployed in proximity of where they are required. Data can then be streamed instead of chunked (e.g., like Google Lens sends individual images to the cloud), which can further increase the fidelity and responsiveness of an AR system.

¹ https://lens.google.com/


[Figure 2.2: Example of augmenting human cognition with edge intelligence, which we described in [RHS+21]. (a) A system for urban cognitive assistance, where augmented reality wearables (4) connect via local gateways (3) to proximate sensors and compute resources (2), managed by an edge computing platform (1). (b) Silhouette of an approaching car behind a wall that is displayed on the user’s wearable device.]

Step four in the cyber-human evolution will confront researchers with a multitude of challenges over the next decade. Engineering and managing a system to allow the pervasive and seamless integration of all these applications at scale into our everyday lives requires a new way of thinking about infrastructure and processes, beyond our current understanding in pervasive computing, in particular as AI agents become more involved and applications co-evolve with ML models. Deep neural networks can be pre-trained on massive datasets in the cloud, localized data can be used to refine models at the edge using transfer learning, and applications can then be deployed on edge resources optimized for inferencing. We discuss this in depth in Section 2.2.2.


2.1.3 The AI-enhanced smart city

Smart cities provide a number of exciting opportunities for edge computing [DNŠ17]. In fact, we believe that edge intelligence will be the key to fully connect the cyber-physical city with its cyber-human inhabitants. Smart public spaces understand situations by interpreting activity. Many dispersed cameras can be used to create data analytics models (like crowd behavior [ZWT12], flooding or fire spreading models, or ecosystem/plant monitoring [Bec18]). An AI could use weather data together with camera streams to determine road driving conditions, and put that information in context with historical data to assess the immediate risk of accidents. Autonomous vehicles can in turn use this information to adapt their driving behavior. In the case of accidents, a smart city system would react by immediately informing authorities, dispatching first responders, and re-routing traffic. Air quality sensors distributed throughout the city can be used to track development and spread of air pollution. A number of similar use cases with different sensory and analytics requirements could be conceived. What these use cases illustrate is the common need for real-time, location-based data about urban environments and activity at different levels of fidelity. Moreover, appropriate edge resources are necessary to both process and store data close to where they are generated [SVDI16]. Sending data to the cloud to do inferencing on ML models deployed as web services is not feasible in many cases, in particular as the required perception fidelity increases. Researchers and practitioners will therefore be challenged to reconcile smart-city scale sensing capabilities and compute infrastructure with scalable deployment of AI applications to the edge.

2.2 Operationalizing edge intelligence

We have discussed several applications that emerge from the convergence of humans, things, and AI, and have shown that AI services play a significant role in edge intelligence. Delivering AI services involves complicated workflows where data scientists and software engineers collaborate to create and deploy ML models underlying the application [HMR+19, HDB17, BBC+17]. Data is curated and explored, feature engineering and training are performed to create ML models, which are subsequently deployed, monitored, and updated during their lifetime. As edge intelligence transitions from individual and isolated prototypes into interconnected production-grade deployments, the ad-hoc way of building and deploying ML models and AI applications will become impractical. Instead, edge AI operations platforms will fully automate the end-to-end process and provide a closed feedback loop between runtime metrics of deployed models and the workflows that create them [HMR+19]. Such platforms are already a reality for cloud-based AI workflows, as we will show. In the case of edge AI, however, the process becomes much more involved. Heterogeneous computing resources, data locality, scalability issues, etc., have to be considered when automating workflows. In this section, we first summarize existing knowledge on AI operationalization. We will then analyze different operationalization requirements for edge intelligence, identify key mechanisms that can facilitate these scenarios, and then discuss various architectural possibilities and arising challenges.


2.2.1 AI lifecycle management: operations for AI

The idea of operationalizing AI has become pervasive throughout both industry and research [Gar17]. Software engineering researchers and practitioners have, over the past decades, developed a variety of DevOps methods for managing the software lifecycle. Continuous integration (CI) and delivery (CD) have become de-facto standards in modern software development processes. Advanced methods such as A/B testing or continuous experimentation are also employed more and more. Only recently have such lifecycle management techniques and tools appeared for AI applications. AI pipelines are workflows, i.e., graph-structured compositions of tasks, that create or operate on machine learning models [HMR+19]. Similar to a CI pipeline that is triggered when new code changes arrive and then automatically compiles, tests, lints, and deploys software artifacts, an AI pipeline defines a sequence of steps to manage the AI application lifecycle, from data preprocessing, to model training, to custom steps such as model compression or robustness checking. Transfer learning techniques allow creating base models from aggregated data, which are then refined using personal or domain-specific data, creating a hierarchical workflow. Some examples of such workflows are shown in Figure 2.3; a minimal sketch of such a task graph follows the figure.

Figure 2.3: AI pipelines with different levels of complexity
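To make the notion of graph-structured pipelines concrete, the following minimal Python sketch shows how such a pipeline could be declared and executed in topological order. All class and function names are illustrative; this is not the interface of any of the platforms cited in this chapter.

    # Minimal sketch of an AI pipeline as a graph of named steps.
    # Each step is a callable that transforms a dictionary of artifacts.
    from collections import deque

    class Pipeline:
        def __init__(self):
            self.steps = {}   # step name -> callable(artifacts) -> artifacts
            self.edges = {}   # step name -> list of downstream step names

        def step(self, name, fn, after=()):
            self.steps[name] = fn
            self.edges.setdefault(name, [])
            for parent in after:
                self.edges[parent].append(name)
            return self

        def run(self, artifacts):
            # Execute steps in topological order (Kahn's algorithm).
            indegree = {name: 0 for name in self.steps}
            for children in self.edges.values():
                for child in children:
                    indegree[child] += 1
            ready = deque(n for n, d in indegree.items() if d == 0)
            while ready:
                name = ready.popleft()
                artifacts = self.steps[name](artifacts)
                for child in self.edges[name]:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        ready.append(child)
            return artifacts

    pipeline = (Pipeline()
                .step('preprocess', lambda a: {**a, 'data': 'clean'})
                .step('train', lambda a: {**a, 'model': 'm1'}, after=('preprocess',))
                .step('evaluate', lambda a: {**a, 'score': 0.9}, after=('train',))
                .step('deploy', lambda a: a, after=('evaluate',)))
    print(pipeline.run({'data': 'raw'}))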

Operationalized AI pipelines integrate the entire lifecycle of an AI model, from training, to deployment, to runtime monitoring [SVK+17, BBC+17, HMR+19]. To manage risks and prevent models from becoming stale, pipelines are triggered automatically by monitoring runtime performance indicators of deployed models. Retraining triggers can be as simple as a weekly schedule, or a combination of rules, such as detecting concept drift [GŽB+14] and monitoring the amount of data available for training. Figure 2.4 illustrates this. Several systems for cloud-based AI lifecycle management have emerged. Uber's Michelangelo [HDB17], IBM Watson Machine Learning [IBM18], ModelOps [HMR+19], TensorFlow Extended [BBC+17], and ModelHub [MLDD17] all provide tools and platforms to define and automate AI pipelines, share data and models, and monitor the runtime performance of models, by leveraging cloud services and infrastructure. We can see that most efforts for AI lifecycle management are focused on both training and deployment in the cloud, and largely neglect edge computing system characteristics.
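As a minimal illustration of such a rule-based trigger, the following Python sketch combines a drift heuristic with a data-volume rule; the threshold values and metric names are assumptions for illustration, not taken from any of the cited systems.

    # Retrain when drift is suspected (live accuracy has dropped notably
    # below the baseline) and enough new labeled data has accumulated.
    def should_retrain(live_accuracy, baseline_accuracy, new_samples,
                       min_samples=10_000, max_drop=0.05):
        drift_suspected = (baseline_accuracy - live_accuracy) > max_drop
        enough_data = new_samples >= min_samples
        return drift_suspected and enough_data

    if should_retrain(live_accuracy=0.88, baseline_accuracy=0.95, new_samples=25_000):
        print('trigger pipeline run')  # e.g., enqueue a pipeline as in Figure 2.3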


Figure 2.4: Operationalized AI pipeline with a rule-based trigger that monitors available data and runtime performance metrics to form an automated retraining loop

In particular, it is unclear how platforms should handle training data that cannot leave the edge, how to reconcile hybrid edge/cloud infrastructure for executing pipelines, how to scale the deployment of a large number of ML models to the edge, and how to achieve scalable runtime monitoring of live models.

2.2.2 Edge AI

AI services that leverage edge computing resources and data sources will play a fundamental role in enabling the applications we discussed in Section 2.1. To better understand the specific challenges that confront edge AI systems, we examine specific use cases in more detail. The presented examples are adapted from real-world use cases that currently use cloud-based AI platforms to deliver the applications. We introduce plausible new requirements to each use case that blend characteristics from AI operationalization and edge computing, to highlight the challenges of developing and operating edge AI applications. Selected applications are used in Chapter 5 and Chapter 6 as evaluation scenarios.

Personal assistants Cognitive mobile personal assistants continuously monitor health data via bio sensors, and can predict and raise alerts for critical situations such as critically low blood sugar levels [Cor17]. Key concerns in this use case are prediction accuracy, inferencing latency, and data privacy, which can be improved by integrating patients' edge devices into the workflow. To optimize accuracy, the training process is split into two phases: First, the service provider trains a base model on a representative, anonymized [AP08] sample of the entire population. This is a resource-intensive and long-running process that is performed in a cloud environment. Second, in order to account for patient-specific patterns in the data, the individual models need to be adjusted and fine-tuned for each (type of) patient. However, these patient-specific data should be kept private, and there should be no way to associate the data points directly with a patient's identity. The base model is transmitted to the edge device and refined using transfer learning techniques and data collected at runtime at the edge.


This refined model is then served on the patient's device, thereby enabling low-latency and privacy-aware inferencing.

Field technicians Field service technicians often travel to on-site locations, including engineers maintaining power facilities, mechanics fixing industrial equipment, and technicians for an ISP. Mobile devices augmented with AI capabilities help to identify faulty parts, recommend diagnosis paths, or log and validate the technician's actions [Cor18]. Key concerns in this use case are reliability, inferencing latency, bandwidth, and trust. Due to limited network connectivity, AI models need to run on these edge devices, such as visual recognition models to classify photos taken in the field. Devices equipped with AI accelerators could be used for video stream analytics where momentary information is relevant, for example in high-speed manufacturing lines. Being able to predict and preload AI models on devices with limited memory can be useful when managing fleets of devices. Context-aware policies can help facilitate a trusted AI workflow. For example, security constraints related to device ownership and registration may require that data gathered in the field only be transferred when connected to the corporate network.

Predictive analytics for smart homes Through the IoT, smart homes become cyber-physical systems, where IoT devices can sense and act on the environment. Combined with AI techniques, actuators can be powered by AI models trained on the gathered sensor data. One such use case, which we presented in [Sch19], uses an AI pipeline to improve the power efficiency of a household water heating system connected to a photovoltaic system (PVS). Sensors attached to the water heating system record the water usage and water temperature, occupancy sensors record the number of people in the house, and a weather station records weather data. The historic data are stored in a local data store. A model is trained to predict future water consumption behavior, and to optimize power usage costs by using the PVS to heat water when it is most appropriate. Local edge devices are used for the entire end-to-end lifecycle, i.e., to store and process data, train the model, and run the predictor.
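A minimal sketch of this local end-to-end loop, using scikit-learn and illustrative feature names (the actual system in [Sch19] may differ), could look as follows:

    # Train a water-demand predictor on locally stored sensor data and use
    # it on the same edge device to decide when to heat water with PV power.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Historic records from the local data store:
    # [hour of day, occupancy, outdoor temperature] -> liters of hot water used
    X = np.array([[7, 3, 2.0], [12, 1, 8.0], [19, 4, 5.0], [23, 2, 3.0]])
    y = np.array([60.0, 15.0, 80.0, 30.0])

    model = LinearRegression().fit(X, y)   # training runs on the edge device

    # Inference also runs locally: predict this evening's demand and decide
    # whether to heat water now, while the PVS produces surplus power.
    predicted_liters = model.predict([[19, 4, 4.0]])[0]
    if predicted_liters > 50:
        print('heat water now using PV surplus')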

Smart urban spaces Smart public spaces understand situations by interpreting activity. For example, consider a smart city application that recognizes problematic situations such as traffic accidents or crowd behavior [ZWT12]. Any type of decision-making will be highly locality specific, yet abstract patterns may be similar across locations. Dispersed edge and IoT devices, such as cameras or sensor nodes throughout the city, produce data that can provide locality-specific insights. Public cloud-based ML facilities, which are cost-efficient and can be provisioned on demand, can be used to train a base model that is then refined using transfer learning at on-premises cloudlets that have access to data from local sensors and devices. These cloudlets can use distributed learning to avoid aggregating all data, and then serve models in proximity for low-latency inferencing.

Augmented urban reality Suppose a scenario in which edge resources serve ML models to support an application for detailed visual discovery of points of interest in a city via AR.

An AR overlay (either via a smartphone or HMD) provides contextual information about objects or things that are in the camera stream. We have outlined other such use cases for augmented urban reality in [RHS+21]. Ideally, a resource hosts ML models (e.g., for object recognition) for the points of interest in its proximity. Locality and context awareness play a key role in making this work. The resource understands that in its proximity there exist, for example, an art museum, a botanical garden, or a shopping mall, each of which has its own ML models for its domain. Edge resources can also be used to train models. This is particularly relevant for scenarios where the data to train or refine domain-specific models is proprietary or privacy-sensitive, and may not leave the edge network. These scenarios demonstrate the complexity of edge AI workflow management.

Edge AI workflows

Systems like IBM AI OpenScale2 make use of cloud-based technologies to enable end-to-end AI operationalization in the cloud. Cluster computing systems such as Apache Spark, as well as clustered deep learning infrastructure, operate on data persisted in cloud-based object storage [BBD+17], and models are deployed as web services on a cloud-based hosting platform from which they can be easily monitored. Companies are driving efforts to integrate edge resources into this process [MS18]. Google Cloud IoT Edge3, Microsoft Azure IoT Edge4, and Amazon AWS Greengrass5 leverage edge resources to serve ML models, but data preprocessing, model training, and message brokering are still mainly performed in the cloud. To fully realize end-to-end edge intelligence workflows, we need to make full use of edge resources not only for serving models, but for all steps within the AI lifecycle. With this in mind, we identify five different AI lifecycle workflows, from training to serving.

Table 2.1 lists data and model characteristics for the basic workflows, specific ML mechanisms that are the key enablers, and some example use cases. Which workflow will apply to a given problem depends on many considerations: What are we training models for? How large are the models? How often are models re-trained? How much data is involved? How long does it take, and is it important that training is fast? Some simple curve fitting models for on-the-fly optimization can be executed in a matter of seconds, whereas training image classifiers may take several hours or even days. How fast does inferencing have to be? Can we tolerate latencies from sending data to the cloud? Is the data used for training or inferencing sensitive?

Cloud to Cloud This is the status quo as supported by cloud-based AI platforms such as Microsoft Azure ML, the Google Cloud Prediction API, or IBM AI OpenScale. Models are trained in centralized training clusters using data aggregated in cloud-based storage silos. Models are then served on cloud resources (e.g., as web services in VMs).

2https://www.ibm.com/cloud/ai-openscale 3https://cloud.google.com/iot-edge/ 4https://azure.microsoft.com/en-us/services/iot-edge/ 5https://aws.amazon.com/greengrass/

Table 2.1: Overview of AI operations workflows

Workflow | Data characteristics | Model characteristics | Enabling technologies | Example use cases
C2C | Training data is centralized; massive data sets | Models are large; huge number of inferencing requests need to be load-balanced | Scalable learning infrastructure [BBD+17]; data warehousing | Recommender systems; image search
C2E | Training data is centralized; inferencing data may be sensitive | Large number of model deployments; models run on specialized hardware; inferencing may need to be near-real-time | Transfer learning [PY10]; model compression [HMD15]; latency/accuracy trade-offs [HZC+17a] | Self-driving cars; fieldwork assistants
E2C | Training data is distributed; training data may be sensitive | Models can be centralized; huge number of inferencing requests need to be load-balanced | Decentralized/federated learning [HRM+18, MMRyA16]; distributed inferencing | Privacy-aware personal assistants; volunteer computing; surveillance systems
E2E | Training data is distributed; training and inferencing data may be sensitive | Inferencing may need to happen in near real time | Decentralized/federated learning; distributed inferencing [DCM+12] | Industrial IoT (e.g., predictive maintenance); novel smart city use cases; novel IoT use cases


Cloud to Edge This workflow integrates edge resources as deployment targets. It is useful for use cases where training models requires a lot of computational power and massive amounts of data (e.g., training classifiers on ImageNet), but inferencing must be fast (e.g., real-time object detection), or inferencing data may not leave the edge (e.g., patient data). Google, Microsoft, Amazon, and others are driving efforts with their enterprise-grade edge AI platforms, which are still largely in alpha or beta stages of development.

Edge to Cloud This workflow makes use of edge resources to train models, and serves the models in the cloud. The workflow may be useful for applications where training data is massively distributed (such as in mobile computing scenarios [MMRyA16]), training data may not travel to the cloud, or training needs to be performed close to data sources, yet serving models either requires massive scalability, or decentralized inferencing is impractical.

Edge to Edge In this workflow, both training and serving happen at the edge, for reasons similar to those of the previous two workflows. Industry 4.0 use cases illustrate this, where sensitive data may not leave the company's premises, and inferencing needs to happen in near real-time at the edge.

Complex Hierarchical Models It is likely that future scenarios will call for a creative mix of the above-mentioned workflows. Some use cases may require base models to be trained on aggregate data in the cloud, then deployed and fine-tuned with data at the edge (e.g., demand forecasting for a retail chain, where a model is built on data across all locations and fine-tuned for specific stores; or personalized diabetes assistants, where anonymized data across patient groups are used to train a base model that is then refined using the individual patient's data). There are experimental systems that use federated learning to train model fragments on-device, aggregate the anonymized models in the cloud, and then fine-tune on personalized data, such as Google's smart mobile keyboard app [HRM+18]. Localization and context play a big role here, and hierarchical edge/cloud architectures can help facilitate this, as we will show.
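The structure of such a hierarchical workflow can be sketched as follows; the function names are illustrative placeholders, and the model representation is deliberately simplified:

    # A base model is trained on aggregated (anonymized) data in the cloud,
    # shipped to edge sites, and refined there on local data.
    def train_base_model(aggregated_data):
        return {'weights': 'base', 'trained_on': len(aggregated_data)}

    def fine_tune(base_model, local_data):
        # Transfer learning: start from the base weights, adapt to local data.
        refined = dict(base_model)
        refined['weights'] = 'base+local'
        refined['refined_on'] = len(local_data)
        return refined

    base = train_base_model(aggregated_data=range(1_000_000))      # cloud step
    for site, local_data in {'store-1': range(5000), 'store-2': range(800)}.items():
        model = fine_tune(base, local_data)                        # edge step
        print(site, model)                                         # served locally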

The role of edge computing It is clear why edge computing is an appealing model for these advanced AI applications. A hierarchical hybrid cloud and edge computing architecture, as shown in Figure 2.5, can help facilitate complex edge AI operations scenarios for the applications we presented. Anonymized data can be aggregated in the cloud to create common base models. Base models are deployed to edge resources, where context-specific and unfiltered data can be used to refine the models in a privacy-preserving manner. Specialized hardware resources deployed at the edge (see Section 3.3), combined with their proximity to clients, can enable highly efficient low-latency access to AI services or sensor data from the environment.

Figure 2.5: Edge AI scenario that leverages hierarchical edge and cloud infrastructure

As promising as edge computing is, there are numerous challenges in realizing such an end-to-end architecture. Deploying and managing AI workflow tasks at the edge in a generalized way may be difficult (compared to, for example, packaging the ML model as a web service and deploying it to a managed Platform-as-a-Service (PaaS) offering), as edge resources are very heterogeneous and cannot yet be used in a classic utility computing manner. How do we allow a “write once run anywhere” approach for heterogeneous resources and workloads? Some training and inferencing tasks may benefit from AI accelerators or GPUs, requiring specific computing platforms and precise resource provisioning strategies. Moreover, we may not know up front whether an edge computer has sufficient resources to perform the task. Should the task run on a Raspberry Pi or other SBC that is very close to the data, or rather on a more powerful server computer in the cloud, which requires moving the data? How will data be stored and accessed from edge resources to make it available in a safe and privacy-aware way for training and inferencing? How can applications communicate efficiently in this highly dynamic and distributed computing environment? We take a closer look at some of these questions, and discuss possible solutions and open challenges. Our proposal is a distributed compute fabric for edge intelligence that federates dispersed and heterogeneous edge computing resources, and employs advanced operational mechanisms to deploy and manage applications on this federated infrastructure.

We now present the key elements of this system in more detail.

2.3 A distributed compute fabric for edge intelligence

We identify the following three key components to implement the system we envision: sensing and communication middleware, a computing substrate, and intelligent operational mechanisms, which drive the remainder of this thesis. We discuss each component and its associated challenges in more detail.

2.3.1 Sensing and communication middleware

Implementing use cases such as smart public spaces at smart-city scale requires many different sensing capabilities [DNŠ17]. Having individual companies develop and deploy specific sensors for specific use cases seems inefficient in the long term. Instead, city planners will have a high stake in providing application developers with a sensing infrastructure to make use of smart city data. It seems much more likely that an initial set of smart city and IoT use cases will bootstrap a general set of requirements for sensor capabilities, which will then trigger deployments at increasingly larger scale. This is similar to a smartphone, which has a wide array of sensors installed, giving developers many opportunities to create novel and creative applications. Not all applications use all sensors; in fact, some usage behavior may not require specific sensors at all. They are there nonetheless, because they provide potential utility for future apps. It is reasonable to assume that a similar model will work well for smart-city scale edge intelligence. Once a family of edge intelligence applications emerges, and we understand the broader range of requirements, companies can start building arrays of sensors that provide applications with sensing capabilities and potential utility. In fact, there are already projects, such as the Array of Things6, that can serve as a substrate for providing Sensor Data as a Service (SDaaS) [ZIH+13]. We foresee several challenges for edge computing in this model. Given the large number and the dynamic and mobile nature of both publishers and subscribers of sensor data [MJ10], and the stringent QoS requirements of edge intelligence use cases, we will be forced to rethink centralized messaging services such as Amazon AWS IoT or Microsoft Azure IoT Hub [RND18]. Novel messaging systems, such as osmotic message-oriented middleware [RDR18], can help facilitate transparent access to this geographically dispersed network of sensors and actuators, even under highly dynamic node behavior. In fact, in a survey on wide-scale publish–subscribe middleware, Bellavista et al. [BCR14] note this is a necessity: “there will be a growing number of application scenarios where the availability of middleware supporting large-scale PUB/SUB event distribution with QoS will be fundamental”. Furthermore, it is unlikely that sensing infrastructure will be static and managed by a single central authority.

6https://arrayofthings.github.io/


Figure 2.6: Capability-aware execution of an AI pipeline on a multi-purpose edge computer

Consider the integration of cognitive assistance devices with self-driving cars and smart traffic systems [RHS+21]. Applications require access to public sensor data, such as statically deployed arrays of sensors; dynamic sensor data from cars; as well as personal sensor data, such as the location and movement of pedestrians. How can these different static and dynamic data sources be reconciled to let AI applications make full use of them? We present our solution to this problem in Chapter 5. Moreover, while AI inferencing mostly works on real-time data in an event-based way, the majority of training techniques require batch access to labeled data. Storing and providing scalable access to these highly dispersed data is a fundamental challenge for edge intelligence, which we address in Chapter 6.
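The osmotic middleware cited above is a research system; as a baseline illustration of topic-based access to dispersed sensors, the following sketch subscribes to a geographically scoped topic over plain MQTT using the paho-mqtt client. The broker address and topic scheme are assumptions for illustration.

    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        # e.g., topic 'city/district-4/air-quality/pm25'
        print(msg.topic, msg.payload)

    client = mqtt.Client()
    client.on_message = on_message
    client.connect('broker.example.org', 1883)
    # Wildcard subscription to all air-quality sensors in one district,
    # with QoS 1 for at-least-once delivery.
    client.subscribe('city/district-4/air-quality/#', qos=1)
    client.loop_forever()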

2.3.2 Computing substrate

In contrast to cloud data centers, where the predominant infrastructure unit is the server computer that consolidates large amounts of CPU, RAM, and disk space, edge resources will be much more diverse and provide a variety of computational platforms to support the equally diverse range of use cases [SGL19, PDT18]. Chapter 3 is dedicated to examining this in greater detail. General purpose computing will be complemented by specialized high-performance computing (HPC) and AI-optimized hardware, federated into powerful, high-density multi-purpose compute units. Together with cloud resources, hierarchies of connected clustered edge computers will form the backbone infrastructure.

Multi-purpose edge computers At smart-city scale, it is impossible to manually map software services to the respective platforms. To make efficient use of these platforms, hardware needs to be grouped into cohesive units of multi-purpose edge computers.


An existing example is the Array of Things (AoT) project, where nodes are deployed throughout the city, and comprise several SBCs with varying compute capabilities. When executing workloads, a scheduling system needs to understand the capabilities of the underlying resource, as well as the effects of scheduling specific workloads to specific platforms on the operational efficiency. For example, as illustrated in Figure 2.6, scheduling an ML model inferencing task will be both faster and more energy efficient on embedded AI hardware than on a general-purpose computing platform. In mobile scenarios that require portability and reliability, energy efficiency is an additional constraint to consider [RAD18]. Ideally, a cluster middleware learns at runtime how to optimally provision resources for given workloads, learning the tradeoffs between energy efficiency, latency, and accuracy. These self-learning and self-adaptive clusters will form the infrastructural unit of the edge computing fabric. We discuss this in more detail in Section 3.1.
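A capability-aware placement decision in the spirit of Figure 2.6 can be sketched as follows; the platform profiles (latency and power draw per task) are illustrative numbers, not measurements:

    # Pick the platform that supports a task type (hard constraint) and
    # minimizes a weighted latency/energy cost (soft constraint).
    PLATFORMS = {
        'embedded-ai':     {'latency_ms': 8,  'watts': 4,  'supports': {'inference'}},
        'general-purpose': {'latency_ms': 45, 'watts': 15, 'supports': {'inference', 'preprocess'}},
        'hpc':             {'latency_ms': 30, 'watts': 90, 'supports': {'train', 'preprocess'}},
    }

    def place(task, w_latency=1.0, w_energy=0.5):
        feasible = {n: p for n, p in PLATFORMS.items() if task in p['supports']}
        return min(feasible, key=lambda n: w_latency * feasible[n]['latency_ms']
                                         + w_energy * feasible[n]['watts'])

    print(place('inference'))  # -> 'embedded-ai'
    print(place('train'))      # -> 'hpc'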

Edge system topologies Compute resources that are placed in edge networks and pooled together to form a diverse and distributed compute fabric can form various higher-level topologies. We discuss existing and emerging edge system topologies in greater depth in Section 3.3. The discussion shows the large variety of network topologies that can exist in edge systems, compared to data centers. Federating dispersed Internet-connected networks entails a large variance in bandwidth between nodes, which further complicates resource management and workload scheduling. It is important that generalizable middleware can deal with these networks without having to make assumptions about the underlying topologies.

Virtualization Applications rarely live in isolation, and a utility-based edge computing fabric has to deal with the challenges that come with multitenancy, which is not as straightforward at the edge as it is in the cloud [LWB16]. Powerful server computers in data centers allow us to easily host multiple VMs as the unit of isolation. Although researchers have argued that Cloudlets will simply be VM hosts at the network edge [SBCD09], we cannot necessarily assume that all edge resources can provide this level of virtualization. VMs require a lot of resources, and not all edge computers are powerful enough to provide fully-fledged VM-based virtualization. Because this introduces a break in the edge/cloud continuum, there are legitimate doubts about whether VMs will work as the predominant form of isolation for edge computing. Moreover, it is important that developers can trust the “write once run anywhere” principle. Container-based deployment strategies have been recognized as a more viable solution, as they provide lightweight resource virtualization and isolation [Mor17]. More recently, Unikernels have been gaining traction as an alternative way of developing applications as completely isolated machine images that can run directly on top of a bare-metal hypervisor [MMR+13]. However, Unikernels rely on library operating systems, and are not yet as well understood as containers. Regardless of which technology will dominate, it is clear that multitenancy in resource-constrained environments (compared to a cloud data center) is challenging, as the efficient use of resources becomes more difficult.

2.3.3 Intelligent operational mechanisms

As we have shown, edge computing introduces new challenges for operations researchers. There is evidence that methods from cloud operations research, for example for energy efficiency, may only have limited applicability for edge computers, some of which we demonstrate in Chapter 3. Also, it is clear that maintaining a network of highly dispersed multi-purpose edge computer clusters introduces additional management and coordination complexity. We discuss some of the most pressing operations issues for edge intelligence, which are also the ones this thesis focuses on.

Proximity & mobility awareness In edge intelligence scenarios, both software agents and hardware resources can be dynamic and mobile. Allowing for mobility requires a good understanding of proximity, such that communication latencies between nodes can be optimized, and privacy policies can be enforced. Mobility can be particularly challenging for messaging systems that provide message delivery guarantees [MJ10, RND18]. Proximity awareness is therefore a fundamental enabling mechanism of mobility in edge computing [RDR18]. In networks, proximity can be distinguished into logical and physical proximity, and both play an important role in edge computing. For example, physical proximity matters when short-range network technologies such as BLE are involved, and logical proximity matters when minimizing round-trip time. Classic techniques as used by Content Delivery Networks (CDNs), such as anycast DNS and latency monitoring, are challenged in highly dynamic environments. Also, simple monitoring techniques, e.g., via network latency, can produce undesirable strain on the network and become difficult to manage, as we will demonstrate in Chapter 5. Besides network latency or logical distance, the application's response time should also be factored into proximity. For example, a message broker in close proximity that exhibits a high response time because of congested queues can behave as if it were much farther away. We believe that new methods synthesized from, for example, advanced techniques of interest management, as well as runtime application monitoring, will be necessary for enabling proximity and mobility awareness for edge intelligence at scale. Other network metrics to determine proximity should also be considered. For example, as we show in Chapter 6, a useful measure of proximity is the available bandwidth between data producers and consumers, to determine data and computation movement tradeoffs.
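A composite proximity measure along these lines can be sketched as follows; the weights and the simple additive form are illustrative assumptions:

    # Lower cost means "closer". A congested broker (high response time) or
    # a thin uplink (low bandwidth) pushes a physically near node "away".
    def proximity_cost(rtt_ms, response_ms, bandwidth_mbps,
                       w_rtt=1.0, w_resp=1.0, w_bw=50.0):
        return w_rtt * rtt_ms + w_resp * response_ms + w_bw / max(bandwidth_mbps, 0.1)

    brokers = {
        'nearby-congested': proximity_cost(rtt_ms=2,  response_ms=400, bandwidth_mbps=100),
        'remote-idle':      proximity_cost(rtt_ms=35, response_ms=5,   bandwidth_mbps=100),
    }
    print(min(brokers, key=brokers.get))  # -> 'remote-idle'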

Intelligent workload scheduling The highly heterogeneous nature and geographic dispersion of edge resources pose significant challenges for workload scheduling. Multitenancy in combination with a limited number of constrained resources further exacerbates the problem.


In agent-based scenarios, intelligent eviction or suspend/resume strategies that respect the requirements of AI agents will be necessary, and call for sophisticated management mechanisms. Furthermore, latency- and energy-aware scheduling techniques become more difficult in hierarchical architectures, where the underlying platform's operational optimizations must be treated as a black box (see Section 3.2). Effectively scheduling workloads to the edge will require many hard and soft constraints. Performance evaluations of state-of-the-art workload scheduling approaches (such as the Kubernetes container scheduler) exhibit clear scalability limitations when confronted with a large number of constraints [Den16]. Moreover, it is challenging to generalize scheduling parameters given the variety of existing edge infrastructure scenarios (see Section 3.3 and Section 6.4). AI applications introduce additional challenges for scheduling, and we argue that some aspects of these applications should be considered as first-class citizens, such that platforms can reason about the application internals. As we discussed in Section 2.3.2, the capabilities of edge computers are much richer and more diverse than those of server computers in classical cloud computing. However, edge resources are also more constrained, both in terms of computing power and storage capacity. From an operations perspective, delivering AI at the edge has to include additional goals and constraints, in particular because proximity to data producers and consumers plays such a critical role in edge computing: to maintain privacy and trust, enable low-latency data processing, and reduce inter-network traffic. These challenges are similar to those of large-scale IoT-cloud deployments, for which different provisioning systems have been developed [VSI+15, NTD16]. However, these provisioning frameworks do not consider the co-evolution and interdependency of AI applications and the underlying ML models. In particular, they assume a classic compile-test-deploy workflow of software, whereas an AI operations pipeline is highly customizable and its steps have different computational needs. If we take into consideration that the distributed compute fabric will be used for both training and serving AI models, we can summarize the operational requirements of workload schedulers as follows: (1) trade off data and computation movement: decide at runtime whether to move the data stored in edge networks to the cloud, or deploy application code to the edge; (2) consider proximity to data stores: deploy workflow tasks in a way that minimizes inter-network traffic and latency; (3) make efficient use of edge resources: GPUs or AI accelerators should be used if workloads can benefit from them; (4) consider resource contention: edge resources may be much more constrained than data center servers, and scheduling multiple workloads to a resource may significantly impair application performance.
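A scheduler skeleton reflecting requirements (1)-(4) as hard filters and weighted soft scores, in the style of the predicates/priorities split used by the Kubernetes scheduler, could look as follows; all node attributes and weights are illustrative:

    def feasible(node, pod):
        return (node['free_cpus'] >= pod['cpus'] and                 # (4) contention
                pod.get('accelerator') in (None, node.get('accelerator')))

    def score(node, pod):
        s = 10.0 / (1.0 + node['latency_to_data_ms'])                # (2) data proximity
        s += node['bandwidth_to_data_mbps'] / 100.0                  # (1) data movement
        if pod.get('accelerator') and node.get('accelerator') == pod['accelerator']:
            s += 5.0                                                 # (3) use accelerators
        return s

    def schedule(pod, nodes):
        candidates = [n for n in nodes if feasible(n, pod)]
        return max(candidates, key=lambda n: score(n, pod), default=None)

    nodes = [
        {'name': 'rpi-edge', 'free_cpus': 4, 'accelerator': 'tpu',
         'latency_to_data_ms': 1, 'bandwidth_to_data_mbps': 100},
        {'name': 'cloud-vm', 'free_cpus': 32, 'accelerator': 'gpu',
         'latency_to_data_ms': 40, 'bandwidth_to_data_mbps': 1000},
    ]
    print(schedule({'cpus': 2, 'accelerator': 'tpu'}, nodes)['name'])  # -> 'rpi-edge'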

Proximity-based autoscaling and load-balancing Two fundamental operational mechanisms of elastic systems are autoscaling and subsequent load-balancing between replicas. In cloud computing, traditional autoscaling strategies are reactive, scaling based on black-box system metrics such as reaching CPU utilization thresholds [LBMAL14]. The performance of elastic applications in centralized data centers is less sensitive to replica placement.


In edge computing, by contrast, where resources and clients may be dispersed in edge networks, autoscaling mechanisms should take into account where application requests are coming from, scale based on that demand, and place replicas close to it. Subsequently, requests should be routed to the application replicas in proximity. Classic round-robin load-balancing techniques may therefore be insufficient.
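A minimal sketch of demand-driven, proximity-based autoscaling illustrates the idea: count where requests originate and add replicas in the regions whose demand exceeds the deployed capacity. Region names and the capacity constant are assumptions.

    from collections import Counter

    def autoscale(request_origins, replicas, per_replica_capacity=100):
        demand = Counter(request_origins)
        placements = []
        for region, count in demand.items():
            have = replicas.get(region, 0)
            need = -(-count // per_replica_capacity)   # ceiling division
            if need > have:
                placements.append((region, need - have))
        return placements  # (region, number of replicas to add close to demand)

    origins = ['district-4'] * 230 + ['district-9'] * 40
    print(autoscale(origins, replicas={'district-4': 1}))
    # -> [('district-4', 2), ('district-9', 1)]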

2.3.4 Privacy and trust

If personal data spaces turn out to be the dominant form of achieving privacy-aware data distribution, new methods for accessing data for training and inferencing purposes are needed. As people continue to gather personal information into their own space, they will be required to give third parties selective access to their data via complex access rules, or provide only anonymized views on their data via privacy-aware routing techniques. Privacy rules should be enacted in a context-aware way [LRD19a]. Only when people own their own data and the medium it is stored on, and can manage strict access controls, will they be in full control of their personal and sensitive data. As of today, we still put our trust in cloud providers to handle our data, even highly sensitive data such as health-related records. Particularly problematic is the increasing number of surveillance cameras throughout public spaces. Although they provide exciting opportunities for visual analytics, situational awareness, or other AI applications, they are fundamentally a privacy concern. Researchers will be challenged to devise methods that reconcile data availability for AI agents with guarantees of data safety and the data owner's privacy. Security is another important factor. Consider a system where autonomous AI agents roam through the fabric to complete specific tasks [LRD21]. It is absolutely crucial that AI agents are safe from outside tampering and isolated from interference. We recognize that privacy and security mechanisms are a prime concern in edge intelligence systems; however, they are out of the scope of this thesis.

2.4 The role of serverless computing

Cloud computing models have evolved over the past decade to higher and higher levels of abstraction. In the original Infrastructure-as-a-Service (IaaS) model, users have access to virtualized server hardware through VMs, but need to maintain the leased VMs [EPM13]. In Platform-as-a-Service (PaaS) models, users only provide platform-specific application code, and the platform provider takes care of deploying and managing the application on their infrastructure. While PaaS provides more convenience for developers, such platforms are very narrow in their applicability. A more general-purpose way of computing without managing infrastructure is serverless computing [JSSS+19]. Compared to PaaS, serverless computing is not an integrated environment for a specific platform (like hosting web applications developed in a particular web framework), but rather a more general way of deploying and managing arbitrary code in the cloud. A major enabler for serverless computing is operating-system-level virtualization technology, where the deployment unit is a lightweight container (an isolated user space instance sharing the same operating system with other containers) rather than a VM.

There are various manifestations of serverless computing. In a Function-as-a-Service (FaaS) offering, users upload their code as functions, and define rules for triggering the function execution. The serverless platform takes the role of the dispatcher, and executes functions in response to events. Another manifestation of serverless computing is Container-as-a-Service (CaaS), which can be seen as a PaaS for containerized application deployments. Kubernetes is a popular container orchestration system that is essentially a CaaS platform. There are many benefits to serverless computing. First, it provides complete infrastructure transparency. Application developers and operators do not know about the underlying infrastructure, while still benefiting from the scalability that autoscaling and load-balancing mechanisms provide. Second, the billing unit can be very fine-grained, allowing cloud providers to bill individual function invocations [Eiv17]. Third, platform providers can manage their infrastructure more efficiently, because lightweight containers can be packed more densely onto it than VMs. A drawback compared to hardware virtualization is the weaker isolation that containers provide.
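The FaaS model described above can be illustrated with a minimal sketch; the decorator-based registration API is invented for illustration and is not a specific platform's interface:

    REGISTRY = {}  # event topic -> function

    def function(trigger):
        def register(fn):
            REGISTRY[trigger] = fn
            return fn
        return register

    @function(trigger='image/uploaded')
    def classify(event):
        return {'label': 'cat', 'input': event['object_key']}

    def dispatch(topic, event):
        # The serverless platform acts as the dispatcher: it looks up the
        # function registered for the event and runs it in a sandbox
        # (in practice, a container).
        return REGISTRY[topic](event)

    print(dispatch('image/uploaded', {'object_key': 'photos/42.jpg'}))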

A distributed compute fabric has the properties of a PaaS offering. We want users to deploy and configure their code centrally, which has many administrative advantages, but code execution should be decentralized. At the same time, the infrastructure should be transparent to the user, and platform providers should take care of scaling the application. In [Sch19], we analyzed the characteristics of edge AI applications, and compared them under different programming models. These are summarized in Table 2.2.

Table 2.2: Characteristics of programming models for edge AI applications

Characteristic | Serverless cloud | Serverless edge | Local monolith
Internet usage | high: all data transmitted | low: code download | none
Responsiveness | low: WAN latency | high: LAN latency | high: no network
Runtime costs | high: traffic and execution | low if using on-premises resources | medium: powerful server required
Deployment | by cloud provider | mostly platform provider | self-managed
Code mobility | transparent execution in the cloud | transparent execution across cloud and edge | no mobility
Data privacy | low: data in the cloud | high: sensitive data stored in network | high: all data stored locally

Based on this discussion, we argue that serverless edge computing is the appropriate computing model. Edge computing introduces new challenges for serverless computing, which has led to increased research interest in serverless edge computing [GND17, RHM+19, BM19, ATC+21]. There is an obvious tension between infrastructure transparency, which gives users the illusion of a serverless system, and the distributed and heterogeneous nature of the computing substrate. We discuss serverless computing and its role for edge intelligence in more detail in Chapter 6. Because edge intelligence emphasizes AI applications, we also argue that the computing model should elevate concepts from the AI workflows discussed in Section 2.2.2 to first-class citizens. In Section 6.3.2 we present a programming model that allows flexible general-purpose code execution, but simplifies the development of AI workflow functions.


2.5 Summary

Edge intelligence is a post-cloud-computing paradigm in which AI applications leverage computational resources and data at the edge of the network to push intelligence from the cloud closer to devices and end users. Its emergence can be attributed to the increasing interconnectedness of humans and Internet-connected things, the latency and bandwidth requirements of IoT applications, the increased availability of data produced by edge devices, and the fast-paced development of edge AI methods and hardware. Edge computing is one of the key enablers of this new paradigm. We outlined the role of the intelligent edge in the cyber-human evolution, and the ability of edge computing to enable the seamless augmentation of human cognition. We presented challenges that will confront edge AI systems for several years to come. Specifically, we have shown how edge computing exacerbates the complexity inherent to AI applications and ML workflows, and that new methods are necessary to leverage hierarchical edge/cloud architectures for the AI lifecycle. To enable the intelligent edge, we propose a distributed compute fabric that comprises (1) a computing substrate that federates dispersed and heterogeneous resources from across the computing continuum, (2) sensing and communication middleware that self-adapts to dynamic edge computing environments, and (3) intelligent operational mechanisms that manage the complexity of operating applications on distributed edge infrastructure. The remainder of the thesis is dedicated to these three system aspects, and how to evaluate them.

CHAPTER 3

Edge infrastructure architectures

The computing infrastructure for cloud computing is well established. Powerful server computers are consolidated into massive data centers interconnected with high-performance networking appliances. Cloud computing models, such as IaaS or PaaS, work well on top of these systems because they can build on assumptions about the capabilities of the infrastructure and the network topology. Implementing generalizable models for edge computing is much more difficult, given the broad spectrum of available computing hardware for the equally broad range of edge computing use cases. A distributed compute fabric for edge intelligence will federate highly specialized hardware platforms, such as high-density small-form-factor computers and embedded AI hardware, into cohesive multi-purpose edge computers. Together with cloud resources, hierarchies of connected clustered edge computers create a computing continuum. In this chapter, we solidify this vague idea with concrete technology proposals. The insights are particularly important for the design of the systems presented in Chapter 5 and Chapter 6. We begin in Section 3.1 by examining the computing continuum, and present a critical discussion of existing computing platforms for edge computing. Through this examination, we identify a gap in the existing state of the art for supporting forward-deployed application scenarios, which we address with a system design and prototype for a portable high-density edge computer cluster, presented in Section 3.2. In Section 3.3 we study emerging application scenarios that leverage edge infrastructure across the entire compute continuum, with the goal of better understanding the characteristics of edge infrastructure topologies, and eliciting requirements for a distributed compute fabric.

3.1 The computing continuum

To better understand the challenges associated with developing edge computing platforms, in this section we examine existing computing hardware and the edge computing scenarios in which it is used.


3.1.1 Computing platforms

Edge data centers that use server computers and are placed at the edge of the network are considered candidate infrastructure for many edge computing scenarios. In an early definition by Satyanarayanan et al. [SBCD09], a cloudlet is “a trusted, resource-rich computer or cluster of computers that is well-connected to the Internet and is available for use by nearby mobile devices.” Since then, many other definitions have been put forth [LES+14, CRB+11, VSDTD12], but the consensus is that cloudlets aim to bring services otherwise hosted in the cloud closer to the edge of the network to improve the Quality of Service for end users. Because these cloudlets use the same underlying infrastructure as data centers, namely racks and server computers, many tools and techniques from cloud computing are directly applicable. For example, cloudlets can serve as hosts for the VM virtualization commonly used for IaaS platforms. However, because they are smaller than classic data centers, and may only contain a few computers, they can be much more easily deployed on companies' premises, or in public buildings throughout a city, forming essentially a type of CDN for computation. To serve backends for emerging wearable AR applications, which typically involve running AI workloads on video feeds at high framerates, cloudlets may also be equipped with specialized resources such as GPUs [SXP+13]. Edge data centers are also a key idea of mobile edge computing (MEC), where mobile operators place servers at base stations of cellular networks [BWFS14]. This allows operators to provide highly responsive cloud services for mobile users based on their proximity to base stations, or to optimize the battery lifetime of mobile devices by offloading compute tasks to these cloudlets.

Small form factor (SFF) computers, such as Intel's Next Unit of Computing (NUC) platform with built-in CPUs, are used for edge computing scenarios due to their size, low energy footprint, and good performance. In terms of computing performance, they are comparable to a desktop PC, but can be significantly smaller. They are popular for compact desktop workstations, but are extremely versatile and have consequently found uses in many other areas. For example, the Ubuntu Orange Box is a portable compute cluster consisting of ten NUCs with Intel i5-3428U CPUs [VN], which can be used in a variety of offloading scenarios [CDOK17]. Their size also makes them viable for use on drones. For example, NUCs have been used to enable UAV-based aerial cells that serve as data relays between mobile users and a backhaul network [BPB+19], or for performing computer vision tasks for the autonomous landing of UAVs [CPSC16]. We later present our own SFF edge computer based on compact high-density server hardware in Section 3.2.

Single Board Computers (SBCs) such as ARM-based Raspberry Pis are often associated with edge computing, as they can, e.g., function as edge gateways in IoT scenarios to read sensor data, perform data pre-processing, or relay messages. Because they are very compact, cheap, and extremely energy efficient, they are applied in many applications, ranging from IIoT and smart city to smart home scenarios. For example, in [CWC+18], SBCs serve as a communication hub and data pre-processing gateway in an IoT-based manufacturing scenario.

In the Array of Things project [CBSG17], which we describe in Section 3.3, sensor nodes deployed throughout a city are equipped with two SBCs, one serving as sensor processing device, the other as communication gateway. SBCs are attractive for this use case, as sensor nodes have to be compact enough to be mounted on traffic lights or lamp poles. Road Side Units (RSUs), which create the vehicular network infrastructure (see Section 3.3), have similar characteristics and requirements. Another example is a smart home data-analytics application that we built, where an SBC is used to aggregate data from IoT sensors, train a prediction model, and then serve the model for inferencing [Sch19]. Clusters of commodity SBCs have found a variety of applications [JBP+18]. A clear tradeoff with SBCs is computing performance. Experiments we performed have shown that typical commodity SBCs perform roughly an order of magnitude worse than, e.g., a typical Intel CPU of an SFF computer. Increasingly, they are being used as a platform for hardware AI accelerators such as Google's Edge TPU [Cas19], which can significantly increase the performance for application-specific workloads.

Mobile devices are much more difficult to pin down than the categories described so far. In pervasive computing, mobile devices often include smartphones, wearable AR devices, smart watches, or health monitoring devices [SGL19]. These are considered client devices, i.e., a user's gateway to pervasive computing. Mobile devices are mostly the source of tasks offloaded to nearby edge resources [MB17], but have also been considered as cluster resources to perform distributed computing [HAHZ15]. More recently, personal mobile devices have been used as part of federated learning workflows, for example to train Google's predictive keyboard app Gboard [HRM+18]. User devices perform training of partial models using their private keyboard logs, and send the trained models back to the cloud. This is similar to the hierarchical workflow we described in Section 2.2.2. Drones, small robots, and vehicle on-board compute elements could also be considered part of the mobile device spectrum. In [SGL19] and related works, these devices are collectively referred to as the “Mobile and IoT device tier”, in the context of a three-tiered device-edge-cloud hierarchy.

Modular AI capabilities AI workloads benefit significantly from specialized hardware. GPU clusters dominate the cloud-based ML landscape, and while the same type of GPUs could be used for edge data centers, they are impractical for many edge resources due to their size and power requirements; a popular example is the NVIDIA Tesla K80, which has a TDP of 300 W. Instead, a new family of AI accelerators has emerged that promises to fully enable AI for resource-constrained edge devices. The NVIDIA Jetson platform [NVI] provides small GPU computing devices with CUDA support, which can be attached to, for example, SBCs. These devices enable the acceleration of AI applications at the edge that require computer vision or deep learning workloads. Google has devised the Edge TPU1, an application-specific integrated circuit (ASIC) that can enable, for example, high-fidelity, real-time vision applications running on an SBC.

1https://cloud.google.com/edge-tpu/


A number of other prominent examples include Microsoft BrainWave, which uses field-programmable gate arrays (FPGAs) [Bur17]; the Intel Neural Compute Stick [Int18]; and Baidu Kunlun [Duc18]. However, as the variety of AI accelerators increases, so does the heterogeneity of the edge fabric, thereby introducing additional complexity for platform abstractions. Moreover, because ASICs are designed for specific algorithms, hardware vendors also control which algorithms and platforms we can use, and potential vendor lock-ins pose a significant threat to the democratization of AI. While we see pluggable AI capabilities for edge computers as the way forward in enabling a rich edge intelligence fabric, it is clear that this requires the continued fostering of open standards and robust platform abstractions.
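As an illustration of how such accelerators are programmed, the following sketch runs inference on an Edge TPU through the TensorFlow Lite runtime, following the pattern documented for the Coral platform. The model file name is a placeholder, and the delegate library path may differ per system.

    import numpy as np
    import tflite_runtime.interpreter as tflite

    # Load a model compiled for the Edge TPU and attach the Edge TPU delegate.
    interpreter = tflite.Interpreter(
        model_path='mobilenet_v2_edgetpu.tflite',
        experimental_delegates=[tflite.load_delegate('libedgetpu.so.1')])
    interpreter.allocate_tensors()

    # Feed a dummy input and run inference on the accelerator.
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=inp['dtype']))
    interpreter.invoke()
    out = interpreter.get_output_details()[0]
    print(interpreter.get_tensor(out['index']).shape)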

3.1.2 Edge computer clusters

A goal of our approach is to create a seemingly uniform computing substrate that is, in fact, made up of heterogeneous edge computers. We consider edge computers the counterpart to server computers, expressly designed for use in edge computing environments. We now discuss various aspects of clustering these edge computers, and how they form cohesive computing units.

Cluster-based edge resources Clustered edge resources of all shapes and sizes have been discussed and evaluated [PDT18]. These proposals range from Sun's Modular Datacenters, which come in shipping containers2, to Canonical's (discontinued) Orange Box [VN], a cluster of Intel NUCs, to clusters of single-board computers (SBCs), such as Raspberry Pis, federated to form a micro datacenter [EPR+17]. However, none of these come with an integrated power-management runtime or other mechanisms for native energy-awareness. Although cloudlets are seen in general as either single-node or cluster-based systems, most existing evaluations of cloudlets in research consider only a single node, typically a commodity desktop PC [SBCD09, LES+14]. We argue that this size of hardware (i.e., commodity computers) is useful for many use cases, as it is a compromise between the two previously presented extremes. However, we also argue that single-node systems are challenged to provide reliable services in the face of varying and unpredictable request arrival rates. A cluster of high-density compute nodes, for example Mini-ITX motherboards with server CPUs, can provide a great deal of computational power while still being compact and portable.

General-purpose edge computers Most definitions consider a cloudlet to be static infrastructure, well-connected to the Internet and deployed at a specific location. Recently, researchers have shown that such cloudlets may be impractical for use cases that require on-premises decision-making for military field personnel in resource-constrained tactical environments [LES+14]. To bridge this gap, they propose tactical cloudlets, i.e., portable cloudlets deployed on, e.g., vehicles, that support computation offloading in such resource-constrained environments. We argue that there are many more use cases to be considered that require portable cloudlets, or, more generally, general-purpose edge computers.

2https://docs.oracle.com/cd/E19115-01/mod.dc.s20/

We consider edge computers a specialized type of server computer designed for use in edge environments. For example, portable compute clusters could be deployed on emergency vehicles to allow complex decision-making and task-planning analytics applications in emergency response scenarios [LMFJ+04]. Other field-based work, such as archaeological dig sites, field experiments in geology, oceanography, etc., could all benefit from edge computing, but requires portable and reliable edge computers. Companies are even exploring space-to-cloud analytics for remote regions using satellites3.

Figure 3.1: Multi-purpose edge computer with self-adaptive middleware that learns how to efficiently schedule workloads

Energy-aware portable edge computer clusters Infrastructure required for such scenarios faces numerous challenges. In particular, these edge computers work in energy-constrained environments, and may therefore have to be partially powered by secondary energy sources such as batteries. As we have discussed, single-node systems are challenged to provide the scalability and reliability required for dealing with unpredictable workloads at the edge. While statically scaling out the infrastructure would provide better service reliability, it also significantly increases energy consumption, thereby diminishing portability. However, by integrating power-management approaches for server clusters, such as vary-on/vary-off (VOVO) algorithms [EKR03], cluster-based edge computers could become the ideal architecture to address these challenges. Yet these and other energy-efficiency mechanisms have been developed largely in the context of large-scale data-center infrastructure [MSS12, KKH+09, BB10].

3https://spire.com/


They typically depend on a model of energy consumption based on proxy metrics such as CPU utilization [WXZL11]. As we discuss in Section 3.2.4, our data indicates that these models are inaccurate for edge computers with modern, high-density, general-purpose hardware. With future developments and the integration of specialized hardware into these edge computers, such as GPUs or single-board AI modules [Fra16], simple models will become infeasible for novel energy-efficiency mechanisms that control these heterogeneous infrastructures. Suppose, for example, a mechanism that optimizes energy consumption based on machine learning techniques such as reinforcement learning (which is also being explored in related cloud-operations research [WXZL11, FLP14]). A learning algorithm will continuously attempt different load balancing configurations given different workload types to minimize energy consumption. To facilitate such mechanisms in a real-world deployment, given the complexity of the energy consumption of novel architectures, edge computers have to be natively energy-aware, i.e., they need access to energy consumption data in real time to accurately determine the impact of different workloads on energy consumption characteristics, learn from past observations, and continuously adapt to changes in the environment that may have an impact on energy consumption. Furthermore, control mechanisms need to be able to control the power state of nodes, i.e., turn them on and off.
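A minimal sketch of such a learning mechanism is an epsilon-greedy agent that tries load-balancing configurations and keeps running estimates of their energy cost from wattage readings. The configuration names and the measure_joules stub are assumptions standing in for the energy-awareness API described here.

    import random

    CONFIGS = ['all-nodes-on', 'two-nodes-on', 'one-node-on']
    estimates = {c: 0.0 for c in CONFIGS}  # running mean energy per config
    counts = {c: 0 for c in CONFIGS}

    def measure_joules(config):
        # Stub for reading the power sensors while serving a workload.
        base = {'all-nodes-on': 90.0, 'two-nodes-on': 55.0, 'one-node-on': 70.0}
        return base[config] + random.gauss(0, 5)

    def choose(epsilon=0.1):
        if random.random() < epsilon or not any(counts.values()):
            return random.choice(CONFIGS)           # explore
        return min(estimates, key=estimates.get)    # exploit lowest energy

    for _ in range(200):
        config = choose()
        joules = measure_joules(config)
        counts[config] += 1
        estimates[config] += (joules - estimates[config]) / counts[config]

    print(min(estimates, key=estimates.get))  # likely 'two-nodes-on'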

3.2 Portable energy-aware edge clusters

A significant number of edge application scenarios can be characterized by the dynamic and resource-constrained environment in which supporting edge computers have to operate. For example, mobile applications are used by emergency response teams for on-premises decision making [LMFJ+04], or by military field personnel for image or speech recognition [LES+14]. Handheld devices stream data and offload compute-intensive tasks to nearby edge computers. Designing and operating edge computing infrastructure for and in these environments is challenging, as edge computers have to be portable to fit on, e.g., an emergency vehicle or a drone; deal with unpredictable client load; provide sufficient performance to host, e.g., data analytics or machine learning applications; and, at the same time, be energy-efficient enough to be powered by secondary power supplies such as batteries. In this section, we present a design, prototype, and evaluation of a portable, energy-aware, cluster-based edge computer that aims to address these challenges. It is portable because it is compact and consumes energy at a scale that could be served by medium-sized batteries. It is energy-aware because it provides a power-management runtime to access energy consumption data in real time, and to control the power state of its nodes. Finally, it is cluster-based because it comprises multiple physical nodes to provide reliable and scalable computing. Furthermore, we present an experimental analysis of the energy and resource-consumption characteristics of our prototype in the context of a data analytics application. The results show the feasibility of our prototype for the presented scenarios, but also reveal the intricacies of the power-management approaches already built into modern CPUs. We show that different load balancing policies and cluster configurations have a significant impact on energy consumption and system responsiveness.

significant impact on energy consumption and system responsiveness. Our insights lay the groundwork for future research on energy-consumption optimization approaches for cluster-based edge computers.

3.2.1 System architecture

We propose an architecture for a general-purpose energy-aware, cluster-based edge computer. Later we present a reference implementation. In our architecture, a cluster-based edge computer is a closed system of n physical general-purpose compute nodes, and a power-management runtime that enables energy-awareness. We focus explicitly on enabling energy-awareness as opposed to general deployment and management of applications, as solutions already exist for these aspects [LES+14, SBCD09]. Figure 3.2 shows a simplified view of our proposed architecture.

Figure 3.2: Architecture for an energy-aware cluster-based edge computer

Nodes host third-party services that are deployed via, e.g., container-based virtualization. The power-management runtime comprises components to monitor and control the nodes, and to provide energy-aware optimization mechanisms for task scheduling or load balancing. These runtime components are hosted on a separate auxiliary device that


is part of the edge computer, but independent of the power-management mechanisms, and much smaller in scale than the compute nodes to minimize the impact on the overall power consumption (in our prototype we use two SBCs, see Section 3.2.3). Between each node and its power source, a power sensor is placed that allows the power monitor component to probe the current wattage of each node. A client device hosts the client side of the runtime: a request router that forwards requests to an actual physical node hosting the service, as well as the third-party client application.

Power-management runtime The power-management runtime comprises the following core components:

Node controller The node controller decides at runtime, based on values received from the power monitor and resource monitor, whether to modify the power state of nodes, i.e., to power them on or off. The node controller relies on an optimization strategy implementation that aims to minimize energy consumption while maintaining the responsiveness of the system. When the controller changes the power state of a node, it informs the other runtime components about these changes.

Power monitor The power monitor probes the nodes' power sensors, and provides an API for other runtime components to access node energy consumption data in real-time.

Resource monitor The resource monitor tracks the resource utilization of nodes, and also provides an API to access the data in real-time. Specifically, it provides data on the CPU frequency (which may be dynamically adjusted by the underlying hardware), CPU utilization, memory usage, and I/O bandwidth of nodes.

Balancing policy The balancing policy aims to optimize energy consumption and resource utilization by distributing request load across physical hosts. It uses the power and resource monitoring data to balance service requests among physical hosts intelligently. Because request dispatching actually happens on the client, the runtime broadcasts changes in the balancing policy to the client-side balancer. For example, when the node controller changes the power state of a node, the balancing strategy is informed and adapts accordingly (e.g., by removing a turned-off node from the pool), and then forwards the information to the client.

Client The client hosts the third-party client application and the client-side components of our power-management runtime. A client device could be an edge device such as a smartphone or an SBC.


Request router In traditional cloud-based clusters, load balancing is typically done in the cloud via, e.g., L7 switches or reverse proxies hosted on powerful machines that forward requests to the actual services depending on some strategy [GB14]. We cannot rely on data-center scale routing hardware in a small portable edge cluster, and dedicating a node of the cluster to serve as a reverse proxy would require this node to be online continuously, and would therefore significantly increase the overall energy consumption. Furthermore, for high-performance applications, a load balancer that has the same computational resources as the nodes it manages would quickly become a bottleneck. Instead, in our architecture, we offload request dispatching to the client. The request router on the client device is a proxy for an actual service request and is responsible for dispatching a request from the client app to a node that hosts the service. The runtime broadcasts changes in the balancing policy to the client-side request router. This includes the network addresses or specific service endpoints of currently active nodes. Requests are routed based on the currently active balancing policy dictated by the edge computer, e.g., the percentage of requests that should be routed to a specific node or service endpoint. This way, the energy cost of load balancing is decentralized among clients. A drawback of this approach is the increased management complexity of balancing policies.

Balancing policy The request router relies on a balancing policy defined by the power-management runtime. For each request, the request router queries the balancing policy for a node to send the service call to. A balancing strategy could simply be a round-robin or weighted distribution among all currently powered nodes. Implementing and evaluating more complex strategies is out of the scope of this thesis, but part of our future work.

3.2.2 Symmetry: a reference implementation

We describe Symmetry – an end-to-end implementation of the system architecture we presented, that addresses the described issues. Symmetry takes the role of the cluster controller, and is designed to run on SBCs like a Raspberry Pi. It features service management on top of Docker; runtime monitoring of black-box metrics, power draw, and application latency; client-side load balancing; and dynamic cluster reconfiguration mechanisms. We provide load balancing and cluster reconfiguration policies such as round-robin or reactive autoscaling, but also give developers tools and APIs to build their own. Figure 3.3 shows all the components in an example deployment on our hardware prototype presented in Section 3.2.3. The figure also shows how Galileo clients (see Section 4.1), used for distributed load testing experiments, integrate with Symmetry. Systems that solve similar problems in data-center scale clusters [GHBRK12] are not applicable to the domain of portable energy-aware cloudlets for reasons we have outlined in a previous publication [RAD18]. First, the operational scale impedes the use of components typical in cloud architectures, such as dedicated L4 or L7 load balancers. Second, the models used for energy management in data centers often build on assumptions that do not hold for smaller-scale hardware that already has very effective built-in energy management.



Third, resource management systems for cloud clusters, such as OpenStack or Kubernetes, are often very resource intensive in and of themselves and require powerful servers to operate.

Figure 3.3: The Symmetry system components and their interactions

Symmetry comprises a set of lightweight Python components that together make up the system control software. Cluster nodes are server computers that host services, and require no software other than Docker and an SSH server. The system is designed to manage compute clusters with around 2–20 nodes, such as the cluster prototype we have presented or an Ubuntu Orange Box, and to host stateless services such as image recognition models, database query serving, or similar applications.

Command line interface and REST API

Operators interact with Symmetry via a CLI to deploy services to the cluster or activate specific load-balancing policies. REST APIs provide ways to control the cluster state and request routing, and enable the modular development of additional operational logic. A runtime dashboard provides insights into the current cluster utilization, power draw, and application performance.

Symmetry core

The core platform component of Symmetry is a Redis instance running on the cluster controller. All layers of Symmetry are integrated via this Redis instance, which facilitates both data storage (such as service metadata) as well as inter-process communication

using the eventbus architecture of PyMQ4. Due to its lightweight design and highly optimized I/O functionality, Redis performs extremely well in this scenario.

Service management

Applications are hosted on cluster nodes as HTTP services in Docker containers. Symmetry starts an NGINX instance on each cluster node to internally route requests to the correct container and to monitor application performance. Services are described via YAML files that specify the necessary metadata. The CLI command symmetry deploy my-service.yml then deploys the service to each node, starts the necessary containers, registers the service endpoints, and updates the node's NGINX config accordingly.

Telemetry daemon

The telemetry daemon aggregates runtime metrics from various sources and publishes them as time-series data into a pub/sub topic that encodes the node and the metric, for example, telemetry:node1:cpu. It implements pull-style monitoring by connecting to the cluster nodes via SSH and executing commands that measure CPU utilization and CPU core frequency, or parse the NGINX logs. For power data, it connects to an Arduino that provides access to readings from the integrated current sensors. If a node shuts down, it informs the other components via the eventbus.
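A minimal sketch of such a pull-style probe, assuming password-less SSH to the nodes and a Redis instance on the controller (all hostnames are illustrative, and this is not the daemon's actual implementation):

import subprocess
import time
import redis

r = redis.Redis(host='controller.local')  # assumed controller address

def cpu_sample(node: str):
    # Read the aggregate 'cpu' line of /proc/stat on `node` via SSH.
    out = subprocess.check_output(['ssh', node, 'head', '-n1', '/proc/stat'])
    fields = [int(v) for v in out.split()[1:]]
    return fields[3], sum(fields)          # (idle ticks, total ticks)

def probe_cpu(node: str, interval: float = 1.0) -> float:
    idle1, total1 = cpu_sample(node)
    time.sleep(interval)
    idle2, total2 = cpu_sample(node)
    return 1.0 - (idle2 - idle1) / (total2 - total1)

while True:
    for node in ['node1', 'node2']:
        util = probe_cpu(node)
        # the topic encodes node and metric, e.g. telemetry:node1:cpu
        r.publish(f'telemetry:{node}:cpu', f'{time.time()} {util:.4f}')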

Cluster daemon

The cluster daemon enacts the load balancing and cluster reconfiguration policy. One policy that Symmetry provides out-of-the-box is Reactive Autoscaling, which activates or suspends a node if a system metric exceeds a given threshold for a specified amount of time. For example, our default implementation activates an additional node if the average CPU utilization is above 85% for more than 10 seconds, and suspends it if the utilization drops below 25%.
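A sketch of such a threshold rule (illustrative; the callbacks and constants are placeholders, not Symmetry's actual API):

import time

ACTIVATE_ABOVE = 0.85   # mean CPU utilization threshold for scaling up
SUSPEND_BELOW = 0.25    # threshold for suspending a node again
HOLD_SECONDS = 10       # how long the upper threshold must be exceeded

def reactive_autoscaler(avg_cpu, activate_node, suspend_node):
    # avg_cpu() returns the mean utilization (0..1) across active nodes
    breach_since = None
    while True:
        util = avg_cpu()
        if util > ACTIVATE_ABOVE:
            breach_since = breach_since or time.time()
            if time.time() - breach_since >= HOLD_SECONDS:
                activate_node()          # power on an additional node
                breach_since = None
        else:
            breach_since = None
            if util < SUSPEND_BELOW:
                suspend_node()           # suspend a node again
        time.sleep(1)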

Client-side request routing

Making clusters appear as a single system is typically achieved via dedicated L4 or L7 load balancers, through which all service requests are routed. Instead, we take a client-side request routing approach that uses simple weighted-random load balancing, where weights are updated dynamically by the load balancing policy in a way that meets some operational goal. A routing table specifies for each service how much of the workload should be directed to a given node. Updates to the routing table are propagated to the clients via Redis pub/sub. This way, request routing is decentralized, while the client components necessary to call services are kept simple.
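A sketch of the client-side selection step, using routing table entries shaped like those in Figure 3.3 (names and endpoints are illustrative, not the actual client implementation):

import random

class RoutingTable:
    # Client-side view of the routing table, updated via Redis pub/sub.

    def __init__(self):
        self.hosts = {}    # service -> list of host endpoints
        self.weights = {}  # service -> list of weights

    def on_update(self, service, hosts, weights):
        # invoked by the pub/sub subscriber when the policy broadcasts a change
        self.hosts[service] = hosts
        self.weights[service] = weights

    def choose(self, service):
        # weighted-random choice of the node that receives the next request
        return random.choices(self.hosts[service],
                              weights=self.weights[service], k=1)[0]

rtbl = RoutingTable()
rtbl.on_update('s1', ['http://h1:8080', 'http://h2:8080'], [0.8, 0.2])
print(rtbl.choose('s1'))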

4https://github.com/thrau/pymq


Figure 3.4: Energy-aware cluster-based edge computer prototype

3.2.3 Hardware prototype

Based on our architecture, we have developed a prototype for an energy-aware portable cluster-based edge computer. The prototype comprises four compute nodes, networking and power infrastructure, a power and resource monitoring system, and a small cluster of client devices. Figure 3.4 shows our prototype infrastructure. It shows (1) the cluster compute nodes, (2) the monitoring infrastructure (with the Raspberry Pi in the bottom left corner in a white case, and the Arduino in the bottom right), and (3) Raspberry Pis that serve as clients. Next, we describe each component in more detail.

Compute nodes The overall design goal of our prototype is to enable high-density computing, i.e., to strike a good compromise between compactness, which enables portability, and performance, which is needed to host typical cloud-based services. We use Mini-ITX form-factor hardware, with server capabilities in mind, to build a high-density cluster. Each node is a mid-priced but powerful server computer made up of the following components:


• Motherboard: ASUS P10S-I Mini-ITX5

• CPU: Intel Xeon E3-1230 (4 cores, 8 threads)6

• RAM: 2x16GB Kingston HyperX Fury DDR4

• SSD: Intel SSD 600p 128GB M.2.7

• Power supplies: picoPSU-90 12V

Overall, each node takes up roughly 17x17x10cm of space. To host applications, the compute nodes run a common server operating system, namely CentOS 78, with a standard configuration. Furthermore, the nodes run Docker CE9 as application deployment platform. We later present energy consumption data to show that this hardware configuration could be powered by commodity battery packs.

Power-management runtime There are many ways to implement the power-management runtime described in Section 3.2.1. Specifically, there are a variety of ways to monitor energy consumption. Although we do not require real-time access to the energy consumption data for the evaluation of this prototype, we wanted a portable and cost-efficient solution to show the feasibility of our design, in particular because the power-management runtime is an integral part of the system and should therefore be portable and small scale. We deploy the power-management runtime on a Raspberry Pi Model 3 B running Raspbian 9. To monitor energy consumption, we developed a compact and cost-efficient monitoring infrastructure using an Arduino Uno and four ACS71210 Hall-effect-based linear current sensors. We later revised the prototype to use digital current sensors, which are less prone to noise. Specifically, we use modified Adafruit INA21911 DC current sensors. It is important to note that we measure between the picoPSU power supply and the AC adapter, because we intentionally do not want to include the power dissipation of the adapter (which we found in experiments to vary greatly, between 10% and 25%). This is also the reason why we do not use consumer-grade power meters, which typically sit between the power socket and the AC adapter. Reading the sensors is done via the analog or digital inputs and a simple Arduino program. The Raspberry Pi provides the Power Monitor, Resource Monitor, and Node Controller components. The Power Monitor

5https://www.asus.com/Commercial-Servers-Workstations/P10S-I/
6https://ark.intel.com/products/52271/Intel-Xeon-Processor-E3-1230-8M-Cache-3_20-GHz
7https://www.intel.com/content/www/us/en/products/memory-storage/solid-state-drives/consumer-ssds/600p-series/600p-128gb-m-2-80mm-3d1.html
8https://www.centos.org/
9https://www.docker.com/
10https://www.allegromicro.com/en/Products/Current-Sensor-ICs/Zero-To-Fifty-Amp-Integrated-Conductor-Sensor-ICs/ACS712.aspxx
11https://www.adafruit.com/product/904


accesses the power readings via the Arduino at a specified sample interval, and calculates the effective power in Watts from the raw readings. The Resource Monitor uses standard Linux facilities to read resource utilization data from the nodes, such as /proc/stat or /proc/cpuinfo, which give details about the current frequency and idle state of CPU cores. We later extracted the monitoring tooling into the Galileo experimentation and analytics framework (see Section 4.1).
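For illustration, converting a raw analog reading into effective power could look roughly like this (a sketch; the sensitivity constant depends on the ACS712 variant used, and all names are assumptions rather than the actual Power Monitor code):

SUPPLY_VOLTAGE = 12.0          # picoPSU input voltage
ADC_REF, ADC_MAX = 5.0, 1023   # Arduino Uno 10-bit ADC
SENSITIVITY = 0.100            # V per A; e.g. 100 mV/A for the 20 A ACS712
ZERO_OFFSET = ADC_MAX / 2      # the sensor outputs Vcc/2 at 0 A

def raw_to_watts(raw: int) -> float:
    # Convert one raw Arduino ADC reading into effective power in Watts.
    volts = (raw - ZERO_OFFSET) * ADC_REF / ADC_MAX
    amps = volts / SENSITIVITY
    return abs(amps) * SUPPLY_VOLTAGE

# e.g., a reading of 650 corresponds to roughly 81 W
print(raw_to_watts(650))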

3.2.4 Performance characteristics

To further drive the design of control mechanisms for edge computer clusters, we first want to understand their performance and energy consumption characteristics. To that end, we run several experiments to gather data on the general energy consumption characteristics of the cluster, as well as concrete performance data when running a data analytics application in different cluster configurations. We use the power-management runtime prototype to collect data on the current energy consumption and resource utilization of each node, and to test different balancing policies during real-world application workloads.

Profiling experiments We examine the system's energy consumption and responsiveness under varying client load and different load balancing policies. The main goal is to determine whether the overall system's energy consumption can be improved with specific load balancing policies given otherwise identical experiment parameters. Furthermore, we are interested in the tradeoff between energy consumption and system responsiveness.

Application scenario For the evaluation, we consider a typical video analytics application [SSX+15] in a forward-deployed scenario [LES+14]. For our application we use the Apache MXNet deep learning library [CLL+15] with pre-trained models to perform image recognition tasks. We use Docker to deploy instances of an MXNet Model Server12, which exposes an MXNet classifier as a web service via an NGINX13 web server. We then use a simple Python HTTP client application that we developed to send images to the exposed endpoint and await classification results. The clients receive from the power-management runtime the network addresses of endpoints and a weighted load balancing policy. We use the pre-trained SqueezeNet [IHM+16] model, a small-footprint model that has been used in other evaluations of image recognition applications [IMS18]. The workload is representative of many of the use cases we described in Section 2.2.

Emulating clients To generate workload, we set up a small Raspberry Pi cluster where each node hosts a client application and a tool for generating load. We describe this in more detail in Section 4.1. The client cluster is connected via a separate switch to the LAN of the cluster, and is powered by a separate power source. The Raspberry Pis are Model 3 B Rev 1.2 running Raspbian 9.

12https://github.com/awslabs/mxnet-model-server
13https://www.nginx.com/


Table 3.1: Load balancing policies for experiments

Exp.  n1    n2    n3    n4
1     100%  -     -     -
2     90%   10%   -     -
3     80%   20%   -     -
4     70%   30%   -     -
5     60%   40%   -     -
6     50%   50%   -     -
7     33%   33%   33%   -
8     25%   25%   25%   25%

Experiment setup We stress nodes by sending image classification requests from our Raspberry Pi clients. A client is a simple Python application that spawns several worker processes that send a randomly selected image (pre-loaded into memory) to a node and wait for a response. To make sure the work required for each request is the same, we pre-load five similar images, each with dimensions around 300x220 pixels and a file size of around 150kB. In exploratory experiments we found that with these parameters, four Raspberry Pis each running eight worker processes to send requests can easily saturate a node's request capacity without slowing down the clients. For each request, we measure the round-trip time (RTT) in milliseconds, with which we quantify the responsiveness of the system. To control the request rate we use a load generator that generates a pyramid arrival pattern, starting from 1 request per second (r/sec), to a peak of 300 r/sec, and down to 1 again. The load is adjusted every 20 seconds, and the experiment runs for 20 minutes. In total, we run eight experiments with which we examine the system's performance and energy consumption under different weighted random load balancing policies. The first experiment sends all requests to n1. The second experiment balances load between two nodes n1, n2 in a ratio of n1/n2 = 0.9/0.1, the third 0.8/0.2, and so forth, until 0.5/0.5 (round-robin scheduling). In the last two experiments we perform round-robin scheduling with three and four nodes, respectively. Table 3.1 summarizes this. The columns indicate the percentage of requests that are routed to the respective node, or whether the node is kept offline.
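The pyramid pattern can be expressed compactly; the following sketch (illustrative, not the actual load generator implementation) yields the target request rate for each 20-second step of the 20-minute run:

def pyramid_rates(floor=1, peak=300, duration_s=20 * 60, step_s=20):
    # Yield the request rate (r/sec) for each step of a pyramid load pattern.
    steps = duration_s // step_s      # 60 steps over 20 minutes
    half = steps // 2
    up = [floor + (peak - floor) * i // (half - 1) for i in range(half)]
    return up + up[::-1]              # ramp up to the peak, then back down

# e.g., drive a load generator: hold each rate for 20 seconds
for rate in pyramid_rates():
    pass  # send requests at `rate` r/sec for the next step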

Experiment results We calculate for each experiment the total energy required for each node, and report the CPU utilization, CPU frequency (dynamically adjusted by the CPU), and statistics on the RTT. Table 3.2 shows the results. We performed the workload experiments six times with similar results. Instead of aggregating the results, which would hide nuances of the experiment runs, we report a representative sample. Column 1 shows the index of the experiment. Column 2 shows the energy consumed in Wh over the entire course of the experiment for each node and in total. To get a more representative result, we


list for offline nodes a value that corresponds to the average offline energy consumption over that period of time across all nodes. Column 3 shows the CPU utilization of each online node over time during the experiment. Column 4 shows the corresponding CPU frequency, where the value is the sum over all cores in MHz (8 logical cores with hyper-threading, each with a maximum frequency of 3700 MHz). Column 5 shows the request RTT in milliseconds (mean, 90th percentile, and 99th percentile) over time during the experiment. Note that the y-axis scale changes from experiment 5 onward.

Energy consumption Although experiment 1 shows the lowest energy consumption, it should be noted that a single node was not able to handle the peak load of 300 r/sec, but instead was capped at 250 r/sec, which should be taken into account when interpreting the results. Looking at the sustained peak of CPU utilization and RTT after minute 8 shows that the node was busy working through a congested request queue, and was able to catch up once the request rate dropped at minute 13. Experiment 2 exhibits similar but less extreme characteristics. In terms of energy consumption, overall we can see that there are only slight variations between the different load balancing policies in experiments 2 through 6. We also observe that adding a new node in experiment 7 does not increase the energy consumption significantly. Looking at the data, we can see that the CPU load is balanced across nodes, which has a cumulative effect on the energy consumption reduction of individual nodes, and this reduction is higher than the cost of adding a node. However, there are diminishing returns: at some point the energy cost of powering on additional nodes will be larger than the benefit of balancing load further, as made evident by experiment 8 (4 nodes, round-robin scheduling), where the energy consumption is significantly increased compared to the other experiments.

Responsiveness Overall we can see that, unsurprisingly, RTT and system responsiveness generally improve when load is balanced between nodes. However, our results reveal interesting insights on the effect of dynamic frequency adjustment on the overall system responsiveness. For example, the first minutes of experiments 6–8 show that, when load is balanced in a way that leaves nodes underutilized, the reduced CPU frequency leads to worse responsiveness, despite more nodes being in use. Figure 3.5 shows this behavior more generally.

Power consumption vs CPU utilization Assumptions often made when building simulation models for energy-efficiency algorithms in cloud computing are: a) that CPU utilization maps directly to energy consumption [WXZL11, BB12], b) that there is a linear relationship between CPU utilization and energy consumption [KKH+09, MSS12, PDCB15], and c) a low peak-to-idle energy consumption ratio [EKR03, WXZL11, BB12]. These assumptions may be valid for data-center scale servers [KKH+09], but our results indicate that they lead to inaccurate results when applied to edge computers. Figure 3.6 shows a scatter plot of the CPU utilization and energy consumption measurements of experiment 3 and node 1. First, we observe that the relationship is more complicated. Up to a CPU utilization of 12%, the energy consumption rises only very slightly. The relation then exhibits logarithmic growth. Second, we observe that for some regions of utilization, the energy consumption variance is very high. For example, around a utilization of 16%, the energy consumption varies between 9W and 30W, whereas at 90% it varies only between 63W and 70W. Third, we observe that the peak-to-idle energy consumption ratio (at around 70W/10W = 7) is much higher in our prototype compared to what is assumed for data-center scale hardware (e.g., 1.36 or 1.45 [BB12]).


Figure 3.5: Results showing the tradeoff between responsiveness and energy consumption. (a) Power draw against workload. (b) Mean application RTT against workload. (c) The effect of CPU frequency scaling on application responsiveness.

Figure 3.6: Relation between CPU utilization and energy consumption of HP ProLiant G5 and G4 servers according to [CRB+11] (right), compared to our results of running an image classification service on our high-density edge computing cluster (left; fitted curve: 28.2 * log(x) - 58.0)


Table 3.2: Results overview of load experiments in different cluster configurations

Exp.  Energy (Wh)
      n1      n2      n3      n4      Σ
1     15.063  0.644   0.644   0.644   16.995
2     14.071  4.039   0.644   0.644   19.398
3     12.775  5.084   0.644   0.644   19.147
4     11.491  6.534   0.644   0.644   19.313
5     10.165  7.922   0.644   0.644   19.375
6     8.797   9.326   0.644   0.644   19.411
7     6.477   6.924   5.345   0.644   19.391
8     5.423   5.856   4.454   5.794   21.527

(The CPU utilization (%), CPU frequency (Σ MHz), and RTT (ms) columns of Table 3.2 are per-experiment time-series plots; only the energy values are reproduced here.)

3.2.5 Discussion of results

An active node in our cluster draws power at a rate between 10–80W, depending on the resource utilization. Even when shut down, a node draws power at a rate of 0.5–2W. This large margin in energy consumption between idle and fully utilized nodes is due to the power-management mechanisms of modern CPUs, in particular dynamic voltage and frequency adjustment. What this means for, e.g., simulation environments, is that using CPU utilization as a proxy for energy consumption will lead to inaccurate results. It is crucial to account for adjusted CPU frequencies based on the current utilization. Even then, estimates of energy consumption based on performance indicators alone may not be accurate. Because energy consumption is a complex process dependent on many different factors that are difficult to accurately measure in their entirety, different types of workloads may yield very different energy consumption patterns [GOM18], which makes it necessary for power-management mechanisms to have real-time access to energy consumption data. Further investigation is needed to understand these factors, and how, for example, integrating specialized hardware affects the overall energy consumption behavior. Overall, our results show that our prototype requires energy at a scale that can be managed with commodity portable energy supplies. For example, to get a rough idea, a typical 20 Ah lead-acid battery at 12V corresponds to 240 Wh. In our 20-minute experiment the cluster processed around 190 000 requests and consumed roughly 20 Wh, with a power draw of about 80W if we consider the additional infrastructure energy consumption (switch and router, which averages at 4.1W). Even if we account for Peukert's law and the resulting diminished effective capacity, a single battery of this type could power the system for over an hour in the experiment conditions. This indicates that the general scale of the cluster is feasible for the application scenarios we presented. Cloud operations research has traditionally assumed low peak-to-idle energy consumption ratios of compute nodes [EKR03, WXZL11, BB12]. Our initial intuition was therefore that the idle energy consumption of nodes would be so high that the optimal configuration would be to fully utilize each node until the application responsiveness drops below a specific threshold (e.g., a maximum RTT value). Our data show the opposite, which we largely attribute to the power-management mechanisms of high-density compute hardware, i.e., dynamic frequency adjustment, and the relationship between CPU frequency and energy consumption. When load is balanced among cores, the CPU utilization is low, and therefore the frequency is scaled down. There is a delicate tradeoff between CPU frequency, energy consumption, and responsiveness, which greatly increases complexity for energy-aware load balancing or scheduling techniques, and is likely to have different


characteristics for different types of applications and workloads. Energy-efficiency systems for clusters need to understand the behavior of the underlying hardware's power-management algorithms, and cooperate with these lower-level mechanisms to provide optimal balancing of workloads. These are particularly important observations for future work on building clusters of hardware that already makes use of energy-efficiency techniques (such as dynamic frequency scaling), but further investigation is needed to fully understand this relationship with respect to different workload types and node configurations.
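As a back-of-the-envelope check of the battery estimate above (the 0.7 derating factor is an illustrative assumption for Peukert losses, not a measured value):

E_batt = 20 Ah × 12 V = 240 Wh,    t ≈ (0.7 × 240 Wh) / 80 W ≈ 2.1 h

which is consistent with the claim that a single battery of this type can power the cluster for over an hour under the experiment conditions.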

3.3 Edge infrastructure scenarios

In edge computing, there is not a single well-established infrastructure model [PDT18]. The infrastructure for cloud computing is the data center, where servers are organized into racks and interconnected using high-performance networking, such as 100GbE fiber optics. In edge computing, by contrast, the infrastructure used for computation, data storage, and networking depends greatly on the use case and the deployment environment [SGL19]. This complicates the development of generalizable computing models and platforms that aim to hide the underlying computing infrastructure from the application. In this section, we outline several edge computing scenarios, and examine their infrastructure and network topology. The goal is to identify commonalities, and to better understand the challenges of building the computing substrate we described in Section 2.3.

3.3.1 Urban sensing

Urban sensing is a smart city concept that provides citizens and governmental parties with environmental and contextual urban data. More and more cities [SSS+17, CZVZ14, Nun05] deploy IoT nodes with cameras and sensing capabilities in urban areas to enable a variety of applications. Data can be used for urban monitoring applications; to create data analytics models such as crowd behavior, flooding models, or accident risk prediction [Bec18]; or to enhance human cognition (see Section 2.1). An example is the Array of Things (AoT), a networked urban sensing project initiated in Chicago, which aims to provide access to real-time data on urban environments [CBSG17]. Each node is equipped with two Single Board Computer (SBC) devices, sensors, and a camera. Nodes are deployed throughout Chicago's 77 neighborhood areas, e.g., mounted on lamp poles or mobile base stations, and networked with, e.g., cellular networks, Wi-Fi, or wired uplinks. The initial deployment consisted of 40 nodes, and has been increased as of Spring 2019 to about 200 nodes. One use case we described in Section 2.2.2 shows how edge computing resources can be used to offload compute-intensive tasks such as analyzing video feeds. It is reasonable to assume that future versions of smart city sensor nodes may also contain embedded AI hardware (see Section 3.1), effectively turning nodes into the multi-purpose edge computers we described in Section 2.3.2. With the appropriate middleware, many such nodes could then be used as a cluster. Locality plays an important role in resource management techniques for such a cluster, as the density of nodes per district, and their


utilization, depend on the city's structure and urban dynamics. We examine the density of deployments in more detail in Section 4.2.

3.3.2 Industry 4.0

Edge computing is considered a key component of realizing Industry 4.0 concepts such as smart manufacturing or the Industrial IoT (IIoT) [CWC+18]. Consider an international manufacturing company that manages several factories. To facilitate IT/OT convergence, factory floors are equipped with edge computing hardware: IIoT gateways, SBCs as sensor aggregators for soft real-time analytics and preprocessing, embedded AI hardware for low-latency video processing (e.g., for high-speed manufacturing lines), and small form-factor computers as on-site workstations. These are plausible extensions to the prototypes presented in [CWC+18], and follow the general trend towards using embedded AI hardware for sensor and video analytics in predictive maintenance scenarios [YMLL17]. On-premises edge data centers (or cloudlets) with varying resource capabilities provide additional IT infrastructure for various compute loads and services. These can potentially be provider-managed, i.e., an on-premises cloudlet that is billed in a pay-as-you-go fashion. Premises are interconnected through the public internet, backed by public cloud services. The infrastructure forms a federated edge-cloud environment that includes resources from across the computing continuum.

3.3.3 Telco-operated mobile edge clouds

Multi-access Edge Computing (MEC) allows third parties to deploy applications on a Mobile Network Operator's (MNO) edge DCs. MEC standardization efforts driven by ETSI include a MEC reference architecture [ETS19]. However, MEC topologies highly depend on the operator's dimensioning decisions, and different candidate locations for edge DCs exist. Installing compute infrastructure directly at base stations would offer significant latency benefits, but may be hindered by space limitations, especially in urban areas. Another option is to deploy edge DCs at the MNO's central offices (CO) [PAA+16], which aggregate traffic from base stations over high-capacity and low-latency links. This performs well in terms of latency and has advantages from a DC "real estate" perspective. A third scenario is to push MEC to core network DCs, co-locating it with the Packet Gateway (P-GW). This is technically the simplest option for legacy 4G networks [HAW17], at the cost of increased latency and reduced traffic offloading gains. Data on real-world MEC deployments is often proprietary and therefore hard to obtain. However, plausible figures per operator include 2–3 core DCs, hundreds of COs, and tens of thousands of base stations country-wide [Tel19]. Current literature corroborates these numbers. For example, Basta et al. [BBH+17] assumed 3 and 4 P-GWs (and thus core DC locations) for Germany and the US, respectively, and Peterson et al. [PAA+16] report that, as of 2016, AT&T was operating 4700 COs. Approximate base station locations and densities can be obtained from crowdsourced LTE coverage maps [opec, opeb, Aus]. For example, in Section 4.2 we show that, as of 2020, there were around 1747 mobile base stations in Vienna, Austria, and the per-district deployment density followed a log-normal


distribution. The resources available in edge DCs will also vary. For example, in the techno-economic analysis of [5G-19, Section 2.3.3.2], a small edge DC was assumed to have a capacity of 576 CPUs, 1512 GB RAM, ∼40 TB storage, and ∼1000 Gbps network capacity, adding a 5 ms latency.

3.3.4 Vehicular networks

Services for connected vehicles, including safety, infotainment, and advanced driving assistance, are key drivers for the automotive industry, and can particularly benefit from edge computing. A typical architecture for vehicle-to-everything (V2X) services includes roadside units (RSU) communicating with vehicles' onboard units using wireless technology, and connected with an optical or wireless (e.g., cellular) backhaul to aggregation points, centralized DCs, and the Internet. RSUs are similar to the nodes we described in the urban sensing scenario, and are candidates for co-location with edge computing hosts for low-latency vehicular services. RSU deployment follows service requirements (coverage, message dissemination latency, reliability). The InterCor project [intb] recommends [DPVW+17] that an RSU should be present at every highway entrance and at least every 6 km. The 5G Automotive Association (5GAA) estimates [5GA19] that the inter-RSU distance will range from 300 m to 1 km, depending on the road type. Due to physical constraints (e.g., an RSU may be mounted at a traffic light in urban environments), the capacity of an edge host deployed at an RSU may be limited, down to that of an SBC or small form-factor node. However, in the Smart Highway testbed (Antwerp, Belgium) [MBLN+19], some RSUs (less than 1 km apart) are equipped with server-class edge compute nodes, connected to the cloud via fiber optic links.

3.3.5 Edge cloud federation

It is common for multi-national companies to implement a hybrid cloud strategy for their IT infrastructure [Wei16], where on-premises IT resources are federated with public cloud providers across several cloud regions. An example are CDNs that provide low-latency access to web resources at the edge. Public data sets from production cloud systems, such as the Google Borg cluster traces [VPK+15], provide insights about cluster configurations in public clouds. The median size of a cluster at the time was reported to be around 10 000 nodes, and clusters were reported to be heterogeneous in terms of RAM, CPU, and networking, but to have no specialized compute capabilities. Bandwidth within a data center and across regions varies greatly depending on the provider, but results of a recent benchmark on cross-region traffic of AWS [Cut18] suggest 10 GBit/s within a region, and 1 GBit/s cross-region. On-premises clouds are connected via the public internet, where bandwidth can range from 100 MBit/s to 10 GBit/s depending on the service provider. Latencies between Internet backbone regions are fairly static and could be drawn from public data sets. Making generalizable statements about companies' IT resources is difficult, but we can assume that they will range from small on-premises commodity

servers connected through the public internet, to multiple racks connected directly to an exchange through fiber optics.

3.4 Summary

In this chapter we examined the spectrum of computing platforms used throughout different edge computing scenarios. We complemented the state of the art with an architecture and prototype for a portable energy-aware edge cluster, with which we revealed several challenges related to applying models from cloud computing to high-density edge computers. We also showed different scenarios in which dispersed edge computing resources could be pooled together into a computing substrate.

The edge computing scenarios we examined highlight the tension between infrastructure abstractions and resource management precision. Successful cloud computing models, such as serverless computing, are effective at hiding the underlying computing infrastructure from the application developer or operator. This seems to conflict with edge computing, where services may have to be placed on very specific devices, for example ones connected to a specific sensor or actuator. Moreover, as the variety of AI accelerators increases, so does the heterogeneity of the edge fabric, thereby introducing additional complexity for platform abstractions.

The examination revealed some concrete challenges for edge computing platforms. It is particularly important that developers can trust the "write once, deploy anywhere" philosophy. A programming model, such as a Function-as-a-Service (FaaS) style model, needs to hide the heterogeneity of the underlying edge resources, but at the same time provide methods to make placement constraints explicit. Not only is this crucial for efficient development of AI applications, it is also a critical requirement for transparent resource provisioning, and for enabling a seamless continuum of edge resources, where coordination middleware can transparently make tradeoffs between application performance, resource utilization, and associated costs. These are critical insights which drove the research presented in Chapter 6.

When building middleware, computing platforms, and operational mechanisms, researchers and developers can often make assumptions about the systems they operate in. In cloud computing, many operational mechanisms build on the assumption that servers have similar computing power, and that they operate in the same network with similar bandwidth. On a highly heterogeneous and geographically dispersed computing substrate, such assumptions may be invalid. Building generalizable operational mechanisms, such as load balancing, scheduling, or autoscaling, is much more difficult, and their performance is subject to many more variables related to the underlying infrastructure. Fine-tuning these operational algorithms to the specific scenario is a key challenge.

By studying the hardware platforms and the infrastructure scenarios we presented in this chapter, we were able to extract commonalities and build a high-level framework for evaluating middleware, computing platforms, and operational mechanisms that have to operate on top of a dispersed computing substrate. This framework is the topic of the next chapter.


CHAPTER 4

Edge system analysis and evaluation methods

It is challenging to evaluate edge computing systems, such as compute platforms [RHM+19], communication middleware [RND18], or resource allocation methods [SNSD17], in a way that allows generalizable conclusions. We attribute this to the inherent complexity and heterogeneity of such systems, and a scarcity of available reference architectures for edge infrastructure, standardized benchmarks, and data on real-world deployments. Evaluating cloud computing systems in simulated environments has become comparatively straightforward due to the large body of architectural models, benchmarks [HBS+15], and trace data sets from production-grade cloud platform deployments [VPK+15], which facilitate well-grounded systems evaluations. In contrast, edge computing researchers currently rely on either highly application-specific testbeds [HCH+14, MKN+17], or abstract system models such as P2P or hierarchical three-tiered topologies [SNSD17, XLD+19, TLG16]. Moreover, the lack of data on real-world edge computing deployments forces developers of simulation tools to resort to overly simplistic simulation models. In this chapter we present our three contributions to the body of evaluation methodologies for edge computing systems. First, in Section 4.1 we present Galileo, a system that streamlines profiling and benchmarking of real-world workloads and hardware, and provides tools for collecting and analyzing system data. Second, in Section 4.2 we present Ether, a toolkit for generating plausible edge computing infrastructure configurations and topologies, based on our insights from Section 3.3. Third, in Section 4.3 we present a trace-driven simulation framework for distributed container-based systems that is integrated with both Galileo, for building its underlying stochastic simulation model, and Ether, to generate random edge system topologies. We then discuss related work in Section 4.4. Finally, Section 4.5 concludes the chapter, where we also outline how the presented methods can support the development of operational strategies for distributed systems in general.


4.1 Profiling edge computer systems

We first present Galileo – an experimentation and analytics framework for empirical performance analysis of distributed systems. The main purpose of Galileo is to simplify the development, execution, and analysis of distributed system profiling experiments. We have used Galileo to (a) create trace data sets for our trace-driven simulation framework (Section 4.3), (b) perform experiments that compare the impact of different load-balancing strategies on energy consumption (Section 3.2.4), and (c) analyze resource usage of different workloads for ML-based workload characterization [Rai21]. We then present a set of edge computer devices and workloads we have profiled so far.

4.1.1 Galileo – An experimentation and analytics framework

Galileo's main components are a database for collecting and storing profiling experiment data, a distributed load testing runtime, an interactive shell to interact with the load testing environment, an experiment editor UI, a dashboard for ad-hoc analysis, and data science notebooks for statistical analysis of experiment results. Figure 4.1 illustrates the component interactions. We describe each component in more detail.

Figure 4.1: The Galileo system components and their interactions

Experiment DB

Galileo stores all recorded experiment data and associated metadata in a simple schema. System monitoring data collected from the system under test are stored as time series, where each record holds the hostname, the metric type (e.g., CPU frequency), the optional subsystem (e.g., a particular disk or network device), and the value. Similarly, we store application-level monitoring data from Galileo workers issuing service requests

with additional data about each request, such as the client that created the request, the server response, and so on. Metadata about the experiment or the experiment procedure can be stored as events, which are simply timestamp-name-value triples. We found that the schema works well both for populating existing data visualization dashboards, for example Grafana shown in Figure 4.3, and for facilitating statistical analyses, for example to create the simulation models described in Section 4.3.1.
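For illustration, such a schema could look as follows in SQLite (a sketch; the actual Galileo database layout may differ in names and types):

import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
CREATE TABLE telemetry (   -- system monitoring data as time series
    ts REAL, hostname TEXT, metric TEXT, subsystem TEXT, value REAL
);
CREATE TABLE requests (    -- application-level data from Galileo workers
    ts REAL, client TEXT, service TEXT, rtt_ms REAL, status INTEGER
);
CREATE TABLE events (      -- experiment metadata: timestamp-name-value triples
    ts REAL, name TEXT, value TEXT
);
""")
con.execute('INSERT INTO telemetry VALUES (?, ?, ?, ?, ?)',
            (1620000000.0, 'node1', 'cpu_freq', 'cpu0', 3400.0))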

Galileo workers and clients

The workforce of Galileo are physical machines that host the Galileo worker runtime. Within the runtime, a Galileo worker manages clients that generate parameterized requests to a particular service. Clients can be created and destroyed dynamically at runtime, and are scheduled to available Galileo workers automatically. Clients are simply factories of service requests. By default, clients send empty HTTP GET requests to the URL in the respective routing table entry. The requests can be parameterized to either use a different HTTP verb or pass specific request parameters. Custom clients can be deployed into the Galileo app repository, and can implement arbitrary workloads beyond HTTP, as long as they are implemented in Python. Galileo workers fetch the necessary client code on demand from the repository. This allows workload generators to package resources that can serve as request payload, for example, images or other data assets. For example, for the experiments in Section 3.2.4, the client contained several images of varying sizes that were randomly selected as payload for the requests to the image classifier service.

Interactive shell

Most load testing tools provide static declarative configurations for creating experiments. Instead, we provide an interactive shell that allows users to interact with the load testing environment at runtime, increment on existing experiments, and then replay the experiment through shell scripts. Through shell commands, the user can interactively scale up/down the number of emulated clients, change the load pattern in which clients are generating requests, or modify how requests are routed to the system under test. Moreover, the scripts enable the development of reusable and parameterizable benchmarks, as well as higher-level tools, such as the experiment editor shown in Figure 4.2. Listing 4.1 demonstrates how the interactive shell can be used to create a repeatable experiment. In this example, we want to examine how a load balancer performs when host resources are used by other tenants the load balancer does not know about. We first set the routing table records that determine where requests to a particular service should be routed to by the clients. The system provides a weighted round-robin routing scheme, so that, in this case, we can direct more load to host2. Then, we create clients that will send HTTP requests to the services. The variables 'clients' and 'noise' now allow interacting with all clients at once. The 'request' method tells the clients to start generating requests in the given manner, either given a specific interarrival distribution (in this example an exponential distribution with λ = 40), or a limited number of requests as fast as possible. The shell is


notified once all requests have been fulfilled by the clients, allowing flow control structures like 'wait'. Because the shell builds on the Python REPL interpreter, users can make use of all Python language features to develop complex experiment definitions.

rtbl.set('load_balancer', ['http://gateway.local'], [1])
rtbl.set('noisy_neighbor', ['http://host1.local:8081', 'http://host2.local:8081'], [1, 5])

clients = galileo.spawn('load_balancer', n=10, parameters={...})
noise = galileo.spawn('noisy_neighbor', n=5)

noise.request(ia=expovariate(l=40))
clients.request(n=1000).wait()

noise.close()
clients.close()

Listing 4.1: Example code that demonstrates the Galileo shell

Experiment editor

A user can interact with the experiment platform either via the shell we just described, or via the experiment editor shown in Figure 4.2. Through the editor, users can create custom workload configurations to describe complex request arrival profiles, and configure the system under test.

Analytics dashboard

Finally, the analytics dashboard allows ad-hoc exploration of experiment results. It visualizes system metrics for each node, aggregated energy consumption, and client trace data such as round-trip time and request queuing delay. Figure 4.3 shows the analytics UI running experiments on the cluster presented in Figure 3.4.

4.1.2 Profiling and benchmarking workloads

We have collected several workloads representative of edge intelligence applications, packaged as serverless functions. Table 4.1 lists the various functions, a short description, the programming language the function is written in, the dispatching mechanism, and the platform the function can use. We provide a brief explanation of the workloads. The workloads emphasize AI and data processing, but also include block device and network I/O stressors for testing various systems aspects. This is particularly important for the simulation presented in Section 4.3 and the workload characterization we develop in [Rai21]. The first three functions represent an entire AI workflow (see Section 2.2.1), from preprocessing to serving. ResNet [HZRS16] is a popular convolutional neural network (CNN) architecture that can be used for a variety of computer vision tasks, such as image classification or depth estimation.


Figure 4.2: Galileo experiment editor
Figure 4.3: Galileo analytics dashboard

MobileNet [HZC+17b] is a CNN specifically designed for mobile devices, and runs on the popular ML platform TensorFlow Lite1 and on the TPU platform. The Speech-to-text function also uses a TensorFlow CNN to perform speech detection on a CPU or GPU. For a pure GPU workload, we use basic matrix multiplication with TensorFlow that performs the operations on the GPU. For CPU-based workloads we have a simple function that calculates Pi to a specific number of digits, as well as a CPU-bound AI function. Specifically, we use the SMT solver CVC4 [BCD+11] to check the satisfiability of SMT formulas. For benchmarking block device usage, we use Linux's flexible I/O tester (FIO)2 to generate random read/write workload. For network usage we simply use Linux's curl data transfer tool to download data from an HTTP or FTP server3.

Table 4.1: Serverless functions used for profiling experiments

Function                Description                                   Language  Dispatching  Platform
resnet50-preprocessing  Scale & resize images                         Python 3  Queue        CPU
resnet50-training       Fine-tuning of ResNet50                       Python 3  Queue        CPU, GPU
resnet50-inference      Object classification with ResNet50           Python 3  Queue        CPU, GPU
mobilenet-inference     Object classification with MobileNet          Python 3  Queue        CPU, TPU
speech-inference        Speech-to-text transcription                  Python 3  Queue        CPU, GPU
tf-gpu                  Matrix multiplication                         Python 3  Fork         GPU
python-pi               Pi calculation                                Python 3  Fork         CPU
smt                     Satisfiability Modulo Theories (SMT) solver   C         Fork         CPU
fio                     Stress block I/O with random read/writes      Bash      Fork         CPU
curl                    Stress network I/O by downloading files       Bash      Fork         CPU

Workloads are packaged as serverless functions (see Section 2.4 and Section 6.2), which can have different dispatching mechanisms. The dispatchers are responsible for triggering

1https://www.tensorflow.org/lite
2https://linux.die.net/man/1/fio
3https://linux.die.net/man/1/curl


the function execution based on incoming events. Some dispatchers spawn a single runtime instance for the function, and use queue-based scheduling for invocation. In other words, events that trigger function executions are stored in a queue, and the dispatcher uses a FIFO ordering to invoke the function. This is useful for functions where the runtime instance uses a lot of resources, or takes a long time to start. This is the case, for example, for AI-based workloads that have to load complex deep neural networks into the VRAM or onto a TPU. Fork-based dispatchers simply fork a new process for each function execution. The distinction is relevant for modelling the impact of performance degradation. Queue-based dispatchers suffer less from resource contention, as invocations are processed sequentially, whereas fork-based dispatchers are not subject to queueing delay, but may cause significant resource contention. We explore this behavior in more detail in [Rai21].
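The distinction can be sketched as follows (illustrative Python, not the actual dispatcher implementations):

import multiprocessing as mp
import queue

def queue_dispatcher(handler, events: queue.Queue):
    # One long-lived runtime instance; events are invoked sequentially (FIFO),
    # so e.g. a model loaded into VRAM is reused and invocations never overlap.
    while True:
        event = events.get()   # blocks until the next trigger arrives
        handler(event)

def fork_dispatcher(handler, event):
    # Fork a new process per trigger; no queueing delay, but concurrent
    # executions may contend for CPU, memory, and I/O.
    p = mp.Process(target=handler, args=(event,))
    p.start()
    return p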

4.1.3 Edge cluster testbeds

We have built several testbeds using different edge devices from the computing continuum we described in Section 3.1. The specific devices we used are listed in Table 4.2. We emphasized SFF computers, SBCs, and embedded AI hardware, while omitting data center computers and mobile devices. Data center hardware is fairly well understood in cloud computing research, and mobile devices have been studied extensively by the community.

Table 4.2: Edge testbed devices

Device            Arch     CPU                              RAM    Accelerator                          Storage
Small form factor (SFF)
Xeon              x86      4 x Xeon E-2224 @ 3.44 GHz       8 GB   N/A                                  HDD
Xeon +GPU         x86      4 x Xeon E-2224 @ 3.44 GHz       8 GB   Turing GPU - 6 GB                    HDD
Intel NUC         x86      4 x Intel i5 @ 2.2 GHz           16 GB  N/A                                  NVME
Single board computer (SBC)
RPi 3             arm32    4 x Cortex-A53 @ 1.4 GHz         1 GB   N/A                                  SD Card
RPi 4             arm32    4 x Cortex-A72 @ 1.5 GHz         1 GB   N/A                                  SD Card
RockPi            aarch64  2 x Cortex-A72, 4 x Cortex-A53   2 GB   N/A                                  SD Card
Embedded AI
Coral DevBoard    aarch64  4 x Cortex-A53                   1 GB   Google Edge TPU                      eMMC
NVIDIA TX2        aarch64  4 x Cortex-A57 @ 2 GHz           8 GB   256-core Pascal GPU                  eMMC
NVIDIA Nano       aarch64  4 x Cortex-A57 @ 1.43 GHz        4 GB   128-core Maxwell GPU                 SD Card
NVIDIA Xavier NX  aarch64  6 x Nvidia Carmel @ 1.9 GHz      8 GB   384-core Volta GPU, 48 tensor cores  SD Card

Figure 4.4 shows four cohesive testbeds we have built using the devices. Each testbed has in its network one or two SBC clusters that act as Galileo clients. Each testbed is managed by an instance of Symmetry (see Section 3.2.2) and uses Galileo to perform profiling experiments. The testbeds have been used to evaluate several research prototypes


developed throughout the thesis, as well as to gather profiling data as input for our trace-driven simulation framework (see Section 4.3).

Figure 4.4: Four different testbeds we have built. Top-left: cluster of Xeon SFF computers (used in Section 3.2.1). Top-right: heterogeneous compute cluster (used in Section 6.5). Bottom-left: cluster of commodity SBCs. Bottom-right: edge AI compute cluster with power monitoring (used in [Rai21]).

4.2 Synthesizing plausible edge infrastructure configurations

This section presents a framework for synthesizing infrastructure configurations for evaluating edge computing systems under different conditions. Evaluating cloud computing systems in simulated environments has become fairly straightforward due to the large body of reference architectures, benchmarks, and trace data sets from production-grade cloud platform deployments. There are a number of tools to simulate or emulate edge systems [MK20], and while they typically provide ways of modeling infrastructure and network topologies, they lack reusable building blocks common to edge scenarios. Consequently, most edge computing systems evaluations to date rely on either highly application-specific testbeds, or abstract scenarios and abstract infrastructure configurations. From the scenarios we analyzed in Section 3.3 we elicit common concepts for our framework. The scenarios serve as input to synthesize plausible infrastructure configurations that are parameterizable in cluster density, device heterogeneity, and network topology. We demonstrate how our tool can generate synthetic infrastructure configurations for the reference scenarios, and how these configurations can be used to evaluate aspects of edge computing systems.

4.2.1 Overview

We propose a framework for synthesizing plausible edge infrastructure configurations from reference scenarios, for evaluating edge computing systems under different conditions and parameters. This is particularly important for evaluating the mechanisms we present in Chapter 5 and Chapter 6. We examine the commonalities of emerging application scenarios that leverage edge infrastructure across the entire compute continuum, and elevate common concepts into our framework domain. The result is a set of primitives and building blocks for generating edge infrastructure configurations and topologies. Based on data we have gathered for the scenarios, we identify a parameter space for these building blocks, giving researchers the ability to vary different aspects of the generated topology, such that the cluster topologies remain plausible extensions of the original scenarios grounded in empirical data. We demonstrate the applicability of our open source tool, Ether [Rau20a], by synthesizing configurations for selected scenarios, and by showing how the synthetic configurations can be used as input for a simulator to evaluate system aspects such as dataflow in a serverless edge platform [RHM+19].

Our goal is to make it easy to generate infrastructure configurations and network topologies that can be used as input for simulations for evaluating edge computing systems. Ether, the tool we have developed, provides out-of-the-box parameterized versions of the scenarios we have defined. It also provides low-level primitives to create high-level building blocks that can be synthesized to generate custom cluster configurations. Figure 4.5 shows an example of how framework components form a topology for the IIoT scenario.

4.2.2 Conceptual model

The conceptual model of our system is simple, aims to be useful for network or systems simulations, and integrates well with other tools. It comprises the following components.

Node: A node is a compute or storage device, and represents an instance of one of the devices listed in Section 3.1. It is characterized by its architecture, generic system capacities (RAM, CPU, storage), and capabilities (presence of GPUs or other hardware accelerators).

Link: A link is anything that facilitates connections between nodes in the network, and can represent network components such as a network card attached to a node, a Wi-Fi access point, or a mobile network base station where bandwidth is allocated across connected nodes. A link has an associated data rate capacity used as a parameter for network simulators.


Figure 4.5: Example of how concepts, primitives, and cell archetypes create a topology for the IIoT scenario

Topology: A topology organizes nodes and links as vertices in a graph structure. Edges between nodes and links represent connections, and can hold network-QoS attributes such as latency or jitter. To connect cells across regions, our tool provides a static backhaul graph built from publicly available data on Internet latencies [inta].

Cell: A cell is a composition of nodes, links, or other cells, i.e., an annotated community within the topology. It can represent an edge network that has an up- and a downlink connecting the cell to a backhaul. Cells can be composed of other cells, and connected through links (e.g., switches or routers).
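As an illustration, the conceptual model can be encoded roughly as follows (a sketch; class names and fields are ours and do not reflect Ether's exact API):

from dataclasses import dataclass, field
import networkx as nx

@dataclass(eq=False)      # identity-based hashing so instances can serve as graph vertices
class Node:
    name: str
    arch: str             # e.g., 'x86', 'arm32', 'aarch64'
    cpus: int
    ram_gb: float
    accelerators: list = field(default_factory=list)   # e.g., ['gpu'], ['tpu']

@dataclass(eq=False)
class Link:
    bandwidth_mbits: float   # data rate capacity used as parameter for network simulators

# Nodes and links are both vertices; edges carry network-QoS attributes.
topology = nx.Graph()
rpi, wifi = Node('rpi4-0', 'arm32', 4, 1.0), Link(bandwidth_mbits=300)
topology.add_edge(rpi, wifi, latency_ms=1.0)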

4.2.3 Building blocks

Most network topology generators provide basic network infrastructure as components. A core contribution of our tool is a set of parameterized cell archetypes as high-level building blocks that are drawn from our reference scenarios. For example, a cell could be: an RSU node with two SBCs for sensing and communication, a small mobile base station, and an external connection; or several cloudlets with an average of n nodes connected through a backhaul.

Primitives

Our framework provides the following primitives to compose more complex cells and topologies.

Host: The simplest cell is a network host, i.e., a node connected to a network via a link (e.g., the host's network card). We provide the devices outlined in Section 3.1 as primitives, which can be composed together to form more complex cells. The node's compute capacities and capabilities are configurable or can be synthesized based on statistical distributions.

LAN cell: A LAN cell is a common and simple way of connecting hosts via point-to-point connections. The synthesizer creates a new helper node that connects all nodes of the cell, analogous to a switch. It also creates up/down links to connect the cell to the Internet or other cells.

Shared link cell: A shared link cell connects the specified hosts to a common shared link, whose bandwidth is shared across hosts. This can be used to model a WLAN, a mobile network base station, or VMs sharing the VM host's network.

Geo-spatial cell: This cell puts nodes and cells into a geo-spatial context, which is useful when evaluating geographically distributed edge systems. For example, RSUs may be uniformly distributed along a road at a fixed distance, whereas USNs and mobile base stations are distributed according to a log-normal distribution across city districts (see Figure 4.6). These parameters are relevant for, e.g., capacity planning.

Backhaul connection: Edge networks are typically connected to the Internet through a backhaul network. Common variants are: (i) a direct fiber uplink to an internet exchange with symmetric up/down bandwidth; (ii) a business ISP with asymmetric bandwidth in the order of 10–200 MBit/s up and 100–1000 MBit/s down; (iii) a mobile network uplink (e.g., via LTE modem).

Cell archetypes

The primitives of our tool allow custom composition of complex cells. Out of the box, our tool provides pre-made parametric cells that are drawn from the scenarios:

Cloudlets: A basic building block for modelling small edge data centers, composed of several racks that can vary in density. Each rack is itself a cell composed of several servers that connect to a switch and may have hardware accelerators such as GPUs. While these may be data-center concepts in terms of terminology, we can use the same concept to build portable multi-purpose cluster-based cloudlets [RAD18], or systems like the Ubuntu Orange Box [VN].

IoT compute cells: This building block allows generating small IoT compute boxes, such as urban sensing nodes (USNs) like the AoT or Huawei PoleStar, or RSUs. It is similar to a small cloudlet, but much more heterogeneous. Typically, these boxes have an IoT gateway connecting sensors or cameras to a processing device with potentially some hardware accelerators, maybe a small storage device, and a communication node connecting the box to a backhaul.

Parameterized cell synthesis

Cells can either be defined statically, or synthesized dynamically from parameters to create randomized cells. Parameterized synthesis is what allows us to generate graphs that are similar across scenarios, but different enough to test different possible variations of a system.


Figure 4.6: Example of how geographic cell density functions can be modelled as log-normal distributions. Top: density of Array of Things nodes per neighborhood in Chicago [CBSG17], fitted as lognorm(0.82, 0.70). Bottom: density of mobile base stations per district in Vienna, Austria [Aus], fitted as lognorm(0.47, 4.29).

Such parameters include:

Size: The size n specifies the number of nodes or cells in the final cell's topology. In other words, the cell is made up of n other cells that are created by a specified procedure.

Density: If a cell is composed of other synthesized cells, we can provide a density function from which n is sampled for each new cell being created. For example, Figure 4.6 shows parameters for the density of cells in the Chicago AoT deployment and of mobile base stations in Vienna (a sampling sketch follows this list).

Entropy: Borrows from information entropy to describe the heterogeneity of cells, i.e., the number and density of different compute device types, and how much nodes differ in compute capacity. A low entropy value means that the cell is fairly homogeneous (e.g., 0 for a server rack with one type of server), whereas a high entropy value describes heterogeneous cells. This parameter is useful for evaluating, e.g., resource allocation mechanisms under different levels of cluster heterogeneity.

Data distribution: Many edge computing systems evaluations deal with data aspects such as distributed storage, caching, etc. An important parameter for these evaluations is the distribution of data across storage nodes, and cells. Like cell density, we can define distribution functions to randomly vary the availability of storage nodes across edge networks.
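For instance, the density parameter of a geo-spatial cell can be realized by sampling cell sizes from a fitted log-normal distribution. The following sketch uses SciPy; the exact shape/scale mapping of the fitted parameters is an assumption of this sketch:

from scipy.stats import lognorm

# Density of AoT nodes per neighborhood, with parameters as used in Listing 4.2
aot_density = lognorm(s=0.82, scale=2.0)

# Sample the size n for each of Chicago's 77 neighborhoods (at least one node each)
sizes = [max(1, round(n)) for n in aot_density.rvs(size=77)]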


Connecting cells to the Internet

Many systems evaluations involve network latencies incurred by cross-regional Internet traffic. Compared to application- and use-case-specific networks, the core Internet has a fairly static layout with well-studied link latencies [inta], which we can leverage. Our tool provides a static fixture of the core Internet, to which application-specific topologies can be linked. Figure 4.7 illustrates the concept. For example, in our IIoT scenario, we could link the premises' up- and downlinks to the respective Internet region in which the premises are located. That way, user-generated topologies receive coarse-grained latency values out of the box.

Figure 4.7: Mapping generated topologies (application graph) to the higher-level Internet topology fixture (static Internet backbone graph with regions such as FRA, US East, and SYD). Latencies from wondernetwork.com.

4.2.4 Demonstration

We demonstrate how our tool Ether can generate topologies for the reference scenarios from Section 3.3, and how topologies can be used to analyze system behavior such as network flows. We report on two selected scenarios.

Generating configurations

Our tool provides a pre-made parameterized version of Scenario 1 (urban sensing); here we show how a topology for it can be generated using our tool. We extend the scenario and model it as follows. Each AoT node is an IoT compute cell with two SBCs, and is connected to a mobile base station in the neighborhood (i.e., they form a shared link cell). For the number of AoT nodes in each of the 77 neighborhoods we use the geographic cell density from Figure 4.6. We assume that each neighborhood has a compute box attached to the mobile base station, with an Intel NUC as storage node and two embedded GPU devices per AoT node camera for, e.g., video processing tasks. The cells are connected via mobile network to the Internet (in terms of Internet topology, as explained in Section 4.2.3, we connect them to the Chicago Internet Exchange). Furthermore, the city provides a fiber-connected cloudlet with two racks and five servers per rack. The Python code to generate this topology is given in Listing 4.2. We make use of Python lambdas for parameterized synthesis, and API methods to materialize cells into a topology.

aot_node = IoTComputeBox(nodes=[nodes.rpi3, nodes.rpi3])

neighborhood = lambda size: SharedLinkCell(
    nodes=[
        [aot_node] * size,
        IoTComputeBox([nodes.nuc] + ([nodes.tx2] * size * 2))
    ],
    shared_bandwidth=800,  # selected arbitrarily
    backhaul=MobileConnection('US East'))

city = GeoCell(
    77,
    nodes=[neighborhood],
    density=lognorm(0.82, 2.))

cloudlet = Cloudlet(
    5, 2,
    backhaul=FiberToExchange('US East'))

topology = Topology()
topology.add(city)
topology.add(cloudlet)

Listing 4.2: Example code to implement the urban sensing scenario using Ether

Using topologies in simulations

As part of the research presented in [RHM+19], we have built a tool to simulate serverless and containerized platforms [RRR20], which we use to evaluate different placement strategies in different infrastructure scenarios. We use Ether to generate several topologies as input for our simulation. The simulator implements a high-level network model on top of topologies, based around flows, which represent data transfers between nodes through several links. We implemented simple shortest-path routing through the topology and fair allocation of link bandwidth across flows.

The following example is an instance of Scenario 2 (Industrial IoT) based on a prototype presented in [CWC+18]. We assume ten factory locations, each having four SBCs as IoT gateways, a compute box with one Intel NUC and one Jetson TX2 board, and an on-premises cloudlet with five servers, as represented in Figure 4.5. The SBCs are connected via a shared 300 MBit/s Wi-Fi link to an AP that has a 10 GBit/s link to the edge resources, and a 1 GBit/s link to the on-premises cloudlet. Premises are connected via 250/500 MBit/s Internet up/downlinks.

Figure 4.8 shows the data traffic of a particular placement in the above configuration. (1) Shows the up/downlink connecting the factory premises to the backhaul. (2) Shows the shared link cell that contains the four SBC devices connected to a shared Wi-Fi link.


Figure 4.8: Visualization of a generated IIoT scenario topology and simulation data from our simulator, using a topographic attribute map [PSK+20]. The elevation field shows link capacity utilization caused by data transfer between serverless functions described in Chapter 6.

The elevation field shows that the 300 MBit/s bandwidth is exhausted. (3) Shows the cloudlet which, given the 10 GBit/s internal LAN, has the lowest utilization.

4.3 Trace-driven simulation framework

Part of our system is a discrete event simulator built with SimPy [Mat08] to simulate the basic behavior of distributed container-based runtime environments. It serves two purposes: (1) it allows experiments in different large-scale scenarios that would not produce meaningful results on our small-scale testbeds, and (2) it can be used to calculate goal functions for optimizing parameters of operational strategies, a technique we demonstrate in Section 6.4. The code is open source [RRR20].

The simulator features: a trace-driven execution simulation model that relies on profiling data rather than abstract system models, a high-level network simulation using the network model of Ether presented in Section 4.2, concepts from container-based execution environments as first-class citizens, and different workload profiles to generate random workload. The default version also comes with orchestration components that are implemented based on state-of-the-art serverless platforms, such as scheduling, autoscaling, and load balancing, which can be extended or replaced.


4.3.1 Simulation model

Network simulation

We use the network model of Ether presented in Section 4.2, which is higher-level than packet-level simulators such as ns-3 [HLR+08] or OPNET [Cha99]. These simulators model very precisely the low-level interaction between network protocols and networking hardware, which is not our goal. Instead, we trade accuracy for simulation performance. Our network simulation builds on the network model presented in Section 4.2.2. Simulating a data transfer involves opening a flow through several connected network links, i.e., a route. A flow can be thought of as a stream of data between two compute nodes that uses the bandwidth of links. Each link has a certain amount of bandwidth, and we implement fair allocation of bandwidth across flows.

When a data transfer between nodes n1 and n2 is simulated, a route is computed with a shortest-path algorithm. The route contains all links along that path between n1 and n2. All existing flows that use links along the route are interrupted, and their bandwidth is re-allocated according to the fair bandwidth allocation scheme. A flow only allocates as much bandwidth on its links as the lowest-bandwidth link along the route (in other words, the bottleneck) can provide. For calculating the data transfer duration of the flow, we use a simple model:

transfer duration = round trip time · 1.5 + bytes to transfer / goodput    (4.1)

In other words, the time to establish the TCP connection (which is bound by the link latency, multiplied by 1.5 to reflect the TCP handshake), plus the ideal time to transfer the given amount of bytes. The round trip time is calculated by summing up the latency values of all connections between links along the route (see Section 4.2). The goodput, i.e., the application-level throughput of a communication, is estimated from the allocatable bandwidth across the route in bytes per second, multiplied by a constant factor of 0.97 that roughly captures the TCP protocol overhead. The function maxAllocatablePerFlow is described in Algorithm 4.1.

goodput(flow) = 0.97 · min_{link ∈ route} maxAllocatablePerFlow(link)    (4.2)
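A direct transcription of Equations 4.1 and 4.2 (a sketch; max_allocatable_per_flow follows Algorithm 4.1 below, and the route is assumed to be a list of link objects):

def goodput(route):
    # Equation 4.2: bottleneck bandwidth along the route, minus ~3% TCP overhead
    return 0.97 * min(max_allocatable_per_flow(link) for link in route)

def transfer_duration(route, num_bytes, rtt_seconds):
    # Equation 4.1: TCP handshake (1.5 x RTT) plus the ideal payload transfer time
    return 1.5 * rtt_seconds + num_bytes / goodput(route)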

We plan to add features for degradation functions to simulate the degrading TCP behavior with many flows [Mor97], as well as goodput reduction through packet loss.

Code execution simulation

The core simulator models the basic behavior of container-based serverless systems, such as Docker Swarm or Kubernetes (see Section 6.2.4). Code is packaged into container images, i.e., uniquely identified deployment units, and executed on cluster nodes by instantiating a container with the respective image. Running a container involves pulling the container image if it is not available on the node, starting up the container, and then running the code inside the container.


Algorithm 4.1: Calculate the max allocatable bandwidth per flow on a link
Input: link
Result: allocatable bandwidth per flow on the link
 1  Function maxAllocatablePerFlow(link):
 2      flows ← flows that use this link
 3      fairPerFlow ← link bandwidth / count(flows)
 4      reserved ← flows whose currently allocated bandwidth < fairPerFlow
 5      allocatable ← link bandwidth − Σ_{f ∈ reserved} (bandwidth allocated for f)
 6      numCompetingFlows ← count(flows) − count(reserved)
 7      if numCompetingFlows > 0 then
 8          allocatablePerFlow ← allocatable / numCompetingFlows
 9      else
10          allocatablePerFlow ← allocatable
11      end
12      return max(fairPerFlow, allocatablePerFlow)
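In Python, Algorithm 4.1 can be transcribed as follows (a sketch; the flow and link bookkeeping attributes are assumptions of this transcription):

def max_allocatable_per_flow(link):
    """Max allocatable bandwidth per flow on a link (Algorithm 4.1)."""
    flows = link.flows                                 # flows that use this link
    fair_per_flow = link.bandwidth / len(flows)
    # flows already below their fair share keep their current allocation
    reserved = [f for f in flows if f.allocated < fair_per_flow]
    allocatable = link.bandwidth - sum(f.allocated for f in reserved)
    competing = len(flows) - len(reserved)
    per_flow = allocatable / competing if competing > 0 else allocatable
    return max(fair_per_flow, per_flow)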

Our simulator implements the basic system components of container orchestrators, such as an API gateway, scheduler, autoscaler, and load balancer. However, they can be replaced to test custom operational mechanisms. The default implementation uses the components from Skippy that we describe in Chapter 6. The code execution model is centered around services or functions that live inside containers and can be invoked once they are started. To simulate code execution on a node, a FunctionSimulator class must be implemented, which abstracts the basic lifecycle of a function with the following methods (a minimal sketch follows the list). Each method produces SimPy simulation events, such as simulating network traffic or requesting node resources.

• deploy: deploys the code on the node (e.g., by pulling the container)

• startup: starts the container

• setup: starts the code inside the container

• invoke: processes a request to the running container

• teardown: cleans up and stops the running container
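A minimal trace-driven FunctionSimulator might look as follows (a sketch; the trace repository interface and the simulate_image_pull helper, shown in the next listing, are illustrative rather than our simulator's exact API):

class TraceDrivenFunctionSimulator:
    """Samples lifecycle durations from profiled traces for the simulated node.
    Each method is a SimPy process; env is the simpy.Environment."""

    def __init__(self, traces):
        self.traces = traces          # profiled durations per (function, node) pair

    def deploy(self, env, node, fn):
        yield from simulate_image_pull(env, node, fn.image)   # uses the network simulation

    def startup(self, env, node, fn):
        yield env.timeout(self.traces.startup_time(fn, node))

    def setup(self, env, node, fn):
        yield env.timeout(self.traces.setup_time(fn, node))

    def invoke(self, env, node, fn, request):
        yield env.timeout(self.traces.sample_fet(fn, node))   # sampled execution duration

    def teardown(self, env, node, fn):
        yield env.timeout(0)          # nothing to clean up in this sketch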

For simulating deploy, we combine our network simulation and profiling data: pulling a container image is simulated using the network simulation, and the container startup time is estimated from profiling data. For pulling the container image, we first populate a virtual container registry with container image data, i.e., image names, CPU architectures, and their compressed image sizes. We then run a network flow between the node pulling the image and the Docker registry (which was previously placed in the network topology), and simulate the network download as described earlier. The simulation keeps track of the node's state, for example whether the container image has been pulled before, in which case we can skip the operation. This mimics the basic behavior of the docker pull command (a sketch follows below). A perfect simulation of docker pull would also consider the layers of an image (Docker images use a layered file system), the availability of layers on a host, and the time it takes to decompress layers; this is out of scope of our current implementation.

A limitation that should be highlighted is that, because the simulation model reflects the container deployment model, our simulator currently only supports simulating platforms that deploy function code via containers. Some platforms, such as OpenWhisk, deploy function code through platform-layer mechanisms, which would require additional simulation code.

The remaining methods, i.e., startup, setup, invoke, and teardown, have to be implemented for a specific function, or can be implemented in a generic way that simply samples from traces associated with executing the respective function on the node being simulated. Our base implementation uses a repository of traces for workloads that we have profiled on the devices in our testbeds (see Section 4.1). When simulating the execution of a function on a node, we look up the function traces for that particular node in our repository, and sample from a statistical distribution that represents the execution duration.

During simulation time, each node has an associated state that can hold any runtime information associated with the node. For simulating resource contention and workload interference, for example, we store all currently running functions and their starting times in the node state. We can then modify the sampled execution duration with the workload interference factor we describe next.
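The deploy step, i.e., the simulated docker pull, can be sketched like this (the registry, node-state, and simulate_flow helpers are placeholders for the respective simulator components):

def simulate_image_pull(env, node, image):
    """Simulate 'docker pull': skip if the image is cached on the node,
    otherwise download it from the registry via a network flow."""
    if image.name in node.state.pulled_images:
        return                        # image already present: mimic Docker's cache hit
    # look up the compressed size for the node's CPU architecture in the virtual registry
    size = registry.compressed_size(image.name, node.arch)
    yield from simulate_flow(env, source=registry.node, sink=node, num_bytes=size)
    node.state.pulled_images.add(image.name)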

4.3.2 Performance modelling

A goal of the simulator is to determine the Function Execution Time (FET) of a function on a node, to know how long the node is busy executing that function. Many system simulators, such as CloudSim [CRB+11], use a simple model that calculates the execution time from the number of instructions of the function and the processor's instruction rate in instructions per second (IPS) [MK20]. There are several issues with this approach. First, the speed of a given CPU depends on many factors. Depending on the computer architecture, not every instruction takes an equal amount of time to execute, meaning that the IPS is not static for the CPU, but depends on the instruction mix, i.e., the task being executed. Moreover, IPS is a function of the CPU's clock speed, which, as we have shown in Section 3.2.4, can be highly dynamic in edge computing hardware. Other factors include cache sizes, or I/O access density. Second, it is difficult to determine the number of instructions of a task, or the real IPS for a CPU. The effort of determining these data is equivalent to running a full set of profiling experiments, meaning there is no benefit over a trace-driven model.

Instead, we model execution times as random variables whose distributions we determine from traces, and then scale during runtime given certain system parameters to account for, e.g., degrading performance or workload interference.


Figure 4.9: Example function execution times (FET) of running an AI function. (a) FET distribution with a single request (n = 100), showing the median (0.41 s) and a fitted log-normal density; (b) FET degradation with increasing concurrent requests, with a linear fit of the medians (0.068x + 0.247).

We describe the effect of resource contention on the FET as degradation functions, whose output can be used to adjust the FET values from the ideal case to reflect the current system conditions. We first describe the simple single-tenant case where only one type of function exists in the system, and then proceed to the general multitenancy case.

Single-tenant scenario

Modelling the FET and the degradation function for a single service is relatively straightforward. Figure 4.9 shows data from a profiling experiment where a function that solves Satisfiability Modulo Theories (SMT) formulas was executed on a node from our edge cluster (see Section 3.2.3). Figure 4.9a shows the density histogram and the parameters of the fitted log-normal distribution for the ideal case, i.e., where no other tasks are running on the system concurrently. Figure 4.9b shows how the execution duration degrades because of resource contention, together with a linear regression estimate of the medians. In Section 3.2.4, specifically Figure 3.5a, we demonstrated that linear models work well when the input parameters are based on the number of concurrent requests.

Based on these observations, a reasonable simulation model for the FET can be defined as follows. We fit a linear function to estimate the median FET based on the number of concurrent requests, and, during simulation time, add log-normally distributed noise. So, we define the FET of function f as a function tˆf of r, where r is the number of concurrent requests of function f. In a multi-node system we would define tˆf,n for each node n in the system that f can be executed on; we omit this for brevity. For the AI function example from Figure 4.9, the parameters could be defined as

tˆf (r) = 0.068 · r + 0.247 + X (4.3)

where X ∼ Lognormal(−4.16, 0.61), adjusted for the current median location. This yields a reasonable simulation model for the FET of function f that is based on traces from profiling experiments. In previously published research, we have found that this approach yields good results [RHM20b]. The degradation function gives particularly accurate results, since it is fine-tuned to one specific workload. However, this model only works for single-tenant scenarios.
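As a sketch, sampling from the model in Equation 4.3 could look like this; re-centering the noise on the estimated median is our simplified reading of "adjusting for the median location":

import numpy as np

rng = np.random.default_rng()

def sample_fet(concurrent_requests):
    """Sample an FET (seconds) for the function of Figure 4.9: a linear
    estimate of the median plus log-normally distributed noise."""
    median = 0.068 * concurrent_requests + 0.247
    noise = rng.lognormal(mean=-4.16, sigma=0.61)
    return median + noise - np.exp(-4.16)   # np.exp(-4.16) is the median of the noise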

Multitenancy scenario

Much more common are multitenancy scenarios, where multiple functions f1, f2, . . . , fn execute on the same node and share its resources. Here, it is particularly important to model the performance degradation through workload interference, i.e., to model how concurrent executions of f1 affect the FET of f2. Although the isolation features of container platforms, such as Docker's use of the Linux kernel feature cgroups, are supposed to shield containers from the effects of co-located containers (also known as noisy neighbors), we, and others [JGJ+18, KGL+20], found that there can still be significant degradation through interference, especially for CPU and I/O resources.

Building on the single-tenant case, we could define tˆ for every function f as a multi-dimensional function whose parameters are the numbers of concurrent calls of all existing functions:

FETf1 : tˆf1(rf1, rf2, . . . , rfn)    (4.4)

where rfi is the number of concurrent requests to function fi. This is clearly infeasible when the number of existing functions in the system increases, as the number of possible input parameter combinations explodes, and it would no longer be practical to run the necessary profiling experiments for creating trace data.

Resource modelling and workload characterization

It is clear that we need more generic input parameters that do not depend on profiling the full combinatorial space. For use in the simulation, it is important that the input parameters of the function (1) are expressive enough to capture the behavior, (2) can be determined from trace data, and (3) can be computed at simulation time. Simulators like CloudSim use system-level metrics, such as the MIPS of a processor, to determine execution times. This would be equivalent to using the current CPU utilization as input parameter for tˆf. However, the effect of resource contention on performance degradation depends on the type of resource being shared, and the type of workload being executed. For example, parallel I/O tasks may experience a heavier slowdown than parallel CPU tasks. Figure 4.10 shows data from our profiling experiments that demonstrate this. Each subfigure shows the FET of one function under workload interference on particular resources. The x-axis represents how much of either CPU or network I/O is used by other background processes. Figure 4.10a shows the execution time of the AI function we described earlier. Figure 4.10b shows the FET of a function that downloads 20 MB of data (e.g., for a data preprocessing step in an ML pipeline). The first function is drastically affected by CPU resource contention, whereas the second function is not at all.


Figure 4.10: Degrading function execution times (FET) of different functions under CPU and network I/O workload interference. (a) AI function (CPU bound); (b) data fetching function (network bound).

These results are not surprising, but they demonstrate the problem of modelling the FET as a function of a one-dimensional parameter such as CPU utilization alone. Instead, we consider the utilization of five resources, namely:

• ucpu the CPU utilization,

• uio the block I/O in bytes/sec,

• unet the network I/O in bytes/sec,

• ugpu the GPU utilization (if any), and

• uram the RAM usage.

For the approach to be feasible, we also need to (1) characterize functions in terms of their resource usage, (2) describe tˆf as a function of these parameters, and (3) compute the resource utilization values of a node at simulation time. Details of the approach are described in [Rai21]; here we provide only a brief summary. For (1), we perform several profiling experiments using combinations of the workloads described in Section 4.1 with an increasing number of concurrent requests. We record for each function the resource usage and the FET. Then, we compute a workload characterization vector u for each function that describes the mean resource utilization over the entire function execution duration. For (2), we build a regression model for each node that predicts a resource degradation factor based on values of u. The factor is applied to the best-case FET, where 0 means no degradation and 1 means a doubling of the FET. The regression model takes the vectors u of all concurrently running functions, and converts them into an input vector that contains, for each resource, the sum, mean, median, and standard deviation. We train the regressor using the profiling data we gathered for (1). For (3), we simply keep track of all concurrently executing functions on a node, and use their resource utilization vectors u to compute the regressor input in the same way.
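A sketch of step (2)'s input construction (resource names and the dictionary layout are assumptions of this sketch):

import numpy as np

RESOURCES = ['cpu', 'io', 'net', 'gpu', 'ram']

def interference_input(active_vectors):
    """Convert the characterization vectors u of all functions currently running
    on a node into the regressor input: sum, mean, median, and standard
    deviation per resource."""
    features = []
    for r in RESOURCES:
        values = np.array([u[r] for u in active_vectors])
        features += [values.sum(), values.mean(), np.median(values), values.std()]
    return np.array(features)

# Example: two functions running concurrently on the node
u1 = {'cpu': 0.9, 'io': 0.0, 'net': 0.1, 'gpu': 0.0, 'ram': 0.2}
u2 = {'cpu': 0.1, 'io': 0.0, 'net': 0.8, 'gpu': 0.0, 'ram': 0.1}
x = interference_input([u1, u2])   # fed to the per-node degradation regressor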

4.3.3 Energy consumption modelling

CloudSim and all its derivatives [MK20], such as iFogSim, use a simple power model as a function of a scalar utilization value, Pˆ(u), where u is the CPU utilization between 0 and 1. Each node type n in the system has its own power model; we omit this for brevity. By default, CloudSim provides linear, square, cubic, and square-root based functions. The power draw is scaled between a fixed Pmax and Pidle, i.e., the maximum power draw and the power draw of the system in idle mode. For example, the linear power model is described simply as:

Pˆ(u) = Pidle + u · (Pmax − Pidle)    (4.5)

However, as we have shown in Section 3.2.4, such models are too simple to correctly explain the energy consumption behavior of edge clusters. Instead, we model P as a function of the resources described earlier:

Pˆ(ucpu, uio, unet, ugpu)    (4.6)

The function Pˆ is learned via multiple regression for each node, using data from our profiling experiments. We found that random forest regressors produce very good results. The utilization values during simulation time are computed as explained earlier.

To demonstrate the difference between power models, we test four models against each other. Figure 4.11 shows the results of two of the models in experiments where we mixed several workloads of different intensity. The figure on the left shows results from a Xeon CPU node of our cluster described in Section 3.2.3. The figure on the right shows results from a Raspberry Pi 4. In both cases, we run a semi-random combination of functions that use different resources over a few minutes. The figures show the actually measured power draw during the experiment, and the power draw estimated by the linear CPU-based model and the random forest regressor. We also fitted a quadratic model using CPU utilization (like CloudSim provides), and a model using multiple linear regression on the same parameters the random forest (RF) model receives. We used the models to estimate the power draw over the entire experiment and calculated the root mean squared error (RMSE) for each one. The error values are shown in Figure 4.12. The values do not generalize, and are specific to the experiments; the difference between models may vary for other workload compositions.
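Training such a model is straightforward with off-the-shelf tooling, as the following sketch with scikit-learn and synthetic placeholder data shows (our actual pipeline uses the profiling traces described in Section 4.1):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder profiling data: rows are utilization vectors (u_cpu, u_io, u_net,
# u_gpu), y is the measured power draw in Watts (synthetic values here).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 4))
y = 2.5 + 4.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 500)

power_model = RandomForestRegressor(n_estimators=100, random_state=0)
power_model.fit(X, y)

# At simulation time: estimate the power draw from the node's current utilization
p_watts = power_model.predict([[0.8, 0.1, 0.05, 0.0]])[0]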

4.3.4 Synthetic workload generators We present our approach to generate synthetic workload at simulation time. Synthetic workload generators in cloud computing often use static arrival rates or rely on traces from real systems [KRM06]. Since real-world trace data on edge computing systems


Figure 4.11: Predicted and actual power draw over time during experiments, comparing the linear power model (ucpu) and the random forest regressor (ucpu, uio, unet, ugpu). (a) Experiment on a Xeon node; (b) experiment on a Raspberry Pi 4.

Figure 4.12: RMSE of the linear, quadratic, multiple regression, and random forest (RF) power models during the experiments. (a) Experiment on the Xeon node; (b) experiment on the Raspberry Pi 4.

is scarce or non-existent, we rely on random workload generators. In our simulator, workload generators are defined by composing definitions of an arrival process and a workload pattern. The arrival process is described by the distribution of interarrivals, i.e., ∆t between requests. For example, Internet traffic has been found to follow Poisson arrival processes [Fel00], where interarrivals are described by an exponential distribution. The workload pattern describes how the request rate changes over time t. This is useful to model systems where, for example, the number of users changes given depending on the time of day or weekday. Figure 4.13 illustrates the concept and shows several examples on how different arrival processes and workload patterns can be composed. The first row shows how to achieve a randomized sinusoidal request pattern, for example for the use case described in Section 5.1.1 or the evaluation in Section 3.2.4. For the interarrival distribution we use an exponential distribution. The probability density −λx 1 function (PDF) of an exponential distribution is λe , where λ is the mean. The workload pattern over follows a sine wave, and the value for sin(t) is used as λ to scale the interarrival distribution. At simulation time we therefore sample from the distribution

78 4.4. Related Work

Figure 4.13: Workload generator patterns sin(t)e− sin(t)x to receive the wait time until the next request. The orange line shows a moving average of the requests per second, which should roughly match the workload pattern. The second row shows how a constant interarrival distribution can be used to model exactly the workload pattern, and how a constant workload profile can be used to model a static workload pattern with randomized interarrivals. The last row shows Gaussian random walks (GRW), where each value represents a random sample from a Normal distribution, that is then used as value for µ in the next random sample. The request profile can be parameterized with a σ value that affects the fluctuation over time. So far we have used these workload generators for evaluations in [RRD21, Ras20, Rai21]. A detailed evaluation of the workload generators is out of scope of this thesis.
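The sinusoidal pattern of the first row can be sketched as follows (parameter names are illustrative, not the simulator's API; the rate is floored to stay positive):

import math
import random

def sinusoidal_arrivals(duration, base_rate=10.0, amplitude=5.0, period=3600.0):
    """Yield request timestamps with exponential interarrivals whose rate
    (lambda) follows a sine wave over time."""
    t = 0.0
    while t < duration:
        lam = base_rate + amplitude * math.sin(2 * math.pi * t / period)
        t += random.expovariate(max(lam, 1e-6))   # wait time until the next request
        yield t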

4.4 Related Work

In cloud computing research, existing benchmarks [HBS+15] and trace data sets from production-grade cloud platform deployments [VPK+15] facilitate well-grounded systems evaluations. Some simulators, such as CloudSim [CRB+11], use profiling data from data-center server computers. Evaluating edge computing systems is much more challenging, given that there are no standardized benchmarks, or even a consensus on the type of infrastructure that edge computing systems will use. Only recently have researchers begun to profile edge devices for use-case specific workloads, such as image classification [MPD19], or video analytics [WFC+19]. With our system Galileo, these profiling experiments can be streamlined and developed in a reproducible way. Moreover, our datasets of profiling data from our edge clusters significantly extend the existing body of data.

There are a number of tools to simulate edge systems [PVCM20, MK20]. To evaluate the types of middleware that we are developing in this thesis, we found that these tools lack fine-grained modelling of execution times for heterogeneous workloads and devices, as well as automatic generation of system infrastructure configurations and network topologies. We have already shown the issues of existing tools with respect to performance modelling in Section 4.3.1.

For defining topologies, existing edge computing simulators generally offer two ways: they either let users import topologies that were created up-front using random generators like BRITE [MGG+17, MLMB01]; or they provide APIs for constructing topologies programmatically [GVDGB17], through configuration files [SOE17], or via graph formats [FCFMO+19]. Our framework can be used to generate plausible infrastructure configurations grounded in existing scenarios as input for these tools.

Existing approaches to generating randomized network topologies rely on abstract models such as random geometric graphs [Wax88, ER59], or hierarchical structures and power-law degree distributions [MLMB01]. Network research has been investigating Internet topologies for decades. There are several approaches to procedural topology generation, such as [MLMB01, TGJ+02, CDZ97]. These approaches aim to be generic, and focus on Internet-scale topologies. There is no distinction of host device types, capabilities, or reusable building blocks (such as edge data centers [SBCD09] or portable edge computers [RAD18]). Our approach is different in that it provides high-level concepts from existing edge infrastructure and hybrid edge/cloud scenarios.

4.5 Summary

Evaluating edge computing systems, such as compute platforms or resource allocation algorithms, in a generalized way is challenging. Although existing tools facilitate simulation or emulation of such systems, we have found that (1) there is a lack of real-world testbeds and profiling data for edge computing systems, (2) simulators make very simplistic system model assumptions, which can lead to inaccurate results, and (3) it is difficult to describe an underlying edge infrastructure and vary its parameters. In this chapter, we first presented Galileo, an experimentation and analytics environment that streamlines systems experiments and can generate input data for the trace-driven simulation framework we built. Our simulation framework builds on empirical data rather than simplified system models, allowing more flexibility in the types of systems being simulated. In the previous chapter, in Section 3.3, we presented edge infrastructure scenarios that encompass many edge system characteristics. We abstracted these characteristics into a framework and a tool, and showed how it can generate different infrastructure configurations for these scenarios, and how the building blocks we provide can help edge systems researchers evaluate systems under different scenarios and conditions.

Together, we believe that these tools can facilitate a closed development loop for operational strategies in general (see Figure 4.14). Operational strategies, such as schedulers, autoscalers, or load balancers, control the real system, which reports monitoring and trace data in a common data format. The traces are used as input for a simulator that can examine operational strategies under different system parameters.



Figure 4.14: Closed development loop

Once an experiment is executed by the simulator, operators can drill into the results using the exploratory analysis dashboard of Galileo. The statistical analysis tool analyzes the results in the experiment database and is used to create predictive models, which themselves feed into the operational strategies, closing the feedback loop to the real system. Moreover, with the tight integration between the trace-driven simulator and the real system, one can use the simulator to tune operational strategies at runtime; we demonstrate this in Chapter 6.


CHAPTER 5

Elastic message-oriented middleware for edge computing

In Chapter 2 we argued that sensing and communication middleware is a crucial component of edge intelligence systems. Efficient device-to-device communication, publishing and accessing sensor data, and integrating loosely coupled software components are all important aspects of edge intelligence that can be solved with message-oriented middleware (MOM). Current industry solutions involve cloud-based MOM built on the publish–subscribe (pub/sub) model, where data centers host massive, transparently scaling brokers that can serve billions of messages per second [Bar15]. The physical distance between clients and brokers, however, is a serious limitation for many IoT and smart city applications that we have also outlined in Chapter 2. Rather than routing all messages to the cloud, brokers could instead be deployed on edge resources to reduce end-to-end latencies between devices in close proximity. However, the geographic dispersion of edge resources and their unpredictable availability make it challenging to provision and manage them in a centralized manner. Client mobility further complicates matters, especially with respect to message delivery and QoS guarantees.

This chapter presents EMMA, an elastic message-oriented middleware for edge computing applications that addresses these challenges. First, in Section 5.1, we motivate the need for edge-enabled MOM in more detail, and elicit design goals. We then present our overall system architecture in Section 5.2. The fundamental mechanisms underlying our system are (1) the orchestration of the broker–client network (Section 5.3), (2) proximity awareness (Section 5.4), and (3) elastic diffusion, i.e., scaling the broker network to the edge dynamically at runtime (Section 5.5). In an empirical evaluation using a real-world testbed, we show that EMMA can significantly reduce end-to-end latencies caused by message routing, even in the face of changing network topologies and client mobility. In Section 5.7 we discuss the rich body of related work on message-oriented middleware. Finally, in Section 5.8, we conclude the chapter with a summary.


5.1 Motivation

Message-oriented middleware has undergone several architectural paradigm shifts over the past decades. The scalability issues of using centralized servers for application-layer message dissemination were discussed intensely during the peer-to-peer era, and many solutions based on completely decentralized peer-to-peer architectures were developed [EFGK03]. But the forces of monopolization and the widespread adoption of cloud computing have caused many commercial solutions to return to a centralized approach, where scalability is largely achieved by consolidating resources into massive data centers. Amazon's AWS IoT [Bar15] and Microsoft's Azure IoT Hub [Sta19], for example, are massive transparently scaling centralized pub/sub systems that can handle billions of messages per second, even in the face of varying load, with a pay-as-you-go cost model. However, despite advances in cloud computing and clustering techniques [GSGKK15], IoT and edge computing scenarios have introduced new challenges, in particular for centralized systems [HKM+17]. IoT applications with ultra-low latency, network resilience, or stringent data privacy requirements cannot be implemented by cloud-based solutions alone, but demand decentralization [SSX+15]. Instead of routing all messages to the cloud, brokers can be deployed on geographically dispersed edge resources to reduce end-to-end latencies between devices in close proximity, and to satisfy network resilience and privacy constraints, as illustrated in Figure 5.1. Yet, system engineers have grown accustomed to the convenience and benefits of centralized deployment and management.


Figure 5.1: Device-to-device communication can be significantly improved when edge resources are used for brokering messages in multi-tier network topologies


Osmotic computing has been proposed as a new computing paradigm to address this dilemma [VFD+16]. It borrows from the biological principle of osmosis, where solvent molecules seamlessly diffuse into regions of higher solute concentration. Although osmotic computing was originally conceived for the dynamic management of microservices across federated edge/cloud infrastructure, it has enjoyed success in other problem areas such as stream processing for IoT [NND+17] or deep learning [MJS+17]. We argue that osmotic computing principles can be applied to seamlessly integrate cloud-based MOM into edge computing. Osmotic computing can abstract how we think about the elasticity of MOM towards the edge, i.e., dynamically moving or provisioning message brokers from the cloud to the edge based on current demands. In particular, we propose an architecture based on two diffusion models: broker and client diffusion. From a static centralized deployment in the cloud, we bootstrap a network of brokers that diffuse into the edge based on resource availability, the number of clients, and their proximity to edge resources. Clients in proximity diffuse to the edge depending on broker proximity and the potential to optimize QoS between the clients. We now take a closer look at an application scenario to elicit concrete requirements for MOM based on osmotic computing principles.

5.1.1 Application scenario

We already summarized several application scenarios in Chapter 2 that drive the development of edge computing solutions. Modern IoT scenarios particularly highlight the need for MOM that leverages resources at the edge. To drill down into the requirements, let us consider the following scenario from the mobile health (mHealth) domain, adapted from Lorincz et al. [LMFJ+04] and Rizzo et al. [RRF+20]. In a major disaster situation, effective coordination and prompt reaction of emergency teams is paramount to save people's lives. To support on-premises decision-making, first responders attach wearable biosensors to patients, which continuously publish medical signs. Emergency technicians carry mobile devices with software to visualize patient data, trigger alerts, and coordinate team efforts. Cloud-based IoT platforms may be impractical or unreliable in this case, as such health-critical mobile systems must be resilient towards network partitions, and require near-real-time data processing and message exchange. Instead, resources in proximity, such as the portable edge computers we presented in Section 3.2, deployed in emergency vehicles, must be used to provide the necessary services. From there, data can be processed, forwarded to hospitals for further analytics and acute patient care, and handed off to the cloud for long-term storage.

MOM plays a critical role in this type of scenario. It has to facilitate low-latency communication between devices in proximity, e.g., to provide immediate alerts on changing medical conditions, while still being able to disseminate messages to locations at different levels of geographic dispersion. However, the mobility and unpredictability of both clients and resources that could act as brokers require a completely dynamic operational mechanism. In cloud-based MOM, operational configuration, such as broker addresses or publish/subscribe topics, is typically best defined at deployment time. In contrast, edge-based MOM needs to react to spontaneously emerging network structures and device congregations, such as in our disaster scenario. This also means that clients need to be able to quickly locate the closest broker, and QoS optimizations need to consider that proximity between clients and brokers may change during runtime, leading to dynamic routing as end and edge devices change location.

5.1.2 Design goals

As we have illustrated, IoT and edge computing architectures pose a great deal of challenges when building messaging middleware. Although the Message Queue Telemetry Transport (MQTT) protocol has proliferated as a standard pub/sub platform for IoT applications [AFGM+15, LCC+15, HKM+17], state-of-the-art MOM solutions based on MQTT fall short of addressing these issues. The dispersion of edge resources adds to the overall complexity of system management mechanisms [Sat17]. Based on our motivating scenarios and the analyses of edge computing characteristics in Chapter 3, we have identified several key design goals that drive the design of our architecture and control mechanisms.

Stringent latency requirements

Optimizing end-to-end latencies for clients is one of the key goals of our middleware. Most of the latency in messaging middleware is caused by packet routing, lost-packet retransmission processing, and the physical propagation delays of network links. Routing messages to the cloud and then disseminating them from there is wasteful, especially if device-to-device communication happens in close proximity. For these situations, brokers deployed at the edge can provide ultra-low latency communication. At the same time, subscribers may be located at geographically dispersed locations, e.g., a data analytics service in the cloud, meaning that messages must be forwarded to the cloud if necessary.

Mobile brokers and endpoints

Mobility is a fundamental element of IoT [AFGM+15, LCC+15] and edge computing, and should be considered as such by osmotic middleware by design. In our scenario, not only clients but also brokers can be mobile; consequently, the network topology is highly dynamic and subject to high churn. Clients as well as edge resources may unexpectedly leave or enter the system. Proximity between nodes may change at any time during runtime, and the system should be equipped with mechanisms to maintain optimized QoS, even in the face of client or broker mobility. Providing message delivery guarantees in the face of mobility is another prominent challenge.

Horizontal scaling

Resources at the edge are a key component in enabling the previous two goals. However, because these resources are highly dynamic, we cannot predict or rely on their availability. We therefore also cannot provide static operational configurations. Instead, a centrally configured deployment should be able to organically diffuse into edge regions where and when necessary. A key mechanism required to facilitate this is proximity detection.


Monitoring and management

Looking at the previous goals, it becomes clear that proximity is a fundamental concept in IoT and edge computing, and needs to become a first-class citizen not only in messaging middleware, but in edge computing algorithms and systems in general. This would appear to imply that the location of brokers and endpoints must be determined in real time, and their physical relationship to each other deduced in terms of the local topology. As we will see, proximity can instead be determined by calculating effective network latency. Additionally, the routing of messages between elements must balance reliability and performance considerations. Moreover, the scalability of proximity monitoring is a key challenge to avoid excessive strain on the network and on resource-constrained devices.

IoT heterogeneity

A plethora of messaging protocols and client hardware for the IoT exists that may be difficult to replace or update [AFGM+15]. Existing infrastructure should seamlessly integrate with a distributed messaging middleware. Furthermore, middleware control mechanisms need to be completely transparent to end devices.

Message delivery agreements

When it comes to message delivery guarantees, there are always tradeoffs between the level of consistency and the end-to-end latency a system can provide, between transmission reliability and bandwidth utilization, and between transmission delay and processing requirements. We recognize that even a single application can have several types of consistency demands. Take for example an analytics tool in our mHealth application. For some subscribers, it may be critical to receive every single data point in the correct order and without duplicates. For others, duplicate data, or even losing some messages, may not be a problem, but the requirement is ultra-low latency delivery. The MQTT protocol addresses this in part via its “quality of service” concept, which defines the type of delivery guarantee a broker provides: at least once, at most once, and exactly once delivery.

In a single-broker system, implementing these types of guarantees is relatively straightforward, as MQTT shows. However, in our distributed broker system, the type of consistency that the system must provide significantly affects how brokers and clients have to be coordinated during diffusion. For example, when a subscriber to a specific topic is migrated from one broker to another, messages received by the target broker during the reconnection process may not reach the migrating client, as we demonstrate in Section 5.6. This may be problematic for some applications, but a coordination mechanism that provides this type of consistency guarantee can be costly in terms of resource consumption and latency penalties.


5.2 System architecture

Several engineering challenges arise when implementing the message-oriented middleware design we have put forth.

• The ad-hoc distribution of brokers to resource constrained devices requires dynamic and efficient management of distributed subscription tables.

• Clients should be unaware of the dynamic broker network and should be transparently connected to brokers via gateways.

• To determine proximity, the network QoS between nodes needs to be monitored and reported efficiently.

• If proximity between clients and brokers changes, e.g., because a broker leaves or enters the network, client–broker connections have to be reconfigured.

• When brokers provide similar QoS to a set of clients, load needs to be distributed among these brokers.

In this section, we present the architecture of EMMA, a distributed QoS-aware MQTT middleware for edge computing that tackles these challenges, built using design principles from osmotic computing. EMMA consists of five core components: brokers, gateways, the controller, the monitoring plane, and the control plane, illustrated in Figure 5.2. MQTT clients connect to gateways, which act as reverse proxies that dynamically connect clients to brokers. Brokers implement the MQTT server protocol and message forwarding to other brokers. The controller acts as a registry, monitoring hub (via the monitoring plane), and system orchestrator (via the control plane). We explain each component in more detail below.

5.2.1 Messaging protocol

EMMA is built around MQTT, a pub/sub protocol that has gained renewed attention with the advent of the IoT [AFGM+15, HKM+17]. Due to its lightweight design and minimal network overhead, MQTT is especially well suited for low-bandwidth and low-power environments [AFGM+15]. Many other pub/sub solutions in research rely on their own protocols and models [STM15, BCR14, TS17]. We designed EMMA to act as a transparently distributed MQTT broker, enabling existing IoT software that uses MQTT to be used seamlessly.

5.2.2 Broker network

Brokers implement the pub/sub protocol, and facilitate message dissemination to clients and within the broker network. Because brokers are treated as ephemeral, i.e., they can be spawned or destroyed at any time, their control mechanisms have to be entirely dynamic.


Figure 5.2: The architectural components of EMMA. The osmotic controller orchestrates a network of brokers and gateways via a monitoring and control plane. The gateways transparently connect clients to the system.

This particularly concerns addressing and message routing. To that end, bridging tables are synchronized between brokers via the controller. Bridging tables specify which brokers have at least one subscriber to a specific topic. However, maintaining a global view of the network topology at each broker would be impractical. Instead, the cloud maintains a global view, whereas brokers only maintain a local view of the topology and routing tables necessary to operate within their contextual boundary. Brokers also monitor and provide access to key performance indicators, such as resource consumption, message throughput, average routing delay, etc., to provide coordination and load balancing algorithms with necessary data.

5.2.3 Gateway network

Gateways are an integral part of the coordination fabric and provide clients transparent access to the broker network. They serve two main purposes: (1) to seamlessly integrate existing IoT clients into the network, and (2) to enable mobility of clients and brokers. Gateways act as reverse proxies that intercept protocol control packets from clients to understand their context (e.g., connection information, Message Delivery Agreements (MDAs), and topic subscriptions), and allow the transparent reconnection of clients to other brokers when adapting to varying network QoS or topology changes. They serve as points in the network from which to estimate proximity between clients and brokers. Gateways can be deployed on different physical nodes than the clients themselves, and can also handle multiple client connections. Gateways also perform simple data processing, such as filtering or protocol translation via protocol adapters. Because they are lightweight, gateways can typically be hosted easily by resource-constrained IoT devices. A similar concept was proposed by Luzuriaga et al. [LCC+15], where intermediary buffers decouple message producers and MQTT publisher clients.

5.2.4 Osmotic controller

The controller maintains a full view of the network topology and is responsible for bootstrapping and orchestrating the system. New brokers and gateways register with the controller when they enter the network, and begin continuously reporting data via the monitoring plane. The controller acts as a registry and discovery service for edge resources that can host additional brokers. Brokers are not dependent on the controller to process and forward messages, because they maintain a local view of the network topology and routing tables. However, the osmotic controller is required (1) to diffuse clients, i.e., to enact QoS optimization decisions by instructing gateways to connect to other brokers, and (2) to diffuse brokers, i.e., to autoscale brokers to edge resources. A network monitor component uses data from the monitoring protocol to maintain a graph data structure that reflects the state of the network. Formally, the network N = (B, C, E) is a bipartite graph containing broker nodes B, client nodes C, and connections between clients and brokers as edges E (which we call links). In this context, a client c represents a gateway. The reconfiguration engine of the controller detects proximity between gateways and brokers based on network latency, and instructs gateways to reconnect to different brokers to optimize QoS. Furthermore, it balances load between brokers that provide similar QoS to clients. The controller also maintains a reduced subscription table, i.e., it stores whether at least one client is subscribed to a specific topic at a specific broker.
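To make the controller's network model concrete, the following Python sketch shows one possible representation of this bipartite graph; the class and field names are illustrative assumptions, not the actual EMMA data structures.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # "broker" or "client" (a client node represents a gateway)

@dataclass
class Network:
    """Bipartite graph N = (B, C, E) with latency-annotated links."""
    brokers: set = field(default_factory=set)
    clients: set = field(default_factory=set)
    latency: dict = field(default_factory=dict)  # (client_id, broker_id) -> ms

    def update_link(self, client, broker, latency_ms):
        # called by the network monitor when a new measurement arrives
        self.clients.add(client)
        self.brokers.add(broker)
        self.latency[(client.node_id, broker.node_id)] = latency_ms

    def d(self, client, broker):
        # the distance function d(c, b) used by the reconfiguration engine
        return self.latency.get((client.node_id, broker.node_id), float("inf"))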

5.2.5 Monitoring plane

Determining the network topology and monitoring its QoS is a key element for edge-enabled message-oriented middleware. It is particularly important for determining proximity between clients and brokers, as proximity is calculated from network metrics such as latency and routing hops, or physical measurements such as signal strength. Both gateways and brokers need to be able to measure and report on their surrounding network state. We discuss proximity detection, and QoS and demand monitoring, in more detail in Section 5.4.

5.2.6 Control plane

A main function of the osmotic controller is to provision brokers to edge resources, and to coordinate client diffusion. To that end, the controller issues control commands via the control plane to brokers and gateways. Brokers and gateways provide control endpoints that accept coordination commands, such as an instruction to reconnect to a different broker. We discuss provisioning and network reconfiguration in more detail in Section 5.5.3.

5.3 Broker network orchestration

As outlined in Section 5.2.4, the reconfiguration engine of the controller detects proximity between gateways and brokers based on network latency, and instructs gateways to reconnect to different brokers to optimize QoS. Internally, the controller uses an event bus architecture that allows it to react to system events, such as brokers or gateways entering the network. The reconfiguration engine is scheduled to run at a fixed interval, at which it examines and reconfigures the network to optimize QoS for clients and to balance load between brokers.

5.3.1 Dynamic topic bridging

Because we assume dynamic availability of brokers and mobility of clients, we cannot rely on static bridging configurations like those of Mosquitto [Gar16]. We therefore extend the standard MQTT protocol with dynamic topic bridging. When a client publishes a message into a given topic, brokers first broadcast the message to all connected subscribers according to the MQTT protocol, and then forward the message to other brokers that have at least one subscriber to the respective topic. Because we generally do not know where, when, and into which topic messages will be published, brokers are required to know about all topic subscriptions of other brokers. Brokers therefore inform the controller about changes to their local subscription tables only when necessary (e.g., if a client subscribes to a new topic), and the controller propagates relevant changes to the other brokers. Figure 5.3 illustrates a bridging scenario and the corresponding bridging table maintained by the controller. Bridging tables are a form of high-level, application-specific routing table that facilitates the topic bridging approach. To maintain bridging tables at the controller, we use the Redis1 key-value database. Figure 5.4 shows the end-to-end subscription operation, beginning with the client issuing an MQTT SUBSCRIBE packet. The gateway stores the control packet and forwards it immediately to the broker. Stored control packets are used to replay subscription requests during a reconnect to a broker.

1https://redis.io


source  topic  broker
b1      t1     b2
b2      t1     b1
b2      t2     b1

Figure 5.3: Bridging tables

The broker updates its subscription tables, and, if this is a new subscription, informs the controller that the broker now has at least one subscriber to this topic. In a subsequent step, the controller updates its bridging tables and propagates the changes to all other brokers. We use Redis' keyspace notification feature to propagate bridging tables to a broker only when they are relevant to the respective broker. Brokers maintain a locally cached view of the bridging tables to facilitate high-throughput access.
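As an illustration of this propagation mechanism, the following Python sketch shows how a broker could refresh its locally cached bridging tables via Redis keyspace notifications using the redis-py client. The key layout (a Redis set bridging:<topic> of broker identifiers) is an assumption for this example, not the actual EMMA schema.

import redis

r = redis.Redis(host="controller", port=6379)

# Keyspace notifications must be enabled on the server: "K" selects
# keyspace events; "g", "$", and "s" select generic, string, and set commands.
r.config_set("notify-keyspace-events", "Kg$s")

pubsub = r.pubsub()
# Receive a notification whenever any bridging-table key changes.
pubsub.psubscribe("__keyspace@0__:bridging:*")

local_cache = {}  # locally cached view of the bridging tables

for message in pubsub.listen():
    if message["type"] != "pmessage":
        continue  # skip subscription confirmations
    # the channel has the form "__keyspace@0__:bridging:<topic>"
    key = message["channel"].decode().split(":", 1)[1]
    topic = key.split(":", 1)[1]
    # re-read the affected entry and refresh the local cache
    local_cache[topic] = {b.decode() for b in r.smembers(key)}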

Figure 5.4: Interaction between EMMA components during an MQTT subscribe operation


Table 5.1: Packets of the migration protocol

Name       Description                                  Bytes
RECONNREQ  Request a gateway to reconnect to a broker   47
RECONNACK  Gateway acknowledges the reconnect           47

5.3.2 Orchestrating client connections

As described earlier, a gateway's main purpose is to hide EMMA from the actual clients, and to facilitate the transparent connection to brokers in proximity. Instead of connecting to a broker directly, MQTT client traffic is tunneled through a local gateway. When a client sends an MQTT CONNECT packet to a gateway, the gateway initially queries the controller for a broker to connect to. The gateway stores all MQTT control packets sent by the client related to establishing a broker connection and subscriptions (CONNECT, SUBSCRIBE, UNSUBSCRIBE). The controller may decide at runtime to migrate the connected clients to a different broker. To that end, we implemented a simple migration protocol (see Table 5.1), which was extended in subsequent research to perform migrations in a transactional way [Gei19]. The controller sends a RECONNREQ packet to the gateway via the monitoring protocol, containing the host and port of the target broker. The gateway asynchronously opens a connection to the new broker, disconnects from the old broker once the new connection is established, informs the controller by sending a RECONNACK packet, and then places the stored control packets into the send-buffer deque. To avoid packet loss during the reconnection of clients to new brokers, a gateway maintains, for each client–broker tunnel, two buffers that hold incoming messages from the client and broker, respectively. Any messages sent by the client during a reconnection process are buffered into a deque. Once the connection to a new broker is established, the recorded MQTT control packets are placed at the head of the deque, the old connection is closed, and the buffer, which includes the control packets, is flushed. Figure 5.5 shows sequence diagrams of the connection and reconnection procedures.
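The following Python sketch illustrates this buffering logic. The EMMA prototype is implemented in Java, so this is only a simplified sketch; the injected connect and notify_controller callables are hypothetical placeholders.

from collections import deque

class ClientTunnel:
    def __init__(self, broker_conn, connect, notify_controller):
        self.broker_conn = broker_conn     # current broker connection
        self.connect = connect             # callable: (host, port) -> connection
        self.notify_controller = notify_controller
        self.control_packets = []          # recorded CONNECT/(UN)SUBSCRIBE packets
        self.send_buffer = deque()         # client -> broker buffer
        self.migrating = False

    def on_client_packet(self, packet):
        if packet.type in ("CONNECT", "SUBSCRIBE", "UNSUBSCRIBE"):
            self.control_packets.append(packet)  # stored for later replay
        if self.migrating:
            self.send_buffer.append(packet)      # buffer during reconnect
        else:
            self.broker_conn.send(packet)

    def on_reconnreq(self, host, port):
        self.migrating = True
        new_conn = self.connect(host, port)      # opened asynchronously in EMMA
        # replay the recorded control packets first: place them at the
        # head of the send buffer so subscriptions are re-established
        self.send_buffer.extendleft(reversed(self.control_packets))
        self.broker_conn.close()                 # drop the old connection
        self.broker_conn = new_conn
        self.notify_controller("RECONNACK")
        while self.send_buffer:                  # flush control packets and
            self.broker_conn.send(self.send_buffer.popleft())  # buffered messages
        self.migrating = False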

Figure 5.5: Connection (left) and reconnection (right) procedures


5.4 Proximity awareness

Proximity is a fundamental concept in edge computing systems. To build proximity-aware systems, we need (1) a good quantification of the proximity between brokers and resources, (2) a scalable way to continuously monitor this proximity, and (3) a way to monitor the demand of clients. In other distributed systems, such as Content Distribution Networks (CDNs), closeness between nodes is often defined and determined by geospatial location, routing hops, or round-trip times (RTT). For (1), we argue that, because the main goal is to optimize end-to-end latency between clients, using the effective network latency is the best way of determining proximity between clients and resources. In Chapter 7 we discuss other manifestations of proximity. Additionally, for determining proximity between clients and brokers, brokers can add their average message buffering delay to the effective network latency to calculate a more precise RTT value. After all, a "close" broker on a congested resource can behave as if it were physically much farther away. However, in contrast to CDNs, where proximity is determined on a per-request basis via DNS probing, osmotic MOM needs to continuously monitor the effective network latency between all nodes, which introduces serious scalability challenges. Simple monitoring solutions can cause high network overhead that becomes difficult to manage, as we demonstrate in detail in Section 5.6.3. For (2), a scalable solution would be a synthesis of predictive network location approaches [DCKM04, WSS05] with concepts such as interest management, where the frequency of monitoring is concentrated on relevant regions. For (3), a combination of throughput monitored from brokers and gateways can be used to estimate demand for a specific region, topic, or content type.

5.4.1 Network QoS monitoring protocol

To reason about the network and possible latency optimizations at runtime, network QoS needs to be continuously monitored and analyzed. To that end, we implement a basic distributed network monitoring protocol. In this thesis, we develop a simple implementation to create a baseline; the use of virtual network coordinate systems and interest management is explored in [Bru21]. We implement an asynchronous pull-type monitoring approach, where the controller issues a request and initializes a measurement process. For sending and receiving protocol packets, each component exposes a UDP port addressable by the controller. Messages of the protocol are encoded in a lightweight binary format, summarized in Table 5.2.

Table 5.2: Packets of the monitoring protocol

Name      Description                        Bytes
QOSREQ    Network QoS measurement request    13
QOSRESP   Network QoS measurement response   9
PINGREQ   Ping request                       5
PINGRESP  Ping response                      5


The general process of the protocol is illustrated in the sequence diagram of Figure 5.6, and described as follows. The controller sends a QOSREQ packet containing an ID and the measurement target (a broker) to a gateway. The gateway then sends 10 PINGREQ packets to the broker at an interval of 250 ms. Each PINGREQ packet contains an ID, which is returned by the broker in a PINGRESP packet. The gateway records the sent and received timestamps for each packet and sends back the average latency as a QOSRESP packet to the controller.
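A minimal sketch of the gateway-side measurement routine follows. The 5-byte packet layout (one type byte, a 4-byte ping ID) is an assumption that happens to match the PINGREQ/PINGRESP sizes in Table 5.2; the actual encoding is not specified here.

import socket
import struct
import time

PINGS = 10       # number of PINGREQ packets per measurement
INTERVAL = 0.25  # 250 ms between pings
PINGREQ = 0x03   # hypothetical packet type identifier

def measure(broker_addr, timeout=1.0):
    """Ping the broker and return the average RTT in milliseconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    rtts = []
    for ping_id in range(PINGS):
        sent = time.monotonic()
        sock.sendto(struct.pack("!BI", PINGREQ, ping_id), broker_addr)
        try:
            data, _ = sock.recvfrom(64)                  # PINGRESP
            _, echoed_id = struct.unpack("!BI", data)
            if echoed_id == ping_id:
                rtts.append((time.monotonic() - sent) * 1000.0)
        except socket.timeout:
            pass  # lost pings are simply not counted
        time.sleep(INTERVAL)
    # the average is reported back to the controller as a QOSRESP packet
    return sum(rtts) / len(rtts) if rtts else None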

Figure 5.6: Protocol interaction of network QoS monitoring

5.4.2 Latency groups

When brokers provide similar QoS to clients, it is important to (1) avoid migrating clients due to slight variability in latency, and (2) balance load between those brokers. To that end, we stratify brokers into latency groups, based on the premise that connecting to any of the brokers in a group is acceptable in terms of QoS. Figure 5.7 illustrates the concept. Latency groups are defined as millisecond intervals I on a log scale. By default, we set I = {[0, 2), [2, 5), [5, 10), [10, 20), [20, 50), [50, 100), [100, 200), [200, 500), [500, 1000), [1000, ∞)}. That means, for example, if a broker b provides a client c a latency of 4 ms, it falls into the interval I2 = [2, 5) and therefore into group 2. For balancing load within a group, we implemented a simple strategy based on the number of clients connected to a broker.
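The stratification can be computed with a simple bisection over the interval bounds. The following sketch, using the 1-based group numbering from the example above, is illustrative:

import bisect

# Upper bounds of the latency group intervals in milliseconds; the last
# group [1000, inf) has no upper bound.
BOUNDS = [2, 5, 10, 20, 50, 100, 200, 500, 1000]

def latency_group(latency_ms: float) -> int:
    # bisect_right counts the bounds <= latency, which is the index of
    # the half-open interval [a, b) containing the value; +1 yields the
    # 1-based group number (I1 = [0, 2), I2 = [2, 5), ...).
    return bisect.bisect_right(BOUNDS, latency_ms) + 1

assert latency_group(4) == 2      # falls into I2 = [2, 5)
assert latency_group(1500) == 10  # falls into I10 = [1000, inf)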


Figure 5.7: Brokers are placed into latency groups, illustrated as concentric circles around gateways

5.5 Elastic diffusion

As we have established, cloud-based message brokering cannot satisfy the stringent QoS requirements of IoT applications. This particularly concerns device-to-device communication in close proximity. Instead, brokers at the edge should facilitate low-latency communication for devices in close proximity, and forward messages to the cloud for further dissemination. This already works well for simple configurations and static infrastructure [Gar16]. However, as our motivational scenario illustrated, edge computing applications require a dynamic system that can adapt to changing network QoS and topology, while still allowing centralized configuration and management. During runtime, when clients at the edge require low-latency communication, the broker network needs to be expanded by provisioning brokers to edge resources dynamically. The routing topology needs to be updated, and clients in close proximity need to be migrated to the new broker. When client communication stops, these brokers can be discarded, and the network may shrink again. We call this process elastic diffusion. In this section, we first describe the key principles, and then proceed to explain the individual mechanisms necessary to achieve elastic diffusion.

5.5.1 Principle

Elastic diffusion is inspired by the physical process of diffusion (from the Latin word diffundere, meaning "to spread out"), and by the cloud computing concept of elastic autoscaling. Figure 5.8 illustrates the concept. Starting from a single configuration in the cloud, i.e., an osmotic controller and a root broker, the system grows and shrinks the broker network during runtime to adapt to changing network topology, client mobility and demand, and resource availability. Brokers diffuse to edge resources when there is potential to optimize client QoS. Similarly, clients diffuse to edge brokers to reduce end-to-end latencies, and to balance load between brokers. A key mechanism to facilitate this is a concept we call osmotic pressure.

Figure 5.8: Osmotic pressure facilitates controlled diffusion of brokers and clients to the edge. Clients exert osmotic pressure on local resources depending on their proximity and demand. The osmotic controller aims to balance osmotic pressure throughout the system.

Osmotic pressure

To facilitate elastic diffusion, we first need an aggregated quantification of proximity and demand. To that end, we define a metric inspired by a core concept of osmosis and diffusion: osmotic pressure. Similar to the biological process, in our system, osmotic pressure causes brokers to diffuse onto edge resources. Clients in close proximity to edge resources cause a "pulling" effect on those resources. When the pressure exceeds or drops below a given threshold θ, the system deploys or removes brokers from these resources. Similarly, brokers deployed on edge resources cause a "pulling" effect on clients in close proximity, which causes a reconfiguration of client–broker connections to optimize QoS for those clients. For edge resources, we calculate osmotic pressure based on the number of clients and their proximity to the resources. Osmotic pressure is high when many clients exist in close proximity to a resource. The exerted pressure can increase with the message throughput of these clients, i.e., if local demand is high.

Broker diffusion

As in osmosis, where solute molecules exert osmotic pressure, clients in close proximity of edge resources exert osmotic pressure on these resources, causing an imbalance in the network. To balance the osmotic pressure, the osmotic controller reacts by deploying a broker to the resource. We assume that edge resources provide some form of container-based virtualization, e.g., Docker, which allows the controller to deploy new brokers as containers. The new broker is integrated into the network, and the controller reconfigures the routing overlay, i.e., it dynamically creates topic bridges to forward messages to the cloud or other brokers if necessary. When the osmotic pressure drops, e.g., because clients move to other locations or leave the network, brokers may be discarded by the system.

Client diffusion

Clients' connections are migrated into edge regions based on their proximity to brokers deployed at the edge. If brokers are deployed at the edge due to osmotic pressure, clients will diffuse there. Because osmotic pressure is calculated using both proximity and broker performance indicators, we get a twofold effect: clients will experience better end-to-end latencies as link usage is minimized, and load is balanced between brokers that provide similar QoS to those clients. When brokers are discarded because of low osmotic pressure, or because of a broker outage, clients diffuse back to a broker closer to the cloud.

5.5.2 Calculating pressure

Osmotic pressure gives us a way to abstract scaling procedures. However, the effectiveness of these procedures is contingent on a meaningful calculation of the osmotic pressure p. We present two concrete proposals: one is based solely on the number of clients and their proximity, while the other additionally takes into account the distance to other brokers. Distance is considered a proxy for QoS. Considering additional metrics, such as message throughput to gauge local demand, is left as future work.

Local distance-based calculation

We calculate the pressure p_n for each node n (resources or brokers) based on the number of clients and their distance to the node. The pressure is inversely proportional to the distance; in other words, many clients in close proximity cause high pressure. d(a, b) denotes the distance between two network nodes.


p_n = \sum_{c \in C} \frac{1}{d(c, n)}    (5.1)

We relate the local pressure p_n to global values by defining \hat{p}_n = p_n / p_{root}, where p_{root} is the pressure of the root broker. That way, the pressure value for n is normalized to the number of clients and their overall dispersion in the system.

Global difference-based calculation

The previous calculation does not consider the current network configuration. We find it useful to adjust the pressure of a node depending on the current connection states of other clients. In other words, the pressure on a worker should be high if clients in proximity to the worker are connected to brokers that are far away, but should be low if those clients are connected to brokers that are closer than the worker. To that end, we need to distinguish the pressure p for resources (worker nodes n ∈ N) that can host brokers, and for active brokers b ∈ B.

For resources, the pressure is calculated as follows, where b_c denotes the broker that client c is currently connected to:

p_n = \log\left(1 + \sum_{c \in C} \max(0,\, d(c, b_c) - d(c, n))\right)    (5.2)

Intuitively, the sum calculates the overall reduction in distance for clients if a broker were to spawn on node n, and is therefore a heuristic for the potential QoS improvement. Hence, the pressure for workers is used for making scale-out decisions. If a client is already connected to a broker that is closer than the node, the difference would be negative, so we ignore it. The logarithm is applied to reflect the logarithmic scale of the latency groups we defined in Section 5.4.2. We add 1 to guard against log(0) cases. Figure 5.9 illustrates an example. The notation is the same as in Figure 5.8.

Figure 5.9: Example of how pressure changes based on the distance of clients to other brokers. Values above lines indicate distances (or QoS) between nodes.

For brokers, the calculation is similar.

p_b = \log\left(1 + \sum_{c \in C_b} \max(0,\, d(c, \mathrm{nextBroker}(c)) - d(c, b_c))\right)    (5.3)


However, because the value is used for scale-down decisions, we use the function nextBroker(c) in the difference calculation, which refers to the broker the client would be connected to if the current broker b_c were removed. Intuitively, the value is a heuristic for the potential QoS degradation if the broker is removed. Since we are using a greedy heuristic for our client–broker connection management (see Section 5.5.3), where clients are always connected to the closest broker, nextBroker(c) returns the second closest broker. Additionally, we only consider the clients that are connected to the broker, i.e., C_b = {c ∈ C | b_c = b}.
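The following Python sketch summarizes the three pressure calculations (Equations 5.1 to 5.3); the function names and the injected distance and lookup callables are illustrative.

import math

def local_pressure(node, clients, d):
    # Eq. 5.1: many clients in close proximity cause high pressure
    return sum(1.0 / d(c, node) for c in clients)

def worker_pressure(worker, clients, broker_of, d):
    # Eq. 5.2: heuristic for the potential QoS improvement if a broker
    # were spawned on `worker`; used for scale-out decisions
    gain = sum(max(0.0, d(c, broker_of[c]) - d(c, worker)) for c in clients)
    return math.log(1 + gain)

def broker_pressure(broker, clients, broker_of, next_broker, d):
    # Eq. 5.3: heuristic for the potential QoS degradation if `broker`
    # were removed; only clients connected to the broker are considered
    connected = [c for c in clients if broker_of[c] == broker]
    loss = sum(max(0.0, d(c, next_broker(c)) - d(c, broker_of[c]))
               for c in connected)
    return math.log(1 + loss)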

5.5.3 Network reconfiguration and QoS optimization

The network is reconfigured by first examining the current QoS of the network, selecting potential broker candidates for each client in a way that will optimize QoS for those clients, and then migrating clients to their designated brokers. The reconfiguration engine of the controller is scheduled to run at a fixed interval of 15 seconds. Figure 5.10 shows the state of a network before and after a reconfiguration. Values of links indicate the proximity (latency in milliseconds). Arrows indicate message flow. Clients c1 and c2 are migrated from broker b1 to b2. A topic bridge between the two brokers is created automatically through the subsequent reconnection and subscription procedures described earlier.

Figure 5.10: Network before and after reconfiguration

The basic algorithm for reconfiguring the network is outlined in Algorithm 5.1. The network N = (B,C,E) is a bipartite graph of brokers B and clients C as described in Section 5.2.4. To avoid migrations when no significant load balancing would occur, we introduce the migration threshold θ, which defines the minimum percentage of connections that have to change within a group of candidate brokers in order to trigger a reconnect. By default, we set θ = 0.1.

5.6 Evaluation

To show the efficacy of our approach, we implement a prototype of the EMMA system and evaluate it in a real-world testbed. We run an experimental scenario that emulates the scenarios described in Section 5.1 to show that the system can (1) deal with clients and brokers unexpectedly entering and leaving the network, (2) dynamically bridge topics only when required, (3) reconfigure connections to optimize latency for clients in close proximity, and (4) balance load between brokers that provide similar QoS to clients.

Algorithm 5.1: Broker–client network reconfiguration
Input: A network N = (B, C, E), a migration threshold θ

foreach c ∈ C do
    b_c ← currentBroker(c)
    B′_c ← brokersInLowestLatencyGroup(N, c)
    b′_c ← broker in B′_c with the fewest connections
    if b_c = b′_c then continue
    if b_c ∈ B′_c then
        δ ← θ · Σ_{b ∈ B′_c} connCnt(b)
        if connCnt(b′_c) + δ ≥ connCnt(b_c) then continue
    end
    migrate(c, b′_c)
end

The evaluation of the different osmotic pressure models presented in Section 5.5.2 is out of scope of this thesis, but is further explored in [Bru21].

5.6.1 Experiment setup

We now present our evaluation environment, the experimental scenario, and deployment details.

Testbed

Our evaluation environment consists of multiple Amazon EC2 virtual machines that span three different AWS data centers. This creates a realistic testbed for testing the system under geographic dispersion and the resulting network latencies. We use eu-central (Frankfurt) for our controller, an initial broker, and a single client. The regions eu-west (Ireland) and us-east (Ohio) host multiple gateways and clients, and we reserve VMs for brokers to be spawned during runtime. Figure 5.11 illustrates the setup and shows the latencies between regions. Latencies within regions are in the sub-millisecond range.

We use the following EC2 instance types for the components. The controller runs on a t2.large (2 vCPUs, 4 GiB RAM) instance. For broker nodes, we use t2.medium instances (2 vCPUs, 2 GiB RAM). Gateways and clients share t2.micro instances (1 vCPU, 1 GiB RAM).


Figure 5.11: Evaluation environment for the EMMA system in AWS EC2. Inter-region latencies: us-east–eu-west 85 ms, us-east–eu-central 98 ms, eu-west–eu-central 27 ms.

Scenario

The experimental scenario emulates the real-world edge computing scenarios presented in Section 5.1. In this particular experiment, we spawn client groups across two regions. Each client group consists of 10 VMs, each hosting a gateway, 1 subscriber, and 7 publishers that exchange messages in a topic named after the region they are deployed in, namely eu-west and us-east. To show that the system can dynamically create topic bridges when necessary, a publisher and subscriber pair is deployed in each region that communicate in the topic global. A single subscriber to this topic is also deployed in eu-central, where the initial broker and controller reside. To demonstrate the orchestration mechanisms, we trigger the following events manually at runtime:

1. Clients appear that communicate in topic global (one publisher and one subscriber each in us-east and eu-west, and one subscriber in eu-central)

2. Client group appears in the us-east region

3. Broker spawns in eu-west (1)

4. Client group appears in the eu-west region

5. Broker spawns in us-east

6. Broker spawns in eu-west (2)

7. Subscriber to topic global in eu-central disappears

8. Broker shuts down in us-east


Clients & load generation

For the purpose of generating load and recording message statistics, we developed a general-purpose framework for benchmarking publish–subscribe systems, which is also open source2. It provides an interactive shell that allows users to spawn different types of MQTT clients and to control the message publishing frequency. For this particular experiment we used the XenQTT3 client library. Messages generated by the application contain a JSON payload with a total of 118 bytes (including JSON overhead). An example is shown in Listing 5.1. Each message contains a UUID, the timestamp at which the message was sent, and a 14-byte dummy payload as a placeholder for a sensor reading.

{
  "id": "3e14463d-7dae-4b0a-8438-316bc262514f",
  "timestamp": "2017-08-30T16:37:18.606Z",
  "payload": "14 extra bytes"
}

Listing 5.1: Example sensor message payload sent by a client

We configure each publisher to generate messages at a fixed rate of 10 messages per second. To avoid noise resulting from interfering IO operations, we store message data in memory and save the data to disk once the client connection has been terminated and the benchmarking runtime exits.
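For illustration, a publisher with this workload shape could look as follows. Note that the actual benchmark clients were built on the Java XenQTT library; this sketch uses the Python paho-mqtt package instead, and the gateway address is a placeholder.

import json
import time
import uuid
from datetime import datetime, timezone

import paho.mqtt.client as mqtt  # paho-mqtt 1.x style API

client = mqtt.Client()
client.connect("localhost", 1883)  # the local gateway
client.loop_start()

while True:
    payload = json.dumps({
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc)
                             .isoformat(timespec="milliseconds")
                             .replace("+00:00", "Z"),
        "payload": "14 extra bytes",
    })
    client.publish("eu-west", payload)  # topic named after the region
    time.sleep(0.1)                     # fixed rate of 10 msg/s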

5.6.2 Experiment Results

The following figures show the main results from our experiment. Figure 5.12 shows the message throughput of the brokers during their lifetime. The circles and dotted lines indicate the events described previously. The x-axis indicates the time (in minutes) of the experiment; each labeled tick is 60 seconds apart. Figure 5.13 and Figure 5.14 show the average end-to-end latency of messages in the respective topics, aggregated every second over all subscribers of that topic.

Throughput & load balancing

Figure 5.12 shows how the rebalancing mechanism and the dynamic bridging approach behave. At event 4, when the client group appears in eu-west, the output rate shows that it takes two rebalancing iterations to fully balance connections between the two active brokers. Because messages are bridged between the two brokers, the input rate does not change. However, as the output rate makes apparent, the balancing can significantly reduce the strain of multicast on a single broker. The input rate of the eu-central broker after event 7 shows that, once there are no subscriptions to a topic at a specific broker, the subscription tables are propagated and messages are no longer bridged to that broker.

2https://git.dsg.tuwien.ac.at/emma/pubsub-benchmark
3https://github.com/TwoGuysFromKabul/xenqtt


At event 8, when the broker in us-east shuts down, clients previously connected to that broker are immediately migrated to other available brokers. All three currently running brokers are in the same latency group for the clients deployed in the us-east region. As the output throughput graph shows, load is balanced between the three brokers.

Topic latencies

Figure 5.13 shows how rebalancing and migrations affect the end-to-end latencies between clients. In particular, the graphs reveal the cost of migrating clients. At event 3, when a new broker spawns and migration of clients begins, the message latencies spike. This is the result of a combination of the gateway buffering mechanism during reconnect, and the Java warm-up phase of newly spawned brokers. However, as the graph shows, the latency stabilizes a few seconds after the migration is complete, and the load is shared between the brokers eu-central and eu-west 1. The second rebalancing iteration has a less drastic effect.

At event 4, the graph for the eu-west topic shows that clients in the region immediately connect to the broker closest to them. Consequently, the load balancing engine moves some of the clients connected from us-east back to the eu-central broker. In this case, it took two balancing iterations to fully balance the connections.

At event 5, when the broker in us-east is spawned, all clients in the region migrate to that broker. This includes the publisher and subscriber in that region communicating via the global topic, which is why the average latency also drops dramatically. A major source of latency in this topic is the round-trip time between the us-east region and the brokers in eu regions. Once clients within the region communicate via a local broker, only messages that have to be forwarded are sent over the us–eu link.

At event 8, when the broker in us-east shuts down, latencies spike due to reconnection buffering, and then stabilize at their previous levels.

Figure 5.14 shows the time window between events 5 and 8, where both client groups have a broker in their respective region. As the graphs show, the latencies average at around 3–5 ms. The variance in the eu-west topic shows how load balancing affects latencies for devices in close proximity. These results also indicate that messages are bridged efficiently, meaning that the system can provide low end-to-end latencies for clients in close proximity, and forward messages to other brokers that provide similar QoS with minimal overhead.

Global message dissemination

In each region, one subscriber to the global topic is deployed. Figure 5.15 shows the average end-to-end latency for each subscriber in the respective region during the experiment. As the us-east graph shows, messages sent from the us-east region have, at first, the highest round-trip time, causing the high fluctuation in that region. At the later stages, after event 5, where a broker is present in each region, message latencies average at roughly the average link latencies between clients and their local brokers.



Figure 5.12: Broker message throughput

Figure 5.13: Aggregated topic latencies


Figure 5.14: Latencies during intra-region communication

Figure 5.15: Average latency for global topic subscribers

Looking at the subscriber in eu-central, we also observe that, even when topics are bridged from other regions, the latencies do not significantly change. This shows that the overhead of bridging messages to geographically dispersed locations is minimal, even under broker load.

Message loss

The gateway buffering approach avoids message loss for publishers during a reconnection procedure. However, messages published while subscribers are migrated may not be delivered to these subscribers. Figure 5.16 shows the total amount of undelivered messages in both client groups at given points in time during the experiment. The boxes show the distribution of undelivered messages across the 10 subscribers of the respective group. Boxes are drawn at points in time where the number of undelivered messages changed.


Figure 5.16: Message loss for clients in the respective regions

Each client group consists of 70 publishers that publish at a frequency of 10 msg/s, totaling 700 msg/s. In the us-east topic, a total of 177,269 messages were published. As the last box for the topic shows, subscribers each experienced a total message loss of about 765 messages, i.e., 0.43% per subscriber. For the global topic, message loss was minimal. Providing consistency guarantees for mobile subscribers necessitates a transactional migration. To that end, in [Gei19], we introduced a consensus protocol for EMMA, which consists of additional control packets, a migration sequence, and state synchronization procedures. While this removes message loss, the transactional migration adds latency. A handoff requires eight synchronous control messages and is therefore sensitive to network latencies. The duration of the state synchronization depends on the number of messages the subscriber is behind. With an average round-trip time of 30 ms between the client and the involved brokers, the migration procedure may take between 500 ms and 3000 ms.

5.6.3 QoS monitoring network usage

QoS monitoring comes at the cost of using the network's bandwidth. We present a sample calculation that shows that the usage grows linearly with the number of brokers and clients in the network. Measuring the QoS between two nodes involves sending a total of 22 UDP packets. A UDP packet in IPv4 Ethernet has 48 bytes of overhead; hence, the UDP overhead for each measurement totals 1056 bytes. The request and response packets sent between gateway and controller are 13 bytes and 9 bytes, respectively. Then, 10 ping messages are sent to the broker, and, ideally, 10 response messages are returned, totaling 100 bytes. Including the UDP overhead, this means a total of 1178 bytes per measurement. Our prototype implements a fixed-rate approach, where QoS is measured every 15 seconds. In a network of 100 brokers, this would mean roughly 7.9 kB/s of network usage for a gateway. Figure 5.17 shows the usage over the entire network depending on the number of gateways (log scale) and brokers in the network, assuming a measurement interval of 15 seconds.
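The sample calculation can be restated compactly as follows; the sketch simply encodes the arithmetic above.

UDP_OVERHEAD = 48                   # bytes per packet in IPv4 Ethernet
PACKETS = 22                        # QOSREQ + QOSRESP + 10 PINGREQ + 10 PINGRESP
PAYLOAD = 13 + 9 + 10 * 5 + 10 * 5  # protocol payload bytes, = 122

BYTES_PER_MEASUREMENT = PACKETS * UDP_OVERHEAD + PAYLOAD  # = 1178

def usage_kbps(gateways, brokers, interval_s=15.0):
    # every gateway measures every broker once per interval
    return gateways * brokers * BYTES_PER_MEASUREMENT / interval_s / 1000.0

print(usage_kbps(1, 100))  # ~7.9 kB/s for one gateway and 100 brokers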


Figure 5.17: Overall network usage in kB/s by QoS monitoring

There is a lot of potential to optimize these values. Currently, every gateway requests QoS measurements for every broker at a fixed interval, which may not be necessary. For example, the frequency of QoS measurements could be adapted to the initially measured latency, such that brokers in close proximity are updated more frequently than others. Another approach would be to use virtual network coordinates, like Vivaldi [DCKM04].

To that end, in [Bru21], we adapted the EMMA monitoring protocol to use Vivaldi coordinates. Vivaldi uses a Euclidean distance model and computes synthetic multi-dimensional coordinates to predict inter-host latencies. Each measurement (in our case, ping packets) reduces the error of the estimated coordinate, meaning that the prediction error may be significant when there are few measurements. We no longer need the fully connected network graph to estimate distances between nodes in the network. In our initial implementation, a client pings five random brokers to get initial coordinates, and then pings the closest five to increase the accuracy for the brokers in lower latency groups (which are more sensitive to errors). This way, the network strain scales linearly with the number of clients in the system. In preliminary experiments, using the same scenario that we described in Section 5.6.1, we were able to reduce the network strain by 34%, while maintaining effectively the same latency distribution. More research is necessary to fully understand the effect of prediction errors on performance in different scenarios.
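For reference, the following sketch shows the core Vivaldi update rule from [DCKM04] for two-dimensional Euclidean coordinates; the dimensionality and the tuning constants are illustrative, and the adaptation in [Bru21] may differ in these details.

import math

CC, CE = 0.25, 0.25  # tuning constants from the Vivaldi paper

class VivaldiNode:
    def __init__(self):
        self.coord = [0.0, 0.0]
        self.error = 1.0  # confidence in our own coordinate estimate

    def update(self, remote_coord, remote_error, rtt_ms):
        dist = math.dist(self.coord, remote_coord)
        # weight balances local against remote confidence
        w = self.error / (self.error + remote_error)
        sample_error = abs(dist - rtt_ms) / rtt_ms
        # move the error estimate towards the sample error
        self.error = sample_error * CE * w + self.error * (1 - CE * w)
        # move our coordinate along the direction away from the remote
        # node, proportional to the measurement error
        delta = CC * w * (rtt_ms - dist)
        if dist > 0:
            unit = [(a - b) / dist for a, b in zip(self.coord, remote_coord)]
        else:
            unit = [1.0, 0.0]  # arbitrary direction for coincident nodes
        self.coord = [a + delta * u for a, u in zip(self.coord, unit)]

    def estimate_rtt(self, remote_coord):
        # the predicted latency is the distance between coordinates
        return math.dist(self.coord, remote_coord)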

5.7 Related work

Messaging middleware, and pub/sub systems in particular, have enjoyed immense research interest over the past decades [EFGK03, HKM+17]. Systems like Hermes [PB02] or PADRES [JCL+10] have led to an entire body of research dedicated to optimizing and enabling robust message dissemination in complex peer-to-peer overlay networks.

The advent of cloud computing and the IoT has changed the landscape of real-world messaging solutions dramatically. Over the past few years, many commercial cloud-based messaging solutions have emerged that target IoT platforms. Amazon's AWS IoT [Bar15], Microsoft's Azure IoT Hub [Sta19], and Google Pub/Sub [KG15] all aim at providing transparently scaling IoT device integration via pub/sub messaging. Although these solutions can deal with massive amounts of devices and messages, they cannot satisfy the stringent QoS requirements imposed by many IoT scenarios, as they completely disregard proximity and edge resources. In particular, research has shown that current protocols and solutions cannot adequately deal with the scalability and QoS requirements of modern IoT scenarios [HKM+17].

Scalability of pub/sub middleware under fluctuating load has primarily been addressed in cloud-based solutions. Centralized systems such as Dynamoth [GSGKK15] achieve scalability by providing load balancing in a broker cluster, and elasticity mechanisms for adding and removing broker nodes on demand. The evaluation shows that response times can be stabilized even under varying load. However, these cloud-based systems do not consider the proximity of clients or the latency incurred by link usage. Load balancing for edge computing is a fairly unexplored topic, and has only recently gained attention in the edge computing community [SYY+17]. EMMA's network reconfiguration approach includes a connection-based load-balancing strategy, implicitly trading off broker load and network latency.

QoS awareness in pub/sub overlay networks has been addressed in different works spanning the past two decades [CAR05, CS05, KKY+10, OR11, BCR14]. Specifically, these approaches focus on techniques for managing complex overlay networks using contextual QoS information [OR11], and efficient message routing within these networks [CAR05]. Contributions include optimal path selection for message routing based on QoS criteria, and supporting mobility by providing efficient on-line re-configuration of overlays [BCR14]. Our approach does not require complex routing or overlay network management, as we apply a simple direct delivery model [GSKK17], which we discuss later. Generally, research on overlay networks tends to focus on formal system models, and to ignore system details such as the interaction between components during re-configuration, or concrete proposals for QoS monitoring.

Some open-source solutions, such as JoramMQ [Sca14], Mosquitto4 and HiveMQ5, have shown tentative efforts to provide basic mechanisms for enabling real-world edge computing applications. For example, JoramMQ [Sca14] can model a hierarchical broker tree to allow low-latency communication in pre-defined subsystems. Mosquitto and HiveMQ can be configured, at deployment time, to bridge topics, i.e., forward messages of a specific topic, to a centralized broker [Gar16]. However, these mechanisms are all static in nature and do not address mobility or changing resource availability. EMMA introduces dynamic orchestration mechanisms that can deal with high-churn network configurations.

4https://mosquitto.org
5http://www.hivemq.com


Few efforts have been made to engineer complete solutions for QoS-aware pub/sub systems that address the challenges of IoT and edge computing. An et al. [AKGH17] present PubSubCoord, a cloud-based coordination system for a distributed broker network. The overlay layer is strictly structured into edge brokers and routing brokers, coordinated via a layer of ZooKeeper nodes. In a non-peer-reviewed work, Abdelwahab and Hamdaoui present FogMQ [AH16], which supports migration of broker clones to the edge at runtime, thereby enabling low-latency data analytics. MultiPub [GSKK17] manages a broker cluster that spans multiple cloud regions, and optimizes latency for clients based on their proximity to a region by moving topics to those regions. Our message dissemination strategy is similar to MultiPub's direct delivery model. However, in MultiPub, regions and the brokers within them are configured statically; only topic migration and client–broker connections are dynamic. EMMA goes one step further in that it considers brokers on arbitrary edge resources and dynamically re-configures client–broker connections at runtime. The presented solutions are also unspecific about monitoring strategies to facilitate proximity detection.

5.8 Summary

Efficient device-to-device communication plays a fundamental role in edge intelligence systems. In this chapter, we presented EMMA, an elastic, QoS-aware MQTT middleware that is designed for geo-distributed environments. We have shown how principles of osmotic computing can be applied to message-oriented middleware to provide a distributed network of brokers that elastically diffuse to edge resources on demand. Gateways allow existing MQTT client infrastructure to transparently connect to the system. We demonstrated that EMMA can provide low-latency communication for devices in close proximity, while allowing message dissemination to geographically dispersed locations at minimal overhead cost. Unlike cloud-based IoT platforms that completely neglect the proximity of devices, EMMA makes proximity a first-class citizen, both for deploying new brokers and for optimizing responsiveness for clients. To that end, we implemented a proximity monitoring protocol that continuously updates a global view of the network. In subsequent research [Bru21], we showed how virtual network coordinate systems can reduce the monitoring overhead significantly. Our proximity-aware network reconfiguration mechanism enables client mobility, dynamic broker provisioning, and broker load balancing. Our system can serve as stand-alone messaging middleware to facilitate device-to-device communication in IoT environments, and can also complement complex edge computing applications that rely on message brokers, such as distributed real-time data analytics applications.

The idea of elastic diffusion, i.e., using proximity to jointly drive autoscaling and load balancing decisions, translates to other edge computing systems, especially the serverless systems that we discuss in the next chapter. In these systems, requests to serverless functions are typically routed at the application layer through a centralized API gateway or load balancer. That may not be an issue if the round-trip time of a request is bound by computing time rather than by the network latency from the client to the API gateway.


When workloads are network bound, however, centralized API gateways cause the same problems as centralized message brokers in edge scenarios. The idea of elastic diffusion could be applied to serverless functions and API gateways to address this issue. API gateways have to be distributed in the same way that we distribute gateways in EMMA. Serverless functions then have to be scaled not only based on resource usage, but also on their proximity to the clients invoking them. We are exploring and evaluating this in ongoing research [Pal21].


CHAPTER 6

A serverless platform for edge computing

Serverless computing has emerged as a powerful system abstraction that frees application developers and operators from the operational complexities of computing infrastructure. Developers provide their application code, and serverless platform operators take care of provisioning resources, running and scaling the application, and orchestrating the system. The characteristics of edge computing infrastructure we described in Chapter 3 dramatically increase the complexity of such system abstractions and operational mechanisms, as we can no longer rely on the assumptions that a centralized data center allows. Moreover, there is a tension between the requirement of serverless computing to abstract the underlying infrastructure, and the requirement of many edge computing services to run on very specific devices due to data locality or device capabilities. In this chapter, we present our approach to resolve this tension. Together with the middleware we presented in Chapter 5, our serverless edge computing platform represents the second fundamental component of a seamless distributed computing fabric for edge intelligence.

We begin in Section 6.1 with an introduction to serverless edge computing and outline our approach. In Section 6.2, we highlight the challenges of serverless edge computing with a discussion of application scenarios and data from preliminary experiments. In Section 6.3, we present the design and implementation of our platform. We extend existing serverless computing platforms to run on the computing substrate described in Section 2.3.2. The platform provides a high-level programming model for developing data-intensive serverless functions. We introduce a new container scheduling system that makes heuristic tradeoffs tailored to these environments. To deal with the diversity of edge computing scenarios, in Section 6.4 we present a method to automatically fine-tune scheduler parameters to achieve high-level operational goals. Section 6.5 presents our experimental evaluation using the methodologies from Chapter 4. Finally, we present related work in Section 6.6, and conclude the chapter in Section 6.7.


6.1 Introduction

Serverless edge computing has emerged as a compelling model for dealing with many of the challenges associated with edge infrastructure [ATC+21, GND17, NRS+17, RHM+19, BMHP19, YHZ+17]. It expands on the idea of serverless computing, which first drew widespread attention when Amazon introduced its Lambda service in 2015 [JSSS+19]. Lambda allowed users to develop their applications as composable cloud functions, deploy them through a Function-as-a-Service (FaaS) offering, and leave operational tasks such as provisioning or scaling to the provider. Serverless computing is also a candidate for enabling the edge AI workflows discussed in Section 2.2. Analogous to the idea of serverless cloud functions, we imagine that edge functions can significantly simplify the development and operation of certain edge computing applications. In [Sch19] we presented a case study demonstrating the benefits of serverless edge computing for smart home analytics applications. Results showed that application responsiveness can be improved by up to 85%, and runtime execution costs can be reduced to almost zero.

Many serverless platforms have emerged that allow different computational models, yet their support for data-intensive distributed computing [HFG+18] and edge computing environments [ATC+21] is still limited. Specifically, this manifests as follows. First, they do not consider the proximity and bandwidth between nodes [HFG+18], which is particularly problematic for edge infrastructure, where the distances between compute nodes, data storage, and a function code repository (e.g., a container registry) incur significant latencies and inter-network traffic [SBCD09]. Second, fetching and storing data is typically part of the function code and left to application developers (e.g., manually accessing storage service APIs), which makes it hard for the platform to reason about data locality and data movement tradeoffs [RHM+19, SLF+20]. Third, they provide limited or no support for specialized compute platforms or hardware accelerators such as GPUs [HFG+18, IMS18], leaving potential edge resources that provide such capabilities underutilized.

This chapter presents our efforts in addressing these three issues, which we believe are the most immediate limiting factors in enabling the distributed compute fabric idea we outlined in Section 2.3. We present a platform for serverless edge computing that introduces a high-level programming model, and Skippy, a container scheduling system that facilitates the efficient placement of serverless edge functions on distributed and heterogeneous infrastructure. Our system interfaces with existing serverless execution runtimes, like the container orchestration system Kubernetes, that were not designed for edge computing scenarios, and makes them sensitive to the characteristics of edge systems. The programming model introduces concepts that allow the platform to reason about the dataflow of functions. Skippy is an online scheduler, modeled after the Kubernetes scheduler, which implements a greedy multi-criteria decision making (MCDM) algorithm. We introduce two additional components that are commonly missing in state-of-the-art container schedulers: a static bandwidth graph of the network holding the theoretical (or estimated) bandwidth between nodes, and a data index that maps data item identifiers to the storage nodes that hold the data. These two extra components facilitate our additional scheduling constraints that favor nodes based on (1) proximity to data storage nodes, (2) proximity to the container registry, (3) available compute capabilities (e.g., for favoring nodes that have hardware accelerators), and (4) edge/cloud locality (e.g., to favor nodes at the edge). We have found that these constraints are critical to making an effective tradeoff between data and computation movement in edge systems.

Furthermore, we recognize that tuning scheduler parameters for effective function placement is challenging, as it requires extensive operational data and expert knowledge about the production system. Instead, we propose a method that leverages the tight integration of the scheduler with a simulation framework, in combination with existing multi-objective optimization algorithms, to optimize high-level operational goals such as function execution time, network usage, edge resource utilization, or cloud execution cost.
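To make the greedy MCDM idea concrete, the following sketch scores feasible nodes by a weighted sum of priority functions; the priority names and uniform default weights are illustrative assumptions and not Skippy's actual implementation, which we detail in Section 6.3.

# Illustrative priority weights; Section 6.4 describes how such weights
# can be tuned automatically against high-level operational goals.
WEIGHTS = {
    "data_locality": 1.0,      # proximity to storage nodes holding the inputs
    "registry_locality": 1.0,  # proximity to the container registry
    "capability_match": 1.0,   # e.g., a GPU for accelerated functions
    "edge_locality": 1.0,      # prefer edge over cloud nodes
}

def schedule(pod, nodes, feasible, priority_fns, weights=WEIGHTS):
    # hard constraints first: filter out infeasible nodes
    candidates = [n for n in nodes if feasible(n, pod)]
    if not candidates:
        return None  # the pod remains unscheduled

    def score(node):
        # soft constraints: weighted sum of normalized priority scores
        return sum(w * priority_fns[name](node, pod)
                   for name, w in weights.items())

    return max(candidates, key=score)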

We implement a prototype that targets the container orchestration system Kubernetes, and deploy it on an edge testbed we have built. In addition, we evaluate our system with trace-driven simulations of different infrastructure scenarios, using traces generated from running representative workloads on our testbed. We show how the scheduler and optimization technique work in tandem to enable serverless platforms to be used in a wide range of edge computing scenarios. Our results show that (a) our scheduler significantly improves the quality of task placement compared to the state-of-the-art scheduler of Kubernetes, and (b) our method for fine-tuning scheduling parameters helps significantly in meeting operational goals.

The specific contributions can be summarized as follows:

• a design for a serverless platform that treats concepts from edge AI workflows as first-class citizens, both in the programming model and in the runtime platform, and

• Skippy: a container scheduling system that enables existing serverless frameworks to support edge functions and to make better use of edge resources. Our scheduler introduces components and constraints that target the characteristics of edge systems.

• A method to tune the weights attached to low-level constraints used by the scheduler, by optimizing high-level operational goals defined by cluster operators. To compute the optimization we use the serverless system simulator presented in Section 4.3.

• We demonstrate Skippy’s performance in various scenarios using data from our testbed and running trace-driven simulations. We use Ether, the system presented in Section 4.2, to synthesize edge topologies for different scenarios.

• An open data set of traces from extensive profiling of our edge testbed, and synthetic traces from our simulations of different infrastructure scenarios [RR20].


6.2 Background & application scenarios

This section outlines the domain for which we have developed our system. Based on the discussion in Chapter 2, we drill down into different application and infrastructure scenarios to motivate serverless edge computing and highlight systems challenges. Furthermore, we provide background on the operational underpinnings of serverless platforms, using Kubernetes and OpenFaaS as examples.

6.2.1 Data-intensive serverless edge computing

Typical characteristics and requirements associated with data-intensive edge computing applications can be summarized as follows:

• heterogeneous workloads: the application is composed of multiple functions that have different computational requirements (e.g., GPU acceleration)

• locality sensitive: some parts of the application are locality sensitive, e.g., should not be executed in the cloud because consumers are located at the edge

• latency sensitive: some parts of the application are required to provide service quality at low latency

• high bandwidth requirements: some parts of the application may exchange large amounts of data

Research has shown that edge AI applications that deal with video or image data typically have all of these requirements [RD19, WFG+19, CHW+17, VGGK18]. Smart city scenarios are also an illustrative example. Data from sensor arrays and cameras distributed throughout the city can be used to create analytics models such as traffic risk assessment, crowd behavior, flooding or fire spreading models, or ecosystem/plant monitoring [RD19]. They could also serve as sensory input for cognitive assistance applications [RD19, WFG+19]. Moreover, edge resources can be used for offloading mobile AI workloads. Serverless computing may be a good model for implementing such applications at scale on the distributed computing substrate we discussed in Chapter 3.

We implemented a prototypical edge computing application that has the characteristics and requirements described above. Specifically, we found the most generalizable application to be a machine learning (ML) workflow with multiple steps, where each step has different computing requirements and needs to make efficient use of a diverse set of edge resources. Similar to the smart home data analytics pipeline we presented in [Sch19], we consider a typical ML pipeline with three steps [HMR+19]: (1) data preprocessing, (2) model training (which can be accelerated by GPUs), and (3) model serving. In our concrete example we use the MNIST dataset to train an image classifier, because the dataset's availability facilitates reproducibility. We implement each step as a serverless function.


Serverless functions

In the serverless function abstraction, the programming model concepts are functions, events, and triggers. The serverless platform executes functions in response to events. The events that lead to a function execution are defined by a trigger. Functions are simply event handlers in the classic event-driven programming sense. For example, in the case of model training, the event would be triggered by the previous ML workflow step, i.e., the data preprocessing. An example of a serverless function written in Python that implements an ML training step is shown in Listing 6.1. In OpenFaaS, the platform packages function code and its dependencies into a container image and pushes it to a registry, from where the code is pulled by a compute node after scheduling.

import tempfile

import boto3
import numpy
# ... import ML libraries such as tensorflow or mxnet

def handle(req):
    s3 = boto3.client('s3')
    # download the training data referenced in the request
    tmpfile = tempfile.mktemp()
    with open(tmpfile, 'wb') as f:
        s3.download_fileobj('bucket', req['train_data'], f)

    data = numpy.load(tmpfile)
    model = train_model(data, req['train_params'])

    s3.upload_fileobj(serialize(model), 'bucket', req['serialized_model'])

Listing 6.1: Example of a data-intensive serverless function.

This function implements ML model training, and involves: connecting to a storage server (S3 in this case), downloading the training data from the file object encoded in the request (which was previously generated by the data preprocessing step), converting the data into some appropriate format for running a training algorithm, and then uploading the serialized model. As every data-intensive function has a similar format, i.e., fetching, processing, and then storing data, we developed a higher-level abstraction for these functions, which we will present in Section 6.3.2. Specifically, we elevate fetching and storing data as platform features, which allows the platform to reason over the dataflow of the function, e.g., which specific data is pulled (encoded by a URN for example) to locate the closest data store that holds the data. We use this feature in the scheduler (see Section 6.3.5) for determining the tradeoff between data and computation movement.

Data and computation movement

Figure 6.1 shows a comparison between the size of the container images for each function of our application, and the total amount of data each function has to transfer during its execution. It also shows a back-of-the-envelope calculation of how much time cloud or edge resources spend on transferring either container images or data for each function step. We consider a typical scenario where an edge network has 1 GBit/s internal bandwidth, a 25 MBit/s uplink, and a 100 MBit/s downlink. Data is located at the edge, and the container registry is located in the cloud, which also has an internal bandwidth of 1 GBit/s. We can see that the difference between uplink and downlink bandwidth plays a significant role in trading off data and computation movement.
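To make the estimate concrete, the following sketch reproduces the back-of-the-envelope calculation for the stated scenario bandwidths (a minimal illustration; the image and data sizes used here are hypothetical placeholders, not the measured values from Figure 6.1):

MBIT_PER_S = 1e6 / 8  # bytes per second for each MBit/s of bandwidth

def transfer_time(size_mb, bandwidth_mbit):
    # Seconds needed to move size_mb megabytes over a bandwidth_mbit MBit/s link.
    return (size_mb * 1e6) / (bandwidth_mbit * MBIT_PER_S)

# Scenario: 1 GBit/s inside edge and cloud networks, 25 MBit/s edge uplink,
# 100 MBit/s edge downlink; data at the edge, container registry in the cloud.
image_size_mb = 550  # hypothetical container image size
data_size_mb = 40    # hypothetical function data size

# Execute in the cloud: the image is local, data must cross the edge uplink.
cloud_s = transfer_time(image_size_mb, 1000) + transfer_time(data_size_mb, 25)
# Execute at the edge: the data is local, the image must cross the edge downlink.
edge_s = transfer_time(image_size_mb, 100) + transfer_time(data_size_mb, 1000)

print(f'cloud: {cloud_s:.0f} s, edge: {edge_s:.0f} s')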

Figure 6.1: Comparison of container image sizes and total data transferred by functions (in MB, for the preprocessing, training, and serving steps). The right figure shows the time in seconds spent on either container image or data transfer in either cloud or edge networks.

Figure 6.2: The same calculation as in Figure 6.1 when subtracting shared layers between images and only considering the unique image size.

The popular container platform Docker uses a layered file system, meaning that layers can be shared between container images. Because most containers build on similar base images, the unique image size when considering shared layers is often much smaller. For distributing container images this means that, if the base image has already been downloaded by some container, downloading a different container image that uses the same base image will also be much faster. When inspecting the images of our specific application, we found that almost 90% of their size is shared across images. Figure 6.2 shows the same calculation as above, illustrating that, now, most of a function's latency comes from pulling data. These calculations show the importance of the data and computation movement tradeoffs that serverless edge computing platforms have to make at runtime in order to optimize function execution performance.
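The effect of layer sharing on the estimate can be sketched in a few lines (the 90% shared fraction is the value we measured for our application images; treating it as a single scalar is a simplification):

SHARED_FRACTION = 0.9  # fraction of image bytes shared with a common base image

def effective_pull_size(image_size_mb, base_layers_present):
    # Unique bytes to pull when the shared base layers are already on the node.
    if base_layers_present:
        return image_size_mb * (1 - SHARED_FRACTION)
    return image_size_mb

print(effective_pull_size(550, base_layers_present=True))   # 55.0 MB
print(effective_pull_size(550, base_layers_present=False))  # 550 MB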


6.2.2 Edge computing infrastructure

Another important performance aspect is the type of hardware a particular function is executed on, and its impact on function execution time. In Section 3.1 we discussed the computing continuum, the various candidate computing architectures for edge infrastructure, and use cases for each one. Based on this discussion, we consider the following computers and architectures to test our platform: (1) small-scale data centers that use VM-based infrastructure and are placed at the edge of the network, often termed cloudlets [SBCD09]; (2) small form-factor computers with built-in CPUs, such as Intel's Next Unit of Computing (NUC) platform, used in, e.g., Canonical's Ubuntu Orange Box [VN]; (3) Single Board Computers (SBCs), such as ARM-based Raspberry Pis, used as IoT gateways or micro clusters [JBP+18]; and (4) embedded AI hardware, such as NVIDIA's Jetson TX2, which provides GPU acceleration and CUDA support [NVI]. We have profiled our ML workflow steps as OpenFaaS functions on these different devices. Table 6.1 lists the hardware specifications of the device instances we used. Figure 6.3 shows the results of 156 warm function executions. The Raspberry Pis were not able to run the model training step as they ran out of memory. The results show the impact of extreme device heterogeneity in an edge computing substrate. We can also see that the model training step benefits greatly from GPU acceleration, performing better on a Jetson TX2 than on an Intel NUC despite the NUC's powerful i5 processor.

Device | Arch    | CPU                                          | RAM
VM     | x86     | 4 x Core 2 @ 3 GHz                           | 8 GB
SBC    | arm32   | 4 x Cortex-A53 @ 1.4 GHz                     | 1 GB
NUC    | x86     | 4 x i5 @ 2.2 GHz                             | 16 GB
TX2    | aarch64 | 2 x Cortex-A57 @ 2 GHz, 256-core Pascal GPU  | 8 GB

Table 6.1: Device type specifications.

6.2.3 Cluster infrastructure scenarios

In Section 3.3 we presented several scenarios in which edge computing resources are federated to form a distributed computing substrate. To demonstrate that our system can deal with these scenarios, we use the Edge Topology Synthesizer framework presented in Section 4.2 to synthesize plausible cluster configurations. We briefly summarize the specific scenarios and respective cluster configurations that we consider. Table 6.2 lists the cluster hardware compositions.

S1: Urban Sensing

In the urban sensing scenario, cities deploy sensor arrays and cameras to enable smart city applications that require access to real-time data of urban environments [CBSG17]. For this scenario, we assume a total of 200 sensor nodes, where each node is equipped with two SBCs (for data processing and communication).


Figure 6.3: Average execution time and standard deviation of the ML workflow functions (data preprocessing, model training, model serving), in seconds, on different device types (Cloud VM, Intel NUC, Jetson TX2, Raspberry Pi), illustrating both workload and device heterogeneity.

Furthermore, we assume that throughout the city, there are installations of cloudlets that comprise an Intel NUC, and two embedded GPU devices per sensor-node camera for video processing tasks. To meet peak demands, 30 VMs hosted at a regional cloud provider are added to the cluster as fallback resources. In terms of network topology, we assume that each municipal district forms an edge network. Each edge network has an internal LAN bandwidth of 1 GBit/s and is connected to the internet with a 100/25 MBit/s down/uplink. Cloud nodes have an internal bandwidth of 10 GBit/s and a direct 1 GBit/s uplink to the internet. These figures are plausible extensions of the urban sensing scenario presented in Section 3.3, specifically the Array of Things (AoT) project [CBSG17], which operates a deployment of about 200 sensor nodes in Chicago. Each AoT node is connected to the Internet via a mobile LTE network.

S2: Industry 4.0

For this scenario, we assume an Industrial IoT (IIoT) use case [CWC+18], where several factories at different locations are equipped with edge computing hardware, and each location has an on-premises but provider-managed cloud (e.g., a managed Microsoft Azure deployment, where on-premises cloud resource use is billed). We assume ten factory locations, each having 4 SBCs as IoT gateways, 1 Intel NUC, 1 Jetson TX2 board, and 4 VMs on the on-premises cloud. These numbers are plausible extensions of the prototypes presented in [CWC+18], and of the general trend towards using embedded AI hardware for analyzing real-time sensor and video data in IIoT scenarios [YMLL17]. Each edge and on-premises cloud has a data store. The SBCs are connected via a 300 MBit/s Wi-Fi link to an AP that has a 10 GBit/s link to the edge resources, and a 1 GBit/s link to the on-premises cloud. Premises are connected via a business ISP with a 500/250 MBit/s down/uplink.

S3: Cloud federation

To compare our system in non-edge computing scenarios, we also consider a cloud computing configuration where there are no edge devices and less heterogeneity than in edge scenarios [VPK+15]. We model a cloud federation scenario across three cloud regions, where each region has, on average, 150 VMs. All regions contain several nodes with data stores. Region 1 has slightly more VMs and more storage nodes than the others. The bandwidth is 10 GBit/s within a region, and 1 GBit/s cross-region, as reported in Section 3.3.

Scenario           | Category | Nodes | % of compute device types
                   |          |       | VMs | SBC | NUC | TX2
Our Testbed        |          | 7     | 14  | 57  | 14  | 14
S1: Urban Sensing  | edge     | 1 170 | 3   | 39  | 19  | 39
S2: Industry 4.0   | hybrid   | 110   | 40  | 40  | 10  | 10
S3: Cloud regions  | cloud    | 450   | 100 | 0   | 0   | 0

Table 6.2: Cluster configurations of different scenarios.

6.2.4 Technical background: Kubernetes & OpenFaaS

Our system is designed to extend existing platforms that enable serverless computing and FaaS deployments, such as Kubernetes and OpenFaaS. Because our prototype was developed for these two systems, we present technical background on the interplay between the two. The core mechanisms, however, are found in similar systems.

Kubernetes & container scheduling

Kubernetes is a container orchestration system used for automating application deployment and management in distributed environments. It is a popular runtime for serverless computing, micro-service-based application deployments, and, increasingly, FaaS platforms [MSV18]. Using Kubernetes, FaaS platforms can package function code into lightweight container images, and can then make use of all the features of the Kubernetes platform, such as scheduling, autoscaling, or request routing.

The Kubernetes scheduler is one of the critical components of the platform. Its task is to assign a pod (the atomic unit of deployment in Kubernetes) to a cluster node. A pod can consist of one or more containers, shared storage volumes, network configuration, and metadata through which the pod can, for example, communicate its resource requirements. The Kubernetes scheduler is an online scheduler, meaning it receives pods over time and generally has no knowledge of future arrivals. It is therefore different from many SPP solutions that schedule several services at once [SNS+17]. Like many resource schedulers of real-world systems, it employs a greedy MCDM procedure, which we formalize in Section 6.4. Hard and soft constraints are implemented as follows. First, the available cluster nodes are filtered by predicate functions. These functions evaluate to true or false, and represent hard constraints that eliminate nodes incapable of hosting a pod, e.g., because they are unable to provide the resources requested by the pod. Second, the remaining set of feasible nodes is ranked by priority functions that represent soft constraints, favoring nodes based on their suitability for hosting the pod.


Calculating the score for a pod–node tuple involves invoking each active priority function, normalizing each function's value to a range between 0 and 10, and building a weighted sum. The highest-scoring node is then selected for placement. Kubernetes provides a variety of predicate and priority functions that can be configured in the scheduler. However, as we describe in more detail in Section 6.3.5, the default priority functions do not perform well in edge scenarios. In particular, they do not consider the critical tradeoff between data and computation movement which we have highlighted earlier. In the remainder of this chapter, we use the terminology of Kubernetes (i.e., pod, node, priority function) to refer more generally to a unit of deployment, a cluster resource, and a soft constraint, respectively.
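The filter-and-score procedure can be summarized in a short sketch (names and the per-value normalization are simplified assumptions; the actual scheduler normalizes scores across all candidate nodes):

def normalize(value, lo=0.0, hi=1.0):
    # Simplified min-max mapping of a raw priority value onto the integer
    # range [0..10]; Kubernetes normalizes over all candidate nodes instead.
    value = min(max(value, lo), hi)
    return round(10 * (value - lo) / (hi - lo))

def schedule(pod, nodes, predicates, priorities, weights):
    # Hard constraints: filter out nodes that cannot host the pod.
    feasible = [n for n in nodes if all(p(pod, n) for p in predicates)]
    if not feasible:
        return None  # the pod stays in the scheduling queue

    def score(node):
        # Soft constraints: weighted sum over all active priority functions.
        return sum(w * normalize(f(pod, node))
                   for f, w in zip(priorities, weights))

    # Greedy choice: bind the pod to the highest-scoring feasible node.
    return max(feasible, key=score)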

OpenFaaS

OpenFaaS is a serverless framework that uses Kubernetes as both execution runtime and deployment platform. Function code is packaged into Docker containers, and a small watchdog is added to the container that executes the function based on HTTP calls, triggered by events or through invocations of the OpenFaaS API gateway. OpenFaaS can run on top of Kubernetes using the runtime driver faas-netes, which deploys functions as Kubernetes pods and then delegates scheduling decisions to the Kubernetes scheduler. With OpenFaaS, the Kubernetes scheduler is triggered in two situations: an initial manual deployment of new functions, or automated function replica creation through autoscaling. If an OpenFaaS user runs the faas-cli deploy command to deploy function code, the code is packaged into a Kubernetes pod, and the pod is placed in the scheduling queue. Subsequent requests to the function, triggered by events or HTTP endpoint calls, are forwarded to Kubernetes, which takes care of load balancing requests among running replicas. This is the case if the function's autoscaling policy is set to at least one replica, and is useful to avoid cold starts for functions that are invoked frequently. A cold start refers to a function invocation where the container image has to be downloaded and the container is started for the first time, which incurs significant latency. For short-lived functions that are not invoked frequently, and would otherwise block a node's resources despite being idle, OpenFaaS allows a scale-to-zero policy, which removes such idle functions from nodes after a short time. This is useful for functions such as the data preprocessing or training step in our ML pipeline. In this case, a request to the function immediately triggers a replica creation and therefore scheduling.

6.3 Design and implementation

This section explains how our system works in tandem with existing serverless systems to enable serverless edge computing. We first discuss overall design goals, then present our high-level programming model, followed by a detailed explanation of the individual system components.


6.3.1 Design goals

In Section 2.3.3 we described the design goals for edge intelligence systems in general. Derived from these higher-level goals and the presented scenarios, we discuss the design goals for a serverless edge computing platform, with a particular focus on edge AI applications.

Hiding operational complexity: Orchestrating application execution on edge resources is more involved than in cloud-based platforms, as there are no well-defined APIs and, e.g., no homogeneous dedicated learning clusters to submit training jobs to. Complex workloads, such as AI workflows, must make full use of the hardware capabilities of multi-purpose edge compute resources to execute efficiently (see Figure 2.6). A programming model and APIs for edge AI applications need to hide this operational complexity from the developer. In particular, programmers should not have to worry about the distribution of data or the heterogeneous capabilities of edge resources. The stickiness of tasks and resources should be easily configurable, and a scheduler needs to infer scheduling constraints and goals from application and device contexts.

Context-awareness: Being aware of the context of an edge device is essential for enabling transparent infrastructure for edge AI applications. From the perspective of a service provider that manages fleets of edge devices and provides models across several domains, context-awareness can help manage operational complexity by deploying models to devices on demand, based on their context. Context-aware policies can also be used to specify retraining triggers, such as thresholds of new training data, or device battery and charging conditions.

Artifacts as first-class citizens: Locally trained models and data available at the edge exacerbate the issues around model and data management. To facilitate this abstraction, our edge AI serverless platform treats models and data as first-class citizens. This allows the platform to make decisions on where data can reside or be transferred, based on application contexts and device constraints, and enables the scheduler to satisfy data–function locality objectives.

Fine-grained policy control: Developers should be able to express the context in which functions are allowed to execute or data is transferred. This requires a programming model and API that are intuitive for developers, yet expressive enough to guide the execution platform in its decisions on how to schedule functions or replicate data.

6.3.2 Programming model

We introduce AI-workflow-specific extensions to functions by elevating concepts of the AI workflow to first-class citizens in the model, providing common abstractions that make developing edge AI workflow functions easier. Specifically, we add (a) artifacts: data artifacts such as training data sets, and model artifacts, i.e., machine learning models; (b) selectors: a mechanism for fuzzy querying of artifact metadata that allows the dynamic injection of data artifacts into functions; and (c) policies: fine-grained control mechanisms for function scheduling. For workflow composition, we rely on systems like ModelOps [HMR+19] or OpenWhisk composer [opea], which offer different ways to compose AI workflows. Our examples are based on Python and the capabilities of the MXNet ML library [mxn], but the concepts could be applied to most languages.

Data & models: Current serverless platforms generally only allow JSON documents to be passed between functions. When dealing with large artifacts or streaming data, which is common in AI workflows, developers are required to manually read from and write to cloud storage services. In particular, it is common for functions to consume or produce data or model artifacts. Our approach provides ways to annotate such functions, and offers a data API that hides platform data management tasks from the developer. The additional metadata from the annotations allows the platform to transparently handle data transfer and storage, and to respect data locality policies.

Model selectors: We allow the injection of models into functions based on selectors. This allows developers to define the requirements of a function without specifying the exact model instance to use. For example, as shown in Listing 6.2, a selector can specify the type of model (regressor or classifier), the type of data it was trained on, the type of data it processes, or model performance metrics such as accuracy or robustness scores [WZC+18]. The model metadata collected by the AI lifecycle engine ModelOps [HMR+19] facilitates this.

@consumes.model(selector={'type': 'image_classifier',
                          'data_tags': ['machine_x'],
                          'accuracy': '>=0.88'})
def inference(model, request):
    data = ...  # data prep tasks
    return model.estimate(data)

Listing 6.2: Artifact injection via model selector.
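To illustrate how such a selector could be matched against artifact metadata, consider the following sketch (an assumption about the matching semantics, not the actual ModelOps implementation):

import operator

COMPARISONS = {'>=': operator.ge, '<=': operator.le, '>': operator.gt, '<': operator.lt}

def parse_comparison(expr):
    # Returns (op, threshold) for strings like '>=0.88', or None otherwise.
    for sym in ('>=', '<=', '>', '<'):
        if isinstance(expr, str) and expr.startswith(sym):
            return COMPARISONS[sym], float(expr[len(sym):])
    return None

def matches(selector, metadata):
    for key, expected in selector.items():
        actual = metadata.get(key)
        comparison = parse_comparison(expected)
        if comparison is not None:
            op, threshold = comparison
            if actual is None or not op(actual, threshold):
                return False
        elif isinstance(expected, list):
            # all requested tags must be present on the artifact
            if not set(expected) <= set(actual or []):
                return False
        elif actual != expected:
            return False
    return True

selector = {'type': 'image_classifier', 'data_tags': ['machine_x'], 'accuracy': '>=0.88'}
metadata = {'type': 'image_classifier', 'data_tags': ['machine_x', 'lab'], 'accuracy': 0.91}
assert matches(selector, metadata)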

Policies: Policies allow developers to define additional function execution conditions. The deadline policy is a function execution service-level objective (SLO), which is useful, e.g., for inference functions that have low-latency requirements. The runtime can also factor in the latency incurred by data transfer. A fn policy defines properties that a node should or must have for a function to be scheduled to that node. The data policy plays a similar role and defines constraints on data transfer. The strict keyword makes a policy a hard constraint. For example, the function may only be executed when the data is accessible from within the given network.

@policy.deadline('2s')
@policy.fn(node='user_device', capability='gpu')
@policy.data(network=['company_network'], strict=True)

Listing 6.3: Different policy annotations.


Example functions

We demonstrate how this high-level API can be used to simplify the development of functions with two examples. Listing 6.4 shows the example function from Listing 6.1 re-written with our API.

from skippy.data import DataArtifact, ModelArtifact, consumes, produces, policy
# ... import ML libraries such as tensorflow or mxnet

# can have multiple data annotations
@consumes.data(urn='my_bucket:train_data')
@produces.model(urn='my_bucket:model')
@policy.fn(capability='gpu')
def handle(req, data: DataArtifact) -> ModelArtifact:
    arr = data.to_ndarray()
    model = train_model(arr, req['train_params'])
    return model

Listing 6.4: Example of a training function with metadata annotations.

A common scenario in the use cases we described is the refinement of user- or device-specific models from a base model using device-local data. Listing 6.5 shows how such a function can be defined using our programming model. To satisfy the constraints, the scheduler runs this function only on a user's device (given by the usr variable), and only when the data is available within the specified network.

@consumes.model(selector={'urn': 'model:base'})
@consumes.data(batch=100, selector=...)
@produces.model(type='regressor', urn='model:user:{usr}')
@policy.fn(node='local')
@policy.data(network='local', strict=True)
def refine(model: ModelArtifact, data: DataArtifact):
    ndarr = data.to_ndarray()  # data artifact API
    # transfer learning code
    return refined_model

Listing 6.5: Example function with several constraints.

6.3.3 System overview

We briefly outline the main components of Skippy and their integration with Kubernetes and OpenFaaS. Figure 6.4 shows the specific integration with Kubernetes.

• metadata schema: Skippy makes heavy use of container and node labels, which are supported by many container platforms, to communicate information about functions and compute nodes to the scheduler. Skippy uses various mechanisms to automate labeling, such as the skippy daemon and annotation parsing, which we describe in more detail later. All of Skippy's metadata labels have the prefix *.skippy.io, as described in [RHM+19].


Figure 6.4: Overview of Skippy’s components and their interaction in a deployment with Kubernetes.


• skippy daemon: a daemon that runs on all cluster nodes alongside the primary node agent (e.g., kubelet in the case of Kubernetes). It scans a node's hardware capabilities, such as the availability of a GPU, and then labels the node with the corresponding metadata. It can do this periodically to react to pluggable capabilities, such as USB-attached accelerators.

• skippy scheduler: the Skippy scheduler is an online scheduler that is sensitive to the characteristics of edge systems. It requires access to the cluster state and a programmatic way of binding containers to nodes. In the case of Kubernetes, the kube-apiserver provides these features via REST APIs.

• data index: Functions can access data via storage pods that host MinIO instances (an S3-compatible storage server), distributed across the cluster. Skippy currently does not automatically manage these storage nodes; replication and data distribution are left to other system components. However, Skippy dynamically discovers storage nodes, keeps an index of the file tree, and is sensitive to the proximity between compute and storage nodes by using the bandwidth graph.

• bandwidth graph: We extended the ideas from Chapter 5, where we created a graph of the network based on the latency between nodes. The bandwidth graph is an in-memory representation of the network topology and the available bandwidth between nodes. Currently, we use static information about the network, and information we collect through the skippy daemon (e.g., available bandwidth of network devices). We are working on a system that updates the bandwidth graph at runtime by monitoring the network utilization of nodes [Mar21].
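A minimal sketch of these two scheduler-side components (field names are hypothetical) could look as follows:

class BandwidthGraph:
    # Static (or estimated) bandwidth between nodes, indexed as graph[a][b].
    def __init__(self, bandwidth):
        self._bw = bandwidth  # dict of dicts: from_node -> to_node -> MBit/s

    def __getitem__(self, node):
        return self._bw[node]

class DataIndex:
    # Maps data item URNs to the storage nodes holding them, plus item sizes.
    def __init__(self):
        self._locations = {}  # urn -> set of storage node names
        self._sizes = {}      # urn -> size in bytes (cached from the MinIO S3 API)

    def add(self, urn, node, size):
        self._locations.setdefault(urn, set()).add(node)
        self._sizes[urn] = size

    def storage_nodes(self, urn):
        return self._locations.get(urn, set())

    def size(self, urn):
        return self._sizes[urn]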

6.3.4 Node and function metadata collection

Resource schedulers depend on certain information about the cluster state and the workload requirements to make good placement decisions. Skippy makes heavy use of metadata about functions and compute nodes in the cluster to communicate this information to the scheduler. A goal of our system is to reduce the amount of manual configuration a human has to do, and to automate the process instead. We use the Skippy daemon to collect node metadata, and our high-level programming API to collect function metadata. The node metadata are stored and accessed via the cluster orchestration system. In the case of Kubernetes, they are stored in etcd, a distributed key-value store. If the orchestrator does not provide a storage system, Skippy can also store the metadata in memory.

Node metadata: Skippy daemon

The Skippy daemon is deployed as a container on all cluster nodes. It automatically probes a node's capabilities and maintains its Skippy-specific labels. Currently, the daemon probes whether a node provides an NVIDIA GPU, the availability and version of a CUDA installation, and whether the node is running a MinIO storage pod. The daemon code allows the straightforward addition of custom capability probes. When nodes are added to the cluster at runtime, the Skippy daemon labels the node with an appropriate locality label (edge/cloud). The system overhead of running the daemon is minimal given a fairly simple Python implementation. It requires roughly 120 MB of disk space and 25–40 MB of RAM depending on the CPU architecture, making it feasible even for resource-constrained devices.
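As an illustration, a capability probe might look as follows (the device-file check is an assumption, not necessarily how our daemon detects GPUs):

import os

def probe_gpu():
    # Advertise an NVIDIA GPU if a corresponding device file is present.
    if any(os.path.exists(f'/dev/nvidia{i}') for i in range(4)):
        return {'capability.skippy.io/gpu': ''}
    return {}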

Function metadata: annotation parsing

Serverless frameworks typically provide their own packaging and deployment tooling. In OpenFaaS, the faas-cli command builds the container image from the function code and the manually configured metadata. We provide a wrapper for these tools that parses the function annotations and translates them into labels for the scheduler. Listing 6.6 shows the labels resulting from analyzing the function in Listing 6.4. These are then used as input for the priority functions described in Section 6.3.5.


{
    'data.skippy.io/recv': ['my_bucket:train_data'],
    'data.skippy.io/send': ['my_bucket:model'],
    'capability.skippy.io/gpu': ''
}

Listing 6.6: Example function labels resulting from metadata parsing.

6.3.5 Skippy scheduler

The Skippy scheduler enables serverless platforms to schedule edge functions more efficiently. Its design is based on the Kubernetes scheduler, which is a queue-based monolithic scheduler that implements a greedy MCDM heuristic (see Section 6.2.4).

Edge-friendly priority functions

We introduce four priority functions that target requirements of edge computing applications and characteristics of edge systems, complementing the common cloud-centric scheduling constraints found in, e.g., Kubernetes [Kub19]. The additional functions are motivated by the following observations. First, in many data-intensive edge computing applications, data is stored at edge locations, yet container clusters typically rely on centralized cloud-based repositories such as Dockerhub for managing container images. When scheduling pods that operate on data, there is therefore an inherent tradeoff between sending computation to the edge or sending data to the cloud, as we have highlighted in Section 6.2 and Figure 6.1. The two priority functions LatencyAwareImageLocalityPriority and DataLocalityPriority help the scheduler make this tradeoff at runtime. Second, the increasing diversity of specialized compute platforms for edge computing provides new opportunities for accelerating the equally diverse workloads. The CapabilityPriority matches tasks and nodes based on their requirements and advertised capabilities, respectively. Third, it is often the case that functions should prioritize execution at the edge, for a variety of reasons. The LocalityTypePriority enables the system to respect these placement preferences. We explain each function in more detail and provide algorithmic descriptions. Note that the Kubernetes scheduler expects normalized values from priority functions, which are the result of mapping the range of scores to an integer range [0..10]. We omit the code for this step.

LatencyAwareImageLocalityPriority

Favors nodes where the necessary container image can be deployed more quickly. We use knowledge about the network topology to estimate how long the image download will take in the ideal case. Algorithm 6.1 shows pseudocode for the function. Because the bandwidth graph is static and does not consider the actual available bandwidth at runtime, the calculation is only an approximation. Making a plausible estimate of the actual network download speed would be too complicated for a priority function, which has minimal runtime knowledge and needs to execute quickly. However, together with the implementation of the DataLocalityPriority, the function allows us to make a heuristic tradeoff between fetching the container image and fetching data from a data store.

Algorithm 6.1: Scheduler priority function: LatencyAwareImageLocalityPriority

Result: Estimation of how long it will take to download a pod's images
Function score(pod, node, bandwidthGraph):
    size ← 0
    for container in pod's list of containers do
        if container's image is not present on node then
            size ← size + size of the container's image
        end
    end
    bandwidth ← bandwidthGraph[registry][node]
    time ← size / bandwidth
    return time
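In Python, Algorithm 6.1 might be sketched as follows (pod and node field names are hypothetical; as in the pseudocode, the result is a raw time estimate that the scheduler normalizes afterwards):

REGISTRY = 'registry'  # name of the node hosting the container registry

def latency_aware_image_locality(pod, node, bandwidth_graph):
    # Estimated time in seconds to pull the pod's missing images onto the node.
    size = 0
    for container in pod.containers:
        if container.image not in node.images:
            size += container.image_size  # bytes
    bandwidth_mbit = bandwidth_graph[REGISTRY][node.name]
    return size / (bandwidth_mbit * 125_000)  # MBit/s to bytes/s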

DataLocalityPriority

Estimates how much data the function will transfer, and favors nodes where the data transfer happens more quickly. From the labels extracted by our function annotation parser (see Section 6.3.4), we know the data items a function operates on. We query the data index to get all storage nodes that hold a specific data item. The data item size can be queried through the MinIO S3 API, which the data index keeps in a local cache. We then make the same network transfer time estimations as in LatencyAwareImageLocalityPriority using our bandwidth graph. Algorithm 6.2 shows pseudocode for the function.

CapabilityPriority

Checks the computing platform requirements of a function, and favors nodes that have those capabilities (e.g., a GPU for an ML training function). The implementation uses node and function metadata gathered by the Skippy daemon and the function annotation parser. Algorithm 6.3 shows pseudocode for the function.

LocalityTypePriority

Favors nodes in a certain locality, e.g., nodes located in edge networks or in the cloud. Through the programming model we have described, developers can specify a high-level placement prioritization, e.g., preferring nodes in a certain network context. The function checks for the presence of the same value of locality.skippy.io/type in the pod and node labels. We omit the pseudocode for this function, as it can be implemented by a simple label check.


Algorithm 6.2: Scheduler priority function: DataLocalityPriority

Result: Estimate of how long it takes a node to transfer the required runtime data
Function score(pod, node, storageIndex, bandwidthGraph):
    time ← 0
    for urn in values of 'data.skippy.io/recv' do
        storages ← storageIndex[urn]
        bandwidth ← min_{s ∈ storages} (bandwidthGraph[s][node])
        size ← size of data item urn
        time ← time + size / bandwidth
    end
    for urn in values of 'data.skippy.io/send' do
        ... analogous to 'recv'
    end
    return time

Algorithm 6.3: Scheduler priority function: CapabilityPriority

Result: Scores how many of a pod's requested capabilities are provided by a node
Function score(pod, node):
    nodeCapabilities ← all of node's labels starting with capability.skippy.io
    podCapabilities ← all of pod's labels starting with capability.skippy.io
    score ← 0
    for podCapability in podCapabilities do
        if nodeCapabilities contains podCapability ∧ values are equal then
            score ← score + 1
        end
    end
    return score

The LocalityTypePriority makes an assumption about the existence of a static system hierarchy, which is the main premise of fog computing [GLME+15, ASDL19], where infrastructure is separated into distinct cloud and edge regions. However, as we will show in Section 6.5, our scheduling parameter optimization automatically ignores this function in scenarios where such a hierarchy does not emerge, for example in the cloud federation scenario.
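The omitted label check could be sketched as follows (the label accessors are hypothetical):

def locality_type_priority(pod, node):
    # Favor nodes whose locality type matches the pod's preference, if any.
    wanted = pod.labels.get('locality.skippy.io/type')
    if wanted is None:
        return 0  # the pod expresses no locality preference
    return 1 if node.labels.get('locality.skippy.io/type') == wanted else 0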


6.3.6 Integration with OpenFaaS

To enable the deployment of applications as serverless functions, our prototype makes use of OpenFaaS. It provides a framework for defining function deployments, an API gateway through which all function requests are routed, and several runtime components that manage monitoring, alerting, and autoscaling. We modified faas-netes (see Section 6.2) to label pods resulting from OpenFaaS function deployments, indicating that these pods should be scheduled by Skippy instead of the default Kubernetes scheduler. Otherwise, Skippy integrates with OpenFaaS only via Kubernetes, in that Skippy schedules the pods created by faas-netes.

6.4 Automatic tuning of scheduling parameters

Some operational goals, such as minimizing overall function execution time or uplink usage, depend on too many (and possibly unknowable at runtime) factors to be calculated efficiently in priority functions. Schedulers that employ MCDM, such as the Kubernetes scheduler, often allow users to assign weights to each constraint to tune the scheduler towards certain behavior. This fine-tuning to meet specific operational goals can be difficult. Many factors need to be considered, such as the cluster topology, node heterogeneity, or workload characteristics. This leaves operators to either rely on their intuition, or to use trial and error in production systems, to find weights that achieve the desired behavior. We propose an approach to automatically find weights of priority functions that result in good placements meeting certain high-level operational goals. We consider a placement to be good if it (1) leads to low function execution time at runtime, (2) uses edge resources efficiently, (3) reduces traffic leaving or entering a network, and (4) reduces the costs associated with executing functions in the cloud. To that end, we use multi-objective optimization techniques, and use the simulator we have developed (see Section 4.3) to evaluate the goodness of optimization solutions. We first formalize some key aspects.

6.4.1 Problem formulation

Let S be the set of priority functions S ∈ S : P × N → R where P is the domain of pods and N is the domain of nodes (and all metadata attached to them). The function schedule : P → N maps a pod p to a node n by evaluating the scoring function score for each node, and selecting the highest scoring node. The scoring function is essentially a weighted sum model over all priority functions and feasible nodes. Formally, this can be expressed as

schedule(p) = arg max_{n ∈ N} score(p, n),  where  score(p, n) = Σ_{i=1..|S|} w_i · S_i(p, n).    (6.1)


The default Kubernetes scheduler sets every w_i = 1. Our goal is to find values for w = (w_1, w_2, ..., w_|S|) that optimize towards the previously defined objectives. We have explained the technical details of the simulator in Section 4.3; formally, a simulation run sim(T, W, w) takes as input (1) the cluster topology T, (2) a workload profile W, and (3) the vector of priority function weights w, and simulates the function executions based on the profiling data we have gathered. The cluster topology T is formally a graph T = (V, E), V = N ∪ L, where N is the set of cluster nodes, L is the set of links that have an associated bandwidth (e.g., several nodes can be connected to a common Wi-Fi link), and E are weighted edges that indicate the latency between nodes and links. A workload profile W assigns each function (in our case, the ML workflow functions) an inter-arrival distribution, from which we sample at simulation time to generate workload. Our four goal functions f_i(sim(T, W, w)) are calculated from the simulation traces as follows:

f1: average function execution time over all functions

f2: up/downlink usage, i.e., the number of bytes transferred between networks through up/downlinks

f3: edge resource utilization, i.e., the percentage of allocated resources on edge devices compared to cloud servers

f4: cloud usage costs given a pricing model that includes traffic and execution time in the cloud

We now want to find w s.t. f1, f2, f4 are minimized, and f3 is maximized.

6.4.2 Implementation

We implement the optimization using our simulator and the Python framework Platypus [Had17] for multi-objective optimization. Platypus implements the well-known NSGA-II genetic algorithm [DPAM02], which has been found to be one of the best performing algorithms in the framework [BT19]. To find an optimized value of w, we execute the framework's NSGA-II implementation with 10 000 generations. Each generation executes a single simulation run sim(T, W, w) with a predefined W and T, and the current evolution of w. A run creates function deployments according to W until the cluster is fully utilized, using our scheduler for placement decisions. We store execution traces in Pandas data frames, and then calculate the goal functions f_i from the traces. The result is a set of 100 solutions that lie on the Pareto frontier of the solution space. As input for the scheduler, we select from that set a single solution w that is balanced across all goals.
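Structurally, the optimization can be set up with Platypus as in the following sketch (the simulation call is a placeholder for the system from Section 4.3, and the weight range [0, 10] is an assumption):

from platypus import NSGAII, Problem, Real

NUM_PRIORITY_FUNCTIONS = 5  # |S| in our prototype

def run_simulation(weights):
    # Placeholder: one sim(T, W, w) run, returning the goal functions
    # (f1, f2, f3, f4) computed from the simulation traces.
    raise NotImplementedError

def evaluate(weights):
    f1, f2, f3, f4 = run_simulation(weights)
    return [f1, f2, f3, f4]

problem = Problem(NUM_PRIORITY_FUNCTIONS, 4)  # |S| variables, 4 objectives
problem.types[:] = Real(0, 10)                # search range for each weight
# minimize f1, f2, f4; maximize f3 (edge resource utilization)
problem.directions[:] = [Problem.MINIMIZE, Problem.MINIMIZE,
                         Problem.MAXIMIZE, Problem.MINIMIZE]
problem.function = evaluate

algorithm = NSGAII(problem)
algorithm.run(10_000)            # evaluations; each one is a simulation run
pareto_front = algorithm.result  # non-dominated weight vectors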


Figure 6.5: Our edge cloud testbed comprising a Raspberry Pi cluster, an NVIDIA Jetson TX2 board, and two Intel NUCs, one acting as edge storage node by hosting a MinIO pod. A VM hosted on our on-premises cloudlet is also part of the cluster.

6.5 Evaluation

This section presents our experiment setup, results, and a discussion of limitations. We first present the testbed we have built, which we used to test our prototype implementation and to generate traces for the simulator. The scenarios we have defined in Section 6.2, and the traces generated from our testbed, are then used as input for our serverless simulator. We investigate how the scheduling decisions and parameter optimization impact application and system performance. We discuss the scheduler's performance in terms of scheduling throughput and latency, and, finally, discuss the current limitations of our system.

6.5.1 Edge system testbed & profiling

The testbed we have built comprises several edge computing devices listed in Section 6.2.2. Figure 6.5 shows the current setup. The nodes marked with a Kubernetes logo are part of the Kubernetes cluster used as runtime for OpenFaaS. The OpenFaaS gateway and Kubernetes master are hosted on a VM in our on-premises cloudlet.


For profiling, we follow the methodology described in Section 4.1. We run the application we have described in Section 6.2 on the testbed using our system prototype. That is, we implement each task as an OpenFaaS function, and execute each task on each device in both cold and warm state using different bandwidth and latency configurations. The functions are implemented in Python and use the Apache MXNet machine learning framework. We measure various system and application metrics, such as the system resource utilization, task execution time, the bandwidth requirements, e.g., when the container image that holds the function has to be downloaded, as well as the traffic produced by function invocations.

6.5.2 Experiment setup

We perform the experiments with parameters drawn from the infrastructure scenarios described in Section 6.2.3, and compare three scheduler implementations: the default Kubernetes scheduler as baseline, Skippy with all weights set to 1, and Skippy with optimized weights. The simulator directly calls the Skippy scheduler code for scheduling functions, with the only difference that it does not call the Kubernetes API for requesting the cluster state and performing node bindings.

The experiment process is as follows. We generate random application deployments, in our case ML workflow pipelines that comprise three ML functions, inject them into the scheduler queue, generate random requests given some workload profile, and then run the simulation for a certain number of invocations. Specifically, we deploy a new pipeline every few minutes and start generating requests to those functions a few seconds afterwards. After a specified number of function instances have been deployed, we generate several thousand more requests, until the request limit for that scenario has been reached. For example, in Scenario 2, each experiment ends after 30 000 invocations. Having the same number of deployments and function invocations allows for a fair comparison of overall network traffic.

For simulating code movement, we make an assumption based on the observations of our images described in Section 6.2: around 90% of an image's size comes from layers that are shared with other images. This means that, if any one of the images has already been pulled before, only the roughly 10% of another image's data that is unique has to be pulled. For our evaluation, this means that we are not biasing the simulation towards the estimation that the Skippy scheduler makes through the LatencyAwareImageLocalityPriority. Our simulator also implements the basic autoscaling behavior of OpenFaaS. In particular, it includes OpenFaaS' faas-idler component that enacts the scale-to-zero policy: when a function is idle for 5 minutes or more, the respective function replica is stopped and the underlying Kubernetes pod removed. A subsequent call to the function incurs a cold start.

For synthesizing pipelines and requests, we make the following assumptions. Each pipeline has three steps, where each step has an individual container image. However, as we have discussed in Section 6.2, we consider the commonalities across images. We synthesize both pipeline instances (i.e., functions deployed in Kubernetes pods) and container images, and assume a Pareto distribution of images. That is, not every function has a unique image.


Figure 6.6: Drill-down into time series data from simulated experiment runs. The first row shows the average data rate of traffic going over up/downlinks. The second row shows the average function execution time (FET) over time (10 min rolling window). The third row shows the maximum function execution time (FET) over time (10 min rolling window).

Instead, we assume a Pareto-distributed relation of container images to pod instances, i.e., 80 percent of pods use the same 20 percent of images. For the workload profile W, we assume a typical Poisson arrival process [Fel00], where inter-arrivals are described by an exponential distribution. We set the distribution parameters such that model serving requests of an individual pipeline are triggered at 40 requests per second, and data preprocessing requests happen every few minutes, allowing the faas-idler to occasionally shut down a replica. For synthesizing data items (e.g., training data as input for training functions), we assume that data items are distributed uniformly across data stores and workflows. Experiment runs that compare different scenarios and schedulers use the same random seed for distribution sampling, to guarantee comparability between scenario runs.
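The arrival processes can be sketched as follows (the generator structure and the exact preprocessing rate are illustrative assumptions; the rates follow the values stated above):

import random

def request_times(rate_per_second, horizon_s, seed=42):
    # Poisson arrival process: exponentially distributed inter-arrival times.
    # A fixed seed keeps runs comparable across scenarios and schedulers.
    rng = random.Random(seed)
    t = 0.0
    while t < horizon_s:
        t += rng.expovariate(rate_per_second)
        yield t

serving = request_times(40, horizon_s=60)               # model serving: 40 rps
preprocessing = request_times(1 / 300, horizon_s=3600)  # roughly every 5 minutes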

6.5.3 Experiment results

This section presents the results of our experiments. The results show (1) how function placement affects system performance, (2) how function placement affects system scalability, and (3) which priority functions have the highest impact on optimization goals.

Runtime performance of placements

Figure 6.6 shows key performance indicators from simulation runs in each scenario for the three schedulers: the default Kubernetes scheduler, the Skippy scheduler, and Skippy using optimized priority function weights. The first row shows the average data rate going over up and downlinks.


Figure 6.7: Edge resource utilization and execution cost of placements in the three scenarios. Bars show the average across ten runs; error bars show one σ.

Ideally, a placement keeps traffic within networks, resulting in low up/downlink usage. As the deployments are injected in the first phase of the simulation, the data rate grows, but it is overall significantly higher with the Kubernetes scheduler. The Cloud Regions scenario (S3) highlights the problem when there are many nodes within a network and few up/downlinks between them. The second and third rows show the function execution duration over time. In the Urban Sensing scenario (S1), the Kubernetes scheduler's placements run into queuing issues early on. Function execution time keeps increasing because the network cannot keep up with transferring the data required by the function executions. In the Industrial IoT scenario (S2), while there are no queuing issues, the Kubernetes scheduler's placements lead to overall higher function execution times. There is no significant difference in the Cloud Regions scenario, because the devices within the cluster (cloud VMs) are fairly homogeneous in terms of task execution performance. Overall, the second row shows the interplay between using resources effectively and trading off data movement costs.

Figure 6.7 shows the aggregated results from several runs with different random seeds for the other two performance goals we have defined: edge resource utilization and execution cost. For calculating the cost we use the pricing model of AWS Lambda [Ama20]. For S1 we observe the effect of a low amount of cloud resources: almost no cloud execution cost and generally high edge resource utilization. Skippy and the optimization perform slightly better. In S2, where data is also placed in on-premises managed cloud instances, we observe that the optimized Skippy can make a useful tradeoff between cost and edge utilization by preferring cloud resources over moving data. S3 has no edge resources, but we can see that the Kubernetes scheduler's decision to place functions across regions leads to high costs incurred by data movement. It also illustrates that most of the cost in our scenario comes from data movement (specifically data egress) rather than compute time, corroborating the results of a study about the unintuitive nature of serverless pricing models [Eiv17].


Figure 6.8: Scalability analysis of placements with an increasing number of deployments in each scenario (Urban Sensing, Industrial IoT, Cloud Regions; x-axis: deployment/node ratio; series: Kubernetes, Skippy, Skippy+Opt). The first row shows the raw inter-network traffic in GB. The second row shows the data throughput of functions, i.e., the overall network traffic per compute second.

Impact of placements on system scalability

We investigate how function placements affect the runtime scalability properties of the system, which is one of the major challenges of edge intelligence, as described in Section 2.3.3. In ad-hoc experiments we found that network bottlenecks were the biggest challenge for guaranteeing low function execution times and high throughput. In our scenarios in particular, we were not able to saturate cluster resources before running into extreme network congestion (flows receiving less than 0.1 MBit/s link bandwidth). The most important metric of scalability in our scenarios is therefore network throughput, and whether the placement can maintain high data throughput in the face of an increasing number of active deployments. To examine this, we run experiments that inject an increasingly larger number of deployments per node. As mentioned earlier, a deployment in our scenario is an instance of one ML pipeline with its three functions. We start at a ratio of 0.1 deployments per node and go up to 2 deployments per node. Figure 6.8 shows the results of experiment runs without the scale-to-zero policy.

Two things in particular are noteworthy. First, in S2, while the optimized Skippy has a lower data throughput than Skippy, as we have seen in Figure 6.7, it does this to trade off execution cost while maintaining similar function execution times (see Figure 6.6). Second, in some situations, the Kubernetes scheduler produced infeasible placements even with very few deployments. In particular, in S3 the inter-region bandwidth was quickly saturated, leading to infeasible placements.


Figure 6.9: A synthetic Industrial IoT scenario topology, augmented with a TAM [PSK+20] that shows the overall link traffic in log-bytes. Blue areas have almost no traffic, red areas have high traffic. The left side shows a placement made by Kubernetes, the right side shows a placement made by the optimized Skippy scheduler.

We consider a placement infeasible if, during the course of a simulation, it leads to bottlenecks in the network that degrade the bandwidth allocated to a flow to less than 0.1 MBit/s. Another finding was that, if the OpenFaaS scale-to-zero policy was used, the default scheduler produced no feasible placements in the first scenario. Functions would be rescheduled such that the network was quickly congested with inter-network traffic.

We further demonstrate how Skippy achieves more scalable placements for edge scenarios. Figure 6.9 shows a topology from S2, the IIoT scenario, and traffic data from a single simulation run. The center of the topology is the Internet node, and the branches are individual factory premises (see Section 6.2), connected by up and downlinks. The topographic attribute map (TAM) immediately reveals how Skippy manages to isolate data transfer of functions and storage nodes within edge networks, leading to much less traffic over Internet links.


Figure 6.10: Optimized priority function weights in each scenario.

Optimized priority function weights

Figure 6.10 shows the values of w assigned by the optimization as described in Section 6.4.2, i.e., the optimized weight of each priority function in the different evaluation scenarios. In S1, the capability priority is less relevant, as there is a high percentage of GPU nodes available, which are not saturated. Locality plays a much bigger role in avoiding using the scarce cloud resources. In S2, because there are few GPU nodes, and data is also distributed to on-premises cloud, the data locality and capability priorities are favored. In S3, the results confirm the intuition that resource balance, locality, and capabilities do not have much weight for scheduling in relatively homogeneous environments.

6.5.4 Scheduling latency and throughput

The main source of latency in greedy online MCDM schedulers comes from iterating over nodes and computing priority functions. Because Skippy uses a significantly larger number of priority functions than the default Kubernetes scheduler, it is worth discussing the resulting impact on scheduling latency and throughput. Let N be the set of all nodes in the cluster, N^c be the set of feasible nodes for scheduling container c, and S be the set of priority functions. Scheduling requires the evaluation of every priority function S ∈ S for every feasible node n ∈ N^c. The algorithmic complexity of scheduling one container c therefore depends on the complexity of the individual priority functions. If we neglect this, i.e., assume that invoking any S is O(1), the complexity of the scoring step is O(|N^c| · |S|), where N^c = N in the worst case. Because |N| can reach several thousands in a production cluster, the Kubernetes scheduler employs a sampling heuristic to reduce |N^c|. The percentage of nodes that are sampled progressively decreases with the number of nodes in the cluster. Once the cluster reaches |N| ≥ 6500, the scheduler only considers 5% of available nodes for scoring. This heuristic works under the assumption that the cluster and the network are relatively homogeneous, and that aggressive sampling will not significantly impair placement quality. In the case of edge infrastructure, where these assumptions may not hold, this heuristic would introduce extreme variance in the placement quality, which is why we disable it and have to consider all nodes in the cluster.


Figure 6.11: Scheduler throughput in functions/second with sampling heuristic (left), and without (right).

cluster. This leads to a general degradation in scheduling throughput. We measured the throughput for different cluster sizes and numbers of priority functions. Our results in Figure 6.11 roughly match those of a recent Kubernetes performance evaluation [Den16]. The default Kubernetes scheduler uses only two priority functions and the sampling heuristic, which allows it to process around 170 pods per second in a cluster of 10 000 nodes. Skippy, by contrast, uses five priority functions by default and scores all nodes, which, at 10 000 nodes, yields a throughput of around 15 pods per second. While this is not an issue in our scenarios, because scheduling latency is only a small fraction of the overall round-trip time, it does negatively affect scheduling throughput.
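For illustration, a minimal sketch of the adaptive sampling heuristic described above; the clamping constants reflect our reading of the Kubernetes scheduler and are assumptions, not a verbatim reproduction of its code.

    def sample_percentage(num_nodes: int) -> float:
        """Share of nodes (in percent) considered for scoring: small clusters
        are scored fully, large clusters are sampled down to a 5% floor."""
        if num_nodes <= 100:
            return 100.0
        return max(5.0, 50.0 - num_nodes / 125.0)

    # At 6500 nodes the linearly decreasing share hits the floor: only 5% are scored.
    assert sample_percentage(6500) == 5.0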

6.5.5 Challenges & limitations

Beyond scheduling performance, our system has several limitations that need to be discussed. We also identify several open challenges for serverless edge platforms. Our system currently makes no particular considerations of the dynamic nature of edge systems. Reconciling deployment and runtime aspects of serverless edge computing applications is especially challenging. Generally, we can distinguish function deployment, function scaling, and requests to already running functions. Finding good placements for long-living functions is challenging, especially when they are network bound. For functions such as those serving static content or simple image classification tasks, the request RTT perceived by clients will be dominated by the link latency between the client and the node hosting the function. Therefore, to make better placement decisions for such functions, the system would require knowledge about the location of clients with respect to nodes in the cluster [HKW+18]. In this case, an autoscaling strategy could, for example, spin up replicas that favor nodes in close proximity to clients. This falls into the category of dynamic service migration problems [UWH+15], and is a challenge that confronts edge computing systems in general. The centralized API gateway architecture of OpenFaaS, Kubernetes, and similar systems presents a serious obstacle in solving this issue, as generally all traffic goes through a type of ingress to allow dynamic request routing and load balancing. A strategy could be to replicate API gateways across the


network and use a localization mechanism to resolve an API gateway in proximity. This may not be fine-grained enough for scenarios such as the urban sensing infrastructure, where the resolution would have to be on the level of city neighborhoods. However, using the principles of elastic diffusion presented in Chapter 5, as we discussed in Section 5.8, could be a promising approach.

Another issue with using state-of-the-art serverless platforms for edge infrastructure is the rudimentary way they model node resources and function requirements [HFG+18]. For example, in Kubernetes, a node has three capacities: CPU, memory, and the maximum number of pods that can be scheduled onto the node. Modeling the capabilities of edge resources is challenging, as their availability may not be known at design time, nor whether they are shareable at runtime. This is particularly important for scarce, (potentially) non-shareable, and discrete resources such as GPUs, where containers that hold the resource may completely block other containers from execution, even if they use it only infrequently. We therefore see resource modeling as an important aspect of future edge computing platforms (a sketch of a richer resource model is given below).

Using container-based systems can have several drawbacks with respect to isolation and multitenancy. It is currently unclear how our system would behave in a multitenant scenario, where cluster resources are shared between multiple runtimes. Further research is necessary to investigate the effect of, e.g., workload interference.

Our system currently makes the assumption that function code is distributed in container images. Some FaaS platforms, such as OpenWhisk, have platform-level facilities for distributing function code that may not benefit from the computation movement estimation made by the LatencyAwareImageLocalityPriority. Although we could conceive a higher-level abstraction for a code movement soft-constraint, it would require additional facilities to allow the scheduler to query the runtime for function metadata (like its code size), and whether a function's code has been deployed at a particular node.
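To make the earlier point about resource modeling concrete, the sketch below shows a richer node resource model than the three Kubernetes capacities, distinguishing shareable from non-shareable capabilities. The field names are our own illustration, not an existing API.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Capability:
        name: str        # e.g., "gpu", "tpu", "fpga"
        quantity: int    # discrete units available on the node
        shareable: bool  # whether several containers may use a unit concurrently

    @dataclass
    class NodeResources:
        cpu_millis: int      # the three capacities Kubernetes models today
        memory_bytes: int
        max_pods: int
        capabilities: List[Capability] = field(default_factory=list)

        def can_host(self, required: str) -> bool:
            """Check whether a required capability is present and not exhausted."""
            return any(c.name == required and c.quantity > 0 for c in self.capabilities)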

6.6 Related work

Serverless computing in the form of cloud functions is seen by many in both industry and academia as a computational paradigm shift [JSSS+19, aws, ACR+18, OYZ+18]. Only recently has the serverless model, and in particular the FaaS abstraction, been investigated for edge computing. Glikson et al. [GND17] proposed the term Deviceless Edge Computing to emphasize how serverless edge computing helps to hide the underlying compute fabric, which consists of edge devices rather than servers. The characteristics of edge infrastructure exacerbate the challenges of serverless computing [ATC+21], such as platform architecture [NRS+17, RHM+19], runtime overhead [HR19], cold starts [BKB20], or scheduling [RHM+19]. In a recent effort, Baresi and Mendonça [BM19] proposed a serverless edge computing platform based on OpenWhisk. They focus on the complete system architecture design and the implementation of a load balancer that considers distributed infrastructure. In industry, AWS IoT Greengrass [aws] enables on-premises execution of AWS Lambda functions, Amazon's serverless computing platform. However,


the configuration of AWS IoT Greengrass devices is highly static, since the functions running on a device are defined using a local configuration file. We analyzed the issues surrounding AWS Greengrass for serverless edge AI applications in [Sch19]. In an effort to extend existing serverless runtimes to enable serverless edge computing, Xiong et al. [XSXH18] implemented a set of extensions to Kubernetes called KubeEdge. Its most important component, the EdgeCore node agent, manages networking, synchronizes state, and potentially masks network failures. Our approach is complementary, in that Skippy provides an edge-enabled scheduling system for making better function placement decisions on edge infrastructure.

Serverless computing is also actively being used and researched in the AI and ML systems problem space. Ishakian et al. [IMS18] discuss the advantages and open challenges of serving ML models using serverless functions. AWS IoT Greengrass currently allows machine learning inference on edge devices, using models trained in the cloud. Our approach is similar in terms of architecture, but different in that it considers AI workflow concepts as first-class citizens in the programming model and underlying runtime, which affects the overall design. Moreover, we go beyond only serving models, and consider all steps in end-to-end AI pipelines. Carreira et al. [CFT+18] implement ML workflows in serverless platforms, and outline an approach that includes an API to develop serverless ML functions, a stateful client resource manager, a worker runtime, and a distributed data store. Our approach extends this idea to include management of edge resources.

There is a strong relation between serverless function scheduling and the service placement problem (SPP). Many variants of the SPP for different edge computing system models and operational goals exist [SNS+17, YPI+18, SCY18, HKW+18, ASDL19]. Typically, the problem is formulated as an optimization problem, and an algorithm is implemented to solve an instance of the problem heuristically by leveraging assumptions within the system model. Gog et al. [GSG+16] map the service placement problem to a graph data structure and model it as a min-cost max-flow (MCMF) problem. Hu et al. [HZdLZ18] pursue a similar approach by modeling service placement as a min-cost flow problem (MCFP), which allows encoding multi-resource requirements and affinities to other containers. Their scheduler considers the costs for offloading tasks from the edge nodes to rented cloud resources. Aryal and Altmann [AA18] map the service placement problem to a multi-objective optimization problem (MOP) and propose a genetic algorithm to make placement decisions. Bermbach et al. [BMHP19] propose a distributed auction-based approach for a serverless platform in which application developers bid on resources. Generally, scheduling algorithms described in the academic literature often assume very detailed information about the system state and service requirements, whereas in production systems, both may not be available. For example, the presented approaches and others [THLL17, XE16] assume that the constraints considered by the schedulers are defined a priori, which allows strong assumptions when implementing heuristics. Moreover, they assume that services are known at scheduling time, and that a placement is calculated for all services at once. This is not the case for online schedulers, which are the norm for serverless systems, where new services can arrive at any time.


Many online container schedulers, such as those of Docker Swarm, Kubernetes, or Apache Mesos, implement a greedy MCDM procedure. A key phase in this procedure is scoring, i.e., calculating the score of a feasible node by invoking a set of priority functions, building a weighted sum of the priority scores, and selecting the highest-scoring node for scheduling. The Kubernetes scheduler implements this procedure in a very general and flexible way [MSV18], which is why we build on its model, as it generalizes to many other container schedulers. It allows different hard- and soft-constraints to be plugged in and configured dynamically, theoretically even at runtime. It is unclear whether and how existing SPP approaches could be implemented in this framework. Our work is an effort to examine how ideas from service placement in edge computing, such as using latency and proximity awareness for placement decisions, can translate to, or be implemented in, these types of schedulers.

Another related field is workload offloading, where mobile device workloads are offloaded to cloud resources or proximate edge resources. The goal of schedulers in mobile offloading is to determine whether to offload a task or not. The main concern is to minimize execution time and energy consumption of the mobile device [Kha15], and they are often geared towards a specific application. However, the underlying constraints and priorities can be similar to what Skippy does, i.e., determining a tradeoff between data and computation movement, and considering the capabilities of resources to accelerate specific workloads. For example, Eom et al. [EJF+13] model scheduling as a binary decision problem: either offload a workload or run it locally, and investigate the performance of binary classifiers given a simple model. The model considers the ratio between the local execution duration and the potential data transfer when offloading. Skippy is overall much more general-purpose, and has a flexible way of defining both low-level constraints and high-level operational goals. While there may be some ideas that apply to serverless scheduling, a complete investigation of mobile offloading strategies is out of scope.

6.7 Summary

Serverless computing is a powerful system abstraction that helps platform providers hide the operational complexity of the underlying infrastructure from application operators. This makes it attractive for edge computing systems, where, as we have demonstrated in Chapter 3, the operational complexity is particularly high. Analogously to serverless cloud functions, we believe that edge functions are a promising approach to manage applications that run on a distributed edge computing substrate. We have demonstrated several limitations of existing serverless platforms when they are used in such scenarios, leading to poor function placement on heterogeneous geo-distributed infrastructure that has limited up/downlink connections between edge networks. Our evaluation has also revealed specific limitations of Kubernetes and OpenFaaS for edge scenarios. First, the sampling heuristic of the default Kubernetes scheduler can produce poor or infeasible placement results when the network topology includes geographic distribution and link bottlenecks. Because many different pods may be scheduled to just a few nodes, the nodes become busy pulling new container images, which can consequently lead to network congestion


detrimental to system performance. OpenFaaS' scale-to-zero policy exacerbates the issue, as pods that are scaled to zero are often restarted on different nodes. Second, the prevalent centralized API gateway architecture is problematic when considering network-bound functions, where most of the round-trip time comes from network link latency, as is the case with, e.g., serving low-latency ML models.

We proposed a serverless platform that elevates concepts from the AI workflow to first-class citizens, and provides a more approachable way to develop and operate edge AI functions. This has the added benefit that the platform can now reason over the dataflow of functions. The goal of the deviceless function scheduling approach is to consider both edge and cloud resources for function execution, and to place functions according to contextual policies. To that end, we presented Skippy, a container scheduling system that enables existing container orchestrators, such as Kubernetes, to support serverless edge functions. Skippy runs alongside Kubernetes and OpenFaaS to make better function placement decisions in edge computing scenarios. Specifically, it respects device capabilities (such as added AI accelerators) and data locality, and can thereby hide complex operational data and model management tasks from the developer. We introduced scheduling constraints that leverage additional knowledge of node capabilities, the application's dataflow, and the network topology. Overall, our experiments show that (1) Skippy enables locality-sensitive function placement, resulting in higher data throughput and less traffic going over up/downlinks, (2) in scenarios where there is a fairly even distribution of cloud and edge resources, the optimization helps significantly in trading off execution cost and overall application latency, and (3) the improved placement quality comes at the cost of scheduler performance.

We have shown that the most critical aspect of function placement in data-intensive serverless edge computing is the tradeoff between data and computation movement. However, making this tradeoff in a generalized way is challenging due to the wide range of edge infrastructure scenarios. By introducing higher-level operational goals, we can fine-tune the underlying scheduler parameters to consider infrastructure-specific aspects. Adding soft constraints to improve placement quality comes at the cost of scheduling throughput. In fact, in large cluster configurations of 10 000 nodes and five soft constraints, the scheduling throughput drops by an order of magnitude. We showed how this is due to a general scalability limitation of state-of-the-art greedy online MCDM serverless scheduling approaches, and their monolithic queue-based design. We doubt that this design will work for more complex scenarios, where many tenants consolidate edge devices to form huge cluster configurations requiring many soft constraints. Exploring alternative architectures may therefore be a promising direction. Omega [SKAEMW13], a disaggregated shared-state scheduler, has been designed to cope with Google's real-life production workload. Firmament [GSG+16] promises to achieve sub-second placement latencies on clusters of more than ten thousand machines by using multiple min-cost max-flow (MCMF) optimizations. Combining these models could be a promising approach for operating edge cloud systems in general.

There are several open issues to fully realize the idea of edge functions on a distributed

compute fabric. For example, the centralized API gateway architecture employed by most state-of-the-art serverless platforms may be impractical for edge computing, particularly with dispersed clients that consume network-bound functions. A possible solution may be to use principles of osmotic computing, as we discussed in the summary of Chapter 5. Moreover, the dynamic nature of edge systems requires the continuous re-evaluation of placement decisions, necessitating context-aware autoscaling and workload migration strategies. Finally, the automatic characterization of workloads and their mapping to preferred node capabilities could significantly improve function placement, which we explore in subsequent research [Rai21].


CHAPTER 7 Conclusion

This chapter concludes the thesis. First, in Section 7.1 we give a brief summary of the covered topics and the presented contributions. Then, in Section 7.2 we revisit the research questions posited in the introduction. Finally, in Section 7.3 we give a brief outlook on future research directions and open issues.

7.1 Summary

Edge intelligence is a collection of infrastructure architectures, computing paradigms, and operational mechanisms that deal with the challenges arising from the increasing interconnectedness of humans, Internet-connected things, and AI. In this paradigm, new applications have emerged that create new and unique challenges for distributed systems. In particular, they necessitate new distributed systems architectures and operational mechanisms tailored to deal with the complexity of AI systems, the geographic dispersion of compute resources, resource and workload heterogeneity, extreme bandwidth and latency requirements, and stringent privacy constraints. The prevalent cloud-based model with its centralized infrastructure architecture is clearly at odds with these requirements. In this thesis, we made the case for a distributed computing fabric. First, in Chapter 2 we explained the key principles. The fabric is composed of edge and cloud computing resources, federated to form a diverse computing substrate. In Chapter 3 we discussed and analyzed different infrastructure models for such a substrate. Together with three orthogonal systems and their operational mechanisms, we enable a serverless computing model that abstracts this complex distributed infrastructure architecture. (1) In Chapter 3, we presented a prototype for a portable energy-aware edge computer cluster that could be powered by a car battery or similar power source, making it useful for forward-deployed scenarios. The resulting system efficiently serves stateless services such as image recognition, while automatically trading off application


latency and energy consumption. (2) In Chapter 5 we presented a prototype for a distributed MQTT middleware that continuously monitors proximity between clients and brokers, can deal with dynamic network topologies, and drastically reduces application latency by leveraging proximate edge resources for message brokering. As a complement, we outlined a system design based on osmotic computing principles. We developed a generalizable mechanism for proximity-based autoscaling that dynamically migrates brokers and clients to the edge. (3) In Chapter 6, we presented a serverless code execution platform tailored to edge intelligence applications. The key contributions are a programming model for serverless edge functions, a custom scheduler, and a method to automatically fine-tune scheduler parameters to the specific infrastructure configuration. The scheduler makes tradeoffs between data and computation movement, and intelligently maps workloads to appropriate edge resources based on their characteristics. In a series of experiments, we showed how our system is able to maintain a serverless illusion, even in the face of extreme resource heterogeneity and geo-distributed networks.

In Chapter 4, we presented three contributions to improve the evaluation of edge computing systems. First, we presented an experimentation and analytics framework for distributed systems, which we use to create profiling pipelines for the edge cluster testbeds we built. Second, the profiling data is fed into a trace-driven simulation framework that can then be used for scalability analyses or capacity planning. Third, we presented a model and tool for generating synthetic edge infrastructure configurations that are grounded in the scenarios we analyzed in Chapter 3. We also made the case for how these three systems together can be used in a closed loop for developing operational mechanisms for distributed systems in general.

7.2 Research questions

In Section 1.2.1 we posited three core research questions that have driven the research presented in this thesis. Their purpose was to (1) examine the current state of edge computing infrastructure architectures, (2) understand the operational mechanisms necessary to manage applications on these systems, and (3) investigate and develop evaluation and development methodologies for these operational mechanisms. We now discuss the research questions and contextualize the contributions of this thesis.

RQ1. Which are appropriate infrastructure architectures that enable a distributed compute fabric for edge intelligence?

From the analysis in Chapter 3 of existing and emerging edge computing infrastructure architectures, we can make the following observations. First, edge computers will not live in isolation. Much like data center servers live together in racks as cohesive infrastructure units, multiple edge computers will form high-density compute clusters. Unlike data center servers, these clusters will be much smaller and more diverse in terms of hardware platforms. General-purpose computing platforms will be complemented by GPUs, modular AI accelerators, FPGAs, or ASICs. We therefore argued that


multi-purpose edge computers are the cohesive infrastructure unit analogous to server racks, which will come in various shapes and sizes. For portable or forward-deployed scenarios, we introduced in Section 3.2 a concrete proposal for an energy-aware cluster-based edge computer that takes energy into account as a first-class citizen in its operational mechanisms.

In the past, efforts to centralize or decentralize computing systems have oscillated between extremes, ranging from pure peer-to-peer networks to the consolidation of IT resources into massive cloud data centers. While edge computing has ushered in a new wave of decentralization, we observe a certain convergence towards hybrid and cooperative systems. Multi-purpose edge computers that are geographically dispersed will be federated together with centralized cloud resources, and form a geographically dispersed computing substrate. Mechanisms for cluster federation and decentralized resource management will become more relevant as the dispersion of edge computing resources progresses. System abstractions for cloud computing have become very effective at hiding the underlying computing hardware, as demonstrated by the serverless computing model. We see container-based virtualization as the way forward in these systems, as opposed to VM-based hardware virtualization, since the lightweight nature of containers allows us to include a much wider range of devices in the computing substrate. Maintaining this serverless illusion in edge computing systems will be extremely valuable, but significantly more challenging, as we have shown in Chapter 6.

RQ2. Which aspects of edge intelligence systems should be abstracted as first-class citizens into platforms that manage such systems?

From our analysis of edge intelligence applications and their requirements in Chapter 2 and Chapter 3, we can make the general observation that all edge computing systems have at least three goals: (1) optimize application responsiveness, (2) balance system resource usage, and (3) minimize dependence on cloud resources (for reasons such as fault tolerance, privacy constraints, cost, or to reduce Internet traffic). Moreover, when juxtaposed with cloud computing systems, the discerning characteristics of edge computing systems are: (a) the decentralization of computing resources and data, (b) the potential mobility of clients and computing resources, (c) the resulting physical and logical distance between clients and resources, and (d) the heterogeneity of computing resources to enable a wide range of use cases. To reconcile edge computing goals and system characteristics, we identify the following three critical concepts that should be considered as first-class citizens in operational mechanisms.

Proximity

Proximity plays a fundamental role in the operational mechanisms of edge computing systems, as we demonstrated in both Chapter 5 and Chapter 6. Its impact on system design is so profound, in fact, that we dedicate a longer discussion to it.


Two issues arise when using proximity in the way we presented: its definition, and its measurement. Proximity can be defined in different ways. Classic spatial or geographic proximity, as used in CDNs or cloud availability regions, plays a lesser role for the systems we are interested in. Instead, we describe proximity as a function of network latency or available bandwidth. This turns out to be useful because these values can change due to resource contention, which has the same potential consequence on system performance as physical distance. In a messaging middleware, for example, a spatially close broker that is slow to respond behaves as if it were farther away. Moreover, we have shown the challenges of measuring proximity in a scalable way. Opportunistic monitoring techniques and abstraction via virtual network coordinate systems help significantly in reducing the network strain caused by continuous monitoring, but also tend to reduce the precision of distance estimations.

We have shown how considering proximity as a first-class citizen in operational mechanisms can help significantly in meeting certain operational goals in edge computing systems. From the prototypes we have built, we can formulate the following generalizable mechanisms.

Proximity-based autoscaling. In cloud computing, autoscaling decisions are typically made independently of placement decisions. This is firstly because it is easier to implement, and secondly because, in cloud computing, it generally does not matter where service replicas are placed, since they will all live in the same data center. As an example, the default configuration of the Kubernetes scheduler uses soft constraints that favor nodes that already have the necessary container image to run the service, and prefer nodes with more free resources in order to balance load throughout the system. Proximity between nodes is not considered to have a particular significance. However, in edge computing systems, where clusters of clients may spontaneously appear and disappear, and proximity to compute resources matters significantly for application performance, the two decisions should be made in tandem. It is not only important to decide when to autoscale, but also where. The results presented in Chapter 5 demonstrate the importance of elasticity towards the edge using proximity-based autoscaling and placement. Although our prototype was developed specifically in the context of messaging middleware, the ideas translate to other systems as well. For example, similar principles can be applied to the problem of API gateways in microservice-based architectures, or to serverless function scaling. Replication of functions also necessitates request routing and load balancing. Classic round-robin load balancing mechanisms treat each backend server equally. When proximity between clients and servers matters, however, servers are not equally performant from the client's point of view. This is especially the case for network-bound workloads, such as low-latency ML inferencing or message brokering. We showed how grouping servers into proximity groups can help balance the goals of load distribution and latency minimization.

Data and computation movement tradeoffs. We motivated the work in Chapter 6 with experiments highlighting the challenges of treating geographically distributed computing and storage resources as a single federated computing substrate. The resulting


variance in the up/downlink bandwidth between networks necessitates the explicit consideration of the distance between compute and storage nodes. This becomes especially apparent when scheduling data-intensive functions, such as ML model training jobs, that operate on data stored in unstructured storage systems (like distributed S3 instances). In these scenarios, there is a tradeoff to be made between moving the data to the cloud and moving the computation to the edge, where moving computation means pulling a container or VM image from the cloud. Here, a definition of proximity based on network bandwidth is useful, because it lets us reuse the mechanisms we built that leverage proximity as an abstraction in their algorithms.
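A minimal sketch of this tradeoff, under the simplifying assumption that transfer time is size divided by allocatable bandwidth (ignoring latency, contention, and caching):

    def prefer_edge_placement(data_size_mb: float, image_size_mb: float,
                              uplink_mbps: float, downlink_mbps: float) -> bool:
        """True if moving the computation to the edge (pulling the container
        image over the downlink) is estimated to be cheaper than moving the
        data to the cloud (pushing it over the uplink)."""
        move_data_seconds = data_size_mb * 8 / uplink_mbps     # edge -> cloud
        move_code_seconds = image_size_mb * 8 / downlink_mbps  # cloud -> edge
        return move_code_seconds < move_data_seconds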

Energy

As long as we rely on burning fossil fuels for generating electrical energy, there will be a case for energy-aware solutions. More practically, however, energy-awareness plays a critical role in forward-deployed or mobile computing scenarios, to improve battery lifetime and therefore the overall utility of the system. In Section 3.2, we presented a prototype for a portable cluster-based edge computer that could be powered for several hours by a car battery. We showed how the tradeoff between performance and energy consumption at this system scale is much more pronounced than in data centers. We also showed some unintuitive system behavior when using modern high-density CPU platforms, which are already extremely energy efficient, in a cluster that applies another layer of energy optimization. As part of our simulation model presented in Section 4.3, we described an ML-based energy modelling technique that can help operational mechanisms make decisions based on the estimated energy use of individual nodes and of the entire system as a whole. These models could become a first-class citizen of resource managers or load balancers. Much like laptops or mobile phones have a power-saving mode, we conceive that cluster-based edge computers could have such a mode that gives more weight to energy-conserving mechanisms.
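As an illustration of such a model, the following is a minimal sketch that fits per-node power draw as a function of resource utilization; the feature set and training samples are placeholders, not the model from Section 4.3.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Hypothetical profiling samples: (cpu_util, gpu_util) -> measured watts.
    X = np.array([[0.1, 0.0], [0.5, 0.0], [0.9, 0.0], [0.5, 0.8], [0.9, 0.9]])
    y = np.array([4.2, 6.8, 9.5, 11.0, 14.3])

    power_model = LinearRegression().fit(X, y)

    def estimated_node_power(cpu_util: float, gpu_util: float) -> float:
        """Predict a node's power draw (in watts) from its current utilization."""
        return float(power_model.predict([[cpu_util, gpu_util]])[0])

A resource manager could sum such estimates over all nodes to approximate system-wide energy use, and weight energy-conserving actions accordingly.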

Device and workload heterogeneity

A characteristic often ascribed to edge systems is heterogeneity, the meaning of which remains somewhat vague. In our model for synthesizing infrastructure configurations (see Section 4.2), we presented a mathematical description of heterogeneity that allows us to use it as a quantifiable system property. The model focuses on system resources such as CPU cores, available memory, and additional capabilities like GPUs or AI accelerators. We showed how sensitive some operational mechanisms are to this type of system heterogeneity. For example, in the motivation of Chapter 6, we demonstrated the impact of correctly mapping device capabilities to workload characteristics. System performance depends greatly on precise placement decisions, like placing functions that perform ML model training on edge nodes with GPUs or AI accelerators. The problem is that VMs or containers are black boxes, and we generally do not have a way of knowing a priori whether they can benefit from certain accelerators. Moreover, as we demonstrated in Section 4.3, without explicit modelling, it is hard to reason about the effect of resource


contention on system performance. We proposed a black-box model that characterizes workloads based on their resource usage. By measuring performance metrics and resource usage on certain devices, we can, during system runtime, determine which cluster nodes are best suited for which workloads.
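A minimal sketch of such black-box matching, assuming measured per-workload resource-usage vectors and per-node capability profiles; the metric names are illustrative.

    import numpy as np

    def node_affinity(workload_usage: dict, node_profile: dict) -> float:
        """Score how well a node's capability profile matches a workload's
        measured resource usage (cosine similarity over shared metrics)."""
        keys = sorted(set(workload_usage) & set(node_profile))
        w = np.array([workload_usage[k] for k in keys], dtype=float)
        n = np.array([node_profile[k] for k in keys], dtype=float)
        return float(w @ n / (np.linalg.norm(w) * np.linalg.norm(n) + 1e-9))

    # A GPU-heavy training job scores higher on an accelerator-equipped node:
    job = {"cpu": 0.3, "gpu": 0.9, "io": 0.2}
    gpu_node = {"cpu": 0.5, "gpu": 1.0, "io": 0.4}
    cpu_node = {"cpu": 1.0, "gpu": 0.0, "io": 0.6}
    assert node_affinity(job, gpu_node) > node_affinity(job, cpu_node)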

RQ3. What are appropriate methods for developing and evaluating edge computing systems to allow more generalizable conclusions?

Through our work on synthesizing edge infrastructure configurations in Section 4.2, we have shown how drastically different edge computing infrastructures can be, depending on their use case. As we just discussed in the previous paragraph on heterogeneity, operational mechanisms can be very sensitive to these differences. Because we can no longer build on the assumptions of a centralized and homogeneous data center, building generalizable operational mechanisms is much more challenging. Instead, we need to be able to rapidly vary infrastructure parameters such as resource heterogeneity or network topology. It is challenging to do this using real testbeds, which is why simulation and emulation will continue to play an important role in edge computing systems. As we have shown in Section 4.3, many existing simulation or emulation tools make simplified assumptions that lead to inaccurate results. It is unclear to what degree this affects the validity of results presented in systems research that makes use of these simulators. We argue that stochastic trace-driven simulation, as compared to the prevalent analytical or queue-based models, should be the preferred way of doing systems evaluations. A streamlined way of profiling edge computing systems is therefore a critical part of the evaluation methodology. With our profiling system, we took a step toward a more systematic approach for benchmarking and profiling. However, the biggest issue is still the scarcity of reference benchmarks for edge computing. We hope that, as the number of applications and requirements for edge computing systems increases, standardized benchmarks will emerge that allow us to better compare operational mechanisms.

7.3 Future work

It is possible that no consensus on edge computing infrastructure architectures will be achieved, and that there will instead be a variety of co-existing models used for different use cases. In this scenario, developing generalizable operational mechanisms is challenging. Schedulers, autoscalers, and load balancers need to have fine-grained parameters that can be tuned to the infrastructure. The improvement of methods for self-adaptive systems will be particularly important, to reduce the manual, and potentially error-prone, configuration a human has to do. A benefit of having simulators that are grounded in real profiling data is that we can use them to make probabilistic predictions about the system behavior in different runtime situations. With a tight loop between profiling and simulation, we see an opportunity to use simulation not only for evaluating systems,


but also as a core tool in operational mechanisms. For example, we showed in Section 6.4 how simulation can be used to optimize the scheduler parameters of the very system being simulated. We conceive that, in the future, these simulators will be updated at runtime using monitoring data, and then used by schedulers, autoscalers, or load balancers to play through various scenarios. The success of this approach is, however, contingent on accurate and performant simulations.

In Chapter 6 we juxtaposed serverless function scheduling with the Service Placement Problem (SPP), where generally multiple services are scheduled at once. We argued that these approaches are not useful for the systems we are interested in, because in these systems, workloads (such as serverless functions) are generally not known upfront and can arrive at any point during runtime. What could be done, however, is to leverage existing SPP solutions to determine the optimal placement of the already scheduled functions, and then to determine a set of operations to converge from the current state to the optimal state. Coming up with ways to calculate the cost–benefit tradeoff of migrating serverless functions in this way could be promising research.

The systems presented in this thesis mostly rely on centralized control mechanisms that govern the system. This can be problematic. As we showed in Chapter 6, a centralized monolithic scheduler struggles with large clusters and many scheduling constraints. A divide-and-conquer strategy may solve the scalability issues, but it is unclear whether the added complexity of decentralized control mechanisms would outweigh their benefits.

Finally, for the full end-to-end realization of smart-city-scale edge intelligence, there are still many non-functional issues that warrant further investigation, in particular concerning ownership of and stake in edge computing infrastructure. With respect to stake, we observe that there are three categories of edge intelligence use cases: public (such as smart public spaces), private (such as personal health assistants or corporate predictive maintenance), and intersecting (such as autonomous vehicles). It is unclear who will own the future fabric for edge intelligence, whether utility-based offerings for edge computing will take over as is the case in cloud computing, whether telecommunications providers will keep up with the development of mobile edge computing, what role governments and the public will play, and how the answers to these questions will impact engineering practices and system architectures.



Bibliography

[5G-19] 5G-TRANSFORMER project. 5G-TRANSFORMER final system design and techno-economic analysis. Public Deliverable D1.4, November 2019.

[5GA19] 5GAA. C-ITS vehicle to infrastructure services: How C-V2X technology completely changes the cost equation for road operators. White paper, 5G Automotive Association, January 2019.

[AA18] R. G. Aryal and J. Altmann. Dynamic application deployment in federations of clouds and edge resources using a multiobjective optimization AI algorithm. In 2018 Third International Conference on Fog and Mobile Edge Computing (FMEC), pages 147–154, April 2018.

[ACR+18] Istemi Ekin Akkus, Ruichuan Chen, Ivica Rimac, Manuel Stein, Klaus Satzke, Andre Beck, Paarijaat Aditya, and Volker Hilt. SAND: Towards high-performance serverless computing. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 923–935, 2018.

[AFGM+15] A Al-Fuqaha, M Guizani, M Mohammadi, M Aledhari, and M Ayyash. Internet of things: A survey on enabling technologies, protocols, and applications. IEEE Communications Surveys & Tutorials, 17(4):2347–2376, 2015.

[AH16] Sherif Abdelwahab and Bechir Hamdaoui. FogMQ: A message broker system for enabling distributed, internet-scale IoT applications over heterogeneous cloud platforms. CoRR, abs/1610.0, 2016.

[AKGH17] Kyoungho An, Shweta Khare, Aniruddha Gokhale, and Akram Hakiri. An autonomous and dynamic coordination and discovery service for wide-area peer-to-peer publish/subscribe: Experience paper. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, DEBS ’17, pages 239–248, New York, NY, USA, 2017. ACM.

[Ama20] Amazon. AWS lambda pricing. AWS. Online. Accessed 2020-02-19. https://aws.amazon.com/lambda/pricing/, 2020.

[AP08] Charu C Aggarwal and Philip S Yu. A general survey of privacy-preserving data mining models and algorithms. In Privacy-Preserving Data Mining, pages 11–52. Springer, 2008.

[ASDL19] Farah Ait Salaht, Frédéric Desprez, and Adrien Lebre. An overview of service placement problem in fog and edge computing. Research Report RR-9295, Univ Lyon, EnsL, UCBL, CNRS, Inria, LIP, Lyon, France, October 2019.

[ATC+21] Mohammad S Aslanpour, Adel N Toosi, Claudio Cicconetti, Bahman Javadi, Peter Sbarski, Davide Taibi, Marcos Assuncao, Sukhpal Singh Gill, Raj Gaire, and Schahram Dustdar. Serverless edge computing: Vision and challenges. In 19th Australasian Symposium on Parallel and Distributed Computing, AusPDC’21, 2021.

[Aus] Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology. Senderkataster Austria. https://www.senderkataster.at/.

[aws] AWS IoT greengrass. https://aws.amazon.com/greengrass/.

[Bar15] Jeff Barr. AWS IoT – cloud services for connected devices. AWS Blog. Online. Posted 2015-01-08. Accessed 2017-09-18. https://aws.amazon.com/blogs/aws/aws-iot-cloud-services-for-connected-devices/, 2015.

[BB10] Anton Beloglazov and Rajkumar Buyya. Energy efficient resource management in virtualized cloud data centers. In 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, pages 826–831. IEEE, 2010.

[BB12] Anton Beloglazov and Rajkumar Buyya. Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers. Concurrency and Computation: Practice and Experience, 24(13):1397–1420, sep 2012.

[BBC+17] Denis Baylor, Eric Breck, Heng-Tze Cheng, Noah Fiedel, Chuan Yu Foo, Zakaria Haque, Salem Haykal, Mustafa Ispir, Vihan Jain, Levent Koc, and others. TFX: A TensorFlow-based production-scale machine learning platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1387–1395. ACM, 2017.

[BBD+17] B Bhattacharjee, S Boag, C Doshi, P Dube, B Herta, V Ishakian, K R Jayaram, R Khalaf, A Krishna, Y B Li, V Muthusamy, R Puri, Y Ren, F Rosenberg, S R Seelam, Y Wang, J M Zhang, and L Zhang. IBM deep learning service. IBM Journal of Research and Development, 61(4/5):10:1–10:11, 2017.

[BBH+17] Arsany Basta, Andreas Blenk, Klaus Hoffmann, Hans Jochen Morper, Marco Hoffmann, and Wolfgang Kellerer. Towards a cost optimal design for a 5G mobile core network based on SDN and NFV. IEEE Transactions on Network and Service Management, 14(4):1061–1075, 2017.

[BCD+11] Clark Barrett, Christopher L Conway, Morgan Deters, Liana Hadarean, Dejan Jovanović, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. In International Conference on Computer Aided Verification, pages 171–177. Springer, 2011.

[BCR14] P Bellavista, A Corradi, and A Reale. Quality of service in wide scale publish–subscribe systems. IEEE Communications Surveys & Tutorials, 16(3):1591–1616, 2014.

[Bec18] Pete Beckman. Artificial intelligence at the edge: How machine learning at the edge is creating a computing continuum. Talk. https://www.youtube.com/watch?v=N_k8Uh8Bl0E, 2018. Research Computing Centre.

[BJP+20] Philip J. Basford, Steven J. Johnston, Colin S. Perkins, Tony Garnock-Jones, Fung Po Tso, Dimitrios Pezaros, Robert D. Mullins, Eiko Yoneki, Jeremy Singer, and Simon J. Cox. Performance analysis of single board computer clusters. Future Generation Computer Systems, 102:278–291, 2020.

[BKB20] David Bermbach, Ahmet-Serdar Karakaya, and Simon Buchholz. Using application knowledge to reduce cold starts in FaaS services. In Proceedings of the 35th ACM Symposium on Applied Computing (SAC 2020). ACM, 2020.

[BM19] Luciano Baresi and Danilo Filgueira Mendonça. Towards a serverless platform for edge computing. In 2019 IEEE International Conference on Fog Computing, ICFC’19, pages 1–10. IEEE, 2019.

[BMHP19] David Bermbach, Setareh Maghsudi, Jonathan Hasenburg, and Tobias Pfandzelter. Towards auction-based function placement in serverless fog platforms. arXiv preprint arXiv:1912.06096, 2019.

[BPB+19] Lorenzo Bertizzolo, Michele Polese, Leonardo Bonati, Abhimanyu Gosain, Michele Zorzi, and Tommaso Melodia. mmBAC: Location-aided mmwave backhaul management for UAV-based aerial cells. In Proceedings of the 3rd ACM Workshop on Millimeter-wave Networks and Sensing Systems, pages 7–12, 2019.

[Bru21] Andreas Bruckner. Self-adaptive distributed MQTT middleware for edge computing applications. Master’s thesis, TU Wien, 2021.

[BT19] Dimo Brockhoff and Tea Tušar. Benchmarking algorithms from the platypus framework on the biobjective bbob-biobj testbed. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 1905–1911, 2019.

[Bur17] Doug Burger. Microsoft unveils project Brainwave for real-time AI. Microsoft Research, Microsoft, 22, 2017.

[BWFS14] Michael Till Beck, Martin Werner, Sebastian Feld, and S Schimper. Mobile edge computing: A taxonomy. In AFIN 2014 : The Sixth International Conference on Advances in Future Internet. Citeseer, 2014.

[CAR05] N Carvalho, F Araujo, and L Rodrigues. Scalable QoS-based event routing in publish-subscribe systems. In Fourth IEEE International Symposium on Network Computing and Applications, pages 101–108, jul 2005.

[Cas19] Stephen Cass. Taking AI to the edge: Google’s TPU now comes in a maker-friendly package. IEEE Spectrum, 56(5):16–17, 2019.

[CBSG17] Charles E Catlett, Peter H Beckman, Rajesh Sankaran, and Kate Kusiak Galvin. Array Of Things: A scientific research instrument in the public way: Platform design and early lessons learned. In Proceedings of the 2nd International Workshop on Science of Smart City Operations and Platforms Engineering, pages 26–33. ACM, 2017.

[CDOK17] Vittorio Cozzolino, Aaron Yi Ding, Jorg Ott, and Dirk Kutscher. Enabling fine-grained edge offloading for IoT. In Proceedings of the SIGCOMM Posters and Demos, pages 124–126. 2017.

[CDZ97] Kenneth L Calvert, Matthew B Doar, and Ellen W Zegura. Modeling internet topology. IEEE Communications Magazine, 35(6):160–163, 1997.

[CFT+18] Joao Carreira, Pedro Fonseca, Alexey Tumanov, Andrew Zhang, and Randy Katz. A case for serverless machine learning. Workshop on Systems for ML and Open Source Software at NeurIPS 2018, 2018.

[Cha99] Xinjie Chang. Network simulations with OPNET. In WSC’99. 1999 Winter Simulation Conference Proceedings. ’Simulation - A Bridge to the Future’ (Cat. No. 99CH37038), volume 1, pages 307–314. IEEE, 1999.

[CHW+17] Zhuo Chen, Wenlu Hu, Junjue Wang, Siyan Zhao, Brandon Amos, Guanhang Wu, Kiryong Ha, Khalid Elgazzar, Padmanabhan Pillai, Roberta Klatzky, et al. An empirical study of latency in an emerging class of edge computing applications for wearable cognitive assistance. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing, SEC ’17, 2017.

[Cis19] Cisco public. Redefine connectivity by building a network to support the Internet of Things. White Paper. https://www.cisco.com/c/en/us/solutions/service-provider/a-network-to-support-iot.html, 2019.

[Cis21] Cisco public. Cisco annual internet report (2018–2023). White Paper. https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/white-paper-c11-741490.html, Apr 2021.

[CLL+15] Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274, 2015.

[Cor17] IBM Corporation. Medtronic builds a cognitive mobile personal assistant app to assist with daily diabetes management, 2017. IBM Case Studies.

[Cor18] IBM Corporation. An AI-powered assistant for your field technician. Technical report, 2018.

[CPSC16] Xudong Chen, Swee King Phang, Mo Shan, and Ben M Chen. System integration of a vision-guided UAV for autonomous landing on moving platform. In 12th IEEE International Conference on Control and Automation (ICCA), pages 761–766. IEEE, 2016.

[CRB+11] Rodrigo N Calheiros, Rajiv Ranjan, Anton Beloglazov, César AF De Rose, and Rajkumar Buyya. CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and experience, 41(1):23–50, 2011.

[CS05] Yuan Chen and Karsten Schwan. Opportunistic overlays: Efficient content delivery in mobile ad hoc networks. In Middleware 2005: ACM/IFIP/USENIX 6th International Middleware Conference, Grenoble, France, November 28 - December 2, 2005. Proceedings, pages 354–374. Springer Berlin Heidelberg, 2005.

[Cut18] Bruce Cutler. Examining cross-region communication speeds in AWS. Medium. Online. Posted 2018-05-30. Accessed 2020-02-15. https://medium.com/slalom-technology/examining-cross-region-communication-speeds-in-aws-9a0bee31984f, 2018.

[CWC+18] Baotong Chen, Jiafu Wan, Antonio Celesti, Di Li, Haider Abbas, and Qin Zhang. Edge computing in IoT-based manufacturing. IEEE Communications Magazine, 56(9):103–109, 2018.

[CZVZ14] Angelo Cenedese, Andrea Zanella, Lorenzo Vangelista, and Michele Zorzi. Padova smart city: An urban internet of things experimentation. In Proceeding of IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks 2014, pages 1–6. IEEE, 2014.

[DCKM04] Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris. Vivaldi: A decentralized network coordinate system. ACM SIGCOMM Computer Communication Review, 34(4):15–26, 2004.

[DCM+12] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Marc'aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, and Andrew Y. Ng. Large scale distributed deep networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1223–1231. Curran Associates, Inc., 2012.

[Den16] Hongchao Deng. Improving Kubernetes scheduler performance. CoreOS Blog. Online. Posted 2016-02-22. Accessed 2019-03-14. https://coreos.com/blog/improving-kubernetes-scheduler-performance.html, 2016.

[DNŠ17] Schahram Dustdar, Stefan Nastić, and Ognjen Šćekić. Smart Cities: The Internet of Things, People and Systems. Springer, 2017.

[DPAM02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and TAMT Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.

[DPVW+17] Yvonne Dierikx-Platschorre, Jaap Vreeswijk, Anton Wijbenga, Patrick Hofman, and Ruben van Ardenne. Guideline placement C-ITS roadside units. In Proc. 12th ITS European Congress, 2017.

[Duc18] Chris Duckett. Baidu creates Kunlun silicon for AI. ZDNet. Online. https://www.zdnet.com/article/baidu-creates-kunlun-silicon-for-ai/, 2018.

[EFGK03] Patrick Th. Eugster, Pascal A Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. The many faces of publish/subscribe. ACM Comput. Surv., 35(2):114–131, jun 2003.

[Eiv17] Adam Eivy. Be wary of the economics of serverless cloud computing. IEEE Cloud Computing, 4(2):6–12, 2017.

[EJF+13] H. Eom, P. S. Juste, R. Figueiredo, O. Tickoo, R. Illikkal, and R. Iyer. Machine learning-based runtime scheduler for mobile offloading framework. In 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing, UCC’13, pages 17–25, 2013.

[EKR03] E N (Mootaz) Elnozahy, Michael Kistler, and Ramakrishnan Rajamony. Energy-efficient server clusters. In Babak Falsafi and T N Vijaykumar, editors, Power-Aware Computer Systems, pages 179–197, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.

[EPM13] Thomas Erl, Ricardo Puttini, and Zaigham Mahmood. Cloud computing: Concepts, technology & architecture, 2013.

[EPR+17] Yehia Elkhatib, Barry Porter, Heverson B Ribeiro, Mohamed Faten Zhani, Junaid Qadir, and Etienne Riviere. On using micro-clouds to deliver the fog. IEEE Internet Computing, 21(2):8–15, mar 2017.

[ER59] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6(290-297):18, 1959.

[ETS19] ETSI GS MEC 003. Multi-Access Edge Computing (MEC); Framework and Reference Architecture, January 2019. V2.1.1.

[FBCC+10] David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A Kalyanpur, Adam Lally, J William Murdock, Eric Nyberg, John Prager, et al. Building Watson: An overview of the DeepQA project. AI magazine, 31(3):59–79, 2010.

[FCFMO+19] Damián Fernández-Cerero, Alejandro Fernández-Montes, F Javier Ortega, Agnieszka Jakóbik, and Adrian Widlak. Sphere: Simulator of edge infrastructures for the optimization of performance and resources energy consumption. Simulation Modelling Practice and Theory, page 101966, 2019.

[Fel00] Anja Feldmann. Characteristics of TCP connection arrivals. Self-Similar Network Traffic and Performance Evaluation, pages 367–399, 2000.

[FLP14] Fahimeh Farahnakian, Pasi Liljeberg, and Juha Plosila. Energy-efficient virtual machines consolidation in cloud data centers using reinforcement learning. In 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pages 500–507. IEEE, feb 2014.

[Fra16] Dustin Franklin. Nvidia® Jetson TX1 supercomputer-on-module drives next wave of autonomous machines. Parallel Forall. NVIDIA Corporation, 12, 2016.

[Gar16] Michael Garcia. How to bridge Mosquitto MQTT broker to AWS IoT. The Internet of Things on AWS – Official Blog, 2016.

[Gar17] Gartner. Integrate DevOps and artificial intelligence to accelerate IT solution delivery and business value. Online. Accessed 2018-09-05. https://www.gartner.com/doc/3787770/integrate-devops-artificial-intelligence-accelerate, 2017.

[GB14] Nikolay Grozev and Rajkumar Buyya. Multi-cloud provisioning and load distribution for three-tier applications. TAAS, 9:13:1–13:21, 2014.

[Gei19] Manuel Geier. Enabling mobility and message delivery guarantees in distributed MQTT networks. Master’s thesis, TU Wien, 2019.

[GHBRK12] Anshul Gandhi, Mor Harchol-Balter, Ram Raghunathan, and Michael A Kozuch. Autoscale: Dynamic, robust capacity management for multi-tier data centers. ACM Transactions on Computer Systems (TOCS), 30(4):14, 2012.

[GKK+19] Marjan Gusev, Bojana Koteska, Magdalena Kostoska, Boro Jakimovski, Schahram Dustdar, Ognjen Scekic, Thomas Rausch, Stefan Nastic, Sasko Ristov, and Thomas Fahringer. A deviceless edge computing approach for streaming IoT applications. IEEE Internet Computing, 23(1):37–45, 2019.

[GLME+15] Pedro Garcia Lopez, Alberto Montresor, Dick Epema, Anwitaman Datta, Teruo Higashino, Adriana Iamnitchi, Marinho Barcellos, Pascal Felber, and Etienne Riviere. Edge-centric computing: Vision and challenges. SIGCOMM Comput. Commun. Rev., 45(5):37–42, September 2015.

[GND17] Alex Glikson, Stefan Nastic, and Schahram Dustdar. Deviceless edge computing: Extending serverless computing to the edge of the network. In Proceedings of the 10th ACM International Systems and Storage Conference, SYSTOR ’17, 2017.

[GOM18] David Guyon, Anne-Cécile Orgerie, and Christine Morin. An experimental analysis of PaaS users parameters on applications energy consumption. In IC2E 2018-IEEE International Conference on Cloud Engineering, pages 1–7, 2018.

[GSG+16] Ionel Gog, Malte Schwarzkopf, Adam Gleave, Robert N. M. Watson, and Steven Hand. Firmament: Fast, centralized cluster scheduling at scale. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 99–115, Savannah, GA, 2016. USENIX Association.

[GSGKK15] J Gascon-Samson, F P Garcia, B Kemme, and J Kienzle. Dynamoth: A scalable pub/sub middleware for latency-constrained applications in the cloud. In 2015 IEEE 35th International Conference on Distributed Computing Systems, pages 486–496, jun 2015.

[GSKK17] Julien Gascon-Samson, Jörg Kienzle, and Bettina Kemme. Multipub: Latency and cost-aware global-scale cloud publish/subscribe. In 37th International Conference on Distributed Computing Systems, ICDCS’17, pages 2075–2082. IEEE, 2017.

[GVDGB17] Harshit Gupta, Amir Vahid Dastjerdi, Soumya K Ghosh, and Rajkumar Buyya. iFogSim: A toolkit for modeling and simulation of resource management techniques in the internet of things, edge and fog computing environments. Software: Practice and Experience, 47(9):1275–1296, 2017.

[GŽB+14] João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM Computing Surveys, 46(4):1–37, 2014.

[Had17] D Hadka. Platypus: A free and open source Python library for multi-objective optimization. Open source project. https://github.com/Project-Platypus/Platypus, 2017.

[HAHZ15] Karim Habak, Mostafa Ammar, Khaled A Harras, and Ellen Zegura. Femto clouds: Leveraging mobile devices to provide cloud service at the edge. In Cloud Computing (CLOUD), 2015 IEEE 8th International Conference on, pages 9–16. IEEE, 2015.

[HAW17] Ilija Hadžić, Yoshihisa Abe, and Hans C Woithe. Edge computing in the ePC: A reality check. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing, SEC ’17, pages 1–10, 2017.

[HBS+15] Kai Hwang, Xiaoying Bai, Yue Shi, Muyang Li, Wen-Guang Chen, and Yongwei Wu. Cloud performance modeling with benchmark evaluation of elastic scaling strategies. IEEE Transactions on Parallel and Distributed Systems, 27(1):130–143, 2015.

[HCH+14] Kiryong Ha, Zhuo Chen, Wenlu Hu, Wolfgang Richter, Padmanabhan Pillai, and Mahadev Satyanarayanan. Towards wearable cognitive assistance. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’14, pages 68–81. ACM, 2014.

[HDB17] Jeremy Hermann and Mike Del Balso. Meet Michelangelo: Uber’s machine learning platform, 2017.

[HFG+18] Joseph M Hellerstein, Jose Faleiro, Joseph E Gonzalez, Johann Schleier-Smith, Vikram Sreekanti, Alexey Tumanov, and Chenggang Wu. Serverless computing: One step forward, two steps back. arXiv preprint arXiv:1812.03651, 2018.

[HJO+10] Ali R Hurson, Evens Jean, Machigar Ongtang, Xing Gao, Yu Jiao, and Thomas E Potok. Recent advances in mobile agent-oriented applications. Mobile Intelligence: Mobile Computing and Computational Intelligence, pages 106–139, 2010.

[HKM+17] Daniel Happ, Niels Karowski, Thomas Menzel, Vlado Handziski, and Adam Wolisz. Meeting IoT platform requirements with open pub/sub solutions. Annals of Telecommunications, 72(1-2):41–52, February 2017.

[HKW+18] Ting He, Hana Khamfroush, Shiqiang Wang, Tom La Porta, and Sebastian Stein. It’s hard to share: Joint service placement and request scheduling in edge clouds with sharable and non-sharable resources. In 2018 IEEE 38th International Conference on Distributed Computing Systems, ICDCS’18, pages 365–375. IEEE, 2018.

[HLR+08] Thomas R Henderson, Mathieu Lacage, George F Riley, Craig Dowell, and Joseph Kopena. Network simulations with the ns-3 simulator. SIGCOMM demonstration, 14(14):527, 2008.

[HMD15] Song Han, Huizi Mao, and William J. Dally. Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. CoRR, abs/1510.00149, 2015.

[HMR+19] Waldemar Hummer, Vinod Muthusamy, Thomas Rausch, Parijat Dube, and Kaoutar El Maghraoui. ModelOps: Cloud-based lifecycle management for reliable and trusted AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019.

[HR19] Adam Hall and Umakishore Ramachandran. An execution model for serverless functions at the edge. In Proceedings of the International Conference on Internet of Things Design and Implementation, IoTDI’19, pages 225–236. Association for Computing Machinery, 2019.

[HRM+18] Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604, 2018.

[HZC+17a] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861, 2017.

[HZC+17b] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.

[HZdLZ18] Yang Hu, Huan Zhou, Cees de Laat, and Zhiming Zhao. ECSched: Efficient container scheduling on heterogeneous clusters. In European Conference on Parallel Processing, pages 365–377. Springer, 2018.

[HZRS16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[IBM18] IBM Corporation. IBM Watson machine learning. Online. https://developer.ibm.com/clouddataservices/docs/ibm-watson-machine-learning/, 2018.

[IBW+19] Haris Isakovic, Vanja Bisanovic, Bernhard Wally, Thomas Rausch, Denise Ratasich, Schahram Dustdar, Gerti Kappel, and Radu Grosu. Sensyml: Simulation environment for large-scale IoT applications. In IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society, volume 1, pages 3024–3030. IEEE, 2019.

[IHM+16] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360, 2016.

[IMS18] Vatche Ishakian, Vinod Muthusamy, and Aleksander Slominski. Serving deep learning models in a serverless platform. In 2018 IEEE International Conference on Cloud Engineering, IC2E ’18, pages 257–262, 2018.

[inta] AWS inter-region latency monitoring. https://www.cloudping.co/.

[intb] InterCor: Interoperable corridors deploying cooperative intelligent transport systems, Connecting Europe Facility (CEF) Programme (2014-2020). https://intercor-project.eu/.

[Int18] Intel. Intel unveils the Intel Neural Compute Stick 2 at Intel AI DevCon Beijing for building smarter AI edge devices. Online. https://newsroom.intel.com/news/intel-unveils-intel-neural-compute-stick-2/, 2018.

[IRH+18] Haris Isakovic, Denise Ratasich, Christian Hirsch, Michael Platzer, Bernhard Wally, Thomas Rausch, Dejan Nickovic, Willibald Krenn, Gerti Kappel, Schahram Dustdar, et al. CPS/IoT Ecosystem: A platform for research and education. In Cyber Physical Systems. Model-Based Design, pages 206–213. Springer, 2018.

[JBP+18] Steven J Johnston, Philip J Basford, Colin S Perkins, Herry Herry, Fung Po Tso, Dimitrios Pezaros, Robert D Mullins, Eiko Yoneki, Simon J Cox, and Jeremy Singer. Commodity single board computer clusters and their applications. Future Generation Computer Systems, 89:201–212, 2018.

[JCL+10] Hans-Arno Jacobsen, Alex Cheung, Guoli Li, Balasubramaneyam Maniymaran, Vinod Muthusamy, and Reza Sherafat Kazemzadeh. The PADRES publish/subscribe system. In Principles and Applications of Distributed Event-Based Systems, pages 164–205. IGI Global, 2010.

[JGJ+18] Devki Nandan Jha, Saurabh Garg, Prem Prakash Jayaraman, Rajkumar Buyya, Zheng Li, and Rajiv Ranjan. A holistic evaluation of docker containers for interfering microservices. In 2018 IEEE International Conference on Services Computing (SCC), pages 33–40. IEEE, 2018.

[JSSS+19] Eric Jonas, Johann Schleier-Smith, Vikram Sreekanti, Chia-Che Tsai, Anurag Khandelwal, Qifan Pu, Vaishaal Shankar, Joao Carreira, Karl Krauth, Neeraja Yadwadkar, et al. Cloud programming simplified: A berkeley view on serverless computing. arXiv preprint arXiv:1902.03383, 2019.

[KG15] S. P. T. Krishnan and Jose L. Ugia Gonzalez. Google Cloud Pub/Sub, pages 277–292. Apress, Berkeley, CA, 2015.

[KGL+20] Miryeong Kwon, Donghyun Gouk, Changrim Lee, Byounggeun Kim, Jooyoung Hwang, and Myoungsoo Jung. DC-Store: Eliminating noisy neighbor containers using deterministic I/O performance and resource isolation. In 18th USENIX Conference on File and Storage Technologies, FAST’20, pages 183–191, 2020.

[Kha15] Minhaj Ahmad Khan. A survey of computation offloading strategies for performance improvement of applications running on mobile devices. Journal of Network and Computer Applications, 56:28–40, 2015.

[KKH+09] Dara Kusic, Jeffrey O. Kephart, James E. Hanson, Nagarajan Kandasamy, and Guofei Jiang. Power and performance management of virtualized computing environments via lookahead control. Cluster Computing, 12(1):1–15, March 2009.

[KKY+10] Minkyong Kim, Kyriakos Karenos, Fan Ye, Johnathan Reason, Hui Lei, and Konstantin Shagin. Efficacy of techniques for responsiveness in a wide-area publish/subscribe system. In Proceedings of the 11th International Middleware Conference Industrial Track, Middleware Industrial Track ’10, pages 40–45. ACM, 2010.

[KRM06] Diwakar Krishnamurthy, Jerome A Rolia, and Shikharesh Majumdar. A synthetic workload generation technique for stress testing session-based systems. IEEE Transactions on Software Engineering, 32(11):868, 2006.

[Kub19] Kubernetes Community. Kubernetes scheduler, 2019. Kubernetes Documentation. Accessed 2019-01-04. https://kubernetes.io/docs/concepts/scheduling/kube-scheduler/.

[LBMAL14] Tania Lorido-Botran, Jose Miguel-Alonso, and Jose A Lozano. A review of auto-scaling techniques for elastic applications in cloud environments. Journal of Grid Computing, 12(4):559–592, 2014.

[LCC+15] J E Luzuriaga, J C Cano, C Calafate, P Manzoni, M Perez, and P Boronat. Handling mobility in IoT applications using the MQTT protocol. In 2015 Internet Technologies and Applications (ITA), pages 245–250, Sep 2015.

[LES+14] Grace Lewis, Sebastian Echeverria, Soumya Simanta, Ben Bradshaw, and James Root. Tactical cloudlets: Moving cloud computing to the edge. In 2014 IEEE Military Communications Conference, pages 1440–1446. IEEE, October 2014.

[LH18] Qiang Liu and Tao Han. DARE: Dynamic adaptive mobile augmented reality with edge computing. In 2018 IEEE 26th International Conference on Network Protocols (ICNP), pages 1–11. IEEE, 2018.

[LMFJ+04] K. Lorincz, D. J. Malan, T. R. F. Fulford-Jones, A. Nawoj, A. Clavel, V. Shnayder, G. Mainland, M. Welsh, and S. Moulton. Sensor networks for emergency response: challenges and opportunities. IEEE Pervasive Computing, 3(4):16–23, Oct 2004.

[LRD19a] Clemens Lachner, Thomas Rausch, and Schahram Dustdar. Context-aware enforcement of privacy policies in edge computing. In 2019 IEEE International Congress on Big Data (BigDataCongress), pages 1–6. IEEE, 2019.

[LRD19b] Clemens Lachner, Thomas Rausch, and Schahram Dustdar. ORIOT: A source location privacy system for resource constrained IoT devices. In 2019 IEEE Global Communications Conference (GLOBECOM), pages 1–6. IEEE, 2019.

[LRD21] Clemens Lachner, Thomas Rausch, and Schahram Dustdar. A privacy preserving system for AI-assisted video analytics. In 2021 IEEE 5th International Conference on Fog and Edge Computing (ICFEC), 2021.

[LWB16] P. Liu, D. Willis, and S. Banerjee. ParaDrop: Enabling lightweight multi- tenancy at the network’s extreme edge. In 2016 IEEE/ACM Symposium on Edge Computing (SEC), pages 1–13, 2016.

[Mar21] Cynthia Marcelino. Locality-aware data management for serverless edge computing. Master’s thesis, TU Wien, 2021.

[Mat08] Norm Matloff. Introduction to discrete-event simulation and the SimPy language. Dept. of Computer Science, University of California at Davis, 2008.

[MB17] Pavel Mach and Zdenek Becvar. Mobile edge computing: A survey on architecture and computation offloading. IEEE Communications Surveys & Tutorials, 19(3):1628–1656, 2017.

[MBLN+19] Johann Marquez-Barja, Bart Lannoo, Dries Naudts, Bart Braem, Carlos Donato, Vasilis Maglogiannis, Siegfried Mercelis, Rafael Berkvens, Peter Hellinckx, Maarten Weyn, Ingrid Moerman, and Steven Latre. Smart highway: ITS-G5 and C-V2X based testbed for vehicular communications in real environments enhanced by edge/cloud technologies. In Proc. 28th European Conference on Networks and Communications (EuCNC), 2019.

[MG18] Gabriel Axel Montes and Ben Goertzel. Distributed, decentralized, and democratized artificial intelligence. Technological Forecasting and Social Change, 2018.

[MGG+17] Ruben Mayer, Leon Graser, Harshit Gupta, Enrique Saurez, and Umakishore Ramachandran. EmuFog: Extensible and scalable emulation of large-scale fog computing infrastructures. In 2017 IEEE Fog World Congress, FWC’17, pages 1–6. IEEE, 2017.

[MJ10] Vinod Muthusamy and Hans-Arno Jacobsen. Mobility in publish/subscribe systems. Mobile Intelligence, pages 62–86, 2010.

[MJS+17] Ahsan Morshed, Prem Prakash Jayaraman, Timos Sellis, Dimitrios Georgakopoulos, Massimo Villari, and Rajiv Ranjan. Deep osmosis: Holistic distributed deep learning in osmotic computing. IEEE Cloud Computing, 4(6):22–32, 2017.

[MK20] Andras Markus and Attila Kertesz. A survey and taxonomy of simulation environments modelling fog computing. Simulation Modelling Practice and Theory, 101:102042, 2020.

[MKN+17] Simone Mangiante, Guenter Klas, Amit Navon, Zhuang GuanHua, Ju Ran, and Marco Dias Silva. VR is on the edge: How to deliver 360 videos in mobile networks. In Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, pages 30–35, 2017.

[MLDD17] Hui Miao, Ang Li, Larry S Davis, and Amol Deshpande. ModelHub: Deep learning lifecycle management. In 2017 IEEE 33rd International Conference on Data Engineering, ICDE’17, pages 1393–1394. IEEE, 2017.

[MLMB01] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An approach to universal topology generation. In MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 346–353. IEEE, 2001.

[MM08] Bratislav Milic and Miroslaw Malek. NPART - Node Placement Algorithm for Realistic Topologies in Wireless Multihop Network Simulation. Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Informatik, 2008.

[MMR+13] Anil Madhavapeddy, Richard Mortier, Charalampos Rotsos, David Scott, Balraj Singh, Thomas Gazagnaire, Steven Smith, Steven Hand, and Jon Crowcroft. Unikernels: Library operating systems for the cloud. SIGPLAN Not., 48(4):461–472, 2013.

[MMRyA16] H. Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas. Federated learning of deep networks using model averaging. CoRR, abs/1602.05629, 2016.

[Mor97] Robert Morris. TCP behavior with many flows. In Proceedings 1997 International Conference on Network Protocols, pages 205–211. IEEE, 1997.

[Mor17] R. Morabito. Virtualization on internet of things edge devices with container technologies: A performance evaluation. IEEE Access, 5:8835–8850, 2017.

[MPD19] Salma Abdel Magid, Francesco Petrini, and Behnam Dezfouli. Image classification on IoT edge devices: profiling and modeling. Cluster Computing, pages 1–19, 2019.

[MS18] Ryan Maguire and Rob Schmit. Hybrid machine learning: From the cloud to the edge. Talk. https://www.youtube.com/watch?v=M-29naAVmI4, 2018. CloudNext ’18.

[MSS12] V. Mathew, R. K. Sitaraman, and P. Shenoy. Energy-aware load balancing in content delivery networks. In 2012 Proceedings IEEE INFOCOM, pages 954–962, March 2012.

[MSV18] Janakiram MSV. How Kubernetes is transforming into a universal scheduler. The New Stack. Online. Posted 2018-09-07. Accessed 2019-03-14. https://thenewstack.io/how-kubernetes-is-transforming-into-a-universal-scheduler, 2018.

[mxn] MXNet: a scalable deep learning framework. Open source project. https://mxnet.apache.org/.

[NND+17] Matteo Nardelli, Stefan Nastic, Schahram Dustdar, Massimo Villari, and Rajiv Ranjan. Osmotic flow: Osmotic computing + IoT workflow. IEEE Cloud Computing, 4(2):68–75, 2017.

[NRS+17] S. Nastic, T. Rausch, O. Scekic, S. Dustdar, M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Ristov, and R. Prodan. A serverless real-time data analytics platform for edge computing. IEEE Internet Computing, 21(4):64–71, 2017.

[NTD16] S. Nastic, H. Truong, and S. Dustdar. A middleware infrastructure for utility-based provisioning of IoT cloud systems. In 2016 IEEE/ACM Symposium on Edge Computing (SEC), volume 00, pages 28–40, 2016.

[Nun05] Flávio Nunes. Aveiro, Portugal: Making a digital city. Journal of Urban Technology, 12(1):49–70, 2005.

[NVI] NVIDIA. NVIDIA Jetson - the AI platform for autonomous machines. Online. https://developer.nvidia.com/embedded/develop/hardware.

[opea] Apache OpenWhisk Composer. Open source project. https://github.com/apache/incubator-openwhisk-composer.

[opeb] OpenCellid. https://opencellid.org.

[opec] OpenSignal coverage maps. https://www.opensignal.com/networks.

[OR11] Melih Onus and Andréa W Richa. Minimum maximum-degree publish–subscribe overlay network design. IEEE/ACM Transactions on Networking, 19(5):1331–1343, 2011.

[OYZ+18] Edward Oakes, Leon Yang, Dennis Zhou, Kevin Houck, Tyler Harter, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau. SOCK: Rapid task provisioning with serverless-optimized containers. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 57–70, Boston, MA, July 2018. USENIX Association.

[PAA+16] Larry L. Peterson, Ali Al-Shabibi, Tom Anshutz, Scott Baker, Andy C. Bavier, Saurav Das, Jonathan Hart, Guru Parulkar, and William Snow. Central office re-architected as a data center. IEEE Communications Magazine, 54(10):96–101, 2016.

[Pal21] Jacob Palecek. Improving serverless edge computing for network bound workloads. Master’s thesis, TU Wien, 2021.

[Pan18] Kasey Panetta. 5 trends emerge in the Gartner hype cycle for emerging technologies, 2018. Gartner, 2018.

[PB02] Peter R Pietzuch and Jean M Bacon. Hermes: A distributed event-based middleware architecture. In Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops, pages 611–618. IEEE, 2002.

[PDCB15] Sareh Fotuhi Piraghaj, Amir Vahid Dastjerdi, Rodrigo N. Calheiros, and Rajkumar Buyya. A framework and algorithm for energy efficient container consolidation in cloud data centers. In 2015 IEEE International Conference on Data Science and Data Intensive Systems (DSDIS), pages 368–375, 2015.

[PDT18] G. Premsankar, M. Di Francesco, and T. Taleb. Edge computing for the internet of things: A case study. IEEE Internet of Things Journal, 5(2):1275–1284, 2018.

[PG07] Cassio Pennachin and Ben Goertzel. Contemporary Approaches to Artificial General Intelligence, pages 1–30. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

[PSK+20] Reinhold Preiner, Johanna Schmidt, Katharina Krösl, Tobias Schreck, and Gabriel Mistelbauer. Augmenting node-link diagrams with topographic attribute maps. In Computer Graphics Forum, volume 39, pages 369–381. Wiley Online Library, 2020.

[PVCM20] David Perez Abreu, Karima Velasquez, Marilia Curado, and Edmundo Monteiro. A comparative analysis of simulators for the cloud to fog continuum. Simulation Modelling Practice and Theory, 101:102029, 2020. Modeling and Simulation of Fog Computing.

[PY+10] Sinno Jialin Pan, Qiang Yang, et al. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.

[RAD18] Thomas Rausch, Cosmin Avasalcai, and Schahram Dustdar. Portable energy-aware cluster-based edge computers. In 2018 IEEE/ACM Symposium on Edge Computing, SEC’18, pages 260–272, 2018.

[Rai21] Philipp Raith. Container scheduling on heterogeneous clusters using machine learning-based workload characterization. Master’s thesis, TU Wien, 2021.

[Ras20] Alexander Rashed. Optimized container scheduling for serverless edge computing. Master’s thesis, TU Wien, 2020.

[Rau17] Thomas Rausch. Message-oriented middleware for edge computing applications. In Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference, pages 3–4, 2017.

[Rau20a] Thomas Rausch. Ether: edge topology synthesizer. Source code repository. https://github.com/edgerun/ether, 2020.

[Rau20b] Thomas Rausch. Galileo: a framework for distributed load testing experiments. Source code repository. https://github.com/edgerun/galileo, 2020.

[RD19] Thomas Rausch and Schahram Dustdar. Edge intelligence: The convergence of humans, things, and AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019.

[RDR18] Thomas Rausch, Schahram Dustdar, and Rajiv Ranjan. Osmotic message-oriented middleware for the internet of things. IEEE Cloud Computing, 5(2):17–25, 2018.

[RET14] Robert Ricci, Eric Eide, and CloudLab Team. Introducing CloudLab: Scientific infrastructure for advancing cloud architectures and applications. ;login: The Magazine of USENIX & SAGE, 39(6):36–38, 2014.

[RHLS17] Thomas Rausch, Waldemar Hummer, Philipp Leitner, and Stefan Schulte. An empirical analysis of build failures in the continuous integration workflows of Java-based open-source software. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 345–355. IEEE, 2017.

[RHM+19] Thomas Rausch, Waldemar Hummer, Vinod Muthusamy, Alexander Rashed, and Schahram Dustdar. Towards a serverless platform for edge AI. In 2nd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’19, 2019.

[RHM20a] Thomas Rausch, Waldemar Hummer, and Vinod Muthusamy. PipeSim: Trace-driven simulation of large-scale AI operations platforms. arXiv preprint arXiv:2006.12587, 2020.

[RHM20b] Thomas Rausch, Waldemar Hummer, and Vinod Muthusamy. An experimentation and analytics framework for large-scale AI operations platforms. In 2020 USENIX Conference on Operational Machine Learning, OpML’20, 2020.

[RHS+21] Thomas Rausch, Waldemar Hummer, Christian Stippel, Silvio Vasiljevic, Carmine Elvezio, Schahram Dustdar, and Katharina Krösl. Towards a platform for smart city-scale cognitive assistance applications. In 2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 2021.

[RLF+20] Thomas Rausch, Clemens Lachner, Pantelis A. Frangoudis, Philipp Raith, and Schahram Dustdar. Synthesizing plausible infrastructure configurations for evaluating edge computing systems. In 3rd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’20, 2020.

[RND18] Thomas Rausch, Stefan Nastic, and Schahram Dustdar. EMMA: Distributed QoS-aware MQTT middleware for edge computing applications. In 2018 IEEE International Conference on Cloud Engineering, IC2E’18, pages 191–197. IEEE, 2018.

[RR20] Alexander Rashed and Thomas Rausch. Execution traces of an MNIST workflow on a serverless edge testbed, 2020.

[RRD21] Thomas Rausch, Alexander Rashed, and Schahram Dustdar. Optimized container scheduling for data-intensive serverless edge computing. Future Generation Computer Systems, 114:259–271, 2021.

[RRF+20] Gianluca Rizzo, Sasko Ristov, Thomas Fahringer, Marjan Gusev, Matija Dzanko, Ivana Bilic, Christian Esposito, and Torsten Braun. Emergency networks for post-disaster scenarios. Guide to Disaster-Resilient Communication Networks, pages 271–298, 2020.

[RRPD19] Thomas Rausch, Philipp Raith, Padmanabhan Pillai, and Schahram Dustdar. A system for operating energy-aware cloudlets. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC’19, pages 307–309, 2019.

[RRR20] Thomas Rausch, Philipp Raith, and Alexander Rashed. faas-sim: a framework for trace-driven simulation of serverless function-as-a-service platforms. Source code repository. https://github.com/edgerun/faas-sim, 2020.

[RTG+12] Charles Reiss, Alexey Tumanov, Gregory R Ganger, Randy H Katz, and Michael A Kozuch. Heterogeneity and dynamicity of clouds at scale: Google trace analysis. In Proceedings of the Third ACM Symposium on Cloud Computing, page 7. ACM, 2012.

[Sat04] M. Satyanarayanan. From the editor in chief: Augmenting cognition. IEEE Pervasive Computing, 3(2):4–5, 2004.

[Sat17] Mahadev Satyanarayanan. The emergence of edge computing. Computer, 50(1):30–39, January 2017.

[SBCD09] M Satyanarayanan, P Bahl, R Caceres, and N Davies. The case for VM-based cloudlets in mobile computing. IEEE Pervasive Computing, 8(4):14–23, 2009.

[Sca14] ScalAgent. JoramMQ, a distributed MQTT broker for the internet of things, 2014.

[Sch19] David Schneiderbauer. Predictive analytics for smart homes using serverless edge computing. Master’s thesis, TU Wien, 2019.

[SCY18] Yuvraj Sahni, Jiannong Cao, and Lei Yang. Data-aware task allocation for achieving low latency in collaborative edge computing. IEEE Internet of Things Journal, 6(2):3512–3524, 2018.

[SD16] W. Shi and S. Dustdar. The promise of edge computing. Computer, 49(5):78–81, May 2016.

[SGL19] Mahadev Satyanarayanan, Wei Gao, and Brandon Lucia. The computing landscape of the 21st century. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, pages 45–50, 2019.

[SHM+16] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484, 2016.

[SHS+18] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, 2018.

[SKAEMW13] Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, and John Wilkes. Omega: flexible, scalable schedulers for large compute clusters. In SIGOPS European Conference on Computer Systems (EuroSys), pages 351–364, Prague, Czech Republic, 2013.

[SLF+20] Vikram Sreekanti, Chenggang Wu, Xiayue Charles Lin, Jose M Faleiro, Joseph E Gonzalez, Joseph M Hellerstein, and Alexey Tumanov. Cloudburst: Stateful functions-as-a-service. arXiv preprint arXiv:2001.04592, 2020.

[SNS+17] Olena Skarlat, Matteo Nardelli, Stefan Schulte, Michael Borkowski, and Philipp Leitner. Optimized IoT service placement in the fog. Service Oriented Computing and Applications, 11(4):427–443, 2017.

[SNSD17] Olena Skarlat, Matteo Nardelli, Stefan Schulte, and Schahram Dustdar. Towards QoS-aware fog service placement. In 2017 IEEE 1st International Conference on Fog and Edge Computing, ICFEC’17, pages 89–96. IEEE, 2017.

[SOE17] Cagatay Sonmez, Atay Ozgovde, and Cem Ersoy. EdgeCloudSim: An environment for performance evaluation of edge computing systems. In 2017 Second International Conference on Fog and Mobile Edge Computing, FMEC’17, pages 39–44. IEEE, 2017.

[SSBL16] Olena Skarlat, Stefan Schulte, Michael Borkowski, and Philipp Leitner. Resource provisioning for IoT services in the fog. In Service-Oriented Computing and Applications (SOCA), 2016 IEEE 9th International Conference on, pages 32–39. IEEE, 2016.

[SSS+17] Pablo Sotres, Juan Ramón Santana, Luis Sánchez, Jorge Lanza, and Luis Muñoz. Practical lessons from the deployment and management of a smart city internet-of-things infrastructure: The SmartSantander testbed case. IEEE Access, 5:14309–14322, 2017.

[SSX+15] Mahadev Satyanarayanan, Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, Wenlu Hu, and Brandon Amos. Edge analytics in the internet of things. IEEE Pervasive Computing, 14(2):24–31, 2015.

[Sta19] Robert Stackowiak. Azure IoT Hub, pages 73–85. Apress, Berkeley, CA, 2019.

[STM15] G Siegemund, V Turau, and K Maâmra. A self-stabilizing publish/subscribe middleware for wireless sensor networks. In 2015 International Conference and Workshops on Networked Systems (NetSys), pages 1–8, March 2015.

[SVDI16] J. M. Schleicher, M. Vögler, S. Dustdar, and C. Inzinger. Enabling a smart city application ecosystem: Requirements and architectural aspects. IEEE Internet Computing, 20(2):58–65, 2016.

[SVK+17] Evan R Sparks, Shivaram Venkataraman, Tomer Kaftan, Michael J Franklin, and Benjamin Recht. KeystoneML: Optimizing pipelines for large-scale advanced analytics. In Data Engineering (ICDE), 2017 IEEE 33rd International Conference on, pages 535–546. IEEE, 2017.

[SXP+13] Pieter Simoens, Yu Xiao, Padmanabhan Pillai, Zhuo Chen, Kiryong Ha, and Mahadev Satyanarayanan. Scalable crowd-sourcing of video from mobile devices. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’13, pages 139–152, New York, NY, USA, 2013. Association for Computing Machinery.

[SYY+17] Y. Song, S. S. Yau, R. Yu, X. Zhang, and G. Xue. An approach to QoS-based task distribution in edge computing networks for IoT applications. In 2017 IEEE International Conference on Edge Computing (EDGE), pages 32–39, June 2017.

[Tel19] Telefonica. Telefonica open access and edge computing. White paper, February 2019.

[TGJ+02] Hongsuda Tangmunarunkit, Ramesh Govindan, Sugih Jamin, Scott Shenker, and Walter Willinger. Network topology generators: Degree-based vs. structural. ACM SIGCOMM Computer Communication Review, 32(4):147–159, 2002.

[THLL17] Haisheng Tan, Zhenhua Han, Xiang-Yang Li, and Francis CM Lau. Online job dispatching and scheduling in edge-clouds. In IEEE INFOCOM 2017 - IEEE Conference on Computer Communications, pages 1–9. IEEE, 2017.

[TLG16] Liang Tong, Yong Li, and Wei Gao. A hierarchical edge cloud architecture for mobile computing. In IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, pages 1–9. IEEE, 2016.

[TS17] V. Turau and G. Siegemund. Scalable routing for topic-based publish/subscribe systems under fluctuations. In 2017 IEEE 37th International Conference on Distributed Computing Systems, ICDCS’17, pages 1608–1617, June 2017.

[UWH+15] Rahul Urgaonkar, Shiqiang Wang, Ting He, Murtaza Zafer, Kevin Chan, and Kin K Leung. Dynamic service migration and workload scheduling in edge-clouds. Performance Evaluation, 91:205–228, 2015.

[VFD+16] Massimo Villari, Maria Fazio, Schahram Dustdar, Omer Rana, and Rajiv Ranjan. Osmotic computing: A new paradigm for edge/cloud integration. IEEE Cloud Computing, 3(6):76–83, 2016.

[VGGK18] Srikumar Venugopal, Michele Gazzetti, Yiannis Gkoufas, and Kostas Katrinis. Shadow puppets: Cloud-level accurate AI inference at the speed and economy of edge. In USENIX Workshop on Hot Topics in Edge Computing, HotEdge’18, 2018.

[VN] Steven J. Vaughan-Nichols. Canonical’s cloud-in-a-box: The Ubuntu Orange Box. ZDNet. Online. https://zdnet.com/article/canonicals-cloud-in-a-box-the-ubuntu-orange-box.

[VPK+15] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune, and John Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM, 2015.

[VSDTD12] Tim Verbelen, Pieter Simoens, Filip De Turck, and Bart Dhoedt. Cloudlets: Bringing the cloud to the mobile user. In Proceedings of the Third ACM Workshop on Mobile Cloud Computing and Services, pages 29–36. ACM, 2012.

[VSI+15] M. Vögler, J. Schleicher, C. Inzinger, S. Nastic, S. Sehic, and S. Dustdar. LEONORE – large-scale provisioning of resource-constrained IoT deployments. In 2015 IEEE Symposium on Service-Oriented System Engineering, pages 78–87, 2015.

[Wax88] B. M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, Dec 1988.

[Wei16] Joe Weinman. Hybrid cloud economics. IEEE Cloud Computing, 3(1):18–22, 2016.

[WFC+19] Junjue Wang, Ziqiang Feng, Zhuo Chen, Shilpa Anna George, Mihir Bala, Padmanabhan Pillai, Shao-Wen Yang, and Mahadev Satyanarayanan. Edge-based live video analytics for drones. IEEE Internet Computing, 23(4):27–34, 2019.

[WFG+19] Junjue Wang, Ziqiang Feng, Shilpa George, Roger Iyengar, Padmanabhan Pillai, and Mahadev Satyanarayanan. Towards scalable edge-native applications. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, pages 152–165. ACM, 2019.

[WSS05] Bernard Wong, Aleksandrs Slivkins, and Emin Gün Sirer. Meridian: A lightweight network location service without virtual coordinates. ACM SIGCOMM Computer Communication Review, 35(4):85–96, 2005.

[WXZL11] Jun Wu, Xin Xu, Pengcheng Zhang, and Chunming Liu. A novel multi-agent reinforcement learning approach for job scheduling in grid computing. Future Generation Computer Systems, 27(5):430–439, 2011.

[WZC+18] Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach. arXiv e-prints, page arXiv:1801.10578, 2018.

[XE16] Xuan-Qui Pham and Eui-Nam Huh. Towards task scheduling in a cloud- fog computing system. In 2016 18th Asia-Pacific Network Operations and Management Symposium (APNOMS), pages 1–4, Oct 2016.

[XLD+19] Xiaolong Xu, Daoming Li, Zhonghui Dai, Shancang Li, and Xuening Chen. A heuristic offloading method for deep learning edge services in 5G networks. IEEE Access, 7:67734–67744, 2019.

[XSXH18] Ying Xiong, Yulin Sun, Li Xing, and Ying Huang. Extend cloud to edge with KubeEdge. In 2018 IEEE/ACM Symposium on Edge Computing (SEC), pages 373–377. IEEE, 2018.

[YHZ+17] Shanhe Yi, Zijiang Hao, Qingyang Zhang, Quan Zhang, Weisong Shi, and Qun Li. Lavea: Latency-aware video analytics on edge computing platform. In Proceedings of the Second ACM/IEEE Symposium on Edge Computing, pages 1–13, 2017.

[YMLL17] Jihong Yan, Yue Meng, Lei Lu, and Lin Li. Industrial big data in an industry 4.0 environment: Challenges, schemes, and applications for predictive maintenance. IEEE Access, 5:23484–23491, 2017.

[YPI+18] Ashkan Yousefpour, Ashish Patil, Genya Ishigaki, Inwoong Kim, Xi Wang, Hakki C Cankaya, Qiong Zhang, Weisheng Xie, and Jason P Jue. QoS-aware dynamic fog service provisioning. arXiv preprint arXiv:1802.00800, 2018.

[ZCL+19] Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762, 2019.

[ZIH+13] J. Zhang, B. Iannucci, M. Hennessy, K. Gopal, S. Xiao, S. Kumar, D. Pfeffer, B. Aljedia, Y. Ren, M. Griss, S. Rosenberg, J. Cao, and A. Rowe. Sensor data as a service – a federated platform for mobile data-centric service development and sharing. In 2013 IEEE International Conference on Services Computing, pages 446–453, 2013.

[ZWL+19] Xingzhou Zhang, Yifan Wang, Sidi Lu, Liangkai Liu, Weisong Shi, et al. OpenEI: An open framework for edge intelligence. In 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), pages 1840–1851. IEEE, 2019.

[ZWT12] B. Zhou, X. Wang, and X. Tang. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2871–2878, 2012.

Curriculum Vitae

Thomas Rausch
[email protected]
https://thrau.at
Born 27.10.1986 in Vienna, Austria.
CV date: 15.02.2021

Education

Ph.D., Computer Science, TU Wien, ongoing
Supervised by Schahram Dustdar at the Distributed Systems Group. Topical focus: distributed systems, systems for AI, edge computing, Internet of Things. Thesis title: “A Distributed Compute Fabric for Edge Intelligence”.

M.Sc., Software Engineering & Internet Computing, TU Wien, 2016
Module focus: programming languages, software engineering, information systems, and distributed systems and networking. Thesis title: “Build Failure Prediction in Continuous Integration Workflows”. Graduated with distinction and a GPA of 1.1 (1 best, 5 worst).

B.Sc., Software & Information Engineering, TU Wien, 2014
Thesis title: “Efficient Querying of Versioned Tool Data in Data Integration Environments”.

Secondary, Höhere Graphische Bundes-Lehr- und Versuchsanstalt, 2006
Specialization: print and media technology, working with a diverse range of print, multimedia, and web technologies.

In 2015, following professional experience, awarded the title Ingenieur (Ing.).

Experience

Founder & Technical Lead, Cognitive XR Nov 2020 – ongoing Vienna, Austria Academic spin-off at the intersection of edge computing, augmented reality, and AI, building an any-scale platform for wearable cognitive augmentation applications.

University Assistant, TU Wien Oct 2016 – May 2021 (4 yrs 7 mos) Vienna, Austria University Assistant at the Distributed Systems Group of the Institute for Information Systems Engineering. Research on distributed systems, edge computing, AI systems, message-oriented middleware, container scheduling techniques. Cloud/Edge systems engineering in the CPS/IoT ecosystem project. Teaching the courses Distributed Systems, Distributed Systems Technologies and Distributed Systems Engineering. Member of the Scientific Staff Council.

Visiting Researcher, Carnegie Mellon University Jul 2019 – Sep 2019 (3 mos) Pittsburgh, PA, USA Working on edge computing systems with Professor Mahadev Satyanarayanan and his lab. Demo at SEC’19.

Graduate Research Intern, IBM Research Jun 2018 – Sep 2018 (4 mos) New York, NY, USA Graduate research intern working on IBM’s next-generation AI platform, with a focus on automating AI application operations workflows. Resulted in a publication at USENIX OpML’20.

Software Engineer, Christian Doppler Laboratory “Software Engineering Integration for Flexible Automation Systems” Jun 2013 – Jun 2016 (3 yrs 1 mo) Vienna, Austria Design and implementation of a data integration platform for systems engineering tools.

Teaching Assistant, TU Wien Apr 2013 – Mar 2016 (3 yrs) Vienna, Austria Teaching Assistant for the courses: Advanced Internet Computing, Software Testing, Software Engineering & Project Management. Held an introductory Git lecture each semester.

Full-Stack Web Developer, Freelance Oct 2010 – Mar 2015 (4 yrs 6 mos) Vienna, Austria Lead developer on a large multi-website CMS and content sharing platform.

Teaching

Courses at TU Wien

• 184.167 Distributed Systems UE, 3.0 ECTS

• 184.153 Distributed Systems Engineering VU, 3.0 ECTS

• 184.260 Distributed Systems Technologies VU, 6.0 ECTS

• 184.715 Project in Software Engineering & Internet Computing PR, 12.0 ECTS

• 194.058 Project in Computer Science 1 PR, 6.0 ECTS

• 194.059 Project in Computer Science 2 PR, 6.0 ECTS

• 184.194 Seminar in Distributed Systems SE, 3.0 ECTS

Supervised theses

I’ve served as assistant supervisor on the following theses:

• Jacob Palecek. Improving Serverless Edge Computing for Network Bound Workloads. Master’s Thesis, in progress (Winner of a 2020 Netidee grant).

• Cynthia Marcelino. Locality-Aware Data Management for Serverless Edge Computing. Master’s Thesis, in progress.

• Andreas Bruckner. Self-adaptive Distributed MQTT Middleware for Edge Computing Applications. Master’s Thesis, in progress.

• Dominik Schubert. Proactive Autoscaling for a Coding Assignment Submission System. Bachelor’s Thesis, in progress.

• Daniel Rohr. Automated Error Cause Detection for Programming Assignments Using Machine Learning. Bachelor’s Thesis, in progress.

• Philipp Raith. Container Scheduling on Heterogeneous Clusters using Machine Learning-based Workload Characterization. Master’s Thesis, TU Wien, 2021 (Winner of a 2020 Netidee grant).

• Silvio Vasiljevic. Energy-Efficient Load-Balancing in Portable Edge Clusters. Bachelor’s Thesis, TU Wien, 2020.

• Alexander Rashed. Optimized Container Scheduling for Serverless Edge Computing. Master’s Thesis, TU Wien, 2020.

• Valentin Bauer. Hangar: Decentralized Storage and Provisioning of System Environments. Bachelor’s Thesis, TU Wien, 2019.

• Ammar Voloder. Using In-Memory Key-Value Stores as a Transport Mechanism for RPC. Bachelor’s Thesis, TU Wien, 2019.

• David Schneiderbauer. Predictive Analytics for Smart Homes using Serverless Edge Computing. Master’s Thesis, TU Wien, 2019.

• Jacob Palecek. A Monitoring Infrastructure for Energy-Aware Cluster-Based Mobile Cloudlets. Bachelor’s Thesis, TU Wien, 2019.

• Manuel Geier. Enabling Mobility and Message Delivery Guarantees in Distributed MQTT Networks. Master’s Thesis, TU Wien, 2019.

• Philipp Raith. Performance Evaluation of Java IO Models. Bachelor’s Thesis, TU Wien, 2018.

Awards and honors

• Finalist at the 2020 TU Wien Best Teaching Awards in the category “Best Teacher of the Faculty of Informatics”.

• SEC’19 best demo award for “A system for operating energy-aware cloudlets”.

• Middleware’17 travel grant recipient.

Academic activities

Technical program committees

• 9th IEEE International Conference on Cloud Engineering (IC2E’21)

• 2nd IEEE International Conference on Joint Cloud Computing (JCC’21)

• 41st IEEE International Conference on Distributed Computing Systems (ICDCS’21)

Conference organization committees

• 6th ACM/IEEE Symposium on Edge Computing (SEC’21): Publicity Co-Chair

• 9th IEEE International Conference on Cloud Engineering (IC2E’21): Poster Chair

• 5th ACM/IEEE Symposium on Edge Computing (SEC’20): Publicity Co-Chair

Reviewer

I have been an official reviewer for:

• Springer Artificial Intelligence Review

• IEEE Transactions on Cloud Computing

• IEEE Computer

• Springer Computing

• Future Generation Computer Systems

• Springer Journal of Grid Computing

• Informatik-Spektrum

• IEEE Internet Computing

• ACM Transactions on Internet Technology

• IEEE Transactions on Network and Service Management

• IEEE Transactions on Services Computing

• IEEE Transactions on Vehicular Technology

I have also served as sub-reviewer for:

• IEEE International Conference on Distributed Computing Systems (ICDCS)

• International Conference on Software Engineering (ICSE)

• International World Wide Web Conference (TheWebConf)

• IEEE Software

• ACM Computing Surveys

• IEEE International Conference on Cloud Computing (CLOUD)

• USENIX Workshop on Hot Topics in Edge Computing (HotEdge)

• IEEE International Conference on Cloud Engineering (IC2E)

• IEEE International Conference on Service-Oriented System Engineering (SOSE)

• IEEE International Conference on Big Data Service and Applications (BigDataService)

• IEEE International Conference on Cloud Computing Technology and Science (CloudCom)

• IEEE Global Communications Conference (GLOBECOM)

• IEEE International Conference on Fog Computing (ICFC)

Publications

Journal articles

• Thomas Rausch, Alexander Rashed, and Schahram Dustdar. Optimized container scheduling for data-intensive serverless edge computing. Future Generation Computer Systems, 114:259–271, 2021

• Marjan Gusev, Bojana Koteska, Magdalena Kostoska, Boro Jakimovski, Schahram Dustdar, Ognjen Scekic, Thomas Rausch, Stefan Nastic, Sasko Ristov, and Thomas Fahringer. A deviceless edge computing approach for streaming IoT applications. IEEE Internet Computing, 23(1):37–45, 2019

• Thomas Rausch, Schahram Dustdar, and Rajiv Ranjan. Osmotic message-oriented middleware for the internet of things. IEEE Cloud Computing, 5(2):17–25, 2018

• S. Nastic, T. Rausch, O. Scekic, S. Dustdar, M. Gusev, B. Koteska, M. Kostoska, B. Jakimovski, S. Ristov, and R. Prodan. A serverless real-time data analytics platform for edge computing. IEEE Internet Computing, 21(4):64–71, 2017

Conference and workshop papers

• Thomas Rausch, Waldemar Hummer, Christian Stippel, Silvio Vasiljevic, Carmine Elvezio, Schahram Dustdar, and Katharina Krösl. Towards a platform for smart city-scale cognitive assistance applications. In 2021 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), 2021

• Thomas Rausch, Clemens Lachner, Pantelis A. Frangoudis, Philipp Raith, and Schahram Dustdar. Synthesizing plausible infrastructure configurations for evaluating edge computing systems. In 3rd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’20, 2020

• Thomas Rausch, Waldemar Hummer, and Vinod Muthusamy. An experimentation and analytics framework for large-scale AI operations platforms. In 2020 USENIX Conference on Operational Machine Learning, OpML’20, 2020

• Thomas Rausch, Philipp Raith, Padmanabhan Pillai, and Schahram Dustdar. A system for operating energy-aware cloudlets. In Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, SEC’19, pages 307–309, 2019

• Haris Isakovic, Vanja Bisanovic, Bernhard Wally, Thomas Rausch, Denise Ratasich, Schahram Dustdar, Gerti Kappel, and Radu Grosu. Sensyml: Simulation environment for large-scale IoT applications. In IECON 2019 - 45th Annual Conference of the IEEE Industrial Electronics Society, volume 1, pages 3024–3030. IEEE, 2019

• Clemens Lachner, Thomas Rausch, and Schahram Dustdar. ORIOT: A source location privacy system for resource constrained IoT devices. In 2019 IEEE Global Communications Conference (GLOBECOM), pages 1–6. IEEE, 2019

• Waldemar Hummer, Vinod Muthusamy, Thomas Rausch, Parijat Dube, and Kaoutar El Maghraoui. ModelOps: Cloud-based lifecycle management for reliable and trusted AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019

• Thomas Rausch, Waldemar Hummer, Vinod Muthusamy, Alexander Rashed, and Schahram Dustdar. Towards a serverless platform for edge AI. In 2nd USENIX Workshop on Hot Topics in Edge Computing, HotEdge’19, 2019

• Thomas Rausch and Schahram Dustdar. Edge intelligence: The convergence of humans, things, and AI. In 2019 IEEE International Conference on Cloud Engineering, IC2E’19, Jun 2019

• Clemens Lachner, Thomas Rausch, and Schahram Dustdar. Context-aware enforcement of privacy policies in edge computing. In 2019 IEEE International Congress on Big Data (BigDataCongress), pages 1–6. IEEE, 2019

• Thomas Rausch, Stefan Nastic, and Schahram Dustdar. EMMA: Distributed QoS-aware MQTT middleware for edge computing applications. In 2018 IEEE International Conference on Cloud Engineering, IC2E’18, pages 191–197. IEEE, 2018

• Thomas Rausch, Cosmin Avasalcai, and Schahram Dustdar. Portable energy-aware cluster-based edge computers. In 2018 IEEE/ACM Symposium on Edge Computing, SEC’18, pages 260–272, 2018

• Haris Isakovic, Denise Ratasich, Christian Hirsch, Michael Platzer, Bernhard Wally, Thomas Rausch, Dejan Nickovic, Willibald Krenn, Gerti Kappel, Schahram Dustdar, et al. CPS/IoT Ecosystem: A platform for research and education. In Cyber Physical Systems. Model-Based Design, pages 206–213. Springer, 2018

• Thomas Rausch. Message-oriented middleware for edge computing applications. In Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference, pages 3–4, 2017

• Thomas Rausch, Waldemar Hummer, Philipp Leitner, and Stefan Schulte. An empirical analysis of build failures in the continuous integration workflows of Java-based open-source software. In 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR), pages 345–355. IEEE, 2017

Tech reports

• Thomas Rausch, Waldemar Hummer, and Vinod Muthusamy. PipeSim: Trace-driven simulation of large-scale AI operations platforms. arXiv preprint arXiv:2006.12587, 2020
