Extending the Got Model Into a Multi-Language, Multi-Platform Distributed System Through Interoperable Spacetime Dataframes
UNIVERSITY OF CALIFORNIA, IRVINE

Extending the GoT model into a multi-language, multi-platform distributed system through interoperable Spacetime dataframes

THESIS

submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE in Software Engineering

by

Shrijith Saraswathi Venkatramana

Thesis Committee:
Cristina Videira Lopes, Chair
Sam Malek
James A. Jones

2021

© 2021 Shrijith Saraswathi Venkatramana

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
ABSTRACT OF THE THESIS
1 Introduction
2 Related Work
  2.1 RPC/Distributed Objects
  2.2 Message-Oriented
  2.3 Publish-Subscribe
  2.4 Tuple Spaces
3 The GoT Distributed Computing Model
  3.1 GoT, Git but for Objects
  3.2 GoT Example: Simplified Space Race
    3.2.1 Data Model
    3.2.2 Server Node: Physics Simulator
    3.2.3 Client Node: Player
    3.2.4 Conflict Detection and Resolution
  3.3 Extending Spacetime into a multi-language, multi-platform system
4 Designing an Interface Definition Language for GoT
  4.1 The Purpose Behind Creating Interface Definition Languages
  4.2 A Brief History of Interface Definition Languages
  4.3 Observations on Earlier Systems
    4.3.1 Case Study: Technical Issues with the CORBA API
  4.4 GoT IDL and Code Generator
    4.4.1 Overview
    4.4.2 Example Usage
    4.4.3 Architecture
    4.4.4 Implementation
5 Generic Dataframe Communication Protocol in GoT
  5.1 Interoperability in CORBA
    5.1.1 Object Request Broker (ORB)
    5.1.2 Generic Inter-ORB Protocol (GIOP)
    5.1.3 Common Data Representation (CDR)
  5.2 GoT Communication Protocol
    5.2.1 Data Format
    5.2.2 Fetch Cycle
    5.2.3 Push Cycle
6 Support for Spacetime Dataframes in the Browser
  6.1 Extended Spacetime Abstraction Layers Overview
  6.2 WebAssembly (WASM)
  6.3 Emscripten
  6.4 The Bindings
    6.4.1 Backend Dataframe Inner Bindings (Python bindings)
    6.4.2 WASM Dataframe Inner Bindings (Embind)
  6.5 Backend Dataframe Core
  6.6 Websockify and Network Protocol Compatibility
7 Validation: Building an Online Presence Application
  7.1 Application Requirements
  7.2 Generic Interface Design for the Web Presence Application
    7.2.1 Generated Python Class: IPresence.py
    7.2.2 Generated Typescript Class: IPresence.ts
  7.3 Implementing Application Logic Using the Generated Classes
    7.3.1 Client Logic in the Browser Using Typescript Binding
    7.3.2 Server Logic in the Backend Using Python Binding
  7.4 Websockify: Translate Between WebSockets and POSIX Network Calls
  7.5 Result
8 Discussion: Comparison with Parse Platform
  8.1 Parse Platform Architecture
  8.2 Implementing the Web Presence Application in Parse Platform
  8.3 Developer API Comparison for Extended Spacetime and Parse Platform
  8.4 Feature Comparison for Extended Spacetime and Parse Platform
9 Future Work and Conclusion
  9.1 Future Work
    9.1.1 Object Persistence
    9.1.2 Advanced Object Queries
    9.1.3 More Language Bindings and Runtimes
    9.1.4 Class Level Permissions and ACLs
    9.1.5 Advanced Data Types
  9.2 Conclusion
Bibliography

LIST OF FIGURES

1.1 Extended Spacetime Architecture
2.1 A high-level overview of middleware systems
3.1 Structure of a GoT Node. Arrows denote the direction of data flow.
4.1 High level overview of GoT IDL and code generator
4.2 Details of IDL parser and code generator
5.1 The fetch cycle sequence
5.2 The push cycle sequence
6.1 Detailed Layered Spacetime Architecture for enabling heterogeneous functionality
6.2 Extended Spacetime bindings: purpose and implementation technologies
7.1 A sample interaction with the Web Presence Application
8.1 Parse Platform Architecture

LIST OF TABLES

1.1 Contribution Details
3.1 API Table for a Dataframe
4.1 History of IDLs in the context of their respective environments
5.1 GoT Protocol Data format
8.1 Application Developer API comparison for Extended Spacetime and Parse Platform
8.2 Summary of comparison between Spacetime and Parse Platform

ACKNOWLEDGMENTS

I would like to thank my advisor Professor Cristina V. Lopes for her constant support and guidance. I would also like to express my gratitude to Rohan Achar for introducing me to the subject, mentoring me throughout and also for various code contributions. And I want to thank Xiaochin Yu for providing valuable code contributions to the framework's core and technical insights.

ABSTRACT OF THE THESIS

Extending the GoT model into a multi-language, multi-platform distributed system through interoperable Spacetime dataframes

By Shrijith Saraswathi Venkatramana
Master of Science in Software Engineering
University of California, Irvine, 2021
Cristina Videira Lopes, Chair

Over the years, the computing landscape at large has become more diverse and heterogeneous in terms of programming languages, operating systems, networking methods, and hardware. With such increasing heterogeneity, any new distributed programming model needs to function successfully across the gamut of languages and platforms. The Global Object Tracker (GoT) is a recently introduced distributed programming model, initially implemented as the Spacetime framework in a single implementation language, Python. In this thesis, we explore the possibility and feasibility of extending Spacetime into a multi-language, multi-platform distributed system through a combination of generic and platform-specific techniques. In particular, I explain the design of an Interface Definition Language (IDL) for GoT, in the backdrop of historically significant IDLs. Moreover, we explore the underlying protocol defined in GoT which enables interoperability among heterogeneous components.
The thesis is validated by demonstrating a simple Web Presence application where much of the complexity emerging due to heterogeneity is hidden from the application developer. The complexity hiding in the example application is explained in the context of a layered architecture. Moreover, we compare the extended Spacetime framework with the Parse Platform, a contemporary heterogeneous distributed system originally from Facebook.

Chapter 1 Introduction

The Global Object Tracker (GoT) is a recently introduced distributed programming model [13, 12]. When first introduced, it was implemented in a single language, Python. One of the important challenges for a distributed programming model to resolve is the heterogeneity of languages, platforms, and systems. This thesis explores the possibility and feasibility of extending GoT's initial implementation, Spacetime, into a multi-language, multi-platform distributed system.

This thesis aims to establish the feasibility of extending Spacetime into a multi-language and multi-platform distributed system through a combination of generic and platform-specific techniques. We go into the details of designing an interface definition language, the protocol underlying the GoT programming model, and platform-specific concerns such as networking compatibility. The system is validated by building an application on top of the extended Spacetime framework. Moreover, a comparison with a contemporary system, Parse Platform, is included.
The extended Spacetime framework described in this thesis takes advantage of several techniques to enable interoperability:

• Specifying and using an Interface Definition Language (IDL) and accompanying code generator
• Taking advantage of GoT's flexible and platform-independent protocol for dataframe communication
• Building a common C++ core into multiple target platforms
• Accessing the common core through language- and platform-dependent binding techniques

A high-level overview of the extended Spacetime architecture is shown in Figure 1.1. All the aforementioned techniques are represented as components in the architecture diagram, and subsequent chapters will cover each component in detail.

Figure 1.1: Extended Spacetime Architecture

This thesis has benefited greatly from the contributions of various people as detailed in Table 1.1.

Table 1.1: Contribution Details

Extended Spacetime Component                       Contributor
Version graph core in C++                          Xiaochin Yu
Python bindings and GoT protocol specification     Rohan Achar

The rest of the thesis is organized as follows. In Chapter 2, I discuss related work. Chapter 3 describes the original GoT programming model and an example application designed for its implementation framework, Spacetime. In Chapter 4, a historical study of IDLs is combined with the design of an IDL for Spacetime. Chapter 5 describes the generic communication protocol in GoT that enables interoperability. Chapter 6 explains how Spacetime was extended to the browser. Chapter 7 describes the validation of the overall framework through an example Web Presence application. Chapter 8 compares Extended Spacetime to the Parse Platform. Chapter 9 outlines future work and the conclusion.

Chapter 2 Related Work

As the heterogeneity of components and platforms increases in the computing landscape, successful interoperability is a requirement for any new distributed model.
Interoperability of components in complex distributed systems has long been a topic of study [14]. A very high-level overview of traditional middleware is shown in Figure 2.1. We see that various platform variations are hidden from the application through the middleware abstraction. Extended Spacetime, while based on a uniquely Git-inspired model, still shares many similarities with traditional interoperability middleware. These traditional middleware systems can be divided into categories such as [15]:

• RPC/Distributed Objects
• Message-based Systems
• Publish-Subscribe Systems
• Tuple Spaces

Figure 2.1: A high-level overview of middleware systems

2.1 RPC/Distributed Objects

The Remote Procedure Call (RPC) based Distributed Objects paradigm is the oldest form of distributed systems and was introduced as early as 1988 in the form of Network Computing Architecture (NCA) [18].
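
To make the RPC/Distributed Objects idea concrete, the following is a minimal Python sketch of a client-side stub that forwards a local-looking method call to an object held elsewhere. It is an illustration only and does not reproduce NCA or any real RPC library: the names Counter, Server, and Stub are hypothetical, JSON is an assumed wire format, and the network is replaced by a direct function call so the example stays self-contained.

```python
import json


class Counter:
    """A plain object that lives on the 'server' side (hypothetical example)."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount
        return self.value


class Server:
    """Unmarshals a request, invokes the named method, marshals the reply."""

    def __init__(self, target):
        self.target = target

    def handle(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self.target, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class Stub:
    """Client-side proxy: turns a local-looking call into a request message."""

    def __init__(self, transport):
        # transport is any callable mapping request bytes to reply bytes.
        # Assumption: a real system would send these bytes over a socket.
        self.transport = transport

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. remote methods.
        def remote_call(*args):
            request = json.dumps({"method": name, "args": list(args)})
            reply = self.transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


server = Server(Counter())
counter = Stub(server.handle)  # in-process stand-in for a network link
first = counter.add(2)
second = counter.add(3)
```

The caller only ever sees `counter.add(...)`; marshaling, transport, and dispatch are hidden behind the stub, which is the kind of platform hiding the middleware overview above describes.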