
UNIVERSITY OF CALIFORNIA, IRVINE

Extending the GoT model into a multi-language, multi-platform distributed system through interoperable Spacetime dataframes

THESIS

submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in Software Engineering

by

Shrijith Saraswathi Venkatramana

Thesis Committee:
Cristina Videira Lopes, Chair
Sam Malek
James A. Jones

2021

© 2021 Shrijith Saraswathi Venkatramana

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGMENTS

ABSTRACT OF THE THESIS

1 Introduction

2 Related Work
  2.1 RPC/Distributed Objects
  2.2 Message-Oriented
  2.3 Publish-Subscribe
  2.4 Tuple Spaces

3 The GoT Distributed Model
  3.1 GoT, Git but for Objects
  3.2 GoT Example: Simplified Space Race
    3.2.1 Data Model
    3.2.2 Node: Physics Simulator
    3.2.3 Node: Player
    3.2.4 Conflict Detection and Resolution
  3.3 Extending Spacetime into a multi-language, multi-platform system

4 Designing an Interface Definition Language for GoT
  4.1 The Purpose Behind Creating Interface Definition Languages
  4.2 A Brief History of Interface Definition Languages
  4.3 Observations on Earlier Systems
    4.3.1 Case Study: Technical Issues with the CORBA API
  4.4 GoT IDL and Code Generator
    4.4.1 Overview
    4.4.2 Example Usage
    4.4.3 Architecture
    4.4.4 Implementation

5 Generic Dataframe Communication Protocol in GoT
  5.1 Interoperability in CORBA
    5.1.1 Object Request Broker (ORB)
    5.1.2 General Inter-ORB Protocol (GIOP)
    5.1.3 Common Data Representation (CDR)
  5.2 GoT Communication Protocol
    5.2.1 Data Format
    5.2.2 Fetch Cycle
    5.2.3 Push Cycle

6 Support for Spacetime Dataframes in the Browser
  6.1 Extended Spacetime Abstraction Layers Overview
  6.2 WebAssembly (WASM)
  6.3 Emscripten
  6.4 The Bindings
    6.4.1 Backend Dataframe Inner Bindings (Python bindings)
    6.4.2 WASM Dataframe Inner Bindings (Embind)
  6.5 Backend Dataframe Core
  6.6 Websockify and Network Protocol Compatibility

7 Validation: Building an Online Presence Application
  7.1 Application Requirements
  7.2 Generic Interface Design for the Web Presence Application
    7.2.1 Generated Python Class: IPresence.py
    7.2.2 Generated Typescript Class: IPresence.ts
  7.3 Implementing Application Logic Using the Generated Classes
    7.3.1 Client Logic in the Browser Using Typescript Binding
    7.3.2 Server Logic in the Backend Using Python Binding
  7.4 Websockify: Translate Between WebSocket and POSIX Network Calls
  7.5 Result

8 Discussion: Comparison with Parse Platform
  8.1 Parse Platform Architecture
  8.2 Implementing the Web Presence Application in Parse Platform
  8.3 Developer API Comparison for Extended Spacetime and Parse Platform
  8.4 Feature Comparison for Extended Spacetime and Parse Platform

9 Future Work and Conclusion
  9.1 Future Work
    9.1.1 Object Persistence
    9.1.2 Advanced Object Queries
    9.1.3 More Language Bindings and Runtimes
    9.1.4 Class Level Permissions and ACLs
    9.1.5 Advanced Data Types
  9.2 Conclusion

Bibliography

LIST OF FIGURES

1.1 Extended Spacetime Architecture
2.1 A high-level overview of middleware systems
3.1 Structure of a GoT Node. Arrows denote the direction of data flow.
4.1 High level overview of GoT IDL and code generator
4.2 Details of IDL parser and code generator
5.1 The fetch cycle sequence
5.2 The push cycle sequence
6.1 Detailed Layered Spacetime Architecture for enabling heterogeneous functionality
6.2 Extended Spacetime bindings: purpose and implementation technologies
7.1 A sample interaction with the Web Presence Application
8.1 Parse Platform Architecture

LIST OF TABLES

1.1 Contribution Details
3.1 API Table for a Dataframe
4.1 History of IDLs in the context of their respective environments
5.1 GoT Protocol Data format
8.1 Application Developer API comparison for Extended Spacetime and Parse Platform
8.2 Summary of comparison between Spacetime and Parse Platform

ACKNOWLEDGMENTS

I would like to thank my advisor Professor Cristina V. Lopes for her constant support and guidance. I would also like to express my gratitude to Rohan Achar for introducing me to the subject, mentoring me throughout and also for various code contributions. And I want to thank Xiaochin Yu for providing valuable code contributions to the framework’s core and technical insights.

ABSTRACT OF THE THESIS

Extending the GoT model into a multi-language, multi-platform distributed system through interoperable Spacetime dataframes

By

Shrijith Saraswathi Venkatramana

Master of Science in Software Engineering

University of California, Irvine, 2021

Cristina Videira Lopes, Chair

Over the years, the computing landscape at large has become more diverse and heterogeneous in terms of programming languages, operating systems, networking methods, and hardware. With such increasing heterogeneity, any new distributed programming model needs to function successfully across the gamut of languages and platforms. The Global Object Tracker (GoT) is a recently introduced distributed programming model, initially implemented as the Spacetime framework in a single implementation language, Python. In this thesis, we explore the possibility and feasibility of extending Spacetime into a multi-language, multi-platform distributed system through a combination of generic and platform-specific techniques. In particular, we explain the design of an Interface Definition Language (IDL) for GoT against the backdrop of historically significant IDLs. We also explore the underlying protocol defined in GoT, which enables interoperability among heterogeneous components. The thesis is validated by demonstrating a simple Web Presence application in which much of the complexity arising from heterogeneity is hidden from the application developer. This complexity hiding is explained in the context of a layered architecture. Finally, we compare the extended Spacetime framework with the Parse Platform, a contemporary heterogeneous distributed system originally from Facebook.

Chapter 1

Introduction

The Global Object Tracker (GoT) is a recently introduced distributed programming model [13, 12]. When first introduced, it was implemented in a single language, Python. One of the important challenges a distributed programming model must resolve is the heterogeneity of languages, platforms, and systems. This thesis explores the possibility and feasibility of extending GoT's initial implementation, Spacetime, into a multi-language, multi-platform distributed system.

This thesis aims to establish the feasibility of extending Spacetime into a multi-language and multi-platform distributed system through a combination of generic and platform-specific techniques. We go into the details of designing an interface definition language, the protocol underlying the GoT programming model, and platform-specific concerns such as networking compatibility. The system is validated by building an application on top of the extended Spacetime framework. Finally, a comparison with a contemporary system, Parse Platform, is included.

The extended Spacetime framework described in this thesis in particular takes advantage of various techniques to enable interoperability:

Figure 1.1: Extended Spacetime Architecture

• Specifying and using an Interface Definition Language (IDL) and accompanying code generator

• Taking advantage of GoT’s flexible and platform independent protocol for dataframe communication

• Building a common C++ core into multiple target platforms

• Accessing the common core through language- and platform-dependent binding techniques

A high level overview of the extended Spacetime architecture is shown in Figure 1.1. All the aforementioned techniques are represented as various components in the architecture diagram, and subsequent chapters will cover each component in detail. This thesis has benefited greatly from the contributions of various people as detailed in Table 1.1.

Table 1.1: Contribution Details

| Extended Spacetime Component | Contributor |
|---|---|
| Version graph core in C++ | Xiaochin Yu |
| Python bindings and GoT protocol specification | Rohan Achar |

The rest of the thesis is organized as follows. In Chapter 2, I discuss related work. Chapter 3 describes the original GoT programming model and an example application designed for its implementation framework, Spacetime. In Chapter 4, a historical study of IDLs is combined with the design of an IDL for Spacetime. Chapter 5 describes the generic communication protocol in GoT that enables interoperability. Chapter 6 explains how Spacetime was extended to the browser. Chapter 7 describes the validation of the overall framework through an example Web Presence application. Chapter 8 compares Extended Spacetime to the Parse Platform. Chapter 9 outlines future work and the conclusion.

Chapter 2

Related Work

As the heterogeneity of components and platforms in the computing landscape increases, successful interoperability becomes a requirement for any new distributed model. Interoperability of components in complex distributed systems has been studied for quite a while [14]. A very high-level overview of traditional middleware is shown in Figure 2.1: various platform variations are hidden from the application through the middleware abstraction. Extended Spacetime, while based on a uniquely Git-inspired model, still shares many similarities with traditional interoperability middleware.

These traditional middleware systems can be divided into various categories such as [15]:

• RPC/Distributed Objects

• Message-based Systems

• Publish-Subscribe Systems

• Tuple Spaces

Figure 2.1: A high-level overview of middleware systems

2.1 RPC/Distributed Objects

The Remote Procedure Call (RPC) based Distributed Objects paradigm is one of the oldest approaches to building distributed systems; it was available as early as 1988 in the form of the Network Computing Architecture (NCA) [18]. NCA's location broker used UUIDs for the first time to uniquely identify objects. NCA was a proprietary system from Apollo and Hewlett-Packard.

Later, in 1990, Sun Microsystems released Open Network Computing (ONC), which provided object tracking and communication facilities based on RPC [6]. It also used a common, platform-independent data format known as External Data Representation (XDR).

Next, in 1993, the Open Software Foundation (OSF) defined a layered design called the Distributed Computing Environment (DCE) that combined aspects of NCA and ONC into an open standard for distributed applications [28].

In 1997, the Object Management Group (OMG) introduced CORBA [30], and the same year, Microsoft unveiled DCOM [29]. Both systems provide object tracking and synchronization functions through a combination of an interface definition language and communication protocols.

This paradigm is similar to GoT in at least two respects: first, the use of IDLs and object trackers to choose which data to transmit, and second, the use of platform-independent protocols and data formats.

2.2 Message-Oriented

In the message-oriented category, MSMQ [27] and Java Message Service (JMS) [22] are examples of traditional middleware. These systems usually use a message queue that decouples senders and receivers. A message-based application consists of a set of developer-specified messages and a set of clients that exchange them. A message here can be thought of as a precisely defined asynchronous request that carries valuable information and helps coordinate various components in a distributed system.

From an application developer's point of view, this is a much more generic and flexible system, since messages can be constructed in whichever way the developer wishes. Essentially, in message-oriented systems the developer is tasked with coming up with a communication strategy for the nodes, whereas in GoT, communication is hidden from the developer in the middleware layer.
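To illustrate this division of labor, the following minimal Python sketch uses the standard library's `queue.Queue` as a stand-in for a message queue such as MSMQ or JMS. The message format (`make_move_msg`) and the `ships` table are hypothetical. The point is that the developer designs the message, sends it explicitly, and writes the code that applies it on the receiving side. In GoT, the dataframe handles that work instead.

```python
import json
import queue


def make_move_msg(ship_id: int, x: float, y: float) -> str:
    # Hypothetical message format: the developer decides the fields and encoding.
    return json.dumps({"type": "move", "ship": ship_id, "x": x, "y": y})


mailbox = queue.Queue()  # decouples the sender from the receiver

# Sender: enqueue the message and move on; no receiver has to be running yet.
mailbox.put(make_move_msg(7, 1.5, -2.0))

# Receiver (later): dequeue, decode, and apply the change by hand.
ships = {7: {"x": 0.0, "y": 0.0}}
msg = json.loads(mailbox.get_nowait())
if msg["type"] == "move":
    ships[msg["ship"]].update(x=msg["x"], y=msg["y"])
```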

2.3 Publish-Subscribe

One could argue that Publish-Subscribe is a type of message-oriented system. However, it is listed here as a separate category due to the significance and common occurrence of this sort of system. In the Publish-Subscribe category, the producers and consumers of messages are anonymous to one another and communicate through topic-based or content-based filtering. SIENA [17] and JMS support this style of interoperability. The style can be characterized as event-based because components in Publish-Subscribe systems react to particular types of notifications. This type of service can provide properties such as asynchrony, heterogeneity, and security. Moreover, the loose coupling enables a client to post messages even if the server is not running.

In contrast to GoT, systems of this sort offer more expressiveness in the selection of target nodes, since messages are routed by topic or content. Also, in GoT, nodes have to explicitly specify the server address to establish a connection, whereas in Publish-Subscribe systems, components are decoupled.
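To make topic-based filtering concrete, here is a minimal in-process sketch. The `Broker` class and its methods are hypothetical and are not modeled on the SIENA or JMS APIs. Publishers and subscribers only share topic names and never reference each other. This is the decoupling that GoT's explicit server addresses do not provide.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


class Broker:
    """Hypothetical in-process broker: routes messages by topic name only."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Subscribers register interest in a topic, not in a specific publisher.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # The publisher does not know who (if anyone) receives the message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)  # number of deliveries


received = []
broker = Broker()
broker.subscribe("player.joined", lambda t, m: received.append((t, m["name"])))
delivered = broker.publish("player.joined", {"name": "alice"})
dropped = broker.publish("player.left", {"name": "bob"})  # no subscribers
```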

2.4 Tuple Spaces

In the tuple space category, the Linda platform [16] introduced tuple spaces as a shared memory into which clients can write data and from which they can read data, much like a relational database. However, unlike relational databases, the object query API it provides is more primitive than SQL. A set of tuple servers is required to host the tuple space. JavaSpaces is another example in this category of distributed systems [20].

The GoT platform, in contrast, uses a Directed Acyclic Graph (the version graph) as its core data structure in both the clients and the servers. Moreover, in Tuple Spaces, the clients are kept simple on purpose, and this can be both an advantage and a disadvantage. The disadvantage is that data synchronization requires manual read and write queries to the tuple space, which can mix up the concerns of data modification and transmission. Additionally, manual queries lack the convenience of automatic object tracking. The advantage is that the client is not burdened by framework overhead, which makes it suitable for resource-constrained devices.
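For comparison, the Linda primitives can be sketched in a few lines of Python. This is a hypothetical, single-process, non-blocking approximation. In Linda, `rd` and `in` block until a matching tuple appears, whereas here they return `None`, and `None` in a template acts as a wildcard. To publish a change, an application would have to read, modify, and re-`out` a tuple by hand. That is the manual synchronization burden described above.

```python
from typing import Any, List, Optional, Tuple

Tup = Tuple[Any, ...]


class TupleSpace:
    """Hypothetical, non-blocking sketch of Linda-style out/rd/in."""

    def __init__(self) -> None:
        self._tuples: List[Tup] = []

    def out(self, tup: Tup) -> None:
        self._tuples.append(tup)  # write a tuple into the shared space

    @staticmethod
    def _match(template: Tup, tup: Tup) -> bool:
        # Same arity, and every non-None template field must be equal.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def rd(self, template: Tup) -> Optional[Tup]:
        # Non-destructive read of the first matching tuple.
        for tup in self._tuples:
            if self._match(template, tup):
                return tup
        return None

    def in_(self, template: Tup) -> Optional[Tup]:
        # Destructive read: remove and return the first matching tuple.
        # (`in` is a Python keyword, hence the trailing underscore.)
        for i, tup in enumerate(self._tuples):
            if self._match(template, tup):
                return self._tuples.pop(i)
        return None


space = TupleSpace()
space.out(("ship", 7, 120.0))
space.out(("ship", 9, 80.0))
peek = space.rd(("ship", 9, None))
taken = space.in_(("ship", None, None))
```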

Chapter 3

The GoT Model

Parts of this chapter have been included from [13] with the author's permission.

The GoT model was proposed as an approach to solving the object state synchronization problem among the components of a distributed system. The model drew inspiration from Git [24], a widely used distributed version control system that uses a Directed Acyclic Graph, called the version graph, and various operations on it. The version graph operations include diff, commit, checkout, fetch, push and merge. The GoT model was implemented in practice as the Spacetime framework, written in Python.

One of the key observations in the design of GoT was that state synchronization among distributed components is a common theme among all types of distributed architectures such as client-server, peer-to-peer, map-reduce and so on. GoT is designed for distributed applications such as multiplayer gaming and multi-agent simulations.

3.1 GoT, Git but for Objects

The core idea behind GoT is that the state synchronization problem can be viewed as a distributed version control problem. Multiple components can make changes to their local working copies. These local changes can be committed into dataframes (the equivalent of Git repositories), and the dataframes can then send or receive changes via fetch and push. If the changes conflict, three-way conflict resolution routines can be executed to resolve them. In essence, GoT is the formalization of an object-oriented programming model based on causal consistency, with application-level conflict resolution strategies whose elements are taken from decentralized version control systems.

The GoT model does not replicate all the functionality of Git; for example, Git-like branches are not supported, nor are "undo" capabilities. The functions supported by the dataframe are summarized in Table 3.1.

3.2 GoT Example: Simplified Space Race

In this section, we simplify and present relevant excerpts of an example Space Race game from the original GoT paper.

Dataframes

The key abstraction used in GoT is the dataframe, which can be compared to the repository concept in Git. In Git, files are tracked, whereas in GoT, objects in memory are tracked. Changes are made to the working data of the dataframe, and GoT tracks the changes as they are made. These changes can then be committed into the revision history and transferred to remote dataframes. The flow of data is depicted in Figure 3.1.

Table 3.1: API Table for a Dataframe

| Dataframe API | Equivalent Git API | Purpose |
|---|---|---|
| read {one, all} | N/A | Read objects from local snapshot. |
| add {one, many} | git add | Add new objects to local snapshot. |
| delete {one, all} | git rm <files> | Delete objects from local snapshot. |
| | git add | Objects are locally modified, which is tracked by the local snapshot. |
| commit | git commit | Write staged changes in local snapshot to local version history. |
| checkout | git checkout | Update local snapshot to the local version history HEAD. |
| push | git push | Write changes in local version history to a remote version history. |
| fetch | git fetch && git merge | Get changes from remote version history to local version history. |
| pull | git pull | fetch and then checkout. |

Figure 3.1: Structure of a GoT Node. Arrows denote the direction of data flow.
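To make the correspondence in Table 3.1 concrete, the following is a deliberately simplified, single-process Python sketch of a dataframe. It has a working snapshot, staged changes, and a linear local version history. All names (`ToyDataframe`, `VersionHistory`) are hypothetical, and this is not the Spacetime implementation. The sketch assumes there are no concurrent, divergent commits, so fetch and push only fast-forward. It also omits conflict handling, which GoT performs through three-way merges.

```python
import itertools
from typing import Dict, List, Optional, Tuple

Delta = Dict[int, Optional[dict]]  # oid -> field values, or None for a deletion
_version_ids = itertools.count(1)  # hypothetical global source of version ids


class VersionHistory:
    """A linear stand-in for GoT's version graph: (version, parent, delta)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, Optional[int], Delta]] = []

    @property
    def head(self) -> Optional[int]:
        return self.entries[-1][0] if self.entries else None

    def missing_from(self, other: "VersionHistory") -> list:
        # Entries this history has that `other` lacks (fast-forward only).
        known = {version for version, _, _ in other.entries}
        return [e for e in self.entries if e[0] not in known]


class ToyDataframe:
    """Hypothetical sketch of the Table 3.1 API; not the Spacetime code."""

    def __init__(self) -> None:
        self.snapshot: Dict[int, dict] = {}  # working data
        self.staged: Delta = {}  # changes not yet committed
        self.history = VersionHistory()  # local version history

    def add_one(self, oid: int, fields: dict) -> None:
        self.snapshot[oid] = dict(fields)
        self.staged[oid] = dict(fields)

    def delete_one(self, oid: int) -> None:
        self.snapshot.pop(oid, None)
        self.staged[oid] = None

    def read_one(self, oid: int) -> Optional[dict]:
        return self.snapshot.get(oid)

    def commit(self) -> None:
        # Like `git commit`: write staged changes into the local history.
        if self.staged:
            parent = self.history.head
            self.history.entries.append((next(_version_ids), parent, self.staged))
            self.staged = {}

    def push(self, remote: VersionHistory) -> None:
        # Like `git push`: copy local versions the remote has not seen yet.
        remote.entries.extend(self.history.missing_from(remote))

    def fetch(self, remote: VersionHistory) -> None:
        # Bring remote versions into the local history (no merge needed here).
        self.history.entries.extend(remote.missing_from(self.history))

    def checkout(self) -> None:
        # Like `git checkout`: rebuild the snapshot by replaying history to HEAD.
        self.snapshot = {}
        for _, _, delta in self.history.entries:
            for oid, fields in delta.items():
                if fields is None:
                    self.snapshot.pop(oid, None)
                else:
                    self.snapshot.setdefault(oid, {}).update(fields)

    def pull(self, remote: VersionHistory) -> None:
        self.fetch(remote)
        self.checkout()


server = VersionHistory()  # plays the role of the remote dataframe's history
producer, consumer = ToyDataframe(), ToyDataframe()

producer.add_one(1, {"player_id": "alice", "ready": False})
producer.commit()
producer.push(server)
consumer.pull(server)  # fetch, then checkout
after_add = consumer.read_one(1)

producer.delete_one(1)
producer.commit()
producer.push(server)
consumer.pull(server)
after_delete = consumer.read_one(1)
```

Replaying deltas on checkout keeps the sketch short. A real implementation would more likely apply only the deltas since the previous checkout.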

3.2.1 Data Model

At this point, we will see what a typical data model in Spacetime looks like in Python. The example is in Listing 3.1.

```python
 1  @pcc_set
 2  class Player(object):
 3      oid = primarykey(int)
 4      player_id = dimension(str)
 5      ready = dimension(bool)
 6      winner = dimension(bool)
 7
 8      def __init__(self):
 9          self.oid = random.randint(0, sys.maxsize)
10          # ... other initializations, including non-shared fields
11          self.world = World()  # Example non-shared field
12
13      def act(self):
14          ...  # do something smart with ship
```
Listing 3.1: A typical data model in Spacetime

Line 1 introduces the pcc_set decorator, which marks objects of the subsequent class to be tracked by the GoT tracker.

Line 3 uses the primarykey method to denote the attribute that uniquely identifies a particular object of that class.

Lines 4-6 define attributes that are to be tracked and synchronized.

In line 11, inside the initialization function, we see an additional attribute that is not tracked or synchronized by the GoT tracker.
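One plausible way to get this behavior in plain Python is a data descriptor that logs every write to a tracked attribute, together with a class decorator that collects the tracked attributes. The names `Dimension`, `PrimaryKey` and `tracked` are hypothetical stand-ins for `dimension`, `primarykey` and `pcc_set`. This is not the rtypes implementation, and it only records pending changes rather than committing them anywhere.

```python
from typing import Any, Optional


class Dimension:
    """Hypothetical descriptor: stores a tracked attribute and logs each write."""

    def __init__(self, py_type: type) -> None:
        self.py_type = py_type
        self.name = ""

    def __set_name__(self, owner: type, name: str) -> None:
        self.name = name  # attribute name, e.g. "ready"

    def __get__(self, obj: Any, owner: Optional[type] = None) -> Any:
        if obj is None:
            return self  # accessed on the class itself
        return obj.__dict__[self.name]

    def __set__(self, obj: Any, value: Any) -> None:
        if not isinstance(value, self.py_type):
            raise TypeError(f"{self.name} expects {self.py_type.__name__}")
        obj.__dict__[self.name] = value
        # Record the change so a dataframe could later stage and commit it.
        obj.__dict__.setdefault("_pending", []).append((self.name, value))


class PrimaryKey(Dimension):
    """Hypothetical marker for the attribute that identifies an object."""


def tracked(cls: type) -> type:
    """Hypothetical stand-in for @pcc_set: remember which attributes are tracked."""
    cls._dimensions = [n for n, v in vars(cls).items() if isinstance(v, Dimension)]
    return cls


@tracked
class Player:
    oid = PrimaryKey(int)
    player_id = Dimension(str)
    ready = Dimension(bool)

    def __init__(self, oid: int, player_id: str) -> None:
        self.oid = oid
        self.player_id = player_id
        self.ready = False
        self.world = object()  # plain attribute: never logged, never synchronized


p = Player(42, "alice")
p.ready = True
```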

3.2.2 Server Node: Physics Simulator

The data model defined in the previous section is used in various GoT nodes to build the application logic. In this case, we briefly outline a Physics Simulator node, that acts as the authoritative component of all the nodes and also enforces game rules. The simulator node is in Listing 3.2.

```python
 1  class Game(object):
 2      # Declarations
 3
 4      def __init__(self, df):
 5          self.dataframe = df  # store the dataframe
 6          self.world = self.setup_world()  # set up the game
 7          self.dataframe.commit()  # push asteroids into the version graph
 8
 9      def play(self):
10          game_over = False
11          x = x_gen()
12          while not game_over:
13              start_t = time.perf_counter()
14              self.dataframe.checkout()
15
16              for p in self.dataframe.read_all(Player):
17                  if p.oid not in self.current_players:  # new players
18                      self.current_players[p.oid] = p
19                      p.ready = True
20
21              self.move_asteroids()
22              self.move_ships()
23              self.detect_collisions()
24
25              self.dataframe.commit()
26
27              elapsed_t = time.perf_counter() - start_t
28              time.sleep(max(0, Game.DELTA_TIME - elapsed_t))  # never negative
29
30  def sr_physics(dataframe):
31      game = Game(dataframe)
32      my_print("READY FOR NEW GAME")
33      while True:
34          game.play()
35          my_print("GAME OVER")
36          time.sleep(WAIT_FOR_START)
37
38  def main(port):
39      node = GotNode(sr_physics, server_port=port, dataframe="spacerace.got",
40                     Types=[Player, Ship, Asteroid])
41      node.start()
```
Listing 3.2: The Physics Simulator node

Line 39 demonstrates creating the node where the entry function, port, dataframe name, and list of types are passed.

Line 30 defines the entry function, which receives the dataframe as an argument. This function initiates the game loop.

In line 14, inside the game loop, the server checks out data from the version history. All Player objects are then read from the dataframe, and the dictionary of current players is updated.

From line 21, we call some functions that update fields under version control, after which the changes are committed into the version graph.

3.2.3 Client Node: Player

One type of client node is the Player. During initialization, this node can form a connection with a remote port, defined either through initialization parameters or through a GoT URL such as got://somehost.edu[:port]/spacerace.got. The player client node is in Listing 3.3.

```python
 1  SYNC_TIME = 0.3  # secs
 2  def bot_driver(dataframe, player_class):
 3      dataframe.pull()
 4      my_player = player_class(dataframe)
 5      dataframe.add_one(Player, my_player)
 6      dataframe.commit(); dataframe.push()
 7
 8      my_player.init_world()
 9      while True:
10          start_t = time.perf_counter()
11
12          dataframe.pull()
13          survived = my_player.act()
14          dataframe.commit()
15          dataframe.push()
16          ...  # player logic
17          elapsed_t = time.perf_counter() - start_t
18          sleep_t = SYNC_TIME - elapsed_t
19          if sleep_t > 0:
20              time.sleep(sleep_t)
21
22  def main():
23      args = ...  # parse command line args
24      player.start(get_class(args.player))
```
Listing 3.3: Player Driver

In line 2, the client handler function bot_driver receives the dataframe.

Line 3 pulls data from the server into the node's local version graph.

In lines 4-5, we create a new player representing the node and add it to the dataframe.

In line 6, the commit and push operations ensure the changes are sent to the physics node.

Lines 9-20 define the game loop, in which the pull, act, commit, and push actions are performed in order.

3.2.4 Conflict Detection and Resolution

Conflicts are detected whenever the receiving update starts with a version different from the local version graph’s HEAD (latest local update). GoT allows the developer to define a three-way merge between the original snapshot (common ancestor), local snapshot and remote snapshot.

For instance, the physics node has a function that performs a three-way merge, as illustrated in Listing 3.4:

```python
 1  def conflict_resolution(conflict_iter, original_snap, my_snap, their_snap):
 2      for original, mine, theirs in conflict_iter:
 3          if isinstance(mine, Ship):
 4              if abs(theirs.velocity) <= World.MAX_SPEED:
 5                  mine.velocity = theirs.velocity
 6              my_snap.resolve_with(mine)
 7          else:
 8              # if it is an asteroid
 9              my_snap.resolve_with(mine)
10      return my_snap
```
Listing 3.4: Conflict Resolution

Line 4 chooses the incoming Ship velocity over the local velocity, as long as the magnitude of the incoming value is less than or equal to World.MAX_SPEED.
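The detection rule and a three-way merge can be sketched in Python as follows. This is a hypothetical, field-level variant that operates on plain dictionaries of dimension values. Listing 3.4, by contrast, resolves whole objects through the dataframe's snapshots. The names `is_conflict` and `three_way_merge`, the resolver signature, and `MAX_SPEED = 10.0` (an arbitrary stand-in for World.MAX_SPEED) are all illustrative assumptions. Fields changed on only one side are merged automatically. Only fields changed on both sides go to the application-defined resolver, which here mimics the velocity rule of Listing 3.4.

```python
from typing import Callable, Dict, Optional

Fields = Dict[str, object]
Resolver = Callable[[str, object, object, object], object]


def is_conflict(update_base: Optional[int], local_head: Optional[int]) -> bool:
    # An incoming update conflicts when it was not built on our current HEAD.
    return update_base != local_head


def three_way_merge(original: Fields, mine: Fields, theirs: Fields,
                    resolve: Resolver) -> Fields:
    """Hypothetical field-level three-way merge of one object's dimensions."""
    merged: Fields = {}
    for key in original.keys() | mine.keys() | theirs.keys():
        base, m, t = original.get(key), mine.get(key), theirs.get(key)
        if m == t:
            merged[key] = m  # both sides agree
        elif m == base:
            merged[key] = t  # only the remote side changed the field
        elif t == base:
            merged[key] = m  # only the local side changed the field
        else:
            merged[key] = resolve(key, base, m, t)  # both changed: app decides


    return merged


MAX_SPEED = 10.0  # hypothetical stand-in for World.MAX_SPEED


def prefer_valid_remote_velocity(key, base, mine, theirs):
    # Mirrors Listing 3.4: accept the remote velocity if it is within limits.
    if key == "velocity" and abs(theirs) <= MAX_SPEED:
        return theirs
    return mine


original = {"velocity": 2.0, "x": 0.0}
mine = {"velocity": 3.0, "x": 5.0}
theirs = {"velocity": 4.0, "x": 0.0}
merged = three_way_merge(original, mine, theirs, prefer_valid_remote_velocity)
```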

3.3 Extending Spacetime into a multi-language, multi-platform system

The separation of concerns between the working data and Version History components is valuable for extending Spacetime into a multi-language and multi-platform distributed system. The Version Graph, or history of changes, can be thought of as the core of the application and can be written in a language such as C++ with portability in mind. This core can then be accessed on various platforms through language bindings, and the language bindings can interact with language-specific working data handlers. In this thesis, we target two platforms, the browser and a Linux backend, to demonstrate the feasibility of this architecture in detail.

Chapter 4

Designing an Interface Definition Language for GoT

In this chapter, we first discuss the problem of designing an Interface Definition Language for distributed systems in generic terms, and then, in that context, specify a language for GoT. In particular, we describe the reasons for creating interface definition languages, their brief history, their application in the space of distributed systems, and how past attempts have fared in practice.

4.1 The Purpose Behind Creating Interface Definition Languages

The original goal for creating interface definition languages was to hide the complexity of distributed systems from the applications programmer [19]. The complexity arises in distributed systems due to the heterogeneity of:

1. Application development languages

2. Operating Systems

3. Communication and network protocols

This sort of heterogeneity forces the developer to consider an assortment of interoperability issues. This additional set of considerations can be a large burden on the application developer or be at the root of hard-to-find, hard-to-debug bugs.

The way to minimize the burden of dealing with interoperability issues is to create an interface definition language (IDL) as an abstraction that hides the aforementioned heterogeneity.

An IDL defines the following aspects for programming objects:

1. Functional responsibility of particular classes of objects

2. Method arguments

3. Direction of parameters (i.e., server to client, client to server, or both)

4. User data structures

5. Method return types

IDLs do not deal with the code or functionality for the specifications. That is, IDLs specify merely the what of objects, not the how.

4.2 A Brief History of Interface Definition Languages

Table 4.1: History of IDLs in the context of their respective environments

| RPC/Distributed Objects System | Year of introduction (approximate) | Description |
|---|---|---|
| Network Computing Architecture (NCA) | 1988 | Introduced the Network Interface Definition Language (NIDL). The NIDL compiler generated C source code, letting the developer extend it as per the requirements. |
| Open Network Computing (ONC) | 1990 | Included an IDL-based code generator known as RPCGEN. The IDL was called Remote Procedure Call Language (RPCL), and the compiler generated C source code. Unlike NIDL, RPCGEN encourages developers to build their own protocol from scratch every time. Data transfer happens through a uniform, platform-independent data format commonly known as XDR. |
| Distributed Computing Environment (DCE) | 1993 | Combined RPCGEN's compiler with NIDL's UUID-based object identification. Moreover, DCE supported XDR for a platform-independent data format. |
| Common Object Request Broker Architecture (CORBA) | 1997 | The Object Management Group (OMG) defined a C++-like interface definition language. |
| Distributed Component Object Model (DCOM) | 1997 | Microsoft introduced the Microsoft Interface Definition Language (MIDL) as part of DCOM. |

4.3 Observations on Earlier Systems

Despite various attempts at establishing a standard for distributed computing, the aforementioned systems have mostly fallen out of large-scale adoption due to various factors [23]:

1. Sun ONC, Apollo NCS, and DCE were restricted to C and Unix, and hence were not really workable for heterogeneous systems

2. Microsoft’s DCOM was mostly restricted to the Windows environment and the ports to Unix didn’t gain traction

3. During the explosive growth of the Web, which centered on Java, HTTP and Enterprise JavaBeans (EJB), CORBA did not provide sufficient support for the Web

4. Complexity of the CORBA API

4.3.1 Case Study: Technical Issues with the CORBA API

Of all the attempts listed earlier, CORBA had the most traction and also the most comprehensive set of features. However, the following limitations were observed [23]:

1. Complexity

(a) "CORBA's object adapter requires more than 200 lines of interface definitions, even though the same functionality can be provided in 30 lines"

(b) CORBA's Interoperable Object References (IORs), as an architectural decision, created many unpleasant developer experiences. IORs force the use of a naming service because clients cannot create object references without the help of an external service. They also require remote calls to compare object identity reliably, and the overhead of those calls is prohibitive for many applications.

2. Lack of versioning

(a) Making gradual, backward-compatible upgrades to programming objects was not possible in CORBA. This forced all parts of the deployed application to be replaced at once, which is typically infeasible in real-time, practical systems.

Architecturally, the CORBA model fails to fully separate object tracking and object trans- mission. This is something GoT addresses through the concept of dataframe. In GoT, the object’s change tracking is done through PCC set libraries, whereas the transmission is done through dataframe operations.

4.4 GoT IDL and Code Generator

4.4.1 Overview

The GoT IDL is an interface definition language that can be used to describe PCC sets. In the original reference implementation of GoT, the Spacetime framework, PCC sets [13] expressed through a decorator specify what objects are to be tracked for changes.

Using the code generator rtypegen, GoT IDL definitions can be turned into PCC sets and the supporting code for various languages. The syntax of GoT IDL is derived from a variation of the original Python-based PCC set class definitions.

4.4.2 Example Usage

Suppose we wanted to use GoT for tracking a student’s grade. Then one would write a PCC set definition using GoT IDL, in a file called input.pcc as shown in Listing 4.1.

```
class grade:
    primary int student_id
    int points
    bool passed
    merge func handle_conflicts
```
Listing 4.1: GoT IDL Example

Next, one could use the code generator rtypegen to generate stubs for Python and TypeScript:

```
python3 rtypegen.py -i input.pcc
```

The generated Python stub code is shown in Listing 4.2:

```python
from rtypes import primarykey, dimension, pcc_set

@pcc_set
class grade:
    student_id = primarykey(int)
    points = dimension(int)
    passed = dimension(bool)
    def handle_conflicts(self): pass
```
Listing 4.2: Python generated class

The generated TypeScript stub code is shown in Listing 4.3.

```typescript
import { dimension, primarykey, Dimension } from 'rtypes/attributes';
import { Datatype } from 'rtypes/utils/enums';
import { pccSet } from 'rtypes/types/pcc_set';

@pccSet
export class grade {
  name: string;

  _student_id: Dimension;
  _points: Dimension;
  _passed: Dimension;

  student_id: any;
  points: any;
  passed: any;

  /*
    Do not remove the following line
    -- required for GoT object tracker functionality
  */
  static dtype: Map<any, any> = new Map();

  constructor() {
    this._student_id = primarykey(Datatype.INTEGER);
    this._points = dimension(Datatype.INTEGER);
    this._passed = dimension(Datatype.BOOLEAN);

    /*
      Do not remove the following line
      -- required for GoT object tracker functionality
    */
    this.transformProps();
  }

  handle_conflicts() {

  }
}
```
Listing 4.3: Generated Typescript class

Both the Python and TypeScript generated classes can create Spacetime objects, which can later communicate with each other through the underlying dataframes across heterogeneous platforms. This is described in more detail in the following chapters.

Dimension Type Support

As of this writing, the framework supports primitive types such as integers, booleans and strings for dimensions. However, this support can be extended to advanced data types such as lists and maps.

Figure 4.1: High-level overview of the GoT IDL and code generator

4.4.3 Architecture

Figure 4.1 shows a high-level overview of how the IDL, rtypegen, and the core dataframes work together to produce an interoperable, heterogeneous system.

4.4.4 Implementation

Figure 4.2 details the rtypegen parser and code generator. The IDL syntax is defined by a PEG (Parsing Expression Grammar), from which the RType parser is generated.

Figure 4.2: Details of the IDL parser and code generator

The parser is coupled with a set of semantic actions, one set per target language. Given a PCC set data model, the parser emits the corresponding code in the target language. To make that code functional, it references support libraries, which perform object tracking and implement the local heap.
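To make the semantic-action step concrete, here is a minimal Python sketch of a code generator that turns an already-parsed PCC set into a stub shaped like Listing 4.2. It is not rtypegen's implementation: the dictionary shape of the parsed model, the `emit_python` name, and the type table are assumptions made for illustration.

```python
# Hypothetical parsed-model shape (an assumption, not rtypegen's internal format):
# {"name": str, "primary": (type, name) or None, "dims": [(type, name), ...], "merge": str or None}
PY_TYPES = {"int": "int", "bool": "bool", "str": "str"}  # IDL type -> Python type


def emit_python(model: dict) -> str:
    """Emit a Python stub in the style of Listing 4.2 for one PCC set class."""
    body = []
    if model.get("primary"):
        ptype, pname = model["primary"]
        body.append(f"    {pname} = primarykey({PY_TYPES[ptype]})")
    for dtype, dname in model.get("dims", []):
        body.append(f"    {dname} = dimension({PY_TYPES[dtype]})")
    if model.get("merge"):
        # User-supplied merge function, generated as an empty placeholder.
        body += ["", f"    def {model['merge']}(self):", "        pass"]
    if not body:
        body = ["    pass"]  # keep an empty class syntactically valid
    header = [
        "from rtypes import primarykey, dimension, pcc_set",
        "",
        "@pcc_set",
        f"class {model['name']}:",
    ]
    return "\n".join(header + body) + "\n"


grade = {
    "name": "grade",
    "primary": ("int", "student_id"),
    "dims": [("int", "points"), ("bool", "passed")],
    "merge": "handle_conflicts",
}
print(emit_python(grade))
```

A Typescript emitter would be a second function over the same model, which mirrors the one-set-of-semantic-actions-per-language design described above.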

The PEG grammar for the parser is presented in Listing 4.4.

@@grammar :: rtype

start = file:file $ ;

file = { pccset:classdef }+ ;

classdef = classname:classname ':' classbody:classbody ;
classname = 'class' @:identifier ;

classbody = [primarydef:primarydef]
    declarations:normaldefs mergefunc:mergefunc ;

primarydef = 'primary' @:statement ;

normaldefs = { statement }* ;
statement = type:typedef name:identifier ;

mergefunc = ['merge' 'func' @:identifier] ;

typedef = | 'int' | 'bool' | 'str' ;

identifier = /[_a-zA-Z][_a-zA-Z0-9]*/ ;

Listing 4.4: PEG Grammar for the parser
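The grammar is small enough that a hand-written recursive-descent parser can follow it rule by rule. The Python sketch below illustrates this; it is not the generated RType parser. It tokenizes with a regular expression, mirrors `classdef`, `primarydef`, `normaldefs`, and `mergefunc`, and returns plain dictionaries whose shape is our own choice.

```python
import re

TOKEN = re.compile(r"\s*([_a-zA-Z][_a-zA-Z0-9]*|:)")
TYPES = {"int", "bool", "str"}  # the alternatives of 'typedef'


def tokenize(text: str) -> list:
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def expect(self, tok=None):
        cur = self.peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"expected {tok or 'more input'!r}, got {cur!r}")
        self.i += 1
        return cur

    def identifier(self):
        tok = self.expect()
        if tok == ":":
            raise SyntaxError("expected an identifier, got ':'")
        return tok

    def statement(self):  # statement = typedef identifier
        ty = self.expect()
        if ty not in TYPES:
            raise SyntaxError(f"unknown type {ty!r}")
        return (ty, self.identifier())

    def classdef(self):  # classdef = 'class' identifier ':' classbody
        self.expect("class")
        name = self.identifier()
        self.expect(":")
        primary = None
        if self.peek() == "primary":  # [primarydef]
            self.expect("primary")
            primary = self.statement()
        dims = []
        while self.peek() in TYPES:  # normaldefs = { statement }*
            dims.append(self.statement())
        merge = None
        if self.peek() == "merge":  # mergefunc = ['merge' 'func' identifier]
            self.expect("merge")
            self.expect("func")
            merge = self.identifier()
        return {"name": name, "primary": primary, "dims": dims, "merge": merge}

    def file(self):  # file = { classdef }+ followed by end of input
        classes = [self.classdef()]
        while self.peek() is not None:
            classes.append(self.classdef())
        return classes


def parse(text: str) -> list:
    return Parser(tokenize(text)).file()


print(parse("class Presence: primary str pkey int count str users merge func handle_conflicts"))
```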

Chapter 5

Generic Dataframe Communication Protocol in GoT

A key requirement for interoperability among heterogeneous components is a common communication protocol. For example, CORBA uses the General Inter-ORB Protocol [30], whereas DCOM relies on its own remote protocol [26]. For historical context, we briefly look at CORBA's General Inter-ORB Protocol before explaining the GoT approach.

5.1 Interoperability in CORBA

5.1.1 Object Request Broker (ORB)

The Object Request Broker (ORB) facilitates communication between clients and objects. It is the most important component of the CORBA architecture, since almost every other component depends on it. The ORB hides many aspects of an object from a client, such as:

• Object location (different host, same host but different process, or same process)

• Object implementation (language)

• Object execution state (whether it is already loaded into memory)

• Object communication mechanism (TCP/IP, shared memory, etc.)

Interoperable Object Reference (IOR)

To make an object request, the client uses an object reference [9]. When a CORBA object is created, an object reference that uniquely identifies it across the system is created at the same time; clients usually obtain such references through the naming service. These references follow a standard format and are called Interoperable Object References (IORs). Once a client receives an IOR, it can create a proxy object from a stub generated from the OMG IDL and operate on the object through it. Operations on the local proxy are forwarded to the server according to common standards such as GIOP and CDR, explained below.

5.1.2 Generic Inter-ORB Protocol (GIOP)

The General Inter-ORB Protocol (GIOP) specifies a standard transfer syntax and a set of message formats for communication between ORBs over any connection-oriented transport. Introduced in CORBA 2.0, GIOP resolved many earlier concerns about the lack of interoperability in CORBA.

Internet Inter-ORB Protocol (IIOP)

The Internet Inter-ORB Protocol (IIOP) specifies how GIOP is built over TCP/IP transports. That is, GIOP is the abstract protocol, whereas IIOP is its concrete implementation on top of TCP/IP.

A typical message contains the following parts:

• Message header (GIOP version, message type, size, byte order)

• Request header

• Request body

5.1.3 Common Data Representation (CDR)

The request header and body of an IIOP message are encoded using the Common Data Representation (CDR). Messages fall into two groups: client requests and server responses. The client can send the message types Request, LocateRequest, CancelRequest, Fragment, and MessageError; the server can send Reply, LocateReply, CloseConnection, Fragment, and MessageError.

A fragment of a CDR-encoded GIOP request is shown in Listing 5.1.

0x47 0x49 0x4f 0x50 -> "GIOP" magic bytes
0x01 0x00 -> GIOP version (1.0)
0x00 -> Byte order (big endian)
0x00 -> Message type (Request message)
0x00 0x00 0x00 0x2c -> Message size (44)
0x00 0x00 0x00 0x00 -> Service context
0x00 0x00 0x00 0x01 -> Request ID
0x01 -> Response expected
0x00 0x00 0x00 0x24 -> Object key length in octets (36)
0xab 0xac 0xab 0x31 0x39 0x36 0x31 0x30
0x30 0x35 0x38 0x31 0x36 0x00 0x5f 0x52
0x6f 0x6f 0x74 0x50 0x4f 0x41 0x00 0x00
0xca 0xfe 0xba 0xbe 0x39 0x47 0xc8 0xf8
0x00 0x00 0x00 0x00 -> Object key defined by vendor
0x00 0x00 0x00 0x04 -> Operation name length (4 octets long)
0x61 0x64 0x64 0x00 -> Value of operation name ("add")
0x20 -> Padding bytes to align next value

Listing 5.1: An example of CDR in action

The IIOP and CDR definitions are necessarily complex, because they must account for the diversity of the underlying systems. Nevertheless, the example above shows how rigidly structured the format is: every field has a defined position, size, and alignment, so a message can be decoded regardless of the language or platform at either end of the communication.
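The alignment rule is what makes CDR rigid and slightly tricky to produce by hand: each primitive starts at an offset that is a multiple of its size, measured (in GIOP 1.0) from the start of the message, so padding octets appear between fields. The Python sketch below packs a big-endian GIOP 1.0 message header and a minimal Request header to illustrate this. It covers only the few CDR types needed here, and the object key is a made-up placeholder rather than a real vendor key; `CdrWriter` and `giop_message` are hypothetical names.

```python
import struct

MSG_REQUEST = 0   # GIOP message type code for Request
HEADER_SIZE = 12  # magic(4) + version(2) + byte order(1) + type(1) + size(4)


class CdrWriter:
    """Big-endian CDR writer for a few primitive types.

    Alignment is computed from the start of the GIOP message, so the writer
    is told how many octets (the header) precede its buffer.
    """

    def __init__(self, base: int = HEADER_SIZE):
        self.base = base
        self.buf = bytearray()

    def align(self, n: int) -> None:
        self.buf.extend(b"\x00" * (-(self.base + len(self.buf)) % n))

    def octet(self, value: int) -> None:  # octets and booleans need no padding
        self.buf.append(value)

    def ulong(self, value: int) -> None:
        self.align(4)
        self.buf.extend(struct.pack(">I", value))

    def octet_seq(self, data: bytes) -> None:  # sequence<octet>: length, then bytes
        self.ulong(len(data))
        self.buf.extend(data)

    def string(self, text: str) -> None:  # the length includes the terminating NUL
        self.octet_seq(text.encode("latin-1") + b"\x00")


def giop_message(msg_type: int, body: bytes) -> bytes:
    # "GIOP", version 1.0, byte-order flag 0 (big endian), message type, body size
    return struct.pack(">4sBBBBI", b"GIOP", 1, 0, 0, msg_type, len(body)) + body


req = CdrWriter()
req.ulong(0)           # service context list: zero entries
req.ulong(1)           # request id
req.octet(1)           # response expected (boolean TRUE)
req.octet_seq(b"key")  # object key (placeholder; normally vendor-defined)
req.string("add")      # operation name
req.octet_seq(b"")     # requesting principal (empty)
msg = giop_message(MSG_REQUEST, bytes(req.buf))
print(msg.hex())
```

Three padding octets follow the one-octet "response expected" flag, and one more precedes the operation-name length because "key" leaves the buffer at an odd offset; a decoder on any platform recomputes the same padding from the same rule.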

5.2 GoT Communication Protocol

Clients communicate with server dataframes through two mechanisms:

1. Fetch

2. Push

Both are explained in detail in this section. First, we briefly summarize the data format used to encode information in the protocol.

5.2.1 Data Format

Table 5.1 lists the fields that can be communicated using the GoT protocol. Examples of the data format in action can be found in the following sections.

Table 5.1: GoT Protocol Data Format

Field Name   | Field Index | Field Description
AppName      | 0           | The GoT application name
Data         | 1           | A section for specifying the primary key, dimensions, and diff data
RequestType  | 2           | Specifies whether the request is of type Push or Pull
StartVersion | 3           | The version ID in the Version Graph from which the diff is calculated
EndVersion   | 4           | The version ID in the Version Graph up to which the diff is calculated
Wait         | 5           | Specifies whether the server should wait until the merge is completed
WaitTimeout  | 6           | Used with the previous field; specifies in seconds how long to wait
Status       | 7           | Similar to HTTP status codes, e.g., 200 on successful acknowledgement
Types        | 8           | Specifies which PCC set classes are used in this particular transaction
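Because every field is addressed by its index, a client or server in any language only needs to agree on this table. As an illustration (not the framework's code), the Python sketch below names the indices with an `IntEnum` and builds a request dictionary keyed by the stringified index, as in the JSON listings that follow. The `Field`, `message`, and `field` names, and the reading of `0`/`1` in Field 2 as fetch/push, are our assumptions based on those listings.

```python
from enum import IntEnum


class Field(IntEnum):
    """Field indices from Table 5.1."""
    APP_NAME = 0
    DATA = 1
    REQUEST_TYPE = 2
    START_VERSION = 3
    END_VERSION = 4
    WAIT = 5
    WAIT_TIMEOUT = 6
    STATUS = 7
    TYPES = 8


# Values observed for Field 2 in the fetch and push examples (assumed meaning).
FETCH, PUSH = 0, 1


def message(**fields) -> dict:
    """Build a protocol message keyed by the field index as a string."""
    return {str(int(Field[name.upper()])): value for name, value in fields.items()}


def field(msg: dict, f: Field, default=None):
    """Read a field by name instead of by magic number."""
    return msg.get(str(int(f)), default)


fetch = message(
    app_name="GotCounter",
    request_type=FETCH,
    start_version="04a17d60-2df8-4c1b-94e0-19c3949b6a60",
    wait=False,
    wait_timeout=0.0,
    types={"datamodel.Counter": ["datamodel.Counter"]},
)
print(fetch)
```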

5.2.2 Fetch Cycle

In the fetch cycle, a client requests an update from the server, receives the data, and then sends an acknowledgement. An overview is shown in Figure 5.1. The fetch cycle involves three steps.

Figure 5.1: The fetch cycle sequence

Step 1: The client dataframe constructs the fetch request and sends it to the server. The client's construct fetch request method generates a request like the one illustrated in Listing 5.2:

{
    "0": "GotCounter",
    "2": 0,
    "3": "04a17d60-2df8-4c1b-94e0-19c3949b6a60",
    "5": false,
    "6": 0.0,
    "8": {
        "datamodel.Counter": [
            "datamodel.Counter"
        ]
    }
}

Listing 5.2: Fetch request example

Generating the request involves calling the repository manager's retrieve_data method, which traverses the version graph to acquire the appropriate version tags. The request is then encoded in the Concise Binary Object Representation (CBOR) [7], and the client's send fetch request method sends it to the server.
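The text does not name the CBOR library the dataframes use, so rather than guess, the sketch below hand-rolls the small subset of CBOR (RFC 8949) that messages like Listing 5.2 need: unsigned and negative integers, text and byte strings, arrays, maps, booleans, null, and 64-bit floats. It only shows what the wire encoding looks like; `cbor_encode` is a hypothetical helper, not a replacement for a real encoder, and it always emits floats in 8-byte form rather than the shortest form.

```python
import struct


def _head(major: int, arg: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (RFC 8949, section 3)."""
    if arg < 24:
        return bytes([major << 5 | arg])
    for info, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if arg < 1 << (8 * struct.calcsize(fmt)):
            return bytes([major << 5 | info]) + struct.pack(fmt, arg)
    raise ValueError("integer too large for a CBOR head")


def cbor_encode(obj) -> bytes:
    if obj is None:
        return b"\xf6"                          # simple value null
    if isinstance(obj, bool):                   # test before int: bool subclasses int
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, float):
        return b"\xfb" + struct.pack(">d", obj)  # always 64-bit, not shortest form
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(obj, (list, tuple)):
        return _head(4, len(obj)) + b"".join(cbor_encode(x) for x in obj)
    if isinstance(obj, dict):
        return _head(5, len(obj)) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in obj.items())
    raise TypeError(f"cannot CBOR-encode {type(obj).__name__}")


request = {"0": "GotCounter", "2": 0, "5": False, "6": 0.0}
print(cbor_encode(request).hex())
```

Keys such as "0" become one-character text strings, so each field index costs two bytes on the wire.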

Step 2: The server dataframe processes the request. First, the CBOR-encoded data package is decoded. Next, the type of request is determined, and if it is a fetch request, retrieve_data is executed to acquire the appropriate data from the version graph. The diff data is then packaged into the format shown in Listing 5.3.

{
    "0": "GotCounter",
    "1": {
        "datamodel.Counter": {
            "countDocument": {
                "dims": {
                    "count": {
                        "type": 0,
                        "value": 0
                    },
                    "pkey": {
                        "type": 1,
                        "value": "countDocument"
                    }
                },
                "types": {
                    "datamodel.Counter": 0
                }
            }
        }
    },
    "3": "ROOT",
    "4": "4e4b331b-900b-41f9-9c42-08142b5efc3d",
    "7": 200
}

Listing 5.3: Diff data before CBOR encoding

Step 3: The client receives the data above and calls receive_data on its version graph; if the call succeeds, it sends an acknowledgement back to the server. Finally, the server receives the acknowledgement of this update and records it in the repository.
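The three steps can be traced end to end with an in-memory toy. In the Python sketch below, `ToyServer` and `ToyClient` are hypothetical stand-ins: the server keeps a linear list of full snapshots instead of a version graph, diffs ignore deletions, and CBOR and the network are left out. The request and response do use the field indices from Table 5.1 and follow the request, diff response, and acknowledgement order described above.

```python
import uuid


class ToyServer:
    """Hypothetical server: a linear history of (version_id, snapshot) pairs."""

    def __init__(self):
        self.history = [("ROOT", {})]
        self.acked = {}  # client id -> last version the client acknowledged

    def commit(self, snapshot: dict) -> str:
        version = str(uuid.uuid4())
        self.history.append((version, dict(snapshot)))
        return version

    def handle_fetch(self, request: dict) -> dict:
        # Step 2: diff from the client's start version (Field 3) to the head
        start = request.get("3", "ROOT")
        old = dict(self.history)[start]
        head, new = self.history[-1]
        diff = {k: v for k, v in new.items() if old.get(k) != v}
        return {"0": request["0"], "1": diff, "3": start, "4": head, "7": 200}

    def handle_ack(self, client_id: str, version: str) -> None:
        # Step 3, server side: record the version the client now holds
        self.acked[client_id] = version


class ToyClient:
    def __init__(self, client_id: str, server: ToyServer):
        self.client_id, self.server = client_id, server
        self.version, self.state = "ROOT", {}

    def fetch(self) -> dict:
        # Step 1: build the fetch request (Field 2 = 0) from our current version
        request = {"0": "GotCounter", "2": 0, "3": self.version, "5": False, "6": 0.0}
        response = self.server.handle_fetch(request)
        # Step 3: apply the diff only if it was computed against our version
        if response["7"] == 200 and response["3"] == self.version:
            self.state.update(response["1"])
            self.version = response["4"]
            self.server.handle_ack(self.client_id, self.version)
        return self.state


server = ToyServer()
server.commit({"count": 0})
latest = server.commit({"count": 5})
client = ToyClient("browser-tab-1", server)
print(client.fetch())  # the client catches up from ROOT with a single diff
```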

5.2.3 Push Cycle

The push cycle is simpler, with only two steps, as summarized in Figure 5.2.

Figure 5.2: The push cycle sequence

Step 1: The client sends the diff data to the server in CBOR-encoded form. In the example shown in Listing 5.4, note that Field 1 (Data) is populated with the diff data.

{
    "0": "GotCounter",
    "1": {
        "datamodel.Counter": {
            "countDocument": {
                "dims": {
                    "count": {
                        "type": 0,
                        "value": 5
                    }
                },
                "types": {
                    "datamodel.Counter": 1
                }
            }
        }
    },
    "2": 1,
    "3": "04a17d60-2df8-4c1b-94e0-19c3949b6a60",
    "4": "aa4ba1b8-b34b-4154-89fb-96b8f8daf6ea",
    "5": false
}

Listing 5.4: An example push request

Step 2: The server decodes the data and recognizes from Field 2 (RequestType) that it is a push request. Since Field 5 (Wait) is false, an acknowledgement is sent immediately. The push is then processed by reading the diff data from Field 1 and applying it to the Version Graph through receive_data.
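The notable detail in Step 2 is ordering: with Wait (Field 5) set to false, the server acknowledges before it merges. The hypothetical Python dispatcher below makes that ordering observable by recording outgoing messages and queueing acknowledged pushes until `drain()` applies them. The real server's threading and version-graph merge are not modeled, and applying a diff is reduced to copying dimension values into a dictionary; `PushHandler` and its methods are illustrative names only.

```python
from collections import deque

FETCH, PUSH = 0, 1  # Field 2 values seen in Listings 5.2 and 5.4 (assumed meaning)


class PushHandler:
    """Hypothetical server-side handler for the push path only."""

    def __init__(self):
        self.state = {}         # (type name, object id) -> {dimension: value}
        self.sent = []          # messages sent back to clients, in order
        self.pending = deque()  # acknowledged pushes not yet merged

    def handle(self, request: dict) -> None:
        if request["2"] != PUSH:
            raise NotImplementedError("only the push path is sketched")
        ack = {"0": request["0"], "7": 200}
        if request.get("5", False):  # Wait = true: merge first, then acknowledge
            self.receive_data(request["1"])
            self.sent.append(ack)
        else:                        # Wait = false: acknowledge immediately
            self.sent.append(ack)
            self.pending.append(request["1"])

    def receive_data(self, diff: dict) -> None:
        for type_name, objects in diff.items():
            for oid, record in objects.items():
                target = self.state.setdefault((type_name, oid), {})
                for dim, cell in record.get("dims", {}).items():
                    target[dim] = cell["value"]

    def drain(self) -> None:
        while self.pending:
            self.receive_data(self.pending.popleft())


handler = PushHandler()
push = {
    "0": "GotCounter",
    "1": {"datamodel.Counter": {"countDocument": {
        "dims": {"count": {"type": 0, "value": 5}},
        "types": {"datamodel.Counter": 1}}}},
    "2": PUSH,
    "5": False,
}
handler.handle(push)
print(handler.sent, handler.state)  # acknowledged, but not merged yet
handler.drain()
print(handler.state)
```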

Chapter 6

Support for Spacetime Dataframes in the Browser

In this chapter, we build support for Spacetime dataframes in the browser. The solution must satisfy the following constraints:

• Allow the definition of Spacetime data models in Typescript with object tracker support

• Minimize the amount of complex code that needs to be ported over to Typescript from the original

• Make the core version graph function as efficiently as possible

Given the aforementioned constraints, we choose the following solution outline:

• Implement the core Version Graph functionality in C++ such that it can simultaneously target the WebAssembly runtime in the browser and the Linux server backend. This minimizes the amount of complex core code that needs to be ported and also delivers near-native speed for the core on both platforms.

• Use Embind to provide an inner binding through which the core Version Graph functionality is exposed to the browser's Javascript runtime

• Use Typescript to provide object tracking and the other end-user-level GoT functionality

• Use Websockify on the server side to work around network type incompatibilities between the WebAssembly runtime and the Linux backend

Figure 6.1: Detailed layered Spacetime architecture for enabling heterogeneous functionality

6.1 Extended Spacetime Abstraction Layers Overview

Figure 6.1 shows an overview of the solution architecture. The block on the left represents the components running in the browser, whereas the block on the right represents the components running in the Linux backend. Note that the server backend can handle multiple client dataframes, although for simplicity the diagram does not show this.

In the web browser, the bottom-most layer is a WebAssembly dataframe compiled from C++ source code. This component hosts the Spacetime Version Graph, which is essentially a directed acyclic graph (DAG); this core data structure records the various GoT operations. The diagram also mentions POSIX emulation, which is explained in an upcoming section.
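To make "essentially a DAG" concrete: each version points to the version or versions it was derived from, and a version with two parents records a merge. The Python sketch below is a hypothetical toy, not the C++ core. It shows two DAG properties that a merge step typically relies on: edges always point to versions that already exist, which rules out cycles by construction, and the lowest common ancestors of two concurrent versions can be found by walking parent edges.

```python
class VersionGraph:
    """Toy version DAG: each version id maps to the ids of its parent versions."""

    def __init__(self):
        self.parents = {"ROOT": ()}

    def add(self, version: str, *parents: str) -> None:
        if version in self.parents:
            raise ValueError(f"duplicate version {version!r}")
        if not parents or any(p not in self.parents for p in parents):
            raise ValueError("parents must be existing versions")
        # Edges only point at versions that already exist, so no cycle can form.
        self.parents[version] = parents

    def ancestors(self, version: str) -> set:
        """Every version reachable by following parent edges, including itself."""
        seen, stack = set(), [version]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.parents[v])
        return seen

    def merge_bases(self, a: str, b: str) -> set:
        """Lowest common ancestors of a and b (candidates for a three-way merge)."""
        common = self.ancestors(a) & self.ancestors(b)
        return {c for c in common
                if not any(c != d and c in self.ancestors(d) for d in common)}


graph = VersionGraph()
graph.add("v1", "ROOT")
graph.add("a", "v1")      # a commit from one node
graph.add("b", "v1")      # a concurrent commit from another node
graph.add("m", "a", "b")  # a merge version with two parents
print(sorted(graph.merge_bases("a", "b")))
```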

Above the core, an inner binding component based on Embind [1] exposes certain GoT operations to the Javascript part. This inner binding exposes the Repository abstraction of the WebAssembly core to the Javascript runtime environment.

On top of that, a Typescript-based outer binding provides convenient wrappers around the lower-level APIs. For example, the high-level checkout operation consists of a lower-level retrieve_data call followed by a receive_data call on the local heap.

Above the outer bindings sits the generated Typescript code, which implements the object tracker. The object tracker watches for changes to objects and, whenever a commit or checkout operation is executed, performs the necessary data transfer to and from the local heap.
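The object tracker's job can be illustrated without the Typescript machinery. The Python sketch below is a hypothetical stand-in: a class decorator intercepts attribute writes to record which dimensions changed, `commit` copies only those into a local-heap dictionary, and `checkout` writes heap values back without marking them as local changes. The names `tracked` and `LocalHeap` are ours; the generated code presumably achieves something similar through `pccSet` and `transformProps`, whose internals are not shown here.

```python
def tracked(*dimensions):
    """Hypothetical class decorator that records writes to the given dimensions."""
    def wrap(cls):
        def __setattr__(self, name, value):
            if name in dimensions:
                # Touch __dict__ directly so recording does not recurse.
                self.__dict__.setdefault("_dirty", set()).add(name)
            object.__setattr__(self, name, value)
        cls.__setattr__ = __setattr__
        return cls
    return wrap


class LocalHeap:
    """Hypothetical local heap keyed by (class name, primary key)."""

    def __init__(self):
        self.records = {}

    def commit(self, obj, pkey):
        """Copy only the dimensions written since the last commit."""
        dirty = sorted(obj.__dict__.pop("_dirty", set()))
        record = self.records.setdefault((type(obj).__name__, pkey), {})
        for name in dirty:
            record[name] = getattr(obj, name)
        return dirty

    def checkout(self, obj, pkey):
        """Load committed values without marking them as local edits."""
        for name, value in self.records.get((type(obj).__name__, pkey), {}).items():
            object.__setattr__(obj, name, value)
        return obj


@tracked("count", "users")
class Presence:
    def __init__(self):
        self.pkey = "Presence"  # primary key; not tracked in this sketch
        self.count = 0
        self.users = "[]"


heap = LocalHeap()
p = Presence()
print(heap.commit(p, "Presence"))  # both dimensions were assigned in __init__
p.count += 1
print(heap.commit(p, "Presence"))  # only the dimension changed since then
```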

At the topmost layer is the actual application logic of the Web Presence application. This is the only layer an application programmer has to be concerned with: it is where the actual computations are specified and where the GoT operations are used to handle data synchronization.

The Linux backend server dataframe mirrors the pieces described above, with one critical difference: an additional component, Websockify, converts WebSocket traffic into POSIX socket traffic and vice versa.

6.2 WebAssembly (WASM)

Any approach to bringing Spacetime functionality to the browser must first address how the core Version Graph is implemented. One option would be to re-implement the core in Javascript from scratch. However, the version graph is a complex, performance-critical component, and maintaining implementations of the same core in several languages is a burden, since every change to the original must be mirrored in each rewrite. Hence, we chose to target the browser's WebAssembly runtime instead.

In essence, WebAssembly is a portable, low-level bytecode for a virtual machine [21]. Inspired by its precursor asm.js, WebAssembly aims to execute at native speed while providing a memory-safe, sandboxed execution environment.

The WebAssembly runtime provides numerous benefits such as:

1. Safe: The WebAssembly runtime guarantees memory safety at the boundary of its sandbox: code cannot tamper with user data or system state outside it, even when compiled from low-level languages such as C++.

2. Fast: Low-level code can be optimized at compile time, and the executable can leverage the full potential of the underlying hardware without overheads such as garbage collection.

3. Portable: Targeting the web means running on a wide range of machine architectures, browsers, and operating systems. As of 2021, 92.69% of all browser users can execute code on the WebAssembly runtime [10].

4. Compact: The binary encoded WebAssembly format is much more compact than equivalent Javascript code, even when minified and compressed.

6.3 Emscripten

The Spacetime dataframe binary for the WebAssembly runtime is built with Emscripten [31], a compiler from LLVM (Low Level Virtual Machine) assembly to Javascript or WASM code. In other words, Emscripten acts as a backend for LLVM: frontend compilers such as Clang already translate C++ into LLVM assembly, and Emscripten translates that output into Javascript or WASM.

The code Emscripten generates is implicitly statically typed, which makes it typically faster than vanilla Javascript. Emscripten also includes a control-flow reconstruction algorithm called Relooper, which recovers high-level loop structures in Javascript from LLVM assembly code.

6.4 The Bindings

Extended Spacetime uses two sets of bindings, summarized in Figure 6.2:

• Outer bindings: These bindings are implemented in the application developer's language. Currently, there are bindings in Javascript and Python, which are discussed in detail in the next chapter. The outer bindings provide the object tracker, the local heap, and the high-level GoT APIs to the developer.

• Inner bindings: These bindings are the glue between the outer bindings and the Spacetime core's Repository APIs. Embind serves as this glue in the browser, whereas Python C++ bindings are used in the backend. Both are explained next.

Figure 6.2: Extended Spacetime bindings: purpose and implementation technologies

6.4.1 Backend Dataframe Inner Bindings (Python bindings)

On the backend, the inner bindings are implemented as a Python extension module [3]. For example, the receive_data API of the core is exposed as shown in Listing 6.1.

 1  #include <Python.h>
 2  #include <string>
 3  static PyObject *Repository_receive_data(
 4      RepositoryObject *self, PyObject *const *argv, Py_ssize_t argc) {
 5      if (argc != 5) {
 6          PyErr_BadArgument();
 7          return NULL;
 8      }
 9      std::string app_name = pystr_to_string(argv[0]);
10
11      std::string start_v = pystr_to_string(argv[1]);
12
13      std::string end_v = pystr_to_string(argv[2]);
14
15      // implement receive_data logic
16
17  }
18
19  static PyMethodDef Repository_methods[] = {
20      {"receive_data", (PyCFunction) Repository_receive_data,
21       METH_FASTCALL, "receive_data(str->std::string appname, "
22       "str->std::string start_v, str->std::string end_v, "
23       "bytes->json diff_data, bool from_external) -> bool succ"},
24      ...
25  };

Listing 6.1: Python Inner Bindings Excerpt

As shown above, Line 1 includes Python.h, which provides access to Python's C API. The argv array of PyObject pointers receives the arguments passed from the Python program, and pystr_to_string or similar convenience functions convert them into the appropriate C++ types.

6.4.2 WASM Dataframe Inner Bindings (Embind)

The inner bindings expose the lower-level functions of the core Spacetime Version Graph. For example, retrieve_data obtains data from the version graph, whereas receive_data puts data into it. Higher-level functions such as commit and checkout are implemented in the outer bindings on top of these primitives.
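The layering can be sketched in a few lines. In the hypothetical Python toy below, `ToyRepository` plays the role of the core and exposes only `retrieve_data` and `receive_data`, loosely echoing the names above, while `ToyDataframe` builds `commit` and `checkout` on top of them. To stay short, the core keeps a linear chain of diffs and rejects commits that are not based on the latest version, where the real Version Graph would branch and merge; the argument lists are simplified and do not match the real signatures.

```python
import uuid


class ToyRepository:
    """Stand-in for the core: a linear chain of versions and their diffs."""

    def __init__(self):
        self.order = ["ROOT"]
        self.diffs = {}  # version id -> diff that produced it

    def receive_data(self, start_v: str, end_v: str, diff: dict) -> bool:
        """Put data in: record `diff` as the step from start_v to end_v."""
        if start_v != self.order[-1]:
            return False  # a real version graph would branch and merge here
        self.order.append(end_v)
        self.diffs[end_v] = dict(diff)
        return True

    def retrieve_data(self, start_v: str):
        """Get data out: compose every diff after start_v; also return the head."""
        merged = {}
        for v in self.order[self.order.index(start_v) + 1:]:
            merged.update(self.diffs[v])
        return merged, self.order[-1]


class ToyDataframe:
    """Outer-binding role: high-level commit/checkout over the two primitives."""

    def __init__(self, core: ToyRepository):
        self.core, self.version, self.heap = core, "ROOT", {}

    def commit(self, changes: dict) -> str:
        new_v = str(uuid.uuid4())
        if not self.core.receive_data(self.version, new_v, changes):
            raise RuntimeError("commit rejected: check out the latest version first")
        self.heap.update(changes)
        self.version = new_v
        return new_v

    def checkout(self) -> dict:
        diff, head = self.core.retrieve_data(self.version)
        self.heap.update(diff)  # stands in for receive_data on the local heap
        self.version = head
        return dict(self.heap)


core = ToyRepository()
alice, bob = ToyDataframe(core), ToyDataframe(core)
alice.commit({"count": 1})
print(bob.checkout())
bob.commit({"count": 2})
print(alice.checkout())
```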

These functions are exposed to the higher layers through an Emscripten component called Embind.

Listing 6.2 shows how receive_data is exposed to the higher level (Javascript) through Embind:

#include <emscripten/bind.h>
#include <string>

using namespace emscripten;

class JsRepository
{
public:
    JsRepository() {}

    bool receive_data(std::string appname,
                      std::string start_v, std::string end_v,
                      std::string diff_data, bool from_external = true) {
        // method code
    }

    // other methods
}; // end class

EMSCRIPTEN_BINDINGS(my_module)
{
    class_<JsRepository>("JsRepository")
        .constructor<>()
        .function("receive_data", &JsRepository::receive_data);
}

Listing 6.2: Embind bindings for Javascript

With these bindings in place, receive_data can be invoked from Javascript once the WASM module has been initialized, as shown in Listing 6.3:

var instance = new Module.JsRepository();
// intermediate code
// Embind expects every bound argument; the C++ default for from_external is not applied
instance.receive_data(app_name, start_v, end_v, diff_data, true);

Listing 6.3: Initializing the WASM Spacetime core

6.5 Backend Dataframe Core

The backend core is built as a static library with clang++ and then installed as a Python extension module using the following script:

CXX=/usr/bin/clang++ cmake -DCMAKE_BUILD_TYPE=Release \
    -S core/ -B build_py/
CXX=/usr/bin/clang++ cmake --build build_py/
cd python && sudo python3 setup.py install

6.6 Websockify and Network Protocol Compatibility

According to the Emscripten networking documentation, Emscripten emulates POSIX socket connections on top of the WebSocket protocol [4]. The Spacetime networking library restricts itself to the limited set of calls demonstrated in the Emscripten sources [2]. While the emulation is seamless on the browser side, the backend server uses standard POSIX socket APIs so that it remains compatible with standard clients. Hence, a translation between WebSockets and POSIX sockets is required. The Emscripten documentation recommends Websockify, a tool that converts WebSocket traffic into POSIX-compatible traffic [11].
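Much of what the translation layer does per message is framing: browser-to-server bytes arrive wrapped in WebSocket frames (masked, as RFC 6455 requires for client frames) and must be unwrapped before being written to the TCP socket, while server bytes must be wrapped in frames on the way back. The Python sketch below shows only that per-frame step for complete, unfragmented frames. It is not Websockify's code, and it omits the HTTP upgrade handshake, fragmentation, and control frames such as ping and close; `decode_frame` and `encode_frame` are illustrative names.

```python
import struct


def decode_frame(frame: bytes):
    """Return (opcode, payload) for one complete, unfragmented WebSocket frame."""
    opcode = frame[0] & 0x0F
    masked = bool(frame[1] & 0x80)
    length = frame[1] & 0x7F
    pos = 2
    if length == 126:    # 16-bit extended payload length
        (length,) = struct.unpack_from(">H", frame, pos)
        pos += 2
    elif length == 127:  # 64-bit extended payload length
        (length,) = struct.unpack_from(">Q", frame, pos)
        pos += 8
    mask = b"\x00\x00\x00\x00"
    if masked:           # client frames carry a 4-byte masking key
        mask = frame[pos:pos + 4]
        pos += 4
    data = frame[pos:pos + length]
    return opcode, bytes(c ^ mask[i % 4] for i, c in enumerate(data))


def encode_frame(payload: bytes, opcode: int = 0x2) -> bytes:
    """Build an unmasked server-to-client frame with FIN set (0x2 = binary)."""
    n = len(payload)
    if n < 126:
        head = bytes([0x80 | opcode, n])
    elif n < 1 << 16:
        head = bytes([0x80 | opcode, 126]) + struct.pack(">H", n)
    else:
        head = bytes([0x80 | opcode, 127]) + struct.pack(">Q", n)
    return head + payload


# Masked single-frame text message "Hello" from RFC 6455, section 5.7.
frame = bytes([0x81, 0x85, 0x37, 0xFA, 0x21, 0x3D, 0x7F, 0x9F, 0x4D, 0x51, 0x58])
opcode, payload = decode_frame(frame)
print(opcode, payload)
```

In a relay, the decoded payload is what would be written to the backend's TCP socket, and bytes read back from that socket would be wrapped by something like `encode_frame` before being sent to the browser.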

Chapter 7

Validation: Building an Online Presence Application

We validate the extended Spacetime framework with an example application consisting of Spacetime nodes that run in the web browser and communicate with a Linux backend server. Both sides are built on top of object classes generated from the IDL described in Chapter 4.

7.1 Application Requirements

We design a simple Web Presence application on top of the extended Spacetime framework. The application must fulfill the following requirements:

• Allow guests to join a Web Presence space with their choice of user names

• Show a list of all the active users in the Web Presence space in real-time

• Allow logged-in users to exit (log out of) the Web Presence space, updating the displays of all other logged-in users to reflect this change

7.2 Generic Interface Design for the Web Presence Application

The first step when writing a GoT application is designing the data model. Here, the goal is to track "How many users are online?" and "What are their usernames?" This suggests at least two dimensions: an integer count, and a string users holding the list of names (stored as a JSON array, as the initializers below show).

Using the Interface Definition Language described in Chapter 4, one can come up with an interface definition as follows:

class Presence:
    primary str pkey
    int count
    str users
    merge func handle_conflicts

The file above, saved as IPresence.pcc, can be used to generate both Python and Typescript interface definitions with the following command:

python3 rtypegen.py -i IPresence.pcc

This command generates two files: IPresence.py and IPresence.ts.

We will take a deeper look at both the outputs in the subsequent sections.

7.2.1 Generated Python Class: IPresence.py

For Python, the output, with initialization code added afterwards, is shown in Listing 7.1:

 1  from rtypes import primarykey, dimension, pcc_set
 2
 3  @pcc_set
 4  class Presence:
 5      pkey = primarykey(str)
 6      count = dimension(int)
 7      users = dimension(str)
 8
 9      def handle_conflicts(self):
10          pass
11
12      def __init__(self):
13          # the following block is added manually post-generation
14          self.pkey = "Presence"
15          self.count = 0
16          self.users = "[]"

Listing 7.1: Generated IPresence.py

Line 1 imports everything necessary to initialize the object tracker. The primarykey and dimension functions mark which fields need to be synchronized and with what type. The pcc_set decorator attaches the object tracker components to the declared dimensions. The Presence type can now be used directly when initializing a dataframe.

7.2.2 Generated Typescript Class: IPresence.ts

At the same time, the interface stub generator rtypegen also generates a Typescript class, shown in Listing 7.2 (with initializers added after generation):

import { dimension, primarykey, Dimension }
    from 'rtypes/attributes';
import { Datatype } from 'rtypes/utils/enums';
import { pccSet } from 'rtypes/types/pcc_set';

@pccSet
export class Presence {
    name: string;

    _pkey: Dimension;
    _count: Dimension;
    _users: Dimension;

    count: any;
    pkey: any;
    users: any;

    /* ### Don't remove these ### */
    static dtype: Map = new Map();
    get dtype() { return Presence.dtype; }
    transformProps: any;
    /* ### */

    constructor() {
        this._pkey = primarykey(Datatype.STRING);
        this._count = dimension(Datatype.INTEGER);
        this._users = dimension(Datatype.STRING);
        this.name = "datamodel.Presence";

        /* ### don't remove the following ### */
        this.transformProps();

        // the following initializers were added
        // after the class was generated
        this.pkey = "Presence";
        this.count = 0;
        this.users = "[]";
    }

    handle_conflicts() {

    }
}

Listing 7.2: Generated IPresence.ts

The generated Typescript stub involves more machinery than the Python output, owing to Typescript's runtime restrictions and its experimental support for decorators [8]. Each IDL dimension is declared twice: first under an underscore-prefixed name that holds its dimension type, and second under its plain name, which holds the actual value. The application programmer only needs to deal with the plain names and can ignore the underscore-prefixed attributes.

7.3 Implementing Application Logic Using the Generated Classes

The generated classes simplify the task of implementing the application requirements. On the server side, we show how the application server is instantiated with the generated Python class. On the browser side, we use a short piece of Javascript to show how a user logs in.

7.3.1 Client Logic in the Browser Using Typescript Binding

The outer binding in Typescript provides features such as object tracking, dataframe initialization, and GoT's high-level APIs.

Listing 7.3 shows how a GoT repository is initialized and how a series of pull, change, commit, and push operations is then performed.

 1  import { Presence } from './presence';
 2
 3  // Module refers to the WASM Spacetime core
 4  var instance = new
 5      Module.JsRepository();
 6  var presence_obj = new Presence();
 7  var df = new Spacetime.Dataframe(
 8      "PresenceApp",
 9      [Presence],
10      "127.0.0.1", 59159, instance);
11  var person = prompt("Please share your username:", "Harry Potter");
12  var initPresence = setInterval(function() {
13      var dp = df.pull();
14      presence_obj = df.read_one(presence_obj, "Presence");
15      if (dp == true) {
16          presence_obj.count += 1;
17          var user_list = JSON.parse(presence_obj.users);
18          user_list.push(person);
19          presence_obj.users = JSON.stringify(user_list);
20      }
21      df.commit();
22      dp = df.push();
23      if (dp == true) {
24          clearInterval(initPresence);
25      }
26  }, 30);
27
28  ...

Listing 7.3: Client Logic in the browser using Typescript

Lines 4-5 create a new instance of the WASM Spacetime core.

Line 6 uses the generated class to create a new Presence object.

Lines 7-10 initialize a Spacetime dataframe with the following parameters: the application name, the application types, the server host and port, and finally a reference to the Spacetime WASM core.

Lines 12-26 set up an interaction loop, repeated every 30 ms, in which the various GoT operations are performed on the presence object.

Line 24 runs once the push has succeeded and stops the loop, completing the log-in. The operations for the "display update" and "exit/log out" use cases are similar to the above use case.
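Since the text only says that the log-out case is similar, here is a hedged Python sketch of the data change it would involve. `log_in` mirrors lines 16-19 of Listing 7.3; `log_out` is our assumption of the inverse. Both operate on a plain object with the Presence fields (not the generated, tracked class) and would sit between a pull and a commit/push, as in Listing 7.3.

```python
import json
from types import SimpleNamespace


def log_in(presence, username: str) -> None:
    """Mirror of Listing 7.3, lines 16-19: bump the count and append the name."""
    users = json.loads(presence.users)  # users is stored as a JSON array string
    users.append(username)
    presence.users = json.dumps(users)
    presence.count += 1


def log_out(presence, username: str) -> None:
    """Hypothetical inverse: remove one occurrence of the name, if present."""
    users = json.loads(presence.users)
    if username in users:
        users.remove(username)
        presence.users = json.dumps(users)
        presence.count -= 1


# Plain stand-in with the Presence fields.
presence = SimpleNamespace(pkey="Presence", count=0, users="[]")
log_in(presence, "Harry Potter")
log_in(presence, "Random Guest")
log_out(presence, "Random Guest")
print(presence.count, presence.users)
```

Keeping count and users in separate dimensions means both must change together; deriving the count from the length of the list would be an alternative design.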

7.3.2 Server Logic in the Backend Using Python Binding

The Python outer bindings implement object tracking, the local heap, and the GoT APIs in Python. For the Web Presence server, the application logic is implemented in Listing 7.4.

 1  import sys
 2  import json
 3  import time
 4
 5  from datamodel import Presence
 6  from spacetime import Node
 7
 8  def host(df):
 9      present_count = -1
10      presence = Presence()
11      df.add_one(Presence, presence)
12      df.commit()
13      while True:
14          df.checkout()
15          presence = df.read_one(Presence, "Presence")
16          incoming_users = json.loads(presence.users)  # users is a JSON array string
17          if present_count != presence.count:
18              present_count = presence.count
19              print("No. of users online = ", present_count, flush=True)
20              print("Users online: ", incoming_users)
21          st = time.perf_counter()
22          if df:
23              et = time.perf_counter()
24              if (et - st) < 0.3:
25                  time.sleep(et - st)
26
27  if __name__ == "__main__":
28      Node(host, server_port=59160, is_server=True,
29           Types=[Presence], pure_python=False).start()

Listing 7.4: Presence Server

In Line 5, we import the generated model.

In Lines 10-12, we create the initial Presence object, add it to the server dataframe, and commit it. From this point on, clients can obtain this object through a fetch request, make changes to it, and push updates back to the server.

From line 13 onward, within a loop, we use the GoT APIs supplied by the outer binding to check out data and display the latest user count.

Lines 27-29 initialize the Node. Of interest here is the server port (59160), which differs from the port used in the Typescript client (59159); the reason is explained in the Websockify section that follows.

7.4 Websockify: Translate Between WebSockets and POSIX Network Calls

As explained in the previous chapter, the network traffic between POSIX-based servers and WebSockets-based browser clients requires a translation layer in between. In our example, Websockify listens on port 59159 as this translation layer. The browser client connects to port 59159, and Websockify forwards the traffic to port 59160, where the Python server listens. Traffic in the opposite direction is translated in the same way.

7.5 Result

(a) Tab 1: Harry Potter goes Online (b) Tab 2: Shrijith Venkatramana goes Online

(c) Tab 3: Random Guest goes Online (d) Tab 3: Random Guest Logs Off

(e) Tab 2: Shrijith Venkatramana notices Count and Userlist updated

Figure 7.1: A sample interaction with the Web Presence Application

Chapter 8

Discussion: Comparison with Parse Platform

Parse Platform is described by its official website as infrastructure to "Build applications faster with object and file storage, user authentication, push notifications, dashboard and more out of the box" [5]. The platform comes with SDKs and bindings for many languages, including Javascript and Python. It is usually listed in the mobile backend as a service (MBaaS) category of applications, where the best-known comparable product is Google's Firebase [25].

Among contemporary systems capable of object tracking and synchronization, Parse Platform shares at least some of the GoT model's goals from an application programming point of view. This chapter describes a re-implementation of the Web Presence application on Parse Platform and then outlines the similarities and differences between the two systems.

Figure 8.1: Parse Platform Architecture

8.1 Parse Platform Architecture

The Parse Platform architecture consists primarily of three layers. At the bottom is an object persistence layer implemented with MongoDB or the file system. On top of this layer is the Parse Server, which provides the core functionality, such as object store APIs, queries, and push notifications. These two layers constitute the backend of the Parse Platform. The backend services can be accessed via client SDKs or REST APIs, which form the top layer. Figure 8.1 depicts this architecture.

8.2 Implementing the Web Presence Application in Parse Platform

While GoT requires the application developer to think ahead and define the data model upfront, Parse Platform allows the developer to define and modify classes at runtime.

Listing 8.1 shows a re-implementation of the Web Presence application from the previous chapter using Parse Platform's Typescript bindings.

 1  // Initialize with app id, javascript key, master key
 2  Parse.initialize("webpresence", "webpresence", "webpresence");
 3  // backend URL
 4  Parse.serverURL = 'http://localhost:1337/parse';
 5
 6  // obtain the class reference
 7  const WebPresence = Parse.Object.extend("WebPresence");
 8  const query = new Parse.Query(WebPresence);
 9  query.equalTo("appName", "WebPresence");
10  // check if the object exists in the backend already
11  const results = query.find();
12
13  results.then((r) =>
14  {
15      if (r.length == 0)
16      {
17          // initialize the object for the first time
18          wp = new WebPresence();
19
20          wp.set("appName", "WebPresence");
21          wp.set("count", 1);
22          wp.set("users", [person]);
23          wp.save() // sync to the backend
24          .then((wp) =>
25          {
26              console.log('New object created with objectId: '
27                  + wp.id);
28          }
29          );
30      }
31      else
32      {
33          // acquire reference to the existing object
34          wp = r[0];
35          wp.set("count", parseInt(wp.get("count")) + 1);
36          var tmp = wp.get("users");
37          tmp.push(person);
38          wp.set("users", tmp);
39          wp.save() // sync to backend
40          .then((wp) =>
41          {}
42          );
43
44      }
45  });

Listing 8.1: WebPresence Implemented through Parse Platform Typescript Bindings

Lines 1-4 configure the client with the application keys and the URL of the backend server.

Line 7 obtains a reference to the WebPresence class.

Lines 8-11 query the backend for all objects whose appName attribute is set to "WebPresence".

Lines 13-15 handle the resolution of the query promise: line 15 checks whether a matching object already exists.

Lines 17-28 are executed only by the first client. This section populates the WebPresence object for the first time and then uses the save() method to synchronize this data to the backend.

Lines 33-42 are executed by every client other than the first. This section increments the count, appends the current user to the users list, and then syncs the data to the backend.
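Lines 33-42 follow a read-modify-write pattern: a client reads the object, changes it locally and saves it back. The self-contained sketch below, which assumes a hypothetical in-memory `FakeBackend` whose `save()` simply overwrites the stored attributes, illustrates what can happen when two clients interleave these steps.

```typescript
interface PresenceState {
  count: number;
  users: string[];
}

// Hypothetical in-memory stand-in for the backend. save() replaces the stored
// attributes with the ones the client sends, so the last writer wins.
class FakeBackend {
  private state: PresenceState = { count: 1, users: ["alice"] };

  // Returns a copy, as a network fetch would.
  read(): PresenceState {
    return { count: this.state.count, users: this.state.users.slice() };
  }

  save(update: PresenceState): void {
    this.state = { count: update.count, users: update.users.slice() };
  }
}

const backend = new FakeBackend();

// Two clients load the object before either of them has saved.
const bob = backend.read();
const carol = backend.read();

// Each performs the read-modify-write of Listing 8.1, lines 35-38.
bob.count += 1;
bob.users.push("bob");
carol.count += 1;
carol.users.push("carol");

backend.save(bob);
backend.save(carol); // silently overwrites bob's update

const result = backend.read();
console.log(result.count, result.users.join(", ")); // 2 alice, carol
```

For counters and arrays specifically, the Parse JavaScript SDK offers atomic `increment()` and `add()` operations on `Parse.Object` that avoid this particular lost update; what it lacks is a general way to reconcile diverging writes, which is the gap captured by the conflict resolution row of Table 8.1.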

8.3 Developer API Comparison for Extended Spacetime and Parse Platform

The GoT model borrows from Git, the widely used distributed version control system, and clearly separates the working copy of the data from the dataframe that communicates with other nodes. The operation names commit, push, fetch and checkout come from Git. The Parse Platform has no notion of a working copy, so it uses save and fetch to send and receive updates, respectively. Table 8.1 compares the high-level APIs of GoT and Parse Platform.

Table 8.1: Application Developer API comparison for Extended Spacetime and Parse Platform

| Dimension           | Extended Spacetime             | Parse Platform |
|---------------------|--------------------------------|----------------|
| Initialize          | node()/dataframe()             | initialize()   |
| Send updates        | commit() + push()              | save()         |
| Receive updates     | fetch() + merge() + checkout() | fetch()        |
| Conflict resolution | User-defined merge function    | No support     |
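The send/receive split in Table 8.1 can be sketched with a minimal, single-object model of the GoT workflow. Everything here (`Remote`, `GotNode`, `mergePresence`) is hypothetical and far simpler than Spacetime: versions form a linear list rather than a DAG, and merging happens only at push time, when the remote has moved past the node's last synchronized version.

```typescript
interface Presence {
  count: number;
  users: string[];
}

// User-defined three-way merge: base is the common ancestor; ours and theirs diverged from it.
type MergeFn = (base: Presence, ours: Presence, theirs: Presence) => Presence;

interface Version {
  id: number;
  state: Presence;
}

const clone = (p: Presence): Presence => ({ count: p.count, users: p.users.slice() });

// Hypothetical stand-in for a remote dataframe: an append-only list of versions.
class Remote {
  private versions: Version[] = [{ id: 0, state: { count: 0, users: [] } }];

  head(): Version {
    return this.versions[this.versions.length - 1];
  }

  append(state: Presence): Version {
    const v: Version = { id: this.versions.length, state: clone(state) };
    this.versions.push(v);
    return v;
  }
}

// Hypothetical node: the application edits `working`; nothing leaves the node
// until commit() stages it and push() sends it.
class GotNode {
  working: Presence;
  private base: Version; // last version this node synchronized with
  private staged: Presence | null = null;
  private readonly remote: Remote;
  private readonly merge: MergeFn;

  constructor(remote: Remote, merge: MergeFn) {
    this.remote = remote;
    this.merge = merge;
    this.base = remote.head();
    this.working = clone(this.base.state);
  }

  commit(): void {
    this.staged = clone(this.working);
  }

  push(): void {
    const staged = this.staged;
    if (staged === null) return;
    const head = this.remote.head();
    // If another node pushed since our base, the histories diverged: merge them.
    const next =
      head.id === this.base.id ? staged : this.merge(this.base.state, staged, head.state);
    this.base = this.remote.append(next);
    this.staged = null;
  }

  fetch(): Version {
    return this.remote.head();
  }

  checkout(v: Version): void {
    this.base = v;
    this.working = clone(v.state);
  }
}

// Merge for the presence object: replay our increment on top of theirs and
// append the users we added that they do not already list.
const mergePresence: MergeFn = (base, ours, theirs) => ({
  count: theirs.count + (ours.count - base.count),
  users: theirs.users.concat(
    ours.users.filter((u) => base.users.indexOf(u) === -1 && theirs.users.indexOf(u) === -1)
  ),
});

const remote = new Remote();
const a = new GotNode(remote, mergePresence);
const b = new GotNode(remote, mergePresence);

// Both nodes start from version 0 and update concurrently.
a.working.count += 1;
a.working.users.push("alice");
a.commit();
b.working.count += 1;
b.working.users.push("bob");
b.commit();

a.push(); // fast-forward: version 1
b.push(); // diverged from version 0: merged into version 2

a.checkout(a.fetch());
console.log(a.working.count, a.working.users.join(", ")); // 2 alice, bob
```

Because each node remembers the version it last synchronized with, the concurrent update from node `b` is detected and reconciled by the application's merge function instead of overwriting node `a`'s update.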

8.4 Feature Comparison for Extended Spacetime and Parse Platform

Table 8.2 compares Extended Spacetime and Parse Platform along multiple dimensions, and comments on each dimension follow.

Table 8.2: Summary of comparison between Spacetime and Parse Platform

| Dimension                         | Extended Spacetime | Parse Platform |
|-----------------------------------|--------------------|----------------|
| Object tracking                   | Yes                | Yes            |
| Multiple Language Support         | Yes                | Yes            |
| Object Persistence                | No                 | Yes            |
| Interface Definition Language     | Yes                | No             |
| Runtime Modification to Classes   | No                 | Yes            |
| Conflict Resolution               | Yes                | No             |
| Advanced Object Queries           | No                 | Yes            |
| User Management                   | No                 | Yes            |
| Fine-grained Push Notifications   | No                 | Yes            |
| Dashboard to manage multiple apps | No                 | Yes            |
| Class level permissions and ACLs  | No                 | Yes            |

• Object tracking: Both Spacetime and Parse Platform provide convenient object tracking features in various languages, and both can send only the changes (diffs) rather than whole objects.

• Multiple Language Support: While both Spacetime and Parse Platform support multiple languages, Parse Platform, owing to its overall maturity, supports a wider and deeper range of languages and platforms.

• Object Persistence: Parse Platform provides persistence through MongoDB or the File System, whereas Spacetime does not currently support persistence.

• Interface Definition Language: Extended Spacetime comes with an Interface Definition Language and a code generator that can quickly generate stubs for various languages. Parse Platform lacks this, so class definitions written separately in each language can diverge, placing greater demands on the developer over the long run.

• Run-time modification to classes: The Parse Platform's application classes are dynamic, and programmers can add attributes as they go. GoT imposes a much stricter, static constraint and expects developers to define the data model upfront.

• Conflict Resolution: Because Spacetime has a DAG-based version graph at its core, it supports conflict resolution whenever a divergence is detected. Parse Platform has no such mechanism.

• Advanced Object Queries: Parse Platform allows users to construct object queries that retrieve objects by precise criteria. The GoT model permits such a feature, but the current Spacetime implementation does not support it yet.

• User Management: Parse Platform supports user management, session creation and user-specific functionality. Spacetime has no user management.

• Fine-grained Push Notifications: Parse Platform can target push notifications at individual devices or at groups of devices selected by various filters. Spacetime does not currently provide push notifications.

• Dashboard: Multiple Parse apps can be managed through a web UI, whereas GoT does not provide such a feature.

• Class Level Permissions and ACLs: In the Parse Platform, class-level permissions and ACLs can be used to restrict access to objects based on predefined rules. Spacetime does not support access control.

• Interactive Debugger: Spacetime supports an Interactive Distributed Debugger, whereas Parse Server provides no such option.
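The diff-based tracking mentioned under Object tracking can be illustrated with a small TypeScript function that compares an object's current attributes with the snapshot taken at the last synchronization. The `Attrs` type and `diff` function are hypothetical and are not taken from either framework; they only show the idea of sending changed attributes rather than whole objects.

```typescript
type Attrs = { [key: string]: unknown };

// Returns only the attributes whose values differ from the last synchronized
// snapshot. JSON.stringify serves as a simple structural comparison; it is
// sensitive to key order in nested objects, and deleted keys are not reported.
function diff(lastSynced: Attrs, current: Attrs): Attrs {
  const changes: Attrs = {};
  for (const key of Object.keys(current)) {
    if (JSON.stringify(current[key]) !== JSON.stringify(lastSynced[key])) {
      changes[key] = current[key];
    }
  }
  return changes;
}

const synced: Attrs = { appName: "WebPresence", count: 3, users: ["alice", "bob"] };
const now: Attrs = { appName: "WebPresence", count: 4, users: ["alice", "bob", "carol"] };

// Only count and users would go over the wire; appName is unchanged.
console.log(JSON.stringify(diff(synced, now))); // {"count":4,"users":["alice","bob","carol"]}
```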

Chapter 9

Future Work and Conclusion

9.1 Future Work

There are many possibilities for further exploration, and a few of them are listed below.

9.1.1 Object Persistence

The GoT model uses a directed acyclic graph as its core data structure. At present, once all the nodes have shut down, the data is lost, yet many scenarios require object persistence. For instance, the Parse Platform uses MongoDB and the File System in its object persistence layer.

9.1.2 Advanced Object Queries

The GoT model, through its PCC set abstraction, can in principle support advanced object queries. Implementing this would allow one to query objects using arbitrary object selection criteria.
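As a rough illustration, a query can be expressed as a predicate over objects of a base type and evaluated against a node's current snapshot. This TypeScript sketch rests on our reading of the PCC idea (a set of objects defined by a predicate over objects of another type); `Visitor` and `selectWhere` are hypothetical and not part of Spacetime.

```typescript
interface Visitor {
  name: string;
  appName: string;
  visits: number;
}

type Predicate<T> = (obj: T) => boolean;

// Evaluates a query, expressed as a conjunction of predicates, against the
// objects currently in a node's snapshot (hypothetical helper).
function selectWhere<T>(snapshot: T[], ...predicates: Predicate<T>[]): T[] {
  return snapshot.filter((obj) => predicates.every((p) => p(obj)));
}

const snapshot: Visitor[] = [
  { name: "alice", appName: "WebPresence", visits: 3 },
  { name: "bob", appName: "WebPresence", visits: 1 },
  { name: "carol", appName: "Chat", visits: 5 },
];

const frequent = selectWhere(
  snapshot,
  (v) => v.appName === "WebPresence",
  (v) => v.visits >= 2
);
console.log(frequent.map((v) => v.name).join(", ")); // alice
```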

9.1.3 More Language Bindings and Runtimes

At present, the extended Spacetime framework supports browser clients and Linux backends. Other major platforms, such as Android and iOS, still require framework support.

9.1.4 Class Level Permissions and ACLs

One of the key ideas in the Parse Platform is user roles and the granular permission management they enable. At present, the extended Spacetime framework does not support any form of permission management.
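A minimal sketch of what per-object permission checks could look like, assuming a hypothetical ACL representation that maps a user id, or `role:<name>` for a role, to the permissions it grants. `Acl` and `isAllowed` are illustrative names, not the Parse Platform's ACL API.

```typescript
type Permission = "read" | "write";

// Hypothetical per-object ACL: maps a principal (a user id, or "role:<name>"
// for a role) to the permissions granted to it.
type Acl = { [principal: string]: Permission[] };

function isAllowed(acl: Acl, userId: string, roles: string[], perm: Permission): boolean {
  const principals = [userId].concat(roles.map((r) => "role:" + r));
  return principals.some((p) => (acl[p] || []).indexOf(perm) !== -1);
}

const presenceAcl: Acl = {
  "role:admin": ["read", "write"],
  alice: ["read"],
};

// A node could run such a check before accepting a pushed version.
console.log(isAllowed(presenceAcl, "alice", [], "write")); // false
console.log(isAllowed(presenceAcl, "bob", ["admin"], "write")); // true
```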

9.1.5 Advanced Data Types

At present, the extended Spacetime framework supports only primitive data types, and there is considerable room for improvement in this area. For example, a collection type that can be synchronized through GoT could enable a whole host of interesting user-level applications.
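One way such a collection type could be reconciled under GoT is a three-way merge against the common ancestor version, in the spirit of a user-defined merge function. The `mergeSet` function below is a hypothetical sketch: elements added on either side are kept, and an element removed on either side relative to the base is removed from the result.

```typescript
// Three-way merge for a set-valued attribute, relative to the common ancestor (base).
function mergeSet<T>(base: Set<T>, ours: Set<T>, theirs: Set<T>): Set<T> {
  const result = new Set<T>();
  theirs.forEach((x) => result.add(x)); // start from the other side's version
  ours.forEach((x) => {
    if (!base.has(x)) result.add(x); // elements we added
  });
  base.forEach((x) => {
    if (!ours.has(x)) result.delete(x); // elements we removed
  });
  return result;
}

const base = new Set(["alice", "bob"]);
const ours = new Set(["alice", "carol"]); // we added carol and removed bob
const theirs = new Set(["alice", "bob", "dave"]); // they added dave

const merged: string[] = [];
mergeSet(base, ours, theirs).forEach((x) => merged.push(x));
console.log(merged.sort().join(", ")); // alice, carol, dave
```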

9.2 Conclusion

In this thesis, I presented an extension to Spacetime, the original implementation of GoT, that enables it to function across heterogeneous languages and platforms. I validated this extension through a demonstration of a Web Presence application in which browser-based JavaScript Spacetime clients communicate with a Linux-based Python Spacetime server node.

The demonstration established the feasibility of GoT as a distributed programming model that can, in practice, interoperate among heterogeneous languages and platforms. At the same time, the thesis is limited in the breadth of platforms and languages it covers. For instance, both languages discussed in this thesis are dynamic, so the challenges of communication between static and dynamic languages remain unexplored. Moreover, the feasibility of operating Spacetime in resource-constrained environments such as mobile devices has not yet been established.

Finally, I discussed how the Spacetime framework can be extended further, informed by a detailed comparison with the Parse Platform. In my observation, the possibilities for GoT's growth are of two types. The first type extends the core of the model itself, for example to support advanced data structures such as collections. The second type eases the effort required to build non-trivial applications on the platform; an example would be mechanisms for supporting application-level concerns such as users, permissions and ACLs. In conclusion, the Spacetime framework has many avenues for further valuable extensions.

Bibliography

[1] Embind: bind C++ functions and classes to JavaScript. https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind..

[2] Emscripten Tests - Echo Client. https://github.com/emscripten-core/emscripten.

[3] Extending Python with C or C++ (Python 3.9.1 documentation). https://docs.python.org/3/extending/extending.html.

[4] Networking (Emscripten 2.0.12 documentation). https://emscripten.org/docs/porting/networking.html.

[5] Parse Platform Homepage. https://parseplatform.org/.

[6] RFC 5531: Open Network Computing (ONC) Remote Procedure Call (RPC) version 2 protocol. https://tools.ietf.org/html/rfc5531.

[7] RFC 7049: Concise Binary Object Representation (CBOR). https://www.hjp.at/doc/rfc/rfc7049.html.

[8] TypeScript Decorators overview. https://www.typescriptlang.org/docs/handbook/decorators.html.

[9] Under the hood: IORs, GIOP and IIOP. http://www.ibm.com/developerworks/library/ws-underhood/index.htm.

[10] Web Assembly Usage/Support Statistics. https://caniuse.com/wasm.

[11] Websockify, a WebSocket to TCP proxy/bridge. https://github.com/novnc/websockify.

[12] R. Achar. The global object tracker: decentralized version control for replicated objects. PhD thesis, University of California, Irvine, 2020.

[13] R. Achar and C. V. Lopes. GoT: Git, but for Objects. arXiv e-prints, page arXiv:1904.06584, Apr 2019.

[14] P. A. Bernstein. Middleware: a model for distributed system services. Communications of the ACM, 39(2):86–98, 1996.

[15] G. S. Blair, M. Paolucci, P. Grace, and N. Georgantas. Interoperability in complex distributed systems. In International School on Formal Methods for the Design of Computer, Communication and Software Systems, pages 1–26. Springer, 2011.

[16] N. Carriero and D. Gelernter. Linda in context. Communications of the ACM, 32(4):444–458, 1989.

[17] A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Achieving scalability and expressiveness in an internet-scale event notification service. In Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing, pages 219–227, 2000.

[18] T. H. Dineen, P. J. Leach, N. Mishkin, J. N. Pato, and G. L. Wyant. The network computing architecture and system: An environment for developing distributed applications. In COMPCON, pages 296–299, 1988.

[19] C. Exton, D. Watkins, and D. Thompson. Comparisons between CORBA IDL & COM/DCOM MIDL: interfaces for distributed computing. In Proceedings. Technology of Object-Oriented Languages and Systems, TOOLS 25 (Cat. No.97TB100239), pages 15–32, Nov 1997.

[20] E. Freeman, S. Hupfer, and K. Arnold. JavaSpaces: principles, patterns, and practice. Addison-Wesley Professional, 1999.

[21] A. Haas, A. Rossberg, D. L. Schuff, B. L. Titzer, M. Holman, D. Gohman, L. Wagner, A. Zakai, and J. Bastien. Bringing the web up to speed with WebAssembly. page 16.

[22] M. Hapner, R. Burridge, R. Sharma, J. Fialli, and K. Stout. Java message service. Sun Microsystems Inc., Santa Clara, CA, 9, 2002.

[23] M. Henning. The rise and fall of CORBA: There’s a lot we can learn from CORBA’s mistakes. Queue, 4(5):28–34, Jun 2006.

[24] J. Loeliger and M. McCullough. Version Control with Git: Powerful tools and techniques for collaborative software development. O’Reilly Media, Inc., 2012.

[25] L. Moroney. The Firebase Realtime Database, page 51–71. Apress, 2017.

[26] O. Office. MS-DCOM: Overview.

[27] A. Redkar, K. Rabold, R. Costall, S. Boyd, and C. Walzer. Pro MSMQ: Microsoft Message Queue Programming. Apress, 2004.

[28] A. Schill. DCE: The OSF Distributed Computing Environment, Client/Server Model and Beyond. International DCE Workshop, Karlsruhe, Germany, October 7–8, 1993, Proceedings. In Conference proceedings DCE, page 24. Springer, 1993.

[29] R. Sessions. COM and DCOM: Microsoft’s vision for distributed objects. John Wiley & Sons, Inc., 1997.

[30] S. Vinoski. CORBA: Integrating diverse applications within distributed heterogeneous environments. IEEE Communications Magazine, 35(2):46–55, 1997.

[31] A. Zakai. Emscripten: an llvm-to-javascript compiler. In Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion - SPLASH ’11, page 301. ACM Press, 2011.
