USING A PHYSICAL METAPHOR TO SCALE UP COMMUNICATION IN VIRTUAL WORLDS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF COMPUTER SCIENCE AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Daniel Reiter Horn February 2011

© 2011 by Daniel Reiter Horn. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution 3.0 United States License. http://creativecommons.org/licenses/by/3.0/us/

This dissertation is online at: http://purl.stanford.edu/tg227ps1931

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Patrick Hanrahan, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Philip Levis

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Mendel Rosenblum

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Preface

This work was conducted at the Stanford Computer Science Department from 2003 to 2011. Many of the key theoretical insights are presented in the July-September 2009 issue of IEEE Pervasive Computing. The development and systems building was done in collaboration with the Levis networking group.

Virtual worlds today fail to provide seamless, online, shared environments. Restrictive communication primitives, usually based on broadcast, are partly to blame for scalability problems that limit world size and the scope of the experience. Unicast would provide a more natural interface for most application-level traffic. The geometric nature of virtual worlds, however, makes unicast more difficult to scale than the broadcast primitives of existing systems.

This dissertation argues that application-level messaging in virtual worlds must have five properties to enable scalability while avoiding the undesirable limitations of existing systems: recipient selection, minimum quality of service, graceful degradation, fine-grained multiplexing and high utilization. To address these issues, the Sirikata system architecture, a new back-end system, was developed that achieves these five properties. Sirikata's key insight is to leverage the geometric nature of virtual worlds by applying a physical metaphor to communication. Object communication follows an inverse square law, behaving similarly to point-source radio transmitters and receivers. The theoretical scalability results are proven, and some valid approximations are investigated. Then an implementation of a message forwarder that supports a large number of objects and prioritizes traffic using such an inverse square falloff is introduced.

Evaluations of Sirikata show that it satisfies the stated requirements, performs better than current virtual worlds, and closely follows the real-world radio communication analogy. Finally, a range of sample applications demonstrates the effectiveness of this approach. Each sample application is scripted in the world and studied under system load.

Acknowledgments

I am extremely grateful to my advisor Pat Hanrahan who has allowed me to explore a wide range of areas in my time at Stanford including parallel systems, graphics hardware, computing theory, and light fields and finally, the topic of my thesis: virtual worlds. Pat is a veritable modern-day renaissance man and his varied yet fascinating interests have helped widen my perspective and fuel my growing voracity for knowledge. Pat has also made it a priority to nurture my disparate interests and has been the key to sustaining me as a graduate student by writing numerous grants, connecting me with internships and helping me win many semesters of fellowship awards.

I owe my deepest gratitude to Phil Levis who guided my discovery of networked systems and how they relate to virtual worlds and graphics. It was so gratifying that Phil helped me keep focused on a single directed goal that could evolve to something scientifically important and with depth. Phil is one of those rare people with whom you can sit down and talk about any issue, and during that time together, shut out all the outside world and focus on exactly that topic. This has been extremely helpful when pursuing the narrow research direction that evolved into a dissertation. Phil's enthusiasm for the area of virtual worlds has also been a constant source of driving energy for me in this area, and I've been glad to interact with someone who is as fired up about it as I am.

I would like to thank both Mendel Rosenblum and Nick McKeown for their quick wit and challenging rebuttals and questions during my defense as well as their comments and discussions during the development and refinement of my dissertation.

I would next like to thank Ge Wang for his probing questions when I first introduced my thesis topic, for helping me climb the ladder towards finishing my thesis, and for immortalizing the colossal day of my defense by helping me through the storm of the closed-door session. Up to that point, my defense date was a fleeting beacon of hope, but I was able to forge ahead and eventually observe the light from the void. Rays of sun still ring in my mind's eye as the council ruled on the twilight of my PhD and the passing of my defense; I have Ge to thank for making it cheerful. And I must drop a Smule reference here if possible: I wish Smule many successful apps and expansions.

I would also like to thank the organizations that helped to fund my education. The Department of Energy supported my investigations into Hidden Markov Models and scientific computation on the GPU. AMD/ATI provided me many quarters of funding to investigate applications of raytracing and virtual worlds on modern GPU hardware. I would also like to thank the Siebel Foundation for propelling me through my final year and connecting me to dozens of fascinating entrepreneurial scholars.

Tremendous thank yous go out to all of my virtual world comrades. From Ewen Cheslack-Postava, who keeps the virtual world team from fraying at the edges by closing every single bug, to Behram Mistree, who's always eager to plunge into a new problem at 10pm and just get it done, and Tahir Azim and Bhupesh Chandra and Lilly Bao: you all make the platform worth building. Mike Freedman and Jeff Terrace: it's been so fun to collaborate with you, and it allows Sirikata to blossom beyond Stanford, giving us perspective and drive. Likewise, my KataLabs friends Henrik Bennetsen, Dan Miller and Jeffrey Schnapp remind me every day why virtual worlds research matters and how the ideas we develop are going to propel the future of 3D.

Pat's group has been a constant source of support, collaboration and fantastic ideas. It's been amazing to see Pat's group grow over the years with all the friendly old and new faces. It was absolutely awesome to start in the Brook team with the formidable five: Ian Buck, Mike Houston, Jeremy Sugerman, Tim Foley and Kayvon Fatahalian; to have consistent fun with you gslackers, Augusto Román, Bill Chen, Jeff Klingner, and Dan and Mary Morris, who turned me into a board game strategist; and to watch myself be passed up by all of the awesome shining stars progressing on the PhD path. Jeff Klingner: you've been such a friend and unending support as we both undertook the final stretch.

Matt Fisher: your DX wrapper and all of its micromanagement helped me scan virtual worlds. Eldar and Montse: you spur me to chase the dream.

The graphics lab has been tremendously supportive, and is a very exciting place for a graduate student to pursue a PhD with so many talks, collaborations and shared spaces. I would like to thank all the glab faculty and the students with whom I have shared ideas. Sid Chaudhuri: your math skills are amazing, and our work on scenes planted the seed of using a falloff in communication. Jerry Talton, Alex Mattos and Yi-Ting Yeh: it was fun and hard work building systems together. Brian Barsky, Pradeep Sen, Bill and Zhengyun Zhang: it was epic, and you propelled my horizon of graphics past polygons to lightfields.

I would like to thank my fellow Elysia developers Andrew Carroll and Nghia Vuong, for helping me think outside the box and for igniting the dream that eventually the box will think outside of us. Paul Rhodes: you've helped fuel the dream, seeded my knowledge of neural computation and given me reasons to build a virtual world.

Throughout my PhD years, Cromem has made life not only bearable but plain old fun. Hai Nguyen always sets up the events that keep me sane and happy, Hai steaks included; Ilya Veygman reminds me why engineering is fun but not just limited to software; Ken Soong is the man behind the cooking club curtain making sure we all eat and have the mental nutrition to survive; Nghia always keeps life interesting; Noah King always has that subtle sentence that summarizes everything; Pi Chuan Chang always proves that there is happiness at the end of the PhD tunnel; and Andrew is the board game master, from Settlers to Caylus: it's always been a fun way to socialize during the PhD stretch.

The Vega Strike team was instrumental in my systems education and in building a neat game. From Jack Sampson, who is the heart and soul of the universe, to Alan Shieh, my initial collaborator, and Claudio Freire, who keeps the engine's heart beating, plus John Cordell, Jason and Petey: you are all inspirational. I thank the art team, from Chris Platz to Strangelet, Oblivion and Howard Day, who built Vega Strike and the occasional next-day demo rush.

Matt Gunn: it felt like we were classmates together again at Stanford with our frequent lunches and our arduous hikes and defeats of Half Dome.

I admire your focus and deep economic insights, which all keep reminding me that there's life beyond the PhD. Kevin Olson, with your love of the outdoors and wide range of startup knowledge, you and Greg Doolittle have been awesome during the trials and tribulations of my PhD. Derrick Chu, Yuan Ming Chiao, Maneesh Goel, Andrew Row, Brittany Burrows and Anthony Gilbertson: you've all been constant sources of support and friendship.

Patrick: as a brother and a friend, you've always been the most hard-working, dedicated genius I know. Your coding skill is an inspiration to me when I sit down to make things happen, including this thesis. To my family, here, in Maryland, in LA and Germany: thank you for everything. I owe it all to my parents for doing such an amazing job of raising Patrick and me. My father, Ron, has given me endless good PhD guidance. My mother, Elke, has kept me strong when I needed it the most.

Finally and most importantly, I would like to thank my lovely wife, Debra, for her unending support: I wouldn't have made it this far without her. Her love and care has always been a guiding star throughout my PhD.

Contents

Preface

Acknowledgments

1 Introduction
  1.1 What Is a Virtual World
  1.2 Virtual World Challenges
  1.3 Communication and Scalability
  1.4 A Case for Unicast
  1.5 Contributions

2 Related Work
  2.1 A Brief History of Virtual Worlds
    2.1.1 Text-Based Virtual Worlds
    2.1.2 MMORPGs
    2.1.3 Social Virtual Worlds
  2.2 Virtual World Timeline
  2.3 Relating Virtual Worlds to Distributed Object Systems
  2.4 Distributed Simulation

3 Example Virtual Worlds and Applications
  3.1 Example Applications
    3.1.1 HitPoint
    3.1.2 Recount
    3.1.3 Airport
    3.1.4 Marketplace
    3.1.5 Gatherer
    3.1.6 Spider
  3.2 Application Analysis
    3.2.1 HitPoint
    3.2.2 Recount
    3.2.3 Marketplace
    3.2.4 Gatherer
    3.2.5 Spider
    3.2.6 Airport
    3.2.7 Results
  3.3 Requirements and Challenges
    3.3.1 TCP-like Streams
    3.3.2 Quality of Service Zones

4 A Weight-Based Approach
  4.1 Minimum Quality of Service
  4.2 Geometric Flow Weights
  4.3 Choosing the Falloff
  4.4 Evaluating the Chosen Falloff Function
    4.4.1 Experimental Setup
    4.4.2 Results
  4.5 Proving Seamless Unicast Scales
    4.5.1 Defining the Scalar Falloff Function
    4.5.2 Bounding Differential Bandwidth
    4.5.3 Implications
  4.6 Approximating the Falloff Function
    4.6.1 Challenges Requiring Approximation
    4.6.2 Regional Approximation
    4.6.3 Volumetric Approximation
    4.6.4 Distance Approximation
    4.6.5 Theoretical Evaluation of Approximation
    4.6.6 Simulation of Approximation

5 Architecture and Implementation
  5.1 Sirikata Overview
    5.1.1 Space Server Responsibilities
    5.1.2 The Object Host
    5.1.3 The Content Distribution Network
    5.1.4 Putting it All Together
  5.2 Forwarder System Design
    5.2.1 Forwarder Structure
    5.2.2 Queueing Implementation Details
    5.2.3 Worst case guarantees on queueing system
  5.3 Example Execution
  5.4 Implementation

6 Virtual World Applications in Sirikata
  6.1 Applications Using Seamless Unicast
    6.1.1 Developer's Perspective
    6.1.2 Underutilized Behavior
    6.1.3 Saturated Behavior
  6.2 Experimental Setup
  6.3 Hit Point Application Workload
    6.3.1 Application Behavior on Saturated Space Server
    6.3.2 Application Behavior on Unsaturated Space Server
  6.4 Recount Application Workload
  6.5 Airport Application Workload
  6.6 Marketplace, Gatherer and Spider Workloads
  6.7 End-to-End Evaluation
  6.8 Communication Rate Control
  6.9 Microbenchmarks
  6.10 Results for Seamless Unicast

7 Discussion
  7.1 Contributions
    7.1.1 Seamless Unicast
    7.1.2 Falloff Approximation
    7.1.3 System Design
    7.1.4 Evaluation
  7.2 Further Work
  7.3 Last Thoughts

List of Tables

4.1 Fairness cost of approximating object position and volume on per-message fairness, for object position and size distributions measured from 64 Second Life servers.

6.1 Space server forwarding performance.
6.2 Second Life message performance between two objects.

List of Figures

2.1 Curves illustrate portions of Second Life that have delivered their constituent objects to the viewer. Other objects are outside a fixed radius.

4.1 Approximate bandwidths using seamless unicast: doubling volume doubles bandwidth, and doubling distance roughly reduces bandwidth to 1/4 of what it was.
4.2 Experimental layout where a large grid of servers are all flooding the central server.
4.3 Percent bandwidth reserved for neighboring servers.
4.4 Visualization of the approximations tested with the seamless unicast falloff function.
4.5 Approximations' contributions to reduction of fairness in network resource apportionment.
4.6 Comparison of JFI between a region-based fair queueing system that treats all objects in a region the same (bottom, red) and the distance approximation (top, blue), where each object is prioritized individually. Objects are sorted by weight and JFI is computed for all objects at higher weight, to see where the fairness is lost.
4.7 The blue dots represent individual throughput obtained by pairs of objects, sorted by their fairness on the x axis. The red line represents the ideal bandwidths those objects would have gotten without any approximation.

5.1 The Sirikata architecture layout. Object Hosts running objects connect to a Space, comprised of space servers. The space is authoritative for object position and the communication fabric of the world.

5.2 A virtual world environment in Sirikata with several scripted flying entities.

5.3 Sirikata virtual world with shaders and a live in-world browser.

5.4 Viewing a Sirikata virtual world inside a browser that supports the WebGL specification.

5.5 Sirikata virtual world sunset overlooking terrain and trees.

5.6 Four ways to deploy Sirikata. In (a), a game company runs all components except for the clients. In (b), Sirikata is configured to appear like Second Life, where object hosts live on the same CPU as their respective space servers, and an object migration policy is in place. In (c), one company runs the space and CDN for an open social virtual world, while third parties provide their own objects, and in (d), many CDNs and spaces coexist and objects pull their contents from private webservers and connect to multiple spaces.

5.7 Inter-object messaging. Logically, a message from A to B passes through space α; in the system, the message passes from object host OH_A to space server α_1, to space server α_2, to OH_B.

5.8 Flowchart of the object message forwarding pipeline in a Sirikata space server. The bottom of the right side is the case when a space server receives a message for an object that has moved away.

5.9 Potential forwarding architectures.

5.10 Fair queueing design. A message from an object in region SS_B (running on OH_2) sent to an object in region SS_C (running on OH_3) passes through three queueing stages. Each queue is shown by a downward-facing triangle, and the message's path by the dark line.

5.11 When a new connection arrives, the low weight flow's fair share rate will be lowered because it used more than its entire fair share. Its drop rate will increase to the new value as soon as the notification packet returns from the downstream server, but the mix of packets between flows µ_1 and µ_2 will remain incorrect until the TCP queues drain and the new ratio of packets percolates to the front of the queue.

5.12 In the region-based approximation, the object sending above its fair share rate gets more traffic than it should both before and after additional traffic is added into the system.

5.13 This case is similar to Figure 5.11, but in this example, the unfairness during the queue draining transition period is equal to the unfairness garnered by the region-based object fairness approximation.

5.14 In the region-based approximation, the object sending above its fair share rate gets more traffic than it should both before and after additional traffic is added into the system. In this case it performs in the same manner as Figure 5.13 during the transition period.

5.15 Space server internals. Dashed lines show network connections.

6.1 Packet traces from applying seamless unicast to the HitPoint application under loaded conditions. Each received update causes a sharp drop in error. When saturated, nearby objects (top) receive updates more frequently than distant objects (bottom).

6.2 Applying seamless unicast to the HitPoint application.

6.3 When the system has low utilization, almost all updates are successfully received. Both nearby and distant objects receive updates at about 30 Hz.

6.4 In the recount scenario, nearby objects are more up-to-date than farther objects.

6.5 Airplane communication in a saturated system with seamless unicast versus the same application in a system that has unweighted fair traffic prioritization. Note that the first sample point indicates an airplane on the same server as the control tower, resulting in similar overall error rates since these packets go through the fast path. Farther airplanes get similar error rates as the airplane within 50 meters, but the relative error is much lower than it would be without seamless unicast.

6.6 Marketplace, Gatherer and Spider application workload, demonstrating that objects with a larger solid angle get a higher bandwidth of important data in a system flooded with 7 MB/s of irrelevant data.

6.7 4x4 Second Life object distribution map and Second Life screen shot captured from highlighted space server. a) Sample Second Life data for a 4x4 server grid. b) Screen shot from server in row 3 and column 2.

6.8 Average message latency in end-to-end experiment. Percentages show cache miss rates experienced.

6.9 Flow throughput under two workloads.

Chapter 1

Introduction

For nearly two centuries, electrical technologies have been used to bring people together, enabling them to send thoughts and communiques across the world [44]. The advent of the Internet allowed the mechanisms for telepresence to become more varied and rich. Email, VOIP and video conferencing software have become mainstream; however, these communication mechanisms still fall far short of physical presence and fail to provide a shared space for group interactions.

Newsgroups and eventually social networks like Facebook, MySpace, and Orkut strive to provide a common space for community interactions. These networks provide a mechanism for a large group of people to share a co-located textual space, where they can inspect and manipulate data about each other in a graph of interconnected pages. While these mechanisms allow many simultaneous users to interact at once, they operate in the space of asynchronous textual communication rather than mirroring the real-time three dimensional world in which we live, and hence fall short of the ability to collaborate face to face in real time.

1.1 What Is a Virtual World

Science fiction authors have written for years about technological mechanisms for bringing distant people together into a three dimensional virtual space where those people can interact with a fidelity nearing or surpassing that of the physical world.


For instance, in Snow Crash [53], users experience an expansive virtual metropolis, crammed with crowds, clubs, katanas and kouriers. Users can speed around the world, battle each other, teleport, fly and generally participate in something of a shared dream, where depicting and interacting with the impossible is routine.

However, current mechanisms for simulating a virtual environment in which a multitude of users interact bring with them serious disadvantages that have kept such environments confined to a niche user group. An analogy may be drawn between the problems facing modern virtual world environments and the uptake of interactive graphical multimedia during the fledgling years of the world wide web.

In the 1980s, there were a number of network service providers, such as Prodigy, Quantum Link/America Online, and Minitel, which provided users with specialized graphical content and required users to subscribe to a particular service to see the content specific to that service. This policy of centralizing control of information is commonly referred to as the walled-garden approach. Tim Berners-Lee, credited with pioneering the web, helped to break free from the walled garden and define a standard mechanism for serving and browsing static formatted documents. The network effects from allowing any user with a browser to both create and view documents resulted in an explosion of content and eventually revolutionized network applications.

Initially composed of only static documents, the web evolved into an application platform backed by a heterogeneous set of technologies. This evolution fundamentally changed the structure and design of the underlying Internet. Without this ecosystem, web application technologies ranging from SSL and AJAX to compute clouds and Rails might not have formed.

The unsupervised evolution of the web and the haphazard development of its underlying technologies have led to systems that are complex combinations of many disparate, yet overlapping components. Constituent parts can interact in unforeseen and dangerous ways. A clean and carefully planned design with sufficient foresight would undoubtedly have been simpler and safer.

1.2 Virtual World Challenges

Virtual worlds are still in their infancy, like the web was almost two decades ago, yet they promise to provide a compelling medium for shared, networked environments where people can communicate, shop, socialize, collaborate, and learn. Can an immersive 3-dimensional online virtual world platform avoid some complexities facing the web by applying insight into how to build applications and services before they are subject to the short-term necessities of commercial development?

Applications of virtual worlds are already gaining traction. Numerous multiplayer online games such as World of Warcraft, EverQuest, Lineage and EVE Online demonstrate that virtual worlds are a lucrative and powerful platform for entertainment. An ever-growing list of blue chip companies looking to increase telepresence are deploying their own worlds, as evidenced by Intel and IBM's research into virtual worlds using OpenSim and Sony's Home.

We are currently witnessing the beginning of the evolution of such systems. Unfortunately, this evolution is as ad hoc as the evolution of the web. Most systems are constructed totally independently, sharing little if any architectural aspects and offering no interoperability. Systems are extended with new capabilities in an ad hoc manner: user generated content is produced and accepted in custom formats, new world-specific programming languages are created for programmable behaviors, and proprietary protocols run each world. Systems today are closed, limited, or do not scale. The problem of designing open, programmable, scalable, secure, and extensible virtual worlds is still open.

The Sirikata Project at Stanford University is focused on designing and implementing a domain specific architecture for the virtual worlds of the future, leveraging the virtual geometry in the world to scale the system. While Sirikata cannot compete with the content creation of commercial virtual worlds, it can, like the original world wide web at CERN, raise basic questions of system structure to open a wider range of possible virtual worlds.

The futurist's virtual world is a dizzying experience, where surrounding active objects engage a user in an immersive environment.

Objects collaborate to create complex and interesting behaviors, and this collaboration involves inter-object communication. For a dog to wag its tail when an avatar rubs its stomach, or for a siren to scream as a monster approaches, the objects receive messages about the events and respond to them. Unlike messages such as movement directives, these interactions are application-level messages because they are opaque to the system, which simply routes them between objects.

Today's virtual worlds fall far short of this imagined potential. To scale to millions of users and objects, systems such as Second Life, EVE Online, and World of Warcraft partition their worlds into disjoint regions. Within these regions, they enforce harsh restrictions on the rate and range of object communication in order to support the size of their user bases.

1.3 Communication and Scalability

Although frequently couched as a component of these worlds' narratives, these restrictions are actually artifacts of the underlying design and the implementation of their application-level messaging systems. A different design has the potential to lift these restrictions and enable the virtual worlds of tomorrow to move closer to the science fiction of today.

One core problem in current systems is that they are heavily tailored toward broadcast: sending a message delivers it to many receivers. Systems select these receivers through either geometric (all objects within n meters, as in Second Life) or organizational (all avatars in a "guild", as in World of Warcraft) means. Receivers are responsible for filtering messages, such as by specifying to which channels to listen. Unicast, while available, is strongly restricted and therefore unusable for most applications. For instance, Second Life restricts objects to sending only one unicast message per second, and in Habbo Hotel, a social virtual world, avatars cannot send unicast messages to others outside of their "hotel".

Relying on broadcast, existing systems scale by limiting objects' scopes of interaction. With geometric broadcast, only objects within a preset distance receive messages, while with organizational broadcast, receivers must be members of a guild or group. This approach is inherently inefficient, as in practice, any given object is interested in a small subset of the messages it hears. Organizational broadcast suffers from similar problems. Organizations (e.g., WoW guilds) are artificially limited to relatively small, tractable sizes, creating disconnected islands of communication.

Finally, existing systems rely on static resource allocation. As we will show in Section 6.9, a completely solitary pair of nearby Second Life objects can only communicate at a few hundred kilobits per second, even though the entire capacity of the server is available to them. Such an approach wastes resources because most servers remain idle.

On the other hand, virtual worlds cannot simply borrow the techniques developed in more traditional research areas of distributed object systems (e.g., MPI processes or CORBA objects [50]) or distributed simulation (e.g., HLA [1, 19]). Communication patterns in these models are spread over all objects: in being more general, they cannot take advantage of the geometric locality preferences inherent to virtual worlds.

1.4 A Case for Unicast

Furthermore, such hard limits are undesirable for applications because they lose significant context. Consider DistanceWorld, where messages are received only by objects within 100m of the sender. If an oil refinery in DistanceWorld explodes, it broadcasts a message indicating the event. All virtual homes within 100m will react to this message, causing their windows to shatter, but homes located even centimeters outside of the refinery's radius will remain unaffected. Even worse, DistanceWorld's fire station, located outside the radius, does not react to this virtual calamity. Such behavior is typical in virtual worlds today. "Pulling," for example, refers to the in-game strategy of slowly approaching the closest member of a group of monsters, until a player barely enters its detection range and the monster attacks alone.

Thus, the challenge is to design a scalable communication fabric that unties the communication model from broadcast, avoids hard cutoffs like DistanceWorld's, dynamically allocates communication resources, and benefits from locality exposed by the geometry of the underlying virtual world.

Virtual worlds should adopt a different communication model, based on a geometric unicast primitive called seamless unicast. Seamless unicast allows arbitrary objects to communicate, but controls communication rates in a continuous and graceful way, such that nearby objects can communicate with higher fidelity than more distant ones.

Specifically, seamless unicast should address the problems of existing communication systems. Since current systems are heavily broadcast-based and do not allow programmable recipient selection, seamless unicast should provide recipient selection in a flexible way. Since the Internet is built upon high performance unicast mechanisms, it makes sense to build the virtual world communication layer on top of unicast as well; more complex facilities can be layered on top of this.

Additionally, in virtual worlds, many objects get little or no service if they go beyond certain regions, are not members of the same guild or faction, or are sufficiently far apart. This makes programming the objects difficult, and it should be addressed by allowing any two objects some measure of service so that they can coordinate with each other. Hard distance limits make it difficult to discover and coordinate between objects, so a next generation messaging service should be designed to allow more flexible and performant ranged communication. A communication service should avoid partitions in the network and should allow any two objects to communicate with one another as the system load allows. Objects themselves are the actors in a shared space, so pairs of communicating objects, not servers or rooms or regions, are the natural entity for load balancing and resource allocation.

Finally, the communication system should make use of all system resources at hand. A high-bandwidth pipe should be apportioned between few object pairs with similar efficiency to how it is apportioned between many object pairs. This will allow unloaded portions of the system to attain greater performance than overloaded portions, and the objects will be able to make use of the excess capacity.

It will also give incentives to grow the scale and performance of the world organically, immediately absorbing new system resources into the fabric of the communication layer. To summarize, seamless unicast has five important requirements.

1. Recipient selection: A sender can decide the recipients of a message.

2. Graceful degradation: There are no sudden discontinuities in communication as distance increases.

3. Minimum quality of service: Each object pair can be guaranteed a minimum, non-zero throughput, regardless of other object pairs' communication.

4. Low latency: The available capacity is shared between object pairs at a fine granularity, thereby minimizing latency.

5. High utilization: The system can achieve high utilization even if only a few object pairs are communicating.

Such a model avoids the undesirable artifacts of existing broadcast-based systems. We elaborate upon these properties in Section 3.3.
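To make the last three properties concrete, consider the following sketch of a weighted capacity allocator. It is purely illustrative and not Sirikata's implementation (Chapter 5 describes the real forwarder design); the function name, per-flow minimum, and capacity figures are invented for this example.

    def apportion(capacity_bps, weights, min_bps=1000):
        """Split a link's capacity among flows in proportion to their weights,
        after reserving a small guaranteed minimum for each flow.
        Illustrative sketch only; assumes capacity exceeds the reserved minimums."""
        remaining = capacity_bps - min_bps * len(weights)
        total_weight = sum(weights.values())
        return {flow: min_bps + remaining * weight / total_weight
                for flow, weight in weights.items()}

    # A single active pair receives nearly the entire link (high utilization);
    # adding a pair degrades shares gracefully instead of cutting anyone off.
    print(apportion(10_000_000, {"a-b": 4.0}))
    print(apportion(10_000_000, {"a-b": 4.0, "c-d": 1.0}))

Because every flow keeps a non-zero reservation, no pair is ever starved (minimum quality of service), while the proportional term lets a lone pair absorb nearly the whole pipe.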

1.5 Contributions

The key insight of this research is that applying a physical metaphor – namely, the inverse square law of light – to communication rates allows a virtual world system to satisfy the requirements of seamless unicast. In the physical world, with limited power of transmission, we are not flooded by unbounded levels of light or bombarded by the audio of distant speakers or crowds. Instead, because these point sources fall off by the inverse square of distance, our environment is relatively calm and information propagation scales. Building a virtual world communication system that enforces this falloff function brings about a design that achieves the above requirements. Enforcing the function 8 CHAPTER 1. INTRODUCTION

is challenging due to its dependence on distributed state, such as the positions of objects which are hosted on separate servers. Despite these challenges, the resulting messaging system does not limit the scope of object interaction, and it provides higher messaging rates than current approaches. Thus this dissertation’s contributions are as follows.

1. Seamless unicast: a mechanism that defines point to point behavior between virtual world objects using a falloff function

2. Falloff approximation: an approximation for the falloff function in a dis- tributed virtual world system

3. System design: an efficient mechanism to achieve the five properties of seamless unicast in a real virtual world system

4. Evaluation: a demonstration that the virtual world satisfies the properties of seamless unicast and competes well with the best known methods.

The remainder of this dissertation is organized as follows: Chapter 2 reviews relevant related work in virtual world systems; Chapter 3 gives deeper background and motivates the requirements enumerated above; Chapter 4 presents a geometric falloff function and proves that it meets the above requirements for any size world; Chapter 5 presents a design and implementation of the message forwarding system; Chapter 6 provides an example of how a set of applications might be scripted to use our communication primitive and evaluates the proposed messaging system on these applications; and Chapter 7 concludes.

Chapter 2

Related Work

2.1 A Brief History of Virtual Worlds

To see how scaling virtual world communication fits into the greater space of existing virtual worlds, this chapter presents a brief history, starting from early text-based systems for intercommunication and moving towards state-of-the-art commercially available systems.

2.1.1 Text-Based Virtual Worlds

Text-based multi-user dungeons were introduced in 1978 when Roy Trubshaw and Richard Bartle [11] at Essex University made a simple chat system called Multi User Dungeon, or MUD, with a series of interconnected spaces in which players could chat with each other. Less than a year later he factored out a definition of the world map and allowed players to add rooms, descriptions, and novel commands to the environment while it was running, using a language known as MUDDL. This resulted in a collection of interesting, yet disparate areas that players could explore and that facilitated player interactions. In 1980, when Essex University was connected to the ARPANET, MUD became the first Internet-based virtual world [61].


From this time, different styles of MUDs, ranging from themed fantasy to themed science fiction worlds, blossomed, often challenging the computational and network resource limits of the respective systems that hosted them.

The next major step forward in text-based worlds came a decade later with the creation of TinyMUD by James Aspnes [12], then TinyMOO and finally LambdaMOO by Pavel Curtis [18]. The difference between these textual virtual worlds and previous ones was the cooperative social aspect of the worlds. No longer were players competing; instead the worlds focused on social interaction and the creation of interesting objects and puzzles for each other to solve. LambdaMOO went a step further and allowed a complete decentralization of content control and the addition of security measures. In previous MUDs users had either complete access to alter a region or no access to it. LambdaMOO allowed any user to fashion and script items and rooms in the world and had a complex security system to prevent any given user from writing a script that could impair other users. This was accomplished by exposing a sandboxed scripting language with very fine-grained access controls to all users. While the system was very successful, it was a single server architecture and could not scale beyond the number of connections and state updates that a machine of the time could handle. The solution to this was allowing a number of copies of the same world with differing user bases to run on separate computers.

2.1.2 MMORPGs

Modern massively multiplayer online role playing games (MMORPGs) have a large wealth of rich content, a vast number of subscribers, and sometimes exceeding flexibility. However, these systems are not without serious design limitations. Most of these games use a single server for all persistent data, a server to simulate scripts and physics, and a server for delivering world content as it updates on a weekly or monthly basis. This design appears to prevent scaling altogether, but the games have made various game design choices that allow them to scale nevertheless. Additionally, all of these games have been developed as walled gardens, where the content, formats, rules and optimizations are very specific to the particular applications and do not translate to each other.

Initial games did not present users with a three dimensional world, but forced them to remain within a two dimensional realm. These realms were populated with hazards and puzzles, usually requiring collaboration between users. This content was designed by artists and content creators employed by their respective companies. Some of these earlier massively multiplayer games were Sierra's The Realm Online, 3DO's Meridian 59, Electronic Arts' Ultima Online and eventually NC Soft's Lineage.

A major advancement in scaling came in 1995 from "The Realm Online", widely credited for inventing a technique called instance dungeons. It was later used in "Anarchy Online" and is a technique that is still common in many MMORPGs including Blizzard's World of Warcraft [12, 43]. The idea is to place lucrative items within isolated caves, known as instances. Unlike previous versions of instances, The Realm Online allowed a team of players to visit a cave in isolation. This is preferable to both Sierra Online and the player, allowing Sierra to migrate connections from the party onto a separate dedicated server. Players prefer this because they can compute the probability of obtaining a lucrative item by selecting their party size.

While these systems scaled from 200,000 to millions of subscribers, their services could only support hundreds or a few thousand users in any given copy of the world. Their solution to scaling without a world partitioning scheme has been to instantiate many concurrently running copies of the same world, such that no separate copy can affect another copy. Each copy, known as a shard, is then forced below the ascertained server limit by implementing a login queue, and encouraging users to log in to less crowded shards in the event of a long queue. This has become an industry standard for scaling in most subsequent massive worlds.

Next in the line of innovations was Sony's EverQuest in 1999, which introduced full 3D support, where characters' avatars and hazards were rendered in 3D. While EverQuest did not have all of the features of Ultima Online, it supported a similar mechanism of gameplay that included collaborative challenges.

The most popular MMORPG was launched in 2004 by Blizzard Entertainment. World of Warcraft supports 11 million subscribers and almost a million concurrent users by combining standard shards and instancing [31]. World of Warcraft splits its 11 million users across 772 shards [2, 8], each of which is a small cluster [7, 62].

Additionally, on each shard, World of Warcraft has four islands with slow-moving transit mechanisms between them. The islands Kalimdor, Eastern Kingdoms, Outland and Northrend are hosted on separate servers, and if one island's server is unavailable, users may log onto other islands [6]. Each of the world's four continents is a seamless virtual space run on a single server, but most core game content is in "instances": partitioned regions that run on separate servers. Instances are limited to at most 80 players. When a replica is overloaded, users wait in login queues. The maximum number of users on a partition is in the thousands.

Users can add functionality through plugins, which have an "add-on" messaging API. Messages are either broadcast to one of a small set of groups defined by the system (parties of up to 5, raids of up to 40, guilds of up to 500), or use a "whisper" mode that allows unicast. However, a user can only whisper messages to others of the same "faction", within the same instance, or that are members of the user's friend list (which is limited to 100 friends).

Not all commercial games, however, use instancing or shards to scale. CCP Games' EVE Online has taken a different approach to scaling for its 30,000 concurrent users. CCP Games purchased an extremely expensive supercomputing cluster that keeps the entire world database on large RAM disks and takes the game offline for an hour once per day to perform backups. Additionally, unlike the previously mentioned worlds, EVE Online takes place in empty realms of space, where activity is highly centralized around planets and stars. Since the world is not actually seamless, the discrete star systems provide a natural partition mechanism to assign users to different servers within the cluster.

2.1.3 Social Virtual Worlds

Virtual worlds relying on centralized content

Unlike role-playing worlds in which players collaborate to fight hazards or armies of actual human characters, social virtual worlds have no set agenda and allow players to set their own goals and interact socially. These reflect older text-based social MUDs like LambdaMOO, and sometimes try to mimic the flexibility of those worlds.

In parallel with these developments in text-based systems, Lucasfilm developed Habitat, a graphical virtual world in which users could concurrently interact with each other and solve scripted challenges for a subscription fee. Given the high cost of drawing graphics and centrally making cohesive adventures, the designers were faced with an ever expanding cost of making new content for their increasing user base. Eventually a simplified version of their world was able to support 15,000 user accounts.

Habbo Hotel is a modern example of a 100,000-concurrent-user social 2D virtual world similar to Habitat. Content is centrally developed in both worlds, but Habbo Hotel differs in that individual possessions in the world must be purchased from the central source in many micro-transactions. Habbo Hotel is one of the largest existing social virtual worlds in which a company provides all content for users, and the users must purchase these items for a fee. These centralized models employ a fixed centralized content distribution system. CDN systems like Coral Cache and BitTorrent are able to scale static data distribution, and these mechanisms can be applied to distributing the graphical world data among users in the world [26, 47]. Also, users inhabit a number of independent areas that do not affect each other, and chat is the primary mode of utilization.

Social virtual worlds with decentralized content creation

Only a few modern virtual worlds depart from the Habitat model of centralized content development and take a more web-like model of encouraging all participating users to build portions of the virtual environment.

The most popular 3D virtual world with user-created content, with a peak number of concurrent users reaching 50,000, is Linden Lab's Second Life. Second Life has a very small number of contiguous worlds, each without instances. In this virtual world, every user can create 3D objects using limited in-game tools and can upload images from their own computers for a small fee. Users can choose to enable physics on a subset of the objects in the world, which tells the server to simulate collisions and reaction forces on those objects. In addition, users can imbue these objects with behaviors using a custom scripting language called LLScript.

Code written in LLScript, a C-like language, may be bound to a given 3D object and may register event callbacks for various world events. Resource limits on LLScript programs are extremely harsh and no automated mechanism for code reuse or libraries is provided. However, the scripting system has limited means for external communication to the Internet through HTTP requests and can thereby, with the use of external computational resources, compute any desired function on the world events at hand.

To scale, Linden Lab decided to make a direct mapping from servers to 256m x 256m square plots of land. This guarantees a minimum level of service for each plot of land, and makes the economics of charging actual currency per acre of land trivial to compute. However, this also results in a user:server ratio of between 3:1 and 20:1 [45, 48], even though any given server that has any users is likely to be forced to cope with more than the total ratio. Second Life's user:server ratio is more than two orders of magnitude less than most MMORPGs and indicates a severe limitation in their ability to scale in the long term. Part of the reason a uniform partition of the world performs poorly is that the popularity of regions in a given world map roughly follows a Zipfian distribution [14]. Thus many regions of the world are uniformly populated with architecture and objects but are essentially devoid of users, and others are devoid of interesting objects or users. These regions can continue to consume computing cycles simulating objects on them at similar rates to the most crowded regions [56].

Because of this architecture, clients can only see and interact with objects within a small distance. Most regions can hold only 40 avatars, which often requires large events to occur at the intersection of 4 regions [3]. Objects primarily communicate through short-range broadcast "say" messages with preset 10m, 20m, and 100m ranges. Objects may use "channels" to filter these messages. Longer-range unicast communication is possible, but strongly rate limited [5].

2.2 Virtual World Timeline

1978 First MUD developed (MUD1)

1982 SIMNET DARPA project for battle simulations

1984 Islands of Kesmai; first commercial MMORPG, from CompuServe

1985 Lucasfilm’s Habitat for Commodore 64

1988 Club Caribe; Habitat rebranding when partnered with QuantumLink

1989 TinyMUD by James Aspnes; a Social MUD

1990 LambdaMOO by Pavel Curtis; Xerox PARC

1993 DARPA publishes Distributed Interactive Simulation standard

1994 Fujitsu licenses Habitat as WorldsAway [49]

1994 VRML specification released

1994 CyberTown chat environment

1995 Active Worlds beta; first user-crafted graphical virtual world

1995 The Realm Online by Sierra Online

1995 Meridian 59 for 3DO

1997 Ultima Online

1998 HLA Simulation specification for updated battlefield simulations

1998 Lineage

1999 EverQuest

1999 Asheron's Call

1999 Neopets

1999 Habbo Hotel

2001 Anarchy Online

2002 Sims Online

2002 Final Fantasy XI by Square Enix: first cross-platform MMORPG

2003 Entropia: first 3D world allowing users to inject and extract legal tender

2003 EVE Online by CCP Games: first shardless MMORPG

2003 Toontown Online

2003 Second Life: 3D world with user generated scripting and content

2003 There.com

2003 Croquet

2004 IMVU (IMVU)

2004 World of Warcraft (Blizzard)

2004 Club Penguin (New Horizon)

2006 Dungeons and Dragons Online

2007 Burning Crusade expansion for World of Warcraft

2007 OpenSim; BSD-licensed, open source implementation of Second Life

2007 Lord of the Rings Online

2008 HiPiHi beta

2008 Star Trek Online

2008 Wrath of the Lich King expansion for World of Warcraft

2009 WebGL standard to bring 3D to browsers announced

2010 Cataclysm expansion for World of Warcraft

2.3 Relating Virtual Worlds to Distributed Object Systems

In addition to directly building upon the work of virtual worlds, this dissertation draws from several research areas ranging from distributed object systems to distributed simulation.

At a very high level, virtual world applications are a distributed object system. Objects in the virtual world, such as apples and avatars, are akin to actors in an actor model.

Hewitt et al. formalized the actor model as a series of independent concurrent computations that may only interact by passing asynchronous messages to each other [33]. The actor model spawned several research programming languages such as Act1 and Cantor [10, 41]. Joe Armstrong developed Erlang at Ericsson to support distributed fault-tolerant applications designed to be run indefinitely [9]. These applications share some similarities with live virtual worlds, but generally do not deal with object geometry or location in virtual space.

Another analogy may be drawn between virtual world objects and CORBA objects, passing messages to one another. CORBA, for instance, deploys an Object Request Broker (ORB) that, broadly, performs a similar task to Sirikata's messaging layer. CORBA's ORB permits point-to-point messaging, manages message rates, and hides hardware addresses for objects [50]. The idea is that objects are programming language objects that expose class interfaces through a language called IDL. The language instructs a compiler on how to serialize and transmit arguments and results from any method in the interface. Calls to remote objects are made through an RPC mechanism that blocks if a function requires the result. This allows intuitive abstraction of any network barriers that exist between the collection of objects, and allows the system to optimize and batch calls between objects in a larger application.

Vellon et al. at Microsoft Research built a virtual world system on top of COM RPC mechanisms [58]. These mechanisms share significant similarity with CORBA RPCs. Instead of having users generate IDL files for each object, they have a single generic object that allows generic property lookup and late binding. Their architecture uses this distributed object system to coordinate between clients and a single server and to log changes for persistence. MPICH-G2 implements MPI on a grid, addressing similar challenges [37].

The fundamental difference between these systems and virtual worlds such as Sirikata is that objects are embedded in a 3D space rather than a logical namespace. CORBA, COM, and other distributed object systems are typically data-centric, using naming services or other logical organizations. Locality in virtual worlds carries the true meaning of the word: geometric proximity. Because virtual world objects inhabit a digital metaphor for the physical world, the governing system can apply physical laws, analogously to the real world.

Sirikata leverages this geometric embedding to exploit locality and manage message rates in a semantically meaningful way.

2.4 Distributed Simulation

While CORBA has been used for distributed simulation [22, 46], a number of other standards have been developed for virtual simulations. Distributed Interactive Simulation (DIS) was a standard developed by DARPA over the 1980s. The results were a series of public standards for distributed battlefield simulations and protocols. NPSNet was an architecture to research DIS performance, developed at the Naval Postgraduate School [42]. NPSNet focused significant effort on streamlining communication by transmitting object broadcast messages using low-level multicast protocols. The follow-on standard was the High Level Architecture (HLA) simulation system [19]. HLA grew out of DARPA-funded research to help train combat troops [16, 39]. HLA supports messaging based on a publish-subscribe model: simulation objects send their messages on a channel on which other objects listen. While implementations are typically closed, the systems discussed in the literature [21, 32] make no mention of using geometric information to control and scale communication.

Some research-based multiuser worlds offer communication primitives other than broadcasts. RedDwarf Server [4, 59], previously known as Darkstar, uses a publish-subscribe model similar to HLA's, but it is backed by a database. Dive [25], MiMaze [30], and RING [28] build communication around multicast primitives. None of these systems provide a unicast primitive. Eraslan et al. use IPv6 QoS to adapt unicast traffic to congestion, but leave the selection of these priorities undefined [24]. In contrast to this prior work, one of this dissertation's contributions is applying flow weights based on a model of electromagnetic radiation, thereby providing semantically meaningful behavior as well as the properties listed in Section 1.5.

Research on how to restrict communication has examined using a radius around the object [15, 35], including orientation and recent interactions to leverage limitations of human attention [13]. These approaches change the rate of updates between users depending on their perceptual importance to the users of the world, using application-specific knowledge.

Figure 2.1: Curves illustrate portions of Second Life that have delivered their constituent objects to the viewer. Other objects are outside a fixed radius.

In this sense, they closely reflect the communication falloff presented here. One difference is that the chosen communication rates are tuned for a specific application, rather than being as general as the falloff function selected here. Knutsson et al. partition their virtual world into disjoint regions to allow for small peer-to-peer sessions in each disjoint region, trading off long-distance sight for increased data partitioning [38]. Second Life, for example, similarly does not display the world outside a user's view range, as depicted in Figure 2.1. RING, like this dissertation's seamless unicast, leverages geometric information to control quality of service between sender-receiver pairs in a semantically meaningful, but completely different, way. RING's messaging system precomputes object visibility and uses this information to cull, delay, or degrade packets sent between distant or occluded objects [27]. In practice, this precomputation poses challenges and is not effective in dynamic environments such as virtual worlds filled with user content. However, its use of geometry to scale communication is in a similar vein of research as this work.

Chapter 3

Example Virtual Worlds and Applications

To better ground the technical motivations and design considerations of the rest of this dissertation, this chapter provides an overview of several virtual world applications. It examines how existing communication systems are poorly suited to these applications' requirements, motivating the need for seamless unicast.

In the context of this dissertation, virtual worlds are interactive, continuous, and shared 3D spaces. Participants appear as avatars in the space, which also contains simulated objects – anything from mountains to clocks to clothing. All objects have a physical presence and properties, such as position, geometric shape, and appearance. The world can enforce physical laws such as gravity and collisions.

Beyond simulating physics, a virtual world should also allow objects to run scripts to simulate behaviors, bringing the world to life. For objects to interact in interesting ways, they need to communicate with each other: an avatar’s eating a virtual apple, swinging a virtual sword, or planting a virtual tree all involve application messages. A virtual world’s object messaging system is fundamental to creating engaging and immersive spaces for users.


3.1 Example Applications

To familiarize readers with virtual world applications, this dissertation presents several demonstrative examples: HitPoint, Recount, Airport, Marketplace, Spider, and Gatherer. Then the implications of their inter-object communication requirements are examined.

3.1.1 HitPoint

HitPoint is an example game logic application that tracks the health status of other avatars. HitPoint periodically broadcasts a timestamped message containing the numerical health status of the avatar on which it is running. These numeric values are displayed onscreen or used to drive the virtual world simulation and game mechanics, allowing a user to aid an avatar or orchestrate conflict more thoughtfully, and allowing game logic to decide the fate of a character.

3.1.2 Recount

Recount is a popular "add-on" in World of Warcraft that groups of players use during and after a battle to track which of them are the most effective team members and are doing their share of the effort in a given mission. Both up-to-date and eventually consistent results are required for this add-on to be accurate at the end of a battle and to allow the team members to apportion the rewards.

3.1.3 Airport

Virtual worlds like Second Life host a wide range of applications that attempt to create scenarios and simulate them. To study such a user-defined application scenario, imagine an airport with planes circling overhead. The airport needs to coordinate with the planes and guide them in for a safe landing. However, passengers on the virtual planes may be using their cell phones and generally communicating with other items in the world or around them. The goal of the application is for the control tower at the airport to guide the planes onto a safe landing path, updated at low latency.

3.1.4 Marketplace

Marketplace is an example of a common class of Second Life application. A large object acts as a commercial center where users can advertise, browse, buy, sell, and trade using Second Life's in-game currency. A marketplace serves hundreds or even thousands of customers at once. To purchase a painting, an object sends a message to Marketplace, inquiring whether any paintings are for sale, and Marketplace responds with a list of paintings and their prices.

3.1.5 Gatherer

Gatherer is a WoW application with over 7 million downloads. A main component of the WoW narrative is resource acquisition and management: players "mine" for gold, hunt for rare plants and compounds, and search for hidden treasures. Resources are spread across the world and spawn in preset locations. The Gatherer add-on tracks the locations where a user finds resources to make them easier to recall later. The Gatherer add-on also broadcasts the location of each discovered resource to the player's guild, party, or raid members. Each member's add-on, in turn, handles these messages to build a database of resource locations from all over the world. All other players in the user's guild receive the message, regardless of their proximity to the resource, their ability to extract it, or even whether they are running the add-on. Although Gatherer is used in WoW for resource mapping, similar traffic patterns might be useful in other games, for instance to provide geo-tagging of interesting features.

3.1.6 Spider

Spider is a Second Life application designed to catalog and archive objects in the world. This kind of application was used to gather the world and object layouts used in the experimental evaluations presented in the results section. In Second Life, the spider uses the client protocol to traverse the world, gathering data for the nearby objects that the servers inform it about. Hence, in Second Life a spider is forced to go near objects to measure them due to Second Life's restriction on view distance. Similar spiders have been used in World of Warcraft to gather data on object and avatar configurations.

3.2 Application Analysis

These examples highlight a fundamental assumption of many world designers: a rigid class of objects should receive all messages with the same fidelity. This assumption provides no method for reducing traffic when the system is under load, leading system developers to impose undesirable restrictions, such as harsh communication cutoffs, limited social groups, and other constraints described below.

3.2.1 HitPoint

At first glance, HitPoint seems to be naturally suited to broadcast because all players may be interested in the status of other players. However, as the number of participants increases, the messaging rate increases quadratically. To prevent the resulting overload, virtual world systems either explicitly constrain battle by limiting the number of participants (WoW's instances) or implicitly limit battle through a user feedback loop that is triggered by increased latency (e.g., Eve Online's space battles).
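As a rough worked example (the participant counts are illustrative, not measurements from any particular game): with $n$ participants each broadcasting one status update per period, the system must deliver

$$n(n-1) \approx n^2 \text{ messages per period,} \qquad \text{e.g. } n = 100 \Rightarrow 9{,}900 \text{ deliveries}, \quad n = 1{,}000 \Rightarrow 999{,}000.$$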

3.2.2 Recount

This application, in which users communicate with a guild leader, requires eventual consistency to guarantee accurate accounting. This is difficult to build on top of a pure broadcast mechanism, requiring acks, windows, and so on. Point-to-point messaging is a better fit for an application like Recount. Further evidence of this is that the current Recount code contains arbitrary rate-limiting constants to prevent system overload during heavy raids. A better place for such a mechanism would be at lower layers of the system, which could provide dynamic flow control, fairness, or congestion control.

3.2.3 Marketplace

Implemented with broadcasts, Marketplace and its clients easily overload a system, since all messages between a client and the Marketplace will be delivered to all nearby objects. Leaving aside any privacy and security implications, this approach is clumsy and wasteful. Second Life delivers a single customer's query to disinterested bystanders and even nearby scripted squirrels. Further, distance-restricted broadcast mars the semantics of the world with hidden, sharp discontinuities. With a single step, a customer goes from receiving no messages from Marketplace to a deluge of advertisements, product suggestions, and guidance. A single step back instantly silences this flood. Implementing Marketplace with unicast messaging quickly runs afoul of Second Life's rate limiting of one message per second (with minor provisions for bursts). Such a rate may be acceptable for an initial query message, but it is far too restrictive to support Marketplace's large intended customer base. Correspondingly, marketplaces in Second Life can only support a few concurrent users.

3.2.4 Gatherer

This broadcast-based application is similarly wasteful. Although only a few users may care about the discovered resources, Gatherer delivers messages to every avatar in the group. Furthermore, organizational broadcast treats all receivers equally, even though geometric proximity may inform who might be interested in the discovered information. An herbalist many miles from a rich Dark Iron deposit, and who is not running the add-on to boot, will receive a Gatherer broadcast at the same time and rate as a miner ten feet away from it.

3.2.5 Spider

When querying objects in Second Life, upon object discovery an equal amount of bandwidth is allotted for each object, rate limited to the documented rate in the system. World of Warcraft additionally uses visibility to restrict the data that a client can obtain, so the spider actually needs to navigate the world to find hidden areas in order to scan them. Requiring complex navigation routines to index a world may be interesting, but it makes robust indexing of the world difficult and error prone.

A better design could be for any object to be reachable from any other object, but for nearby objects to be able to transmit faster, at a higher priority. To benefit the spider, however, far objects should be able to use the excess bandwidth. Under this scenario, in an underutilized world, a spider could scan all objects in place, no matter their distance. But when the world is overloaded, the spider could still quickly scan near objects for their data and travel to the farther objects to get information about them. This type of traffic prioritization is not available in current systems, and hence spiders need to travel in a type of space-filling curve to get within a fixed range of any given object and communicate with each object in turn.

3.2.6 Airport

The airport application suffers greatly if the system is under load. Important control-tower-to-airplane communication, between the large airport transmitters and the large airplanes, needs to be prioritized over standard object-object conversations. Without a whole-system-level weighting for packets, other objects in the system can overload the communication channels, leaving air traffic transmissions delayed or lost. Currently deployed virtual world systems have no inbuilt mechanism for favoring traffic from the large, important objects. Second Life offers differentiated message performance based on the API used to send messages, but the limits are ad hoc, and traffic to destinations more than 100 meters away is limited to the slow mechanisms, a situation that is quite common for a fast-moving object like an airplane.

3.2.7 Results

While each of the aforementioned applications can be and is programmed into current virtual world systems, their implementations all have serious drawbacks and failure cases. However, current virtual worlds have grown organically, putting communication systems in place as they are demanded by their developers.

This patchwork of communication systems, with their innate performance tradeoffs, was not architected from the ground up. A communication mechanism designed specifically for virtual worlds might be able to address the needs of the above applications with a small number of simple rules that govern communication. This dissertation argues that Seamless Unicast is one such set of rules.

3.3 Requirements and Challenges

The examples in Section 3.1 motivate the need for unicast as an additional core communication primitive by exposing a number of key challenges that face current virtual worlds. HitPoint illustrates a need for geometrically-based load shedding, so that nearby objects get newer and more accurate updates. Recount illustrates the need for a point-to-point messaging system over which reliability may be built. Marketplace and Gatherer highlight that recipient selection is important and that geometry plays an important role in the relevance of messages delivered by the system. Spider illustrates that high utilization of system resources can contribute to the overall efficiency of the denizens of the world. And finally, Airport illustrates that object volume can play an important role in the level of traffic that the object should be allowed.

These requirements raise the question, however: what properties and performance characteristics should the lowest-level virtual world communication primitive have? To start, there are basic system design considerations. Rather than use brittle and wasteful static resource allocation, the system should dynamically allocate its capacity, so it can achieve high utilization for both large and small numbers of active flows. Additionally, the available capacity should be shared at a fine temporal granularity (e.g., messages) to minimize latency.

Rather than enforce arbitrary communication restrictions, the primitive's aim should be to support communication. Therefore, it should provide a minimum, non-zero guaranteed quality of service to each object pair. This minimum may depend on properties of the particular object pair and may be very small when many object pairs are communicating, but it guarantees that no objects are ever completely cut off from each other. Of course, to achieve high utilization, a better quality of service should be provided when possible. Finally, communication throughput and latency should degrade gracefully. Rather than create sudden discontinuities, leading to confusing artifacts, performance changes should be smooth and gradual.

Generalizing these application requirements results in five properties:

Recipient selection: A sender can decide the recipients of a message.

Graceful degradation: Closer objects have greater throughput than more distant ones, and discontinuities in communication as distance changes should be avoided when possible and small where necessary.

Minimum quality of service: Each object pair $i$ is guaranteed a minimum throughput $t_i > 0$, regardless of the number of other communicating object pairs and their demand. While far-away object pairs may have a tiny $t_i$, nearby pairs receive a reasonable quality of service.

Low latency: Because many message exchanges have a real-time component, the system should deliver messages with low latency, even under load (e.g., 15 ms¹).

High utilization: If demand is less than capacity, the system satisfies all demand, even if it comes from only a few object pairs.

These properties are simple and not especially narrow: could simple, existing techniques satisfy them? This section examines two straw-man solutions, TCP-like fairness and quality of service based on geometric regions. Their failures help explain the challenges the above properties present.

3.3.1 TCP-like Streams

One straightforward implementation of unicast in a virtual world system would be to use TCP, or a TCP-like protocol, directly between objects. Communication between each object pair would map to a TCP flow, receiving an equal share of the available bandwidth.

¹ Although larger latencies are usable, 15 ms is a good baseline as it is within a single frame's period for a 60 Hz display.

TCP-like fairness performs poorly in terms of the graceful degradation property, as nearby and distant objects receive the same bandwidth. A larger problem arises when the system saturates the network: it cannot guarantee a minimum quality of service for each flow without limiting the number of flows ($n$). TCP fairness provides each pair $\frac{1}{n}$ of the network capacity. Suppose a system with throughput capacity $T$ claims to guarantee an object pair $i$ a throughput of $t_i$. It can provide this guarantee as long as $n < \frac{T}{t_i}$; otherwise $\frac{T}{n} < t_i$. Of course, the system could assume $n$ is absurdly large to be safe (e.g., the number of silicon atoms in the universe), but such a weak lower-bound guarantee is not useful to a developer.

The Marketplace application helps explain why TCP-like behavior is problematic. The marketplace needs a minimum quality of service to nearby avatars, so that it can respond quickly and not lose potential customers due to latency. But as the street becomes crowded and more people communicate with each other, the bandwidth between the marketplace and potential customers can degrade below an acceptable level. This problem is similar to the conflict between BitTorrent clients and other applications, where BitTorrent's hundreds of open connections can starve other flows [51].
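A worked instance of this bound, with purely illustrative numbers (the capacities are assumptions for the example, not measurements):

$$T = 1\ \text{Gbps}, \quad t_i = 10\ \text{kbps} \;\Rightarrow\; n < \frac{T}{t_i} = 10^5,$$

so the guarantee already fails once the 100,001st flow becomes active, no matter how little those flows actually send.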

3.3.2 Quality of Service Zones

Another approach is to geometrically prioritize traffic. Second Life, for example, defines three quality of service zones:

• objects on the same server can broadcast with the highest-performance messaging,

• objects within 100 meters can send broadcasts at a high quality of service, and

• all other object pairs must use a severely rate-limited unicast of 1 message/second, with some support for short bursts.

While this approach gives greater bandwidth to closer objects, it does so by creating two pronounced discontinuities in throughput (a quantitative evaluation of these quality of service levels is deferred to Section 6.9). The number of regions, as well as their boundaries and capacities, is arbitrary. Furthermore, with a finite number of zones, at least one must have unbounded size. In Second Life, the third set of objects above encompasses all objects more than 100 meters away – almost the entire world. Objects within that region can suffer from the same weak capacity guarantee as described in the TCP-fairness section above, with both nearby and distant objects having a throughput guarantee so small as to be useless.

Chapter 4

A Weight-Based Approach

The examples in Chapter 3 show how the properties described in Section 3.3 are challenging to provide. In particular, graceful degradation and minimum quality of service represent a difficult combination. This chapter more formally describes the mathematical basis of the Seamless Unicast communication model, which provides both. Chapter 5 describes the challenges in implementing the model scalably and efficiently.

4.1 Minimum Quality of Service

The Sirikata system aims to enforce max-min weighted fair allocation of bandwidth. Max-min weighted fairness guarantees high utilization. Existing algorithms, such as weighted fair queueing [20], can simultaneously provide max-min weighted fairness and low latency. The system assigns each pair a weight $w_i$ based on properties of that pair and allocates bandwidth according to the weights of all flows in the system. The challenge is to select these weights to provide both a minimum quality of service and graceful degradation.

At first glance, the minimum quality of service guarantee might seem difficult to satisfy. In the TCP-like stream example, the bandwidth of each flow is inversely proportional to the number of flows. Adding a new flow reduces the bandwidth available to all of the existing ones. Correspondingly, unless the system places a hard limit on the number of flows, it cannot provide a lower bound on a flow's bandwidth.

More formally, each flow $i$ receives a weight $w_i$. As $i \to \infty$, the sum of weights $W = \sum w_i$ approaches infinity, and $\frac{w_i}{W} \to 0$.

The mathematical intuition for how Seamless Unicast can provide the minimum quality of service property is that if $W$ in the above equation converges to a constant as $i \to \infty$, and $\forall i,\, w_i > 0$, then $\forall i,\, \frac{w_i}{W} > 0$. Put another way, the fraction of bandwidth available to a particular flow $i$ has a lower bound of the non-zero constant $\frac{w_i}{W}$. This constant is independent of how large the world grows or how many object pairs communicate. If the system can enforce these weights, flow $i$ will always receive at least $\frac{w_i}{W}$ of the total bandwidth.

Of course, for $W$ to approach a constant as $i \to \infty$, $w_i$ must approach zero. While some flows may receive significant bandwidth, the guaranteed bandwidth to the $i$th flow becomes vanishingly small. Any world or system ultimately has a finite number of flows: the important property is that the system can guarantee a bandwidth lower bound independent of how many flows there are. One trivial (and undesirable) example of a weighting scheme that satisfies these properties is $w_i = 2^{-i}$. As $i \to \infty$, $W = \sum w_i \to 2$. For a system with throughput capacity $T$, flow $i$ has a guaranteed minimum bandwidth of $t_i = \frac{T}{2^{i+1}}$. For example, $t_0 = \frac{T}{2}$, while $t_{100} = \frac{T}{2^{101}}$ (this exponential falloff is why the scheme is not very desirable).

However, using fair queueing, since $w_i$ remains non-zero, the available throughput to flow $i$ scales appropriately in the presence or absence of load, following the high utilization property. Although $t_{100} = \frac{T}{2^{101}}$, when flow 100 is the only active flow it receives all of $T$.
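To make this concrete, below is a minimal sketch (not Sirikata's actual scheduler) of a generic water-filling, max-min weighted allocator applied to the $w_i = 2^{-i}$ example; the function name and interface are invented for illustration. It shows both halves of the argument: under full demand every flow still receives at least $T \cdot w_i / 2$, and a lone active flow with a tiny weight receives all of $T$.

```python
# A small sketch of the argument above: weights w_i = 2**-i, allocated with
# max-min weighted fairness via a generic textbook water-filling loop.
def max_min_weighted(capacity, weights, demands):
    """Give each flow min(demand, weight-proportional share), redistributing
    leftover capacity among still-unsatisfied flows."""
    alloc = {i: 0.0 for i in weights}
    active = set(weights)
    remaining = capacity
    while active and remaining > 1e-12:
        total_w = sum(weights[i] for i in active)
        finished = set()
        for i in list(active):
            share = remaining * weights[i] / total_w
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            if alloc[i] >= demands[i] - 1e-12:
                finished.add(i)
        remaining = capacity - sum(alloc.values())
        if not finished:
            break
        active -= finished
    return alloc

T = 100.0  # total capacity
weights = {i: 2.0 ** -i for i in range(10)}

# Every flow demands the full capacity: flow i still gets at least T * w_i / 2,
# because the sum of the weights never exceeds 2.
greedy = max_min_weighted(T, weights, {i: T for i in weights})

# Only flow 9 is active: despite its tiny weight, it receives all of T.
alone = max_min_weighted(T, {9: weights[9]}, {9: T})
print(greedy[0], greedy[9], alone[9])
```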

4.2 Geometric Flow Weights

The primary question for this weighting scheme is: how should we assign weights so that they converge, regardless of the number of object pairs? There are a number of schemes that might make sense, ranging from a reservation that objects obtain when signing in, as covered in the previous section, to a scheme based on a financial transaction.

We argue that a third option, a geometric approach, which uses both object coordinates and object volumes, best reflects the relative importance of object pairs. This geometric scheme dedicates more bandwidth to important pairs by favoring both large and close objects. It also allows us to design a weighting scheme which is guaranteed to converge, regardless of the world's size or object configuration, so long as no two objects overlap. Intuitively, two avatars standing next to each other should receive a larger share of bandwidth than avatars standing miles apart, as their actions are relatively more important to each other. However, the size of objects is also important. For example, a large building in the distance is likely more significant to an avatar than a small pebble meters away.

To emphasize the semantics of such weights, consider again the uniform-weight case mentioned in the previous section. In such a scenario, a congested system would reduce all messaging between communicating pairs to unusable levels. In contrast, the weights that we argue for only degrade flows between distant, small objects, maintaining reasonable throughput for nearby, large objects.

We model objects messaging each other as sources of radiation. Each object transmits and receives communication proportional to its volume and inversely proportional to slightly more than the square of its distance to its receiver. Thus, each object pair's flow weight can be computed using a simple equation inspired by electromagnetic waves. The weight of an object pair – conceptually, the number of photons passing between sender and receiver – shrinks as distance increases or objects become smaller relative to one another, but always remains greater than zero.

4.3 Choosing the Falloff

The selected falloff is roughly based on the cross-sectional area of the source and destination objects. Let us denote the source object as $o_s$ and assume it covers the points in 3-space enclosed by volume $V_s$. Likewise, the destination object $o_d$ overlaps the set of points within $V_d$. The weight for data flowing between this object pair is set as follows:

Figure 4.1: Approximate bandwidths using seamless unicast – doubling volume doubles bandwidth, doubling distance roughly reduces bandwidth to 1/4 of what it was.

$$w_{O_s O_d} = \int_{V_s}\int_{V_d} \frac{1}{(sr + \rho)^2 \log^2(sr + \rho)}\, dV_d\, dV_s \qquad (4.1)$$

where $r$ is the distance between the source and destination integration points. Figure 4.1 shows a simplified example of this model to provide an intuition for weights. The flow between the emu and Mothra has a normalized bandwidth of x kbps. The flow between the emu and Godzilla, however, receives twice the bandwidth, because Godzilla is the same distance from the emu as Mothra but twice as large. Finally, the flow between the emu and the Empire State Building receives a bandwidth of 3/4 x, because the building is twice as far from the emu as Mothra, but three times as large.

Section 4.5 proves that the sum of the weights produced by the above function converges to a constant for a set of non-overlapping objects. This means that even for a world of infinite size, completely packed with non-overlapping objects of all sizes, any pair of objects has apportioned to it a non-zero rate of communication. Furthermore, as long as a given server is only responsible for forwarding messages to objects within a finite region, that server never needs to exceed a throughput proportional to the size of the region it manages.

As will be explained and evaluated in Section 4.6.4, in practice we use an approximation of the distance between sender and receiver rather than the real-time value. In the worlds evaluated in this dissertation, s = 0.0065 and ρ = 64, and we simply square the log component. If r is in meters, these settings mean that for a given server, objects on other servers within 1 km can receive 80-90% of that server's throughput to other servers, while the rest of the world receives 10-20%.

This geometrically-based falloff is just one of many possible weighting schemes, but we argue that it is appropriate for virtual worlds because virtual world communication is inherently geometric. Visually more important objects – larger and closer – receive more throughput. As shown, it satisfies the minimum quality-of-service property. Because the falloff function is smooth, it also satisfies the graceful-degradation property. In fact, this smooth falloff can be thought of as an ideal version of the cutoff approach, or as the logical progression of multiple discrete quality-of-service levels organized by distance. In this sense, the Second Life model, with its two tiers of service (broadcast and email), could be thought of as a coarse approximation to the smoother falloff idea, but without the physical motivation or the requisite number of tiers.
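As a concrete illustration of Equation 4.1, the sketch below estimates a pair weight by Monte Carlo sampling over two axis-aligned boxes standing in for the object volumes. The Box class, the sample count, and the specific object placements are assumptions for the example; Sirikata's real implementation and the GSL-based integration used later in this chapter differ.

```python
import math
import random

# Constants from Section 4.3: s = 0.0065, rho = 64, with the log component squared.
S, RHO = 0.0065, 64.0

def falloff(r):
    return 1.0 / ((S * r + RHO) ** 2 * math.log(S * r + RHO) ** 2)

class Box:
    """Axis-aligned box used as a stand-in for an object's volume V."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def volume(self):
        return math.prod(h - l for l, h in zip(self.lo, self.hi))
    def sample(self):
        return [random.uniform(l, h) for l, h in zip(self.lo, self.hi)]

def pair_weight(src: Box, dst: Box, samples=10000):
    """Monte Carlo estimate of the double volume integral in Equation 4.1."""
    total = 0.0
    for _ in range(samples):
        ps, pd = src.sample(), dst.sample()
        total += falloff(math.dist(ps, pd))
    # Average integrand value times the measure of the integration domain.
    return (total / samples) * src.volume() * dst.volume()

# Illustrative geometry only: doubling a sender's volume roughly doubles the
# weight; doubling the distance roughly quarters it (cf. Figure 4.1).
emu = Box([0, 0, 0], [1, 1, 2])
mothra = Box([100, 0, 0], [110, 10, 10])
print(pair_weight(emu, mothra))
```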

4.4 Evaluating the Chosen Falloff Function

To show that the falloff function outlined in the previous section has good properties, it is valuable to look at how it performs in the worst case scenario: a denial of service attack.

4.4.1 Experimental Setup

To set up the world in a configuration that reflects a denial of service, we need to single out a server to be flooded with data from every object in the world. This requires a mapping from coordinates in the world to the servers that are authoritative for regions of it. To simplify the setup, a simulated grid of millions of servers is created, with each server chock-full of objects covering its entire volume. Each server is configured, as in Second Life, to cover a square region of 1/16 km². The assumption is that the world has a fixed height, and hence such a gridding would have responsibility over a finite volume.

Figure 4.2: Layout where all servers are flooding the central server, Server 0. Experiments measure the percentage bandwidth from the blue Server 1, the green Server 2, and the red Server 3 to the gray central Server 0, respectively.

Figure 4.2 illustrates the simulated traffic pattern that occurs, from every object in the world to every object on the gray central server, Server 0. The widths of the arrows indicate approximate aggregate bandwidth between all the objects on that server and each object on the central server. The experiment tests what percentage of the incoming bandwidth to the gray Server 0 will be devoted to the blue Server 1, the green Server 2, and the red Server 3, respectively, in the flood condition. For this experiment, the GSL [29] library is used to compute the definite integral between source and destination server volumes using Monte Carlo integration, with 10,000,000 iterations for the neighboring servers (including the blue region) and 1,000 iterations per ring of perimeter servers past the neighbors (including the green and red regions, respectively).
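The sketch below mirrors the shape of this computation at a much coarser level: it treats each server region as a single point mass at its center (rather than integrating over server volumes with GSL) and reports the fraction of the flooded server's inbound weight contributed by one adjacent server. The grid sizes, the per-center shortcut, and the function names are assumptions for illustration only.

```python
import math

# Coarse sketch of the flood experiment: a square grid of 1/16 km^2 regions,
# each approximated by a point at its center.
S, RHO, REGION_KM = 0.0065, 64.0, 0.25  # 0.25 km x 0.25 km regions

def falloff(r_meters):
    return 1.0 / ((S * r_meters + RHO) ** 2 * math.log(S * r_meters + RHO) ** 2)

def neighbor_share(grid_width):
    """Fraction of the central server's inbound weight coming from one adjacent server."""
    cx = cy = grid_width // 2
    total, adjacent = 0.0, 0.0
    for i in range(grid_width):
        for j in range(grid_width):
            if (i, j) == (cx, cy):
                continue  # exclude the flooded server itself
            r = math.hypot(i - cx, j - cy) * REGION_KM * 1000.0  # center-to-center, meters
            w = falloff(r)
            total += w
            if abs(i - cx) + abs(j - cy) == 1:
                adjacent += w / 4.0  # share of a single one of the four adjacent servers
    return adjacent / total

for width in (51, 201, 1001):
    print(width, neighbor_share(width))
```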

4.4.2 Results

The experiment takes the numerical integral of the bandwidth falloff function over a densely packed world of a fixed size and compares it with the much more constrained integral over the neighboring servers, to compute the bandwidth reserved for a neighboring server even in a worst-case scenario. When doing this integration with the constants specified in Section 4.3, the result is a distribution of bandwidth that reserves almost 5% of bandwidth for a neighboring server under a worst-case denial-of-service scenario.

The plot in Figure 4.3 shows the percentage reserved for the immediately neighboring server with the solid blue line. If the world is small, the neighbor can get up to 17% of the receiving server's capacity, as evidenced at the left of the graph, and this converges to just under 5% toward the right of the curve as worlds grow larger. For the server two grid cells away, 1% of the bandwidth is reserved. Notice that for an Earth-sized world, depicted as the rightmost data points in Figure 4.3(c), the neighboring servers still get significant bandwidth under constant flood: e.g., for a server with a gigabit link, more than 10 megabits per second would be reserved for servers a distance of two away.

In this manner we have examined a falloff function that reserves some bandwidth for important object communications to nearby servers but does not restrict farther communications or impose any hard cutoffs for any reasonably sized world. This fulfills the non-zero property, the minimum quality of service property, and the graceful degradation property of seamless unicast presented previously.

Figure 4.3: Percent bandwidth reserved for neighboring servers (percentage of the flooded Server 0's capacity apportioned to the neighboring server, the server at distance 2, and the server at distance 3, plotted against the number of servers along the width of the square world). (a) Plot of world size versus the percentage of a flooded server's capacity apportioned to neighboring servers. (b) The same percentages for worlds ranging from half a square kilometer up to 3 million square kilometers. (c) The same percentages for worlds up to Earth-sized.

4.5 Proving Seamless Unicast Scales

Now that we have found a function that appears to fulfill several of the important properties of seamless unicast by reserving bandwidth for neighboring servers regardless of the world size, it is necessary to prove that the falloff function actually converges, so that any finite-sized server within an infinitely sized world is apportioned a non-zero share of bandwidth. With this proof, the falloff function is sufficient for seamless unicast regardless of the world's server setup or the object layout within each server.

This proof makes a number of assumptions about the object layout. First, objects are assumed not to overlap, meaning that no point in space maps to more than one object. The second assumption is that the virtual world has a finite height beyond which objects cannot travel. This allows for a more lenient falloff function and is exhibited in most virtual worlds today.

4.5.1 Defining the Scalar Falloff Function

To compute the flow weights, a falloff function $F(R_s, R_d)$ is evaluated, where $R_s$ and $R_d$ are the regions covered by the source and destination objects or, as a close approximation, the bounding regions of their meshes. Defining $F(R_s, R_d)$ in terms of a scalar function $f(r)$ allows us to define weights solely in terms of distance, as the integrals over the source and destination regions account for size:

$$F(R_s, R_d) = \int_{R_s}\int_{R_d} f(|p_s - p_d|)\, dV_d\, dV_s \qquad (4.2)$$

As long as $f$ is greater than zero everywhere, $F$ is also non-zero everywhere, allowing all object pairs to communicate. In the worst case of an infinitely large, fully packed world, where objects fill every position, the total amount of communication from all objects in the world to a single region, managed by a single server, must be bounded by a constant. Formally, let $S(r)$ be the set of points in a zero-centered sphere of radius $r$ (in other words, the volume covered by the world), and let $v$ be a point inside $S(r)$. Let $X$ be a finite-sized region (either an object or a server). The convergence and non-zero requirements can be formulated in terms of $f$ as:

$$\lim_{r\to\infty} \int_{S(r)-X}\int_{X} f(|x - v|)\, dX\, dV \le c \qquad \text{(Convergence)}$$

and

$$f(r) > 0, \quad r \in [0, \infty) \qquad \text{(Non-Zero)}$$

4.5.2 Bounding Differential Bandwidth

Thus, the sum of weights from all points in the world to a single receiver point must converge to a constant c. If the differential bandwidth converges for a single point, p, it trivially extends to a finite region around p.

$$\lim_{r\to\infty} \int_{S(r)} f(|v - p|)\, dV \le c$$

This guarantees convergence, and therefore $F$ will also satisfy the minimum quality of service requirement. In the actual system, this bounds the total input (or output) weight to a region handled by a server. Combined with the server's capacity, a concrete minimum quality of service can be computed. The same equation may be rewritten in cylindrical coordinates:

$$\lim_{R\to\infty}\lim_{H\to\infty} \int_0^{2\pi}\int_0^R\int_{-H}^{H} f\!\left(\sqrt{r^2 + h^2}\right) r\, dr\, dh\, d\theta \le c$$
$$= 2\pi \lim_{R\to\infty}\lim_{H\to\infty} \int_0^R\int_{-H}^{H} f\!\left(\sqrt{r^2 + h^2}\right) r\, dr\, dh$$

Modern virtual worlds often have height limits: for instance, the maximum object height for Second Life viewers newer than 1.20 is 4 kilometers, and before that it was limited to 7.68 kilometers; World of Warcraft has similar height limits, measured from the height of the terrain underneath the avatar. Without this assumption, as in an expansive universe, the falloff function would need to be divided by an additional r term, but under this assumption,

$$= 2\pi \lim_{R\to\infty} \int_0^R\int_{-H_{max}}^{H_{max}} f\!\left(\sqrt{r^2 + h^2}\right) r\, dr\, dh$$

Since we restrict $f(r) > f(r+1)$ for $r \ge 0$ in order for $f(r)$ to converge, and $\sqrt{r^2 + h^2} \ge r$, we can bound the whole expression by a larger one:

$$\le 4\pi H_{max} \lim_{R\to\infty} \int_0^R f(r)\, r\, dr$$

A diverging example: the inverse square law

Now integrating $f(r) = \frac{1}{(sr+\rho)^2}$ turns into a harmonic integral that does not converge:

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_0^R \frac{1}{\left(r + \frac{\rho}{s}\right)^2}\, r\, dr$$

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_{\frac{\rho}{s}}^{R} \frac{1}{r^2}\, r\, dr \;-\; 4\pi H_{max}\, \rho\, s^{-3} \lim_{R\to\infty} \int_{\frac{\rho}{s}}^{R} \frac{1}{r^2}\, dr$$
$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_{\frac{\rho}{s}}^{R} \frac{1}{r}\, dr + k$$
$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \left(\ln R - \ln\frac{\rho}{s}\right) + k$$

Adding a single log term, such as $f(r) = \frac{1}{(sr+\rho)^2 \log(sr+\rho)}$, also fails to converge. This is intuitively because

$$\int \frac{1}{r^2 \log r}\, r\, dr = \log \log r$$

and $\log \log r$ diverges as $r \to \infty$.

A converging example: the inverse square law over log squared

Raising either $sr + \rho$ or $\log(sr + \rho)$ to a power greater than one does allow the formula to converge to a constant. Thus, one function close to the maximum bound (that is, the slowest falloff that meets the requirements) is:

$$f(r) = \frac{1}{(sr + \rho)^2 \cdot \log^{1+\epsilon}(sr + \rho)} \qquad (4.3)$$

where $\epsilon > 0$, $r$ is the distance between the source and destination points, $s$¹ scales the falloff rate, and $\rho$ is non-zero and indirectly controls the maximum weight for an object pair². To prove this we insert $f(r)$ into our equation

$$4\pi H_{max} \lim_{R\to\infty} \int_0^R f(r)\, r\, dr$$

and get

$$= 4\pi H_{max} \lim_{R\to\infty} \int_0^R \frac{1}{(sr+\rho)^2 \log^2(sr+\rho)}\, r\, dr$$

Now we pull a factor of $s^2$ out of the denominator:

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_0^R \frac{1}{\left(r + \frac{\rho}{s}\right)^2 \log^2(sr+\rho)}\, r\, dr$$

Then we rescale the $r$ inside the log term by factoring out $s$:

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_0^R \frac{1}{\left(r + \frac{\rho}{s}\right)^2 \left(\log\!\left(r + \frac{\rho}{s}\right) + \log s\right)^2}\, r\, dr$$

Now we can remove the scaling factor $s$ from the integral entirely by discarding the nonnegative $\log^2 s$ term and turning the equality into an inequality:

¹ If the log term is raised to a non-even power, $s$ must be greater than the base of the log term so that $\log^2 s$ is greater than zero; this is used in the proof.
² The additional log terms over $\frac{1}{r^2}$ are needed because occlusion is not considered.

$$\le 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_0^R \frac{1}{\left(r + \frac{\rho}{s}\right)^2 \log^2\!\left(r + \frac{\rho}{s}\right)}\, r\, dr$$

Then we substitute the variable $r + \frac{\rho}{s}$ with $r$ and change the integration bounds:

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \left[\int_{\rho s^{-1}}^{R} \frac{1}{r^2 \log^2 r}\, r\, dr \;-\; \frac{\rho}{s}\int_{\rho s^{-1}}^{R} \frac{1}{r^2 \log^2 r}\, dr\right]$$

And finally we notice that the right-hand integral converges, and we absorb it into the constant $k$:

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \int_{\rho s^{-1}}^{R} \frac{1}{r^2 \log^2 r}\, r\, dr + k$$

$$= 4\pi H_{max} s^{-2} \lim_{R\to\infty} \left(\frac{1}{\log\frac{\rho}{s}} - \frac{1}{\log R}\right) + k$$

And thus the limit of the definite integral converges to a constant.
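A quick numerical sanity check of the two cases above (a rough sketch, assuming the constants from Section 4.3 and a simple trapezoid rule; it is not part of the formal proof):

```python
import math

# Compare the tail behavior of the diverging and converging falloff functions.
s, rho = 0.0065, 64.0

def inverse_square(r):
    return 1.0 / (s * r + rho) ** 2

def inverse_square_log2(r):
    return 1.0 / ((s * r + rho) ** 2 * math.log(s * r + rho) ** 2)

def cylindrical_integral(f, r_max, steps=200000):
    """Approximate the integral of f(r) * r dr from 0 to r_max (trapezoid rule)."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r0, r1 = i * dr, (i + 1) * dr
        total += 0.5 * (f(r0) * r0 + f(r1) * r1) * dr
    return total

for r_max in (1e4, 1e6, 1e8):
    print(r_max,
          cylindrical_integral(inverse_square, r_max),       # keeps growing (diverges)
          cylindrical_integral(inverse_square_log2, r_max))  # levels off (converges)
```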

4.5.3 Implications

Given the proof that the desired falloff function converges to a constant, an implementation that perfectly applies the falloff function to apportion system bandwidth will scale regardless of the server layout, as long as each server is able to process the requisite bandwidth allowed for its corresponding region. In other words, a system that can enforce the falloff function will provide the guaranteed service to a virtual world no matter the object density, as long as objects are constrained to a maximum finite height and do not overlap.

4.6 Approximating the Falloff Function

Having shown that the chosen falloff function converges to a constant regardless of the size of the world, the next challenge is how to apply it to a distributed virtual world system, with object state scattered across a multitude of servers and moving objects invalidating their positional and even volumetric state.

For this chapter, the virtual world architecture is assumed to have a 1:1 mapping between coordinates in the world and servers that forward messages between objects. Thus, each location in the world is assumed to have exactly one server responsible for that region, which only has local knowledge about the objects within its control. These servers may only query other servers about the whereabouts of objects not within their control, but they do not have instantaneous access to other objects and must compose such queries as messages to an asynchronous service.

4.6.1 Challenges Requiring Approximation

Applying the converging falloff function in this distributed system requires approximating some of the object state, since not all state is available on a single server. Lookups on object bounds and object position can be performed, and caches may be put in place for those object properties that help approximate the falloff function, but object position can change quickly. So, every time a cache is put into place, dated values may creep into it, and TTLs, re-queries, or invalidation messages need to flow alongside application traffic. Another option is to determine an approximation of the falloff function that requires less data to compute.

To determine the right set of parameters to mirror and which to approximate on the servers responsible for message forwarding, an examination of fairness is conducted. On the one hand, too much mirrored state can lead to memory bloat and stale values; on the other hand, blind assumptions about that data can lead to unfairness in apportioning the network resources between objects according to the falloff function.

4.6.2 Regional Approximation

The approximation depending on the least per-object state is one that prioritizes packets on a per-server basis alone. In this regional approximation, all packets are assumed to emanate from the center of their respective servers and from a single object the size of that server. Figure 4.4(b) illustrates the assumption that the regional approximation makes, as compared with Figure 4.4(a).

Figure 4.4: Visualization of the approximations tested with the seamless unicast falloff function. (a) An example object layout where an avatar is sending bits to an airplane modem and a car is activating a garage door at a house. (b) When the regional approximation is used, all objects appear to be the same size at the center of the respective servers. (c) The volumetric approximation assumes objects are all the same size, but located in the correct position. (d) The distance approximation assumes destination objects are centered on their receiving server but that their volumes and source positions are exactly correct.

Essentially, the servers themselves are the non-overlapping objects that allow the falloff function to converge. In this model, the largest and smallest objects on a given server, whether closest or farthest, all get the same priority. No state about individual objects needs to be maintained in the packet forwarding service, only a mapping from server identifier to region. Sections 4.6.5 and 4.6.6 evaluate the fairness trade-offs of using the regional approximation.

4.6.3 Volumetric Approximation

In the volumetric approximation, depicted in Figure 4.4(c), object positions are tabulated exactly, but each object's volume is assumed to be $\frac{1}{n}$th the volume of the server, where $n$ is the number of objects in the region covered by the server. In this model, both the largest and smallest objects in a given region get the same priority, but if one object is near the edge of a server it is communicating with, its weight will be higher than that of another object in the center.

4.6.4 Distance Approximation

The final approximation we will study is the distance approximation. In this model, the message forwarder is aware of the precise volume of each object, but the position of each receiving object is assumed to be at the center of its respective server. This is illustrated in Figure 4.4(d). This allows the relatively unchanging volume of an object to be cached at various points in the system without burdening the sending servers with excess information about the positions of other, receiving objects in the world.
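The sketch below shows the shape of the three approximations, applying the falloff at single representative points and scaling by volumes rather than integrating over object volumes as Equation 4.1 does; the data structures and function names are assumptions for illustration, not Sirikata's interfaces.

```python
import math

# Simplified representation: each object is a dict with "pos" and "volume";
# each server is a dict with "center" and "volume".
S, RHO = 0.0065, 64.0

def f(r):
    return 1.0 / ((S * r + RHO) ** 2 * math.log(S * r + RHO) ** 2)

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def regional_weight(src_server, dst_server):
    # Regional: all objects collapse to one server-sized object at each server's center.
    return (src_server["volume"] * dst_server["volume"]
            * f(dist(src_server["center"], dst_server["center"])))

def volumetric_weight(src_obj, dst_obj, src_server, dst_server, n_src, n_dst):
    # Volumetric: exact positions, but each object is 1/n of its server's volume.
    vol_s = src_server["volume"] / n_src
    vol_d = dst_server["volume"] / n_dst
    return vol_s * vol_d * f(dist(src_obj["pos"], dst_obj["pos"]))

def distance_weight(src_obj, dst_obj, dst_server):
    # Distance: exact volumes and source position; the receiver is assumed to
    # sit at the center of its space server.
    return (src_obj["volume"] * dst_obj["volume"]
            * f(dist(src_obj["pos"], dst_server["center"])))
```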

4.6.5 Theoretical Evaluation of Approximation

To begin evaluating the trade-offs of the three approximation schemes listed in the previous sections, an error for each scheme can be computed. To obtain the data set over which the analysis is run, we observe 64 Second Life servers for an hour and record the locations and volumes of all 50,000 objects present in that time frame.

Data Used Bandwidth Jain’s Fairness Index Misallocated Regional approximation 39.5% ± 7.2% 0.033 ± 0.031 Volumetric approximation 32.6% ± 3.8% 0.054 ± 0.060 Distance approximation 22.3% ± 10.5% 0.743 ± 0.130 Omniscient 0% 1.0

Table 4.1: Fairness cost of approximating object position and volume on per-message fairness, for object position and size distributions measured from 64 Second Life servers.

As for traffic patterns, Second Life does not afford mechanisms to instrument object traffic in the world, and we would also be wary of biasing supposed traffic patterns with the restrictions imposed by current virtual world systems. In this experiment, each of the 2.5 billion object pairs is considered for the error computation. The actual weight is compared with the approximated weight, and the distribution of bandwidth is computed in both cases: the control distribution is weighted by the actual falloff function, and the experimental distribution is weighted by the approximated falloff function. The error is computed once for each of the 64 Second Life servers; the data are averaged and deviations are reported in Table 4.1.

As illustrated in Figure 4.5, the results show that the distance approximation is the best of the three when measured by the absolute amount of bandwidth given to the incorrect objects. However, this metric gives little information about which objects have their bandwidth unfairly allocated. A better metric to gauge fairness in allocation is Jain's Fairness Index [36] (JFI). JFI ranges from $\frac{1}{n}$ to 1, where $n$ is the number of entities across which resources are allocated. A JFI of $\frac{1}{n}$ indicates that all of the resource is given to a single entity, in this case an object pair. A JFI of 1.0 indicates that every entity received its fair share of the resource.
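For reference, the standard definition of the index [36] over allocations $x_1, \dots, x_n$ is:

$$J(x_1, \dots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}$$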

Figure 4.5: Approximations' contributions to reduction of fairness in network resource apportionment. (a) Percent bandwidth misallocated for regional, volumetric, and distance-based approximations per server, when each object measured from 64 Second Life servers communicates with each other object in the world; error bars show how different servers misallocate different percentages of bandwidth based on the object layout on that server. (b) Jain's Fairness Index for regional, volumetric, and distance-based approximations; error bars show how each server has a different fairness based on its arrangement of constituent objects and message destinations.

When comparing the approximations using the JFI, the distance approximation is far closer to the ideal JFI of 1 than any of the other approximations, as depicted in Figure 4.5. This indicates that the distance approximation is reasonable and does not subvert the fairness of the overall system, while avoiding the cost of replicating positional data for every object in the system at each message-forwarding server.

4.6.6 Simulation of Approximation

Equation 4.2 describes how we compute flow weights based on the distance between communicating objects. Due to network latencies, it is impossible to compute this distance exactly. In practice, the transmitting server approximates the distance between the sender and receiver using the center of the receiver's space server as a proxy for the position of the receiver object.



Figure 4.6: Comparison of JFI between a region-based fair queueing system that treats all objects in a region the same (bottom, red) and the distance approximation (top, blue), where each object is prioritized individually. Objects are sorted by weight, and JFI is computed for all objects at higher weight, to see where the fairness is lost.

To test how this approximation affects fairness, we built an ns-2 simulation of the system, which permits comparison to the ideal case using perfect position information. The simulation arranges eight space servers in a line, with four uniformly distributed objects per space server and bounds ranging from 1-8 m³ (spheres with radii of 1-2 m). The 28 objects on the seven rightmost space servers iteratively sent messages to each of the four objects on the leftmost space server. With the distance approximation, this simulation yields a JFI of 0.96 ± 0.03 over ten iterations. JFI ranges from $\frac{1}{n}$ to 1, where $n$ is the number of items competing for fairness. A value of $\frac{1}{n}$ indicates that all of the resource is going to a single item, and 1.0 indicates perfect fairness. Thus this result indicates that very little fairness is lost due to the distance approximation. In contrast, the regional approximation results in a fairness of below 0.5. The contrast can be observed in Figure 4.6, where the red line, the regional approximation, is significantly worse than the blue line.

Figure 4.7: The blue dots represent the individual throughput obtained by pairs of objects, sorted by their fairness on the x axis. The red line represents the ideal bandwidths those objects would have gotten without any approximation.

Chapter 5

Architecture and Implementation

Conceptually, the falloff function in Equation 4.3 is simple. Applying it in a real distributed system, however, creates numerous challenges. For example, objects move, so no central authority knows all object positions, and because the endpoints are hosts rather than switches, bottlenecks can occur at either the sender or the receiver. To help explain these challenges and how to solve them, this chapter describes the Sirikata system architecture and walks through the design of the message forwarder, which provides a seamless unicast abstraction. The following section describes how this final design is implemented in practice.

5.1 Sirikata Overview

Although the application-level properties of worlds may vary – a vast desert planet is quite different from a single megalopolis – the goal of the Sirikata virtual world platform is to provide a unified set of components, primitives, and abstractions that can be used to construct varied worlds. This approach toward an open, extensible platform differs significantly from the closed, commercial products of today, which tightly couple system design with application-level architecture (e.g., EVE Online partitions its universe into disparate solar systems). The Sirikata architecture breaks this coupling by separating object execution from world simulation. Sirikata splits a virtual world into space servers, object hosts, and a content distribution network.


Figure 5.1: The Sirikata architecture layout. Object hosts running objects connect to a space, comprised of space servers. The space is authoritative for object position and the communication fabric of the world.

5.1.1 Space Server Responsibilities

A Sirikata world – a “space” – is quite literally an address space: objects have unique identifiers as well as geometric coordinates. The space is the final authority on what objects are in it, their locations, and their physical properties. The space also handles geometric queries for object discovery and routes messages between objects. A given space is run by one or more space servers, which segment the geometric coordinates of the 3D world. All of the space servers for a world are under a single administrative domain.

Figure 5.2: A virtual world environment in Sirikata with several scripted flying entities.

For two Sirikata objects to interact, they must be in the same space and exchange messages. Space servers mediate all inter-object communication, and objects interact with the world by directly sending messages to the space. For example, a movement command from an object goes to the space, while a request to open a door passes through the space and goes to the door. A Sirikata space has four basic responsibilities. First, it routes and forwards messages between objects. Second, it maintains the authoritative position and other geometric properties of objects by handling physics and movement requests. Third, it answers geometric queries about what other objects are nearby. Finally, it provides audio streams of the environment. Sirikata space servers handle the first three; special audio-mixing hosts handle the last.

Figure 5.3: Sirikata virtual world with shaders and a live in-world browser.

Coordinate Segmentation (CSEG)

A Sirikata space divides its virtual world into regions, and each region maps to a space server process. This section describes Coordinate SEGmentation (CSEG), the service that maintains the mapping between regions and servers. Region sizes in CSEG can vary greatly, depending on the message rate that objects in a region generate and the complexity of simulation. Variable-sized regions enable a space to control the amount of traffic that any server process has to support. In addition to mapping regions to space server processes, CSEG decides how a space is divided into regions, load-balancing across space server processes.

Variable-sized regions introduce the challenge of maintaining a mapping between regions and the space servers responsible for those regions.

Figure 5.4: Viewing a Sirikata virtual world inside a browser that supports the WebGL specification. regions and space servers responsible for those regions. This mapping is required in many use cases: a new object joining the system at a certain point in the virtual world needs to know which server to connect to; an object moving from region A to region B needs to find out which server manages region B; and servers in the system need to know this mapping to recover from failures and coordinate with each other to respond to load. The Coordinate Segmentation (CSEG) service maintains a consistent mapping from a set of regions of space to the server processes that manage those regions. Objects and space servers pose queries in the form of a list of 3D bounding boxes: CSEG responds with a list of space server identifiers. Point queries, such as those from objects entering a world, are bounding boxes of size zero. The reverse query, from server identifier to list of regions, is also available. Maintaining a high performance, consistent, fault tolerant CSEG service is critical to a long running virtual world system; however, that goes beyond the scope of the work in this dissertation, and 56 CHAPTER 5. ARCHITECTURE AND IMPLEMENTATION

Figure 5.5: Sirikata virtual world sunset overlooking terrain and trees. from this point CSEG is assumed to be a consistent black box system.
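A minimal sketch of the query shape just described (the class and method names here are hypothetical, not Sirikata's API): a forward query maps bounding boxes to the space servers whose regions they intersect, and a reverse query returns the regions a given server manages.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class BBox:
    min_corner: tuple   # (x, y, z); a point query uses min_corner == max_corner
    max_corner: tuple

class CSEG:
    def __init__(self, region_map: Dict[str, List[BBox]]):
        # server id -> list of regions that server is responsible for
        self.region_map = region_map

    def lookup(self, queries: List[BBox]) -> List[str]:
        """Return the space servers whose regions intersect any queried box."""
        hits = []
        for server, regions in self.region_map.items():
            if any(self._intersects(q, r) for q in queries for r in regions):
                hits.append(server)
        return hits

    def regions_of(self, server_id: str) -> List[BBox]:
        """Reverse query: which regions does this server manage?"""
        return self.region_map.get(server_id, [])

    @staticmethod
    def _intersects(a: BBox, b: BBox) -> bool:
        return all(a.min_corner[i] <= b.max_corner[i] and
                   b.min_corner[i] <= a.max_corner[i] for i in range(3))
```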

Messaging Within the Space

Space servers provide a best-effort datagram service between objects run on object hosts. Because a space is segmented across multiple space servers, a message between objects may pass through two servers. Figure 5.7(a) shows an example of the logical operation of object A sending a message to object B within space α. Figure 5.7(b) shows how this can map to servers. Figure 5.8 shows the flowchart for how a space server forwards messages between objects. A destination is local if the space server is authoritative for that object. The flowchart has three basic outcomes: send the message to the destination's object host, forward the message to another space server, or drop the message.
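The following sketch restates those three outcomes as code. The Forwarder class, its fields, and the asynchronous lookup call are invented for illustration; Sirikata's actual forwarder is a C++ pipeline with queueing and prioritization omitted here.

```python
from enum import Enum

class Outcome(Enum):
    DELIVER_TO_OBJECT_HOST = 1
    FORWARD_TO_SPACE_SERVER = 2
    LOOKUP_PENDING = 3  # message queued for retry, or dropped under load

class Forwarder:
    def __init__(self, local_objects, oseg_cache, oseg_service):
        self.local_objects = local_objects   # objects this server is authoritative for
        self.oseg_cache = oseg_cache         # object id -> space server id
        self.oseg_service = oseg_service     # asynchronous authoritative lookup

    def route(self, message):
        dst = message["dst_object"]
        if dst in self.local_objects:
            # Local destination: hand the message to the destination's object host.
            return Outcome.DELIVER_TO_OBJECT_HOST
        server = self.oseg_cache.get(dst)
        if server is not None:
            # Cache hit: forward toward the space server authoritative for dst.
            message["next_hop"] = server
            return Outcome.FORWARD_TO_SPACE_SERVER
        # Cache miss: issue an asynchronous OSeg lookup and retry (or drop).
        self.oseg_service.lookup_async(dst)
        return Outcome.LOOKUP_PENDING
```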

Object Segmentation (OSEG)

In order to send a message, a Sirikata space server must resolve which space server the message's recipient is located on. The forwarder accesses a separate object segmentation service for this purpose, which provides an authoritative mapping between objects and space servers. OSEG is a key-value store built on top of CRAQ [57], a scalable chain-replicated storage system that organizes its storage nodes into a ring using consistent hashing. Often, read requests to the OSEG service are not needed to resolve an object's location, as each space server maintains a cache of recent OSEG lookups. This cache improves message latency, diminishes the fairness distortions that lookups cause, and reduces load on OSEG. The cache is kept fresh through the cooperative invalidation of stale entries (as shown in Figure 5.8). Servers maintain a good cache hit rate by leveraging geometric information in their cache eviction policy, which provides better performance than least-recently-used eviction for some important working sets. While this is another interesting use of geometry to improve virtual world performance, these results are not contributions of this work.
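As one way to picture such a policy, here is a toy cache that evicts the entry for the object farthest from the server's own region; the interface, the specific policy, and the stored fields are illustrative assumptions rather than Sirikata's implementation.

```python
import math

class GeometricOSegCache:
    """OSeg lookup cache whose eviction prefers to keep entries for nearby objects."""
    def __init__(self, capacity, my_center):
        self.capacity = capacity
        self.my_center = my_center   # center of this server's region
        self.entries = {}            # object id -> (server id, last known position)

    def get(self, object_id):
        entry = self.entries.get(object_id)
        return entry[0] if entry else None

    def invalidate(self, object_id):
        # Cooperative invalidation: a peer tells us the object has migrated.
        self.entries.pop(object_id, None)

    def put(self, object_id, server_id, position):
        if object_id not in self.entries and len(self.entries) >= self.capacity:
            # Evict the entry for the object farthest from this server's region,
            # on the assumption that far objects are least likely to be reused.
            farthest = max(self.entries,
                           key=lambda oid: self._dist(self.entries[oid][1]))
            del self.entries[farthest]
        self.entries[object_id] = (server_id, position)

    def _dist(self, position):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(position, self.my_center)))
```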

5.1.2 The Object Host

Responsive and animate objects are key aspects of a compelling virtual world: a dog should bark when someone attempts to steal from its master's virtual home; a flower should grow and blossom; a machine gun should run out of ammunition as it fires. To provide such engaging objects, each object has an associated script that specifies its behavior. Unlike most commercial systems, Sirikata federates object scripting, creating a separate entity, the object host, specifically tasked with executing object code. Object hosts run object scripts while connected to space servers. Space servers in turn route location updates and message traffic back to the object host.

5.1.3 The Content Distribution Network

Although potentially dynamic and changing places, virtual worlds still contain a great deal of large, static content. For instance, in a rich visual world, an object’s mesh/texture

could easily be several megabytes. Given that many avatars may access these large meshes and that latency can have a profound effect on a user’s perceptions [17], the Sirikata system incorporates a content distribution network (CDN), which serves large data resources. Objects do not communicate these large data items directly: they pass references to elements in the CDN. The CDN offloads high-bandwidth communication from space servers and object hosts, while providing a natural way to manage replication and accessibility of commonly used resources (e.g., a particular vehicle model).

5.1.4 Putting it All Together

The Sirikata architecture supports federated, application-rich virtual worlds by separating a virtual world into three individually administered parts: space servers, object hosts, and a content distribution network (CDN). This section describes the architecture, focusing on space servers, which control and govern communication. The rest of this dissertation deals with the two core space-server communication services, the message forwarder and the object-to-server map (OSeg). This decomposition allows the Sirikata architecture to support federated worlds as well as more traditional applications. Users can run objects on hosts they control, yet interact in a neutral space, as in the open world of Figure 5.6(c). Similarly, a game company can run both object hosts and space servers, completely controlling the code in its world, as in Figure 5.6(a). The interested reader may refer to a fuller accounting of each of these systems, their submodules, and their responsibilities [34]; our adolescent virtual world platform is currently 84,355 LoC. Figures 5.1, 5.3 and 5.5 show screenshots of a graphical object host connected to a space server.

5.2 Forwarder System Design

To meet the requirements of seamless unicast, a Sirikata space server must ensure weighted fairness across both active inbound flows and active outbound flows between object pairs.

Figure 5.6: Four ways to deploy Sirikata: (a) Game, (b) Second Life, (c) Open Virtual World, and (d) Webby. In (a), a game company runs all components except for the clients. In (b), Sirikata is configured to appear like Second Life, where object hosts live on the same CPU as their respective space servers and an object migration policy is in place. In (c), one company runs the space and CDN for an open social virtual world, while third parties provide their own objects. In (d), many CDNs and spaces coexist, and objects pull their contents from private webservers and connect to multiple spaces.


Figure 5.7: Inter-object messaging. Logically, a message from A to B passes through space α; in the system, the message passes from object host OHA to space server α1, to space server α2, to OHB.


Figure 5.8: Flowchart of the object message forwarding pipeline in a Sirikata space server. The bottom of the right side is the case when a space server receives a message for an object that has moved away.

(a) One queue per flow

(b) CSFQ tracking single stat per flow

(c) Tracking single statistic per server

Figure 5.9: Potential forwarding architectures.

For example, if a server has three incoming flows of weights a = 1, b = 4, and c = 5, the space server gives 10% of its incoming capacity to a, 40% to b, and 50% to c. Similarly, another space server that has three outgoing flows of weights a = 1, d = 2, and e = 2, gives 20% of its outgoing capacity to a, 40% to d, and 40% to e. Note that flow a is between the two servers, but the differing demands mean that the source is willing to give 20% of its outgoing capacity while the destination is only willing to give 10% of its incoming capacity. The values could also be reversed, and hence, either the source or destination can be the bottleneck. A sender is the bottleneck when it cannot send enough data to use its share of the receiver’s capacity. The data it does send, however, must still be fair across the individual object-pair weights. In this case, the receiver can spread the excess incoming capacity across objects from other servers. This can happen, for example, if the sender is communicating with many other servers but the receiver is communicating only with the sender. Conversely, a receiver is the bottleneck when it cannot receive data fast enough to use its share of the sender’s capacity. Figure 5.9(a) shows a simple design that achieves fairness in the system regardless of whether the bottleneck is at a sender or receiver. The forwarder maintains two sets of weighted fair queues [20], one for outgoing traffic and one for incoming traffic. Each object pair has a separate queue. Flow control between the sender and receiver space server on each object pair allows the receiver to indicate when it is the bottleneck, in turn allowing the sender to shift bandwidth to other flows. This enables high utilization when some space servers are saturated. While this simple design leads to the desired behavior, it has a serious drawback: it is prohibitively expensive, from both a state and computation perspective. Namely, it requires a separate queue per flow and O(log(n)) operations per message, where n is the number of flows [52]. This is especially problematic for incoming traffic: a single, extremely popular object may have hundreds of thousands of other objects interested in it.
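
As a concrete illustration of the arithmetic above, the short Python sketch below computes each flow's share of a server's capacity from its weight; the flow identifiers and the capacity value of 100 units are made up for the example.

    # Weighted shares of a server's capacity, as in the example above.
    def weighted_shares(capacity, weights):
        """Map each flow to its weighted share of the given capacity."""
        total = sum(weights.values())
        return {flow: capacity * w / total for flow, w in weights.items()}

    # Incoming side of the example: weights a=1, b=4, c=5 on 100 units of capacity.
    print(weighted_shares(100, {"a": 1, "b": 4, "c": 5}))   # {'a': 10.0, 'b': 40.0, 'c': 50.0}
    # Outgoing side of the other server: weights a=1, d=2, e=2.
    print(weighted_shares(100, {"a": 1, "d": 2, "e": 2}))   # {'a': 20.0, 'd': 40.0, 'e': 40.0}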

Core Stateless Fair Queueing (CSFQ) is a well-known approach for dealing with the state and CPU requirements of standard fair queueing [54]. Rather than sort all of the active flows and apply admission control to n separate queues, CSFQ has a single FIFO queue. When CSFQ receives a message, it uses statistics maintained on active flows to decide whether to put the message on the FIFO queue or drop it. In our implementation, this requires 20 bytes of state per flow (as opposed to a separate queue that is multiple messages in depth). The end result is a queue that behaves like a fair queue but requires much less state and CPU overhead. Figure 5.9(b) shows a second forwarder design that uses CSFQ. While CSFQ requires less state than standard fair queueing, the need to maintain per-flow statistics can still be expensive on the receive path. Further, this design still requires flow-control state for each object pair on both sender and receiver. Figure 5.9(c) shows the forwarder design, used by Sirikata, that solves these problems. By aggregating all of the traffic from a single server into a single queueing entry and a single flow between space servers, the forwarder has to maintain much less state. Because a sender is allocating that capacity fairly among its active flows, object pairs within that stream receive their fair share of the stream’s capacity.
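
The essence of CSFQ is a probabilistic drop decision made in front of a single FIFO. The sketch below is a minimal, generic illustration in which the only per-flow state is a rate estimate; it is not Sirikata's implementation, which, as described later, additionally adapts the capacity estimate and normalizes the per-flow rate estimates.

    import math
    import random

    def csfq_accept(flow_rate_estimate, fair_share_rate):
        """Probabilistically accept a packet so that a flow's admitted rate
        approaches min(arrival rate, fair share rate)."""
        if flow_rate_estimate <= fair_share_rate:
            return True
        # Classic CSFQ rule: drop with probability 1 - fair_share/rate.
        return random.random() < fair_share_rate / flow_rate_estimate

    def update_rate_estimate(old_rate, packet_bits, interarrival, tau=0.1):
        """Exponentially weighted estimate of a flow's arrival rate, in the
        style of the estimator used by core stateless fair queueing."""
        w = math.exp(-interarrival / tau)
        return (1.0 - w) * (packet_bits / interarrival) + w * old_rate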

5.2.1 Forwarder Structure

While Figure 5.9(c) is a simple conceptual diagram of the queueing structure in Sirikata, Figure 5.10 shows the details of how it works in practice. Sirikata splits queueing into three separate stages. The first two stages execute on the sending space server. The first queueing stage enforces fairness between flows sent between the same pair of space servers: if space server A has capacity K to space server C, each flow from A destined for C receives its fair share of K. The second stage enforces fairness within traffic from the sending server to all receiving servers: this fairly allocates output capacity. This stage determines the value K, based on the total output capacity O. The third stage executes on the receiving server, and it enforces fairness within traffic from any sending server to the receiving server: this fairly allocates input capacity. Together, these three stages follow the assigned flow weights, no matter where a bottleneck is. Every stage is necessary for correct operation.


Figure 5.10: Fair queueing design. A message from an object in region SSB (running on OH2) sent to an object in region SSC (running on OH3) passes through three queueing stages. Each queue is shown by a downward-facing triangle, and the message’s path by the dark line.

Without the first stage, the capacity between a single pair of servers is not fairly allocated between the corresponding object pairs. Without the second stage, each destination server would receive an equal share of a sender’s capacity: a destination with a single flow with a tiny weight would use the same outgoing capacity as a destination with hundreds of flows with large weights. Finally, without the third stage, the converse problem can occur: a saturated receiver would give a destination with one tiny flow the same incoming capacity as a destination with hundreds of large flows.

5.2.2 Queueing Implementation Details

This section describes the details of Sirikata’s queueing implementation, in particular, how the incoming stage controls its input and how the stages communicate relevant information to each other for correct weight calculations.

Inter-space server communication is over TCP. This approach means that when a receiving space server is saturated, TCP’s flow control explicitly signals a sender when it may transmit a message: this protects a receiving space server from inadvertent denial-of-service attacks, as it receives messages at the rate it can process them. For each flow originating from it, a space server maintains an estimate of that object pair’s weight. The space server knows the exact size and position of the source object, because it is authoritative for that object. It estimates the position of the destination object as the center of the destination space server’s region. As described in Section 5.1.1, the object’s bounding volume, which is a conservative approximation of the region covered by the object, is obtained as part of the OSeg lookup that determines its authoritative server for routing. One challenge that arises in practice is that object pairs often have bursty communication or use less than their available capacity. Because the third queueing stage is not aware of per-pair weights, only the aggregate weight, a sender must provide real-time updates on this aggregate weight as flows shift between active and inactive. Otherwise, a high-weight flow with low utilization inflates the aggregate weight from that space server to its destination, giving other flows more than their fair share. For example, consider the case where there are two flows, with weights 1 and 100. If the weight 100 flow is only using a quarter of its available share (25 units), then reporting the two flows as having a weight of 101 can lead the weight 1 flow to receive the excess capacity of 76, much more than its share. The correct aggregate “used weight” to report is 26.

The basic approach is to compute each flow’s “used weight” $u_i$ for flow $i$, which is the fraction of its weighted share it is actively using ($0 < u_i \leq w_i$). Sirikata defines $u_i$ for a flow $i$ as
\[
u_i = w_i \cdot \min\!\left(\frac{r_i}{C w_i / U},\; 1\right)
\]
where $r_i$ is the arrival rate, $w_i$ is the weight computed by the falloff function $F$, $C$ is the capacity, and $U$ is the total used weight of all active flows ($U = \sum_i u_i$).
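
As a worked instance of this definition, consider the two-flow example above. Assuming a capacity of $C = 101$ units (consistent with the numbers quoted) and, purely for illustration, taking $U$ in the denominator to be the previously reported aggregate weight of 101 (both flows assumed fully active), the heavy flow’s used weight comes out to
\[
u_{100} = 100 \cdot \min\!\left(\frac{25}{101 \cdot 100 / 101},\; 1\right) = 100 \cdot \frac{25}{100} = 25,
\qquad
u_1 = 1,
\]
so the reported aggregate used weight is $25 + 1 = 26$, matching the example. This reading of the update schedule is an assumption made for illustration, not a statement of Sirikata’s exact implementation.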

Each stage one queue computes these $u_i$ and generates a stream of packets with fairness enforced according to these $u_i$. The stage one queue reports a single value to stages two and three: the sum of the $u_i$ of its flows. This sum (a single floating point value) is used as the weight on the inputs of stages two and three. Because stage one has computed the used weights, enforcing fairness on the aggregate inputs at stages two and three generates the correct ratio of packets from all input streams.

$U$ introduces a feedback loop: stages two and three compute $U$ in terms of the $u_i$ (indirectly via the sums provided by stage one), and the $u_i$ are computed using $U$. These two computations require sending and receiving space servers to share state with one another. This state sharing is embedded in fields of inter-server messages. The sender tells the receiver the sum of used weights for its flows; the receiver tells the sender its capacity and the sum of all incoming used weights. This is a simple, cross-network control loop, as the values $u_i$ and $U$ are dependent on one another. However, the simplicity of the function above, combined with a minimum send rate term, means they quickly converge.

Finally, this dynamic behavior requires Sirikata to extend CSFQ. Standard CSFQ assumes that the output rate is a constant (i.e., the line rate). In Sirikata, this is not the case, as the rate at which a space server can send messages to another space server depends on the load at both servers. Sirikata uses dynamic measures of input and output capacity to adapt drop probabilities. Our implementation also normalizes individual flow rate estimates to the total flow rate estimate, since many flows are thin, causing noisy predictors. This ensures the CSFQ algorithm does not drop more messages than it should.
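
The state exchange that drives this control loop can be pictured as a few extra fields piggybacked on inter-server messages. The Python sketch below is purely illustrative (the class and field names are invented, not Sirikata's wire format): the sender attaches its sum of used weights, and the receiver replies with its capacity and the total incoming used weight it currently observes.

    from dataclasses import dataclass

    @dataclass
    class SenderReport:
        # Piggybacked by the sending space server on outgoing messages.
        sum_used_weight: float        # sum of u_i for this sender's flows to the receiver

    @dataclass
    class ReceiverReport:
        # Piggybacked by the receiving space server on acknowledgements.
        capacity: float               # receiver's current input capacity C
        total_used_weight: float      # U, summed over all senders

    def sender_fair_rate(report: ReceiverReport, my_sum_used_weight: float) -> float:
        """The share of the receiver's capacity this sender may use."""
        return report.capacity * my_sum_used_weight / report.total_used_weight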

5.2.3 Worst-Case Guarantees on the Queueing System

Both the CSFQ and standard weighted fair queueing algorithms have been studied in isolated systems. By coupling them with the variable-rate TCP link, we have pushed the algorithms beyond their initial design and need to study how the added complexity affects the overall fairness the system delivers to its constituent objects. In a steady state, each system performs exactly as it would in isolation, with fixed output rates and fixed object input rates.

Thus, in this case, a flow with fair share rate $r_\alpha$ sending at rate $R$ with packets no larger than $l_{\max}$ would receive no more than
\[
r_\alpha \left(1 + \ln\frac{R}{r_\alpha}\right) + l_{\max}
\]
excess service, based on the proof in Stoica et al. [55]. Given this steady-state behavior, the operational questions are: how does the bound change when the system is in flux, and how long does the system take to return to a steady state? To analyze the system, the possible situations need to be enumerated and analyzed separately.

1. Adding additional flows to a receiving space server

2. Removing flows from a receiving space server

3. Adding flows from a sending space server

4. Removing flows from a sending space server

If the servers are not saturated, every flow is sending at or below its fair share rate, so packets continue to get forwarded regardless of the new flow. This leaves only the case of adding or removing flows from saturated space servers.

Adding additional flows to a saturated receiving space server

In one scenario, the receiving server is at capacity and can only process a fixed number of packets, less than or equal to the total rate of packets destined for the receiving server. In a steady state, the queueing system operates exactly like CSFQ and abides by the above equation for excess service. However, when additional load is added, depending on whether the flows are using their entire fair share, the system may experience an additional reduction of fairness until the TCP network queues between space servers drain. Let us assume that a sending space server is forwarding one or more flows using less than their fair share and one or more flows sending at or above their fair share.

(a) A system in a steady state, about to receive additional flows from objects connected to a new space server. (b) The temporary loss of fairness until the intra-space TCP queues drain after the addition of a new sending space server to an already saturated receiver.

Figure 5.11: When a new connection arrives, the low weight flow’s fair share rate will be lowered because it used more than its entire fair share. Its drop rate will increase to the new value as soon as the notification packet returns from the downstream server, but the mix of packets between flows µ1 and µ2 will remain incorrect until the TCP queues drain and the new ratio of packets percolates to the front of the queue.

This scenario is depicted in Figure 5.11(a) with a space server receiving two flows: one called µ1, arriving at one packet per millisecond with weight 2.0, and another called µ2, arriving at 10 packets per millisecond with a weight of 1.0. As soon as a new flow, µ3, from another space server is added to the system with weight 4.0, all funneled through a receiver processing 6 packets per millisecond, Figure 5.11(b) illustrates how the immediate effect of the change in drop rate quickly changes the ratio of packets admitted to the queue. It also shows that the incorrect mix will remain in the queue until all packets already within have been processed. Thus, over the length of the queue, the flow µ2 will be receiving more than its fair share of traffic. In the example in Figure 5.11, since half the traffic was being dropped before the addition of new downstream traffic, at least some of the flow is being stemmed preemptively.
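
The fair-share arithmetic behind Figure 5.11 can be made explicit. Under weighted max-min fairness, flow $i$ is admitted at rate $\min(r_i, w_i \alpha)$, where $\alpha$ is chosen so the admitted rates sum to the receiver's capacity; the worked numbers below assume µ3 sends at or above its fair share. Before µ3 arrives,
\[
\min(1, 2\alpha) + \min(10, \alpha) = 6 \;\Rightarrow\; \alpha = 5,
\]
so µ1 is admitted at 1 packet per millisecond and µ2 at 5 (half of µ2's traffic is dropped, as noted above). After µ3, with weight 4, begins flooding,
\[
1 + \alpha + 4\alpha = 6 \;\Rightarrow\; \alpha = 1,
\]
and µ2's admitted rate falls from 5 to 1 packet per millisecond while µ3 receives 4; until the inter-server TCP queues drain, however, the queued packets still reflect the old 1:5 mix.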

Figure 5.12: In the region-based approximation, the object sending above its fair share rate gets more traffic than it should both before and after additional traffic is added to the system.

One of the approximations presented before was the region-based approximation, where all objects on a server were treated equally. This approximation is a good benchmark for how well the Sirikata implementation of the distance approximation does when the system is in flux. The region-based approximation for object traffic treats all object traffic into a given region equally, with traffic from the µ2 flow being admitted above its fair share rate. Figure 5.12 illustrates the effect that adding additional traffic has in the region-based approximation.

This comparison is important because, during fluctuations in the system, the region-based approximation is exactly the worst-case bound on excess share being given to flows sending above their new fair share rates. Figure 5.13 shows a situation that precisely hits this fairness bound. Figure 5.14 illustrates the same mix of object flows’ packets being allowed through, despite their weight differences, due to the region-based approximation.

(a) A system in a steady state that will result in fairness comparable to the region-based approximation, about to receive additional flows from objects connected to a new space server. (b) The temporary loss of fairness until the intra-space TCP queues drain after the addition of a new sending space server to an already saturated receiver.

Figure 5.13: This case is similar to Figure 5.11, but in this example, the unfairness during the queue draining transition period is equal to the unfairness garnered by the region-based object fairness approximation.

Proof that the region-based approximation bounds worst-case deviation from the fair-share rate

The proof for this worst-case bound when load is being added to the receiver is as follows. First, we examine what happens when all flows are sending data at or above capacity before additional load is added to the system. In this steady state, all flows are being admitted in proportion to their weights alone since they are all at or exceeding their fair share. Hence, the mix of packets in queues between space servers will be in proportion to their weights. In this case, adding new flows will not increase the number of flows at capacity, nor will it change the desired mix of packets in the queues. So as long as the receiving space server pulls packets from all the senders to it at the correct new ratio, given the new data arriving at it, the fairness will not be impacted. This analysis leaves only the case where at least one flow is arriving at a rate below its fair share.

Figure 5.14: In the region-based approximation, the object sending above its fair share rate gets more traffic than it should both before and after additional traffic is added to the system. In this case it performs in the same manner as Figure 5.13 during the transition period.

In some cases, other flows will be sending above their fair share rate, as in Figure 5.11, and in others they will be sending at or below their fair share rate, as in Figure 5.13. Always in the former case and sometimes in the latter case, adding the downstream load will cause certain flows to be sending at a higher rate than would be allowed given the downstream rate and the receiver being the bottleneck. If this is not the case, fairness is preserved despite the added downstream flows. But if one of the streams is sending at a higher rate, its rate should be reduced to accommodate packets from the underutilized flow. Indeed, as soon as a packet is sent back, the drop rates are updated accordingly, but the TCP queues need to drain in order to obtain the new mix of packets. Figure 5.13 illustrates a case where, during the interim period, the mix of packets matches what the region-based approximation would have given, by admitting all packets from any flow into the space server at the same ratio as each other. When load is added downstream, the fair share rates will never be higher than they were for nodes at the current level, so the currently dropped proportions, which are necessarily greater than or equal to zero, will always be closer to the desired fair share rates than they would be if packets were dropped uniformly.

Therefore the region-based method bounds the excess service given to a single flow in Sirikata’s queueing implementation when load is added downstream.

Removing flows from a saturated receiving space server

When a downstream server suddenly has additional capacity, because, for instance, load subsided, packets will have been unnecessarily dropped. As soon as the downstream server notifies the upstream of the new total weights and fair share capacities, the drop rate will be reduced appropriately. Thus flows experience only a round-trip time’s worth of delay before they reach the new, higher fair share rate.

Adding and removing flows from the sending space server

When flows are added to or removed from the sending space server, the system behaves just as it would in the single-server case presented in the CSFQ analysis. Updates to the drop rate are instant and there are no flow-controlled queues within a space server, so the fairness naturally follows the bounds presented in the core stateless fair queueing report [55].

5.3 Example Execution

Figure 5.15 shows the internals of a Sirikata space server. To demonstrate how these services coordinate to provide a virtual world with scalable communication, an example of an object O entering a world and communicating with another object is presented. We assume that O has been granted entry to the space and given its initial position.

O’s object host must find the space server SSO authoritative for O’s position. To accomplish this task, the object host sends a query to any space server. That space server contacts the Coordinate Segmentation (CSeg) service, issuing a lookup which returns the authoritative space server ID, SSO, for that position.


Figure 5.15: Space server internals. Dashed lines show network connections.

The object host connects to SSO and registers O. Registering O puts an entry in SSO’s location table (Loc). SSO writes an entry to the Object Segmentation service (OSeg), which maps object identifiers to their authoritative space servers.

O is now present in the world and visible to other objects. O registers a standing (streaming) query to discover other relevant objects with the Potentially Interest- ing Object (PIntO) service. PIntO uses geometric properties of objects to decide whether they are visible or interesting to each other. As soon as one object becomes visibly large enough to another, PIntO begins streaming object identifiers and CDN references to geometric properties such as meshes.

Using an object identifier returned from PIntO, O interacts with it by sending a message. The object host sends this message to SSO’s Forwarder. The Forwarder uses OSeg to determine the destination space server. The destination space server’s Forwarder forwards the message to the appropriate object host, which delivers it to the destination object.

5.4 Implementation

The current Sirikata space server is approximately 34,500 lines of code (measured by SLOCCount [60]). Of those, there are approximately 4,000 in the forwarder, 7,500 in OSeg and its caches, 1,800 in CSeg, 3,300 in Loc and PIntO, and 17,900 in other assorted services and utilities, including 5,200 for the message forwarder and inter-object protocol. To leverage multicore processors, the space server is highly multithreaded: the current implementation has eight active threads. To reduce locking overhead and simplify concurrency, each thread handles a separate set of operations and threads pass asynchronous messages. For example, message forwarding is one thread, but an OSeg cache miss passes a message to the OSeg request thread. Inter-thread messaging can cause additional latency, so for messages that simply pass through the space server and are forwarded directly to an object host, avoiding OSeg, a special fast path is provided that quickly puts the message on the outbound queue for the appropriate object host, without exiting the handling thread.
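
The thread-per-service structure described above can be pictured as a simple queue-per-thread handoff. The Python fragment below is an illustrative sketch, not Sirikata's C++ code: the forwarding thread hands OSeg cache misses to a lookup thread over a queue, while fast-path messages never leave the forwarding thread.

    import queue

    # One queue per worker thread; threads communicate only by passing messages.
    oseg_requests = queue.Queue()

    def oseg_request_worker(resolve):
        """Runs in its own thread; resolves OSeg cache misses asynchronously."""
        while True:
            item = oseg_requests.get()
            if item is None:                  # shutdown sentinel
                break
            dest, on_resolved = item
            on_resolved(dest, resolve(dest))  # e.g., re-enqueue the message for forwarding

    def handle_cache_miss(dest, on_resolved):
        """Called from the forwarding thread: hand off without blocking."""
        oseg_requests.put((dest, on_resolved))

    # The fast path never touches this queue: a message whose destination object
    # host is directly connected is placed on that host's outbound queue in-thread.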

Chapter 6

Virtual World Applications in Sirikata

This chapter seeks to answer four questions:

1. How difficult is seamless unicast to implement?

2. Does the Sirikata seamless unicast implementation enforce its falloff function and satisfy the five communication requirements?

3. What inter-object throughput and latency can seamless unicast sustain?

4. Does the Sirikata seamless unicast implementation give good end-to-end object messaging performance when integrated into a full space server?

As most virtual world systems are closed, it is difficult to measure and compare our system to them. We are able to perform simple inter-object benchmarks in Second Life, and show that our system compares favorably, but the rest of our benchmarks are a best attempt to intuit reasonable workloads that stress the system in different ways.


6.1 Applications Using Seamless Unicast

Section 3.1 introduced several applications: HitPoint, Gatherer, Marketplace, Spider and Airport, explaining how standard broadcast primitives have undesirable consequences for even basic applications. Through a novel fair-queueing algorithm, Section 4.2 presented an alternative, seamless unicast, that ensures fine granularity in selecting recipients, non-zero bandwidth between any pair of objects, good utilization of bandwidth under low load, smooth variation in bandwidth as distance increases, and fairness. This section explains how a developer would use seamless unicast to build UniHitPoint, UniGatherer, UniMarketplace, UniSpider and UniAirport, seamless unicast versions of HitPoint, Gatherer, Marketplace, Spider and Airport, respectively. It examines how these applications behave both when the system is underutilized (load < capacity) and when it is saturated (load ≥ capacity).

6.1.1 Developer’s Perspective

Gatherer, Marketplace, and HitPoint can be simply rewritten using seamless unicast by replacing their broadcast calls with individual unicast calls to separate destinations. The cost of this rewrite is that the source must send n messages rather than 1: the source’s use of its own capacity is equal to the capacity use it imposes on others. A more industrious developer can take advantage of the finer granularity seamless unicast provides by sending slightly different messages to each receiver or by performing application-level flow control.
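
As a rough illustration of this rewrite, the sketch below contrasts the two styles using invented helper names (broadcast, unicast, and nearby_objects are hypothetical stand-ins, not Sirikata's scripting API):

    # Broadcast style: one call, but every object in range pays for it.
    def announce_broadcast(world, msg):
        world.broadcast(msg)                              # hypothetical broadcast primitive

    # Seamless unicast style: the sender enumerates recipients explicitly,
    # sending n messages rather than 1, and may tailor each one.
    def announce_unicast(world, sender, msg):
        for recipient in world.nearby_objects(sender):    # e.g., results of a PIntO-style query
            world.unicast(sender, recipient, msg)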

6.1.2 Underutilized Behavior

When communication capacity is underutilized, seamless unicast does not restrict messaging. All destination objects and avatars receive messages from UniGatherer, UniMarketplace, and UniHitPoint practically simultaneously. This behavior differs from Second Life’s distance-limited broadcast: the rate at which objects and avatars receive messages is unaffected by distance.

Furthermore, unlike WoW, seamless unicast does not limit the messaging rate based on social groups.

6.1.3 Saturated Behavior

WoW and Second Life allocate communication capacity statically: they are inflexible and cannot degrade gracefully in the presence of load. In contrast, seamless unicast can shed load in a semantically meaningful way: no object pair starves, and nearby and large objects receive a larger share of the available capacity. In UniGatherer’s case, this degradation means that objects closer to a resource receive updates faster. This prioritization is useful and fits well with expected use cases: closer resource deposits are more valuable and useful, as they are more convenient. An object expends less time and assets to extract a resource that is closer than a resource that is farther. In UniMarketplace’s case, seamless unicast also provides semantically meaningful degradation. In the real world, when a store becomes crowded, customers form lines at registers to make their purchases. Those nearer the register receive a higher quality of service than the ones further back. Under load, seamless unicast exhibits similar behavior: those customers nearby receive a higher quality of service. Finally, under load, health alerts sent through UniHitPoint also arrive at different rates. In practice this means that those other objects and avatars in battle with you will receive health updates well before those that are more distant, ensuring that those most able to intervene to help you receive your health information first. Applications can use the finer granularity that seamless unicast provides to reduce system load in all three examples. Each application can easily pre-filter recipients so that message recipients receive only highly relevant information. This pre-filtering reduces the number of packets that a virtual world messaging system would be required to deliver. In addition, the prioritizations discussed above are not static allotments: the system divides bandwidth only over open flows, recycles resources of closed connections, and updates weights for flows as objects move across the world.

Because communication weights consider not only the distance between objects but also their volume, large objects receive a larger share of network capacity than small ones. For example, a large building representing a marketplace can send more data than a small avatar next to it. The implicit assumption is that the importance of, and degree of interaction with, an object is correlated with its size; put another way, objects that everyone wants to interact with need to reflect their importance in their size.

6.2 Experimental Setup

To run our communication experiments, we reserved 18 nodes of a 48-node cluster of dual 2.8 GHz Pentium IV machines with 4 gigabytes of memory and switched 100 Mbit Ethernet cards. Each node runs Ubuntu 8.04 and commit 8c15930e of Sirikata.

6.3 Hit Point Application Workload

This section examines the application-level benefits of seamless unicast using the HitPoint/UniHitPoint example from Section 6.1. We choose this particular application because it has a very clear performance measure, the error between the observed and actual hit point value. In this experiment a single object updates 45 listening objects, connected to 8 other space servers, with 128 byte hit point update messages at 30 Hz. The hit point value drops continuously at 1,000 points per second. All objects are assigned the same radius to ease the interpretation of the results, since inter-object distance is sufficient to determine the falloff function in that case. The test is run in both low utilization and saturated conditions. In the latter case, 9,600 1 kilobyte messages per second saturate the sending space server. At each timestep, the error for each listening object is computed as the difference between its most recent update and the current value at the source object. We obtain a baseline for HitPoint by broadcasting all the data once to each of the 8 other servers. All messages are equally likely to be dropped when the system is saturated. In the UniHitPoint setup, the system sheds load by discarding messages based on flow weights.


Figure 6.1: Packet traces from applying seamless unicast to the HitPoint application under loaded conditions, showing perceived error (hit points) over time for objects at 25, 664, and 917 meters. Each received update causes a sharp drop in error. When saturated, nearby objects (top) receive updates more frequently than distant objects (bottom).

The worst average latency of update messages across all tests is 20 ms, indicating that even when saturated, updates are received less frequently but are never outdated when they do arrive.

6.3.1 Application Behavior on Saturated Space Server

Figure 6.1 shows the hit point error over a short interval for three objects at different distances when the source space server is saturated. For all three, error increases linearly with time until an update’s reception causes a sharp drop. However, because nearby objects have higher weight, fewer of their updates are dropped and they receive updates more frequently. Distant objects still receive updates, but at a diminished rate. Figure 6.2(a) shows the average error relative to the broadcast HitPoint version. The values are normalized to the broadcast HitPoint values, which are almost perfectly uniform with distance. (The leftmost point represents an object on the same space server which does not require forwarding, resulting in less difference between the two approaches.) UniHitPoint provides more frequent updates to nearby objects, causing 4.5x less error than HitPoint. The tradeoff is that distant objects have up to 8x more error. We argue that this is beneficial since the impact of error likely decreases with distance.

(a) In a saturated system, average error relative to the broadcast baseline (dotted gray) is lower for nearby objects and higher for distant objects. (b) Average error compared to the broadcast baseline with a flatter falloff function. The tradeoff is a factor of 3 beneficial nearby and only a factor of 2 more costly in the distance. (c) In a saturated system, average error relative to the broadcast baseline (dotted gray) is extremely low for nearby objects and much higher for distant ones.

Figure 6.2: Applying seamless unicast to the HitPoint application. All panels plot relative error against distance (meters).


Figure 6.3: When the system has low utilization, almost all updates are successfully received. Both nearby and distant objects receive updates at about 30 Hz.

6.3.2 Application Behavior on Unsaturated Space Server

Figure 6.3 illustrates the system’s behavior under low utilization. Because almost all updates are received, objects at all distances receive updates frequently and the average error is significantly lower. A few updates are dropped, as illustrated by missed updates in the bottom graph. These drops are due to a temporary inaccuracy in the CSFQ rate estimator, which causes a few packets to be probabilistically dropped. However, the system responds correctly, dropping packets from the most distant object, and the estimator quickly resolves the inaccuracy.


Figure 6.4: In the recount scenario, nearby objects are more up-to-date than farther objects (average error in seconds versus solid angle in steradians).

6.4 Recount Application Workload

The recount application workload is similar to the hit point workload; however, it has a number of key differences. HitPoint is primarily a polling-based application where updates were sent at the fastest rate possible. Instead, Recount tracks damage rates in an event-based fashion. Each time damage is dealt, the Recount application is notified and sends the results to interested parties. This also means that the most recent update needs to be accurate and the data cannot be lost. Seamless unicast is a lossy protocol, so the way this is accomplished in our virtual world system is for a higher-layer transport protocol to be built atop the seamless unicast base. In the same way that TCP lives on top of IP, a streaming protocol can be built on top of seamless unicast. For this application, the SST protocol is selected, since ordering between updates is non-critical despite the overall necessity of reliability. A separate substream is created for each update, allowing updates to be received in an arbitrary order.
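
The layering described above can be sketched as follows; this is an illustrative Python fragment with invented names (Stream, open_substream, and the callbacks are not SST's actual API), showing one reliable substream per damage update so that updates may arrive in any order while none are lost.

    # Hypothetical sketch: one reliable substream per update on top of a
    # lossy seamless-unicast base, in the spirit of SST's structured streams.
    class DamageReporter:
        def __init__(self, stream):
            self.stream = stream          # a reliable stream to the observer
            self.total_damage = 0.0

        def on_damage(self, amount):
            self.total_damage += amount
            # Each update travels on its own substream, so a delayed update
            # cannot block newer ones (ordering is non-critical).
            sub = self.stream.open_substream()
            sub.send({"total": self.total_damage})
            sub.close()

    class DamageObserver:
        def __init__(self):
            self.latest = 0.0

        def on_update(self, payload):
            # Only keep an update if it is newer than what we already have;
            # damage totals are monotonically increasing.
            self.latest = max(self.latest, payload["total"])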

The experiment is set up with an observer in the far right server and with 1,500 objects per server flooding the observer’s server with sufficient bandwidth to saturate the system, as demonstrated in Figure 6.9(a). Each of 36 damage-dealing objects, with sizes ranging from 0.1 meters to 4 meters, deals one hundredth of a point of damage every 10 ms to one of the stationary objects over a two-minute run. These objects each report their damage total to the observing object each time they receive a damage update. Each received update newer than the last modifies the perceived damage value in the tracking object. The actual error, since the damage rate is known upfront, is computed and shown; it increases linearly at one second per second when no update is received. Looking at previous results, Figure 6.2(a) suggests that for objects of the same size, nearer objects would have less error than farther objects. This happens because the falloff function is roughly proportional to the respective object sizes divided by the square of the distance, but the metric that equates quality is actually the solid angle of the observed object. To verify this, the recount application scenario was constructed and measured on a saturated space server. Indeed, Figure 6.4 shows how objects that appear larger by solid angle are perceived as being more up-to-date than farther objects. The curve is not exactly linear because the falloff function does not exactly match the radial distance function. But the correlation between the two is evident, and larger objects generally get more bandwidth than smaller objects.
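
To see why solid angle is a natural perceptual metric here, note that a sphere of radius $\rho$ observed from a distance $d \gg \rho$ subtends a solid angle of approximately
\[
\Omega \;\approx\; \frac{\pi \rho^2}{d^2},
\]
so a falloff that scales with object size and inversely with the square of the distance tracks the projected size of the object in the observer's view. (This small-angle approximation is offered only as an illustration; the exact falloff function is the one defined in Chapter 4.)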

6.5 Airport Application Workload

In the airport application, the largest objects are the important ones, and they are actually important because of their size. The gigantic airport communicating with its incoming airplanes represents a workload that benefits greatly from the falloff function, and also one that suffers from strict fairness between packets. In the scenario, passengers are 1-2.5 meters in diameter, airplanes are 50 meters in diameter, and the control tower is 120 meters in diameter. Figure 6.5 illustrates that in the overloaded case, airplanes individually get almost two orders of magnitude error improvement over the case where all packets are treated equally.


Figure 6.5: Airplane communication in a saturated system with seamless unicast versus the same application in a system that has unweighted fair traffic prioritization. Note that the first sample point indicates an airplane on the same server as the control tower, resulting in similar overall error rates since these packets go through the fast path. Farther airplanes get similar error rates as the airplane within 50 meters but the relative error is much lower than it would be without seamless unicast.

This allows the airplane to be guided in despite the cell phone traffic, and passengers no longer have to be instructed to disable their virtual electronic devices on landing.

Note that airplanes on the same server as the airport, i.e., those within 50 meters, coming in for final touchdown, go through the fast path for packet forwarding and do not get any benefit from the prioritization, but their absolute error is very low in all cases. Any airplanes that happen to be on a different server, especially due to an optimized partitioning of the space, go comparably fast to their fellow airplanes on the same server, but the relative error is orders of magnitude lower than it would be without seamless unicast.

Figure 6.6: Marketplace, Gatherer and Spider application workload, demonstrating that objects with a larger solid angle get a higher bandwidth of important data in a system flooded with 7 MB/s of irrelevant data.

6.6 Marketplace, Gatherer and Spider Workloads

The marketplace, gatherer and spider applications look somewhat similar in their behavior. Each application has one object receiving data from a multitude of sources of varying importance levels in a live virtual world, thrumming with traffic. In these applications, nearby objects are more important than farther objects and data transfer rate is the defining characteristic of performance. To demonstrate that seamless unicast is a valuable contribution to these applications, it is compared with a control where all traffic is TCP-fair and gets a uniform priority level. Figure 6.6 demonstrates how nearer objects get a much higher throughput on average as compared with objects appearing smaller. This allows a marketplace to successfully send information to a user, enables relevant party members to find out detailed information about their surrounding resources without flooding those farther away, and allows a spider object to get data from looming objects despite the system being overloaded by a constant stream of flooding.

(a) Object layout. (b) Second Life screenshot

Figure 6.7: 4x4 Second Life object distribution map and Second Life screenshot captured from the highlighted space server. (a) Sample Second Life data for a 4x4 server grid. (b) Screenshot from the server in row 3 and column 2.

The graph shows how objects with higher solid angle get higher bandwidth, despite the fact that the falloff function is somewhat different from raw solid angle. This reinforces that the falloff function chosen in Section 4.3 was a reasonable choice. In this scenario, objects occupying large solid angles of the observer’s view get more than an order of magnitude more bandwidth than they would under the TCP-fair scenario. The cost is that farther objects get correspondingly less bandwidth when the system is saturated.

6.7 End-to-End Evaluation

Because there are no well-known and accepted virtual world workloads, explicit end-to-end evaluations of Sirikata’s messaging system can only be suggestive rather than definitive.

Figure 6.8: Average message latency in the end-to-end experiment, broken down into object host network, local forwarding, OSeg cache miss, sender queue, network, and receiver queue components, plotted against the number of communicating objects (sorted by distance from the center). Percentages show cache miss rates experienced.

For end-to-end evaluation, we use location and geometry information collected from 16 Second Life servers covering a 1 km² region with 19,000 objects, shown in Figure 6.7. To synthesize application-level messaging between these objects, we generate traffic according to a generative social network model [40]. For each object in the highlighted region of Figure 6.7, this model generates a set of “friends” with which to communicate. We parametrize the model with α = 2, so that each object’s expected number of friends is 35, which matches the 90th percentile of guild sizes in World of Warcraft [23]. A message sender is chosen uniformly from those objects in the highlighted region, and a receiver is chosen uniformly over that sender’s friends. This captures what might be seen in a social world, where communication is mostly between friends, less between acquaintances, and rare between strangers. To measure how latency scales as the world grows, we progressively increase the number of objects in increments of 250 every 420 seconds. Objects join the world radiating out in increasing distance from the center of the highlighted region, and the aggregate send rate is 33 one-kilobyte messages per second. The OSeg cache is artificially limited to 256 entries, in order to evaluate the effect of a working set that exceeds the cache size.
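
A minimal version of this traffic generator can be sketched as below; it is illustrative only, and the friend-count distribution is an assumption (a heavy-tailed Pareto-like draw scaled so the mean is roughly 35 friends), not the exact generative model of [40].

    import random

    def make_friend_lists(object_ids, alpha=2.0, mean_friends=35):
        """Assign each object a heavy-tailed number of random 'friends'."""
        friends = {}
        for oid in object_ids:
            # Pareto(alpha) has mean alpha/(alpha-1); scale it to the target mean.
            n = int(random.paretovariate(alpha) * mean_friends * (alpha - 1) / alpha)
            n = max(1, min(n, len(object_ids) - 1))
            friends[oid] = random.sample([o for o in object_ids if o != oid], n)
        return friends

    def next_message(highlighted, friends):
        """Pick a sender uniformly from the highlighted region, then a receiver
        uniformly among that sender's friends."""
        sender = random.choice(highlighted)
        return sender, random.choice(friends[sender])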

Figure 6.8 shows how different factors contribute to message latency as the number of objects increases. Latency increases at two major inflection points. The first occurs at 750 objects: this is when objects begin appearing on other space servers and thus require forwarding. The second inflection point is at 1,750 objects. Leading up to this point, network and sender queue latencies have been increasing, as a greater fraction of objects are remote. At 1,750 objects, the working set exceeds the cache size, causing OSeg misses (which also require remote network lookups) to dominate message latency. The miss rate increases until 12,000 objects, when it stabilizes at 82%. This stabilization occurs because the social workload selects new, further objects with very low probability. These results demonstrate the importance of the OSeg cache to reduce message latency, and they motivate a large cache size. However, even when the working set exceeds the cache size or the workload is uncacheable, a space server forwards well: cache misses add only a millisecond to ping times, which remain below 3 ms.

6.8 Communication Rate Control

The measurements so far have used application or modeled workloads to evaluate whether Sirikata meets the five requirements laid out in Section 3 in the context of a full application. This section evaluates the system with steady-state, synthetic workloads to better evaluate its characteristics under load. To evaluate whether the forwarder enforces Equation 4.3 and achieves good utilization, we construct a simple linear world of nine square regions, shown in Figure 6.9(c). Each region is a separate space server and contains 1,500 objects. Each object in servers s1 – s8 is paired with a random object on server d. Objects have bounding volumes of 4-270 m³ (spheres with radii of 1-8 m). We evaluate two traffic patterns. In the first, objects flood the system with messages such that half of the messages are generated to other objects in proportion to the weights of those objects and the other half are sent uniformly, irrespective of the falloff. This traffic pattern tests whether the space server can give all pairs some capacity (non-zero throughput requirement) while giving closer pairs their fair-share rate (minimum quality of service requirement).

(a) All senders send as fast as possible. (b) High-weight object pairs send less than their share. (c) Linear world topology: destination server d at one end of a row of sending servers s1 through s8.

Figure 6.9: Flow throughput under two workloads.

In the second, objects send at a constant rate so high-weight pairs use less than their fair share: this tests whether the space server can deliver high utilization even under low load. The tests were run for 15 minutes. Figures 6.9(a) and 6.9(b) show the results. The space servers follow Equation 4.2 and therefore meet the non-zero throughput, graceful degradation, and fine-grained multiplexing requirements. Nearby object pairs receive significant throughput (30 kbps), yet distant pairs can still exchange messages. Each graph shows four values for all 12,000 flows, sorted by weight. The first three show throughput (left axis): the actual received throughput, the falloff throughput (if the object used its entire fair share), and the ideal throughput (if the system enforced fairness perfectly). In the flooding experiment, the falloff and ideal are the same. The fourth value shows the Jain fairness index (JFI, right axis) of received throughput for the n highest-weight pairs. The rightmost data point shows the overall JFI, and the range of the axis is from a strong value of 0.85 to the ideal, 1.0. The received throughput closely follows the ideal throughput. In Figure 6.9(a), the JFI remains high until the tail. The dropoff in the tail occurs because in core stateless fair queueing a flow’s first packet is always accepted. If the flow is between a pair of tiny, distant objects, then under load a single packet may be much more than their share. The important point is that outside this noisy tail, JFI remains at 0.99. In Figure 6.9(b), JFI is much higher because space servers give the unused capacity of high-weight flows to lower-weight ones, reducing this discretization error. This demonstrates the high utilization requirement of seamless unicast: the excess capacity not utilized by high-weight pairs is distributed fairly to the rest of the flows that could use it.
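
For reference, the Jain fairness index over $n$ allocations $x_1, \ldots, x_n$ (here, each flow's received throughput relative to its ideal share, which is an assumption about how the plotted values are normalized) is
\[
\mathrm{JFI} = \frac{\left(\sum_{i=1}^{n} x_i\right)^{2}}{n \sum_{i=1}^{n} x_i^{2}},
\]
which equals 1 when all allocations are equal and approaches $1/n$ as they become maximally skewed.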

6.9 Microbenchmarks

The previous section demonstrated that the system enforces the falloff function under load and distributes excess load fairly. This section evaluates system performance through a series of microbenchmarks that measure latency, forwarding rate, and throughput between a pair of object hosts.

                   Latency     Max Rate      Throughput
    Local          692 µs      41,876 pps    82.33 Mbps
    Remote         1,232 µs    13,747 pps    47.30 Mbps
    Remote Lookup  2,672 µs    9,454 pps     37.60 Mbps

Table 6.1: Space server forwarding performance.

                                Latency    Throughput
    Local Server                12 ms      176 kbps
    Remote, distance < 100 m    33 ms      57 kbps
    Remote, distance > 100 m    480 ms     15 kbps

Table 6.2: Second Life message performance between two objects.

Latency measures the ping time for 64-byte messages with idle space servers. Forwarding rate measures how fast a space server can forward 64-byte messages. Throughput measures the maximum inter-object throughput using 1-kilobyte messages.

We measure three forwarding paths: local, remote, and remote lookup. In the local path, the two objects are on the same space server. In the remote path, the two objects are on different space servers and the destination is in the OSeg cache. In the remote lookup path the destination requires an OSeg lookup. Remote takes longer than local because it passes through inter-server queues, has an additional network hop, and requires a thread context switch.

Table 6.1 shows the results. For local messages, Sirikata can process over 40,000 messages per second and has a ping time below 700 µs. An OSeg lookup more than doubles message latency, cuts the forwarding rate by 40%, and reduces throughput by 21%: this demonstrates the need for an OSeg cache. All in all, the microbenchmarks demonstrate that the machinery needed to get fairness does not subvert good packet forwarding performance, and that the system can get fairness and perform well both under load and in the absence of load.

Table 6.2 shows results from similar experiments in a nearly empty Second Life region where there are few visibility computations.

Local Server and proximate messaging use llShout() and llListen(); remote uses llEmail(). Despite requiring fewer network hops, since object and space simulation occur on the same server in Second Life, latency and throughput are orders of magnitude worse than in Sirikata. These tremendous differences are in part due to the fact that Second Life explicitly rate-limits traffic to ensure a smooth experience. In contrast, Sirikata’s weighted rate control allows even single object pairs to use the full system capacity, satisfying the high utilization requirement. However, the Second Life system is difficult to isolate since it is a live system, constantly performing scripting, map capture, indexing, physics, and some client visibility calculations. A fairer comparison may be to compare the workloads within Second Life and understand how analogous workloads reduce the performance from the baseline single-server scenario. Switching from local communication to remote communication triples the latency and cuts the throughput by a factor of three. The Sirikata system sees a similar throughput reduction, but the latency stays within a factor of two. Communication to a distant object requires the object to switch APIs to the long-distance communication primitive, increasing the latency by a factor of 15 and cutting the throughput by a factor of four. This illustrates that the Sirikata system compares favorably in relative terms for increasingly difficult workloads and that Sirikata benefits from a single seamless unicast primitive that can smoothly trade off load between near and far communication.

6.10 Results for Seamless Unicast

These experiments demonstrate that seamless unicast can deliver high-performance, low-latency inter-object messaging. Seamless unicast has a number of advantages over traditional broadcast mechanisms.

1. Recipient selection: A sender can decide the recipients of a message: this means that no more messages leave the system than were delivered to it. This conservation of messages is important.

2. Minimum quality of service: Since the system enforces the falloff, it guarantees each object pair a minimum, non-zero throughput, even when the system is saturated with load.

3. Graceful degradation: Due to the design of the falloff, there are no sudden discontinuities in communication as distance increases, and as demonstrated in the recount application in Section 6.4, the curve roughly matches perceptual metrics of projected object size.

4. Fine-grained multiplexing: As shown in the microbenchmarks in Section 6.9, the available capacity is shared between object pairs at a fine granularity, and messaging has very low latency when the system is below capacity.

5. High utilization: The system can achieve high utilization even if only a few object pairs are communicating, as illustrated in the microbenchmarks as well as in the application behaviors.

These advantages suggest that Sirikata’s seamless unicast system is a valuable component of virtual world systems, and provides an interesting framework upon which to structure application-level virtual world traffic.

Chapter 7

Discussion

This dissertation makes several contributions to the area of computer systems, networks, and virtual worlds. In this chapter, we review these contributions and highlight some areas for future work.

7.1 Contributions

This dissertation has contributed a point-to-point communication behavior between virtual world objects, named seamless unicast; an approximation to facilitate this behavior; an efficient system design to implement seamless unicast; and an evaluation of it.

7.1.1 Seamless Unicast

A new virtual world communication primitive, seamless unicast, provides fair and scalable application-level messaging. The key insight behind seamless unicast is leveraging three-dimensional geometry, inherently embedded in virtual spaces, to define a falloff function. This falloff function, in turn, governs the portion of communication resources any given pair of objects may utilize.

94 7.1. CONTRIBUTIONS 95

they have done so in a manner inflexible to system load that produces sharp disconti- nuities. In contrast, when built into Sirikata’s forwarding architecture, the contribu- tion of seamless unicast provides a quality-of-service guarantee that can achieve high utilization, smoothly varies with distance, and degrades in a semantically-meaningful fashion under saturated network conditions, meeting the requirements that virtual world applications introduce.
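To make the role of the falloff function concrete, the following Python sketch shows how per-pair weights could translate into shares of a forwarder's capacity. The quadratic form (receiver radius squared over distance squared) and the function names are illustrative assumptions, not Sirikata's exact implementation:

    # Illustrative sketch only: the exact falloff constants and form used by
    # Sirikata may differ; this assumes weight = radius^2 / distance^2.
    def falloff_weight(distance, receiver_radius, min_distance=1.0):
        """Weight for one object pair, decaying with the square of distance."""
        d = max(distance, min_distance)   # clamp to avoid a singularity at d = 0
        return (receiver_radius ** 2) / (d ** 2)

    def capacity_shares(pairs, capacity_bps):
        """Split capacity among object pairs in proportion to their weights.

        pairs: list of (distance, receiver_radius) tuples for active pairs.
        Returns the rate, in bits per second, allotted to each pair.
        """
        weights = [falloff_weight(d, r) for d, r in pairs]
        total = sum(weights)
        return [capacity_bps * w / total for w in weights]

    # Example: one nearby and two distant pairs sharing a 100 Mbps forwarder.
    print(capacity_shares([(2.0, 1.0), (50.0, 1.0), (200.0, 1.0)], 100e6))

Because the weights are normalized rather than thresholded, every active pair retains a non-zero share, which is the property that the minimum quality-of-service and graceful-degradation requirements rely on.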

7.1.2 Falloff Approximation

There are a number of challenges in ensuring that the messaging system abides by the falloff function in a distributed virtual world. Since the world's object state is distributed, no single component of the routing fabric can know the location and geometry of every object. The second contribution is an approximation of the falloff function that uses the center of the destination server's region as a proxy for object location and a numeric volume as a stand-in for object geometry. This approximation achieves high fairness over a wide range of object layouts without requiring undue synchronization between space servers.
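As a rough illustration of this approximation, a space server could compute a proxy weight toward a remote destination as sketched below in Python. The function names, and the use of a bounding-sphere radius derived from the aggregate volume, are assumptions made for this sketch rather than Sirikata's actual code:

    import math

    def radius_from_volume(volume):
        """Radius of a sphere with the given volume, used as a geometry proxy."""
        return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

    def approx_weight(sender_pos, server_center, aggregate_volume, min_distance=1.0):
        """Approximate falloff weight toward objects hosted on a remote server."""
        dx, dy, dz = (s - c for s, c in zip(sender_pos, server_center))
        d = max(math.sqrt(dx * dx + dy * dy + dz * dz), min_distance)
        r = radius_from_volume(aggregate_volume)
        return (r * r) / (d * d)

    # A sender at the origin estimating its weight toward a server whose region
    # is centered 500 meters away and whose objects total 4000 cubic meters.
    print(approx_weight((0.0, 0.0, 0.0), (500.0, 0.0, 0.0), 4000.0))

Only a region center and a single scalar volume need to be exchanged between space servers, which is what keeps the synchronization cost low.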

7.1.3 System Design

Because the weights from our falloff function can range across several orders of magnitude, and because we are concerned with both receiver and sender congestion, the third contribution is an extension of the standard Core-Stateless Fair Queueing algorithm, as described in Section 4.5. As demonstrated in Section 4.2, when coupled with the chosen falloff function, this extension results in a message forwarder with a number of beneficial properties; a sketch of the weighted drop decision follows the list below.

1. Low-latency: Local messages require 692µs to be delivered and remote messages without cache hits require 2672µs to be delivered.

2. High-throughput: Local messages are forwarded at a rate of 82.33Mbps, and remote messages are delivered at 37.60Mbps.

3. Seamless unicast: All five criteria of seamless unicast are met.
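The sketch below outlines the weighted drop decision at the core of such a forwarder. It is a simplified, assumed rendering of weighted core-stateless fair queueing: the class and method names are hypothetical, and the rate estimator is cruder than the one described in Section 4.5:

    import random
    import time

    # Illustrative sketch of weighted core-stateless fair queueing. Edge logic
    # labels each packet with the flow's estimated arrival rate divided by its
    # falloff weight; the core then drops probabilistically so that surviving
    # traffic approximates weight-proportional fairness.
    class EdgeLabeler:
        def __init__(self, averaging_window=0.1):
            self.window = averaging_window   # seconds of exponential averaging
            self.flows = {}                  # flow id -> (estimated rate, last arrival)

        def label(self, flow_id, packet_bytes, weight, now=None):
            """Return the normalized rate label carried in the packet header."""
            now = time.time() if now is None else now
            rate, last = self.flows.get(flow_id, (0.0, now))
            gap = max(now - last, 1e-6)
            instantaneous = 8.0 * packet_bytes / gap   # bits per second
            alpha = min(gap / self.window, 1.0)
            rate = (1.0 - alpha) * rate + alpha * instantaneous
            self.flows[flow_id] = (rate, now)
            return rate / weight             # a higher weight yields a smaller label

    def core_should_drop(label, fair_share):
        """Drop with probability 1 - fair_share/label once a flow exceeds its share."""
        if label <= fair_share:
            return False
        return random.random() > fair_share / label

Because labels are computed at the edge and the core keeps only an estimate of the fair share, no per-pair state is needed in the forwarding core, which is what lets the design scale to very large numbers of object pairs.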

7.1.4 Evaluation

The final contribution is an evaluation demonstrating that a single object pair can use the whole system capacity when it is available, and that under the heavy congestion of tens of thousands of object pairs, Sirikata can guarantee every pair a non-zero throughput while simultaneously giving nearby objects 30 kbps. This result was validated in the realistic HitPoint application setting, which further demonstrated reasonable latencies and high bandwidth for nearby objects in spite of saturation-level load.

In conclusion, the Sirikata system provides a new virtual world messaging layer, seamless unicast, that offers fair, scalable, yet lossy transport for scripted messaging between objects.

7.2 Further Work

This foundation brings us the ability to build useful virtual world abstractions on top of a scalable core. However, to bring the system into a fully deployed virtual world setting, we may need to improve the application-level messaging system described above. For instance, although we have not observed Object Host-to-Space Server bandwidth to be a bottleneck, if it becomes a problem we may develop a protocol that iteratively multicasts a packet on the space server rather than on the object host.

Another solution for multicast or scene-state updates is to place an invisible object on each space server that handles multicasts to objects within that server. Constructing a multicast tree between these per-space-server objects would be analogous to a wireless flood protocol, and it would raise both static and dynamic challenges as the space servers reorganize themselves to handle the object load.

Similarly, we are improving our own application-level protocol, which uses structured streams to deliver messages between objects, by adding a “send-last” primitive that guarantees that the most recently sent packet will eventually reach the receiver, while intermediate updates may be dropped or reordered. This is useful for broadcasting a series of idempotent updates to a value and ensuring that it comes to rest at the proper final value once updates cease.
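A hedged sketch of what a sender-side “send-last” primitive could look like appears below; the class and method names are hypothetical rather than the actual structured-stream API:

    # Illustrative sketch of a "send-last" primitive: only the most recently
    # sent value is retained and retransmitted until acknowledged, so
    # intermediate updates may be dropped but the final value always arrives.
    class SendLastSender:
        def __init__(self, transmit):
            self.transmit = transmit   # callable that sends one datagram
            self.seq = 0               # sequence number of the latest update
            self.pending = None        # latest unacknowledged (seq, payload)

        def send_last(self, payload):
            """Replace any unacknowledged update with this newer one."""
            self.seq += 1
            self.pending = (self.seq, payload)
            self.transmit(self.pending)

        def on_ack(self, acked_seq):
            """Stop retransmitting once the newest update has been acknowledged."""
            if self.pending is not None and acked_seq >= self.pending[0]:
                self.pending = None

        def on_retransmit_timer(self):
            """Resend only the newest update; superseded ones are dropped."""
            if self.pending is not None:
                self.transmit(self.pending)

A receiver would apply an update only if its sequence number exceeds the highest value seen so far, making repeated deliveries of the same idempotent update harmless.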

In addition to the work on Sirikata’s messaging pipeline, much can also be improved in the rest of the system. We are currently testing and evaluating novel data structures to make spatial querying more efficient; designing and implementing a CDN to store object data and meshes; building a service that dynamically segments the virtual world over space servers; and forging a scripting language to make it easier to imbue objects with interesting and interactive behaviors.

Thus, the communication layer developed in this dissertation is an important step towards the broader goal of building scalable, federated, and seamless virtual worlds. Since this is only a first step, many challenging systems problems remain, including multicast for inherently broadcast updates (such as location updates), spatial querying for object detection, dynamic segmentation of the virtual world over space servers, and content distribution and scheduling for user-generated data and meshes. Our preliminary findings suggest that the geometric nature of virtual worlds, and the physical metaphors one can draw from it, may be the distinguishing property that helps solve many of these problems.

7.3 Last Thoughts

In summary, the Sirikata virtual world system provides an interesting tool for deploying interactive 3D systems. Sirikata has been released as an open-source project, and our hope is that it will assist researchers trying to improve the budding virtual world sub-field within Computer Science.

Much of the research effort in virtual worlds has been spent developing entire system architecture stacks. I hope that making Sirikata pluggable will enable researchers to replace just the portion of the system that they wish to study and improve. I envision the development of a battery of virtual world benchmarks based on the Sirikata framework, which will make comparisons between components more meaningful than comparisons between independently developed software stacks and architectures.

I also hope the community will contribute their source back to the public repositories so that improvements can snowball and a viable public virtual world platform can be created in a decentralized way.

Bibliography

[1] IEEE 1516, High Level Architecture (HLA).

[2] Blizzard. http://us.blizzard.com/en-us/company/press/pressreleases.html?081121.

[3] New world notes. http://nwn.blogs.com/nwn/2007/07/unwired.html.

[4] Red dwarf server. http://www.reddwarfserver.org/.

[5] Second Life. http://wiki.secondlife.com/wiki/LlHTTPResponse.

[6] World of Warcraft census. http://www.warcraftrealms.com/eu_realmstats.php?sort=Total.

[7] World of Warcraft. http://www.worldofwarcraft.com/info/basics/realmtypes.html, 2004.

[8] World of Warcraft. http://www.wowwiki.com/Realms_list, 2010.

[9] Joe Armstrong. History of erlang. In Proceedings of the 3rd ACM SIGPLAN conference on History of programming languages, 2007.

[10] William Athas and Nanette Boden. Cantor: An actor programming system for scientific computing. In Proceedings of the NSF Workshop on Object-Based Concurrent Programming, 1998.

[11] Richard Bartle. Interactive multi-player computer games. MUSE Ltd. 1990. ftp://ftp.lambda.moo..org/pub/MOO/papers/mudreport.txt.


[12] Richard Bartle. Designing Virtual Worlds. New Riders, 2003.

[13] Ashwin Bharambe, John R. Douceur, Jacob R. Lorch, Thomas Moscibroda, Jeffrey Pang, Srinivasan Seshan, and Xinyu Zhuang. Donnybrook: Enabling large-scale, high-speed, peer-to-peer games. In Proc. SIGCOMM, August 2008.

[14] Ashwin Bharambe, Jeffrey Pang, and Srinivasan Seshan. Colyseus: a distributed architecture for online multiplayer games. In NSDI’06: Proceedings of the 3rd conference on Networked Systems Design & Implementation, pages 12–12, Berkeley, CA, USA, 2006. USENIX Association.

[15] Ashwin Bharambe, Jeffrey Pang, and Srinivasan Seshan. Colyseus: a distributed architecture for online multiplayer games. In Proc. Networked Systems Design & Implementation (NSDI), May 2006.

[16] L. Budge, R. Strini, R. Dehncke, and J. Hunt. Synthetic theater of war 97 overview. In Simulation Interoperability Workshop, 1998.

[17] Mark Claypool and Kajal Claypool. Latency and player actions in online games. Commun. ACM, 49(11), 2006.

[18] Pavel Curtis. Mudding: social phenomena in text-based virtual realities. In Peter Ludlow, editor, High Noon on the Electronic Frontier: Conceptual Issues in Cyberspace. ELIB/synchCMC, 1996.

[19] Judith S. Dahmann, Richard M. Fujimoto, and Richard M. Weatherly. The department of defense high level architecture. In Proc. Winter Simulation Conference (WSC), 1997.

[20] A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. In Proc. SIGCOMM, 1989.

[21] Department of Defense Defense Modeling and Simulation Office. High level architecture run-time infrastructure RTI 1.3-next generation programmer’s guide version 3.2, 2000.

[22] Felicio Deriggi Jr, Mario Kubo, Antonio Sementille, Jose Brega, Simone San- tos, and Claudio Kirner. CORBA platform as support for distributed virtual environments. In Proc. IEEE Virtual Reality, 1999.

[23] Nicolas Ducheneaut, Nicholas Yee, Eric Nickell, and Robert J. Moore. The life and death of online gaming communities: a look at guilds in World of Warcraft. In Proc. CHI, 2007.

[24] M. Eraslan, N.D. Georganas, J.R. Gallardo, and D. Makrakis. A scalable network architecture for distributed virtual environments with dynamic QoS over IPv6. In Proc. ISCC, June-July 2003.

[25] Emmanuel Frécon and Mårtin Stenius. Dive: a scaleable network architecture for distributed virtual environments. Distributed Systems Engineering, 5(3), 1998.

[26] Michael J. Freedman, Eric Freudenthal, and David Mazières. Democratizing content publication with Coral. In Proceedings of the Symposium on Networked Systems Design and Implementation, pages 239–252, Berkeley, CA, USA, 2004. USENIX Association.

[27] T. Funkhouser. Network services for multi-user virtual environments. In IEEE Network Realities, 1995.

[28] Thomas A. Funkhouser. RING: A client-server system for multi-user virtual environments. In Symp. on Interactive 3D Graphics, 1995.

[29] M. Galassi, J. Davies, J. Theiler, B. Gough, G. Jungman, P. Alken, M. Booth, and F. Rossi. GNU Scientific Library Reference Manual. Network Theory Ltd., third edition, 2009.

[30] L. Gautier and C. Diot. Design and evaluation of MiMaze, a multi-player game on the internet. ICMCS, 1998.

[31] Craig Glenday. Guinness World Records 2009. Random House.

[32] Claude Van Ham and Trevor Pearce. The SIP-RTI: An HLA RTI implementation supporting interoperability. Symp. on Distributed Simulation and Real-Time Applications, 2006.

[33] Carl Hewitt, Peter Bishop, and Richard Steiger. A universal modular actor formalism for artificial intelligence. In Proceedings of IJCAI3, 1973.

[34] Daniel Horn, Ewen Cheslack-Postava, Tahir Azim, Michael J. Freedman, and Philip Levis. Scaling virtual worlds with a physical metaphor. Pervasive Computing, 8(3), 2009.

[35] Shun-Yun Hu, Jui-Fa Chen, and Tsu-Han Chen. Von: A scalable peer-to-peer network for virtual environments. Network, IEEE, 20(4):22–31, July-Aug. 2006.

[36] R. Jain, D. Chiu, and W. Hawe. A quantitative measure of fairness and discrim- ination for resource allocation in shared computer systems. Technical Report TR-301, DEC Research, 1984.

[37] Nicholas T. Karonis, Brian Toonen, and Ian Foster. MPICH-G2: A grid-enabled implementation of the message passing interface. Parallel and Distributed Computing, 63(5), 2003.

[38] B. Knutsson, Honghui Lu, Wei Xu, and B. Hopkins. Peer-to-peer support for massively multiplayer games. In Proc. INFOCOM, March 2004.

[39] Yinghua Li, Yong Li, and Jie Liu. An HLA based design of space system simulation environment. Acta Astronautica, 61(1-6), 2007.

[40] David Liben-Nowell, Jasmine Novak, Ravi Kumar, Prabhakar Raghavan, Andrew Tomkins, and Ronald L. Graham. Geographic routing in social networks. Proc. National Academy of Sciences, 102(33), 2005.

[41] Henry Lieberman. Thinking about lots of things at once without getting confused: Parallelism in act 1, 1981.

[42] Michael R. Macedonia, Donald P. Brutzman, Michael J. Zyda, David R. Pratt, Paul T. Barham, John Falby, and John Locke. NPSNET: a multi-player 3D virtual environment over the Internet. In SI3D ’95: Proceedings of the 1995 symposium on Interactive 3D Graphics, pages 93–ff. ACM, 1995.

[43] Brad McQuaid. Instancing in online gaming. http://web.archive.org/web/20060324110936/http://www.gamergod.com/article.php?article_id=2933.

[44] Donald Murray. The world’s work. Doubleday, Page & Company, 1902.

[45] Cory Ondrejka and Philip Rosedale. Glimpse inside a metaverse: Google TechTalks. 2006.

[46] Stéphane Louis Dit Picard, Samuel Degrande, and Christophe Gransart. A CORBA based platform as communication support for synchronous collaborative virtual environment. In Proc. International Workshop on Multimedia Middleware (M3W), 2001.

[47] Dongyu Qiu and R. Srikant. Modeling and performance analysis of bittorrent-like peer-to-peer networks, 2004.

[48] Philip Rosedale and Cory Ondrejka. Enabling player-created online worlds with grid computing and streaming. 2003. http://www.gamasutra.com/resource_guide/20030916/rosedale_01.shtml.

[49] Robert Rossney. Metaworlds. Wired Magazine, June 1996.

[50] Douglas C. Schmidt and Fred Kuhns. An overview of the real-time CORBA specification. Computer, 33(6), 2000.

[51] Stanislav Shalunov. Low extra delay background transport (ledbat). IETF Draft, 2010.

[52] M. Shreedhar and George Varghese. Efficient fair queueing using deficit round-robin. Trans. on Networking, 4(3), 1996.

[53] Neal Stephenson. Snow Crash. Bantam Spectra, 1992.

[54] Ion Stoica, Scott Shenker, and Hui Zhang. Core-stateless fair queueing: achieving approximately fair bandwidth allocations in high speed networks. SIGCOMM CCR, 28(4), 1998.

[55] Ion Stoica, Scott Shenker, and Hui Zhang. Core-stateless fair queueing: achieving approximately fair bandwidth allocations in high speed networks. Technical Report CMU CS 98 136, June 1998.

[56] Daniel Terdiman. ’Second Life’: Don’t worry, we can scale. http://news.cnet.com/Second-Life-Dont-worry,-we-can-scale/2100-1043_3-6080186.html?tag=nefd.lede.

[57] Jeff Terrace and Michael J. Freedman. Object storage on CRAQ: High-throughput chain replication for read-mostly workloads. In Proc. USENIX Annual Technical Conference, 2009.

[58] M. Vellon, K. Marple, D. Mitchell, and S. Drucker. The architecture of a distributed virtual worlds system. Technical report, Microsoft Research, 1998.

[59] Jim Waldo. Scaling in games & virtual worlds. ACM Queue, 6(7), 2008.

[60] David Wheeler. SLOCcount. http://www.dwheeler.com/sloccount/, 2009.

[61] Bill Wisner. A brief history of muds. http://groups.google.com/group/alt.mud/msg/a0c1c5d5c4a66eba, 1990.

[62] Xinyu Zhuang, Ashwin Bharambe, Jeffrey Pang, and Srinivasan Seshan. Player dynamics in massively multiplayer online games. http://reports-archive.adm.cs.cmu.edu/anon/2007/CMU-CS-07-158.pdf, 2007.