On The Scalability and Security of Distributed Multiplayer Online Games
Seyed Amir Yahyavi Firouz Abadi
Doctor of Philosophy
School of Computer Science McGill University Montreal, Quebec, Canada
April 2014
A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy
© Seyed Amir Yahyavi Firouz Abadi, 2014
Abstract
Multiplayer Online Games (MOGs) are an extremely popular online technology, one that produces billions of dollars in revenues. Yet, providing scalable, fast, and cheat-resistant games is challenging. In this thesis we propose solutions to some of these challenges.
The underlying architecture plays a crucial role in enabling games to meet the scalability, response time, and low cost requirements that are of utmost importance in designing a successful massively multiplayer online game. Peer-to-peer architectures, where the game runs exclusively on client machines, and hybrid approaches, where some tasks run on the clients while a central server controls other aspects, have low infrastructure costs and can achieve high scalability owing to their distributed and collaborative nature. They can also achieve fast response times by creating direct connections between players. However, they introduce their own challenges, such as cheating. As one contribution of this thesis, we provide a comprehensive overview of current peer-to-peer and hybrid solutions for massively multiplayer games using a uniform terminology. Most of the solutions we studied fail to meet one of the following main requirements: (1) fast response time, (2) scalability, (3) cheat-resistance. This thesis makes several proposals that address these requirements: Watchmen, mobile security, AntReckoning, and AntAI.
Watchmen is the first distributed scalable protocol designed with cheat detection and prevention in mind that supports fast-paced games. It is based on a randomized dynamic proxy scheme for both the dissemination and verification of actions. We show that Watchmen, while scaling to hundreds of players and meeting the tight latency requirements of first-person shooter games, is able to significantly reduce opportunities to cheat, even in the presence of collusion. In the context of cheating we also look at the fast-rising genre of mobile games. We discuss existing and potential new avenues of cheating that can occur in such mobile environments, and suggest several solutions to improve security based on existing and novel services that are or can be offered by carriers, ecosystems, and developers.
AntReckoning is a dead-reckoning algorithm, inspired by ant colonies, which models players’ interests to predict their movements. It helps our architecture by allowing us to send updates at lower frequencies and with more accurate predictions, which in turn improves the scalability and the quality of experience of the players. AntReckoning incorporates a player’s interest in specific locations, objects, and avatars in the equations of motion in the form of attraction forces. In practice, these points of interest generate pheromones, which spread and fade in the game world, and are a source of attraction. AntReckoning greatly improves the accuracy of traditional dead reckoning techniques and can decrease the upload bandwidth by up to a third of overall traffic. Interest modeling can also be used to improve the performance of non-player characters in games. In our AntAI project, we detail how movements, interactions, use of items, and reliance on static decision-making schemes result in behaviors that differ markedly from those of humans in a popular first-person shooter game. From there, we propose a framework relying on our pheromone maps, which can lead to more adaptive, human-like behavior.
Altogether, this thesis presents components that we believe are key elements for a scalable, cheat-resistant, and fast multiplayer online game architecture.
Abrégé
Multiplayer online games (MOGs) are currently a very popular online technology that generates billions of dollars in revenue. Despite this, offering games that are at once scalable, fast, and cheat-resistant remains a challenge. In this thesis, we propose solutions to some of these challenges.
The underlying architecture plays a crucial role in developing games that meet scalability, low response time, and low cost constraints. These constraints are of utmost importance if the resulting massively multiplayer online game is to become a success. Peer-to-peer architectures, where the game runs exclusively on the client (player) machines, and hybrid approaches, where some game tasks run on the client machines while a central server manages other aspects of the game, achieve high scalability while minimizing infrastructure costs thanks to their distributed and collaborative nature. These architectures can also achieve fast response times by establishing direct connections between players. However, they also introduce challenges of their own, such as the possibility of cheating. As one contribution of this thesis, we present a comprehensive overview of current peer-to-peer and hybrid solutions for massively multiplayer online games using a uniform terminology.
Most of the solutions we studied fail to meet one of the following main requirements: (1) fast response time, (2) scalability, (3) cheat resistance. This thesis describes the projects we worked on that aim to meet these requirements: Watchmen, mobile security, AntReckoning, and AntAI.
Watchmen is the first scalable, distributed protocol designed specifically for cheat detection and prevention that supports fast-paced games. The protocol is based on a randomized dynamic proxy assignment scheme for both the dissemination and the verification of actions. We show that Watchmen significantly reduces opportunities to cheat, even in the presence of collusion, while supporting hundreds of players and meeting the strict latency requirements of first-person shooter games. In the context of cheating, we also look at mobile games. We discuss existing and potential new avenues of cheating that can arise in mobile environments, and we suggest several solutions to improve security. These solutions are based on existing or new services that are or could be offered by carriers, ecosystems, and developers.
AntReckoning is a dead-reckoning algorithm inspired by ant colonies that models players’ interests to predict their movements. This model benefits our architecture because it lets us send updates at lower frequencies and with more accurate predictions, which improves scalability and the players’ quality of experience. AntReckoning incorporates players’ interest in specific locations, objects, and avatars into the equations of motion as attraction forces. In practice, these points of interest generate pheromones that spread and fade in the virtual game world and act as sources of attraction. AntReckoning significantly improves the accuracy of traditional dead-reckoning techniques and can reduce upload bandwidth by up to a third of total traffic.
Contributions
The complete list of our contributions can be found in the publications section. Overall, seven papers were published, two of which complement our earlier publications. I am the first author on all of the following works.
• [YK13] A. Yahyavi and B. Kemme. Peer-to-peer architectures for massively multiplayer online games: A survey. ACM Computing Surveys (CSUR), 46(1):9:1–9:51, ACM, 2013. In this paper, we take the most comprehensive look to date at the requirements and different design aspects of massively multiplayer online games (MMOGs). We present extensive background on the design of these environments and assemble a comprehensive taxonomy of important issues, which includes object replication, information dissemination, networking structure, interest management, fault tolerance, consistency control, persistence, security and cheating, industry solutions, incentive mechanisms, and more. We provide an in-depth analysis of each of these aspects and how they affect performance, quality of experience, cost of design, scalability, and other important issues in the design of such games. This work studies and covers around 200 papers and research projects in this area. This work was done by me, with the guidance of my advisor.
• [YHGS+13] A. Yahyavi, K. Huguenin, J. Gascon-Samson, J. Kienzle, and B. Kemme. Watchmen: Scalable cheat-resistant support for distributed multiplayer online games. In Proceedings of International Conference on Distributed Computing Systems (ICDCS), pages 134–144, ACM, 2013. [Our earlier work was published in ACM SIGCOMM NetGames 2011 [HYK11]] In this paper we propose the first scalable and cheat-resistant distributed middleware for first person shooter (FPS) games. Architectures offered before either completely ignore the problem of cheating, are not scalable enough, or are not able to meet the tight latency requirements or high bandwidth requirements of these games. Our architecture can offer reasonable security by using prevention and cross-verification techniques while being able to support hundreds of players when standard broadband bandwidth is available to players. It achieves this even in the presence of hundreds of players in the same area of interest. We take a comprehensive look at the security implications of the design, and provide a detailed evaluation using a popular FPS game instead of simulations. Kévin and I worked on the initial ideas and implementation, which was published as a poster paper in NetGames. Later, I completed the work, and Julien helped with debugging and improving the code, and running experiments. The work was done under the guidance of our advisors.
• [YHK12] A. Yahyavi, K. Huguenin, and B. Kemme. Interest modeling in games: The case of dead reckoning. Multimedia Systems (MMSJ), pages 1–16, Springer, 2012. [Our earlier work was published in ACM SIGCOMM NetGames 2011 [YHK11]] While many different generic machine learning and prediction models exist, our approach is the first attempt to model players’ interests inside the game and to use that information in predicting their movement and actions. The significance of this work is that with more and more players joining large scale games, the game engines have to rely on sending updates at different rates to different players and then rely on compensation techniques to improve the quality of experience of players that receive updates at lower rates or experience lag, jitter, or loss. We explore a new simplified and low overhead way to model interesting items in the games and their influence on players’ decisions by using pheromone maps. This practical and low overhead approach is able to improve prediction quality by up to 44% and save network bandwidth usage by up to 32% compared to standard dead-reckoning techniques. This work was done by me with guidance from my advisor. Kévin contributed to writing the paper.
• [YPK13] A. Yahyavi, J. Pang, and B. Kemme. Towards providing security for mobile games. In ACM MobiCom International Workshop on Mobility in the Evolving Internet Architecture, MobiArch ’13, pages 47–52. ACM, 2013. In this paper, we compare the threats in mobile game environments with existing console and PC gaming environments. We discuss how these threats are different and what new kinds of threats can emerge. We also discuss new security mechanisms that can help prevent or detect these new threats. Jeff contributed by providing additional information on mobile platforms. The project was done with the guidance of my advisor.
• [YTVK13] A. Yahyavi, J. Tremblay, C. Verbrugge, and B. Kemme. Towards the design of a human-like FPS NPC using pheromone maps. In IEEE International Games Innovation Conference (IGIC), pages 275–282, IEEE, 2013. We develop new metrics to quantitatively measure the differences between human and NPC players’ behavior in the game. We provide these measurements for the game Quake III as a sample architecture. Furthermore, we propose a platform based on our pheromone maps that can help improve the behavior of the NPCs. Jonathan contributed to writing the paper and the related work, with the guidance of our advisors.
Acknowledgements
Foremost, I would like to thank my advisor Bettina Kemme for many years of advice and guidance. Her support and contributions have been invaluable to my progress. I thank her for all the effort, enthusiasm, and personal time she spent on my work and for giving me the opportunity to work freely and be creative in my work.
I am grateful as well to Jörg Kienzle and Clark Verbrugge for their insightful comments and advice, and for being part of my Ph.D. progress committee. I would also like to thank my friends and colleagues Samf, Kasra, Neda, Mitra, Kévin, Julien, and my labmates, whom it has been an honor to know and work with.
My parents, Shahrzad and Aziz, and my sister Noushin have been a constant source of moral support and affection throughout my life, and this thesis would certainly not have existed without them.
Last but not least, I want to express my deep appreciation and gratitude to my wife Laleh for her continuous support, patience and encouragement during all phases of this thesis.
Contents
1 Introduction
I Distributed Architectures for Multiplayer Online Games: A Survey
2 Multiplayer Online Games: Background, Design Issues, and Proposed Solutions
2.1 Motivation
2.2 Game Design Principles
2.2.1 Object Types
2.2.2 Player Interactions
2.2.3 Object Replication
2.2.4 Game Types and Latency Tolerance
2.2.5 Bucket Synchronization & Frame-rate
2.2.6 Bandwidth Requirements
2.2.7 Interest Management
2.2.8 Consistency Control
2.3 MOG Architectures
2.3.1 Client-Server Architecture
2.3.2 Distributed Multi-Server Architecture
2.3.3 Peer-to-Peer (P2P) Architecture
2.3.4 Characteristics and Comparison
2.3.5 Hybrid Architectures
2.3.6 Heterogeneous and homogeneous P2P systems
2.3.7 Other Functionalities
2.4 P2P Architectures
2.4.1 Introduction
2.4.2 Structured P2P Game Architectures
2.4.3 Unstructured P2P Game Architectures
2.4.4 Hybrid P2P Architectures
2.5 Communications & Multicast
2.5.1 Direct Communication
2.5.2 Multicast Trees
2.5.3 NAT & Firewalls
2.6 Interest Management
2.6.1 Structured Architectures
2.6.2 Unstructured Architectures
2.6.3 Challenges
2.7 Replication & Consistency Control in P2P Architectures
2.7.1 Replication Management
2.7.2 Consistency Control
2.8 Fault Tolerance & Persistence
2.8.1 Fault Tolerance
2.8.2 Persistence
2.9 Cheating
2.9.1 Definition
2.9.2 Cheating Categories
2.9.3 Cheating Prevention
2.9.4 Cheating Detection
2.9.5 Reputation Systems & Penalization
2.10 Commercial Adoption
2.10.1 Middlewares
2.10.2 Industry Solutions
2.10.3 Industry Models
2.10.4 Client Incentives
2.10.5 Other applications
2.11 Survey Conclusions
II Cheat-Resistant Support For Multiplayer Online Games
3 Watchmen: A Scalable Cheat-Resistant Peer-to-Peer Architecture For Fast-Paced Multiplayer Online Games
3.1 Motivation for Watchmen
3.2 Watchmen Background
3.2.1 Interest Filtering in Quake III
3.3 The Watchmen Architecture
3.3.1 Subscription Model
3.3.2 Proxy architecture
3.4 Security Aspects of the Proxy Architecture
3.5 Verifications, reputation and punishment
3.5.1 Verification
3.5.2 Reputation & Punishment
3.6 Watchmen Performance Characteristics
3.7 Watchmen Evaluation
3.8 Watchmen Related Work & Comparison
3.9 Watchmen Conclusions
4 Towards Providing Security For Mobile Games
4.1 Introduction to Security in Mobile Games
4.2 Cheating in the Mobile Environment
4.2.1 Existing Cheating Mechanisms
4.2.2 Existing Security Mechanisms
4.2.3 New Gaming & Cheating Mechanisms
4.3 Trust Model & Design Principles in Mobile Games
4.3.1 Trust Model
4.3.2 Design Principles
4.4 Security Architecture for Mobile Games
4.4.1 New Security Mechanisms
4.4.2 Location cheating
4.4.3 Fake Sensor Readings
4.4.4 Disruption of Information Flow
4.5 Mobile Game Security Conclusions
III Interest Modeling Using Pheromone Maps
5 Interest Modeling in Games: The Case of Dead Reckoning
5.1 Introduction to AntReckoning
5.2 Background on Dead-Reckoning
5.3 Motivation and Design Rationale for AntReckoning
5.4 The AntReckoning Algorithm
5.5 Parametrization and Implementation
5.5.1 Parametrization
5.5.2 Discussion & Implementation
5.6 AntReckoning Evaluation
5.6.1 Experimental setup
5.6.2 Sensitivity analysis
5.6.3 Performance evaluation
5.7 Discussion
5.8 Related Work to AntReckoning
5.9 AntReckoning Conclusions
6 Towards the Design of a Human-Like FPS NPC using Pheromone Maps
6.1 Introduction to AntAI
6.2 Existing NPC Technology and Game Adaptivity
6.2.1 Decision Making:
6.2.2 Game Adaptivity:
6.3 NPC Performance Metrics
6.4 Quake III Analysis
6.4.1 Bot Detection
6.5 AntAI Algorithm
6.5.1 Decision Making
6.6 AntAI Conclusions
IV Final Conclusion & Future Work
7 Final Conclusion & Future Work
7.0.1 Final Conclusion
7.0.2 Future work
List of Publications
Bibliography
Acronyms
List of Figures
2.1 Multiplayer game components
2.2 Different game zoning mechanisms
2.3 Different gaming architectures
2.4 Routing in Pastry and Mercury
2.5 pSense and N-Tree architectures
2.6 VON Voronoi, Solipsis(’03) convex hull, and Solipsis(’08) Raynet overlays
3.1 Heatmap of player positions in a Quake III
3.2 Watchmen subscription types and corresponding areas
3.3 Watchmen proxy architecture
3.4 Information exposure
3.5 Witness information level
3.6 Cheat detection success
3.7 Update latency
3.8 Number of IS and VS subscriptions
3.9 Upload bandwidth projection
5.1 Screenshot of a Quake III game environment
5.2 The q3dm01 map from Quake III
5.3 Presence of players in the q3dm01 map from Quake III
5.4 Effect of player interactions on their behavior
5.5 Overview of AntReckoning
5.6 Illustrative example of the evolution of the concentration of pheromone
5.7 Prediction correction
5.8 Illustration of the metric used in the evaluation of AntReckoning
5.9 Experimental sensitivity analysis of AntReckoning
5.10 Effect of visibility and post-processing on AntReckoning
5.11 Performance evaluation of AntReckoning
6.1 Experimental sensitivity analysis of AntAI
6.2 Player movements during firefight
6.3 Overview of AntAI
List of Tables
2.1 Comparison of different architectures
2.2 Comparison of representatives of different P2P architectures
3.1 Popular cheating mechanisms in distributed multiplayer games
4.1 Popular cheating mechanisms in distributed games
4.2 Suggested API for securing game services
5.1 Important parameters in AntReckoning
6.1 Differences in weapon usage between Players and NPCs
1 Introduction
Video games have been the fastest-growing form of media over the past few years, with sales projected to rise to $111 billion by 2015 [Gar13]. Video games have traditionally been a mostly social experience. Multiplayer video games are played competitively or cooperatively, either with multiple input devices or over the network. Tennis for Two, arguably the first video game, was a two-player game.
Early networked games were generally limited to text-based adventures or MUDs (Multi-User Dungeons) that were played remotely on a dedicated server. This was the result of both slow connection speeds (300–1200 bit/s) and high cost. However, due to improvements in the speed and latency of connections, the number of players in modern fast-paced multiplayer online games (MOGs) can be 32 or higher, while featuring enhanced graphics with integrated text and/or voice chat. MMOGs (massively multiplayer online games) can support an even higher number of simultaneous players; in 2013, Eve Online hit a record for the maximum number of simultaneous pilots online with 65,303 concurrent accounts logged on to the same server [EVE13].
However, most multiplayer games, in order to support more players, use a distributed architecture. These systems rely on multiple servers, peer-to-peer, or hybrid architectures to provide scalability. But this requires more complicated systems where some responsibilities are distributed among many nodes. As a result, it brings up a range of new issues that need to be addressed, such as object management, concurrency and consistency control, multicast, security and cheating, fault tolerance, availability, and persistence. Distributed systems require a great deal of coordination and are less secure. The success of massively multiplayer online games, in particular, relies on the scalability and fairness of the architecture. Scalability is necessary for the viability of the game, while fairness is required for wide adoption by players.
Distributed architectures provide scalability by distributing processing or networking overheads of the game among servers and/or players. This is typically done by offloading the game objects to other servers or players or by relying on them to disseminate network traffic produced by the game. Some tasks such as verification of player actions can be delegated as well. However, this makes the system vulnerable to cheating.
Cheating essentially consists in gaining an unfair advantage and comes mostly in the following forms: disrupting the game state computation and dissemination, performing illegal actions, and gaining access to sensitive information [YR05, WS07]. In centralized games, cheat detection and prevention are achieved by making the server verify the players’ actions, ensure synchronization and reduce the information sent to players to the minimal amount required to render the game world [BLL07]. In decentralized games, however, detection of cheating is more difficult given the natural trade-offs between responsiveness, scalability, verification and information disclosure together with the issues of trust and collusion.
The goal of this thesis is to design a distributed, scalable, fast, and cheat-resistant architecture for multiplayer online games. To do that, we first provide a detailed survey of the existing distributed architectures for massively multiplayer online games (with a focus on peer-to-peer systems) in Chapter 2. A good study of this subject has been lacking; therefore, we decided to do a comprehensive literature review. We detail many of the design issues that these systems face and provide an overview of different solutions offered so far. We categorize these solutions based on their design, and detail and compare their various aspects such as security, replication, consistency control, etc. While Chapter 2 discusses the background necessary for the thesis at the level of a general, broad survey, we introduce additional background in the other chapters whenever more detailed descriptions are necessary to understand the particular topic of these chapters.
The remainder of the thesis presents new contributions to the field of scalable and cheat-resistant distributed game architectures, and can be roughly split into two main themes. The first theme addresses cheating in distributed architectures, and our contributions are presented in Chapters 3 and 4. The second theme is to exploit ant-colony techniques to improve game performance and game play overall, in particular, to lower bandwidth use, increase scalability, and improve the quality of experience of players as well as reduce the amount of inconsistency they perceive. This theme covers Chapters 5 and 6.
More precisely, Chapter 3 proposes a scalable, cheat-resistant peer-to-peer architecture for multiplayer games. Common decentralized approaches to deal with cheating include mutual verification and auditing, agreement protocols (e.g., lockstep [BLL07]), and position-based information filtering. However, most of the proposed approaches have one of the following drawbacks: they (1) rely on a central server or trusted third parties [WSL07]; (2) do not deal with collusion; (3) detect cheaters a posteriori [CFFS05]; (4) fail to provide responsiveness and scalability [BLL07].
In order to develop an approach that avoids these pitfalls, we analyze the cheating opportunities created by mechanisms commonly used in several types of peer-to-peer multiplayer games, with a special focus on a distributed version of Quake III. From there, we propose Watchmen, which uses the following techniques to prevent and detect cheating:
• Vision-based filtering and indirect communication are used to reduce the information available at each player close to the minimum amount necessary to render the game world;
• Players are assigned a proxy player in charge of update dissemination and action verification. Proxies are chosen at random and dynamically renewed to limit the impact of collusion while allowing on-the-fly mid-term verifications.
• Verifications can lead to the emission of blames which can be used to directly take actions against cheaters.
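The random, periodically renewed proxy assignment in the second technique can be illustrated with a minimal sketch. This is not the Watchmen protocol itself, which is specified in Chapter 3. The function name `assign_proxies`, the notion of a numbered round, and the assumption that all peers already share a seed are hypothetical choices made only for illustration.

```python
import hashlib
import random


def assign_proxies(players, round_number, shared_seed):
    """Map each player to a pseudo-random proxy for one round.

    Assumption: every peer knows the same player list and shared_seed,
    so each peer can recompute the same mapping locally.
    """
    if len(players) < 2:
        raise ValueError("need at least two players")
    # Derive a per-round seed so the mapping changes when the round changes.
    digest = hashlib.sha256(f"{shared_seed}:{round_number}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    order = sorted(players)  # canonical order, independent of input order
    rng.shuffle(order)
    # Each player is proxied by its successor in the shuffled ring,
    # which guarantees that nobody acts as their own proxy.
    return {p: order[(i + 1) % len(order)] for i, p in enumerate(order)}


if __name__ == "__main__":
    for rnd in range(3):
        print(rnd, assign_proxies(["alice", "bob", "carol", "dave"], rnd, "match-42"))
```

Because the mapping is re-derived every round, a cheater and a colluding proxy are only paired for a limited time in this sketch, which is the intuition behind renewing proxies dynamically.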
In Chapter 4, we continue to look at cheating but move our attention to mobile games. We first discuss how these games are different from standard PC and console multiplayer games. Then, we study how cheating differs in emerging mobile games, and present several new kinds of cheats that can be exploited due to the mobile environment. Based on this analysis, we propose new mechanisms to address these new types of cheating.
The work in Chapter 5 is motivated by the need to control the number of messages sent by distributed games and the need to appropriately handle the unavoidable delay in message transmissions. In interactive games, position update messages account for the bulk of the network traffic [KLXH04]. As a result, techniques that predict player movements to reduce the update rate while keeping the error on player positions low are utilized. These techniques also help in coping with message loss or delay by extrapolating the new position when the new update is not received in time. Lowering the number of updates sent by the game has a critical impact on the scalability of the system.
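As a rough illustration of the prediction techniques described above, the following sketch shows classic first-order dead reckoning: receivers extrapolate from the last update, and the sender transmits a new update only when the extrapolation drifts too far from the true position. The names `Update`, `extrapolate`, and `needs_update`, and the threshold-based sending rule, are illustrative assumptions rather than the formulation used later in this thesis.

```python
from dataclasses import dataclass


@dataclass
class Update:
    t: float     # timestamp of the update (seconds)
    pos: tuple   # position at time t
    vel: tuple   # velocity at time t


def extrapolate(update, t):
    """Predict the position at time t assuming constant velocity."""
    dt = t - update.t
    return tuple(p + v * dt for p, v in zip(update.pos, update.vel))


def needs_update(last_sent, t, actual_pos, threshold):
    """Sender-side check: send only if the receivers' prediction is too far off."""
    predicted = extrapolate(last_sent, t)
    error = sum((a - b) ** 2 for a, b in zip(actual_pos, predicted)) ** 0.5
    return error > threshold


if __name__ == "__main__":
    last = Update(t=0.0, pos=(0.0, 0.0), vel=(1.0, 2.0))
    print(extrapolate(last, 0.5))                    # predicted position
    print(needs_update(last, 0.5, (0.5, 1.0), 0.1))  # prediction still accurate
    print(needs_update(last, 0.5, (1.5, 1.0), 0.1))  # player deviated
```

A larger threshold means fewer updates but a larger tolerated position error, which is the trade-off between bandwidth and accuracy discussed in the text.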
We argue that key factors in an avatar’s motion are not only inertia but also the objectives of the game as well as entities in its vicinity, which we refer to as points of interest. Following this line of reasoning, we propose AntReckoning. To the best of our knowledge, it is the first interest-based approach to dead reckoning. The main concepts involved in AntReckoning are as follows:
• Each entity is assigned a given attractiveness leading to the generation of pheromones that fade and spread in the world;
• Pheromones in the vicinity of an avatar exert attraction on it. Attraction is integrated in the equations of motion, under the form of forces, to estimate its future position.
The main contributions of AntReckoning are: (1) to incorporate players’ interests into the equations of motion used for dead reckoning, and (2) to use pheromones to model such interests, taking temporal and spatial aspects into account. Moreover, pheromones offer a practical solution to the decentralized implementation of interest-based dead reckoning.
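The two concepts above (pheromones that spread and fade, and attraction entering the equations of motion as a force) can be sketched on a small grid. This is only a toy model under stated assumptions: the grid discretization, the parameters `evaporation`, `diffusion`, and `gain`, the gradient-based force, and the unit-mass motion equation are all hypothetical; the actual AntReckoning algorithm and its parameters are given in Chapter 5.

```python
def step_pheromones(grid, sources, evaporation=0.1, diffusion=0.2):
    """One time step: deposit at sources, spread to 4-neighbours, then fade."""
    rows, cols = len(grid), len(grid[0])
    deposited = [row[:] for row in grid]
    for (r, c), amount in sources.items():
        deposited[r][c] += amount
    spread = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            value = deposited[r][c]
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            share = value * diffusion / 4
            # The cell keeps whatever it does not hand to existing neighbours.
            spread[r][c] += value - share * len(neighbours)
            for nr, nc in neighbours:
                spread[nr][nc] += share
    return [[v * (1.0 - evaporation) for v in row] for row in spread]


def attraction_force(grid, cell, gain=1.0):
    """Force pointing toward higher concentration (central-difference gradient)."""
    r, c = cell
    rows, cols = len(grid), len(grid[0])

    def at(i, j):  # clamp indices at the border of the map
        return grid[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return (gain * (at(r + 1, c) - at(r - 1, c)) / 2.0,
            gain * (at(r, c + 1) - at(r, c - 1)) / 2.0)


def predict(pos, vel, force, dt):
    """Unit-mass motion under a constant force: x + v*dt + 0.5*f*dt^2."""
    return tuple(p + v * dt + 0.5 * f * dt * dt
                 for p, v, f in zip(pos, vel, force))
```

With zero force, `predict` reduces to plain constant-velocity extrapolation, so in this sketch the pheromone term acts purely as a correction on top of traditional dead reckoning.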
Finally, Chapter 6 analyzes how current non-player characters in the games can greatly differ from the human players. Using interest-modeling as introduced in Chapter 5, we propose our AntAI solution, and discuss how the quality of these non-player characters can be improved.
Overall, the contributions that we present in Chapters 3-6 offer scalability, cheat-resistance and overall better performance to multiplayer games.
Part I
Distributed Architectures for Multiplayer Online Games: A Survey
2 Multiplayer Online Games: Background, Design Issues, and Proposed Solutions
Scalability, fast response time, and low cost are of utmost importance in designing a successful multiplayer online game. The underlying architecture plays an important role in meeting these conditions. Peer-to-peer and hybrid architectures, due to their distributed and collaborative nature, have low infrastructure costs and can achieve high scalability. They can also achieve fast response times by creating direct connections between players. However, these architectures face many challenges. Distributing a game among peers makes maintaining control over the game more complex. Peer-to-peer architectures also tend to be vulnerable to churn and cheating. Moreover, different genres of games have different requirements that should be met by the underlying architecture, rendering the task of designing a general purpose architecture harder. Many peer-to-peer gaming solutions have been proposed that utilize a range of techniques while using somewhat different and confusing terminologies. This chapter of the thesis presents a comprehensive overview of current peer-to-peer solutions for multiplayer games using a uniform terminology.
2.1 Motivation
Multiplayer Online Games (MOGs) and Massively Multiplayer Online Games (MMOGs) are popular online technologies that produce billions of dollars in revenues and introduce several new and interesting challenges. One of the main attractions of these games lies in the number of players that participate in the game. The more players there are in the game world, the more interactive, complex, and attractive the game environment becomes. Successful games such as World of Warcraft, with nearly 12 million subscriptions [Bli11], have to provide a truly scalable game world while maintaining responsiveness.
To better understand MOGs, we first need to give a definition. A video game is usually defined as an electronic game that is played with a controller and provides user interaction by generating visual feedback. A multiplayer game is a game played by several players. Players can simply be independent opponents or they can play in teams. They can play against each other or against the game, i.e., opponents that are controlled using Artificial Intelligence (AI). An MMOG is a game capable of supporting hundreds or thousands of players and is mostly played over the Internet. Many games such as World of Warcraft [Bli11], EVE Online [EVE11], and Final Fantasy XI [FFX11] have shown that MMOGs are a thriving industry. For example, Star Wars: The Old Republic was able to achieve one million subscribers in three days after launch¹. Second Life [Sec11], launched in 2003 by Linden Lab, is the most famous social virtual world, with more than 16 million registered users. The emergence of social games (such as Farmville and Mafia Wars²) with millions of subscribers [Zyn11], as well as mobile games that are played on smartphones, and the popularity of handheld devices such as Sony PSP [PSP11] and Nintendo DS [Nin11], lay the foundation for potential integration of social and mobile environments into massively multiplayer games [ILc10, VV10].
MOGs can produce huge network traffic and processing loads [SM12, CHHL05]. Thus, the main challenges in MOGs are scalability, i.e. providing support for thousands of players simultaneously, consistency, security, and fast response time – usually all at the same time, otherwise customer satisfaction would be reduced. In the next sections we discuss these challenges and different solutions that have been proposed.
Client-server systems, where game execution and game state dissemination are completely controlled by the server, are currently the prevalent game architecture. However, peer-to-peer architectures can be beneficial for gaming infrastructures in several ways. If client nodes communicate directly with each other or perform part of the game state computation, server requirements in terms of computational power and network bandwidth can be significantly reduced. Even if the game execution remains completely controlled by servers, peer-to-peer technology can be used to coordinate multiple servers, such as maintaining distributed game state execution and management of server farms and federated servers [CWD+05, IHK04, ASO09]. Cloud-based game streaming services based on content distribution networks [OnL11, Gai11] can also benefit from these architectures [CHJC10]. They can provide 3D streaming services where, similar to audio or video media streaming, 3D content is fragmented into pieces at a server before it is transmitted, reconstructed, and displayed at the clients [WHT09].
1 http://www.swtor.com/news/press-release/20111223
2 Zynga: 232 Million Monthly Players http://secfilings.com/searchresultswide.aspx?link=1&filingid=8022980
Peer-to-peer architectures have received a great deal of research attention in the recent past as they distribute computational and network load among peers and can potentially achieve high scalability, low cost, and good performance, as will be discussed further. While most of our discussions are focused on multiplayer games, in particular MOGs, many of these architectures can also be applied to other distributed systems such as distributed simulation environments [Fuj00], virtual worlds such as Second Life [Sec11], and other networked virtual environments [BHML92, KAM04, KS03].
The remainder of this chapter is structured as follows. In Section 2.2 we study common game design principles used in most multiplayer games, and in particular MOGs. In Section 2.3 we study and compare different architectures proposed for MOGs. Next, we present in detail issues related to structure (Section 2.4), update dissemination (Section 2.5), interest management (Section 2.6), replication and consistency (Section 2.7), fault tolerance, availability, and persistence (Section 2.8) in peer-to-peer based MOG solutions. Section 2.9 discusses cheating. We explain how games, in particular P2P architectures, are affected by cheating and discuss some of the security measures proposed for P2P-based games. In Section 2.10 we study different incentives for using P2P architectures by consumers and the industry. Furthermore, we discuss various applications and adoption models of P2P-based gaming architectures. Section 2.11 concludes the chapter.
2.2 Game Design Principles
Before getting into peer-to-peer architectures, we first explain general concepts involved in designing a multiplayer game, whether using client-server architectures or P2P. An overview of different components of a sample multiplayer game framework is shown in Figure 2.1(a). Here, we discuss the general concepts and execution patterns used in most multiplayer online games.
2.2.1 Object Types
In modern video games, the game world is usually made up of four types of components [KLXH04]: (1) immutable objects, such as landscape or terrain information, are usually designed and created offline and never change during the game. These objects are typically installed at the client and are initialized at the start of the game. (2) characters or avatars in the game world that are controlled by the player using an input device. The avatar has a position in the game world and is usually allowed three types of actions: player updates, player-object interaction, and player-player interaction. (3) mutable objects such as food, weapons, and tools that can be modified. For instance, players can interact with and/or use them in their interaction with other players. (4) Non-Player Characters (NPCs), also called bots, are characters or avatars that are not controlled by a player but are usually controlled using AI. In the rest of this chapter, unless otherwise stated, game objects refer to avatars, mutable objects and NPCs in the game. Object types are shown in Figure 2.1(b).
2.2.2 Player Interactions
Player interactions are typically divided into three categories: player updates, player-object interactions, and player-player interactions [KLXH04, BPS06]. Player updates are interactions with the game world that only affect the player himself. Position updates and graphical updates to the player’s avatar are examples of player updates. In a simple and unoptimized implementation, a large proportion of all player interactions can be position updates [KLXH04]. Player-object interactions are the interactions between a player and mutable objects in the game world. For example, picking up a health pack (adding it to the inventory) or consuming it are examples of player-object interaction. Player-player interactions are interactions between a player and other players in the game world. For example, attacking another player could decrease the other player’s health and increase the experience points for the attacker. Player interactions with NPCs, based on the game design, can be considered either player-object or player-player interactions.
Figure 2.1: (a) Multiplayer game components: different components of a multiplayer game, adapted from Mammoth [KVB+09] (adapted, © ACM 2009), a massively multiplayer game research framework. The components discussed in this section are highlighted. (b) Object types and interactions: different game objects and their interactions. Players can interact with each other, objects and NPCs.
The type of interaction is important when dealing with consistency issues that arise from concurrent and conflicting updates to the same object. Also, most security and cheat prevention techniques only apply to certain types of interactions.
2.2.3 Object Replication
When a player joins a game, he receives an instance of the game world (it can be a limited view of the game world) that is made up of various types of game objects. Most game engines follow a primary-copy replication approach. For each object and character there exists an authoritative copy, called primary or master copy. All other copies are secondary copies (also called replicas). Each player has, stored on his computer, copies of game objects which are of interest to the player. Any update to the object has to be first performed on the primary copy. How primary and secondary copies are distributed (e.g., primary copies might always reside on the server or might also be held by clients) depends on the game architecture. If a player wants to perform an update on an object for which he does not have the primary copy, he has to send the update to the primary copy. The holder of the primary copy decides whether to accept the update or not, and then sends the updated object to everyone that has a secondary copy, where the changes are applied.
The update dissemination mechanism is quite similar to publish-subscribe systems that have been widely studied [CDKR02]. Every replica becomes a subscriber to the primary copy of the object and receives publications (updates) from the primary copy (the publisher).
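The primary-copy scheme described above can be illustrated with a minimal sketch. The class names, the `validate` callback, and the health-range rule are hypothetical and chosen only for illustration; a real engine would also handle networking, partial views, and failures.

```python
class Replica:
    """Secondary copy: a local view that is refreshed by the primary copy."""

    def __init__(self):
        self.state = {}

    def apply(self, new_state):
        self.state = new_state


class PrimaryCopy:
    """Authoritative copy of one game object; replicas subscribe to it."""

    def __init__(self, state, validate):
        self.state = dict(state)
        self.validate = validate   # hypothetical game-rule check
        self.subscribers = []      # holders of secondary copies

    def subscribe(self, replica):
        self.subscribers.append(replica)
        replica.apply(dict(self.state))  # initial snapshot

    def request_update(self, change):
        """Accept or reject an update, then publish the new state to all replicas."""
        if not self.validate(self.state, change):
            return False
        self.state.update(change)
        for replica in self.subscribers:
            replica.apply(dict(self.state))
        return True


def health_in_range(state, change):
    # Assumed rule: health must stay within [0, 100].
    return 0 <= change.get("health", state["health"]) <= 100


avatar = PrimaryCopy({"health": 50}, health_in_range)
view_a, view_b = Replica(), Replica()
avatar.subscribe(view_a)
avatar.subscribe(view_b)
accepted = avatar.request_update({"health": 80})   # valid, published to replicas
rejected = avatar.request_update({"health": 500})  # invalid, replicas unchanged
```

In this sketch all updates are serialized at the primary copy, so replicas never see conflicting versions, only stale ones.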
2.2.4 Game Types and Latency Tolerance
We refer to the latency as the delay between execution of an update at the primary copy of an object and the replica receiving the object update. This latency is dependent on the architectural design and networking delays as will be discussed.
Various types of multiplayer and massively multiplayer games exist. In Real Time Strategy (RTS) and Role Playing Games (RPGs) [SGB+03] the focus is more on game strategy than on responsiveness. The player tells the avatar to do a possibly complex and long lasting action, e.g., go to a destination, and the avatar performs the requested action. In First Person Shooter (FPS) games, however, the player does short-lived and less complex actions, i.e., the player actually does what he wants the character to do (for example, guides the avatar towards the destination) and as a result, higher responsiveness is required (see [Arm03] for Quake III latency requirements). Latencies higher than the tolerance threshold of the game have an adverse effect on playability of the game and user satisfaction. Based on the game type and design, games typically can tolerate latencies between 100 and 300 milliseconds (ranging from FPS games to RPGs) [Arm03, BPS06, PW02b]; however, games with higher latency tolerance exist (see [SM12]). Latency tolerance requirements have a dramatic effect on the architecture design for games. An architecture is only feasible if it meets game latency requirements.
2.2.5 Bucket Synchronization & Frame-rate
Bucket synchronization [LGD98] (also called local lag [MVHE04]) is a method used to deal with latency and is used in most multiplayer games. Since network latency for each client is different and it is common for a primary copy to receive various update requests concurrently from different clients, most games deliberately lag behind in executing the events. This allows fairness despite latency variations and more control over update dissemination costs. This is in essence similar to Nagle’s algorithm in TCP networking. Games implement a discrete event loop (may also be referred to as frame) in which all actions (events) that have been submitted since the last execution of the loop are received, buffered, and then executed. The updates are then sent at the end of the loop. In addition, most game objects, including NPCs, have a think function which is executed in every game loop and determines the actions of the object in this loop. The game loop is usually executed 10 to 20 times every second; this frequency is sometimes referred to as frame-rate [BDL+08] (not to be confused with graphical frame-rate). Low frame-rates can degrade the game play experience or even render the game unplayable. Note that based on the game design the number of frames shown to the player, i.e., the graphical frame rate, may be equal to or different from this frame-rate.
The lag should be chosen so that it allows enough time for the updates of different clients to be received. This helps ensure fairness for clients that might have worse connectivity, and control the number of updates that have to be sent per second. At the same time, the lag has to be small enough not to be perceived by the players. In essence, a trade-off has to be found.
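The buffering behavior of bucket synchronization can be sketched as follows. This is a simplified illustration, not the algorithm of [LGD98] in full: the class name, the 50 ms frame length, the two-frame lag, and the assumption that clients supply trustworthy issue timestamps are ours.

```python
from collections import defaultdict


class BucketSynchronizer:
    """Buffers actions into frame-sized buckets that are executed with a fixed lag."""

    def __init__(self, frame_ms=50, lag_frames=2):
        self.frame_ms = frame_ms        # 50 ms per frame, i.e., 20 game loops per second
        self.lag_frames = lag_frames    # deliberate delay, in frames
        self.buckets = defaultdict(list)

    def submit(self, issued_at_ms, player, action):
        """Place an action in the bucket `lag_frames` after the frame it was issued in."""
        frame = issued_at_ms // self.frame_ms + self.lag_frames
        self.buckets[frame].append((issued_at_ms, player, action))
        return frame

    def run_frame(self, frame):
        """Return the frame's actions in issue-time order, regardless of arrival order."""
        return [(p, a) for _, p, a in sorted(self.buckets.pop(frame, []))]


sync = BucketSynchronizer(frame_ms=50, lag_frames=2)
sync.submit(130, "bob", "shoot")    # arrives first, but was issued later
sync.submit(110, "alice", "move")   # arrives second over a slower connection
executed = sync.run_frame(4)
```

Both actions fall into frame 4, and the slower client's earlier action is executed first, which is the fairness property the lag is meant to provide.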
In this thesis, we focus on the impact of architectures on networking, processing delays and the resulting frame rate. We do not consider other delays such as those caused by graphical processing.
2.2.6 Bandwidth Requirements
The bandwidth requirement of MOGs can be calculated based on average message size, update rate, and number of recipients (active players). Games can have high bandwidth requirements due to millions of (active) subscribers (e.g., WoW), dynamic environments (e.g., Second Life), or high update rates (e.g., Quake) [CHHL05, SM12]. In addition, in order to have an acceptable game play, games should also accommodate occasional bursts in the game traffic that can be many times the average requirements. Such bursts happen due to sudden environment changes or battles. Moreover, even inside the same game, different gaming activities, e.g., raiding vs. trading in WoW, generate different traffic patterns [SSM11] that can lead to different network requirements. Game servers typically deal with this by over-provisioning.
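As a rough illustration of this calculation, the following sketch multiplies the three factors named above and applies a burst factor for over-provisioning. The function names and the example numbers (100-byte messages, 20 updates per second, 31 recipients, burst factor of 3) are assumptions for illustration, not measurements from any game.

```python
def bandwidth_bps(avg_message_bytes, updates_per_second, recipients):
    """Average outgoing bandwidth, in bits per second, for one stream of updates."""
    return 8 * avg_message_bytes * updates_per_second * recipients


def provisioned_bps(average_bps, burst_factor):
    """Capacity to provision if bursts can reach `burst_factor` times the average."""
    return average_bps * burst_factor


# Hypothetical numbers: one sender, 100-byte updates at 20 Hz to 31 other players.
average = bandwidth_bps(100, 20, 31)
peak = provisioned_bps(average, 3)
```

With these assumed values the average is 496,000 bits per second and the provisioned capacity 1,488,000 bits per second, for a single sender.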
2.2.7 Interest Management
Interest management (IM) [BF93, Mor96] is an important mechanism used in many games, typically for scalability reasons. The idea is that players of a multiplayer game have only limited movement and vision capabilities. That is, players can only move a small distance in the game world in a given time interval. The players also have limited sensing capabilities, meaning that players can only interact with objects and other players in their vicinity [MGBY99]. As a result, data access in games shows spatial and temporal locality. Utilizing this fact, interest management limits the amount of game state any player can access. That is, the player only receives the game state relevant to him, based on his position and vision in the game world. Interest management is important for scalability, as clients only need replicas of objects that are interesting for them, which keeps update and network overhead low. However, it can also be important as part of game semantics or to address other challenges such as cheating. IM plays an important role in how replication and communication between players are managed, as will be discussed in the next sections, but first, we explain how it is implemented.
Space-based interest management is based on proximity and follows an aura-nimbus model [BF93]. The aura is the bounding area around the player’s avatar, while the nimbus defines the area around the player that is visible to the player, that is, the player can perceive game objects located in this area. Nimbus is also often referred to as Area-of-Interest (AOI) [KLXH04, SSJ+08], Domain of Interest [Mor96], Aura [GB02], or awareness area [KS03]. We mainly use the term AOI in this thesis. A player can typically only interact with objects and players in his AOI, and therefore, only needs to have copies of these objects. As a result, the necessary computational and network requirements are substantially reduced compared to maintaining copies of all objects.
Zoning The most common mechanism for interest management is zoning where the game world is partitioned into smaller parts, called zones or regions, as depicted in Figure 2.2.
Figure 2.2: Different game zoning mechanisms: (a) free format, (b) grid, (c) hexagons, (d) triangles. Legend: items, player and vision range, connections, obstacle ([CWD+05] (adapted, © ACM 2005) and [BKV06] (adapted, © ACM 2006))
Zoning approaches differ in the shape of their zones and how the AOI is mapped onto zones. In the simplest approach, the entire AOI resides in the same zone in which the player is located. In some systems, the player can simply interact with all objects in his zone, i.e., his AOI is the entire zone [BC85, LGD98]. A player’s AOI changes only if he moves from one zone to another. In other approaches, the AOI is a sub-area of the entire zone. Often, it is a fixed-radius circle (or sphere) around the player. When the player moves, his AOI moves accordingly. Therefore interest management has to determine for each game object in the zone whether it falls in the current AOI of the player. Figure 2.2 (a) shows an example of this sub-area approach.
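The fixed-radius AOI test described above amounts to a distance check per object in the zone. The following is a minimal sketch in two dimensions; the function name, the dictionary layout, and the example coordinates are hypothetical.

```python
import math


def objects_in_aoi(player_pos, objects, radius):
    """Names of the objects whose position lies within `radius` of the player."""
    px, py = player_pos
    return sorted(
        name
        for name, (x, y) in objects.items()
        if math.hypot(x - px, y - py) <= radius
    )


zone_objects = {"sword": (3, 4), "npc": (10, 0), "potion": (0, -5)}
visible = objects_in_aoi((0, 0), zone_objects, radius=5)
```

This linear scan is re-evaluated whenever the player or an object moves, which is why the number of objects per zone matters for the cost of interest management.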
More advanced interest management schemes allow the AOI to cover more than one zone. This is important for continuous worlds. Players at the borders of a zone should be able to see and interact with objects that are just across the zone boundary in a neighboring zone. Figures 2.2(b),(c), and (d) show examples where the AOI can cover several zones.
In regard to zone shapes, the game world can simply be split statically into grid cells as shown in Figure 2.2(b). Since this is a straightforward approach, many solutions use it [CWD+05, IHK04, CRd+04]. Hexagonal zoning (Figure 2.2(c)) has also been widely used for game architectures [YV05, MZP+95, JET03] as well as other systems such as cellular networks. Hexagons have uniform orientation and uniform adjacency, meaning that players will always move to an adjacent zone. In addition, since most AOI mechanisms consider a circle, hexagons provide a good approximation.
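For static grid zoning, determining which zones an AOI touches can be sketched as below. The sketch is conservative by assumption: it uses the bounding box of the circular AOI, so it may include a corner cell that the circle itself does not reach, and it does not clamp cells to the world boundary. The function name and parameters are hypothetical.

```python
def zones_covering_aoi(pos, radius, cell_size):
    """Grid cells (col, row) overlapped by the bounding box of a circular AOI."""
    x, y = pos
    min_col = int((x - radius) // cell_size)
    max_col = int((x + radius) // cell_size)
    min_row = int((y - radius) // cell_size)
    max_row = int((y + radius) // cell_size)
    return [
        (col, row)
        for col in range(min_col, max_col + 1)
        for row in range(min_row, max_row + 1)
    ]


# A player near a vertical zone border sees into the neighboring zone.
near_border = zones_covering_aoi((95, 50), radius=10, cell_size=100)
in_middle = zones_covering_aoi((50, 50), radius=10, cell_size=100)
```

A player in the middle of a cell needs only that cell, while a player near a border needs replicas from each neighboring cell the AOI overlaps.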
Using triangulation, as shown in Figure 2.2(d), allows for taking obstacles in the game world into account. This can help in reducing the AOI and thus the number of objects that have to be considered for interest management, leading to fewer object replicas to be maintained by the players, and fewer update messages that need to be sent [BKV06]. Techniques such as Delaunay triangulation have been widely studied. They can be used to triangulate the area inside or around the polygon-shaped obstacle, so that triangles follow the boundaries of obstacles [BKV06, BA08] as shown in the figure.
Games can also be divided into mini-worlds that have a free format and are connected to each other, such as countries in the game world, with portals for moving between them [CWD+05, KLXH04] as in Figure 2.2(a). This is particularly useful when the AOI always resides within a single zone. While mini-worlds do not offer the concept of a continuous world, other approaches do. In those approaches, the zoning is only virtual, done merely for the purpose of interest management, and is not visible to the players.
Determining the right size for zones is challenging. Different games have very different characteristics, and even inside a game, different parts may require different zoning mechanisms. Therefore, an optimal zoning mechanism for all cases does not exist. A very large zone can result in too many objects in a single zone, making interest management less efficient as any player might only be interested in a small set of these objects. On the other hand, too small a zone may be much smaller than the AOI of players, resulting again in complex interest management between multiple zones. This trade-off is addressed by dynamic interest management schemes.
Limits of AOI & dynamic zoning AOI filtering suffers from specific user behaviors, e.g., flocking, which refers to the movement of many players to one area in the game world [PG07, CWD+05]. This can adversely affect replication management (see Section 2.7). Game hotspots form since some areas become more interesting or more profitable to the players in terms of experience points, treasures, new quests, or invitations by other players. This results in a large number of players coming to the same area while other areas become less populated. Hotspots can change quickly as players move to new interesting areas as they emerge, often making it difficult to prepare for such changes in advance. Populations in real games often follow a power law distribution [PG07], and the increase in the number of players in the area can result in a quadratic increase in the network traffic generated due to the increase in the number of interactions between players [BDL+08]. [VFBD11] find that in Second Life the number of objects per region is roughly constant over a one month period and that the active population at any point of time is between 30,000 and 50,000 avatars, i.e., about 0.3% of the registered avatars. About 30% of the regions are never visited in a six day period and less than 1% of the regions have large peak populations. Avatars tend to organize in small groups of 2–10 avatars. Large groups of avatars are very rare and are driven by the presence of events such as concerts and shows. As a result, game designers may have to use artificial means to discourage players from gathering in the same region. This can prevent certain types of interesting game play such as epic battles [Bli11]. A more comprehensive study of player movements in massively multiplayer online role playing games (MMORPG) and session patterns is provided in [VFBD11, SDM09, MC10].
Flocking can be partially addressed with dynamic zones. As more players move into a popular zone, the zone can be split. However, if zones become significantly smaller than the typical AOI, splitting does not really help anymore as inter-zone communications become the bottleneck. One possibility then is to shrink the AOI of the players. However, this might lower the game experience for the players. In addition, depending on the game design, AOI can be further broken down into different types [Mor96] but for the purpose of this thesis we use the general term AOI.
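To make the limit of splitting concrete, the following sketch recursively splits a crowded rectangular zone into four quadrants and stops once a further split would produce zones smaller than an AOI diameter. The quadtree-style split, the stopping rule, and all parameter names are our assumptions for illustration; they are not taken from a particular system.

```python
def split_zone(zone, players, max_players, aoi_radius):
    """Split zone = (x, y, width, height) into quadrants while it is crowded.

    Splitting stops when the zone holds at most `max_players` players, or when
    halving it would make a side shorter than the AOI diameter.
    """
    x, y, w, h = zone
    inside = [(px, py) for px, py in players if x <= px < x + w and y <= py < y + h]
    if len(inside) <= max_players or min(w, h) / 2 < 2 * aoi_radius:
        return [zone]
    hw, hh = w / 2, h / 2
    zones = []
    for qx, qy in ((x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)):
        zones.extend(split_zone((qx, qy, hw, hh), inside, max_players, aoi_radius))
    return zones


# Hypothetical hotspot: five players flock to one corner, one player is elsewhere.
players = [(10, 10), (11, 12), (12, 9), (9, 13), (13, 11), (90, 90)]
zones = split_zone((0, 0, 100, 100), players, max_players=3, aoi_radius=10)
```

In this example the hotspot ends up in a 25 by 25 zone that still holds all five flocking players: further splitting is refused because the resulting zones would be smaller than the AOI, which is the situation where splitting no longer helps.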
2.2.8 Consistency Control
In a distributed architecture like a multiplayer game, concurrent and possibly conflicting updates may be executed at different sites resulting in inconsistent states. Inconsistencies occur due to the execution of parallel and conflicting updates, and consistency mechanisms have to avoid or correct them. For example, if two players shoot a third player, nearly at the same time, all players should see the updates in the same order and only the first shot should be successful at all replicas.
Generally, systems are built such that if all replicas execute all updates in the same order, then all sites will have the same state. However, if messages are received out of order, their execution can result in inconsistent state if the out-of-order messages are causally dependent. For example, if a player drinks a healing potion to increase his health points and only then, thanks to the increased health, is able to pick up a sword, these actions have to be performed in the same order at all replicas, otherwise inconsistencies might become visible. Other types of inconsistency are caused by loss of updates. Most games, in order to deal with latency issues, use the fast but unreliable UDP messaging protocol where message loss is possible. Solutions are to send some updates with reliable TCP, or to use commit protocols for critical actions, in particular if they change the state of several objects and atomicity is required [MVHE04, BDL+08, BPS06, CF05].
Consistency Definitions Many definitions for consistency exist and consistency control has been studied in other distributed contexts (e.g., shared memory [AG96] or distributed systems [ÖV11]). A very strong form of consistency is achieved if every interaction is treated as a transaction that provides the transactional properties such as isolation and atomicity. However, providing transactional properties is often costly and might not be feasible for all game actions. For instance, running a two-phase commit protocol to guarantee atomicity leads to large latencies which could reduce the interactivity of the game significantly. Eventual consistency [TTP+95] is a weaker form of consistency. It allows individual copies of an object to be temporarily inconsistent but eventually consistent, meaning that if update activity ceased for a sufficiently long time, all object copies would eventually have the same state. Generally, there exist well-known trade-offs between performance and consistency restrictions, which can also be applied to MOGs. This fact has been well presented in Brewer’s “CAP conjecture” [GL02], indicating that a system can only achieve two out of the three properties of consistency, availability, and tolerance to network partitions. As interactivity is of the essence to the success of most games, they often sacrifice consistency for other goals, and as a result, games usually provide inconsistency resolution instead of inconsistency prevention, enforcing not more than eventual consistency [CF05, MVHE04].
Different levels of consistency, implemented by various mechanisms, might be considered for different object and interaction types as not all interactions are of the same importance [ZK11]. For example, many virtual game objects are considered valuable and can be sold or traded for real money, making consistency control a critical requirement. In 2009, micro-transactions generated 250 million dollars in the U.S. alone, with the most expensive item being 635 thousand dollars3, and are projected to hit 13.6 billion dollars by 2014 worldwide4. In contrast, player position updates have much lower consistency requirements, and eventual consistency will be sufficient.
Stale views In the previous sections we have introduced the primary-copy replication model that requires that all updates are first performed at the primary copy. This simplifies consistency management as it is not possible to have conflicting updates performed at different copies. Instead all updates are serialized at the primary copy. However, replicas will receive the update changes only some time after they occur at the primary. During this period, replicas are stale. Players observe these state values and might initiate invalid update requests to the primary copy, based on these stale values. Thus, actions submitted by players based on their local state might lead to results that the players did not anticipate. Ideally, the primary copy directly sends updates to all replicas in order to minimize staleness. However, this is expensive and other methods may be employed that may increase the latency and staleness experienced by the replicas. For instance, in Second Life, in a region crowded with virtual objects, avatars have an inconsistent view of their neighbor avatars half of the time, meaning either they do not see them or they see them at a wrong location. Moreover, in 50% of the cases, this inconsistency lasts more than one second [VFBD11].
Consistency Techniques Games use several techniques to hide staleness and inconsistencies due to the latency of update propagation. These techniques can also be used to hide message loss. They typically fall under the two categories of predictive contract mechanisms (PCM) [IEE95] and multi-resolution simulation [HNP97] and are often used together. Dead-reckoning is a common form of PCM [PW02a, GDK99]. It was originally designed in the nautical and aviation domain to calculate the current position of a ship or plane based on the previous position and the motion vector. It is used in games to calculate the position of a player in the upcoming frame based on his previous location and his speed and direction of movement. It can also be used to detect collisions that will happen in the future. Dead-reckoning is used if messages do not arrive in time because of delays or loss.
3 Planet Calypso Player Sells Virtual Resort for $635,000 USD http://www.prnewswire.com/news-releases/planet-calypso-player-sells-virtual-resort-for-63500000-usd-107426428.html
4 Magid Associates and PlaySpan Release 2nd Annual Survey on Virtual Goods Market Penetration and Growth in North America. https://developer.playspan.com/developer/pdf/PlaySpan_Magid_5_27_10_Final.pdf
However, it is also possible that all replicas continuously perform dead-reckoning, and the primary copy only sends a state update to the replicas if it detects that the state calculated by dead-reckoning differs from the true state by more than a certain threshold. Neural networks have been proposed as predictors [MWMD07a] for PCMs as well as a hybrid dead-reckoning/shortest-path predictor [MMWD05]. In Chapter 5, we present AntReckoning, which uses ant colony inspired methods to model players’ interest and improve the accuracy of the predictions based on a recent history of players’ movements and attraction or repulsion of surrounding game items.
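The threshold-based variant of dead-reckoning can be sketched as follows, assuming a simple linear (constant-velocity) predictor in two dimensions. The class and function names and the threshold value are hypothetical; this is not the AntReckoning predictor of Chapter 5.

```python
import math


def dead_reckon(pos, velocity, dt):
    """Linear extrapolation of a 2D position after `dt` time units."""
    return (pos[0] + velocity[0] * dt, pos[1] + velocity[1] * dt)


class DeadReckoningSender:
    """Primary-copy side: sends an update only when the replicas' prediction drifts."""

    def __init__(self, pos, velocity, threshold):
        self.last_pos = pos        # last state sent to the replicas
        self.last_vel = velocity
        self.elapsed = 0.0         # time since that state was sent
        self.threshold = threshold

    def step(self, true_pos, true_vel, dt):
        """Advance time by `dt`; return (pos, vel) to send, or None if no update is needed."""
        self.elapsed += dt
        predicted = dead_reckon(self.last_pos, self.last_vel, self.elapsed)
        if math.dist(predicted, true_pos) > self.threshold:
            self.last_pos, self.last_vel, self.elapsed = true_pos, true_vel, 0.0
            return (true_pos, true_vel)
        return None


sender = DeadReckoningSender(pos=(0, 0), velocity=(1, 0), threshold=0.5)
first = sender.step(true_pos=(1, 0), true_vel=(1, 0), dt=1.0)   # prediction is exact
second = sender.step(true_pos=(2, 1), true_vel=(1, 1), dt=1.0)  # player turned
third = sender.step(true_pos=(3, 2), true_vel=(1, 1), dt=1.0)   # new prediction holds
```

Only the second step produces a message: as long as the player keeps moving as predicted, replicas extrapolate locally and no bandwidth is spent.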
Measuring Consistency The degree of inconsistency can be defined by comparing the (potentially inconsistent) state at each client to a virtual perfect site that receives and executes all interactions with no delay and in the right order [MVHE04]. [CE11] provides an objective evaluation framework for Quality of Experience (QoE) in games. QoE comprises three basic perceptions: responsiveness, precision, and fairness. Responsiveness is the time the system takes to respond to an event, precision is the degree of accuracy required to complete an action successfully, and fairness is the degree of difference among all players’ gaming environments. Similarly, [VFBD11] considers inconsistency, interactivity and discovery latency to measure QoE.
2.3 MOG Architectures
The main game architectures for MOGs, as shown in Figure 2.3, are the traditional client-server architecture, multi-server architectures (MS) and peer-to-peer (P2P) architectures. Different architectures try to achieve scalability through various means. Scalability can be achieved either by (1) increasing the resources or by (2) reducing the consumption. The methods discussed above such as AOI filtering and dead reckoning are designed to reduce the resource consumption. Server-based architectures try to increase the amount of resources by adding multiple servers to distribute the load. P2P architectures increase the resources by using the resources available to each client as they join.
2.3.1 Client-Server Architecture
The typical client-server architecture is shown in Figure 2.3(a). In this architecture, the server holds the master copies of all mutable objects and avatars and maintains global knowledge of the game world. Clients connect to the server to receive the necessary information about the game world. All player updates and player interactions are sent to the server for execution as well as conflict resolution, and the server is responsible for sending object updates to all interested players after the updates have been executed. The main drawback is that even the best provisioned servers can only support a limited number of players. The common solution is to add multiple servers to improve scalability.
2.3.2 Distributed Multi-Server Architecture
The multi-server architecture is shown in Figure 2.3(b). Large game companies usually maintain server farms to provide service for the clients [KDC10, Ter02, But03]. Multi-server architectures can be divided into two categories. In the first category, several complete instances of the game world exist, also called shards, and each shard is maintained by a server. Every server is responsible for a different set of clients and has a complete copy of its own game world. That is, each set of clients and their server follow the traditional client-server architecture. There is usually no need for communication between these servers. Games and maps should be designed in a way that artificially prohibits players from moving between different game instances. Users are typically assigned to servers based on their geographical location and each server is responsible for a separate region (e.g., North America, Europe, . . . ).
Figure 2.3: Different gaming architectures: (a) Client-Server: the server is responsible for maintaining the whole game world. (b) Multi-Server: either (1) servers maintain completely separate game worlds (shards) or (2) the game world is divided into different zones maintained by different servers. (c) P2P: each peer maintains a part of the game world.

In the second category, only a single game world exists, which is divided into several regions. Each region is maintained by a separate server. Players are all in the same game world; however, in most games they are only able to interact with other players in the same region. Players are allowed to move between regions, but this requires a hand-off mechanism between servers, which can be transparent to the player: as the player moves near the border of a region, the necessary information is sent to the player by the neighboring region, and the player can easily cross the border. The hand-off may also not be transparent, in which case the player is asked to go through a special portal or gateway in order to enter a new region. Thus, the concept of regions can be used for interest management, as described in the previous sections, as well as for load balancing among servers.
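The region-based partitioning and hand-off described above can be illustrated with a small Python sketch. It assumes a world cut into a grid of equally sized square regions, one server per region, and positions that stay inside the world; REGION_SIZE, GRID_WIDTH, region_of, and move are hypothetical names chosen for this example.

```python
REGION_SIZE = 100.0  # assumed side length of a square region
GRID_WIDTH = 4       # assumed number of regions per row


def region_of(x, y):
    """Map a world position to the index of the region (and server) that owns it."""
    col = int(x // REGION_SIZE)
    row = int(y // REGION_SIZE)
    return row * GRID_WIDTH + col


def move(player, new_x, new_y):
    """Move a player; return (old_region, new_region) if a border was crossed."""
    old_region = region_of(player["x"], player["y"])
    new_region = region_of(new_x, new_y)
    player["x"], player["y"] = new_x, new_y
    if new_region != old_region:
        # In a real system the old server would now transfer the player's
        # state to the server of the new region (the hand-off).
        return (old_region, new_region)
    return None


player = {"x": 95.0, "y": 10.0}
print(move(player, 105.0, 10.0))  # crossing from region 0 into region 1
print(move(player, 110.0, 10.0))  # staying inside region 1, no hand-off
```

A transparent hand-off would additionally start sending the neighboring region's data before the border is reached; the sketch only detects the moment at which responsibility changes.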
2.3.3 Peer-to-Peer (P2P) Architecture
Another option to increase scalability is the use of a peer-to-peer architecture as shown in Figure 2.3(c). Basically every node acts both as a server and a client. In the context of games, this means that each node could become responsible for maintaining master copies of some of the game objects and/or for update dissemination to other nodes. This model can be highly scalable since the load is distributed among all nodes and addition of new nodes also introduces new resources to the system.
2.3.4 Characteristics and Comparison
Here we discuss advantages and disadvantages of each of these architectures.
Server Architecture: Centralized architectures provide the highest level of control over the game world. The game state can be the most valuable property of game developers, especially if the game world is persistent. Game creators can easily change and update the game state and have control over the necessary software updates such as patches, character updates, new missions, and expansion packs. Manageability and control are two main reasons why game companies typically use this architecture. Another reason is that programming in server-based architectures is simpler than in P2P architectures. Overall, the client-server architecture is the simplest of these architectures.
Easy consistency management is another reason for the popularity of centralized architectures. Since the server executes all updates and resolves conflicts, managing consistency is simpler than in multi-server architectures requiring hand-offs, or P2P architectures that have even higher distribution.
The biggest drawback of a single-server architecture is scalability. Even the best-provisioned servers are not able to handle more than a few thousand players. This number is further reduced for games that have stringent latency requirements. For example, Quake II (a popular FPS MOG) uses a client-server architecture and scales to only a few hundred players, as measured in [BPS06].
Further problems are cost and fault-tolerance. A single server that is able to handle the massive load will be extremely expensive, and it remains a single point of failure. Its failure will interrupt game play, and non-persistent game state will be lost. This problem can be solved by adding backup servers that can take over in case of failure. But maintaining backup servers introduces more complexity and cost, and may result in an even further decrease in scalability.
Multi-server architectures: Maintaining multiple servers improves scalability while inheriting other benefits of the client-server architecture. Since the game world is divided into instances or regions and each instance is maintained by a separate server, a higher number of players can be supported simultaneously. As servers can serve as backups for other servers, this scheme also has the potential for higher fault-tolerance than a single-server system. But even if no backup mechanism is used, the failure of one server will only affect the players connected to this server and will not interrupt execution at other servers. For example, Second Life contains 18,000 regions, each of which is managed by a dedicated server [VFBD09].
One of the main drawbacks of multi-server systems is isolation of players if shards are used. Players can only interact with other players in the same instance of the game world and are not able to interact with players in other instances. If the game supports movements between regions, a complicated hand-off mechanism [KLXH04] is required that maintains game state consistency. Furthermore, a single region hosted by a server can only support a limited number of players. In case of flocking behavior, even the best-provisioned servers might not be able to provide service for an overloaded region. In addition, multi-server systems cannot be scaled without limit if a region-based approach is used, as the game world cannot be divided infinitely into smaller and smaller regions. Most game developers address this issue by expanding or adding new maps as the number of players increases in order to maintain a low population density. However, this approach prevents the players from experiencing a more interactive environment. Another issue with many regions is the inter-server communication caused by player movements.

Different methods have been proposed in order to balance the load between regions. Most architectures assume that a single server is able to handle the load on a single region, and if the load on a region is low, a server may maintain several regions, preferably adjacent to each other. Dynamic load balancing schemes [CWD+05] have been proposed to dynamically allocate overloaded regions to new servers while maintaining region locality in games. A combination of both shards and regions can also be used by game companies.
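As a rough illustration of how lightly loaded regions can share a server, the Python sketch below assigns regions to servers with a simple greedy heuristic: the most heavily loaded regions are placed first, each on the server with the least load so far. This is an assumed textbook heuristic, not the scheme of [CWD+05], and it deliberately ignores region adjacency; assign_regions and the load figures are hypothetical.

```python
import heapq


def assign_regions(region_load, num_servers):
    """Greedily map each region to the currently least-loaded server."""
    # Heap of (total load, server index); the least-loaded server is on top.
    heap = [(0, server) for server in range(num_servers)]
    heapq.heapify(heap)
    assignment = {}
    # Place heavily loaded regions first so that small ones fill the gaps.
    for region, load in sorted(region_load.items(), key=lambda item: -item[1]):
        total, server = heapq.heappop(heap)
        assignment[region] = server
        heapq.heappush(heap, (total + load, server))
    return assignment


loads = {"A": 90, "B": 10, "C": 15, "D": 20}  # assumed players per region
print(assign_regions(loads, num_servers=2))
```

With these numbers the crowded region A gets a server of its own while the three quiet regions share the other one. A locality-aware scheme would add the constraint that regions on the same server should be neighbors, which reduces inter-server communication when players cross borders.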
Finally, a major drawback is the cost. Maintaining server farms is very expensive and can prevent start-up or small game companies from entering the MMOG market. Acquiring servers for 30,000 simultaneous players can cost about $800,000, and the bandwidth costs can reach hundreds of thousands of dollars [MPK03]. Maintenance costs such as cooling can match these numbers.
P2P architectures: In principle, P2P architectures have the highest potential for scalability, as every peer that joins the game adds new resources to the system. A particular advantage is that these resources are added at no extra cost for the game provider. As the load is managed by the peers, the need for expensive servers is greatly reduced if not eliminated. Furthermore, as responsibility is distributed across many nodes, each of them carrying only a bit of the load, the failure of any individual peer should not affect many other players if churn is handled appropriately. Moreover, as will be discussed in more detail in Section 2.5, if direct connections between peers are created properly, low latency can be achieved, as there is no need to send the updates to a central server and from the server to other clients. Instead, updates are directly sent to interested peers. For example, P2P Second Life improves user experience, measured in terms of consistency, by about 20% compared to a client-server architecture, and avatar interactivity is also five times faster [VFBD09].

Table 2.1: Comparison of different architectures

Architecture | Pros | Cons
Client-Server | + Simplicity; + Easy management; + Consistency control | -- Scalability; -- Fault tolerance; -- Cost
Multi-Server | + Scalability; + Fault tolerance | - Isolation of players; - Complexity; -- Cost
Peer-to-Peer | ++ Scalability; ++ Cost; + Fault tolerance | - Harder to develop; - Consistency control; - Cheating
On the other hand, P2P architectures have several drawbacks. One of the major issues with P2P architectures is security. Cheating is easier in a P2P environment [BLL07, KTCB05, GV08]. Another problem is that P2P systems are harder to manage and control given that there is no central server that maintains global knowledge. Since the state is distributed among peers, it is hard for game companies to have complete control over the game. Providing consistency control in P2P systems is also more difficult, since conflicting updates might be executed at different sites, resulting in inconsistency. Generally, coordination overhead and complexity are likely to rise with the number of nodes in the system.
A summary of the advantages and disadvantages of each architecture is provided in Table 2.1.
2.3.5 Hybrid Architectures
It is possible to combine a P2P architecture with a server-based architecture. These approaches can be divided into several categories according to what is being handled by the P2P system (see Table 2.2):
Cooperative Message Dissemination: The game state is maintained by one or multiple servers but update dissemination uses a P2P approach. Typically, players send their interactions directly to the server. After the execution, the server uses a P2P multicasting mechanism to send the updates to the players. This allows for a reduction in bandwidth requirements (and cost savings) at the server side.
State Distribution: The game state is distributed among peers. Peers can hold primary copies of objects and thus, are responsible for execution of player actions. However, all or part of the communication between peers may be managed by the servers. Moreover, the servers are responsible for some centralized operations such as authentication, and keeping track of joins and leaves of players. These architectures achieve scalability by distributing the cost of state execution among clients.
Basic Server Control: Both message dissemination and state distribution are done only through the P2P overlay. The servers' primary role is to keep highly sensitive data, such as user logins, payment information, and players' progress and state. They may perform authentication and admission control, meaning the server controls joins and leaves. Servers may also coordinate some types of interactions between the peers, e.g., those that require the highest consistency. However, they do not maintain game state or perform state dissemination.
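The bandwidth saving claimed for cooperative message dissemination can be made tangible with a short Python sketch. It builds a breadth-first forwarding tree rooted at the server in which every node relays an update to at most a fixed number of others. The function build_multicast_tree, the node label "server", and the fan-out value are assumptions made for this example; actual multicast schemes choose forwarders based on bandwidth and latency.

```python
def build_multicast_tree(players, fanout):
    """Return {sender: [receivers]} for a breadth-first tree rooted at the server.

    Assumes fanout >= 1; each node forwards an update to at most `fanout` others.
    """
    tree = {"server": []}
    senders = ["server"]  # nodes in the order in which their slots are filled
    current = 0           # index of the sender that still has a free slot
    for player in players:
        if len(tree[senders[current]]) >= fanout:
            current += 1  # this sender is full, move on to the next one
        tree[senders[current]].append(player)
        tree[player] = []
        senders.append(player)
    return tree


players = [f"p{i}" for i in range(7)]
tree = build_multicast_tree(players, fanout=2)
print("server uploads with unicast:  ", len(players))
print("server uploads with multicast:", len(tree["server"]))
```

The server's upload cost per update drops from one message per player to one message per direct child. The price is that players deeper in the tree receive the update after additional forwarding hops.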
P2P systems can serve other purposes for game developers as well. In World of Warcraft, for example, P2P systems are used to distribute software updates to clients. These systems can be similar to popular P2P file distribution systems. In this thesis, however, we focus on P2P systems that are used for maintaining game state or facilitating update dissemination.
2.3.6 Heterogeneous and homogeneous P2P systems
In heterogeneous solutions, some of the peers behave as super-peers (also referred to as coordinators [KLXH04], responsible nodes [YMYI05], and master nodes [YV05]) and some as normal nodes. Super-peers are responsible for most functionalities of the architecture. Super-peers can be chosen among the normal nodes and given higher responsibilities, or can be nodes with higher levels of resources (such as federated servers). Most architectures that divide the game world into regions and have a coordinator for each region deploy super-peers.
In homogeneous approaches, all nodes have similar responsibilities. For example, in pSense [SSJ+08], even though there are several types of possible roles for nodes, e.g., sensor nodes and neighbor nodes, all nodes have these responsibilities, resulting in a homogeneous architecture. Most mutual notification systems fall under this category.
2.3.7 Other Functionalities
Inter-server communication & streaming: Multi-server environments can also benefit from many of the techniques developed for P2P architectures. The idea is to deploy peer-to-peer protocols for the communication among servers [KDC10, IHK04, BPS06]. Servers are treated as peers, and the same state distribution and update dissemination mechanisms suggested for a peer-to-peer architecture are applied to the server system. Slight modifications are necessary in order to differentiate between servers (peers in this case) and game clients. This is further discussed in Section 2.10.
2.4 P2P Architectures
Here we discuss different P2P architectures proposed and categorize them according to the underlying P2P structure.
2.4.1 Introduction
Peer-to-peer systems are usually created by ad-hoc addition of nodes and build an application layer overlay network on top of the network layer. A fundamental requirement for very high scalability is that no single peer can know all other peers in the system. Instead it is always only "connected" to a fraction of peers. A prominent use of P2P systems has been in content (file) distribution systems. Instead of having a single server providing content to all clients, peers can download content from other peers, and in turn, offer their content to others. In order for a peer to find the content it is looking for, efficient query routing protocols have to be in place.
Based on how nodes are connected to each other, P2P overlays are generally divided into two types: structured and unstructured. In structured P2P systems, a deterministic protocol is used to form a specific graph structure (by deciding which peers build connections to each other) that ensures that any node can route a message to any other node (or find an object) in the network overlay by exchanging O(log N) messages, where N is the number of nodes. Key-based routing mechanisms are used to provide a lookup service similar to a hash table but different in that the (key, value) pairs are distributed throughout the nodes. These mechanisms are able to add or remove nodes from the overlay with low overhead. Examples of such P2P substrates are Pastry [RD01a], Chord [SMK+01], Tapestry [ZKJ01], and Mercury [BAS04]. Distributed Hash Tables (DHTs) can use these substrates as the underlying mechanism for providing a hash table functionality distributed over many nodes.
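The idea of a hash table whose (key, value) pairs are spread over the nodes can be sketched in a few lines of Python. The sketch covers only placement: each key is stored on the node whose identifier is closest to the key's hash. It assumes a global view of all node identifiers, which a real substrate does not have; real substrates reach the responsible node in O(log N) hops using per-node routing tables. The 128-bit circular identifier space, the use of SHA-1, and the names ToyDHT, make_id, ring_distance, and responsible_node are assumptions for illustration.

```python
import hashlib

ID_BITS = 128            # assumed identifier length
ID_SPACE = 1 << ID_BITS  # identifiers live in [0, 2^128)


def make_id(text):
    """Hash a string (e.g. an IP address or object name) into the identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def ring_distance(a, b):
    """Distance between two identifiers on a circular identifier space."""
    diff = abs(a - b)
    return min(diff, ID_SPACE - diff)


def responsible_node(key, node_ids):
    """Pick the node whose identifier is closest to the key (ties: smaller id)."""
    return min(node_ids, key=lambda node_id: (ring_distance(node_id, key), node_id))


class ToyDHT:
    """Placement only: every call sees all nodes, which a real DHT node cannot."""

    def __init__(self, node_names):
        self.stores = {make_id(name): {} for name in node_names}

    def put(self, key, value):
        node = responsible_node(make_id(key), self.stores)
        self.stores[node][key] = value
        return node

    def get(self, key):
        node = responsible_node(make_id(key), self.stores)
        return self.stores[node].get(key)


dht = ToyDHT(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
dht.put("sword#17", {"owner": "alice"})
print(dht.get("sword#17"))
```

Because both put and get hash the key in the same way, any node can locate an object without a central directory, which is the property game architectures built on DHTs rely on.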
In unstructured P2P systems, no deterministic algorithm for organization and optimization of network connections between peers exists. Network connections are generally established randomly or using probabilistic mechanisms that aim to converge into a suitable overlay (e.g., by connecting with a higher probability to nodes that are semantically close). Search in such networks is often done in a probabilistic manner, and in order to speed up the search, redundancy mechanisms such as flooding or content replication are used [SNW08, NWL+06, CMPC03]. Gnutella and Freenet are examples of unstructured P2P systems.

The basic idea of many P2P-based games is to distribute the game state among peers and, along with it, processing, network, and storage tasks. In a primary-copy based replication scheme, this means distributing primary copies of objects among peers. Furthermore, the P2P overlay can be used for update dissemination. Both structured and unstructured P2P gaming architectures have been proposed. A summary of some of the proposed P2P-based gaming solutions is presented in Table 2.2 (see footnote 5). This table is not meant to include all the P2P architectures that have been developed for games but to compare examples that exploit different architectures and strategies. We will describe many of these systems in this chapter.
5 For an alternative list see: http://vast.sourceforge.net/relatedwork.php. While the categorization and terminology are somewhat different from this thesis, it provides a comprehensive list of related work.
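Flooding, mentioned above as a search mechanism for unstructured overlays, can be sketched as a breadth-first traversal of the overlay graph. The hop limit (TTL) used here to bound the flood is an assumption of this example, as are the function name flood_search and the toy overlay; the sketch also tracks visited nodes centrally, whereas real peers would suppress duplicates using message identifiers.

```python
from collections import deque


def flood_search(overlay, start, has_item, ttl):
    """Flood a query at most `ttl` hops from `start`; return the nodes holding the item."""
    visited = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, hops = queue.popleft()
        if has_item(node):
            hits.append(node)
        if hops == ttl:
            continue  # hop budget used up, do not forward any further
        for neighbor in overlay[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return hits


# Hypothetical overlay: each node only knows its direct neighbors.
overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
print(flood_search(overlay, "a", lambda node: node == "e", ttl=2))  # e is out of reach
print(flood_search(overlay, "a", lambda node: node == "e", ttl=3))  # e is found
```

The two calls show the trade-off: a small hop limit keeps the message overhead low but can miss existing content, which is one motivation for replicating content on several nodes.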
Table 2.2: Comparison of representatives of different P2P architectures. (N/A: Not applicable)

Architecture | Type | Network Overlay | Interest Management | Replication & Consistency Control
SimMud [KLXH04] | Structured | DHT: Pastry + AL Multicast: Scribe | Static Regions: Rectangular Cells | Primary Copy: Region Controller
N-Trees [GLZ05] | Structured | DHT: Pastry + AL Multicast: N-trees | Nested Regions (scopes) + Scoped events: Grid Cells | Primary Copy: Region Controller
Colyseus [BPS06] | Structured | DHT: Mercury + Direct | Dynamic Rectangular Cells + Prefetching objects | Primary Copy: Each Peer + Proactive Replication + Soft-State Storage
P2P Second Life [VDB09] | Structured | DHT (Kad) | Dynamic: Adaptive Cells | Primary Copy: Region Controller
Badumna [KDC10] | Structured | DHT | Regions + Bounded dynamic AOI | Primary Copy: Region Controller
pSense [SSJ+08] | Unstructured | Direct (Neighborhood) + Automatic forwarding | Dynamic Circular AOI | N/A
ASCEND [HL04] | Unstructured | Direct (Neighborhood) + Connectivity assurance | Dynamic AOI, Voronoi Diagrams | Primary Copy: Each Peer
Solipsis ('03) [KS03] | Unstructured | Direct (Neighborhood) | Dynamic AOI, Convex Hulls | Primary Copy: Each Peer
Solipsis ('08) [FRP+08] | Unstructured | Direct (Neighborhood) | Dynamic: Voronoi based on RayNet | Primary Copy: Each Peer
Message Exchange Scheme [KAM04] | Unstructured | Direct (Neighborhood) | Dynamic BW dependent AOI, Neighborhood | Primary Copy: Region Controller
Donnybrook [BDL+08] | Unstructured | Direct (All-to-All) + Forwarding pools | Game world, Interest Sets | Primary Copy: Each Peer + Doppelganger
Amaze [BC85] | Unstructured | Direct (All-to-All) | Game world | Primary Copy: Each Peer
VSM [HCJ08] | Unstructured | Direct (Neighborhood) | Dynamic AOI, Voronoi diagram | Primary Copy: Each Peer
MiMaze [DG99] | Hybrid: Server + IP Multicast | IP Multicast Groups | Game World | Primary Copy: Each Peer
FreeMMG [CRd+04] | Hybrid: Server + Unstructured | Server + Direct Communication | Static Regions: Rectangular Cells | Primary Copy: Each Peer
Quazal [Qua11] | Hybrid: Server + Unstructured | Server + Direct Communication | Duplication Spaces | Duplication Spaces
Dist. Avatar Mgmt. [VFBD09] | Hybrid: Server + Unstructured | Direct (Neighborhood) | Dynamic: Region + Delaunay Triangulation | N/A
Hydra [CYB+07] | Hybrid: Server + Unstructured | Region Server + Proxies | Static Region | Primary Copy: Region (slice) Controller
Distributed Event Delivery System [YMYI05] | Hybrid: Server + Structured | Server + Load balancing tree | Static Regions: Rectangular Cells | Primary Copy: Region Controller
IRS [GV08] | Hybrid: Server + Structured | Server Message Relaying + Proxy Execution | Server | Primary Copy: Proxy peer
MM-VISA [ASO09] | Hybrid: Server + Structured | AL Multicast | Hexagonal Regions + Player Clusters | Primary Copy: Region Controller
MOPAR [YV05] | Hybrid: Structured + Unstructured | DHT: Pastry + AL Multicast: Scribe | Hierarchical AOI + Static Regions: Hexagonal cells | Primary Copy: Region Controller
VoroGame [BAC09] | Hybrid: Unstructured + Structured | Direct (Neighborhood) + DHT | Dynamic: Voronoi (Convex Polygon) | Primary Copy: Each Peer (Random DHT)
Mammoth [KVB+09] | Hybrid: Server + Unstructured | Region Controller (All-to-All) | Dynamic Regions: Rectangular tiles | Primary Copy: Region Controller
Zoned Federation [IHK04] | Hybrid: Server + Structured | DHT | Static Regions: Rectangular Cells | Primary Copy: Region Controller
2.4.2 Structured P2P Game Architectures
Structured P2P architectures typically use a DHT as the underlying mechanism for game state distribution and update dissemination. In this section, we will describe SimMud [KLXH04] in detail as an example of a DHT-based system. Most other solutions use a conceptually similar approach but optimize various aspects of the architecture.
SimMud uses Pastry [RD01a] to distribute game objects and Scribe [CDKR02] for update dissemination. Here we first briefly describe these protocols and then explain SimMud's use of them.
Figure 2.4(a) shows Pastry's overlay and message routing. In Pastry, every node has a 128-bit NodeID. NodeIDs can be uniformly distributed using a hash code (e.g., SHA-1) of the nodes' IP addresses or their public keys. In general, in order to distribute objects using Pastry (here object refers to any application object), each object is given an ObjectID (a hash code of the object) and its primary copy is placed on the Pastry node whose NodeID is closest to the ObjectID. An object can later be looked up by routing a message with the ObjectID as the key. The message will automatically be routed to the node that has the object. Thus, when a node with a replica wants to update an object, it has to send the update message with the ObjectID as key and it will be automatically received by the node