Beautiful Architecture – Architecting for scale summary

Summary written by: Oliver Leisalu

One of the most important aspects of modern systems is flexibility of scale. Web-based systems can see the number of users increase considerably within minutes. This load is hard to predict: it may leave hardware standing idle, or create a situation where more computing power is needed very quickly. The problem is especially interesting, and more complicated, for massively multiplayer online games (MMOs) and virtual worlds. Most players rarely communicate with each other, which makes the workload parallel, yet nearly every player still communicates with someone. These systems also need to be stable and keep the world consistent. As the number of users might grow to millions, one of the primary requirements is scalability. Project Darkstar, developed by Sun Microsystems Laboratories, tries to answer some of these problems.

The main property of Darkstar is that it runs on multiple machines, that machines can easily be added or removed, and that each machine may have multiple processors, so it supports parallel computing on several levels. The chapter points out that parallel programming in this setting is very difficult, even for the most experienced developers. Darkstar tries to solve this by letting developers program the game as if it ran on a single machine, while the framework takes care of the concurrency problems.

Clients usually have quite powerful computers, so as much computation as possible is done on the client, and the server side is kept as lightweight as possible. The server cannot be just a simple logging machine, however, because players would then start cheating. The server therefore has to check that the data it receives is realistic, and it remains the ultimate source of truth.
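
The plausibility check described above can be sketched as follows. This is a minimal illustration, not Darkstar code: the `Position` type, the `MAX_SPEED` limit, and both function names are hypothetical, and a real game would validate far more than movement speed.

```python
import math
from dataclasses import dataclass

# Hypothetical limit: the fastest a player may move, in world units per second.
MAX_SPEED = 10.0


@dataclass
class Position:
    x: float
    y: float


def is_plausible_move(old, new, elapsed_s, max_speed=MAX_SPEED):
    """Return True if the client could have covered the distance in the time given."""
    if elapsed_s <= 0:
        return False
    distance = math.hypot(new.x - old.x, new.y - old.y)
    return distance <= max_speed * elapsed_s


def apply_client_move(server_pos, claimed_pos, elapsed_s):
    """Return the position the server accepts as the truth.

    An implausible claim is rejected and the server keeps its own state.
    """
    if is_plausible_move(server_pos, claimed_pos, elapsed_s):
        return claimed_pos
    return server_pos
```

The client still does the expensive work of computing its movement; the server only performs a cheap check before accepting the result.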

One of the worst enemies of online games is lag, as it makes the game less fun to play. The infrastructure of such a game therefore needs to be designed around the requirement of bounding latency wherever possible. One solution is to separate the game world into geographical regions, so that players are divided into smaller spaces. They can communicate easily within a region but cannot communicate with other regions. This raises the question of which regions to separate. Furthermore, the decision has to be made at design time, which makes it quite hard to change later. Another approach, called sharding, runs separate servers for the same area. This is also problematic, as players in the same location cannot communicate with each other if they are on different servers. Darkstar tries to make both kinds of separation unnecessary.

Darkstar is built as a set of separate services. The basics are very similar to an operating system: there are interfaces for accessing persistent storage, scheduling and running tasks, and performing communication. The server-side infrastructure consists of multiple machines, each running a copy of these services and the game logic. Clients connect to these services and send tasks to the task service. These tasks are short-lived and can be run by any of the servers, but they are always processed in the order they were sent.

As tasks usually read or modify information about the world, the operations in a task are wrapped in a transaction. The world state is held by a memory-based service called the data service, and if two clients try to modify the same data, an error is raised. If an operation in a transaction fails, all of its operations are rolled back and the task is rescheduled.

Another service necessary for transactions is the session service. It hides from the client which server is actually doing the computation, and it is responsible for preserving the order in which tasks are performed. This makes it possible to assume that all tasks that are ready to run are essentially concurrent. The second communication service is the channel service, which allows sending messages to, and receiving messages from, all subscribers of a channel.

This design allows tasks to be processed on different machines as needed. If the server processing a task crashes, the task is sent to another server, so no data is lost. It also gives automatic load balancing: tasks are sent to servers whose load is not too high.
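
The transactional behaviour described above can be illustrated with a small single-process sketch. It is not Darkstar's API: `DataService`, `Transaction`, `ConflictError`, and `run_task` are hypothetical names, and optimistic version checking is used here as a simple stand-in for whatever conflict detection the real data service performs.

```python
class ConflictError(Exception):
    """Raised when another transaction changed data this one depends on."""


class DataService:
    """Hypothetical in-memory store holding world state with per-key versions."""

    def __init__(self):
        self._values = {}
        self._versions = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self._store = store
        self._seen = {}    # key -> version observed on first access
        self._writes = {}  # buffered writes, applied only on commit

    def _touch(self, key):
        self._seen.setdefault(key, self._store._versions.get(key, 0))

    def get(self, key, default=None):
        if key in self._writes:
            return self._writes[key]
        self._touch(key)
        return self._store._values.get(key, default)

    def put(self, key, value):
        self._touch(key)
        self._writes[key] = value

    def commit(self):
        # Abort if any key we used was changed since we first saw it.
        for key, version in self._seen.items():
            if self._store._versions.get(key, 0) != version:
                raise ConflictError(key)
        for key, value in self._writes.items():
            self._store._values[key] = value
            self._store._versions[key] = self._store._versions.get(key, 0) + 1


def run_task(store, task, max_attempts=3):
    """Run task in a transaction, retrying on conflict; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        txn = store.begin()
        try:
            task(txn)
            txn.commit()
            return attempt
        except ConflictError:
            continue  # buffered writes are discarded; the task is tried again
    raise ConflictError("gave up after %d attempts" % max_attempts)


def add_gold(amount):
    def task(txn):
        txn.put("gold", txn.get("gold", 0) + amount)
    return task


store = DataService()
run_task(store, add_gold(100))

# Two overlapping transactions touch the same key; the second commit fails.
first, second = store.begin(), store.begin()
add_gold(10)(first)
add_gold(5)(second)
first.commit()
try:
    second.commit()
except ConflictError:
    run_task(store, add_gold(5))  # the losing task is rescheduled
```

Because writes are buffered until commit, a failed transaction leaves the world state untouched, which is what makes rerunning the task safe.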

One problem in designing the system was that no benchmarks exist for systems of this kind, so it was impossible to measure how well it performs. This was an important concern for the team. The first issue is that every object that outlives a single task is treated as persistent and is therefore stored by the data service. This might add latency, but in the team's opinion it can be overcome with traditional caching. This brings in another consideration: tasks that require the same data should preferably be run on the same machine.
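
One way to act on this observation is to route tasks by the data they touch, so that each machine's cache stays warm. The sketch below is an assumption-laden illustration rather than Darkstar's mechanism: `pick_server` uses plain hash-based routing, and `CachingNode` is a hypothetical read-through cache with no invalidation.

```python
import hashlib


def pick_server(data_key, servers):
    """Map a data key to a server so tasks on the same data land together."""
    digest = hashlib.sha256(data_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


class CachingNode:
    """Hypothetical server node that caches reads from a slower backing store."""

    def __init__(self, backing_store):
        self._backing = backing_store
        self._cache = {}
        self.misses = 0

    def read(self, key):
        if key not in self._cache:
            self.misses += 1  # only a miss pays the persistent-store latency
            self._cache[key] = self._backing[key]
        return self._cache[key]


servers = ["node-a", "node-b", "node-c"]
backing = {"player:42": {"gold": 100}}
nodes = {name: CachingNode(backing) for name in servers}

# Both tasks touch player:42, so both are routed to the same node.
target = pick_server("player:42", servers)
nodes[target].read("player:42")
nodes[target].read("player:42")
```

With this routing, only the first read of an object on a given node goes to the backing store. A real system would also need to invalidate or update the cache when another node writes the same object.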

The main goal of the project was to simplify the programmer's job by removing the need to think about parallelism, but the first projects showed that some thought still has to be given to it. The problem was that the game world was created as a single object and all the objects in it were related to each other, which caused the data service to effectively disable parallelism. Once the object model was redesigned, performance increased by multiple orders of magnitude. This shows that programmers cannot be completely unaware of the underlying concurrency.
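
The effect of object granularity can be seen with a toy count of conflicting task pairs. The model is an assumption made for illustration: each task is reduced to the set of object keys it touches, and two tasks are taken to conflict whenever those sets overlap.

```python
from itertools import combinations


def count_conflicts(tasks):
    """Count pairs of tasks that touch at least one common object key."""
    return sum(1 for a, b in combinations(tasks, 2) if a & b)


# Coarse design: every task goes through one shared "world" object.
coarse = [{"world"}, {"world"}, {"world"}, {"world"}]

# Finer design: each task touches only the players it actually involves.
fine = [{"player:1"}, {"player:2"}, {"player:3"}, {"player:1", "player:4"}]

coarse_conflicts = count_conflicts(coarse)  # every pair overlaps: 6
fine_conflicts = count_conflicts(fine)      # only the two player:1 tasks: 1
```

In the coarse design no two tasks can safely run at the same time, while in the finer design almost all of them can, which matches the kind of improvement the redesign produced.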

Overall, Darkstar introduced many novel ideas and tries to bring some standardization to the world of MMO games.