A High Performance Data-Driven, Entity-Component Framework For Game Engines With Focus on Data-Oriented Design

Max Danielsson∗ Gustav Pihl Bohlin† Blekinge Institute of Technology

∗e-mail: [email protected]
†e-mail: [email protected]

Abstract

This paper presents an implementation of a Data-Driven Entity-Component framework for computer game engines built using Data-Oriented Design principles. In the framework a developer defines systems and components to create games. We present a simple test case built in the framework with rendering and audio, measuring system performance and cache misses. The results are deemed promising, with good cache utilization, and the framework is concluded to be useful as a base for future development.

Keywords: entity-component, data-oriented, data-driven, game engine, game architecture, engine framework

1 Introduction

In this paper, an implementation of a game object framework is presented. It combines two design patterns, EC (Entity-Component) and DOD (Data-Oriented Design), with the aim of providing a foundation for an architecture that is efficient in terms of both rapid feature implementation and cache utilization.

EC based systems [Nystrom 2014, c.14] have been on the rise in popularity within the game industry since at least 2002, when Scott Bilas proposed a method where game objects are Data-Driven entities constructed from a composition of multiple components [Bilas 2002]. A flat EC game object design, as compared to a hierarchical game object design, lessens coupling between behaviour and entity types, and individual entities become more malleable. Behaviour can be easily shared between vastly different entities, without the need for complex inheritance structures. The core goal is to handle the immense complexity of large games and to allow faster turnaround times on feature changes.

DOD is a design paradigm based on performance optimization principles. The primary goal of DOD is to engineer programs that consider cache hardware design, maximizing memory throughput and CPU utilization. We have observed that EC patterns lend themselves well to DOD principles because of their potential decoupling of memory and procedures. This decoupling allows us to meticulously place component data in memory and order execution to minimize cache-miss rates.

2 Previous Work

There has been a substantial amount of work done in the game development field regarding DOD and EC. Yet, relatively little has been published in peer-reviewed journals, and most of the information currently resides in books and developers' personal online resources in the form of presentations and blogs.

2.1 Entity-Component

Nystrom [2014] has summarized the fundamentals of component-based design and how it differs from object-oriented design. In his suggested solution the components contain both data and logic and are tied together in game object classes. Each component contains its locally relevant logic and can communicate with other components either via direct access or messages.

The Unity game engine¹ is a well known example where EC based programming is used to a great extent, providing a system for extending entity behaviour through a scripting layer.

¹ http://unity3d.com/

West [2007] has a write-up on the basics of EC concepts, describing a design which is flat rather than hierarchical.

Martin [2007] has written about a slightly different solution, adding the concept of systems which house the logic used for transforming the data of components, essentially turning components into pure data. He also adds the concept of aspects, representing a component composition, similar to our solution, which we detail in section 3.2. He also goes to great lengths describing anecdotal information regarding the widespread differences in what entity component systems are to developers. His writings have inspired the Artemis Entity System Framework [Arent 2012], which is an open-source framework for implementing EC based games and has served as an inspiration to our own implementation.

Garcia et al. [2014] have previously described the gains in designing accessible games by employing Data-Driven and EC based design principles. The methods described show how a game's stimulation, or output to the user, can be modified within the game without redesigning other aspects of the engine. This is achieved simply by switching out the components responsible for stimulation. By creating different profiles for people with different abilities, the game can reach greater accessibility and a bigger audience.

2.2 Data-Oriented Design

The primary performance bottleneck of modern CPUs (Central Processing Units) is memory access time [Hennessy et al. 2007]. The main way this has been alleviated is through the implementation of so-called caches: faster but more expensive memory placed between the CPU and RAM (Random Access Memory), where recently accessed memory is stored for quicker reuse. The assumption is that memory which has recently been used has a high likelihood of being accessed again, a concept called temporal locality. Another assumption made with modern caches is that memory is often accessed close to previous accesses, or linearly. This concept is called spatial locality. The spatial locality assumption is reflected in that the cache often stores a larger chunk of sequential memory, a cache line, than is being accessed by the instruction [Gerber and Bik 2006, p.111][Hennessy et al. 2007].

Albrecht [2009] from Sony has a presentation regarding DOD in which he presents the impact of cache efficient data layout. He also details the difference between different orders of execution and memory de-referencing and how strongly it can impact performance.

Frykholm [2014] from Bitsquid details an EC implementation with focus on DOD, although different from ours in many ways. His solution has a tighter coupling between systems and component types, giving each system the responsibility to handle allocation of its component type. This results in a more tailored allocation scheme for specific data, but gives the developer implementing the systems a larger responsibility.

3 Method

This paper uses Garcia's and Almeida Neris' [2014] work as a starting point in defining how an EC framework can work in the abstract, and we present a practical implementation which is designed using DOD in the programming language C++11 [Stroustrup 2013]. We then implement a simple real-time application and measure its frame time and cache-miss rates to determine its performance, which are later discussed.

3.1 Outline

Building upon the work of Garcia and Almeida Neris [2014], we first detail some technical aspects in section 3.2, defining components and entities and adding concepts such as aspects and systems which will aid in describing the implementation in section 4. After going through our implementation we cover some results in section 5 regarding the performance of the system. At the end of this paper, in section 6, we discuss the results and our implementation. Lastly, section 6.2 covers future work and suggestions on how our framework can be improved.

3.2 Concepts

Before we begin to detail our implementation we define the four primary concepts that are relevant. Note that there are no strict definitions of these terms and that ours may vary from those of other authors.

A component is a collection of data; it does not have any logic directly tied to it, like a standard OO (Object-Oriented) class would.

An entity is a specific collection of components given a common ID. An entity can have at most one instance of each component type tied to it.

The set of component types given to an entity defines its aspect, which can also be viewed as an entity's identity. Two entities with the same component type setup can be viewed as equivalent game objects, as they will have the same aspect.

Systems contain the logic used to transform or communicate data between entities, components and external modules (meaning mainly 3rd party APIs). A system iterates over all entities which match its configured inclusive and exclusive aspects, in effect deciding what component make-up is required from an entity for the system to apply its transformation.

3.3 Test Case

In our test case we implement a simple 2D renderer with balls bouncing on walls, where each bounce can create a spatially placed (left or right speaker) sound. Sound and video are implemented using SDL 2 (Simple DirectMedia Layer)². A system is created for audio and video respectively, a third system is built for motion, a fourth for collision handling and a fifth for triggering the sound for objects that collide. The game uses six components to store the different types of data, one component each for: position, velocity, rendering data (sprite), audio (audio file), collision (radius) and a component for activating the play-sound-on-collision system.

Entities are defined in JSON files, as opposed to the XML used by Garcia and Almeida Neris. Both formats serve a similar purpose and the change is purely one of personal preference. The JSON files contain the data definitions and component makeup of an entity. The files are loaded by the program using Jansson³ and parsed to generate the entities needed for the test. After loading, the entities' velocities and positions are randomly modified to make them behave somewhat differently. We define three types of objects: audio-visual, visual and background objects. The audio-visual object moves, collides, is rendered and plays a sound on collision. The visual object has no audio component, and the background object is invisible, inaudible and runs only in the background. The variation allows for better focus on our implementation and less on the performance and scaling of SDL's features.

4 Implementation

Components cannot have member functions or complex constructors. They have to be trivially copyable, preferably so-called POD (Plain Old Data) objects. The reason for this is to simplify copying and storage of these structures in memory. In our implementation we verify this at compile time using static asserts, which simplifies development and removes some common mistakes.

Entities consist of one unsigned integer. As a result it is easy to sort entities and execute them in ascending order. The assumption is that entities which are placed in ascending order have components which are placed in ascending order. This is not guaranteed, however, and it is up to the component allocation routines to try to optimize creation to achieve an optimal memory structure. As entities are released and reclaimed, id reuse is inevitable. This can cause mix-ups when a routine keeps a reference to a reclaimed entity id which no longer refers to the entity it was interested in. This can be solved using a second integer as a globally unique id which grows for each entity allocation, resulting in collision-free ids. Frykholm [2014] solves this issue by reserving part of the id's memory for a "generational id" which is locally incremented for each allocation.

Component instances and entities are bound to each other in a row-major matrix where each row is a list, for a specific component type, of id numbers referring to index positions in the component data arrays. Each column in the matrix matches an entity id. An aspect is a bit field describing an entity's component setup. It is used as an indicator to quickly determine if an entity has the relevant components for a specific use.

Garcia and Almeida Neris propose that logic affecting components' data can either be tied to a component or be constructed as a stand-alone system [Garcia and Almeida Neris 2014]. Our framework follows the latter concept. This further separates logic from data, making it easier to ensure temporal and spatial locality [Gerber and Bik 2006][Noel 2009]. It also allows us to decouple logic from components.

The entity handler is a central part of the implementation. It is responsible for handling the allocation and definition of entities and components. All entities and components are created through the utilities of the entity handler.
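The pieces described in section 4 can be sketched in a few lines of C++11. This is a minimal illustration under stated assumptions, not the framework's actual code: every name in it (`Position`, `Velocity`, `World`, `create_entity`, `matches`, `transform_system`, `make_id`) is hypothetical, the entity-to-component matrix is reduced to two rows, and the 24/8 bit split used for the generational id is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical components: plain data, no member functions or constructors.
struct Position { float x, y; };
struct Velocity { float dx, dy; };

// Compile-time verification that components can be copied as raw memory.
static_assert(std::is_trivially_copyable<Position>::value, "Position must be trivially copyable");
static_assert(std::is_trivially_copyable<Velocity>::value, "Velocity must be trivially copyable");

// An aspect is a bit field with one bit per component type.
typedef std::uint32_t Aspect;
const Aspect POSITION_BIT = 1u << 0;
const Aspect VELOCITY_BIT = 1u << 1;

// Marker for "this entity has no component of this type".
const std::uint32_t kNone = 0xFFFFFFFFu;

// Generational id in the style attributed to Frykholm: the low bits hold the
// reusable index, the high bits a generation counter. The 24/8 split is assumed.
const std::uint32_t kIndexBits = 24;
inline std::uint32_t make_id(std::uint32_t index, std::uint32_t generation) {
    return (generation << kIndexBits) | index;
}
inline std::uint32_t index_of(std::uint32_t id) { return id & ((1u << kIndexBits) - 1u); }
inline std::uint32_t generation_of(std::uint32_t id) { return id >> kIndexBits; }

struct World {
    std::vector<Aspect> aspects;             // one aspect per entity (matrix column)
    std::vector<std::uint32_t> positionRow;  // matrix row: entity -> index in positions
    std::vector<std::uint32_t> velocityRow;  // matrix row: entity -> index in velocities
    std::vector<Position> positions;         // densely packed component data
    std::vector<Velocity> velocities;
};

// Creates an entity; a null pointer means the entity lacks that component.
std::uint32_t create_entity(World& w, const Position* p, const Velocity* v) {
    const std::uint32_t id = static_cast<std::uint32_t>(w.aspects.size());
    Aspect aspect = 0;
    std::uint32_t pIdx = kNone;
    std::uint32_t vIdx = kNone;
    if (p) {
        pIdx = static_cast<std::uint32_t>(w.positions.size());
        w.positions.push_back(*p);
        aspect |= POSITION_BIT;
    }
    if (v) {
        vIdx = static_cast<std::uint32_t>(w.velocities.size());
        w.velocities.push_back(*v);
        aspect |= VELOCITY_BIT;
    }
    w.aspects.push_back(aspect);
    w.positionRow.push_back(pIdx);
    w.velocityRow.push_back(vIdx);
    return id;
}

// True if the entity has every inclusive bit and none of the exclusive bits.
inline bool matches(Aspect entity, Aspect inclusive, Aspect exclusive) {
    return (entity & inclusive) == inclusive && (entity & exclusive) == 0;
}

// A system: visits entities in ascending id order and moves those that have
// both a position and a velocity. Returns the number of entities processed.
std::size_t transform_system(World& w, float dt) {
    const Aspect inclusive = POSITION_BIT | VELOCITY_BIT;
    const Aspect exclusive = 0;
    std::size_t processed = 0;
    for (std::size_t e = 0; e < w.aspects.size(); ++e) {
        if (!matches(w.aspects[e], inclusive, exclusive)) continue;
        assert(w.positionRow[e] != kNone && w.velocityRow[e] != kNone);
        Position& pos = w.positions[w.positionRow[e]];
        const Velocity& vel = w.velocities[w.velocityRow[e]];
        pos.x += vel.dx * dt;
        pos.y += vel.dy * dt;
        ++processed;
    }
    return processed;
}
```

In this sketch, components created in ascending entity order also end up in ascending order in the data arrays, so the system's loop reads `positions` and `velocities` close to linearly. As noted above, that ordering is an assumption that holds only as long as the allocation pattern cooperates.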

² https://www.libsdl.org/
³ http://www.digip.org/jansson/

5 Result

The program⁴ was built and run on an Asus (UX32L) laptop with an i7-4510U CPU clocked at 2.0 GHz, 8 GB of RAM and a GeForce 840M graphics card. The operating system used was Ubuntu 14.04 LTS. The project was compiled using clang 3.5 with the -O3 optimization flag. Note that the project is built to be OS agnostic; it runs on Linux and Mac OS X and can be built on Windows 7 by extending the CMake⁵ files.

The measurements were performed over 1000 frames with 2^1, 2^2 to 2^20 objects, where 1% were visual objects and 0.1% were audio-visual objects. This limitation is there to simulate programs where the majority of active objects are culled from the player's view and are instead active off screen, moving and colliding with the world walls. It also makes the measurements focus more on the systems' performance than on external third party libraries.

5.1 System Performance

Running the program and individually measuring the accumulated average time of each system, we see in figure 1 a near-linear trend relative to the number of objects on screen, matching expectations.

Figure 1: Individual system performance. Note that the AudioSystem and SetAudibleToPlayOnCollisionSystem were measured at close to 0 ms in all cases. [Plot "Systems Performance": frame duration (ms) against object count (2^1 to 2^20) for TransformSystem, AudioSystem, RenderSystem, WorldWallCollisionSystem and SetAudibleToPlayOnCollisionSystem.]

5.2 Cache efficiency

To test the cache performance we ran the program with the same object counts and frame limitations as in the system performance tests. To simulate cache performance, Valgrind⁶ (3.10.0) was used with the tool cachegrind. Valgrind simulated a top level cache of 4 MiB, an L1 instruction cache of 32 KiB and an L1 data cache of 32 KiB. This means that a large majority of the program data will fit in the simulated L2 cache. Therefore we will focus on the smaller and faster L1 cache.

The graph in figure 2 presents the cache miss measurement results from running the program. The curve follows a logarithmic trend, with cache miss rates seemingly converging on close to 11.5% to 12%. To put the numbers in perspective, consider that the cache line in our simulation is 64 bytes and a word/register/size_t/pointer is 8 bytes (64 bits). That means that a program performing purely simple reads of 8 bytes per memory fetch instruction, with linearly increasing addresses, is expected to land on an 8/64 = 12.5% cache-miss rate. Note that multiple reads or writes to already read locations will decrease the miss rate, as a memory reference is not considered a miss if it is cached. This means our results are somewhat better than an optimal linear read.

Figure 2: L1 cache data miss rate, read and write. [Plot "Data L1 Cache Miss Rate": miss rate (%) against object count (2^1 to 2^20).]

6 Conclusion

The accuracy of the results is not perfect, as third party libraries and system calls influence the performance of the program. This means that it is difficult to assign results to specific aspects of the program. The test case is realistic however, which simplifies performance comparisons between solutions. The cache miss rates appear promising, specifically the miss-rate trend, which appears to be logarithmic, meaning that the frame-length scaling is close to O(n) for large n. The measured miss rate reflects expectations, although it is hard to conclude whether it does so for the expected reasons without a deeper study of cachegrind's simulation and our generated assembly, which is outside the scope of this paper. Assuming that the reads are 8 bytes and that our program primarily performs a single linear read from memory, the results are close to perfect, but a program normally performs more than one read or write per address, meaning there might be room for further improvements.

One additional hardware feature that we expect our solution might benefit from is hardware prefetching, which is not reflected in cachegrind's results and will therefore not be discussed in any further detail.

The program is not large enough to warrant an analysis of the L1 instruction cache utilization, as the entire game loop can fit inside the L1 cache. But, assuming a large enough program, the expectation is that the cache utilization would be relatively efficient with moderate to small systems containing small loops transforming large amounts of data, as the instruction cache is not overwritten until the instructions are no longer relevant for the current frame, something that can be an issue in large complex object hierarchies.

6.1 Design Flaws

There is no guarantee that the components' data will be accessed in a cache optimal pattern, and sometimes, depending on how long the program runs and the re-allocation patterns for component data, there might not even be a guarantee that the components will be accessed in ascending order. The assumption has been that the user of the system allocates long-lived entities first and short-lived components later, so that the short-lived memory is reused near the end of the pool allocator's memory, limiting the issues to a minority of the memory region. This has been shown to be sufficient in projects where this model has been used, mainly the game Kravall⁷, a student-developed game that simulated large amounts of agents in a crowd.

The current component memory model is not sufficient for efficient handling of inter-dependent components, e.g. scene graphs. Bitsquid [Frykholm 2014] solved this by moving the component data allocation responsibilities to specific systems. This allows the programmer to adjust the memory allocation schemes but makes development and implementation more complex. It also has the added effect of locking down components into systems. If one system intends to manipulate two different components in one entity, it would have to access some of the data through another system.

6.2 Future Work

The presented solution lacks utilization of common SMP (Symmetric Multi Processor) systems, as the execution is limited to one thread. A multi-threaded variant of the solution would need to take into account cache-related issues such as compulsory misses, true sharing, false sharing and access conflicts, and other common threading problems like synchronization [Pesterev et al. 2010]. Both parallelization between systems (using aspects to determine data-access overlaps) and within systems (splitting execution over the list of internal entities) have been considered, but neither solution has been tested in practice. No insight can be given as to which method would perform best, and in reality a hybrid approach may prove to be the most flexible, albeit complex.

A possible extension of the EC solution is to add a scripting layer to the framework, allowing for entity construction in a language that is interpreted. Such methods can open up the ability to inject scripts during run-time and allow increased flexibility during development. Suggested future work is to implement such a solution.

More in-depth studies are necessary with a larger code base containing more and bigger systems reaching the L1 instruction cache limits. Measuring the cache usage of programs such as these, we could study how instruction cache efficiency is affected by a growing code base.

Finally, studies focusing on other engine frameworks are necessary to create a clear contrast between performance and cache efficiency in different designs. Out of curiosity, implementing a functionally equal version of this project using only pure Object-Oriented sensibilities could give better insight into the actual performance impact of EC frameworks. Further, studies comparing the actual practicality of developing in the different design philosophies, and how they might affect programmer productivity, could be interesting.

Acknowledgements

Thanks to Alexander Vestman; large parts of the design decisions were made thanks to his input and ideas.

⁴ https://gitlab.bthstudent.se/autious/mg entity component framework
⁵ http://www.cmake.org/
⁶ http://valgrind.org/
⁷ http://kravall.autious.net/

References

ABADI, M., AND CARDELLI, L. 1996. A theory of objects. Monographs in computer science, 1431-6900. Springer, New York.

ALATALO, T. 2011. An entity-component model for extensible virtual worlds. IEEE Internet Computing 15, 5 (Sept.), 30–37.

ALBRECHT, T., 2009. Pitfalls of object oriented programming. GCAP 09.

ARENT, A., 2012. Artemis entity system framework.

BILAS, S., 2002. A data-driven game object system - game objects slides.pdf.

FRYKHOLM, N., 2014. bitsquid: development blog: Building a data-oriented entity system (part 1).

GARCIA, F., AND ALMEIDA NERIS, V. 2014. A data-driven entity-component approach to develop universally accessible games. In Universal Access in Human-Computer Interaction. Universal Access to Information and Knowledge. 8th International Conference, UAHCI 2014, 22-27 June 2014, Springer International Publishing, vol. pt. II of Universal Access in Human-Computer Interaction. Universal Access to Information and Knowledge. 8th International Conference, UAHCI 2014, Held as Part of HCI International 2014. Proceedings: LNCS 8514, 537–48.

GERBER, R., AND BIK, A. J. 2006. The software optimization cookbook: high-performance recipes for IA-32 platforms. Intel Press, Hillsboro, Or.

GREGORY, J. 2009. Game engine architecture. A K Peters, Wellesley, Mass.

HENNESSY, J. L., AND PATTERSON, D. A. 2006. Morgan Kaufmann Series in Computer Architecture and Design: Computer Architecture: A Quantitative Approach (4th Edition). Morgan Kaufmann, Burlington, MA, USA.

HENNESSY, J. L., PATTERSON, D. A., AND ARPACI-DUSSEAU, A. C. 2007. Computer architecture [Electronic resource]: a quantitative approach. Morgan Kaufmann, Amsterdam.

HIRZEL, M. 2007. Data layouts for object-oriented programs. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, ACM, New York, NY, USA, SIGMETRICS '07, 265–276.

MARTIN, A., 2007. Entity systems are the future of MMOG development part 1 | t-machine.org.

NOEL, 2009. Data-oriented design (or why you might be shooting yourself in the foot with OOP) | games from within.

NYSTROM, R. 2014. Game Programming Patterns, 1 edition ed. Genever Benning, S.l., Nov.

PESTEREV, A., ZELDOVICH, N., AND MORRIS, R. T. 2010. Locating cache performance bottlenecks using data profiling. In Proceedings of the 5th European Conference on Computer Systems, ACM, New York, NY, USA, EuroSys '10, 335–348.

STROUSTRUP, B. 2013. The C++ Programming Language, Fourth Edition, fourth ed. Addison-Wesley Professional, May.

SUTTER, H., 2004. C++ coding standards.

WEST, M., 2007. Cowboy programming evolve your hierarchy.

WILSON, K., 2008. An anatomy of despair: Aggregation over inheritance at GameArchitect.