
Master of Science in Computer Science May 2018

The performance impact from processing clipped triangles in state-of-the-art games.

Christoffer Karlsson

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 10 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information:
Author: Christoffer Karlsson
E-mail: [email protected]

University advisors:
Associate Professor Hans Tap, Department of Creative Technologies

Lecturer Stefan Petersson, Department of Creative Technologies

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Background. Modern game applications push hardware to its limits and affect how graphics hardware and APIs are designed. In games, rendering geometry plays a vital role, and optimization techniques such as view frustum culling are generally necessary to meet the quality expected by customers. Failing to optimize a game application can lead to higher system requirements or lower quality in terms of visual effects and content. Many optimization techniques exist, as do studies of their performance. However, no research was found that analyzed how state-of-the-art games utilize the computational resources of the GPU.

Objectives. The aim of this thesis was to investigate the potential problem of commercial game applications wasting computational resources. Specifically, the focus was set on the triangle data processed in the geometry stage of the graphics pipeline, and the number of triangles discarded through clipping.

Methods. The objectives were met by conducting a case study and an empirical data analysis, in eight games, of the number of triangles and entire draw calls that were discarded through clipping, as well as the vertex data size and the time spent on processing these triangles. The data was collected using Triangelplockaren, a tool which collects the triangle data that reaches the rasterizer stage. This data was then analyzed, and the relationships found between the measured quantities were discussed.

Results. The results consisted of 30 captures of benchmark and gameplay sessions. The average of each captured session was used to make observations and to draw conclusions.

Conclusions. This study showed evidence of noteworthy amounts of data being processed in the GPU only to be discarded through clipping later in the graphics pipeline. This was seen in all of the game applications included in this study. While it was not possible to draw conclusions regarding the direct impact on performance, the share of geometry processing spent on clipped triangles was significant in each of the analyzed cases, and in many cases extreme.

Keywords: game, optimization, geometry, graphics pipeline

Acknowledgments

Many thanks to Stefan Petersson, without whom this project would not have been possible, as Triangelplockaren is the only tool found that could measure the required data.

Contents

Abstract

Acknowledgments

1 Introduction
1.1 The graphics pipeline
1.2 Performance and optimization
1.3 Aim
1.4 Research questions

2 Related Work
2.1 Culling techniques

3 Method
3.1 Game application selection
3.2 Data collection tool – Triangelplockaren
3.3 Experiment setup
3.4 Data collection
3.5 Restrictions

4 Results

5 Analysis and Discussion
5.1 Battlefield 1
5.2 Deus Ex: Mankind Divided
5.3 FIFA 18
5.4 Hitman
5.5 Rise of the Tomb Raider
5.6 Sniper Elite 4
5.7 Tom Clancy’s The Division
5.8 Total War: Warhammer II

6 Conclusions

7 Future work

References

List of Figures

1.1 An overview of the stages geometry goes through when rendering graphics in Direct3D. The full pipeline represents the process for presenting a geometrical object on the screen, where all stages except the vertex shader stage are optional. The geometry stage, marked with a blue clamp, incorporates the processing and transformation of geometrical information in the GPU. In the remaining stages, the geometry is converted to fragments through scan conversion, and colored in one or more pixel shaders. This is also where post-processing is applied.
1.2 Illustration of a camera view frustum. The frustum is essentially the mathematical combination of a view and a projection matrix, which defines the orientation and bounds of the frustum.

3.1 Screen capture from Battlefield 1 gameplay.
3.2 Screen capture from Battlefield 1 gameplay, representing the perspective projected triangles rendered to the screen, from the player camera's perspective, and captured by Triangelplockaren in the current frame. This image only shows triangles that have been projected using the player camera transform matrices, and does not show, for example, triangles rendered in a draw call for effects such as shadow mapping.
3.3 Screen capture from Battlefield 1 gameplay, representing geometry in the same way as in Figure 3.2, but seen from a different angle through Triangelplockaren. Red geometry is outside the player camera viewing frustum, green geometry is inside the frustum, and triangles where only edges are visible use alpha blending (mostly particles). Again, perspective projected triangles from other viewing frustums are left out in this image.

4.1 Graphs representing the data collected in the Battlefield 1 gameplay capture.
4.2 Graphs representing the data collected in the Deus Ex: Mankind Divided gameplay capture.
4.3 Graphs representing the data collected in the FIFA 18 gameplay capture.
4.4 Graphs representing the data collected in the Hitman gameplay capture.
4.5 Graphs representing the data collected in the Rise of the Tomb Raider gameplay capture.
4.6 Graphs representing the data collected in the Sniper Elite 4 gameplay capture.
4.7 Graphs representing the data collected in the Tom Clancy’s The Division gameplay capture.
4.8 Graphs representing the data collected in the Total War: Warhammer II gameplay capture.

List of Tables

4.1 A table showing the average per frame data for each capture. The values displayed for each capture have been divided into four categories: Time, presenting the percentage of time spent on processing triangles outside the viewing frustum in relation to the time to process all triangles. Triangles, presenting the average number of triangles rendered per frame, and the percentage of triangles residing outside the viewing frustum. Draw calls, presenting the average number of draw calls and the percentage of whole draw calls outside the viewing frustum. Vertex data, presenting the average vertex data size in bytes, and the percentage residing outside the viewing frustum.

Glossary

Draw call: A call to the graphics API's draw operation. A draw call submits work to the rendering pipeline and executes it. The amount of work (geometry to be rendered) varies between draw calls.

State-of-the-art game: In this context, a state-of-the-art game is defined as a game supporting Direct3D 12, whose publishing organization is well established, with a large number of published titles.

Vertex: A point in space, consisting of information such as positional coordinates; typically a point where two or more lines meet. A triangle, for example, has three vertices.

Chapter 1 Introduction

As computer hardware advances, video game developers continue to push the hardware to its limits by producing more resource-demanding applications. In game applications, rendering objects generally plays a key role, and hardware developers are tailoring GPUs to suit the needs of modern games. This involves adding new features, such as the programmable geometry engine in the Vega architecture [8], as well as improving the performance of certain tasks [1][11].

1.1 The graphics pipeline

Game objects rendered to a screen go through a series of stages called the graphics pipeline. This pipeline differs between graphics APIs, but essentially includes the same steps and operations. The graphics pipeline structure for Direct3D can be seen in Figure 1.1. Older depictions of the graphics pipeline also include a stage prior to this, called the application stage, which represents the CPU side of a game. This is where game logic, sound and input are handled, as well as the configuration of how game object geometry should be processed by the GPU. This configuration specifies which hardware functionality to utilize and how certain tasks should be performed. The configuration also specifies the format and layout of the geometric data. After the application stage, the geometric data, which can consist of points, lines, triangles or any other primitive type the developer has specified, is read by the GPU. The remaining stages of the graphics pipeline take place in the GPU [9], starting with the geometry stage. As presented in Figure 1.1, the geometry stage includes several stages, which will from now on be referred to as functional stages. These functional stages vary in complexity. The shader stages, for example, are programmable and used to transform the input geometry, so which operations the shaders perform, and which types of shaders are used, depend on the implementation. One geometry stage might use a single vertex shader, while another includes a vertex, hull, domain and geometry shader, where the complexity of each shader varies depending on the implementation. A shader can, for example, perform the same number of operations on each vertex (see the glossary), or it can perform more or fewer operations depending on certain heuristics. The other functional stages perform fixed operations, whose behaviour can be controlled through a pipeline state, a kind of configuration.
The shader implementation and state configuration for how geometry is processed are highly relevant, as the number of operations, together with the size of the geometric data, can have a severe impact on the use of computational resources [1][2][3][5][6][15].

Figure 1.1: An overview of the stages geometry goes through when rendering graphics in Direct3D. The full pipeline represents the process for presenting a geometrical object on the screen, where all stages except the vertex shader stage are optional. The geometry stage, marked with a blue clamp, incorporates the processing and transformation of geometrical information in the GPU. In the remaining stages, the geometry is converted to fragments through scan conversion, and colored in one or more pixel shaders. This is also where post-processing is applied.

Figure 1.2: Illustration of a camera view frustum. The frustum is essentially the mathematical combination of a view and a projection matrix, which defines the orientation and bounds of the frustum.

Primitives processed in the geometry stage eventually reach the rasterizer stage. This is where primitives residing outside the camera view frustum are clipped (discarded), and those inside are kept for rasterization. Primitives intersecting the frustum are clipped against it, replacing vertices outside the frustum with new ones at the point(s) of intersection. A view frustum is essentially a mathematical expression which defines the bounds of what is seen by a camera. This expression is composed of a view and a projection matrix, used to transform the vertices of a primitive from world space to clip space. In this space, the vertex coordinates are compared to the frustum bounds in order to determine whether each vertex is inside or outside. A frustum can be pictured as a box, which takes the shape of a pyramid with its top cut off when perspective projection is applied (Figure 1.2). Anything inside the frustum is potentially visible, while anything outside is out of sight and can therefore be trivially discarded [6][15]. There are many types of view frustums, such as the player camera view frustum or a shadow map view frustum. The term view frustum will henceforth refer to any frustum used for perspective projection.
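The clip-space comparison described above can be illustrated with a short Python sketch. It assumes the Direct3D clip-volume convention, in which a clip-space vertex (x, y, z, w) is inside when -w ≤ x ≤ w, -w ≤ y ≤ w and 0 ≤ z ≤ w; the names `PLANES`, `vertex_inside` and `classify_triangle` are hypothetical and not part of any API. The sketch only classifies a triangle as inside, outside (all vertices beyond the same plane, so it can be discarded) or intersecting, and does not compute the new vertices that clipping produces.

```python
from typing import Sequence, Tuple

# Clip-space vertex (x, y, z, w), i.e. after multiplication by the view and projection matrices.
Vec4 = Tuple[float, float, float, float]

# Direct3D clip volume: -w <= x <= w, -w <= y <= w, 0 <= z <= w.
# Each plane function is >= 0 when the vertex lies on the inner side of that plane.
PLANES = (
    lambda v: v[0] + v[3],  # left:   x >= -w
    lambda v: v[3] - v[0],  # right:  x <= w
    lambda v: v[1] + v[3],  # bottom: y >= -w
    lambda v: v[3] - v[1],  # top:    y <= w
    lambda v: v[2],         # near:   z >= 0
    lambda v: v[3] - v[2],  # far:    z <= w
)


def vertex_inside(v: Vec4) -> bool:
    """True if the clip-space vertex lies inside (or on) all six planes."""
    return all(plane(v) >= 0 for plane in PLANES)


def classify_triangle(tri: Sequence[Vec4]) -> str:
    """Classify a clip-space triangle as 'inside', 'outside' or 'intersecting'."""
    if all(vertex_inside(v) for v in tri):
        return "inside"  # kept as-is for rasterization
    for plane in PLANES:
        # Trivial reject: all three vertices are outside the same plane.
        if all(plane(v) < 0 for v in tri):
            return "outside"
    # Conservative result: clipping would produce new vertices here. A triangle that
    # misses the frustum without being outside any single plane also lands here.
    return "intersecting"
```

For example, a triangle with one vertex at x = 2, w = 1 and the other two inside the volume is reported as intersecting, the case where clipping replaces the outside vertex with new vertices on the frustum boundary.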

1.2 Performance and optimization

As mentioned previously, any operation performed on a primitive consumes computational resources. It is therefore often profitable to prevent data from being processed in the geometry stage only to be discarded through clipping. To achieve this, developers make use of various optimization techniques, referred to as culling techniques. These include view frustum culling, which works in the same way as clipping but is often performed in the application stage of the graphics pipeline or early in the geometry stage [1][5][6][15]. Whole objects that reside outside the view frustum are typically culled through intersection tests against the object's bounding box. Even with frustum culling implemented as an acceleration technique, objects partially outside the frustum are generally treated as residing inside it. This is true for most culling techniques, because clipping each intersecting triangle of a complex geometric object on the CPU is often slower, in terms of computational time, than processing the triangles outside the frustum in the geometry stage [2][3].

Since the large game development organizations affect the design of modern hardware, the utilization of the GPU in state-of-the-art game titles is highly relevant. This is also reflected in the development of the latest graphics APIs, Direct3D 12 [10] and Vulkan [12], which give developers more control over the graphics pipeline in order to optimize the performance of their applications [10]. Careless implementations or ignorance of the importance of optimization may lead to higher application system requirements or a lower level of quality in game content, as a result of wasting computational resources. This could also negatively affect the energy consumption of the GPU [7]. Further discussion regarding the impact on energy consumption is, however, out of the scope of this thesis.
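The bounding-box test mentioned above is commonly implemented by testing the box against each of the six frustum planes. The Python sketch below assumes planes stored as hypothetical (a, b, c, d) tuples with normals pointing into the frustum, so that a point p is on the inner side when a·p.x + b·p.y + c·p.z + d ≥ 0; the `AABB` type and function names are illustrative only. For each plane it checks only the box corner furthest along the plane normal: if even that corner is behind the plane, the whole box is outside. Boxes that merely intersect the frustum are kept, matching the conservative behaviour described above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Plane = Tuple[float, float, float, float]  # (a, b, c, d), normal points into the frustum


@dataclass
class AABB:
    """Hypothetical axis-aligned bounding box of a game object."""
    lo: Tuple[float, float, float]
    hi: Tuple[float, float, float]


def aabb_outside_plane(box: AABB, plane: Plane) -> bool:
    """True if the whole box lies behind (outside) the plane."""
    a, b, c, d = plane
    # "Positive vertex": the box corner furthest along the plane normal.
    px = box.hi[0] if a >= 0 else box.lo[0]
    py = box.hi[1] if b >= 0 else box.lo[1]
    pz = box.hi[2] if c >= 0 else box.lo[2]
    return a * px + b * py + c * pz + d < 0


def frustum_cull(boxes: Iterable[AABB], planes: List[Plane]) -> List[int]:
    """Return indices of boxes that are not completely outside any plane.

    Boxes intersecting the frustum are kept; their triangles outside the
    frustum are left for clipping to discard later in the pipeline.
    """
    return [i for i, box in enumerate(boxes)
            if not any(aabb_outside_plane(box, p) for p in planes)]
```

With the six planes of the cube [-1, 1]³, a box spanning [0.5, 1.5]³ is kept even though part of it lies outside, while a box spanning [2, 3]³ is culled.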
The use of the word game throughout the text will from now on refer to state-of-the-art games, as defined in the glossary.

1.3 Aim

This thesis aims to measure the amount and performance impact of geometry unnecessarily processed in the geometry stage of the graphics pipeline in game applications. The games included in this study are the following:

• Battlefield 1 (2016)

• Deus Ex: Mankind Divided (2016)

• FIFA 18 (2017)

• Hitman (2016)

• Rise of the Tomb Raider (2015)

• Sniper Elite 4 (2017)

• Tom Clancy’s The Division (2016)

• Total War: Warhammer II (2017)

1.4 Research questions

Based on the potential problem of wasted performance in games, and the aim defined for this thesis, the following research questions were defined:

• To what extent are games wasting performance in the geometry stage of the GPU?

• How large is the proportion of processed triangles discarded through clipping, in relation to the total number of triangles?

Chapter 2 Related Work

Much work has been done in the area of optimization through culling. Culling techniques have existed for decades, and algorithms have continuously been improved for better performance [13][16]. Some algorithms render up to four times faster than with no culling [14], depending on the application. The work described in this chapter discusses the need for optimization in 3D applications, which in turn motivates studies such as the one conducted in this thesis.

2.1 Culling techniques

Jahrmann & Wimmer [4] discuss the importance of culling, in their case for rendering large amounts of geometrical grass. They use a set of four tests for discarding the rendering of certain grass straws:

• Orientation of independent straws: As each straw of grass is planar, a test is made to determine whether the straw is seen edge-on, i.e. nearly parallel to the viewing direction. If the angle is above a certain threshold, the straw is culled.

• View frustum test: In this test, three points (base, center and end) are tested against the view frustum, and the blade is discarded if it is determined to be outside the frustum.

• Distance to the camera: In the third test, straws are culled based on the distance to the camera. Patches of straws are drawn, with each patch containing fewer straws at positions further away from the camera.

• Occlusion test: Based on the distance (depth) of other objects in the scene, straws behind other objects are culled, as they would not impact the final result.

Their experiment is conducted in a controlled scene, with performance improvements of 40% using culling.

Su et al. [13] suggest a view frustum culling algorithm for optimized scene management structures, to improve culling performance for rendering subdivisions of complex scenes in real time. Their algorithm improves the sorting of geometry by using adaptive binary trees (ABT) and two steps of view frustum culling. The algorithm divides and splits geometry, through view frustum culling, into branches of the ABT, allowing culling to be performed first against the ABT and then against each individual object. The results showed improved speeds in culling objects on the CPU. In their introduction, they also discuss the need for optimization, as hardware cannot meet the requirements of rendering complex scenes in real time. They further motivate their research by stating that

“Therefore, the most fundamental starting point of this paper is how to reduce the drawing objects effectively, reduce the complexity of the model and to improve the efficiency of the visibility and the real time rendering of the complex scene, on the basis of overcoming the limitation of the traditional scene organization.”
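Su et al.'s ABT construction is not reproduced here. As a generic illustration of the two-step idea described above, culling a tree node first and only then its individual objects, the Python sketch below walks a hypothetical binary tree of axis-aligned bounding boxes (the `Node` type, `box_outside` and `collect_visible` are assumptions for illustration, and planes are assumed to be (a, b, c, d) tuples with inward-pointing normals). When a node's box is completely outside one frustum plane, its whole subtree is skipped with a single test; objects in nodes that pass are then tested individually.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (a, b, c, d) with the normal pointing into the frustum


@dataclass
class Node:
    """Hypothetical tree node; lo/hi bound everything stored in its subtree."""
    lo: Vec3
    hi: Vec3
    objects: List[Tuple[str, Vec3, Vec3]] = field(default_factory=list)  # (name, lo, hi)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def box_outside(lo: Vec3, hi: Vec3, planes: List[Plane]) -> bool:
    """True if the box lies completely behind at least one frustum plane."""
    for a, b, c, d in planes:
        # Corner furthest along the plane normal; if it is behind, the whole box is.
        x = hi[0] if a >= 0 else lo[0]
        y = hi[1] if b >= 0 else lo[1]
        z = hi[2] if c >= 0 else lo[2]
        if a * x + b * y + c * z + d < 0:
            return True
    return False


def collect_visible(node: Optional[Node], planes: List[Plane], out: List[str]) -> None:
    """Append the names of potentially visible objects to `out`."""
    if node is None or box_outside(node.lo, node.hi, planes):
        return  # step 1: the whole subtree is culled by a single node test
    for name, lo, hi in node.objects:
        if not box_outside(lo, hi, planes):  # step 2: test individual objects
            out.append(name)
    collect_visible(node.left, planes, out)
    collect_visible(node.right, planes, out)
```

The saving comes from step 1: a subtree far outside the frustum costs one box test regardless of how many objects it contains.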

As with other studies found [2][3][14][16], the motive for developing these optimization techniques was never backed by sources showing that such needs existed. Additionally, the author found no research related to the amount of wasted performance in digital game applications, which further indicated that little is known about such needs in research. One can argue that estimating the need for optimization, in order to determine whether and which optimization techniques to implement, is key in developing both game applications and the optimization techniques themselves. With this argument in mind, this is where the focus of this thesis resided.

Chapter 3 Method

To answer the research questions, a case study was carried out, and an empirical data analysis was conducted where geometrical data was gathered from the eight selected games. This chapter provides a detailed description of the selection of games, the experiment setup and the procedure for collecting the data.

3.1 Game application selection

The author deemed eight games sufficient to provide enough diversity in genre and content, as well as in organizational standards for development. The selection of games was made with regard to the games supported by the data collection tool, as well as three criteria: Direct3D 12 support, built-in benchmark support and the possibility to play a single player game mode.

As state-of-the-art games imply a high level of development or cutting-edge technology, Direct3D 12 support was chosen as the first criterion. The purpose of the Direct3D 12 API is to give the developer the means to further optimize the utilization of the graphics pipeline, using features such as indirect drawing [10]. The latest graphics API is therefore relevant to include when analyzing unnecessary computations in the GPU. Applications that only support the Vulkan API, which has capabilities similar to Direct3D 12, were excluded, as the tool used for gathering the data did not support it, and no other tool was found with the ability to extract the desired information.

The second criterion was introduced to improve reproducibility. Gathering data from random sequences of a game would not make it possible to reproduce the data gathered in this thesis. Only five of the games fulfilling the first criterion contained built-in benchmark functionality, which resulted in the addition of three games without it (FIFA 18, Sniper Elite 4 and Battlefield 1). These games were included nonetheless, allowing further potential observations to be made. They were selected based on the list of games supported by the data collection tool.

The third criterion was that the game must have some kind of single player game mode. This was to avoid subjecting other players to unfair play, as the tool provided data not accessible to the common user. Some games were nevertheless analyzed in multiplayer mode, but only up to the point where interaction with other players would occur. This also increased the reproducibility of the study, as other players' characters and content would not be a factor affecting the collected data.


3.2 Data collection tool – Triangelplockaren

The data was captured using a tool called Triangelplockaren, developed by Stefan Petersson, a university lecturer at BTH. The tool can capture information from the graphics pipeline in supported Direct3D 12 applications in real time, and present the data in a separate viewer. In this viewer, the user can view the geometry processed in the geometry stage, with the possibility to move around freely in the current frame. This is illustrated in Figures 3.1 to 3.3. In addition to the geometry view, various data is also saved to a log file. This file contains, per frame, the following information:

• The number of perspective projected triangles inside and outside the camera view frustum.

• The number of draw calls (see glossary) inside and outside the camera view frustum. A draw call outside the frustum meant that none of the perspective projected triangles rendered in that draw operation resided inside the frustum. A draw call that had at least one vertex of a perspective projected triangle inside the frustum was counted as being inside.

• The size of the vertex data output (in bytes) inside and outside the camera view frustum.

• The average time (in milliseconds) spent on all draw calls. This time includes the overhead of the operations performed by Triangelplockaren, and can therefore not be used as absolute values.

• The average time (in milliseconds) spent on all draw calls from input assembly to clipping. This time includes overhead as above. It was measured from a separate draw call, meaning that every draw call issued for a frame was made twice: once for the entire pipeline and once for the geometry stage only.

• The average time (in milliseconds) spent on clipped geometry, calculated as

$$T_{\text{clipped}} = T_{\text{geometry}} \cdot \frac{N_{\text{triangles outside}}}{N_{\text{triangles total}}}$$

This was done for each individual draw call and then averaged per frame.
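The bookkeeping behind the draw-call and clipped-time entries in the list above can be sketched as follows. The `DrawCall` record and `frame_summary` function are hypothetical and do not reflect Triangelplockaren's actual data format or code; the sketch only mirrors the stated rules: a draw call counts as outside when none of its triangles has a vertex inside the frustum, and the clipped time of a draw call is its geometry-stage time scaled by its fraction of triangles outside. Averaging the per-draw-call values over the frame is one reading of "averaged per frame".

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class DrawCall:
    """Hypothetical per-draw-call record; field names are illustrative only."""
    triangles_inside: int   # triangles with at least one vertex inside the frustum
    triangles_outside: int  # triangles entirely outside the frustum (discarded by clipping)
    geometry_ms: float      # time from input assembly to clipping for this draw call

    @property
    def triangles_total(self) -> int:
        return self.triangles_inside + self.triangles_outside

    def clipped_ms(self) -> float:
        """Geometry time scaled by the fraction of triangles outside the frustum."""
        if self.triangles_total == 0:
            return 0.0
        return self.geometry_ms * self.triangles_outside / self.triangles_total


def frame_summary(draws: List[DrawCall]) -> Dict[str, float]:
    """Per-frame figures for a non-empty list of draw calls."""
    # A draw call is 'outside' only if none of its triangles reaches the frustum.
    outside_calls = sum(1 for d in draws if d.triangles_inside == 0)
    return {
        "draw_calls_outside_pct": 100.0 * outside_calls / len(draws),
        "avg_geometry_ms": mean(d.geometry_ms for d in draws),
        "avg_clipped_ms": mean(d.clipped_ms() for d in draws),
    }
```

For three draw calls with (inside, outside, geometry time) of (900, 100, 1.0 ms), (0, 500, 0.4 ms) and (250, 250, 0.6 ms), one of three draw calls is outside and the clipped times are 0.1, 0.4 and 0.3 ms, so 0.8 of the 2.0 ms of geometry time (40%) is attributed to clipped triangles.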

Figure 3.1: Screen capture from Battlefield 1 gameplay.

Figure 3.2: Screen capture from Battlefield 1 gameplay, representing the perspective projected triangles rendered to the screen, from the player camera's perspective, and captured by Triangelplockaren in the current frame. This image only shows triangles that have been projected using the player camera transform matrices, and does not show, for example, triangles rendered in a draw call for effects such as shadow mapping.

Figure 3.3: Screen capture from Battlefield 1 gameplay, representing geometry in the same way as in Figure 3.2, but seen from a different angle through Triangelplockaren. Red geometry is outside the player camera viewing frustum, green geometry is inside the frustum, and triangles where only edges are visible use alpha blending (mostly particles). Again, perspective projected triangles from other viewing frustums are left out in this image.

3.3 Experiment setup

The data was captured on an ASUS G11CD running Windows 10, with an NVIDIA GeForce GTX 1080 Ti graphics card. While capturing data, the only user processes running were Triangelplockaren, Origin (Electronic Arts) and Steam (Valve). Each game was analyzed in one sitting, without closing the application between captures. Capturing was manually started and stopped by the author with a key press. Considering the large number of frames captured for each game and session, the impact of manually controlling the captures would be minor, if any. In order not to exclude any geometry rendered only with certain graphics settings, the graphics settings of each game were set to maximum. Shadow mapping, for example, utilizes a conceptual camera to render geometry from the perspective of the light; turning this feature off could result in less geometry being rendered.

3.4 Data collection

As the geometry rendered within each frame of a game varies, multiple series of rendered frames were captured. To strengthen the validity of the results, three iterations of the built-in benchmark were captured. Additionally, five minutes of gameplay in the first mission or tutorial was captured for each game, to explore whether gameplay would show data similar to the benchmarks. This was further motivated by the fact that the introductory tasks in each game were linear, as the player is expected to learn the core mechanics of the game at this point. This was performed for all games with built-in benchmarks. Capturing five minutes of gameplay was motivated by the number of frames produced: even at a very low frame rate, this would produce well over 1000 frames, similar to the average number of frames captured in the benchmarks.

Since FIFA 18 lacked the ability to run benchmark sessions, three matches between two computer-controlled teams were observed and captured. This was deemed sufficient, as each match has almost identical gameplay while not playing a cutscene. A cutscene, or close-up of a player character, occurred when a special event was triggered, such as a goal or corner.

As the first mission in Sniper Elite 4 had a linear start, up to the point where the player was given the chance to branch out, three sessions from the first mission were captured, in an attempt to make the study as reproducible as possible.

For Battlefield 1, four captures were taken from random player spawns in the solo campaign Storm of Steel, where each instance was played until death. As the complexity and geometry rendered in each session fluctuated, the data gathered from Battlefield 1 was considered relevant for observations, but almost impossible to reproduce, and was therefore not intended to serve as a basis for any conclusions. The sporadic gameplay of a first-person, war-themed game also motivated the addition of a fourth gameplay session, compared to FIFA 18 and Sniper Elite 4.
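The frame-count argument for the five-minute gameplay captures can be made concrete. Assuming a capture duration of t = 300 s and an average frame rate f (frames per second), the number of captured frames N is:

```latex
N = t \cdot f = 300\,\text{s} \cdot f,
\qquad
N > 1000 \iff f > \frac{1000}{300\,\text{s}} \approx 3.3\ \text{frames/s}
```

Any average frame rate above roughly 3.3 frames per second therefore yields more than 1000 frames; at a hypothetical 30 frames per second, the capture would contain 9000 frames.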

Seeing as each game is different, and because no other research was found that covered this topic, the number of sessions to capture and the number of frames per capture were deemed reasonable by the author.

3.5 Restrictions

As the GPU performs various operations in parallel, and because some overhead is added when Triangelplockaren is used, the timings measured could not be used as exact time in milliseconds. The timings measured for the geometry stage are however relative to each other, and were therefore normalized.

The tool, Triangelplockaren, only collected geometric information for perspective projected triangles. Non-perspective projected triangles, often used for user interface elements, were ignored. The tool also excluded other types of geometry, such as points and lines.

Chapter 4 Results

The data collected resulted in a total of 30 captured sessions. For each capture, four graphs were generated, representing the average data gathered per frame (Figures 4.1 to 4.8). Graph a represents the normalized average time used for all draw calls, per frame, broken down into the total time, the time used for geometry only, and the time used for processing clipped triangles. The normalization was calculated as

$$\hat{x} = \frac{x}{\max_{y \in A} y}, \quad \forall x \in A,$$

where A is the set of all timings measured in the capture and x is the time in milliseconds for a frame. Graph b shows the total number of perspective projected triangles rendered, inside and outside the frustum. Graph c shows the number of complete draw calls at least partially inside and completely outside the frustum. Graph d shows the size of the vertex data, in bytes, for all draw calls, inside and outside the frustum.
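The normalization of graph a can be sketched in Python as below. It assumes, as the definition of A suggests, that a single maximum is taken over all timing series of a capture (total, geometry and clipped), so the three curves keep their relative proportions; the function name `normalize_capture` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def normalize_capture(timings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide every timing of a capture by the largest timing in that capture (the set A)."""
    # One shared maximum over all series keeps the series comparable with each other.
    peak = max(max(series) for series in timings.values())
    return {name: [t / peak for t in series] for name, series in timings.items()}
```

Because the maximum is shared, a frame whose clipped time is 40% of its total time is still plotted at 40% of the total-time curve after normalization; normalizing each series by its own maximum would lose that relationship.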

The data of each capture was also added to a table (Table 4.1) which shows the average portion of time spent on processing geometry outside the viewing frustum in relation to all geometry, the total number of triangles, total number of draw calls and total vertex data size of the drawn geometry, as well as the percentage of the data in each column that resides outside the viewing frustum.


| Capture (average per frame) | Time out | Triangles total | Triangles out | Draw calls total | Draw calls out | Vertex data total (bytes) | Vertex data out |
|---|---|---|---|---|---|---|---|
| Battlefield: Infantry1 | 38.9% | 2310616 | 35.1% | 699 | 11.9% | 41671144 | 35.0% |
| Battlefield: Infantry2 | 45.4% | 2502782 | 33.6% | 955 | 10.6% | 45837456 | 33.5% |
| Battlefield: Infantry3 | 44.2% | 2300053 | 35.8% | 666 | 6.7% | 40318564 | 35.7% |
| Battlefield: Tank | 36.8% | 2541477 | 35.1% | 748 | 7.6% | 45682460 | 35.0% |
| Deus Ex: Benchmark1 | 55.5% | 5469903 | 31.7% | 2652 | 12.1% | 115654768 | 31.8% |
| Deus Ex: Benchmark2 | 55.7% | 5461739 | 31.8% | 2651 | 12.1% | 115527128 | 31.8% |
| Deus Ex: Benchmark3 | 56.2% | 5467952 | 32.0% | 2654 | 12.2% | 115649248 | 32.1% |
| Deus Ex: FirstMission | 62.8% | 1154673 | 41.4% | 689 | 9.7% | 31845900 | 44.2% |
| FIFA: Match1 | 30.3% | 1574744 | 23.9% | 391 | 11.4% | 29198856 | 25.1% |
| FIFA: Match2 | 25.4% | 1477316 | 20.4% | 374 | 6.8% | 27140632 | 23.0% |
| FIFA: Match3 | 32.6% | 1097957 | 26.0% | 341 | 9.1% | 22539572 | 28.0% |
| Hitman: Benchmark1 | 18.6% | 2508941 | 21.6% | 1778 | 8.3% | 72985584 | 21.1% |
| Hitman: Benchmark2 | 18.6% | 2476227 | 21.6% | 1760 | 8.2% | 72087280 | 21.1% |
| Hitman: Benchmark3 | 18.3% | 2495611 | 21.9% | 1774 | 8.3% | 72499264 | 21.4% |
| Hitman: FirstMission | 40.0% | 1549548 | 41.0% | 810 | 18.3% | 47079612 | 38.6% |
| Sniper Elite: FirstMission1 | 32.2% | 1393342 | 15.5% | 594 | 30.3% | 35140796 | 16.1% |
| Sniper Elite: FirstMission2 | 32.2% | 1393342 | 15.5% | 594 | 30.3% | 35140796 | 16.1% |
| Sniper Elite: FirstMission3 | 28.9% | 1438188 | 13.3% | 637 | 26.6% | 36540452 | 14.0% |
| The Division: Benchmark1 | 16.7% | 3930493 | 25.4% | 3538 | 9.0% | 52528036 | 23.2% |
| The Division: Benchmark2 | 17.2% | 3846238 | 25.8% | 3513 | 9.1% | 51340144 | 23.6% |
| The Division: Benchmark3 | 16.5% | 4289373 | 23.9% | 3640 | 9.2% | 58372964 | 21.7% |
| The Division: FirstMission | 22.3% | 2018040 | 32.4% | 1635 | 10.5% | 26760704 | 31.0% |
| Tomb Raider: Benchmark1 | 33.4% | 5868812 | 41.3% | 1104 | 12.7% | 74954344 | 41.6% |
| Tomb Raider: Benchmark2 | 32.9% | 5747545 | 42.1% | 1081 | 13.0% | 73254896 | 42.3% |
| Tomb Raider: Benchmark3 | 33.3% | 5757778 | 42.1% | 1081 | 12.8% | 73429272 | 42.3% |
| Tomb Raider: FirstMission | 37.5% | 4705525 | 44.8% | 570 | 15.2% | 63551108 | 44.6% |
| Warhammer: Benchmark1 | 29.1% | 5603252 | 35.9% | 759 | 5.9% | 133707864 | 24.6% |
| Warhammer: Benchmark2 | 28.3% | 5603075 | 35.8% | 761 | 5.9% | 133682432 | 24.6% |
| Warhammer: Benchmark3 | 30.2% | 5599311 | 35.9% | 759 | 5.9% | 133566256 | 24.6% |
| Warhammer: FirstMission | 26.0% | 4289489 | 40.8% | 682 | 3.5% | 98529232 | 26.3% |

Table 4.1: Average per-frame data for each capture, divided into four categories. Time: the percentage of geometry-processing time spent on triangles outside the view frustum, relative to the time spent processing all triangles. Triangles: the average number of triangles rendered per frame, and the percentage of them residing outside the view frustum. Draw calls: the average number of draw calls per frame, and the percentage of complete draw calls outside the view frustum. Vertex data: the average vertex data size in bytes, and the percentage of it residing outside the view frustum.
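The "out" columns above count triangles that clipping discards because they lie outside the view frustum. The exact criterion used by Triangelplockaren is not restated in this chapter, so the following is only a sketch of one standard way to make such a classification, assuming Direct3D clip-space conventions (-w <= x <= w, -w <= y <= w, 0 <= z <= w). Each vertex gets a bit mask (an outcode) of the clip planes it violates; if the three masks of a triangle share a bit, the whole triangle is outside that plane and can be rejected without clipping. The test is conservative: a triangle passing near a frustum corner can be entirely outside without all three vertices failing the same plane. All names in the code are illustrative.

```python
from typing import Sequence

# A clip-space vertex position (x, y, z, w), i.e. after the vertex/geometry
# stages and before the perspective divide.
Vec4 = Sequence[float]

# One bit per clip plane (illustrative naming).
LEFT, RIGHT, BOTTOM, TOP, NEAR, FAR = 1, 2, 4, 8, 16, 32


def outcode(v: Vec4) -> int:
    """Return a bit mask of the clip planes that vertex v lies outside of.

    Assumes the Direct3D clip volume: -w <= x <= w, -w <= y <= w, 0 <= z <= w.
    """
    x, y, z, w = v
    code = 0
    if x < -w:
        code |= LEFT
    if x > w:
        code |= RIGHT
    if y < -w:
        code |= BOTTOM
    if y > w:
        code |= TOP
    if z < 0:
        code |= NEAR
    if z > w:
        code |= FAR
    return code


def trivially_outside(tri: Sequence[Vec4]) -> bool:
    """True if all three vertices lie outside the same clip plane.

    Such a triangle is discarded as a whole. The test is conservative: some
    fully outside triangles (e.g. near a frustum corner) are not detected.
    """
    a, b, c = (outcode(v) for v in tri)
    return (a & b & c) != 0


def outside_share(triangles: Sequence[Sequence[Vec4]]) -> float:
    """Fraction of triangles rejected by the trivial test (0.0 if empty)."""
    if not triangles:
        return 0.0
    return sum(trivially_outside(t) for t in triangles) / len(triangles)


inside = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 0.5, 1.0), (0.0, 0.5, 0.5, 1.0)]
left = [(-2.0, 0.0, 0.5, 1.0), (-3.0, 1.0, 0.5, 1.0), (-2.5, -1.0, 0.5, 1.0)]
straddling = [(-2.0, 0.0, 0.5, 1.0), (2.0, 0.0, 0.5, 1.0), (0.0, 0.5, 0.5, 1.0)]
print(outside_share([inside, left, straddling, left]))
```

A per-frame share computed this way would be comparable in spirit to the "Triangles out" column, but only a lower bound on it, since the trivial test misses some triangles that full clipping would discard.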

Battlefield 1 – Infantry 1

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.1: Graphs representing the data collected in the Battlefield 1 gameplay capture.

Deus Ex: Mankind Divided – First mission

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.2: Graphs representing the data collected in the Deus Ex: Mankind Divided gameplay capture.

FIFA 18 – Match 1

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.3: Graphs representing the data collected in the FIFA 18 gameplay capture.

Hitman – First mission

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.4: Graphs representing the data collected in the Hitman gameplay capture.

Rise of the Tomb Raider – First mission

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.5: Graphs representing the data collected in the Rise of the Tomb Raider gameplay capture.

Sniper Elite 4 – First mission 1

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.6: Graphs representing the data collected in the Sniper Elite 4 gameplay capture.

Tom Clancy’s The Division – First mission

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.7: Graphs representing the data collected in the Tom Clancy’s The Division gameplay capture.

Total War: Warhammer II – First mission

[Graphs not reproduced. Panels plotted against frame number: (a) draw time (normalized), total geometry and clipped; (b) perspective projected triangles, inside and outside frustum; (c) complete draw calls, inside and outside frustum; (d) vertex data output size (bytes), inside and outside frustum.]

Figure 4.8: Graphs representing the data collected in the Total War: Warhammer II gameplay capture.

Chapter 5 Analysis and Discussion

The results showed that in the majority of captures, over 30% of the time spent on processing geometry was wasted on geometry outside the view frustum. The results also showed that in the majority of captures, 20% or more of the rendered triangles and vertex data resided outside the view frustum; the exception was Sniper Elite 4, at ∼14% and ∼15% respectively. The graphs (Figures 4.1 to 4.8) showed that in most games, the exception being Tom Clancy’s The Division, the number of triangles outside the frustum surpassed the number inside at several points, meaning that more than half of the triangles in those frames were outside the frustum. The graphs also showed that the vertex data size outside the frustum closely followed the number of triangles outside, meaning that the triangles outside the frustum were similar in data size to the ones inside. The percentage of draw calls outside the frustum was lower than the other categories for all games except Sniper Elite 4, whose percentage of draw calls outside the frustum was almost twice as high as its percentages of triangles and vertex data outside. Total War: Warhammer II wasted the fewest draw calls, with as little as ∼3% in gameplay and ∼6% in each benchmark. The time wasted and the numbers of triangles, complete draw calls and vertex data outside the view frustum were reasonably stable between benchmark runs within each game. The results make clear that every analyzed game wastes a noteworthy share of its processed geometry. In addition, the following observations were made for each game.
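The count-based statements in the paragraph above ("the majority of captures", "with the exception of Sniper Elite 4") can be checked directly against Table 4.1. The script below is only a reproducibility aid: it copies the four percentage columns of the table and tallies them, with thresholds chosen to mirror the wording of the text.

```python
# Percentages outside the view frustum per capture, copied from Table 4.1:
# (time, triangles, complete draw calls, vertex data).
SHARES = {
    "Battlefield: Infantry1": (38.9, 35.1, 11.9, 35.0),
    "Battlefield: Infantry2": (45.4, 33.6, 10.6, 33.5),
    "Battlefield: Infantry3": (44.2, 35.8, 6.7, 35.7),
    "Battlefield: Tank": (36.8, 35.1, 7.6, 35.0),
    "Deus Ex: Benchmark1": (55.5, 31.7, 12.1, 31.8),
    "Deus Ex: Benchmark2": (55.7, 31.8, 12.1, 31.8),
    "Deus Ex: Benchmark3": (56.2, 32.0, 12.2, 32.1),
    "Deus Ex: FirstMission": (62.8, 41.4, 9.7, 44.2),
    "FIFA: Match1": (30.3, 23.9, 11.4, 25.1),
    "FIFA: Match2": (25.4, 20.4, 6.8, 23.0),
    "FIFA: Match3": (32.6, 26.0, 9.1, 28.0),
    "Hitman: Benchmark1": (18.6, 21.6, 8.3, 21.1),
    "Hitman: Benchmark2": (18.6, 21.6, 8.2, 21.1),
    "Hitman: Benchmark3": (18.3, 21.9, 8.3, 21.4),
    "Hitman: FirstMission": (40.0, 41.0, 18.3, 38.6),
    "Sniper Elite: FirstMission1": (32.2, 15.5, 30.3, 16.1),
    "Sniper Elite: FirstMission2": (32.2, 15.5, 30.3, 16.1),
    "Sniper Elite: FirstMission3": (28.9, 13.3, 26.6, 14.0),
    "The Division: Benchmark1": (16.7, 25.4, 9.0, 23.2),
    "The Division: Benchmark2": (17.2, 25.8, 9.1, 23.6),
    "The Division: Benchmark3": (16.5, 23.9, 9.2, 21.7),
    "The Division: FirstMission": (22.3, 32.4, 10.5, 31.0),
    "Tomb Raider: Benchmark1": (33.4, 41.3, 12.7, 41.6),
    "Tomb Raider: Benchmark2": (32.9, 42.1, 13.0, 42.3),
    "Tomb Raider: Benchmark3": (33.3, 42.1, 12.8, 42.3),
    "Tomb Raider: FirstMission": (37.5, 44.8, 15.2, 44.6),
    "Warhammer: Benchmark1": (29.1, 35.9, 5.9, 24.6),
    "Warhammer: Benchmark2": (28.3, 35.8, 5.9, 24.6),
    "Warhammer: Benchmark3": (30.2, 35.9, 5.9, 24.6),
    "Warhammer: FirstMission": (26.0, 40.8, 3.5, 26.3),
}

# Captures where more than 30% of geometry-processing time went to clipped triangles.
time_over_30 = [n for n, (t, _, _, _) in SHARES.items() if t > 30.0]

# Captures where at least 20% of both triangles and vertex data were outside.
tris_and_data_20 = [n for n, (_, p, _, v) in SHARES.items() if p >= 20.0 and v >= 20.0]

# Captures where the draw-call share is NOT the lowest of the four categories.
draw_call_exceptions = [n for n, (t, p, d, v) in SHARES.items() if d >= min(t, p, v)]

print(f"{len(time_over_30)} of {len(SHARES)} captures waste over 30% of the time")
print(f"{len(tris_and_data_20)} of {len(SHARES)} captures reach 20% in triangles and data")
print("Draw-call exceptions:", draw_call_exceptions)
```

With the table values, 18 of the 30 captures exceed 30% in time, 27 reach 20% in both triangles and vertex data, and only the three Sniper Elite 4 captures break the pattern of draw calls being the lowest category.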

5.1 Battlefield 1

Contrary to the author's initial assumptions, the data for Battlefield 1 proved to be almost as steady between captures, regarding triangles and vertex data outside the frustum, as the benchmarks of the other games. However, the time wasted and the percentage of draw calls outside the frustum varied. Compared to the other gameplay sessions, Battlefield 1 had a moderate percentage of triangles and vertex data outside the frustum. The time wasted, however, was higher than in most other games. A share of time wasted that is higher than the share of triangles outside the frustum indicates that, on average, processing the geometry discarded through clipping was more costly than processing the geometry that was kept. This could, however, be affected by overhead from the fair number of draw calls outside the frustum, as well as by operations performed by the GPU that are not visible in the collected data.
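The reasoning above, that a time share larger than the triangle share implies costlier discarded triangles, can be made quantitative. Writing t for the share of time and p for the share of triangles outside the frustum (both taken from Table 4.1), the average cost of an outside triangle relative to an inside one is t(1 - p) / ((1 - t) p). The sketch below assumes that processing time divides cleanly between inside and outside triangles, which, as noted above, ignores draw-call overhead and GPU work not visible in the captures. The function name is illustrative.

```python
def relative_cost(time_out: float, tris_out: float) -> float:
    """Average time per triangle outside the frustum divided by the average
    time per triangle inside it.

    time_out: share (0-1) of geometry-processing time spent on triangles
              outside the frustum.
    tris_out: share (0-1) of processed triangles lying outside it.

    With total time T and triangle count N:
        (time_out*T / (tris_out*N)) / ((1-time_out)*T / ((1-tris_out)*N))
      = time_out*(1-tris_out) / ((1-time_out)*tris_out)
    A value above 1 means the discarded triangles were costlier on average.
    """
    return time_out * (1.0 - tris_out) / ((1.0 - time_out) * tris_out)


# (capture, time share, triangle share), taken from Table 4.1.
for name, t, p in [
    ("Battlefield: Infantry1", 0.389, 0.351),
    ("Deus Ex: FirstMission", 0.628, 0.414),
    ("Hitman: Benchmark1", 0.186, 0.216),
    ("Sniper Elite: FirstMission1", 0.322, 0.155),
    ("The Division: Benchmark1", 0.167, 0.254),
]:
    print(f"{name}: {relative_cost(t, p):.2f}")
```

For these captures the ratio is roughly 1.18 for Battlefield 1, 2.39 for the Deus Ex: Mankind Divided gameplay session, 0.83 for a Hitman benchmark, 2.59 for Sniper Elite 4 and 0.59 for a The Division benchmark, which matches the direction of the per-game observations in this chapter.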

5.2 Deus Ex: Mankind Divided

Deus Ex: Mankind Divided spent, on average, more than half of its geometry-processing time on triangles that resided outside the view frustum. This was by far the highest proportional waste in time of all the captures taken. As the average share of triangles residing outside the frustum was much lower, the triangles outside the frustum were on average more costly to process.

Proportionally, the captured gameplay showed a ∼10% higher waste in time, and ∼25% more triangles and ∼25% more vertex data rendered outside the view frustum, compared to the benchmarks. Additionally, the average total triangle count and vertex data size were ∼80% lower than in the benchmarks. This indicated that the percentage of waste in the benchmarks could be lower because of better culling than during gameplay, and because developers deliberately placed more geometry in the scripted path of the camera or scripted the camera to focus on geometry-rich areas.

The percentages of triangles and vertex data outside the frustum are almost three times as high as the percentage of draw calls outside the frustum. This indicates either that a few large draw calls (in terms of triangles and game-world coverage) lie outside the frustum and could be culled, or that the draw calls inside the frustum are large and could be divided into smaller draw calls, some of which could in turn be culled.
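The second explanation, splitting large draw calls so that parts of them can be culled, can be illustrated with a toy model. The sketch below is hypothetical: it assumes a per-triangle inside/outside flag is known for one draw call, and counts how many triangles could be skipped if the triangle list were cut into consecutive chunks and every chunk lying entirely outside the frustum were dropped. A real engine cannot know these flags before processing the geometry; it would instead test a bounding volume per chunk, so the count is an upper bound for a given chunking. The function name is illustrative.

```python
def skippable_triangles(outside: list[bool], chunk_size: int) -> int:
    """Count triangles that could be skipped if a draw call's triangle list
    were split into consecutive chunks of `chunk_size` triangles and every
    chunk lying entirely outside the frustum were culled.

    outside[i] is True when triangle i lies outside the view frustum.
    A chunk_size of len(outside) models the original, unsplit draw call.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    skipped = 0
    for start in range(0, len(outside), chunk_size):
        chunk = outside[start:start + chunk_size]
        if all(chunk):  # every triangle in this chunk is outside
            skipped += len(chunk)
    return skipped


# A hypothetical draw call of 12 triangles whose first half lies outside.
flags = [True] * 6 + [False] * 6
for size in (12, 4, 3):
    print(size, skippable_triangles(flags, size))
```

Kept as one draw call, none of the six outside triangles can be skipped; with chunks of four, four can; with chunks of three, all six can. The gain depends on how spatially coherent the triangle order is, and smaller chunks mean more culling tests and potentially more draw calls, so the chunk size is a trade-off.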

5.3 FIFA 18

FIFA 18 showed a moderate share of time, triangles and vertex data outside the view frustum compared to the other games. The total numbers of triangles and draw calls were also among the lowest of all the captured sessions. Even so, the proportion of time wasted on processing geometry should still be considered high. The fluctuations in waste between matches suggest that the number of goals, corners and other events that trigger scripted camera sequences affects the results. More matches would have to be analyzed to confirm this.

5.4 Hitman

In Hitman, the average time used to process a triangle outside the view frustum was slightly lower than for a triangle inside the frustum, indicating that the triangles rendered outside the frustum were cheaper to process than the ones inside.

In the benchmark sessions, the shares of triangles, draw calls and vertex data outside the frustum were also fairly high. However, the gameplay session showed even higher proportional waste, about twice as high as in the benchmarks. This gave the same indication as for Deus Ex: Mankind Divided, namely that the benchmark has likely been designed with geometry added along the path of the camera, or with the camera focused on geometry-rich locations. The large difference in time wasted further strengthened this indication. It was safe to say that culling in the benchmark sessions was much better than in the gameplay session.
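The statement that gameplay waste in Hitman was about twice the benchmark waste can be checked per category by dividing each gameplay share in Table 4.1 by the mean of the three benchmark runs. The snippet below is plain arithmetic on the table values; the names are illustrative.

```python
from statistics import mean

CATEGORIES = ("time", "triangles", "draw calls", "vertex data")

# Hitman percentages outside the view frustum, copied from Table 4.1,
# in the order of CATEGORIES.
BENCHMARKS = [
    (18.6, 21.6, 8.3, 21.1),
    (18.6, 21.6, 8.2, 21.1),
    (18.3, 21.9, 8.3, 21.4),
]
GAMEPLAY = (40.0, 41.0, 18.3, 38.6)


def gameplay_to_benchmark(benchmarks, gameplay):
    """Map each category to gameplay share / mean benchmark share."""
    return {
        category: gameplay[i] / mean(run[i] for run in benchmarks)
        for i, category in enumerate(CATEGORIES)
    }


for category, ratio in gameplay_to_benchmark(BENCHMARKS, GAMEPLAY).items():
    print(f"{category}: {ratio:.2f}x")
```

This yields roughly 2.16 for time, 1.89 for triangles, 2.21 for complete draw calls and 1.82 for vertex data, so "about twice as high" holds in every category.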

5.5 Rise of the Tomb Raider

Among the five highest proportions of rendered triangles outside the view frustum, Rise of the Tomb Raider held first place with its gameplay session, and second, third and fifth place with its benchmarks. Rise of the Tomb Raider also ranked highest in vertex data outside the frustum, with gameplay in first place and benchmarks in third, fourth and fifth place.
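The rankings above can be reproduced from Table 4.1. To keep the sketch short, it lists only the captures with at least 40% in either column; every other capture in the table is below 40% in both columns, and at least five listed captures reach 40% in each column, so no omitted capture can enter a top five. The function name is illustrative.

```python
# (triangles out %, vertex data out %) from Table 4.1, for captures with
# at least 40% in either column.
CAPTURES = {
    "Tomb Raider: FirstMission": (44.8, 44.6),
    "Tomb Raider: Benchmark1": (41.3, 41.6),
    "Tomb Raider: Benchmark2": (42.1, 42.3),
    "Tomb Raider: Benchmark3": (42.1, 42.3),
    "Deus Ex: FirstMission": (41.4, 44.2),
    "Hitman: FirstMission": (41.0, 38.6),
    "Warhammer: FirstMission": (40.8, 26.3),
}


def places(column: int, prefix: str, top: int = 5) -> list[int]:
    """Return the 1-based places within the top `top` held by captures whose
    name starts with `prefix`, ranking by `column` (0 = triangles,
    1 = vertex data) in descending order."""
    ranked = sorted(CAPTURES, key=lambda name: CAPTURES[name][column], reverse=True)
    return [i + 1 for i, name in enumerate(ranked[:top]) if name.startswith(prefix)]


print(places(0, "Tomb Raider"))
print(places(1, "Tomb Raider"))
```

The two calls give places 1, 2, 3 and 5 for triangles and 1, 3, 4 and 5 for vertex data; the remaining top-five slot in each column is held by the Deus Ex: Mankind Divided gameplay session.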

Rise of the Tomb Raider also ranked above average in terms of complete draw calls outside the frustum. As with Deus Ex: Mankind Divided, this share was about a third of the share of triangles and vertex data outside the frustum, giving the same indications: the draw calls inside the frustum may be too large, and splitting them could be beneficial.

Unlike the games above with built-in benchmarks, Rise of the Tomb Raider showed only a slight difference between the benchmark sessions and the gameplay session in all categories.

5.6 Sniper Elite 4

As with FIFA 18, Sniper Elite 4 produced among the lowest numbers in terms of total triangles, draw calls and vertex data size. The relative time wasted was moderate, while the shares of triangles and vertex data outside the frustum were the lowest of all the games in each of the captured sessions. This indicated the same as for Deus Ex: Mankind Divided: the triangles outside the frustum were on average more costly to process. Also, as Sniper Elite 4 had a large share of complete draw calls outside the view frustum, this indicates that many smaller draw calls were made.

5.7 Tom Clancy’s The Division

Tom Clancy’s The Division had a moderate share of data drawn outside the frustum compared to the other games, with the exception of wasted time. Among all benchmarks, Tom Clancy’s The Division had the least wasted time in relation to the time spent processing all geometry. This was also true for the gameplay sessions. As with Hitman, this meant that the average time used to process a triangle outside the view frustum was lower than for a triangle inside the frustum.

The gameplay session also had higher shares of data drawn outside the frustum than the benchmark sessions, while the benchmark sessions contained higher amounts of total geometry. This gave the same indications as for Deus Ex: Mankind Divided and Hitman: the benchmarks have better culling, the game camera focuses more on geometry-rich areas, or both.

5.8 Total War: Warhammer II

Total War: Warhammer II had a moderate share of time wasted, and was the only game with benchmark functionality to have less wasted time in the gameplay session than in the benchmark sessions. The average share of triangles outside the frustum was substantial; in the benchmarks it was about six times as high as the share of draw calls outside the frustum. In the gameplay session, the share of triangles outside the frustum was extreme in comparison to the share of draw calls outside, at almost twelve times as high (40.8% versus 3.5%). This strongly points towards an inefficient distribution of geometry within draw calls. It could be that the draw calls outside the frustum are responsible for many of the triangles outside it, in which case culling them should be advocated.
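If the draw calls outside the frustum account for many of the discarded triangles, culling them on the CPU before submission avoids the GPU work entirely. Assarsson and Möller [2] study frustum culling for bounding boxes; the following is only a basic, generic version of such a test, not the culling used by any of the analyzed games. It assumes that each draw call has an axis-aligned bounding box and that the frustum is given as six planes with n·p + d >= 0 for points inside. For each plane, the box corner farthest along the normal is tested; if even that corner is behind the plane, the whole box is outside. The test is conservative and may keep some boxes near frustum corners that are actually outside. The plane values below describe a made-up camera.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Plane:
    """Plane n . p + d = 0; points with n . p + d >= 0 are treated as inside."""
    normal: Vec3
    d: float


def aabb_outside(box_min: Vec3, box_max: Vec3, planes: Sequence[Plane]) -> bool:
    """Return True if the box lies entirely on the outer side of some plane.

    A True result means the draw call bounded by the box can be skipped.
    The test is conservative: False does not guarantee visibility.
    """
    for plane in planes:
        # Box corner farthest along the plane normal (the "positive vertex").
        corner = [hi if n >= 0.0 else lo
                  for n, lo, hi in zip(plane.normal, box_min, box_max)]
        distance = sum(n * c for n, c in zip(plane.normal, corner)) + plane.d
        if distance < 0.0:
            return True  # even the most inward corner is outside this plane
    return False


# Hypothetical view frustum: camera at the origin looking down +z with a
# 90-degree field of view both ways, near plane at z = 0.1 and far at z = 100.
FRUSTUM = [
    Plane((1.0, 0.0, 1.0), 0.0),     # left:   x >= -z
    Plane((-1.0, 0.0, 1.0), 0.0),    # right:  x <= z
    Plane((0.0, 1.0, 1.0), 0.0),     # bottom: y >= -z
    Plane((0.0, -1.0, 1.0), 0.0),    # top:    y <= z
    Plane((0.0, 0.0, 1.0), -0.1),    # near:   z >= 0.1
    Plane((0.0, 0.0, -1.0), 100.0),  # far:    z <= 100
]

print(aabb_outside((-1.0, -1.0, 5.0), (1.0, 1.0, 6.0), FRUSTUM))     # in front
print(aabb_outside((-1.0, -1.0, -6.0), (1.0, 1.0, -5.0), FRUSTUM))   # behind
print(aabb_outside((-50.0, -1.0, 5.0), (-40.0, 1.0, 6.0), FRUSTUM))  # far left
```

The box in front of the camera is kept, while the boxes behind the camera and far to the left are culled, so their triangles never reach the geometry stage.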

The combination of a moderate average total number of rendered triangles and a high average vertex data size, almost twice as high as in most games, indicates that the game uses large vertex data structures, which in turn places a greater load on the graphics card's memory bandwidth.

Chapter 6 Conclusions

This study showed evidence of noteworthy amounts of data being processed in the GPU only to be discarded through clipping later in the graphics pipeline, in all of the included games. To answer the research questions of this work:

“To what extent are games wasting performance in the geometry stage of the GPU?” While it was impossible to draw conclusions regarding the direct impact on performance, it was safe to say that the performance wasted, relative to the geometry processed, was noteworthy in each of the analyzed cases, and in many cases extreme.

“How large is the proportion of processed triangles discarded through clipping, in relation to the total number of triangles?” The number of triangles processed unnecessarily was also noteworthy, and in some cases extreme. Beyond the observations made during the analysis, further conclusions regarding the need for optimization could not be drawn without information about the exact implementation of each application.

Chapter 7 Future work

For further analysis, it would be desirable to broaden the study by adding more games and more sessions per game, from different parts of the game world, specifically to compare geometry-intense areas with sparse locations. This would provide a broader overview of the general state of optimization in each game. The results of this study also suggest that further work investigating the need for optimization in game applications is viable; further studies could, for example, investigate the “types” of data that are wasted. Tessellated triangles could, for example, be optimized away differently than particles, hence the desire to categorize the data.

References

[1] Akenine-Möller, T., Haines, E., and Hoffman, N. Real-Time Rendering, 3rd ed. A.K. Peters, Wellesley, Mass., 2008.

[2] Assarsson, U., and Möller, T. Optimized view frustum culling algorithms for bounding boxes. Journal of Graphics Tools 5, 1 (2000), 9–22.

[3] Hudson, T., Manocha, D., Cohen, J., Lin, M., Hoff, K., and Zhang, H. Accelerated occlusion culling using shadow frusta. In Proceedings of the Thirteenth Annual Symposium on Computational Geometry (New York, NY, USA, 1997), SCG ’97, ACM, pp. 1–10.

[4] Jahrmann, K., and Wimmer, M. Responsive real-time grass rendering for general 3D scenes. ACM, pp. 1–10.

[5] Lake, A., and Books24x7, Inc. Game Programming Gems 8. Cengage Learning, Boston, 2011;2010.

[6] Movania, M. M. OpenGL Development Cookbook, 1 ed. Packt Publishing, Olton, 2013.

[7] Advanced Micro Devices, Inc. Polaris Carbon Footprint Study. https://www.amd.com/Documents/polaris-carbon-footprint-study.pdf. (accessed: 11 March 2018).

[8] Advanced Micro Devices, Inc. Vega Architecture. https://gaming.radeon.com/en/vega-architecture/. (accessed: 11 March 2018).

[9] Microsoft Corporation. Graphics Pipeline (Windows). https://msdn.microsoft.com/en-us/library/windows/desktop/ff476882(v=vs.85).aspx/. (accessed: 11 March 2018).

[10] Microsoft Corporation. What is Direct3D 12? (Windows). https://msdn.microsoft.com/en-us/library/windows/desktop/dn899228(v=vs.85).aspx. (accessed: 5 June 2018).

[11] NVIDIA Corporation. Pascal GPU Architecture. https://www.nvidia.com/en-us/data-center/pascal-gpu-architecture/. (accessed: 2 February 2018).

[12] The Khronos Group Inc. Vulkan Overview. https://www.khronos.org/vulkan/. (accessed: 5 June 2018).


[13] Su, M., Guo, R., Wang, H., Wang, S., and Niu, P. View frustum culling algorithm based on optimized scene management structure. In 2017 IEEE International Conference on Information and Automation (ICIA) (July 2017), pp. 838–842.

[14] Sunar, M. S., Zin, A., and Sembok, T. Improved view frustum culling technique for real-time virtual heritage application. 43–48.

[15] Varcholik, P. Real-time with DirectX 11 and HLSL: a practical guide to graphics programming. Pearson Addison Wesley, 2014.

[16] Zhang, G. Real time of large terrain in flight simulation. In 2008 Asia Simulation Conference - 7th International Conference on System Simulation and Scientific Computing (Oct 2008), pp. 36–39.
