Faculty of Computer Science and Management; field of study: Computer Science; specialization: Information Systems

Master's thesis

Procedural heightmap generation for level of detail purposes

Odyseusz Spiridis
Keywords: procedural heightmap generation
Brief summary: In the following thesis, two kinds of shaders will be evaluated with respect to their heightmap generation speed.


Wrocław 2017

Streszczenie: This thesis deals with the procedural generation of game content, with emphasis on generating heightmaps in order to create procedural planetary terrain. To be able to use a heightmap in a level of detail algorithm, a specific type of noise has to be used in order to avoid discontinuities. After explaining the noises, the level of detail algorithm, and why the GPU was chosen for this task, several performance tests are run to check which kind of shader gives the best results in terms of heightmap generation speed, measured in frames per second, a unit well known in the games industry.

Abstract: This master's thesis is about procedural content generation, with emphasis on heightmap generation in order to create procedural planetary terrain. To use heightmaps in a level of detail algorithm, a certain type of noise has to be used to avoid discontinuities. After an explanation of noises, level of detail and why the GPU is used for this task, several benchmarks will be run to test which type of shader gives the best results in terms of heightmap generation speed, measured in frames per second, a unit well known in the graphics industry.


1. Objectives ...... 4
2. Approach to terrain in games ...... 4
2.1. Terrain in classic games ...... 5
2.2. Procedural terrain elements in mainstream games ...... 6
2.3. New approaches to terrain ...... 7
3. Procedural generation ...... 12
4. Level of detail ...... 13
5. Smooth noise ...... 15
6. GPU and CPU architecture comparison ...... 19
7. Shaders ...... 21
8. Test subjects ...... 23
8.1. Vertex shader ...... 23
8.2. Compute shader ...... 26
9. Benchmark motivation and development ...... 27
9.1. Motivation ...... 27
9.2. Development ...... 28
10. Benchmarks ...... 32
10.1. Heightmap generation benchmark ...... 33
10.1.1. Introduction ...... 33
10.1.2. Results ...... 34
10.2. Real-time heightmap generation benchmark ...... 37
10.2.1. Introduction ...... 37
10.2.2. Results ...... 37
10.3. GPU-driven gravity simulation ...... 41
11. Tests summary ...... 42
12. Bibliography ...... 44


1. Objectives

The first objective of this thesis is to introduce the reader to procedural content generation and its potential uses. The main discussion will be about terrain generation, with emphasis on procedural planetary terrain; however, it will be shown that procedural generation is suitable not only for terrain generation. In order to fully understand procedural planetary generation, classic techniques of terrain creation will be discussed. Techniques that speed up real-time rendering and reduce resource usage will also be covered, followed by a discussion of noises and how to distinguish their properties, with examples from a real-time rendering application. After this introduction to the artistic side of procedural generation, more technical aspects will be expounded. In these paragraphs the architecture of modern graphics cards and how to use them will be reviewed, and it will be explained why, for procedural terrain generation, the GPU is the best choice. Because graphics cards are also programmable, but in a different way, the programs which run on them will be compared and their differences explained. At the end, tests will be run, in an application written exclusively for this thesis, to find the fastest way to generate procedural heightmaps, which are the main building blocks of procedural planetary terrain. Three benchmarks will be run on graphics card drivers from three different manufacturers. These benchmarks will answer what kind of shader to use for a particular purpose and how much performance gain there is on a specific platform.

2. Approach to terrain in games

Games are interactive computer software in which some action takes place. There are many genres of games, like racing, strategy or role-playing games, but all of them have to be placed in some environment. One of the most important parts of any environment is the terrain, which can have a direct impact on gameplay, as in a strategy game, or a purely aesthetic role, for example distant mountains as a background in a racing game. Realistically created terrain is one of the main requirements for a game to be considered immersive and realistic, if realism is what the game is aiming for. Even for arcade games with cartoonish styling, diversified terrain (levels) is an advantage, since the player is not bored with repetitive surroundings. Games focused strictly on some mechanism, like soft-body physics, which start on a flat plane, also benefit from interesting terrain, because it makes it easier to interest players who are not up to date with the newest technology. See Figure 1 for an example.

Figure 1 BeamNG terrain in early and recent stages of development. Source: bsimracing.com


Creation of terrain starts, as with any other game object, with geometry. Classically, a level artist makes the terrain or level of a game: a flat plane or other generic shape is taken, and its shape is changed in a prepared editor with pre-defined tools. Another, quite new, approach is to generate terrain with some kind of noise with pre-defined properties (e.g. amplitude, frequency), without human intervention. There is also something in between, where tiles are pre-made by an artist but the whole terrain or level is assembled into one piece procedurally. In this master's thesis mostly the second option, fully procedural terrain, is taken into consideration.

2.1. Terrain in classic games

As mentioned earlier, in mainstream games the creation of terrain depends mainly on the artist. This process requires making (or using an existing) editor with a friendly graphical user interface, some assets to populate the game (like trees, buildings, etc.) and, of course, a concept. An example of an early stage of terrain development in an engine editor is shown in Figure 2. As can be seen, the artist can use variously shaped brushes to add or subtract height at a certain point. When the geometry is done, textures and a collision mesh are added, some assets are placed, and the terrain is ready to host the game action. See in Figure 3 an example of excellent terrain in one of the most successful role-playing games ever made: The Witcher 3.

Figure 2 Early stage of terrain development. Source: youtube.com


Figure 3 Screenshot from The Witcher 3

2.2. Procedural terrain elements in mainstream games

One of the most successful games with procedural terrain generation is Minecraft. It is a sandbox video game created and designed by the Swedish game designer Markus Persson, and later fully developed and published by Mojang. The creative and building aspects of Minecraft allow players to build with a variety of different cubes in a 3D procedurally generated world. Other activities in the game include exploration, resource gathering, crafting and combat [1]. It uses a cube-based engine. These cubes can be gathered by the player, processed and placed anywhere. Procedural generation is used to generate terrain, biomes, villages, monsters, etc. Artists only prepare assets (monsters, armor, weapons, etc.), blocks (dirt, sand, etc.) and characters; the rest is produced by an algorithm. In Figure 4 different biomes can be seen.

Figure 4 Biomes in Minecraft. Source: planetminecraft.com


Another excellent example of procedural level generation was introduced in Diablo, published in 1996. This is an action role-playing hack-and-slash game developed by Blizzard North. It uses an algorithm to place and connect tiles together in order to create a whole level. One tile has one to four exits, placed on the north, east, south and west sides of the map. The tiles are of course created by a level designer. This is one of the first games that pushed procedural generation even further, not only to create maps but also to randomly generate quests and monsters. This type of level design is continued in subsequent releases of the series. This technique of connecting pre-made level tiles is also used in the newest Codemasters rally game, Dirt 4, which was released in 2017. Not only are the stages themselves procedurally generated, but also random events that take place on the stages, like crashes or the appearance of drones and TV helicopters. Some players criticized Dirt 4 for its approach to rally stages and said that they feel bland and repetitive, but nevertheless the player has a virtually infinite number of them to play and, like in real life, if a new championship is generated the player has no chance to drive each stage countless times to memorize every corner. The Dirt series also provides a very realistic wheel feeling through force feedback, because it utilizes noise to simulate little bumps on the surface, like randomly spread rocks on gravel or little cracks and irregularities in asphalt. When it comes to racing games and Codemasters, it is necessary to mention Fuel, their first attempt at a procedurally generated racing game. Though it is not directly said what is procedurally generated and how, it can be deduced from how much disk space the game required (6 GB) and how big the game world was (about 14,440 km² [2]). Fuel was not quite successful, though, due to a lack of immersion and a poor handling model.

2.3. New approaches to terrain

Recently, games have been getting bigger and bigger, and not only in terms of area to explore. Procedural generation allows generating enormous terrains with a lot of detail using very little disk space. When it comes to closed-level games, like tracks in racing games or dungeons in role-playing games, there is no problem with limiting players to a particular area, because it comes naturally: on tracks we are limited by barriers, behind which the cheering crowds are, or by the walls of an underground dungeon. But when it comes to sandbox games, where players can roam as they wish, it is a problem to limit the player without destroying immersion with invisible walls. Some games tackle this problem with a lot of grace. For example, the player in Grand Theft Auto III and newer is always placed on an island surrounded by water.


Figure 5 Grand Theft Auto V map. Source: gtaall.com

Another good example of not destroying immersion at the edge of the map was presented in The Witcher 3. Although the map is not limited by any natural obstacle, the player cannot cross a certain border. What happens then is that the protagonist does not want to go any further and says some clever comment like "Better look at the map" or "I am too old to travel the world", but the map itself is continuous, with no visible borders. This kind of limiting is good enough in games where the player is forced to stay on the ground. Problems occur when the player is able to leave the ground and travel in the air. The question is how to provide immersive gameplay when the player can easily roam wherever they want. One of the solutions is to procedurally generate terrain only where the player goes. This solution is used in the previously mentioned Minecraft, where terrain is generated only in places which the player has visited. But it is not limitless, as the biggest terrain Minecraft can generate covers an area as big as Neptune [3]. Another solution is to generate procedural terrain in front of the player, as used to be done in some platform games. It can be achieved with a classic heightmap¹ which travels with the player through world-space coordinates, or using a much more sophisticated algorithm called Marching Cubes (also used in medicine), which allows creating complex 3D models. A comprehensive explanation of this algorithm is written by Paul Bourke on his blog [4]. In my opinion, the best solution for this kind of problem comes right from nature: the best limitless and naturally occurring shape is the sphere. Planets are spherical and, as history has shown, they can host any action humans have ever done. One of the most popular, but sadly not successful, games using a planetary engine is No Man's Sky. It is an action-adventure survival video game developed and published by the indie studio Hello Games for PlayStation 4 and Microsoft Windows, released in 2016. The gameplay of No Man's Sky is built on four pillars: exploration, survival, combat and trading. Players are free to roam within the entirely procedurally generated, deterministic open universe, which includes over 1.8×10¹⁹ planets, many with their own sets of flora and fauna [5]. When it comes to terrain geometry, the developers revealed at the Game Developers Conference that it is based on voxels [6]. What is more, this game uses procedural generation not only for terrain but for practically everything, including fauna and flora. See Figure 6 for an example of procedurally generated animals. Although No Man's Sky showed some cutting-edge technology, it did not succeed, probably because of the lack of the promised multiplayer and a bland world. In my opinion, the immersion was also not good enough due to world-space jumping between planets. In other words, there was no seamless travelling between them; the player needed to go through a loading screen.

¹ heightmap (also called heightfield) – a two-dimensional array that holds elevation data. It is convenient to represent a heightmap as an image; a common convention is that the R and B channels correspond to the X and Z coordinates while G is the height.

Figure 6 A random generation of creatures based on the triceratops model. Source: 3dgamedevblog.com

In 2008 a new project called Outerra was started in Slovakia. It is semi-procedural: I call it semi-procedural because the most general information about the current place is taken directly from real-world elevation data, and afterwards noises are used to add more detail. It is able to render the Earth in 1:1 scale, from space down to one-centimeter resolution, using fractal algorithms, unlimited visibility, and procedural content generation with integrated vehicle and physics engines [7]. Outerra also provides state-of-the-art water simulation, terrain deformation, procedural texturing, procedural vegetation and some excellent solutions to well-known problems which game developers have to deal with (e.g. z-fighting), which are posted on their forum and blog [8]. What is more, procedural cities and roads are currently being tested in Outerra. It is not revealed how the terrain is polygonized, but it can be assumed that these are heightmaps with different resolutions with an additional level of detail (on the forums one of the developers said that an adaptive quadtree is used). Nevertheless, the results are truly amazing and much more convincing than what we could see in No Man's Sky. Pictures are not enough to fully appreciate the scale and sophistication of this project, but still some screenshots from the official blog are shown in Figure 7.

Figure 7 Screenshot from Outerra. Source: outerra.com

There is also another great procedural planetary project in progress, called Star Citizen. It is more entertainment-focused and oriented towards massively multiplayer online gameplay. It is mainly crowd-funded and has been able to gather more than 141 million dollars from fans [9]. The game contains first-person shooter mechanics with a realistic ballistics engine and human body physics, spectacular space fights with a rewarding flight model, ground vehicles and, what is most important for this thesis, realistic planetary-scale terrains. It is not revealed exactly how much procedural generation is used, but some pre-made locations with bases, buildings and landing pads can be seen that do not look procedurally generated. Taking into consideration how big the planets are, it can be assumed that some of their parts are made by artists and some are procedurally generated. What is distinct from Outerra is that the game contains more than one fictional planet. Despite the fact that Star Citizen is based on a commercial engine, no more information is provided about the technology, but after seeing some level of detail pop-ups while approaching a planet's surface, it can be assumed that the most common quadtree-based LOD with heightmaps is used. Screenshots from Star Citizen are shown in Figure 8.


Figure 8 Various planets in Star Citizen. Source: reddit.com user Sitarow

In this chapter two projects in progress and one fully developed game were briefly discussed, and of course these are not the only existing games and projects that deal with the terrain problem in new, interesting and innovative ways. However, these projects were used as inspiration for my personal project, which will use the results of the benchmarks included in this thesis in order to improve the performance of procedural generation. See Figure 9 for example screenshots.

Figure 9 My own OpenGL based procedural planetary engine


3. Procedural generation

Most generally, procedural generation is a technique of creating content using some kind of randomness within certain restrictions. Procedural means that some procedures are used; it can be said that even handmade objects are procedurally created, because the artist had to follow some kind of procedure. For example, in order to create a human face, the geometry has to be prepared, then the textures and a collision mesh (hitboxes), and later everything is assembled together in a given order. However, in the context of this thesis, procedural generation means that some mathematical procedures are deployed within certain constraints in order to create or modify in-game content. It is hard to tell when exactly procedural generation started, but one of the first games which presented the true potential of procedural generation was Elite, a space exploration and combat game released in 1984. The game was originally intended to have 282 trillion galaxies (each containing 256 solar systems), but the publisher, Acornsoft, thought that players would find it impossible to believe such an array of galaxies could exist in an early-generation 8-bit computer game and would be potentially overwhelmed. As a result, the generation code was scaled back so that it would create 8 galaxies (again with 256 solar systems to explore) at the start of the game [10]. Although it is common to think about procedural generation in terms of terrain, this approach can be used to generate other game content. It was mentioned before that No Man's Sky used procedural generation (procedural modification) to generate countless animals to populate procedurally generated planets. Another game that used a lot of procedural generation was Spore, published in 2008.
This life simulation, real-time strategy, single-player god game developed by Maxis allows a player to control the development of a species, from its beginning as a microscopic organism, through development as an intelligent social creature, to interstellar exploration as a spacefaring culture [11]. During the process of evolution, the player can modify the species in any way, and a procedural algorithm takes care of the animations. Procedural planets were also generated to allow the player interstellar exploration. There are also procedural texturing techniques, mostly used along with procedural terrains. The most common approach is to divide the terrain by biomes (e.g. forests, deserts, icelands) and terrain slope (e.g. snow cannot lie on a 90-degree slope) and apply an appropriate texture. The textures themselves can be pre-made by artists or also procedurally generated, using photos as inputs to the algorithm. Similar to terrain generation, the procedural generation of textures uses noise to create variance. This can then be used to create varying textures. Different patterns and equations are also used to create a more controlled noise that forms recognizable patterns. Generating textures procedurally like this means that you can potentially have an unlimited number of possible textures without any overhead on storage. From a limited pool of initial resources, endless combinations can be generated. [19] Very interesting and, at the time of writing, tested only in Outerra and Star Citizen is procedural city generation. Figure 10 shows some results of fully procedural city generation. In Figure 11 screenshots from Star Citizen 3.0 presentations are shown. This technique requires pre-made assets like buildings, roads, etc.; the layout of the city, transportation routes and districts are generated procedurally.


Figure 10 Procedurally generated city. Source: https://josauder.github.io/procedural_city_generation/

Figure 11 Procedural planetary city in Star Citizen. Source: youtube.com - CitizenCon 2017

4. Level of detail

Level of detail (LOD for short) is an optimization approach used to improve the performance of real-time rendering. It can be applied to any part of a game object. The level of detail of a certain object may depend on many factors: the speed of the object, its importance to the scene, whether it is in shadow or lit, or its distance to the camera. In texturing, LOD is called mipmapping: mipmaps are pre-calculated, optimized sequences of images, each of which is a progressively lower-resolution representation of the same image. The height and width of each image, or level, in the mipmap is a power of two smaller than the previous level [12]. An example of a mipmap is shown in Figure 12.
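The power-of-two mip chain described above can be sketched in a few lines (a hedged illustration; the function names are mine, not from the thesis):

```python
import math

def mip_level_count(width, height):
    """Number of levels in a full mipmap chain: the larger dimension
    is halved at each level until it reaches 1."""
    return math.floor(math.log2(max(width, height))) + 1

def mip_sizes(width, height):
    """Resolution of every level in the chain, largest first."""
    sizes, w, h = [], width, height
    while True:
        sizes.append((w, h))
        if w == 1 and h == 1:
            break
        w, h = max(w // 2, 1), max(h // 2, 1)
    return sizes

print(mip_sizes(8, 8))  # [(8, 8), (4, 4), (2, 2), (1, 1)]
```

For a 256×256 texture this gives 9 levels; the whole chain costs only about one third of the base image's memory on top of it.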


Figure 12 Example of texture mipmapping. Source: wikipedia.com

When it comes to geometry, many different LOD algorithms can be used. One of them, shown in Figure 13, is reducing the polygon count. The different models could be pre-made by an artist, but in this case polygons are procedurally removed when necessary [13]. On the other side of the spectrum there is tessellation, which is integrated into the rendering pipeline and can add vertices in polygon-space coordinates in order to improve detail. The tessellation pipeline's shaders are also configurable, so, for example, noise can be added.

Figure 13 Level of detail by reducing the polygon count. Source in reference [13]

To create planetary-scale terrain, more general level of detail algorithms have to be used. One of the approaches is to store heightmaps in tiles which are organized in a quadtree (a tree where each parent has at most four children) hierarchy. One tile has four heightmaps, and when certain conditions are fulfilled, one of the heightmaps is removed from the tile (but can stay in memory for later use) and a new tile is placed where the removed heightmap was. When the quadtree reaches its limits (e.g. a minimum tile size), other LOD algorithms can be used within a single heightmap.
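A minimal sketch of the quadtree tile logic described above, assuming a simple distance-based split condition (the class, names and threshold are illustrative, not the implementation benchmarked later in this thesis):

```python
class Tile:
    """One quadtree node covering a square terrain patch of side `size`."""

    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size
        self.children = []  # empty = leaf that owns a heightmap

    def update(self, cam_x, cam_y, min_size=1.0):
        """Split tiles near the camera, merge tiles far from it."""
        cx, cy = self.x + self.size / 2, self.y + self.size / 2
        dist = ((cam_x - cx) ** 2 + (cam_y - cy) ** 2) ** 0.5
        if dist < self.size and self.size > min_size:
            if not self.children:
                half = self.size / 2
                self.children = [Tile(self.x, self.y, half),
                                 Tile(self.x + half, self.y, half),
                                 Tile(self.x, self.y + half, half),
                                 Tile(self.x + half, self.y + half, half)]
            for child in self.children:
                child.update(cam_x, cam_y, min_size)
        else:
            self.children = []  # collapse back to one heightmap

    def leaves(self):
        """Tiles whose heightmaps would actually be rendered."""
        if not self.children:
            return [self]
        return [t for c in self.children for t in c.leaves()]
```

After `Tile(0, 0, 16).update(0.5, 0.5)` the leaves near the camera corner shrink to size 1 while the far corner stays one size-8 tile, and together the leaves still cover the whole root square.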


Planetary terrain is roughly spherical and heightmaps are flat, so to make this work the starting shape is a cube which is later spherified. Examples of quadtree LOD and a spherified cube are shown in Figure 14.
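The simplest way to spherify a point on the cube's surface is to normalize its position vector (a sketch; more elaborate mappings that reduce distortion near the cube edges exist, such as the one discussed in the gamedev.net thread linked below the figure):

```python
import math

def spherify(x, y, z, radius=1.0):
    """Project a point on a cube's surface onto a sphere by normalizing
    its position vector and scaling it to the desired radius."""
    length = math.sqrt(x * x + y * y + z * z)
    return (radius * x / length,
            radius * y / length,
            radius * z / length)
```

Every heightmap vertex of every cube face is pushed through this mapping, and the heightmap value then offsets the vertex along the resulting radial direction.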

Figure 14 Example of the quadtree-based LOD described above on the left; a spherified cube on the right. Source: https://www.gamedev.net/forums/topic/662284-spherified-cube/

5. Smooth noise

In dictionaries, noise often refers to some kind of unpredictable, random information that is not useful in any way and can be ignored or filtered out. For example, the definition in the Cambridge English Dictionary states that information noise is "unexplained or unexpected information in a sample that is not useful and can be ignored". When it comes to gathering information or transferring and displaying a signal to a user this is true, but in computer graphics noises are very useful. Noise algorithms take an input and generate a value, usually a floating-point value between -1 and 1. First of all, procedural terrain generation requires pseudorandom noise. This means that the random function always returns the same value when the same inputs are provided. Mainly position coordinates are used as inputs. For example, when generating terrain and then placing some assets, for given coordinates (x, y, z) there is always a tree right there; when the player revisits this area, they always see the same landscape. The second very desirable feature of noise, when creating terrain, is smoothness. Noise is considered smooth when neighboring values do not change dramatically or, in other words, noise is smooth when the function it generates is differentiable at any given point. However, at least one algorithm exists that can generate realistic terrain utilizing white noise. Nevertheless, it cannot be used to generate heightmaps in a quadtree-based level of detail scheme, due to the lack of continuity between adjacent heightmaps. See Figure 15.
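The pseudorandomness requirement can be met with a coordinate hash: the same integer coordinates and seed always map to the same value in [-1, 1). A sketch (the multiplier constants are arbitrary large odd numbers of my choosing, not taken from any particular reference):

```python
def hash_noise(x, y, seed=0):
    """Deterministic pseudorandom value in [-1, 1) for integer (x, y):
    the same coordinates and seed always give the same value."""
    n = (x * 374761393 + y * 668265263 + seed * 144305901) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    n ^= n >> 16
    return n / 2147483648.0 - 1.0  # map [0, 2^32) onto [-1, 1)
```

Each mixing step is invertible on 32-bit values, so distinct inputs give distinct outputs; interpolating between such raw lattice values is what later yields smooth, terrain-friendly noise.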


Figure 15 Heightmap generated using the Diamond-Square algorithm

The most commonly used noise for creating procedural content is Perlin noise. It was developed in 1983, because Perlin was not satisfied with the "machine-like" look of computer graphics while taking part in the creation of the movie Tron. While creating his noise, Perlin probably also created the first shader language. Ken Perlin received multiple awards for his work, including an Academy Award in 1997 [15]. The details about the math and the original source code can be found on the author's webpage and blog [15][16]. The idea is to generate the slopes directly, and infer height values from that, rather than generate height values and interpolate slopes. The random numbers we are going to generate will be interpreted as random gradients, i.e. the steepness and direction of the slopes. This kind of random initialization of an array is called gradient noise. [18] An example of Perlin noise in comparison with white noise is shown in Figure 16. The original Perlin noise implementation includes two and three dimensions. For a flat heightmap 2D noise can be used, but for planetary terrain 3D noise has to be used, because the point at given coordinates is offset by the value of the noise along the radius of the sphere. There is also a 4D implementation, which can be used to animate surfaces like oceans; in this case three dimensions are coordinates in space and the fourth is time, resulting in a four-dimensional input vector (x, y, z, time).
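The gradient idea can be illustrated in one dimension: a deterministic random slope is stored at every integer lattice point, and heights between lattice points are blended with Perlin's quintic fade curve (a simplified sketch, not Ken Perlin's reference implementation; the hash constants are arbitrary):

```python
import math

def grad(ix, seed=0):
    """Deterministic pseudorandom slope in [-1, 1) at lattice point ix."""
    n = (ix * 374761393 + seed * 668265263) & 0xFFFFFFFF
    n = ((n ^ (n >> 13)) * 1274126177) & 0xFFFFFFFF
    return ((n ^ (n >> 16)) / 2147483648.0) - 1.0

def fade(t):
    """Perlin's quintic curve 6t^5 - 15t^4 + 10t^3: its first and second
    derivatives vanish at t = 0 and t = 1, keeping the result smooth."""
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)

def gradient_noise_1d(x, seed=0):
    """Heights are inferred from the random slopes at the two
    surrounding lattice points, blended with the fade curve."""
    x0 = math.floor(x)
    t = x - x0
    g0 = grad(x0, seed) * t              # contribution of the left slope
    g1 = grad(x0 + 1, seed) * (t - 1.0)  # contribution of the right slope
    return g0 + (g1 - g0) * fade(t)
```

The noise is exactly zero at every lattice point and varies smoothly in between, which is the continuity property the quadtree heightmaps depend on.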


Figure 16 Perlin noise on the left and white noise on the right. Source: wikipedia.org

To control Perlin noise, three parameters are used: amplitude, frequency and the number of octaves. As said before, noise functions generally return values in the range -1 to 1, to match trigonometric functions. Amplitude is simply a factor by which the generated noise value is multiplied. See Figure 17 for examples of varying amplitude.

Figure 17 Increasing amplitude from left to right

Frequency is a factor by which the noise input is multiplied. In other words, increasing the frequency increases the speed of travel through noise space: with increasing frequency, the same seeds sample the noise space further away from each other. See Figure 18.


Figure 18 Increasing frequency from left to right

The octave count is how many times noise is generated at the current coordinates, with different amplitude and frequency, and added to the final result. Lacunarity is closely connected with octaves, because it is the factor determining how much the amplitude decreases and the frequency increases per loop. Common practice is to set the lacunarity near 2. Octaves are shown in Figure 19 and lacunarity in Figure 20.

Figure 19 Increasing octave count from left to right


Figure 20 Increasing lacunarity from left to right

For optimization purposes, in the case of animated surfaces, it is better to use so-called Fourier noise, which is the result of applying a 2D Fourier transform to randomly generated values on a 2D grid, applying a filter in the frequency domain, and using the inverse Fourier transform to create a heightmap. A heightmap created this way is perfectly tileable, which means that it can be instanced, saving computation power. This feature is shown in Figure 21.
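A sketch of the Fourier approach with NumPy: random values are transformed to the frequency domain, attenuated with a 1/f-style low-pass filter, and transformed back. Because the inverse FFT produces one period of a periodic signal, the resulting heightmap tiles seamlessly. The filter exponent and grid size are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)        # fixed seed -> deterministic result
n = 64
noise = rng.standard_normal((n, n))    # white noise grid
spectrum = np.fft.fft2(noise)          # to the frequency domain
fy = np.fft.fftfreq(n)[:, None]
fx = np.fft.fftfreq(n)[None, :]
freq = np.sqrt(fx * fx + fy * fy)
freq[0, 0] = 1.0                       # avoid dividing by zero at DC
spectrum /= freq ** 1.5                # damp high frequencies (1/f filter)
heightmap = np.real(np.fft.ifft2(spectrum))
```

Because the signal is periodic, copies of `heightmap` placed side by side join without seams, which is exactly what allows instancing a single tile many times.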

Figure 21 A single heightmap generated with Fourier transforms on the left, and the same heightmap instanced several times on the right.

6. GPU and CPU architecture comparison

The Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) are two components of a modern personal computer. Their original usages were different, as their names suggest. The CPU is the brain of the computer, the central management and processing unit. It has fast access to Random Access Memory (RAM). It does not have a single main purpose, since it provides capabilities to do various things: getting input from the user, mathematical operations, displaying results on a screen, etc. Since the first graphics chips were introduced in the 1970s [16], their only task was to display an image on the user's screen, which is a parallel, asynchronous task, as pixels do not need to know about each other; only in some cases are the nearest pixels taken into consideration (e.g. anti-aliasing). This usage of the GPU forced manufacturers to introduce some architectural approaches which make the GPU different from the CPU. The first main difference is that the CPU is low latency, low throughput², while the GPU is high latency, high throughput. This means that GPUs are designed for tasks that can tolerate latency, and CPUs are designed to minimize latency, for example for user input. To do so, the CPU needs more cache, while the GPU does not, so more transistor area can be dedicated to computational power. To be efficient, the GPU also needs high throughput. What is more, the GPU can have more ALUs (Arithmetic Logic Units) on the same-sized chip, which means more threads (modern GPUs run tens of thousands of threads concurrently). This is shown in Figure 22.

Figure 22 CPU and GPU architecture comparison [17]

The GPU is designed to run specific types of threads. These threads are independent of each other, so there are no synchronization issues. To reduce the hardware overhead of thread management, the Single Instruction Multiple Data (SIMD) approach is used, which means that one instruction runs over different data. Also, because the GPU is able to run blocks concurrently, block programming is used (e.g. one pixel shader per draw call, or per group of pixels). On the GPU, thread management and scheduling is done by the hardware. In comparison, the CPU runs tens of tasks with different instructions which are mapped to multiple threads. Each thread is managed and scheduled explicitly and has to be programmed individually. Unlike GPU threads, which are lightweight, CPU threads are considered heavyweight [17]. On the other hand, because of the lack of sophisticated branch predictors, a lot of computational power is lost when branching, compared to the CPU, so when writing a GPU program (shader), branches (if statements) should be avoided and, if possible, replaced with math. Also, because the GPU is not directly connected to RAM, data transfers should be minimized or done in big batches. To summarize, the main difference is that the CPU is more universal and faster in serial operations, due to its much faster clock speed and RAM access, while the GPU is better optimized for parallel computation. In the case of generating procedural terrain, it seems that the GPU should be the much better choice.

² throughput – the amount of work done in a given amount of time

7. Shaders

In this thesis OpenGL will be discussed, but the DirectX environment is somewhat similar. The Graphics Processing Unit is programmable through shaders. A shader is a GPU program that runs in the SIMD fashion explained in the previous chapter. Shaders communicate with the CPU side through buffer objects. There are a couple of different types of shaders, and most of them are incorporated into the rendering pipeline. The complete rendering pipeline can be seen in Figure 23.

Figure 23 Full rendering pipeline. Source: RasterGrid Blog

Not all shaders have to be provided in order to create a rendering pipeline; only the vertex and fragment shaders are mandatory. All shaders which are part of the rendering pipeline have strictly defined behavior, inputs and outputs. Recently that changed with a new type of shader called the compute shader. It is the first general-purpose shader that can utilize the GPU's parallel power for non-graphics-related tasks, and it is highly configurable. All shaders are written in a C-like language called the OpenGL Shading Language (GLSL). To start GPU programming, a GPU program first has to be created. In order to do that, the shader sources have to be compiled and linked with no errors. For the benchmarks the LoadShaders() function was used, which was introduced in the official OpenGL Programming Guide. Figure 24 shows an example of shader loading. Note that there are five slots to fill with shaders; this is due to the maximum of five configurable shaders in the rendering pipeline.

Figure 24 Example of loading a shader; in this case a compute shader is loaded

Next, buffers have to be defined and filled with data. A flexible and large buffer type is the Shader Storage Buffer Object (SSBO). It can store primitive data types or structures. To create a buffer, it first needs to be generated, its memory has to be allocated, and it is then mapped if the SSBO will store structured data. To do so, the layout of the structured data has to be provided both in the shader and on the CPU side. See the example in Figure 25.


Figure 25 Example of a properly configured SSBO

Note that the Particle struct contains a member called tab. This is because buffer members should be aligned to four times 32 bits (16 bytes); this alignment gives better performance and is sometimes necessary for the buffer to work properly. To use an SSBO it first needs to be bound, which is done by calling glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, Buffer). To retrieve the data in a shader, the same data layout and the correct binding point have to be provided. Accessing an SSBO is done via a C-like array; see Figure 26 for a complete compute shader. Before running a GPU program, OpenGL has to be informed which program will be used, by calling glUseProgram(ProgramName). Because launching a program differs according to its type, this subject is discussed in the following chapters.


Figure 26 Particle update compute shader

8. Test subjects

8.1. Vertex shader

The vertex shader is the first and a mandatory shader in the rendering pipeline. Its main purpose is to transform the stream of vertices that makes up a model. This shader is driven by draw calls, which invoke the rendering pipeline beginning with the vertex shader. The most common task done by this shader is to transform the vertices of a model from model space to clip space. This is done by multiplying each vertex and its properties (e.g. normal vectors) by the projection, view and model matrices. These matrices have to be provided in buffers (e.g. an SSBO). The vertex and its properties are then streamed in packages, depending on the drawing mode (in the case of triangles, three vertices make one polygon), to the next stages of the rendering pipeline; in the most basic case straight to the fragment shader, which operates on pixels. Between the vertex shader and the fragment shader, per-polygon interpolation occurs, which makes more realistic lighting in the fragment shader possible.

As said before, the rendering pipeline is driven by draw calls. A draw call can be indexed, instanced, indirectly invoked and more. The type of draw call also determines how the built-in vertex shader variables behave. The most basic and important variant, in terms of reducing memory usage and improving visuals, is the indexed draw call. Before the draw call, special buffer bindings have to be made to provide the necessary information about the Vertex Buffer Object (VBO); an SSBO can also be bound as a VBO. A draw call with the necessary bindings is shown in Figure 27.

Figure 27 glDrawElements is a draw call


The most basic yet functional vertex shader is shown in Figure 28. Note that the height of the current heightmap vertex is used to generate its color (line 19). Figure 29 shows the fragment shader and the resulting heightmap.

Figure 28 Basic vertex shader

Figure 29 Fragment shader (top) and resulting heightmap (bottom)


The vertex shader can also be used to modify the incoming model's geometry. One example of this kind of usage is an ocean simulation, which is also included in the following benchmarks. Because the normal of the heightmap is known, each vertex can be offset along it with pseudo-random noise (e.g. Perlin noise).

8.2. Compute shader

The compute shader is a general-purpose program that runs on the GPU. It does not have specified inputs or outputs; however, it contains built-in variables that identify the current shader invocation. An illustration explaining compute shader invocation indexing, taken from the official OpenGL Reference Guide, is shown in Figure 30.

Figure 30 Compute shader invocation indexing mechanism. Source: Khronos Group

A compute shader program is very similar to rendering programs: it also communicates with the CPU side via buffers, and the creation process is the same. The main differences are in how a compute program is dispatched and in the fact that it produces no drawing on the user's screen. Figure 31 shows binding buffers and dispatching a compute shader; Figure 32 shows the compute program that was dispatched.

Figure 31 glDispatchCompute is the function that runs a GPU compute program


Figure 32 Source code of the compute program that polygonises the heightmap

It can be seen in line 14 that compute program invocations can be indexed in three dimensions: the current workgroup index, the invocation index within the workgroup and the global invocation index are all provided as 3D vectors. Line 14 of the shader's source code in Figure 32 also shows the workgroup layout declaration; in this case the workgroup is ten by ten. Shared memory is available between invocations within the same workgroup. Because the compute shader's behavior is not predefined, it is up to the user to take care of synchronization with the provided memory barrier functions.

9. Benchmark motivation and development

9.1. Motivation

A planetary terrain would require a massive amount of storage to be kept on the user's hard disk, and downloading the data from online storage would require a very fast internet connection. The Outerra engine struggles with this problem: it sometimes takes several minutes to download data from Google Earth. Most importantly, modelling such a massive terrain by hand would require an enormous amount of time and human resources. This is why this thesis discusses fully procedural planetary terrain.

In the case of the planetary terrain that I am currently developing, which is supposed to host a real-time strategy (RTS) game, interplanetary travel and switching between planets (e.g. to give commands to troops on another planet) will happen very often. What is more, to provide players with a unique experience every time they start a new game, whole planets have to be generated on the fly from a unique seed, not stored on disk. This is why the speed of generating planets is very important: to provide a seamless experience, it should be done as fast as possible. My planetary terrain is based on a quadtree level of detail that holds heightmaps, which is why I am aiming to find the fastest way to generate a heightmap. The tested heightmaps are square, because squares are convenient to use in a quadtree LOD and easy to make spherical (a spherified cube). Moreover, the planetary RTS will be based on a GPU-computing, CPU-management approach to handle unit behavior, which is another reason to test which kind of GPU program is faster at non-drawing tasks.

Despite the fact that realistic procedural planetary terrain is quite a new concept, there are some established practices in procedural generation. For example, generating an ocean surface is commonly done in the vertex shader, which is no surprise, since it is convenient to change vertex properties (normal or position) just before drawing them on the screen. Generating noise on the GPU side is not the only solution; it is also possible to generate the heightmap on the CPU side and transfer the data to the GPU only for drawing. This seems like a good idea, since collision detection and physics are often handled by the CPU. But in the case of a planetary RTS, where the forecast number of units goes beyond tens of thousands, it is clear that the mesh colliders provided in popular engines (e.g. Unity) will not be a good pick for this task. This is another reason to keep computation on the GPU and handle unit behavior using position-based dynamics.

9.2. Development

The CPU and GPU communicate with commands. This makes it very hard to measure the execution time of GPU programs, because a function issuing GPU commands can return before the GPU has finished its task. This is why measurement in FPS was chosen. At first the well-known FPS measuring application FRAPS was used, but because the FPS readings were not stable, it was necessary to calculate an average FPS. That is why the benchmarks accumulate all frames during a test and divide the count by the total execution time. The correctness of the measurement was confirmed by comparing the results with FRAPS.

Despite the fact that the GPU architecture, because of its parallelism, seems ideal for generating procedural heightmaps, the first benchmark implementation also included CPUs. It used the same mathematics library (GLM), which is recommended and compatible with GLSL. Perlin noise was ported from the GLSL version to the CPU and tested. The results were so bad that a progress bar had to be added during generation just to make sure the application was not frozen: generating a 100x100 heightmap took about 30 seconds. Even though the generation was single-threaded and could potentially still be sped up four (up to eight, depending on the CPU) times, the benchmark was dropped, with the conclusion that the heavily parallel nature of the task needs a parallel architecture to handle it in bearable time.

Next, the benchmarks were deployed and the results were astonishing: vertex shaders were up to ten times faster than compute shaders. See the discussed disproportion in Figure 33.


[Chart: heightmap size 1000x1000; FPS (logarithmic, 1 to 1000) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader with workgroup size 1x1]

Figure 33 First benchmark results. Note that the Y axis is logarithmic

At first this is not surprising, since compute shaders are the newest graphics-API concept. While they use the same hardware (shader units), injecting them into existing rendering software could be expensive. What is more, there is dedicated GPU-computing software, so manufacturers could deliberately make OpenGL compute shaders slower by limiting third-party API access to the hardware (at the driver layer). Since all of the planetary engine's computing was based on compute shaders, it was clear that this was a bottleneck that had to be dealt with. The first solution that came to my mind was to change every GPU compute program from a compute shader to a vertex shader; however, this would be an extremely big change across many shaders and application source files, so before taking that step more investigation was needed.

The OpenGL documentation describes the differences between the compute shader and the vertex shader, covered in the previous chapter. Apart from the indexing and interpolation mechanisms, one difference is that the compute shader has shared memory within a workgroup. Since procedural heightmap generation is embarrassingly parallel and does not need information about neighbors, sharing memory within a workgroup is not important for performance here. All my compute shaders were dispatched with a workgroup size of 1 so they could be controlled from the CPU side. Still, tweaking the workgroup size sped up the computation, and a two-dimensional ten-by-ten workgroup appears optimal in terms of speed and controllability from the CPU side. See Figure 34 for a comparison.


[Chart: heightmap size 1000x1000; FPS (logarithmic, 1 to 1000) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader with workgroup size 10x10, Compute Shader with workgroup size 1x1]

Figure 34 Comparison of the vertex shader and two compute shaders with different workgroup sizes

After successfully removing this obvious bottleneck, all benchmarks were deployed again on different test machines with different GPU drivers installed. See Figures 35, 36 and 37 for examples. All benchmark results are provided in Attachment A.

[Chart: heightmap size 1000x1000; FPS (0 to 400) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

Figure 35 Heightmap generation results for nVidia drivers


[Chart: heightmap size 1000x1000; FPS (0 to 300) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

Figure 36 Heightmap generation results for AMD drivers

[Chart: heightmap size 1000x1000; FPS (0 to 40) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

Figure 37 Heightmap generation results for Intel drivers

Since the most important information is the performance gain of compute shaders relative to vertex shaders, rather than raw FPS, this kind of results visualization had to be changed to obtain more precise information. Also, to remove the CPU's contribution to the results, the range of heightmap sizes was changed.


10. Benchmarks

Two of the following benchmarks operate on a heightmap. To generate the surface, 4D Perlin noise is used; the size of the heightmap and the octave count are the variables. The octave count indicates nothing more than how many times the noise is evaluated at the current coordinates, and the maximum octave count in all benchmarks is 64. When generating a real heightmap, the noise is evaluated at the current coordinates several times not only to create a more interesting surface but also to create normal vectors. To do that, the same noise has to be evaluated at the current coordinates and at two other locations as close as possible; taking the cross product of the vectors created from the three resulting positions gives the normal. Figure 38 shows the difference between octave counts on the tested heightmaps.

Figure 38 From left: 2 octaves, 4 octaves and 8 octaves were used to create the surface

These benchmarks, however, omit shading, to ensure that only generation performance is measured. Each benchmark runs for 65 seconds. After each full frame, including the GPU computation, the frame count is increased by one and the total loop time is increased by the frame time. At the end, the total frame count is divided by the total loop time to calculate the frames per second (FPS). The full benchmark loop is shown in Figure 39.

Tested driver versions:

    Name             Driver version
    AMD              18.1.1
    nVidia (mobile)  385.54
    Intel (mobile)   22.20.16.4749

Note that these benchmarks should only be used to compare the performance of particular shaders, not to benchmark the hardware itself. How a GPU program is interpreted by the drivers, and how the hardware works, differ among manufacturers. In this case not even the newest GPUs were tested, and not all from the same platform (desktops and laptops). I want to thank patriciogonzalezvivo for hosting the GLSL noise algorithms on his GitHub page; for these benchmarks the 4D Perlin noise implementation by Ian McEwan was used. All source code (with shaders) used for the following benchmarks can be found on my GitHub: github.com/Kedriik/Benchmark

Figure 39 Benchmarks main loop

10.1. Heightmap generation benchmark

10.1.1. Introduction

The result of this benchmark is a heightmap stored on the GPU side as a texture. This means the heightmap can be accessed from any shader quickly and conveniently, using a 2D vector as the index in the imageLoad() function. It can be retrieved on the CPU side as a 1D or 2D array, but this kind of data transfer should be done rarely and in big batches to avoid a potential bottleneck. Note that "GPU side" does not necessarily mean the data is stored in GPU memory, because it is up to the driver to store it in RAM or GPU memory. Since these benchmarks are all about the speed of generating procedural heightmaps, memory usage will not be investigated further. In this benchmark texture reading and writing are heavily exercised, because the heightmap is generated, polygonised and stored to a texture every frame. Here the vertex shader behaves like a compute shader, with no drawing after heightmap generation; however, to compile the rendering pipeline the vertex shader is part of, a fragment shader also has to be provided. A little "hack" was used: a fragment shader was indeed provided, but it returns with no effect on the user's screen.


10.1.2. Results

Figure 40 shows the benchmark results for AMD. It is clear that compute shader performance is much better than vertex shader performance; moreover, the more calculation there is, the bigger the difference becomes, going beyond twice as many frames generated by the compute shader.

[Chart: compute shader performance relative to vertex shader, in percent (185 to 205%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 40 AMD benchmark results

Figure 41 shows the results for nVidia drivers. In this case the compute shader is slower; under the heaviest load compute shaders reach about 80% of vertex shader performance.

[Chart: compute shader performance relative to vertex shader, in percent (0 to 180%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 41 nVidia benchmark results

Figure 42 shows the results for Intel drivers. The compute shader is still faster than the vertex shader, but the difference becomes smaller with more data to calculate.

[Chart: compute shader performance relative to vertex shader, in percent (100 to 150%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 42 Intel benchmark results

The benchmarks show that the octave count matters more for the performance difference than the heightmap size, and that the difference probably converges to a certain value. To check the convergence value, the benchmark was deployed again with the biggest heightmap size, with octaves now ranging from 70 to 95. Also, to stay within the 32-bit float range (the most commonly used in graphics), the lacunarity3 had to be changed from the classic 2 to 1.5. In Figure 43 the AMD results show that the performance gain limit is about 200%. For nVidia drivers, however, there is no performance gain from using the compute shader, and it settles at about 80% of vertex shader performance; see Figure 44. For Intel drivers, compute shader and vertex shader performance are similar, with a few percent advantage for the compute shader; see Figure 45.

3 Lacunarity – a noise property that determines how quickly the frequency increases and how fast the amplitude decreases.

[Chart: compute shader performance relative to vertex shader, in percent (200 to 210%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 43 AMD results

[Chart: compute shader performance relative to vertex shader, in percent (77 to 87%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 44 nVidia results


[Chart: compute shader performance relative to vertex shader, in percent (100 to 112%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 45 Intel results

10.2. Real-time heightmap generation benchmark

10.2.1. Introduction

Real-time heightmap generation is useful for animated heightmaps such as an ocean surface. The result is not stored, because it changes every frame and the current state of the animation is not connected with the previous one. To improve performance, it is a good idea to generate one heightmap and instantiate it around the player. As said in a previous chapter, Fourier-based noise is convenient here and gives realistic results; however, in these benchmarks the same 4D Perlin noise was used, to ensure comparable results across all benchmarks. This benchmark tests VBO reading and writing with no texture object access. Here the vertex shader has at least one advantage: because modifying the VBO is incorporated into the rendering pipeline, there is less code to maintain.

10.2.2. Results

Figure 46 shows the AMD results. The compute shader is much faster than the vertex shader, and the difference can go up to nine times as many frames generated.


[Chart: compute shader performance relative to vertex shader, in percent (200 to 900%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 46 AMD benchmark results

Figure 47 shows the benchmark results for nVidia. The compute shader is faster, and the difference becomes more significant with more data to calculate, going up to twice as many frames generated.

[Chart: compute shader performance relative to vertex shader, in percent (100 to 220%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 47 nVidia benchmark results


Figure 48 shows the results for Intel drivers. In this case the compute shader and vertex shader have similar performance.

[Chart: compute shader performance relative to vertex shader, in percent (70 to 115%), vs. octaves 2 to 64; series: heightmap sizes 600x600, 700x700, 800x800, 900x900, 1000x1000]

Figure 48 Intel benchmark results

As before, to find convergence values, the benchmarks were run again with the biggest heightmap and the modified lacunarity and octave range. For AMD, whose results are shown in Figure 49, the performance gain from the compute shader is about nine times as many frames generated. Figure 50 shows that the nVidia performance gain settles at about twice as many frames generated. Intel, in Figure 51, as before, shows no difference in shader performance.


[Chart: compute shader performance relative to vertex shader, in percent (870 to 950%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 49 AMD results

[Chart: compute shader performance relative to vertex shader, in percent (198 to 210%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 50 nVidia results


[Chart: compute shader performance relative to vertex shader, in percent (90 to 120%), vs. octaves 70 to 95; heightmap size 1000x1000]

Figure 51 Intel results

10.3. GPU-driven gravity simulation

This benchmark is not directly related to procedural heightmap generation, but because of its O(n²) complexity it is able to push GPUs to their limits. This kind of computation could also drive the planetary RTS physics engine, and it is aesthetically amusing after applying a density post-effect; see Figure 52. However, it showed no difference in performance between vertex shaders and compute shaders.

Figure 52 Newton's gravity simulation with more than 10000 objects


11. Tests summary

This thesis was about finding the best way, in terms of speed, to generate procedural heightmaps. This is one of the most important features, since the motivation for these tests is a planetary RTS game that requires terrain generation to be as fast as possible. The objectives were more than fulfilled: not only were performance gains between shaders computed, but a bottleneck was also found and dealt with.

The first benchmarks showed that the implemented compute shaders, which were responsible for all procedural generation, were not fast enough compared to vertex shaders. After investigation and parameter tweaking, it was clear that the workgroup size was the issue. Even though generating a procedural heightmap is an embarrassingly parallel task, the workgroup size should be bigger than one; in this case a two-dimensional ten-by-ten workgroup proved optimal in terms of performance and controllability from the CPU.

The heightmap generation benchmarks show that only on AMD drivers is the compute shader significantly faster than the vertex shader, where under heavy load it is up to two times faster. Generating heightmaps on the fly (e.g. for ocean surface simulation), however, settles at nine times as many frames generated. It is clear that compute shaders and vertex shaders do not have the same texture-access priority, which is no surprise, since vertex shaders are more likely to use textures than compute shaders. The nVidia results also clearly show that the compute shader and vertex shader are not the same software, even though they use the same hardware: the benchmark with a lot of texture access settles compute shader performance at about 80% of the vertex shader, while in the benchmark without texture access the compute shader is again about two times faster. Intel is quite different: in both benchmarks the convergence value is about 100%, which means no performance gain.

At this point it can be assumed that, thanks to the long development history at AMD and nVidia, these manufacturers have implemented special mechanisms for accessing or storing texture objects that speed up the rendering pipeline. Intel graphics are integrated with the CPU, and it seems there is no special memory mechanism for managing textures there, and that the compute and vertex shaders are basically the same software.

Compute shaders, despite being quite a new concept and working outside the rendering pipeline, can be easily integrated into it. They provide an excellent data-indexing mechanism and shared variables within a workgroup, which reduces the code to maintain, speeds up implementation and reduces the difficulty of implementing an algorithm. Beyond the direct shader comparisons, the benchmarks helped to find a bottleneck in the planetary engine and showed that a properly configured compute shader is as fast as, or even faster than, a vertex shader. The improved lighting and heightmap resolution, resulting in much better visuals, are shown in Figure 53.


Figure 53 Planetary engine with improved lighting


12. Bibliography

1. Wikipedia contributors, "Minecraft", Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/Minecraft , 22.1.2018
2. Wikipedia contributors, "Fuel (video game)", Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/Fuel_(video_game) , 22.1.2018
3. Gamepedia contributors, "World border", Minecraft Wiki, https://minecraft.gamepedia.com/World_border , 22.1.2018
4. Paul Bourke, "Polygonising a scalar field", http://paulbourke.net/geometry/polygonise/ , 22.1.2018
5. Wikipedia contributors, "No Man's Sky", Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/No_Man%27s_Sky
6. Innes McKendrick, "Continuous World Generation in 'No Man's Sky'", https://www.gdcvault.com/play/1024265/Continuous-World-Generation-in-No , 22.1.2018
7. Brano Kemen, Laco Hrabcak, Outerra, http://www.outerra.com/ , 22.1.2018
8. Brano Kemen, Laco Hrabcak, Outerra blog, http://outerra.blogspot.com/ , 22.1.2018
9. Paul Tassi, "'Star Citizen' Lumbers Into 2017 With $141 Million In Crowdfunding", https://www.forbes.com/sites/insertcoin/2017/01/09/star-citizen-lumbers-into-2017-with-141-million-in-crowdfunding/#589c6e382ebc , 22.1.2018
10. How-To Geek, "Which Video Game Was the First to Feature Procedural Generation?", https://www.howtogeek.com/trivia/which-video-game-was-the-first-to-feature-procedural-generation/ , 22.1.2018
11. Wikipedia contributors, "Spore (2008 video game)", Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/Spore_(2008_video_game) , 22.1.2018
12. Wikipedia contributors, "Mipmap", Wikipedia, the free encyclopedia, https://en.wikipedia.org/wiki/Mipmap , 22.1.2018
13. Ken Power, "Level of detail", http://glasnost.itcarlow.ie/~powerk/3DGraphics2/Theory/Levelofdetail.htm , 22.1.2018
14. Ken Perlin, "Making Noise", https://web.archive.org/web/20160308022101/http://noisemachine.com:80/talk1/index.html
15. Ken Perlin, "Improved Noise reference implementation", http://mrl.nyu.edu/~perlin/ , 22.1.2018
16. James Hague, "Why Do Dedicated Game Consoles Exist?", http://prog21.dadgum.com/181.html , 22.1.2018
17. Ashu Rege, "An Introduction to Modern GPU Architecture", http://download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf
18. Noor Shaker, Julian Togelius and Mark J. Nelson (2016), Procedural Content Generation in Games: A Textbook and an Overview of Current Research, Springer, ISBN 978-3-319-42714-0.
19. Dale Green (2016), Procedural Content Generation for C++ Game Development, Packt Publishing, ISBN 9781785886713.


ATTACHMENT A

nVidia benchmark results:

[Chart: heightmap size 100x100; FPS (200 to 600) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 600) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 400) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

nVidia ocean simulation benchmark results:

[Chart: heightmap size 100x100; FPS (0 to 700) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 600) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 500) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

AMD benchmark results:

[Chart: heightmap size 100x100; FPS (0 to 3500) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 1000) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 300) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

AMD ocean simulation benchmark results:

[Chart: heightmap size 100x100; FPS (0 to 3500) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 1200) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 350) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

Intel benchmark results:

[Chart: heightmap size 100x100; FPS (0 to 160) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 80) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 40) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

Intel ocean simulation benchmark results:

[Chart: heightmap size 100x100; FPS (0 to 150) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 500x500; FPS (0 to 100) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]

[Chart: heightmap size 1000x1000; FPS (0 to 35) vs. octaves 2 to 64; series: Vertex Shader, Compute Shader]