DEGREE PROJECT

Implementation of DirectX 11 Rendering Engine in Nebula

Gustav Sterbrant

Bachelor of Science in Engineering Technology Computer Game Programming

Luleå University of Technology Department of Computer Science, Electrical and Space Engineering Implementation of DirectX 11 rendering engine in the Nebula game engine

Gustav Sterbrant, gsCEPT, Luleå University of Technology, Campus Skellefteå

May 19, 2012

1 Sammanfattning

This thesis describes how to port a rendering engine from DirectX 9 to DirectX 10/11, while at the same time describing the advantages of upgrading. The thesis goes through the extensive change from DirectX 9.0c to version 10, which changes were made, and how to adapt an application so that it makes use of all the new features. Later, we examine the new features in DirectX 11 and how they work. Finally, we see how an actual game engine, more precisely Nebula, can be redesigned to use the new DirectX standard. We also describe the new Application Programming Interfaces, or APIs, that arrived with DirectX 11. First, we describe the DirectX Graphics Infrastructure, abbreviated DXGI, which is used to handle adapters and output devices such as a computer monitor. We describe DirectWrite, and how it can be used together with Direct2D to render text. We also describe shader model 5.0, what is new compared to shader model 3.0, and how to write a shader with the new standard.

2 Abstract

This paper seeks to explain how to port an application rendering with DirectX 9 to DirectX 10/11, whilst at the same time explaining why this upgrade is preferable. The paper will discuss the extensive redesign of DirectX from version 9.0c to version 10, what changes were made, and how one has to redesign one's own application to fully take advantage of the new features. We will also explore the new features in DirectX 11 and how they work. Later, we will see how an actual game engine, more precisely the Nebula game engine, is redesigned to fit the new DirectX standard. We will describe the new Application Programming Interfaces, or APIs, introduced with DirectX 11. We will introduce you to the DirectX Graphics Infrastructure, or DXGI, and explain how adapters are used in newer versions of DirectX to better handle varying hardware. We will investigate how DirectWrite is used together with Direct2D to render text. We will also go in depth to explain the new features in shader model 5.0, and how these can be used to improve the visual quality of your game.

Contents

1 Sammanfattning
2 Abstract
3 Introduction
4 Previous work
5 The DirectX 10 redesign
  5.1 The DirectX Graphics Infrastructure
  5.2 Graphics data
  5.3 Shaders
  5.4 The assembly line
6 The new features in DirectX 11
  6.1 Hull shader
  6.2 Domain shader
  6.3 Compute shader
  6.4 Shader model 5.0
  6.5 Multi-core CPU rendering
7 Porting from DirectX 9
  7.1 Device handling
  7.2 Buffers
  7.3 Textures
  7.4 Vertex declaration
  7.5 Shaders
  7.6 Text rendering
8 Results
9 Discussion
10 Conclusion
References

3 Introduction

With the development of games comes the need to develop the look of the game. The look of a game can be one of its major selling points, and thus it comes naturally that one might want to expand on what the game engine can manage. A new API doesn't automatically give a game better graphics, but the new features enable developers and graphics artists to make games that look better. This paper will explain the new features in DirectX 11, what has changed from DirectX 9 to 10, and how to port an actual game engine.

4 Previous work

The Nebula game engine, developed by Radon Labs GmbH, uses the DirectX 9 rendering API. Radon Labs have constructed a sophisticated rendering engine to take advantage of the features found in DirectX 9, including a sophisticated and modular render path system. The DirectX 9 rendering code is also completely modular, and therefore easily replaceable with a newer implementation. They have also implemented a shader system, where shader variables are shared between effects. This shader system supports features such as deferred rendering and character skinning, to name a few. The shader system works in tight correlation with the render path to support writing render passes and setting up render pass variables using XML.

5 The DirectX 10 redesign

When DirectX version 10 was developed, its developers decided to make a complete remake of the infrastructure of the DirectX rendering API. Even though this forced rendering engines to be rewritten to support DirectX 10, it also opened up for new features and produced much cleaner and more manageable rendering code.

5.1 The DirectX Graphics Infrastructure

One major difference between DirectX 9 and 10 is the way that devices and render contexts are separated. Before, you would have a DirectX 9 object, which could handle your adapters and check for supported DirectX features. This object does not exist in DirectX 10, as it is replaced by the DirectX Graphics Infrastructure, or DXGI [1]. Instead, you have a DXGIAdapter to check for support, seeing as the adapter itself has nothing to do with the actual DirectX context. The DXGIAdapter can then be queried for display modes, which allows you to find out whether or not your graphics card can support the suggested format.

Also, instead of having all of the information in the actual DXGI object, one has to create a descriptor and then request the information. A descriptor is basically a struct containing bundled data. For example, the DXGI_MODE_DESC contains the data relevant to describe a display mode, such as refresh rate, width, height, scaling and format. When the appropriate object is queried with its GetDesc-style function, in this case the DXGIOutput's GetDisplayModeList, the DXGI_MODE_DESC is filled out with the data describing a single display mode.

Device information is no longer accessed via DirectX in version 10. In version 9, one could query the DirectX display device to get information about the device name and driver. In DirectX 10, this data is no longer accessible, neither through DXGI nor ID3D10. Instead, one has to rely on the fact that checking for supported features will suffice. Also, DirectX 10 allows for backwards compatibility with cards that don't support DirectX 10. This ensures that the features requested will be supported by the hardware.
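As a minimal sketch of how this looks in practice (assuming the first adapter and output, with error handling omitted), enumerating display modes through DXGI might be written as follows:

#include <dxgi.h>
#include <vector>
#pragma comment(lib, "dxgi.lib")

std::vector<DXGI_MODE_DESC> EnumerateModes()
{
    IDXGIFactory* factory = nullptr;
    CreateDXGIFactory(__uuidof(IDXGIFactory), (void**)&factory);

    IDXGIAdapter* adapter = nullptr;
    factory->EnumAdapters(0, &adapter);            // first adapter (graphics card)

    IDXGIOutput* output = nullptr;
    adapter->EnumOutputs(0, &output);              // first output (monitor)

    // Ask how many modes exist for this format, then fetch the descriptors.
    UINT numModes = 0;
    output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &numModes, nullptr);
    std::vector<DXGI_MODE_DESC> modes(numModes);
    output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &numModes, modes.data());

    output->Release();
    adapter->Release();
    factory->Release();
    return modes;    // each entry holds width, height, refresh rate, format and scaling
}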

5.2 Graphics data

The way graphics data is stored has been generalized into a much more intuitive system. There is no longer any difference between an index buffer, a vertex buffer, or a shader variable buffer; they are all treated as DirectX 10 data buffers [2]. The API knows what kind of buffer you want your data buffer to be used for through a flag given when binding the buffer. This pattern is very much like the OpenGL way of creating data buffers.

Textures, however, have been more specifically divided into Texture1D, Texture2D, and Texture3D. Instead of having, for example, a cube texture as in DirectX 9, one handles it like an array of 6 Texture2D textures, giving you specific control over each and every surface of the texture cube. This also allows you to target specific images as render targets, seeing as you get to handle every single layer of the texture as an individual texture.

Every texture consists of subresources, which is also new for DirectX 10 [3]. A Texture2D can consist of a single subresource, in which case you have a texture without any mip levels. If a 2D texture consists of several subresources, then you have that many mip levels in your texture. A subresource can be modified, switched, and even removed, giving the user complete control over a texture and its mip-maps.

To allow a texture to be used as a render target, one has to assign a render target view to a specific texture. This render target view can then be set as the back buffer, or as one target in a multiple render target pass. To then set what to display, the user specifies which texture, or render target view, should be used as the back buffer using the DXGI swap chain object.
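A minimal sketch of the pattern described above, assuming an existing ID3D11Device and omitting error handling, could look like this:

#include <d3d11.h>

void CreateColorTarget(ID3D11Device* device, UINT width, UINT height,
                       ID3D11Texture2D** texture, ID3D11RenderTargetView** rtv)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;                                  // a single subresource, no mip chain
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE;

    device->CreateTexture2D(&desc, nullptr, texture);    // no initial data
    device->CreateRenderTargetView(*texture, nullptr, rtv);
}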

5.3 Shaders

The DirectX Effects library is no longer a part of the standard library. It is now available as one of the DirectX SDK samples, so to access the features of the DirectX 11 effects library, one has to compile it separately. With DirectX 10 came shader model 4.0, and with shader model 4.0 came a handful of changes. For example, the previous VPOS shader semantic, which would give you the pixel position in screen space, has been renamed SV_POSITION and now represents the center of a pixel rather than a corner of it [4]. DirectX 10 also gave us the geometry shader, a fully programmable new stage of the pipeline, allowing the programmer to alter the mesh by emitting or removing vertices. There were also some changes to the effects library. To set render states for a technique or render pass, one needs to define three different state objects: a BlendState, a RasterizerState and a DepthStencilState. In DirectX 9, the pass itself contained these variables, but with DirectX 10 the developer is allowed to have different blend settings for different render targets. Also, textures are now separate objects, instead of being coupled with samplers. With the new texture object came a couple of new features, such as being able to fetch the texture dimensions from within the shader itself. It is also possible to simply load data from a texture instead of sampling it, which is extremely useful when skinning using a skin data texture.

5.4 The assembly line

When you have all the data you want, all your buffers created, and all your textures and render targets set up, you might want to use them to make a render pass. All of this is handled by the device context. What you need to do is set the buffers, render targets, shaders, output buffers and rasterizer options, and then make a draw call. You can choose to draw indexed, instanced, indexed and instanced, or any of the previous options indirectly. The rendering is then produced to the assigned render targets using the shaders attached in the effect used. The input assembler, or IA, can set multiple buffers simultaneously. This is very much like the stream frequency feature seen in DirectX 9. For example, one can supply two streams for rendering particles: one with static data, that is, position data for a 2D particle, and one stream containing particle-specific information such as rotation, scaling and so on.
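A minimal sketch of such a two-stream draw call, assuming the buffers and input layout already exist (the function and parameter names here are only illustrative):

#include <d3d11.h>

void DrawParticles(ID3D11DeviceContext* context,
                   ID3D11Buffer* quadVertices, UINT quadStride,
                   ID3D11Buffer* perParticleData, UINT particleStride,
                   ID3D11Buffer* indices, UINT indexCount, UINT particleCount)
{
    ID3D11Buffer* buffers[2] = { quadVertices, perParticleData };
    UINT strides[2] = { quadStride, particleStride };
    UINT offsets[2] = { 0, 0 };

    // Slot 0: static quad geometry; slot 1: per-particle rotation, scaling, etc.
    context->IASetVertexBuffers(0, 2, buffers, strides, offsets);
    context->IASetIndexBuffer(indices, DXGI_FORMAT_R32_UINT, 0);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);

    // One indexed draw replicated once per particle instance.
    context->DrawIndexedInstanced(indexCount, particleCount, 0, 0, 0);
}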

6 The new features in DirectX 11

DirectX 11, building on the structure of DirectX 10, gave us two new pipeline stages. DirectX 11 also introduced dynamic binding of interfaces, which allows the programmer to dynamically link a shader and its features without having to write an entirely new shader for each special case.

6.1 Hull shader

With the introduction of DirectX 11 came two more programmable stages of the rendering pipeline [5]. The first one is called the hull shader. This shader provides the developer with the current face and all neighboring faces, which is all the data you need to perform any subdivision algorithm. The next stage is the tessellator, which, much like a geometry shader, generates new vertices. This stage of the pipeline is not itself programmable, but you can customize it by deciding, for example, the tessellation factor. The hull shader needs some special definitions in order to be able to tessellate. The developer has to define what sort of primitive is being input as a patch, whether it's a quad, triangle or line. There is a partitioning parameter which decides how the edges are partitioned. There is also a parameter that defines what kind of output topology is passed to the domain shader; this parameter can be a line, a clockwise triangle, or a counter-clockwise triangle. You can also define the number of control points the hull shader should output. The last parameter defines what patch constant function should be executed to subdivide the surface.

6.2 Domain shader

When the actual tessellation is done, the domain shader is invoked for each new vertex created by the tessellation stage. This stage is fully programmable. Each new vertex is also provided with a UV-coordinate, which is very handy for making a texture lookup. This can be used to look up a value in a displacement map and displace a vertex, allowing for extensive control by a graphics artist. The domain shader can be used for several other features, such as simulating flesh decay, as shown in [6]. The output from the domain shader is then passed to the geometry shader, and is ultimately rendered. The result can be seen in Figure 1.

Figure 1: Tessellation seen in real-time in Unigine using DirectX 11

6.3 Compute shader

The differences between a CPU, or central processing unit, and a GPU, or graphics processing unit, are many. The development of CPUs has mainly focused on increasing the clock speed of the processor, whilst GPUs have developed the ability to execute more and more operations simultaneously. Since graphics processors are designed to work in a stream-like fashion, there can be no interdependence between the data in the threads, which poses a problem for ordinary sequential code of the kind most frequently seen when programming on a CPU. DirectCompute allows a developer to use the graphics processor to execute code that isn't graphics related, and thus harness the power of the graphics card for general purpose programming. The performance difference between GPUs and CPUs can be seen in the picture presented in [7]. DirectX 11 gives us very easy access to write our own DirectCompute shader and bind it to our device context with a single function call.
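A minimal sketch of binding and dispatching a compute shader, assuming the compiled shader and an unordered access view for its output already exist:

#include <d3d11.h>

void RunCompute(ID3D11DeviceContext* context, ID3D11ComputeShader* shader,
                ID3D11UnorderedAccessView* output, UINT groupsX, UINT groupsY)
{
    context->CSSetShader(shader, nullptr, 0);                    // bind the compute shader
    context->CSSetUnorderedAccessViews(0, 1, &output, nullptr);  // bind the output resource
    context->Dispatch(groupsX, groupsY, 1);                      // launch the thread groups

    // Unbind the output so the resource can be used as input elsewhere.
    ID3D11UnorderedAccessView* nullUAV = nullptr;
    context->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);
}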

6.4 Shader model 5.0

Superseding shader model 4.0 is version 5.0. Shader model 5.0 allows you to write and run your own hull, domain and compute shaders, as well as giving you the flexibility and structure of object-oriented programming in HLSL. One of the major features in shader model 5.0 is the ability to allow for dynamic linking of shaders. This lets the developer use interfaces and inheritance to avoid making a completely new shader for each variation one could want. Instead, one can define a base interface and simply override the functions when a variation is wanted. This can be done on the CPU level using the ID3D11ClassLinkage class [8]. For example, one can write a shader that handles vertex transformations and coloring, but is dynamic in such a way that the coloring functions are not fixed. This way the same shader can be used, but with a simple rerouting of the functions within the shader, the shader's output can be something completely different. One example of how this could be used is switching between simple rendering and light-mapped rendering.
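A minimal sketch of the CPU side of dynamic linking, assuming a pixel shader compiled with class linkage; the class instance names "SimpleColor" and "LightMappedColor" are hypothetical and would have to exist in the shader code:

#include <d3d11.h>

void CreateLinkedShader(ID3D11Device* device, const void* bytecode, SIZE_T size,
                        ID3D11ClassLinkage** linkage, ID3D11PixelShader** shader)
{
    device->CreateClassLinkage(linkage);
    device->CreatePixelShader(bytecode, size, *linkage, shader);
}

void BindWithVariation(ID3D11DeviceContext* context, ID3D11PixelShader* shader,
                       ID3D11ClassLinkage* linkage, bool lightMapped)
{
    // Pick which implementation of the shader interface to use for this draw.
    // The class names are hypothetical examples.
    ID3D11ClassInstance* instance = nullptr;
    linkage->GetClassInstance(lightMapped ? "LightMappedColor" : "SimpleColor",
                              0, &instance);
    context->PSSetShader(shader, &instance, 1);
    instance->Release();
}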

6.5 Multi-core CPU rendering

DirectX 11 allows the developer to split the rendering engine into multiple threads. For example, the application can create buffers while at the same time performing rendering, which makes CPU-heavy operations such as physics simulation have a smaller impact on graphics performance. You can also create command lists, which are basically lists of graphics commands. These commands can be recorded on separate threads, so you could for example have a graphics data management thread that creates and modifies buffers and textures, and a draw thread which handles draw calls, shader variable updates and rendering pipeline configuration, without them having to wait for each other.
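A minimal sketch of this pattern, recording work on a deferred context in a worker thread and replaying it on the immediate context; thread synchronization and error handling are omitted:

#include <d3d11.h>

ID3D11CommandList* RecordWork(ID3D11Device* device)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // ... record resource updates and draw calls on 'deferred' here ...

    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);    // close the recording
    deferred->Release();
    return commandList;
}

void SubmitWork(ID3D11DeviceContext* immediate, ID3D11CommandList* commandList)
{
    immediate->ExecuteCommandList(commandList, FALSE);   // replay on the main thread
    commandList->Release();
}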

7 Porting from DirectX 9

Porting from DirectX 9 to DirectX 10 or 11 is not as trivial as it might seem. If one wants to use all of the new features of DirectX 11, one has to make a completely new shading system, not only to take advantage of the new shaders, but also to use the class interface features, multiple vertex buffers and multi-core rendering. The Nebula game engine has a modular rendering engine, which allows for a rewrite of the current engine to support DirectX 11. However, seeing as much has changed from DirectX 9 to 10, some of the Nebula classes need to be altered to fit the changes.

7.1 Device handling

In the Nebula DirectX 9 rendering engine, there is a class for both a render device and a display device. The display device has to be implemented using the DXGI adapter, and the render device as the D3D11 device and device context. The DirectX 11 implementation of the device and device context is a lot more flexible than the one found in version 9. For example, the DirectX 11 device doesn't need any capability checks, but instead scales transparently depending on the hardware. There is no D3DPRESENT_PARAMETERS structure either; instead you use the swap chain to set up the current settings. Nebula uses DirectX 9 to get device information that is no longer necessary for the user to know. That information included driver information, the device name and the driver name. The reason why this isn't available anymore is that DXGI is transparent with regard to whatever driver you use. If you for some reason cannot create a DirectX 11 context, then DXGI will try to use a lower-version DirectX context until it finds one that fits your hardware. This makes information about your graphics driver irrelevant, seeing as DXGI and DirectX handle all the feature demands on their own.
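As a minimal sketch (assuming a valid window handle, with error handling omitted), creating the device, device context and swap chain in one call and letting DirectX pick the highest feature level the hardware supports might look like this:

#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

void CreateDeviceAndSwapChain(HWND window, UINT width, UINT height,
                              ID3D11Device** device, ID3D11DeviceContext** context,
                              IDXGISwapChain** swapChain)
{
    DXGI_SWAP_CHAIN_DESC desc = {};                   // replaces D3DPRESENT_PARAMETERS
    desc.BufferCount = 1;
    desc.BufferDesc.Width = width;
    desc.BufferDesc.Height = height;
    desc.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
    desc.OutputWindow = window;
    desc.SampleDesc.Count = 1;
    desc.Windowed = TRUE;

    D3D_FEATURE_LEVEL obtained;
    D3D11CreateDeviceAndSwapChain(
        nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
        nullptr, 0,                    // null feature level list: try 11.0 down to 9.1
        D3D11_SDK_VERSION, &desc, swapChain, device, &obtained, context);
}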

7.2 Buffers

Nebula is also designed to create a DirectX 9 object, and since this object has no counterpart in DirectX 11, some functions have to be removed because they are unnecessary. When addressing buffers in DirectX 9, you would use the Lock and Unlock functions to get access to a buffer's data. In DirectX 10 and 11, you use the Map and Unmap functions for the same purpose. The big difference is that a buffer is mapped as a subresource: Map fills out a mapped subresource structure through which you access the data. As previously mentioned, creating a buffer in DirectX 10 and 11 differs from version 9 in the way you declare what the buffer contains. To create a buffer, you need a buffer descriptor, your buffer object, and, if available, initial data. You then simply call the CreateBuffer function, and the descriptor tells the API how the buffer is to be used. For example, declaring that a buffer should be handled like an index buffer is done by giving the descriptor the bind flag D3D11_BIND_INDEX_BUFFER.
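A minimal sketch of both steps, creating a dynamic index buffer and later updating it with Map/Unmap (the DirectX 11 counterpart of Lock/Unlock); error handling is omitted:

#include <d3d11.h>
#include <cstring>

ID3D11Buffer* CreateIndexBuffer(ID3D11Device* device, const unsigned* indices, UINT count)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = count * sizeof(unsigned);
    desc.Usage = D3D11_USAGE_DYNAMIC;                  // CPU writable, GPU readable
    desc.BindFlags = D3D11_BIND_INDEX_BUFFER;          // tells the API how the buffer is used
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;

    D3D11_SUBRESOURCE_DATA initial = {};
    initial.pSysMem = indices;                         // optional initial data

    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, &initial, &buffer);
    return buffer;
}

void UpdateIndexBuffer(ID3D11DeviceContext* context, ID3D11Buffer* buffer,
                       const unsigned* indices, UINT count)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    context->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    std::memcpy(mapped.pData, indices, count * sizeof(unsigned));
    context->Unmap(buffer, 0);
}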

7.3 Textures

Textures are handled differently in DirectX 10 and 11 as well. First of all, textures are divided into three subgroups: Texture1D, Texture2D and Texture3D (Nebula has no native implementation using 1-dimensional textures, however). What might seem surprising is that there is no cube texture type. Instead, you create your own texture cube consisting of 6 individual Texture2D surfaces and assign each surface individually. Not only does this unify the usage of ordinary textures and texture cubes, but it also lets developers have full control of every side of the cube. To create a cube texture, one has to supply an extra variable in the texture description. The CreateTexture2D() function takes three arguments: a descriptor, initial data, and the target texture pointer. The descriptor, D3D11_TEXTURE2D_DESC, needs to be supplied with an ArraySize of 6 to make it into a cube texture [9].

Mapping textures works the same way as mapping buffers, seeing as they are both DirectX 11 resources built from subresources. To be able to use a texture in the shader stage, the high-level Nebula DirectX 11 texture class needs to contain an ID3D11ShaderResourceView. An ID3D11ShaderResourceView lets you bind a texture object to a shader parameter, which is what is needed when setting textures in a shader.

Nebula has a feature which allows the user to inspect all the available textures. To allow the CPU to read from a texture, one has to give the texture a special usage, called staging. With the staging usage, the texture can be accessed from the CPU and the GPU, and copied between the two, giving the developer full control over the texture. However, the staging usage is not recommended for rendering resources, seeing as such a texture needs to be mapped and unmapped constantly, which hurts performance. The solution is to create a parallel texture with the staging option turned on and keep the original texture optimized. Instead of reading directly from the original texture, all we need is a copy; seeing as a staging texture can handle both CPU and GPU requests, this is the most viable solution.
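A minimal sketch of both ideas, describing a cube texture as an array of six 2D surfaces and reading a texture back on the CPU through a staging copy; error handling and row-pitch handling when reading the mapped data are omitted:

#include <d3d11.h>

ID3D11Texture2D* CreateCubeTexture(ID3D11Device* device, UINT size)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = size;
    desc.Height = size;
    desc.MipLevels = 1;
    desc.ArraySize = 6;                                    // six faces
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;      // mark the array as a cube

    ID3D11Texture2D* texture = nullptr;
    device->CreateTexture2D(&desc, nullptr, &texture);
    return texture;
}

void ReadBack(ID3D11Device* device, ID3D11DeviceContext* context, ID3D11Texture2D* source)
{
    D3D11_TEXTURE2D_DESC desc;
    source->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;                      // CPU readable parallel copy
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;

    ID3D11Texture2D* staging = nullptr;
    device->CreateTexture2D(&desc, nullptr, &staging);
    context->CopyResource(staging, source);                // keep the original optimized

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
    // ... read mapped.pData row by row using mapped.RowPitch ...
    context->Unmap(staging, 0);
    staging->Release();
}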

7.4 Vertex declaration

We've talked about creating buffers, such as index buffers and vertex buffers. A vertex buffer can, and in most cases will, contain more than just basic positional data for each vertex. A vertex can contain information about normals, tangents, bi-tangents, bone indices, bone weights and so on. The way DirectX 11 knows how to handle the extra data is through a vertex declaration. A vertex declaration is a set of descriptors which each describe a single input element in the vertex. For example, a vertex containing a position and a normal would have an array of size 2, where the first slot describes the position, what format it's saved in, and at what offset from the beginning of the vertex the data is located. The same goes for the normal. The vertex declaration can also describe in which input slot the data resides, so that it can be used together with multiple vertex buffers. The descriptor must also specify what semantic name the element has, and what index. So if a vertex shader accepts, for example, two TEXCOORD inputs, one with index 0 and one with index 1, the descriptor must state which is which, so that DirectX can properly map each chunk of vertex data to each data type in the shader.

In DirectX 9, a vertex declaration had no connection whatsoever to the shader until the moment it was attached to the pipeline to be used for rendering. In DirectX 11, however, the vertex declaration needs the actual shader code in order to create the declaration [10]. This isn't entirely true, though: DirectX 11 can create a vertex declaration if one or more semantics match the given shader, so if one supplies the function with a shader that only accepts a subset of the elements, that subset is what the declaration will be created from. This means that you need to be completely sure that the shader passed to the function is the one that will use the vertex declaration; otherwise you will get random data, seeing as DirectX will supply the unmatched input slots with nothing. The solution is to create the vertex declaration just before it needs to be used. This way the correct shader is bound in the engine, and thus easily retrievable, and can be used to create the declaration. This only needs to be done once for each declaration.
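A minimal sketch of a vertex declaration for position, normal and two sets of texture coordinates, created against the compiled bytecode of the vertex shader that will actually use it; the layout itself is only an example:

#include <d3d11.h>

ID3D11InputLayout* CreateDeclaration(ID3D11Device* device,
                                     const void* vsBytecode, SIZE_T vsSize)
{
    D3D11_INPUT_ELEMENT_DESC elements[] =
    {
        // semantic, index, format, slot, offset, class, instance step rate
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,  D3D11_INPUT_PER_VERTEX_DATA, 0 },
        { "NORMAL",   0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
        { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT,    0, 24, D3D11_INPUT_PER_VERTEX_DATA, 0 },
        { "TEXCOORD", 1, DXGI_FORMAT_R32G32_FLOAT,    0, 32, D3D11_INPUT_PER_VERTEX_DATA, 0 },
    };

    ID3D11InputLayout* layout = nullptr;
    device->CreateInputLayout(elements, 4, vsBytecode, vsSize, &layout);
    return layout;
}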

7.5 Shaders

The shading system in Nebula is managed using the DirectX 9 Effects library. This allows the developer to specify a complete render pass, with all the render settings, which pixel and vertex shader to use, and so on, in the HLSL code, and bundle it into an effect. This way there doesn't have to be any CPU code for customizing the rendering of specific objects. Nebula encapsulates this in its render path, giving developers access to the render passes and post-effects by simply editing an XML file.

With the separation of the effects library from the DirectX rendering libraries, the standard fxc.exe effects compiler does not understand the new shader model 5 effects syntax. The new syntax requires the shader programmer to define what version a technique belongs to, be it technique10 or technique11, depending on which version of DirectX is used. Also, with the new object-oriented approach, the shader states are no longer set by defining separate variables. Instead, there are three different state structs that can be defined to the user's liking and then set in the pass. This is because what the effects really do is call the appropriate DirectX 11 functions. To create a blend state, one has to define a BlendState struct and, within it, set the variables found at [11]. In the pass, you then call the appropriate set-function, in this case SetBlendState( BlendState*, float4 BlendColor, UINT SampleMask ). It is possible to set the depth-stencil state, the blend state and the rasterizer state, as well as the shaders, in the pass.

Texture sampling is also different in shader model 5. Instead of having a sampler state which contains a texture object, you declare a texture type and use the object itself. In shader model 5.0, a texture can be a Texture1D, Texture2D, Texture3D or TextureCube, depending on what kind of texture is bound to the target. A sampler state is now just a list of attributes which are used to sample an image. Before, you would sample a texture using tex2D( samplerState, UV ), but in shader model 5.0 the syntax is TexturePointer.Sample( samplerState, UV ). Thus, the SamplerState also follows a different syntax, seeing as it cannot have a texture bound to it anymore [12].

As a result of the detachment of the Effects library from the standard SDK, one has to make one's own compiler to compile the shader model 5 effects. The fxc.exe effects compiler that comes with the SDK can only compile the older DirectX effects (versions 9.0 to 9.0c), so one has to build a compiler of one's own that can compile DirectX 11 effects. With this in mind, every shader has to be rewritten to work with shader model 5.0. The operations still look the same, but texture sampling, technique declarations, sampler states and pass states have to be rewritten in order for the DirectX 11 effects compiler to accept them.

In the shader model 3.0 versions of the shaders, some vertex shaders had fewer outputs than their coupled pixel shaders had inputs. This worked because the old compiler could ignore or offset parameters to make them fit. In DirectX 11, however, the compiler will complain if one tries to mismatch the vertex and pixel shaders. This is easily solved, seeing as it's easy to just add the missing parameter to the pixel shader or vertex shader. The reason why some shaders mismatched was that some vertex shaders didn't have to output all parameters, and some pixel shaders had no need for them as inputs.

Nebula handles shader variations using feature masks, which basically just decide whether to use, for example, alpha, skinned, light-mapped or statically colored rendering. As previously discussed, DirectX 11 allows for dynamic linking, and thus there is no need to have completely separate shaders for slight variations. Instead, one only has to set which class should be used, by reading the feature masks.

The shaders themselves are also handled differently from DirectX 9. In the DirectX 9 version of Nebula, shader variables are set by querying the effect with a value and a D3DXHANDLE. Using DirectX 11, all you need to do is request a variable from the effect using an identifier, be it a name or an index. Setting a variable no longer requires DirectX-specific data types; you only need floats, booleans or integers, even when setting vectors and matrices.

However, sharing variables between shaders does not work the way it used to. Whenever a render pass is applied, the Effects system supplies the shader with all its data: constants, textures and the actual shaders. The problem is that the system resupplies the shader with every apply, thus resetting values previously set by another shader. Therefore, one needs to resolve the problem with shared variables by making them all point to the same source, so that every apply sends the exact same data. This is solved by letting every texture that is to be used as a render target (because these are the only "constant" textures in the system) inform all shaders using that texture to set the corresponding shader variable to the texture resource that is the actual render target surface.
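As a minimal sketch of the CPU side of this, using the separately compiled Effects11 library and assuming an already loaded ID3DX11Effect; the variable and technique names here ("MaterialColor", "DiffuseMap", "Default") are hypothetical and depend on the shader:

#include <d3dx11effect.h>   // from the separately compiled Effects11 sample

void ApplyMaterial(ID3DX11Effect* effect, ID3D11DeviceContext* context,
                   const float color[4], ID3D11ShaderResourceView* diffuseMap)
{
    // Variables are requested by name (or index) and set with plain types.
    effect->GetVariableByName("MaterialColor")->AsVector()->SetFloatVector(color);
    effect->GetVariableByName("DiffuseMap")->AsShaderResource()->SetResource(diffuseMap);

    // Applying a pass pushes constants, textures and shaders to the pipeline.
    effect->GetTechniqueByName("Default")->GetPassByIndex(0)->Apply(0, context);
}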

When sampling a texture with shader model 5.0, one has to be very careful. The UV-coordinates are not mapped in UV-space, but rather in texture space, and the texture space is relative to the first render target assigned in the shader. This means that in order to down-sample a texture, one needs to go outside the range [0.0, 1.0] when addressing a texture larger than the render target. The same has to be considered when up-sampling a texture, but the other way around.

7.6 Text rendering

With the transition from DirectX 10 to 11, Microsoft decided to remove the text rendering API from DirectX and put it in an entirely stand-alone API called DirectWrite. DirectWrite is modular in what rendering API it uses [13]. In the DirectX 11 implementation in Nebula, DirectWrite uses Direct2D, which is also a new stand-alone API from Microsoft. In the ported version of the text renderer, a DirectWrite factory is created alongside a Direct2D render target. The back buffer is then fetched from the DirectX 11 device to be used as the render target when rendering the text. Apart from the separation of these APIs from DirectX 11, the rendering is done the same way as before: by defining a rectangle and drawing to it, thus drawing to an area on the render target.

8 Results

The result is the Nebula 3 game engine rendering using DirectX 11 instead of 9. This means a full conversion of the shaders and the fundamental rendering system, including the deferred lighting, skinning, texture handling, vertex declarations, buffer handling and the online debugger, with no loss of performance. Seeing as there is no real change in the look of the engine, because a new graphics engine doesn't really improve existing content but only opens up for making better content, the presentable result is in the code, the documentation and the shaders. Figure 2 shows the DirectX 9 render, and Figure 3 shows the one using DirectX 11. As one can see, there is no visible difference between the two pictures.

9 Discussion

The first question one might ask when porting from DirectX 9 to 11 is: why? Why not simply stick with DirectX 9, seeing as it's reliable and many households have graphics cards modern enough to support it? Of course the engine still needs to support DirectX 9, but at the same time, more and more homes are equipped with newer graphics cards capable of handling at least DirectX 10, and even 11.

Figure 2: The DirectX 9 version render

Figure 3: The DirectX 11 version render

The reason is to keep development going, and to incorporate the new graphical features that open up with modern hardware; to take advantage of every single transistor, to use every part of the pipeline, and to make way for future development.

The big step is not from DirectX 10 to 11, but rather from 9 to 10. The introduction of DXGI resulted in a complete class restructuring of DirectX, making it more adaptable, more intuitive and more structured, and it also gave us a bunch of new features. Going from DirectX 10 to 11 requires no such work; it's more or less adding support for multi-core rendering, the hull and domain shaders, and the dynamically linked shader model 5.0 shaders. Not only does DirectX 11 give us new features, but it also gives developers the opportunity to optimize their code, using the asynchronous command queuing as well as minimizing shader state switches for the graphics card when handling shader variations. It also improved on instancing, which is a very efficient way to avoid unnecessary draw calls when rendering lots of objects.

As we've seen, DirectX 11, through DXGI, offers seamless backwards compatibility with previous versions of DirectX, which gives us no reason not to use the newer version.

The main goal of this project is not so much to present an actual result as to open up for further development of more advanced shaders in Nebula 3, and to expand and develop the engine to keep it up to date with modern hardware. There is still work that needs to be done. For example, one should implement a shadowing engine for shader model 5.0, as well as optimize the shaders and make sure all of them work correctly. Perhaps, with the new samplers, inferred lighting is possible to achieve without artifacts, seeing as sampling can be scaled in a different way than before. Also, implementing a hull and domain shader was not a part of my assignment, but it could easily be added to further utilize the new technology presented.

10 Conclusion

The infrastructure that Nebula provides has all the necessary features to render and to handle devices, textures, buffers and shaders very smoothly. Porting the rendering engine to DirectX 11 with no prior DirectX rendering experience was not intuitive, but the functions and the layout of Nebula helped in knowing what needed to be ported. Although the final product doesn't take full advantage of the multi-core rendering techniques, where one can write to buffers and textures while at the same time rendering, it does use the dynamically linked shaders, as well as the hull and domain shader pipeline stages.

This opens up for future games using DirectX 11, which to this date isn't yet widely used by the game industry. As well as providing stunning visual effects, DirectX 11 also allows developers to make heavy computations on the GPU using DirectCompute via compute shaders, which allows for, for example, heavy physics calculations, massive crowds and even fluid simulations. This opens for future games to explore what was impossible before.

Although Nebula can accommodate DirectX 11, it's not as optimized as it could be. Future development would be to optimize Nebula to take full advantage of DirectX 11, by using the multi-core rendering support and utilizing the input assembler slots by setting multiple vertex buffers. It's possible to have two or three separate threads for the rendering engine: one thread could handle textures and buffers, by updating, copying, manipulating and setting them; one thread could make the draw calls, being the main rendering thread; and the last thread could handle DirectCompute operations.

When rendering objects, it's always a good idea to use as few state switches as possible. The input assembler in DirectX 11 can bind multiple vertex buffers simultaneously to be used for rendering. With this in mind, Nebula can be optimized to batch objects with similar shaders by filling the vertex buffer slots for as many objects as possible, and thus reduce the number of state switches.

The redesign from DirectX 9 to DirectX 10 and 11 isn't only focused on adding features and optimizations. The biggest difference can be observed when converting between version 9.0 and 10. The amount of code required to do, for example, a texture or buffer copy is greatly reduced with DirectX 10 compared to 9. When copying a texture using DirectX 9, one has to retrieve an IDirect3DSurface9 from the texture, create two locked rectangles, lock the surfaces to the rectangles, and then copy the raw data from one locked rectangle to the other. To copy a texture in DirectX 11, one only needs to call CopyResource() from the ID3D11DeviceContext class and define which two textures should be used as the source and the destination. In the same way, one can copy a subresource, for example a mip level of a 2D texture or a slice of a 3D texture, by using the CopySubresourceRegion() function. This reduces the amount of code needed for a very fundamental operation.
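A minimal sketch of the DirectX 11 side of such a copy, assuming both textures already exist with compatible descriptions:

#include <d3d11.h>

void CopyTextures(ID3D11DeviceContext* context,
                  ID3D11Texture2D* destination, ID3D11Texture2D* source)
{
    // Copy the entire resource, every subresource included.
    context->CopyResource(destination, source);

    // Copy only mip level 2 (subresource index 2 for a non-array texture),
    // placed at the top-left corner of the destination.
    context->CopySubresourceRegion(destination, 2, 0, 0, 0, source, 2, nullptr);
}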

There were some problems when integrating the Effects11 library with Nebula. Nebula has its own memory allocation system, and Effects11 has its own. When Nebula allocates memory to be used for arrays, the engine allocates the memory using its ObjectArrayHeap. However, the Effects11 library overloads the new[] operator, which results in Nebula allocating arrays with the ordinary ObjectHeap. This could work, except that Effects11 doesn't overload the delete[] operator, which results in Nebula trying to delete from the ObjectArrayHeap an object that really lives in the ObjectHeap. This was resolved by simply replacing every NEW macro defined in the Effects11 project with the ordinary new keyword. Thus, instead of using Microsoft's memory allocator, it uses Nebula's.

It also took some time before realizing how vertex declarations really work. According to the documentation, all you need is a shader that has the correct input, and if you don't, the function fails and lets you know that you've done something wrong. This wasn't the case; instead, it created a vertex layout from the minimal set of elements that could satisfy the shader's inputs. What happened when I tried to implement this was that it always "succeeded" with the shader environment.fx, when in fact it needed to succeed with the shader static.fx. The input layout can of course be reused with any other shader using the exact same signature, but it is vital, and I cannot stress this enough, to be completely sure that the input layout is correct; otherwise there is no knowing what will happen. I discovered this when using the Microsoft SDK program PIX, which allows you to debug shaders and monitor DirectX calls. I tried to debug the rendering system by looking at what data was passed to the vertex shader, and what I found was that nothing but the positional data was consistent. Data like normals and tangents was random each time I looked, so I realized that something must be wrong with the data handling. My investigation took me through the entire engine to see if there was anything wrong on the CPU side, and it was only by random chance that I found out it tried to create the layout with the wrong shader.

Another interesting point with vertex declarations is the format of the data. In Nebula, each piece of vertex information is compressed into a special format. For example, texture coordinates are saved as an array of two short integers. When passing integers to the shader with the format DXGI_FORMAT_R16G16_SINT and retrieving them in the shader as a float2, you get nothing. You don't even get random values, you just get zeros. It took me a while to realize that I had to retrieve them as an int2 and then convert them to floats to make it work. This is something that isn't documented on MSDN, and it should not be left out considering how important it is to be able to compress your own data, and even more important when porting an application whilst maintaining the same level of functionality.

The demands were that the Nebula engine should retain all the functionality it previously had, with the addition of the new features introduced in DirectX 11. These features included tessellation and shader model 5.0. Seeing as the engine previously was able to handle everything from particles to full-screen quads, nothing had to be changed, but instead just reimplemented to fit the new standard. This of course meant a complete rewrite of the shader library, seeing as it would have to support a new shader model, and thus a new version of the shading language (HLSL). The engine is to be used by the students here at Luleå Tekniska Högskola in game production, and seeing as the product itself hasn't changed in functionality, except for the added features, the demands on the final product are met. Although there is still lots of work that could be done to expand on this, I've left the doors open for further modification by simply following the Nebula design paradigm and workflow, which allows for modular replacement and/or additions without having to remake any previous work.

The question might be asked why tessellation is necessary. Well, let's say you want a game with a very low polygon count, which is easier to work with when it comes to skinning, animating and UV-mapping, and is also kind on hard drive space and load times, but you still want the same detail in your game as you would get with a high polygon count. In that case, a low polygon model could be made, accompanied by a displacement map, and using hardware tessellation one can increase the polygon count in real time and use the given displacement map to move the vertices into the desired positions, thus reconstructing the original high polygon mesh without any added work. Seeing as the skinning procedure is done at the vertex shader level, the skinning remains intact even if the mesh is tessellated afterwards. As for the new shader model, new shaders with new features can be written by future game developers, allowing for more complex and optimized shaders for future games. For example, one could use the Texture2D function SampleCmpLevelZero, which samples a texture and compares the value at that position with a previously given value [14], in order to evaluate whether or not the pixel should be returned. This is very useful for shadow mapping, seeing as the comparison between shadow distances can be done on the hardware using only one line of code.

More optimized code results in the game running smoother, in which case less graphics power is required. This may in turn lead to lower energy use, which also results in lower energy costs for whoever supplies the computer with electricity. Also, if the shader code is optimized enough, one might not need hardware on the bleeding edge in order to play the game, or work with the game engine, which results in lower costs for that specific person. Also, using tessellation to improve production speed may result in the game getting produced faster, or in a bigger game than was previously possible, which in turn may generate more revenue than before.

When rewriting the shaders, it wasn't at all obvious that one has to take into account that UVs are relative to the render target. It was somewhat strange to find out that you have to sample outside the range when you simply assume you are in non-scaled UV-space. This may be because textures and samplers are decoupled, and thus the sampler needs to be generic with respect to whatever texture you want to sample. Before, a sampler was bound to a texture, and thus it was obvious that the sampler would map the texture 1:1, but with the sampler being separate, it can't possibly know the ratio of the UV to the texture, and thus the render target seems like the only "certain" texture available. It would have been nice if this was explained somewhere, though.

References

[1] Microsoft. DXGI Overview, 2011.
[2] Microsoft. Introduction to Buffers in Direct3D 11, 2011.
[3] Microsoft. Subresources, 2011.
[4] Microsoft. Direct3D 9 VPOS and Direct3D 10 SV_Position, 2011.
[5] Microsoft. Tessellation Overview, 2011.
[6] NVIDIA Corporation. Alien vs. Triangles tessellation demonstration, 2010.
[7] Bruno Simes. General-purpose computing on the GPU. Think Techie, 2009.
[8] Microsoft. Interfaces and Classes, 2011.
[9] Microsoft. D3D11_TEXTURE2D_DESC structure, 2011.
[10] Microsoft. ID3D11Device::CreateInputLayout, 2011.
[11] Microsoft. Effect State Groups, 2011.
[12] Microsoft. Sampler (DirectX HLSL Texture Object), 2011.
[13] Microsoft. DirectWrite, 2011.
[14] Microsoft. SampleCmpLevelZero, 2012.
