2018/2019, 4th quarter INFOGR: Graphics / NeedToKnow Lecture 9 – OpenGL

Author: Jacco Bikker

TL;DR In this installment of ‘NeedToKnow’ we switch to rasterization. After a brief introduction of the rendering engine, we investigate the GPU architecture, and OpenGL as a means to control this architecture.

Figure 1: A typical game scene: static geometry, objects, and objects attached to objects. And bullets.

Engine A 3D engine is the software component that produces images for a game. Early engines such as id Tech 1 (Doom), the Quake Engine and Unreal Engine 1 did little beyond that. Later engines such as the Source Engine (Half Life 2, Portal), Frostbite and Unity added quite a bit: physics, audio, scripting and multi-core aware job management. Such an engine is called a game engine, and part of the game engine is still a block that we refer to as the rendering engine.

Figure 2: From left to right: Doom 1, Quake, Unreal 1.

The task of the rendering engine is thus to produce images. The engine visualizes a virtual world, which typically consists of triangles. The triangles are grouped into meshes, and meshes are organized in a scene graph, which we will discuss in a minute. Visualization starts with the transform stage. The input for this stage is a stream of vertices and a 4x4 matrix; the output is a stream of vertices transformed into camera space. These vertices are then projected into screen space. Rasterization takes the final vertex positions and connectivity data (three vertex indices per triangle) to determine which pixels are affected by each triangle. These pixels are then shaded and drawn.
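To make the transform stage concrete, here is a minimal sketch in C# (using System.Numerics; the names ToScreen, worldToCamera and so on are made up for this example, they are not part of any engine discussed here). It takes one world-space vertex to camera space with a single 4x4 matrix, projects it, and maps the result to pixel coordinates:

using System.Numerics;

class TransformStageSketch
{
    // world space -> camera space -> clip space -> screen space, for a single vertex.
    // On a GPU the same operation is applied to the whole vertex stream.
    static Vector2 ToScreen( Vector3 worldPos, Matrix4x4 worldToCamera,
                             Matrix4x4 projection, int screenWidth, int screenHeight )
    {
        // transform stage: one matrix, applied to every vertex in the stream
        Vector3 camPos = Vector3.Transform( worldPos, worldToCamera );
        // projection into clip space (System.Numerics uses row vectors)
        Vector4 clip = Vector4.Transform( new Vector4( camPos, 1 ), projection );
        // perspective divide: clip space -> normalized device coordinates in [-1,1]
        float x = clip.X / clip.W, y = clip.Y / clip.W;
        // viewport transform: normalized device coordinates -> pixel coordinates
        return new Vector2( (x * 0.5f + 0.5f) * screenWidth,
                            (1 - (y * 0.5f + 0.5f)) * screenHeight );
    }
}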

Scene Graph Objects in the real world typically do not move independently. My hand is attached to my wrist, and I rotate when my chair rotates. The Earth revolves around the Sun, and the Moon around the Earth (Figure 3). When I jump during a moonwalk, I move relative to the Moon (Figure 4).

Figure 3: Like this. Figure 4: Like so.

Object relations are conveniently stored as a hierarchy: the scene graph. The spatial relations themselves are conveniently expressed using matrices*. If we store a matrix for each object in the world, the transform we should use for an individual scene graph node is the recursive concatenation of its own matrix with the matrices of its ancestors. The camera is a special case: to get objects into camera space, we move everything by the inverse of the camera matrix.

*: Note that quaternions are insufficient: they do not store translations.
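A minimal sketch of this recursive concatenation, in C# with System.Numerics (the class and method names are made up for this example). With the row-vector convention of System.Numerics, a node's world matrix is its local matrix multiplied by the world matrix of its parent; a column-vector convention would reverse the order:

using System.Collections.Generic;
using System.Numerics;

class SceneGraphNode
{
    public Matrix4x4 Local = Matrix4x4.Identity;           // transform relative to the parent
    public List<SceneGraphNode> Children = new List<SceneGraphNode>();

    // Concatenate this node's matrix with the matrices of its ancestors, then recurse.
    public void Render( Matrix4x4 parentWorld, Matrix4x4 worldToCamera )
    {
        Matrix4x4 world = Local * parentWorld;             // row-vector convention
        DrawMesh( world * worldToCamera );                 // worldToCamera: inverse camera matrix
        foreach (SceneGraphNode child in Children)
            child.Render( world, worldToCamera );
    }

    void DrawMesh( Matrix4x4 transform ) { /* hand the matrix and the mesh to the renderer */ }
}

Calling Render on the root node with Matrix4x4.Identity and the inverse of the camera matrix visits every node with its correctly concatenated transform.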

GPU Architecture Once upon a time (before 1997) graphics were produced entirely by CPUs. Video processors did exist, but they merely provided a frame buffer, and a way to get this frame buffer to a television set or monitor*. Drawing a textured triangle using a CPU means: calculating the outline of the triangle, interpolating texture coordinates over this outline and finally filling the horizontal spans of pixels that make up the triangle. Early GPUs, such as the 3Dfx VooDoo, contained dedicated hardware for just the triangle filling. Transform, lighting and clipping all remained the responsibility of the CPU. The rasterization hardware was not programmable: it was fixed function hardware. NVidia's GeForce 256 brought transform and lighting into hardware as well. It implemented the following flow, which is still ‘fixed function’, and not programmable:

This flow illustrates an important property of GPU architectures: a stream of data (vertices) is processed by a functional block that applies a single operation to all input elements. Likewise, the stream of pixels that leaves the rasterizer is processed by several blocks that together implement shading. The elements in the stream are independent of each other, so they can safely be processed in parallel, or in arbitrary order. This type of processing is called the streaming model, and it makes it relatively easy to design the GPU as a massively parallel unit.
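As a CPU-side analogy of the streaming model (a hedged sketch; this is not how a GPU is programmed, it only illustrates the data flow), the fragment below applies one operation to every element of a vertex stream. Because the elements are independent, the loop can be parallelized without further thought:

using System.Numerics;
using System.Threading.Tasks;

class StreamingSketch
{
    // One functional block, one operation, applied to every element of the input stream.
    static Vector3[] TransformStream( Vector3[] vertices, Matrix4x4 matrix )
    {
        Vector3[] result = new Vector3[vertices.Length];
        // the elements do not depend on each other, so the order of processing is free
        Parallel.For( 0, vertices.Length, i =>
        {
            result[i] = Vector3.Transform( vertices[i], matrix );
        } );
        return result;
    }
}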

Shaders Modern GPUs keep the streaming model. They do however allow programming of certain parts of the functional flow. A vertex shader is a programmable replacement for the transform and lighting block, and a pixel shader replaces many of the blocks that follow the rasterization phase. Rasterization itself remains fixed function hardware, even on the latest GPUs.
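As an illustration of where these programmable parts live, here is a hedged sketch of compiling and linking a minimal vertex/pixel shader pair with OpenTK-style bindings (the same bindings used by the code later in this document; error checking via GL.GetShaderInfoLog is omitted):

using OpenTK.Graphics.OpenGL;

class ShaderSketch
{
    // The vertex shader replaces the fixed-function transform block;
    // the fragment ('pixel') shader replaces the fixed-function shading blocks.
    const string vertexSrc = @"#version 330 core
        layout(location = 0) in vec3 position;
        uniform mat4 transform;   // e.g. the concatenated scene graph matrix
        void main() { gl_Position = transform * vec4( position, 1.0 ); }";

    const string fragmentSrc = @"#version 330 core
        out vec4 color;
        void main() { color = vec4( 1.0, 1.0, 1.0, 1.0 ); }";  // plain white; real shading goes here

    public static int CreateProgram()
    {
        int vs = GL.CreateShader( ShaderType.VertexShader );
        GL.ShaderSource( vs, vertexSrc );
        GL.CompileShader( vs );
        int fs = GL.CreateShader( ShaderType.FragmentShader );
        GL.ShaderSource( fs, fragmentSrc );
        GL.CompileShader( fs );
        int program = GL.CreateProgram();
        GL.AttachShader( program, vs );
        GL.AttachShader( program, fs );
        GL.LinkProgram( program );
        return program;           // activate with GL.UseProgram( program )
    }
}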

*: Exceptions existed obviously, such as the blitter chip, the N64 and some arcade hardware.

OpenGL Today there is a rather diverse collection of graphics processors from several vendors. AMD, Intel, NVidia and Imagination all produce GPUs, and typically renew their lineup every other year. Most vendors offer GPUs for mobile devices as well as for high-end rendering. Obviously, it is impossible to write software that supports all these GPUs, as well as future devices. For that reason, we use a hardware abstraction layer or HAL. The HAL offers a common programming interface for applications to the hardware. OpenGL, Vulkan, DirectX and Metal are therefore also known as graphics APIs. OpenGL’s history, in bullets:
▪ 1985: Silicon Graphics develops IRIS GL, which became the predecessor of OpenGL.
▪ 1992: An industry consortium, the ARB, is formed to steer development.
▪ 1995: Direct3D, the main competitor of OpenGL, is introduced by Microsoft.
▪ 1997: Microsoft and SGI attempt to unify APIs in Fahrenheit.
▪ 2006: Development of OpenGL is transferred to the Khronos Group.
Some other important developments, to put things in perspective:
▪ 1997: Rise of the GPU, with 3Dfx’s VooDoo cards and GLQuake (so: Direct3D started without hardware 3D support!).
▪ 2001: Rise of the Shader, with NVidia’s GeForce3.
▪ 2014: Multiple vendors introduce lower level APIs: Metal, Mantle, DirectX12.
The history of OpenGL more or less ends in 2016, with the introduction of Vulkan. Vulkan, like Metal, Mantle and DirectX12, aims to give much lower level control over the hardware. This means more responsibilities for the developer, but also better performance.

Practical Early versions of OpenGL operated in immediate mode. An example of this:

public void TickGL()
{
    GL.Begin( PrimitiveType.Triangles );
    GL.Color3( 1.0f, 0, 0 ); GL.Vertex2( 0.0f, 1.0f );
    GL.Color3( 0, 1.0f, 0 ); GL.Vertex2( -1.0f, -1.0f );
    GL.Color3( 0, 0, 1.0f ); GL.Vertex2( 1.0f, -1.0f );
    GL.End();
}

Note that this code draws one triangle. We can put a loop around the three central color/vertex pairs to draw many triangles. They are however passed one by one to OpenGL, which doesn’t suit the streaming data model at all. OpenGL will try to batch operations to improve performance, but this is obviously quite high-level behavior, and not under programmer control. An additional problem is the constant communication between CPU and GPU. Transfers are slow, so this communication can easily become the bottleneck for an application.

Modern OpenGL Although modern OpenGL still allows the use of immediate mode (which is sometimes really convenient), the use of the core profile is encouraged. This mode requires the use of vertex buffer objects (VBOs) to render graphics. A VBO is a buffer that (in principle*) resides on the GPU. Rendering anything with the data in the buffer thus does not require transfers from the host to the rendering device. The execution flow now becomes:

The main difference with the flows shown earlier is that all data is already on the GPU. Of course, we need to get it there first, but after that, we only send updates. This means that most textures and meshes are transferred only once. This is how all modern graphics APIs operate.
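A hedged sketch of what this looks like in code, again with OpenTK-style bindings (exact BufferData overloads differ slightly between OpenTK versions, and a linked shader program as in the earlier sketch is assumed to be active): the vertex data crosses the CPU-GPU boundary once, after which every frame only issues a draw command.

using System;
using OpenTK.Graphics.OpenGL;

class VboSketch
{
    static int vao, vbo;

    // One-time setup: copy the vertex data (x,y,z per vertex) into a buffer on the GPU.
    public static void Init( float[] vertices )
    {
        vao = GL.GenVertexArray();
        GL.BindVertexArray( vao );
        vbo = GL.GenBuffer();
        GL.BindBuffer( BufferTarget.ArrayBuffer, vbo );
        GL.BufferData( BufferTarget.ArrayBuffer,
            (IntPtr)(vertices.Length * sizeof(float)), vertices, BufferUsageHint.StaticDraw );
        GL.VertexAttribPointer( 0, 3, VertexAttribPointerType.Float, false, 3 * sizeof(float), 0 );
        GL.EnableVertexAttribArray( 0 );
    }

    // Per frame: no vertex data is transferred, only the draw command.
    public static void Draw( int vertexCount )
    {
        GL.BindVertexArray( vao );
        GL.DrawArrays( PrimitiveType.Triangles, 0, vertexCount );
    }
}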

State Machine

OpenGL operates as a state machine. The state is a combination of:
▪ the active transformation matrix
▪ the active shaders and textures
▪ whether or not the z-buffer is enabled
▪ the render target (e.g. the screen, or a texture)
▪ …and many other properties.
Drawing a mesh is strongly affected by this state. A similar mechanism is found in other APIs. This makes sense: the streaming model that GPUs use suffers whenever the state changes. A state change temporarily stops the vertex stream to e.g. adjust a matrix or some other constant (‘uniform’) data. And: drawing all polygons that use texture X at once maximizes cache efficiency. The scene graph matches this model: between scene graph nodes the state changes; inside a scene graph node, the state is constant.

*: OpenGL does not guarantee this, but the programmer may assume it. In low-level APIs, such as Vulkan, this kind of decision is explicitly transferred to the programmer.
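To make this concrete, here is a hedged sketch of state-driven drawing with OpenTK-style calls (the texture ids, vertex array objects and vertex counts are assumed to exist already): state is set once per group, and everything that uses that state is drawn before the state changes again.

using OpenTK.Graphics.OpenGL;

class StateSketch
{
    public static void DrawScene( int[] textureIds, int[] vaos, int[] vertexCounts )
    {
        GL.Enable( EnableCap.DepthTest );                    // state: z-buffer enabled
        for (int i = 0; i < textureIds.Length; i++)          // one state change per group
        {
            GL.BindTexture( TextureTarget.Texture2D, textureIds[i] );      // state: active texture
            GL.BindVertexArray( vaos[i] );                                  // state: active mesh data
            GL.DrawArrays( PrimitiveType.Triangles, 0, vertexCounts[i] );   // draw uses the current state
        }
    }
}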

THE END That’s it for this installment. If you have any questions, feel free to ask by email or on Slack!
