Master of Science Thesis Lund, spring 2010
Load balancing in a tiling rendering pipeline for a many-core CPU
Rasmus Barringer, Engineering Physics, Lund University
Supervisor: Tomas Akenine-Möller, Lund University/Intel Corporation
Examiner: Michael Doggett, Lund University
Abstract
A tiling rendering architecture subdivides a computer graphics image into smaller parts to be rendered separately. This approach extracts parallelism since different tiles can be processed independently. It also allows for efficient cache utilization for localized data, such as a tile’s portion of the frame buffer. Both properties are important for efficient execution on a highly parallel many-core CPU. This master's thesis evaluates the traditional two-stage pipeline, consisting of a front-end and a back-end, and discusses several drawbacks of this approach. In an attempt to remedy these drawbacks, two new schemes are introduced: one based on conservative screen-space bounds of geometry and another that splits expensive tiles into smaller sub-tiles.
Acknowledgements
I would like to thank my supervisor, Tomas Akenine-Möller, for the opportunity to work on this project as well as for giving me valuable guidance and supervision along the way. Further thanks go to the people at Intel for a lot of help and valuable discussions: thanks Jacob, Jon, Petrik and Robert! I would also like to thank my examiner, Michael Doggett.
Table of Contents
1 Introduction, aim and scope
2 Background
  2.1 Overview of the rasterization pipeline
  2.2 GPUs, CPUs and many-core architectures
  2.3 Tiled rendering
3 Implementation and testing environment
  3.1 Scheduling and dependency analysis
4 Case studies
  4.1 F.E.A.R.
  4.2 Unreal Tournament 3
  4.3 Simple scene
  4.4 Conclusions
5 The pre-front-end
  5.1 Screen space bounds
  5.2 Dependency analysis and the pre-front-end
  5.3 Preserving submission order
  5.4 Results and discussion
6 Tile splitting
  6.1 Per-tile cost estimation
  6.2 Front-end counters
  6.3 Split heuristic
  6.4 Dispatch heuristic
  6.5 Special rasterizer
  6.6 Results and discussion
7 Conclusion and future work
8 References
1 Introduction, aim and scope
The goal of this project was initially to realize an idea that was conceived at Intel concerning load balancing of tiling rendering pipelines. First, the conventional two-stage pipeline was analyzed and then a pre-front-end was added, in an attempt to remedy some of the issues discovered. The purpose was to improve performance. As the project evolved, another idea was conceived that concerns cost estimation and splitting of tiles.

This report is organized into the following sections: background, implementation and testing environment, case studies, pre-front-end, tile splitting, and conclusion and future work. The background section contains information on rasterization pipelines as well as the graphics processing unit (GPU), central processing unit (CPU), and the many-core CPU. The implementation and testing environment section describes the system used for implementing our new algorithms and for evaluating performance. The case studies section describes a number of scenes that are used to highlight some of the problems in the existing pipeline. The following two sections explain our two novel schemes aiming to improve the overall performance of a tiling rendering pipeline. The conclusion and future work section discusses some lessons learned as well as topics that may be worth researching in more depth in the future.
2 Background
2.1 Overview of the rasterization pipeline
In a rasterization-based three-dimensional graphics pipeline, geometry is mapped to screen-space using some transformation from three-dimensional to two-dimensional space. This two-dimensional geometry can then be rasterized in the form of pixels on the screen, i.e., pixels in the frame buffer. In the most basic case, the geometry consists of triangles and the transformation consists of a series of matrices: a world matrix, a view matrix and a projection matrix. A triangle consists of three vertices, each of which contains a position and optionally other attributes such as color values.
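As a rough illustration of this mapping, the following sketch carries one vertex from object-space to pixel coordinates. It is not taken from the implementation discussed in this thesis. The column-vector convention, the OpenGL-style projection matrix, the viewport mapping with the origin in the top-left corner, and all function names are assumptions made only for this example.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def translation(tx, ty, tz):
    """Translation matrix, used here as a simple world matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def perspective(fov_y, aspect, near, far):
    """Perspective projection, OpenGL-style (an assumed convention)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
            [0.0, 0.0, -1.0, 0.0]]

def to_screen(vertex, world, view, projection, width, height):
    """Map an object-space vertex (x, y, z) to pixel coordinates."""
    # Combine the matrices once; the same product is reused for every vertex.
    combined = mat_mul(projection, mat_mul(view, world))
    x, y, _, w = mat_vec(combined, [vertex[0], vertex[1], vertex[2], 1.0])
    # Perspective divide yields normalized device coordinates in [-1, 1].
    ndc_x, ndc_y = x / w, y / w
    # Viewport mapping; y is flipped so that the origin is the top-left pixel.
    px = (ndc_x * 0.5 + 0.5) * width
    py = (1.0 - (ndc_y * 0.5 + 0.5)) * height
    return px, py

# A vertex at the object's origin, placed five units in front of the camera,
# lands in the centre of a 640x480 frame buffer.
world = translation(0.0, 0.0, -5.0)
proj = perspective(math.radians(90.0), 640.0 / 480.0, 0.1, 100.0)
print(to_screen((0.0, 0.0, 0.0), world, identity(), proj, 640, 480))
```

Forming the combined matrix once, rather than applying three matrices per vertex, is the usual reason for multiplying the matrices together before transforming any geometry.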
The world matrix represents the transformation from object-space to world-space; if a group of triangles is moving around, it makes sense to use a transformation matrix for its movement rather than moving the individual coordinates of each triangle. The view matrix corresponds to the viewer’s position and rotation in the scene. The projection matrix projects the vertices onto the two-dimensional screen. The combined matrix can now be formed: