A Modular 3D Graphics Accelerator for FPGA
Department of Electrical Engineering (Institutionen för systemteknik)

Degree project in Computer Engineering carried out at the Institute of Technology, Linköpings universitet
by Jakob Fries, Simon Johansson

LiTH-ISY-EX--11/4479--SE
Linköping, 5 July 2011

Supervisor: Andreas Ehliar, ISY, Linköpings universitet
Examiner: Olle Seger, ISY, Linköpings universitet

Division of Computer Engineering, Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden

Report category: Degree project (Examensarbete)
ISRN: LiTH-ISY-EX--11/4479--SE
URL for electronic version: http://www.da.isy.liu.se http://www.ep.liu.se
Swedish title: En modulär 3D-grafikaccelerator för FPGA

Abstract

A modular and area-efficient 3D graphics accelerator for tile-based rendering in FPGA systems has been designed and implemented. The accelerator supports a subset of OpenGL, with features such as mipmapping, multitexturing and blending. The accelerator consists of a software component for projection and clipping of triangles, as well as a hardware component for rasterization, coloring and video output. Trade-offs made between area, performance and functionality have been described and justified.
In order to evaluate the functionality and performance of the accelerator, it has been tested with two different applications.

Keywords: Tile-based Rendering, 3D Graphics Accelerators, FPGA, Customizable Graphics Accelerator

Acknowledgments

We would like to thank our examiner Olle Seger and our supervisor Andreas Ehliar for the opportunity to work with this interesting and challenging project. We would also like to thank our opponents Jesper Eriksson and Johan Holmér for providing comments and feedback on our report.

Contents

1 Introduction
  1.1 Background
  1.2 Purpose
  1.3 Scope
  1.4 Report Outline
2 3D Graphics
  2.1 Triangle Rasterization
    2.1.1 Fill Convention
    2.1.2 Edge Function Approach
    2.1.3 Scanline Conversion Approach
    2.1.4 Linear Interpolation
  2.2 Depth Testing
  2.3 Texture Mapping
    2.3.1 Mipmapping
    2.3.2 Texture filtering
    2.3.3 Multitexturing
  2.4 Blending
3 System Architecture
  3.1 Platform Assumptions
  3.2 Implications of Platform Limitations
  3.3 Partitioning of the Graphics Pipeline
4 Hardware Implementation
  4.1 Proposed Configuration
    4.1.1 Notes About the Limitations of the Configuration
  4.2 Module Descriptions
    4.2.1 GPU
    4.2.2 Video Out
    4.2.3 VBlank Swap Helper
    4.2.4 Frame Renderer
    4.2.5 Triangle Fetcher
    4.2.6 Tile Renderer
    4.2.7 Triangle Handler
    4.2.8 Fragmenter
    4.2.9 Triangle Parameter Calculator
    4.2.10 Paramgen
    4.2.11 Fragment Generator
    4.2.12 Depth Buffer
    4.2.13 Fragment Queue
    4.2.14 Colorizer
    4.2.15 Texturer
    4.2.16 Texture Unit
    4.2.17 Blender
    4.2.18 Color Buffer Dumper
  4.3 Communication Protocols
    4.3.1 Four-phase req/ack
    4.3.2 Strobe and Busy Signals
    4.3.3 Simple Burst Protocol
  4.4 Interfacing with the GPU
    4.4.1 Registers
5 Software Implementation
  5.1 Software Component
  5.2 Mesa Hooks
  5.3 Triangle Clipping
6 Tools
  6.1 Software Renderer
  6.2 Scheduler
7 Evaluation by Simulation
  7.1 Evaluation Platform
  7.2 Data Acquisition
  7.3 Evaluation of Tailored Accelerators
    7.3.1 Quake 3
    7.3.2 Teeworlds
8 Evaluation using FPGA
  8.1 Evaluation of Tailored Accelerator
9 Conclusions
10 Future Work
  10.1 Faster clipping
  10.2 Texture prefetching
  10.3 Improved Scheduler
  10.4 Generated Fragment Shader
  10.5 Automatic Configuration Generation
  10.6 Central OpenGL State Storage
  10.7 Parallelization
Bibliography
A Data Formats
  A.1 Triangle Data Format
  A.2 Texture Format
  A.3 Framebuffer Format
Chapter 1
Introduction

1.1 Background

Powerful embedded systems are becoming more common. When a short time-to-market is important or when the systems are produced in low volume, FPGAs can be used to add custom hardware functionality. If such a system is to display graphics, it would be cost efficient to have the graphics functionality inside the FPGA. A variety of options exist for displaying 2D graphics in this manner, but there are few options when it comes to 3D graphics.

1.2 Purpose

The purpose of this thesis is to explore how to design a hardware graphics accelerator that can be adapted to the application it is used for, so that it uses a minimal amount of resources while still generating graphics adequately fast. An architecture is proposed and implemented, and its performance is evaluated.

The implementation will be used in a realistic system with a single external memory. Applications running on this kind of system are typically not very advanced, so only a subset of OpenGL is supported: a fixed-function pipeline with mipmapped multitexturing, depth testing and blending.

1.3 Scope

The scope of this report is to provide some context for a 3D graphics accelerator and to describe the architecture of a triangle rasterizer that is highly tunable for performance vs. area cost. The architecture is then evaluated using real-world applications.

1.4 Report Outline

General background information about rendering 3D graphics is given in chapter 2. Assumptions about the target platform are described in chapter 3. The hardware and software implementations of the accelerator are described in chapters 4 and 5, respectively. Some tools have been developed to aid in the development. They are described in chapter 6. Chapters 7, 8 and 9 contain evaluation, results and conclusions of the work, and possible future improvements are listed in chapter 10.
Chapter 2
3D Graphics

2.1 Triangle Rasterization

The most common way of rendering and displaying 3D graphics on the screen is to represent geometry as a set of triangles, project the triangles onto the screen plane and finally fill pixels in a frame buffer with the correct color. This section describes how to convert an already projected triangle into a set of points in the plane.

There are two main approaches to triangle rasterization: the edge function approach and the scanline conversion approach. In addition, fill convention and linear interpolation are important parts of rasterization. These topics are described below.

2.1.1 Fill Convention

When triangles share a common edge, one wants to avoid any pixels on this edge being missed, but also to avoid drawing the same pixel twice. This can be done by using a fill convention, a set of rules that describe which pixels should and should not be drawn for any triangle [2]. A common fill convention is to keep pixels that are inside the triangle or precisely on a “top” or “left” edge, and to discard pixels that are precisely on a “bottom” or “right” edge.

[Figure 2.1: Characterization of an edge as a “bottom” edge. The figure shows the vertices v0, v1 and v2, the edge vector a and the coordinate axes x̂ and ŷ.]

Assuming anti-clockwise winding, and a coordinate system as seen in figure 2.1, the vector $a = v_1 - v_0$ can be used to characterize the edges of the triangle as follows. $a$ is a “top” or “left” edge if $a_y < 0 \lor (a_y = 0 \land a_x < 0)$. Conversely, if $a_y > 0 \lor (a_y = 0 \land a_x > 0)$, $a$ is a “right” or “bottom” edge. According to these criteria, the edge in figure 2.1 should be considered a “bottom” edge.

2.1.2 Edge Function Approach

In order to decide whether a point in the plane is inside a triangle, one can employ edge functions [8]. These are simply functions that assign negative numbers to points on one side of a line and positive numbers to points on the other side, while points on the line are assigned zero.
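
As an illustration of how an edge function and the fill convention of section 2.1.1 can work together, the following Python sketch tests whether a point is covered by a triangle. It is a minimal sketch under stated assumptions, not the implementation used in the accelerator: the edge function is taken to be the 2D cross product of the edge vector and the vector from the edge start to the point, the x axis is assumed to point right and the y axis up (an orientation chosen here because it makes the sign criteria of section 2.1.1 come out right for anti-clockwise triangles), and the names `edge_function`, `is_top_or_left` and `covers` are hypothetical.

```python
def edge_function(v0, v1, p):
    """2D cross product a x (p - v0), where a = v1 - v0.

    The value is zero for points on the line through v0 and v1 and has
    opposite signs on the two sides of that line. With the assumed axes
    (x right, y up) it is positive to the left of the edge direction,
    which is the interior side of an anti-clockwise triangle.
    """
    ax, ay = v1[0] - v0[0], v1[1] - v0[1]
    return ax * (p[1] - v0[1]) - ay * (p[0] - v0[0])


def is_top_or_left(v0, v1):
    """Edge classification from section 2.1.1, with a = v1 - v0."""
    ax, ay = v1[0] - v0[0], v1[1] - v0[1]
    return ay < 0 or (ay == 0 and ax < 0)


def covers(triangle, p):
    """Return True if p should be drawn for an anti-clockwise triangle.

    Points strictly inside are kept. Points exactly on an edge are kept
    only when that edge is a "top" or "left" edge.
    """
    v0, v1, v2 = triangle
    for start, end in ((v0, v1), (v1, v2), (v2, v0)):
        e = edge_function(start, end, p)
        if e < 0:
            return False  # outside this edge
        if e == 0 and not is_top_or_left(start, end):
            return False  # on a "bottom" or "right" edge
    return True


# Two anti-clockwise triangles that share the diagonal from (4, 0) to (0, 4).
lower = ((0, 0), (4, 0), (0, 4))
upper = ((4, 0), (4, 4), (0, 4))

# The point (2, 2) lies exactly on the shared edge. It is a "right" edge of
# `lower` and a "left" edge of `upper`, so only `upper` draws it.
print(covers(lower, (2, 2)), covers(upper, (2, 2)))  # prints: False True
```

Because the shared edge is traversed in opposite directions by the two triangles, it is classified as “right” in one and “left” in the other, so a pixel on it is drawn exactly once. Integer coordinates are used in the sketch so that the comparison with zero is exact.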