Programmable and Scalable Architecture for Graphics Processing Units Carlos S. de La Lama1,PekkaJ¨a¨askel¨ainen2, and Jarmo Takala2 1 Universidad Rey Juan Carlos, Department of Computer Architecture, Computer Science and Artificial Intelligence, C/ Tulip´an s/n, 28933 M´ostoles, Madrid, Spain
[email protected] 2 Tampere University of Technology, Department of Computer Systems, Korkeakoulunkatu 10, 33720 Tampere, Finland
[email protected],
[email protected] Abstract. Graphics processing is an application area with high level of parallelism at the data level and at the task level. Therefore, graphics processing units (GPU) are often implemented as multiprocessing sys- tems with high performance floating point processing and application specific hardware stages for maximizing the graphics throughput. In this paper we evaluate the suitability of Transport Triggered Ar- chitectures (TTA) as a basis for implementing GPUs. TTA improves scalability over the traditional VLIW-style architectures making it in- teresting for computationally intensive applications. We show that TTA provides high floating point processing performance while allowing more programming freedom than vector processors. Finally, one of the main features of the presented TTA-based GPU design is its fully programmable architecture making it suitable target for general purpose computing on GPU APIs which have become popu- lar in recent years. Keywords: GPU, GPGPU, TTA, VLIW, LLVM, GLSL, OpenGL. 1 Introduction 3D graphics processing can be seen as a compound of sequential stages applied to a set of input data. Commonly, graphics processing systems are abstracted as so called graphics pipelines, with only minor differences between the various existing APIs and implementations.