
Master Thesis

Optimizing MathZonnon using OpenCL

Author(s): Ernst, Benjamin

Publication Date: 2011

Permanent Link: https://doi.org/10.3929/ethz-a-006607456

Rights / License: In Copyright - Non-Commercial Use Permitted



Optimizing MathZonnon using OpenCL

Benjamin Ernst
Master Thesis
July 2011

Supervising Professor: Prof. Dr. Jürg Gutknecht
Supervising Assistant: Roman Mitin

Native Systems Group
Institute of Computer Systems
Swiss Federal Institute of Technology
Zurich, Switzerland


Abstract

The general purpose language Zonnon features MATLAB-like operators that work with built-in mathematical arrays. These operators can be computationally very expensive, and executing them at runtime on the CPU can take a long time.

The majority of today’s consumer systems contain a graphics processing unit. With the ongoing shift of GPUs towards general-purpose processing devices comes the ability to run computations on them that are not related to graphics. We use OpenCL to speed up the computations in Zonnon using GPUs.

This thesis introduces the Compute Framework into Zonnon. It is integrated into the Zonnon compiler and runtime library to coordinate computations across multiple OpenCL devices. We give an overview of the Compute Framework, explain the implementation strategy and details, discuss benchmarks, and conclude with an analysis of the results.


Acknowledgements

I would like to thank a number of people who have enabled, helped with, optimized, followed, had to bear and/or supported this thesis.

My first thanks go to Prof. Dr. Jürg Gutknecht for letting me work with the Zonnon language and providing me with the hardware I needed to test my developments.

I would like to thank my supervisor Roman Mitin very much for introducing me to the Zonnon compiler. I especially appreciated having a supervisor who is an expert in what I was working on and answered all my questions with low latency.

Then my thanks go to Georg Ofenbeck for his support for the thesis and for sharing his knowledge about GPUs.

Many thanks go to Nina Gonova who introduced the mathematical data types into Zonnon in her master thesis so that I could build my optimizations on top of it.

I would like to thank Alexey Morozov for his idea of using the alternating least squares algorithm as a benchmark and for providing me with a very understandable MATLAB implementation.

My further thanks go to Lukas Schwab for ensuring a pleasant working environment in the Student Lab and for bearing the noise and heat produced by the GPUs.

I would like to thank my parents for financing my studies at this great university.

Last but not least I would like to thank Ramona for sharing her life with me and taking a share of my own life into hers.


Contents

1 Introduction
1.1 Task Description
2 Background
2.1 Zonnon Compiler
2.1.1 Common Compiler Infrastructure
2.1.2 Math
2.2 OpenCL
2.2.1 Platforms and Devices
2.2.2 Programs and Kernels
2.2.3 Buffers
2.2.4 Work Items and Concurrency
2.3 StarPU
2.3.1 Codelets and Tasks
2.3.2 Dependencies
2.3.3 Time Estimation and Scheduling
2.3.4 Data Management and Consistency
3 Concept
3.1 Using Zonnon for Mathematical Computations
3.2 Compute Framework Architecture
3.3 Assignments
3.4 Introductory Example
4 Runtime Architecture
4.1 Accelerators
4.2 Data
4.3 Tasks
4.4 Dependencies
4.5 Scheduling and Data Management
4.6 Running Tasks
4.7 Task Completion
4.8 Concurrency Model
5 Compilation Process
5.1 Arrays
5.2 Method Calls
5.3 Assignments
5.3.1 Grouping Operators
5.3.2 Assignment Target
5.3.3 Kernel Generation
5.3.4 Kernel Reuse
5.3.5 Assignment Reuse
5.4 Limitations
5.5 Kernel Generation Example
5.6 CCI Generation Example
6 Experimental Results
6.1 Matrix Multiplication
6.2 Alternating Least Squares (ALS)
6.3 Discussion
7 Conclusion
7.1 Future Work
7.2 Conclusive Statement
8 Bibliography
A Big Operator Kernel Templates
A.1 Matrix-Matrix Multiplication
A.2 Matrix-Vector Multiplication
A.3 Vector-Matrix Multiplication
A.4 Element-Wise Copy
B Test Source Code
B.1 Matrix Multiplication
B.2 Alternating Least Squares
C Zonnon EBNF


“… have no sense of humor”
– Alan Freed, Oberon Day 2011

1 Introduction

High-performance computing, a field in computer science, combines, among others, the realms of hardware and computer architecture, programming models and languages, as well as algorithms. While earlier developments focused on supercomputing clusters and centers, the field is becoming more and more popular in the general market as well. A change in hardware has since led to improved availability and accessibility of computationally fast devices. Although graphics processing units had traditionally been designed for drawing to the screen only, modifications and new hardware designs enabled running specialized computations on them as well. Recently, device manufacturers have even built specialized stream processing hardware that can no longer output graphics at all but is solely targeted at intensive computing.

Nowadays, these general purpose graphics processing units (GPGPUs) are present in nearly every consumer system. With this vast computing potential available so broadly, the question arises as to how we can exploit it. The programming languages and models typically used to write programs for clusters and GPGPUs are heavily specialized. Understanding and working with these languages takes time and experience, and not every programmer is willing or able to spend resources on learning them.

Instead of using these languages directly, we focus on the integration of GPGPU-enabled computations into a general purpose language. This has been done numerous times in the form of libraries (e.g. [1]) that the programmer can use from within her favorite language. While this is a beneficial step, it still requires the programmer to learn how to use the library correctly. We think it would be better if the compiler instead did the heavy lifting of compiling the code such that the computations run in an optimal way, dynamically using the hardware that is available at runtime.

Zonnon [2] is such a general purpose language; it is developed in the Native Systems Group [3] at ETH Zurich. It features built-in mathematical operators that can operate on large arrays. These computations take a long time when run on the CPU, so we decided to use the computational power of GPGPUs to make these operations run much faster. The programmer specifies what should be computed, and the compiler or the runtime decides how it should be executed.

This thesis introduces the Compute Framework into the Zonnon compiler to support this decision. It analyzes the computations written in source code using the mathematical operators. At compile time the Compute Framework cannot know what hardware will be available when the compiled program is executed. Therefore it transforms the computations in a way such that it can decide at runtime how to use the available hardware to efficiently perform the computations.


1.1 Task Description

Institute of Computer Systems
Native Systems Group

Optimizing MathZonnon with OpenCL

Benjamin Ernst
Master Project (start: 1.02.2010, end: 31.07.2010)

1. Project Description

MATLAB-like mathematical extensions in the Zonnon programming language (MathZonnon) are currently translated into MSIL code. Compute shaders are incredibly powerful when it comes to numerical calculations, and utilizing them through managed code would yield huge performance gains. This would put managed code into the position of yielding the same performance as unmanaged code while still providing interoperability with all the libraries of the .NET world. The goal of this project is to map MathZonnon onto OpenCL.

2. Specific Goals

• Revise the semantics of the mathematical extensions to provide clean, transparent constructs for high-performance mathematical computations on GPGPUs. The code should remain on the MATLAB-like high level of abstraction, but be transparent enough for the programmer to see performance implications. Consolidate the syntax of MathZonnon and MathOberon.
• Implement mappings to OpenCL in the ETH Zonnon compiler. The implementation should be sufficient to compile and run selected benchmarks. The compiler should be able to target both MSIL and OpenCL.
• Study and describe the optimizations that can be performed by the compiler on the mathematical code. This includes typical compiler optimizations such as constant propagation and common subexpression elimination, but applied on the higher level of matrices and vectors, as well as specific mathematical optimizations such as the selection of an appropriate algorithm. Select one or two optimizations which are expected to be beneficial and implement them as a proof of concept.
• Port one or several of the existing applications in MathOberon and compare the performance with native MathOberon by Felix Friedrich and the GPGPU library for MathOberon by Alexey Morozov.


3. Deliverables

• An extensive example illustrating the mathematical extensions.
• A version of the ETH Zonnon compiler capable of compiling this example into a properly functioning application that uses the GPGPU.
• A digital and two printed copies of the master thesis containing a description of the problem, the motivation, an overview of related work and existing approaches, a description of the programming model, a description of the implementation, and evaluation results.

4. Organisation

• We will have weekly meetings. All documents and source code should be committed to the repository on a regular basis. A high-level project plan should be defined within the first two weeks of the project.
• A workspace in the RZ building will be provided for the duration of the master thesis.

Professor: Prof. J. Gutknecht
Assistants: Roman Mitin and Georg Ofenbeck
Zürich, 8 December 2010


2 Background

2.1 Zonnon Compiler

Zonnon [2] is an imperative language that is developed in the Native Systems Group at ETH Zurich. It is a successor to the languages Oberon, Modula-2 and Pascal. While these influential languages traditionally compile directly to native code, Zonnon takes a different approach by targeting the Common Intermediate Language (CIL) [4], sometimes also referred to as .NET code.

Zonnon integrates nicely into Microsoft’s developer tools family. Zonnon projects in Visual Studio feature syntax highlighting and intellisense that ease programming for the developer significantly. It is not uncommon to have both a Zonnon and a C# project in a shared solution (workspace) where then one of the projects references and uses the other. Figure 1 shows an example setup.

2.1.1 Common Compiler Infrastructure

Zonnon does not compile to CIL code directly, but uses the Common Compiler Infrastructure (CCI) [5] as its backend compiler. After parsing, the Zonnon compiler converts the abstract syntax tree (AST) into the corresponding CCI AST and then hands this AST off to the CCI framework. CCI performs the lower-level compiler tasks such as register allocation and CIL code generation.

Figure 1: A Visual Studio solution that contains both a Zonnon and a C# project.


All the semantic definitions of Zonnon are accounted for in this conversion step. The CCI itself knows nothing about Zonnon.

The shape of the AST that CCI consumes looks very much like a C# abstract syntax tree. It features Namespaces, classes, fields, methods, parameters, local variables, conditionals, loops, assignments, expressions and even generic type arguments.

2.1.2 Math

As a predecessor to this work, mathematical data types [6] have been implemented in Zonnon to support MATLAB-like syntax of operations. They are enabled by adding the modifier {math} to declarations of variables of array type:

m : array {math} *, * of real

The operations supported by arrays using this modifier include matrix multiplication, element wise operations like addition or comparison, reductions like sum or general comparison, as well as solving a square system of linear equations.

2.2 OpenCL

OpenCL [7] is a programming interface used for executing intensive computations. It was originally created by Apple and developed further in collaboration with AMD, IBM, Intel and NVidia. The Khronos Group [8] standardized it in 2008. It is the first computation interface that is truly platform-independent and was designed to be used in a heterogeneous computation environment.

OpenCL unifies features of e.g. NVidia’s Compute Unified Device Architecture (CUDA) [9] and AMD’s Accelerated Parallel Processing platform (APP) [10]. These two were among the technologies that formed the term GPGPU – General Purpose computing on Graphics Processing Units. Before the evolution of GPGPU, interfaces like DirectX and OpenGL had to be used to gain access to the computational power of graphics cards. Since those interfaces were not designed for general purpose computations but rather to render graphics for output on a display device, there was an impedance mismatch when programming them for computations. This gave a boost to technologies for GPGPU that abstract the hardware device details away and concentrate on the actual computations to be performed.

In addition to using dedicated stream processing hardware to run OpenCL, there are also implementations to run it on any CPU which supports Intel’s Streaming SIMD Extensions (SSE) 4.1 or higher. This is especially useful for testing OpenCL in scenarios where expensive hardware is not (yet) available.

OpenCL defines its own programming language, which is a narrow subset of the C language. It does not allow the use of, for example, recursion and variable-length arrays, nor of advanced C features like function pointers. Additionally, it defines parallel constructs to support efficient computations on large amounts of data, such as vector operations and synchronization primitives.

In contrast to other high-performance computing platforms like MPI, OpenCL is designed to be executed on a single machine in isolation. There is no facility to communicate between machines or to bundle the computational power of multiple machines. Such setups would need a higher level framework and OpenCL, if used, would only be a part of it.

2.2.1 Platforms and Devices

Multiple OpenCL implementations of different hardware vendors may be installed at the same time. In order to support access to the implementations, OpenCL defines the concept of a platform. Each platform represents one implementation of the standard. Available implementations can be queried and then used in the same operating system process. This makes it possible to run computations e.g. on an AMD graphics card and an Intel CPU at the same time using the same interface.

Platforms provide access to specific devices, such as a GPU or a CPU. While it is typical to use the vendor’s implementation for the corresponding hardware, multiple platforms may also contain devices that use the same underlying hardware. This is currently the case for the implementation of AMD and Intel, which both expose the CPU as a device.

2.2.2 Programs and Kernels

Before a computation can run on an OpenCL device, it has to be compiled first. At the time the program which starts the computation (“host program”) is written, the number and kind of devices available at runtime might not be known. The binary code into which the computational code (“device program”) is compiled depends heavily on the actual device that runs it. Therefore, OpenCL implementations usually feature a compiler that is invoked from the host program whenever a new device code needs to be run on a specific device. This is much like a traditional just-in-time compiler that compiles abstract code to machine code just before it is executed for the first time, with the difference that in this case the abstract code is the source code itself.

Programs cannot be run directly on the device. A kernel must be created first. A kernel bundles the program, the function to be called and the arguments supplied to that function. Such a kernel is then submitted to a device for actual execution.

2.2.3 Buffers

One type of argument that can be supplied to a kernel is a buffer. Buffers represent storage space that is typically allocated in device memory. There are methods to copy from host memory to a buffer and to copy from a buffer to host memory. The former is referred to as a write to the device and the latter as a read from the device.

2.2.4 Work Items and Concurrency

When a kernel runs, it is typically executed on multiple work items. A work item represents one unit of work and can be thought of as a very lightweight thread. These work items account for the single-instruction-multiple-data (SIMD) parallelism in OpenCL and thus enable the kernel to run efficiently on massively concurrent architectures like GPUs.

Upon submission of a kernel to OpenCL, a handle is returned that, besides querying the execution state of the kernel, can be used as a dependency for other kernels. In this way, a kernel can already be submitted to OpenCL, but it will not start before all kernels that were given as dependencies have finished their execution. In addition to kernel executions, reads or writes to buffers can also be started asynchronously and also return a handle to be used in the same fashion as the kernel handles.

Using this dependency pattern, it is possible to start data transfers and kernels ahead of their actual execution and then wait for the last operation to finish, knowing that all dependencies have also finished before. This is typically used to start a transfer to a device, start a kernel that uses the data transferred, and then start a transfer of the modified data back from the device. In other words, OpenCL allows for specifying a directed acyclic graph of operations, each of them having dependencies that must finish before it can start.
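Schematically, such a chain of dependent operations could look as follows. The sketch uses hypothetical thin wrapper types (Device, Kernel, Buffer, EventObject) in the spirit of the ones used by the generated code in chapter 5; the methods StartWrite, StartKernel, StartRead and Wait and their signatures are assumptions made for illustration, not the actual OpenCL or Runtime Library API.

// submit transfer -> kernel -> transfer up front, then wait only on the last handle
static void RunChain(Device device, Kernel kernel, Buffer input, Buffer output,
                     float[] hostInput, float[] hostOutput, ulong[] globalRange)
{
    EventObject write  = device.StartWrite(input, hostInput);                        // no predecessors
    EventObject launch = device.StartKernel(kernel, globalRange, new[] { write });   // waits for the write
    EventObject read   = device.StartRead(output, hostOutput, new[] { launch });     // waits for the kernel
    read.Wait();   // all predecessors in the chain are then guaranteed to have finished
}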

2.3 StarPU

While OpenCL is a low-level API used for executing computations on devices, it lacks support for high-level demands such as load balancing or inter-device consistency of data.

StarPU [11] is a runtime system for task scheduling on heterogeneous architectures. It was developed at the University of Bordeaux in 2009. Although StarPU exists as a library it was not directly used in this thesis. Rather, a subset of its concepts was implemented and integrated into the Zonnon Runtime Library.

2.3.1 Codelets and Tasks

Choosing an algorithm to solve a given problem might depend on the device that executes it. In order to support scheduling among different heterogeneous devices, there has to be a heterogeneous set of algorithms that the scheduler can choose from. A StarPU codelet represents one implementation of such an algorithm. Multiple codelets are then grouped into a task. The task also contains a list of arguments to be supplied to the chosen codelet when its implementation is executed.

A task therefore represents the computation to be performed while a codelet is one way of computing it.

2.3.2 Dependencies

If a task computes a result that is later used by another task as an argument, there exists a dependency between these tasks. This means that the first task has to complete before the second task can start. These dependencies have to be stated explicitly when the task is created. A task is said to be ready when all its dependencies have completed. A dependency manager takes care of the ordering between tasks such that a task can only run when it is ready.

2.3.3 Time Estimation and Scheduling

In addition to the implementation, a codelet also contains information about the time needed to execute it. A codelet might be used for a range of devices, e.g. if they share the same programming interface or if the binary code is the same for all these devices. Therefore, the time information in a codelet is not just the time itself, but a function that takes a specific device as input and returns the estimated time that the execution of this codelet on that very device would take. Such functions can be arbitrarily complex, allowing many factors to be incorporated, such as the device clock speed, memory latencies or cache sizes.

As the tasks will usually outnumber the devices available for their execution, load balancing is needed to ensure all devices are used in an optimal way. When a task becomes ready, a scheduler decides which device will be used to execute the task and therefore also which codelet of the task is chosen. The decision depends on the scheduling policy used by the scheduler.

2.3.3.1 Heterogeneous Earliest Finish Time

The StarPU Research Report [12] describes a number of different scheduling policies and provides benchmarks for them. In the HEFT scheduling algorithm (Heterogeneous Earliest Finish Time [13]), each device has a queue that holds tasks that are ready and scheduled on this device. When a new ready task is scheduled, it is placed in the device queue that minimizes the expected finish time of that task. This takes into account the remaining execution time of the task that is currently running on the device as well as the expected durations of the tasks already in the queue.
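A compact sketch of this decision rule is shown below (an illustration only, not StarPU's actual implementation): for every device, the expected finish time of the new task is the time the device still needs for its current and queued work plus the estimated duration of the task on that device, and the task is placed where this value is smallest.

using System;
using System.Collections.Generic;

class ReadyTask { }

class DeviceQueue
{
    public string Name = "";
    public TimeSpan BusyUntil;                                    // expected time to drain current and queued work
    public Func<ReadyTask, TimeSpan> EstimateDuration = t => TimeSpan.FromSeconds(1);
}

static class Heft
{
    // place the task on the device where it is expected to finish first
    public static DeviceQueue Schedule(ReadyTask task, IEnumerable<DeviceQueue> devices)
    {
        DeviceQueue best = null;
        TimeSpan bestFinish = TimeSpan.MaxValue;
        foreach (DeviceQueue device in devices)
        {
            TimeSpan finish = device.BusyUntil + device.EstimateDuration(task);
            if (finish < bestFinish) { best = device; bestFinish = finish; }
        }
        if (best != null) best.BusyUntil = bestFinish;            // the task is appended to that queue
        return best;
    }
}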

2.3.4 Data Management and Consistency

Devices may have their own memory that is separate from the host memory. A task can be scheduled on a device, but the data the task uses might not yet be present on that device. In this case, it needs to be moved to the device before the task can start.

There may be multiple copies of the same data distributed over the device and host memories. StarPU runs a modified-shared-invalid (MSI) memory protocol to record which copies are up-to-date and guarantees a consistent view for all the tasks that access the data.
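The states of such a protocol can be sketched as follows (an illustration of the idea only; the names and transitions are simplified and not taken from StarPU):

// the three MSI states of one copy of a piece of data (in host or device memory)
enum CopyState { Modified, Shared, Invalid }

class TrackedCopy
{
    public CopyState State = CopyState.Invalid;

    public void OnLocalWrite()  { State = CopyState.Modified; }   // this copy becomes the only up-to-date one
    public void OnLocalRead()   { if (State == CopyState.Invalid) State = CopyState.Shared; }  // an up-to-date copy must be fetched first
    public void OnRemoteWrite() { State = CopyState.Invalid; }    // the data was modified elsewhere
}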


3 Concept

3.1 Using Zonnon for Mathematical Computations

Zonnon compiles to .NET and can fully interoperate with other .NET languages such as C#, Visual Basic and F#. It integrates into Visual Studio such that e.g. a C# project can reference a Zonnon project. This way the computations can be expressed in Zonnon Math, while a GUI that visualizes the results of the computation can be created using the Forms Designer.

In order to compile the computations using what is presented throughout this document, the compiler switch /compute must be supplied to the compiler. It can be given directly to the command line compiler or set in the property page of the Zonnon project when using Visual Studio. This makes it possible to switch the optimizations on and off and to easily compare optimized and non-optimized runtimes.

3.2 Compute Framework Architecture

The goal of this master thesis was to speed up the implementation of the Mathematical Data Types in Zonnon. For this purpose, we built a framework that supports running computations on accelerated hardware such as a GPU.

This framework, the Compute Framework, was integrated into the Zonnon language:

• Compiler. The parser generates the Zonnon intermediate representation (IR) from the Zonnon source file. The compiler maps the computations to OpenCL kernels and a CCI IR. The CCI compiler then compiles the CCI IR to .NET CIL code.
• Runtime Library. The compiled CCI IR uses the runtime library to create tasks for the computations, which then execute on a possibly heterogeneous set of hardware accelerators. The runtime library manages these tasks.

3.3 Assignments

The Zonnon compiler offloads expensive computations to specialized hardware. It does so by transforming assignments in the Zonnon source code into calls to a generated method that instructs the compute runtime system to run a computation at runtime.

Observe that for Zonnon assignments, the only side effect is the assignment itself, that is, the modification of the variable that is being assigned to (the assignee). More importantly, the result of this side effect can only be observed when accessing an element of the assignee later, in code which is distinct from the assignment itself. It is therefore not required that the effect of the computation happens immediately when the assignment is executed, but rather that the side effect, updating the value, happens before any access to the assignee that could observe it. This opens the possibility that the computation can run concurrently with the thread that started it.
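As an analogy, and not as the code the compiler actually generates, the same freedom can be expressed with the .NET task library: the assignment merely schedules the side effect, and the first later access to the assignee forces its completion.

using System;
using System.Threading.Tasks;

class DeferredAssignment
{
    static double[] d = new double[4];

    static void Main()
    {
        // "d := d + A * c" only schedules the side effect ...
        Task pending = Task.Run(() => ComputeAssignmentInto(d));

        // ... unrelated host work can proceed here concurrently ...

        // ... and the first later access to the assignee forces completion
        pending.Wait();
        Console.WriteLine(d[0]);
    }

    // stands in for the actual mathematical computation
    static void ComputeAssignmentInto(double[] target) { target[0] = 42; }
}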

Contrary to assignments, an expression is evaluated such that it yields a value. Computing this value must happen right when the expression is executed, as the result of the expression will be used immediately thereafter. An optimization akin to the case of assignments is not possible.


3.4 Introductory Example

Figure 2 shows an example compilation using the Compute Framework. The Zonnon source consists of a module and a procedure that contains three assignments. The parser takes this textual source code and transforms it into an abstract syntax tree (AST). The compiler transforms the assignments in two ways.

First, it generates OpenCL kernel source code. These kernels perform the same computations as the assignments do, but are optimized for execution in data‐parallel hardware. The compiler places these kernels as static string constants in their own CCI class node called “Kernels”.

Second, the compiler generates a CCI class node for each assignment. This class contains a procedure that starts the computation using one or more of the previously generated kernels. The compiler converts the assignment into a call that invokes this procedure; at runtime, the procedure starts the computation using the corresponding kernels.

Once those steps are complete, the procedure in the CCI intermediate representation (IR) corresponding to the Zonnon procedure does not contain any assignments anymore, but instead calls methods on the generated classes. As we can see in the Operation3 class, one assignment can be too complex for a single kernel, thus the compiler must split the assignment and generate kernels from the split parts. Operation 3 first starts Kernel1 and then Kernel3. Together they perform the computation of the assignment.

The Microsoft CCI compiler takes this CCI IR tree and compiles it to CIL code. As the kernels are source code they are not modified by the compilation but rather integrated as string constants into the generated .NET assembly. They will be compiled at runtime by the OpenCL compiler.

[Figure 2 (overview): four columns labelled Zonnon Source, Zonnon IR (AST), CCI IR and CIL Assembly. The Zonnon source declares module Computations with procedure Compute(A, F: Matrix; c, d: Vector): Vector containing the assignments b := c \A, d := d + A * c, e := F * b - d \F and return e. The parser produces the Zonnon IR; the statements are mapped to OpenCL (a Kernels class with Kernel1, Kernel2 and Kernel3) and to .NET (classes Operation1, Operation2 and Operation3, where Operation3 runs Kernel1 into a temporary and then Kernel3). Microsoft CCI compiles the CCI IR into a CIL assembly; a C# program that calls Compute is compiled by the C# compiler into its own CIL assembly.]

Figure 2: Compilation of a Zonnon and a C# source file.


4 Runtime Architecture

This section presents the architecture of the Compute Framework that was developed as the runtime system for MathZonnon and integrated into the Zonnon Runtime Library.

The runtime system is discussed here first although chronologically the compiler has to run before the runtime system is actually used. But since the compiler generates code that uses the runtime system it is easier to understand the process of compiling when the runtime is familiar.

Figure 3 shows the architecture of the runtime. Its parts are discussed in the next sections.

4.1 Accelerators

The driving forces behind the Compute Framework are Accelerators. An Accelerator is a device that performs computations. This can be a CPU, a GPU or any other kind of processor.

Each Accelerator can optionally have its own memory that is separate from the host memory. In this case data might have to be moved to the Accelerators’ memory before a computation that uses this data can start on the Accelerator.

Figure 3 shows an example runtime with five Accelerators: two of them are GPUs driven by OpenCL, and the remaining three represent CPU cores. Currently only OpenCL devices are implemented as Accelerators. They are assumed to have memory that is separate from the host, so in the remaining part of this document the case where an Accelerator shares memory with the host is not covered.

4.2 Data

When an array or a part of it is copied to an Accelerator and subsequently modified by a computation, it has to be copied back at some point in time before any host thread can access it. The same array might be used multiple times on the same Accelerator by subsequent computations. Data movement is typically slow compared to the time it takes to perform a computation, thus copying an array back to host memory each time a computation finishes is not efficient. The Compute Framework therefore only copies data back lazily, i.e. when it is actually accessed by a host thread.

Figure 3: Architecture of the Compute Framework runtime


In order to detect such accesses to the host array, each array used in the Compute Framework is wrapped in the type Data. The type Data maintains the reference to its host array as well as slices or full copies of it that reside in the memory of Accelerators. When a host thread wants to access the array, it must do so by calling the GetHostArray() method on the type Data, which then makes sure that the most recently modified copy is moved to the host array before returning the reference to it to the caller. The compiler that is generating this call must also make sure that the reference returned is not leaked. Otherwise a host thread could later access the array directly, bypassing the Data and any consistency guarantee would be lost.
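A simplified sketch of the Data type is given below. Only the constructor, GetHostArray() and the lazy copy-back behaviour follow the description in this and the following sections; the field layout and helper names are assumptions made for illustration.

using System;
using System.Collections.Generic;

class Data
{
    readonly Array hostArray;                                     // the wrapped {math} array
    readonly List<DeviceCopy> deviceCopies = new List<DeviceCopy>();

    public Data(Array hostArray) { this.hostArray = hostArray; }

    // called by generated code whenever a host thread needs the raw array
    public Array GetHostArray()
    {
        WaitForTasksUsingThisData();                              // block until running and waiting Tasks are done
        foreach (DeviceCopy copy in deviceCopies)
            if (copy.Modified) copy.CopyBackInto(hostArray);      // lazy copy-back of modified parts
        deviceCopies.Clear();                                     // pessimistic invalidation, see section 4.8
        return hostArray;
    }

    void WaitForTasksUsingThisData() { /* dependency bookkeeping omitted */ }
}

class DeviceCopy
{
    public bool Modified;
    public void CopyBackInto(Array host) { /* an OpenCL read from the device buffer would happen here */ }
}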

4.3 Tasks

Computations are grouped into Compute Tasks. It is typical to have one computation (e.g. a matrix multiplication) per Task, but there can also be more than one.

Compute Tasks are very much like StarPU tasks. They consist of one or more Codelets that enable the computation to run on different hardware types. Each Codelet provides a function that can be invoked to estimate the time it needs when executed on a given Accelerator.

Each task has an Argument List of Datas that the Codelets need to access when run. Each entry in this List is tagged either with Read, Write or ReadWrite to indicate the type of access needed, as well as an optional range that indicates which part of the underlying host array will be accessed.

The Compute Framework assumes that no two references to Data passed in the Argument List refer to the same Data object.

4.4 Dependencies

Unlike StarPU, Compute Tasks do not have explicit dependencies on each other. The ordering is inferred solely from the Argument List. Each Data runs a multiple-reader-one-writer protocol in order to orchestrate the Tasks that want to access it. Tasks that cannot yet access a Data or a part of it are dependent on the Tasks that are currently using the Data or that specific part. The Compute Framework puts a dependent Task into a wait state until all of its dependencies have finished.
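A minimal sketch of how this ordering can be derived from the Argument List alone is shown below (names and structure are illustrative, not the actual runtime code): each Data remembers the last writing Task and the readers since that write, and every new access collects the Tasks it has to wait for.

using System.Collections.Generic;

// placeholder for the real Compute Task type
class ComputeTask { }

// per-Data bookkeeping from which the implicit dependencies are derived
class DataAccessTracker
{
    ComputeTask lastWriter;
    readonly List<ComputeTask> readersSinceLastWrite = new List<ComputeTask>();

    // returns the Tasks the new access has to wait for
    public List<ComputeTask> Register(ComputeTask task, bool writes)
    {
        var waitFor = new List<ComputeTask>();
        if (lastWriter != null) waitFor.Add(lastWriter);          // every access waits for the last writer
        if (writes)
        {
            waitFor.AddRange(readersSinceLastWrite);              // a writer additionally waits for all readers
            readersSinceLastWrite.Clear();
            lastWriter = task;
        }
        else
        {
            readersSinceLastWrite.Add(task);                      // readers may run concurrently with each other
        }
        return waitFor;
    }
}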

4.5 Scheduling and Data Management

When all dependencies of a Task have finished, the Scheduler decides on which Accelerator to run the Task. This decision depends on three factors:

• Some Data arguments might already have data in the separate memory of an Accelerator. If this Accelerator is chosen, the Data does not have to be copied and the time it would take to copy it can be saved.
• An Accelerator can be busy computing another Task and there might also be other Tasks waiting in the queue for this Accelerator.
• The time estimation functions supplied by the Codelets might return varying results for different devices.


[Figure 4 (timeline): three rows labelled GPU 2, GPU 1 and Host. GPU 1 runs Task1(a, b, c), Task3(c, e, f) and Task2(f, g, h); GPU 2 runs Task2(c, d) and Task2(d, g); arrows mark transfers of a, b, c, d, e, g and h between the memories, and the host finally calls h.GetHostArray().]

Figure 4: A scheduling of tasks.

The Scheduler takes all this information into account and finds the Accelerator where the given task is expected to finish first.

In case some Datas referenced in the Task’s Argument List do not already have copies of the requested range on the chosen Accelerator, this data has to be moved there. Any missing Data may trigger a series of transfers from another, or even the same, Accelerator, because the most recently updated data might not be in the host array itself but on an Accelerator, lazily waiting to be copied back.

Figure 4 shows an example scheduling of tasks. The arrows represent data movement between devices. The yellow boxes indicate the size of data that has to be moved. Observe that when the host calls h.GetHostArray(), this thread is blocked until all the tasks that use h and their dependencies are finished.

4.6 Running Tasks

Once all data is present, the Task is submitted to the Accelerator. Each Accelerator maintains its own queue of Tasks that are ready to be started and has a driver that autonomously pops a Task off the queue, finds the codelet suitable for the type of the Accelerator and runs it. It is up to the implementation of the driver whether it allows executing multiple Codelets concurrently on the Accelerator.
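A sketch of such a driver loop is shown below (illustrative only; the actual drivers are part of the Runtime Library and the type names here are placeholders):

using System.Collections.Concurrent;
using System.Threading;

class ReadyComputeTask
{
    // selects the codelet matching the Accelerator type and runs it
    public System.Action<string> RunOn = acceleratorType => { };
}

class AcceleratorDriver
{
    readonly string acceleratorType;                              // e.g. "OpenCL"
    readonly BlockingCollection<ReadyComputeTask> readyQueue = new BlockingCollection<ReadyComputeTask>();

    public AcceleratorDriver(string type) { acceleratorType = type; }

    public void Submit(ReadyComputeTask task) { readyQueue.Add(task); }

    public void Start()
    {
        // one background thread per Accelerator pops ready Tasks and executes the matching codelet
        var worker = new Thread(() =>
        {
            foreach (ReadyComputeTask task in readyQueue.GetConsumingEnumerable())
                task.RunOn(acceleratorType);
        });
        worker.IsBackground = true;
        worker.Start();
    }
}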

4.7 Task Completion

When the Codelet that ran on the Accelerator has finished, the corresponding Task is marked as complete. This allows waiting Tasks that depend on the current Task to continue.

4.8 Concurrency Model

A task that is submitted to the Compute Framework will be prepared and run asynchronously from the thread that created it. This has multiple benefits:

• A host thread can perform computations (that are separate from the Compute Framework) at the same time as a Task is being prepared or is executing on an Accelerator.
• Multiple Tasks can be processed and executed simultaneously, possibly on different Accelerators.

Because the control is returned to the caller immediately after creating the Task, the host thread cannot rely on the task being finished and therefore can also not assume that a Data given as an argument to the Task is already updated.

The host thread has to call the method GetHostArray() in order to access the underlying host array of a Data (see 4.2). This not only allows copying Data back lazily but also enables waiting for all Tasks that are currently using or waiting for the Data. The host thread is blocked until these Tasks have finished and all modified parts of the Data have been copied back into the host array.

Accesses to the underlying host array of a Data are not distinguished further. At runtime there is no way to check whether the host thread modifies the array, therefore the Compute Framework pessimistically assumes that yielding the reference to the host array invalidates all copies of the Data possibly held on Accelerators.

This has one important side effect. When another Task later uses the same Data it will have to be copied from the host array due to the assumption described above. Any memory held on the Accelerators can effectively be freed after it has been copied back to the host array because it will become stale after the host array modification anyway. This allows for easy garbage collection on the Accelerators and lowers memory load significantly at the expense of having to re‐copy the Data later to any Accelerator even if the host thread did not modify the array.


5 Compilation Process

This section presents the work done in the Zonnon compiler. It describes the changes made to the compiler in order to generate code that uses the Compute Framework at runtime for computations.

5.1 Arrays

The Compute Framework manages storage with Data. Array types are converted to the type Data when they meet three conditions:

• The array type is tagged with the {math} modifier. It is up to the programmer to decide whether an array is of mathematical data type or just a normal array.
• The entity being typed by the array is a method parameter or a local variable.
• The element type of the array is a basic type (defined in [2], section 5.3.1).

As the Compute Framework does not allow aliasing the same Data object when giving arguments to a task (see 4.3), the compiler must make sure that there is at most one reference to each Data visible from the host thread. Otherwise, the programmer could accidentally use two aliases to the same Data in the same assignment which would not meet the aliasing requirement of the Compute Framework runtime system.

In [6], assignments to or from mathematical data types are specified to have value semantics, that is, the Data and its underlying array are copied rather than the reference to the Data. This prevents simple aliasing.

When assigning an array to a variable of runtime type Data, the compiler generates a call to the Data constructor, which takes the array as the only argument. If the compiler cannot be sure that there are no other references to the array being wrapped, it must generate code to copy the array first. If, for example, the array being wrapped has been newly created, no references to it can have been leaked, so in this case it is not copied.
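Schematically, the two cases look as follows (Data is the wrapper type from section 4.2; shared stands for an array that may still be referenced elsewhere):

// newly created array: no aliases can exist yet, so it is wrapped directly
float[] fresh = new float[1024];
Data a = new Data(fresh);

// possibly aliased array: a defensive copy is wrapped instead
Data b = new Data((float[])shared.Clone());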

The Data can also be used in contexts where the underlying host array is actually needed, e.g. in computations that are not submitted to the Compute Framework or when accessing a single element. In these cases, the compiler generates a call to the method GetHostArray() as detailed in section 4.2.

5.2 Method Calls

Parameters of array type are eligible for conversion to Data. Although [6] specifies that Mathematical Data Types have value semantics, it is impractical to copy a Data each time it is given to a method, as this includes copying the underlying array as well. Therefore a Data that is given as a parameter is not copied; the reference is copied instead.

Passing references to Datas to the callee can introduce aliasing if the same variable is given twice or more as an actual parameter. The callee will have two formal parameters and will therefore assume the two are distinct when they are actually aliases. Thus, the compiler does not allow using the same Data variable twice or more in the same method call.


5.3 Assignments

The compiler converts mathematical assignments to use the Compute Framework. It does so by generating a class from the assignment and replacing the latter with a call to the constructor of the generated class. For the inner nodes in the AST of the assignment, that is, for the operators, this constructor checks whether the sizes of the arguments used for each operator are valid. The exact check performed depends on the semantics of the actual operator. In case not all of the sizes match, the constructor throws an ArgumentException and no computation will be performed. The constructor also checks whether any indexer used in the assignment is out of range for the variable it is used on and throws an ArgumentOutOfRangeException in that case.

When these checks succeed, the constructor creates a new task from the codelet described below, a time estimation function, and the arguments used in the expression together with their indexers, if any. The indexers tell the Compute Framework which part of an array is needed and must therefore be present on the Accelerator before the codelet runs there. The task is then submitted to the framework runtime.

The generated class contains a codelet method. It is used to start the computations and will be invoked directly by the Compute Framework runtime once all data is ready on the Accelerator that the scheduler chose. The codelet requests the compilation of the kernels that it then uses for the computations; the generation of these kernels is discussed in the next sections. When the kernels are compiled, the codelet assigns arguments to them and starts them in the correct order, specifying earlier started kernels as dependencies for the later ones, thereby ensuring the correct order of execution. The codelet also registers a callback to a cleanup function that disposes the kernels and temporary buffers once all kernels have finished their execution.

5.3.1 Grouping Operators

At compile time, one or more kernels have to be generated that perform the computations of the assignment.

The straight‐forward way is to have one kernel for each operator and use temporary storage for intermediate results. This has the advantage that the set of kernels is static, as the set of operators is known up‐front. The kernels could even be integrated into the Runtime Library and the compiler does not need to touch them at compile time.

But this also has disadvantages. First, the runtime needs to allocate a lot of space on the device to hold the intermediate state. For complex expressions, there might not be enough space on the device to hold all of the intermediate results at one point in time.

Second, for each operator all data has to be loaded from global device memory before the computation and written back to global device memory afterwards. This memory is typically much slower than register accesses within a compute unit. This gets even worse due to the fact that most OpenCL GPU platforms have no caches between the different layers of the memory hierarchy that would help in compensating latencies. Therefore the one-kernel-per-operator strategy is simple to implement but not overly practical in terms of efficiency.


Observe that not all of the implemented operators require their own kernel; some can be combined to form a group of operators that are implemented in the same kernel. Consider an assignment like

A := B + C * D

and assume these are all matrices. The multiplication needs to be performed in a special matrix multiplication kernel that is suited for execution in a SIMD environment. The way it is written makes it impossible to combine it with another matrix multiplication. But the addition can actually be integrated into the matrix multiplication kernel quite easily. Observe, without going into the details of how the kernel works internally¹, that whenever an element of the expression C * D is computed, the corresponding element of B can directly be added and the result written to A. So there is no need to allocate an intermediary buffer, because each element of the intermediary result exists only for a short time in a register. This also works on subexpressions that are purely element wise:

A := B * (C + D)

Here, whenever the multiplication needs an element of the expression C + D, it can be computed on the fly instead of fetching it from an intermediate storage. Whether this actually makes the computation faster depends on how many times each element is accessed from the multiplication and the latency of memory reads/writes. In cases where it makes it slower this is a tradeoff between computational efficiency and storage space needed.

Out of the implemented operators, only the element wise ones can be integrated into another kernel easily. All the combinations of matrix/vector multiplications require their own kernel. In the rest of this document we will call the latter big operators, since they are somewhat bigger or more complex to implement in SIMD.

It remains to decide where to integrate an element wise operation if there are multiple big operators used as subexpressions and/or we are dealing with a big operator that has this element wise operator as a subexpression. Consider for example:

A := B * (C + D * E)

There are two matrix multiplications in this assignment, so we need to generate two kernels in any case. As we have seen above, integrating the plus into one of its arguments is practically cost free. Therefore the compiler decides in this example to generate kernels for the assignments

T := C + D * E
A := B * T

where T will be a buffer that is allocated on the device only while the codelet of this assignment is executed and is destroyed in the cleanup method.

1 The complete template can be found in Appendix A.1.


5.3.2 Assignment Target

The variable used as the target of the assignment might also be part of the expression. Consider an assignment like:

A := A * B

When one element of A * B is computed, it cannot be written to A directly. The original value of this element in A will be needed to compute other elements of the multiplication as well. Therefore the compiler will split this assignment into

T := A * B
A := T

where T will again be an extra allocated buffer.

Not all operators need this special treatment; and it turns out that the element wise operators are again special. Consider a similar but purely element wise assignment:

A := A + B

Here, each element of A is read only once, added to the corresponding element in B and then written to A again. Because each element is written right after the only time it is read, there is no need for temporary storage for the result; thus a single kernel can be generated directly from the above assignment.

5.3.3 Kernel Generation

Knowing how to group the operators into kernels, the compiler can generate the code for each kernel. Each group has at most one big operator that provides the template for the kernel. The complete mapping of big operators to templates can be found in Appendix A.

These templates contain a number of placeholders that need to be replaced in order to get the complete kernel. These placeholders make it possible to customize a kernel in order to integrate the other, element wise, operators of the group and form a compound kernel.

Please consult the kernel generation example in section 5.5 below for further details.

5.3.4 Kernel Reuse

Kernels need to be compiled at runtime for the specific device they are to be executed on. Each compilation takes a small amount of time and also uses space in the memory of the device. It is therefore desirable that kernels are reused. Since the kernels are available as source code at runtime, it makes sense to memoize the kernel compilation function. This way, instead of compiling the same source code again, the previously created program can be used again.
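A minimal sketch of such a memoized compilation function is shown below (the cache and the type names are illustrations, not the actual Runtime Library code):

using System.Collections.Generic;

class CompiledProgram { }

class ComputeDevice
{
    public string Id = "";
    public CompiledProgram CompileProgram(string source)
    {
        // the OpenCL compiler of the device's platform would be invoked here
        return new CompiledProgram();
    }
}

static class KernelCache
{
    static readonly Dictionary<string, CompiledProgram> cache = new Dictionary<string, CompiledProgram>();

    // same kernel source on the same device -> reuse the previously compiled program
    public static CompiledProgram GetOrCompile(ComputeDevice device, string source)
    {
        string key = device.Id + "\n" + source;
        CompiledProgram program;
        if (!cache.TryGetValue(key, out program))
        {
            program = device.CompileProgram(source);
            cache[key] = program;
        }
        return program;
    }
}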

In order to decrease the number of different kernels, the compiler is built such that it generates the same kernel source for operator groups that perform the same operations. As a consequence the names of the variables used in the expression are not used as identifiers in the kernel source. A numbering scheme is used instead.


5.3.5 Assignment Reuse

When multiple assignments are equal except for the names of the used variables, they can be compiled to use the same generated class. However, the impact on performance is not as significant as the benefit of reusing kernels. The driving force behind implementing it was that it reduces the amount of generated code, which is especially helpful when debugging the kernel generation code of the compiler.

5.4 Limitations

Only a subset of the operators and indexers described in the Zonnon Language Report [2] (section 10) can be converted to the Compute Framework. While it would certainly be possible to implement the full range of operators and indexers, the focus of this work was to design a scalable runtime system and mainly to implement a proof of concept.

The supported operators are:

• Element wise addition/subtraction/multiplication/division of arrays of the same rank (operators +, -, .*, ./)
• Multiplication of two matrices (operator *)
• Multiplication of a matrix with a vector (operator *)
• Multiplication of a vector with a matrix (operator *)

Only range indexers are supported. Furthermore, they must comply with the following conditions:

• Each dimension must consist of a single element (constant) or the unrestricted range (“..”).
• The resulting type after applying the indexer must be of array type, i.e. at least one dimension must be the unrestricted range.
• If a variable is used multiple times in the same assignment, the occurrences must either all not be indexed or use indexers that will evaluate to the same values at runtime. Different parts of the same array cannot be accessed in one assignment.

Certain infrastructure in the runtime system is not used to its full extent. This includes the Codelets, as the only Accelerator type that is currently ever targeted by a codelet is an OpenCL device. It would not have made sense to implement Codelet handling at all without having another device type in mind, such as also targeting the CPU. Code generation for the CPU, or rather for .NET, was already implemented in the predecessor of this work, and the intent was to also generate CPU Codelets, but as time was limited this never made it into the implementation stage.

The same holds for the OpenCL Codelet runtime estimation. While the infrastructure that uses these time estimation functions, namely the scheduler, is fully implemented, the function itself does not do any estimation; it simply reports that the codelet takes one second, no matter where it is executed.


5.5 Kernel Generation Example

Consider the following Zonnon code, consisting of local variable definitions and an assignment:

a, c, d : array {math} * of real{32};
B : array {math} *, * of real{32};
…
a := a + B * (c - d);

The big operator in the assignment is a matrix-vector multiplication, so we use the corresponding matrix-vector template:

kernel void Kernel(ulong n, ulong m{arguments})
{
    ulong globalSize = get_global_size(0);
    // for each row
    for(ulong row = get_local_id(0); row < n; row += globalSize)
    {
        // multiply and add corresponding elements
        {type} value = 0;
        for(ulong column = 0; column < m; column += 1)
        {
            #define Access(argument) argument[row * m + column]
            {type} left = {leftExpression};
            #undef Access
            #define Access(argument) argument[column]
            {type} right = {rightExpression};
            #undef Access
            value += left * right;
        }
        // write out result
        #define Access(argument) argument[row]
        {target} = {resultExpression};
        #undef Access
    }
}

We first replace {arguments} with the list of formal parameters that we need. As discussed in section 5.3.4, we do not use the names of the variables used in the assignment but take a numbering scheme to name the parameters. So we replace:

{arguments}  →  , global {type} * global0, global {type} * global1, global {type} * global2, global {type} * global3

(the leading comma attaches the parameters after ulong m in the template)

The way the compiler processes the variables is such that global0 to global3 correspond to a, B, c and d, respectively. As you can see, a replacement can introduce other placeholders that must subsequently also be replaced.

Next, we replace {type} with the type of all the variables used in the assignment. In OpenCL, real{32} is a float:

{type}  →  float

As we know that the target variable, a, is represented by global0, we can replace {target}:

{target}  →  Access(global0)

The Access macro used here has an important function: it separates the variable that is being accessed from the way it is accessed. The latter depends on the template used, while the former depends on the variable used in the assignment. As you can see in the template, Access is defined in all the places where a single element access to an array is needed.

The remaining placeholders {leftExpression}, {rightExpression} and {resultExpression} all have the same goal: they enable the integration of the element wise operators into the kernel. Their names correspond to their position relative to the big operator.

The left child of the big operator consists only of B, and B is represented by global1, thus we can replace:

{leftExpression}  →  Access(global1)

The right child is somewhat more complicated, as c - d is an expression rather than just a variable. We need to “lift” the expression into the Access macro:

{rightExpression}  →  Access(global2) - Access(global3)

The last step is to replace {resultExpression}. This expression is evaluated each time the big operator has calculated one element and needs to store it. The calculated value will be in a local variable named value. In our example, an element of a, here global0, must be added to value:

{resultExpression}  →  Access(global0) + value

After all these replacements, we finally get the fully generated kernel code:

kernel void Kernel(ulong n, ulong m,
                   global float * global0, global float * global1,
                   global float * global2, global float * global3)
{
    ulong globalSize = get_global_size(0);
    // for each row
    for(ulong row = get_local_id(0); row < n; row += globalSize)
    {
        // multiply and add corresponding elements
        float value = 0;
        for(ulong column = 0; column < m; column += 1)
        {
            #define Access(argument) argument[row * m + column]
            float left = Access(global1);
            #undef Access
            #define Access(argument) argument[column]
            float right = Access(global2) - Access(global3);
            #undef Access
            value += left * right;
        }
        // write out result
        #define Access(argument) argument[row]
        Access(global0) = Access(global0) + value;
        #undef Access
    }
}

The OpenCL compiler, which compiles this code at runtime, will first resolve the macro definitions. The resulting code that is actually compiled to device-dependent code is:

kernel void Kernel(ulong n, ulong m,
                   global float * global0, global float * global1,
                   global float * global2, global float * global3)
{
    ulong globalSize = get_global_size(0);
    // for each row
    for(ulong row = get_local_id(0); row < n; row += globalSize)
    {
        // multiply and add corresponding elements
        float value = 0;
        for(ulong column = 0; column < m; column += 1)
        {
            float left = global1[row * m + column];
            float right = global2[column] - global3[column];
            value += left * right;
        }
        // write out result
        global0[row] = global0[row] + value;
    }
}

5.6 CCI Generation Example

The generated kernel code is used by the generated code in the CCI IR. As the CCI IR is very much C#-like, we present the CCI nodes in textual form, although this source code never exists in this form during the compilation process.

All the generated kernels reside as string constants in a special class:

internal sealed class ComputeKernels
{
    public static String Kernel0 = "...";
}

Consider again the same variables and assignment as in the kernel generation example:

a, c, d : array {math} * of real{32};
B : array {math} *, * of real{32};
…
a := a + B * (c - d);

Figure 5 shows the abstract syntax tree of the above assignment.

[Figure 5: AST used in the example. The root := node assigns to a the result of a + node, whose children are a and a * node; the * node multiplies B with a - node, which subtracts d from c.]

In the following we walk through the class that the compiler generates from the assignment. This example class has some fields and delegate definitions which we will discuss when they are used later in the procedures.

internal class Operation0
{
    UInt64[] _kernel0Size;
    Kernel _kernel0;
    EventObject _eventObject0;

    delegate TimeSpan GetOpenCLTimeCallback(OpenCLComputeDevice device);
    delegate Object StartOpenCLCallback(
        OpenCLComputeDevice device,
        Buffer buffer0, Buffer buffer1, Buffer buffer2, Buffer buffer3
    );

The compiler creates a constructor that takes the variables used in the assignment. It checks them for being null. The parameters data0 to data3 correspond to a, B, c and d, respectively.

    public Operation0(Data data0, Data data1, Data data2, Data data3)
    {
        if (data0 == null) { throw new ArgumentNullException(); }
        UInt64[] size0 = data0.GetDimensions();
        if (data1 == null) { throw new ArgumentNullException(); }
        UInt64[] size1 = data1.GetDimensions();
        if (data2 == null) { throw new ArgumentNullException(); }
        UInt64[] size2 = data2.GetDimensions();
        if (data3 == null) { throw new ArgumentNullException(); }
        UInt64[] size3 = data3.GetDimensions();

The compiler has to make sure that the sizes of the arguments fit together according to the operations they are used in. When the compiler goes bottom up, it encounters a minus, the multiplication, a plus and finally the assign. Plus, minus and assign are element‐wise, so the arguments must have the exact same dimensions. For the multiplication, the second dimension of the first argument must be equal to the first (and only) dimension of the second argument. Since the multiplication is a big operation, the compiler stores the size information of the arguments in the field _kernel0Size that it will later use in the codelet.

if (!ComputeHelper.AreDimensionsEqual(size2, size3)) { throw new ArgumentException(); }

if (size1[1] != size2[0]) { throw new ArgumentException(); }

UInt64[] tempsize0 = new UInt64[1];

        tempsize0[0] = size1[0];
        _kernel0Size = new UInt64[2];
        _kernel0Size[0] = size1[0];
        _kernel0Size[1] = size1[1];

if (!ComputeHelper.AreDimensionsEqual(size0, tempsize0)) { throw new ArgumentException(); }

if (!ComputeHelper.AreDimensionsEqual(size0, size0)) { throw new ArgumentException(); }
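The checks above use ComputeHelper.AreDimensionsEqual. The actual helper belongs to the runtime library and is not reproduced here; the following minimal sketch only illustrates the semantics the generated code relies on (equal rank and equal extent in every dimension).

// Minimal sketch of the dimension comparison assumed by the generated checks.
// This is not the actual ComputeHelper implementation from the runtime library.
using System;

internal static class ComputeHelperSketch
{
    public static bool AreDimensionsEqual(UInt64[] left, UInt64[] right)
    {
        if (left.Length != right.Length) { return false; } // ranks must match
        for (int i = 0; i < left.Length; i++)
        {
            if (left[i] != right[i]) { return false; }     // every extent must match
        }
        return true;
    }
}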

Next, the compiler creates the Codelet and submits the Task, using delegates to specify the time-estimation and start functions and supplying the arguments.

        Codelet[] codelets = new Codelet[1];
        codelets[0] = new Codelet(
            OpenCLComputeDevice.Type,
            new GetOpenCLTimeCallback(GetOpenCLTime),
            new StartOpenCLCallback(StartOpenCL)
        );

        DataUse[] dataUses = new DataUse[4];
        dataUses[0] = new DataUse(data0, DataAccess.ReadWrite);
        dataUses[1] = new DataUse(data1, DataAccess.Read);
        dataUses[2] = new DataUse(data2, DataAccess.Read);
        dataUses[3] = new DataUse(data3, DataAccess.Read);


        ComputeManager.SubmitTask(codelets, dataUses);
    }

The compiler generates a simple time estimation function.

    TimeSpan GetOpenCLTime(OpenCLComputeDevice device)
    {
        return TimeSpan.FromSeconds(1);
    }

In the next step the compiler creates the Codelet start method. It takes the Accelerator that the scheduler chose and representations of the arguments on this Accelerator as parameters, and starts the computation. Here the compiler uses the field _kernel0Size again which it assigned in the constructor.

    EventObject StartOpenCL(
        OpenCLComputeDevice device,
        Buffer buffer0, Buffer buffer1, Buffer buffer2, Buffer buffer3
    )
    {
        UInt64[] kernel0GlobalRange =
            device.GetMatrixVectorMultiplicationGlobalSize(_kernel0Size);

        _kernel0 = KernelManager.GetKernelForProgram(ComputeKernels.Kernel0);
        _kernel0.SetValueArgument(0, _kernel0Size[0]);
        _kernel0.SetValueArgument(1, _kernel0Size[1]);
        _kernel0.SetGlobalArgument(2, buffer0);
        _kernel0.SetGlobalArgument(3, buffer1);
        _kernel0.SetGlobalArgument(4, buffer2);
        _kernel0.SetGlobalArgument(5, buffer3);

        EventObject[] kernel0predecessors = new EventObject[0];
        _eventObject0 = device.CommandQueue.StartKernel(
            _kernel0, kernel0GlobalRange, null, kernel0predecessors
        );
        device.CommandQueue.Flush();
        _eventObject0.RegisterCompletionCallback(
            new EventObjectCompletionCallback(CompleteOpenCL)
        );
        return _eventObject0;
    }

The compiler must keep the reference to the kernel that it obtains from the KernelManager in a field in order to be able to dispose it properly, together with the event object that represents a handle to the started kernel.

    void CompleteOpenCL(EventObject _)
    {
        _kernel0.Dispose();
        _eventObject0.Dispose();
    }
} // end of class Operation0

As a final step, the compiler can now convert the assignment to use the generated class. It also changes the types of the variables to Data.

Data a;
Data c;
Data d;
Data B;
...
new Operation0(a, B, c, d);

And with this last step the compiler is finished converting the assignment to the Compute Framework.

It might look strange that the constructed Operation0 object is not assigned to any variable. The Task that is created by the constructor references the instance of the generated class via the Codelet start delegate, so the garbage collector cannot reclaim this object before the computation has completed and the Task has been removed from the runtime system.
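This can be illustrated with a small, self-contained .NET example that is independent of the Compute Framework: a delegate that is bound to an instance method keeps a reference to its target object, so the object stays reachable for as long as the delegate itself is reachable.

// Stand-alone illustration (not Compute Framework code): a delegate bound to an
// instance method keeps its target object reachable for the garbage collector.
using System;

class Worker
{
    public int Run() { return 42; }
}

static class DelegateLifetimeDemo
{
    static void Main()
    {
        Func<int> callback = new Worker().Run;      // the Worker instance is stored nowhere else
        GC.Collect();                               // the instance survives: it is the delegate's target
        Console.WriteLine(callback.Target != null); // True
        Console.WriteLine(callback());              // 42
    }
}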


6 Experimental Results

To evaluate the benefits of using the Compute Framework, two computations were measured: a general matrix multiplication and a solution to the parallel factor analysis (PARAFAC, [14]) problem, specifically the alternating least squares algorithm.

These tests were performed on the following hardware:

- Intel® Core™ i7 930 CPU, four cores with hyper-threading at 2.8 GHz each
- 2 x ATI Radeon™ HD 6950 GPU with 880 MHz core clock and 1375 MHz memory clock. The BIOS of both GPUs was flashed to the HD 6970 firmware.

In all tests, these configurations are measured:

- Zonnon-CPU: implementation in Zonnon code, compiled without the /compute switch
- Zonnon-Compute: implementation in Zonnon code, compiled with the /compute switch
- Matlab: a comparable Matlab implementation running on the CPU

While on paper the performance of the GPUs is several times higher than that of the CPU, this performance can only be reached by highly data-parallel algorithms whose problem size amortizes the high cost of moving the data to GPU memory.
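A rough back-of-the-envelope calculation makes this concrete for the matrix multiplication benchmark below. Assuming the two single-precision n x n operands are transferred to the device and the result is transferred back, the ratio of computation to data movement is

\frac{2 n^{3}\ \text{flop}}{3 \cdot 4 n^{2}\ \text{bytes}} = \frac{n}{6}\ \frac{\text{flop}}{\text{byte}},

so for n = 16 only about 2.7 floating-point operations are performed per transferred byte, whereas for n = 8192 the ratio is roughly 1365. Only the large problems can hide the transfer cost behind useful work.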

A nice discussion of the topic can be found in a paper by Intel [15], where multiple algorithms running on optimized CPU and optimized GPU code are compared. The speedups described in the paper, which are around 2.5, are far lower than one would expect from the raw number of floating-point operations per second the devices are capable of.

As Matlab uses the Intel Math Kernel Library [16], a highly optimized library for Intel CPUs, we expect only a minor speedup, if any, from our not fully optimized GPU code.

Each measurement was repeated 5 times and the 3 lowest times were kept. The value presented in each table is the average of these 3 best times; the best and worst of them lie within 7% of the worst of the three. The source code that was used to obtain the results can be found in Appendix B.
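For illustration, the reported value can be derived from five raw timings as follows (the numbers in the array are made up):

// Illustration of how the reported value is derived from five raw timings (made-up numbers).
using System;
using System.Linq;

static class TimingReport
{
    static void Main()
    {
        double[] runs = { 105.0, 101.0, 118.0, 99.0, 131.0 };    // five raw measurements [ms]
        double[] best3 = runs.OrderBy(t => t).Take(3).ToArray(); // keep the 3 lowest times
        double reported = best3.Average();                       // value shown in the tables
        bool stable = (best3[2] - best3[0]) / best3[2] <= 0.07;  // best and worst within 7 %
        Console.WriteLine(reported + " ms, stable: " + stable);
    }
}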

6.1 Matrix Multiplication

General matrix multiplication with square matrices (n = 16):

Configuration     Runtime [µs]
Zonnon-CPU        193
Zonnon-Compute    2145
Matlab            70


General matrix multiplication with square matrices (n = 1024):

Configuration     Runtime [ms]
Zonnon-CPU        56925
Zonnon-Compute    113
Matlab            26

General matrix multiplication with square matrices (n = 8192):

Configuration     Runtime [ms]
Zonnon-CPU        > 600000 (10 min)
Zonnon-Compute    54247
Matlab            16370

6.2 Alternating Least Squares (ALS)

A tensor (multi-dimensional array) can be decomposed into factors; this is known as the PARAFAC problem. The alternating least squares algorithm can be used to decompose a three-dimensional array X into matrices A, B and C and weights W such that it satisfies

x_{ijk} = \sum_{r} a_{ir}\, b_{jr}\, c_{kr}\, w_{r} + \epsilon_{ijk}

where \epsilon_{ijk} represents the error of the decomposition. The algorithm iteratively minimizes this error. Iteration continues until the error is sufficiently small, where the required accuracy depends on the later use of the decomposed values.

The algorithm works by fixing two "dimensions", i.e. two of A, B and C, and then calculating the third one together with the weights. Because the algorithm is iterative, the measured values represent the time needed for a single iteration, that is, successively computing A, B and C once each while fixing the others.
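Concretely, one step of the implementation in Appendix B.2 (where A, B and C are called F1, F2 and F3) updates the first factor by solving the linear system

A_{1} = (F_{2}^{T} F_{2}) \circ (F_{3}^{T} F_{3}), \qquad
B_{1} = \sum_{k=1}^{N_3} X_{:,:,k}\, F_{2}\, \operatorname{diag}(F_{3}(k,:)), \qquad
F_{1} \leftarrow (A_{1} \backslash B_{1}^{T})^{T},

where \circ denotes the element-wise product. The columns of F_1 are then normalized and their norms become the weights W; the updates of F_2 and F_3 are analogous.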

The problem size can be characterized by four numbers N1, N2, N3 and r such that dim(X) = [N1, N2, N3], dim(A) = [N1, r], dim(B) = [N2, r], dim(C) = [N3, r] and dim(W) = [r].

For N1 = 32, N2 = 33, N3 = 34 and r = 512, the measured iteration times are:

Configuration     Runtime [ms]
Zonnon-CPU        68880
Zonnon-Compute    9986
Matlab            2309

To be able to compare Zonnon-CPU and Zonnon-Compute in more detail, that is, to see which parts dominate the speedup, the Zonnon code was instrumented to obtain finer-grained measurements (using in both cases only the numbers of the fastest execution):


Region               Zonnon-CPU [ms]   Zonnon-Compute [ms]   Rel. change
total                68789             9972                  - 86%
+ error              6                 229                   + 3710%
+ A                  2406              550                   - 77%
+ B                  45103             1393                  - 96%
+ diag               30                1286                  + 4170%
+ solve              6789              6802                  + 2%
+ normalizeVectors   7                 8                     ≈ 0
+ compose            14471             334                   - 96%
+ scaleVectors       2                 2                     0
+ diag               9                 282                   + 3030%

6.3 Discussion

For the larger matrix multiplications and for ALS, the introduction of the Compute Framework into Zonnon led to substantial speedups: roughly 500 for the matrix multiplication with n = 1024 and roughly 7 for ALS. For the small matrix multiplication (n = 16) the overhead of moving the data to and from the GPUs dominates, resulting in a slowdown of roughly 11.

When we compare the Zonnon results with those from Matlab, we immediately see that Matlab is faster in all test cases, by a factor of at least 3. It is indeed hard to keep up with Matlab and therefore with the Intel Math Kernel Library. There is still some potential to speed up the Zonnon computations: the OpenCL kernel templates we use are not heavily optimized, and the way the Compute Framework manages Tasks and the dependencies between them causes overhead that could be reduced.

We instrumented the Zonnon code in order to see which parts of the ALS program can be optimized by using the Compute Framework. Please consult Appendix B.2 to see which computations are performed in the respective parts.

The sections that obtain the largest speedup are clearly A, B and compose. Each of these sections repeatedly computes matrix multiplications. In Zonnon-CPU these matrix multiplications are implemented by a very simple algorithm that is neither cache-aware nor works block-wise. If an optimized algorithm were used instead, we suspect that the CPU version could even run faster than Zonnon-Compute.
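To illustrate what a cache-aware, block-wise CPU implementation means, the following C# sketch multiplies two n x n single-precision matrices tile by tile. It is only an illustration; it is neither the code the Zonnon compiler generates nor tuned for a particular cache size.

// Sketch of a cache-blocked (tiled) matrix multiplication C = A * B for n x n
// single-precision matrices stored row-major in flat arrays.
using System;

static class BlockedMatMul
{
    public static void Multiply(float[] A, float[] B, float[] C, int n)
    {
        const int block = 64; // tile size; a tuned implementation would match this to the cache
        Array.Clear(C, 0, C.Length);
        for (int i0 = 0; i0 < n; i0 += block)
        for (int k0 = 0; k0 < n; k0 += block)
        for (int j0 = 0; j0 < n; j0 += block)
        {
            // multiply one tile of A with one tile of B into the corresponding tile of C
            int iMax = Math.Min(i0 + block, n);
            int kMax = Math.Min(k0 + block, n);
            int jMax = Math.Min(j0 + block, n);
            for (int i = i0; i < iMax; i++)
            for (int k = k0; k < kMax; k++)
            {
                float a = A[i * n + k];
                for (int j = j0; j < jMax; j++)
                {
                    C[i * n + j] += a * B[k * n + j];
                }
            }
        }
    }
}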

The sections error and diag both experience a massive slowdown. These are the sections that access individual elements of the arrays without going through the Compute Framework. The arrays are wrapped in Data objects, and before the underlying host array can be read, all copies possibly held on Accelerators must be fetched back. To check whether such copies exist, the Data must be synchronized. This synchronization incurs some overhead, and when elements are accessed repeatedly, the overhead accumulates to the point where it significantly slows down these sections.
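The effect can be sketched as follows. The names DataSketch, Synchronize and hostArray are placeholders chosen for this illustration; they are not the actual members of the Data class.

// Hypothetical sketch of why single-element accesses on a Data wrapper are expensive.
class DataSketch
{
    readonly float[] hostArray;   // host-side copy of the wrapped array

    public DataSketch(float[] host) { hostArray = host; }

    public float GetElement(ulong index)
    {
        Synchronize();                // fetch back any copy held on an Accelerator first
        return hostArray[(int)index]; // only now is the host copy guaranteed to be up to date
    }

    void Synchronize()
    {
        // placeholder: in the real Data class this waits for dependent Tasks and
        // transfers device buffers back to the host if necessary
    }
}

Repeated element reads in the error and diag sections go through such a path again and again, so the per-access overhead accumulates into the slowdowns shown in the table above.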


7 Conclusion

The goal of this thesis was to speed up the mathematical computations in Zonnon. Based on the numbers above, we can say that this goal was achieved. When the operations reach a certain size, it makes sense to compile with the /compute switch.

The implemented Compute Framework scales up to a complicated algorithm like ALS, and we see no indication that a scalability limit has been reached. The generated kernels are already optimized by integrating the element-wise operators into the big operators. The CCI and OpenCL code generated by the compiler is readable and understandable.

With the Compute Framework we implemented a proof of concept of how the complexity of addressing specialized hardware can be hidden inside the compiler. We showed how the compiler can generate optimized, composed kernels directly from expressions.

7.1 Future Work

The Compute Framework and its integration into the Zonnon language can be improved in many ways. The following list contains areas that can be part of future work:

- Implement more/all of the mathematical operators: Only a small subset of the math operators was implemented. Extending this subset enables running many more algorithms using the Compute Framework.
- Allow different array base types: Integrating base types other than real{32} for arrays should be relatively straightforward for primitive types. When allowing object types to be used with the framework, care must be taken with user-defined operators; they might have to be converted to OpenCL code as well.
- Integrate .NET computations: When compiling without the /compute switch, the computations are generated to run on .NET. This facility can be used to integrate .NET computations into the Compute Framework, such that each Task is composed of two (or even more) Codelets.
- Implement time estimation: In order for the scheduling to be more efficient, codelets should include useful time estimation functions (see the sketch after this list).
- Optimize OpenCL kernels: There is some potential in making the Compute Framework faster by optimizing the OpenCL templates used to generate the kernels.
- Revise the runtime model: Lower the overhead of the framework runtime by measuring and optimizing the implementation.
- Incorporate constant arrays: Arrays can have constant sizes. If these are small enough, the compiler can decide not to convert operators that use them to the Compute Framework.
- Optimize scheduling: The compiler could analyze the control flow of a procedure and generate scheduling hints for the runtime, e.g. tasks that should be scheduled on the same device because other tasks later use their data.
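As an illustration of the time-estimation item, the constant estimate generated today (Section 5.6) could be replaced by a simple size-based estimate. The throughput value below is an assumed number; a real implementation would measure it per device.

// Sketch of a size-based time estimation for the matrix-vector kernel of the example class.
// The throughput constant is an assumption and should be calibrated per device.
TimeSpan GetOpenCLTime(OpenCLComputeDevice device)
{
    double operations = 2.0 * _kernel0Size[0] * _kernel0Size[1]; // one multiply-add per matrix element
    double assumedFlopsPerSecond = 50e9;                         // placeholder device throughput
    return TimeSpan.FromSeconds(operations / assumedFlopsPerSecond);
}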


7.2 Conclusive Statement

Working with the Zonnon language was a very interesting experience. The compiler differs from traditional compilers in that it uses CCI to do the heavy lifting of producing .NET code. The way this compiler is set up and implemented is easy to understand and therefore also easy to modify. I am happy to have contributed to Zonnon.


8 Bibliography

1. Accelerator. [Online] Microsoft Research. http://research.microsoft.com/en-us/projects/Accelerator/.
2. Gutknecht, Jürg. Zonnon Language Report. [Online] 2009. http://www.zonnon.ethz.ch/language/report.html.
3. Native Systems Group. [Online] ETH Zurich. http://nativesystems.inf.ethz.ch/.
4. Common Language Infrastructure (CLI). ECMA-335. [Online] http://www.ecma-international.org/publications/standards/Ecma-335.htm.
5. Common Compiler Infrastructure. [Online] Microsoft Research. http://research.microsoft.com/en-us/projects/cci/.
6. Gutknecht, Jürg, et al. Implementing Mathematical Data Types on Top of .NET. Brazilian Symposium on Programming Languages, 2010.
7. OpenCL - The open standard for parallel programming of heterogeneous systems. [Online] http://www.khronos.org/opencl/.
8. Khronos Group. [Online] http://www.khronos.org/.
9. NVidia Developer Zone: CUDA. [Online] http://developer.nvidia.com/category/zone/cuda-zone.
10. AMD Accelerated Parallel Processing (APP) (formerly ATI Stream). [Online] http://www.amd.com/stream.
11. Augonnet, Cédric, et al. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation: Practice and Experience, Special Issue: Euro-Par 2009, Vol. 23, No. 2, pp. 187-198. John Wiley & Sons, Ltd., February 2011.
12. Augonnet, Cédric, et al. StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. In: Sips, Henk, Epema, Dick and Lin, Hai-Xiang (Eds.), Lecture Notes in Computer Science. Springer Berlin / Heidelberg, pp. 863-874.
13. Topcuoglu, H., Hariri, S. and Wu, Min-You. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, pp. 260-274, March 2002.
14. Beckmann, Christian. Parallel Factor Analysis (PARAFAC). [Online] http://www.fmrib.ox.ac.uk/analysis/techrep/tr04cb1/tr04cb1/node2.html.
15. Lee, Victor W., et al. Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. Proceedings of the 37th Annual International Symposium on Computer Architecture, Saint-Malo, France, ACM, 2010, pp. 451-460.
16. Intel Math Kernel Library. [Online] http://software.intel.com/en-us/articles/intel-mkl/.


A Big Operator Kernel Templates

A.1 Matrix‐Matrix Multiplication

#define Element(matrix, width, n, m) (matrix)[(n) * (width) + (m)]

kernel void Kernel(
    ulong n, ulong m, ulong p,
    local {type} * localA, local {type} * localB{arguments}
)
{
    ulong localN = get_local_id(0);
    ulong localP = get_local_id(1);
    ulong blockSize = get_local_size(0);

    // each thread group may process 0..n (output) blocks,
    // depending on the problem size and global size
    for (ulong blockN = get_global_id(0) - localN; blockN < n; blockN += get_global_size(0))
    {
        for (ulong blockP = get_global_id(1) - localP; blockP < p; blockP += get_global_size(1))
        {
            ulong globalN = blockN + localN;
            ulong globalP = blockP + localP;
            {type} value = 0;

            // loop through (source) blocks
            for (ulong blockM = 0; blockM < m; blockM += blockSize)
            {
                // load the corresponding array elements
                if (globalN < n && (blockM + localP) < m)
                {
                    #define Access(matrix) Element(matrix, m, globalN, blockM + localP)
                    Element(localA, blockSize, localN, localP) = {leftExpression};
                    #undef Access
                }
                else
                {
                    Element(localA, blockSize, localN, localP) = 0;
                }
                if ((blockM + localN) < m && globalP < p)
                {
                    #define Access(matrix) Element(matrix, p, blockM + localN, globalP)
                    Element(localB, blockSize, localN, localP) = {rightExpression};
                    #undef Access
                }
                else
                {
                    Element(localB, blockSize, localN, localP) = 0;
                }
                // let all threads write an element
                barrier(CLK_LOCAL_MEM_FENCE);
                // sum up
                for (ulong localM = 0; localM < blockSize; localM++)
                {
                    value += Element(localA, blockSize, localN, localM)
                           * Element(localB, blockSize, localM, localP);
                }
                // let all threads read the elements
                barrier(CLK_LOCAL_MEM_FENCE);
            }
            // write out sum
            if (globalN < n && globalP < p)
            {
                #define Access(matrix) Element(matrix, p, globalN, globalP)
                {target} = {resultExpression};
                #undef Access
            }
        }
    }
}


A.2 Matrix-Vector Multiplication

kernel void Kernel(ulong n, ulong m{arguments})
{
    ulong globalSize = get_global_size(0);
    // for each row
    for(ulong row = get_local_id(0); row < n; row += globalSize)
    {
        // multiply and add corresponding elements
        {type} value = 0;
        for(ulong column = 0; column < m; column += 1)
        {
            #define Access(argument) argument[row * m + column]
            {type} left = {leftExpression};
            #undef Access
            #define Access(argument) argument[column]
            {type} right = {rightExpression};
            #undef Access
            value += left * right;
        }
        #define Access(argument) argument[row]
        {target} = {resultExpression};
        #undef Access
    }
}

A.3 Vector-Matrix Multiplication

kernel void Kernel(ulong m, ulong p{arguments})
{
    ulong globalSize = get_global_size(0);
    // for each column
    for(ulong column = get_local_id(0); column < p; column += globalSize)
    {
        // multiply and add corresponding elements
        {type} value = 0;
        for(ulong row = 0; row < m; row += 1)
        {
            #define Access(argument) argument[row]
            {type} left = {leftExpression};
            #undef Access
            #define Access(argument) argument[row * p + column]
            {type} right = {rightExpression};
            #undef Access
            value += left * right;
        }
        #define Access(argument) argument[column]
        {target} = {resultExpression};
        #undef Access
    }
}


A.4 Element-Wise Copy

kernel void Kernel(ulong count{arguments})
{
    ulong size = get_local_size(0);
    ulong index = get_local_id(0);

    while (index < count)
    {
        #define Access(arr) arr[index]
        {target} = {resultExpression};
        #undef Access
        index += size;
    }
}


B Test Source Code

B.1 Matrix Multiplication

Zonnon:

module MatrixMultiplication;
import System.Diagnostics.Stopwatch as Stopwatch;

type Matrix = array {math} *, * of real {32};

procedure {public} compute;
var
    n : integer;
    A, B, C, D, E, F : Matrix;
    value : real {32};
    watch : Stopwatch;
begin
    watch := new Stopwatch;
    n := 1024;

    (* do not measure compute framework starting overhead *)
    A := new Matrix(1, 1);
    B := new Matrix(1, 1);
    C := A * B;
    value := C[0, 0];

    D := new Matrix(n, n);
    E := new Matrix(n, n);
    watch.Start;
    F := D * E;
    value := F[0, 0];
    watch.Stop();
    writeln(watch.Elapsed.ToString);
    readln;
end compute;

begin
end MatrixMultiplication.

Matlab:

n = 1024;
A = single(rand([n,n]));
B = single(rand([n,n]));
A(1,1)
B(1,1)
tic;
C = A * B;
C(1,1)
toc


B.2 Alternating Least Squares

Zonnon:

module AlternateLeastSquares;
import
    System.Random as Random,
    System.Math as Math,
    Measurement.Measure as Measure,
    AlternateLeastSquaresHost as Host;

type Scalar = real{32};
type Vector = array {math} * of real {32};
type Matrix = array {math} *, * of real{32};
type Tensor = array {math} *, *, * of real {32};

procedure {public} Solution(
    x : array *, *, * of real {32};
    var F1, F2, F3 : array *, * of real {32};
    var w : array * of real {32};
    r, maxit : integer;
    random : Random
);
var
    N1, N2, N3 : integer;
    mathx : Tensor;
    mathF1, mathF2, mathF3 : Matrix;
    mathw : Vector;
    i, j : integer;
begin
    N1 := len(x, 0);
    N2 := len(x, 1);
    N3 := len(x, 2);

F1 := new Matrix(N1, r); F2 := new Matrix(N2, r); F3 := new Matrix(N3, r);

    for i := 0 to r - 1 do
        for j := 0 to N1 - 1 do
            F1[j, i] := real{32}(random.NextDouble());
        end;

        for j := 0 to N2 - 1 do
            F2[j, i] := real{32}(random.NextDouble());
        end;

        for j := 0 to N3 - 1 do
            F3[j, i] := real{32}(random.NextDouble());
        end;
    end;

mathx := x; mathF1 := F1; mathF2 := F2; mathF3 := F3; mathw := new Vector(r);

SolutionInternal(mathx, mathF1, mathF2, mathF3, mathw, maxit);

F1 := mathF1; F2 := mathF2; F3 := mathF3; w := mathw; end Solution;

procedure SolutionInternal(
    x : Tensor;                 (* [N1, N2, N3] *)
    var F1, F2, F3 : Matrix;    (* [N1, r], [N2, r], [N3, r] *)
    w : Vector;                 (* [r] *)
    maxit : integer
);
var
    N1, N2, N3, r : integer;
    random : Random;
    i, it : integer;
    xnrm, relResidNrm : real {32};
    A1, B1, A2, B2, A3, B3, transpose, transposed, diagged : Matrix;
    F1t, F2t, F3t : Matrix;
    x1, err : Tensor;
    d : Vector;
    tmp : real{32};
begin

measure := new Measure;

N1 := len(F1, 0); N2 := len(F2, 0); N3 := len(F3, 0); r := len(w, 0);

xnrm := real{32}(Math.Sqrt(x +* x)); x1 := new Tensor(N1, N2, N3);

it := 0;

    loop
        measure.BeginPart("error");

err := x - x1;

relResidNrm := real{32}(real{32}(Math.Sqrt(err +* err)) / xnrm); writeln(relResidNrm);

it := it + 1;

measure.EndPart();

if (it > maxit) or (relResidNrm < 0.001) then exit; end;

measure.BeginPart("A"); F2t := !F2; F3t := !F3; A1 := (F2t * F2) .* (F3t * F3); tmp := A1[0, 0]; measure.EndPart();

        measure.BeginPart("B");
        B1 := new Matrix(N1, r);
        for i := 0 to N3 - 1 do
            d := F3[i, ..];
            diagged := diag(d);
            B1 := B1 + x[.., .., i] * F2 * diagged;
        end;
        tmp := B1[0, 0];
        measure.EndPart();

F1 := linSolve(A1, B1);

measure.BeginPart("normalizeVectors");


w := Host.normalizeVectors(F1); measure.EndPart();

measure.BeginPart("A"); F1t := !F1; A2 := (F3t * F3) .* (F1t * F1); tmp := A2[0,0]; measure.EndPart();

        measure.BeginPart("B");
        B2 := new Matrix(N2, r);
        for i := 0 to N1 - 1 do
            d := F1[i, ..];
            diagged := diag(d);
            B2 := B2 + (x[i, .., ..] * F3) * diagged;
        end;
        tmp := B2[0, 0];
        measure.EndPart();

F2 := linSolve(A2, B2);

measure.BeginPart("normalizeVectors"); w := Host.normalizeVectors(F2); measure.EndPart();

measure.BeginPart("A"); F1t := !F1; F2t := !F2; A3 := (F1t * F1) .* (F2t * F2); tmp := A3[0,0]; measure.EndPart();

        measure.BeginPart("B");
        B3 := new Matrix(N3, r);
        for i := 0 to N2 - 1 do
            d := F2[i, ..];
            diagged := diag(d);
            transpose := x[.., i, ..];
            transposed := !transpose;
            B3 := B3 + (transposed * F1) * diagged;
        end;
        tmp := B3[0, 0];
        measure.EndPart();

F3 := linSolve(A3, B3);

measure.BeginPart("normalizeVectors"); w := Host.normalizeVectors(F3); measure.EndPart();

x1 := cp3d_compose(F1, F2, F3, w);

measure.Output();

    end;
end SolutionInternal;

procedure linSolve(
    A : Matrix;    (* [n, n] *)
    B : Matrix     (* [m, n] *)
) : Matrix;        (* [m, n] *)
var
    result : Matrix;
begin
    measure.BeginPart("solve");


    result := real{32}(!(A \ !B));
    measure.EndPart();
    return result;
end linSolve;

procedure cp3d_compose(
    F1, F2, F3 : Matrix;    (* [N1, r], [N2, r], [N3, r] *)
    w : Vector              (* [r] *)
) : Tensor                  (* [N1, N2, N3] *);
var
    N1, N2, N3 : integer;
    x : Tensor;
    T : Matrix;
    d : Vector;
    i : integer;
    diagged : Matrix;
    F3t : Matrix;
    tmp : real{32};
begin
    measure.BeginPart("compose");
    N1 := len(F1, 0);
    N2 := len(F2, 0);
    N3 := len(F3, 0);

    x := new Tensor(N1, N2, N3);
    measure.BeginPart("scaleVectors");
    T := Host.scaleVectors(F1, w);
    measure.EndPart();
    for i := 0 to N1 - 1 do
        d := T[i, ..];
        diagged := diag(d);
        F3t := !F3;
        x[i, .., ..] := F2 * diagged * F3t;
    end;
    tmp := x[0, 0, 0];
    measure.EndPart();

    return x;
end cp3d_compose;

procedure diag(
    V : Vector
) : Matrix;
var
    value : Scalar;
    result : Matrix;
    i, n : integer;
begin
    measure.BeginPart("diag");
    n := len(V, 0);
    result := new Matrix(n, n);
    for i := 0 to n - 1 do
        value := V[i];
        result[i, i] := value;
    end;
    measure.EndPart();
    return result;
end diag;

var
    measure : Measure;

begin
end AlternateLeastSquares.


Matlab: cp3d_als.m

%
% Canonical decomposition for 3D tensors using alternating least squares
%
% [F,w] = cp3d_als(x,r,mintol,maxit)
%
% INPUT:
% x - input 3D tensor [N1,N2,N3]
% r - desired rank of the decomposition (number of the additive terms in the decomposition)
%
% OUTPUT:
% F - decomposition factor matrices {[N1,r], [N2,r], [N3,r]}
% w - weights attached to each additive term of the decomposition [r]
%
% Copyright (c) 2011 by Alexey Morozov, Physics in Medicine Group, University Hospital of Basel
%
function [F,w] = cp3d_als(x,r,mintol,maxit)

if nargin < 3
    mintol = 1e-4;
end
if nargin < 4
    maxit = 100;
end;

w = ones(r,1);

% random initialization
F = cell(3,1);
F{1} = single(rand(size(x,1),r));
F{2} = single(rand(size(x,2),r));
F{3} = single(rand(size(x,3),r));

% Frobenius norm of the tensor
xnrm = sqrt(sum(x(:).*x(:)));

relResidNrm = 2;
newRelResidNrm = 1;
it = 0;
while (it < maxit) && (newRelResidNrm < relResidNrm - 0.0001)
    tic;
    % solve alternatively for each factor while fixing the others
    for dim = 1:3
        [A,B] = prepareLinearSystem(dim,x,F);
        F{dim} = linSolve(A,B);

        % normalize all vectors of the factor matrix
        [F{dim},w] = normalizeVectors(F{dim});
    end;

    % compute relative norm of the error
    x1 = cp3d_compose(F,w);
    err = x - x1;
    relResidNrm = newRelResidNrm;
    newRelResidNrm = sqrt(sum(err(:).*err(:))) / xnrm;

    it = it + 1;

    fprintf('it %d: relResidNrm=%f\n',it,newRelResidNrm);
    toc
end
end

%
% Prepare linear system to solve for a factor with number dim
%
% INPUT:
% dim - factor number [1]
% x   - input tensor [N1,N2,N3]
% F   - factor matrices {[N1,r], [N2,r], [N3,r]}
%
% OUTPUT:
% A - system matrix [r,r] (symmetric, positive definite)
% B - multiple right hand sides [Ndim,r]
%
function [A,B] = prepareLinearSystem(dim,x,F)

r = size(F{1},2);
B = zeros(size(x,dim),r);

switch dim

    case 1

        % system matrix
        A = (F{2}'*F{2}) .* (F{3}'*F{3});

        % multiple right hand sides
        for k = 1:size(x,3)
            B = B + squeeze(x(:,:,k)) * F{2} * diag(F{3}(k,:));
        end

    case 2

        % system matrix
        A = (F{3}'*F{3}) .* (F{1}'*F{1});

        % multiple right hand sides
        for k = 1:size(x,1)
            B = B + squeeze(x(k,:,:)) * F{3} * diag(F{1}(k,:));
        end

    case 3

        % system matrix
        A = (F{1}'*F{1}) .* (F{2}'*F{2});

        % multiple right hand sides
        for k = 1:size(x,2)
            B = B + squeeze(x(:,k,:))' * F{1} * diag(F{2}(k,:));
        end
end

%% here 'squeeze' is used to remove singleton dimensions; for example
%% array of size [32,1,34] is 'squeezed' to array of size [32,34]

end

%
% Perform multiple linear solves
%
% INPUT:
% A - system matrix [N,N]
% B - multiple right hand sides [M,N]
%
% OUTPUT:
% C - solution array [M,N]
%
function C = linSolve(A,B)

% Do C = (A\B')'; % in case of general linear solve availability
% otherwise perform multiple iterative solves (e.g. conjugate gradient)
% NOTE that iterative solves can be done in PARALLEL!

C = zeros(size(B));
tol = 1e-8;
maxit = size(A,1); % maximal number of iterations for Krylov solvers with convergence guarantee

for k = 1:size(B,1)
    b = B(k,:)';
    [c,flag] = pcg(A,b,tol,maxit);
    assert(flag == 0,'CG failed to converge!');
    C(k,:) = c';
end
end

%
% Normalize vectors of a factor matrix
%
% INPUT:
% F - factor matrix [N,r]
%
% OUTPUT:
% F - normalized matrix [N,r]
% w - normalization coefficients for each vector [r,1]
%
function [F,w] = normalizeVectors(F)
r = size(F,2);
w = zeros(r,1);
for k = 1:r
    w(k) = sqrt(sum(F(:,k).^2)); % use Frobenius norm
    F(:,k) = F(:,k) ./ w(k);
end
end

cp3d_compose.m

%
% Compose factors of 3D Canonical decomposition to a tensor
%
% x = cp3d_compose(F,w)
%
% INPUT:
% F - decomposition factor matrices {[N1,r], [N2,r], [N3,r]}
% w - weights attached to each additive term of the decomposition [r]
%
% OUTPUT:
% x - composed tensor [N1,N2,N3]
%
% Copyright (c) 2011 by Alexey Morozov, Physics in Medicine Group, University Hospital of Basel
%
function x = cp3d_compose(F,w)
x = single(zeros(size(F{1},1), size(F{2},1), size(F{3},1)));
if nargin < 2 % no weight specified, use w=1
    for k = 1:size(F{1},1)
        x(k,:,:) = F{2} * diag(F{1}(k,:)) * F{3}';
    end
else
    T = scaleVectors(F{1},w); % multiply each vector of the first factor matrix by the weights
    for k = 1:size(F{1},1)
        x(k,:,:) = F{2} * diag(T(k,:)) * F{3}';
    end
end;
end

%
% Scale vectors of a factor matrix
%
% INPUT:
% F - factor matrix [N,r]
%
% OUTPUT:
% F - scaled matrix [N,r]
% w - weight coefficients for each vector [r,1]
%
function F = scaleVectors(F,w)
r = size(F,2);
for k = 1:r
    F(:,k) = F(:,k) .* w(k);
end
end

cp3d_test.m

% input tensor
x = single(rand(32,33,34));

% run the CP ALS algorithm
[F,w] = cp3d_als(x,512,1e-3,100);

% construct a tensor from the obtained decomposition
x1 = cp3d_compose(F,w);

% simple 1D visualization of the obtained result compared with the original data
plot(x(:))
hold on
plot(x1(:),'r')
plot(x(:)-x1(:),'k')


C Zonnon EBNF

In this appendix, the complete Zonnon EBNF is presented. Modifications that stem from the introduction of the mathematical data types are highlighted.

// 1. Program and program units CompilationUnit = { ProgramUnit "." }. ProgramUnit = ( Module | Definition | Implementation | Object).

// 2. Modules Module = module [ Modifiers ] ModuleName [ ImplementationClause ] ";" [ ImportDeclaration ] ModuleDeclarations ( UnitBody | end ) SimpleName. Modifiers = "{" IdentList "}". ModuleDeclarations = { SimpleDeclaration | NestedUnit ";" | ProcedureDeclaration | OperatorDeclaration ProtocolDeclaration | ActivityDeclaration }. NestedUnit = ( Definition | Implementation ). ImplementationClause = implements ImplementedDefinitionName { "," ImplementedDefinitionName }. ImplementedDefinitionName = DefinitionName | "[" "]". ImportDeclaration = import Import { "," Import } ";". Import = ImportedName [ as ident ]. ImportedName =( ModuleName | DefinitionName | ImplementationName | NamespaceName | ObjectName ). UnitBody = begin [ StatementSequence ] end.

// 3. Definitions Definition = definition [ Modifiers ] DefinitionName [ RefinementClause ] ";" [ ImportDeclaration ] DefinitionDeclarations end SimpleName. RefinementClause = refines DefinitionName. DefinitionDeclarations = { SimpleDeclaration | { ProcedureHeading “;” } | ProtocolDeclaration }. ProtocolDeclaration = protocol ProtocolName "=" "(" ProtocolSpecification ")" ";". ProtocolSpecification = [ Alphabet "," ] Grammar | Alphabet [ "," Grammar ]. Alphabet = TerminalSymbol { "," TerminalSymbol }. Grammar = Production { "," Production }. Production = ProductionName "=" Alternative. Alternative = ItemSequence { "|" ItemSequence }. ItemSequence = Item { Item }. Item = ( ["?"] TerminalSymbol | ProductionName | TypeName | Alternative | Group | Optional | Repetition). Group = "(" ItemSequence ")". Optional = "[" ItemSequence "]". Repetition = "{" ItemSequence "}". TerminalSymbol = number | ident | charConstant. ProductionName = ident.

// 4. Implementations Implementation = implementation [ Modifiers ] ImplementationName ";" [ ImportDeclaration ] Declarations ( UnitBody | end ) SimpleName.


// 5. Objects Object = object [ Modifiers ] ObjectName ObjectDefinition SimpleName. ObjectDefinition = [ FormalParameters ] [ ImplementationClause ] ";" [ ImportDeclaration ] { SimpleDeclaration | ProcedureDeclaration | ProtocolDeclaration | ActivityDeclaration } ( UnitBody | end ). ActivityDeclaration = activity ActivityName [ FormalParameters ] [ProcImplementationClause]";" Declarations ( UnitBody | end ) SimpleName.

// 6. Declarations Declarations = { SimpleDeclaration | ProcedureDeclaration }. SimpleDeclaration = ( const [ Modifiers ] { ConstantDeclaration ";" } | type [ Modifiers ] { TypeDeclaration ";" } | var [ Modifiers ] { VariableDeclaration ";" } ). ConstantDeclaration = ident "=" ConstExpression. ConstExpression = Expression. TypeDeclaration = ident "=" Type. VariableDeclaration = IdentList ":" Type.

// 7. Types Type = ( TypeName [ "{" Width "}" ] | EnumType | ArrayType | ProcedureType | InterfaceType | ObjectType | RecordType | ProtocolType ). Width = ConstExpression. ArrayType = array [ "{" math "}" ] Length { "," Length } of Type. Length = ( ConstExpression | "*" ). EnumType = "(" IdentList ")". ProcedureType = procedure [ ProcedureTypeFormals ]. ProcedureTypeFormals = "(" [ PTFSection { ";" PTFSection } ] ")" [ ":" FormalType ]. PTFSection = [ var ] FormalType { "," FormalType }. FormalType = { array "*" of } ( TypeName | InterfaceType ). InterfaceType = object [ PostulatedInterface ]. PostulatedInterface = "{" DefinitionName { "," DefinitionName } "}". ObjectType = object ObjectDefinition ident. RecordType = record { VariableDeclaration ";" } end ident. ProtocolType = activity [ "{" ProtocolName "}" ].

// 8. Procedures & operators ProcedureDeclaration = ProcedureHeading [ ProcImplementationClause ] ";" [ ProcedureBody ";" ]. ProcImplementationClause = implements ImplementedMemberName { "," ImplementedMemberName }. ImplementedMemberName = ( DefinitionName | "[" "]" ) "." MemberName. ProcedureHeading = procedure [ Modifiers ] ProcedureName [ FormalParameters ]. ProcedureBody = Declarations UnitBody SimpleName. FormalParameters = "(" [ FPSection { ";" FPSection } ] ")" [ ":" FormalType ]. FPSection = [ var ] ident { "," ident } ":" FormalType. OperatorDeclaration = operator [ Modifiers ] OpSymbol [ FormalParameters ] ";" OperatorBody ";". OperatorBody = Declarations UnitBody OpSymbol. OpSymbol = string. // A 1,2,3-character string; the set of possible symbols is restricted


// 9. Statements StatementSequence = Statement { ";" Statement }. Statement = [ Assignment | ProcedureCall | IfStatement | CaseStatement | WhileStatement | RepeatStatement | LoopStatement | ForStatement | await Expression | exit | return [ Expression { "," Expression } ] | BlockStatement | Send | Receive | SendReceive | LaunchActivity | AnonymousActivity ]. Assignment = Designator { "," Designator } ":=" Expression { "," Expression }. ProcedureCall = Designator. IfStatement = if Expression then StatementSequence { elsif Expression then StatementSequence } [ else StatementSequence ] end. CaseStatement = case Expression of Case { "|" Case } [ else StatementSequence ] end. Case = [ CaseLabel { "," CaseLabel } ":" StatementSequence ]. CaseLabel = ConstExpression [ ".." ConstExpression ]. WhileStatement = while Expression do StatementSequence end. RepeatStatement = repeat StatementSequence until Expression. LoopStatement = loop StatementSequence end. ForStatement = for ident ":=" Expression to Expression [ by ConstExpression ] do StatementSequence end. BlockStatement = do [ Modifiers ] [ StatementSequence ] { ExceptionHandler } [ CommonExceptionHandler ] [ TerminationHandler ] end. ExceptionHandler = on ExceptionName { "," ExceptionName } do StatementSequence. CommonExceptionHandler = on exception do StatementSequence. TerminationHandler = on termination do StatementSequence. Send = ActivityInstanceName [ "(" Designator { "," Designator } ")" ]. Receive = [ Designator { "," Designator } ":=" ] await [ ActivityInstanceName ]. SendReceive = Designator { "," Designator } ":=" Send. Accept = accept QualIdent {"," QualIdent}. LaunchActivity = new ActivityName [ "(" ActualParameters ")" ]. AnonymousActivity = activity ";" Declarations UnitBody.

// 10. Expressions
Expression = SimpleExpression [ ( "=" | "#" | "<" | "<=" | ">" | ">=" | ".=" | ".#" | ".<" | ".<=" | ".>" | ".>=" | in ) SimpleExpression ]
    | Designator implements DefinitionName
    | Designator is TypeName.
SimpleExpression = [ "+" | "-" ] Term { ( "+" | "-" | or ) Term }.
Term = Factor { ( "*" | "/" | div | mod | "&" | "+*" | ".*" | "./" | "\" ) Factor }.


Factor = number | CharConstant | string | nil | Set | Designator
    | new TypeName [ "(" ActualParameters ")" ]
    | new ActivityName [ "(" ActualParameters ")" ]
    | "(" Expression ")" | "~" Factor | "!" Factor | Factor "**" Factor.
Set = "{" [ SetElement { "," SetElement } ] "}".
SetElement = Expression [ ".." Expression ].
ExpressionArray = "[" ArrayFactor "]".
ArrayFactor = ExpressionArray { "," ExpressionArray } | Expression { "," Expression }.
ExpressionRange = Expression | Range.
Range = [ Expression ] ".." [ Expression ] [ "by" Expression ].
Designator = Instance
    | TypeName "(" Expression [ "," Size ] ")"                    // Conversion
    | Designator "^"                                              // Dereference
    | Designator "[" ExpressionRange { "," ExpressionRange } "]"  // Array element(s)
    | Designator "(" [ ActualParameters ] ")"                     // Function call
    | Designator "." MemberName.                                  // Member selector
Instance = ( self | InstanceName | DefinitionName "(" InstanceName ")" ).
Size = ConstantExpression.
ActualParameters = Actual { "," Actual }.
Actual = Expression [ "{" [ var ] FormalType "}" ].               // Argument with type signature

// 11. Constants
number = ( whole | real ) [ "{" Width "}" ].
whole = digit { digit } | digit { HexDigit } "H".
real = digit { digit } "." { digit } [ ScaleFactor ].
ScaleFactor = "E" [ "+" | "-" ] digit { digit }.
HexDigit = digit | "A" | "B" | "C" | "D" | "E" | "F".
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
CharConstant = '"' character '"' | "'" character "'" | digit { HexDigit } "X".
string = '"' { character } '"' | "'" { character } "'".
character = // Any character from the alphabet except the current delimiter character

// 12. Identifiers & names
ident = ( letter | "_" ) { letter | digit | "_" }.
letter = "A" | ... | "Z" | "a" | ... | "z" | // any other "culturally-defined" letter
IdentList = ident { "," ident }.
QualIdent = { ident "." } ident.
DefinitionName = QualIdent.
ModuleName = QualIdent.
NamespaceName = QualIdent.
ImplementationName = QualIdent.
ObjectName = QualIdent.
TypeName = QualIdent.
ExceptionName = QualIdent.
InstanceName = QualIdent.
ActivityInstanceName = QualIdent.
ProcedureName = ident.
ProtocolName = ident.
ActivityName = ident.
MemberName = ( ident | OpSymbol ).
SimpleName = ident.
