Quick viewing(Text Mode)

Opencl Based Digital Image Projection Acceleration

Opencl Based Digital Image Projection Acceleration

OPENCL BASED DIGITAL IMAGE PROJECTION ACCELERATION

Thesis

Submitted to

The School of Engineering of the

UNIVERSITY OF DAYTON

In Partial Fulfillment of the Requirements for

The Degree of

Master of Science in Electrical Engineering

By

Bryan Michael Badalamenti

UNIVERSITY OF DAYTON

Dayton, Ohio

August, 2015 OPENCL BASED DIGITAL IMAGE PROJECTION ACCELERATION

Name: Badalamenti, Bryan Michael

APPROVED BY:

Eric Balster, Ph.D. Frank Scarpino, Ph.D. Advisor Committee Chairman Committee Member Associate Professor, Department of Professor Emeritus, Department of Electrical and Engineering Electrical and Computer Engineering

Vijayan Asari, Ph.D. Committee Member Professor, Department of Electrical and Computer Engineering

John G. Weber, Ph.D. Eddy M. Rojas, Ph.D., M.A., P.E. Associate Dean Dean School of Engineering School of Engineering

ii Copyright by

Bryan Michael Badalamenti

All rights reserved

2015 ABSTRACT

OPENCL BASED DIGITAL IMAGE PROJECTION ACCELERATION

Name: Badalamenti, Bryan Michael University of Dayton

Advisor: Dr. Eric Balster

In this thesis, several implementations of an image back projection algorithm using Open Com- puting Language (OpenCL) for different types of processors are developed. Image back projection is a method to take aerial imagery and create a map-like image that contains real-world dimensions and to remove the perspective angle from the camera. The processors that ran the back projection algorithm include a (CPU), a Many Integrated Core (MIC), two Graphic

Processing Units (GPUs), and two Field-Programmable Gate Array (FPGA) devices all have differ- ent architectures are require different programming styles. OpenCL is a new programming standard that provides a common between the different types of devices using a widely used programming environment. OpenCL follows the C99 Standard and provides support for devices with parallel capabilities to help create an optimized solution. Different ver- sions of the back projection algorithm are presented here to examine the flexibility of the OpenCL standard and to create the most optimal solution among the different devices. Timing measure- ments are compiled for each version of the algorithm for each device and compared against one another and a C++ single threaded host CPU implementation used as a baseline. The fastest imple- mentation for each device is then compared in terms of execution time, , and maximum iii total dissipated energy (TDE). The presented OpenCL back projection solution provides a 90 times (62.2 ms) on the CPU when compared to the baseline implementation (5680.7 ms) of the algorithm. The throughput for this CPU implementation is 1302.25 megabytes per second (MB/s) and the maximum TDE is 7.2 J per frame.

iv To my entire family, especially my parents, for all of their love, support, and encouragement.

v ACKNOWLEDGMENTS

I would like to thank the following people for their support in making all of this possible and getting me to this point.

• Dr. Eric Balster: For giving me this opportunity in graduate school and for providing the

means and the push needed for me to complete this in a professional manner.

• Kerry Hill: For funding and supporting my research.

• Al Scarpelli: For funding and supporting my research.

• Bill Turri: For supporting my research.

• Andrew Kordik: For contributing his knowledge and insight in image projection and provid-

ing me guidance along the way.

• Jonathan Skeans: For providing help and knowledge in OpenCL and providing me guidance

along the way.

• David Mundy: It was rewarding to work alongside you in the lab.

• Daniel Shaffer: For providing help with image projection.

• David Fan: For his support in all things programming that allowed me to more quickly use

the lab equipment and create testing scripts.

vi • Dr. Frank Scarpino: For his knowledge in back projection and being on my defense com-

mittee.

• Dr. Vijayan Asari: For bearing through my thesis defense.

• AFRL/UDRI/ADDA: For providing a great work environment and creating the framework

and systems I used to finish my research and work.

• Frank & Kathleen Badalamenti: I would like to thank my parents for all the love and sup-

port they have provided me throughout my life. You are the best parents a son can ask for.

• Adam Badalamenti: I would like to thank my brother for always being there for me.

• Helena Badalamenti: I would like to thank my grandma for her support and reminders that

I am getting old.

• Buddy, Kiba, & Lotus: My dogs who are always there when I get home providing me with

emotional support and helping to clear my mind of all the stress.

vii TABLE OF CONTENTS

ABSTRACT...... iii

DEDICATION...... v

ACKNOWLEDGMENTS...... vi

LIST OF FIGURES...... x

LIST OF TABLES...... xii

I. INTRODUCTION...... 1

1.1 Objectives in Research...... 2 1.2 Significance of Research...... 2 1.2.1 Previous Optimization Attempts...... 3 1.3 Thesis Organization...... 3

II. OPENCL OVERVIEW...... 5

2.1 History of OpenCL...... 5 2.2 Programming in OpenCL...... 7 2.2.1 Host Programming...... 7 2.2.2 Kernel Programming...... 12

III. BACK PROJECTION OVERVIEW...... 14

3.1 Projection Background...... 14 3.1.1 Image Projection...... 15 3.1.2 Back Projection...... 16 3.1.2.1 Collinearity Equations...... 19 3.1.2.2 Digital Elevation Maps (DEMs)...... 20

viii IV. TEST PROCEDURE...... 22

4.1 Test Hardware...... 22 4.1.1 Xeon E5-2670 v2 CPU...... 23 4.1.2 Intel Xeon Phi 5110P...... 24 4.1.3 K10 GPU...... 24 4.1.4 Nvidia K40 GPU...... 25 4.1.5 GSD5 FPGA...... 26 4.1.6 Altera GSD8 FPGA...... 27 4.2 Test Description...... 28

V. BACK PROJECTION ALGORITHM VERSIONS...... 32

5.1 Single Work Item...... 32 5.2 Single Threaded Work Group...... 34 5.3 Row Wise Threaded...... 36 5.4 Column Wise Threaded...... 38

VI. RESULTS...... 41

6.1 Timing Measurements...... 41 6.1.1 Single Work Item Results...... 42 6.1.2 Single Threaded Work Group Results...... 45 6.1.3 Row Wise Threaded Results...... 48 6.1.4 Column Wise Threaded Results...... 52 6.2 Comparison...... 55 6.2.1 Throughput...... 57 6.2.2 Power Measurements...... 58

VII. FINAL THOUGHTS...... 60

7.1 Conclusion...... 60 7.2 Future Research...... 62 7.2.1 Examine Additional Memory Access Patterns...... 62 7.2.2 Examine Additional Devices...... 63 7.2.3 Channels...... 64 7.2.4 Integer Arithmetic...... 64

BIBLIOGRAPHY...... 66

ix LIST OF FIGURES

2.1 Flow chart of OpenCL programming and execution...... 7

2.2 The OpenCL framework...... 8

2.3 Model of an OpenCL device regarding processing distribution...... 11

3.1 Back projection processing chain...... 15

3.2 Back projection diagram...... 16

3.3 The input aerial image provides little context of the image to the region...... 17

3.4 Back projected aerial image provides additional information to a viewer...... 18

3.5 Information about the region presented in the input image...... 20

3.6 Sample 3x3 DEM grid...... 21

4.1 Intel Xeon E5-2670 v2 CPU...... 23

4.2 Intel Xeon Phi 5110P...... 24

4.3 Nvidia K10 GPU...... 25

4.4 Nvidia K40 GPU...... 26

4.5 BittWare board with an Altera GSD5 FPGA...... 27

4.6 Nallatech board with an Altera GSD8 FPGA...... 28

5.1 Single Work Item Diagram...... 33

x 5.2 Single Threaded Work Group Diagram...... 35

5.3 Row Wise Threaded Diagram...... 37

5.4 Column Wise Threaded Diagram...... 39

6.1 Single Work Item Run Times...... 42

6.2 Single Work Item FPGA Optimization Run Times...... 43

6.3 Single Work Item FPGA Optimization Logic Utilization...... 44

6.4 Single Threaded Work Group Item Run Times...... 46

6.5 Single Threaded Work Group FPGA Optimization Run Times...... 47

6.6 Single Threaded Work Group FPGA Logic Utilization...... 48

6.7 Row Wise Threaded Run Times...... 49

6.8 Row Wise Threaded FPGA Optimization Run Times...... 50

6.9 Row Wise Threaded FPGA Optimization Logic Utilization...... 51

6.10 Column Wise Threaded Run Times...... 52

6.11 Column Wise Threaded FPGA Optimization Run Times...... 53

6.12 Column Wise Threaded FPGA Optimization Logic Utilization...... 55

6.13 Best run times for each device...... 56

6.14 A more detailed comparison of the fastest run times on each device...... 57

6.15 Comparison of the throughput for each device based on the fastest run time on the device...... 58

6.16 Comparison of the maximum TDE for each device based on the fastest run time on the device...... 59

xi LIST OF TABLES

4.1 Intel Xeon E5-2670 v2 CPU Specifications...... 23

4.2 Intel Xeon Phi 5110P Specifications...... 24

4.3 Nvidia K10 GPU Specifications...... 25

4.4 Nvidia K40 GPU Specifications...... 26

4.5 BittWare board with an Altera GSD5 FPGA Specifications...... 27

4.6 Nallatech board with an Altera GSD8 FPGA Specifications...... 28

6.1 Single Work Item O3 Optimizations...... 44

6.2 Single Threaded Work Groups O3 Optimizations...... 47

6.3 Row Wise Threaded O3 Optimizations...... 51

6.4 Column Wise Threaded O3 Optimizations...... 54

xii CHAPTER I

INTRODUCTION

In computing, parallel programming is a necessity when creating high performance .

One form of parallel programming is called -parallelism is the execution of the on different elements of a datasets at the same time. A second form is call task-parallelism. Task- parallelism is the execution of different functions on the same or different datasets at the same time.

Parallelization can take place on a single or across several devices. Many parallel programming toolsets only include the first method [1]. This thesis topic focuses on the new programming stan- dard called Open Computing Language (OpenCL) based on the C99 programming language which allows developers to implement both forms of parallel programming [2]. This standard was devel- oped by Apple in conjunction with the Khronos Group and major manufacturers such as

Altera, Intel, and Nvidia as a heterogeneous device programming language [1]. This language was inspired by Apple’s call for a common language for developers across the broad array of different types of processors. Prior to OpenCL, each manufacturer had its own method of programming their devices. With OpenCL, a single language could be used to program one or multiple devices and for those devices to more easily interface with one another. However, due to the nature of each device an implementation of an algorithm may need to be implemented differently on each type of

1 processor to get an optimized throughput. This thesis examines this idea by implementing differ- ent versions of an intensive processing algorithm across a range of processors to determine which implementation yields the best results.

1.1 Objectives in Research

The objective of this thesis is to optimize an image back projection algorithm using OpenCL on four types of processors. The algorithm is constrained to be a low Size, Weight, and Power (SWaP) imaging system with the highest possible throughput. This is accomplished by examining the ca- pabilities of the OpenCL framework and potential in optimizing the intensive algorithm.

In an attempt to conform to the requirements four types of processors are examined. They include a Central Processing Unit (CPU), a Many Integrated Core (MIC), two Graphic Processing Units

(GPUs), and two Field-Programmable Gate Array (FPGA) processors.

1.2 Significance of Research

The focus of this thesis is to optimize the image projection algorithm using OpenCL while examining the pros and cons on the new standard. Optimizing the image projection algorithm allows for quicker processing of aerial imagery for use in several arenas, from government agencies to private industry. Improving the processing speeds allows for near real-time processing and the low SWaP constraint means that the systems that this is implemented on will be able to be lighter and more efficient or will give these systems the opportunity to improve in other ways. The examination of OpenCL is done in the hopes of examining the feasibility and to gathering the pros and cons of using the new standard.

2 1.2.1 Previous Optimization Attempts

Back projection of an aerial image is not a new concept. As such, there have been many different attempts and methods used to improve the efficiency of the algorithm.

One such optimization was completed Kuo et al. where computation was reduced per pixel by using the results of neighboring pixels and introducing a special pipelined FPGA-based hardware architecture [3]. Using their system, they were able to accelerate orthorectification to 16 megapixels per second, a 16x improvement.

A second optimization also looked to reduce computation time per pixel. Chen et al. devised a method they called " Backprojection" to minimize the computation required [4]. In this method, the area of interest was divided into multiple tiles. The corners of the tile are projected on the image to form a set of anchor points for both the lowest and highest elevations. An element from the tile is then projected into the image space based on the anchor points.

A third optimization was made by French & Balster. using integer arithmetic to accelerate the algorithm [5]. Two changes were made to the algorithm. First the floating point operations were removed and replaced with fixed-point arithmetic operations. Second, the division within the algorithm was replaced with a multiplication of the inverse. These changes reduced the processing time by a factor of 1.2x for 128- integer processing and more than a 2x reduction for 64-bit integer processing. Experimentation was completed only on a CPU, but this can be implemented on other types of processors for faster computations on those platforms.

1.3 Thesis Organization

The following two chapters are intended to give the reader an overall idea of tool and algorithm used in this document. The ChapterII begins with an introduction to concepts and practices in

OpenCL development. Chapter III gives a brief introduction to the image projection algorithm.

3 The thesis then proceeds to ChapterIV, which discusses the process of testing that is done and reviews the processors utilized in testing. ChapterV discusses the different versions of the image projection algorithm devised and implemented in OpenCL for testing. Next, ChapterVI presents the performance results of the different algorithm versions on each device used. Finally, Chapter

VII provides some conclusions based on the work presented in this thesis and ideas for possible future work.

4 CHAPTER II

OPENCL OVERVIEW

OpenCL is a relatively new programming language that allows developers to program an array of different processors with the same code. This provides developers quicker development when programming the same code for multiple types of processors. Each device manufacturer needs to conform to the OpenCL standard to be listed as a conforming device, but each manufacturer can release their own additions and development tools.

2.1 History of OpenCL

Prior to the creation of the OpenCL graphical and high performance computing had no dominant architecture [1,6]. The different devices shared a common purpose, processor manufacturers like

Nvidia and Altera had to create their own coding architectures for their devices, Cuda and the hardware description languages (HDL) respectively. Apple began to urge these vendors to create and share a common interface among their devices to ease the burden on developers. So Apple and several processor manufacturers came together and formed the OpenCL Working Group within the

Khronos Group [1]. In 2008 this group released the first version of the OpenCL standard and as of

March 2015 the OpenCL standard is at version 2.1 [7]. When creating the language, the working group expanded on the C99 language to create OpenCL [2]. All of the functionality provided by

C/C++ is available to use in OpenCL, along with additional OpenCL specific functionality. By 5 utilizing functionality of OpenCL, processors such as GPU, FPGA, CPU, and MIC processors are able to be programmed using a single language. Code programmed in OpenCL which executes on a GPU can be reused without any changes to run on a FPGA provided the user has the proper tools and kits (SDKs).

The Khronos OpenCL working group developed the OpenCL language as an open standard,

Thus vendors are required to create and release their own OpenCL SDK for use with their devices.

While they have to follow the standard, each vendor is able to release additional packages and tools geared towards using OpenCL on the devices they create. For example, Altera created and imple- mented functionality to their existing development tools, Quartus II, to allow developers to more easily use their devices. They created the Altera Offline Compiler (AOC) to compile the OpenCL code for use on their FPGAs [8]. The AOC includes the necessary code for board communication and additional optimization options [8]. Additionally, the AOC included tools that aid in the devel- opment and optimizations completed for the Altera FPGAs. Altera’s profiler id one of these tools. It displays several statistics about the OpenCL kernel gathered from the memory and channel accesses

[9]. Tools such as this allows for in-depth analysis of the kernels and information on how and what data is accessed and by which operations. In regards to computing parallelism, the CPUs, GPUs, and MICs devices achieve it by dividing the data into multiple sets to be operated on by the same instructions (data-parallelism). FPGA devices utilize task-parallelism where different operations on the same set of data concurrently [1]. Using OpenCL features and extensions created by the vendors, both types of parallelism are able to be implemented on every type of device.

The main purpose of OpenCL is to provide developers a common interface language for all of the different computing device. The working group adopted a "write once, run on anything" phi- losophy when developing it, so code does not need to be re-written to operate correctly on different architectures. This, however, is not to say that the code will run efficiently on every device. As the

6 devices, for example FPGAs and GPUs, operate differently and need to be coded in different ways to optimize for a specific type of device architecture.

2.2 Programming in OpenCL

Programming in OpenCL can be split into two steps; host code programming and kernel pro- gramming. The host code is used to create the structure required to execute the OpenCL code, but can also include pre-processing computations such as computing constant values. Kernel program- ming involves creating the function to be completed. The basic process of creating and executing using OpenCL is shown in Figure 2.1.

Figure 2.1: Flow chart of OpenCL programming and execution.

The first step is to program the host code to set up the necessary environment. The algorithm to implement is then created in the kernel file. Details of these steps are described in the respective sections. The third step is to send every necessary part to the device and execute. The final step is to send the output data back to the host system and read it into a C/C++ compatible format.

2.2.1 Host Programming

The host code is a C/C++ application used to assign tasks and communicate to the devices to set parameters or retrieve the processing results. Most implementations use a CPU as the host to setup 7 and transfer the data between devices. The setup process for the host system is illustrated by Figure

2.2[2].

Figure 2.2: The OpenCL framework.

Each of these elements of the system completes a specific task and is setup using functions introduced by the standard. They are, as defined in the OpenCL Specification Guide [10,1],

• Host: A C/C++ application used to assign the tasks and communicate to the devices.

• Devices: An OpenCL compliant device that is made up of a compute units. I.e. a GPU or a

FPGA.

• Kernels: The function to execute on a device and is identified by the __kernel qualifier.

8 • Buffers: The memory objects where the data to be processed and the processed data is stored.

• Contexts: The environment in which the kernels execute along with the defining of the syn-

chronization, and memory management. Includes a set of devices, accessible memory and its

properties, and one or more command queues.

• Platforms: A group of devices managed by the OpenCL framework that allows an application

to share resources and execute kernels. Includes connected compliant devices and the host.

• Command Queues: Object that holds the commands to be executed by a specific device. They

are queued in-order, but can be executed in either in-order or out-of-order.

• Programs: A set of kernels.

The context includes everything in the OpenCL framework. The context contains the program, memory objects, command queue, and the platform.

The host code is also charged with handling the data transfer and partitioning the data to be sent to the device(s). Prior to sending the data, three pieces of information are required. The program to be executed, the data to be processed, and an allocation of memory to store the output data [1]. In

OpenCL, the program to be executed is store in a kernel file. On the host, the data to be sent for each kernel must be initialized to a kernel argument. Each kernel argument must be indexed to keep track of what data is what and the size of the data that is being sent to or from the kernel. Additionally, a pointer to the data space must be set [10]. Data of interest is stored in buffers in OpenCL. When creating buffer object, the size of memory to allocate, the pointer to the data on the host, and the memory access patterns must be set. For example, output buffers require a memory access pattern that would allow it to be written to and allowing the input buffer to be read is necessary. After creating the buffers, the host will need to create the command queue and enqueue it.

9 Most OpenCL devices have contain multiple compute elements. After creating the buffer ob- jects, the data can be partitioned to run however the user wishes. This is done through a function that enqueues the data in the form of a NDRange. NDRange is the description of the N-dimensional organization of work-items to be executed by the kernel code [1]. The function will enqueue the kernel to the command queue, but will allow for the specification of the data and how the kernel execution is distributed among the processing units on a device. Two of the key arguments when specifying the NDRange are the global_work_size and the local_work_size. They are, respectively, the number of work items to execute and the number of work items in a work group. Work-items and work-groups are defined below along with memory spaces and the work-item and work-group identification parameters [1, 11].

• Work-Item: The single implementation of a kernel on a specific set of data.

• Work-Group: The combination of multiple work-items that have access to the same memory

space.

• Compute Unit: The processing resource that can support a work-group.

• Private Memory: Memory that can only be accessed by a specific Global ID.

• Local Memory: Memory that can be accessed by all elements in a Work-Group.

• Global/Constant Memory: Memory that can be accessed by all compute units and where the

input and output data is stored.

• Host Memory: Memory on the host system from where input and output data is sent from.

• Global ID: Identifies the work-item and allows access to the correct data (The private mem-

ory)

10 • Local ID: Identifies the work-items from one another within a work-group (The local mem-

ory)

A visual representation of these definitions is shown in Figure 2.3[2].

Figure 2.3: Model of an OpenCL device regarding processing distribution.

Most OpenCL devices contain multiple compute units that can be used to process data. The work-groups will operate one at a time on a single compute unit. If no compute units are available then the work-groups are forced to wait until there is an available compute unit. This is the case when the number of compute units is less than the total number of work-groups. So if a device has

M compute units and N work-items per work-group, then MN work-items will be executed at a time

[1]. Within each work-group are the work-items. Each work-item operates independently of the 11 other work-items in the group the memory that it has the rights to access. Each type of memory has its own advantages and disadvantages to use. The global/constant memory is very large and is where the buffers are typically initially stored. Reading from it is relatively slow to access compared to the other types of memory. The local memory can only be accessed by work-items with the correct local

ID, but has faster read and write times than the global/constant memory since it contains only the relevant data to the work-group. The work-items in the group will write to this memory space while the work-group is still operating. The private memory will be the fastest memory for a work-item to access. It contains only what is needed for that work-item, but is also the most limited in size.

2.2.2 Kernel Programming

The second component of OpenCL programming is Kernel programming. It is here that the function to execute is programmed. Kernel programming is the heart of OpenCL programming containing the code for the desired operation. They resemble C function calls, but with three major differences. First, the kernel function must return void. Second, each kernel declaration begins with

__kernel. And finally, arguments need to be set, as some platforms will not compile otherwise [1].

Data from a kernel is returned through the kernel arguments in the form of buffer objects. When

passing pointers to a kernel one of the following address space qualifiers must precede it [1]:

• __global: The argument data will be in global memory.

• __const: The argument data will be stored in global, read-only memory space.

• __local: The argument data will be in local memory.

• __private: The argument data will be in private memory. The default qualifier.

Programming within a kernel also requires the use of these qualifiers. Most variables will likely be of the default type, private, but casting and declaring variables of the other types can be leveraged

12 in kernel programming with some caveats. The __global argument can only be used with pointer variables. The __constant argument is made available to the entire program and is stored in constant memory space. This also means that it cannot be declared within a kernel. The __local argument will create variables that have their memory spaces deallocated when the work-group is completed.

These variables will be stored in the local memory and shared among the work-items in the work- group. They cannot be directly initialized [1].

Another addition to OpenCL is the use of vector data types. C/C++ programming are scalar based languages, while OpenCL is able to handle both types. Vectors in OpenCL share a similar to arrays in that they contain multiple elements of the same data type. However, they can only contain a specific amount of elements, sizes are 2, 3, 4, 8, and 16 [1]. The vector data types are used similarly to the scalar counterparts and when used take the form similar to floatn where n is

the vector size, i.e. float4 [10]. Using vectors can allow for faster processing by eliminating the

need to process elements individually for the same operation. Additionally, the elements in a vector

can easily be accessed if a single element is needed for an operation.

13 CHAPTER III

BACK PROJECTION OVERVIEW

One of the goals of this thesis is to accelerate an image projection algorithm. This chapter of the thesis will focus on and provide a basic understanding of what projection is to provide context for the complexity and need for the algorithm.

3.1 Projection Background

The purpose of an image projection algorithm is to take an aerial image and display it in a manner that is more intuitively to the user, such as a map. By displaying the data after being projected, the user is able to get a better grasp over the situation being presented by the image. This is done by presenting the data like a map and correcting the orientation of the image so that the top represents north. Projection also gives real world measurements to the pixel data. In projection elevation data is used to correct the distortion that comes from taking the aerial image and to make it so that the pixels each represent a real distance.

Projection is useful in several areas of interest. The military and law enforcement agencies can use it for explosive ordinance identification and border patrol [12]. Scientists can utilize it when monitoring glacial erosion and deforestation [12]. Relief and disaster management agencies can use it to identify the scope and effect areas of disasters like landslides and earthquakes [12]. Private

14 industry companies such as Google are able to apply it to their mapping services for services like

Google Earth [13].

3.1.1 Image Projection

For the purpose of this thesis, the process of image is completed using an orthorectification algorithm. Orthorectification is the process of projecting an aerial image onto an Earth based plane and tying the image pixels to Earth coordinates while also removing the perspective angle of the input data [5,3]. There are two types of orthorectification, forward and back projection. Forward projection can be defined as the process of projecting each pixel of the image from the image plane onto the projection plane. Back projection can be defined as the process of projecting the world coordinates into the image space [5,3,4]. This method is the more computationally efficient of the two and will be used within the scope of this thesis research. The processing chain for back projection is shown below in Figure 3.1.

Figure 3.1: Back projection processing chain.

15 The raw image is taken and processed with the back projection algorithm. To properly project the image additional information is required. This includes the digital elevation map (DEM) and input data related to the camera taking the image. The input data includes information about the camera model, the interpolation value I, global positioning system (GPS) data, and Inertial mea- surement unit (IMU) information based on the position of the camera when the image is captured.

3.1.2 Back Projection

Back projection can be broken into two major processing steps, the calculation of the image plane pixel locations and the interpolation of the terrain data. Figure 3.2 illustrates the concept of back projection.

Figure 3.2: Back projection diagram.

A single aperture camera assumption is used with the focal point located at the point (Xc, Yc,

Zc)[5]. This point represents the longitude, latitude, and altitude of the camera that imaged the 16 data with respect to the Earth. In the projection surface (the world coordinates) are represented by

X, Y, and Z respectively. The point P(X, Y, Z) represents the longitude, latitude, and altitude X, Y, and Z respectively in the projection surface. They are the real world coordinates of a point in the image captured by the camera. Values from the projection surface are projected onto the image plane relative to the camera at specific points on the surface. The projected points are located at the point p(i, j). The values for a point in the image plane, in-between the points projected points, are interpolated to get the estimated value. The image in Figure 3.3 shows an input image to project.

Figure 3.3: The input aerial image provides little context of the image to the region.

The input image gives a good representation of the area of land captured in the image, but the landmarks present provide little context to anyone unfamiliar with those structures. To perform projection, information about the region needs to be collected. This information can contains details 17 such as the real world distances, directions, landmark features, and elevations. Performing back projection on the input image will give a viewer this context as a projected image, shown in Figure

3.4.

Figure 3.4: Back projected aerial image provides additional information to a viewer.

The output image has been rotated from the original input. This rotation is, as previously men- tioned, done so that the top of the image represents north. In addition to the rotation, the image has been stretched to better represent the terrain present in the image. Each pixel of the output image represents a real distance, e.g. 1 m, allowing the viewer to more accurately understand the scope of the image.

18 3.1.2.1 Collinearity Equations

The image plane pixel locations are calculated using the set of collinearity equations shown below in Equation (3.1)[5].

m (X − X ) + m (Y − Y ) + m (Z − Z ) i = − 11 c 12 c 13 c (3.1) m31(X − Xc) + m32(Y − Yc) + m33(Z − Zc) m (X − X ) + m (Y − Y ) + m (Z − Z ) j = − 21 c 22 c 23 c m31(X − Xc) + m32(Y − Yc) + m33(Z − Zc)

The variables X, Y, and Z respectively represent the longitude, latitude, and altitude of the projec-

tion plane. The pixel locations column and row values are defined as i and j respectively. The transformation matrix, M, is defined as   m11 m12 m13 M = m21 m22 m23 = RωRφRκ (3.2) m31 m32 m33 where the coefficients, mmn, make up the elements of the matrix [5]. In Equation 3.2, Rω is the rotation matrix of the X-axis with respect to the angle ω, Rφ is about the Y-axis with respect to φ, and Rκ is the matrix about the Z-axis with respect to the angle κ. The rotational matrices are defined as 1 0 0  Rω = 0 cos ω sin ω (3.3) 0 − sin ω cos ω cos φ 0 − sin φ Rφ =  0 1 0  (3.4) sin φ 0 cos ϕ cos κ − sin κ 0 Rκ = sin κ cos κ 0 (3.5) 0 0 1 To calculate the transformation matrix coefficients, 36 additions and 54 multiplication operations are required. Each of the collinearity equations, ignoring the calculations required for the matrix coefficients, require 10 addition, 6 multiplications, and 1 division operation. Adding the number of operations brings the total requirement to determine each pixel location to 56 addition operations,

66 multiplications, and 2 divisions. 19 3.1.2.2 Digital Elevation Maps (DEMs)

The second step in orthorectification requires interpolating the terrain data. The terrain data is represented as a DEM, shown in Figure 3.5.

Figure 3.5: Information about the region presented in the input image.

The DEM is a representation of elevation values over a topographic surface [14]. The ground sampling distance (GSD) of the DEM is what determines the number of pixels that will be in a given region [3]. Typically the GSD is too coarse for the pixel projection image size. To compensate for this an interpolation factor, I, is used to more densely represent the DEM. The value of I is chosen to critically sample the image focal plane. Doing this allows the DEMs to be used to obtain a more ac- curate result when creating the flat image projection. Due to system restraints, pre-computed DEM is not possible for SWaP constrained systems. On these systems the DEM needs to be interpolated during projection. This is accomplished using a bilinear interpolation [5, 12].

Due to the size of the DEM, it is divided into a DEM grid [5]. Each part of the DEM grid is enclosed within DEM posts that are set at ∆X and ∆Y distance apart, i.e. 1 m. Figure 3.6 shows a sample DEM grid. 20 Figure 3.6: Sample 3x3 DEM grid.

In this DEM, the grid is made up of nine sections with XGrid and YGrid representing the number of DEM divisions in each respective direction. Each of these sections represent the interpolated

DEM and contain longitude, latitude, and altitude values for every point withing the interpolated

DEM. These points are denoted by Xi ,Yi ,Zi in Figure 3.6. The DEM grid can be superimposed

onto the projection surface, as in Figure 3.2, where each of the DEM posts is then projected into the

image plane.

21 CHAPTER IV

TEST PROCEDURE

This chapter of the thesis reviews the testing procedure that was used in the scope of this the- sis. This includes details on the used in development and testing along with a description of the tests that were done to collect the data that will be presented in ChapterVI.

4.1 Test Hardware

One of the main goals of this research is to examine the performance capabilities of OpenCL.

As such, several different types of processors are used for testing the image projection algorithm described in ChapterV. There are six devices that were used in testing. They include a CPU, a MIC, two GPUs, and two FPGAs.

The most common hardware used today are CPUs, GPUs, and FPGAs. MICs are a newer type of device and are not as commonly used. Each has advantages and disadvantages. The CPUs offer several highly optimized, but generalized, computing cores. Of the devices used, the CPUs have the highest clock speed. The GPUs have hundreds of non-optimized computing cores useful for . MIC devices are composed of, as the name suggests, many cores. The amount of cores exceeds that of CPUs, but is less than a GPU. These cores are more optimized than the GPU cores, but do not feature the full optimization of a CPU core [15]. FPGAs allow for the creation of one or several custom computing core, limited only by the available logic elements on the device [8], that 22 take on a structure. The FPGA requires that the core(s) be pre-compiled as a binary file prior to execution. This binary file is loaded onto the FPGA by the host system at run time.

The performance of the image projection code is compared against the timing of a C++ single threaded implemented on an Intel Xeon L5640 CPU. This CPU has 6 cores and the is limited to 32 GB/s [16]. The base processor frequency is 2.26 GHz and a maximum operating frequency of 2.8 GHz.

4.1.1 Intel Xeon E5-2670 v2 CPU

The Intel Xeon E5-2670 v2 CPU shown in Figure 4.1 was used in this study [17].

Figure 4.1: Intel Xeon E5-2670 v2 CPU.

The specifications regarding the number of core processors, the base and maximum frequency, the maximum bandwidth, and the maximum total dissipated power (TDP) of the CPU can be seen in Table 4.1.

Table 4.1: Intel Xeon E5-2670 v2 CPU Specifications.

Cores Base Frequency Max Frequency Max Bandwidth Max (GHz) (GHz) (GB/s) TDP 10 2.5 3.3 59.7 115

23 The Intel CPU also allows for 20 threads to be created per processing core. Thus, the total number of allowable threads for this CPU is 200 threads.

4.1.2 Intel Xeon Phi 5110P

The Intel Xeon Phi 5110P, shown in Figure 4.2, is the MIC that is used [18].

Figure 4.2: Intel Xeon Phi 5110P.

The specifications regarding the number of core processors, the base and maximum frequency, the maximum bandwidth, and the maximum TDP of the Phi can be seen in Table 4.2.

Table 4.2: Intel Xeon Phi 5110P Specifications.

Cores Base Frequency Max Frequency Max Bandwidth Max (GHz) (GHz) (GB/s) TDP 60 1.053 1.053 320 225

The Intel Phi has more cores compared to the CPU, but each of those cores has a clock speed that is two to three times slower. The Intel Phi has a much better bandwidth capabilities, but also has a higher possible power usage.

4.1.3 Nvidia K10 GPU

The Nvidia K10 GPU, shown in Figure 4.3, is contains two GK104 GPU chips [19].

24 Figure 4.3: Nvidia K10 GPU.

The specifications regarding the number of core processors, the base and maximum frequency, the maximum bandwidth, and the maximum TDP of the K10 GPU can be seen in Table 4.3.

Table 4.3: Nvidia K10 GPU Specifications.

Cores Base Frequency Max Frequency Max Bandwidth Max (GHz) (GHz) (GB/s) TDP 3072 745 745 320 225

Each of the GK104 chips have 1536 core processors and a base processor frequency of 745

MHz. The maximum bandwidth of each chip is 160 GB/s giving the total max bandwidth seen in

Table 4.3.

4.1.4 Nvidia K40 GPU

The Nvidia K40 GPU, shown in Figure 4.4, contains a GK110B GPU chip [20].

25 Figure 4.4: Nvidia K40 GPU.

The specifications regarding the number of core processors, the base and maximum frequency, the maximum bandwidth, and the maximum TDP of the K40 GPU can be seen in Table 4.4[21].

Table 4.4: Nvidia K40 GPU Specifications.

Cores Base Frequency Max Frequency Max Bandwidth Max (GHz) (GHz) (GB/s) TDP 2880 745 875 288 235

When compared to the Nvidia K10 GPU, the K40 GPU has fewer total processors, but each of those processors can their processing speeds to 810 MHz and 875 MHz. It has a lower bandwidth capabilities and the same maximum power consumption.

4.1.5 Altera GSD5 FPGA

The BittWare board, S5PH-Q, contains an Altera V GSD5 FPGA and is shown in Figure

4.5[22].

26 Figure 4.5: BittWare board with an Altera GSD5 FPGA.

The specifications regarding the number of equivalent logic elements, the number of 18x18 variable-precision multipliers, the maximum operating frequency, and the maximum TDP of the

BittWare GSD5 board can be seen in Table 4.5.

Table 4.5: BittWare board with an Altera GSD5 FPGA Specifications.

Equivalent Logic Multipli- Max Frequency SDRAM Max Elements ers (MHz) (GB) TDP 475000 3180 500 8 75

This FPGA contains 457K equivalent logic elements and 3180 18x18 variable-precision mul- tipliers. The maximum operating frequency is 500 MHz. The device has a maximum TDP of 75

W.

4.1.6 Altera GSD8 FPGA

The Nallatech FPGA, 395-D8, contains an Altera Stratix V GSD8 FPGA and is shown in Figure

4.6[23].

27 Figure 4.6: Nallatech board with an Altera GSD8 FPGA.

The specifications regarding the number of equivalent logic elements, the number of 18x18 variable-precision multipliers, the maximum operating frequency, and the maximum TDP of the

Nallatech GSD8 FPGA can be seen in Table 4.6.

Table 4.6: Nallatech board with an Altera GSD8 FPGA Specifications.

Equivalent Logic Multipli- Max Frequency Max SDRAM Max Elements ers (MHz) (GB) TDP 475000 3926 500 32 75

The one contained on this board is the GSD8 which contains more equivalent logic elements and variable-precision multipliers than the GSD5 at 695K and 3926 respectively. The maximum operating frequency and TDP the same at 500 MHz and 75 W.

4.2 Test Description

The test setup involves utilizing all of the OpenCL compliant devices listed in Section 4.1 as well as the four versions of the image projection algorithm that are listed in ChapterV. The four versions of the back projection algorithm are executed on each device for a single frame, shown in

28 Figure 3.3, and the execution time of each run is then measured using a monotonic timer. The test is repeated ten times and the average time is taken as the final result to be presented in ChapterVI.

Execution times are then compared against the C++ single threaded CPU implementation derived from the source code. Estimated energy usage and throughput is then calculated on each device for the kernel that ran the quickest on it. For the boards containing Altera FPGAs, three of the compiler’s built-in optimizations are used along with a test using none of these options. The options used are the duplication of the compute units, utilization of SIMD (single instruction, multiple data) vector paths, and a compiler resource driven optimization (O3). When using OpenCL for the

Altera FPGAs, the code needs to be compiled into an executable file prior to running. Creating the compiles is a resource intensive endeavor. As such, prior to running the compiles, a partial compile is completed using the AOC. This partial compile gives an estimate of the board logic utilization.

The estimations, based on observations when completing the compiles for this thesis, were fairly accurate and considered when determining to do a full compile. For the scope of these tests, any estimation that exceeds 85% is deemed too high, due to a possibility of not fitting on the board, and not completed. This value was selected from the maximum logic utilization that the O3 optimization set when doing the resource-drive optimization [9].

As stated in Section 2.2.2, compute units are the processing resources that can support a work- group. On devices such as the GPU, each of the core processors make up a compute unit. On the

Altera FPGAs, the AOC only creates a single core when completing a compile to setup the FPGA hardware [9]. Using the AOC’s compute unit option allows the compiler to create multiple cores based on the user specification and the amount of usable logic elements remaining on the board.

The creation of the component to correctly output the data from each of these cores is handled by the OpenCL compiler [9].

29 The SIMD operation, as the name suggests, allows the user to program a single input and re- ceive multiple outputs though kernel vectorization. In kernel vectorization, scalar operations are translated to SIMD operations [9]. The compiler vectorizes the code and coalesces memory access accordingly. Kernel vectorization can be completed manually, but it is a much more time and work intensive conversion that requires some additional code manipulation. To compile using the SIMD option, the number of work-groups for the kernel needs to be specified [9]. Additionally, the SIMD factor chosen (n=2, 4, 8, and 16) must evenly dived the number of work-groups. This is a compiler limitation that resulted in the Single Work Item and Single Threaded Work Groups version of the code not being able to utilize this option.

The final optimization used is the O3 option. O3 gives the compiler the control over what optimizations are used based on the resources available. As previously mentioned, the compiler sets a maximum logic utilization to 85%. The AOC "analyze automatically the effects of combining various values of kernel attributes" [9]. The AOC uses the analysis to determine multiple attributes of the code. They include:

• SIMD vectorization

• Number of compute units

• A sharing factor

• Loop unroll

The first two items are the optimizations previously described. The sharing factor is found based on the analysis the AOC runs. It finds the code blocks where work-items are infrequently executed and attempts to reuse the same hardware for multiple operations [9]. The loop unroll takes a loop, such as a For loop, and decreases the number of iterations the AOC executes. On the FPGA there is more logic used to unroll a loop. This is a result of the compiler creating more fabric to process the 30 loop rather than just repeating through it. The compiler will run through several different scenarios with different combinations of these optimizations, favoring the loop unroll and SIMD options [9], and will pick what it believes is the best optimization.

The use of these three built-in optimization techniques on the FPGAs in addition to the other devices gave a total of 64 possible tests to perform. However, not all 64 tests were performed due to the logic utilization for some of the FPGA optimizations. The tests that could not be completed are documented in Section 6.1.

31 CHAPTER V

BACK PROJECTION ALGORITHM VERSIONS

Four different versions of the image back projection algorithm are created using OpenCL. Each are coded in a different style. The difference in coding has two reasons. The first is to allow different memory access patterns and operations. The second reason is to create code more suited towards particular hardware. The four versions have been styled as Single Work Item, Single Threaded Work

Group, Row Wise Threaded, and Column Wise Threaded.

5.1 Single Work Item

The Single Work Item version of the algorithm has only one to process the entire DEM grid. There is only one work-group and one work-item in, hence the name, as shown in Equations

5.1 and 5.2 where Work Group Dimension and Work Item Dimension are the number of work-groups and work-items used respectively.

W ork Group Dimension = [1, 1] (5.1)

W ork Item Dimension = [1, 1] (5.2)

This is the coding method that Altera has recommended for their FPGAs due to the pipeline structure that it creates where at each stage of the clock there should be something being processed until

32 completion [9]. The processing flow of the Single Work Item version is illustrated in Figure 5.1 where the Nth instance represents N pixels completed.

(a) 1st instance (b) 2nd instance (c) (I2+1)th instance

(d) (2I2+2)th instance (e) (4I2-1)rd instance (f) (4I2)th instance

Figure 5.1: Single Work Item Diagram

For this version, the first pixel in the DEM grid is processed, shown in Figure 5.1a, followed by each sequential pixel in the DEM, going to the next row after completing one, until that DEM is complete. After it finishes the last pixel in the DEM it begins to work on the next DEM in the row of DEMs, shown in Figure 5.1c. It will iterate this pattern until it moves onto the next row of DEMs in the DEM grid until it finishes the last pixel in the final DEM of the grid, shown in Figure 5.1f.

The kernel code for the Single Work Item version of the projection code is programmed as a single loop. The pseudo code for the algorithm is shown below:

33 // Do projection over each interpolated point in DEM FOR each DEM i n DEM g r i d FOR each Column in DEM FOR each Row i n DEM Obtain current pixel information from DEM Do collinearity equations Do DEM interpolation Write the output to the output buffer END FOR / / Row END FOR / / Column END FOR / / DEM

The Single Work Item version is implemented in three loops. The outermost loop goes through every DEM in the DEM grid. The second loop iterates through each column in the DEM. The

final loop iterates through each row in the DEM. This results in every point in the DEM grid being operated on. For each of these points there are four generalized steps. The first is to determine the current pixel information from the DEM. Then the collinearity equations, Equations 3.1 and 3.1.2.1, are completed to determine the output pixel location. Next, the current pixel is interpolated using the DEM data to determine the output value. Finally, the output value is written to the output buffer.

Any point that is beyond the boundaries of the DEM grid are set to a value of 0.

5.2 Single Threaded Work Group

The Single Threaded Work Group implementation has each work-group as a single thread to process all of the pixels in the DEM grid. What makes this version different from the Single Work

Item version is that all of the DEMs run in parallel rather than sequentially. Due to their being only a single thread, the number of work-groups is, as shown in Equation (5.4), set to 1. The number of work-items in the work-group is determined by the size of the image and the spacing between the

DEM posts, as shown in Equation (5.3) where the Nth instance represents the sum of the components

34 of the Work Group Dimensions.

W ork Group Dimension = [XGrid,YGrid] (5.3)

W ork Item Dimension = [1, 1] (5.4)

This concept of the Single Threaded Work Group is illustrated in Figure 5.2.

(a) 1st instance (b) 2nd instance (c) (n+1)th instance

N th (e) ( N + n)th instance (f) Nth instance (d) 2 instance 2

Figure 5.2: Single Threaded Work Group Diagram

In this version, the first pixel in each sub-DEM is determined as shown in the Figure 5.2a. From there, the code runs through the first row of pixels and determining the correct values for each of those locations before moving onto the next row of pixels, as shown in the Figure 5.2c. The code will run until the final pixel in the final row of the DEM is calculated as seen in Figure 5.2f. The kernel code for the Single Threaded Work Group version of the projection code is programmed to calculate each pixel individually. The kernel is executed for each pixel to calculate the output image.

The pseudo code for the algorithm is shown below.

35 // Do projection over every row and column in a DEM FOR each Column in DEM FOR each Row i n DEM Obtain current pixel information from DEM Do collinearity equations Do DEM interpolation Write the output to the output buffer END FOR / / Row END FOR / / Column

The Single Threaded Work Groups version is completed by N work-items in a single thread. This version of the algorithm has two loop steps. The first is to loop through each column in the DEM and the second is to loop through each row in the DEM. While the Single Work Item version contains a third loop, going through each DEM in a DEM grid, this version of the algorithm replaces that loop by utilizing multiple work-groups. By using multiple work-groups one of the loops is removed allowing for multiple DEMs to be operated on in parallel. The interior of the code is identical to the Single Work Item version where there are four main steps. First the current pixel location is determined, then the collinearity equations are used to determine the output pixel location. The third step is to calculate the output value and finally the output is written to the output buffer. Any point out of the boundaries of the DEM are set to a value of 0.

5.3 Row Wise Threaded

The Row Wise Threaded version of the back projection algorithm is a N threaded process. The number of threads, N, depends on the iteration factor, I, specified on the host code. The number of work-groups depends on the size of the image used, the spacing between the DEM posts, and the value of I. The number of work groups and the number of work-items within each work group is determined on the host by Equations (5.5) and (5.6).

W ork Group Dimension = [XGrid · I,YGrid] (5.5)

36 W ork Item Dimension = [I, 1] (5.6)

In this version each thread in a work-group processes a row of pixels within each sub-DEM. A diagram of the process is depicted in Figure 5.3.

(a) 1st instance (b) 2nd instance

(c) Ith instance

Figure 5.3: Row Wise Threaded Diagram

In this version the DEMs in the DEM grid is completed in parallel. Within each DEM, an entire column of pixels is completed, shown in Figure 5.3a. After the values of the column of pixels are determined, the next column of pixels in the row are completed until the entire DEM is complete, shown in Figure 5.3c. This version is referred to as Row Wise Threaded, because each row of pixels in a DEM is a thread. The kernel code for the Row Wise Threaded version of the projection

37 code is programmed so that an I pixels are calculated for each kernel call. The pseudo code for the algorithm is shown below. // Do projection over each row in a DEM FOR each Row i n DEM Obtain current pixel information from DEM based on Row Do collinearity equations Do DEM interpolation Write the output to the output buffer END FOR / / Row

The Row Wise Threaded version is implemented using only one loop. As with the Single Threaded

Work Groups version, the loop iterating through the DEM grid has been replaced by multiple work- groups. The loop iterating through each column in the DEM has also been removed. Instead, the

Row Wise Threaded version implements multiple threads. The number of threads is determined by the interpolation value I as specified in Equation 5.6. The only remaining loop iterates over each row in a DEM where the number of rows is based on the data set being used. As with both versions already mentioned the interior of the code remains the same; calculate the current pixel location, determine the output location with the collinearity equations, calculate the output value, and write to the output buffer. Any point outside the DEM boundaries are set to a value of 0.

5.4 Column Wise Threaded

The final version of the code is nearly identical to the Row Wise Threaded version. The Column

Wise Threaded version is also N threaded, with the number of threads depending on I. The number of work-groups also depends on the size of the image used, the space between DEM posts, and the value of I. The number of work groups and the number of work-items within each work group is determined on the host by Equations (5.7) and (5.8).

W ork Group Dimension = [YGrid · I,XGrid] (5.7)

W ork Item Dimension = [I, 1] (5.8) 38 The number of work-items for this version can vary from the Row Wise Threaded based on the dimensions of the DEM grid. Column Wise Threaded is different due to the way each thread in a work-group processes the data. In this version, each thread processes a column of pixels within each DEM, as shown in Figure 5.4.

(a) 1st instance (b) 2nd instance

(c) Ith instance

Figure 5.4: Column Wise Threaded Diagram

In this version the DEMs in the DEM grid is completed in parallel. Within each DEM, an entire row of pixels is completed, shown in Figure 5.4a. After the values of the column of pixels are determined, the next column of pixels in the row are completed until the entire DEM is complete, shown in Figure 5.4c. This version is referred to as Column Wise Threaded, because each column of pixels in a DEM is a thread. The kernel code for the Column Wise Threaded version of the projection code is nearly identical to the Row Wise Threaded. The only major difference is which 39 dimension the projection is completed across which is based on the flipped work-item dimension.

The pseudo code for the algorithm is shown below.

    // Do projection over each column in a DEM
    FOR each Column in DEM
        Obtain current pixel information from DEM based on Column
        Do collinearity equations
        Do DEM interpolation
        Write the output to the output buffer
    END FOR // Column

In the pseudo code, the difference from the Row Wise Threaded version is what the loop is iterating over. In the Row Wise Threaded version it was over the rows; for the Column Wise Threaded version it is the columns in the DEM, where the number of columns in a DEM is dependent on the data set being used. Otherwise the code is identical to the Row Wise Threaded version. The loop iterating through each DEM in the DEM grid is replaced with multiple work-groups, and the loop iterating through each row in a DEM is replaced by using multiple threads. The number of threads is determined by the interpolation value I, as specified in Equation 5.8. As with both versions already mentioned, the interior of the code remains the same: calculate the current pixel location, determine the output location with the collinearity equations, calculate the output value, and write to the output buffer. Any point outside the DEM boundaries is set to a value of 0.

CHAPTER VI

RESULTS

The timing results gathered are the result of running each of the code versions of the back projection algorithm described in Chapter V on all of the devices and with the compiler optimizations discussed in Section 4.1. The run time of each test is determined with a monotonic timer, measuring only the kernel execution time and ignoring the data transfer time, and recorded. For the FPGA tests that are deemed not possible, due to hardware or software limitations, no run times are recorded.
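One common way to take such a kernel-only measurement is to block on the command queue and read a monotonic clock around the kernel enqueue alone, as in the sketch below (POSIX clock_gettime is assumed, and the queue, kernel, and NDRange sizes are taken as already configured).

    #include <CL/cl.h>
    #include <time.h>

    // Hypothetical kernel-only timing around an already-configured launch.
    // clFinish() keeps data transfers outside the timed region.
    double time_kernel_ms(cl_command_queue queue, cl_kernel kernel,
                          const size_t *global, const size_t *local)
    {
        struct timespec t0, t1;
        clFinish(queue);                              // drain any pending transfers first
        clock_gettime(CLOCK_MONOTONIC, &t0);
        clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, local, 0, NULL, NULL);
        clFinish(queue);                              // wait for the kernel alone
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    }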

6.1 Timing Measurements

In this section of the thesis, the timing results gathered from the tests are analyzed by version. The best run time for each of the FPGA devices is selected and compared to the timing from each of the other devices. As we will see, some of the AOC options could not be used due to a lack of logic elements on the board (a limit was set at less than 85% of the board's logic), and others could not be completed due to compiler restrictions. It should also be noted that for each code version, only two compute units are implemented in the tests. Each run time is the result of averaging the run times after repeating the test ten times. Each version comparison also features a run time comparison to a single-threaded software time that was completed on the CPU referenced in Section 4.1 [5]. This timing acts as a baseline comparison between the different code versions, the different devices, and the

code base that the code in this thesis was derived from. The baseline run time was determined to be

5680.7 ms for this image frame [5].

6.1.1 Single Work Item Results

The first version of the back projection algorithm to examine is the Single Work Item. The best run times for each of the six available devices are shown in Figure 6.1.

Figure 6.1: Single Work Item Run Times.

The results show that the FPGAs performed the best when running this version of code, conforming to Altera's recommendation. The Altera GSD5 FPGA ran the fastest at 570 ms, a near 10x improvement from the baseline time, followed by the Altera GSD8 FPGA at 592 ms. Neither FPGA design used any AOC optimizations for its best time. The CPU, Phi, and GPUs all performed worse than the baseline test, with the Nvidia K40 performing the worst at 2.67 minutes. The likely reason for the non-FPGA devices performing so poorly is the structure of the code. Being a single-threaded, single work-item code, the advantages of those devices are not exploited. Instead, each pixel was processed individually, one at a time, and each pixel had to be completed before the next pixel could begin. On the FPGAs, pixels were able to start one after the other until there were no more pixels to process.

Figure 6.2 shows the results of utilizing the Altera compiler optimizations.

Figure 6.2: Single Work Item FPGA Optimization Run Times.

The first observation to note is that the SIMD optimization is not performed. This is due to a limitation set by the compiler that requires multiple work-groups such that the specified SIMD factor can evenly divide the work-group size [24]. The work-group size for the Task implementation is 1, keeping the SIMD option from being used. Due to this, only the compute unit and O3 optimizations could be completed. The second observation is that the versions without AOC options were the fastest for both FPGAs, at 570 ms and 592 ms for the GSD5 and GSD8 FPGAs respectively.
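For reference, the compute unit and SIMD options discussed in this chapter are requested from the AOC through kernel attributes [24]. The sketch below shows the general form with example values rather than the builds tested here; the kernel name and arguments are illustrative only.

    // Example of how compute unit duplication and SIMD vectorization are typically
    // requested from the AOC via kernel attributes (values shown are illustrative).
    // num_simd_work_items requires a fixed work-group size that it divides evenly,
    // which is why the single work-item (Task) kernels cannot use it.
    __attribute__((num_compute_units(2)))
    __attribute__((num_simd_work_items(4)))
    __attribute__((reqd_work_group_size(64, 1, 1)))
    __kernel void backproject_example(__global const float *dem,
                                      __global uchar *output)
    {
        // projection body elided in this sketch
    }

The O3 option, by contrast, is applied at compile time and lets the compiler choose these factors itself, as reflected in the tables that follow.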

Duplicating the number of compute units did not yield the expected results. Instead of cutting the run time approximately in half for each additional compute unit, as expected [9], both boards showed a slight increase in run time. The O3 optimization, on the other hand, performed extremely poorly, with both boards taking over a minute to complete. The four main components of the O3 optimization are SIMD vectorization, the number of compute units, a sharing factor, and loop unrolling. Table 6.1 shows the optimizations that the compiler chose for the Single Work Item version of back projection.

Table 6.1: Single Work Item O3 Optimizations.

FPGA    SIMD Vectorization    Compute Units    Sharing Factor    Unroll Factor
GSD5    1                     1                1                 2
GSD8    1                     1                1                 4

In the table, a value of 1 indicates no optimization applied, so only the loop unrolling optimization was completed for the Single Work Item version. Unrolling the loop had negative results, as seen in Figure 6.2. This is likely due to the structure of the algorithm, a single large loop, which does not make efficient use of the unrolled logic elements. The logic utilization for both boards is shown in Figure 6.3.

Figure 6.3: Single Work Item FPGA Optimization Logic Utilization.

Since no SIMD option could be used, there is no logic utilization data for it. As expected, the version of the code without AOC optimizations used the fewest logic elements. The GSD5 build without any AOC optimizations, the board having fewer equivalent logic elements available, utilizes 39% of the board, while the GSD8 build without any AOC optimizations uses only 33%. The percentage of the board each compute unit uses is 24% for the GSD5 and 12% for the GSD8 FPGA. The rest of the board usage can be attributed to the control logic that is provided for each board by the vendor. Doubling the number of compute units results in increased logic utilization. The O3 option on the GSD5 FPGA uses less logic than the compute units option, but performed considerably worse. The GSD8 FPGA uses nearly twice the amount of logic with the O3 optimization compared to the version without AOC optimizations and also performed considerably worse. From this we can conclude that the amount of logic used for an algorithm does not necessarily correspond to performance.

6.1.2 Single Threaded Work Group Results

The second version of the back projection algorithm to examine is the Single Threaded Work

Group. The best run times for each of the six available devices are shown in Figure 6.4.

Figure 6.4: Single Threaded Work Group Run Times.

All of the run times for the Single Threaded Work Group implementation are significantly faster than the baseline time. The fastest device is the Intel CPU at 339.9 ms, a near 17x improvement from the baseline. The slowest device is the Nvidia K10 GPU at 1746.4 ms, a 3.25x improvement over the baseline. This version of the code is able to exploit advantages of every device. It has a partially pipelined structure that allows it to perform well on the FPGAs, but it is not restricted to a single work-item, which allows it to utilize more of the processing elements of the CPU, GPUs, and the Phi.

Figure 6.5 shows the results of utilizing the Altera compiler optimizations.

Figure 6.5: Single Threaded Work Group FPGA Optimization Run Times.

For the Single Threaded Work Groups code, the SIMD optimization could not be used for the same reason as the Task: work-groups with a single work-item cannot use AOC's SIMD option. The usable AOC options result in run times not much different from the version without AOC optimizations, which yielded the best times on both FPGAs. Again, the compute unit duplication did not yield the expected results; both boards faced a slight increase in run time when duplicating the compute units. The O3 optimization provided much better results for Single Threaded Work Groups than it did for Task. The run times for the GSD5 and GSD8 FPGAs were slower by 100 ms and 27 ms respectively. Table 6.2 shows the optimizations that the compiler chose for the Single Threaded Work Group version of back projection.

Table 6.2: Single Threaded Work Groups O3 Optimizations.

FPGA    SIMD Vectorization    Compute Units    Sharing Factor    Unroll Factor
GSD5    1                     3                1                 2
GSD8    1                     3                1                 4

The compute unit was duplicated twice to give a total of 3 compute units. The resulting O3 optimization behaved similarly to the duplication of compute units optimization and increased the execution time. The logic utilization for the GSD5 and GSD8 FPGAs can be seen in Figure 6.6.

Figure 6.6: Single Threaded Work Group FPGA Logic Utilization.

Again, the versions with no AOC optimizations had the lowest utilization at 34% and 29%.

Each compute unit uses 8% of the GSD5 FPGA and 9% of the GSD8 FPGA. The O3 options used a significantly greater amount of the FPGA logic elements, reinforcing the notion that more logic does not yield faster results.

6.1.3 Row Wise Threaded Results

The third version of the back projection algorithm to examine is the Row Wise Threaded. The best run times for each of the six available devices are shown in Figure 6.7.

Figure 6.7: Row Wise Threaded Run Times.

The run times of the Row Wise Threaded version were all much faster than the baseline run time. The fastest was on the Intel CPU at 82.2 ms, about 69x faster, and the slowest run time was on the GSD8 FPGA at 869.6 ms, about 6.5x faster than the baseline. This version of the code allows the CPU, GPUs, and the Phi to use multiple cores and/or threads, resulting in much better run times than in the previous two implementations. The GSD5 FPGA's Row Wise Threaded run time is better than the previous two implementations, while the GSD8 FPGA ran more than 300 ms slower than both of the previous code versions.

The run times for each of the AOC optimizations are shown in Figure 6.8.

Figure 6.8: Row Wise Threaded FPGA Optimization Run Times.

The Row Wise Threaded version was able to implement the compute unit, O3, and SIMD optimizations. The fastest optimizations for the GSD5 and GSD8 FPGAs are, respectively, the SIMD optimization with a vectorization factor of four, at 491 ms, and the O3 optimization, at 869.6 ms. The version with no AOC optimizations for the GSD8 FPGA ran three times slower than the GSD5 FPGA. The GSD5 FPGA was not able to meet the logic requirements set by the guidelines of the testing procedure in Section 4.2 for both the compute units and SIMD vectorization of eight optimizations. The GSD8 FPGA was able to duplicate the compute units and experienced a slight improvement in execution speed. Of the code versions presented thus far, this is the first to allow the use of the SIMD optimization. When using a vectorization factor of two, there was only a slight improvement. Increasing the vectorization factor to four had a much greater effect on the run time of the Row Wise Threaded code: on the GSD5 there was a 1.8x improvement, and on the GSD8 there was a 1.87x improvement over the vectorization factor of two. The GSD8 FPGA was also able to fit a vectorization factor of eight, which resulted in a 1.34x improvement over the vectorization factor of four. The O3 optimization resulted in better performance for both boards. The GSD5 FPGA had an improvement of about 50% over the version without any AOC optimizations, and the GSD8 had a 74% decrease in execution time.

Table 6.3 shows the optimizations that the compiler chose for the Row Wise Threaded version of back projection.

Table 6.3: Row Wise Threaded O3 Optimizations.

FPGA    SIMD Vectorization    Compute Units    Sharing Factor    Unroll Factor
GSD5    2                     2                1                 1
GSD8    1                     1                1                 4

For the GSD8 FPGA, only loop unrolling was applied, resulting in the 74% decrease in execution time. The GSD5 FPGA utilized both the SIMD and compute unit duplication options. The result was an optimization with a lower run time than the SIMD vectorization of two. The logic utilization for the AOC optimizations is shown in Figure 6.9.

Figure 6.9: Row Wise Threaded FPGA Optimization Logic Utilization.

The GSD5 and GSD8 FPGAs require 56% and 49% of their logic elements when compiled without any additional AOC optimizations for the Row Wise Threaded version of back projection. Each compute unit requires 33% of the GSD5 FPGA and 23% of the GSD8 FPGA. The GSD5 FPGA had an estimated logic usage of 89% and 98% for the compute units and SIMD of eight optimizations, respectively. These were not completed because they were higher than the 85% logic utilization cutoff point set by the test parameters. The optimizations described in Table 6.3 require 68% and 71% of the logic elements for the GSD5 and GSD8 FPGAs. On the GSD5 FPGA each vectorization factor requires 3%-5.25% of the logic; the GSD8 FPGA needs 0.875%-2.375% of the logic for each vectorization factor.

6.1.4 Column Wise Threaded Results

The fourth version of the back projection algorithm to examine is the Column Wise Threaded.

The best run times for each of the six available devices are shown in Figure 6.10.

Figure 6.10: Column Wise Threaded Run Times.

The device with the fastest run time was the Intel CPU at 62.2 ms, approximately 91x faster than the baseline time. The slowest device was the GSD8 FPGA at 1215.7 ms, a 4.7x improvement from the baseline. As with the Row Wise Threaded, this version allowed the CPU, GPUs, and the Phi to utilize multiple cores and threads. For the GSD5 FPGA, this code version gave the best run time out of the four versions, at 372.8 ms.

The run times for each of the AOC optimizations are shown in Figure 6.11.

Figure 6.11: Column Wise Threaded FPGA Optimization Run Times.

For every test completed, the GSD8 ran significantly slower than its GSD5 counterpart. The fastest optimization for both boards is the O3 optimization. Table 6.4 shows the optimizations that the compiler chose for the Column Wise Threaded version of back projection.

Table 6.4: Column Wise Threaded O3 Optimizations.

FPGA    SIMD Vectorization    Compute Units    Sharing Factor    Unroll Factor
GSD5    1                     1                1                 4
GSD8    1                     1                1                 8

Both FPGAs used only the loop unrolling for optimization. Duplication of the compute units provided a minor run time improvement over the version with no AOC optimizations, while the SIMD optimization did not behave as it did for the Row Wise Threaded. For the Column Wise Threaded implementation, increasing the amount of vectorization increased the time needed to complete the projection. A likely explanation is the different memory access pattern that the Column Wise Threaded takes. The data used is stored as an array of values. For the Row Wise Threaded, each thread is an entire row of pixels in a sub-DEM, as we can see in Figure 5.3. This means that the data for those pixels are all next to each other in memory. For the Column Wise Threaded, we can refer back to Figure 5.4 to see that the pixels processed by each thread are not in contiguous memory addresses. Due to this, the Column Wise Threaded implementation requires more memory read and write operations when processing multiple pieces of data concurrently.
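A small, self-contained C sketch of the two access patterns, assuming a row-major buffer layout, illustrates the difference; the functions below are illustrative and are not taken from the thesis kernels.

    #include <stddef.h>

    /* Row-major layout: element (y, x) lives at buf[y * width + x].            */
    /* Row Wise Threaded style: walk along one row; successive reads are to     */
    /* adjacent addresses (stride 1), which vectorizes and coalesces well.      */
    float sum_row(const float *buf, size_t width, size_t row)
    {
        float s = 0.0f;
        for (size_t x = 0; x < width; x++)
            s += buf[row * width + x];           /* stride of 1 element         */
        return s;
    }

    /* Column Wise Threaded style: walk down one column; successive reads are   */
    /* a full row apart (stride = width), so each access touches a new region.  */
    float sum_column(const float *buf, size_t width, size_t height, size_t col)
    {
        float s = 0.0f;
        for (size_t y = 0; y < height; y++)
            s += buf[y * width + col];           /* stride of width elements    */
        return s;
    }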

The logic utilization for the AOC optimizations is shown in Figure 6.12.

Figure 6.12: Column Wise Threaded FPGA Optimization Logic Utilization.

Both the GSD5 and GSD8 FPGAs could not fit the SIMD vectorization of eight, with estimated logic utilizations of 134% and 94% respectively. Each compute unit requires 15% of the GSD5 FPGA and 12% of the GSD8 FPGA. The optimizations described in Table 6.4 require 42% and 51% of the logic elements for the GSD5 and GSD8 FPGAs. The SIMD logic utilization ranges from 4.5%-12.125% for the GSD5 FPGA and 47.625% for the GSD8 FPGA for each increase in the vectorization factor.

6.2 Comparison

For each of the six devices, the fastest run times are shown below in Figure 6.13.

Figure 6.13: Best run times for each device.

From this graph we can see that each device has a run time of less than 600 ms. The improvements from the original baseline run time range from 10x to 90x. A more detailed version is shown in Figure 6.14.

Figure 6.14: A more detailed comparison of the fastest run times on each device.

The fastest device was the Intel CPU at 62.2 ms running the Column Wise Threaded version of the projection algorithm. The next fastest devices were the Nvidia K40 and K10 at 153.2 ms and

251.8 ms respectively using the Row Wise Threaded version. The GSD5 FPGA was the next fastest device at 372.8 ms. The GSD5 FPGA was fastest using the Column Wise Threaded version with the

O3 optimization. The Intel Phi was the next fastest device and also used the Row Wise Threaded version.

The slowest device was the GSD8 FPGA at 556 ms. This FPGA ran the Single Threaded Work

Groups version without any AOC optimizations.

6.2.1 Throughput

The timing numbers are relative to the data set used in the algorithm. A more thorough comparison can be made by examining the raw throughput of each device. The throughput can be calculated, in megabytes per second, using the following equation:

Throughput = Data Size / Run Time   [MB/s]                    (6.1)

where the data size is the input image file size. The calculated throughput for each device is shown in Figure 6.15.

Figure 6.15: Comparison of the throughput for each device based on the fastest run time on the device.

The values in this graph were found by applying Equation (6.1) to the execution times from Figure 6.14 and the size of the input image. The Intel CPU has a throughput of 1302.25 MB/s and produces nearly nine times more bytes per second than the GSD8 FPGA at 145.68 MB/s. For comparison, the baseline throughput is calculated to be 14.26 MB/s.
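As a quick sanity check of Equation (6.1), the short program below reproduces the reported CPU and baseline figures; the input image size of roughly 81 MB used here is inferred from those reported numbers rather than stated in this section, so treat it as an assumption.

    #include <stdio.h>

    /* Throughput in MB/s per Equation (6.1); run time is given in milliseconds. */
    static double throughput_mb_s(double data_size_mb, double run_time_ms)
    {
        return data_size_mb / (run_time_ms / 1000.0);
    }

    int main(void)
    {
        const double image_mb = 81.0;   /* inferred from 1302.25 MB/s x 62.2 ms (assumption) */
        printf("CPU:      %.2f MB/s\n", throughput_mb_s(image_mb, 62.2));    /* ~1302 MB/s */
        printf("Baseline: %.2f MB/s\n", throughput_mb_s(image_mb, 5680.7));  /* ~14.3 MB/s */
        return 0;
    }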

6.2.2 Power Measurements

One of the design requirements of the image projection system was that it needed to be low

SWaP. The power required is based upon the algorithm, data, and devices used. The six devices used in testing all had manufacturer max total dissipated power ratings as previously described.

Using Equation (6.2), the maximum total dissipated energies (TDE) can be determined.

TDE = TDP · Run Time   [J]                                    (6.2)

The maximum TDE for each device, based on the run times shown in Figure 6.14, is shown below in Figure 6.16.

Figure 6.16: Comparison of the maximum TDE for each device based on the fastest run time on the device.

These values represent the most energy expected to be used by each device for a single run of the algorithm. The fastest running device, the CPU, also has the lowest maximum

TDE at 7.153 J. The Intel Phi has the highest max TDE at 90.88 J.
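As a worked example of Equation (6.2), the sketch below reproduces the CPU value using a TDP of 115 W, which is assumed here from the vendor specification for the Xeon E5-2670 v2 referenced earlier in the thesis.

    #include <stdio.h>

    /* Maximum TDE in joules per Equation (6.2): TDP in watts, run time in ms. */
    static double max_tde_j(double tdp_w, double run_time_ms)
    {
        return tdp_w * (run_time_ms / 1000.0);
    }

    int main(void)
    {
        /* 115 W is taken as the published TDP of the Xeon E5-2670 v2 (assumption,
         * based on the vendor specification referenced earlier in the thesis). */
        printf("CPU max TDE: %.3f J\n", max_tde_j(115.0, 62.2));   /* ~7.153 J */
        return 0;
    }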

CHAPTER VII

FINAL THOUGHTS

OpenCL is a relatively new tool for programming more easily across multiple types of devices. Based upon the research presented here, OpenCL has shown that it is capable of working well across different devices and of optimizing algorithms when properly coded. The tools currently available are both useful and easy to learn.

7.1 Conclusion

The main goals addressed within this thesis are to analyze the usage of OpenCL while optimizing an image projection algorithm. The results presented in this document show that OpenCL can be used to optimize the algorithm. Using OpenCL, a 90x improvement was made over the C++ single-threaded CPU implementation used as the baseline measurement. The fastest of the six devices utilized was the Intel Xeon E5-2670 CPU at 62.2 ms. The throughput is 1302.25 MB/s and the maximum TDE is 7.2 J per frame. This device ran the Column Wise Threaded version of the image back projection algorithm. The slowest of the six devices, the Nallatech board with an Altera GSD8 FPGA, ran at 556 ms using the Single Threaded Work Groups version with a throughput of 145.68 MB/s. The slowest device still ran over 10x faster than the baseline data point.

The tools available to use are fairly effective and have a low learning curve. One of the main tools used was the Altera Offline Compiler. This tool provided built-in optimizations and diagnostic tools to help improve performance and debugging. The AOC optimizations increased the usage of the equivalent logic elements but, for the most part, did not operate as expected. Duplicating the number of compute units on the FPGAs did not yield the approximately 2x improvement that was expected. Instead it provided negligible improvements at best and longer run times in most cases for both of the FPGAs. Increasing the number of SIMD vector paths was only usable for two of the algorithm versions, Row and Column Wise Threaded. For the Row Wise Threaded version there was a decrease in run time for each FPGA for each additional vector path added. The BittWare board with an Altera GSD5 FPGA decreased from 1076.7 ms to 491 ms using four SIMD vector paths.

The GSD8 FPGA decreased from 3322.3 ms down to 1098.6 ms using eight SIMD vector paths.

The Column Wise Threaded version saw increases in the run times when the SIMD optimization was used. One of the FPGAs, for example, had an increase in run time from 3435.4 ms to 6943.3 ms when four

SIMD vector paths were used. The reason the Row Wise Threaded worked while the Column Wise

Threaded did not is likely due to the memory access patterns of each version. For the Row Wise, the memory addresses for each element in the vector paths were adjacent to one another. The Column

Wise version had elements that were separated from one another by a row of pixels within each sub-DEM. The final AOC optimization, the O3, provided mixed results. The Single Work Item version had a very significant increase in run time, to over a minute for each board, while the Single

Threaded Work Groups version had a slight negative impact on the performance. The Row Wise and

Column Wise Threaded versions both had improved performance using the O3 optimization, with the Column Wise Threaded O3 optimization providing the fastest run time for the GSD5 FPGA.

Over the course of the research done for this thesis, Altera has made improvements and fixes to their tools. The O3 optimization has been deprecated due to the inconsistency in its performance.

Vendors have also acknowledged and created fixes for some bugs on their devices. Nallatech has been working to provide fixes to issues encountered when using their board. When creating the FPGA compiles, the Nallatech board requires the user to turn off memory interleaving to get correct results. When compared to the BittWare board, which could use the interleaving, the run times on the Nallatech board were much slower. That board, using the same family of FPGA chip with more logic elements, should provide similar, if not better, results for each test and more flexibility when optimizing with the AOC.

7.2 Future Research

The improvements made to the image back projection algorithm using OpenCL are significant, but future work would likely provide additional improvements. Future research on the subject could include exploring additional memory access patterns and techniques, examining additional devices, implementing an AOC feature called channels, and introducing integer arithmetic.

7.2.1 Examine Additional Memory Access Patterns

This can be accomplished two ways. The first is to examine vectorization. Testing done for this system utilized only the automatic vectorization, the SIMD AOC optimization. Algorithms can be manually programmed in OpenCL to utilize data vectorization by using variables such as floatn. Only the Altera FPGAs used data path vectorization when using the SIMD option of the

AOC. Additionally, only the Row Wise and Column Wise Threaded versions of the code could utilize it, due to limitations imposed by the tool. Doing a manual vectorization would allow each of the four versions of the back projection algorithm to be completed on every type of device. There are hurdles to taking this approach, such as the process of converting the algorithm to use vector paths.

Changing the code so that the variables are, for example, float4 does not necessarily mean that the code will then run correctly. Additional changes would need to be completed elsewhere in the

code. The difficulty of converting each version of the code is also likely to be an issue. Each version has different memory access patterns, and using multiple data points per operation could have a negative impact on the performance.
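As a flavor of what such a manual conversion might involve, the sketch below processes four adjacent values per iteration with OpenCL's built-in float4 type; the kernel name, arguments, and the placeholder scaling arithmetic are assumptions, and the real projection math would also need to be rewritten in vector form.

    // Hypothetical manually vectorized kernel using OpenCL's float4 type.
    // vload4/vstore4 read and write four adjacent values at once; the scale
    // factor stands in for the real projection arithmetic.
    __kernel void scale_row_float4(__global const float *in,
                                   __global float *out,
                                   const int row_len,     // assumed to be a multiple of 4 here
                                   const float scale)
    {
        int row = get_global_id(0);
        for (int x = 0; x < row_len; x += 4) {
            float4 v = vload4(0, &in[row * row_len + x]);
            vstore4(v * scale, 0, &out[row * row_len + x]);
        }
    }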

The second is to examine different access patterns. OpenCL lends itself to dividing the labor among the processing elements. Each processing element has access to its own private memory and to two additional layers of shared memory when work is sent to a device. Further exploration into exploiting these different layers of memory may yield additional benefits. The memory layers act like caches: a layer is accessed more quickly when it is smaller and more dedicated to the task of the processing element. This implementation of the algorithm relied heavily on the global and private memory spaces. The private memory stored most of the input variables and all of the intermediary calculations, while the global memory space stored the buffers for the input image, terrain data, and the output image. One of the AOC tools allowed for a detailed breakdown of the completion time of different portions of the algorithm, and for each version of the algorithm the biggest bottleneck occurred when reading and writing to the output buffer. The integration of local memory could lead to better read and write speeds, and a lower total run time, for the projection algorithm. It is worth noting that not all four versions of the algorithm may benefit from the use of local memory. For example, the Single Work Item version, due to how it operates, will likely not receive any benefit from writing to an intermediary memory space, while the Row Wise Threaded version would benefit due to the division of the labor between work-groups and work-items.
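A minimal sketch of the kind of local-memory staging suggested above, assuming one work-group per output tile, is shown below; the kernel name, tile size, and placeholder value are hypothetical.

    // Hypothetical use of __local memory to stage a work-group's output tile
    // before a single burst of writes to global memory. TILE is an assumed
    // per-work-group output size; the barrier keeps the work-items in step.
    #define TILE 256

    __kernel void write_via_local(__global float *out, const int group_stride)
    {
        __local float tile[TILE];
        int lid  = get_local_id(0);
        int lsz  = get_local_size(0);
        int base = get_group_id(0) * group_stride;

        for (int i = lid; i < TILE; i += lsz)
            tile[i] = 0.0f;                      // stand-in for the projection result

        barrier(CLK_LOCAL_MEM_FENCE);            // all results staged before writing out

        for (int i = lid; i < TILE; i += lsz)
            out[base + i] = tile[i];             // contiguous burst to global memory
    }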

7.2.2 Examine Additional Devices

For future research, an examination of both the next generation models of all of the devices used in this thesis, additional board vendors, and additional processor manufacturers should be examined. Firstly, all of the boards used for this thesis have been on the market for a few years and are limited to an older version of the OpenCL standard, version 1.1. Newer devices will also have

63 better specifications and possibly additional processing resources. Additions have been made to the standard since, additions that may provide further ease of use and possibly improvements in code efficiency. Second, different board vendors for the Altera FPGAs should be examined. Based upon the research presented it is clear that the board vendor does have a significant role in how well a device runs. For example, based on the board specifications, we would expect that that Nallatech board with a GSD8 FPGA should at least perform on par with the BittWare board with a GSD5

FPGA, which has less logic elements, but to have more room for improvements. The same logic would dictate that additional processor manufacturers be examined. Altera was the first to create and release an OpenCL compatible FPGA, but recently has created and begun releasing their implementation. Altera’s tools, while great and continuously improving, are not perfect, e.g. the compute unit optimization. Xilinx’s devices and tools could provide features that Altera’s do not.

7.2.3 Channels

AOC channels are a new OpenCL extension included in version 14.1 of Altera's toolkit, the latest version as of the writing of this document. They allow the user to run multiple different kernels on a single FPGA device and have them communicate with each other [25]. This allows the user to optimize one block of code differently than a second piece of code, which opens up more possibilities to optimize the code and could allow for additional improvements beyond what has been achieved in this thesis.
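A hedged sketch of a producer/consumer kernel pair using channels is shown below; the pragma and call names follow the Altera SDK documentation [25] and should be verified against the toolkit version in use.

    // Sketch of two kernels connected by an AOC channel; names follow the Altera
    // SDK documentation and are not taken from the thesis code.
    #pragma OPENCL EXTENSION cl_altera_channels : enable

    channel float pixel_chan;

    __kernel void producer(__global const float *in, const int n)
    {
        for (int i = 0; i < n; i++)
            write_channel_altera(pixel_chan, in[i]);   // stream values to the consumer
    }

    __kernel void consumer(__global float *out, const int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = read_channel_altera(pixel_chan);  // receive values in order
    }

Because the two kernels are separate, each could be compiled with its own attributes, which is the per-block optimization flexibility described above.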

7.2.4 Integer Arithmetic

In Section 1.2.1, French & Balster utilized integer arithmetic to achieve a 2x improvement for the back projection algorithm. Their final remarks left this implementation open to being introduced on all types of platforms, which would be much easier to accomplish using OpenCL than by developing for each individual platform. Converting the OpenCL implementations to integer arithmetic could further decrease the processing time of the algorithm.
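As a small illustration of the kind of conversion involved, the sketch below replaces one floating-point product with a hypothetical Q16.16 fixed-point equivalent; the actual integer formulation of the collinearity equations is described in [5].

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical Q16.16 fixed-point helpers: 16 integer bits, 16 fractional bits. */
    typedef int32_t q16_16;

    static q16_16 to_q(double x)             { return (q16_16)(x * 65536.0); }
    static double  from_q(q16_16 x)          { return (double)x / 65536.0; }
    static q16_16  q_mul(q16_16 a, q16_16 b) { return (q16_16)(((int64_t)a * b) >> 16); }

    int main(void)
    {
        /* Replace one floating-point product with its fixed-point counterpart. */
        q16_16 a = to_q(3.25), b = to_q(-1.5);
        printf("fixed: %f\n", from_q(q_mul(a, b)));   /* -4.875 */
        printf("float: %f\n", 3.25 * -1.5);           /* -4.875 */
        return 0;
    }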

BIBLIOGRAPHY

[1] M. Scarpino, OpenCL in Action: How to Accelerate Graphics and Computation. NY, USA: Manning, 2012.

[2] cliffwoolley2011, “Introduction to ,” 2011.

[3] D. Kuo and D. Gordon, “Real-time orthorectification by fpga-based ,” in Remote Sensing. International Society for Optics and Photonics, 2010, pp. 78300Y–78300Y.

[4] L.-C. Chen, T.-A. Teo, and J.-Y. Rau, “Fast orthorectification for satellite images using patch backprojection,” in Geoscience and Remote Sensing Symposium, 2003. IGARSS’03. Proceedings. 2003 IEEE International, vol. 2. IEEE, 2003, pp. 803–805.

[5] J. C. French and E. J. Balster, “A fast and accurate orthorectification algorithm of aerial imagery using integer arithmetic,” Selected Topics in Applied Earth Observations and Remote Sensing, IEEE Journal of, vol. 7, no. 5, pp. 1826–1834, 2014.

[6] J. M. Melonakos, P. Yalamanchili, C. McClanahan, U. Arshad, M. Landes, S. Jamboti, A. Joshi, S. Mohammed, K. Spafford, V. Venugopalakrishnan et al., “Productive high-performance software for opencl devices,” in SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2013, pp. 87520C–87520C.

[7] Khronos Group. The open standard for parallel programming of heterogeneous systems. [Online]. Available: https://www.khronos.org/opencl/

[8] Altera SDK for OpenCL Getting Started Guide.

[9] Altera SDK for OpenCL Best Practices Guide.

[10] The OpenCL Specification.

[11] J. Tompson and K. Schlachter, “An introduction to the opencl programming model,” Pearson Education, 2012.

[12] G. Zhou, “Near real-time orthorectification and mosaic of small uav video flow for time-critical event response,” Geoscience and Remote Sensing, IEEE Transactions on, vol. 47, no. 3, pp. 739–747, 2009.

[13] P. Fragkos and C. Ioannidis, “Cost effective accurate orthoimaging of inaccessible areas,” in Second International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2014). International Society for Optics and Photonics, 2014, pp. 92290E–92290E.

[14] Esri. GIS dictionary. [Online]. Available: http://support.esri.com/en/knowledgebase/GISDictionary/term/DEM

[15] Nvidia. What is GPU accelerated computing. [Online]. Available: http://www.nvidia.com/object/what-is-gpu-computing.html

[16] Intel Xeon Processor L5640.

[17] Intel Xeon Processor E5-2670 v2.

[18] Intel Xeon Phi 5110P.

[19] Tesla K10 GPU Accelerator.

[20] Tesla K40 GPU Active Accelerator.

[21] Nallatech. Nallatech 395-d8. [Online]. Available: http://www.nallatech.com/store/pcie-accelerator-cards/395-d8/

[22] S5-PCIe-HQ.

[23] Nallatech OpenCL FPGA Accelerator Cards.

[24] Altera SDK for OpenCL Programming Guide.

[25] Altera SDK for OpenCL Best Practices Guide.
