Accelerating Java on Embedded GPU

Iype P. Joseph

A Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the degree of

MASTER OF APPLIED SCIENCE

in Electrical and Computer Engineering

Ottawa-Carleton Institute of Electrical and Computer Engineering
University of Ottawa
Ottawa, Canada

January 2014

© Iype P. Joseph, Ottawa, Canada, 2014

Abstract

Multicore CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are ubiquitous in today's market-leading smartphones and tablets. As CPUs and GPUs grow more complex, maximizing hardware utilization becomes increasingly difficult. Because of their memory and computational limitations, embedded platforms pose different challenges for GPGPU (general-purpose computing on GPUs) than their desktop counterparts. This thesis evaluates the performance and energy efficiency achieved by offloading Java applications to an embedded GPU. Existing solutions in the literature address the techniques and benefits of offloading Java to desktop- or server-grade GPUs, but not to embedded GPUs. Our research focuses on providing a framework for accelerating Java programs on embedded GPUs. Our experiments were conducted on a Freescale i.MX6Q SabreLite board, which integrates a quad-core ARM Cortex-A9 CPU and a Vivante GC2000 GPU that supports the OpenCL 1.1 Embedded Profile. We accelerated Java code and reduced energy consumption using two approaches: JNI-OpenCL, and JOCL, a popular Java binding for OpenCL. Embedded Java programmers can readily apply these approaches on other platforms to exploit the computational power of GPUs. Our results show up to an 8-fold improvement in performance and up to a 3-fold reduction in energy consumption compared with CPU-only execution of the Java program on the embedded platform. To the best of our knowledge, this is the first work on accelerating Java on an embedded GPU.


Acknowledgement

This project and thesis would not have been possible without the help and support of the kind people around me. Hence, this document would be incomplete without an expression of my gratitude to them.

My parents, Joseph and Beena, and my brothers, Toms and Charles, have always given me their unequivocal support in pursuing my dreams, and I owe them my deepest gratitude.

This thesis would not have been possible without the invaluable advice, guidance and support of my supervisors, Dr. Miodrag Bolic and Dr. Voicu Groza. I have greatly benefited from their insightful comments and suggestions, and I take this opportunity to extend my heartfelt gratitude to them.

I would like to express my greatest appreciation to my colleagues at the Computer Architecture Research Group (CARG), Dr. Amir Rajabzadeh, Jonathan Parri, Yu Wang, and Wei Wang, for their advice and support. I have greatly benefited from the facilities provided by CARG, University of Ottawa, and from the support and encouragement of professors at Carleton University and the University of Ottawa. I thank them for their efforts.

I would like to acknowledge the financial support provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), YOUi Labs Inc., and my supervisors.

I am particularly grateful for the training and technical support I received from CMC Microsystems.

I would like to offer my special thanks to my homestay hosts, Yagu, James and May, for their friendship and warmth.

I owe a very important debt to all my gurus from grade school to graduate school, particularly, Prof. Rajesh Kannan Megalingam who sowed the seeds of research in me.

Last but not least, I would like to express my gratitude to all my friends, especially Aseem, Balaji, Breeson, Dave, Kate, Kylea, Manosilah, Muzammil, Parthasarathy, Radhika, Rakesh, Richa, Rij