Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures

S. Arash Ostadzadeh

Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures

DISSERTATION

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft, by the authority of the Rector Magnificus, prof. ir. K.Ch.A.M. Luyben, chairman of the Board for Doctorates, to be defended publicly on Tuesday [...] December [...] at [...]:[...] hours

by

Sayyed Arash OSTADZADEH
Master of Science in Computer Engineering, Ferdowsi University of Mashhad
born in Mashhad, Iran

This dissertation has been approved by the promotor: Prof. dr. K.L.M. Bertels

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. K.L.M. Bertels, Technische Universiteit Delft, promotor
Prof. dr. ir. H.J. Sips, Technische Universiteit Delft
Prof. Dr.-Ing. Michael Hübner, Ruhr-Universität Bochum, Germany
Prof. Dr.-Ing. Mladen Berekovic, Technische Universität Braunschweig, Germany
Prof. dr. Henk Corporaal, Technische Universiteit Eindhoven
Prof. dr. ir. Dirk Stroobandt, Universiteit Gent, Belgium
Dr. G.K. Kuzmanov, Technische Universiteit Delft
Prof. dr. ir. F.W. Jansen, Technische Universiteit Delft, reserve member

S. Arash Ostadzadeh
Quantitative Application Data Flow Characterization for Heterogeneous Multicore Architectures
With a summary in Dutch.

Subject headings: Dynamic Binary Instrumentation, Application Partitioning, Hardware/Software Co-design.

The cover images are abstract artworks created by the Agony drawing program developed by Kelvin (http://www.kelbv.com/agony.php).

Copyright © S. Arash Ostadzadeh
ISBN 978-94-6186-095-8

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the permission of the author.
Printed in The Netherlands

Dedicated to my dear parents

Abstract

Recent trends show a steady increase in the utilization of heterogeneous multicore architectures in order to address the ever-growing need for computing performance. These emerging architectures pose specific challenges with regard to their programmability. In addition, they require efficient application mapping schemes to fully harness their processing power and avoid bottlenecks. In this respect, it is of critical importance to analyse application behaviour, and the data communication between tasks, in particular.

In this dissertation, we present a profiling framework that helps developers to gain an insight into the behaviour of an application. The presented profiling framework is generic and not restricted to a particular platform, application, or purpose. We utilize this framework with the primary goal of mapping applications onto a heterogeneous multicore architecture. The framework includes a memory access profiling toolset, called Quad, that provides quantitative information regarding the memory accesses in an application. Quad utilizes Dynamic Binary Instrumentation (DBI) to detect the actual data dependencies that occur between the tasks of an application at runtime. Additionally, it also provides accurate memory access measurements, such as the amount of data transferred between tasks and the memory size required for their communication. Such information can be utilized to identify critical parts of an application, to highlight coarse-grained parallelism opportunities, and to guide code optimizations.

As a proof of concept to substantiate the usefulness of the extracted profiling information, we utilize the main output of Quad, the Quantitative Data Usage (QDU) graph, as the input model to formulate a general application partitioning problem. The formulation of this intractable problem is flexible and accommodates different design objectives and constraints.
Subsequently, we propose a heuristic algorithm to find high quality partitions of an application in a reasonable amount of time. In addition to the complexity analysis of the proposed algorithm, we present a thorough theoretical analysis of the application partitioning problem. In order to evaluate the quality of the solutions, we developed a test bench for generating synthetic QDU graphs and compared the results against the optimal partitions obtained using an exhaustive search. The comparison results show that the proposed heuristic algorithm is able to provide optimal or near-optimal solutions.

To further prove the applicability of the profiling framework, we investigate in detail the utilization of the framework in practice, by mapping two real applications onto a heterogeneous reconfigurable architecture. To achieve this goal, we propose a hardware/software partitioning methodology that introduces the concept of merging tightly-coupled tasks based on the data communication analysis. Moreover, the profiling information is utilized to fine-tune the applications and optimize their data flow. The obtained results show a performance increase of [...]% and [...]%.

Acknowledgements

My interest in computers dates back to [...], when I managed to get my hands on a Commodore [...]. I can still vividly recall the day my brother came up with a magic box in his hands. All it needed was just a "poke" to make my already hypnotized eyes poke out! And yes, I do remember the magical number after all these years! POKE [...],<color code: [...]-[...]> and bingo… you have the desired border color! Simple, but it was more than enough to cast a spell on me. If I am where I am standing today, it is because of you, Shervin. I decided to study computer science because I was enchanted by your programming skills and enthusiasm. I will never forget all those good times when I used to sit beside you, trying to learn something new about computers.
You are not only a dear friend and a true brother to me, but also a perennial source of inspiration and fortitude. Thanks for your selfless support and encouragement through all these years.

This thesis is the outcome not only of my endeavor over the last years, but also of the kind guidance, assistance and support of several individuals, to all of whom I am deeply grateful. Words fail to stand for the deep gratitude that I wish to express to all of you. I would like to stress the fact that the order in which I acknowledge the names is not representative of the value that I place on their roles in this respect.

First of all, I would like to sincerely thank my advisor and promotor, Prof. Koen Bertels, who gave me the chance to step into the PhD journey. Koen, I greatly value your continuous support, commitment, and patience, which immensely influenced my research view. You gave me the opportunity to develop myself in different aspects and to have a vision for future research. I am grateful for your dedication to guide me along the entire journey, our fruitful discussions, and the freedom that you granted me to pursue the research work. I am also thankful for your invaluable comments on my thesis.

I would like to extend my gratitude to my defense committee for the time that they invested in reading the thesis manuscript. I appreciate their insightful discussions and suggestions to improve the quality of this thesis.

I am indebted to my dear friend, Faisal, for all the proofreading of my thesis. Faisal, I value the time we spent on the research collaboration; but above that, I highly appreciate your genial friendship. Thanks for the helping hand whenever I needed you. My appreciation also goes to Roel for the comments and discussions on our collaborative research work. Roel, thanks for helping me during the recent years in Holland. I would like to express my gratitude to Imran for his valuable contributions to the extension of this research work.
Imran, you are a smart, dedicated, hardworking researcher that anyone would cherish working with. At the same time, you are a modest and trustworthy friend. I would also like to thank Kamana for her friendship and support. I am grateful to Carlo for all the proofreading of my manuscripts and his comments over the last few years. I would also like to acknowledge Valery and Marco for their efforts to improve the Quad toolset, Andrew for kindly proofreading the abstract and propositions of the thesis, Roel and Moa for their translation into Dutch.

My appreciation goes to Iranian friends in Holland who helped me settle down here and made me feel at home: Mahmood, Mojtaba, Alireza, Javad, Mehdi, Behnaz, Mahyar, Rahim, Hamed, Asad, Roya, Mehdi, Gholam Reza, Vahid, Azadeh, Ashkan, Ghazaleh, Sepideh, Mohammad Reza, Reza, Masoud, Mohammad, Amin, Behzad, Hossein, Hadi, Arash, Mohamad Reza, Ali, Hossein, and other friends that I have failed to name here. A special thanks goes to Alireza and Javad for tolerating me when I was falling asleep where I was not supposed to! … thanks for being supportive through all these years. Mahyar, I appreciate all your invaluable support and kindness.

I would also like to acknowledge my present and former colleagues in the Computer Engineering research group at TU Delft: Zubair, Luyi, Thomas, Dimitrios, Sebastian, Jae, Tariq, Fakhar, Seyab, Aqeel, Mafalda, Innocent, Laiq, Bogdan, Omar, Hamid, Roel, Saleh, Elena, Cuong, Vlad, Razvan, Muhammad, Chunyang, and Demid. I am grateful to Lidwina for taking care of all the administrative work during these years. I also wish to thank Bert, Erik and Eef for their technical support.

I would like to take this opportunity to express my sincere appreciation to all my teachers who have taught me since I went to school, and to all my wonderful friends in Iran for their prayers, kind words, and moral support.
Finally, I wish to express my deepest gratitude to my dear parents for their endless love, support, and commitment throughout my life. Mom, Dad, your incomparable love gave me the strength to overcome all the troubles that I faced in my life. You were my one and only motivation to stay and complete this journey. Mom, Dad, I endured just to see the smile on your face, which means more than the world to me.
