K-Loops: Loop Transformations for Reconfigurable Architectures
Total Page:16
File Type:pdf, Size:1020Kb
Ozana Silvia Dragomir K-loops: Loop Transformations for Reconfigurable Architectures K-loops: Loop Transformations for Reconfigurable Architectures PROEFSCHRIFT ter verkrijging van de graad van doctor aan de Technische Universiteit Delft, op gezag van de Rector Magnificus prof.ir. K.C.A.M Luyben, voorzitter van het College voor Promoties, in het openbaar te verdedigen op donderdag 16 juni 2011 om 15:00 uur door Ozana Silvia DRAGOMIR Master of Science, Politehnica University of Bucharest geboren te Ploies¸ti, Roemenie¨ Dit proefschrift is goedgekeurd door de promotor: Prof.dr.ir. H.J. Sips Copromotor: Dr. K.L.M. Bertels Samenstelling promotiecommissie: Rector Magnificus, voorzitter Prof.dr.ir. H.J.Sips, Technische Universiteit Delft, promotor Dr. K.L.M.Bertels, Technische Universiteit Delft, copromotor Prof.dr. A.P. de Vries, Technische Universiteit Delft Prof.dr. B.H.H. Juurlink, Technische Universitat¨ Berlin, Duitsland Prof.dr. F. Catthoor, Katholieke Universiteit Leuven, Belgie¨ Prof.dr.ir. K. De Bosschere, Universiteit Gent, Belgie¨ Dr. J.M.P. Cardoso, Universidade do Porto, Portugal Prof.dr. C.I.M. Beenakker, Technische Universiteit Delft, reservelid Ozana Silvia Dragomir K-loops: Loop Transformations for Reconfigurable Architectures Delft: TU Delft, Faculty of Elektrotechniek, Wiskunde en Informatica Met samenvatting in het Nederlands. ISBN 978-90-72298-00-3 Subject headings: compiler optimizations, loop transformations, reconfigurable computing. Cover image by Sven Geier. Copyright c 2011 Ozana Silvia Dragomir All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission of the author. Printed by Wohrmann¨ Print Service (www.wps.nl), The Netherlands. To my beloved husband, to our wonderful son. K-loops: Loop Transformations for Reconfigurable Architectures Abstract ECONFIGURABLE computing is becoming increasingly popular, as it is a middle point between dedicated hardware, which gives the best R performance, but has the highest production cost and longest time to the market, and conventional microprocessors, which are very flexible, but give less performance. Reconfigurable architectures have the advantages of high performance (given by the hardware), great flexibility (given by the software), and fast time to the market. Signal processing represents an important category of applications nowadays. In such applications, it is common to find large computation intensive parts of code (the application kernels) inside loop nests. Examples are the JPEG and the MJPEG image compression algorithms, the H264 video compression algorithm, the Sobel edge detection, etc. One way of increasing the performance of these applications is to map the kernels onto reconfigurable fabric. The focus of this dissertation is on kernel loops (K-loops), which are loop nests that contain hardware mapped kernels in the loop body. In this thesis, we propose methods for improving the performance of such K-loops, by using standard loop transformations for exposing and exploiting the coarse grain loop level parallelism. We target a reconfigurable architecture that is a heterogeneous system con- sisting of a general purpose processor and a field programmable gate array (FPGA). Research projects targeting reconfigurable architectures are trying to give answers to several problems: how to partition the application – decide which parts to be accelerated on the FPGA, how to optimize these parts (the kernels), what is the performance gain. However, only few try to exploit the coarse grain loop level parallelism. This work goes towards automatically deciding the number of kernel instances to place into the reconfigurable hardware, in a flexible way that can balance between area and performance. In this dissertation, we propose a general frame- work that helps determine the optimal degree of parallelism for each hardware i mapped kernel within a K-loop, taking into account area, memory size and bandwidth, and performance considerations. In the future it can also take into account power. Furthermore, we present algorithms and mathematical models for several loop transformations in the context of K-loops. The algorithms are used to determine the best degree of parallelism for a given K-loop, while the mathematical models are used to determine the corresponding performance improvement. The algorithms are validated with experimental results. The loop transformations that we analyze in this thesis are loop unrolling, loop shifting, K-pipelining, loop distribution, and loop skewing. An algorithm that decides which transformations to use for a given K-loop is also provided. Finally, we also present an analysis of possible situations and justifications of when and why the loop transformations have or have not a significant impact on the K-loop performance. ii Acknowledgements The fact that I came to Delft to do my PhD is the result of a series of both fortunate and unfortunate events. I would like to thank to the main people that made it possible: Prof. Irina Athanasiu, Prof. Stamatis Vassiliadis, Vlad Sima, and Elena Panainte. Irina Athanasiu was my first mentor from the Politehnica University of Bucharest, and she taught many of us not only about compilers, but also valuable lessons of life. Stamatis Vassiliadis was the one that gave me the chance to do this PhD, and accepted me in his group. He was a great spirit and a great leader, and the group has never been the same without him. Vlad Sima has been near me in many crucial moments of my life. I thank him for all his patience and support throughout the years and, although our lives went on different paths, I will always think of him fondly. I am grateful to Elena Panainte and to her husband Viorel for welcoming me into their home, and for sharing their knowledge about life here, which made it easier for me to make the decision of coming to live here. Next, I would like to thank my advisor, Dr. Koen Bertels, who believed in me more than I did. If it wasn’t for him, I might not have found the strength to complete the PhD. I am also grateful to Stephan Wong and to Todor Stefanov, with whom I worked on my first papers. They critically reviewed my work and gave me helpful suggestions to improve it. Many thanks also to Carlo Galuzzi who reviewed the text of my dissertation. Many people from the CE group made my life here in Delft easier and nicer, and I would like to thank all of them for the fun nights and social events, for the help when relocating (too many times already), for the time to chat, for the nice pictures, and for everything else. Many thanks to all the Romanians, to my past and present office mates, to Kamana, Yana, Filipa, Behnaz, Lu Yi, Demid, Roel, Dimitris, Sebastian, Stefan (Bugs), Cor, Ricardo, and to the CE staff (Lidwina, Monique, Erik, and Eef). Special thanks go to Eva and Vasek,ˇ for opening up my mind regarding the terrifying subject of child rearing; to Marjan Stolk, for introducing us to the Dutch language and culture in a pleasant and fun way and for being more than iii an instructor; to Martijn, for being the ‘best man’ and a true friend; and to Sean Halle, for insightful discussions, for believing in me, and for becoming an (almost) regular presence in our family. Last but not least, I would like to thank my family. My parents have always been there for me and encouraged me to follow my dreams. I thank them for understanding that I cannot be with them now, and I miss them. To my husband, Arnaldo, I would like to thank for his love and patience, and for our wonderful son, Alex. With them I am a complete person, and, I hope, a better person. Ozana Dragomir Delft, The Netherlands, June 2011 iv Table of contents Abstract .................................. i Acknowledgements ............................ iii List of Tables ................................ ix List of Figures ............................... xi List of Acronyms and Symbols ...................... xv 1 Introducing K-Loops .......................... 1 1.1 Research Context . 1 1.2 Problem Overview . 3 1.3 Thesis Contributions . 7 1.4 Thesis Organization . 8 2 Related Work .............................. 9 2.1 Levels of Parallelism . 10 2.2 Loop Transformations Overview . 11 2.2.1 Loop Unrolling . 11 2.2.2 Software Pipelining . 14 2.2.3 Loop Fusion . 17 2.2.4 Loop Fission . 18 2.2.5 Loop Scheduling Transformations . 19 2.2.6 Enabler Transformations . 22 2.2.7 Other Loop Transformations . 24 2.3 Loop Optimizations in Reconfigurable Computing . 25 2.3.1 Fine-Grained Reconfigurable Architectures . 25 2.3.2 Coarse-Grained Reconfigurable Architectures . 29 2.4 LLP Exploitation . 32 v 2.5 Open Issues . 34 3 Methodology .............................. 37 3.1 Introduction . 37 3.1.1 The Definition of K-loop . 38 3.1.2 Assumptions Regarding the K-loops Framework . 38 3.2 Framework . 41 3.3 Theoretical Background . 42 3.3.1 General Notations . 42 3.3.2 Area considerations . 45 3.3.3 Memory Considerations . 46 3.3.4 Performance Considerations . 48 3.3.5 Maximum Speedup by Amdahl’s Law . 49 3.4 The Random Test Generator . 51 3.5 Overview of the Proposed Loop Transformations . 53 3.6 Conclusions . 58 4 Loop Unrolling for K-loops ....................... 59 4.1 Introduction . 59 4.1.1 Example . 59 4.2 Mathematical Model . 60 4.3 Experimental Results . 62 4.3.1 Selected Kernels . 62 4.3.2 Analysis of the Results . 64 4.4 Conclusions . 68 5 Loop Shifting for K-loops ....................... 69 5.1 Introduction . 69 5.1.1 Example . 70 5.2 Mathematical Model . 71 5.3 Experimental Results . 74 5.4 Conclusions . 77 6 K-Pipelining .............................. 81 6.1 Introduction . 81 6.1.1 Example . 82 6.2 Mathematical Model . 85 6.3 Experimental Results .