APPLICATION AND SYSTEM SUPPORT FOR RECONFIGURABLE COPROCESSORS IN MULTICORE DEVICES

by

Philip C. Garcia

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

(Electrical and Computer Engineering)

at the

UNIVERSITY OF WISCONSIN–MADISON

2012

Date of final oral examination: 8/4/2011

This dissertation is approved by the following members of the Final Oral Committee:
Katherine Compton, Associate Professor, Electrical and Computer Engineering
Mikko Lipasti, Professor, Electrical and Computer Engineering
Nam Kim, Assistant Professor, Electrical and Computer Engineering
Michael Schulte, Associate Professor, Electrical and Computer Engineering
David Wood, Professor, Computer Science

© Copyright by Philip C. Garcia 2012
All Rights Reserved

To my family and friends, without whom this thesis would not have been possible

ACKNOWLEDGMENTS

There are many people I’d like to thank for helping me survive my many years in college. Most importantly, I’d like to thank my parents for always being there for me, and for being understanding when I repeatedly told them I had no clue how long I had left in school. Their support throughout the years has been remarkable, and I cannot stress how important their influence on me has been. I would also like to thank the rest of my family for being so patient with me. I know everyone was wondering when I was going to graduate, and when I would get to see everyone again, and I hope to soon. I love all of you very much, and cannot thank you enough for your support.

I’d also like to thank all of the friends I’ve made over the years. Your companionship and help throughout the years have allowed me to continue on, and yet somehow, those of you who took real jobs after graduating did not convince me to drop out and get a job. My friends have taught me more than any book I have read or class I have taken; for this I am extremely grateful. I will not name names, for fear of leaving many, many people out, but you are all important to me, and I cannot stress enough how much you have done for me. I’d especially like to thank my “irc” friends in #acm for . . . being there. The channel’s stories of the grasshopper and the octopus, bad jokes, name calling, and links to questionable websites have provided hours of entertainment, and a wonderful diversion from my research.

Surviving years of graduate school is not an easy ordeal, and the pains of undertaking a PhD are quite immense. Therefore I would like to thank Jorge Cham for his wonderful comic Piled Higher and Deeper, which represented both the painful truths and the more whimsical aspects of graduate life.

Additionally, no PhD would be possible without the help of my labmates and colleagues. Countless discussions have helped shape the way I approach my research, and the direction I chose to take it. The years spent working in an academic lab are rather magical, and allow one to absorb more knowledge than I ever thought possible. I’d also like to thank the many graduate students in vastly different areas that I have gotten the fortune to know. It is always wonderful to talk to someone doing research in a field completely apart from your own, and to hear their views on the world. They have also let me realize that I do not struggle alone, and that the problems of PhD students are relatively universal in nature.

The many days (and more nights) I spent in lab were not spent in silence. As a lover of music, I have spent countless hours listening to many different bands. Listening to music allows me to both relax and better concentrate. I can also relate to many of my favorite bands. I’d therefore like to thank my favorite musicians for the countless hours of entertainment you’ve provided me. In particular, I’d like to thank some of my favorite bands, including Soul Asylum, The Replacements, The Jayhawks, Two Cow Garage, Paul Westerberg, Slobberbone, Grand Champeen, Big Star, Wilco, and the Kinks. Your music has helped me to maintain my sanity. Additionally, the many live concerts I managed to see during graduate school have helped me to maintain a “real” perspective of the world outside of academia. They have been a great chance to relax, and have allowed me to become friends with people from all walks of life, bonding over our shared love of music. Additionally, no mention of music in graduate school would be complete without thanking Strictly Discs, my neighborhood record store that I spent way too much of my stipend at.

Finally, I’d like to thank my thesis committee for their advice and guidance during my PhD. I want to thank you all for taking the time to help me grow as a researcher, and I am proud to have gotten to know each one of you. This thesis would have been impossible without your input.

This material is based upon work supported by the National Science Foundation under Grant No. 0702605. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

TABLE OF CONTENTS

Page

LIST OF TABLES ...... vii

LIST OF FIGURES ...... viii

ABSTRACT ...... xi

1 Introduction ...... 1

1.1 Contributions ...... 2
1.2 Thesis Organization ...... 3

2 Background and Prior Work ...... 4

2.1 Heterogeneous Computing ...... 4
2.2 Coprocessor Models ...... 5
2.3 Reconfigurable Hardware Overview ...... 7
2.4 Reconfigurable Computing ...... 8
2.5 Existing Reconfigurable Computing Systems ...... 9
2.6 Dynamic Reconfiguration ...... 10
2.7 Operating System Support ...... 11
2.7.1 OS Support For Communication ...... 11
2.7.2 OS Support for Dynamic Reconfiguration ...... 12
2.8 Summary ...... 12

3 System Model ...... 13

3.1 System Overview ...... 13
3.2 Processor Model ...... 14
3.3 Reconfigurable Coprocessor ...... 14
3.4 Shared Memory Communication ...... 16
3.5 Direct Communication ...... 18
3.6 Simulation Platform ...... 19
3.7 System Limitations ...... 21
3.7.1 Stream Controller Allocation ...... 21
3.7.2 RH Kernel Placement ...... 22

4 Application Programming Interface ...... 24

4.1 Memory-Mapped I/O Segments ...... 24
4.1.1 Global Memory Segment ...... 25
4.1.2 Physical Memory Segment ...... 25
4.1.3 Virtual Memory Segment ...... 25
4.1.4 Virtual OS Memory Segment ...... 26
4.2 Initializing and Using RH Kernels ...... 26


4.3 OS Control of the RH Kernels ...... 28
4.4 Hardware Support ...... 30
4.4.1 Support for the RH Controller’s Memory Segments ...... 30
4.4.2 Querying RH Kernels ...... 30
4.4.3 Configuring RH Kernels ...... 32
4.5 RH Kernel Virtual Address Translation ...... 32
4.6 Memory Ordering on the Reconfigurable Hardware ...... 34
4.7 Library and OS Support ...... 36

5 Benchmarks, Workloads, and Testing Metrics ...... 37

5.1 Benchmarks ...... 37
5.1.1 Benchmark Development ...... 37
5.1.2 Hybrid RH/SW Benchmarks ...... 38
5.1.3 Software-only Benchmarks ...... 40
5.2 Workloads ...... 41
5.3 Testing Metrics ...... 41

6 Cache Organizations for Reconfigurable Computing Systems ...... 43

6.1 How Do RH Kernel Memory Accesses Differ From a CPU's? ...... 43
6.2 Examined Cache Topologies ...... 45
6.3 Performance ...... 46
6.4 Cache Sizing ...... 52
6.5 Power ...... 55
6.6 Reading Data Directly From The RH Kernels ...... 56
6.7 Conclusion ...... 59

7 Scaling to Multicore Systems ...... 60

7.1 Prior Work ...... 60
7.2 Application Workloads ...... 61
7.3 Results ...... 62
7.3.1 Hardware TLB Miss Handler ...... 62
7.3.2 Multiprocessor Performance ...... 64
7.3.3 Impact on Software-Only Applications ...... 66
7.4 Scalability Conclusions ...... 68

8 Reconfigurable Computing on Simultaneous Multithreaded Processors ...... 71

8.1 Prior Work ...... 71
8.2 Execution of Hybrid RH/SW workloads on SMT Processors ...... 72
8.3 Limiting RH thread's Resource Usage ...... 75
8.3.1 Dynamic Fetch Stalling While Executing RH Kernels ...... 75
8.3.2 Improved Fetch Stalling ...... 78
8.4 Conclusions ...... 80


9 RH Kernel Sharing on Multicore Systems ...... 82

9.1 Motivation and Prior Work ...... 82
9.2 Sharing RH Kernels ...... 84
9.2.1 RH Kernel Sharing Implementation ...... 85
9.2.2 RH Kernel Sharing Test Cases ...... 85
9.3 Sharing Performance ...... 87
9.4 Kernel Pooling ...... 90
9.5 Kernel Sharing Conclusions ...... 92

10 Scheduling RH Kernels on Multicore Systems ...... 94

10.1 Shortcomings of Existing RH Scheduling Methods ...... 94
10.2 Example Problematic Scheduling Cases ...... 95
10.3 Alternative Value Functions ...... 99
10.4 New Hierarchical RH Kernel Scheduler ...... 101
10.5 Setup ...... 102
10.5.1 Simulation Infrastructure ...... 103
10.5.2 Workload Generation ...... 103
10.6 Results ...... 105
10.7 Hierarchical Scheduler Conclusions ...... 108

11 Sharing an RH Fabric ...... 111

11.1 System Properties ...... 111
11.2 Setup ...... 113
11.3 Results ...... 114
11.4 Shared RH Fabric Conclusions ...... 117

12 Comparison with Vector Processors ...... 118

12.1 SIMD Performance ...... 118
12.1.1 Xvid Performance ...... 118
12.1.2 Viterbi Performance ...... 121
12.2 Limitations of SIMD Processors ...... 122
12.3 Conclusions ...... 122

13 Broader Impact ...... 124

13.1 Use with ASIC Coprocessors ...... 124
13.2 Use with Vector Processing Units ...... 125
13.3 Use with GPUs ...... 126
13.4 Use with Massively Parallel Processor Arrays ...... 127
13.5 Usefulness of scheduling ...... 127

14 Conclusion ...... 129

LIST OF REFERENCES ...... 131

LIST OF TABLES

Table Page

3.1 Simulated Processor Configuration...... 15

5.1 Breakdown of the kernels used in the hybrid RH/SW benchmarks, when executing on the RH coprocessor platform (data is not given for SW runtime and average RH speedup for full applications, as these metrics are used to describe individual kernels)...... 39

6.1 The nine shared-memory cache topologies that were tested. L2 cache sizes are given; the RH's L1 cache size is 32KB unless otherwise stated...... 48

7.1 A listing of the benchmarks executed in each of the workloads for two-, four-, and eight-core systems...... 61

7.2 Area and read energy for different TLB and cache configurations ...... 62

9.1 Methods used to allocate the RH, the number of tiles needed for the allocation method, and a brief description of how the RH was allocated...... 86

10.1 Example of two applications executing on a dual-processor system. All kernel execution times are listed as a count of CPU cycles...... 95

10.2 Some of the variables used in this chapter...... 96

10.3 Parameters used to generate the synthetic applications ...... 104

11.1 Additional latency experienced by RH kernels that are more than one hop away from the RH fabric . . . 113

11.2 Parameters used to generate the synthetic applications ...... 114

12.1 Processors used for the SIMD comparison ...... 118

LIST OF FIGURES

Figure Page

2.1 Figure from Cascaval et al. [20] illustrating the relationship between the invocation latency and the interface necessary for various types of accelerator architectures (boxes), and the ovals show examples of the different types of accelerators (Comp: compression; TOE: TCP offload engine; RNIC: RDMA network interface controller) ...... 6

2.2 Example execution of a hybrid RH/SW application containing three RH kernels...... 9

3.1 High-level model of a chip containing two general-purpose CPU cores, and a shared RH fabric...... 14

3.2 Interconnection of the memory hierarchy, stream controllers, and the configured RH kernels on a shared RH fabric...... 16

3.3 Illustration of the ring-based network used for direct communication between the RH controller and the CPUs on an 8-processor system; in this figure, each R corresponds to a router capable of forwarding packets to all of its neighbors ...... 18

3.4 Block diagram of the Alexandrite reconfigurable computing simulator...... 20

4.1 Pseudocode illustrating how an application accesses a virtual RH kernel, and launches it if it is available. 27

4.2 Interrupt handler routine that is called upon a TLB-miss in the RH controller...... 33

6.1 CPU/RH topology where the RH and the CPU share a single L1 cache...... 44

6.2 L1 data cache hit rate for the kernels when executed by RH and by SW ...... 45

6.3 Number of L1 data cache misses per 1,000 equivalent instructions ...... 46

6.4 The average number of memory requests per cycle each kernel makes when implemented in SW and in RH ...... 47

6.5 Some of the cache hierarchies examined in this thesis ...... 47

6.6 Performance of each cache topology normalized to the performance of SL1 for each of the hybrid RH/SW benchmarks ...... 49

6.7 Overall cache miss rate for the Xvid application (combined RH/CPU accesses) ...... 50

6.8 Speedup of Xvid kernels on the different topologies ...... 50

6.9 Cache topology’s effect on non-kernel performance within Xvid...... 51

6.10 SL2-PL1 Xvid speedup as the RH’s L1 cache size is varied ...... 53


6.11 Relative performance of SW-only and hybrid execution as the L2 cache size (with a baseline of a 2MB L2 cache) is varied ...... 54

6.12 Energy consumption of the cache hierarchy for each of the tested topologies...... 55

6.13 Performance improvements of SAD kernels when the CPU directly reads back values from them. . . . . 57

6.14 Overall Xvid performance improvements when the CPU directly reads back values from them...... 58

7.1 Workload speedup using proposed HW-based TLB miss handler instead of interrupting the CPU . . . . . 63

7.2 The “efficiency” of the hybrid RH/SW workloads on the proposed reconfigurable computing platform . . 65

7.3 SW-only benchmark “efficiency” when run alongside hybrid RH/SW workloads ...... 66

7.4 Performance improvement of SW-only application in workloads where AES is set to be non-cacheable ...... 69

7.5 Performance change of the hybrid RH/SW applications when AES is set to be non-cacheable ...... 69

8.1 Efficiency of the coscheduled SW-only thread’s performance ...... 73

8.2 Efficiency of the overall workload’s performance ...... 74

8.3 Average percent of the processor’s shared instruction window occupied by the AES thread ...... 74

8.4 Percent improvement of coscheduled benchmarks’ EIPC when using dynamic fetch stalling ...... 76

8.5 Percent improvement of workloads’ NEIPC when using dynamic fetch stalling ...... 76

8.6 Efficiency of coscheduled threads and workloads when using dynamic fetch stalling ...... 77

8.7 Performance improvement of SW-only coscheduled thread when using both dynamic and static fetch stalling instead of just dynamic fetch stalling ...... 78

8.8 Performance improvement of workloads when using both static and dynamic fetch stalling instead of just using dynamic fetch stalling ...... 79

8.9 Efficiency of thread coscheduled alongside hybrid AES for the three proposed systems ...... 80

9.1 Xvid’s speedup over no RH for each of the test cases...... 87

9.2 How often there was contention for shared Xvid kernels, and how long the waiting application had to wait to obtain access...... 88

9.3 Mapping virtual kernels to physical ones using kernel pools ...... 90

9.4 SAD8 kernel’s speedup when varying the number of SAD8 kernels on the system, and configuring a shared single copy of all other kernels ...... 91


9.5 Xvid performance when all of the RH kernels are shared, but multiple copies of the SAD8 kernel were pooled together to decrease the demand on the RH kernel...... 92

10.1 Conditions under which the old scheduler selects RH kernels that do not maximize system performance, where the percentage represents PA = PB ...... 98

10.2 Kernel B speedup needed for old scheduler to perform sub-optimally ...... 98

10.3 Performance difference between original and optimal scheduler when PA = 45% and PB = 35% . . . . . 99

10.4 Pseudocode describing the hierarchical RH kernel scheduler. KNAPSACK APP computes the knapsack solution for an individual application; MODIFIED MCKP combines the knapsack solutions into a single schedule...... 101

10.5 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%], and the kernel multiplier factor is 10 ...... 105

10.6 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%], and the kernel multiplier factor is 5 ...... 106

10.7 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%], and the kernel multiplier factor is 25 ...... 107

10.8 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [90%,100%], and the kernel multiplier factor is 10 ...... 109

10.9 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [90%,100%], and the kernel multiplier factor is 5 ...... 109

11.1 Examples of different ways the processor cores can access the RH on both two- and four-processor systems ...... 112

11.2 Performance advantage of using a shared RH fabric over a partitioned RH fabric when discounting communication overhead ...... 115

11.3 Speedup of a fully shared RH fabric versus one where the RH fabric is split into two partitions when varying the runtimes of the RH kernels...... 116

12.1 Vector coprocessor speedups for various Xvid kernels when executing on PC hardware...... 119

12.2 Percent of the original SW-only execution time of Xvid (and its kernels) when accelerated using RH (top) and SIMD instructions (bottom)...... 120

12.3 Speedup of reconfigurable computing system as I varied the number of RH tiles available ...... 120

12.4 Vector speedups when running the Viterbi encoder on PC hardware (RH speedup is 100x) ...... 121

ABSTRACT

Embedded multicore devices often require high performance with minimal power consumption; many systems use dedicated hardware units to meet these constraints. However, embedded systems have also become increasingly multi-purpose and must be able to execute a wide range of applications – some of which might not yet be known at design time. It is therefore difficult to choose an appropriate mix of dedicated hardware that meets a device’s size, cost, and capability constraints. A reconfigurable hardware (RH) coprocessor is a potential solution, as it is highly effective at accelerating a variety of different tasks (which need not necessarily be known in advance), and does so using less energy than general-purpose processors. In this thesis, I propose a reconfigurable computing system-on-chip that combines general-purpose processor core(s) with a reconfigurable coprocessor. Applications executing on this system use the RH to accelerate commonly-executed functions.

In this thesis, I first describe the communication model used between the processor(s) and RH coprocessor. I then describe the programming interface applications use to access the RH, and show that my model allows applications to securely access the RH coprocessor without requiring operating system intervention – greatly reducing the overhead of using the coprocessor. Because of this, my RH coprocessor can even accelerate tasks (or kernels of an application) whose execution time (when running in software) is measured in hundreds of cycles.

After establishing the platform, I examine how my proposed system performs, and propose extensions to the system to further improve system performance. In this thesis, I will demonstrate that, when using my coprocessor memory interface, workloads executing across eight processor cores and the shared RH fabric perform ∼ 95% as well as they would on an idealized system where the coprocessor has zero-cycle access to shared memory. Additionally, I examine the impact hybrid RH/software applications have on software-only applications, and propose a mechanism that prevents streaming RH applications from polluting shared levels of the system’s cache; this simple modification improved the performance of software-only applications by up to 32%. I also examine the behavior of software-only applications coscheduled alongside hybrid RH/software applications on simultaneous multithreaded processors, showing that they perform up to ∼ 95% as fast as they do when a multicore system executes the two applications. This is much faster than two software applications can run when coscheduled together, but not as fast as a multicore machine, because the hybrid application still requires CPU resources to execute, slowing down the coscheduled software-only application.

Finally, I examine methods that allow multicore RH systems to better utilize RH resources, allowing systems with limited RH resources to perform nearly as well as systems containing more RH resources. I first show that hybrid applications that call the same RH kernel can better utilize the RH by sharing the configured resources. On eight-processor systems executing eight copies of the same applications, workloads that shared configured RH kernels performed 97.4% as well as systems that did not, despite the fact that shared systems required only ∼ 1/8th of the RH resources. I also examined a modified RH kernel scheduling algorithm that periodically determines which RH kernels should be loaded on the RH at any given time.
This new scheduling algorithm could better select which RH kernels should be configured on multicore systems. I show that this new scheduler always performs as well as, or better than, the previous scheduler, and in extreme cases can result in RH allocations that improve system performance by over 2x.

In this thesis, I examine many of the design choices involved in creating a multicore RH computing system, and examine how a modern operating system should present the RH resources to user applications. I then demonstrate that such a system provides the performance required in next-generation computing applications, while providing the programmability and flexibility to accelerate many different application domains, and even offer performance improvements to applications not considered when the chip was first fabricated. By doing this, embedded systems manufacturers can make faster, more capable products that consume less energy. Additionally, the hardware in these new devices will be able to adapt to new applications that are created after the device has shipped, allowing all applications to be accelerated by the processor, and not just the applications that the processor was optimized for.

Chapter 1

Introduction

The ever-increasing demand for faster and more capable embedded computing systems presents a continual challenge to computer architects. Although process technology improvements increase the number of transistors that can fit on a single chip, it is not always clear how those transistors can best improve system performance. Previously, computer architects used aggressive pipelining, out-of-order processing, and superscalar designs to improve the performance of individual processor cores [98, 59]. However, these techniques are no longer sufficient, causing many computer architects to examine multicore and multithreaded design approaches [98, 78, 62, 49] to improve system throughput for parallel workloads.

Although these new multicore systems can improve the performance of parallel applications, or execute multiple applications simultaneously, this is often done at the expense of single-threaded performance. Even when these systems execute multithreaded workloads, single-threaded performance can be a limiting factor due to an imbalance in how much work each thread must compute [78]. This often results in processor cores sitting idle waiting for another core to finish executing a long-running thread. Amdahl’s law limits the execution time of a multithreaded application to that of its slowest thread [62], further motivating the need to accelerate single threads of execution. Increasing the performance of all processor cores uniformly will cause every thread to execute faster, but may not be the best solution because doing this increases the power consumption of every core.

This thesis focuses instead on the use of a reconfigurable coprocessor to accelerate individual threads within a workload. Applications on these systems execute primarily on a general-purpose processor; however, the most compute-intensive sections of an application (referred to as kernels in this thesis) can execute on the coprocessor. Offloading this computation to a coprocessor can result in reduced energy consumption and/or faster execution. The use of reconfigurable hardware as a coprocessor provides the flexibility to create new accelerators after the system has been developed and the chips have been fabricated. The development of heterogeneous systems-on-chip often starts years before the chip is used in actual products. Because of this, the full set of applications that might execute on the processor may not be known at design time. In this thesis, I present a system model for integrating an on-chip reconfigurable hardware (RH) coprocessor for multicore and multithreaded systems, and examine the related system infrastructure necessary to support the RH coprocessor.
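The bound that Amdahl’s law places on parallel execution can be stated compactly. The formulation below uses generic symbols (a parallelizable fraction f and a core count N) rather than notation from this thesis, and is included only to make the limit explicit:

    S(N) = \frac{1}{(1 - f) + \frac{f}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - f}

Even with unlimited cores, the serial portion (or, equivalently, the slowest thread) caps the overall speedup, which is why accelerating individual threads remains important.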

1.1 Contributions

In this thesis I make the following contributions:

• Design and evaluation of an on-chip RH coprocessor interface that can access the processor’s cache-coherent memory hierarchy using virtual memory addresses.

• Implementation of a mechanism allowing applications to directly access and use configured RH kernels without OS intervention or violating process isolation.

• Reduction of RH kernel overhead such that an application can efficiently query, execute, and poll RH kernels, which allows short-running kernels (executing under 100 processor cycles) to still obtain speedups.

• Determined that:

– RH coprocessors and general-purpose processors should share at least one level of the cache hierarchy to facilitate the transfer of data between the two.

– An L1 cache attached to the RH coprocessor does not greatly improve coprocessor performance, but does significantly reduce the dynamic energy consumption of the RH’s memory accesses.

– Multicore RH computing systems need a hardware TLB miss handler to improve memory performance.

– A shared memory interface between the RH coprocessor and the cache hierarchy is scalable to at least eight cores, with eight-core workloads obtaining ∼ 95% of their performance when using a zero-latency RH coprocessor memory.

• Implementation of a mechanism to prevent RH kernels whose memory requests exhibit streaming behavior, and have little cache locality, from polluting shared cache levels. This improves performance of coscheduled SW-only threads by up to 32%.

• Creation of SMT processor extensions that allow RH kernels to execute at almost full speed while allowing coscheduled threads to execute at ∼ 95% of their single-core performance when coscheduled alongside an application that continuously executes long-running RH kernels.

• Creation of methods for sharing configured RH kernels between multiple applications to allow systems with limited RH resources to better allocate the RH fabric. Eight-core systems that shared RH kernels were 97.4% as fast as systems with unlimited hardware, while using only one-eighth of the RH fabric resources.

• Modification of a reconfigurable hardware kernel scheduling algorithm to better allocate kernels when reconfigurable resources are constrained. This scheduler always performs as well as, or better than, scheduling algorithms created for related projects when used on multicore processors.

1.2 Thesis Organization

My thesis can be broken down into roughly three parts. The first part of the thesis focuses on my system model, and how hybrid RH/software applications perform on single processor systems. In Chapter 2 I examine some of the prior work that led to the research I performed in this thesis, including an overview of heterogeneous systems, how these systems can use RH, and prior RH systems. Chapter 3 describes the reconfigurable computing system proposed in this thesis. Chapter 4 examines the programming model I developed to allow communication between the OS, applications, and RH kernels, as well as the methods I developed to eliminate much of the latency inherent in accessing hardware coprocessors on systems maintaining process isolation. Chapter 5 examines the benchmarks and methodology used to evaluate the proposed platform. In Chapter 6 I examine the execution of RH kernels, comparing the memory access patterns of the RH kernels with those of the same kernels executing on traditional CPUs. I then apply this knowledge to develop communication mechanisms that allow RH kernels to transfer data with the general-purpose processors on a single core system.

The second part of this thesis focuses on the scalability and performance of the reconfigurable coprocessor in multicore and multithreaded systems. Chapter 7 examines the performance of this new multicore reconfigurable computing platform, and examines the performance when the OS coschedules software-only applications alongside hybrid RH/software applications. Chapter 8 examines the impact of coupling an RH coprocessor with a simultaneous multithreaded processor, focusing on the impact of coscheduling hybrid RH/software applications alongside software-only applications. In both of these chapters, I present platform extensions that improve overall system performance.

Next, this thesis examines how to efficiently allocate limited RH computing resources on multicore systems. In Chapter 9 I propose extensions to the system model that allow one or more copies of an RH kernel to be (safely) shared between simultaneously executing threads. I then compare the performance of systems using the sharing mechanism both with systems containing “sufficient” reconfigurable resources, as well as with those with limited resources that do not permit sharing. In Chapter 10 I examine alternative methods for allocating RH resources, addressing shortcomings of prior scheduling algorithms used on multiprocessor reconfigurable systems. Finally, in Chapter 11 I illustrate the performance benefits of using a single RH fabric instead of a partitioned one.

The remaining chapters further analyze the results from this thesis. Chapter 12 compares the performance of the proposed reconfigurable computing system with that of a system containing a SIMD processor. Chapter 13 discusses the impact of this research on other coprocessor architectures, examining how techniques developed for this thesis can increase the performance of other heterogeneous system-on-chip architectures, as well as how to adapt the proposed programming interface to these architectures. Finally, Chapter 14 summarizes the contributions made in this thesis.

Chapter 2

Background and Prior Work

Reconfigurable computing systems combine one or more general-purpose processors with a reconfigurable hardware (RH) fabric. Embedded systems designers use RH for many different purposes: interfacing processors with external hardware peripherals, creating dedicated coprocessors that accelerate a specific function or task, etc. [30, 47]. This thesis examines systems that use RH to accelerate various tasks or functions.

2.1 Heterogeneous Computing

The idea of using heterogeneous compute resources in a computing platform is not new [75]. Computing systems have always contained some heterogeneous resources; however, in the past these resources provided “distinct” functionality often associated with I/O (network cards, sound cards, graphics adapters, modems, etc). Modern embedded systems use heterogeneous processing elements much more extensively, often to accelerate computations [75].

Many previous heterogeneous embedded systems implemented a single application, or a specific set of applications, and had strict power and cost constraints. Because of this, system designers attempt to maximize application performance and minimize the system’s energy consumption by tailoring the system to the algorithms within the application. In some of these systems, entire applications execute on custom ASIC (Application-Specific Integrated Circuit) coprocessors. Other heterogeneous systems map different algorithms within an application or set of applications to a mixture of digital signal processors, vector processors, general-purpose processors, and/or other specialized processors. By executing computations on the resource optimized for the computation, heterogeneous systems can obtain higher performance while consuming less energy.

Modern computing systems often use high-performance graphics processing units (GPUs) as a form of heterogeneous computing. GPUs are highly parallel devices that perform graphics calculations orders of magnitude faster than a general-purpose CPU. More recently, researchers have used GPUs to perform general-purpose computations [112, 31]. In these systems, programmers use new programming languages such as CUDA [31] and OpenCL [112] to design accelerators that use the SIMD (single instruction multiple data) processors within the GPU to accelerate different algorithms [100, 1].

Although the offloading of computation onto GPUs is an important example of heterogeneous computing, modern embedded systems (and, likely, future “general-purpose” systems) take advantage of many other types of heterogeneous resources. For instance, many smart phones and other high-performance mobile devices contain tens of heterogeneous resources, including video encoders/decoders, support for various wireless standards (3G, 802.11, etc), mp3 and speech encoders/decoders, etc. In the past, general-purpose computing systems often relegated these heterogeneous coprocessors to I/O tasks, with most computation occurring on the CPU. However, general-purpose systems are also starting to use heterogeneous coprocessors as computational accelerators.

More recently, researchers have examined the impact that the increasing number of transistors on a chip has on power consumption [38], and predict that future chips will not be able to power all of their transistors at the same time, resulting in “dark silicon”. Single-chip heterogeneous systems help solve the problems of dark silicon by only powering the CPU cores and/or compute accelerators needed to perform a given task. In my proposed system, RH kernels and/or CPUs are often idle, allowing them to easily be put into low-power modes.

2.2 Coprocessor Models

Cascaval et al. [20] described a taxonomy of different coprocessor accelerator architectures and their programming models, focusing on the latency of the different interconnection strategies they use, as well as how applications executing on the system interact with the coprocessor. This analysis examined architectures ranging from instruction set extensions (such as floating-point units) that are directly integrated into a processor’s microarchitecture, up to distributed coprocessors that communicate over a packet-switched network such as Ethernet. For example, ISA extensions requiring new functional units on a CPU usually communicate using the processor’s register file or shared memory. They also have a very fine granularity, performing a single operation, or very few operations, per call. Custom instructions often perform a very small and specific task, allowing compilers to more easily generate the appropriate instructions. At the other extreme, a coprocessor communicating over a network, or by I/O interfaces such as PCI Express or USB, operates very differently. In attached coprocessor systems, the application must directly request to use the coprocessor, and the coprocessor often runs asynchronously with respect to the host processor. On systems with protected memory, user applications cannot directly access the coprocessor, and the OS is responsible for setting up the hardware, transferring data, and starting the coprocessor’s execution. The processor and the coprocessor usually have their own unique physical memory, requiring direct memory access (DMA) transfers to share data with one another. Because of these high setup costs, as well as the relatively large latency between a CPU and an off-chip coprocessor, short-running tasks cannot execute efficiently on the coprocessor, and these latencies can end up decelerating an application.

In addition to these two “coprocessor” models, the survey [20] examines a loosely coupled model where the coprocessor(s) use a shared (often cache-coherent) memory and may or may not be directly coupled into the programming model. Examples of these devices can include on-die acceleration engines for cryptography, compression, and checksums. Figure 2.1 shows the latency and coprocessor interface of several coprocessor accelerators [20].

Figure 2.1 Figure from Cascaval et al. [20] illustrating the relationship between the invocation latency and the interface necessary for various types of accelerator architectures (boxes), and the ovals show examples of the different types of accelerators (Comp: compression; TOE: TCP offload engine; RNIC: RDMA network interface controller)

As more components can be integrated on the same chip, communication latency decreases dramatically, altering some of the characteristics of the described coprocessor model. Systems-on-chip (such as those designed for smart phones and tablet computing [97]) fall into this category. In these platforms, communication between the CPU and coprocessor is faster than when the coprocessor is located off-chip, reducing operations that once took thousands of cycles to only tens of cycles – almost approaching the latency of a functional unit within a processor core. Recently, general-purpose processors have also started to resemble systems-on-chip. In AMD’s Fusion processors [18] and Intel’s Sandy Bridge processors [67], the previously off-chip GPU is now located on the same die as the CPU.

To date, most single-chip heterogeneous architectures have taken a conservative approach to coupling accelerators with the CPU cores, relying on many of the same interfaces used for off-chip accelerators [111]. Although this method simplifies backward compatibility and reduces the communication latency between the coprocessor and processor, it is limited by the protocols established for off-chip accelerators. Accessing these on-chip coprocessors typically requires OS intervention. This often results in multiple memory copies to move data between address spaces. These operations can take thousands of cycles, despite the fact that the CPU and coprocessor can exchange data in tens of cycles [96, 95, 111]. In this thesis I show that these overheads are often unnecessary, and that RH kernels can be safely used by user applications with minimal overhead.


Stillwell et al. [111] provide an example of the data flow in a typical coprocessor architecture. In these systems, the OS controls the coprocessor for security purposes. When an application wishes to use an accelerator, it must first copy the data that the coprocessor will use into a buffer, and make an OS system call. Then the OS initiates a DMA transfer to the hardware coprocessor. However, memory pages resident in an application’s virtual memory address space are not guaranteed to be available at all times, and the OS must ensure that all of the data transferred by the DMA engine is both resident in physical memory, and will not be swapped out while the hardware is executing. To do this, the OS copies the data from user space into a buffer that exists in kernel space [16, 111]. Alternatively, if a large amount of data must be transferred, the OS instead pins the relevant memory pages in physical memory. However, in this instance the OS must modify multiple data structures within the OS both before and after the DMA transfer, which also adds significant overhead.

When an application initiates a task on a coprocessor, it can either continue performing other computations in parallel with the accelerator or block until the accelerator informs the OS that it is done. In this traditional coprocessor model, the accelerated algorithm executing on the coprocessor is not “done” when it finishes its computation; it must still transfer the resultant data back to the processor’s memory space using a DMA transfer. The destination memory is likely in the OS’s address space, so when the DMA transfer is complete, the OS then has to copy this data back into the application’s address space. All of these extra memory copies, as well as the OS system calls, impose a significant penalty on accelerator performance [96, 95, 111]. Although many of these tasks can be overlapped to reduce overall latency, this is really only possible for long-running accelerators (executing for millions of processor cycles) [20]. For this class of accelerator, the performance gains from moving an accelerator on-chip are relatively minimal – the latency between the coprocessor and processor, although large when compared to the processor’s cycle time, is small in comparison to the total runtime of the coprocessor. Therefore, to maximize the usefulness and performance of on-chip coprocessors, system designers need to provide a new interface to use the coprocessor that eliminates much of the overhead inherent in traditional coprocessor communication.
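To make the sequence of copies and OS transitions concrete, the following sketch outlines a conventional offload path in pseudocode-style C. The device name, ioctl request code, and structure layout are hypothetical illustrations, not an API from the systems cited above.

    /* Hypothetical illustration of a traditional coprocessor offload path.
     * Device name, ioctl code, and struct layout are invented for this
     * sketch; they do not correspond to any real driver. */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    struct accel_job {
        void  *input;      /* user-space input buffer  */
        void  *output;     /* user-space output buffer */
        size_t in_bytes;
        size_t out_bytes;
    };

    #define ACCEL_RUN_JOB 0xA001   /* hypothetical ioctl request code */

    int offload(const void *in, size_t in_bytes, void *out, size_t out_bytes)
    {
        int fd = open("/dev/accel0", O_RDWR);          /* OS entry point */
        if (fd < 0)
            return -1;

        struct accel_job job = {
            .input = (void *)in, .in_bytes = in_bytes,
            .output = out,       .out_bytes = out_bytes,
        };

        /* The driver copies (or pins) the user pages, programs a DMA
         * transfer to the accelerator's memory, starts the job, and blocks
         * the caller until a completion interrupt arrives.  The results
         * are then DMA'd back and copied into the user-space output
         * buffer.  Each of these steps adds latency that an on-chip,
         * virtually addressed coprocessor interface avoids. */
        int rc = ioctl(fd, ACCEL_RUN_JOB, &job);

        close(fd);
        return rc;
    }

Every crossing into the OS and every buffer copy in this path contributes to the thousands of cycles of overhead discussed above, even when the computation itself is short.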

2.3 Reconfigurable Hardware Overview

In general, RH fabrics can be categorized as either coarse-grained or fine-grained [30]. In fine-grained RH fabrics, such as field programmable gate arrays (FPGAs), operations on single-bit data are very efficient, while operations on larger blocks of data tend to be slower and consume more energy [30, 115, 3]. In contrast, coarse-grained fabrics operate primarily on word-sized blocks of data, and often cannot efficiently handle single-bit data operations [56].

Commercially available RH devices typically contain a fine-grained reconfigurable fabric in the form of an FPGA [30, 131, 5]. In these devices, the hardware is divided into configurable logic blocks containing one or more lookup tables (LUTs) and flip-flops. The LUTs within each logic block can be reconfigured to implement any function of their inputs (there are usually between four and six inputs), and the LUT’s outputs can optionally be fed through the block’s flip-flops [115]. These configurable logic blocks are connected to a highly reconfigurable network that provides communication between different logic blocks. FPGAs also incorporate more coarse-grained blocks into their network, such as block RAMs and multipliers interspersed throughout the RH fabric [131, 5]. These coarse-grained blocks use less area, less power, and operate at a faster clock frequency than would be achievable if these same structures had been implemented using the FPGA’s logic blocks. More recent research has also looked into integrating floating point units into the fabric [11].

Coarse-grained RH architectures configure larger, more complex computational structures such as entire ALUs [56, 77]. The lower flexibility of a limited set of operations that process word-sized data requires far fewer configuration bits compared to fine-grained architectures. Routing is also performed at a coarser granularity; however, because of this, coarse-grained architectures tend to be inefficient when dealing with single-bit control signals.

The work presented in this thesis does not rely on any particular form of RH, and instead focuses on how the RH communicates with the rest of the system. The RH fabric in this thesis is generic, and not tied to a specific RH architecture. Chapter 3.3 describes the RH fabric that I use to model the RH kernels simulated in this work.
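As a concrete (if simplified) illustration of why a k-input LUT can implement any function of its inputs, the sketch below models a LUT as nothing more than a 2^k-entry truth table indexed by the input bits. The structure, bit widths, and example function are arbitrary choices for this sketch, not taken from a particular FPGA architecture.

    /* Simplified software model of a 4-input LUT: the configuration fills a
     * 16-entry truth table, and evaluation is a single lookup indexed by
     * the four input bits. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint16_t truth_table;   /* one configuration bit per input pattern */
    } lut4_t;

    /* Load the configuration bits; any 4-input Boolean function fits. */
    static void lut4_configure(lut4_t *lut, uint16_t config_bits)
    {
        lut->truth_table = config_bits;
    }

    /* Evaluate the configured function for inputs a..d (each 0 or 1). */
    static int lut4_eval(const lut4_t *lut, int a, int b, int c, int d)
    {
        unsigned index = (unsigned)(a | (b << 1) | (c << 2) | (d << 3));
        return (lut->truth_table >> index) & 1u;
    }

    int main(void)
    {
        lut4_t lut;
        /* Example configuration: output 1 when at least three of the four
         * inputs are 1 (bits 7, 11, 13, 14, and 15 of the table are set). */
        lut4_configure(&lut, 0xE880);
        printf("f(1,1,1,0) = %d\n", lut4_eval(&lut, 1, 1, 1, 0));  /* prints 1 */
        return 0;
    }

The same principle scales to the four- to six-input LUTs mentioned above; the table simply grows to 2^k entries, which is why larger LUTs cost exponentially more configuration bits.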

2.4 Reconfigurable Computing

Although there are many types of heterogeneous coprocessors in use today, this thesis focuses on RH coprocessors. Unlike traditional coprocessor architectures, RH coprocessors adapt to the changing demands of the system, only implementing the currently-needed hardware. They can therefore accelerate applications that were not originally envisioned when the system was first designed.

A reconfigurable computing system combines general-purpose processor(s) with an RH fabric. Applications execute as software (SW) on the general-purpose processor(s) and use the RH to accelerate compute-intensive portions (kernels) of their execution. In this thesis, I refer to applications that use RH accelerators as hybrid RH/SW applications. These hybrid applications take advantage of the strengths of both the RH and general-purpose processor(s). The bulk of these applications (overall control flow, exceptional conditions, rare execution paths, etc.) is written as software, and only the most compute-intensive kernels are also implemented in RH. Hybrid RH/SW applications consume less RH area, and have lower development costs than applications designed to run entirely in RH. In the reconfigurable computing system proposed in this thesis, the OS selects whether an individual kernel executes on the RH or on the general-purpose processor [55]. Figure 2.2 shows an example of the execution of a portion of a hybrid RH/SW application containing three kernels. In this example, the three kernels are all accelerated to varying degrees, and the software’s execution time remains constant.

Hybrid RH/SW applications often obtain similar performance to full-RH applications, but with the flexibility of software (more modes of operation, highly configurable execution). Additionally, only implementing smaller and less complex hardware modules in the RH reduces the development time of hybrid applications. This methodology also allows the OS more flexibility when selecting which kernels to configure on the RH (many smaller kernels versus fewer larger kernels).


Figure 2.2 Example execution of a hybrid RH/SW application containing three RH kernels.
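A minimal sketch of this hybrid execution pattern is shown below: the application asks whether its kernel is currently configured on the RH and transparently falls back to the software implementation otherwise. The rh_kernel_* functions and their stub bodies are hypothetical placeholders standing in for the programming interface described in Chapter 4, not the actual API.

    /* Sketch of a hybrid RH/SW kernel invocation.  The rh_kernel_* functions
     * are illustrative stubs; here they simply force the software path. */
    #include <stdbool.h>

    static bool rh_kernel_available(int kernel_id) { (void)kernel_id; return false; }
    static void rh_kernel_start(int kernel_id, void *args) { (void)kernel_id; (void)args; }
    static void rh_kernel_wait(int kernel_id) { (void)kernel_id; }
    static void kernel_sw_version(void *args) { (void)args; /* software fallback */ }

    void run_kernel(int kernel_id, void *args)
    {
        if (rh_kernel_available(kernel_id)) {
            /* The kernel is configured on the RH fabric: launch it and poll
             * (or block) until it completes.  Software could also overlap
             * other work between start and wait. */
            rh_kernel_start(kernel_id, args);
            rh_kernel_wait(kernel_id);
        } else {
            /* The OS chose not to configure (or has evicted) this kernel,
             * so the application runs its software implementation instead. */
            kernel_sw_version(args);
        }
    }

Because the fallback is handled in the application (or a support library), the OS is free to configure whichever subset of kernels it judges most valuable at any moment without breaking correctness.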

2.5 Existing Reconfigurable Computing Systems

Researchers have proposed many previous systems that coupled general-purpose processors with an RH fabric [133, 10, 130, 82, 79, 57, 51, 86, 7, 28]. There are two primary types of hybrid RH/SW systems: CPUs containing RH functional units, and CPUs with an attached RH coprocessor [30, 47]. A third type of system implements soft-core processors directly on the RH fabric [108]. Because these systems use the RH as either a coprocessor or as a functional unit, they can be classified accordingly. In this section I examine how processors communicate with the RH on existing reconfigurable computing systems.

Processors with an RH functional unit allow for the creation of custom instructions on the RH [133, 10, 130, 86, 79, 28]. These RH functional units are relatively easy to use in software, and communication tends to be “directly” handled through the processor’s register file. Because many of these devices do not allow the RH to directly access memory, the processor is needed to manage the RH’s input and output data [86, 130, 10, 133]. This limits the RH’s I/O bandwidth, and complicates sharing the RH across multiple processor cores.

Other reconfigurable computing systems incorporate the RH as a peer of the processor(s). In these systems, the CPU often uses the system bus or a peripheral bus to communicate with the RH [82, 57, 51, 21, 120, 121, 77]. Because the system must write data out on a bus, latency between the processor and RH is much greater than in a reconfigurable functional unit approach, resulting in larger setup overheads. Although these RH coprocessors work well for long-latency coprocessor operations, they cannot efficiently accelerate kernels with short execution times due to this overhead [51]. The work in this thesis focuses on the creation of an RH coprocessor rather than using the RH in a functional unit.

In some RH coprocessor systems, the RH can directly access the processor’s physical memory [82, 57, 51, 77]. In other systems, the RH accesses a private physical memory, or must communicate through buffers attached to the RH [21, 120, 121]. In both cases, a software routine produces and copies data into the RH’s address space, and then must consume the RH’s output data. In some systems, buffers in the RH’s address space can be used in a hardware pipeline to allow multiple RH kernels to share data independently of the SW. However, SW routines often provide data to the first hardware module and receive results from the last module in the pipeline. The overhead associated with SW routines producing and consuming these buffers can both slow down the system and increase its energy consumption.

Synchronizing the host processor(s) with the RH kernel(s) is also a difficult challenge. Although it is relatively simple to stall the host processor when executing an RH kernel, designing a system (and applications) that allow hardware and software to execute concurrently is more difficult [21, 7]. The SCORE system [21] facilitates such synchronization by using built-in buffers that automatically stall both hardware kernels and software threads when their input buffer is empty or their output buffer is full. Other work focuses on a more traditional thread-based synchronization model. The hthreads system [7] adds hardware-level support for using many of the synchronization primitives in the software pthreads library from a hardware thread, allowing hardware threads to communicate with SW threads and/or other hardware threads.
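A minimal sketch of the buffer-based stalling idea used by SCORE-style systems is shown below: a bounded FIFO connects a producer to a consumer, and each side simply blocks when the buffer is full or empty. The fixed depth, names, and pthread-based implementation are illustrative assumptions, not the SCORE design.

    /* Illustrative bounded FIFO between a software producer and a hardware
     * kernel (or vice versa): the writer stalls when the buffer is full and
     * the reader stalls when it is empty. */
    #include <pthread.h>
    #include <stdint.h>

    #define FIFO_DEPTH 16            /* arbitrary depth for this example */

    typedef struct {
        uint32_t        data[FIFO_DEPTH];
        int             head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_full, not_empty;
    } stream_fifo_t;

    void fifo_init(stream_fifo_t *f)
    {
        f->head = f->tail = f->count = 0;
        pthread_mutex_init(&f->lock, NULL);
        pthread_cond_init(&f->not_full, NULL);
        pthread_cond_init(&f->not_empty, NULL);
    }

    /* Producer side: blocks (stalls) while the buffer is full. */
    void fifo_push(stream_fifo_t *f, uint32_t value)
    {
        pthread_mutex_lock(&f->lock);
        while (f->count == FIFO_DEPTH)
            pthread_cond_wait(&f->not_full, &f->lock);
        f->data[f->tail] = value;
        f->tail = (f->tail + 1) % FIFO_DEPTH;
        f->count++;
        pthread_cond_signal(&f->not_empty);
        pthread_mutex_unlock(&f->lock);
    }

    /* Consumer side: blocks (stalls) while the buffer is empty. */
    uint32_t fifo_pop(stream_fifo_t *f)
    {
        pthread_mutex_lock(&f->lock);
        while (f->count == 0)
            pthread_cond_wait(&f->not_empty, &f->lock);
        uint32_t value = f->data[f->head];
        f->head = (f->head + 1) % FIFO_DEPTH;
        f->count--;
        pthread_cond_signal(&f->not_full);
        pthread_mutex_unlock(&f->lock);
        return value;
    }

In SCORE the equivalent stalling happens in hardware buffers rather than in a software library, but the effect is the same: hardware kernels and software threads pace one another without explicit synchronization code in the application.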
Several systems that share the processor bus also allow the RH to initiate DMA transfers, alleviating some of the burden from the main processor [82, 111]. One major drawback with DMA-like communication paradigms is that they either require OS intervention, or expose the processor’s physical memory directly to the RH kernels. Although these approaches can work on specialized systems, general-purpose systems require more flexibility. Prior research has suggested that the OS should facilitate hardware-software communication using dedicated hardware structures [127, 120, 40] or through a form of message passing [93]. Chapter 2.7.1 examines these methods in more detail.

2.6 Dynamic Reconfiguration

One of the biggest advantages RH has over traditional ASICs is that it can be dynamically reconfigured to implement new functionality. This capability has been used extensively to allow systems to adapt not only to changing standards (implementing RH accelerators for functionality unknown when the chip was originally fabricated), but also to the runtime behavior of the application(s) currently executing on the system.

Many commercially-available RH devices such as field programmable gate arrays (FPGAs) have been designed primarily as ASIC replacements. These devices are often used for hardware designs that were not economical to implement as custom ASICs, as well as for prototyping new hardware designs. Many of these systems are statically configured when powered on, only “updating” the hardware to fix potential bugs or add new functionality. Although these “static” reconfigurable devices have gained significant traction in the market, they do not fully exploit the reconfigurable nature of the device. Because of this, many FPGAs were difficult to efficiently use in a dynamically reconfigurable system. Many of these devices have little or no support for “partial” reconfiguration. Therefore, if the system wants to reconfigure any of the hardware on the device, it must first stop all execution on the RH, and reconfigure the entire device. This method of reconfiguration can greatly slow down the entire system if it only needs to swap out a single kernel on the device.

Although commercial support for dynamically reconfigurable devices has traditionally been minimal, many different research groups have proposed partially reconfigurable systems [57, 51, 133, 35]. Some of these systems examined the use of “relocatable” hardware blocks [29, 123, 73]. This allows previously placed-and-routed hardware modules to be configured in different positions on the device. On these systems, only a single configuration of the RH kernel is needed, rather than having separate configurations of the RH kernel for every possible placement on the device. More recently, commercial FPGA vendors started adding support for partial reconfiguration [131, 4], allowing portions of the FPGA to be reconfigured while other portions of the RH operate.

2.7 Operating System Support

2.7.1 OS Support For Communication

Wigley et al. [127] designed an early operating system (OS) for RH processors. This work suggests that the OS should provide an interface for communicating between application components. It also stated that this interface needs to exist in hardware for performance reasons. However, the implementation of such a feature was left for future work. Nollet et al. [93] provided OS support for using message passing between the processor and RH. The hardware determined the exact implementation of the message passing; however, they did not allow the RH to directly communicate with main memory.

Lubbers et al. [83] examined how an RH kernel could directly access portions of the OS’ API. Similarly, the HybridOS [73] system provided support for applications to use the OS’ APIs. In addition, this work examined the tradeoffs involved in using the OS to access the RH kernels. Similar to the HiPPAI work [111], they found that the overhead of the OS, and the data transfers required when using it, significantly slowed down the operation of RH kernels.

Some research has also examined how to allow RH coprocessors to access virtual memory. Vuletic et al. extended the Altera Excalibur rSoC [6] platform to support RH virtual memory access [121, 120]. In this work, the RH issued memory requests to the local address space of its parent process. A window management unit (WMU) cached multiple pages of the processor’s memory in a memory buffer attached to the RH. If the RH issued a memory request to an address that is not cached in this buffer, the WMU copied the entire page of data from memory into the buffer, facilitating a buffer communication scheme similar to that used in the SCORE system [21]. While virtual memory windows allow for more transparent buffer management than that used in the SCORE system [21], they still do not efficiently support random access to main memory due to the large costs associated with copying entire pages of memory. This coarse-grained (page-level) sharing of memory between processors and reconfigurable units simplifies the interface design, but results in larger memory latencies for applications that access small amounts of memory from different pages.
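To illustrate why random accesses are expensive in a window-based scheme, the following sketch models a WMU-style lookup in which a miss copies an entire page into a local buffer before a single word can be returned. The sizes, structure names, and round-robin replacement policy are assumptions for this example, not details of the Vuletic et al. design.

    /* Sketch of a page-window lookup: on a miss, a whole page is copied
     * into a small local buffer.  All names and sizes are illustrative. */
    #include <stdint.h>
    #include <string.h>

    #define PAGE_SIZE   4096u
    #define NUM_WINDOWS 4u

    typedef struct {
        uintptr_t page_base[NUM_WINDOWS];          /* page cached per window  */
        uint8_t   buffer[NUM_WINDOWS][PAGE_SIZE];  /* local copy of each page */
        unsigned  next_victim;                     /* round-robin replacement */
    } wmu_t;

    void wmu_init(wmu_t *wmu)
    {
        memset(wmu, 0, sizeof *wmu);
        for (unsigned i = 0; i < NUM_WINDOWS; i++)
            wmu->page_base[i] = (uintptr_t)-1;     /* no pages resident yet */
    }

    /* Stand-in for the DMA transfer that fills a window from main memory. */
    static void copy_page_from_memory(uint8_t *dst, uintptr_t page_base)
    {
        memcpy(dst, (const void *)page_base, PAGE_SIZE);
    }

    /* Return one 32-bit word at virtual address addr. */
    uint32_t wmu_read32(wmu_t *wmu, uintptr_t addr)
    {
        uintptr_t page   = addr & ~(uintptr_t)(PAGE_SIZE - 1);
        unsigned  offset = (unsigned)(addr & (PAGE_SIZE - 1));
        uint32_t  word;

        for (unsigned i = 0; i < NUM_WINDOWS; i++) {
            if (wmu->page_base[i] == page) {                  /* window hit */
                memcpy(&word, &wmu->buffer[i][offset], sizeof word);
                return word;
            }
        }

        /* Window miss: even a 4-byte request pays for a full 4KB page copy,
         * which is why fine-grained random access performs poorly here. */
        unsigned victim = wmu->next_victim;
        wmu->next_victim = (victim + 1) % NUM_WINDOWS;
        copy_page_from_memory(wmu->buffer[victim], page);
        wmu->page_base[victim] = page;
        memcpy(&word, &wmu->buffer[victim][offset], sizeof word);
        return word;
    }

A cache-line-granularity interface, by contrast, fetches only the 32 or 64 bytes containing the requested address, so scattered accesses do not incur page-sized transfers.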

Like the work from Vuletic et al. [120, 121], my work allows the RH to directly access virtual memory addresses, but at a cache-line granularity. Other recent research also gives coprocessors cache-line-level access to virtual memory. The HiPPAI project [111] added support to the RH coprocessor so that it could translate virtual memory addresses; however, this aspect of my work predated that project [43].

2.7.2 OS Support for Dynamic Reconfiguration

There are many different ways that a system can allocate the RH fabric amongst RH kernels. In Resano et al. [103], applications issued “hints” so that the system could prefetch RH kernels before using them. This method relies on the programmer knowing which kernels will be used next and knowing how much RH is available on the system. Additionally, it does not scale well to systems where multiple processors share an RH fabric. In my work, the OS is responsible for selecting which RH kernels should be configured on the RH. My work uses the RH scheduler designed in Fu et al. [39, 42, 41, 40]. This scheduler periodically evaluates which RH kernels should be loaded in the following time interval by examining applications’ recent past behavior [42]. The scheduler assumes that application behavior in the next interval will be similar to the previous one, and therefore attempts to select the kernels that would have provided the greatest benefit during the previous scheduling interval. When intervals are short (on the order of milliseconds), application behavior for the upcoming interval is likely to be similar to its behavior in the previous interval due to the stability of program phases [42]. This scheduler dynamically profiles the execution usage of all kernels, and uses a knapsack-based algorithm [72] to determine which kernels would provide the most value if configured on the RH. The scheduler uses an exact knapsack solver because the problem size is sufficiently small that it does not create significant overhead; additionally, due to the constraints of the problem, the exact solution can be found in pseudo-polynomial time. In this algorithm, the cost of a kernel is equal to the amount of RH that it occupies. Fu et al. [40] examined several functions to calculate the “value” of a kernel. At the time, the best-performing value function measured the application speedup that a kernel would provide, assuming all other kernels executed in software. This scheduler used the number of times software called a kernel, its execution time in software, and its execution time in RH to estimate the application speedup that would be obtained when using a given RH kernel. In Chapter 10, I propose a new RH kernel scheduler optimized for multicore reconfigurable computing platforms.
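To make the selection step concrete, the following is a minimal sketch of an exact 0/1 knapsack solver in the spirit described above, where a kernel’s cost is its tile count and its value is an estimated benefit from the previous interval. The kernel structure, the example numbers, and the function names are illustrative assumptions, not the exact formulation or value function used by Fu et al.

#include <stdio.h>

#define MAX_TILES   64
#define MAX_KERNELS 32

/* Illustrative kernel descriptor: cost = RH tiles occupied, value = estimated
 * benefit (e.g., cycles saved) observed over the previous scheduling interval. */
struct kernel {
    int    tiles;
    double value;
};

/* Exact 0/1 knapsack solved by dynamic programming over the tile capacity.
 * Runs in O(n * capacity) time (pseudo-polynomial), which is cheap for the
 * small problem sizes described in the text. */
static double select_kernels(const struct kernel *k, int n, int capacity,
                             int picked[MAX_KERNELS])
{
    static double best[MAX_KERNELS + 1][MAX_TILES + 1];

    for (int i = 1; i <= n; i++) {
        for (int c = 0; c <= capacity; c++) {
            best[i][c] = best[i - 1][c];                 /* skip kernel i-1 */
            if (c >= k[i - 1].tiles) {
                double with = best[i - 1][c - k[i - 1].tiles] + k[i - 1].value;
                if (with > best[i][c])
                    best[i][c] = with;                   /* take kernel i-1 */
            }
        }
    }
    /* Backtrack to recover which kernels were selected. */
    for (int i = n, c = capacity; i > 0; i--) {
        picked[i - 1] = (best[i][c] != best[i - 1][c]);
        if (picked[i - 1])
            c -= k[i - 1].tiles;
    }
    return best[n][capacity];
}

int main(void)
{
    /* Hypothetical profile data: three kernels competing for a 9-tile fabric. */
    struct kernel k[] = { { 5, 120.0 }, { 3, 90.0 }, { 4, 70.0 } };
    int picked[MAX_KERNELS] = { 0 };
    double total = select_kernels(k, 3, 9, picked);
    printf("selected value %.1f:", total);
    for (int i = 0; i < 3; i++)
        if (picked[i])
            printf(" kernel%d", i);
    printf("\n");
    return 0;
}

In this toy instance the solver keeps the two highest-value kernels (8 of 9 tiles) and leaves the third to execute in software, which is exactly the kind of trade-off the scheduler makes each interval.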

2.8 Summary

This chapter examined many of the previous reconfigurable computing systems, and the methods that the general-purpose processor(s) use to communicate with the RH and vice versa. The works covered in this section provide the background needed to understand the design decisions made for the reconfigurable computing systems proposed in this thesis. Many of the chapters in this thesis perform a more detailed examination of prior work relevant to the experiments in that chapter.

Chapter 3

System Model

This chapter describes a new RH coprocessor architecture, and the basics of how it operates. Section 3.1 gives a brief overview of the system. Section 3.2 describes the design of the general-purpose processor(s) used in this system. Section 3.3 describes the organization of the RH coprocessor. Next, this chapter describes the two ways that the CPU(s) and RH coprocessor communicate. Section 3.4 describes how the RH coprocessor accesses the shared memory hierarchy, and Section 3.5 describes how CPU(s) can directly load/store values to the RH coprocessor. Section 3.6 describes the simulation platform used to develop and test the reconfigurable computing system. Finally, Section 3.7 discusses some of the shortcomings of the proposed reconfigurable computing model, and examines future work that could be done to better refine the system.

3.1 System Overview

In this thesis, I propose a reconfigurable computing system that directly couples one or more processor cores with a reconfigurable fabric. Figure 3.1 illustrates how two general-purpose processors can share an RH fabric in my proposed system. In this model of reconfigurable computing, the RH acts as a coprocessor. However, unlike many traditional coprocessors, the RH interconnects with the processor using a low-latency interconnect, and user applications can directly communicate with the coprocessor. This new platform shares properties of coprocessors that communicate via the memory bus, as well as those of functional units within a CPU core. Most communication between the RH and CPUs is done using a shared memory hierarchy. This memory is coherent between the CPU(s) and RH; however, RH kernels executing on the RH coprocessor can only access data in their virtual memory address space, preventing them from directly accessing physical memory. In addition to this shared virtually-addressable memory, the processor cores have a limited direct communication path to the RH, allowing applications to query and set up RH kernels (amongst other functions).

Figure 3.1 High-level model of a chip containing two general-purpose CPU cores, and a shared RH fabric.

3.2 Processor Model

The microprocessor design used in my system is similar to those currently found in high-end embedded systems such as the ARM Cortex A9 or the Octeon MIPS64 core [9, 22], both of which can contain multiple cache-coherent processor cores. Due to the constraints of the simulation platform (see Section 3.6), I simulate an UltraSparc processor core instead of the ARM or MIPS architectures more commonly found in embedded systems. However, the simulated processor(s) perform similarly to the referenced high-end embedded processors. Table 3.1 shows the configuration parameters of the simulated UltraSparc processor cores. All of the simulations performed for this thesis use this model, unless otherwise specified. The systems had enough DDR2 RAM to contain the working sets of all the executing applications, avoiding swapping memory out to disk. The L2 cache uses a point-to-point network to connect it to the CPU cores, as well as the RH controller. This means every core (as well as the RH controller) has a fixed L2 latency, regardless of the number of cores in the system.

Frequency: 900 MHz
L1 D/I-Cache size: 32 KB
L1 Associativity: 4-way
L1 Latency: 2 cycles
Unified L2 cache size: 2 MB
L2 Associativity: 16-way
L2 Latency: ∼16 cycles
Cache line size: 64 bytes
Memory Model: 450 MHz DDR2
Issue Width: 2
Instruction Window Entries: 32

Table 3.1 Simulated Processor Configuration.

3.3 Reconfigurable Coprocessor

The RH coprocessor consists of two primary components: a set of RH tiles that can be configured independently, and an RH controller that allows the RH tiles to communicate with the rest of the system. RH kernels occupy one or more RH tiles. Each RH tile can be configured independently; however, the system may only configure one tile at a time (all other requests are queued). The tiles are interchangeable in that all of the tiles in the system appear the same; therefore a placed-and-routed circuit can be configured to occupy any set of RH tiles in the system. RH kernels configured on the tiles use the concept of “virtual hardware”, which allows the OS to select whether a kernel is executed on the RH coprocessor or executes in SW [17].

The RH tiles are shared amongst all of the processors in the system, and the advantages of sharing an RH fabric are fully elaborated in Chapter 11. The work in this thesis does not concern itself with the detailed design of the actual RH fabric, and instead focuses on how multiple processor cores can best use the shared RH fabric. Because of this, the architecture of an RH tile is not limited to just FPGA-like hardware [30, 47], but could alternatively consist of elements of GPUs [100, 25], custom arrays of processors [19], vector processors, or any other type of “configurable” heterogeneous resource. In this thesis, I modeled the RH coprocessor as an FPGA-like device to obtain performance/area estimates, basing the RH tiles on the fabric found in the Xilinx Virtex-5 FPGA. Each tile contains 256 Virtex-5 slices, one Virtex-5 DSP unit, and two Virtex-5 block RAMs. These resources correspond to roughly 1% of the Xilinx Virtex-5 LX155 FPGA [131]. We synthesized and placed-and-routed each RH kernel on the Virtex-5 device to obtain both the maximum clock frequency at which it can run and the number of tiles it occupies. Figure 3.2 shows how the system might configure three RH kernels simultaneously on the RH coprocessor. Although the kernels in the figure are “contiguous” on the RH fabric, this is done for clarity only, and the system imposes no restrictions on the placement of RH kernels on the RH tiles. Each tile has its own independent clock, allowing RH kernels with different clock rates to execute simultaneously. A series of FIFOs that can operate across multiple clock domains [8] handles synchronization between the RH kernels and the rest of the system. Section 3.4 describes these FIFOs in more detail. Although we do not consider all of the potential issues involved with the independent placement of kernels, or the organization of the FIFOs used to communicate between the RH kernels and the rest of the system, Section 3.7 examines these problems in more detail and proposes potential solutions.
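As a concrete illustration of the tile-granularity area model, the sketch below estimates how many tiles a synthesized kernel would occupy from its resource counts. The max-over-resources rule and the example numbers are assumptions used for illustration; the thesis derives tile counts from actual place-and-route results on the Virtex-5.

#include <stdio.h>

/* Per-tile resources, modeled on the Virtex-5-based tiles described above. */
#define SLICES_PER_TILE 256
#define DSPS_PER_TILE   1
#define BRAMS_PER_TILE  2

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Hypothetical helper: the tile count is driven by whichever resource the
 * synthesized kernel exhausts first (assumed mapping, for illustration). */
static int tiles_needed(int slices, int dsps, int brams)
{
    int t = ceil_div(slices, SLICES_PER_TILE);
    if (ceil_div(dsps, DSPS_PER_TILE) > t)
        t = ceil_div(dsps, DSPS_PER_TILE);
    if (ceil_div(brams, BRAMS_PER_TILE) > t)
        t = ceil_div(brams, BRAMS_PER_TILE);
    return t;
}

int main(void)
{
    /* e.g., a kernel using 1100 slices, 2 DSPs, and 3 BRAMs would span 5 tiles. */
    printf("%d tiles\n", tiles_needed(1100, 2, 3));
    return 0;
}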

Figure 3.2 Interconnection of the memory hierarchy, stream controllers, and the configured RH kernels on a shared RH fabric.

3.4 Shared Memory Communication

Communication between the RH kernels and the processor core(s) occurs primarily through a cache-coherent shared memory hierarchy using a MOESI protocol. This model ensures that all coherent nodes see the same data in memory [59]. Most contemporary multiprocessor systems communicate using a shared memory. Although alternative communication schemes can get by using only message passing, on a multicore platform this limits the flexibility of the system. Cache-coherent systems can easily and efficiently implement message passing APIs, but the reverse is not true [58, 70].

In addition to supporting shared memory, all of the memory that the RH kernels can access is virtually addressable. Because of this, programmers do not have to explicitly transfer data into dedicated buffers, or worry about whether data is contiguous in physical memory. Using virtual memory provides process isolation because RH kernels cannot directly access physical memory, preventing kernels from accessing data owned by another process or the OS. Most modern operating systems for both high-end embedded systems and desktop computing systems require process isolation. Although other methods exist for providing process isolation, virtual memory is the most common method in modern systems.

An RH controller facilitates all communication between the RH kernels and the CPUs or the memory hierarchy. RH kernels executing on the RH fabric use the RH controller to issue memory requests to the virtually-addressable, cache-coherent memory subsystem. In most of the tested systems, the RH controller communicates directly with its own private L1 cache; however, the RH controller can interface with the memory hierarchy at any point. Kernels on our system use a “streaming” model of computation, similar to the model proposed in the SCORE system [21]; however, the streams read/write to their parent process’ virtual address space, and not directly to other kernels or scratchpad memories.

RH kernels communicate with the RH controller using a stream controller that generates memory addresses and buffers data for each “stream” of data that the kernel requires. Figure 3.2 shows how the stream controllers interact with the rest of the system. The stream controllers were designed for reading windows of data, representing one- or two-dimensional arrays of data in memory. The stream controllers’ address generators are based on the ones in the MoM-2 system [61], where they proved to be useful for many different streaming and media processing applications. The current set of possible address generator parameters has been sufficient for all of the existing benchmarks; if future kernels require more complex addressing patterns, modifications would need to be made to the design of the address generator [61, 52, 81, 53]. Application software is responsible for “programming” the stream controller’s address generation units. Chapter 4.2 provides details on how applications both configure and use these stream controller parameters.

Each tile in the system contains a single stream controller capable of loading/storing data from one of five different input/output streams (five FIFOs). The stream controller’s address generation logic is time-multiplexed with each of its five attached FIFOs; only one of the FIFOs can issue a request to the RH controller per (processor) clock cycle. Each request can be up to 32 bytes in length, and does not have to be word-aligned. Each stream can have up to eight entries queued within it (eight-deep FIFO).
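To illustrate the windowed access pattern, the sketch below generates the sequence of (address, length) requests that a two-dimensional window would produce. The parameter names are simplified assumptions; the real stream controller exposes 13 parameters (Chapter 4.2) and its address generators follow the MoM-2 design [61].

#include <stdint.h>
#include <stdio.h>

#define MAX_REQ_BYTES 32   /* maximum size of a single stream request */

/* Illustrative parameters for a two-dimensional window of memory. */
struct stream_params {
    uint64_t base;        /* virtual address of the first element */
    uint32_t row_bytes;   /* bytes to fetch per row of the window */
    uint32_t row_stride;  /* distance in bytes between row starts */
    uint32_t rows;        /* number of rows in the window         */
};

/* Emit the (address, length) pairs the window generates.  Each row is broken
 * into requests of at most MAX_REQ_BYTES; requests need not be word aligned. */
static void generate_requests(const struct stream_params *p)
{
    for (uint32_t r = 0; r < p->rows; r++) {
        uint64_t addr = p->base + (uint64_t)r * p->row_stride;
        uint32_t left = p->row_bytes;
        while (left > 0) {
            uint32_t len = left < MAX_REQ_BYTES ? left : MAX_REQ_BYTES;
            printf("request addr=0x%llx len=%u\n",
                   (unsigned long long)addr, len);
            addr += len;
            left -= len;
        }
    }
}

int main(void)
{
    /* An 8x8 window of bytes out of a hypothetical 640-byte-wide frame. */
    struct stream_params p = { 0x100000, 8, 640, 8 };
    generate_requests(&p);
    return 0;
}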
Not all requests can be serviced immediately. The RH controller can service up to two memory requests per cycle; however, each stream controller can only issue a single memory request per cycle. Because the RH kernels execute at a slower clock frequency than the stream controllers, a stream controller can issue memory requests from multiple streams during a single kernel clock cycle, limiting the advantage of servicing more requests per cycle. Section 3.7.1 discusses some of the potential problems with the stream controller organization, and suggests possible future work to better optimize the hardware. Figure 3.2 illustrates how the RH kernels communicate with the rest of the system. In this example, there are three kernels configured on the RH fabric, using a total of nine tiles. A system with nine tiles has nine available stream controllers; for simplicity only the three that are needed (one per kernel) are shown. Kernel A has four input streams and one output stream, Kernel B has a single input and a single output stream, and Kernel C contains two input streams and a single output stream. Because each stream controller can service up to five streams, only one stream controller is needed per kernel. During each processor clock cycle, the RH controller can start processing up to two memory requests from the various stream controllers. A round-robin scheduler selects which RH kernel (amongst those that are active) is allowed to issue a request. This method ensures that all kernels have “equal” access to memory. The RH controller can queue up to sixteen outstanding memory requests at a time. When the queue is full, RH kernels cannot issue memory requests until at least one pending request completes.

Figure 3.3 Illustration of the ring-based network used for direct communication between the RH controller and the CPUs on an 8-processor system. Each R corresponds to a router capable of forwarding packets to any of its neighbors.

The RH controller also performs additional operations on each request before forwarding it to the cache. First, the controller translates the virtual memory address into a physical memory address. Chapter 4 outlines how these translations are performed. The controller then determines if the request straddles a cache line. If it does, a cache alignment unit splits the request into multiple load or store operations that it will issue to the cache. When a cache request is serviced and returned to the RH controller, the cache alignment unit combines data from multiple loads (if necessary) into a single data unit, aligns the data, and sends it back to the appropriate stream buffer.
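A minimal sketch of the splitting rule, assuming the 64-byte line size from Table 3.1; the real alignment unit also recombines and aligns the returned data, which is omitted here.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64u   /* cache line size from Table 3.1 */

/* Split one (possibly unaligned) stream request into the per-cache-line
 * accesses the alignment unit would issue. */
static void split_request(uint64_t paddr, uint32_t len)
{
    while (len > 0) {
        uint32_t offset = (uint32_t)(paddr & (LINE_SIZE - 1));
        uint32_t chunk  = LINE_SIZE - offset;   /* bytes left in this line */
        if (chunk > len)
            chunk = len;
        printf("cache access: line 0x%llx, offset %u, %u bytes\n",
               (unsigned long long)(paddr & ~(uint64_t)(LINE_SIZE - 1)),
               offset, chunk);
        paddr += chunk;
        len   -= chunk;
    }
}

int main(void)
{
    split_request(0x1038, 32);   /* straddles the 0x1000 and 0x1040 lines */
    return 0;
}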

3.5 Direct Communication

The previously-described shared memory access mechanism for CPU-RH communication works well for long-latency data transfers initiated by the RH kernels, but cannot efficiently handle all of the communication between the CPU and the RH kernels. Therefore I added support for an additional communication mechanism that uses memory-mapped I/O; this I/O space allows applications to directly issue commands and submit queries to an RH kernel. This form of communication is primarily used to initialize RH kernels, query whether a kernel is present on the RH fabric, and determine if an RH kernel has finished executing. A dedicated on-chip ring network handles the direct communication between the CPUs and the RH kernels [14]. Figure 3.3 illustrates what this network looks like. Each link on the network takes one cycle to traverse, and each packet spends at least one cycle at each router (designated by an R) to be forwarded. Each link on the network is bidirectional, and only one packet can traverse each link (in each direction) at a time. To simplify the design of the network, each router has unlimited buffers; however, in practice this network is very lightly loaded, and buffering along the links is rarely used. Each node on the network can only send out a single packet each cycle. Likewise, each node can only receive a single packet every cycle. This effectively limits the rate at which the RH controller can respond to requests from the processor cores.

All loads and stores issued from a CPU to the RH controller’s address space are sent over this network. This model assumes that stores to the RH controller require traffic in only one direction, while loads must traverse the links to the RH controller, spend a cycle being processed, and then send the resultant data back across the network. In addition to allowing applications to query the status of RH kernels, this network can also be used to read back a single-word value from an RH kernel that has finished its execution. Chapter 6.6 examines the impact that direct access to RH kernels has on the execution time of RH kernels, and describes how this is implemented. Although such a network might be overkill for communication on one- and two-processor systems (where a point-to-point network would be simpler and provide better performance), on four- and eight-processor systems some form of network would likely be necessary. Additionally, on real systems, it is likely that this network would be combined with the existing cache coherency networks, making better use of limited resources. However, the implementation and analysis of such a system is left for future work.
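A rough latency model for this network is sketched below, using the stated one cycle per link and one cycle per router. Treating the nine nodes of Figure 3.3 as a closed ring, and whether the source and destination routers each add a forwarding cycle, are assumptions of this sketch rather than statements about the simulated network.

#include <stdio.h>

#define LINK_CYCLES   1   /* one cycle per link traversal           */
#define ROUTER_CYCLES 1   /* at least one cycle to forward a packet */
#define NODES         9   /* RH controller plus eight CPUs          */

/* Shortest hop count around the ring between two node positions. */
static int ring_hops(int a, int b)
{
    int d = a > b ? a - b : b - a;
    return d < NODES - d ? d : NODES - d;
}

/* Rough one-way latency: every hop crosses a link and is forwarded once. */
static int one_way(int hops)
{
    return hops * (LINK_CYCLES + ROUTER_CYCLES);
}

int main(void)
{
    /* Node 0 is the RH controller; nodes 1..8 are CPUs 0..7. */
    for (int cpu = 0; cpu < 8; cpu++) {
        int h = ring_hops(0, cpu + 1);
        /* A load: request to the controller, one processing cycle, reply back. */
        printf("CPU %d: store ~%d cycles, load ~%d cycles\n",
               cpu, one_way(h), 2 * one_way(h) + 1);
    }
    return 0;
}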

3.6 Simulation Platform

To perform the experiments in this thesis, I developed Alexandrite, a custom full-system simulation platform that models the proposed reconfigurable computing system. Some of the initial infrastructure was provided by Fu et al. [39]. This early platform concentrated on measuring the impact of scheduling RH kernels on a single-core processor. Because of this, the modeling of the RH kernels, and of the communication between the RH kernels and the rest of the system, was of secondary importance. I have greatly expanded upon this infrastructure, concentrating on providing a more detailed memory interface that models the interactions that a multi-core system might have with a reconfigurable coprocessor. I initially developed Alexandrite to model an RH coprocessor on a single-core platform [43, 45], but later expanded it to model multicore systems [44, 46]. This simulation platform models the performance of single-chip systems containing one or more UltraSparc processors, the reconfigurable hardware, the RH kernels’ accesses to main memory, and the direct communication between the CPU(s) and the RH controller. Alexandrite uses the Simics [84] platform to provide functional accuracy of the host processor(s). GEMS 2.1 [87] provides Simics with extensions that model the timing of the out-of-order superscalar processor(s) (Opal) as well as the cache-coherent memory hierarchy (Ruby). The simulator used the Alexandrite module in conjunction with GEMS to simulate the behavior of the processor cores, the cache-coherent memory hierarchy, the RH controller, the stream controllers, and the configured RH kernels. Alexandrite models the timing of the RH kernels by using Verilator [110], which translates the Verilog kernels into steppable C++ modules. Verilated modules functionally simulate the kernel faster than would be possible using event-based simulation of Verilog code. The Verilated modules include timing hooks, allowing the simulator to step the hardware through its operation. Verilated modules also include communication hooks that interface with the modeled stream controllers, providing data to the RH kernels. Alexandrite issues these memory requests to Ruby’s timing-accurate cache-coherent memory model.

Figure 3.4 Block diagram of the Alexandrite reconfigurable computing simulator.

An overview of the execution flow of the simulator can be seen in Figure 3.4. Because the clock frequencies of RH kernels are generally slower than the clock frequency of the host processor(s), Alexandrite ticks the RH kernel’s clock forward less frequently than the clock of the host processor(s). For example, if the RH kernel executes at 300 MHz and the host processor executes at 900 MHz, Alexandrite will tick the RH kernel once every three CPU cycles. This allows Alexandrite to accurately model the relative execution speeds of the RH and software. The exact clock multiplier depends on the achievable clock frequency of the synthesized RH kernel.
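A minimal sketch of this clock-ratio ticking, assuming an integer ratio between the two frequencies for simplicity; the structure and names are illustrative, and in the real simulator the tick steps a Verilator-generated C++ module rather than printing.

#include <stdio.h>

/* Drive the RH kernel model at the right ratio relative to the host CPU clock. */
struct rh_kernel_model {
    int cpu_mhz;      /* e.g. 900 */
    int kernel_mhz;   /* achievable frequency of the synthesized kernel */
    int countdown;    /* CPU cycles remaining until the next kernel tick */
};

static void cpu_cycle(struct rh_kernel_model *m)
{
    if (--m->countdown == 0) {
        m->countdown = m->cpu_mhz / m->kernel_mhz;   /* e.g. 900/300 = 3 */
        /* eval_kernel_one_cycle(m);  -- step the Verilated module here */
        printf("kernel tick\n");
    }
}

int main(void)
{
    struct rh_kernel_model m = { 900, 300, 3 };   /* countdown starts at the ratio */
    for (int c = 0; c < 9; c++)                   /* nine CPU cycles -> three ticks */
        cpu_cycle(&m);
    return 0;
}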

3.7 System Limitations

Not all of the problems encountered when creating a platform from scratch could be adequately solved in my research. Two unsolved problems remain that need to be addressed for this system to be implementable. The first is the design and placement of the stream controllers that transmit data between the RH kernels and the RH controller, and the second, related problem involves the placement of RH kernels that span multiple RH tiles. This section examines these problems in depth, proposing multiple potential solutions for each.

3.7.1 Stream Controller Allocation

In the current system, each stream controller contains an address generator and five FIFOs that contain eight entries of up to 32 bytes in size. Each RH tile in the system contains a single stream controller, simplifying the interconnection between a tile and the stream controller. However, directly mapping a stream controller to each RH tile results in an inefficient allocation of resources. All of the RH kernels used in this work require a very limited number of input and output streams. Most only required one or two input streams and a single output stream (only one of the RH kernels required more than two input streams). These requirements are unrelated to the size (in tiles) of the RH kernel in question; a kernel occupying 13 tiles is likely to have the same number of input and output streams as a kernel occupying a single tile. In these situations, many of the stream controller resources go unused, resulting in wasted resources. Additionally, although a handful of RH kernels use streams with 32-byte data entries (the maximum data size of the FIFOs), most require only a fraction of this, many accessing only eight bytes of data at a time. These factors result in a large portion of the FIFOs’ SRAM bits going unused. This system would also require more I/O “pins” between an RH tile and its associated stream controller than could realistically be placed on the chip. The current system would require 1280 data “pins” to connect each stream controller with its associated RH tile, despite the fact that none of the examined RH kernels use more than 512 data “pins” (and this is for a kernel that occupies five RH tiles). The most “pins” used by an RH kernel that occupies only a single RH tile is 320; therefore, the vast majority of data “pins” go unused, resulting in a large amount of wasted resources.
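The 1280-pin figure follows directly from the stream controller's capacities, assuming every one of the five 32-byte FIFO data paths is wired to the tile in parallel:

\[
5 \ \text{streams} \times 32 \ \tfrac{\text{bytes}}{\text{stream}} \times 8 \ \tfrac{\text{bits}}{\text{byte}} = 1280 \ \text{data ``pins'' per tile.}
\]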

In a real system, a more efficient allocation of stream controllers and data wires would be required. The current system assumes that the stream controllers’ FIFOs are implemented using “hard” (non-reconfigurable) logic; however, it might be better to use the block RAMs associated with each RH tile to implement the FIFOs (these block RAMs would require some modification to support the asynchronous access required for data to travel between the RH controller’s and the RH tile’s clock domains [8]). In such a system, a single “hard” address generator could be associated with each RH tile, or each tile could use a “soft” address generator that uses resources on the RH (such as the DSP block). In this scenario, a limited number of “pins” would be available to connect the RH controller to the RH tile. Logic to select which “stream” the data is from could either be provided on separate “pins”, or each stream could have its own unique set of data “pins”. Another potential optimization to the stream controllers would be the creation of a “reconfigurable” FIFO. Many kernels do not require a FIFO that operates on 32-byte data entries. Instead, smaller FIFOs could be linked together to create a single larger FIFO. This optimization could help deal with the wasted space in each FIFO, but would not address the problem of unused stream controllers associated with each RH tile. Although these solutions can make for a more efficient stream controller, they still have their own forms of waste. Another potential solution would be to have a limited pool of stream controllers that can be used at a given time. When an RH kernel is initialized, it would have to specify not only how many tiles it requires, but also how many stream controller resources it requires. Later, when the OS determines which RH kernels should be loaded onto the RH fabric, it would have to take into consideration both the number of tiles and the number of stream controller resources required by an RH kernel. Alternatively, the OS could be unaware of the stream controller allocation, and instead a static pool of stream controllers could be available for use by all of the RH kernels that are currently configured. The RH controller could dynamically allocate these stream controllers to an RH kernel when it is first requested. If an RH kernel is requested when there are not enough stream controllers available to support it, the system would either have to delay execution of the RH kernel or execute a software alternative. Because many RH kernels are executing for only a fraction of the time they are configured (see Chapter 5.1), this could result in a better allocation of resources, although it would likely come with its own share of problems. Much future work is needed to investigate how to allocate the stream controllers to the RH kernels. The work in this thesis focuses on the viability of the platform as a whole, and so it ignores some of the implementation details involved in stream controller allocation, relying instead on a simplistic model that is able to estimate the performance of the RH kernels and their associated applications.

3.7.2 RH Kernel Placement

In the current system, RH kernels can occupy any set of tiles on the RH fabric. However, implementing this on a real system would require a crossbar to interconnect all of the tiles on the system. Although this is feasible for systems with a small number of tiles, it would likely be infeasible for larger systems. Better ways to partition RH kernels across multiple tiles and place them on the tiles are therefore needed. Although much research has targeted the placement, routing, and packing of separate circuits on an FPGA, this is often done as an offline analysis [126]. Real-time reconfiguration of devices is currently in its infancy. Newer FPGAs have tools that allow for the reconfiguration of portions of the chip, but these tools often require a separate place-and-route for every position on the chip where a circuit might be located [126, 125, 123], although other research has examined the issues involved with moving pre-placed-and-routed circuits to a different portion of a reconfigurable device [91, 54]. Unlike typical FPGA systems, the RH computing system described in this thesis is partitioned into multiple tiles that can be individually configured. This is similar to prior work that partitioned single circuits across multiple FPGAs [99]. A similar procedure could be used for on-chip tiles; however, this would complicate the tool flow, as partitioning must be considered. Additionally, it is not currently known how much bandwidth would be required between the RH tiles on the chip, or how best to organize them [122]. One potential solution requires that the tiles be laid out in a one-dimensional array, with each RH kernel occupying a contiguous set of tiles. However, depending on how many tiles are available on the system and the average number of tiles each RH kernel occupies, this could result in fragmentation of the RH fabric, and require the OS to “move” RH kernels that are currently configured to make room for more RH kernels. Depending on the system design, moving an RH kernel could require a complete reconfiguration of the RH kernel, making the RH kernel temporarily inaccessible. Obtaining the best performance in such a system might require a more advanced OS scheduling algorithm that takes into account the current placement of kernels, and the number of tiles that would be displaced, when deciding on a new schedule. Another potential solution to this new scheduling problem is to use a thresholding algorithm to prevent the OS from reconfiguring the RH fabric if a new allocation of kernels does not obtain a large enough speedup over the current allocation. The scheduler could then periodically perform a “global” schedule that does not use thresholding to select the best RH kernel selection/placement. Such a thresholding scheme would likely improve performance on the current system; however, if the system required RH kernels to be placed in adjacent tiles, the threshold value would need to be larger to avoid thrashing of the RH fabric. Although this would be an imperfect solution, depending on the workload, such a system might perform similarly to the current system. Evaluating the best way to handle the communication between tiles, and the placement of RH kernels on the tiles, requires extensive future work. However, these problems are not insurmountable. Much of the work presented in this thesis was done to first justify the use of an RH computing system, and also to show that the memory and system arrangement is feasible for future multicore systems.
Without this justification in place, there is little incentive to examine the best ways to handle inter-tile communication and the placement of RH kernels on the tiles.
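As a small illustration of the fragmentation problem raised by the contiguous-placement variant above, the following sketch uses a first-fit allocator over a one-dimensional array of tiles. The tile count, kernel sizes, and helper names are hypothetical.

#include <stdio.h>

#define NUM_TILES 16

/* First-fit placement for kernels that must occupy adjacent tiles.
 * occupied[i] != 0 means tile i is in use. */
static int place_contiguous(int occupied[NUM_TILES], int ntiles)
{
    for (int start = 0; start + ntiles <= NUM_TILES; start++) {
        int free = 1;
        for (int i = start; i < start + ntiles; i++)
            if (occupied[i]) { free = 0; break; }
        if (free) {
            for (int i = start; i < start + ntiles; i++)
                occupied[i] = 1;
            return start;              /* first tile of the placement */
        }
    }
    return -1;   /* no contiguous run of ntiles free tiles exists */
}

int main(void)
{
    int tiles[NUM_TILES] = { 0 };
    int a = place_contiguous(tiles, 5);   /* tiles 0-4  */
    int b = place_contiguous(tiles, 4);   /* tiles 5-8  */
    int c = place_contiguous(tiles, 6);   /* tiles 9-14 */
    /* Deconfigure kernel B: its four tiles become free again. */
    for (int i = b; i < b + 4; i++)
        tiles[i] = 0;
    /* Five tiles are now free (5-8 and 15), but they are not adjacent, so a
     * five-tile kernel cannot be placed without "moving" kernel A or C. */
    printf("a=%d b=%d c=%d d=%d\n", a, b, c, place_contiguous(tiles, 5));
    return 0;
}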

Chapter 4

Application Programming Interface

The use of a reconfigurable hardware (RH) coprocessor requires a rich interface to enable communication between the user applications, the RH kernels, the RH controller, and the operating system (OS). This chapter describes the communication mechanisms used to communicate between the RH and the processor(s), discusses the security issues involved in the communication, and discusses the mechanisms used to ensure that consecutive packets are received in the order that they are sent. Access to physically configured RH kernels is abstracted using the concept of “virtual” RH kernels [32, 40, 39]. Virtual/physical RH kernels are similar to virtual memory; however, unlike virtual memory, when a requested kernel is not configured on the RH fabric, the software (SW) can instead elect to execute a SW-only version of the RH kernel. This avoids the long latencies associated with reconfiguring the device. When the OS configures a physical RH kernel, it updates the associated virtual kernel so that it maps to the configured physical kernel. Section 4.3 describes how the OS configures a physical RH kernel and performs the mapping of virtual to physical kernels. In Chapter 9, a single physical kernel can be shared by multiple virtual RH kernels; therefore, the mapping of physical to virtual RH kernels is not always one-to-one. All communication between a user application and the RH fabric takes place using virtual RH kernels; the RH controller then maps these actions onto the appropriate physical kernel (if applicable). This limits the application’s access to the physical hardware while still allowing user applications to obtain the status of an RH kernel, set up the RH kernel, tell it to execute, and determine if the physical RH kernel is still executing. In contrast, the OS is given full access to all of the virtual and physical RH kernels, allowing trusted access to the physical hardware.

4.1 Memory-Mapped I/O Segments

The CPU(s) use memory-mapped I/O to access the RH controller, as well as the actual RH kernels. Memory-mapped I/O is commonly used to communicate with devices; however, this communication is normally done through the OS. In this thesis, I use memory maps to allow user applications “direct” access to RH kernels that they have requested. This is similar to the work done in the SHRIMP project [15, 34]. Multiple memory segments are defined in this work to allow for process separation. Separating the memory into segments allows the OS to perform actions on RH kernels that cannot be performed by individual applications.

This means user applications can only access a subset of the operations that the RH controller can handle. Additionally, these memory segments mean that an application can only access the virtual RH kernels assigned to it. At least two different memory segments are needed to support both physical and virtual RH kernels. However, in this work, it was found that using four separate memory segments allows for a much cleaner application programming interface (API). These memory segments are the global segment, the physical segment, the virtual segment, and the virtual OS (VOS) segment. Each of these segments is of a different size, depending on how many entries are required. However, because the page size on the UltraSparc platform is 8KB, the memory segments are divided on 8KB boundaries. Although this might seem larger than necessary (when taking into consideration the number of required entries), these memory segments only correspond to a portion of the processor’s address space, and do not have to correspond to physical memory.
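One possible way to lay out these segments is sketched below. Only the four-segment split and the 8KB (UltraSparc page) granularity come from the text; the base address, segment order, and entry counts are assumptions for illustration.

#include <stdint.h>

#define RH_PAGE_SIZE        0x2000ULL        /* 8KB pages                       */
#define RH_MMIO_BASE        0xFFF0000000ULL  /* hypothetical base address       */

#define RH_MAX_PHYS_KERNELS 64    /* at most one per tile (assumed tile count)  */
#define RH_MAX_VIRT_KERNELS 256   /* hard limit set by the hardware (assumed)   */

/* Global segment: one page of controller-wide attributes. */
#define RH_GLOBAL_SEG       (RH_MMIO_BASE)

/* Physical segment: one page per physical kernel slot. */
#define RH_PHYS_SEG         (RH_GLOBAL_SEG + RH_PAGE_SIZE)
#define RH_PHYS_KERNEL(n)   (RH_PHYS_SEG + (uint64_t)(n) * RH_PAGE_SIZE)

/* Virtual segment: one user-mappable page per virtual kernel. */
#define RH_VIRT_SEG         (RH_PHYS_SEG + RH_MAX_PHYS_KERNELS * RH_PAGE_SIZE)
#define RH_VIRT_KERNEL(n)   (RH_VIRT_SEG + (uint64_t)(n) * RH_PAGE_SIZE)

/* Virtual-OS (VOS) segment: one OS-only page per virtual kernel. */
#define RH_VOS_SEG          (RH_VIRT_SEG + RH_MAX_VIRT_KERNELS * RH_PAGE_SIZE)
#define RH_VOS_KERNEL(n)    (RH_VOS_SEG + (uint64_t)(n) * RH_PAGE_SIZE)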

4.1.1 Global Memory Segment

The first memory segment is the global segment that is used to set and retrieve global attributes of the RH controller. This segment corresponds to a single page of “memory” and is primarily used when the OS first initializes the RH controller. For instance, the OS uses this segment to query how many tiles are on the system (allowing a single device driver to support multiple devices, each containing a varying number of RH tiles). Interrupt routines associated with the RH controller also use this segment to determine the cause of an interrupt.

4.1.2 Physical Memory Segment

The physical memory segment must be able to access each physical kernel that could exist on the RH fabric. Each page in this segment represents a single physical RH kernel. Because a configured kernel must occupy at least one tile on the RH, the maximum number of “entries” in the physical memory segment is the number of tiles on the system (with each entry occupying a single page of the physical memory segment). The physical memory segment is used by the OS to perform operations on a physical RH kernel. For instance, in a real system, this memory segment would be used to specify the address and length of the configuration bitstream of an RH kernel. It would also be used to specify which tiles on the system the configuration should occupy. Any operation directly associated with a physical RH kernel is mapped into this memory segment.

4.1.3 Virtual Memory Segment

A portion of the virtual memory segment is mapped to an application when the application first initializes an RH kernel (see Section 4.2). Each page of the segment corresponds to a single virtual RH kernel. This segment contains user-accessible parameters that an RH kernel needs to operate, allowing an application to read back whether a kernel is available, set stream controller parameters, start an RH kernel, and query whether a kernel has finished operating. For details on how these steps are done, see Section 4.2. Each kernel in the system is assigned a single 8KB page of the virtual memory segment. Because hardware is needed to support each virtual kernel on the system, a hard limit is placed on the number of virtual RH kernels that a system can contain. Therefore this memory segment contains a single page for each virtual kernel that can exist in the system.

4.1.4 Virtual OS Memory Segment

Although a memory segment for each virtual RH kernel already exists, this is not sufficient to handle all of the operations that can be performed on an RH kernel. This is because portions of the virtual memory segment are mapped into user applications’ address spaces. Therefore user applications can access all parameters associated with a mapped virtual RH kernel. However, the OS must be able to perform additional operations on virtual RH kernels that the user application should not be able to perform. We therefore created a separate VOS memory segment that allows the OS to access portions of a virtual RH kernel that user applications cannot access. Pages in this memory segment are only mapped into the OS’ address space, and are primarily used to map virtual RH kernels to physical RH kernels, to initialize the virtual kernel’s memory management unit (MMU), and to gather statistics about a virtual RH kernel that can be used to schedule which RH kernels should be configured on the device.

4.2 Initializing and Using RH Kernels

An RH kernel must be initialized by the OS before it can be used by an application. To do this, the application first opens a virtual kernel device that is created in the /dev/ directory when the OS boots. Once the “file” is opened, the application writes a structure containing the RH kernel’s information to the file. The OS uses this information to set up the virtual kernel on the RH controller, initialize the virtual kernel’s MMU data, specify the type of kernel that is to be used, etc. After the OS sets up the virtual kernel, execution is transferred back to the requesting application. Next, the application must request a “memory-map” of the file that it opened. The OS then maps the 8KB virtual kernel page into the requesting application’s address space, and the application can use this mapping to access the virtual RH kernel. Accesses to this returned virtual kernel memory-map will be referred to as loading and storing data to/from a virtual RH kernel. Once the application has obtained the virtual kernel, it is ready to be used. Because the OS is responsible for deciding which RH kernels are loaded on the RH fabric at a given time, the application must check whether the RH kernel is available on the RH before using it. The application does this by reading back a value from the “Kernel Status” offset of the virtual RH kernel.
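As a concrete illustration of this initialization sequence, the following user-space sketch opens a virtual kernel device, writes its descriptor, and maps the 8KB virtual kernel page. The device path, structure fields, and status offset are hypothetical; only the open/write/mmap sequence itself comes from the text.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define RH_PAGE_SIZE 8192   /* one UltraSparc page per virtual kernel */

/* Hypothetical descriptor written to the device; the real structure carries
 * the kernel type, tile count, HW/SW run times, and equivalent-instruction
 * count described in Section 4.3. */
struct rh_kernel_info {
    uint32_t kernel_type;
    uint32_t tiles;
    uint64_t sw_runtime_ns;
    uint64_t hw_runtime_ns;
    uint64_t equivalent_insns;
};

int main(void)
{
    /* Device path and field values are illustrative. */
    int fd = open("/dev/rhkernel0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct rh_kernel_info info = { 7, 5, 120000, 9000, 45000 };
    if (write(fd, &info, sizeof info) != sizeof info) {
        perror("write"); return 1;
    }

    /* Map the 8KB virtual-kernel page into this process' address space. */
    volatile uint32_t *vk = mmap(NULL, RH_PAGE_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    if (vk == MAP_FAILED) { perror("mmap"); return 1; }

    /* Loads/stores to *vk now talk directly to the virtual RH kernel,
     * e.g. reading the "Kernel Status" word (offset 0 assumed here). */
    printf("kernel status word: 0x%x\n", (unsigned)vk[0]);

    munmap((void *)vk, RH_PAGE_SIZE);
    close(fd);
    return 0;
}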

do
    Status = QUERYRH(VirtualKernel)
while (KERN_BUSY(Status))
if (NOT_CONFIGURED(Status))
    Software Routine . . .
else
    INITKERNPARAMS(...)
    LAUNCHKERN(VirtualKernelID)
    WAITKERNDONE(VirtualKernelID)

Figure 4.1 Pseudocode illustrating how an application accesses a virtual RH kernel, and launches it if it is available.

The kernel can currently be in one of three states: Available, Not Configured, or Busy. The Busy state indicates that the RH kernel is currently configured on the fabric, but another application is executing on the physical RH kernel (for details on how multiple applications can share an RH kernel, see Chapter 9). If the kernel is in the Available state, the RH controller will “reserve” the RH kernel for the requesting application. When an RH kernel is reserved, another application cannot access it (it will be in the Busy state), and it cannot be deconfigured until the RH kernel has finished executing (however, the OS can tell the RH kernel to automatically deconfigure itself when it is done executing). Figure 4.1 shows pseudocode for the typical usage of an RH kernel. In this example, if the RH kernel is busy, the application will wait until the hardware is available. If the kernel is not present on the RH, a SW version of the kernel is executed instead, and if the RH kernel is available, the application initializes and executes the RH kernel. Before executing a kernel, many different parameters must be initialized. In particular, each of the stream controllers associated with the RH kernel must be initialized. Each stream controller currently requires the programmer to set 13 different variables to be usable. However, many of these variables are either fixed for the kernel (they never change), or are the same during a single instance of a program’s execution. For example, many of the video processing kernels always look at 8x8 blocks of data, and always execute for the same number of iterations; this part of the kernel never changes. However, these kernels also need to know how “wide” the frame of data they are looking at is, so that the stream controller can calculate the correct stride between consecutive rows in the frame. This portion of the kernel is dependent upon the video being encoded; however, once this value is set, it is unlikely to change throughout the kernel’s execution. The remaining parameters (such as the start memory address of the data a kernel is consuming/producing) are likely to change every time the kernel is called. Because many of the parameters do not change often, multiple optimizations have been made to the stream controllers that allow them to be used without resetting all of the parameters contained in them. This saves multiple cycles each iteration; in the case of some of the short-running Xvid kernels, performance on the RH would be much worse if all of these parameters had to be set for each stream controller (in many cases, the overhead of setting all these parameters is even greater than the kernel’s SW execution time). The first optimization allows an RH kernel developer to set “default” values for each of the stream controller’s parameters. This way the application does not have to initialize static stream controller parameters. Secondly, the RH kernel’s stream controllers cache all of their parameters between calls to the kernel. This cache is only cleared (to default values) when an RH kernel is reconfigured on the RH fabric, or if kernel sharing is enabled and the physical RH kernel was last used by a different application, as described in Chapter 9. To accommodate this second optimization, a kernel can be in a fourth state when requested by an application: Available but not initialized.
When an RH kernel is in this state, the kernel is still reserved; however, the application should ensure that all of the stream controllers are reinitialized. With these optimizations in place, the application only sets the parameters that change between calls to the kernel. Many kernels only need to update the memory address from which each of the kernel’s stream controllers accesses data. After all of the stream controllers and the RH kernel itself have been initialized, the application can start executing the RH kernel. This is done by writing to the StartKernel offset in the virtual RH kernel. Once the kernel starts executing, applications can poll the status of the virtual RH kernel to see if the RH kernel has finished executing. Although applications could continue executing useful code while also executing on the RH coprocessor, the applications examined in this work do not do this because it would require significantly altering the application source code. Such an expensive redesign is unlikely to be done for many of the short-running RH kernels used in this work. RH kernels continue executing until all of their output stream controllers have finished processing their data, or until the RH kernel sends the RH controller a done signal. However, the RH controller does not mark an RH kernel as done, and “unreserve” it, until all of the kernel’s outstanding memory requests have been committed to the cache hierarchy. This acts as an implicit memory barrier, ensuring that memory is updated by the time a SW application reads back that the kernel is done.
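Putting the status check, the fourth state, and the parameter-caching optimization together, a typical per-call fast path might look like the following sketch. The status codes and helper functions are assumed names layered on the pseudocode of Figure 4.1, not the actual API, and the helpers are declared but not implemented here.

/* Per-call fast path under the parameter-caching optimization (sketch only). */
enum rh_status { RH_NOT_CONFIGURED, RH_BUSY, RH_AVAILABLE, RH_AVAILABLE_UNINIT };

extern enum rh_status rh_query(volatile void *vk);            /* hypothetical */
extern void rh_set_all_stream_params(volatile void *vk);      /* full 13-field init */
extern void rh_set_stream_base_addrs(volatile void *vk,
                                     const void *src, void *dst);
extern void rh_start(volatile void *vk);
extern int  rh_done(volatile void *vk);
extern void run_in_software(const void *src, void *dst);

void invoke_kernel(volatile void *vk, const void *src, void *dst)
{
    enum rh_status s;
    while ((s = rh_query(vk)) == RH_BUSY)
        ;                                    /* wait for the physical kernel */

    if (s == RH_NOT_CONFIGURED) {
        run_in_software(src, dst);           /* SW fallback, no reconfiguration */
        return;
    }
    if (s == RH_AVAILABLE_UNINIT)
        rh_set_all_stream_params(vk);        /* cache was cleared; full re-init */

    rh_set_stream_base_addrs(vk, src, dst);  /* only the per-call parameters */
    rh_start(vk);                            /* write to the StartKernel offset */
    while (!rh_done(vk))
        ;                                    /* poll the virtual kernel */
}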

4.3 OS Control of the RH Kernels

The OS is responsible for the control of the physical RH kernels. The OS must be able to initialize RH kernels, configure the RH kernels, and later respond to interrupts generated by the RH controller on behalf of RH kernels. Most of the OS communication with the RH controller takes place using the base, physical, and virtual OS memory maps. Whenever the OS accesses a virtual or physical RH kernel, it first obtains a lock on the device to ensure atomicity of the OS’ actions. The OS is first invoked when an application opens a virtual RH kernel from the /dev/ directory. During this routine, the OS checks to see whether the virtual kernel has been opened by another application. If it has not been, the OS sets up the device and modifies the appropriate OS structures so future requests will be answered properly. The OS also initializes the virtual RH kernel with the RH controller. In doing this, it sets the PID of the application, as well as the location of the application’s translation storage buffer (TSB) and the application’s page table address. These are stored using the virtual OS address space associated with the opened virtual kernel. In Section 4.5, these structures are used to help translate virtual memory addresses issued from RH kernels. The OS also implements a “write” routine so that the application can write information about the requested RH kernel to the OS. This structure contains the run times of the RH kernel (both in software and in hardware), the size of the RH kernel (in tiles), the type of the RH kernel (in a real system this would correspond to a bitstream file), as well as the number of equivalent instructions it takes for the SW to run the RH kernel (for detailed information on the equivalent instructions metric, see Chapter 5.3). These values are saved in the OS and written out to the kernel’s virtual OS address space. The next routine the OS implements is the memory-map routine. In this routine, the OS determines the physical memory address of the virtual RH kernel and maps it to a single page of virtual memory in the requesting application’s address space, allowing the user application “direct” access to its virtual RH kernels. The last step when initializing a virtual RH kernel is to set up an address translation thread. The behavior of this thread is described in Section 4.5. The OS is also responsible for deciding which RH kernels should be loaded on the RH at a given time, and subsequently configuring the selected virtual RH kernels onto the physical RH. Configuring a selected RH kernel on the RH is a fairly straightforward process that can be done dynamically at runtime (as described in Chapter 2.7.2), or statically when an application initializes the kernel. To start the configuration process, the OS needs to know the virtual RH kernel to configure, the bitstream configuration of the physical RH kernel, as well as the physical kernel position that the RH kernel will be configured on. The OS then tells the RH controller to configure the physical RH kernel at the appropriate location on the RH. This causes the RH controller to queue up the kernel in its configuration slot and to start the configuration of the RH kernel as soon as possible. The OS then writes to the VOS address space of the virtual RH kernel to update which physical RH kernel it corresponds to. Once the RH controller has finished configuring the RH kernel, an application that requests the kernel will be informed that the RH kernel is configured and ready to use.
The OS also has the capability of querying RH kernels to read back statistics about their execution. This is done to provide the OS with a more accurate picture of the RH kernel’s execution time than the times stated when the kernel is initialized. This prevents an application from claiming that an RH kernel is much more valuable than it really is so that it will be more likely to be configured, which would result in a better allocation of that application’s kernels (but a worse overall allocation of kernels) [41]. To support this, the OS reads back the average time the kernel takes when executing in both hardware and software, and the number of times the RH kernel was called both in hardware and in software. The OS is also able to reset these values. These values are all located in the VOS address space.

4.4 Hardware Support

Unlike most hardware, the RH controller was designed to interact directly with “non-privileged” applications, and must take extra precautions to protect system memory. For the system to operate correctly, the RH controller must be able to answer all requests to the memory segments corresponding to it, forwarding them to the appropriate portion of the RH controller when applicable. Therefore, the RH controller must be able to keep track of all of the physical RH kernels, as well as all of the virtual RH kernels, in the system. Because of limited device resources, this puts a hard limit on the total number of virtual kernels in the system. However, future research could examine methods that would allow the RH controller to support more virtual RH kernels by dumping the contents of known RH kernels to known main memory locations (provided by the OS). For now, the total number of virtual and physical kernels that can exist in the system is set to be far greater than the number of RH kernels used in any of the workloads.

4.4.1 Support for the RH Controller’s Memory Segments

The hardware to support the RH controller’s memory segments is separated into three categories: hardware to handle virtual RH kernels, hardware for the physical RH kernels, and hardware to respond to requests to the RH controller itself. Requests to the base memory segment are often fairly simple. The most common is used to find out which virtual RH kernel caused a TLB miss, which can be handled by a single register that is updated upon a TLB miss. The details of the TLB miss handler are described in Section 4.5. Other requests to the global address region either return hard-coded values (such as the number of tiles contained in the system) or are used for debugging of the simulation platform. Memory requests corresponding to virtual RH kernels (both the virtual and VOS memory segments) are more difficult to handle, as each virtual RH kernel can be in many different states, and many requests addressed to a virtual RH kernel must be forwarded to the physical RH kernel that is associated with it. Each virtual RH kernel is either available on the RH or not; therefore, each must keep track of which physical RH kernel it is associated with. Although a virtual kernel can be associated with multiple physical RH kernels, in this work the first of these physical kernels is designated the “master” physical kernel; other copies of the kernel are linked from physical copies of the kernel (see Chapter 9).

4.4.2 Querying RH Kernels

When a kernel is first queried, it is put into a “reserved” state. To do this, the RH controller must keep track of which physical kernel the virtual kernel is currently executing on (this way it does not have to check all potential physical kernels that it could correspond to). If this value is not set, then the kernel is not currently reserved. When a kernel is reserved, many requests will be forwarded to the physical kernel, which will handle them appropriately (such as storing values to the stream controllers). Most requests to an “unreserved” kernel are undefined, and will be sent a NULL response. RH kernels execute on the physical RH, but applications query the status of virtual RH kernels to determine if the physical RH kernel is still executing. Therefore, when a physical kernel has completed its execution, it informs its owning virtual RH kernel that it is done. When doing this, it passes to the virtual RH kernel any “buffered” value that an application might request after the kernel has completed. The virtual RH kernel then marks the RH kernel as unreserved, and can later respond to requests for the buffered value. Different strategies have been examined to determine how an application should check when a kernel has finished its execution. In a naive implementation, the owning thread could continuously query the RH controller to determine if the kernel has finished executing. Although this method works well, and can return results relatively quickly, it requires extensive communication on the interconnection network between the RH controller and the processors (this interconnection network is described in detail in Chapter 3.5). These messages require extra power, and can, in larger systems executing many hybrid RH/SW applications, saturate the network, causing network performance to plummet [14]. To optimize these queries, the RH controller queues requests that are querying whether the RH kernel has finished its execution. In this scenario, the RH controller will buffer a query for up to 200 cycles to wait and see if the RH kernel finishes its execution. If the RH kernel is not done executing after 200 cycles, the RH controller will return that the RH kernel is not done executing. The RH controller does not have infinite buffering capability, and can only buffer a single request per virtual RH kernel. Therefore, a modification must also be made to the CPU to prevent it from issuing multiple requests querying the same RH kernel at the same time. This logic stalls requests issued to the RH controller if there is already an outstanding request to the same address. The logic is similar to that required to combine multiple loads into a single, wider L1 cache request [129]; however, instead of forwarding the data to the different loads, a subsequent load is stalled until the first load has finished. Upon finishing the first load, the second can then be issued. In multithreaded applications, only one thread should issue queries to see if an RH kernel is done executing. By performing these optimizations to the query logic, in the common case RH kernels can immediately report when they have finished executing, requiring only the return-trip latency of the direct communication, rather than the full request latency. Additionally, far fewer requests need to be sent over the network to report that an RH kernel has finished its execution.
In the case of short-running RH kernels (those that usually execute in under 200 cycles), branch prediction can be improved, and the processor can resume its execution faster once the RH kernel is done executing. Using this technique for stalling RH kernel queries improves the performance of the simulated RH kernels; however, this thesis does not present these results.
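A minimal sketch of the buffered-query mechanism follows, assuming a per-virtual-kernel pending flag and the 200-cycle hold window described above; the structure fields and function names are illustrative.

#include <stdbool.h>
#include <stdio.h>

#define QUERY_HOLD_CYCLES 200   /* how long a done-query may be buffered */

/* Per-virtual-kernel state for the buffered "is it done?" query. */
struct vkernel {
    bool kernel_done;     /* set when the physical kernel finishes   */
    bool query_pending;   /* at most one buffered query per kernel   */
    int  query_age;       /* cycles the buffered query has waited    */
};

/* Called once per RH-controller cycle for a kernel with a buffered query;
 * returns true when a reply is sent back to the CPU. */
static bool tick_query(struct vkernel *vk, bool *reply_done)
{
    if (!vk->query_pending)
        return false;
    if (vk->kernel_done) {                     /* reply immediately: finished */
        *reply_done = true;
    } else if (++vk->query_age >= QUERY_HOLD_CYCLES) {
        *reply_done = false;                   /* give up: report "still running" */
    } else {
        return false;                          /* keep holding the query */
    }
    vk->query_pending = false;
    vk->query_age = 0;
    return true;
}

int main(void)
{
    struct vkernel vk = { false, true, 0 };
    bool done;
    for (int cycle = 1; cycle <= QUERY_HOLD_CYCLES; cycle++) {
        if (cycle == 50)
            vk.kernel_done = true;             /* kernel finishes at cycle 50 */
        if (tick_query(&vk, &done)) {
            printf("replied at cycle %d: %s\n", cycle, done ? "done" : "busy");
            break;
        }
    }
    return 0;
}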

4.4.3 Configuring RH Kernels

Hardware support is also needed for the RH controller to change which physical RH kernels are loaded on the RH fabric. The OS informs the RH controller to change a physical kernel configuration by writing a new address/length pair to the kernel configuration bitstream offset. After doing this, the hardware is responsible for loading the new configuration bitstream. To do this, the RH controller must first have a mechanism to queue up multiple physical kernel configuration requests. The RH controller then checks the first entry in the queue to make sure the physical kernel is done executing (the OS must ensure that no virtual kernels point to a physical kernel before it initiates a deconfiguration request). Once the old kernels have finished deconfiguring, the RH controller can start loading the new physical kernel’s bitstream from main memory (the OS is responsible for ensuring that the bitstream is located in contiguous physical memory pages). Hardware support is also needed in the virtual RH kernels to check whether the physical RH kernel that they are associated with has actually been configured. This can be done by having the virtual RH kernel poll the physical RH kernel when it is queried. An additional optimization could be made where the virtual RH kernel does not perform this check after acknowledging that the RH kernel is loaded, as long as the physical kernel value on the virtual kernel is not changed by the OS.

4.5 RH Kernel Virtual Address Translation

When RH kernels access memory, the RH controller is responsible for translating the issued virtual memory addresses. This is done using a TLB that is shared by all of the physically configured RH kernels. If a valid translation cannot be found in the RH controller's TLB, the RH controller interrupts the CPU to obtain a translation. The OS then executes a custom interrupt handler, described in pseudo-code in Figure 4.2. On multi-core systems, the RH controller interrupts the host CPU that is running the application that requested the RH kernel, to reduce the impact of page translation interrupts on the performance of other processes executing on the system. The interrupt routine was designed to mimic the trap routine that the UltraSparc processor enters upon a software TLB miss; however, the interrupt routine must first read back from the RH controller's base memory segment which virtual RH kernel caused the miss. The routine then reads back the virtual memory address that caused the miss. Like normal TLB misses during software operation, the routine first checks the UltraSparc's TSB for the faulting address. If the address is not mapped in the TSB, the interrupt handler walks the page table to find the translation. For some of the studies performed for this thesis, the RH controller directly accessed the RH kernel's TSB; for details on how this is done, and its impact on performance, see Chapter 7.3. Interrupt routines in Linux have limited capabilities: they cannot block (to the OS) waiting for input like a normal process can. Therefore, on a page table miss, the interrupt handler is unable to retrieve the faulting page from disk (or wherever it may be) [104]. Because of this, the RH kernels rely on the user application to handle page faults.

INTHANDLER(MissAddr, IsRead)
    hash = TSBHASH(MissAddr)
    if TSB[hash].Addr == MissAddr
        if VALID(TSB[hash].phys)
            return TSB[hash].phys
    phys = PAGETABLEWALK(MissAddr)
    if VALID(phys)
        return phys
    WAKEUP(FaultHandlerThread, MissAddr, IsRead)

FAULTHANDLERTHREAD(MissAddr, IsRead)
    while true
        if IsRead
            tmp = *MissAddr
        else
            *MissAddr = *MissAddr
        INTHANDLER(MissAddr, IsRead)
        SLEEP()

Figure 4.2 Interrupt handler routine that is called upon a TLB miss in the RH controller.

To handle page faults, user applications spawn a fault-handler thread when initializing the RH kernel (applications only need to spawn a single fault-handler thread regardless of how many RH kernels the application has). This thread waits in a loop that blocks to the OS until the interrupt routine experiences a page table miss. On a page table miss, the interrupt handler awakens this fault-handler thread. After waking up, the fault-handler thread queries the OS for the logical address of the faulting load or store, and attempts to read or write from it (as appropriate). Upon accessing this address, the thread will produce a page fault that invokes the OS' page fault handler. After the thread has finished reading/writing from the faulting address, the fault-handler thread blocks to the OS once again. The OS will then provide the page translation to the RH controller so that the address can be added to the shared TLB. Unfortunately, this method is slower than if the interrupt routine fully handled the page fault. However, in a well-tuned application, page faults should be a rare occurrence. Additionally, page faults are already extremely expensive to handle (on the order of multiple milliseconds) [109], so the extra time required to swap in the fault-handler thread should not result in significant additional overhead. After translating an address (whether within the interrupt routine or the fault-handler), the OS sends the translated address to the RH controller. The RH controller stores this address inside its TLB, and reissues the virtual address translation.
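In user space, the fault-handler thread described above could look roughly like the following pthread-style sketch. The ioctl command names, the rh_fault structure, and the way the faulting address is returned to the thread are assumptions for illustration only; they are not the actual LibRH/driver interface.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical ioctl commands on the virtual RH kernel's file descriptor. */
struct rh_fault {
    volatile char *addr;    /* faulting virtual address */
    int            is_read; /* load (1) or store (0)    */
};
#define RH_WAIT_FOR_FAULT _IOR('r', 1, struct rh_fault)
#define RH_FAULT_HANDLED  _IOW('r', 2, struct rh_fault)

/* One thread per application, spawned when the first RH kernel is set up.
 * It blocks in the driver until the interrupt handler misses in the page
 * table, then touches the faulting address so the normal OS page-fault
 * path maps the page in, and finally tells the driver to retry. */
static void *rh_fault_handler(void *arg)
{
    int fd = *(int *)arg;
    struct rh_fault f;

    for (;;) {
        if (ioctl(fd, RH_WAIT_FOR_FAULT, &f) < 0)
            break;                        /* device torn down, exit thread */

        if (f.is_read) {
            volatile char sink = *f.addr; /* read access faults the page in */
            (void)sink;
        } else {
            *f.addr = *f.addr;            /* write access, as in Figure 4.2 */
        }

        ioctl(fd, RH_FAULT_HANDLED, &f);  /* translation is retried */
    }
    return NULL;
}
```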

4.6 Memory Ordering on the Reconfigurable Hardware

On this system, the RH acts very much like a peer of the processor cores from the viewpoint of main memory. Both the processor cores and the RH kernels share virtually addressable memory, and both can use the processor's cache subsystem to store data that is frequently reused. However, in the current system the RH kernels cannot operate on their own, and require support from an associated SW thread. Application software is expected to launch RH kernels, determine when the kernels are done executing, and ensure data is ready for the RH kernels to process before starting the RH kernel. Because of this, we optimized our system for performance at the expense of ensuring that loads and stores issued by the RH kernels are ordered properly. Once requests are issued to the cache subsystem, they maintain total store ordering; however, this is not the case within the RH controller. This is problematic because the RH kernels should calculate the same values that would have been calculated by software. If the memory accesses in the system are not consistent, loads and stores can be reordered with respect to one another, and a load may read back "stale" data from memory. This can result in race conditions where the CPU and RH kernel contain data that is logically inconsistent [59]. Although the processors and caches support total store ordering, the RH kernels support a much weaker release consistency model, where access to memory is acquired when the CPU launches the RH kernel and is released when the RH kernel has finished executing. To ensure that hybrid RH/SW applications will execute properly on this platform, data accesses made within the RH system or between the CPU and RH must be repeatable if the same test is run again.

Because our RH system ensures no ordering of data loads and stores made by RH kernels, we have placed restrictions upon how RH kernels can access main memory to ensure that our hybrid RH/SW applications are data race free. To provide for this, an RH kernel can never write data that it will later read, as this could result in a read-after-write hazard [59], where the data read back does not match the data the kernel previously wrote out. RH kernels can easily work around this problem by internally caching any data they will need later before committing it back to memory (data can be read from memory multiple times, as long as it is not read again after the kernel writes to it); however, this was not necessary for the RH kernels used in this thesis. Additionally, RH kernels are not allowed to directly communicate with concurrently executing RH kernels. If they did, a race condition could occur where kernel A writes data first to memory location 1, and then to memory location 2. A second kernel B might read back these values, first reading the new value from location 1, and then reading the old value from location 2. Therefore, software programmers must ensure that RH kernels do not read data that is being produced by a currently executing RH kernel. However, this is only necessary in multithreaded applications because, as I will explain below, the SW in our system spin-waits while an RH kernel is executing as a method of providing the release consistency model used in CPU-RH communication. In a multithreaded system, threads generally cannot write to data memory unless they have an exclusive lock on the data [109], so by extension the RH kernels called by each thread would not be reading data that another thread has an exclusive lock on.

In addition to ensuring our RH kernels were data race free, we also had to ensure that communication between the SW and RH kernels was data race free. Because our RH kernels follow a release consistency model, the RH kernels must have complete control of the memory they access during their execution, and the SW application must have complete control over that memory when it executes. To prevent SW from reading back stale data from finished RH kernels, we placed an implicit memory barrier at the end of each RH kernel (when releasing the RH kernel's access to memory). This prevents SW from reading back that an RH kernel is done until all of the kernel's memory accesses have been serviced by the cache subsystem, ensuring that the RH kernel properly releases its access to memory. Therefore, as long as the application SW does not read data that is currently being processed or output by an executing RH kernel, the hybrid RH/SW application is data race free, and will execute correctly. This is implicitly the case in our system, as the SW applications spin-wait while executing RH kernels, which prevents the SW from accessing any of the data the RH kernels are processing. In our multithreaded application we locked the data being written by individual threads, or by the RH kernels launched by each thread. This ensures that other SW threads cannot access the data that another RH kernel writes (and the RH kernel likewise cannot read data being written by other SW threads). In future systems, it is possible that limited forms of store ordering could be utilized within the RH controller; however, the optimizations made in the stream controllers make complete ordering of stores extremely difficult.
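The resulting programming discipline can be illustrated with a short sketch of a kernel invocation. The structure and field names below are placeholders rather than the actual LibRH interface; the point is the ordering: the SW prepares all input data before the launch, spin-waits while the kernel runs, and reads the outputs only after the done flag, which the hardware sets only after the implicit release barrier, becomes visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder view of a memory-mapped virtual RH kernel. */
struct rh_kernel {
    volatile uint32_t start;  /* written by SW to launch the kernel          */
    volatile uint32_t done;   /* set by HW only after every store the kernel */
                              /* issued has been serviced by the caches      */
};

/* Hybrid call site: the SW owns in[] and out[] before the launch and again
 * after done is observed; it must not touch them while the kernel runs. */
static void run_kernel(struct rh_kernel *k,
                       const uint32_t *in, uint32_t *out, size_t n)
{
    (void)in; (void)out; (void)n;   /* stream controllers are assumed to
                                       have been programmed with these
                                       buffers before the launch          */
    k->done  = 0;
    k->start = 1;                   /* "acquire": memory is handed to the RH */

    while (!k->done)
        ;                           /* spin-wait: SW reads none of the data
                                       the kernel is producing              */

    /* "release" has occurred: the implicit barrier guarantees out[] now
     * holds the kernel's final values, so the SW may read them safely.   */
}
```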
Future work could also examine the use of special atomic primitives and barriers within the RH kernels that would allow multiple RH kernels to safely communicate.

4.7 Library and OS Support

To support this reconfigurable computing system, I have developed a Linux kernel module, as well as a C library (LibRH), that implements all of the communication between the user applications, the OS, and the RH controller. Using these modules allows application developers to create new RH kernels without having to concern themselves with the direct RH/SW communication necessary to use the RH kernels. The Linux device driver handles the initialization and configuration of RH kernels. It also performs all of the actions discussed in Section 4.3, and services interrupts from the RH controller. The module was developed for the 2.6.17 version of the Linux kernel. The library SW API consists of a set of functions and macros that user applications can use to abstract away the details of the communication with the RH kernels. This library is responsible for initializing RH kernels, automatically spawning a single page table miss handler thread (see Section 4.5) for the application, and calling RH kernels. The C library uses a LibRH structure to represent an RH kernel. This structure keeps track of the file descriptor for the virtual RH kernel that was opened, and contains a pointer to the virtual RH kernel memory map that the library uses. Additionally, it encapsulates access to the stream controllers in a C style, presenting the application developer with an array of structures representing the stream controllers. Using this allows applications to set fields in the stream controllers in the same style that they would change values in any structure (for example, the user could change the starting address of stream 0 of a kernel with the following line of code: KERNEL->Streams[0]->StartAddr=Value).
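To make the style of this API concrete, a hypothetical call sequence might look like the following. The function names (rh_open, rh_call, rh_wait) and the exact stream-controller fields are not taken from the real LibRH header; only the Streams-array access pattern follows the example quoted above.

```c
#include <stdint.h>

/* Hypothetical slice of the LibRH structures described in the text. */
struct rh_stream {
    volatile uint64_t StartAddr;   /* virtual address the stream walks     */
    volatile uint32_t Length;      /* number of bytes to transfer          */
    volatile uint32_t Stride;      /* access stride in bytes               */
};

struct rh_handle {
    int fd;                        /* virtual RH kernel file descriptor    */
    volatile void *mmio;           /* mapped virtual-kernel memory segment */
    struct rh_stream *Streams;     /* array view of the stream controllers */
};

/* Placeholder prototypes standing in for whatever LibRH actually exports. */
struct rh_handle *rh_open(const char *kernel_name);
void rh_call(struct rh_handle *k);  /* launch the kernel                   */
void rh_wait(struct rh_handle *k);  /* query/spin until it completes       */

static void encrypt_buffer(uint8_t *plain, uint8_t *cipher, uint32_t len)
{
    struct rh_handle *aes = rh_open("aes");

    /* Program the input and output streams in plain C-structure style. */
    aes->Streams[0].StartAddr = (uint64_t)(uintptr_t)plain;
    aes->Streams[0].Length    = len;
    aes->Streams[1].StartAddr = (uint64_t)(uintptr_t)cipher;
    aes->Streams[1].Length    = len;

    rh_call(aes);
    rh_wait(aes);
}
```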

Chapter 5

Benchmarks, Workloads, and Testing Metrics

I tested a range of applications on my system to model “real-world” workloads. To do this, I developed multiple hybrid RH/SW benchmarks, and combined them with software-only benchmarks to evaluate the performance of the reconfigurable computing platform. This chapter first describes the benchmarks used in this work (Section 5.1). Then, Section 5.2 discusses how I combined these benchmarks to create “realistic” system workloads. Finally, Section 5.3 describes some of the metrics used to evaluate the research performed in this thesis.

5.1 Benchmarks

I evaluate the proposed reconfigurable computing system using a variety of benchmark applications. Because the execution model of the platform differs from that of traditional software-only applications, I heavily modified many of the benchmarks so that they executed efficiently on the platform. This section first describes how I developed these benchmarks (Section 5.1.1). Then it examines the hybrid benchmarks that have actually been implemented. Finally it describes the software-only benchmarks used in some of the experiments performed in this thesis.

5.1.1 Benchmark Development

Developing benchmarks to execute on the proposed reconfigurable computing platform required a significant amount of work beyond designing the interface and API described in Chapter 4. Most of the benchmarks started from open-source software-only applications, which simplified the development process. I only ported applications likely to be executed on a high-end embedded reconfigurable platform to my new platform. To create these benchmarks, I first decided which portions of the application should be accelerated by the RH fabric. To do this, I executed the software versions of the benchmarks using the GNU profiler (gprof). This allowed me to determine which functions represent the majority of the application's execution. I used a workstation computer to perform the initial profiling of the applications. For these measurements, custom SIMD instructions were disabled, because we envision the RH coprocessor being used in place of a SIMD unit. Although this machine was much faster than the target embedded platform, the approximate distribution of runtimes of each function in the application tends to be similar across platforms, providing a rough estimate of the application's runtime distribution. After profiling the application, I evaluated candidate portions of the application to determine the suitability of creating RH kernels for them. This process involved carefully examining the functions that represented the largest percentage of the application's execution time (based on gprof's measurements). This often involved examining not only the function shown in gprof, but also the functions that called the candidate function, because C functions do not always translate directly into RH kernel implementations. Additionally, in some instances it was easier to create a Verilog implementation of only a portion of a C function, allowing similar application coverage with simpler RH. If the candidate kernel was deemed suitable for an RH implementation, we created a Verilog implementation of the kernel. To do so, I modified the software to output test input vectors (and the corresponding outputs) that could be fed into the Verilog test bench. Once we had fully working and tested Verilog implementations of an application's kernels, I modified the software's source code to call the newly created RH kernels. For some kernels, this process was straightforward, as the Verilog kernel was a drop-in replacement for all, or most, of a standalone C function. In these situations, I only needed to add code that used the LibRH API (Chapter 4.7) to call the RH kernel where appropriate. In other cases, this proved to be a more difficult challenge. For instance, the hardware implementation of the FDCT kernel in Xvid scales the output in a very different manner than the software version. Therefore, I had to modify a later portion of the application to detect whether the SW or the RH generated the inputs, applying different scaling appropriately. For other RH kernels, I created new buffers to facilitate the transfer of data to/from the RH kernel. The benchmarks developed for this work, including their RH kernels, are all part of the ERC Bench (embedded and reconfigurable computing benchmarks) project [24]. Some of these benchmarks were developed specifically for the work done in this thesis, while others have existed in varying states of readiness and were modified to run on the Alexandrite simulation platform.
In the future, it is likely that RH can be developed using "high-level" hardware description languages that can create hardware implementations from a high-level description of an algorithm, or from software code [37, 112]. Additionally, a hardware/software codesign environment could allow developers to directly include hardware and software modules in the same project, simplifying the design and test of hybrid RH/SW applications.

5.1.2 Hybrid RH/SW Benchmarks

This thesis analyzes the performance of four hybrid RH/SW benchmarks: Xvid encoding, Tremor decoding, AES encryption, and a WiMaX back-end receiver. Table 5.1 describes the RH kernels used in these applications, including their size (in tiles), clock frequency, the percent of their host application's execution time when running in SW, the percent of the host application's time spent executing the kernel during hybrid RH/SW execution, the average SW runtime, and the overall application speedup on a single-processor baseline system that contained 2MB of L2 cache and sufficient RH tiles to fit all of the benchmark's RH kernels.

Application  Kernel                  Area (tiles)  Freq (MHz)  % of SW-only exec  % of hybrid exec  Avg SW runtime (cycles)  RH speedup
Xvid         SAD8                    1             500         31.4%              8.6%              652                      8.5x
             SAD16                   1             475         10.0%              1.8%              2,274                    15.7x
             interpolate 8x8 6-tap   4             225         7.6%               3.1%              8,950                    6.9x
             FDCT                    9             225         6.4%               .89%              19,063                   20.4x
             interpolate8x8 avg-4    1             450         6.2%               1.1%              2,061                    12.5x
             interpolate8x8 avg-2    1             500         4.3%               1.1%              1,444                    8.7x
             transfer 8 to 16 sub    1             500         2%                 .75%              990                      7.7x
             total                   18            NA          68.6%              24%               NA                       NA
AES          AES                     5             225         99.9%              99.9%             247,000                  17.2x
Tremor       mDCT                    5             150         32.5%              2.4%              127,622                  21.2x
WiMaX        DFT                     12            300         7.1%               .53%              56,086                   41.1x
             Viterbi                 12            67          62.0%              1.9%              491,342                  100.5x
             total                   24            NA          69.1%              2.4%              NA                       NA

Table 5.1 Breakdown of the kernels used in the hybrid RH/SW benchmarks when executing on the RH coprocessor platform (data is not given for SW runtime and average RH speedup for full applications, as these metrics are used to describe individual kernels).

I calculated these values after enabling all of the optimizations described in this thesis, so the results in Table 5.1 represent the best runtimes these kernels had on a single-core version of the platform. The Xvid benchmark uses the Xvid 1.0.3 encoder to encode videos using the MPEG-4 advanced simple profile, and contains seven different RH kernels that cover ∼69% of the application's software runtime. All of the simulations that run the Xvid benchmark encode the same 720x400 input video; however, each copy of the benchmark might be encoding a different portion of the video frame. The Xvid encoder encoded multiple frames of the input video before I created checkpoints of the workloads, ensuring that I only measured Xvid's steady-state behavior. The Xvid benchmark is an example of a hybrid RH/SW application that exhibits different phases of execution, where each phase of execution calls a different subset of the application's RH kernels. In the first phase of execution, the encoder calls the interpolate 8x8 6-tap kernel to create interpolated images of the video frame. The second phase repeatedly calls the SAD16 routine across the image to obtain an estimate of the motion vectors used in the next phase. The third phase performs the motion estimation. This phase regularly calls the SAD8 kernel, and also calls both the interpolate 8x8 average-2 and interpolate 8x8 average-4 kernels to estimate the value of subpixels. In the last phase of execution, the video is encoded into a bitstream. This phase calls the FDCT kernel to transform the video, and the transfer 8 to 16 sub function to transfer data between macroblocks in the video stream.

The Tremor benchmark decodes Ogg/Vorbis music files for playback. This benchmark contains a single RH kernel that performs the modified discrete cosine transform. The AES benchmark is an example of a high-bandwidth streaming application that encrypts a single large file (much larger than the cache on the system). Applications that handle secure information commonly use the AES encryption algorithm. The hybrid application contains a single kernel that represents the vast majority of execution. Each call to the AES kernel encrypts 8KB of data (either in RH or in software). When simulating AES, the plaintext input data, as well as the buffer containing the output data, is pre-loaded into main memory (but not the cache) to ensure that the data is found in the process page table. This enables the measurement of AES's steady-state behavior, rather than artifacts of the beginning of its execution. Unlike the other hybrid benchmarks examined in this thesis, I developed the WiMaX back-end receiver benchmark specifically for heterogeneous multicore systems. The operations performed in the WiMaX receiver are common to many different software defined radio protocols. I used a reference WiMaX receiver developed in Matlab [74] to model the WiMaX back-end receiver. Because I did not develop a front-end receiver, the control signals necessary to change the data rate were not implemented. In the current implementation of the WiMaX benchmark, a single packet of data is continuously processed in the same mode of operation. This benchmark is multithreaded, spawning threads for the DFT, denormalizing, demodulating, deinterleaving, Viterbi decoding, Reed-Solomon decoding, and derandomizing stages of the WiMaX back-end receiver. When possible, existing library code was used to perform the operations contained in the pipeline stages of the WiMaX back end. The DFT uses the KISS FFT library, which implements a fixed-point version of the DFT algorithm. Viterbi decoding and Reed-Solomon decoding use Phil Karn's Forward Error Correction library. The other modules use custom C routines to implement their functionality. The hardware implementation of the DFT kernel uses a modified version of one generated using the Spiral project [94].

5.1.3 Software-only Benchmarks

In addition to the four hybrid RH/SW benchmarks, some of the workloads contained SW-only benchmarks. I used a subset of the SPEC 2006 integer benchmarks [60] for the SW-only benchmarks, including mcf, gobmk, and xalancbmk. mcf models the scheduling and routing of a fleet of vehicles using a network simplex algorithm. gobmk is a solver for the game of Go, and is representative of the AI algorithms that a game might contain. xalancbmk is an XSLT processor that transforms XML documents into HTML. I chose these benchmarks because they have different memory access patterns and they are similar to software applications that might execute on a high-end embedded platform.

5.2 Workloads

Many of the evaluated systems contained multiple processor cores, so I created workloads that run multiple benchmarks simultaneously. I created workloads containing a combination of the benchmarks for one-, two-, four-, and eight-processor systems to ensure every CPU on the system was busy. In an attempt to make the workloads more "realistic", only certain benchmarks were executed simultaneously. In particular, all of the workloads contain at most one copy of the Tremor benchmark, and at most one copy of the WiMaX benchmark. This is because real systems are unlikely to play back multiple audio files simultaneously, or process multiple WiMaX streams simultaneously. The workloads used in this thesis will be referred to by the number of copies of each benchmark followed by the first letter of the benchmark, with a hyphen separating the benchmarks. Therefore 2X-4A-1T-1W refers to a workload containing two copies of Xvid, four copies of AES, one copy of Tremor, and one copy of WiMaX. For any workloads where SW-only is included, it indicates that three separate workloads were executed, each containing one of the SW-only benchmarks. All of the workloads were simulated for 2 billion cycles (just over two seconds of wall clock time) to get a measure of the workloads' steady-state behavior. Unless otherwise specified, the same checkpoint was used for each workload, regardless of how the system executing it was set up. Therefore (at least initially) the systems execute the exact same code, at the same point in time, allowing direct comparisons between the different system models.

5.3 Testing Metrics

When executing these workloads, a handful of metrics were used to compare the relative performance of different systems. These metrics are based on those proposed by Rupnow et al. [106]. I measured the performance of single benchmarks within a workload using the equivalent instructions per cycle (EIPC) metric. This metric is similar to the IPC metric used in traditional processors. The equivalent instruction count (EIC) of the benchmark is needed to calculate the benchmark's EIPC and normalized EIPC (NEIPC). The EIC measures how many instructions the application would have executed if the application executed everything in SW. This EIC is then divided by the number of cycles that the benchmark executed for to obtain the EIPC. The EIPC metric is often used in this thesis for comparing the performance of a single application within a workload. However, on many workloads, this metric is not sufficient for comparing performance, and instead we use the NEIPC metric. Because workloads can contain multiple threads executing across multiple processors, it is important to normalize each of the benchmarks to "spread out" the OS' overhead amongst all of the benchmarks (and not just the benchmark the OS interrupts the most). This is done by summing up the number of cycles each of the benchmarks in the workload executed for, and dividing it by the total number of cycles the simulation ran for times the number of processors in the simulation. This gives us the "efficiency" of the workload, representing the percent of time an average thread in the workload was performing actual work, and not executing in the OS. To calculate the NEIPC of each benchmark, the efficiency of the workload is multiplied by the benchmark's EIPC. Equation 5.1 illustrates this calculation.

\[ NEIPC_i = \frac{EIC_i}{Cycles_i} \times \frac{\sum_i Cycles_i}{NumProcessors \times TotalCycles} \qquad (5.1) \]

The NEIPC of each application in a workload is useful, but it can be cumbersome to compare each benchmark individually for each of the many tests that were executed. To address this, I calculated the geometric mean of the NEIPC values of a workload's benchmarks. This allows the direct comparison of values across different system models.
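For clarity, Equation 5.1 can also be written out directly as code; it is just arithmetic over the measured per-benchmark cycle and equivalent-instruction counts. The variable names below are invented for this sketch.

```c
#include <stddef.h>

/* Computes NEIPC_i = (EIC_i / Cycles_i) *
 *                    (sum_j Cycles_j / (NumProcessors * TotalCycles)),
 * following Equation 5.1. eic[] and cycles[] hold per-benchmark counts. */
static double neipc(const double *eic, const double *cycles, size_t n,
                    size_t i, double num_processors, double total_cycles)
{
    double busy_cycles = 0.0;              /* sum of all benchmarks' cycles */
    for (size_t j = 0; j < n; j++)
        busy_cycles += cycles[j];

    double eipc       = eic[i] / cycles[i];                /* EIPC_i       */
    double efficiency = busy_cycles /
                        (num_processors * total_cycles);   /* workload     */
                                                           /* "efficiency" */
    return eipc * efficiency;                              /* NEIPC_i      */
}
```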

Chapter 6

Cache Organizations for Reconfigurable Computing Systems

The memory system of a cache-coherent reconfigurable coprocessor can be designed in many different ways. However, in all of these designs, an ordering point is necessary so that the system can ensure that the memory the RH coprocessor accesses contains the same data that other CPUs would see. This chapter examines multiple different cache organizations, and analyzes how the cache organizations impact the performance of both the RH coprocessor and the general-purpose processor. I first examine the difference between RH kernel memory accesses and CPU memory accesses. I then propose various cache topologies, compare the performance of systems using these topologies, and examine the impact of the cache size on performance. Next, I examine the dynamic energy consumption of the memory hierarchy for each of the topologies. Finally, I propose changes to the access model that allow the CPU to directly access the results of RH kernels, allowing some of the RH kernels' memory requests to bypass the cache hierarchy altogether.

6.1 How Do RH Kernel Memory Accesses Differ From a CPU's?

RH kernels in this system access memory in a very different manner than traditional CPUs. For instance, RH kernel load/store operations tend to operate on larger data sizes than a general-purpose processor's. In this system, an RH kernel can access up to 32 bytes in a single load/store operation. In contrast, standard microprocessor load/store operations operate on word-sized data, allowing them to access one, two, four, or eight bytes of data in a single cycle. Because of this, RH kernels often require fewer memory accesses to perform the same operation. In addition, RH kernels can store more of their data internally. When a general-purpose processor executes an algorithm, many of the memory accesses it makes are to state variables contained on the application's stack. These accesses tend to hit in the processor's L1 cache due to both spatial and temporal locality. In an RH kernel, many of these memory accesses are unnecessary because the RH kernel implicitly keeps track of these variables. In addition, some algorithms will repeatedly access the same data memory values when performing computations. RH kernels are much more likely to cache this operational data internally, requiring fewer memory accesses.

Figure 6.1 CPU/RH topology where the RH and the CPU share a single L1 cache.

Because of this, it is anticipated that an RH kernel will require fewer cache accesses than its software counterpart. It is also expected that RH kernels will have a higher L1 miss rate, although it is still likely that an RH kernel will have fewer L1 cache misses than the kernel's SW implementation. To examine this, I modeled a single-CPU system where both the RH and CPU share a single L1 cache. This is similar to the Molen processor proposed in [79]. Figure 6.1 illustrates how this topology might look on a single-processor system. Although this model is overly optimistic with respect to the cache's latency (and has further issues when scaling to multicore processors), it allows us to directly compare the L1 cache hit rate of kernels when executed in the RH and on general-purpose processors. I first use this shared-L1 topology to compare the L1 cache's hit rate between the RH and SW implementations of the kernels. Figure 6.2 shows the L1 data cache's hit rate for all of the RH kernels used in the tested applications. As expected, the SW versions of these kernels had substantially better hit rates than the RH versions. Figure 6.3 shows the number of L1 cache misses per 1,000 equivalent instructions both when the kernels executed in SW and on the RH. The RH implementations tend to have similar hit rates for the short-running kernels, while for the long-running kernels (mDCT, FFT, Viterbi, FDCT) I observed significantly lower hit rates due to the ability of the RH to locally cache data. These results show that the RH kernels' lower L1 hit rate is due to both the reduced number of accesses made by RH kernels and the greatly reduced execution time of the RH kernels. Although the RH kernels are capable of generating many more requests per cycle than the SW, this rarely happens in the kernels that I tested. Figure 6.4 shows how many L1 data requests each kernel generated per cycle for both the RH and SW implementations. This figure shows that most of the RH kernels generate far fewer requests per cycle than SW versions of the same kernel. In particular, the highly data-parallel algorithms accessed the cache ∼1/5th as often as the software would. The only exceptions to this general rule of thumb occurred in some of the Xvid kernels.

Figure 6.2 L1 data cache hit rate for the kernels when executed by RH and by SW

Many of these kernels read in the same data as the SW versions of the kernels, and only use the data in a single operation. Therefore, both the RH and the SW only have to read this data a single time. Because of this, many of the request rates are similar to those of the SW, and in the case of AVG2, the RH kernel requests slightly more data per cycle than the SW. This data suggests that the kernels would not "overload" a standard cache hierarchy, and even suggests that a single cache has enough bandwidth to handle multiple RH kernels running simultaneously. Chapter 7 will examine the scalability of the cache hierarchy; this chapter focuses on how the cache hierarchy should be designed.

6.2 Examined Cache Topologies

Although previous work showed that sharing one or more cache levels between cores on a processor [9, 63] allows for the effective sharing of data on a multicore processor, it is unclear if this will hold true in RH computing systems. In this section I propose multiple different ways that an RH coprocessor can use shared memory to communicate with a general-purpose processor. I examined a variety of topologies to determine whether particular cache levels are advantageous for RH performance. For all tested topologies, the CPU always has a private L1 cache and either a shared or private L2 cache. The RH's cache hierarchy varies more significantly than the CPU's cache hierarchy, both in terms of the number of cache levels and their sizes. Table 6.1 lists the nine cache topologies examined in this work. All designs have a 2-cycle L1 access time and require 13 additional cycles for L2 accesses. In shared L2 hierarchies, if a request from the RH misses in the L2 cache, but is located in the CPU's L1 cache, the access will require an additional 8-15 cycles (varying due to queuing delays). In split L2 hierarchies, if the data is not found in the RH's L2 cache, but is in the CPU's L1 or L2 cache, it requires an additional 40-50 cycles to check/update the cache directory, submit the request to the CPU's L2 cache, and transfer the data between the caches.


Figure 6.3 Number of L1 data cache misses per 1,000 equivalent instructions

These latencies are calculated by GEMS (discussed in Section 3.6) and include the timing of the coherence messages needed to perform these operations. I modified the GEMS simulator to provide timing support for the structurally asymmetric topologies that were tested. For example, requests in topologies lacking an RH L1 cache that hit in the L2 incur a latency of only 13 cycles, the number of cycles it normally takes to transfer data between the L1 and L2 caches. In some of these topologies, the RH uses a small spatial locality buffer instead of an L1 cache. This buffer is similar to an L1 cache; however, it is much smaller (2KB vs. 32KB), more associative (16-way vs. 4-way), and has a single-cycle latency. The goal of this buffer is to exploit the spatial locality found in many RH kernel memory requests without the overhead of a full cache. In general, I did not modify the CPU's cache hierarchy apart from exploring both private and shared (with the RH) L2 caches. All other variations are to the RH's cache hierarchy, and include eliminating either or both of the L1 and L2 cache levels, and comparing shared to private L2 caching. The total L2 cache size in the topologies is always 2MB; if it is split, the CPU and RH each have a 1MB private L2 cache. When the RH does not have an L2 cache, the CPU contains a 2MB private L2 cache.

6.3 Performance

This study examines the performance of the four hybrid RH/SW benchmarks described in Chapter 5.1. I measured the performance on all of the topologies listed in Table 6.1, and compared them with the performance on the "ideal" architecture (SL1). Figure 6.6 shows the ratio of each cache topology's performance to that of SL1. I chose this comparison to normalize the results for all of the applications. In all situations, the SL1 topology performed as well as or better than the other topologies. This figure shows that most of the applications were not overly sensitive to the cache topology selected; however, the Xvid benchmark showed a wide variance in performance across the different topologies.

AB" .C"

!#+" !#*" !#)" !#(" !#'" !#&" !#%" !#$" !" !"#$%&'(")*"+,+'-"%'.&/0"' ,-." /012" 332" 4567895" ):6;<" 3012" ,4=%" ,4='" .,0$)" .,0+" 28;>?@78"

Figure 6.4 The average number of memory requests per cycle each kernel makes when implemented in SW and in RH

(a) RH and CPU have private L1s, share a 2MB L2  (b) RH and CPU contain private L1 and L2 caches  (c) RH contains no cache, CPU has private L1 and L2

Figure 6.5 Some of the cache hierarchies examined in this thesis

Name         Description
SL1          "Ideal" model where the RH coprocessor and CPU share a single L1 cache, and the rest of the memory hierarchy.
SL2-PL1      Shared 2MB L2 cache; the CPU and RH each have a private L1 cache.
SL2-NoL1     Shared 2MB L2 cache; the CPU has a private L1 cache, the RH does not have an L1 cache.
SL2-NoL1-B   Shared L2 cache; the CPU has a private L1 cache, the RH has a spatial locality buffer instead of an L1 cache.
PL2-PL1      The CPU and RH each have their own private L1 and 1MB L2 caches.
PL2-NoL1     The CPU and RH each have a private 1MB L2; the CPU has a private L1, the RH has no L1.
PL2-NoL1-B   The CPU has a private L1 and L2 cache; the RH has a small spatial locality buffer plus a private L2 cache.
NoL2-PL1     The CPU has a private L1 and 2MB L2 cache; the RH has a single cache level the size of the CPU's L1 cache.
NoL2-NoL1    The CPU has a private L1 and 2MB L2 cache; the RH has a spatial locality buffer instead of a cache.

Table 6.1 The nine shared-memory cache topologies that were tested. L2 cache sizes are given; the RH's L1 cache size is 32KB unless otherwise stated.

The AES benchmark was essentially unaffected by the choice of topology due to its highly streaming nature and because it did not reuse data brought into the caches. Furthermore, because each AES memory access is 32 bytes and a cache line is 64 bytes, improved data locality has limited benefit. Likewise, the Tremor benchmark was relatively unaffected by the cache topology used because the mDCT kernel uses static memory buffers for its data. Although the cost of moving this data across caches is non-zero, the mDCT kernel executes for a relatively long time, allowing the stream controllers to hide much of this latency. From the software's perspective, a small penalty is associated with accessing data written into this buffer, but because the kernel does not represent the majority of the application's execution time, and Tremor must perform a significant amount of computation on the data, the cost of moving it makes up a very small percentage of Tremor's overall execution time. Figure 6.6 shows that split topologies perform slightly worse than shared topologies on the Tremor benchmark. This is primarily due to the extra time required to copy data buffers between the RH's and the CPU's caches. The WiMaX application also saw very little change in performance due to the cache topology used. This is due to multiple factors. First, the RH kernels execute for a relatively long time, so the stream controllers can hide the majority of the kernels' memory latency. Additionally, the RH kernels (in particular the Viterbi kernel) perform a significant amount of computation on each byte brought into them. Because WiMaX is multithreaded, when executing on a single-processor system, the data produced by any given stage in the overall computation is unlikely to be in the L1 cache during the next stage of computation.

)""#$

(%#$ 39:;39)$ ("#$ 39:;<9)$

'%#$ 39:;=89)$ 39:;=89);>$ '"#$ <9:;<9)$ &%#$ <9:;=89)$ &"#$ <9:;=89);>$ !%#$ =89:;<9)$

!"#$ =89:;=89)$ *+,-$ .,/0*$ 123$ 456785$

Figure 6.6 Performance of each cache topology normalized to the performance of SL1 for each of the hybrid RH/SW benchmarks

This means that the data produced by a kernel is unlikely to be in the CPU's L1 cache regardless of whether the kernel ran in SW or on the RH. As in Tremor, the split topologies performed a little worse than shared topologies due to the extra latency required to transfer the data between the caches. In these two benchmarks it was also observed that NoL2-PL1 performed slightly better than PL2-PL1. This is because the NoL2-PL1 topology allowed the software access to a larger L2 data cache, increasing its performance. However, as these software applications are not overly sensitive to the L2 cache size (in the 1MB-2MB range), this only provided a small performance boost over the topologies with a private L2 cache. Unlike AES, Tremor, and WiMaX, Xvid contains multiple RH kernels, many of which execute for a short period of time. When these kernels execute, their stream controllers are unable to prefetch the data ahead of time. Because of this, the performance of the Xvid kernels varies greatly depending upon the cache's performance. This shows that for the Xvid benchmark, it is important to consider the cache organization of the system. Figure 6.7 shows the percentage of Xvid memory requests (CPU and RH) that missed in all cache levels and had to retrieve data from main memory. The NoL2-NoL1 topology had a significantly higher miss rate because the RH has only a very small (2KB) buffer. Having just a 32KB L1 cache (and no L2) significantly improved the RH cache miss rate. Topologies with a shared L2 cache had the fewest main memory requests. In the private L2 topologies, data may be replicated between the two private caches; the shared L2 topologies have the same total L2 size, but with no replication a larger number of unique addresses fit in the L2, allowing Xvid to have a lower cache miss rate in shared topologies.

12.&/12.+/3% #"'+$% 12.&/0.+% !")*$% 0.&/12.+/3% !"#($% 0.&/12.+% !"#($% 0.&/0.+% !"#($%

)*+*,*-.$ -.&/12.+/3% !"#!$% -.&/12.+% !"&'$% -.&/0.+% !"#!$% -.+% !"#!$%

!"!!$% !",!$% +"!!$% +",!$% &"!!$% &",!$% #"!!$% #",!$% )"!!$% )",!$% !"##$%&'($

Figure 6.7 Overall cache miss rate for the Xvid application (combined RH/CPU accesses)

3<$" 3<%'=<$" 3<%'>:<$" 3<%'>:<$'?" =<%'=<$" =<%'>:<$" =<%'>:<$'?" >:<%'=<$" >:<%'>:<$'?"

%!"

$#"

$!" !"##$%"&

#"

!" &'()*" +,-." /01%" /012" 3/,$&" 3/,4" .5)67895" 19:";9)6"

Figure 6.8 Speedup of Xvid kernels on the different topologies

'"%#$

'""#$

&%#$

&"#$

!%#$

!"#$

()'$ ,)*$ ()*+,)'$ ()*+-.)'$ ,)*+-.)'$ -.)*+,)'$ ()*+-.)'+/$ ,)*+-.)'+/$ -.)*+-.)'$

Figure 6.9 Cache topology’s effect on non-kernel performance within Xvid.

Xvid performance varies noticeably across the different topologies. Although at a macro level Xvid is a streaming application that processes a single frame of data at a time, there is still a great deal of data reuse when processing a given frame, and much of the processing is interleaved between the software and the RH. The high degree of CPU-RH communication results in higher performance for shared L2 topologies compared to those with private L2s or no RH L2. One interesting result, seen in Figure 6.6, is that the NoL2-PL1 topology outperformed the PL2-PL1 topology. To examine the cause of this unexpected outcome, the speedup of each individual kernel in the hybrid applications is first examined. Figure 6.8 shows the speedups of each of Xvid's kernels on each of the examined cache topologies. This figure shows that the performance of most of the kernels was the same on many of the different topologies, although a consistent drop in performance is seen for split topologies. Additionally, topologies where the RH has no cache performed significantly worse on most of the kernels. However, nothing in the speedups of these kernels would suggest why the NoL2-PL1 topology would outperform the PL2-PL1 topology. If the RH kernels are not performing significantly better, then the difference in performance must be due to the non-kernel portions of Xvid's execution. Figure 6.9 shows the relative performance of the non-kernel portions of Xvid when compared to the same non-kernel portions of a SW-only execution of Xvid. In most of the topologies, the non-kernel portions of Xvid took longer to execute than they did in the SW-only execution of Xvid. The only exception to this is in the SL1 hierarchy. This is because the RH kernels in SL1 are loading the same working-set data into the CPU's (and RH's) L1 cache as the SW-only execution would; however, the RH kernels do not have to load some of the unnecessary control data into the cache. The difference in performance between the non-kernel execution of Xvid in SW and SL1 is only .66%. On other topologies, performance in the non-kernel parts of the benchmark was worse than in the SW-only tests. This is because data written by the RH kernels that is requested in the non-kernel sections of the application will not be found in the processor's L1 cache. Topologies with a shared L2 cache performed relatively similarly to each other, although topologies without a full L1 cache outperformed those with one in the non-kernel sections. This is because the data written by the RH kernels is more likely to be found in the L2 cache when the CPU requests it. This saves time when transferring the data to the CPU, and results in the .78% performance improvement SL2-NoL1 has over SL2-PL1 in the non-kernel portions. In the non-kernel sections of Xvid, topologies where the RH had a private L2 cache performed worse than those containing a single shared L2 cache. This is due to two factors: the size of the CPU's L2 cache, and the extra time required to transfer data between the RH's caches and the CPU's caches. The performance of the non-kernel portions of NoL2-PL1 explains why NoL2-PL1 outperformed PL2-PL1 overall. In the accelerated version of Xvid, the non-kernel portions of the application dominate the application's runtime, and therefore a 4.22% performance improvement in the non-kernel section explains why NoL2-PL1 outperforms PL2-PL1.
These results reinforce the notion that a shared L2 cache is important for application performance; the large amount of RH-CPU communication benefits from reduced communication latency and increased bandwidth. Private L2 caches make it much more difficult to retrieve data that is located in the other L2 cache, or its associated L1 caches. These results also show that using a shared L1 cache between the CPU and RH will result in the best overall performance. However, while this approach can be used in single-CPU scenarios, scaling such a topology to multiple processors is difficult. It would also require more dramatic changes to the CPU's architecture (as the CPU tends to load word-sized data from the cache while the RH kernels can request much larger data), and could potentially introduce additional latency to the L1 cache. Because the performance of the SL1 topology is not significantly better than that of SL2-PL1 (and on multicore systems these improvements would be reduced further), SL2-PL1 architectures will be used for the work done in the remainder of this thesis.

6.4 Cache Sizing

When considering using a shared cache hierarchy for transferring data between RH kernels and the CPU, it is important to examine how the new hybrid applications react to the sizing of the system's caches. This section examines the impact of cache sizing on both the RH kernels' execution and the SW applications. This work suggests that the L1 cache size makes almost no difference to overall performance; even for Xvid, SL2-NoL1-B only performed ∼3% worse than SL2-PL1. Therefore, I did not expect that changing the RH's private L1 cache size would greatly change overall performance. Figure 6.10 shows Xvid's EIPC as I varied the size of the RH's L1 cache between 2KB and 128KB (in these experiments the CPU's L1 cache size was fixed at 32KB). This shows that Xvid's performance rapidly levels off when the RH's L1 cache is larger than 8KB. Increasing the cache's size to 128KB did not noticeably speed up Xvid, suggesting that the RH's L1 cache may not need to be as large as the CPU's to obtain similar performance. However, it is unknown how the RH's L1 cache should be sized when it is shared amongst additional CPU cores that could execute multiple RH kernels simultaneously.

!")$$% !"(&$% !"($$% !"&&$% !"&$$% !"#$% !"'&$% !"'$$% !"#&$% !"#$$% !*+% '*+% ,*+% -(*+% #!*+% ('*+% -!,*+% &'%$()*+%,-.+%

Figure 6.10 SL2-PL1 Xvid speedup as the RH’s L1 cache size is varied

The impact of the RH's cache size on the other workloads was also tested; however, little change in performance was seen as the RH's L1 cache size varied. My experiments also suggest that the performance of the RH kernels and the SW-only portions of Xvid improves dramatically as the L2 cache size is increased. In split L2 topologies, performance of the non-kernel sections was reduced significantly due to the "halving" of the CPU's L2 cache. To test the L2 cache's impact on performance, I varied the L2 cache size between 256KB and 8MB. On the AES, WiMaX, and Tremor workloads, I saw no noticeable change in performance based on the L2 cache size. Therefore, this section will focus on Xvid, which makes extensive use of the processor's L2 cache. Figure 6.11 shows the relative performance of each of Xvid's kernels, of the non-kernel portions, and of the overall application as the size of the on-chip L2 cache was varied. The performance in these figures is normalized to that of a system containing a 1MB L2 cache. One surprising result of this study is that the hybrid RH/SW system was more sensitive to the L2 cache's size than the SW-only execution. Increasing the L2 cache size from 1MB to 8MB increased Xvid's performance by 3.6% on SW-only systems, and by 8.6% on hybrid systems. The reasons for this are twofold. Figure 6.11(a) shows that, with a few exceptions, the non-kernel portions of Xvid's execution are more sensitive to the cache size than most of the kernels. The only exceptions are the 6-tap kernel and the transfer kernel. However, as shown in Chapter 5.1, the 6-tap kernel makes up ∼7.3% of overall execution time, and the transfer kernel only occupies 2.3% of execution time. Because the transfer kernel is the only kernel that performs significantly better than the non-kernel portions when the cache size is increased, it is clear that the overall benefit obtained by increasing the L2 cache size is due to the non-kernel execution. However, on a SW-only version of Xvid (with a 2MB L2 cache), this accounts for only 31.4% of total execution, so the speedup of the full application's execution is limited. Two factors greatly impacted the hybrid system's execution time: first, the kernels were sped up by a greater degree as the L2 cache size was increased, and second, the non-kernel portions of the application made up a larger percentage of overall execution time.

In particular, SAD16, SAD8, and the transfer kernel were greatly sped up by increasing the cache size. Two factors account for these kernels' performance increasing so significantly. First, these kernels execute for a very short time, so the RH's stream controllers do not have adequate time to prefetch data for them. Second, these kernels access a lot of data that is likely to be found in main memory. Both the hybrid and the SW-only executions of the kernels are likely to generate the same number of cache misses; however, because the RH kernels' base execution time is so much shorter, a fixed latency penalty is much larger (in terms of percentage of execution) for the RH kernels than for the SW kernels. The transfer 8 to 16 kernel also experiences a significant number of cache misses, both in the SW-only and the hybrid execution. Increasing the L2 cache size greatly increased the transfer kernel's performance, yielding a speedup of 1.48x with a 2MB L2 cache, and of just over 2x for systems with both 4MB and 8MB L2 caches.

'%)?C$ %&'?C$ &DC$ 'DC$ 4DC$ 6DC$ '%)?C$ %&'?C$ &DC$ 'DC$ 4DC$ 6DC$

&"($ &"($ &"'%$ &"'%$ &"'$ &"'$ &"&%$ &"&%$ &"&$ &"&$ &"!%$ &"!%$ &$ &$ !"#%$ !"#%$ !"#$ !"#$

)*+,$ -./0$ 51.6$ )*+,$ -./0$ 51.6$ 1234$ 123'$51.&)$ 1234$ 123'$51.&)$ 07+89:;7$ @A;7+BB$ 07+89:;7$ @A;7+BB$ <=8>?;78$ <=8>?;78$

(a) SW-only (b) Hybrid RH/SW

Figure 6.11 Relative performance of SW-only and hybrid execution as the L2 cache size (with a baseline of a 2MB L2 cache) is varied

These results show that although the performance of RH kernels and applications is not heavily dependent on the size of the RH's L1 cache, the L2 cache's size can influence their execution to a greater degree. They also show that the L2 cache sizing influenced hybrid Xvid's performance more than it did the SW-only execution. This suggests that it is still important for hybrid RH/SW systems to maintain a large L2 cache, both for the purpose of legacy SW-only applications, and also to further accelerate hybrid RH/SW applications. However, due to the large die size of L2 caches, it might make sense for future reconfigurable computing systems to trade L2 cache size for increased reconfigurable computing resources. Although doing this would reduce the performance of hybrid applications, the benefits of RH acceleration for hybrid applications far outweigh the performance losses that they would experience due to a smaller L2 cache size.

)#" )$" 012-34"

(/" ,-)$+,-)#+." ,-)$+*)#" *)$+,-)#+." *)$+,-)#" *)$" ()$+,-)#+." ()$+,-)#" ()$*)#" ()#"

!" #!!" $!!" %!!" &!!" '!!" !"#$%&'()*+'

Figure 6.12 Energy consumption of the cache hierarchy for each of the tested topologies.

6.5 Power

Although system performance is a key metric when choosing a cache topology, energy consumption is also a consideration, particularly in embedded devices. Therefore, I modified the simulator to keep track of the cache hierarchy's energy consumption. Cacti 5.3 [114] was used at a 65nm technology node to model the relative memory access energy for all of the caches used in the topologies, as well as the spatial locality buffers. The L1 cache was configured with two read/write ports and the L2 had a single read/write port. Based on these parameters, a 2MB 16-way associative L2 cache access consumes almost 10 times the energy of a 32KB 4-way associative L1 access (.769nJ versus .077nJ). Accessing data from the DDR2 main memory is even more expensive, consuming almost 50 times more energy (36nJ) than accessing the 2MB L2 cache [90]. These numbers suggest that, for power efficiency, it is important to obtain as much data as possible from small caches, and to minimize accesses to main memory. I modified the simulator to keep track of memory accesses in an attempt to estimate the energy consumption of the memory hierarchy. In this model, every data memory access requires an L1 cache access, except for those from RH cache topologies that lack an L1 data cache (SL2-NoL1, for instance). Loads that missed in the L1 cache count as a read from the L1, plus a read from the L2 and a write to the L1 cache. However, due to limitations of the simulation infrastructure, it could not be determined whether a load that missed in the L1 cache was found in the L2 cache, or had to be read back from another CPU's L1 cache instead. The errors from this accounting should be minimal, as the energy requirements for an additional L1 cache access are much smaller than those required for the L2 cache access.

Stores made into the cache hierarchy were a bit harder to measure. Upon a store L1 miss (that hit in the L2 cache), the energy of an L1 write was added, as well as an L2 read (as the data must be read back from the L2 cache). The simulator also counted the number of L1 writebacks, and counted each of these as an L2 write. The simulator counted load and store main-memory requests to the DDR2 memory directly at the memory controller. Using these values, I calculated the total energy consumption of the memory hierarchy for each cache topology. These energy consumption values do not take into account the energy required to transmit data across the chip, and instead only focus on the access energy. Although this study does not measure all of the energy consumption in the cache topologies, the estimates are more than adequate for showing the general trend in energy consumption for each of the topologies. Because each cache topology executed a different number of equivalent instructions, the energy consumption of each hierarchy was normalized relative to that required to execute two billion cycles of the SW-only version of Xvid. Figure 6.12 shows the total dynamic energy consumption of the memory hierarchy for the full hybrid Xvid benchmark (CPU and RH) for each cache topology. Examining this figure shows that most of the topologies (with the exception of NoL2-PL1, which required a tremendous number of memory requests) used less dynamic memory energy than the software when performing the same work. This is due in large part to the greatly reduced number of requests to the L1 cache. It also shows that the RH should access an L1 cache, or even a spatial locality buffer, if only to reduce the number of L2 cache accesses required. In this case, the SL2-NoL1 topology used 20% more memory hierarchy energy to perform the same work than the SL2-PL1 topology. Although some of this is due to the better performance of SL2-PL1, the majority of this increase is from the increased number of L2 cache accesses. These results suggest that an RH cache topology should contain an L1 cache to limit the dynamic energy used in the memory hierarchy. Having a 32KB L1 cache attached to the RH reduced the dynamic energy consumption of the memory hierarchy by 20%. Although this is nowhere near the amount that would be saved by a SW-only application (not having an L1 cache would greatly increase the energy consumption of such an application's execution), it is still a significant figure. Because an L1 cache is fairly small in comparison to an L2 cache (and because the RH does not need as large an L1 cache), the RH kernels examined in this thesis should not directly access an L2 cache. Even when the kernels' L1 cache hit rates are low, as is the case for many of the RH kernels, an L1 cache reduces the overall dynamic power of the system.
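The accounting described above amounts to multiplying event counts by fixed per-access energies. The sketch below restates that model; the per-access values are the Cacti and DDR2 figures quoted in this section, while the counter names and structure are invented for illustration.

```c
#include <stdint.h>

/* Per-access energies (nJ) quoted in the text: 32KB L1, 2MB L2, DDR2. */
#define E_L1_NJ   0.077
#define E_L2_NJ   0.769
#define E_MEM_NJ 36.0

/* Event counts gathered by the simulator for one cache topology. */
struct mem_counters {
    uint64_t l1_reads, l1_writes;  /* every load/store touches the L1 first */
    uint64_t l1_load_misses;       /* extra: L2 read plus L1 fill (write)   */
    uint64_t l1_store_misses;      /* extra: L2 read to fetch the line back */
    uint64_t l1_writebacks;        /* each counted as an L2 write           */
    uint64_t mem_loads, mem_stores;/* DDR2 requests seen at the controller  */
};

/* Dynamic access energy of the memory hierarchy, in nanojoules, following
 * the accounting rules in the text (interconnect energy is ignored). */
static double memory_energy_nj(const struct mem_counters *c)
{
    double e = 0.0;
    e += (double)(c->l1_reads + c->l1_writes) * E_L1_NJ;
    e += (double)c->l1_load_misses  * (E_L2_NJ + E_L1_NJ);
    e += (double)c->l1_store_misses * E_L2_NJ;
    e += (double)c->l1_writebacks   * E_L2_NJ;
    e += (double)(c->mem_loads + c->mem_stores) * E_MEM_NJ;
    return e;
}
```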

6.6 Reading Data Directly From The RH Kernels

Although the shared memory access for CPU-RH communication works well for long-latency data transfers initiated by the RH kernels, this is not always the best way to communicate data between the CPUs and RH kernels. As described in Chapter 3.5, the CPU can issue commands and requests to the RH kernels using an on-chip data network. This allows for relatively low-latency communication between the CPU and RH when transferring small amounts of data between them. In examining the RH kernels, I realized that this network would also be ideal for reading back single data values from an RH kernel.

Figure 6.13 Performance improvements of SAD kernels when the CPU directly reads back values from them.

For instance, the SAD8 and SAD16 kernels used in the Xvid application (described in Chapter 5.1) read in multiple words of data (suitable for submitting memory requests directly to main memory) but produce only a single word of data as a result. Writing a single word to a cache-coherent memory hierarchy is relatively expensive (in terms of latency), especially if the data is being written to a cache line that is also present in another cache (this is likely because this data is used by both the RH kernel and the application software). Additionally, reading back this data from the software requires additional cache coherency transitions, as the RH's cache will contain an exclusive copy of the data. These latencies can greatly reduce the absolute performance of the SAD kernels. Therefore, I extended the pure shared-memory communication model to allow RH kernels to use an alternate form of communication with their host CPUs. I implemented this by attaching a single word of memory to each virtual RH kernel, and associating this buffer with the virtual kernel's memory segment. When an application reads back that the RH kernel is done executing, the application issues a memory request to read back this word of data, bypassing the cache coherency transitions that would have been required had the data been written into the cache hierarchy. This mechanism is currently only used in the SAD8 and SAD16 kernels. All of the other kernels generate a larger amount of data, and therefore write it straight to the cache hierarchy. To test the impact of bypassing the shared memory subsystem in these two kernels, the Xvid benchmark was run again on all of the tested topologies with the SAD kernels directly reading back the resultant data. These simulations used the exact same checkpoints as the original runs, so the data can be directly compared to the tests where the CPUs had to read back the result values from the memory subsystem.

Figure 6.14 Overall Xvid performance improvements when the CPU directly reads back values from the SAD kernels.

Figure 6.13 shows the performance improvement seen in the SAD kernels when data was directly transferred. The direct performance improvements in these RH kernels are only part of the overall improvement in system performance. The non-kernel portions of the application are also able to execute faster, further speeding up the application. Examining these numbers shows relatively moderate speedups on shared cache topologies, and much larger improvements on split topologies. This is because the cache transitions required to transfer data are much more expensive on split cache topologies; reducing this latency greatly reduces the kernels' runtimes. Figure 6.14 shows the overall application performance improvement when the SAD kernels could directly transfer data to the CPUs. As expected, SL1 has almost identical performance in both cases (the cost of having to access the shared data from the CPU's own L1 cache is negligible). For most of the SL2 topologies, performance gains are ∼2%. On split topologies this increases dramatically, though still not enough to offset the drop in performance seen when using a split cache topology. Because of this, it is recommended that kernel designers examine their kernels and, when appropriate, allow the applications to read singleton values directly from the RH kernels, bypassing the shared cache. This is particularly important when using "short" kernels that only execute for 100s of cycles. In the case of SAD8 and SAD16, the cache coherency transitions can, in certain instances, dominate the runtime of the kernels. Avoiding this overhead is highly advisable, and this method allows the software to directly access the RH kernels with a minimal change to the programming model, and no changes to the data path that already exists between the RH kernels and the CPU(s).
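As an illustration of this programming model, the sketch below mimics the spin-wait and direct result read-back described in this section. The VirtualKernel class and its methods are hypothetical stand-ins for the memory-mapped virtual-kernel interface; they are not the thesis's actual API.

```python
# Illustrative sketch of the direct read-back path for single-word results
# such as SAD. The real interface is memory-mapped rather than an object API.

class VirtualKernel:
    """Models one virtual RH kernel with a one-word result buffer."""
    def __init__(self, latency_polls=3):
        self._polls_left = latency_polls
        self._result_word = None
    def start(self, a, b):
        # The RH kernel produces a single-word result (here, a SAD value).
        self._result_word = sum(abs(x - y) for x, y in zip(a, b))
    def done(self):
        # Each call models one kernel-done query over the on-chip network.
        self._polls_left -= 1
        return self._polls_left <= 0
    def read_result_word(self):
        # Read the word attached to the virtual kernel's memory segment,
        # bypassing the cache-coherent hierarchy (no coherence transitions).
        return self._result_word

kernel = VirtualKernel()
kernel.start([1, 5, 9], [2, 3, 4])
while not kernel.done():          # spin-wait, as the software does in the thesis
    pass
print(kernel.read_result_word())  # -> 8
```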

6.7 Conclusion

This chapter examined the difference in memory accesses between RH kernels and general-purpose processors. Using this knowledge, I examined the performance impact of the cache topology on hybrid RH/SW applications. Although the performance of some of the applications was relatively unaffected by the cache topology chosen, the performance of Xvid changed greatly based on the cache topology used. In the examination of the cache topologies, I showed that using a topology containing a shared L2 cache was key to obtaining the best performance. This is due to a combination of the fact that the SW-only aspects of the application could make better usage of the L2 cache than the RH kernels, and the much smaller latencies incurred when accessing data in a shared L2 cache. I also showed that the performance of a hybrid RH/SW application did not decrease significantly as the size of the RH's L1 cache was reduced. Topologies where the RH connected to a 2KB spatial locality buffer performed similarly to those with a full 32KB L1 cache. However, despite the fact that the RH kernels did not need an L1 cache to obtain similar performance to topologies containing one, the L1 cache was important, as it reduced the overall energy consumption of the memory hierarchy. It therefore seems to be a fair tradeoff to connect a reasonably sized L1 cache to the RH controller. This chapter also demonstrated that some of the applications can benefit from being able to read data directly back from the RH kernel. Bypassing the cache-coherent memory hierarchy altogether for some memory accesses provides significant performance improvements to some of the RH kernels. Although these benefits are greatest on cache topologies containing private L2 caches (or where the RH does not connect to an L2 cache), a 2% application performance improvement was still obtained on a cache topology with a shared L2 cache and private (32KB) L1 caches. Because of these observations, I use the SL2-PL1 cache topology in the rest of the work performed in this thesis. If an application is likely to perform its best on this topology on single-processor systems, it is reasonable to assume that this topology would also be the best on a multiprocessor system. Additionally, throughout the rest of this thesis, the Xvid benchmark will always use the on-chip CPU-RH network to read back data from the SAD8 and SAD16 kernels, bypassing the cache hierarchy altogether.

Chapter 7

Scaling to Multicore Systems

This chapter builds upon the work in Chapter 6, extending the model of reconfigurable computing to devices that contain multicore processors. This chapter first examines how the memory subsystem's performance scales as more processors share the RH fabric. It then examines tradeoffs in the memory subsystem design, including the ability of the RH controller's TLB to translate memory addresses on multicore platforms. The chapter then examines how the performance of the RH coprocessor's memory interface scales when an increasing number of CPU cores shares the RH coprocessor. Once the performance of hybrid RH/SW applications on this multicore system is established, this chapter examines the performance of software-only applications that execute alongside multiple hybrid RH/SW applications. Hybrid systems will likely need to execute legacy software-only applications. Furthermore, even though the runtime of an application may be dominated by one or more kernels in the original software version, when accelerated by RH, these kernels take less time to execute, and thus represent a smaller percentage of the accelerated application's execution time. Thus it is important to know to what degree the RH kernels' memory requests hamper the performance of software code. In future systems, an RH-aware OS scheduler could even use this information when deciding both which applications to run simultaneously and how much CPU time should be allocated to each application [42]. It is anticipated that software code will perform worse when executed alongside hybrid RH/SW applications than when executed only with other software code, because accelerated kernels require accelerated access to memory, potentially evicting other applications' data from shared caches. I examine the impact of hybrid RH/SW applications on software-only applications in Chapter 7.3.2, and then propose and evaluate a mechanism to limit the impact that high-bandwidth streaming RH kernels have on system performance.

7.1 Prior Work

The best way to couple multiple processor cores with an on-chip reconfigurable hardware (RH) coprocessor has not been examined extensively. Watkins et al. [124] proposed a reconfigurable multicore architecture that shared a specialized programmable logic (SPL) fabric between multiple cores. However, in this work, cores communicated with the SPLs using dedicated FIFOs, and did not directly share memory. Although this work discussed multiple ways to use their communication primitives, it did not discuss the speed of their interface, or how the interface scaled with the number of processor cores.

Benchmark   two-core workloads (12)       four-core workloads (7)   eight-core workloads (7)
Xvid        2 0 1 1 1 0 0 0 0 1 0 0       4 0 2 1 3 0 1             8 0 2 4 7 3 0
AES         0 2 0 1 0 1 1 0 1 0 0 0       0 4 2 1 0 3 1             0 8 4 4 0 2 7
Tremor      0 0 0 0 1 0 1 1 0 0 1 0       0 0 0 1 0 0 0             0 0 1 0 0 1 0
WiMaX       0 0 1 0 0 1 0 1 0 0 0 1       0 0 0 1 0 0 1             0 0 1 0 0 1 0
SW-only     0 0 0 0 0 0 0 0 1 1 1 1       0 0 0 0 1 1 1             0 0 0 0 1 1 1

Table 7.1 A listing of the benchmarks executed in each workload for two-, four-, and eight-core systems.

Chen et al. [26] propose a multicore reconfigurable ISA, where a separate RH unit is directly attached to each processor core. The dual-core system thus acts like two separate reconfigurable processors, so the communication between a processor and the RH is no different than on a single-core system. Similarly, Williams et al. [128] and Syed et al. [113] created a multicore reconfigurable computing system that used the message passing interface (MPI) to facilitate communication between processor cores. This work implemented an MPI system on a reconfigurable fabric; however, it did not examine the use of MPI from an RH accelerator (although an RH kernel could technically use the interface). Additionally, the processors did not share the RH fabric, and the system allocated the RH fabric statically at design time. Yan et al. [132] also examined a reconfigurable multicore processor, in which multiple reconfigurable processor units are combined with multiple processor cores. However, communication is handled using local buffers, and no data is given on the efficiency of their communication infrastructure.

7.2 Application Workloads

This chapter uses a combination of the hybrid RH/SW and the SW-only benchmarks described in Chapter 5.1 to create workloads that will run on two-, four-, and eight-core systems. Each workload contains as many benchmarks as there are processor cores to create realistic workloads for high-performance embedded systems. I simulated these workloads to determine how the system's performance scales with an increasing number of processors. Table 7.1 shows the number of copies of each hybrid benchmark that were in each of the workloads that I tested. Cells with a zero in them imply that no copy of that benchmark was executed. Columns that include the SW-only benchmarks indicate that I created three workloads, each containing one of the SW-only benchmarks, with all of these workloads also including the listed hybrid RH/SW benchmarks. I refer to workloads by the number of copies of each benchmark followed by the first letter of the benchmark, with a hyphen to separate the benchmarks. Therefore 2X-4A-1T-1W refers to a workload containing two copies of Xvid, four copies of AES, one copy of Tremor, and one copy of WiMaX.
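As a small illustration of this naming convention, the hypothetical helper below builds a workload name from benchmark counts; it is not part of the simulation infrastructure.

```python
# Illustrative helper for the workload-naming convention described above:
# counts of each benchmark, abbreviated to its first letter, joined with
# hyphens. Zero-count benchmarks are omitted from the name.

def workload_name(counts):
    """counts: ordered mapping of benchmark name -> number of copies."""
    parts = ["%d%s" % (n, bench[0].upper())
             for bench, n in counts.items() if n > 0]
    return "-".join(parts)

print(workload_name({"Xvid": 2, "AES": 4, "Tremor": 1, "WiMaX": 1}))
# -> "2X-4A-1T-1W"
```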

Description                        Area (mm2)   Read Energy (nJ)
32-entry fully-associative TLB     .031         .015
64-entry fully-associative TLB     .053         .025
128-entry fully-associative TLB    .101         .050
256-entry fully-associative TLB    .186         .090
32KB 4-way associative L1 cache    .550         .057

Table 7.2 Area and read energy for different TLB and cache configurations

7.3 Results

Because this work's goal is to determine how the performance of our reconfigurable computing platform scales with an increasing number of RH kernels, I attempted to maximize the impact of the RH kernels on the memory subsystem. To achieve this, my system has sufficient RH tiles available to fully accelerate every RH kernel requested by the workloads. In later chapters of this thesis, I will examine how the OS can make better use of limited RH resources on multicore systems. The workloads examined in this chapter are all executed for two billion processor cycles on each of the different system setups.

7.3.1 Hardware TLB Miss Handler

In [43] I proposed a method for handling TLB misses in the RH controller. This method involved interrupting the processor to obtain virtual memory address translations, and is described in Chapter 4.5. In that work, whenever a memory request's virtual address could not be found in the RH kernel's TLB, the RH controller would interrupt the CPU to obtain a translation. However, this work was originally performed on a single-processor platform, and thus it was unclear how this method would scale to multiprocessor systems. When I extended my platform to multicore processors, some workloads had extremely poor performance due to TLB thrashing. In examining the behavior, I saw that interrupting a CPU to provide an address translation is a relatively expensive operation, taking about 400 cycles on most workloads (in workloads with many TLB misses, this number is reduced due to caching effects). This is an order of magnitude longer than the time it takes to handle software TLB misses on the platform (approximately 20-40 cycles). Because of the problems associated with TLB thrashing, I found that the sizing and fill policy of the RH controller's TLB can be very important to the performance of hybrid RH/SW applications.


Figure 7.1 Workload speedup using proposed HW-based TLB miss handler instead of interrupting the CPU

Although larger TLBs may avoid thrashing on these specific workloads, larger TLBs are more expensive, and future application mixes could still cause TLB thrashing. To demonstrate the tradeoffs involved in selecting different sized TLBs, I used Cacti 5.3 [114] to model the caches and TLBs (at 65nm) used in the system. Table 7.2 shows both the read energy and area of various sized TLBs as well as the 32KB L1 cache for comparison. This table shows that a 256-entry fully-associative TLB is 6 times bigger, and consumes 6 times more energy per access, than a 32-entry fully-associative TLB. Furthermore, a lookup in a 256-entry TLB requires 1.58x the energy of an L1 cache (32KB, 4-way associative) access, and occupies 34% of the area of the 32KB, 4-way associative L1 data cache. These results indicate that increasing the TLB size is not a good solution to prevent TLB thrashing. Instead of simply increasing the size of the RH controller's TLB, I added a hardware TLB miss handler to the RH controller to improve the performance of the RH controller's memory subsystem. On the simulated UltraSparc platform, I modeled a hardware structure that performs the (already existing) translation storage buffer (TSB) lookup from memory. The TSB is a software construct resembling a very large direct-mapped TLB. The simple structure of the TSB makes the hardware needed to calculate the lookup address very simple. The TSB can be up to 1MB in size, allowing it to hold 64K entries, enough to cover 512MB of an application's virtual address space. In the (relatively rare) event that the address cannot be translated by the TSB, the RH controller interrupts the CPU to perform the address translation. Although TSBs are not present on many other computer ISAs, hardware pagetable walks can be implemented on most of these platforms [68, 64]. By implementing the hardware TSB lookup, I reduced the average latency of a TLB miss to between 10 and 40 cycles, depending on the workload (workloads with higher TLB hit rates tend to take longer to perform a TSB translation and vice-versa). All of the results reported (including those already reported in Chapter 6) use the hardware TSB miss handler. Because the hardware TSB lookups have a much greater impact on the performance of multicore systems, I do not examine the performance advantage of using the hardware TLB miss handler on single-processor systems. Figure 7.1 shows the performance advantage of using the HW TLB miss handler for systems containing TLBs of various sizes across all of the hybrid RH/SW workloads. Workloads that used the new HW TLB miss handler always outperformed those executing on a system that used the software-based interrupt routine. Workloads containing WiMaX in particular performed better with a HW TLB miss handler. This is because WiMaX is a multithreaded application, and the OS must regularly swap out WiMaX threads. When this happens, the OS flushes the RH controller's TLB, which in turn causes a sequence of compulsory TLB misses. Many workloads that contained the WiMaX benchmark saw no benefit at all from increasing the size of the TLB because the OS would flush the TLB so frequently that it could never be filled. A couple of the workloads presented had extremely poor performance when not using the hardware TLB miss handler. In particular, the 4X workload performed poorly on a system with a 32-entry TLB, and the 8X workload performed very poorly when executing on a system with a 64-entry TLB.
In the latter case, the performance was so bad that the Xvid applications actually performed worse than the SW-only baseline. In these two cases significant TLB thrashing occurred. By enabling the hardware TSB lookup mechanism, the applications ran much closer to their theoretical maximum performance, as will be shown in the next section. For most workloads, the speedup from using the hardware TSB lookup mechanism ranged between 1% and 15%. Because the hardware TSB lookup is a relatively minor addition to the reconfigurable computing platform proposed in this work, it is highly recommended that future reconfigurable computing systems in which RH kernels directly access virtual memory have a fast method to translate memory addresses, relying on the OS to perform the translation only in rare circumstances.
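The following sketch shows the kind of address calculation the hardware TSB walker performs, assuming a sun4u-style direct-mapped TSB of 16-byte tag/data entries and 8KB base pages. The constants, simplified tag comparison, and dictionary-based memory model are illustrative assumptions, not details of the simulated platform.

```python
# Sketch of the hardware TSB lookup performed by the RH controller's TLB
# miss handler. Assumes a direct-mapped TSB of 16-byte entries (an 8-byte
# tag followed by 8-byte translation data), indexed by the low-order bits
# of the virtual page number.

PAGE_SHIFT = 13      # 8KB base pages
ENTRY_BYTES = 16     # tag + data per TSB entry

def tsb_entry_addr(tsb_base, num_entries, vaddr):
    """Physical address of the direct-mapped TSB entry for vaddr."""
    index = (vaddr >> PAGE_SHIFT) & (num_entries - 1)   # power-of-two size
    return tsb_base + index * ENTRY_BYTES

def hw_tsb_lookup(memory, tsb_base, num_entries, vaddr):
    """Return translation data on a tag match, or None to fall back on the OS."""
    entry = tsb_entry_addr(tsb_base, num_entries, vaddr)
    tag = memory.get(entry)                  # 8-byte tag word
    if tag == vaddr >> PAGE_SHIFT:           # simplified tag comparison
        return memory.get(entry + 8)         # translation data (PA + permissions)
    return None                              # TSB miss: interrupt the CPU

# Tiny usage example: one translation cached in a 512-entry TSB at 0x10000.
memory = {tsb_entry_addr(0x10000, 512, 0xDEAD6000): 0xDEAD6000 >> PAGE_SHIFT,
          tsb_entry_addr(0x10000, 512, 0xDEAD6000) + 8: 0x42000}
print(hex(hw_tsb_lookup(memory, 0x10000, 512, 0xDEAD6000)))  # -> 0x42000
```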

7.3.2 Multiprocessor Performance

This section determines how efficient the memory subsystem in the reconfigurable computing platform outlined in Chapter 3 is. In particular, it compares the platform against one containing an ideal RH memory subsystem. In the platform containing an ideal RH memory subsystem, all RH kernel memory requests (including those generated by the HW TLB miss handler) were serviced in a single processor clock cycle. The ideal platform also contained a very large TLB, and used the hardware TLB miss handler described in Chapter 7.3.1. On the ideal platform, each time an RH kernel issues a memory request, a prefetch for the data is also issued to the memory subsystem. Although the data is immediately written to the RH kernel, these prefetch requests operate like normal cache requests. Therefore, these prefetch requests acted to preload data that might be used by the CPU at a later time.


Figure 7.2 The "efficiency" of the hybrid RH/SW workloads on the proposed reconfigurable computing platform

This is useful because most of the hybrid RH/SW benchmarks communicate with their host processor using shared memory, and data written by an RH kernel is likely to be read by the processor in the near future. Issuing these prefetches helps to ensure that the SW portions of hybrid applications have good performance. In this ideal platform's baseline, the performance of the CPUs' memory hierarchy is not ideal, and CPU-initiated memory requests (including those needed to access the RH controller) have their usual latency. It is important to note that this "ideal" system still observes the same latency when initiating requests with the RH kernels. All communication over the network described in Chapter 3.5 occurs at the same speed for both the normal and ideal systems. I used the performance of this "ideal" workload to calculate the "efficiency" of the workload, which is the ratio of the actual performance of the workload to that of the workload running on the "ideal" memory system. Equation 7.1 shows how I computed this value.

\text{Efficiency} = \frac{\text{NEIPC}_{\text{workload}}}{\text{NEIPC}_{\text{ideal}}} \qquad (7.1)

Figure 7.2 shows the efficiency of the hybrid workloads running on the RH computing platform. Performance of the RH system described in Chapter 3 was very good, approaching that of a system where the RH has immediate access to the memory hierarchy. Most of the workloads obtained between 95% and 99.9% of the ideal performance limit for this shared memory organization, with the only exception being the 8A workload. When examining the 8A workload, it becomes apparent that the bottleneck was not the RH controller's memory subsystem, but rather the inability of the chip's memory subsystem to supply the RH kernels with data. This issue would occur on any RH computing platform that uses the same main-memory structure. To overcome this slowdown in performance, the system would need a higher-performance memory subsystem with potentially lower main-memory latency, greater bandwidth, and/or a more-optimized memory controller.


Figure 7.3 SW-only benchmark "efficiency" when run alongside hybrid RH/SW workloads

However, in this study, the 8A workload was intended to represent peak streaming capability, and was not necessarily representative of the data patterns observed in real workloads. With the exception of workloads containing only AES benchmarks, no further degradation of performance was seen as the number of processors was increased from two to eight. This suggests that the RH memory interface can scale to at least eight CPU cores without a problem, illustrating that the same RH controller used on a single-processor system can also be used on multicore systems.

7.3.3 Impact on Software-Only Applications

The final multicore study of this chapter examines the impact that hybrid applications have when run alongside traditional SW-only applications. The performance of these SW-only applications is important not only on legacy systems that are executing hybrid applications alongside SW-only applications, but also on systems that exclusively run hybrid applications. This is because non-kernel code often dominates the runtime of accelerated hybrid applications. To determine this impact, I calculated the EIPC of each SW-only benchmark both when executing alongside fully accelerated hybrid RH/SW applications, and when the hybrid applications executed entirely in software. The ratio of these two EIPC values represents the efficiency of the software-only benchmarks. Unlike previous experiments, these tests do not use the normalized EIPC values when comparing the speed of the SW-only applications.

The primary reason for using the EIPC for measuring performance is the behavior of the WiMaX hybrid RH/SW benchmark. In calculating the NEIPC value, the overhead of the OS is distributed equally amongst all applications. This works well when applications are given different time slices by the OS, and when the vast majority of the time spent in the OS is "overhead" that should be charged to the system as a whole. However, in the case of the WiMaX benchmark, this assumption no longer holds true. Due to the multithreaded nature of this benchmark, much time is spent swapping threads and handling some of the synchronization between the threads. This overhead can be directly attributed to the WiMaX application's execution, and therefore should not be evenly distributed amongst all of the applications. If I distributed this overhead amongst all of the applications (as the NEIPC metric does), it would appear that every application on the system performed worse simply because the WiMaX application spent more time in the OS. All of the SW-only benchmarks executing alongside WiMaX executed for approximately the same number of cycles regardless of whether WiMaX was accelerated. However, the calculated NEIPC value when executing a RH-accelerated version of WiMaX was much lower due to WiMaX spending more time in the OS. To avoid this problem, I use the EIPC metric to measure the performance of the SW-only application in these workloads. This provides a more accurate evaluation of the SW-only applications' performance. Figure 7.3 shows the efficiency of the SW-only workloads run alongside hybrid RH/SW applications. The SW-only applications perform worse when run alongside hybrid RH/SW applications. Unsurprisingly, these SW-only workloads performed even worse as the number of hybrid RH/SW applications executing on the system increased. In particular, SW-only applications perform worse when run alongside workloads containing the AES benchmark. The AES benchmark continuously reads and writes large blocks of memory, overwriting potentially useful data located in the L2 cache. When accelerated, the AES application reads and writes significantly more data in a given span of time, exacerbating L2 cache evictions. Because the AES kernel has little temporal cache locality, it is not useful for the processor to cache the data that the AES kernel accesses in the L2 cache. Therefore, an extension to the reconfigurable computing system was made that allows input or output streams from an RH kernel to declare themselves to be "streaming," with no temporal locality. The system marks all cache lines accessed from these streams with a flag that tells the cache controller to evict those lines first if an eviction is necessary. This modification is relatively simple, requiring only one extra bit to be sent to the cache. When this bit is set, the cache does not update the least-recently-used state of the set, causing the cache to treat the line as the oldest line in its associativity set, and thus the one that should be evicted if needed. Applications can inform the RH controller that an input or output stream is "non-cacheable" by marking a field in the virtual kernel's memory segment. Only the AES kernel uses the non-cacheable flag, because all of the other kernels have some form of temporal locality, and thus their performance would be degraded by using this flag.
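A minimal model of the streaming hint is sketched below: when the bit is set, the LRU state of the touched set is left unchanged, so the line remains the preferred victim. This is a simplified stand-in for the simulated cache controller, not its actual implementation.

```python
# Simplified set-associative cache set with the one-bit "streaming" hint
# described above. Streaming accesses never become most-recently-used, so
# streamed lines are evicted before lines with temporal locality.

class CacheSet:
    def __init__(self, ways):
        self.lines = [None] * ways          # tags; index 0 = LRU, last = MRU
    def access(self, tag, streaming=False):
        if tag in self.lines:
            if not streaming:               # hit: promote to MRU unless streaming
                self.lines.remove(tag)
                self.lines.append(tag)
            return True
        self.lines.pop(0)                   # miss: evict the LRU line
        if streaming:
            self.lines.insert(0, tag)       # fill as LRU: first to be evicted
        else:
            self.lines.append(tag)          # fill as MRU
        return False

s = CacheSet(ways=2)
s.access("A")                    # normal fill
s.access("B", streaming=True)    # streaming fill, left as the LRU line
s.access("C")                    # evicts "B", preserving "A"
print(s.lines)                   # -> ['A', 'C']
```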
General-purpose processors often contain special prefetch instructions that inform the processor that the data has little temporal locality and can be flushed. Unlike prefetch instructions in traditional computer systems, the streaming flag in the RH coprocessor is associated with all loads from a given stream and does not require additional instructions that could potentially slow down execution. Additionally, because RH coprocessors have accelerated memory accesses, the usage of a streaming flag is even more important than on general-purpose processors.

Figure 7.4 shows the performance improvement of the SW-only thread when the AES kernel's memory accesses are non-cacheable. Using this flag greatly increased the performance of SW-only applications; for example, in the case of SW benchmarks running alongside seven copies of RH-accelerated AES, performance improved by almost 30%. In some cases this was actually better than the performance of the SW-only benchmark when the AES applications executed only in software, where AES would still cause unnecessary evictions, but at a slower rate than the accelerated version. In this graph, the 3X-2A-1T-1W workload seems strange when mcf is running. At first glance, it appears that enabling the non-cacheable flag actually decreased the workload's performance, as mcf performed worse than when this flag was not enabled. However, the reason for this can be explained by examining the performance of the hybrid RH/SW applications in the workload, which can be seen in Figure 7.5. In the case of the 3X-2A-1T-1W workload running with mcf, the performance of the hybrid applications increased by over 2% due to the superior performance of the hybrid Xvid applications. Because the Xvid applications could perform better, more pressure was put on main memory, increasing latency. Additionally, the Xvid applications used more of the processor's shared L2 cache. Another advantage of using the non-cacheable flag is that it did not adversely impact the performance of the AES benchmarks. In all of the runs, AES had nearly identical performance regardless of whether its data was cached. I also observed that the streaming flag accelerated, to a lesser degree, the other hybrid RH/SW applications executing on the system. These performance numbers show that application developers should consider the impact of their RH kernels on other applications and, if possible, annotate RH kernel data streams as streaming or non-streaming, as appropriate. Doing this can, in the extreme case, improve the performance of SW-only applications by as much as 30%.

7.4 Scalability Conclusions

This chapter examined how the performance of my proposed reconfigurable computing system scales with the number of processor cores using it. It also examined some of the issues encountered when scaling to eight processor cores. The results presented in this chapter have shown that although using the OS to obtain TLB translations was sufficient on a single-processor system, hardware translation techniques are necessary on multicore systems to obtain high performance. When using hardware to perform the TLB translations, the workloads executing on the system performed quite well. Even on eight-CPU systems, most of our workloads performed within 95% of their ideal performance. This shows that the shared memory model used in our RH coprocessor scales well to eight cores, and that the ability of RH kernels to access the memory subsystem is not a limiting factor.


Figure 7.4 Performance improvement of SW-only applications in workloads where AES is set to be non-cacheable


Figure 7.5 Performance change of the hybrid RH/SW applications when AES is set to be non-cacheable

This is important not only for our system, but also for future accelerator platforms that might use a shared memory model. Additionally, we have shown that streaming RH kernels can greatly impact the performance of other applications running on other processors in the system. However, if the loads and stores issued from these RH kernels are marked as non-cacheable, the impact of these kernels on SW-only applications can be greatly reduced. Future RH computing systems should therefore implement a policy to prevent kernels with little cache locality from overwriting useful data in the processor's cache. Overall, this chapter demonstrated the usefulness of the proposed multicore reconfigurable computing system. Our proposed coprocessor can accelerate multiple applications almost as efficiently as it can a single one. Additionally, due to the reduced rate of memory accesses from RH kernels, the memory subsystem can easily handle the added load of additional processor cores. This helps to prove the viability of our coprocessor, illustrating that a shared-memory coprocessor can have high performance, even when multiple processors are all executing short-running kernels simultaneously.

Chapter 8

Reconfigurable Computing on Simultaneous Multithreaded Processors

Previous chapters of this thesis have dealt with the design of an RH coprocessor, and examined its performance on multicore systems. Although multicore systems have become common, it is also important to examine how this shared reconfigurable coprocessor performs on other multithreaded platforms. This chapter examines the performance of SW-only applications coscheduled alongside hybrid RH/SW applications on simultaneous multithreaded (SMT) processors. Additionally, it proposes extensions to the processor to more efficiently share the processor core's resources, helping to balance the performance of hybrid threads with SW-only threads.

8.1 Prior Work

SMT processors differ from more commonly found CMP processors because an SMT system allows multiple threads of execution to share a single processor core's resources [116, 118, 36]. By sharing many of the resources contained within an out-of-order superscalar processor amongst multiple threads, greater throughput can be obtained without a major increase in the size of the processor cores [76, 116, 118, 36]. Additionally, SMT processors obtain high performance for single-threaded applications (when running in isolation), as well as good performance for multithreaded applications. In a typical superscalar processor, many of the processor core's functional units are idle at any given time. This occurs both because an application's instruction mix rarely utilizes all of the processor's functional units, and because long-latency events, such as cache misses, delay the processor from executing subsequent instructions that are dependent on the long-latency instruction. SMT processors allow additional threads to use functional units (as well as other processor resources) that are not currently being used by other instructions. Although each thread executing on an SMT processor cannot operate as fast as it could if executing alone on the processor core, the aggregate performance amongst all threads is higher. Although prior work extensively examined SMT platforms, little work has been done to combine these platforms with reconfigurable computing. Uhrig et al. [119] integrated a MOLEN-based [79] reconfigurable coprocessor on a real-time SMT system, focusing on allowing the prioritization of memory requests from the RH coprocessor. However, this work did not analyze the performance impact of sharing the processor resources between SW-only applications and hybrid RH/SW applications. Mamidi et al. [85] augmented an interleaved multithreaded processor with a reconfigurable functional unit. This work focused on software-defined radio applications, and the use of the RH fabric to accelerate tasks within them. Although it provided mechanisms for different threads to access the RH, results were only presented for single applications executing at a time. A survey [134] described the existing models for multithreaded reconfigurable computing, OS support for the architectures, and methods for scheduling tasks to the RH. Although this survey provides a broad overview, it did not examine the performance of either hybrid applications or SW-only applications coscheduled alongside hybrid applications.

8.2 Execution of Hybrid RH/SW workloads on SMT Processors

Due to the ability of SMT processors to share processor resources, they seem like an ideal match for the reconfigurable computing applications examined in this thesis. In these applications, a thread executing an RH kernel waits in a spin loop until the kernel has finished executing. On processor cores that can only execute a single thread at a time, this results in wasted processor time and under-utilized resources. Although the currently active thread could be swapped out during an RH kernel's execution, this would not be feasible during the execution of most of our RH kernels because the kernels' RH runtimes are often less than the time required for the OS to perform a full context switch [109]. Because a hybrid thread requires few of the processor's functional units during its execution, I anticipate that coscheduled threads should perform better than if the SW-only thread were scheduled alongside another SW-only thread. I configured the SMT processor used in this study exactly like the single-threaded processor core described in Chapter 3, but with two copies of the thread state (register file, etc.). In this SMT processor, both threads share the 32-entry instruction window, and each is guaranteed at least one entry in the window. The processor's fetch unit fetches new instructions from the thread containing the fewest entries in the shared instruction window. However, if only one of the threads can fetch instructions in a given cycle, that thread will be granted access to the core's fetch unit, regardless of how many entries in the instruction window it is using. This study executes each of the hybrid benchmarks alongside each of the SW-only benchmarks, examining a total of twelve different workloads. I executed each workload (both in SW-only mode as well as hybrid RH/SW mode) on both a two-threaded SMT processor and a dual-core CMP system, and use multiple different comparison metrics to relate the performance of both the coscheduled SW-only thread and the entire workload. The first measurement used will be referred to as the efficiency of the coscheduled SW-only thread, which is the ratio of the EIPC of the coscheduled SW-only thread on the SMT processor to its performance on the CMP processor1. Figure 8.1 shows the coscheduled threads' efficiency for each of the workloads, both when everything runs in software and when accelerating all of the hybrid applications' RH kernels.

1The EIPC is used when comparing single thread performance in a multithreaded workload for the same reasons given in Chapter 7.3.3.


Figure 8.1 Efficiency of the coscheduled SW-only thread's performance

The second measurement computes the overall workload efficiency by calculating the ratio of the geometric mean of the NEIPC of the workload on the SMT processor to that on the CMP processor. This measurement is shown in Figure 8.2. These graphs show an unexpected trend; despite the fact that AES spends 99.9% of its execution time in an RH kernel, SW-only threads coscheduled alongside AES perform far worse than those coscheduled alongside other applications. I expected that SW-only threads running alongside a thread in an RH kernel would have performance similar to that when the thread runs on its own private processor core, because the hybrid thread will not make use of the core's functional units. Therefore, I decided to further analyze the workloads containing the AES benchmark. Although the AES thread does not use the processor's shared execution units often, it could still occupy a large portion of the processor's shared instruction window. Figure 8.3 shows the percent of the 32-entry instruction window that, on average, the AES thread occupied on the SMT processor. Despite the fact that the AES thread only occupied about 40% of the core's instruction window entries when everything executed in software, during hybrid execution, the AES thread used nearly 90% of the core's instruction window entries. During the AES kernel's execution, many of the AES benchmark's instructions in the instruction window were loads to query whether the RH kernel had finished its execution. Each of these instructions has a large latency, waiting up to 200 cycles for the RH kernel to finish. Additionally, because the branch performance of AES was quite good, the processor rarely had to flush its instruction window. Although the fetch unit would not fetch the AES thread's instructions when the coscheduled thread could also fetch instructions, AES could quickly fill up most of the window when its coscheduled SW-only thread could not fetch instructions. Once an AES instruction was placed in the window, it was likely to sit idle in the queue for a long time. Although hybrid RH/SW applications were not using the core's functional units very often, they managed to fill up much of the processor's shared instruction window.
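For reference, the two efficiency measures used in this chapter can be sketched as follows; the EIPC and NEIPC values are assumed to come from the simulator, and the function names are illustrative.

```python
# Sketch of the per-thread efficiency (EIPC on the SMT core divided by EIPC
# on the CMP) and the workload efficiency (ratio of geometric-mean NEIPC on
# the two systems), as defined in the text.

from math import prod

def thread_efficiency(eipc_smt, eipc_cmp):
    return eipc_smt / eipc_cmp

def geomean(values):
    return prod(values) ** (1.0 / len(values))

def workload_efficiency(neipc_smt, neipc_cmp):
    """neipc_smt, neipc_cmp: per-thread NEIPC lists for the same workload."""
    return geomean(neipc_smt) / geomean(neipc_cmp)

print(workload_efficiency([0.8, 0.6], [0.9, 0.7]))  # -> ~0.87
```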


Figure 8.2 Efficiency of the overall workload’s performance


Figure 8.3 Average percent of the processor's shared instruction window occupied by the AES thread

8.3 Limiting RH thread’s Resource Usage

Although sharing an instruction window between applications makes sense for many workloads, it can result in a poor allocation of resources when applications experience a long stall event. Tullsen et al. [117] experienced a similar problem when ordinary software threads had long-latency events such as cache misses. That work examined mechanisms that flushed an application's instruction window upon suspected cache misses. In two-processor workloads they found that using this mechanism could greatly improve performance when one of the coscheduled threads experienced many cache misses. I therefore implemented a similar scheme to prevent threads executing on the RH from monopolizing shared processor resources. However, unlike detecting cache misses as examined in [117], it is trivial for the processor to detect RH kernel execution. As described in Chapter 4.4.2, applications in the system spin-wait while RH kernels execute. During this spin loop, the SW issues requests to the RH controller to determine if the RH kernel is done. The RH controller buffers these requests for up to 200 cycles, and only one request can be outstanding at a time. Therefore, these instructions are likely to be stalled in the pipeline for a long time. In this section, I examine multiple methods that help prevent threads executing in an RH kernel from monopolizing the processor core's shared instruction window. Because RH kernel execution can be detected earlier than cache misses, our stalling mechanisms elected to not fully flush the thread's pipeline as was done in Tullsen's prior work [117]. In Section 8.3.2 I will show that slowing down a thread's execution too soon can reduce the performance of RH kernels, so I elected not to fully flush stalled threads' pipelines.

8.3.1 Dynamic Fetch Stalling While Executing RH Kernels

The first method employed to improve performance of coscheduled applications on an SMT processor is dynamic fetch stalling. This policy prevents a thread from fetching instructions when it has an active kernel query request. When the outstanding kernel query returns, the hybrid thread can resume fetching instructions. Additionally, fetch stalling is disabled upon any exception (interrupt, TLB miss, etc.) to prevent starvation. Figure 8.4 shows the performance improvement (using the EIPC metric) that dynamic fetch stalling has on SW-only threads coscheduled alongside hybrid RH/SW applications. Performance of the SW-only threads was always improved by the use of this mechanism. More important than just the performance of SW-only threads, however, is overall system performance. Figure 8.5 shows the percent change in the workload's NEIPC performance when using the dynamic fetch stall mechanism. This shows that not only was the SW-only thread accelerated, but performance of the entire workload was better when using dynamic fetch stalling.
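The sketch below combines the baseline fetch-selection policy (fetch from the thread holding the fewest instruction-window entries) with dynamic fetch stalling. The thread-state fields are illustrative; the simulated processor tracks the equivalent state in hardware.

```python
# Simplified sketch of SMT fetch selection with dynamic fetch stalling.

from dataclasses import dataclass

@dataclass
class ThreadState:
    window_entries: int                      # entries held in the shared window
    can_fetch: bool = True                   # e.g. no I-cache miss this cycle
    kernel_query_outstanding: bool = False   # spin-wait load to the RH controller
    exception_pending: bool = False          # re-enables fetch to avoid starvation

def select_fetch_thread(threads):
    """Pick the thread that fetches this cycle, or None if all are stalled."""
    eligible = []
    for t in threads:
        # Dynamic fetch stalling: a thread waiting on an RH kernel query does
        # not fetch unless an exception is pending.
        if t.kernel_query_outstanding and not t.exception_pending:
            continue
        if t.can_fetch:
            eligible.append(t)
    if not eligible:
        return None
    # Baseline policy: fetch from the thread holding the fewest window entries.
    return min(eligible, key=lambda t: t.window_entries)

sw_thread = ThreadState(window_entries=20)
hybrid_thread = ThreadState(window_entries=5, kernel_query_outstanding=True)
assert select_fetch_thread([sw_thread, hybrid_thread]) is sw_thread
```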


(a) Performance improvement (b) Performance improvement of AES

Figure 8.4 Percent improvement of coscheduled benchmarks’ EIPC when using dynamic fetch stalling


(a) Performance improvement (b) Performance improvement of AES

Figure 8.5 Percent improvement of workloads' NEIPC when using dynamic fetch stalling


(a) Coscheduled thread’s efficiency (b) Workloads’ efficiency

Figure 8.6 Efficiency of coscheduled threads and workloads when using dynamic fetch stalling

Using the dynamic fetch stalling mechanism improved performance of threads coscheduled alongside AES much more than that of threads coscheduled alongside the other hybrid RH/SW benchmarks. This is due to the fact that the AES benchmark consists of a single RH kernel that is called repeatedly. Preventing this thread from occupying the core's instruction window when the AES kernel is executing can greatly improve the performance of coscheduled threads. Even for the other hybrid workloads, the performance of both the coscheduled SW-only benchmark and the workload as a whole improved when enabling dynamic fetch stalling; however, the performance improvement was nowhere near as dramatic, due to the short execution time of many of the other kernels and the (relatively) small percentage of overall execution time spent in the RH kernels. Figure 8.6 shows the efficiency of the coscheduled thread when using dynamic fetch stalling. I calculated the efficiency measurements the same way as I did in Section 8.2. When enabling dynamic fetch stalling, threads coscheduled alongside AES outperformed the same threads when AES ran purely in software. However, with the other hybrid RH/SW benchmarks, coscheduled threads sometimes performed slightly worse than when everything ran in software. This can be explained by the behavior of the RH kernels. Many of the kernels (particularly in Xvid) performed poorly when executed on an out-of-order superscalar processor. Because of this, an SMT processor could better take advantage of coscheduling another thread alongside them during those times, improving overall performance. When hybrid execution is enabled, the coscheduled thread is now more likely to be executing alongside code that performs better on out-of-order processors. This results in slightly worse performance for the coscheduled thread. Although the coscheduled thread also executes alongside the hybrid benchmark during the execution of RH kernels, this time is a relatively small portion of overall execution, and thus cannot improve performance enough to offset the difference. Additionally, many of the RH kernels in the Xvid workload execute for a very short time, so they do not execute long enough for the dynamic fetch stall mechanism to significantly impact the hybrid thread's instruction window usage.


Figure 8.7 Performance improvement of SW-only coscheduled thread when using both dynamic and static fetch stalling instead of just dynamic fetch stalling

8.3.2 Improved Fetch Stalling

Although disabling the ability of a thread to fetch new instructions while waiting for an RH kernel to execute helps the performance of coscheduled applications, it does not do enough to minimize the stalled thread's instruction window usage. In particular, threads coscheduled alongside AES only achieved ∼93% of their CMP performance. I therefore evaluated further methods to stop a stalled thread from using the instruction window. In this section, I introduce a second mechanism to help improve the performance of coscheduled threads. I refer to this new mechanism as static fetch stalling; it changes the method the processor uses to select which thread to fetch from. When using the modified fetch mechanism, the processor reduces a hybrid thread's fetch priority as soon as an RH kernel is reserved, causing SW-only threads to always have priority. This mechanism comes into effect as soon as the hybrid thread reserves an RH kernel, and does not wait until the application has started executing on the RH fabric to take effect. On its own, static fetch stalling is not particularly useful because it still allows hybrid threads to fill up the processor core's shared instruction window during RH kernel execution. However, when used in conjunction with the stalling technique described previously, it can help further improve the performance of coscheduled threads. Figure 8.7 shows the performance improvement of the coscheduled threads' EIPC when using both static and dynamic fetch stalling over just using dynamic fetch stalling.

Figure 8.8 Performance improvement of workloads when using both static and dynamic fetch stalling instead of just using dynamic fetch stalling

Performance of threads coscheduled alongside AES and Xvid improved when the system used both static and dynamic fetch stalling. Performance gains for each software-only benchmark were around 2%. For both the WiMax and Tremor benchmarks, performance of coscheduled threads does not change significantly when using static fetch stalling, because the accelerated versions of both Tremor and WiMax spend little of their execution time in RH kernels. Although static fetch stalling always improved performance of the coscheduled SW-only thread, it does not always improve the performance of the workload as a whole. Figure 8.8 shows the performance "improvements" of the workloads when using both static and dynamic fetch stalling compared to just using dynamic fetch stalling. Although the AES workloads' performance improved all around, Xvid's performance actually decreased significantly. Many of the RH kernels in Xvid execute for a very short time period; slowing their fetch means that it takes longer to start some of the RH kernels, and thus longer to complete the kernels' execution. Figure 8.9 shows the efficiency of the AES workloads when using all of the different fetch mechanisms. Using static fetch stalling improved the performance of coscheduled SW-only threads; however, it also increased the runtime of the RH kernels. This increase in the RH kernels' runtimes is relatively constant, causing it to impact short-running kernels much more than long-running kernels. With this in mind, I created a new fetch mechanism that dynamically decided whether to apply static fetch stalling.


Figure 8.9 Efficiency of thread coscheduled alongside hybrid AES for the three proposed systems

This policy used the weighted average of an RH kernel's execution time to estimate how long each call to an RH kernel will execute, and only enabled static fetch stalling when the kernel's runtime was expected to be over 1,000 cycles. Otherwise, the CPU only used dynamic fetch stalling. This new fetch stalling mechanism allowed applications coscheduled alongside applications that executed long-running RH kernels to perform at nearly the same speed as they do on a CMP processor, while hybrid applications with shorter-running RH kernels were not penalized through the use of static fetch stalling. The results from these runs are not shown. The performance of workloads containing the AES benchmark was the same as when using both dynamic and static fetch stalling. The other workloads performed the same as they did in Section 8.3.1. Although I chose the 1,000-cycle threshold arbitrarily for this policy, it managed to accelerate workloads containing long-running RH kernels, while not degrading the performance of workloads containing short-running RH kernels. A more thorough study across a wider range of workloads is needed to know exactly when the fetch priority should be reduced; however, this has been left for future work.
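A sketch of this adaptive policy is shown below, using an exponentially weighted moving average as one plausible way to form the weighted average of kernel runtimes; the 1,000-cycle threshold comes from the text, while the weighting factor is an assumption for illustration.

```python
# Adaptive decision for static fetch stalling: apply it only to kernels whose
# expected runtime exceeds a threshold; short kernels rely on dynamic fetch
# stalling alone.

STATIC_STALL_THRESHOLD = 1000   # cycles (from the text)
ALPHA = 0.25                    # weight given to the most recent run (assumed)

class KernelRuntimeEstimate:
    def __init__(self):
        self.avg_cycles = 0.0
    def update(self, observed_cycles):
        # Exponentially weighted moving average of observed kernel runtimes.
        if self.avg_cycles == 0.0:
            self.avg_cycles = float(observed_cycles)
        else:
            self.avg_cycles = (ALPHA * observed_cycles
                               + (1 - ALPHA) * self.avg_cycles)
    def use_static_fetch_stalling(self):
        # Lower the hybrid thread's fetch priority only for long-running kernels.
        return self.avg_cycles > STATIC_STALL_THRESHOLD

est = KernelRuntimeEstimate()
for cycles in (250, 300, 4000):          # observed runtimes of one RH kernel
    est.update(cycles)
print(est.avg_cycles, est.use_static_fetch_stalling())   # -> 1196.875 True
```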

8.4 Conclusions

This chapter showed that SMT platforms can very efficiently execute hybrid RH/SW threads. However, modifications to the processor are needed to prevent hybrid threads from filling up the shared instruction window. Unlike prior work that examined flushing the pipeline upon a cache miss [117], we chose to instead modify the fetch policies to prevent threads from monopolizing the shared instruction window. This was possible because we did not need to wait to predict whether an instruction was a long-latency cache miss, but rather knew when long-latency RH kernels were executing. Through the use of a new fetch policy, the performance of SW-only threads improved while not degrading hybrid RH/SW threads' performance. Future RH systems that execute long-running RH kernels should consider using SMT processors to better utilize processor resources. Using such systems can result in performance similar to CMP systems, but using a fraction of the area, and with lower power consumption. Although this chapter showed the results of coscheduling two threads together on a single-core system, it did not attempt to examine the complexities of a multicore SMT system. On such a system, the OS should be aware of which threads are likely to execute RH kernels, how long their RH kernels execute for, and what percent of hybrid execution is spent in the RH kernels. This way, the OS could better select which hybrid applications should execute together. Not using this information could result in a poor balancing of workloads. For instance, on a dual-core two-way SMT system (two cores, each executing two threads) running two copies of a SW-only application and two copies of AES, scheduling the two AES threads on the same core would likely result in worse overall performance than if each core executed an AES thread and a SW-only thread. Much future work is still needed to determine which applications should be coscheduled together, building upon the work done in [105].

Chapter 9

RH Kernel Sharing on Multicore Systems

In this thesis, I first examined how an RH kernel should communicate with the general-purpose processors in a single-processor system (Chapter 6). I then extended this model to multiprocessor systems, examining workload behavior as I increased the number of applications and processors in the system (Chapter 7). I designed these studies to examine the RH's memory subsystem. Therefore, every RH kernel was physically configured on the RH fabric to maximize the strain placed on the memory subsystem. However, in a real reconfigurable computing system, the RH will likely be a constrained resource, and it is unlikely that all of the RH kernels belonging to all of the applications can fit on the RH at one time. These constrained systems require policies to allocate the RH to the various RH kernels. One way to allocate resources is to have the OS dynamically allocate the RH resources, as discussed in Chapter 2 (and reexamined in Chapter 10). Although this method can be quite effective at selecting which RH kernels are configured, if multiple applications are executing the same RH kernels, it may not be the best method for allocating the limited RH resources. This chapter shows that sharing configured RH kernels amongst the applications results in better performance than our dynamic kernel scheduling algorithm.

9.1 Motivation and Prior Work

With multicore processors becoming prevalent in both embedded and desktop computing, application developers must make their applications multithreaded if they wish to maximize performance. Many of these multithreaded applications will exploit a type of parallelism called single program multiple data (SPMD), where each thread of execution runs the same set of operations on different data. In this scenario, rather than requiring each thread to have its own unique copy of physical RH kernels, threads could theoretically share a single set of configured RH kernels, stalling execution of a thread if it attempts to use an RH kernel that is currently in use by another thread. Doing this can increase the utilization of the RH kernels, and provide a more efficient allocation of the limited RH resources.

In addition to applications that exploit SPMD parallelism, RH kernel sharing can be useful in systems that concurrently execute multiple copies of the same application. This could happen in a network appliance encrypting/decrypting multiple streams of data, or a set-top box processing multiple video streams. In these instances, multiple copies of the same or similar processes need the same RH kernel designs. By sharing these configured resources, it is likely that similar performance could be obtained (compared to having sufficient hardware for every RH kernel) using a fraction of the RH resources.

Sharing of RH kernels would also be useful in a software-defined radio (SDR) platform [101]. SDR platforms use programmable processors (often combined with accelerators) to implement wireless algorithms. The WiMaX backend receiver described in Chapter 5.1.2 is an example of an SDR algorithm. In these systems, the processors implement a variety of different radio protocols. In some instances (WiFi tethering of a cell phone, for example), it is necessary to process multiple wireless protocols simultaneously. Because many wireless protocols use the same basic algorithmic components (FFT, Viterbi or Turbo encoders, Reed-Solomon encoding/decoding, etc.), accelerators configured for one SDR protocol might also be usable by other wireless protocols.

For most of these applications, it is unlikely that any single RH kernel makes up the majority of the application’s accelerated execution time. Therefore, at any given point in time, most RH kernels are idle. For example, consider a kernel that makes up 95% of the current phase of an application’s execution when unaccelerated. If the RH implementation of the kernel were 20x faster than the software version, the RH kernel would be idle over 50% of the time in the accelerated application. In this scenario, two threads could still make use of the RH kernel (assuming the overhead of switching the RH kernel between applications is minimal), waiting for access when the other application is using the kernel. Even in a worst-case scenario where an application always requests the RH kernel just after another application has requested it, the RH kernel would still provide an effective 10x speedup, and the application would see an overall speedup of 6.9x instead of 10.3x. In scenarios where demand for the RH kernel does not exceed its availability, however, a thread is equally likely to request the in-use RH kernel at the start of its execution as at the end of it, so the average wait time is likely to be half the RH kernel’s runtime, as I will verify in Chapter 9.3. Using this wait time, the effective kernel speedup would be closer to 13.3x, and the application speedup would be 8.2x. For kernels that account for a smaller percentage of the application’s execution time, the “penalty” for sharing an RH kernel will be even less, as the kernel is unlikely to be requested while another application is using it.
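The arithmetic above follows directly from Amdahl’s law. A minimal sketch (not from the thesis; the function and variable names are illustrative) reproduces the 10.3x, 6.9x, and 8.2x figures:

    # Minimal sketch (illustrative, not from the thesis): Amdahl's-law arithmetic
    # for a kernel covering 95% of unaccelerated runtime with a 20x RH speedup.

    def app_speedup(coverage, effective_kernel_speedup):
        """Overall application speedup when a kernel covering `coverage` of the
        unaccelerated runtime runs `effective_kernel_speedup` times faster."""
        return 1.0 / ((1.0 - coverage) + coverage / effective_kernel_speedup)

    coverage, hw_speedup = 0.95, 20.0

    alone = app_speedup(coverage, hw_speedup)           # ~10.3x, kernel never shared
    # Worst case: every call waits a full kernel execution -> effective 10x kernel.
    worst = app_speedup(coverage, hw_speedup / 2.0)     # ~6.9x
    # Typical case: average wait of half a kernel execution -> effective ~13.3x.
    typical = app_speedup(coverage, hw_speedup / 1.5)   # ~8.2x

    print(f"{alone:.1f}x {worst:.1f}x {typical:.1f}x")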
Although prior work examined the sharing of hardware resources, these works do not fully consider the impact of sharing configured RH kernels amongst applications. Chan et al. [23] describe a mechanism for sharing cryptographic coprocessors between multiple applications. User processes request access to an agent that virtualizes requests to a fixed set of cryptographic coprocessors. However, this system is specialized for encryption, and did not implement process isolation for the coprocessors’ memory accesses.

In the time since I first published a method of sharing RH kernels [44], Intel has released its Quick Assist [66] hardware abstraction layer. Quick Assist is designed to handle many of the challenges of coupling a high-performance computing system with FPGA coprocessors. The Quick Assist API provides mechanisms for sharing configured RH kernels between applications. However, Intel does not describe how applications sharing a configured RH kernel are isolated, nor does it examine the impact of “over-subscribed” kernels and how to deal with them. More recently, Sun Microsystems’ T1 and T2 processors share cryptographic coprocessors amongst multiple thread contexts on a processor core [65]. Modified cryptographic libraries provide hooks to use cryptographic accelerators on systems that support them; otherwise, a software-only version of the library executes. Each processor core has its own cryptographic coprocessor; however, these cores are multithreaded, so the hardware must support the sharing of these accelerators. Process isolation between threads is preserved by flushing any data stored in the cryptographic core whenever it has finished executing.

9.2 Sharing RH Kernels

Sharing configured RH kernels allows multiple copies of the same or similar applications to share previously configured RH resources [44]. In this chapter, I examine the impact of sharing RH kernels between applications using the Xvid benchmark. In the Xvid benchmark, no single RH kernel represents the majority of application runtime. The most dominant kernel in unaccelerated Xvid is SAD8, which accounts for ∼31% of Xvid’s runtime. If all kernels are hardware-accelerated, the percent of execution changes: the faster-running hardware version of SAD8 accounts for ∼8.6% of the accelerated Xvid’s runtime. The SAD8 accelerator is therefore idle for ∼91.4% of the time, provided it remains configured in RH.

Although the OS could elect to reconfigure the hardware for another purpose during this idle time, this is not effective in all cases, as the time required to reconfigure the RH kernels is often larger than the time between kernel calls. The execution of Xvid illustrates this point. The largest phase of Xvid’s execution is the motion estimation routine. This routine accounts for nearly half of the accelerated Xvid’s runtime and regularly calls the SAD8, Avg2, and Avg4 kernels. The time between kernel calls is short (orders of magnitude shorter than the time required to reconfigure the hardware), and the time between calls is data-dependent, making it difficult to predict which RH kernels should be scheduled onto the hardware. Therefore, to best accelerate this phase of execution, all of these RH kernels need to be configured in hardware at the same time. In other words, despite the fact that none of these kernels makes up a significant portion of Xvid’s accelerated runtime, they all must be loaded ∼50% of the time to accelerate their respective computations. Because of this, on multicore machines, an RH kernel scheduler may be forced not to load some of the RH kernels during their entire phase of execution, to leave room on the RH fabric for other kernels that the RH kernel scheduler finds more useful (even though those kernels may only be in use a small fraction of the time).

Sharing the RH kernels amongst the different threads or applications that need their functionality can dramatically improve the utilization of these kernels. For instance, if two instances of Xvid are executing, a single physical copy of each RH kernel could be shared so that both instances of Xvid can benefit from them.

9.2.1 RH Kernel Sharing Implementation

RH kernel sharing can be added to the RH computing system described in Chapter 3 by making a few simple modifications to the RH controller (I described the RH controller in Chapter 3.3). First, I modified the information that the RH controller stores about each physical RH kernel so that it keeps track of the most recent process to use it, and whether or not the physical RH kernel is currently executing. This allows the RH Controller to delay access to a kernel if it is in use by another thread. RH kernels are accessed using the mechanism described in Chapter 4.2, and applications “spin” wait when a shared RH kernel is in use by another application. I selected this mechanism because even when eight processors are in the same phase of execution, it is relatively unlikely that the spin wait time plus the execution time of the accelerated kernel would be greater than the software runtime (as shown in Chapter 9.3, the “spin” time for the most used RH kernels is rarely much larger than the RH kernels’ execution time). In the event that an application queries an RH kernel that is busy, the RH controller queues the request internally for up to 200 cycles to see if the physical RH kernel becomes available. This optimization is similar to the optimization mentioned in Chapter 4.4.2 that delays packets requesting whether an RH kernel is done. Queuing RH availability request packets also reduces the network traffic between the processor cores and the RH controller, and reduces the latency seen when the RH controller notifies an application that the RH kernel is available. When the RH Controller grants a thread access to a kernel, it sets a special reserve bit associated with the physical RH kernel (physical and virtual RH kernels are described in detail in Chapter 4). The RH kernel stays reserved until it has completed execution, at which point it signals the RH Controller to clear the reserve bit.

If the requested virtual RH kernel differs from the virtual RH kernel that previously reserved the physical RH kernel, the RH Controller clears all cached stream controller parameters. This prevents threads from accessing parameters set by other processes, helping to preserve process isolation. If the RH controller flushes the stream controller’s parameters, the executing application must reinitialize the stream parameters. This functionality is provided by modifying the routine that queries whether the RH kernel is available: instead of just indicating if the kernel is ready, it also indicates whether the stream controller parameters have been flushed. If so, the software must reinitialize the stream controller parameters.
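The bookkeeping described above is small enough to sketch in a few lines. The following is an illustrative software model only; the real RH controller is hardware, and the class and field names below are mine, not the simulator’s:

    # Illustrative sketch of the sharing bookkeeping described above.

    class PhysicalKernel:
        def __init__(self, kernel_type):
            self.kernel_type = kernel_type      # e.g. "SAD8"
            self.reserved = False               # set while a thread is executing it
            self.last_virtual_kernel = None     # most recent virtual kernel to use it
            self.stream_params_valid = False    # cached stream controller parameters

    class RHController:
        def __init__(self, physical_kernels):
            self.physical = physical_kernels

        def request(self, virtual_kernel, kernel_type):
            """Try to reserve a physical kernel of `kernel_type` for `virtual_kernel`.
            Returns (granted, params_flushed); the caller spin-waits on failure."""
            for pk in self.physical:
                if pk.kernel_type == kernel_type and not pk.reserved:
                    flushed = pk.last_virtual_kernel != virtual_kernel
                    if flushed:
                        # New owner: flush cached stream controller parameters so one
                        # process cannot observe another's settings.
                        pk.stream_params_valid = False
                    pk.reserved = True
                    pk.last_virtual_kernel = virtual_kernel
                    return True, flushed
            return False, False

        def complete(self, kernel_type, virtual_kernel):
            """Called when the RH kernel signals completion; clears the reserve bit."""
            for pk in self.physical:
                if pk.kernel_type == kernel_type and pk.last_virtual_kernel == virtual_kernel:
                    pk.reserved = False
                    return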

9.2.2 RH Kernel Sharing Test Cases

I examined six different test cases to determine the effect of RH kernel sharing. At the two extremes are the no RH and sufficient RH (sufficient) cases, which are used as baselines in this chapter.

Allocation   # RH Tiles   Description
No RH        0            All copies of Xvid execute in software
Sufficient   18/36/72     All copies of Xvid have a unique copy of all the RH kernels (2p/4p/8p)
Sharing      18           All copies of Xvid share a single copy of every RH kernel, waiting for access
Static       18           A static allocation of the RH kernels is given to all of the copies of Xvid
Dynamic      18           Every 2 ms the OS dynamically decides which RH kernels should be loaded
One          18           One of the Xvid encoders has access to the RH; the others do not

Table 9.1 Methods used to allocate the RH, the number of tiles needed for each allocation method, and a brief description of how the RH was allocated.

In no RH, multiple software-only Xvid processes are executed, and the sufficient RH case models a system where there are enough RH resources to allow every executing Xvid process to have its own unique set of physical RH kernels. In other words, this case fully accelerates all RH kernels, and configured RH resources are not shared. This case is therefore an upper bound on the performance benefit that is achievable with kernel sharing. The remaining tests in this chapter limit the RH resources to some degree. For example, in the “one accelerated” (one) simulation, the OS configures a single copy of the Xvid kernels on the RH, and these kernels are only accessible to a single instance of Xvid (other instances of Xvid execute entirely in software). The static schedule (static) case models a situation where kernels are not shared, and uses a static analysis scheduling algorithm to decide which physical RH kernels should be configured. For this case, the system contained the same number of tiles (18) required by the “one accelerated” case. However, rather than implementing one of each kernel type, the kernel scheduler described in Chapter 2.7.2 chooses which kernels will provide the best overall benefit. In addition to the static-scheduling runs, I compared the performance of the sharing system against a dynamically scheduled system (dynamic) that periodically runs the dynamic kernel scheduling algorithm (every 2 ms) [40]1. Finally, in shared kernels (shared), the OS configured a single physical copy of each RH kernel; however, all of the Xvid processes share all of these configured RH kernels. This test case used the sharing mechanisms described earlier to share the physical RH kernels. Table 9.1 provides an overview of how each experiment allocated the RH. We ran each test case on three different test platforms: one where two processors execute two instances of Xvid, one with four processors executing four instances of Xvid, and one with eight processors running eight instances of Xvid. This allows us to examine the impact of sharing RH kernels as the number of processes competing for access to the RH kernels increases.

1Although all tests containing statically configured RH kernels use the same checkpoints, simulations using dynamically scheduled kernels had to use a different checkpoint. This is because dynamically configured kernels result in a very different OS state than statically allocated kernels, leading to different execution paths. This makes it impossible to reuse the checkpoints without causing the OS and/or the RH controller to be in an inconsistent state. However, because Xvid’s runtime is relatively consistent over a long enough time span (and the checkpoints were from similar points in the applications’ execution), the differences between the two checkpoints should be minimal.


Figure 9.1 Xvid’s speedup over no RH for each of the test cases.

9.3 Sharing Performance

Figure 9.1 shows the performance of the test cases on the two-, four-, and eight-processor systems. These results show that sharing a single copy of the configured Xvid RH kernels results in system performance that is close to that of a system where each application has its own unique copy of the RH kernels, achieving ∼98.5% of sufficient RH’s performance for both two- and four-processor systems. Only on eight-processor systems did the performance degrade noticeably, obtaining only 89% of the performance of sufficient RH. In addition, all of the sharing cases outperformed the statically and dynamically allocated hardware. In particular, as the number of processor cores increased, the performance advantage over the scheduled RH cases increased. This is because none of the RH kernels are in extremely high demand, so sharing the RH kernels can provide high performance.

This figure also shows that although dynamic scheduling is quite effective when executing on a two-core system, as the system is scaled up to four and eight cores, performance suffers. I expected this, as the demand for the RH has increased while available resources have stayed constant. Although the same thing happens when sharing RH kernels (the demand for each RH kernel increases as the number of applications accessing them increases), as Figure 9.1 shows, this dropoff occurs at a later point than for dynamic scheduling. Although dynamically scheduling the RH fabric provides coarse-grain sharing of limited resources, sharing configured RH kernels exploits a more fine-grained sharing of the resources. Unlike the dynamic system, the shared system does not have to periodically perform a relatively expensive scheduling operation or suffer from the very large cost associated with reconfiguring kernels located on the RH. This allows the sharing system to outperform dynamic allocation of the RH resources. Additionally, as will be shown in Chapter 9.4, by using a few additional resources, the performance of shared RH kernels can be improved significantly.


(a) Percent of calls to a given kernel where the caller must wait for another thread’s execution of that kernel to complete. (b) When waiting, the percent of the kernel’s average runtime that the application waits.

Figure 9.2 How often there was contention for shared Xvid kernels, and how long the waiting application had to wait to obtain access.

There are many important things to take into consideration when examining the impact of RH kernel sharing. It is useful to know how often an application had to “wait” to access an RH kernel because the associated physical RH kernel was in use. Figure 9.2(a) shows the percent of calls to each RH kernel where the application had to wait at least one cycle to obtain access to the RH kernel. This graph shows that there was rarely much contention when accessing most of the RH kernels. For two- and four-processor simulations, applications had to wait ∼13% of the time they called the SAD8 kernel. All other RH kernels were immediately available over 97% of the time. In eight-processor simulations, waiting was more common due to the increased contention for all of the RH kernels. In particular, applications that wished to use the SAD8 kernel had to wait almost 40% of the time. Applications that wished to use the interpolate 8x8 6-tap and the SAD16 kernels had to wait about 15% of the time.

However, just knowing how often an application waits to access an RH kernel does not tell the entire story; it is also important to know how long the application waited to access the RH kernel. Figure 9.2(b) shows the percent of each RH kernel’s average runtime an application spent waiting when a kernel was not immediately available. This figure shows that the wait time for most RH kernels was approximately half of the RH kernel’s runtime. This makes sense, because most of the time there will be exactly two threads2 competing for access to the resource. Assuming applications call the RH “randomly”, they are equally likely to overlap during all periods of execution, resulting in a wait time of ∼50% of the kernel’s hardware execution time. The only exception to this is the interpolate 8x8 6-tap kernel. When calls to this kernel had to wait to access the RH, they tended to wait for the entire execution of the RH kernel. This is because when Xvid is in the interpolation phase of its execution, it calls this kernel hundreds of times, performing almost no calculations between subsequent calls. Therefore, if two copies of Xvid are in this phase of execution at the same time, they will continuously compete for access to the kernel. Because only one copy of Xvid can gain access to the kernel at a time, the other copy of Xvid will have to wait for the first application to finish executing the RH kernel before it can start its own execution. This situation will continue to repeat until one of the copies of Xvid finishes the interpolation phase of its execution.

The time spent spin-waiting for an RH kernel to become available is only part of the overhead associated with busy RH kernels. When RH kernels are relatively uncontested, the processor’s branch predictor will correctly predict that the RH kernel is available and will be executed. However, if there is a lot of variability in whether the RH kernel will be available, the branch predictor is more likely to mispredict that an RH kernel is not available when it really is (if the branch predictor predicts that the kernel is available when it is not, the penalty is not very great, as the pipeline flush will occur while the processor is stalled waiting for the RH kernel). The impact of these mispredictions is reduced by the fact that the RH controller does not immediately report that an RH kernel is busy, instead holding the request for up to 200 cycles to see if the physical kernel becomes available. If the RH kernel becomes available within this window, the RH controller reports that the RH kernel is available even though it was busy when the processor initially requested it.

2Although it is possible for more than one thread to be waiting for the RH kernel at the same time, in situations where the RH kernel is rarely busy, it is highly unlikely that another thread will already be waiting for the RH kernel.

Additionally, the application must “reset” any cleared stream controller parameters (if not using the default values). Combined, these stalls mean that a busy RH kernel can greatly increase the average runtime of the shared RH kernel.

When examining this data, it is clear that Xvid’s RH kernels can be shared rather effectively by two and four applications executing on multicore systems without a significant drop in system performance. However, as the number of applications (and cores) sharing the RH kernels increases, some of the RH kernels can become bottlenecks. In particular, the SAD8 kernel suffered from poor performance when more applications shared the RH kernels. When up to eight applications could be requesting the SAD8 kernel at a time, the average execution time of this RH kernel increased. Because the RH kernel took longer to execute, it now represented a larger percentage of the application’s execution time. This further increased the chance that yet another copy of Xvid would request the SAD8 kernel at the same time, further reducing performance and resulting in a SAD8 execution time that was twice that of sufficient RH’s SAD8 execution time. Counting the time spent waiting for the RH kernel to become available, the eight copies of Xvid combined spent over three billion cycles in the SAD8 kernel. This was longer than the total time that the simulations executed for (two billion cycles). SAD8’s reduction in performance (due to time spent waiting), combined with its large percentage of overall execution time, makes it responsible for the majority of the slowdown experienced when sharing RH kernels. I refer to shared kernels whose performance suffers greatly (such as SAD8) as “over-subscribed” kernels.

9.4 Kernel Pooling

Figure 9.3 Mapping virtual kernels to physical ones using kernel pools


Figure 9.4 SAD8 kernel’s speedup when varying the number of SAD8 kernels on the system, and configuring a single shared copy of all other kernels

To counter the problem of “over-subscribed” kernels, I augmented the system with kernel “pools”. Kernel pools are groups of multiple copies of physical RH kernels shared by all of the processes that have requested that type of RH kernel. I implemented kernel pools by modifying how the RH Controller allocates RH kernels. When an application attempts to access a virtual RH kernel, the RH controller first checks if any corresponding physical kernels exist on the RH and if any of them are free. If multiple physical RH kernels are available, the RH Controller attempts to reserve the physical RH kernel that the application last accessed.

Figure 9.3 depicts a set of virtual kernels and their mapping to physical kernels in the kernel pool. Virtual kernels mapped to the same physical kernel cannot use that physical kernel simultaneously. In this figure there are four virtual RH kernels and three physical RH kernels. Virtual kernel 0 is assigned to physical kernel A, and virtual kernel 1 is currently not associated with any physical kernel. Virtual kernels 2 and 3 have over time shared physical kernels B and C. Currently, however, virtual kernel 3 is assigned to physical kernel C, and virtual kernel 2 is unassigned.

I examined the benefit of kernel pools by executing the same workloads as before, but with a varying number of physical SAD8 kernels configured on the RH. Figure 9.4 shows the speedup of the SAD8 kernel (compared to the execution time of no RH) as we configured more physical copies of the SAD8 kernel. This graph shows that adding a second copy of the SAD8 kernel improves the performance of SAD8 on two- and four-processor machines (on par with the performance of sufficient RH), and greatly improves performance on eight-processor machines. On eight-processor systems, adding a second physical copy of the SAD8 kernel increases its performance from 50.3% of sufficient RH’s performance to 92.3%. Adding more copies of the SAD8 kernel on eight-processor systems brought the kernel’s performance on par with that of the sufficient RH system, showing that pooling RH kernels can increase the performance of over-subscribed kernels.
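Pool allocation only changes the lookup step: among the free physical copies of the requested kernel type, the controller prefers the copy this virtual kernel used last, so its cached stream controller parameters need not be flushed and re-initialized. A hedged sketch, reusing the illustrative model from the earlier sharing sketch:

    # Illustrative sketch of pool allocation (names are mine, not the simulator's).

    def pick_from_pool(pool, virtual_kernel):
        """`pool`: list of physical-kernel records of one type, each with
        `.reserved` and `.last_virtual_kernel` fields as in the earlier sketch.
        Returns a free physical kernel, preferring the one `virtual_kernel`
        used last, or None if every copy in the pool is busy."""
        free = [pk for pk in pool if not pk.reserved]
        if not free:
            return None                     # caller spin-waits, as before
        for pk in free:
            if pk.last_virtual_kernel == virtual_kernel:
                return pk                   # reuse: no stream-parameter flush
        return free[0]                      # otherwise take any free copy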


Figure 9.5 Xvid performance when all of the RH kernels are shared, but multiple copies of the SAD8 kernel were pooled together to decrease the demand on the RH kernel.

Figure 9.5 shows that overall Xvid performance improves when pooling multiple copies of the SAD8 kernel. On two- and four-processor systems, adding a second copy of the SAD8 kernel brought performance up to the same level as sufficient RH. On the eight-processor system, adding a second copy of SAD8 increased performance significantly, but even when eight copies of SAD8 were configured on the RH, overall performance was still only 97.5% of the sufficient RH case. Because the SAD8 kernel is only a single tile in size, the cost of configuring an extra SAD8 kernel is minimal compared to the cost of adding all of the RH kernels, making it worthwhile to configure at least two copies of this RH kernel on eight-processor systems. To further increase the performance of the eight-processor workload, the system would need to enable RH kernel pools of the interpolate 8x8 6-tap kernel, and possibly the SAD16 kernel, in addition to the pools of SAD8 kernels. Doing this would likely bring performance to the same level as sufficient RH, but would use significantly more resources than just configuring an additional one or two copies of the SAD8 kernel. However, it would still obtain significantly better performance than dynamically scheduling the RH fabric with the same number of tiles.

9.5 Kernel Sharing Conclusions

This chapter examined the allocation of limited RH fabric resources on multicore reconfigurable computing systems. In this work, I examined existing ways that the OS can allocate the RH resources (static scheduling, dynamic scheduling, and one RH), and compared their performance to a new method I developed to share configured RH resources. I showed that multiple copies of the same application can be best served by configuring a single copy of every RH kernel and sharing them amongst all the applications.

I also showed that many RH kernels can be effectively shared by multiple applications without reducing the performance of the kernels. This is because it is unlikely that another application will be using the configured RH resource when it is requested. However, in other instances, I saw that the performance of RH kernels that are in high demand suffered greatly when shared by all of the applications executing on the system. To help prevent these RH kernels from degrading system performance, I pooled multiple copies of a configured RH kernel together and shared them amongst all applications executing on the system. Through the use of kernel pooling, application performance could be greatly improved using minimal extra resources.

This work suggests that future systems that are likely to execute multiple copies of the same or similar applications should implement the mechanisms proposed in this chapter to safely and securely share configured RH kernels. Sharing these RH kernels can result in performance similar to that of having sufficient RH to fully accelerate every application, while using only a fraction of the resources.

This chapter first showed how multiple applications executing on a reconfigurable computing system can safely share multiple copies of a single physical RH kernel. The additional hardware required to support this is minimal in systems like ours that distinguish the virtual RH kernels that an application is allowed to directly access from the physical RH kernels that are implemented on the reconfigurable fabric. Because of this, only a few additional bits of storage had to be added to each virtual and physical RH kernel to support the sharing and pooling of physical RH kernels. This chapter also showed that sharing RH kernels can allow a very efficient usage of limited RH resources, even performing better than a dynamic RH kernel scheduler in the tested situations. It then examined situations where a shared RH kernel can become “over-subscribed”, limiting the performance of the application. However, this limitation on performance can easily be overcome by using pools of the RH kernels that are in high demand. Using pools of shared RH kernels can greatly improve the performance of over-subscribed kernels, bringing performance close to that of a system with sufficient RH for every requested RH kernel to have its own configured copy on the RH fabric. This work showed that for a hybrid RH/SW application containing multiple RH kernels, multiple physically configured RH kernels can be shared amongst multiple copies of the application without significantly impacting performance.

Chapter 10

Scheduling RH Kernels on Multicore Systems

Previous work examined methods to dynamically schedule RH kernels (see Chapter 2.7.2). In this prior work, the OS periodically evaluated which RH kernels should be configured on the RH fabric in order to maximize performance [40, 41, 107]. To perform this operation, the OS must read back how many times each kernel was called (whether the kernel was configured or not), and then use that information, in combination with each kernel’s software runtime and RH speedup, to determine which RH kernels should be loaded at a given time. Much of this previous work focused on selecting RH kernels for single-processor systems. However, when multicore reconfigurable computing systems used this scheduler, the RH kernel scheduler did not always select the RH kernels that maximized system performance [48]; the interdependence of RH kernels caused performance to decline.

This chapter reexamines the RH kernel scheduler proposed in [40], and observes when it selects the “wrong” RH kernels. After examining the cause of the original scheduler’s shortcomings, this chapter proposes a new hierarchical RH kernel scheduler that properly handles the interdependence of RH kernels. I examine the performance of the new hierarchical RH kernel scheduler under a variety of situations, and compare it to that of the original RH kernel scheduler.

10.1 Shortcomings of Existing RH Scheduling Methods

The knapsack solver in the earlier kernel scheduling work assumes that kernel values are both additive and independent. However, that work did not fully examine the implications of running multiple applications, each containing multiple RH kernels, on a multiprocessor system. In these situations, the real application speedup contributed by each kernel depends on which other kernels from the same application are also accelerated by the RH. The fact that kernel values are actually interdependent breaks the knapsack solver, causing it to sometimes select one kernel when another would provide more total acceleration to the system. The speedup of individual RH kernels could also be interdependent due to increased data locality, but this type of interdependence is an issue for future work. This chapter only examines the effects of kernel interdependence; therefore, the runtimes of individual kernel calls in this work are unaffected by the RH allocation, and only the rate at which the applications call kernels changes.

Application   Kernel   Num Calls   TSW      TRH     Size (tiles)   Independent Application Speedup
1             A        90          10,000   1,000   1              5.26x
1             B        20          5,000    1,000   1              1.09x
2             C        30          10,000   2,000   1              1.32x

Table 10.1 Example of two applications executing on a dual-processor system. All kernel execution times are listed as a count of CPU cycles.

Table 10.1 illustrates an example of a situation where the interdependence of RH kernels causes the original knapsack solver to select a set of RH kernels that does not maximize overall system performance. In this example, two applications execute on a system containing two RH tiles. If a scheduler used the independent application speedup value function, it would select kernel A from application 1 and kernel C from application 2, resulting in a speedup of 5.26x for application 1 and 1.32x for application 2. However, had the scheduler instead chosen both of application 1’s kernels, application 1’s speedup would be 9.09x, and application 2’s would be 1x (executing entirely in software). Accelerating kernel A causes application 1 to execute kernel B more frequently, and vice-versa. The total system speedup (calculated using the geometric mean) is higher for the second option, indicating a greater benefit to overall system performance. Although the knapsack solver exactly solved the problem it was given, the interdependent kernel values meant that the value assigned to each RH kernel did not reflect its actual value, causing the knapsack solver to select RH kernels that do not maximize performance.

Another issue with the prior scheduling approach relates to the fact that the knapsack scheduler optimizes for the sum (arithmetic mean) of the speedups the RH provides to the applications. However, a system-level scheduler may instead want to optimize for the geometric mean of application speedups, which has been suggested as a better method for aggregating overall system performance [88, 27]. The geometric mean strikes a balance between the arithmetic mean, which generally targets maximum single-application performance, and the harmonic mean, which targets minimum variance of application performance.
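The numbers in this example can be checked directly. The sketch below is my own illustration; the per-kernel coverages are derived from the table’s call counts and runtimes by assuming each application’s unaccelerated interval is 1,000,000 cycles (an assumption, not stated in the table itself):

    # Illustrative check of the Table 10.1 example.
    from math import sqrt

    def ind_speedup(p, s):
        """Independent application speedup (Equation 10.2)."""
        return 1.0 / ((1.0 - p) + p / s)

    pA, sA = 0.90, 10.0   # kernel A: 90 calls x 10,000 cycles, 10x RH speedup
    pB, sB = 0.10, 5.0    # kernel B: 20 calls x  5,000 cycles,  5x RH speedup
    pC, sC = 0.30, 5.0    # kernel C: 30 calls x 10,000 cycles,  5x RH speedup

    print(ind_speedup(pA, sA), ind_speedup(pB, sB), ind_speedup(pC, sC))
    # -> ~5.26x, ~1.09x, ~1.32x, matching the table

    # Original scheduler: kernel A for application 1, kernel C for application 2.
    geo_old = sqrt(ind_speedup(pA, sA) * ind_speedup(pC, sC))       # ~2.6x
    # Better choice: both of application 1's kernels; application 2 in software.
    app1_both = 1.0 / ((1.0 - pA - pB) + pA / sA + pB / sB)         # ~9.09x
    geo_new = sqrt(app1_both * 1.0)                                 # ~3.0x
    print(geo_old, geo_new)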

10.2 Example Problematic Scheduling Cases

In this section I compare the scheduling decisions and resulting performance of both the original scheduler and an optimal scheduler, which, for the purposes of this chapter, is defined as a scheduler that, based on the dynamically-profiled system behavior for the previous scheduling interval, selects the RH kernels that would maximize system performance for an identical interval (same applications in the same program phases). This allocation may not be optimal if the behavior in the next interval differs. However, the goal is to evaluate whether or not the scheduler selects the best solution for the given data, not to evaluate whether past behavior accurately models future behavior. Thus, for the remainder of this chapter, when the word optimal is used, it refers to a scheduler that behaves as described above.

Variable   Description
Pk         Percent of application time (for the current phase of execution) that kernel k executes for in software
Sk         Speedup that the RH implementation of kernel k has over its software implementation
PALL       Percent of the application’s software execution time (for the current phase) covered by RH kernels
TSWk       Execution time of kernel k in software
TRHk       Execution time of kernel k on the RH
Nk         Number of times kernel k was called in the previous interval

Table 10.2 Some of the variables used in this chapter.

For this study, the examined problems each represent a single scheduling interval, and the system can only re-allocate RH at the start of the interval. In this chapter, I use a simplified model of hybrid RH/SW applications to focus on the issues involved when scheduling for multicore systems. Because of this, kernels have a constant runtime both when executing in software and when executing on the RH. Additionally, the remaining SW-only portions of an application have a constant execution time regardless of what other applications are executing, or what RH kernels might be configured on the hardware. I calculated the speedups assuming that the next interval of execution has the same ratio of calls to the RH kernels, and that all of the selected RH kernels are available for the entire interval. These simplifications isolate the impact of the scheduling algorithm on performance, allowing a direct comparison between the two RH kernel schedulers. It is left to future work to examine the impact of these other factors on the scheduler’s performance. I use multiple variables in this chapter to describe the behavior of applications and their kernels; Table 10.2 lists many of the variables that will be used in this chapter.

To demonstrate situations where the original scheduler chooses a suboptimal allocation, I examine a scenario where two identical applications execute concurrently. In this scenario, each application contains two RH kernels (A and B). The RH implementation of each kernel occupies a single tile on the RH fabric, and the system contains exactly two tiles. Thus, for the given interval, the scheduler can select two of the four kernels for acceleration. For the first several examples I only evaluate situations where, in isolation, kernel A provides more application speedup than kernel B. In these examples, PA and PB represent the percent of execution time that the application spent in those kernels when executing entirely in software. Combined, they represent the percent of the application that can potentially be accelerated. SA and SB refer to the speedup of the RH implementation of these kernels over their software implementations. Note that the requirement that kernel A provides more application speedup than kernel B does not dictate that SA is greater than SB, but rather that the combination of SA and PA provides more speedup than

the combination of SB and PB. Equation 10.1 shows the condition under which kernel A provides a better speedup than kernel B.

1 - P_A + \frac{P_A}{S_A} \le 1 - P_B + \frac{P_B}{S_B}    (10.1)

For the tests performed in this section, I set the value of an RH kernel to its independent application speedup. The equation for this speedup is based on Amdahl’s law, and is given in Equation 10.2. Although there are many different ways to calculate the value of an RH kernel, these other value functions also suffer from interdependencies amongst RH kernels. Chapter 10.3 examines an alternative value function, and explains why it also fails due to interdependent kernels.

IndependentApplicationSpeedup = \frac{1}{1 - P_k + \frac{P_k}{S_k}}    (10.2)

In the examples discussed in this section, the scheduler must choose between configuring kernel A for both applications, or configuring both kernel A and kernel B for one application and neither for the other application. The scheduler will never select kernel B for both applications because it provides less speedup to the application than kernel A. In this example, the original RH scheduler will always select the first option, configuring one copy of kernel A for each application, because the calculated value of kernel A is greater than that of kernel B. Equation 10.3 expresses the condition under which it is better to only accelerate the one application; it compares the geometric means of overall application performance for each option described above. In this inequality, the left side is the geomean of system performance when accelerating a single kernel from each application, and the right side is the geomean of system performance when accelerating both kernels of a single application.

\frac{1}{1 - P_A + \frac{P_A}{S_A}} < \frac{1}{\sqrt{1 - P_A - P_B + \frac{P_A}{S_A} + \frac{P_B}{S_B}}}    (10.3)

To demonstrate when this condition can occur, PA was set equal to PB, and the minimum speedup kernel B must have (SB) for the inequality of Equation 10.3 to hold was found. Figure 10.1 shows this minimum SB as SA varies, with one curve for each tested value of PA = PB. The required SB value does not grow linearly with an increase in SA for any of the tested coverage values. As total coverage grows, the required SB also decreases for a fixed value of SA.

Figure 10.2 takes a different approach, and fixes PB at 30% while varying both PA and SA. The smallest SB needed for the original scheduler to select a non-optimal set of RH kernels is found for each PA and SA combination. This graph shows that if PB is constant, SB increases as PA increases, but also that as PA + PB → 100%, the minimum problematic SB actually decreases because the overall possible application speedup increases. For instance, when PA equals 40% and SA equals 10x, SB only needs to be 4.3x to cause the original kernel scheduler to produce a suboptimal schedule. When PA is very small, the wrong kernels are selected even for small values of SA. However, when PA is much smaller than PB, it becomes increasingly unlikely that kernel A will be more useful than kernel B.
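A small numeric search makes the boundary shown in Figures 10.1 and 10.2 concrete. The sketch below is my own illustration: it simply scans SB values against the condition of Equation 10.3 to find the smallest SB for which accelerating both kernels of one application beats accelerating kernel A in each application:

    # Illustrative search behind Figures 10.1 and 10.2.
    from math import sqrt

    def geo_one_kernel_each(pA, sA):
        """Geomean when kernel A is accelerated in each of the two applications."""
        return 1.0 / (1.0 - pA + pA / sA)

    def geo_one_app_both(pA, pB, sA, sB):
        """Geomean when both kernels of one application are accelerated."""
        return 1.0 / sqrt(1.0 - pA - pB + pA / sA + pB / sB)

    def min_problem_sB(pA, pB, sA, step=0.01, max_sB=100.0):
        """Smallest SB for which Equation 10.3 holds, found by a simple scan."""
        sB = 1.0
        while sB <= max_sB:
            if geo_one_kernel_each(pA, sA) < geo_one_app_both(pA, pB, sA, sB):
                return sB
            sB += step
        return None

    # PA = 40%, PB = 30%, SA = 10x (cf. the discussion of Figure 10.2)
    print(min_problem_sB(0.40, 0.30, 10.0))   # ~4.3x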

[Figure 10.1 plots the minimum kernel B speedup (y-axis) against kernel A speedup (x-axis), with one curve for each tested PA = PB coverage value.]

Figure 10.1 Conditions under which the old scheduler selects RH kernels that do not maximize system performance, where the percentage represents PA = PB

[Figure 10.2 is a surface plot of the minimum kernel B speedup needed, plotted over kernel A speedup and kernel A coverage percentage.]

Figure 10.2 Kernel B speedup needed for old scheduler to perform sub-optimally

[Figure 10.3 is a surface plot of the percent performance difference between the original and optimal schedulers, plotted over kernel A speedup and kernel B speedup.]

Figure 10.3 Performance difference between original and optimal scheduler when PA = 45% and PB = 35%

Although these graphs show that the original scheduler computes suboptimal allocations in some situations, they do not indicate the effect that these suboptimal schedules have on overall system performance. Figure 10.3 illustrates the performance degradation caused by using the original RH scheduler. In this figure, PA is fixed at 45% and PB at 35%, for a combined application coverage of 80%. For this experiment, kernel A is no longer constrained to provide a greater application speedup than kernel B. This graph shows that the old scheduler performs worse relative to the optimal scheduler as the kernel speedups increase. The performance degradation is over 12% when SA and SB are both 10x, and is almost 17% when SA and SB are both 20x. In other experiments, I varied both PA and PB and saw that the performance difference between the original scheduler’s decision and the right decision increased as the total application coverage increased (PA + PB → 100%).

10.3 Alternative Value Functions

In the previous section, the original knapsack scheduler used the independent application speedup value function given in Equation 10.2. However, this is not the only method that a knapsack scheduler could use to calculate an RH kernel’s value. Multiple value functions were evaluated by Fu et al. [40], with the best performing being the MCKP TP value function. This value function is equivalent to the independent application speedup algorithm, which, as I showed in the previous section, does not produce optimal selections.

The value function for an RH kernel scheduler should directly reflect the benefit that an RH kernel has on the system. In this work the RH kernel scheduler’s goal is to maximize the performance of the system as a whole. Therefore the scheduler does not consider fairness, power, or any other number of factors that could be of benefit to the system. Any value function that the RH kernel scheduler uses should therefore directly reflect the performance benefit to the individual applications. One alternative to the independent application speedup value function is the cycles saved value function shown in Equation 10.4. This function is relatively straightforward, and shows the value of using an RH kernel as the number of cycles that would have been “saved” if the kernel executed on the RH instead of in SW.

CyclesSaved_k = N_k \times (T_{SW_k} - T_{RH_k})    (10.4)

On first glance, this value function might appear to work in all cases; however, it suffers from the same problem as the independent application speedup value function. This function calculates the number of cycles saved for a single kernel assuming that the application performs a given amount of work. When an RH kernel accelerates the application, the application performs more work during the same amount of time. This means that when the RH accelerates even a single kernel within an application, the number of cycles saved calculated using Equation 10.4 is not correct, because the RH kernel would be called more often (increasing Nk), causing the application to perform more work. Within a single application, this is not a significant problem, because the number of times the application calls a kernel is scaled by a constant factor (the overall application speedup) for all of the kernels in the application. This means that this value function still selects the optimal set of RH kernels when the system is executing a single thread. When the system is executing multiple threads, the cycles saved value function may not obtain the correct solution because the “scaling factor” for the number of times an RH kernel is called differs amongst the various threads. This could result in a kernel with what appears to be a small cycles saved value improving overall system performance more than one with a larger cycles saved value. This is similar to the example in Table 10.1, where a kernel with a relatively small independent application speedup was shown to be more useful than a kernel with a larger independent application speedup.

When reevaluating the value function, I made another observation: if all applications executing on the system contain only a single RH kernel, then the independent application speedup is the actual speedup of the application. This is because the actual application speedup cannot change: the scheduler cannot select another RH kernel belonging to the application. Because of this, the independent application speedup metric can be used to exactly solve the scheduling problem when each application on the system contains only a single RH kernel. This fact will be useful when developing the hierarchical RH kernel scheduler in the next section of this chapter.

Although an RH kernel scheduler could be developed that dynamically recalculates the value of the set of RH kernels selected each time it compares one selection to another, the resulting problem would not fit into the knapsack framework. Although the knapsack problem is in general NP-complete [72], when the weight of each object is a constant non-negative integer, a dynamic programming solution exists that runs in pseudo-polynomial time. Because of this, the existing knapsack solver’s runtime is proportional to the number of kernels in the system times the maximum weight that can fit in the knapsack. Using a generalized knapsack solver to get around the problems associated with kernel interdependence would result in a solver that could be much slower than the current implementation. Solving such an NP-complete problem would be infeasible for systems containing many kernels (8-CPU systems can easily have 50 or more kernels).
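As a quick illustration using the Table 10.1 profile data (this is my own worked example, not one from the original study), the cycles saved value function ranks kernel C above kernel B, so a sum-maximizing knapsack with two tiles makes the same suboptimal choice as the independent application speedup function:

    # Illustration: Equation 10.4 applied to Table 10.1's profile data.

    def cycles_saved(calls, t_sw, t_rh):
        return calls * (t_sw - t_rh)

    value_A = cycles_saved(90, 10_000, 1_000)   # 810,000 (application 1)
    value_B = cycles_saved(20, 5_000, 1_000)    #  80,000 (application 1)
    value_C = cycles_saved(30, 10_000, 2_000)   # 240,000 (application 2)

    # With two tiles, maximizing the sum still selects kernels A and C,
    # even though accelerating A and B yields the higher geometric mean.
    print(value_A + value_C, ">", value_A + value_B)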

for each application A
    for each RH kernel k in A
        Value_k = TimesCalled_k × (TSW_k − THW_k)
    Implementations_A = KNAPSACK APP(A)
    for each implementation I in Implementations_A
        TotalValue_A[Tiles_I] = Cycles_A / (Cycles_A − CyclesSaved_I)
Sched = MODIFIED MCKP(TotalValue, A)

Figure 10.4 Pseudocode describing the hierarchical RH kernel scheduler. KNAPSACK APP computes the knapsack solution for an individual application; MODIFIED MCKP combines the knapsack solutions into a single schedule.

10.4 New Hierarchical RH Kernel Scheduler

Due to kernel value interdependence, the original knapsack RH scheduler does not produce optimal (as defined in Chapter 10.2) RH allocations when multiple applications, each containing multiple kernels, execute simultaneously. However, using the properties of the different value functions from Chapter 10.3, I decomposed the scheduler into two successive knapsack problems that, combined, produce an optimal allocation of RH kernels. Figure 10.4 illustrates how I broke the problem down into multiple steps.

First, the scheduler applies the knapsack solver to each application individually: the OS calculates the value of each kernel in the application using the cycles saved function given in Equation 10.4, and the knapsack solver then exactly determines which RH kernels should be loaded for each possible tile allocation for that application. One side-effect of the dynamic programming implementation of the knapsack solver is that, when calculating the solution for a given tile count, it determines the knapsack solution at each smaller tile count. This means that the scheduler retains intermediate solutions that can be used in the next step to quickly determine which RH kernels each application should implement when it is allocated a given number of tiles.

The second level of the algorithm uses a modified multi-choice knapsack problem (MCKP) solver [72] to find the overall best set of kernels for the total available RH tiles. The MCKP chooses not only which items to include in the knapsack, but also which version of those items to include (at most one version of a given item can be chosen). The second stage combines the possible application versions (the solutions calculated in the first stage) to find the best global RH allocation. The “weight” of each possible application version is equal to the number of tiles it occupies, and the value of each version is the total application speedup provided by the combination of kernels it represents. This application speedup can easily be determined and is static at each potential tile size.

However, these changes alone do not allow the scheduler to exactly solve for the best mix of RH kernels. The knapsack solver maximizes the sum of the values given to it. Although this would be adequate if the goal were to maximize average application performance, it does not optimize for the geometric mean of application performance. There are multiple ways to modify the algorithm to maximize the geometric mean of application speedups. One approach is to modify the values of each “implementation” of the application: instead of setting these values to the speedup of the application implementation, they could be set to the log of the application’s speedup. Doing this would cause the MCKP solver to maximize the product of application speedups, because the addition of logs is equivalent to multiplication of the base numbers. Using the log method is not ideal for a number of reasons. Calculating the log of a number can be relatively expensive, and when using fixed-point calculations, as required in Linux kernel code, precision is reduced, resulting in kernels with different speedups appearing equal after taking their logs. To get around these issues, we used a modified MCKP solver that instead optimizes for the maximum product of values, rather than the sum of values. This modification to the dynamic programming solution was relatively simple, and results in speedups as good as those obtained using the log method.
Making this change to the second-level MCKP solver enabled the new hierarchical RH kernel scheduler to select the RH kernels that maximize the geometric mean of each application’s performance.
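A compact software sketch of this two-level scheme is given below. It is written as plain Python rather than Linux kernel code, the data layout and helper names are my own, and the example interval length is inferred from Table 10.1, so it should be read as an illustration of the idea rather than the actual implementation:

    # Sketch of the two-level scheme in Figure 10.4 (illustrative only).
    # Stage 1: a 0/1 knapsack per application gives the best cycles saved at
    # every tile budget. Stage 2: a multiple-choice knapsack picks one tile
    # budget per application, maximizing the product of application speedups.

    def per_app_knapsack(kernels, max_tiles):
        """kernels: list of (tiles, cycles_saved). Returns best[t] = maximum
        cycles saved using at most t tiles, for every t in 0..max_tiles."""
        best = [0] * (max_tiles + 1)
        for tiles, saved in kernels:
            for t in range(max_tiles, tiles - 1, -1):
                best[t] = max(best[t], best[t - tiles] + saved)
        return best

    def hierarchical_schedule(apps, max_tiles):
        """apps: list of (total_cycles, kernels). Returns (tiles given to each
        application, product of the resulting application speedups)."""
        # Stage 1: speedup of each application at every possible tile budget.
        speedups = []
        for total_cycles, kernels in apps:
            best = per_app_knapsack(kernels, max_tiles)
            speedups.append([total_cycles / (total_cycles - s) for s in best])

        # Stage 2: MCKP over applications, maximizing the product of speedups.
        dp = [1.0] * (max_tiles + 1)      # dp[t]: best product using <= t tiles
        picks = []                        # picks[i][t]: tiles given to app i
        for sp in speedups:
            new_dp = [0.0] * (max_tiles + 1)
            pick = [0] * (max_tiles + 1)
            for t in range(max_tiles + 1):
                for alloc in range(t + 1):
                    cand = dp[t - alloc] * sp[alloc]
                    if cand > new_dp[t]:
                        new_dp[t], pick[t] = cand, alloc
            dp = new_dp
            picks.append(pick)

        # Backtrack the chosen tile budget for each application.
        tiles_left, alloc = max_tiles, []
        for pick in reversed(picks):
            alloc.append(pick[tiles_left])
            tiles_left -= pick[tiles_left]
        return list(reversed(alloc)), dp[max_tiles]

    # The Table 10.1 scenario: with two tiles, both tiles go to application 1.
    apps = [(1_000_000, [(1, 810_000), (1, 80_000)]),   # application 1: A, B
            (1_000_000, [(1, 240_000)])]                # application 2: C
    print(hierarchical_schedule(apps, 2))                # -> ([2, 0], ~9.09)

Because the number of applications in the interval is fixed, maximizing the product of speedups selects the same allocation as maximizing their geometric mean.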

10.5 Setup

Prior chapters in this thesis used the Alexandrite full-system simulation platform to analyze the performance of new features. Similarly, I implemented the hierarchical RH kernel scheduler in the Linux kernel on the simulated system. I did this to verify that the scheduler worked on a real system, and to ensure that the overhead of the new hierarchical RH kernel scheduler was similar to that of the original RH kernel scheduler. Although this full-system simulation platform is extremely useful for some tests, it is also extremely slow, and provides far more detail than is needed to analyze the performance of the schedulers.

In this chapter, I evaluate millions of different application scenarios to ensure that the hierarchical scheduler produces the correct results not only on the workloads already developed, but also on future workloads that may be encountered. Such an expansive evaluation is not feasible on the Alexandrite simulation platform. Furthermore, my goal was to test whether or not the scheduler makes the correct decisions based on the input data given to it, rather than specifically motivating reconfigurable computing as a platform for acceleration. Therefore I used a simplified testbed to evaluate the hierarchical RH kernel scheduler.

10.5.1 Simulation Infrastructure

This work relies on a new simulation testbed designed to both generate synthetic workloads and analyze the resultant performance of the workloads when the RH kernel scheduler selects which RH kernels are loaded. Millions of synthetic workloads can be generated on this simplified simulation testbed, a feat which would not be feasible on the full-system cycle-accurate Alexandrite simulation platform. Testing on so many workloads ensures that the scheduler functions correctly not only for the actual workloads that have been developed, but also for other workloads that may be encountered in the future. Using these synthetic workloads allows the modeling of a very large number of different simulations where kernel value interdependency could have an effect.

The simulation testbed used to verify the hierarchical RH kernel scheduler focuses only on the scheduling problem itself, modeling the RH kernel allocations produced by the periodic RH kernel schedulers across a large range of tile counts. This testbed generates synthetic workloads containing multiple applications that are representative of real hybrid RH/SW applications. It also performs the necessary scheduling operations for each workload to determine which RH kernels each scheduler would select at each tile count. The testbed randomly generates a set of kernels for each application in the workload. Each kernel has associated with it the number of times it was called, its runtime in software, its runtime when executed on the RH fabric (normalized to a count of CPU cycles), and the number of tiles that it occupies. The data generated about each application (and kernel) is the same data that a scheduler executing on a real system would use. Using this data, the testbed calculates the kernel allocation for the generated profile data using both the original and hierarchical kernel schedulers. The testbed then estimates the workload performance for each selected kernel allocation.
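The testbed’s performance-estimation code is not listed here; under the chapter’s stated simplifications (fixed kernel runtimes, constant call ratios), one plausible estimate for a selected allocation is an Amdahl-style speedup per application combined with a geometric mean, sketched below with illustrative names:

    # Hedged sketch of a workload performance estimate (illustrative only).
    from math import prod

    def app_speedup(selected):
        """`selected`: list of (coverage, rh_speedup) pairs for the kernels of
        one application that the scheduler chose to configure."""
        accel_time = (1.0 - sum(p for p, _ in selected)
                      + sum(p / s for p, s in selected))
        return 1.0 / accel_time

    def workload_geomean(per_app_selected):
        """Geometric mean of application speedups across the workload."""
        speedups = [app_speedup(sel) for sel in per_app_selected]
        return prod(speedups) ** (1.0 / len(speedups))

    # Example: two applications; the second runs entirely in software.
    print(workload_geomean([[(0.45, 10.0), (0.35, 10.0)], []]))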

10.5.2 Workload Generation

I generated fifteen million workloads to model the execution of two-, four-, and eight-processor systems (five million each). Each workload includes as many hybrid RH/SW applications as there are processors in the system. Like the hybrid RH/SW benchmarks described in Chapter 5.1, the main control flow of these applications is in software, with compute-intensive kernels able to execute on the RH fabric if selected. Each application spends the majority of its unaccelerated execution time in its kernels, and each kernel in the application has a meaningful impact on application performance when implemented in RH. This approach models the fact that designers are only likely to implement kernels that represent a significant portion of runtime and achieve a significant speedup when executed on the RH fabric.

For each application, I first generated the PALL value (the percent of that application’s software execution that could be accelerated in RH). Because the PALL value is very influential on the overall performance of the scheduler, I created multiple sets of tests with different distributions of PALL.

Parameter                          Distribution   Mean     Range
PALL                               Normal         varies   varies
Number of kernels in application   Normal         varies   [1, varies]
Sk                                 Log-normal     12x      [2x, 53x]
Kernel Size                        Log-normal     4        [1, 13]

Table 10.3 Parameters used to generate the synthetic applications

Regardless of the range of PALL, I used a normal distribution to select the PALL values. I set the mean and standard deviation of this distribution such that 99.7% of the applications' PALL values fit within the range used (values generated outside that range were clipped). For the baseline test, I set PALL to be between 50% and 100%. This range gives a mean of 75%, and is similar to many of the hybrid RH/SW workloads examined in Chapter 5.1. I also simulated workloads with a PALL range between 90% and 100% and an average PALL of 95%. I did this to see how the hierarchical RH kernel scheduler performs on applications that are very amenable to RH acceleration.

I selected the number of kernels in each application using a normal distribution between one and a factor multiplied by the application's PALL value. The testbed generated the number of kernels (NumKernels) such that applications with large coverage are more likely to have more RH kernels. Although this is not the case for the AES benchmarks examined in Chapter 5.1, AES is a special case because the entire application executes inside a single kernel. In real systems, a kernel like AES is more likely to be used in conjunction with other code that produces the data to be encrypted. Xvid works as a much better example of a hybrid RH/SW application: the RH coverage of Xvid depends on how many compute-intensive functions in the application have RH implementations. In most of the tests, I used a multiplication factor of 10, so an application with PALL = 100% would contain between one and ten RH kernels, with a mean of five. However, in some of the workloads, I modified this factor to examine how the number of RH kernels in an application impacts the performance of the RH kernel schedulers.

Next, the testbed determined the coverage, size, and speedup of the individual kernels in each application. The coverages of the kernels in an application must sum to the PALL chosen for that application. To calculate kernel coverages for an application, PREMAIN (the coverage remaining to be included in one or more kernels) is initially set to PALL. The testbed then iterates over NumKernels, and for each kernel K generates PKERNEL (the coverage of the individual kernel) using a normal distribution in the range [2%, PREMAIN − (2% × (NumKernels − K))]. The testbed assigns the last kernel in the application a coverage of PREMAIN. This method produces applications that tend to contain one or two kernels that account for the majority of the time spent in kernels, along with multiple kernels that account for a smaller portion of the application's execution. The testbed generated the speedup of each RH kernel using a log-normal distribution with an expected value of 12 and a standard deviation of 7.1. Thus most kernels have a speedup close to, but slightly below, 12x, and a few will have much higher speedups.
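A minimal sketch of the coverage-splitting step just described, assuming a truncated normal draw whose mean and standard deviation are chosen so that roughly 99.7% of samples fall inside the allowed range (the exact parameters are an assumption, not taken from the thesis):

```python
import random

def bounded_normal(lo, hi):
    """Draw from a normal distribution centred on [lo, hi] whose standard
    deviation places ~99.7% of samples inside the range; samples that still
    fall outside are clipped (an assumption about the exact parameters)."""
    mu = (lo + hi) / 2.0
    sigma = (hi - lo) / 6.0
    return min(hi, max(lo, random.gauss(mu, sigma)))

def split_coverage(p_all, num_kernels, p_min=0.02):
    """Divide an application's total acceleratable coverage (PALL) among its
    kernels: each kernel draws from the coverage still remaining while
    reserving at least p_min for every kernel not yet generated, and the
    final kernel receives whatever coverage is left over."""
    remaining = p_all
    coverages = []
    for k in range(1, num_kernels):
        hi = remaining - p_min * (num_kernels - k)
        coverages.append(bounded_normal(p_min, max(p_min, hi)))
        remaining -= coverages[-1]
    coverages.append(remaining)
    return coverages

print(split_coverage(p_all=0.75, num_kernels=4))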

Figure 10.5 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%] and the kernel multiplier factor is 10. (a) Percent of time the hierarchical scheduler outperformed the baseline; (b) performance improvement when the hierarchical scheduler was better.

The testbed clipped the speedup values such that they fall between 2x and 53x. This models the actual kernel speedups seen in the hybrid RH/SW applications in Chapter 5.1, where many RH kernels have speedups of less than 10x and a few have much larger speedups. Finally, the testbed generates each RH kernel's size using a log-normal distribution with an expected value of four tiles and a standard deviation such that 99.7% of values are between one tile (the minimum allowed) and thirteen tiles. The testbed rounded fractional tile counts down to the next-lowest integer. This creates a distribution with many small RH kernels and a few very large ones, similar to the distributions seen in the benchmarks presented in Chapter 5.1. The testbed does not correlate the size of an RH kernel with its speedup, because no such correlation has been observed in the hybrid RH/SW benchmarks examined in Chapter 5.1.¹
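The clipped log-normal draws for kernel speedup and size can be sketched as follows. The conversion from the distribution's arithmetic mean and standard deviation to the parameters of the underlying normal is standard; the standard deviation used for the size distribution is an assumption, since the thesis states only that about 99.7% of values fall between one and thirteen tiles.

```python
import math
import random

def lognormal_with_mean_std(mean, std):
    """Sample a log-normal value with the given arithmetic mean and standard
    deviation by converting them to the mu/sigma of the underlying normal."""
    sigma_sq = math.log(1.0 + (std / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return random.lognormvariate(mu, math.sqrt(sigma_sq))

def gen_kernel_speedup():
    """RH speedup: log-normal, mean 12x, std 7.1, clipped to [2x, 53x]."""
    return min(53.0, max(2.0, lognormal_with_mean_std(12.0, 7.1)))

def gen_kernel_size():
    """Kernel size in tiles: log-normal, mean 4 tiles, clipped to [1, 13]
    and rounded down to a whole number of tiles (std of 2 is an assumption)."""
    return int(min(13.0, max(1.0, lognormal_with_mean_std(4.0, 2.0))))

print(gen_kernel_speedup(), gen_kernel_size())
```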

10.6 Results

In the baseline case, workloads have a PALL in the range of [50%, 100%] and a kernel multiplier factor of 10 (when PALL = 100%, there will be between one and ten RH kernels in each application, with an average of five). Figure 10.5(a) shows the percent of cases where the new scheduler chose an allocation that resulted in better system performance than the old scheduler. For the remaining cases, both schedulers generated the same schedules.

¹ Although alternate implementations of the same RH kernel are likely to be faster when they occupy more area, large and small kernels tend to have a similar distribution of speedups.

Figure 10.6 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%] and the kernel multiplier factor is 5. (a) Percent of time the hierarchical scheduler outperformed the baseline; (b) performance improvement when the hierarchical scheduler was better.

When few RH tiles were available, many kernels could not fit on the RH fabric, constraining the scheduling decision. Thus, the two schedulers are more likely to choose the same allocation than they would if more tiles were available. As more RH tiles are added to the system, the new scheduler has more room with which to leverage the performance benefits of interdependent kernels; the old scheduler does not perceive these interactions, and thus cannot make use of them. As the number of RH tiles increases further, more of the kernels fit in RH simultaneously (in many cases, all of the kernels fit), reducing the number of different possible allocations and increasing the likelihood that the old scheduler will make the right decision. This figure also shows that systems containing more processors are more likely to produce schedules that differ between the original and hierarchical RH kernel schedulers. This is because systems containing more processors have far more possible schedules to choose from, increasing the likelihood that processors will have interdependent kernels that do not get properly selected when using the old scheduler.

Figure 10.5(b) shows the performance improvement of the hierarchical scheduler over the original scheduler in the cases where the two schedulers produced different schedules.² Performance followed a similar trend for two-, four-, and eight-processor workloads. These plots also followed the same trend observed in Figure 10.5(a), with the biggest difference in performance occurring when the workloads were most likely to have different schedules. Unlike Figure 10.5(a), however, workloads containing more processors performed better than those containing few processors.

² For all of the performance results in this chapter, only data for situations where more than 0.1% of cases had differing schedules are shown. For situations where fewer than 0.1% of the tested cases showed a performance difference, the performance improvement was set to 0%.

Figure 10.7 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [50%,100%] and the kernel multiplier factor is 25. (a) Percent of time the hierarchical scheduler outperformed the baseline; (b) performance improvement when the hierarchical scheduler was better.

This is because in workloads containing many processors, kernel interdependence is likely to impact the scheduling of only a couple of the applications; the other applications will likely be relatively unaffected. Because I measured performance using the geometric mean, the performance difference on most of these eight-processor workloads is likely to be smaller than it would be on a two-processor workload. In general, the more processors the system contains, the more important it is to use a hierarchical RH kernel scheduler.

In the next tests, I constructed the workloads differently to examine the impact that having more or fewer RH kernels in each application has on overall system performance. In Figure 10.6, the kernel multiplier factor was decreased to five; applications in these workloads therefore have approximately half as many kernels as the workloads examined in Figure 10.5. Similarly, in Figure 10.7, I increased the kernel multiplier factor to 25, increasing the number of RH kernels in each application by approximately 2.5x. Comparing these graphs, a trend emerges: the more RH kernels an application has, the more likely the schedules are to differ. This is because applications with more kernels are more likely to have interdependent kernels. When the kernel multiplier is low, applications are more likely to have only a single kernel, and in those cases it is impossible for kernels to be interdependent.

More importantly, these graphs show that the difference in performance between the hierarchical RH kernel scheduler and the original RH kernel scheduler is likely to be much greater when applications contain few kernels. When applications contain few kernels, adding a single kernel to the application (when one already exists) is likely to have a bigger impact on performance than if the application contained many RH kernels. Applications with many RH kernels are likely to contain many kernels that cover only a small portion of the application's execution time. If these RH kernels are interdependent, the hierarchical RH kernel scheduler will perform better than the original RH kernel scheduler; however, the difference in performance will be smaller because the added RH kernel(s) will not impact application performance as drastically.
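As a quick, hypothetical illustration of the geometric-mean effect noted above when comparing two- and eight-processor workloads: if a single application in the workload improves by 20% while the others are unchanged, the reported workload-level improvement shrinks as the application count grows. The numbers below are purely illustrative.

```python
# One application at 1.20x, the rest at 1.00x; geometric mean over all apps.
for n_apps in (2, 4, 8):
    geo_mean = 1.20 ** (1.0 / n_apps)
    print(f"{n_apps} applications: {100 * (geo_mean - 1):.1f}% workload improvement")
# Prints roughly 9.5%, 4.7%, and 2.3%.
```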

In addition to adjusting the number of RH kernels that are in each application, I also adjusted the range of PALL.

Figure 10.8 shows the results when using a PALL in the range [90%,100%]. This range is representative of applications that have a much larger portion of their software execution time covered by RH kernels, and that are likely to have larger application speedups than applications with low PALL values. Increasing this value both increased the probability that the hierarchical RH kernel scheduler would outperform the original scheduler and increased the performance benefit that the hierarchical RH kernel scheduler could achieve. This is due to having slightly more kernels available (increasing the chance that the hierarchical scheduler outperforms the old one) and to the increase in application coverage. Increasing the coverage of the application means each RH kernel covers more of the application, and such applications are likely to have very large speedups. Adding a kernel that covers 5% of the software's execution time to an application that already has 90% of its execution covered by configured RH kernels is likely to have a much bigger impact on the application's performance than adding the same kernel to an application that only has 75% of its software-only execution covered by configured RH kernels.

Figure 10.9 shows the impact of increasing the applications' PALL values while simultaneously decreasing the kernel multiplier factor. Doing this increases both the probability that the schedulers will produce different schedules and the performance difference between them; the effects of increasing PALL and decreasing the total number of RH kernels appear to be additive. This figure shows a relatively extreme example of the difference in performance between the two RH kernel schedulers.

10.7 Hierarchical Scheduler Conclusions

In this chapter, I first examined the original RH kernel scheduler and noted that it could generate non-ideal solutions to the scheduling problem when executing workloads in which multiple applications contain multiple RH kernels. This is because the performance benefit of an RH kernel depends on which other RH kernels the application is using. I then examined why this happens in order to develop a solution to the problem. I found that properties of the RH kernel scheduler allowed it to always generate the correct schedules under certain controlled conditions. Because of this, I could reformulate the scheduling problem into two separate knapsack problems. The resulting hierarchical RH kernel scheduler never performed worse than the original RH kernel scheduler, and frequently outperformed it. In examining the results, I saw that the performance advantage of using the new RH kernel scheduler was greater on systems containing more cores because of the increased likelihood that applications on these systems would contain interdependent RH kernels that altered the scheduling decision of the original RH kernel scheduler.
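The two-level knapsack formulation mentioned above can be sketched as follows. This is an illustrative reconstruction under two assumptions that are mine, not the thesis's: level one maximizes the fraction of each application's execution time that is eliminated for every possible tile budget, and level two distributes the tile budget across applications so as to maximize the product (equivalently the geometric mean) of the per-application speedups.

```python
def best_saved_per_budget(kernels, total_tiles):
    """Level 1: a 0/1 knapsack per application.  best[t] is the largest
    fraction of the application's SW execution time that can be eliminated
    using at most t RH tiles.  Each kernel is a (coverage, speedup, tiles) tuple."""
    best = [0.0] * (total_tiles + 1)
    for coverage, speedup, tiles in kernels:
        saved = coverage * (1.0 - 1.0 / speedup)
        for t in range(total_tiles, tiles - 1, -1):
            best[t] = max(best[t], best[t - tiles] + saved)
    return best

def hierarchical_schedule(apps, total_tiles):
    """Level 2: a multiple-choice knapsack over the per-application tables,
    choosing how many tiles each application receives so that the product of
    per-application speedups is maximised.  Returns the geometric-mean speedup."""
    tables = [best_saved_per_budget(kernels, total_tiles) for kernels in apps]
    dp = [1.0] * (total_tiles + 1)          # best product using at most t tiles
    for table in tables:
        new_dp = [0.0] * (total_tiles + 1)
        for t in range(total_tiles + 1):
            for spend in range(t + 1):
                speedup = 1.0 / (1.0 - table[spend])
                new_dp[t] = max(new_dp[t], dp[t - spend] * speedup)
        dp = new_dp
    return dp[total_tiles] ** (1.0 / len(apps))

# Hypothetical two-application workload: (coverage, RH speedup, tiles) per kernel.
apps = [[(0.40, 8.0, 3), (0.25, 4.0, 2)],
        [(0.90, 20.0, 6)]]
print(hierarchical_schedule(apps, total_tiles=8))
```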

Figure 10.8 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [90%,100%] and the kernel multiplier factor is 10. (a) Percent of time the hierarchical scheduler outperformed the baseline; (b) performance improvement when the hierarchical scheduler was better.

Figure 10.9 Advantage of the hierarchical RH kernel scheduler over the original RH kernel scheduler when PALL is in the range of [90%,100%] and the kernel multiplier factor is 5. (a) Percent of time the hierarchical scheduler outperformed the baseline; (b) performance improvement when the hierarchical scheduler was better.

Therefore, future multicore systems should use the new hierarchical RH kernel scheduler to obtain the best performance from the RH coprocessor. Although this work illustrated the usefulness of the proposed hierarchical RH kernel scheduler, if multiple copies of the same application were executing, or multiple applications contained the same kernel(s), this scheduler might not make the best use of the RH fabric. This is because sharing these RH kernels, as was done in Chapter 9, could result in better system performance. Future work is necessary to evaluate schedulers that recognize the presence of shared RH kernels and estimate system performance when sharing those kernels. However, the added complexity of this problem means that it is unlikely to fit neatly within the knapsack framework; a heuristic-based scheduler would likely be required to perform the necessary scheduling operations within a reasonable amount of time.

Chapter 11

Sharing an RH Fabric

Although this thesis has, up to this point, focused on multicore platforms that contain a single shared RH fabric, I have not yet presented data to justify why the RH fabric should be shared by all of the CPU cores. Intuitively, sharing an RH fabric seems logical, particularly on systems that execute a mix of hybrid RH/SW applications alongside traditional software-only workloads. By sharing the RH on these systems, RH space is not "wasted" when one of the cores executes software-only code. For example, if a system has two cores and is only executing one hybrid RH/SW application, the single hybrid application can make use of the entire RH fabric; this would not be possible if each core had its own private RH fabric. In this situation, it is clear that having a shared RH fabric can result in better performance, but it is unclear how advantageous this strategy is on systems executing only hybrid RH/SW applications. This chapter addresses these questions and compares systems that share an RH fabric with those that do not. In particular, it examines the performance improvement that the dynamic RH kernel scheduler achieves on a shared fabric over a partitioned RH fabric. Through this analysis, I better justify the design decision to use a single shared RH fabric in the proposed multicore RH computing system.

11.1 System Properties

Chapter 2 examined many different ways that a CPU could communicate with the RH fabric on a single-chip shared-memory system. For the purpose of this study, I only consider systems similar to the one proposed in Chapter 3. In this system, the CPU and the RH fabric use the direct-communication network to query RH kernels and determine whether they have finished executing, and the shared cache hierarchy is used for all other data transfers. Figure 11.1 shows some of the ways that I could partition the RH across multiple processor cores. This chapter examines not only the simple dual-core cases where the cores share a single RH fabric (Figure 11.1(a)) or where each core has its own private RH fabric (Figure 11.1(b)), but also situations where multiple cores share one or two RH fabrics. For example, Figures 11.1(c) and 11.1(d) show two different ways four CPUs can share RH fabrics. Note that this chapter only examines situations where N cores share an RH fabric, or N/2 cores share a partitioned RH fabric.

Figure 11.1 Examples of different ways the processor cores can access the RH on both two- and four-processor systems. (a) Example network topology when two processor cores share a single RH fabric; (b) example network topologies when two processor cores each have a single RH fabric; (c) example network topologies when four processor cores share a single RH fabric; (d) example network topologies when four processor cores share two RH fabrics.

Distance   Additional Latency (processor cycles)
2          3
3          6
4          10
5          13
6          15
7          18
8          20

Table 11.1 Additional latency experienced by RH kernels that are more than one hop away from the RH fabric

This chapter compares systems that contain two, four, eight, and sixteen processor cores. In the first set of experiments, the runtime of RH kernels is constant regardless of how many CPU cores share the RH fabric. Under this baseline, a system with an RH fabric shared by all of the processor cores always performs as well as, or better than, a system with a partitioned RH fabric. The remaining experiments consider the impact of the additional direct-communication latency needed on systems using a shared RH fabric. Because RH kernels only use the direct-communication mechanism at the beginning and end of an RH kernel's execution, I set the latency to access this buffer to a constant value, regardless of an RH kernel's execution time. Therefore this latency will be far more pronounced for RH kernels with short execution times. I used the "distance" between the RH controller and the CPU in question to calculate the direct-communication latency. Although different RH kernels could experience different penalties for accessing this network (for instance, it is slightly greater in the SAD8 kernel than in many others), I used a constant value in this study for all RH kernels at the same distance. I estimated these constants by evaluating, on the Alexandrite platform, the impact of the distance between the RH controller and the CPU cores on the runtime of RH kernels. Table 11.1 shows the additional number of processor cycles added to each configured RH kernel's execution time based on the RH fabric's distance from the processor core that initiated the RH kernel. The experiments in this chapter do not model the impact that the number of processor cores (and subsequently RH kernels) sharing an RH fabric has on the execution time of the RH kernels themselves. I chose to do this because the multicore experiments performed in Chapter 7 do not show a significant slowdown in most of the workloads' "efficiency" beyond that which exists on single- and dual-processor systems.
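As a small sketch, the per-invocation penalty can be modeled as a simple lookup on hop distance, using the values from Table 11.1; treating a one-hop distance as adding no penalty is my assumption, since the table starts at distance two.

```python
# Extra processor cycles charged to each RH kernel invocation, by hop distance
# between the invoking CPU core and the RH controller (values from Table 11.1).
EXTRA_CYCLES_BY_DISTANCE = {2: 3, 3: 6, 4: 10, 5: 13, 6: 15, 7: 18, 8: 20}

def effective_rh_runtime(base_cycles, hop_distance):
    """The penalty is constant per invocation, so it matters most for
    short-running RH kernels."""
    return base_cycles + EXTRA_CYCLES_BY_DISTANCE.get(hop_distance, 0)

print(effective_rh_runtime(75, 4))    # a 75-cycle kernel four hops away -> 85
```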

11.2 Setup

A modified version of the simulation infrastructure developed in Chapter 10 measured the performance of the different system configurations examined in this chapter. This allowed for the examination of a wide variety of scenarios at a level of detail that focuses on the ability of the RH kernel scheduler to select appropriate RH kernels when using a shared RH fabric as well as a partitioned one.

Parameter                          Distribution   Mean     Range
PALL                               Normal         .75      [.5, 1]
Number of kernels in application   Normal         varies   [1, varies with a maximum of 10]
Sk                                 Log-normal     12x      [2x, 53x]
Kernel Size                        Log-normal     4        [1, 13]
RH Kernel Runtime                  Normal         varies   varies

Table 11.2 Parameters used to generate the synthetic applications

I set up the simulations for these experiments similarly to those in Chapter 10. Table 11.2 lists the parameters I used to generate the synthetic workloads. These workloads add a new parameter to the setup: RH kernel runtime, the runtime of a kernel when executed on the RH fabric. This value is needed because the constant communication-latency overhead impacts RH kernels with short execution times the most. The testbed generated each RH kernel's runtime using a normal distribution in one of four ranges: [50, 2000], [75, 1000], [75, 500], and [75, 200] cycles. I selected the distribution of 50 to 2,000 cycles because it is similar to the distribution of RH kernel runtimes seen in the RH kernels described in Chapter 5.1, but I also examine lower values to show the effect on systems where the setup and finish time of the RH kernels can dominate their execution. The testbed generated workloads for two-, four-, eight-, and sixteen-core RH computing systems. Each workload contained as many hybrid RH/SW applications as there were cores in the system. Once the testbed generated each workload, it measured the performance on systems with both a shared and a partitioned RH fabric. In cases with a partitioned RH fabric, each partition contained half of the RH tiles that the single shared RH fabric had; therefore both systems contained an equal total number of RH tiles. The testbed generated 250,000 workloads for each system, allowing the examination of a wide range of synthetic benchmarks.
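A minimal sketch of the shared-versus-partitioned comparison, reusing the hierarchical_schedule() helper from the Chapter 10 sketch (both that helper and the per-application data layout are illustrative assumptions): the partitioned case runs the scheduler independently on each half of the applications with half of the tiles, and the two results are recombined into a single geometric mean.

```python
def shared_vs_partitioned(apps, total_tiles):
    """Compare one fabric of `total_tiles` shared by all applications against
    two partitions of total_tiles // 2 tiles, each serving half of the
    applications.  Returns the geometric-mean speedup of each organisation."""
    shared = hierarchical_schedule(apps, total_tiles)
    half = len(apps) // 2
    part_a = hierarchical_schedule(apps[:half], total_tiles // 2)
    part_b = hierarchical_schedule(apps[half:], total_tiles // 2)
    n = len(apps)
    partitioned = (part_a ** half * part_b ** (n - half)) ** (1.0 / n)
    return shared, partitioned
```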

11.3 Results

Figure 11.2 shows the speedup when using a single shared RH fabric over a partitioned one for two-, four-, eight-, and sixteen-processor workloads. For this test, I set the runtime of each RH kernel to a constant value, regardless of how far the RH was from the CPU core. This graph shows that partitioning a system with few processor cores has the greatest impact on performance. As more processor cores share a single RH fabric, the penalty for splitting the RH fabric into partitions is greatly reduced. This is because the performance characteristics of a collection of multiple applications are likely to be similar to those of other collections of applications, even if the performance characteristics of individual applications vary significantly.


Figure 11.2 Performance advantage of using a shared RH fabric over a partitioned RH fabric when discounting communication overhead

Experiments with each of the other processor core counts followed similar trends. The performance advantage of sharing the RH fabric is greatest when few RH fabric tiles are available in the system, peaking when only two tiles are available. This peak occurs because many RH kernels cannot fit at all on partitions containing few RH tiles, whereas at least a single RH kernel could be configured on the shared fabric. The remainder of the performance difference is due to having a larger selection of RH kernels to choose from when selecting RH kernels. As more RH tiles become available, the performance difference drops. With few processor cores, this drop-off is rapid; however, in systems containing many CPU cores, the drop-off is delayed significantly, with relatively even performance across a wide range of tile sizes (only dropping significantly when sufficient RH resources are available for all of the RH kernels to fit simultaneously). Figure 11.3 shows the impact of varying the average RH kernel runtime on the speedup of using a shared RH fabric over a partitioned fabric. These graphs are very similar to Figure 11.2; however, the partitioned systems perform slightly better than the shared fabric when sufficient RH resources are available. This is due to the increased latency larger systems experience when accessing the RH kernels. Figure 11.3(d) presents an extreme case, where the average runtime of the RH kernels is greatly reduced, to the point where the RH kernel runtimes are similar to the SAD kernels in the Xvid benchmark. This is an important data point, as it shows that even for workloads containing only short-running RH kernels, the shared RH fabric outperforms the partitioned fabric at most tile sizes that are appropriate for the number of processors sharing the RH fabric. In this extreme case, the sixteen-processor workloads performed ∼ 1% worse when using a shared fabric with sufficient RH resources. However, when operating with smaller tile counts (which are more likely to be available on real systems), the shared fabric still outperformed the partitioned fabric, albeit by a much lesser degree than on systems whose RH kernels had longer average runtimes.

Figure 11.3 Speedup of a fully shared RH fabric versus one where the RH fabric is split into two partitions when varying the runtimes of the RH kernels. (a) Speedup when RH kernel runtimes are in the range of 50–2,000 CPU cycles; (b) speedup when RH kernel runtimes are in the range of 75–1,000 CPU cycles; (c) speedup when RH kernel runtimes are in the range of 75–500 CPU cycles; (d) speedup when RH kernel runtimes are in the range of 75–200 CPU cycles.

11.4 Shared RH Fabric Conclusions

This chapter analyzed the impact of sharing an RH fabric amongst all processor cores in the system versus partitioning the fabric so that only half of the processor cores can access each partition. The data shows that an optimized dynamic RH kernel scheduler allows processors that share an RH fabric to obtain higher performance than processors that use a partitioned fabric. These results also show that, even when RH kernels have very short execution times (only a couple hundred cycles), the penalty for using an RH coprocessor located further away is more than offset by the improved performance of a dynamic RH kernel scheduler. Using a private RH fabric for each core is not a good idea in systems where RH resources are constrained, because a dynamic RH kernel scheduler can obtain much better performance on shared fabrics. However, as the number of processors in the system increases, the performance advantage of using a single shared fabric (compared with partitioning the fabric in two) diminishes. Because of this, systems with limited RH resources should share their RH resources amongst as many processor cores as feasible; if they must partition the RH fabric, they should still attempt to share each RH coprocessor amongst at least four processor cores.

Chapter 12

Comparison with Vector Processors

Single instruction multiple data (SIMD) or vector instruction extensions are commonly used to accelerate general-purpose processors. SIMD instructions work by applying the same operation to multiple data elements in parallel; for instance, a 64-bit SIMD instruction can execute eight 8-bit additions at the same time. The best known of these instruction extensions are the MMX and SSE instruction-set extensions for the x86 architecture, but many other general-purpose processors support SIMD extensions as well, including the ARM Neon, PowerPC AltiVec, and UltraSPARC VIS extensions. These SIMD extensions have proven to be very useful in many application domains, particularly in multimedia processing [102, 80].
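As a loose illustration of this packed-arithmetic model (this sketch uses NumPy arrays rather than actual SIMD intrinsics, so it only mimics the lane-wise behaviour of a packed 8-bit add):

```python
import numpy as np

a = np.array([10, 20, 30, 40, 250, 60, 70, 80], dtype=np.uint8)
b = np.array([ 1,  2,  3,  4,  10,  6,  7,  8], dtype=np.uint8)
# Eight independent 8-bit lanes added in one operation; each lane wraps
# modulo 256 on its own (250 + 10 -> 4), as a packed wraparound add would.
print(a + b)          # [11 22 33 44  4 66 77 88]
```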

12.1 SIMD Performance

In this chapter, I compare the performance of RH kernel accelerators with SIMD accelerators on the x86 platform. Table 12.1 lists the platforms that I examined.

12.1.1 Xvid Performance

The first tests I performed were from the Xvid 1.0.3 video encoder. I profiled all of the Xvid kernels listed in Chapter 5.1, including the SAD8, SAD16, FDCT, Transfer 8 to 16, Average-2, Average-4, and interpolate 8x8 6-tap kernels. Each of these kernels had two different implementations: the "standard" one written in C, and an assembly-optimized version that took advantage of the x86 processor's SIMD extensions.

Vendor   Model        Speed
Intel    Core i7      2.8GHz
Intel    Pentium 4    3.66GHz
Intel    Atom N280    1.66GHz
AMD      Opteron      1.6GHz

Table 12.1 Processors used for the SIMD comparison


Figure 12.1 Vector coprocessor speedups for various Xvid kernels when executing on PC hardware.

Figure 12.1 shows the performance of each RH kernel on each of the tested SIMD platforms. The SIMD speedup of each kernel varied across platforms due to the varying ability of each processor core to exploit instruction-level parallelism in the unoptimized C code. This section focuses on the Atom processor's performance because it is designed for the high-end embedded market; its performance is therefore likely to closely resemble that of the CPU cores modeled in this work, and its SIMD unit should more closely resemble one that would compete with the proposed RH coprocessor. I used the speedup of the SIMD kernels on the Atom processor to estimate the "SIMD-accelerated" execution time of each kernel in Xvid. To do this, I set each kernel's SIMD runtime to be its unaccelerated runtime on the simulated platform divided by the speedup of the kernel using SIMD acceleration on the Atom platform. Using these new runtimes, I calculated the SIMD-accelerated runtime that the Xvid application would be likely to have on the processor cores described in this thesis. In this SIMD model, the non-kernel portions of Xvid executed at the same speed as they originally did. This is in contrast to the non-kernel portions of hybrid RH/SW applications, which, as Chapter 6.3 showed, ran slightly slower. Although the Xvid encoder used in this study contained SIMD-accelerated versions of kernels besides the seven examined in this thesis, these additional kernels did not have RH kernel counterparts (although such kernels could be implemented), and are therefore not included. This allows for a fair comparison between the existing RH kernels and the SIMD versions of the same kernels. Figure 12.2 shows the breakdown of Xvid's execution on both the modeled SIMD system and a reconfigurable computing system with sufficient RH tiles to hold all of the RH kernels simultaneously. Using the RH coprocessor reduced Xvid's kernel execution time more than the modeled SIMD extensions did, resulting in a lower overall execution time.
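The SIMD-runtime estimate described above can be sketched as follows; the kernel names, runtimes, and speedups used in the example are placeholders, not the measured values.

```python
def simd_accelerated_runtime(total_sw_time, kernel_times, simd_speedups):
    """Each kernel's unaccelerated runtime is divided by the SIMD speedup
    measured for it on the Atom processor; non-kernel code is unchanged."""
    non_kernel = total_sw_time - sum(kernel_times.values())
    kernels = sum(t / simd_speedups[name] for name, t in kernel_times.items())
    return non_kernel + kernels

# Placeholder numbers for illustration only:
print(simd_accelerated_runtime(
    total_sw_time=100.0,
    kernel_times={"SAD8": 20.0, "SAD16": 15.0, "FDCT": 10.0},
    simd_speedups={"SAD8": 4.0, "SAD16": 5.0, "FDCT": 3.0},
))
```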


Figure 12.2 Percent of the original SW-only execution time of Xvid (and its kernels) when accelerated using RH (top) and SIMD instructions (bottom).


Figure 12.3 Speedup of the reconfigurable computing system as I varied the number of RH tiles available


Figure 12.4 Vector speedups when running the Viterbi encoder on PC hardware (RH speedup is 100x)

In particular, the RH accelerated the SAD8, SAD16, and Average-4 kernels to a larger degree. This shows that, despite the fact that the x86 SIMD units have instructions designed specifically for multimedia processing, RH coprocessors can still outperform them. Although this comparison shows the RH's performance advantage when all of the RH kernels can be loaded simultaneously, this scenario is not always realistic. In many reconfigurable computing systems, the RH is a constrained resource, and not every RH kernel can fit simultaneously. Figure 12.3 shows Xvid's performance as the number of RH tiles on the system increases when using the RH kernel scheduler described in Chapter 2.7.2. This graph shows that the RH coprocessor system outperforms the SIMD system whenever at least four RH tiles are available. However, as more kernels are converted to both SIMD and RH implementations, this tradeoff point is likely to change.

12.1.2 Viterbi Performance

I also examined the performance of the Viterbi algorithm on both the modeled SIMD and RH systems. Figure 12.4 shows Viterbi's speedups when using SIMD-optimized algorithms generated by the Spiral project [33]. These results show that Viterbi's performance scales as the width of the SIMD unit increases, achieving a speedup of over 10x when using 16-way SIMD instructions. Although these results seem quite impressive, the RH implementation of Viterbi achieved a speedup of over 100x on the proposed reconfigurable computing platform, despite running at a clock frequency of only 67MHz on the RH fabric. In this comparison, it is clear that the Viterbi algorithm can be accelerated much more effectively using an RH coprocessor than a SIMD coprocessor. In addition, due to the reduced clock frequency and greatly reduced execution time, the Viterbi algorithm is likely to consume much less energy on the RH.

12.2 Limitations of SIMD Processors

Although prior work has proven SIMD ISA extensions useful for many classes of applications, they are not useful for all applications. SIMD processors are designed to accelerate coarse-grained parallelism, where applications execute the same instruction sequences on many data inputs. They do not tend to perform well on applications that exploit more fine-grained parallelism, or that require extensive data-dependent loads and/or stores. In contrast, RH has shown impressive speedups across a wide range of applications [47]. For instance, many cryptographic algorithms fail to take advantage of the highly parallel nature of SIMD coprocessors. The advanced encryption standard (AES) algorithm, for example, relies on many bit-level manipulations that are not amenable to vector acceleration [69]. Although impressive speedups have been achieved on AES by tweaking the algorithm to the architecture, these efforts have not relied heavily on SIMD instructions, because the AES algorithm requires an extensive number of data-dependent load instructions to read data from table lookups [12, 50]. Other research has examined bitsliced implementations of AES [89]. These have achieved impressive gains and appear to scale with SIMD width, but they have some significant drawbacks. In current implementations using 128-bit SIMD registers, performance is ∼ 40% faster than a traditional software algorithm and appears to scale with bit-width, although that study only compared 64-bit and 128-bit bit-sliced implementations. Furthermore, a bitsliced implementation of AES requires extensive reformatting of data to interleave it for bit-sliced execution, and requires fixed-size data (2048-byte blocks) to achieve this speedup. Because of the inability of standard SIMD instructions to greatly accelerate AES, system designers have instead added custom instructions to accelerate the algorithm [92].

12.3 Conclusions

In this section I examined the performance of various algorithms on both SIMD and RH coprocessors. I demonstrated that, even for the multimedia algorithms that SIMD units are designed to execute, RH coprocessors can obtain better performance. For other algorithms, such as Viterbi, the RH's performance was much higher than that of the SIMD unit, and I examined still other algorithms that are poor candidates for SIMD implementations altogether. Because of this, RH coprocessors are an attractive option for accelerating applications. Although the development time for SIMD kernels tends to be lower than that of RH kernels, SIMD kernels often require hand-optimized assembly that is also difficult to develop and validate. The development time of current RH kernels tends to be limited by the available tool chain, which has been designed primarily for developing ASIC replacements using FPGA technology. In that setting, design time is not as important as minimizing the size and power consumption of the circuits and maximizing the circuit's performance. However, through the use of better tools and higher-level hardware description languages, the development time of RH kernels can be reduced [13, 7]. Recent hardware description languages have been designed to convert small sections of code into RH implementations.

Even though these kernels may not run for as long as routines on traditional off-chip coprocessors, I have demonstrated that my reconfigurable coprocessor works remarkably well with such kernels. Additionally, using SIMD units and an RH coprocessor are not mutually exclusive ideas. In a processor containing both SIMD units and an RH coprocessor, the RH coprocessor could be used primarily for algorithms that perform poorly on SIMD units (for example, AES), or for which the SIMD units cannot accelerate the algorithm sufficiently for the application (Viterbi).

Chapter 13

Broader Impact

Although this thesis has specifically targeted RH coprocessors, many of the techniques developed in this research can be applied to other accelerator technologies. Moving forward, it is likely that no single accelerator technology will be ideal for all application and usage domains. Prior work has examined using custom ASIC coprocessors, SIMD vector processors, GPUs, massively parallel processor arrays, and other techniques to accelerate computation. In this chapter I briefly examine some of these technologies and describe how the research performed for this thesis is applicable to other systems.

13.1 Use with ASIC Coprocessors

Much of the research presented in this thesis can be applied directly to on-chip custom "ASIC" coprocessors by replacing RH kernels with hard kernels implemented on the chip. In particular, the cache and memory organization used in my thesis could be applied directly to ASIC coprocessors, and many of the ideas proposed in my work [43, 45] have since been examined for use in generic coprocessor architectures [111]. Using a shared cache architecture similar to the one I proposed in Chapter 6 allows for a fast communication path between the processor(s) and any type of coprocessor. Additionally, using the cache-coherent communication model allows the coprocessor to "naturally" access the data it needs, without having to rely on specialized buffers or memory fences when communicating with its host process. This also simplifies the allocation of the hardware, because the OS does not have to specially allocate coprocessor memory into user space, or be used to transfer data into a dedicated buffer associated with the coprocessor. The idea of using cache-coherent coprocessors has even been gaining momentum in industry, with ARM's Cortex-A9 processor [9] allowing up to four processors to share a coherent memory with a custom hardware accelerator through the use of its "accelerator coherency port". My work also gives coprocessors access to the virtual memory subsystem, providing methods to translate virtual to physical addresses within the RH controller (Chapter 3). In most modern operating systems, user applications cannot directly access physical memory, limiting what can be done on a coprocessor without OS intervention. Although short-executing accelerators that use only registers for communication could be implemented, such an accelerator would be core-specific, or would require separate register sets for each processor in the system. An alternative model could use the OS as an intermediary to access the custom hardware accelerator; however, such a technique can be significantly slower and limits the performance of the accelerator [111]. The techniques outlined in my thesis can be applied to on-chip hardware accelerators, allowing applications to directly access a coprocessor without violating process isolation. Custom ASICs can also directly make use of my work on sharing configured RH kernels [44]. New high-performance embedded devices are likely to contain multiple general-purpose CPU cores, and they will need a way to share the coprocessor resources. These coprocessor resources would likely be shared by every application in the system. Therefore these systems could take advantage of the mechanisms developed for this thesis to safely share coprocessor resources, arbitrate access to the coprocessors, and query coprocessor status.

13.2 Use with Vector Processing Units

Although much of this thesis directly applies to ASIC coprocessors, much of it can also be applied to other types of coprocessor architectures. Vector processors are one potential use for this work. In recent systems, vector (SIMD) units tend to be used as extensions to the processor's ISA. Because of this, they are not shared by multiple processors, and so they do not need special provisions to access memory. Although SIMD coprocessors can be shared in SMT systems, this sharing is no different than sharing any other function unit within a processor [36, 118]. However, there are plenty of reasons to want to share SIMD units amongst multiple processors. AMD's Bulldozer architecture will do this; however, its sharing is limited to two processors due to latency constraints. A more relevant example of sharing SIMD units can be seen in IBM's Cell architecture [71]. In the Cell microarchitecture, multiple "synergistic processing elements" (SPEs) are shared amongst a small number of CPUs. Each SPE contains a 128-bit SIMD unit and has access to its own private scratchpad memory and a DMA engine. This system shares multiple SPEs amongst general-purpose processor cores; however, the use of DMA engines and scratchpad memories complicates process isolation, forcing application designers to directly manage the communication between the processing elements. It is conceivable that a future SIMD architecture could build upon the work of this thesis, coupling one or more vector processor units with their own private L1 cache. This would allow coprocessor code to execute in the same virtual address space as the original code, without relying on the OS to transfer control to the coprocessor. Applications executing across the host processor and the SIMD units would not have to explicitly transfer data between the units. In systems containing a large on-chip L2 or L3 cache, indirectly accessing the memory hierarchy through an L1 cache can allow the vector units to access data without having to access (high-latency, high-energy) main memory. Future vector coprocessors might also involve design tradeoffs that make them less general than those on current systems. In a future system, a small instruction scratchpad memory could be attached to a vector unit, providing an instruction stream that allows the coprocessor to operate on data contained in a scratchpad memory or large register file. An attached stream controller, similar to the one used in this work, could load data to and from a small scratchpad memory or register file attached to the coprocessor. Extensions to the coprocessor would be necessary, as it is likely that the processor would want more direct control over data access patterns. However, a stream controller like the one proposed has an advantage over DMA engines in that it provides a more fine-grained ability to access memory, reducing the need for the CPU to populate buffers or reprogram a DMA engine to obtain new data. Additionally, the data alignment unit could improve the performance of accesses to unaligned regions of memory. With on-chip shared SIMD units, there is no reason that each kernel would have to execute for tens of thousands of cycles at a time to be useful. Using a model similar to the one in this thesis, a SIMD unit could be programmed to accelerate very short kernels in an application's execution, possibly taking the place of only hundreds of processor instructions. This is similar to the number of "equivalent instructions" executed in many of the Xvid benchmark's kernels.
It might also be useful to have multiple of these SIMD units shared amongst all of the processors in the system, so that multiple CPUs can use the SIMD units simultaneously. In this model it might prove useful to combine multiple units with an L1 cache and a SIMD controller that handles the memory requests from the various stream controllers. Using an attached L1 cache (rather than direct access to the rest of the memory hierarchy) is likely to benefit the energy consumption of these systems, reducing the number of memory accesses to more power-hungry levels of the cache hierarchy. Mechanisms similar to those proposed in Chapter 4 could be used to arbitrate access to the SIMD coprocessors, and these units could be shared using the techniques examined in Chapter 9.

13.3 Use with GPUs

Most GPUs are structured similarly to very wide vector processors. Currently, they tend to be integrated on separate chips due to area and power concerns; however, on mobile devices, systems-on-chips readily integrate GPUs with general-purpose processors (amongst other devices) on a single die [97]. Additionally, in the "desktop" computing market, GPUs are now commonly integrated on the same die as the CPUs [18, 67]. Although GPUs have mostly been used to accelerate graphics computations, in recent years there has been increased interest in using GPUs to accelerate general-purpose tasks. Many of the techniques developed in this thesis can also be used to better interface CPUs with on-chip general-purpose GPUs (GPGPUs). GPUs currently use a separate address space from applications, and often use a separate physical memory to hold their own data. Therefore memory transfers between the GPU and the CPU require OS intervention. However, unlike the RH coprocessors examined in this work, it is unlikely that high-performance GPUs will want to exclusively use the CPU's memory hierarchy, because graphics operations have been fine-tuned to make extensive use of a streaming computation model optimized around fast dedicated memories connected to the GPU. It is, however, possible that future GPUs could directly access both their own local memories and the CPU's memory. Under such a model, the GPU could access the CPU's memory space to directly obtain data for graphics processing, and could use this same memory for accessing data needed to accelerate GPGPU applications.

Current GPGPU systems do not have a strong need for virtually addressable memory, because only a single application can execute at a time. Because of this, the OS uses a form of cooperative multitasking that allows multiple threads to share the device. However, current work has demonstrated that GPGPU resources can be better allocated when multiple applications can execute on the GPU at the same time [1, 2]. If multiple tasks are allowed to execute on the GPU at the same time, memory protection becomes a major concern. Current GPGPU systems are based on an off-chip coprocessor model [20], and the programming model assumes that this is the case. Because of this, current GPGPU kernels must execute for a relatively long time to amortize the setup costs. These costs typically involve allocating memory on the GPU and using the OS to transfer data to and from the GPU's memory before the kernel begins and after it finishes its execution. In systems where the GPU is integrated on the same die as the CPU, alternative approaches to communicating between the CPU and GPU might be beneficial [111]. Such a system could take advantage of a cache-coherent, virtually addressable memory for the same reasons a SIMD coprocessor might. This would simplify data transfer between applications and GPGPU accelerators, and even allow shorter-running kernels to use the GPU, greatly reducing the overheads associated with OS-initiated GPGPU acceleration. Additionally, techniques developed in this thesis could allow applications to access GPGPU kernels without relying on system calls to the OS.

13.4 Use with Massively Parallel Processor Arrays

The coprocessor architecture proposed in this thesis could also be used with arrays of simple processors. Such devices have been implemented in real products such as the Ambric processor, which contains an array of 336 32-bit RISC processors [19]. These devices are currently used as an alternative to FPGAs. In the Ambric chip, a configurable interconnect allows cores to communicate with each other; this interconnect can be reconfigured, allowing different kernels to be implemented on the chip. Although these chips have primarily been used as off-chip accelerators, such massively parallel processor arrays could easily be used within the system model presented in this thesis. Instead of having each tile consist of an FPGA-like fabric, each tile could be an array of simple processors. A high-level interconnect could then allow processors in different "tiles" to communicate with each other, and the OS would allocate resources to the accelerators based on their requirements. In this model, few direct changes would be needed to my architecture, and most of the work presented in this thesis would directly apply to such a system.

13.5 Usefulness of scheduling

Although much of my work can be directly applied to other types of accelerator architectures, my scheduling work cannot be applied so easily. One of the primary motivators behind scheduling what is configured on an RH fabric is that configuring the RH fabric is an expensive operation, much too expensive to continuously change what is configured. Because of this, prior work has examined which applications should have access to the RH fabric at a given time. In these architectures, RH kernels often need to be configured for relatively long (on the order of milliseconds) time periods to amortize the reconfiguration cost. In other accelerator architectures, the coprocessor may not require reconfiguration at all (e.g., if using ASIC coprocessors), or the reconfiguration latency may be quite small (e.g., a context switch to load new instructions into a software-based accelerator). In these situations, it might make sense for applications to use a more fine-grained sharing of the resources, possibly allocating them on demand. However, for many workloads, scheduling of such resources can still result in superior performance. For instance, if one application could obtain much higher performance by using a SIMD accelerator, it might be better to exclusively reserve the accelerator for this application, preventing other applications from using the resource even when it is idle. To really gauge the usefulness of such scheduling, much future work is likely needed. It is my belief that a scheduler for such coprocessor devices needs to be more dynamic than those developed for RH systems, which must work around the RH fabric's relatively long reconfiguration latency. However, the scheduler should still consider the impact of sharing single accelerators, or possibly pools of accelerators, between multiple applications. In this scenario, the problem could even start to resemble the scheduling decisions necessary for an RH fabric when multiple copies of the same (or similar) applications are executing and could use the same coprocessor resources. Although the exact scheduling methodologies used in my research might not apply to all of the other coprocessor technologies, I believe they will help provide a framework upon which newer, more advanced schedulers can improve when scheduling coprocessors that can be rapidly "reconfigured".

Chapter 14

Conclusion

Consumers are demanding faster, smaller, more power-efficient embedded devices that are increasingly multi-purpose, with functionality that is not always known at design time. To help meet these needs, computer architects have been looking toward heterogeneous systems that integrate more functionality on a single chip. In this thesis work, a new reconfigurable computing architecture has been developed that provides programmers with the ability to dynamically accelerate applications through the use of an on-chip reconfigurable fabric. Although many different reconfigurable computing architectures have been researched in the past, most of these systems focused on single-core platforms, many of which included the RH as an off-chip coprocessor. This thesis builds upon these previous systems to create a novel multicore reconfigurable computing system. This new system has been designed from the ground up to take advantage of the low latency and high bandwidth available on a single chip. To support the low-latency communication available on a single-chip platform, a new programming model was introduced that allows applications to safely and efficiently access the coprocessor. In addition, multiple optimizations were made to the programming model to allow for more efficient usage of the shared reconfigurable hardware fabric. In doing this research, I have made great strides toward creating a shared reconfigurable coprocessor optimized for multicore systems. The following are the systems implemented to perform this research:

• I developed a coprocessor platform that allowed low-latency (less than 20 cycles at 1GHz) communication between CPU cores and a shared RH fabric.

• I created a programming interface that allows applications to safely access RH kernels assigned to it without invoking the OS.

• I created mechanisms to share configured RH kernels amongst multiple applications.

• I created a new, optimized RH kernel scheduler to better select which RH kernels should be configured on multicore systems.

By creating these new systems and extensions to RH computing, I showed the following:

• Sharing at least one level of cache between the processor and RH coprocessor reduces communication latency between the two.

• RH coprocessors should have at least a small L1 cache, as it can greatly reduce the dynamic energy consumption of the memory hierarchy.

• A hardware TLB miss handler can greatly increase the performance of multicore RH computing systems.

• When eight cores execute hybrid RH/SW applications, my platform performs ∼ 95% as well as a system where the RH kernels access a zero latency memory.

• Preventing data-intensive, streaming RH kernels from polluting shared caches can improve the performance of coscheduled SW-only applications by up to 32%.

• When SMT processors implement RH-kernel-aware thread selection logic, threads coscheduled alongside an application that repeatedly executes long-running RH kernels achieve ∼ 95% of the performance they would obtain on a dual-core system running alongside the same RH application.

• By sharing configured RH kernels amongst multiple applications, a constrained RH fabric can be used much more efficiently. On an eight-core system that shared RH kernels, applications performed 97.4% as fast as on a system containing almost eight times more RH (where each application had its own copy of each kernel).

• An RH coprocessor using my interface can perform as well as, or better than, dedicated SIMD units, even when executing multimedia algorithms for which SIMD functional units have been optimized.

Through these contributions, I have shown that combining a reconfigurable coprocessor with multiple processor cores on a single chip provides a flexible and powerful processing platform that can help meet the needs of tomorrow's computing tasks.

LIST OF REFERENCES

[1] Jacob Adriaens, Paula Aguilera, Katherine Compton, and Nam Sung Kim. The case for gpgpu spatial multi- tasking. In TECHON, 2011.

[2] Jacob Adriaens, Paula Aguilera, Katherine Compton, and Nam Sung Kim. The case for gpgpu spatial multi- tasking. In Personal Correspondence, 2011.

[3] E. Ahmed and J. Rose. The effect of lut and cluster size on deep-submicron fpga performance and density. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 12(3):288 –298, march 2004.

[4] Altera. Increasing design functionality with partial and dynamic reconfiguration in 28-nm FPGAs. http://www.altera.com/literature/wp/wp-01137-stxv-dynamic-partial-reconfig.pdf, 2010.

[5] Altera. Stratix V FPGAs: Built for bandwidth. 2010.

[6] Altera Corporation. Altera excalibur devices. http://www.altera.com/products/devices/arm/overview/arm- overview.html, 2003.

[7] David L. Andrews, Ron Sass, Erik Anderson, Jason Agron, Wesley Peck, Jim Stevens, Fabrice Baijot, and Ed Komp. The case for high level programming models for reconfigurable computers. In ERSA, pages 21–32, 2006.

[8] R.W. Apperson, Zhiyi Yu, M.J. Meeuwsen, T. Mohsenin, and B.M. Baas. A scalable dual-clock fifo for data transfers between arbitrary and haltable clock domains. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 15(10):1125 –1134, Oct 2007.

[9] ARM. The ARM Cortex-A9 processors. http://www.arm.com/files/pdf/ARMCortexA-9Processors.pdf, 2009.

[10] Jeffrey M. Arnold. S5: The architecture and development flow of a software configurable processor. In IEEE International Conference on Field Programmable Technology, pages 121–128, Dec 2005.

[11] Michael J. Beauchamp, Scott Hauck, Keith D. Underwood, and K. Scott Hemmert. Embedded floating-point units in fpgas. In Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays, FPGA ’06, pages 12–20, New York, NY, USA, 2006. ACM.

[12] Daniel Bernstein and Peter Schwabe. New aes software speed records. In Dipanwita Chowdhury, Vincent Rijmen, and Abhijit Das, editors, Progress in Cryptology - INDOCRYPT 2008, volume 5365 of Lecture Notes in Computer Science, pages 322–336. Springer Berlin / Heidelberg, 2008.

[13] Shuvra S. Bhattacharyya, Johan Eker, Carl Von Platen, Marco Mattavelli, Gordon Brebner, Jrn W. Janneck, and Mickal Raulet. Opendf a dataflow toolset for reconfigurable hardware and multicore systems. 2008. 132

[14] Tobias Bjerregaard and Shankar Mahadevan. A survey of research and practices of network-on-chip. ACM Comput. Surv., 38, June 2006.

[15] Matthias A. Blumrich, Kai Li, Richard Alpert, Cezary Dubnicki, Edward W. Felten, and Jonathan Sandberg. Virtual memory mapped network interface for the shrimp multicomputer. In International Symposium on Com- puter architecture, pages 473–484, New York, NY, USA, 1998. ACM.

[16] Daniel Pierre Bovet and Marco Cassetti. Understanding the Linux Kernel. O’Reilly & Associates, Inc., Se- bastopol, CA, USA, 2000.

[17] G. Brebner. The swappable logic unit: a paradigm for virtual hardware. In IEEE Symposium on FPGA-Based Custom Computing Machines, page 77, Washington, DC, USA, 1997. IEEE Computer Society.

[18] Nathan Brookwood. AMD fusion family of apus: Enabling a superior, immersive pc experience. March 2010.

[19] M. Butts. Synchronization through communication in a massively parallel processor array. IEEE Micro, 27(5):32–40, Sept.-Oct. 2007.

[20] C. Cas¸caval, S. Chatterjee, H. Franke, K. J. Gildea, and P. Pattnaik. A taxonomy of accelerator architectures and their programming models. IBM J. Res. Dev., 54:473–482, September 2010.

[21] Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek, and Andre´ DeHon. Stream compu- tations organized for reconfigurable execution (score). In FPL ’00: Proceedings of the The Roadmap to Re- configurable Computing, 10th International Workshop on Field-Programmable Logic and Applications, pages 605–614, London, UK, 2000. Springer-Verlag.

[22] Cavium Networks. OCTEON plus CN58XX 4 to 16-core MIPS64-based SoCs. http://www.caviumnetworks.com/pdfFiles/CN58XX PB%20Rev%201.5.pdf, 2008.

[23] Herwin Chan, Patrick Schaumont, and Ingrid Verbauwhede. Process isolation for reconfigurable hardware. In Engineering of Reconfigurable Systems and Algorithms, pages 164–170, 2006.

[24] Daniel W. Chang, Christipher D. Jenkins, Philip C. Garcia, Syed Z. Gilani, Paula Aguilera, Aishwarya Nagara- jan, Michael J. Anderson, Matthew A. Kenny, Sean M. Bauer, Michael J. Schulte, and Katherine Compton. Ercbench: An open-source benchmark suite for embedded and reconfigurable computing. International Con- ference on Field Programmable Logic and Applications, 0:408–413, 2010.

[25] S. Che, M. Boyer, J. Meng, D. Tarjan, J.W. Sheaffer, and K. Skadron. A performance study of general- purpose applications on graphics processors using CUDA. Journal of parallel and Distributed Computing, 68(10):1370–1380, 2008.

[26] Zhimin Chen, Neil Pittman, and Alessandro Forin. Combining multicore and reconfigurable instruction set extensions. In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, FPGA ’10, pages 33–36, New York, NY, USA, 2010. ACM.

[27] Daniel Citron, Adham Hurani, and Alaa Gnadrey. The harmonic or geometric mean: does it really matter? SIGARCH Comput. Archit. News, 34:18–25, September 2006.

[28] Nathan Clark, Jason Blome, Michael Chu, Scott Mahlke, Stuart Biles, and Krisztian Flautner. An architecture framework for transparent instruction set customization in embedded processors. In International Symposium on Computer Architecture, pages 272–283, Washington, DC, USA, 2005. IEEE Computer Society. 133

[29] K. Compton, Zhiyuan Li, J. Cooley, S. Knol, and S. Hauck. Configuration relocation and defragmentation for run-time reconfigurable computing. Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, 10(3):209 –220, june 2002.

[30] Katherine Compton and Scott Hauck. Reconfigurable computing: a survey of systems and software. ACM Computing Survey, 34(2):171–210, 2002.

[31] NVIDIA Corp. Nvidia CUDA: Compute unified device architecture. http://developer.download.nvidia.com/compute/cuda/2 0/docs/CudaReferenceManual 2.0.pdf, 2007.

[32] M. Dales. Managing a reconfigurable processor in a general purpose workstation environment. In IEEE Con- ference on Design, Automation and Test in Europe. IEEE Computer Society Washington, DC, USA, 2003.

[33] Fred´ eric´ de Mesmay, Srinivas Chellappa, Franz Franchetti, and Markus Puschel.¨ Computer generation of efficient software Viterbi decoders. In International Conference on High Performance Embedded Architectures and Compilers (HiPEAC), volume 5952 of Lecture Notes in Computer Science, pages 353–368. Springer, 2010.

[34] Cezary Dubnicki, Angelos Bilas, and Kai Li. Design and implementation of virtual memory-mapped commu- nication on myrinet. In IPPS ’97: Proceedings of the 11th International Symposium on Parallel Processing, page 388, Washington, DC, USA, 1997. IEEE Computer Society.

[35] Carl Ebeling, Darren C. Cronquist, and Paul Franklin. Rapid - reconfigurable pipelined datapath. In Proceed- ings of the 6th International Workshop on Field-Programmable Logic, Smart Applications, New Paradigms and Compilers, pages 126–135, London, UK, 1996. Springer-Verlag.

[36] Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, and Dean M. Tullsen. Simulta- neous multithreading: A platform for next-generation processors. IEEE Micro, 17(5):12–??, /1997.

[37] E. El-Araby, P. Nosum, and T. El-Ghazawi. Productivity of high-level languages on reconfigurable computers: An hpc perspective. In Field-Programmable Technology, 2007. ICFPT 2007. International Conference on, pages 257 –260, dec. 2007.

[38] Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Dark silicon and the end of multicore scaling. In International Symposium on Computer architecture, New York, NY, USA, 2011. ACM.

[39] Wenyin Fu and K. Compton. A simulation platform for reconfigurable computing research. In IEEE conference on Field Programmable Logic and Applications, pages 1–7, Aug. 2006.

[40] Wenyin Fu and Katherine Compton. An execution environment for reconfigurable computing. In IEEE Sym- posium on Field-Programmable Custom Computing Machines, pages 149–158, Washington, DC, USA, 2005. IEEE Computer Society.

[41] Wenyin Fu and Katherine Compton. Active kernel monitoring to combat scheduler gaming in reconfigurable computing systems. In Field Programmable Logic and Applications, pages 611 –614, sept. 2008.

[42] Wenyin Fu and Katherine Compton. Scheduling intervals for reconfigurable computing. In Field Custom Computing Machines, page 87, 2008.

[43] Philip Garcia and Katherine Compton. A reconfigurable hardware interface for a modern computing system. In IEEE Symposium on Field-Programmable Custom Computing Machines, pages 73–84, 2007.

[44] Philip Garcia and Katherine Compton. Kernel sharing on reconfigurable multiprocessor systems. In IEEE Conference on Field Programmable Technology, pages 225–232, Dec. 2008. 134

[45] Philip Garcia and Katherine Compton. Shared memory cache organizations for reconfigurable computing sys- tems. In IEEE Symposium on Field-Programmable Custom Computing Machines, volume 0, pages 239–242, Los Alamitos, CA, USA, 2009. IEEE Computer Society.

[46] Philip Garcia and Katherine Compton. A scalable memory interface for multicore reconfigurable computing systems. In to appear in IEEE Conference on Field Programmable Technology, Dec. 2011.

[47] Philip Garcia, Katherine Compton, Michael Schulte, Emily Blem, and Wenyin Fu. An overview of reconfig- urable hardware in embedded systems. Eurasip Special Issue on Reconfigurable Embedded Computing, 2006.

[48] Philip Garcia, Kyle Rupnow, and Katherine Compton. A reconfigurable computing scheduler optimized for multicore systems. International Conference on Field Programmable Logic and Applications, 0:107–112, 2010.

[49] D. Geer. Chip makers turn to multicore processors. Computer, 38(5):11 – 13, may 2005.

[50] Brian Gladman. Aes and combined encryption/authentication modes. http://fp.gladman.plus.com/aes/index.htm.

[51] Seth Copen Goldstein, Herman Schmit, Matthew Moe, Mihai Budiu, Srihari Cadambi, R. Reed Taylor, and Ronald Laufer. Piperench: a co/processor for streaming multimedia acceleration. In International Symposium on Computer architecture, pages 28–39, Washington, DC, USA, 1999. IEEE Computer Society.

[52] D. Grant, P.B. Denyer, and I. Finlay. Synthesis of address generators. In Computer-Aided Design, 1989. ICCAD-89. Digest of Technical Papers., 1989 IEEE International Conference on, pages 116 –119, nov 1989.

[53] Douglas M. Grant and Peter B. Denyer. Address generation for array access based on modulus m counters. In Proceedings of the conference on European design automation, EURO-DAC ’91, pages 118–122, Los Alamitos, CA, USA, 1991. IEEE Computer Society Press.

[54] Steve Guccione, Delon Levi, and Prasanna Sundararajan. Jbits: Java based interface for reconfigurable com- puting. 1999.

[55] Marcelo Gtz, Florian Dittmann, and Tao Xie. Dynamic relocation of hybrid tasks: Strategies and methodolo- gies. Microprocessors and Microsystems, 33(1):81 – 90, 2009. Selected Papers from ReCoSoC 2007 (Recon- figurable Communication-centric Systems-on-Chip).

[56] R. Hartenstein. Coarse grain reconfigurable architectures. In Design Automation Conference, 2001. Proceed- ings of the ASP-DAC 2001. Asia and South Pacific, pages 564 –569, 2001.

[57] J. R. Hauser and J. Wawrzynek. Garp: a mips processor with a reconfigurable coprocessor. In IEEE Symposium on FPGA-Based Custom Computing Machines, page 12, Washington, DC, USA, 1997. IEEE Computer Society.

[58] John Heinlein, Kourosh Gharachorloo, Scott Dresser, and Anoop Gupta. Integration of message passing and shared memory in the stanford flash multiprocessor. SIGPLAN Not., 29:38–50, November 1994.

[59] John L. Hennessy and David A. Patterson. Computer Architecture, Fourth Edition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.

[60] John L. Henning. Spec cpu2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34:1–17, Septem- ber 2006.

[61] M. Herz, R. Hartenstein, M. Miranda, E. Brockmeyer, and F. Catthoor. Memory addressing organization for stream-based reconfigurable computing. Electronics, Circuits and Systems, 2:813–817, 2002. 135

[62] M.D. Hill and M.R. Marty. Amdahl’s law in the multicore era. Computer, 41(7):33 –38, july 2008.

[63] Lisa Hsu, Ravi Iyer, Srihari Makineni, Steve Reinhardt, and Donald Newell. Exploring the cache design space for large scale cmps. SIGARCH Comput. Archit. News, 33(4):24–33, 2005.

[64] Jerry Huck and Jim Hays. Architectural support for translation table management in large address space ma- chines. SIGARCH Comput. Archit. News, 21:39–50, May 1993.

[65] James Hughes, Gary Morton, Jan Pechanec, Christoph Schuba, Lawrence Spracklen, and Bhargava Yenduri. Transparent multi-core cryptographic support on niagara CMT processors. In IWMSE ’09: Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering, pages 81–88, Washington, DC, USA, 2009. IEEE Computer Society.

[66] Intel. Enabling consistent platform-level services for tightly coupled accelerators. http://download.intel.com/- technology/platforms/quickassist/quickassist aal whitepaper.pdf, 2008.

[67] Intel. Intel microarchitecture codename sandy bridge. 2010.

[68] Bruce L. Jacob and Trevor N. Mudge. A look at several memory management units, tlb-refill mechanisms, and page table organizations. SIGPLAN Not., 33:295–306, October 1998.

[69] C. Jenkins, S. Mamidi, M. Schulte, and J. Glossner. Instruction set extensions for aes processing on a multi- threaded software defined radio platform. In Signals, Systems and Computers, 2007. ACSSC 2007. Conference Record of the Forty-First Asilomar Conference on, pages 963 –966, nov. 2007.

[70] K. L. Johnson, M. F. Kaashoek, and D. A. Wallach. Crl: high-performance all-software distributed shared memory. SIGOPS Oper. Syst. Rev., 29:213–226, December 1995.

[71] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the cell multiprocessor. IBM Journal of Research and Development, 49(4.5):589 –604, july 2005.

[72] Hans Kellerer, Ulrich Pferschy, and David Pisinger. Knapsack problems. Springer, 2004.

[73] John H. Kelm and Steven S. Lumetta. HybridOS: runtime support for reconfigurable accelerators. In ACM/SIGDA symposium on Field programmable gate arrays, pages 212–221, New York, NY, USA, 2008. ACM.

[74] M.N. Khan and S. Ghauri. The wimax 802.16e physical layer model. In Wireless, Mobile and Multimedia Networks, 2008. IET International Conference on, pages 117 –120, jan. 2008.

[75] A.A. Khokhar, V.K. Prasanna, M.E. Shaaban, and C.-L. Wang. Heterogeneous computing: challenges and opportunities. Computer, 26(6):18 –27, jun 1993.

[76] D. Koufaty and D.T. Marr. Hyperthreading technology in the netburst microarchitecture. Micro, IEEE, 23(2):56 – 65, march-april 2003.

[77] Richard B. Kujoth, Chi wei Wang, Derek B. Gottlieb, Jeffrey J. Cook, and Nicholas P. Carter. A reconfigurable unit for a clustered programmable-reconfigurable processor. In in Proceedings of the 2004 ACM SIGDA 12th International Symposium on Field Programmable Gate Arrays, pages 200–209, 2004.

[78] Rakesh Kumar, Dean M. Tullsen, Parthasarathy Ranganathan, Norman P. Jouppi, and Keith I. Farkas. Single-isa heterogeneous multi-core architectures for multithreaded workload performance. SIGARCH Comput. Archit. News, 32:64–, March 2004. 136

[79] Georgi Kuzmanov, Georgi Gaydadjiev, and Stamatis Vassiliadis. The molen processor prototype. In IEEE Symposium on Field-Programmable Custom Computing Machines, pages 296–299, Washington, DC, USA, 2004. IEEE Computer Society.

[80] V. Lappalainen, T.D. Hamalainen, and P. Liuha. Overview of research efforts on media isa extensions and their usage in video coding. Circuits and Systems for Video Technology, IEEE Transactions on, 12(8):660 – 670, aug 2002.

[81] N. Lawal, B. Thornberg, and M. O’Nils. Address generation for fpga rams for efficient implementation of real-time video processing systems. International Conference on Field Programmable Logic and Applications, 0:136–141, 2005.

[82] Guangming Lu, Hartej Singh, Ming-Hau Lee, Nader Bagherzadeh, Fadi J. Kurdahi, and Eliseu M. Chaves Filho. The morphosys parallel reconfigurable system. In International Euro-Par Conference on Parallel Processing, pages 727–734, London, UK, 1999. Springer-Verlag.

[83] E. Lubbers and M. Platzner. A portable abstraction layer for hardware threads. In Field Programmable Logic and Applications, 2008. FPL 2008. International Conference on, pages 17 –22, sept. 2008.

[84] Peter S. Magnusson, Magnus Christensson, Jesper Eskilson, Daniel Forsgren, Gustav Hallberg, Johan Hogberg, Fredrik Larsson, Andreas Moestedt, and Bengt Werner. Simics: A full system simulation platform. IEEE Computer, 35(2):50–58, 2002.

[85] S. Mamidi, M.J. Schulte, D. Iancu, and J. Glossner. Architecture support for reconfigurable multithreaded processors in programmable communication systems. In Application -specific Systems, Architectures and Pro- cessors, 2007. ASAP. IEEE International Conf. on, pages 320 –327, july 2007.

[86] Suman Mamidi, Emily R. Blem, Michael J. Schulte, John Glossner, Daniel Iancu, Andrei Iancu, Mayan Moudg- ill, and Sanjay Jinturkar. Instruction set extensions for software defined radio on a multithreaded processor. In CASES ’05: Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, pages 266–273, New York, NY, USA, 2005. ACM Press.

[87] Milo M. K. Martin, Daniel J. Sorin, Bradford M. Beckmann, Michael R. Marty, Min Xu, Alaa R. Alameldeen, Kevin E. Moore, Mark D. Hill, and David A. Wood. Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Computer Architecture News, 33(4):92–99, 2005.

[88] John R. Mashey. War of the benchmark means: time for a truce. SIGARCH Comput. Archit. News, 32:1–14, September 2004.

[89] Mitsuru Matsui and Junko Nakajima. On the power of bitslice implementation on intel core2 processor. In Pascal Paillier and Ingrid Verbauwhede, editors, Cryptographic Hardware and Embedded Systems - CHES 2007, volume 4727 of Lecture Notes in Computer Science, pages 121–134. Springer Berlin / Heidelberg, 2007.

[90] Micron Technology, Inc. DDR2 power calc 16.xls. http://download.micron.com/downloads/misc/- ddr2 power calc web.xls, December 2006.

[91] David P. Montminy, Rusty O. Baldwin, Paul D. Williams, and Barry E. Mullins. Using relocatable bitstreams for fault tolerance. Adaptive Hardware and Systems, NASA/ESA Conference on, 0:701–708, 2007.

[92] K. Nadehara, M. Ilekawa, and L. Kuroda. Extended instructions for the aes cryptography and their efficient implementation. In IEEE workshop on Signal Processing Systems, 2004. 137

[93] V. Nollet, P. Coene, D. Verkest, S. Vernalde, and R. Lauwereins. Designing an operating system for a hetero- geneous reconfigurable SoC. In IPDPS ’03: Proceedings of the 17th International Symposium on Parallel and Distributed Processing, page 174.1, Washington, DC, USA, 2003. IEEE Computer Society.

[94] Grace Nordin, Peter A. Milder, James C. Hoe, and Markus Paschel. Automatic generation of customized discrete fourier transform ips. In DAC ’05: Proceedings of the 42nd annual conference on Design automation, pages 471–474, New York, NY, USA, 2005. ACM Press.

[95] Joshua Noseworthy. Enabling communication between an fpga’s embedded processor and its reconfigurable resources. In M.S. Thesis, Northeastern University, August 2005.

[96] Joshua Noseworthy and Miriam Leeser. Efficient use of communications between an fpga’s embedded pro- cessor and its reconfigurable logic. In International symposium on Field programmable gate arrays, pages 233–233, New York, NY, USA, 2006. ACM.

[97] Nvidia. The benefits of multiple cpu cores in mobile devices. http://www.nvidia.com/content/PDF/tegra white papers/Benefits-of-Multi-core-CPUs-in-Mobile- Devices Ver1.2.pdf, 2010.

[98] Kunle Olukotun, Basem A. Nayfeh, Lance Hammond, Ken Wilson, and Kunyung Chang. The case for a single- chip multiprocessor. SIGPLAN Not., 31:2–11, September 1996.

[99] Iyad Ouaiss, Sriram Govindarajan, Vinoo Srinivasan, Meenakshi Kaul, and Ranga Vemuri. An integrated partitioning and synthesis system for dynamically reconfigurable multi-fpga architectures. In Jos Rolim, editor, Parallel and Distributed Processing, volume 1388 of Lecture Notes in Computer Science, pages 31–36. Springer Berlin / Heidelberg, 1998.

[100] J. D. Owens, D. Luebke., N. Govindaraju., M. Harris, J. Kruger, A.E. Lefohn, and T.J. Purcell. A survey of general-purpose computation on graphics hardware. In State of the Art Report, Eurographics, pages 21–51, 2005.

[101] U. Ramacher. Software-defined radio prospects for multistandard mobile phones. Computer, 40(10):62 –69, oct. 2007.

[102] P. Ranganathan, S. Adve, and N.P. Jouppi. Performance of image and video processing with general-purpose processors and media isa extensions. In Proceedings of the 26th International Symposium on Computer Archi- tecture, pages 124 –135, 1999.

[103] J. Resano, D. Mozos, D. Verkest, and F. Catthoor. A reconfigurable manager for dynamically reconfigurable hardware. Design Test of Computers, IEEE, 22(5):452 – 460, sept.-oct. 2005.

[104] Alessandro Rubini, Jonathan Corbet, and Greg Kroah-Hartman. Linux Device Drivers. O’Rielly, third edition, 2005.

[105] Kyle Rupnow. Resource mangement for reconfigurable computing systems. PhD Thesis, University of Wisconsin-Madison, 2010.

[106] Kyle Rupnow, Jacob Adriaens, Wenyin Fu, and Katherine Compton. Accurately evaluating application perfor- mance in simulated hybrid multi-tasking systems. In to appear in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2010.

[107] Kyle Rupnow, Wenyin Fu, and Katherine Compton. Block, drop or roll(back): Alternative preemption methods for rh multi-tasking. IEEE symposium on Field-Programmable Custom Computing Machines, 0:63–70, 2009. 138

[108] David Sheldon, Rakesh Kumar, Roman Lysecky, Frank Vahid, and Dean Tullsen. Application-specific cus- tomization of parameterized fpga soft-core processors. In Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, ICCAD ’06, pages 261–268, New York, NY, USA, 2006. ACM.

[109] Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne. Operating System Concepts. Wiley Publishing, 8th edition, 2008.

[110] Wilson Snyder, Paul Wasson, and Duane Galbi. Verilator. In http://www.veripool.com/verilator.html, 2007.

[111] P.M. Stillwell, V. Chadha, O. Tickoo, S. Zhang, R. Illikkal, R. Iyer, and D. Newell. Hippai: High performance portable accelerator interface for socs. In High Performance Computing (HiPC), 2009 International Conference on, pages 109 –118, dec. 2009.

[112] John E. Stone, David Gohara, and Guochun Shi. Opencl: A parallel programming standard for heterogeneous computing systems. IEEE Des. Test, 12:66–73, May 2010.

[113] Irfan Syed, John A. Williams, and Neil W. Bergmann. A hybrid reconfigurable cluster-on-chip architecture with message passing interface for image processing applications. In FPL, pages 609–612, 2007.

[114] S. Thoziyoor, N. Muralimanohar, J.H. Ahn, and N.P. Jouppi. CACTI 5.1. HP Laboratories, Palo Alto, Tech. Rep, 20, 2008.

[115] T.J. Todman, G.A. Constantinides, S.J.E. Wilton, O. Mencer, W. Luk, and P.Y.K. Cheung. Reconfigurable com- puting: architectures and design methods. Computers and Digital Techniques, IEE Proceedings -, 152(2):193 – 207, mar 2005.

[116] Nathan Tuck and Dean M. Tullsen. Initial observations of the simultaneous multithreading Pentium 4 processor. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques, September 2003.

[117] Dean M. Tullsen and Jeffery A. Brown. Handling long-latency loads in a simultaneous multithreading proces- sor. In Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, MICRO 34, pages 318–327, Washington, DC, USA, 2001. IEEE Computer Society.

[118] Dean M. Tullsen, Susan Eggers, and Henry M. Levy. Simultaneous multithreading: Maximizing on-chip par- allelism. In Proceedings of the 22nd Annual International Symposium on Computer Architecture, June 1995.

[119] S. Uhrig, S. Maier, G. Kuzmanov, and T. Ungerer. Coupling of a reconfigurable architecture and a multithreaded processor core with integrated real-time scheduling. In Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, page 4 pp., april 2006.

[120] Miljan Vuletic,´ Laura Pozzi, and Paolo Ienne. Programming transparency and portable hardware interfacing: Towards general-purpose reconfigurable computing. In ASAP ’04: Proceedings of the Application-Specific Systems, Architectures and Processors, 15th IEEE International Conference on (ASAP’04), pages 339–351, Washington, DC, USA, 2004. IEEE Computer Society.

[121] Miljan Vuletic, Laura Pozzi, and Paolo Ienne. Seamless hardware-software integration in reconfigurable com- puting systems. IEEE Design and Test of Computers, 22(2):102–113, 2005.

[122] H. Walder and M. Platzner. Online scheduling for block-partitioned reconfigurable devices. In Design, Au- tomation and Test in Europe Conference and Exhibition, 2003, pages 290 – 295, 2003. 139

[123] H. Walder, C. Steiger, and M. Platzner. Fast online task placement on fpgas: free space partitioning and 2d- hashing. In Parallel and Distributed Processing Symposium, 2003. Proceedings. International, page 8 pp., april 2003.

[124] Matthew A. Watkins and David H. Albonesi. Remap: A reconfigurable heterogeneous multicore architecture. In Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO ’43, pages 497–508, Washington, DC, USA, 2010. IEEE Computer Society.

[125] Grant Wigley and David Kearney. The first real operating system for reconfigurable computers. In ACSAC ’01: Proceedings of the 6th Australasian conference on Computer systems architecture, pages 130–137, Washington, DC, USA, 2001. IEEE Computer Society.

[126] Grant Wigley and David Kearney. The management of applications for reconfigurable computing using an operating system. In CRPIT ’02: Proceedings of the seventh Asia-Pacific conference on Computer systems architecture, pages 73–81, Darlinghurst, Australia, Australia, 2002. Australian Computer Society, Inc.

[127] Grant B. Wigley and David A Kearney. Research issues in operating systems for reconfigurable computing. In Engineering of Reconfigurable Systems and Algorithms, June 2002.

[128] J.A. Williams, I. Syed, J. Wu, and N.W. Bergmann. A reconfigurable cluster-on-chip architecture with mpi communication layer. Field-Programmable Custom Computing Machines, Annual IEEE Symposium on, 0:351– 352, 2006.

[129] Kenneth M. Wilson, Kunle Olukotun, and Mendel Rosenblum. Increasing cache port efficiency for dynamic superscalar microprocessors. SIGARCH Comput. Archit. News, 24:147–157, May 1996.

[130] R.D. Wittig and P. Chow. OneChip: An FPGA processor with reconfigurable logic. In Kenneth L. Pocek and Jeffrey Arnold, editors, IEEE Symposium on FPGAs for Custom Computing Machines, pages 126–135, Los Alamitos, CA, 1996. IEEE Computer Society Press.

[131] Xilinix Inc. Virtex-5 family overview. http://www.xilinx.com/support/documentation/data sheets/ds100.pdf, 2009.

[132] Like Yan, Binbin Wu, Yuan Wen, Shaobin Zhang, and Tianzhou Chen. A reconfigurable processor architecture combining multi-core and reconfigurable processing unit. In Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology, CIT ’10, pages 2897–2902, Washington, DC, USA, 2010. IEEE Computer Society.

[133] Zhi Alex Ye, Andreas Moshovos, Scott Hauck, and Prithviraj Banerjee. Chimaera: a high-performance archi- tecture with a tightly-coupled reconfigurable functional unit. In International symposium on Computer archi- tecture, pages 225–235, 2000.

[134] Pavel G. Zaykov, Georgi K. Kuzmanov, and Georgi N. Gaydadjiev. Reconfigurable multithreading architec- tures: A survey. In Proceedings of the 9th International Workshop on Embedded Computer Systems: Architec- tures, Modeling, and Simulation, SAMOS ’09, pages 263–274, Berlin, Heidelberg, 2009. Springer-Verlag.