FlexWAFE - an Architecture for Reconfigurable Image Processing Systems
FlexWAFE - eine Architektur für rekonfigurierbare-Bildverarbeitungssysteme

Dissertation approved by the Faculty of Electrical Engineering, Information Technology, Physics of the Technische Universität Carolo-Wilhelmina zu Braunschweig for the award of the degree of Doktor-Ingenieur (Dr.-Ing.)

By: Eng. Amilcar do Carmo Lucas
From: Almeirim, Portugal
Submitted on: 19.04.2012
Oral examination on: 23.07.2012
Chair: Prof. Dr.-Ing. Harald Michalik
First examiner: Prof. Dr.-Ing. Rolf Ernst
Second examiner: Prof. Dr.-Ing. Holger Blume
2012

Kurzfassung

Recently, demand for high-resolution digital media content has increased in the cinema and television industries. Existing systems either do not meet the requirements or are too expensive. New hardware systems and new programming techniques are needed to satisfy the requirements for high-resolution, high-quality images and to reduce costs. The industry is looking for a flexible architecture that runs multiple applications on standard components with reduced development times. Until now, common practice has been to develop specialized architectures and systems that target a single application. This offers little flexibility and leads to high development costs, because every new application is designed almost from scratch.

Our focus was to develop an architecture suited to image processing that has the flexibility to run multiple applications on the same FPGA-based hardware platform. The novelty of our approach is that we reconfigure parts of the architecture at run time, but without the time and constraint penalties of FPGA partial-reconfiguration techniques. The architecture uses a hierarchical control structure that is well suited to parallel processing and allows single-cycle-latency reconfiguration of parts of the processing pipeline.
This is achieved using relatively few resources for the distributed control structures.

To test the developed architecture, a complex film-grain noise reduction algorithm was implemented on a standard hardware platform developed by Thomson-Grass Valley. The system met all requirements and placed very little load on the hierarchical control structures, leaving ample headroom for much more complex control demands.

The architecture has been ported to other hardware platforms, and other applications have been implemented as well. Run-time reconfigurability has been a key factor in the success of the FlexWAFE.

Abstract

Recently there has been an increase in demand for high-resolution digital media content in both the cinema and television industries. Existing equipment either does not meet the requirements or is too costly. New hardware systems and new programming techniques are needed to meet the high-resolution, high-quality image requirements and to reduce costs. The industry seeks a flexible architecture capable of running multiple applications on top of standard off-the-shelf components, with reduced development time. Until now, standard practice has been to develop specialized architectures and systems that target a single application. This offers little flexibility and leads to high development costs, because every new application is designed almost from scratch.

Our focus was to develop an architecture that is suited to image stream processing and has the flexibility to run multiple applications on the same FPGA-based hardware platform. The novelty of our approach is that we reconfigure parts of the architecture at run time, but without incurring the time and added-constraint penalties of FPGA partial-reconfiguration techniques.
The architecture uses a hierarchical control structure that is well suited to parallel processing and allows single-cycle-latency reconfiguration of parts of the processing pipeline. This is achieved using relatively few resources for the distributed control structures.

To test the developed architecture, a complex film-grain noise reduction algorithm was implemented on an off-the-shelf hardware platform developed by Thomson-Grass Valley. The system met all the requirements and placed very little load on the hierarchical control structures, leaving growth headroom for much more complex control demands.

The architecture has been ported to other hardware platforms, and other applications have been implemented as well. The run-time reconfigurability has proven to be a key factor in the success of the FlexWAFE.

Acknowledgments

I would like to thank my thesis advisor, Professor Rolf Ernst, for the opportunity and trust he gave me when he offered me the position after a short three-month DAAD internship at his institute. I am very grateful for the motivation, guidance and insightful suggestions he gave me throughout my work. Furthermore, I would like to thank Professor Harald Michalik for chairing the examination committee and Professor Holger Blume for the co-examination.

I would like to express my deepest gratitude to my colleague and friend Dr. Sven Heithecker, who shared the office with me for many years and worked with me on the same project. I learned a lot from him; our cooperative brainstorming sessions helped shape the FlexWAFE architecture, and he also helped me adapt to a new country. Many thanks also go to my office colleague Henning Sahlbach, who supported me in the final stages of the thesis and built my Dr. hat [1]. I would like to thank Dr. Marek Jersak for the trust he gave me after the short internship, and for the friendship.
An important role in this thesis was also played by the excellent work environment provided by all the colleagues at the institute. I am thankful for the Kaffee Runde discussions, the institute excursions, the Christmas parties, and the extracurricular hat-building opportunities provided by the institute.

On a more personal note, I would like to thank my loving wife for all the support and encouragement she gave me throughout this thesis.

Most importantly of all, I would like to thank my parents for the patience they had when educating me, and for the lovely way they found to bring up technical curiosity in me. At age three I was already sure I wanted to be an electronics engineer. Thank you.

[1] PhD candidates at IDA get a big, heavy, personalized, microcontroller-controlled hat with lots of flashing lights and moving parts.

Contents

Kurzfassung
Abstract
Acknowledgments
List of Figures
1. Introduction
  1.1. Digital Film Processing
    1.1.1. Image Gathering
    1.1.2. Post Production
      1.1.2.1. Application
      1.1.2.2. Techniques
    1.1.3. Delivery
    1.1.4. Resolutions
  1.2. The FlexFilm Project
    1.2.1. Usage of FPGAs
    1.2.2. Motivation
    1.2.3. Project partners and their work-packages
    1.2.4. Example Application
      1.2.4.1. Film Grain Noise
      1.2.4.2. The Algorithm
    1.2.5. Hardware Architecture
      1.2.5.1. FPGAs
    1.2.6. Communication Channels
    1.2.7. Contribution to FlexFilm
  1.3. Thesis Outline
2. Processing platforms
  2.1. Processing platforms for digital film processing
    2.1.1. Line Dancer
    2.1.2. GPU
    2.1.3. Cell Processor
    2.1.4. Storm-1
    2.1.5. FPGA based processors
    2.1.6. FPOA
    2.1.7. Software on standard processors
  2.2. FPGA programming methodologies
    2.2.1. C as input
    2.2.2. Matlab/Simulink as input
    2.2.3. Hardware description languages
  2.3. Summary and Conclusion
3. FlexWAFE
  3.1. FlexWAFE Reconfigurable Architecture
  3.2. Inter-block Signaling
    3.2.1. Data Types
  3.3. Data Processing Unit (DPU)
    3.3.1. Processing Groups
    3.3.2. SIMD like processing
  3.4. Local Memory with Controller (LMC)
    3.4.1. Asynchronous FIFOs
    3.4.2. Address Stepper
    3.4.3. Cascaded Stepper
    3.4.4. LMC reorder streams
    3.4.5. LMC with external memory
    3.4.6. Large FIFO based on external SDRAM memory
    3.4.7. Other LMCs
  3.5. Custom DDR-SDRAM Memory Controller
  3.6. Inter-chip and Inter-board Communication
  3.7. Control and Programmability
    3.7.1. Local Controller
    3.7.2. Algorithm Controller (AC)
    3.7.3. Control Bus
  3.8. Summary