Adaptable OS Services for Distributed Reconfigurable Systems On

Adaptable OS Services for Distributed Reconfigurable Systems on Chip Sufyan L. M. Samara Design of distributed embedded systems University of Paderborn A thesis submitted to the Faculty of Computer Science, Electrical Engineering, and Mathematics of the University of Paderborn in partial fulfillment of the requirements for the degree of Dr. rer. nat. November 2010 Abstract The ever guest for more computational capabilities leads to embedded systems which consist of multiple computational elements integrated on a single chip. An example is the integration of a reconfigurable fabric (FPGA) with a number of general purpose processors to form what so called a reconfigurable system on chip. These embedded systems are common to be distributed. This creates a flexible high performance distributed system. However, it is very complex when it comes to management. Applications running on such systems are expected to be dynamic in re- gard of arriving and leaving the system. This increases the complexity as the resources and the demands would change unpredictably. In this work, an OS service model, which efficiently adapts to the various changes in these systems, is presented. In addition, the algorithms and the methodologies, developed to allow this novel OS service model to interact with the application demands and the environment unpredictable dynamic variations, are discussed. Furthermore, the extensive evaluations of these algorithms are presented. Finally, a case study, which introduces the triple data encryption standard as prove of concept, is provided. Zusammenfassung Das ständige Streben nach immer größeren Rechenkapazitäten führt zu eingebetteten Systemen, die aus mehreren Verarbeitungselementen bestehen, die auf einem Chip in- tegriert sind. Ein Beispiel dafür ist die Integration eines rekonfigurierbaren Gewebes (FPGA) mit mehreren Universalprozessoren, um ein rekonfigurierbares System auf einem Chip zu bilden. Typischerweise werden diese Systeme verteilt. Dieses schafft flexible, verteilte Hochleistungssysteme. Allerdings sind diese Systeme aus Verwal- tungssicht hochgradig komplex. Es wird erwartet, dass Anwendungen, die auf diesen Systemen laufen, dynamisch in das System hineinkommen und es verlassen. Dieses erhöht die Komplexität, da sich Ressourcen und Anforderungen unvorhersehbar verändern können. In dieser Arbeit wird ein Betriebssystemdienstmodell präsentiert, welches effizient eingesetzt werden kann und sich den verschiedenartigen Veränderungen in diesen Sys- temen anpasst. Darüber hinaus werden Algorithmen und Methodologien diskutiert, die es diesem neuartigen Betriebssystemmodell erlauben mit den unvorhersehbaren Vari- ationen der Anforderungen der Anwendungen sowie der Umgebung zu interagieren. Ferner werden extensive Evaluationen dieser Algorithmen präsentiert. Das Dokument schließt mit einer Fallstudie des Triple Data Encryption Standards als Konzeptnachweis ab. Acknowledgements In the Name of ALLAH, Most Gracious, Most Merciful. All praise and thanks are due to ALLAH (the one and only GOD, the cre- ator of all and everything) and peace and blessings be upon his messenger Muhammad, he who said: "One who does not thank people does not give thanks to ALLAH, either." 1, I hereby thank my father, my mother, my aunt Fatimah, my family, the DAAD, Prof. Franz Rammig, my friends, and all who supported me in this work. Muharram 1432 (December 2010) Sufyan Lutfi Samara 1Tirmidhi, Birr 35, 1955; Abu Dawud, Adab 12, 4811 Contents List of Figures ix List of Tables xi Glossary xv 1 Introduction1 1.1 Motivation . .2 1.2 Thesis contribution . .3 1.3 Thesis structure . .4 2 Background and related work7 2.1 Reconfigurable systems . .8 2.1.1 FPGA and reconfiguration . 10 2.1.2 Co-design and partitioning . 13 2.2 Operating system for embedded systems . 14 2.2.1 Configurability in embedded operating systems . 15 2.2.2 Operating systems and reconfigurable elements . 17 2.3 Chapter conclusion . 19 3 System design and architecture 21 3.1 System design . 21 3.1.1 Centralized distribution . 22 3.1.2 Fully distributed topology . 24 3.1.3 Hybrid distribution topology . 25 3.2 RSoC architecture . 27 3.2.1 Middleware . 27 3.2.2 Virtual Machine . 33 3.3 Chapter conclusion . 33 v CONTENTS 4 OS Service structure 35 4.1 OS service design objectives . 36 4.2 OS service design . 38 4.2.1 SESs execution and applications . 41 4.2.2 Example: Realtime adaptation . 42 4.3 Chapter Conclusions . 45 5 OS service configurations and adaptation 47 5.1 OS service model formal definition . 49 5.2 Optimal configurations and constraints . 51 5.2.1 Single criterion optimization algorithm . 52 5.2.2 Evaluating for a suitable OS service configuration . 55 5.2.3 Evaluating for Pareto optimal OS service configuration . 57 5.2.4 Wrap up example . 62 5.3 SESs granularity and pipelining . 66 5.3.1 Partitioning granularity . 66 5.3.2 Pipelining execution and communication time . 71 5.4 Chapter Conclusions . 74 6 OS services distribution 75 6.1 SESs distribution . 76 6.1.1 Heterogeneity and Distributed Environments . 76 6.1.2 Fault Tolerance or Availability . 77 6.1.3 Distribution stages . 78 6.2 The Initialization stage . 78 6.2.1 Distribution algorithm . 80 6.3 The discovery and execution routing stage . 83 6.3.1 The building of the execution routing graphs . 84 6.3.2 Discovery and execution using self-x routing . 86 6.4 Chapter Conclusion . 91 7 Methods Evaluation 93 7.1 Evaluating OS service configurations . 94 7.1.1 OS service configurations under limited resources . 94 7.1.2 OS service Pareto optimal configuration . 98 7.2 OS service distribution . 100 7.2.1 Load balancing over fork distribution patterns . 101 7.2.2 Load balancing over RSoCs . 102 7.3 Chapter conclusion . 106 vi CONTENTS 8 Case study 107 8.1 Reconos . 107 8.1.1 SESs support . 108 8.1.2 SESs implementation . 109 8.2 Triple DES . 113 8.3 Results . 115 8.4 Chapter conclusion . 119 9 Conclusion and future directions 121 9.1 A brief summary . 121 9.2 Future directions . 123 9.2.1 Distribution and optimization . 123 9.2.2 Online model checking and recovery . 125 Author’s Publications 129 References 131 vii CONTENTS viii List of Figures 2.1 Embedded Systems: flexibility vs. performance . .9 2.2 Different RH and GPP connection strategies . .9 2.3 Reconfigurable computing as accelerator . 10 2.4 FPGA basic inner components . 11 2.5 Virtex II pro CLB and slice basic elements . 12 2.6 Xilinx Virtex-II pro structure . 12 3.1 Centralized topology . 23 3.2 Hybrid topology . 26 3.3 RSoC and middleware architecture . 28 3.4 GPP area representation . 30 4.1 OS service with two partitioned implementations . 39 4.2 SES general structure . 40 4.3 OS service realtime adaptation Example 1 . 43 4.4 OS service realtime adaptation Example 2 . 44 5.1 OS service configurations exhaustive exploration search . 48 5.2 General OS service model with edge representation . 50 5.3 Maximum and minimum thresholds representation . 60 5.4 Example of OS service with random meta-data . 63 5.5 Warp up example optimal configurations . 65 5.6 OS service and maximum inter SESs communication overhead . 68 5.7 Fine granularity partitioned OS service with and without inter communication . 70 5.8 A comparison of the minimum allowable SES granularity . 71 5.9 A comparison of partitioned and non-partitioned OS service W CET . 72 5.10 A comparison of a pipelined partitioned and non-partitioned OS service W CET .............................. 73 ix LIST OF FIGURES 5.11 A comparison of a pipelined partitioned, non-pipelined, and non-partitioned processing time . 73 6.1 Random SESs distribution . 79 6.2 The general fork distribution . 81 6.3 An OS service with its execution routing graph . 87 6.4 Execution configuration using self-x routing . 88 7.1 Evaluation of OS service suitable found configurations . 96 7.2 Evaluating OS service number of configurations for variable power and response time ratios . 97 7.3 Evaluation of the algorithm that finds the Pareto optimal OS service configuration . 99 7.4 Normalized representation of an OS service configurations resources requirements . 100 7.5 Distribution load balancing over fork distribution pattern set . 101 7.6 Load balance distribution based on one prioritized resource . 103 7.7 Load balance distribution of all resources . 105 8.1 ReconOS unified hardware/software thread modeling . 108 8.2 ReconOS OSIF and synchronization state machine . 111 8.3 The DES block diagram . 113 8.4 Partitioned 3DES OS service . 114 8.5 Case study: Pipelined vs non-pipelined execution . 116 8.6 Case study: partitioned and non-partitioned execution . 118 8.7 Case study: pipelined and non-partitioned execution . 118 9.1 A brief summary of OS service adaptation . 122 9.2 Execution network . 124 9.3 Recovering a possible fault in OS service execution . 126 9.4 SESs, middleware, and model checker integration . 127 x List of Tables ∗ 5.1 Time optimized Ck[l] with T =42 . 63 ∗ 5.2 Time optimized Mk[l] with M =h ................... 63 ∗ 5.3 Power optimized Ck[l] with P =19 . 64 ∗ 5.4 Power optimized Mk[l] with M =s................... 64 ∗ 5.5 Area optimized Ck[l] with A =19 . 64 ∗ 5.6 Area optimized Mk[l] with M =s.................... 64 6.1 Fields of FDAnt message . 90 6.2 Fields of BDAnt message . 90 7.1 Arbitrary example of OS service optimal resource requirements . 95 8.1 Inter communication overhead . 115 8.2 SESs FPGA and memory utilization . 116 xi LIST OF TABLES xii List of Algorithms 5.1 Single criterion optimal SESs configuration . 54 5.2 GetSuitableSESsConfigurations . 56 5.3 Pareto optimization for OS services . 61 6.1 General Fork Distribution Algorithm . 81 6.2 Make Execution Routing Graph . ..

Adaptable OS Services for Distributed Reconfigurable Systems On

Eee4120f Hpes

Review of FPD's Languages, Compilers, Interpreters and Tools

A Mobile Programmable Radio Substrate for Any‐Layer Measurement and Experimentation

Total Ionizing Dose Effects on Xilinx Field-Programmable Gate Arrays

Altera Powerpoint Guidelines

System-Level Methods for Power and Energy Efficiency of Fpga-Based Embedded Systems

SDCC Compiler User Guide

A TMD-MPI/MPE Based Heterogeneous Video System

SDCC Compiler User Guide

Altera Powerpoint Guidelines

Direct Hardware Generation from High-Level Programming Language

Xilinx/Synopsys Interface Guide— ISE 4 Printed in U.S.A