
DISSERTATION

Time-Predictable Java Chip-Multiprocessor

submitted in partial fulfillment of the requirements for the degree of

Doctor of Technical Sciences (Doktor der technischen Wissenschaften)

under the supervision of

O.Univ.Prof. Dipl.-Ing. Dr. Herbert Grünbacher and Univ.Ass. Dipl.-Ing. Dr. Martin Schöberl, Institute of Computer Engineering (Institut für Technische Informatik), Real-Time Systems Group

submitted to the Vienna University of Technology, Faculty of Informatics

by

Christof Pitter
Matr.-Nr. 0327072
Markhofgasse 4/17, A-1030 Wien

Vienna, March 2009

Time-Predictable Java Chip-Multiprocessor

Embedded systems are frequently used in real-time applications. Such applications must undergo timing analysis to ensure that all timing constraints are met and the mission succeeds. Static worst-case execution time (WCET) analysis yields safe and precise upper bounds on the execution times of tasks for a given hardware platform. It is preferred to measurement-based analysis methods because it is guaranteed to consider all possible execution times.

The purpose of this thesis is to design a novel chip-multiprocessor (CMP) solution for the development of Java real-time applications. This chip-multiprocessor system consists of a global physical memory accessible to all processors. A memory arbiter resolves concurrent accesses of multiple CPUs to the main memory. This architecture enables simple communication by accessing shared data objects. This thesis investigates whether a shared memory multiprocessor can serve as a hardware platform for real-time applications. The great challenge is that tasks running on different CPUs of a CMP influence each other's execution times when accessing memory. Therefore, the system's arbiter must limit these interdependencies to make the WCETs of individual tasks analyzable. An adaptation of a static WCET tool for use with the multiprocessor architecture shall permit straightforward WCET analysis.

In this study, the proposed CMP design is implemented using field-programmable gate array technology. Three different arbitration policies are developed: a fixed priority, a fair-based, and a time-sliced arbiter. Timing analysis approaches are carried out for each of the specified memory arbiters. Various CMP configurations with a varying number of CPUs are evaluated, analyzed, and compared with respect to their real-time and average-case performance. Several benchmarks are executed on real hardware for this purpose.

The results of this study reveal that only the time-sliced memory arbitration scheme allows the calculation of viable WCET bounds for Java applications. A comparison of different CMP configurations shows that dynamic arbitration mechanisms are less predictable in the temporal domain but show better average-case performance. The principal conclusion of this research is that timing analysis is possible for homogeneous multiprocessor systems with a shared memory.

Real-Time Java Chip-Multiprocessor

Embedded systems are frequently used in safety-critical applications. Such applications require an analysis of their timing behavior in order to guarantee the timing requirements and to prevent misbehavior with dramatic consequences. Static worst-case execution time (WCET) analysis determines safe and precise bounds on program execution times for a given hardware platform. It is preferred to measurement-based analysis because all possibly occurring execution times are taken into account.

The purpose of this dissertation is the development of an innovative chip-multiprocessor (CMP) solution for the development of real-time applications in Java. This symmetric multiprocessor system consists of several CPUs and a global main memory. An arbiter resolves memory access conflicts between the processors. Communication between the processors is established via shared variables. The dissertation investigates whether a symmetric multiprocessor (SMP) can be used as a platform for real-time applications. The great challenge is that the execution times of tasks influence each other through shared memory accesses. Therefore, an arbiter must bound these mutual interferences so that maximum execution times of individual tasks can be analyzed. The adaptation of an existing analysis tool for use with SMPs shall enable straightforward execution time determination.

Part of this research work was the implementation of the CMP in FPGA technology. Three different arbiter types were developed: an arbiter with fixed priority, a fair arbiter, and an arbiter that divides the shared memory bandwidth into time slots. The timing analysis concept is presented in detail for each arbiter. Various CMP configurations with different numbers of processors are evaluated, analyzed, and compared with respect to their real-time behavior and their computing performance. For this purpose, benchmarks are executed on the prototype.

The results of this dissertation confirm that only the time-sliced arbiter permits the calculation of useful maximum execution times. The comparison of different CMP configurations shows that dynamic arbitration algorithms are less predictable in the time domain but offer greater performance potential. This dissertation demonstrates that static timing analysis is possible for an SMP.

Acknowledgements

The research for this thesis has been conducted during my employment as a research assistant at the Institute of Computer Engineering, Real-Time Systems Group within the Vienna University of Technology.

First, I would like to thank my adviser Professor Dr. Herbert Grünbacher for his skillful advice, valuable support, and helpful suggestions. I would like to extend my gratitude towards my secondary adviser Professor Dr. Reinhold Weiss for beneficial comments and suggestions on the thesis.

Special thanks go to my thesis co-adviser and colleague Martin Schöberl. We often had interesting and valuable discussions! I really appreciated your inspiration and motivation before upcoming paper submission deadlines. Your commitment and enthusiasm improved my work considerably during the last four years. Furthermore, I would like to thank Peter Puschner, Wolfgang Puffitsch, and Bernhard Gressl for constructive comments and helpful suggestions on the thesis.

I would like to acknowledge the financial support of the Austrian Federal Ministry of Transport, Innovation, and Technology (BMVIT). This work would not have been possible without their funding.

Special thanks go to my friends from all over the world, for keeping real life more interesting than the virtual one. They are an endless source of joy and inspiration and I would not be the same without them.

I am deeply grateful to Valerie and my family for their constant support. This thesis is dedicated to them.

Contents

1 Introduction ...... 1
1.1 Motivation ...... 1
1.2 Problem Definition and Objectives ...... 6
1.3 Contributions ...... 7
1.4 Thesis Outline ...... 9

2 Time Predictable CPU and DMA Shared Memory Access ...... 15
2.1 Introduction ...... 16
2.2 Related Work ...... 17
2.3 CPU/DMA shared memory access ...... 18
2.4 Evaluation ...... 21
2.5 Conclusion and Future Work ...... 27

3 Towards a Java Multiprocessor ...... 29
3.1 Introduction ...... 30
3.2 Related Work ...... 31
3.3 CMP Architecture ...... 33
3.4 Implementation ...... 39
3.5 Experiments ...... 43
3.6 Conclusion ...... 46

4 Performance Evaluation of a Java Chip-Multiprocessor ...... 51
4.1 Introduction ...... 52
4.2 Related Work ...... 53
4.3 Overview of JopCMP ...... 56
4.4 Performance Evaluation ...... 61
4.5 Conclusion and Future Work ...... 68

5 Time-Predictable Memory Arbitration for a Java Chip-Multiprocessor ...... 73
5.1 Introduction ...... 74
5.2 Related Work ...... 75
5.3 JopCMP Architecture ...... 76
5.4 WCET Analysis ...... 79
5.5 Results ...... 86
5.6 Conclusion and Future Work ...... 91

6 Further Analysis and Evaluation ...... 95
6.1 Memory Arbitration Revisited ...... 95
6.2 Timing Analysis ...... 100
6.3 Performance Evaluation ...... 105
6.4 Discussion ...... 110

7 Conclusion and Outlook ...... 115
7.1 Thesis Goal ...... 115
7.2 Research Compendium ...... 116
7.3 Thesis Relevance ...... 117
7.4 Outlook ...... 118
7.5 Summary ...... 121

A Measurement-based Verification of Bytecode WCET Analysis ...... 123
A.1 Verification Goal ...... 123
A.2 Verification Method ...... 123
A.3 Verification Result of Bytecode iaload ...... 124

B Analyzed Bytecode WCETs ...... 127

C List of Acronyms ...... 131

List of Figures

1.1 Example of safe upper execution time bounds ...... 3
1.2 Shared memory multiprocessor ...... 6

2.1 The blocked and the spread memory access scheme of the DMA task ...... 19
2.2 JopVga system ...... 22

3.1 CMP Memory Models: a) Shared memory model, b) Distributed shared memory model ...... 34
3.2 Time predictable CMP architecture ...... 38
3.3 Dual-core JopCMP system ...... 42

4.1 Overview of JopCMP ...... 57
4.2 Memory access arbitration of the fairness-based arbiter ...... 59
4.3 Performance comparison of JopCMP, running three different benchmarks ...... 66

5.1 JopCMP Architecture ...... 77
5.2 Time slots of the CPUs ...... 84
5.3 WCET calculation of iaload ...... 84
5.4 Control flow graph of the simple loop ...... 86

6.1 Memory access arbitration of the fixed priority arbiter ...... 97
6.2 Memory access arbitration of the fair arbiter ...... 99
6.3 TDMA period consisting of three time slots ...... 100
6.4 WCET performance of the Lift benchmark ...... 105
6.5 Performance comparison of the Lift benchmark using different arbiters ...... 107
6.6 Performance comparison of the MMul benchmark using different arbiters ...... 108
6.7 Performance comparison of the ejip benchmark using different arbiters ...... 110

List of Tables

2.1 Task set ...... 23
2.2 WCET estimates given in clock cycles ...... 25
2.3 Task set for the WCET method ...... 25
2.4 Comparison of the response times of the task approach with blocked DMA (C1 and R1) and the WCET method with spread DMA access (C2 and R2) ...... 26
2.5 Comparison of the system performances in iterations/s ...... 27

3.1 Benchmark results in iterations/s for a single-core JOP at different clock frequencies ...... 43
3.2 Benchmark results in iterations/s of a dual JopCMP system at a clock frequency of 80 MHz ...... 44
3.3 Benchmark results in iterations/s of a tri-core JOP system at a clock frequency of 75 MHz ...... 45
3.4 Comparison of resource consumption between JOP and the JopCMP versions ...... 46

4.1 Execution time and memory bandwidth utilization of Lift, Altde2 @ 90 MHz ...... 64
4.2 Execution time and memory bandwidth utilization of MMul, Altde2 @ 90 MHz ...... 65
4.3 Execution time and memory bandwidth utilization of HTable, Altde2 @ 90 MHz ...... 65
4.4 Execution time and memory bandwidth utilization of Lift, Cycore @ 60 MHz ...... 67
4.5 Performance and size of JopCMP relative to picoJava in the same FPGA board ...... 68
4.6 Synthesis Results on the Cyclone II FPGA (EP2C35) ...... 69

5.1 Bytecodes accessing the shared memory ...... 81
5.2 Java bytecodes and basic blocks of the loop ...... 85
5.3 Analyzed WCET of the loop example depending on the system configuration ...... 87
5.4 Analyzed WCET and measured execution time of the loop example ...... 88
5.5 Analyzed WCET and measured execution time of the Lift benchmark ...... 90

6.1 Analyzed WCET and measured execution time of the Lift benchmark ...... 103
6.2 Performance comparison of different arbiter types using the Lift benchmark ...... 107
6.3 Performance comparison of different arbiter types using the MMul benchmark ...... 108
6.4 Performance comparison of different arbiter types using the ejip benchmark ...... 109
6.5 Comparison of the arbitration policies ...... 111

A.1 Java bytecodes of sample assignment ...... 124

B.1 Bytecode WCETs depending on CMP configuration in clock cycles. The time slot size is varied between 3 and 30 cycles ...... 127
B.1 Bytecode WCETs depending on CMP configuration in clock cycles. The time slot size is varied between 3 and 30 cycles ...... 128
B.2 Bytecode WCETs depending on CMP configuration in clock cycles. The time slot size is varied between 6 and 12 cycles ...... 128
B.3 Bytecode WCETs depending on a CMP system with larger memory access times ...... 129

1 Introduction

This thesis introduces a Time-Predictable Java Chip-Multiprocessor. It is a homogeneous chip-multiprocessor based on a Java processor core, an implementation of a Java virtual machine in hardware. It provides a high-performance platform for Java real-time applications. This chapter describes the motivation for this project. Furthermore, an overview of the project's major contributions to the field and an outline of the thesis are given.

1.1 Motivation

Today, embedded systems are omnipresent in modern society and play an important role in our lives. According to [2], it has been estimated that 99% of all processors aim at the embedded systems market. Turley [16] states that only about 2% of microprocessors are used for personal computers. Embedded devices include household appliances like dishwashers or coffee machines, and consumer electronic devices like MP3 players or digital cameras. Telecommunication applications like personal digital assistants and cell phones, or networking appliances like switches or routers, are further examples. The automotive industry is another key driver of the embedded system market. Today, every car contains between about 40 and 70 electronic control units [3, 10]. All these applications represent only a small selection of the fastest-growing segment of the computing market.


1.1.1 Real-Time Embedded Systems

Real-time embedded systems are embedded systems with timing requirements. An accurate definition of a real-time computer system can be found in [7] p. 2:

A real-time computer system is a computer system in which the correctness of the system behavior depends not only on the logical results of the computations, but also on the physical instant at which these results are produced.

A real-time computer system has to produce logically correct results within a specified period of time. If a correct computation result is late, it is considered useless. In a hard real-time system, a late computation may cause a critical system failure leading to disastrous consequences, possibly endangering human life. A soft real-time system can tolerate results that occasionally miss their deadlines. The system still produces correct results [2], but with an inferior quality of service (e.g. flickering of a user interface display).

Real-time embedded systems often have to handle concurrent activities, such as communication with various peripheral systems, computing values for a control loop, or responding to external events. A natural way to handle these concurrent jobs is to split them up into individual tasks. A task is classified as a hard real-time task if a missed deadline may result in a catastrophe [2]. Every hard real-time system contains at least one such hard real-time task. Typical examples are embedded devices used in industries like automotive and rail traffic, medicine, avionics, industrial automation, or nuclear power plants.

Many embedded systems are used for applications that prioritize real-time behavior over processing power. Such real-time systems must undergo a timing analysis. Therefore, the worst-case execution time (WCET) of each application task in the system has to be known. Only if these upper bounds can be calculated can the task set be analyzed for schedulability on a given processor.

1.1.2 Worst-Case Execution Time

The WCET is the maximum amount of time a task may need to execute under worst-case conditions on a given processor. In [17], Wilhelm et al. define the goal of WCET analysis with respect to the upper bounds of execution time as follows:

1. they have to be safe, and
2. should be as tight as possible.

Figure 1.1: Example of safe upper execution time bounds. (The figure sketches the distribution of possible and measured execution times of a program between BCET and WCET, together with an unsafe WCET bound below the WCET and a safe WCET bound above it.)

The calculated upper time bounds have to be safe in order to ensure hard real-time behavior; otherwise, unpredictable system reactions could put the mission at risk, leading to serious consequences. Moreover, the upper bounds should be as tight as possible to keep the overestimation low and thus conserve resources.

Figure 1.1 shows the variable execution times of a sample program. The possible execution times range from the best-case execution time (BCET) to the WCET. Additionally, measured execution times are shown. The goal of WCET analysis is to find the WCET itself, or an upper bound that is safe and as tight as possible. Unsafe WCET bounds are estimates smaller than the WCET. All upper bounds larger than the WCET are safe; the larger they are, the higher the overestimation of the WCET.

There are three different methods to estimate the WCET of a given task: measurement, static analysis, or a hybrid approach combining both.

A measurement-based WCET analysis gauges the execution time of a program using various input data. The estimates are easy to obtain because the analysis is performed on the actual hardware. It is therefore especially useful if average-case performance is of interest. A major drawback of the measurement-based method is that the measured WCET result does not reliably confirm that the worst-case program path has been triggered [4], as shown in Figure 1.1.

The objective of a static WCET analysis is to find the maximum execution path and the WCET of a program. It provides a safe upper bound by analyzing the program before runtime, independent of any input values. Even though this method requires the elaborate creation of a precise processor model, it is the only way to obtain a validated upper bound of the application code. This analysis method is therefore especially suitable for safety-critical systems.

A hybrid WCET analysis approach starts with a static analysis of the program. The code is split into partitions, and execution times of these code fragments are derived by measurement on real hardware. Finally, these execution times are fed into the static analysis model, which calculates the WCET result. Unlike static analysis, no processor model is needed, but safe WCET bounds cannot be guaranteed.

In summary, measurement-based and hybrid analysis can be sufficient for soft real-time systems, but the author believes that static analysis should become the conventional approach for modern hard real-time systems.

1.1.3 Chip-Multiprocessors

Modern applications demand ever-increasing processing power and act as the main drivers of the semiconductor industry. For over 35 years, transistors have been getting faster, and clock frequencies have risen accordingly. Additionally, the number of transistors on an integrated circuit at a given cost doubles every 24 months, as described by Moore's Law [9]. The availability of more transistors facilitated the instruction-level parallelism (ILP) approach, which was the primary processor design objective between the mid-1980s and the start of the 21st century. According to [6], processor designers are now reaching the limits of exploiting ILP efficiently. Unfortunately, semiconductor technology has also run up against fundamental physical limits in recent years. As a result, the clock frequency, which used to increase exponentially, has leveled off [8].

According to [6], chip-multiprocessors (CMP) are the future of performance enhancement. CMP technology integrates two or more processing units and a sophisticated communication network into a single integrated circuit. A major advantage of this approach is that progress in processing power is not accompanied by an increase in the hardware complexity of the individual processors. Consequently, several processing units coexist on an integrated circuit, utilizing billions of transistors efficiently. According to [18], CMPs address the key demands of embedded systems: increased performance, lower power consumption, and cost efficiency.

1.1.4 Java Technology

Traditionally, real-time applications have been designed using assembly languages. Due to hardware advancements and increasing program complexity in embedded system design, a higher level of abstraction was needed.


Subsequently, the C programming language was introduced. More recently, embedded systems have been designed using the C++ programming language, because of its object-oriented features and several enhancements over C. The increasing size and complexity of today's applications place ever higher demands on the design, and the cost of debugging and maintaining the code increases continuously. In summary, productivity suffers from low-level programming languages.

Java [5] has proven its success in the field of web, desktop, and mobile applications. This high-level, object-oriented language has its advantages in the ease of program reuse, robustness, security, and portability. Java does not suffer from the pitfalls of the C-based programming languages, e.g. the manual allocation and deallocation of memory or the use of pointers. Originally, the Java platform was not intended for use in real-time systems. The Real-Time Specification for Java (RTSJ) [1], submitted in 1998 and approved in 2002 by the Java Community Process, defines how real-time behavior can be achieved within the Java programming language. An update proposal (JSR-282) improves the RTSJ and includes new enhancements. A further specification for safety-critical Java (JSR-302) is expected soon. It will guarantee that Java-based applications can be certified under the safety-critical development standard in aviation, DO-178B level A [11], and other safety-critical software standards.

A project called Java environment for parallel real-time development (JEOPARD) [14] exemplifies the large interest in real-time Java for multicore systems. The consortium of this European Commission funded project, consisting of ten academic and industrial partners, aims at the development of an independent software interface for Java real-time multiprocessor systems. This project uses the processor architecture proposed here for the CMP platform evaluation.

The advantages of Java over low-level programming languages can be outlined from two points of view. From the manager's point of view, Java shortens the system's time to market because of the increase in programmer productivity. Additionally, Java developers are a dime a dozen compared to experts developing systems of microprocessors running real-time operating systems, a combination that requires great expertise in low-level embedded system programming. From the developer's point of view, Java is a simple and easy-to-use programming language compared to C or C++. A large advantage is automatic memory management with garbage collection, as manual memory management often introduces failures in C-style programs. Large applications are much easier to develop, maintain, and debug. Furthermore, Java's object-oriented programming model favors reusability.

The Java optimized processor (JOP) [12, 13] is an implementation of the Java Virtual Machine (JVM) in hardware. JOP translates Java bytecodes into its own instruction set called microcode. These microcode instructions, implemented in hardware, are executed by the stack architecture. This processor has been designed from scratch to provide a time-predictable execution environment for embedded real-time systems.

Figure 1.2: Shared memory multiprocessor. (Multiple CPUs connected via a system bus to a common main memory.)

1.2 Problem Definition and Objectives

According to [6], two tightly linked major multiprocessor system models exist: the shared memory model and the distributed shared memory model. The core of the shared memory model is a global physical memory accessible to all processors. Multiple CPUs are connected to the memory via a system bus. This architecture enables simple data communication by accessing shared data objects in the common memory. A multiprocessor based on this model is called a symmetric (shared-memory) multiprocessor (SMP), because all processors have symmetric access to the shared memory (see Figure 1.2). It offers uniform memory access times for all CPUs.

In contrast, the distributed shared memory model implements a physically distributed memory system. It consists of multiple independent processing nodes with local memory modules, connected by an interconnection network. Each CPU can access its own local memory very quickly, but accessing a memory word located at a distant processing node takes much longer. This model is therefore called a non-uniform memory access (NUMA) architecture and is less appropriate for a time-predictable CMP. This thesis investigates SMP architectures.

A fundamental problem in parallel SMP computing is the access of multiple CPUs to the shared memory. A memory arbiter is needed to resolve concurrent accesses. For use in a time-predictable system, the realization of this memory arbitration mechanism presents two closely related challenges:


• Synchronization of memory access
• Timing analysis of memory access

The arbiter controls the access of multiple CPUs to the shared memory. Naturally, if one CPU is accessing the memory, no other CPU can access it simultaneously; the others are forced to wait until the currently accessing CPU has completed its memory transfer. The memory arbiter resolves such access conflicts by serializing the read and write operations of the CPUs. The access order is determined by the implemented arbitration algorithm.

In uniprocessor systems, only one processor accesses the memory, and the WCET of a memory access can be predicted if the memory access latency is predetermined by the memory technology used. However, tasks running on different CPUs of a CMP influence each other's execution times when accessing a shared resource [15], e.g. a shared memory. Therefore, the interdependencies between task execution times have to be removed. A well-designed arbitration algorithm is needed to bound the WCET of a task running on one CPU, even though tasks executing on other CPUs may also access the main memory. Only then is an analysis of WCET bounds possible.

The objectives of this thesis can be summarized as follows:

• Design and implementation of a homogeneous chip-multiprocessor with global shared memory
  – based on Java technology
  – with time-predictable execution times, provided the memory has predictable latencies
  – verified by a prototype implementation in field-programmable gate array (FPGA) technology
• Adaptation of a WCET analysis tool tailored to the CMP
• Comparison and evaluation of different CMP configurations with respect to
  – real-time performance (WCET)
  – average-case performance (average-case execution time, ACET)

1.3 Contributions

This thesis presents a novel Java chip-multiprocessor together with a WCET analysis method that makes timing analysis possible for homogeneous multiprocessor systems with a shared memory. The above-mentioned challenges and objectives are addressed in this thesis. The major contributions are as follows:

• Time-Predictable Multiprocessor Design
The proposed Java chip-multiprocessor is designed for maximum time predictability, where simple and accurate WCET analysis is more important than good average-case performance. Additionally, tight WCET bounds have been a design goal. The CMP architecture is based on multiple time-predictable Java processors and a shared memory. The global physical memory, accessible to all processors, stores all instructions and data. A memory arbiter controls the memory access of multiple CPUs. Synchronization guarantees coordinated access to shared objects among several processors. All components are interconnected with a System-on-Chip (SoC) bus. This CMP operates without caching shared data objects; therefore, an identical view of the data is ensured for all CPUs throughout the execution of an application. Consequently, hardware-demanding cache coherence mechanisms can be avoided. The proposed multiprocessor is tailored to exploit the multi-threaded nature of real-time applications.

• Memory Arbitration
A memory arbiter is responsible for controlling the access of multiple CPUs to the shared memory. Naturally, if one CPU is accessing the memory, other CPUs must not do so at the same time; they are forced to wait until the currently accessing CPU has completed its memory transfer. The memory arbiter resolves access conflicts by serializing the read and write operations of the different CPUs.

In multiprocessor systems, tasks running on different CPUs influence each other's execution times when accessing a shared memory. Therefore, an arbitration algorithm is necessary that can bound the WCET of a task running on one CPU, even though tasks executing on other CPUs may also access the main memory. A time-sliced arbiter that divides the memory bandwidth among the CPUs avoids interference between task execution times, as illustrated by the sketch below. Consequently, an analysis of tight WCETs becomes possible.
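To make the time-sliced principle concrete, the following minimal software model illustrates how a TDMA arbiter assigns the memory and why the resulting worst-case wait is independent of the other CPUs' behavior. This is an illustrative sketch only, not the VHDL arbiter implemented in this thesis; the class name and parameters are hypothetical.

    /**
     * Minimal software model of a time-sliced (TDMA) memory arbiter:
     * the memory bandwidth is divided into fixed time slots, one per CPU.
     * Illustrative sketch only; the actual arbiter is implemented in hardware.
     */
    public class TdmaArbiterModel {

        private final int cpuCount;  // number of CPUs sharing the memory
        private final int slotSize;  // slot length in clock cycles (hypothetical)

        public TdmaArbiterModel(int cpuCount, int slotSize) {
            this.cpuCount = cpuCount;
            this.slotSize = slotSize;
        }

        /** Index of the CPU that owns the memory in the given clock cycle. */
        public int owner(long cycle) {
            return (int) ((cycle / slotSize) % cpuCount);
        }

        /**
         * A simple worst-case bound on the cycles a CPU may have to wait
         * before an access needing accessCycles (<= slotSize) can start:
         * in the worst case the CPU just missed the last point in its own
         * slot where the access would still fit, and must wait for all
         * other CPUs' slots. The bound depends only on the arbiter
         * parameters, not on what the other CPUs do, which is what makes
         * the scheme analyzable.
         */
        public int worstCaseWait(int accessCycles) {
            return (accessCycles - 1) + (cpuCount - 1) * slotSize;
        }
    }

For example, with three CPUs and a hypothetical slot size of 6 cycles, an access needing 3 cycles is delayed by at most 2 + 2 · 6 = 14 cycles, regardless of the memory traffic generated by the other cores.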

• Prototype Implementation
A prototype implementation enables validation of the CMP architecture. Two hardware platforms, using two different FPGA technologies and offering different memory bandwidths, have been used for evaluation purposes. An integration of up to 8 cores could be verified using several different benchmarks.


One of this paper’s major contributions is the implementation of three different arbitration policies: a fixed priority, a fair-based, and a time- sliced arbiter. They provide a basis for the comparison of arbitration policies with respect to WCET and average-case performance. Further- more, several system components had to be designed and implemented, e.g. a synchronization process, and a CMP boot-up sequence.

• Static WCET Analysis
Static WCET analysis finds the worst-case execution time of a given program for a specific processor model; overestimation should be kept to a minimum. In a CMP system, tasks running on different CPUs shall not influence each other's execution times when accessing the shared memory. The WCET analysis is therefore primarily dependent on the memory arbiter. In this thesis, a static timing analysis approach for each arbitration policy is presented. Some of the policies turn out not to be viable for hard real-time systems because of their unacceptable WCET results. Only timing analysis using a time-sliced arbiter leads to realistic and viable WCET bounds.

One contribution of this thesis is the enhancement of JOP's WCET analysis tool for use with multiprocessor systems. The tool can be configured for the analysis of different hardware platforms and system configurations.

• Performance Comparison
An implementation of a soft multiprocessor core in an FPGA holds several advantages: rapid prototyping, high configurability, and simple verification of individual components. Different CMP configurations are evaluated by varying the number of processors, their instruction cache sizes, the memory bandwidth, and the arbitration policies. In our experiments, their average-case performance is compared by running different benchmarks on real hardware. Furthermore, the CMP versions are compared to a complex Java processor.

1.4 Thesis Outline

The next four chapters of the thesis present the published results of peer-reviewed papers presented at international conferences. These papers appear as individual chapters. Each paper consists of an abstract, an introduction, a related work section, and the main research findings. Each article was written as an individual publication; therefore, a few statements may appear repetitive.

9 Chapter 1

Time Predictable CPU and DMA Shared Memory Access

Christof Pitter and Martin Schoeberl. Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2007). Amsterdam, Netherlands, August 2007, pages 317-322.

Chapter 2 describes the starting point of my research. This paper evaluates two different timing analysis approaches for a system consisting of one CPU and a direct memory access (DMA) controller. A first implementation of a fixed priority arbiter is responsible for controlling the memory access of both processing units. The DMA controller can be modeled as a hardware-based real-time task which accesses the memory with a regular pattern. Therefore, its WCET is known and the task can easily be integrated into the schedulability analysis. A novel WCET approach describes how each memory access (an occurring read or write access) of the DMA task can be included in the WCET of the application task. Consequently, the application task's WCET increases, but the DMA task can be omitted from the schedulability analysis. Experiments showed that this new approach leads to tighter response times and saves processing resources for the CPU. In summary, this paper shows that it is possible to analyze the timing behavior of a system in which a CPU and a hardware task with a known access pattern both access a shared memory.

Towards a Java Multiprocessor

Christof Pitter and Martin Schoeberl. Proceedings of the 5th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2007). Vienna, Austria, 2007, pages 144-151.

Chapter 3 introduces a novel Java multiprocessor architecture for embedded systems. This paper explains why a shared memory model is preferred to a distributed shared memory model for time-predictable multiprocessors. The proposed CMP design consists of a number of Java optimized processor (JOP) cores. Based on the work described in Chapter 2, several improvements to the fixed priority arbiter provide a solution for simultaneous access of multiple CPUs to the shared main memory. Furthermore, the synchronization of shared data objects is examined. Another interesting aspect of a CMP system, the startup or boot-up, is described in detail. The first FPGA prototype using multiple JOP cores verifies the correct concurrent execution of application tasks. Finally, CMP versions made up of two and three JOPs enable a performance comparison between a single JOP and the CMP versions by running real applications on hardware. The resulting speed-ups encouraged further investigation into the proposed CMP architecture.


Performance Evaluation of a Java Chip-Multiprocessor

Christof Pitter and Martin Schoeberl. Proceedings of the International Symposium on Industrial Embedded Systems (SIES 2008). Montpellier, France, June 2008, pages 34-42.

Chapter 4 evaluates the Java CMP system with respect to average-case performance. Different hardware configurations with varying instruction cache sizes, numbers of processors, and memory bandwidths are compared. An implementation of a fair-based arbiter guarantees fair memory access among the CPUs. Experiments measure the performance by running three benchmarks on two different FPGA platforms: an embedded industry application, a computationally intensive matrix multiplication, and a synthetic benchmark that continuously accesses a shared data structure. Compared to Chapter 3, one of the prototype boards allows for an integration of up to eight CPUs. Performance results show that a multiprocessor version of a simpler and smaller architecture is more efficient (performance/die area) for parallel workloads than a complex Java processor.

Time-Predictable Memory Arbitration for a Java Chip-Multiprocessor

Christof Pitter. Proceedings of the 6th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2008). Santa Clara, California, 2008, pages 115-122.

Based on the findings of Chapters 2 and 3, this paper proposes a real-time CMP system that allows WCET analysis of tasks running on a homogeneous Java CMP. The core of this CMP is a time-sliced arbiter that divides the memory access bandwidth into equal time slots, one for each CPU. Consequently, the WCETs of Java bytecodes can be analyzed depending on the size of the time slot, the number of CPUs in the system, and the memory access time. A WCET analysis tool adapted for use with a CMP system utilizes these results and generates temporal upper bounds for application tasks. A real-world application task is used to compare analyzed results with measured execution times.

Further Analysis and Evaluation

Chapter 6 is based on a submitted paper called A Real-Time Java Chip-Multiprocessor. This chapter compares and evaluates CMPs using the implemented arbitration policies (fixed priority, fair-based, and time-sliced) with respect to their real-time and average-case performance.


Various CMP configurations are evaluated using a larger application base than in the previously presented chapters.

Conclusion and Outlook

Chapter 7 concludes this thesis by giving a systematic summary of the research process. It recapitulates the main findings of the conducted experiments. Additionally, it provides some further ideas for future research.

Bibliography

[1] Greg Bollella, James Gosling, Benjamin Brosgol, Peter Dibble, Steve Furr, and Mark Turnbull. The Real-Time Specification for Java. Java Series. Addison-Wesley, June 2000.

[2] Alan Burns and Andrew J. Wellings. Real-time systems and programming languages: Ada 95, real-time Java, and real-time POSIX. International computer science series. Addison-Wesley, third edition, 2001. Revised edition of Real-time systems and their programming languages, 1990.

[3] Axel Deicke. The electrical/electronic diagnostic concept of the new 7 series. In Proceedings of Convergence International Congress & Exposition On Transportation Electronics, Detroit, USA, October 2002.

[4] Andreas Ermedahl and Jakob Engblom. Execution time analysis for embedded real-time systems. In Insup Lee, Joseph Y-T. Leung, and Sang H. Son, editors, Handbook of Real-Time and Embedded Systems, pages 35.1–35.17. Chapman & Hall/CRC - Taylor and Francis Group, August 2007.

[5] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. The Java Series. Addison-Wesley, 1997.

[6] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers, 2006.

[7] H. Kopetz. Real-time systems: design principles for distributed embedded applications. Kluwer Academic Publishers, 1997.

[8] James Laudon and Lawrence Spracklen. The coming wave of multithreaded chip multiprocessors. International Journal of Parallel Programming, 35(3):299–330, June 2007.

[9] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):114–117, 1965.

[10] Roman Obermaisser, Philipp Peti, and Fulvio Tagliabo. An integrated architecture for future car generations. Real-Time Systems, 36:101–133, July 2007.

[11] RTCA. Software considerations in airborne systems and equipment certification. Guideline DO-178A, Radio Technical Commission for Aeronautics, One McPherson Square, 1425 K Street N.W., Suite 500, Washington DC 20005, USA, March 1985.

[12] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[13] Martin Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, 54/1–2:265–286, 2008.

[14] Fridtjof Siebert. Jeopard: Java environment for parallel real-time development. In JTRES '08: Proceedings of the 6th international workshop on Java technologies for real-time and embedded systems, pages 87–93, New York, NY, USA, 2008. ACM.

[15] Lothar Thiele and Reinhard Wilhelm. Design for timing predictability. Real-Time Systems, 28(2-3):157–177, 2004.

[16] James L. Turley. The Essential Guide to Semiconductors. Prentice Hall PTR, 2003.

[17] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David B. Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter P. Puschner, Jan Staschulat, and Per Stenström. The worst-case execution-time problem - overview of methods and survey of tools. ACM Trans. Embedded Comput. Syst., 7(3):1–53, 2008.

[18] Wayne Wolf. High-Performance Embedded Computing: Architectures, Applications, and Methodologies. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.

2 Time Predictable CPU and DMA Shared Memory Access

Christof Pitter and Martin Schoeberl
Institute of Computer Engineering, Vienna University of Technology, Austria

Proceedings of the International Conference on Field Programmable Logic and Applications (FPL 2007). Amsterdam, Netherlands, August 2007, pages 317-322.

Abstract

In this paper, we propose a first step towards a time predictable computer architecture for single-chip multiprocessing (CMP). CMP is the current trend in server and desktop systems, and it is even being considered for embedded real-time systems, where worst-case execution time (WCET) estimates are of primary importance. We attack the problem of WCET analysis for several processing units accessing a shared resource (the main memory) with support from the hardware. In this paper, we combine a time predictable Java processor and a direct memory access (DMA) unit with a regular access pattern (a VGA controller). We analyze and evaluate different arbitration schemes with respect to schedulability analysis and WCET analysis. We also implement the various combinations in an FPGA. An FPGA is the ideal platform to verify the different concepts and evaluate the results by running applications with an industrial background on real hardware.

2.1 Introduction

This paper presents a hard real-time system consisting of a hard real-time application running on the time predictable Java Optimized Processor (JOP) [10] and an additional direct memory access (DMA) unit with a regular access pattern. This unit is represented by a video graphics array (VGA) controller. Both the CPU and the DMA unit share the main memory of the system. Meeting the deadlines of the tasks of the real-time application is of utmost importance. Therefore, the task set of the system requires a timing validation by schedulability analysis.

A real-time computer system has to produce logically correct results within a specified period of time. If a correct result of a computation is late, the result is considered useless. Such real-time systems (RTS) or safety-critical systems have to handle concurrent activities, such as communication, calculating values for a control loop, user interfaces, and supervision in embedded systems. A natural way to handle these concurrent jobs is to split them up into individual tasks. Every hard real-time system contains at least one so-called hard real-time task. A task is classified as a hard real-time task when a missed deadline may cause a critical failure of the system. Non-safety-critical tasks in such a system are soft real-time tasks. If a deadline is missed, the system will still produce correct results [4], but with degraded service.

Safety-critical systems must be predictable in the time domain. It is of utmost importance to be able to analyze the maximum execution time, or WCET, of each task. If and only if these upper bounds can be calculated can the schedulability analysis be performed. The bounds are necessary to test whether a task set can be scheduled on the target system or not [8].

There exist two possibilities for modeling and implementing the RTS consisting of the application running on the CPU and the DMA controller sharing the main memory:

• DMA as soft real-time task

• DMA as hard real-time task

In the first approach, the task of the DMA controller is represented as a soft real-time task. As a consequence, some interference (e.g. flickering on the VGA display) may occur when the deadline of the DMA task is violated. Hence, the DMA task performs in a best-effort manner and can be excluded from schedulability tests. Increasing the buffer size in the DMA controller can help to support smooth communication between the shared memory and the DMA controller. Nevertheless, this kind of system is of minor importance here because hard RTSs are the target of this paper.

In the second approach, the DMA task as well as the hard real-time application running on the CPU has to meet its deadlines. As a consequence, the system contains the application's hard real-time tasks and a memory-streaming hard real-time task that are executed simultaneously. Although a VGA display is usually not considered a hard real-time task, it serves as a good example of a task demanding a constant amount of data within a known period of time. In the future, the VGA controller will be replaced by another CPU.

The rest of the paper considers the second approach (hard RTS) and is structured as follows. Section 2.2 presents the related work. In Section 2.3, we explain the two basic options for analyzing the behavior of the RTS and describe how schedulability tests are carried out. Section 2.4 describes the JopVga system and the worst-case analysis results of several experiments, and discusses the outcome of the paper. Finally, Section 2.5 concludes the paper and gives guidelines for future work.

2.2 Related Work

In [2], Atanassov and Puschner describe the impact of dynamic RAM refresh on the execution time of real-time tasks. The use of DRAM in RTSs involves a major drawback because the memory cells have to be refreshed periodically. During a refresh, no memory request can be processed; consequently, the request is delayed and the execution time of the task increases.

Many researchers in the real-time community have turned their attention to the timing-analysis tools described by Puschner and Burns [8]. The problem is that the available results for uniprocessors are not applicable to modern processor architectures [8, 12]. Multiprocessor systems consisting of shared memories and busses are hard to predict. In addition, the WCET of each individual task depends on the global system schedule.

Even though much research has been done on multiprocessors, the timing analysis of such systems has been neglected. One example is the scalable, homogeneous multiprocessor system by Gaisler Research AB. It consists of a centralized shared memory and up to four LEON processor cores that are based on the SPARC V8 architecture [5]. This embedded system is made available as a synthesizable VHDL model and therefore is well suited for SoC designs. LEON is used in European space projects as well as in military and demanding consumer applications. Nevertheless, no literature concerning WCET analysis of this multiprocessor has been found.

Another example is the ARM11 MPCore [1]. It introduces a pre-integrated symmetric multiprocessor consisting of up to four ARM11 microarchitecture processors. The 8-stage pipeline architecture, independent data and instruction caches, and a memory management unit for the shared memory make a timing analysis difficult.

We believe that the impact of sharing a main memory on the WCET of real-time tasks has not yet received enough coverage in the literature. This paper is a first step towards a time-predictable multiprocessor. Providing WCET guarantees and reliable schedules for a multiprocessor system is a great challenge.

2.3 CPU/DMA shared memory access

In Section 2.1, we presented two possibilities for integrating the DMA task into the system: either the DMA controller represents a soft real-time task, or it is considered a hard real-time task. This paper addresses the second solution.

2.3.1 Implementation of the DMA hardware controller

The DMA controller accesses the shared memory to read or write data autonomously of the CPU. The volume of the data transfer depends on the I/O device; the device's application thus defines the number of memory requests within a fixed period of time. Two different options for the memory access scheme of the DMA controller are evaluated (see Figure 2.1):

1. The DMA controller accesses the shared memory in a blocked scheme. This approach is used for a fast copy of large blocks of the main memory to another device. Assume this task has the smallest period of all tasks within the system; hence, using fixed-priority scheduling [7], it has the highest priority. After the DMA task is completed, the other tasks on the CPU get permission to access the shared memory depending on their priority. This approach is a good representation of a multimedia task, such as streaming data, performed via DMA.

2. The DMA controller accesses the memory in a timely spread scheme, which denotes the other extreme: the memory requests are spread evenly over the original period, resulting in a much smaller per-request period. This task must hold the highest priority; otherwise, the system will not function correctly, because the DMA task will starve when the CPU's software tasks make extensive memory requests.

Figure 2.1: The blocked and the spread memory access scheme of the DMA task.
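The parameters of the spread scheme follow from simple arithmetic over the original period. The following sketch anticipates the VGA numbers used later in Section 2.4.2 as a worked example; the variable names are ours, not part of the implementation.

    // Spread access scheme: distribute the DMA requests evenly over the
    // original period (numbers anticipate the VGA example of Section 2.4).
    long periodCycles      = 1360;  // one 17 us VGA line at 12.5 ns per cycle
    long requestsPerPeriod = 128;   // memory requests (words) per line
    long spreadPeriod      = periodCycles / requestsPerPeriod;  // 10 cycles
    long accessCycles      = 3;     // one VGA memory request takes 3 cycles
    long unusedCycles      = periodCycles - spreadPeriod * requestsPerPeriod; // 80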

2.3.2 Task vs. WCET based Analysis

There exist two possibilities to analyze the timing behavior of the RTS:

• DMA access represents an additional real-time task

• DMA access is included in the WCET analysis of each individual ap- plication task

The first method considers the DMA task in the schedulability analysis. Hence, all the tasks of the application running on the CPU and the DMA task have to be considered. This simple task set consists of independent periodic tasks with fixed priorities, where the DMA controller must have the highest priority. The resulting task set can be used in schedulability tests to analyze the timing behavior of the application running on the CPU.

The second approach models the RTS in a different way. The DMA controller and the CPU are both accessing the shared memory, and we are interested in the WCET of the application despite the memory traffic of the DMA unit. Therefore, the blocking delay caused by each possible read or write access of the DMA controller has to be added to the WCET estimates of the real-time tasks running on the CPU. The results are bounded WCET estimates for each individual task. As a consequence, the WCET values of the tasks increase, but the DMA task can be omitted from the schedulability analysis.
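A back-of-the-envelope sketch of this accounting, under the simplifying assumption that every memory access of the application task can be blocked by at most one DMA transfer; the method name and parameters are ours for illustration, while the paper derives the exact per-access delay from the arbiter implementation.

    /**
     * Sketch of the WCET-based method: the blocking delay that the DMA unit
     * can impose on each memory access is folded into the task's WCET, so
     * the DMA task can be dropped from the schedulability analysis.
     * Simplified model; parameter names are illustrative.
     */
    static long wcetWithDmaBlocking(long wcetCycles,       // analyzed WCET without DMA
                                    long memoryAccesses,   // accesses on the WCET path
                                    long dmaBlockCycles) { // max delay per access
        // Worst case: every memory access waits for one full DMA transfer.
        return wcetCycles + memoryAccesses * dmaBlockCycles;
    }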


2.3.3 Schedulability Analysis

The major goal of this paper is the analysis of the timing behavior of the RTS depending on the different memory access options of the DMA controller. Assume that the CPU of the system runs several real-time tasks that access the shared memory, and that the DMA controller additionally requests data from the main memory with a regular access pattern. The system therefore consists of the DMA task and the tasks of the real-time application running on the CPU. Using fixed-priority scheduling [7], the priorities of the tasks are ordered rate-monotonically: the smaller the period, the higher the priority of the task. In order to ensure that all tasks can be completed within their deadlines, schedulability tests are carried out.

Utilization-based schedulability test

In [7] it has been shown that a simple schedulability test can be carried out by taking the utilization of the individual tasks into account. The utilization of a task is its computation time divided by its period. If Equation 2.1 holds, then all tasks will meet their deadlines; otherwise, the task set may or may not fail at run-time. $C_i$ denotes the computation time of task $\tau_i$, $T_i$ is the period of task $\tau_i$, and $N$ stands for the number of tasks to schedule.

$$\sum_{i=1}^{N} \frac{C_i}{T_i} \le N \left( 2^{1/N} - 1 \right) \qquad (2.1)$$

If the task set fails this test, the utilization-based schedulability test cannot guarantee that all tasks will meet their deadlines. Nevertheless, the task set may still not fail at run-time.
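Equation 2.1 translates directly into code. The following Java sketch is for illustration only; the arrays are a hypothetical task-set representation, not part of the tool chain used in this paper.

    /** Utilization-based schedulability test of Equation 2.1 (Liu and Layland). */
    public final class UtilizationTest {

        /**
         * @param c computation times C_i
         * @param t periods T_i (same length and order as c)
         * @return true if the task set is guaranteed schedulable under
         *         rate-monotonic scheduling; false means it may or may not fail.
         */
        public static boolean passes(double[] c, double[] t) {
            int n = c.length;
            double u = 0.0;
            for (int i = 0; i < n; i++) {
                u += c[i] / t[i];                           // sum of C_i / T_i
            }
            return u <= n * (Math.pow(2.0, 1.0 / n) - 1.0); // N (2^{1/N} - 1)
        }
    }

For the task set of Table 2.1 later in this chapter, passes(new double[]{4.8, 162.7, 782.2}, new double[]{17, 500, 3000}) returns false: the overall utilization of 86.85% exceeds the three-task bound of 78.00% (see Section 2.4.2).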

Response time analysis

A more exact schedulability test by Joseph and Pandya is presented in [6]. The result of this response time analysis for a set of independent tasks provides a necessary and sufficient condition: if the test is positive, the task set will be schedulable at run-time; if it fails, the task set is not schedulable. The worst-case response time $R_i$ of each individual task is calculated and then compared with the task's deadline or period, respectively. The equation for the response time is:

$$R_i = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j \qquad (2.2)$$


The expression $hp(i)$ denotes the set of all tasks with a higher priority than task $\tau_i$. The smallest $R_i$ that solves Equation 2.2 is the worst-case response time of $\tau_i$. A recurrence relation can be formed that allows the calculation of the response time [3]:

$$w_i^{n+1} = C_i + \sum_{j \in hp(i)} \left\lceil \frac{w_i^n}{T_j} \right\rceil C_j \qquad (2.3)$$

The solution is found when $w_i^{n+1} = w_i^n$; then $w_i^n$ represents $R_i$. If a task $i$ has a larger response time than its deadline (or period $T_i$), the task set cannot be scheduled.
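The recurrence of Equation 2.3 is a fixed-point iteration that is easy to code. The sketch below assumes tasks are indexed by descending priority and that deadlines equal periods; it is illustrative only, not the analysis tool used in this paper.

    /** Response time analysis of Equations 2.2 and 2.3 (Joseph and Pandya). */
    public final class ResponseTimeAnalysis {

        /**
         * Worst-case response time R_i of task i, with tasks ordered by
         * descending priority (index 0 = highest priority).
         *
         * @param c computation times C
         * @param t periods T (deadlines are assumed equal to periods)
         * @param i index of the task under analysis
         * @return R_i, or -1 if the iteration exceeds T_i (not schedulable)
         */
        public static double responseTime(double[] c, double[] t, int i) {
            double w = c[i];                        // w_i^0 = C_i
            while (true) {
                double next = c[i];
                for (int j = 0; j < i; j++) {       // hp(i): higher-priority tasks
                    next += Math.ceil(w / t[j]) * c[j];
                }
                if (next == w) {
                    return w;                       // fixed point reached: R_i
                }
                if (next > t[i]) {
                    return -1;                      // response time exceeds period
                }
                w = next;
            }
        }
    }

With the task set of Table 2.1 in Section 2.4, responseTime(new double[]{4.8, 162.7}, new double[]{17, 500}, 1) converges to the 229.9 µs reported for the Lift task in Section 2.4.2.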

2.4 Evaluation

This section provides an overview of the JopVga system and the corresponding sample application. Furthermore, the timing behavior of the RTS is analyzed, and the results of the experiments and calculations are compared and classified.

2.4.1 JopVga System

The JopVga system is the hardware used for our experiments. It consists of the time-predictable processor JOP [10], a VGA controller, an arbiter, a memory interface, and an SRAM memory. JOP, the VGA controller, the memory arbiter, and the memory interface are implemented on a Cyclone FPGA. As illustrated in Figure 2.2, the external memory is connected to the memory interface. This 1 MByte, 32-bit external SRAM device represents the shared memory of the JopVga system. A SoC bus called SimpCon [11] connects JOP and the VGA controller with the arbiter, and the arbiter is connected via SimpCon to the memory interface.

Figure 2.2: JopVga system.

The arbiter is responsible for setting up the communication between the shared memory and the VGA controller and JOP, respectively. It schedules the memory accesses of both masters. The shared main memory of the system is divided into two segments: 640 KByte are dedicated to JOP and the remaining 384 KByte are used as frame buffer. Both JOP and the VGA controller run at a clock frequency of 80 MHz, resulting in a cycle time of 12.5 ns. The SRAM-based shared memory has an access time of 15 ns per 32-bit word; hence, every memory access needs at least 2 cycles. At the moment, neither the arbiter nor the VGA controller is capable of using the pipelined memory access introduced by the SimpCon interface [11]. Each memory request of JOP takes 4 cycles and every VGA request takes 3 cycles. The bandwidth of the memory ($BW_{mem}$) calculates to:

$$BW_{mem} = 4\,\text{Byte} / 25\,\text{ns} = 160\,\text{MByte/s} \qquad (2.4)$$

The VGA controller uses a resolution of 1024 × 768 pixels with 4 bits per pixel. As a consequence, the memory used for the VGA display amounts to 384 KByte (1024 · 768 · 0.5 Byte). Each line of 1024 pixels thus occupies 512 bytes, i.e. 128 32-bit words, so 128 memory accesses are needed per line. The horizontal frequency is 60 kHz, which results in a horizontal period of about 17 µs per line and 17 ns per pixel. The vertical frequency is 75 Hz. Therefore, the bandwidth used by the VGA controller calculates to:

$$BW_{vga} = (128 \cdot 4\,\text{Byte}) / 17\,\mu\text{s} = 30.12\,\text{MByte/s} \qquad (2.5)$$

Dividing $BW_{vga}$ by $BW_{mem}$ shows that the VGA controller uses 18.83% of the memory bandwidth. JOP has a bandwidth of 80 MByte/s (one 4-byte word per 4-cycle access), which corresponds to 50% of the available memory bandwidth.

Application

In order to estimate the worst-case execution time of a RTS all real-time tasks have to be taken into account. In the following the task set illustrated in Table 2.1 represents the periods and the computation times of the system under test. It consists of the VGA task τvga and two tasks running on JOP τlift and τkfl. The priority of 3 depicts the highest prior task. This simple task set illustrates an RTS that is further analyzed using schedulability tests. The VGA task can be modeled as a periodic task with the highest priority and a fixed runtime. The computation time Cvga is calculated by multiplying 128 memory accesses times 3 cycles. Using a clock frequency of 80 MHz results in 4.8 µs. Tvga is predetermined by the horizontal period of 17 µs. Two real-world examples with industrial background represent the two tasks running on JOP. Lift is a lift controller used in an automation factory. Kfl is one node of a distributed RTS to tilt the line over a train for easier loading


Table 2.1: Task set.

τ       T (µs)   C (µs)   Priority
τvga        17      4.8          3
τlift      500    162.7          2
τkfl      3000    782.2          1

Both applications consist of a main loop that is executed periodically. In our experiments we use both applications to represent two independent real-time tasks. The WCETs of these two tasks are obtained from the WCET analysis tool [12]. The second column of Table 2.2 gives the WCET estimates of the two tasks in clock cycles. Multiplying these estimates by the clock period results in Clift = 162.7 µs and Ckfl = 782.2 µs.

2.4.2 Analysis using the Task Approach

In this section, the VGA controller represents another real-time task of the system that is taken into account in the analysis of the timing behavior. Both the blocked memory access scheme and the spread memory access scheme, as described in Section 2.3.1, are evaluated.

Blocked

Using the values of Table 2.1, the utilization of each individual task is calculated by dividing the computation time Ci by the corresponding period Ti, resulting in Uvga = 28.24%, Ulift = 32.54% and Ukfl = 26.07%. Applying the utilizations to Equation 2.1 results in an overall utilization of 86.85%. The overall utilization may not exceed 3 · (2^(1/3) − 1) = 78.00% because three tasks are involved. The condition does not hold, and consequently this task set fails the utilization-based schedulability test. It cannot be guaranteed that all tasks meet their deadlines. Therefore, a response time analysis is carried out next.

The response time Rvga is the same as the computation time because this task has the highest priority.

Rvga = Cvga = 4.8 µs (2.6)

The response time of the next lower-priority task τlift, denoted Rlift, is the sum of the computation time Clift and the interference of all higher-priority tasks (in this case, the interference of τvga).


wlift^(n+1) = Clift + ⌈wlift^n / Tvga⌉ · Cvga (2.7)

The response time is calculated using the values from Table 2.1. Rlift is found when wlift^(n+1) equals wlift^n. It is 229.9 µs, which is less than Tlift. Finally, Rkfl has to be calculated. Rkfl is the sum of the computation time Ckfl and the interference of the two higher-priority tasks τvga and τlift. The result for Rkfl is 1999.4 µs. All response times are smaller than their corresponding periods, and hence the response time analysis has a positive outcome. This response time calculation ensures that the tasks will meet their deadlines, because the test is both sufficient and necessary [4], even though the utilization-based schedulability test could not be passed.
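For illustration, feeding the values of Table 2.1 into the Java sketch from Section 2.3 reproduces exactly these bounds (a cross-check of the arithmetic, nothing more):

    double[] c = { 4.8, 162.7, 782.2 };   // C_vga, C_lift, C_kfl in µs
    double[] t = { 17, 500, 3000 };       // T_vga, T_lift, T_kfl in µs
    ResponseTimeAnalysis.responseTime(c, t, 1);  // ≈ 229.9 µs  (R_lift)
    ResponseTimeAnalysis.responseTime(c, t, 2);  // ≈ 1999.4 µs (R_kfl)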

Spread

The VGA task accesses the memory in a scheme spread over time. The new values for the period and the computation time of τvga are derived by dividing the original period Tvga = 17 µs of Table 2.1 by the cycle time of 12.5 ns, which yields 1360 cycles per display line. As 128 memory requests are needed for each line, one request is issued every 10 cycles: Tvga = 10 cycles and Cvga = 3 cycles. The remaining 80 cycles are not used. Hence, the values for the VGA task change to Tvga = 125 ns and Cvga = 37.5 ns.
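Written out, the derivation of these values from the numbers above is:

\[
\frac{17\,\mu\text{s}}{12.5\,\text{ns}} = 1360\ \text{cycles}, \qquad
\left\lfloor \frac{1360}{128} \right\rfloor = 10\ \text{cycles per request}, \qquad
1360 - 128 \cdot 10 = 80\ \text{unused cycles}
\]
\[
T_{vga} = 10 \cdot 12.5\,\text{ns} = 125\,\text{ns}, \qquad
C_{vga} = 3 \cdot 12.5\,\text{ns} = 37.5\,\text{ns}, \qquad
U_{vga} = \tfrac{3}{10} = 30\,\%
\]
\[
U_{total} = 30\,\% + 32.54\,\% + 26.07\,\% = 88.61\,\%
\]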

Using these values for the VGA task together with the values for τlift and τkfl, the utilization-based schedulability test results in an overall utilization of 88.61%, similar to the result in the previous section. The small divergence between the results is explained by the remaining 80 cycles that are not used in this memory access scheme. Again, the utilization test fails.

Therefore, the response time analysis is used. Rvga is equal to Cvga = 37.5 ns. Rlift amounts to 232.5 µs and Rkfl to 2279.6 µs. The positive result of the response time analysis shows that the task set can be scheduled. As in the utilization-based test, the remaining 80 cycles affect the results of Rlift and Rkfl. Both response times are larger than in the blocked memory access scheme. As a consequence, the spread memory access scheme performs worse than the blocked scheme.

2.4.3 Analysis using the WCET method

The second approach to include the DMA unit in the schedulability analysis is to model the DMA access in the WCET values for memory access of the software tasks. Each instruction that accesses memory has to include the maximum delay due to a possible memory access by the DMA unit.


Table 2.2: WCET estimates given in clock cycles.

App     JOP only   JOP with VGA   Increase
Kfl        62573          83131        33%
Lift       13016          16118        24%

Table 2.3: Task set for the WCET method.

τ       T (µs)   C (µs)   Priority
τlift      500    201.5          2
τkfl      3000   1039.1          1

Blocked

Using the blocked memory access scheme together with the WCET method is not a reasonable approach. One would have to account for the delay of the whole block of VGA memory requests in each memory access of JOP. That would result in a very conservative WCET for each memory access of JOP. Hence, this combination is not investigated further.

Spread

The WCET analysis tool [12] can be parameterized with respect to the memory access time (the wait states for memory read, memory write, and the cache load). A memory access from the VGA unit takes 2 cycles, plus 1 cycle in the arbiter to switch between the two masters. Therefore, we add 3 cycles to the wait states. Table 2.2 shows the WCET values in clock cycles for different applications for the stand-alone processor and when adding the DMA device. We see an increase of 24% to 33% in the WCET of the tasks. These values are conservative, as each memory access is modeled with the maximum blocking time. For a single memory access (such as bytecode getfield) this is the best we can do without further analysis of the instruction pattern. However, for cache loading we could include the access pattern (in our example one access per 10 clock cycles) into the analysis of the cache load time. Previous experiments showed that loading the method cache produces most of the memory requests. Consequently, with the inclusion of the access pattern into the WCET analyzer tool, we can provide tighter WCET values. This is a new approach to WCET analysis: in all current approaches, the WCET analysis is independent of the schedulability analysis. Schedulability analysis is usually the next step and assumes known WCET values.
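The increase column of Table 2.2 is simply the relative growth of the cycle counts:

\[
\frac{83131 - 62573}{62573} = 32.9\,\% \approx 33\,\% \quad (\text{Kfl}), \qquad
\frac{16118 - 13016}{13016} = 23.8\,\% \approx 24\,\% \quad (\text{Lift})
\]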


Table 2.4: Comparison of the response times of the task approach with blocked DMA (C1 and R1) and the WCET method with spread DMA access (C2 and R2).

τ       T (µs)   C1 (µs)   R1 (µs)   C2 (µs)   R2 (µs)
τvga        17       4.8       4.8         –         –
τlift      500     162.7     229.9     201.5     201.5
τkfl      3000     782.2    1999.4    1039.1    1845.1

The computation time values of Table 2.3 are calculated by multiplying the WCET estimates of τlift and τkfl from Table 2.2 by the clock period of 12.5 ns. The values for those two tasks are the basis for the schedulability test. The utilization-based test results in 74.94%. Only two tasks are taken into account, and hence this result is less than 2 · (2^(1/2) − 1) = 82.84%. Even though this test is positive, we also carry out the response time analysis. The response time analysis for the WCET estimates of the system with the VGA results in tighter response times than the analysis using the task approach of Section 2.4.2. Rlift is the same as Clift = 201.5 µs, and Rkfl amounts to 1845.1 µs.
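Written out, the conversion and the utilization test are:

\[
C_{lift} = 16118 \cdot 12.5\,\text{ns} \approx 201.5\,\mu\text{s}, \qquad
C_{kfl} = 83131 \cdot 12.5\,\text{ns} \approx 1039.1\,\mu\text{s}
\]
\[
U = \frac{201.5}{500} + \frac{1039.1}{3000} = 40.30\,\% + 34.64\,\% = 74.94\,\% \leq 2\,(2^{1/2} - 1) = 82.84\,\%
\]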

2.4.4 Discussion

Table 2.4 shows the results of the evaluation for both analyses: columns 3 and 4 (C1 and R1) for the DMA task approach (showing the better solution with blocked DMA mode), and columns 5 and 6 (C2 and R2) for the DMA-WCET approach. We can see that both τlift and τkfl have a higher WCET (C2) in the DMA-WCET approach. As we do not have to include the VGA task in the response time analysis, the response time R2 is lower for both tasks despite the fact that C2 is higher. The result shows that the inclusion of the DMA access in the WCET analysis provides tighter worst-case response times than considering the DMA as an additional task. The difference can be explained as follows: most instructions on JOP do not access the main memory; they use the internal stack cache for data, and the pipeline is filled from the instruction cache most of the time. In the task approach, all instructions are blocked by the VGA task. The WCET analysis is more exact, as it delays only those instructions that actually access the main memory, and the cache load events. To validate our calculations and measurements, we ran all mentioned task sets on real hardware, i.e., the JopVga system. No deadline violation of any task was observed in these experiments.


Table 2.5: Comparison of the system performances in iterations/s.

App     JOP only   blocked VGA   spread VGA
Kfl        12163         11562        11628
Lift        9643          9194         9356

2.4.5 Benchmarks

Although the solution is aimed at RTSs, i.e. time-predictable systems, the average-case performance is still interesting. The system under test is the JopVga system. The FPGA platform enables us to compare the performance of the system with an enabled VGA controller versus one with a disabled VGA controller. The results are achieved by running real applications on real hardware. For our measurements, we use the embedded Java benchmark suite JavaBenchEmbedded, as described in [9]. The result is given in iterations per second, which means a higher value indicates better performance. The benchmark results are shown in Table 2.5. The Kfl application is slowed down by just 4.4% due to the memory contention with the spread VGA memory access scheme. The WCET estimates of the same task resulted in an increase of up to 33% (see Table 2.2). The Lift benchmark experiences an even smaller slowdown of 3.0% due to the contention with the VGA task. The WCET estimates of the same task resulted in an increase of 24%. To recapitulate, although about one third of the memory bandwidth is used for the DMA unit, both applications lose far less than one third of their throughput. This result is a promising indication that the memory system is not the bottleneck of the single CPU with the VGA. There is enough headroom for further devices. The result is promising for our further plans for a CMP version with several JOPs sharing a single memory.
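The quoted slowdowns follow directly from Table 2.5:

\[
\frac{12163 - 11628}{12163} = 4.4\,\% \quad (\text{Kfl}), \qquad
\frac{9643 - 9356}{9643} = 3.0\,\% \quad (\text{Lift})
\]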

2.5 Conclusion and Future Work

In this paper, we have analyzed a system with a processor and a DMA unit with respect to WCET and schedulability. We have found two ways to model the influence of the DMA unit on the application tasks: (1) treating the DMA access as an additional real-time task, and (2) including the DMA memory access in the WCET analysis of the individual application tasks. We found that the second approach results in a tighter estimation and leaves more processing resources for the application. We will investigate the possibility of including a known memory access pattern

into the WCET analysis of the cache loading to find tighter WCET estimates. On a cache load, we can guarantee that only a maximum number of memory loads can conflict with the DMA unit. The next step is the application of our findings to a system with several CPUs – the CMP JOP system. In that case, the memory access pattern is less predictable. Therefore, the shared resource can only be modeled by the WCET approach.

Bibliography

[1] ARM. ARM11 MPCore Processor, technical reference manual. Available at: http://www.arm.com, August 2006.

[2] Pavel Atanassov and Peter Puschner. Impact of DRAM refresh on the execution time of real-time tasks. In Proc. IEEE International Workshop on Application of Reliable Computing and Communication, pages 29–34, Dec. 2001.

[3] Neil C. Audsley, Alan Burns, Robert I. Davis, Ken Tindell, and Andy J. Wellings. Fixed priority pre-emptive scheduling: An historical perspective. Real-Time Systems, 8(2-3):173–198, 1995.

[4] Alan Burns and Andrew J. Wellings. Real-time systems and programming languages: Ada 95, real-time Java, and real-time POSIX. International computer science series. Addison-Wesley, third edition, 2001. Revised edition of Real-time systems and their programming languages, 1990.

[5] SPARC International Inc. The SPARC Architecture Manual: Version 8. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1992.

[6] Mathai Joseph and Paritosh K. Pandya. Finding response times in a real-time system. Comput. J., 29(5):390–395, 1986.

[7] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46–61, 1973.

[8] Peter Puschner and Alan Burns. A review of worst-case execution-time analysis. Journal of Real-Time Systems, 18(2/3):115–128, May 2000.

[9] Martin Schoeberl. Evaluation of a Java processor. In Tagungsband Austrochip 2005, pages 127–134, Vienna, Austria, October 2005.

[10] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[11] Martin Schoeberl. SimpCon - a simple and efficient SoC interconnect. Available at: http://www.opencores.org/, 2007.

[12] Martin Schoeberl and Rasmus Pedersen. WCET analysis for a Java processor. In JTRES '06: Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems, pages 202–211, New York, NY, USA, 2006. ACM Press.

3 Towards a Java Multiprocessor

Christof Pitter and Martin Schoeberl Institute of Computer Engineering, Vienna University of Technology, Austria

Proceedings of the 5th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2007). Vienna, Austria, 2007, pages 144-151.

Abstract

This paper describes the first steps towards a Java multiprocessor system on a single chip for embedded systems. The chip multiprocessing (CMP) system consists of a homogeneous set of processing elements and a shared memory. Each processor core is based on the Java Optimized Processor (JOP). A major challenge in CMP is the shared memory access of multiple CPUs. The proposed memory arbiter resolves possible emerging conflicts of parallel accesses to the shared memory using a fixed priority scheme. Furthermore, the paper describes the boot-up of the CMP. We verify the proposed CMP architecture by the implementation of a prototype called JopCMP. JopCMP consists of multiple JOPs and a shared memory. Finally, the first implementation of the CMP, composed of two or three JOPs in an FPGA, enables us to compare the performance of a single-core JOP with the CMP version by running real applications.


Keywords: Multiprocessor, Java, Shared Memory

3.1 Introduction

Modern applications demand ever-increasing computation power and act as the main drivers for the semiconductor industry. For over 35 years, transistors have become faster and clock frequencies have risen accordingly. Additionally, the number of transistors on an integrated circuit for a given cost doubles every 24 months, as described by Moore's Law [20]. The availability of more transistors was exploited by instruction-level parallelism (ILP), which was the primary processor design objective between the mid-1980s and the start of the 21st century. According to [9], we are reaching the limits of exploiting ILP efficiently. Unfortunately, semiconductor technology has also run into physical limits in recent years; for example, electrical signals cannot travel faster than the speed of light. As a result, the clock frequency that used to increase exponentially has leveled off [17]. To sustain the rapid growth of computation power, new system architecture advancements have to be made. According to [9], the future direction of computer systems is chip multiprocessing (CMP). Such a system combines two or more processing elements and a sophisticated communication network on a single chip. A major advantage of this approach is that progress in computation power does not come along with an increase in the hardware complexity of the single processors. However, the shared components of a multiprocessor system, like memory and I/O, produce new challenges that have to be addressed. Real-time programs are naturally multi-threaded and a good candidate for on-chip multiprocessors with shared memory. The Java virtual machine (JVM) thread model supports threads that share the main memory. Therefore, a multiprocessor JVM in hardware is a viable option. The basis for the CMP architecture has been set with the Java Optimized Processor (JOP) [23, 24, 25]. JOP is the implementation of a JVM in hardware. The application model used is described in [22, 28] and is based on the Ravenscar Ada profile [7]. In order to obtain a small and predictable processor, several advanced and resource-consuming features (such as instruction folding or branch prediction) were omitted from the design of JOP. The resulting low resource usage of JOP makes it possible to integrate more than one processor in a low-cost field programmable gate array (FPGA). In this paper, we propose a CMP architecture consisting of a number of JOPs and a shared memory. The shared memory is uniformly accessible

by the homogeneous processing cores. An arbitration unit takes care of conflicts due to parallel memory requests. Furthermore, we describe the implementation of the caching and synchronization mechanisms. Additionally, we present JopCMP, the first prototype of the CMP with JOP cores. JopCMP is composed of multiple JOP cores, integrated in an FPGA, and an external memory. A novel memory arbiter controls the memory access of the various JOPs to the shared memory. It resolves possible emerging conflicts of parallel accesses to the shared memory. In comparison to existing memory arbiters of system-on-chip (SoC) busses (e.g. AMBA [3]), the proposed arbitration process is performed in the same cycle as the request happens. This increases the bandwidth and eases the time predictability of each memory access. Therefore, the implementation of the JopCMP represents a first step towards a time-predictable multiprocessor. The ultimate goal of our research work is a multiprocessor for safety-critical applications. Moreover, an acceptable performance compared with mainstream non-real-time Java systems is an objective. The rest of the paper is structured as follows. Section 3.2 presents related work. In Section 3.3, we describe the proposed CMP architecture and go into details of the memory model and caching. Additionally, the issue of synchronization is examined. Section 3.4 describes the first implementation of the JopCMP system, including the boot-up sequence and the shared memory access management of the CMP. Section 3.5 presents experiments with the JopCMP prototype. We compare the performance of the CMP against a single processor. Finally, Section 3.6 concludes the paper and provides guidelines for future work.

3.2 Related Work

In this paper, we argue that the replication of a simple pipeline on a chip is a more effective use of transistors than the implementation of super-scalar architectures. The following two subsections survey the progress made in CMP.

3.2.1 Mainstream Multiprocessors

Due to the power wall [9], the trend towards CMP can be seen in mainstream processors. Currently, three quite different architectures are state-of-the-art:

1. Multi-core versions of super-scalar architectures (Intel/AMD)

2. Multi-core chip with simple RISC processors (Sun Niagara)


3. The CELL architecture

Mainstream desktop processors from Intel and AMD include two or four super-scalar, simultaneous multithreading processors, which are just replications of the original, complex cores. Sun took a completely different approach with its Niagara T1 [15]. The T1 contains eight processor cores. Each core consists of a simple six-stage, single-issue pipeline similar to the original five-stage RISC pipeline. The additional pipeline stage adds fine-grained multithreading. The first version of the chip contains just a single floating-point unit that is shared by all eight processors. The design is targeted at server workloads. The Cell multiprocessor [10, 13, 14] is an example of a heterogeneous multiprocessor system. The Cell contains, besides a PowerPC microprocessor, eight synergistic processors (SPs). The bus is clocked at half of the processor speed (1.6 GHz). It is organized in four rings, each 128 bits wide, two in each direction. A maximum of three non-overlapping transfers on each ring is possible. The SPs contain on-chip memory instead of a cache. All memory management, e.g. transfers between SPs or between on-chip memory and main memory, is under program control, which makes programming a difficult task. All of the above CMP architectures are optimized for average-case performance and not for worst-case execution time (WCET). The complex hardware complicates the timing analysis. Our overall goal is a homogeneous CMP design that is analyzable with respect to WCET. This paper presents the first step towards this goal.

3.2.2 Embedded Multiprocessors

In the embedded system domain, two different CMP architectures are distinguished:

1. heterogeneous multiprocessors

2. homogeneous multiprocessors

Multiprocessors with a heterogeneous architecture combine a core CPU for control and communication tasks with additional digital signal processing elements, interface processors, or mobile multimedia processing units. These units are connected together using multi-level buses or switches. Some functional units may have their own individual memories along with shared memory structures. They are often tailored for specific applications.


Some examples of heterogeneous multiprocessors include the Nomadik [1] from ST, designed for mobile multimedia applications; the Nexperia PNX-8500 [8] from Philips, aimed at digital video entertainment systems; and the OMAP family [19] from TI, designed to support 2.5G and 3G wireless applications. Gaisler Research AB designed and implemented a homogeneous multiprocessor system. It consists of a centralized shared memory and up to four LEON processor cores that are based on the SPARC V8 architecture [11]. This embedded system is available as a synthesizable VHDL model. Therefore, it is well suited for SoC designs. We could not find any literature concerning WCET analysis of this multiprocessor. Another example is the ARM11 MPCore [4]. It introduces a pre-integrated symmetric multiprocessor consisting of up to four ARM11 microarchitecture processors. The 8-stage pipeline architecture, independent data and instruction caches, and a memory management unit for the shared memory make a timing analysis difficult. The two leaders of the FPGA market, Altera and Xilinx, both provide software tools and intellectual property (IP) processors to design CMP systems [2, 5]. The Nios II CPUs form the processing units of the multiprocessor architecture from Altera. It is easy to create a CMP architecture using the GUI of the System-on-a-programmable-chip (SOPC) builder, a tool of the Altera Quartus II design suite. Nevertheless, the dependence on specific IP cores is unavoidable when designing such a system. In this paper, we concentrate on homogeneous multiprocessors consisting of two or more similar CPUs sharing a main memory. Even though much research has been done on multiprocessors, the timing analysis of such systems has so far been disregarded.

3.3 CMP Architecture

According to [12, 30], two different organizations of a tightly coupled multiprocessor system exist (see Figure 3.1). The core of the Shared Memory Model is a global physical memory equally accessible to all processors. These systems enable simple data sharing through a uniform mechanism of reading and writing shared structures in the common memory. This multiprocessor model is called symmetric (shared-memory) multiprocessor (SMP) because all processors have symmetric access to the shared memory. This architecture is known as UMA (uniform memory access). In contrast, the Distributed Shared Memory Model implements a physically distributed-memory system (often called a multicomputer). It consists of multiple independent processing nodes with local memory modules, connected by a general interconnection network like switches or meshes.


Figure 3.1: CMP Memory Models: a) Shared memory model, b) Distributed shared memory model.

Communication between processes residing on different nodes involves a message-passing model that requires extensive additional data exchange. The messages have to take care of data distribution across the system and manage the communication. This architecture is called non-uniform memory access (NUMA) because the time for a memory access depends on the location of the memory word. For time predictability, the NUMA architecture is less appropriate. Each CPU can access its own local memory very fast, but accessing a data word in another node's memory takes much longer. Consequently, the memory access times can vary extensively. In the SMP architecture, each memory request takes the same time, independent of the CPU. Additionally, no message-passing communication system that could limit the bandwidth of the interconnection is needed. Therefore, the SMP architecture is the best choice for analyzing tight bounds on memory access times.

3.3.1 JVM Memory Model

The JVM defines various runtime data areas that are used during the execution of a program [18]. Some of these data areas are shared between threads, while others exist separately for each thread.

Stack: Each thread has a private stack area that is created at the same time as the thread. It contains a frame with return information for a method, a local variable area, and the operand stack. Local variables and the operand stack are accessed as frequently as registers in a standard processor. According to [23], a Java processor should provide some caching mechanism for this data area.


Heap: The heap is the data area where all objects and arrays are allocated. The heap is shared among all threads. A garbage collector (GC) reclaims storage for objects. The GC of the proposed CMP runs on a designated processor.

Method area: The method area is shared among all threads. It contains static class information such as field and method data, the code for the methods and the constant pool. The constant pool is a per-class table, containing various kinds of constants such as numeric values or method and field references. The constant pool is similar to a symbol table. Part of this area, the code for the methods, is very frequently accessed (during instruction fetch) and therefore is a good candidate for caching.

The specification of the JVM mandates that the heap and the method area be shared among the threads. This memory model favors the shared global memory model as the adequate solution for the multiprocessor using JOP cores. One single shared address space is accessible from all processors.

3.3.2 CMP Cache Memory

Many SMP architectures support caching of private and shared data [9]. Since many memory requests can be served by the caches, the number of accesses to the main memory decreases. Nevertheless, caching of shared data may result in cache coherence problems in a multiprocessor environment. Assume shared data are cached and each processor has a copy of a data word in its own cache. If one processor changes the data word, the other processor will not notice that its copy has become invalid. Hence, two different CPUs could see different values for exactly the same memory location. Cache coherence techniques exist, e.g. snooping protocols or directory-based mechanisms, to ensure that no cache coherence problems can arise. Nevertheless, these cache coherence mechanisms introduce processing overhead and latencies. In the JVM, each thread has its own JVM stack. A thread accesses this memory area very often. Therefore, in a CMP system it is cached in a so-called stack cache of the corresponding CPU. No cache conflicts can occur because this data is private to each thread. The method area of the JVM is shared among all threads. Nevertheless, it is cached in the method cache [23] of each CPU. This area is read-only. Consequently, no cache coherence conflicts can occur. The heap of the JVM is the memory region that is not cached, as data cache WCET analysis is problematic. The heap contains all objects that are created by a running Java application.

Therefore, the heap represents the memory area used for communication between the multiple CPUs of a CMP. To summarize, the CMP architecture operates without any cache coherence mechanism, as cache coherence conflicts are avoided by design.

3.3.3 Synchronization

Synchronization is an essential part of a multiprocessor system with shared memory. The CMP synchronization support has two important responsibilities:

• Protect access to shared objects

• Avoid priority inversion

The first responsibility of synchronization is to protect access to shared objects. As already mentioned in Section 3.3.2, the heap inside the JVM contains the objects that are shared between threads. If multiple threads need to access the same objects or class variables concurrently, their access to the data must be properly managed. Otherwise, the program will have unpredictable behavior. Therefore, the JVM associates a lock with each object and class. Only one thread can hold the lock at any time. When the thread no longer needs the lock, it returns it to the JVM. If another thread has requested the same lock, the JVM passes the lock to that thread. The traditional approach to implementing such objects centers on critical sections: only one process operates on the object at a given time. JOP, the implementation of the JVM in hardware, solves this problem in a straightforward way. Suppose several threads are executed depending on the priority of each thread. If one thread accesses a shared object and enters a so-called critical section, it has to hold exclusive ownership. Therefore, JOP implements the two JVM bytecodes monitorenter and monitorexit. In hardware, the synchronization mechanism is implemented by disabling and enabling interrupts at the entrance and exit of the monitor. This simple form of synchronization disallows any context switch until the thread leaves the critical section. Even though this is a viable option for single-processor systems, the price of this approach is high: the processor cannot interleave programs in different critical sections, which may degrade execution performance. Therefore, several constructs to implement critical sections in hardware [29] exist, e.g. the atomic read-modify-write operation based on a test-and-set instruction.
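At the source level, these are the bytecodes the Java compiler emits for a synchronized block; a minimal example of a shared object protected this way:

    public class SharedCounter {
        private int value; // state on the shared heap

        public void increment() {
            synchronized (this) { // javac emits monitorenter here
                value++;          // critical section
            }                     // and monitorexit here (also on the exception path)
        }
    }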


Especially in a CMP, the synchronization solution of deactivating and activating interrupts does not suffice. Mutual exclusion cannot be guaranteed because we cannot prevent other CPUs from running in parallel. Threads of different processors may simultaneously access the shared object. Therefore, a synchronization unit is essential for the CMP. If one processor wants to access a shared object, it has to request a lock. Either the CPU receives the lock, or the request is rejected because another CPU is using the object. With the grant of the lock, the processor resides in the critical section and cannot be interrupted. As soon as the processor no longer accesses the shared object, it releases the lock immediately. Another CPU waiting for the lock then gets permission to access the memory object. The first implementation will make only one global lock available for the heap. Later, we will investigate the use of multiple locks, where the access to each object on the heap is controlled by its corresponding lock. Though the introduction of multiple locks will increase concurrency, it may induce the risk of deadlock (see [32]). A deadlock is a condition in which two or more threads cannot advance because each requests a lock that is held by another thread. The second responsibility of the CMP synchronization support is the avoidance of priority inversion. If a low-priority thread holds a shared resource, a high-priority thread cannot access the same resource. Consequently, the high-priority thread cannot proceed until the low-priority thread releases the lock. Assume one or more medium-priority threads preempt the low-priority task. Consequently, the high-priority thread can be delayed indefinitely, because the medium-priority jobs take precedence over the low-priority task and thereby over the high-priority task as well. An unbounded priority inversion, and in fact no time predictability, would be the consequence. Solutions to priority inversion for single-processor systems include the priority ceiling protocol and the priority inheritance protocol [16]. The work of Wang et al. [31] presents two algorithms for priority inheritance locks for multiprocessor real-time systems. If a low-priority processor blocks a high-priority processor, the low-priority processor inherits the highest priority of the waiting processors. Hence, no medium-priority processor gets the chance to interrupt the low-priority processor and consequently block the high-priority processor for an indefinite time. The time the high-priority CPU has to wait is bounded. Even though we know that hardware-assisted transactional memory models could be a promising approach for our multiprocessor, we concentrate on memory locks for synchronization in this paper. Future work will investigate the use of transactional memory for the CMP.


Figure 3.2: Time predictable CMP architecture.

3.3.4 CMP using JOPs

The proposed chip multiprocessing system (see Figure 3.2) uses the SMP architecture. It consists of a shared memory that is uniformly accessible by a number of homogeneous processors. The JOP cores are connected to the shared memory via a memory arbiter that is further explained in Section 3.4.2. This arbiter has to control the memory access of the various JOPs to the shared memory. It resolves possible emerging conflicts of parallel accesses to the shared memory depending on the priority of the CPU that requested access. Each CPU is assigned a unique priority in the system. Each core contains a local method and stack cache. Furthermore, the depicted CMP architecture shows a scheduling and synchronization unit. The preemptive scheduler distributes the real-time tasks among the processors. Synchronization has the responsibility to coordinate access to the shared objects by a mutual exclusion mechanism. Due to the different priorities of the multiple processors, a low-priority processor shall not be able to block a high-priority processor indefinitely. Therefore, this priority inversion problem is solved by using priority inheritance locks for shared objects. On-chip I/O devices, such as a controller for real-time Ethernet or a real-time field bus, may be mapped to shared memory addresses and are connected via the memory arbiter.


3.4 Implementation

In the following section, we describe our implementation of a CMP system based on JOP. Multiple cores are connected via a low-latency arbiter to a shared main memory. We subsequently refer to the prototype as JopCMP.

3.4.1 JopCMP Boot-up Sequence

One interesting issue for a CMP system is how the startup, or boot-up, is performed. Before we explain the CMP solution, we need an understanding of the boot-up sequence of JOP in an FPGA. On power-up, the FPGA starts the configuration state machine to read the FPGA configuration data either from a Flash or via a download cable (for development). When the configuration has finished, an internal reset is generated. After that reset, microcode instructions are executed, starting from address 0. At this stage, we have not yet loaded any application program (Java bytecode). The first sequence in microcode performs this task. The Java application can be loaded from an external Flash or via a serial line (or USB port) from a PC. The mode is configured in the microcode assembly. Consequently, the Java application is loaded into the main memory. To simplify the startup code, we perform the rest of the startup in Java itself, even when some parts of the JVM are not yet set up. In the next step, a minimal stack frame is generated and the special method Startup.boot() is invoked. From now on, JOP runs in Java mode. The method boot() performs the following steps:

• Send a greeting message to stdout

• Detect the size of the main memory

• Initialize the data structures for the garbage collector

• Initialize java.lang.System

• Print out JOP's version number, detected clock speed, and memory size

• Invoke the static class initializers in a predefined order

• Invoke the main method of the application class

The boot-up process is the same for all processors until the generation of the internal reset and the execution of the first microcode instruction. From that point on, we have to take care that only one processor performs the initialization steps.


All processors in the CMP are functionally identical. Only one processor is designated to boot up and initialize the whole system. Therefore, it is necessary to distinguish between the different CPUs. We assign a unique CPU identity number (CPU ID) to each processor. Only processor CPU0 is designated to do all the boot-up and initialization work. The other CPUs have to wait until CPU0 completes the boot-up and initialization sequence. At the beginning of the booting sequence, CPU0 loads the Java application. Meanwhile, all other processors wait for an initialization-finished signal from CPU0. This busy wait is performed in microcode. When the other CPUs are enabled, they run the same sequence as CPU0. Therefore, the initialization steps are guarded by a condition on the CPU ID. In our current prototype, we let all additional CPUs also invoke the main method of the application. This is a shortcut for a simple evaluation of the system.¹ In a future version, the additional CPUs will invoke a system method to be integrated into the normal scheduling system.
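The guard on the CPU ID can be pictured as follows. This is only a sketch: the real check is partly performed in microcode, and the helper names below are hypothetical stand-ins, not actual JOP system methods.

    // Hypothetical sketch of the guarded boot-up sequence.
    public class BootSketch {
        static void boot() {
            if (getCpuId() == 0) {     // only CPU0 initializes the system
                loadApplication();     // load the Java application into main memory
                initSystem();          // GC data structures, java.lang.System, ...
                signalInitDone();      // release the waiting CPUs
            } else {
                waitForInitDone();     // busy wait; done in microcode on real hardware
            }
            invokeMain();              // in the prototype, every CPU invokes main()
        }
        // Empty stubs so the sketch compiles; the real work happens in
        // microcode and in the JOP runtime.
        static int  getCpuId()        { return 0; }
        static void loadApplication() {}
        static void initSystem()      {}
        static void signalInitDone()  {}
        static void waitForInitDone() {}
        static void invokeMain()      {}
    }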

3.4.2 Memory Arbiter

The general structure of a system-on-a-chip (SoC) architecture combines SoC modules with a data exchange interconnection. A CMP is formed of several processors connected to a shared memory module. Some sort of SoC interconnection has to enable data exchange between the SoC modules. One major design decision regards the type of interconnection that is used. Our system requires a fast, point-to-point connection between each CPU and the memory. We still have to keep in mind that parallel accesses to the memory are not possible and would lead to a conflict. Therefore, some kind of synchronization mechanism has to take care of this problem. Additionally, each memory access should be as fast as possible and time predictable. Communication on SoC is an active research area with a focus on network-on-chip (NoC) [6]. NoC is not the appropriate architecture for JopCMP. First, the interconnection of JopCMP does not have to be a network, because our system consists of several masters (the JOPs) and only one slave (the shared memory). The communication between the masters takes place using shared memory and not packet-oriented messages. Consequently, there is no need for a network connecting all modules with each other. A NoC usually introduces long latencies that cannot be tolerated by our memory system (we avoid data caches to achieve better temporal predictability). Furthermore, the network contentions within the routers may cause varying latencies [6] that make WCET analysis more complex and conservative.

¹In the main method we execute different applications based on the CPU ID.


For the JopCMP, the simple SoC interconnect (SimpCon) [26] is used to connect the SoC modules. This synchronous on-chip interconnection is intended for read and write transfers via point-to-point connections. The master starts the transaction. The read or write request, together with the address and data, is valid for a single cycle at the slave. If the slave needs the address or data for longer than one cycle, it has to store them in a register. Consequently, the master can continue to execute its program until the result of a read is needed. The slave informs the master via a signal called rdy_cnt when the requested data will be available. In addition, this signal also serves as an early notification of the completion of the data access. This mechanism allows the master to send a new request before the former has been completed. This form of pipelining permits fast data transfer. SimpCon is well suited for on-chip point-to-point connections. Nevertheless, the specification does not support connecting multiple masters to a shared slave. Therefore, we introduce a central arbiter. The SimpCon interface can be used as the interconnect between the masters and the arbiter, and between the arbiter and the slave. In this case, the arbiter acts as slave for each JOP and as master for the shared memory. The arbitration and signal routing is completely transparent for the masters and the slave. No bus request phase (as, e.g., in AMBA [3]) has to precede the actual bus transfer. Without contention, the arbiter introduces zero cycles of latency for a transaction. The arbiter is designed for the SimpCon interface. In the JopCMP architecture, it plays the important role of controlling the memory access of the various JOPs to the shared memory. It resolves possible emerging conflicts of parallel accesses to the shared memory. The implemented arbitration scheme uses fixed priority. As already mentioned in Section 3.4.1, each CPU is assigned a unique ID. This CPU ID establishes the priority of each CPU; the CPU with the lowest CPU ID has top priority. The memory arbiter resolves simultaneous memory accesses by determining an order of precedence. Zero cycle latency is the design objective of the memory arbiter. Assume that two processors want to access the shared memory in the same clock cycle. The arbiter has to decide which CPU is granted the request, and this arbitration process is performed in the same cycle as the request happens. Consequently, the time to access the memory is reduced and the bandwidth increases considerably. Whether this form of zero cycle arbitration would scale to a large number of processors is an open question and the topic of future work.
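The grant decision itself is simple. Modeled in software for illustration (the actual arbiter is combinational hardware behind the SimpCon interface and decides within the request cycle; this Java fragment only mirrors the policy):

    // Software model of the fixed-priority grant decision; index = CPU ID.
    public class ArbiterModel {
        static int grant(boolean[] request) {
            for (int id = 0; id < request.length; id++) {
                if (request[id]) {
                    return id; // lowest CPU ID, i.e. highest priority, wins
                }
            }
            return -1; // no request pending in this cycle
        }
    }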


Figure 3.3: Dual-core JopCMP system.

In [21], two different approaches to analyzing the timing behavior are compared with respect to schedulability analysis and WCET analysis. The system consists of a direct memory access (DMA) unit and JOP, both accessing a shared memory using the proposed arbiter. Pitter and Schoeberl treat the DMA task as the top-priority real-time task with a regular memory access pattern. The outcome of this paper shows that this arbiter allows WCET analysis by modeling the DMA memory access into the WCET values of the tasks running on JOP. Each memory access is modeled with the maximum blocking time: three cycles of the DMA unit. In the JopCMP system, the DMA unit is replaced by another JOP. Therefore, the blocking times can be much higher, for example because of a cache load. We will investigate different approaches to solve this problem in our future work in order to provide a time-predictable shared memory access scheme.

3.4.3 JopCMP Prototype

The first prototype of the proposed CMP consists of multiple JOPs [23], the proposed memory arbiter, a memory interface and an SRAM memory. The JOP cores, the memory arbiter and the memory interface are implemented on an Altera Cyclone FPGA. SimpCon [26], the SoC bus, connects the JOPs with the arbiter. The arbiter is connected via SimpCon to the memory interface. As illustrated in Figure 3.3, the memory interface connects the external memory to the FPGA. This 1 MByte, 32-bit external SRAM device represents the shared memory of the JopCMP. The dual-core JopCMP runs at a clock frequency of 80 MHz, resulting in a period of 12.5 ns per cycle. The SRAM-based shared memory has an access time of 15 ns per 32-bit word. Hence, every memory access needs a minimum of 2 cycles. The JopCMP system features a couple of different I/O interfaces, such as a serial interface, a USB interface and several I/O pins of the FPGA board. Usually, either one CPU owns exclusive access to the I/O ports, or several processors share the I/O interfaces. The benchmarks used for prototyping our JopCMP do not require I/O access from all CPUs. Consequently, the top-priority JOP holds sole access to the I/O world. To summarize, the CPUs share the main memory, but only one JOP is able to communicate with external devices. Nevertheless, shared I/O could be implemented using an arbitration unit for the I/O access.


Table 3.1: Benchmark results in iterations/s for a single-core JOP at different clock frequencies.

App     75 MHz   80 MHz   100 MHz
Kfl      13768    14667     18347
Lift     12318    13138     16425

3.5 Experiments

In this section, we provide performance measurements obtained on real hardware. The FPGA platform enables us to compare the performance of a single-processor system with that of the JopCMP system composed of multiple JOP cores. The measured results are achieved by running real applications on real hardware. We use the embedded Java benchmark suite called JavaBenchEmbedded, as described in [23], for our experiments. We make use of two real-world examples with an industrial background, hereinafter referred to as Lift and Kfl, to represent two independent tasks. Lift is a lift controller used in an automation factory. Kfl is one node of a distributed real-time system to tilt the line over a train for easier loading and unloading of goods wagons. Both applications consist of a main loop that is executed periodically. Only one task executes on a single CPU at a given time in our experiments. We measure the execution in iterations per second, which means that a higher value implies better performance. The baseline for the comparison is the performance of a single core. Table 3.1 shows the performance numbers at different clock frequencies for the single core.

3.5.1 Comparison at CMP Frequency

In the first experiments, we compare JopCMP versions with two and three cores against a single core at the same clock frequency. Those measurements provide insight into how limiting the memory bandwidth is. We start with a comparison of the performance measurements of a dual-core JopCMP against the performance of a traditional single-core JOP. Both systems run at the same clock frequency of 80 MHz. Running only one task on JOP results in 14667 iterations/s for Kfl. Task Lift achieves 13138 iterations/s (as shown in Table 3.1). Table 3.2 shows the benchmark results of running two tasks simultaneously on a dual-core JopCMP system at a frequency of 80 MHz. First, we measured the execution by running two Lift tasks, one on each CPU. The speedup of the overall system is calculated by dividing the sum of the performance of both tasks by the performance of the task running on the single processor. This calculates to


Table 3.2: Benchmark results in iterations/s of a dual JopCMP system at a clock frequency of 80 MHz.

Processor    JOP0    JOP1    JOP0    JOP1    JOP0    JOP1
Appl.        Lift    Lift    Kfl     Lift    Lift    Kfl
Result      12951   12951   14435   12374   12574   14296

Speedup_dualcore = (12951 + 12951) / 13138 ≈ 1.97 (3.1)

Each task running on a different CPU executes 12951 iterations/s. The result indicates that the two tasks do not access the memory very often in parallel. Otherwise, the memory contention would show up as a difference between the results. Additionally, each result does not diverge greatly from the result of the single processor. The outcome of the experiment is that the JopCMP is 1.97 times faster than the single JOP for Lift, both running at 80 MHz. In the next experiment, the high-priority JOP0 executes the Kfl task and JOP1 runs the task Lift. The results show that the task running on CPU0 is slowed down by just 1.6% compared to the execution of the single task on the single JOP. Furthermore, the low-priority CPU1 executes the main loop of Lift 12374 times per second. This task is slowed down by 5.8% due to the memory contention with the second JOP. We exchange the tasks between the CPUs for a further experiment. Task Lift experiences a decrease of 4.3% and task Kfl decreases by 2.5%. The results of Table 3.2 indicate that task Lift experiences a larger slowdown than task Kfl, irrespective of whether it is executed on CPU0 or on CPU1. In conclusion, the processing performance greatly increases due to the use of two JOPs in the JopCMP system. The comparison of the measurements between the dual-core and the single JOP shows that each single task in the JopCMP system experiences only a small slowdown in performance as measured by iterations per second. The cause is the access contention to the shared memory. The maximum frequency of the tri-core JopCMP is 75 MHz. Therefore, we measure the speedup of the CMP system compared to JOP running at a clock frequency of 75 MHz. Lift achieves 12318 iterations/s on JOP. Table 3.3 depicts the results of the CMP. Equation 3.2 presents the calculation of the speedup. Only 7% slowdown relative to the theoretical maximum speedup of 3 indicates either a large headroom in the memory interface or that the benchmark is small enough to fit into the method cache.


Table 3.3: Benchmark results in iterations/s of a tri-core JOP system at a clock frequency of 75 MHz.

Processor    JOP0    JOP1    JOP2
Appl.        Lift    Lift    Lift
Result      11736   11538   11260

Speedup_tricore = (11736 + 11538 + 11260) / 12318 ≈ 2.80 (3.2)

3.5.2 Comparison Against a Single Core

As we see in Table 3.4, the maximum clock frequency depends on the number of cores. For a fair comparison of the full system speedup, we have to take the reduced clock frequency of the JopCMP into account. Therefore, we compare the dual- and tri-core speedup against a single core where all designs are clocked at their maximum frequency. Equation 3.3 shows that the real speedup of a dual-core version of JOP, measured with the benchmark Lift, against a 100 MHz single core is about 58%.

Speedup_dualcore = (12951 + 12951) / 16425 ≈ 1.58 (3.3)

In the last experiment, we compare the performance of JOP with a tri-core JopCMP system, both running at their maximum clock frequencies. JOP executes Lift at 16425 iterations/s at a clock speed of 100 MHz (see Table 3.1). Table 3.3 shows the benchmark results of running the Lift task simultaneously on a tri-core JopCMP at the maximum frequency of 75 MHz. The speedup of the tri-core system is

Speedup_tricore = (11736 + 11538 + 11260) / 16425 ≈ 2.10 (3.4)

Even though the CMP runs at a reduced clock frequency of 75 MHz, the tri-core JopCMP provides a 2.1 times better overall performance compared to the traditional single-core JOP at 100 MHz.


Table 3.4: Comparison of resource consumption between JOP and the JopCMP versions.

Processor    Resources (LC)   Memory (KB)   fmax (MHz)
JOP                    2815          7.63          100
Dual-core              5540         15.62           80
Tri-core               8219         23.42           75

To recapitulate, the performance measurements provide a promising indication that the memory system is not the bottleneck of the JopCMP. It seems that there is enough headroom for further devices. As future research, we will evaluate whether the single-cycle arbitration is worth the reduced clock frequency, or whether a pipeline stage that introduces an additional cycle of latency is a better solution. Furthermore, we will evaluate the influence of different arbitration schemes on the WCET of the individual tasks.

3.5.3 Resource Consumption

Finally, Table 3.4 shows the resource consumption and the maximum frequencies of a typical version of JOP and of the JopCMP versions implemented in an Altera EP1C12 FPGA. We can see the differences in resource consumption by looking at the two basic structures of an FPGA: logic cells (LC) and embedded memory blocks. The dual-core consumes roughly twice as many LCs and memory blocks as JOP. Nevertheless, only 46% of the LCs and 52% of the memory blocks of the low-cost Cyclone FPGA are used. It runs at a clock frequency of 80 MHz. The tri-core JopCMP runs at a maximum clock frequency of 75 MHz. It requires 68% of the total LCs and 78% of the total memory blocks available on the FPGA. Therefore, this low-cost FPGA does not provide enough space to integrate additional JOPs. In summary, the experimental results and the resource consumption of the JopCMP prototype are encouraging. We can observe a slight degradation of the maximum clock frequency when using more and more JOP cores, due to the increasing combinational logic of the arbiter. Nevertheless, we will further evaluate CMP implementations with additional JOPs sharing a single memory in future work.

3.6 Conclusion

In this paper, we introduced a CMP architecture consisting of a number of JOP cores and a shared memory. We demonstrated the effectiveness of the architecture with the first prototype of a Java multiprocessor, called JopCMP.

Correct execution of real application tasks verified the implementation with multiple JOP cores. Experiments showed the correct functioning of the implemented boot-up and of the memory arbiter. Measurements made a comparison between a single JOP and the JopCMP possible, and endorsed further pursuit of the CMP approach. Future research will show how the CMP system behaves when integrating more than three processing cores in an FPGA. We have to admit that the lack of a synchronization mechanism prevents more advanced experiments; no experiments with objects shared by multiple processors (e.g. the producer-consumer problem) could be carried out. The implementation of the combined synchronization mechanism and priority inversion avoidance defines our future work. Additionally, we will investigate real-time multiprocessor scheduling for the proposed CMP. Furthermore, we will integrate the maximum latency due to collisions on the memory into the WCET tool [27], as we have done for the simpler case of DMA devices [21]. We will investigate the influence of different arbitration schemes on the WCET.

Acknowledgement

The TPCM project received support from the Austrian FIT-IT SoC initiative, funded by the Austrian Ministry for Traffic, Innovation and Technology (BMVIT) and managed by the Austrian Research Promotion Agency (FFG) under grant 813039.

Bibliography

[1] Alain Artieri, Viviana D'Alto, Richard Chesson, Mark Hopkins, Wade D. Peterson, and Marco C. Rossi. Nomadik - open multimedia platform for next generation mobile devices, TA305 technical article. http://www.st.com, September 2004.

[2] Altera. Creating Multiprocessor Nios II Systems Tutorial, V 7.1. http://www.altera.com, May 2007.

[3] ARM. AMBA specification (rev 2.0), May 1999.

[4] ARM. ARM11 MPCore Processor, technical reference manual. Available at: http://www.arm.com, August 2006.

[5] Vasanth Asokan. White Paper: Xilinx Platform Studio (XPS), Designing Multiprocessor Systems in Platform Studio. http://www.xilinx.com, April 2007.


[6] Tobias Bjerregaard and Shankar Mahadevan. A survey of research and practices of network-on-chip. ACM Comput. Surv., 38(1), 2006.

[7] Brian Dobbing and Alan Burns. The Ravenscar tasking profile for high integrity real-time programs. In Proceedings of the 1998 annual ACM SIGAda international conference on Ada, pages 1–6. ACM Press, 1998.

[8] Santanu Dutta, Rune Jensen, and Alf Rieckmann. Viper: A multiprocessor SOC for advanced set-top box and digital TV systems. IEEE Design & Test of Computers, 18(5):21–31, 2001.

[9] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers, 2006.

[10] H. Peter Hofstee. Power efficient processor architecture and the cell processor. In HPCA, pages 258–262, 2005.

[11] SPARC International Inc. The SPARC Architecture Manual: Version 8. Pren- tice Hall, Englewood Cliffs, New Jersey 07632, 1992.

[12] V. Milutinovic J. Protic, M. Tomasevic. Distributed shared memory: concepts and systems. Parallel & Distributed Technology: Systems & Applications, IEEE, 4:63 – 71, 1996.

[13] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the Cell multiprocessor. IBM Journal of Research and Development, 49(4/5):589-604, 2005.

[14] Michael Kistler, Michael Perrone, and Fabrizio Petrini. Cell multiprocessor communication network: Built for speed. Micro, IEEE, 26:10–25, 2006.

[15] Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 25(2):21-29, 2005.

[16] H. Kopetz. Real-time systems: design principles for distributed embedded applications. Kluwer Academic Publishers, 1997.

[17] James Laudon and Lawrence Spracklen. The coming wave of multithreaded chip multiprocessors. International Journal of Parallel Programming, 35(3):299-330, June 2007.

[18] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, USA, second edition, 1999.

[19] Grant Martin and Henry Chang. Winning the SOC Revolution, Chapter 5. Kluwer Academic Publishers, 2003.

[20] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8):114-117, 1965.

[21] Christof Pitter and Martin Schoeberl. Time predictable CPU and DMA shared memory access. In International Conference on Field-Programmable Logic and its Applications (FPL 2007), Amsterdam, Netherlands, August 2007.


[22] P. Puschner and A. J. Wellings. A profile for high integrity real-time Java programs. In 4th IEEE International Symposium on Object-oriented Real-time distributed Computing (ISORC), 2001.

[23] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[24] Martin Schoeberl. A time predictable Java processor. In Proceedings of the Design, Automation and Test in Europe Conference (DATE 2006), pages 800-805, Munich, Germany, March 2006.

[25] Martin Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, doi:10.1016/j.sysarc.2007.06.001, 2007.

[26] Martin Schoeberl. SimpCon - a simple and efficient SoC interconnect. Available at: http://www.opencores.org/, 2007.

[27] Martin Schoeberl and Rasmus Pedersen. WCET analysis for a Java processor. In JTRES '06: Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems, pages 202-211, New York, NY, USA, 2006. ACM Press.

[28] Martin Schoeberl, Hans Sondergaard, Bent Thomsen, and Anders P. Ravn. A profile for safety critical Java. In 10th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'07), pages 94-101, Santorini Island, Greece, May 2007. IEEE Computer Society.

[29] William Stallings. Operating Systems: Internals and Design Principles, 5th ed. Prentice Hall, 2005.

[30] Andrew S. Tanenbaum. Modern Operating Systems. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2001.

[31] Cai-Dong Wang, Hiroaki Takada, and Ken Sakamura. Priority inheritance spin locks for multiprocessor real-time systems. In ISPAN, pages 70–76. IEEE Computer Society, 1996.

[32] Amy Williams, William Thies, and Michael D. Ernst. Static deadlock detection for Java libraries. In Andrew P. Black, editor, ECOOP, volume 3586 of Lecture Notes in Computer Science, pages 602-629. Springer, 2005.

4 Performance Evaluation of a Java Chip-Multiprocessor

Christof Pitter and Martin Schoeberl
Institute of Computer Engineering, Vienna University of Technology, Austria

Proceedings of the International Symposium on Industrial Embedded Systems (SIES 2008). Montpellier, France, June 2008, pages 34-42.

Abstract Chip multiprocessing design is an emerging trend for embedded systems. In this paper, we introduce a Java multiprocessor system-on-chip called JopCMP. It is a symmetric shared-memory multiprocessor and consists of up to 8 Java Optimized Processor (JOP) cores, an arbitration control device, and a global shared memory. All components are interconnected with a system-on-chip bus.
This paper focuses on the performance evaluation of different hardware configurations of the multicore system. Therefore, we vary the instruction cache sizes, the number of processors, and the memory bandwidth. Within our experiments, we measure the performance by running three benchmarks on real hardware: an embedded application from industry, a computationally intensive matrix multiplication, and a synthetic benchmark that continuously accesses a shared data structure. Two different field-programmable gate arrays are used for the presented experiments. Our results illustrate the promises and limits of the proposed multiprocessor architecture concerning synchronization, memory bandwidth, and caching. Furthermore, we compare the performance and size of JopCMP with a complex Java processor.

4.1 Introduction

It is expected that chip-level multiprocessing (CMP) will be the future path of performance enhancements [10]. CMP technology integrates two or more processing units and a sophisticated communication network into a single integrated circuit (IC). In current desktop and server architectures, two trends can be seen: (1) integration of two to four out-of-order super-scalar CPUs (Intel/AMD) on a single die, or (2) integration of 8 very simple in-order RISC pipelines (Sun's Niagara [16] and IBM's CELL [15]). According to [27], multiprocessing is also common in embedded systems, as it combines the goals of increased performance, lower power consumption, and cost effectiveness.
In this paper, we propose a CMP architecture composed of multiple Java Optimized Processor (JOP) cores and a shared memory. The shared memory is uniformly accessible by the homogeneous processing cores. A system-on-chip (SoC) bus connects the devices of the system. A fairness-based arbitration unit takes care of memory contention due to parallel memory requests; it employs a fair scheduling strategy, which divides the memory access bandwidth into equal shares. In contrast to existing memory arbiters of SoC buses (e.g., AMBA [6] or Avalon [3]), the proposed arbitration process is performed in the same cycle as the request happens. This feature increases the memory bandwidth.
JopCMP synchronizes the access of multiple CPUs to shared data structures by a global lock. Due to the implementation of JopCMP in field-programmable gate array (FPGA) technology, we are able to modify the number of processing cores easily. Additionally, the size of the instruction cache of each processor can be configured. Hence, we are able to analyze the behavior of JopCMP in detail. We use three benchmarks with different workload characteristics. A real-world application from industry - a lift controller in an automation factory - represents a medium computationally intensive, fully parallelized application without any accesses to shared data structures. A matrix multiplication benchmark represents a computationally intensive algorithm. This benchmark offers good potential for parallelism with low synchronization overheads. The third benchmark, parallel access to a hash table, represents a workload with high contention on a shared data structure. It stresses our current solution for multiprocessor synchronization. The results of the experimental measurements illustrate the promises and limits of the proposed multiprocessor solution.
The rest of the paper is structured as follows. Section 4.2 presents related work and delivers an insight into JOP. In Section 4.3, we describe the proposed CMP architecture and the interconnection network. Furthermore, the memory controller and the synchronization unit of JopCMP are summarized concisely. Section 4.4 presents the evaluation method (benchmarks and hardware platforms) and the benchmark results. Finally, Section 4.5 concludes the paper and provides guidelines for future work.

4.2 Related Work

The embedded system domain distinguishes between two multiprocessor architectures: heterogeneous multiprocessors and homogeneous multiprocessors. Heterogeneous multiprocessors are often tailored to specific features of the application. The architecture of the system usually combines a core CPU for controlling and communication tasks with additional processing devices for specific functions, e.g., digital signal processing elements, interface processors, or mobile multimedia processing units. This paper concentrates on homogeneous multiprocessors. These systems combine a number of identical CPU cores. In the following sections, we describe three embedded multiprocessor solutions, and their interconnection systems, that are available in FPGA technology. We also provide an overview of JOP, the processor used in our multiprocessor solution.

4.2.1 LEON

Gaisler Research AB designed and implemented a homogeneous multiprocessor system called LEON3-FT-MP [9]. It consists of a centralized shared memory and four LEON3-FT processor cores that are based on the SPARC V8 instruction set architecture [14]. Each CPU consists of a 7-stage pipeline with separate 16 KB data and instruction caches, a memory management unit, a floating-point unit, and hardware support for multiplication and division. All the CPUs, additional I/O controllers, and the memory controllers are connected using two advanced high-performance buses (AHB) of the AMBA specification [7]. One AHB runs at the CPUs' frequency of 266 MHz and connects the processors with the memory controller of the shared memory. Additionally, the high-speed AHB communicates with the low-speed AHB (running at 133 MHz) using an AHB/AHB bridge. The low-speed AHB connects all other peripheral devices with lower speed requirements to the system. The bus frequencies reported in [9] are estimations for a 0.13 µm ASIC implementation. A prototype in a Xilinx Virtex-4 can run at 40 MHz.
According to the AMBA specification, a CPU takes the role of a master because it initiates the transactions with other components (slaves). The pipelined AHB bus can integrate up to 16 masters into an SoC. An arbiter controls the shared system bus. AHB specifies all interface signals between the masters and the arbiter, and between the arbiter and the slaves. Even though the arbitration protocol of the AHB is well defined, no priority strategies or arbitration algorithms are specified. The LEON implementation of the AHB arbiter uses fixed priorities. Our proposed JopCMP system includes several different arbiters. For this evaluation, we use a fair arbitration algorithm. Without a fairness-based arbiter, a system consisting of more than 4 CPUs cannot exploit its full performance. The LEON multicore system supports two operating systems: eCos and RTEMS. Software is developed in C/C++.

4.2.2 MicroBlaze

MicroBlaze [29] based CMPs can be designed with the Xilinx Embedded Development Kit (EDK). MicroBlaze is a 32-bit reduced instruction set computer (RISC) optimized for FPGA implementation. The pipeline length of the CPU can be configured to either 3 or 5 stages. It implements the Harvard architecture with separate instruction and data buses. The CPU can be tailored to the application needs (i.e., peripheral controllers or cache sizes). Memory and peripheral devices are connected via the on-chip peripheral bus (OPB) [12]. OPB is part of the bus hierarchy called CoreConnect [11], an open standard for SoC communication proposed by IBM. Xilinx provides an OPB bus arbiter [13, 28] that can integrate up to 16 masters into the system. The available arbitration algorithms include fixed priority (FP) and least recently used (LRU). A full-featured GNU tool chain is available for software development in C/C++.

4.2.3 NIOS II

Altera's Nios II [4] and the System-on-a-Programmable Chip (SOPC) Builder [5] support the design and implementation of CMPs in Altera's FPGA technology. The Nios RISC architecture implements a 32-bit instruction set similar to the MIPS instruction set architecture. The sizes of its instruction and data caches are configurable. Nios II can be customized to meet the application requirements: three different models are available, from non-pipelined up to a 6-stage pipeline. Examples of customizable features are a floating-point unit, memory controllers, and different communication controllers. Avalon [1] is the SoC bus used by the SOPC Builder. It connects the master and slave components to the System Interconnect Fabric. This System Interconnect Fabric encapsulates all connection details from the user. While the Avalon specification can be used freely, the System Interconnect Fabric is Altera's property.
For multiprocessor systems, the System Interconnect Fabric integrates an arbitration module [2]. In contrast to traditional shared bus architectures, the interconnection allows multiple masters to access different slaves simultaneously. This eliminates the bottleneck of one shared bus when one master accesses one slave while another master wants to access a different slave in parallel. For a multiprocessor system where two or more masters frequently access one slave (the shared memory), the System Interconnect Fabric provides no advantage. If several masters request data from the same slave, an arbiter will determine which master gains access. All other masters are forced to wait. The arbitration logic can be configured in the SOPC Builder. The arbitration schemes include fairness-based shares, round-robin scheduling, burst transfers, and minimum share value. The Nios II system supports the uClinux operating system, and the C/C++ GNU tool chain is available.

4.2.4 Java Optimized Processor (JOP)

The Java Optimized Processor (JOP) [22, 23, 24] is an implementation of the Java Virtual Machine (JVM) in hardware. JOP translates the Java bytecodes to its own instruction set called microcode. These microcode instructions, implemented in hardware, are executed by the stack architecture. The CPU has a 4-stage pipeline. Each thread has a local stack area. This thread-private data is accessed very often. Therefore, JOP caches this data in a so-called stack cache [21]. Additionally, a kind of instruction cache (called method cache [20]) limits the memory access frequency and increases the processing power. Complete methods, shared among all the threads, are cached there. According to the JVM specification [17], the heap stores the shared data of the VM. All objects that are created by a Java application are stored on the heap. Caching of these objects is not implemented. A typical JOP configuration contains the CPU core, the method and stack caches, a memory interface, and several I/O controllers.
JOP is designed for embedded real-time systems where the analysis of the worst-case execution time (WCET) of all threads is possible. Hence, a couple of typical architectural advancements, used to increase the average processing power, have been omitted. Examples include branch prediction and out-of-order execution. Nevertheless, JOP shows good average performance and lower logic resource consumption in comparison to other Java processors. Therefore, our Java multiprocessor system is based on JOP.

4.2.5 Discussion

According to [8], the inter-core communication in CMPs offers more bandwidth than the backplane buses used for building traditional SMPs. Additionally, the latency of a transfer is much lower on an SoC bus. The described multiprocessors still use backplane-style buses that are not appropriate for an SoC interconnection. Furthermore, there is no use for a complex bus hierarchy in our design. Our system consists of a couple of CPUs connected to a single shared memory. Therefore, our choice for the interconnection network is the simple SoC bus called SimpCon [25], which is further described in Section 4.3.2. Moreover, we use a fairness-based arbitration algorithm. To our knowledge, LEON's IP library does not include a fair arbiter. A disadvantage of Nios II based multiprocessor systems is that data cache coherency is not supported. The data caches are disabled for a multiprocessor system. In our proposed solution, we limit data caching to the thread-private JVM stack.
JOP is open source and freely available for academic research. Every single part of the processor core can be customized and configured. JOP is technology independent (like LEON) and has been ported to FPGAs from Altera, Xilinx, and other vendors. This property avoids a lock-in to a single FPGA vendor, as is the case for MicroBlaze and Nios.

4.3 Overview of JopCMP

According to [27], a multiprocessor system consists of 3 major subsystems: processing elements, memory, and an interconnection network. JopCMP implements the symmetric (shared-memory) multiprocessor (SMP) model [10]. JOPs provide the basis of the homogeneous CMP as depicted in Figure 4.1. These processing elements perform computations in parallel. Instructions and data are stored in a single shared memory. The interconnection network is responsible for connecting the multiple processors with the memory. An arbiter is part of this network and controls the access to the shared memory. An SoC bus is used to connect the processing cores to the arbiter, and the arbiter to the shared memory. The arbiter acts as a slave for each JOP and as a master for the memory controller. We are convinced that synchronization of shared data is a further major subsystem of an SMP. It is responsible for coordinating access to shared objects. The following sections describe these elements in more detail.


Figure 4.1: Overview of JopCMP.

4.3.1 Memory Hierarchy

JOP's memory hierarchy with its caches (see Section 4.2.4) and the shared memory architecture fit together very well. JOP does not support caching of Java objects. Hence, cache coherency and consistency issues cannot arise. The instruction cache is read-only and therefore not an issue. The stack cache contains only thread-local data, so no cache coherency protocol (e.g., snooping) is needed. Avoiding such a protocol in an FPGA saves resources and does not impair the maximum clock frequency of the CMP. Our multicore solution avoids cache coherency conflicts by design.
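The distinction that makes this work can be illustrated in plain Java (a hedged sketch; the class and field names are ours, not from the JopCMP sources):

// Illustrates the two data categories relevant for JopCMP's caches.
// Local variables live on the thread-private JVM stack: JOP serves them
// from the stack cache, so no coherence protocol is needed for them.
// Heap objects are shared and never cached: every access goes through
// the arbiter to the shared memory.
public class DataCategories {

    static int[] sharedBuffer = new int[64]; // heap object: shared, uncached

    static int localWork(int seed) {
        int a = seed * 31; // locals: thread-private, served by the stack cache,
        int b = a + 7;     // no main-memory traffic
        return a ^ b;
    }

    public static void main(String[] args) {
        sharedBuffer[0] = localWork(42); // heap write: goes to shared memory
    }
}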

4.3.2 Interconnection Network

The selection of the interconnection network topology is a major design decision of a multiprocessor architecture. We use an SoC bus with a central arbitration unit.
Traditionally, a bus is a set of wires that connects multiple masters and multiple slaves of a system on a printed circuit board. A master initiates each communication. All communication channels between the interfaces of the exchanging devices represent the so-called backplane bus. The bus arbiter implements a certain priority algorithm that controls this shared bus. If multiple masters request access to the bus, the arbiter allocates the shared bus resource to one master. All other masters are forced to wait. Consequently, multiple masters cannot drive the bus concurrently.
The simple SoC interconnect (SimpCon) [25] is used to connect SoC components on a single IC. This synchronous on-chip interconnection is intended for read and write transfers via point-to-point connections. Only a master can initiate a transaction via a write or a read request. In comparison to other commonly used SoC buses like Avalon [2], this specification does not work like a backplane bus. The master's control, address, and data lines are only valid for a single clock cycle. A slave has to register any signals (e.g., the address) that are needed for several clock cycles. Consequently, the master can continue to execute its task until the data of an access is needed. Furthermore, the slave can inform the master early (up to two cycles ahead) when a bus transaction will finish. Therefore, pipelined memory accesses and consequently fast data transfers are possible.
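This handshake can be modeled at cycle level in a few lines of Java (a minimal sketch under our own assumptions; the field and method names are illustrative and not taken from the SimpCon specification). The slave registers the single-cycle address itself, and a down-counting rdy_cnt tells the master how many cycles remain until the data is valid:

// Cycle-level model of a SimpCon-style read with an assumed fixed
// 2-cycle memory latency. The master may issue the next (pipelined)
// request as soon as rdyCnt indicates that completion is near.
class SimpConSlaveModel {
    private final int[] memory;
    private int latchedAddr;  // the slave registers the one-cycle address
    private int rdyCnt = 0;   // 0 = idle, or data valid
    private int data;

    SimpConSlaveModel(int[] memory) { this.memory = memory; }

    void readRequest(int addr) { // rd and address are asserted for one cycle
        latchedAddr = addr;
        rdyCnt = 2;              // assumed memory access time: 2 cycles
    }

    void clockTick() {
        if (rdyCnt > 0 && --rdyCnt == 0) {
            data = memory[latchedAddr]; // data becomes valid at rdyCnt == 0
        }
    }

    int rdyCnt() { return rdyCnt; }
    int readData() { return data; }
}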

4.3.3 Fairness-based Arbiter

SimpCon is well suited for on-chip point-to-point connections. We introduce a central arbiter to connect multiple masters (JOPs) to one slave (the memory controller). This arbitration device controls the memory access of multiple CPUs to the shared memory. An adequate priority policy has to be implemented to resolve competing memory requests of the CPU cores. If memory request contention happens, only one master is granted access and all others are forced to wait.
Usually, the arbitration policy of the arbiter depends on the application needs. An example of a dynamic arbitration scheme depending on the CPU priorities is described in [18]. Each CPU of the system is assigned a unique priority. If memory access contention occurs, the CPU with the highest priority will be granted access. This arbitration policy can be used for real-time systems where one CPU executes hard real-time¹ tasks and the other ones execute tasks with minor requirements regarding deadlines. Consequently, the hard real-time CPU gets the highest priority of the system.
In this paper, we analyze the performance of a balanced CMP. Therefore, an arbitration policy is implemented that guarantees fairness among the CPUs accessing the shared memory. Furthermore, starvation of any CPU is prevented. Each CPU in the system is assigned a unique CPU identity (CPUID), starting from 0 up to the number of CPUs minus 1. Our fair arbitration policy uses a wrapping counter that determines which CPU is currently permitted to access the memory. The value of the counter has the same range as the CPU identities. At the time of completion of the prior memory access, the counter is advanced. If the new counter value equals the CPUID of a requesting CPU and the memory is ready to execute a memory access, the memory access is processed and the current value of the counter is held until completion of the data transmission. If the CPU whose CPUID equals the value of the counter does not want to access the memory, the counter is immediately advanced.

¹ A hard real-time task has to deliver its results on time. A single missed deadline may result in a disastrous accident.

[Figure 4.2: Memory access arbitration of the fairness-based arbiter. The timing diagram shows the arbiter-internal signals clk and counter, the per-CPU signals rd_in, addr_in, and data_out, and the memory-side signals rd_out, addr_out, data_in, and rdy_cnt_in.]

Figure 4.2 shows an arbitration scenario of a 2-way CMP system with a memory access time of 2 cycles. The signals clk and counter are internal signals of the arbiter. All other signals are either input or output signals of the arbiter, as indicated by the signal's name. Furthermore, the subscripts indicate whether the signals belong to a specific CPU (denoted by the CPUID) or to the memory controller. Some SimpCon signals, i.e., the signals for write access, are disregarded in Figure 4.2.

At the first clock cycle, both CPU0 and CPU1 want to perform a read access to the shared memory simultaneously. CPU0 is immediately allowed to perform the read access because the counter's value equals 0 and the memory is idle (rdy_cnt_inM equals 0). Consequently, the read enable signal of the memory (rd_outM) is driven high and the memory address (addr_outM) is asserted. The read request of CPU1 is registered in the arbiter. It has to wait until completion of the memory access of CPU0, indicated by the value 0 of signal rdy_cnt_inM and by the received data on data_inM and data_out0 accordingly. At completion of the memory access, the counter has already been incremented by one and the registered memory access of CPU1 is processed. When the data is available, the counter already equals 0 again. Unlike CPU1, CPU0 does not request a memory access, and the counter is advanced in the following cycle. CPU1's access is not granted because the counter value equals 0 in the current clock cycle. In the next cycle, the registered memory access of CPU1 is processed.
This arbitration algorithm implements a fair partitioning of the memory bandwidth. The more CPUs are part of the system, the higher the probability that the counter matches a CPUID with a pending memory request after a successful access. Therefore, a high workload will result in a saturation of the memory bandwidth. In case of low contention between several CPUs, this scheme wastes memory bandwidth (and performance) because delays of memory access grants are introduced.
The arbiter performs the arbitration decision in the same cycle as the request arrives. Therefore, we have a zero-cycle arbitration protocol. No additional cycle is lost for arbitration and the memory access latency is not affected. Zero-cycle arbitration latency is an advantage in comparison to existing arbiters like AMBA [7] and Avalon [2], which always use an extra cycle for a so-called bus request phase. Consequently, each of their accesses takes more time. This fairness-based arbiter is implemented at Register Transfer Level (RTL) in VHDL. The arbitration process is primarily implemented in combinational logic without considerably decreasing the clock frequency of the whole system.
The arbiter is fully scalable with respect to the number of connected masters. Compared to existing arbiters like AMBA [7] or CoreConnect [13], the maximum number of connected masters is not limited to 16. Hence, the application determines the quantity of connected masters.
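For illustration, the grant decision of this wrapping counter can be written down as a few lines of Java (a behavioral sketch of the scheme described above, not the VHDL implementation; all names are ours):

// Behavioral sketch of the fairness-based arbiter. One call per clock
// cycle; request[i] is true while CPU i has a pending memory access.
// Returns the CPUID granted in this cycle, or -1.
class FairArbiterModel {
    private final int nCpus;
    private int counter = 0;    // wrapping counter over the CPUIDs
    private int busyCycles = 0; // remaining cycles of the current access

    FairArbiterModel(int nCpus) { this.nCpus = nCpus; }

    int tick(boolean[] request, int memAccessTime) {
        if (busyCycles > 0 && --busyCycles == 0) {
            counter = (counter + 1) % nCpus; // advance on completion
        }
        if (busyCycles == 0 && request[counter]) {
            busyCycles = memAccessTime;      // grant in the same cycle:
            return counter;                  // zero-cycle arbitration latency
        }
        if (busyCycles == 0) {
            counter = (counter + 1) % nCpus; // turn not used: advance at once
        }
        return -1;
    }
}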

4.3.4 Memory Controller

The memory controller is the interconnection between the arbiter and the shared memory. Controllers for different types of memory are available for the SimpCon bus. The controller supports pipelined read and write commands. The pipelining of SimpCon, the arbiter, and the memory controller allows back-to-back reads from the memory. Furthermore, due to the definition of the SimpCon interconnect, it is possible to use the registers in the I/O cells of the FPGA for all latency-sensitive SRAM signals (address bus, data bus, and control lines).

4.3.5 Synchronization

Shared-memory SMP systems need a synchronization mechanism. The CPUs exchange data by reading and writing shared data objects. In order to ensure that a CPU has exclusive access to such an object, synchronization is necessary. Therefore, we introduced a synchronization unit in hardware that controls one global lock. If one core wants to access a shared object, it will request the lock using the synchronization interconnection depicted in Figure 4.1. The JOP will be granted access if no other processor of the system is holding the lock. Otherwise, it must wait until the other processor completes accessing the shared object. The hardware lock allows a fast implementation of the bytecodes monitorenter and monitorexit that are used by the JVM for synchronization. For short critical sections, this feature compensates for the less reactive behavior of a single global lock. A side effect of a single lock is the avoidance of deadlock by design. Further information on the synchronization of JopCMP can be found in [18].
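From the programmer's perspective nothing special is required: ordinary Java synchronized methods and blocks compile to monitorenter and monitorexit, which JopCMP maps onto the single hardware lock. A small example of our own:

// Any synchronized method or block ends up as monitorenter/monitorexit
// bytecodes. On JopCMP, all monitors map to the one global hardware
// lock, regardless of which object is used for locking.
public class SharedCounter {
    private int value;

    public synchronized void increment() { // monitorenter ... monitorexit
        value++;                           // exclusive across all CPUs
    }

    public synchronized int get() {
        return value;
    }
}

Because every monitor maps to the same lock, two unrelated critical sections on different CPUs still serialize each other; this is exactly the behavior the HTable benchmark in Section 4.4 stresses.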

4.4 Performance Evaluation

In this section, we evaluate the performance of our CMP architecture. We present and compare the performance of our multicore depending on the number of processors and the size of their instruction caches. The benchmarks highlight that several processors working in parallel outperform a uniprocessor that executes the same workload sequentially. Two different hardware platforms form the basis of the presented experiments. Furthermore, we include FPGA synthesis results and compare the performance and size of JopCMP with the complex Java processor picoJava [26].

4.4.1 Benchmarks

Using a multi-core system, application development is more complex because the application code has to be spread out among different processors. We evaluate the CMP with three different benchmarks:

• a real-world embedded application from industry (Lift),


• a matrix multiplication (MMul) and

• an application that is operating on a hash table (HTable).

Our benchmark methodology is as follows: Lift and HTable are executed several times (16384 and 256 times, respectively). This workload is distributed evenly among the CPUs in the multiprocessor versions. The benchmark MMul performs automatic distribution of the workload.

Lift Application

Lift is a real-world example with an industrial background. This embedded application is a lift controller used in an automation factory. Lift is part of the embedded Java benchmark suite called JavaBenchEmbedded, as described in [22]. In fact, the benchmark was written to measure uniprocessor performance. Nevertheless, we use it for executing several Lift tasks on multiple CPUs concurrently. Consequently, this benchmark presents a medium computationally intensive, fully parallelized application without any accesses to shared data structures and without synchronization needs.

Matrix Multiplication

The benchmark MMul is designed to give some idea of the performance of a computationally intensive algorithm with good potential for parallelism. The benchmark multiplies two matrices of dimension 100x100. This calculation results in 1 million multiplication operations. Each row of the resulting matrix is calculated by a single CPU. A synchronization variable ensures that the next idle CPU takes the next unsolved row until the result is complete. The benchmark measures the elapsed time for the calculation. MMul is classified as a parallel workload - computationally intensive with low synchronization overhead.
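The described row distribution can be sketched as follows (our reconstruction of the stated behavior, not the benchmark source; the names, and the use of int elements, are illustrative):

// Sketch of MMul's self-scheduling: every idle CPU claims the next
// unsolved row of the result matrix via the shared synchronization
// variable, protected by the global lock.
public class MMulWorker implements Runnable {
    static final int N = 100;
    static int[][] a = new int[N][N], b = new int[N][N];
    static int[][] c = new int[N][N];
    static int nextRow = 0; // the shared synchronization variable

    static synchronized int claimRow() { // monitorenter/monitorexit
        return nextRow < N ? nextRow++ : -1;
    }

    public void run() {
        int row;
        while ((row = claimRow()) >= 0) { // take the next unsolved row
            for (int j = 0; j < N; j++) {
                int sum = 0;
                for (int k = 0; k < N; k++) sum += a[row][k] * b[k][j];
                c[row][j] = sum;
            }
        }
    }
}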

Hash Table Access

Hash tables are commonly used data structures to manage and look up different data objects. Each value of a hash table is associated with a key. The key permits efficient access to the value. HTable presents an interesting application for the JopCMP: multiple CPUs access the shared data structure in a tight loop, leading to severe synchronization conflicts. HTable measures the elapsed time until a fixed number of read, insert, and delete operations are performed.
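The contended loop can be sketched like this (our illustration of the workload pattern, not the benchmark source; the iteration count is illustrative). On JopCMP, every synchronized hash-table operation competes for the single global lock:

import java.util.Hashtable;

// Sketch of the HTable workload: each CPU performs insert, read, and
// delete operations on one shared table in a tight loop. Hashtable's
// methods are synchronized, so every operation serializes all CPUs.
public class HTableWorker implements Runnable {
    static final Hashtable<Integer, Integer> table = new Hashtable<>();
    static final int OPS = 256; // illustrative iteration count

    public void run() {
        for (int i = 0; i < OPS; i++) {
            table.put(i, i);  // insert
            table.get(i);     // read
            table.remove(i);  // delete
        }
    }
}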

62 Performance Evaluation of a Java Chip-Multiprocessor

4.4.2 Hardware Platforms

We use two different hardware platforms for our evaluation. They differ in FPGA technology and memory bandwidth.

Altera DE2 Board

The system has been prototyped on Altera's Development and Education Board (DE2 Board) with a low-cost Cyclone II (EP2C35) FPGA. It has a capacity of 33,000 logic elements (LEs) and 483,000 bits of on-chip memory. This FPGA can be populated with up to 8 JOP cores. The DE2 Board contains 512 KB SRAM connected via a 16-bit data bus. A single read operation for a 32-bit data item takes 4 clock cycles. On the DE2 board, we run all systems with the same clock frequency (90 MHz). This frequency is the maximum at which all the different configurations can run and that can be generated with the PLL.

Cycore Board

The Cycore FPGA board contains the older Cyclone I FPGA (EP1C12) from Altera. It has a capacity of 12,000 LEs and 239,000 bits of on-chip memory. A 1 MB, 15 ns SRAM is connected via a 32-bit data bus. With this SRAM, it is possible to perform a 32-bit memory read in two cycles for system frequencies up to 100 MHz. Therefore, at the same clock frequency, the memory bandwidth is two times higher than that of the Altera DE2 board.
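As a back-of-the-envelope check (our own arithmetic, from the cycle counts stated above and the clock frequencies used in Section 4.4.3):

\[
BW_{DE2} = \frac{32\,\mathrm{bit}}{4\,\mathrm{cycles}} \times 90\,\mathrm{MHz} = 720\,\mathrm{Mbit/s}, \qquad
BW_{Cycore} = \frac{32\,\mathrm{bit}}{2\,\mathrm{cycles}} \times 60\,\mathrm{MHz} = 960\,\mathrm{Mbit/s}
\]

Per clock cycle, the Cycore SRAM thus delivers 16 bits against the DE2's 8 bits, i.e., twice the bandwidth at equal frequency.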

4.4.3 Measurements

To evaluate JopCMP, we compare the performance of different multicore configurations with the single JOP version under varying workloads and FPGA platforms. As it is expected that the memory bandwidth will restrict the number of useful cores, we also measure the consumed bandwidth. We have integrated a memory access counter into the memory controller to measure the number of cycles the memory bus is busy. Equation 4.1 gives the memory load relative to the available memory bandwidth. The resulting memory bandwidth utilization depends on the size of the instruction cache. It decreases with larger cache sizes because the memory access frequency drops as well.

\[
\mathit{Utilization}_{\mathit{Mem.Bandwidth}} = \frac{\mathit{MemoryAccessTime}}{\mathit{ExecutionTime}} \tag{4.1}
\]
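In code, the measurement boils down to dividing two cycle counters (an illustrative helper of our own; the busy-cycle count is read from the hardware counter in the memory controller):

// Computes Equation 4.1 from two cycle counters read after a benchmark
// run: the cycles the memory bus was busy, and the total execution time
// in clock cycles.
public class MemUtil {
    static double utilization(long memBusyCycles, long totalCycles) {
        return (double) memBusyCycles / totalCycles; // e.g. 0.32 means 32%
    }
}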


Table 4.1: Execution time and memory bandwidth utilization of Lift, Altera DE2 @ 90 MHz

               1 KB Cache       2 KB Cache       4 KB Cache
Number of     Time    Util.    Time    Util.    Time    Util.
JOP Cores     (ms)    (%)      (ms)    (%)      (ms)    (%)
    1         1255     40      1158     32      1158     32
    2          769     66       662     56       662     56
    4          613     83       484     77       484     77
    8          595     85       459     81         −      −

Table 4.1 shows the measured execution time and memory bandwidth utilization of Lift running at a frequency of 90 MHz on the Altera DE2 board. The first column gives the number of JOP cores of the system. Additionally, the size of the instruction cache is varied between 1, 2, and 4 KB for each CPU. The execution time and the memory bandwidth utilization are measured for each combination of number of CPUs and cache size. The 4 KB cache version with 8 cores is missing, as it does not fit into the available FPGA.
One JOP with an instruction cache of 2 KB executes Lift in 1158 ms, and the measured memory bandwidth utilization is 32%. A dual-core performs about 1.8 times faster than a single JOP. A 4-processor system with 1 KB of cache nearly doubles the performance of a single core. The same system with more cache experiences a speedup of 2.4, no matter whether a 2 KB or a 4 KB instruction cache is used. Using more processors does not provide significantly better performance.
Furthermore, Table 4.1 gives information on the memory bandwidth utilization (denoted Util.) of the system. The utilization decreases with larger caches. Nevertheless, no real difference between 2 KB and 4 KB can be seen. We conclude from this measurement that the kernel of the Lift benchmark is small enough to fit into the 2 KB cache. Although the consumed memory bandwidth does not reach the theoretically possible 100%, we see only minor performance differences between a 4-core and an 8-core system.
Table 4.2 depicts the results of the measurement of MMul on the DE2 platform. The computationally intensive algorithm demonstrates its good potential for parallelism. The speedups of the CMPs consisting of 2 and 4 cores come up to our expectations, with speedups of 1.5 and 1.7, respectively. 8 cores provide no additional significant speedup. We assume that a combination of high memory conflicts and increased synchronization cost becomes noticeable.


Table 4.2: Execution time and memory bandwidth utilization of MMul, Altera DE2 @ 90 MHz

               1 KB Cache       2 KB Cache       4 KB Cache
Number of     Time    Util.    Time    Util.    Time    Util.
JOP Cores     (ms)    (%)      (ms)    (%)      (ms)    (%)
    1         2957     52      2957     52      2957     52
    2         1932     79      1932     79      1932     79
    4         1773     86      1773     86      1773     86
    8         1771     86      1771     86         −      −

Table 4.3: Execution time and memory bandwidth utilization of HTable, Altera DE2 @ 90 MHz

               1 KB Cache       2 KB Cache       4 KB Cache
Number of     Time    Util.    Time    Util.    Time    Util.
JOP Cores     (ms)    (%)      (ms)    (%)      (ms)    (%)
    1          413     27       410     26       410     26
    2          423     29       408     26       408     26
    4          341     31       346     29       344     29
    8          263     32       262     30         −      −

The memory bandwidth utilization remains constant at 86%, independent of whether a 4- or 8-way CMP is used. Increasing the cache size does not result in any performance improvements or any change of the consumed memory bandwidth.
Table 4.3 shows the measurements of HTable running at a frequency of 90 MHz on the DE2 board. Unlike Lift and MMul, this benchmark shows a small performance slowdown when comparing a single JOP with a 2-core system with 1 KB of cache. Only slight speedups with the 2 KB and 4 KB versions can be seen. This originates from the application's characteristics: it combines low computational demands with high synchronization overhead. Nevertheless, CMPs with more CPUs cover this overhead by introducing more processing power. The 8-core JopCMP with 2 KB of cache is about 1.6 times faster than the comparable single JOP. HTable's high synchronization demands are easily noticeable in the figures of the memory bandwidth utilization. These results show only a very small increase when adding more CPUs to the CMP.

[Figure 4.3: Performance comparison of JopCMP, running three different benchmarks. The chart plots the relative speed-up of Lift, MMul, and HTable (y-axis) over the number of CPU cores (x-axis: 1, 2, 4, 8).]

Figure 4.3 summarizes the measured execution times of the configurations with 2 KB cache size. The horizontal axis gives the number of CPUs and the vertical axis illustrates the relative speed-up. The relative speed-up is the ratio between the execution time on a single core and that of a multicore version. We can see different saturation points for Lift and MMul. The result of the HTable benchmark is interesting: it only shows a significant speedup with many cores, and we do not see saturation at 8 cores. For this benchmark, it would be interesting to run a 16-way multicore version.
Table 4.4 shows Lift benchmark results based on the Cycore FPGA board. Running at a clock frequency of 60 MHz, the 2-way CMP with 2 KB cache is 1.9 times faster than a single JOP. The system consisting of three processors is 2.5 times faster. In comparison to the DE2 board, the memory bandwidth utilization of 48% of the 3-core CMP with 2 KB cache is lower than that of every 2-core configuration on the DE2 platform. The measurements confirm that the memory of the Cycore board is about 2 times faster than the memory on the DE2 board. Synthesis results show that JopCMP can achieve a maximum clock frequency of 60 MHz on the Cyclone I FPGA. This is reasonable because the Cyclone I series is an older technology compared to Cyclone II. Furthermore, a 3-way JopCMP with 2 KB cache size is the maximum that can be integrated into this FPGA.


Table 4.4: Execution time and memory bandwidth utilization of Lift, Cycore @ 60 MHz

               1 KB Cache       2 KB Cache       4 KB Cache
Number of     Time    Util.    Time    Util.    Time    Util.
JOP Cores     (ms)    (%)      (ms)    (%)      (ms)    (%)
    1         1506     24      1455     18      1455     18
    2          844     45       779     35       779     35
    3          661     57       578     48         −      −

4.4.4 Comparison with a Complex Java Processor

picoJava II [26], Sun's hardware implementation of the JVM, consumes more resources than JOP. An interesting comparison is whether a single, complex processor or a multiprocessor based on simple processors performs better for multi-threaded workloads. In the server domain, Sun has chosen to implement 8 simple RISC cores (with a 6-stage in-order pipeline) in the CMP Niagara [16], while Intel and AMD still use their complex super-scalar out-of-order architectures.
Puffitsch has implemented picoJava II in an FPGA [19] on the same board (DE2) that we use for our evaluation. picoJava II runs at 40 MHz in the Cyclone II. The resource consumption is 27,562 LEs and 381 KBit of memory when configured with a 16 KB instruction and a 16 KB data cache. Puffitsch could run the Lift and the HTable benchmarks on the picoJava FPGA implementation. Table 4.5 shows the comparison between picoJava and various versions of JopCMP with 2 KB instruction cache. The performance of the Lift benchmark and the resource consumption are given relative to picoJava. picoJava executes the Lift benchmark in 632 ms. A single JOP needs 1158 ms for the same workload. That means that picoJava at 40 MHz is about two times faster than a single JOP version at 90 MHz. Note that the resource consumption (LEs and memory for the caches) of picoJava is much higher than for JOP. The two-processor configuration is 5% slower than picoJava and consumes 1/5 of the FPGA resources. A 4-way processor configuration of JOP is about 30% faster than picoJava, but consumes less than half of the resources.
The benchmark HTable executes in 165 ms on picoJava. On this synthetic, lock-intensive benchmark, picoJava is almost three times faster than JOP.


Table 4.5: Performance and size of JopCMP relative to picoJava in the same FPGA board

Number of JOP Cores    Performance    Size    Memory
        1                 0.55        0.11     0.12
        2                 0.95        0.22     0.24
        4                 1.30        0.43     0.47
        8                 1.38        0.84     0.94

picoJava uses two lock registers to cache the last two obtained locks. This hardware feature allows execution of the bytecodes monitorenter and monitorexit in 3 and 2 cycles when the lock is found in one of the lock registers. For those operations with the global lock, JOP needs 18 and 20 cycles.

4.4.5 Synthesis Results

Table 4.6 shows the utilization of different multicore systems within the FPGA device EP2C35 on the DE2 board. The size of the instruction cache of all JOPs is configured to 2 KB. With such a small cache, the utilization of logic elements and on-chip memory is balanced. However, a 2 KB instruction cache and a 0.5 KB data cache (the stack cache) are very small, even for embedded processors. For a CMP system based on FPGA technology, we would prefer devices with a different LE to on-chip memory ratio.
Surprisingly, the frequency does not change significantly with the number of JOP cores in the system. The timing analysis results are obtained with Altera Quartus II. The maximum frequency varies between 110 MHz (for a single JOP) and 96 MHz (for an 8-way CMP). Using the phase-locked loop (PLL) of the FPGA, the clock frequency of all configurations is set to 90 MHz. Our arbiter scales quite well with respect to the maximum clock frequency. The arbiter performs the arbitration decision with zero-cycle latency without hurting the maximum clock frequency as more bus masters are added.

4.5 Conclusion and Future Work

We have shown that even in a medium-sized low-cost FPGA it is possible to run 8 cores of a Java processor in parallel. The performance enhancements by factors of 1.7 and 2.5 with 2 and 4 cores for a real-world application are promising. We did not expect a linear improvement in speed.


Table 4.6: Synthesis Results on the Cyclone II FPGA (EP2C35)

Number of     Resources          On-chip Memory       Frequency
JOP Cores     (LE)      (%)      (KBit)      (%)      (MHz)
    1          3,033      9        45        10        110
    2          6,196     19        90        19        104
    4         11,946     36       180        38        101
    8         23,252     70       360        76         96

A speedup logarithmic to the number of cores would satisfy future processing demands, as the number of transistors that can be integrated into a single chip is still increasing exponentially. However, 2 out of 3 benchmarks saturated at 4 cores. For the benchmarks with almost no synchronization, like Lift and MMul, the bottleneck is the memory bandwidth. The access to the hash table needs a lot of synchronization. For this benchmark, the memory bandwidth utilization is almost independent of the number of processors and the instruction cache size.
The bandwidth of the memory is limited on the DE2 board due to the narrow 16-bit interface to the SRAM. It even limits the performance of a single processor. A comparison between the performance of the Cycore board and the DE2 board shows that the Cycore board, running at 2/3 of the clock frequency, is not that much slower. We conclude that the configuration of the DE2 memory interface limits the usable number of cores to four. We estimate that fast memory and caching can increase the number of useful cores to about 8. However, additional cache memories are not an option with the logic-to-memory ratio of current FPGAs. The on-chip memory is the limiting factor for the configuration with 8 cores. For 8 cores, we had to limit the instruction cache to 2 KB. Reducing the instruction cache further to 1 KB does not impair the performance of the small benchmarks. However, we see a performance reduction in the application benchmark Lift: a 4-processor version with a 2 KB instruction cache performs better than an 8-processor version with a 1 KB instruction cache. Even the two-processor version with 2 KB cache is almost as fast as the 8-processor version with 1 KB cache.
Comparing our JopCMP against a complex Java processor, such as picoJava II, we conclude that a multiprocessor version of a simpler and smaller architecture is more efficient (performance per die area) for parallel workloads. With independent instances of the application benchmark Lift, a 4-core version of JopCMP is 1.3 times faster than picoJava with a die area of about 45% of that of picoJava.
We saturate the memory access up to 86%. Theoretically, 100% bandwidth utilization is possible. We have not saturated the memory bandwidth, as the arbiter does not yet fully support the pipelining of the SimpCon specification. The pipeline is flushed on a switch between cores. We assume that an enhancement of the arbiter will result in 100% utilization of the memory bandwidth with 8 cores.
For applications with a lot of inter-thread communication, the single global lock is clearly the bottleneck (as seen in the hash table test). We consider two different paths of enhancement: (1) adding hardware for several independent locks, and (2) implementing hardware transactional memory. Hardware transactional memory is the more complex solution. However, it results in automatic, finer-grained locking that will improve the performance of the concurrent hash table access.
In this paper, we have evaluated the Java CMP system with respect to average-case throughput. However, JOP is designed as a real-time processor to simplify the WCET analysis. The ultimate goal of our research is a Java multiprocessor architecture for embedded real-time systems that can be analyzed with respect to the WCET of individual tasks.

Acknowledgement

We thank Wolfgang Puffitsch for executing the benchmarks on the picoJava processor and providing the results for our comparison. The research leading to these results has received funding from the Austrian Research Programme FIT-IT under contract number 813039 (TPCM) and the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement number 216682 (JEOPARD).

Bibliography

[1] Altera. Avalon interface specification, April 2005.

[2] Altera. Avalon Memory-Mapped Interface Specification (v3.3), May 2007.

[3] Altera. Creating Multiprocessor Nios II Systems Tutorial, V 7.1. http://www.altera.com, May 2007.

[4] Altera. Nios II Processor Reference Handbook (ver 7.2), October 2007.


[5] Altera. Quartus II Handbook Volume 4: SOPC Builder (ver 7.2), October 2007.

[6] ARM. AMBA specification (rev 2.0), May 1999.

[7] ARM. AMBA Specification (rev 2.0), May 1999.

[8] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. The landscape of parallel computing research: A view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 2006.

[9] J. Gaisler and E. Catovic. Multi-Core Processor Based on LEON3-FT IP Core (LEON3-FT-MP). In DASIA 2006 - Data Systems in Aerospace, volume 630 of ESA Special Publication, July 2006.

[10] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers, 2006.

[11] IBM. The CoreConnect Bus Architecture. http://www.ibm.com, 1999.

[12] IBM. On-chip peripheral bus architecture specifications v2.1, April 2001.

[13] IBM. 32-Bit OPB Arbiter Core Databook revision 1, March 2007.

[14] SPARC International Inc. The SPARC Architecture Manual: Version 8. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1992.

[15] J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy. Introduction to the Cell multiprocessor. IBM Journal of Research and Development, 49(4/5):589-604, 2005.

[16] Poonacha Kongetira, Kathirgamar Aingaran, and Kunle Olukotun. Niagara: A 32-way multithreaded SPARC processor. IEEE Micro, 25(2):21-29, 2005.

[17] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, USA, second edition, 1999.

[18] Christof Pitter and Martin Schoeberl. Towards a Java multiprocessor. In Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007), Vienna, Austria, September 2007. ACM Press.

[19] Wolfgang Puffitsch. picoJava-II in an FPGA. Master’s thesis, Vienna University of Technology, 2007.


[20] Martin Schoeberl. A time predictable instruction cache for a Java processor. In On the Move to Meaningful Internet Systems 2004: Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2004), volume 3292 of LNCS, pages 371-382, Agia Napa, Cyprus, October 2004. Springer.

[21] Martin Schoeberl. Design and implementation of an efficient stack machine. In Proceedings of the 12th IEEE Reconfigurable Architecture Workshop (RAW2005), Denver, Colorado, USA, April 2005. IEEE.

[22] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[23] Martin Schoeberl. A time predictable Java processor. In Proceedings of the Design, Automation and Test in Europe Conference (DATE 2006), pages 800–805, Munich, Germany, March 2006.

[24] Martin Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, doi:10.1016/j.sysarc.2007.06.001, 2007.

[25] Martin Schoeberl. SimpCon - a simple and efficient SoC interconnect. In Proceedings of the 15th Austrian Workshop on Microelectronics, Austrochip 2007, Graz, Austria, October 2007.

[26] Sun. picoJava-II Microarchitecture Guide, March 1999.

[27] Wayne Wolf. High-Performance Embedded Computing: Architectures, Applications, and Methodologies. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.

[28] Xilinx. OPB Arbiter product specification (v1.10c), December 2005.

[29] Xilinx. MicroBlaze Processor Reference Guide, Embedded Development Kit EDK 9.2i. http://www.xilinx.com, June 2007.

5 Time-Predictable Memory Arbitration for a Java Chip-Multiprocessor

Christof Pitter
Institute of Computer Engineering, Vienna University of Technology, Austria

Proceedings of the 6th International Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2008). Santa Clara, California, 2008, pages 115-122.

Abstract In this paper, we propose an approach to calculate worst-case execution times (WCET) of tasks running on a homogeneous Java multiprocessor. The processors access a shared main memory. Hence, the tasks running on different CPUs may influence each other's execution times. Therefore, we implemented a time division multiple access arbiter that divides the memory access time into equal time slots, one time slot for each CPU. This memory arbitration allows calculating upper bounds for the execution time of Java bytecodes depending on the number of CPUs, the size of the time slot, and the memory access time. A WCET analysis tool can utilize these results and generate temporal upper bounds for application tasks. We further explore how the size of the time slot and the number of CPUs in the system influence the WCET results. Furthermore, a real-world application task is used to compare the analyzed results with measured execution times. This paper describes the timing analysis of a time-predictable Java multiprocessor with shared memory.
Keywords: Chip-Multiprocessor, Java, Shared Memory, Worst-case Execution Time

5.1 Introduction

Multiprocessors and particularly chip-multiprocessors (CMP) are gaining importance in the embedded market. CMP technology integrates two or more processing units and a sophisticated communication network into a single integrated circuit. A heterogeneous CMP combines different processing units, connected with their memories, that are each customized to a single part of the application. The homogeneous CMP approach consists of identical CPUs to increase the computation power for general-purpose applications. To save resources and keep the price of the semiconductors low, a single shared main memory is used.
Today, many embedded systems are used for applications where real-time behavior is more important than computation power. Such hard real-time systems must undergo a timing analysis. Therefore, the worst-case execution time (WCET) of each task in the system has to be known. The WCET is the maximum execution time of a piece of code. It is the time a process could eventually need to execute under worst-case conditions on a given processor. In [20], Wilhelm et al. define the goals of WCET analysis concerning the upper bounds of the execution time:

1. they have to be safe, and
2. they should be as tight as possible.
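Stated in symbols (our restatement of these two goals), with t(τ, i) the execution time of task τ for input i:

\[
\max_{i} t(\tau, i) \;\le\; \mathrm{WCET}_{bound}(\tau) \quad \text{(safe)}, \qquad
\mathrm{WCET}_{bound}(\tau) - \max_{i} t(\tau, i) \;\text{as small as possible (tight)}
\]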

Most importantly, the calculated upper timing bounds have to be safe in order to ensure hard real-time behavior of the system. The results must not underestimate the WCET values; otherwise, unpredictable behavior of the system could put the mission at risk, with potentially serious consequences. Moreover, the upper bounds should be as tight as possible to keep the overestimation low in order to conserve resources.
There are three different methods to estimate the WCET of a task: static analysis, measurement, or a hybrid approach combining both methods. The large drawback of the measurement-based method is that its estimates cannot reliably guarantee that the worst-case situation - provoked by a special, rarely occurring input - is part of the measurement [2]. Furthermore, this kind of analysis is rather complex and time-consuming. Measurement-based analysis can be sufficient for soft real-time systems, but the authors believe that static analysis should be the state of the art for modern hard real-time systems. Using static WCET analysis, the WCET of tasks is analyzed before runtime, independent of any input values. The objective of this analysis is to find the path with the maximum execution time through the program code.
This paper proposes a static WCET analysis of a homogeneous CMP with a shared memory. The CMP system is composed of multiple Java Optimized Processor (JOP) [15, 17] cores and a shared memory. The shared memory is uniformly accessible by the homogeneous processing cores. A system-on-chip (SoC) bus connects the devices of the system.
JOP comes with a static WCET analysis tool, which is described by Schoeberl and Pedersen in [18]. A contribution of this paper is the extension of the WCET tool for timing analysis of the multiprocessor system. The key component for real-time analysis of the CMP is the arbiter that divides the memory access bandwidth into time slots, one for each CPU. Hence, we can analyze the WCET of Java bytecodes depending on the size of the time slot, the number of CPUs in the system, and the memory access time. These execution times are the basis for the WCET analysis of tasks. Our approach is described using a simple example. Additionally, we provide measured data of the execution time of the example. The results are obtained by running the application in hardware. Consequently, we are able to compare the analyzed results with measured execution times. Furthermore, the measured and analyzed execution time results of a real-world application show the reliability of the proposed method.
The rest of the paper is structured as follows. Section 5.2 outlines related work on WCET analysis. Section 5.3 presents the CMP system. Additionally, we introduce the approach for time-predictable arbitration of the shared memory access. Section 5.4 summarizes the static WCET analysis based on JOP. Furthermore, it outlines the WCET analysis of the JopCMP system and gives an example. Section 5.5 compares the obtained WCET results with measured execution times. Finally, Section 5.6 concludes the paper and provides guidelines for future work.

5.2 Related Work

WCET analysis is crucial to the timing analysis of hard real-time systems. The task set of a real-time system requires timing validation by schedulability analysis [4, 8]. Hence, the WCET of each task has to be calculated. If and only if these upper bounds of the execution times are known, the schedulability analysis can be performed. Consequently, the result of the analysis shows whether the deadlines of the tasks will be met (guaranteeing that all tasks can be executed by the system) or not.
WCET analysis has been an active and well-established research area in the uniprocessor domain for years. Both Puschner and Burns [12], and Wilhelm et al. [20] give a broad overview of the WCET research. Nevertheless, not all of these achievements can be applied to multiprocessor systems. They are based on the assumption that tasks are independent and cannot influence each other. Using modern multiprocessors with shared resources (i.e., a shared memory), tasks influence each other's execution times and cannot be analyzed in isolation.
To the best of our knowledge, only one research group (from the university of Linköping) has studied the WCET analysis of multiprocessors [1, 13]. These publications are based on a multiprocessor system-on-chip with a shared communication bus, connecting several CPUs with two different types of memory. Each CPU has a private memory, and all the processing units share a common memory for communication. A CPU is equipped with instruction and data caches, which are used to fetch data and instructions from the private memory. During execution, a task can only access private memory and no shared data objects. Hence, all input data must be placed into the private memory before the task can start executing. Consequently, in most cases the execution time of the task can never be influenced by other tasks (compare the simple-task model [5]). However, the communication bus serves as a communication interface between the CPUs and the private memories, and between the CPUs and the shared memory. If a cache miss occurs during task execution, data has to be fetched from private memory using the communication bus. Therefore, some sort of bus arbitration is necessary because several CPUs may request a cache line from their private memories simultaneously.
In this paper, we introduce our approach to WCET analysis of a multiprocessor using shared resources. Even though the application tasks running on different CPUs may influence each other's execution times, we are able to bound the WCET of the real-time tasks.

5.3 JopCMP Architecture

According to [21], a multiprocessor system consists of three major subsystems: processing elements, memory, and an interconnection network. JopCMP implements the symmetric (shared-memory) multiprocessor (SMP) model [3]. Several JOPs provide the basis of the homogeneous CMP, as depicted in Figure 5.1.

Figure 5.1: JopCMP Architecture (JOP 1 to JOP N connected via the SimpCon bus to the arbiter and synchronization unit, which connect to the memory controller, I/O, and the shared memory).

JOP is an implementation of the Java Virtual Machine (JVM) in hardware [15, 17]. It features a stack cache for the private data of each thread. Additionally, a kind of instruction cache (called method cache) limits the memory access frequency and increases the processing power. These real-time processing elements perform computations in parallel. Instructions and data are stored in a single shared memory. The interconnection network is responsible for connecting the multiple processors with the memory. An arbiter is part of this network and controls the access to the shared memory. An SoC bus, called SimpCon [16], is used to connect the processing cores to the arbiter, and the arbiter to the shared memory. We consider the synchronization of shared data as a further major subsystem of an SMP; it is responsible for coordinating access to shared objects. A detailed description of the JopCMP architecture can be found in [10].

5.3.1 Arbitration Challenge

The arbitration of a time-predictable CMP with a shared memory can be divided into two closely coupled challenges:

• Control of the shared memory access
• Timing analysis of the memory access


The arbiter is responsible for partitioning the memory access bandwidth between the CPUs in the system. It controls the access of multiple CPUs to the shared memory. Naturally, if one CPU is accessing the shared memory, no other CPU can simultaneously access it; the others are forced to wait until the CPU has completed its memory transfer. The arbitration unit takes the supervisory and control role for the shared access. Two different arbitration policies exist: dynamic and static arbitration approaches.

A dynamic arbitration policy resolves simultaneous access at runtime. The fixed priority policy is an example: each CPU of the system is assigned a unique priority. If memory access contention occurs, the CPU with the highest priority will be granted access to the memory; all other CPUs will have to wait. This arbitration algorithm implements a dynamic decision scheme depending on the CPU priorities.

The static arbitration policy strictly defines the access pattern before runtime. Consequently, no arbitration is necessary during execution time. This policy is typical for real-time systems because it provides information for the timing analysis. An example of this policy is the time division multiple access (TDMA) scheme.

What is the problem with time predictability of the CMP? In uniprocessor systems, only one processor accesses the memory and we can predict the WCET of a memory access. However, tasks running on different CPUs influence each other's execution times when they access a shared resource [19], i.e. a shared memory. Therefore, we want to remove the interdependencies between the execution times of tasks. If we already knew the pattern of how another CPU's task accesses the memory, the arbitration and the analysis would be a lot easier (compare the work of Pitter and Schoeberl in [9]). Usually, we cannot exactly predict the memory access pattern of multiple CPUs. Therefore, we need an arbitration algorithm that is able to bound the WCET of a task running on a CPU, even though tasks executing on other CPUs may also access the main memory. Consequently, the analysis of WCETs is possible.

According to [1, 11, 13], a TDMA-based policy guarantees a constant bandwidth to each processor. We agree that this arbitration policy is well suited for time predictability in multiprocessor systems with shared resources. Each processor is assigned a predefined part of the bandwidth. The easiest solution to implement this idea is the division of time: each CPU gets an allocated time slot for accessing the shared memory.

We implemented a TDMA arbiter in the hardware description language VHDL. It is fully configurable concerning the number of CPUs and the size of the time slot. Hence, we are capable of running JopCMP with a TDMA arbiter on an FPGA development board. It contains a Cyclone-I FPGA (EP1C12) from Altera and a 1 MB, 15 ns SRAM that provides the main memory of the system. This prototyping hardware is used for experimental measurements of execution times of tasks, as described in Section 5.5.

5.4 WCET Analysis

This section starts with a short introduction to the static WCET analysis of JOP. The remaining sections describe the WCET analysis approach of JopCMP.

5.4.1 Static WCET Analysis based on JOP

Real-time processors like JOP have simpler and less powerful architectures than modern CPUs. Several advanced features that increase the average-case performance (e.g. data caches, out-of-order execution, and branch prediction) are disregarded [17]. Although these methods speed up the execution of programs, they impede the predictability of the timing behavior because the WCET depends on the execution history. Hard real-time processors like JOP benefit from a hardware model that assigns an accurate execution time to each machine instruction.

Using JOP's WCET analysis tool [18], the WCET result of a task can be obtained. A Java program is compiled into class files that include the JVM language instructions called bytecodes. For static WCET analysis, the bytecode sequence is transformed into a directed graph of basic blocks called the control flow graph (CFG). Each basic block consists of bytecode instructions. JOP translates each bytecode to a microcode or a sequence of microcode instructions that are executed by the processor. Every microcode has a fixed execution time. Hence, each basic block can be assigned an exact execution time.

Furthermore, flow facts have to be added to the Java program code in advance. In general, this is the only way to bound the loops and calculate the execution frequency of the basic blocks. The CFG including the flow facts and the mapping to the hardware make the WCET analysis possible using the implicit path enumeration technique (IPET) [6].
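In IPET form (a standard formulation [6]; the notation here is ours), the WCET is the maximum of the accumulated basic-block costs over all execution-frequency assignments that satisfy the structural constraints of the CFG and the loop bounds given by the flow facts:

WCET = max Σ_i f_i · c_i

where c_i denotes the fixed execution time of basic block B_i and f_i its execution frequency. Section 5.4.5 evaluates this objective for a concrete loop.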

5.4.2 Multiprocessor WCET Approach

With the use of the TDMA arbitration scheme, the WCET of an arbitrary memory access of a CPU can be calculated using Equation 5.1.

WCET_access = (n − 1) · t_{time slot} + t_access    (5.1)

79 Chapter 5

The unit of WCET_access is clock cycles. n specifies the number of CPUs in the system: the more CPUs integrated into the system, the longer the WCET of a single memory access. t_access describes the memory access time in clock cycles. The width of the time slot (t_{time slot}), also given in clock cycles, is configurable. It should be configured as small as possible, because the larger the time slot, the higher the WCET_access for a single memory access. The minimum size of the time slot is fixed by t_access; otherwise, a processing unit could never successfully access the memory within one time slot.

To find the overall WCET, the size of the arbiter's time slot is of major importance. The size of the time slot depends on the memory access time. Usually, processors run at a higher clock frequency than memories, and the gap between processor and memory frequency is widening further [3]. Therefore, the use of caches is a common approach to reduce the memory access frequency. Nevertheless, the problem with the memory access cannot be circumvented: if the memory cannot keep up with the CPU frequency, the processor stalls until the data is available. These delays are referred to as wait states and have a large impact on the WCET; a large number of wait states for each memory access will degrade the performance of the system. In our approach, we analyze the bytecodes' WCETs, which are dependent on:

• Number of JOPs integrated in the CMP

• Size of the time slot

• Memory access time

First, the memory access pattern of each bytecode has to be investigated. The number of JOPs and the size of the time slot have to be defined. This configuration of the system introduces a fixed TDMA memory access scheme where each CPU is assigned a time slot of the TDMA period. Subsequently, all preconditions are met to determine the WCET of each bytecode using the algorithm described in Section 5.4.4. JOP's WCET analysis tool uses the generated bytecode estimates to calculate the WCET of the Java source code.
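For reference, Equation 5.1 amounts to the following one-liner (a minimal sketch with a hypothetical helper name, not part of the actual tool):

    // Worst case for one isolated memory access under TDMA arbitration (Eq. 5.1).
    static int wcetAccess(int nCpus, int tTimeSlot, int tAccess) {
        return (nCpus - 1) * tTimeSlot + tAccess;
    }

For the configuration used in Section 5.4.4 (3 CPUs, 15-cycle slots, 2-cycle reads), wcetAccess(3, 15, 2) yields 32 cycles per access. Analyzing whole bytecodes instead, as described next, is tighter: the three reads of iaload are bounded by 41 cycles in total rather than 3 · 32 = 96.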

5.4.3 Bytecode Memory Access Pattern

JOP translates most of the bytecodes to its native microcode instructions. Each bytecode is composed of a single microcode instruction or a series of microcode instructions. Some bytecodes are actually implemented in hardware, and a couple of bytecodes are implemented in Java. The timing analysis of these bytecodes is not part of this work because they have to be analyzed like general Java source code.


Table 5.1: Bytecodes accessing the shared memory.

Type     Bytecodes                                       Memory Area
const    ldc, ldc_w, ldc2_w                              Method area
get      getfield, getstatic                             Heap
put      putfield, putstatic                             Heap
array    aaload, aastore, baload, bastore, caload,       Heap
         castore, daload, dastore, faload, fastore,
         iaload, iastore, laload, lastore, saload,
         sastore, arraylength
call     invokeinterface, invokespecial,                 Method area
         invokestatic, invokevirtual
return   areturn, dreturn, freturn,                      Method area
         ireturn, lreturn, return
new      anewarray, multianewarray, new, newarray        Heap
switch   lookupswitch, tableswitch                       Method area
cast     checkcast, instanceof                           Heap

According to the JVM specification [7], the heap and the method area are shared data areas, whereas the stack is a private data area for each thread. In JOP, the heap and the method area are located in the main memory. Consequently, all bytecodes that work on these areas have to be carefully examined. Some bytecodes access the memory several times, some only once. Hence, it makes sense to have a closer look at the different instructions. Table 5.1 summarizes the bytecodes that access the main memory. As stated before, some bytecodes are implemented in Java, i.e. the bytecodes of type new, switch, and cast. Therefore, they are disregarded in the proposed analysis.

Most memory access patterns of the bytecodes can be statically analyzed, i.e. the bytecodes that access the heap and those of type const. The pattern is only dependent on the memory access time. If the memory access time is known, the memory access pattern of these bytecodes can be analyzed regardless of the source code of the program. An example of such a bytecode is ldc, which pushes a single-word constant onto the stack and therefore needs only one memory access to the method area. JOP translates this bytecode into a series of microcodes; if the memory access time is known, the memory access pattern can be specified using JOP's bytecode implementation. Another example is iaload, which is implemented in hardware. For the analysis of its memory access pattern, we examine the VHDL implementation in combination with ModelSim simulations.

81 Chapter 5 combination with ModelSim simulations. The memory access patterns of the bytecodes of type call and return need a dynamic analysis and are more difficult to attain. Each JOP is equipped with an instruction cache that caches complete Java methods [14]. Conse- quently, the memory access patterns of these bytecodes vary, depending on the history of the execution. If the method is already in the cache, no addi- tional memory accesses are needed to load the method into the cache. If a cache miss occurs, JOP will have to load the whole method into the cache. Depending on a cache hit or a cache miss and the length of the method that has to be loaded, the access pattern has to be created for each individual occurrence of the bytecode in the source code. Therefore, we integrated the generation of the patterns into the WCET analysis tool where the cache in- formation is available. Again, JOP’s microcodes and ModelSim simulations make it possible to analyze the timing of the memory accesses.

5.4.4 WCET Analysis of Bytecodes

Listing 5.1 shows a simplified version of the algorithm to find the WCET of the bytecodes. The inner loop of this algorithm calculates the execution time of the bytecode starting at position. The bytecode array describes the memory access pattern of the instruction. It has a predefined length, and each element contains either a READ/WRITE request or a NOP (no memory access). If the indexed element is a NOP, the execution time, represented by execTime, is advanced. Additionally, the variable position, which indexes the tdmaPattern array, is increased by 1. If the element of the bytecode equals either a READ or a WRITE access, the tdmaPattern decides whether this CPU is allowed to access the memory. In case another CPU is on turn to access the memory (the element equals 0), execTime and position are advanced until the CPU is allowed access again. The WRITE case of the switch-case statement is similar to the READ case and is therefore omitted from the listing.

The outer loop changes the starting position of the calculation in each iteration. The constant TDMA_PERIOD is defined as the number of CPUs multiplied by the size of the time slot. Each iteration of this loop calculates an execution time value of the bytecode. If the new execTime is greater than the current worst-case execution time, it is assigned to wcet. Hence, the resulting WCET of the bytecode, depending on the number of CPUs and the size of the time slot, is available after the last iteration.


Listing 5.1: Algorithm to find the WCET of the bytecodes.

    int wcet = 0;
    for (int i = 0; i < TDMA_PERIOD; i++) {
        int position = i;
        int execTime = 0;
        for (int j = 0; j < bytecode.length; j++) {
            switch (bytecode[j]) {
            case NOP:                // no memory access: just advance time
                execTime++;
                position++;
                break;
            case READ:               // wait until the TDMA pattern allows access
                while (tdmaPattern[position % TDMA_PERIOD] != 1) {
                    execTime++;
                    position++;
                }
                execTime++;          // perform the read in the granted cycle
                position++;
                break;
            case WRITE:
                ...                  // analogous to READ, omitted
                break;
            }
        }
        if (wcet < execTime) {
            wcet = execTime;
        }
    }

The index into tdmaPattern is taken modulo TDMA_PERIOD, as the TDMA access pattern repeats with every period.


Figure 5.2: Time slots of the CPUs (the TDMA period T_TDMA divided into 15-cycle time slots t_{time slot} for CPU0, CPU1, and CPU2, shown over 60 clock cycles).

Figure 5.3: WCET calculation of iaload (the NOP/RD pattern of the bytecode aligned with CPU0's time slots; starting 8 cycles into the slot, the third read has to wait for the next slot of CPU0).

WCET Calculation of Bytecode iaload

This simple example illustrates the calculation for the bytecode iaload. JopCMP contains 3 CPUs, and the time slot is configured to 15 clock cycles. A read access to the main memory takes 2 cycles. Figure 5.2 shows the TDMA memory access pattern: each processor is allocated a time slot for accessing the shared memory. It should be noted that a time slot of 15 cycles permits each CPU to start a memory access only until the 14th cycle. In the 15th cycle, a read access cannot be permitted; otherwise, we could not guarantee that the next CPU is able to access the memory in the first cycle of its time slot. (This originates from the pipelined transactions of the SimpCon specification [16].)

We want to evaluate the WCET of the bytecode iaload. The following access pattern is given: iaload = {NOP, NOP, NOP, RD, NOP, RD, NOP, RD, NOP, NOP}. We can see that iaload performs three read accesses to the memory. The worst case occurs when the index of the outer loop equals 7. This scenario is shown in Figure 5.3: iaload starts with a NOP operation in the 8th cycle of the CPU's time slot. We can see that the third RD access of iaload cannot be executed immediately; it is delayed two time slots until the CPU is allowed to access the memory again. The WCET results in 41 cycles.
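A hypothetical driver around the Listing 5.1 algorithm reproduces this number; the helper construction and the NOP/RD constants are our own illustration:

    // 3 CPUs x 15-cycle slots: positions 0..13 of CPU0's slot allow a read
    // (the 15th cycle is the gap), all other cycles of the period are blocked.
    int[] tdmaPattern = new int[45];
    for (int c = 0; c <= 13; c++) {
        tdmaPattern[c] = 1;
    }
    int[] iaload = {NOP, NOP, NOP, RD, NOP, RD, NOP, RD, NOP, NOP};
    // Running the loops of Listing 5.1 over this pattern yields wcet == 41,
    // with the worst case starting at offset 7 (the 8th cycle of the slot).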

84 Time-Predictable Memory Arbitration for a Java Chip-Multiprocessor

Table 5.2: Java bytecodes and basic blocks of the loop.

Block   Addr.   Bytecode     Cycles   BB Cycles
B1      0:      iconst_0     1
        1:      istore_3     1        2
B2      2:      iload_3      1
        3:      iload_0      1
        4:      if_icmpge    4        6
B3      7:      aload_1      1
        8:      iload_3      1
        9:      aload_1      1
        10:     iload_3      1
        11:     iaload       41
        12:     iload_2      1
        13:     iadd         1
        14:     iastore      46
        15:     iinc         8
        18:     goto 2       4        105


5.4.5 Loop Example

In the following, we systematically analyze the WCET of a simple loop to show how the WCET is calculated. Listing 5.2 shows the source code where a scalar s is added to a vector. This loop is parallelizable because each iteration of the statement in the loop body is independent. Therefore, the loop body could be easily executed on different CPUs in parallel.

Listing 5.2: Simple loop.

    for (i = 0; i < 10; i++) { // @WCA loop=10
        a[i] = a[i] + s;
    }

As explained before, the WCET estimate for each bytecode accessing the main memory has to be calculated depending on the configuration of the CMP. For this example, JopCMP consists of 3 CPUs and the time slot for each CPU is specified as 15 clock cycles. Consequently, one TDMA period is 45 cycles. The bytecodes and basic blocks of the example, as generated by the WCET analysis tool, are shown in Table 5.2. The fourth column presents the execution time in clock cycles for each bytecode, and the fifth column gives the execution time for each basic block.


Figure 5.4: Control flow graph of the simple loop (blocks B1 = 2, B2 = 6, and B3 = 105 cycles; edge frequencies f_s = 1, f_1 = 1, f_2 = 10, f_3 = 10, f_4 = 1).

If we compare the bytecodes with Table 5.1, only iaload and iastore access the main memory. Therefore, their WCETs are dependent on the configuration of the system. The CFG, illustrated in Figure 5.4, is constructed from the basic blocks. The vertices represent the basic blocks, labeled with their names and execution times; all edges are labeled with an execution frequency. Hence, the WCET can be calculated, and the result is 1118 cycles.

We can also measure the execution time of this simple example by running JopCMP on the FPGA development board described in Section 5.3. The measured execution time of the loop example results in 987 cycles. This result and the analytical WCET estimate diverge slightly. The overestimation of the analytical result is not surprising because the analysis always takes the WCET for the bytecodes iaload and iastore into account. In the measurement, some array accesses are executed in fewer clock cycles than the worst case.
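The 1118-cycle bound follows directly from the CFG data above. With the edge frequencies of Figure 5.4, B1 executes once, B2 executes f_1 + f_3 = 11 times (ten loop iterations plus the final test), and B3 executes 10 times, so the IPET objective evaluates to

1 · 2 + 11 · 6 + 10 · 105 = 2 + 66 + 1050 = 1118 cycles.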

5.5 Results

We use the simple loop as an example to find the WCET estimates for varying system configurations. In Table 5.3, the analyzed WCETs of different system configurations are shown. The number of CPUs is varied between 1 and 32. Additionally, the size of the time slot varies between 3 and 48 clock cycles. The first row of the results shows that the WCET of the single JOP is 488 cycles.


Table 5.3: Analyzed WCET of the loop example depending on the system configuration.

# of CPUs   Time Slot (cycles)   Analyzed WCET (cycles)
 1           −                      488
 2           3                      708
 2           4                      668
 2           5                      728
 2           6                      708
 2           7                      728
 2           8                      768
 2           9                      698
 2           10                     718
 2           11                     738
 2           12                     758
 2           24                     998
 2           48                    1478
 4           3                     1068
 4           6                     1068
 4           12                    1238
 4           24                    1958
 4           48                    3398
 8           3                     1788
 8           6                     1788
 8           12                    2198
 8           24                    3878
 8           48                    7238
16           3                     3228
16           6                     3228
16           12                    4118
16           24                    7718
16           48                   14918
32           3                     6108
32           6                     6108
32           12                    7958
32           24                   15398
32           48                   30278


Table 5.4: Analyzed WCET and measured execution time of the loop example.

# of CPUs   Time Slot (cycles)   Analyzed WCET (cycles)   Measured Exec. (cycles)
1           −                      488                      488
2           3                      708                      670
2           6                      708                      666
2           9                      698                      554
2           12                     758                      716
2           24                     998                      502
2           48                    1478                      931
3           3                      888                      838
3           6                      888                      747
3           9                      878                      815
3           12                     998                      735
3           24                    1478                      765
3           48                    2438                     1410

No number is given for the time slot of this configuration because a single JOP does not have to share the main memory bandwidth. The second part of the table shows the results of the dual-core JopCMP with varying time slot sizes. In general, the WCET increases continuously with larger time slot sizes. The systems with more CPUs show a similar behavior concerning the size of the time slot.

When comparing the analyzed WCET results of the 2-way JopCMP, the systems with time slots of 3 and 6 cycles have the same WCET. Only the two bytecodes iaload and iastore of the loop access the main memory; the execution times of all other bytecodes of the loop are not affected by the size of the time slot. The configuration with a time slot of 3 cycles results in a WCET of 22 cycles for iaload and 24 cycles for iastore. If the time slot is 6 cycles, the WCETs are 17 cycles for iaload and 29 cycles for iastore. The sums of the two bytecode WCETs are the same, independent of whether the time slot equals 3 or 6 cycles. Consequently, the execution times of both loop bodies are identical. Note that the configuration with a 4-cycle time slot results in the lowest WCET; in this configuration, the sum of the iaload and iastore times is minimized.

The more CPUs integrated into the system, the longer the WCETs. The configuration with 4 CPUs and a 6-cycle time slot results in a WCET of 1068 cycles; iaload executes in 29 cycles and iastore in 53 cycles. Even though the number of CPUs in the system is doubled, the WCET only increases by 51% compared to the 2-way CMP with the same time slot value.

Table 5.4 compares the measured execution time and the analyzed WCET of the simple loop example. One can see that the measured execution time and the WCET estimate are equal for the single JOP system. This originates from the characteristics of the example: because only one execution path exists in the code, the WCET result and the measured execution time are the same. Furthermore, one can see that the WCET estimates are tight for CMP versions with small time slots; the configurations with a slot size of 9 cycles result in the lowest WCETs. The measured execution times vary greatly depending on the time slot: the 2-way CMP with a 12-cycle time slot executes in 716 cycles, whereas the same system with a slot size of 24 cycles executes in only 502 cycles. This result shows that iaload and iastore frequently execute within a single time slot in the configuration with 24 cycles. In the worst case, they need two time slots to execute, which explains the large difference between the measured and the analyzed execution time.

Furthermore, we used a benchmark called Lift as another example to calculate some WCETs. Lift is a real-world example with an industrial background: this embedded application is a lift controller used in an automation factory. It is part of the embedded Java benchmark suite called JavaBenchEmbedded, as described in [15]. Table 5.5 shows that the WCET of a single JOP results in 8689 cycles. Each CPU of a dual-core JopCMP with a time slot size of three cycles executes the Lift benchmark in 12391 cycles in the worst case. Therefore, the WCET increases only by 43%. The tri-core CMP version experiences an increase of 83% in the execution time compared to the single JOP. Whereas one JOP executes Lift only once, the CMP configuration executes the benchmark three times in parallel.

The last column of Table 5.5 illustrates the pessimism of the WCET analysis. Even though we cannot claim that the measured values are the actual WCETs, the pessimism ratio gives us an idea of the quality of our analyzed results. One can see that there is not much difference between the measurement and the analysis for the single JOP and the 2-way CMP version with a reasonable time slot size. Moreover, the conservatism of the analysis does not increase greatly with three JOPs. Unfortunately, we are not able to integrate more than 3 JOP cores into the available Cyclone-I FPGA. Therefore, no measured execution times and corresponding pessimism ratios are available for the 4-way CMP. We can see that the pessimism is in an acceptable range for a multiprocessor WCET analysis.


Table 5.5: Analyzed WCET and measured execution time of the Lift benchmark.

# of CPUs   Time Slot (cycles)   Analyzed WCET (cycles)   Measured Exec. (cycles)   Pessimism (Ratio)
1           −                     8689                      5830                     1.49
2           3                    12391                      7587                     1.63
2           6                    12580                      7305                     1.72
2           9                    13621                      7623                     1.79
2           12                   14867                      8131                     1.83
2           24                   19391                      8409                     2.31
2           48                   28618                      9499                     3.01
3           3                    15872                      9234                     1.72
3           6                    16186                      8721                     1.86
3           9                    18072                      9720                     1.86
3           12                   20456                     11097                     1.84
3           24                   29735                     12489                     2.38
3           48                   48106                     14211                     3.36
4           3                    18827                         −                        −
4           6                    19876                         −                        −
4           9                    22621                         −                        −
4           12                   26171                         −                        −
4           24                   40079                         −                        −
4           48                   67594                         −                        −


5.6 Conclusion and Future Work

In this paper, we have described the extension of JOP's WCET tool for a homogeneous multiprocessor with a shared memory. The key component for real-time analysis of JopCMP is a TDMA arbiter that divides the memory access bandwidth into equal shares. Hence, we can analyze the WCET of Java bytecodes depending on the size of the time slot, the number of CPUs in the system, and the memory access time. These bytecode WCETs are used in the WCET analysis tool to calculate WCET estimates of tasks. A simple loop example is presented, and the WCET estimates are compared to real measurement results.

In the future, we will conduct extensive experiments using more benchmarks to investigate the frequencies of the bytecodes that access the main memory. The application determines the time slot size needed to achieve tight WCET results: a short slot is appropriate if single memory accesses are dominant in the application code, a medium slot size for frequent field and array accesses, and a large slot size for predominant method cache loads. Consequently, we will be able to better justify whether to define the time slot larger or smaller. Furthermore, we want to extend the solution and present an idea of how tighter WCET estimates can be obtained, in order to increase the accuracy of the analysis. We will investigate a percentage-based arbitration of the available memory access bandwidth. The memory bandwidth per CPU will then be adjusted depending on the workload of the multiple CPUs: for example, if CPU0 needs 60% of the available memory bandwidth, it will receive 60% of the bandwidth share or of the time slots accordingly.

Acknowledgement

The research leading to these results has received funding from the Austrian Research Programme FIT-IT under contract number 813039 (TPCM).

Bibliography

[1] Alexandru Andrei, Petru Eles, Zebo Peng, and Jakob Rosen. Predictable implementation of real-time applications on multiprocessor systems-on-chip. In VLSI Design, pages 103–110. IEEE Computer Society, 2008.

[2] Andreas Ermedahl and Jakob Engblom. Execution time analysis for embedded real-time systems. Pages 35.1–35.17. Chapman & Hall/CRC - Taylor and Francis Group, August 2007.

[3] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers, 2006.


[4] Mathai Joseph and Paritosh K. Pandya. Finding response times in a real-time system. Comput. J., 29(5):390–395, 1986.

[5] H. Kopetz. Real-time systems: design principles for distributed embedded applications. Kluwer Academic Publishers, 1997.

[6] Yau-Tsun Steven Li and Sharad Malik. Performance analysis of embedded software using implicit path enumeration. In Workshop on Languages, Compilers, & Tools for Real-Time Systems, pages 88–98, 1995.

[7] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, Reading, MA, USA, second edition, 1999.

[8] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46–61, 1973.

[9] Christof Pitter and Martin Schoeberl. Time predictable CPU and DMA shared memory access. In International Conference on Field-Programmable Logic and its Applications (FPL 2007), Amsterdam, Netherlands, August 2007.

[10] Christof Pitter and Martin Schoeberl. Performance Evaluation of a Java Chip-Multiprocessor. In Proceedings of the IEEE Third Symposium on Industrial Embedded Systems (SIES 2008), Montpellier, France, June 2008.

[11] F. Poletti, D. Bertozzi, L. Benini, and A. Bogliolo. Performance Analysis of Arbitration Policies for SoC Communication Architectures. Design Automation for Embedded Systems, 8:189–210, 2003.

[12] Peter Puschner and Alan Burns. A review of worst-case execution-time analysis. Journal of Real-Time Systems, 18(2/3):115–128, May 2000.

[13] Jakob Rosen, Alexandru Andrei, Petru Eles, and Zebo Peng. Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip. In RTSS, pages 49–60. IEEE Computer Society, 2007.

[14] Martin Schoeberl. A time predictable instruction cache for a Java processor. In On the Move to Meaningful Internet Systems 2004: Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2004), volume 3292 of LNCS, pages 371–382, Agia Napa, Cyprus, October 2004. Springer.

[15] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[16] Martin Schoeberl. SimpCon - a simple and efficient SoC interconnect. In Proceedings of the 15th Austrian Workshop on Microelectronics, Austrochip 2007, Graz, Austria, October 2007.

[17] Martin Schoeberl. A Java processor architecture for embedded real-time systems. Journal of Systems Architecture, 54/1–2:265–286, 2008.


[18] Martin Schoeberl and Rasmus Pedersen. WCET analysis for a Java processor. In Proceedings of the Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2006), pages 202–211, Paris, France, October 2006. ACM.

[19] Lothar Thiele and Reinhard Wilhelm. Design for timing predictability. Real-Time Systems, 28(2-3):157–177, 2004.

[20] Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David B. Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter P. Puschner, Jan Staschulat, and Per Stenström. The worst-case execution-time problem - overview of methods and survey of tools. ACM Trans. Embedded Comput. Syst, 7(3), 2008.

[21] Wayne Wolf. High-Performance Embedded Computing: Architectures, Applications, and Methodologies. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2006.

6 Further Analysis and Evaluation

This chapter is based on a submitted paper called A Real-Time Java Chip-Multiprocessor. It provides a coherent view of the three different arbitration policies described in previous chapters for use in a CMP with shared memory. This chapter analyzes and evaluates different arbitration configurations with respect to WCET and average-case performance. In comparison to Chapter 5, configurations with more processors are evaluated.

6.1 Memory Arbitration Revisited

The memory arbitration of a real-time CMP with a shared memory presents a number of closely related challenges:

• Synchronization of memory access

• Timing analysis of memory access

• Zero-cycle arbitration

• Scalability with the number of CPUs

• Support of burst mode memory access

The arbiter controls the memory access of multiple CPUs to the shared memory. Naturally, if one CPU is accessing the memory, no other CPU can access it concurrently; it is forced to wait until the CPU on turn has completed its memory transfer. The memory arbiter resolves this potential parallel access by serializing the CPUs' read and write operations.

Two different arbitration policies exist: the dynamic and the static arbitration approach. A dynamic arbitration policy resolves concurrent access of two or more CPUs at runtime. An example is a priority-based arbitration scheme where each CPU in the system is assigned a unique priority. If memory access contention occurs, the CPU with the highest priority will be granted access to the memory and all other CPUs will have to wait. The static arbitration policy defines the access pattern before runtime. Consequently, no arbitration is necessary during execution time. Implementation of this policy is typical for real-time systems to prevent memory access contention: each CPU has an a priori allocated time to perform its operations on the memory.

In uniprocessor systems, only one processor accesses the memory and the WCET of a memory access can be predicted. However, tasks running on different CPUs of a CMP influence each other's execution times when accessing a shared resource [10], e.g. a shared memory. Therefore, a removal of the interdependencies between task execution times is the stated objective. As described in Chapter 2, a fixed memory access pattern of an application task, e.g. a DMA unit, simplifies the arbitration and the timing analysis. Usually, however, the memory access patterns of multiple CPUs cannot be accurately predicted. Therefore, an arbitration algorithm is needed that enables WCET analysis of a task running on a CPU, even though tasks executing on other CPUs may also access the main memory. Consequently, an analysis of bounded WCETs is possible.

All described arbiter designs perform the arbitration decision in the same cycle the request arrives. No additional cycle is lost for arbitration, and the memory access latency is not affected. As a result, the memory access time is kept low and the bandwidth increases considerably. Furthermore, the arbiters can be configured for variable numbers of CPUs; the CMP system is thus customized to the application needs.

Typically, CPUs access a memory either using single read/write accesses or a burst access mode. Single read or write accesses are used to exchange data and instructions. The burst access mode ensures fast transmission of consecutive chunks of memory; this feature has become state-of-the-art for loading cache memories. Therefore, memory arbiters have to support the burst mode.


Figure 6.1: Memory access arbitration of the fixed priority arbiter (SimpCon signals rd_in, addr_in, and data_out of CPU0 and CPU1, and rd_out, addr_out, data_in, and rdy_cnt of the memory controller, over eight clock cycles).

6.1.1 Fixed Priority Arbiter

The fixed priority arbitration policy is a typical example of a dynamic arbitration scheme. Each CPU in the system is assigned a unique CPU identity, hereinafter referred to as CPUID. This CPUID establishes the priority of each CPU: the CPU with the lowest CPUID has top priority to access the shared memory. The memory arbiter resolves simultaneous memory accesses by determining an access priority order.

In Figure 6.1, an arbitration scenario of a 2-way CMP with a memory access time of 2 cycles is shown. The cycle numbers are specified at the top of the figure. All depicted signals are either input or output signals of the arbiter, as indicated by the signals' names. Furthermore, the subscripts indicate whether the signals belong to a specific CPU (denoted by the CPUID) or to the memory controller. Some SimpCon signals, e.g. the signals for write access, are disregarded in Figure 6.1.

At the first clock cycle, both CPU0 and CPU1 want to perform a read access to the shared memory. CPU0 is immediately granted access, given that the memory is idle (rdy_cnt_inM equals 0), because it has a higher priority than CPU1. Consequently, the read enable signal of the memory (rd_outM) is driven high and the memory address (addr_outM) is asserted. The read request of CPU1 is registered in the arbiter; it has to wait until CPU0 has finished accessing the memory, indicated by the value 0 of signal rdy_cnt_inM and no further pending request of CPU0. At the fourth cycle, CPU1's registered request has to wait again until completion of another memory access of CPU0. At the seventh cycle, CPU0's data is available and the registered memory access of CPU1 is processed. At the eighth cycle, CPU0 wants to access the memory again; this read access is registered in the arbiter and will be performed after CPU1 has completed its memory access.

The fixed priority arbiter has been used for a WCET-analyzable configuration of a single CPU and a DMA device [6]. The DMA device, e.g. a graphics controller, performs a regular memory access within a short period of time and is assigned top priority.
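The grant decision itself is simple. A minimal sketch in Java (assumed names; the actual arbiter is implemented in VHDL):

    // Among all CPUs with a pending request, the lowest CPUID wins,
    // but only when the memory is idle (rdyCnt == 0).
    static int grantFixedPriority(boolean[] pending, int rdyCnt) {
        if (rdyCnt != 0) return -1;        // memory busy: no grant this cycle
        for (int id = 0; id < pending.length; id++) {
            if (pending[id]) return id;    // lowest CPUID = highest priority
        }
        return -1;                          // no request pending
    }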

6.1.2 Fair Arbiter

The fair arbiter implements an arbitration policy that guarantees fairness among the CPUs accessing the shared memory. A memory access can be either a pipelined or a single access. Furthermore, starvation of any CPU is prevented. Each CPU in the system is assigned a unique CPU identity (CPUID), starting from 0 up to the number of CPUs − 1. Our fair arbitration policy uses a wrapping counter that determines which CPU is permitted to access the memory at a given time. The value of the counter has the same range as the CPU identities. As soon as the preceding memory access is complete, the counter is advanced by one. If the new counter value matches a requesting CPUID and the memory is ready to execute a memory access, the memory access is processed and the counter value remains the same until the data transmission has finished. If the counter shows a CPUID that does not want to access the memory, the counter is immediately advanced.
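A minimal sketch of this counter logic (assumed names, illustration only; the actual arbiter is implemented in VHDL):

    // Called when the memory is idle; 'pending' marks CPUs with a request.
    static int counter = 0;

    static int grantFair(boolean[] pending) {
        int n = pending.length;
        for (int tries = 0; tries < n; tries++) {
            if (pending[counter]) {
                return counter;            // on turn; counter stays put until
            }                               // the transfer completes
            counter = (counter + 1) % n;   // idle CPU: advance immediately
        }
        return -1;                          // no CPU requests the memory
    }
    // After a completed transfer: counter = (counter + 1) % n;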

Figure 6.2 shows an arbitration scenario of a 2-way CMP system with a memory access time of 2 cycles. The cycle numbers are specified at the top of the figure. The signals clk and counter are internal signals of the arbiter; all other signals are either input or output signals of the arbiter, as indicated by their names. Furthermore, the subscripts indicate whether signals belong to a specific CPU (denoted by the CPUID) or to the memory controller.

At the first clock cycle, both CPU0 and CPU1 want to simultaneously perform a read access to the shared memory. CPU0 is immediately allowed to perform the read access because the counter's value is 0 and the memory is idle (rdy_cnt_inM equals 0). Consequently, the read enable signal of the memory (rd_outM) is driven high and the memory address (addr_outM) is asserted. The read request of CPU1 is registered in the arbiter; it has to wait until CPU0 has finished accessing the memory, as indicated by the value 0 of signal rdy_cnt_inM and, accordingly, by the received data on data_inM and data_out0.


Figure 6.2: Memory access arbitration of the fair arbiter (the internal counter together with the SimpCon signals of CPU0, CPU1, and the memory controller, over eight clock cycles).

At the fourth cycle, the memory access has been completed; the counter increments by one and the registered memory access of CPU1 is processed. At the seventh cycle, the data is available but the counter shows 0; consequently, CPU1 is not granted access. As opposed to CPU1, CPU0 does not request a memory access. The counter is therefore advanced at the eighth cycle and the registered memory access of CPU1 is processed.

The more CPUs in the system, the higher the probability that the counter matches a CPUID with a pending memory request right after a successful access. Therefore, a high workload will result in a saturation of the memory bandwidth. In case of low competition among the CPUs, this scheme wastes memory bandwidth (and performance) because cycles without any memory access can occur.

6.1.3 Time-sliced Arbiter

According to [2, 7, 8], a time-sliced or time division multiple access (TDMA) arbitration policy guarantees a constant bandwidth for each processor. Each processor is assigned a predefined part of the bandwidth, which is mapped to an appropriate time slot, so each CPU has an a priori allocated time to perform its operations on the memory. The author agrees that this arbitration policy is suitable for timing analysis of multiprocessor systems with shared resources, as the arbitration configuration provides the information needed for timing analysis.


Figure 6.3: TDMA period T_TDMA consisting of three time slots (each slot t_slot is divided into an access time and an access gap).

Figure 6.3 shows the TDMA memory access pattern for a CMP system with 3 CPUs. Each CPU is allocated a time slot to access the shared memory in every TDMA period. This time slot, configured to a predefined number of clock cycles, is divided into an access time and an access gap. Memory operations of the corresponding CPU can only be started during the access time. During the gap segment, the CPU is not allowed to start a memory access because a switch to the next CPU is necessary. This gap permits the next CPU on turn to access the shared memory in the first cycle of its time slot, because every memory access requested in the previous time slot is reliably completed. The size of the gap depends on the memory access time: the larger the memory access time, the larger the gap.

6.2 Timing Analysis

This section describes the WCET analysis of a CMP system using the specified memory arbiters. Furthermore, WCET results are compared to measured execution times, and the WCET performance is assessed.

6.2.1 Fixed Priority Arbitration Approach

Common to all arbitration approaches is the fact that the WCET of a single memory access is the sum of two parts: one part represents the maximum waiting time before the memory access can be executed, the other part represents the CPU's memory access time without any memory contention. WCET results are given in clock cycles.

The fixed priority arbitration policy assigns a unique priority to each CPU. If memory access contention occurs, the CPU with the highest priority will be granted access to the memory. Using this arbitration policy, the WCET of a memory request of the highest priority CPU, indicated by the subscript 0, can be calculated thus:

WCET_0 = max_{i ≠ 0} (t_{WCET_i} − 1) + t_0    (6.1)

whereby t_0 denotes the memory access time of CPU_0. The other part of the equation represents the maximum waiting time. Let i be a variable ranging from 1 to the number of CPUs − 1, and let t_{WCET_i} be the maximum duration of all possible memory access types of CPU_i. This variable can represent a single memory access, but it can also account for a full method load into the method cache. In the worst possible scenario, one or more CPUs in the system request a memory access one cycle earlier than CPU_0. Therefore, CPU_0 has to wait max_{i ≠ 0} (t_{WCET_i} − 1) cycles until it can read from or write to the memory. Consequently, the WCET of a single memory access of the highest priority CPU is the load time of the longest method of all lower priority CPUs added to the memory access time of CPU_0.

Calculating the WCET of a lower priority CPU's memory access is either impossible or yields a very conservative estimate, depending on the number of CPUs. In the case of a 3-way CMP, for example, the WCET of the lowest priority CPU cannot be estimated because the higher priority CPUs in the system may prevent that CPU from accessing the memory indefinitely. A fixed priority arbiter can be used for systems that execute hard real-time tasks on the top priority CPU, and tasks with non-critical timing requirements on all other CPUs.

6.2.2 Fair Arbitration Approach

The fair arbiter implements fair access to the shared memory for all CPUs of the CMP; this policy avoids starvation of a CPU. The WCET of a memory access by an individual CPU can be calculated using Equation 6.2.

WCET_j = Σ_{i ≠ j} t_{WCET_i} + t_j    (6.2)

As in the case of the fixed priority arbiter, t_{WCET_i} is the WCET of all memory access types of CPU_i. Again, this variable can be either a single memory access or a full method load. In the case of a method load, the internal counter of the arbiter is stopped until the full method load has been completed. After that, the counter is advanced and the next CPU is allowed to access the memory. The worst-case waiting time for a single CPU memory access is therefore the sum of the load times of the longest methods of all other CPUs before the CPU can access the shared memory.

6.2.3 Time-sliced Arbitration Approach

The TDMA arbitration policy defines a static memory access pattern. Each CPU is assigned an allocated time slot. Using the TDMA arbitration scheme, the WCET of a single memory access from an individual CPU can be calcu- lated with Equation 6.3:

WCET_j = (t_gap − 1) + (n − 1) · t_slot + t_j    (6.3)

whereby n specifies the number of CPUs in the system and t_slot defines the size of the time slot in clock cycles. t_j describes the memory access time of CPU_j. In the worst-case scenario, CPU_j requests a memory access in the first cycle of the gap segment (t_gap) of its own time slot. This part of the equation was not made explicit in Equation 5.1 in Chapter 5; nevertheless, it has been considered in the WCET analysis, as demonstrated by the code in Listing 5.1 and its description of the algorithm.

The WCET of a single memory access increases with the number of CPUs in the system. Moreover, the size of the arbiter's time slot is of major importance. The minimum time slot size is predetermined and depends on the memory access time; otherwise, a processing unit could never successfully access the memory within one time slot.

Applying Equation 6.3 to individual instances of memory access results in conservative bytecode WCET bounds. The method presented in Section 5.4 instead calculates the WCET for complete bytecode instructions rather than analyzing the WCET of a single memory access. The bytecode WCETs are dependent on:

• the number of JOPs integrated in the CMP

• time slot size

• memory access time

Assuming all configurations are set and the memory access time is known, the WCET of each bytecode can be determined as described in Section 5.4. JOP's WCET analysis tool uses the generated bytecode estimates to calculate the WCET of a Java application.
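For comparison, the three single-access bounds of Equations 6.1–6.3 can be written as small helper functions (a sketch with our own naming, not the analysis tool's API; tWcet[i] holds the longest memory access type of CPU i, t[i] its plain memory access time):

    // Eq. 6.1: highest-priority CPU under the fixed priority arbiter
    static int wcetFixedTop(int[] tWcet, int[] t) {
        int maxWait = 0;
        for (int i = 1; i < tWcet.length; i++) {
            maxWait = Math.max(maxWait, tWcet[i] - 1);
        }
        return maxWait + t[0];
    }

    // Eq. 6.2: CPU j under the fair arbiter
    static int wcetFair(int[] tWcet, int[] t, int j) {
        int wait = 0;
        for (int i = 0; i < tWcet.length; i++) {
            if (i != j) wait += tWcet[i];
        }
        return wait + t[j];
    }

    // Eq. 6.3: CPU j under the TDMA arbiter
    static int wcetTdma(int tGap, int n, int tSlot, int tj) {
        return (tGap - 1) + (n - 1) * tSlot + tj;
    }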


Table 6.1: Analyzed WCET and measured execution time of the Lift benchmark.

# CPUs   Time Slot (cycles)   Analyzed WCET (cycles)   Best Exec. (cycles)   Worst Exec. (cycles)   Pessimism (Ratio)
1        −                     10567                    6309                   6818                  1.55
2        6                     18793                    8634                  10463                  1.80
2        12                    18417                    8472                   9663                  1.91
2        18                    20529                    9900                  11263                  1.82
2        24                    22713                    9988                  11275                  2.01
4        6                     32001                   13604                  17579                  1.82
4        12                    31683                   14238                  17007                  1.86
4        18                    37917                   17532                  20875                  1.82
4        24                    44505                   18604                  21415                  2.08
8        6                     58305                   24452                  32747                  1.78
8        12                    58275                   27528                  33615                  1.73
8        18                    72693                   35100                  41755                  1.74
8        24                    88089                   37228                  42823                  2.06

6.2.4 WCET vs. Measured Execution Time

In Chapter 5, the hardware used only permits an integration of up to 3 JOPs. Therefore, the CMP system using the TDMA arbiter has been prototyped on Altera's Development and Education Board (DE2 Board) with a low-cost Cyclone II (EP2C35) FPGA. This FPGA can be populated with up to 8 JOP cores, each core equipped with a 1 KB stack cache and a 2 KB method cache. The DE2 Board contains 512 KB SRAM connected via a 16-bit data bus. The design is clocked at 90 MHz, and the main memory is a 16-bit SRAM with an access time of 4 cycles for a 32-bit read operation and 6 cycles for a 32-bit write operation.

A benchmark called Lift is used to calculate WCETs of a real-world application with an industrial background. This embedded application is a lift controller used in an automation factory. Lift is part of an embedded Java benchmark suite called JavaBenchEmbedded, as described in [9].

Table 6.1 compares the measured execution time and the analyzed WCET of the Lift benchmark. The first two columns describe the different system configurations. Column 3 shows that the analyzed WCET of Lift on a single JOP results in 10567 cycles. The result of the dual-core JopCMP with a time slot size of 12 cycles is 18417 cycles in the worst case. Even though both CPUs execute the Lift benchmark simultaneously, their WCETs increase only by 74%. The 8-way CMP with a 12-cycle time slot experiences an increase by a factor of 5.5 in the worst case compared to the single JOP. Whereas one JOP executes Lift only once, the CMP configuration executes the benchmark 8 times concurrently.

Measured execution time results are shown in the fourth and fifth columns. Several measurements were carried out for each configuration, so both the best case and the worst case are represented in the table. It has to be noted that the measured worst case is probably not the real WCET: as the simulation environment does not cover all data possibilities, there is no guarantee that the path for the real WCET has been triggered.

The last column of Table 6.1 illustrates the pessimism of the WCET analysis. It is calculated by dividing the analyzed WCET by the worst measured execution time. The pessimism ratio gives an idea of the quality of our analyzed results. The pessimism for a single CPU is 1.55. The analysis is not a great deal more conservative in configurations with more CPUs and a reasonable time slot size. This slight increase can be explained by the fact that the analysis always takes the worst-case memory arbitration delay of each individual bytecode into account. Nevertheless, the author believes that the pessimism in such cases is within an acceptable range for a multiprocessor WCET analysis.

6.2.5 WCET Performance

One interesting issue is how the CMP system scales with respect to the WCET. The goal of using the time-predictable CMP system is twofold: (1) to provide a system for which safe WCET bounds can be estimated, and (2) to enhance the performance by means of multiple cores within the CMP. The WCET results from Table 6.1 permit an estimation of the WCET performance increase. Executing the Lift benchmark simultaneously increases the WCET on the individual cores, but also the number of iterations that are executed within the whole system. JOP executes Lift in 10567 cycles in the worst case. Assuming a 12-cycle time slot, an 8-way CMP needs 58275 cycles for completion of 8 simultaneously executed Lift benchmarks. Equation 6.4 calculates the resulting speed-up to be 1.45.

Speedup_CMP = (WCET_CPU / WCET_CMP) · 8 ≈ 1.45    (6.4)

Figure 6.4 shows the WCET performance gained through multiprocessing. There is a measurable WCET speed-up for the TDMA arbiter with relatively small slot sizes; choosing a larger slot size actually decreases performance. Compared to the average-case performance increase shown in the following section, the WCET performance enhancements are moderate.

Figure 6.4: WCET performance of the Lift benchmark (WCET speed-up over 1, 2, 4, and 8 CPU cores for time slot sizes of 6, 12, 18, and 24 cycles).

6.3 Performance Evaluation

Although the CMP is designed for hard real-time systems with time-predictable task execution, the average-case performance is still interesting. Applying the three different arbitration schemes, the trade-offs of a time-predictable solution compared to an average-case optimized CMP system can be explored. The benchmarks highlight that several processors working simultaneously outperform a uniprocessor that executes the same workload in sequence. Again, the FPGA-based platform is used to evaluate the different configurations.

6.3.1 Benchmarks

Using a multi-core system, application development is more complex because the application code has to be split up among several processors. Three different benchmarks are used for the CMP evaluation:

• a real-world embedded application in industrial use (Lift),

• a matrix multiplication (MMul), and

• an embedded TCP/IP stack (ejip).


Our benchmark methodology is as follows: Lift is executed 10000 times, and this workload is distributed evenly among the processors. The benchmarks MMul and ejip perform an automatic distribution of the workload.
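For the Lift case, the even split amounts to the following sketch (assumed structure, not the actual benchmark harness):

    static final int ITERATIONS = 10000;

    // Executed once per core: each of the nCpus cores runs an equal share.
    static void runOnCore(int nCpus, Runnable liftIteration) {
        int share = ITERATIONS / nCpus;
        for (int i = 0; i < share; i++) {
            liftIteration.run();       // one Lift iteration
        }
    }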

Lift Application

The Lift benchmark, introduced in Section 6.2.4, was actually written to measure uniprocessor performance. Nevertheless, it is used here for executing several Lift tasks on multiple CPUs concurrently. This benchmark thus represents a moderately computational, fully parallelized application without any synchronization needs.

Matrix Multiplication

The benchmark MMul is designed to give an idea of the performance of a computationally intensive algorithm with good parallelism potential. The benchmark multiplies two matrices with a dimension of 100x100; this calculation results in 1 million multiplication operations. Each row of the resulting matrix is calculated by a single CPU. A synchronization variable ensures that the next idle CPU takes the next unsolved row until the desired result is achieved. The benchmark measures the elapsed time for the calculation. MMul is classified as a parallel workload: computationally intensive with low synchronization overhead.
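The row-claiming scheme described above can be sketched as follows (our assumptions about names and structure, not the actual benchmark code):

    class MMul {
        static final int N = 100;      // matrix dimension used in the benchmark
        static int nextRow = 0;        // shared synchronization variable

        // Executed by each CPU: claim the next unsolved row until all are done.
        static void worker(int[][] a, int[][] b, int[][] c) {
            while (true) {
                int row;
                synchronized (MMul.class) {
                    if (nextRow >= N) return;
                    row = nextRow++;   // the next idle CPU takes the next row
                }
                for (int j = 0; j < N; j++) {
                    int sum = 0;
                    for (int k = 0; k < N; k++) {
                        sum += a[row][k] * b[k][j];
                    }
                    c[row][j] = sum;
                }
            }
        }
    }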

Embedded TCP/IP Stack

As an example of an application with several communicating threads, an embedded TCP/IP stack for Java called ejip is used. The benchmark explores the possibility of parallelizing the TCP/IP stack. The application that uses the TCP/IP stack is an artificial example of a client thread requesting a service (vector multiplication) from a server thread. The benchmark consists of 5 threads: 3 application threads (client, server, result) and 2 TCP/IP threads executing the link layer and the network layer protocol.

6.3.2 Measurements

Table 6.2 shows the measured execution time of Lift, running at a frequency of 90 MHz on the Altera DE2 board. The first column gives the number of JOP cores in the system. Results are shown for three arbiter configurations:

• the fixed priority arbiter,
• the fair arbiter, and
• the time-sliced arbiter with a time slot of 12 cycles.


Table 6.2: Performance comparison of different arbiter types using the Lift benchmark.

Number of JOP cores   Fixed Exec. time (ms)   Fair Exec. time (ms)   TDMA Exec. time (ms)
1                     702                     702                    702
2                     389                     399                    469
4                     336                     292                    405
8                     340                     277                    395

Figure 6.5: Performance comparison of the Lift benchmark using different arbiters (relative speed-up over 1, 2, 4, and 8 CPU cores for the fixed, fair, and TDMA arbiters).


The execution time is measured for each combination of number of CPUs and arbitration policy. One JOP executes the Lift workload in 702 ms and does not have to share the memory bandwidth. A dual-core system performs about 1.8 times faster than a single JOP when using a fixed priority or a fair arbiter; the configuration with a TDMA arbiter shows a performance improvement of 50%. A 4-processor system using the fixed arbiter doubles the performance of a single core, and the same system with the fair arbiter experiences a speed-up of 2.4. The time-sliced arbiter cannot keep up with these speed-ups but is still 73% faster than a single JOP. Executing the workload on more than 4 processors does not give a significantly better performance, irrespective of the arbiter type used.


Table 6.3: Performance comparison of different arbiter types using the MMul benchmark.

Number of JOP cores   Fixed Exec. time (ms)   Fair Exec. time (ms)   TDMA Exec. time (ms)
1                     839                     839                    839
2                     461                     467                    556
4                     306                     315                    421
8                     305                     306                    336

Figure 6.6: Performance comparison of the MMul benchmark using different arbiters (relative speed-up over 1, 2, 4, and 8 CPU cores for the fixed, fair, and TDMA arbiters).

Figure 6.5 summarizes the performance results. The horizontal axis shows the number of CPUs, the vertical axis the relative speed-up, i.e. the relation between the execution time on a single core and on a multi-core version. The figure shows that configurations of up to 4 cores scale adequately with all arbiters. Using an 8-core CMP with a fixed priority arbiter results in a performance slow-down; the reason is the large competition among the CPUs for memory access.

Table 6.3 and Figure 6.6 show the measurement results of MMul. This computationally intensive algorithm shows a good potential for parallelism. The execution time for the fair arbiter configuration is shorter than reported in Chapter 4, as the benchmark code is optimized to avoid memory accesses as much as possible. The speed-ups of CMP versions consisting of 2 and 4 cores lived up to our expectations with 1.5 (TDMA) and 1.8 (fixed and fair), respectively. The fixed arbitration policy scales well using up to 4 processor cores.


Table 6.4: Performance comparison of different arbiter types using the ejip benchmark.

Number of     Fixed             Fair              TDMA
JOP cores     Exec. time (ms)   Exec. time (ms)   Exec. time (ms)
1             305               305               305
2             193               196               245
4             125               124               210
8             271               223               334

Adding more CPUs to the system does not result in better performance. The first four CPUs together calculate 97% of the final result, whereas the remaining CPUs contribute only 3%. Notably, CPU6 and CPU7 suffer from starvation: they never get their turn to calculate a single multiplication. The fair arbiter distributes the workload evenly among all processors; each CPU contributes 12 or 13% to the final result. Nevertheless, 8 cores do not provide a significant speed-up. The TDMA arbiter also distributes the workload evenly and is only 10% slower than the fair arbiter when using 8 CPUs. The results for the embedded TCP/IP example ejip are shown in Table 6.4. This application, consisting of five communicating threads, scales quite well up to 4 cores. With the fair or the fixed priority arbiter, performance increases by a factor of 2.5. Even the TDMA-based system is about 45% faster than a single-core solution. The overhead introduced by 8 cores, with only 5 cores executing threads, leads to a performance decrease compared to the 4-core system, as shown in Figure 6.7. The performance decrease of the configurations using fair and TDMA arbitration can be explained by the fact that these policies allocate memory bandwidth to three CPUs that do not execute any threads. One bottleneck in the TCP/IP stack is a global buffer pool through which all layers communicate. The single pool is not an issue for a uniprocessor system; in a CMP system, however, simultaneously running threads compete more often for pool access. A revised version of the TCP/IP stack would use dedicated queues for the communication between layers, preferably non-blocking, single reader/writer queues (a sketch is given below). Furthermore, a finer parallelization within the TCP/IP stack needs to be explored to fully utilize the available CPUs.
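The following is a minimal sketch of such a single reader/single writer queue, written for this text rather than taken from a revised ejip: a bounded ring buffer needs no lock when head and tail are each written by only one thread, assuming volatile fields provide the usual Java visibility guarantees on the target platform.

    // Minimal sketch of a bounded, non-blocking single reader/single
    // writer queue; illustrative only, not code from the ejip stack.
    public class SRSWQueue {
        private final Object[] buf;
        private volatile int head = 0; // written only by the reader
        private volatile int tail = 0; // written only by the writer

        public SRSWQueue(int capacity) {
            buf = new Object[capacity + 1]; // one slot always stays empty
        }

        // Writer side: returns false instead of blocking when full.
        public boolean offer(Object item) {
            int next = (tail + 1) % buf.length;
            if (next == head) return false; // queue full
            buf[tail] = item;
            tail = next; // volatile write publishes the item
            return true;
        }

        // Reader side: returns null instead of blocking when empty.
        public Object poll() {
            if (head == tail) return null; // queue empty
            Object item = buf[head];
            head = (head + 1) % buf.length;
            return item;
        }
    }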



Figure 6.7: Performance comparison of the ejip benchmark using different arbiters (relative speed-up over a single JOP for 1, 2, 4, and 8 CPU cores).

6.4 Discussion

Three different arbiters were implemented to experiment with different CMP systems, to highlight their advantages and disadvantages, and to find the most practical field of application for each. Table 6.5 summarizes the differences between the arbitration policies.

6.4.1 Starvation

CMPs using a fair or a TDMA arbiter cannot suffer from starvation, because each CPU is guaranteed a chance to access the memory. A fixed priority arbiter may cause starvation of CPUs, because higher-priority CPUs may always access the shared memory first. This situation can occur when more than 2 CPUs are integrated in a CMP: the two highest-priority CPUs can alternately access the shared memory, so the third CPU never gets to access the memory. The sketch below illustrates this effect.
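A few lines of Java make this scenario concrete; this is a simplified per-cycle model of the grant decision written for this text, not the VHDL implementation of the arbiter.

    // Simplified model of fixed priority arbitration: in every cycle the
    // pending request of the lowest-numbered (highest-priority) CPU wins.
    public class FixedPriorityModel {
        public static void main(String[] args) {
            int cpus = 3;
            int[] grants = new int[cpus];
            for (int cycle = 0; cycle < 1000; cycle++) {
                // CPU0 and CPU1 request memory in alternating cycles,
                // while CPU2 requests in every cycle.
                boolean[] request = {cycle % 2 == 0, cycle % 2 == 1, true};
                for (int cpu = 0; cpu < cpus; cpu++) {
                    if (request[cpu]) { grants[cpu]++; break; } // highest priority wins
                }
            }
            // Prints: CPU0 and CPU1 share all grants, CPU2 gets none.
            for (int cpu = 0; cpu < cpus; cpu++) {
                System.out.println("CPU" + cpu + ": " + grants[cpu] + " grants");
            }
        }
    }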

6.4.2 Predictability

Even though a timing analysis approach for each arbitration policy is presented in Section 6.2, some of the policies are not viable for hard real-time systems. The proposed timing analysis approach for a TDMA arbiter enables the WCET calculation of Java bytecodes instead of the WCET of a single memory access. The WCET depends on the number of CPUs and on the time slot size. It is preferable to keep the time slot short if single memory accesses dominate the application code. A medium time slot size is the solution for frequent field and array accesses, and a large slot size for predominant cache load accesses.
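The dependence of the WCET on the number of CPUs and the time slot size can be made explicit with a deliberately simplified model, given below as an illustration only; the analysis of Section 6.2 works on whole bytecodes and their memory access patterns rather than on isolated accesses. If a single access of d cycles must complete within one slot of size S, the worst case occurs when the request arrives d − 1 cycles before the CPU's own slot ends: the access cannot complete and is deferred behind the n − 1 foreign slots.

    // Deliberately simplified model: worst-case latency of a single
    // memory access of d cycles under TDMA with n CPUs and slot size S
    // (all values in clock cycles, assuming d <= S):
    //   latency = (d - 1) + (n - 1) * S + d
    public class TdmaWorstCase {
        static int worstCaseLatency(int n, int S, int d) {
            return (d - 1) + (n - 1) * S + d;
        }
        public static void main(String[] args) {
            // For a 2-cycle read, the extra cycles over a single core
            // reproduce the getstatic row of Table B.1 (8 -> 12 for
            // 2 CPUs/slot 3, 8 -> 54 for 4 CPUs/slot 15, 8 -> 219 for
            // 8 CPUs/slot 30).
            System.out.println(worstCaseLatency(2, 3, 2) - 2);   // 4
            System.out.println(worstCaseLatency(4, 15, 2) - 2);  // 46
            System.out.println(worstCaseLatency(8, 30, 2) - 2);  // 211
        }
    }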


Table 6.5: Comparison of the arbitration policies.

Property                      Fixed   Fair   TDMA

Starvation                    -       +      +
Predictability                -       +/-    +
Performance                   +/-     +      -
Pipelined Split Transaction   -       -      +
CPU-overlapping Pipelining    -       -      +

Further experiments are necessary to better determine when time slot sizes should be made larger or smaller. Appendix B shows bytecode WCETs of different CMP configurations. A fair-based arbiter enables timing analysis of all tasks running on different CPUs. Nevertheless, the WCET of a single memory access by a CPU is a very conservative estimate, because a possible method cache load by each CPU has to be taken into account; the resulting WCET values are too pessimistic to be useful in practice. If a method cache load could be split into several transactions, this arbitration could also become a viable solution. Using the fixed priority arbiter, only the highest-priority CPU of a CMP is predictable; WCETs cannot be calculated for programs executing on the other CPUs. Therefore, this arbitration policy can be used for real-time systems where one CPU executes hard real-time tasks and the others non-critical tasks.

6.4.3 Performance

Using three benchmarks, the speed-up capability of real-world applications on multiple processor cores is demonstrated. The speed-up has lived up to expectations for systems using a fair arbiter: the Lift benchmark executes 2.5 times faster on an 8-way CMP than on a single JOP. The performance of CMP systems with a fixed priority arbiter scales well up to 4 cores; when integrating more CPUs, some of them may suffer from starvation under fixed priority arbitration. As expected, the average-case performance of systems using the TDMA arbiter cannot keep up with the performance improvements of the other arbiters. Furthermore, the experimental results with the ejip benchmark show that more cores do not necessarily result in better CMP performance.


Comparing the CMP system to a complex Java processor such as picoJava II, described in Chapter 4, the conclusion is that a multiprocessor version of a simpler and smaller architecture is more efficient (performance per die area) for parallel workloads. Running independent instances of the application benchmark Lift, an 8-core CMP version is around 1.4 times faster than picoJava while occupying only about 90% of picoJava's die area.

6.4.4 Implementation Features

The arbiters implemented within this research work are fully configurable with respect to the number of connected CPUs. Compared to existing arbiters like AMBA [3] or CoreConnect [4], the maximum number of connected masters is not limited. As a result, the application and the available hardware resources determine the number of connected processors. Another advantage over existing arbiters like Avalon [1] or AMBA is the zero-cycle arbitration latency: the synchronization of the shared memory access does not need an extra cycle for the arbitration decision. The synthesis results of Table 4.6 show that the clock frequency scales quite well with an increasing number of CPUs; therefore, an evaluation of an additional pipeline stage has not been conducted. Additionally, all implemented arbiters feature a burst memory access mode for loading cache memories, so back-to-back reads of the memory are supported. Compared to the other arbiters, the TDMA arbiter supports CPU-overlapping pipelining. This feature allows for a full saturation of the available memory bandwidth.

6.4.5 Real-Time Speed-up of Multicore

The WCET analysis of an application task for a TDMA-based system showed only a moderate speed-up (up to a factor of 1.5 for an 8-core system). A lower increase in real-time performance compared to the increase in average-case performance was expected; however, the achievable speed-up was even less than anticipated. It has to be noted that the measured workload consisted of independent tasks without communication overheads. This result serves as a foundation for further improvements. The reduced memory bandwidth calls for more research on time-predictable caching. The WCET analysis tool only considers the leaf nodes of a call tree for the method cache analysis; tighter bounds on method cache misses will directly pay off for a system under memory bandwidth pressure. First experiments with a local cache analysis showed a WCET reduction of 15% for an 8-way configuration [11].


Furthermore, a time-predictable solution for caching heap-allocated data will have to be considered. A small, fully associative buffer similar to a victim cache [5] would allow the WCET analysis to detect some cache hits.

Bibliography

[1] Altera. Avalon Memory-Mapped Interface Specification (v3.3), May 2007.

[2] Alexandru Andrei, Petru Eles, Zebo Peng, and Jakob Rosen. Predictable implementation of real-time applications on multiprocessor systems-on-chip. In VLSI Design, pages 103–110. IEEE Computer Society, 2008.

[3] ARM. AMBA Specification (rev 2.0), May 1999.

[4] IBM. 32-Bit OPB Arbiter Core Databook revision 1, March 2007.

[5] N. P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the 17th Annual International Symposium on Computer Architecture, pages 364–373, Seattle, WA, May 1990.

[6] Christof Pitter and Martin Schoeberl. Time predictable CPU and DMA shared memory access. In International Conference on Field-Programmable Logic and its Applications (FPL 2007), Amsterdam, Netherlands, August 2007.

[7] F. Poletti, D. Bertozzi, L. Benini, and A. Bogliolo. Performance Analysis of Arbitration Policies for SoC Communication Architectures. Design Automation for Embedded Systems, 8:189–210, 2003.

[8] Jakob Rosen, Alexandru Andrei, Petru Eles, and Zebo Peng. Bus access optimization for predictable implementation of real-time applications on multiprocessor systems-on-chip. In RTSS, pages 49–60. IEEE Computer Society, 2007.

[9] Martin Schoeberl. JOP: A Java Optimized Processor for Embedded Real-Time Systems. PhD thesis, Vienna University of Technology, 2005.

[10] Lothar Thiele and Reinhard Wilhelm. Design for timing predictability. Real- Time Systems, 28(2-3):157–177, 2004.

[11] Andy Wellings and Martin Schoeberl. Thread-local scope caching for real-time Java. In Proceedings of the 12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2009), Tokyo, Japan, March 2009. IEEE Computer Society.

7 Conclusion and Outlook

This chapter starts with a short summary of the major goals of this thesis. Furthermore, the course of the research is outlined and the main findings and results are presented. The conclusion demonstrates the relevance of this thesis to current scientific work and gives some ideas for future research.

7.1 Thesis Goal

The main research goal of this thesis was the design and development of a homogeneous CMP with a shared memory for real-time embedded systems based on Java. Multiple CPUs increase the processing power, but the system should remain predictable in the temporal domain. Even though threads execute on different CPUs and access the shared memory, interference between their execution times due to memory accesses has to be prevented. Only then can the execution times of different threads be analyzed separately. Another objective was the development of a static WCET analysis concept with a corresponding tool, providing a fully featured package for real-time Java application development.


7.2 Research Compendium

In the beginning, this thesis explored a new timing analysis approach for a system consisting of one processor and a hardware thread, both accessing a shared memory. The results achieved in this part of the research were encouraging, so the design of a chip-multiprocessor system composed of multiple Java processors with a shared memory was initiated. The first prototype showed that the proposed architecture operated correctly, and first benchmark results on multiple cores supported this research direction. Memory arbitration played the most important role within the multiprocessor system design, as it has to control simultaneous accesses by multiple CPUs to the shared memory. First, a fixed priority arbiter was implemented. Later on, a fair-based arbiter was introduced to obtain average-case performance results of applications. Finally, a TDMA arbiter turned out to be the best solution for a time-predictable Java chip-multiprocessor, as it provides the basis for a successful WCET analysis. An evaluation of the CMP system concerning static WCET analysis and average-case performance completes the research work.

7.2.1 Main Findings and Results

The following points describe the author's most valuable findings and contributions gained during this research project. They show that the objectives of this thesis, defined in Section 1, have been accomplished.

• Timing analysis is in fact possible for a homogeneous multiprocessor system with a shared memory. Tasks running on different CPUs usually influence each others' execution times when accessing shared memory. The TDMA arbitration concept removes these interdependencies between task execution times by partitioning the memory bandwidth, so WCET estimates of single application tasks can be analyzed in isolation.

• The design of a time-predictable CMP architecture is presented in detail. A prototype implementation, integrating up to 8 JOP cores in a low-cost FPGA, shows a successful execution of real-world programs.

• WCET analysis approaches for the different arbiters show that a time-sliced arbiter is the only option for obtaining viable WCET estimates. While resolving simultaneous access conflicts to the shared memory, it enables WCET analysis of Java bytecodes by splitting the memory access bandwidth into equal time slots.

• Timing analysis of Java applications generates WCET bounds, given in clock cycles, before runtime. There is no need for any measurements.


A static WCET analysis tool, enhanced for use with multiprocessors, produces these WCET estimates and completes a real-time development solution.

• A comparison of CMP configurations using different arbiters shows that dynamic arbitration mechanisms are less predictable in the temporal domain but more powerful for average-case program execution.

• The course of this research confirmed the assumption that a real-time processor has to be designed from scratch for time predictability. Publications by Thiele and Wilhelm [9] and by Lickly et al. [4] share this opinion. Microprocessors used for real-time systems have to be as predictable in the time domain as they are in the value domain. Several advanced microprocessor features that increase average-case performance cannot be analyzed precisely. Therefore, the following principle should be pursued for real-time system design: build a processor for WCET analysis; do not take a common processor tailored to average-case performance and try to analyze it!

7.3 Thesis Relevance

7.3.1 Real-Time Java

Interest in an increased level of abstraction has gained Java the embedded system industry's acceptance. Nevertheless, Java is still on the verge of a breakthrough in the real-time system sector. A step in the right direction is the European Commission funded project Java Environment for Parallel Real-time Development (JEOPARD) [8]. The consortium members started the development of an independent software interface for Java real-time multiprocessor systems in January 2008. During this project, the proposed real-time Java CMP is used for further research activities on real-time garbage collection [5, 7] and future time-predictable processor architectures [6, 10]. The author is convinced of Java's success in developing future time-predictable systems.

7.3.2 Simultaneous Multiprocessing

This section raises a provocative question: are simultaneous multiprocessors the future for increasing processing performance? Although multiple profound arguments have been made that support the use of this most popular multiprocessor architecture, some doubts are emerging about the future efficient use of SMP computing.


In the 20th century, a computer system's performance was dominated by the low CPU clock rate. Since then, times have changed, and the biggest problem has become providing the CPU with data quickly enough, because CPU clock rates have outpaced memory speed. This growing speed imbalance, commonly known as the memory bottleneck, is intensified if multiple CPUs use a shared memory bus. The pipelines of multiple processors have to be kept busy. Therefore, caches are used to lower memory bandwidth demands; processors can then continue executing their tasks quickly and contribute significantly to multiprocessor performance. According to Hennessy and Patterson [1], this computer architecture only scales up to a few dozen cores. Typical state-of-the-art desktop and server processors integrate two or four CPUs. In this thesis, benchmark executions scale well in CMP configurations with up to four cores; integrating more CPUs does not yield superior results. Even though a CMP with four cores should be satisfactory for current real-time applications, further enhancements can only be attained by increasing memory bandwidth or lowering bandwidth demand. Research and advancements in hardware and software are needed in order to benefit from the SMP concept. New bus architecture innovations are needed to combat slow memory access even as the number of CPUs increases. On the software side, a distinct lack of advanced multiprocessor programming techniques is noticeable. Only a few applications have an innate parallel structure that can efficiently exploit the available processing performance. Parallelizable application types include signal and image processing, server applications, and embedded applications. Mapping sequential code to SMP architectures requires new parallelization methodologies in order to exploit the full application performance.

7.4 Outlook

In this project, a complete and detailed timing analysis of a multiprocessor system with a shared memory has been performed. Three different arbitration algorithms have been implemented and analyzed for practical use in real-time systems; only the TDMA arbitration technique yields viable WCET bounds. When carrying out the performance comparisons, some issues regarding the WCET speed-up of this type of system were identified. Still, many interesting ideas for further system improvements remain unexamined; they would have exceeded the scope of this thesis. Further research on a time-predictable CMP concerns two closely related aspects: improvements to the multiprocessor architecture and to the WCET analysis.

Modifications within the architecture might make the analysis more intricate, but an increase in processor performance is assured.

7.4.1 Enhanced Memory Arbitration Concepts

An investigation of percentage-based arbitration of the available memory access bandwidth could be an interesting concept. Based on the time-sliced arbiter, the memory bandwidth per CPU would be adjusted depending on the workload of the CPUs. For example, if one CPU needs 60% of the available memory bandwidth, that is exactly what it will receive, either as more time slots or as one larger time slot. A lookup table could be used to dynamically schedule time slots at runtime; for example, it might be necessary to give 100% of the available bandwidth to one CPU due to a high-priority event. A sketch of such a slot table is given below. Further enhancements of the fair arbiter could also result in a promising arbitration solution: in case of a burst memory access (method cache load), the access might be interrupted after a predetermined period of time. This would result in a mix of fair and TDMA-based arbitration. Consequently, with a limited access time, the better average-case performance of fair arbitration would be combined with the existing time predictability.
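A hypothetical illustration of such a lookup table is given below; the class layout and names are assumptions made for this text, not a proposed implementation. The table lists which CPU owns each slot of the period, so a CPU's bandwidth share is simply the fraction of slots it owns.

    // Hypothetical slot-table TDMA schedule: the share of slots a CPU
    // owns determines its share of the memory bandwidth. With the
    // table below, CPU0 receives 60%, CPU1 and CPU2 20% each.
    public class SlotTableArbiter {
        private final int[] slotOwner; // slotOwner[i] = CPU that owns slot i
        private final int slotSize;    // slot length in clock cycles

        public SlotTableArbiter(int[] slotOwner, int slotSize) {
            this.slotOwner = slotOwner;
            this.slotSize = slotSize;
        }

        // Which CPU is allowed to access memory in the given clock cycle?
        public int grantedCpu(long cycle) {
            int slot = (int) ((cycle / slotSize) % slotOwner.length);
            return slotOwner[slot];
        }

        public static void main(String[] args) {
            // 60% / 20% / 20% split over a period of five 6-cycle slots.
            SlotTableArbiter a =
                new SlotTableArbiter(new int[] {0, 1, 0, 2, 0}, 6);
            for (long c = 0; c < 30; c += 6)
                System.out.println("cycle " + c + ": CPU" + a.grantedCpu(c));
        }
    }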

7.4.2 Advanced Synchronization Mechanisms

One major challenge of a multiprocessor with a shared memory is the synchronization of accesses to shared objects, as multiple processors communicate via these data structures. If multiple threads executing on different CPUs need to access the same objects or class variables simultaneously, their data access must be properly managed; only then can predictable program behavior and data consistency be ensured. The first CMP prototype utilizes one global lock for all shared data structures residing on the heap. Even though this synchronization limits parallel program execution, it avoids the problems of fine-grained locking (e.g. livelock or deadlock). More advanced mechanisms like lock-free synchronization algorithms [3] or transactional memory [2] could not be evaluated within the course of this thesis. In particular, transactional memory, which executes a number of memory accesses in an atomic transaction, is a promising synchronization option. Combining multiprocessor synchronization with timing analysis presents a future challenge for predictable multiprocessor architectures.
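To make the prototype's coarse-grained scheme concrete, the following Java sketch shows the pattern of guarding all shared heap data with one lock. It is an illustration written for this text: the class and field names are invented, and on JOP the global lock is provided by the runtime rather than by an application-level object as shown here.

    // Sketch of coarse-grained synchronization: every access to shared
    // heap data is guarded by the same single global lock.
    public final class GlobalLock {
        public static final Object LOCK = new Object();
    }

    class SharedCounter {
        private int value;

        int increment() {
            synchronized (GlobalLock.LOCK) { // all shared data uses this lock
                return ++value;              // no deadlock: only one lock exists
            }
        }
    }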


7.4.3 Dynamic Memory Support

This thesis assumes a main memory with predefined memory access latencies. Static random access memory (SRAM) is used for the prototype implementation because it features short and predetermined access times. A major disadvantage of SRAM is that it needs six transistors to store a single bit of information; it is therefore expensive with respect to die area and cost compared to dynamic random access memory (DRAM), which needs only one transistor and a capacitor per bit. Because a capacitor discharges over time, DRAM must be refreshed periodically to maintain its charge. For use in a real-time CMP system, this refresh would not complicate WCET analysis: a designated time slot within the TDMA period could be allocated for refresh execution, as sketched below. Research will have to show whether the longer memory access latencies make DRAM a viable alternative for time-predictable CMPs.
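A back-of-the-envelope sketch of the refresh idea (an assumption for illustration only; no DRAM controller was implemented in this thesis): treating the refresh as one additional virtual CPU lengthens the TDMA period, and therefore every worst-case waiting time, by exactly one slot.

    // Hypothetical TDMA round with a reserved DRAM refresh slot: the
    // refresh behaves like one additional virtual CPU in the schedule.
    public class RefreshSlot {
        static int period(int cpus, int slotSize, int refreshSlot) {
            return cpus * slotSize + refreshSlot;
        }
        public static void main(String[] args) {
            // 4 CPUs with 6-cycle slots plus one 6-cycle refresh slot:
            System.out.println(period(4, 6, 6)); // 30-cycle TDMA round
        }
    }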

7.4.4 WCET Analysis of Bytecode Sequences

Currently, the WCET is analyzed at the Java bytecode level; each bytecode's WCET is calculated separately. For uniprocessor WCET analysis, these bytecode WCETs do not differ from best-case or average-case execution times. The multiprocessor analysis, however, searches for the worst-case overlap between the bytecode's memory access pattern and the TDMA memory access scheme. In order to obtain the WCET of an application task, both concepts add up all bytecode WCETs, as illustrated below. A more advanced approach for multiprocessor analysis could analyze basic blocks consisting of several successive bytecodes. Analyzing a sequence of bytecodes would tighten the WCET estimate, because the memory access pattern of the whole sequence is taken into account. Furthermore, in order to enhance the quality of the WCET bounds, an analysis across multiple basic blocks could be investigated.
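As a concrete instance of the per-bytecode summation, the four-bytecode assignment analyzed in Appendix A (Table A.1) adds up to the WCET used there; the few lines below, written for this presentation, just restate that arithmetic.

    // Per-bytecode WCET summation for the assignment of Appendix A:
    // aload_0 (1) + iconst_0 (1) + iaload (25) + istore_1 (1) = 28.
    public class SequenceWcet {
        public static void main(String[] args) {
            int[] bytecodeWcet = {1, 1, 25, 1}; // from Table A.1
            int sum = 0;
            for (int w : bytecodeWcet) sum += w;
            System.out.println(sum + " cycles"); // 28 clock cycles
        }
    }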

7.4.5 WCET Analyzable Data Caching

The gap between memory access time and processor speed is widening: CPUs have to idle until the requested data from the main memory is available. Multiprocessors using a shared memory increase the pressure on the memory bandwidth even further. Typically, caches serve as very fast memories, because on a cache hit, data does not have to be fetched from the main memory.


Data caching remains problematic for static timing analysis, because it is difficult to determine whether a data item is in the cache or not. Even though each JOP uses a method cache (caching instructions) and a stack cache, caching of heap-allocated data has so far been disregarded and requires further exploration. Time-predictable data caching for multiprocessors, including the implementation and analysis of a cache coherence mechanism (e.g. a snooping or a directory-based mechanism), poses a further challenge for analyzing tight execution time bounds.

7.5 Summary

This thesis shows that timing analysis is in fact possible for homogeneous multiprocessor systems with a shared memory. A Java chip-multiprocessor architecture consisting of a number of JOP cores and a shared memory is presented. An arbiter is used to serialize the memory accesses of multiple processors. Three different arbitration schemes are described, analyzed, and compared for viable use in real-time CMPs. The key component enabling a WCET analysis of the CMP is a TDMA arbiter that splits the memory access bandwidth into equal shares. Therefore, a WCET analysis of Java bytecodes can be performed instead of analyzing each single memory access. In this research, JOP's WCET tool was enhanced to integrate the maximum latencies caused by memory access collisions of multiple CPUs. Examples of WCET calculations are presented, and WCET estimates are compared to measured results. Several experiments, carried out by executing benchmarks on real hardware, demonstrate that the performance of a time-predictable architecture cannot keep up with an architecture designed for average-case performance. Nevertheless, for real-time applications, safe upper WCET bounds are more important than processing performance.

Bibliography

[1] John Hennessy and David Patterson. Computer Architecture: A Quantitative Approach, 4th ed. Morgan Kaufmann Publishers, 2006.

[2] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture, pages 289–300, New York, NY, USA, 1993. ACM.

[3] Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers, 2008.


[4] Ben Lickly, Isaac Liu, Sungjun Kim, Hiren D. Patel, Stephen A. Edwards, and Edward A. Lee. Predictable programming on a precision timed architecture. In Proceedings of International Conference on Compilers, Architecture, and Synthesis from Embedded Systems, October 2008.

[5] Wolfgang Puffitsch. Decoupled root scanning in multi-processor systems. In CASES ’08: Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, pages 91–98, New York, NY, USA, 2008. ACM.

[6] Martin Schoeberl. Time-predictable cache organization. In Proceedings of the First International Workshop on Software Technologies for Future Dependable Distributed Systems (STFSSD 2009), Tokyo, Japan, March 2009. IEEE Computer Society.

[7] Martin Schoeberl and Wolfgang Puffitsch. Non-blocking object copy for real-time garbage collection. In Proceedings of the 6th International Workshop on Java Technologies for Real-time and Embedded Systems (JTRES 2008), September 2008.

[8] Fridtjof Siebert. Jeopard: Java environment for parallel real-time development. In JTRES '08: Proceedings of the 6th international workshop on Java technologies for real-time and embedded systems, pages 87–93, New York, NY, USA, 2008. ACM.

[9] Lothar Thiele and Reinhard Wilhelm. Design for timing predictability. Real- Time Systems, 28(2-3):157–177, 2004.

[10] Andy Wellings and Martin Schoeberl. Thread-local scope caching for real-time Java. In Proceedings of the 12th IEEE International Symposium on Object/component/service-oriented Real-time distributed Computing (ISORC 2009), Tokyo, Japan, March 2009. IEEE Computer Society.

A Measurement-based Verification of Bytecode WCET Analysis

A.1 Verification Goal

The verification goal is to check the WCET analysis of bytecodes; therefore, bytecode execution times are measured. The bytecode execution time depends on the starting point of the bytecode within a TDMA period. Two properties of the measured bytecode execution times have to be demonstrated in order to verify the multiprocessor WCET analysis:

1. A measured execution time should never be larger than the analyzed WCET. Otherwise, the static WCET analysis tool produces unsafe WCET bounds.

2. Ideally, the maximum measured execution time is the same as the analyzed WCET. Consequently, tight bytecode WCETs are produced due to a precise and accurate hardware model.

A.2 Verification Method

The verification is carried out by measuring bytecode execution times on real hardware. It has to be ensured that the starting point within the TDMA period that produces the bytecode WCET is covered by the verification process.


Table A.1: Java bytecodes of sample assignment.

Bytecode   Cycles

aload_0    1
iconst_0   1
iaload     25
istore_1   1

A random latency delays the bytecode starting point between consecutive runs. Consequently, it is possible to start the bytecode execution in every possible cycle of a given TDMA period and to cover all different execution options. To eliminate the risk that the worst-case scenario is not measured, prime numbers are used for the TDMA period and for the latency generation (see the sanity check below).
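The underlying arithmetic of this coverage argument can be checked with a few lines of Java (a sanity check written for this presentation, not part of the verification harness): starting offsets of the form i·step mod period reach every cycle of the period exactly when the step and the period share no common divisor, and the added randomness of the latencies further reduces the risk of missing an offset.

    // Sanity check: offsets (i * step) mod period cover the whole period
    // exactly when step and period share no common divisor.
    public class CoverageCheck {
        static boolean coversAllOffsets(int period, int step) {
            boolean[] seen = new boolean[period];
            int covered = 0;
            int offset = 0;
            for (int i = 0; i < period; i++) {
                if (!seen[offset]) { seen[offset] = true; covered++; }
                offset = (offset + step) % period;
            }
            return covered == period;
        }
        public static void main(String[] args) {
            System.out.println(coversAllOffsets(21, 8)); // true  (gcd = 1)
            System.out.println(coversAllOffsets(21, 6)); // false (gcd = 3)
        }
    }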

A.2.1 Hardware Configuration

The hardware board described in Section 3.4.3 is used to obtain measured bytecode execution times. For these experiments, the CMP integrates 3 CPUs and the TDMA period is configured to 21 clock cycles.

A.3 Verification Result of Bytecode iaload

Line 18 of the example code shown in Listing A.1 assigns an array component with index 0 to the local variable test. This Java assignment results in the bytecodes described in Table A.1. The WCET analysis of this assignment depends only on the bytecode iaload, which accesses the memory; all other bytecodes do not access the memory and execute in a single cycle. In summary, the analyzed WCET of this simple Java assignment is 28 clock cycles. Listing A.1 shows a method that measures the execution time of the Java assignment MAX times. After initializing some local variables, a random number generator is initialized with the prime number SEED. In each run of the loop body, a random latency delays the execution of the assignment. MAX is initialized to 10000. The PRIME constant is chosen to be 183, which generates latencies between 0 and 3470 clock cycles. The variable result stores the execution time of each run. Running this measurement on real hardware results in a maximum of 28 cycles. Subtracting the three bytecodes that execute in a single cycle yields a measured WCET of 25 cycles for iaload, which matches the analyzed value of Table A.1; hence both verification properties hold for this bytecode.


Listing A.1: Method for assignment measurement.

1   void measure(int [] array) {
2       int test = 0;
3       int random = 0;
4       int start = 0;
5       int stop = 0;
6       int result = 0;
7
8       // Initialization of random generator
9       Random r = new Random(SEED);
10
11      for (int i = 0; i < MAX; i++) {
12          // random latency before the measured assignment
13          random = Math.abs(r.nextInt()) % PRIME;
14          for (int j = 0; j < random; j++) {
15              ; // delay loop
16          }
17          start = Native.rd(Const.IO_CNT); // read the clock cycle counter
18          test = array[0];                 // the measured assignment
19          stop = Native.rd(Const.IO_CNT);
20          result = stop - start;           // execution time of this run
21      }
22  }

B Analyzed Bytecode WCETs

When using a TDMA arbiter, bytecode WCETs depend on the number of CPUs, the time slot size, and the memory access times, as described in Chapters 5 and 6. This appendix shows analyzed bytecode WCET results for different CMP configurations. Assuming memory access times of two cycles for a read and three cycles for a write access, Table B.1 lists the bytecodes and their WCETs. Only bytecodes that access the main memory are shown; all other bytecodes are omitted from the table because their WCETs are not affected by a shared-memory multiprocessor. The second column presents bytecode WCETs for a single JOP, which does not have to share the memory bandwidth; therefore, no time slot size is given. All other columns show WCET results for different configurations of number of CPUs and time slot size. For the analysis of the return and invoke bytecodes it is assumed that the desired method resides in the cache and no cache miss occurs.

                CMP Configuration (number of CPUs / time slot size)
Opcode          1/-    2/3   2/15   2/30    4/3   4/15   4/30    8/3   8/15   8/30

ldc               8     12     24     39     18     54     99     30    114    219
ldc_w             9     13     25     40     19     55    100     31    115    220
ldc2_w           17     21     33     48     32     63    108     56    123    228
xaload¹          10     22     26     41     40     56    101     76    116    221
xastore²         14     24     31     46     42     61    106     78    121    226
xreturn³         23     27     39     54     33     69    114     45    129    234
return           21     25     37     52     31     67    112     43    127    232
getstatic         8     12     24     39     18     54     99     30    114    219
putstatic        10     15     27     42     21     57    102     33    117    222
getfield         13     19     29     44     31     59    104     55    119    224
putfield         16     22     33     48     34     63    108     58    123    228
invokevirtual   100    109    127    154    121    217    274    169    397    514
invokespecial    75     79     97    106     85    157    166    121    277    286
invokestatic     75     79     97    106     85    157    166    121    277    286
arraylength       7     11     23     38     17     53     98     29    113    218

Table B.1: Bytecode WCETs in clock cycles depending on the CMP configuration. The time slot size is varied between 3 and 30 cycles.

¹ xaload = iaload, faload, aaload, baload, caload, saload
² xastore = iastore, fastore, aastore, castore, sastore
³ xreturn = ireturn, freturn, areturn

In contrast to the previous table, Table B.2 compares bytecode WCET re- sults using time slot sizes between 6 and 12 cycles.

                CMP Configuration (number of CPUs / time slot size)
Opcode          1/-    2/6    2/9   2/12    4/6    4/9   4/12    8/6    8/9   8/12

ldc               8     15     18     21     27     36     45     51     72     93
ldc_w             9     16     19     22     28     37     46     52     73     94
ldc2_w           17     29     27     30     53     45     54    101     81    102
xaload           10     17     20     23     29     38     47     53     74     95
xastore          14     29     25     28     53     43     52    101     79    100
xreturn          23     30     33     36     42     51     60     66     87    108
return           21     28     31     34     40     49     58     64     85    106
getstatic         8     15     18     21     27     36     45     51     72     93
putstatic        10     18     21     24     30     39     48     54     75     96
getfield         13     20     23     26     32     41     50     56     77     98
putfield         16     31     27     30     55     45     54    103     81    102
invokevirtual   100    109    133    136    145    187    184    238    331    328
invokespecial    75     82     97    112    118    151    184    190    259    328
invokestatic     75     82     97    112    118    151    184    190    259    328
arraylength       7     14     17     20     26     35     44     50     71     92

Table B.2: Bytecode WCETs in clock cycles depending on the CMP configuration. The time slot size is varied between 6 and 12 cycles.

128 Analyzed Bytecode WCETs

Table B.3 lists bytecode WCET results analyzed for a system with an SRAM memory as specified in Section 6.2. A TDMA arbiter needs a minimum time slot size of 6 clock cycles to assure a successful memory access of multiple CPUs.

                CMP Configuration (number of CPUs / time slot size)
Opcode          1/-    2/6   2/15   2/30    4/6   4/15   4/30    8/6   8/15   8/30

ldc              10     19     28     43     31     58    103     55    118    223
ldc_w            11     20     29     44     32     59    104     56    119    224
ldc2_w           20     33     38     53     57     68    113    105    128    233
xaload           16     41     34     49     77     64    109    149    124    229
xastore          21     44     59     56     80    119    116    152    239    236
xreturn          23     32     41     56     44     71    116     68    131    236
return           21     30     39     54     42     69    114     66    129    234
getstatic        10     19     28     43     31     58    103     55    118    223
putstatic        13     24     33     48     36     63    108     60    123    228
getfield         17     32     35     50     56     65    110    104    125    230
putfield         21     36     41     56     60     71    116    108    131    236
invokevirtual   105    121    160    169    169    280    289    241    520    529
invokespecial    78     87    130    111    121    220    171    193    400    291
invokestatic     78     87    130    111    121    220    171    193    400    291
arraylength       9     18     27     42     30     57    102     54    117    222

Table B.3: Bytecode WCETs in clock cycles for a CMP system with larger memory access times.

C List of Acronyms

ACET      Average-Case Execution Time
AHB       Advanced High-performance Bus
ASIC      Application-Specific Integrated Circuit
BCET      Best-Case Execution Time
BMVIT     Bundesministerium für Verkehr, Innovation und Technologie
CFG       Control Flow Graph
CMP       Chip-Multiprocessor
CPU       Central Processing Unit
DE2       Development and Education Board 2 from Altera
DMA       Direct Memory Access
DRAM      Dynamic Random Access Memory
EDK       Embedded Development Kit from Xilinx
EP1C12    Cyclone I FPGA from Altera
EP2C35    Cyclone II FPGA from Altera
FP        Fixed Priority
FPGA      Field-Programmable Gate Array
GC        Garbage Collector
GUI       Graphical User Interface
IC        Integrated Circuit
ILP       Instruction-Level Parallelism
IP        Intellectual Property
IPET      Implicit Path Enumeration Technique
JEOPARD   Java Environment for Parallel Real-time Development


JOP       Java Optimized Processor
JSR       Java Specification Request
JVM       Java Virtual Machine
LRU       Least Recently Used
NUMA      Non-Uniform Memory Access
OPB       On-chip Peripheral Bus
PLL       Phase-Locked Loop
RISC      Reduced Instruction Set Computer
RTL       Register Transfer Level
RTS       Real-Time System
RTSJ      Real-Time Specification for Java
SoC       System-on-Chip
SimpCon   Simple SoC Bus
SMP       Symmetric (shared-memory) Multiprocessor
SOPC      System-on-a-programmable-chip builder from Altera
SP        Synergistic Processor
SRAM      Static Random Access Memory
TDMA      Time Division Multiple Access
TPCM      Time-Predictable Chip-Multiprocessor
UMA       Uniform Memory Access
VGA       Video Graphics Array
VHDL      Very High Speed Integrated Circuit (VHSIC) Hardware Description Language
WCET      Worst-Case Execution Time

List of Publications

[1] Christof Pitter. BlueDataControl - Implementation of a Bluetooth Transmission Line for Industrial Use. Master's thesis, Carinthian Tech Institute, Villach, Austria, August 2003.

[2] Christof Pitter. JopCMP - A Java Chip-Multiprocessor for Real-time Systems. In Proceedings of the 4th International Workshop on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC 2008), pages 194–196, Barcelona, Spain, July 2008.

[3] Christof Pitter. Time-Predictable Memory Arbitration for a Java Chip-Multiprocessor. In Proceedings of the 6th international workshop on Java technologies for real-time and embedded systems (JTRES 2008), Santa Clara, USA, September 2008. ACM Press.

[4] Christof Pitter and Martin Schoeberl. Time Predictable CPU and DMA Shared Memory Access. In International Conference on Field-Programmable Logic and its Applications (FPL 2007), Amsterdam, Netherlands, August 2007.

[5] Christof Pitter and Martin Schoeberl. Towards a Java Multiprocessor. In Proceedings of the 5th international workshop on Java technologies for real-time and embedded systems (JTRES 2007), Vienna, Austria, September 2007. ACM Press.

[6] Christof Pitter and Martin Schoeberl. Performance Evaluation of a Java Chip-Multiprocessor. In Proceedings of the IEEE Third Symposium on Industrial Embedded Systems (SIES 2008), Montpellier, France, June 2008.

Curriculum Vitae

Christof Pitter Markhofgasse 4/17, A-1030 Wien Date of Birth: 10/06/1979 in Villach [email protected]

EDUCATION

10/2003 – ongoing Vienna University of Technology – Institute of Computer Engineering, Real-Time Systems Group PhD-thesis: "Time-predictable Java Chip-multiprocessor"

10/1999 – 09/2003 Carinthia Tech Institute – University of Applied Sciences, Master thesis: "BlueDataControl – Implementation of a Bluetooth Transmission Line for Industrial Use"

09/1996 – 02/1997 Exchange Semester in the USA

1990 – 1998 Bundesrealgymnasium Perau in Villach

PROFESSIONAL & PRACTICAL EXPERIENCE

10/2006 – ongoing Vienna University of Technology – Institute of Computer Engineering, Real-Time Systems Group Research Assistant - Working on the national FIT-IT project TPCM (Time-Predictable Chip-Multiprocessor)

03/2005 – 02/2009 Vienna University of Technology – Institute of Computer Engineering, Real-Time Systems Group Research & Teaching Assistant - In charge of the laboratory class "Electrical Engineering for Computer Engineering"

09/2004 – 02/2005 Vienna University of Technology – Institute of Computer Engineering, Real-Time Systems Group Study Assistant

10/2002 – 06/2003 Kreuzgruber GmbH, Vienna Embedded Software Engineer
