
Eindhoven University of Technology

MASTER

Multi-core datapath contention modelling

Tang, X.

Award date: 2017


Disclaimer This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Multi-core Datapath Contention Modelling

Master thesis project (5T746) of Xinjue Tang, student ID 0898843

Host Company: Océ-Technologies B.V., Research & Development Department, St. Urbanusweg 43, 5914 CA Venlo, The Netherlands
Supervisor Océ: dr. Lou Somers
Supervisors TNO: dr. Tjerk Bijlsma, dr. Martijn Hendriks
Supervisor TU/e: prof. Twan Basten

GLOSSARY

Platform: The operating system and computer hardware.

CPU: An abbreviation of Central Processing Unit. It may contain one or more cores, and includes interconnection and caches.

Core: The basic computation unit of a CPU.

Multiprocessor: The use of two or more CPUs within a single computer system. A multiprocessing system includes multiple complete processing units.

Multithreading: A software concept. The ability of a CPU or a single core to execute multiple processes or threads concurrently, supported by the operating system.

Hardware threads: Refers to hyper-threading, which is Intel's proprietary simultaneous multithreading. The ability of a CPU or a single core to execute multiple processes or threads at the same time, supported by the hardware. A core supporting 2 hardware threads means 2 threads can be executed simultaneously.

Retired instructions: Refers to instructions that are actually executed and completed by a CPU. Modern processors execute many more instructions than the program flow needs (but results are "stored" only for retired instructions); this is called speculative execution. The instructions that are "proven" to be indeed needed by the program flow are the retired instructions.


ABBREVIATIONS

DES    Discrete Event Simulation
DVFS   Dynamic Voltage and Frequency Scaling
HRT    Hard Real Time
WCRT   Worst-Case Response Time
SRT    Soft Real Time
NRT    Not Real Time
AuE    Application under Evaluation
LLC    Last Level Cache
SO     Static Ordering
LJF    Longest Job First
NGMP   Next-Generation Multi-core Processor
VTune  Intel® VTune™ Amplifier 2013
CPI    Cycles Per Instruction
NUMA   Non-Uniform Memory Access
ALU    Arithmetic Logic Unit


TABLE OF CONTENTS

Glossary
Abbreviations
Table of Contents
I. Abstract
II. Introduction
III. System Description
IV. Performance Prediction: the State of The Art
V. Modeling Approaches: Analytical Model vs. DES Model
VI. Development and Requirements
VII. Test-Sets and Platforms for F-path
VIII. Introduction on the DES Model Components
IX. The F-path Model without a Penalty Model
X. F-path DES Model with the Static Penalty Model
   A. Predict the processing time of bitmaps with increasing number of cores
   B. Predict the processing time of different bitmaps
   C. Predict the performance on different platforms
XI. F-path DES Model with the Dynamic Penalty Model
   A. Hypothesis and Validation
      1) No Hyper-threaded mode
      2) Hyper-threaded mode
   B. Model Realization
      1) The dynamic penalty model developed for capability 1
      2) The dynamic penalty model developed for capability 2
      3) The dynamic penalty model developed for capability 3
XII. Comparisons to the Analytical Model
XIII. Combined Datapath DES Model with the Dynamic Penalty Model
XIV. Conclusion
XV. References


I. ABSTRACT
One printer currently developed by Océ comprises two platforms, each of which executes one of the pipelines used to generate the firing patterns for the printer. To save cost on manufacturing this printer, it is natural to consider implementing both pipelines on the same platform. At the same time, Océ is not willing to sacrifice throughput. Hence, inspecting the possibility of implementing the pipelines on one platform without violating the throughput constraints is the research topic Océ proposes. Modeling is one way to inspect the performance of systems. From the perspective of Océ, this model should be able to estimate the processing time of the printer on different inputs. Currently, two models have been accomplished to estimate the speed of each pipeline running on its own platform. One model is developed by discrete event simulation, called the DES model, while the other is derived by regression analysis, called the analytical model. Although these two models have been validated against the measured processing time on the available test-sets, this report shows that in some extreme cases, predictions made by the analytical model are less accurate. The report also argues that the processing time of the proposed implementation cannot be convincingly predicted by just adding the predicted processing times of the two models. Thus, a new model using discrete event simulation is built here as an alternative to the analytical model, which helps to address the two problems raised above. Besides, parts of this new DES model, such as the resource models, are reusable, and are shared with the existing P-path DES model when the models are combined. Another advantage of a DES model is its ability to easily inspect the influence of different scheduling algorithms, which helps to find new algorithms that can increase the throughput of the printer. In summary, the DES model proposed in this project has more capabilities than the analytical model.

II. INTRODUCTION
In the printing industry, throughput is one of the major criteria to indicate the quality of a printer, and of its datapath, which is responsible for the image processing. Amdahl's law [13] states that the speedup obtained by parallelizing software depends on the proportion of the software component that can run in parallel; exploiting this parallelism further enhances the throughput. Hence, increasing the number of cores is a trend in manufacturing to meet hard constraints on throughput. While manufacturers try to apply more cores in their products, they realize that the expected gain in performance often cannot be achieved. It is concluded in [1] that design decisions made before implementation have a great influence on the total development cost of a product. This leads to the need for performance estimation in the design phase, in which critical decisions such as the type of processor and operating system are made; these decisions call for a model for general performance prediction.

In this project, the target for performance prediction is a datapath system that contains two pipelines, shown as two white rectangles in Figure 1. The first pipeline, called P-path, converts input files into bitmaps; the following one, called F-path, translates the bitmaps into firing patterns for the inkjet print heads. These two paths are realized on two different platforms in the current implementation. In between, there is a hard drive isolating the two pipelines. It stores the bitmaps converted by P-path, and F-path can then send requests asking for these bitmaps. The printer starts to operate the inkjet print heads after it receives firing patterns from F-path. The idea proposed is to make these two paths share the same platform to save cost, while the throughput constraints should still be met. Hence, before implementation, as mentioned above, an early design model should be made to inspect this possibility. Currently, a DES model of P-path and an analytical model of F-path have been

accomplished. Besides, there is also a combined model, obtained by appending the analytical model of F-path to the DES model of P-path.

Figure 1. Current Datapath System: application with its hardware deployment

A question is raised whether the current models are accurate enough to predict the throughput of the datapath system. Actually, these two models have proved their ability to predict the throughput accurately in previous work. However, the accuracy of the combined model cannot be confirmed, because there is no accomplished implementation of the combined system that can be used for model validation. Chapter V in this report analyzes the two modeling approaches – regression analysis and DES; it shows the necessity of building a DES model of F-path to achieve a more convincing combined datapath model than the current one.

Figure 2. Performance Model in an iterative V-model development process

This raises another question, namely how to build such a DES model. Figure 2, taken from [2], indicates a development process to build a predictive performance model. The prerequisite is a validated model of an existing system. From this built model, a new model can be developed according to the analysis of the performance-related requirements, shown as step ① in Figure 2. Chapter VI describes this performance model in more detail and gradually imposes the requirements for developing the F-path DES model. In [14], a framework is proposed to develop a DES model, and a model of the computational resource – the multi-core CPU – is provided in this framework. However, the abstract piecewise linear algorithm used to resolve the contention in this multi-core CPU may not be the one used in the datapath system. Moreover, besides

the contention, there are some other causes that affect the utilization of the processor, for example hyper-threading and DVFS techniques. Besides, there are probably some shared resources, not yet modeled, for which contention can influence the performance greatly. Therefore, the goal here is to 1) modify the abstract multi-core CPU model in this framework, and 2) investigate the shared resources on which contention could become a performance bottleneck, and realize these resources in the models. To make the DES model sensitive to various environments, a model called the penalty model is built to scale the utilization of all modeled computational resources. In this report, two design approaches are applied in the design phase shown in the V-model to build such a penalty model. One is the top-down approach, and the other one is the bottom-up approach. This leads to two different DES models created in the whole project for the F-path system. Chapters X and XI show how to apply these two design approaches, and validations are performed on the resulting DES models. As will be discussed in Chapter V, the DES model may not be more accurate than the analytical model, so Chapter XII compares the prediction errors of the DES model built in this report and the analytical model of the F-path system. The penalty model derived from the top-down approach, however, fails to capture some important characteristics of the system, and is not able to scale the resources' utilization reasonably when combining the DES models of the two pipelines. Hence, in Chapter XIII, the model derived from the bottom-up approach is applied when combining the models. Chapter XIII introduces an upper bound, based on experience, when scaling down the resources' utilization to investigate the influence of possible contention between P-path and F-path. The processing time predicted by this combined model indicates the possibility of an implementation on the same platform. Chapter XIV concludes this report and proposes some interesting questions for future research.

III. SYSTEM DESCRIPTION
As said in the introduction, the datapath consists of one P-path and one F-path pipeline. The task graph of P-path is shown in Figure 3. At the beginning, a scheduler assigns each page of a PDF input to a client sequence, named DP client in this report. A DP client converts the page input into a bitmap, and currently, three DP clients can exist at the same time. Each task in the DP client shown in Figure 3 runs in a separate thread. Hence, tasks in the DP clients can be pipelined to increase the throughput. At the end of each DP client, a translated bitmap of one page is stored on a drive for F-path to retrieve.

Figure 3. High-level model of the current P-path application


Figure 4 shows the image processing steps of the F-path application. As seen in Figure 4, there is a task named 'Schedule and Dispatch' collecting data from the drive. The inputs it collects are the bitmaps processed by the P-path system, represented in bands. It then sends these bands to different handlers for processing. These parallelized threads perform the same operations on their respective bands, and forward the processed data to an output generator – the assembler. F-path processes one page band by band until all bands of the current page are done; then the bands of the next page can be processed. In the F-path application, the number of threads that can be generated is based on the available cores and hardware threads in the platform. The number of threads cannot exceed the number of processing units (number of cores × hardware threads per core). Bands have the same number of pixels, but because of the compression operation in the P-path application, the number of bytes of different bands differs. This results in varying processing times for the bands. Because of this characteristic, to increase the CPU utilization when the two pipelines are mapped onto the same platform, the cores free from processing F-path are used for P-path processing.

Figure 4. High-level model of the current F-path application

Figure 5 gives an example of how this system works. Assume that the platform has four single-threaded cores; yellow bars represent tasks from P-path and green bars tasks from F-path. From Figure 5, it can be seen that the system starts with P-path, and F-path processes the bands of the bitmaps generated by P-path. As soon as either of these two yellow bars has generated a bitmap for one page, F-path begins, and any running task from P-path is pre-empted. When at least one of the F-path tasks finishes processing the issued bands of the same page, the interrupted P-path task processing a next page continues. P-path can also start a new thread to make use of a free core. P-path can generate at most four tasks from the beginning in this example. In general, F-path processes one bitmap band by band, and as the bands' sizes within one page differ, some idle cores can be spared for processing the P-path tasks. This mechanism increases the CPU utilization, while causing contention in the cores, storage devices, and interconnect. Hence, the problem here is to determine the contention between P-path and F-path. As the contention in the cores is solved by setting priorities as described, the problem now turns to how to define and model the contention in the shared storage devices and interconnect. However, the influence of such contention is hard to determine when there is no accomplished system; it might have minor or major effects on the final performance. By simply adding the two predictive results from the models of each pipeline, the current combined model has difficulty capturing the influence of the contention on the performance. This is because the analytical model is derived from regression analysis, which establishes a relationship between selected variables; hence, its capability for prediction is limited to these dependent variables. In case the variables are not sufficient to represent some important changes of the environment, additional variables need to be considered and a new regression analysis needs to

be performed. The current analytical model does not consider the situation in which the two pipelines share the same resources. Apparently, this leads to some missing variables when establishing the regression equations. It is possible that the contention on the shared resources has only a minor influence on the performance, in which case the current combined model can truly predict the throughput of the combined datapath system, but there is no knowledge to support this assumption. Because of this uncertainty, a new analytical model with more dependent variables that represent the changes in the environment is desired. However, there are no input-output pairs available to perform the regression analysis for a new analytical model that considers the contention between the two pipelines. Moreover, a modification of the current P-path model also has to be carried out separately, as the contention affects both pipelines. A DES model with detailed representations of the shared resources is more flexible. When combining two DES models, the models of the resources are shared between the two pipelines, so no additional effort is needed to build new resource models for the P-path system. And the influences can be inspected by simply scaling the utilization of the modeled shared resources up and down.

Figure 5. An example of interaction between tasks from P-path and F-path

IV. PERFORMANCE PREDICTION: THE STATE OF THE ART
A variety of approaches has arisen to predict the time-related performance of applications running on a multi-core platform with co-runners. According to the time-criticality of the applications they serve, these approaches can be classified into two groups.
1) Approaches serving HRT applications. In these approaches, a tight upper bound on the WCRT of the AuE that shares the resources with its co-runners is given.
2) Approaches serving S/NRT applications. These approaches mainly focus on the average performance of AuEs. They also serve applications to be predicted in an early design phase, as usually no absolute accuracy needs to be promised in that early phase.
Approaches used to predict HRT applications are also applicable to S/NRT applications, but they predict the WCRT, which may not be necessary or useful for S/NRT applications.


Techniques applied in approaches coping with HRT applications can be classified as application-centric or platform-centric. Application-centric techniques require a detailed static analysis of the applications. [8] has analyzed and summarized the pros and cons of this kind of technique. A precise prediction of the performance can be guaranteed, but with a huge complexity of $\binom{k}{n}$, where k is the number of tasks¹ running on the platform and n is the number of cores assigned to execute these tasks. Such complexity increases the cost (time and money) of the prediction, which is not desirable in the early design phase in our case. A variation of the application-centric techniques is introduced in [9], providing a relatively simple and accurate prediction in the early design phase. In [9], the predicted latency $et_{muc}$ is composed of the estimated execution time of the application running without co-runners, $et_{solo}$, and the interference from its co-runners, $\Delta t$ (some other techniques, including platform-centric techniques like [6], also apply this model, but they differ in how $et_{solo}$ and $\Delta t$ are generated). The approach in [9] analyzes each AuE (which can be multithreaded), like other application-centric techniques, but does so individually to generate a profile containing the usage of the resources. The complexity is reduced, as k then only counts the tasks of a single application. The profile contains no distribution of the accesses to the shared resources over time, but only accumulated numbers; for instance, 50% of the accesses are sent 1 cycle after the previous one, and the other 50% 7 cycles after (example taken from [9]). From this profile, $\Delta t$ can be derived according to the scheduling algorithms. Although this approach reduces some complexity, it is not practical for our datapath application, as for each data input and each change in the number of threads to be used, a new profile is required.
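As a purely illustrative sketch of this composition (not the actual derivation in [9], which uses the full access profile and the platform's arbitration), the following toy function assumes that every conflicting access to a shared resource adds a fixed stall; all names and numbers are made up.

```python
# Toy sketch of et_muc = et_solo + delta_t. Illustrative only: the real delta_t in [9]
# is derived from the access profile and the scheduling of the shared resources.

def predict_latency(et_solo_cycles, aue_accesses, corunner_accesses, stall_per_conflict=10):
    # Pessimistic assumption: every AuE access that can collide with a co-runner
    # access is delayed by a fixed number of cycles.
    conflicts = min(aue_accesses, corunner_accesses)
    delta_t = conflicts * stall_per_conflict
    return et_solo_cycles + delta_t

# Example: 1,000,000 cycles alone, 20,000 shared accesses, co-runner issues 15,000.
print(predict_latency(1_000_000, 20_000, 15_000))  # 1_150_000 cycles
```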

Platform-centric techniques require the platform to be composable and predictable. Composability ensures that applications cannot influence each other, and predictability enables a tight upper-bound prediction for each individual application. To simplify the prediction of each application, most of the platform-centric techniques [3][4][5][6] regard the tasks that constitute each application as smaller applications. These tasks are scheduled and then predicted according to the platform's scheduling algorithm. As a result, the performance of each application is a superposition of the performance predictions of its tasks. [7] provides a framework that decouples the applications and the platform. In this way, applications can apply scheduling algorithms different from the one used by the platform, so developers can develop their own applications without the cooperation of the platform's supplier. Although these platform-centric techniques indeed predict the performance accurately, unfortunately, the platforms used in the datapath application guarantee neither composability nor predictability.

Applying a stressmark can also be used to predict the performance of S/NRT applications. A stressmark is an application that runs concurrently with the AuE and tries to maximize its accesses to the shared resources, so that the processing time of the AuE can be measured as the performance prediction. This approach usually gives a more pessimistic prediction than reality, as the maximum number of accesses usually does not occur. Despite this fact, the prediction result is not an upper bound on the WCRT, as the maximum accesses the stressmark generates may not actually correspond to the worst-case situation. As a result, it is too pessimistic for S/NRT applications but also not capable of predicting the WCRT. [10] builds a model of cache contention by applying a stressmark able to modify the bandwidth of accesses to the shared cache. A reuse distance histogram of the AuE is generated with a bandwidth-adjustable stressmark as its co-runner to predict the precise effective cache size of the AuE. The reuse distance histogram is a function of the number of retired

¹ A task is one unit of an application that runs in a single thread.

instructions, the number of LLC accesses, and the number of LLC misses. These values differ when the number of threads in the AuE varies, so this technique is also not applicable in our case.
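To make the reuse-distance notion concrete, the sketch below computes a reuse distance histogram from a small synthetic access trace. It only illustrates the statistic itself, not the bandwidth-controlled measurement setup of [10].

```python
from collections import Counter

def reuse_distance_histogram(trace):
    """Reuse distance of an access = number of distinct addresses touched since
    the previous access to the same address ('inf' for first-time accesses)."""
    last_pos = {}                      # address -> index of its previous access
    hist = Counter()
    for i, addr in enumerate(trace):
        if addr in last_pos:
            distance = len(set(trace[last_pos[addr] + 1:i]))
            hist[distance] += 1
        else:
            hist["inf"] += 1
        last_pos[addr] = i
    return hist

trace = ["A", "B", "C", "A", "B", "B", "D", "A"]
print(reuse_distance_histogram(trace))   # Counter({2: 3, 'inf': 4, 0: 1})
# Accesses with a reuse distance smaller than the cache space left to the AuE are
# likely hits; how this share shrinks under a stressmark reveals the effective size.
```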

The techniques surveyed above are not suitable for predicting the performance of this datapath S/NRT application. The regression analysis mentioned earlier and the discrete event simulation in [2][14] are two techniques considered able to build a predictive model for this project. Later in this report, a DES model is proposed for this streaming application. Once the model is built, processing time can be predicted without further time measurements when the environment changes. The model is not intended for general-purpose applications; it only serves this datapath application.

V. MODELING APPROACHES: ANALYTICAL MODEL VS. DES MODEL
At the moment, there is an analytical model that can predict the latency of the F-path system. It uses Amdahl's law [11] and regression analysis. In this regression analysis, the processing time of the F-path application is related to variables representing the inputs and the platform characteristics. These characteristics have to be quantified into variables. Inputs can simply be represented by their size, while it is harder to find a good representation of the characteristics of the platform. The CPU PassMark [12] score, derived from a performance benchmark, proved to be a better representation than some others, such as the CPU frequency. With these quantified variables, the resulting regression model shows an excellent prediction of the performance of the current F-path application. For most cases, the differences between the prediction and the real execution time are around 2%. However, for some other cases, the differences are approximately 6% to 8%. The error can come from three aspects: 1) not enough variables are included in the regression equations; 2) the characteristics of the platform are not represented properly; 3) the same holds for the input characteristics.

A DES model can address all three problems above if it is modeled properly. It can model all the states of the resources. And unlike regression analysis, not all characteristics to be modeled have to be represented by quantified variables. For example, the behavior of the multi-core processors on the platform, such as how the cores resolve contention, can be simulated separately, instead of using variables that summarize all the characteristics of the whole platform. This can definitely improve the accuracy of the predicted results, but it lowers the abstraction level of the model and increases its complexity greatly. It also costs much more time, energy, and money to create such a model than the analytical model. Such a trade-off is not worthwhile if at most an 8% accuracy improvement can ideally be achieved. From the discussion in the last paragraph, the likely reason for the relatively large errors is the simple quantification of the inputs. Unlike for the other two aspects, for which different attempts were made to derive reasonable representations, the inputs are only represented by the entire size sent to the F-path application. With such a representation, the analytical model by default assumes an equal workload distribution over the handlers shown in Figure 4. But actually, those handlers do not necessarily process the same amount of data, which results in different processing times. Most cases using the available test-sets distribute a rather equal workload to each handler. This is the reason that in most situations the estimated errors are around 2% compared to the measured time. The DES model for the F-path application is proposed mainly to solve this problem. It takes the size of the workload each handler needs to process into account, instead of the accumulated workload size, and predicts the elapsed time of each task. Although the DES model improves the representation of the inputs, it may not guarantee the same accuracy as the analytical model does. Depending on the abstraction level of the DES model, more error could be introduced if the DES model does not capture the platform correctly. To obtain a relatively accurate

prediction with an abstract and simple DES model, the bottlenecks of the F-path application are studied in Chapter XI, and only the resources where these bottlenecks occur are modeled in detail.
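The following sketch illustrates why the per-band workload matters. The band sizes are invented and the dispatcher is a simple greedy one, so it only shows that an equal-split estimate (as the analytical model implicitly assumes) and a makespan based on the actual band distribution can diverge when the bands are unbalanced.

```python
# Hypothetical single-core band times (ms) of one bitmap.
bands = [9.0, 8.5, 1.2, 1.0, 0.9, 0.8, 0.7, 0.6]
n_handlers = 4

# Equal-split view: total work divided evenly over the handlers.
equal_split_estimate = sum(bands) / n_handlers

# Per-band view: dispatch each band to the handler that becomes free first;
# the bitmap is finished when the last handler finishes.
finish = [0.0] * n_handlers
for band in bands:
    idx = finish.index(min(finish))
    finish[idx] += band
per_band_estimate = max(finish)

print(equal_split_estimate)  # 5.675 ms
print(per_band_estimate)     # 9.0 ms: one long band dominates the makespan
```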

Another advantage of the DES model is its ability to present the detailed progress of an application with the assistance of a visualization tool. Further analysis can then be carried out to inspect the performance of an application. For instance, different scheduling algorithms can be investigated to look for the one giving a better performance. TRACE [13], a Gantt chart viewer tool, is coupled to the DES model to create the timing points of each task, so the entire processing of the application can be seen directly. Figure 6 shows an example of how TRACE works with the DES model and helps the analysis.

(a) tasks scheduled by SO with 12 cores

(b) tasks scheduled by LJF with 12 cores
Figure 6. Gantt chart generated from the DES model simulation

Each colored bar in Figure 6 is the elapsed time of a task as predicted by the DES model. The DES model only predicts the elapsed time of each task, not its start and end time. But with an auxiliary model (called a listener) embedded, the DES model can record the start and end time of each task by appending the elapsed time to the end time of the previous task that claims the same resource. Once it finishes the prediction of one task, it communicates with TRACE. TRACE arranges the start and end points of the received tasks on the timelines of the resources the tasks claim. It can be seen from the Gantt chart in Figure 6(a) that, although 12 cores execute the application, nine of the cores have to wait for more than 50% of the processing time because of the poor scheduling. So a different scheduling algorithm can be applied in the DES model to improve the results. Figure 6(b) is the resulting Gantt chart

after substituting the SO scheduling algorithm in the DES model with the LJF scheduling algorithm. Like SO, LJF is an off-line scheduling of tasks, but the tasks are sorted by their workload (execution time in this example) before they are distributed. The two green bars pointed at by the red arrows in the figures belong to the same task. In (a), the task can only be processed after the task indicated by the blue bar finishes, but in (b) it can be processed at the very beginning by a different core. Compared to the processing time shown in Figure 6(a), LJF reduces the processing time by more than half compared to SO (70 ms vs. 150 ms). Together with the advantage mentioned in Chapter II, this motivates developing a DES model of F-path in this project, from which a DES model of the entire datapath system can then be derived.
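The effect of the two schedules can be reproduced with a few lines of list scheduling. The task durations and core count below are invented and the dispatcher is a simplified greedy one, so this only mirrors the idea behind Figure 6, not the exact model.

```python
def makespan(durations, n_cores):
    """Greedy list schedule: each task goes to the core that becomes free first."""
    free_at = [0.0] * n_cores
    for d in durations:
        core = free_at.index(min(free_at))
        free_at[core] += d
    return max(free_at)

tasks = [2, 2, 3, 2, 2, 3, 15, 2]            # hypothetical execution times (ms)

so_order = tasks                              # Static Ordering: tasks in issue order
ljf_order = sorted(tasks, reverse=True)       # Longest Job First: longest tasks first

print(makespan(so_order, 3))   # 19 ms: the long task starts late and dominates the tail
print(makespan(ljf_order, 3))  # 15 ms: the long task starts first, short ones fill gaps
```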

VI. DEVELOPMENT PROCESS AND REQUIREMENTS
Two metrics are used to determine the value of a model: one is the model accuracy, the other one is its complexity. As discussed in the previous chapter, accuracy is generally gained at the cost of complexity, and bottlenecks are analyzed to balance the two. To assess the accuracy, the model should be validated and then calibrated. However, it is not possible to validate a model in the design phase when the targeted system has not yet been realized. Section 2 in [2] states that development with gradual iterative steps on an existing model, as shown in Figure 2, can establish a certain amount of trust, determined by the cost-to-validation. A model that can be easily validated has a low cost-to-validation; otherwise, it has a high cost-to-validation.

In this project, the single-threaded F-path application running on a multi-core platform is the existing system that should be modeled by DES and validated at the very beginning. According to the requirements of the project, this model is to be gradually enhanced to enable it to predict the performance of the system in various environments. For the existing F-path system, validation of this model can be done easily, and the measurements used for validation can also be used to calibrate the model to increase its accuracy. Adding parameters to the built model eases later modifications of the model. The requirements for the F-path model are to predict the processing time in environments where:

1. more than one core is applied to process the handler tasks in Figure 4
2. different bitmaps are imported to the application
3. the application runs on a different platform

The combined model can then be constructed from the validated P-path and F-path DES models, but it has a really high cost-to-validation, as there is no accomplished combined system yet. Section 2 in [2] claims that a certain amount of trust should be given to the predicted results from a model with a high cost-to-validation. This means that if this combined model indicates the throughput cannot be met, then the combined system should not be developed. If the predictions state that the throughput can be satisfied, then after the realization of the system the combined model can be further validated and calibrated, to be used for the next design iterations. Actually, the trust level given to this non-validated model comes from two aspects. One is the accuracy of the models constituting this model, which should be validated before combining them. The other aspect is a reasonable explanation of the influence when two systems share the same resources. This explanation is based on the understanding of the system, and mainly answers two questions: firstly, whether there is an influence on the performance, and if yes, how this happens; secondly, how much it affects the performance. Unfortunately, it is not possible to accurately quantify the influence without the existence of a system. Hence, the performance can only be inspected

by setting a possible scaling interval for the shared resources based on the developer's experience. In this report, an upper bound on the utilization of shared resources is set to adjust the influence on the processing time of each pipeline. This upper bound is derived from the observation and analysis of F-path in Chapter XI. With this variation of the influences, the answer to the question whether the throughput meets the constraints may not be just yes or no, but a probability distribution.

VII. TEST-SETS AND PLATFORMS FOR F-PATH
To validate the F-path DES model, the processing time of the F-path application on bitmaps should be measured. The current F-path application has the ability to record the start time and end time of the tasks shown in Figure 4, which makes the validation and calibration of the F-path model possible. Currently, two platforms are available to run the F-path application – an Intel Core i7-6700 and an Intel Xeon E5-2650. The i7-6700 has 4 cores in one CPU, while the Xeon E5-2650 has 8 cores in one CPU and two CPUs packaged. Both platforms have a hyper-threaded mode; in that mode, the number of hardware threads in every core is doubled. Hence, measurements are done on the i7-6700 from 1 core to 3 cores with and without hyper-threaded mode (1 hardware thread to 6 threads), and from 1 thread to 30 threads on the Xeon E5-2650. One core is assigned to deal with the master thread and other OS activities. To ensure the accuracy of the measured data (without the interference of other activities), not all available cores are assigned to the bands' processing task.

The architecture of the current platforms helps to derive the possible contention in the system. For instance, if the L2 cache is shared between cores, then contention happens when cores ask for accesses to that cache at the same time. Figure 7 provides the block diagram of the NGMP, which reflects the overall architecture of the current platforms. Any platform explored in the future can be assumed to follow this NGMP architecture. From this figure, it can be seen that on any NGMP-like platform, the contention between cores concentrates on the interconnect and the shared LLC.

Figure 7. Overview Architecture of the targeted platforms

Figure 8, derived from the descriptions in [16] and [17], provides insight into the connections between the cores and the LLC by a ring bus, which is the interconnect applied in both the i7-6700 and the Xeon E5-2650 mentioned above. Figure 8(a) shows how the ring bus connects the resources in a four-core

processor (i7-6700). It can be seen that there are red blocks located on the ring bus, which are the stops connecting each resource. Information transfer can be pipelined with the help of these stops. For instance, while core 0 is talking to core 1, core 2 can at the same time transfer data to the LLC over the ring bus. Figure 8(a) also shows that there are four stops to each LLC slice, which increases the number of pipeline stages and helps to increase the throughput. To make the processors scalable, when there are more than four cores in a processor, the cores are arranged as in Figure 8(b), which takes an 8-core CPU (Xeon E5-2650) as the example. It can be seen that there are still 8 stops on the ring bus helping to transfer information between cores and LLC. Each core in this architecture has a direct connection to one LLC slice through the same stop located on the bus, so it should be slightly quicker when a core talks to its own slice than to slices belonging to other cores. Such an understanding of the processors helps to model the specifications of the interconnect in the DES model.
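As an illustration of how this ring-bus insight could be encoded in the interconnect model, the sketch below charges a latency that grows with the number of ring stops between a core and an LLC slice. The hop counts and cycle costs are invented, not Intel's actual figures.

```python
def ring_hops(src_stop, dst_stop, n_stops):
    """Hops on a bidirectional ring: take the shorter of the two directions."""
    clockwise = (dst_stop - src_stop) % n_stops
    return min(clockwise, n_stops - clockwise)

def llc_access_latency(core, slice_idx, n_stops=8, base_cycles=30, cycles_per_hop=2):
    """Core i and LLC slice i share the same ring stop, so accessing the local slice
    adds no hops, while remote slices add a few cycles per hop travelled."""
    return base_cycles + cycles_per_hop * ring_hops(core, slice_idx, n_stops)

print(llc_access_latency(0, 0))   # local slice: 30 cycles
print(llc_access_latency(0, 4))   # farthest slice on an 8-stop ring: 38 cycles
```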


Figure 8. Pipelined bi-directional ring bus connecting four cores (i7-6700) and eight cores (Xeon E5-2650)

There are five test-sets used by the software team to test the F-path application, and they will also be applied to validate the DES model. The five test-sets are listed as set-B, set-G, set-M, set-S, and set-W. These test-sets contain a number of bitmaps obtained after performing the P-path application. As the handlers in the F-path application can run in parallel to increase the throughput, these bitmaps are divided into bands and dispatched to different threads. In our experiments, each bitmap is divided into 23 bands, which is used as the reference. Other divisions are also possible, but are not presented in this report. Further research could address predicting the processing time when bitmaps are divided into a different number of bands.

VIII. INTRODUCTION ON THE DES MODEL COMPONENTS
In this project, the F-path DES model to be developed follows a framework in which both a runtime library and abstract models for tasks and platforms are provided. Figure 9 shows this framework, taken from Fig. 3 in [14]. There are two layers in this framework: one layer is untimed, where an application and its platform are modeled; the other layer is timed, where a series of states helps to determine the progress

of the application. In the untimed layer, the application block contains the task graph (also known as the data flow) of the F-path shown in Figure 4. The drive is simulated by a task that generates the bands' loads of each bitmap. Each task has a state engine which is driven by the task dynamics, and knows the resources to which it is mapped. The platform block provides the specification of resources, like the speed of the CPU or the size of the memory. When the simulation of this model starts, the task dynamics checks which tasks are ready for execution, and allows them to be issued to the corresponding resources. The internal scheduling algorithm embedded in each resource of the platform block decides the resource utilization each ready task can claim. For instance, a task obtaining full utilization of one core can be processed at the maximum speed of the CPU, while a task getting zero utilization cannot be processed yet. The resource dynamics block determines the resources' states at any time, such as the available space and speed, in order to translate the model's abstract time into real-time latency. To be more specific, the untimed layer in Figure 9 is static and has no relationship with the global time clock. Any modeled task and resource in this layer only uses abstract units. For example, the speed of a computational resource – the multi-core CPU – is represented in abstract_time_unit/load_unit instead of a global time unit, such as seconds per load_unit. As this model simulates the true latency, the resource dynamics translates such model abstract time into global time. Hence, the latency to process each band derived in this model is generated by

workload (in load_unit)
× abstract time / workload (in abstract_time_unit / load_unit)
× global time / abstract time (in seconds / abstract_time_unit)          Formula (1)

In this formula, the scheduling algorithm determines the abstract time / workload factor of each issued task, and the resource dynamics determines the global time / abstract time factor of the resource.

Figure 9. DES model framework
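The sketch below shows Formula (1) as it could look inside the resource dynamics; the function and parameter names are placeholders, not the actual framework API of [14]. The penalty factor anticipates the scaling introduced in the next paragraphs.

```python
def band_latency(workload_load_unit,
                 abstract_time_per_load=1.0,    # determined by the scheduling algorithm
                 global_time_per_abstract=1.0,  # determined by the resource dynamics
                 penalty_factor=1.0):           # scaling added by the penalty model
    """Formula (1): workload x (abstract time / workload) x (global time / abstract time).
    A penalty factor > 1 makes the resource effectively slower in the modelled environment."""
    return (workload_load_unit
            * abstract_time_per_load
            * global_time_per_abstract
            * penalty_factor)

# Raw F-path model: both conversion factors are 1, so the predicted latency simply
# equals the measured single-thread band time imported as workload.
print(band_latency(4.2))                        # 4.2
print(band_latency(4.2, penalty_factor=1.15))   # about 4.83: band slowed by 15% contention
```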


The explosion block shown in Figure 9 is the introduced penalty model, which refines the resource dynamics described above. The current resource dynamics only tracks and adjusts the CPU frequency with varying voltage, as DVFS is applied in the CPU, and only two internal states can be reached in this resource dynamics: one with high CPU frequency and high energy consumption, the other with low frequency and low energy consumption. Although such an automaton may be coarse, DVFS is not what the datapath system focuses on. As described in Chapter VI, the model should predict the time to process different bitmaps in different environments. Changes in voltage, however, are not among the requirements, and they are not the cause of the processing time variation in the environments under estimation. So a penalty model is derived and added to the resource dynamics to capture the possible contention and other penalties occurring in this datapath system, by scaling the global time (in seconds) / abstract time (in abstract_time_unit) factor in Formula (1). Although it is called a penalty model, according to requirement 3 in Chapter VI it should also assist the resource dynamics in translating the properties of the modeled resources, e.g., the maximum speed of the multi-core CPU in the platform block.

In this report, two design approaches – top-down and bottom-up – are used to generate this penalty model. In the top-down approach, an overview of the penalty model is investigated. As the resulting penalty model is realized by a lookup table, it is called the static penalty model. In the bottom-up approach, the detailed factors that affect the system are simulated; by integrating these influences, a penalty model, called the dynamic penalty model, can be built to scale the resource utilization properly. The static penalty model is easier to realize than the dynamic penalty model, as all the factors can remain hidden and do not necessarily need to be analyzed. However, since the static penalty model directly relates each output (the measured processing time here) to its corresponding input, the same measurement process has to be performed for each different environment to rebuild that model. This increases the time cost of development. And it is not possible to build such a model when there is no existing system, as no measurements can be done to fill the lookup table. The dynamic penalty model is more flexible towards all kinds of environments: once it is built based on limited data, no more measurements need to be done. It is also a better option when there is no existing system to validate the model against. The weak point of this choice is the complexity of analyzing the influence of each possible factor, which makes its construction difficult. In this report, both penalty models for the F-path application are built, but for the later combined datapath modeling, the dynamic penalty model is preferred due to its reusability.
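A minimal structural sketch of the two resulting penalty models (class names, environment keys, and factor values are invented for illustration): the static model is essentially a lookup table keyed by the environment, while the dynamic model composes its factor from modelled effects and can therefore extrapolate to environments that were never measured.

```python
class StaticPenaltyModel:
    """Top-down: one measured scale factor per environment (lookup table)."""
    def __init__(self, table):
        self.table = table                      # (n_cores, hyper_threading) -> factor
    def factor(self, n_cores, ht):
        return self.table[(n_cores, ht)]        # fails for environments never measured

class DynamicPenaltyModel:
    """Bottom-up: the factor is composed from modelled effects (illustrative numbers)."""
    def __init__(self, llc_penalty_per_core=0.01, ht_penalty=0.12):
        self.llc_penalty_per_core = llc_penalty_per_core
        self.ht_penalty = ht_penalty
    def factor(self, n_cores, ht):
        f = 1.0 + self.llc_penalty_per_core * (n_cores - 1)  # contention grows with cores
        if ht:
            f *= 1.0 + self.ht_penalty                       # two threads sharing one core
        return f

static = StaticPenaltyModel({(1, False): 1.00, (8, False): 1.09, (8, True): 1.22})
dynamic = DynamicPenaltyModel()
print(static.factor(8, True))    # 1.22, but only environments present in the table work
print(dynamic.factor(12, True))  # extrapolates to an unmeasured 12-core HT setup
```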

IX. THE F-PATH MODEL WITHOUT A PENALTY MODEL
As described in Chapter VI, development in small iterative steps making use of a validated model helps to establish a certain accuracy. Hence, the first step leading to the desired F-path DES model is creating a DES model which simulates the single-threaded F-path application running on one platform, the Xeon E5-2650. It follows a simplified V-model development process:

1. Analyze the requirements. For this model, this means estimating the latency of the F-path system running in a single thread.
2. Design a DES model and implement it. As the framework of the DES model (Figure 9) is given, in this phase only the task graph of the F-path system has to be modeled. The scheduling algorithm embedded in the single processing resource of the framework (the model of the multi-core CPU) is an abstract piece-wise linear relationship to the number of tasks. That is, as the number of tasks issued to the same core in the processor grows, the utilization of this core spared to each task decreases


linearly. This is not what happens in the F-path system: when there are multiple tasks asking for the same core concurrently, only one task gets access to this core, and the other tasks have to wait until it finishes. Hence, the scheduling algorithm inside the provided multi-core CPU model should be replaced (a small sketch of the difference follows this list).
3. Validate and calibrate this DES model. By measuring the processing time of the F-path system for the available bitmaps, it is easy to compare the predicted latency to the measured latency. The model can be calibrated if large errors occur between these values.
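The sketch below contrasts the two core-sharing policies mentioned in step 2 for the simplest case of tasks started together on one core; it is a stand-alone illustration, not the framework code of [14].

```python
def finish_times_shared(durations):
    """Abstract piece-wise linear sharing: n unfinished tasks each get 1/n of the core."""
    remaining = list(durations)
    finished = [None] * len(durations)
    t = 0.0
    while any(f is None for f in finished):
        active = [i for i, f in enumerate(finished) if f is None]
        share = 1.0 / len(active)
        step = min(remaining[i] for i in active) / share   # time until the next completion
        t += step
        for i in active:
            remaining[i] -= step * share
            if remaining[i] <= 1e-9:
                finished[i] = t
    return finished

def finish_times_exclusive(durations):
    """F-path behaviour: one task owns the core; the others wait in FCFS order."""
    t, finished = 0.0, []
    for d in durations:
        t += d
        finished.append(t)
    return finished

print(finish_times_shared([4.0, 4.0]))     # [8.0, 8.0]: both tasks crawl along together
print(finish_times_exclusive([4.0, 4.0]))  # [4.0, 8.0]: the first task is not slowed down
```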

From Formula (1), three separate factors need to be determined to estimate the latency of each task. However, it is hard to quantify the workload and the resource utilization in both abstract and concrete units. Hence, when implementing the model, the measured latency of each band acts as the input workload to this model, with the same quantified value but expressed in load_unit. The other two factors are simplified to 1 abstract_time_unit/load_unit and 1 millisecond/abstract_time_unit. So only one corresponding parameter is included in the DES model – the task load; the other two speed-related properties are set to the constant value one. While the penalty model mainly focuses on scaling the resource capacity to simulate the influences when the environment changes, it is not necessary here, as this model only predicts the performance in one single-threaded environment; all penalties in this environment are already included in the measured bands' processing times. On the other hand, from this derivation it can also be seen that the contention/penalties when changing the environment are hidden inside the bands' processing times. Therefore, even without a penalty model, this model can be validated in a different environment, as long as the workload is imported correctly; it is therefore called the raw F-path model. In this way, however, the model predicts the entire processing time of the bitmaps only with the help of measurements of the bands' processing times. This is not desirable for a predictive model, as too many measurements have to be performed to obtain the correct workload, and the model cannot predict anything when the bands' processing times are unknown. As a result, this model is built mainly to check the general structure of the F-path DES model, i.e., whether by enhancing this model (adding a penalty model) the ability to predict the processing time of bitmaps consisting of bands with known processing times can be achieved.

As discussed above, step 3 now aims to validate the structure of this general DES model. To enable such validation, the bands' processing times of the five test-sets are measured on the Xeon platform, for different numbers of cores applied to process the bands, and used as workload input. Two parameters are added, indicating the number of cores used and whether the processor is in hyper-threaded mode or not. But before importing the measured times, it is necessary to check whether there is a large deviation between different runs within the same environment. If there is a large deviation, it is not possible to summarize the performance of the F-path application in one model. To determine this, the bands' processing times running on a single core, with and without hyper-threading, on the Xeon are investigated in depth.

When all the bands are processed in sequence by one single thread, this core is fully utilized by the task that processes the bands. No contention happens between cores on shared resources (with regard to the F-path application). Hence, processing time variation can only be caused by the cache misses and replacements due to this core's own operations and by bad speculation, which are influenced significantly by the application. Measurements are carried out four times to compare the bands' processing times among different runs in this single-thread situation. Table 1 shows the number of bands with different time differences compared to the first test. It can be seen that between the four measured results, at most 1%

of the total of 10465 bands have a time deviation larger than 4%, which is quite minor considering the uncertainty of the speculation and the cache behavior. The same research is done when using one hyper-threaded core, and the results are shown in Table 2. It can be seen that the measured times show more deviation between runs than the results in Table 1. This is reasonable because in the hyper-threaded mode, two hardware threads are available in the same core, so bands can be processed in two parallel threads, and these two threads share the same memory hierarchy. Contention can happen between these two threads at any memory level and on the bus used for transportation. This explains the larger variation in the bands' processing times than in the no hyper-threaded mode. Despite this effect, most of the processing times of different runs are quite similar, as indicated in Table 2. Only around 3% of the bands show differences greater than 5%, and only around 1% of the bands show differences over 8%. No further research is performed on other environments, but a glance at the measured bands' processing times of different runs in those environments shows that the processing time of the same band deviates little between runs. Hence, it can be concluded that within the same environment the performance of the F-path application is roughly the same, and any run of the application provides representative processing times in both hyper-threaded and no hyper-threaded mode.

Table 1. The number of Bands with Time Differences within one single thread

diff.   test 2 vs. test 1   test 3 vs. test 1   test 4 vs. test 1
<=2%    9279 (88.67%)       9370 (89.54%)       9365 (89.49%)
<=3%    10101 (96.52%)      10145 (96.94%)      10159 (97.08%)
<=4%    10386 (99.25%)      10378 (99.17%)      10422 (99.59%)
<=5%    10440 (99.76%)      10428 (99.65%)      10434 (99.70%)
<=6%    10452 (99.88%)      10431 (99.68%)      10437 (99.73%)
<=7%    10456 (99.91%)      10435 (99.71%)      10442 (99.78%)
(cells: number of bands whose processing time differs from test 1 by at most the given percentage)

Table 2. The number of Bands with Time Differences within one hyper-threaded core

diff.   test 2 vs. test 1   test 3 vs. test 1   test 4 vs. test 1
<=2%    8987 (85.88%)       8045 (76.88%)       7926 (75.74%)
<=3%    9769 (93.35%)       9254 (88.43%)       9114 (87.09%)
<=4%    10138 (96.88%)      9865 (94.27%)       9801 (93.66%)
<=5%    10276 (98.19%)      10155 (97.04%)      10147 (96.96%)
<=6%    10332 (98.73%)      10291 (98.34%)      10282 (98.25%)
<=7%    10352 (98.92%)      10343 (98.83%)      10322 (98.63%)
<=8%    10365 (99.04%)      10362 (99.02%)      10342 (98.82%)
<=9%    10383 (99.22%)      10373 (99.12%)      10356 (98.82%)
<=10%   10395 (99.33%)      10391 (99.29%)      10371 (99.10%)
<=11%   10406 (99.44%)      10402 (99.40%)      10385 (99.24%)
(cells: number of bands whose processing time differs from test 1 by at most the given percentage)
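Tables 1 and 2 can be produced with a straightforward counting script; the sketch below assumes the per-band times of a reference run and a comparison run are available as equal-length lists (hypothetical values shown).

```python
def deviation_counts(reference_ms, other_ms, thresholds=(0.02, 0.03, 0.04, 0.05, 0.06, 0.07)):
    """Cumulative counts of bands whose relative time difference to the reference
    run stays within each threshold, as in Tables 1 and 2."""
    rel_diff = [abs(o - r) / r for r, o in zip(reference_ms, other_ms)]
    total = len(rel_diff)
    result = {}
    for t in thresholds:
        within = sum(d <= t for d in rel_diff)
        result[t] = (within, 100.0 * within / total)
    return result

# Hypothetical band times (ms) of test 1 (reference) and test 2 for the same bands.
test1 = [4.1, 3.9, 5.2, 4.8, 6.0]
test2 = [4.2, 3.9, 5.3, 4.9, 6.5]

for threshold, (count, pct) in deviation_counts(test1, test2).items():
    print(f"<={threshold:.0%}: {count} bands ({pct:.2f}%)")
```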

In addition to the time to process the bands by the handler tasks, the entire latency consists of the pre-processing time of the schedule&dispatch task and the post-processing time of the assembler task for each bitmap, shown in Figure 4. Figure 10 shows the average pre-processing time per bitmap and its

deviation range when applying a different number of cores on the Xeon. From this figure, it can be seen that the time spent on schedule&dispatch is nearly the same when applying a different number of cores. This makes sense, as it is done at the very beginning by the master thread, which cannot be affected by tasks executed by its forked threads, since these have not yet been created at that time. This pre-processing time is only affected by the capacity of the platform it runs on, such as the speed of the CPU. So a constant value of 5.1 load_unit is applied as the workload for the schedule&dispatch task in any environment on the same platform. The assembler task in Figure 4 actually consists of tasks that configure the connected printer and correct the firing patterns injected into the printer. The configuration task contains activities such as setting the paper size and ink color, which are properties set when printing files. The time spent on configuration should hence be similar no matter how many cores are used in the application, while the time to correct the firing patterns depends on the complexity of each input, which is hard to represent. Besides, the time spent on the assembler task, including configuration and correction, is usually much less than the time spent on the bands themselves. Considering the complexity of building a predictive model of the correction task and the small gain in accuracy for the entire F-path application, it is not worthwhile to build a predictive model for the post-processing time. In the DES model, it is represented as 1 load_unit for each bitmap. As a result, the times predicted by the following DES models cover the period from the schedule&dispatch task to the latest finish time of the handlers shown in Figure 4.


Figure 10. Average pre-processing time and its deviation range for different numbers of cores used, with no hyper-threading (left) and with hyper-threading (right) applied

Figures 11 and 12 show the average error compared to the measured F-path application running time over all five test-sets, in no hyper-threaded and hyper-threaded mode respectively. The middle blue lines in Figures 11 and 12 represent the average differences; the dots below and above these lines indicate the standard deviation range. It can be seen that although the deviation range is relatively wide at some points, this raw F-path model still proves its accuracy with a quite minor average error (0.71% in no hyper-threaded mode and 0.62% in hyper-threaded mode). To be more convincing: the worst-case prediction error is 2.6% in no hyper-threaded mode when using 14 cores, and 1.37% in HT mode when 7 cores are applied, which is quite accurate for the early design phase. Hence, this generic model is validated and does not need further calibration.


To further enable the ability to predict, a penalty model is added to the resource dynamics to scale the utilization of the computation resources (the millisecond/abstract_time_unit factor) in the platform. As discussed, the accuracy of this generic model depends on the accuracy of the imported workload. From Formula (1), changes in the workload can be reflected by scaling the resource utilization, either the abstract or the concrete one. So actually, the purpose of scaling this utilization is to predict the workload in different environments without measuring the bands' processing times in each environment.


Figure 11. Average error of the raw F-path model compared to the measurement time in no HT mode


Figure 12. Average error of the raw F-path model compared to the measurement time in HT mode


X. F-PATH DES MODEL WITH THE STATIC PENALTY MODEL
As indicated in the last chapter, the penalty model used to scale the resource utilization reflects the changes in the workload. As seen in Figure 13, the bands' processing times differ greatly when applying a different number of cores. It is not possible to get an accurate prediction when only the bands' processing times in one single-thread environment are measured. In the following chapters, the bands' processing times in this single-thread environment are used as the baseline input. With this input and the built penalty model, the DES model of F-path is expected to predict the processing time of bitmaps in the various environments mentioned in Chapter VI.

A top-down design approach is used in this chapter to obtain a static penalty model. In this design approach, the static penalty model acts as a black box: its input is the change in the model parameters representing the environment, and its output is the scale factor for the concrete speed of the computational resources. As the scale factor is derived from (band's processing time in a different environment) / (band's processing time in the baseline), and the bands' processing times can be seen as the workloads (same value, in load_unit) of the tasks, the static penalty model can be simplified to a black box with the baseline as its input and the scaled band's workload in a different environment as its output. The resulting scale factor for each environment is used to scale the resource utilization, either in the abstract or in the concrete unit.
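A minimal sketch of how such a scale factor could be derived and applied (the band times below are invented; as described next, the actual model averages over all bands of the five test-sets):

```python
# Measured band times (ms): single-core baseline vs. some target environment.
baseline    = {"band_0": 4.0, "band_1": 6.0, "band_2": 2.5}
environment = {"band_0": 4.4, "band_1": 6.9, "band_2": 2.7}   # e.g. 8 cores, no HT

# Per-band scale factors: time in the environment / time in the baseline.
factors = {b: environment[b] / baseline[b] for b in baseline}

# Static penalty model entry for this environment: one averaged factor.
avg_factor = sum(factors.values()) / len(factors)

# Prediction for a band that was only measured in the baseline environment.
predicted = baseline["band_2"] * avg_factor

print({b: round(f, 3) for b, f in factors.items()})  # {'band_0': 1.1, 'band_1': 1.15, 'band_2': 1.08}
print(round(avg_factor, 3), round(predicted, 3))     # 1.11 2.775
```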

The simplest way to obtain this penalty model is applying a lookup table, by analyzing the growth trend of the different bands' processing times. The growth trend here is the relationship between the processing time of the same band and the increasing number of cores applied on the Xeon E5-2650. As there is a large amount of measured data (there are 10,465 bands in the 5 test-sets and 15 usable cores in the Xeon E5-2650, with and without the hyper-threaded option, giving more than 300,000 points in total), it is hard and a waste of time to analyze the time increase trend of each band individually. Averaging the target data helps to simplify the analysis of the growth trend.

In Figure 13, each dot on the blue lines represents the average difference of all the bands in the test-sets with respect to the single-core baseline. The dots below and above the blue lines mark the corresponding standard deviation range. It can be seen that, in general, the time to process bands increases when more cores are applied. This is as expected, because some resources, like the LLC, have a constant capacity and are shared between the applied cores, and replacements in the LLC can become more frequent as the number of cores used increases. Figure 13 also shows that the increase has deviations that cannot be neglected, especially when using 15 cores in no hyper-threaded mode (marked by the red ellipse). This means that the average increase cannot represent the trend of all the bands' processing time. Consider the example given in Figure 5. Between timelines C and D, four bands (green bars) are processed in parallel. If the cores processing these bands request access to the shared bus or LLC at the same time, these shared resources have to schedule the requests, so some requests are handled earlier than others. Besides, a miss in the LLC means that the required data has been evicted by other data, so an LLC miss for one core can coincide with an LLC hit for another core. Moreover, as seen in Figure 5, around half of the band processed by core No. 3 overlaps with other bands, so within the same band the processing time may increase for only part of the band while the rest keeps the same processing time. All these factors lead to the deviation shown in Figure 13. However, the deviation range at 15 cores shown in Figure 13 (left) is much wider than the deviation range at other numbers of cores. It is observed that every bitmap contains one band whose processing time has increased by more than 75% compared to the baseline. The standard deviation at this point decreases to 2.75% when this band of each bitmap is excluded. Despite these deviations, the average increase with respect to the single core is still filled into a lookup table to model the penalty as a start. This is because longer and shorter predicted bands' processing times may compensate each other and thus result in an accurate prediction. Figure 14 provides an example of how this works.
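Before turning to that example, a minimal sketch of such a lookup-table penalty model may help; the table values and function names below are placeholders, not the measured averages of Figure 13, and only the no hyper-threaded case is shown.

```python
# Hypothetical static penalty model: a lookup table mapping the number of
# cores used to the average increase factor of a band's processing time
# relative to the 1-core baseline (placeholder values).
AVG_INCREASE_NO_HT = {1: 0.00, 2: 0.01, 3: 0.05, 4: 0.07, 6: 0.13}

def scale_band_time(baseline_ms: float, n_cores: int) -> float:
    """Scale a band's baseline (1-core) processing time to the target
    environment using the averaged increase factor from the lookup table."""
    return baseline_ms * (1.0 + AVG_INCREASE_NO_HT[n_cores])

# A bitmap is predicted band by band; the scaled times are then fed to the
# DES model, which schedules them over the available cores.
predicted = [scale_band_time(t, 6) for t in (12.0, 8.5, 10.2)]
```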

[Figure 13: average difference (%) vs. number of cores used (1–15); left: no HT, right: HT]
Figure 13. Average error compared to single-core bands' processing time in no HT (left) and HT (right)

[Figure 14: panels (a), (b), (c)]
Figure 14. An example of how the average increase factor compensates the processing time

Figure 14 (a) shows a bitmap containing 4 bands of different workloads, with one core used to process this bitmap as the baseline. Figure 14 (b) shows the imaginary result when 2 cores are applied to process it. The bands from left to right in Figure 14 (a) then have increments of 10%, 12%, 13% and 11% respectively. By averaging these increase factors, each band gets an estimated increment of 11.5%. Figure 14 (c) shows the processing time predicted by applying this increase factor to all four bands. The estimated processing time of this bitmap on a two-core environment is 5.575 units of time, while the true processing time indicated in Figure 14 (b) is 5.62 units of time; the error is only 0.8% in this example. The key to this approach is that most of the increase factors of the bands in the bitmaps lie within the standard deviation range, and the increments are approximately proportional to the workloads. These two prerequisites hold, based on the measured processing times, in most environments with different numbers of cores (the full data set is too large to show here). And even if there are extreme values inside the set of increase factors, as long as they do not influence the average value of the set much, the predicted time should still be quite accurate, which is the case for the 15-core environment in no hyper-threaded mode.
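The compensation effect of the Figure 14 example can be reproduced in a few lines of code; the individual baseline band times below are assumed (they are not given in the text) and are chosen only so that the quoted totals of 5.62 and 5.575 come out.

```python
# Hypothetical baseline band times (units of time); chosen so the totals
# match the quoted Figure 14 values.
baseline = [0.5, 0.5, 3.5, 0.5]
increments = [0.10, 0.12, 0.13, 0.11]          # per-band increase on 2 cores

true_total = sum(b * (1 + i) for b, i in zip(baseline, increments))   # 5.62
avg_factor = sum(increments) / len(increments)                        # 0.115
estimated_total = sum(baseline) * (1 + avg_factor)                    # 5.575

rel_error = abs(true_total - estimated_total) / true_total            # ~0.8%
print(f"true={true_total:.3f}, estimated={estimated_total:.3f}, "
      f"error={rel_error:.2%}")
```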

A. Predict the processing time of bitmaps with increasing number of cores

The charts in Figure 15 and Figure 16 indicate the average differences between the processing time of F-path predicted using the static penalty model derived from Figure 13 and the measured processing time. It can be seen in Figure 15 that the average errors lie between 0.54% and 3.03%, with the largest error on test set-B when 11 cores are used. In the hyper-threaded mode, as seen in Figure 16, the average differences vary between 0.26% and 1.63%, with the largest individual error being 3.36%, on test set-W when 6 cores are applied. From these results, it can be claimed that using the average differences shown in Figure 13 accurately predicts the processing time of the test-sets when a different number of cores is used on the Xeon platform.

[Figure 15: average differences of the whole test sets, difference (%) vs. number of cores used]
Figure 15. Average differences compared to the measured processing time in no HT mode


[Figure 16: average differences of the whole test sets, differences (%) vs. number of cores used]
Figure 16. Average differences compared to the measured processing time in HT mode

B. Predict the processing time of different bitmaps

The static penalty model above, built as a lookup table, is derived by studying the average increment of all five test-sets with an increasing number of cores used. It cannot guarantee the prediction accuracy when used for bitmaps that are not in the test-sets, so it is natural to think that not all bitmaps' processing times can be predicted with the above penalty model. However, previous work² has declared that the available test-sets are representative of most bitmaps. So if the model can accurately estimate the processing time of the bitmaps in the test-sets, it can be expected to be accurate for most bitmaps outside the test-sets as well.

C. Predict the performance on different platforms

In the static penalty model implemented as a lookup table, the processing time of each band is obtained by scaling the workload directly. From the measured processing time it is difficult to derive any further details of the system; this overview is the highest and only layer in the top-down approach. To translate the predicted latency from the Xeon E5-2650 to other platforms, it is necessary to scale the scale factors in the lookup table, for instance using the ratio between the base frequencies of the Xeon and the i7 to scale all the scale factors in the lookup table derived above. This factor can be set to any reasonable value, as long as it is deduced from available information, like frequency or PASSMARK score, and results in an accurate prediction. Besides, the same increasing-trend pattern should be observed on the other platform as on the Xeon; otherwise it is not possible to get accurate results with only one scale factor. The processing time of F-path on the i7-6700 is hence measured to find such a factor. Figure 17 compares the increase factors with respect to the 1-core baseline environment for the i7 (blue bars) and the Xeon (orange bars). It can be seen that the increase pattern on the i7 is totally different from the increase pattern on the Xeon when applying two and three cores to run the F-path application. As discussed, the accuracy of the model depends on the accuracy of the predicted workload. This indicates that for each mode (hyper-threaded and no hyper-threaded) and for each number of cores, a separate factor would be needed to translate the scale factors in the lookup table generated for the Xeon into a new table for the i7 in order to keep the prediction accurate. There is thus no single factor that scales the entire lookup table, and these per-environment factors cannot be derived from any available platform information. So there is no guarantee that the model can predict the performance of other platforms, as the scale factors are too specific to their own environment (same platform, same number of cores applied). Hence, a lookup table with scale factors is not able to make an accurate prediction of the performance of a non-existing system, which is one of the requirements. This is the reason for building a dynamic penalty model. With the detailed analysis of the bottom-up approach, more accurate results may be derived. This approach is discussed in the next chapter.

² All five test-sets have been used in multiple projects, and these test-sets contain best-, worst- and average-case bitmaps.
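A sketch of this (ultimately insufficient) single-factor translation is given below; the factor value, the table entries and the function name are placeholders, not measured data.

```python
# Hypothetical single-factor translation of the Xeon lookup table to another
# platform. 'factor' would be deduced from available platform information
# (e.g. the ratio of base frequencies or PASSMARK scores); the value used
# here is a placeholder.
def translate_lookup_table(xeon_increase: dict[int, float],
                           factor: float) -> dict[int, float]:
    """Scale every per-core-count increase factor by one platform factor."""
    return {cores: inc * factor for cores, inc in xeon_increase.items()}

i7_estimate = translate_lookup_table({2: 0.01, 3: 0.05, 6: 0.13}, factor=0.8)
# Figure 17 shows, however, that the i7 increase pattern differs per mode and
# per core count, so no single factor reproduces the measured i7 behaviour.
```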

[Figure 17: bar chart of average difference (%) per mode and core count for the i7-6700 and the Xeon E5-2650]
Figure 17. Average differences of the bands' processing time when two and three cores are applied, compared to the 1-core baseline environment, on the i7-6700 and the Xeon E5-2650

XI. F-PATH DES MODEL WITH THE DYNAMIC PENALTY MODEL

Two sections are included in this chapter. In Section A, the bottlenecks that occur when applying more than one core to process the application are analyzed in both hyper-threaded and no hyper-threaded mode on the Xeon platform. As the F-path application only records the processing time of bitmaps and of each band inside the bitmaps, it is difficult to inspect the bottlenecks that cause the increase shown in Figure 13. Therefore VTune is used. VTune is a tool that monitors the performance of any application by means of a number of hardware events, such as a counter that records the clock cycles of all LLC misses/hits. So from VTune we can get a deeper insight into the application, beyond just the processing time. The processing time measured when running the F-path application via VTune is not consistent with the processing time measured by just running the application, because it also includes the time to record the hardware events inserted by VTune. Despite this overhead, VTune has the ability to filter activities (threads, functions, call stacks). Hence, the tasks that process bands – the handlers – can be isolated from other tasks or unrelated activities to obtain the hardware events of the handlers. The analysis of the bottlenecks can then be performed based on the collected hardware events instead of just the bands' processing time. Besides, it should also be verified that there is not much variation between the clock cycles of different runs. As described in Chapter IX, the bands' processing time between different runs should be quite similar. While VTune does not record the actual processing time, the processing time can be represented by clock cycles. Although in general the processing time can be calculated as (clock cycles)/(processor frequency), Intel® uses a technology called Turbo Boost that accelerates the frequency of the processor, so the processing time usually cannot be derived from the processor's base frequency. Hence, if the clock cycles from different runs are similar, it is usually safe to state that the processing times via VTune are also similar between runs, and the hardware events collected from VTune can be analyzed. In Section B, the penalty model is realized with the bottom-up approach. This penalty model is then embedded into the basic DES model derived in Chapter IX, and the DES model with the penalty model is validated. In the bottom-up approach, the penalty model is designed in several layers. Sub-models in the lowest layer are derived from the bottleneck analysis, where the influences of these bottlenecks are quantified. Sub-models in the upper layers then gradually integrate the sub-models in the lower layers, finally leading to one penalty model that scales the resource utilization.

A. Hypothesis and Validation

1) No Hyper-threaded mode:

As said in the introduction of this chapter, the clock cycle counts of different runs should be quite similar to make the analysis of the hardware events possible. And as discussed in Chapter X-B, the available test-sets contain all the representative bitmaps. To speed up the experiments and ease the analysis, only one of the five test-sets – the bitmaps in set-B – is used as input to the F-path application. This means that all the experiments done via VTune in this chapter are based only on set-B. Table 3 presents the clock cycles of four different runs of the whole set-B when a different number of cores is used on the Xeon platform. It shows similar clock cycle counts across different runs when the same number of cores is applied. Hence, the data collected from VTune can be used to analyze the bottlenecks in the F-path application.

Table 3. Clock Cycles of different Runs provided by VTune

No. | 1 core applied | 2 cores applied | 3 cores applied | 4 cores applied | 6 cores applied
1   | 80,452,000,000 | 79,676,000,000  | 85,254,000,000  | 88,874,000,000  | 97,602,000,000
2   | 80,478,000,000 | 79,710,000,000  | 85,072,000,000  | 89,080,000,000  | 97,294,000,000
3   | 80,420,000,000 | 79,768,000,000  | 84,974,000,000  | 88,576,000,000  | 97,496,000,000
4   | 80,310,000,000 | 79,740,000,000  | 85,216,000,000  | 88,816,000,000  | 97,386,000,000

From Figure 13, it can be seen that there are two jumps in both the hyper-threaded and the no hyper-threaded mode: one from two cores to three cores, and the other from five cores to six cores applied to the F-path application. To find the bottlenecks occurring in these setups, experiments with VTune are performed. In the following analysis, experiments are performed only in the 1-core, 3-core and 6-core environments on the Xeon platform, as these are the points of interest. Before doing the experiments, the workload issued to each used core is inspected, as the clock cycles spent on each core directly relate to the workload issued to it. Table 4 summarizes the band size in bytes that each core should process according to the current distribution algorithm. The core identifier aligns with the identifier in VTune for easy comparison (it may not be the same as the physical id). From this table, it can be seen that with up to five cores applied, the workload issued to each core is so similar that the workload distribution can hardly cause processing time variation. When using more than five cores, the workload distribution is not that even. Hence, workload distribution can be taken into consideration when analyzing the performance in the 6-core environment, although it probably has little influence on the total execution time given the small deviation in the issued workload.

Table 4. Size of bands distributed to each core with increase in the number of cores

bands' size distribution when applying a different number of cores (Set-B), unit: bytes

      | 1 core   | 2 cores  | 3 cores  | 4 cores  | 5 cores  | 6 cores  | 7 cores  | 8 cores
core6 |          | 3,02E+08 | 2,06E+08 | 1,53E+08 | 1,24E+08 | 1,01E+08 | 82834340 | 74429168
core5 |          | 3,02E+08 | 1,99E+08 | 1,55E+08 | 1,23E+08 | 1,03E+08 | 82789230 | 82017685
core4 |          |          | 1,99E+08 | 1,48E+08 | 1,18E+08 | 1,05E+08 | 89362466 | 84278095
core3 |          |          |          | 1,47E+08 | 1,21E+08 | 1,05E+08 | 92656032 | 82387427
core2 |          |          |          |          | 1,18E+08 | 96040371 | 90111884 | 78988484
core1 |          |          |          |          |          | 94280860 | 85473236 | 72903041
core0 |          |          |          |          |          |          | 80894505 | 64139739
core7 | 6,04E+08 |          |          |          |          |          |          | 64978054

Figure 18 (a) to (c) show the general exploration of the F-path application with one core, three cores and six cores applied; these figures only represent the exploration of the band-processing tasks ("handler" in Figure 2). Attention is now paid to the fourth column in Figure 18, where the CPI rates are indicated. In the case of similar retired-instruction counts, growth in the CPI rate can explain increments in the bands' processing time (the processing time can approximately be derived as CPI * retired instructions / CPU frequency). However, the CPI rate increments do not align with the growth in Figure 13. On average, the CPI rates in the 3-core and 6-core environments are 0.756 and 0.774 cycles per instruction. Compared to the CPI rate of 0.754 cycles per instruction in the 1-core environment, the growth in CPI rate is clearly not as high as shown in Figure 13. By accumulating the retired instructions shown in the third column of Figure 18, it can be derived that on average 5.68% and 17.86% more retired instructions are executed in the 3-core and 6-core environments respectively, compared to the 1-core environment. These extra retired instructions can come from three sources. The first is the overhead of multithreaded programming. Second, instructions loaded into the caches may be evicted before they can be executed, so they are fetched and decoded multiple times. Third, instructions are used to ensure cache coherency in the multicore environment (in the Sandy Bridge architecture, snooping is applied to solve this problem). It is important to figure out which of these is the main reason for the larger number of retired instructions when more cores are used.

From experience, the overhead of multithreaded programming should not be that large. To make sure, the 2-core experiment is also performed with VTune, shown in Figure 18 (d). The average number of retired instructions (averaged over different runs) is 106,474,300,000 in the 2-core environment, while for one core it is 106,709,500,000. The two numbers are almost the same (0.22% fewer for two cores than for one core). Multithreaded programming should impose a similar overhead on each used core, unless a different instruction library is used for environments with a different number of cores (which is not the case). We therefore conclude that the overhead of multithreaded programming is hardly the main source. From this decrease in the number of retired instructions, which aligns with the trend shown in Figure 13, it can also be concluded that fewer retired instructions are evicted in the two-core environment than in the one-core environment. This is because additional cache-coherency (snooping) traffic occurs cross-core when more cores are applied, so the retired instruction count would be expected to increase in the two-core environment when considering only cache coherency. However, the data derived from VTune show the opposite. The only explanation is that fewer evictions happen with two cores than with one core, and this outweighs the influence of cross-core cache coherency. From the above analysis, instruction evictions and cache coherency are the two main sources of variation in the number of retired instructions.

(a) one core used

(b) three cores used

(c) six cores used

(d) two cores used

Figure 18. General exploration from VTune

Not only instructions are evicted; loaded data can also be evicted from the caches. Evicted information that has not yet been used must be reloaded, so part of the increased processing time consists of the time spent reloading this information. Fortunately, the ring bus of the two available platforms has a separate ring for executing snooping (for cache coherency). In general, when information in the LLC is evicted, snooping between cores and reloading are performed simultaneously using different rings. Snooping stays within the same die, while information has to be reloaded from DRAM in case of LLC misses, so snooping takes much less time than reloading. As a result, cache coherency hardly influences the processing time, and evictions in the caches, shortened to "cache evictions", are the main reason for the increasing processing time.

Moreover, as seen in Figure 18 (b) and (c), there are two columns marked red by VTune, which indicate loads blocking stores. This occurs when a prior store in flight is writing data that a load wants. The load then blocks the store from forwarding to the lower memory hierarchy and retrieves the data. However, as the store cannot forward until the load finishes, there is a possibility that the load wants much more data than the store is writing. In this case, the store operation is pending until the load operation has read all the data, which may be located in remote DRAM. This causes additional time for store operations. To see whether this is a real bottleneck that influences the performance, Table 5 is derived, which indicates the clock cycles spent on loads blocking stores. It can be seen that with an increase in the number of cores applied, these clock cycles (time) indeed increase in general. This is expected, as more instructions are loaded with more cores. However, it can also be seen that the clock cycles of different runs deviate significantly, especially for the 4-core and 7-core environments. From the processing time measured by the F-path application, there is little deviation in the processing time between different runs when using the same number of cores on the same platform. So although the loads-block-stores time increases with the number of cores, it does not contribute significantly to the processing time of the F-path application; otherwise, with such deviation in the loads-block-stores values in the same environment, the processing time would also deviate a lot, which is not the case. Hence, it can be concluded that the loads-block-stores factor is not a bottleneck for the F-path application.

Table 5. Clock Cycles *1,000,000 that Loads block Stores takes on different runs

        | 1st run | 2nd run | 3rd run | 4th run
1 core  | 3380    | 3380    | 3298    | 3131
2 cores | 3440    | 3427    | 3147    | 3670
3 cores | 5854    | 5927    | 5551    | 6164
4 cores | 6309    | 5344    | 6443    | 5194
6 cores | 7711    | 7589    | 7361    | 7499
7 cores | 6302    | 8445    | 7484    | 7373
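To make the run-to-run deviation quantitative, the coefficient of variation of the Table 5 rows can be computed; this is an illustrative check added here, not an analysis performed in the thesis.

```python
from statistics import mean, pstdev

# Clock cycles (x 1,000,000) that loads-block-stores takes, from Table 5.
loads_block_stores = {
    "1 core":  [3380, 3380, 3298, 3131],
    "2 cores": [3440, 3427, 3147, 3670],
    "3 cores": [5854, 5927, 5551, 6164],
    "4 cores": [6309, 5344, 6443, 5194],
    "6 cores": [7711, 7589, 7361, 7499],
    "7 cores": [6302, 8445, 7484, 7373],
}

# Coefficient of variation (std/mean) per environment; the 4-core and 7-core
# rows show roughly 10% spread, far larger than the run-to-run spread of the
# measured band processing times.
for env, runs in loads_block_stores.items():
    print(env, f"{pstdev(runs) / mean(runs):.1%}")
```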

From these analyses, the increase in the bands' processing time is caused by instruction and data evictions, which happen more frequently when more cores are applied, as contention in the caches becomes more frequent. Cache misses lead to requests for loading instructions or data from a lower memory level (cache or DRAM), which in turn leads to contention on the interconnect. As a result, the increased part of the bands' processing time in no hyper-threaded mode comes from reloading information when cache evictions happen.

2) Hyper-threaded mode:

In the hyper-threaded mode, two hardware threads are available on the same core, so bands can be processed in two parallel threads, in theory doubling the throughput. However, each single hardware thread cannot use the full core (the computation resources are limited), which increases the time to process each band. Besides, these two threads share the same memory hierarchy, so contention between them can happen at any memory level and on the transport bus. To understand the influence of the hyper-threading technique on the application, experiments are performed on a single core with and without hyper-threaded mode, to exclude the effects of contention between cores. Two screenshots taken from VTune represent a general exploration of the threads that process bands. As mentioned, test set-B is used as input in the experiments, and only tasks that process bands are selected in VTune. The clock cycles shown in Figure 19 (a) and (b) represent the clock cycles to process all the bands issued to each thread. From Table 4, it is known that when using two cores in no hyper-threaded mode, the workload issued to each core is almost the same. This also holds for the 1-core environment in hyper-threaded mode: each hardware thread has to process almost the same number of bands. The clock cycles taken to process half the data by one hardware thread in no hyper-threaded mode should then be around 40,226,000,000 (80,452,000,000/2), as seen in Figure 19 (a). However, in hyper-threaded mode, the clock cycles needed in no hyper-threaded mode are on average only about 67.1% of the clock cycles taken by one hardware thread for the same amount of data (40,226,000,000/((59,944,000,000+59,942,000,000)/2)). If this factor derived from test set-B is applicable to the processing time of all bands in the whole test-sets, then by applying it to the measured bands' processing time in no hyper-threaded mode (dividing the processing time by this factor), the estimated bands' processing time in the 1-core hyper-threaded environment should be quite close. Comparing the measured bands' processing time in hyper-threaded mode with the time estimated by this operation gives on average only a 0.3% difference, with a 0.73% standard deviation. So without a deeper investigation of the possible bottlenecks, a fairly accurate estimate of the bands' processing time in the 1-core hyper-threaded mode can be given by just measuring the bands' processing time in 1-core no hyper-threaded mode, which is in fact the input to the DES model. However, it should be kept in mind that this factor includes all the detailed effects, like the core utilization of each thread and inner-core contention. When applying more cores to the F-path application, contention on shared resources can become more intense in hyper-threaded mode, as more threads then compete for them than in no hyper-threaded mode.

(a) one core applied in no HT mode

(b) one core applied in HT mode

Figure 19. General Exploration by VTune when one core is applied


Hence, in the hyper-threaded mode, the reasons for the increase in the bands' processing time are the same as in no hyper-threaded mode, but they now have more influence. In addition, the lower core utilization of each hardware thread in hyper-threaded mode compared to no hyper-threaded mode is another reason for the growth in the bands' processing time. When modeling, it should be ensured that the same influences are not duplicated in different forms, which would increase the prediction error.

B. Model Realization

In the bottom-up design approach, sub-models of detailed sub-systems are built that together constitute one model which captures the characteristics of the entire system and is able to predict its performance. In the previous section, the bottlenecks that could influence the system's processing time were inspected in order to avoid unnecessary complexity when building the sub-models. In other words, these bottlenecks are the targets to be simulated by the sub-models. The bottlenecks that have been discussed are summarized here:

1. Instruction and data evictions because of contention in the caches
2. Lower core utilization of each hardware thread in hyper-threaded mode than in no hyper-threaded mode
3. More evictions in hyper-threaded mode than in no hyper-threaded mode when the same number of cores is applied

The problem now is how to realize them as sub-models. Simulating all the techniques applied in the resources where bottlenecks happen is not realistic, and not meaningful. For example, the LLC could be modeled by its capacity, principle of locality, replacement policy and all other related techniques, with no required data available in the LLC at the beginning. To obtain the instruction and data evictions, in other words the LLC misses/hits, it is necessary to know how much data each thread asks for at a time, which is determined by the application, and the location of the required data in DRAM, which is determined by the operating system. Hence, to get an accurate number of LLC misses/hits, not only the cache itself but also the amount of data loaded each time and the memory-allocation techniques of the operating system would need to be simulated. This is labor intensive and not worth the cost. Besides, not all techniques used in the cache and the operating system are visible to customers. In this subsection, these bottlenecks are therefore represented by functions derived from statistical analysis which capture their main characteristics.

The time spent processing any application consists of two parts: 1) the computation time when cores execute instructions; 2) the communication time for loading and storing instructions and data. The bottlenecks listed above therefore result in two types of models that scale the utilization of the affected resources, based on which kind of time they influence. In the hyper-threaded mode, the computation time increases because of the lower utilization of the computation units in the cores, such as ALUs, by each hardware thread compared to no hyper-threaded mode (contention on the shared cores). So one sub-model scales the utilization of these computation resources. The information evictions result in an increase in the communication time, as evicted information needs to be reloaded. The interconnect is the medium for reloading this information; hence, the increase in communication time is actually the time spent transferring the information over the interconnect. This can be modeled as a change in the interconnect utilization: with lower interconnect utilization, more time is spent on communication. Figure 20 shows the sub-models based on the type of time the bottlenecks affect. They are categorized as computation-time-related and communication-time-related. These sub-models are organized in two layers: the lower layer, layer 0, contains the models representing the bottlenecks, and they lead to two models in the upper layer used to scale the utilization of the shared resources. In the computation-time-related zone, the computation resources' utilization of each thread relates to the hyper-threading technique, as discussed. There is only one sub-model in the upper layer of this zone, as the hyper-threading technique is not modeled itself, but represented as one factor in this computation resources' utilization sub-model. In the communication-time-related zone, there are two sub-models in layer 0. In principle, the communication time is a function of the distance from source to destination, the communication speed, and the size of the transferred information. When taking contention on the shared interconnect into consideration, the waiting time for access to the bus is also part of the communication time. As any miss in the first two cache levels that hits in the LLC costs much less time than a miss in the LLC, the influence of cache evictions mainly indicates the influence of LLC misses. The cache eviction model in Figure 20 simulates the increment in the number of cache misses in environments with a different number of cores compared to the 1-core environment. The waiting distance model indicates the time penalty of LLC misses, including the waiting penalty of these LLC misses (waiting for LLC misses requested by other cores at the same time). The interconnect utilization model is then a function of its constituent sub-models, which finally affects the communication time. As the purpose of the penalty model is to scale the bands' processing time in the 1-core environment to the time in different environments, the sub-models can be integrated into one model that generates the scale factor of the processing time. In the implementation, they are integrated into the existing multicore CPU model.

Figure 20. Penalty model built from bottom-up approach

The dynamic penalty model is built gradually in iterative steps, as indicated in Figure 2. In each iterative step, the DES model embedded with the developing penalty model enables a new capability based on the analysis of the requirements. The new capability expected to be achieved in each step is the capability to estimate the time to process bitmaps:

1. when a different number of cores in no hyper-threaded mode is used on one platform (discussed in subsection 1 of this section);
2. when a different number of cores in hyper-threaded mode is used on the same platform (discussed in subsection 2 of this section);
3. when a different platform is used to process the F-path application (discussed in subsection 3 of this section).

Since many derivations of formulas are discussed in the following subsections, before continuing, a table, Table 6, is given to show the meaning of symbols in the following paragraphs:

Table 6. Meanings of the symbols used in the following paragraphs

symbols | meaning
U   | utilization
#   | the number of
M   | cache misses
S   | snooping
L   | latency
I   | increment to the baseline
PA  | penalty of LLC misses
D   | penalty of LLC misses because of the waiting latency
BA  | bands
k   | scaling factor

superscripts | meaning
T   | target environment
B   | 1-core environment in the no hyper-threaded mode
HT  | hyper-threaded mode
P   | target platform

subscripts | meaning
i         | retired instructions
t         | processing time
cross     | cross-core
LLC       | LLC misses
waiting   | LLC misses waiting
c         | computation resources
inter     | interconnect resource
processor | multicore CPU resource
cores     | cores used

Symbols can be combined; for instance, $\#M^B$ represents the number of cache misses in the 1-core environment, and $U^{B-HT}_{inter}$ represents the utilization of the interconnect in the 1-core hyper-threaded environment. In any case, if B is used together with HT, it means the 1-core environment in the hyper-threaded mode.

1) The dynamic penalty model developed for capability 1

Model Analysis and Realization:

From Figure 20, it can be seen that if the hyper-threaded mode is not enabled, the sub-model for the computation resources' utilization does not need to be built. As discussed earlier, it is hard to model the real number of cache misses in the cache eviction model because of many constraints. So instead, the variation of the cache misses when applying a different number of cores, compared to the baseline in no hyper-threaded mode (the cache misses in the 1-core environment without hyper-threading), is simulated here.

From the bottleneck analysis in the previous section (Section A), the trend of cache misses can be derived from the increment of retired instructions taken from VTune. The number of retired instructions is affected by cache evictions and cache coherency. The number of snoops for cache coherency relates to the cache evictions, since snooping happens when data in the cache is replaced. So the variation trend of the number of snoops is assumed to be the same as the variation of the cache evictions, up to a certain scaling factor. Then the increment in the number of retired instructions indicates the trend of the cache evictions, in other words the increment in the number of cache misses, as seen in the following formula.

$$\frac{\#M^T + \#S^T - (\#M^B + \#S^B)}{\#M^B + \#S^B} \approx \frac{\#M^T + k\cdot\#M^T - \#M^B - k\cdot\#M^B}{\#M^B + k\cdot\#M^B} = \frac{(1+k)\cdot\#M^T - (1+k)\cdot\#M^B}{(1+k)\cdot\#M^B} = \frac{\#M^T - \#M^B}{\#M^B} = I^T_i \qquad \text{Formula (2)}$$

Cache evictions can be categorized into inner-core and cross-core cache evictions. Cross-core evictions happen in the LLC when different cores access the same location in the LLC, while inner-core evictions can happen at any cache level when data is evicted by other data loaded by the same core. Hence, the total number of cache misses is the sum of 1) the sum of the inner-core cache misses of each core in the private caches (L1+L2), 2) the sum of the inner-core cache misses of each core in the LLC, and 3) the cross-core cache misses in the LLC as a whole. The problem now turns into modeling these three factors. From Formula (2), it can be derived that $\#M^T = (I^T_i + 1)\cdot\#M^B$. As the variation trend of the cache misses relative to the baseline in no hyper-threaded mode needs to be found, $\#M^B$ is set to 1 (an abstract value), so that $\#M^T = I^T_i + 1$. And since there are no cross-core cache evictions in the baseline environment, this $\#M^B$ is composed of inner-core misses in the private caches and inner-core misses in the LLC. Each factor is initialized to cause half of the total cache misses as a start (0.5 for each type of cache miss). This distribution can be modified later if it yields a higher accuracy of the predicted cache eviction trend than the equal distribution.

With an increase in the number of cores applied, the number of bands to be processed by each core decreases, and so do the inner-core cache misses in the private caches. Hence, the function that describes the inner-core cache misses on L1+L2 can be $\frac{0.5}{\#cores^T}$. The second column in Table 7 applies this function to represent the influence of inner-core cache evictions on L1+L2. For the same reason, the inner-core cache evictions in the LLC should also decrease with an increasing number of cores. But because each core has its private caches to hold data, the inner-core cache evictions in the LLC decrease further; the function that formulates this decrease can be $\frac{0.5}{(\#cores^T)^2}$, which is used in the third column of Table 7. $I^T_i$ is calculated from the formula $I^T_i = \frac{\#i^T - \#i^B}{\#i^B}$, where $\#i$ has been measured by VTune for every environment with a different number of cores. The resulting $I^T_i$ is shown in the fourth column of Table 7. As said above, the total number of cache misses contains three factors. With two of them formulated, the cross-core cache evictions can be calculated through the equation:

$$\frac{\left(\frac{0.5}{\#cores^T} + \frac{0.5}{(\#cores^T)^2}\right)\cdot\#cores^T + \#M^T_{cross} - \#M^B}{\#M^B} = I^T_i \;\Rightarrow\; \#M^T_{cross} = I^T_i\cdot\#M^B + \#M^B - \left(\frac{0.5}{\#cores^T} + \frac{0.5}{(\#cores^T)^2}\right)\cdot\#cores^T \qquad \text{Formula (3)}$$

For example, the cross-core cache misses in the 6-core environment can be obtained from Formula 3 with all variables substituted by the values found in Table 7: $17.86\%\cdot 1 + 1 - \left(\frac{0.5}{6} + \frac{0.5}{6^2}\right)\cdot 6 = 0.595267$. The resulting cross-core cache evictions in the other environments with different numbers of cores are shown in the fifth column of Table 7, and the left plot in Figure 21 shows the increasing trend of the first four environments. It seems that two kinds of functions can represent the curve shown in Figure 21 (left): one is the parabola $y^2 = 2px$, with $(p/2, 0)$ as its focal point; the other is the logarithmic function $y = a\cdot\log_b x$, with $a$ as its scaling coefficient. The function that represents this curve could be found through regression analysis; however, as this is outside the author's experience and would take time to learn, several experiments are done instead to inspect which function fits the curve better, and it is found that the logarithmic function fits better than the parabola. The coefficient $a$ in the logarithmic function equals $\frac{y}{\log_b x}$, where $y$ is the calculated cross-core cache misses and $x$ the number of cores applied, so the value of the base of the logarithm now needs to be chosen. The following derivation shows that the base $b$ can be set to any value greater than 1.

Table 7. Table used to deduce the function of cross-core evictions (* indicates the data comes from measurements, all in abstract units)

#cores | inner-core L1+L2 misses | inner-core LLC misses | increment of instructions* | calculated cross-core cache misses | calculated coefficient | predicted cross-core cache misses | predicted increment of instructions | errors
1      | 0.5 (init) | 0.5 (init) | 0.00%  | 0 (init) | –        | 0        | 0.00%  | 0.00%
2      | 0.25       | 0.125      | -0.22% | 0.2478   | 0.2478   | 0.238001 | -1.20% | 0.99%
3      | 0.166667   | 0.055556   | 5.68%  | 0.390133 | 0.246147 | 0.377223 | 4.39%  | 1.29%
4      | 0.125      | 0.03125    | 9.68%  | 0.4718   | 0.2359   | 0.476003 | 10.10% | -0.42%
6      | 0.083333   | 0.013889   | 17.86% | 0.595267 | 0.230281 | 0.615224 | 19.86% | -2.00%
7      | 0.071429   | 0.010204   | 21.65% | 0.645071 | 0.229779 | 0.668154 | 23.96% | -2.31%
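The "calculated cross-core cache misses" column of Table 7 follows directly from Formula (3); the snippet below is an illustrative recomputation using the measured instruction increments listed in the table.

```python
import math

# Measured increment of retired instructions per core count (Table 7, col. 4).
instr_increment = {1: 0.0, 2: -0.0022, 3: 0.0568, 4: 0.0968, 6: 0.1786, 7: 0.2165}

def cross_core_misses(n_cores: int, split=(0.5, 0.5)) -> float:
    """Formula (3): back-calculate the cross-core LLC evictions from the
    measured instruction increment, with #M^B = 1 (abstract unit) and the
    given initial split between inner-core L1+L2 and inner-core LLC misses."""
    l12, llc = split
    inner = (l12 / n_cores + llc / n_cores**2) * n_cores
    return instr_increment[n_cores] * 1.0 + 1.0 - inner

print(cross_core_misses(6))                    # ~0.5953, matching Table 7
print(cross_core_misses(6) / math.log2(6))     # ~0.2303, the coefficient column
```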

Assume that $a_1$ is the coefficient when $b_1$ is the base, and likewise for the pair $(a_2, b_2)$. The ratio between $a_1$ and $a_2$ can be represented by $\frac{a_1}{a_2} = \frac{\log_{b_2}x}{\log_{b_1}x} = \frac{\log_2 b_1}{\log_2 b_2}$, so $a_1 = \frac{\log_2 b_1}{\log_2 b_2}\cdot a_2$. Substituting this for $a_1$ in the formula $a_1\cdot\log_{b_1}x$, it can be derived that $y = \frac{\log_2 b_1}{\log_2 b_2}\cdot a_2\cdot\log_{b_1}x = a_2\cdot\frac{\log_2 x}{\log_2 b_2} = a_2\cdot\log_{b_2}x = y$. So the pair $(a_1, b_1)$ results in the same cross-core misses in each environment of different cores as the pair $(a_2, b_2)$. The base of the logarithm is therefore set to 2, and the corresponding coefficient is calculated by the function $\frac{\#M^T_{cross}}{\log_2(\#cores^T)}$. The sixth column in Table 7 shows the calculated coefficient for different numbers of cores. It can be seen that there is little variation between the coefficients. To fill in the logarithmic formula, they are averaged to 0.237981, and the formula that represents the cross-core cache misses becomes $0.237981\cdot\log_2(\#cores^T)$. The right plot in Figure 21 shows the calculated cross-core misses from the fifth column of Table 7 and the cross-core cache misses estimated by this formula; the two curves largely overlap. The last column in Table 7 shows the errors between the measured increment of instructions and the increments predicted using Formula (2) with $\#M^T = 0.237981\cdot\log_2(\#cores^T) + \left(\frac{0.5}{\#cores^T} + \frac{0.5}{(\#cores^T)^2}\right)\cdot\#cores^T$. The average difference is 1.168%, as indicated by the errors column. By adjusting the initial fraction between inner-core cache evictions on L1+L2 and on the LLC, smaller differences can be achieved. Table 8 presents the results when setting the initial distribution ratio to 0.6:0.4; the coefficient then changes to 0.199515 using the same derivation process, and the average difference becomes 0.261%. To conclude, the following three formulas describe the number of cache misses:

inner-core L1+L2 cache misses in each core: $\dfrac{0.6}{\#cores^T}$

inner-core LLC misses in each core: $\dfrac{0.4}{(\#cores^T)^2}$

cross-core LLC misses in total: $\#M^T_{cross} = 0.199515\cdot\log_2(\#cores^T)$
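These three formulas can be sketched as a small forward model; the snippet below recomputes the Table 8 columns for a given core count (illustrative only).

```python
import math

COEFF_CROSS = 0.199515            # averaged coefficient from Table 8
SPLIT_L12, SPLIT_LLC = 0.6, 0.4   # initial distribution of the baseline misses

def predicted_misses(n_cores: int) -> tuple[float, float, float]:
    """Return (inner-core L1+L2, inner-core LLC, cross-core LLC) misses in
    abstract units, relative to the 1-core baseline #M^B = 1."""
    inner_l12 = SPLIT_L12 / n_cores
    inner_llc = SPLIT_LLC / n_cores**2
    cross_llc = COEFF_CROSS * math.log2(n_cores)
    return inner_l12, inner_llc, cross_llc

def predicted_instruction_increment(n_cores: int) -> float:
    """Formula (2) applied in the forward direction: the predicted relative
    increment of retired instructions for the target environment."""
    l12, llc, cross = predicted_misses(n_cores)
    total = (l12 + llc) * n_cores + cross
    return total - 1.0            # #M^B = 1

print(predicted_instruction_increment(6))   # ~0.182, close to 18.24% in Table 8
```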

Figure 21. The plot of the calculated cross-core evictions from 1 core to 4 cores based on the measured retired instructions (left) and calculated and estimated cross-core evictions (right)

Table 8. Predicted cache evictions when different number of cores applied (* indicates the data comes from measurements, all in abstract unit)

#cores | inner-core L1+L2 misses | inner-core LLC misses | calculated cross-core cache misses | predicted cross-core cache misses | predicted increment of instructions | errors
1      | 0.6 (init) | 0.4 (init) | 0 (init)  | 0        | 0.00%  | 0.00%
2      | 0.3        | 0.1        | 0.1978    | 0.199515 | -0.05% | 0.17%
3      | 0.2        | 0.044444   | 0.323467  | 0.316224 | 4.96%  | -0.72%
4      | 0.15       | 0.025      | 0.3968    | 0.39907  | 9.90%  | 0.22%
6      | 0.1        | 0.011111   | 0.5119333 | 0.515739 | 18.24% | 0.38%
7      | 0.085714   | 0.008163   | 0.5593571 | 0.560109 | 21.73% | 0.08%

Any cache miss in the private caches that hits in the LLC has less influence than a miss in the LLC; normally, large time variations are caused by misses in the LLC. Hence, to reduce the modeling complexity without losing much accuracy, only the influence of cache misses on the LLC is considered in this dynamic penalty model. From Figure 8, it can be seen that transfers within the same die can be pipelined, which means that transfers to different cores can be performed concurrently. So when data is prepared in the memory controller, transfers to one core do not need to wait for other transfers to finish. For instance, in Figure 8(a), if core 2 and core 3 send requests to the memory controller simultaneously, the request from core 3 follows the route 3->2->1->0->memory controller, while the request from core 2 follows the route 2->1->0->memory controller; at any point in time there is no overlapping routing segment between the two requests. However, the time cost on the connection between the memory controller and DRAM varies.


Modern multiprocessors use a NUMA memory design, where the time to access memory depends on the location of the memory relative to the processor [15]. Each cache miss on the LLC has a different access/loading time: loading data from remote memory certainly costs more time than loading from local memory. On the Xeon platform there are two CPU packages and each CPU has its local memory; each CPU's local memory is remote memory for the other CPU. When the main thread loads data into DRAM at the beginning, it first uses the space of its local memory, and then its remote memory. It is hard to determine how much data is loaded into each memory, and which cache miss requires data from remote memory, but these effects can be hidden in the waiting distance model. The waiting distance model models the abstract time for each core to load data from DRAM in case of an LLC miss; this time includes the time to load from local and from remote memory.

Table 9 is used to find an estimated function describing the waiting distance from the cores to DRAM, and this table relates to the cache misses of each core. For instance, to get the inner-core LLC misses of one core in the 6-core environment, the total inner-core LLC misses are first calculated by the formula $\frac{0.4}{6^2}\cdot 6$. This is then distributed over the cores according to the workload ratio, here the ratio of the number of bands issued to one core to the whole 23 bands. So the inner-core LLC misses of one core roughly equal $\frac{0.4}{6^2}\cdot 6\cdot\frac{4}{23}$ in the 6-core environment. The same procedure can be performed to get the cross-core LLC misses, i.e. $0.199515\cdot\log_2(6)\cdot\frac{4}{23}$ in this example. If the processing time of each band in the 1-core environment is known, the scale ratio can be the processing time of the bands issued to a core divided by the processing time of all bands. The third and fourth columns are filled by applying these two formulas, and the fifth column gives the sum of the LLC misses of one core in each environment with a different number of cores.

Table 9. Table used to predict the waiting distance model (* indicates the data comes from measurements, all in abstract unit)

#cores | #bands issued to the core | inner-core LLC misses | predicted cross-core evictions | total LLC misses | increment of processing time* | calculated LLC latency | waiting latency
1  | 23 | 0.4      | 0        | 0.4      | 0.00%  | 0.4      | 0
2  | 12 | 0.104348 | 0.104095 | 0.208443 | -0.31% | 0.39876  | 0.190317
3  | 8  | 0.046377 | 0.109991 | 0.156368 | 7.39%  | 0.42956  | 0.273192
4  | 6  | 0.026087 | 0.104095 | 0.130182 | 7.48%  | 0.42992  | 0.299738
5  | 5  | 0.017391 | 0.100709 | 0.1181   | 8.19%  | 0.43276  | 0.31466
6  | 4  | 0.011594 | 0.089694 | 0.101288 | 13.14% | 0.45256  | 0.351272
7  | 4  | 0.009938 | 0.09741  | 0.107348 | 13.88% | 0.45552  | 0.348172
8  | 3  | 0.006522 | 0.078071 | 0.084593 | 12.26% | 0.449041 | 0.364448
9  | 3  | 0.005797 | 0.082493 | 0.08829  | 11.83% | 0.447309 | 0.359019
10 | 3  | 0.005217 | 0.086449 | 0.091666 | 12.70% | 0.4508   | 0.359134
11 | 3  | 0.004743 | 0.090027 | 0.09477  | 12.63% | 0.45051  | 0.35574
12 | 2  | 0.002899 | 0.062196 | 0.065095 | 14.40% | 0.457614 | 0.392519
13 | 2  | 0.002676 | 0.064199 | 0.066875 | 15.60% | 0.460927 | 0.394052
14 | 2  | 0.002484 | 0.066054 | 0.068539 | 15.64% | 0.462572 | 0.394033
15 | 2  | 0.002319 | 0.067781 | 0.0701   | 18.69% | 0.474746 | 0.404646
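The per-core miss columns of Table 9 follow directly from the formulas above; the snippet below recomputes the 6-core row as an illustration (the band counts are taken from the table).

```python
import math

BANDS_TOTAL = 23                  # bands in one bitmap of set-B
COEFF_CROSS = 0.199515

def per_core_llc_misses(n_cores: int, bands_on_core: int) -> float:
    """Sum of inner-core and cross-core LLC misses attributed to one core,
    distributed by the workload ratio (here: bands issued / total bands)."""
    ratio = bands_on_core / BANDS_TOTAL
    inner = 0.4 / n_cores**2 * n_cores * ratio
    cross = COEFF_CROSS * math.log2(n_cores) * ratio
    return inner + cross

print(per_core_llc_misses(6, 4))  # ~0.1013, matching the 6-core row of Table 9
```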


The increments of the processing time are the data labels of the average difference in the left plot of Figure 13. Although this is the increment of the bands' processing time, it is used to derive the waiting distance model of each core, as the bands are processed by the cores. From the perspective of each core, the models derived from the increment in the processing time of each core are approximately the same models used to scale the bands' processing time. It can be seen from Figure 20 that in no hyper-threaded mode the interconnect utilization sub-model directly determines the utilization of the CPU and is used to scale the processing time of each band. Hence, the increment of the bands' processing time also indicates the increment of the overhead latency caused by LLC misses, which is $\frac{L^T_{LLC} - L^B_{LLC}}{L^B_{LLC}} = I^T_t$. $L_{LLC}$ equals $\#M_{cross}$ multiplied by the penalty of these LLC misses, $PA$. The following formula can then be derived to relate the penalty of LLC misses in the environments with different numbers of cores. As can be seen in Formula (4), the penalty in the target environment is proportional to the penalty in the baseline environment. $PA^B$ is hence set to 1 for computational convenience, and then $L^B_{LLC}$ equals 0.4, which is the baseline LLC miss latency.

$$\frac{L^T_{LLC} - L^B_{LLC}}{L^B_{LLC}} = \frac{\#M^T_{cross}\cdot PA^T - \#M^B_{cross}\cdot PA^B}{\#M^B_{cross}\cdot PA^B} = I^T_t \;\Rightarrow\; \frac{PA^T}{PA^B} = \frac{(I^T_t + 1)\cdot\#M^B_{cross}}{\#M^T_{cross}} \qquad \text{Formula (4)}$$

Assume that each LLC miss from one core is not delayed when accessing the interconnect, as in the baseline (where no other cores exist). With the penalty of each LLC miss in this situation set to 1, like the LLC miss penalty in the baseline, $L^T_{LLC}$ would just equal $\#M^T_{cross}$. However, as seen from the seventh column in Table 9, which calculates $L^T_{LLC}$ by the formula $I^T_t\cdot L^B_{LLC} + L^B_{LLC}$, the calculated $L^T_{LLC}$ does not equal $\#M^T_{cross}$. So there must be LLC misses from different cores that require the interconnect at the same time. This penalty due to concurrent accesses to the interconnect is simulated by the waiting distance sub-model, and then $L^T_{LLC} = \#M^T_{cross}\cdot(1 + D^T_{LLC})$, where 1 indicates the penalty without influence of the other cores and $D^T_{LLC}$ represents the penalty given by the waiting distance sub-model. The last column in Table 9 shows the waiting latency $L^T_{waiting}$ caused by the waiting distance penalty: $L^T_{waiting} = \#M^T_{cross}\cdot D^T_{LLC} = L^T_{LLC} - \#M^T_{cross}\cdot 1$. Figure 22 plots this waiting latency $L^T_{waiting}$ from the eighth column of Table 9. Currently there is no nice closed-form formula that fits the curve shown in Figure 22; hence, in the implementation, each point of the curve is filled into a lookup table.
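A small sketch of how the waiting-latency lookup table is derived from the measured processing-time increments (Table 9, columns 6–8); the values shown reproduce the 6-core row.

```python
L_LLC_BASE = 0.4          # baseline LLC miss latency (PA^B = 1, #M^B = 0.4)

def waiting_latency(i_t: float, m_cross: float) -> float:
    """Derive the waiting latency for one environment.

    i_t:     measured relative increment of the processing time (Figure 13)
    m_cross: total LLC misses of one core in that environment (Table 9)
    """
    l_llc = i_t * L_LLC_BASE + L_LLC_BASE   # calculated LLC latency
    return l_llc - m_cross * 1.0            # subtract the un-delayed penalty

# 6-core row of Table 9: increment 13.14%, total LLC misses 0.101288.
print(waiting_latency(0.1314, 0.101288))    # ~0.3513, the lookup-table entry
```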

As a result, the sub-models in the communication-time-related domain can be realized as shown in Figure 23. The cache eviction ($\#M^T_{cross}$) is a function of the workload ratio and the number of cores applied in the environment: $\#M^T_{cross} = \left(\frac{0.4}{(\#cores^T)^2}\cdot\#cores^T + 0.199515\cdot\log_2(\#cores^T)\right)\cdot workload\ ratio$. As the bands' processing time is not shown in this report, the workload ratio is here defined as $\frac{\#bands\ processed\ in\ the\ core}{\#bands\ in\ one\ bitmap}$; in the implementation, the workload ratio can be the ratio of the baseline bands' processing time in each core to the entire baseline processing time of the 23 bands. The waiting distance $D^T_{LLC}$ is a function derived from a lookup table of $L^T_{waiting}$, which represents the points in Figure 22, divided by $\#M^T_{cross}$. From Figure 20, the interconnect utilization sub-model is used, in no hyper-threaded mode, to scale the multicore processor utilization (millisecond/abstract_time_unit), which is set to 1 in Chapter IX, and to estimate the bands' processing time in the target environment, denoted by $BA^T_t$, from the known baseline bands' processing time, denoted by $BA^B_t$. The target bands' processing time $BA^T_t$ equals $BA^B_t\cdot(I^T_t + 1) = BA^B_t\cdot\frac{1}{1/(I^T_t+1)}$. Hence, the formula for the interconnect utilization sub-model equals $\frac{1}{I^T_t+1}$. From Formula (4), $I^T_t = \frac{\#M^T_{cross}\cdot PA^T - \#M^B_{cross}\cdot PA^B}{\#M^B_{cross}\cdot PA^B} = \frac{\#M^T_{cross}\cdot(1+D^T_{LLC}) - \#M^B_{cross}}{\#M^B_{cross}} = \frac{\#M^T_{cross}\cdot\left(1+\frac{L^T_{waiting}}{\#M^T_{cross}}\right) - \#M^B_{cross}}{\#M^B_{cross}} = \frac{\#M^T_{cross} + L^T_{waiting} - \#M^B_{cross}}{\#M^B_{cross}}$. The waiting distance sub-model can therefore be replaced by the waiting latency model in the implementation.
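Putting the pieces together, the interconnect utilization factor for the no hyper-threaded case can be sketched as below; the waiting-latency lookup table is abbreviated to a few entries from Table 9.

```python
import math

M_CROSS_BASE = 0.4
COEFF_CROSS = 0.199515
# Waiting-latency lookup table (abbreviated; the full table = Figure 22 points).
L_WAITING = {1: 0.0, 2: 0.190317, 6: 0.351272, 15: 0.404646}

def interconnect_utilization(n_cores: int, workload_ratio: float) -> float:
    """Scale factor applied to the multicore processor utilization in
    no hyper-threaded mode: 1 / (I_t + 1)."""
    m_cross = (0.4 / n_cores**2 * n_cores
               + COEFF_CROSS * math.log2(n_cores)) * workload_ratio
    i_t = (m_cross + L_WAITING[n_cores] - M_CROSS_BASE) / M_CROSS_BASE
    return 1.0 / (i_t + 1.0)

# Predicted band time = baseline band time / utilization.
print(interconnect_utilization(6, 4 / 23))   # ~0.884 -> bands ~13% slower
```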

[Figure 22: waiting latency vs. number of cores applied (1–15)]
Figure 22. The plot of the calculated waiting latency

Figure 23. The penalty model on the Xeon platform without the hyper-threaded mode enabled

Model Validation:

This dynamic penalty model is then built into the DES model made in Chapter IX. The inputs to this model are the bands' processing times from the 1-core Xeon environment in no hyper-threaded mode. This DES model with the dynamic penalty model is used to predict the bitmap processing time when a different number of cores is used on the Xeon platform. Figure 24 shows the average differences between the time predicted using this model and the measured processing time of the bitmaps. As in the figures used to validate the models in Chapters IX and X, the middle blue line indicates the average errors over all five available test-sets, and the dots above and below this line represent the standard deviation range in each environment of different cores. Figure 24 shows that the maximum average error is 3.06% in the 11-core environment, but the maximum difference, 3.91%, occurs in the 7-core environment when predicting the bitmaps' processing time of set-W. This can be expected, as the deviation ranges at 7 cores, as well as at 6, 8, 14 and 15 cores, are wider than in the other environments, as seen in Figure 24. This is not surprising, because the penalty model estimates the average increase of the bands' processing time, which has large deviation ranges, as seen in Figure 13. The standard deviations in these environments are all around 1.2%, which is not a value to be concerned about. Compared to the predicted results of the static model shown in Figure 15, this DES model with the dynamic penalty model gives almost the same results. This indicates that if the model is only used to predict the processing time of bitmaps for different numbers of cores on one known platform, the DES model with the static penalty model is more suitable than the one with the dynamic model, as it costs much less effort to build.

[Figure 24: difference (%) vs. number of cores applied (1–15); series: average differences]

Figure 24. Average differences in no hyper-threaded mode between the predicted and the measured bitmap processing times

2) The dynamic penalty model developed for capability 2

Model Analysis and Realization:

As indicated in the bottleneck list, when the hyper-threaded mode is enabled, the utilization of resources in both the computation and the communication domain is influenced. As $BA^{T-HT}_t = \frac{BA^{B-HT}_t}{1/(I^{T-HT}_t+1)}$, it is necessary, before getting the value of $BA^{T-HT}_t$, to first know the value of $BA^{B-HT}_t$, where $BA^{B-HT}_t = \frac{BA^B_t}{U^{B-HT}_t}$, with $U^{B-HT}_t$ the utilization of the multicore processor in the 1-core hyper-threaded environment. $BA^{B-HT}_t$ can be seen as the result of first scaling down by the computation resources' utilization ($U^{B-HT}_c$) and then scaling down by the utilization of the interconnect resources ($U^{B-HT}_{inter}$). Hence, $BA^{B-HT}_t = \frac{BA^B_t}{U^{B-HT}_t} = \frac{BA^B_t}{U^{B-HT}_c\cdot U^{B-HT}_{inter}} = \frac{BA^B_t}{U^{B-HT}_c\cdot(I^{B-HT}_{inter}+1)}$, with $I^{B-HT}_{inter} = \frac{L^{B-HT}_{LLC} - L^B_{LLC}}{L^B_{LLC}}$. As mentioned in Section A-(2) of this chapter, the utilization of the multicore processor in the 1-core hyper-threaded environment is 0.671, and $L^B_{LLC}$ equals $\#M^B_{cross}\cdot 1 = 0.4$. The scale-down factor of the computation resources then equals $U^{B-HT}_c = 0.671/\left(\frac{L^{B-HT}_{LLC}-0.4}{0.4} + 1\right)$. So the latency of the LLC misses in the 1-core environment with hyper-threaded mode first needs to be derived.

Table 10. Table used to inspect the influence on the waiting distance when hyper-threaded mode is enabled (* indicates the data comes from measurements, all in abstract units)

#cores | total LLC misses | increment of processing time* | calculated LLC latency | waiting latency | ratio between the waiting latency in HT mode and the waiting latency in no HT mode
1  | 0.31279  | 0.00%  | 0.31279  | 0        | –
2  | 0.156269 | 3.85%  | 0.324845 | 0.168576 | 0.885764
3  | 0.112882 | 12.89% | 0.353096 | 0.240214 | 0.879284
4  | 0.091115 | 13.35% | 0.354534 | 0.26342  | 0.878832
5  | 0.096884 | 14.27% | 0.357416 | 0.260533 | 0.827982
6  | 0.067993 | 18.55% | 0.370825 | 0.302832 | 0.862102
7  | 0.071023 | 18.40% | 0.37033  | 0.299307 | 0.859654
8  | 0.073744 | 17.11% | 0.366324 | 0.29258  | 0.802802
9  | 0.076209 | 16.70% | 0.365028 | 0.288819 | 0.804467
10 | 0.07846  | 17.63% | 0.367921 | 0.289461 | 0.805998
11 | 0.080529 | 18.69% | 0.371244 | 0.290715 | 0.817212
12 | 0.041222 | 19.58% | 0.374031 | 0.332809 | 0.847879
13 | 0.042112 | 20.35% | 0.376435 | 0.334323 | 0.845264
14 | 0.042944 | 19.69% | 0.374385 | 0.331441 | 0.84115
15 | 0.043725 | 20.33% | 0.376379 | 0.332655 | 0.822087

From the perspective of the core, when 2 hardware threads run on the same core, the number of bands to be processed is the same as in no hyper-threaded mode. So the formula for the inner-core LLC evictions, $\frac{0.4}{(\#cores^T)^2}\cdot\#cores^T\cdot workload\ ratio$, still holds in the hyper-threaded mode, but the workload ratio differs from the ratio in no hyper-threaded mode, as the number of bands issued to each thread is different. For example, the workload ratio in the 1-core environment with hyper-threaded mode is 12/23 instead of 23/23. And when 2 threads compete for access to the interconnect, contention happens. The formula for the cross-core LLC misses of each thread therefore becomes $0.199515\cdot\log_2(\#cores^T\cdot 2)\cdot workload\ ratio$, as the number of threads is double the number of cores. The second column in Table 10 calculates the total LLC misses $\#M^{T-HT}_{cross}$ with the formula $\#M^{T-HT}_{cross} = \left(\frac{0.4}{(\#cores^T)^2}\cdot\#cores^T + 0.199515\cdot\log_2(\#cores^T\cdot 2)\right)\cdot workload\ ratio$. The same procedure as in subsection 1) is then performed to get the waiting latency $L^{T-HT}_{waiting}$; the fifth column in Table 10 shows the results. Figure 25 shows the waiting latency in no hyper-threaded mode and in hyper-threaded mode. It can be seen that the two waiting latency curves have a similar increase pattern. The decrease of the waiting latency from no hyper-threaded mode to hyper-threaded mode is expected, as the number of LLC misses per thread also decreases. One factor is needed to scale the waiting latency $L^T_{waiting}$ to get $L^{T-HT}_{waiting}$. The last column in Table 10 shows the ratio between $L^{T-HT}_{waiting}$ and $L^T_{waiting}$; on average this ratio is 0.841463. The utilization of the computation resources in hyper-threaded mode now equals $\frac{0.671}{\frac{L^{B-HT}_{LLC}-0.4}{0.4}+1} = \frac{0.671}{\frac{\#M^{B-HT}_{cross}\cdot 1 - 0.4}{0.4}+1} = 0.8581 = 85.81\%$. To conclude, the utilization of the multicore processor $U^T_t$ equals

$$0.8581\cdot\left(\frac{0.31279-0.4}{0.4}+1\right)\cdot\frac{1}{I^{T-HT}_t+1} = 0.8581\cdot\left(\frac{0.31279-0.4}{0.4}+1\right)\cdot\frac{1}{\frac{\#M^{T-HT}_{cross}+L^{T-HT}_{waiting}-\#M^{B-HT}_{cross}}{\#M^{B-HT}_{cross}}+1} = 0.8581\cdot\left(\frac{0.31279-0.4}{0.4}+1\right)\cdot\frac{1}{\frac{\#M^{T-HT}_{cross}+0.841463\cdot L^{T}_{waiting}-\#M^{B-HT}_{cross}}{\#M^{B-HT}_{cross}}+1}.$$
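This hyper-threaded utilization chain can be turned into a small calculation; the snippet below reproduces the 85.81% computation-resource factor and sketches the full processor utilization formula (the waiting-latency lookup entries are abbreviated, and the helper names are illustrative).

```python
import math

HT_CYCLE_FACTOR = 0.671      # 1-core HT utilization from the VTune comparison
M_CROSS_B = 0.4              # baseline (1-core, no HT) LLC misses
M_CROSS_B_HT = 0.31279       # 1-core HT LLC misses per thread (Table 10)
WAIT_SCALE_HT = 0.841463     # average ratio L_waiting(HT) / L_waiting(no HT)
COEFF_CROSS = 0.199515
L_WAITING_NO_HT = {1: 0.0, 2: 0.190317, 6: 0.351272}   # abbreviated lookup

# Computation-resource utilization of each hardware thread (~0.8581).
U_C_HT = HT_CYCLE_FACTOR / ((M_CROSS_B_HT - M_CROSS_B) / M_CROSS_B + 1)

def processor_utilization_ht(n_cores: int, workload_ratio: float) -> float:
    """Overall multicore processor utilization in hyper-threaded mode."""
    m_cross_ht = (0.4 / n_cores**2 * n_cores
                  + COEFF_CROSS * math.log2(n_cores * 2)) * workload_ratio
    i_t_ht = (m_cross_ht + WAIT_SCALE_HT * L_WAITING_NO_HT[n_cores]
              - M_CROSS_B_HT) / M_CROSS_B_HT
    return U_C_HT * ((M_CROSS_B_HT - M_CROSS_B) / M_CROSS_B + 1) / (i_t_ht + 1)

print(round(U_C_HT, 4))                         # 0.8581
print(processor_utilization_ht(1, 12 / 23))     # ~0.671 for the 1-core HT case
```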


Figure 25. The calculated waiting distance in no hyper-threaded mode and hyper-threaded mode

Model Validation:

This model is built on top of the dynamic penalty model developed in subsection 1) above, and Figure 26 shows the average errors of the predicted bitmaps' processing time compared to the measured time in hyper-threaded mode. As seen in this figure, the average differences in the 15 different environments vary between 1.05% and 3.77%, and the maximum difference is 5.14%, which happens in the 3-core environment of set-W. These errors have two reasons. One is the same as for the other DES models, namely that all the developed penalty models focus on the average increment in each environment of different cores. Hence, the resulting times show some variation, as not all bands have increments close to the average increment. Another reason is the difference between the waiting latency calculated for hyper-threaded mode and the waiting latency predicted by applying a coefficient to the waiting latency model of the no hyper-threaded mode. Figure 27 shows the differences between the calculated waiting latency, indicated in the fifth column of Table 10, and the predicted waiting latency of this scaled model. It can be seen that the two curves almost overlap, but there are still small differences at the points indicated by the red ellipses. These deviations, together with the deviations caused by the first reason, result in varying errors in the different environments of the different test-sets. The average error range of 1.05% to 3.77% over all five test-sets is acceptable and still quite accurate for predicting the bitmaps' processing time in hyper-threaded mode. Compared to the results shown in Figure 16, which use the static penalty model, this DES model with the dynamic penalty model is less accurate. However, for the static penalty model, the increments with and without hyper-threaded mode have to be filled into two lookup tables. These increments are so specific that for each different environment, measurements have to be done to obtain them. That is the main reason why it cannot be used to predict the performance of the F-path application on a different platform if there are no measured bands' processing times. In this dynamic penalty model, the scale-down factor used to scale the waiting latency model is derived from the ratio between the calculated waiting latency in hyper-threaded and no hyper-threaded mode, which also uses the measured increments shown in Figure 13. However, if this waiting latency in hyper-threaded mode is still applicable on other platforms, then it can be argued that the DES model with the dynamic penalty model is more useful for prediction than the one with the static penalty model.


Figure 26. Average differences in hyper-threaded mode between the predicted and the measured bitmaps' processing time



Figure 27. Calculated waiting latency and predicted waiting latency in hyper-threaded mode

3) The dynamic penalty model developed for capability 3

Model Realization:

From the previous derivations, it can be seen that the key to an accurate prediction is an accurate estimation of #M_cross and L_waiting, where L_waiting also relates to #M_cross. Hence, to translate between platforms, #M_cross on a different platform needs to be predicted. As discussed, the cache misses relate to the amount of data loaded each time, memory allocation techniques like locality, and the cache replacement policy. On a different platform, the caches, the communication speed, and the type of the remote memory may be different, which may lead to a totally different influence on the number of cache misses and their effects. These influences are hard to inspect, and inspecting them requires a large amount of work and would increase the complexity of the modeling. Here, only the cache associativity is considered to be an effect that influences the cache misses and their latency significantly.

In the Xeon E5-2650, the LLC is a 20-way cache, while in the i7-6700, the LLC is a 16-way cache. The associativity is inversely proportional to the number of cache misses: usually, the higher the associativity, the fewer cache misses occur. Hence, the ratio of the associativities is used to scale the function of LLC misses from the previous sub-sections. L_waiting relates to #M_cross, but no function has been found so far that represents the relation between L_waiting and #M_cross. So this subsection assumes that L_waiting is linearly proportional to #M_cross, although this might not be true. The associativity ratio factor is then also applied to L_waiting^T when a different platform is used.

As a result, the predicted bands' processing time on a different platform equals BA_t^{B-P} · (#M_cross^{T-P} + R_a · L_waiting^T − #M_cross^{B-P}) / #M_cross^{B-P}, where the superscript P is a platform indication, so BA_t^{B-P} on the i7-6700 platform is BA_t^{B-i7}; and R_a is the ratio of the cache associativity of the Xeon E5-2650 platform to the cache associativity of the i7-6700 platform, i.e. R_a = (cache associativity in Xeon E5-2650) / (cache associativity in i7-6700) = 20/16 = 1.25 in this case.

And #M_cross^{T-P} is modelled by the same function, #M_cross^T = ((0.4 / (#cores)^T) · 2 · (#cores) + 0.199515 · log2(#cores^T)) · workload ratio, but additionally multiplied by R_a. The full picture of the dynamic penalty model can be shown now, and Figure 28 demonstrates this penalty model.
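
A minimal sketch of this translation step is given below; the function name is illustrative, and the waiting-latency scaling follows the linearity assumption stated above:

```python
R_A = 20 / 16   # LLC associativity ratio: Xeon E5-2650 (20-way) / i7-6700 (16-way) = 1.25

def translate_to_other_platform(m_cross_xeon, l_waiting_xeon, r_a=R_A):
    """Scale the cache-eviction figure and the waiting-latency lookup value by R_a."""
    return m_cross_xeon * r_a, l_waiting_xeon * r_a
```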

Figure 28. The dynamic penalty model to get utilization of the shared resources

Factors in this lowest layer are the parameters used when building a DES model, and this penalty model finally generates the utilization of the multicore CPU, which is used to scale the bands' processing time. The following list summarizes the sub-models shown in Figure 28:

• computation resources utilization sub-model: A constant factor which differs between hyper-threaded and no hyper-threaded mode. The model is represented by the conditional formula:
  U_c = no hyper-threaded mode ? 1 : 0.8581

• cache eviction sub-model: A model used to predict the relative number of LLC misses in a different environment. The model is represented by the function:
  #M_cross^T = ((0.4 / (#cores)^T) · 2 · (#cores) + 0.199515 · log2(#threads^T)) · workload ratio · R_a = #M_cross^{T-HT}

• waiting distance sub-model: A model used to estimate the influence of each LLC miss. Currently, no function has been found to represent this sub-model, so the waiting latency L_waiting model is realized as a lookup table. When the hyper-threaded mode is on, a scale-down factor of 0.841463 is applied to this lookup table.

• interconnect utilization sub-model: A model used to integrate the cache eviction sub-model and the waiting distance sub-model:
  U_inter^T = 1 / ((#M_cross^T + R_a · L_waiting^T − #M_cross^B) / #M_cross^B + 1)
  U_inter^{T-HT} = 1 / ((#M_cross^{T-HT} + R_a · L_waiting^{T-HT} − #M_cross^{B-HT}) / #M_cross^{B-HT} + 1)

• dynamic penalty model: A model of the utilization of the multicore CPU, an integration of its sub-models in layer 1, represented by the formula:
  U_processor^T = (no hyper-threaded mode ? 1 : 0.8581) · (((no hyper-threaded mode ? 0.31279 : 0.4) − 0.4) / 0.4 + 1) · (no hyper-threaded mode ? U_inter^T : U_inter^{T-HT})

Model Validation:

As mentioned in Chapter IX, the pre-processing time for each bitmap is almost the same on the same platform – the Xeon E5-2650. But this pre-processing time on a different platform – the i7 – should not be the same, as the working frequencies of the two platforms are totally different. Like the bands' processing time, this pre-processing time includes computation time and communication time. While the computation time can be estimated by the ratio of the working frequencies, it is hard to predict the communication time, especially since the cache behavior of the schedule & dispatch task has not yet been studied as it has for the handler tasks. Since the pre-processing time occupies only a few percent of the whole bitmap's processing time (milliseconds out of hundreds of milliseconds), it is safe to estimate the pre-processing time on the i7 by scaling the pre-processing time on the Xeon by the working frequency ratio, which means that only the computation time is considered to be scaled. Without considering over-clocking techniques (Turbo frequency), only the ratio of the given base working frequencies is used. The pre-processing time on the Xeon at its 2.0 GHz base frequency is around 5.1 load_units, and the processing time is inversely proportional to the frequency, so the estimated pre-processing time of each bitmap on the i7 at its 3.4 GHz base frequency is set to 3 load_units.
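
As a quick check of this scaling (the 2.0 GHz and 3.4 GHz base frequencies of the Xeon E5-2650 and the i7-6700 are the assumed inputs here, and the result is rounded to the value used in the model):

t_pre_i7 ≈ t_pre_Xeon · (f_Xeon / f_i7) = 5.1 · (2.0 / 3.4) = 3.0 load_units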

The inputs required by the DES model are the bands' processing times of the bands in the five test-sets on the i7 when one thread (the 1-core environment without hyper-threaded mode) is used to process the F-path application. In the previous work where the analytical model was built, a formula, called the linear processing time to band size formula, is used to translate the bands' processing time in the single-thread environment from one platform to another platform, and it has proved its accuracy. So the bands' processing time calculated with this formula can be used as the input of our model. But since this formula is confidential, the bands' processing time in single-thread mode on the i7 is measured and used as the input for validation. Table 11 shows the average errors between the processing time predicted by this final DES model and the measured processing time on the i7 instead of the Xeon.

It can be seen that the errors in the 1-core and 2-core environments on the i7 in no hyper-threaded mode are insignificant: the maximum error, indicated by max. in Table 11, is 1.10% in the 1- and 2-core environments. But the average error becomes 7.46% in the 3-core environment, with the maximum error being 8.25%. The maximum error in hyper-threaded mode is 7.67%, also in the 3-core environment. The reason for these relatively large errors is the simple relation used to translate the cache eviction model and the waiting latency model. From the analysis in this subsection, the ratio of cache associativity is used to translate both models. But actually, as discussed in the introduction of this section B, the cache evictions relate to many aspects, and it is not possible to model all of them. The ratio of cache associativity may not be that representative, but it is the easiest way to translate the cache eviction model. And as said before, the waiting latency model is realized as a lookup table, and its relation to the cache misses at each core has not yet been found. So scaling this waiting latency model with that ratio may be too rough, but it is a realistic way to get a waiting latency model for a different platform. An interesting observation is that scaling the waiting latency model in no hyper-threaded mode on the i7 by the factor 0.841463 derived in subsection 2) is applicable to estimate the waiting latency in hyper-threaded mode on the i7, as indicated by Figure 29. This means that the factor 0.841463 can be applied to translate the waiting latency model in no hyper-threaded mode to the one in hyper-threaded mode on a different platform. Although this DES model with the dynamic penalty model is less accurate when predicting the bitmaps' processing time on the i7 in the 3-core environment than in the other environments, it is still applicable to estimate the performance of the F-path application on the i7 because of the relatively low complexity of the modeling and its relative accuracy (the maximum error does not exceed 10%). Although no measured data is available to validate this DES model on platforms other than the i7-6700 and the Xeon E5-2650, the model can still be used to predict the bitmaps' processing time there. This is because only the cache associativity is considered as a factor when translating the models, and this information can easily be found on the Internet. The measured inputs for a different platform can be calculated with the linear processing time to band size formula, which is available to colleagues at Océ.

Table 11. Average differences between the predicted and the measured bitmaps processing time on i7

      | no hyper-threaded mode                                         | hyper-threaded mode
      | average differences | standard deviation | max.  | min.  | average differences | standard deviation | max.  | min.
core1 | 0.13%               | 0.09%              | 0.27% | 0.02% | 1.54%               | 1.08%              | 3.36% | 0.01%
core2 | 0.51%               | 0.33%              | 1.10% | 0.22% | 1.86%               | 1.60%              | 5.01% | 0.63%
core3 | 7.46%               | 0.48%              | 8.25% | 6.75% | 5.60%               | 1.69%              | 7.67% | 2.93%


Figure 29. Calculated waiting latency and predicted waiting latency in hyper-threaded mode on i7

XII. COMPARISONS TO THE ANALYTICAL MODEL

As mentioned in Chapter V, a DES model may not be as accurate as an analytical model, so it is necessary to compare the accuracy of these two kinds of models. Figures 30 and 31 show the average differences on the Xeon platform of the three existing models – the analytical model, the DES model with the static penalty model and the DES model with the dynamic penalty model – in no hyper-threaded and hyper-threaded mode. From both figures, it can be seen that the bitmaps' processing time predicted by the analytical model has larger differences to the measured time than the two DES models. The difference in the 11-core environment is 26.33% in no hyper-threaded mode and 58.38% in hyper-threaded mode. This is not what was claimed by the previous work that built this analytical model. Such a contradiction arises because the accuracy claimed for the analytical model comes from 8-band bitmaps, while the DES models built in this report focus on 23-band bitmaps. So it can be stated that the analytical model is not that accurate when bitmaps are divided into 23 bands, and in some cases it cannot even predict the performance, as the errors are too significant. The reason why the analytical model is not suitable for these 23-band bitmaps is that the model regards the processing time of each band as the average time of all the bands. For instance, if the average bands' processing time in a bitmap is 15 milliseconds, then this value is used as the processing time for all the bands in this bitmap. However, there are cases in which a few of the bands in a bitmap have quite short processing times while some others take relatively long. Figure 32 demonstrates an example of such a case: it assumes a bitmap consisting of 5 bands, and four cores (also four threads) are used to process this bitmap. Figure 32(a) shows how these 5 bands with different loads are processed in a 4-core environment, and the time needed to process this 5-band bitmap is 11 time units. Figure 32(b) shows how the analytical model predicts the bitmap's processing time. It first predicts the average processing time of each band in this bitmap, which will be around 6.9 time units in this case. Then it arranges the bands on the available cores and estimates the bitmap's processing time. In the example, the predicted processing time for such a bitmap is 13.8 time units, while the real time to process the same bitmap is 11 time units. So in this example, the error is around 26.36%. The reason, intuitively indicated by Figure 32, is that the two bands processed by core No. 1 actually have much shorter processing times (1 and 2 time units) than the time predicted by the analytical model (both 6.9 time units). As a result, the bottleneck of the processing time changes from the band of 11 time units to these two bands, which are processed by the same core.
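
A small script reproducing the reasoning of the Figure 32 example is shown below. The individual band durations are assumed (only the 1, 2 and 11 time-unit bands and the 4-core setting are given in the text), so the script is illustrative rather than the exact data behind the figure:

```python
# Illustrative reconstruction of the Figure 32 example: 5 bands on 4 cores.
# The durations of the two unnamed bands are assumed; the rest follows the text.
band_times = [1, 2, 11, 10, 10.5]    # time units

# Actual schedule (Figure 32(a)): the two short bands share core No. 1,
# the three long bands each get their own core.
actual_makespan = max(1 + 2, 11, 10, 10.5)            # = 11 time units

# Analytical model (Figure 32(b)): every band is replaced by the average band time,
# and with 5 bands on 4 cores one core has to process two "average" bands.
average_band = sum(band_times) / len(band_times)      # = 6.9 time units
predicted_makespan = 2 * average_band                 # = 13.8 time units

error = (predicted_makespan - actual_makespan) / actual_makespan
print(f"actual={actual_makespan}, predicted={predicted_makespan}, error={error:.1%}")
# actual=11, predicted=13.8, error=25.5% (close to the ~26% quoted in the text)
```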

When a bitmap is divided into 23 bands, there is higher possibility that the first several bands and last several bands have much less workload than other bands than a bitmap of 8-bands (think of the margin of an A4 pdf). When these 23 bands are distributed by 11 cores in no hyper-threaded mode, 10 of the cores receive 2 bands to process, and one core receives 3 bands. This core owns the first band and the last band of the whole bitmaps, which have a high possibility with much less workload. Actually, the size of the last bands is around 1KB, while the size of other bands in most cases is around 700KB. So when the core needs to process one more band than other cores, and this extra band is the last band of the bitmap, the analytical model will overestimate the influences of this last band and predict the processing time not accurately. This is mainly the reason why large error happens on 11 core in both hyper- and no hyper- threaded modes (In hyper-threaded mode, 22 threads are used to process the 23 bands, and the first thread has to process the first and the last band). However, the analytical model is still valuable to estimate the processing time in an early design phase. Since it focuses on the performance when the workload is almost equally distributed, so comparisons between the analytical model and the DES models are unfair. But as the processing time of the F-path application in different environments has not been measured when bitmaps are divided into 8 bands, so comparisons between the analytical model and DES models for 8-bands bitmaps are not analyzed at the moment, but can be analyzed in the future.
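
The 23-band distribution described here can be illustrated with a short script; the band sizes and the in-order dealing of bands to cores are assumptions for illustration, not measured data:

```python
# Illustration of 23 bands spread over 11 cores (no hyper-threaded mode).
# Band sizes are assumed: ~700 KB for the regular bands and ~1 KB for the last band.
n_cores = 11
band_sizes_kb = [700] * 22 + [1]                       # 23 bands, tiny last band

# Deal the bands out over the cores in order: core 0 gets bands 1, 12 and 23.
per_core = [band_sizes_kb[i::n_cores] for i in range(n_cores)]

print([len(bands) for bands in per_core])   # [3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
print(per_core[0])                          # [700, 700, 1] -> the extra band is tiny
```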



Figure 30. Average differences between predicted and measured bitmaps' processing time of the three models in no hyper-threaded mode


Figure 31. Average differences between predicted and measured bitmaps' processing time of the three models in hyper-threaded mode



Figure 32. An example showing how the analytical model averages the bands' processing time

The two DES models built in this report are much more accurate in predicting the processing time of 23-band bitmaps than the analytical model. The DES models do not predict the average bands' processing time, but the increments of the bands' processing time, as seen in the example given in Figure 14. So they are more robust than the analytical model when the sizes of the bands differ strongly. This confirms the reason why a DES model is required, as indicated in Chapter V.

In the validation of the dynamic model in the last chapter, comparisons between the two DES models, with the static penalty model and with the dynamic penalty model, are given. In general, the DES model with the static penalty model is more accurate in estimating the processing time of bitmaps on the Xeon platform. But it cannot estimate the bitmaps' processing time on other platforms if no measurements are done before building another static penalty model. The DES model with the dynamic penalty model does not improve on the accuracy of the one with the static penalty model, and has relatively large errors in some cases on the Xeon platform. However, it can be used to predict the processing time on other platforms without measuring the bands' processing time in all the environments of different cores on that platform. So the DES model with the dynamic penalty model is more powerful in prediction than the DES model with the static penalty model, although more effort is put into building this dynamic model. When combined with the DES model of the P-path application, the DES model with the dynamic penalty model is used.

XIII. COMBINED DATAPATH DES MODEL WITH THE DYNAMIC PENALTY MODEL

When combining the two DES models into one to estimate the performance of the entire datapath, the model of the multicore processor in the P-path application is replaced by the multicore processor model developed in the DES model of the F-path application, but the abstract piece-wise linear model of the multicore processor's utilization is kept for scaling the tasks from the P-path application. All other resources in the P-path DES model remain the same, and are not shared with the F-path application. As discussed in Chapter VI, this combined datapath model has a high validation cost. The trust level given to this non-validated model comes from two aspects. One is the accuracy of the models constituting it, which should be validated before combining the models. The other aspect is a reasonable explanation of the influences when two systems share the same resources. The accuracy of the DES model of the F-path application with the dynamic penalty model has been validated in Chapter XI, and that of the P-path application has been validated in previous work, so it is necessary to analyze the influences on the shared resources – the multicore processor in this case – that finally result in the variation of the processing time.

From the system description in Chapter III, tasks from the F-path application have a higher priority than tasks from the P-path application. The computation resources inside the multicore processor are not shared between the two applications. But the resources used for communication, the LLC and the interconnect in the multicore processor model, are shared between the applications. Hence, the sub-models shown in the communication-time-related zone in Figure 20, including the model of Figure 28, need to be enhanced to be able to simulate the possible corresponding performance when the P-path and the F-path application run on one platform.

The LLC eviction model derived in Chapter XI is a logarithmic function: as the number of cores increases, the overall number of LLC misses also increases logarithmically. One reason for this logarithmic growth may be the uneven distribution of the workload when different numbers of cores are used. Think of a bitmap with 23 bands. In the 2-core environment, one core processes 12 bands, of which the last band has much less workload than the other bands, and the other core processes 11 bands. But when 15 cores are used, for instance, 8 of the cores have to process 2 bands, while the remaining cores process only one band. In this way, when the workload of the middle bands is quite similar, only 7 cores (excluding the one that processes the last band) compete for access to the LLC and then to the interconnect while processing their respective second band. So in this 15-core environment, the function of LLC misses reflects LLC contention of all 15 cores when processing their respective first band and LLC misses of 7 cores when processing the second band. However, when the P-path application shares the LLC with the F-path application, the cores that are free from processing bands are used to process tasks from the P-path. As a result, the LLC misses no longer relate to the distribution of the bands' workload but to the elapsed time of the application, and the contention of all the applied cores on the LLC happens during the entire elapsed time of the datapath application. So a reasonable model for the growth of the LLC misses is a linear increase with the number of cores used. This can serve as an upper bound on the number of LLC misses: with more cores applied, the elapsed time of the datapath application becomes shorter, and so does the time during which contention on the LLC happens, so the LLC misses should always stay below this linear upper bound. Figure 33(a) shows this upper bound on the LLC misses for the entire datapath application compared with the LLC misses of the F-path application alone in no hyper-threaded mode. The curve indicating the upper bound is obtained by replacing the logarithmic part of the LLC-miss function with a linear increase in the number of cores, equal to 0.199515 · (#cores − 1).

Tasks from the P-path application can be seen as tasks of the F-path application but with a lower priority. If the interconnect resource can recognize the priority, then the waiting latency model for the F-path application should stay the same, and the waiting latency model for the P-path application needs to be modified based on the algorithm the interconnect uses to handle the priority. But if the interconnect resource does not recognize requests from tasks of different priority, then the waiting latency has no relationship to the application the requests belong to. In general, the more cores are used and the more LLC misses occur in one application, the more likely it is that requests to access the interconnect happen at the same time. The possible worst case is that all these requests happen at the same time. Although this could serve as an upper bound for the waiting latency model, it is too pessimistic. The waiting latency model shown in Figure 22 indicates that not all requests from LLC misses happen at the same time. But since no relation has been found between the waiting latency on the one hand and the LLC misses and the number of cores used on the other, it is hard to determine how many requests from LLC misses could happen concurrently in the entire datapath application. Hence, when combining the two DES models, the waiting latency can be any value between the best-case situation and the proposed worst-case situation. In the best-case situation, the waiting latency model can be the same as the one in Figure 22, implying that the tasks from the P-path application have a less significant influence. The upper bound of the waiting latency for the possible worst case is given in Figure 33(b). It uses the possible worst-case LLC misses shown by the blue curve in Figure 33(a). The average waiting distance for the LLC misses at each core can then be #cores applied · (#cores applied + 1) / 2, indicating that all these misses could send their requests to the interconnect at the same time. With the LLC misses of each core (0.199515 · (#cores − 1) · workload ratio) and the average distance that could occur in the worst case, the orange curve in Figure 33(b), representing the waiting latency, is derived by multiplying these two and subtracting the LLC misses.
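
The two pessimistic bounds can be written out as follows (a sketch; the function names and the workload-ratio argument are illustrative, the constants are the ones used above):

```python
# Pessimistic bounds for the combined datapath (constants as in the text).

def llc_misses_upper_bound(n_cores, workload_ratio):
    """Linear upper bound on the cross-core LLC misses per core when the P-path
    tasks keep all cores busy: the logarithmic term becomes linear in #cores."""
    return 0.199515 * (n_cores - 1) * workload_ratio

def waiting_latency_upper_bound(n_cores, workload_ratio):
    """Worst-case waiting latency: every miss of every core requests the interconnect
    at the same time, so the average waiting distance is #cores * (#cores + 1) / 2."""
    misses = llc_misses_upper_bound(n_cores, workload_ratio)
    avg_distance = n_cores * (n_cores + 1) / 2
    return misses * avg_distance - misses
```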


Figure 33. An upper bound for the LLC misses (left) and waiting latency when no priority based algorithm is applied in the interconnect (right) when the two applications are combined together

The combined model has not been built in the Java environment yet, so the predicted processing time for the whole datapath application is not shown here. But since this model cannot be validated anyway, as discussed, it is not that important to give the predicted time at this moment.

XIV. CONCLUSION

Two questions are answered in this report. The first one is whether the accuracy of the analytical model of the F-path application can be improved by other modeling techniques. The second question is whether there is a necessity to develop a DES model if it can improve the accuracy and, if so, how to build this DES model. It is discussed in the report that the accuracy of the analytical model can be improved when the predicted bands' processing time is not just the average value of all the bands' processing times in a bitmap, and building a DES model is a way to achieve this. Besides the possibility to increase the accuracy for the F-path application, another advantage of a DES model is its flexibility to inspect the influences of different scheduling algorithms. Although other scheduling algorithms have not yet been embedded in the two DES models developed in this report, it is easy to build these algorithms into the models and check their influences by inspecting the generated Gantt chart. Moreover, a DES model with detailed characteristics of the resources shared with the P-path application can help to investigate the influences when two applications are on the same platform. Hence, it is concluded in the report that building a DES model is quite useful to predict the system's performance.

Two approaches are applied in the report to develop the DES model. The DES model using the top-down design approach finds the increments for the environments of different cores in both hyper-threaded and no hyper-threaded mode and fills these increments into a lookup table. Although this model gives more accurate results than the other DES model within its capability to predict, it needs measurements of all the bands' processing times in all the different environments to find the increments in the lookup table. If no measured data is available, for instance the bands' processing time on a different platform, it cannot predict the bitmaps' processing time. So actually, the top-down model cannot really predict, as the results are known before the model simulates the application. The DES model using the bottom-up design approach is more complicated to build than the one using the top-down design approach. However, it can really predict the performance of a different environment without measurements being performed beforehand. The bitmaps' processing time predicted by this model on the target platform – the Xeon – differs by around 1% to 3% from the measured bitmaps' processing time during the validation. The error of this model when predicting the F-path application on a different platform can be larger than 3%. In the report, the i7 is used as another platform to validate the DES model built with the bottom-up approach. From the validation results, the maximum error is around 8% on the i7 platform. One reason for this relatively large error may be that the cache eviction sub-model used to estimate the LLC misses is not representative enough when only the cache associativity is considered. Another reason could be that the waiting latency sub-model, used to find the penalty of the LLC misses, is built as a lookup table, and its relation to the LLC misses at each core could not be identified. This results in the predictions being less accurate on a new platform. However, a maximum difference of 8%, within 10%, still indicates that the model is accurate enough for this application. Hence, it is concluded that this bottom-up DES model has the capability to predict the performance of the F-path application in different environments.

Both DES models are much more accurate in predicting the bitmaps' processing time than the analytical model when the bitmaps are divided into 23 bands. For 23-band bitmaps, there is a higher probability of an uneven workload distribution over the applied cores, and it is observed that the analytical model does not perform well when the workload per core varies a lot. The DES models compensate for this disadvantage by predicting the increments of the bands' processing time instead of the bands' processing time itself. But this also raises the question whether the DES models can predict the bitmaps' processing time for other band divisions. The answer is positive when the F-path application runs in an environment with no more threads than bands, since with more threads than bands there are threads that are not issued any workload during the elapsed time of the F-path application. When the number of bands is greater than or equal to the number of cores used, then the more bands in a bitmap, the fewer LLC misses can happen between cores while processing each band. But since this also increases the number of bands each thread has to process, in general the total LLC misses should differ only little from the total LLC misses of a 23-band division.

Two upper bounds to scale the models of the two shared resources – the cache eviction and waiting latency sub-models – are analyzed in the report for the case that the P-path and F-path application are executed on the same platform. They provide pessimistic predictions of the processing time of the entire datapath. This combination of the two DES models is not realized in the Java model, as no data is available for validation; so even when the model is implemented and the predicted results are collected, it is not possible to analyze its accuracy. Still, with the provided upper bounds of the two shared resources' models, the probability distribution of the performance can be estimated to see whether this combined datapath implementation can meet the throughput constraints.
