Reconciling performance and predictability on a NoC-based MPSoC using off-line scheduling techniques

Manel Fakhfakh

To cite this version:

Manel Fakhfakh. Reconciling performance and predictability on a NoC-based MPSoC using off-line scheduling techniques. Data Structures and Algorithms [cs.DS]. Université Pierre et Marie Curie - Paris VI, 2014. English. NNT: 2014PA066145. tel-01126944

HAL Id: tel-01126944 https://tel.archives-ouvertes.fr/tel-01126944 Submitted on 6 Mar 2015

THÈSE DE DOCTORAT DE L'UNIVERSITÉ PIERRE ET MARIE CURIE

Spécialité Informatique (École Doctorale Informatique, Télécommunication et Électronique)

Présentée par MANEL DJEMAL

Pour obtenir le grade de DOCTEUR DE L'UNIVERSITÉ PIERRE ET MARIE CURIE

RÉCONCILIER PERFORMANCE ET PRÉDICTIBILITÉ SUR UN MANY-COEUR EN UTILISANT DES TECHNIQUES D'ORDONNANCEMENT HORS-LIGNE

Soutenue le 27 juin 2014, devant le jury composé de

Mme. FLORENCE MARANINCHI (Verimag, Grenoble), Rapporteur
M. RENAUD SIRDEY (CEA, Saclay), Rapporteur
M. BERTRAND GRANADO (UPMC, LIP6, Paris), Examinateur
M. FRANÇOIS IRIGOIN (CRI - MINES ParisTech), Examinateur
M. LOUIS MANDEL (Collège de France, Paris), Examinateur
M. FRANÇOIS PÊCHEUX (UPMC, LIP6, Paris), Examinateur
Mme. ALIX MUNIER-KORDON (UPMC, LIP6, Paris), Directeur de thèse
M. DUMITRU POTOP-BUTUCARU (INRIA, Rocquencourt), Encadrant de thèse

PH.D. THESIS OF THE UNIVERSITY PIERRE AND MARIE CURIE

Department: COMPUTER SCIENCE AND MICRO-ELECTRONICS

Presented by: MANEL DJEMAL

Thesis submitted to obtain the degree of DOCTOR OF THE UNIVERSITY PIERRE AND MARIE CURIE

RECONCILING PERFORMANCE AND PREDICTABILITY ON A NOC-BASED MPSOC USING OFF-LINE SCHEDULING TECHNIQUES

Defence on 27 June 2014, Committee:

Mme. FLORENCE MARANINCHI (Verimag, Grenoble), Reviewer
M. RENAUD SIRDEY (CEA, Saclay), Reviewer
M. BERTRAND GRANADO (UPMC, LIP6, Paris), Examiner
M. FRANÇOIS IRIGOIN (CRI - MINES ParisTech), Examiner
M. LOUIS MANDEL (Collège de France, Paris), Examiner
M. FRANÇOIS PÊCHEUX (UPMC, LIP6, Paris), Examiner
Mme. ALIX MUNIER-KORDON (UPMC, LIP6, Paris), Advisor
M. DUMITRU POTOP-BUTUCARU (INRIA, Rocquencourt), Co-Advisor

Acknowledgements

I would first like to thank Alix Munier, my thesis advisor, who was always there to listen and watched over the smooth progress of my thesis. I also thank Dumitru Potop-Butucaru, my thesis co-advisor, for offering me a fascinating subject, for having the patience to supervise me during all this time, and for the fruitful discussions and sound advice he gave me throughout this work. My thanks also go to Florence Maraninchi and Renaud Sirdey for accepting the heavy task of reviewing this thesis; their comments broadened my perspective on the fields covered. I likewise thank Bertrand Granado, François Irigoin, Louis Mandel, and François Pêcheux for doing me the honor of serving on my committee. My thanks go in particular to François Pêcheux, Franck Wajsburt, and Zhen Zhang, with whom I had the pleasure of working on certain aspects of my thesis. Time would not have passed so quickly without Meriem Zidouni, Thomas Carle, and Cécile Stentzel, not forgetting their support during the writing of the manuscript. I would also like to thank my family and my friends who supported me, for everything I put them through during all this time; each in their own way helped me get through these three years of my life. Finally, I thank my husband, who supported, encouraged, and put up with me. Without him, this thesis would have been nothing but a thesis, and life would certainly not have been so beautiful. Without forgetting, of course, my angel Nour.

Résumé

Les réseaux-sur-puce (NoCs) utilisés dans les architectures multiprocesseurs-sur-puce posent des défis importants aux approches d'ordonnancement temps réel en ligne (dynamique) et hors-ligne (statique). Un NoC contient un grand nombre de points de contention potentiels, a une capacité de bufferisation limitée et son contrôle réseau fonctionne à l'échelle de petits paquets de données. Par conséquent, l'allocation efficace de ressources nécessite l'utilisation d'algorithmes de faible complexité sur des modèles de matériel avec un niveau de détail sans précédent dans l'ordonnancement temps réel. Nous considérons dans cette thèse une approche d'ordonnancement statique sur des architectures massivement parallèles (massively parallel processor arrays ou MPPAs) caractérisées par un grand nombre (quelques centaines) de cœurs de calcul. Nous identifions les mécanismes matériels facilitant l'analyse temporelle et l'allocation efficace de ressources dans les MPPAs existants. Nous déterminons que le NoC devrait permettre l'ordonnancement hors-ligne des communications, d'une manière synchronisée avec l'ordonnancement des calculs sur les processeurs. Au niveau logiciel, nous proposons une nouvelle méthode d'allocation et d'ordonnancement capable de synthétiser des ordonnancements globaux de calculs et de communications couvrant toutes les ressources d'exécution, de communication et de mémoire d'un MPPA. Afin de permettre une utilisation efficace des ressources du matériel, notre méthode prend en compte les spécificités architecturales d'un MPPA et implémente des techniques d'ordonnancement avancées comme la préemption pré-calculée de transmissions de données. Nous avons évalué notre technique de mapping par l'implantation de deux applications de traitement du signal. Nous obtenons dans les deux cas de bonnes performances du point de vue de la latence, du débit et de l'utilisation des ressources.

Mots clés: Multiprocesseurs-sur-puce (many-coeur), réseau-sur-puce (NoC), ordonnancement hors-ligne, ordonnancement temps réel

Titre en anglais: Reconciling performance and predictability on a NoC-based MPSoC using off-line scheduling techniques.

Abstract

On-chip networks (NoCs) used in multiprocessor systems-on-chips (MPSoCs) pose significant challenges to both on-line (dynamic) and off-line (static) real-time scheduling approaches. They have large numbers of potential contention points, have limited internal buffering capabilities, and their network control operates at the scale of small data packets. Therefore, efficient resource allocation requires scalable algorithms working on hardware models with a level of detail that is unprecedented in real-time scheduling. We consider in this thesis a static scheduling approach, and we target massively parallel processor arrays (MPPAs), which are MPSoCs with large numbers (hundreds) of processing cores. We first identify and compare the hardware mechanisms supporting precise timing analysis and efficient resource allocation in existing MPPA platforms. We determine that the NoC should ideally provide the means of enforcing a global communication schedule that is computed off-line (before execution) and which is synchronized with the scheduling of computations on the processors. On the software side, we propose a novel allocation and scheduling method capable of synthesizing such global computation and communication schedules covering all the execution, communication, and memory resources of an MPPA. To allow an efficient use of the hardware resources, our method takes into account the specificities of MPPA hardware and implements advanced scheduling techniques such as pre-computed preemption of data transmissions. We evaluate our technique by mapping two signal processing applications, for which we obtain good latency, throughput, and resource use figures.

Keywords: Chip-multiprocessor (many-core), on-chip network (NoC), off-line scheduling, real-time scheduling

English Title: Reconciling performance and predictability on a NoC-based MPSoC using off-line scheduling techniques.

Table of Contents

Acknowledgements

Résumé

Abstract

1 Introduction
  1.1 Thesis motivation
    1.1.1 The advent of many-cores
    1.1.2 The advent of Networks-on-Chips
    1.1.3 Many-cores for hard real-time applications
    1.1.4 Mapping applications onto NoC-based many-cores
  1.2 Thesis contributions
    1.2.1 The DSPINpro programmable Network-on-Chip
    1.2.2 Automatic real-time mapping and code generation
    1.2.3 An environment for virtual prototyping of MPPA applications
  1.3 Outline

2 State of the art
  2.1 Network-on-Chip design
    2.1.1 NoC building blocks
    2.1.2 NoC topology
    2.1.3 NoC switching
      2.1.3.1 Routing
      2.1.3.2 Switching method and buffering policy
      2.1.3.3 Arbitration/Scheduling
    2.1.4 Existing Network-on-Chip architectures
      2.1.4.1 DSPIN
      2.1.4.2 Æthereal
      2.1.4.3 Nostrum
      2.1.4.4 Kalray MPPA NoC
      2.1.4.5 The scalar interconnect of MIT RAW
      2.1.4.6 Other NoC architectures
      2.1.4.7 Comparison with our work
  2.2 Massively parallel processor arrays
    2.2.1 Tilera TILEPro64
    2.2.2 Kalray MPPA-256
    2.2.3 Adapteva Epiphany
    2.2.4 Intel SCC
    2.2.5 ST Microelectronics STHORM
    2.2.6 TSAR
    2.2.7 Academic MPSoC architectures with TDM-based NoC arbitration
  2.3 Static application mapping
    2.3.1 Off-line real-time multi-processor scheduling
      2.3.1.1 The AAA/SynDEx methodology
    2.3.2 The StreamIt compiler for the MIT RAW architecture
    2.3.3 Compilation of the ΣC language for the Kalray MPPA-256 platform
    2.3.4 Other mapping approaches

3 Tiled MPPA architectures in SoCLib
  3.1 MPPA structure
  3.2 Memory organization
    3.2.1 Distributed shared memory
    3.2.2 Address structure
    3.2.3 Global memory organization
    3.2.4 Tile memory organization
    3.2.5 Hardware/software interface
  3.3 Improving the timing predictability of the SoCLib tile
  3.4 SystemC simulation and compilation support

4 Programmable NoC arbitration
  4.1 The case for programmed arbitration
    4.1.1 The principle
    4.1.2 Target application classes
    4.1.3 The cost of programmability
  4.2 Programmable DSPIN
    4.2.1 NoC router extensions
    4.2.2 Area overhead
  4.3 A simple example in depth
  4.4 Case study: the FFT
    4.4.1 FFT algorithm description
      4.4.1.1 Mapping onto the MPPA architecture
      4.4.1.2 Traffic injection configuration
    4.4.2 Evaluation of the slow-down due to traffic injection
    4.4.3 Removing the slow-down through NoC programming

5 Off-line mapping of real-time applications using LoPhT
  5.1 Background: AAA using the Clocked Graphs formalism
    5.1.1 The Clocked Graph formalism
      5.1.1.1 Functional specification
        Clocks
        Clocked graphs
        Example
        Support of a clock
        Well-formed properties
      5.1.1.2 Non-functional specification
        Platform model
        Non-functional properties
    5.1.2 Off-line scheduling of CG specifications
      5.1.2.1 Scheduled clocked graphs
      5.1.2.2 Real-time scheduling problem
      5.1.2.3 Consistency of a scheduled clocked graph
        Notations
        Consistency properties
      5.1.2.4 Makespan-optimizing scheduling algorithm
  5.2 Static (off-line) mapping onto MPPA architectures
    5.2.1 AAA for NoC-based MPPA: The problem
      Limitations
    5.2.2 Extension of the CG format
      5.2.2.1 Modeling of MPPA resources
        NoC resources
        Tile resources
      5.2.2.2 Memory footprint specification
      5.2.2.3 Non-functional properties
        Worst-case computation durations
        Worst-case communication durations
    5.2.3 Makespan-optimizing scheduling
      5.2.3.1 Mapping NoC communications
      5.2.3.2 Multiple reservations
  5.3 Automatic code generation
    5.3.1 Tile code generation
  5.4 Experimental results

Conclusion

List of Publications

Bibliography

Chapter 1

Introduction

Contents
  1.1 Thesis motivation
    1.1.1 The advent of many-cores
    1.1.2 The advent of Networks-on-Chips
    1.1.3 Many-cores for hard real-time applications
    1.1.4 Mapping applications onto NoC-based many-cores
  1.2 Thesis contributions
    1.2.1 The DSPINpro programmable Network-on-Chip
    1.2.2 Automatic real-time mapping and code generation
    1.2.3 An environment for virtual prototyping of MPPA applications
  1.3 Outline

1.1 Thesis motivation

1.1.1 The advent of many-cores

“Due to advances in circuit technology and performance limitation in wide-issue, super-speculative processors, Chip-Multiprocessor (CMP) or multi-core technology has become the mainstream in CPU designs.” [Peng et al., 2007]

The number of transistors in microprocessor chips today upholds its historic trend of exponential growth, known as Moore's law, which is expected to continue until at least 2020 [Gordon E. Moore, 1965, Gordon E. Moore, 2005], as pictured in Fig. 1.1. Until the early 2000s, this growth mostly translated into micro-architecture gains aimed at improving mono-processor performance. Combined with a steady increase in operating frequencies, this allowed the continued use of the von Neumann computing paradigm [John von Neumann, 1945], where a sequential processor offers the performance needed by most general-purpose applications.


[Figure 1.1 plots transistor counts per chip on a logarithmic scale (1,000 to 10 billion) against the years 1970-2020, with points for Intel processors from the 4004, 8008, 8080, 8085 and 286 up to the Pentium family, the Itanium and Itanium 2, and the dual-core Itanium 2.]

Figure 1.1: Moore's law (cf. [Held et al., 2006])

However, this is no longer possible. First, the last decade brought an end to the fast operating-frequency increases which were the dominant cause of processor performance gains [J.Flynn, 2004]. Second, performance increase through micro-architectural advances alone follows the (empirical) Pollack's rule [Pollack, 1999], which states that the performance increase is roughly proportional to the square root of the increase in complexity (complexity in this context meaning the amount of processor logic, measured e.g. in die area). In other words, doubling the logic in a processor core delivers only about 40% more performance, while doubling the number of processors can almost double the speed for many applications.
The end of fast performance scaling for sequential processors led to an industry-wide shift towards parallel computing. While parallel architectures already existed, mainly in the high-performance and embedded computing fields, parallelism now entered the mainstream of general-purpose computing in the form of multi-core, and then many-core architectures. This trend started in 2001, when IBM released the POWER4 processor [IBM, 2001], the first general-purpose processor featuring multiple processing cores on the same die. Since then, all major hardware vendors have started shipping multi-core processors. Today, most personal computers, workstations, and servers are based on multi-core chips.
Multi-core processors have clear benefits, such as scalable performance, improved reliability, and easier energy and thermal management. But these benefits come at the price of as many challenges (we only list here a few):

• Scalable performance: Our computing environments offer more and more potential for parallelism, coming from either the nature of the applications (e.g. video streaming) or the fact that multiple applications are run in parallel. But exploiting this parallelism requires significant changes in the way systems are built, in both software and hardware.
In software, the main difficulty is that of producing correct and efficient parallel programs. These programs must expose parallelism at the right level, and also provide guarantees of correctness in the presence of concurrency. To facilitate multi-core programming, a variety of languages and formalisms have been proposed in the past years, and significant effort has been invested in the software engineering of such applications [OpenMP, 2008, Khronos, 2011] and in the automatic parallelization of previously-existing sequential code [Beletska et al., 2011, Cetus, 2004, Irigoin et al., 2012].
In hardware, as computing cores are counted in dozens and hundreds, the classical shared-bus communication approaches no longer scale. This leads to major bottlenecks in the memory hierarchy, in the inter-core communication network, and in the access to external data sources. Solving these issues requires a significant improvement of the on-chip and off-chip interconnects.
But the most challenging issues are the complex design decisions concerning both hardware and software. Indeed, designing a complex hardware/software system consists in solving a multi-criteria optimization problem whose parameters include efficiency, ease of programming, predictability, hardware complexity, price, etc. Solving such an optimization problem involves complex trade-offs. For instance, choosing a multi-core architecture with support for cache coherency facilitates shared-memory programming, but leads to poor temporal predictability and increased hardware complexity. On the contrary, having no hardware support for cache coherency improves predictability but requires software control of memory consistency. Clearly, these architectural choices have a profound influence on both the specification and the mapping of parallel applications onto multi-core platforms.

• Energy efficiency and thermal management: As the number of transistors and the computation power increase, power and temperature management becomes more and more critical and difficult, and needs to be addressed in hardware or in a combination of both hardware and software [Hanumaiah and Vrudhula, 2012]. Multi-core architectures usually provide at least one of two classical hardware mechanisms that facilitate power and temperature management: the possibility of turning off unused processor cores, and the possibility of running cores at optimized supply voltages and frequencies. Furthermore, load balancing among the processor cores can be used to distribute thermal dissipation across the die.

• Fault-tolerance and reliability: The evolution of silicon technologies results in a steady increase of transistor densities. At the same time, the increase in transistor counts means that chip sizes do not decrease. The combination of the two results in a significant increase of the probability of hardware defects per chip [Furber, 2006]. Coupled with the need for reasonable yields, the increase in transistor count will bring an end to current design practices where only chips that are 100% functional are accepted. However, tolerating defects in chips requires support in both hardware and software, through redundancy and support for failure isolation and (re-)configuration [Zhang, 2011].

Intense research and industrial developments have resulted in the definition of several classes of multi-core processors corresponding to different contexts of use and programming styles. Among them:

• General-purpose multi-cores, such as AMD Phenom or Intel Core.

• Application-specific System-on-Chip (SoC) platforms, such as TI OMAP or Qualcomm SnapDragon [Texas Instruments, 2009, Qualcomm, 2011], which have emerged from the embedded computing community.

• Graphics Processing Units (GPUs) [Nvidia, 1999], which have emerged from the image processing and high-performance computing community, and which are optimized for Single Instruction Multiple Data (SIMD) execution.

But in the past few years, a clear and consistent convergence can be seen between the historically isolated communities of general-purpose, embedded, and high-performance computing. All three communities have moved towards so-called many-core platforms characterized by:

• Large numbers of simpler cores. The number of cores ranges here from a few tens to a few hundreds in production architectures, and to thousands of cores in research platforms.

• Novel memory architectures that can deliver higher bandwidth access through the use of multiple memory banks localized near the processors. Data localization often requires that the memory hierarchy is exposed, at least in part, to the programmer.

• New interconnect types, like Networks-on-Chip (NoCs), that provide higher performance and/or scalability than classical interconnects such as buses and crossbars.

As graphically illustrated in Fig. 1.2, this hardware-level convergence between the general-purpose, embedded, and high-performance computing communities has several causes:

• In the general-purpose computing community, the use of massively parallel processor array (MPPA) chips¹ such as the Kalray MPPA [MPPA, 2012], Adapteva Epiphany [Adapteva, 2012], Tilera TilePro [Tilera, 2008], or Intel SCC and MIC [Howard and al., 2011, MIC, 2010] improves scalability and/or energy efficiency.

¹ We will come back with an in-depth description of MPPA architectures in the following chapters.

[Figure 1.2 depicts the convergence of high-performance computing (hardware accelerators such as Nvidia GPUs, GPGPU, computing clusters), general-purpose computing (general-purpose multi-cores such as Intel Core and AMD Phenom), and embedded computing (application-specific SoCs such as OMAP and SnapDragon, STHORM) towards power-efficient many-core/MPPA architectures with real-time support (Epiphany, Kalray MPPA, SCC/MIC, TilePro).]

Figure 1.2: The many-core convergence

• Embedded applications also benefit from improved scalability and energy efficiency, as in the ST STHORM [Benini, 2010] platform. In addition, they often require some support for real-time implementation, such as mechanisms for resource reservation or spatial and temporal separation, as offered by the Kalray MPPA and Tilera TilePro chips.

• In the high-performance computing community, hardware accelerators like GPUs evolved towards General-Purpose GPU (GPGPU) architectures whose more expressive instruction sets provide more flexibility to the programmer. Energy-efficient architectures such as the Adapteva Epiphany also position themselves as accelerator chips. Finally, many-core chips such as the Intel SCC and MIC [Howard and al., 2011, MIC, 2010] are the result of yet another tradition. Here, energy efficiency is not an objective per se, but appears as a by-product of the need to parallelize high-performance applications through methods previously used on computing clusters.

1.1.2 The advent of Networks-on-Chips

The first multi-core architectures used bus-based interconnects such as ARM's AMBA [ARM, 1999] and IBM's CoreConnect [PLB, 2001]. However, an on-chip bus can only perform one communication at a time. As the number of cores increased, buses became major contention points and performance bottlenecks, and new interconnect paradigms were investigated.
From a computation speed point of view, the ideal choice is that of crossbar interconnects, where any two components are directly linked. A crossbar introduces no contention point in addition to the ones associated with the resources connected to it (RAM banks and other peripherals). But the hardware cost of a crossbar increases quadratically with the number of resources connected to it. Thus, crossbars can only be used in systems or sub-systems with a small number of components.
Finding a trade-off between performance (e.g. speed), scalability, and system cost led to the development of the Network-on-Chip (NoC) paradigm, which takes inspiration from classical computer networks [Benini and De Micheli, 2002, Sgroi et al., 2001]. Like a classical computer network, a NoC is formed of standardized point-to-point links and NoC routers that can be composed following simple rules to form complex interconnect graphs (regular or not). The use of multiple links avoids the creation of global bottlenecks. At the same time, the number of links is kept at a reasonable level (usually linear in the number of computation and storage components), guaranteeing scalability in large designs.
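To make the quadratic-versus-linear contrast concrete, here is a back-of-the-envelope link count (my own illustration, not taken from the cited works) for $N$ components connected either by a full crossbar or by a square 2D mesh of $N$ tiles:

\[
L_{\mathrm{crossbar}} = N(N-1) \in O(N^2), \qquad
L_{\mathrm{mesh}} = \underbrace{4\sqrt{N}(\sqrt{N}-1)}_{\text{inter-router links}} + \underbrace{2N}_{\text{router-tile links}} \in O(N)
\]

For N = 256, this gives 65,280 unidirectional crossbar connections versus 960 + 512 = 1,472 mesh links, which is why mesh NoCs remain tractable at MPPA scale.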

1.1.3 Many-cores for hard real-time applications

The objective of this thesis is to investigate the use of many-core architectures using a NoC interconnect in the implementation of complex real-time applications. We are targeting two types of applications:

• Hard real-time applications, like those used in embedded control systems in the avionics and automotive industries. In such applications, the failure to meet deadlines may lead to catastrophic results such as loss of life or serious damage to the environment.

• Highly regular signal/image processing applications. In such applications, a tight control of timing, like the one we propose, helps in improving computation speed.

In such applications, timing guarantees should be obtained before execution, ideally by static analysis methods (as opposed to measurement-based ones). Furthermore, the timing guarantees should be precise in order to avoid wasting computing resources. But determining precise timing guarantees is inherently difficult on many-core platforms due to the large number of potential contention points. It is imperative to identify (and then eliminate or control) the sources of timing unpredictability at all levels of the many-core architecture² and the application software.
From existing work on WCET analysis for classical single-core and multi-core architectures, we know that certain micro-architectural features make timing analysis more difficult [Wilhelm and Reineke, 2012, Hardy and Puaut, 2008]. Such features are shared caches and cache/memory coherency mechanisms. We shall assume that neither of them is used in the many-core platforms considered in this thesis. One particular consequence of this assumption is that all communications and synchronizations between different processor cores are performed through one or more NoCs.³

² Comprising the processor cores, the memory sub-system, the communication network, and the I/O devices.
³ As opposed to using dedicated communication devices such as the complex memory hierarchies of TSAR [TSAR, 2008].

But eliminating micro-architectural sources of unpredictability is not enough, as the NoCs themselves raise serious problems when the objective is to ensure efficiency and predictability. Indeed, a NoC has large numbers of potential contention points: whenever a NoC router is connected to at least 3 links, contentions are possible when data arriving on 2 links must be sent onto the third. In common NoC architectures, this is the case for all routers, and arbitration is needed to control the access to every NoC link. Furthermore, one communication often traverses several NoC routers, and can be subject to contentions at the level of each one. This is why the use of fair arbitration policies at the level of NoC routers is not a good choice when the objective is to provide tight timing guarantees for NoC communications. Previous work on NoC-based architectures for real-time systems has explored the use of arbitration policies similar to those used for ensuring Quality-of-Service (QoS) in computer networks. Several approaches have been proposed: circuit switching, bandwidth reservation, priority-based scheduling, and programmed arbitration.
In NoCs based on circuit switching [Hilton and Nelson, 2006], communications are performed along a set of circuits. Each circuit is a sequence of point-to-point NoC links. The fundamental constraint is that two circuits cannot share a link. The absence of resource sharing lowers the utilization of NoC resources and increases costs. However, once a connection has been established, it can use the full bandwidth of all its links, and real-time guarantees are easily computed.
Virtual circuit switching is an evolution of circuit switching which allows resource sharing between circuits. Resource sharing requires the use of arbitration, and several types of arbitration techniques have been proposed: TDM-based, bandwidth management-based, and priority-based.
Most interesting from the point of view of timing predictability are NoCs where arbitration is based on time division multiplexing (TDM). TDM-based NoCs [Goossens et al., 2005, Millberg et al., 2004b] allow the computation of precise latency and throughput guarantees. The same type of latency and throughput guarantees (albeit less precise) can be obtained in NoCs relying on bandwidth management mechanisms, such as the Kalray MPPA [Harrand and Durand, 2011].
But NoCs using TDM arbitration or bandwidth management have certain limitations. The most important is that they largely ignore the fact that the needs of an application may change during execution, depending on the state of the application. One way of taking the application state into account is to use NoCs with support for priority-based arbitration. But priority-based arbitration requires the use of costly virtual channel mechanisms [Howard and al., 2011, Miro Panades et al., 2006], which limits applicability to systems supporting only a few priority levels. The alternative to priority-based approaches is to use a NoC allowing programmed NoC arbitration, such as MIT RAW [Waingold et al., 1997] or the architecture proposed in this thesis. Programmed arbitration allows the enforcement of static arbitration and routing patterns for data transmissions, as demanded by the application.
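To make the TDM idea concrete, the following minimal sketch (my own illustration, not code from any of the NoCs cited above; all names are hypothetical) shows how a TDM arbiter grants an output link to input ports according to a fixed, periodically repeated slot table:

```cpp
#include <array>
#include <cstdint>

// One output link of a router, arbitrated by time-division multiplexing:
// a fixed table maps each slot of the period to the input port allowed
// to use the link (or to NO_GRANT when the slot is left free).
constexpr int PERIOD = 8;   // slots per TDM period
constexpr int NO_GRANT = -1;

struct TdmArbiter {
    std::array<int, PERIOD> slot_table;  // slot -> granted input port

    // The grant depends only on the cycle counter, never on traffic,
    // which is what makes latency and throughput statically computable.
    int grant(uint64_t cycle) const {
        return slot_table[cycle % PERIOD];
    }
};

int main() {
    // Input port 0 gets 2 of 8 slots (25% of the bandwidth), port 1 one slot.
    TdmArbiter north_out{{0, NO_GRANT, 1, NO_GRANT, 0, NO_GRANT, NO_GRANT, NO_GRANT}};
    int granted = north_out.grant(42);  // deterministic for every cycle
    (void)granted;
    return 0;
}
```

A worst-case traversal bound then follows directly from the table: a flit arriving at a router waits at most one full period for its slot.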

1.1.4 Mapping applications onto NoC-based many-cores

The introduction of NoC-based architectures was accompanied by the definition of novel mapping techniques targeting NoC-based MPSoCs. Indeed, if parallelism is recognized as the only way of providing scalable performance, this scalability comes at the price of increased complexity of both the software and the software mapping (allocation and scheduling) process.
Part of this complexity can be attributed to the steady increase in the quantity of software that is run by a single system. But there are also significant qualitative changes concerning both the software and the hardware. In software, more and more applications include parallel versions of classical signal or image processing algorithms [Aubry et al., 2013, Gerdes et al., 2012, Villalpando et al., 2010], involving potentially complex synchronizations between the sequential programs executed on the various cores. Such applications are best modeled using data-flow models (as opposed to the so-called independent tasks that are common in classical real-time work).
Designing parallel software is difficult in itself, relying on notoriously hard disciplines such as parallel programming [Kwok and Ahmad, 1999] and multi-processor scheduling [Ramamritham et al., 1993]. The picture is further complicated when considering real-time aspects. Providing functional and real-time correctness guarantees requires an accurate control of the functional and temporal interferences due to concurrent use of shared resources. Depending on the hardware and software architecture, this can be very difficult [Wilhelm and Reineke, 2012]. In our case, there are two main reasons for this. The first one concerns the NoCs: as tasks become more tightly coupled and the number of resources in the system increases, the on-chip networks become critical resources, which need to be explicitly considered and managed during real-time scheduling. Recent work [Shi and Burns, 2010, Kashif et al., 2013, Nikolic et al., 2013] has determined that NoCs have distinctive traits requiring significant changes to classical multi-processor scheduling theory [Goossens et al., 2003]. The second reason concerns automation: the complexity of many-cores and of the (parallel) applications mapped onto them is such that allocation and scheduling must be largely automated.
Efficient and real-time implementation of applications onto NoC-based systems remains largely an open problem: the issue of best mapping computation parts (threads, tasks, ...) onto processing resources is amply recognized, while the issue of best using the NoC interconnect to route and transfer data is still less commonly tackled.
In the most general case, dynamic allocation of applications and channel virtualization can be guided by user-provided information in various forms, as in OpenMP (Open Multi-Processing) [OpenMP, 2008], CUDA (Compute Unified Device Architecture) [Nvidia CUDA, 2006], and OpenCL (Open Computing Language) [Khronos, 2011]. But there is no clear guarantee of optimality. Conversely, there are consistent efforts, in the domains of embedded and HPC computing, aiming at automatic parallelization and at compile-time mapping and scheduling optimization. They rely on the fact that applications are often known in advance, and deployed without disturbance from foreign applications and without uncontrolled dynamic creation of tasks. The results of this thesis fit in this “static application mapping” case.
We focus on mapping techniques where all allocation and scheduling decisions are taken off-line. In theory, off-line algorithms allow the computation of scheduling tables specifying an optimal allocation and real-time scheduling of the various computations and communications onto the resources of the MPPA. In practice, this ability is severely limited by three factors:

1. The application may exhibit a high degree of dynamicity, due either to environment constraints or to execution time variability resulting from data-dependent conditional control.⁴

2. The hardware may not allow the implementation of optimal scheduling tables. For instance, most MPPA architectures provide only limited control over the scheduling of communications inside the NoC.

3. The mapping problems we consider here are NP-hard. In practice, this means that optimality cannot be attained, and that efficient heuristics are needed.

1.2 Thesis contributions

In our work, we are interested in a specific sub-class of many-core architectures: the massively parallel processor arrays (MPPAs) characterized by:

• A large number of processing cores, ranging in current silicon implementations from a few tens to a few hundreds. The cores are typically chosen for their area and energy efficiency, instead of raw computing power.

• A regular internal structure where processor cores and internal storage (RAM banks) are divided among a set of identical tiles, which are connected through one or more NoCs with regular structure (e.g. mesh or torus topologies). In this thesis, we focus on 2D-mesh topologies.

• The capability of executing in parallel a different task on each core. Known as task parallelism or multiple-instruction, multiple-data (MIMD), this parallel computing paradigm is also that of classical distributed computing and multi-processor real-time scheduling.

My contributions in this thesis concern the hardware and the mapping technique. On the hardware design side, I extend an existing state-of-the-art NoC architecture to allow programmed arbitration, and thus provide the best support for off-line scheduling. On the mapping side, I explain how low-level details of NoC-based MPPA architectures can be taken into account in a scalable way to allow the synthesis of schedules with unprecedented timing precision. To evaluate our novel NoC architecture and mapping technique, we build a cycle-accurate model of a NoC-based MPPA and a new tool for automatic real-time mapping and code generation, called LoPhT.

⁴ Implementing an optimal control scheme for such an application may require more resources than the application itself, which is why on-line scheduling techniques are often preferred.

1.2.1 The DSPINpro programmable Network-on-Chip Our purpose is to investigate how the underlying architecture should offer the proper in- frastructures to implement optimal computation and communication mappings and sched- ules. We concretely support our proposed approach by extending the DSPIN 2D mesh network-on-chip (NoC) [Panades, 2008] developed at UPMC-LIP6. In the DSPIN NoC, we replace the fair arbitration modules of the NoC routers with static, micro-programmable arbiters that can enforce a given packet routing sequence, as specified by small programs. We advocate the desired level of expressiveness/complexity for such simple configuration programs. The result is named Programmable DSPIN, or DSPINpro. To improve the efficiency and predictability of our MPPA architecture, and thus facil- itate real-time mapping, we also constrained and standardized the structure of the com- puting tiles connected to the DSPINpro NoC, as well as the software architecture of our implementations. On the hardware side, we constrain the type and number of tile com- ponents (CPUs, RAM banks, DMA units ...).5 On the software architecture side, we constrain the memory organization, we impose that all computations are performed on lo- cal tile data, with specific “send” operations being in charge of all inter-tile data transfers (along with a software lock mecanism), and we require the explicit placement of input and output data of the tile on memory banks.

1.2.2 Automatic real-time mapping and code generation

We propose a technique and a tool for the automated mapping of real-time applications onto MPPA architectures based on 2D-mesh NoCs. Our tool is named LoPhT, for Logical to Physical Time compiler.
The global flow of our mapping technique, pictured in Fig. 1.3, is similar to that of the AAA methodology and the SynDEx tool [Grandpierre and Sorel, 2003]. It takes as input functional specifications provided in the form of data-flow synchronous specifications à la Scade/Lustre [Caspi et al., 2003]. Our input formalism allows the specification of conditional execution and execution modes, which are common features in complex embedded control specifications.
To map such specifications onto MPPA architectures and provide hard real-time guarantees, we rely on off-line allocation and scheduling algorithms. These algorithms also take as input a description of the MPPA platform and non-functional constraints, including allocation constraints and conservative upper estimates for the:

⁵ We have also developed a tool allowing the automatic synthesis of the corresponding SystemC models and memory maps from simple architecture specifications defining the NoC size and the type and number of tile components.

[Figure 1.3 shows the flow: a data-flow specification (Clocked Graph), a hardware model (NoC resources, CPUs, DMAs, RAMs), and a non-functional specification (timing: WCETs and WCCTs; allocation) enter the automatic mapping (allocation + scheduling) tool LoPhT, which produces a global scheduling table covering CPUs and interconnect; code generation then produces the real-time application: CPU programs and NoC programs, with timing guarantees.]

Figure 1.3: Global flow of the proposed mapping technique

• Worst-case execution times (WCETs) of the data-flow blocks (seen as atomic computations).

• The worst-case size of the data transmitted through the data-flow arcs, needed to determine the worst-case communication time (WCCT) of basic communication/synchronization operations.

Starting from these inputs, our algorithms build reservation tables (also called scheduling tables or simply time tables) specifying, for each resource of the platform, its use by the various computations and communications. Reservation tables are then converted into sequential code ensuring the correct ordering of operations on each resource and the respect of the real-time guarantees.
Our main contribution is to provide new mapping algorithms that explicitly take into account MPPA-specific features. The first problem here is that NoCs are very different from the communication buses used in classical real-time scheduling work. NoCs are composed of multiple communication resources that must be considered separately during mapping. However, reserving a communication path along the NoC requires the synchronized reservation of the resources along the path, due to the limited amount of buffer memory inside the NoC. The second problem is that the large number of computation and communication resources requires the use of scalable, yet precise, mapping algorithms.
To provide tight real-time guarantees, our mapping heuristics seek to achieve a good parallelization of the application while ensuring that concurrent computations and communications do not interfere (functionally or temporally) with each other outside of functionally-needed synchronization points. The absence of interferences reduces the pessimism of the worst-case timing analysis and limits resource over-allocation. Achieving such functional and temporal isolation can be done with low overhead in MPPA architectures that give the programmer good control over the memory hierarchy and the interconnect.
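As a minimal sketch of what a reservation table can look like as a data structure (an illustration of the concept under assumed names, not LoPhT's actual internal representation):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A reservation: one operation occupies one resource during [start, end).
// Times are in cycles, derived from WCETs/WCCTs, so the table is a
// worst-case schedule valid for every execution.
struct Reservation {
    uint64_t start;         // first cycle of the reservation
    uint64_t end;           // first cycle after the reservation
    std::string operation;  // data-flow block or NoC transfer being mapped
};

// The global scheduling table: every resource of the platform (CPU cores,
// DMA units, NoC links, RAM banks) has its own timeline of reservations.
using SchedulingTable = std::map<std::string, std::vector<Reservation>>;

int main() {
    SchedulingTable table;
    // A computation on a core, then the transfer of its result: the NoC
    // links along the path are reserved in a synchronized way, reflecting
    // the small amount of buffering inside the NoC.
    table["tile(0,0).cpu0"].push_back({0, 1000, "FFT_stage1"});
    table["noc.link(0,0)->(0,1)"].push_back({1000, 1160, "send stage1->stage2"});
    table["noc.link(0,1)->(1,1)"].push_back({1000, 1160, "send stage1->stage2"});
    table["tile(1,1).cpu0"].push_back({1160, 2100, "FFT_stage2"});
    return 0;
}
```

Code generation then walks each resource's timeline in order, emitting sequential code whose synchronizations preserve the precedences encoded in the table.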

We evaluated our hardware extensions and mapping technique by automatically implementing two signal processing applications: a model of an automotive embedded control application and an implementation of the Fast Fourier Transform (FFT). These two examples provide a good illustration of how abstract data-flow communications between compute operations have to be organized according to the crossing traffic at the routers, once computations have been mapped to processing elements. For both applications, NoC arbitration programming reduces contentions and communication time, and therefore global execution time. We obtain in both cases good latency, throughput, and resource use figures.

1.2.3 An environment for virtual prototyping of MPPA applications

Together, our extension of the DSPIN NoC and the development of a novel mapping technique and tool define a new environment for the virtual prototyping of real-time MPPA applications. The expression “virtual prototyping” is used here to mark the fact that hardware execution is simulated in software, as opposed to “emulated” on an FPGA target after full hardware synthesis. The SoCLib library [LIP6, 2011] allows both. We preferred the simulation-based approach because it facilitated both hardware design and the precise timing measurements needed to evaluate our real-time mapping technique on the resulting platform. Note that no difference exists between simulation and emulation from the point of view of the software or the real-time properties, given that the simulation is of the cycle-accurate, bit-accurate (CABA) type.

[Figure 1.4 shows the environment: the MPPA configuration (size, tile structure) is used both to instantiate the CABA SystemC MPPA model (standard SoCLib components, the DSPINpro NoC, and other components), which is compiled into a hardware simulator, and to build the architecture model given to the LoPhT tool. LoPhT also receives the data-flow graph with its timing and allocation constraints (timing obtained by WCET analysis) and produces the application: CPU and NoC programs plus real-time guarantees. These programs are compiled (gcc & nocc) together with libraries (data-flow blocks, NoC, I/O, DMA), and the resulting executables run in CABA simulation on the hardware simulator.]

Figure 1.4: An environment for virtual prototyping of MPPA applications

In our environment, described in Fig. 1.4, objects with thick borders are artefacts or transformations that were created or significantly modified as part of this thesis. We start by choosing the high-level configuration parameters of the MPPA platform, which include the size (number of tiles) of the MPPA, the structure of the standard tile, and the positioning of I/O devices. Once the configuration is fixed, we can instantiate the MPPA SystemC model. This model uses standard components from the SoCLib library, but also the DSPINpro network-on-chip and other hardware components developed on top of SoCLib as part of our effort to build an MPPA platform reconciling timing predictability and efficiency. The model is then compiled to obtain the hardware simulator.
The configuration parameters are also used to build the architecture model taken as input by our mapping tool LoPhT. LoPhT also takes as input a functional specification provided in the form of a data-flow synchronous specification, and a non-functional specification defining the worst-case durations of all data-flow blocks and communications, as well as possible allocation constraints. Worst-case durations are obtained through WCET analysis of the C code associated with the data-flow blocks and other library functions (this aspect will not be covered in this thesis; the interested reader is referred to [Puaut and Potop-Butucaru, 2013]). The LoPhT tool takes this input specification and transforms it into statically scheduled code for the CPU cores and the NoC routers. This code is compiled separately for each sequential resource, using either gcc (for the C code of the CPUs) or nocc (designed and implemented by us) for the NoC programs. The resulting code is executed on the hardware simulator, which allows us to verify that the real-time guarantees computed by LoPhT are respected.
Of course, this environment is the result of highly collaborative work between my team (INRIA AOSTE) and the SoC team of the UPMC/LIP6 laboratory (led at the time by A. Munier). From the SoC team, the main participants were François Pêcheux, Franck Wajsburt, and Zhen Zhang, who carried out most of the hardware design. From the AOSTE team, in addition to myself, Thomas Carle helped in the implementation of the scheduler, and Robert de Simone provided major insights on the high-level architectural modeling. My personal contributions are the following:

• I participated in the definition of the DSPINpro NoC and the MPPA platform by specifying what services the hardware should provide in order to allow efficient and predictable real-time implementation. The actual definition of the hardware was carried out by the LIP6 team and D. Potop.

• I defined the architecture model taken as input by LoPhT.

• Starting from off-line mapping algorithms originally defined by T. Carle and D. Potop, I have extended them to cover the NoC-specific and MPPA-specific aspects, such as the synchronized reservation of NoC resources.

• I have defined the code generation scheme which ensures that the real-time guarantees computed by LoPhT are respected in the running implementations. This scheme takes into account the low-level details of the MPPA platform to allow for low-overhead synchronization and for high precision in resource allocation.

1.3 Outline

My thesis is organized as follows:
Chapter 2 summarizes the state of the art in NoC design and in mapping and scheduling for multi-core and many-core platforms.
Chapter 3 presents the details of the DSPIN NoC and of the SoCLib-based MPPA architecture on which our work is based. It also presents the changes brought to the tiles of the SoCLib-based MPPA to allow predictable and efficient implementation.
Chapter 4 presents the concept of programmed arbitration and its implementation in the DSPINpro NoC. It also presents an evaluation of the gains obtained through semi-automatic mapping of two applications onto the new platform.
Chapter 5 presents the LoPhT tool for automatic real-time mapping of embedded control specifications onto MPPAs.

Chapter 2

State of the art

Contents
  2.1 Network-on-Chip design
    2.1.1 NoC building blocks
    2.1.2 NoC topology
    2.1.3 NoC switching
    2.1.4 Existing Network-on-Chip architectures
  2.2 Massively parallel processor arrays
    2.2.1 Tilera TILEPro64
    2.2.2 Kalray MPPA-256
    2.2.3 Adapteva Epiphany
    2.2.4 Intel SCC
    2.2.5 ST Microelectronics STHORM
    2.2.6 TSAR
    2.2.7 Academic MPSoC architectures with TDM-based NoC arbitration
  2.3 Static application mapping
    2.3.1 Off-line real-time multi-processor scheduling
    2.3.2 The StreamIt compiler for the MIT RAW architecture
    2.3.3 Compilation of the ΣC language for the Kalray MPPA-256 platform
    2.3.4 Other mapping approaches

Our work extends the state of the art in three fields: Network-on-Chip (NoC) design, Massively Parallel Processor Array (MPPA) design, and mapping techniques for multi/many-cores. We start this chapter with a general introduction of NoC-related concepts, and then we identify and compare the hardware mechanisms supporting precise timing analysis and efficient resource allocation in existing NoC architectures. Most importantly, Section 2.1.3 reviews existing NoC switching paradigms and their support for real-time application mapping. Section 2.2 presents some examples of existing industrial and academic MPPA platforms. Finally, we review related work on application mapping for multi-core, distributed, and many-core architectures.

2.1 Network-on-Chip design

The Network-on-Chip (NoC) paradigm has been defined by its authors as a “layered-stack” approach to the design of the on-chip interconnect [Sgroi et al., 2001], inspired by the classical layered models for computer networks [Zimmermann, 1988]. In this paradigm, the communication functions of the NoC are organized in a set of logical layers, each layer using the services of the lower layers in order to provide higher-level services.
At the highest abstraction level, a NoC can be viewed as a classical telecommunication network offering lossless communication services with some QoS properties between a set of computation and storage elements (CPU cores, RAM banks, I/O devices, etc.). These services rely on lower-level routing and switching algorithms controlling the behavior of the NoC building blocks: links, routers, and interfaces. We briefly describe these elements here. Our description focuses on the traffic management mechanisms supporting real-time implementation.

2.1.1 NoC building blocks

A Network-on-Chip (NoC) is formed of components of three types: links, routers, and Network Interface Controllers (NICs).

• Links: The communication links are the central data transmission media. They interconnect all the routers and the NICs. Links are unidirectional, point-to-point media. Bidirectional point-to-point communication lines can be obtained by pairing two links, one for each communication direction. One fundamental characteristic of links is their buffering capability, which will be discussed later.

• Routers: They implement the flow control policies (routing, arbitration) and define the overall strategy for moving data through the NoC. As shown in Fig. 2.1, each NoC router is composed of buffers, which provide temporary storage for incoming and outgoing data; routing components, which are essentially demultiplexers (labeled D) with some control logic; and scheduling/arbitration components, which are essentially multiplexers (labeled M) with some control logic.

• Network Interface Controllers (NICs): Like in a classical computer network, storage and computing components of a many-core are often grouped in a number of sub-systems, called Processing Elements (PEs). The NICs provide the logical connection between the PEs and the NoC. Their main task is protocol conversion between the protocol used locally within the PE and that of the NoC. In particular, NICs are responsible for building the data packets transferred through the NoC.

[Figure 2.1 shows a generic router model: input links feed buffers and routing demultiplexers (labeled D), driven by the routing control logic; arbitration multiplexers (labeled M), driven by the arbitration control logic, merge the demultiplexer outputs onto buffered output links.]

Figure 2.1: Generic router model

2.1.2 NoC topology

The topology of a NoC is usually modeled by an adjacency graph describing how the routers, NICs, and PEs are interconnected using NoC links. NoC topologies can be either regular or irregular. The most common regular topologies are presented in Fig. 2.2. In this figure, the routers are represented with squares and the PEs with circles. Each PE contains exactly one NIC. The arcs represent either unidirectional links or bidirectional pairs of links.

[Figure 2.2 shows some regular NoC topologies: a) ring; b) fat-tree; c) multistage interconnection networks (here, a butterfly topology); d) 2D mesh; e) 2D torus.]

Figure 2.2: Some regular NoC topologies

Comparisons between the various regular topologies (in terms of latency, throughput, and energy dissipation) can be found in [Pande et al., 2005]. In this thesis we focus on 2D-mesh NoCs like the ones used in the Adapteva Epiphany [Adapteva, 2012], Tilera TilePro [Tilera, 2008], or DSPIN [Panades, 2008]. The structure of a router in a 2D-mesh NoC (a specialization of the general router structure of Fig. 2.1) is presented in Fig. 2.3. It has five bidirectional connections (labeled North, South, East, West, and Local) to the bidirectional links leading to the four neighboring routers and the local PE. Note that mesh and torus NoCs are often used in so-called tiled many-core architectures, where most PEs have a standard form (approximated with a rectangle) resulting in a regular, tiled organization of the many-core chip. This is why in tiled many-cores the PEs are usually called computing tiles or simply tiles.

2.1.3 NoC switching

The allocation of NoC resources to the various data transmissions is governed by a set of design choices and algorithms commonly known as the switching policy of the NoC. This very general notion covers all aspects of NoC data transmission: the organization of data into transmission units, routing, buffering, and arbitration/scheduling. We briefly discuss these aspects.

2.1.3.1 Routing

The routing algorithm, implemented by the NoC router demultiplexers, defines how data is routed from its source towards its destination. It must decide at each intermediate router which output link(s) are to be selected for each incoming data packet.
Routing algorithms can be classified according to various criteria. According to the number of destinations of individual data transmission operations, routing algorithms can be unicast, when each data transmission operation has a single destination, multicast, when a transmission can have several destinations, or broadcast, when each data is transmitted to all PEs. In this thesis, we use unicast routing, where each data message arriving at a router through an incoming link must be forwarded through only one of the output links.

Figure 2.3: Generic router for a 2D mesh NoC with X-first routing policy

Depending on where routing decisions are taken, routing algorithms can be divided into source routing and non-source routing algorithms. In source routing, the whole path is fixed by the sender of the data and explicitly encoded in the headers of the data packets. The path information is read and used at each router traversed by the packet. In non-source routing, each router makes its own decisions locally, depending on parameters such as the final destination of the packet.

[Figure 2.4 classifies NoC routing algorithms into source routing and non-source routing; non-source routing is further divided into static (e.g. X-first) and adaptive.]

Figure 2.4: Classification of NoC routing algorithms

Non-source routing algorithms are either static (sometimes called deterministic) or adaptive. When static routing is used, all traffic between a given source and destination follows the same route (path). Adaptive algorithms can dynamically modify the routing path depending on network conditions such as the presence of faulty or congested channels. In an adaptive NoC, data packets sent from a given source towards a given destination can follow different paths and arrive in an order different from the sending order. Adaptive routing can reduce congestion, but its dynamic nature makes timing analysis more difficult, and thus complicates implementation when hard real-time guarantees are needed. Existing NoC architectures with support for real-time scheduling do not employ adaptive routing, and we make the same choice. In this thesis, we will use the classical X-first policy.
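To make the X-first policy concrete, here is a minimal sketch of the routing decision at one router (my own illustration of the classical policy, not DSPIN source code; the orientation of the Y axis is an arbitrary convention here):

```cpp
// X-first (XY) routing decision at one router of a 2D mesh.
// The packet header carries the destination coordinates; the router
// first corrects the X coordinate, then the Y coordinate. Because the
// decision is a pure function of (current, destination), the route is
// static: packets of the same flow always take the same path.
enum class Port { North, South, East, West, Local };

struct Coord { int x, y; };

Port route_x_first(Coord here, Coord dest) {
    if (dest.x > here.x) return Port::East;   // travel along X first
    if (dest.x < here.x) return Port::West;
    if (dest.y > here.y) return Port::North;  // then along Y
    if (dest.y < here.y) return Port::South;
    return Port::Local;                       // arrived: deliver to the PE
}
```

Beyond its simplicity, dimension-ordered routing of this kind is known to be deadlock-free on a mesh, which is part of what makes it attractive for predictable designs.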

2.1.3.2 Switching method and buffering policy

All NoC switching methods belong to one of two basic switching paradigms: circuit switching or packet switching. In circuit switching [Hilton and Nelson, 2006], communications are performed through dedicated communication channels (called circuits) that connect the source and destination PEs. A circuit consists of a sequence of point-to-point physical links going all the way from the source PE to the destination PE. Two circuits cannot share a physical link. This is achieved by statically fixing the output direction of each demultiplexer and the data source of each multiplexer along the channel path. Timing interferences between circuits are impossible. Thus, throughput is guaranteed and latency is predictable, but NoC use is usually low. In packet switching, data is divided into small packets that are transmitted independently. Each packet is formed of a sequence of flits, where a flit is the amount of data that a link can transmit at the same time (in one clock cycle, for a synchronous NoC). Each packet must contain, in its header flits, all the information needed to perform its communication, such as destination, priority, etc. The packetization of data allows a link to be used by multiple data transmissions at the same time, by interleaving the transmission of the packets coming from different sources. This is why NoC utilization figures are usually better than those of circuit switching approaches.
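As a concrete illustration of packetization, the following sketch splits a message into flits led by a header flit. The flit size, the tuple encoding, and the HEAD/BODY/TAIL markers are assumptions made for the example, not the packet format of any specific NoC.

```python
# Illustrative packetization of a message into flits.

FLIT_BYTES = 4  # amount of data one link transfers per cycle (assumed)

def packetize(payload: bytes, destination: tuple):
    """Split a non-empty `payload` into flits, prefixed by a header flit
    carrying the information routers need (here: destination coordinates)."""
    header = ("HEAD", destination)
    body = [("BODY", payload[i:i + FLIT_BYTES])
            for i in range(0, len(payload), FLIT_BYTES)]
    body[-1] = ("TAIL", body[-1][1])   # mark the last flit, closing the packet
    return [header] + body

flits = packetize(b"hello, NoC!", destination=(2, 3))
# -> [('HEAD', (2, 3)), ('BODY', b'hell'), ('BODY', b'o, N'), ('TAIL', b'oC!')]
```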

Figure 2.5: Classification of NoC switching techniques (part 1): circuit switching vs. packet switching, the latter with store-and-forward, virtual cut-through, or wormhole packet buffering policies

NoCs are complex communication networks where each data transmission traverses at least 2 routers. Achieving a good transmission throughput over such a network requires the use of some buffering in its routers and links. As pictured in Fig. 2.5, commonly-used buffering policies are store-and-forward, virtual cut-through and wormhole.

Store-and-forward This is the simplest buffering policy, used in most telecommunication networks. It requires that a router receives and stores a full data packet before forwarding it to the next router or NIC. To allow the storage of full packets, this method requires a large amount of buffering space in each router. This is why it is seldom used in NoCs [Kumar et al., 2002].

Wormhole In this buffering policy, a router makes its routing and arbitration decisions as soon as the header flits of a packet arrive. These flits are needed because they may contain information such as the packet destination, or its priority, which are required by the routing and arbitration algorithms. Once the decisions are made, transmission can start as soon as the needed outgoing link allows transmission. This policy reduces communication latency and only requires a small buffering capacity. This is why it is the most common buffering policy in NoCs, used in all architectures described in the remainder of this thesis.

Virtual cut-through This approach is intermediate between the store-and-forward and wormhole buffering policies. Like in wormhole switching, forwarding can start as soon as one flit has been received. But forwarding can only start when signalling ensures that the next router on the path can receive the full packet [Sadawarte et al., 2011]. Thus, if a packet is blocked waiting for a link to be freed, it will be stored entirely on one router without blocking others. By comparison, in wormhole routing a blocked packet can stretch over several routers, blocking resources in all of them. Virtual cut-through is used in the NoC of the Intel SCC many-core chip [Howard and al., 2011].

A second classification criterion divides NoC switching techniques into connection-oriented and connection-less ones (cf. Fig. 2.6): Connection-oriented techniques rely on dedicated (logical) connection paths established prior to the actual transmission of data. Connections can be created and destroyed using specific operations that typically carry a large timing penalty, but once a connection is established, communication along it is facilitated. In connection-less switching techniques, the communication occurs in a dynamic manner with no prior connection-oriented resource allocation. By definition, circuit switched communication is connection-oriented, whereas packet switched communication can be either connection-oriented (based on virtual circuit approaches detailed below), or connection-less.

Figure 2.6: Classification of NoC switching techniques (part 2): connection-less (packet switching) vs. connection-oriented (virtual circuit switching and circuit switching)

2.1.3.3 Arbitration/Scheduling

In packet switching NoCs a link can be shared by several data transmissions, and arbitration is needed to determine how packets belonging to multiple data transmissions are interleaved. Arbitration is realized at the level of NoC multiplexers.1 As pictured in Fig. 2.7, several arbitration mechanisms are used in practice, each offering different levels of support for real-time implementation. The simplest and most common arbitration technique is fair arbitration. In this approach, if two or more packets arriving from different sources request at the same time to pass through a NoC multiplexer, a fair arbitration policy such as Round Robin is used to decide which one passes first. The process is repeated whenever such a contention occurs. Along with a limitation on NoC packet sizes, the use of fair arbitration ensures that the NoC resources are evenly distributed among the data transmissions using them, with good NoC utilization factors and guaranteed (albeit possibly low) transmission throughputs for each transmission. Fair arbitration is used in the industrial NoC-based platforms Adapteva Epiphany [Adapteva, 2012], Kalray MPPA256 [MPPA, 2012], and ST STHorm [Benini, 2010]. In the Tilera TilePro64 [Tilera, 2008] chip, fair arbitration is used in 5 out of the 6 NoCs. In research prototypes, fair arbitration can be found from the early NoC architectures, such as SPIN [Guerrier and Greiner, 2000], to the more recent designs, where it is used in conjunction with other types of arbitration, such as priority-based, time division multiplexing-based, or programmed arbitration, as described below. When designing real-time systems, the objective is to respect the real-time requirements. Achieving this goal on resource-constrained architectures usually amounts to achieving the best possible predictable efficiency. Given the large number of potential NoC contention points (router multiplexers), and the synchronizations induced by data transmissions, providing tight static timing guarantees is only possible if some form of system-level flow control mechanism is used. The tightest timing control is provided by circuit switching approaches. As explained earlier in this section, in circuit switching NoCs all communications are performed through dedicated communication channels formed of point-to-point physical links. Channels are set up so that they share no NoC resource, which makes timing interferences impossible. Once a channel is set up, communication latency and throughput are the best possible and timing analysis is easy. But the absence of resource sharing is also the main drawback of circuit switching, resulting in low utilization of the NoC resources. Even more important, the number of channels that can be established is limited by the number of NoC links, which severely limits application mapping choices. This is why most NoCs use a packet switching approach. In this case, four types of

1Arbitration mechanisms will be presented in greater detail in Chapter 4.

NoC control mechanisms have been proposed to improve the real-time properties: Time Division Multiplexing (TDM), bandwidth reservation, priority-based arbitration, and programmed arbitration. As shown in Fig. 2.7, more than one arbitration scheme can be supported in a NoC architecture, like in Æthereal (TDM-based, fair, and priority-based) or DSPIN (fair and priority-based).
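As an illustration of fair arbitration, here is a minimal sketch of a Round Robin arbiter as it could be used at a NoC multiplexer; the port numbering and function interface are assumptions made for this example, not a description of any particular router's logic.

```python
# Minimal Round Robin arbitration sketch: among the inputs requesting the
# output link, the first one found after the previously served input wins.

def round_robin_arbiter(requests, n_inputs, last_served):
    """`requests` is the set of input ports with a packet waiting.
    Returns the input port to serve next, scanning from last_served + 1."""
    for offset in range(1, n_inputs + 1):
        candidate = (last_served + offset) % n_inputs
        if candidate in requests:
            return candidate
    return None  # no input is requesting the output link

# Inputs 0 and 3 both request the link; input 0 was served last time,
# so the scan starts at input 1 and input 3 wins.
assert round_robin_arbiter({0, 3}, n_inputs=4, last_served=0) == 3
```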

Figure 2.7: Classification of NoC resource allocation policies in packet-switched NoCs. Fair arbitration without QoS guarantees: SPIN, DSPIN, AEthereal (BE), TilePro64 (5 NoCs), Epiphany, Kalray MPPA, STHorm. Latency/throughput guarantees: bandwidth reservation (Kalray MPPA, Nostrum (GB)), TDM-based (AEthereal (GT), Nostrum (GB)), priority-based (DSPIN (BE+GS), AEthereal (GT+BE), Intel SCC). Aware of the application state: programmed (DSPINpro, MIT RAW, TilePro64 (1 NoC))

TDM-based and bandwidth reservation resource allocation techniques are often used in the context of virtual circuit switching communication protocols. Virtual circuit switching is an evolution of circuit switching which allows NoC link sharing between circuits, but ensures that each circuit has configurable and guaranteed latency and throughput. The most popular way of providing these guarantees is by relying on time division multiplexing (TDM) arbitration in the NoC routers. NoCs relying on TDM arbitration include Æthereal [Goossens et al., 2005], Nostrum [Millberg et al., 2004b], and others [Sorensen et al., 2012]. In a TDM NoC, all routers share a common time base (the hardware clock). The point-to-point links are reserved for the use of the virtual circuits following a fixed cyclic schedule (a scheduling table). The reservations made on the various links ensure that communications can follow their path without waiting. TDM-based NoCs allow the computation of precise latency and throughput guarantees. They also ensure a strong temporal isolation between virtual circuits, so that changes to a virtual circuit do not modify the real-time characteristics of the others. When no global time base exists, the same type of latency and throughput guarantees can be obtained in NoCs relying on bandwidth management mechanisms, such as the Kalray MPPA [MPPA, 2012, Harrand and Durand, 2011]. The idea here is to ensure that the throughput of each virtual circuit is limited to a fraction of the transmission capacity of a physical point-to-point link, by either the emitting tile2 or the NoC routers. Two or

2In which case the actual NoC arbiters can be simple fair arbiters.

more virtual circuits can share a point-to-point link if the sum of their transmission needs is less than what the physical link provides. But TDM and bandwidth management NoCs have certain limitations: One of them is that latency and throughput are correlated [Shi and Burns, 2010], which may result in a waste of resources. But the latency-throughput correlation is just one consequence of a more profound limitation: TDM and bandwidth management NoCs largely ignore the fact that the needs of an application may change during execution, depending on its state. For instance, when scheduling a dependent task system with the objective of reducing task graph makespan, it is often useful to allow some communications to use 100% of the physical link, so that they complete faster, before allowing all other communications to be performed. One way of taking into account the application state is by using NoCs with support for priority-based scheduling [Shi and Burns, 2010, Panades, 2008, Howard and al., 2011]. In these NoCs, each data packet is assigned a priority level (a small integer), and NoC routers allow higher-priority packets to pass before lower-priority packets. To avoid priority inversion phenomena, higher-priority packets have the right to preempt the transmission of lower-priority ones. In turn, this requires the use of one separate buffer for each priority level in each router multiplexer, a mechanism known as virtual channels in the NoC community [Panades, 2008]. The need for virtual channels is the main limiting factor of priority-based arbitration in NoCs. Indeed, adding a virtual channel (VC) is as complex as adding a whole new NoC [Yoon et al., 2010, Carara et al., 2007], and NoC resources (especially buffers) are expensive in both power consumption and area [Moscibroda and Mutlu, 2009]. To the best of our knowledge, among existing silicon implementations only the Intel SCC chip offers a relatively large number of VCs (eight) [Howard and al., 2011], and it is targeted at high-performance computing applications. Industrial MPPA chips targeting an embedded market usually feature multiple, specialized NoCs [Tilera, 2008, Adapteva, 2012, Harrand and Durand, 2011] without virtual channels. Other academic NoC architectures feature low numbers of VCs. Two VCs is a popular choice, in which case the channels are often labeled “guaranteed service”, for the high-priority one, and “best effort”, for the low-priority one [Goossens et al., 2005, Panades, 2008]. Current research on priority-based communication scheduling has already integrated this limitation, by investigating the sharing of priority levels [Shi and Burns, 2010]. The second limiting factor related to priority-based NoCs is that the associated scheduling theory mainly focuses on independent task systems. However, we have already explained that the large number of computing cores in a many-core architecture means that applications are likely to include parallelized code which is best modeled by large sets of relatively small dependent tasks with predictable functional and temporal behavior [Villalpando et al., 2010, Aubry et al., 2013, Gerdes et al., 2012]. Such timing-predictable dependent task systems are those that can a priori take advantage of an off-line scheduling approach, as opposed to on-line priority-based scheduling.
But efficient off-line mapping requires NoCs with support for static communication scheduling [Tilera, 2008, Djemal et al., 2012, Waingold et al., 1997]. The idea here is to determine an efficient (possibly optimal) global computation and communication schedule, represented with a scheduling table, and then enforce it through synchronized sequential computation and communication programs. Computation programs run on processor cores to sequence task executions and the initiation of communications. Communication programs run on specially-designed micro-controllers that control each NoC multiplexer to fix the order in which individual data packets are transmitted. Synchronization between the programs is ensured by the data packet communications themselves. Like in TDM NoCs, the use of global scheduling tables allows the computation of very precise latency and throughput estimations. Unlike in TDM NoCs, static communication scheduling allows NoC resource reservations dependent on the application state. Global clock synchronization is not needed, and existing NoCs based on static communication scheduling do not use it [Tilera, 2008, Waingold et al., 1997, Djemal et al., 2012]. Instead, global synchronization is realized by the data transmissions themselves (which eliminates some of the run-time pessimism of TDM-based approaches). The communication programs that drive each NoC router multiplexer are similar in structure to those used in TDM NoCs to enforce the TDM reservation pattern. The main difference is that the communication programs are usually longer than the TDM configurations, because they must cover longer execution patterns. This requires the use of a larger program memory (which can be seen as part of the tile program memory [Djemal et al., 2012]). But like in TDM NoCs, buffering needs are limited and no virtual channel mechanism is needed. From a mapping-oriented point of view, determining exact packet transmission orders cannot be separated from the larger problem of building a global scheduling table comprising both computations and communications. By comparison, mapping onto MPPAs with TDM-based or bandwidth reservation-based NoCs usually separates task allocation and scheduling from the synthesis of a NoC configuration independent from the application state [Lu and Jantsch, 2007, Aubry et al., 2013]. Under static communication scheduling, there is little run-time flexibility, as all scheduling possibilities must be considered during the off-line construction of the global scheduling table. For dynamic applications this can be difficult. This is why existing MPPA architectures that allow static communication scheduling, including the one we propose in this thesis, also allow communications with dynamic (Round Robin) arbitration.

2.1.4 Existing Network-on-Chip architectures

To provide a better understanding of the NoC switching concepts introduced above, we review several existing NoC architectures before explaining, in the next section, how they are used in existing many-core architectures.

2.1.4.1 DSPIN

The DSPIN NoC (for Distributed Scalable Predictable Interconnect Network) [Panades, 2008] was designed at the LIP6 laboratory by a team led by Alain Greiner. It is part of the SoCLib virtual prototyping library [LIP6, 2011] and has been physically implemented by ST Microelectronics [Miro Panades et al., 2006]. DSPIN extends concepts of the previously-defined SPIN on-chip interconnect. The main objectives of the extension were to facilitate the design of tiled MPSoC architectures through the use of a regular interconnect topology, and to allow the design of globally asynchronous, locally synchronous (GALS) MPSoCs where each tile can have its own local clock, different in frequency and/or phase from the clocks of other tiles. DSPIN has a regular 2D mesh topology. It uses a static X-first routing algorithm, and follows a wormhole packet switching approach. Arbitration is based on a combination of priority-based and fair algorithms. There are two priority levels, labeled guaranteed service (GS) and best effort (BE). When GS traffic reaches a NoC router multiplexer it interrupts ongoing BE traffic (if any). Interruption is done with flit granularity, meaning that the transmission of a BE packet is stalled, to be continued only after all GS packets have passed. Arbitration among packets of the same priority level is fair, using a Round Robin algorithm. The programmable NoC developed as part of this thesis is based on DSPIN. This is why we provide an in-depth review of DSPIN in Chapter 3 (except for the priority-based arbitration features, which we do not use).
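The following toy model sketches this two-level arbitration with flit granularity: GS flits interrupt an ongoing BE packet, which resumes once all pending GS flits have passed. It is a behavioral sketch under invented names, not the actual DSPIN hardware logic.

```python
from collections import deque

def multiplex(be_packet, gs_arrivals):
    """Emit one flit per cycle. `gs_arrivals` maps cycle -> GS flits arriving
    at that cycle; GS flits preempt the BE packet with flit granularity, and
    the BE packet resumes once the GS queue is empty."""
    gs, be = deque(), deque(be_packet)
    out, cycle = [], 0
    while gs or be or any(c >= cycle for c in gs_arrivals):
        gs.extend(gs_arrivals.get(cycle, []))
        if gs:
            out.append(gs.popleft())   # GS always wins the multiplexer
        elif be:
            out.append(be.popleft())   # BE transmission (re)starts
        else:
            out.append(None)           # idle cycle
        cycle += 1
    return out

# GS flits g0, g1 arrive while BE packet b0 b1 b2 is in transmission:
print(multiplex(["b0", "b1", "b2"], {1: ["g0", "g1"]}))
# -> ['b0', 'g0', 'g1', 'b1', 'b2']
```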

2.1.4.2 Æthereal

The Æthereal NoC was developed at Philips Research Laboratories by a team led by K. Goossens [Goossens et al., 2005]. Like DSPIN, Æthereal uses a mix of arbitration policies to support two types of NoC traffic named guaranteed service (GS) and best effort (BE). Like in DSPIN, fair arbitration is used when multiple BE traffic flows arrive at a router multiplexer at the same time, and priority-based arbitration is used to let GS traffic interrupt BE traffic. Unlike in DSPIN, GS traffic is subject here to TDM-based arbitration, under a form dubbed contention-free routing. We only detail here the TDM-based arbitration of GS traffic. To allow TDM-based arbitration, Æthereal ensures that all NoC components share a global time basis (the NoC is globally synchronous). In this time basis, NoC time is divided into slots of equal duration. Slots are aligned between different NoC components (there is no phase shift). Each NoC router multiplexer contains a slot table of fixed size. During execution, this table is cyclically traversed from beginning to end to determine which traffic will be accepted at each time slot. More precisely, if T is the slot table associated with some router multiplexer, and if its length is n, then during slot s the multiplexer accepts input from the direction described by T[s mod n]. This direction is either one of the input links of the local router, or a special value specifying that the slot is to be left unused by GS traffic. To allow contention-free routing, the slot tables of the NoC must satisfy a coherency property. More precisely, Æthereal requires that if some data reaches a NoC router at slot s (through one of the input links), then it leaves the router at slot s + 1. This mechanism is illustrated in Fig. 2.8, which is reproduced from [Goossens et al., 2005]. This figure depicts 3 routers of a 2D mesh implementation of Æthereal. To simplify presentation, links to the local tiles are not represented. Thus, each router is connected with the routers next to it in the directions West, North, East, and South. In each of these directions, two links are used (one in each sense). In each router, the 4 multiplexers controlling the output links are labelled o0 ... o3, and the 4 input links i0 ... i3.

Figure 2.8: Æthereal contention-free routing (reproduced from [Goossens et al., 2005])

Fig. 2.8 only provides the slot tables of multiplexers o2 of router R1, o2 and o3 of router R2, and o1 and o2 of router R3. The slot tables all have size 4, and encode transfers along the 3 routes a, b, and c represented with light gray arrows. Slots in the slot tables are numbered from 0 (top one) to 3 (bottom one). Execution is globally synchronous, so all multiplexers move from one slot to the next at the same time. For instance, when in slot s = 2, the multiplexer o2 (East) of router R1 will accept data that came in the previous slot from i1 (North). Our figure pictures the case where all multiplexers are on slot 2. The labelled black bullets represent data that arrived during slot 1 and must be transmitted during slot 2. For instance, during slot 1 the output o2 of R2 has transmitted data belonging to c and coming from input i3. This data, which has arrived in R3 through i0, will be transmitted through o1 in slot 2. Note that the link going from R2 to R3 is multiplexed. In slot 1 it transmits data belonging to c, whereas in slot 2 it transmits data belonging to a. In our example the communications of a, b, and c are of unicast type, but the TDM mechanism described above allows multicast communications. Note that TDM arbitration is not work conserving, which means that slots may be left empty even though GS data is waiting to be transmitted over the NoC. Free slots are used to transmit low-priority best-effort (BE) traffic.
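A minimal sketch of this slot-table mechanism follows; the table contents are invented, and only the T[s mod n] lookup is modeled, not the flit transport itself.

```python
# Sketch of Æthereal-style TDM arbitration: a multiplexer with slot table T
# of length n accepts, during slot s, input from the direction T[s % n];
# data accepted at slot s leaves the router at slot s + 1 (the coherency
# property that makes routing contention-free). Table contents are invented.

T = ["i1", None, "i3", "i0"]  # None = slot left free for best-effort traffic

def accepted_input(slot: int):
    """Input direction this multiplexer listens to during `slot`."""
    return T[slot % len(T)]

for s in range(6):
    print(s, accepted_input(s))
# slots 0..5 accept from: i1, BE, i3, i0, i1, BE
```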

2.1.4.3 Nostrum

The Nostrum NoC was developed by the Laboratory of Electronics and Computer Science (LECS) at the Royal Institute of Technology in Sweden [Millberg et al., 2004b]. The objective of Nostrum was to reduce the need for buffering resources through innovative routing and arbitration mechanisms, not replicated elsewhere. The first choice of Nostrum is the use of deflective routing [Feige and Raghavan, 1992], which requires that all data arriving at a NoC router is immediately forwarded through some output link, even if the output link normally required by the packet to reach its destination is occupied. Deflective routing requires the use of an adaptive routing algorithm, but NoC routers do not need to store packet data, as packets are constantly in flight. A packet can be denied access to the NoC, but once accepted it is never blocked waiting for some other transmission to complete. Each packet has exactly one flit (so that it can be transmitted in one clock cycle). The deflective routing mechanism alone provides no guarantees on the duration of a data transfer, and data packets may arrive at their destination in an order different from the one in which they were sent. This is why this basic service is dubbed “Best Effort” (BE). On top of it, Nostrum provides a “Guaranteed Bandwidth” (GB) service designed to offer timing and packet ordering guarantees. To provide the GB service, NoC traffic is prioritized. There are two priorities: The lower one is that of BE traffic, and the higher one that of GB traffic. GB traffic is organized in a set of virtual circuits. In each virtual circuit, so-called containers [Millberg et al., 2004a] loop on a periodic basis. A container is a packet-size reservation. Once created, a container keeps looping around its virtual circuit. At each cycle, it can either be used (if data is available in the source PE for transmission), or it can be left unused. The reservation made for an empty container is lost. The transmission of containers along the virtual circuits is synchronized at NoC level, so that no contention occurs between containers belonging to different circuits. The result is similar to that obtained in Æthereal through the use of slot tables. Like in Æthereal, the construction of the virtual circuits and of their interleaving is a major problem that must be solved through global optimization approaches.

Figure 2.9: Nostrum looping container (reproduced from [Millberg et al., 2004a])

Fig. 2.9 depicts an example of a container looping around a virtual circuit spanning 3 routers: one connected to the tile sending data, one connected to the destination tile, and an intermediate one. This container is tracked during four clock cycles. In the first cycle, the empty container arrives at switch 1 (the GB source). The container is loaded with the GB traffic and sent off to the east. In the second cycle, the container and its load are routed along the predefined path. In the third cycle, the container reaches its destination, the information is unloaded, and the container is sent back, possibly loaded with some new information. In the fourth cycle the empty container traverses the intermediate router.

2.1.4.4 Kalray MPPA NoC

The MPPA256 architecture of Kalray [Harrand and Durand, 2011, MPPA, 2012] is a tiled many-core with a 2D torus NoC. The topology of the interconnect is detailed in Fig. 2.10. This figure depicts the routers of the 16 computing tiles (the 4x4 central square) and the 16 routers connected to the various I/O devices. To facilitate description, our figure presents only the (unidirectional) links along one vertical line of tiles. The full NoC is obtained by repeating this pattern along each horizontal and vertical line of tiles. NoC arbitration is done under a fair policy. Communications are performed along virtual channels, each (unidirectional) channel connecting one source tile with one destination tile. Setting up a channel amounts to assigning it a route and a latency/throughput budget. It is required that, for each physical link of the NoC, the sum of the transmission requirements of all the virtual channels using this link is less than the transmission capacity of the link. Under this condition, all channels will provide the latency/throughput guarantees assigned to them. Ensuring that a virtual channel does not attempt to take more resources than its budget allows is realized through bandwidth management mechanisms. More precisely, each tile and I/O device in the chip contains a configurable bandwidth limiter device. For each virtual channel, the respect of its latency/throughput budget is enforced by the bandwidth limiter of the source tile or I/O device.

Figure 2.10: Kalray MPPA NoC architecture (reproduced from [MPPA, 2012])

To give an example of how bandwidth allocation is done, we consider the example in Fig. 2.11 (the example is borrowed from [Harrand and Durand, 2011]). The complex torus topology of the Kalray NoC complicates graphical representation; to simplify our presentation, we use in Fig. 2.11 a 2D mesh NoC topology with bidirectional links.3 In our example, the NoC is traversed by 5 virtual channels, whose configuration information is listed in Table 2.1.

Figure 2.11: Example of throughput allocation in a meshed network (nodes N0–N7 attached to routers R0–R7, with per-link bandwidth annotations)

For each channel, this table provides its full route, including source and destination tiles, as well as the

3The creators of the Kalray NoC do the same in their patent application [Harrand and Durand, 2011].

Channel name   Source PE   Dest. PE   Route         Bandwidth allocation
a              T00         T03        00,01,02,03   16/16
b              T01         T10        01,11,10      12/16
c              T01         T13        01,11,12,13   4/16
d              T02         T13        02,12,13      8/16
e              T10         T13        10,11,12,13   4/16

Table 2.1: Virtual channels in the example of Fig. 2.11

allocated bandwidth, provided under the form of a transmission budget per time unit. For instance, virtual channel a starts in tile T00, traverses the routers of processing elements (tiles) 00, 01, 02, and 03, and transmits 16 data units per time unit. Bandwidth limiter devices are implemented with counters that only allow transmission for a fixed amount of time during each time unit. We assume that the transmission capacity of each NoC link is 16 data units per time unit.
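Under these assumptions, the feasibility condition can be checked mechanically: on every directed link, the budgets of the virtual channels sharing it must not exceed the link capacity. The following sketch encodes the channels of Table 2.1 and performs this check; the data structures are invented for the example.

```python
from collections import defaultdict

LINK_CAPACITY = 16  # data units per time unit, as assumed in the text

# Channel -> (route as the sequence of traversed routers, allocated budget).
channels = {
    "a": (["00", "01", "02", "03"], 16),
    "b": (["01", "11", "10"], 12),
    "c": (["01", "11", "12", "13"], 4),
    "d": (["02", "12", "13"], 8),
    "e": (["10", "11", "12", "13"], 4),
}

load = defaultdict(int)
for route, budget in channels.values():
    for link in zip(route, route[1:]):   # each hop is a directed link
        load[link] += budget

assert all(used <= LINK_CAPACITY for used in load.values())
# e.g. link ('01', '11') carries b + c = 12 + 4 = 16 <= 16
```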

2.1.4.5 The scalar interconnect of MIT RAW

The NoCs we already described in this section transfer data with packet granularity, and reducing the cost of data transfers usually amounts to ensuring that data is grouped into packets that are as large as the NoC architecture and the application allow. In this respect, the interconnect of the RAW many-core chip is very different, as data transfers are done with word granularity, hence the name scalar operand network given to the resulting many-core architecture. Allowing communications with word granularity enables the MPSoC-wide use of compilation techniques that exploit Instruction Level Parallelism (ILP) [Taylor et al., 2004] and a very fine grain, very efficient scheduling of computations and communications.4 The interconnect of RAW is also the only one among production NoCs to support programmed arbitration.5 As pictured in Fig. 2.12, in the RAW architecture the router of each tile (labeled “Switch”) contains a program memory (labeled “SMEM”) allowing the storage of a single sequential communication program that enforces a pre-computed static communication order concerning all 4 connections to the neighboring routers. This program is executed by the switch processor, not figured here, which has a very simple instruction set allowing only data send operations, branching, and nops [Waingold et al., 1997]. In addition to static communication scheduling, RAW also provides a dynamic communication mechanism that is used whenever the compiler is unable to determine a precise

4Such techniques were initially designed to exploit the parallelism between functional units of superscalar and VLIW processors [Taylor, 2003].
5The tiled TilePro64 many-core chip from Tilera uses this interconnect as one of the 5 on-chip networks [Tilera, 2008].

Figure 2.12: Organization of the tiled RAW many-core processor, and structure of a tile (reproduced from [Waingold et al., 1997])

static schedule of the application communications.
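To illustrate what such a switch program could look like, here is a hypothetical interpreter using only the operation types named above (data sends, branching, nops); the instruction encoding and port names are invented for the example and do not reflect the actual RAW instruction set.

```python
# Hypothetical interpreter for a RAW-like switch program.

def run_switch_program(program, steps):
    """Execute `steps` instructions; return the sequence of performed routes."""
    pc, trace = 0, []
    for _ in range(steps):
        op = program[pc]
        if op[0] == "SEND":            # route one word: SEND src_port dst_port
            trace.append((op[1], op[2]))
            pc += 1
        elif op[0] == "BRANCH":        # jump: here used to loop the schedule
            pc = op[1]
        elif op[0] == "NOP":           # wait one cycle
            pc += 1
    return trace

# A pre-computed static order: West->East, then Local->North, forever.
prog = [("SEND", "West", "East"), ("SEND", "Local", "North"), ("BRANCH", 0)]
print(run_switch_program(prog, 5))
# -> [('West', 'East'), ('Local', 'North'), ('West', 'East'), ('Local', 'North')]
```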

2.1.4.6 Other NoC architectures

This section already presented in detail a number of NoCs that were chosen to exemplify the implementation of various arbitration mechanisms in architectures for which extensive implementation documentation exists. Of course, many other NoCs have been defined, each with its own originality points. Under an arbitration-focused point of view we have already mentioned some of these NoCs in Section 2.1.4. We briefly mention here three more NoCs whose originality does not concern arbitration/scheduling: The ACROSS many-core platform [Salloum et al., 2013] has been designed to support the implementation of safety-critical systems, and its NoC provides mechanisms for ensuring the isolation (non-interference) between applications running on it. The NoC uses a classical TDM arbitration mechanism allowing the definition of virtual channels with fixed transmission budgets. To ensure that no unprivileged application running on a tile can alter the virtual channel communication of the NoC (and thus interfere with other applications), NoC access is controlled for each tile by a trusted hardware component, called the TISS, for Trusted Interface SubSystem. The reconfiguration of the TISS components can only be done by a secure Trusted Resource Manager (of which only one exists). Another metric in the design of NoCs is the area footprint of the communication subsystem, and significant work has been dedicated to reducing it. We mention here only two approaches where area reduction was a key objective. First, in the design of the asynchronous arrays of simple processors, the communication system is reduced to registers shared between processors of neighboring tiles [Yu et al., 2008]. The result is not a NoC in a classical sense, being even simpler than the interconnect of RAW. The second approach is a full-fledged NoC providing high-level communication services and real-time guarantees through the use of a TDM arbitration approach [Sørensen et al., 2012].

2.1.4.7 Comparison with our work

In RAW, the objective is to allow the MPSoC-wide use of compilation techniques that exploit Instruction Level Parallelism [Taylor et al., 2004] and a very fine grain, very efficient scheduling of computations and communications. The main difference in our case is that we aim for a coarser level of control in both the NoC hardware (transmission of packets instead of mere scalar values), and the software control of the NoC (which is performed through components such as cache controllers and DMA units). While losing in NoC routing flexibility and timing precision, our approach allows the use of a classical programming model, general-purpose development tools, and existing applications (like in DSPIN-based platforms). It also reduces the complexity of NoC programs. Our intent of allowing NoC resource reservations to improve temporal (or other) properties parallels that of existing work on NoC architectures based on TDM arbitration or bandwidth reservation mechanisms. The main difference we see here is that previous works use TDM arbitration and bandwidth reservation as just a way of providing quality of service (QoS)-like guarantees such as fixed throughput and latency. Our NoC, based on programmed communication, allows for allocating the NoC resources in a way that is synchronized with the fine-grained application needs. We allow communications to start as soon as the data to be sent is computed, and we can grant the transmission the exclusive use of all communication resources along its path for a fixed time duration.

2.2 Massively parallel processor arrays

While the previous section reviewed existing NoC architectures, we focus now on the structure of a full many-core, of which the NoC is only one component. We start by reviewing existing industrial MPPA chips and MPPA-like platforms. Then, we present a few research architectures that exhibit different compromises between performance and predictability.

2.2.1 Tilera TILEPro64

The TILEPro64™ [Tilera, 2008] is the second generation of many-core processors produced by Tilera, the company created by the designers of the RAW processor. As pictured in Fig. 2.13, it contains 64 processing cores, organized into as many identical computing tiles that are arranged in an 8 × 8 two-dimensional (2D) array. The tiles are interconnected and connected to the I/O devices via 6 independent NoCs.

Figure 2.13: The 64-core TILEPro64TM Tile processor (reproduced from [Tilera Corporation, 2013]).

Each tile is formed of three main components, labeled processor engine, cache engine and switch engine, which are interconnected through registers.

• The processor engine is a conventional very long instruction word (VLIW) processor [Fisher, 1983] with three instructions per instruction word and full memory management.

• The cache engine contains the tile’s translation lookaside buffers (TLBs), caches, and cache sequencers. Each tile has 16KB of L1 instruction cache, 8KB of L1 data cache, and a 64KB unified, 4-way set associative L2 cache. The L2 caches can be coherently shared among tiles, and thus the set of L2 caches can be viewed as a large L3 cache. A non-coherent and non-cached memory access mode is also supported. The cache engine also contains a DMA engine that is responsible for orchestrating memory data streaming between tiles and external memory, and among the tiles.

• The switch engine performs routing and arbitration for the 6 NoCs, as discussed below.

On-chip communication is performed through 6 independent NoCs with 2D mesh topology. Of these, 3 are exposed to the programmer through specific APIs that allow the definition of application-level streaming and messaging:

• The Static Network (STN) is an implementation of the programmable MIT RAW interconnect presented above. It switches scalar data between tiles with very low latency.

• The User Dynamic Network (UDN) uses a fair arbitration algorithm. Most user-defined inter-tile data transfers are expected to use this NoC.

• The I/O Dynamic Network (IDN) is used primarily for transfers between I/O devices and tiles, and between I/O devices and memory.

The remaining 3 NoCs are only used by specific hardware devices and cannot be accessed otherwise.

• The Memory Dynamic Network (MDN) is used for memory data transfers between the tiles and the 4 memory controllers of the chip. Only the Cache Engine has a direct hardware connection to the MDN.

• The Tile Dynamic Network (TDN) is also dedicated to memory traffic. It is used for transferring data between the caches of the tiles. Again, only the Cache Engine has a direct hardware connection to the TDN.

• The Coherence Dynamic Network (CDN) is also dedicated to memory traffic. It only carries cache-coherence invalidate messages between the tiles.

All NoCs with the exception of STN use wormhole packet switching, a static routing policy (X-first or Y-first), and fair (round robin) arbitration [Tilera Corporation, 2013].

2.2.2 Kalray MPPA-256

The Kalray MPPA-256 [MPPA, 2012] chip addresses both the high performance and embedded markets. As such it has good energy efficiency and provides support for hard real-time implementation in both the NoC (through bandwidth reservation mechanisms) and the computing tiles (described below). The chip integrates 288 cores, of which 272 are grouped into 16 computing tiles arranged into a 4 × 4 2D array. The remaining 16 cores are evenly distributed among the 4 I/O sub-systems, placed on the 4 sides, like in the TILEPro64 architecture. The tiles and the I/O sub-systems are inter-connected through two NoCs, one for data (D-NoC), and the other for control (C-NoC). Both NoCs have the topology and arbitration mechanisms presented in Section 2.1.4.4. The only differences between C-NoC and D-NoC concern the amount of buffering in NoC routers and the way NoC traffic is generated from the tiles. The D-NoC is dedicated to high bandwidth data transfers. It implements the QoS mechanisms described in Section 2.1.4.4. The C-NoC is dedicated to peripheral

and D-NoC flow control, and to application software messages. Given the relatively small amount of traffic, no QoS mechanism is used. The structure of a computing tile is provided in Fig. 2.14. Each tile contains 17 CPU cores. Of these, 16 are dedicated to data computations, and are labeled with C0...C15. The 17th processor is referred to as the system core and performs only resource management tasks, such as scheduling computations onto the other 16 cores and driving tile I/O. Each computing core has its own separate instruction and data caches (2-way set associative, 8kbytes each). Cache coherency is not enforced in hardware, and the cores themselves are timing compositional in the sense of [Wilhelm et al., 2009]. In addition to the computing cores, the tile contains 2Mbytes of shared memory distributed in 16 banks of 128kbytes each. Memory access can be configured to be either interleaved, or not. In interleaved access, successive memory addresses are placed onto different memory banks. More precisely, the memory word at address n will be placed on memory bank n mod 16. Such a memory access technique ensures a good spread of memory accesses over the memory banks in the absence of contention-minimizing memory allocation techniques, thus reducing the worst-case cost of contentions. In non-interleaved access, each memory bank is assigned a full range of addresses. This facilitates the explicit allocation of variables to memory banks, which can also be used to minimize the number of memory access contentions. The MPPA chip also features a DMA unit, a synchronization unit allowing the definition of synchronization barriers, and network interfaces for connecting to the NoCs. The processors, memory banks, DMAs, etc. are interconnected through a crossbar local interconnect. To minimize the size of the crossbar, the processing CPU cores are grouped by 2 using fair arbitration. Each of the four I/O subsystems contains a traditional 4-core symmetric multi-processor (SMP) with its own cache, static memory, and external DDR access. They operate controllers for the PCIe, Ethernet, Interlaken, and other I/O devices, and are meant to execute an operating system such as Linux or a real-time OS.
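A one-line model of the interleaved bank mapping described above (the bank count is from the text; the function name is invented):

```python
# Interleaved memory mapping as described for the MPPA-256 tile: the word at
# address n is served by bank n mod 16, spreading accesses over the 16 banks.

N_BANKS = 16

def bank_of(word_address: int) -> int:
    return word_address % N_BANKS

# 32 consecutive word addresses touch every bank exactly twice:
print([bank_of(a) for a in range(32)])
```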

2.2.3 Adapteva Epiphany

The Epiphany many-core chip from Adapteva [Adapteva, 2012] makes the choice of simplicity, power efficiency, and extensibility. It comes in two sizes, featuring either 16 or 64 cores. The cores are distributed in as many computing tiles, which are arranged in a 4 × 4 or 8 × 8 2D array. In both cases, the interface of the chips allows the tiling of multiple Epiphany chips into 2D meshes which provide the equivalent of larger tile arrays. In addition to its processor, each tile contains 4 memory banks for a total of 32kbytes of program and data RAM, DMA and interrupt units, and the NoC interface. Memory is

Figure 2.14: MPPA-256 computing tile (reproduced from [de Dinechin et al., 2013])

organized under a distributed shared memory paradigm where each processor can access the RAM banks of all tiles. Processor cores are cacheless. Local tile interconnect is of full crossbar type [Epiphany, 2012].

Figure 2.15: The Epiphany architecture (reproduced from [Epiphany, 2012])

The tiles are interconnected and connected with the chip I/O through 3 NoCs labeled cMesh, rMesh, and xMesh. The cMesh and rMesh NoCs have a classical 2D mesh topology and follow a static X-first or Y-first routing algorithm and fair (round robin) arbitration. The xMesh NoC only allows a packet to travel in one direction (up, down, left, or right) without changing direction. The cMesh NoC has a high bandwidth and is used for write transactions between computing tiles. The rMesh NoC has lower bandwidth and is used for read transactions between tiles or between one tile and the exterior of the Epiphany chip. The xMesh NoC is used for write transactions involving the exterior of the Epiphany chip. The xMesh and rMesh NoCs allow an array of Epiphany chips to be connected in a mesh structure without glue logic.

2.2.4 Intel SCC

The experimental Single-chip Cloud Computer (SCC) many-core chip [Howard and al., 2011] from Intel was the result of a very different trade-off between power efficiency and computing power. In this architecture, the processor cores are feature-rich Pentium core variants, with their IA-32 instruction set slightly enhanced to improve software control over cache coherency. The difficulty is that of providing the interconnect (a NoC) and the power management mechanisms (of dynamic voltage and frequency scaling type) allowing a less power-hungry implementation.

Figure 2.16: The SCC full-chip and tile architecture (reproduced from [Howard and al., 2011])

Each SCC chip integrates 48 IA-32 cores, and the innovations in power management allow it to attain power consumptions from 125W down to as low as 25W. The 48 cores are arranged in a 6 × 4 on-die mesh of tiles with two cores per tile. The L1 instruction and data caches of each core have been upsized to 16KB, and support 4-way set associativity and both write-through and write-back modes. As pictured in Fig. 2.16, each core also has a unified L2 cache of 256kbytes. At the periphery of the SCC chip, four DDR3 memory channels connected to the NoC provide access to up to 64 GB of system memory. Additionally, an 8- bidirectional high speed I/O interface is used for all off-die communication.

Communication between computing cores is realized through an approach that mixes shared memory and message passing mechanisms. The basic communication mechanism follows a traditional shared memory model where all communications are done through reads and writes of the main memory. However, communications of less than 16 kbytes can be transmitted directly from the L1 cache of the sender processor core to the L1 cache of the destination processor. To limit complexity and power consumption, the SCC removes hardware cache/memory coherency mechanisms, and in particular hardware-managed coherency traffic over the NoC. Instead, the user must ensure coherency management using different synchronization strategies. The NoC has a 2D torus topology with bidirectional links between routers. It uses static X-first routing and a virtual cut-through switching technique. The NoC routers feature 8 virtual channels used to implement 8 priority levels among packets traversing the NoC. However, these priority levels are not user-defined. Instead, their allocation follows a scheme [Tamir and Chi, 1993] meant to improve communication fairness at chip level with respect to the more traditional router-level round robin arbitration.

2.2.5 ST Microelectronics STHORM

STHORM [Melpignano et al., 2012], initially called “Platform 2012” or P2012, was not designed as a many-core chip, but rather as a platform for the design of embedded many-cores for multi-media applications. It features a tiled organization, but the contents and size of each tile can be varied depending on the design needs. The platform was designed by ST Microelectronics and the CEA. To improve power efficiency and facilitate power management, tiles are connected via a fully asynchronous NoC called ANoC, designed by the CEA-Léti [Thonnart et al., 2010]. The NoC uses a source routing algorithm and provides QoS control under the form of 2 packet priority levels. A typical computing tile, as used in the test chip produced by ST Microelectronics, consists of 17 processing cores (of which 1 is dedicated to resource allocation/scheduling), several memory banks, power management, DMA, and synchronization units, and a special unit with support for data-flow programming.

2.2.6 TSAR

TSAR (Tera Scale ARchitecture) [TSAR, 2008] is an academic MPPA architecture designed by a team led by A. Greiner. In its design, the main objectives were scalability, efficiency, and facility of programming. By comparison with a chip like the Kalray MPPA, where the need for temporal predictability requires that the memory hierarchy is simple and exposed to the programmer and that no cache coherency is used, TSAR takes the opposite choices. It uses a complex memory hierarchy, but hides it as much as possible and implements the cache coherency protocols in hardware. TSAR integrates up to 1024 computing tiles interconnected by the DSPIN NoC presented in the previous section. Each computing tile contains up to 4 processor cores, each with its own program and data L1 caches. The tiles contain no local RAM banks. Instead, each tile contains one memory cache unit that acts as an L2 cache with a cache coherency manager. The global address space is partitioned between the tiles, each tile being assigned a range in the global space. The memory cache of a tile will only cache addresses in this range. Requests from the local processors for addresses not in this range are routed through the NoCs towards the tile that owns the address. Requests for addresses in the range of the tile pass through the memory cache and (in case of a miss) are routed through another level of cache (L3) towards the main memory through a NoC, called the RAM network, which has a specific tree topology. Inter-tile traffic is transferred through 2 other NoCs of DSPIN type, one dedicated to memory access traffic, and the other to cache coherency traffic. Application mapping onto TSAR is managed by an operating system that was developed especially for TSAR: ALMOS [Almaless and Wajsbürt, 2012], for Advanced Locality Management Operating System. ALMOS aims at hiding as much architectural detail as possible from user applications, while preserving efficiency. To this end, it uses a client/server scheduler design allowing the kernel to offer scalable inter-thread synchronization mechanisms. Moreover, it implements a kernel-level affinity technique named Auto-Next-Touch allowing the kernel to transparently and automatically migrate physical memory pages in order to enforce the locality of threads’ memory accesses.
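A sketch of this address-space partitioning follows; the address width and the equal-slice partition are assumptions made for illustration, not the actual TSAR memory map.

```python
# Sketch of TSAR-style partitioning of the global address space among tiles:
# each tile's memory cache only caches its own range; requests for other
# addresses are routed over the NoC to the owning tile. Sizes are assumed.

N_TILES = 1024
RANGE_SIZE = 2**32 // N_TILES  # each tile owns an equal slice (assumed)

def owner_tile(address: int) -> int:
    """Tile whose memory cache is responsible for `address`."""
    return address // RANGE_SIZE

local = owner_tile(0x0040_0000)    # cached locally if this tile owns it
remote = owner_tile(0xFFF0_0000)   # otherwise forwarded over the NoC
print(local, remote)               # -> 1 1023
```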

2.2.7 Academic MPSoC architectures with TDM-based NoC arbitration

Various research projects such as T-Crest, ACROSS, or CompSoC have recently proposed NoC and many-core architectures with strong support for real-time embedded implementation. Most of them [Brandner and Schoeberl, 2012, Salloum et al., 2013, Goossens et al., 2012] rely on time division multiplexing (TDM) NoC arbitration, as described in Section 2.1.4.2.

2.3 Static application mapping

In the previous sections we have provided a comprehensive classification of existing work on NoC and MPPA design from a hardware architecture point of view. This was made possible by the fact that relatively few teams have endeavored to fully define such hardware architectures, and that among this set of architectures we could identify a smaller representative subset. But such a comprehensive presentation is no longer possible for mapping techniques. Indeed, in the past few years virtually every mapping (allocation and scheduling) technique proposed in at least 3 research fields (compilation, real-time scheduling, parallel programming) has been adapted for multi-core or many-core architectural targets. We therefore needed to limit the scope of our state-of-the-art presentation. We chose to include in it:

• A detailed presentation of works that have been direct inspiration sources in our developments.

• A brief presentation of other mapping approaches where all the elements of the flow are tailored to fit a given scheduling paradigm (just like we do).

We have already explained in Section 2.1.3.3 that our work aims at reaching the best possible timing precision and predictability in the mapping of parallelized versions of embedded control applications. Such applications are best described as dependent task systems, and best modeled using deterministic data-flow specification languages. Two large classes of such data-flow formalisms exist:

• The dataflow synchronous languages, such as Scade/Lustre/Heptagon [Heptagon, 2013, Halbwachs et al., 1991], Signal [Guernic et al., 2003], or SynDEx [Grandpierre and Sorel, 2003].

• The synchronous dataflow (SDF, CSDF) formalisms [Lee and Messerschmitt, 1987, Parks et al., 1995].

For both classes, extensive previous work covers their mapping onto multi-processor architectures [Grandpierre and Sorel, 2003, Fohler and Ramamritham, 1995], and much of this work has concentrated on defining off-line mapping techniques. Indeed, these formalisms facilitate the programming of deterministic and regular applications, which in turn facilitates the construction of potentially optimal scheduling tables defining the allocation and real-time scheduling of the various computations and communications onto the hardware resources. This thesis has two objectives:

• To modify the hardware platform so that it provides better support to the implementation of static computation and communication schedules.

• To define novel algorithms for the construction of static computation and communication schedules for the new architectures.

In doing this, our work has drawn significant influence from two lines of existing work: 2.3. STATIC APPLICATION MAPPING 49

• Off-line distributed real-time mapping (allocation and scheduling) of data-flow applications, and in particular work on the AAA/SynDEx methodology [Caspi et al., 2003, Grandpierre and Sorel, 2003, Potop-Butucaru et al., 2009].

• Compilation for MIMD microprocessor architectures such as the VLIW processors [Fisher, 1983] or the scalar operand networks presented in Section 2.1.4.5 [Lee et al., 1998, Amarasinghe et al., 2005].

2.3.1 Off-line real-time multi-processor scheduling

Our work is closely related to classical results on the off-line real-time mapping of dependent task systems onto multiprocessor and distributed architectures [Eles et al., 2000, Fohler and Ramamritham, 1995, Xu, 1993, Grandpierre and Sorel, 2003]. The objective is the same as in our case: the synthesis of optimized scheduling tables defining time-triggered execution patterns. But there are also significant differences. As explained in the previous section, most NoCs rely on wormhole routing, which requires synchronized reservation of the resources along the communication paths. By comparison, the cited papers either use a store-and-forward routing paradigm [Fohler and Ramamritham, 1995, Grandpierre and Sorel, 2003, Eles et al., 2000] that is inapplicable to NoCs, or simply do not model the communication media [Xu, 1993]. A second difference concerns the complexity of the architecture description. MPPAs have more computation and communication resources than the typical distributed architectures considered in classical real-time scheduling. Moreover, resource characterization has clock cycle precision. To scale up without losing timing precision, we need to employ scheduling heuristics of very low computational complexity, avoiding the use of backtracking [Fohler and Ramamritham, 1995], but taking advantage of low-level architectural detail concerning the NoC. A third difference concerns fine-grain resource allocation and scheduling operations, such as the allocation of data variables into memory banks or the scheduling of DMA commands onto the processor cores. These operations are often overlooked in classical distributed scheduling because they are usually delegated to operating systems. But an OS introduces timing overheads that are significant at the precision level of our algorithms (hundreds or thousands of clock cycles), and which can have a major impact on execution durations on an MPPA platform. This is why we need to explicitly consider them at off-line mapping time.

2.3.1.1 The AAA/SynDEx methodology

SynDEx [AOSTE-INRIA, ] is a system-level CAD software based on the algorithm architecture adequation (AAA) methodology [Sorel, 1994] for rapid prototyping and optimized real-time implementation of embedded control applications onto architectures with multiple CPUs and communication lines (buses or shared RAMs). It has been designed and developed in the INRIA Paris-Rocquencourt Research Center, France, by a team led by Yves Sorel. Fig. 2.17 describes the flow of the AAA methodology and of the SynDEx tool. SynDEx takes as input a functional specification, an architecture specification, and some non-functional (allocation, timing) requirements. It performs two tasks:

• The off-line real-time scheduling of the functional specification on the architecture under the given non-functional constraints.

• The generation of correct-by-construction C code implementing the computed off-line schedule.

The functional specification is written in a high-level synchronous data flow language similar to Scade/Lustre [Halbwachs et al., 1991], and importers exist for other synchronous languages such as Signal [Guernic et al., 2003] or Scicos [Campbell et al., 2006].6 The specification of the architecture (hardware and basic software) is provided under the form of a bipartite graph whose nodes are the processing elements (CPUs, accelerators) and the communication media (buses, shared RAM banks). The arcs of the graph describe the connections of processing elements to the communication media. The non-functional constraints allow the specification of worst-case durations for the various computation and communication operations over the various processors and communication media, as well as allocation constraints. Starting from these 3 inputs, SynDEx produces a reservation table describing the real-time execution of one cycle of the synchronous functional specification. To do so, it uses greedy heuristics that perform at the same time the allocation and scheduling of the various computations and communications. The various computation and communication operations are mapped one by one, and once a decision is taken for one operation it is never changed (there is no backtracking). For each computation and communication operation, the resulting reservation table specifies the resource (processor or bus) that will perform it, the starting date, and the time duration starting from the start date during which the resource is (exclusively) dedicated to the operation. To cover conditional behaviors, each reservation can also have an execution condition.7 The same resource can be allocated at the same date to operations with exclusive execution conditions. In a second phase, the reservation table is translated into executable code where the

⁶ We do not cover in this document the multi-periodic extensions of SynDEx's specification language.
⁷ Also called clock in the jargon of synchronous languages.

Figure 2.17: The global flow of the AAA methodology and the SynDEx software: the functional specification, the architecture specification, and the non-functional specification (timing, allocations) feed a static real-time distributed scheduling step, whose static schedule (conditional reservation table) is turned by code generation, together with libraries, into executable code.

primitives of the execution platform are used to implement the needed timing and reservation mechanisms.

The work on the AAA methodology and SynDEx has been extended in various ways. Of particular interest for this thesis are previous works on improving the handling of execution conditions. Indeed, SynDEx requires, at both specification and algorithmic level, that all communications between a component and its environment take place at the same speed (each input and output is read or written at each activation of the component). This may result in redundant communications slowing down an application.

To overcome this limitation, a new formal model, called Clocked Graphs (CG), has been proposed in [Potop-Butucaru et al., 2009]. On one hand, this format allows the faithful representation of specifications written in high-level synchronous languages such as Scade/Lustre, Signal, or discrete Scicos (in particular, without the constraints imposed by SynDEx). On the other hand, CG specifications are close enough to the target machine code to allow fine manipulations such as scheduling, allocation, and optimization (e.g. software pipelining [Carle and Potop-Butucaru, 2011]). We give a detailed description of this formalism and the associated scheduling algorithms in Section 5.1.

Our work in this thesis fits entirely within the AAA methodology (and the closely-related platform-based design paradigm [Sangiovanni-Vincentelli and Martin, 2001]). It directly builds upon the toolset based on the Clocked Graphs formalism. Compared with previous work on the AAA methodology [Grandpierre and Sorel, 2003], our work shares the objective of real-time mapping of dataflow synchronous specifications onto multi-processor platforms and the use of greedy heuristics without backtracking. The originality of our approach is given by:

• The handling of complex architectural components, such as the NoC and the multi-bank RAMs, which require complex manipulations on a large number of resources. Scaling up to deal with such large numbers of resources is only possible by exploiting the homogeneity and regularity of MPPA architectures.

• The use of code optimizations, such as software pipelining, in the presence of conditional execution, as explained in [Carle and Potop-Butucaru, 2011].

• The use of (pre-computed) preemption for NoC communications.

2.3.2 The StreamIt compiler for the MIT RAW architecture

The second main source of inspiration in the definition of our mapping technique is previous work on mapping StreamIt programs onto the MIT RAW architecture presented above [Waingold et al., 1997]. As in AAA/SynDEx, StreamIt mapping is realized using table-based static scheduling techniques. But there are also significant differences. First of all, the StreamIt input language belongs to a different class of data-flow formalisms. Its execution model is similar to that of synchronous data flow (SDF) [Goossens et al., 2012], but:

• It includes features facilitating the programming of real-life streaming applications (e.g. hierarchy, peek operations, etc.).

• The topology of the data-flow graphs is significantly restricted. More precisely, the construction of the graph is done through hierarchic composition by employing the 3 elementary combinators pictured in Fig. 2.18: pipeline, splitjoin, feedbackloop.

Along with constraints on the form of elementary data-flow blocks (known as filters), these restrictions allow the real-life programming of complex applications and at the same time facilitate their compilation into executable code.

Figure 2.18: StreamIt data-flow combinators (cf. [Gordon, 2010]): (a) a pipeline, (b) a splitjoin, (c) a feedbackloop.

The flow of the StreamIt compiler is pictured in Fig. 2.19. It involves 3 separate steps:

• Coarsening the granularity of the data-flow graph. To reduce the complexity of subsequent transformations, state-less data-flow blocks are fused together into larger blocks.

Figure 2.19: The flow of the StreamIt compiler (reproduced from the PhD presentation of [Gordon, 2010])

• Data parallelization: In this phase, data-flow blocks are data parallelized using a so-called judicious fission algorithm [Gordon et al., 2006] that uses load balancing arguments in order to decide how much data parallelism is introduced.

• Coarse-Grained software pipelining: The actual mapping of the coarsened and data-parallelized graph onto the RAW processor is done using classical software pipelining techniques.

The compilation flow proposed in this thesis shares with that of StreamIt the use of scheduling tables as an internal compiler representation and the use of optimizations such as software pipelining to improve resource use. There are, however, significant differences. First, our objective is to provide worst-case hard real-time guarantees, while compilers usually aim for average-case performance. For instance, the StreamIt compiler uses incomplete timing information coming from simulations to guide the mapping process, not taking into account timing interferences due to the mapping itself. By comparison, our tool uses safe WCET/WCCT bounds and maintains throughout the mapping process a strict control of the timing interferences due to concurrent memory or NoC accesses.

The second difference is the optimization objective. Classical software pipelining algorithms attempt to optimize computation throughput, thus maximizing the use of the hardware resources. But for embedded control applications the objective is most often to optimize not sheer speed, but the latency between input acquisition and the corresponding output production. Our algorithms are therefore tailored to optimize both latency and throughput, with latency taking priority.

Finally, our mapping tool can exploit conditional execution to improve resource allocation, whereas StreamIt cannot. Our mapping problem is therefore harder, the only mitigating factor being the coarser grain of control of our MPPA architecture: in RAW, all inter-processor communications are done with scalar granularity. When the statically-scheduled NoC is used (to ensure temporal predictability), one routing statement must be executed for each scalar value (word) traversing a tile. By comparison, our MPPA architecture uses packetized communications, and each tile contains multiple CPUs, functioning as a classical symmetric multiprocessor (SMP) machine (but with a memory subsystem offering better support for timing analysis).

2.3.3 Compilation of the ΣC language for the Kalray MPPA256 platform

The ΣC [Goubier et al., 2011] programming language has been jointly developed by CEA LIST and Kalray as a way to program Kalray's MPPA256. ΣC belongs to the same class of data-flow formalisms as StreamIt, but is far more expressive:

• The formal model of ΣC is an extension of cyclo-static data-flow networks (CSDF), which is more expressive than StreamIt's SDF.

• ΣC imposes no constraint on the topology of the data-flow graph. The authors of the language also mention some limited amount of data-dependent control, but it is unclear from existing papers what can be expressed and whether knowledge of the data-dependent control conditions can be used to improve implementations.

Nevertheless, ΣC provides a very rich development formalism while retaining the decidability of problems such as the absence of deadlocks and the implementability in bounded memory.

The compilation of a ΣC program starts, like the compilation of StreamIt, with a series of transformations aimed at adapting the parallelism grain of the application to that of the execution platform (in this case the MPPA256 chip). This first phase is called parallelism reduction. On the resulting CSDF specification, a series of 4 scheduling and allocation steps is then applied. The first one uses classical CSDF scheduling algorithms to determine minimal buffer sizes (associated to the data-flow arcs) ensuring the absence of deadlocks [Aubry et al., 2013]. The next step refines these buffer size bounds by taking into account required throughput constraints. The third step performs the mapping of the data-flow blocks to the computing tiles of the MPPA256. Mapping is done using affinity-driven algorithms in such a way as to ensure that no tile is assigned more computations than it can perform. The final allocation step performs communication routing, ensuring that no link is loaded at more than 100%. This phase defines the virtual channels needed to configure the MPPA256 NoC, and their bandwidth, needed to configure the bandwidth limiters.

Note that the scheduling phases of the compilation process are not aimed at producing static scheduling patterns, but rather at dimensioning the memory allocation to provide liveness guarantees and improve the throughput. It is important to note that mapping on MPPA256 platforms is not fully static. More precisely, allocation is static (each task is assigned its tile and each communication its route), but scheduling is not: inside a tile, a task instance can be placed on one processor or another, depending on its execution context. Thus, the output of the scheduling process is the partial order of tasks that must be cyclically executed on each tile. The final compilation stage is the so-called link edition (linking), which generates the executable code.

In conclusion, even though the MPPA256 architecture provides significant support for real-time implementation, the compilation process does not take advantage of it to compute or enforce application-level timing guarantees. The focus is less on scheduling and more on allocation (tasks on tiles, communications on the NoC, I/O buffers on the memory banks) in order to improve the throughput of the application.

2.3.4 Other mapping approaches

The idea of organizing a development environment and flow around a scheduling paradigm is not new. We already saw that static table-based scheduling is the basis of the RAW/StreamIt approach. We know of two other attempts.

The first one is the CompSoC platform [Goossens et al., 2012], which relies on a compositional scheduling and timing analysis approach where applications are assigned latency and throughput budgets on the computation and communication resources (such an approach can be generalized towards the use of full-fledged real-time calculus, as in [Bacivarov et al., 2013]). Respect of these budgets is enforced using time division multiplexing (TDM) mechanisms on the various resources, such as the NoC, but the fine-grain synchronization between these TDM mechanisms is neither required nor used during timing analysis (only the latency/throughput budgets are used). By comparison, the mapping approach proposed in this thesis allows the tight synchronization of computation and communication schedules, which improves timing precision and guaranteed performance, but requires a more static execution model than CompSoC.

The second one is based on the use of a priority-preemptive scheduling paradigm [Shi and Burns, 2010]. However, its target application class is very different from ours. Indeed, the cited paper considers the case of independent tasks, whereas our main focus is on dependent task systems.

More generally, our work is related to all previous work on application mapping onto many-core architectures, be it at application level [Bebelis et al., 2013, Genius et al., 2013, Zhai et al., 2013, Bhattacharyya et al., 2013] or addressing just one aspect, such as the allocation or the configuration of TDM tables [Lu and Jantsch, 2007]. In all these cases, the main difference with respect to our approach is given by the integrated approach we use and by the statically scheduled NoC communications, which ensure high timing precision and efficiency for the chosen class of applications.

Chapter 3

Tiled MPPA architectures in SoCLib

Contents

3.1 MPPA structure  57
3.2 Memory organization  59
3.2.1 Distributed shared memory  59
3.2.2 Address structure  60
3.2.3 Global memory organization  61
3.2.4 Tile memory organization  62
3.2.5 Hardware/software interface  63
3.3 Improving the timing predictability of the SoCLib tile  64
3.4 SystemC simulation and compilation support  68

This chapter reviews the MPPA architecture we used as the basis for the developments of this thesis, as well as the modifications we brought to its computing tiles to improve timing predictability. We provide here the fine technical detail of the platform that will be needed in the following sections. The complexity of the platform, apparent in this chapter, explains in part why the hardware design, hardware configuration, and software mapping problems are so complex.

3.1 MPPA structure

We are using a tiled many-core architecture built using the components of the SoCLib hardware library [LIP6, 2011]. The central element of this architecture is the DSPIN NoC [Panades, 2008], which links together the computing tiles. In our MPPA, all inter-tile communication and synchronization are realized through the DSPIN NoC. In particular, no complex memory hierarchy exists to allow communication and synchronization

through shared caches. Thus, we avoid the complex problem of ensuring timing predictability in the presence of such shared caches [Wilhelm and Reineke, 2012], [Hardy and Puaut, 2008].

As pictured in Fig. 3.1, SoCLib allows the construction of many-core architectures where the computing tiles are organized in a 2D array and interconnected by two DSPIN NoCs. As in all MPPA architectures, the tiles are largely identical. As pictured in Fig. 3.2, each tile may contain:

• A set of processor cores with the associated instruction and data L1 caches. SoCLib provides a variety of CPU cores, including MIPS32, PPC405, ARM7 and ARM9.

• RAM and ROM memory banks.

• DMA controllers.

• Programmable interrupt units, which allow both the generation of interrupts on command and the programming of timers.

• I/O units, hardware locks units, etc.

Figure 3.1: A typical DSPIN NoC-based MPPA architecture. Dark rectangles are the routers of the command sub-network. Light rectangles are the routers of the response network.

All tile components are connected to a local interconnect, which is linked to the DSPIN NoCs through a Network Interface Controller (NIC). SoCLib provides a choice of several topologies for the local interconnect, among them the full crossbar and the ring.

In the original MPSoC architecture proposed by SoCLib, each tile contains one RAM bank and at most 4 CPU cores, as experiments showed that placing more than 4 CPUs per tile results in no performance gain due to memory bandwidth saturation.

Figure 3.2: The computing tile structure in the original SoCLib many-core: command and response routers, single-bank RAM, interrupt unit, DMA, optional I/O, and CPUs with instruction and data caches (PLRU, write-back), all connected to the local interconnect and, through the NIC, to the NoC.

3.2 Memory organization

3.2.1 Distributed shared memory

SoCLib-based tiled many-cores follow a distributed shared memory model where all memory banks are assigned addresses in a single address space. To this shared memory space are also mapped the programming registers of all peripheral devices, allowing a uniform programming paradigm and the use of general-purpose development tools (the GNU compiler suite, in our case).

The implementation of the distributed shared memory is based on the VCI/OCP protocol [Alliance, 2001]. This protocol follows a master/slave model. Masters are called VCI initiators, and slaves are called VCI targets. A CPU is a typical VCI initiator, while a RAM is a typical VCI target. Some components act as both VCI initiators and VCI targets. For instance, a DMA acts as a target to let the CPU write its programming registers, but acts as an initiator when reading data from one RAM bank and writing it to another.

The VCI/OCP protocol organizes on-chip communications into transactions. A transaction takes place between a VCI initiator and a VCI target, which exchange packets through one or more layers of interconnect, according to the locations of the initiator and target. A transaction consists of a command packet issued by the VCI initiator and going all the way to the target, followed by the corresponding response packet emitted by the VCI target. The commands and the responses are transmitted through distinct input and output ports and through two distinct physical networks (the command sub-network and the response sub-network). This helps avoid deadlock conditions between commands and responses, and allows transaction pipelining. It explains the presence of two DSPIN NoCs in our architecture. The local interconnect of each tile is also formed of two separate networks.

3.2.2 Address structure

Routing of commands along the two-level interconnect formed of the command DSPIN NoC and the command local interconnect is done according to static routing mechanisms. The same is true for the routing of responses along the response interconnect. To this end, all VCI targets and initiators are assigned fixed addresses. One component may have multiple VCI initiator and target addresses; for each address it must have a separate interface with the other components.

To facilitate the definition of the static routing patterns and reduce the complexity of the routing hardware, the VCI target and initiator addresses used in the SoCLib-based MPPA architecture have a particular structure, where the most significant bits identify the destination tile inside the NoC (through its Y and X coordinates) and the least significant bits form the so-called local address identifying the destination VCI target or initiator inside the tile. The numbers of bits dedicated to the encoding of the Y, X, and local address for both the command and response networks are parameters of the SoCLib platform which must be fixed during configuration.

ADDRESS := Y_INDEX | X_INDEX | LOCAL_INDEX | OFFSET

Figure 3.3: Memory address decoding. The Y and X indices together form the global index.

Each hardware component acting as a VCI target is assigned a contiguous block of memory addresses of the global address space. All memory accesses targeting addresses in this block will be routed to and handled by the associated VCI target component. To simplify the association between VCI target addresses and memory addresses, it is required that the most significant bits of a memory address are the VCI address. Therefore, memory addresses have the structure of Fig. 3.3.

Addresses are encoded on 32 bits. The most significant 8 bits are dedicated to the encoding of the Y and X coordinates of the target tile (4 bits for each coordinate). This should allow the construction of NoCs of at most 16x16 tiles, but in practice only 16x15 tiles can be used due to reserved address blocks (defined next). Both the address size and the number of bits allocated to tile coordinates are set by configuration, but they will not change throughout this thesis. The Y and X coordinates are known together as the global index, and are decoded and used by the NoC routers and NICs.

The number of bits allocated to the local target address depends on the number of VCI targets, and in particular memory banks, of each tile, which itself depends on configuration, as explained below. The local target address, also known as the local index, is decoded by the local interconnect of the target tile to route the command packet to the proper target. If T is the number of target components in the tile, then the minimal number of bits that must be assigned to the local index is ⌈log2(T)⌉.

The remaining bits of the address are used to encode the offset inside the memory block associated to the VCI target. This limits the number of memory addresses inside a VCI target to 2^O, where O is the number of bits used in addresses to encode the offset.
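To make the field layout of Fig. 3.3 concrete, here is a minimal decoding sketch in C. The 4+4 bits for Y and X are from the text above; the local-index width of 4 bits is only an illustrative configuration value, not the platform's actual setting.

#include <stdint.h>

/* Illustrative field widths: Y and X are 4 bits each (from the text);
   LOCAL_BITS is a configuration parameter (ceil(log2(T)) for T targets
   per tile); the value 4 below is just an example. */
#define LOCAL_BITS   4
#define OFFSET_BITS  (32 - 4 - 4 - LOCAL_BITS)

static inline uint32_t addr_y(uint32_t a)      { return a >> 28; }
static inline uint32_t addr_x(uint32_t a)      { return (a >> 24) & 0xFu; }
static inline uint32_t addr_local(uint32_t a)  { return (a >> OFFSET_BITS) & ((1u << LOCAL_BITS) - 1); }
static inline uint32_t addr_offset(uint32_t a) { return a & ((1u << OFFSET_BITS) - 1); }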

3.2.3 Global memory organization

The structure of memory addresses, defined above, implies that the global address space is divided into blocks of size 2^N, where N is the number of bits allocated to the encoding of the local address and offset. For each Y and X ranging over 0..F, one such block exists, starting at address 0xYX000000 and terminating at address 0xYXFFFFFF.

As pictured in Fig. 3.4 (left), most address blocks are uniquely assigned to the tiles according to their Y and X coordinates. Any of these address blocks can be freely accessed from any tile of the MPPA. The only exception to this rule is the top-most memory block, which starts at address 0xFF000000. Addresses in this block are used in each tile for private addresses that are never accessed through the NoC. When an access to this memory range is issued in a tile, it never leaves the tile. Such addresses can be assigned to the stack RAM banks, program RAM/ROM, DMA programming registers, etc. The use of private addresses significantly reduces the number of unique VCI identifiers, and thus the footprint of the VCI address encoding.¹ It also helps in ensuring memory protection between different tiles.

The drawback of reserving one address block to private VCI targets is that no tile can

¹ A similar problem exists in telecommunication networks, where it led to the use of loopback network interfaces.

Figure 3.4: Memory organization in the modified MPPA platform (tile structure described in Section 3.3). Global address space segmentation is on the left: the per-tile blocks from BASE_GLOBAL = 0x00 upwards, the GLOCKS segment, and the private block at BASE_LOCAL = 0xFF. Tile address space segmentation (public and private) is on the right: cacheable (C) segments such as the SSEG data banks, TEXT, EXCEP, RESET and STACKS, and non-cacheable (NC) segments such as DMA, TTY, NOCCTR and the local locks LLOCK_0 to LLOCK_(n-1).

be assigned the Y/X coordinates 0xFF. This is why in this thesis we consider MPSoCs of size 16x15, and not 16x16, and why the global address space of Fig. 3.4 contains an unused space, which corresponds to unused Y/X coordinates of the form 0xFX, with X ranging from 0x0 to 0xE.
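For illustration, the build_address() helper used later in Listing 4.1 could be implemented as follows under the block layout just described. This is a sketch under assumed field widths (OFFSET_BITS as in the decoding sketch of Section 3.2.2), not the actual library code.

/* Tile block base is 0xYX000000; the local index selects the VCI target
   inside the tile; the offset within the target is 0. */
static inline void *build_address(uint32_t y, uint32_t x, uint32_t local_index)
{
    return (void *)((y << 28) | (x << 24) | (local_index << OFFSET_BITS));
}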

3.2.4 Tile memory organization

The private and public memory blocks of each tile are allocated to the various VCI targets of the tile. The right side of Fig. 3.4 pictures the tile memory allocation pattern we employ throughout this thesis.

The public memory block is divided between the main data memory of the tile and the public programming registers of the hardware lock unit (such a public programming interface is needed on the lock device to allow inter-tile synchronization). To improve both performance and predictability, the main data memory is divided into multiple banks, each being assigned a separate VCI target. However, we require that the multiple banks cover a contiguous memory address space, so that they can store data structures larger than a single bank. In turn, this requires that each memory bank stores exactly 2^O octets, where O is the number of bits used in addresses to encode the offset. Thus, the desired memory size constrains the construction of the memory and VCI target addresses. The public programming registers of the lock unit are stored on a single VCI target.

The private memory block is divided between the private tile memory banks and the private programming registers of the various tile devices. One memory bank (one VCI target) stores the program and the static program data of all the CPU cores of the tile. For each processor core, one separate memory bank (one VCI target) stores the execution stack. The remaining VCI targets contain the programming registers of the various devices described in the next section. Of particular interest is the lock unit, which uses N private VCI targets, where N is the number of processors on the tile, in order to ensure that no timing interference exists between CPU accesses to the lock unit (as explained in the next section).

The MPPA processors allow both cached and uncached memory accesses. The type of access is determined by the VCI target address. More precisely, a (configurable) cacheability mask over the local index bits determines whether an access is cacheable or not.
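As a sketch of one possible reading of this mechanism, assuming one mask bit per local index value (the actual SoCLib configuration encoding may differ):

/* Bit i of the mask set means: accesses to the VCI target with local
   index i are cacheable. addr_local() is the field extractor sketched
   in Section 3.2.2. */
static inline int is_cacheable(uint32_t addr, uint32_t cacheability_mask)
{
    return (int)((cacheability_mask >> addr_local(addr)) & 1u);
}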

3.2.5 Hardware/software interface

The previous sections have discussed memory organization from a hardware point of view. But the software has another view of memory organization, and the two views (hardware and software) must be compatible.

Executable code for our platform is generated using the gcc compiler toolset, in the ELF executable format [ELF, 1995]. An ELF file defines the memory segments of the application. For specific segment types (program, read-only data, etc.) the ELF file also provides the content of the segment. On our MPPA platform, the ELF file contains non-relocatable code. Each segment of such a file is assigned a fixed memory address where the segment must be placed at execution time. Before execution can start, each segment that contains data must be loaded at the prescribed address.

Traditional compilation using gcc for an embedded single-processor platform will place all executable code in 3 segments: .text for the program code, .excep for the interrupt handling code, and .reset for the (re-)boot code. In our case, one set of 3 executable code segments is generated for each tile of the MPPA. The processors of a tile share the same program memory and program segments, but use CPU identifier tests to determine which parts of the code to execute. All 3 executable code segments of a tile are placed in the private program memory. The stack and heap of the program are set by the boot code to point to the stack memory banks associated to the various processor cores. The private data segments are also placed on these memory banks.

Enforcing the needed memory organization at compiler level is realized through the use of custom ldscript loader configuration files, which are synthesized by our platform configuration scripts. This allows the platform, at execution/simulation time, to load the segments of the ELF file into the prescribed memory banks before starting the execution on the processor cores (as specified by a mapping table included in the hardware description).

3.3 Improving the timing predictability of the SoCLib tile

We have described in Section 3.1 the structure of the original MPPA computing tile, as it was provided by SoCLib. To improve both timing predictability (needed in real-time applications) and performance, we have modified the structure of the tile as described in this section. Our modifications retain the global organization of the many-core, and in particular its distributed shared memory model, which allows programming using general-purpose tools. The modified tile is pictured in Fig. 3.5.

Figure 3.5: Modified computing tile structure of our architecture (with fair arbitration): command and response routers, lock unit, multi-bank data RAM, program RAM/ROM, DMA, optional I/O, and MIPS32 CPUs with L1 caches, all connected through a full-crossbar local interconnect and the NIC.

The memory sub-system: Our objective was to improve timing predictability by eliminating contentions. In our experiments with the original SoCLib-based many-core, the second most important source of contentions (after the NoC) is the access to the unique RAM bank of each tile. As a solution to this problem, we decided to follow the example of existing industrial many-core architectures [Benini, 2010, Harrand and Durand, 2011], and replace the single RAM bank of a tile with several (up to 32) memory banks that can be accessed independently.

To facilitate timing analysis, we separate data (including stack) and program memory. One RAM/ROM bank is used in each tile to store the program of all the CPUs of the tile. As explained above, data and stack are stored on several RAM banks. Each bank of the data RAM has a separate connection to the local interconnect. Explicit allocation of data onto the memory banks, along with the use of lock-based synchronization and the local interconnect presented below, allows the elimination of contentions due to concurrent accesses to memory banks.

Note that the use of a multi-bank data RAM also removes a significant performance bottleneck of the original architecture. Indeed, a single RAM bank can only serve 4 CPUs (placing more than 4 CPUs per tile results in no performance gain because RAM access is saturated). Having multiple RAM banks per tile removes this limitation. Our test configurations use a maximum of 16 CPU cores per tile and two data RAM banks per CPU core, for a maximum of 4 Mbytes of RAM per tile.

The local interconnect is chosen in our design so that it cannot introduce contentions due to its internal organization. Contentions can still happen, for instance, when two CPUs concurrently access the program memory. However, accesses from different sources to different targets never introduce contentions. Interconnect types allowing this are the full crossbar and the multi-stage interconnection networks [Aydi et al., 2011], such as the omega networks, the delta networks, or the related logarithmic interconnect [Kakoee, 2012]. The experiments of our work use a full crossbar interconnect.

The CPU core we use is a single-issue, in-order, pipelined implementation of the MIPS32 ISA. We did not change this choice, as it simplifies timing analysis and allows a small-area hardware implementation. However, significant work has been invested in designing a cycle-accurate model of this core inside a state-of-the-art WCET analysis tool [Puaut and Potop-Butucaru, 2013].

The caches have been significantly modified. The original design featured caches with a pseudo-LRU (PLRU) replacement policy and with a writing policy intermediate between write-through and write-back.² Furthermore, memory accesses from the data and instruction caches of a single CPU were multiplexed over a single connection to the local interconnect of the tile. Both these choices are known to complicate timing analysis and/or to reduce its precision [R. Wilhelm et al., 2008, Hardy and Puaut, 2008], and thus we revert to more conservative choices: we use the LRU replacement policy and a fully write-through policy, and we let the instruction and data caches access the local tile interconnect through separate connections. Note that the use of a write-through policy reduces the processing speed of each CPU. This is the only modification we made on the MPPA architecture that decreases processing speed.

Synchronization: To improve temporal predictability, and also speed, we do not use interrupt-based synchronization. Indeed, interrupt signaling by itself is fast, but handling an interrupt usually requires accesses to program memory, which take supplementary time. Furthermore, imprecision related to interrupt arrival dates and modifications of the cache state means that it is difficult to accurately account for interrupt-based synchronization overhead during execution time analysis.

To avoid these performance and predictability problems, we never use the interrupt units, and therefore remove them from our hardware descriptions. Instead, we include in each tile a hardware lock component which allows synchronization with very low overhead (1 non-cached local RAM access) and without modifications of the cache state.

The lock unit follows a simple request/grant protocol. A lock unit can theoretically implement 2^O locks, where O is the number of bits assigned to offset encoding. Each lock is visible under the form of a memory address, and lock operations are performed through regular (uncached) memory access operations on this address. A lock grant operation consists in writing a non-zero value at this address. A lock request operation consists in reading the address. When the read command packet produced by the CPU cache arrives at the lock unit, if the value of the lock is non-zero, then it is set to 0 and the response packet is immediately sent back to the CPU. If the lock value is 0, then the response packet is sent back only after the lock has been granted by some other CPU or DMA. This effectively blocks execution on the CPU until the lock has been granted. Multiple processors may be blocked waiting for locks at the same time, and each read transaction that is blocked will block (in the SoCLib architecture) a VCI interface.

² Consecutive writes inside a single cache line were buffered.

Therefore, each lock unit has multiple VCI interfaces: one for each CPU core and one for the NIC. Thus, it allows contention-less simultaneous access from all these directions. Given that VCI interfaces correspond to different address ranges, each lock has one address on each interface, but all these addresses share the same address offset, which is known as the lockid and which uniquely identifies the lock.
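Because the protocol is carried by plain memory accesses, the lock primitives used later in Listing 4.1 reduce to single uncached loads and stores. The following is a minimal sketch consistent with the description above, not the actual library code:

#include <stdint.h>

typedef volatile uint32_t *lock_t;

/* Request: a single uncached read; the lock unit withholds the response
   packet until the lock is granted, so the CPU blocks exactly as needed. */
static inline void lock_request(lock_t lock)
{
    (void)*lock;
}

/* Grant: write any non-zero value to the lock address. */
static inline void lock_grant(lock_t lock)
{
    *lock = 1;
}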

Buffered DMA: To minimize NoC usage during large data transfers, we perform them using direct memory access (DMA) units controlled by the CPU of the sending tile. Transferring data directly through CPU operations would mean that packet construction and sending are controlled by the CPU cache, which generates one packet for each transmitted word.

The traditional DMA unit used in the original MPPA architecture requires significant software control to determine when a DMA operation is finished so that another can start. This is either done using interrupt-based signaling, which has the drawbacks mentioned above, or through polling of the DMA registers, which requires significant CPU time and imposes significant constraints on CPU scheduling.

To avoid these problems, we use DMA units allowing the buffering of transmission commands. A CPU can send one or more DMA commands while the previous DMA operation is not yet completed. Furthermore, the DMA unit can be programmed so that it not only sends out data, but also signals the end of the transmission to the target tile by granting a lock. Thus, all inter-tile communication and synchronization can be performed by the DMA units, in parallel with the data computations of the CPUs and without requiring significant CPU time for control.

The high-level interface we use to program the new DMAs is the dma_send library routine, which has the following prototype (a usage sketch follows the parameter list):

void dma_send(SRC, DST, SIZE, LOCK_ADDR)

where:

• SRC is the address of the data to send (local tile address).

• DST is the address where data must be copied.

• SIZE is the amount of data to send (in octets).

• LOCK_ADDR is the address of a lock in the target tile. The DMA grants this lock immediately after sending the last data packet. Setting the lock address to 0xFFFFFFFF means that no lock synchronization is needed.
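For illustration, a minimal usage sketch. The tile coordinates, ID_GRAM, the buffer names, and the function name are hypothetical; the helpers are those sketched in Chapter 3.

/* Hypothetical usage: copy an image from a local buffer to tile (1,1),
   granting lock 0 of the destination tile on completion. */
void send_image_example(ImageType *src_buffer)
{
    ImageType *dst  = (ImageType *)build_address(1, 1, ID_GRAM);
    lock_t     done = build_lock_address(1, 1, 0);

    dma_send(src_buffer, dst, sizeof(ImageType), done);

    /* Fire-and-forget variant: 0xFFFFFFFF disables lock signalling. */
    dma_send(src_buffer, dst, sizeof(ImageType), (lock_t)0xFFFFFFFF);
}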

3.4 SystemC simulation and compilation support

Executing code over our DSPIN-based many-core platform requires two distinct executables:

• The cycle-accurate hardware simulator itself, compiled and linked with the Sys- temC and SoCLib libraries, using the soclib-cc command.

• The multi-threaded software that will run on the MPPA platform.

The hardware simulator is written in SystemC, and is of cycle-accurate bit-accurate (CABA) type. The strict CABA modeling rules followed in the definition of the various architecture components allow the use of the optimized simulation engine SystemCASS [Buchmann et al., 2004]. Using a simulation-based approach, as opposed to using silicon hardware, had two significant advantages: first, it facilitated our experimentation with various degrees of control over the interconnect, which implied significant changes in the hardware; second, by instrumenting the cycle-accurate hardware simulator we were able to perform precise timing measurements.

Figure 3.6: Building the MPPA simulator and executable code. The C and assembly sources (*.C, *.S, segmentation.h) are compiled with mipsel-soclib-elf-unknown-gcc/as and linked by mipsel-soclib-elf-unknown-ld under a generated ldscript into bin.soft (ELF format), while soclib-cc builds the simulator (simulator.x) from top.cpp and the SoCLib libraries.

The code executed on top of our hardware simulator is built from C sources and some assembly code for low-level device control and boot code. Compilation is done using a gcc cross-compiler (for the MIPS32 architecture), unmodified, but with automatically generated loader script files. The output of the linker is a single ELF file that contains the code and data for all the tiles of the MPPA.

We have developed a tool allowing the automatic synthesis of the corresponding SystemC models and memory mapping files from a high-level hardware specification defining:

• The number of rows and columns of tiles in the MPPA.

• The number of CPU cores in each tile.

• The amount of RAM per tile.

• The number of RAM memory banks in each tile.

• The configuration of I/O devices.

Chapter 4

Programmable NoC arbitration

Contents

4.1 The case for programmed arbitration  71
4.1.1 The principle  71
4.1.2 Target application classes  72
4.1.3 The cost of programmability  73
4.2 Programmable DSPIN  74
4.2.1 NoC router extensions  74
4.2.2 Area overhead  77
4.3 A simple example in depth  78
4.4 Case study: the FFT  84
4.4.1 FFT algorithm description  85
4.4.2 Evaluation of the slow-down due to traffic injection  88
4.4.3 Removing the slow-down through NoC programming  89

In the previous chapter we presented the original SoCLib-based MPPA architecture we use as the basis for our work, and then explained how we modified its computing tiles to improve the overall timing predictability. But the main originality of our work, on the hardware side, concerns the NoC. We explore in this chapter how the NoC architecture should offer the proper infrastructure to implement optimal communication schedules that are synchronized with the scheduling of computations on processors. Our thesis is that optimal data transfer patterns should be encoded using simple programs configuring the router nodes, each router then being programmed to play its part in the globally concerted computation and communication scheme.


We concretely support our thesis by extending the DSPIN 2D mesh NoC: we replace the fair arbitration modules of the NoC routers with micro-programmable modules. We justify our choice by its use in reducing communication time and contentions in two case studies: a simple embedded control application and an implementation of the Fast Fourier Transform (FFT).

4.1 The case for programmed arbitration

4.1.1 The principle

In the original MPPA architecture proposed by SoCLib, the use of fair arbitration inside the NoC routers ensures good utilization factors for the resources of the DSPIN NoC. However, when implementing embedded control or consumer applications, the objective is usually not to improve NoC usage, but to improve application-level characteristics such as speed, power consumption, predictability, etc.

The following example shows to what extent fair routing may slow down communications, and thus the overall application. Fig. 4.1 pictures the case where the “East” output of a DSPIN router is concurrently traversed by two bursts of data, each formed of n packets of equal length m, numbered from 1 to n. Each burst transmits a single piece of data, and processing cannot start at the destination until all packets of a burst have arrived. In the worst case, the first packets of the two bursts arrive at the router in the same clock cycle, and we assume that the current state of the arbiter leads to “Local” passing first. Then, the fair routing policy of DSPIN results in the interleaved transmission of the “Local” burst and the “West” burst, having respectively lengths (2n − 1)·m and 2n·m. The passing order can be represented by the regular expression (WL)^n.

Figure 4.1: Round-robin communication interleaving: (WL)^n.

Our objective is to allow the better packet interleaving of Fig. 4.2. We assume there that the “West” burst is needed by a computation located on the critical path of the application running on the NoC, so that accelerating the burst transfer will result in a faster

Figure 4.2: Programmed communication interleaving: W^n L^n.

application. Therefore, we let the entire “West” burst pass before the entire “Local” burst, even when “Local” arrives first. The transmission durations will then be respectively 2n·m and n·m. The passing order is represented by the regular expression W^n L^n inside the multiplexer. This expression is an abstract view of the router program specifying that n packets from the “West” direction should pass before the n packets from the “Local” direction of the NoC router.
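The gain can be summarized as follows, counting m cycles per packet at the arbiter and taking t = 0 when both bursts arrive (a simplified model that ignores per-packet overheads):

fair (WL)^n:         Local completes at (2n − 1)·m,  West completes at 2n·m
programmed W^n L^n:  West completes at n·m,          Local completes at 2n·m

The critical West burst thus completes n·m cycles earlier, while the Local burst is delayed by only m cycles with respect to the fair schedule.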

4.1.2 Target application classes

We are therefore advocating the application of a static scheduling principle to the arbitration of NoC communications. This should be done under a global optimization approach, in conjunction with static scheduling of the computations on the CPUs. Two types of applications can benefit from such an implementation approach:

• Signal and image processing algorithms, which often have a very regular control structure, and for which static scheduling yields optimal results. Such algorithms, like the FFT, form a significant part of embedded control software, in terms of computational resource use.

• Applications with high functional and temporal determinism requirements, like those used in safety- or mission-critical embedded systems, where predictability is imposed using specific language or OS constructs [Henzinger and Kirsch, 2007, Lickly et al., 2008, AUTOSAR, 2009, ARINC653, 2005].

The term “application” should be considered here in a broad sense. Thus, the global optimization principle can be applied at the level of a sub-system, localized in space and/or time. Improving the properties of the sub-system indirectly improves, in many cases, the properties of the global system.

Note that in our simple example, the optimal communication schedule (i.e. letting the West burst pass before the Local one) can also be obtained using more classical priority-based arbitration mechanisms. However, this is no longer true for more complex applications and architectures.¹ Our focus on “regular” applications justifies our choice of static scheduling and NoC programming mechanisms. For larger classes of applications, other types of arbitration mentioned in Chapter 2 may provide better results.

4.1.3 The cost of programmability

As we shall see in the examples of the next sections, the static ordering of packets at router multiplexers allows speed gains and a balanced use of NoC resources, by precisely allocating free time slots on the NoC to in-transit packets. It also allows for the construction of applications with very good timing predictability and enhanced determinism.

However, these gains come at a price. One part of this price is the need for programmable NoC routers, described in the next section, which increase the silicon surface of the NoC.

The second part is the need for temporal predictability. Indeed, the router programs are computed based on the expected execution order of the various operations (computations and communications). In turn, this order depends on operation durations, and better precision in computing these durations results in better routing programs. Therefore, programming NoC packet orders should only be done on architectures where all other architectural elements (i.e. the MPPA tiles) provide strong support for ensuring temporal predictability, as described in the previous chapter.

The third and possibly most important part of this price is the need for automation: the complexity of many-cores and of the applications running on them is such that allocation and scheduling, including the construction of the router programs, must be largely automated.

¹ Proof sketch: optimal (static) scheduling of such applications may require scheduling algorithms that are not of as-soon-as-possible (ASAP) type. However, priority-based scheduling is a form of ASAP scheduling, hence not optimal in the general case.

4.2 Programmable DSPIN

To evaluate the costs and benefits of programmed arbitration, we have modified the DSPIN NoC of the previously defined MPPA architecture by allowing the programming of all arbiters of the command NoC.

4.2.1 NoC router extensions

The purpose of DSPIN router programming is to fully control the arbitration between incoming packets at each of the router's multiplexers. Adding programmability to the NoC means adding a program to each router multiplexer. This is done by introducing new signals (wires) that control each multiplexer (Fig. 4.3); these signals are driven by new hardware components called router controllers, as pictured in Fig. 4.4.

Figure 4.3: The programmable DSPIN router architecture

There are 3 new signals, named VAL, PORT and ACK, which use a classical FIFO protocol to transmit the sequence of routing orders to the router multiplexer:

• The PORT signal is set by the router controller. It tells the router multiplexer from which input direction to accept a packet for transmission. This signal is set while the previous packet is still being transmitted, but does not affect that transmission (we do not allow packet transmission interruption).

• Signal VAL is set by the router controller to signal the presence of a valid value on PORT. It is reset upon reception of ACK.

• Signal ACK is used by the router multiplexer to acknowledge that the current PORT value has been taken into account. It is set for one clock cycle only when the first flit of the corresponding packet passes.

Figure 4.4: Tile structure in our architecture with programmable routers. The five router controllers (one per command-router multiplexer: N, S, E, W, L) are grouped into a unit connected to the full-crossbar local interconnect, alongside the lock unit, the multi-bank data RAM, the program RAM/ROM, the buffered DMA, the optional I/O, and the MIPS32 CPUs with LRU write-through caches.

One router controller is added to each of the router multiplexers of the command NoC. We therefore use 5 independent micro-programmable router controllers per tile, grouped together into a LocalRouterController component connected to the local interconnect of the tile to allow programming, as shown in Fig. 4.4. Each of the 5 router controllers contains:

• 8 16-bit local registers named R0 to R7, which allow a compact encoding of regular expressions through the use of counters.

• 256 32-bit words (1 Kbyte) of local memory for micro-programs.

• 2 memory-mapped registers that allow CPU control over the router controllers. One of these registers allows commuting between the programmed and fair (round-robin) arbitration policies. The other is the router controller program counter, which can be read to determine the current execution point.

Instruction        Function
loadimm reg imm    Load a 16-bit immediate value (imm) into register reg.
write port         Send a new arbitration value (through port and val) to the
                   corresponding router output. The controller is blocked until
                   reception of ack.
dec reg            Decrement the register reg.
bnz reg label      If the value of reg is not zero, jump to the target
                   instruction marked by label.
jump label         Jump to the target instruction marked by label.

Table 4.1: The router controller micro-instruction set.

The controllers have 5 micro-instructions, whose assembly language representations are presented in Table 4.1. The interaction of the tile router controllers with the router multiplexers is realized as pictured in Fig. 4.5. Each multiplexer of the command network is controlled by its own controller running a separate program. Starting and stopping a router controller is realized through its VCI interface. When the controller is stopped, the fair arbitration of DSPIN takes control. In Fig. 4.5, the arbitration program cyclically accepts 26 packets from the Local direction, then 26 packets from the West direction.

Figure 4.5: Programmed arbitration in a NoC router multiplexer. The pictured North arbiter runs the program loop { do 26 times grant(L); do 26 times grant(W); } from its 1 Kbyte program RAM, its PORT/VAL/ACK signals driving the multiplexer in place of the fair arbiter.
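Using the instruction set of Table 4.1, the pictured policy could be written as the following micro-program. The label syntax, the comment syntax, and the L/W port mnemonics are our assumptions; only the five instructions themselves are defined in the text.

start:  loadimm R0 26
locs:   write L          ; let one Local packet pass (blocks until ack)
        dec R0
        bnz R0 locs
        loadimm R0 26
wests:  write W          ; let one West packet pass
        dec R0
        bnz R0 wests
        jump start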

This specific architecture and instruction set allow an efficient (compact) encoding of routing patterns such as the one of Fig. 4.2 through the use of counters. A full example of a router program for a simple application is provided in Section 4.3 and Listing 4.2. More generally, the use of multiple registers and of general decrement-and-test statements allows a simple encoding of complex loop nests.

The instruction set proposed above has been chosen for its simplicity, and was sufficient for our tests. However, it can be improved in a variety of ways, of which we mention only 2:

• Given the way router controllers are used, instructions write, dec, and bnz can be grouped together, thus reducing program sizes and decoding time.

• Using simple packet inspection techniques allows the use of data-dependent control in router programs.

Note that we only modified the arbitration function of the DSPIN NoC, but not its routing function. This means that NoC programming can introduce deadlocks, but cannot change the function computed by the code executing on processors.

4.2.2 Area overhead

Adding programmability to the network-on-chip routers has its cost. In hardware, this cost is measured in supplementary circuit area. Given the exploratory nature of this work, we did not fully synthesize, place, and route our circuit,² so we were not able to perform an exact evaluation of the area cost of programmability.

Instead, we relied on the observation that area increases are mostly due to added memory elements, meaning that the largest penalty in our case comes from the 5 Kbytes of micro-program memory added to each tile by the 5 router controllers. We compared this area with the global memory footprint of the tile, which is mainly due to the multi-bank RAM. We let this RAM size range between values proposed in current industrial many-core architectures: 256 Kbytes [Benini, 2010] and 2 Mbytes [Harrand and Durand, 2011]. We thus determined that the area overhead due to network-on-chip programmability is under 2% of the total chip area. A second remark here is that router program memory should be accounted for as program memory, and considered in view of the application efficiency (speed, power) optimizations it enables.

² Note that synthesizable VHDL versions exist for most components of our platform, as part of the SoCLib library.
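A quick check of this bound, approximating area by memory footprint as above: 5 Kbytes / 256 Kbytes ≈ 1.95% and 5 Kbytes / 2 Mbytes ≈ 0.24%, both below the stated 2% even before the non-memory logic of the tile is counted.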

To limit the area overhead, we only add programmable multiplexers and their respective controllers on the command network part of the NoC, leaving the fair arbiters of the response network unchanged. To avoid uncontrolled contentions on the response network, all large transfers of data, represented by packet bursts, should be performed with write operations. This way, the response network only transfers 2-flit acknowledge packets, with negligible contention cost attached.

4.3 A simple example in depth

To showcase the previous developments, we consider a simplified and pipelined version of the image processing part of the platooning application for the CyCab electric car [Pradalier et al., 2005]. The CyCab application has a cyclic behavior represented by the data-flow synchronous specification of Fig. 4.6. At each cycle, the system acquires an image. An edge-detecting Sobel filter is then applied to this image. The output of the Sobel filter is used to detect the front car using a histogram search. The position of the front car is then used to correct the speed and steering of the CyCab electric car to ensure that it follows the front car at the prescribed distance. The histogram, detection, and correction function may also change the parameters of image acquisition, which explains the feedback loop of length 2.

Figure 4.6: Data-flow specification of the CyCab: image acquisition, then Sobel filter, then histogram/detection/correction, with a feedback loop through two delays (∆).

To allow parallel implementation on our MPPA platform, we start by exposing specification-level parallelism, which results in the data-flow specification of Fig. 4.7. We give a detailed description of this formalism in Section 5.1.1.1. The control of image acquisition is performed here by function F (acquisition and data transfer are realized in hardware; F only contains the code performing the configuration of the acquisition device and the synchronization). Functions S1 and S2 each compute the Sobel filter on half of the input image. On the outputs (s1 and s2) of the Sobel filters, functions H1 and H2 perform the histogram search, detection, and correction computation. The histogram search is split in two to allow computations to start before the reception of S2's result. The position of the delays is changed through retiming to facilitate allocation and scheduling, by exposing the parallelism between the image capture of one cycle and the Sobel computation of the previous cycle.

Figure 4.7: Simple data-flow specification: F produces i1 and i2, which feed S1 and S2 through delays (∆); s1 and s2 feed H1 and H2, which produce the correction p, fed back to F through a delay.

Figure 4.8: Mapping of the example in Fig. 4.7 onto a 2x2 MPPA: F and S1 on tile (0,0), S2 on tile (0,1), H1 and H2 on tile (1,1); the s1 and s2 transmissions contend on the NoC.

We implement this specification on an MPPA of size 2x2, as pictured in Fig. 4.8. To simplify the presentation, we assume that the durations of the 5 computation blocks are respectively 200, 2100, 2100, 500 and 500 clock cycles. We also assume that the transmission of either s1 or s2 uses NoC packets whose combined length is 500 flits. If we further assume that the tiles and the NoC have the same clock speed (fully synchronous), transmitting any of these data will consume only a little more than 500 cycles (with small variations due to instruction cache state and issues related to the internal state of the DMA unit). We also assume that the communication of other data requires only one packet whose length is 10 flits.

We also assume that the data acquisition must be done by F on tile (0,0) and that H1 and H2 must be executed on tile (1,1). Based on this specification, our automatic allocation, scheduling and code generation tool LoPhT, presented in Chapter 5, generates for this example the allocation of Fig. 4.8 (F and S1 on (0,0), S2 on (0,1), and H1, H2 on (1,1)), the static scheduling represented by the reservation table of Fig. 4.9, and the code of Listing 4.1.³

Figure 4.9: Reservation table for our simple application (time on the vertical axis, from 0 to 4000 cycles). Only resources that are used are pictured. CPU(x,y) is the unique CPU of tile (x,y), DMA(x,y) is its DMA unit, and the remaining resources are the NoC router multiplexers.

The reservation table of Fig. 4.9 represents the static scheduling of the operations inside one execution cycle of the data-flow specification. The infinite execution of our system is a succession of execution cycles, where a new cycle can start as soon as the data dependencies between cycles and resource availability allow it. Pipelining is possible: for instance, a new cycle can start with the execution of S1 and S2 as soon as F has completed. Each data-flow block is assigned exactly one reservation on one CPU (the reservations made for delays are not represented, for simplicity). In our simple example, each communication is assigned one reservation on each resource along its path. For instance, the communication of s1 from tile (0,0) to tile (1,1) uses the DMA of tile (0,0) and the NoC router outputs E(0,0), N(0,1), and L(1,1). Since NoC buffering resources are very limited, the reservations for one communication must be synchronized. For instance, the reservation for s1 on DMA(0,0) starts at date 2100, the reservation on E(0,0) starts at date 2100 + δ, the reservation on N(0,1) at date 2100 + 2δ, and the reservation on L(1,1) at date 2100 + 3δ, where δ is the number of cycles needed to traverse one NoC resource

³The reservation table and the code are slightly simplified for presentation reasons.
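For illustration, the reservation dates above can be reproduced with a few lines of C. This is only a sketch of the computation, not LoPhT code; the path and dates are those of the s1 example.

#include <stdio.h>

#define DELTA 3  /* cycles needed to traverse one NoC resource (delta above) */

int main(void) {
  /* Resources along the path of s1, in traversal order (from Fig. 4.9). */
  const char *path[] = { "DMA(0,0)", "E(0,0)", "N(0,1)", "L(1,1)" };
  int start = 2100;  /* date at which s1 becomes available for transmission */
  for (int k = 0; k < 4; k++)  /* reservation k starts k*delta cycles later */
    printf("%-8s reserved from date %d\n", path[k], start + k * DELTA);
  return 0;
}

Running it prints the dates 2100, 2103, 2106, and 2109 quoted above.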

void main() {
  //Global addresses of all memory objects (data items and locks).
  //Memory objects stored on Tile (1,1)
  ImageType *s1_in_1_1 = (ImageType*)(build_address(1,1,ID_GRAM));
  ImageType *s2_in_1_1 = (ImageType*)(build_address(1,1,ID_GRAM+1));
  CorrectionType *p_out = (CorrectionType*)(build_address(1,1,ID_GRAM+2));
  HistogramType *h1_out = (HistogramType*)(build_address(1,1,ID_GRAM+3));
  lock_t s1_in_1_1_lock = build_lock_address(1,1,0);
  lock_t s2_in_1_1_lock = build_lock_address(1,1,1);

  //Memory objects stored on Tile (0,1)
  ImageType *i2_in_0_1 = (ImageType*)(build_address(0,1,ID_GRAM));
  ImageType *i2_delayed_in_0_1 = (ImageType*)(build_address(0,1,ID_GRAM+1));
  ImageType *s2_out = (ImageType*)(build_address(0,1,ID_GRAM+2));
  lock_t i2_in_0_1_lock = build_lock_address(0,1,0);
  lock_t i2_delayed_in_0_1_lock = build_lock_address(0,1,1);

  //Memory objects stored on Tile (0,0)
  ImageType *i1_delayed_in_0_0 = (ImageType*)(build_address(0,0,ID_GRAM));
  CorrectionType *p_in_0_0 = (CorrectionType*)(build_address(0,0,ID_GRAM+1));
  CorrectionType *p_delayed_in_0_0 = (CorrectionType*)(build_address(0,0,ID_GRAM+2));
  ImageType *i2_out = (ImageType*)(build_address(0,0,ID_GRAM+3));
  ImageType *i1_out = (ImageType*)(build_address(0,0,ID_GRAM+4));
  ImageType *s1_out = (ImageType*)(build_address(0,0,ID_GRAM+5));
  lock_t i1_delayed_in_0_0_lock = build_lock_address(0,0,0);
  lock_t p_in_0_0_lock = build_lock_address(0,0,1);
  lock_t p_delayed_in_0_0_lock = build_lock_address(0,0,2);

  switch (cpuid) {
  case 0x00: // CODE FOR TILE (0,0)
    i1_delayed_in_0_0 = i1_init;
    lock_grant(i1_delayed_in_0_0_lock);
    p_delayed_in_0_0 = p_init;
    lock_grant(p_delayed_in_0_0_lock);
    do {
      if (TRUE) {
        lock_request(i1_delayed_in_0_0_lock);
        //execute S1
        S1(i1_delayed_in_0_0, s1_out);
      }
      //send s1 to (1,1)
      dma_send(s1_out, s1_in_1_1, s1_in_1_1_lock, sizeof(ImageType));
      if (TRUE) {
        lock_request(p_delayed_in_0_0_lock);
        //execute F
        F(p_delayed_in_0_0, i1_out, i2_out);
      }
      //send i2 to (0,1) (and signal it using the lock)
      dma_send(i2_out, i2_in_0_1, i2_in_0_1_lock, sizeof(ImageType));
      if (TRUE) {
        i1_delayed_in_0_0 = i1_out;
        lock_grant(i1_delayed_in_0_0_lock);
      }
      if (TRUE) {
        lock_request(p_in_0_0_lock);
        p_delayed_in_0_0 = p_in_0_0;
        lock_grant(p_delayed_in_0_0_lock);
      }
    } while (1);
    break;
  case 0x01: // CODE FOR TILE (0,1)
    i2_delayed_in_0_1 = i2_init;
    lock_grant(i2_delayed_in_0_1_lock);
    do {
      if (TRUE) {
        lock_request(i2_delayed_in_0_1_lock);
        //execute S2
        S2(i2_delayed_in_0_1, s2_out);
      }
      //send s2 to (1,1)
      dma_send(s2_out, s2_in_1_1, s2_in_1_1_lock, sizeof(ImageType));
      if (TRUE) {
        lock_request(i2_in_0_1_lock);
        i2_delayed_in_0_1 = i2_in_0_1;
        lock_grant(i2_delayed_in_0_1_lock);
      }
    } while (1);
    break;
  case 0x11: // CODE FOR TILE (1,1)
    do {
      if (TRUE) {
        lock_request(s1_in_1_1_lock);
        //execute H1
        H1(s1_in_1_1, h1_out);
      }
      if (TRUE) {
        lock_request(s2_in_1_1_lock);
        //execute H2
        H2(s2_in_1_1, h1_out, p_out);
      }
      //send p to (0,0)
      dma_send(p_out, p_in_0_0, p_in_0_0_lock, sizeof(CorrectionType));
    } while (1);
    break;
  default:
    do {} while (1);
    break;
  }
}

Listing 4.1: C code for our simple data-flow application.

In order to simplify code generation using gcc, the same program text is shared by all the CPUs of the system. The main function is divided into two distinct sections. The declaration section defines the addresses of all memory-mapped objects (variables and locks). These addresses are fixed by the LoPhT tool. They could be hard-coded in the function calls below as numeric constants in order to minimize the memory footprint; here, we kept them as named variables for clarity. The second part of the main function consists of a switch statement that decides which code is executed by each CPU depending on its identifier cpuid. By construction, this identifier encodes the CPU tile coordinates (y and x) and the CPU identifier within the tile. This last part is absent in our example because we only use one CPU per tile. The default branch of the switch statement provides the code for CPUs without allocated operations, such as the CPU of tile (1,0) in our example. This code is an infinite empty loop.

As explained in Chapter 3, synchronization between tiles is realized using the lock units through the lock_request and lock_grant primitives. In our example, most lock_grant calls are performed by the dma_send routines that perform all inter-tile data transfers using the DMA units. Note that unlike the scheduling table, the generated code makes no reference to time, as all needed synchronizations are performed using the locks.

The data s1 and s2, respectively produced by S1 and S2, become available for transmission over the NoC at the same date (2100). Since they must both transit through the North output of the router of tile (0,1), they exhibit the phenomenon described in Section 4.1.1. In our case, the theoretical slowdown of both transmissions amounts to 480 clock cycles, a value closely matched by simulations.
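For readers unfamiliar with such primitives, here is a minimal sketch of what a dma_send routine could look like. It is an illustration under assumptions, not the actual platform code: the helper functions dma_program_transfer, dma_enqueue_grant, and dma_wait, as well as the lock_t representation, are hypothetical names introduced for the example.

typedef volatile unsigned *lock_t;  /* assumed representation of a lock address */

/* Hypothetical low-level helpers (assumed, for illustration only): */
void dma_program_transfer(void *src, void *dst, unsigned size);
void dma_enqueue_grant(lock_t dst_lock);
void dma_wait(void);

/* Sketch of dma_send: copy a buffer to a remote tile through the DMA unit,
   then grant the destination lock so that the consumer knows the data has
   fully arrived (the lock packet travels after the data packets). */
void dma_send(void *src, void *dst, lock_t dst_lock, unsigned size) {
  dma_program_transfer(src, dst, size);  /* enqueue the data packets    */
  dma_enqueue_grant(dst_lock);           /* enqueue the lock update     */
  dma_wait();                            /* wait until the queue drains */
}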

//
// Micro-code for the output North
// of Router 0,1
//

// 26 Packets from WEST to NORTH
LOOP: LOADIMM R1 26
W0:   WRITE WEST
      DEC R1
      BNZ R1 W0

// 26 Packets from LOCAL to NORTH
      LOADIMM R1 26
L0:   WRITE LOCAL
      DEC R1
      BNZ R1 L0
      JUMP LOOP

Listing 4.2: Assembly code for the North router of cluster (0,1) of our simple application.

This slowdown can be eliminated by choosing the order in which packets are transmitted by the North output of the router of tile (0,1) (also denoted N(0,1)). To determine the needed program, we fix the maximal packet length for DMA data transmissions to 20 flits. As the transmission of s1 is on the critical path of the application, and that of s2 is not, the best throughput is ensured by the routing sequence (W26 L26)*, which allows all 26 packets from the West input (the transmission of s1) to pass before all the packets from the Local input (the transmission of s2). From each direction, the first 25 packets correspond to the dma_send call, and the 26th corresponds to the lock update. The sequence of 52 packets is repeated indefinitely, once for every computation cycle of the data-flow program. The exact controller assembly program corresponding to this routing sequence is provided in Listing 4.2. This program must be placed on the controller of the North output of the router of tile (0,1). All the other router outputs of the NoC can be left unprogrammed (under fair routing) because no contentions must be arbitrated. Note that the optimization involved no changes to the C code, which can be executed over a programmed or non-programmed NoC.
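The packet counts in this sequence can be checked with a short computation. The sketch below merely re-derives the figure of 26 packets per input from the sizes given above (500 payload flits, 20 flits per packet, plus one packet for the lock update); it is not part of the tool chain.

#include <stdio.h>

#define PAYLOAD_FLITS    500  /* combined size of the s1 (or s2) packets */
#define MAX_PACKET_FLITS  20  /* maximal DMA packet length fixed above   */

int main(void) {
  int data_packets = (PAYLOAD_FLITS + MAX_PACKET_FLITS - 1)
                     / MAX_PACKET_FLITS;        /* = 25 data packets     */
  int per_input = data_packets + 1;             /* + 1 lock packet -> 26 */
  printf("(W%d L%d)* : %d packets per cycle\n",
         per_input, per_input, 2 * per_input);  /* 52 packets in total   */
  return 0;
}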

4.4 Case study: the FFT

The previous example showed the details of our platform and presented the form of the code generated by our automatic allocation and scheduling tool LoPhT. The tool itself is fully presented in Chapter 5. Here, we instead present a different approach to code generation, where allocation and scheduling are not fully automated. To illustrate this approach, we chose the Fast Fourier Transform [Johnson and Frigo, 2008, Milder et al., 2007] because of its widespread use in signal processing and embedded control, and because its characteristics (at least in the variant we considered) raise optimization questions that go beyond the classical speed optimization criterion. We use the FFT algorithm proposed in [Bahn et al., 2008] for execution on MPPA architectures with a 2D mesh NoC interconnect. On the default DSPIN-based MPPA, this algorithm features a strong domination of computation over communication, and a clear organization into successive computation phases separated by strongly synchronizing communication phases.

As we show in Chapter 5, the FFT, considered alone or as part of a larger signal processing application with regular structure, can be efficiently mapped onto our architecture in a fully automatic fashion. However, a large MPPA usually executes several applications which are often only loosely synchronized or even fully asynchronous. In such cases, the LoPhT tool cannot be used at a global level. Each application, such as the FFT, is then allocated and scheduled separately on a subset of the MPPA tiles. When doing this, it is often necessary to allow the MPPA area dedicated to one application such as the FFT to be traversed by NoC traffic belonging to other applications. As we shall see in the case of the FFT, this external traffic may significantly slow down the application: up to 26% under realistic architectural choices if the standard (non-programmable) fair routers are used. We will also show in this section that the use of programmable routers allows us to fully remove this slowdown.

4.4.1 FFT algorithm description

We work on the first FFT algorithm proposed in [Bahn et al., 2008]. It encodes a 1D radix-2 decimation-in-time FFT. Recall that such an FFT is organized as a set of binary "butterfly" operations, as shown in Fig. 4.10. Computing the FFT on a data vector of size N = 2ⁿ, n ≥ 1, takes n × 2ⁿ⁻¹ butterfly operations (complexity O(N log N)).

We have evaluated the asymptotic cost of one "butterfly" operation by dividing the global duration of the FFT transform on a single CPU by the number of butterfly operations given above. The result, for FFT data vectors ranging in size from 2⁴ to 2¹⁶, is presented in Fig. 4.11a. The small increase (and stabilization) of the butterfly operation duration corresponds to the moment where the FFT data no longer fits inside the data cache of the CPU.
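As a quick sanity check of this operation count, here is the formula instantiated (our arithmetic, not a figure from the measurements) for the largest vector size used below:

\[ N = 2^{16},\ n = 16: \qquad n \times 2^{n-1} = 16 \times 2^{15} = 524288 \ \text{butterfly operations.} \]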

[Figure: butterfly diagram with 8 inputs (Data in 0–7) and 8 outputs (Data out 0–7); one "butterfly" exchange operation is highlighted]

Figure 4.10: An FFT of size 2³

Figure 4.11: (a) Average duration of a butterfly exchange. (b) Average duration of communications on the MPSoC platform

4.4.1.1 Mapping onto the MPPA architecture

For our parallelization work, we assume that the FFT data vector has size N = 2ⁿ, and that the number of tiles is M = 2ᵐ. The parallel FFT algorithm works by dividing the data vector into M vectors of size 2ⁿ⁻ᵐ each. Data distribution is performed by one of the tiles (detailed below). After reception of its data, each tile computes an FFT on its own data vector of size 2ⁿ⁻ᵐ. Once this local FFT is computed, each tile engages in a sequence of m stages. During each stage, the tile exchanges its data (through DMA transfers) with another tile and performs 2ⁿ⁻ᵐ butterfly operations. At the end, the partial FFT results of all tiles are centralized. A sketch of this per-tile structure is given below.

We have performed our experiments with FFTs of sizes N = 2¹⁴, 2¹⁵, 2¹⁶ on an MPPA architecture with 16 tiles organized in a 4x4 square. We simulated architectures with 1, 2, 4, 8, and 16 CPUs per tile. Data distribution and data centralization are realized by the tile of coordinates x=0, y=0 (the bottom left tile). Thus, there are 6 computation and communication stages, as pictured in Fig. 4.12 (Stage 0 is data distribution, Stage 5 is result centralization).
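The per-tile structure of this algorithm can be summarized by the following sketch. It is a schematic reconstruction from the description above, not the code of [Bahn et al., 2008]: the helper functions are hypothetical, and the XOR choice of exchange partner is an assumption about the pairing pattern.

#include <complex.h>

/* Hypothetical platform/algorithm helpers (assumed, for illustration): */
void receive_block(double complex *buf, int len);  /* Stage 0: distribution      */
void local_fft(double complex *buf, int len);      /* sequential FFT             */
void dma_exchange(int partner, double complex *buf, int len);
void butterflies(double complex *buf, int len);    /* one stage of butterflies   */
void send_block(double complex *buf, int len);     /* last stage: centralization */

/* One tile's view of the parallel FFT on N = 2^n data and M = 2^m tiles. */
void fft_tile(int my_tile, int n, int m, double complex *local) {
  int len = 1 << (n - m);              /* local vector size: 2^(n-m)        */
  receive_block(local, len);           /* receive data from tile (0,0)      */
  local_fft(local, len);               /* local FFT on 2^(n-m) points       */
  for (int s = 0; s < m; s++) {        /* m exchange-and-compute stages     */
    int partner = my_tile ^ (1 << s);  /* exchange partner (assumed)        */
    dma_exchange(partner, local, len); /* DMA transfer of the local data    */
    butterflies(local, len);           /* 2^(n-m) butterfly operations      */
  }
  send_block(local, len);              /* partial results are centralized   */
}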

We have evaluated the duration of the data transmissions, as presented in Fig. 4.11b. Asymptotically, this duration is approximately 2.69 cycles/data item (we work on double-word data), but on smaller data sets it increases significantly (unlike the computation time) due to the large DMA transfer initiation time. As mentioned before, communications between tiles are performed in bursts, meaning that the NoC resources remain largely unused. However, the bursty nature of the communications leads to significant contentions, even if the FFT is the only application running on the MPPA. These contentions occur in stages 2, 4, and 5, in the places where the communication routes of Fig. 4.12 intersect or join.

[Figure: six panels, Stage 0 to Stage 5, showing the communication routes on the 4x4 tile grid for each stage]

Figure 4.12: The 6 FFT communication stages

4.4.1.2 Traffic injection configuration

The data provided in the previous section concerns the execution of the FFT, in isolation, on a 4x4-tile MPPA. But our objective here is to allow the execution of several applications (not just the FFT) at the same time, on a single, large MPPA like the one in Fig. 4.13. Throughout this work, we assume that application mapping respects the following rule: each application is assigned a set of tiles that it can use during execution, and no tile is assigned to more than one application. We assume that the FFT is assigned a 4x4-tile area (the dark gray tiles in Fig. 4.13) on which its mapping is the one defined in the previous section.

To evaluate the worst-case impact of traffic coming from outside the FFT-dedicated area on the FFT computation speed, we use traffic generators placed in the tiles adjacent to the FFT-dedicated area. These tiles are colored light gray in Fig. 4.13. Traffic is simulated through sustained data transmissions between the generator tiles, in the fixed directions pictured with arrows in the figure. This form of traffic simulates the worst-case effect of East-West and North-South transfers. We believe the approach is realistic due to the bursty nature of the FFT communication, which means that the same slowdown, described below, can be obtained with much less traffic (provided it occurs during peak NoC use by the FFT).

Figure 4.13: Adding traffic not due to the FFT itself

4.4.2 Evaluation of the slow-down due to traffic injection

As expected, traffic injection in the absence of network router programming does slow down the execution of the FFT. To evaluate this slowdown, we have measured the execution time of the FFT with and without traffic injection in the various configurations described above: FFT sizes 2¹⁴, 2¹⁵, and 2¹⁶, and MPPA configurations with 1, 2, 4, 8, and 16 processors per tile. The results are pictured in Fig. 4.14. The left-side tables present the raw figures (in CPU cycles) and the slowdown induced by traffic injection on the execution time of the FFT. The slowdown is computed using the formula:

Slowdown = (“FFT+Traffic” − “FFT alone”) / “FFT alone”

The right-side graphs plot the evolution of the speedup obtained through parallelization as the number of processors per tile changes from 1 to 2, 4, 8, and 16. The formula for computing the speedup is:

Speedup = “FFT duration on one CPU” / “Parallel FFT duration”

where the parallel FFT duration is considered both for the case where the FFT executes alone (solid line) and with traffic injection (dashed line). For instance, a configuration with 16 CPUs/tile (for a total of 256 CPU cores) computes the 2¹⁶-point FFT 79.62 times faster than a single processor. But the same configuration only reaches a 63.22 times speedup if external traffic is injected, which amounts to a significant 26% slowdown of the FFT execution time.

The results without traffic injection show that the MPPA architecture we consider supports well the parallelization of the chosen FFT algorithm. Indeed, for large data sizes, each doubling of the number of processors results in an acceleration close to the theoretical optimum (1.8 in our largest examples, asymptotically 2). When the processors are allocated small data vectors, the slowdown of the communications identified in Fig. 4.11b results in the plateau effect that can be observed in the graph associated with the 2¹⁴ FFT in Fig. 4.14.

Note that the 26% slowdown mentioned above happens while the FFT still uses only a small part of the NoC bandwidth (as explained below). However, the fact that this use is concentrated in short, highly synchronized bursts means that contentions at those points in time have a very significant effect. This shows the importance of reserving not only bandwidth, but bandwidth at specific points in time, as our architecture allows.
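For concreteness, here is the computation behind the figures just quoted (our arithmetic, using the 2¹⁶, 16-CPUs-per-tile row of Fig. 4.14):

\[ \text{Speedup}_{\text{alone}} = \frac{58507893}{734862} \approx 79.62, \qquad \text{Speedup}_{\text{traffic}} = \frac{58507893}{925365} \approx 63.22, \]
\[ \text{Slowdown} = \frac{925365 - 734862}{734862} \approx 25.9\%. \]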

4.4.3 Removing the slow-down through NoC programming

Our objective here is to show that NoC programming allows us to maintain the FFT speed while allowing the FFT-dedicated MPPA area to be traversed by NoC traffic originating outside of it.

Like the LoPhT tool (Chapter 5), the approach we propose here relies on static scheduling and the use of reservation tables. But the way reservation tables are used differs from LoPhT's. The objective of LoPhT is to build a reservation table "from scratch", and thus produce a static schedule for the application in isolation (in our case, the FFT).

Figure 4.14: FFT slow-down due to traffic injection

Data size of FFT: 2¹⁴ (single-processor execution: 12809139 cycles)
FFT duration on 4×4 tiles:
  CPU/tile   FFT alone (cycles)   FFT+traffic (cycles)   Slowdown
      1            1040453               1057503            1.64%
      2             564973                597043            5.68%
      4             348246                389988           11.99%
      8             263112                312042           18.60%
     16             266411                326989           22.74%

Data size of FFT: 2¹⁵ (single-processor execution: 27435983 cycles)
FFT duration on 4×4 tiles:
  CPU/tile   FFT alone (cycles)   FFT+traffic (cycles)   Slowdown
      1            2189483               2222736            1.52%
      2            1202232               1264318            5.16%
      4             695706                780041           12.12%
      8             484784                576635           18.95%
     16             418092                527041           26.06%

Data size of FFT: 2¹⁶ (single-processor execution: 58507893 cycles)
FFT duration on 4×4 tiles:
  CPU/tile   FFT alone (cycles)   FFT+traffic (cycles)   Slowdown
      1            4598214               4663732            1.42%
      2            2503537               2626790            4.92%
      4            1448886               1608880           11.04%
      8             945211               1115779           18.05%
     16             734862                925365           25.92%

On the contrary, we assume here that the FFT application is already fully allocated and scheduled on the MPPA area dedicated to it, and that the use of MPPA resources by the FFT in isolation is represented using a reservation table covering all computation, communication, and storage resources of the FFT-dedicated area. Such a reservation table can be obtained in two ways:

• If the allocation and scheduling of the application is realized using the LoPhT tool, then LoPhT automatically provides this table. This approach can be applied not only on fully regular applications such as the FFT, but also on applications featuring some data-dependent conditional control.

• If the application was allocated and scheduled by other means (e.g. manually) and if it features no data-dependent conditional control, then a simulation of the application in isolation on the MPPA simulation platform allows the recording of the (worst-case) use of the various resources under the form of a trace. This trace can be directly used as a reservation table.

Regardless of the way it was generated, the reservation table includes the routing operations performed by all the routers of the NoC, including their start dates and durations. We can then determine, for each NoC route, when it is free from FFT traffic; this free-interval computation is sketched below. The free time intervals of each route can then be used to schedule the transfer of NoC traffic originating outside the FFT-dedicated MPPA area. The resulting scheduling table, when projected on each router output, provides the program of that router. Collectively, these programs ensure the preservation of the FFT speed while allowing the transfer of packets not belonging to the FFT. Note that the external traffic is statically scheduled. This schedule fixes the maximum amount of resources that can be dedicated to the transmission of external traffic along every NoC route. Such an approach is best suited for cases where the external traffic is either statically scheduled (like that of the FFT), or where it can be divided among a set of fixed virtual circuits for which the maximal needs (throughput, latency) are known, as in the TDM-based NoCs reviewed in Chapter 2.⁴

⁴In this latter case the router controller instruction set of Table 4.1 must be extended to allow the write instruction to time out while waiting for packets belonging to external traffic. This allows us to account for the case where a virtual circuit does not use all its reservations.
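A minimal sketch of this free-interval computation is given below. The types and the fixed-length output buffer are illustrative simplifications, not the data structures of our tool.

typedef struct { long start, end; } Interval;  /* [start, end) in cycles */

/* Given the reservations of one NoC resource over one execution cycle,
   sorted by start date and non-overlapping, write the complementary free
   intervals of [0, cycle_len) to 'out' and return how many there are. */
int free_intervals(const Interval *res, int n_res,
                   long cycle_len, Interval *out) {
  int n_free = 0;
  long cursor = 0;                          /* first date not yet covered  */
  for (int i = 0; i < n_res; i++) {
    if (res[i].start > cursor)              /* gap before this reservation */
      out[n_free++] = (Interval){ cursor, res[i].start };
    cursor = res[i].end;
  }
  if (cursor < cycle_len)                   /* trailing free interval      */
    out[n_free++] = (Interval){ cursor, cycle_len };
  return n_free;
}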

Figure 4.15: The tables present the network use by packets not belonging to the FFT (in % of all transiting packets). The figures show the FFT packet-ratio.

Data size of FFT: 2¹⁴
  CPU/tile   Non-programmed   Programmed   Loss
      1          98.51%         97.47%     1.04%
      2          97.36%         95.33%     2.02%
      4          95.97%         92.45%     3.52%
      8          94.94%         89.95%     4.98%
     16          95.18%         90.10%     5.08%

Data size of FFT: 2¹⁵
  CPU/tile   Non-programmed   Programmed   Loss
      1          98.58%         97.60%     0.99%
      2          97.51%         95.62%     1.89%
      4          95.97%         92.44%     3.53%
      8          94.54%         89.14%     5.41%
     16          94.01%         87.33%     6.69%

Data size of FFT: 2¹⁶
  CPU/tile   Non-programmed   Programmed   Loss
      1          98.76%         97.91%     0.86%
      2          97.80%         96.15%     1.65%
      4          96.43%         93.38%     3.04%
      8          94.81%         89.78%     5.03%
     16          93.80%         86.97%     6.83%

We have written a tool that automatically allocates a maximal amount of statically scheduled communications from outside the FFT-dedicated area. Our tool works in cases where the external traffic travels along virtual circuits that share no NoC component, so that no arbitration is needed between the virtual circuits. This tool allowed us to schedule the external traffic generated as described in Section 4.4.1.2, and to prove that our approach preserves the FFT speed. The flow of our tool is presented in Fig. 4.16. A key point of this flow is the optimization of the generated router programs by identification of repetitive patterns in the routing orders produced by the previously-defined process.

The simulation of the FFT with external traffic injection allowed us to determine how NoC programming affects the amount of resources allocated to packets originating outside the FFT-dedicated area. Table 4.15 shows the number of packets not belonging to the FFT transiting the FFT-dedicated MPPA area, computed as a percentage of all transiting packets. We can see that programming reduces permeability to external traffic, but not significantly. In the worst case (bottom line), more than 86% of all transiting packets do not belong to the FFT, and the loss of permeability due to programming is less than 7%. In all cases, the programmed NoC allows the FFT to run as if no external traffic existed.

[Figure: tool flow — a reservation table is obtained either by simulation in isolation or by mapping with LoPhT; the new tool then allocates the free NoC time to external traffic and performs NoC program synthesis and optimization, producing the NoC programs]

Figure 4.16: NoC program generation flow

Chapter 5

Off-line mapping of real-time applications using LoPhT

Contents

5.1 Background: AAA using the Clocked Graphs formalism . . . . . 95
    5.1.1 The Clocked Graph formalism . . . . . 95
    5.1.2 Off-line scheduling of CG specifications . . . . . 102
5.2 Static (off-line) mapping onto MPPA architectures . . . . . 109
    5.2.1 AAA for NoC-based MPPA: The problem . . . . . 109
    5.2.2 Extension of the CG format . . . . . 111
    5.2.3 Makespan-optimizing scheduling . . . . . 115
5.3 Automatic code generation . . . . . 121
    5.3.1 Tile code generation . . . . . 121
5.4 Experimental results . . . . . 124

In this chapter, we define our technique and tool, called LoPhT, for automatic off-line real-time mapping of synchronous data-flow specifications onto MPPA architectures such as the previously-defined one, where the NoC supports static communication scheduling. Our off-line mapping and code generation technique drew significant inspiration from previous work on the AAA methodology [Grandpierre and Sorel, 2003], whose principles are presented in Section 5.1. More precisely, we defined our techniques as an extension of a particular implementation of the AAA methodology, namely the one built around the Clocked Graphs (CG) intermediate representation formalism [Potop-Butucaru et al., 2009]. This is why we fully dedicate the first section of this chapter to a detailed technical description of the existing CG formalism and CG-based mapping algorithms; this section borrows material from [Potop-Butucaru et al., 2009]. Sections 2 and 3 of this chapter

provide our original contribution. They first explain why the CG formalism and existing mapping techniques are not fully adapted to our many-core implementation problem (mainly due to issues related to NoC communication mapping). Then, they define our off-line scheduling algorithms and the code generation technique. The last section provides some quantitative evaluation.

5.1 Background: AAA using the Clocked Graphs formalism

5.1.1 The Clocked Graph formalism

The CG formalism has been introduced in [Potop-Butucaru et al., 2009] as an intermediate representation to be used during the (multi-processor, real-time) implementation of synchronous specifications. This format allows the faithful representation of specifications written in high-level synchronous languages like Scade/Lustre [Halbwachs et al., 1991], Signal [Guernic et al., 2003], or discrete-time Scicos [Campbell et al., 2006], including some structural information that can be used for efficient code generation. At the same time, the CG format is close enough to the target machine code to allow fine-grain manipulations such as scheduling, allocation, and optimization.

Following classical industrial design practices, a CG specification is formed of a functional specification and a non-functional specification. The functional specification is provided as a synchronous data-flow program with a cyclic execution model. The non-functional specification includes a description of the multi-processor architecture (topology and component types), plus the real-time characteristics of the data-flow nodes and the allocation constraints.

5.1.1.1 Functional specification

Synchronous languages [Halbwachs, 1993, Benveniste et al., 2003, Potop-Butucaru et al., 2005] are modeling and programming formalisms used in the specification and analysis of safety-critical embedded systems. They comprise (synchronous) concurrency features, and are based on the Mealy machine paradigm: input signals can occur from the environment, possibly simultaneously, at the pace of a given global clock. Output signals and state changes are then computed before the next clock tick, grouped as one atomic reaction, also called an execution instant.

Among synchronous languages, the CG representation is characterized by a strict separation of computations, under the form of a data-flow graph, from control, represented with clocks which identify the synchronous execution instants where the data-flow elements are executed. The two parts are interconnected, as all computations and communications are associated with a clock defining their execution condition, while clocks depend on values computed by the data-flow.

Clocks Clocks are logical activation conditions defining the sequence of synchronous execution instants where some computation or communication is performed, or where some data is available to be used in computations. To each execution instant, a clock associates a value of 1 (true, active) or 0 (false, inactive). A computation or communication whose clock is clk will be executed in the execution instants t with clk(t) = true. The CG formalism defines two types of clocks: elementary clocks and composed clocks. Elementary clocks are:

• The constant clocks True and False, defined by True(t) = true and False(t) = false for every instant t.

• The Test clocks, which are Boolean predicates over the output ports of data-flow nodes (defined below). For example, if o is an integer output port of data-flow node n, then o = 2 is the clock defining the execution instants where the value of o is 2. Similarly, o1 = o2 is the clock defining the execution instants where o1 equals o2.

Composed clocks are obtained from the elementary ones by composing them using:

• The Boolean combinators ∧, ∨, and ¬: for example, clk1 ∧ clk2 is true at execution instants where both clk1 and clk2 are true. We also denote by c1\c2 = c1 ∧ ¬c2 the difference operator on Booleans and clocks.

• The sub-clock operator clk1.clk2, which evaluates clk2 only on instants where clk1 is true.

Clocks are partially ordered by ≤, where clk1 ≤ clk2 if at each execution instant t, clk2(t) is true whenever clk1(t) is true.

instant              1  2  3  4  5  6  7  8  9 10 11
output port x        5  4  3  2  1  0 -1 -2 -3 -4 -5
True                 1  1  1  1  1  1  1  1  1  1  1
x > 0                1  1  1  1  1  0  0  0  0  0  0
¬(x > 0)             0  0  0  0  0  1  1  1  1  1  1
(x > 0) ∧ (x ≤ 3)    0  0  1  1  1  0  0  0  0  0  0

Figure 5.1: Examples of clocks.

Fig. 5.1 shows examples of clocks. The first line of the table indexes the execution instants, and the second provides the value of output port x.

Clocked graphs A clocked graph is a pair G = (N, A) formed of a set of data-flow nodes N and a set of arcs A. Each node n ∈ N has a set of named input ports I(n), a set of named output ports O(n), and a clock clk(n). The name of a port p is denoted name(p), and the ports of a node all have different names. The port of name name of node n is denoted n.name. Each input and output port p is assigned a data type (domain) denoted D_p. Each data-flow arc a ∈ A connects one output port, denoted src(a), to one input port, denoted dest(a). Each arc has a communication condition (a clock), denoted clk(a), which determines at which instants the data transfer takes place. Formally:

A ⊆ (⋃_{n∈N} O(n)) × (⋃_{n∈N} I(n)) × C

where C denotes the set of all clocks that can be defined using the previously-defined syntax. Each arc a ∈ A is assigned a data type (domain) denoted D_a, and for each a ∈ A:

D_a = D_{src(a)} = D_{dest(a)}

The Clocked Graph formalism defines two types of data-flow nodes: function nodes and delay nodes. Function nodes represent atomic stateless computations. They perform computations following a simple, cyclic read-compute-write semantics, where all input ports are read and all output ports are computed at each cycle where the node is activated. The state of the system is maintained by the delays. A delay δ ∈ N^∆ allows data to be propagated between successive execution instants (i.e., from one global execution cycle to the next) where clk(δ) is active. Each delay δ has a data domain denoted D_δ, one input port i of type D_δ, one output port o of type D_δ, a depth denoted depth(δ) which is a strictly positive integer, and a list of initial values of length depth(δ): δ_0, ..., δ_{depth(δ)−1} ∈ D_δ. The function computed by a delay block is (t indexes execution instants):

δ.o(t) = δ_t                   if t < depth(δ)
δ.o(t) = δ.i(t − depth(δ))     if t ≥ depth(δ)
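To make this semantics concrete, here is a small sketch of a depth-d delay realized as a circular buffer pre-filled with the initial values δ_0, ..., δ_{depth−1}. The types are illustrative (a double-valued delay is assumed), not part of the CG formalism.

typedef struct {
  int depth;      /* depth(delta), strictly positive                     */
  double *buf;    /* 'depth' slots, initialized to delta_0..delta_{d-1}  */
  int head;       /* index of the oldest stored value                    */
} Delay;

/* One activation: returns delta_t for t < depth, then i(t - depth). */
double delay_step(Delay *d, double input) {
  double out = d->buf[d->head];        /* value stored depth instants ago */
  d->buf[d->head] = input;             /* remember the current input      */
  d->head = (d->head + 1) % d->depth;  /* advance the circular cursor     */
  return out;
}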

The set of function nodes is denoted N^F and the set of delay nodes is denoted N^∆, and we have N = N^F ∪ N^∆.

[Figure: SynDEx specification with input nodes FS_IN and HS_IN; a hierarchical conditional node C1 containing F1, F2, F3 (on not HS) and G (on HS), both branches producing ID; a hierarchical conditional node C2 containing M (on not FS, producing S and V) and N (on FS); and a delay ∆ on the value S produced by M]

Figure 5.2: Graphical model of a SynDEx synchronous specification

[Figure: the corresponding clocked graph — FS_IN and HS_IN on clock true; F1, F2, F3 on clock ¬HS; G on clock HS; N on clock FS; M and the delay ∆ on clock ¬FS; arcs carrying ID, V, and S annotated with their communication clocks, e.g. ID sent on ¬HS ∧ ¬FS or HS ∧ ¬FS]

Figure 5.3: Clocked Graph representation of the example in Fig. 5.2

Example We provide in Fig. 5.3 the intermediate representation corresponding to the simple SynDEx synchronous specification of Fig. 5.2. The specification represents a system with two switches (Boolean inputs) controlling its execution: high-speed (HS) vs. low-speed (¬HS), and fail-safe (FS) vs. normal operation (¬FS). In the low-speed mode, more operations can be executed, whereas in the fail-safe mode the operation that gets executed (N) does not use any of the inputs, because the sensors or the treatment chain are assumed to be faulty (control is done using default values).

The clocked graph has 9 data-flow nodes (8 function nodes and 1 delay node) and 6 arcs. Clocked Graphs is a textual language, for which we use here an intuitive graphical representation. The predicate on top of each data-flow node and arc is its clock. The behavior of this CG representation is as follows: nodes FS_IN and HS_IN have clock true, so they are executed at each execution cycle to read FS and HS. If HS = false, then clock ¬HS is true for the instant, which triggers the execution of F1, followed by F2 and F3. Otherwise, G is executed (on clock HS). Clock dependencies, such as clock HS depending on the output port HS_IN.HS, are not explicit. Both F1 and G compute, through their output ports named ID, the SynDEx-level output value ID of the hierarchical conditional node C1. The execution of N can start as soon as we can determine that FS is true for the instant. The execution of M (on clock ¬FS) can start after ID is received from either F1 or G. It also uses the value S produced by M in the previous execution cycle (we assume the delay ∆ has depth 1). Data-flow blocks having no dependency between them can be executed in parallel. For instance, if FS = true then N can be executed as soon as FS is read, independently of the execution of F1, F2, F3, or G. On the contrary, the computation of M must wait until both FS and ID have arrived.

Support of a clock We have previously defined clocks as predicates over the output ports of data-flow blocks, or Boolean combinations thereof. However, this definition is not sufficient to allow a precise treatment of causality. We extend it here with the notion of support of a clock, which defines the set of all the output ports used in its computation, along with the clocks defining the instants where these output ports are needed for the computation. In the CG specifications considered in this thesis, a support is specified for the clocks of all nodes and arcs.

Formally, the support of a clock c is a list of pairs o@c_o, where o is an output port of some node n, and c_o is a clock defining the instants where the value of o is used in the computation of c. Intuitively, the support of a clock gives sufficient data for some algorithm to compute the clock. For instance, a correct support for the clock c = (o1 = 3) ∧ (o2 = 5) is {o1@true, o2@(o1 = 3)}, which corresponds to the following computation of c:

c = false;
read(o1);
if (o1 == 3) {
  read(o2);
  if (o2 == 5)
    c = true;
}

The notion of correct support is formally defined in [Potop-Butucaru et al., 2009] under the name of endochrony. Intuitively, endochrony requires that the clock c_o of an element o@c_o of a support list can be computed using the previous elements of the support list. Note that a given clock can have several supports. For instance, the clock c defined above also accepts the support {o2@true, o1@(o2 = 5)}.

Well-formed properties To ensure compliance with the synchronous hypothesis and the atomicity assumption for node executions, we require CG specifications to satisfy the following properties:

• No causality cycle: All cycles in the data-flow graph contain a delay node. Moreover, acyclicity must be preserved when the dependencies related to clock computation (through the supports) are taken into account.

• No uninitialized data: All the input ports of a node n receive a value in instants where n is activated. Formally:

– For all n ∈ N and for all i ∈ I(n) we have:

clk(n) ≤ ⋁_{a∈A, dest(a)=i} clk(a)

– For all a ∈ A we have clk(a) ≤ clk(src(a)).

• Single assignment: An input port of a node n receives at most one value at each execution cycle (no write conflict is possible). Formally, we require that for all distinct a1, a2 ∈ A with dest(a1) = dest(a2), we have clk(a1) ∧ clk(a2) = false.

5.1.1.2 Non-functional specification

Platform model Formally, in the original CG formalism, an architecture is a pair Arch = (Comm(Arch), Procs(Arch)) formed of a communication medium connecting a finite set of sequential processors Procs(Arch) = {P1, P2, ..., Pn}.

[Figure: processors P1, P2, and P3 connected by an asynchronous broadcast bus; node WCETs annotate the processors (e.g. HS_IN=1 and FS_IN=1 on P1; F3=3 on P2; M=3, N=3, F3=2 on P3; G=3, F1=3, F2=8); worst-case communication costs per data type: boolean=2, V_type=2, ID_type=5]

Figure 5.4: Example of a hardware architecture [Potop-Butucaru et al., 2009].

The hardware architecture on which our example is implemented is pictured in Fig. 5.4. This architecture has 3 processors (P1, P2, and P3) connected to an asynchronous broadcast bus. The architecture model is decorated with timing and allocation information, presented below and collectively referred to as non-functional properties.

Non-functional properties In the CG formalism used in this thesis, non-functional properties describe the durations of the various computation and communication operations and the allocation constraints. For each function node n ∈ N^F and each processor P ∈ Procs(Arch), we provide the duration of n on P. We assume this value is obtained through a worst-case execution time (WCET) analysis, and denote it WCET(n, P). This value is set to ∞ when the execution of n on P is not possible. Similarly, for each data type D we provide the worst-case communication time needed to transmit one such value over the bus, denoted WCCT(D). We assume that this value is always finite.

Note that the WCET information may implicitly define allocation constraints, as WCET(n, P) = ∞ prevents n from being allocated on P. Such allocation constraints are meant to represent hardware platform constraints or designer-imposed placement constraints. For instance, in Fig. 5.4, while the function HS_IN can only be executed on P1, function F3 can be executed (with different costs) on both P2 and P3. We assume that local communications (which do not use the bus) take no time. Variants of our mapping algorithms can also handle other non-functional constraints, such as periodicity, start date, and deadline requirements [Carle et al., 2012], but we do not cover these aspects here.

5.1.2 Off-line scheduling of CG specifications

Given a CG specification, the off-line real-time scheduling problem we consider is that of synthesizing a scheduling table (also known as a reservation table) that assigns real-time dates and execution resources (computation or communication resources) to each element of the CG. The scheduling table describes the real-time scheduling of one execution instant of the synchronous specification (possibly pipelined). A full execution is obtained by indefinitely repeating the pattern provided by the scheduling table.

5.1.2.1 Scheduled clocked graphs

The scheduling tables output by our algorithms are represented using scheduled clocked graphs, which are an extension of the CG formalism. Given a CG specification, a scheduled clocked graph S over it defines the following supplementary structures:

• An allocation of the delays to processors, determining on which processor each delay value is stored between execution cycles, and at which date the bookkeeping operations associated with the delay are performed: S_∆ : N^∆ → P × ℕ

• A set of scheduled functions assigning a processor and a real-time date (an integer) to each computation node of the data-flow: S_F : N^F → P × ℕ

• A set of scheduled communications assigning to each arc of the data-flow a sender processor, a real-time date, and an effective communication clock (recall that in this section architectures have a single broadcast bus): S_A : A → P × ℕ × C

• For each data-flow element x (node or arc) with clock c, a set of scheduled communications assigning to each element of the support supp(c) a sender processor, a real-time date, and an effective communication clock: S_x : supp(c) → P × ℕ × C

We say that the scheduled CG S is partial when any of its defining functions is partial. This may be the case during scheduling, when not all data-flow elements have been mapped.

5.1.2.2 Real-time scheduling problem

The real-time mapping problem we consider is a bi-criteria optimization problem: given a correct CG specification consisting of a functional specification and a non-functional specification, synthesize a scheduling table (a scheduled clocked graph) that minimizes the makespan and maximizes the throughput, with priority given to the makespan [Carle and Potop-Butucaru, 2011]. Makespan in this context means the time difference between the start of the first scheduled operation and the end of the last scheduled operation in one execution cycle, while throughput means the number of execution cycles started per time unit. The throughput can be different from the inverse of the makespan because we allow one cycle to start before the end of the previous ones, provided that operation dependencies are satisfied.

The allocation and scheduling problem being NP-complete, we do not aim for optimality. Instead, we rely on low-complexity heuristics that allow us to handle large numbers of resources with high temporal precision. Mapping and code generation are realized in three steps: the first step produces a makespan-optimizing scheduling table using the algorithms described in Section 5.2.3. The second step uses the software pipelining algorithms of [Carle and Potop-Butucaru, 2011] (not detailed here) to improve the throughput without changing the makespan. Finally, once a scheduling table is computed, it can be implemented in a way that preserves its real-time properties.

5.1.2.3 Consistency of a scheduled clocked graph

Notations Given a scheduled clocked graph S, we denote with:

• t_x: the real-time date associated by S with any computation node, arc, or element of a support list x.

• Res(x): the processor associated with each node, arc, or element of a support list x (the execution processor for data-flow functions, the storage processor for delays, and the sender processor for arcs and support elements).

• e_clk(x): the effective communication clock associated with an arc or element of a support list x.

• WCCT(x): the duration of the scheduled communication of an arc or element of a support list x. Note that WCCT(x) = WCCT(D_x).

The execution condition defining the execution cycles where the value of an output port o is sent on the communication medium before date t is denoted clk^S(o, t, Comm(Arch)), and is formally defined as the union of all the clocks e_clk(x), where x ranges over:

• the arcs a ∈ A with src(a) = o that have been scheduled such that t_a + WCCT(D_a) ≤ t;

• the elements o@c of supp(y) (for some arc or node y) that have been scheduled (S_y(o@c)) such that t_{o@c} + WCCT(o@c) ≤ t.

Obviously, clk^S(o, t, Comm(Arch)) is the execution condition giving the cycles where o is available system-wide at all dates t′ ≥ t. Assuming o is a port of node n, we also define the execution condition clk^S(o, t, P) giving the cycles where o is available on P at date t:

• If n is not allocated on P (Res(n) ≠ P), then clk^S(o, t, P) = clk^S(o, t, Comm(Arch)).

• If n is a delay node allocated on P (Res(n) = P), then clk^S(o, t, P) = clk(n), meaning that the value is available from the beginning of all execution instants of n.

• If n is a computation node allocated on P at date t_n, then clk^S(o, t, P) is clk(n) if t ≥ t_n + WCET(n, P), and false otherwise.

If c is a clock with c ≤ clk(n), then we denote by ready_date(P, o, c) the minimum date t such that clk^S(o, t, P) ≥ c, and by ready_date(Comm(Arch), o, c) the minimum date t such that clk^S(o, t, Comm(Arch)) ≥ c. Note that clk^S(o, ∞, R) is the clock giving the instants where o becomes available at some point on resource R.

Consistency properties The scheduling tables we build satisfy a number of properties ensuring that the executable code generated from them is both functionally and temporally correct:

• Causality: To ensure causal correctness, the schedule must ensure in a static fashion that when an operation (computation or communication) uses the value of an output port o at time t on execution condition (clock) c, the port value has been either computed locally or transmitted on the communication medium at a previous date, and on a greater clock. Formally:

– If S_F(n) = (P, t) is defined for a function node n, then:

∗ clk^S(o, t, P) ≥ c for all o@c ∈ supp(n)
∗ clk^S(o, t, P) ≥ c for all o@c ∈ supp(a) if dest(a) is an input port of n
∗ clk^S(src(a), t, P) ≥ clk(a) for all a with dest(a) an input port of n

– To derive the rule ensuring that a delay has enough input at the end of an instant, we simply set the date to ∞ in the previous rules. More precisely, if S_∆(δ) = P is defined for a delay node δ, then:

∗ clk^S(o, ∞, P) ≥ c for all o@c ∈ supp(δ)
∗ clk^S(o, ∞, P) ≥ c for all o@c ∈ supp(a) if dest(a) is an input port of δ
∗ clk^S(src(a), ∞, P) ≥ clk(a) for all arcs a with dest(a) an input port of δ

– If S_A(a) = (P, t_a, c_a) is defined for an arc a with c_a ≠ false, and if n_a is the source node of a, then:

∗ clk^S(o, t_a, Comm(Arch)) ≥ c for all o@c ∈ supp(a)
∗ S_F(n_a) is defined, Res(n_a) = P, and t_{n_a} + WCET(n_a, P) ≤ t_a

• Sequential use of resources: Two reservations of the same resource must not overlap in time, unless their corresponding computations or communications have exclusive execution conditions. Formally:

– On processors: if n1 and n2 are different function nodes scheduled on the same processor P with clk(n1) ∧ clk(n2) ≠ false, then either t_{n1} ≥ t_{n2} + WCET(n2, P) or t_{n2} ≥ t_{n1} + WCET(n1, P).

– On communication media: if x1 and x2 are different scheduled arcs or elements of support lists with e_clk(x1) ∧ e_clk(x2) ≠ false, then either t_{x1} ≥ t_{x2} + WCCT(x2) or t_{x2} ≥ t_{x1} + WCCT(x1).

• Worst-case reservations: We reserve for each data-flow node a time interval having a length equal to its WCET (provided in the non-functional specification), plus the worst-case time needed to compute its execution condition. Similarly, we reserve for every communication, on each resource along the communication path, a time interval having a length equal to the worst-case communication time (WCCT) of the communication.

5.1.2.4 Makespan-optimizing scheduling algorithm

The algorithms of this section are used to transform a CG specification into a scheduled CG specification. This basically consists in building a scheduling table. For instance, Fig. 5.5 provides the graphical representation of the scheduling table produced for the example of Figures 5.3 and 5.4. Our tools can further improve the throughput of this table through software pipelining methods [Carle and Potop-Butucaru, 2011].

The scheduling algorithm, whose top-level routine is Procedure 1, follows a classical list scheduling approach. The nodes of the data-flow graph are considered one by one, in an order consistent with the partial order determined by the data dependency arcs not originating in an output of a delay node. When a node is considered, it is scheduled along with the communication operations needed to bring its input data onto the processor where it is executed. The allocation and scheduling decisions taken for a node and for the associated communications are never changed during the scheduling of subsequent nodes (there is no backtracking).

The body of the while loop allocates and schedules a single node n, along with the communications needed to gather the input data of n. Scheduling follows a classical ASAP (as soon as possible) policy by mapping each operation at the earliest possible date. Allocation is performed automatically by attempting to schedule n on each of the processors that can execute it. Among all the possible allocations of n, Procedure 1 chooses the one minimizing the date at which n terminates. The length of the final scheduling table gives the worst-case duration of one cycle (i.e., the makespan). Function MapDelayCommunications maps the communications of the outputs of delay blocks.

Procedure 1 SchedulingDriver
Input:  G : Clocked Graph (N, A)
        Arch : Architecture description
Output: S : Scheduled CG (full schedule of the application)
1: S ← ∅
2: while there exists n ∈ N not yet scheduled do
3:   choose n ∈ N unscheduled whose predecessors have already been scheduled
4:   for all P ∈ Procs(Arch) with WCET(n, P) ≠ ∞ do
5:     (S_P, End_P) := ScheduleNodeOnProcessor(S, n, P);
6:   end for
7:   assign to S the S_P with minimal End_P
8: end while
9: S := MapDelayCommunications(S);

Scheduling a node n on a given processor P is realised by Procedure 2. The first call to function ScheduleSupport (whose pseudocode is provided in [Potop-Butucaru et al., 2009]) schedules the bus communications of the data which is needed for the computation of clk(n) (i.e., its support supp(clk(n)), as defined in Section 5.1.1.1) and which is not yet present on P. The forall loop schedules the communications related to the acquisition of the data-flow inputs of n (the clock governing each transmission and the data itself). Once these communications are scheduled, we schedule the node n at the earliest possible date after all needed data is available, using function ReserveFirstAvailable. Function ReserveFirstAvailable(S, R, t, clk, d) reserves the first slot (time interval) of duration d available on resource R after date t and on the condition clk. To allow an efficient search for the first available interval to reserve, the data structure storing the partial schedules also stores the set of free intervals of each resource R.
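The core of ReserveFirstAvailable is a first-fit search over that free-interval list. The sketch below shows only this search, under simplifying assumptions (in particular, it ignores the clock-exclusiveness test that lets reservations with exclusive execution conditions share a slot); the types are illustrative, not LoPhT's data structures.

typedef struct { long start, end; } FreeInterval;  /* [start, end) */

/* Return the earliest date >= t at which a slot of length d fits in one of
   the (sorted) free intervals, or -1 if none does. In the actual tool the
   table can always be extended, so reservation never fails. */
long first_fit(const FreeInterval *free_list, int n, long t, long d) {
  for (int i = 0; i < n; i++) {
    long s = free_list[i].start > t ? free_list[i].start : t;
    if (s + d <= free_list[i].end)
      return s;  /* the caller then splits free_list[i] around [s, s+d) */
  }
  return -1;
}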

Procedure 2 ScheduleNodeOnProcessor
Input:  S : Scheduled CG (partial schedule)
        P : Target processor
        n : Data-flow node to schedule
Output: S : Scheduled CG (partial or not)
        EndDate : The date on P where n completes its execution
1: (S, EndDate) := ScheduleSupport(S, P, supp(n));
2: for all incoming arcs a of n do
3:   (S, t) := ScheduleSupport(S, P, supp(a) ∪ src(a)@clk(a));
4:   EndDate := max(EndDate, t);
5: end for
6: (S, EndDate) := ReserveFirstAvailable(S, P, EndDate, clk(n), WCET(n, P));
7: EndDate := EndDate + WCET(n, P);

The data structure used by the actual scheduling algorithms does not only deal with real-time scheduling: it also associates software variables with the data-flow output ports and delay nodes, so as to help with code generation. For instance, for each output port of a node, and for each processor where the value of this port is used, one variable is allocated during scheduling.

[Figure: Gantt-style scheduling table over dates 0–20, with columns P1, P2, P3, and Bus — HS_IN@true and FS_IN@true at dates 0–1; Send(P1,HS)@true and Send(P1,FS)@true on the bus; F1@(HS=false), G@(HS=true), F2@(HS=false), F3@(HS=false) on the processors; N@(FS=true) and M@(FS=false); Send(P1,ID)@(HS=false ∧ FS=false), Send(P2,ID)@(FS=false ∧ HS=true), and Send(P1,V)@(HS=false) on the bus]

Figure 5.5: Real-time schedule table generated by these algorithms for the example in Figures 5.3 and 5.4

This is why the algorithms defined in the following sections structure a Scheduled CG specification S as a tuple formed of six data structures:

S = ⟨ScheduleLength, VariableAllocations, ProcessorSchedules, ProcessorsFreeIntervals, CommunicationSchedule, CommunicationFreeIntervals⟩

where:

• ScheduleLength is the current length of the scheduling table. Scheduling starts with an empty scheduling table of length 0, which is incrementally filled as time intervals on the various resources are reserved for the computations and the associated communications. There is no limit on the length of the scheduling table, so the mapping process cannot fail.

• VariableAllocations associates to each output port of a node the set of processors on which a variable corresponding to the port must be allocated.

• ProcessorSchedules is the set of reservations made on each processor P ∈ Procs(Arch), i.e. the set of data-flow nodes that were already scheduled on each processor.

• ProcessorsFreeIntervals is the set of free (not yet allocated) time intervals on each processor schedule. The intervals of this set can be used to schedule other data-flow nodes.

• CommunicationSchedule is the set of reservations made on the communication bus Comm(Arch).

• CommunicationFreeIntervals is the set of free (not yet allocated) time intervals on Comm(Arch), that can be used to schedule other communications.

5.2 Static (off-line) mapping onto MPPA architectures

5.2.1 AAA for NoC-based MPPA: The problem

Existing implementations of the AAA methodology – SynDEx and the CG-based scheduling algorithms described above – do not allow the efficient allocation and scheduling of applications onto NoC-based many-cores, regardless of the type of NoC arbitration and routing. This is mainly due to limitations in the modeling and handling of complex communication media. In particular:

• Existing implementations of the AAA methodology only consider sequential communication media, which can be modeled as classical sequential resources for scheduling purposes. Meanwhile, a NoC is formed of a large number of resources working in parallel, allowing multiple communication flows to traverse the NoC at the same time.

• SynDEx allows the modeling of architectures with multiple communication resources, where a piece of data may need to be routed along several communication resources in order to reach its destination. However, it assumes that processors possess unlimited local storage capabilities, which allows the use of a store-and-forward buffering policy along multi-hop routes, and thus simplifies scheduling. Meanwhile, NoCs have limited internal buffering, and usually rely on wormhole buffering policies which, as explained in Section 2.1.3.2, result in significant synchronization between resource reservations on different NoC resources.

• Both SynDEx and the existing CG toolset consider the mapping of coarse-grain parallel applications onto classical multiprocessor/distributed architectures. But we already explained in Section 1.1.4 that when mapping onto MPPAs, both application and architecture usually feature a finer-grain parallelism (specification- and architecture-level). This requires a tighter control of timing, concerning both NoC control and the execution inside a computing tile. In turn, this requires precise control over the contention points, such as NoC routers, memory banks, etc., and a fine-grain allocation of these resources.

The remainder of this thesis is dedicated to extending the CG formalism and the CG-based implementation of the AAA methodology to allow efficient off-line real-time scheduling and code generation for the MPPA architecture presented in Chapter 4. In doing so, we face two main difficulties:

• The definition of a hardware specification model that takes into account the specificities of the target architecture, including the timing model, by determining the cost (in clock cycles) of the various NoC operations.

• The extension of the existing scheduling and code generation algorithms to take into account the new hardware description. These algorithms must have very low complexity, so that they scale up to the large number of resources of a typical NoC, while at the same time retaining a high timing precision (and guaranteeing functional and temporal correctness).

The resulting AAA flow is pictured in Fig. 1.3.

Limitations This PhD thesis focuses on the modeling and handling of NoC resources for real-time scheduling. Our main objective was to show that table-based off-line scheduling heuristics have the potential to scale up, allowing the use of architecture models exposing all possible contention points of the NoC.

To achieve this objective in only 3 years, we have made some simplifications to the scheduling problem. The main simplification is that we do not take into account data-dependent conditional control. Its implementation in hardware is not complete, as explained in Section 4.2.1. Given this absence of hardware support, we have preferred not to include conditional execution treatment in the scheduling algorithms defined later in this chapter; this also has the advantage of simplifying the presentation of the algorithms. The second limitation of our work concerns the modeling of the computing tiles, which is detailed in the next section.

5.2.2 Extension of the CG format

5.2.2.1 Modeling of MPPA resources

To allow off-line mapping onto our architecture, we need to identify the set of abstract computation and communication resources that are considered during allocation and scheduling. The choice of resources must allow a precise timing characterization while preserving the tractability of the scheduling problem. Formally, we represent an MPPA architecture with a pair Arch = (NoC(Arch), Tiles(Arch)). The set NoC(Arch) contains the communication resources of the NoC, and Tiles(Arch) contains the computation resources associated with the tiles. We picture in Fig. 5.6 the resources of a 3x3 MPPA.

[Figure: a 3x3 grid of tiles (0,0)–(2,2); each tile has a DMA output segment DMA(i,j) and a tile input segment In(i,j), and neighboring routers are connected by directed inter-router segments N(i,j)(k,l) in both directions]

Figure 5.6: Hardware resource modeling for our architecture

NoC resources: Classical communication media (e.g. buses, shared RAMs) considered in previous work [Grandpierre and Sorel, 2003, Potop-Butucaru et al., 2009] have two properties facilitating mapping:

• Each medium can be seen as a sequential communication resource.

• When a communication follows a route involving multiple buses, data can be temporarily stored between two routing steps (store-and-forward buffering policy).

These two hypotheses do not hold in NoCs. First, the transmission time for a given amount of data depends on the distance between the source and destination tiles, which in turn depends on the number of multiplexers that are crossed during transmission. For this reason, we need to associate one communication resource to each of the multiplexers of the NoC routers, and to each DMA unit allowing command queuing. We use the term segment to refer to these communication resources. As pictured in Fig. 5.6, our architecture model contains 3 types of segments:

• Inter-router segments correspond to the links between NoC routers and their com- mand multiplexers. We denote with N(i, j)(k,l) the segment going from tile (i, j) to tile (k,l).

• Tile input segments correspond to the links going from routers to their local tiles (and their command multiplexers). We denote with In(i, j) the tile input segment of tile (i, j).

• Tile output segments correspond to the links going from tiles to local routers. These links are not controlled by multiplexers, but by the DMA of the local tile. The tile output segment of tile (i, j) is denoted DMA(i, j).

Under this resource model, a data transfer between two tiles is performed by a set of segments called the communication path. All communication paths are formed according to the X-first routing policy. Recall that modeling is done only for the command NoC resources; no resource modeling, nor scheduling analysis, is needed for the response NoC. Each NoC segment has a buffering capability of only 3 flits. The set NoC(Arch) is formed of all the segments of the NoC. In Fig. 5.6, the NoC description consists of 42 segments.
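To make the path construction concrete, the following C sketch (our illustration, not part of the CG toolset; all identifiers are hypothetical) enumerates the segments of a communication path under the X-first policy: the DMA segment of the source tile, the inter-router segments along the X dimension, then those along the Y dimension, and finally the input segment of the destination tile.

    /* Hypothetical encoding of the three segment kinds of our model. */
    typedef enum { SEG_DMA, SEG_ROUTER, SEG_TILE_IN } seg_kind_t;

    typedef struct {
        seg_kind_t kind;
        int sy, sx;  /* tile at the source of the segment      */
        int dy, dx;  /* tile at its destination (routers only) */
    } segment_t;

    /* Build the X-first communication path from tile (y0,x0) to tile
     * (y1,x1): move along the X dimension first, then along Y. The
     * caller provides a path array of sufficient size. Returns the
     * number of segments on the path. */
    static int get_xfirst_path(int y0, int x0, int y1, int x1,
                               segment_t *path) {
        int n = 0, x = x0, y = y0;
        path[n++] = (segment_t){ SEG_DMA, y0, x0, y0, x0 };
        while (x != x1) {                       /* X-first: horizontal hops */
            int nx = x + (x1 > x ? 1 : -1);
            path[n++] = (segment_t){ SEG_ROUTER, y, x, y, nx };  /* N(y,x)(y,nx) */
            x = nx;
        }
        while (y != y1) {                       /* then vertical hops */
            int ny = y + (y1 > y ? 1 : -1);
            path[n++] = (segment_t){ SEG_ROUTER, y, x, ny, x };  /* N(y,x)(ny,x) */
            y = ny;
        }
        path[n++] = (segment_t){ SEG_TILE_IN, y1, x1, y1, x1 };  /* In(y1,x1) */
        return n;
    }

For instance, get_xfirst_path(1,1, 2,2, path) yields DMA(1,1), N(1,1)(1,2), N(1,2)(2,2), In(2,2), the route used in the example of Section 5.2.3.1.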

Tile resources: This PhD thesis focuses on the modeling and handling of NoC resources. To this end, we consider a resource model that simplifies as much as possible the representation of the computing tiles. The main simplification is to consider each tile as a single computing resource. All the CPUs of the tile (1 up to 16 in our evaluations) are seen as a single very fast computing resource. This means that computation operations (data-flow nodes) will be allocated to the tile as if it were a sequential processor, but each operation is in fact parallel code running on all the processors of a tile. In Fig. 5.6 there are just 9 tile resources representing 144 CPUs. This simplification largely reduces the complexity of our presentation, and also satisfies our evaluation needs, given that the 2 applications used as examples can be organized into operations that are easily parallelized. While not explicitly represented in the CG hardware model, memory organization is considered for code generation.

Thus, in our model, Tiles(Arch) = {T_0, ..., T_{Y×X−1}}, where X and Y are the dimensions of the MPPA.

5.2.2.2 Memory footprint specification

Our MPPA architecture features a complex memory organization including multi-bank RAMs, which we must take into account through explicit allocations of data and code onto the RAM banks. To allow this, we extend the existing CG functional specification as follows:

• To each data type D the CG specification associates a worst-case size sizeof(D). This value will be used in Section 5.2.2.3 to compute the worst-case communication time for a piece of data of that type.

• To each data-flow function we associate the number of RAM banks it needs in order to allow parallel execution while respecting the provided WCET figure.

5.2.2.3 Non-functional properties

Worst-case computation durations For each data-flow node n and each MPPA tile T, the CG specification defines WCET(n,T), which must be a safe upper bound for the WCET of n on T. Note that the WCET values we require are for parallel code running on all the 16 processors of a tile, which can be computed using the technique of [Puaut and Potop-Butucaru, 2013].

Even though the tiles of our MPPA architecture are largely identical, we made the choice of defining one WCET value per tile. This allows a simple expression of allocation constraints, which specify on which tiles a given data-flow block can be executed. Allocation constraints can be used to confine an application to part of the MPPA, leaving the other tiles free to execute other applications. Furthermore, using one WCET value per tile would allow our algorithms to handle heterogeneous many-cores (but we did not investigate this issue).
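As an illustration of how such a table can be stored (a minimal C sketch under assumed names and sizes, not the actual CG data structures), the value ∞ can be encoded by a sentinel:

    #include <stdint.h>

    #define NUM_NODES 32              /* hypothetical application size */
    #define NUM_TILES 12              /* e.g. a 3x4 MPPA               */
    #define WCET_INFINITY UINT32_MAX  /* sentinel encoding WCET = ∞    */

    /* One WCET value per (data-flow node, tile) pair. A value of
     * WCET_INFINITY is an allocation constraint: node n may not be
     * mapped on tile T (cf. the test of line 4 of Procedure 3). */
    static uint32_t wcet[NUM_NODES][NUM_TILES];

    static int tile_allowed(int n, int t) {
        return wcet[n][t] != WCET_INFINITY;
    }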

Worst-case communication durations The previous section explained that a worst-case size sizeof(D) is provided by the CG specification for each data type D. All inter-tile data transmissions are performed using the DMA units. If a transmission is not blocked on the NoC, then its duration at the sender side only depends on the size of the transmitted data. The exact formula is:

d = sizeof(D) + ⌈sizeof(D)/MaxPayload⌉ × PacketHeaderSize

where d is the duration in clock cycles of the DMA transfer from the start of the transmission to the moment where a new transmission can start, D is the type of the transmitted data, MaxPayload is the maximum payload of a NoC packet produced by the DMA (in 32-bit words), and PacketHeaderSize is the number of cycles that are lost for each packet in the chosen NoC. These values are architecture constants. For instance, the architecture used in this thesis has MaxPayload = 16 flits = 64 bytes and PacketHeaderSize = 4 flits = 16 bytes. In addition to this transmission duration, we must also account in our computations for the following costs (a worked sketch combining them is given after the list):

• The DMA transfer initiation, which consists of 3 uncached RAM accesses plus the duration of the DMA reading the payload of the first packet from the data RAM. This cost is over-approximated as 30 cycles.

• The latency of the NoC, which is the time needed for one flit to traverse the path from source to destination. This latency is 3 × n, where n is the number of NoC segments on the route of the transmission. The constant 3 corresponds here to the number of clock cycles needed to traverse a NoC segment in the absence of contentions.
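The following C sketch puts the three costs together (our illustration; the constant names are ours, and the assumption that the three terms simply add is a simplification of the computation described above):

    #include <stdint.h>

    enum {
        MAX_PAYLOAD        = 16, /* packet payload, in 32-bit words (flits) */
        PACKET_HEADER_SIZE = 4,  /* cycles lost per packet                  */
        DMA_INIT_COST      = 30, /* over-approximated DMA initiation cost   */
        SEGMENT_LATENCY    = 3   /* cycles per segment, without contention  */
    };

    /* Worst-case duration (in cycles) of a non-blocked transfer of
     * `words` 32-bit words over a route of `nsegs` NoC segments. */
    static uint32_t wcct(uint32_t words, uint32_t nsegs) {
        uint32_t packets = (words + MAX_PAYLOAD - 1) / MAX_PAYLOAD; /* ceiling */
        uint32_t d = words + packets * PACKET_HEADER_SIZE; /* sender-side duration */
        return DMA_INIT_COST + d + SEGMENT_LATENCY * nsegs;
    }

For the 500-word transmission of x used as an example in Section 5.2.3.1, the sender-side duration alone is d = 500 + ⌈500/16⌉ × 4 = 628 cycles.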

5.2.3 Makespan-optimizing scheduling

The makespan-optimizing scheduling routine we use on the MPPA is a variant of the one described in Section 5.1.2, extended with support for scheduling NoC communications, but without support for conditional control. Like its predecessor, it works by building a global scheduling table covering all MPPA resources (NoC segments and tiles). It uses a non-preemptive scheduling model for the data-flow nodes, because preemptions would introduce important temporal imprecision (through the use of interrupts). At the same time, it uses a preemptive scheduling model for NoC communications, because data communications over the NoC are naturally divided into packets that are individually scheduled by the NoC multiplexer programs, allowing a form of pre-computed preemption. For each data-flow node our scheduling routine reserves exactly one time interval on one of the tiles. For every communication between two tiles, it reserves one or more time intervals on each segment along the communication path between the two tiles, starting with the DMA of the source tile, and continuing with the NoC multiplexers (recall that the route is fixed under the X-first routing policy). Scheduling is done under an ASAP (as soon as possible) policy.

The top-level scheduling routine is Procedure 3. It is very similar to the original routine Procedure 1 of page 107. The only difference is that the choice between possible allocations of a given data-flow node depends not only on the end date of the node, but on a more elaborate cost function. This cost function, which we seek to minimize, should be chosen so that the final length of the scheduling table is minimized (this length gives the execution cycle makespan). Our choice of cost function combines the end date of the node in the schedule (with 95% weight) and the maximum occupation of a CPU in the current scheduling table (with 5% weight). We found it to produce shorter scheduling tables than the cost function based on end date alone (as used in [Grandpierre and Sorel, 2003, Potop-Butucaru et al., 2009]). This is due to the fact that our cost function discourages the scattering of computations onto a large number of processors, which ultimately reduces synchronization cost.

Procedure 3 SchedulingDriverMPPA
Input: G : Clocked Graph (N, A)
       Arch : Architecture description
Output: S : Scheduled CG (full schedule of the application)
1: S ← ∅
2: while there exists n ∈ N not yet scheduled do
3:   choose n ∈ N unscheduled whose predecessors have already been scheduled
4:   for all T ∈ Tiles(Arch) with WCET(n,T) ≠ ∞ do
5:     (S_T, End_T) := ScheduleNodeOnTile(S, n, T);
6:     Cost_T := (95/100) × End_T + (5/100) × MaxTileOccupation(S_T);
7:   end for
8:   assign to S the S_T with minimal Cost_T
9: end while
10: S := MapDelayCommunications(S);

Similarly, Procedure 4 has the same global structure as Procedure 2, page 107, but is adapted to our needs through the handling of NoC communications and through the simplification arising from the absence of data-dependent control. Indeed, no reference is made here to the scheduling of clock supports. Instead, we need to determine, for each piece of data that needs to be transmitted on the NoC, which path the transmission must take (following the X-first routing policy), and then schedule the communication over the resources of this route. Once all needed data is present on the tile, the node is scheduled at the earliest possible date.

Procedure 4 ScheduleNodeOnTile
Input: S : Scheduled CG (partial schedule)
       T : Target tile
       n : Data-flow node to schedule
Output: S : Scheduled CG (partial or not)
        t : The date on T where n completes its execution
1: t ← 0
2: for all incoming arcs a of n do
3:   let n′ be the (already mapped) data-flow node producing src(a)
4:   let T′ be the tile on which n′ has been allocated
5:   let t′ be the end date of n′ on T′
6:   if T′ ≠ T then
7:     Path ← GetXFirstPath(T′, T)
8:     (S, t′) := MapCommunicationOnPath(S, Path, t′, sizeof(D_a))
9:   end if
10:  t := max(t, t′)
11: end for
12: (S, t) := ReserveFirstAvailable(S, T, t, true, WCET(n,T));
13: t := t + WCET(n,T);

5.2.3.1 Mapping NoC communications

The most delicate part of our scheduling routine is the communication mapping function MapCommunicationOnPath. When a node is mapped on a tile, this function is called once for each of the input dependencies of the node, if the dependency source is on another tile and if the associated data has not already been transmitted.

Procedure 5 MapCommunicationOnPath
Input: S : Scheduled CG (partial schedule)
       Path : list of NoC segments (the communication path)
       StartDate : date after which the data can be sent
       DataSize : worst-case data size (in 32-bit words)
Output: S : Scheduled CG (partial or not)
        EndComm : the end date of the communication
1: for i := 1 to length(Path) do
2:   /* Identify the unreserved time intervals on segment i and
3:      compensate for the delays induced by segment buffers. */
4:   ShiftSize[i] := (i − 1) ∗ SegmentBufferSize;
5:   FreeIntervalList[i] := GetIntervalList(S, Path[i], StartDate);
6:   FreeIntervalList[i] := ShiftLeftIntervals(FreeIntervalList[i], ShiftSize[i]);
7: end for
8: /* Determine time intervals that are free along the path. */
9: PathFreeIntervalList := IntersectIntervals(FreeIntervalList);
10: /* Reserve intervals for the transmission of data and lock. */
11: (IntervalsForData, NewFreeIntervalList, NewScheduleLength) := ReserveIntervals(DataSize, PathFreeIntervalList, length(S));
12: (IntervalForLock, NewIntervalList, NewScheduleLength) := ReserveIntervals(1, NewFreeIntervalList, NewScheduleLength);
13: ReservedIntervals := AppendToList(IntervalsForData, IntervalForLock);
14: for i := 1 to length(Path) do
15:   /* Remove the compensation added in the beginning of the algorithm,
16:      separately for each segment. */
17:   SegmentReservedIntervals[i] := ShiftRightIntervals(ReservedIntervals, ShiftSize[i]);
18: end for
19: /* If reservations go past the current end of the scheduling table,
20:    update the scheduling table with the new length. */
21: if NewScheduleLength > length(S) then
22:   S := IncreaseLength(S, NewScheduleLength);
23: end if
24: EndComm := NewScheduleLength;
25: /* Update the lists of reservations and the lists of free intervals
26:    for all segments along the path. */
27: S := UpdateSchedulingTable(S, Path, SegmentReservedIntervals);

Fig. 5.7 presents a (partial) scheduling table produced by our mapping routine. We shall use this example to give a better intuition of the functioning of our algorithms. We assume here that the execution of operation f produces data x which will be used by g. Our scheduling table shows the result of mapping operation g onto Tile(2,2) (which also requires the mapping of the transmission of x) under the assumption that all other computation operations (f, h) and data transmissions (y, z, u) were already mapped. Fig. 5.7 uses a lighter color to identify reservations made as part of the mapping of g.

Figure 5.7: Scheduling table covering one communication path on our NoC. Only the 6 resources of interest are represented (out of 70): Tile(1,1), DMA(1,1), N(1,1)(1,2), N(1,2)(2,2), In(2,2), and Tile(2,2)

As part of the mapping of g onto Tile(2,2), function MapCommunicationOnPath is called to perform the mapping of the communication of x from Tile(1,1) to Tile(2,2). The parameters of its call are the schedule itself, Path, StartDate, and DataSize. Parameter Path is the list formed of resources DMA(1,1), N(1,1)(1,2), N(1,2)(2,2), and In(2,2) (the transmission route of x under the X-first routing protocol). Parameter StartDate is set to be the end date of node f (in our case 500), and DataSize is the worst-case size of the data associated with the data dependency (in our case 500 32-bit words). Time is measured in clock cycles.

To minimize the overall time reserved for a data transmission, we shall require that it is never blocked waiting for a NoC resource. For instance, if the communication of x starts on segment N(1,1)(1,2) at date t, then on segment N(1,2)(2,2) it must start at date t + SegmentBufferSize, where SegmentBufferSize is a platform constant defining the time needed for a flit to traverse one NoC segment. In our NoC this constant is 3 clock cycles (in Fig. 5.7 we use a far larger value of 100 cycles, for clarity).

Building such synchronized reservation patterns along the communication routes is what function MapCommunicationOnPath does. It starts by obtaining the lists of free time intervals of each NoC segment along the communication path, and realigning them by subtracting (i − 1) ∗ SegmentBufferSize from the start dates of all the free intervals of the i-th resource, for all i. Once this realignment is done on each segment by function ShiftLeftIntervals, finding a reservation along the communication path amounts to finding time intervals that are unused on all resources. To do this, we start by performing (in line 9 of function MapCommunicationOnPath) an intersection operation returning all realigned time intervals that are free on all resources. In Fig. 5.7, this intersection operation produces (prior to the mapping of x) the intervals [800,1100) and [1400,2100]. The value 2100 corresponds to the length of the scheduling table prior to the mapping of g. We then call function ReserveIntervals twice, to make reservations for the data transmission and for the lock command packet associated with each communication. These two calls produce a list of reserved intervals, which then needs to be realigned on each resource. In Fig. 5.7, these 2 calls reserve the intervals [800,1100), [1400,1700), and [1700,1704). The first 2 intervals are needed for the data transmission, and the third is used for the lock command packet.
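The interval manipulations above can be summarized by the following C sketch (a minimal illustration assuming half-open [start, end) intervals and pairwise intersection; the actual routines also treat the open-ended interval at the end of the table specially):

    typedef struct { long start, end; } interval_t;  /* half-open [start,end) */

    /* ShiftLeftIntervals: realign the free intervals of the i-th segment
     * by subtracting (i-1)*SegmentBufferSize from all dates. */
    static void shift_left(interval_t *iv, int n, long shift) {
        for (int k = 0; k < n; k++) {
            iv[k].start -= shift;
            iv[k].end   -= shift;
        }
    }

    /* Core of IntersectIntervals for two sorted lists: keep the time
     * ranges free on both resources. Folding this pairwise over all
     * segments yields the intervals free along the whole path. */
    static int intersect(const interval_t *a, int na,
                         const interval_t *b, int nb, interval_t *out) {
        int i = 0, j = 0, n = 0;
        while (i < na && j < nb) {
            long lo = a[i].start > b[j].start ? a[i].start : b[j].start;
            long hi = a[i].end   < b[j].end   ? a[i].end   : b[j].end;
            if (lo < hi)                       /* non-empty overlap */
                out[n++] = (interval_t){ lo, hi };
            if (a[i].end < b[j].end) i++; else j++;
        }
        return n;
    }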

5.2.3.2 Multiple reservations

Communications are reserved at the earliest possible date, and function ReserveIntervals allows the fragmentation of a data transmission to allow a better use of NoC resources. In our example, fragmentation allows us to transmit part of x before the reservation for u. If fragmentation were not possible, the transmission of x would have to start later, thus delaying the start of g and potentially lengthening the reservation table. Fragmentation is subject to restrictions arising from the fact that communications are packetized. More precisely, an interval cannot be reserved unless it has a minimal size, allowing the transmission of at least a packet containing some payload data.

Function ReserveIntervals performs the complex translation from data sizes to packets and interval reservations. We present here an unoptimized version that facilitates understanding. This version reserves one packet at a time, using a free interval as soon as it has the needed minimal size. Packets are reserved until the required DataSize is covered. As for tasks, reservations are made as early as possible. For each packet reservation the cost of NoC control (under the form of the PacketHeaderSize) must be taken into account. If the current scheduling table does not allow the mapping of a data communication, function ReserveIntervals may lengthen it.

Procedure 6 ReserveIntervals
Input: DataSize : worst-case size of data to transmit
       FreeIntervalList : list of free intervals before reservation
       ScheduleLength : schedule length before reservation
Output: ReservedIntervalList : reserved intervals
        NewIntervalList : list of free intervals after reservation
        NewScheduleLength : schedule length after reservation
1: NewIntervalList := FreeIntervalList
2: ReservedIntervalList := ∅
3: /* Consider the free intervals one by one and reserve as much as
4:    possible of each until there is no more space or the
5:    communication need has been covered. */
6: while DataSize > 0 and NewIntervalList ≠ ∅ do
7:   ival := GetFirstInterval(NewIntervalList);
8:   NewIntervalList := RemoveFirstInterval(NewIntervalList);
9:   if IntervalEnd(ival) == ScheduleLength then
10:    /* ival can be extended indefinitely along with the schedule length. */
11:    RemainingIvalLength := ∞;
12:  else
13:    RemainingIvalLength := length(ival);
14:  end if
15:  ReservedLength := 0;
16:  while RemainingIvalLength > MinPacketSize and DataSize > 0 do
17:    /* Reserve place for data packets, one at a time
18:       (clear, but suboptimal code). */
19:    PacketLength := min(DataSize + PacketHeaderSize, RemainingIvalLength, MaxPacketSize);
20:    RemainingIvalLength -= PacketLength;
21:    DataSize -= PacketLength − PacketHeaderSize;
22:    ReservedLength += PacketLength
23:  end while
24:  ReservedInterval := CreateInterval(start(ival), ReservedLength);
25:  ReservedIntervalList := AppendToList(ReservedIntervalList, ReservedInterval);
26:  if length(ival) − ReservedLength > MinPacketLength then
27:    NewIntervalList := InsertInList(NewIntervalList, CreateInterval(start(ival)+ReservedLength, length(ival)−ReservedLength));
28:  end if
29:  NewScheduleLength := max(ScheduleLength, end(ival));
30: end while

5.3 Automatic code generation

The final step in our CG-based flow is code generation, which transforms the scheduling table, where synchronization is time-based, into multi-threaded C code with lock-based synchronization, plus the communication programs that control the behavior of the NoC. Executable code is generated as follows: one sequential execution thread is generated for each tile and for each NoC segment corresponding to a NoC multiplexer (resources N(i,j)(k,l) and In(i,j) in the architecture model of Section 5.2.2.1). The resulting programs strictly enforce the operation ordering computed for each resource in the reservation table, but allow for some start date elasticity at execution time to take advantage of execution/communication times shorter than the WCETs/WCCTs. This elasticity does not compromise the timing guarantees computed by the mapping tool. Indeed, if inputs are acquired periodically, with a period equal to the length of the reservation table (to model the periodic acquisition of an input), then the computed cycle times are fully respected. Listings 4.1 and 4.2 provide the full application code synthesized by our tool for the example of Figures 4.7 and 4.9 on an architecture with 1 CPU per tile. Tile code is plain C code, whereas NoC router code is written in the assembly language defined in the previous chapter. In the absence of conditional execution, the generation of assembly code for the NoC router controllers is straightforward, and we do not present it here. Instead, the generation of tile code involves complex issues related to the multiplexing of CPU and DMA commands in a single sequential thread.
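To fix intuitions before detailing tile code generation, here is a hypothetical shape for the C code emitted for the tile executing g in the example of Fig. 5.7 (the primitives wait_lock and dma_send are placeholder names we use for illustration, not the actual runtime API):

    /* Sketch of a generated tile thread; ordering follows the
     * reservation table, locks replace time-based synchronization. */
    extern void wait_lock(volatile int *lock);  /* placeholder primitives */
    extern void dma_send(const void *data, unsigned words, int route);
    extern void g(void);            /* parallel code of data-flow node g */

    volatile int x_ready;  /* set by the lock packet ending the transfer of x */

    void tile_2_2_thread(void) {
        for (;;) {                /* one iteration per execution cycle */
            wait_lock(&x_ready);  /* input x fully received            */
            g();                  /* node reserved on this tile        */
            /* DMA initiations scheduled after g would be emitted here. */
        }
    }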

5.3.1 Tile code generation

Each tile thread is an infinite loop that executes the (computation or communication) operations scheduled on the associated resource in the order prescribed by their reservations. The tile thread code is generated by the GenerateTileThread procedure of page 122. Recall that each tile may contain up to 16 CPUs, but is reserved as a single sequential resource, parallelism being hidden inside the data-flow blocks. The sequential thread of a tile runs on CPU 0 of the tile, but the code of each data-flow block can use all the processors.

Procedure 7 GenerateTileThread
Input: ProcSchedule : The scheduling table of processor (y,x)
       DMASchedule : The scheduling table of the DMA segment of tile (y,x)
       VariableAllocations : The set of variable allocations
1: CurrDMAOp := 0;
2: for i := 0 to length(ProcSchedule) − 1 do
3:   NodeEndDate := GetEndDate(ProcSchedule[i]);
4:   while (CurrDMAOp

No separate thread is generated for the DMA resource of a tile. Its operations are instead initiated by the tile thread. This is possible because the DMA allows the queuing of DMA commands. Code generation is done as follows: it is assumed that the scheduling table is sorted by reservation start date. Each iteration of the top-level for loop of Procedure 7 generates code for one scheduled data-flow node. Code generation proper (lines 13-17 of the procedure) depends on the type of the node (function or delay). The statements immediately before and after this code are synchronization code ensuring that:

• Needed input data has already been transmitted (line 12).

• Variables that have reached the end of their lifetime can be written again by DMA transfers (line 19).

The remaining code (lines 3-10) generates the DMA initiation code. The initiation of all DMA operations starting during the execution of a computation node is realized before the node starts execution. Lines 22-26 generate the DMA initiation code for the DMA operations starting after all computations of the tile have completed. The example of Fig. 5.8 illustrates the two cases where DMA code generation applies. First of all, DMA initiation for the sending of z and the first part of x is performed before the execution of k. The code initiating the sending of the second part of x is executed after k. As explained in Section 5.2.2.3, DMA initiation code has a very short duration. However, it is not accounted for during the scheduling phase, so the real-time guarantees provided by the scheduling table must be amended by taking this cost into account.
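The interleaving realized by Procedure 7 can be summarized by the following C sketch (a reconstruction consistent with the description above, using hypothetical names; it is not the tool's actual code):

    typedef struct { long start_date, end_date; } sched_entry_t;

    extern void emit_dma_initiation(const sched_entry_t *op);
    extern void emit_node_code(const sched_entry_t *node);

    /* Before emitting the code of each scheduled node, emit initiation
     * code for every DMA operation whose reservation starts before the
     * node completes (lines 3-10); remaining DMA operations are
     * initiated after the last node (lines 22-26). */
    void generate_tile_thread(const sched_entry_t *proc, int nproc,
                              const sched_entry_t *dma, int ndma) {
        int d = 0;                                  /* CurrDMAOp */
        for (int i = 0; i < nproc; i++) {
            long node_end = proc[i].end_date;       /* GetEndDate */
            while (d < ndma && dma[d].start_date < node_end)
                emit_dma_initiation(&dma[d++]);     /* queued on the DMA */
            emit_node_code(&proc[i]);               /* function or delay */
        }
        while (d < ndma)
            emit_dma_initiation(&dma[d++]);
    }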

Figure 5.8: Scheduling table example for DMA code generation

5.4 Experimental results

We have evaluated our mapping and code generation method on two applications featuring no conditional execution but using classical signal processing filters: the platooning application described in Fig. 5.9, and a parallel Cooley-Tukey implementation of the integer 1D radix-2 FFT over 2^14 samples [Bahn et al., 2008]. We chose these two applications because they allow the computation of tight lower bounds on the execution cycle makespan and because (for the FFT) an MPPA mapping already exists. This allows for meaningful comparisons, while no tool equivalent to ours exists to provide another basis for evaluation.

Figure 5.9: Dependent task system of a platooning application (Image Capture, six Sobel_H/Sobel_V and Histo_H/Histo_V task pairs, Detection and Correction, Display; the dashed feedback arc has depth 2)

The two applications were described using our data-flow formalism. This (manual) data-flow modeling phase chooses the degree of parallelism that can be exploited by our algorithms. In the platooning application, each block is a computation node, solid arcs are simple dependencies, and the dashed arc is a delayed dependency of depth 2. The application is run by a car to determine the position (distance and angle) of another car moving in front of it. It works by cyclically capturing an input image of fixed size. This image is passed through an edge-detecting Sobel filter and then through a histogram search to detect dominant edges. This information is used by the detection and correction function to determine the position of the front car. The whole process is monitored on a display. The delayed dependency represents a feedback from the detection and correction function that allows the adjustment of image capture parameters. The Sobel filter and the histogram search are parallelized. Each of the Sobel_H and Sobel_V functions receives one sixth of the whole image (a horizontal slice). For the FFT, we followed the parallelization scheme used in [Bahn et al., 2008], with a block size of 2^11, resulting in a total of 32 computation nodes.

Evaluation is done on the 3x4 MPPA, where we assume that input data arrives on Tile(0,0) and the results are output by Tile(2,3). For both applications, after computing the WCET of the tasks and the WCCT of the data transmissions, the mapping tool was applied to build a running implementation and to compute execution cycle makespan and throughput guarantees. Then, the code was run and its performance measured. This allowed us to check the functional correctness of the code and to determine that our tool produces very precise timing guarantees. Indeed, the difference between predicted and observed makespan and throughput figures is less than 1% for both examples, which is due to the precision of our mapping algorithms and to the choice of a very predictable execution platform.

The generated off-line schedule (and the resulting code) has good real-time properties. For both the CyCab and the FFT, we have manually computed lower bounds on the execution cycle makespan.¹ The lower bounds computed for the CyCab and FFT examples were lower than the makespan values computed by our algorithms by respectively 8.9% and 3.4%. For the FFT example, we have also compared the measured makespan of our code with that of a classical NoC-based parallel implementation of the FFT [Bahn et al., 2008] running on our architecture. For our code, the NoC was statically scheduled, while for the classical implementation it was not. Execution results show that our code had a latency that was 3.82% shorter than the one of the classical parallel FFT code. In other words, our tool produced code that not only has statically-computed hard real-time bounds

¹To compute these lower bounds we simplify the hardware model by assuming that the resources N(i,j)(k,l) generate no contention (i.e. they allow the simultaneous transmission of all packets that demand it). We only take into account the sequencing of operations on processors and DMAs and the contentions on resources In(i,j).

(which the hand-written code has not) but is also faster.

Our mapping heuristics favor the concentration of all computations and communications in a few tiles, leaving the others free to execute other applications (as opposed to evenly spreading the application tasks over the tiles). The code generated for CyCab has a tile load of 85%-99% for 6 of the 12 tiles of the architecture, while the other tiles are either unused or have very small loads (less than 7%). Using more computing tiles would bring no latency or throughput gains because our application is limited by the input acquisition speed. In the FFT application the synchronization barriers reduce the average tile use to 47% on 8 of the 12 MPPA tiles. Note that the remaining free processor and NoC time can be used by other applications.

Finally, we have measured the influence of the static scheduling of NoC communications on the application latency, by executing the code generated for CyCab and the FFT with and without NoC programming. For CyCab, not programming the NoC results in a speed loss of 7.41%. For the FFT the figure is 4.62%. We conclude that our tool produces global static schedules of good quality, which provide timing guarantees close to the optimum.

Conclusion

The thesis we defend in this manuscript is that efficient parallel execution on a NoC-based MPPA requires better synchronization between computations and NoC data traffic, which can be obtained by compile-time static (off-line) real-time scheduling of both computations and communications. In turn, this means that global compilation processes should target together the processing elements and the programmable NoC routers.

Optimal NoC usage should result from a global optimization principle, as opposed to a collection of local optimizations of individual connections. Indeed, various data flows with distinct sources and targets need to be highly concerted, both in time and space, as in a classical pipelined CPU, where the use of registers (replaced in our case with a complex NoC-based interconnect) is strongly synchronized with that of the functional units. One main problem in applying such a global optimization approach is to provide the proper hardware infrastructures allowing the implementation of optimal computation and communication mappings and schedules. Our thesis is that optimal data transfer patterns should be encoded using simple programs configuring the router nodes (each router being then programmed to play its part in the globally concerted computation and communication scheme).

On the hardware design side, we concretely supported our proposed approach by extending a state-of-the-art NoC to allow programmed arbitration and offer the best support for off-line scheduling. In this NoC we have replaced the fair arbitration modules with static, micro-programmable modules. This allows us to establish effective static scheduling and routing of data transmissions as required by the application. Router programs are the result of a global compilation process which targets the NoC and the individual cores altogether. We have advocated the desired level of expressiveness for such configuration programs, and provided experimental data (coming from cycle-accurate simulations) supporting our choices. We also wrote an architecture synthesis tool that allows simple

architectural exploration of MPPAs using the new programmable NoC.

On the software side, we have proposed a novel allocation and scheduling method capable of synthesizing such global computation and communication schedules covering all the execution, communication, and memory resources of an MPPA. Our method allows static (table-based) scheduling of synchronous data-flow specifications. To allow an efficient use of the hardware resources, our method takes into account the specificities of the MPPA hardware and implements advanced scheduling techniques such as software pipelining and pre-computed preemption of data transmissions. Our tool synthesizes code for processing elements and NoC routers with static real-time guarantees that runs faster than (simple) hand-written parallel code.

Future work

The first point we wish to address in the future is the extension of both the hardware and the mapping technique so that they support data-dependent conditional control, and are thus able to consider a larger class of applications. The main hardware-related question is that of finding a good balance between NoC router complexity and efficiency gain, seen in a broad sense. Solutions here range from routers that incorporate more and more features, such as software-defined routes or multicast/broadcast, to the minimalist solution presented here, where these features must be implemented through software protocols.

On the software side, the mapping technique should be extended so that it directly takes into account the internal architecture of each tile (CPUs, memory banks, etc.), instead of seeing them as single resources. Our first objective here is to perform memory resource allocation at scheduling time (and not during code generation). More generally, our mapping technique could benefit from previous work on the scheduling of data-flow specifications and on compilation, but a careful evaluation is needed to determine which algorithms scale up to take into account the low-level architectural detail.

Finally, it is important to explore the integration of on-line and off-line mapping techniques for the efficient mapping of complex applications onto NoC-based MPPAs.

List of Publications

1. Djemal, M., Pêcheux, F., Potop-Butucaru, D., de Simone, R., Wajsbürt, F., and Zhang, Z. Programmable routers for efficient mapping of applications onto NoC-based MPSoCs. In Proceedings of the IEEE international conference on Design and Architectures for Signal and Image Processing (DASIP'12), Karlsruhe, Germany

2. Carle, T., Djemal, M., Potop-Butucaru, D., de Simone, R., and Zhang, Z. Static mapping of real-time applications onto massively parallel processor arrays. In Proceedings of the IEEE international conference on Application of Concurrency to System Design (ACSD'14), Tunis, Tunisia

3. Carle, T., Djemal, M., Potop-Butucaru, D., de Simone, R., Zhang, Z., Pêcheux, F., and Wajsbürt, F. Reconciling performance and predictability on a many-core through off-line mapping. In Reconciling Performance and Predictability Workshop (REPP'14), Grenoble, France

4. Carle, T., Djemal, M., Genius, D., Pêcheux, F., Potop-Butucaru, D., de Simone, R., Wajsbürt, F., and Zhang, Z. Reconciling performance and predictability on a many-core through off-line mapping. In Reconfigurable Communication-centric Systems-on-Chip Workshop (ReCoSoC'14), Montpellier, France

5. Djemal, M., Pêcheux, F., Potop-Butucaru, D., de Simone, R., Wajsbürt, F., and Zhang, Z. Programmable routers for efficient mapping of applications onto NoC-based MPSoCs. Poster at Colloque GDR SOC-SIP 2012, Paris, France

Bibliography

[Adapteva, 2012] Adapteva (2012). The Epiphany many-core architecture. Online http://www.adapteva.com/products/epiphany-ip/epiphany-architecture-ip/. 11, 24, 29, 31, 43

[Alliance, 2001] Alliance, V. (2001). VCI: Virtual Component Interface Standard (OCB 2 2.0). Online at: http://www.vsi.org. 59

[Almaless and Wajsbürt, 2012] Almaless, G. and Wajsbürt, F. (2012). On the scalability of image and signal processing parallel applications on emerging cc-NUMA many-cores. In Proceedings DASIP'12, Karlsruhe, Germany. 47

[Amarasinghe et al., 2005] Amarasinghe, S., Gordon, M. I., Karczmarek, M., Lin, J., Maze, D., Rabbah, R., and Thies, W. (2005). Language and compiler design for streaming applications. Int. J. Parallel Program., 33(2). 49

[AOSTE-INRIA, ] AOSTE-INRIA. SynDEx: System-Level CAD Software for Distributed Real-Time Embedded Systems. Online at: http://www.syndex.org/. 49

[ARINC653, 2005] ARINC653 (2005). ARINC 653: Avionics application software standard interface. www.arinc.org. 72

[ARM, 1999] ARM (1999). AMBA specification rev 2.0. ARM Limited. 12

[Aubry et al., 2013] Aubry, P., Beaucamps, P.-E., Blanc, F., Bodin, B., Carpov, S., Cudennec, L., David, V., Dore, P., Dubrulle, P., de Dinechin, B. D., Galea, F., Goubier, T., Harrand, M., Jones, S., Lesage, J.-D., Louise, S., Chaisemartin, N. M., Nguyen, T. H., Raynaud, X., and Sirdey, R. (2013). Extended cyclostatic dataflow program compilation and execution for an integrated manycore processor. In Proceedings of the First International Workshop on Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY 2013), Barcelona, Spain. 15, 31, 32, 54


[AUTOSAR, 2009] AUTOSAR (2009). Autosar (automotive open system architecture), release 4. http://www.autosar.org/. 72

[Aydi et al., 2011] Aydi, Y., Baklouti, M., Abid, M., and Dekeyser, J.-L. (2011). A multi-level design methodology of multistage interconnection network for mpsocs. IJCAT, 42(2/3):191–203. 65

[Bacivarov et al., 2013] Bacivarov, I., Haid, W., Huang, K., and Thiele, L. (2013). Methods and tools for mapping process networks onto multi-processor systems-on-chip. In Handbook of Signal Processing Systems. Springer. 55

[Bahn et al., 2008] Bahn, J. H., Yang, J., and Bagherzadeh, N. (2008). Parallel FFT algorithms on network-on-chips. In Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on. 85, 124, 125

[Bebelis et al., 2013] Bebelis, V., Fradet, P., Girault, A., and Lavigueur, B. (2013). A framework to schedule parametric dataflow applications on many-core platforms. In Proceedings CPC'13, Lyon, France. 55

[Beletska et al., 2011] Beletska, A., Bielecki, W., Cohen, A., Palkowski, M., and Siedlecki, K. (2011). Coarse-grained loop parallelization: Iteration space slicing vs affine transformations. Parallel Computing, 37(8):479–497. 10

[Benini, 2010] Benini, L. (2010). Programming heterogeneous many-core platforms in nanometer technology: the p2012 experience. Presentation in the ARTIST Summer School, Autrans, France. Online at: http://www.artist-embedded.org/artist/Videos.html. 12, 29, 65, 77

[Benini and De Micheli, 2002] Benini, L. and De Micheli, G. (2002). Networks on chips: A new soc paradigm. Computer, 35(1):70–78. 13

[Benveniste et al., 2003] Benveniste, A., Caspi, P., Edwards, S. A., Halbwachs, N., Guernic, P. L., and de Simone, R. (2003). The synchronous languages 12 years later. Proceedings of the IEEE, 91(1):64–83. 95

[Bhattacharyya et al., 2013] Bhattacharyya, S., Deprettere, E., Leupers, R., and Takala, J., editors (2013). Handbook of Signal Processing Systems. Springer. 2nd edition, in particular chapter. 55

[Brandner and Schoeberl, 2012] Brandner, F. and Schoeberl, M. (2012). Static routing in symmetric real-time network-on-chips. In Proceedings of the 20th International Conference on Real-Time and Network Systems, RTNS '12. 47

[Buchmann et al., 2004] Buchmann, R., Pétrot, F., and Greiner, A. (2004). Fast cycle accurate simulator to simulate event-driven behavior. In Proceedings of the International Conference on Electrical, Electronic and Computer Engineering, pages 35–38. 68

[Campbell et al., 2006] Campbell, S., Chancelier, J.-P., and Nikoukhah, R. (2006). Modeling and Simulation in Scilab/Scicos. Springer. 50, 95

[Carara et al., 2007] Carara, E., Calazans, N., and Moraes, F. (2007). Router architecture for high-performance nocs. In Proceedings SBCCI, Rio de Janeiro, Brazil. 31

[Carle and Potop-Butucaru, 2011] Carle, T. and Potop-Butucaru, D. (2011). Throughput Optimization by Software Pipelining of Conditional Reservation tables. Rapport de recherche RR-7606, INRIA. 51, 52, 103, 106

[Carle et al., 2012] Carle, T., Potop-Butucaru, D., Sorel, Y., and Lesens, D. (2012). From dataflow specification to multiprocessor partitioned time-triggered real-time implementation. Research report RR-8109, INRIA. 102

[Caspi et al., 2003] Caspi, P., Curic, A., Magnan, A., Sofronis, C., Tripakis, S., and Niebert, P. (2003). From Simulink to SCADE/Lustre to TTA: a layered approach for distributed embedded applications. In Proceedings LCTES, San Diego, CA, USA. 17, 49

[Cetus, 2004] Cetus (2004). Cetus: A source-to-source compiler infrastructure for c programs. Online http://cetus.ecn.purdue.edu/. 10

[de Dinechin et al., 2013] de Dinechin, B. D., de Massas, P. G., Lager, G., Léger, C., Orgogozo, B., Reybert, J., and Strudel, T. (2013). A distributed run-time environment for the Kalray MPPA®-256 integrated manycore processor. Procedia Computer Science, 18(0):1654–1663. International Conference on Computational Science. 44

[Djemal et al., 2012] Djemal, M., Pêcheux, F., Potop-Butucaru, D., de Simone, R., Wajsbürt, F., and Zhang, Z. (2012). Programmable routers for efficient mapping of applications onto NoC-based MPSoCs. In Proceedings of the IEEE international conference on Design and Architectures for Signal and Image Processing (DASIP). 32

[Eles et al., 2000] Eles, P., Doboli, A., Pop, P., and Peng, Z. (2000). Scheduling with bus access optimization for distributed embedded systems. IEEE Transactions on VLSI Systems, 8(5):472–491. 49

[ELF, 1995] ELF (1995). Tool interface standard (TIS) committee: Executable and linking format (ELF) specification, version 1.2. Online http://refspecs.linuxbase.org/elf/elf.pdf. 63

[Epiphany, 2012] Epiphany (2012). Epiphany architecture reference manual. Online http://www.adapteva.com/uncategorized/e3-reference-manual/. 44

[Feige and Raghavan, 1992] Feige, U. and Raghavan, P. (1992). Exact analysis of hot-potato routing. In Proceedings of the 33rd Annual Symposium on Foundations of Computer Science, SFCS '92, pages 553–562, Washington, DC, USA. IEEE Computer Society. 35

[Fisher, 1983] Fisher, J. (1983). Very long instruction word architectures and the ELI-512. In Proceedings ISCA. 41, 49

[Fohler and Ramamritham, 1995] Fohler, G. and Ramamritham, K. (1995). Static scheduling of pipelined periodic tasks in distributed real-time systems. In Procs. of EUROMICRO-RTS97, pages 128–135. 48, 49

[Furber, 2006] Furber, S. (2006). Living with failure: Lessons from nature? In Proceedings of the 11th IEEE European Test Symposium (ETS), pages 4–8. 10

[Genius et al., 2013] Genius, D., Kordon, A. M., and el Abidine, K. Z. (2013). Space optimal solution for data reordering in streaming applications on noc based mpsoc. Journal of System Architecture. 55

[Gerdes et al., 2012] Gerdes, M., Kluge, F., Ungerer, T., Rochange, C., and Sainrat, P. (2012). Time analysable synchronisation techniques for parallelised hard real-time applications. In Proceedings DATE’12, Dresden, Germany. 15, 31

[Goossens et al., 2003] Goossens, J., Funk, S., and Baruah, S. (2003). Priority-driven scheduling of periodic task systems on multiprocessors. Real-Time Syst., 25(2-3):187–205. 15

[Goossens et al., 2012] Goossens, K., Azevedo, A., Chandrasekar, K., Gomony, M., Goossens, S., Koedam, M., Li, Y., Mirzoyan, D., Molnos, A., Nejad, A. B., Nelson, A., and Sinha, S. (2012). Virtual execution platforms for mixed-time-criticality applications: the CompSoC architecture and design flow. In Proceedings of the 5th Workshop on Compositional Theory and Technology for Real-Time Embedded Systems, pages 23–30, San Juan, Puerto Rico. 47, 52, 55

[Goossens et al., 2005] Goossens, K., Dielissen, J., and Radulescu, A. (2005). Æthereal network on chip: Concepts, architectures, and implementations. IEEE Design & Test of Computers, 22(5). 14, 30, 31, 33, 34

[Gordon, 2010] Gordon, M. (2010). Compiler Techniques for Scalable Performance of Stream Programs on Multicore Architectures. PhD thesis, Massachusetts Institute of Technology. 52, 53

[Gordon et al., 2006] Gordon, M. I., Thies, W., and Amarasinghe, S. (2006). Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. SIGARCH Comput. Archit. News, 34(5):151–162. 53

[Gordon E. Moore, 1965] Gordon E. Moore (1965). Moore's law. Online http://en.wikipedia.org/wiki/Moore's_law. 8

[Gordon E. Moore, 2005] Gordon E. Moore (2005). Excerpts from a conversation with Gordon Moore: Moore's law. Online http://download.intel.com/museum/Moores_law/Video-transcripts/excepts_a_Conversation_with_gordon_Moore.pdf. 8

[Goubier et al., 2011] Goubier, T., Sirdey, R., Louise, S., and David, V. (2011). σc: A programming model and language for embedded manycores. In Proceedings ICA3PP'11 (LNCS 7016), Melbourne, Australia. 54

[Grandpierre and Sorel, 2003] Grandpierre, T. and Sorel, Y. (2003). From algorithm and architecture specification to automatic generation of distributed real-time executives: a seamless flow of graphs transformations. In Proceedings MEMOCODE, Mont Saint-Michel, France. 17, 48, 49, 51, 94, 112, 115

[Guernic et al., 2003] Guernic, P. L., Talpin, J.-P., and Lann, J.-C. L. (2003). Polychrony for system design. Journal for Circuits, Systems and Computers. Special Issue on Application Specific Hardware Design. 48, 50, 95

[Guerrier and Greiner, 2000] Guerrier, P. and Greiner, A. (2000). A generic architecture for on-chip packet-switched interconnections. In Proceedings of the conference on Design, automation and test in Europe, DATE ’00, pages 250–256, New York, NY, USA. ACM. 29

[Halbwachs, 1993] Halbwachs, N. (1993). Synchronous Programming of Reactive Systems. Kluwer Academic Publishers. 95

[Halbwachs et al., 1991] Halbwachs, N., Caspi, P., Raymond, P., and Pilaud, D. (1991). The synchronous dataflow programming language Lustre. Proceedings of the IEEE, 79(9):1305–1320. 48, 50, 95

[Hanumaiah and Vrudhula, 2012] Hanumaiah, V. and Vrudhula, S. (2012). Temperature-aware DVFS for hard real-time applications on multicore processors. IEEE Trans. Comput., 61(10):1484–1494. 10

[Hardy and Puaut, 2008] Hardy, D. and Puaut, I. (2008). Wcet analysis of multi-level non-inclusive set-associative instruction caches. In RTSS. 13, 58, 66

[Harrand and Durand, 2011] Harrand, M. and Durand, Y. (2011). Network on chip with quality of service. United States patent application publication US 2011/026400A1. 14, 30, 31, 36, 37, 65, 77

[Held et al., 2006] Held, J., Bautista, J., and Koehl, S. (2006). From a Few Cores to Many: A Tera-scale Computing Research Overview. Technical report. 9

[Henzinger and Kirsch, 2007] Henzinger, T. and Kirsch, C. (2007). The embedded machine: Predictable, portable real-time code. ACM Transactions on Programming Languages and Systems, 29(6). 72

[Heptagon, 2013] Heptagon (2013). Heptagon/BZR manual. Online http://bzr.inria.fr/pub/bzr-manual.pdf. 48

[Hilton and Nelson, 2006] Hilton, C. and Nelson, B. (2006). PNoC: a flexible circuit-switched NoC for FPGA-based systems. Computers and Digital Techniques, IEE Proceedings, 153(3):181–188. 14, 26

[Howard and al., 2011] Howard, J. et al. (2011). A 48-core IA-32 processor in 45nm CMOS using on-die message-passing and DVFS for performance and power scaling. IEEE Journal of Solid-State Circuits, 46(1). 11, 12, 14, 28, 31, 45

[IBM, 2001] IBM (2001). The POWER4 processor introduction and tuning guide. Online http://www.redbooks.ibm.com/redbooks/pdfs/sg247041.pdf. 9

[Irigoin et al., 2012] Irigoin, F., Amini, M., Ancourt, C., Coelho, F., Creusillet, B., and Keryell, R. (2012). Polyèdres et compilation. Technique et Science Informatiques, 31(8-10):987–1019. 10

[J.Flynn, 2004] J.Flynn, L. (2004). Intel halts development of 2 new microprocessors. New York Times. 9

[John von Neumann, 1945] John von Neumann (1945). Online http://en.wikipedia.org/wiki/Von_Neumann_architecture. 8

[Johnson and Frigo, 2008] Johnson, S. G. and Frigo, M. (2008). Implementing FFTs in practice. In Burrus, C. S., editor, Fast Fourier Transforms, chapter 11. Connexions, Rice University, Houston TX. 84

[Kakoee, 2012] Kakoee, M. R. (2012). Reliable and Variation-tolerant Interconnection Network for Low Power MPSoCs. PhD thesis, Università di Bologna. Online at http://amsdottorato.unibo.it/4407/1/phdthesis.pdf. 65

[Kashif et al., 2013] Kashif, H., Gholamian, S., Pellizzoni, R., Patel, H., and Fischmeister, S. (2013). Ortap: An offset-based response time analysis for a pipelined communication resource model. In Proceedings RTAS. 15

[Khronos, 2011] Khronos (2011). The open standard for parallel programming of heterogeneous systems. Online https://www.khronos.org/opencl. 10, 15

[Kumar et al., 2002] Kumar, S., Jantsch, A., Soininen, J.-P., Forsell, M., Millberg, M., Oberg, J., Tiensyrja, K., and Hemani, A. (2002). A network on chip architecture and design methodology. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, ISVLSI '02, pages 117–124, Washington, DC, USA. IEEE Computer Society. 27

[Kwok and Ahmad, 1999] Kwok, Y.-K. and Ahmad, I. (1999). Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, 31(4):406–471. 15

[Lee and Messerschmitt, 1987] Lee, E. A. and Messerschmitt, D. G. (1987). Synchronous data flow. In Proceedings of the IEEE, pages 1235–1245. 48

[Lee et al., 1998] Lee, W., Barua, R., Frank, M., Srikrishna, D., Babb, J., Sarkar, V., and Amarasinghe, S. (1998). Space-time scheduling of instruction-level parallelism on a raw machine. SIGOPS Oper. Syst. Rev., 32(5):46–57. 49

[Lickly et al., 2008] Lickly, B., Liu, I., Kim, S., Patel, H., Edwards, S., and Lee, E. (2008). Predictable programming on a precision timed architecture. In Proceedings CASES’08. 72

[LIP6, 2011] LIP6 (2011). SoClib: an open platform for virtual prototyping of multi-processors system on chip. Online at: http://www.soclib.fr. 19, 33, 57

[Lu and Jantsch, 2007] Lu, Z. and Jantsch, A. (2007). Tdm virtual-circuit configuration for network-on-chip. IEEE Trans. VLSI. 32, 55

[Melpignano et al., 2012] Melpignano, D., Benini, L., Flamand, E., Jego, B., Lepley, T., Haugou, G., Clermidy, F., and Dutoit, D. (2012). Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications. In Proceedings of the 49th Annual Design Automation Conference, DAC '12, pages 1137–1142, New York, NY, USA. ACM. 46

[MIC, 2010] MIC (2010). Intel Many Integrated Core Architecture. Online at: http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html. 11, 12

[Milder et al., 2007] Milder, P., Franchetti, F., Hoe, J., and Püschel, M. (2007). FFT compiler: From math to efficient hardware. In IEEE International High Level Design Validation and Test Workshop (HLDVT). 84

[Millberg et al., 2004a] Millberg, M., Nilsson, E., Thid, R., and Jantsch, A. (2004a). Guaranteed bandwidth using looped containers in temporally disjoint networks within the nostrum network on chip. In Proceedings of the conference on Design, automation and test in Europe - Volume 2, DATE ’04, Washington, DC, USA. IEEE Computer Society. 35, 36

[Millberg et al., 2004b] Millberg, M., Nilsson, E., Thid, R., Kumar, S., and Jantsch, A. (2004b). The Nostrum backbone - a communication protocol stack for networks on chip. In Proceedings of the 17th International Conference on VLSI Design, VLSID '04, Washington, DC, USA. IEEE Computer Society. 14, 30, 35

[Miro Panades et al., 2006] Miro Panades, I., Greiner, A., and Sheibanyrad, A. (2006). A Low Cost Network-on-Chip with Guaranteed Service Well Suited to the GALS Approach. In NanoNet International Conference on Nano-Networks, pages 1–5. 14, 33

[Moscibroda and Mutlu, 2009] Moscibroda, T. and Mutlu, O. (2009). A case for bufferless routing in on-chip networks. In Proceedings ISCA-36. 31

[MPPA, 2012] MPPA (2012). The MPPA256 many-core architecture. www.kalray.eu. 11, 29, 30, 36, 37, 42

[Nikolic et al., 2013] Nikolic, B., Ali, H., Petters, S., and Pinho, L. (2013). Are virtual channels the bottleneck of priority-aware wormhole-switched noc-based many-cores? In Proceedings RTNS, 2013. 15

[Nvidia, 1999] Nvidia (1999). GeForce 256: The world's first GPU. Online http://www.nvidia.co.uk/page/geforce256.html. 11

[Nvidia CUDA, 2006] Nvidia CUDA (2006). CUDA. Online http://www.nvidia.com/object/cuda_home_new. 15

[OpenMP, 2008] OpenMP (2008). The OpenMP API specification for parallel programming. Online www.openmp.org. 10, 15

[Panades, 2008] Panades, I. (2008). Conception et implantation d'un micro-réseau sur puce avec garantie de service. PhD thesis, Université Pierre et Marie Curie. 17, 24, 31, 33, 57

[Pande et al., 2005] Pande, P. P., Grecu, C., Jones, M., Ivanov, A., and Saleh, R. (2005). Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions on Computers, 54(8):1025–1040. 24

[Parks et al., 1995] Parks, T. M., Pino, J. L., and Lee, E. A. (1995). A comparison of synchronous and cycle-static dataflow. In Proceedings of the 29th Asilomar Conference on Signals, Systems and Computers (2-Volume Set), ASILOMAR '95, pages 204–, Washington, DC, USA. IEEE Computer Society. 48

[Peng et al., 2007] Peng, L., Peir, J.-K., Prakash, T. K., Chen, Y.-K., and Koppelman, D. M. (2007). Memory performance and scalability of intel’s and amd’s dual-core processors: A case study. In IPCCC’07, pages 55–64. 8

[PLB, 2001] PLB (2001). 32-bit processor local bus architecture specification version 2.9. IBM Corporation. 12

[Pollack, 1999] Pollack, F. J. (1999). New challenges in the coming generations of CMOS process technologies. In Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture, MICRO 32, Washington, DC, USA. IEEE Computer Society. 9

[Potop-Butucaru et al., 2009] Potop-Butucaru, D., de Simone, R., Sorel, Y., and Talpin, J.-P. (2009). Clock-driven distributed real-time implementation of endochronous synchronous programs. In Proceedings EMSOFT, Grenoble, France. 49, 51, 94, 95, 100, 101, 106, 112, 115

[Potop-Butucaru et al., 2005] Potop-Butucaru, D., de Simone, R., and Talpin, J.-P. (2005). The synchronous hypothesis and synchronous languages. In Embedded Systems Handbook. 95

[Pradalier et al., 2005] Pradalier, C., Hermosillo, J., Koike, C., Braillon, C., Bessière, P., and Laugier, C. (2005). The CyCab: a car-like robot navigating autonomously and safely among pedestrians. Robotics and Autonomous Systems, 50(1). 78

[Puaut and Potop-Butucaru, 2013] Puaut, I. and Potop-Butucaru, D. (2013). Integrated worst-case execution time estimation of multicore applications. In Proceedings WCET’13, Paris, France. to appear. 20, 65, 114

[Qualcomm, 2011] Qualcomm (2011). Snapdragon S4 Processors: Solution for a New Mobile Age. Technical report. 11

[R. Wilhelm et al., 2008] R. Wilhelm et al. (2008). The worst-case execution-time prob- lem overview of methods and survey of tools. ACM TECS, 7(3). 66

[Ramamritham et al., 1993] Ramamritham, K., Fohler, G., and Adan, J. M. (1993). Issues in the static allocation and scheduling of complex periodic tasks. In In Proc. 10th IEEE Workshop on Real-Time Operating Systems and Software. 15 BIBLIOGRAPHY 140

[Sadawarte et al., 2011] Sadawarte, Y. A., A.Gaikwad, M., and M.Patrikar, R. (2011). Implementation of virtual cut-through algorithm for network on chip architecture. IJCA Proceedings on International Symposium on Devices MEMS, Intelligent Systems and Communication (ISDMISC), (1):5–8. Published by Foundation of Computer Science, New York, USA. 28

[Salloum et al., 2013] Salloum, C. E., Elshuber, M., Höftberger, O., Isakovic, H., and Wasicek, A. (2013). The {ACROSS} {MPSoC} – a new generation of multi-core processors designed for safety–critical embedded systems. Microprocessors and Mi- crosystems, 37(8, Part C):1020 – 1032. Special Issue on European Projects in Embed- ded System Design: {EPESD2012}. 39, 47

[Sangiovanni-Vincentelli and Martin, 2001] Sangiovanni-Vincentelli, A. and Martin, G. (2001). Platform-based design and software design methodology for embedded systems. IEEE Design & Test of Computers, 18(6):23–33. 51

[Sgroi et al., 2001] Sgroi, M., Sheets, M., Mihal, A., Keutzer, K., Malik, S., Rabaey, J., and Sangiovanni-Vincentelli, A. (2001). Addressing the system-on-a-chip interconnect woes through communication-based design. In Proceedings of the 38th annual Design Automation Conference, DAC '01, pages 667–672, New York, NY, USA. ACM. 13, 23

[Shi and Burns, 2010] Shi, Z. and Burns, A. (2010). Schedulability analysis and task mapping for real-time on-chip communication. Real-Time Systems, 46(3):360–385. 15, 31, 55

[Sorel, 1994] Sorel, Y. (1994). Massively parallel systems with real time constraints, the Algorithm Architecture Adequation methodology. In Proceedings of Conference on Massively Parallel Computing Systems, MPCS'94, Ischia, Italy. 49

[Sørensen et al., 2012] Sørensen, R. B., Schoeberl, M., and Sparsø, J. (2012). A light-weight statically scheduled network-on-chip. In Proceedings of the 29th NORCHIP Conference, Copenhagen. 30, 40

[Tamir and Chi, 1993] Tamir, Y. and Chi, H.-C. (1993). Symmetric crossbar arbiters for VLSI communication switches. IEEE Transactions on Parallel and Distributed Systems, 4(1):13–27. 46

[Taylor, 2003] Taylor, M. B. (2003). The Raw Processor Specification. MIT. Available online http://groups.csail.mit.edu/cag/raw/documents/RawSpec99.pdf. 38

[Taylor et al., 2004] Taylor, M. B., Psota, J., Saraf, A., Shnidman, N., Strumpen, V., Frank, M., Amarasinghe, S., Agarwal, A., Lee, W., Miller, J., Wentzlaff, D., Bratt, I., Greenwald, B., Hoffmann, H., Johnson, P., and Kim, J. (2004). Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams. SIGARCH Comput. Archit. News, 32(2). 38, 40

[Texas Instruments, 2009] Texas Instruments (2009). OMAP 4: Mobile applications platform. Online http://focus.ti.com/lit/ml/swpt034/swpt034.pdf. 11

[Thonnart et al., 2010] Thonnart, Y., Vivet, P., and Clermidy, F. (2010). A fully-asynchronous low-power framework for GALS NoC integration. In Proceedings DATE'10, pages 33–38, Dresden, Germany. 46

[Tilera, 2008] Tilera (2008). The TilePro64 many-core architecture. Online http://www.tilera.com/sites/default/files/productbriefs/TILEPro64_Processor_PB019_v4.pdf. 11, 24, 29, 31, 32, 38, 40

[Tilera Corporation, 2013] Tilera Corporation (2013). Tile Processor Architecture Overview for the TILEPro Series. Online http://www.tilera.com/scm/docs/UG120-Architecture-Overview-TILEPro.pdf. 41, 42

[TSAR, 2008] TSAR (2008). Tera-scale architecture. Online https://www-asim.lip6.fr/trac/tsar/wiki. 13, 46

[Villalpando et al., 2010] Villalpando, C., Johnson, A., Some, R., Oberlin, J., and Goldberg, S. (2010). Investigation of the Tilera processor for real time hazard detection and avoidance on the Altair lunar lander. In Proceedings of the IEEE Aerospace Conference. 15, 31

[Waingold et al., 1997] Waingold, E., Taylor, M., Srikrishna, D., Sarkar, V., Lee, W., Lee, V., Kim, J., Frank, M., Finch, P., Barua, R., Babb, J., Amarasinghe, S., and Agarwal, A. (1997). Baring it all to software: Raw machines. IEEE Computer, 30(9):86–93. 14, 32, 38, 39, 52

[Wilhelm et al., 2009] Wilhelm, R., Grund, D., Reineke, J., Schlickling, M., Pister, M., and Ferdinand, C. (2009). Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(7):966–978. 43

[Wilhelm and Reineke, 2012] Wilhelm, R. and Reineke, J. (2012). Embedded systems: Many cores – many problems (invited paper). In Proceedings SIES’12, Karlsruhe, Germany. 13, 15, 58

[Xu, 1993] Xu, J. (1993). Multiprocessor scheduling of processes with release times, deadlines, precedence, and exclusion relations. IEEE Transactions on Software Engineering, 19(2):139–154. 49

[Yoon et al., 2010] Yoon, Y., Concer, N., Petracca, M., and Carloni, L. (2010). Virtual channels vs. multiple physical networks: a comparative analysis. In Proceedings DAC, Anaheim, CA, USA. 31

[Yu et al., 2008] Yu, Z., Meeuwsen, M. J., Sattari, O., Lai, M., Webb, J. W., Work, E. W., Truong, D., Mohsenin, T., and Baas, B. M. (2008). AsAP: An asynchronous array of simple processors. IEEE Journal of Solid-State Circuits. 39

[Zhai et al., 2013] Zhai, J. T., Bamakhrama, M., and Stefanov, T. (2013). Exploiting just-enough parallelism when mapping streaming applications in hard real-time systems. In Proceedings DAC. 55

[Zhang, 2011] Zhang, Z. (2011). On the field Detection, De-activation and Reconfiguration (ODDR) mechanism for Permanent Fault-Tolerance of Network-on-Chip. PhD thesis, Université Pierre et Marie Curie. 11

[Zimmermann, 1988] Zimmermann, H. (1988). Innovations in internetworking, chapter OSI Reference Model: The ISO Model of Architecture for Open Systems Interconnection, pages 2–9. Artech House, Inc., Norwood, MA, USA. 23