Mapping of Large Task Network on Manycore Architecture Karl-Eduard Berger

Mapping of large task network on manycore architecture Karl-Eduard Berger To cite this version: Karl-Eduard Berger. Mapping of large task network on manycore architecture. Distributed, Par- allel, and Cluster Computing [cs.DC]. Université Paris Saclay (COmUE), 2015. English. NNT : 2015SACLV026. tel-01318824 HAL Id: tel-01318824 https://tel.archives-ouvertes.fr/tel-01318824 Submitted on 20 May 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. NNT : 2015SACLV026 THESE DE DOCTORAT DE L’UNIVERSITE PARIS-SACLAY PREPAREE A “ L’UNIVERSITE VERSAILLES SAINT-QUENTIN EN YVELINES ” ECOLE DOCTORALE N° 580 Sciences et technologies de l’information et de la communication Mathématiques et Informatique Par M. Karl-Eduard Berger Placement de graphes de tâches de grande taille sur architectures massivement multicoeur Thèse présentée et soutenue à CEA Saclay Nano-INNOV, le 8 décembre 2015 : Composition du Jury : M., El Baz, Didier Chargé de recherche CNRS, LAAS Rapporteur M, Galea, François Chercheur, CEA Encadrant CEA M, Le Cun, Bertrand Maitre de conférence PRISM Co-directeur de thèse Mme, Munier, Alix Professeur des universités, UPMC Examinatrice M, Nace, Dritan Professeur des universités HEUDIASYC Président M., Pellegrini, François Professeur des universités, LABRI Rapporteur M, Sirdey, Renaud Directeur de recherche, CEA Directeur de thèse Titre : Placement de graphes de tâches de grande taille sur architectures massivement multicoeurs Mots clés : Placement de tâches, recherche opérationnelle, architectures manycoeurs Résumé : Ce travail de thèse de doctorat est Cette comparaison sert de métrique d'évaluation dédié à l'étude d'un problème de placement de des placements des différentes méthodes tâches dans le domaine de la compilation proposées. Afin de résoudre à ce problème, trois d'applications pour des architectures algorithmes de placement ont été développés. massivement parallèles. Ce problème de Task-wise Placement et Subgraph-wise placement doit être résolu dans le respect de Placement s'appliquent dans des cas où les poids trois critères: les algorithmes doivent être des tâches et des arêtes sont unitaires. capables de traiter des applications de tailles Regret-based Approach est une heuristique de variables, ils doivent répondre aux contraintes construction progressive basée sur la théorie des de capacités des processeurs et prendre en jeux qui s’applique sur des graphes dans compte la topologie des architectures cibles. lesquels les poids des tâches et des arêtes sont Dans cette thèse, les tâches sont organisées en variables similairement aux valeurs qu'on peut réseaux de communication, modélisés sous retrouver dans des cas industriels. forme de graphes. Pour évaluer la qualité des Afin de vérifier la robustesse de l'algorithme, solutions produites par les algorithmes, les différents types de graphes de tâches de tailles placements obtenus sont comparés avec un variables ont été générés. placement aléatoire. Title : Mapping of large task networks on manycore architecture Keywords : Tasks mapping, Operational Research, manycore architecture Abstract: This Ph.D thesis is devoted to the In order to address this problem, three study of the mapping problem related to heuristics are proposed. They are able to solve massively parallel embedded architectures. a dataflow process network mapping problem, This problem has to be solved considering where a network of communicating tasks is three criteria: heuristics should be able to deal placed into a set of processors with limited with applications with various sizes, they must resource capacities, while minimizing the meet the constraints of capacities of processors overall communication bandwidth between and they have to take into account the target processors. Task-wise Placement and architecture topologies. In this thesis, tasks are Subgraph-wise Placement are applied on task organized in communication networks, graphs where weights of tasks and edges are modeled as graphs. In order to determine a way unitary set. Then, in a will to address problems of evaluating the efficiency of the developed that can be found in industrial cases, heuristics, mappings, obtained by the application cases are widen to tasks graphs heuristics, are compared to a random mapping. with tasks and edges weights values similar to This comparison is used as an evaluation those that can be found in the industry. A metric throughout this thesis. The existence of progressive construction heuristic named this metric is motivated by the fact that no Regret Based Approach, based on game theory, comparative heuristics can be found in the is proposed In order to check the strength of the literature at the time of writing of this thesis. algorithm; many types of task graphs with various sizes are generated. Université Paris-Saclay Espace Technologique / Immeuble Discovery Route de l’Orme aux Merisiers RD 128 / 91190 Saint-Aubin, France To Hélène, Stefan and Maximilien Berger, Ingeborg und Günter Berger, Mireille and Martial Labouise, Jeanne Clermon et Huguette Chaumereux, And my friend Kristina Guseva. 4 He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast. Leonardo de Vinci 5 6 Contents Acknowledgement 1 Publications 5 Résumé 7 Abstract 9 RésuméFran¸cais 11 Introduction 17 1 Context 25 1.1 Introduction.................................. 25 1.2 Moore’slaw .................................. 26 1.3 Embeddedsystems .............................. 28 1.3.1 Whatisanembeddedsystem?. 28 1.3.2 Massivelyparallelembeddedsystems . .. 29 1.4 Manycorearchitectures. 30 1.4.1 Parallelarchitectures . 30 1.4.2 Massively Parallel Processor Architectures (MPPA) . ..... 32 1.5 SDFandCSDFdataflowprogrammingmodels . 34 1.6 ΣCprogramminglanguageandcompilation . ... 36 1.6.1 ΣCprogramminglanguage. 36 1.6.2 An Example ΣC Application: Laplacian of an Image . 37 1.6.3 Thecompilationprocess . 39 1.7 FormalizationoftheDPNmappingproblem . ... 41 1.8 A metric to evaluate the quality of solutions . ...... 43 1.8.1 Approximation measure of Demange and Paschos . 43 1.8.2 Random-basedapproximationmetric . 44 1.9 GraphTheoryBackground. 45 1.9.1 SomeDefinitions ........................... 45 1.9.2 Breadth-First Traversal (BFT) algorithm . .. 47 1.9.3 Notionofaffinity ........................... 47 1.10Conclusion................................... 48 i ii CONTENTS 2 State of the Art of the Mapping Problem 49 2.1 Introduction.................................. 49 2.2 Theimportanceofmappingapplications . .... 50 2.3 Partitioningproblems. 51 2.3.1 Bipartitioningalgorithms. 52 2.3.2 Multilevelapproaches. 52 2.4 QuadraticAssignmentProblems(QAP) . 53 2.5 Mappingproblems .............................. 54 2.5.1 Two-phasesmappingheuristics . 54 2.5.2 One-phasemappingheuristics . 59 2.6 Solvers..................................... 62 2.6.1 Metis,Parmetis,hMetis,kMetis . 62 2.6.2 ScotchandPT-Scotch . 62 2.6.3 Jostle: Parallel Multi-Level Graph Partitioning Software ..... 63 2.6.4 Othersolvers ............................. 64 2.7 Discussion................................... 65 2.7.1 Overview ............................... 65 2.7.2 Ourwork ............................... 66 2.8 Conclusion................................... 66 3 Two Scalable Mapping Methods 69 3.1 Introduction.................................. 69 3.2 Subgraph-WisePlacement . 70 3.2.1 CreationofaSubgraphofTasks. 70 3.2.2 Subgraphtonodeaffinity . 72 3.2.3 Complexityofthealgorithm . 72 3.2.4 Conclusion............................... 73 3.3 Task-WisePlacement............................. 73 3.3.1 Distanceaffinity............................ 75 3.3.2 Themappingprocedure . 76 3.3.3 ComplexityoftheAlgorithm. 78 3.3.4 Conclusion............................... 78 3.4 Results..................................... 78 3.4.1 ExecutionPlatform. 78 3.4.2 Instances ............................... 79 3.4.3 Computationalresults . 79 3.5 Conclusion................................... 84 4 Regret-Based Mapping Method 85 4.1 Introduction.................................. 85 4.2 Gametheorybackground........................... 86 4.2.1 Overviewofgametheory. 86 4.2.2 Behavioralgametheory . 87 4.2.3 Decisiontheory ............................ 89 CONTENTS iii 4.3 Regrettheory ................................. 91 4.3.1 External regret, internal regret and swap regret . ...... 92 4.3.2 Formaldefinition ........................... 93 4.3.3 Useofregrettheoryintheliterature . 94 4.4 RegretBasedHeuristic ............................ 96 4.4.1 Task-Wise Placement behavior on non-unitary task graphs.... 96 4.4.2 Introductionofthetaskcostnotion . .. 97 4.4.3 A new task selection model based on regret theory . ... 97 4.4.4 Descriptionofthealgorithm . 98 4.4.5 Complexityofthealgorithm . 98 4.5 GRASPPrinciplesforRBA . 101 4.5.1 DefinitionoftheGRASPprocedure . 101 4.5.2 GRASPandRBA........................... 102 4.6 Applicationoftheheuristic . 102 4.6.1 Executionplatform . 102 4.6.2 Instances ............................... 102 4.6.3 Thenodelayout...........................

Mapping of Large Task Network on Manycore Architecture Karl-Eduard Berger

Increasing Memory Miss Tolerance for SIMD Cores

Tousimojarad, Ashkan (2016) GPRM: a High Performance Programming Framework for Manycore Processors. Phd Thesis

Multi-Core Processors and Systems: State-Of-The-Art and Study of Performance Increase

Understanding and Guiding the Computing Resource Management in a Runtime Stacking Context

Consolidating High-Integrity, High-Performance, and Cyber-Security Functions on a Manycore Processor

Parallel Processing with the MPPA Manycore Processor

A Scalable Manycore Simulator for the Epiphany Architecture

Power and Energy Characterization of an Open Source 25-Core Manycore Processor

MALT Is a Truly Manycore Processor  [email protected] ✆ +7 (495) 133 6248

Enhancing Programmability in Noc-Based Lightweight Manycore Processors with a Portable MPI Library

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI James A

Manycore Platforms