Génération dynamique de code pour l'optimisation énergétique
THESIS
Submitted for the degree of DOCTOR OF THE UNIVERSITÉ DE GRENOBLE
Specialty: Computer Science
Ministerial decree: 7 August 2006

Presented by Fernando Akira ENDO

Thesis supervised by Henri-Pierre CHARLES, prepared at the Laboratoire Infrastructures Atelier Logiciel pour Puce, CEA Grenoble, and the École Doctorale Mathématiques, Sciences et Technologies de l'Information, Informatique

Génération dynamique de code pour l'optimisation énergétique
(Dynamic code generation for energy optimization)

Thesis publicly defended on 18 September 2015 before a jury composed of:
M. Frédéric PÉTROT, Professor at Grenoble Institute of Technology, President
M. Florent DE DINECHIN, Professor at INSA de Lyon, Reviewer
M. Paul KELLY, Professor at Imperial College London, Reviewer
Mme Karine HEYDEMANN, Associate Professor at Université Pierre et Marie Curie, Examiner
M. Henri-Pierre CHARLES, Research Director at CEA LIST, Thesis supervisor
M. Damien COUROUSSÉ, Research Engineer at CEA LIST, Grenoble, Thesis co-advisor

I dedicate this thesis to my family, who have always supported me.

Acknowledgments

I would like to thank everyone in the laboratory, who welcomed me during these three years of doctoral work, and CEA for funding this thesis. I also thank the members of the examining committee, in particular the reviewers of the dissertation, Dr. Florent de Dinechin and Dr. Paul Kelly, for their suggestions and corrections.

I would especially like to thank my friends Fayçal Benaziz and Thibault Cattelani, who helped me revise and correct the French summary of the thesis, as well as Alexandre Aminot, Ivan Llopard, Laurentiu Trifan, Thierno Barry, Tiana Rakotovao and Victor Lomüller, who reviewed and corrected my papers, which were in turn integrated into the thesis.

I would also like to thank UNICAMP, the BRAFITEC program and INSA de Lyon, without which I would not have had the opportunity to do an exchange in France and obtain a French degree, which facilitated my application to the doctoral program.
Finally, I would like to thank Phi Innovations: much of the knowledge I gained at that company, together with the BeagleBoard-xM I received as a gift, was useful to the technical and scientific development of my work.

Contents

I Thesis

1 Introduction
  1.1 Thesis contribution
    1.1.1 Run-time code generation and auto-tuning for embedded systems
    1.1.2 Micro-architectural simulation of ARM cores
  1.2 Thesis organization

2 State of the art
  2.1 Sources of energy consumption in ICs
    2.1.1 Static or leakage power
    2.1.2 Dynamic power
  2.2 Energy reduction techniques integrated into compilers
    2.2.1 Energy reduction in software
    2.2.2 Compiler techniques
  2.3 The ARM architecture
  2.4 Embedded processor simulation
    2.4.1 Abstraction levels
    2.4.2 Micro-architectural performance simulation
    2.4.3 Micro-architectural energy simulation
  2.5 Run-time code optimizations
    2.5.1 Run-time code specialization
    2.5.2 Dynamic binary optimizations
    2.5.3 Run-time recompilation
    2.5.4 Online auto-tuning
  2.6 Conclusion

3 Micro-architectural simulation of ARM processors
  3.1 gem5
    3.1.1 The arm_detailed configuration
    3.1.2 Modeling improvements
    3.1.3 In-order model based on the O3 CPU model
  3.2 McPAT
    3.2.1 Overview
    3.2.2 Better modeling core heterogeneity
  3.3 Parameters and statistics conversion from gem5 to McPAT
  3.4 Performance validation
    3.4.1 Reference models
    3.4.2 Simulation models
    3.4.3 Benchmarks
    3.4.4 Accuracy evaluation of the Cortex-A models
    3.4.5 In-order model behavior and improvement for a Cortex-A8
  3.5 Area and relative energy/performance validation
    3.5.1 Reference models
    3.5.2 Simulation models
    3.5.3 Benchmarks
    3.5.4 Area validation
    3.5.5 Relative energy/performance validation
  3.6 Example of architectural/micro-architectural exploration
  3.7 Scope and limitations
    3.7.1 Scope
    3.7.2 Limitations
  3.8 Conclusion

4 Run-time code generation
  4.1 deGoal: a tool to embed dynamic code generators into applications
    4.1.1 Utilization workflow
    4.1.2 Example of kernel implementation: C with and without SIMD intrinsics and deGoal versions
    4.1.3 The Begin and End commands
    4.1.4 Register allocation
    4.1.5 Code generation decisions: deGoal mixed to C code
    4.1.6 Branches and loops
  4.2 Thesis contribution: New features and porting to ARM processors
    4.2.1 Overview of contributions
    4.2.2 SISD and SIMD code generation
    4.2.3 Configurable instruction scheduler
    4.2.4 Static and dynamic configuration
    4.2.5 Further improvements and discussion
  4.3 Performance analysis
    4.3.1 Evaluation boards
    4.3.2 Benchmarks and deGoal kernels
    4.3.3 Raw performance evaluation
    4.3.4 Transparent vectorization: SISD vs SIMD code generation
    4.3.5 Dynamic code specialization
    4.3.6 Run-time auto-tuning possibilities with deGoal
  4.4 Scope and limitations
    4.4.1 Scope
    4.4.2 Limitations
  4.5 Conclusion

5 Online auto-tuning for embedded systems
  5.1 Motivational example
  5.2 Methodology
    5.2.1 Auto-tuning with deGoal
    5.2.2 Regeneration decision and space exploration
    5.2.3 Kernel evaluation and replacement
  5.3 Experimental setup
    5.3.1 Hardware platforms
    5.3.2 Simulation platform
    5.3.3 Benchmarks
    5.3.4 Evaluation methodology
  5.4 Experimental results
    5.4.1 Real platforms
    5.4.2 Simulated cores
    5.4.3 Analysis with varying workload
    5.4.4 Analysis of correlation between auto-tuning parameters and pipeline designs
  5.5 Scope, limitations and future work
    5.5.1 Scope
    5.5.2 Limitations
    5.5.3 Future work
  5.6 Conclusion

6 Conclusion and prospects
  6.1 Achievements
    6.1.1 Embedded core simulation with gem5 and McPAT
    6.1.2 Run-time code generation and auto-tuning for embedded systems