Approche Logicielle Pour Améliorer La Fiabilité D'applications Parallèles
Total Page:16
File Type:pdf, Size:1020Kb
THÈSE Pour obtenir le grade de DOCTEUR DE LA COMMUNAUTÉ UNIVERSITÉ GRENOBLE ALPES Spécialité : NANO ÉLECTRONIQUE ET NANO TECHNOLOGIES Arrêté ministérial : 25 mai 2016 Présentée par Vanessa Carolina VARGAS VALLEJO Thèse dirigée par Raoul VELAZCO, Directeur de Recherche, CNRS et codirigée par Jean-François MÉHAUT, Professeur, Grenoble UJF préparée au sein Laboratoire Techniques de l’informatique et de la Mi- croélectronique pour l’Architecture des Systèmes Intégrées et de l’École Doctorale Électronique, Électrotechnique, Automatique, Traitement du Signal (EEATS) Approche logicielle pour améliorer la fiabilité d’applications parallèles implémentées sur des processeurs multi-cœur et many-cœur Thèse soutenue publiquement le 28 avril 2017, devant le jury composé de : Monsieur Fréderic PÉTROT Professeur, Président, Grenoble INP Monsieur Luc GIRAUD Directeur de recherche, Rapporteur, INRIA Délegation aquitaine Monsieur Olivier ROMAIN Professeur, Rapporteur, Université de Cergy-Pontoise Monsieur Nacer-Eddine ZERGAINOH Maître de conférences, Examinateur, Université Grenoble Alpes Acknowledgement First and all above, I would like to thank God, the almighty, for all the blessings He shed in my life, without Him and his support this work would not be possible. Special thanks to the Ecuadorian government through the Secretaría de Educación Superior, Ciencia, Tecnología e Innovación del Ecuador and Universidad de las Fuerzas Armadas ESPE for the confidence placed in me and my abilities to fulfill this challenge. I would like to express my deep thankfulness and appreciation to my academic advisor, Dr. Raoul Velazco for his helpful guidance, valuable availability and precious optimism throughout the development of this research. I want to express my deep gratitude to my academic co-advisor, Prof. Jean -François Méhaut for his constant involvement, constructive advices, and priceless readiness during this thesis. Special thanks to my thesis reviewers Prof. Luc Giraud et Prof. Olivier Romain, for their invaluable comments and relevant questions that help me to improve this document. I acknowledge all the staff of TIMA and LIG laboratories for their hospitality and cooperation, special thanks to Anne-Laure Fourneret, Fréderic Chevrot and Christian Seguy for their availability and service. I would like to greatly appreciate Maud Baylac, Francesca Villa and Solene Rey from the LPSC laboratory for their long-lasting cooperation in the execution of the radiation tests experiments. I want to express my gratitude to Stéphane Gailhard and Renaud Stevens from Kalray Company for their support in the applications’ development and the experiments on the MPPA-256 many-core processor. I would like to thank all my friends that I met during this stage of my life, specially Robert, Jessica, Alejandro, Audrey, Martin, Anaïs, Adrien, Cristina et Edison. Special deep thanks to my mother, my mother-in-law and my aunt Talita for all the efforts and sacrifices that they have made on behalf of my family and for taking care with love of my children during different periods along these research years. I want to thank all my family for their prays, invaluable support and love, specially to my parents, who have always trusted in me and encouraged me to fight for my dreams, and my siblings for their unconditional availability, confidence and help during these years. I want to thank my little children, for their great comprehension and tenderness, for their smiles, kisses and hugs that fortify my soul. Lastly, I want to thank my wonderful husband for his motivation, encouragement, and patience during the development of this thesis, for support me every time especially in the tough moments, love me and make me happy. Vanessa Vargas Dedication To the love of my life, Pablo, and my tresors, Matías and Emmanuel... To my beloved parents Guido and Lorgia... "The Lord is my strength and my shield; my heart trusts in him, and he helps me" Ps 28:7 Contents List of Tables vi List of Figures viii List of abbreviations ix 1 Introduction 1 1.1 Research issues . .3 1.2 Context of the thesis . .4 1.3 Thesis outline . .4 2 Background 7 2.1 Radiation Effects on Electronic circuits . .7 2.1.1 Radiation Environment . .7 2.1.2 Single Event Effects . .9 2.1.3 Consequences caused by transient effects . 10 2.1.4 Evaluation of the radiation effects on electronic circuits . 11 2.1.5 Mitigation of Radiation effects . 13 2.2 Multi-core and many-core processors . 14 2.2.1 Architectural concept . 14 2.2.2 Memory hierarchy models . 16 2.2.3 Software issues . 17 2.2.4 Sensitivity to neutron radiation . 20 2.3 Reliability of multi-core and many-core . 23 2.3.1 Conceptual basis . 23 2.3.2 Redundancy techniques . 26 i 2.3.3 Partitioning . 28 2.4 Discussion . 29 3 Experimental Methodology to evaluate the impact of SEEs Sensitivity 31 3.1 Generalities . 31 3.2 Evaluation strategies . 33 3.2.1 Fault-injection strategy . 34 3.2.2 Neutron-radiation facility . 35 3.3 Experimental Platforms . 36 3.3.1 P2041RDB Development Board . 37 3.3.2 MPPA Developer . 37 3.4 Benchmark applications . 38 3.4.1 Traveling Salesman Problem . 38 3.4.2 Matrix Multiplication . 40 3.4.3 Intrinsic characteristic issues . 41 3.5 Concluding Remarks . 41 4 Case-studies based on the multi-core processor 43 4.1 Description of the P2041 multi-core processor . 43 4.2 Case-study A: Multi-core processor running each core independently . 45 4.2.1 System configuration . 45 4.2.2 Experimental evaluation by neutron radiation . 46 4.3 Case-study B: Multi-core implementing TMR as a fault-tolerant technique 47 4.3.1 System configuration . 48 4.3.2 Fault-injector details . 48 4.3.3 Experimental evaluation by fault injection . 49 4.4 Case-study C: Multi-core sharing resources and maximizing inter-core com- munication . 52 4.4.1 System configuration . 53 4.4.2 Fault-injector details . 54 4.4.3 Experimental evaluation by fault injection . 55 4.4.4 Experimental evaluation by neutron radiation . 59 4.5 Discussion of the obtained results . 63 ii 4.6 Concluding Remarks . 67 5 Case-studies based on the many-core processor 69 5.1 Description of the MPPA-256 many-core processor . 69 5.1.1 Sensitive zones . 71 5.1.2 Programming MPPA issues . 72 5.2 Case-study D: Many-core execution with minimal use of Network-on-Chip (NoC) services . 72 5.2.1 System configuration . 73 5.2.2 Fault-injector details . 74 5.2.3 Experimental evaluation by fault injection . 75 5.2.4 Experimental evaluation by neutron radiation . 76 5.3 Case-study E: Many-core execution with intensive use of NoC services. 78 5.3.1 System configuration . 78 5.3.2 Benchmark details . 80 5.3.3 Fault-injector details . 83 5.3.4 Experimental evaluation by fault injection . 83 5.4 Discussion of the overall results . 87 5.5 Concluding remarks . 90 6 NMR-MPar: a fault-tolerant approach 93 6.1 Description . 93 6.2 Case-Study F: NMR-MPar implemented on the MPPA-256 processor . 95 6.2.1 System configuration . 95 6.2.2 Fault-injector details . 100 6.2.3 Experimental evaluation by fault injection . 100 6.3 Overall result comparison . 103 6.4 Concluding remarks . 106 7 Related work 107 7.1 SEE sensitivity of multi-core and many-core processors . 107 7.2 Reliability in multi-core and many-core processors . 108 7.2.1 Fault-tolerance by redundancy . 109 iii 7.2.2 Fault-tolerance by partitioning . 110 7.3 Contributions to the state-of-the-art . 111 8 Conclusions and perspectives 113 8.1 Contributions . 114 8.2 Future Works . 115 Annexe 117 A Implementation details of case-study F . 117 B Publications during this thesis . 121 Extended Abstract in French 123 Projects references 137 Bibliography 139 iv List of Tables 4.1 Sensitive areas of the P2041 targeted in this work . 44 4.2 Results of radiation experiments for the P2041 working in AMP scenario 46 4.3 Results of fault injection at register level for TMR MM AMP . 50 4.4 Results of fault injection in variables for TMR MM AMP . 51 4.5 Injection Error-Rates in Variables for TMR MM AMP . 52 4.6 Applications summary for TMR scenatio . 55 4.7 Fault-injection campaigns’ details for TMR scenario . 56 4.8 Results of fault injection in registers for SMP scenarios . 56 4.9 Results of fault injection in memory for SMP scenarios . 57 4.10 Error-rates for SMP scenarios in P2041 . 58 4.11 Test campaigns characteristics for TSP-SMP-P2041-NR scenario . 59 4.12 Results of radiation experiments for TSP-SMP-P2041-NR scenario . 60 4.13 Test campaigns characteristics for MM-SMP-P2041-NR scenario . 61 4.14 Results of neutron radiation tests for MM-SMP-P2041-NR . 62 5.1 Memory cells of the MPPA-256 many-core processor . 71 5.2 Results of the fault-injection campaigns for MM-AMP-MPPA . 75 5.3 Results of the dynamic radiation tests for MPPA scenarios . 77 5.4 Standard execution time for different configuration of TSP on the MPPA 80 5.5 Fault-injection campaigns details for the Multi Purpose Processing Array (MPPA) . 84 5.6 Results of the fault-injection campaigns on applications running on the MPPA..................................... 84 5.7 Error-rate by fault injection for POSIX scenarios in MPPA . 85 5.8 Summary of the different fault-injection scenarios evaluated on the MPPA 88 v 6.1 Execution-time comparison for different Traveling Salesman Problem (TSP) configurations on the MPPA . 99 6.2 NoC connections on applications running NMR-MPar on the MPPA . 99 6.3 Fault-injection campaigns details of the 4MR-8Par applications running on the MPPA .