Acceleration of memory accesses in dynamic binary translation Antoine Faravelon

To cite this version:

Antoine Faravelon. Acceleration of memory accesses in dynamic binary translation. Operating Systems [cs.OS]. Université Grenoble Alpes, 2018. English. NNT: 2018GREAM050. tel-02004524

HAL Id: tel-02004524 https://tel.archives-ouvertes.fr/tel-02004524 Submitted on 1 Feb 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

THESIS
To obtain the degree of Doctor of the Communauté Université Grenoble Alpes
Speciality: Computer Science
Ministerial decree: 25 May 2016

Presented by Antoine FARAVELON

Thesis supervised by Frédéric PETROT, Professor, Grenoble INP, and co-supervised by Olivier GRUBER, prepared at the Laboratoire Techniques de l'Informatique et de la Microélectronique pour l'Architecture des systèmes intégrés (TIMA), within the doctoral school Mathématiques, Sciences et Technologies de l'Information, Informatique (MSTII).

Accélération des accès mémoire dans la traduction binaire dynamique

Acceleration of memory accesses in dynamic binary translation

Thesis publicly defended on 22 October 2018, before a jury composed of:
Frédéric PETROT, Professor, Grenoble INP, thesis supervisor
Erven ROHOU, Research Director, INRIA Centre Rennes - Bretagne Atlantique, reviewer
Gaël THOMAS, Professor, Télécom SudParis, reviewer
Florence MARANINCHI, Professor, Grenoble INP, president of the jury
Alain TCHANA, Associate Professor, INP Toulouse - ENSEEIHT, examiner
Olivier GRUBER, Professor, Université Grenoble Alpes, thesis co-supervisor

Acknowledgements

First of all, I would like to thank my thesis supervisor, Frédéric Pétrot. Despite a schedule made very busy by his responsibilities, he was always available to discuss and to accompany me throughout this adventure, even though I was not the easiest doctoral student to supervise (let us just mention my tendency to work at night). I also thank Olivier Gruber for co-supervising this thesis and contributing greatly to its elaboration, and for keeping me motivated with his very own brand of optimism. I of course thank Nicolas Fournel who, besides giving me a taste for hardware architectures, is also the one who enabled me to join the TIMA laboratory for my magistère, thus opening the way to this thesis. I then thank my thesis jury: Florence Maraninchi for agreeing to preside over it, Erven Rohou and Gaël Thomas for their very interesting and very positive feedback, and Alain Tchana for examining this thesis and for his very interesting remarks and perspectives. More broadly, I thank all those who accompanied me during these three years, in SLS and at GreenSocs: Luc and Clément, who endured my barrages of questions; Adrien P., Thomas and Olivier M., for all the discussions we had, technical or not, with whom one can always set the world to rights; Adrien B., with whom I cut my teeth on QEMU. To Martial and Arief, who lived through the last months of the thesis with me, between writing and preparing for the defence. And of course a big thank you to Enzo, who was an excellent audience for the rehearsals of my presentation. Finally, a big thank you to Laurence, with whom teaching at UGA and the end of my thesis were far more fun! And I do not forget the essential: I thank my wife, Danmei, whom I also met in the SLS team. Thank you for giving me the desire to carry on and for supporting me at every moment. Without her, this thesis would probably not have been the same success.


Contents

1 Introduction

2 Problem statement
  2.1 Instruction Set Simulation Techniques
  2.2 Full system simulation with DBT
    2.2.1 Dynamic Binary Translation principle
    2.2.2 Simulating complex instructions
    2.2.3 Emulating memory accesses
  2.3 Conclusion

3 State of the art
  3.1 Dynamic Binary Translation
  3.2 Accelerating Full System Dynamic Binary Translation
    3.2.1 Generated code optimizations techniques
    3.2.2 Multi Threading support
  3.3 Accelerating Memory Accesses Simulation
    3.3.1 TLB based solutions
  3.4 Conclusion and Remaining questions

4 Accelerating DBT using hardware assisted virtualisation for cross-ISA simulation
  4.1 Introduction
  4.2 Background
    4.2.1 Hardware Assisted Virtualization
    4.2.2 Dune framework
  4.3 Design
  4.4 Implementation
  4.5 Conclusion

5 Cross-ISA simulation using a kernel module and host page tables
  5.1 Introduction
  5.2 Design of our Solution
  5.3 QEMU/Linux Implementation
    5.3.1 Kernel module
    5.3.2 Dynamic binary translation
  5.4 Handling Internal Faults at Kernel Level
  5.5 Conclusion

6 Experimentations
  6.1 Introduction
    6.1.1 Benchmark set
  6.2 Breakdown of QEMU execution time
    6.2.1 Ratio of time spent in subparts of QEMU
    6.2.2 Instruction breakdown: percentage of memory accesses in the benchmark
    6.2.3 Efficiency of existing memory accesses simulation acceleration
  6.3 Dune based solution
    6.3.1 Protocol
    6.3.2 Performance Analysis
    6.3.3 Communication costs
  6.4 Linux Kernel Module solution
    6.4.1 Performance Overview
    6.4.2 Hybrid solution analysis
    6.4.3 Address space mapping analysis
    6.4.4 Performances with in kernel fault handling
  6.5 Conclusion and comparison of both methods

7 Conclusion
  7.1 Prospective
    7.1.1 N privilege levels support
    7.1.2 QEMU Multi threading support
    7.1.3 64-bit targets support
    7.1.4 Handling write accesses with in kernel internal faults handling

Chapter 1

Introduction

Computing systems are now ubiquitous in our daily life, be it in the form of a computer, or that of a device that performs computations unbeknownst to its user. These systems have taken a major place in modern society thanks to silicon integration, which makes them smaller and smaller, and yet more and more feature-rich, leading to the advent of the System on Chip (SoC), now, and for some time already, the dominant way of selling silicon. The integration trend makes the design and programming of these chips more and more complicated. It is indeed nowadays fairly common for the relatively "general purpose" ones to feature 8 or more cores, gigabytes of RAM and advanced graphics units. To make matters worse, chip makers are producing dozens of different chips each year. Because of this, neither designers, who need to test their designs, nor developers, who need their applications to be available as soon as possible for the new chips, can wait for the final tape out. A partial solution to these problems is the use of simulation. Instead of waiting for the chip to be available, a software model of it is conceived and executed on top of a general purpose computer. Amongst the models, one is of particular interest to us: the model of the processor that executes the code of the operating system and applications. To model instruction set architectures, multiple techniques have been developed throughout time. Historically, the first and simplest one has been instruction per instruction interpretation. However, it has been shown to be very slow. To avoid this problem, more sophisticated techniques were developed. Amongst them we can find dynamic binary translation, static binary translation and native (or compiled) simulation. Although the fastest, static binary translation, and native simulation even more so, cannot directly run arbitrary unmodified target code. On the other hand, dynamic binary translation is fairly balanced. It is relatively fast (from four to twenty times slower than native execution) and precise enough to run unmodified target software. It is at the base of many industrial grade simulators and in particular of QEMU, which is one of the most widely used simulators nowadays. QEMU itself is used in industrial grade products such as the Android emulator, and it is also the official KVM frontend. While dynamic binary translation already provides satisfying performance, it still has multiple aspects that could be improved. Out of those, one important bottleneck is the memory subsystem. Each and every target memory access actually has to go through multiple intermediary steps to be executed. In particular, if the architecture we wish to simulate, called the target, uses virtual memory, the target memory management unit has to be simulated. The result is called a software MMU. As it is done in software, this simulation will automatically be much slower than the target hardware.

In this thesis, we present solutions to improve memory accesses simulation speed in the context of cross-ISA dynamic binary translation. More precisely, we chose to implement our solutions in QEMU. To do so, we offload the management of target memory accesses to the host hardware, so as to obtain near native performance. This has been done through two methods, one based on hardware assisted virtualization (HAV), and one based on a companion Linux kernel module for QEMU. In both cases, the goal is to give QEMU access to a fully controllable address space, which in turn can be used to provide the target with its own native-like address space. Our best performing solution currently yields a 78% speedup over baseline on average, with speedups of up to 6 times over baseline. It can manage multiple QEMU targets, and runs full fledged OSes such as Linux. Very few modifications have been made to QEMU, and the Linux kernel module is fairly small and maintainable.

The manuscript is organized as follows. Chapter 2 details precisely the problems we identified. Chapter 3 explores solutions which have already been proposed for these problems, and details what we consider is left to be done. Chapters 4 and 5 present the solutions we designed and developed to address the remaining problems. In Chapter 6 we present the performance analysis we made, and use it to compare our two solutions with each other, and with existing solutions from the literature. Finally, Chapter 7 concludes this thesis and draws some research perspectives.

Chapter 2

Problem statement

In this chapter, we shall present the main problems faced by fast CPU simulation in the context of full system instruction set simulation (ISS). In particular, we shall talk about the main technologies used to face this problem and their limitations. We will more particularly focus on the limitations of dynamic binary translation (DBT), currently the most efficient and also probably the most widespread amongst these technologies. We will finally detail the problem at the core of this work: simulating memory accesses in full system cross-ISA simulation using DBT.

2.1 Instruction Set Simulation Techniques

In this section we cover the most common instruction set simulation techniques that exist today. These techniques rely on translating the instruction set of a given architecture, called the target, onto that of another one on which the code will be executed, the host. Multiple techniques exist to reach that goal, and each of them has its advantages and drawbacks. However, they tend to follow a simple rule: the faster the technique is, the less accurate it is. While accuracy, in terms of estimating the time it takes to execute some program on the target for example, is not always an important concern, abstracting away too many details may also lead to losing the ability to simulate any type of code. We now present these techniques ordered from the slowest to the fastest in terms of simulation speed, and thus, consequently, also from the most to the least accurate.

Interpretation

Interpretation is probably the oldest and the most accurate simulation technique. The process is as follows. First we read the current instruction, whose address is given by the CPU program counter register, from the target memory model. Then this instruction is decoded so as to extract the type of operation to perform, its inputs, its outputs and its possible side effects. Once this is done, a corresponding piece of code modelling this operation is executed. This principle is outlined in Algorithm 1. We first retrieve the instruction from the mem array which represents the memory. We then decode it, and two cases are presented. Either it is a simple register to register operation like the "ADD", in which case we simply read all source registers concerned by the operation, perform the computation and store the result in the destination register. Or it is a memory access, as illustrated by the "LDR" case, and then we first read the src register that contains the virtual address at which the access should be performed, translate it, if needed, to a physical address by making a target page walk, and finally store the result in the destination register.


    while simulation is running do
        instruction ← mem[pc]
        dec ← decode(instruction)
        switch dec.opcode do
            ...
            case ADD do
                regs[dec.dest] ← regs[dec.src1] + regs[dec.src2]
            end
            ...
            case LDR do
                phys_addr ← softmmu_translate(regs[dec.src])
                regs[dec.dest] ← mem[phys_addr]
            end
            ...
        end
    end

Algorithm 1: Typical interpretation based simulator algorithm

The process has the advantage of being fairly accurate, as it progresses instruction after instruction. It can even be made cycle accurate. It is also able to run any type of target code, without modifications. Another big advantage is that it is fully portable, i.e. it does not depend on the host. A fairly popular i386/x64 simulator, often used for OS development, Bochs [MS08], still uses this process nowadays. Its main disadvantage is that its speed is relatively limited. Works from [RBMD03, PGH+11] show that each instruction execution is at least one hundred times slower than native code execution. Some optimizations do exist, the main one, called predecoding, being to store instructions in their decoded form in some cache (usually the model of the instruction cache) and to directly reuse this decoded form when the same program counter is reached and the cache hits [TGN95].
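A minimal sketch of such an interpreter loop, including the predecoding optimization, could look as follows in C. The helper functions (fetch, decode, softmmu_translate), the two-opcode ISA and the cache size are hypothetical; only the structure of the loop matters.

    #include <stdint.h>
    #include <string.h>

    #define PREDECODE_SIZE 4096      /* direct-mapped predecode cache, hypothetical size */

    enum opcode { OP_ADD, OP_LDR /* ... */ };

    typedef struct {
        uint32_t    pc;              /* guest PC this entry decodes (tag) */
        enum opcode op;
        uint8_t     dest, src1, src2;
        int         valid;
    } decoded_insn;

    /* Assumed to exist elsewhere in the simulator. */
    extern uint32_t     fetch(uint32_t pc);                  /* read the raw instruction word */
    extern decoded_insn decode(uint32_t pc, uint32_t raw);   /* full decode, fills in pc too */
    extern uint32_t     softmmu_translate(uint32_t vaddr);   /* guest virtual to physical */
    extern uint8_t      mem[];                               /* guest physical memory model */
    extern uint32_t     regs[16], pc;

    static decoded_insn predecode_cache[PREDECODE_SIZE];

    static void step(void)
    {
        decoded_insn *d = &predecode_cache[(pc >> 2) % PREDECODE_SIZE];
        if (!d->valid || d->pc != pc) {       /* predecode miss: decode once, then reuse */
            *d = decode(pc, fetch(pc));
            d->valid = 1;
        }
        switch (d->op) {
        case OP_ADD:
            regs[d->dest] = regs[d->src1] + regs[d->src2];
            break;
        case OP_LDR: {
            uint32_t phys = softmmu_translate(regs[d->src1]);  /* src1 holds the base address */
            memcpy(&regs[d->dest], &mem[phys], sizeof(uint32_t));
            break;
        }
        }
        pc += 4;                              /* next instruction (branches not shown) */
    }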

Dynamic binary translation

Dynamic binary translation, as its name implies, consists of dynamically translating target binary code into host binary code. Phases of simulation are then interlaced with phases of code generation. It could in many ways be seen as a direct optimization of interpretation, where the result of decoding a target instruction, instead of only being executed, is translated into host code and kept for later reuse. Its main advantage is that it remains relatively accurate, and is able to execute any type of target code, while being orders of magnitude faster than raw interpretation. A precise description of this technique is given in section 2.2.

Static Binary Translation

Static binary translation (SBT), as opposed to dynamic binary translation, aims at translating the whole target binary before actually executing the resulting code [CVERL02].

The main advantage of this technique is that since the translation is done offline, no side effects are seen by the user at runtime. For example, in DBT, some latency might be seen at program launch, or even on user input, due to the need to generate code dynamically. Also, more time may be spent optimizing the translated code than in DBT. Its main disadvantage is that translating code whose branch targets are not known statically is quite complex, and in quite a few situations even impossible. Simulating self modifying code, or dynamically loaded code, is also tedious. This usually induces the need to embed, in the generated code, an interpreter which, at runtime, interprets the offending code. Simulating OS code also requires OS paravirtualization to detect the new code and interpret it. At the end of the day, this means that running unmodified target binaries is not always possible.

Native Simulation

Native (or compiled) simulation is probably the fastest cross ISA simulation technique nowadays. Unlike binary translation techniques, it relies on the target source code. This source code is compiled directly for the host, potentially with code annotations to be able to reconstruct the execution on the real target machine. The necessary accesses to devices are interfaced with a Hardware Abstraction Layer (HAL), which is in effect a paravirtualization layer. The backend of the HAL can then interface with the device model, written in plain C or SystemC for example. The first limitation of this technique is that the simulator needs the source code of the application, thus disqualifying proprietary and legacy code, which is often unacceptable. The second disadvantage of this method is that it is highly imprecise: almost none of the target machine execution information may be retrieved without code instrumentation. The third and technically most restrictive limitation is that it cannot easily be used to simulate an OS. In its standard form, this technique indeed relies on knowing in advance everything that will be executed and compiling it so as to be able to execute it on the host. This makes it ill-suited for full system simulation, where no knowledge exists on what will be simulated.
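As an illustration, the HAL boundary just mentioned can be as simple as the following sketch; hal_write32, device_model_write and the UART address are hypothetical names, not an existing API.

    #include <stdint.h>

    /* HAL entry point as seen by the natively compiled target software. */
    void hal_write32(uint32_t dev_addr, uint32_t value);

    /* Host-side backend of the HAL: route the access to the device model
     * instead of touching real hardware. */
    extern void device_model_write(uint32_t dev_addr, uint32_t value);

    void hal_write32(uint32_t dev_addr, uint32_t value)
    {
        device_model_write(dev_addr, value);
    }

    /* Target driver code, compiled natively for the host, only ever goes
     * through the HAL, never through a hard-coded MMIO access. */
    #define UART_TX_REG 0x10000000u

    void uart_putc(char c)
    {
        hal_write32(UART_TX_REG, (uint32_t)c);
    }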

Hardware assisted virtualization

Hardware assisted virtualization (HAV), although not directly targeted at simulation, can still be used to fill some of its roles [SHP12]. Its base principle is to partially duplicate the host CPU so that an unmodified target may run on the duplicated CPU. This duplicated CPU, called a vCPU (for virtual CPU), has its own architected state, its own address space, as well as its own view of every architectural register. With this method, the target can run at near native speed, unmodified, and with the possibility for the host to monitor it. It can be used for example to develop a new OS, or debug an existing one, as well as for deeper performance evaluations with tools such as perf [DM10]. Its only real limit is that both the target and the host must have the same ISA. This defeats the purpose of testing new CPU designs, and of running legacy binaries, or binaries from other ISAs in general.

Summary

To summarize, multiple CPU simulation techniques exist; however only one of them is currently truly able to fit the needs of full system simulation while remaining relatively fast: dynamic binary translation. Other techniques, such as native simulation, may indeed be intrinsically faster, but they are severely limited in terms of full system simulation. At best, they need heavy paravirtualization techniques to function, and are usually limited to applications.

Hardware assisted virtualization could be used for same ISA simulation, but cannot, at least directly, be used for cross ISA simulation [HPF13]. As such, it seems reasonable to focus on dynamic binary translation for our goal of accelerating full system simulation.

2.2 Full system simulation with DBT

Let us now present how full system simulation using DBT functions. We will detail its current limitations, with a particular emphasis on what interests us the most in this work, the simulation of memory accesses.

2.2.1 Dynamic Binary Translation principle

[Figure 2.1 depicts the DBT loop: the target PC is checked against the already translated code; if the binary has not been seen yet, instructions are fetched and decoded by the frontend into micro-operations (µops) until a branch is reached, then the backend generates host code into the translation cache, and execution proceeds from there.]

Figure 2.1: Overview of the DBT process (taken and translated from [Mic14])

Let us first explain the base principles of retargetable dynamic binary translation. The DBT based simulator fetches, instruction per instruction, the code of the target program.

This code is first translated dynamically to an intermediate representation (IR) used by the simulator to decorrelate the target ISA from the host ISA. Then this IR is translated to host binary code. Finally, this code is executed directly on the host. A more precise view of the way DBT works is depicted in Figure 2.1. The process begins at the fetch phase. There, we fetch the target instruction located at the address corresponding to the current program counter (PC). Then, this instruction is decoded, i.e. the target opcode is translated to a corresponding micro operation that is part of the IR used by the simulator. As long as the instruction is not a branch, we go back to the first step to fetch and translate the next instruction. When finally the currently decoded instruction is a branch, all the micro operations are translated to host binary code. This code is then cached into a translation cache containing translations for the code that has already been encountered. The block that is thus stored is called a translation block, and it is fairly similar in spirit to a basic block as defined in compilation. Finally, the code in the translation block is executed. When arriving at the end of the block, the simulator checks whether the program counter (PC) at which the last branch instruction points has already been seen. If yes, the execution resumes at the translation block thus pointed to. Otherwise, it goes back to the fetch-decode phase. An important point to remember is that the code cache size is not unlimited. Eventually, as we fill it with more and more code, it will be full. At this time, a choice has to be made so as to evict some already translated code blocks to make room for new ones. A simple yet surprisingly efficient choice is to flush the whole translation cache. The main advantage of this process is that it does not make any assumption on the target binary. Thus, whether we are simulating an operating system or application code, and whether the code is self modifying or dynamically loads libraries, nothing changes for the simulator: the fetch-decode-execute process continues. Thanks to the use of an IR, it is also usually fairly retargetable, i.e. it can support many (target, host) ISA couples. The simulation speed is also vastly superior to interpretation, as has been shown in [CK94, WR96, AEGS01]. One limit of full system DBT is that the generated code is usually not much optimized. The reason for that comes from the nature of this code. In full system simulation, one OS and multiple processes will be simulated. At runtime, we have no information on what we are executing; the result is that many pieces of code we generate might never be reused. Code will also tend to get modified or suppressed by the target, resulting in translation block flushes. Jumps will happen at multiple places in a block, thus making multiple relatively small partial copies of it. The result is that the opportunities for optimization are limited, and the chances that an advanced optimization be really useful are low. Moreover, advanced optimizations are time-consuming. Spending too much time optimizing will result in spending a long time generating the translation blocks, and if the blocks are not reused in the end, this cost will not even be amortized. This issue is dealt with in great detail in the work from [DB00]. The way the code is generated, i.e. instruction per instruction and block by block, also has an important impact on performance.
Usually, existing optimizations will not concern more than a translation block at a time. This means that advanced techniques inspired by other just in time (JIT) compilers, such as dynamic inlining and code specialization, will be fairly rare in full system DBT. The code will also be marked by the need to keep the architected state up to date, and in particular the target registers. This means that for most instructions, a phase that loads the register state will often precede the actual simulation code, which will then be followed by a phase that stores the modified registers. The resulting code for a target (ARM in this case) instruction translated to the host (x64 here) can be seen in Figure 2.2. We can see that the target instruction, which made no memory accesses to begin with, has been converted not into a single host instruction, but three, two of them being memory accesses.


Target instruction (ARM):

    add ip, ip, #20          ; 0x14

Generated Code (x64):

    mov 0x30(%r14), %ebp     ; obtain current target register value
    add $0x14, %ebp          ; execute the computation
    mov %ebp, 0x30(%r14)     ; write back the result to the target register

Figure 2.2: Translation of a target add instruction

The first instruction loads the value of the target ip register into a host register. The second executes the actual computation. And the third one stores the new value of the register into the architected state. Just by looking at it, we can imagine the resulting code will be much slower than native code. The host add will probably execute in one cycle, but both the load and the store may need several cycles to execute. And this example is for a simple instruction; complex instructions might result in much more host code to simulate them.
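For concreteness, the micro-operations the frontend could emit for this add might look like the following sketch in C. The encoding, names and structure are hypothetical and only meant to illustrate the IR step; QEMU's real IR, TCG, uses a different representation.

    #include <stdint.h>

    /* Hypothetical micro-operation encoding. */
    enum uop_kind { UOP_LD_REG, UOP_ADD_IMM, UOP_ST_REG };

    typedef struct {
        enum uop_kind kind;
        int      tmp;      /* IR temporary */
        int      reg;      /* index into the architected register file */
        int32_t  imm;
    } uop;

    /* Micro-ops a frontend could emit for the guest instruction "add ip, ip, #20":
     * load the architected register into a temporary, add the immediate, and
     * store the temporary back into the architected state. The backend then maps
     * each micro-op to host instructions, producing code like that of Figure 2.2. */
    static const uop add_ip_ip_20[] = {
        { UOP_LD_REG,  0, 12, 0  },   /* tmp0 <- regs[12] (ip) */
        { UOP_ADD_IMM, 0,  0, 20 },   /* tmp0 <- tmp0 + 20 */
        { UOP_ST_REG,  0, 12, 0  },   /* regs[12] <- tmp0 */
    };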

2.2.2 Simulating complex instructions

Another performance bottleneck of DBT is the simulation of complex instructions. What we call complex instructions are those that cannot directly be translated from a target instruction to a near equivalent host instruction. They come from three groups.

• The first group is architected state updates. Those are instructions that change the state of the CPU and not just of a standard architectural register. This could be a TLB flush instruction, or a switch to a different privilege level. These usually imply an important amount of checks and modifications to be done by the simulator and would be highly impractical to do directly in IR,

• The second group is the vectorized and floating point instruction sets. Those instructions are usually fairly hard to map from one ISA to another, as compliance to the standard and legacy are conflicting interests in floating-point unit designs. An extensive analysis of floating-point simulation can be found in [SBP16]. Regarding integer vectorized instructions, [MFP11] has proposed extensions of an IR to include SIMD instructions and demonstrated large speedups on applications extensively using these instructions.

• The final group is memory accesses. Those heavily depend on the architected state of the target, and in particular on its MMU for target virtual to target physical address translation. Due to the amount of code implied by a target page walk, C code is almost unavoidable in current simulators. We explain memory access emulation more precisely in Section 2.2.3.

From now on, we focus on memory accesses. The main solution to simulate those instructions is to call a C function, called a helper, that emulates the target instruction behaviour. Such a call is depicted in Figure 2.3. As we can see, a helper call induces an important overhead compared to standard generated code execution.

It will induce at least an argument setup phase (we place arguments for the helper in the right registers or at the appropriate position on the stack), a jump to the helper, register saving and then restoring due to the call, and an indirect jump back to the generated code. Just these phases, without any simulation code, already cost much more in execution time than, say, a memory access or a standard vector instruction. This makes helpers an important source of slowdown compared to the raw execution of generated code.

[Figure 2.3 shows the QEMU process: the generated code sets up the helper arguments, loads the helper address and calls the C helper code, which then returns to the next instruction in the code cache.]

    mov rdi, arg1            ; first helper argument
    mov rsi, arg2            ; second helper argument
    mov r10, helper_address
    call r10

Figure 2.3: Call of a C code helper
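To illustrate what such a helper does, here is a minimal sketch of a 32-bit load helper in C. It is not QEMU's actual helper (whose name, signature and fault handling differ); guest_mmu_translate, guest_phys_to_host and raise_guest_fault are hypothetical functions standing for the software MMU services described in the next section.

    #include <stdint.h>

    /* Assumed to be provided by the software MMU model. */
    extern int   guest_mmu_translate(uint32_t vaddr, uint32_t *paddr); /* guest virt -> guest phys */
    extern void *guest_phys_to_host(uint32_t paddr);                   /* guest phys -> host pointer */
    extern void  raise_guest_fault(uint32_t vaddr);                    /* emulate the guest trap */

    /* Sketch of a load helper: called from generated code with the guest virtual address. */
    uint32_t helper_ldr32(uint32_t vaddr)
    {
        uint32_t paddr;
        if (guest_mmu_translate(vaddr, &paddr) != 0) {
            raise_guest_fault(vaddr);   /* page fault: control goes back to the guest OS */
            return 0;                   /* not reached if the fault unwinds the helper */
        }
        void *host = guest_phys_to_host(paddr);
        return *(uint32_t *)host;       /* the actual access, on behalf of the guest */
    }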

2.2.3 Emulating memory accesses

We will now detail more precisely the techniques currently used to simulate memory accesses. Emulating memory accesses in the context of DBT is actually fairly complex. This task consists in making an unmodified target binary believe that it executes in an address space that is the same as on real hardware. This implies simulating multiple key points of the target machine, including its MMU, all its memory instructions and memory-related faults. All of this also brings a fairly large architected state, with the notions of privilege levels, page size and, potentially, page table format. For the sake of clarity, in this section, we will consider that the target ISA is based on a 32-bit ARM processor with an MMU and that the host ISA is an x64 machine. From the guest perspective, everything happens as if the emulator was not there.

Guest assembly instructions execute as expected on the guest hardware. So for instance, the ARM instructions for memory accesses, ldr and str, execute as expected, either manipulating physical or virtual addresses. If virtual, this means that the MMU has been set up and enabled. This guest perspective is depicted in Figure 2.4, looking at a ldr instruction. In this picture, we assume pages of 4KB (the smallest granularity for ARMv7). We classically call virtual page number (VPN) the page number formed by bits 31 to 12 of the virtual address, and physical frame number (PFN) the translation of these 20 bits into a physically mapped page. When executing the load, the hardware accesses the Translation Lookaside Buffer (TLB), a small fully-associative memory that caches the most recent virtual to physical translations, to obtain the page frame number, as depicted on the left-hand side of the figure. If the address translation is not cached (TLB miss), the page table structure that contains the OS defined VPN to PFN mappings for the current process (identified by the Translation Table Base register, TTB) is traversed. If the translation is found, the TLB is updated and the instruction finishes. If not, a page fault exception is raised, and it must be handled by the OS running on the guest.

[Figure 2.4 depicts a guest ldr: the TLB is looked up with the VPN (bits 31..12); on a miss, the L1 master page table (indexed by bits 31..20, located through TTB) and a coarse L2 page table (indexed by bits 19..12) are walked in guest physical memory, the TLB is updated, and the resulting frame number is combined with the page offset (bits 11..0).]

Figure 2.4: Simplified view of the hardware support to address translation on the ARMv7 architecture
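The walk just described, which the emulator has to reproduce in software, can be sketched as follows. This is a simplified model of the ARMv7 short-descriptor format for 4KB small pages only, assuming a hypothetical guest_phys_read32 accessor, and ignoring sections, large pages, permissions and caching attributes.

    #include <stdint.h>

    /* Reads a 32-bit word from the modelled guest physical memory (assumed elsewhere). */
    extern uint32_t guest_phys_read32(uint32_t paddr);

    /* Two-level ARMv7-style walk for 4KB small pages; returns 0 on success. */
    static int guest_page_walk(uint32_t ttb, uint32_t vaddr, uint32_t *paddr)
    {
        /* First level: one descriptor per 1MB region, indexed by bits 31..20. */
        uint32_t l1_addr = (ttb & 0xFFFFC000u) | ((vaddr >> 20) << 2);
        uint32_t l1 = guest_phys_read32(l1_addr);
        if ((l1 & 3) != 1) {          /* not a coarse page table descriptor */
            return -1;                /* translation fault: the guest OS gets a page fault */
        }
        /* Second level: coarse page table, indexed by bits 19..12. */
        uint32_t l2_addr = (l1 & 0xFFFFFC00u) | (((vaddr >> 12) & 0xFFu) << 2);
        uint32_t l2 = guest_phys_read32(l2_addr);
        if (!(l2 & 2)) {              /* fault or large page (large pages omitted here) */
            return -1;
        }
        *paddr = (l2 & 0xFFFFF000u) | (vaddr & 0xFFFu);   /* PFN | page offset */
        return 0;
    }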

On Figure 2.5, we take the host perspective, showing the architected state maintained by the emulator, focused on the emulation of the memory subsystem. The guest physical memory is composed of pages, so the emulator manages corresponding pages in its own virtual memory. Indeed, the emulator executes as a regular process and can therefore allocate pages of memory. The emulator also needs to simulate the guest hardware MMU in software; this is the role of the software MMU tables. The first table translates guest virtual addresses to guest physical addresses while the second table translates guest physical addresses to host virtual addresses. Still on Figure 2.5, we can also see the guest instruction and its translation into the code cache. The guest ldr instruction can be found in one of the physical pages of the architected state. Before being executed, it is translated into a snippet of x64 instructions that carries the call to the helper function, a C function available in the code of the emulator. The helper function navigates through the tables of the software MMU in order to translate the address at which the ldr instruction performs an access. If the guest MMU is enabled, the address is a guest virtual address, first translated to a guest physical address, and finally translated to a host virtual address.

[Figure 2.5 shows the emulator's view: the translated ldr in the code cache calls a translate helper, which looks up a model of the guest TLB, walks the guest page table in host memory if needed (using a hash page table), then goes through a shadow page table mapping guest physical frames P(@31..12) to host virtual addresses H(P(@31..12)) before accessing the host copy of the guest physical frame.]

Figure 2.5: Host view of the architected state of the guest

The software MMU is also responsible for emulating memory-related traps that may occur during the address translation or the actual memory access. The address translation may fail due to a missing mapping or an invalid access right. The translated address may also turn out to be an invalid address in the guest memory map. Overall, this means that the emulator must emulate the guest processor behavior for traps, emulating the jump through the vector of trap handlers. While this emulation approach works, its performance can be improved. Helper calls and the software page walks induce an important overhead. Because the code of the software MMU is fairly large, the impact on the instruction caches is non negligible. Furthermore, the translation always happens, always walking the page tables, inducing memory accesses that may thrash the data caches and therefore negatively impact performance. These overheads have suggested an optimization in software, inspired by the hardware Translation Look-aside Buffer (TLB). It is organized as an inlined fast path, the slow path being the call to the helper function. The inlined fast path is just a few instructions and a few memory accesses, looking up a small table that caches the latest translations, exactly like a hardware TLB would do. Unsurprisingly, this table is called the software TLB.
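Functionally, the lookup performed by this inlined fast path is equivalent to the following C sketch. The constants, structure layout and function names are hypothetical (QEMU's actual software TLB entries and sizes differ); it only illustrates the hash, tag check and addend mechanism.

    #include <stdint.h>

    /* Hypothetical sizes and layout. */
    #define TARGET_PAGE_BITS 12
    #define TARGET_PAGE_SIZE (1u << TARGET_PAGE_BITS)
    #define TARGET_PAGE_MASK (~(TARGET_PAGE_SIZE - 1))
    #define SOFT_TLB_BITS    8
    #define SOFT_TLB_SIZE    (1u << SOFT_TLB_BITS)

    typedef struct {
        uint32_t  vaddr_tag;   /* guest virtual page address cached in this entry */
        uintptr_t addend;      /* added to the guest vaddr to obtain a host pointer */
    } soft_tlb_entry;

    static soft_tlb_entry soft_tlb[SOFT_TLB_SIZE];

    /* Slow path: full software MMU translation and TLB refill (assumed elsewhere). */
    extern uint32_t softmmu_load_slow(uint32_t vaddr);

    /* What the inlined fast path does: hash, tag check, then the host access. */
    static inline uint32_t softmmu_load_fast(uint32_t vaddr)
    {
        unsigned idx = (vaddr >> TARGET_PAGE_BITS) & (SOFT_TLB_SIZE - 1);
        soft_tlb_entry *e = &soft_tlb[idx];

        if (e->vaddr_tag == (vaddr & TARGET_PAGE_MASK)) {
            /* Hit: a single host load at vaddr + addend. */
            return *(uint32_t *)(uintptr_t)(vaddr + e->addend);
        }
        /* Miss: fall back to the helper, which walks the page tables. */
        return softmmu_load_slow(vaddr);
    }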

The code of the fast path can be seen in Figure 2.6.

[Figure 2.6 lists the generated x64 fast path, whose numbered lines are referenced below, together with the slow_path_trampoline stub used to reach the software MMU helper.]

Figure 2.6: Qemu fast path code

The first two instructions are not strictly part of the Virtual TLB code, but are in charge of retrieving the guest register value and adding the offset of the target load instruction (4 here). The ten next instructions are the real Virtual TLB code. First we compute the index at which the guest address should reside within the Virtual TLB. This is a simple hash where we take the N first bits of the VPN, N being the base 2 logarithm of the size of the Virtual TLB. Obviously, just as with a hardware TLB, it is possible that the hash collides, and thus that the address at this index is not the right one. To check this we also compute a tag, which is the rest of the guest VPN. These steps are covered between line 2 and line 7 of the generated host code. Then, at line 8, we compare the computed tag with the one from the Virtual TLB entry. If it matches, the jump at line 10 is not taken. Execution continues at line 11, where we retrieve the host address corresponding to this entry, and we execute the access at line 12. However, if the tags did not match, we now have to call the slow path. This is done by first calling a short trampoline code, always placed at the end of every translation block, which prepares the Soft MMU handler arguments and then actually calls it (lines 0 to 4 in the trampoline code). Once the handler returns, at line 5, the trampoline retrieves its return value and places it in the right register (to correspond to the generated host code). Then, we jump to the next instruction in the code cache. So, even in the best case where we have a Virtual TLB hit, we execute 11 instructions purely for the Virtual TLB, including three memory accesses and one jump. In the end, the software TLB improves performance, but it is not ideal. The code to access it contains much more than the original single guest instruction, either a ldr or a str instruction. Furthermore, whenever the fast path fails, the slow path is still a full call to the helper function, with a full address translation done in software.

To summarize, we have seen that emulating target memory accesses is actually fairly complex. A whole architected state has to be maintained so as to track the location of the target physical frames and the exact target MMU state. The code to access target memory then potentially includes a full walk of both the target page table and the simulator shadow page table. An acceleration does exist to avoid constantly doing this. However, this optimization is still far from the original code in terms of size and performance.

2.3 Conclusion

Simulating an instruction set architecture is a task that dates back to the infancy of computer design. In this chapter, we have seen that multiple instruction set simulation techniques exist, with different trade-offs between speed of execution and accuracy of non-functional properties (such as time or power). Also, among these techniques, not all have the ability to perform cross-ISA full system virtualization. For that goal, dynamic binary translation, thanks to its properties, is the solution we select. It offers support for unmodified target programs, including OSes and binary only (legacy) code, as well as a decent speed. Its accuracy in terms of estimating non-functional properties can be made reasonable [GFP09]. Yet, it also faces its own performance problems. Amongst those, one has particularly caught our attention: target memory accesses simulation. We then ask ourselves the following questions:

1. What schemes may be devised so as to accelerate target memory accesses simulation in dynamic binary translation?

2. What is the most efficient method we can think of to implement those schemes?

3. Are the designs we choose to accelerate target memory accesses simulation implementation specific, i.e. tied to the underlying OS or framework we use, or can they be reused atop any of those?


Chapter 3

State of the art

In our problem statement, we introduced dynamic binary translation as an efficient means to perform instruction set simulation. This technique spans far more than that, and in this chapter we briefly present an overview of other applications of DBT that demonstrate its interest. Indeed DBT, which relies on dynamically compiling a target binary machine code into a different binary machine code to be executed on the host machine, is at the core of multiple other technologies. Those mainly are high level language virtual machines, code instrumentation, host specific dynamic optimization, low energy consumption instruction level parallelism optimizations and fast CPU simulation. With this landscape settled, we present the various techniques that have already been used to optimize the speed and efficiency of dynamic binary translation. And finally, as this is the main scope of this work, we present techniques that were devised specifically to optimize target memory accesses simulation.

3.1 Dynamic Binary Translation

Dynamic binary translation, which in its general form is also sometimes referred to as Just In Time (JIT) compilation, is a technique used to translate, at runtime, a binary code into another one. It may be used either to translate machine code between different ISAs, as is the case in simulators, or to translate high level languages to host machine code, as is often the case in high level language virtual machines.

Virtual Machines

One of the very common uses of dynamic binary translation is for high level language virtual machines. The first use of DBT in general was actually the virtual machine made for the Smalltalk language [DS84]. In this work, the authors try to obtain a fast implementation of the Smalltalk-80 system. For this, they already use a bytecode as an intermediary representation of their program. However, instead of being interpreted, this bytecode is dynamically translated to host binary code. This in turn allows the generated code to be reused, and leads to important speedups compared to pure interpretation. Their inspiration came from [Rau78], which already exploited the idea of compiling a high level language to an intermediary one. This intermediary language is then dynamically, or equivalently "just in time", translated to host code. A cache, smaller than the overall program, is then set up to contain the most often used subset of the translated code.


Java, one of the most used programming languages nowadays, also uses DBT to speed up its execution, in particular in its server version [PVC01], which is a performance critical component. A difference with Smalltalk is that, especially in the server version, many aggressive optimizations were added, such as code inlining or complex graph coloring based register allocation. This, in many ways, brought those languages closer to compiled languages in terms of execution speed.

Legacy Code execution

Another popular use of dynamic binary translation has been the support for legacy code execution, or interoperability in general. The idea there is to execute a binary application, or potentially even an OS, made for an ISA that is not compatible with the current host. This can be used for compatibility with a legacy architecture; examples of this are Transmeta's Crusoe [DGB+03], which is able to run any binary unmodified on a custom VLIW host, and DAISY [EA97], which can do the same for PowerPC and potentially other architectures. These two solutions are able to simulate target OS code, or even the BIOS. Other works, such as [BDE+03], which aims at providing the capacity to run unmodified x86 applications on IA64 CPUs, and Aries [ZT00], which does the same from the PA-RISC ISA to IA64, only focus on application level compatibility. We cannot end this paragraph without mentioning Rosetta, from Transitive Technology (a spin-off of the University of Manchester [RS97]), which ran applications compiled for PowerPC on x86, from 2006 to 2009, on Mac OS X.

Code Instrumentation and tests

Popular code instrumentation and test tools such as Valgrind [NS07] or DynamoRIO [BZA12] also use dynamic binary translation to speed up their analysis. Usually, the principle of those is to add, at some specific checkpoints in applications (memory accesses, memory allocations, memory deallocations), pieces of instrumentation code which allow one to check that the application behaves as expected. Valgrind can also be used to estimate program performance across various metrics such as cache utilisation and hit ratio or branch predictor misses. DynamoRIO offers these possibilities plus other functionalities such as backward compatibility analysis.

Recompilation, or binary optimization

One of the historical usages of DBT has been to improve application performance. As an example, DynamoRIO, before being used as an instrumentation tool, was used as an optimizer [Gar03] for x86-IA32 code. DynamoRIO itself was based on Dynamo [BDB11], which provided transparent optimization of HP-PA binaries on HP systems. The main hypothesis on which those optimizers are based is that static compiler analysis is no longer sufficient due to the late binding approach of newer executables [BDB11] (object oriented programming, dynamically loaded libraries). To improve performance, the idea is then to apply optimizations that catch performance problems which could not be seen by the compiler, mainly indirect branches, virtual function calls and the redundant jumps and calls they induce. Other works such as [KCB07] propose to cooperate more directly with the static compiler, the static compiler preparing optimized code templates to be instantiated at runtime. Finally, works such as [ECC15], which is based on [CCL+14], continue this trend of cooperation, but with the goal of improving program interoperability and performance. This can be particularly useful given the jungle of SIMD instructions that are available on different generations of processors.

Basically, most of the program remains in standard C code, but performance critical subparts which require host specific features are written in a RISC like intermediary language. At runtime, this intermediary language is translated to the host specific instructions that matter for performance.

Hardware optimization

Another optimization oriented use of DBT is hardware co-designed DBT. This is a relatively new trend which aims at bringing DBT to native or superior performance in multiple scenarios. The principle of these solutions is to take care, in hardware, of various parts of the DBT. They are typically used in an optimization context where a RISC based ISA is translated to a VLIW based ISA, so as to reach performance close to that of superscalar CPUs with a much lower energy footprint. An example of such works is [RRD17]. In this work, the authors use a MIPS target over a VLIW based host. What this work mainly aims at is discovering instruction level parallelism in the MIPS target code, and scheduling those instructions in parallel on the VLIW host. This solution obtains a speedup of up to eight times over a software implementation of the DBT while consuming eighteen times less energy. These kinds of approaches are also used in an industry grade solution by [BBTV15], but with an ARM target.

Full System Virtualization

Finally, many tools used for full system virtualization are based on DBT. Amongst them, the one that is probably the most used nowadays is QEMU [Bel05a]. It offers a full system simulation mode, with the capacity to simulate many types of devices. As of the time of this writing, its DBT engine is able to simulate 21 ISA families (with 32 and 64 bit variants), and it can simulate popular devices such as a PC board or a Raspberry Pi. The core of its technology relies on a front-middle-back end design. The frontend is mainly composed of target instruction decoders for each ISA, which decode target instructions and translate them to the middle end intermediary representation. The middle end then makes an optimization pass on the IR and transmits it to the backend. The backend finally translates the IR to host binary code. This design decouples the frontend and the backend completely. Thanks to this, the simulator is relatively easily retargetable, both for new target and new host architectures. To give a more precise idea of this accomplishment, only Simics [MCE+02], which is also a full system simulator, seems to be able to handle a comparable quantity of targets. And Simics, unlike QEMU, is based on interpretation, with its interpreters being generated from a high level description instead of being hand coded. While it is the most widespread, QEMU is not the first tool of this kind. We can cite Mimic [May87] as probably the first one to use DBT for simulation; it was made to simulate a full System/370 based computer. Embra [WR96], which is itself based on Shade [CK94], then simulated a full MIPS based board with multiple CPUs and their caches. Even today, other simulators based on DBT still exist. An example is CAPTIVE [SWF16], which also gained the ability to leverage HAV in a cross ISA context. We will detail CAPTIVE more precisely in section 3.3.


3.2 Accelerating Full System Dynamic Binary Translation

We now turn to the techniques which have traditionally been used to accelerate dynamic binary translation. The first is the optimization of generated code, mainly through the introduction of trace based generation; this is often done on another thread so as to avoid stalling execution. The second is support for multi threaded simulation itself, which has been fairly sought after as it can potentially multiply performance by the number of target cores.

3.2.1 Generated code optimizations techniques

Many of the optimization techniques presented here were devised in works concerning runtime optimizers based on DBT, but they can also be applied to full system simulation.

[Figure 3.1, a) CFG view: block A branches to B or C, which both lead to D. b) Trace view: the same code forms two traces, A-B-D and A-C-D, with blocks duplicated instead of shared.]

Figure 3.1: Example of a simple CFG and its equivalent trace view

Before delving into these optimizations, it is important to explain the two main representations of the code that a DBT code generator, or more generally any compiler, may use. Figure 3.1 supports this explanation. On the left, we have the standard form used in compilers, the CFG (Control Flow Graph). This representation partitions a program according to its branch instructions. Basically, each rectangle in the figure represents a continuous sequence of instructions that contains no branch except for the last instruction, which is a branch instruction. These rectangles are termed "basic blocks" in compiler writers' parlance. A basic block points to its successor block or blocks, which themselves point to their successors, and so on. On the right, we have a trace based view of a program. In this case, there is still a weak notion of block. However, the notion of successors disappears. A trace is a continuous flow of blocks, without any branching outside the trace. It is for example possible that a loop be one trace, with the end branching back to the beginning of the trace. On the other hand, if a block has two successors, then only one can be part of the trace. A different trace will then exist for the other successor.

One such example is the optimization work done in DynamoRIO [Gar03]. The base idea in this work, and most others, is to go from the usual compilation CFG-basic block paradigm to a trace based approach. In its basic approach, DynamoRIO still relies on a basic block vision of the target code. It decodes the target binary instruction per instruction and stops whenever a branch happens. The problem with this approach is that at the end of every basic block, control must be transferred back to the DBT engine, which implies a full context save, a hashtable lookup and then state restoration. To enhance the performance of the optimizer, the authors changed this process with the aim of reducing the number of control transfers. The first level of optimization is to link generated blocks which use direct jumps to each other directly. This already avoids control transfers in simple cases. However, the problem remains for indirect branches (branches with a register that contains a pointer to the target as an argument). Those cannot be resolved at generation time; as such, the previous optimization cannot be used on them. Also, while direct jumps are faster than a control transfer, the jumps can still potentially be avoided in a trace, and other performance metrics such as cache locality might suffer compared to the target code's original layout. As a solution to these problems, the authors changed the code generation so that basic blocks which usually execute in sequence may be squashed together into a contiguous block, called a trace. Then, if one of the basic blocks contained in the trace ends in an indirect branch, an in-cache check is made to verify whether the target remains (or not) in the trace. If yes, then no control transfer is made and we can expect a performance enhancement. To make the traces even more relevant, procedure calls are also inlined in full. Overall, according to the authors' analysis, the direct linking of basic blocks has an effect as dramatic as caching translated code (e.g. going from interpretation to DBT). Traces and in-cache indirect branch checking then add a significant performance boost. Optimizations on the generated code itself were also applied, mainly redundant load removal, which, as its name implies, removes loads from memory to a register when the value is already in a register. Architecture specific optimizations are also added. Those consist in choosing the generated instructions according to the exact CPU architecture used by the host. Reusing the authors' example, inc and dec are slower on a Pentium 4 than add 1 and sub 1; however, they are faster on a Pentium 3. As such, the optimizer chooses which instruction to use at runtime, depending on the host CPU. Finally, the authors implemented indirect branch dispatch, which consists of replacing the indirect branch lookup with an inlined fast path for common targets. This fast path is a sequence of compares and conditional branches, with a hashtable lookup at the end if none of the inlined targets were correct. An improvement to the trace generation method was later done in [GF06], in the context of a Java VM. In this work, no basic blocks are ever generated. Whenever the VM tries to generate code, it only generates a trace.
For this particular work, the span of traces was chosen to be loops, i.e. the beginning of a trace is the entry of a loop, and its end, the exit of that same loop. As traces are generated, they begin to form a trace tree, which is a group of traces sharing the same root. According to the authors, in addition to the benefits which could be expected from [Gar03], the main advantage of this method is that compilation is much faster. The authors confirm their code generation runs in linear time.

Moreover, they manage to produce code directly in SSA (static single assignment) form, SSA being a special set of constraints on the code allowing optimizations and register allocation to be performed much faster than in the general case.
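To make the block linking and dispatch mechanism discussed above concrete, here is a hedged sketch of a dispatcher that chains translated blocks. The data structures and function names are hypothetical, and patching a single exit slot is a simplification of what real engines such as DynamoRIO or QEMU actually do.

    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t (*tb_func)(void);

    /* Hypothetical translation block descriptor; real DBT engines keep more state. */
    typedef struct trans_block {
        uint32_t  guest_pc;         /* guest address this block translates */
        void     *host_code;        /* entry point of the generated host code */
        uint32_t *exit_patch_slot;  /* location of the exit jump to patch for chaining */
    } trans_block;

    /* Assumed to be provided by the rest of the engine. */
    extern trans_block *lookup_block(uint32_t guest_pc);     /* hash table lookup */
    extern trans_block *translate_block(uint32_t guest_pc);  /* frontend + backend */
    extern void patch_direct_jump(uint32_t *slot, void *target);

    /* Dispatcher: blocks normally return here with the next guest PC; when the
     * successor is already translated, the exit just taken is patched so that the
     * next execution jumps block to block without coming back to the dispatcher.
     * Real engines track which of the (possibly two) exits was taken; this sketch
     * patches a single slot as a simplification. */
    void dispatch(uint32_t entry_pc)
    {
        uint32_t pc = entry_pc;
        trans_block *prev = NULL;

        for (;;) {
            trans_block *tb = lookup_block(pc);
            if (tb == NULL) {
                tb = translate_block(pc);
            }
            if (prev != NULL && prev->exit_patch_slot != NULL) {
                patch_direct_jump(prev->exit_patch_slot, tb->host_code);
            }
            pc = ((tb_func)tb->host_code)();   /* run the block, get the next guest PC */
            prev = tb;
        }
    }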

3.2.2 Multi Threading support

Similarly to many other types of applications, simulators can profit from the fact that modern CPUs possess multiple cores. To do so, simulators must be rethought to parallelize subparts of their execution. Two main paths have been explored: one is to parallelize code generation so that it runs in parallel with execution; the other is to parallelize the simulation of multi core targets, with each target core having a corresponding host thread.

Multi threaded code generation

A first approach to parallelism is to try to offload some of the complex tasks of DBT onto threads other than the main execution thread. One particularly complex operation is to generate high quality code. Multiple works have tried to tackle this problem in multiple situations. Work from [BEvKK+11] is the first we found concerning full system simulation. This work heavily parallelizes the code generation in the ArcSim [BFT10] cycle accurate simulator. This simulator is custom made for the ARCompact ISA and uses LLVM [LA04] as its code generator. The idea in this work is to fully decouple the code generation process from execution. Execution, at first, is made by interpreting target code. Then, as hot paths (or traces) are discovered, those are compiled to host binary code. To do so, code generation itself is heavily parallelized with multiple threads waiting to generate traces. Whenever a trace is discovered, it is placed in a working queue. This queue is sorted in order of priority, with the hottest and most recent path having the highest priority. HQEMU [HHY+12] is another work that aims at using multi-threading in order to provide enhanced code generation in DBT, and in QEMU in particular. The authors' motivation can be explained through three observations. First, code generation in DBT must have a negligible impact on runtime to avoid slowdowns. Second, fast code generation techniques, such as the ones currently used in QEMU's mono threaded code generation, lead to poor code optimizations. Third, efficient code generation techniques (i.e. those that provide high quality code) are time-consuming, and therefore not directly suited to DBT. From these three observations, the authors designed a new solution allowing both a negligible impact on runtime and good quality generated code. To that aim, the authors decided to split code generation in two phases, but kept code generation coupled with execution (i.e. code has to be generated before being executed). The first phase is a fast code generation phase which allows code execution to begin as fast as possible. This also ensures that no code is interpreted, even outside hot paths, potentially enhancing performance for those as well. Then, while execution of this generated code is happening, a second code generation phase optimizes this code through more advanced optimization techniques. This optimized generated code can then replace the non optimized one in the DBT code cache. The core idea of this work is that both of these processes run in parallel on multiple threads, thus delivering larger potential speedups without the risk of a slowdown due to the code generator making the execution stall for too long. Thanks to this, the second code generator can provide various modern optimizations, such as trace based generation, instruction scheduling and code vectorization, without having a negative impact on runtime.

Figure 3.2: HQEMU threaded code generation

In the end, this work provides moderate to large speedups on a varied set of benchmarks. Another interesting part of this work is the performance analysis the authors made. There, they compared their solution (with multi-threading), QEMU's standard solution, and a solution purely based on LLVM [LA04] for high quality code generation. What they show, compared to the works from section 3.2.1, is that a single threaded approach that relies only on generating high quality code can indeed cause slowdowns in quite a few situations. On the other hand, their solution, which offloads quality code generation onto another thread, does not cause any slowdown compared to a single threaded, fast-generation-only solution. One important remark is that only code generation is multi-threaded; target CPU execution remains single threaded. In other words, no matter how many cores the target has, target code execution happens on only one thread, with the target CPUs sharing this thread's execution time.
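To make the decoupling concrete, here is a rough, purely illustrative sketch of the kind of prioritized trace-compilation queue described above (the structure names, the fixed capacity and the selection policy are inventions of this example and come neither from ArcSim nor from HQEMU): the execution thread enqueues hot traces, and a dedicated optimizer thread repeatedly picks the hottest pending one.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_CAP 128

/* A trace discovered by the execution thread, waiting to be compiled. */
struct trace_req {
    uint64_t target_pc;   /* entry point of the hot trace          */
    unsigned hotness;     /* execution count, used as the priority */
};

static struct trace_req queue[QUEUE_CAP];
static int queue_len;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;

/* Called by the execution thread when a trace becomes hot. */
static void enqueue_trace(uint64_t pc, unsigned hotness)
{
    pthread_mutex_lock(&lock);
    if (queue_len < QUEUE_CAP) {
        queue[queue_len++] = (struct trace_req){ pc, hotness };
        pthread_cond_signal(&nonempty);
    }                     /* else: drop it, it will be rediscovered later */
    pthread_mutex_unlock(&lock);
}

/* Optimizer thread: repeatedly pick the hottest pending trace. */
static void *optimizer_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (queue_len == 0)
            pthread_cond_wait(&nonempty, &lock);
        int best = 0;
        for (int i = 1; i < queue_len; i++)
            if (queue[i].hotness > queue[best].hotness)
                best = i;
        struct trace_req req = queue[best];
        queue[best] = queue[--queue_len];
        pthread_mutex_unlock(&lock);

        /* Stand-in for the slow, high-quality code generator. */
        printf("optimizing trace at pc=0x%llx (hotness %u)\n",
               (unsigned long long)req.target_pc, req.hotness);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, optimizer_thread, NULL);
    enqueue_trace(0x10000, 3);
    enqueue_trace(0x20000, 9);
    pthread_join(tid, NULL); /* never returns in this toy example */
    return 0;
}

In a real DBT engine the printf would of course be replaced by the LLVM (or other) back-end, and the code cache entry would be atomically swapped once the optimized version is ready.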

Multi threaded simulation

Another approach to parallelism, in particular when the target has multiple cores, is to parallelize the simulation of the target cores. A work which has aimed at doing so is [DCHC11]. The idea is conceptually fairly straightforward: for each target core, a matching simulation thread is created. These simulation threads then take care of both code generation and execution on a per-core basis. The resulting execution, compared to a single threaded one, is outlined in Figure 3.3. On the left part of the figure, we can see the standard single thread simulation: on one same thread, each CPU is executed alternately.


Figure 3.3: Example of a single threaded simulation execution versus a multi-threaded one

While one executes, the others wait; this means that, potentially, performance relative to the native target may be divided by NP, where NP is the number of target cores. On the right side, we now have the model where each simulated core executes in its own host thread. The result is that all of them may run concurrently. As such, for the given example, the multi-threaded execution ends in two units of time (CPU 1 and CPU 2 need two units of time for their respective workloads) whereas the single threaded one ends in 6 units of time. Without much surprise, the real performance gain is fairly high for multi-threaded guest programs. With the authors' benchmarks, the average speedup is 3.8 compared to baseline single threaded QEMU for a quad core target. All the gain comes from maximizing simulation throughput by simulating every CPU at the same time. One problem with this work however, as exposed in [RSR16], is that it did not yet support atomic instructions in a target agnostic way. Indeed, atomic instructions are not the same on every architecture. In particular, two different models exist: one is CAS (compare and swap) and the other is LL/SC (load-link, store-conditional). Emulating LL/SC with CAS, as is the case when emulating ARM targets on Intel hosts, is non-trivial and requires helper functions. Helpers do have an overhead, yet this solution still outperforms upstream QEMU in multi core scenarios.
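To illustrate why emulating LL/SC on a CAS-only host needs helper support, the following sketch (not taken from [RSR16] or QEMU; the names vcpu_t, helper_ldrex and helper_strex are invented for the example) keeps the load-linked address and value in the virtual CPU state and implements the store-conditional as a compare-and-swap on the remembered value. It also exhibits the usual weakness of this emulation: an ABA change of the monitored location cannot be detected.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative vCPU state: remembers the last exclusive load. */
typedef struct {
    uint32_t *excl_addr;   /* host address monitored by LL  */
    uint32_t  excl_val;    /* value observed by the last LL */
    bool      excl_valid;  /* monitor armed?                */
} vcpu_t;

/* LDREX-like helper: perform the load and arm the monitor. */
static uint32_t helper_ldrex(vcpu_t *cpu, uint32_t *haddr)
{
    uint32_t v = __atomic_load_n(haddr, __ATOMIC_ACQUIRE);
    cpu->excl_addr  = haddr;
    cpu->excl_val   = v;
    cpu->excl_valid = true;
    return v;
}

/* STREX-like helper: succeed (return 0) only if the monitored
 * location still holds the value seen by LDREX, using a CAS. */
static uint32_t helper_strex(vcpu_t *cpu, uint32_t *haddr, uint32_t new_val)
{
    if (!cpu->excl_valid || cpu->excl_addr != haddr)
        return 1; /* fail, as real hardware is allowed to */
    cpu->excl_valid = false;
    uint32_t expected = cpu->excl_val;
    bool ok = __atomic_compare_exchange_n(haddr, &expected, new_val,
                                          false, __ATOMIC_SEQ_CST,
                                          __ATOMIC_SEQ_CST);
    return ok ? 0 : 1;
}

Every such helper call replaces what would be a single instruction on a native ARM core, which is exactly the overhead mentioned above.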

3.3 Accelerating Memory Accesses Simulation

Finally, we present here works which were made specifically to accelerate memory accesses simulation, mainly in DBT. The software translation of target addresses to host addresses alone has indeed been shown to take about 40% of simulation runtime on average. An explanation of why memory accesses are slow to emulate can be found in section 2.2.3. Furthermore, not only is each memory access slow to emulate, but according to [JRHC95] about one instruction out of two on a RISC CPU is a memory access. With this observation in mind, it becomes clear that accelerating the emulation of memory accesses is of utmost importance.

ESPT/HSPT

Two early works, ESPT [CWH+14] and HSPT [WLW+15], have tried to solve the problem of memory accesses simulation. They mainly concentrated on embedding shadow page tables in the simulator process. The method is fairly similar in both, so we concentrate on the most recent one, HSPT, which relies on Unix mmap. Both methods use QEMU as their simulator and modify it to fit their needs. The authors' main idea is to reduce the fast path described in section 2.2.3 to a single host memory access by embedding the target address space into the host one. This is done by reserving a part of the simulator address space and using an mmap syscall to make it alias the simulator's (QEMU's here) view of the guest physical frames. This mapping can then be filled lazily, as the target tries to access it. The process followed to manage this mapping is shown in Figure 3.4.

Figure 3.4: Memory layout of the simulator in HSPT

On the left of the figure, we see the simulator memory layout, which contains both the simulator's view of the target physical frames and the embedded target memory mapping. The physical frames are at the position where QEMU placed them, but the embedded target memory mapping was placed by the authors, through the mmap syscall. On the right, we see the file representing the target physical memory. At first, the embedded target memory mapping is mapped with the PROT_NONE flag, which means that any access will produce a page fault. Then, when the target tries to make a memory access, it is redirected, through an offset, to the embedded mapping. The first access triggers a host fault, thanks to the PROT_NONE flag. This fault is then forwarded by Linux to the simulator's SIGSEGV handler. In this handler, a walk of the target MMU is made, the host address of the corresponding target physical frame is found, and the mapping is updated through mmap to make it point to the right portion of the file representing target physical memory. The embedded mapping then has to remain synchronized with the target mapping. To that aim, the approach chosen in this work is to track TLB flushes. Indeed, just as on a real CPU, the target CPU has to flush its TLB after any change made to its page table. To do so, the authors directly track the target instructions triggering those flushes, as well as changes to the registers holding pointers to the page table. With this, the parts of the mapping which are to be updated can be made unreadable and unwritable, thus triggering a fault again whenever an access happens. During this fault, the entry of the mapping is filled with the correct translation using the method described in the previous paragraph. This solution, which starts from a fairly simple idea, had to be slightly modified to remain efficient. The authors observed that flushing the whole mapping at every context switch was an important loss of performance, because the internal faults used to refill the mapping are fairly costly. To avoid this performance loss, the authors chose to retain one mapping per process. Thanks to this, when a context switch happens, the mapping is kept in memory; when the target switches back to that process, only its offset has to be used in the fast path to access it, and the number of internal faults should remain limited. The main limitation of this approach is that it relies on the presence of a Context Identifier (CID) in the target CPU, which is a unique identifier for a given guest process. The second is that it can potentially consume a very large part of the simulator address space. For this second limitation, an optimization made by the authors is to only use a fixed number of mappings and, when the number of processes exceeds it, to reuse one (usually the least recently used one). A third limitation, which further worsens the previous one, is that the approach was not designed with 64-bit guests in mind. Nowadays, Linux on AArch64 is usually configured to use 42-bit virtual addresses, i.e. 2^42 bytes of address space. This means that reserving more than a handful of flat views of the target address space would be nearly impossible with the usual process layout on modern operating systems. As such, the approach of limiting the number of saved mappings would most of the time result in sharing one address space for all target processes. Actually, with such an address space size, which is bound to get larger, even maintaining a single flat view of the target address space does not seem practical.
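As a rough illustration of this mechanism (not the authors' code; the MBA value, the backing file and the target_mmu_walk stub are invented for the example, and error handling and signal-safety concerns are glossed over), the embedded target address space can be reserved with mmap and PROT_NONE, and a SIGSEGV handler can lazily remap 4 KiB windows of it onto the file that represents target physical memory. A real implementation must additionally distinguish true target page faults and MMIO accesses, which this sketch ignores.

#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define MBA        ((uint8_t *)0x100000000000ULL) /* arbitrary free area */
#define GUEST_SIZE (1ULL << 32)                   /* 32-bit target space */
#define PAGE_SIZE  4096ULL

static int ram_fd; /* file representing target physical memory */

/* Stub: identity mapping here; real code walks the target page table. */
static uint64_t target_mmu_walk(uint64_t target_vaddr)
{
    return target_vaddr;
}

static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    uint64_t fault  = (uint64_t)si->si_addr;
    uint64_t tvaddr = fault - (uint64_t)MBA;      /* back to a target VA */
    uint64_t frame  = target_mmu_walk(tvaddr);    /* target phys frame   */

    /* Alias the faulting 4 KiB window onto the right part of the RAM file. */
    void *page = (void *)((fault / PAGE_SIZE) * PAGE_SIZE);
    if (mmap(page, PAGE_SIZE, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_FIXED, ram_fd,
             (off_t)((frame / PAGE_SIZE) * PAGE_SIZE)) == MAP_FAILED)
        abort();
    (void)sig; (void)ctx;
}

static void install_embedded_mapping(void)
{
    /* Reserve the whole embedded target space; every access faults. */
    mmap(MBA, GUEST_SIZE, PROT_NONE,
         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | MAP_NORESERVE, -1, 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}

int main(void)
{
    ram_fd = open("guest-ram.img", O_RDWR);  /* hypothetical backing file */
    install_embedded_mapping();
    return MBA[0x2000]; /* first touch faults, gets remapped, then reads  */
}

In HSPT the fast path generated by the DBT simply adds the per-process offset to the target address before the access; everything else happens in this kind of fault handler.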
Finally, a few differences with their previous work, ESPT [CWH+14], do exist. In the first work, a Linux Kernel Module was used. However, it was only used as an augmented mmap. Basically, we can say that the module was never really exploited as a potential performance improvement in and of itself. Faults were still communicated to the simulator through the standard SIGSEGV handler, with all the technical and performance difficulties this implies. A confirmation of this observation is that, with the module or with an mmap, the authors obtain roughly the same performance. Another interesting observation is that the fast path of the first work is actually slower (it still contains a branch instruction), yet performance is still the same.

This is a sign that the fast path performance in itself is not the only dominating performance factor. Quite probably, the cost of the page faults has at least as large an impact on performance. This page fault cost should also be much larger nowadays, whether with mmap or with a module. Indeed, the totally unrelated topic of OS vulnerabilities, with the recent Meltdown [LSG+18] attack, has led to a change in the way OSes handle system calls and faults. OSes running on top of Intel CPUs now need to maintain two separate mappings for kernel space and user space [GLS+17]. What this means for us is that syscalls (mmap or the custom one for ESPT) now require a context switch in addition to a mode switch whenever they are made. This is bound to have a negative effect on target code which causes many faults, such as IO bound code or code with many protection right changes.

HAV based cross ISA full system virtualisation

Another method, based on an ad-hoc simulator named CAPTIVE and described in [SWF16], relies on using hardware assisted virtualisation to provide a platform for cross ISA simulation. The base idea is to isolate target code execution onto a virtual CPU where it can have its own address space, and thus avoid the usual runtime checks on every memory access. An overview of the resulting layout can be seen in Figure 3.5.

Figure 3.5: Memory layout of the CAPTIVE simulator

The overall principle is as follows. On the left, we can see the virtual CPU's current memory layout. From address 0x0 to 0xFFFFFFFF, the address space is reserved for the target address space. Just as in HSPT [WLW+15], this allows memory accesses to be made with potentially only one host instruction. But one limit that immediately comes to mind is the ability to scale to 64-bit targets. Indeed, in such cases, it seems highly complicated to map the target address space in such a way, and the approach would quite likely require supplementary indirections in the code. On the higher half of the memory layout lies the execution engine, which is an x64 kernel managing the simulation. Its role is to manage the overall simulation process: basically, it contains the dynamic binary translator needed to translate code from the target ISA to the host ISA, plus all the DBT management code, that is, interrupt simulation and communication with the host simulator code. On the right side of the figure, we can see the host simulator. Its main role is to perform device simulation. Usually, devices are represented through C code callbacks which emulate device behaviour while keeping a state automaton of the device. Once again, the base principle is fairly simple, but its implementation is not. What seems like the main difficulty for the authors is communicating with the part of the simulator that lives outside HAV. Indeed, the authors chose to keep their device emulation thread in the host environment, and not with the execution on the virtual CPU environment. The problem is that one of the main performance costs when HAV is used is exiting from the virtual CPU to the normal host process. According to [SFGP15], which implemented a similar approach for native simulation, this costs on average around ten thousand cycles. The authors of [SWF16] confirmed the problem and devised another strategy to implement communication with their device thread on the host environment. First, every device-related part of the address space is set up in the virtual CPU page table so as to trigger a trap into the execution engine (which acts as a kernel here). This is the "device area" part in Figure 3.5.

Figure 3.6: Handshake between the simulator thread running on the virtual CPU and the device emulation thread on the host

Then, the execution engine and the device emulation thread running on the host part of the simulator perform a handshake, as described in Figure 3.6. The execution engine interprets the trap as the "device_ready" signal, and orders the device emulation to start by entering a barrier. It then sleeps as long as it does not receive the inverse signal from the device emulator. At the same time, the simulator thread on the host side begins to emulate the device access. Finally, it signals the execution engine that the device emulation is done, and then enters a barrier itself to sleep until the next device emulation is needed. The execution engine then restores the target state and resumes generated code execution. This principle seems fairly efficient for the monocore board simulated in this work. However, [SFGP15] has shown that, in the case of multiprocessors, keeping the simulation synchronized leads to important simulation slowdowns. As such, going back to VM exits in those cases, and thus accepting a larger performance hit, might be unavoidable.

Another problem of this approach is that it basically requires writing an x64 kernel. Kernels, no matter how small, are never easy to port to another architecture, which makes the solution heavily tied to Intel hosts. Also, debugging the DBT process in such an environment is potentially extremely complicated, which makes the development of new targets and backends for the simulator a pain, while it should remain feasible for computer scientists who are not HAV specialists.

3.3.1 TLB based solutions

The two preceding works give a fairly effective solution to the problem of emulating memory accesses. However, their main disadvantage is that they rely on significant modifications to the way the simulator works. Thus, other works have instead chosen to only enhance the existing memory translation acceleration mechanism present in the translator; for QEMU, this is the Virtual TLB. The first of those works is [XTMA15], which presented the first analysis we found of the memory emulation problem. According to the authors' experiments, memory emulation usually takes between 30 and 40% of the whole simulation runtime. More precisely, 16.2% of the time is spent in the Virtual TLB lookup, and 15.9% is spent on page walks to refill the Virtual TLB. To reduce this overhead, the authors proposed multiple solutions. Amongst them were an out-of-line fast path, a simple varying-size TLB and, finally, the use of a victim TLB. The last one in particular is interesting, as it is nowadays implemented in mainline QEMU. This approach indeed has the main benefit of being straightforward to implement while already bringing around 18% of performance improvement. Another interesting analysis of that work concerns the use of SIMD instructions to speed up lookups. While this approach could intuitively seem like an important vector of performance gain, practice tends to show a slowdown. This also undermines the idea of increasing the victim TLB "associativity". The term is quoted because, as opposed to hardware, where all entries are compared at once, in software half the entries of an associative array have to be traversed on average to find the item searched for1. So, with SIMD instructions, one could hope to divide the number of steps substantially. Finally, they proposed a dynamically resized Virtual TLB, which seemed promising. As a matter of fact, [HHC+16] implemented a dynamically sized Virtual TLB in a more extensive way. Their main idea is as follows. First, the execution of target programs is cut into sessions, a session being defined as the time between two Virtual TLB flushes. Then, at the end of each session, the occupation of the Virtual TLB is computed. If this occupation is above a certain threshold, the Virtual TLB will be enlarged for the next session. If it is below a minimal threshold, its size will instead be reduced. Finally, if it is between those two thresholds, the size will remain the same. The implementation is made as detailed in Figure 3.7. The Virtual TLB is now an array with a set maximum size, let us call it MAX_TLB_SIZE. At any time, only a part of this array is used. This subpart is delimited by the CURRENT_TLB_SIZE variable, which separates the usable entries from the currently unusable ones. The Virtual TLB lookup routine is then changed slightly to take the current Virtual TLB size into account as a variable, instead of as a fixed constant. The lookup routine can then make use of it for the hash, and determine which Virtual TLB entry corresponds to which address given the current size.

1 In vanilla QEMU, the victim TLB contains 8 entries.


Figure 3.7: Detail of a resizable virtual TLB

While we don't show the exact generated code, what this implies is essentially just one supplementary load instruction to retrieve the current Virtual TLB size. The second part of the solution is to keep one size per target page table. Indeed, if the size information were global to all processes, the size would be adapted to the global use, and not to the needs of a particular process. This would reduce the interest of the solution, which is to provide a Virtual TLB size that minimizes both misses and flush cost (the latter being linear with respect to the Virtual TLB size). To implement this, the authors simply use the target page table pointer. Indeed, as a flush occurs at each session end, even if the page table was modified, or if the identifier was not unique (pointer reused by the target OS), the simulation would still be correct; the only drawback would be that the next session would begin with an unadapted size. A final optimization of the Virtual TLB made by the authors concerns the handling of large pages. Currently, QEMU does not really handle large pages in its Virtual TLB. Whenever a large page is encountered, only a subpart of it is mapped. The whole address range of this large page is however registered in a second data structure, and when a flush that concerns this range occurs, the whole Virtual TLB is flushed. The problem with this approach is that flushes are fairly costly, as they require zeroing out every entry of the Virtual TLB. This is even worse if the size is large. To avoid this, the authors implemented a partial flush strategy. The idea is to only flush pages according to their address range and no longer flush the full mapping whenever a large page is caught in the flush. The result of the combination of those two approaches is an average speedup of up to 1.89 (speedups depend on the target ISA).
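A minimal sketch of both mechanisms is shown below, with invented names (cur_tlb_size, tlb_entries, occupancy watermarks) rather than QEMU's or the authors' actual ones: the lookup only pays one extra load for the current mask, and the size is adjusted between sessions according to occupancy.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS      12
#define MAX_TLB_SIZE   (1u << 16)  /* maximum number of entries     */
#define MIN_TLB_SIZE   (1u << 6)
#define HIGH_WATERMARK 70          /* enlarge above 70% occupancy   */
#define LOW_WATERMARK  30          /* shrink below 30% occupancy    */

typedef struct {
    uint64_t tag;         /* target virtual page number             */
    uintptr_t host_base;  /* host address backing that page         */
    bool valid;
} tlb_entry_t;

static tlb_entry_t tlb_entries[MAX_TLB_SIZE];
static uint32_t cur_tlb_size = 1u << 8;   /* always a power of two  */

/* Fast path: one extra load (cur_tlb_size) compared to a fixed size. */
static inline tlb_entry_t *tlb_lookup(uint64_t target_vaddr)
{
    uint64_t vpn  = target_vaddr >> PAGE_BITS;
    uint32_t mask = cur_tlb_size - 1;            /* CURRENT_MASK     */
    tlb_entry_t *e = &tlb_entries[vpn & mask];
    return (e->valid && e->tag == vpn) ? e : NULL; /* NULL: slow path */
}

/* Called at each session end, i.e. at each Virtual TLB flush. */
static void tlb_flush_and_resize(void)
{
    uint32_t used = 0;
    for (uint32_t i = 0; i < cur_tlb_size; i++)
        used += tlb_entries[i].valid;

    uint32_t occupancy = used * 100 / cur_tlb_size;
    if (occupancy > HIGH_WATERMARK && cur_tlb_size < MAX_TLB_SIZE)
        cur_tlb_size <<= 1;
    else if (occupancy < LOW_WATERMARK && cur_tlb_size > MIN_TLB_SIZE)
        cur_tlb_size >>= 1;

    /* Flush cost is linear in the current size, hence the interest of
     * keeping the table as small as the workload allows. */
    memset(tlb_entries, 0, cur_tlb_size * sizeof(tlb_entry_t));
}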


3.4 Conclusion and Remaining questions

In the end, we have seen that dynamic binary translation has many uses. Amongst them, full system simulation has been an active research subject, as it is used either for virtualization or for fast simulation. We have seen, in particular, multiple existing approaches that have been devised to accelerate simulation. Amongst them, the most researched ones are code generation optimization and multi-threading. Because of that, we consider both of them already relatively well covered. Another approach, which consists in using hardware acceleration, is also being studied by others; while very interesting and promising, it was not retained for this thesis, which was thought in the context of retargetable simulation. On the other hand, it has been shown that, in the case of full system simulation, 40% of the simulation time on average is spent handling target address space management. Before this thesis, multiple solutions had been developed to tackle that problem. In particular, ESPT and HSPT propose to embed the target address space into the simulator address space. Concurrently to this thesis, CAPTIVE has also been designed to leverage HAV for cross ISA simulation in order to obtain results similar to those of HSPT and ESPT. We find that both approaches are promising in their own directions. In particular, both reach the goal of having a fairly short fast path in simulation. However, in the case of ESPT (and even more so HSPT), the principle of a Linux kernel module assisting DBT was not truly exploited. The module does not provide anything more than an mmap-like mechanism, which leads us to wonder why not use it to provide more custom mechanisms to solve the specific performance issue of virtual to physical address translation, and overall target address space emulation, in DBT. Therefore, designing a module that is more active and specific to DBT will be our focus in Chapter 5. Regarding CAPTIVE, based on HAV, the main limitation we identify is that the simulator is split between the virtual CPU and a normal host process, which incurs complexity in terms of communication between both realms. We also find that developing a kernel should, as much as possible, be avoided, as it ties the solution to a specific host architecture. Designing a solution based on HAV but using a normal Linux process, and which does not require communications between both worlds, will thus be our focus in Chapter 4. Both of these solutions will also be shown to be implementable according to the same design (and potentially reusing most of the simulator code).


Chapter 4

Accelerating DBT using hardware assisted virtualisation for cross-ISA simulation

4.1 Introduction

In this chapter, we give a first answer to the question "What schemes may be devised so as to accelerate target memory accesses simulation?". What we want to obtain is a scheme that, in the context of full system cross ISA simulation, has as low an overhead as possible on memory accesses simulation. To do so, we want to rely on host hardware to manage as much of the memory access simulation process as possible. A solution to manage the memory access simulation, and more generally the simulation, in hardware exists: hardware assisted virtualization (HAV). It provides hardware acceleration for full system virtualization with almost no overhead. However, in its traditional form, it is only usable for virtualization where host and target share the same ISA. Also, as [CWH+14] argued, naïve use of HAV tends to create important communication costs between the hardware accelerated parts and the simulator. Yet, the idea of using HAV still seems promising, as it gives full control over a new address space that is fully hardware accelerated (in particular, address translation occurs as if it were happening on the bare host CPU). This address space could then be used to accelerate the simulation of the target memory accesses. The main complexity is then to use the simulator in such a context, without having to pay the cost of communications between target code and the simulator, and without turning the simulator into a kernel running on a bare bone virtual CPU. Thankfully, this topic has been of interest to the operating system community, even though it was for a different goal: providing users transparent access to privileged CPU features. As a result, a team designed and implemented a tool called Dune [BBM+12], which allows mostly unmodified Linux processes to run directly on top of a virtual CPU. In our context, this Linux process can be an existing simulator in which the memory accesses can be tweaked to use either the CPU or the virtual CPU memory management units. Not surprisingly, the simulator we chose is QEMU [Bel05b] since, as described in Chapter 3, it is widely available and widely used, is open source, is the one among all those we know of that provides the largest number of hosts and targets, and, even though still evolving, is quite stable.

We will detail in this chapter how we use QEMU and Dune to provide a new solution in which the simulation of the target memory accesses is managed by the host hardware, and in which the costs of communications related to HAV are avoided for all aspects of the simulation.

4.2 Background

Before delving into how we design and implement our solution, let us present the background techniques we used to achieve our goal. We will specifically detail how HAV functions, how the Dune framework uses it and the implications of using this framework for our work.

4.2.1 Hardware Assisted Virtualization

Let us first discuss the way HAV works and how to use it. The base idea of HAV is to duplicate the view of the host CPU, creating two independent visions of the same CPU. One view, the root mode, remains the standard host view, where the simulator managing the simulation runs. The other view, guest mode, appears as a naked CPU, called the virtual CPU, and is the one used for virtualization itself. On top of this naked view of the CPU, it is indeed possible to run an unmodified OS. The resulting layout is described in Figure 4.1.

Figure 4.1: HAV memory layout and virtual machine exit

As can be seen in this figure, the guest OS has access to hardware that is perfectly alike that of a real CPU. It sees a full address space that it can modify as it wishes, and it has direct access to all CPU architectural registers. Notably, the page table register (CR3) is also accessible, allowing the guest OS to manage its own page tables. This page table register points to a guest page table, allowing translations from guest virtual addresses to guest physical addresses. These guest physical addresses are then seen by the host OS (Linux in the figure) and the Virtual Machine Manager (VMM, typically KVM+QEMU) as host virtual addresses. Quite importantly, for most CPU types, and specifically the Intel ones that are typical simulation hosts, the guest page tables are walked in hardware, and translations are cached in translation lookaside buffers (TLB). The second part of the translation, from host virtual to host physical, is also chained directly and done in hardware. This means that, for this simulation subpart, performance will be close to native. Such a translation is called Second Level Address Translation (SLAT). Also, the guest sees MMIO registers and device ports as usual. Typically, those devices will however not be the real ones, but will be emulated by the host. An example of a device access can also be seen in Figure 4.1. The guest, running in non root mode, attempts an access to the device at address 0x70005000. This access traps to the Linux-KVM host, and the trap is directly forwarded to the VMM. The VMM then executes the necessary emulation code for the given device, and gives control back to the guest. The guest is completely oblivious to the whole process. This is what we call a VM exit. From the host OS point of view, the virtualized guest is pretty much a user side process: the host can schedule or unschedule it as usual, and it handles its memory as it normally would, except that what it manipulates are in fact guest physical pages. There are two page table registers that are chained: CR3, which gives the guest virtual to guest physical translation, and the EPT (extended page table), which gives the guest physical to host physical translation. The host allocates memory as normal as well; it can swap pages out if there is not enough host memory, and so on. The most important part is that all of this is fully transparent for the guest. Page faults, on the other hand, are slightly different. If the page fault happens during the CR3 walk (a guest protection fault for example), the fault is transmitted to the guest, and not to the host. Conversely, if the fault happens during the EPT walk (the host swapped out a guest page), the fault is transmitted to the host only. Another difference is that context switches and traps to the host, which are called VM exits, require saving the full architectural state of the virtualized process (everything in the VCPU in Figure 4.1, and in particular the virtual register file and CR3). Furthermore, traps, instead of going to the host OS, go to specific entry points in the VMM, running in root mode. This, in turn, means that operations such as traps and context switches will in fact be much costlier than for a standard process. If the guest OS is unmodified, accesses to devices will be a large cause of such traps. Indeed, devices are usually memory mapped (memory mapped input/output, or MMIO), so that an access to a device traps to the host, which then handles the device access by executing its specific emulation code, pretty much in the same way as in DBT.
If the device is complex, like a graphics card for example, and many accesses are made to those MMIO registers, then a large number of traps will happen, thus slowing down the simulation. Virtualized OSes are however often paravirtualized, and are thus not completely oblivious to the host. In particular, virtualized OSes can voluntarily call the simulator running in root mode when it is most convenient for them. They could, for example, prepare a batch of operations to transmit to the simulator and only then make a single exit, during which the batch is processed. This is done through an instruction called vm_call.

Multiple sources of VM exits exist, the most standard one being device emulation. The idea of paravirtualization is then often to reduce the number of VM exits as much as possible by paravirtualizing the devices which require many trapping instructions. An example of a guest OS communicating with the host through vm_call can be seen in Figure 4.2.

Figure 4.2: Principle of a vm_call based communication

In this figure, the guest OS is trying to communicate to the host a batch of commands that it puts in a specific page at physical address 0x44000000. First, it writes the necessary data at that address. Then, the guest calls the simulator using a vm_call instruction. The simulator takes back control of the CPU, executes the commands as needed, and transfers control back to the guest. The main advantage of this method is that only a single round trip is made. The other advantage is that, potentially, the commands could be executed on another thread, thus giving control back even faster.
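As an illustration of such batching, here is a sketch under stated assumptions: the command layout, the use of the 0x44000000 page (assumed identity-mapped in the guest for simplicity) and the hypercall number passed in %rax are invented for the example, since the actual ABI is defined by the hypervisor. Only the VMCALL instruction itself is a real hardware mechanism.

#include <stdint.h>

#define CMD_PAGE_PADDR 0x44000000UL /* shared page agreed upon with the VMM */
#define MAX_CMDS       64

/* Hypothetical command format shared between guest and simulator. */
struct cmd {
    uint32_t op;      /* e.g. device register write */
    uint32_t reg;
    uint64_t value;
};

struct cmd_page {
    uint32_t   count;
    struct cmd cmds[MAX_CMDS];
};

/* Identity-mapped here for simplicity; a real guest OS would map it. */
static volatile struct cmd_page *cmd_page =
    (volatile struct cmd_page *)CMD_PAGE_PADDR;

static void queue_cmd(uint32_t op, uint32_t reg, uint64_t value)
{
    uint32_t i = cmd_page->count;
    cmd_page->cmds[i].op    = op;
    cmd_page->cmds[i].reg   = reg;
    cmd_page->cmds[i].value = value;
    cmd_page->count = i + 1;
}

/* Flush the whole batch with a single VM exit.  The hypercall number
 * convention (here, in %rax) is hypervisor specific. */
static void flush_cmds(void)
{
    unsigned long nr = 0x42;  /* hypothetical "process command page" call */
    asm volatile("vmcall" : "+a"(nr) : : "memory");
    cmd_page->count = 0;
}

The point of the design is visible in flush_cmds: any number of queued operations costs exactly one VM exit instead of one per device access.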

4.2.2 Dune framework

Let us now explain how the Dune framework [BBM+12] builds upon HAV and its mechanisms to provide an augmented Linux process able to manipulate hardware directly. The goal of the Dune framework is to reuse HAV to execute an unmodified Linux process, instead of an OS, on top of a virtual CPU. The point is to allow user side processes to leverage hardware, and in particular privileged instructions, directly. To do this, Dune builds upon paravirtualization support. It populates the non root mode CPU with the process address space and with its own library OS, which provides the routines to make the virtual CPU environment transparent to the process.

Then, when the process traps (for a syscall, for example), the library OS catches the trap and uses a vm_call to transfer it to the host. The way all the elements interact is described in Figure 4.3.

Figure 4.3: Process of a syscall in Dune

As we can see in the figure, the process runs on top of a thin software layer called libDune. This layer is basically a library OS taking the place of Linux for the process. This makes the process look pretty much like a standard Linux process from its own perspective, and all the internal program code runs normally. However, syscalls no longer go directly to Linux. Instead, the Dune library OS catches them and makes a vm_call, which is caught by the Dune kernel module. This vm_call actually matches the syscall made by the program, and the Linux kernel module simply transmits it as such to the standard Linux syscall handling code. The result is then passed back to the library OS, which itself passes it back to the program. Applying this to our simulator, it can now execute on top of HAV. Now the question is: what exactly have we gained by doing so? In fact, while Dune allows processes to run unmodified on top of HAV, this is obviously not the main point for us, or even for the authors. Indeed, by default, applications now run in privileged mode (on the guest vCPU).

As such, we have access to all the supervisor-only registers, in particular the page table pointer and the interrupt vector table, which allows us to choose which handler to use for every type of trap and exception. This means that we can set up our own fault handlers and get called directly, without the costly intermediary of a kernel. As a reference, the authors of Dune estimate that a page fault handler in Dune is called four times faster than a Linux SIGSEGV handler (the segmentation/page fault handler). This also means we can modify our own page table and remap whichever part of it we wish, however we wish. This, in turn, gives us a path to reach our goal of accelerating memory accesses: if we can freely manipulate our address space, why not use this to map the simulated target address space onto the host? This is what we have tried to achieve, and we explain it in the following sections.
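A minimal sketch of how a simulator could hook its own page fault handler through libDune is shown below. The libDune entry points and handler signature used here (dune_init_and_enter, dune_register_pgflt_handler, struct dune_tf) are recalled from the Dune project and should be treated as assumptions rather than an exact API; the fill/forward helpers and the MBA value are purely illustrative, and the program must of course be linked against libdune and a real implementation of those helpers.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed libDune interface (see the Dune project for the real one). */
struct dune_tf;                            /* trapped register state     */
typedef void (*dune_pgflt_handler_t)(uintptr_t addr, uint64_t fec,
                                     struct dune_tf *tf);
extern int  dune_init_and_enter(void);
extern void dune_register_pgflt_handler(dune_pgflt_handler_t handler);

#define MBA 0x10000000ULL                 /* Mapping Base Address example */

/* Illustrative helpers provided by the simulator. */
extern int  fill_shadow_entry(uint64_t target_vaddr);   /* internal fault */
extern void forward_target_fault(uint64_t target_vaddr, uint64_t fec);

static void simu_pgflt_handler(uintptr_t addr, uint64_t fec,
                               struct dune_tf *tf)
{
    uint64_t target_vaddr = (uint64_t)addr - MBA;
    if (fill_shadow_entry(target_vaddr) != 0)
        forward_target_fault(target_vaddr, fec); /* real target page fault */
    (void)tf;
}

int main(void)
{
    if (dune_init_and_enter() != 0) {
        fprintf(stderr, "failed to enter Dune mode\n");
        return EXIT_FAILURE;
    }
    dune_register_pgflt_handler(simu_pgflt_handler);
    /* ... start the DBT main loop here ... */
    return EXIT_SUCCESS;
}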

4.3 Design

Let us now study the design of our solution. As stated in the problem statement, our goal is to accelerate the simulation of memory accesses in full system DBT. In particular, we wish to do this by mapping the simulated target address space inside our simulator's address space. The goal of this approach is to reduce the fast path we described in section 2.2.3 to a single load or store instruction. A secondary objective is to leave QEMU mostly untouched, so that our changes are minimal and easily incorporated in new versions of QEMU. To move towards these objectives, the trick is to provide two memory translations within the QEMU process. The first one is the standard translation that QEMU code, as any process running on an operating system, expects. The second one is a shadow translation that the target code, generated by the DBT process, relies upon. The resulting memory layout can be seen in Figure 4.4. The first important fact to notice is that the standard QEMU translation remains untouched and fully functional. All of the QEMU offline code (i.e. code that is not in the generated code cache) can continue to use this standard path as usual, without modifications. As an example here, the target page located at address 0x2000 is translated to the simulator virtual address 0x106000 by the standard Soft MMU/Virtual TLB path. This page is a representation of a target physical frame. At the same time, an embedded view of the target address space is included in the simulator address space. This area, which we here call the "shadow mapping", is the one which will be accessed by the generated code. In our example, this mapping is located at 0x10000000, an address which we call the Mapping Base Address (MBA). Now, the same target page as before, at address 0x2000, will be mapped at MBA+0x2000. Thanks to this, the generated code, instead of going through the Soft MMU/Virtual TLB code path, can now make a direct access to this mapping. We refer to this direct access to memory as an "inline" memory access. Now, instead of executing the generated code presented in section 2.2.3, the access can be done using a simple mov [MBA+0x2000], dest instruction. The underlying mechanism that makes this mapping work is presented in Figure 4.5. Looking at the left side of the figure, what we see is that, to make the shadow mapping work, we have to alias its pages to the same guest physical frame as the one used by the standard Soft MMU/Virtual TLB code path. The idea is that, with this done, a fetch or a write to either 0x106000 or MBA+0x2000 in our example will have the exact same effect. If a fetch is done, the same value will be returned, and if a write is done to either of these addresses, the underlying value will change at the same time for both, since they translate to the same physical address.
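Conceptually, the inline fast path boils down to an address computation and a dereference. The following sketch (with an invented mba variable standing for the Mapping Base Address; the actual generated code keeps this offset in a host register) shows what a target 32-bit load and store become once the shadow mapping is in place:

#include <stdint.h>

/* Mapping Base Address: start of the embedded target address space
 * inside the simulator.  Illustrative value from the example above. */
static uint8_t *mba = (uint8_t *)0x10000000;

/* Inline simulation of a 32-bit target load: one address computation
 * and one host memory access.  A miss in the shadow mapping simply
 * page-faults and is resolved by the simulator's fault handler. */
static inline uint32_t inline_load32(uint32_t target_vaddr)
{
    return *(uint32_t *)(mba + target_vaddr);
}

/* Same thing for a store. */
static inline void inline_store32(uint32_t target_vaddr, uint32_t value)
{
    *(uint32_t *)(mba + target_vaddr) = value;
}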


Figure 4.4: Memory layout from the simulator point of view

One difference between the two mappings however concerns their access rights. For the standard translation, the access rights are actually pretty much irrelevant, and will usually be read/write no matter what. The reason for this is that, in the case of this translation, a lookup of the Virtual TLB is done before any access. This lookup then proceeds to an access right check to verify whether the target access is legal or not. However, in our case, the access is done directly inline, without any software check. This implies that our view of the mapping must be given the same access rights as the target virtual page. Thankfully, since we are running on top of Dune, we have access to the page table used by the Linux process running the simulator, and can thus mirror those rights manually. As a result, illegal accesses will trap, and the trap will be forwarded correctly by the simulator to the target CPU it simulates. On the right side of Figure 4.5, the last level of memory translation is described. Indeed, the simulator is running on top of Dune, and is therefore, itself, a guest program. Because of that, a guest physical address needs to be translated to a host physical address. While this causes a slight slowdown in the virtual address translation process compared to a fully native program, it also has an interesting property. All the virtual addresses from the guest point of view are guest controlled. In other words, no guest page swapping happens without the guest knowing. Conversely, any manipulation of host pages made by Linux is invisible to the simulator. Even better, Dune takes over page fault handling only for the page table entries that were modified by the guest. As such, faults happening in non modified parts of our simulator will still be handled by Linux, with us being oblivious to them.


Figure 4.5: Memory layout from the guest physical memory point of view

This is particularly advantageous, as it simplifies the implementation a lot: we do not have to worry about changes that could happen to our translations or to the standard virtual address translations (page swapping or page migration, for example), or even about copy-on-write strategies used by the process itself.

With this translation in place, target memory should now be directly accessible inline from the generated code. A first consequence is that it should bring a large speedup, since the size and complexity of the generated code are drastically reduced. A second consequence is that, compared to the QEMU Virtual TLB, our mapping basically has an infinite capacity from the target point of view, that is, every last target entry could potentially be mapped. This should in turn solve the problem of TLB misses for already-seen translations altogether. Overall, at least in the case of RAM accesses, we can now hope to obtain near native performance.

A problem might still exist however, due to the communications between the host and the guest. Just as in [SWF16], we need to simulate devices. Most of those actually use the Memory Mapped IO (MMIO) model, i.e. device registers are directly mapped in memory. These devices will thus be accessed directly inline, but no mapping to a target physical frame will exist, as those are not RAM. At the end of the day, the access will trap. From this trap on, we can leverage QEMU code to do the emulation of the device (more on that in Section 4.4). This means two things. The first is that, compared to [SWF16], we do not have non-root to root mode communications; as such, we can expect at least a simplification of the code here and, with multiple CPUs, maybe a performance gain, as [SFGP15] suggests. The second is that we still get a trap before every device access. Other embedded mapping methods such as [WLW+15] have the exact same problem: they use a SIGSEGV handler to manage these cases. Yet, Adam Belay in [BBM+12] shows that a page fault in Dune is about four times faster than a SIGSEGV. As such, we can definitely expect an important reduction of the communication overhead compared to those solutions.


4.4 Implementation

Even though relatively simple conceptually, our solution needs to solve many intricate technical issues at the hardware/software interface. To ease the explanation, we consider throughout this section that we are working with an ARMv7 target and an x64 host. Other QEMU targets are however usable, as long as they use 32-bit addresses. Also, please note that whenever we talk of the "guest" page table, we refer to the page table used by the simulator process, which is a guest for the Linux host.

Figure 4.6: x64 (host) hierarchical page structure

What we mainly used for this implementation are the facilities provided by Dune, in particular the guest page table pointer and the interrupt vector. Using the Dune framework enables us to only slightly modify QEMU in order to leverage the SLAT to create our own embedded target address space and to install simulation specific page fault handlers. The first step we have to take is to allocate the virtual memory we need to embed the target address space. To do so, we manipulate the guest page table directly. First, we need to find a free location in the guest page table. Since we are simulating a 32-bit target, its address space will be no larger than 2^32 bytes. Now, if we look at the guest page table of an x64 Intel host, as illustrated in Figure 4.6, it currently (as of June 2018) still has 4 levels (each index being 9-bit long, plus 12 bits used as the offset in the page), so virtual addresses are 48-bit wide. The CR3 register points to the root of the page table, the page-map level-4 table, and thus covers 256 TiB (2^48 bytes) of memory. This is more than enough, but not directly usable, as the simulator needs to keep some address space for its own usage. Each page-map level-4 table entry covers 512 GiB, which is largely usable as there are 512 of them. It points to a page directory pointer table, whose entries (PDPTE) point to a page directory and each cover 1 GiB of virtual address space. As 1 GiB is too small for our use, we choose to reserve a full page-map level-4 entry, i.e. an entire page directory pointer table covering 512 GiB, for the target address space. One important note is that only the page table entries we modify will be managed by Dune and our code. This is what we can see on the right side of the figure. All the page table entries we did not modify remain under the control of Linux. Due to this, any fault happening in those zones will not be forwarded to us. Conversely, every entry we modified will be managed by us: if a fault concerning one of those entries happens, Linux will be oblivious to it, our handlers will automatically be called, and we will be free to make any modifications we wish.
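To make the 4-level structure concrete, the sketch below (plain C, independent of Dune or QEMU) shows how a 48-bit host virtual address decomposes into the four 9-bit indices plus the 12-bit page offset, which is exactly what is needed to locate the page-map level-4 slot to reserve:

#include <stdint.h>
#include <stdio.h>

/* Index of a 48-bit x64 virtual address at each paging level:
 * level 3 = PML4, level 2 = PDPT, level 1 = PD, level 0 = PT.
 * Each level indexes 512 entries (9 bits); the low 12 bits are
 * the offset within the 4 KiB page. */
static inline unsigned pt_index(uint64_t vaddr, int level)
{
    return (vaddr >> (12 + 9 * level)) & 0x1ff;
}

int main(void)
{
    uint64_t vaddr = 0x0000100000002000ULL; /* arbitrary example address */

    printf("PML4 index: %u\n", pt_index(vaddr, 3));
    printf("PDPT index: %u\n", pt_index(vaddr, 2));
    printf("PD   index: %u\n", pt_index(vaddr, 1));
    printf("PT   index: %u\n", pt_index(vaddr, 0));
    printf("offset    : %llu\n", (unsigned long long)(vaddr & 0xfff));
    return 0;
}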


Figure 4.7: QEMU target memory access with our solution

Now that we have chosen this solution, we have to generate the right code for memory accesses so that it uses this embedded target address space. This code can be seen on the left-hand side of Figure 4.7. The first two lines are standard QEMU code and are used to simulate the pc,#4 part of the target instruction, i.e. retrieve the value of pc and add 4 to it. The next line, mov $oi,%rsi, corresponds to passing an argument to our fault handler, in case it needs to be called. The last line does the actual access, here a fetch of data. The address of the fetch is computed using an offset contained in r15. This offset is the MBA that we introduced and explained how to use in section 4.3. As can be seen on the right side of the figure, page faults will also sometimes happen. Indeed, we fill our mapping lazily. In other words, at first our mapping is empty, and whenever a first access happens, a page fault is triggered. At this time, the mapping will be aliased to the same guest physical address as the corresponding simulated physical frame in the simulator. Another potential cause of page fault is that either no mapping exists, because none can be made, or that the protection of the page is violated by the current access (a write to a read-only area for example). This can be due to two causes: either the accessed memory is an MMIO register, or a target page fault is happening. If the fault is caused because we did not yet fill the mapping, it is an internal fault; otherwise, it is a real target page fault. Now the challenge comes from managing those faults correctly, which has to be done in the simulator page fault handler. Algorithm 2 gives an overview of the way a page fault is handled. Note that the available data when entering the handler are the following:

• oi, which contains the mode in which the access was done (user/supervisor) and various state information for the memory access,

• pf, which reflects the cause of the page fault. More precisely, it contains:

– a boolean rw indicating whether the page was accessed for reading or writing,
– the state of the architectural registers reg of the simulated target,
– the index x of the target register whose value is either updated, in case the instruction that led to the page fault is a load, or used, if it is a store,
– the address at which the fault occurred, addr,
– and finally the address at which the handler will return, raddr.

We first call the QEMU softmmu_helper (line 2) in an ad-hoc mode, asking the helper to only perform the target virtual to host virtual translation, but not the actual read or write in memory. If the address is valid, we need to do a page table walk and provide the handler with the host address corresponding to the access.

The page table walk is clearly an architecture dependent algorithm, but since we use the one provided by the existing target implementation in QEMU, we stay fully retargetable. If the translation occurs without any host fault (line 3), this address is used to update (fill_page_table, line 4) our shadow mapping by making the faulting address translate to the same physical address as the host virtual address given by QEMU (get_phys_addr, again line 4). Otherwise, there are two possible causes: either the access is an IO, or it is a page fault. If it is an IO, the handler must not establish a mapping, as IOs are handled by executing specific callbacks in QEMU. The reg[x] register value will either be updated using a value from a device register or be used to update the register of the device (∗ means pointer dereference in the algorithm), and the shadow mapping will be left unmodified (line 12). In the case of a page fault, the target should be passed the fault and left to decide how to react to it. However, since the code is executed in an exception handler, it is not possible to directly jump to the target exception vector. Indeed, in case of a target fault, QEMU needs to make a longjmp to its top level code so as to generate the code for the handler and then execute it. The solution is to rewrite the return address of the handler (line 14) so that execution will continue in the softmmu_helper, the code that an unmodified QEMU would normally use to handle target page faults. This works perfectly, since returning from an interrupt handler restores all the values in the target registers reg, exactly the values needed by the softmmu_helper to interpret the page fault correctly. From there, QEMU will handle the fault as it usually does, relying on the standard memory translation, not the shadow translation.

Data: oi: softmmu info, pf: page fault info
Result: updated shadow page table or IO access executed
 1  target_addr ← pf.addr − offset
 2  host_addr ← softmmu_helper(oi, target_addr)
 3  if is_valid(host_addr) then
        // Update host page table
 4      fill_page_table(addr, get_phys_addr(host_addr))
        // Access memory as required
 5      if pf.rw = r then
 6          pf.reg[x] ← ∗host_addr
 7      else
 8          ∗host_addr ← pf.reg[x]
 9      end
10  else
11      if is_io_addr(host_addr) then
12          io_helper(pf.rw, host_addr, pf.reg[x])
13      else
            // Rewrite the code to which the handler will return
14          ∗pf.raddr ← asm_to_bin("call softmmu_helper")
15      end
16  end
17  return from exception

Algorithm 2: Page Fault Handler

Most of the time, the target will react to a page fault by changing its virtual address translation, something that needs to be reflected in our shadow translation. Fortunately, we can rely on the QEMU Virtual TLB, since we share the same coherency requirements with it. Essentially, whenever QEMU flushes its Virtual TLB, we also need to flush our shadow translation.

This is not optimal, since we are flushing the entire shadow translation when we could update only individual entries, but it is optimal with regard to our objective of keeping QEMU mostly unmodified. Going further would require that we manage flushes at the target level, i.e. catch the target page table modifications by ourselves. While not extremely challenging in itself, this would lead to a large number of other issues. Amongst them, for it to be of any use, we would also have to manage the target page walk. Indeed, knowing which page to flush is one thing, but we also need to know precisely its size. The QEMU Virtual TLB actually solves this problem by managing only 4K pages. Each 4K page is aware of being part of a large page only to a very limited extent, the goal being that if one page is modified, the whole mapping is flushed out of the TLB. This essentially means that we cannot directly rely on the Virtual TLB to achieve this. The other issue is how many individual pages would be flushed. It has to be understood that, as we flush our mapping, we also have to flush our guest TLB. Flushing the TLB, besides the obvious cost due to future TLB misses, also costs a certain number of cycles in and of itself: around one hundred cycles to flush the whole mapping, but around four to five hundred cycles per page when flushing page by page. In the case where we would flush a target large page, say an ARM one-megabyte page, which we would have had no choice but to represent as two hundred and fifty-six 4K host pages, we would then spend around one hundred and twenty-eight thousand cycles flushing that single page. If multiple large pages were to be flushed, the negative performance impact would completely offset any potential gains.

4.5 Conclusion

In the end, this work has managed to accelerate the simulation of target memory accesses in the context of DBT based cross-ISA simulation. To do so, we leverage the facilities provided by the Dune framework [BBM+12] to create an embedded target address space inside the address space of the simulator, QEMU here. We then use this embedded target address space to execute target memory accesses directly in generated code, without the Virtual TLB as an intermediary. We couple it with the ability to install our own page fault handlers so as to both fill and synchronise this mapping, and to simulate target page faults. The result is a solution where QEMU runs on top of Dune so as to be able to leverage its facilities and accelerate target memory accesses simulation. This work gives us two vectors of speedup: accessing target memory in generated code now only needs a couple of mov instructions, and the Virtual TLB limitations are gone, with a basically unlimited ability to store translations. The latency to present a page fault to the simulator (internal or target) should also be about four times lower than in [WLW+15], as according to [BBM+12] handling a SIGSEGV takes four times the time it takes to handle a Dune page fault. Compared to [SWF16], we also do not have to worry about communication between the different parts of the simulator, since the whole simulator is running on top of a VCPU. There are however still a few problems with this idea. HAV is not a fully standard mechanism: each architecture, including sometimes those using the same ISA, implements its own version of it. This is a first worry for portability. Dune itself is not a trivial application; porting it to another architecture, especially since it depends directly on HAV support, is bound to be complicated. This essentially means that while we retained QEMU frontend retargetability, it seems that we at least partially lost this ability on backends. Secondly, Dune is no longer supported and does expose bugs in quite a few situations.

This is not surprising for a program as complex as a simulator (many threads, dynamically generated code with hand-crafted contexts...), and while already quite impressive, Dune is still only an academic realization. One of the results is that the ability to debug has also been largely reduced. Another limitation is that the use of Dune requires QEMU to be compiled in static mode. While not a problem in itself, compiling QEMU in static mode disables many of its features, such as its support for graphical interfaces and several target devices. This is a limitation due to the current state of the implementation of QEMU, and it bears no consequence on the validity of our research results, whose objective is to exhibit memory management issues and potential solutions for better simulation performance. Finally, HAV may not always be available. If the workload has to run on a remote virtual machine for example, the native HAV will already be in use. This means that nested HAV will be necessary for Dune, which in turn means potentially reduced performance. This work also has some solution specific limitations. As we just discussed, our management of the shadow translation is not optimal, since we flush the entire translation when we really should only flush the Virtual TLB of the vCPU in Dune. But there is a trade-off to finer-grain coherency. Future work needs to assess the gain in performance to be expected from finer grain coherency versus the consequences of modifying QEMU further, diving into the details of QEMU's software MMU simulation. In particular, we would need to assess the portability of such modifications across versions of QEMU, but also across the different targets supported by QEMU. A more severe limitation is that we are limited to 32-bit targets. This is mainly due to an address space collision problem. Basically, we are embedding the target address space into the simulator address space. Now, if we imagine that we have a 64-bit target, directly mapping the target address space would clearly be a problem: we would overlap with the simulator address space, which is not possible. However, one direct solution to this problem, with an extended use of Dune, would be to create a second address space just for the target. This would not be fully straightforward to implement, as we would also need to map the generated code as well as entry points into this address space to still be able to call QEMU helpers. However, we can imagine that we could support up to 47-bit wide target addresses effortlessly, and considering that AArch64 (the 64-bit ARM ISA) still mostly uses only 42 bits, this would be amply sufficient for all current, short-term and mid-term embedded uses, and probably even for today's smartphones. This could even be naturally extended to 56 bits with Intel 5-level page table support, which is already designed although not yet available in products. In that case, any ARM device could be supported for the foreseeable future. Overall, the HAV related limits of this solution lead us to explore another path: an analogous solution without the use of HAV. Its requirement would be, in particular, to expose custom fault handling so as to keep fault handling fast, the goal being to keep the fault handler latency much lower than in an mmap based solution such as [WLW+15].


Chapter 5

Cross-ISA simulation using a Linux kernel module and host page tables

5.1 Introduction

In this chapter, we try to bring an answer to the questions "What is the most efficient method to implement memory simulation acceleration schemes?" and "Are the designs we choose to accelerate target memory access simulation implementation specific, i.e. tied to the underlying OS or framework we use, or can they be reused atop any of those?", while still adding to the answer to the first question, "What schemes may be devised so as to accelerate target memory access simulation?". In the previous chapter we presented a solution based on hardware assisted virtualization and the Dune framework to accelerate the simulation of memory accesses by a target CPU. This solution was, in fact, mainly based on embedding the target address space into the address space of the simulator, thanks to the hardware support for a second level page table. It also provided a fairly fast page fault handling mechanism, since it allowed us to manipulate this second level of page table directly from the user application, keeping the host OS away from the page fault handling path. However, we also discussed the problems brought by the use of HAV and Dune, such as interoperability and debugging complexity. In this chapter, we propose a solution similar in spirit, but which does not depend on HAV or Dune. Just as for the Dune based solution, we aim at embedding the target address space into the simulator one. However, we only manipulate the host page table to do so, and never use the second level address translation as we did before. Quite obviously, the simulator, now that it is again a standard Unix user process, can no longer directly manipulate its page table. To gain access to the host page table, we will thus rely on a Linux Kernel Module (LKM). We are aware that a solution based on mmap is possible, as detailed in [WLW+15]; however, we want to explore a new path with an LKM so as to try to obtain a page fault handling latency as close as possible to that of the Dune solution. We are also aware of the work of [CWH+14], which tries to use an LKM, but it is basically only a reimplementation of mmap without any real addition to it. In their solution, accesses to the embedded target address space require synchronizations, which are done through page faults. However, these page faults go through the standard Unix SIGSEGV mechanism. As such, each fault must go through the whole Linux page fault handling mechanism, plus the relatively costly Unix signal mechanism. In order to do better than a basic mmap, our LKM provides a custom page fault path to avoid the signal problems. In its proof of concept

form, it also exposes the ability to manage, in kernel, part of the target memory emulation, thus making the fault handling path shorter. Our main hope with this method is to obtain a lower page fault handling overhead compared to [CWH+14] and [WLW+15], which basically follow the standard Linux page fault path. This solution was designed to be general, but as it is quite deeply embedded in both the operating system and the dynamic binary translation engine, we detail and implement it once again with QEMU and a Linux kernel module.

5.2 Design of our Solution

The goal of our solution is to strive toward native performance for simulated memory accesses, translating target memory accesses as directly as possible into host memory accesses. With an Arm target and an Intel x64 host, this means translating an ldr or a str instruction into a single mov instruction. Our design relies on two key points: mapping the target virtual space into a region of the host address space, and fully controlling the mapping of that region via the host memory management unit (MMU).

Figure 5.1: Embedding target address space (a)

Figure 5.1 depicts the mapping of the target address space in a region of the host address space from the simulator perspective. First of all, the usual QEMU translation, represented by green lines in the figure, is still present. We see the example of a 4 KB target page at target

address 0x2000 being translated through the usual Virtual TLB/softmmu_helper algorithm to a 4 KB host virtual page at address 0x106000 representing the target physical frame. This translation remains the one in use by the various helpers used by QEMU for the simulation. In parallel, a 4 GB contiguous virtual memory area is reserved in the simulator address space. This is the space that will host the mapping used by the "inline" memory accesses, i.e. the ones in the generated code that do not go through helpers. This mapping, in this example, resides at address 0x10000000, which we shall call MBA, for Mapping Base Address. There, the aforementioned target page at target address 0x2000 will be mapped at host virtual address MBA+0x2000. This mapping will then be usable at runtime, directly in inline code, through a simple host mov instruction, e.g. "mov [target_addr+MBA], src" for a store, to execute any memory access on that page. What this means is that we now have the possibility to translate target memory accesses into essentially one host memory access, with possibly an arithmetic instruction to add the MBA to the target address.
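As a minimal C-level sketch of that idea (the MBA value is the one from the example above; a compiler turns each of these accesses into a single mov, with the base folded into the addressing mode):

#include <stdint.h>

/* Base of the 4 GB region reserved for the embedded target address space
 * (value taken from the example above). */
#define MBA 0x10000000UL

/* Inline target load: one address computation plus one mov.  If the page at
 * MBA+target_addr is not mapped yet, the access faults and the fault
 * handling path described below fills the mapping. */
static inline uint32_t inline_load32(uint32_t target_addr)
{
    return *(volatile uint32_t *)(MBA + (uintptr_t)target_addr);
}

/* Inline target store, symmetrically. */
static inline void inline_store32(uint32_t target_addr, uint32_t value)
{
    *(volatile uint32_t *)(MBA + (uintptr_t)target_addr) = value;
}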

Figure 5.2: Embedding target address space (b)

The way this mapping actually functions is described in Figure 5.2. If we look first at the target physical frame, as seen by the simulator, we see it is now mapped at host physical address 0x6000, this address being decided arbitrarily by the host OS. Any modifications to host physical address 0x6000 will then be seen by the simulator, and any fetch will return the value seen by the simulator. To implement our mapping, we then have the host MMU

alias our representation of the target page, at address MBA+0x2000, to the host physical address 0x6000, which is the same as the simulator representation of the target physical frame. To find this address, we simply call QEMU standard code at the time of the first access to MBA+0x2000. Essentially, this means that now, any host access to either virtual address 0x106000 or MBA+0x2000 will be strictly equivalent. Loads will return the same value, and stores will update the same physical address, therefore later loads at both virtual addresses will return the updated value. However, one important difference is made between these pages in terms of how they are accessed. Indeed, 0x106000, which is used by the simulator and never directly by the host code resulting from a translation of the target code, is always protected by runtime checks. As such, it can freely be given read/write access rights at any time to simplify simulation. On the other hand, MBA+0x2000 will be accessed inline, without runtime checks, by the target simulation code. As such, it needs to reflect the same protections as the target page that includes target address 0x2000; otherwise, illegal target accesses may succeed even though they should trap, making the simulation incorrect. With appropriate access rights, on the other hand, illegal target accesses, such as writes to read-only pages, will be caught and the fault will be transmitted correctly to the target, thus correctly emulating target behavior. Finally, it must be remarked that we always speak of target addresses and target address space, and not target virtual addresses or target virtual address space. This is due to the fact that our mapping is actually totally oblivious to the type of target address space in use; the only information we need is its size. To obtain the actual host physical address we need to map to, and the target access rights, we simply reuse the QEMU softmmu_helper, keeping our own code oblivious to those details. This means that we are also completely target-ISA independent as far as the mapping goes. Indeed, any and all target specificities, such as the exact address space size or type (physical or virtual), are already covered by the QEMU softmmu code. We do not make any virtual to physical translation by ourselves, and we never read or write any target register. And the addresses we manipulate ourselves are host virtual addresses representing target physical frames, which are by definition completely independent of the target. This design is quite straightforward and can be expected to exhibit near native performance, but it raises a few questions. First, the control of the host MMU is privileged on all general purpose operating systems, but fortunately, most operating systems are also extensible with user-defined modules. So we will use a kernel module to manipulate the host MMU; more on this in Section 5.3. Second, respecting the target semantics (access size, sign extension, endianness) may sometimes be tricky. Indeed, while in the usual generated code most details of the semantics are already handled by QEMU (access size, sign extension, endianness, etc.), in our version of the generated code, and in our fault handler, those are our responsibility.
Most of the time, however, our solution works as described above, with only a single mov instruction, chosen to fit the sign extension, size, and other constraints, plus an arithmetic instruction per translated target ldr or str instruction. But depending on the target and host processors, the translated snippet of code might require more instructions than just a single mov instruction. Simulating different access sizes or sign extension might be tricky, but it can usually be done by choosing the right sequence of host instructions. Simulating traps, like page faults and protection faults, may prove more difficult if the target MMU and the host MMU do not share similar protections and traps, which is fortunately also rare. If the target and host do not share the same endianness, we will need a few more instructions to shuffle bytes. In most cases, the host processor has an instruction to shuffle bytes, such as the Intel SIMD SSE3 instruction pshufb (packed shuffle bytes) or the x86 bswap instruction, tbl on AArch64 or vperm on Power.
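As an illustration (a C-level sketch; the generated host code implements the same logic directly, and the helper names are hypothetical), a sign-extending byte load and a cross-endian word load through the embedded mapping could look as follows; GCC and Clang lower the cast to a movsx and the builtin to a bswap (or a movbe load) on x86-64:

#include <stdint.h>

#define MBA 0x10000000UL   /* mapping base address, as in the example above */

/* Signed byte load with sign extension to 32 bits. */
static inline int32_t inline_load8_signed(uint32_t target_addr)
{
    return *(volatile int8_t *)(MBA + (uintptr_t)target_addr);
}

/* 32-bit load from a big-endian target on a little-endian host:
 * read the raw bytes, then swap them. */
static inline uint32_t inline_load32_be(uint32_t target_addr)
{
    uint32_t raw = *(volatile uint32_t *)(MBA + (uintptr_t)target_addr);
    return __builtin_bswap32(raw);
}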


We shall however note that the handling of the above tricky cases is not specific to our solution; all DBT techniques, including baseline QEMU, face a similar extra overhead in order to handle them. In contrast, simulating processor privilege levels requires special attention in our solution. Indeed, we cannot rely on the host MMU to do the checks related to the processor mode (kernel or user), since the simulator, being a regular process, executes in user mode. We could think of using the privilege protection bit in the MMU on kernel pages, but the simulator would no longer be allowed to access these pages since it is a user-level process. Removing the privilege protection bit would not work either, because it would mean that a privileged page would not be protected from unprivileged accesses. To make matters worse, there are some other cases where our approach might suffer from high communication overheads between our kernel module and the user-land simulator. The emulation of hardware devices is always a delicate and costly endeavor, but our approach makes it worse. The problem comes from the fact that the interface of hardware devices is memory-mapped, based on memory-mapped input-output hardware registers called MMIO registers. This means that device drivers read and write such MMIO registers, and each such memory access must be emulated, based on the emulated hardware device. This means that our approach will need to trap upon each such read or write instruction, percolating the trap up to the simulator. Given the expected overhead of such a mechanism, the approach may not seem practical after all. Fortunately, there is an absolute win-win solution. The idea is to optimize memory accesses when it is efficient to do so and use the standard approach otherwise. In other words, we will gain when we can and lose nothing otherwise. The problem is that it is a decision that we need to take at translation time. Indeed, the idea is for the simulator to translate a memory access as it would normally do only if that memory access is expected to trap. Otherwise, the simulator generates code with a single memory access with the MBA offset added. Technically, it is quite easy to translate a code block either one way or the other, or even to consider each memory access individually. The hard part is to know statically when to translate as usual or when to translate using our approach. In the general case, the corresponding knowledge is not available statically. But it so happens that we can leverage the processor mode (user or kernel) to decide. Indeed, accesses to hardware MMIO registers may only be done in kernel/supervisor mode. As such, all device drivers execute in privileged mode. Other behaviors that are problematic for us, such as frequent TLB flushes, also only happen in kernel code, since flushing the TLB may only be done in privileged mode. We can then say that, essentially, problematic behavior is localized in the kernel code, running in privileged mode. When the simulator translates a code block, it knows if the processor is in user mode or kernel mode, because it manages all the target registers and architected state, and it catches any mode switch. There is then a simple and efficient way to decide whether to use QEMU standard translation, or ours: in kernel mode, the translation is done as usual; in user mode, the translation relies on our approach.
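A sketch of that translation-time decision, with hypothetical emitter functions standing in for the actual code generation (this is not QEMU's real API, only the shape of the policy):

#include <stdint.h>

/* Hypothetical stand-ins for the two ways of emitting a memory access. */
void emit_softmmu_access(int addr_reg, int is_store, int size);
void emit_direct_mba_access(int addr_reg, int is_store, int size);

typedef struct { int is_user_mode; } TargetCPUState;

/* Kernel/supervisor mode: MMIO accesses and TLB flushes are likely, so keep
 * QEMU's standard Virtual TLB fast/slow path.  User mode: emit the direct
 * access through the embedded mapping at MBA. */
void translate_memory_access(const TargetCPUState *cpu, int addr_reg,
                             int is_store, int size)
{
    if (!cpu->is_user_mode)
        emit_softmmu_access(addr_reg, is_store, size);
    else
        emit_direct_mba_access(addr_reg, is_store, size);
}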

5.3 QEMU/Linux Implementation

This section describes our prototype based on Linux and QEMU. Overall, we were able to leave QEMU mostly untouched, with only a few modifications to the dynamic binary translation and the addition of our mapping of the target address space via the use of a map ioctl. Regarding Linux, we add our own Linux kernel module to handle that map.


5.3.1 Kernel module

Our module creates a virtual memory region to host the mapping of the target address space. Once the region is created, memory-related exceptions happening in that region induce a communication between our module and QEMU. MMU exceptions are percolated up to QEMU, allowing QEMU to decide how they should be handled. This usually means that QEMU will ask our module to make some MMU modifications.
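To give an idea of the general shape of such a module, here is a heavily simplified sketch (hypothetical names, assuming a reasonably recent kernel API; the fault handler shown here merely reports the fault, whereas the actual module rewrites the exception frame to redirect execution to a QEMU user-side handler, as described below):

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/miscdevice.h>

/* Fault handler attached to the VMA hosting the embedded target address
 * space.  Placeholder behaviour only: the real module percolates the fault
 * to a QEMU handler instead of raising a signal. */
static vm_fault_t dbt_vma_fault(struct vm_fault *vmf)
{
    pr_debug("dbt: fault at %lx\n", vmf->address);
    return VM_FAULT_SIGBUS;
}

static const struct vm_operations_struct dbt_vm_ops = {
    .fault = dbt_vma_fault,
};

/* mmap() on our character device creates the VMA and attaches our handler. */
static int dbt_mmap(struct file *filp, struct vm_area_struct *vma)
{
    vma->vm_ops = &dbt_vm_ops;
    return 0;
}

static const struct file_operations dbt_fops = {
    .owner = THIS_MODULE,
    .mmap  = dbt_mmap,
};

static struct miscdevice dbt_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name  = "dbt_mmu",            /* hypothetical device name */
    .fops  = &dbt_fops,
};

module_misc_device(dbt_dev);
MODULE_LICENSE("GPL");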

Figure 5.3: Overview of the implementation (left-hand side: internal fault, resolved through the softmmu code and a remap ioctl; right-hand side: target page fault, forwarded to the target page fault handler before returning to simulated code)

To create the virtual memory region, our module relies on the Linux mechanism called Virtual Memory Area (VMA). A VMA is a contiguous range of virtual memory addresses in a process for which one may define a specific exception handler. That handler is invoked by Linux whenever an MMU-related exception occurs as a result of a memory access within the address range of the VMA. In our implementation, our exception handler percolates the call up to a user-land QEMU handler. It is important to note that Linux invokes the exception handler on the thread that attempted the memory access that induced the MMU-related exception. In other words, the handler executes in the environment of the QEMU thread that executes the translated target code. To percolate the handling of the exception, our exception handler manipulates the Linux context so that Linux will not resume the execution of the target code at the memory access instruction that caused the exception, but at a specific handler that we added to QEMU. This process can be seen in Figure 5.4. The exception handler is called when a memory access instruction, in the QEMU code cache, traps because our mapping is not yet filled. The handler first searches for the VMA to which the faulty address belongs. Then, we retrieve the user side handler address and store it in place of the original RIP value in the exception frame. Finally, we copy the exception frame to transfer it as a user side handler argument, and return from the exception. That QEMU handler will interact with the rest of QEMU to determine what needs to be done about this exception. Usually, this means that our QEMU handler will ask our kernel module to modify the MMU setting of the corresponding VMA, via remap ioctls. Upon the completion of our QEMU handler, two outcomes are possible.


Figure 5.4: Update of the exception return address (the kernel-side handler receives the exception frame ef and the faulting address; it performs vma = lookup_vma(fault_addr); ef->reg[rip] = vma->handler; vma->user_ef = copy(ef->reg); return, so that the user side handler is invoked when the exception returns)

The first outcome happens when the situation that caused the trap has been resolved. In this case, execution should resume at the instruction that caused the exception that triggered this whole process. This is what happens at startup, since the target executes in physical memory with no MMU enabled. In that case, the QEMU handler establishes a first mapping within the VMA. Indeed, as explained in Section 2.2.3, target physical pages are allocated by QEMU, as page-aligned page-sized frames, using a user-level memory allocator within the QEMU process. As such, a frame has a virtual address in the QEMU process, and that virtual address can be translated to the address of a physical page by the Linux kernel. That frame virtual address is simply forwarded back to the module via a remap ioctl. Our module then simply aliases the corresponding physical page into the VMA. Consequently, the VMA mapping fills up as the target execution accesses memory addresses. The second outcome happens when the situation that caused the MMU exception could not be handled by our QEMU handler. At startup, this happens when the memory access is invalid, typically at an address that does not correspond to any physical memory or MMIO registers in the target memory map. In that case, we delegate to QEMU the emulation of the corresponding trap.
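A sketch of what this user-side remap request could look like (the structure layout and ioctl number are illustrative; only the idea of a remap ioctl carrying the faulting address and the QEMU-side frame address comes from our design):

#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical ioctl interface between QEMU and the kernel module. */
struct dbt_remap_req {
    uint64_t fault_addr;      /* faulting address inside the VMA (MBA + target address) */
    uint64_t host_virt_addr;  /* QEMU virtual address of the target physical frame */
};
#define DBT_IOCTL_REMAP _IOW('d', 1, struct dbt_remap_req)

/* Called from the QEMU user-side handler once the frame backing the faulting
 * target address is known: ask the module to alias fault_addr onto it. */
static int dbt_remap(int module_fd, uint64_t fault_addr, uint64_t host_virt_addr)
{
    struct dbt_remap_req req = {
        .fault_addr = fault_addr,
        .host_virt_addr = host_virt_addr,
    };
    return ioctl(module_fd, DBT_IOCTL_REMAP, &req);
}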


Figure 5.5: Handling context switches

So we just discussed the case where the target has its MMU turned off, but of course, the target will enable its MMU at some point, switching from using physical addresses to using virtual addresses. This changes very little from our point of view: the overall processing of MMU exceptions continues almost identically. The only difference happens in our QEMU handler, with two distinct situations that are now possible. The MMU exception could be an internal affair, meaning that our VMA is missing a mapping, but the target has set up its MMU with a mapping. In this case, our QEMU handler knows how to resolve the missing mapping, and it will ask our module to update the VMA mapping accordingly, again via a remap ioctl as explained above. This situation is illustrated on the left-hand side of Figure 5.3. It may happen, however, that our QEMU handler discovers that the target MMU is missing the mapping, from the target perspective. In that case, the target is expecting a page fault at that address, so we delegate again to the regular QEMU code the emulation of the corresponding trap, executing the adequate target handler. This situation is shown on the right-hand side of Figure 5.3. That execution of the target handler is likely to change the target MMU setup, as a result of resolving the MMU exception. Of course, these changes need to be reflected in our VMA mappings. But it is important to realize that this is only one of the possible situations where we must keep our VMA mappings in sync with the MMU manipulations of the emulated target. We must consider several MMU-related management operations:

• Target enabling and disabling the MMU,

• Target applying changes to the current page table,

• Target switching page tables.

When the target enables or disables its MMU, our entire VMA mapping needs to be flushed. The same is true when the target context switches, changing the current page table. Assuming the emulated target architecture executes an operating system on which two processes, A and B, are running, Figure 5.5 depicts what would happen at the moment of the

context switch from A to B if we did not flush our entire mapping. In both processes, the virtual address 0x4000 is valid. Upon the context switch, assuming the address 0x4000 has already been mapped while executing process A, any access by process B would result in a successful, but erroneous, access from the perspective of the host MMU. If we invalidate our mapping, the access is handled as explained earlier, percolating the MMU exception up to QEMU and establishing the correct mapping. So the question becomes the following: when should we invalidate our mapping? In QEMU, this can be tracked, since QEMU already implements a Virtual TLB, which faces the exact same question. We therefore modified QEMU so that whenever QEMU flushes its Virtual TLB, we ask our kernel module to flush its mapping. We do so through an ioctl. It is interesting to note that QEMU flushes its Virtual TLB for any relevant MMU change, including when the target makes very localized changes to the current page table, such as changing page protections, unmapping a page, or remapping a page. The good news is therefore that it is the correct hook: we always invalidate our mapping when we need to, maintaining a correct emulation of the target execution. The bad news is that we invalidate our entire mapping even if the target only changes the translation of a single page or its access rights, which incurs unnecessary overheads. We tried to implement a more focused invalidation to mitigate these overheads. We were able to distinguish when the target adds new translations to the page table, which is a change that does not require invalidating our mapping. We also did not invalidate our mapping when the target augments the access rights of a page. We take care of this situation lazily when handling the related MMU exception, realizing that the access rights need to be changed and that the access was legal. Other than that, modifying QEMU in order to have precise information about what should be invalidated in our mapping proved quite difficult. It is one of the facets of our design that is hard to retroactively fit into an existing production-grade simulator such as QEMU, and overall, we are still invalidating too often and too much. Our performance numbers could be improved with a deeper integration of our mechanism in QEMU. Until then, we need an implementation of our invalidation that is fast. Unfortunately, the invalidation of the entire mapping of a VMA by Linux is expensive: it is essentially a loop over all the page entries of the entire page table. For small non-sparse target address spaces, this is not a problem. For large or sparse address spaces, it may become a source of noticeable overhead. To avoid this problem, our module maintains its own list of the page entries that are currently mapped. Invalidation is therefore about scanning that list, asking Linux to invalidate individual pages. Future work would require the ability to track individual changes made by QEMU; the intuition would be to log such changes so that they can be applied by our module when QEMU flushes its Virtual TLB.
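A sketch of that bookkeeping on the module side (illustrative and without locking; zap_vma_ptes is the kernel helper that removes the page table entries covering a range of a VMA):

#include <linux/mm.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Remember which pages of the embedded mapping are currently populated,
 * so that a flush only touches those instead of the whole 4 GB region. */
struct dbt_mapped_page {
    struct list_head node;
    unsigned long addr;       /* page-aligned address inside the VMA */
};

static LIST_HEAD(dbt_mapped_pages);

/* Record a page after a successful remap. */
static int dbt_note_mapped(unsigned long addr)
{
    struct dbt_mapped_page *p = kmalloc(sizeof(*p), GFP_KERNEL);
    if (!p)
        return -ENOMEM;
    p->addr = addr & PAGE_MASK;
    list_add(&p->node, &dbt_mapped_pages);
    return 0;
}

/* Called when QEMU flushes its Virtual TLB: drop only the mapped pages. */
static void dbt_flush_mapping(struct vm_area_struct *vma)
{
    struct dbt_mapped_page *p, *tmp;

    list_for_each_entry_safe(p, tmp, &dbt_mapped_pages, node) {
        zap_vma_ptes(vma, p->addr, PAGE_SIZE);
        list_del(&p->node);
        kfree(p);
    }
}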

5.3.2 Dynamic binary translation

With the mapping of the target address space in place, the dynamic binary translation of memory accesses can now be changed. Figure 5.6 contrasts the original binary translation in QEMU and ours, for a simple memory access. In the figure, the memory access is an ldr instruction, but considering a str instruction would be essentially identical: for both instructions QEMU translates memory accesses with a combination of a fast path, which does a direct computation, and a slow path that calls a helper. As depicted on the left-hand side of Figure 5.6, the fast path is inlined at the memory access site in the code block. It contains multiple instructions, one branch, and multiple memory accesses. The instructions at lines 0 and 1 are about computing the effective address

at which the ldr takes place into a register (ebp in this case). Then, we have the address translation, composed of the inlined fast path and the branch to the slow path code (line 10) in case the fast path fails (the check of the Virtual TLB is done at line 8). If the translation is there, the memory access can be done (line 12). Line 13 is the start of the translation of the instruction which follows the ldr, and it is the trampoline return target.
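As a reminder of the logic this inlined fast path implements, here is a C-level sketch (stand-in types and constants; the real code is emitted directly as host instructions by the TCG backend and differs in its details):

#include <stdint.h>

/* Minimal stand-ins for QEMU's internal types and constants. */
#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK (~((1u << TARGET_PAGE_BITS) - 1))
#define CPU_TLB_SIZE 256

typedef struct { uint32_t addr_read; uintptr_t addend; } CPUTLBEntry;
typedef struct { CPUTLBEntry tlb_table[CPU_TLB_SIZE]; } CPUState;

uint32_t helper_slow_load32(CPUState *cpu, uint32_t vaddr, uint32_t oi);

/* Virtual TLB fast path for a 32-bit load: index the TLB with the page
 * number, compare the cached tag, and either access directly or fall back
 * to the slow-path helper that walks the target page table. */
static uint32_t softmmu_load32(CPUState *cpu, uint32_t vaddr, uint32_t oi)
{
    unsigned idx = (vaddr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    CPUTLBEntry *e = &cpu->tlb_table[idx];

    if ((vaddr & TARGET_PAGE_MASK) == e->addr_read)
        return *(uint32_t *)(vaddr + e->addend);   /* hit: direct access */

    return helper_slow_load32(cpu, vaddr, oi);     /* miss: slow path */
}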


Figure 5.6: Fast path in case of standard QEMU code (above) and our solution (below).

If the translation is not available in the Virtual TLB, the code branches to the slow path trampoline code that calls the QEMU helper. This helper uses the QEMU softmmu_helper in order to perform the translation, update the Virtual TLB, and then execute the memory access. The code of the slow path is essentially about preparing the values of the parameters that are passed to the helper function. One such parameter deserves our attention: the operation index (noted oi), at line 2 of the slow path trampoline code. The operation index in QEMU is a description of the memory access that is generated at translation time by QEMU. It contains the following information: read access or write access, endianness, sign extension, and size in number of bytes. That operation index will be used by the helper to check whether the memory access is valid on the one hand, and on the other hand to know how to actually perform the memory access, writing or reading a value to or from the emulated physical memory of the target.


On the right-hand side of Figure 5.6, we present the code that our approach generates for the same memory access, when the processor is in user mode. As one can see, we have not fully reached our goal of translating one ldr or str instruction into a single mov instruction, but we are quite close. We have eliminated both the fast and slow paths, and we only have two extra instructions. The first extra instruction, line 3, consists of loading the MBA offset into a register, so that we can afterwards compute the memory access address as explained in Section 5.2. The second instruction, line 2, loads the operation index that concerns this memory access into another register. We do this in case the MMU raises an exception. That way, the exception handler in our kernel module can find the operation index in the register values saved by the Linux trap mechanism. This is necessary because we need to percolate that operation index up to our QEMU handler.

Data: pf: page fault info, including register values
Result: update embedded target mapping or forward fault to simulator
 1  target_addr ← pf.addr − MBA
    // retrieve architected state, pointer is in r14 in QEMU
 2  arch_state ← pf.reg[14]
    // retrieve arguments we passed to ourselves in generated code
 3  oi ← pf.reg[rsi]
 4  host_virtual_address ← softmmu(arch_state, oi, target_addr)
 5  if is_valid(host_virtual_address) then
        // IOCTL to signal the LKM to alias the address
 6      ioctl(IOCTL_REMAP, pf.addr, host_virtual_address)
 7  else
        // Handle real fault or IO
 8      if is_io_addr(host_virtual_address) then
            // execute QEMU IO callback
 9          io_callback(host_virtual_address)
10      end
11      if is_fault(host_virtual_address) then
            // In case of a target fault, update the CPU architected state and go back
            // to the code generator; the target handler will be executed
12          arch_state.fault ← true
13          longjmp_to_code_gen
14      end
15  end
    // restore registers as they were before the page fault, and go back to generated code,
    // but to the next instruction
16  pf.reg[pc] += 8
17  restore_regs_and_return_to_code

Algorithm 3: User side page fault handler

The pseudo-code of the handler is given in Algorithm 3. As a matter of fact, as we can see there, our kernel module passes the same arguments to our QEMU handler as those passed to the regular QEMU slow-path helper. This makes perfect sense, since our handler interacts with the rest of QEMU in pretty much the same way as the original slow-path helper. The difference is not in the translation of the address, as it is done by the softmmu_helper identically in the two cases, and is represented by the call to softmmu_helper at line 4. The access

is also done identically by the softmmu_helper. Our QEMU handler does not, however, update the Virtual TLB, since it is no longer used by the code translated in user mode. Instead, our handler communicates back with our kernel module, via a remap ioctl (line 6 of the algorithm), in order to update the host MMU translation for the translated address. The handler finally resumes the emulation at the next instruction in the translated target code (lines 16 and 17 of the algorithm). This last point is a bit challenging on a CISC host like an Intel x64 processor. Indeed, QEMU generates different mov instructions that lead to different instruction lengths, i.e. instructions are encoded with a variable number of bytes. We solved this problem by padding all mov instructions to 8 bytes, generating no-ops for any extra byte. Most of the time, this padding is small since a vast majority of the mov instructions that QEMU generates are between 6 and 8 bytes long.
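A sketch of the corresponding emission logic, with the padding included (all names are hypothetical stand-ins rather than actual TCG backend functions):

#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for an x86-64 code emitter. */
typedef struct { uint8_t *code_ptr; } Backend;
void emit_mov_imm(Backend *be, int reg, uint64_t imm);       /* mov reg, imm      */
void emit_load(Backend *be, int dst, int base, int index);   /* mov dst, [b + i]  */
void emit_nop(Backend *be);                                  /* one-byte nop      */

enum { REG_RSI = 6, REG_TMP = 11 };

/* Emit our user-mode fast path for one target load: operation index for the
 * fault handler, MBA base, then the access itself padded to 8 bytes so that
 * the handler can resume at a fixed offset (pc += 8 in Algorithm 3). */
void emit_direct_mba_load(Backend *be, int dst_reg, int addr_reg,
                          uint32_t oi, uint64_t mba)
{
    emit_mov_imm(be, REG_RSI, oi);
    emit_mov_imm(be, REG_TMP, mba);

    uint8_t *start = be->code_ptr;
    emit_load(be, dst_reg, REG_TMP, addr_reg);   /* mov dst, [MBA + target_addr] */
    while ((size_t)(be->code_ptr - start) < 8)
        emit_nop(be);                            /* pad the access to 8 bytes    */
}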

5.4 Handling Internal Faults at Kernel Level

One problem we might see with the solution as just described is that fault handling is an important performance bottleneck. This is mainly due to the fact that, for internal faults, we are making round trips between the kernel and user space handlers. However, if the fault is internal, QEMU (and the target) do not need to be aware of it. As such, it should be possible to lift the need to unconditionally reuse QEMU code. However, three main obstacles exist on that road. First, we need to be able to make target and shadow page walks in the kernel. Second, we need to preserve QEMU's vision of memory accesses; for example, we need to forward to QEMU the information that a write access to code happened. Finally, when the fault cannot be handled directly by the LKM, we need to percolate it to the simulator. Let us now detail the implementation of our proof of concept of an in-kernel internal fault handler. As we have seen in Section 2.2.3, to emulate target memory accesses we need access to two essential data structures: the target page tables and a shadow page table. Indeed, while the target page tables give us access to the target virtual to target physical address translation, we cannot directly use it. We also need to know where, in the simulator address space, each target physical frame pointed to by the target physical addresses we just computed lies. This is what the shadow page table provides: a target physical address to host virtual address translation. As such, to handle internal faults in the kernel, we need to make those two structures available to our module. The first one we actually need is the shadow page table, as without it the target page table walk cannot be made. However, making calls to user side code is not possible in supervisor mode, mainly for security reasons. As such, we need our own shadow page walk code. To make it feasible in the kernel, with the limited stack size available in an exception handler, the shadow walk should be as small (and fast) as possible. Ideally, the shadow page table should pretty much be an array or at most a simple hash map. QEMU code however does not have this restriction; as such, its shadow page table is implemented in the form of a fairly complex structure based on a section and block view of RAM. To work around this problem, at simulator boot, we reconstruct a shadow page table in the form of a simple array. The advantage is that this structure can be accessed in O(1), and since we never need to modify it, the refill cost is irrelevant to us. In terms of memory, it is also largely acceptable: we are making one 8-byte entry per page. With each page being 4 KB large, i.e. 2^12 bytes, if we consider a 2^32 bytes large target address space, this means that we need 2^20 entries of 8 bytes, which amounts to an 8 MB array for the shadow page table using this totally trivial strategy. This would obviously not scale for a much larger

address space, but in this case a hash map, filled as physical memory is accessed, would be a reasonable option. Once we have the shadow page table, we need to be able to walk the target page table as well. This step, once again, cannot be done directly using user code. However, we can solve the problem by copying that code. Indeed, the code in itself is relatively simple. It has few dependencies (mainly the target page table pointer and a state register) and only consists of a sequence of physical address dereferences, hence the accesses can be made using our shadow page table. Thanks to this, we can copy the QEMU target page walk into the module with only minor modifications. With this and the shadow page table we can now, in the module, make a target virtual address to host virtual address translation.
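A sketch of that flat shadow structure as it could look inside the module (illustrative; for a 32-bit target with 4 KB pages it is a 2^20-entry array, i.e. 8 MB):

#include <linux/vmalloc.h>
#include <linux/types.h>

#define TARGET_PAGE_SHIFT 12                               /* 4 KB target pages */
#define SHADOW_ENTRIES (1UL << (32 - TARGET_PAGE_SHIFT))   /* 2^20 entries      */

/* shadow[frame number] = host virtual address of the frame backing that
 * target physical frame, or 0 if it is not ordinary RAM (MMIO, hole). */
static u64 *shadow;

static int shadow_init(void)
{
    shadow = vzalloc(SHADOW_ENTRIES * sizeof(*shadow));    /* about 8 MB */
    return shadow ? 0 : -ENOMEM;
}

/* O(1) lookup used during in-kernel fault handling. */
static u64 target_phys_to_host_virt(u64 target_phys_addr)
{
    return shadow[target_phys_addr >> TARGET_PAGE_SHIFT];
}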

Data: pf: page fault info, including register values
Result: update embedded target mapping or forward fault to simulator
 1  target_addr ← pf.addr − MBA
 2  target_phys_addr ← target_virt_to_phys(target_addr)
 3  if is_valid(target_phys_addr) then
        // Lookup shadow
 4      host_virtual ← target_phys_to_host_virt(target_phys_addr)
        // If the host virtual address exists, alias pf.addr to host_virtual
 5      if host_virtual != NULL then
 6          alias(pf.addr, host_virtual)
 7      end
 8  else
        // set return address to our user side handler, and return
 9      pf.reg[pc] ← user_side_handler
10      return_from_interrupt
11  end
12  return_from_exception

Algorithm 4: Module Side page fault handler

We can now modify the module side of the page fault handler, which originally pretty much just called the user side handler, to behave as in Algorithm 4. What we see there is that a new step has been added prior to forwarding the fault to the simulator. In fact, if this step succeeds, we no longer forward the fault. This step first consists in attempting the translation from the target virtual address we just attempted to access to the host virtual address corresponding to the simulated physical frame it should point to. If this is successful, then the faulty address is aliased to the same host physical address as the host virtual address we just found. Otherwise, we simply go back to the slow path, i.e. we forward the fault to the simulator. As can be seen at lines 4 and 5, there are two cases where the translation can fail. First, the translation from target virtual to target physical fails (target_phys_addr is invalid). In that case, we have a target page fault. This obviously needs to be forwarded to the target so that it may handle it. The other case is when the shadow translation fails (host_virtual is NULL). This can have two inner causes as well. Either the access is an MMIO access, in which case no simulated physical frame exists and a callback has to be triggered instead. Or it is a code page, in which case the simulator protected it (and not the target) so that any write to it triggers a callback to invalidate potential translations of this code in the code cache. Overall this technique adds a new trade-off, but one that, as will be seen in the experiments chapter (Section 6.4.4), is positive in the end.

Basically, whenever an internal fault happens, we will now be able to handle it much faster, as we avoid not only a round trip between the kernel and the user side handler, but also a second register state restoration and a full softmmu_helper run. However, the cost is that when we fail to handle the fault in the kernel, we will have added an extra slowdown to the fault handling path. Considering that the traditional round trip, using an ioctl, already costs around 1000 cycles, the probability of a net gain is high. This is once again confirmed in the experiments chapter. However, another trade-off has to be made here, and that is on direct retargetability. Since we have to copy the page walk code from QEMU into the module, we need one page walk version per QEMU target architecture in the module. While copying and adapting the code could be automated by replacing or stubbing QEMU function calls in the page walk, this remains a supplementary maintenance burden. However, considering the performance improvement shown by this method, we believe it remains an acceptable trade-off.

5.5 Conclusion

In the end, we managed to implement a solution analogous to that of the previous chapter, but without the use of HAV. We once again embed the target address space into the simulator address space. However, the simulator no longer runs in supervisor mode; we now use the support of a Linux kernel module to handle the host page table modifications. The simulator we used is still QEMU and, in particular, the modifications made to the simulator are fairly similar to the ones made for the Dune based solution. This allows us to potentially share a fair part of the implementation between both strategies. As a result, we defined a new design that provides a better ability to debug the simulator, as well as the ability to use any QEMU backend. We also enhanced the design so as to handle, whenever possible, faults in the kernel module, so as to avoid costly communications with the simulator and reduce fault overhead to the minimum. Thanks to this, we can hope to handle faults at a speed closer to that of the Dune based version. We have also shown in this chapter that part of the memory accesses are actually still fairly slow to simulate using our method. Those are mainly IOs and protected memory accesses, as well as memory accesses in cases where the TLB is often flushed. All those cases are however mainly characteristic of one type of software: system code. To avoid unnecessary performance loss in those cases, we propose to reuse the standard QEMU softmmu_helper/Virtual TLB code path to handle system code. The result is a further enhancement of the global performance of the solution, as will be seen in the experiments chapter. For all its benefits, this method still has a few limitations. The main one, just as with the Dune based solution, is that it does not yet handle 64-bit targets. However, this solution is already designed to work with weak assumptions on the embedded target address space. In particular, we can work with multiple virtual memory areas in parallel. This means that in future work we could extend this solution to lazily allocate sub-parts of the target address space, with each sub-part being assigned a VMA. Then, at runtime, we could make one more indirection to access MBA, which would no longer be unique but would depend on the target address space sub-part. For now, we also have only partial multi-threading support. Not the whole process is thread safe, which causes crashes in multiple cases. The way to handle multi-threading is otherwise fairly straightforward: we allocate (at least) one VMA per thread, and each thread can have its own embedded target address space, with each of them handling its faults in its own thread.


Finally, our proof of concept work on handling internal faults directly in the kernel module only manages read accesses. This is due to the need for us to be able to communicate writes to some specific areas, including code, to the simulator. This is currently relatively complicated, as we would also need to retrieve information from the simulator at runtime and update our in-kernel structures simultaneously. Regarding the first two issues, what we mainly see is that QEMU is not really fitted to the use we want to make of it. This brings us to envision, as future work, a simulator that would be fully fitted, in particular in the way the architected state is handled, to being used with our LKM, or even directly with HAV.


Chapter 6

Experimentations

6.1 Introduction

So far we have seen, in Chapters 4 and 5, two methods to accelerate memory access simulation in Dynamic Binary Translation. Those two methods use a relatively similar design but with fairly different implementations. The potential challenges and performance bottlenecks are also not the same. One is based on HAV and runs in a partial bare-bone mode, while the other relies on a Linux kernel module for the low level details, which requires taking special care of the communication between the simulator in user mode and the module. In this chapter, we first evaluate the present state of QEMU, with in particular the breakdown of the time spent in its subparts during simulation. Then, we present the memory access simulation acceleration mechanisms that are currently implemented in DBT engines and specifically in QEMU. In particular, we show how important they are, and the key role played by memory accesses in terms of simulation time. Next, we present the evaluation of both contributions made in this thesis. We first evaluate their performance separately, and for each provide a breakdown of its own potential bottlenecks. Then, we compare them directly and conclude on which method currently seems the most efficient. The evaluation will mostly be about the speedups they provide as well as their potential communication problems, for the first one with the host, and for the second one with its companion Linux kernel module.

6.1.1 Benchmark set

The programs we use for our experiments are described in Table 6.1. All "computational" programs are from the polybench suite [PY16]. They are split in three categories: "Long" for those with a run time of more than 100 seconds, "Medium" for those with a run time of more than 20 seconds (but less than 100), and "Short" for those with a run time of less than 20 seconds. This breakdown was made to reflect the point where our solution has less and less time to amortize its cost, and thus a reduced speedup potential. Two other system programs were also used: one that compiles all polybench programs using gcc with full optimization (-O3) and link time optimization (-flto) enabled, and another one that archives and compresses the /usr directory. This constitutes the benchmark we will refer to in the rest of this section. It is important to note that not all programs of the benchmark are used in all cases. In particular, the Dune based solution only uses the polybench programs, and runs them bare bone, i.e. without the support of an operating system. The Linux kernel module based solution, on the other hand, uses all the programs of the benchmark (but one in one of its implementations), and the programs are executed under Linux in the target.


Benchmark | Program names | Remarks | Bugs on Arm target
Computational - "Long" | 2mm/3mm/correlation/covariance/gemm/syrk/syr2k/symm | Subset of polybench [PY16] | correlation/symm
Computational - "Medium" | adi/fdtd-apml/floyd-warshall/gramschmidt/lu/ludcmp/trmm | Subset of polybench [PY16] | gramschmidt
Computational - "Short" | atax/bicg/cholesky/durbin/dynprog/fdtd-2d/gemver/gesummv/jacobi-*/mvt/reg-detect/seidel-2d/trisolv | Subset of polybench [PY16] | cholesky/durbin
System - "Compilation" | compile | Compiles polybench code with -O3 -flto | NONE
System - "Compression" | tarbz and targz | Compresses archived /usr directory | NONE

Table 6.1: Benchmark descriptions

Each of those restrictions, and the exact programs of the benchmark actually used, will be recalled in the "protocol" sections of the corresponding solutions.

6.2 Breakdown of QEMU execution time

In this section we analyze in which part of its code, nowadays, QEMU mostly spends its time during simulation. We will in particular expose bottlenecks in the simulator and relate those to our work. Finally, we detail the efficiency of the existing memory acceleration mechanisms, and show at the same time their importance.

6.2.1 Ratio of time spent in subparts of QEMU

Let us first make an approximate view of where time is spent in QEMU during simulation. To do so, we use the polybench benchmark [PY16] and two programs that build compressed archives of a compiler toolchain (tar followed by gzip and tar followed by bzip2) to represent a typical system load. It is important to note that the ratios we present are approximate due to the difficulty of using performance analysis tools (such as oprofile or perf [DM10]) on QEMU and obtaining coherent results. Because of that, it can be seen on the figure that some "percentages", which in our case have been obtained with perf, exceed 100%. As our goal is to perform a coarse grain profiling, and since these results are overall consistent with previously published research [XTMA15], we decided not to devote more time to getting a very accurate profile. We break the simulation time down into five parts, among which four are the ones with the largest execution time on our benchmark, and the last one contains everything else. The first one is the time spent handling memory accesses outside the code cache, which we define as the time spent in the softmmu_helper helpers. The second is the time spent executing generated code, inside the QEMU code cache. The third is the time spent dealing with floating-point instructions, which are also helpers in QEMU. And finally, the time spent looking up the next translation block in the code cache. These can be seen in Figure 6.1. The rest of the time, which we do not present in detail here, appears in the figure under the name "others". It is mostly composed of the perf monitors themselves taking a part of the time, plus other smaller parts of QEMU and the libraries it uses (thread locking, syscalls and various helper functions). On this figure we can see the weight, in percent, of each of the four subparts we just presented. The first observation we can make is that for most programs, QEMU spends more than 30% of simulation time executing generated code. However, we can also see that this represents less than 50% of simulation time for almost every program.


Figure 6.1: Approximate breakdown of time spent in QEMU subparts during simulation (categories: Softmmu, Exec, Float emulation, TB lookup, Other; y-axis: % of execution time)

In most programs, the rest of this time is spent either in the softmmu_helper, in which case this time usually represents an important part of the simulation time (> 50%), or otherwise mostly shared between lookups and FPU simulation. The translation block lookup usually takes an important proportion of the simulation time only for fairly short programs. In those cases, we can deduce that most of the time is spent executing relatively cold code, and thus no direct links between blocks have been established, forcing the simulator to go back to its top level. The time spent in FPU computations matches what is expected, as most of those programs work on data sets which are large arrays of type double. The programs which are the longest to execute either need multiplications and divisions to perform their computations, or simply execute many operations on the data sets, such as syr2k or adi. Overall, what we can deduce from these numbers is twofold:

1. We see that the softmmu simulation contributes heavily to the execution time of programs that take a long time to simulate. This means that for those, many misses in the Virtual TLB fast path tend to happen, and it tends to become an important bottleneck. On mid-long-running programs, this time is still far from negligible, but the time spent executing cached code and looking up the next translation block tends to take a larger part of the simulation time. On shorter benchmarks, this tends to be even more visible. From this observation, knowing that all those programs actually perform many memory accesses, and knowing that tackling memory problems tends to provide an important speedup, we can make the hypothesis that two main profiles of memory bottleneck will appear. One will be due to sheer failure of the Virtual TLB, in which case many misses happen, forcing the simulator to often call the softmmu_helper. The other will be due to the overhead of the Virtual TLB fast path itself. Indeed, as we have seen before, while faster than a helper call, it still is not as fast as a simple memory access.

2. The other bottleneck, for the programs which make important use of floating-point instructions, is the handling of the floating-point instructions themselves. This is not a subject we study here, the reason being that simulating floating-point operations


without using helpers, which would be an obvious optimization, is actually very hard. Each target processor has its own way of processing them, even when they respect the IEEE standard, and according to [SBP16], clamping (the fact of using a piece of C code to simulate instructions which do not behave the same on target and host) is too often necessary to consider another solution.

To summarize, this first experiment comforts us in continuing to focus on the acceleration of memory access simulation.

6.2.2 Instruction breakdown: percentage of memory accesses in the benchmark

We now examine, for each program in our benchmark set, the percentage of memory accesses among all instructions executed during simulation. These percentages are presented in Figure 6.2 for the case where the benchmarks run on top of a Linux target, and in Figure 6.3 when they run bare bone. In both cases, the programs of the benchmark were run on an ARMv7 target.


Figure 6.2: Percentage of memory accesses in each benchmark (with Linux)

A first observation to be made is that the percentage of memory accesses differs between the two configurations. With the benchmark set running on top of Linux, this percentage tends to be higher. Looking at the data itself, one can notice that this percentage is fairly high for every program when running with Linux. All the programs have at least 30% of their instructions that are memory accesses, with peaks up to almost 50%. In other words, between one in three and one in two executed instructions are memory accesses in this benchmark set. This is a strong indication, once again, of the importance of simulating memory accesses efficiently. If we couple these numbers with the previous breakdown of QEMU execution, we can also make the hypothesis that in a fair number of cases, the time spent executing generated code is actually spent executing the Virtual TLB fast path. Indeed, with at least 30% of memory accesses, the performance impact of this fast path is bound to be important. Its size in the code cache should also be far from negligible. For the bare-bone benchmark set, the percentage of memory accesses is noticeably lower. It does remain near or above 15% for most programs of the benchmark set, but it is clearly lower than when running atop Linux. What this mostly shows is that it is important to profile our benchmarks properly before commenting on the tests.


Figure 6.3: Percentage of memory accesses in executed instructions (bare bones)

The HAV based solution in particular uses the bare-bone version of the benchmark, due to not being able to boot Linux, and as such direct performance comparisons with the Linux kernel module solution are to be taken with a grain of salt. More generally, this is also why we avoid direct comparisons with other works from the state of the art, as we often do not have the exact details of their benchmark behavior.

6.2.3 Efficiency of existing memory accesses simulation acceleration

After having seen both the breakdown of QEMU performance and the percentage of memory accesses in executed target instructions, we now detail how the existing memory access simulation acceleration solutions behave. In particular, we study the efficiency of the Virtual TLB and the victim TLB in QEMU. To do so, we run the same benchmark with 1) no Virtual TLB fast path in the generated code and no victim TLB, 2) Virtual TLB only, 3) victim TLB only, and 4) both the Virtual TLB fast path and the victim TLB enabled. We then provide the performance of 1), 2) and 3) relative to that of 4), i.e. the normal QEMU strategy. This can be seen in Figure 6.4. This figure shows a slowdown if the y-axis value is less than one, and a speedup if it is more than one. If we look first at QEMU without its victim TLB, what we can see is that, in a few rare cases, such as targz or correlation, the simulation is actually faster than baseline. Actually, this is perfectly logical. The victim TLB lookup happens when the Virtual TLB lookup has failed, just before making a target page walk. If a program exposes a behavior where both the Virtual TLB and the victim TLB often do not contain the requested translations, then the second lookup will only slow down the overall simulation. In almost every case however, removing the victim TLB results in a slowdown between 10% and 20% compared to baseline. On some programs such as trmm this slowdown can reach almost 70%. This means that the introduction of the victim TLB is already an important optimization to memory simulation in DBT. Now if we look at the execution of QEMU with neither its victim TLB nor its Virtual TLB fast path, we can see that the performance of every program is severely reduced. Most programs suffer at least a 60% slowdown, with up to 80% for adi, trmm and others. This truly shows the importance of the fast path in the performance of QEMU. Without it, a call to a helper occurs every time a memory access is made, which costs a lot in terms of performance.



Figure 6.4: Performance of QEMU without its acceleration mechanisms

One important note is that even if the Virtual TLB fast path is not present in the generated code, it is still in use inside the handlers. An attempt at deactivating it there as well has been made, but the result is a slowdown so massive that our benchmark could not run in any reasonable amount of time, which already gives us the conclusion: without the Virtual TLB, each memory access requires a page walk, which makes simulation completely impractical due to its slowness. Finally, with the Virtual TLB fast path disabled but the victim TLB enabled, we see that the performance is not much better than with no acceleration at all. This is mostly due to the fact that without the fast path, we already pay the price of calling the helper, which is far from free. Then, even if the victim TLB sometimes avoids a target page walk, this is not enough to make an important difference. This gives us the insight that to obtain better performance, we must fully avoid entering the helper, otherwise almost no difference will be made. In the end, these experiments show that the existing memory acceleration mechanisms are among the most important optimizations in DBT. From there, we can also deduce that focusing our efforts on improving memory access optimization, if we find a way to do so, may probably lead to larger speedups.

6.3 Dune based solution

In this section, we first study the performance of the Dune based solution detailed in Chapter 4. In particular, we first determine whether performance is better with our solution than without it. Then, to determine whether we have exploited the full speedup potential, we compare the speedup we obtain with the time spent simulating memory accesses in the baseline. To further show the advantage of this solution, we give insight into the communication costs between the guest, that is QEMU plus its target, and the host, which is Linux.

6.3.1 Protocol

We use the Polybench [PY16] benchmark as the set of programs for our experiments, as described in Section 6.1. It consists of a large variety of mathematical kernels that exhibit different types of memory behaviors, which is crucial since the performance speedup of our solution mainly depends on the number and types of memory accesses made by the target software. The 26 programs of the benchmark were run 10 times on both baseline QEMU and our modified QEMU; all the numbers given below are the averages of these 10 executions. The machine used for running the tests, whose configuration can be found in Table 6.2, was dedicated to the experiments, leading to a very narrow standard deviation.

Table 6.2: Test configuration

CPU     Intel Xeon E3-1240 V2
RAM     16 GB DDR3, 1600 MHz
OS      Debian Wheezy
QEMU    Git checkout of Feb 16, 2017

All the tests are run “bare-bones”, i.e. without an OS, on top of the integratorcp virtual board from QEMU.

6.3.2 Performance Analysis

We now detail the overall performance of our solution compared to baseline QEMU. Figure 6.5 shows the speedup of our modified QEMU against the baseline QEMU. We also plot the performance of QEMU with its fast path deactivated, so as to quantify the gain it provides.

[Figure 6.5 about here: speedup relative to baseline QEMU for each program with the Dune solution (Arm target, bare-bones).]

Figure 6.5: QEMU runtime with and without our solution

Compared to baseline, our solution allows for speedups of up to 40%. This is the case for 2mm, 3mm and symm for example. On average, the speedup is around 30%. However, as can be seen on the figure, there are also a few very short programs for which baseline QEMU is faster. While this may seem worrying at first, it is only due to startup time, the culprit being the dune_enter function. This function indeed copies the whole address space of the QEMU process to the Dune environment, which is a fairly time-consuming operation.

A more interesting point to look at is the variation in speedups. As a matter of fact, the speedup per memory access is essentially the same for all programs, but the simulation of a target is obviously not only bound by memory accesses.


Furthermore, performance with this solution tends to increase with time, as it exposes more and more the advantage of having a much larger memory to contain the translations than the Virtual TLB does. Basically, there is at first a setup phase during which the embedded translation memory needs to be filled constantly, causing page faults. As such, on shorter programs, for which this setup time represents a larger part of the overall execution time, the speedup is lower; on larger benchmarks, where this setup time can almost be neglected, the speedup is larger. This trend is already visible with 2mm and 3mm compared to the other programs.

Now, if we look back at Figure 6.3, which contains the percentage of target memory accesses executed for each program running bare-bones, we can correlate it with the observed performance. As expected, the higher the percentage, the higher the performance gain of our solution. In particular, 2mm, 3mm, doitgen and gemm, which expose the highest number of memory accesses, are also the ones that expose the highest speedups. The larger this proportion is, the larger the speedups. The only exceptions are programs such as gesummv which are very short and have few memory accesses overall, which means that other parts of the simulation, or even just the DuneOS boot time, dominate. However, for most of the programs, a higher percentage of memory accesses does translate into a higher speedup.

This percentage also exposes another important piece of information. No matter how good our optimization is, it only concerns memory accesses. As such, if 90% of the instructions are not memory accesses, the speedup we can obtain by infinitely accelerating the memory accesses is 1/0.90 ≈ 1.11, i.e. about 11%. The achievable performance of this solution thus depends on the fact that most programs perform a fairly high number of memory accesses, with up to one every other instruction on a RISC CPU, according to [JRHC95].

To further explain the speedups, we traced the number of calls to the slow path in baseline QEMU versus the number of calls to our page fault handler in our modified QEMU. The result can be seen in Figure 6.6, with the slow path in green and the page fault handler in purple. First, it is quite striking that the number of calls to the handler with our solution is much lower than the number of calls to the slow path in baseline QEMU. For example, in the 2mm benchmark, more than 2 billion "translation misses" happen with baseline QEMU, while our modified QEMU only experiences around 10 thousand page faults. For most benchmarks, the percentage of misses is on the order of 10^-6, which can be considered pretty much negligible. This means that the overhead of memory accesses has been dramatically reduced. But this raises the following question: why is our modified QEMU not capable of higher speedups? A first hint has already been given by [XTMA15], which explains that memory access management amounts to 40% of the simulation time. Another hint can be found directly in Figure 6.3, where we can see that, in the benchmark set used for these tests, usually between 20% and 30% of the executed instructions are memory accesses. Consequently, as Amdahl's law [Amd67] teaches us, by accelerating by a factor s a fraction F of the execution, one cannot expect a performance gain larger than g = 1 / ((1 - F) + F/s) ≈ 1 / (1 - F) when s ≫ 1. A second element of response can be extracted from Figure 6.5.
There, the execution times of QEMU with its fast path disabled have also been measured. What one can see is that going through the slow path does not induce an extremely high overhead (though it does almost double the execution time). In other words, there are definitely other bottlenecks in QEMU. Some, such as vector ISA simulation, have been studied in other works such as [MFP11], and it was indeed shown in the previous section that the execution time spent simulating vector ISAs can be fairly large. This means that avoiding the slow path will not always grant the 66% performance improvement (the highest percentage of time spent executing softmmu_helper) one could hope for, as other parts of the simulator will sometimes dominate the overall execution time.
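As an illustration of this limit (a back-of-the-envelope application of the numbers above, taking the 20-30% of instructions that are memory accesses as a rough proxy for the fraction F of execution time they account for, not an additional measurement):

\[
  g \;=\; \frac{1}{(1-F) + F/s} \;\approx\; \frac{1}{1-F} \quad (s \gg 1),
  \qquad F = 0.3 \;\Rightarrow\; g \approx \frac{1}{0.7} \approx 1.43 .
\]

Even an infinitely fast handling of the memory accesses themselves could therefore not yield much more than a 40% speedup, which is consistent with the 30-40% speedups observed in Figure 6.5.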


[Figure 6.6 about here: fault rate (%) per benchmark, comparing the percentage of page faults with our solution to the percentage of slow paths with baseline QEMU.]

Figure 6.6: Percentage of calls to slow path or page fault handler

However, this figure does show that our insight about the possibility of reducing the number of accesses to the slow path was correct, and that reducing this number does bring better performance. It needs to be noted, though, that in this work, which can only run bare-bones programs, the number of memory accesses is actually noticeably lower than normal. This was confirmed by looking at Figures 6.2 and 6.3, where one can see that the number of memory accesses when running the same benchmark on top of Linux is clearly higher than bare-bones. This is mostly due to the presence of context switches and the execution of other processes, as well as of the OS itself.

6.3.3 Communication costs

A possible cause of concern with this solution is the cost of communication between QEMU (the guest here) and the host (Linux). Indeed, as we said in the state of the art, work from [SFGP15] shows that this cost can sometimes be prohibitive. However, in our case the whole simulator runs on top of the virtual CPU, so communication with the host is fairly limited. Basically, communication almost only happens for syscalls. According to the micro-benchmark provided by Dune, the overhead on each syscall is around one thousand cycles. We then count the number of syscalls made by QEMU during an average run to get an idea of the overhead caused by these communications; these numbers are shown in Figure 6.7.

[Figure 6.7 about here: number of syscalls made by QEMU during the simulation of each benchmark, peaking around 1,750,000.]

Figure 6.7: Number of syscalls during simulation

We can see in this figure that the highest number of syscalls in a simulation is around one million seven hundred and fifty thousand. While this number may seem high, it means that, at worst, considering the syscall overhead measured for Dune, it represents less than two seconds over the whole simulation. Since in this case the simulation runs for several minutes, the final overhead is pretty much negligible. Furthermore, this does not even account for the fact that Dune is able to process some syscalls directly in HAV context.
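As a rough sanity check of this bound (simple arithmetic on the numbers above, assuming the host CPU of Table 6.2 runs at its nominal 3.4 GHz and that every syscall pays the full thousand-cycle overhead):

\[
  1.75\times10^{6}\ \text{syscalls} \times 10^{3}\ \text{cycles}
  \;=\; 1.75\times10^{9}\ \text{cycles}
  \;\approx\; 0.5\ \text{s at } 3.4\ \text{GHz},
\]

well under the two-second figure quoted above.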

6.4 Linux Kernel Module solution

The evaluation is done running a Linux target OS on an Arm-based target ISA and an i386-based target ISA. The host is an Intel x64. The Linux target runs in an initramfs to focus the evaluation on the simulation of memory accesses and not the emulation of disk operations. The configuration details of the host machine are given in Table 6.3.

CPU           Intel Xeon E3-1240 V2 @ 3.4 GHz
RAM           16 GB @ 1600 MHz
OS            Archlinux (kernel 4.14.15)
QEMU version  git clone as of January 24th, 2018

Table 6.3: Setup Details

All benchmarks from Section 6.1 are run here, on a "virt" QEMU board with a Linux target. The measures concerning the percentage of execution time and the cache metrics were acquired with Linux's perf tool [DM10]. To have meaningful results concerning the execution times per se, we measure them using wall clock time. This is done using the Linux time command in the target since, by default, QEMU virtual time is kept aligned with host time, so reading the target cycle count register (ccnt on Arm) returns the value of the host cycle count register (tsc, the time stamp counter on x64). Thus, time returns a measure of time with a precision of one hundredth of a second.


All metrics concerning the number of calls to the slow path were extracted as follows. The number of calls was measured across a full run, i.e. a Linux boot plus a program run. Then, we subtracted from this number the average number of calls to the slow path during a Linux boot alone. This gives us a close approximation of the number of calls to the slow path during the program itself. Our prototype currently handles Arm-based and Intel-based targets, both only 32-bit. Also, only an Intel x64 host was tested; however, using other hosts should not cause any major issue, as neither the Linux kernel module nor the QEMU backend makes use of any Intel-exclusive features. Finally, throughout the experiments, we use 3 simulators: baseline QEMU, our "plain" solution, in which all accesses are handled through the mapping approach described in Section 5.2, and our "hybrid" solution, described at the end of the same section, in which kernel code still goes through the QEMU softmmu_helper and Virtual TLB code path.
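To fix ideas, the hybrid dispatch can be pictured as a translation-time choice, sketched below. This is not the actual QEMU patch: DisasContext is QEMU's translation context, but the three helper functions are hypothetical names standing in for the two code-generation routines and the privilege test.

    struct DisasContext;                               /* QEMU translation context */

    /* Hypothetical helpers (illustrative names, not QEMU functions). */
    void emit_mapping_access(struct DisasContext *dc);              /* our inline fast path */
    void emit_softmmu_access(struct DisasContext *dc, int mmu_idx); /* legacy TLB + helper  */
    int  is_user_mmu_idx(int mmu_idx);                              /* user vs. system code */

    /* Translation-time dispatch of the hybrid solution (sketch). */
    void translate_memory_access(struct DisasContext *dc, int mmu_idx)
    {
        if (is_user_mmu_idx(mmu_idx))
            emit_mapping_access(dc);          /* user code: aliased embedded mapping */
        else
            emit_softmmu_access(dc, mmu_idx); /* system code: Virtual TLB + softmmu  */
    }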

6.4.1 Performance Overview

Let us now examine the performance of the hybrid solution compared to baseline QEMU. The raw runtimes of both solutions are presented in Figure 6.8.

[Figure 6.8 about here: runtimes in seconds per benchmark for baseline QEMU and the hybrid solution, on i386 and Arm targets.]

Figure 6.8: Programs runtimes

What can first be seen is that, on the "Long" programs especially, our solution tends to perform better than baseline. On "Medium" programs the speedup is slightly reduced, and for the "Short" ones, runtimes are often almost the same as baseline or slightly below. The overall Arm target runtimes are also higher than the i386 ones. The speedups and slowdowns can be seen more precisely in Figure 6.9. The first thing to notice is that, over the 33 programs, 11 show a slowdown compared to baseline for Arm and 5 for i386. The second is that the speedups can reach up to four times baseline performance, and several programs expose speedups above three, leading to an overall clear performance gain over the 33 benchmarks. Looking more closely, we can see that a vast majority of the slower programs belong to the "Short" group (atax, bicg, gemver, ...). In these cases, our approach pays the cost of installing the translations while not reusing them much. The special case of symm, for which we have a large speedup on i386 and a large slowdown on Arm, will be analyzed later.


[Figure 6.9 about here: speedup relative to baseline per benchmark for the hybrid solution, on i386 and Arm targets.]

Figure 6.9: Programs speedups

6.4.2 Hybrid solution analysis

Let us now examine more specifically the gains brought by using the hybrid solution instead of simulating both user and system code through our aliased target address space solution (noted "Plain solution" in the figures). A comparison of performance, given as the speedup relative to baseline QEMU when using the mapping for both user and system code or only for user code, can be seen in Figure 6.10.

[Figure 6.10 about here: speedup relative to baseline per benchmark for the plain and hybrid solutions, on i386 and Arm targets.]

Figure 6.10: Speedups relative to baseline with both our plain and hybrid solutions

Every benchmark above the dashed line represents a speedup, every one below a slowdown. What can be seen is that, while multiple benchmarks expose some speedup with our plain solution, quite a few also suffer important slowdowns. The highest speedup is still three, but the highest slowdown is also around three, which means that overall the payoff is much more mixed. This shows that using a hybrid solution does indeed bring higher performance than a fully aliased mapping based solution. Figure 6.11 characterizes the problem of our plain solution: it shows the ratio of time spent simulating target system code.

We can see there that, for the slower benchmarks, the ratio of time spent in target system code is fairly high. While executing durbin on an i386 target, for example, 31% of the time is spent simulating target system code; for that same benchmark, the slowdown of the plain solution compared to baseline is around 14%. In other words, there is a high correlation between the time spent simulating target system code and the performance of this solution. In comparison, the same benchmark with the hybrid solution exposes a speedup of 52% compared to baseline QEMU. Another interesting point is Linux boot time, which takes about the same time with our hybrid solution as with baseline, but exposes an important slowdown with our plain solution. A Linux boot is actually a typical pure system workload; as such, its respective performance on both versions of the solution clearly exposes the problematic behavior of our plain solution on system code. Since the goal of our work is performance optimization, it is worth applying our solution only where it is truly useful and where it does not cause a negative impact, i.e. on user code.

[Figure 6.11 about here: ratio of time spent simulating guest system code (log scale) per benchmark, for the plain solution on i386 and Arm targets.]

Figure 6.11: Ratio of time spent simulating system code

6.4.3 Address space mapping analysis

Let us now analyze more precisely the performance of the target address space mapping, and in particular the speedup factors. The first and foremost speedup factor, which follows directly from the use of the aliased target address space mapping, is the native handling of target memory accesses. Basically, compared to the software solution, target memory accesses are now handled through far fewer operations, which reduces their overhead. In particular, if we look back at the code in Figure 5.6, we suppress the instructions that cost the most in the original code, since there are no more branches and only one memory access remains.
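Concretely, the access boils down to one add and one dereference. The following C sketch shows the shape of the resulting fast path (MAPPING_BASE_ADDRESS stands for the base of the embedded target address space of Chapter 5; the value used here is only an example, and the real generated code is emitted by the TCG backend rather than written in C):

    #include <stdint.h>

    /* Base of the embedded (aliased) target address space inside the
     * simulator process; the actual value is chosen when the mapping
     * is created by the kernel module. */
    #define MAPPING_BASE_ADDRESS 0x100000000ULL

    /* Fast path with the aliased mapping: no TLB lookup, no compare,
     * no branch, a single host memory access.  A missing alias simply
     * raises a host page fault, which is caught by the kernel module. */
    static inline uint32_t target_ld32(uint32_t target_vaddr)
    {
        return *(uint32_t *)(uintptr_t)(MAPPING_BASE_ADDRESS + target_vaddr);
    }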

Table 6.4: Instruction-cache misses in both baseline and hybrid solution

benchmarks:        long         medium       short        compile      compression
Baseline QEMU      2.16 × 10^9  1.23 × 10^9  1.13 × 10^9  2.35 × 10^9  2.00 × 10^9
Hybrid Solution    1.52 × 10^9  1.26 × 10^9  1.17 × 10^9  2.24 × 10^9  2.05 × 10^9


A second speedup factor, which is a consequence of the previous one, is instruction cache locality. A measure of the number of instruction cache misses for the different benchmark groups is given in Table 6.4. We can see some gain on the computational programs, which incidentally expose the highest speedups and for which the number of instruction cache misses does indeed decrease. However, for more system oriented benchmarks such as compilation and compression, this gain is fairly small. This is mostly due to the effect of traps: whenever an instruction traps, we have to execute the kernel page fault handler. This code is fairly large, and as such, the instruction cache will quite probably be filled with an important part of kernel code; therefore, when we return to target code, misses are bound to happen. This however does not invalidate the benefits of our solution, as we still obtain an important speedup, as exposed before. More generally, the trap mechanism greatly impedes performance. Whenever the execution of the slow path is necessary, instead of having "just" to call a helper that implements it, our solution leads to the execution of a full trap and its associated handlers. While the time spent handling a trap in Linux is not easy to measure, it can easily be assumed to reach thousands of cycles. As such, slow paths are usually much slower with our solution than with baseline QEMU. This means that MMIOs, target faults, as well as accesses after mapping flushes, will slow down the overall emulation. Fortunately, our solution has another highly beneficial advantage.

[Figure 6.12 about here: number of calls to the slow path per benchmark (log scale), for baseline QEMU and the hybrid solution, on i386 and Arm targets.]

Figure 6.12: Number of calls to slow path during program execution

Compared to the QEMU Virtual TLB, our solution can potentially hold the whole target mapping at any time. This yields the results visible in Figure 6.12, in which the number of calls to the slow path can be seen for both our solution and baseline QEMU. The first thing to notice is that, in most cases, the number of calls to the slow path is massively lower with our solution than with baseline: the order of magnitude for baseline is in the billions, while for our solution the number of traps is in the millions. This difference easily outweighs the loss in performance of a single execution of the slow path. This sets the main key to estimating the performance of this solution: basically, we need enough time to amortize the cost of the costlier slow paths. This explains why very short programs, such as trisolv, even though they are computational, do not expose any performance gain, or even expose slowdowns.

The principle is that, if either the benchmark is too short for the fast path performance gain to be seen, or if too many flushes happen (many context switches for example), then the performance gain of our solution will be close to zero, if any. This is further reinforced by the fact that, in those cases, the number of calls to the slow path in both baseline and our solution is fairly similar. We also need to explain why gemver, mvt and symm exhibit an important slowdown on Arm. These programs indeed expose a very special case for our simulator: they go through large arrays column by column instead of line by line. The result is that, during a first pass, every access goes through a trap, thus leading to a large setup time. This is clearly also the case on the Intel target, but there symm is among the highest speedups. However, on Arm, Linux causes preemption much more often than on Intel, thus causing costly full mapping flushes. For gemver, this goes from around 70 context switches on Intel to around 320 on Arm (measurements were made by monitoring /proc/pid/status and are not fully accurate); brought back to a per-second measurement, this is around 50% more on Arm. Also, the QEMU softmmu_helper code that performs the page table walk is slower for Arm than for Intel: on Intel, a call to softmmu_helper takes between 100 and 500 ns (most of the time below 300 ns), whereas on Arm it takes between 1 µs and 3 µs. As such, on Arm, this setup phase happens much more often and is slower in itself, thus causing an important slowdown. From our measurements, the first full walk of a 4096-entry array on Arm takes between 10 ms and 20 ms, with some rare spikes at 1 s, whereas on Intel it almost never reaches 1 ms and stays around 300 µs most of the time. This is a perfect example of a case for which the cost of internal page faults is not amortized.

Table 6.5: Percentage of time spent handling the target MMU with our hybrid solution

benchmark:         long     medium   short     compile   compression
Hybrid Solution    5.31 %   6.85 %   37.15 %   36.39 %   3.27 %

As an insight into what is left to be further enhanced, Table 6.5 presents the percentage of time spent handling the target MMU with our solution; the same metric for baseline can be seen in Figure 6.1. In the case of our solution, this percentage counts both the time spent in our handlers (for user code) and in the softmmu_helper (for system code). What can be seen first is that this is in line with what we expect in terms of speedup for each group. On the "Long" group and on compression, the ratio of time spent handling the target MMU is much lower with our solution. On the "Medium" group, this ratio is slightly higher in our case; however, the overall runtimes remain better, but with lower speedups. Finally, for the "Short" group and compile, the ratio of time spent handling the target MMU is much higher in our case; overall, our solution provides no speedup for this group, and we even have some slowdowns. It should be recalled, though, that memory access simulation does not reduce to target MMU management, but also includes the execution of the memory accesses themselves. This time is however fairly hard to measure in QEMU. On the other hand, we can see in Figure 6.2 the percentage of memory accesses out of the total number of executed instructions. These numbers complement our previous analysis: even on the medium group, where not much is gained concerning target MMU management, we still gain on the 30% or more of instructions which are memory accesses. On the short group, these 30% are no longer enough, and the time lost in internal page faults can no longer always be amortized.


6.4.4 Performance with in-kernel fault handling

Studying the group where we have no speedup, we see that the main cause is the relative slowness of (internal) page fault handling in our solution. In particular, if the benchmark we run causes a very large proportion of internal page faults, i.e. faults that are only used to update our mapping, this cost becomes extremely visible. In the general case, one could think this cost remains mostly unseen, but as a matter of fact, even on "good" programs, we can still extract an important speedup by accelerating internal fault handling. This is why we provide the solution detailed in Section 5.4, in which we begin to handle internal page faults directly in the kernel. We call this solution "proof of concept", as its implementation only handles, in the kernel, the page faults due to read accesses, those due to writes still being forwarded to our handler in QEMU. The speedups provided by this method, and the metrics explaining them, are detailed in this section.

[Figure 6.13 about here: runtimes in seconds per benchmark for baseline QEMU, the hybrid solution and the proof of concept.]

Figure 6.13: Runtimes with our proof of concept, hybrid solution and baseline

Throughout this section, we consider that the target is a "virt" board from QEMU with an ARMv7 CPU, running on an Intel x64 host. First, let us look at Figure 6.13, where the runtimes for the hybrid solution, baseline QEMU and our proof of concept are shown. What we can mainly see is that, even on the longest programs such as 2mm and 3mm, where the speedup of the hybrid solution was already fairly high, our proof of concept still offers an important performance improvement. We can also see that the slowest program, symm, runs much faster with our proof of concept than with the hybrid solution, even though it is not yet as fast as baseline. Looking at Figure 6.14, we see the more precise performance of our proof of concept relative to baseline QEMU, with the relative performance of the hybrid solution shown for reference. Overall, the average speedup with the proof of concept work is 77.81%. For reference, the average speedup with the hybrid solution is 46.25%, and the speedup in the case of [WLW+15] is 63% (1.63 times faster), but on a completely different set of benchmarks and a different setup, as they use the Android Emulator. Most importantly, as the proof of concept work is only implemented for read accesses, there is still room for speedup improvement. We can observe in particular our three pathological cases: gemver, mvt and symm. symm is now about 2.5 times faster than with the hybrid solution, which still leaves it about 40% slower than baseline. This is due to the fact that it accumulates a pathological behavior for both read and write accesses; as such, we only solved about half of its problem.


[Figure 6.14 about here: speedup relative to baseline per benchmark for the hybrid solution and the proof of concept.]

Figure 6.14: Speedups of our proof of concept compared to baseline, with hybrid solution speedups for reference

gemver has about the same speedup gain over the hybrid solution, which brings it to now be only about 9% slower than baseline; mvt is almost in the same case, but still with around a 16% slowdown compared to baseline. In both of those cases, we have already come much nearer to performance parity with baseline, and an important speedup potential remains. Even for 2mm and 3mm, which already exposed a 2.61 and a 2.89 times speedup respectively, an important improvement was made, with now 3.31 and 3.68 times speedups. gramschmidt, for which the hybrid solution provided a near 5 times speedup, is now near 6 times. Overall, this shows the extreme importance of the cost of the page fault mechanism in the overall performance of embedded address space based solutions. Most importantly, even though the solution could potentially slow down the handling of faults in the cases where the simulator still needs to handle them, no overall slowdown was observed; in other words, the added cost is always amortized by the overall gain.

[Figure 6.15 about here: number of fills performed by the user side handler per benchmark (log scale): total, reads and writes.]

Figure 6.15: Number of calls to user side handler with our proof of concept


This overall gain can be further explained by studying the results presented in Figure 6.15. We see there the total number of times the user side handler has to be called to process a fault, together with the number of times it has to do so for read accesses and for write accesses. The progress brought by the proof of concept work on read accesses is particularly spectacular: on every program of our benchmark, the number of user side handler calls is massively lower for reads, where our solution is implemented, than for writes, where it is not. And on every program but one, this number is near negligible. Considering that any call to the user side handler also implies an ioctl trip back into the kernel, plus the call of the full QEMU softmmu handlers, we can consider that we save between 2 µs and 4 µs per avoided call (1 µs for an ioctl, between 1 µs and 3 µs for a softmmu_helper call on an Arm target). It should be noted, though, that a part of the softmmu_helper work, although manually inlined, still has to be done in the kernel, so the gain is more likely between 1 µs and 2 µs per call, which is still far from negligible. This explains our overall gain with this method. These facts lead us to think that, once we finish this prototype, we can hope to reduce the number of costly fills to be near negligible overall, and we can expect even larger speedups at that time. This will also naturally extend the speedup to the pathological cases we identified that rely mostly on write accesses, or on both read and write accesses, as symm does.
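To make the mechanism more concrete, the in-kernel part can be pictured as a VMA fault handler along the following lines. This is a minimal sketch against the Linux 4.14 API of Table 6.3, not our actual module: mmu_mapping, in_kernel_target_walk and forward_to_user_handler are hypothetical names, and the real handler has to deal with many more cases (MMIO, target faults, code pages, ...).

    #include <linux/mm.h>

    /* Hypothetical per-mapping state kept by the module. */
    struct mmu_mapping {
        unsigned long mapping_base;     /* MBA of this embedded address space */
    };

    /* Hypothetical helpers: in-kernel target page walk, and forwarding of the
     * fault to the user side handler in QEMU (which resolves it through the
     * softmmu_helper and installs the alias via an ioctl). */
    int in_kernel_target_walk(struct mmu_mapping *m, unsigned long target_vaddr,
                              unsigned long *pfn);
    int forward_to_user_handler(struct mmu_mapping *m, struct vm_fault *vmf);

    /* Fault handler attached to the VMA backing the embedded target address
     * space: read faults are resolved in kernel, write faults (and anything
     * the in-kernel walk cannot resolve) go back to QEMU. */
    static int embedded_as_fault(struct vm_fault *vmf)
    {
        struct mmu_mapping *m = vmf->vma->vm_private_data;
        unsigned long target_vaddr = vmf->address - m->mapping_base;
        unsigned long pfn;

        if (!(vmf->flags & FAULT_FLAG_WRITE) &&
            in_kernel_target_walk(m, target_vaddr, &pfn) == 0) {
            vm_insert_pfn(vmf->vma, vmf->address, pfn);  /* alias onto target RAM */
            return VM_FAULT_NOPAGE;
        }
        return forward_to_user_handler(m, vmf);
    }

    static const struct vm_operations_struct embedded_as_vm_ops = {
        .fault = embedded_as_fault,
    };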

6.5 Conclusion and comparison of both methods

In the end, we have first confirmed how important memory accesses are to the performance of DBT based simulation. Indeed, with sometimes more than 60% of the time spent simulating the MMU, and with probably an important part of the instructions in the code cache being used to simulate memory accesses, we can see that the execution heavily depends on their performance. This is even more striking once we begin to remove existing optimizations. Without the victim TLB, which is the least important one, performance can sometimes be halved. Without the Virtual TLB fast path, performance drops massively on every program we tested. And with no Virtual TLB at all, the simulation is simply impractical, as even the simplest programs can take the better part of a day to run. In other words, the memory subsystem is probably one of the most important parts of a DBT.

We have then seen the performance of the two solutions we propose to accelerate these accesses, one based on HAV, through the Dune framework, and one based on a Linux kernel module. In both cases we have seen a fairly important speedup over baseline, at least on average. The Dune solution gives us around 40% better performance on average, while the Linux kernel solution gives around 78% better performance with its proof of concept implementation. The solution based on Dune gives better performance than baseline in every case, thanks to very low communication costs. Unfortunately, its peak speedup is lower, and as discussed in Chapter 4, it is much more limited in functionality and debuggability. It is, however, truly the most consistent approach in terms of raw performance: its communication costs are extremely low, and communication itself is relatively straightforward to implement. The Linux kernel module based solution can be implemented using two methods. The first one, which forwards page faults to a handler in QEMU, achieves a speedup of around 46% on an Arm target. The second one exposes on average 78% better performance than baseline, but it requires implementing part of the simulator in the kernel module. This especially stresses the importance, in those methods, of reducing communication costs to their strictest minimum; in particular, we have worked with the goal of avoiding round trips into the kernel. This also shows that going further does require important work on, and rework of, the simulator.

In the current form of this work, the balance goes in favor of the Linux kernel module, especially since it still has margin for improvement. However, the Dune based solution also has improvements yet to be explored, in particular for 64-bit targets, that would not be achievable as easily with the Linux kernel module. Furthermore, if we compare our Linux kernel module to the most similar approach from the state of the art, HSPT [WLW+15], we do obtain better performance with our proof of concept work. The benchmarks used are not exactly the same, so we cannot make a fully direct comparison; however, on their benchmark, the average speedup they obtain is 63%, whereas we obtain about 78% with our proof of concept work. Once again, the importance of communication costs seems to explain our enhanced performance here. Finally, if we compare it to a non-invasive approach such as the resizable Virtual TLB from [HHC+16], we see that this approach actually tends to provide slightly better performance on average (89% speedup for this solution). This can probably be explained by two reasons. First, this approach only enhances the existing Virtual TLB; in other words, it does not add any communication overhead as we do. Second, even operations such as flushing the Virtual TLB are fairly optimized, since its size is pretty much optimal for the currently running process. At the same time, it pretty much solves the problem of calls to the softmmu_helper. But while these observations might seem worrying for our method, they actually are not. It should be remembered that we also reuse the QEMU Virtual TLB, as our solution is actually a hybrid which offloads the management of system code to the legacy code path. As such, the improvement made by [HHC+16] to the Virtual TLB would actually be useful to us. Our solutions are in fact fairly complementary, and that encourages us to implement the resizable Virtual TLB in our solution too, in future work.


Chapter 7

Conclusion

This thesis is devoted to the acceleration of memory accesses in dynamic binary translation. At the end of the problem statement, we determined that we would have to answer, during this work, three questions relative to this topic:

1. What schemes may be devised so as to accelerate target memory accesses simulation in dynamic binary translation?

2. What are the most efficient methods we can think of to implement those schemes?

3. Are the designs we choose to accelerate target memory accesses simulation implementation specific, i.e. tied to the underlying OS or framework we use, or can they be reused atop any of those?

To the first question, we found two answers. The first one is to reuse a technique which was developed in the context of same-ISA simulation: Hardware Assisted Virtualization. With this solution, we provide a way to use HAV in the context of cross-ISA simulation. To do so, we use our simulator as a special guest, which can then behave as an OS for the simulated target code. Thanks to that, we were able, in particular, to manipulate the simulator memory so as to provide it with a view of its address space that is as close as possible to the real hardware view. And since we act as an OS, we can leverage the host hardware to do the heavy lifting for us, thus greatly reducing the overhead of managing the target memory address space. The second answer we found was to use the host OS directly instead of relying on HAV, and we devised an OS-like scheme for the simulator. In this scheme, we use a companion kernel module for the simulator. This companion module is tasked with the host virtual memory manipulations: it provides the facilities to create an address space for the target, embedded in the simulator address space. Just like the HAV based solution, we leverage the host hardware to perform the virtual to physical address translations, including for the target address space. This allows us to obtain speedups of almost 78% on average, for the most advanced version of this work.

Answering the second question is pretty much a matter of comparing the two approaches we used to answer the first one. If we are purely interested in performance, then making a clear decision now is complicated. The HAV based solution has more consistent performance, mostly because it does not suffer from a communication bottleneck. But its peak speedups are also clearly inferior to those of the kernel module solution, although this might be due to the difference in how the benchmarks were executed (bare-bones for HAV, atop Linux for the kernel module).

The Linux kernel module solution, on the other hand, is less consistent, with a few benchmarks being slower than baseline. But both its peak and average speedups are better than those of the HAV based solution, and it also has a fair margin of progression, as described before. On ease of maintenance, the Linux kernel module solution is clearly superior to the HAV based solution. The Linux kernel module itself is quite lean, and most of the code is written using host-architecture-agnostic Linux abstractions. QEMU itself is only very slightly modified, with all of its standard code paths functioning normally. Updating the module to newer Linux versions is usually fairly simple, and most of the time only a matter of recompiling the module. On the other hand, the HAV based solution relies on Dune [BBM+12], which is far heavier (a Linux kernel module plus a LibOS) and no longer maintained. It also requires the simulator to use host specific code, in particular for page walks or TLB flushes. Worse, with Dune, debugging is much more complicated: standard gdb based debugging tends not to work, and crashes. It is also hard to determine whether bugs come from Dune or from the simulator. As such, maintenance of the HAV based solution is clearly harder than that of the Linux kernel module.

On the evolution potential of the solutions, the HAV solution does show some promise. If we manage to replace Dune, it is highly likely that efficiently supporting 64-bit targets will be easier than for the Linux kernel module. We could basically extend the OS-like behavior of the simulator and create a full page table for the target address space. This would however also require a rework of QEMU: in particular, to avoid constant context switches, it would have to change from basic-block-based code generation to a trace-based code generation paradigm. QEMU helpers could then be seen as syscalls, provoking the context switches to the simulator address space when needed. For the Linux kernel module, most of the evolution will come from the integration between the kernel module and the simulator. This also requires a rework of QEMU so that several of its data structures may easily be shared with the kernel. The support of 64-bit targets will also most likely need to be done through dynamic allocation of the target address space, which requires more indirections in the fast path (this method is explained in the prospective section below). In terms of raw performance, the perspectives of the Linux kernel module solution are also fairly positive: as we have shown through our experiments, just implementing the management of read accesses directly in the kernel boosted the speedups of the solution, compared to baseline, from 44% to 78%. Once we implement the same for write accesses, we can expect speedups to be even higher. Overall, we tend to think that, as things are now, the Linux kernel module solution is the most reasonable to use. However, an HAV based solution with a simulator specifically tuned for it, as has also been shown by CAPTIVE [SWF16], can be fairly promising. In fact, if we look at our third question, we might say the second question could have a third answer. Looking at both solutions strictly from the design point of view, we can see that at least the simulator side is almost the same.
The problems are also similar, except for the communication problem, which is non-existent in the HAV based solution. But essentially, we need to reduce the cost of non-RAM or target-faulting accesses, and we need to manage our page table efficiently. We would also need the simulator to be as aware as possible of the embedded address space to ease its implementation. And the sharing of data, such as target physical to host virtual address mappings, could in both cases enhance performance and ease implementation. In the end, we can say that, for now, the answer to the second question is to use a Linux kernel module based solution.

But in the long run, a well tuned, custom simulator would be a wiser choice. Its design could fairly easily be shared between an HAV and a Linux kernel module based solution, through some abstractions. Even today, this is actually partially the case, as both methods have fairly similar designs, with an important part of the code being shared. Most of the existing differences in the simulator code between the HAV and the Linux kernel module based solutions are due to optimizations and code quality improvements. The existing code could, with reasonable effort and some wrappers, be adapted again to the HAV based solution.

7.1 Prospective

While this thesis has already given satisfying results for an academic work, four points are left to be addressed. Those are:

1. N privilege levels support,

2. QEMU multi-threading support,

3. 64-bit targets support, both in HAV and Linux kernel module designs,

4. In kernel write accesses management with our proof of concept Linux kernel module, which essentially amounts to rethinking the simulator.

In this section we will present our roadmap to address those issues.

7.1.1 N privilege levels support

In Chapter 5, we briefly commented on the problem of supporting multiple privilege levels. Indeed, since the target memory is accessed inline in generated code, no software control is done over it. As such, if a target user level process tries to access a target supervisor page that is already mapped, the access will not trigger a target fault as it should. We dismissed this problem because, for the traditional supervisor-user mode model, we solved it as a by-product of our main performance optimization: the hybrid solution uses the QEMU Virtual TLB for supervisor accesses, thus effectively avoiding the problem. Furthermore, the problem is more security related than functionality related in this case: only a piece of target code that voluntarily attempts to access kernel code would risk triggering it; in other words, this only concerns malicious programs. On the other hand, there is an important case which is not covered here: the simulation of target virtualization capabilities. If the target has a hardware assisted virtualization mechanism, then it implies at least two more privilege levels over the standard supervisor-user pair: guest-supervisor and guest-user. Obviously, if managing two privilege levels through standard means was not possible, supporting even more will not be possible either. However, we can build upon our existing solution, and on the way the QEMU Virtual TLB manages a similar problem. Indeed, the Virtual TLB also needs to prevent code running at privilege level X from accessing memory reserved to a privilege level higher than X. To do so, the entries are split into groups by privilege level, i.e. there is not a unique Virtual TLB, but one Virtual TLB per privilege level. In the same way, we can create different mappings for each privilege level. This would give a layout as illustrated in Figure 7.1, which exposes the principle for the user-supervisor case.


[Figure 7.1 about here: simulator virtual memory layout with two embedded target address spaces, one for supervisor mode (based at MBA_SUP) and one for user mode (based at MBA_USR), both aliased onto the simulated target physical frames through the softmmu translation.]

Figure 7.1: Simulator address space layout with multiple embedded target address space

Two embedded target address space mappings exist, each of them with its own offset, and each with its own corresponding user side handler on the simulator side. Now, if an access is made in target supervisor mode at target virtual address 0x2000, only the host page table entry corresponding to MBA_supervisor + 0x2000 will be filled and aliased to a simulated target physical frame. If target user code tries to access the same address, the access will fail and trigger a fault, as MBA_user + 0x2000 is still empty, and will go through the softmmu_helper to resolve the fault. This fault will then be percolated up to the target, which will handle it, thus solving the problem. We already have an example of running multiple mappings in parallel, with, in our case, one mapping for write accesses and one for read accesses. It shows no performance impact whatsoever, which gives us good hope that creating one mapping per privilege level will work efficiently.
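In terms of generated code, the only change to the fast path is to select the mapping base according to the privilege level the block was translated for, for example as sketched below (the array and the enumeration are illustrative; in practice the level is a translation-time constant):

    #include <stdint.h>

    /* One Mapping Base Address per target privilege level (illustrative). */
    enum { PRIV_USER = 0, PRIV_SUPERVISOR = 1, NB_PRIV_LEVELS };
    extern uintptr_t mba[NB_PRIV_LEVELS];           /* MBA_USR, MBA_SUP, ... */

    /* Same single add-and-dereference as before, only the base differs.
     * A user-mode access to a supervisor-only page faults because
     * mba[PRIV_USER] + vaddr has never been aliased. */
    static inline uint32_t target_ld32_priv(int priv, uint32_t vaddr)
    {
        return *(uint32_t *)(mba[priv] + vaddr);
    }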


7.1.2 QEMU Multi threading support

Another item left for future work is the support of QEMU multi-threading, a.k.a. MTTCG. Modern QEMU is indeed able to simulate each target CPU on a different thread, thus executing target code concurrently. For now, our solution does not work in this mode, mainly due to the lack of synchronization points in our approach. Two main items need to be achieved to support MTTCG: first, each thread needs to have its own target address space mapping(s); second, the module needs to support multiple user threads and to have its side effects targeted at the right threads. In particular, if a target core flushes the TLB of another, we need to redirect the subsequent mapping flush to the right thread. Consequently, we also need to synchronize our threads so that a flush requested by another thread does block the execution of the flushed thread. The multiple mapping approach has already been implemented, but for now those threads cannot communicate at kernel level, i.e. one thread may not flush another's mapping; and even if it did, the execution of the flushed thread would not be blocked, leading to incorrect behavior. In the end, the remaining problems for MTTCG support are more technical than conceptual and are thus not a high priority for now.

7.1.3 64-bit targets support

One important addition left to be made to our solutions is the support of 64-bit targets. This support can be envisioned in two different ways, one applicable to both solutions, the other only to the HAV based one. We already gave the basic idea of the HAV solution in Chapter 4, but let us detail it further. The core idea is to give the target a full address space this time, by creating a second process parallel to that of the simulator. This principle is depicted in Figure 7.2, which shows the whole layout of the target's process. Obviously, the whole target address space is present, and can be seen in orange. But we also have to integrate two other items: the generated code cache, and a mechanism to call QEMU helpers. For the code cache, we mostly need to find some free space in the process and map it there. For the helpers, it is more complicated: we cannot just map the helpers, as they often require other parts of the simulator and Linux syscalls. This means that, in the end, we would have to map, once again, almost all of the simulator, thus negating any advantage of our proposition. Alternatively, what we could do is to consider the helpers themselves as syscalls. In this case, we could just map a simple syscall handler, which could then, in the same way as modern Linux does, switch the page table to that of the simulator and execute the helpers there. This method does have a cost, since on many occasions, jumping from one target basic block to another implies a helper call. Considering that a basic block is rarely larger than a few (target) instructions, the overhead of causing one syscall per basic block could soon be fatal to the solution. What this essentially means is that this solution is not adapted to current QEMU: it would require a trace based code generator coupled with multiple efforts to inline as many of the helper calls as possible (be it for memory accesses or for vectorized ISAs), the idea being to reduce these communication costs as much as possible. Also, some space will have to be found for the generated code cache and the syscall handler. Those are however relatively small, and few targets truly use all of their address space: for example, Arm OSes still usually do not use more than 2^42 bytes of the AArch64 address space. And in the worst case, we can always imagine using a balloon driver to give us some memory, in the same way as paravirtualizers do, or even use some device area.


[Figure 7.2 about here: simulator and target address spaces lying in two separate processes; the target process contains the full target address space, the generated code cache and a small syscall handler mapped at the same virtual addresses as in the simulator, with the simulated target physical frames shared between the two.]

Figure 7.2: Simulator and target address space lying in two different processes

If using HAV is not possible, or if the second process solution were to be too slow (no trace based code generation), then another solution is possible: we can allocate sub-parts of the target address space lazily. Indeed, the target will not constantly use its whole address space, and even then, most of what it uses will be relatively localized, locality being at the core of the notion of program. To do so, we modify our inline memory access code as can be seen in Figure 7.3. The idea is that, instead of directly using a constant Mapping Base Address, we look up an array containing, for each target zone, the corresponding mapping on the host. For the rest, the simulator layout is quite similar to Figure 7.1, except that instead of having user and supervisor mappings, we have one mapping per target memory area. To implement the lazy allocation, we can prefill the array with a special MBA, made so that any access through it triggers a fault. Its dedicated handler then allocates a new MBA for the memory area of the faulting address, and aliases the faulting address using the QEMU softmmu_helper as usual. This method's main advantage is that it does not require any hardware support, or any important simulator modification.


    mov  %ebp, %rsi
    shr  $AREA_SIZE, %rsi          # fetch the MBA corresponding
    mov  $mba_array_ptr, %rdi      # to this address
    mov  (%rdi, %rsi), %rdi
    ...

Figure 7.3: Fast path with lazy 64-bit target approach

Its main disadvantage is that it adds an indirection for every memory access, thus reducing the interest of the method.
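In C terms, the lookup of Figure 7.3 amounts to the sketch below. AREA_BITS, AREA_MASK and mba_array are illustrative names, and whether the full virtual address or only the in-area offset is added to the fetched base depends on how the per-area mappings are laid out; the version shown assumes each area gets its own independent mapping.

    #include <stdint.h>

    #define AREA_BITS 30                          /* illustrative: 1 GiB target zones */
    #define AREA_MASK ((1ULL << AREA_BITS) - 1)

    /* One mapping base per target zone.  Entries start out pointing to a
     * special "always faulting" MBA, so the first access to a zone traps and
     * lets the handler allocate a real mapping for it. */
    extern uintptr_t mba_array[];

    static inline uint64_t target_ld64_lazy(uint64_t vaddr)
    {
        uintptr_t base = mba_array[vaddr >> AREA_BITS];   /* the extra indirection */
        return *(uint64_t *)(base + (vaddr & AREA_MASK));
    }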

7.1.4 Handling write accesses with in-kernel internal fault handling

Finally, our last challenge is to generalize the in-kernel fault handling of the Linux kernel module design to write accesses; for now, this design is only able to manage read accesses in the kernel. Most of the problems currently come from the limited ability to communicate between the simulator and the Linux kernel module. The core issue is that, in its current state, QEMU (when using TCG at least) was not designed with the use of an LKM in mind. Sharing data with the LKM in particular is fairly complicated: structures used by QEMU are often complex, with many dependencies, which makes simply copying them to the module impractical, and even extracting them at runtime can get fairly complicated. The result is that extracting the code that performs a target page walk is not as straightforward as one could hope. Managing writes also implies even more communication between the Linux kernel module and the simulator. In particular, whenever a write to target code is made, it must be forwarded to QEMU so that it may update its code cache. This in turn implies retrieving updates to the state of each target physical frame in QEMU. While this would seem easy at first, the code to do so in QEMU is completely Virtual TLB-centric; essentially, pages to be protected will often be passed as virtual addresses, and not necessarily when one would expect. Another part of the problem comes from fitting the solution into Linux. The VMA handlers were designed to answer mostly three cases: find a free physical frame for a copy-on-write address, retrieve a swapped-out page from disk, or retrieve a page of a memory-mapped file from disk. They are not really prepared for an "empty" address space which is made to point directly to physical frames from another mapping. At the time we are in the handler, we only hold a read semaphore on the page table; in other words, if we try to modify the page table directly in the handler, chances are that we collide with other kernel code reading it. Releasing and re-taking the semaphore is not a solution either. For read accesses only, the solution can still work, but will crash randomly.

To avoid those problems, some modifications will probably have to be made in Linux too, although this will probably only amount to adding an option to have write-semaphore protected page faults. Overall, to continue forward we will mostly have to rework the QEMU code so as to be able to easily reuse it in the kernel. Structures will have to be simplified for easier access, and to keep the fault handler as short and fast as possible. On a broader view, we could also further adapt the module and the simulator so as to simulate devices directly in the kernel. A perfect example would be to simulate framebuffers directly in the kernel, with a thin layer adapting target framebuffer calls to those of the host. Doing so would both avoid round trips between the simulator and the kernel, and avoid most of the simulation logic, since the kernel can draw directly. This is left as an interesting perspective for future work.

Bibliography

[AEGS01] E.R. Altman, K. Ebcioğlu, M. Gschwind, and S. Sathaye. Advances and future challenges in binary translation and optimization. Proceedings of the IEEE, 89(11):1710–1722, November 2001.

[Amd67] Gene M Amdahl. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18-20, 1967, spring joint computer conference, pages 483–485. ACM, 1967.

[BBM+12] Adam Belay, Andrea Bittau, Ali José Mashtizadeh, David Terei, David Mazières, and Christos Kozyrakis. Dune: Safe user-level access to privileged cpu features. In 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 335–348, 2012.

[BBTV15] Darrell Boggs, Gary Brown, Nathan Tuck, and KS Venkatraman. Denver: Nvidia’s first 64-bit arm processor. IEEE Micro, 35(2):46–55, 2015.

[BDB11] Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia. Dynamo: a trans- parent dynamic optimization system. ACM SIGPLAN Notices, 46(4):41–52, 2011.

[BDE+03] Leonid Baraz, Tevi Devor, Orna Etzion, Shalom Goldenberg, Alex Skaletsky, Yun Wang, and Yigel Zemach. Ia-32 execution layer: a two-phase dynamic translator designed to support ia-32 applications on ®-based systems. In Proceedings of the 36th annual IEEE/ACM International Symposium on Microar- chitecture, page 191. IEEE Computer Society, 2003.

[Bel05a] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, pages 41–46, 2005.

[Bel05b] Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, pages 41–46, 2005.

[BEvKK+11] Igor Böhm, Tobias JK Edler von Koch, Stephen C Kyle, Björn Franke, and Nigel Topham. Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator. In ACM SIGPLAN Notices, volume 46, pages 74–85. ACM, 2011.

[BFT10] Igor Böhm, Björn Franke, and Nigel Topham. Cycle-accurate performance mod- elling in an ultra-fast just-in-time dynamic binary translation instruction set simulator. In Embedded Computer Systems (SAMOS), 2010 International Conference on, pages 1–10. IEEE, 2010.

89 BIBLIOGRAPHY

[BZA12] Derek Bruening, Qin Zhao, and Saman Amarasinghe. Transparent dynamic instrumentation. ACM SIGPLAN Notices, 47(7):133–144, 2012. [CCL+14] Henri-Pierre Charles, Damien Couroussé, Victor Lomüller, Fernando A Endo, and Rémy Gauguey. degoal a tool to embed dynamic code generators into applications. In International Conference on Compiler Construction, pages 107–112. Springer, 2014. [CK94] B. Cmelik and D. Keppel. Shade: A fast instruction-set simulator for execution profiling. In Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, pages 128–137, 1994. [CVERL02] Cristina Cifuentes, Mike Van Emmerik, Norman Ramsey, and Brian Lewis. Experience in the design, implementation and use of a retargetable static binary translation framework. Technical report, , Inc., Mountain View, CA, USA, 2002. [CWH+14] Chao-Jui Chang, Jan-Jan Wu, Wei-Chung Hsu, Pangfeng Liu, and Pen-Chung Yew. Efficient memory virtualization for cross-isa system mode emulation. In ACM SIGPLAN Notices, volume 49, pages 117–128. ACM, 2014. [DB00] Evelyn Duesterwald and Vasanth Bala. Software profiling for hot path pre- diction: Less is more. ACM SIGOPS Operating Systems Review, 34(5):202–211, 2000. [DCHC11] Jiun-Hung Ding, Po-Chun Chang, Wei-Chung Hsu, and Yeh-Ching Chung. Pqemu: A parallel system emulator based on . In Parallel and Distributed Systems (ICPADS), 2011 IEEE 17th International Conference on, pages 276–283. IEEE, 2011. [DGB+03] James C Dehnert, Brian K Grant, John P Banning, Richard Johnson, Thomas Kistler, Alexander Klaiber, and Jim Mattson. The transmeta code morphing tm software: using speculation, recovery, and adaptive retranslation to address real-life challenges. In Proceedings of the international symposium on Code gen- eration and optimization: feedback-directed and runtime optimization, pages 15–24. IEEE Computer Society, 2003. [DM10] Arnaldo Carvalho De Melo. The new linux “perf” tools. In Slides from Linux Kongress, volume 18, 2010. http://vger.kernel.org/~acme/perf/ lk2010-perf-acme.pdf. [DS84] L. Peter Deutsch and Allan M. Schiffman. Efficient implementation of the smalltalk-80 system. In Proceedings of the 11th ACM SIGACT-SIGPLAN Sym- posium on Principles of Programming Languages, POPL ’84, pages 297–302, New York, NY, USA, 1984. ACM. [EA97] Kemal Ebcio˘gluand Erik R. Altman. Daisy: Dynamic compilation for 100 SIGARCH Comput. Archit. News, 25(2):26–37, May 1997. [ECC15] Fernando A Endo, Damien Couroussé, and Henri-Pierre Charles. Towards a dynamic code generator for run-time self-tuning kernels in embedded ap- plications. In 4th International Workshop on Dynamic Compilation Everywhere, 2015.

90 BIBLIOGRAPHY

[Gar03] Timothy Garnett. Dynamic optimization if IA-32 applications under DynamoRIO. PhD thesis, Massachusetts Institute of Technology, 2003.

[GF06] Andreas Gal and Michael Franz. Incremental dynamic code generation with trace trees. Technical report, Citeseer, 2006.

[GFP09] Marius Gligor, Nicolas Fournel, and Frédéric Pétrot. Using binary translation in event driven simulation for fast and flexible MPSoC simulation. In Proceedings of the 7th IEEE/ACM international conference on hardware/software codesign and system synthesis, pages 71–80. ACM, 2009.

[GLS+17] Daniel Gruss, Moritz Lipp, Michael Schwarz, Richard Fellner, Clémentine Maurice, and Stefan Mangard. Kaslr is dead: long live kaslr. In International Symposium on Engineering Secure Software and Systems, pages 161–176. Springer, 2017.

[HHC+16] Ding-Yong Hong, Chun-Chen Hsu, Cheng-Yi Chou, Wei-Chung Hsu, Pangfeng Liu, and Jan-Jan Wu. Optimizing control transfer and memory virtualization in full system . ACM Transactions on Architecture and Code Optimization (TACO), 12(4):47, 2016.

[HHY+12] Ding-Yong Hong, Chun-Chen Hsu, Pen-Chung Yew, Jan-Jan Wu, Wei-Chung Hsu, Pangfeng Liu, Chien-Min Wang, and Yeh-Ching Chung. Hqemu: a multi-threaded and retargetable dynamic binary translator on multicores. In Proceedings of the Tenth International Symposium on Code Generation and Optimiza- tion, pages 104–113. ACM, 2012.

[HPF13] M. Hamayun, F. Pétrot, and N. Fournel. Native simulation of complex vliw instruction sets using static binary translation and hardware-assisted virtual- ization. In Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific, pages 576–581, Jan 2013.

[JRHC95] Lizy Kurian John, Vinod Reddy, Paul T. Hulina, and Lee D. Coraor. Program balance and its impact on high performance risc architectures. In First IEEE Symposium on High-Performance Computer Architecture, pages 370–379. IEEE, 1995.

[KCB07] Minhaj Ahmad Khan, Henri-Pierre Charles, and Denis Barthou. Reducing code size explosion through low-overhead specialization. In The 11th Workshop on Interaction between Compilers and Computer Architectures, page 82. Citeseer, 2007.

[LA04] Chris Lattner and Vikram Adve. Llvm: A compilation framework for lifelong program analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, page 75. IEEE Computer Society, 2004.

[LSG+18] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. Meltdown. ArXiv e-prints, January 2018.

[May87] C. May. Mimic: A fast system/370 simulator. SIGPLAN Not., 22(7):1–13, July 1987.

91 BIBLIOGRAPHY

[MCE+02] Peter S Magnusson, Magnus Christensson, Jesper Eskilson, Daniel Forsgren, Gustav Hallberg, Johan Hogberg, Fredrik Larsson, Andreas Moestedt, and Bengt Werner. Simics: A full system simulation platform. Computer, 35(2):50– 58, 2002.

[MFP11] L. Michel, N. Fournel, and F. Pétrot. Speeding-up simd instructions dynamic binary translation in embedded processor simulation. In Design, Automation Test in Europe Conference Exhibition (DATE), 2011, pages 1–4, March 2011.

[Mic14] Luc Michel. Contributions à la traduction binaire dynamique: support du parallélisme d’instructions et génération de traducteurs optimisés. PhD thesis, Université de Grenoble, 2014.

[MS08] Darek Mihocka and Stanislav Shwartsman. Virtualization without direct ex- ecution or jitting: Designing a portable virtual machine infrastructure. In 1st Workshop on Architectural and Microarchitectural Support for Binary Translation in ISCA-35, Beijing, page 32, 2008.

[NS07] Nicholas Nethercote and Julian Seward. Valgrind: A framework for heavy- weight dynamic binary instrumentation. SIGPLAN Not., 42(6):89–100, June 2007.

[PGH+11] F. Pétrot, M. Gligor, M.-M. Hamayun, Hao Shen, N. Fournel, and P. Gerin. On mpsoc software execution at the transaction level. Design Test of Computers, IEEE, 28(3):32–43, May 2011.

[PVC01] Michael Paleczny, Christopher Vick, and Cliff Click. The java hotspot tm server compiler. In Proceedings of the 2001 Symposium on Java TM Virtual Machine Research and Technology Symposium-Volume 1, pages 1–1. USENIX Association, 2001.

[PY16] Louis-Noël Pouchet and T Yuki. Polybench/c 4.2, 2016. https://sourceforge. net/projects/polybench/.

[Rau78] B. Ramakrishna Rau. Levels of representation of programs and the architecture of universal host machines. SIGMICRO Newsl., 9(4):67–79, November 1978.

[RBMD03] M. Reshadi, N. Bansal, P. Mishra, and N. Dutt. An efficient retargetable frame- work for instruction-set simulation. In Hardware/Software Codesign and System Synthesis, 2003. First IEEE/ACM/IFIP International Conference on, pages 13–18, Oct 2003.

[RRD17] Simon Rokicki, Erven Rohou, and Steven Derrien. Hardware-accelerated dy- namic binary translation. In Proceedings of the Conference on Design, Automation & Test in Europe, pages 1062–1067. European Design and Automation Association, 2017.

[RS97] Alasdair Rawsthorne and Jason Souloglou. Dynamite: A framework for dy- namic retargetable binary translation. Technical report, Tech. Report UMCS 97-3-2, University of Manchester, Manchester, UK, 1997.

[RSR16] Alvise Rigo, Alexander Spyridakis, and Daniel Raho. Atomic instruction translation towards a multi-threaded qemu. In ECMS, pages 587–595, 2016.

92 BIBLIOGRAPHY

[SBP16] Guillaume Sarrazin, Nicolas Brunie, and Frédéric Pétrot. Virtual prototyping of floating point units. In Proceedings of the 2016 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, pages 1:1–1:6, New York, NY, USA, 2016. ACM.

[SFGP15] Guillaume Sarrazin, Nicolas Fournel, Patrice Gerin, and Frédéric Pétrot. Simu- lation native basée sur le support matériel à la virtualisation : cas des systèmes many-cœurs spécifiques. Technique et Science Informatiques, 34(1-2):153–173, 2015.

[SHP12] Hao Shen, Mian-Muhammad Hamayun, and Frédéric Pétrot. Native simulation of mpsoc using hardware-assisted virtualization. IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 31(7):1074–1087, 2012.

[SWF16] Tom Spink, Harry Wagstaff, and Björn Franke. Hardware-accelerated cross- architecture full-system virtualization. ACM Transactions on Architecture and Code Optimization (TACO), 13(4):36, 2016.

[TGN95] Marc Tremblay, Dale Greenley, and Kevin Normoyle. The design of the mi- croarchitecture of ultrasparc-i. Proceedings of the IEEE, 83(12):1653–1663, 1995.

[WLW+15] Zhe Wang, Jianjun Li, Chenggang Wu, Dongyan Yang, Zhenjiang Wang, Wei- Chung Hsu, Bin Li, and Yong Guan. Hspt: Practical implementation and efficient management of embedded shadow page tables for cross-isa system virtual machines. In ACM SIGPLAN Notices, volume 50, pages 53–64. ACM, 2015.

[WR96] Emmett Witchel and Mendel Rosenblum. Embra: Fast and flexible machine simulation. SIGMETRICS Perform. Eval. Rev., 24(1):68–79, May 1996.

[XTMA15] Tong Xin, Koju Toshihiko, Kawahito Motohiro, and Moshovos Andreas. Op- timizing memory translation emulation in full system emulators. In ACM Transactions on Architecture and Code Optimization (TACO), 2015.

[ZT00] Cindy Zheng and Carol Thompson. Pa-risc to ia-64: Transparent execution, no recompilation. Computer, 33(3):47–52, March 2000.

Résumé

In this thesis we are interested in accelerating memory accesses in dynamic binary translation. To do so, we rely on methods whose main purpose is to manage the target's address space with the host's hardware. Two main approaches have been explored, one based on hardware-assisted virtualization and the other on a Linux kernel module. In the case of hardware-assisted virtualization, the simulator is run as a special guest which, in addition to its role as a simulator, plays a role analogous to that of an OS for the target. In particular, it is in charge of building for the target an embedded address space that can be used directly, without software simulation of virtual memory management. The method based on a Linux kernel module pursues the same goal, but the simulator keeps running as a normal process. It now has a companion module with which it communicates through ioctl calls; this module manipulates the host's virtual memory management so as to build an embedded address space for the target. These methods have been implemented in QEMU and Linux and lead to significant performance gains.

Abstract

In this thesis we are interested in the acceleration of memory accesses in dynamic binary translation. For this, we rely on methods whose main purpose is to manage the target's address space with the host's hardware. Two main approaches have been explored, one based on hardware-assisted virtualization and the other on a Linux kernel module. In the case of hardware-assisted virtualization, the simulator runs as a special guest which, in addition to its role as a simulator, plays a role similar to that of an OS for the target. In particular, it is responsible for creating an embedded address space that can be used directly, without software simulation of an MMU. The method based on a Linux kernel module pursues the same goal, but the simulator continues to operate as a normal process. It now has a companion module with which it communicates through ioctl calls; this module is responsible for manipulating the host's virtual memory management to create an embedded address space for the target. These methods have been implemented in QEMU and Linux and lead to significant performance gains.