UNIVERSITÉ DE MONTRÉAL

LOW-IMPACT OPERATING SYSTEM TRACING

MATHIEU DESNOYERS
DÉPARTEMENT DE GÉNIE INFORMATIQUE ET GÉNIE LOGICIEL
ÉCOLE POLYTECHNIQUE DE MONTRÉAL

THESIS PRESENTED IN VIEW OF OBTAINING
THE DEGREE OF PHILOSOPHIÆ DOCTOR (Ph.D.)
(COMPUTER ENGINEERING)
DECEMBER 2009

© Mathieu Desnoyers, 2009.

UNIVERSITÉ DE MONTRÉAL

ÉCOLE POLYTECHNIQUE DE MONTRÉAL

This thesis entitled:

LOW-IMPACT OPERATING SYSTEM TRACING

presented by: DESNOYERS Mathieu, in view of obtaining the degree of Philosophiæ Doctor, has been duly accepted by the jury composed of:

Mrs. BOUCHENEB Hanifa, Doctorate, president
Mr. DAGENAIS Michel, Ph.D., member and research supervisor
Mr. BOYER François-Raymond, Ph.D., member
Mr. STUMM Michael, Ph.D., member

I dedicate this thesis to my family and to my friends, who help me keep a balance between the joy of sharing my work, my quest for knowledge, and life.

Acknowledgements

I would like to thank Michel Dagenais, my advisor, for believing in my potential and letting me explore the field of operating systems since the beginning of my undergraduate studies. I would also like to thank my mentors, Robert Wisniewski from IBM Research and Martin Bligh, from Google, who guided me through the internships I have done in the industry. I keep a good memory of these experiences and am honored to have worked with them. A special thanks to Paul E. McKenney, who I found has an insatiable curiosity and a never-ending will to teach, discuss and learn. I really enjoyed working with him on the user-space RCU implementations.

Thanks to my family, Normand, Hélène and Alexandre, for their support through the duration of this research. A special thanks to my friend Etienne Bergeron, who helped me make this thesis better with his thorough reviews and relevant criticism. Thanks to everyone who reviewed this thesis: Nicolas Gorse and Matthew Khouzam. Thanks to the students with whom I shared the laboratory for the discussions, always useful to explore the many faces of a problem. Thanks to the Montréal Art Café staff for feeding the writing of this thesis with their joviality and excellent Café Moka.

Thanks to Google, IBM Research, Ericsson, Autodesk, the Natural Sciences and Engineering Research Council of Canada and Defence Research and Development Canada for funding this research. Thanks to the countless industry collaborators at Fujitsu, Wind River, Nokia, Siemens, Novell and Monta Vista. Thanks to the SystemTAP team for their collaboration, especially to Frank Ch. Eigler. Finally, I would like to thank the kernel developer community for their constructive criticism and their input. Special thanks to Ingo Molnar, Steven Rostedt, Andrew Morton, Christoph Hellwig, Christoph Lameter, H. Peter Anvin, Andi Kleen, Tim Bird, Kosaki Motohiro, Lai Jiangshan, Peter Zijlstra and James Bottomley. I would not have developed such a thorough knowledge of operating systems without all these heated discussions on the mailing list.

Abstract

Computer systems, both at the hardware and software levels, are becoming increasingly complex. In the case of Linux, used in a large range of applications, from small embedded devices to high-end servers, the size of the operating system kernel increases, libraries are added, and major software redesign is required to benefit from multi-core architectures, which are now found everywhere. As a result, the software development industry and individual developers are facing problems whose resolution requires understanding the interaction between applications and all components of an operating system.

In this thesis, we propose the LTTng (Linux Trace Toolkit next generation) tracer as an answer to the tracing needs of industry and of the open source community. The low intrusiveness of the tracer is a key aspect of its usefulness, because we need to be able to reproduce, under tracing, problems occurring in normal conditions. In some cases, users leave tracers active at all times in production, which makes the tracer overhead definitely critical. Our approach involves the design of synchronization primitives that meet the low-impact requirements. The linearly scalable and wait-free RCU (Read-Copy Update) synchronization mechanism used by the LTTng tracer fulfills these requirements with respect to data reads. A custom-made buffer synchronization scheme is proposed to extract tracing data while preserving linear scalability and wait-free characteristics.

By measuring the LTTng impact, we demonstrate that it is possible to create a tracer that satisfies all the following characteristics: low latency, deterministic real-time impact (wait-free), small impact on operating system throughput, and linear scalability with the number of cores. Experiments on various architectures show that this tracer is portable.

We propose a general model for superscalar multi-core systems with weakly-ordered memory accesses to perform formal verification of the RCU correctness and wait-free guarantees by model checking. The LTTng buffering scheme is also formally verified for safety and progress. Formal verification demonstrates that these algorithms allow reentrancy from multiple execution contexts, ranging from standard thread context to non-maskable interrupt handlers, allowing a wide instrumentation coverage of the operating system.

Résumé

Computer systems, at both the hardware and software levels, are becoming increasingly complex. In the case of Linux, an operating system used in a vast range of applications, from small embedded systems to high-end servers, the size of operating system kernels grows, libraries are added, and software re-engineering is required to benefit from multi-core architectures, which are now ubiquitous. As a result, the software development industry and individual developers face problems whose resolution requires understanding the interaction between applications and all the components of an operating system.

In this thesis, we propose the LTTng (Linux Trace Toolkit next generation) tracer as an answer to the tracing needs of industry and of the open source community. The low intrusiveness of the tracer is a key aspect of its usability, since it is necessary to reproduce, under tracing, problems observed under normal execution conditions. In some cases, users wish to leave tracers active at all times on production systems, which makes the performance impact definitely critical. Our approach involves the design of synchronization primitives that meet these low-impact requirements. The linearly scalable and wait-free RCU (Read-Copy Update) synchronization mechanism used by the LTTng tracer fulfills these requirements for data reads. We propose a synchronization mechanism to extract the tracing data while preserving the linear scalability and wait-free characteristics.

By measuring the impact of the LTTng tracer, we demonstrate that it is possible to create a tracer satisfying all of the following characteristics: low latency, deterministic real-time behavior (wait-free), low impact on operating system throughput, and linear scalability with the number of processors. Experiments on several architectures show the portability of the tracer.

We propose a general model of superscalar multi-core systems with weakly-ordered memory accesses, enabling formal verification of execution correctness and of wait-free execution guarantees through model checking. The LTTng buffering mechanism is also formally verified for correctness and wait-free execution. Formal verification also demonstrates that these algorithms allow reentrancy from multiple execution contexts, from the standard execution thread to non-maskable interrupt handlers, allowing a wide instrumentation coverage of the operating system.

Condensé en français

The growing complexity of today's computer systems makes debugging and performance analysis increasingly difficult tasks. The need for analysis tools that consider the system as a whole is felt, but their performance impact is generally an obstacle to their adoption. Tracing is a proven technique for studying the interactions between the components of a computer system, but its performance impact makes it unusable under many workloads. This research aims at making tracing widely usable by a broad class of applications.

It is important for a tracer to modify the behavior of the observed system minimally, so that problems observed without tracing remain reproducible. It is equally important that the tracer use only a fraction of the system resources and modify the system behavior only in a deterministic way, so that tracing can be activated on production systems.

The problem studied in this work is the extraction of tracing information from an operating system, which involves collecting information from the execution of this operating system and transferring it out of the kernel. The aspects of this problem that make its study interesting are the impact of the tracer on the operation of the system and the instrumentation coverage: which parts of the system can be instrumented.

The hypothesis serving as the starting point of this study is that it is possible to trace an operating system executing a heavy workload on multiprocessor computers, using only a small fraction of the system resources, while allowing the instrumentation of any kernel or user-space site, thus making it possible to model the original behavior of the system. This implies using only a fraction of the system throughput and adding only a small amount to its average latency. Preservation of the following operating system properties is sought: scalability, real-time response, portability and reentrancy.

The goal of this study is the creation of a low-impact, highly reentrant and scalable tracer for the kernel of a widely used operating system: Linux. This infrastructure must be able to handle the tracing throughput generated by heavy workloads on multiprocessor systems. It must preserve, or modify only in a small proportion, the characteristics of the Linux kernel. The research objectives are the following:
– meet the tracing requirements of industry and of the open source community,
– develop new algorithms to solve the identified industry problems,
– implement a tracer for Linux, a widely used operating system,
– develop each tracer component so that their combination preserves scalability and has a low impact on the throughput and average latency of the operating system,
– guarantee a deterministic real-time impact of tracing,
– obtain tracing mechanisms with increased portability and reentrancy.
The main scientific contribution of this research is the creation of synchronization mechanisms suited to tracing, including: a wait-free, linearly scalable buffer synchronization algorithm supporting NMIs (non-maskable interrupts); the application of self-modifying code techniques to manage the activation of static instrumentation efficiently; improvements to the user-space RCU synchronization mechanisms; and the creation of a generic architecture model for the formal verification of parallel algorithms, modeling weakly-ordered memory accesses and instruction scheduling. These contributions achieve the identified research objectives.

The methodology used to meet these objectives is detailed as follows. Interaction with industry and the open source community initially provided more information on the typical usage context and the needs perceived in companies. Internships at IBM Research and Google, as well as collaborations with Autodesk and Ericsson, allowed a better understanding of these needs. In parallel with this field study, tracer prototypes were implemented and proposed to the LTT and Linux kernel communities in order to benefit from their comments. Throughout its development phases, LTTng was tested with extreme workloads, and performance benchmarks ensured that the tracer impact remained within acceptable limits. Model checking was used to formally verify the correctness of the buffer synchronization algorithms and to ensure the absence of starvation.

Early in the development, I identified the need to decouple the instrumentation from the tracer, with the goal of letting external contributors and experts instrument each kernel subsystem. This is why I created the Kernel Markers and Tracepoints, to manage this instrumentation and help add instrumentation to the Linux kernel. These infrastructures are now integrated into the Linux kernel and widely used by the Linux developer community.

Several user-space tracer prototypes were also implemented during this project. The UST (User-Space Tracer) project currently under way benefits from the experience acquired through these prototypes, reusing the LTTng buffer synchronization algorithm as well as the Kernel Markers and Tracepoints mechanisms.

A key aspect of the LTTng tracer design, enabling scalability and a low performance impact, is the use of RCU (Read-Copy Update) to synchronize read accesses to the tracing control data. However, since this mechanism did not exist in user space, we designed new algorithms allowing RCU to be used in this more constrained context. This work was done in collaboration with Paul E. McKenney, Alan Stern and Jonathan Walpole. I put these algorithms into practice by implementing them in an RCU synchronization library offering services similar to those of the Linux kernel implementation.

Given the complexity of the RCU and LTTng buffer synchronization algorithms, their formal verification is desirable to increase the level of confidence in their implementation. I therefore undertook the task of creating a model of processors with weakly-ordered memory accesses and execution, to ensure correctness and progress at the lowest level, while remaining general enough to guarantee the portability of the algorithm implementations.

To make sure the tracer satisfies all of these properties, each of its constituent mechanisms must be examined, covering instrumentation support, time stamp reading, tracing control and buffer synchronization. These mechanisms are coordinated by the execution of the probe, whose functional diagram is shown in Figure 0.1.

At the instrumentation level, I created the Tracepoints and Linux Kernel Markers mechanisms.

Figure 0.1  Functional diagram of the LTTng probe

These rely extensively on the RCU synchronization mechanism to ensure the consistency of the data pointing to the callback functions. The execution of these primitives therefore involves taking an RCU read-side lock, which is wait-free and linearly scalable.

To minimize the impact on Linux kernel execution when the instrumentation is disabled, I created the Immediate Values mechanism, which activates branches by dynamically modifying instructions at run time. It is based on a technique diverting execution through a breakpoint, combined with processor synchronization by inter-processor interrupts, which preserves the reentrancy characteristic.

The time source used is a direct read of the cycle counter when it is available. However, when the time information read from the hardware must be completed with complementary information held in a data structure in memory, accesses to that structure must also be synchronized. I therefore created a trace clock management mechanism that extends a cycle counter limited to 32 bits into a full 64-bit counter using an RCU-based technique. This provides access to a time base without losing the desired properties, since only a simple RCU read-side lock is required.

Accesses to the tracing control data structures are also synchronized through RCU, making it possible to alter the tracing behavior while tracing is running, while preserving all the desired properties.

Finally, a component of major importance in LTTng is the buffer synchronization mechanism, which manages the writing of data into a buffer shared between execution contexts. It is therefore impossible to use an RCU read-side lock for this synchronization, since writes are being synchronized here. By making sure that each tracer component satisfies the desired properties, we show, by construction, that the tracer as a whole also satisfies them.

The experiments carried out to verify the properties regarding impact on scalability, throughput, average latency, reentrancy and real-time behavior are applied to the RCU mechanism as well as to the buffer synchronization. This ensures that the properties hold for each component.
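The following C sketch illustrates the principle behind such an RCU-style trace clock extension. It is a minimal illustration under stated assumptions, not the LTTng kernel code: the names (read_hw_counter32, trace_clock_state) are hypothetical, and the two-slot reuse assumes updates are spaced widely enough that no reader still holds the stale slot, the role a grace period plays in the real mechanism.

    #include <stdatomic.h>
    #include <stdint.h>

    /* Hypothetical hardware counter: returns a free-running 32-bit cycle count. */
    extern uint32_t read_hw_counter32(void);

    struct trace_clock_state {
        uint32_t hw_last; /* counter value sampled at the last update */
        uint32_t high;    /* synthetic upper 32 bits */
    };

    static struct trace_clock_state state[2];
    static _Atomic(struct trace_clock_state *) current_state = &state[0];

    /* Reader: wait-free and reentrant.  A single dependent load suffices;
     * an overflow since the last update is detected by comparing against
     * hw_last, so at most one overflow needs to be accounted for. */
    uint64_t trace_clock_read64(void)
    {
        struct trace_clock_state *s =
            atomic_load_explicit(&current_state, memory_order_consume);
        uint32_t hw = read_hw_counter32();
        uint32_t high = s->high + (hw < s->hw_last);

        return ((uint64_t)high << 32) | hw;
    }

    /* Periodic updater: must run at least once per 32-bit counter period
     * (e.g. from a timer), so readers never miss more than one overflow. */
    void trace_clock_update(void)
    {
        struct trace_clock_state *old =
            atomic_load_explicit(&current_state, memory_order_relaxed);
        struct trace_clock_state *new = (old == &state[0]) ? &state[1] : &state[0];

        new->hw_last = read_hw_counter32();
        new->high = old->high + (new->hw_last < old->hw_last);
        /* Publish; pairs with the consume load on the read side. */
        atomic_store_explicit(&current_state, new, memory_order_release);
    }

Because the reader never loops or takes a lock, it can safely be called from NMI context, which is exactly the property sought here.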

This experimentation uses two approaches: benchmarks, for performance measurements, and formal verification by model checking, to verify the reentrancy and real-time behavior properties.

The benchmarks aim at measuring the impact on the scalability, throughput and average latency of the operating system. A first benchmark measures the throughput difference, with and without tracing, of a workload that initially has good scalability characteristics. The results show that throughput increases linearly, with and without tracing, from 1 to 8 nodes, which demonstrates that the set of mechanisms used by the tracer preserves the scalability characteristic. Specialized benchmarks also verified the linear scalability of the RCU read-side synchronization primitives on configurations ranging from 1 to 64 nodes.

A second set of benchmarks determines the impact on the throughput that can be sustained by the operating system. Running network, disk and compute workloads that measure the operating system throughput, or the time required to perform a given task, characterizes this throughput. Comparing the measurements with and without tracing gives the tracer impact on this throughput. Under typical high event-rate execution conditions, the system throughput slowdown is at most 6%. In extreme cases, the slowdown can reach 35%.

A third series of benchmarks measures the impact of tracing an event on the average latency of the operating system. This series is subdivided into two parts: the study of the tracer latency when the tracer is present in the cache, and the comparative latency study under normal operation with a background workload. The objective is to take cache effects into account in the latency impact analysis. On an Intel Xeon architecture with a 2.0 GHz clock frequency, the results show that the tracer has a latency impact of 119 ns per event when the tracer is present in the cache, and of 333 ns per event when tracing a standard workload.

The second experimental plan is the formal verification of models of the RCU and LTTng buffer synchronization algorithms. Its objective is to verify the correct operation of these algorithms (absence of corruption) as well as their progress (absence of starvation).

The real-time impact of algorithms can be classified according to whether they are blocking or non-blocking. Within the class of non-blocking algorithms, we find, in increasing order of real-time guarantees, obstruction-free, lock-free and wait-free algorithms. The categories of interest in our case are lock-free and wait-free algorithms. A lock-free algorithm ensures global progress of the system, while a wait-free algorithm ensures the progress of every process in the system (a code sketch at the end of this passage illustrates the difference).

Verifying models of the LTTng buffer synchronization involving several writers, as well as the presence of writers and readers, confirms that the LTTng synchronization algorithm is wait-free for writers; that is, neither a reader nor another concurrent writer can cause the starvation of a writer. This property applies to kernel implementations, since it is possible to disable the scheduler there. It is also shown that the impossibility of disabling the scheduler in user space removes the wait-free guarantee, making such an implementation only lock-free, since it becomes possible for a concurrent writer to cause the starvation of another writer. The verification of a model adapted to the RCU synchronization primitives ensures that an RCU reader is also wait-free.

The impact on tracer reentrancy is verified using the concurrent-process execution models used to check correct behavior and the absence of deadlock during the execution of pairs of processes. These cover the cases where multiple interrupts execute between two atomic steps, which ensures the absence of deadlock in all cases. The same models are used to verify the absence of corruption caused by concurrent read and write accesses to the buffers. Similar models are used to verify the properties of the RCU synchronization mechanisms. The results show that both RCU and the LTTng buffer access synchronization mechanism support reentrancy from NMI interrupt handlers.

To increase the precision of the RCU model with respect to the weakly-ordered memory access and execution effects of modern architectures, we created two architecture models: OoOmem and OoOisched. They allow verifying that the memory barriers, as well as the barriers restricting compiler optimizations, are adequate, by checking all possible execution traces satisfying the constraints expressed in the model.

Deployment and experimentation on the Intel 32/64-bit, PowerPC 32/64-bit, ARM, MIPS and Sparc64 architectures, along with external contributions for SH, S/390 and S/390x, confirm the portability of LTTng. The fast user-space and hypervisor tracing prototypes I implemented also demonstrate the portability and reusability of the created algorithms in varied execution environments.

Since each of the components satisfies the desired properties, whether by using RCU for synchronization or the buffer synchronization mechanism for writes, we can state, by construction, that the tracer satisfies all the desired properties.
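To make this classification concrete, here is an illustrative C11 sketch, not LTTng code: the first reservation routine retries a compare-and-swap and is only lock-free, since a stream of concurrent winners can starve a given writer; the second reserves space with a single fetch-and-add and is wait-free, since every writer completes in a bounded number of its own steps.

    #include <stdatomic.h>
    #include <stddef.h>

    static _Atomic size_t write_count;

    /* Lock-free: guarantees global progress only.  A writer may retry
     * forever if other writers keep winning the compare-and-swap race. */
    size_t reserve_lockfree(size_t len)
    {
        size_t old = atomic_load(&write_count);
        while (!atomic_compare_exchange_weak(&write_count, &old, old + len))
            ; /* 'old' was reloaded on failure; retry */
        return old; /* start offset of the reserved slot */
    }

    /* Wait-free: guarantees per-writer progress.  Each writer finishes in
     * one atomic step regardless of what concurrent writers do. */
    size_t reserve_waitfree(size_t len)
    {
        return atomic_fetch_add(&write_count, len);
    }

The actual LTTng reservation path is more involved, since it must also detect sub-buffer boundaries and lost events, but the progress-guarantee distinction is the same one verified by the models above.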
Satisfying these properties opens up diversified fields of use. The following discussion links each property to the targeted application domains. The low performance impact properties (latency, throughput and linear scalability) target commercial servers, which require constant monitoring of the correct operation of production systems running Linux. This allows companies such as Google to identify the cause of operational and performance problems more easily. However, to leave a tracer active at all times on these operational production servers, the impact on their performance must be minimal. Similar needs are expressed for applications with soft timing constraints at Autodesk, which currently uses LTTng. Ericsson collaborates to make LTTng usable in its telecommunication systems. Siemens developers also depend on LTTng for some of their Linux products.

The real-time guarantees also allow the real-time systems domain to use the LTTng tracer for debugging and analysis of systems with timing constraints. This is what the Linux distributions of Wind River, Monta Vista and STLinux currently do by integrating LTTng as the tracing tool of their distributions targeting the real-time market. They thereby give their users information on the behavior of the operating system similar to that found in other common real-time operating systems. The portability of the LTTng tracer enables its integration into Nokia's development platform for cellular phones and Internet tablets.

The increased reentrancy of the tracer benefits the Linux developer community, letting them add instrumentation points without worrying about the interaction of the tracer with the instrumented site. It also benefits Linux users, assuring them that their kernel will not stop working when they activate the tracer. This gives them a tracing infrastructure they can trust.

All these applications of LTTng demonstrate that it fills a need for low-impact kernel tracing in several industrial application domains. A consequence of this research is therefore to improve the debugging infrastructures for multi-core systems, making available a tool for finding performance bottlenecks. This accelerates applications by uncovering all sorts of inefficient resource usage, which helps improve response time, real-time response, system throughput and the energy efficiency of systems.

This research has had impacts beyond those directly related to the LTTng tracer. The Local Atomic Operations, Kernel Markers and Tracepoints have each contributed to other application domains and other projects. User-space tracing, although peripheral to the main target of this research, has been a field in which we also innovated. As for the scientific contribution, 5 articles from external authors who used LTTng in their experiments have been identified to date.
In these articles, LTTng was used for various purposes, enabling the analysis of driver power consumption, the analysis of user habits and the analysis of time source precision.

The main achievement of this research is the creation of novel synchronization algorithms making the implementation of the LTTng tracer possible for the Linux operating system. This tracer satisfies the properties of low impact on the scalability, throughput and average latency of the operating system, deterministic impact on real-time response, portability to varied architectures and a high level of reentrancy. Performance benchmarks and formal verification have demonstrated that each tracer component satisfies these properties. The LTTng tracer thus meets requirements that its predecessors met only partially, making it possible to trace the Linux operating system, whose flexibility allows its use in a wide spectrum of application domains. These objectives were reached through the parsimonious selection, design and implementation of synchronization mechanisms: RCU for read-side synchronization, and the LTTng buffer synchronization mechanism for writes. The algorithms used in these two mechanisms guarantee linear scalability and wait-free execution, characteristics useful for tracing multi-core systems as well as for ensuring deterministic real-time behavior and full reentrancy.

The appropriate response to the tracing requirements of industry and of the open source community is demonstrated by the fact that various tracing components we created, the Tracepoints and Linux Kernel Markers, are integrated into the Linux kernel, and that the LTTng tracer benefits from a large community of users and contributors, in particular Google, IBM, Ericsson, Autodesk, Wind River, Fujitsu, Monta Vista, STMicroelectronics, C2 Microsystems, Sony, Siemens, Nokia and Defence Research and Development Canada.

In conclusion of this research, we can state that tracing heavy workloads on a general-purpose operating system running on multi-core architectures can be accomplished using only a fraction of the throughput and only slightly increasing the latency of the system, while fully preserving the scalability, real-time response, portability and reentrancy of the operating system. The implementation provides instrumentation coverage of the kernel as a whole, including non-maskable interrupt (NMI) handlers.

System-wide trace analysis involves collecting traces from both the kernel and user space. Following the promising results of the user-space tracing prototypes, it is now time to stabilize such an infrastructure in order to allow production-quality user-space tracing under Linux. The workloads that can now be traced on production systems allow collecting information leading to the analysis and resolution of behavior and performance problems in today's complex systems.
It will be interesting to explore the analyses made possible by a modeling of the operating system driven by the data extracted by LTTng.

Given its usefulness for system observation, performance bottleneck identification and debugging, the decision to enable a tracer at all times on production systems becomes natural for system developers if the performance penalty is low enough. This research clearly demonstrates that the impact of the LTTng tracer, when enabled, is low enough to allow its use on production systems facing heavy workloads, without degrading performance prohibitively.

Contents

Dedication

Acknowledgements

Abstract

Résumé

Condensé en français

Contents

List of Tables

List of Figures

List of Algorithms

List of Signs and Abbreviations

Chapter 1  Introduction
  1.1 Theoretical Framework
  1.2 Problem
  1.3 Hypothesis
  1.4 Objectives
  1.5 Claim for Originality
  1.6 Outline

Chapter 2  State of the Art
  2.1 Computer Architecture
    2.1.1 Parallelism
    2.1.2 Memory Access
    2.1.3 Software-Level Support for …
  2.2 Real-time

  2.3 Distributed Systems
  2.4 Commercial Servers

Chapter 3  Methodology
  3.1 Interaction with the community
    3.1.1 Tracing in the Industry
    3.1.2 Tracing in the Open-Source Community
    3.1.3 Authored publications
    3.1.4 Co-authored publications
  3.2 Tracer Architecture
    3.2.1 Overview
    3.2.2 Instrumentation
    3.2.3 Synchronization Primitives
    3.2.4 Buffering
    3.2.5 Read-Copy Update
  3.3 Experimentation
    3.3.1 Benchmarks
    3.3.2 Formal Verification

Chapter 4  Paper 1: Synchronization for Fast and Reentrant Operating System Kernel Tracing
  4.1 Abstract
  4.2 Introduction
  4.3 Introduction to Tracing
  4.4 State of the Art
  4.5 Linux Trace Toolkit Next Generation
  4.6 Tracing Synchronization
    4.6.1 Atomic Primitives
    4.6.2 Recursion with the Operating System Kernel
    4.6.3 Timekeeping
  4.7 Benchmarks
  4.8 Least Privileged Execution Contexts
  4.9 Conclusion

Chapter 5  Paper 2: Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing
  5.1 Introduction
  5.2 State of the art
  5.3 Design of LTTng
    5.3.1 Components overview
    5.3.2 Channels
    5.3.3 Control
    5.3.4 Probe Data Flow
  5.4 Atomic Buffering Scheme
    5.4.1 Atomic data structures
    5.4.2 Equations
    5.4.3 Algorithms
    5.4.4 Memory Barriers
    5.4.5 Buffer allocation
  5.5 Experimental results
    5.5.1 Methodology
    5.5.2 Probe CPU-cycles overhead
    5.5.3 tbench
    5.5.4 Scalability
    5.5.5 dbench
    5.5.6 lmbench
    5.5.7 gcc
    5.5.8 Comparison
  5.6 Conclusion

Chapter 6  Paper 3: User-Level Implementations of Read-Copy Update
  6.1 Introduction
  6.2 Brief Overview of RCU
    6.2.1 Conceptual View of RCU Algorithms
    6.2.2 User-Space RCU Desiderata
    6.2.3 RCU Deletion From a Linked List
    6.2.4 Overview of RCU Semantics
  6.3 User-Space RCU Usage Scenarios

  6.4 Classes of RCU Implementations
    6.4.1 Notation
    6.4.2 Quiescent-State-Based Reclamation RCU
    6.4.3 General-Purpose RCU
    6.4.4 Low-Overhead RCU Via Signal Handling
    6.4.5 Wait-Free RCU Updates
  6.5 Experimental Results
    6.5.1 Scalability
    6.5.2 Read-Side Critical Section Length
    6.5.3 RCU Grace-Period Batch Calibration
    6.5.4 Update Overhead
  6.6 Conclusions
  Acknowledgement
  Legal Statement

Chapter 7  Paper 4: Multi-Core Systems Modeling for Formal Verification of Parallel Algorithms
  7.1 Introduction
  7.2 Modeling Challenges
  7.3 Modeling and Model-Checking
    7.3.1 LTL Model-Checking
    7.3.2 Introduction to Parallel Algorithm Modeling
  7.4 Weakly-Ordered Memory Framework
    7.4.1 Architecture
    7.4.2 Testing
  7.5 Out-of-Order Instruction Framework
    7.5.1 Architecture
    7.5.2 Testing
  7.6 Read-Copy Update Algorithm Modeling
    7.6.1 State-Space Compression
    7.6.2 RCU Model-Checking Results
    7.6.3 RCU Verification Discussion
  7.7 Framework Discussion
  7.8 Conclusion

  Acknowledgement
  Legal Statement

Chapter 8  Complementary Results
  8.1 Kernel Markers
  8.2 Tracepoints
  8.3 Immediate Values
  8.4 Analysis of LTTng Latency Impact
  8.5 Analysis of LTTng Real-Time Impact
  8.6 Formal verification of LTTng
    8.6.1 Modeling
    8.6.2 Correctness
    8.6.3 Real-Time Impact
  8.7 Reentrancy

Chapter 9  General Discussion
  9.1 Tracer Properties
  9.2 Tracer Application Domains
  9.3 Contributions
    9.3.1 Local Atomic Operations
    9.3.2 Kernel Markers
    9.3.3 Tracepoints
    9.3.4 Fast User-space Tracing
  9.4 Scientific Studies Using LTTng

Chapter 10  Conclusion and Recommendations

List of References

List of Tables

Table 4.1  Benchmark comparison between locking primitives and added inner operations, on Intel Xeon E5405
Table 4.2  Cycles taken to execute CAS compared to interrupt disabling
Table 4.3  Breakdown of cycles taken for a spin lock disabling interrupts
Table 4.4  Breakdown of cycles taken for using a read sequence lock and using a synchronized CAS
Table 4.5  Breakdown of cycles taken for disabling preemption and using a local CAS
Table 4.6  Speedup of tracing synchronization primitives compared to disabling interrupts and spin lock
Table 5.1  Cycles taken to execute a LTTng 0.140 probe, Linux 2.6.30
Table 5.2  tbench client network throughput tracing overhead
Table 5.3  tbench localhost client/server throughput tracing overhead
Table 5.4  dbench disk write throughput tracing overhead
Table 5.5  Linux kernel compilation tracing overhead
Table 5.6  Comparison of lockless and interrupt disabling LTTng probe execution time overhead, Linux 2.6.30
Table 7.1  General-purpose RCU verification results for the Alpha architecture
Table 7.2  General-purpose RCU verification results for the Intel/PowerPC architectures
Table 7.3  Signal-based RCU verification results for the Alpha architecture
Table 7.4  Signal-based RCU verification results for the Intel/PowerPC architectures
Table 7.5  General-purpose RCU signal-handler reader nested over reader verification (no instruction scheduling)
Table 7.6  General-purpose RCU signal-handler reader nested over updater verification (no instruction scheduling)
Table 8.1  Tracer latency overhead for a ping round-trip. Local host, Linux 2.6.30.9, 100,000 requests sample, at 2 ms interval

Table 8.2  Tracer latency overhead for a ping round-trip. 100 Mb/s network, tracing receiver host only, Linux 2.6.30.9, 100,000 requests sample, at 2 ms interval

List of Figures

Figure 0.1  Functional diagram of the LTTng probe
Figure 3.1  Tracer architecture diagram
Figure 4.1  Trace clock read (no 32nd bit overflow)
Figure 4.2  Trace clock read (32nd bit overflow)
Figure 4.3  Trace clock update (1, 3, 4) interrupted by a read (2)
Figure 4.4  Synthetic clock read-side
Figure 4.5  Synthetic clock periodic update
Figure 4.6  Assembly listings for Intel Xeon benchmarks (CAS loop content)
Figure 4.7  Assembly listings for Intel Xeon benchmarks (interrupt save/restore loop content)
Figure 4.8  Assembly listings for Intel Xeon benchmarks (spin lock loop content)
Figure 4.9  Assembly listings for Intel Xeon benchmarks (sequence lock and preemption disabling loop content)
Figure 5.1  Tracer components overview
Figure 5.2  Channel components
Figure 5.3  Probe data flow
Figure 5.4  Producer-consumer synchronization
Figure 5.5  Write and read counter masks
Figure 5.6  Commit counter masks
Figure 5.7  Impact of tracing overhead on localhost tbench workload scalability
Figure 6.1  Linux-kernel usage of RCU
Figure 6.2  Schematic of RCU grace period and read-side critical sections
Figure 6.3  RCU linked-list deletion
Figure 6.4  RCU read side using quiescent states
Figure 6.5  RCU update side using quiescent states
Figure 6.6  RCU read side using memory barriers
Figure 6.7  RCU update side using memory barriers
Figure 6.8  RCU read side using signals

Figure 6.9  RCU signal handling
Figure 6.10 RCU update side using signals
Figure 6.11 Avoiding update-side blocking by RCU
Figure 6.12 Read-side scalability of various synchronization primitives, 64-core POWER5+
Figure 6.13 Read-side scalability of mutex and reader-writer lock, 64-core POWER5+
Figure 6.14 Impact of read-side critical section length, 8-core Intel Xeon, logarithmic scale
Figure 6.15 Impact of read-side critical section length, 8 reader threads on POWER5+, logarithmic scale
Figure 6.16 Impact of read-side critical section length, 64 reader threads on POWER5+, logarithmic scale
Figure 6.17 Impact of grace-period batch-size on number of update operations, 8-core Intel Xeon, logarithmic scale
Figure 6.18 Impact of grace-period batch-size on number of update operations, 64-core POWER5+, logarithmic scale
Figure 6.19 Update overhead, 8-core Intel Xeon, logarithmic scale
Figure 6.20 Impact of pointer exchange on update overhead, 8-core Intel Xeon, logarithmic scale
Figure 6.21 Update overhead, 64-core POWER5+, logarithmic scale
Figure 6.22 Impact of pointer exchange on update overhead, 64-core POWER5+, logarithmic scale
Figure 7.1  Promela model for PowerPC
Figure 7.2  Diagram representation of PowerPC spinlock model
Figure 7.3  Diagram representation of Intel ticket spinlock model
Figure 7.4  Out-of-order instruction scheduling and memory frameworks Promela test code, Processor A
Figure 7.5  Instruction dependencies of out-of-order instruction scheduling and memory framework test
Figure 7.6  Instruction dependencies of out-of-order instruction scheduling and memory framework test (error injection)
Figure 7.7  Schematic of RCU grace period and read side critical sections

List of Algorithms

Algorithm 5.1  TryReserveSlot(payload size)
Algorithm 5.2  ReserveSlot(payload size)
Algorithm 5.3  CommitSlot(slot size, slot offset)
Algorithm 5.4  ForceSwitch()
Algorithm 5.5  ReadPoll()
Algorithm 5.6  ReadGetSubbuf()
Algorithm 5.7  ReadPutSubbuf(arg read count)
Algorithm 5.8  AsyncWakeupReadersTimer()

List of Signs and Abbreviations

API    Application Programming Interface
BIOS   Basic Input/Output System
BDD    Binary Decision Diagram
CAS    Compare-And-Swap
CISC   Complex Instruction Set Computer
CPU    Central Processing Unit
DFA    Deterministic Finite Automaton
GNU    GNU's Not Unix
GPL    GNU General Public License
GPS    Global Positioning System
HTM    Hardware Transactional Memory
IBM    International Business Machines
IRQ    Interrupt Request
IPI    Interprocessor Interrupt
I/O    Input/Output
KVM    Kernel-based Virtual Machine
LGPL   GNU Lesser General Public License
LTL    Linear Temporal Logic
LTTng  Linux Trace Toolkit Next Generation
LTTV   Linux Trace Toolkit Viewer
Mutex  Mutual Exclusion
MPI    Message Passing Interface
NAND   Not AND
NMI    Non-Maskable Interrupt
NOR    Not OR
NUMA   Non-Uniform Memory Access
OS     Operating System
POSIX  Portable Operating System Interface for Unix
QSBR   Quiescent-State Based Reclamation
RAM    Random Access Memory

RCU    Read-Copy Update
RT     Real-Time
RTAI   Real-Time Application Interface
RISC   Reduced Instruction Set Computer
RP     Relativistic Programming
RPC    Remote Procedure Call
SMP    Symmetric Multiprocessing
SPE    Synergistic Processing Element
STM    Software Transactional Memory
TLS    Thread-Local Storage
UST    User-Space Tracer

Chapter 1

Introduction

The growth in hardware and software complexity that the software development industry must cope with now turns debugging and performance monitoring into increasingly challenging tasks. The need for global, system-wide problem diagnosis facilities emerges, but their associated high performance impact is usually frowned upon. This compromises their deployment on production systems, where processor time and input/output bandwidth must be fully utilized by the workload.

This increase in complexity and size has been a steady trend in computers since their appearance. At the hardware level, multiprocessor systems become increasingly large and interconnection buses become more complex. At the software level, the operating system (OS), library, virtual machine and application layers all contribute to this complexity. As a result, the effective complexity that software developers have to deal with, represented by the number of interactions between components, grows even faster.

1.1 Theoretical Framework

Debugging and performance monitoring tools are needed to provide insight into component interactions and to help diagnose issues in those large, complex systems. Tracing provides insight into the interactions within system components. This technique can diagnose the most difficult performance problems and bugs in software development. It can provide both high-level and detailed views of the operating system (OS) as a whole and of each of its individual subsystems. It consists in collecting event records associated with time stamps, allowing reconstruction of the trace (a sequence of events) in the order in which events occurred (a hypothetical record layout is sketched at the end of this section). The major drawback of this technique is its heavy use of computer resources: it provides detailed system activity information at the expense of the system resources required to extract large amounts of data. As an observer, the challenge for a tracer is to minimally alter the original system behavior, hence permitting its reproducibility when the observer is active. This technique is therefore inherently limited by the portion of system resources required to perform tracing. Using faster computers does not solve the problem: newer architectures will be expected to execute workloads faster and will create new timing conditions by allowing execution of larger workloads. As a result, faster hardware does not improve a tracer's ability to trace workloads, because the workload grows to fully use the available resources, mostly due to the evolution of application complexity. When supplementary computational resources are available, they are allocated to the system's primary purpose rather than to tracing.

This research is conducted within the GNU/Linux open-source operating system, which is becoming increasingly popular in industry. It is used on a wide variety of computer systems, from small embedded devices to large multiprocessor servers. Supporting a broad diversity of hardware makes it subject to subtle compatibility breakages with specific hardware combinations. In order to deal with the complexity of software development, the operating system is the fruit of the interaction of many individuals and companies forming an active developer community. Due to the increasing complexity of the operating system, people who understand it globally are very rare. Even kernel developers specialize in areas (e.g. memory management, thread scheduling, device layer, ...). Solving complex subsystem interaction problems therefore becomes a hard problem not only for Linux users, but also for its own developer community.

This research addresses the need for system-wide problem diagnosis tools, grown from the increased hardware and software complexity. It focuses on the Linux operating system, a candidate already heavily used by industry and presenting the complexity characteristics requiring such tools. The software development industry currently lacks the appropriate tools to diagnose problems requiring system-wide observability. Tracing fills this need due to some of its unique features: it permits diagnosing performance problems and bugs introduced by the interactions between execution layers and different subsystems. Furthermore, a post-processing analysis approach, where all collected information is made available for a posteriori analysis, is elected to permit diagnosing bugs occurring rarely, which are amongst the hardest to identify.
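As an illustration of the data involved, a trace can be viewed as a time-ordered sequence of compact records such as the following C sketch; the layout is hypothetical, not the LTTng wire format, which additionally compresses time stamps and aligns payloads.

    #include <stdint.h>

    /* Hypothetical event record: one entry in the collected trace. */
    struct event_record {
        uint64_t timestamp;   /* cycle-counter time stamp */
        uint16_t event_id;    /* which instrumentation site fired */
        uint16_t cpu;         /* processor that produced the event */
        uint32_t payload_len; /* payload bytes follow the header */
    };

Reconstructing the trace then amounts to merging such records by time stamp; the resource cost of tracing comes from producing and moving millions of them per second.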

1.2 Problem

Tracing the Linux operating system brings many challenges, which define the major problems inherent to tracing. These challenges are caused by the need for a system observer that minimally impacts the system behavior. As a starting point, tracer CPU usage must be kept low, so that the majority of the CPU time remains available for the traced workload. Also, exporting data outside of its producing execution context involves consuming bandwidth at many levels: various levels of caches, memory, and optionally disk and network input/output. This bandwidth consumption must also be kept within fractions of the amount of bandwidth used by the traced operating system.

Characteristics met by the operating system, such as support for real-time response (increasingly supported by the Linux Preempt-RT tree) and scalability to a large number of processors (4096), must also be supported by a tracer aiming to meet the requirements faced by this operating system.

At the processor level, reentrancy of tracer operations with respect to concurrent execution contexts is required to ensure that coherent data is collected. Moreover, data gathered from multiple processors must be synchronized to keep event ordering information for a posteriori analysis (a merge sketch closes this section).

Instrumented parts of the operating system are used as calling sites to execute tracer code. This means that instrumentation coverage is directly limited by the reentrancy of the tracer code. For example, instrumentation of kernel code executed from non-maskable interrupt (NMI) context requires the tracer to be reentrant with respect to these interrupts. Given that the kernel code executed, and thus the tracer usage, directly depends on the type of workload, it makes sense to principally consider the tracer impact on workloads representing various typical heavy system usages.

The problem studied in this work is the extraction of tracing data from an operating system, which involves collecting data from the operating system execution and streaming this information outside of the kernel. The aspects of the problem making its study worthwhile are the impacts of tracing (workload disruption) and the instrumentation coverage: which parts of the system can be instrumented.
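The event-ordering requirement can be sketched from the analysis side: each per-CPU stream is already ordered locally, so a posteriori analysis merges the streams by time stamp. The sketch below reuses the hypothetical event_record from Section 1.1 (trimmed to the field needed) and assumes fixed-size records for brevity.

    #include <stddef.h>
    #include <stdint.h>

    struct event_record {
        uint64_t timestamp; /* as in the Section 1.1 sketch; other fields omitted */
    };

    struct percpu_stream {
        const struct event_record *next; /* next unconsumed record */
        const struct event_record *end;  /* one past the last record */
    };

    /* Pop the globally-oldest event from n locally-ordered streams.
     * O(n) scan per event; a heap would be used for large n.
     * Returns NULL once every stream is drained. */
    const struct event_record *pop_oldest(struct percpu_stream *s, int n)
    {
        const struct event_record *best = NULL;
        int best_i = -1;

        for (int i = 0; i < n; i++) {
            if (s[i].next == s[i].end)
                continue; /* stream drained */
            if (best == NULL || s[i].next->timestamp < best->timestamp) {
                best = s[i].next;
                best_i = i;
            }
        }
        if (best != NULL)
            s[best_i].next++; /* consume from the winning stream */
        return best;
    }

This merge is only correct if the per-CPU time stamps are drawn from synchronized time sources, which is why timekeeping is treated as a first-class tracer component.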

1.3 Hypothesis

The hypothesis serving as a starting point for this research is that it is possible to trace an operating system running intensive workloads on large multiprocessor machines, using only a small fraction of the resources, while allowing instrumentation of any kernel or user-space code location and making it possible to model the original workload behavior. This involves using only a fraction of the operating system throughput and adding a small constant amount to its average latency, as well as preserving the following properties: scalability, real-time response, portability and reentrancy.

1.4 Objectives

The purpose of this study is to create a low-overhead, highly reentrant and scalable tracer for Linux, a widely used operating system. It must be able to handle the trace throughput generated by heavy workloads on multiprocessor systems. It must preserve, or modify only within a small proportion, the Linux kernel characteristics. The research objectives are detailed as follows:
– meet the industry and open source community tracing requirements,
– create new algorithms to solve the problems identified in industry,
– implement a tracer for Linux, an existing mainstream operating system,
– develop each tracer component so that the tracer preserves scalability and has a low impact on the operating system throughput and average latency,
– guarantee a deterministic impact of tracing on real-time response,
– provide high portability and reentrancy of the tracer mechanisms.

1.5 Claim for Originality

The main scientific contribution of this research is the creation of original synchronization algorithms suitable for tracing. The new algorithms created and their application to tracing are detailed as follows:
– creation of a wait-free, linearly scalable and NMI-safe kernel buffering scheme,
– creation of an RCU-based trace clock permitting updates of current time information larger than a word, atomically with respect to the reader execution contexts, thus offering a wait-free, linearly scalable and NMI-safe time source,

– design of a complete kernel tracer based on wait-free, linearly scalable and NMI-safe algorithms,
– application of self-modifying code techniques to dynamically enable and disable static instrumentation, with non-measurable overhead when disabled and low overhead when enabled, allowing NMI-context instrumentation (presented in Section 8.3),
– improvements to the RCU (Read-Copy Update) synchronization mechanisms for efficient execution in user-space context, specifically proposing:
  1. using signal handlers to execute memory barriers only when waiting for a grace period, allowing a fast read side,
  2. using TLS (Thread-Local Storage) to perform local RCU state access efficiently for reader threads,
  3. chaining TLS data within a doubly linked list to create a reader thread registry (sketched below), allowing quiescent-state detection with O(1) thread registration and unregistration,
– creation of a generic architecture model for formal verification of parallel algorithms, modeling weakly-ordered memory accesses and instruction scheduling.
The careful design of these synchronization primitives enables the creation of the LTTng tracer and of its components.
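The registry idea in the last RCU item can be sketched as follows; this is a simplified illustration in the spirit of the mechanism, with hypothetical names, not the library code itself. Each reader thread owns a node in thread-local storage and links it into a mutex-protected doubly linked list, so registration and unregistration are O(1) and the grace-period detector can walk all readers.

    #include <pthread.h>
    #include <stdatomic.h>

    struct rcu_reader {
        _Atomic unsigned long ctr; /* read-side critical-section state */
        struct rcu_reader *prev, *next;
    };

    /* Circular registry of reader threads.  Only registration,
     * unregistration and grace-period detection take the mutex;
     * the read-side fast path never does. */
    static struct rcu_reader registry = { .prev = &registry, .next = &registry };
    static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

    static __thread struct rcu_reader this_reader; /* TLS: no lookup needed */

    void rcu_register_thread(void)
    {
        pthread_mutex_lock(&registry_lock);
        this_reader.prev = registry.prev;          /* O(1) insertion at tail */
        this_reader.next = &registry;
        registry.prev->next = &this_reader;
        registry.prev = &this_reader;
        pthread_mutex_unlock(&registry_lock);
    }

    void rcu_unregister_thread(void)
    {
        pthread_mutex_lock(&registry_lock);
        this_reader.prev->next = this_reader.next; /* O(1) removal */
        this_reader.next->prev = this_reader.prev;
        pthread_mutex_unlock(&registry_lock);
    }

A grace-period detector then iterates from registry.next back around to &registry, waiting for each registered reader's ctr to indicate a quiescent state.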

1.6 Outline

The state of the art is presented in Chapter 2, which focuses on the high-level aspects of tracing. It refers to each article's state-of-the-art section for the in-depth, per-subject literature review: each article contains a detailed review of its specific topic.

The methodology organizing this research is presented in Chapter 3. It explains how the tracing requirements were first identified by collaborating with industry. It presents the tracer architecture developed to fulfill these requirements. The instrumentation requirements are then detailed, referring to Chapter 8 for the detailed presentation of this work not part of the four main articles. It then gives an overview of the four articles.

The core of this thesis consists of four articles. Synchronization primitive choices for the kernel tracer are presented in the article “Synchronization for Fast and Reentrant Operating System Kernel Tracing” [1], in Chapter 4. This article is currently re-submitted after revision to Software: Practice and Experience. The reviewers recommended its publication, although acceptance depends on approval of the modifications performed in the re-submitted version. The presentation of a lockless buffering scheme building on the results of the first article follows in the article “Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing” [2], in Chapter 5. It is currently submitted to ACM Transactions on Computer Systems (TOCS). Then, our implementations of the RCU (Read-Copy Update) synchronization mechanism for user-space are presented in the article “User-Level Implementations of Read-Copy Update” [3], in Chapter 6. This article is currently submitted to IEEE Transactions on Parallel and Distributed Systems (TPDS). An architecture-level model for formal parallel algorithm verification is presented in “Multi-Core Systems Modeling for Formal Verification of Parallel Algorithms” [4], in Chapter 7. This last article is also currently submitted to IEEE TPDS.

Chapter 8 presents additional results and discusses impacts of this research which are not presented in the four main articles. This includes conference publications, contributions to open source projects and external work based on the research results. Chapter 9 discusses the results obtained from a high-level perspective. It shows the links between the main results obtained in the individual articles. Finally, the conclusion recalls the progress realized and the principal research contributions, and ends by proposing future work leveraging this research.

Chapter 2

State of the Art

This chapter first presents the major computer architecture aspects which impact the current research. Aiming at portability across architectures implies taking into account multiprocessor system effects. Then, the main tracing systems available today are classified in categories related to their target systems: parallel, distributed or real-time systems. The actual usage of these systems is less strictly compartmented. Parallel and distributed systems have traditionally been used for scientific and engineering purposes. However, commercial uses have surpassed these fields in terms of market volume [5]. Real-time systems, on the other hand, have evolved somewhat separately from high-performance systems, filling the embedded system market. This trend is however changing, as these same tools become increasingly useful to observe high-throughput, low-latency systems, now required for applications such as stock exchanges and search engines, which must deal with high volumes of transactions under low-latency requirements. This forms the fourth category: commercial servers.

2.1 Computer Architecture

Historically, there was a large dissimilarity between general-purpose hardware and high-end multiprocessor computers. However, as commodity computers become increasingly multi-core and multi-processor, scalability concerns become pervasive.

2.1.1 Parallelism

Between 1990 and 2009, the parallelism level of general-purpose processors has increased significantly. From 1990 to 2000, instruction-level parallelism (pipelined and superscalar architectures [6]) increased the instruction execution throughput while still allowing a sequential programming model, by allowing instruction execution to complete in an order different from that in which the instructions were issued, as long as data dependencies are met. During this phase, the complexity of operating systems grew to allow multiple tasks to share resources, amongst them the processor, memory, devices and shared libraries [7]. Before 2000, multiprocessor systems were less common in general-purpose computers. They were mostly used for servers and mainframes, which ran specialized applications. Given the important market represented by these high-end computers, most operating systems were designed to support multiple processors. Then, the 2000–2009 period showed an increasing availability of multiprocessor systems in commodity hardware and an increasing use of such hardware in distributed systems, principally due to commercial applications [5, 8, 9, 10]. Hence, in 2009, an application must be multiprocessor-aware to fully use the computing power of today's commodity hardware, present in both general-purpose computers and distributed systems. The same applies to newer multi-core embedded systems such as the ARM11 MPCORE [11], which indicates that multiprocessing is increasingly pervasive. However, this comes at the cost of additional complexity.

2.1.2 Memory Access

In recent years, we have also seen the gap between available processing power and memory bandwidth widen, the former evolving much faster than the latter, as presented in [12]. Various architectural solutions have been deployed to deal with this kind of problem. The addition of an increasing number of cache layers is the primary method used to help data accesses keep up with processor speed. Level-1, level-2 and level-3 caches are now found in processors. They accelerate data access under the condition that temporal and spatial locality is preserved. Given the increasing performance gap filled by these caches, locality of reference becomes an increasingly important consideration. Increasing the number of cache levels is one way to deal with the ever-increasing number of processors trying to access the same memory. It performs well as long as applications provide good cache locality. An alternative solution to connect multiple cores to shared memory is Non-Uniform Memory Access (NUMA). This type of architecture subdivides memory into regions closer to specific processors. Access to local memory is faster, but the cost is that access to remote memory is slower than on machines with standard shared memory. Getting good performance on these architectures requires adapting the operating system and applications to allocate data in memory regions close to the processor, and taking memory access locality into consideration when migrating threads between processors. Hence, being able to identify inefficient memory access patterns becomes increasingly important as the effect of memory access locality increases and memory layout becomes intrinsically tied to thread scheduling. Because current architectures allow reordering of memory accesses to increase instruction-level parallelism, verification of parallel algorithms is required to ensure proper execution on parallel systems. Memory models and formal verification methods are one way to achieve such verification. The background will be presented in the state of the art of the article “Multi-Core Systems Modeling for Formal Verification of Parallel Algorithms”, found in Chapter 7.
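As an illustration of the locality concern discussed above, the following sketch allocates a working buffer on the NUMA node of the calling CPU using the libnuma user-space API (link with -lnuma); it is a minimal example under stated assumptions, not code taken from any of the systems discussed here.

#define _GNU_SOURCE
#include <numa.h>      /* libnuma: numa_available(), numa_alloc_onnode() */
#include <sched.h>     /* sched_getcpu() */
#include <stdlib.h>

void *alloc_node_local_buffer(size_t size)
{
    if (numa_available() < 0)
        return malloc(size);                  /* no NUMA support: fall back */
    int node = numa_node_of_cpu(sched_getcpu());
    return numa_alloc_onnode(size, node);     /* memory close to this CPU */
}

Accesses to such a buffer from the allocating CPU stay on the local memory controller, avoiding the remote-access penalty described above.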

2.1.3 Software-Level Support for Multiprocessing

Some of the widely used multiprocessing libraries for developing applications include the pthread library, part of the glibc, and OpenMP, now included in the gcc compiler. Beyond the multiprocessing support brought to intrinsically sequential languages like C or Fortran by such libraries, SMP support is becoming increasingly integrated into languages like Java, where thread support is part of the language. The K42 operating system is an experiment in highly scalable operating systems, with built-in tracing support. Its tracing features will be detailed in the state of the art of the article “Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing”, in Chapter 5. KTAU is a kernel tracer for the Linux operating system developed for multiprocessor computers. It provides detailed per-process tracing information, but only aggregated information for system-wide tracing. It will be detailed in the state of the art of the same chapter.
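As a small illustration of the pthread programming model mentioned at the beginning of this section, the following stand-alone example splits work across threads sharing the process address space:

#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    printf("worker %ld running\n", (long)arg);  /* each thread does its share */
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}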

2.2 Real-time

The increase in architecture-level complexity caused by added parallelism, and the increased impact of memory access locality, are concerns for real-time applications. An application is considered as having real-time requirements when it needs a computation to be performed in a deterministically known amount of time. The throughput provided by the application, in this case, comes second to the worst-case delay the computation can afford [13, 14]. These architecture trends pose a major problem in terms of real-time response. While increasing the size of caches and the number of processors can help increase computing throughput, the side-effect of this added complexity is that real-time response becomes harder to certify. It is affected by pipeline stalls, interactions between processors, and the fact that memory access delays become harder to predict. Real-time requirements are usually classified under the banners “hard real-time” and “soft real-time”. At one extreme, hard real-time applications simply cannot afford to miss a deadline. This imposes strict requirements on verification and architecture behavior, where throughput is likely to be sacrificed in favor of increased determinism. At the other end of the spectrum, soft real-time applications also have real-time requirements, and would thus consider a missed deadline as a bug in the sense that it deteriorates service quality. Some of these applications cannot afford to sacrifice performance, because high throughput, and potentially low average latency, are part of their requirements. Therefore, these applications will typically provide guarantees in terms of number of deadlines missed per time period [15]. There is currently an increasing amount of such soft real-time applications used in the field. For instance, Autodesk is using a slightly customized Linux kernel to run their audio and video acquisition and editing applications [16]. Soft real-time and low latency must however not be confused. For instance, the Google servers answering web search requests must provide a low average response time to their users in an effort to make their experience enjoyable. This is, in fact, slightly different from soft real-time: one slow answer once in a while can be acceptable, but the aim is a very low average response time [17]. Some operating systems are specialized to meet hard real-time requirements [18]. This is the case of VxWorks and µC/OS-II, for instance. These operating systems are meant to provide a deterministic response time. In the hard real-time systems category, RTAI (Real-Time Application Interface) and Xenomai provide hard real-time support for user-space applications. They collaborate with a Linux kernel, but provide limited application interfaces to the real-time applications. The RT-preempt Linux kernel is derived from the standard Linux kernel. It aims at providing real-time guarantees to standard Linux applications. It extensively modifies locking primitives, interrupt handling and thread scheduling. The aim of this kernel is to provide real-time guarantees close to hard real-time, where missing a deadline is inadmissible. This comes at the cost of a significant performance regression compared to the standard mainline kernel, and is hence not yet suitable for high-throughput soft real-time applications.
Since real-time systems requirements are heavily tied to execution time, tracers can easily provide counter-examples showing where the system does not meet deadlines, along with the faulty execution trace. This category of tool is therefore extremely useful for these systems. Tracing tools are already widely used in the real-time field, such as the closed-source Wind River Tornado, a tracing tool embedded in the VxWorks real-time operating system. Amongst the open-source tracers, the original Linux Trace Toolkit [19] is a tracer for the Linux operating system which has been made available for embedded real-time systems, including RTAI. The Ftrace tracer in the mainline Linux kernel evolved from the IRQ latency tracer, part of the Linux RT (Real-Time) kernel tree before the latter was integrated in the Linux mainline kernel. This tracer's goal was specifically to identify long latencies caused by interrupt disabling. The tracers whose source-code availability allows their study are presented in the article “Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing”, in Chapter 5.

2.3 Distributed Systems

The growing trend of distributed systems, predicted in [5], is confirmed by their increasing use between 2000 and 2009 [9, 10], especially for large-scale indexing and query workloads. In the scientific and engineering fields, the MPI library allows dividing a task across a group of computers using a standard message-passing programming interface

to exchange data at synchronization points. The Sun Studio Performance Analyzer and PIP [20] tracers instrument the MPI synchronization primitives and provide insight into the workload by monitoring the exchanges between the nodes. Commercial distributed systems are however less likely to rely on MPI-style communication between the nodes. Algorithms like map-reduce [8], which assign different roles to various nodes, including redundancy, are better suited to an RPC (Remote Procedure Call) interface. As discussed in Chapter 8, tracing the data exchanges is only part of the equation; knowing what is happening at the system level on a per-node basis is also required to identify the causes of performance degradations.
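For illustration, the message-passing style instrumented by these tracers looks as follows; this minimal example sends one integer from rank 0 to rank 1 (compile with mpicc, run under mpirun):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}

Tracers such as those cited above interpose on exactly these send/receive synchronization points to reconstruct inter-node exchanges.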

2.4 Commercial Servers

Linux distribution packaging companies such as Redhat aim to provide tools to help system administrators identify performance issues on their servers running the Redhat Enterprise distribution. SystemTAP, based on the Kprobes, Kernel Markers and Tracepoints instrumentation, is Redhat's response to this demand. Solaris, from Sun, targets a similar market by enhancing their operating system with Dtrace, a system-wide tracer. The main characteristic of this use-case category is to have servers running various services, which makes the task of pinpointing the source of performance slowdowns across the kernel, library and application layers difficult. This category of tracer will be studied in greater depth in the state of the art of Chapter 5, within the article “Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing”. Chapter 3

Methodology

This chapter presents the methodology used to proceed towards fulfillment of all research objectives, explaining the rationale for each research phase. Initially, interaction with industry partners (Autodesk, IBM Research, Google) and the LTT and Linux open source communities made it possible to collect information about their requirements, computer architectures and software environments. In parallel with this interaction, initial prototypes of the LTTng tracer were developed. These enabled us to gather real-world feedback on the tracer implementation and its usability. During this phase, initial contributions were made to the Linux kernel in order to publicize my work through the open-source community. For instance, extending the internal Linux kernel API with our synchronization primitives, Local Atomic Operations, permitted us to contribute an infrastructure identified as useful for the LTTng tracer very early in the development process. Being known in the community as an active contributor for many years counts very positively toward the amount of help and feedback received in return. Through its design and development, the LTTng tracing infrastructure has been stress-tested with very demanding workloads and benchmarked to ensure the tracer impact was within acceptable limits. It subsequently had its wait-free algorithms formally verified using model-checking, both for correctness and starvation-free behavior. Still in the early phases of development, I identified the need to decouple instrumentation from the tracer. The idea is that adding instrumentation to the Linux kernel is a task better performed by contributors and experts of each individual kernel subsystem. Consequently, I created the Kernel Markers and Tracepoints infrastructures to manage this instrumentation and help the addition of instrumentation to the Linux kernel. These infrastructures are now integrated into the mainline Linux kernel and used extensively by the kernel developers community. A few user-space LTTng tracer prototypes have been developed as part of this project. However, given the pace of kernel-side development, I decided to discontinue

support for the user-space tracer to eliminate a maintainability burden. Nevertheless, this experience has been very useful in designing the kernel tracer in a way that would allow the algorithms to be easily migrated to user-space. In 2009, Pierre-Marc Fournier, research associate with the DORSAL research team at École Polytechnique de Montréal, started a port of the LTTng kernel tracer to a user-space library, which he called UST (User-Space Tracer). Its design benefits from the experience acquired with the past LTTng user-space tracers, and most of its synchronization reuses the LTTng kernel buffering algorithms, the Markers and the Tracepoints. A key aspect of the LTTng design for scalability and low performance overhead has been the use of RCU (Read-Copy Update) to synchronize read-side accesses to trace-control data. However, this mechanism did not exist in user-space. I therefore volunteered to develop new algorithms enabling RCU usage within the more constrained user-space context. This led to the implementation of a user-space RCU library which provides services similar to the library present in the Linux kernel. With IBM's approval to release their contribution under the LGPL license, the resulting library has been made widely available and already has active users and contributors. A fine-grained model for formal verification of these complex algorithms was needed to provide a high confidence level in the implementation. Hence, I undertook the task of creating a model of modern weakly-ordered processors to ensure algorithmic correctness and progress at the lowest level, while remaining generic enough to ensure portability of the algorithm implementations.

3.1 Interaction with the community

Nowadays, many companies do business peripheral to operating systems, and thus need them, but do not earn money directly from selling operating systems. Amongst them are Google, IBM, Intel, Fujitsu, Autodesk, Ericsson, Siemens and Nokia; all of these collaborate through the Linux open source community to create and maintain the GNU/Linux operating system. Seeking feedback from industry partners and from the open source community has improved my understanding of today's computer system ecosystem. Establishing contact with such enterprises and groups of developers was initiated early in our research, with the objective of benefiting as quickly as possible from the community's expertise and feedback. An important part of this collaboration-oriented research has been the interaction with the industry, as well as with the Linux and LTT communities.

3.1.1 Tracing in the Industry

The first phase of this research consisted of meetings with industry partners, users of tracing solutions, as well as participation in conferences. It enabled the identification of tracing requirements from the industry. Internships and partnerships with Autodesk, IBM Research, and Google, as well as meetings and discussions with Wind River, helped gather this information from experienced industry partners. The Montreal Tracing Summits held in 2008 [21] and 2009 [22], as well as discussions on tracing at the Kernel Summit 2008 [23], helped us gather further input from industry. An important opportunity I had through my research was an internship at the IBM T.J. Watson research center under the mentorship of Robert Wisniewski, one of the K42 authors and an expert in highly-scalable systems and lock-less algorithms. I was exposed to different architectures, mainly PowerPC, and to Commercial Scale-Out workloads, which turned out to badly need tracing tools to find the culprits of performance problems. During my presence at IBM Research, I contributed to the paper “Experiences Understanding Performance in a Commercial Scale-Out Environment” [10]. It outlines the requirements for a system-wide tracer able to analyze cluster nodes. Another rich learning experience was an internship at Google, Mountain View, with the Platform Team, under the supervision of Martin J. Bligh. He is a very active and well-known Linux kernel developer, with contributions to the Linux NUMA support and to memory management. I was allowed to learn about the Google infrastructure and their requirements. We wrote a conference paper, “Linux Kernel Debugging on Google-sized clusters” [24], presented at the Ottawa Linux Symposium in 2008. I contributed various use-cases for which tracers have been useful at Autodesk and IBM Research. The LTTng project also had an impact in the security field. I presented debugger circumvention techniques at the Recon 2006 conference, and showed how the low-impact LTTng kernel tracer can be used to study elusive pieces of software, such as viruses. This resulted in a conference paper, “OS Tracing for Hardware, Driver and Binary Reverse Engineering in Linux” [25], of which I am the principal author. It is published in the CodeBreakers Journal. This paper, originally part of the Recon 2006 conference proceedings [26], was later added to a standard journal issue.

3.1.2 Tracing in the Open-Source Community

An initial version of LTTng was designed to meet the requirements of the industry and the open-source community. The LTTng prototypes were iteratively proposed to the community and revised to fit its requirements. Noticing, from early benchmarks, that per-CPU atomic operations were a very promising synchronization mechanism in terms of performance, I contributed a complete API to the Linux kernel, called local atomic operations. This first contribution helped us learn how the community works, helped us publicize the LTTng project, and facilitated feedback and exchanges with the community through the rest of the project. Implementing the tracer revealed a very high coupling between the Linux kernel and its instrumentation. I thus created the Linux Kernel Markers, which evolved into Tracepoints, today integrated and used widely in the Linux kernel. This permitted us to decouple kernel instrumentation from the tracer.
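To give an idea of the pattern behind local atomic operations: a per-CPU variable is only ever updated by its owning CPU, so the update must be atomic with respect to interrupts on that CPU, but not with respect to other CPUs (on x86, for instance, this allows omitting the costly LOCK prefix). The following stand-alone analogue, using a GCC atomic builtin, sketches the idea; it is illustrative only, not the kernel API itself.

#include <stdio.h>

static unsigned long events;    /* imagine one instance per CPU */

static void local_event_inc(void)
{
    unsigned long old, new;
    do {                        /* retried only if an interrupt on the
                                   same CPU raced with this update */
        old = __atomic_load_n(&events, __ATOMIC_RELAXED);
        new = old + 1;
    } while (!__atomic_compare_exchange_n(&events, &old, new, 0,
                                          __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));
}

int main(void)
{
    local_event_inc();
    printf("%lu\n", events);
    return 0;
}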

3.1.3 Authored publications

Initially, conference articles were presented to discuss early results with the Linux community. The article presented in [27] describes the LTTng architecture in its early development phase. The article [28] outlines work done to diminish instrumentation overhead (Immediate Values) and to instrument the Xen hypervisor. This article was followed by one presented at the Collaboration Summit [29], which discusses the difference between tracing requirements for kernel developers and Linux end-users. A second community, real-time users, has also been targeted. Conference articles were presented at the Embedded Linux Conference in 2006 [30] and 2009 [31]. The first focused on a use-case of the LTTng tracer to identify long latencies caused by interrupt disabling. The second article focused more on a developer-level audience, showing which architecture-specific components are needed to port the LTTng tracer to new architectures. The paper presented at Recon 2006 [26], afterward published in an issue of the CodeBreakers Journal [25], targets the computer security community. It presents how the LTTng kernel tracer can be used to understand the behavior of hostile executables, taking as an example a Linux virus.

3.1.4 Co-authored publications

Industrial internships at IBM Research and Google resulted in the publication of papers co-authored with the respective teams. Both papers discuss use-cases requiring high-throughput tracing solutions. The paper published with IBM Research [10] presents how a tracing solution can help in understanding and debugging commercial scale-out systems. On the other hand, the paper co-authored with the Google team [24] is more specifically aimed at showing various tracing use-cases to the Linux community. It gathers use-cases from experience at Google, IBM and Autodesk.

3.2 Tracer Architecture

This section presents the solution I propose to address the constraints imposed by my research hypothesis. The proposed design aims at preserving the following aspects of the operating system: scalability, portability, real-time response and, within limits permitting workload reproducibility, low latency and throughput.

3.2.1 Overview

Pursuing the objective of allowing multiple analyses to be performed on a single collected trace, enabling in-depth analysis of hard-to-reproduce bugs, we extract trace data from the kernel. In order to answer the various industry requirements, the architecture depicted in Figure 3.1 is proposed. Grey rectangles represent major phases of tracing. Within these rectangles, ellipses represent the tracing phases, linked with arrows showing the trace data flow direction. Between the tracing and post-processing phases, a dotted Input/Output arrow represents extraction of trace data through I/O mechanisms: disk, network, serial port, etc. The tracing phases are performed on the traced system, using the processor, memory bandwidth and I/O resources required to extract the data out of the kernel. Initially, instrumentation is inserted into the operating system kernel. When the kernel executes and reaches an instrumentation site, it verifies whether the tracing site is activated and calls the attached probes. These probes perform all the synchronization required to write events into the trace buffers.

Writing to circular memory buffers without live trace data extraction is called flight recorder tracing. This is one available tracing mode. The other mode consists of extracting the trace data while tracing is active; this is named buffered data extraction. Events are gathered in memory buffers to ensure that costly I/O operations are not performed during probe execution. The I/O phase is performed by specialized threads. It can be done either live, while the trace is being recorded, or after tracing activity is over. In the latter case, only the very last buffers written will be available for analysis.

Figure 3.1 Tracer architecture diagram. On-site tracing phases (instrumentation, probes, data extraction) feed trace data through Input/Output to off-site post-processing (merge-sort, analysis). Design concerns noted in the figure: scalability to multi-cores, deterministic real-time effect, low latency, low overhead, portability, cross-architecture support and scalability to large traces.

Extracting large amounts of data while keeping a small impact on system performance involves applying very strict implementation constraints. We deal with this

at the design level by minimizing the amount of trace data extraction synchronization required between processors and execution contexts. A zero-copy approach, involving no trace data copy between memory locations throughout the tracing phases, ensures efficient use of memory bandwidth. Event reordering at post-processing is made possible by gathering a time-stamp value from the traced processor, written by the probe in each event header. The time-stamp is typically the value of a time-source synchronized across processors. When provided by the architecture, a cycle count register synchronized across processors can be used as the time-source. This allows a posteriori reordering of events based on their time-stamps. The post-processing phase can be performed either in the same environment as the traced kernel or on a completely different computer architecture; it cannot be assumed that the traced and post-processing machines share the same architecture. Thus, extracted trace data must be readable by the post-processor. We propose self-described binary traces, written by the traced kernel in its native binary format, to extract compact trace data efficiently and portably.
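A minimal sketch of the a posteriori reordering step is given below: per-CPU event streams, each already ordered by time-stamp, are merged by repeatedly picking the stream whose next event is oldest. The structures are illustrative, not the actual LTTng post-processing code.

#include <stdint.h>
#include <stddef.h>

struct event  { uint64_t tsc; /* event payload omitted */ };
struct stream { const struct event *ev; size_t len, pos; };

/* Return the index of the stream holding the oldest pending event,
 * or -1 when every stream is exhausted. */
int next_stream(struct stream *s, int nstreams)
{
    int best = -1;
    for (int i = 0; i < nstreams; i++) {
        if (s[i].pos >= s[i].len)
            continue;
        if (best < 0 ||
            s[i].ev[s[i].pos].tsc < s[best].ev[s[best].pos].tsc)
            best = i;
    }
    return best;
}

Calling next_stream() repeatedly, consuming one event each time, yields the globally time-ordered event sequence: the merge-sort shown in the post-processing phase of Figure 3.1.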

3.2.2 Instrumentation

As a result of the tracing requirements study, we noticed that some work was needed to allow kernel-wide instrumentation without hurting performance when tracing is not active. The Kprobes infrastructure, already available in the mainline Linux kernel, is a breakpoint-based instrumentation approach. It dynamically replaces each kernel instruction to instrument with a breakpoint, which generates a trap each time the instruction is executed. A tracing probe can then be executed by the trap handler. However, due to the heavy performance impact of breakpoints, the inability to extract local variables anywhere in a function due to compiler optimizations, and the maintenance burden of keeping instrumentation separate from the kernel code, a more elaborate solution was needed. I present, in Chapter 8, the Linux Kernel Markers and Tracepoints infrastructures, which I created to decouple kernel instrumentation from the tracer probes. The markers and tracepoints allow us to declare instrumentation statically at the source-code level without affecting performance significantly and without adding the cost of a function call when instrumentation is disabled. Having extremely low overhead when instrumentation is dynamically disabled is crucially important to give Linux distributions the incentive to ship instrumented programs to their customers. Markers and tracepoints consist of a branch skipping a stack setup and function call when instrumentation is dynamically disabled (dormant). These individual instrumentation sites can be enabled dynamically at runtime by dynamic code modification, and only add low overhead when tracing. The typical overhead of a dormant marker or tracepoint has been measured to be below 1 cycle [28] when cache-hot. Static declaration of tracepoints helps manage this instrumentation within the Linux kernel source-code. Given that the Linux kernel is a distributed collaborative project, enabling each kernel subsystem's instrumentation to be maintained by separate maintainers helps distribute the burden of managing kernel-wide instrumentation. However, statically declaring an instrumentation site for dynamic activation typically incurs a non-zero performance overhead due to the test and branch required to skip the instrumentation call. To overcome this limitation, I created the concept of activating statically compiled code efficiently by dynamically modifying an immediate operand within an instruction, which I called Immediate Values. With these optimizations, we have been able to show the Linux kernel community that a static instrumentation approach is viable and can have near-zero overhead when disabled. The better-performing prototypes realized within this infrastructure replace a standard load, test and branch by a static jump which skips over the instrumentation unconditionally. Runtime code-patching allows us to either activate or deactivate this unconditional jump, dynamically enabling branches. This results in completely unmeasurable performance impact for the Tracepoints and Kernel Markers. These prototypes initiated an effort at Redhat to implement the required compiler support for static jump patching. More details about these infrastructures are presented in Chapter 8. Once the static instrumentation problem is covered, allowing kernel developers to add instrumentation to their own subsystems, we can turn our focus to efficiently extracting the trace data out of the kernel.
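The dormant-site pattern described above can be sketched in a few lines: a predicted-not-taken branch guards the stack setup and probe call, so a disabled site costs at most a load, test and branch, and the Immediate Values and static-jump optimizations remove even the load. The names below are illustrative, not the actual Linux kernel macros.

#include <stdio.h>

static unsigned char trace_foo_enabled;  /* toggled (or code-patched) at runtime */

static void probe_foo(int arg)           /* out-of-line probe */
{
    printf("foo event: %d\n", arg);
}

static inline void trace_foo(int arg)
{
    if (__builtin_expect(trace_foo_enabled, 0))
        probe_foo(arg);                  /* stack setup and call only when enabled */
}

int main(void)
{
    trace_foo(1);            /* dormant: branch falls through */
    trace_foo_enabled = 1;
    trace_foo(2);            /* enabled: probe executes */
    return 0;
}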

3.2.3 Synchronization Primitives

During the initial phase of the LTTng implementation, I noticed that the number of cycles required to execute synchronization instructions was much higher than for standard instructions, at least on x86 and PowerPC. Because a kernel tracer has reentrancy requirements from various execution contexts, protection is required in order to collect coherent, non-corrupted traces. I identified the synchronization primitives needed for efficient tracing on some of the most common architectures. In the first paper, presented in Chapter 4, I discuss the reentrancy question: which concurrent execution contexts can write trace data to a shared memory buffer. I propose a new clock management infrastructure using appropriate locking for tracing, then benchmark each possible choice of synchronization primitive suitable to deal with the identified concurrency. I subsequently propose to use the identified set of synchronization primitives as key synchronization components in a kernel tracer.

3.2.4 Buffering

Having selected synchronization primitives able to perform reentrant and fast kernel tracing, I created the LTTng buffering scheme to meet the low-overhead, low-latency, real-time, scalability, portability and reentrancy requirements targeted by this research. The design rules I imposed on the algorithm development were that no synchronization primitive impacting the overall traced system behavior can be used, and that each loop depending on external conditions to complete should be carefully verified for deadlocks and for possible unexpected delays and starvation that could be caused by the probe sharing a data structure with a lower-priority execution context. I verified these properties with benchmarks and formal verification to demonstrate that the research objectives are met. This showed linear scalability when increasing the number of cores, wait-free guarantees for the kernel implementation, cache-hot performance overhead between 119–314 ns per probe execution on the x86 processor family, and 1014 ns for ARMv7. The equations and algorithms, as well as the benchmarks, are detailed in the second paper, presented in Chapter 5. Formal verification of the algorithm correctness and wait-free guarantees is presented in Section 8.6. The latency added by an LTTng probe, as presented in Section 8.4, has been measured to be in a 95% confidence interval between 328 and 338 ns on an Intel Xeon.
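The core reserve/commit idea behind this buffering scheme can be sketched as follows. Writers reserve disjoint slots by retrying a compare-and-swap on a per-CPU write count; because the operation is local to a CPU, retries are bounded by the nesting depth of execution contexts (thread, interrupt, NMI), which is what makes the kernel implementation wait-free. This sketch omits sub-buffer boundary handling, overwrite mode and memory-barrier placement; the real algorithm is detailed in Chapter 5.

#include <stdint.h>

#define BUF_SIZE 4096

struct cpu_buffer {
    uint64_t write_count;    /* bytes reserved so far                */
    uint64_t commit_count;   /* bytes actually written and committed */
    char data[BUF_SIZE];
};

/* Reserve len bytes; returns the offset of the reserved slot. */
static uint64_t reserve(struct cpu_buffer *buf, uint64_t len)
{
    uint64_t old, new;
    do {
        old = __atomic_load_n(&buf->write_count, __ATOMIC_RELAXED);
        new = old + len;
    } while (!__atomic_compare_exchange_n(&buf->write_count, &old, new,
                                          0, __ATOMIC_RELAXED,
                                          __ATOMIC_RELAXED));
    return old % BUF_SIZE;
}

/* After writing the event payload into the reserved slot, publish it. */
static void commit(struct cpu_buffer *buf, uint64_t len)
{
    __atomic_add_fetch(&buf->commit_count, len, __ATOMIC_RELEASE);
}

In such a scheme, comparing commit_count with write_count tells the consumer how much reserved data has actually been completed and is safe to read.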

3.2.5 Read-Copy Update

The Read-Copy Update synchronization mechanism makes RP (Relativistic Programming) more accessible. This class of programming techniques targets shared-memory multiprocessor architectures, achieving linear read-side scalability by tolerating that different processors see memory operations in different orders and by tolerating concurrent accesses to a memory location. RCU provides a very efficient and wait-free read-side. It has been used extensively in the Linux kernel since its integration in 2002, and counts 2500 uses in the 2.6.30 kernel as of June 2009 [32]. However, although it has been made widely available in the Linux kernel, no algorithm as efficient as the ones used for kernel RCU existed for user-space execution. This execution context poses specific problems in terms of support for multi-threaded execution, support for synchronization of library data, and lack of direct access to the internal kernel API, hence limiting control over the scheduler and interrupts. Considering the pervasiveness of multi-core architectures in commodity hardware, the need for highly efficient synchronization can be expected to grow. Consequently, we can expect many applications to benefit from making RCU more widely available. One of these is, as a matter of fact, the UST (User-Space Tracer) library. To fill the need for low-overhead synchronization in UST, I thereafter created the Userspace RCU library. This library aims at providing a very low-overhead and highly-scalable read-side to the UST tracer, a dynamic shared object loaded with the applications. Great care was taken not to require any application modifications when loading the UST tracer; this requirement equally applied to the Userspace RCU library. Paul E. McKenney, author of the kernel RCU and Distinguished Engineer at the IBM Linux Technology Center, and I proceeded to benchmark various alternative user-space RCU implementations on Intel and PowerPC computers, with respectively 8 and 64 cores, showing linear read-side scalability. These user-space experiments contributed to Paul E. McKenney's simplifications of the kernel-side preemptible RCU. These kernel RCU synchronization primitives are used by the LTTng kernel tracer. I am the main author of an article on “User-Level Implementations of Read-Copy Update”, presented in Chapter 6. I led the implementation effort with the collaboration of Paul E. McKenney. This resulted in the Userspace RCU library, of which I am the author and maintainer.
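The resulting library exposes an API close to the kernel one. Below is a minimal reader/updater usage sketch (default liburcu flavor, link with -lurcu; error handling omitted), assuming an illustrative config structure that is not part of the library:

#include <urcu.h>
#include <stdlib.h>

struct config { int verbose; };
static struct config *active_config;

void reader_thread_body(void)
{
    rcu_register_thread();          /* joins the reader thread registry */
    rcu_read_lock();                /* marks a read-side critical
                                       section; very cheap, no lock     */
    struct config *c = rcu_dereference(active_config);
    if (c)
        (void)c->verbose;           /* use the consistent snapshot      */
    rcu_read_unlock();
    rcu_unregister_thread();
}

void update_config(struct config *newc)
{
    struct config *old = active_config;
    rcu_assign_pointer(active_config, newc);  /* publish new version    */
    synchronize_rcu();              /* wait for a grace period          */
    free(old);                      /* no reader can still hold old     */
}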

3.3 Experimentation

Various approaches are used to ensure that the implementation meets the expected scalability, low-overhead, low-latency and deterministic real-time characteristics. Benchmarks are used to quantify the tracing overhead in terms of fraction of system resources used, scalability and latency impact. They are also used to compare the performance overhead to that of other existing tracing solutions. Correctness of the synchronization primitives is ensured by stress-testing and formal verification.

3.3.1 Benchmarks

We demonstrate tracer properties using latency, throughput and scalability benchmarks. Section 5.5 presents LTTng benchmarks on systems running various workloads. To characterize the latency induced by the tracer probes, we demonstrate that the cache-hot CPU-time overhead per probe executed ranges from 119 to 1014 ns, depending on the architecture. This is complemented by latency benchmarks in Section 8.4, performed on the Intel Xeon, which show that the 95% confidence interval of the average tracer probe execution time, on a workload more representative of normal cache conditions, is between 328 and 338 ns. This confirms that the latency impact of the tracer is low. To measure the throughput impact of the LTTng tracer, we ran benchmarks measuring network and disk traffic throughput. The overhead in terms of fraction of system throughput used is typically below 3 % for flight recorder mode and 6 % for trace data extraction to disk, for an event throughput of 0.8 million events per second, which confirms that the typical tracer throughput overhead is low. Given that the tracer is expected to be used in the field on production machines running industry-level workloads, throughput stress-tests were performed to push the tracer to its limits and measure upper bounds on its capacity. Under the heaviest stress-tests, tracer throughput overhead is 28 % for 10 million events per second gathered in flight recorder mode. A 35 % stress-test impact has been measured for 2.5 million events per second when writing events to disk. This confirms that, even on heavy workloads, the tracer throughput overhead is only a fraction of the system throughput.

We also performed a comparative study with existing tracing solutions. Section 5.5.8 presents a comparison between LTTng and Dtrace. The results obtained on an x86 architecture show an acceleration factor of 6.42 to 1 in favor of LTTng, which demonstrates that LTTng performs significantly better than the existing tracing solutions. In this same section, the scalability of the LTTng tracer is studied by comparing the throughput of an intrinsically parallel workload with various numbers of nodes, with and without tracing. On an 8-core Intel Xeon, we observe that linear system throughput scalability is conserved under tracing. This confirms that the LTTng tracer scales linearly with the number of CPUs. Benchmarking is not suitable to study the impact of the tracer on system real-time determinism. It is rather verified with a formal method, as presented in the following section.

3.3.2 Formal Verification

Developing correct synchronization algorithms becomes an increasingly complex task with all the reordering freedom taken by some architectures. Hence, resorting to testing alone does not permit validating that these synchronization primitives will behave as expected in every context. For instance, an incorrect RCU implementation with a single grace-period phase could result in a sequence of events causing a grace-period guarantee failure, as presented in Section 6.4.3. Section 8.6 presents the formal verification performed on the LTTng wait-free buffering algorithm. A model of the LTTng wait-free algorithm, verified with the Spin model-checker, enabled the identification of an error in the original sub-buffer full and empty equations, which led to design modifications resulting in the equations presented in Section 5.4.2. Formal verification makes it possible to ensure algorithm correctness; in other words, that no race condition nor data corruption can occur. It also ensures that the algorithm is wait-free, by performing progress verification. The final paper, presented in Chapter 7, details a framework to model multi-core architectures with weak instruction scheduling and memory ordering, enabling verification of low-level synchronization algorithms such as RCU. We will proceed to the detailed explanation of each of these research phases contained within the articles presented in the next chapters. Chapter 4

Paper 1: Synchronization for Fast and Reentrant Operating System Kernel Tracing

4.1 Abstract

To effectively trace an operating system, a performance monitoring and debugging infrastructure needs the ability to trace various execution contexts. These contexts range from kernel code running in thread context to NMI (Non-Maskable Interrupt) contexts. Given that any part of the kernel infrastructure used by a kernel tracer could lead to infinite recursion if traced, and because most kernel primitives require synchronization unsuitable for some execution contexts, all interactions of the tracing code with the existing kernel infrastructure must be considered in order to correctly inter-operate with the existing operating system kernel. This paper presents a new low-overhead tracing mechanism and motivates the choice of synchronization sequences suitable for operating system kernel tracing, namely local atomic instructions as the main buffer synchronization primitive and the RCU (Read-Copy Update) mechanism to control tracing. It also proposes a wait-free algorithm extending the time-base needed by the tracer to 64 bits on architectures that lack hardware 64-bit time-base support.

4.2 Introduction

As computers are becoming increasingly multi-core, the need for debugging facilities that will help identify timing-related problems, across execution layers and between processes, is rapidly increasing [24, 33, 34]. Such facilities must allow information gathering about problematic workloads without affecting the behavior being

investigated in order to be useful for problem analysis purposes. The variety of execution contexts reached during kernel execution complicates efficient exportation of trace data out of the instrumented kernel. The general approach used to deal with this level of concurrency is either to provide good protection against other execution contexts, for instance by disabling interrupts, or to adopt a fast but limited mechanism by shrinking the instrumentation coverage (e.g. by disallowing interrupt handler instrumentation). This trade-off often means that either performance or instrumentation coverage is sacrificed. However, as this paper will show, this trade-off is not required if the appropriate synchronization primitives are chosen. We propose to extend the OS 1 kernel instrumentation coverage compared to other existing tracers by dealing with the variety of kernel execution contexts. Our approach is to consider reentrancy from the NMI 2 execution context, which presents particular constraints regarding execution atomicity due to the inability to create critical sections by disabling interrupts. In this article, we show that in a multiprocessor OS, the combination of synchronized time-stamp counters, cheap single-CPU atomic operations and trace merging provides an effective and efficient tracing mechanism which supports tracing in NMI contexts. Section 4.4 will first present the synchronization primitives used by the state-of-the-art open source tracers. In Section 4.5, we outline the LTTng 3 design requirements, which aim at high scalability and minimal real-time response disruption. Recursion between the tracer probe and the kernel will be discussed in Section 4.6, where the reasons why a kernel tracer cannot call standard kernel primitives without limiting the instrumentation coverage will be illustrated by concrete examples in Sections 4.6.1 and 4.6.2. We then propose, in Section 4.6.3, an algorithm that synchronizes data structures to provide a 64-bit tracer clock on architectures that lack hardware 64-bit time-base support. In Section 4.7 we show how to achieve both good performance and instrumentation coverage by choosing primitives that provide the right reentrancy characteristics and high performance. Overhead measurements comparing the RCU 4 [35] mechanism and local atomic operation primitives to alternative synchronization methods will support this proposal.

1. OS: Operating System
2. NMI: Non-Maskable Interrupt
3. LTTng: Linux Trace Toolkit Next Generation
4. RCU: Read-Copy Update.

This opens a wide perspective for the design of fully reentrant, wait-free, high- performance buffering schemes.

4.3 Introduction to Tracing

This section introduces the background required to understand the tracing state of the art. The tracer impact on real-time response is first discussed, followed by a description of tracer concepts. The real-time impact of algorithms can be categorized following the guarantees they provide. The terms used to identify such guarantees evolved through time in the literature [36, 37]. The terminology used in this article follows [37]. The strongest non-blocking guarantee is wait-free, which ensures each thread can always make progress and is therefore never starved. A non-blocking lock-free algorithm only ensures that the system as a whole always makes progress. This is made possible by ensuring that at least one thread is progressing when concurrent access to a data structure is performed. An obstruction-free algorithm offers an even weaker guarantee than lock-free: it only guarantees progress of any thread executing in isolation, meaning that all competing concurrent accesses may be aborted. Finally, a blocking algorithm does not provide any of these guarantees. This article specifically discusses operating system kernel tracers. A tracer consists of a mechanism collecting an execution trace from a running system. A trace is a sequence of event records, each identifying that the kernel executed a pre-identified portion of its code. Mapping between execution sites and events is made possible by instrumentation of the kernel. Instrumentation can be either declared statically at the source-code level or dynamically added to the running kernel. Statically declared instrumentation can be enabled dynamically. A tracer probe is the tracer component called when enabled instrumentation is executed. This probe is responsible for fetching all the data to write in the event record, namely an event identifier, a time-stamp and, optionally, an event-specific payload. Time-stamps are monotonically increasing values representing the time-flow on the system. They are typically derived from an external timer or from a TSC 5 register on the processor.

5. TSC: Time-Stamp Counter

To amortize the impact of I/O 6 communication, event records are saved in memory buffers. Their extraction through an I/O device is therefore delayed. To ensure continuous availability of free buffer space, a ring buffer with at least two sub-buffers can be used. One is used by tracer probes to write events while the other is extracted through I/O devices.

4.4 State of the Art

This section reviews the synchronization primitives used in the state-of-the-art open source tracers currently available, namely: the original LTT tracer [19], the wait-free write-side tracing solution found in K42 [38, 39], a highly-scalable research operating system created by IBM, DTrace [33] from Sun's OpenSolaris, SystemTAP [40] from RedHat, providing scripting hooks for the Linux kernel built as external modules, KTAU [41] from the University of Oregon, and Ftrace, started by Linux kernel maintainer Ingo Molnar. The following study details the synchronization mechanisms used in each of these projects. The original LTT (Linux Trace Toolkit) [19] project started back in 1999. Karim Yaghmour, its author, aimed at creating a kernel tracer suitable for the Linux kernel, with a static instrumentation set targeting the most useful kernel execution sites. LTT uses the architecture time-stamp counter register, when available, to interpolate the time between the time-stamps taken at sub-buffer boundaries with do_gettimeofday(). This leads to problems with NTP (Network Time Protocol) correction, where the time-base at the sub-buffer boundaries could appear to go backward. Regarding synchronization, the do_gettimeofday() function uses sequence counter locking on the read-side to ensure time-base data consistency, which could cause deadlocks if NMI handlers were instrumented. One of the early buffering schemes used was based on a spin lock (busy-waiting lock) with interrupt disabling, for buffers shared between the CPUs. Per-CPU buffer support, provided by RelayFS, uses interrupt disabling to protect from interrupt handlers. Karim worked, in collaboration with Tom Zanussi and Robert Wisniewski from IBM, on the integration of some lockless buffering ideas from the K42 tracer into RelayFS. K42 [38, 39] is a research operating system developed by IBM, mostly between 1999 and 2006. According to the authors, its code-base should be considered as a

6. I/O: Input/Output 29

prototype. It focuses on large multi-processor machine scalability and therefore uses data structures and operations local to each CPU as much as possible. It brings some very interesting ideas for tracing synchronization, namely the use of atomic operations to synchronize buffer space reservation. The tracing facility found in K42 is built into the kernel. It uses per-CPU buffers to log tracing data and limits the consumption of data to user-space threads tied to the local CPU. This design constraint could be problematic in a production OS, because if the workload is not equally shared amongst all CPUs, those with the most idle time will not be able to collaborate with the busier CPUs to help them extract the trace streams to disk or over the network. It uses a wait-free algorithm based on the CAS (compare-and-swap) operation to manage space reservation from multiple concurrent writers. It adds compiler optimization restriction barriers to order instructions with respect to the local instruction stream, but does not add memory barriers, since all data accesses are local. Once the space reservation is performed, the data writes to the buffer and the commit count increments are done out-of-order. A buffers-produced count and a buffers-consumed count are updated to keep track of the buffers available for consumption by the user-space thread. For time-base synchronization, K42 only supports architectures with 64-bit time-stamp counters, namely the PowerPC and AMD64, and assumes that those counters are synchronized across all CPUs. Therefore, a simple register read is sufficient to provide the time-base at the tracing site, and no synchronization is required after system boot. The DTrace [33] tracer was first made available in 2003, and formally released as part of Sun's Solaris 10 in 2005 under the CDDL 7 license. It aims at providing information to users about their system's behavior by executing scripts at the kernel level when instrumentation sites are reached. It has since been ported to FreeBSD and Apple Mac OS X 10.5 Leopard. A port to the Linux kernel is under development, but involves license issues between the CDDL and the GPL 8. The DTrace tracer 9 disables interrupts around the iteration on the probe array before proceeding to probe invocation. Therefore, the whole tracer site execution is protected from interrupts coming on the local CPU. Trace control synchronization is based on an RCU-like [35] mechanism. After trace control data modification, the updater thread

7. CDDL: Common Development and Distribution License.
8. GPL: General Public License.
9. Version reviewed: OpenSolaris 20090330.

waits for a grace period before all previously executing tracing sites can be considered to have completed their execution, thus reaching a quiescent state. This is performed by executing a thread on every CPU, which is only scheduled when the currently active tracing sites have completed their execution. Disabling interrupts serves as a means to mark the tracing site execution, which therefore permits detection of tracing site quiescent states. DTrace also uses a per-thread flag, T_DONTDTRACE, ensuring that critical kernel code dealing with page mappings does not call the tracer. It does not seem, however, to apply any thread flag to NMI handler execution. In OpenSolaris, NMIs are primarily used to enter the kernel debugger, which is not allowed to run at the same time as DTrace. Therefore, the following discussion applies to a situation where the same algorithms and structures would be used in an operating system like Linux, where NMIs can execute code contained in various subsystems, including the Oprofile [42] profiler. DTrace calls the dtrace_gethrtime() primitive to read the time source. On the x86 architecture, this primitive uses a locking mechanism similar to the sequence lock in Linux. A sequence lock is a type of lock which lets the reader retry the read operation until the writer exits its critical section. The particularity of the sequence lock found in DTrace is that if it spins twice waiting for the lock, it assumes that it is nested over the write lock, so a time value previously copied by the time-base tick update will be returned. This shadow value is protected by its own sequence lock. In the x86 implementation, this leaves room for a 4-way deadlock on 2 CPUs involving the NTP correction update routines, tsc_tick() and two nesting dtrace_gethrtime() calls in interrupt handlers. Although this deadlock should never cause harm due to the specific and controlled use of NMIs in OpenSolaris, porting this tracer to a different operating system or loading specific drivers using NMIs could become a problem. Discussions with Bryan Cantrill, author of DTrace, and with Mike Shapiro and Adam Leventhal, led us to notice that an appropriate NMI-safe implementation based on two sequence locks taken successively from a single thread already exists in dtrace_gethrestime(), but is not used in the lower-level x86 primitive. It requires that only a single execution thread take the two sequence locks successively. Using it in the lower-level code would require modification of the NTP adjustment code 10. This example, taken from a widely distributed tracer,

10. Based on review of the DTrace code-base, we recommend using a standard mutex to ensure

shows that it is far from trivial to design tracing clock-source synchronization properly, especially for a flexible open source operating system like Linux. Considering real-time guarantees, a sequence lock should be categorized as a blocking algorithm. If an updater thread is stopped in the middle of an update, no reader thread can progress. Therefore, a sequence lock does not provide non-blocking guarantees. This means real-time behavior can be affected significantly by the execution of DTrace. The SystemTAP [40] project from Redhat, first made available in 2006, aims at letting system administrators run scripts connected at specific kernel sites to gather information and statistics about system behavior and investigate problems at a system-wide level. It intends to provide features similar to DTrace 11 in the Linux operating system. Its primary aim is not to export the whole trace information flow, but rather to execute scripts which can either aggregate the information, perform filtering on the data input, or write data into buffers along with time-stamps. The focus is therefore not on having a very high-performance data extraction mechanism, given that this is not its main target use-case. SystemTAP uses a heavy locking mechanism at the probe site. It disables interrupts and takes a global spin lock 12 twice in the write path. The first critical section, surrounded by interrupt disabling and locking, is used to manage the free buffer pool. The second critical section, similar to the former but using a distinct lock, is needed to add the buffer ready for consumption to a ready queue. SystemTAP assumes it is called from Kprobes [43], a Linux kernel infrastructure permitting the connection of breakpoint-based probes at arbitrary addresses in the kernel. Kprobes disables interrupts around handler execution. Therefore, SystemTAP assumes interrupt disabling is done by the caller, which is not the case for static instrumentation mechanisms like the Linux Kernel Markers and Tracepoints. In those cases, if events come nested over the tracing code, caused by recursion or coming from NMIs, SystemTAP considers this an error condition and silently discards the event until the number of events discarded reaches a threshold. At that point, it stops tracing entirely. SystemTAP modules can use the gettimeofday() primitive

mutual exclusion around the two write sequence locks should permit using the same locking mechanism for both dtrace_gethrestime() and dtrace_gethrtime(), which would allow updates from NTP and from the tsc_tick() routine.
11. According to http://sourceware.org/systemtap/wiki/SystemtapDtraceComparison.
12. A spin lock is a type of busy-waiting lock in the Linux kernel.

exported by the Linux kernel as a time source. It uses a sequence lock to ensure time-base coherency. This fails in an NMI context, because it would cause a deadlock if a probe in an NMI nested over a sequence writer lock. Therefore, SystemTAP's internals disallow instrumentation of code reached from NMI context. It also depends on interrupts being disabled by the lower-level instrumentation mechanism. The KTAU (Kernel Tuning and Analysis Utilities) [41] project, available since 2006, allows either profiling or tracing the Linux kernel on a system-wide or per-process basis. It allows detailed per-process collection of trace events to memory buffers, but deals with kernel system-wide data collection by aggregating performance information of the entire system. The motivation for using aggregation to deal with system-wide data collection is that exporting the full information flow into tracing buffers would consume too much system resources. Conversely, the hypothesis the LTTng approach is trying to verify is that it is possible to trace a significant useful subset of the operating system's execution in a detailed manner without prohibitive impact on the workload behavior. Therefore, we have to consider whether the KTAU process-centric tracing approach would deal with system-wide tracing appropriately. Some design decisions indicate that detailed process tracing is not meant to be used for system-wide tracing. KTAU keeps buffers and data structures local to each thread, which can lead to significant memory usage on workloads containing multiple threads. Workloads consisting of many mostly inactive threads and a few very active threads risk overflowing the buffers if they are too small, or consuming a lot of memory if all buffers are made larger. KTAU allows tweaking the size of specific threads' buffers, but this can be difficult to tune if the threads are short-lived. We can also notice that the kernel idle loop, which includes swap activity, and all interrupts and bottom halves nested over this idle loop, are not covered by the tracer, which silently drops the events. For synchronization, KTAU permits choosing at compilation time between IRQ or bottom half (lower priority interrupts) disabling, and uses a per-thread spin lock to protect its data structures. The fact that the data can stay local to each thread ensures that no unnecessary cache-line bouncing between the CPUs will occur. Those spin locks are therefore used mainly to synchronize the data producer with the consumer. This protection mechanism is thus not intended to trace NMIs, because the handler could deadlock when taking a spin lock if it nests over code already holding the lock.

Regarding kernel reentrancy, KTAU uses vmalloc (kernel virtual memory) to allocate the trace buffers. Given that the Linux kernel populates the TLB (Translation Lookaside Buffer) entries of those pages lazily on x86, the tracing code will trigger page faults the first time those pages are accessed. Therefore, the page fault handler should be instrumented with great care. KTAU only supports x86 and PowerPC and uses the time-stamp counter register as a time source, which does not require any synchronization per se. On the performance impact side, allocation of tracing buffers at each thread creation could be problematic on workloads consisting of many short-lived threads, because thread creation is normally not expected to be slowed down by multiple page allocations, since threads usually share the same memory pages.

Ftrace, a project started in 2009 by Ingo Molnar, grew from the IRQ tracer, which traces long interrupt latencies, to incrementally integrate the wake-up tracer, providing information about scheduler activity, the function tracer, which instruments kernel function entries at low cost, and an actively augmented list of tracers. Its goal is to provide system-wide, but subsystem-oriented, tracing information primarily useful to kernel developers. It uses the Tracepoint mechanism, which comes from the LTTng project, as its primary instrumentation mechanism.

Ftrace, in its current implementation, disables interrupts and takes per-buffer (and thus per-CPU) spin locks. The advantage of a per-CPU spin lock over a global spin lock is that it does not require transferring the spin lock cache line between CPUs when the lock is taken, which improves scalability as the number of CPUs increases. Ftrace, as of its Linux 2.6.29 implementation, does not handle NMIs gracefully. If instrumentation is added in a code path reached from NMI context, a deadlock may occur due to the use of spin locks. Ftrace relies on the scheduler clock for timekeeping, which does not provide any locking against non-atomic jiffies 13 counter updates. Although this time source is statistically correct for scheduler purposes, it can result in incorrect timing data when the tracer races with the jiffies update. Dropping events coming from nested NMI handlers will be the solution integrated in the 2.6.30 kernels. Improvement is expected in the near future regarding the tracing buffer's ability to handle NMIs gracefully, using a lock-free kernel-specific buffering scheme submitted for U.S. and international patent in early 2009

13. The jiffies counter increments at each timer tick, at a frequency typically between 100 and 1000 Hz.

by Steven Rostedt 14.

4.5 Linux Trace Toolkit Next Generation

The purpose of the study presented in this paper is to serve as a basis for developing the LTTng kernel tracer. This tracer aims at tracing the Linux kernel while providing the following guarantees:
– Provide wide instrumentation coverage.
– Provide probe reentrancy for all kernel execution contexts, including NMI and MCE (Machine Check Exception) handlers.
– Record very high-frequency kernel events.
– Impose small overhead on typical workloads.
– Scale to large multiprocessor systems.
– Change the system real-time response in a predictable way.
Earlier work presented an overview of the LTTng tracer design [27] and use-case scenarios in the industry [24, 10, 30, 25]. The present paper provides an in-depth analysis of the synchronization primitives and of the new algorithms required to deal with some widely used 32-bit architectures.

The LTTng tracer probe needs, as input, a clock source to provide timestamps, trace control information to know whether tracing is enabled or filters must be applied, and the input data identified by the instrumentation. The result of its execution is to combine these inputs into an event written to a ring buffer.

In order to provide good scalability as the number of CPUs increases, LTTng uses per-CPU buffers and buffer management counters to eliminate cache misses and false sharing. This diminishes the impact of the tracer on overall system performance. Nevertheless, cross-CPU synchronization is still required when information is exchanged from a producer to a consumer CPU.

This paper justifies LTTng's use of the RCU mechanism to synchronize control information read from the probe, and of local CAS and proper memory barriers to synchronize ring buffer output, and presents a custom trace clock scheme used to deal with architectures lacking a 64-bit hardware clock source.

14. As stated in the Ftrace presentation at the Linux Foundation Collaboration Summit 2009.
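As a rough overview of the probe fast path just described, the sketch below shows the ordering of the steps. The names are illustrative, not the actual LTTng API: preemption is disabled as the RCU read-side marker, the trace control structure is read, a timestamp is taken, buffer space is reserved atomically, and the event is written and committed.

#include <linux/preempt.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

struct trace_ctl;                               /* hypothetical */
extern struct trace_ctl *active_trace;          /* RCU-managed  */
extern int trace_is_active(struct trace_ctl *ctl);
extern u64 trace_clock_read64(void);
extern long reserve_slot(struct trace_ctl *ctl, size_t len);
extern void write_event(struct trace_ctl *ctl, long offset, u64 tsc,
                        const void *data, size_t len);
extern void commit_slot(struct trace_ctl *ctl, long offset, size_t len);

static void probe_sketch(const void *data, size_t len)
{
        struct trace_ctl *ctl;
        long offset;
        u64 tsc;

        preempt_disable_notrace();               /* RCU read-side marker */
        ctl = rcu_dereference(active_trace);     /* trace control info   */
        if (ctl && trace_is_active(ctl)) {
                tsc = trace_clock_read64();      /* timestamp            */
                offset = reserve_slot(ctl, len); /* atomic reservation   */
                if (offset >= 0) {
                        write_event(ctl, offset, tsc, data, len);
                        commit_slot(ctl, offset, len); /* atomic commit  */
                }
        }
        preempt_enable_notrace();
}

The following section details how each of these steps is synchronized.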

4.6 Tracing Synchronization

In this section, we describe the atomic primitives and RCU mechanisms used by the LTTng [27, 44] tracer to deal with the constraints associated with synchronizing data structures while running in any execution context, avoiding kernel recursion. We then present an RCU-like trace clock infrastructure required to provide a 64-bit time-base on many 32-bit architectures. The performance impact of these synchronization primitives is studied thereafter, leading to the subsequent benchmark section.

4.6.1 Atomic Primitives

This section presents synchronization considerations for kernel data read from the tracing probe, followed by inner tracer synchronization: control data structures read using RCU and buffer space reservation performed with atomic operations.

Because any execution context, including NMIs, can execute the probe, any data accessed from the probe must be consistent when it runs. Kernel data identified by the instrumentation site is expected to be coherent when read by all execution contexts associated with the given site. It is therefore the instrumentation site's responsibility to correctly synchronize with those kernel data structures. Data read by the probe can be classified into two types. The first type contains global and static shared variables read from kernel memory. The second type includes data accessed locally by the processor, contained either in registers, on the thread or interrupt stack, or in per-CPU data structures when preemption 15 is temporarily disabled.

Synchronization of shared data structures is ensured by static instrumentation, because the data input identification is located within the source code, which carries the correct locking semantics. Conversely, dynamic instrumentation offers no guarantee that global or static variables read by the probe will be appropriately synchronized. For instance, Kprobes [43] does not export specific data at a given instrumentation site. Therefore, it guarantees no locking other than what is being done in the kernel

15. Preemption naturally occurs when the scheduler interrupts a thread executing in user-space context and replaces it with another runnable thread. At kernel level, with fully-preemptible Linux kernels (CONFIG_PREEMPT=y), the scheduler can preempt threads running in kernel context as well.

around the breakpoint instruction. Given that there is not necessarily any data dependency between the instruction being instrumented and the data accessed within the probe, subtle race conditions may occur if locking is not performed appropriately within the probe. Local data accessed by its owner execution context, however, does not have such locking requirements, because it is normally modified only by the local execution context. The probe which accesses this data executes either in the same execution context owning the data or in a trap generated by instructions within the owner context. However, with Kprobes, compiler optimizations do not guarantee that local variables are kept live at the probe execution site. Static instrumentation can make sure that the compiler keeps the accessed data live at a specific instruction.

Information controlling tracing behavior is accessed directly from the probe, without any consideration regarding the context in which it is executed. This information includes the buffer location, the produced and consumed data counters, and a flag specifying whether a given set of buffers is active for tracing. This provides flexibility so users can tune the tracer to their system's workload. LTTng uses the RCU mechanism to manage the trace-control data structure. This synchronization mechanism provides very fast and scalable read access by keeping copies of the protected data structure when a modification is performed. It gradually retires an outdated data structure by first replacing all pointers to it with pointers to the new version. It keeps all data copies in place until a grace period has passed, which identifies a read-side quiescent state and therefore permits reclamation of the data structure. An RCU read-side is wait-free, but the write-side can block if no more free memory is available. Moreover, the write-side may either block waiting for a grace period to end, or queue memory reclamation as an RCU callback to execute after the current grace period. In the latter case, reclamation is performed in batch after the current grace period ends. RCU therefore provides very predictable read-side real-time response. Given that trace-control data structure updates are rare (e.g. starting a trace session), this operation can afford to block.

LTTng marks the read-side critical sections by disabling preemption, because this technique is self-contained (it does not use other kernel primitives) and has low overhead. The LTTng trace-control write-side waits for readers to complete execution in order to provide guarantees to the trace-control caller. Therefore, when the start trace operation completes, the caller knows all current and new tracer probes see an active trace. The opposite applies when tracing stops. The tracing information is organized as an RCU list of trace structures, and is only read by the probe to control its behavior. Since the probe executes with preemption disabled, updates to this structure can be done on a copy of the original while the two versions are presented to the probes when the list is updated: probes holding a pointer to the old structure still use the old one, while newly executing probes use the new one. A quiescent state is reached when all processors have executed the scheduler. It guarantees that all preemption-disabled sections holding a pointer to the old structure have finished their execution. It is thus safe, from that point, to free the old data structure.
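A minimal sketch of the update side follows (illustrative names, not the actual LTTng data structures). The probe read-side is simply the preemption-disabled section described above; the updater publishes a new copy, waits for every CPU to pass through the scheduler with synchronize_sched(), then frees the old version.

#include <linux/errno.h>
#include <linux/mutex.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct trace_ctl {
        int active;
        /* ... buffer pointers, counters ... */
};

static DEFINE_MUTEX(trace_ctl_mutex);   /* excludes concurrent updaters */
static struct trace_ctl *active_trace;

static int trace_ctl_set_active(int active)
{
        struct trace_ctl *new, *old;

        mutex_lock(&trace_ctl_mutex);
        new = kmalloc(sizeof(*new), GFP_KERNEL);
        if (!new) {
                mutex_unlock(&trace_ctl_mutex);
                return -ENOMEM;
        }
        old = active_trace;
        *new = *old;                            /* copy, then modify    */
        new->active = active;
        rcu_assign_pointer(active_trace, new);  /* publish new version  */
        synchronize_sched();    /* wait for all probes to drop old ptr  */
        kfree(old);                             /* now safe to reclaim  */
        mutex_unlock(&trace_ctl_mutex);
        return 0;
}

Once trace_ctl_set_active(1) returns, every probe, current or future, observes an active trace, which is exactly the guarantee the trace-control caller needs.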
With the RCU mechanism, the write-side must use preemptible mutexes to exclude other writers and has to wait for quiescent states. Luckily, such trace data structure updates are rare (e.g. starting a trace session), so update performance is not an issue.

Because the RCU wait-free guarantees apply only to the read-side, LTTng cannot leverage RCU primitives to synchronize memory buffer space reservation, which involves updating a data structure, against reentrancy from any execution context. Primitives protecting against concurrent execution contexts performing buffer space reservation on the local CPU need to execute atomically with respect to interrupts and NMIs, which implies that atomic operations must be used to perform the data accesses.

Given that the cross-CPU synchronization points are clearly identified and occur only when sub-buffers pass from a producer CPU to a consumer CPU at sub-buffer boundaries, the performance impact of the synchronization primitives required for each event should be characterized to find out which set of primitives is adequate to protect the tracer data structures from use in concurrent execution contexts. On modern architectures such as Intel Pentium and above, AMD, PowerPC and MIPS, using atomic instructions synchronized for shared-variable modification in an SMP (Symmetric Multi-Processor) system incurs a prohibitive performance degradation, due to the synchronized variant of the instructions used (for Intel and AMD) or to the memory barriers which must accompany them on PowerPC, MIPS and modern ARM processors. Given that several atomic operations are often required to perform the equivalent synchronization of what would otherwise be done by disabling interrupts a single time, the latter method is often preferred. The wait-free write-side tracing algorithm used in LTTng 16 needs a single CAS operation to update the write count (amount of space reserved for writing) and an atomic increment to update the commit count (amount of information written in a particular sub-buffer). Given that the tracing operations happen, by design, only on per-CPU data, single-CPU atomic primitives can be safely used. This means Intel and AMD x86 do not need the LOCK prefix to synchronize these atomic operations against concurrent CPU access, while PowerPC, MIPS and modern ARM processors do not require them to be surrounded by memory barriers to ensure correct memory ordering, since the only order that matters is that seen from a single CPU. Therefore, these lightweight primitives, faster than disabling interrupts on many architectures, can be used. Section 4.7 will present benchmarks supporting these claims.
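The following sketch illustrates this reservation fast path (illustrative names and layout, not the exact LTTng code, whose details appear in a forthcoming paper): a single local_cmpxchg() reserves space in the free-running write count, and a later local_add() commits it. The local_t operations are atomic with respect to interrupts and NMIs on the owning CPU, yet compile without LOCK prefixes or memory barriers.

#include <linux/types.h>
#include <asm/local.h>

#define NR_SUBBUF 4                      /* hypothetical configuration   */

struct cpu_buf {
        local_t write_count;             /* free-running reserved bytes  */
        local_t commit_count[NR_SUBBUF]; /* free-running committed bytes */
};

/* Returns the offset of the reserved slot in the circular buffer. */
static long reserve_space(struct cpu_buf *buf, size_t len)
{
        long old, new;

        do {
                old = local_read(&buf->write_count);
                new = old + len; /* the real algorithm also aligns and
                                  * handles sub-buffer boundaries here */
        } while (local_cmpxchg(&buf->write_count, old, new) != old);

        return old;
}

static void commit_space(struct cpu_buf *buf, unsigned int subbuf, size_t len)
{
        local_add(len, &buf->commit_count[subbuf]);
}

Interrupts and NMIs nesting between the reservation and the commit simply reserve and commit their own slots; the free-running counts make the out-of-order commits converge.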

4.6.2 Recursion with the Operating System Kernel

The instrumentation coverage depends directly on the amount of interaction the probe has with the rest of the kernel. In fact, the tracer code itself cannot be instrumented, because this would lead to infinite probe recursion. The same applies to any kernel function used by the probe 17.

In the Linux kernel, the x86 32-bit and 64-bit architectures rely on page faults to populate the page table entries of the virtual memory mappings created with vmalloc() or vmap(). Since kernel modules are allocated in this area, any access to module instructions and data might cause a minor kernel page fault. Care must therefore be taken to call the vmalloc_sync_all() primitive, which populates the kernel virtual address space reserved for virtual mappings with the correct page table entries, between module load and use of this module at the tracing site. This ensures that no recursive page fault will be triggered by the page fault handler instrumentation.

In the context of the probe, the most important limitation regarding operating system recursion is the inability to wake up a process when the buffers are ready to be read. Instrumenting thread wake-ups provides very useful information about inner scheduler behavior, but instrumenting this scheduler primitive forbids using it in the tracer probe. This problem is solved by adding a periodic timer which samples the buffer state and wakes up the consumers appropriately. Given that the

16. LTTng kernel tracing algorithm with wait-free write-side will be presented in a forthcoming paper.
17. A particularly unobvious example is the page fault handler instrumentation.

operating system already executes a periodic timer interrupt to perform scheduling and manage its internal state, the performance impact of this approach is of the same order of magnitude as adding a callback to the timer interrupt. The impact on low-power consumption modes is kept small by ensuring that these per-processor polling timers are suspended while the system is in such low-power modes. Therefore, polling is only performed when the system is active, and thus generating trace data.

As a general guideline, the probe site only touches its own variables atomically, so it requires no higher-level synchronization with the OS. On the OS side, any operation done on those shared variables is also performed atomically. The result is a hermetic interface between the probe and the kernel which ensures the probe calls no OS primitive. Because preemption must be disabled around probe execution, primarily to allow the RCU-based data structure reads, care must be taken not to use an instrumented version of the preemption-disabling macros. This is done by using the untraced implementation, preempt_disable_notrace().
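The deferred wake-up can be sketched as follows (illustrative names, using the timer API of the 2.6 kernels this work targets). The probe only writes to the buffer; this periodic per-CPU callback, which runs in interrupt context, performs the scheduler interaction on its behalf.

#include <linux/jiffies.h>
#include <linux/timer.h>
#include <linux/wait.h>

static struct timer_list buf_poll_timer;
static DECLARE_WAIT_QUEUE_HEAD(buf_read_wait);

extern int subbuf_ready_for_reading(void);      /* hypothetical check */

static void buf_poll(unsigned long data)
{
        /* The probe never wakes readers itself: doing it here keeps
         * the scheduler wake-up path instrumentable. */
        if (subbuf_ready_for_reading())
                wake_up_interruptible(&buf_read_wait);
        mod_timer(&buf_poll_timer, jiffies + HZ / 10);  /* ~100 ms */
}

static void buf_poll_start(void)
{
        setup_timer(&buf_poll_timer, buf_poll, 0);
        mod_timer(&buf_poll_timer, jiffies + HZ / 10);
}

The polling period is a latency/overhead trade-off for the consumer only; it does not affect the probe fast path.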

4.6.3 Timekeeping

Time-stamping must also be done by the probe, which therefore has to read a time-base. In the Linux kernel, the standard gettimeofday() and the other clock sources are synchronized with a sequence lock (seqlock), which consists of a busy loop on the read-side, waiting for writers to finish modifying the data structure and checking for a sequence counter modification before and after reading the data structure. However, this is problematic when NMIs need to execute the read-side, because nesting over the write lock would result in a deadlock: the NMI would wait endlessly for the writer to complete its modification, but would do so while being nested over the writer. Normal use of this synchronization primitive requires interrupt disabling, which explains why it is generally correct, except in this specific case. Another issue is that the sequence lock is a blocking synchronization algorithm, because updater threads have the ability to inhibit reader progress for an arbitrarily long period of time. Therefore, the CPU time-stamp register, when available, is used to read the time-base rather than accessing any kernel infrastructure.

Some architectures provide a 64-bit time-base. This is the case for the cycle counter read with rdtsc on x86 32-bit and 64-bit [45], for the PowerPC time-base register [46] and for the 64-bit MIPS [47]. A single atomic register read permits reading the full 64-bit time-base. However, architectures like the 32-bit MIPS and the ARM OMAP3 [48] only provide a 32-bit cycle counter, and architectures lacking proper cycle counter register support must read external timers. For instance, earlier ARM processors must read the time-base from an external timer through memory-mapped I/O. Memory-mapped I/O timers usually overflow at every 32-bit count or even more often, although some exceptions, like the Intel HPET [49], permit reading a 64-bit value atomically in some modes.

The number of bits used to encode time has a direct impact on the ability of the time-base to accurately keep track of time during a trace session. A 64-bit time-base running at 4 GHz is guaranteed not to overflow for about 146 years, which should be enough for any foreseeable use. However, at a 500 MHz frequency, typical for embedded systems, 32-bit overflows occur every 8 seconds.

Tracing-specific approaches to deal with time-stamp overflow have been explored in the past, all presenting their own limitations. The sequence lock's inability to deal with NMI context has been presented above, although one could imagine porting the DTrace double-sequence-lock implementation to address this problem. This approach is however slower than the RCU read-side and implies using a blocking sequence lock, which fails to provide good real-time guarantees.

Alternatively, an approach based on a posteriori analysis of the event sequence presented in the buffers could permit detecting overflows, but this requires a guaranteed maximum time delta between two events, which could be hard to meet due to its dependency on the workload and the events traced. Low-power consumption systems with deep sleep states are good examples of such workloads. Periodically writing a resynchronization time-stamp read from a lower-frequency time-source would diminish the precision of the time-stamps to the precision of the external time-source. If, instead of writing such a resynchronization event periodically, it were written in a header to the buffer containing the events, this would again either impose limits on the slowest expected event flow, or a buffer covering too long a time period could contain undetectable 32-bit overflows. Also, given that a buffer is naturally expected to present events in an order in which time monotonically increases, performing adjustments based on a different time-source at the buffer boundary can make time go backward, because the two clocks are not perfectly synchronized: one clock going too fast could make the last buffer events overlap the time window of the following buffer. Simply using a CAS instruction would not solve the issue either, given that the architectures we are dealing with only have a 32-bit cycle counter and are typically limited to 32-bit atomic operations.

An approach already exists in the Linux kernel, created initially for the ARM architecture, to extend a 32-bit counter to 63 bits. This infrastructure, named cnt32_to_63, keeps a 32-bit (thus atomically updated) value in memory. Its 31 low-order bits hold the top 31 bits of the extended counter, and the remaining bit is used to detect overflows by keeping track of the low-order 32nd bit. The update is performed atomically in the reader context when a 32-bit overflow is detected. Assuming the code is run at least twice per low-order 32-bit overflow period, this algorithm detects the 32-bit overflows and updates the high-order 31-bit count accordingly. This approach has the benefit of requiring a very small amount of memory (only 32 bits) and of being fast: given that the snapshot is updated on the reader side as soon as an overflow is detected, the branch verifying this condition is taken only very rarely.

This approach, however, has some limitations. It only permits keeping an amount of data smaller than the architecture word size; it is therefore not extensible. It could not return the full 64 bits, because the top bit must be cleared to zero, and it could not support the addition of NTP or CPU frequency scaling information. This infrastructure also assumes that the hardware time-source always appears to go forward. Therefore, with slightly buggy timers, or if the execution or the memory accesses are not performed in order, time would jump forward by a whole 32-bit period if the time-source appears to slightly decrement at the same time an overflow occurs. This could be fixed by reserving one more bit to also keep track of the low-order 31st bit and by requiring the code to be called 4 times per counter overflow period; those two bits could then be used together to distinguish between overflow and underflow. This would however cost yet another high-order information bit and only permit returning a 62-bit time-base.

Therefore, a new mechanism is needed to generically extend these 32-bit counters to 64 bits while still allowing time-base reads from NMI context. The algorithm we created to solve this problem extends a counter containing an arbitrary number of bits to 64 bits. The data structure used is a per-CPU array containing two 64-bit counts. A pointer to either the first or the second array entry is updated atomically, which permits atomically reading the odd entry while the even one is being updated, and conversely (as shown in Figure 4.1). The reader atomically reads the

Figure 4.1 Trace clock read (no 32nd bit overflow)

Figure 4.2 Trace clock read (32nd bit overflow)

Figure 4.3 Trace clock update (1, 3, 4) interrupted by a read (2)

pointer to the current array parity and then reads the last 64-bit value updated by the periodic timer. It then detects possible overflows by comparing the current value of the time-source least-significant bits with the low-order bits of the 64-bit value. It returns the 64 bits corresponding to the current count, with the high-order bits incremented if a low-order bit overflow is detected (as shown in Figure 4.2).

The algorithm for the synthetic clock read-side is shown in Figure 4.4. At line 1, the HW_BITMASK mask is derived from TC_HW_BITS, the number of bits provided by the clock source, which is represented by a call to hw_clock_read(). The main constraint on the minimum number of bits required from the clock source is that the period the counter covers must be larger than the sum of the timer interrupt period and of the maximum interrupt latency. This ensures that a timer interrupt is

 1  #define HW_BITMASK  ((1ULL << TC_HW_BITS) - 1)
 2  #define HW_LS(hw)   ((hw) & HW_BITMASK)
 3  #define SW_MS(sw)   ((sw) & ~HW_BITMASK)
 4
 5  struct synthetic_tsc_struct {
 6          u64 tsc[2];
 7          unsigned int index;
 8  };
 9
10  static DEFINE_PER_CPU(struct synthetic_tsc_struct, synthetic_tsc);
11
12  static inline notrace u64 sw_tsc_read(u64 old_sw_tsc)
13  {
14          u64 hw_tsc, new_sw_tsc;
15
16          hw_tsc = (u64)hw_clock_read();
17          new_sw_tsc = SW_MS(old_sw_tsc) | hw_tsc;
18
19          if (unlikely(hw_tsc < HW_LS(old_sw_tsc)))
20                  new_sw_tsc += 1ULL << TC_HW_BITS;
21
22          return new_sw_tsc;
23  }
24
25  u64 notrace trace_clock_read_synthetic_tsc(void)
26  {
27          struct synthetic_tsc_struct *cpu_synth;
28          unsigned int index;
29          u64 sw_tsc;
30
31          preempt_disable_notrace();
32          cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
33          index = ACCESS_ONCE(cpu_synth->index);
34          sw_tsc = sw_tsc_read(cpu_synth->tsc[index]);
35          preempt_enable_notrace();
36
37          return sw_tsc;
38  }

Figure 4.4 Synthetic clock read-side

executed at least once per counter overflow period. Lines 2-3 present the HW_LS() and SW_MS() macros, which select the least-significant and most-significant bits of a counter, corresponding respectively to the hardware clock source bits and to the bits counting the clock-source overflows. Lines 5-8 declare a structure containing two 64-bit tsc values and an index to the current tsc value to read. Line 10 defines a per-CPU variable, synthetic_tsc, holding the current tsc value for each processor.

The inline function sw_tsc_read() is detailed at lines 12-23. The notrace keyword expands to a gcc attribute indicating that the function must not be traced, in the unlikely event gcc decides not to inline the function. It receives as parameter the last 64-bit clock value saved in the data structure and returns the current 64-bit clock value. The current source clock value is read at line 16. The current 64-bit clock value is then derived from the most significant bits of the old 64-bit clock value and from the source clock bits. If an overflow is detected at line 19, the 64-bit clock value is incremented, at line 20, by the power-of-two value corresponding to the overflow.

Lines 25-37 show the execution context considerations taken around the trace clock read. Lines 31 and 35 disable and re-enable preemption, therefore inhibiting the scheduler during this execution phase. This ensures that no thread migration occurs, guaranteeing local access to per-CPU data. It also ensures that the thread is not scheduled out for a long period of time between the moment it reads the index, reads the clock source and accesses the array. Long preemption between these operations could leave the current clock value more than one clock-source overflow period apart from the previously read 64-bit clock value when the thread resumes. To overcome this problem, the maximum duration for which this code can be interrupted is bounded by the maximum interrupt handler execution time, which must be an order of magnitude lower than the overflow period. Line 32 uses the per_cpu() primitive, which gets a pointer to the CPU-local instance of synthetic_tsc. ACCESS_ONCE() is used at line 33 to read the current index through a volatile access, which tells the compiler to treat this as an access to a memory-mapped hardware device, permitting neither re-fetching nor reading in multiple segments. Line 34 invokes the sw_tsc_read() inline function explained above, which returns the current 64-bit clock value.

The update is performed periodically, at least once per overflow period, by a per-CPU interrupt timer. It detects the low-order bit overflows and increments the upper bits, and then flips the current array entry parity (as shown in Figure 4.3). Readers still use the previous 64-bit value while the update is in progress, until the update completes with the parity flip. As pointed out earlier, the read-side must disable preemption to ensure that it only holds a reference to the current array parity for a bounded number of cycles, much lower than the periodic timer period. This upper bound is given by the maximum number of cycles spent in this short code path, increased by the worst interrupt response time expected on the system. It is assumed that no interrupt flood will hold the code path active for a whole timer period. If this assumption is eventually proven wrong, disabling interrupts around the algorithm execution would avoid this type of problem, although a delayed timer interrupt would still leave room for a missed overflow.

The update-side algorithm is detailed in Figure 4.5. The function update_synthetic_tsc() must be executed periodically on each processor. It is expected to be executed in interrupt context (therefore with preemption already disabled) at least once per overflow period. Line 6 gets a pointer to the CPU-local synthetic_tsc. Line 7 flips the current index back and forth between 0 and 1 at each invocation. Lines 8-9 invoke sw_tsc_read() to read the current 64-bit TSC value, using the last synthetic TSC value saved in the data structure by the previous update_synthetic_tsc() execution; the current 64-bit TSC value is saved in the free array entry, unused at that moment. Line 10 is a compiler barrier, ensuring that the index update performed on line 11 is not reordered before lines 8-9 by the compiler. This makes sure concurrent interrupts and NMIs are never exposed to corrupted data.

If processors need to be kept in low-power mode to save energy, the per-processor interrupt needed to update the current 64-bit synthetic TSC value can be disabled in

 1  static void update_synthetic_tsc(void)
 2  {
 3          struct synthetic_tsc_struct *cpu_synth;
 4          unsigned int new_index;
 5
 6          cpu_synth = &per_cpu(synthetic_tsc, smp_processor_id());
 7          new_index = 1 - cpu_synth->index;
 8          cpu_synth->tsc[new_index] =
 9                  sw_tsc_read(cpu_synth->tsc[cpu_synth->index]);
10          barrier();
11          cpu_synth->index = new_index;
12  }

Figure 4.5 Synthetic clock periodic update

such low-power modes, replaced by a resynchronization against an external timer counter upon return to normal processor operation. The amount of data which can be placed in the per-CPU array is not limited by the architecture word size. This scheme could therefore be extended to support time-base correction for CPU frequency scaling and NTP correction. If the hardware time-source is expected to occasionally appear to run slightly backward (due to hardware bugs or out-of-order execution), the algorithm presented above could additionally check the 31st bit to differentiate between overflow and underflow, in order to support non-perfectly monotonic time-sources while still keeping the ability to return the full 64 bits.

Given that each read-side and write-side thread completes in a bounded number of cycles without waiting, this time-base enhancement algorithm can be considered wait-free, which ensures that no thread starvation can be caused by this algorithm. It must be understood, however, that this proposed algorithm does not replace a proper 64-bit time-stamp counter implemented in hardware. Indeed, if a faulty device holds the bus, or if a driver disables interrupts for more than a cycle-counter overflow period, the miss of one (or more) cycle-counter overflows would lead to time-base inaccuracy. Making sure this situation never happens would imply reading an external clock source in addition to the cycle counter, which does not meet our efficiency constraints. Therefore, given that it is of utmost importance to be able to rely on core debugging facilities like kernel tracers, it is highly recommended to use hardware providing full 64-bit cycle counters. However, given that software must often adapt to hardware limitations rather than the opposite, the proposed algorithm works correctly unless some hardware or driver does something really bad, like holding the bus or disabling interrupts for a few seconds.
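The overflow-handling logic of Figure 4.4 can be exercised in isolation with the following user-space harness (not kernel code). It emulates a hardware counter truncated to TC_HW_BITS = 8, so overflows occur every 256 ticks, and checks that the synthetic 64-bit value tracks the ground truth as long as the read runs at least once per overflow period.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TC_HW_BITS  8
#define HW_BITMASK  ((1ULL << TC_HW_BITS) - 1)
#define HW_LS(hw)   ((hw) & HW_BITMASK)
#define SW_MS(sw)   ((sw) & ~HW_BITMASK)

static uint64_t real_time;      /* ground truth the hardware counts */

static uint64_t hw_clock_read(void)
{
        return real_time & HW_BITMASK;  /* truncated 8-bit counter */
}

/* Same logic as lines 12-23 of Figure 4.4. */
static uint64_t sw_tsc_read(uint64_t old_sw_tsc)
{
        uint64_t hw_tsc = hw_clock_read();
        uint64_t new_sw_tsc = SW_MS(old_sw_tsc) | hw_tsc;

        if (hw_tsc < HW_LS(old_sw_tsc))         /* overflow detected */
                new_sw_tsc += 1ULL << TC_HW_BITS;
        return new_sw_tsc;
}

int main(void)
{
        uint64_t last = 0;

        /* Steps of 100 ticks: less than one 256-tick overflow period. */
        for (real_time = 0; real_time < 100000; real_time += 100) {
                last = sw_tsc_read(last);
                assert(last == real_time);
        }
        printf("synthetic clock tracked %llu ticks\n",
               (unsigned long long)real_time);
        return 0;
}

Increasing the step beyond 255 ticks makes the assertion fail, illustrating why the periodic update must run at least once per overflow period.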

4.7 Benchmarks

This section presents the benchmarks used to choose the right synchronization primitives for tracing, given their respective performance impact on many of the mainstream architectures, namely Intel and AMD x86, PowerPC, ARM, Itanium and SPARC. The goal of the present section is to show that it is possible to use local atomic operations without adding prohibitive overhead relative to interrupt disabling. It will be demonstrated that, on most architectures, it is even faster to use local atomic operations than to disable interrupts.

The benchmarks presented consist of 20,000 executions of the cache-hot synchronization primitive. The overall execution time is determined by sampling the cycle counter once before and once after the 20,000 executions, from which the average time per iteration is obtained. Given that these synchronization primitives typically bring only a single cache line from memory to the processor, we assume the cache-miss cost to be almost the same for each synchronization primitive. We study their cache-hot performance impact, rather than their cache-cold impact, because a combination of synchronization primitives will typically still only require a single cache line for synchronization data.

A comparison between benchmarks performed with the synchronization primitives only, and with added operations within the synchronization, is presented in Table 4.1. The added operation consists of 10 word-sized reads and one word-sized write. For each locking primitive, the columns present the number of cycles required to execute the synchronization only, and the synchronization with the added operation, respectively. The last column presents the difference from the expected baseline, which we assume is caused by pipeline effects. It shows that the simple operations account for a negligible number of cycles compared to the synchronization cost, and that costly synchronization primitives such as the synchronized CAS are made even slower by the added operations, probably due to pipeline stalls caused by the serializing instruction. Therefore, the following benchmarks only take into account the synchronization primitive execution time. The assembly listings for the following Intel Xeon benchmarks are presented in Figures 4.6, 4.7, 4.8 and 4.9. The term speedup is used to represent the acceleration of one synchronization primitive compared to another.

Let us first focus on performance testing of the CAS operation. Table 4.2 presents benchmarks comparing interrupt disabling to local CAS on various architectures. The columns present, in this order, the architecture on which the test is done, the speedup obtained (cost of disabling and enabling interrupts divided by the overhead of a local CAS), and the overhead, in cycles, of local CAS, of synchronized CAS, and of enabling and disabling interrupts. When comparing synchronization done with local CAS to disabling local interrupts alone, a speedup between 4.60 and 5.37 is reached on x86 architectures. On PowerPC, the speedup ranges between 1.77 and 4.00; newer PowerPC generations seem to provide better interrupt disabling performance than older ones. Itanium, for both the older single-core and the newer dual-core 9050 processors, shows a small speedup of 1.33. Conversely, the UltraSPARC atomic CAS seems inefficient compared to interrupt disabling, which makes the latter option about twice as fast. As we will discuss below, besides the performance considerations, all those architectures allow NMIs to execute, and NMIs are, by design, unprotected by interrupt disabling. Therefore, unless the macroscopic impact of atomic operations becomes prohibitive, the tracer's robustness and its ability to instrument code executing in NMI context favor the use of atomic operations.

Tables 4.3, 4.4 and 4.5 present the different synchronization schemes that could be used at the tracing site. Table 4.3 shows the individual elementary operations performed when taking a spin lock (busy-waiting loop) with interrupts disabled. For each architecture, the overhead of spin locks and of interrupt disabling/enabling pairs is shown, as well as the sum of these overheads. These numbers are a “best case”, because they do not consider the non-scalability of this approach. Indeed, the spin lock atomic variable must be shared between all CPUs, which leads to performance degradation when the variable must be alternately owned by different CPUs' caches, a phenomenon known as cache-line bouncing.

Table 4.4 presents the equivalent synchronization performed using a sequence counter lock and a fully synchronized atomic operation. The Seqlock column presents the number of cycles taken for a sequence counter lock. These locks are used in the kernel timekeeping infrastructure to make sure reading the 64-bit jiffies is consistent on 32-bit architectures, and also to ensure the monotonic clock and the clock adjustment are read consistently. The Sync. CAS column presents the number of cycles taken to perform a synchronized CAS operation. This operation is needed because preemption is kept enabled, which allows migration. Given that the probe could be preempted and migrated between the moment it reads the processor ID and the moment it performs the atomic data access, concurrency between CPUs must be addressed by an SMP-aware atomic operation. If preemption is left enabled, per-CPU data would be accessed by the local CPU most of the time, so it would statistically provide good cache locality; but in cases where a thread is migrated to a different CPU between reading the pointer to the data structure and writing to the reserve or commit counters, we could have concurrent writes to the same structure from two processors. Therefore, the synchronized versions of CAS and increment must be used if preemption is left enabled. It is interesting to note that ARMv7 OMAP3 shows a significant slowdown for the sequence lock. This is caused by the requirement for read barriers before and after the sequence number read, due to the lack of an address [48] or control dependency between the sequence lock and the data to access. ARMv7 does not have weaker read-side-only memory barriers and therefore requires two dmb (Data Memory Barrier) instructions, which decreases performance significantly.

Table 4.5 presents an RCU approach to synchronization. It involves disabling preemption around the read-side critical section, keeping a copy of the old data structure upon update, and making sure the write-side waits for a grace period to pass before the old data structure can be considered private and its memory reclaimed. Disabling preemption, in this scheme, also has an effect on the scheduler: it ensures that the whole critical section is neither preempted nor migrated to a different CPU, which permits using the faster local CAS. For each architecture, column 2 presents the number of cycles taken to disable and re-enable preemption, and column 3 presents the time taken for a local CAS.

Table 4.6 presents the overall speedup of each synchronization approach compared to the baseline: a spin lock with interrupts disabled. For each architecture, the baseline speedup is presented in column 2, followed by the sequence lock and synchronized CAS speedup in column 3. Finally, column 4 presents the preemption disabling and local CAS speedup.

If we only cared about the read-side, the sequence counter lock approach would be the fastest: it only takes 3-4 cycles on the x86 architecture family to read the sequence counter and to compare it after the data structure read. This is faster than disabling preemption, which takes 8-9 cycles on x86. Preemption disabling is the cost of RCU read-side synchronization. Therefore, in preemptible kernels, an RCU read-side can be slightly slower than a sequence lock. On non-preemptible kernels, however, the performance cost of RCU falls to zero and outperforms the sequence lock. But our synchronization requirements also involve synchronizing concurrent writes to data structures. In our specific tracing case, in addition to reading the tracing control information, we also have to synchronize writes to the buffers, using a CAS to update the write counter and an atomic increment to keep track of the number of bytes committed in each sub-buffer. Choosing between a seqlock and RCU has a supplementary implication: the seqlock outperforms RCU on preemptible kernels only because preemption is left enabled. However, this implies that a fully synchronized CAS and atomic add must be used to touch per-CPU data, to guard against migration.

The speedup obtained by using the RCU approach rather than the sequence lock ranges between 1.2 and 2.53 depending on the architecture, as presented in Table 4.6. This is why, overall, the RCU and local atomic operations solution is preferred over the solution based on a read-side sequence lock and synchronized atomic operations. Moreover, in addition to executing faster, the RCU approach is reentrant with respect to NMIs, whereas the read-side sequence lock would deadlock if an NMI nested over the write lock.
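For reference, the micro-benchmark loops summarized in these tables follow the pattern sketched below (illustrative and reduced to the local CAS case; the real test modules are per-architecture and also cover the other primitives). The cycle counter is sampled once around the 20,000 cache-hot iterations, as described at the beginning of this section.

#include <linux/kernel.h>
#include <asm/local.h>
#include <asm/timex.h>

#define NR_LOOPS 20000

static local_t test_var;

static void bench_local_cas(void)
{
        cycles_t t1, t2;
        unsigned int i;
        long old;

        t1 = get_cycles();
        for (i = 0; i < NR_LOOPS; i++) {
                old = local_read(&test_var);
                local_cmpxchg(&test_var, old, old + 1);
        }
        t2 = get_cycles();

        printk(KERN_INFO "local CAS: %llu cycles for %d loops\n",
               (unsigned long long)(t2 - t1), NR_LOOPS);
}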

4.8 Least Privileged Execution Contexts

The discussion presented above focused on tracing kernel execution contexts. It is however important to keep in mind that different execution contexts, namely user-space, have different constraints. The main distinction comes from the fact that it is bad practice to let user-space code modify data structures shared with the kernel without going through a system call, because this would pose a security threat and lead to potential privilege escalation.

If we were to port the tracing probe to perform user-space tracing, the trade-off would differ. The main downside of the RCU approach, for both the scheduler-based and preemptible versions, is that it requires the writer to wait for a reader quiescent state before the old memory can be reclaimed. This could be a problem when exporting data from kernel-space to user-space (e.g. timekeeping data structures), where the write-side is the kernel and the reader is user-space. When synchronizing between different privilege levels (kernel vs. user-space), the higher privilege level must never wait on, or synchronize with, the less privileged execution context; otherwise, resource exhaustion could be triggered by the lower-privilege context.

4.9 Conclusion

As this paper has demonstrated, the current state of the art in tracing involves either instrumentation coverage limitations, synchronization flaws, or restriction of the supported architectures to those which have synchronized 64-bit time-stamp counters. A set of synchronization primitives has been proposed which fulfills the instrumentation coverage requirements of kernel tracing, adding coverage of code executed in NMI handler context, which was not properly handled by state-of-the-art tracers. Those primitives are the local CAS instruction and the RCU mechanism, along with preemption disabling around the tracing code execution.

A wait-free algorithm has been detailed which extends, in software, a time-base providing less than 64 bits (and therefore overflowing periodically during a trace) to a full 64-bit counter. It should help tracers implement time-bases without the flaws caused by incorrect use of the sequence lock, while improving the real-time guarantees compared to the sequence lock.

Finally, benchmarks have demonstrated that, on almost all architectures (except SPARC), using local CAS for synchronization rather than disabling interrupts is actually faster. This shows that using atomic primitives instead of interrupt disabling allows growing the instrumentation coverage, including code executed from NMI handler context, without sacrificing performance. This opens the door to the design of fully reentrant, wait-free, high-performance buffering schemes, and to speedups in kernel primitives currently using interrupt disabling to protect their fast path, such as the memory allocator.

Acknowledgements

We would like to thank the Linux Trace Toolkit, Linux Kernel and SystemTAP communities for their feedback, as well as NSERC, Google, IBM Research and Autodesk for funding parts of this work. We are indebted to Robert Wisniewski, Robert Roy, Paul E. McKenney, Bryan Cantrill, Simone Winkler and Etienne Bergeron for their constructive comments on this paper. Special thanks to David Miller, Josh Boyer, Alan D. Brunelle and Andika Triwidada for helping run benchmarks on various architectures.

Table 4.1 Benchmark comparison between locking primitives and added inner operations, on Intel Xeon E5405

Locking primitive            Sync. only   Sync. and operations   Pipeline effect
                             (cycles)     (cycles)               (cycles)
Baseline (no locking)         1            60                      0
Local CAS                     8            60                     -7
Sync. CAS                    24            94                     11
IRQ save/restore             39            97                     -1
Spin lock/unlock             46            99                     -6
seqlock                       3            60                     -2
Preemption disable/enable    12            60                    -11

Synchronized CAS:

110:  48 89 c8                 mov    %rcx,%rax
113:  f0 0f b1 0d 00 00 00     lock cmpxchg %ecx,0x0(%rip)
11a:  00
11b:  ff c2                    inc    %edx
11d:  81 fa 20 4e 00 00        cmp    $0x4e20,%edx
123:  75 eb                    jne    110

Local CAS:

1e8:  48 89 c8                 mov    %rcx,%rax
1eb:  0f b1 0d 00 00 00 00     cmpxchg %ecx,0x0(%rip)
1f2:  ff c2                    inc    %edx
1f4:  81 fa 20 4e 00 00        cmp    $0x4e20,%edx
1fa:  75 ec                    jne    1e8

Figure 4.6 Assembly listings for Intel Xeon benchmarks (CAS loop content)

Interrupt restore:

468:  56                       push   %rsi
469:  9d                       popfq
46a:  ff c0                    inc    %eax
46c:  3d 20 4e 00 00           cmp    $0x4e20,%eax
471:  75 f5                    jne    468

Interrupt save (and disable):

530:  9c                       pushfq
531:  59                       pop    %rcx
532:  fa                       cli
533:  ff c0                    inc    %eax
535:  3d 20 4e 00 00           cmp    $0x4e20,%eax
53a:  75 f4                    jne    530

Interrupt save/restore:

600:  51                       push   %rcx
601:  9d                       popfq
602:  9c                       pushfq
603:  59                       pop    %rcx
604:  fa                       cli
605:  ff c0                    inc    %eax
607:  3d 20 4e 00 00           cmp    $0x4e20,%eax
60c:  75 f2                    jne    600

Figure 4.7 Assembly listings for Intel Xeon benchmarks (interrupt save/restore loop content)

Table 4.2 Cycles taken to execute CAS compared to interrupt disabling

Architecture           Speedup                  CAS              Interrupts
                       (cli + sti) / local CAS  local    sync    Enable (sti)   Disable (cli)
Intel Pentium 4        5.24                     25       81      70             61
AMD Athlon(tm)64 X2    4.60                     6        24      12             11
Intel Core2            5.37                     8        24      21             22
Intel Xeon E5405       5.25                     8        24      20             22
PowerPC G5             4.00                     1        2       3              1
PowerPC POWER6         1.77                     9        17      14             2
ARMv7 OMAP3 (a)        4.09                     11       71      25             20
Itanium 2              1.33                     3        3       2              2
UltraSPARC-IIIi (b)    0.64                     0.394    0.394   0.094          0.159

a. Forced SMP configuration for test module. Missing barriers for SMP support added in these tests and reported to ARM Linux maintainers.
b. In system bus clock cycles.

Spin lock:

ffffffff814d6c00 <_spin_lock>:
ffffffff814d6c00:  65 48 8b 04 25 08 b5     mov    %gs:0xb508,%rax
ffffffff814d6c07:  00 00
ffffffff814d6c09:  ff 80 44 e0 ff ff        incl   -0x1fbc(%rax)
ffffffff814d6c0f:  b8 00 01 00 00           mov    $0x100,%eax
ffffffff814d6c14:  f0 66 0f c1 07           lock xadd %ax,(%rdi)
ffffffff814d6c19:  38 e0                    cmp    %ah,%al
ffffffff814d6c1b:  74 06                    je     ffffffff814d6c23 <_spin_lock+0x23>
ffffffff814d6c1d:  f3 90                    pause
ffffffff814d6c1f:  8a 07                    mov    (%rdi),%al
ffffffff814d6c21:  eb f6                    jmp    ffffffff814d6c19 <_spin_lock+0x19>
ffffffff814d6c23:  c3                       retq

Spin unlock:

ffffffff814d6f10 <_spin_unlock>:
ffffffff814d6f10:  fe 07                    incb   (%rdi)
ffffffff814d6f12:  65 48 8b 04 25 08 b5     mov    %gs:0xb508,%rax
ffffffff814d6f19:  00 00
ffffffff814d6f1b:  ff 88 44 e0 ff ff        decl   -0x1fbc(%rax)
ffffffff814d6f21:  f6 80 38 e0 ff ff 08     testb  $0x8,-0x1fc8(%rax)
ffffffff814d6f28:  75 06                    jne    ffffffff814d6f30 <_spin_unlock+0x20>
ffffffff814d6f2a:  f3 c3                    repz retq
ffffffff814d6f2c:  0f 1f 40 00              nopl   0x0(%rax)
ffffffff814d6f30:  e9 fb e1 ff ff           jmpq   ffffffff814d5130
ffffffff814d6f35:  66 66 2e 0f 1f 84 00     nopw   %cs:0x0(%rax,%rax,1)

Benchmark loop for spin_lock()/spin_unlock():

140:  48 c7 c7 00 00 00 00     mov    $0x0,%rdi
147:  ff c3                    inc    %ebx
149:  e8 00 00 00 00           callq  ffffffff814d6c00 <_spin_lock>
14e:  48 c7 c7 00 00 00 00     mov    $0x0,%rdi
155:  e8 00 00 00 00           callq  ffffffff814d6f10 <_spin_unlock>
15a:  81 fb 20 4e 00 00        cmp    $0x4e20,%ebx
160:  75 de                    jne    140

Figure 4.8 Assembly listings for Intel Xeon benchmarks (spin lock loop content)

Table 4.3 Breakdown of cycles taken for spin lock disabling interrupts

Architecture           Spin lock   IRQ save/restore   Total
                       (cycles)    (cycles)           (cycles)
Pentium 4              144         131                275
AMD Athlon(tm)64 X2    67          23                 90
Intel Core2            57          43                 100
Intel Xeon E5405       46          39                 85
ARMv7 OMAP3 (a)        132         45                 177

a. Forced SMP configuration for test module.

Sequence read lock:

330:  f3 90                    pause
332:  89 f2                    mov    %esi,%edx
334:  48 89 c8                 mov    %rcx,%rax
337:  a8 01                    test   $0x1,%al
339:  75 f5                    jne    330
33b:  39 15 00 00 00 00        cmp    %edx,0x0(%rip)
341:  75 ef                    jne    332
343:  ff c7                    inc    %edi
345:  81 ff 20 4e 00 00        cmp    $0x4e20,%edi
34b:  75 ea                    jne    337

Preemption disabling/enabling:

3f8:  ff 43 1c                 incl   0x1c(%rbx)
3fb:  ff 4b 1c                 decl   0x1c(%rbx)
3fe:  41 f6 84 24 38 e0 ff     testb  $0x8,-0x1fc8(%r12)
405:  ff 08
407:  0f 85 a4 00 00 00        jne    4b1
40d:  ff c5                    inc    %ebp
40f:  81 fd 20 4e 00 00        cmp    $0x4e20,%ebp
415:  75 e1                    jne    3f8
[...]
4b1:  e8 00 00 00 00           callq  4b6
4b6:  e9 52 ff ff ff           jmpq   40d

Figure 4.9 Assembly listings for Intel Xeon benchmarks (sequence lock and preemption disabling loop content)

Table 4.4 Breakdown of cycles taken for using a read seqlock and using a synchronized CAS

Architecture           Seqlock    Sync. CAS   Total
                       (cycles)   (cycles)    (cycles)
Pentium 4              4          81          85
AMD Athlon(tm)64 X2    4          24          28
Intel Core2            3          24          27
Intel Xeon E5405       3          24          27
ARMv7 OMAP3 (a)        73         71          144

a. Forced SMP configuration for test module.

Table 4.5 Breakdown of cycles taken for disabling preemption and using a local CAS

Architecture           Preemption disable/enable   Local CAS   Total
                       (cycles)                    (cycles)    (cycles)
Pentium 4              9                           25          34
AMD Athlon(tm)64 X2    12                          5           17
Intel Core2            12                          8           20
Intel Xeon E5405       12                          8           20
ARMv7 OMAP3 (a)        10                          11          21

a. Forced SMP configuration for test module.

Table 4.6 Speedup of tracing synchronization primitives compared to disabling interrupts and spin lock

Architecture           Spin lock              Sequence lock   Preempt disabled
                       disabling interrupts   and CAS         and local CAS
                       (speedup)              (speedup)       (speedup)
Pentium 4              1                      3.2             8.1
AMD Athlon(tm)64 X2    1                      3.2             5.3
Intel Core2            1                      3.7             5.0
Intel Xeon E5405       1                      3.1             4.3
ARMv7 OMAP3 (a)        1                      1.2             8.4

a. Forced SMP configuration for test module.

Chapter 5

Paper 2: Lockless Multi-Core High-Throughput Buffering Scheme for Kernel Tracing

Abstract

Studying the execution of concurrent real-time online systems, to identify far-reaching and hard-to-reproduce latency and performance problems, requires a mechanism able to cope with large amounts of information extracted from execution traces, without disturbing the workload and thereby making the problematic behavior unreproducible. In order to meet this low-disturbance requirement, we created the LTTng kernel tracer. It is designed to make it possible to attach probes, safely and race-free, virtually anywhere in the operating system, including sites executed in non-maskable interrupt context. In addition to being reentrant with respect to all kernel execution contexts, LTTng provides good performance and scalability, mainly due to its use of per-CPU data structures, of local atomic operations as the main buffer synchronization primitive, and of the RCU (Read-Copy Update) mechanism to control tracing. Given that kernel infrastructure used by the tracer could lead to infinite recursion if traced, and typically requires non-atomic synchronization, this paper proposes an asynchronous mechanism to inform the kernel that a buffer is ready to be read. This ensures that the tracing site does not require any kernel primitive and is therefore protected from infinite recursion. This paper presents the core of LTTng's buffering algorithms and benchmarks their performance.

5.1 Introduction

Performance monitoring of multiprocessor high-performance computers deployed as production systems (e.g. the Google platform) requires tools that report what is being executed on the system. This provides a better understanding of the interactions of complex multi-threaded and multi-process applications with the kernel.

Tracing the most important kernel events has been done for decades in the embedded field to reveal useful information about program behavior and performance. The main distinctive aspect of multiprocessor system tracing is the complexity added by time synchronization across cores. Additionally, tracing the interactions between processes and the kernel generates a high volume of information.

Allowing wide instrumentation coverage of the kernel code can prove especially tricky, given the concurrency of multiple execution contexts and multiple processors. In addition to being able to trace a large portion of the executable code, another key property expected from a kernel tracer is to be low-overhead and not disturb normal system behavior. Ideally, a problematic workload should be repeatable both under normal conditions and under tracing, without suffering from the observer effect caused by the tracer. The LTTng [27] tracer (available at http://www.lttng.org) has been developed with these two principal goals in mind: provide good instrumentation coverage and minimize the observer effect on the traced system.

A state-of-the-art review is first presented, showing how the various tracer requirements steer each design and its core synchronization primitive choices in different directions, and how LTTng differs. The K42 tracer is studied in detail, given the significant contribution of this research operating system. This paper discusses some limitations present in the K42 lockless algorithm, which leads to the need for a new buffer management model. The algorithms and equations required to manage the buffers, ensuring complete atomicity of the probe, are then detailed. The scalability of the approach is also discussed, explaining the motivation behind the choice of per-CPU data structures to provide good processor cache locality. Performance tests show how the tracer performs under various workloads at the macro-benchmark and micro-benchmark levels.

5.2 State of the art

In this section, we first review the requirements of the target LTTng user base in terms of tracing. This is a summary of field work done to identify those requirements from real-world Linux users. Then, we present the state-of-the-art open source tracers. For each of them, the target usage scenarios are presented along with the requirements they impose. Finally, we study in detail the tracer in K42, which is the closest to the LTTng requirements, explaining where LTTng brings new contributions.

Previous work published in 2007 at the Linux Symposium [24] and Euro-Par [10] presented the user requirements for kernel tracing that are driving the LTTng effort. They explain how tracing is expected to be used by Linux end-users, developers, technical support providers and system administrators. The following list summarizes this information and lists which Linux distributions integrate LTTng:
– Large online service companies such as Google need a tool to monitor their production servers and to help them solve hard-to-reproduce problems. Google has had success with such tracing approaches to fix rarely occurring disk delay issues and virtual-memory-related issues. They need the tracer to have a minimal performance footprint.
– IBM Research looked into the debugging of commercial scale-out applications, which are increasingly used to split large server workloads. They used LTTng successfully to solve a distributed filesystem-related issue.
– Autodesk, in the development of their next generation of Linux audio/video editing applications, used LTTng extensively to solve the soft real-time issues they faced.
– Wind River includes LTTng in their Linux distribution, so their clients, already familiar with Wind River VxWorks tracing solutions, can benefit from the same kind of features they have relied on for a long time.
– Montavista has integrated LTTng in their Edition 5.0 for the same reasons.
– SuSE is currently integrating LTTng in their next SLES real-time distribution, because their clients, asking for solutions supporting a kernel closer to real-time, need such tools to debug their problems.
– A project between Ericsson, Defence R&D Canada, NSERC and various universities is just starting. It aims at monitoring and debugging multi-core systems, providing tools to automate system behavior analysis.
– Siemens has been using LTTng internally for quite some time now [50].

We will now look at the existing tracing solutions for which detailed design and implementation documentation is publicly available. This study focuses on tracers available under an open-source license, given that closed-source tracers do not provide such detailed documentation. The requirements fulfilled by each tracer, as well as their design choices, will be exposed. Areas in which the LTTng requirements differ from these tracers will be outlined.

DTrace [33], first made available in 2003 and formally released as part of Sun's Solaris 10 in 2005, aims at providing information to users about the way their operating system and applications behave, by executing scripts performing specialized analysis. It also provides the infrastructure to collect the event trace into memory buffers, but aims at moderate event production rates. It disables interrupts to protect the tracer from concurrent execution contexts on the same processor, and uses a sequence lock to protect the clock source from concurrent modifications.

SystemTAP [40] provides scriptable probes which can be connected on top of Markers, Tracepoints or Kprobes [43]. It is designed to provide a safe language to express the scripts to run at the instrumentation site, but does not aim at optimizing probe performance for high data volume, since it was originally designed to gather information exclusively from Kprobes breakpoints and therefore expects the user to carefully filter out the unneeded information to diminish the probe effect. It disables interrupts and takes a busy-spinning lock to synchronize concurrent tracing site execution. The LKET project (Linux Kernel Event Tracer) re-used the SystemTAP infrastructure to trace events, but reached limited performance results, given that it shared much of SystemTAP's heavy synchronization.

Ftrace, started in 2009 by Ingo Molnar, aims primarily at kernel tracing suited to kernel developers' needs. It primarily lets specialized trace analysis modules run in kernel-space to generate either a trace or an analysis output, available to the user in text format. It also integrates binary buffer data extraction, which aims at providing efficient data output. It is currently based on per-CPU busy-spinning locks and interrupt disabling to protect the tracer against concurrent execution contexts. It is currently evolving to a lockless buffering scheme. In comparison, the work on LTTng presented in this paper started back in 2005.

The K42 [38] project is a research operating system developed mostly between 1999 and 2006 by IBM Research. It targeted primarily large multiprocessor machines with high scalability and performance requirements. It contained a built-in tracer simply named “trace”, which was an element integrated into the kernel design per se. The systems targeted by K42 and its use of lockless buffering algorithms based on atomic operations are similar to LTTng. From a design point of view, a major difference between this research-oriented tracer and LTTng is that the latter aims at being deployed on multi-user Linux systems, where security is a concern. Therefore, simply sharing a per-CPU buffer, available both for reading and writing by the kernel and any user process, would not be acceptable on production systems.

Also, in terms of synchronization, K42's tracer implementation ties the trace extraction user-space threads to the processor on which the information is collected. Although this removes the need for synchronization, it also implies that a relatively idle processor cannot contribute to the overall tracing effort when some processors are busier. Regarding CPU hotplug support, which is present in Linux, an approach where the only threads able to extract the buffer data are tied to the local processor would not allow trace extraction in the event a processor goes offline. Adding support for cross-CPU data readers would involve adding the proper memory barriers to the tracer.

Then, more importantly for the focus of this paper, studying in depth the lockless atomic buffering scheme found in K42 indicates the presence of a race condition where data corruption is possible. It must be pointed out that, given that the K42 tracer uses large buffers compared to the typical event size, this race is unlikely to happen, but it could become more frequent if the buffer size is made smaller or larger events are written, which the LTTng tracer's flexibility permits.

The K42 tracer [39] divides the memory reserved for tracing a particular CPU into buffers. This maps to the sub-buffer concept presented in the LTTng design. In comparison, LTTng uses the “buffer” name to identify the set of sub-buffers which are parts of the circular buffer. In the present discussion, the term “buffer” will have the K42 semantic, but the rest of the paper will use the LTTng semantic. The K42 scheme uses a lockless buffer-space management algorithm based on a reserve-commit semantic. Space is first reserved atomically in the buffer, and then the data write and commit are done out-of-order with respect to local interrupts. It uses a buffersProduced count, which counts the number of buffers produced by the tracer, a

buffersConsumed count, to keep track of the number of buffers read, and a per-buffer bufferCount, to keep track of the amount of information committed into each buffer. In the K42 scheme, the buffersProduced count is incremented upon buffer-space reservation for an event crossing a buffer boundary. If other out-of-order writes cause the current and previous sub-buffer's commit counts to be a multiple of the buffer size (because they would still be fully uncommitted), the user-space data consumption thread can read non-committed (invalid) data, because the buffersProduced count would make an uncommitted buffer appear as fully committed. This is a basic algorithmic flaw that LTTng fixes by using a free-running per sub-buffer commit count and a different buffer-full criterion, which depends on the difference between the write count (global to the whole buffer) and its associated per-subbuffer commit count, as detailed in Equation 5.1 in Section 5.4.2. The formal verification performed by modeling the LTTng algorithms with the Spin model-checker increases the level of confidence that such corner-cases are correctly handled.

5.3 Design of LTTng

Tracing an operating system kernel poses interesting problems related to the observer effect. In fact, tracing performed at the software level requires modifying the execution flow of the traced system and therefore modifies its behavior and performance. When deciding what code will be executed when the instrumentation is reached, each execution context concerned must be taken into account.

This section describes how LTTng is designed to deal with kernel tracing, satisfying the constraints associated with synchronization of data structures while running in any execution context, avoiding kernel recursion and inducing a very small performance impact. It details a complete buffering synchronization scheme.

This section starts with a high-level overview of the tracer design. It is followed by a more detailed presentation of the Channel component, a highly efficient data transport pipe. Synchronization of trace Control data structures, allowing tracing configuration, is then exposed. This leads us to the Data Flow presentation as seen from the tracing probe perspective. Finally, the Atomic Buffering Scheme section details the core of LTTng concurrency management, which brings innovative algorithms to deal with write concurrency in circular memory buffers.

5.3.1 Components overview

Starting with a high-level perspective on the tracer design, Figure 5.1 presents the component interactions across the boundary between kernel-space and user-space.

Kernel core and kernel modules are instrumented either statically at the source-code level with the Linux Kernel Markers and Tracepoints, or dynamically with Kprobes. Each instrumentation site identifies kernel code and module code which must be traced upon execution. Both static and dynamic instrumentation can be activated at runtime on a per-site basis to individually enable each event type. An event maps to a set of functionally equivalent instrumentation sites.

When an instrumented code site is executed, the LTTng probe is called if the instrumentation site is activated. The probe reads the trace session status and writes an event to the channels. Trace sessions contain the tracing configuration data and pointers to multiple channels. Although only one session is represented in Figure 5.1, there can be many trace sessions concurrently active, each with its own trace configuration and its own set of channels. Configuration data determines if the trace session is active or not and which event filters should be applied.

From a high-level perspective, a channel can be seen as an information pipe with specific characteristics configured at trace session creation time. Buffer size, tracing mode (flight recorder or non-overwrite) and buffer flush period can be specified on a per-channel basis. These options will be detailed in Section 5.3.2.

DebugFS is a virtual filesystem providing an interface to control kernel debugging and export data from kernel-space to user-space. The trace session and channel data structures are organized as DebugFS files to let lttctl and lttd interact with them. The user-space program lttctl is a command-line interface interacting with the DebugFS file system to control kernel tracing. It configures the trace session before tracing starts and is responsible for starting and stopping trace sessions. The user-space daemon lttd also interacts with DebugFS to extract the channels to disk or network storage. This daemon is only responsible for data extraction; it has absolutely no direct interaction with trace sessions.

Figure 5.1 Tracer components overview

5.3.2 Channels

After the high-level tracer presentation, let's focus on the Channel components, presented in Figure 5.2. A channel is a pipe between an information producer and consumer (producer and writer, as well as consumer and reader, will be used as synonyms throughout this paper). It serves as a buffer to move data efficiently. It consists of one buffer per CPU to ensure cache locality and eliminate false sharing. Each buffer is made of many sub-buffers where slots are reserved sequentially. Each sub-buffer is exported by the lttd daemon to disk or to the network separately.

A slot is a sub-buffer region reserved for exclusive write access by a probe. This space is reserved to write either a sub-buffer header or an event header and payload. Figure 5.2 shows space being reserved. On CPU 0, space is reserved in sub-buffer 0 following event 0. In this buffer, the header and event 0 elements have been completely written to the buffer. The grey area represents slots for which the associated commit count increment has been done. Committing a reserved slot makes it available for reading. On CPU n, a slot is reserved in sub-buffer 0 but is still uncommitted. It is however followed by a committed event. This is possible due to the non-serial nature of event write and commit operations. This situation happens when execution is interrupted between space reservation and commit count update, and another event must be written by the interrupt handler. Sub-buffer 1, belonging to CPU 0, shows a fully committed sub-buffer ready for reading.

Events written in a reserved slot are made of a header and a variable-sized payload. The header contains the time stamp associated with the event and the event type (an integer identifier). The event type information allows parsing the payload and determining its size. The maximum slot size is bounded by the sub-buffer size.

Channels can be configured in either of the two following tracing modes. Flight recorder tracing is a mode where the oldest buffer data is overwritten when a buffer is full. Conversely, non-overwrite tracing discards (and counts) events when a buffer is full. Those discarded events are counted to evaluate tracing accuracy. These counters are recorded in each sub-buffer header to allow identifying which trace region suffered from event loss. The former mode is made to capture a snapshot of the system preceding execution at a given point. The latter is made to collect the entire execution trace over a period of time.
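To make the channel layout more concrete, the following is a minimal C sketch of the data structures just described. All type and field names are illustrative stand-ins, not the actual LTTng definitions.

#include <linux/types.h>
#include <asm/local.h>
#include <asm/atomic.h>

/* Illustrative sketch only: not the actual LTTng structure layout. */
struct subbuf_header {
        u64 begin_tsc;          /* time stamp at sub-buffer start */
        u64 events_lost;        /* discarded events (non-overwrite mode) */
        u32 padding_size;       /* unused bytes at the end of the sub-buffer */
};

struct cpu_buffer {                     /* one per CPU, per channel */
        local_t write_count;            /* bytes reserved, free-running */
        atomic_long_t read_count;       /* bytes consumed, free-running */
        local_t *commit_count;          /* one counter per sub-buffer */
        local_t *commit_seq;            /* one counter per sub-buffer */
        void **subbuf;                  /* backing memory of each sub-buffer */
};

struct channel {
        size_t subbuf_size;             /* power of 2 */
        size_t n_subbufs;               /* power of 2 */
        int overwrite;                  /* flight recorder vs non-overwrite */
        struct cpu_buffer *buf;         /* array indexed by CPU */
};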

Figure 5.2 Channel components

5.3.3 Control

This section presents interactions with the trace session data structure depicted in Figure 5.1, along with the required synchronization. Information controlling tracing includes, for instance, the channel location and a flag to specify if a specific set of buffers is active for tracing. This provides flexibility so users can tune the tracer according to their system's workload. They can determine how much memory space must be reserved for buffering the tracing data. They can also configure each channel in flight recorder or non-overwrite mode. Selection of tracing behavior can be tuned on a per-channel basis. The channel identifier forms an intrinsic event categorization.

Tracing control operations include creating a new trace session, starting or stopping tracing, and freeing a trace session. Providing an external callback to be called for per-trace filtering is also possible. Upon new trace session creation, parameters must be set, such as the channels' buffer size, the number of sub-buffers per buffer, the tracing mode and whether tracing is enabled for each information channel. This tracing control information is contained in a list of active trace sessions.

Tracing control is done by a kernel module, ltt-tracer, which updates the RCU list of active trace sessions. It protects the update operation from concurrent writes by holding a mutex. Two types of data structure modifications can be done: the data element can be updated atomically, in which case it is safe to perform the modification without copying the complete trace control data structure, as long as the mutex is held. Non-atomic updates must be done on a copy of the trace control structure, followed by a replacement of the old copy in the list by two successive pointer changes in this precise order: first setting the pointer to the next element within the new copy, and then setting the pointer to the new copy in the previous element. The updater then waits for a quiescent state, which allows memory reclamation of the old data structure. This ensures no active data structure readers, the probes, still hold a reference to the old structure when it is freed.

Modification of buffer data structures by the ltt-tracer kernel module is only done upon new trace session creation and deletion. Once the trace is started, the module won't modify these structures until tracing is stopped. This ensures only the data producers and consumers will touch the buffer management structures.

In order to provide the ability to export tracing information as a live stream, one must ensure a maximum latency between the moment the event is written to the memory buffers and the moment it is ready to be read by the consumer. However, because the information is only made available for reading after a sub-buffer has been filled, a low event-rate channel might never be ready for reading until the final buffer flush is done when tracing is stopped.

To get around this problem, LTTng implements a per-CPU sub-buffer flush function which can be executed concurrently with tracing. It shares many similarities with tracing an event. However, it won't flush an empty sub-buffer because there is no information to send, and it does not reserve space in the buffer. The only supplementary step required to stream the information is to call the buffer flush for each channel periodically in a per-CPU timer interrupt.
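The non-atomic update path described above corresponds to the classic RCU publish-then-reclaim pattern. Below is a hedged C sketch of it using the mainline RCU list API; ltt_trace, traces_list and ltt_trace_update() are illustrative stand-ins for the actual LTTng code.

#include <linux/mutex.h>
#include <linux/rculist.h>
#include <linux/string.h>
#include <linux/slab.h>

static DEFINE_MUTEX(ltt_traces_mutex);
static LIST_HEAD(traces_list);          /* probes walk it under rcu_read_lock() */

struct ltt_trace {
        struct list_head list;
        /* ... trace control fields ... */
};

static int ltt_trace_update(struct ltt_trace *old)
{
        struct ltt_trace *new;

        mutex_lock(&ltt_traces_mutex);
        new = kmemdup(old, sizeof(*old), GFP_KERNEL);
        if (!new) {
                mutex_unlock(&ltt_traces_mutex);
                return -ENOMEM;
        }
        /* ... perform the non-atomic modifications on the copy ... */
        list_replace_rcu(&old->list, &new->list); /* new copy published first */
        mutex_unlock(&ltt_traces_mutex);

        synchronize_rcu();      /* wait for a quiescent state: no probe still
                                 * holds a reference to the old structure */
        kfree(old);
        return 0;
}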

5.3.4 Probe Data Flow

The tracing data flow from the probe perspective is illustrated in Figure 5.3. This figure shows all data sources and sinks, including those which are not part of the tracer per se, such as kernel data structures and hardware time stamps.

Figure 5.3 Probe data flow

A probe takes event data from registers, the stack, or from memory every time the instrumented kernel execution site is reached. A time stamp is then associated with this information to form an event, identified by an event ID. The tracing control information is read to know which channel is concerned by the information. Finally, the resulting event is serialized and written to a circular buffer to be later exported outside of kernel-space. The channels offer a producer-consumer semantic.

Instrumentation can be inserted either statically, at the source-code level, or dynamically, using a breakpoint. The former allows building instrumentation into the software, therefore identifying key instrumentation sites and maintaining a stable API. It can also prevent the compiler from optimizing away variables needed at the instrumented site. However, in order to benefit from flexible live instrumentation insertion, without recompilation and reboot, it might be adequate to pay the performance cost associated with a breakpoint, but one must accept that the local variables might be optimized away and that the kernel debug information must be kept around.

Source-code level instrumentation, enabled at runtime, is currently provided by Tracepoints [51] and the Linux Kernel Markers [52], developed as part of the LTTng project and merged into the mainline Linux kernel. Dynamic instrumentation, based on breakpoints, is provided in the Linux kernel by Kprobes [43] for many architectures. LTTng, SystemTAP and DTrace all use a combination of dynamic and static instrumentation. The details about the different instrumentation mechanisms are not, however, the focus of this paper. The following section presents channel ring-buffer synchronization.
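As a concrete illustration of static source-level instrumentation, here is a hedged sketch of a marker insertion using the Linux Kernel Markers API of that era; the marker name, format string and surrounding function are invented for the example.

#include <linux/marker.h>

/* Hypothetical instrumentation site: records a file descriptor and a
 * byte count each time this kernel path executes. A probe connected to
 * this marker at runtime receives the format string and the arguments. */
void example_kernel_path(int fd, size_t count)
{
        trace_mark(example_subsys_read, "fd %d count %zu", fd, count);
        /* ... rest of the instrumented code path ... */
}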

5.4 Atomic Buffering Scheme

The atomic buffering scheme implemented in LTTng allows the probe to produce data in circular buffers with a buffer-space reservation mechanism which ensures cor- rect reentrancy with respect to asynchronous event sources. These include maskable and non-maskable interrupts (NMIs). Preemption 1 is temporarily disabled around the tracing site to make sure no thread migration to a different CPU can occur in the middle of probe execution. Section 5.4.1 first presents the data structures used to synchronize the buffering

scheme. Then, the algorithms performing the interactions between producer and consumer are discussed in Sections 5.4.2 and 5.4.3.

1. With fully-preemptible Linux kernels (CONFIG PREEMPT=y), the scheduler can preempt threads running in kernel context to run another thread.

5.4.1 Atomic data structures

On SMP (Symmetric Multiprocessing) systems, some instructions are designed to update data structures in one single indivisible step. Those are called atomic operations. To properly implement the semantic carried by these low-level primitives, memory barriers are required on some architectures (this is the case for PowerPC and ARMv7, for instance). For the x86 architecture family, these memory barriers are implicit, but a special lock prefix is required before these instructions to synchronize multiprocessor access. However, to diminish the performance overhead of the tracer fast-path, we remove the memory barriers and use atomic operations only synchronized with respect to the local processor, due to their lower overhead compared to operations synchronized across cores. These are the only instructions allowed to modify the per-CPU data, to ensure reentrancy with respect to the NMI context. The main restriction that must be observed when using such operations is to disable preemption around all accesses to these variables, to ensure threads are not migrated from one core to another between the moment the reference is read and the atomic access. This ensures no remote core accesses the variable with SMP-unsafe operations.

The two atomic instructions required are the CAS (Compare-And-Swap) and a simple atomic increment. Figure 5.4 shows the data structures being modified by those local atomic operations. Each per-CPU buffer has a control structure which contains the write count, the read count, and an array of commit count and commit seq counters 2. The commit count counters keep track of the amount of data committed in a sub-buffer using a lightweight increment instruction. The commit seq counters are updated with a concurrency-aware synchronization primitive each time a sub-buffer is filled.

A local CAS is used on the write count to update the counter of reserved buffer space. This operation ensures space reservation is done atomically with respect to other execution contexts running on the same CPU. The atomic add instruction is used to increment the per sub-buffer commit count, which identifies how much information has actually been written in each sub-buffer.

2. The size of this array is the number of sub-buffers.

Figure 5.4 Producer-consumer synchronization

The sub-buffer size and the number of sub-buffers within a buffer are limited to powers of 2 for two reasons. First, using bitwise operations to extract the sub-buffer offset and sub-buffer index is faster than modulo and division operations. The second reason is more subtle: although the CAS operation could detect 32 or 64-bit overflows and deal with them correctly before they happen by resetting to 0, the commit count atomic add will eventually overflow the 32 or 64-bit counters, which adds an inherent power-of-2 modulo that would be problematic if the sub-buffer size were not a power of 2.

On the reader side, the read count is updated using a standard SMP-aware CAS operation. This is required because the reader thread can read sub-buffers from buffers belonging to a remote CPU. It is designed to ensure that a traced workload executed on a very busy CPU can be extracted by other CPUs which have more idle time. Having the reader on a remote CPU requires the SMP-aware CAS. This also allows the writer to push the reader position when the buffer is configured in flight recorder mode. The performance cost of the SMP-aware operation is not critical, because updating the read count is only done once a whole sub-buffer has been read by the consumer, or when the writer needs to push the reader at sub-buffer switch, when a buffer is configured in flight recorder mode.

Concurrency between many reader threads is managed by using a reference count on file open/release, which only lets a single process open the file, and by requiring that the user-space application reads the sub-buffers from only one execution thread at a time. Mutual exclusion of many reader threads is left to the user-space caller, because it must encompass a sequence of multiple system calls. Holding a kernel mutex is not allowed when returning to user-space.
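The following is a minimal C sketch of the reserve/commit fast path built from these primitives, using the Linux local_t API; names and the surrounding logic are simplified for illustration, and preemption is assumed to be disabled by the caller.

#include <asm/local.h>

/* Hedged sketch: local_cmpxchg and local_add are atomic with respect
 * to this CPU only (NMI-safe), with no LOCK prefix or memory barrier
 * on x86. The caller must have disabled preemption so the task cannot
 * migrate between reading the per-CPU reference and the atomic access. */
static void sketch_reserve_commit(local_t *write_count,
                                  local_t *commit_count, long slot_size)
{
        long old, new;

        do {
                old = local_read(write_count);
                new = old + slot_size;
        } while (local_cmpxchg(write_count, old, new) != old);

        /* ... write the event data into the reserved slot ... */

        local_add(slot_size, commit_count);     /* cheap local increment */
}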

5.4.2 Equations

This section presents the equations determining the buffer state. These are used by the algorithms presented in Section 5.4.3.

These equations extensively use modulo arithmetic to account for physical counter overflows. On 64-bit architectures, equations are in modulo 2^64 arithmetic. On 32-bit architectures, they are modulo 2^32.

We first define the following basic operations. Let's define
– |x| as the length of x,
– a mod b as the modulo operation (remainder of a/b),
– M^n_m(x) as x bitwise ANDed with a mask consisting of ones at bit positions m to n − 1 and zeros elsewhere, formally:
\[
M^{n}_{m}(x) = (x \bmod 2^{n}) - (x \bmod 2^{m}).
\]

We define the following constants. Let
– |sbuf| be the size of a sub-buffer (a power of 2),
– |buf| be the size of a buffer (a power of 2),

– sbfbits = log2(|sbuf|),

– bfbits = log2(|buf|),
– nsbbits = bfbits − sbfbits,
– wbits be the architecture word size in bits (32 or 64).

We have the following variables. Let
– wcnt be the write counter mod 2^wbits,
– rcnt be the read counter mod 2^wbits,
– wcommit be the commit counter commit seq mod 2^wbits belonging to the sub-buffer where wcnt is located,
– rcommit be the commit counter commit seq mod 2^wbits belonging to the sub-buffer where rcnt is located.

Less than one complete sub-buffer is available for writing when Equation 5.1 is satisfied. It verifies that the difference between the number of sub-buffers produced and the number of sub-buffers consumed in the ring buffer is greater than or equal to the number of sub-buffers per buffer. If this equation is satisfied at buffer switch, it means the buffer is full.

\[
M^{wbits}_{sbfbits}(wcnt) - M^{wbits}_{sbfbits}(rcnt) \geq |buf| \tag{5.1}
\]

Write counter and read counter masks are illustrated by Figure 5.5. These masks are applied to wcnt and rcnt.

A buffer contains at least one sub-buffer ready to read when Equation 5.2 is satisfied. The left side of this equation takes the number of buffers reserved so far, masks out the current buffer offset and divides the result by the number of sub-buffers per buffer. This division ensures the left side of the equation represents the number of sub-buffers reserved. The right side of this equation takes the commit count to which rcnt points and subtracts |sbuf| from it. It is masked to clear the top bits, which ensures both sides of the equation overflow at the same value. This is required

Figure 5.5 Write and read counter masks

because rcnt reaches a 2^wbits overflow 2^nsbbits times (the number of sub-buffers per buffer) more often than the per-subbuffer rcommit counters. |sbuf| is subtracted from rcommit because we need to know when the commit seq is one whole sub-buffer ahead of the read count.

\[
\frac{M^{wbits}_{bfbits}(rcnt)}{2^{nsbbits}} = M^{wbits-nsbbits}_{0}(rcommit - |sbuf|) \tag{5.2}
\]

The sub-buffer corresponding to wcnt is in a fully committed state when Equation 5.3 is satisfied. Its negation is used to detect a situation where an amount of data sufficient to overflow the buffer is written by concurrent execution contexts running between a reserve-commit pair.

\[
\frac{M^{wbits}_{bfbits}(wcnt)}{2^{nsbbits}} = M^{wbits-nsbbits}_{0}(wcommit) \tag{5.3}
\]

Commit counter masks are illustrated by Figure 5.6. These masks are applied to rcommit and wcommit.

The sub-buffer corresponding to rcnt is being written when Equation 5.4 is satisfied. It verifies that the number of sub-buffers produced and consumed are equal.

\[
M^{wbits}_{sbfbits}(wcnt) = M^{wbits}_{sbfbits}(rcnt) \tag{5.4}
\]

Figure 5.6 Commit counter masks
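To make these predicates concrete, here is a small user-space C sketch of the mask operator and the Equation 5.1 and 5.4 checks for a 64-bit word size; it illustrates the arithmetic only and is not the tracer's implementation.

#include <stdint.h>

/* M^n_m(x): keep bits m (inclusive) to n (exclusive), clear the rest.
 * n == 64 keeps all bits at or above m. */
static inline uint64_t M(uint64_t x, unsigned n, unsigned m)
{
        uint64_t xn = (n < 64) ? (x & ((UINT64_C(1) << n) - 1)) : x;
        uint64_t xm = x & ((UINT64_C(1) << m) - 1);
        return xn - xm;
}

/* Eqn. 5.1: less than one complete sub-buffer left for writing
 * (wbits = 64, so the subtraction wraps naturally modulo 2^64). */
static inline int buffer_full(uint64_t wcnt, uint64_t rcnt,
                              unsigned sbfbits, uint64_t buf_size)
{
        return M(wcnt, 64, sbfbits) - M(rcnt, 64, sbfbits) >= buf_size;
}

/* Eqn. 5.4: reader and writer are currently in the same sub-buffer. */
static inline int same_subbuf(uint64_t wcnt, uint64_t rcnt, unsigned sbfbits)
{
        return M(wcnt, 64, sbfbits) == M(rcnt, 64, sbfbits);
}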

5.4.3 Algorithms

This section presents the algorithms used to synchronize the producer and consumer, followed by a presentation of the asynchronous buffer delivery algorithm.

Producer

This section presents the algorithms used by the information producer, the probe, to synchronize its slot reservation within the channels.

The overall call graph presented in this section can be summarized as follows. When an event is to be written, space is reserved by calling ReserveSlot(), which calls TryReserveSlot() in a loop until it succeeds. Then, PushReader(), SwitchOldSubbuf(), SwitchNewSubbuf() and EndSwitchCurrent() (not expanded in this paper for brevity) are executed out-of-order to deal with sub-buffer changes. After the event data is written to the slot, CommitSlot() is called to increment the commit counter.

The write count and read count variables have the largest size accessible atomically by the architecture, typically 32 or 64 bits. Since, by design, the sub-buffer size and the number of sub-buffers within a buffer are powers of two, a LSB (Least Significant Bits) mask can be used on those counters to extract the offset within the buffer. The MSBs (Most Significant Bits) are used to detect the improbable occurrence of a complete buffer wrap-around nested on top of the local CAS loop in flight recorder mode. Such an overflow, if undetected, could cause a timestamp to go backward in a buffer. Such a wrap-around could happen if many interrupts nest back-to-back on top of a CAS loop. A worst-case scenario would be to have back-to-back nested interrupts generating enough data to fill the buffer (typically 2 MiB in size) and bring the write count back to the same offset in the buffer. The CAS loop uses the most significant counter bits to detect this situation. On 32-bit architectures, it permits detecting counter overflows up to 4 GiB worth of buffer data. On 64-bit architectures, it detects up to 16.8 million TiB worth of data written while nested over a CAS loop execution. Given that this amount of trace data would have to be generated by interrupt handlers continuously interrupting the probe, we would consider an operating system facing such an interrupt rate to be unusable. As an example of existing code with similar assumptions, the Linux kernel sequence lock, used to synchronize the time-base, is made of a sequence counter also subject to overflow.

Slot reservation, presented in TryReserveSlot() and ReserveSlot(), is performed as follows. From a high-level perspective, the producer depends on the read count and write count difference to know if space is still available in the buffers. If no space is available in non-overwrite mode, the event lost count is incremented and the event is discarded. In flight recorder mode, the next sub-buffer is overwritten by pushing the reader. The variables write count, read count and the commit seq array are used to keep track of the respective positions of the writer and the reader gracefully with respect to counter overflow. Equations 5.1, 5.2, 5.3 and 5.4 are used to verify the state of the buffer.

The write count is updated atomically by the producer to reserve space in the sub-buffer. In order to apply monotonically increasing time stamps to events which are physically consecutive in the buffer, the time stamp is read within the CAS loop. This ensures that no space reservation succeeds between the time-stamp register read and the atomic space reservation, and therefore ensures that a successful buffer-space reservation and time-stamp read are indivisible from one another from a CPU's perspective.
Such mechanisms, which make many instructions appear to execute atomically, are however limited to operations without side-effects outside of the variables located on the stack or in registers, which can be re-executed upon failure; the exception is the single CAS operation, which has side-effects when it succeeds. They are therefore mostly limited to read operations and the computation of the required slot size for the event. Once space is reserved, the remaining operations are done out-of-order. This

Algorithm: TryReserveSlot(payload size)
Require: An integer payload size ≥ 0.
1: Read write count
2: Read time-stamp counter
3: Calculate required slot size
4: Calculate slot offset
5: if slot offset is at beginning of sub-buffer then
6:   if Negation of Eqn. 5.3 then
7:     Increment event lost count
8:     slot size = FAIL
9:     return slot size
10:   end if
11:   if Eqn. 5.1 (in non-overwrite mode) then
12:     Increment event lost count
13:     slot size = FAIL
14:     return slot size
15:   end if
16: end if
17: Update buffer switch flags
18: return <slot size, slot offset, buffer switch flags>

Algorithm 5.1 TryReserveSlot(payload size)

means that if an interrupt nests over a probe, it will reserve a buffer slot next to the one being written to by the interrupted thread, will write its event data in its own reserved slot and will atomically increment the commit count before returning to the previous probe stack. When a slot has been completely written to, the CommitSlot() algorithm is used to update the commit count. It is also responsible for clearing the sub-buffer reference flag if the sub-buffer is filled and for updating commit seq. There is one commit seq per sub-buffer. It also increments forever, in the same way the write count does, with the difference that it only counts the per-subbuffer bytes committed rather than the number of bytes reserved for the whole buffer. The difference between the write count MSBs divided by the number of sub-buffers and the commit seq MSBs (with the highest bits corresponding to the number of sub-buffers set to zero) indicates whether the commit count LSBs represent an empty, partially

3. The compiler barrier will be promoted to a write memory barrier by an interprocessor interrupt sent by the read-side ReadGetSubbuf(), as explained thoroughly in Section 5.4.4.

Algorithm: ReserveSlot(payload size)
Require: An integer payload size ≥ 0
Ensure: slot offset is the only reference to the slot during all the reserve and commit process, the slot is reserved atomically, time stamps of physically consecutive slots are always incrementing.
1: repeat
2:   <slot size, slot offset, buffer switch flags> = TryReserveSlot(payload size)
3:   if slot size = FAIL then
4:     return FAIL
5:   end if
6: until CAS of write count succeeds
7: PushReader()
8: Set reference flag in pointer to current sub-buffer. Indicates that the writer is using this sub-buffer.
9: SwitchOldSubbuf()
10: SwitchNewSubbuf()
11: EndSwitchCurrent()
12: return

Algorithm 5.2 ReserveSlot(payload size)

Algorithm: CommitSlot(slot size, slot offset)
Require: An integer slot size > 0 and the slot offset
1: Compiler barrier 3
2: Issue local add() to increment commit count of slot size
3: if Eqn. 5.3 then
4:   commit seq old = commit seq
5:   while commit seq old < commit count do
6:     try CAS of commit seq. Expect commit seq old, new value written is commit count. Save value read to commit seq old.
7:   end while
8: end if

Algorithm 5.3 CommitSlot(slot size, slot offset)

or completely full sub-buffer.

As shown at the end of ReserveSlot(), switching between sub-buffers is done out-of-order. It consists of two phases: the first detects, within the CAS loop, whether a buffer switch is needed. If this is the case, flags are set on the probe stack to make the out-of-order code following the loop increment the sub-buffer commit counts of the sub-buffer we are switching out from and of the sub-buffer we are switching into. The sub-buffer switched out from will therefore have its commit count incremented by the missing amount of bytes between the number of bytes reserved (and thus monotonically incrementing) and the sub-buffer size. Switching to a new sub-buffer adds the new sub-buffer header's size to the new sub-buffer's commit count. Another case is also possible, namely when there is exactly enough event data to fit perfectly in the sub-buffer. In this case, an end switch current flag is raised so the header information is finalized. All these buffer switching cases also populate the sub-buffer headers with information regarding the current time stamp and the padding size at the end of the sub-buffer, prior to incrementing the commit count. SwitchOldSubbuf(), SwitchNewSubbuf() and EndSwitchCurrent() are therefore responsible for incrementing the commit count by the amount of padding added at the end of a sub-buffer, clearing the reference flag when the sub-buffer is filled and updating commit seq.

Pushing a reader, represented by PushReader(), is done by a writer in flight recorder mode when it detects that the buffer is full. In that case, the writer sets the read count to the beginning of the following sub-buffer.

Flushing the buffers while tracing is active, as done by the pseudo-code presented in Algorithm ForceSwitch(), is required to permit streaming of information with a bounded latency between the time events are written in the buffers and event delivery to user-space. It is a special case of normal space reservation which does not reserve space in the sub-buffer, but forces a buffer switch if the current sub-buffer is non-empty. Buffer switch is called from a periodical timer, configurable by the user to select how often buffer data must be flushed.
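The commit fast path of Algorithm 5.3 maps naturally onto the local_t API. Below is a hedged C sketch of it, assuming the caller passes whether Equation 5.3 detected a filled sub-buffer; the function and variable names are illustrative.

#include <linux/compiler.h>
#include <asm/local.h>

static void sketch_commit_slot(local_t *commit_count, local_t *commit_seq,
                               long slot_size, int subbuf_filled)
{
        long count, seq_old, ret;

        barrier();      /* compiler barrier: event data stores stay before
                         * the commit count update in program order; it is
                         * promoted to a full barrier by the reader's IPI
                         * (Section 5.4.4) */
        local_add(slot_size, commit_count);

        if (!subbuf_filled)     /* Eqn. 5.3 not satisfied: nothing more to do */
                return;

        /* sub-buffer filled: push commit_seq forward to the commit count */
        count = local_read(commit_count);
        seq_old = local_read(commit_seq);
        while (seq_old < count) {
                ret = local_cmpxchg(commit_seq, seq_old, count);
                if (ret == seq_old)
                        break;          /* we advanced commit_seq */
                seq_old = ret;          /* someone else advanced it; re-check */
        }
}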

Consumer

The consumer, lttd, uses two system calls, poll() and ioctl(), to control the interaction with the memory buffers, and splice() as a means to extract the buffers to disk or to the network without extra copy.

Algorithm: ForceSwitch()
Ensure: Buffer switch is done if sub-buffer contains data

1: repeat
2:   Calculate the commit count needed to fill the current sub-buffer.
3: until CAS of write count succeeds
4: PushReader()
5: Set reference flag in pointer to current sub-buffer. Indicates that the writer is using this sub-buffer.
6: SwitchOldSubbuf()
7: SwitchNewSubbuf()

Algorithm 5.4 ForceSwitch()

At kernel-level, we specialize those three system calls for the virtual files presented by DebugFS. The daemon waits for incoming data using poll(). This system call waits to be woken up by the timer interrupt (see the AsyncWakeupReadersTimer() pseudo-code in Algorithm 5.8). Once data is ready, it returns the poll priority to user-space. If the tracer is currently writing in the last available sub-buffer of the buffer, a high priority is returned. Pseudo-code ReadPoll() summarizes the actions taken by the poll() system call.

Once control has returned to user-space from the poll() system call, the daemon takes a user-space mutex on the buffer and uses the ioctl() system call to perform buffer locking operations. Its implementation uses the ReadGetSubbuf() and ReadPutSubbuf() algorithms. The former operation, detailed in Algorithm 5.6, reserves a sub-buffer for the reader and returns the read count. If the lower-level buffer writing scheme allowed concurrent accesses to the reserved sub-buffer between the reader and the writer, this value could be used to verify, in the ReadPutSubbuf() operation detailed in Algorithm 5.7, that the reader has not been pushed by a writer dealing with buffers in flight recorder mode. However, as we present below, this precaution is unnecessary because the underlying buffer structure does not allow such concurrency.

The specialized ioctl() operation is responsible for synchronizing the reader with the writer's buffer-space reservation and commit. It is also responsible for making sure the sub-buffer is made private to the reader, to eliminate any possible race in flight recorder mode. This is achieved by adding a supplementary sub-buffer, owned by the reader. A table with pointers to the sub-buffers being used by the writer

Algorithm: ReadPoll()
Ensure: Returns buffer readability state and priority
1: Wait on read wait wait queue.
2: if Eqn. 5.4 then
3:   if Sub-buffer is finalized (freed by the tracer) then
4:     Hang up.
5:     return POLLHUP
6:   else
7:     No information to read.
8:     return OK
9:   end if
10: else
11:   if Eqn. 5.1 then
12:     High-priority read.
13:     return POLLPRI
14:   else
15:     Normal read.
16:     return POLLIN
17:   end if
18: end if

Algorithm 5.5 ReadPoll()

allows the reader to change the reference to each sub-buffer atomically. The Read- GetSubbuf() algorithm is responsible for atomically exchanging the reference to the sub-buffer about to be read with the sub-buffer currently owned by the reader. If the CAS operation fails, the reader does not get access to the buffer for reading. Given that sub-buffer management data structures are aligned on 4 or 8-bytes multiples, we can use the lowest bit of the sub-buffer pointer to encode whether it is actively referenced by the writer. This ensures that the pointer exchange performed by the reader can never succeed when the writer is actively using the reference to write to a sub-buffer about to be exchanged by the reader.

Asynchronous buffer delivery

Because the probe cannot interact directly with the rest of the kernel, it cannot call the scheduler to wake up the consumer. Instead, this ready-to-read sub-buffer delivery is done asynchronously by a timer interrupt. This interrupt checks if each buffer

Algorithm: ReadGetSubbuf()
Ensure: Take exclusive reader access to a sub-buffer.
1: Read read count.
2: Read the commit seq corresponding to the read count.
3: Issue a smp mb() (Memory Barrier on multiprocessor) to ensure the commit seq read is globally visible before sending the IPI (Interprocessor Interrupt).
4: Send IPI to target writer CPU (if it differs from the local reader CPU) to issue a smp mb(). This ensures that data written to the buffer and the write count update are globally visible before the commit seq write. Wait for IPI completion.
5: Issue a smp mb() to ensure the reserve count and buffer data reads are not reordered before IPI execution.
6: Read reserve count.
7: if Negation of Eqn. 5.2 then
8:   return EAGAIN
9: end if
10: if Eqn. 5.4 (Only flight recorder) then
11:   return EAGAIN
12: end if
13: if Writer is holding a reference to the sub-buffer about to be exchanged ∨ Exchange of reader/writer sub-buffer reference fails then
14:   return EAGAIN
15: end if
16: return read count

Algorithm 5.6 ReadGetSubbuf()

Algorithm: ReadPutSubbuf(arg read count)
Require: read count returned by ReadGetSubbuf() (arg read count).
Ensure: Release exclusive reader access from a sub-buffer. Always succeeds, even if the writer pushed the reader, because the reader had exclusive sub-buffer access.

1: new read count = arg read count + subbuffer size.
2: CAS expects arg read count, replaces with new read count
3: return OK

Algorithm 5.7 ReadPutSubbuf(arg read count)

contains a filled sub-buffer and wakes up the readers waiting in the read wait queue associated with each buffer accordingly. This mechanism is detailed in Algorithm 5.8.

5.4.4 Memory Barriers

Although LTTng mostly keeps data local to each CPU, cross-CPU synchronization is still required at three sites:
– At initial time-stamp counter synchronization, done at boot-time by the operating system. This heavy synchronization, if not done by the BIOS (Basic Input/Output System), requires full control of the system.
– When the producer finishes writing to a sub-buffer, making it available for reading by a thread running on an arbitrary CPU. This involves using the proper memory barriers, ensuring that all written data is committed to memory before another CPU starts reading the buffer.
– At consumed data counter update, involving the appropriate memory barriers, ensuring the data has been fully read before making the buffer available for writing.

The two points at which a sub-buffer can pass from one CPU to another are when it is exchanged between the producer and the consumer, and when it goes back from the consumer to the producer, because the consumer may run on a different CPU than the producer. Care must therefore be taken to ensure correct memory ordering between the buffer management variables and the buffer data writes.

The condition which makes a sub-buffer ready for reading is represented by Eqn. 5.2, which depends on the read count and the commit seq counter corresponding to the read count. Therefore, before incrementing the sub-buffer commit seq, a write memory barrier must be issued on SMP systems allowing out-of-order memory writes, to ensure the buffer data is written before the commit seq is updated. On the read side, before reading the commit seq, a read memory barrier must be issued on SMP. It ensures correct read ordering of counter and buffer data.

LTTng buffering uses an optimization over the classic memory barrier model. Instead of executing a write memory barrier before each commit seq update, a simple compiler optimization barrier is used to make sure the data written to the buffer and the commit seq update happen in program order with respect to local interrupts. Given that the write order is only needed when the read-side code needs to check the buffer's commit

Algorithm: AsyncWakeupReadersTimer()
Ensure: Wake up readers for full sub-buffers
1: for all Buffers do
2:   if Eqn. 5.2 then
3:     Wake up consumers waiting on the buffer read wait queue.
4:   end if
5: end for

Algorithm 5.8 AsyncWakeupReadersTimer()

seq value, Algorithm 5.6 shows how the read side sends an IPI to execute a memory barrier on the target CPU, between two memory barriers on the local CPU, to ensure that memory ordering is met when the sub-buffer is passed from the writer to the reader. This IPI scheme promotes the compiler barrier to a memory barrier each time the reader needs to issue a memory barrier. Given that the reader needs to issue such a barrier only once per sub-buffer switch, compared to a write memory barrier once per event, this improves performance by removing a barrier from the fast path, at the added cost of an extra IPI at each sub-buffer switch, which happens relatively rarely. With an average event size of 8 bytes and a typical sub-buffer size of 1 MiB, the ratio is one sub-buffer switch every 131,072 events. Given that an IPI executing a write memory barrier on an Intel Core2 Xeon 2.0 GHz takes about 2500 cycles and that a local write memory barrier takes 8 cycles, memory barrier synchronization speed is increased by a factor of 419 to 1.

When the buffer is given back to the producer, a synchronized CAS is used to update the read count, which implies a full memory barrier before and after the instruction. The CAS ensures the buffer data is read before the read count is updated. Given that the writer does not have to read any data from the buffer and depends on reading the read count value to check if the buffer is full (in non-overwrite mode), only the read count is shared. The control dependency between the test performed on the read count and the write to the buffer ensures the writer never writes to the buffer before the reader has finished reading from it.
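The read-side IPI scheme (steps 3 to 5 of Algorithm 5.6) can be sketched in C with the mainline smp_call_function_single() API; this is an illustration of the idea, not the LTTng code.

#include <linux/smp.h>

static void remote_mb(void *info)
{
        smp_mb();       /* executed on the writer's CPU */
}

static void reader_order_commit_seq(int writer_cpu)
{
        smp_mb();       /* make local reads visible before the IPI */
        /* run remote_mb() on writer_cpu and wait for completion; this
         * promotes the writer's compiler barrier to a memory barrier */
        smp_call_function_single(writer_cpu, remote_mb, NULL, 1);
        smp_mb();       /* order subsequent reserve count and buffer reads */
}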

5.4.5 Buffer allocation

The lockless buffer management algorithm found in LTTng allows dealing with concurrent write accesses to segments of a circular buffer (slots) of variable length. This concurrency management algorithm does not impose any requirement on the nature of the memory backend which holds the buffers. The present section exposes the primary memory backend supported by LTTng, as well as the backends planned for support in future versions.

The primary memory backend used by LTTng is a set of memory pages allocated by the operating system's page allocator. Those pages are not required to be physically contiguous. This ensures that page allocation is still possible even if memory is fragmented. There is no need to have any virtually contiguous address mapping, which is preferable given that there is a limited amount of kernel-addressable virtual address space (especially on 32-bit systems). These pages are accessed through a single-level page table which performs the translation from a linear address mapping (offset within the buffer) to a physical page address. The buffer read(), write() and splice() primitives abstract the non-contiguous nature of the underlying memory layout by providing an API which presents the buffer as a virtually contiguous address space.

LTTng buffers are exported to user-space through the DebugFS file system. It presents the LTTng buffers as a set of virtual files to user applications and allows interacting with those files using the open(), close(), poll(), ioctl() and splice() system calls.

LTTng includes a replacement of RelayFS aimed at efficient zero-copy data extraction from buffer to disk or to the network using the splice() system call. Earlier LTTng implementations, using RelayFS, were based on mapping the buffers into user-space memory to perform data extraction. However, this comes at the expense of wasting precious TLB entries usually available for other uses. The current LTTng implementation uses the splice() system call. Its usage requires creating a pipe. A splice() system call, implemented specifically to read the buffer virtual files, is used to populate the pipe source with specific memory pages. In this case, the parts of the buffer to copy are selected. Then, a second splice() system call (the standard pipe implementation) is used to send the pages to the output file descriptor, which targets either a file on disk or a network socket.

Separating the buffer-space management algorithm from the memory backend support eases the implementation of specialized memory backends, depending on the

requirements:
– Discontiguous page allocation (presented above; see the sketch after this list) requires adding a software single-level page table, but permits allocation of buffers at run-time when memory is fragmented.
– Early boot-time page allocation of large contiguous memory areas requires low memory fragmentation, but permits faster buffer page access because it does not need any software page-table indirection.
– A video memory backend can be used by reserving video memory for trace buffers. It allows trace data to survive hot reboots, which is useful to deal with kernel crashes.
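The following C sketch illustrates the single-level page-table translation from a linear buffer offset to a page address; struct buf_page_table and the function name are illustrative, not the LTTng implementation.

#include <linux/mm.h>

struct buf_page_table {
        struct page **pages;    /* one entry per buffer page */
        size_t n_pages;
};

static void *buf_offset_to_addr(struct buf_page_table *pt, size_t offset)
{
        /* high bits index the software page table, low bits give the
         * offset within the page; offset must be below the buffer size
         * (i.e. offset >> PAGE_SHIFT < n_pages) */
        struct page *p = pt->pages[offset >> PAGE_SHIFT];
        return page_address(p) + (offset & ~PAGE_MASK);
}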

5.5 Experimental results

This section presents the experimental results of the design implementation under various workloads, and compares them with alternative existing technologies.

5.5.1 Methodology

To present the tracer performance characteristics, we first present the overhead of the LTTng tracer for various types of workloads on various types of systems. Then, we compare this overhead to existing state-of-the-art approaches.

The probe CPU-cycles benchmarks, presented in Section 5.5.2, demonstrate the LTTng probe overhead in an ideal scenario, where the data and instructions are already in cache.

Then, benchmarks representing the real-life workloads tbench and dbench simulate the load of a Samba server for network traffic and for disk traffic, respectively. A tbench test on the loopback interface shows the worst-case scenario of 8 client and 8 server tbench threads heavily using a traced kernel. Scalability of the tracer when the number of cores increases is tested on the heavy loopback tbench workload.

Yet another set of benchmarks uses lmbench to individually test tracing overhead on various kernel primitives, mainly system calls and traps, to show the performance impact of active tracing on those important system components. Finally, a set of benchmarks runs a compilation of the Linux kernel 2.6.30 with and without tracing to produce a CPU-intensive workload.

Probe CPU-cycles overhead benchmarks are performed on a range of architectures. Unless specified otherwise, benchmarks are done on an Intel Core2 Xeon E5405 running at 2.0 GHz with 16 GiB of RAM. Tests are executed on a 2.6.30 Linux kernel with full kernel preemption enabled. The buffer configuration used for high event-rate channels is typically two 1 MiB sub-buffers, except for block I/O events, where per-CPU buffers of eight 1 MiB sub-buffers are used.

5.5.2 Probe CPU-cycles overhead

This test measures the cycle overhead added by an LTTng probe. This provides us with a per-event overhead lower bound. This is considered a lower bound because this test is performed in a tight loop, therefore favoring cache locality. In standard tracer execution, the kernel usually trashes part of the data and instruction caches between probe executions.

The number of cycles consumed by calling a probe from a static instrumentation site passing two arguments, a long and a pointer, on Intel Pentium 4, AMD Athlon, Intel Core2 Xeon and ARMv7 is presented in Table 5.1. These benchmarks are done in kernel-space, with interrupts disabled, sampling the CPU time-stamp counter before and after 20,000 loops of the tested case. Given that one local CAS is needed to synchronize the tracing space reservation, based on the results published in [1], we can see that disabling interrupts instead of using the local CAS would add 34 cycles to these probes on Intel Core2, for an expected 14.3% slowdown. Therefore, not only is it interesting to use local atomic operations to protect against non-maskable interrupts, but it also marginally improves performance. Changing the implementation to disable interrupts instead of using the local CAS confirms this: probe execution overhead increases from 240 to 256 cycles, for a 6.6% slowdown.
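The measurement methodology just described can be sketched as follows in kernel-context C for x86; rdtscll() is the historical TSC-read macro of that kernel era, and probe_under_test() is an illustrative stand-in for the traced call.

#include <linux/irqflags.h>
#include <asm/msr.h>

static unsigned long long bench_probe(void (*probe_under_test)(void))
{
        unsigned long long t0, t1;
        unsigned long flags;
        int i;

        local_irq_save(flags);          /* keep interrupts out of the loop */
        rdtscll(t0);
        for (i = 0; i < 20000; i++)
                probe_under_test();
        rdtscll(t1);
        local_irq_restore(flags);
        return (t1 - t0) / 20000;       /* average cycles per probe call */
}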

Table 5.1 Cycles taken to execute an LTTng 0.140 probe, Linux 2.6.30

Architecture        Cycles   Core freq. (GHz)   Time (ns)
Intel Pentium 4       545    3.0                  182
AMD Athlon64 X2       628    2.0                  314
Intel Core2 Xeon      238    2.0                  119
ARMv7 OMAP3           507    0.5                 1014

5.5.3 tbench

The tbench benchmark tests the throughput achieved by the network traffic portion of a simulated Samba file server workload. Given that it generates network traffic from data located in memory, it results in very low I/O and user-space CPU time consumption, and very heavy kernel network layer use. We therefore use this test to measure the overhead of tracing on network workloads. We compare network throughputs when running the mainline Linux kernel, the instrumented kernel and the traced kernel.

This set of benchmarks, presented in Table 5.2, shows that tracing has very little impact on the overall performance under network load on a 100 Mb/s network card. 8 tbench client threads are executed for a 120 s warm-up and 600 s test execution. Trace data generated in flight recorder mode reaches 0.9 GiB, for a 1.33 MiB/s trace data throughput. Data gathered in normal tracing to disk reaches 1.1 GiB. The supplementary data generated when writing trace data to disk is explained by the fact that we also trace disk activity, which generates additional events. This very small performance impact can be explained by the fact that the system was mostly idle.

Now, given that currently existing 1 Gb/s and 10 Gb/s network cards can generate higher throughput, and given that the 100 Mb/s link was the bottleneck of the previous tbench test, Table 5.3 shows the added tracer overhead when tracing tbench running with both server and client on the loopback interface of the same machine, which is a worst-case scenario in terms of generated throughput kernel-wise. This workload consists of running 8 client threads and 8 server threads.

The kernel instrumentation, when compiled-in but not enabled, actually accelerates the kernel. This can be attributed to the modification of instruction and data cache layout. Flight recorder tracing stores 92 GiB of trace data to memory, which

Table 5.2 tbench client network throughput tracing overhead

Test                          Tbench Throughput   Overhead   Trace Throughput
                              (MiB/s)             (%)        (×10^3 events/s)
Mainline Linux kernel         12.45               0          –
Dormant instrumentation       12.56               0          –
Overwrite (flight recorder)   12.49               0          104
Normal tracing to disk        12.44               0          107

Table 5.3 tbench localhost client/server throughput tracing overhead

Test                          Tbench Throughput   Overhead   Trace Throughput
                              (MiB/s)             (%)        (×10^3 events/s)
Mainline Linux kernel         2036.4              0          –
Dormant instrumentation       2047.1              -1         –
Overwrite (flight recorder)   1474.0              28         9768
Normal tracing to disk        –                   –          –

represents a trace throughput of 130.9 MiB/s for the overall 8 cores. Tracing adds a 28% overhead on this workload. Needless to say, trying to export such a throughput to disk would cause a significant proportion of events to be dropped. This is why tracing to disk is excluded from this table.

5.5.4 Scalability

To characterize the tracer overhead when the number of CPUs increases, we need to study a scalable workload where the tracing overhead is significant. The localhost tbench test exhibits these characteristics. Figure 5.7 presents the impact of flight recorder tracing on the tbench localhost workload on the same setup used for Table 5.3. The number of active processors varies from 1 to 8, together with the number of tbench threads. We notice that the tbench workload itself scales linearly in the absence of tracing. When tracing is added, linear scalability is preserved: the overhead progresses linearly as the number of processors increases. Therefore, tracing with LTTng adds a constant per-processor overhead independent of the number of processors in the system.

5.5.5 dbench

The dbench test simulates the disk I/O portion of a Samba file server. The goal of this benchmark is to show the tracer impact on such a workload, especially for non-overwrite tracing to disk.

This set of benchmarks, presented in Table 5.4, shows the tracing overhead on an 8-thread dbench workload. Tracing in flight recorder mode causes a 3% slowdown on disk throughput while generating 30.2 GiB of trace data into memory buffers. Normal tracing to disk causes a 35% slowdown on heavy disk operations, but lack of

[Plot: tbench throughput (MB/s) as a function of the number of cores (1 to 8), with and without tracing.]

Figure 5.7 Impact of tracing overhead on localhost tbench workload scalability

disk bandwidth causes a significant portion of trace events to be discarded.

Analysis of the buffer state in flight recorder mode shows that 30.2 GiB worth of data was generated in 720 seconds, for a sustained trace throughput of 43.0 MiB/s. In non-overwrite mode, the trace is written to the same disk dbench is using. The tracing throughput is therefore significant compared to the available disk bandwidth.

Table 5.4 dbench disk write throughput tracing overhead

Test                            Dbench Throughput   Overhead   Trace Throughput
                                (MiB/s)             (%)        (×10^3 events/s)
Mainline Linux kernel           1334.2              0          –
Dormant instrumentation         1373.2              -2         –
Overwrite (flight recorder)     1297.0              3          2840
Non-overwrite tracing to disk   872.0               35         2562

Unsurprisingly, only 23 GiB of trace data was collected to disk in the non-overwrite trace, with a total of 21.8 million events lost. This trace size difference is caused both by the events lost (only about 244 MiB of data, given an average event size of 12 bytes) and, mostly, by the behavior change generated by the added disk I/O activity for tracing. While the system is busy writing large chunks of trace data, it is not available to process smaller and more frequent dbench requests. This nicely shows how the tracer, in non-overwrite mode, can affect disk throughput in I/O-heavy workloads.

5.5.6 lmbench

The lmbench test benchmarks various kernel primitives by executing them in loops. We use this test to measure the tracer overhead on a per-primitive basis. Running lmbench on the mainline Linux kernel and on the flight recorder and non-overwrite tracing kernels helps in understanding the performance deterioration caused by tracing. When running on an Intel Core2 Xeon E5405, the standard lmbench 3.0 OS test generates 5.41 GiB of trace data with the default LTTng instrumentation in 6 minutes, for a throughput of 150 MiB/s. When writing to disk, the total trace size reaches 5.5 GiB due to the added traced disk I/O overhead.

The “simple system call” test, which calls a system call with a small execution time in a tight loop, takes 0.1752 µs on the mainline Linux kernel. Compared to this, it takes 0.6057 µs on the flight recorder mode traced kernel. In fact, the benchmarks for flight recorder tracing and disk tracing are very similar, because the only difference is the CPU time taken by the lttd daemon and the added disk I/O. The “simple system call” slowdown is explained by the fact that two sites are instrumented: system call entry and system call exit. Based on measurements from Table 5.1, we would expect each event to add at least 0.119 µs to the system call. In reality, they each add 0.215 µs to the system call execution. The reasons for this additional slowdown are that supplementary registers must be saved in the system call entry and exit paths, and cache effects. The register overhead is the same as for the well-known ptrace() debugger interface, secure computing and process accounting, because these and LTTng all share a common infrastructure to extract these registers.

Some system calls have more specific instrumentation in their execution path. For instance, the file name is extracted from the open() system call; the file descriptor and size are extracted from the read() system call. The performance degradation is directly related to the number of probes executed. For the read() system call, the mainline Linux kernel takes 0.2138 µs, while the flight recorder tracing kernel takes 0.8043 µs. Removing the “simple system call” tracing overhead leaves 0.1600 µs, which corresponds to the added event in the read() system call.

The page fault handler, a frequently executed kernel code path, is instrumented with two tracepoints. It is very important due to the frequency at which it is called during standard operation. On workloads involving many short-lived processes, page faults, caused by copy-on-write, account for an important fraction of execution time (4% of a Linux kernel build). It runs in 1.3512 µs on the mainline Linux kernel and takes 1.6433 µs with flight recorder tracing activated. This includes 0.146 µs for each instrumentation site, which is close to the expected 0.119 µs per event. Non-cached memory accesses and branch prediction buffer pollution are possible causes for such a small execution time variation from the expected results.

Instrumentation of such frequently executed kernel code paths is the reason why minimizing probe execution time is critical to the tracer's usability on heavy workloads.

Other lmbench results show that some instrumented code paths suffer from greater overhead. This is mostly due to the use of a less efficient dynamic format-string parsing method to write the events into the trace buffers. For instance, the “Process fork+exit” test takes 211.5 µs to execute with tracing instead of 177.8 µs, for an added overhead of 33.7 µs for each entry/exit pair.
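For clarity, the per-event overhead quoted above for the “simple system call” test follows directly from the two instrumented sites (entry and exit):

\[
\frac{0.6057\ \mu s - 0.1752\ \mu s}{2} \approx 0.215\ \mu s \ \text{per event.}
\]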
Based on execution trace analysis of standard workloads, as of LTTng 0.140, events corresponding to process creation and destruction were not considered frequent compared to page faults, system calls, interrupts and scheduler activity. If this becomes a concern, the optimized statically-compiled version of the event serializer could be used.
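As a cross-check of the figures quoted in this section, the per-event cost attributed to the read() instrumentation follows from subtracting the "simple system call" tracing overhead from the total read() slowdown:

    (0.8043 − 0.2138) − (0.6057 − 0.1752) = 0.5905 − 0.4305 = 0.1600 µs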

5.5.7 gcc

The gcc compilation test aims at showing the tracer impact on a workload where most of the CPU time is spent in user-space, but where many short-lived processes are created. Building the Linux kernel tree is such a scenario: make creates one short-lived gcc instance per file to compile. This test therefore mostly shows the tracer impact on process creation.

This includes the page fault handler instrumentation impact, due to copy-on-write and lazy page population mechanisms when processes are created and when executables are loaded. It also includes the instrumentation of scheduler activity and process state changes. Table 5.5 presents the time taken to build the Linux kernel with gcc. This test is performed after a prior cache-priming compilation; therefore, all the kernel sources are located in cache. Tracing the kernel in flight recorder mode, with the default LTTng instrumentation, while compiling the Linux kernel generates 1.1 GiB of trace data for a 3% slowdown. The results show, without surprise, that kernel tracing has a lower impact on user-space CPU-bound workloads than on I/O-bound workloads. Tracing to disk generates 1.3 GiB of data output. This is higher than the trace data generated in flight recorder mode due to the supplementary disk activity traced. Trace throughput when tracing to disk is lower than in flight recorder mode because the tracer's disk activity generates fewer events per second, for the CPU time it consumes, than kernel compilation does, hence reducing the number of events per second to record.

5.5.8 Comparison

Previous work on highly scalable operating systems has been done at IBM Research, resulting in the K42 operating system [38], which includes a built-in, highly scalable kernel tracer based on a lockless buffering scheme. As presented in Section 5.2, K42's buffering algorithm contains rare race conditions which could be problematic, especially given LTTng's buffer and event size flexibility. Being a research operating system, K42 does not support CPU hotplug, nor does it distribute tracing overhead across idle cores, and it is limited to a subset of existing widely used hardware which provides a 64-bit cycle counter synchronized across cores.

Table 5.5 Linux kernel compilation tracing overhead

Test                          Time (s)   Overhead (%)   Trace Throughput (×10³ events/s)
Mainline Linux kernel            85           0                      –
Dormant instrumentation          84          -1                      –
Overwrite (flight recorder)      87           3                     822
Normal tracing to disk           90           6                     816

The instrumentation used in LTTng has been taken from the original LTT project [19]. It consists of about 150 instrumentation sites, some architecture-agnostic, others architecture-specific. They have been ported to the "Linux Kernel Markers" [52] and then to the "Tracepoints" [51] developed as part of the LTTng project and currently integrated in the mainline Linux kernel. The original LTT and earlier LTTng versions used RelayFS [53] to provide memory buffer allocation and mapping to user-space. LTTng re-uses part of the splice() implementation found in RelayFS.

To justify the choice of static code-level instrumentation over dynamic, breakpoint-based instrumentation, we must explain the performance impact of breakpoints. These are implemented with a software interrupt triggered by a breakpoint instruction temporarily replacing the original instructions to instrument. The specialized interrupt handler executes the debugger or the tracer when the breakpoint instruction is executed. An interesting result of the work presented in this paper is that the LTTng probe takes less time to run than a breakpoint alone. Tests running an empty Kprobe, which includes a breakpoint and single-stepping, in a loop show that it has a performance impact of 4200 cycles, or 1.413 µs, on a 3 GHz Pentium 4. In comparison, the overall time taken to execute an LTTng probe is 0.182 µs, which represents a 7.8:1 acceleration compared to the breakpoint alone.

It is also important to compare the proposed lockless scheme to an equivalent solution based on interrupt disabling. We therefore created an alternative implementation of the LTTng buffering scheme based on interrupt disabling for this purpose. It uses non-atomic operations to access the buffer state variables and is therefore not NMI-safe. Table 5.6 shows that the lockless solution is either marginally faster (7–8%) on architectures where the interrupt disabling cost is low, or much faster (34%) in cases where interrupt disabling is expensive in terms of cycles per instruction.

Table 5.6 Comparison of lockless and interrupt disabling LTTng probe execution time overhead, Linux 2.6.30

Architecture        IRQ-off (ns)   Lockless (ns)   Speedup (%)
Intel Pentium 4          212            182             14
AMD Athlon64 X2          381            314             34
Intel Core2 Xeon         128            119              7
ARMv7 OMAP3             1108           1014              8

Benchmarks performed on DTrace [33], the Solaris tracer, on an Intel Pentium 4 show a performance impact of 1.18 µs per event when tracing all system calls to a buffer. LTTng takes 0.182 µs per event on the same architecture, for a speedup of 6.42:1. As shown in this paper, tracing a tbench workload with LTTng generates a trace throughput of 130.9 MiB/s, for approximately 8 million events/s with an average event size of 16 bytes. With this workload, LTTng has a performance impact of 28%, for a workload execution time ratio of 1.28:1. DTrace being 6.42 times slower than LTTng, the same workload should be expected to be slowed down by 180% and therefore have an execution time ratio of 2.8:1. Performance-wise, LTTng therefore compares very favorably to DTrace [34]. This means LTTng can be used to trace workloads and diagnose problems outside of DTrace's reach.
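For clarity, here is the arithmetic behind the ratios quoted in this section, using only the measurements given above:

    breakpoint vs. LTTng probe:   1.413 µs / 0.182 µs ≈ 7.8, hence the 7.8:1 acceleration
    DTrace vs. LTTng per event:   1.18 µs / 0.182 µs ≈ 6.42, hence the 6.42:1 speedup
    extrapolated DTrace slowdown: 28% × 6.42 ≈ 180%, for an execution time ratio of 1 + 1.80 = 2.8:1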

5.6 Conclusion

Overall, the LTTng kernel tracer presented in this paper provides wide kernel code instrumentation coverage, which includes tricky non-maskable interrupts, traps and exception handlers, as well as the scheduler code. It has a per-event performance overhead 6.42 times lower than the existing DTrace tracer. The performance improvements are mostly derived from the following atomic primitive characteristic: local atomic operations, used on per-CPU variables, are cheaper than disabling interrupts on many architectures.

The atomic buffering mechanism presented in this paper is very useful for tracing. The good reentrancy and performance characteristics it demonstrates could be useful to other parts of the kernel, especially drivers. Using this scheme could accelerate buffer synchronization significantly and diminish interrupt latency.

LTTng has already been ported to the Xen hypervisor and to a user-space library as proofs of concept, to permit studying merged traces taken from the hypervisor, the various kernels running in virtual machines, and user-space applications and libraries. Future work includes polishing these ports and integrating them into Xen. Work on modeling and formal verification by model-checking is currently ongoing.

Acknowledgements

We would like to thank the Linux Trace Toolkit, Linux and SystemTAP communities for their feedback, as well as NSERC, Google, IBM Research, Autodesk and Ericsson for funding parts of this work. We are indebted to Etienne Bergeron and Robert Wisniewski for reviewing this paper.

Chapter 6

Paper 3: User-Level Implementations of Read-Copy Update

Abstract

Read-copy update (RCU) is a synchronization primitive that is often used as a replacement for reader-writer locking because it provides extremely lightweight read-side primitives with sharply bounded execution times. RCU updates are typically much heavier weight than RCU readers, especially when used in conjunction with locking. Although RCU is heavily used in a number of kernel-level environments, these kernel implementations make use of interrupt- and preemption-disabling facilities that are often unavailable to user-level applications. The few RCU implementations that are available to user applications either provide inefficient read-side primitives or restrict the application architecture. This paper describes several classes of efficient RCU implementations that are based on primitives commonly available to user-level applications. Finally, a performance comparison of these RCU primitives with each other and with standard locking leads to a discussion of appropriate locking mechanisms for various workloads. This opens the door to the use of RCU outside of kernels.

6.1 Introduction

Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU achieves scalability improvements by allowing reads to occur concurrently with updates. In contrast to conventional locking primitives that ensure mutual exclusion among concurrent threads regardless of whether they are readers or updaters, or to reader-writer locks that allow concurrent reads but not in the presence of updates, RCU supports concurrency between a single updater and multiple readers. RCU ensures that reads are coherent by maintaining multiple versions of objects and ensuring that they are not freed until all pre-existing read-side critical sections complete. RCU defines and uses efficient and scalable mechanisms for publishing and reading new versions of an object, and also for deferring the reclamation of old versions. These mechanisms distribute the work among read and update paths in such a way as to make read paths extremely fast. In some cases, as will be presented in Section 6.4.2, RCU's read-side primitives have zero overhead.

Although mechanisms similar to RCU have been used in a number of operating-system kernels [54, 55, 56, 57, 58], and, as shown in Figure 6.1, RCU is heavily used in the Linux kernel, we are not aware of significant application usage. This lack of application-level use is due in part to the fact that prior user-level RCU implementations imposed global constraints on the application's structure and operation [59], and in some cases heavy read-side overhead as well [60]. The popularity of RCU in operating-system kernels has been in part due to the fact that these can accommodate the global constraints imposed by earlier RCU implementations. Kernels therefore permit use of the high-performance quiescent-state-based reclamation (QSBR) class of RCU implementations. In fact, in server-class (CONFIG_PREEMPT=n) Linux-kernel builds, RCU incurs zero read-side overhead [61].

Whereas we cannot yet put forward a single user-level RCU implementation that is ideal for all user-level environments, the three classes of RCU implementations described in this paper should suffice for most applications. First, Section 6.2 provides a brief overview of RCU, including RCU semantics. Then, Section 6.3 describes user-level scenarios that could benefit from RCU. This is followed by the presentation of three classes of RCU implementation in Section 6.4. Finally, Section 6.5 presents experimental results, comparing RCU solutions to each other and to standard locks. This leads to recommendations on locking use for various workloads, presented in Section 6.6.

6.2 Brief Overview of RCU

This section introduces a conceptual view covering most RCU-based algorithms in Section 6.2.1, to familiarize the reader with RCU concepts and vocabulary. It then presents informal user-space RCU desiderata in Section 6.2.2, detailing the goals pursued in this work. Then, Section 6.2.3 shows how RCU is used to delete an element from a linked list in the face of concurrent readers. Finally, Section 6.2.4 gives an overview of RCU semantics, presenting the synchronization guarantees provided by RCU.

6.2.1 Conceptual View of RCU Algorithms

A schematic for the high-level structure of an RCU-based algorithm is shown in Figure 6.2, which can be thought of as a pictorial view of Equation 6.1 presented in Section 6.2.4.

[Figure 6.1 Linux-kernel usage of RCU: number of RCU API uses (0–3000) versus year (2002–2010).]

[Figure 6.2 Schematic of RCU grace period and read-side critical sections: four reader threads execute read-side critical sections delimited by rcu_read_lock() and rcu_read_unlock(); a fifth, updater thread performs removal (via rcu_assign_pointer()), then a grace period during which synchronize_rcu() waits for completion of pre-existing reads, then reclamation.]

The grace-period concept, explained thoroughly in Section 6.2.4, can be defined informally for the needs of this section as a period of time such that all RCU read-side critical sections in existence at the beginning of a given grace period have completed before its end. Here, each box labeled "reads" is an RCU read-side critical section that begins with rcu_read_lock() and ends with rcu_read_unlock(). Each row of RCU read-side critical sections denotes a separate thread, for a total of four read-side threads. The two boxes at the bottom left and right of the figure denote a fifth thread, this one performing an RCU update.

This RCU update is split into two phases, a removal phase denoted by the lower left-hand box and a reclamation phase denoted by the lower right-hand box. These two phases must be separated by a grace period, which is determined by the duration of the synchronize_rcu() execution. During the removal phase, the RCU update removes elements from the data structure (possibly inserting some as well) by issuing an rcu_assign_pointer() or equivalent pointer-replacement primitive. These removed data elements will not be accessible to RCU read-side critical sections starting after the removal phase ends, but might still be accessed by RCU read-side critical sections initiated during the removal phase. However, by the end of the RCU grace period, all of the RCU read-side critical sections that might be accessing the newly removed data elements are guaranteed to have completed, courtesy of the definition of "grace period". Therefore, the reclamation phase beginning after the grace period ends can safely free the data elements removed previously.

6.2.2 User-Space RCU Desiderata

Extensive use of RCU has led to the following user-space RCU desiderata:

1. Read-side primitives (such as rcu_read_lock() and rcu_read_unlock()) bounding RCU read-side critical sections and grace-period primitives (such as synchronize_rcu() and call_rcu()) must have the property that any RCU read-side critical section in existence at the start of a grace period completes by the end of the grace period.

2. RCU read-side primitives should avoid expensive operations such as cache misses, atomic instructions, memory barriers, and conditional branches.

3. RCU read-side primitives should have O(1) computational complexity to enable real-time use. This property guarantees freedom from deadlock.

4. RCU read-side primitives should be usable in all contexts, including nested within other RCU read-side critical sections. Another important special context is library functions having incomplete knowledge of the user application.

5. RCU read-side primitives should be unconditional, thus eliminating the failure checking that would otherwise complicate testing and validation. This property has the nice side-effect of avoiding livelocks.

6. RCU read-side primitives should not cause write-side starvation: grace periods should always complete, even given a steady flow of time-bounded read-side critical sections.

7. Any operation other than a quiescent state (and thus a grace period) should be permitted within an RCU read-side critical section. In particular, non-idempotent operations such as I/O and lock acquisition/release should be permitted.

8. It is permissible to mutate an RCU-protected data structure while executing within an RCU read-side critical section. Of course, any grace periods following this mutation must occur after the RCU read-side critical section completes.

9. RCU primitives should be independent of memory allocator design and implementation, so that RCU data structures may be protected regardless of how their data elements are allocated and freed.

10. RCU grace periods should not be blocked by threads that halt outside of RCU read-side critical sections. (But note that most quiescent-state-based implementations violate this desideratum.)

The RCU implementations described in Section 6.4 are designed to meet the above list of desiderata.

6.2.3 RCU Deletion From a Linked List

RCU-protected data structures in the Linux kernel include linked lists, hash tables, radix trees, and a number of custom-built data structures. Figure 6.3 shows how RCU may be used to delete an element from a linked list that is concurrently being traversed by RCU readers, as long as each reader conducts its traversal within the confines of a single RCU read-side critical section. The first column of the figure presents the data structure view of the updater thread. The second column presents the data structure view of a reader thread starting before the grace period begins. The third column presents a reader thread starting after the beginning of the grace period.

The first row of the figure shows a list with elements A, B, and C, to each of which any RCU reader initiated before the beginning of the grace period might acquire and hold references. The list_del_rcu() primitive unlinks element B from the list, but leaves the link from B to C intact, as shown on the second row of the figure. This permits any RCU readers currently referencing B to advance to C, as shown on the second and third rows of the figure. The transition between the second and third rows shows the reader-thread data structure view gradually seeing element B disappear. During this transition, some readers will see element B and others will not. Although there might be RCU readers still referencing element B, new RCU readers can no longer acquire a reference to it.

[Figure 6.3 RCU linked-list deletion. Columns: updater; reader initiated before the start of the grace period; reader initiated after the start of the grace period. Rows show the list A→B→C through list_del_rcu(B), the grace period awaited by synchronize_rcu(), and free(B), leaving A→C.]

The synchronize_rcu() primitive waits for one grace period, after which all pre-existing RCU read-side critical sections will have completed, resulting in the state shown in the fourth row of the figure. This state is the same as in the second and third rows, except for the fact that there can no longer be any RCU readers holding references to element B. This change of state of B from globally visible to private is depicted by using a white background for the B box. At this point, it is safe to invoke free(), reclaiming the memory consumed by element B, as shown on the last row of the figure.

Of course, the deletion process must be protected by some mutual-exclusion mechanism to ensure that concurrent deletions of two contiguous list items do not corrupt the list. One common strategy to perform this mutual exclusion is to use locking.

Although RCU is used in a wide variety of ways, this list-deletion process is the most common usage.
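As a concrete illustration, the deletion just described can be sketched in the style of the Linux-kernel list API; the struct foo element type and the foo_lock spinlock are hypothetical, and error handling is omitted.

    struct foo {
        struct list_head list;
        int key;
    };

    static LIST_HEAD(foo_list);          /* RCU-protected list */
    static DEFINE_SPINLOCK(foo_lock);    /* serializes updaters */

    static void foo_del(struct foo *p)
    {
        spin_lock(&foo_lock);            /* exclude concurrent updaters */
        list_del_rcu(&p->list);          /* unlink B; the B-to-C link stays intact */
        spin_unlock(&foo_lock);
        synchronize_rcu();               /* wait for pre-existing readers */
        kfree(p);                        /* now safe to reclaim */
    }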

6.2.4 Overview of RCU Semantics

RCU semantics comprise the grace-period guarantee and the publication guarantee. Synchronization guarantees among concurrent modifications of the RCU-protected data structure must be provided by some other mechanism. In the Linux kernel, this other mechanism is typically locking, but any other suitable mechanism may be used, including atomic operations, non-blocking synchronization, transactional memory, or a single designated updater thread.

Grace-Period Guarantee

RCU operates by defining RCU read-side critical sections, delimited by rcu_read_lock() and rcu_read_unlock(), and by defining grace periods, which are periods of time such that all RCU read-side critical sections in existence at the beginning of a given grace period have completed before its end. The RCU primitive synchronize_rcu() starts a grace period and then waits for it to complete. Most RCU implementations allow RCU read-side critical sections to be nested.

Somewhat more formally, suppose we have a group of C-language statements S_i within an RCU read-side critical section as follows:

rcu_read_lock(); S_0; S_1; S_2; ...; rcu_read_unlock();

Suppose further that we have a group of C-language mutation statements M_i and a group of C-language destruction statements D_i separated by an RCU grace period:


M_0; M_1; M_2; ...; synchronize_rcu(); D_0; D_1; D_2; ...;

Then the following holds, where "→" indicates that the statement on the left executes prior to that on the right, and where "⟹" denotes logical implication:

∃S_a, M_b (S_a → M_b) ⟹ ∀S_i, D_j (S_i → D_j)   (6.1)

In other words, if any statement in a given RCU read-side critical section executes prior to any statement preceding a given grace period, then all statements in that RCU read-side critical section must execute prior to any statement following that same grace period.

This guarantee permits RCU-based algorithms to trivially avoid a number of difficult race conditions that can otherwise result in poor performance, limited scalability, and great complexity. However, this guarantee is insufficient on its own, as it does not show that readers can operate consistently while an update is in progress. This case is covered by the publication guarantee presented in the next section.

Publication Guarantee

It is important to note that the statements S_a and M_b may execute concurrently, even in the case where S_a is referencing the same data element that M_b is concurrently modifying. The publication guarantee associated with the rcu_assign_pointer() and rcu_dereference() primitives allows this concurrency to be handled both correctly and easily: any dereference of a pointer returned by rcu_dereference() is guaranteed to see any changes prior to the corresponding rcu_assign_pointer(), including any changes prior to any earlier rcu_assign_pointer() involving that same pointer. Somewhat more formally, suppose that rcu_assign_pointer() is used as follows:

I_0; I_1; I_2; ...; rcu_assign_pointer(g, p);

where each I_i is a C-language statement that initializes a field in the structure referenced by the local pointer p, and where the global pointer g is visible to reading threads. Then the body of a canonical RCU read-side critical section would appear as follows:

q = rcu_dereference(g); R_0; R_1; R_2; ...;

where this RCU read-side critical section is enclosed in rcu_read_lock() and rcu_read_unlock(), q is a local pointer, g is the same global pointer updated by the earlier rcu_assign_pointer() (and possibly updated again by some later invocations of rcu_assign_pointer()), and each R_i dereferences q to access one of the fields initialized by one of the statements I_i. Then we have the following, where A is the rcu_assign_pointer() and D is the rcu_dereference():


A → D ⟹ ∀I_i, R_j (I_i → R_j)   (6.2)

In other words, if a given rcu_dereference() statement accesses the value stored by a given rcu_assign_pointer(), then all statements dereferencing the pointer returned by that rcu_dereference() must see the effects of any initialization statements preceding the rcu_assign_pointer(). This guarantee allows new data to be initialized and added to an RCU-protected data structure in the face of concurrent RCU readers.

Given both the grace-period and publication guarantees, these five primitives enable a wide variety of algorithms and data structures providing extremely low read-side overheads for read-mostly data structures [61, 59, 62, 63]. Again, note that concurrent updates must be handled by some synchronization mechanism, be it locking, atomic operations, non-blocking synchronization, transactional memory, or a single updater thread. With this background on RCU, we are ready to consider how it might be used in user-level applications.
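To make the publication guarantee concrete, the following minimal sketch shows both sides of Equation 6.2; struct foo and its field values are hypothetical, and the RCU primitives are assumed to be provided by one of the implementations described below.

    #include <stdlib.h>

    struct foo {
        int a;
        int b;
    };

    struct foo *gp;                      /* the global pointer g */

    void publisher(void)
    {
        struct foo *p = malloc(sizeof(*p));
        p->a = 1;                        /* I_0 */
        p->b = 2;                        /* I_1 */
        rcu_assign_pointer(gp, p);       /* A: orders I_0, I_1 before the store */
    }

    void reader(void)
    {
        struct foo *q;

        rcu_read_lock();
        q = rcu_dereference(gp);         /* D */
        if (q != NULL) {
            int sum = q->a + q->b;       /* R_0, R_1: guaranteed to see I_0, I_1 */
            (void)sum;
        }
        rcu_read_unlock();
    }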

6.3 User-Space RCU Usage Scenarios

The past year has seen increased interest in applying RCU to user-space applications. User-level RCU was needed for a user-level infrastructure that provides low-overhead tracing for user-mode applications. RCU is used for tracer control data synchronization in the LTTng tracer implementation [1], which is being ported to a user-space library. This usage scenario poses important constraints on the RCU requirements. The tracing library cannot be too intrusive in terms of program modification, which makes the QSBR approach presented in Section 6.4.2 inappropriate for such a usage scenario. It also needs to support extensible instrumentation of user-selected execution sites, including signal handlers, which requires supporting nested RCU critical sections as well as RCU read-side critical sections in signal handlers. This usage scenario is also very performance-demanding on workloads involving instrumentation of frequently executed sites, so having a low-overhead and scalable read side is very important. An ideal synchronization primitive for a tracing library would therefore require no knowledge of the application and could be used to protect data structures used in a library.

User-level RCU has also been proposed for an elliptics-network distributed

cloud-based storage project [64]. BIND, a major domain name server at the root of Internet domain name resolution, is facing multi-threading scalability issues that are currently addressed with reader-writer locks [65]. Given that domain names are read often but rarely updated, BIND could see major performance improvements from using user-level RCU. Others have mentioned possibilities in financial applications. One can also argue that RCU has seen long use at user level in the guise of user-mode Linux.

In general, the area of applicability of RCU to user-mode applications appears similar to that in the Linux kernel: read-mostly data structures, especially in cases where stale data can be accommodated.

6.4 Classes of RCU Implementations

This section describes several classes of RCU implementations. Sections 6.4.2, 6.4.3, and 6.4.4 present user-space RCU implementations that are optimized for different usage patterns by user-space applications, but first Section 6.4.1 describes some primitives that might be unfamiliar to the reader. The implementation presented in Section 6.4.2 offers the best possible read-side performance, but requires that each of the application's threads periodically pass through a quiescent state, thus strongly constraining the application's design. The implementation presented in Section 6.4.3 places almost no constraints on the application's design, thus being appropriate for use within a general-purpose library, but has higher read-side overhead. Section 6.4.4 presents an implementation having low read-side overhead, requiring only that the application give up one signal to RCU processing. Finally, Section 6.4.5 demonstrates how to create wait-free RCU update primitives.

6.4.1 Notation

The examples in this section use a number of primitives that may be unfamiliar to the reader; they are listed here. Per-thread variables are defined via DEFINE_PER_THREAD(). A thread may access its own instance of a per-thread variable using __get_thread_var(), or some other thread's instance via per_thread(). The for_each_thread() primitive sequences through all threads, one at a time.

The pthread_mutex is a type defined by the pthread library for mutual-exclusion variables. The mutex_lock() primitive acquires a pthread_mutex instance, and mutex_unlock() releases it. The abbreviation mb stands for "memory barrier". The smp_mb() primitive emits a full memory barrier, for example, the sync instruction on the PowerPC architecture. The smp_wmb() and smp_rmb() primitives are, respectively, store and load memory barriers, corresponding, for example, to the sfence and lfence instructions on the x86 architecture. The ACCESS_ONCE() primitive prohibits any compiler optimization that might otherwise turn a single fetch or store into multiple fetches or stores, as might happen under heavy register pressure. The barrier() primitive prohibits any compiler code-motion optimization that might otherwise move fetches or stores across the barrier() primitive.
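For readers who want something concrete, the following are plausible GCC-style definitions of three of these primitives; they are illustrative assumptions, not necessarily the exact definitions used in this work.

    #define ACCESS_ONCE(x)  (*(volatile __typeof__(x) *)&(x))
    #define barrier()       __asm__ __volatile__("" : : : "memory")
    #define smp_mb()        __sync_synchronize()  /* full hardware memory barrier */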

 1 long rcu_gp_ctr = 0;
 2 DEFINE_PER_THREAD(long, rcu_reader_qs_gp);
 3
 4 static inline void rcu_read_lock(void)
 5 {
 6 }
 7
 8 static inline void rcu_read_unlock(void)
 9 {
10 }
11
12 static inline void rcu_quiescent_state(void)
13 {
14   smp_mb();
15   __get_thread_var(rcu_reader_qs_gp) =
16     ACCESS_ONCE(rcu_gp_ctr) + 1;
17   smp_mb();
18 }
19
20 static inline void rcu_thread_offline(void)
21 {
22   smp_mb();
23   __get_thread_var(rcu_reader_qs_gp) =
24     ACCESS_ONCE(rcu_gp_ctr);
25 }
26
27 static inline void rcu_thread_online(void)
28 {
29   __get_thread_var(rcu_reader_qs_gp) =
30     ACCESS_ONCE(rcu_gp_ctr) + 1;
31   smp_mb();
32 }

Figure 6.4 RCU read side using quiescent states

6.4.2 Quiescent-State-Based Reclamation RCU

The QSBR RCU implementation provides a near-zero-overhead read side, but requires modifying the application, as this section explains.

Figure 6.4 shows the read-side primitives used to construct a user-level quiescent-state-based reclamation (QSBR) implementation of RCU. As can be seen from lines 4–10 in the figure, the rcu_read_lock() and rcu_read_unlock() primitives do nothing, and can in fact be expected to be inlined and optimized away, as they are in server builds of the Linux kernel. This is due to the fact that quiescent-state-based RCU implementations approximate the extents of RCU read-side critical sections using the aforementioned quiescent states, which contain calls to rcu_quiescent_state(), shown on lines 12–18 in the figure. Threads entering extended quiescent states (for example, when blocking) may instead use rcu_thread_offline() and rcu_thread_online() to mark the beginning and the end, respectively, of such an extended quiescent state. As such, rcu_thread_online() is analogous to rcu_read_lock() and rcu_thread_offline() is analogous to rcu_read_unlock(). These two functions are shown on lines 20–32 in the figure. In either case, it is invalid for a quiescent state to appear within an RCU read-side critical section.

In rcu_quiescent_state(), line 14 executes a memory barrier to prevent any code prior to the quiescent state from being reordered into the quiescent state. Lines 15–16 pick up a copy of the global rcu_gp_ctr (RCU grace-period counter), using ACCESS_ONCE() to ensure that the compiler does not employ any optimizations that would result in rcu_gp_ctr being fetched more than once, add one to the value fetched, and store the result into the per-thread rcu_reader_qs_gp variable, so that any concurrent instance of synchronize_rcu() will see an odd-numbered value, thus becoming aware that a new RCU read-side critical section has started. Instances of synchronize_rcu() that are waiting on older RCU read-side critical sections will know to ignore this new one. Finally, line 17 executes a memory barrier to ensure that the update to rcu_reader_qs_gp is seen by all threads to happen before any subsequent RCU read-side critical sections.

Some applications might use RCU only occasionally, but use it very heavily when they do use it. Such applications might choose to use rcu_thread_online() when starting to use RCU and rcu_thread_offline() when no longer using RCU. The time between a call to rcu_thread_offline() and a subsequent call to rcu_thread_online() is an extended quiescent state, so that RCU will not expect explicit quiescent states to be registered during this time.

The rcu_thread_offline() function simply sets the per-thread rcu_reader_qs_gp variable to the current value of rcu_gp_ctr, which has an even-numbered value. Any instance of synchronize_rcu() will thus know to ignore this thread. A memory barrier is needed at the beginning of the function to ensure all RCU read-side accesses are globally visible before making the thread appear offline. No memory barrier is needed at the end of rcu_thread_offline(), because it is invalid to perform RCU accesses on that side of the function; there is therefore no need to prevent reordering. The rcu_thread_online() function is the counterpart of rcu_thread_offline(): it marks the end of the extended quiescent state. It is similar to rcu_quiescent_state(), except that the only memory barrier required is at the end of the function.

Figure 6.5 shows the implementation of synchronize_rcu(). It implicitly refers to the variables declared on lines 1–2 of Figure 6.4. Lines 1–4 show the rcu_gp_ongoing() helper function, which returns true if the specified thread's rcu_reader_qs_gp variable has an odd-numbered value. Lines 6–22 show the implementation of synchronize_rcu() itself. Line 10 is a memory barrier that ensures that the caller's mutation of the RCU-protected data structure is seen by all CPUs to happen before the grace period identified by this invocation of synchronize_rcu().

 1 static inline int rcu_gp_ongoing(int thread)
 2 {
 3   return per_thread(rcu_reader_qs_gp, thread) & 1;
 4 }
 5
 6 void synchronize_rcu(void)
 7 {
 8   int t;
 9
10   smp_mb();
11   mutex_lock(&rcu_gp_lock);
12   rcu_gp_ctr += 2;
13   for_each_thread(t) {
14     while (rcu_gp_ongoing(t) &&
15            ((per_thread(rcu_reader_qs_gp, t) -
16              rcu_gp_ctr) < 0)) {
17       poll(NULL, 0, 10);
18       barrier();
19     }
20   }
21   mutex_unlock(&rcu_gp_lock);
22   smp_mb();
23 }

Figure 6.5 RCU update side using quiescent states

Line 11 acquires a pthread_mutex named rcu_gp_lock in order to serialize concurrent calls to synchronize_rcu(), and line 21 releases it. Line 12 adds the value 2 to the global variable rcu_gp_ctr to indicate the beginning of a new grace period. Line 13 sequences through all threads, and lines 14–16 check to see if the thread being examined is still in an RCU read-side critical section that began before the counter was incremented back on line 12: if so, we must wait for it on line 17. Line 18 ensures that the compiler refetches the rcu_reader_qs_gp variable. Line 22 executes one last memory barrier to ensure that all other CPUs have fully completed their RCU read-side critical sections before the caller of synchronize_rcu() performs any destructive actions (such as freeing up memory).

This implementation has low-cost read-side primitives, as can be seen in Figure 6.4. Read-side overhead depends on how often rcu_quiescent_state() is called. These read-side primitives qualify as wait-free under the most severe conceivable definition [66]. The synchronize_rcu() overhead ranges from about 600 nanoseconds on a single-CPU Power5 system up to more than 100 microseconds on a 64-CPU system with one thread per CPU.

Because it waits for readers to complete, synchronize_rcu() does not qualify as non-blocking. Section 6.4.5 describes how RCU updates can support wait-free algorithms in the same sense as wait-free algorithms are supported by garbage collectors. However, this implementation requires that each thread either invoke rcu_quiescent_state() periodically or invoke rcu_thread_offline() for extended quiescent states. The need to invoke these functions periodically can make this implementation difficult to use in some situations, such as for certain types of library functions. In addition, this implementation does not permit concurrent calls to synchronize_rcu() to share overlapping grace periods. That said, one could easily imagine a production-quality RCU implementation based on this version of RCU.

Finally, on systems where rcu_gp_ctr is implemented using a 32-bit counter, this algorithm can fail if a reader is preempted between the fetch of rcu_gp_ctr and the store into rcu_reader_qs_gp (lines 15–16 of Figure 6.4) for enough time to allow rcu_gp_ctr to advance through more than half (but not all) of its possible values. Although one solution is to avoid 32-bit systems, 32-bit systems can be handled by adapting rcu_read_lock() and rcu_read_unlock() from Figure 6.6 for use in rcu_quiescent_state() and rcu_thread_offline(), respectively. This would of course also require adopting the synchronize_rcu() implementation from Figure 6.7.
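To illustrate how an application might use the QSBR primitives of Figures 6.4 and 6.5, here is a hedged sketch of a reader thread's main loop; do_lookup(), no_work_available() and wait_for_work() are hypothetical application functions, not part of the RCU implementation.

    static void *reader_thread(void *arg)
    {
        for (;;) {
            rcu_read_lock();          /* compiles to nothing under QSBR */
            do_lookup();              /* access RCU-protected data */
            rcu_read_unlock();        /* compiles to nothing under QSBR */

            rcu_quiescent_state();    /* announce a quiescent state between
                                         read-side critical sections */

            if (no_work_available()) {
                rcu_thread_offline(); /* extended quiescent state while blocked */
                wait_for_work();
                rcu_thread_online();
            }
        }
        return NULL;                  /* not reached */
    }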

Another point worth discussing is that if read-side critical sections are expected to execute in a signal handler, the rcu_quiescent_state() primitive must run with signals disabled, and signals must be kept disabled while threads are kept offline. Effectively, if a signal handler nests over rcu_quiescent_state() between the memory barriers, its read-side critical section could be interleaved with the rcu_reader_qs_gp update and therefore span two grace periods, which could cause synchronize_rcu() to return before the quiescent state is reached and lead to data corruption.

The next section discusses an RCU implementation that is safe for use in libraries, where the library code cannot guarantee that all threads of an as-yet-unwritten application will traverse quiescent states in a timely fashion.

6.4.3 General-Purpose RCU

The general-purpose RCU implementation can in theory be used in any software environment, including even in library functions that are not aware of the design of the enclosing application. However, the price paid for this generality is a relatively high read-side overhead, though this overhead is still significantly less than that of a single compare-and-swap operation on most hardware.

A global variable rcu_gp_ctr is initialized to 1 and a per-thread variable rcu_reader_gp is initialized to zero. The low-order bits of rcu_reader_gp count the rcu_read_lock() nesting depth, while the upper bit indicates the grace-period phase at the time of the invocation of the outermost rcu_read_lock() [67]. The upper bit of the global variable rcu_gp_ctr is the current grace-period phase, while its low-order field is set to the value 1 for reasons that will become apparent shortly.

The read-side primitives are shown in Figure 6.6. Lines 1–4 are declarations, lines 6–19 are rcu_read_lock(), and lines 21–27 are rcu_read_unlock(). In rcu_read_lock(), line 11 obtains a reference to the current thread's instance of rcu_reader_gp, and line 12 fetches the contents into the local variable tmp. Line 13 then checks to see if this is the outermost rcu_read_lock(), and, if so, line 14 copies the current value of the global rcu_gp_ctr to this thread's rcu_reader_gp variable, thereby snapshotting the current grace-period phase and setting the nesting count to 1 in a single operation. Otherwise, line 17 increments the nesting count in this thread's rcu_reader_gp variable.
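The following hypothetical helpers, written against the RCU_GP_CTR_* macros of Figure 6.6 and assuming a 32-bit long, merely restate the bit layout just described; they are illustrative and not part of the implementation.

    /* Decode the rcu_reader_gp layout described above (32-bit long assumed). */
    static inline long reader_nesting(long reader_gp)
    {
        return reader_gp & RCU_GP_CTR_NEST_MASK;         /* bits 0-30 */
    }

    static inline int reader_phase(long reader_gp)
    {
        return (reader_gp & RCU_GP_CTR_BOTTOM_BIT) != 0; /* bit 31 */
    }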

 1 #define RCU_GP_CTR_BOTTOM_BIT 0x80000000
 2 #define RCU_GP_CTR_NEST_MASK (RCU_GP_CTR_BOTTOM_BIT - 1)
 3 long rcu_gp_ctr = 1;
 4 DEFINE_PER_THREAD(long, rcu_reader_gp);
 5
 6 static inline void rcu_read_lock(void)
 7 {
 8   long tmp;
 9   long *rrgp;
10
11   rrgp = &__get_thread_var(rcu_reader_gp);
12   tmp = *rrgp;
13   if ((tmp & RCU_GP_CTR_NEST_MASK) == 0) {
14     *rrgp = ACCESS_ONCE(rcu_gp_ctr);
15     smp_mb();
16   } else {
17     *rrgp = tmp + 1;
18   }
19 }
20
21 static inline void rcu_read_unlock(void)
22 {
23   long tmp;
24
25   smp_mb();
26   __get_thread_var(rcu_reader_gp)--;
27 }

Figure 6.6 RCU read side using memory barriers

Line 26 decrements the thread's rcu_reader_gp, which has the effect of decrementing the nesting count.

For the outermost rcu_read_lock(), the memory barrier on line 15 ensures that the rcu_reader_gp value is globally observable before any of the memory accesses of the outermost read-side critical section. It ensures that neither the compiler nor the CPU will reorder memory accesses across this barrier, by adding a compiler barrier and issuing a memory-barrier instruction. Only the outermost rcu_read_lock() needs such a memory barrier, because only this outermost lock can change the reader's current grace period. In rcu_read_unlock(), line 25 executes a memory barrier to ensure that all globally observable effects of the RCU read-side critical section reach memory before rcu_reader_gp is decremented. The memory barrier on line 25 is needed only for the outermost rcu_read_unlock(); however, because the outermost and innermost nesting levels behave in exactly the same way, a branch in the rcu_read_unlock() code is unneeded, and, given that single-level nesting is the common case, the memory barrier is executed unconditionally for both innermost and outermost nesting levels.

Section 6.4.4 shows one way of getting rid of both memory barriers; however, even with the memory barriers, both rcu_read_lock() and rcu_read_unlock() are wait-free. The effect of this implementation of rcu_read_lock() and rcu_read_unlock() is that a given thread may be ignored by the current grace-period phase in either of the following cases:

1. The low-order bits of the thread's rcu_reader_gp variable are all zero, in which case the thread is not currently in an RCU read-side critical section.

2. The upper bit of the thread's rcu_reader_gp variable matches that of the global rcu_gp_ctr, in which case this thread's RCU read-side critical section started after the beginning of the current grace-period phase.

These checks are implemented by the function rcu_old_gp_ongoing(), which is shown on lines 1–7 of Figure 6.7. This figure implicitly refers to the declarations and variables on lines 1–4 of Figure 6.6.

 1 static inline int rcu_old_gp_ongoing(int t)
 2 {
 3   int v = ACCESS_ONCE(per_thread(rcu_reader_gp, t));
 4
 5   return (v & RCU_GP_CTR_NEST_MASK) &&
 6          ((v ^ rcu_gp_ctr) & ~RCU_GP_CTR_NEST_MASK);
 7 }
 8
 9 static void flip_counter_and_wait(void)
10 {
11   int t;
12
13   rcu_gp_ctr ^= RCU_GP_CTR_BOTTOM_BIT;
14   for_each_thread(t) {
15     while (rcu_old_gp_ongoing(t)) {
16       poll(NULL, 0, 10);
17       barrier();
18     }
19   }
20 }
21
22 void synchronize_rcu(void)
23 {
24   smp_mb();
25   mutex_lock(&rcu_gp_lock);
26   flip_counter_and_wait();
27   flip_counter_and_wait();
28   mutex_unlock(&rcu_gp_lock);
29   smp_mb();
30 }

Figure 6.7 RCU update side using memory barriers

Given a thread t, line 3 fetches t's rcu_reader_gp variable, with the ACCESS_ONCE() primitive ensuring the variable is read with a single memory access. This prevents the compiler from refetching the variable or fetching it in pieces. Line 5 then checks to see if the low-order field is non-zero, and line 6 checks to see if the upper bit differs from that of the rcu_gp_ctr global variable. Only if both these conditions hold does rcu_old_gp_ongoing() report that the current grace-period phase must wait on this thread.

Lines 9–20 of Figure 6.7 show flip_counter_and_wait(), which initiates a grace-period phase and waits for it to elapse. Line 13 complements the upper bit of the global variable rcu_gp_ctr, which initiates a new grace-period phase. Line 14 cycles through all threads. The while loop at line 15 repeatedly executes lines 16–17 until rcu_old_gp_ongoing() reports that the thread no longer resides in an RCU read-side critical section that affects the current grace-period phase. Line 16, which is optional, blocks for a short period of time, and line 17 ensures that the compiler refetches variables when executing rcu_old_gp_ongoing().

Lines 22–30 of Figure 6.7 show synchronize_rcu(), which waits for a full two-phase grace period to elapse. Line 24 executes a memory barrier to ensure that any prior data-structure modification is seen by all threads to precede the grace period. Line 25 acquires rcu_gp_lock to serialize any concurrent invocations of synchronize_rcu(). Lines 26–27 wait for two grace-period phases, line 28 releases the lock, and line 29 executes a memory barrier to ensure that all threads see the grace period happening before any subsequent destructive operations (such as free()).

Memory ordering between the rcu_gp_ctr complement and the test of the reader's current grace period by rcu_old_gp_ongoing() is not strictly needed. The only requirement is that each and every reader thread that was executing in a read-side critical section before the memory barrier on line 24 has finished its critical section after the memory barrier on line 29. This two-phase grace-period scheme is used to ensure updater progress through a grace period even in the face of a steady flow of readers. The only requirement is that, when the updater busy-loops waiting for readers, it eventually reaches a point where all new readers are in the new grace-period parity. Grace-period identification, by either a bit (in the two-phase scheme) or a counter, ensures that readers starting during the grace period will not prevent the grace period from completing. In fact, if a simplistic scheme were used in which the updater waits for all readers to complete, the grace period would be considered complete only when the updater reaches a point where no reader at all is active in the system.

However, such a scheme would allow new readers starting after the beginning of the grace period to impede reaching a quiescent state. This would prevent grace-period progress in the presence of reader threads that release the read-side critical section only for very short periods. Faster cached local data access would therefore give the readers an unfair advantage over the updater.

Now that the grace-period identification question is settled, this raises the question: why isn't a single grace-period phase sufficient? To see why, consider the following sequence of events, which involves one read-side critical section and two consecutive grace periods:

1. Thread A invokes rcu_read_lock(), executing lines 11–13 of Figure 6.6, finding that this instance of rcu_read_lock() is not nested, and fetching the value of rcu_gp_ctr on line 14, but not yet storing it.

2. Thread B invokes synchronize_rcu(), executing lines 24 and 25 of Figure 6.7, then invoking flip_counter_and_wait() on line 26, where it complements the grace-period phase bit on line 13, so that the new value of this bit is now 1.

3. Because no thread is in an RCU read-side critical section (recall that Thread A has not yet executed the store operation on line 14), Thread B proceeds through lines 14–19 of Figure 6.7, returns to synchronize_rcu(), executes lines 28–30 (recall that line 27 is omitted in this single-phase scenario), and returns to the caller.

4. Thread A now performs the store on line 14 of Figure 6.6. Recall that it is using the old value of rcu_gp_ctr, in which the value of the grace-period phase bit is 0.

5. Thread A then executes the memory barrier on line 15 and returns to the caller, which proceeds into the RCU read-side critical section.

6. Thread B invokes synchronize_rcu() once more, again complementing the grace-period phase bit on line 13 of Figure 6.7, so that the value is again zero.

7. When Thread B examines Thread A's rcu_reader_gp variable on line 6 of Figure 6.7, it finds that the grace-period phase bit matches that of the global variable rcu_gp_ctr. Thread A is therefore ignored, and Thread B exits from synchronize_rcu().

8. But Thread A is still in its RCU read-side critical section, in violation of RCU semantics.

Invoking flip_counter_and_wait() twice avoids this problem by making sure the grace period waits for reader critical sections in each of the two possible phases.

A single-phase approach is possible if the current grace period is identified by a free-running counter, as shown in Section 6.4.2. However, the counter size is important, because this counter is subject to overflow. The single-flip problem shown above, which involves two consecutive grace periods, is actually a case where a single-bit overflow occurs. A similar scenario is therefore possible whenever enough grace periods to overflow the grace-period counter elapse while a reader remains within a read-side critical section. This could realistically happen on 32-bit architectures if read-side critical sections are preempted. The following section shows one way to eliminate the read-side memory barriers.

6.4.4 Low-Overhead RCU Via Signal Handling

The largest sources of overhead for the QSBR and general-purpose RCU read-side primitives shown in Figures 6.4 and 6.6 are the memory barriers. One way to eliminate this overhead is to use POSIX signals. The readers' signal handlers contain memory-barrier instructions, which allows an updater to force readers to execute a memory-barrier instruction only when needed, rather than suffering the extra overhead during every call to a read-side primitive.

One unexpected but quite pleasant surprise is that this approach results in relatively simple read-side primitives. In contrast, those of preemptible RCU are notoriously complex.

The read-side primitives are shown in Figure 6.8, along with the data definitions and state variables. The urcu prefix used for variables stands for "user-space RCU". Lines 1–3 show the definitions controlling both the urcu_gp_ctr global variable (line 5) and the urcu_active_readers per-thread variable (line 6). The low-order bits (those corresponding to 1-bits in RCU_GP_CTR_NEST_MASK) are used to count the rcu_read_lock() nesting level, while the bit selected by RCU_GP_CTR_BIT is used to detect grace periods. All other bits are unused. The global urcu_gp_ctr may be accessed at any time by any thread, but may be updated only by the thread holding the lock that guards grace-period detection. The per-thread urcu_active_readers variable may be modified only by the corresponding thread, and is otherwise read only by the thread holding the lock that guards grace-period detection.

The rcu_read_lock() implementation is shown on lines 9–18.

 1 #define RCU_GP_COUNT (1UL << 0)
 2 #define RCU_GP_CTR_BIT (1UL << (sizeof(long) * 4))
 3 #define RCU_GP_CTR_NEST_MASK (RCU_GP_CTR_BIT - 1)
 4
 5 long urcu_gp_ctr = RCU_GP_COUNT;
 6 long __thread urcu_active_readers = 0L;
 7
 8 static inline void rcu_read_lock(void)
 9 {
10   long tmp;
11
12   tmp = urcu_active_readers;
13   if (!(tmp & RCU_GP_CTR_NEST_MASK))
14     urcu_active_readers = ACCESS_ONCE(urcu_gp_ctr);
15   else
16     urcu_active_readers = tmp + RCU_GP_COUNT;
17   barrier();
18 }
19
20 static inline void rcu_read_unlock(void)
21 {
22   barrier();
23   urcu_active_readers = urcu_active_readers - RCU_GP_COUNT;
24 }

Figure 6.8 RCU read side using signals

Line 12 picks up the current value of this thread's urcu_active_readers variable and places it in the local variable tmp. Line 13 checks to see if the nesting-level portion of urcu_active_readers is zero (indicating that this is the outermost rcu_read_lock()), and, if so, line 14 copies the global variable urcu_gp_ctr to this thread's urcu_active_readers variable. Note that urcu_gp_ctr has been initialized with its low-order bit set, so that the nesting level is automatically set correctly. Otherwise, line 16 increments the nesting level in this thread's urcu_active_readers variable. In either case, line 17 executes a barrier() directive in order to prevent the compiler from undertaking any code-motion optimization that might otherwise cause the contents of the subsequent RCU read-side critical section to be reordered to precede the rcu_read_lock().

The implementation of rcu_read_unlock() is shown on lines 20–24. Line 22 executes a barrier() directive, again in order to prevent the compiler from undertaking any code-motion optimization that might otherwise cause the contents of the prior RCU read-side critical section to be reordered to follow the rcu_read_unlock(). Line 23 decrements the value of this thread's urcu_active_readers variable, so that, if this is the outermost rcu_read_unlock(), the low-order bits indicating the nesting level will now be zero.

Both rcu_read_lock() and rcu_read_unlock() execute a sharply bounded number of instructions, hence both are wait-free.
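Because only the outermost rcu_read_lock() snapshots urcu_gp_ctr, read-side critical sections nest safely, including from signal handlers. The following hedged sketch illustrates this; process_event() and do_read() are hypothetical application functions.

    static void sig_handler(int signo)
    {
        rcu_read_lock();     /* may nest over an interrupted read side:
                                only the nesting count is incremented */
        process_event();
        rcu_read_unlock();
    }

    static void reader(void)
    {
        rcu_read_lock();     /* outermost: snapshots the grace-period phase */
        do_read();           /* a signal arriving here nests safely */
        rcu_read_unlock();
    }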

 1 struct reader_registry {
 2   pthread_t tid;
 3   long *urcu_active_readers;
 4   char *need_mb;
 5 } *registry;
 6 static char __thread need_mb;
 7 static int num_readers;
 8
 9 static void force_mb_all_threads(void)
10 {
11   struct reader_registry *index;
12
13   if (!registry)
14     return;
15   index = registry;
16   for (; index < registry + num_readers; index++) {
17     *index->need_mb = 1;
18     pthread_kill(index->tid, SIGURCU);
19   }
20   index = registry;
21   for (; index < registry + num_readers; index++) {
22     while (*index->need_mb) {
23       pthread_kill(index->tid, SIGURCU);
24       poll(NULL, 0, 1);
25     }
26   }
27   smp_mb();
28 }
29
30 static void sigurcu_handler(int signo, siginfo_t *siginfo,
31                             void *context)
32 {
33   smp_mb();
34   need_mb = 0;
35   smp_mb();
36 }

Figure 6.9 RCU signal handling

The signal-handling primitives are shown in Figure 6.9, including the variable declarations on lines 1–7, force_mb_all_threads() on lines 9–28 and sigurcu_handler() on lines 30–36. The structure on lines 1–5 represents a thread, with its thread ID in tid, a pointer to its urcu_active_readers per-thread variable, and a pointer to its need_mb per-thread variable. Line 6 declares the per-thread need_mb variable, and line 7 defines the global variable num_readers, which contains the number of threads represented in the registry array defined on line 5.

The force_mb_all_threads() function ensures that a memory barrier is executed on each running thread by sending a POSIX signal to all threads and waiting for each to respond. As we will see, this has the effect of promoting compiler-ordering directives such as barrier() to full memory barriers, while avoiding the need to incur the cost of expensive barriers in read-side primitives in the common case.

such as barrier() to full memory barriers, while avoiding the need to incur the cost of expensive barriers in read-side primitives in the common case. Lines 13–14 return if there are no readers, and lines 16-19 set each thread’s need mb per-thread variable to the value one, then send that thread a POSIX signal. Note that the system call executed for pthread kill() implies a full memory barrier before the system call execution at the operating system level. This memory barrier ensures that all memory accesses done prior to the call to pthread kill() are not reordered after the start of the system call. Lines 20–26 then rescan the threads, waiting until one each has responded by setting its need mb per-thread variable to zero. Because some versions of some operating systems can lose signals, line 23 will resend the signal if a response is not received in a timely fashion. Finally, line 27 executes a memory barrier to ensure that the signals have been received and acknowledged before later operations that might otherwise destructively interfere with readers. Lines 30–36 show the signal handler that runs in response to a given thread receiv- ing the POSIX signal sent by force mb all threads(). This sigurcu handler() function executes a pair of memory barriers separated by setting its need mb per- thread variable to zero. This has the effect of placing a full memory barrier at whatever point in the thread’s code that was executing at the time that the signal was received, preventing the CPU from reordering across that point. The sender thread has two memory barriers around whole sequence consisting of sending the signal and waiting for the remote thread to acknowledge its reception. The remote thread executes a memory barrier before acknowledging the signal re- ception. These two conditions ensure that the remote thread’s program order and memory accesses passed by a point where they were executing in order between the two memory barriers on the sender thread. Therefore, execution in program order and with ordered memory accesses is ensured on the remote processor at that point. This promotes all compiler barriers on the receiver side to memory barriers, but only when the matching memory barrier is executed on the sender side. The update-side grace-period primitives are shown in Figure 6.10. These include the switch next urcu qparity() on lines 1–4, rcu old gp ongoing() on lines 6–15, wait for quiescent state() on lines 17–29, and synchronize rcu() on lines 31– 41. The switch next urcu qparity() function starts a new grace-period phase, where a pair of such phases make up a grace period. A single phase is insufficient for the same 121

 1 static void switch_next_urcu_qparity(void)
 2 {
 3   urcu_gp_ctr = urcu_gp_ctr ^ RCU_GP_CTR_BIT;
 4 }
 5
 6 static inline int rcu_old_gp_ongoing(long *value)
 7 {
 8   long v;
 9
10   if (value == NULL)
11     return 0;
12   v = ACCESS_ONCE(*value);
13   return (v & RCU_GP_CTR_NEST_MASK) &&
14          ((v ^ urcu_gp_ctr) & RCU_GP_CTR_BIT);
15 }
16
17 static void wait_for_quiescent_state(void)
18 {
19   struct reader_registry *i;
20
21   if (!registry)
22     return;
23   i = registry;
24   for (; i < registry + num_readers; i++) {
25     while (rcu_old_gp_ongoing(i->urcu_active_readers)) {
26       cpu_relax();
27     }
28   }
29 }
30
31 void synchronize_rcu(void)
32 {
33   internal_urcu_lock();
34   force_mb_all_threads();
35   switch_next_urcu_qparity();
36   wait_for_quiescent_state();
37   switch_next_urcu_qparity();
38   wait_for_quiescent_state();
39   force_mb_all_threads();
40   internal_urcu_unlock();
41 }

Figure 6.10 RCU update side using signals

reasons discussed in Section 6.4.3. This function simply complements the designated bit in the urcu_gp_ctr global variable. The rcu_old_gp_ongoing() function determines whether or not the thread with the referenced per-thread urcu_active_readers variable is still executing within an RCU read-side critical section that started before this grace-period phase. Lines 10–11 check whether there is no such thread, returning zero if so, given that a non-existent thread cannot be executing at all, let alone within an RCU read-side critical section. This holds for the whole grace period because thread registration requires holding the internal RCU lock. Otherwise, line 12 fetches the value, using the ACCESS_ONCE() primitive to defeat compiler optimizations that might otherwise cause the value to be fetched more than once. Line 13 then checks whether the corresponding thread is in an RCU read-side critical section, and, if so, line 14 checks whether that RCU read-side critical section predates the beginning of the current grace-period phase. The wait_for_quiescent_state() function waits for each thread to pass through a quiescent state, thereby completing one phase of the grace period. Lines 21–22 return immediately if there are no threads. Otherwise, the loop spanning lines 23–28 waits for each thread to exit any pre-existing RCU read-side critical section. The synchronize_rcu() primitive waits for a full grace period to elapse. Line 33 acquires a pthread mutex that prevents concurrent synchronize_rcu() invocations from interfering with each other and with reader thread registration. Line 40 releases this same pthread mutex. Line 34 ensures that any thread that sees the start of the new grace period (line 35) will also see any changes made by the caller prior to the synchronize_rcu() invocation. Line 35 starts a new grace-period phase, and line 36 waits for it to complete. Lines 37 and 38 similarly start and end a second grace-period phase. Line 39 forces each thread to execute a memory barrier, ensuring that each thread will see any destructive actions subsequent to the call to synchronize_rcu() as happening after any RCU read-side critical section that started before the grace period began. Of course, as with the other two RCU implementations, this implementation's synchronize_rcu() primitive is blocking. The next section shows a way to provide wait-freedom to RCU updates as well as to RCU readers.
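To make the update-side usage pattern concrete, here is a minimal sketch of how an updater might combine these primitives to replace a published structure and reclaim the old copy. The node type, the shared pointer and update_node() are hypothetical names introduced for this illustration; rcu_assign_pointer() and synchronize_rcu() are the library primitives described in this chapter, assumed to be in scope.

#include <stdlib.h>

/* Hypothetical node type; readers access "shared" through
 * rcu_dereference() within read-side critical sections. */
struct node {
    int data;
};

struct node *shared;

void update_node(struct node *new_node)
{
    struct node *old = shared;

    rcu_assign_pointer(shared, new_node); /* publish the new version */
    synchronize_rcu();  /* wait for pre-existing readers to complete */
    free(old);          /* no reader can still reference the old copy */
}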

6.4.5 Wait-Free RCU Updates

Although some algorithms use RCU as a first-class technique, in most situations RCU is instead simply used as an approximation to a garbage collector. In these situations, given sufficient memory, the delays built into synchronize_rcu() need not block the algorithm itself, just as delays built into an automatic garbage collector need not block a wait-free algorithm. One way of accomplishing this is shown in Figure 6.11, which implements the asynchronous call_rcu() primitive found in the Linux kernel. Lines 4 and 5 initialize an RCU callback, and line 6 uses a wait-free enqueue algorithm [68] to enqueue the callback on the rcu_data list. This call_rcu() function is then clearly wait-free. A separate thread would remove and invoke these callbacks after a grace period has elapsed, using synchronize_rcu() for this purpose, as shown on lines 9–24 of Figure 6.11, with each pass of the loop spanning lines 14–23 waiting for one grace period. Line 15 uses a (possibly blocking) dequeue algorithm to remove all elements from the rcu_data list en masse, and line 16 waits for a grace period to elapse. Lines 17–21 invoke all the RCU callbacks from the list dequeued by line 15. Finally, line 22 blocks for a short period to allow additional RCU callbacks to be enqueued.

 1 void call_rcu(struct rcu_head *head,
 2               void (*func)(struct rcu_head *head))
 3 {
 4     head->func = func;
 5     head->next = NULL;
 6     enqueue(head, &rcu_data);
 7 }
 8
 9 void call_rcu_cleanup(void)
10 {
11     struct rcu_head *next;
12     struct rcu_head *wait;
13
14     for (;;) {
15         wait = dequeue_all(&rcu_data);
16         synchronize_rcu();
17         while (wait) {
18             next = wait->next;
19             wait->func(wait);
20             wait = next;
21         }
22         poll(NULL, 0, 1);
23     }
24 }

Figure 6.11 Avoiding update-side blocking by RCU

Note that the longer line 22 waits, the more RCU callbacks will accumulate on the rcu_data list. This is a classic memory/CPU trade-off, with longer waits allowing more memory to be occupied by RCU callbacks, but decreasing the per-callback CPU overhead. Of course, the use of synchronize_rcu() causes call_rcu_cleanup() to be blocking. However, as long as the callback function func that was passed to call_rcu() does nothing other than free memory, as long as the synchronization mechanism used to coordinate RCU updates is wait-free, and as long as there is sufficient memory for allocations to succeed without blocking, RCU-based algorithms that use call_rcu() will themselves be wait-free.
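As an illustration of this wait-free update pattern, the following sketch shows how a hypothetical removal path could hand an unlinked node to call_rcu() instead of blocking in synchronize_rcu(). The node layout and remove_node() are assumptions made for the example; the only real requirement is that the rcu_head be recoverable from the callback argument.

#include <stdlib.h>

/* Hypothetical node embedding the rcu_head as its first member,
 * so the callback can recover the node with a simple cast. */
struct node {
    struct rcu_head rcu;
    int data;
};

static void free_node(struct rcu_head *head)
{
    free((struct node *)head);  /* valid: rcu is the first member */
}

void remove_node(struct node *n)
{
    /* ... unlink n from the enclosing data structure ... */
    call_rcu(&n->rcu, free_node); /* returns immediately: wait-free */
}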

6.5 Experimental Results

This section presents benchmarks of each RCU mechanism presented in this paper with respect to each other, compared to mutexes, to reader-writer locks and to per-thread locks¹. It first demonstrates read-side scalability, then discusses the impact of read-side critical section length on the behavior of the respective locking primitives, and finally presents the impact of the update rate on read-side performance. The goal of this section is to clearly demonstrate in which situations RCU outperforms classic locking solutions, to help identify the workloads for which RCU can bring performance improvements to existing applications. The machines used to run the benchmarks are an 8-core Intel Core2 Xeon E5405 clocked at 2.0 GHz and a 64-core PowerPC POWER5+ clocked at 1.9 GHz. Each core of the PowerPC machine has 2 hardware threads. To eliminate thread-level contention for processor resources, benchmarks are performed with affinity to the 64 even-numbered CPUs of the 128 logical CPUs presented by the system. The mutex and reader-writer lock implementations used for comparison are the standard pthreads implementations from the GNU C Library 2.7 for 64-bit Intel and GNU C Library 2.5 for 64-bit PowerPC. STM (Software Transactional Memory) is not included in these comparisons because it is already known to incur high overhead and to scale poorly [69]. HTM (Hard-

1. The per-thread lock approach consists of using one mutex per reader thread. The updater threads must take all the mutexes, always in the same order, to exclude all readers. This approach ensures reader cache locality at the expense of slower write-side locking.

ware Transactional Memory) [70, 71, 72] is likely to be more scalable than STM. However, HTM hardware is expensive and uncommon, and was not available to us, preventing us from including it in our performance results.

6.5.1 Scalability

Figure 6.12 presents the read-side scalability comparison of each RCU mechanism with standard locking primitives on the PowerPC. The goal of this test is to determine how each synchronization primitive performs in heavy read-side scenarios as the number of CPUs increases. This is done by executing from 1 to 64 reader threads for 10 seconds, each taking a read-lock, reading a data unit and releasing the lock in a tight loop. No updater thread is present in this test. We observe that linear scalability is achieved for the RCU and per-thread mutex approaches. This is expected, given that readers do not need to exchange cache-lines. The QSBR approach is the fastest, followed by signal-based RCU, general-purpose RCU and the per-thread mutex, each adding a constant per-CPU overhead. The Intel Xeon behaves similarly and is not shown here.
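The reader workload just described can be pictured with the following sketch. It is a plausible rendering of the tight loop, not the actual benchmark source; test_stop, test_pointer and tot_nr_reads are hypothetical names, and the RCU read-side primitives are assumed to be in scope.

static volatile int test_stop;            /* set after 10 seconds */
static void *test_pointer;                /* the shared data unit */
static unsigned long long tot_nr_reads[64];

static void *reader_thread(void *arg)
{
    unsigned long long nr_reads = 0;

    while (!test_stop) {
        rcu_read_lock();                     /* take the read-lock */
        (void)rcu_dereference(test_pointer); /* read one data unit */
        rcu_read_unlock();                   /* release the lock */
        nr_reads++;
    }
    tot_nr_reads[(long)arg] = nr_reads;      /* arg: reader index */
    return NULL;
}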

[Plot: number of reads per second (0 to 9e+09) versus number of cores (0 to 70), with curves for QSBR, signal-based RCU, general-purpose RCU, per-thread mutex, pthread mutex and pthread reader-writer lock.]

Figure 6.12 Read-side scalability of various synchronization primitives, 64-core POWER5+

However, Figure 6.12 does not show the scalability trend of the pthread mutex and pthread reader-writer lock primitives. This is the purpose of Figure 6.13, which presents the scalability of these two primitives. As we can see, with more than 8 cores, overall performance actually decreases as the number of cores increases.

6.5.2 Read-Side Critical Section Length

Due to the large performance difference between RCU and the other approaches, linearly-scaled graphs are not appropriate for the following comparisons. Therefore, Figure 6.14 presents the impact of read-side critical section length using logarithmic x and y axes. This benchmark is performed with 8 reader threads taking the read lock, reading the data structure, waiting for a variable delay and releasing the lock, without any active updater. Interestingly, on this 8-core machine, we notice that starting at about 1000 cycles per critical section, the difference between RCU and per-thread locks becomes insignificant. At 20,000 cycles per critical section, the reader-writer locks are almost as fast as the other solutions. Only the pthread mutex has significantly worse performance for all critical section lengths. To appropriately present the 64-core read-side critical section length impact on read-side speed, we must first introduce the effects that alter reader-writer lock and mutex behavior, which explain why these two primitives saturate with read-side

[Plot: number of reads per second (0 to 9e+06) versus number of cores (0 to 70), for pthread mutex and pthread reader-writer lock.]

Figure 6.13 Read-side scalability of mutex and reader-writer lock, 64-core POWER5+

[Plot: number of reads per second (10 to 1e+09, logarithmic) versus read-side critical section length in cycles (10 to 1e+08, logarithmic), with curves for QSBR, signal-based RCU, general-purpose RCU, per-thread mutex, pthread reader-writer lock and pthread mutex.]

Figure 6.14 Impact of read-side critical section length, 8-core Intel Xeon, logarithmic scale

critical sections still significantly larger than the primitives using per-CPU data for read-side synchronization. First, the interprocessor cache-line exchange time affects the lock access time. Second, the number of cores needing to access the lock also affects the number of lock accesses per second. Therefore, we first present, in Figure 6.15, the equivalent POWER5+ graph with only 8 cores used, to specifically show the effect of the architecture and cache-line access time change. The cores are spaced with a stride of 8. Changing the stride to 1, 2 or 4 (not presented here for brevity) only very slightly affects read speed for the reader-writer lock and mutex. Cores close to each other share a common L2 and L3 cache on the POWER5+, which causes the reader-writer lock and mutex to be slightly faster at lower stride values. Given that it has no significant effect on the read-side critical section length at which the various locking primitives are equivalent, this factor can be left out of the rest of this study. The same workload executed with 64 reader threads is presented in Figure 6.16. These threads concurrently read the data structure with an added variable delay. Comparing with the 8-core graph in Figure 6.15, we notice that the critical section must be about 10 times larger (20,000 instead of 2000 cycles) before the reader-

[Plot: number of reads per second (100 to 1e+09, logarithmic) versus read-side critical section length in cycles (0.1 to 1e+06, logarithmic), same six synchronization primitives.]

Figure 6.15 Impact of read-side critical section length, 8 reader threads on POWER5+, logarithmic scale

writer lock performance reaches that of RCU and the per-thread locks. Therefore, as we increase the number of cores, critical sections protected by reader-writer locks must be larger to behave similarly to RCU and per-thread locks.

6.5.3 RCU Grace-Period Batch Calibration

After looking at read-side-only performance, it is appropriate to see how concurrent updates influence read-side behavior. To appropriately represent the RCU update-side performance impact, we must first calibrate the reclamation batch size to ensure we amortize the grace-period overhead over multiple updates. Such calibration is presented for the Intel Xeon and the POWER5+ in Figures 6.17 and 6.18, using 8 cores and 64 cores respectively. For the update operation benchmarks, we use half the cores for readers and the other half for updaters. We calibrate with the signal-based RCU approach, which is likely to have the highest grace-period overhead due to signal-handler execution. The ideal batch size for both architectures with 8 cores used is determined to be 32768 per updater thread. Given that the test duration is 10 seconds, we have to eliminate batch sizes large enough to be

[Plot: number of reads per second (100 to 1e+10, logarithmic) versus read-side critical section length in cycles (0.1 to 1e+06, logarithmic), same six synchronization primitives.]

Figure 6.16 Impact of read-side critical section length, 64 reader threads on POWER5+, logarithmic scale

a significant portion of the updates performed during the test, because non-reclaimed batches are not accounted for. This is why the largest batch sizes are ignored even if they seem slightly better. Figure 6.18 shows that with 64 cores used, the ideal batch size is slightly lower (4096) due to an increased per-update pointer exchange overhead caused by load-linked/store-conditional contention. Smaller batch sizes are therefore sufficient to amortize the grace-period overhead, and they perform slightly better due to increased cache locality. However, given that the performance difference is not very large, we use a 32768 batch size for both the 8-core and 64-core tests.
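The batching scheme being calibrated can be sketched as follows. This is an illustration only: BATCH_SIZE, defer_free() and the (per-updater, in the real benchmark) array are assumed names; the point is simply that one grace period is amortized over many deferred reclamations.

#include <stdlib.h>

#define BATCH_SIZE 32768  /* calibrated batch size, per updater thread */

static void *old_ptrs[BATCH_SIZE];
static int nr_old;

/* Called by an updater after replacing a pointer: defer reclamation
 * of the old copy, paying one grace period per BATCH_SIZE updates. */
static void defer_free(void *old)
{
    old_ptrs[nr_old++] = old;
    if (nr_old == BATCH_SIZE) {
        synchronize_rcu();  /* one grace period for the whole batch */
        for (int i = 0; i < nr_old; i++)
            free(old_ptrs[i]);
        nr_old = 0;
    }
}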

6.5.4 Update Overhead

Once batch-size calibration is performed, we can proceed to the update-rate impact comparison. Figure 6.19 presents the impact of update frequency on read-side performance for the various locking primitives. It is obtained by running 4 reader and 4 updater threads and varying the delay between updates. We notice that the RCU approaches outperform the per-thread lock approach, especially in terms of maximum updates per second. The former can reach 2 million updates per second while per-

[Plot: updates per second (1000 to 1e+08, logarithmic) versus updates per grace period per core (1 to 1e+06, logarithmic).]

Figure 6.17 Impact of grace-period batch-size on number of update operations, 8-core Intel Xeon, logarithmic scale

[Plot: updates per second (10000 to 1e+07, logarithmic) versus updates per grace period per core (1 to 1e+06, logarithmic).]

Figure 6.18 Impact of grace-period batch-size on number of update operations, 64-core POWER5+, logarithmic scale

thread locks can only perform 0.1 million updates per second. Interestingly, on such a workload with 4 tight-loop readers, mutexes outperform the reader-writer lock primitive in all aspects. Furthermore, reader-writer locks seem to show a case of reader starvation at high update rates. But while Figure 6.19 presents a fair comparison between the locking primitives, it presents a non-ideal scenario for RCU. In our attempt to present a comparison

[Plot: reads per second (1e+05 to 1e+10, logarithmic) versus updates per second (1 to 1e+07, logarithmic), with curves for QSBR, signal-based RCU, general-purpose RCU, per-thread mutex, pthread reader-writer lock and pthread mutex.]

Figure 6.19 Update overhead, 8-core Intel Xeon, logarithmic scale

where all locking primitives perform equivalent work, this figure includes the RCU pointer exchange overhead. To factor out this overhead, Figure 6.20 shows both the ideal RCU grace-period performance and the equivalent with pointer exchange added. This shows that it is the pointer exchange that becomes the update-rate bottleneck, not the grace period. Therefore, in an ideal scenario where pointer updates would be local to each thread, RCU could be expected to preserve its read-side scalability characteristics even under frequent updates. Such local updates could be ensured by appropriately designed list or hash table data structures. Figure 6.21 shows the update overhead on the 64-core POWER5+, with 32 reader and 32 updater threads. We can conclude that the RCU QSBR and general-purpose approaches reach the highest update rates, even compared to mutexes. This is attributed to the lower overhead of exchanging a pointer compared to the multiple atomic operations and memory barriers implied by acquiring and releasing a mutex. Mutex-based benchmark performance seems to drop starting at 30,000 updates per second with 32 updater threads. A similar effect is present with only 4 updater threads (graph not presented for brevity). Figure 6.19 seemed to show that update overhead stayed constant even at higher update frequencies for 4 updater threads on

[Plot: reads per second (1e+07 to 1e+10, logarithmic) versus updates per second (1 to 1e+09, logarithmic), comparing ideal QSBR, QSBR, ideal signal-based RCU, signal-based RCU, ideal general-purpose RCU and general-purpose RCU.]

Figure 6.20 Impact of pointer exchange on update overhead, 8-core Intel Xeon, logarithmic scale

the Xeon. Therefore, as the number of concurrent updaters increases, mutex behavior seems to depend on the architecture and on the specific GNU C Library version. Two approaches seem to be significantly affected by increasing the number of updaters. The reader-writer lock, where updaters clearly seem to be starved by readers, has a maximum update rate of 175 updates per second. Per-thread locks are limited to a maximum update rate of 10,000 updates per second with 32 reader threads. Finally, Figure 6.22 presents, as previously done for the Xeon, how grace-period detection alone (ideal RCU) compares to the RCU grace period with pointer exchange. Here too, with appropriately designed data structures, better update locality would ideally lead to constant updater overhead as the update frequency increases.

6.6 Conclusions

We have presented a set of RCU implementations covering a wide spectrum of application architectures. QSBR shows the best performance characteristics, but severely constrains the application architecture by requiring each reader thread to periodically pass through a quiescent state. Signal-based RCU performs almost as well as QSBR,

[Plot: reads per second (1e+05 to 1e+10, logarithmic) versus updates per second (1 to 1e+06, logarithmic), with curves for QSBR, signal-based RCU, general-purpose RCU, per-thread mutex, pthread reader-writer lock and pthread mutex.]

Figure 6.21 Update overhead, 64-core POWER5+, logarithmic scale

but requires reserving a signal. Unlike the other two, general-purpose RCU incurs significant read-side overhead. However, it minimizes constraints on application architecture, requiring only that each thread invoke an initialization function before entering its first RCU read-side critical section. The benchmarks demonstrate the read-side linear scalability of the RCU and per-thread lock approaches. They also show that the smallest read-side critical section duration for which the reader-writer lock, RCU and per-thread lock approaches are nearly equivalent in terms of read-side performance impact grows larger as the number of cores increases. These benchmarks also show that, by performing memory reclamation in batches, the RCU approaches reach update rates much higher than reader-writer locks, per-thread locks and mutexes on similar workloads where updates are performed on a shared data structure. Furthermore, given ideal data structures preserving update cache locality, the RCU approaches are shown to have a constant update overhead as update frequency increases. The upper bound for RCU update overhead is therefore demonstrated to be far below lock-based overhead, and it is possible to decrease RCU update-side overhead even further by designing data structures providing good update cache-locality.

[Plot: reads per second (1e+07 to 1e+10, logarithmic) versus updates per second (100 to 1e+09, logarithmic), comparing ideal QSBR, QSBR, ideal signal-based RCU, signal-based RCU, ideal general-purpose RCU and general-purpose RCU.]

Figure 6.22 Impact of pointer exchange on update overhead, 64-core POWER5+, logarithmic scale

Acknowledgements

We owe thanks to Maged Michael for many illuminating discussions, to Kathy Bennett for her support of this effort, to Etienne Bergeron and Alexandre Desnoyers for reviewing this paper. This material is based upon work supported by the National Science Foundation under Grant No. CNS-0719851. This work is funded by Google, Natural Sciences and Engineering Research Council of Canada, Ericsson and Defence Research and Development Canada.

Legal Statement

This work represents the views of the authors and does not necessarily represent the view of Ecole Polytechnique de Montreal, Harvard, IBM, or Portland State University. Linux is a registered trademark of Linus Torvalds.

Other company, product, and service names may be trademarks or service marks of others.

Chapter 7

Paper 4: Multi-Core Systems Modeling for Formal Verification of Parallel Algorithms

Abstract

Modeling parallel algorithms at the architecture level permits exploring the side-effects of the weak ordering performed by modern processors. Formal verification of such models with model-checking can ensure that algorithm guarantees will hold even in the presence of the most aggressive compiler and processor optimizations. This paper proposes a virtual architecture to model the effects of such optimizations. It first presents the OoOmem framework to model out-of-order memory accesses. It then presents the OoOisched framework to model the effects of out-of-order instruction scheduling. These two frameworks are explained and tested using weakly-ordered memory interaction scenarios known to be affected by weak ordering. Then, modeling of user-level RCU (Read-Copy Update) synchronization algorithms is presented. It uses the proposed virtual architecture to verify that the RCU guarantees are indeed respected.

7.1 Introduction

Formal verification of synchronization primitives for shared memory multiprocessor architectures is undoubtedly useful, given that bug occurrence depends on the architecture and execution context. An algorithm can appear bug-free when used in a large set of test cases. However, testing is unable to certify that compiler optimizations done for a different invocation of a synchronization primitive will work as expected. Portability is also hard to certify with testing, because it would involve testing the primitives on all architecture variants. Our principal motivation is to create detailed architecture-level models, taking weak instruction scheduling and memory access ordering into account, able to verify the user-space RCU (Read-Copy Update) implementations presented in [3]. The complexity level of these algorithms makes formal verification well worthwhile. This paper first depicts the modeling challenges, then summarizes the LTL model-checking principles and introduces modeling of parallel algorithms. This is followed by a presentation of the frameworks created to accurately model real architectures. Finally, the RCU library modeling and its verification are presented.

7.2 Modeling Challenges

Modeling multi-core systems for formal verification of parallel algorithm implementations brings interesting challenges, caused by the architecture and compiler ordering semantics. These challenges come from the presence of:
– compiler-level optimizations,
– execution of nested signal handlers,
on architectures with the following characteristics:
– shared-memory multiprocessor,
– weak memory-ordering,
– pipelined and superscalar,
– out-of-order instruction scheduling.
All these challenges are a direct result of our desire to model algorithms for production use. If these models were instead meant to be applied only to prototypes, we might be strongly tempted to assume a non-optimizing compiler and the (almost nonexistent) sequentially consistent machines, to avoid dealing with optimization, out-of-order execution and out-of-order memory access effects. We might also be tempted to assume signal handlers out of existence as a simplification. Concurrent algorithms have been modeled on weakly-ordered systems in the past [73], and interrupts (similar to signals) have been modeled as well [74]. However, the methods used in this past work to model weakly-ordered systems can result in a combinatorial explosion of the model. Models specifically covering the x86 [75], as well as PowerPC and ARM [76], architectures for parallel algorithm verification have also been proposed in the past. In this paper, we present a more general approach

that allows the model to follow the architecture behavior more closely than the previous RCU models and to take into account the weakest ordering amongst multiple architectures, therefore modeling weak-ordering effects more accurately. To further reduce the computational requirements of the validation process, we approximate the properties of the actual hardware, but in all cases model weaker ordering than the actual hardware provides. This weaker-ordering approximation ensures that any algorithm passing our validation will run correctly on conforming hardware. We propose a model representing the interprocessor interactions, which we call our virtual architecture. In this architecture, each Promela [77] process represents a processor. To model CISC architectures, complex instructions are divided into micro-operations, which are then represented as atomic Promela statements. This results in a RISC architecture, which makes it easy to detail micro-operation dependencies. Throughout this article, micro-operation and instruction are used as synonyms, given that both apply to our RISC virtual architecture. The models proposed here specifically use a data flow representation of each processor, where each node represents a micro-operation and each arc represents a data or control dependency between micro-operations. This permits accurately modeling all possible micro-operation interleavings, as seen from the point of view of their visible effect outside of the processor. We model the interactions between processors by creating a cache-memory interaction model.

7.3 Modeling and Model-Checking

This section summarizes the principles of LTL model-checking and presents an introduction to modeling of parallel algorithms. It constitutes the background on which the rest of this article is constructed.

7.3.1 LTL Model-Checking

The verification is carried out using Promela [77], a special-purpose modeling language for which the Spin verifier performs a full state-space search. This in turn allows all possible execution histories to be examined for specified classes of errors, including race conditions and livelock conditions.

The equivalence between data flow analysis and model-checking of abstract interpretations has been used [78, 79] to model simple sequential programs. Data flow analysis has also been used to verify properties of concurrent programs [80]. We represent our model in Promela and perform verification of properties expressed as LTL (Linear Temporal Logic) formulas¹. The model description expresses atomic statements executed by one or more processes. The Spin verifier [81, 82] transforms the model into a Büchi [83] automaton. The LTL formulas are negated and transformed into never-claims suited for verification of the model. The Spin verifier visits all atomic statements required to validate the LTL claims in all execution orders allowed by the model. One sequence used to visit atomic statements in a particular order forms a path. The LTL formalism allows basic logical operators over predicates:
– ⇒: logical implication,
– ⇐⇒: equivalence,
– ∧: conjunction,
– ∨: disjunction,
– ¬: negation.
To reason over the future of paths, temporal operators can be applied to predicates:
– G: temporal operator always (a predicate will always hold),
– F: temporal operator eventually (a predicate must eventually become true),
– U: strong until (a predicate holds at least until another predicate becomes true).
For instance, the formula G (req ⇒ F ack) states that, on every path, each state where req holds is eventually followed by a state where ack holds. Model checking permits exploring all execution scenarios required to verify a specific LTL property. Unlike simulation-based approaches, the model-checker only needs to generate the states required to verify the property, rather than performing an exhaustive state-space exploration. Since each Promela process is represented as an automaton, the complete state-space generated by parallel processes is the product of these automata. One major limitation is that LTL model-checking is PSPACE-complete. Reasoning about the specific predicates to verify allows limiting the state-space to the subset

1. See the Promela reference for equivalent LTL symbolism: http://spinroot.com/spin/Man/ltl.html

required to verify the predicates. This is why Spin enhances state-space exploration with lossless compression techniques based on the characteristics of the claims being validated. For instance, Partial Order Reduction [84] permits merging states for which the partial order does not affect the property to verify.

7.3.2 Introduction to Parallel Algorithm Modeling

As an introductory example, let us consider the verification of the busy-waiting lock primitives, usually known as spinlocks, present in the PowerPC and Intel architectures of Linux kernel 2.6.30. The spinlock implementation found in the PowerPC architecture is relatively straightforward: the lock consists of two states, either 0 or non-zero. It uses the "lwarx" (Load Word and Reserve Indexed) and "stwcx." (Store Word Conditional Indexed) instructions to atomically compare and update the lock value. The store only succeeds if the memory location has not been updated since the load. As an example, a Promela model of this locking primitive is presented in Figure 7.1. Line 1 contains the lock variable definition, followed by a data access reference count defined on Line 2. Lines 4–17 contain the spin_lock primitive. The inline function in Promela is close to that of the C language, except that such functions in Promela have type-free arguments, are not permitted to contain declarations, and do not return any value. Lines 6–16 contain the busy-waiting loop do ... od, stopped only by the break statement on Line 13 if the variable lock is 0. The skip statement on Line 10 is an empty statement; it has no effect other than satisfying the Promela grammar. Line 7 begins with ":: 1 ->", which indicates a condition that is always fulfilled. This lets the statements following the "->" execute unconditionally. Line 7 ends with a very important keyword: atomic. It precedes a sequence of statements, contained within brackets, which is considered indivisible. These statements are therefore executed in a single execution step, similarly to an atomic instruction executed by a processor. Lines 19–22 contain the unlock function. Lines 24–33 contain the body of the processes, which take the spinlock, increment and decrement the reference count, and release the lock in an infinite loop. Two instances of the process are run upon initialization by init at Lines 35–39. This Promela code is represented by the diagram found in Figure 7.2. Each node represents a Promela statement. A name is added to most nodes to make

 1 byte lock = 0;
 2 byte refcount = 0;
 3
 4 inline spin_lock(lock)
 5 {
 6     do
 7     :: 1 -> atomic {
 8         if
 9         :: lock ->
10             skip;
11         :: else ->
12             lock = 1;
13             break;
14         fi;
15     }
16     od;
17 }
18
19 inline spin_unlock(lock)
20 {
21     lock = 0;
22 }
23
24 proctype proc_X()
25 {
26     do
27     :: 1 ->
28         spin_lock(lock);
29         refcount = refcount + 1;
30         refcount = refcount - 1;
31         spin_unlock(lock);
32     od;
33 }
34
35 init
36 {
37     run proc_X();
38     run proc_X();
39 }

Figure 7.1 Promela model for PowerPC spinlock

interpretation easier. Arrows connecting the nodes represent how the model-checker can move between statements. Some arrows require conditions to be active, e.g. (lock == 1), to allow moving to the target node. The STEP++ statements on the arrows indicate that the execution counter is incremented. A concurrent process may run between different steps, but not while STEP stays invariant. The latter scenario happens in the ATOMIC box, which represents the atomic sequence of statements. Safety of this locking primitive is successfully verified by the Spin model-checker by verifying that the reference count value is never higher than 1. This is performed by prepending #define refcount_gt_one (refcount > 1) to the model and by using

[Diagram: nodes UNLOCKED (skip), TRY LOCK (skip), TAKE LOCK (lock = 1), CRITICAL SECTION (ref++, ref--) and UNLOCK (lock = 0), connected by STEP++ transitions; TRY LOCK and TAKE LOCK form the ATOMIC region, within which STEP stays invariant, and the conditions (lock == 0) and (lock == 1) select the outgoing arrows.]

Figure 7.2 Diagram representation of PowerPC spinlock model

the following LTL formula. PowerPC spinlock safety is verified by the LTL claim:

G (¬ refcount_gt_one)

However, one major downside of this spinlock implementation is its lack of fairness. A CPU repeatedly acquiring the same spinlock in a loop could effectively starve the other CPUs. This can be verified using the following LTL formula:

G F (¬ np)

The keyword np has a special meaning: it is true if all system states do not correspond to progress states. We modify the code from Figure 7.1 to insert such progress states: this involves separating the process body into two different definitions, in order to add a progress: keyword within the infinite loop of one of them. Using the Spin verifier's weak fairness option lets it detect non-progress cycles involving more than one process. This corresponds to starvation of a process by one or more other processes. Ticket spinlocks, used for the Intel spinlock implementation found in Linux, correct the fairness problem identified in the PowerPC implementation. The Promela implementation is omitted due to space considerations, but the state diagram is presented in Figure 7.3. The new elements added to this graph are the LOW_HALF() and HIGH_HALF() primitives, which select the lower and upper half of the lock's bits, respectively. The Spin model-checker verifies that this model is safe and fair, under certain conditions. Changing the number of bits available for the low and high halves of the ticket lock, as well as the number of processes, shows that fairness is only ensured when the number of processes fits in the number of bits available for each half.
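For readers unfamiliar with ticket locks, the following C sketch illustrates the algorithm being modeled. It is an illustration only, not the Linux kernel implementation: the kernel packs both halves into a single lock word (hence the LOW_HALF() and HIGH_HALF() primitives in the model), whereas this sketch uses two separate counters for clarity.

#include <stdatomic.h>

typedef struct {
    _Atomic unsigned int next;  /* next ticket to hand out (HIGH_HALF) */
    _Atomic unsigned int owner; /* ticket currently served (LOW_HALF) */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
    /* Atomically take a ticket; each contender gets a unique number. */
    unsigned int me = atomic_fetch_add(&l->next, 1);

    /* Wait until our ticket is served: FIFO order provides fairness. */
    while (atomic_load(&l->owner) != me)
        ; /* spin */
}

static void ticket_unlock(ticket_lock_t *l)
{
    atomic_fetch_add(&l->owner, 1); /* serve the next waiter */
}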

7.4 Weakly-Ordered Memory Framework

7.4.1 Architecture

Modeling out-of-order memory accesses performed by processors, at a level consistent with their hardware implementation, is important to enable accurate modeling of side-effects that can be caused by missing memory barriers in synchronization algorithms. The bugs within this category are hard to reproduce, mainly because they depend on the architecture, the execution context and the timing between the processors.

[Diagram: nodes UNLOCKED (skip), READ TICKET (ticket = HIGH_HALF(lock)), INC TICKET (HIGH_HALF(lock)++), WAIT TICKET (skip), CRITICAL SECTION (ref++, ref--) and UNLOCK (LOW_HALF(lock)++), connected by STEP++ transitions; READ TICKET and INC TICKET form the ATOMIC region, and WAIT TICKET loops while LOW_HALF(lock) != ticket, proceeding when LOW_HALF(lock) == ticket.]

Figure 7.3 Diagram representation of Intel ticket spinlock model

Therefore, testing the implementation might not be sufficient to certify the absence of bugs. To model an algorithm including the effects of the memory barriers required on various architectures, or more importantly the lack thereof, we choose to create a virtual architecture performing the most aggressive memory reordering. The Alpha 21264 seems to be an especially interesting architecture with respect to reordering, given its ability to reorder dependent loads [85, 86]. It also reorders loads after stores, stores after loads, loads after loads and stores after stores. Even atomic operations can be reordered with respect to loads and stores. The Alpha architecture can reorder dependent loads in addition to other memory accesses due to the design of its caches. These are divided into banks, which can each communicate with main memory. If the channel between a bank and memory is saturated, the updates on the less busy channels could reach their destination before loads or stores initiated earlier. Such extremely weak ordering can therefore cause any sequence of loads and stores to different addresses, including dependent loads, to be perceived in a different order from the point of view of another processor. Indeed, when memory content changes, in-cache view updates are not guaranteed to reach the cache-lines in the same order in which the main-memory stores appear. We name the weak memory ordering part of our virtual architecture OoOmem, where OoO stands for Out-of-Order. It models the exchanges between CPU cache-lines and main memory. To model the worst possible case, each variable belongs to a different cache-line and there is only one cache-line per bank. These cache-lines are therefore free to be updated (or not) from main memory between each instruction execution. Each CPU is modeled as a Promela process. Each variable is represented with a main memory entry, a per-CPU entry and a per-CPU dirty flag. Operations and accesses to these memory entries are performed through Promela macros created as part of the OoOmem framework to facilitate manipulation of the variables. We call these the "primitives" of the OoOmem framework. For each cache-local variable, the ooo_mem() primitive models both the case where it is updated and the case where it is not. This primitive must be called between each cache-line access. This causes all possible interleavings of exchanges between caches and memory to appear in the set of generated execution traces. Explicit memory ordering of loads and stores can be respectively forced by using

the smp_rmb() and smp_wmb() primitives. These memory barriers respectively ensure that all main-memory data is loaded into the local cache, and send the cache-local stores to memory. The per-variable, per-CPU dirty flag makes sure that a given CPU fetches data that it has recently written from its emulated cache rather than fetching stale data from main memory. The write_cached_var() primitive updates the cache-local version of a variable with a new value and sets the dirty flag for this variable on the local CPU. The dirty flag will let ooo_mem() and smp_wmb() subsequently commit this change to main memory. In addition, this ensures that neither ooo_mem() nor smp_rmb() overwrite the cache-local version before it has been committed to memory. The read-side equivalent is read_cached_var(), which loads a cache-local variable into the local process variables. Modeling other architectures such as the Intel, PowerPC and Sparc processor families, which do not reorder dependent loads, only requires replacing the conditional load part of the ooo_mem() primitive by a call to smp_rmb(). As a result, the local cache is unconditionally updated by loading all main memory variables into the non-dirty local cache-lines. The effect of the independent load reordering performed by these architectures is modeled by the out-of-order instruction scheduling model presented in Section 7.5. Modeling nested execution contexts, such as interrupt handlers, requires that the Promela process used to represent the interrupt handler execution share the per-CPU data with the Promela process modeling the interrupted processor.
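The state kept by the OoOmem framework for a single modeled variable, here alpha, can be rendered in C as follows. This is an illustrative sketch of what the Promela macros manipulate (the variable names match those visible in the execution trace of Section 7.5.2); the non-deterministic choice made by ooo_mem() is indicated as a comment.

#define NR_CPUS 2

/* One modeled variable: a main-memory copy plus, for each CPU,
 * a cache-line copy and a dirty flag. */
int mem_alpha;
int cached_alpha[NR_CPUS];
int cache_dirty_alpha[NR_CPUS];

/* write_cached_var(alpha, v): store into the local cache and
 * mark the cache-line dirty. */
void write_cached_alpha(int cpu, int v)
{
    cached_alpha[cpu] = v;
    cache_dirty_alpha[cpu] = 1;
}

/* smp_wmb() effect: commit dirty cache-lines to main memory. */
void wmb_alpha(int cpu)
{
    if (cache_dirty_alpha[cpu]) {
        mem_alpha = cached_alpha[cpu];
        cache_dirty_alpha[cpu] = 0;
    }
}

/* smp_rmb() effect: refresh non-dirty cache-lines from main memory,
 * never overwriting a dirty (not yet committed) local copy. */
void rmb_alpha(int cpu)
{
    if (!cache_dirty_alpha[cpu])
        cached_alpha[cpu] = mem_alpha;
}

/* ooo_mem(): between every cache access, the model performs each of
 * the two exchanges above, or not, non-deterministically. */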

7.4.2 Testing

In order to validate the accuracy of the framework, we use architectural litmus tests for which the results are known. One such litmus test involves processor A performing consecutive updates to a pair of variables, while processor B concurrently reads these same variables in reverse order. We expect that if the ordering is correct, whenever processor B sees the updated second variable, it must eventually read the updated first variable. This is expressed in the following pseudo-code and LTL formula:

Pseudo-code modeled:

alpha = 0; beta = 0;

Processor A          Processor B
alpha = 1;           x = beta;
smp_wmb();           smp_rmb();
beta = 1;            y = alpha;

LTL claim to satisfy:

G (x = 1 ⇒ F (y = 1))

This model is verified successfully by the Spin verifier. Error-injection is performed to ensure that the verifier would appropriately generate the erroneous execution traces if the required memory barriers were missing. This is performed by removing the smp_wmb() from Processor A or the smp_rmb() from Processor B. Verifying these two altered models shows the expected errors and execution traces: variables being either stored to or loaded from memory in the wrong order fails to verify the LTL claim.

7.5 Out-of-Order Instruction Scheduling Framework

Although the OoOmem framework presented earlier represents exchanges between cache and memory accurately, it does not reproduce all reordering performed at the processor level regarding out-of-order micro-operation (RISC instruction) scheduling. This section explains how we model these effects.

7.5.1 Architecture

Superscalar pipelined architectures leverage instruction-level parallelism by allowing multiple instructions to start concurrently and by reordering instruction completion. They can, more aggressively, reorder the sequence in which independent instructions are issued. Speculative execution can also cause instructions to execute before their result is proven to be needed. Such out-of-order instruction execution can be seen on the Alpha 21264 [86].

Our virtual architecture framework for out-of-order instruction scheduling therefore encompasses all possible instruction scheduling which can be done by either compiler optimizations (lifting, combining reads, re-loading a variable to diminish register pressure, etc.) or the processor. The weakest scheduling possible is bounded by the dependencies between instructions. In order to let the verifier explore all possible execution orders, our virtual processor framework, OoOisched, provides:
– an infinite number of registers,
– a pipeline with an infinite number of stages,
– a superscalar architecture able to fetch and execute an infinite number of instructions concurrently,
– and the ability to perform speculative execution of instructions which have no side-effect on the cache.
As in the OoOmem framework, one Promela process represents one processor. A key element of this framework is to have a compact representation of instruction execution state and dependencies. We choose to use a per-processor set of tokens, with a single token per instruction, to represent the dependencies. Tokens are produced by executing instructions and typically cleared only at the end of the instruction sequence. The conditions required to activate an instruction are represented by a set of tokens. Each token can be represented by a single bit. As an example, the dependencies of the test-case presented in Section 7.5.2 are modeled in the Promela listing in Figure 7.4 and illustrated in Figure 7.5. An instruction scheduling loop tests every instruction's dependency constraints in order to execute it. Execution of instructions is non-deterministic: when the dependencies of multiple instructions are met, any one of them may fire, but does not have to. The loop therefore explores all the possible execution orderings fitting within the dependency constraints. It proceeds until the end of the loop, which consists in executing the last instruction of the sequence. This last instruction clears all the tokens and breaks the instruction scheduling loop. When the bit allocated for an instruction is enabled, it inhibits its execution and enables its dependent instructions. The state of each CPU's execution is kept in a per-process data structure containing the current execution state tokens. The macro CONSUME_TOKENS(tokens, enable, inhibit) is presented in the Promela model in Figure 7.4. It is used as the trigger to execute an instruction. The parameter tokens is the token container of the current processor. The scheduler is allowed

to execute an instruction only if all the enable tokens are active and all the inhibit tokens are inactive. Its role is to check the pre-conditions for instruction execution, but it does not clear any token. The macro PRODUCE_TOKENS(tokens, prod) adds the tokens identified by prod to tokens. It is typically used at the end of an instruction's execution, producing that instruction's own token. Finally, CLEAR_TOKENS(tokens, clear) clears all the specified CPU tokens. It is typically used after the last instruction of the scheduling loop, but can also be used to partially clear the token set in order to produce loops. The diagram representing the Promela model in Figure 7.5 represents each instruction by a node. White arrows represent unmet dependencies and black arrows correspond to dependencies met. Colored nodes are those currently candidate for execution: all their dependencies are met, which means that all the tokens they consume are enabled and all the tokens that inhibit their execution are cleared. The column on the left represents the tokens associated with each instruction. We choose this representation of the data and control flow, rather than more classic token-based models like Petri nets [87] or coloured Petri nets [88], to allow easy injection of faults into the model. This would be cumbersome to do with a classic representation, where one instruction would produce a token later consumed by a following instruction. For instance, removing a read barrier or write barrier from the model would require completely modifying the dependency graph of the following instructions to make sure they now depend on prior instructions. Failure to do so would create an artificial synchronization barrier which would not model the error injection correctly. Since the token model provides the complete list of instruction dependencies, errors can be injected by enabling the tokens corresponding to the instructions to disable before entering the instruction scheduling loop. The effect is the inhibition of the instruction and the satisfaction of all dependencies normally met when this instruction is executed. Given that our virtual architecture models all possible sequences of code execution allowed by the data dependencies, only a few specific issues must be addressed to make sure compiler optimizations are taken into account. In our framework, a temporary per-process variable, corresponding to a processor register, should never be updated concurrently by multiple instructions. SSA (Static Single Assignment) [89] is an intermediate representation typically used in compilers, in which each variable is assigned exactly once. Using such a representation for registers would ensure having no more

than a single instruction using a temporary register, but this would cost additional state-space. Given that this resource is limited, we re-use registers outside of their liveness region. There are only two cases where we expect the compiler to re-use the result of loads. The first case is when the compiler is instructed to perform a single volatile access to load the variable into a register. The second case is when an explicit compiler barrier is added between the register assignment (load from cache) and the register use. In all other cases, re-use of loaded variables is taken into account by performing the two loads next to each other, owing to the speculative execution (prefetching) support in the scheduler. One limitation of this framework is that it adds an artificial compiler barrier and core synchronization between consecutive instruction scheduler executions. This does not take into account side-effects caused by scheduler execution within a loop, because instructions cannot cross the artificial synchronization generated by the last instruction executed at the end of the scheduler loop. This last instruction is required to clear all tokens before the next execution. Such effects can be modeled by unrolling the loop. Because the token container is already occupied by the outermost execution of the scheduler, recursion is also not handled by the framework. A supplementary container could be used to model the nested execution. However, using a different instruction scheduler for the nested context would fail to appropriately model the interleaving of instructions between different nesting levels. Therefore, nested calls must be expanded into the caller site.
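The three token primitives described in this section map naturally onto bitmask operations. The following C-style macros are a plausible rendering of their semantics, given here as a readability aid; the actual framework defines them as Promela macros.

/* True when all "enable" tokens are set and no "inhibit" token is:
 * the instruction's dependencies are met and it has not yet run. */
#define CONSUME_TOKENS(tokens, enable, inhibit) \
    ((((tokens) & (enable)) == (enable)) && !((tokens) & (inhibit)))

/* Mark an instruction as executed by setting its token(s). */
#define PRODUCE_TOKENS(tokens, prod) \
    ((tokens) |= (prod))

/* Clear a subset of tokens, e.g. at the end of the scheduling loop. */
#define CLEAR_TOKENS(tokens, clear) \
    ((tokens) &= ~(clear))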

7.5.2 Testing

Before introducing the more complex RCU model, we present a test model for the OoOisched framework. This model is based on both the OoOisched and OoOmem frameworks. It models an execution involving two processors and two memory locations. In this model, processor A successively writes to alpha and reads beta, while processor B successively writes to beta and then reads alpha. The test verifies that at least one processor reads the updated variable. This is shown in the following pseudo-code. The dependency constraints applied to the instructions executed by processor A are illustrated in Figure 7.5.

Pseudo-code modeled:

alpha = 0; beta = 0; x = 1; y = 1;

Processor A          Processor B
alpha = 1;           beta = 1;
smp_mb();            smp_mb();
x = beta;            y = alpha;

LTL claim to satisfy:

G (x = 1 ∨ y = 1)

This model is successfully verified by the Spin verifier. Error-injection is performed to ensure that the verifier would appropriately generate the erroneous execution traces if the required memory barriers were missing. This is performed by either:
– completely removing the smp_mb(),
– removing only the smp_rmb() part of the barrier,
– removing only the smp_wmb() part of the barrier,
– removing the implicit core synchronization provided by the smp_mb() semantics, which leaves the reads and the writes free to be reordered.
Verifying these altered models shows the expected errors and execution traces, where the reordered read or write instructions fail to verify the LTL claims. Removing core synchronization from the model presented in Section 7.5.2 permits verifying its behavior when injecting errors. The diagram presented in Figure 7.6 shows a snapshot of instruction execution with core synchronization removed. It shows that two instructions are candidates for execution: either alpha = 1 or x = beta. In this case, the store and load can be performed in any order by the instruction scheduler. As an example of the result of an error-injection, we present an execution trace generated by the Spin model-checker. We use the test model presented in Figure 7.4 with core synchronization removed. This partial execution trace excludes the empty execution and else-statements for conciseness. Some statements are also folded. Lines 6–8 show the instruction scheduler from processor B scheduling the two first instructions of this processor: production of the initial token and error-injection by

 1 #define PA_PROD_NONE (1 << 0)
 2 #define PA_WRITE (1 << 1)
 3 #define PA_WMB (1 << 2)
 4 #define PA_SYNC_CORE (1 << 3)
 5 #define PA_RMB (1 << 4)
 6 #define PA_READ (1 << 5)
 7
 8 byte pa_tokens;
 9
10 active proctype processor_A()
11 {
12     PRODUCE_TOKENS(pa_tokens, PA_PROD_NONE);
13
14     do
15     :: CONSUME_TOKENS(pa_tokens,
16             PA_PROD_NONE, PA_WRITE) ->
17         ooo_mem();
18         WRITE_CACHED_VAR(alpha, 1);
19         ooo_mem();
20         PRODUCE_TOKENS(pa_tokens, PA_WRITE);
21     :: CONSUME_TOKENS(pa_tokens,
22             PA_WRITE, PA_WMB) ->
23         smp_wmb();
24         PRODUCE_TOKENS(pa_tokens, PA_WMB);
25     :: CONSUME_TOKENS(pa_tokens,
26             PA_WRITE | PA_WMB,
27             PA_SYNC_CORE) ->
28         PRODUCE_TOKENS(pa_tokens, PA_SYNC_CORE);
29     :: CONSUME_TOKENS(pa_tokens,
30             PA_SYNC_CORE, PA_RMB) ->
31         smp_rmb();
32         PRODUCE_TOKENS(pa_tokens, PA_RMB);
33     :: CONSUME_TOKENS(pa_tokens,
34             PA_SYNC_CORE | PA_RMB,
35             PA_READ) ->
36         ooo_mem();
37         pa_read = READ_CACHED_VAR(beta);
38         ooo_mem();
39         PRODUCE_TOKENS(pa_tokens, PA_READ);
40     :: CONSUME_TOKENS(pa_tokens,
41             PA_PROD_NONE | PA_WRITE |
42             PA_WMB | PA_SYNC_CORE |
43             PA_RMB | PA_READ, 0) ->
44         CLEAR_TOKENS(pa_tokens,
45             PA_PROD_NONE | PA_WRITE |
46             PA_WMB | PA_SYNC_CORE |
47             PA_RMB | PA_READ);
48         break;
49     od;
50 }

Figure 7.4 Out-of-order instruction scheduling and memory frameworks Promela test code, Processor A

Figure 7.5 Instruction dependencies of out-of-order instruction scheduling and memory framework test

Figure 7.6 Instruction dependencies of out-of-order instruction scheduling and memory framework test (error injection)

producing the core synchronization token ahead of the instruction scheduler execution.

6: proc 1 (CPU_B) line 252 "mem.spin" (state 2) [pb_tokens = (pb_tokens|(1<

Line 10 presents the trigger permitting activation of the write instruction.

10: proc 1 (CPU_B) line 265 "mem.spin" (state 4) [((!((pb_tokens&(1<

Lines 19–21 show processor B updating its alpha and beta cache-view from memory. Processor B performs a random cache update. In this execution trail, it loads alpha and beta into its local cache.

19: proc 1 (CPU_B) line 125 "mem.spin" (state 30) [(!(cache_dirty_alpha[_pid]))]
19: proc 1 (CPU_B) line 125 "mem.spin" (state 31) [cached_alpha[_pid] = mem_alpha]
21: proc 1 (CPU_B) line 126 "mem.spin" (state 41) [(!(cache_dirty_beta[_pid]))]
21: proc 1 (CPU_B) line 126 "mem.spin" (state 42) [cached_beta[_pid] = mem_beta]

At Line 23, processor B writes to its cache view of beta and sets the matching dirty flag.

23: proc 1 (CPU_B) line 267 "mem.spin" (state 53) [cached_beta[_pid] = 1]
23: proc 1 (CPU_B) line 267 "mem.spin" (state 54) [cache_dirty_beta[_pid] = 1]

A succinct high-level summary of the execution trail follows:

CPU B: writes to in-cache beta
CPU A: writes to in-cache alpha
CPU A: smp_rmb()
CPU B: smp_wmb()
CPU B: smp_rmb()
CPU B: reads alpha from its cache
CPU A: smp_wmb()
CPU A: reads beta from its cache

At the bottom of the execution trail, the state for which the LTL condition did not hold is shown:

spin: trail ends after 161 steps
#processes: 1
mem_alpha = 1
cached_alpha[0] = 1
cached_alpha[1] = 0
cache_dirty_alpha[0] = 0
cache_dirty_alpha[1] = 0
mem_beta = 1
cached_beta[0] = 1
cached_beta[1] = 1
cache_dirty_beta[0] = 0
cache_dirty_beta[1] = 0
x = 0
y = 0
pa_tokens = 31
pb_tokens = 63
161: proc 0 (CPU_A) line 218 "mem.spin" (state 239)
2 processes created

At that point, both pa_read and pb_read contain 0. By examining the execution trace, we understand that this behavior is made possible by letting processor A execute its read memory barrier before the write memory barrier. This is allowed because the removed core synchronization permits reordering these unrelated types of barriers. Therefore, even given the known model limitations regarding loops and nesting, the instruction scheduling and weakly-ordered memory architecture models are sufficient to model RCU algorithms, as shown in Section 7.6.

7.6 Read-Copy Update Algorithm Modeling

Read-Copy Update (RCU) is a synchronization primitive allowing multiple readers of a data structure to execute concurrently with extremely low overhead [3]. Its main characteristic is to provide linear read-side scalability as the number of processors increases. It does so by allowing multiple copies of a data structure to exist at the same time. During a period of time called the grace period, each processor is allowed to see a different copy of the data structure. The RCU synchronization guarantees specify a lower bound on the duration of the grace period, after which no further references to old copies exist, so that the underlying memory becomes reclaimable.

The main motivation for validating the RCU algorithms is their complexity level. These algorithms are parallel and imply inconsistent views between processors at specific points in time. Also, because RCU's read-side primitives contain no standard mutual exclusion primitives, memory ordering must be performed by these same RCU primitives. This section presents the model we created to verify whether the grace-period and publication guarantees are satisfied by two RCU synchronization algorithms proposed in [3]: general-purpose RCU and low-overhead RCU via signal handling. It also verifies that both the updater and the readers always progress. We first describe the general RCU model, which is subsequently derived into a signal-based memory barrier model. We choose the Promela language to express the model based on prior successful modeling of low-level synchronization primitives with this language [73, 74]. Our motivation for using the Spin model-checker comes mainly from its level of maturity (it has been freely available since 1991) and stability. The Promela syntax, which is close to C, makes it straightforward for C programmers to translate C code into models. The model code uses precompiler directives to select architecture behavior and to perform error injection. The model consists of 1300 lines of Promela code. A schematic of the high-level structure of an RCU-based algorithm is shown in Figure 7.7. An RCU grace period is informally defined as any time period such that all RCU read-side critical sections in existence at the beginning of that period have completed before its end. Here, each box labeled "Reads" is an RCU read-side critical section that begins with rcu_read_lock() and ends with rcu_read_unlock(). Each row of RCU read-side critical sections denotes a separate thread, for a total of four read-side threads. The two boxes at the bottom left and right of the figure denote a fifth thread, this one performing an RCU update. This RCU update is split into two phases: a removal phase denoted by the lower left-hand box and a reclamation phase denoted by the lower right-hand box. These two phases must be separated by a grace period, which is determined by the duration of the synchronize_rcu() execution. During the removal phase, the RCU update removes elements from the data structure (possibly inserting some as well) by issuing an rcu_assign_pointer() or equivalent pointer-replacement primitive. These removed data elements will not be accessible to RCU read-side critical sections starting after the removal phase ends, but might still be accessed by RCU read-side critical sections

[Figure 7.7 shows a timeline: four reader threads each execute several read-side critical sections (“reads”), delimited by rcu_read_lock() and rcu_read_unlock() and containing rcu_dereference(); a fifth thread, the updater, performs the removal phase (rcu_assign_pointer()), then the grace period (synchronize_rcu() waits for completion of pre-existing reads), then the reclamation phase.]

Figure 7.7 Schematic of RCU grace period and read-side critical sections

However, by the end of the RCU grace period, all of the RCU read-side critical sections that might be accessing the newly removed data elements are guaranteed to have completed, courtesy of the definition of the grace period. Therefore, the grace-period guarantee ensures that the reclamation phase, beginning after the grace period ends, can safely free the data elements removed previously. The publication guarantee ensures that data accessed by the read-side through the rcu_dereference() primitive (always executed between rcu_read_lock() and rcu_read_unlock()) will see the changes made by the write-side before publication of the RCU pointer by rcu_assign_pointer().

The model which abstracts the RCU algorithm has one updater and one reader process. It is based on the OoOmem and OoOisched frameworks to verify that it satisfies the above-mentioned grace-period and publication guarantees when executed on a weakly-ordered processor and memory architecture. In addition to global and per-thread RCU synchronization variables, the data structures required are a pointer to the current RCU data and an arena: a pool of free memory used by the memory allocator. All these data structures are modeled with the OoOmem framework. The arena data is initially poisoned.

The updater process performs two loops which first update the data in a newly allocated arena entry from the OoOmem model and then publish a pointer to the new entry into another shared OoOmem variable using a modeled rcu_assign_pointer() primitive. At the same time as it stores the new pointer, the updater loads the previous value into its local registers; the corresponding entry is reclaimed after a grace period passes. The modeled synchronize_rcu() primitive is used to wait for such a grace period to reach a quiescent state. After that point, the arena entry corresponding to the old pointer can be poisoned. Memory reclamation is modeled by writing poison data into the arena entry. In a valid model, the read-side should never see such a poison value in the memory locations it reads. This method is used to detect incorrect publication and inappropriate use of reclaimed memory. The updater loops do not need to be unrolled, because the synchronize_rcu() primitive contains full memory barriers. Poisoning alone could spill into the next loop and overlap with stores to the newly allocated arena entry, but late-arriving poisoning stores are not relevant to the characteristics we validate.

The reader is modeled as one process entering two read-side critical sections. Each consists of a lock and unlock pair, between which are placed an rcu_dereference() and a read of the corresponding arena entry. The first critical section holds two nested read locks; the second holds a single nesting-level read lock. Given that the outermost code of successive read lock/unlock pairs can spill over each other, such spilling caused by reordering, prefetching and optimizations is modeled by these two successive critical sections. The data read within each critical section is saved to the globally visible data_read_first and data_read_second variables to be used for verification.

The guarantees provided by the RCU algorithm are verified with the LTL formula presented in (7.1), which makes sure the reader never loads a poisoned arena entry. The Spin model-checker checks for states which do not satisfy this claim. It generates an error and presents a counter-example, consisting of a faulty execution trace, when the claim is not satisfied.

G (data_read_first ≠ POISON ∧ data_read_second ≠ POISON)    (7.1)

To minimize the risk of modeling errors, we augment our models with error-injection regression tests. For each of the characteristics we need to validate, we create a model alteration which is known not to satisfy the characteristic. This permits verifying not only that the guarantees are sufficient to ensure model correctness, but also that the modeled guarantees are actually required, i.e., that if a predicate is not satisfied, an error will occur.

Remove smp_mb(): The first injected error removes the memory barriers surrounding synchronize_rcu(). This is known not to meet the grace-period guarantee, as it would let the pointer update spill over the whole grace period into the following quiescent state. This would therefore let poisoning occur before the pointer is updated.

Remove smp_wmb(): The second injected error removes the write memory barrier from the rcu_assign_pointer() primitive. This is known not to meet the publication guarantee, because the pointer could then be published before the newly allocated arena entry is populated with non-poisoned information.

Remove smp_rmb(): The third injected error removes the read memory barrier from the rcu_dereference() primitive. On Alpha (and only on Alpha), this is known not to meet the publication guarantee, because the reader's cache could be populated with the new pointer before the arena entry is updated. We therefore expect the reader to see poisoned data.

Single grace-period phase: The fourth error-injection test consists in altering synchronize_rcu() to perform only a single grace-period phase. This is expected not to meet the grace-period guarantee, by allowing a race condition between a reader and two consecutive updates.

The reader code is modeled as an infinite loop to verify the updater's progress when facing a steady flow of readers. Reader and updater progress are tested in two different runs:
– For reader progress, a single progress statement is added between each reader loop execution.
– On the updater side, progress statements are added in each update loop, and an infinite loop containing a progress statement is added at the end of the updater's process execution.

The weak fairness Spin option ensures that non-progress errors are flagged only for cycles containing at least one statement from each process, but not containing any special progress labels.

Remove smp_mb() from busy-loop: The fifth injected error removes the memory barrier placed in the updater's busy loop waiting for a grace-period phase to complete. This injects an updater progress error by allowing the updater's cache to never read the eventually updated reader nesting counter. On real systems, the bounded size of the buffers between the CPU, cache and memory interconnects ensures that the remote nesting counter update is eventually read, but given that our virtual architecture model assumes infinitely-sized buffers, an explicit memory barrier must be placed in the busy-loops to ensure data is being read.

The signal-based memory barrier is modeled as a derivation of the general model, by changing the updater-side memory barriers for a primitive which sends a signal and waits for memory barrier execution from the reader side, all this between two memory barriers. On the read side, the memory barriers are modeled by verifying, between each instruction execution, whether the execution status tokens appear to be in sequential execution order. If execution appears in sequential order between two instructions, the reader process chooses either to ignore any memory barrier request, or to service any number of memory barrier requests by issuing memory barriers and informing the updater side of their completion.

Due to the added complexity and the resulting state-space size explosion, modeling of read-side critical sections in signal handlers nesting over the updater and reader threads is performed using a different model, which assumes a sequentially ordered architecture combined with the OoOmem weakly-ordered memory framework. Given that signal handlers have the property of ordering the core execution before and after they execute, this allows using the safety and progress characteristics proven with the OoOisched framework as lemmas.
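The error-injection mechanism described above relies on preprocessor directives, which Promela shares with C. The following sketch expresses the pattern in C for readability; the macro and flag names (model_smp_wmb, INJECT_REMOVE_WMB) are illustrative, not taken from the actual model sources:

/* Stand-in write barrier for this sketch (a full fence suffices). */
#define smp_wmb()	__sync_synchronize()

/* Compiling with -DINJECT_REMOVE_WMB turns the write barrier into a
 * no-op, so the verification run is expected to flag a violation of
 * the publication guarantee (an INJECT result). */
#ifdef INJECT_REMOVE_WMB
#define model_smp_wmb()	do { } while (0)
#else
#define model_smp_wmb()	smp_wmb()
#endif

/* Publication primitive under test: the write barrier orders the
 * initialization of the pointed-to data before the pointer store. */
static void rcu_assign_pointer_model(void **p, void *v)
{
	model_smp_wmb();
	*p = v;
}

Each regression run toggles exactly one such flag, so a missing error report points either at a modeling blind spot or at a barrier that the modeled architecture does not actually require.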

7.6.1 State-Space Compression

Given that the state-space required to perform the verification can increase quickly, the following state-space compression techniques were used to perform the verifications.

Running the Spin model-checker to verify specific LTL formulas transformed into never claims permits checking for safety while performing Partial Order Reduction [84]. This model-checking approach discards relative statement orderings which do not matter for the property to verify. This reduces the state-space size tremendously with a very small performance impact, while preserving the safety and liveness properties of LTL.

Accepting a small performance impact (a perceived slowdown of a factor of 1.3 on our models), the COLLAPSE compression [82] can be used to reduce the state-space required to perform verification, by separating the state into sub-components. The compression comes from the fact that one state configuration for a specific process tends to reoccur across different global data and other process states. It uses separate descriptors as keys to encode and search global data objects and the data objects belonging to each process. Each time the same process state is encountered, it can be encoded with the same per-process state descriptor instead of saving the whole state, which saves precious state-space. A global state descriptor, used to identify the overall state, therefore consists of a state vector made of the global and per-process descriptors. This lossless compression preserves the complete state-space.

Another possible lossless compression technique, the minimized DFA (Deterministic Finite Automaton) encoding [82], can further diminish the state-space size by leveraging the high degree of similarity between the different states. It represents the state-space using an encoding similar to BDDs (Binary Decision Diagrams) [90, 91]. However, this compression technique incurs a prohibitive performance impact: our tests on large models show that state-space exploration is about 10 times slower. Given that the non-compressed execution of some verifications already takes about 24 hours, the computation time required for DFA compression is considered to be beyond our available computation time resources.

7.6.2 RCU Model-Checking Results

This section presents the Spin model-checker results for three models: the general-purpose user-space RCU model, the signal-based RCU model and the model of the signal-handler read-side. Test run results are presented along with the resources required to perform the verification. These verifications are performed on an Intel Core2 Xeon at 2.0 GHz with 16 GiB of RAM.

The only test result we really care about is whether the verification succeeds or fails. For the unaltered model and for progress verifications, the LTL claim or progress property is expected to hold. Such successful verifications are denoted PASS. For each error-injection run presented in Section 7.6, the expected result is that the model-checker detects the injected error, denoted INJECT. The notation FNEG would indicate that the model-checker was blind to an injected error, constituting a “false negative”. These are not errors as such, as not every bug necessarily results in a failure on every architecture modeled. Finally, the notation FAIL indicates that the model-checker detected a bug in the algorithm.

Additional information about the time and memory required to run these verifications is only provided to show the amount of computational resources needed for such verification. The only requirement is that the execution time and memory used fit within our available resource limits.

Tables 7.1 and 7.2 present the results of the general-purpose RCU model-checking using the Alpha and Intel/PowerPC virtual architectures, respectively. The safety and progress verifications are successful, and all error injections generate the expected errors, except one: on the Intel/PowerPC architecture, no error is generated when removing the smp_rmb() on the read-side. This shows that no read barrier is required on these architectures, due to the fact that dependent loads are not reordered.

Table 7.1 General-purpose RCU verification results for the Alpha architecture

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     1.06           2h33m
Remove smp_mb()              INJECT   1.07           1h06m
Remove smp_wmb()             INJECT   0.96           1h14m
Remove smp_rmb()             INJECT   0.52           9m
Single grace-period phase    INJECT   0.66           26m
Reader progress              PASS     1.76           10h38m
Updater progress             PASS     1.76           9h23m
Remove loop smp_mb()         INJECT   0.47           1m

Table 7.3 presents the verification results of the signal-based RCU model for the Alpha virtual architecture. This verification ensures that signal-based memory barriers provide the memory ordering guarantees and that no livelock nor deadlock can occur. Progress verification requires using the COLLAPSE Spin option to compress the state-space. It takes 3.5 days to complete the updater progress verification. To reduce the required CPU time, the reader progress and updater progress error-injection verifications are performed on a simplified read-side model with only a single, non-nested critical section. Updater progress has also been verified using this simpler model, resulting in a successful progress verification.

Table 7.2 General-purpose RCU verification results for the Intel/PowerPC architectures

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     0.82           4m
Remove smp_mb()              INJECT   1.96           16m
Remove smp_wmb()             INJECT   0.79           5m
Remove smp_rmb()             FNEG     0.82           6m
Single grace-period phase    INJECT   0.61           1m
Reader progress              PASS     1.28           23m
Updater progress             PASS     1.28           23m
Remove loop smp_mb()         INJECT   0.47           0m

Table 7.3 Signal-based RCU verification results for the Alpha architecture

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     5.21           1d14h18m
Remove smp_mb()              INJECT   0.59           13m
Remove smp_wmb()             INJECT   5.17           1d14h33m
Remove smp_rmb()             INJECT   1.20           2h43m
Single grace-period phase    INJECT   3.09           5h45m
Reader progress              PASS     0.89           2h02m
Updater progress             PASS     11.73          4d15h13m
Remove loop smp_mb()         INJECT   0.472          1m

As in the Alpha signal-based RCU verification, the Intel/PowerPC model verification requires using the COLLAPSE compression to fit in the available memory. Here we notice that each progress verification takes approximately 4 hours and uses about 10 GiB of memory. As in the general-purpose RCU model, the smp_rmb() removal does not cause any error on Intel/PowerPC, because the architecture does not reorder dependent loads.

Table 7.4 Signal-based RCU verification results for the Intel/PowerPC architectures

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     6.98           1h43m
Remove smp_mb()              INJECT   1.18           4m
Remove smp_wmb()             INJECT   6.98           1h42m
Remove smp_rmb()             FNEG     6.98           1h45m
Single grace-period phase    INJECT   4.28           16m
Reader progress              PASS     10.08          4h18m
Updater progress             PASS     9.88           4h18m
Remove loop smp_mb()         INJECT   0.58           3m

Modeling of a read-side signal handler nested over a reader thread is presented in Table 7.5. This model executes a read-side critical section in a signal handler interrupting a reader thread. We perform this verification because a read-side critical section in a signal handler generates execution traces where the nested signal handler could deadlock with the interrupted process, if the signal handler were to busy-loop waiting for the interrupted process.

Table 7.6 presents the results obtained by modeling an interrupting read-side signal-handler critical section nested over the updater thread. It presents an interesting result: given that all read-side critical sections are contained within signal handlers nested over the updater, no memory barrier is required to ensure correctness, because no cache-line exchange is required. In fact, only a single process is executing.

For each of the unaltered models checked, model coverage includes all of the RCU model lines, but excludes some OoOmem model operations which are not useful in some contexts. For instance, the OoOmem “random” store to memory will never be executed if a process never writes into a given variable. Error-injection runs do not need to visit the whole state-space, because they stop at the first error encountered. Therefore, these self-testing runs do not need to provide complete coverage.

Table 7.5 General-purpose RCU signal-handler reader nested over reader verification (no instruction scheduling)

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     4.35           10m
Remove smp_mb()              INJECT   1.60           5m
Remove smp_wmb()             INJECT   0.78           2m
Remove smp_rmb()             INJECT   1.60           2m
Single grace-period phase    INJECT   0.57           0m
Reader progress              PASS     9.21           1h56m
Updater progress             PASS     9.15           1h03m
Remove loop smp_mb()         INJECT   0.51           0m

Table 7.6 General-purpose RCU signal-handler reader nested over updater verification (no instruction scheduling)

Regression Test              Result   Memory (GiB)   Time
Unaltered model (safety)     PASS     0.47           0m
Remove smp_mb()              FNEG     0.48           1m
Remove smp_wmb()             FNEG     0.47           1m
Remove smp_rmb()             FNEG     0.47           0m
Single grace-period phase    FNEG     0.47           0m
Reader progress              PASS     0.47           1m
Updater progress             PASS     0.47           1m
Remove loop smp_mb()         FNEG     0.47           1m

7.6.3 RCU Verification Discussion

Results presented in Section 7.6.2 demonstrate that we were able to successfully verify the RCU algorithm models in various execution scenarios with affordable computation resources. The error-injection tests further demonstrate that the model is able to detect defects that do not respect the RCU guarantees.

In these tests, the number of updaters has been limited to one, given that updater critical sections are protected by a mutual exclusion primitive already expected to be valid. The number of readers is also fixed to one, because the updater waits, in turn, for each reader one after the other; the algorithm therefore does not contain any reader-reader data or control dependency.

As expected, the model of the read-side signal handler nested over an RCU reader succeeds, because the RCU read-side executes with O(1) computational complexity, which implies that it never busy-loops. The simplified read-side-in-signal-handler model does not perform instruction execution reordering. Given the proof provided by the previous verifications, the nested signal handler execution can be modeled as being serialized with the rest of the interrupted code, because the operating system is called before and after the signal handler. The OoOmem model is however still used to appropriately take the out-of-order memory effects into account. This models the Alpha virtual architecture, which is a superset of the Intel/PowerPC virtual architecture, given that it allows weaker memory ordering.

Verification of an interrupting read-side signal-handler critical section nested over the updater thread interestingly shows that the read-side signal handler can nest over the updater without causing progress errors (no livelock nor deadlock). The single grace-period phase test shows no error. This can be explained by the fact that the execution trace which requires two grace-period phases involves the reader seeing two updater updates. This execution trace is impossible here, because the updater is being interrupted by the nested read-side signal.

Error-injection tests have been very useful to ensure model completeness. For instance, trying the test-case presented in Section 7.5.2 on the OoOmem model showed its limitations. Changing the smp_mb() into consecutive smp_rmb() and smp_wmb() (which are free to be reordered) did not produce the expected error. This showed that we needed to model out-of-order instruction scheduling to properly represent this class of CPU instruction reordering effects, effectively leading to the creation of the OoOisched model.

Another example where error-injection has been useful happened during the creation of the OoOisched-based RCU model. Because the OoOisched framework is based on an instruction scheduling loop, we can only use the model coverage information provided by Spin as an indication that statements have been reached at least once; it tells nothing about the execution orders visited. Instruction dependency implementation errors, which incorrectly inhibited the execution of some instructions, were identified with the help of these error-injection tests.

We also created a model for uniprocessor execution of the RCU algorithm. The code generated for this model has the particularity that all memory barriers are replaced by compiler barriers, except smp_read_barrier_depends(), on Alpha, which is completely removed. In this model, a single processor cache is used by both the reader and the writer processes. No communication is required with main memory, given that all accesses go through the locally cached variables. Therefore, out-of-order memory updates are disabled. The results of the tests, not presented here for conciseness, show that simply using compiler barriers suffices to make RCU safe against thread preemption on a uniprocessor system.
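For illustration, a uniprocessor barrier mapping in the spirit of what this model assumes (and of the Linux kernel's uniprocessor barrier definitions) can be sketched as follows; this is a sketch, not the generated model code:

/* Uniprocessor mapping: SMP memory barriers degenerate to compiler
 * barriers, which only prevent the compiler from reordering memory
 * accesses across them; no hardware fence is emitted. */
#define barrier()	__asm__ __volatile__("" ::: "memory")

#define smp_mb()	barrier()
#define smp_rmb()	barrier()
#define smp_wmb()	barrier()
/* On Alpha, smp_read_barrier_depends() is removed entirely. */
#define smp_read_barrier_depends()	do { } while (0)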

7.7 Framework Discussion

Compared to models used previously for RCU verification, the proposed framework covers more micro-architecture side-effects. This includes, for instance, the effects of data prefetch. Moreover, the state-space size required by our framework has been shown to be manageable on current computers when modeling complex synchronization algorithms such as RCU. This shows that it should be applicable to other parallel algorithms with a similar complexity level.

One of the major improvements of this modeling framework is to allow a more regular description of algorithms. It removes the need to account for low-level architecture side-effects directly in the algorithm model, by providing artefacts which encapsulate the architecture behavior. This framework therefore minimizes the risk of modeling errors.

Due to its ability to model the weakest ordering possible, altering the framework to model memory barriers specific to architectures such as Alpha, Intel, PowerPC and even Sparc is straightforward. Modeling specific architectures can be done by creating the synchronization instructions implemented in a given architecture and modifying the behavior of the cache-memory synchronization to match the architecture behavior. For instance, the PowerPC “lwsync” instruction 2 can be modeled as two instructions. The first is an smp_rmb() which depends on all prior loads, and upon which all following loads and stores depend. The second is an smp_wmb(), which depends on all prior stores, but upon which only the following stores depend. Such flexibility in modeling the low-level synchronization primitives becomes very handy for the Sparc “membar” primitive, which permits ordering either some or all of:
– stores vs stores,
– stores vs loads,
– loads vs loads, or
– loads vs stores.
In the case of the RCU algorithm model, we only need full memory barriers.

We are aware of one recently proposed compiler optimization not handled by our model. Value-speculative optimizations [92, 93] performed by the compiler could cause dependent loads to be performed out of order if the first data to be read is speculated, which would permit reading dependent data in the wrong order. These dependency-breaking optimizations are outside the scope of the proposed model. Work in progress for upcoming versions of the C++ standard includes compiler mechanisms designed to selectively suppress value speculation [94, 95, 96].
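For concreteness, the PowerPC and Sparc primitives discussed above correspond to the following inline assembly; these macros illustrate the instructions themselves and are not part of the framework:

/* PowerPC: lwsync orders loads vs later loads/stores, and stores vs
 * later stores; it does not order a store against a later load. */
#define powerpc_lwsync() \
	__asm__ __volatile__("lwsync" ::: "memory")

/* Sparc V9: membar takes a mask selecting which orderings to enforce. */
#define sparc_membar_storestore() \
	__asm__ __volatile__("membar #StoreStore" ::: "memory")
#define sparc_membar_full() \
	__asm__ __volatile__("membar #LoadLoad | #LoadStore | " \
			     "#StoreLoad | #StoreStore" ::: "memory")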

7.8 Conclusion

To accurately model the low-level multiprocessor interactions at the architecture level, we created a virtual architecture performing the most aggressive optimizations still meeting the instruction inter-dependencies. Memory access ordering is expressed by modeling a processor cache with extremely weak ordering. A model of instruction dependencies deals with the effects of out-of-order instruction execution. Formal verification of both general-purpose RCU and signal-based RCU has been performed on this virtual architecture, thereby modeling the effects of out-of-order instruction execution and out-of-order memory accesses.

2. lwsync - Lightweight synchronization: orders loads with respect to subsequent loads and stores; orders stores with respect to other stores; does not order stores with respect to subsequent loads.

The high complexity of these RCU algorithms, caused by their high degree of parallelism and extremely relaxed consistency semantics, can easily overwhelm human comprehension. This is why validation at the lowest level of interprocessor interaction is needed to certify that these algorithms perform the expected synchronization.

Future work in this area could involve modeling value-speculative compiler optimizations, to enable detection of the ordering problems which can occur when dependent memory accesses are reordered by dependency-breaking compiler optimizations.

Modeling these algorithms on this virtual architecture lets us demonstrate that all invocations of the algorithm's primitives behave appropriately and that porting it to yet unforeseen architectures will work as expected.

Acknowledgements

We owe thanks to Nicolas Gorse, Etienne Bergeron and Alexandre Desnoyers for reviewing this paper, to Maged Michael and Alan Stern for many illuminating discussions, and to Kathy Bennett for her support of this effort. This material is based upon work supported by the National Science Foundation under Grant No. CNS-0719851. This work is funded by Google, Natural Sciences and Engineering Research Council of Canada, Ericsson and Defence Research and Development Canada.

Legal Statement

This work represents the views of the authors and does not necessarily represent the view of Ecole Polytechnique de Montreal or IBM.

Other company, product, and service names may be trademarks or service marks of others.

Chapter 8

Complementary Results

This chapter presents supplementary research results not discussed in the core thesis articles. It includes a discussion of the work performed on Kernel Markers, Tracepoints and Immediate Values, as well as a presentation of latency benchmarks, a discussion of the real-time determinism impact, and a formal verification of the LTTng tracer.

8.1 Kernel Markers

The article “LTTng: Tracing across execution layers, from the Hypervisor to user-space” [28] refers to the Linux Kernel Markers, created as part of this research and now integrated in the mainline Linux kernel.

The initial motivation for creating this static instrumentation infrastructure is that the Kprobe mechanism, a pre-existing dynamic instrumentation mechanism, adds a large performance overhead because it depends on breakpoints. Section 5.5.8 presented the benchmarks justifying the choice of static over dynamic instrumentation, mainly due to its lower overhead.

Comparing static instrumentation with the Kprobe-based instrumentation found in SystemTAP also provides a second motivation for using static instrumentation: its ability to more easily follow source code changes in an open source project like the Linux kernel.

This instrumentation mechanism allows us to enable instrumentation of the Linux kernel at the source-code level. It consists essentially of a C preprocessing macro which adds, in the instrumented function, a branch over a function call. By doing so, neither the stack setup nor the function call is executed when instrumentation is not enabled. At runtime, each marker can be individually enabled, which makes the branch execute both the stack setup and the function call.

The design of this infrastructure is voluntarily biased to minimize the performance overhead when tracing is disabled. We give the compiler a hint to position the instructions executed only when tracing is enabled away from the cache lines involved in standard kernel execution, by identifying the branch executing the stack setup and function call as unlikely (using gcc's __builtin_expect()).

The performance impact of the marker mechanism is small, albeit globally hard to notice with performance benchmarks. Most of it comes from the added branch, although the added register pressure, and the transformation of some leaf functions into non-leaf functions due to the added function calls, also contribute to the performance overhead.

A drawback of the Linux Kernel Markers is that they limit type verification to scalar types, because their API is based on format strings. However, this allows us to easily add a new marker to the source code by modifying a single line. The limited type-checking can be problematic if pointers must be dereferenced by the tracer code. For instance, a pointer to a structure could be passed as parameter, with the intent of letting the probe access specific fields in this structure. However, the Linux Kernel Markers only permit exporting a universal pointer (void pointer), leaving any specific type-checking impossible.

A second issue is that the Markers hide the instrumentation in the source code, keeping no global registry of the instrumentation. It is thus hard to impose namespace conventions and to keep track of instrumentation modifications without monitoring the whole kernel tree.
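The branch-over-call pattern described above can be sketched as follows. This is a simplified illustration rather than the actual markers implementation; the per-marker symbols __mark_##name##_enabled and __mark_##name##_call are hypothetical stand-ins:

#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Simplified marker site: when disabled, only a load of the enable
 * flag and a predicted not-taken branch execute; the stack setup and
 * probe call are confined to the unlikely path, which the compiler
 * places away from the hot cache lines. */
#define trace_mark(name, fmt, ...)					\
	do {								\
		if (unlikely(__mark_##name##_enabled))			\
			__mark_##name##_call(fmt, ##__VA_ARGS__);	\
	} while (0)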

8.2 Tracepoints

After experimenting with the Linux Kernel Markers, we decided to correct the two downsides of this infrastructure. Hence, we created Tracepoints to address these problems. They are extensively based on the Linux Kernel Markers code, with major modifications performed to support full type checking. Tracepoints are now integrated in the Linux kernel and already used extensively.

The main difference between Tracepoints and Kernel Markers is that tracepoints require an instrumentation declaration in a global header. This allows type-aware verification of the tracer probes (callbacks) connected to the instrumentation site, by declaring both the instrumentation call and the probe registration and unregistration functions within the same declaration macro, which is aware of the expected types. It thus ensures that the caller and callee types match. For example, we have the following tracepoint declaration in a global header to instrument scheduler activity:


#include

DECLARE_TRACE(sched_switch,
	TP_PROTO(struct rq *rq, struct task_struct *prev,
		 struct task_struct *next),
	TP_ARGS(rq, prev, next));

Used to instrument the function:

DEFINE_TRACE(sched_switch);

static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	[...]
	trace_sched_switch(rq, prev, next);
	[...]
}
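A probe with a matching prototype can then be connected through the registration function generated by the declaration; the probe body below is a hypothetical sketch:

/* Hypothetical probe: DECLARE_TRACE() generates
 * register_trace_sched_switch() with a matching prototype, so a probe
 * with a mismatched signature is rejected at compile time. */
static void probe_sched_switch(struct rq *rq, struct task_struct *prev,
			       struct task_struct *next)
{
	/* record the scheduling event, e.g. into a trace buffer */
}

static int tracer_init(void)
{
	/* returns 0 on success */
	return register_trace_sched_switch(probe_sched_switch);
}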

It also provides the needed global instrumentation registry: all global tracepoint declarations are kept in the include/trace/ directory of the Linux kernel tree.

The rest of the infrastructure is similar to the Linux Kernel Markers. The commit log of the Tracepoints merge into the Linux kernel 1 shows that the performance overhead of added tracepoints is very small: hackbench kernel scheduling benchmarks show results ranging from a degradation of less than 2 % to an acceleration of a similar amount, which can be attributed to operating system noise and cache effects.

8.3 Immediate Values

As pointed out in Section 8.1 on Kernel Markers, one of the main sources of performance overhead in the disabled static instrumentation techniques presented above is reading a value from memory to test and branch over the disabled stack setup and function call.

1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=97e1c18e8d17bd87e1e383b2e9d9fc740332c8e2

To overcome this problem, the Immediate Values infrastructure, presented in [28], replaces the standard memory read loading the condition variable by a constant folded into the immediate-value encoding of an instruction operand. This removes any data-memory access needed to test for disabled instrumentation, by keeping all the information encoded in the instruction stream. However, this involves dynamically modifying code safely against concurrent multiprocessor accesses. This requires either stopping all processors for the duration of the modification, or using a more complex, yet more lightweight, core synchronization mechanism; our choice is the temporary breakpoint bypass [97].

An objective of this infrastructure is to minimize interference with the optimizations performed when compiling the Linux kernel with gcc. Therefore, all constraints gcc imposes on what can be done in its inline assembly extensions must be taken into account. Jumping outside of assembly statements is forbidden. Given that the function call performed by the instrumentation is produced by the compiler, it cannot be added to the inline assembly.

Initial experiments replacing the immediate-value load and test by a static jump (itself jumping to either the enabled or the disabled branch) are promising. However, a proper implementation of this code-flow activation technique requires compiler-level modifications, because knowledge of basic block locations is required and inline assembly is not permitted to jump to gcc-generated code. A team of kernel and compiler developers at Redhat has already started to tackle the task of extending gcc to add this new feature.
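A grossly simplified x86 sketch of the idea follows. The name imv_cond() is hypothetical, and the runtime code-patching machinery (recording the operand's location, synchronizing cores through the breakpoint bypass) is omitted:

/* The enable state lives in the instruction stream as the immediate
 * operand of a mov instruction, so testing it performs no data-memory
 * access. Enabling a site means patching the $0 below to $1 in place. */
static inline unsigned char imv_cond(void)
{
	unsigned char cond;

	__asm__ ("movb $0, %0" : "=q" (cond));
	return cond;
}

The address of the patched operand must be recorded at build time so that the update code can locate it; managing this bookkeeping is part of what the infrastructure provides.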

8.4 Analysis of LTTng Latency Impact

Benchmarks presented in Chapter 5 mainly address the question of the system throughput impact caused by the LTTng tracer. However, in some systems, response time is more critical than throughput. This aspect is addressed by performing a comparative study of network response-time benchmarks in the presence and absence of tracing. We chose to measure the network latency impact to characterise the tracer because it is a typical application where the latency impact must be kept low. Web servers and domain name servers, which must answer queries quickly, are good examples of this application class.

We determine the tracer impact on the average network response time of a computer by measuring the packet round-trip time of 100,000 ping echo requests. This test involves two hosts, one initiating the request and the second answering it. The round-trip time consists of the time it takes for the packet to be generated by ping, sent to the network card through the operating system, sent over the network, received by the second host's operating system kernel, and sent back to the originating host through a similar route.

The repetitive nature of the test might show lower latencies than standard production systems, due to the high cache locality of the workload. Hence, to make this test more representative of a real-life operating system, a workload is executed in the background, precisely to trash the processor caches and branch predictors. The chosen workload is a cache-hot Linux kernel build spread across all the machine cores.

The first latency test is performed in a setup minimizing the network effect, where both the sender and the receiver are on the same computer, using the local host loopback interface. An 8-core Intel Xeon clocked at 2.0 GHz is used. The number of events recorded per packet is identified by manually inspecting the recorded trace.

The 95 % confidence interval for the difference between the two means found in Table 8.1 is [8.88, 9.12] µs, which means that flight recorder tracing of 27 events adds a latency overhead on local host communication between 8.88 and 9.12 µs, with 95 % certainty. This corresponds to an added latency between 328 and 338 ns per event, which is about 666 cycles. It is higher than the overhead measured with micro-benchmarks in the results section of Chapter 5, which is 119 ns per event for this architecture. The difference between these latency results and the micro-benchmark measurements can be attributed to processor pipeline, branch prediction and cache effects, which are higher in the latency test due to lower temporal and spatial locality than a tight loop calling the tracer.

Table 8.1 Tracer latency overhead for a ping round-trip. Local host, Linux 2.6.30.9, 100,000 requests sample, at 2 ms interval

Test                      Events/round-trip   avg. (µs)   std.dev. (µs)
No tracing                –                   40.0        12.8
Flight recorder tracing   27                  49.0        14.3
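For the reader's convenience, the per-event figures quoted above follow directly from Table 8.1 and the 2.0 GHz processor clock:

\[
\frac{(49.0 - 40.0)\,\mu\mathrm{s}}{27\ \mathrm{events}} \approx 333\ \mathrm{ns/event},
\qquad
\frac{[8.88,\ 9.12]\,\mu\mathrm{s}}{27} \approx [328.9,\ 337.8]\ \mathrm{ns},
\qquad
333\ \mathrm{ns} \times 2.0\ \mathrm{GHz} \approx 666\ \mathrm{cycles}.
\]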

Similar results, presented in Table 8.2, are obtained by sending ping echo requests from a remote host over a 100 Mb/s network. The number of events generated on the traced receiver side for each echo request is 7. The 95 % confidence interval for the difference between these two means is [1.56, 2.85] µs. Therefore, with 7 events per request, the added latency impact is between 223 and 407 ns per event, which is consistent with the measurements from the local host ping test. The confidence interval of the network test is much larger than that of the local host test, due to a higher standard deviation on the measurements.

Table 8.2 Tracer latency overhead for a ping round-trip. 100 Mb/s network, tracing receiver host only, Linux 2.6.30.9, 100,000 requests sample, at 2 ms interval

Test                      Events/round-trip   avg. (µs)   std.dev. (µs)
No tracing                –                   256.10      73.3
Flight recorder tracing   7                   258.31      73.3

Hence, the analysis of these measurements allows us to affirm that the 95 % confidence interval of the tracer latency impact on a busy system is between 328 and 338 ns per event on the Intel Xeon E5405.

8.5 Analysis of LTTng Real-Time Impact

Real-time impact of algorithms can be categorized according to the guarantees they provide. The terms used to identify such guarantees have evolved through time in the literature [36, 37]. The terminology used in this thesis is detailed in [37].

The strongest non-blocking guarantee is wait-free, which ensures that each thread can always make progress and is thus never starved. A non-blocking lock-free algorithm only ensures that the system as a whole always makes progress. This is made possible by ensuring that at least one thread is progressing when concurrent access to a data structure is performed. An obstruction-free algorithm offers an even weaker guarantee than lock-free: it only guarantees progress of any thread executing in isolation, meaning that all competing concurrent accesses may be aborted. Finally, a blocking algorithm does not provide any of these guarantees.

The LTTng kernel tracer synchronization at the probe site is wait-free, the strongest possible form of non-blocking guarantee. Given the use of per-CPU buffers and preemption disabling, the only possible concurrency comes from interrupts.

Therefore, if we know the interrupt frequency (as we should in a real-time system), we can put an upper bound on the number of times a buffer space reservation can be restarted, hence meeting wait-free guarantees. All accesses to trace-control data structures are protected by read-side RCU, which is also wait-free.

However, if the algorithm is either deployed in user-space, where preemption cannot be disabled cheaply (without going to the kernel), or uses a global buffer shared amongst processors, then the LTTng synchronization mechanism only provides non-blocking lock-free guarantees for the write-side. The system as a whole is guaranteed to make progress, because at least one compare-and-swap will succeed amongst each set of concurrent attempts. However, it is not wait-free in this configuration, because a slow thread may be starved by other concurrent faster threads writing to the buffer, causing the slower thread to never succeed in reserving space.

The read-side of the buffering algorithm is blocking: if a writer thread is stopped between its space reservation and commit operations, a partially committed sub-buffer will stop readers from accessing it until the stopped writer thread completes its commit. However, the most important guarantees for a tracer are those affecting the write-side, and thus read-side real-time guarantees are allowed to be weak.

It is important to consider the worst-case execution time of the tracer to provide an accurate upper bound on the constant time added per event. Section 8.4 presents the average latency impact, which cannot be used as an upper bound. It is recommended to consider, as the worst-case scenario, a sub-buffer switch occurring at each event, reading and writing cache-cold memory, and executing cache-cold instructions. It is possible to generate a test-case with this worst-case performance by forcing an explicit sub-buffer switch and by invalidating all the CPU cache-lines after each commit. Such a modified LTTng tracer takes 6349 ns per event on the Intel Xeon, which is 20 times higher than the average 328–338 ns latency. Part of this cost (238 ns) is due to the sub-buffer switch 2. The main slowdown is caused by access to instructions and data from memory rather than cache. Ensuring this upper execution-time bound more strictly could additionally require disabling other processor features, such as the branch prediction buffers, which make real-time behavior harder to predict. For hard real-time systems, restarts caused by nested interrupt handlers should also be added to the worst-case probe execution time.
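A minimal sketch of the space-reservation retry loop discussed in this section follows, assuming Linux-kernel-style local_t primitives; the real LTTng reserve path also reads timestamps and handles sub-buffer switching:

#include <asm/local.h>	/* local_t, local_read(), local_cmpxchg() */

/* Reserve `size` bytes in a per-CPU buffer. With preemption disabled,
 * only local interrupt handlers can make the cmpxchg fail, so the
 * number of retries is bounded by the interrupt rate: wait-free. */
static long reserve_slot(local_t *write_offset, long size)
{
	long old, new;

	do {
		old = local_read(write_offset);
		new = old + size;	/* sub-buffer switch test omitted */
	} while (local_cmpxchg(write_offset, old, new) != old);

	return old;			/* start offset of the reservation */
}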

2. The 238 ns added by the sub-buffer switch is the difference between a cache-hot event write with a forced sub-buffer switch (357 ns) and a cache-hot event write (119 ns).

8.6 Formal Verification of LTTng

The formal verification of the LTTng buffering algorithms is divided into a presentation of the model, followed by the results of correctness, progress and reentrancy verification. This corresponds to a verification of the following properties: absence of corruption, real-time determinism and NMI reentrancy.

Algorithms with lock-free and wait-free properties guarantee a predictable real-time impact on the system behavior. As demonstrated in Chapters 4 and 5, using a lock-less buffering scheme also permits diminishing the performance impact compared to locking-based alternatives. However, these benefits come at the expense of increased algorithmic complexity. To make matters worse, modern architectures add complex memory ordering requirements to ensure proper synchronization. With standard locking, the appropriate memory barriers are added to the locking primitives to ensure correct memory ordering, but lock-free algorithms must deal with these concerns explicitly. These must be taken into account in the wait-free scheme design. To ensure that the LTTng tracer indeed meets the claimed real-time and correctness guarantees, we resort to formal verification by model-checking.

We created a Promela model abstracting the LTTng buffering scheme to enable the verification of safety and progress properties on this model. The verification by a model-checker proves that all possible model execution traces are free from starvation:
– of the system as a whole (lock-free),
– of each individual thread (wait-free).
It also permits verifying that all model execution traces are free from data corruption between:
– concurrent writer threads,
– reader and writer threads,
– a writer thread and a nested interrupt handler writer,
– a reader thread and a nested interrupt handler writer.
This permits concluding that the LTTng model is safe and ensures progress.

8.6.1 Modeling

Due to the state-space explosion inherent to the model, modeling an abstraction of the LTTng buffering scheme is required to keep the state-space size within bounds manageable by the model-checker. This abstraction must cover all the algorithm paths, with a limited amount of redundancy. Keeping the model state-space within manageable limits has been one of the major challenges of LTTng model-checking. I managed to keep all verifications within 4 GiB of memory using the following techniques.

I limited the number of sub-buffers to two, with two entries each. The counters for a buffer can therefore be represented on 2 bits. One extra bit has been added to represent the eventual side-effects of higher-order bits. All counters are kept within this range using a modulo operation. Subtractions are done in the range of positive values to ensure Promela does not apply modulo on negative values.

The buffer content, and thus the verification for racy concurrent access to a buffer location, is represented with one flag per entry. When a process accesses a buffer location, it checks with an assertion whether this flag is raised, sets the flag, and then clears it. This scheme is used to detect incorrect concurrent accesses to a shared memory region.

The buffer model includes an extra sub-buffer owned by the reader process in flight recorder mode. It is exchanged atomically with each writer's sub-buffer before accessing them for reading, by first checking whether the writer has raised the reference flag in the top-level sub-buffer structure pointer. This scheme exchanges entries in a table mapping the buffer offsets to the actual locations of the race-verification flags, which correspond to the memory locations used for reads and writes. It is important to understand that the buffer space reservation synchronization counters are orthogonal to the actual references to physical buffer locations. Therefore, the top-level sub-buffer pointers can be exchanged without modifying the reserve and commit counters.

Modeling multiple concurrent processes happens to be highly space-consuming. An approach where only pairs of interacting threads are activated is used. Verification is thus performed in multiple model configurations, each verifying the interaction between different processes. This helps reduce the required state-space, which is especially important for progress verification. Each process consists of an infinite loop, either reading from or writing to the buffer. Modeling an infinite loop is important for progress verification, because it is based on the presence or absence of non-progress cycles.

Representation of interrupts is done as a variation of the process model: interrupts are represented by a separate thread instance waiting to be awakened between each atomic statement executed by the interrupted process. Control only returns to the interrupted process once the interrupt handler has completed its execution.

Another challenge faced has been to ensure that the model coverage is adequate. The Spin model-checker indicates which statements have been executed and which have not, but does not provide information about state coverage. For instance, it does not indicate whether the model reached a 3-bit counter overflow, which could present corner-cases. We verify the model coverage by error-injection: we verify, by adding an assertion, that the model-checker reaches a state where 17 events were written. The same technique is used to ensure that 17 events were lost and that 17 events were read. The choice of the value 17 is based on the number of events required to reach a commit-counter 3-bit overflow. This ensures that all possible values within the 3-bit commit counter, counting from 0 to 7 for each of the two sub-buffers, were encountered at least once, and that the overflow state has been covered. This naturally includes overflow of the reserve and consumer counts, which overflow twice per commit-count overflow.

The model realised for LTTng is simpler than the framework presented in Chapter 7: it assumes sequentially consistent machines. Besides this assumption, concurrent execution of write-sides on multiple processors and from interrupt and NMI contexts is modeled, with concurrent readers, in non-overwrite and overwrite (flight recorder) modes. A model including the effects of memory reordering could be done, but would involve carefully ensuring that the state-space keeps a manageable size.

8.6.2 Correctness

Appropriate synchronization of multiple writers, a reader thread and interrupts is verified using the per-entry flag, which indicates that a buffer entry is being accessed. It permits detecting invalid concurrent accesses to portions of the ring-buffers and treating them as errors. This ensures that threads have exclusive access to ring-buffer slots. This verification has been performed with the following scenarios, for both overwrite and non-overwrite modes:

– concurrent writer threads,
– reader and writer threads.
The verification has been successful for each scenario. The model-checker required up to 3.0 GiB of memory to perform an exhaustive model safety verification. Each verification required up to 2 minutes. The depth of the manipulated execution-trace graph reached 2.1 million states. Nested interrupt verifications required fewer resources, because they generate fewer possible states than two threads executing concurrently. Concurrent threads executing in overwrite mode required the most computational resources.

8.6.3 Real-Time Impact

Providing hard real-time guarantees usually requires auditing each component of the system to take into account priority inversions caused by blocking. As shown by the detailed analysis of the priority inversion problems that occurred on the Mars Pathfinder [98], tracing tools are often the only type of diagnostic tool able to debug systems facing real-time deadline problems in the field. However, in order for this type of tool to be reliable and useful in such systems, the real-time behavior of the system must, ideally, be left unchanged when the tracer is active. Blocking on mutual exclusion semaphores is therefore out of the question, because it would completely change the priority inheritance chains between the processes due to an added shared resource: the trace buffers.

The approach chosen for LTTng to deal with real-time is to never block on any resource whatsoever when called from the instrumented kernel. This has been achieved by resorting to wait-free algorithms to manage synchronization between multiple concurrent contexts. The probe executes a bounded number of instructions, including CAS loops, which can be restarted by a well-defined set of concurrent operations, performed either by local interrupt handlers or by a limited number of reader processes. This class of algorithms provides the following guarantee: each individual thread is guaranteed to never be starved by any other concurrent process. Hence, the system as a whole, as well as each individual thread, is ensured to always progress. This encompasses the weaker “lock-free” guarantee. Both of these algorithm classes imply that the algorithm never blocks.

I performed the formal verification that this guarantee is actually met by using the same Promela model of the LTTng buffering scheme used for correctness verification. Spin permits verifying for non-progress loops in the program. Verifying for system-wide progress (lock-free) is done by adding progress statements in each thread executed in the model. Weak fairness must be disabled, because the verifier would otherwise not consider loops where one thread is stopped. A stricter verification, for wait-free guarantees, is performed by adding a progress label to only a single thread and by running the verification considering non-progress only for infinite execution cycles where the thread of interest is enabled and executed. This permits verifying that a single process is starvation-free, and thus that the algorithm is wait-free.

The result is that LTTng is wait-free when each CPU writes to its own buffers, and that LTTng is only lock-free when multiple processors can write to the same buffer or when preemption is left enabled on a single processor. This is caused by the ability of a fast writer thread to continuously succeed at reserving buffer space, while a slower thread could indefinitely fail, and thus starve. Therefore, the kernel-level LTTng has wait-free guarantees, while user-space implementations, due to their inability to disable preemption when reserving buffer space, are only lock-free.

Kernel-level wait-free verification has been performed by modeling, in both overwrite and non-overwrite modes:
– one writer and one reader thread,
– one writer with a nested interrupt handler writer.
User-level lock-free verification has been performed by modeling two writer threads writing to a shared buffer. The system-wide progress verification succeeds, but the per-process progress validation fails, due to space-reservation starvation.
The overwrite mode has two additional synchronizations. The first is that the writer can push the reader position when the buffer is full. This is performed with a CAS operation on the reader's position. It has been verified that this CAS loop can never cause writer starvation, because the reader's position can only be concurrently changed, by both the reader and the writer, in the same direction. Hence, the writer can only retry this “push” operation a number of times bounded by the number of sub-buffers per buffer, which is the maximum number of updates which can make the writer thread restart before the reader empties the buffer and blocks.

The second synchronization added by the overwrite mode involves letting the reader thread exchange individual sub-buffers with its own extra sub-buffer before reading them. This exchange is performed with a CAS instruction which verifies whether the use flag is set by the writer. If the sub-buffer is in use, the reader will retry later. On the writer side, a CAS instruction is used in a busy-loop to set and clear the use flag (for portability, because an atomic instruction to set a mask is not available for all architectures in the Linux kernel). Given that the reader can only ever exchange a sub-buffer and cause the writer to restart once (further accesses to that sub-buffer are impossible, because the reader will block on the writer position), the reader thread cannot starve the writer.

The model involving one reader and one writer thread in overwrite mode required significantly more resources than the other verifications. The memory required to perform this verification reached 1.5 GiB, and required the use of BDDs (Binary Decision Diagrams) to encode the state-space more compactly, leading to a much slower execution of the model-checker, which took 25 hours to verify the model. The search depth reached 3.0 million execution steps.
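A minimal sketch of the writer-side “push” operation described above, assuming Linux-kernel-style atomic_long_t primitives; the fullness test and sub-buffer accounting are omitted:

#include <asm/atomic.h>	/* atomic_long_t, atomic_long_cmpxchg() */

/* In overwrite mode, a writer facing a full buffer moves the reader
 * position forward by one sub-buffer. Reader and writer only ever move
 * `consumed` forward, so the number of writer retries is bounded by
 * the number of sub-buffers per buffer. */
static void push_reader(atomic_long_t *consumed, long subbuf_size)
{
	long old = atomic_long_read(consumed);

	/* A failed cmpxchg means the position already moved forward;
	 * the writer then simply re-evaluates buffer fullness. */
	atomic_long_cmpxchg(consumed, old, old + subbuf_size);
}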

8.7 Reentrancy

Verification of the NMI reentrancy of the LTTng buffering scheme is performed using the same model used to ensure the lock-free and wait-free guarantees of the algorithm.

The correctness verification presented in Section 8.6.2 ensures that no data corruption can occur when two concurrent threads, either two writers or one reader and one writer, execute. Proving that NMI handlers nesting on threads cannot corrupt the trace buffers is a special case of modeling two concurrent threads. The model involving two standard writer threads and the model involving one reader and one writer ensure that up to an infinite number of interrupts executing between two atomic steps do not cause corruption.

The progress verification presented in Section 8.6.3, which guarantees lock-free behavior for user-space implementations and wait-free behavior for kernel implementations, suffices to ensure that NMIs nesting on threads cannot cause deadlocks. Absence of deadlock due to one or multiple interrupts between two atomic steps is verified by the standard thread models, where system progress is ensured when two threads (either two writers or one reader and one writer) execute concurrently. This includes the case where one thread is stopped and the other thread (representing the interrupt) executes forever. Therefore, writer interrupts nested over either the reader or the writer cannot cause any system starvation nor deadlock.

Modeling of the nested interrupt handler impact on wait-free guarantees is performed by verifying the interrupt model progress with weak fairness. This type of scheduling is required to model the effect of interrupts on a local processor, which could not, for instance, be scheduled like an ordinary thread. Therefore, the LTTng buffering algorithm could be considered lock-free with respect to interrupts, because they cause the writer thread to restart. However, given that the wait-free and lock-free definitions apply to two concurrent threads and not to interrupts nesting over them, the wait-free guarantee applies integrally to the LTTng buffering scheme. For all practical purposes, the interrupt rate is usually known in a real-time system, which is not necessarily true for concurrent thread execution speed.

Chapter 9

General Discussion

This chapter recalls the tracer properties identified as research objectives and explains, by construction, how the tracer respects them. This is followed by a discussion of the application domains enabled by satisfying these properties. Finally, contributions to other scientific projects and to the Linux kernel are presented to demonstrate the scientific and industrial impacts of this research.

9.1 Tracer Properties

After studying the needed properties, we implemented an innovative tracer meeting all our requirements. The properties we identified guided the tracer design choices, allowing us to fulfill requirements not met by previously existing tracers. Ensuring that the LTTng tracer respects the
– latency,
– throughput,
– scalability,
– real-time,
– portability,
– and reentrancy
properties requires that each tracer component executing in the traced execution context respects them. The components to consider are: Tracepoints, Linux Kernel Markers, Immediate Values, trace clock, tracing control and the LTTng buffering scheme.

The Tracepoints and Linux Kernel Markers use the RCU synchronization mechanism to deal with concurrency from multiple execution contexts. Only an RCU read-side lock is required in the traced execution context to protect the array of callbacks to call. Enabling and disabling the instrumentation sites is performed with an atomic modification of the condition variable, which requires no further synchronization.
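As an illustration, here is a hedged sketch of this dispatch pattern: a plain read of the condition variable followed by an RCU read-side critical section protecting the callback array. Structure and function names are illustrative, not the actual in-kernel tracepoint code:

#include <linux/rcupdate.h>

/* Illustrative probe array, terminated by a NULL function pointer. */
struct probe {
	void (*func)(void *data, int arg);
	void *data;
};

struct trace_site {
	int enabled;		/* condition variable, modified atomically */
	struct probe *probes;	/* RCU-protected callback array */
};

static inline void trace_site_call(struct trace_site *site, int arg)
{
	struct probe *it;

	if (!site->enabled)	/* single word read, no synchronization */
		return;
	rcu_read_lock();	/* protects the probe array against updates */
	it = rcu_dereference(site->probes);
	for (; it && it->func; it++)
		it->func(it->data, arg);
	rcu_read_unlock();
}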

As we have shown in Chapter 7, RCU read-side meets all the needed properties. The Immediate Values, an optimization of the Tracepoints and Linux Kernel Markers condition variable, perform dynamic code modification. Dealing with Intel and AMD errata regarding code modification in multiprocessor environments, however, requires core synchronization. This is achieved with a scheme involving a breakpoint executing bypass code and IPIs sent to all other processors to ensure they execute a core-serializing instruction, as shown in Section 8.3. This scheme lets us perform code modification while the modified code keeps executing through the bypass. It therefore satisfies the same wait-free guarantees as RCU, which ensures deterministic real-time behavior and allows NMI reentrancy.

The time-source used is typically a direct register read, which involves no synchronization. However, when only 32-bit time-stamp counters are available, extending them to 64 bits is necessary. As presented in Section 4.6.3, we propose an RCU-based trace clock to offer such a 64-bit time-base on these architectures. The traced execution context only needs to use RCU read-side locking, which satisfies our properties.

Tracing control operations, which include the creation and deletion of active trace sessions and the modification of their state while tracing is active, are also synchronized using RCU read-side locking.

Last but not least, the LTTng buffering scheme has been verified to meet all of the low-latency, high-throughput, scalability, deterministic real-time impact (wait-free), portability, and reentrancy properties. By demonstrating that each tracer component respects these properties, we can affirm, by construction, that they are also satisfied by the tracer as a whole. As a result, the LTTng tracer goes further than existing tracing solutions, which each satisfy only a few of these properties.

9.2 Tracer Application Domains

This section associates each property met by the tracer to the application domains reached, and gives examples of users requiring tracing in each of these domains, some of which are already using LTTng. This demonstrates that this research has an important impact on the industry in all of the application domains targeted by the identified properties.

The properties of low latency, low throughput impact, and linear scalability target commercial servers running Linux. The Google servers are a good example of systems requiring such a tracing solution and willing to integrate a tracer. Google needs to enable tracing on live production servers in order to reproduce and solve performance issues and bugs, which requires the overhead and disturbance of the tracer to be minimal. By meeting the aforementioned properties, the LTTng tracer meets these low-impact requirements and is a viable solution for Google servers.

Autodesk uses LTTng in the development of its soft real-time applications. These rely on meeting soft real-time constraints, which requires the tracer to have a low impact on both system throughput and real-time response on multi-core computers. This type of property is also needed by Ericsson for their telecommunication equipment. Siemens likewise relies on LTTng internally for the development of some of their products running Linux.

Ensuring that the code executed by the probe is wait-free enables tracing of real-time systems without changing their behavior in an unpredictable way. LTTng is already integrated as the tracing solution in the Wind River Linux, Monta Vista and STLinux distributions. These distributions also benefit from the portability of LTTng, which allows tracing on various computer architectures. Portability also enables Nokia to use LTTng for the development of their Maemo internet tablets and phones based on the ARM OMAP3 architecture.

Reentrancy of the tracer code in contexts ranging from thread context to interrupt and NMI contexts is important to allow large instrumentation coverage. This benefits Linux kernel developers who need to extend kernel instrumentation: they do not have to understand the tracer internals to ensure that the instrumentation they add will not crash the kernel when tracing is enabled. Hence, this contributes to Linux end users by providing a more stable tracing infrastructure which can be trusted.

All these applications of LTTng demonstrate that it fills a need for low-impact kernel tracing in many industry application fields. The consequence of our research is to improve multi-core system debugging facilities, providing a tool which helps find performance bottlenecks, hence speeding up applications by uncovering all sorts of inefficient resource usage. This helps improve response time, real-time response, system throughput and energy efficiency.

9.3 Contributions

This research had impacts beyond those directly related to the LTTng tracer. The Local Atomic Operations, Kernel Markers and Tracepoints each contributed to other fields and other projects. This section shows the influence of these contributions. It also discusses user-space tracing which, although outside the main scope of this research, was pioneered by it.

9.3.1 Local Atomic Operations

After identifying the benefit of per-CPU local atomic operations in Chapter 4, we concluded that extending the Linux kernel to support these operations was worthwhile. For example, per-CPU counters and CPU-local data shared between interrupt handlers and thread context can be accelerated using this technique. We therefore implemented them for all architectures supported by Linux, using the slower SMP-aware atomic operations as a fall-back when needed. We also added documentation of these operations to the Linux kernel tree to explain their usage and semantics.

Local atomic operations are meant to provide fast and highly reentrant per-CPU counters. They minimize the performance cost of standard atomic operations by removing unneeded inter-CPU synchronization. This is achieved by removing the LOCK prefix and memory barriers normally required to synchronize across CPUs. Fast per-CPU atomic counters are interesting in many cases: they do not require disabling interrupts for protection against interrupt handlers, and they permit coherent counters in NMI handlers. They are especially useful for tracing purposes and for various performance monitoring counters. The documentation written is now part of the Linux kernel tree 1. The per-CPU atomic operations we created have been added to the per-architecture local.h files in the mainline Linux tree. We expect a much broader use of these primitives in other components of the Linux kernel, because they are now accessible to other kernel programmers and documented with well-defined semantics.

1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/local_ops.txt
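To illustrate the intended usage, here is a minimal sketch of a per-CPU event counter built on these operations; the counter name and reading helper are hypothetical, but the local_t API shown follows the local_ops.txt documentation:

#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <asm/local.h>

/* Hypothetical per-CPU event counter. */
static DEFINE_PER_CPU(local_t, event_count) = LOCAL_INIT(0);

static void count_event(void)
{
	/*
	 * local_inc() is atomic with respect to interrupt and NMI handlers
	 * on the local CPU, but avoids the LOCK prefix and memory barriers
	 * required for cross-CPU atomicity. get_cpu_var() disables
	 * preemption so the update targets the current CPU's counter.
	 */
	local_inc(&get_cpu_var(event_count));
	put_cpu_var(event_count);
}

/* An approximate total can be gathered from any CPU. */
static long read_event_total(void)
{
	long total = 0;
	int cpu;

	for_each_online_cpu(cpu)
		total += local_read(&per_cpu(event_count, cpu));
	return total;
}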

9.3.2 Kernel Markers

The Kernel Markers are a static instrumentation mechanism allowing kernel programmers to add instrumentation at the source-code level. For tracing instrumentation, they are the equivalent of printk() in the Linux kernel, and they are principally used for instrumentation prototyping. The improvement this infrastructure brings over the existing Kprobes is that instrumentation is added directly at the source-code level. This lets the instrumentation identify local variables, hence guaranteeing their availability for extraction when tracing is enabled. It allows richer and more precise information to be extracted from the kernel, enabling improved kernel debugging and tracing facilities. This infrastructure has been added to the Linux kernel mainline, where it has been used to replace the ad-hoc KVM (Kernel-based Virtual Machine) and SPE (Synergistic Processing Element) instrumentation.
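For instance, instrumenting a kernel function with a marker follows the printk()-like pattern sketched below; the marker name, enclosing function and arguments are illustrative:

#include <linux/types.h>
#include <linux/marker.h>

static void process_request(int cpu, size_t len)
{
	/*
	 * The format string conveys the argument types to the connected
	 * probe; the marker site compiles to a predicted-not-taken branch
	 * when no probe is attached. "subsys_request" is hypothetical.
	 */
	trace_mark(subsys_request, "cpu %d len %zu", cpu, len);
	/* ... actual request processing ... */
}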

9.3.3 Tracepoints

Tracepoints are an evolution of the Linux Kernel Markers which provides stricter type verification. A further benefit is a more organized management of the instrumentation, achieved by keeping a global tracepoint registry along with the kernel headers. They have also been integrated into the Linux kernel, and are used extensively by the Ftrace tracer. Using tracepoints decouples the tracing facilities from the actual source-code instrumentation. The Tracepoint infrastructure allows many concurrent tracers to be connected, which makes it a suitable core facility for kernel instrumentation. It benefits Linux kernel developers by letting them organize kernel instrumentation in a global tracepoint registry. Consequently, it also gives tracers a single common information source, and thus enables kernel-wide tracing. As of today, 104 tracepoints are present in the mainline Linux kernel and major development trees, some created by myself, but the vast majority contributed by other Linux kernel developers.
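A hedged sketch of declaring and using such a tracepoint is shown below, following the kernel's tracepoint documentation of that period (macro spellings varied across kernel versions); the event name and probe are hypothetical:

#include <linux/tracepoint.h>
#include <linux/sched.h>
#include <linux/init.h>

/* In a header: declare the tracepoint and the types of its arguments. */
DECLARE_TRACE(subsys_eventname,
	TP_PROTO(struct task_struct *p, int value),
	TP_ARGS(p, value));

/* In exactly one .c file: emit the tracepoint definition. */
DEFINE_TRACE(subsys_eventname);

/* At the instrumented site: a near no-op unless a probe is connected. */
void instrumented_function(struct task_struct *p)
{
	trace_subsys_eventname(p, 42);
}

/* A tracer connects a probe whose signature matches TP_PROTO. */
static void my_probe(struct task_struct *p, int value)
{
	/* record the event into the tracer's buffers */
}

static int __init my_tracer_init(void)
{
	return register_trace_subsys_eventname(my_probe);
}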

9.3.4 Fast User-space Tracing

User-space tracing aims at allowing tracing from lesser-privileged execution contexts. Its usefulness lies in letting application and library developers instrument their own user-space applications to gather more precise information about their behavior, augmenting a system-wide trace. The objective is to obtain traces of both kernel and user-space execution, allowing analysis to be performed on the combined traces. This opens the possibility of analyzing and pinpointing bugs and performance issues resulting from interactions between user-space applications, libraries and kernel-space.

One approach consists in using a system call to let applications record events through the kernel tracer as-is. However, this approach suffers from the overhead of the system call trap, which is in many cases slower than the tracing itself. This is why writing directly to a shared ring-buffer from user-space is a more appealing solution.

Multiple prototypes of user-space tracers have been realized by porting the kernel infrastructure to user-space context. The papers presented at the Linux Symposium [27, 28] present two different aspects of user-space tracing. The first paper [27] presented a port of the LTTng kernel tracer to user-space, where a library was responsible for performing the user-level tracing through a memory buffer shared between the application and forked processes. This work was mostly intended as a prototype. The second paper [28] presented a user-level instrumentation prototype: a port of the Linux Kernel Markers to user-space. These two prototypes, and the kernel-level LTTng development performed since then, served as the basis for the UST (User-Space Tracer) library currently being developed.

The work on user-level RCU synchronization presented in Chapter 6 has been conducted with the primary objective of addressing the synchronization needs of UST. Given that the user-space tracer implementation is derived from the kernel LTTng tracer, it relies on the same synchronization primitives to provide consistent access to the tracer data structures. Therefore, the lack of a proper user-level RCU implementation needed to be addressed to ensure that the user-space tracer could offer performance characteristics similar to those of the kernel tracer.

User-space tracing opens a very promising research area by allowing collection of detailed system-wide information. One limitation identified by modeling the user-space buffering scheme is that the wait-free progress guarantees are lost due to the inability to disable preemption, leaving only lock-free guarantees. This opens an interesting research area on scheduler interactions with real-time user-space applications.
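As a brief illustration of the read-side pattern such a tracing library depends on, here is a minimal user-space RCU sketch using the liburcu API; the configuration structure is hypothetical and a single updater is assumed:

#include <stdlib.h>
#include <urcu.h>

struct config {
	int verbosity;
};

static struct config *active_config;	/* RCU-protected pointer */

/*
 * Reader: wait-free access to the current configuration. Each reader
 * thread must have called rcu_register_thread() beforehand.
 */
static int get_verbosity(void)
{
	struct config *cfg;
	int v = 0;

	rcu_read_lock();
	cfg = rcu_dereference(active_config);
	if (cfg)
		v = cfg->verbosity;
	rcu_read_unlock();
	return v;
}

/* Single updater: publish a new configuration, then reclaim the old one. */
static void set_config(struct config *new_cfg)
{
	struct config *old = active_config;

	rcu_assign_pointer(active_config, new_cfg);
	synchronize_rcu();	/* wait for pre-existing readers to finish */
	free(old);
}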

9.4 Scientific Studies Using LTTng

Our research has interesting impacts on scientific studies in other computer research fields, contributing to research which benefits from a very efficient tracing tool. This section presents experiments where the LTTng tracer contributed to solving problems and identifying solutions.

LTTng has been used to measure the power variations over time of disk operations to NOR and NAND flash devices at the driver level. It helped correlate device accesses with CPU activity to find which type of memory requires less energy. This work is presented in [99].

LTTng has also been used to trace user behavior over a longer time-period [100]. Over a week, user habits were studied by tracing the exec() and exit() system calls. The authors used this information to determine which applications are most likely to be run concurrently.

Paper [101] describes the architecture of an operating system for future information appliances. The authors use LTTng as their tracing infrastructure to feed information to an anomaly detection service.

Hicham Marouani, in his work on internal clock drift estimation [102, 103], used the LTTng kernel tracer to keep track of network packet arrivals and to monitor a GPS-based reference clock, measuring the precision of various techniques for synchronizing time across nodes.

Monitoring of kernel execution in the Lemona project [104] uses hooks inspired by the Tracepoint and Kernel Marker mechanisms to allow probes to be dynamically connected while keeping the performance impact low when disabled.

All these contributions of the LTTng tracer demonstrate that kernel-wide tracing is useful to scientific research.

Chapter 10

Conclusion and Recommendations

The core realization of this research is the creation of innovative synchronization algorithms enabling the implementation of the LTTng tracer for the Linux operating system. This tracer satisfies the properties of low impact on operating system scalability, throughput and average latency, deterministic real-time response impact, portability to various architectures and a high degree of reentrancy. Benchmarks and formal verification have shown that each of the tracer components meets these properties. The LTTng tracer meets requirements that its predecessors only met partially, which enables tracing of the Linux operating system, whose flexibility allows its use in a large spectrum of application fields.

It has been possible to achieve these goals by carefully choosing, designing and implementing the synchronization mechanisms: RCU for read-side synchronization and the custom LTTng buffering scheme to synchronize writes. Both provide linear scalability and wait-free algorithmic guarantees, which are useful for tracing multi-core systems, as well as ensuring real-time and reentrancy guarantees.

The original scientific contributions of this research include:
– the creation of the LTTng buffer synchronization algorithm,
– the creation of an RCU-based trace clock,
– the design of a complete kernel tracer based on wait-free, linearly scalable and NMI-safe algorithms,
– the application of self-modifying code techniques to efficiently manage instrumentation activation,
– improvements to the RCU synchronization mechanisms in user-space context,
– the creation of a generic architecture model for formal verification of parallel algorithms, modeling weakly-ordered memory accesses and instruction scheduling.
All these contributions enabled the creation of a kernel tracer meeting all the research objectives.

The appropriate response to the industry and open source community tracing requirements is demonstrated by the fact that some tracer components we created (Tracepoints and Linux Kernel Markers) are integrated into the mainline Linux kernel, and that the LTTng tracer has a large community of users and contributors, namely Google, IBM, Ericsson, Autodesk, Wind River, Fujitsu, Monta Vista, STMicroelectronics, C2 Microsystems, Sony, Siemens, Nokia, and Defence Research and Development Canada.

As a conclusion of this research, we can affirm that tracing heavy workloads on a mainstream operating system running on multi-core architectures can be achieved with only a minimal impact on the system’s throughput and average latency, while entirely preserving the scalability, real-time response, portability and reentrancy of the operating system. The implementation realized permits instrumentation coverage of the entire operating system kernel, including NMI handlers.

Analysis of system-wide trace information involves the collection of traces from both the kernel and user levels. Following the promising results of early experiments on fast user-space tracing, it is now worth stabilizing an infrastructure to provide fast, production-level user-space tracing to Linux users.

The heavy workloads which can now be traced on production systems enable the collection of information to analyze and solve performance and behavior problems in today’s complex computer systems. It is now worth exploring the analyses made possible by this information, for instance by modeling the operating system to perform trace-driven analysis.

Due to its usefulness for system monitoring, identification of performance bottlenecks and debugging, keeping tracing active at all times on production servers and appliances is a natural decision for systems developers if the performance penalty is low enough. This research demonstrated clearly that the impact of the LTTng tracer, when active, is low enough that it can be used on heavily loaded production systems without prohibitively hurting performance.

List of References

[1] M. Desnoyers and M. R. Dagenais, “Synchronization for fast and reentrant operating system kernel tracing,” Software – Practice and Experience, to appear: re-submitted after review.
[2] M. Desnoyers and M. R. Dagenais, “Lockless multi-core high-throughput buffering scheme for kernel tracing,” ACM Transactions on Computer Systems (TOCS), to appear: submitted.
[3] M. Desnoyers, M. R. Dagenais, P. E. McKenney, A. Stern, and J. Walpole, “User-level implementations of read-copy update,” IEEE Transactions on Parallel and Distributed Systems (TPDS), to appear: submitted.
[4] M. Desnoyers, M. R. Dagenais, and P. E. McKenney, “User-level implementations of read-copy update,” IEEE Transactions on Parallel and Distributed Systems (TPDS), to appear: submitted.
[5] P. Stenstrom, E. Hagersten, D. Lilja, M. Martonosi, and M. Venugopal, “Trends in shared memory multiprocessing,” Computer, vol. 30, no. 12, pp. 44–50, Dec 1997.
[6] M. Johnson, Superscalar microprocessor design, ser. Prentice Hall series in innovative technology. Prentice Hall, 1991, [Online]. Available: http://www.worldcat.org/isbn/0138756341. [Accessed: October 19, 2009].
[7] R. Love, Linux Kernel Development (2nd Edition) (Novell Press). Novell Press, 2005.
[8] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” in Proceedings of the 6th Symposium on Operating Systems Design and Implementation, 2004, pp. 6–8.
[9] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, “Bigtable: A distributed storage system for structured data,” in Proceedings of the 7th Symposium on Operating Systems Design and Implementation, 2006, pp. 205–218.
[10] R. W. Wisniewski, R. Azimi, M. Desnoyers, M. M. Michael, J. Moreira, D. Shiloach, and L. Soares, “Experiences understanding performance in a commercial scale-out environment,” in European Conference on Parallel Processing (Euro-Par), 2007.

[11] ARM Limited Technical Staff, ARM11 MPCore Processor Technical Reference Manual, Revision r1p0. ARM Limited, 2006.
[12] U. Rüde, “Technological trends and their impact on the future of supercomputers,” in High Performance Scientific and Engineering Computing, volume 8 of Lecture Notes in Computational Science and Engineering. Springer-Verlag, 1999, pp. 459–471.
[13] C. Liu and J. Layland, “Scheduling algorithms for multiprogramming in a hard-real-time environment,” Journal of the ACM, vol. 20(1), pp. 46–61, January 1973.
[14] J. J. Labrosse, MicroC/OS-II. R & D Books, 1998.
[15] P. A. Laplante, Real-Time Systems Design and Analysis. John Wiley & Sons, 2004.
[16] P. Malhotra, “Issues involved in real-time rendering of virtual environments,” Master’s thesis, Faculty of Virginia Polytechnic, 2002.
[17] L. A. Barroso, J. Dean, and U. Holzle, “Web search for a planet: The google cluster architecture,” Micro, IEEE, vol. 23, no. 2, pp. 22–28, 2003, [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1196112. [Accessed: October 19, 2009].
[18] S. Baskiyar and N. Meghanathan, “A survey of contemporary real-time operating systems,” Informatica (Slovenia), vol. 29, no. 2, pp. 233–240, 2005, [Online]. Available: http://www.informatica.si/PDF/29-2/12_Baskiyar-ASurveyofContemporary...pdf. [Accessed: October 19, 2009].
[19] K. Yaghmour and M. R. Dagenais, “The Linux Trace Toolkit,” Linux Journal, May 2000, [Online]. Available: http://www.linuxjournal.com/article/3829. [Accessed: October 19, 2009].
[20] P. Reynolds, C. Killian, J. L. Wiener, J. C. Mogul, M. A. Shah, and A. Vahdat, “Pip: detecting the unexpected in distributed systems,” in NSDI’06: Proceedings of the 3rd conference on Networked Systems Design & Implementation. Berkeley, CA, USA: USENIX Association, 2006, pp. 9–9.
[21] (2008, January) TracingSummit2008. [Online]. Available: http://ltt.polymtl.ca/tracingwiki/index.php/TracingSummit2008. [Accessed: October 19, 2009].

[22] (2009, July) TracingMiniSummit2009. [Online]. Available: http://ltt.polymtl.ca/tracingwiki/index.php/TracingSummit2009. [Accessed: October 19, 2009].
[23] J. Corbet. (2008, September) KS2008: Tracing. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/298685/. [Accessed: October 19, 2009].
[24] M. Bligh, R. Schultz, and M. Desnoyers, “Linux kernel debugging on Google-sized clusters,” in Proceedings of the Ottawa Linux Symposium, 2007.
[25] M. Desnoyers and M. Dagenais, “OS tracing for hardware, driver and binary reverse engineering in Linux,” CodeBreakers Journal, vol. 4, no. 1, 2007.
[26] M. Desnoyers and M. Dagenais, “OS tracing for hardware, driver and binary reverse engineering in Linux,” in Recon Conference Proceedings, 2006.
[27] M. Desnoyers and M. Dagenais, “The LTTng tracer: A low impact performance and behavior monitor for GNU/Linux,” in Proceedings of the Ottawa Linux Symposium, 2006.
[28] M. Desnoyers and M. R. Dagenais, “LTTng: Tracing across execution layers, from the hypervisor to user-space,” in Proceedings of the Ottawa Linux Symposium, 2008.
[29] M. Desnoyers and M. Dagenais, “LTTng, filling the gap between kernel instrumentation and a widely usable kernel tracer,” in Linux Foundation Collaboration Summit, 2009, [Online]. Available: http://events.linuxfoundation.org/archive/lfcs09_desnoyers_paper.pdf. [Accessed: October 19, 2009].
[30] M. Desnoyers and M. Dagenais, “Low disturbance embedded system tracing with Linux Trace Toolkit Next Generation,” in ELC (Embedded Linux Conference), 2006.
[31] M. Desnoyers and M. Dagenais, “Deploying LTTng on exotic embedded architectures,” in ELC (Embedded Linux Conference), 2009, [Online]. Available: http://tree.celinuxforum.org/CelfPubWiki/ELC2009Presentations?action=AttachFile&do=view&target=desnoyers-celf2009-paper.pdf. [Accessed: October 19, 2009].
[32] P. E. McKenney, J. Triplett, and J. Walpole. Relativistic programming. [Online]. Available: http://wiki.cs.pdx.edu/rp/. [Accessed: October 19, 2009].
[33] B. M. Cantrill, M. W. Shapiro, and A. H. Leventhal, “Dynamic instrumentation of production systems,” in USENIX, 2004, [Online]. Available: http://www.sagecertification.org/events/usenix04/tech/general/full_papers/cantrill/cantrill_html/index.html. [Accessed: October 19, 2009].

[34] J. Corbet. (2007, August) On DTrace envy. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/244536/. [Accessed: October 19, 2009].
[35] P. E. McKenney, “Exploiting deferred destruction: An analysis of read-copy-update techniques in operating system kernels,” Ph.D. dissertation, OGI School of Science and Engineering at Oregon Health and Sciences University, July 2004, [Online]. Available: http://www.rdrop.com/users/paulmck/RCU/RCUdissertation.2004.07.14e1.pdf. [Accessed: October 19, 2009].
[36] M. M. Michael and M. L. Scott, “Simple, fast, and practical non-blocking and blocking concurrent queue algorithms,” in Proceedings 15th ACM Symposium on Principles of Distributed Computing, 1996, pp. 267–275.
[37] M. Herlihy, V. Luchangco, and M. Moir, “Obstruction-free synchronization: Double-ended queues as an example,” in Proceedings of the 23rd International Conference on Distributed Computing Systems. IEEE Computer Society, 2003, pp. 522–529.
[38] O. Krieger, M. Auslander, B. Rosenburg, R. W. Wisniewski, J. Xenidis, D. Da Silva, et al., “K42: building a complete operating system,” in EuroSys ’06: Proceedings of the 2006 EuroSys conference, 2006, pp. 133–145.
[39] R. W. Wisniewski and B. Rosenburg, “Efficient, unified, and scalable performance monitoring for multiprocessor operating systems,” in Supercomputing, ACM/IEEE Conference, 2003, [Online]. Available: http://www.research.ibm.com/K42/papers/sc03.pdf. [Accessed: October 19, 2009].
[40] V. Prasad, W. Cohen, F. C. Eigler, M. Hunt, J. Keniston, and B. Chen, “Locating system problems using dynamic instrumentation,” in Proceedings of the Ottawa Linux Symposium, 2005, [Online]. Available: http://sourceware.org/systemtap/systemtap-ols.pdf. [Accessed: October 19, 2009].
[41] A. Nataraj, A. Malony, S. Shende, and A. Morris, “Kernel-level measurement for integrated parallel performance views: the KTAU project,” in IEEE International Conference on Cluster Computing, 2006.
[42] Oprofile: A system-wide profiling tool for Linux. [Online]. Available: http://oprofile.sourceforge.net. [Accessed: October 19, 2009].

[43] A. Mavinakayanahalli, P. Panchamukhi, J. Keniston, A. Keshavamurthy, and M. Hiramatsu, “Probing the guts of kprobes,” in Proceedings of the Ottawa Linux Symposium, 2006.
[44] LTTng website. [Online]. Available: http://www.lttng.org. [Accessed: October 19, 2009].
[45] Intel Corporation Technical Staff. (2006, September) Intel 64 and IA-32 architectures software developer’s manual.
[46] J. Wetzel, E. Silha, C. May, B. Frey, J. Furukawa, and G. Frazier, PowerPC Virtual Environment Architecture, Version 2.02, February 2005, [Online]. Available: http://www.ibm.com/developerworks/eserver/library/es-archguide-v2.html. [Accessed: June 7, 2009].
[47] J. Heinrich, MIPS R4000 Microprocessor User’s Manual, Second Edition. MIPS Technologies, Inc., 1994.
[48] ARM Limited Technical Staff, Cortex-A8 Technical Reference Manual, Revision r1p1. ARM Limited, 2006.
[49] Intel Corporation. (2004, October) IA-PC HPET (high precision event timers) specification. [Online]. Available: http://www.intel.com/hardwaredesign/hpetspec_1.pdf. [Accessed: October 19, 2009].
[50] G. Hillier. (2008) System and application analysis with LTTng. [Online]. Available: Siemens Linux Inside, http://www.hillier.de/linux/LTTng-examples.pdf. [Accessed: June 7, 2009].
[51] J. Corbet. (2008, July) Tracing: no shortage of options. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/291091/. [Accessed: October 19, 2009].
[52] J. Corbet. (2007, August) Kernel Markers. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/245671/. [Accessed: October 19, 2009].
[53] T. Zanussi, K. Yaghmour, R. Wisniewski, R. Moore, and M. Dagenais, “RelayFS: An efficient unified approach for transmitting data from kernel to user space,” in Proceedings of the Ottawa Linux Symposium, 2003, pp. 519–531, [Online]. Available: http://www.research.ibm.com/people/b/bob/papers/ols03.pdf. [Accessed: October 19, 2009].

[54] B. Gamsa, O. Krieger, J. Appavoo, and M. Stumm, “Tornado: Maximizing locality and concurrency in a shared memory multiprocessor operating system,” in Proceedings of the 3rd Symposium on Operating System Design and Implementation, New Orleans, LA, February 1999, pp. 87–100.
[55] J. P. Hennessy, D. L. Osisek, and J. W. Seigh II, “Passive serialization in a multitasking environment,” Washington, DC, p. 11, February 1989.
[56] V. Jacobson, “Avoid read-side locking via delayed free,” September 1993, private communication.
[57] A. John, “Dynamic vnodes – design and implementation,” in USENIX Winter 1995. New Orleans, LA: USENIX Association, January 1995, pp. 11–23.
[58] P. E. McKenney and J. D. Slingwine, “Read-copy update: Using execution history to solve concurrency problems,” in Parallel and Distributed Computing and Systems, Las Vegas, NV, October 1998, pp. 509–518.
[59] T. E. Hart, P. E. McKenney, A. D. Brown, and J. Walpole, “Performance of memory reclamation for lockless synchronization,” J. Parallel Distrib. Comput., vol. 67, no. 12, pp. 1270–1285, 2007.
[60] K. A. Fraser, “Practical lock-freedom,” Ph.D. dissertation, King’s College, University of Cambridge, 2003.
[61] D. Guniguntala, P. E. McKenney, J. Triplett, and J. Walpole, “The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with Linux,” IBM Systems Journal, vol. 47, no. 2, pp. 221–236, May 2008.
[62] P. E. McKenney, “Exploiting deferred destruction: An analysis of read-copy-update techniques in operating system kernels,” Ph.D. dissertation, OGI School of Science and Engineering at Oregon Health and Sciences University, 2004.
[63] P. E. McKenney. (2008, January) What is RCU? part 2: Usage. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/263130/. [Accessed: October 19, 2009].
[64] E. Polyakov. (2009, April) The elliptics network. [Online]. Available: http://www.ioremap.net/projects/elliptics. [Accessed: October 19, 2009].
[65] T. Jinmei and P. Vixie, “Implementation and evaluation of moderate parallelism in the BIND9 DNS server,” in Proceedings of the annual conference on USENIX Annual Technical Conference, Boston, MA, February 2006, pp. 115–128, [Online]. Available: https://www.usenix.org/events/usenix06/tech/full_papers/jinmei/jinmei.pdf. [Accessed: June 3, 2009].

[66] M. Herlihy, “Wait-free synchronization,” ACM TOPLAS, vol. 13, no. 1, pp. 124–149, January 1991.
[67] P. E. McKenney, “Using a malicious user-level RCU to torture RCU-based algorithms,” in linux.conf.au 2009, Hobart, Australia, January 2009.
[68] J. M. Mellor-Crummey and M. L. Scott, “Algorithms for scalable synchronization on shared-memory multiprocessors,” Transactions of Computer Systems, vol. 9, no. 1, pp. 21–65, February 1991.
[69] C. Cascaval, C. Blundell, M. Michael, H. W. Cain, P. Wu, S. Chiras, and S. Chatterjee, “Software transactional memory: Why is it only a research toy?” ACM Queue, September 2008.
[70] H. Chafi, J. Casper, B. D. Carlstrom, A. McDonald, C. C. Minh, W. Baek, C. Kozyrakis, and K. Olukotun, “A scalable, non-blocking approach to transactional memory,” in HPCA Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture, 2007, pp. 97–108.
[71] S. H. Pugsley, M. Awasthi, N. Madan, N. Muralimanohar, and R. Balasubramonian, “Scalable and reliable communication for hardware transactional memory,” in PACT Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008, pp. 144–154.
[72] D. Dice, Y. Lev, M. Moir, and D. Nussbaum, “Early experience with a commercial hardware transactional memory implementation,” in Fourteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’09), Washington, DC, USA, March 2009, p. 12.
[73] P. E. McKenney. (2007, August) Using Promela and Spin to verify parallel algorithms. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/243851/. [Accessed: October 19, 2009].
[74] P. E. McKenney and S. Rostedt. (2008, April) Integrating and validating dynticks and preemptable RCU. [Online]. Available: Linux Weekly News, http://lwn.net/Articles/279077/. [Accessed: April 24, 2008].
[75] S. Owens, S. Sarkar, and P. Sewell, “A better x86 memory model: x86-TSO,” in TPHOLs ’09: Proceedings of the 22nd International Conference on Theorem Proving in Higher Order Logics. Berlin, Heidelberg: Springer-Verlag, pp. 391–407.

[76] J. Alglave, A. Fox, S. Ishtiaq, M. O. Myreen, S. Sarkar, P. Sewell, and F. Z. Nardelli, “The semantics of Power and ARM multiprocessor machine code,” in DAMP ’09: Proceedings of the 4th Workshop on Declarative Aspects of Multicore Programming. New York, NY, USA: ACM, 2008, pp. 13–24.
[77] G. J. Holzmann, The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2003.
[78] D. A. Schmidt, “Data flow analysis is model checking of abstract interpretations,” in Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, California, 1998, pp. 38–48.
[79] B. Steffen, “Data flow analysis as model checking,” in Lecture Notes in Computer Science. Theoretical Aspects of Computer Software: TACS, vol. 256, 1991, pp. 346–364, [Online]. Available: http://www.springerlink.com/content/y5p607674g6q1482/. [Accessed: June 4, 2009].
[80] M. Dwyer and L. A. Clarke, “Data flow analysis for verifying properties of concurrent programs,” in Proceedings of the Second ACM SIGSOFT Symposium on Foundations of Software Engineering. ACM Press, 1994, pp. 62–75.
[81] B. Berard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and P. Schnoebelen, Systems and Software Verification, Model-Checking Techniques and Tools. Springer-Verlag, 2001.
[82] G. J. Holzmann, “The SPIN model checker,” IEEE Transactions on Software Engineering, 1997.
[83] J. van Leeuwen, Ed., Handbook of Theoretical Computer Science (vol. B): Formal Models and Semantics. Cambridge, MA, USA: MIT Press, 1990.
[84] D. Peled, “Combining partial order reductions with on-the-fly model-checking,” Formal Methods in System Design, vol. 8(1), pp. 39–64, 1996, [Online]. Available: http://www.dcs.warwick.ac.uk/~doron/ps/full.ps. [Accessed: June 3, 2009].
[85] P. E. McKenney, “Memory ordering in modern microprocessors, part II,” Linux Journal, July 2005, [Online]. Available: http://www.linuxjournal.com/article/8212. [Accessed: June 3, 2009].

[86] R. E. Kessler, “The Alpha 21264 microprocessor,” IEEE Micro, vol. 19(2), pp. 24–36, March 1999, [Online]. Available: http://www.cs.virginia.edu/~dp8x/alpha21264/paper/kessler.pdf. [Accessed: June 3, 2009].
[87] J. L. Peterson, “Petri nets,” ACM Computing Surveys (CSUR), vol. 9(3), pp. 223–252, 1977.
[88] K. Jensen, “Coloured petri nets,” Petri Nets: Central Models and Their Properties, vol. 254, pp. 248–299, 1987.
[89] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, “Efficiently computing static single assignment form and the control dependence graph,” ACM Transactions on Programming Languages and Systems, vol. 13, no. 4, pp. 451–490, Oct 1991, [Online]. Available: http://doi.acm.org/10.1145/115372.115320. [Accessed: October 19, 2009].
[90] R. E. Bryant, “Graph-based algorithms for boolean function manipulation,” IEEE Transactions on Computers, vol. C-35(8), pp. 677–691, 1986.
[91] R. E. Bryant, “Symbolic boolean manipulation with ordered binary decision diagrams,” ACM Computing Surveys, vol. 24, no. 3, pp. 293–318, 1992.
[92] C.-Y. Fu, M. D. Jennings, S. Y. Larin, and T. M. Conte, “Value speculation scheduling for high performance processors,” SIGPLAN Not., vol. 33, no. 11, pp. 262–271, 1998.
[93] C.-Y. Fu, M. D. Jennings, S. Y. Larin, and T. M. Conte, “Software-only value speculation scheduling,” Dept. of Electrical and Computer Eng., North Carolina State University, Tech. Rep., June 1998.
[94] P. E. McKenney. (2007) C++ data-dependency ordering: Atomics. [Online]. Available: ISO/IEC JTC1 SC22 WG21 N2359, http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2359.html. [Accessed: October 19, 2009].
[95] P. E. McKenney. (2007) C++ data-dependency ordering: Memory model. [Online]. Available: ISO/IEC JTC1 SC22 WG21 N2360, http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2360.html. [Accessed: October 19, 2009].
[96] P. E. McKenney. (2007) C++ data-dependency ordering: Function annotation. [Online]. Available: ISO/IEC JTC1 SC22 WG21 N2361, http://open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2361.html. [Accessed: October 19, 2009].
[97] M. Hiramatsu and S. Oshima, “Djprobes – kernel probing with the smallest overhead,” in Proceedings of the Ottawa Linux Symposium, 2006.

[98] G. Reeves. (1997, December) What really happened on Mars? [Online]. Available: http://research.microsoft.com/en-us/um/people/mbj/Mars_Pathfinder/Authoritative_Account.html. [Accessed: October 19, 2009].
[99] T. Stathopoulos, D. McIntire, and W. Kaiser, “The energy endoscope: Real-time detailed energy accounting for wireless sensor nodes,” in International Conference on Information Processing in Sensor Networks (IPSN), 2008, pp. 383–394, [Online]. Available: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4505489. [Accessed: June 30, 2009].
[100] J. Hom and U. Kremer, “Execution context optimization for disk energy,” in International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2008, pp. 255–264, [Online]. Available: http://portal.acm.org/citation.cfm?id=1450095.1450132. [Accessed: June 30, 2009].
[101] T. Nakajima, H. Ishikawa, Y. Kinebuchi, M. Sugaya, S. Lei, A. Courbot, A. van der Zee, A. Aalto, and K. K. Duk, An Operating System Architecture for Future Information Appliances, 2008, vol. 5287/2008, [Online]. Available: http://www.springerlink.com/content/g7065327451278l2/. [Accessed: June 30, 2009].
[102] H. Marouani and M. R. Dagenais, “Comparing high resolution timestamps in computer clusters,” in Canadian Conference on Electrical and Computer Engineering, May 2005.
[103] H. Marouani and M. R. Dagenais, “Internal clock drift estimation in computer clusters,” Journal of Computer Systems, Networks, and Communications, May 2008, [Online]. Available: http://www.hindawi.com/journals/jcsnc/2008/583162.html. [Accessed: June 30, 2009].
[104] K.-M. Laventure and L. Malvert, “Lemona: Towards an open architecture for decentralized forensics analysis,” Master’s thesis, Macquarie University, 2008, [Online]. Available: lemona.googlecode.com/files/lemona-thesis-20081117.pdf. [Accessed: June 30, 2009].