Thesis

Revisiting memory assignment semantics in imperative programming languages

RACORDON, Dimitri

Abstract

This thesis studies the semantics of imperative programming languages. In particular, it explores the relationship between the syntax and semantics of memory assignment. The contributions are threefold. First, I developed a theoretical language, called the assignment calculus, to uniformly express the assignment semantics of imperative programming languages. Second, I formalized common memory errors (e.g. access to uninitialized memory and memory leaks) in the context of this language, and provided dynamic and static approaches to prevent them. Third, I developed a general purpose programming language called Anzen, based on the theoretical foundation of the assignment calculus.

Reference

RACORDON, Dimitri. Revisiting memory assignment semantics in imperative programming languages. Thèse de doctorat : Univ. Genève, 2019, no. Sc. 5409

URN : urn:nbn:ch:unige-1271053 DOI : 10.13097/archive-ouverte/unige:127105

Available at: http://archive-ouverte.unige.ch/unige:127105

Disclaimer: layout of this document may differ from the published version.

UNIVERSITÉ DE GENÈVE FACULTÉ DES SCIENCES

Département d’Informatique Professeur D. Buchs

Revisiting Memory Assignment Semantics in Imperative Programming Languages

THÈSE

présentée à la Faculté des sciences de l’Université de Genève pour obtenir le grade de Docteur ès sciences, mention informatique

par

Dimitri Racordon de Torny (FR)

Thèse No 5409

Genève Atelier d’impression ReproMail 2019

To my sons, 翔満 and 耀満.

Acknowledgements

This work would not have been possible without the unfailing support of so many people I had the chance to meet along the way.

First, I would like to express my sincere gratitude to my advisor Professor Didier Buchs. Thank you for believing in me, for pushing me to pursue my research, for channeling my impetuous desire to explore new lands and yet giving me the freedom to choose my own path. I could not have hoped for a better boss.

To my friends and colleagues from the SMV lab, to David Lawrence for sharing with me the first years of this adventure, to Maximilien Colange for giving me a taste for convoluted formal notations, to Alban Linard for our endless discussions about programming languages and Doctor Who, to Damien Morard for patiently letting me rant about the writing of the chapters herein, to Stefan Klikovitz for forgiving my bizarre affection for long and complicated sentences, thank you so much for making this journey possible.

To my friends and colleagues at Socialease, thank you for coping with a CTO who spent more time writing inference rules than planning development sprints.

To my wife Shiori, thank you for your unconditional love and support. You never failed to cheer me up when I was feeling lost. You inspired me every day with your kindness and strength and I cannot wait to live the many experiences that await us by your side.

To my family, thank you for your love and support. My mother, my father and my sister made me who I am today. You are my biggest fans, and a formidable source of inspiration.

To my friends, Aloys, Nadine, Robin, Elliott, Alexandre, Adrien, Noémi, Florianne, Jean-Pierre, Anouck, Christian, Mireille, Sam, Philippe and whoever else I forgot, thank you for being there for me every time I was feeling overwhelmed.

Abstract

Programming languages have become an unavoidable tool, not only for computer experts, but also for scientists and engineers from all horizons. For the sake of usability, modern programming languages typically sit at a very high abstraction level and hide the intricacies of data representation and memory management. The continuing growth in computational power has enabled this evolution, allowing compilers and interpreters to support features once thought unrealistically expensive, such as automatic garbage collection algorithms and powerful static type inference. While this has undeniably contributed to making code simpler to write and clearer to read, relics of the underlying model still transpire in most languages’ semantics. In imperative programming languages, where computation is expressed in terms of successive mutations of a program’s state, leaks at any level of abstraction may lead to unintuitive and/or misunderstood memory assignment behaviors. In particular, the interplay between values and variables can prove to be a prolific source of confusion. While both are usually perceived as interchangeable notions, values are semantic objects that live in memory while variables are syntactic tools to interact with them. As both concepts are not necessarily tethered in a one-to-one relationship, foreseeing the reach of a modifying operation requires a clear understanding of the memory abstraction.

This thesis proposes a model to better reason about memory management. Our first objective is to provide a more accurate description of memory assignment semantics, in a universal and unambiguous way. The resulting model marks a clear distinction between variables and values, which highlights situations where aliasing occurs and situations where assignments may have side effects beyond the mutation of a single variable. Such a model is presented formally by means of a complete semantics, and informally through its application to some non-trivial examples.
It is then used to describe memory errors related to assignment. Two methods are proposed. The first is based on an instrumentation of the dynamic semantics to detect accesses to uninitialized or freed memory. The second relies on a capability-based type system to guarantee memory safety statically. Finally, Anzen, a general purpose language based on the aforementioned model, is introduced as an attempt to empirically validate its practicality.

Résumé

Les langages de programmation sont devenus un outil indispensable, non seulement pour les professionnels de l’informatique, mais aussi pour les scientifiques et ingénieurs d’autres disciplines. Ainsi, dans le but de faciliter leur utilisation, les langages de programmation offrent désormais de nombreuses abstractions visant à masquer les subtilités liées à la représentation des données et à la gestion de la mémoire. La continuelle croissance de la puissance de calcul a rendu cette évolution possible, permettant aux compilateurs et interprètes de supporter des fonctionnalités autrefois jugées irréalistes, telles que la récupération automatique de mémoire ou encore l’inférence de type. Si ces améliorations ont indubitablement contribué à rendre le code plus facile à écrire et plus clair à lire, des reliques du modèle sous-jacent transparaissent toujours dans la sémantique de la plupart des langages, lesquelles peuvent conduire à des comportements mal compris. En particulier, la relation entre valeurs et variables se révèle être une prolifique source de confusion. Alors que ces deux notions sont fréquemment perçues comme interchangeables, une valeur est en fait un objet sémantique vivant dans la mémoire tandis qu’une variable est un outil syntaxique pour interagir avec. De plus, ces concepts ne sont pas nécessairement liés par une relation bijective. Par conséquent, prédire la portée d’une opération de modification requiert une compréhension claire de l’abstraction faite sur la mémoire.

Cette thèse propose un modèle pour mieux raisonner à propos de la gestion de mémoire. Notre premier objectif est de fournir une description plus précise des sémantiques d’assignation, de manière universelle et sans ambiguïté. Le modèle qui en résulte marque une distinction nette entre variables et valeurs, et met en avant les situations dans lesquelles sont créés des alias, ainsi que les situations dans lesquelles une modification peut avoir des effets de bord.
Ce modèle est présenté formellement par le biais d’une sémantique complète, et informellement par le biais de son application à des exemples non triviaux. Il est ensuite utilisé pour décrire les erreurs mémoire liées à l’assignation. Deux méthodes sont proposées. La première consiste en une instrumentation de la sémantique dynamique pour détecter les accès à de la mémoire non initialisée ou libérée. La deuxième repose sur un système de types basé sur des capacités pour garantir des propriétés de sûreté statiquement. Finalement, nous présentons Anzen, un langage de programmation généraliste basé sur le modèle formel susmentionné, dont le but est de valider de manière empirique son utilisabilité.

Contents

Foreword

1 Introduction
1.1 Memory in the Imperative Paradigm
1.2 Motivations
1.3 Contributions
1.4 Outline

2 Background and Related Work
2.1 Memory Management
2.2 Formal Methods
2.3 Static Program Analysis
2.4 Synthesis

3 Mathematical Preliminaries
3.1 General Notations
3.2 Data Types
3.3 Inference Rules

4 The A-Calculus
4.1 Problematic
4.2 Informal Introduction
4.3 Syntax and Semantics
4.4 Examples
4.5 Summary

5 Dynamic Safety
5.1 Observing Memory Errors
5.2 Signaling Memory Errors
5.3 Summary

6 Static Safety
6.1 Rust’s Type System in a Nutshell
6.2 Static Garbage Collection
6.3 Type Checking
6.4 Records and Static Garbage Collection
6.5 Capabilities for Immutability
6.6 Summary

7 Anzen
7.1 Compilers and Interpreters
7.2 Anzen in a Nutshell
7.3 Examples
7.4 Anzen’s Intermediate Representation
7.5 Anzen’s
7.6 Summary

8 Conclusion
8.1 Summary of our Contributions
8.2 Critique
8.3 Future Directions
8.4 The Holy War of Programming Languages

Bibliography

A Formal Companion
A.1 List of Symbols and Notations
A.2 A-Calculus’ Semantics
A.3 A-Calculus’ Type System

B Anzen
B.1 Anzen’s Concrete Syntax
B.2 AIR’s Concrete Syntax

C Full Proof of Property 6.3.2

Chapter 1

Introduction

The last half century has seen the unbridled escalation of software complexity. It has been matched with decades of work in industry and academia, which materialized in outstanding disciplines such as software engineering [126], software testing [8] or software verification [61]. While these methods and tools have proved to be invaluable in assisting developers write correct code, programming languages play an equally if not even more critical role. Good language design is paramount to the production and maintenance of software, as a language is the first weapon in a developer’s arsenal. Due to the ever increasing computational power at our disposal, modern compilers have allowed us to move from cryptic sequences of machine instructions to elaborate programs. As a result, developers can now better convey their intent, at an abstraction level that is closer to the spoken language than ever before. However, this evolution can also be seen as a double-edged sword. Great expressiveness brings the potential for more inadvertent behaviors, as powerful notations are generally paired with heavy cognitive loads [86]. The difficulty lies in properly mapping abstract concepts, such as object-orientation and higher-order functions, onto the rather simple low-level systems which still dominate the vast majority of contemporary computer architectures.

A good representative of this predicament is the notion of aliasing. A typical computer makes use of memory to save and retrieve information. It is organized as a collection of individual storage units, each given a unique address. Aliasing occurs when a program uses different names, a.k.a. references, to designate the same address. This allows for self-referential abstract data structures such as graphs to be represented in memory. Some form of aliasing is indispensable in any realistic programming language.
In fact, aliasing is so ubiquitous that the use of references has become the de facto standard in imperative programming languages, commonly used to hide the specifics of memory management. Unfortunately, this comes at the cost of a remarkable complexity, as it hinders one’s ability to predict the effect of memory modifications. The issue is particularly pervasive in imperative systems which, despite the renewed interest for the functional paradigm, still account for nine of the ten most used languages according to a 2019 Stack Overflow survey1. Most mainstream languages lack the ability to prevent or detect erroneous memory situations. For instance, although publicly released in 2014, Swift is prone to memory leaks, while unintended sharing plagues nearly all imperative programming languages. Consequently, this moves the burden of guaranteeing the correctness of each and every assignment onto the developer. The problem is exacerbated by the poor control modern imperative languages usually support, only offering to modify objects by reassigning their fields to new aliases. As a result, reasoning about the effects of mutation and assignment can turn into an inexhaustible source of frustration. Despite this predicament, aliasing remains a cornerstone of most abstraction strategies. Hence, appropriate techniques to describe and control its effects are essential.

This thesis proposes two constructs to better reason about memory and aliasing. The first is a model to unambiguously describe assignment semantics as found in imperative languages. The second is a capability-based type system to prevent access to uninitialized memory and dangling references, while guaranteeing reference uniqueness [28] and object immutability [116]. Both are introduced in the context of a formal calculus, for which an implementation is also presented in the form of Anzen2, a general purpose programming language.

1.1 Memory in the Imperative Paradigm

Imperative programming is a paradigm in which a program is defined as a sequence of instructions. In other words, it focuses on describing how a program operates at a very operational level, similarly to a cooking recipe. The evolution of the program usually depends on a state, whose changes are expressed in the form of commands that can modify it. On a computer, this state is typically represented by memory and manipulated through value assignments that write data to specific memory locations. Programming languages use variables to refer to these locations, and instructions of the form “l ≔ v” to modify the memory. Here, l is a variable pointing to a specific memory location, and v is a value (e.g. the number 42).

The imperative paradigm is very close to the way a typical computer actually operates, which explains its historical success. However, to cater for the complexity involved in modern software applications, language designers have been compelled to provide techniques that map more abstract concepts onto memory assignments. Object-orientation and the more recent renewed interest in higher-order functional programming are a testament to this quest. Both approaches rely on elaborate interactions with the memory to hide the intricacies of data representations. At their core resides the notion of pointers, which allow a finite one-dimensional array of memory locations to be turned into a graph. Aliasing occurs when two pointers refer to the same memory location, that is, when two edges of the graph point to the same node. Enabling such situations makes it possible to save space and time, by avoiding unnecessary data duplication, and offers an excellent solution to represent self-referential data structures. In fact, representing such structures without aliasing is challenging and inefficient [96].

However, aliases are one of the most prolific sources of errors in software development, in particular when used in an imperative setting. The problem stems from the difficulty of foreseeing the impact of a single instruction. Aliasing breaks the symmetry between the lexical shape of a program and its execution, as operations on one variable can adversely impact other, seemingly unrelated ones. As an example, consider the following program:

let x, y in x ≔ 0 ; y ≔ x ; y ≔ 2 ; x ≔ 4 ; y

Assuming y ≔ x sets y to be an alias on the value of x, this program eventually evaluates to 4³. But this cannot be determined by only looking at the subsequent assignments on y. Instead, one has to carefully identify to which memory location y refers, an information that does not appear explicitly, and track any write access to that location in order to understand where its value was overridden. In other words, one has to construct a mental model of the underlying memory, and integrate this inherently dynamic construct into her reasoning while deciphering a static sequence of instructions. An illustration is given in Figure 1.1.1, representing the evolution of the memory for the last two assignments. Variables are drawn with squares, while the memory to which they refer is drawn with circles.

1 https://insights.stackoverflow.com/survey/2019
2 https://www.anzen-lang.org

[Figure: three snapshots of the variables x and y and the memory cell they refer to. y aliases x’s cell; after y ≔ 2 the cell holds 2, and after x ≔ 4 it holds 4.]

Figure 1.1.1: Effects of memory assignments.
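The behavior of this small program can be reproduced in any language whose assignments propagate references. The following sketch uses Python, a language not discussed in this thesis and chosen here only for brevity, with a one-element list standing in for the shared memory cell:

```python
# Mutable cell shared through aliasing: `y = x` copies the reference,
# not the cell, so writes through either name hit the same location.
x = [0]      # x ≔ 0: x refers to a fresh cell holding 0
y = x        # y ≔ x: y becomes an alias of x's cell
y[0] = 2     # y ≔ 2: writes 2 through the alias
x[0] = 4     # x ≔ 4: overwrites the same cell
print(y[0])  # prints 4, not 2: the program "evaluates" to 4
```

As in the formal example, the final value read through y can only be predicted by tracking every write to the cell both names designate.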

The difficulty of foreseeing the possible side effects of an assignment obviously increases with the size of the program, as well as with the additional features the language may have to offer, such as functions and object-orientation. The remainder of this section discusses less trivial examples of software bugs related to memory management in the presence of aliases.

³ The reader will notice that most mainstream languages do not adopt such assignment semantics for simple values like numbers.

1.1.1 Unintended Sharing of State

Let us first have a look at the “Signing Flaw” security breach of JDK 1.1, which untrusted applets could use to gain all access rights into the virtual machine. In Java, all class instances (i.e. instances of java.lang.Class) hold an array of signers, which act as digitally signed identities that Java’s security architecture uses to determine the class’ access rights at runtime. The value of this array can be obtained by calling java.lang.Class.getSigners. Before the breach was discovered, the method used to simply return an alias on its array of signers. Because the returned array was an alias, an untrusted applet could freely mutate it and add trusted signers to its class, therefore fooling the security system into giving it access to restricted code signed by a trusted authority. The following listing is an excerpt of the flawed java.lang.Class implementation:

Java
    public class Class {
      public Identity[] getSigners() {
        return this.signers;
      }
      private Identity[] signers;
    }

Note that none of the standard protection mechanisms of Java (a.k.a. access control modifiers) can help solve this issue [139]. Although the property is declared private, its reference is exposed through a method, which breaks the encapsulation principle. Using Java’s final keyword would not be of any more assistance. While it would prevent the property from being reassigned, it would not impose any restriction on the array object itself which, consequently, could still be freely mutated. In fact, most languages that use references as the primary way to manipulate memory suffer the same limitation, since they offer pointer immutability but no support for restricting object mutability. The only available solution would be to return a copy of the property.

Unfortunately, copying in the presence of aliasing can also prove misleading. In Java, copying an object only duplicates the references that constitute it, resulting in the creation of an alias for each property of the object being copied. In our above example, this means that copying signers would indeed create a new array, but that the elements of this array would still refer to the same objects as the original one. Although such a strategy is usually efficient in practice, it can also lead to unintended sharing of state. Other copying approaches may exist (or even coexist) in other languages, and must be carefully taken into account.
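The shallow-copy pitfall described above can be illustrated outside Java as well. The following Python sketch is illustrative only; the dictionaries stand in for signer objects, and it contrasts a shallow copy, which shares element references, with a deep copy, which does not:

```python
import copy

# A "signer" modelled as a mutable object; the list plays the role of
# the signers array from the Java example.
trusted = {"name": "trusted-authority"}
signers = [trusted]

shallow = copy.copy(signers)      # new list, but same element references
deep = copy.deepcopy(signers)     # new list AND new element objects

shallow[0]["name"] = "attacker"   # mutates the shared signer object
print(trusted["name"])            # prints "attacker": the shallow copy did not protect it
print(deep[0]["name"])            # prints "trusted-authority": the deep copy is isolated
```

A shallow copy of the array would thus not have closed the JDK breach on its own: the copied array still exposes aliases to the signer objects themselves.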

1.1.2 Use After Free

This second aliasing issue is related to memory deallocation. Use after free errors denote situations where a pointer that refers to already freed memory is dereferenced, as illustrated by the following C++ program:

C++
     1  const char * mk_hello(const char * name) {
     2    ostringstream oss;
     3    oss << "Hello, " << name << "!";
     4    return oss.str().c_str();
     5  }
     6
     7  int main() {
     8    const char * text = mk_hello("Amy");
     9    cout << text << endl;
    10    return 0;
    11  }

In C++, the memory assigned to variables declared in a function is deallocated when the function exits. Those variables are called local. In the above listing, the variable oss acts as a local string buffer that is used to build a character string. It holds an object that abstracts over a C-style character string, internally represented as a pointer to the base address of a contiguous array of characters. The calls to the methods str() and c_str() at line 4 allow one to retrieve the address of said array. However, because oss is a local variable, its address corresponds to memory allocated within the scope of the function mk_hello, which will get deallocated as soon as the function returns, at line 5. As a result, the main function now holds a pointer text to freed memory, and the behavior at line 9 becomes undefined. Like data races, use after free errors can prove particularly challenging to detect and/or reproduce, as they induce undefined behaviors. They are also a source of potential security exploits [98].

1.1.3 Memory Leak

Memory leaks are directly related to memory management as well. Those occur when the system responsible for driving the program execution considers a particular memory location as allocated, whereas there are no pointers to it. While such errors are more common in programming languages that require explicit memory management, memory leaks can still occur within systems that manage memory automatically, in particular those based on reference counting [87]. In a nutshell, reference counting consists in associating a counter with each resource, whose value denotes the number of pointers thereupon. The counter is incremented every time a new alias is made, and decremented every time an alias is removed. If it reaches zero, then the resource is deallocated and the memory locations it occupied are reclaimed by the system. While simple and relatively cheap to implement, reference counting fails to deallocate memory for cyclically referred resources, as their respective counters will obviously never reach zero. The following example illustrates this situation in Swift, which uses reference counting for automatic garbage collection:

Swift
     1  do {
     2    class Person {
     3      var name: String
     4      var loveInterest: Person?
     5    }
     6
     7    let jim = Person(name: "Jim", loveInterest: nil)
     8    jim.loveInterest = Person(
     9      name: "Sarah", loveInterest: jim)
    10
    11    print(jim.loveInterest!.loveInterest!.name)
    12    // Prints "Jim"
    13  }

The class Person comprises a field loveInterest that can hold a reference to another Person instance. At line 8, the love interest of Jim is set to Sarah, whose love interest is set to Jim. As a result, the reference jim now refers to an instance of Person that holds a reference to another instance that cycles back to jim. Therefore, both instances will never be deallocated, even if the reference to jim is no longer accessible once the scope ends at line 11. This cyclic situation is depicted in Figure 1.1.2.

Solving this problem requires either manually breaking the cycle, for instance by assigning nil to jim.loveInterest, or annotating loveInterest’s field definition to prevent Swift’s runtime from increasing the reference counter of the object to which the field is assigned (i.e. using the weak type modifier). Unfortunately, both approaches require a continuous awareness of the language’s garbage collection strategy, which arguably defeats the purpose of a supposedly automated system to abstract over memory management.

[Figure: two Person objects whose loveInterest fields refer to each other, with the variable jim referring to the first.]

Figure 1.1.2: Example of reference cycle as an UML Object Diagram [63].
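The same mechanics can be observed concretely in CPython, whose memory management also combines reference counting with a tracing cycle collector. The sketch below is an illustration under that assumption and is not tied to the Swift example beyond the analogy:

```python
import gc

# A cycle analogous to the Swift example: two objects whose fields
# refer to each other keep each other's reference counters above zero.
class Person:
    def __init__(self, name, love_interest=None):
        self.name = name
        self.love_interest = love_interest

jim = Person("Jim")
jim.love_interest = Person("Sarah", love_interest=jim)
print(jim.love_interest.love_interest.name)  # prints "Jim"

gc.disable()                # leave only reference counting in charge
del jim                     # the cycle is now unreachable, yet not freed
unreachable = gc.collect()  # the tracing cycle collector must reclaim it
gc.enable()
print(unreachable >= 2)     # the cycle was freed by the collector, not by refcounting
```

Reference counting alone never reclaims the two Person objects; only the explicit collection pass does, which mirrors why Swift, lacking such a tracing pass, leaks them.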

1.1.4 Overview of Memory Management Issues

The examples presented above illustrate some of the most common pitfalls of memory management. We summarize them with the following observations:

• For all languages, the consequences of aliasing can be far-reaching, and require a deep understanding of the semantics to be fully foreseen. Moreover, while a programming language may abstract over the notion of object representation, predicting state mutations requires some level of awareness about the runtime system’s memory management strategy, as well as the optimizations it may apply.

• Access control is insufficient to prescribe alias safety, as hidden variables may be aliased by publicly available references. Besides, access control and other safety mechanisms are usually offered as opt-ins, which makes them good practices rather than enforced language constraints. This means learning programmers might not be aware of them, while more seasoned developers might ignore them and rely on their own conventions [99].

• While automatic memory management frees the developer from the responsibility of handling memory allocation and deallocation, its interplay with aliasing can introduce problematic situations that should be accounted for.

• Although the examples above feature object-oriented languages, the problems we highlighted are not necessarily related to object-orientation itself, and could be illustrated in other imperative languages. Rather, the use of encapsulation adds a layer of complexity when reasoning about aliasing.

1.2 Motivations

The starting point of our investigation is an observation about the vast majority of contemporary programming languages. Influenced by the overwhelming popularity of object-orientation, most languages abstract over the notion of memory locations and pointers thereupon. Variables are merely containers for values, without any concern for where such values are actually stored. Combined with the appropriate garbage collection algorithms [142], this theoretically allows memory management to be completely disregarded, as contemporary programming languages now take care of the entire memory lifecycle.

Unfortunately, implementations generally leak details from the object representation they manipulate. In particular, programming languages often distinguish between primitive values and reference values. The former represent data that fit into a processor register, and therefore do not need any abstraction to be assigned to a variable. On the contrary, the latter denote more complex data structures that require more elaborate representations. Consequently, variables assigned to primitive values do actually hold values, while variables assigned to reference values hold pointers thereupon, but abstract over the intricacies of dereferencing. Such design choices have important consequences for memory assignment. As primitive values are not represented by a pointer, they are copied upon assignment. However, assigning a reference value only copies its pointer, effectively resulting in the creation of an alias. While sound, this strategy implies that the semantics of an assignment depends on the type of its operand, an information that is typically not conveyed syntactically. Ironically, purposely manipulating aliases for a finer control is also challenging, as directly mutating the content of a variable without reassigning it to a different memory location is generally impossible.
Put differently, one usually cannot modify the bits representing a particular value, and should instead reassign the references to said value to an entirely different memory location. Consequently, the most straightforward solution to explicitly deal with pointers as first-class components is to wrap references into other objects, essentially reimplementing the concept of pointers at a higher abstraction level. The recent interest in linear type systems [140], as observed in Rust or C++, only adds to the confusion by bringing yet another variant of assignment semantics. Linear type systems treat variables as linear resources [68], which cannot be “cloned”, therefore enforcing uniqueness statically. Actual implementations offer to relax this restriction at function boundaries by means of borrowing mechanisms [104], so that unique values can be temporarily aliased. While borrowing might be paired with additional concepts such as immutability [89], its underlying semantics is in fact identical to aliasing at an operational level. Nonetheless, it is sometimes introduced and understood as a completely separate concept, thus further steepening learning curves. Despite these numerous drawbacks, support for different assignment semantics is useful with respect to performance [128] and safety [96].

Existing formal models describing imperative languages (e.g. [111, 81, 124]) tend to assume a single assignment mechanism, often mimicking the aforementioned pointer abstraction and relying solely on reassignment to perform mutation. More ambitious approaches have proposed a formal semantics for specific languages that support various kinds of assignments, notably including Rust [141]. However, real programming languages come equipped with countless additional features whose inclusion may interfere with a reasoning focused on the effect of assignment. While legitimate criticism has been raised toward the validity of proofs established on minimal calculi [7], a dedicated formal language may prove beneficial either as a means to model existing programs, or as an educational tool to better understand the essence of memory assignment. As a result, our first research question aims to provide such a formal model.

Research Question 1: How can we create a formal model to express the essence of different memory assignment semantics? Can such a model improve the legibility of assignments at a syntactic level, while supporting various kinds of semantics?

As mentioned in the previous section, managing an object lifecycle automatically is not enough to address all memory related issues. The problem lies in the interplay between aliasing and memory assignment. Common safeguard mechanisms such as access control are insufficient, while comprehensive pointer analysis approaches do not scale and produce reports that are difficult to apprehend [104]. In response, a great body of work has been dedicated to the topic of aliasing control, resulting in a plethora of proposals addressing a broad spectrum of issues (see [39] for an extended account). However, as mentioned in the “Geneva Convention” on aliasing [78], identifying the right model to control aliases is a difficult dilemma. One has to balance strong restrictions to guarantee safety with enough freedom to preserve the language’s expressivity. The authors proceed to categorize four approaches:

• detection, consisting of the identification of potential aliasing,
• advertisement, consisting of code annotations aimed at helping an automated system perform detection,
• prevention, consisting of techniques that can guarantee the absence of aliasing statically, and
• control, consisting of mechanisms to tame the effects of aliasing.

While the objective of our research is not to reconsider the benefits of aliasing control, a second research question aims to investigate whether such approaches can fit into a formal model where aliasing occupies a central role. In particular, we are interested in mechanisms that can help assert properties about the treatment of memory. Our main objectives are to manage objects’ lifecycles automatically and to prevent improper memory use.

Research Question 2: What mechanisms for aliasing control can be beneficial in the context of a formal model focusing on memory assignment semantics?

While a formal model may prove useful in reasoning about the theoretical aspects of memory assignments, it may not be suitable to assess the practicality of an actual language based on its principles. Contemporary programming languages usually offer constructs to promote code reusability, such as modules, higher-order functions and generic types. Supporting such features may have non-trivial implications on both dynamic and static semantics. Unlike formal models, implementations cannot afford to discard computational complexity, and thus scalability can be another legitimate concern. Therefore our last research question relates to the design and implementation of a compiler for an actual programming language based on a formal model for assignment semantics.

Research Question 3: How can we create a general purpose language that unambiguously implements various notions of memory assignment? Would such a programming language be practical for writing actual, non-trivial programs?

1.3 Contributions

This thesis is set in the context of imperative programming language design, advocating for a precise notion of assignment to work in concert with sound memory safety mechanisms. This section elaborates on our main contributions, with respect to the research questions that were introduced in Section 1.2.

1.3.1 The A-Calculus

The main contribution of this thesis is the A-calculus (pronounced assignment calculus), a formal system for reasoning about memory management that aims at unifying imperative assignment semantics. It addresses our first research question by featuring three distinct assignment semantics:

Aliasing Assignment An aliasing assignment explicitly assigns aliases. While this matches the semantics of Java for reference values, primitive values are not treated differently, so that the behavior of an aliasing assignment remains agnostic of its operands’ types.

Mutation Assignment A mutation assignment modifies the content of its left operand with a deep copy of its right one. This effectively results in a copy assignment, with the copied object being free of any alias.

Move Assignment A move assignment treats resources linearly [68], and moves values from one reference to another.

Those assignments are defined solely in terms of variables and values, so that their semantics is not tied to a specific memory model. In other words, the A-calculus abstracts over the concepts of stack and heap (cf. Section 2.1.1). Our proposal contains a complete formalization of the A-calculus’ operational semantics, which we use to define memory errors formally. We then show how to instrument its semantics to detect and prevent undefined behaviors linked to an improper use of the memory (i.e. access to uninitialized memory). Lastly, we present various examples of translation from programs written in actual languages to the A-calculus.

Beyond its formal aspect, another contribution of the A-calculus is a precise definition of the relationship between variables and memory. Variables and the values they represent are often perceived as interchangeable concepts. However, while values are semantic objects that live in the memory, variables are syntactic tools to interact with them. Although this subtlety may not be extremely consequential in a purely functional paradigm, as variables are more tightly paired with the values to which they are bound, an unmistakable distinction between variables and values is crucial in an imperative setting to understand a program’s behavior. By relying on distinct assignment operators with an unambiguous semantics, the A-calculus places emphasis on values’ journeys into memory, thus preserving the distinction.

1.3.2 A Static Type System for Aliasing Control

Our contribution toward our second research question is a static type system, which brings the following properties to the A-calculus:

Static Garbage Collection The type system uses Ownership Types [39] to support static garbage collection. Each value is associated with a single owning reference, whose lifetime is bound by the lexical scope in which it is declared. Deallocation occurs when an owner goes out of scope, in a fashion reminiscent of region based memory management [134].

Uniqueness We treat uniqueness as a type capability, whose main purpose is to control ownership transfer. Unlike most other implementations of uniqueness (e.g. [40, 73]), our type system only enforces uniqueness shallowly. In other words, the properties of a “unique” object may be aliased freely.

The type system is sound with respect to memory safety. Further, capability inference is decidable, at the cost of annotations on function signatures. Due to the minimalistic nature of the A-calculus, its type system can also easily be adapted to existing imperative languages [117].

1.3.3 Anzen

Our last contributions are a compiler and an interpreter for the Anzen programming language. Anzen is an attempt to empirically validate the practicality of the A-calculus. The language borrows concepts from the A-calculus and combines them with more elaborate constructs. Its type system supports generic types and comes with a powerful inference engine. While the language features a comprehensive annotation system, most annotations can be omitted. Like Rust, Anzen advocates for safe defaults, so that its syntax puts emphasis on variables and values that may be subject to unintended mutations.

The compiler translates programs into an intermediate representation reminiscent of Java bytecode, but whose semantics is extremely close to the A-calculus. As a result, the interpreter is implemented as a quasi-direct translation of its formal operational semantics. The interpreter allows step-by-step program executions, and leverages Xcode’s debugger [10] to support memory inspection at runtime.

1.4 Outline

This thesis is divided into eight chapters, including this introduction.

• Chapter 2 gives a detailed background on memory management and attaches a clear definition to the key concepts that are discussed in the remainder of the thesis. The chapter continues with a survey of approaches aimed at formally verifying program correctness with respect to memory management. It concludes with a synthesis of related research and positions our work with respect to the literature.

• Chapter 3 briefly introduces the mathematical concepts and notations that are used throughout the formal parts of this thesis.

• Chapter 4 presents the A-calculus, the main contribution of this thesis. The chapter starts with a description of the problem with assignment semantics

in existing programming languages. Then, an informal description of the assignment operators’ semantics precedes a complete formalization of the model. A few examples of translation from existing imperative languages to the A-calculus are proposed.

• Chapter 5 uses the A-calculus to formalize common memory errors, and proposes to instrument its semantics to detect undefined behavior at runtime with a mechanism reminiscent of exceptions [30].

• Chapter 6 presents a static type system for the A-calculus to guarantee memory safety. The system is first introduced informally by means of examples, and then formally by means of inference rules. A soundness proof is discussed before the chapter concludes with some remarks about the issues related to the support of other forms of aliasing control.

• Chapter 7 introduces the Anzen programming language, together with its compiler and interpreter. The chapter starts with some background on compiler design, before delving into Anzen’s features and implementation thereof. Anzen’s intermediate representation (AIR) is described, along with a brief comparison of its semantics with that of the A-calculus.

• Chapter 8 closes this thesis with a summary and a critique of its contributions, and provides directions for future work.

Chapter 2

Background and Related Work

This chapter presents some background and related work on topics closely related to this thesis’ contributions. Its organization is depicted by the following picture. Chevrons represent sections, and rectangular boxes indicate the main themes that are discussed.

[Figure: organization of the chapter. Section 2.1 Memory Management: allocation strategies, garbage collection, object abstractions. Section 2.2 Formal Methods: language calculi, software verification. Section 2.3 Static Analysis: substructural types, ownership & capabilities. Section 2.4: Synthesis.]

2.1 Memory Management

Although a Turing machine theoretically operates on an infinite tape, actual programming languages, despite being labeled Turing complete, obviously can only ever handle a finite amount of memory. “Memory management” is the collective term that designates the methods related to this activity. Their essential requirement is to provide a program with the memory it needs to store the data it manipulates. Computing the total amount of memory a program requires is undecidable, in the same way as the halting problem is undecidable. Consequently, memory management techniques require some dynamic memory allocation mechanism that is able to allocate specific portions of a finite memory at runtime. This mechanism must inevitably be matched with a deallocation strategy to indicate which portions are no longer in use, and might consequently be reclaimed by the system.

This section briefly covers the core concepts related to memory management. In particular, it illustrates the implications of the most common allocation strategies, with respect to memory safety.

2.1.1 Stack vs. Heap Allocation

The phrase “memory management of a language X” is somewhat an abuse of terminology. Memory consumption is inherently linked to the execution of a program, that is, the interpretation of the language in which it is written. Hence, memory management is in fact a concept linked to the execution model of the language (a.k.a. its runtime system), which is responsible for allocating and freeing memory in accordance with its memory management strategy. For example, a typical C runtime system will allocate memory for all local variables of a function when it is called, maintain the address of the memory block reserved by this allocation, and deallocate it once the function returns.

Figure 2.1.1: Anatomy of a call stack (the stack grows toward lower addresses).

    Address  Slot         Value     Role
    0xd0     c            1379      local variable, frame for foo
    0xd4     prev. frame  0xe8      return info for foo
    0xd8     ret. addr.   main:8    return info for foo
    0xdc     b            1337      argument for foo
    0xe0     a            42        argument for foo
    0xe4     x            ?         local variable, frame for main
    0xe8     prev. frame  null      return info for main
    0xec     ret. addr.   libc:xxx  return info for main
    0xf0     argc         1         argument for main

Allocating memory at the beginning of a function for its local variables and deallocating this memory once the function returns is called stack allocation. The term refers to the call stack (a.k.a. the execution stack) of a runtime system. On most computers and virtual machines, this is the structure responsible for storing the execution context of a thread. The details of its implementation are highly platform-specific. But at a more abstract level, it is merely a collection of memory blocks, also called frames. One is added to the collection at every function call, and the last appended block is removed every time a function returns. A frame typically contains the values of the variables and arguments of the function being called, as well as some additional data required to restore the execution context of its caller. As a result, memory for local variables is automatically allocated when a new frame is pushed onto the stack¹, and similarly automatically deallocated when the frame is popped out. An example follows:

Example 2.1.1: Call stack of a C program

Consider the following C program, which illustrates the application of a function onto two integer arguments:

```c
 1  int foo(int a, int b) {
 2      int c = a + b;
 3      return c;
 4  }
 5
 6  int main(int argc) {
 7      int x = foo(42, 1337);
 8      printf("%i\n", x);
 9      return 0;
10  }
```

Figure 2.1.1 depicts the state of an Intel x86 call stack as the program is about to return from foo, at line 3. The lower blocks of the figure correspond to the call frame for the main function, while the upper ones represent the frame for the foo function. Notice how the arguments are laid out first, followed by some information used by the runtime system to restore the context of the caller, and then by the local variables.

To execute the return statement, the runtime system looks at the current value of its frame pointer, updates it to 0xe8 (i.e. the address of the previous frame),

¹ Some runtime systems may push variables individually as their declaration is executed, rather than as a single block when the function starts. While this strategy may have implications with respect to time and space performance, it is irrelevant for the purpose of our discussion.

and sets the program counter to main:8, so the next statement to be executed is a call to printf. In other words, the current frame is “forgotten” and so is its associated memory. Now considered deallocated, it can be overlaid by the next frame that is pushed onto the stack.

The term lexical scope (a.k.a. static scope or simply scope) is often used in concert with stack allocation. Strictly speaking however, a lexical scope does not relate to memory management, but rather to the section of a program in which a given identifier (e.g. a variable name) is visible. In the code of Example 2.1.1 for instance, the variable c’s visibility is delimited by the function foo’s body (i.e. its lexical scope). The notion is relevant in the context of stack allocation because the frame resulting from a function call contains space for all variables in its scope. Consequently, the removal of such a frame from the call stack actually coincides with the point in the program where the variables defined directly within the function’s scope are no longer visible. For example, the moment at which foo’s frame is removed from the stack in Figure 2.1.1 coincides with the moment at which the function foo returns, and therefore with the end of its lexical scope in the source code. For other classes of lexical scopes, such as the branches of a conditional (a.k.a. if) statement, saying that a variable goes “out of scope” usually does not correspond to any actual operation on the memory at runtime. Nonetheless, some programming languages (e.g. C++) allow custom behavior to be defined at these points, typically by calling a particular function, called a destructor, which might be used to perform memory management operations when variables leave their scope.

While conveniently handling allocation and deallocation mechanisms, stack-allocated memory comes with important limitations. First, the size of the variables allocated on the stack must be decided statically (i.e. at compile-time), and cannot change during the execution of a program. Secondly, as memory is automatically linked to the lifetime of a frame, variables allocated on the stack cannot outlive the function in which they were declared.
This also means stack-allocated variables cannot be referred to across threads. Finally, call stacks are commonly rather small in size, especially when implemented at the machine level, leaving them ill-suited for storing large data. Heap allocation solves these shortcomings by using memory from a region that is not managed automatically. Instead, the runtime system typically provides primitives (e.g. malloc and free in C) to manually allocate and deallocate arbitrary sections of that region.

Although heap allocation lifts some limitations of stack allocation, the latter is usually preferred for several reasons. Pointers to stack-allocated memory are usually represented as offsets relative to the current frame, which can be computed at compile-time and thus hardcoded into the bytecode or machine code of a program. For instance in Figure 2.1.1, the pointer to the variable c is the value of the frame pointer decreased by four, and is an invariant in any of the function foo’s invocations. On the other hand, determining the value of a pointer to the heap involves a call to the runtime system which, in addition to a slight performance cost [143], eliminates opportunities for compile time optimizations. Other performance considerations can be made with respect to memory fragmentation [23], which may hinder predictability in real time systems. More importantly, the most significant drawback of heap allocation is that it requires explicit deallocation. As heap memory is not linked to any other particular aspect of a program’s execution, it must be freed manually.

2.1.2 Dynamic Garbage Collection

Managing heap memory manually (e.g. by explicitly calling free in a C program) is tedious and error-prone. Forgetting to free memory creates leaks, which may exhaust available memory, whereas premature deallocations may give rise to other memory errors, potentially leading to unpredictable results or even crashing the program altogether. To alleviate this issue, some runtime systems (or third party libraries) come equipped with garbage collection strategies that operate automatically behind the scenes. There exist several techniques related to automatic garbage collection, but they ordinarily fall under one of two categories.

Tracing garbage collection is arguably the most popular method. In a nutshell, tracing garbage collection consists in periodically identifying and deallocating the memory that is no longer in use, by checking that no chain of pointers (or references) leads from a local or global variable to said memory. In other words, if allocated memory is understood as the nodes of a directed graph connected by pointers, a mark stage creates for each variable a spanning tree of its reachable subgraph, and a sweep stage frees all nodes not present in such trees. An example is given below. While all follow the same two-stage principle, there exist numerous implementations of tracing garbage collection, with varying performance and consequences on memory fragmentation. However, a complete discussion is beyond the scope of this work. A comprehensive account is given in [142].

Example 2.1.2: Tracing garbage collection

Consider the following picture, which illustrates the operation of a tracing garbage collection algorithm at the end of the mark stage:

[Figure: root nodes (e.g. stack pointers) on the left; the heap, a graph of objects linked by references, on the right.]

On the left are represented the root nodes, which usually correspond to the local and global variables of the program. On the right is represented the heap, in the form of a non-connected graph of data structures linked by references. Blue edges represent the branches of the spanning trees constructed from the roots, while other links are left in gray. Similarly, nodes accessible from a root are drawn as blue circles while others are drawn in red. The sweep phase simply consists of deallocating all red nodes.
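The two stages of Example 2.1.2 can be sketched in Python as follows. This is a minimal illustration rather than a production collector; the `Node`, `mark` and `sweep` names are invented for the sketch, and "freeing" a node simply means dropping it from the heap list.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing pointers to other heap nodes
        self.marked = False

def mark(node):
    # Mark stage: walk the subgraph reachable from a root.
    if not node.marked:
        node.marked = True
        for child in node.refs:
            mark(child)

def sweep(heap):
    # Sweep stage: keep only marked nodes, resetting marks for the next cycle.
    live = [n for n in heap if n.marked]
    for n in live:
        n.marked = False
    return live

# Build a -> b -> c, plus an unreachable d that also points to c.
a, b, c, d = (Node(x) for x in "abcd")
a.refs.append(b); b.refs.append(c); d.refs.append(c)
roots, heap = [a], [a, b, c, d]
for r in roots:
    mark(r)
heap = sweep(heap)
print([n.name for n in heap])  # d is collected: ['a', 'b', 'c']
```

Note that c survives even though the garbage node d points to it: reachability is decided from the roots only, which is exactly why cycles among dead objects pose no problem to tracing collectors.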

The cost of the mark stage of a tracing algorithm is linear in the size of the heap. To mitigate the impact on performance, garbage collection is typically run only under certain conditions (e.g. when the total amount of heap allocated memory surpasses an acceptable threshold). The unfortunate consequence of this strategy is that the precise moment at which memory is deallocated becomes unpredictable.

Reference counting solves this latter issue. Representing once again memory blocks as the nodes of a directed graph, reference counting consists in keeping for each node a counter of the number of pointers thereupon. The counter is incremented every time a new pointer is created and decremented every time one is removed (i.e. when a pointer is reassigned or destroyed). If the counter reaches zero, then the associated node is immediately freed. Because the deallocation happens synchronously with the decrement, reference counting can be used to implement predictable destructors (i.e. hooks that are triggered when a node is destroyed). These allow for safer custom allocation and deallocation models, such as the “Resource Acquisition is Initialization” idiom [128]. Besides, counter increments and decrements are relatively cheap operations, and more importantly predictable with respect to performance. Hence, reference counting is usually a better fit for real time systems than its tracing counterpart. Dealing with reference cycles is however challenging, often requiring the addition of special attributes to avoid them (e.g. weak references [55]). Those are error-prone, as they require continuous awareness of the runtime system’s memory management scheme.
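The counting discipline described above can be sketched as follows. The `Counted` class and its `retain`/`release` methods are purely illustrative stand-ins for pointer creation and destruction; the point is that deallocation happens synchronously with the final decrement.

```python
class Counted:
    """A heap object with an intrusive reference count (illustration only)."""
    freed = []                 # records which objects were deallocated, in order

    def __init__(self, name):
        self.name = name
        self.count = 0

    def retain(self):          # a new pointer to the object is created
        self.count += 1

    def release(self):         # a pointer is reassigned or destroyed
        self.count -= 1
        if self.count == 0:    # deallocation is immediate and predictable
            Counted.freed.append(self.name)

obj = Counted("buffer")
obj.retain()   # first pointer
obj.retain()   # an alias
obj.release()  # the alias goes away; count is 1, the object survives
assert Counted.freed == []
obj.release()  # the last pointer goes away; the object is freed synchronously
assert Counted.freed == ["buffer"]
```

A cycle of two such objects retaining each other would keep both counts at one forever, which is the cycle problem mentioned above that weak references are meant to break.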

2.1.3 Static Garbage Collection

Some systems cannot afford to run dynamic garbage collection algorithms concurrently with the program they execute, due to their impact on performance (e.g. in the context of real time systems). Another approach is to statically identify the point from which the value assigned to a particular variable is no longer needed, in order to automatically insert deallocation instructions at those points [21]. This technique solves the main issues of both tracing and reference counting, as it does not put any computational pressure at runtime, and is impervious to reference cycles. Instead, its drawback resides in a weaker flexibility. Determining the lifetime of a pointer at compile-time is undecidable, which theoretically restricts static garbage collection to the smaller subset of cases where it can prove a bound on lifetimes.

In practice there are multiple workarounds. A first one is to simply ignore the cases where lifetimes cannot be statically established and rely on a runtime deallocation strategy. Static garbage collection then becomes a sort of optimization technique that can hopefully reduce the runtime overhead of automatic garbage collection. A popular concretization of this approach is to convert heap allocations to stack allocations whenever possible [22]. Another workaround is to conservatively approximate a maximum bound on lifetimes. While this method may not be able to free memory as soon as possible, it guarantees that memory is never deallocated early, but always eventually.

Even in decidable cases, computing lifetimes can be a costly task that does not scale well with the size of a program. Indeed, a pointer might be passed in and out of numerous function calls, which all have to be considered to determine from where the pointer eventually remains unused. Annotations can help alleviate this issue [13], but they also clutter function signatures.
Another approach is to augment the language’s type system with inferable information related to variable lifetimes. While this usually results in much more constrained programming languages [96], conservative scenarios are simpler to establish, hence addressing the scalability issue. Literature on this technique is discussed in Section 2.3.
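For straight-line code, the idea of inserting deallocation instructions at the point of last use can be illustrated with a toy analysis. The instruction encoding and the `insert_frees` helper below are invented for the sketch; a real analysis would additionally have to handle branches, loops, aliasing, and escaping pointers.

```python
# Each instruction is (destination, right-hand side); a variable's last
# use is the index of the last instruction whose right-hand side reads it.
program = [
    ("x", "alloc"),      # x = allocate(...)
    ("y", "alloc"),      # y = allocate(...)
    ("z", "use x y"),    # z = f(x, y): last use of x and y
    ("r", "use z"),      # r = g(z): last use of z
]

def insert_frees(prog):
    # One forward scan records the last instruction reading each variable.
    last_use = {}
    for i, (dst, rhs) in enumerate(prog):
        for var in rhs.split()[1:]:
            last_use[var] = i
    # A second pass emits a 'free' right after each variable's last use.
    out = []
    for i, instr in enumerate(prog):
        out.append(instr)
        out += [(v, "free") for v, j in last_use.items() if j == i]
    return out

for instr in insert_frees(program):
    print(instr)
```

Note that r is never read, so the sketch conservatively emits no `free` for it: exactly the kind of case that, in a real compiler, would fall back to a runtime deallocation strategy.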

2.1.4 Object Abstraction

One essential pattern of object-oriented programming is the concept of object abstraction. Object abstraction is used to group data into logical objects one can reason about at a higher level. An object is essentially an aggregate of fields (a.k.a. attributes or properties), which typically are themselves represented as other objects. It may further be associated with functions, usually called methods, that can read and modify the object’s fields through a contextual reference (most often this or self) that refers to the object itself. This allows the specifics of its representation to be hidden from the outside, a principle referred to as encapsulation. A good example is a tree structure. Represented as a collection of individual nodes linked from parent to children, a tree object should only be referenced by its root, and all operations thereupon (e.g. inserting a new value) applied from the root and propagated down as necessary. Figure 2.1.2 exemplifies such a representation.

[Figure 2.1.2: Object representation of a binary tree. A tree object holds a root pointer and a size field (5), referencing node objects that each carry key, lhs, and rhs fields; the nodes hold the keys 4 (the root), 3 and 7 (its children), and 5 and 9 (the children of 7).]

The advantages of object abstraction and encapsulation are twofold. First, they allow for a better separation of concerns, drawing a clear boundary between the internal representation of the object and its externally visible interface. Note that a representation is often not in a one-to-one correspondence with the abstraction the object offers. For instance, a binary tree such as that of Figure 2.1.2 could serve as the representation of a list or a set. Fortunately, a clear distinction between interface and representation frees developers from worrying about implementation specifics in most situations. Secondly, object abstraction and encapsulation let one reason about an object’s invariants in complete isolation, precisely because of that separation of concerns. For instance, one may write an internal method on a binary tree that works under the assumption that it may never encounter any circular reference. Thanks to abstraction, this property can be expressed on the nodes directly rather than on the pointers that constitute them, while encapsulation can be used to make strong assumptions about the way an internal representation can be altered.

Nothing in the above definition is said about where the object representation is supposed to be stored, leaving the door open to both stack and heap allocation strategies. Indeed, as an object is essentially an abstraction, its representation could be stored in the form of contiguous memory on the stack or in the form of a collection of pointers to the heap. Nevertheless, object-oriented languages tend to favor the latter, as it is better suited for other common object-oriented patterns, such as inheritance and polymorphism [108].
Furthermore, while static analysis does not necessarily assume a particular allocation strategy, most of the approaches that will be discussed were primarily devised to reason about heap-allocated memory, and so we will adopt the same position in the remainder of this chapter.
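The tree of Figure 2.1.2 can be sketched as an encapsulated object in Python. The `Tree` and `_Node` classes below are illustrative; the leading underscores merely signal that the node representation is internal and that clients should only interact with the root object.

```python
class _Node:
    # Internal representation: not part of the tree's public interface.
    def __init__(self, key):
        self.key, self.lhs, self.rhs = key, None, None

class Tree:
    """A binary search tree referenced only through its root (sketch)."""
    def __init__(self):
        self._root, self._size = None, 0

    def insert(self, key):
        # The operation is applied from the root and propagated down,
        # as described above; clients never touch the nodes directly.
        self._size += 1
        if self._root is None:
            self._root = _Node(key)
            return
        node = self._root
        while True:
            side = "lhs" if key < node.key else "rhs"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(key))
                return
            node = child

    def __len__(self):
        return self._size

t = Tree()
for k in (4, 3, 7, 5, 9):
    t.insert(k)
print(len(t))  # 5, matching the size field of Figure 2.1.2
```

Because the node type is hidden, an internal invariant such as "nodes form no cycle" can be maintained by `insert` alone, which is precisely the reasoning-in-isolation benefit discussed above.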

2.2 Formal Methods

The objective of formal methods is to prove (or disprove) the correctness of a system, with respect to a given specification of that system. Unlike approaches based on testing and/or simulation, formal methods typically do not derive results from the observation of a system’s behavior, but from a mathematical description (i.e. a model) from which the exhaustive set of all possible behaviors can be inferred. This mathematical description must conform to a meta-model, which formally defines how a model can be constructed (i.e. its syntax) and what it means (i.e. its semantics). In other words, a meta-model is a model of a model. There exist innumerable meta-models, specialized to various applications, and a complete review is beyond the scope of this thesis. Instead, this section settles for a far humbler goal and focuses on language calculi.

2.2.1 Language Calculi

A language calculus is a formal system that can express a model of computation, usually in a fashion resembling a minimalistic programming language. A good way to introduce this concept is through the lens of the λ-calculus [83]. The λ-calculus is a model that expresses computation solely in terms of function application, therefore representing the core semantics of functional programming. Syntactically, the meta-model is described by the following grammar.

e ::= x | λx.e | e e
x ::= a, b, c, . . .

In plain English, letting x denote variable names, an expression is either a variable x, an abstraction λx.e, or an application e e. Semantics is expressed by two transformation rules, which describe how to reduce an expression. The first is called α-conversion (a.k.a. α-renaming), and consists in substituting every free occurrence of a particular variable name with another. For example, let ↝ denote the transformation of a term:

(λa.(b λa.b) b)[b/c] ↝ λa.(c λa.c) c

α-conversion is more complex than a naive term substitution, because it does not replace variables that are bound by an abstraction. In fact, the body of an abstraction defines the lexical scope in which its argument is visible. Consequently, it is not affected by the renaming of a variable defined in an enclosing scope, even if it has the same name. For example:

(λa.(b λa.b) b)[a/c] ↝ λc.(b λa.b) b

The second transformation rule is called β-reduction, and describes function application. Let λa.e₁ be an abstraction and (λa.e₁) e₂ its application to an expression e₂; β-reduction consists in substituting every free occurrence of a in e₁ with e₂, using α-conversion to rename bound variables where necessary to avoid capture. For example:

(λa.(b λa.b)) b ↝ (b λa.b)[a/b] ↝ b λa.b

Despite its minimalistic nature, the λ-calculus is equivalent to a Turing machine [137], and can therefore be used to model any kind of programming language. However, this simplicity also presents drawbacks when describing elaborate programming concepts. For instance, expressing boolean algebra in the λ-calculus is tedious and unnatural, whereas representing advanced notions such as classes is simply unrealistic. As a result, a plethora of variants have been proposed over the years. We discuss a few of them below.
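To give a feel for why encoding boolean algebra is tedious, here are the standard Church encodings of the booleans and of conjunction, written as Python lambdas; `to_bool` is a decoding helper added for readability and is not part of the encoding itself.

```python
# Church booleans: a boolean is a function selecting one of two arguments.
TRUE  = lambda x: lambda y: x            # λx.λy.x
FALSE = lambda x: lambda y: y            # λx.λy.y

# Conjunction: AND = λp.λq.p q FALSE
AND = lambda p: lambda q: p(q)(FALSE)

def to_bool(b):
    # Decode a Church boolean by applying it to two concrete markers.
    return b(True)(False)

print(to_bool(AND(TRUE)(TRUE)))    # True
print(to_bool(AND(TRUE)(FALSE)))   # False
```

Even this tiny fragment requires re-deriving familiar operators from function application alone, which is exactly the abstraction gap that motivates richer calculi.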

Calculi for Assignment

Early proposals to bring assignments to the λ-calculus include λvar [111], an extension that features assignable variables. Assignable variables are introduced into an expression by an additional construct let x in e, which delimits their lexical scope. They only differ from regular variables in that they can appear as the left operand of an assignment. Sequences of instructions are handled with monads. A subsequent line of research focuses on equipping λvar with type systems (e.g. [147, 37]) to guarantee freedom from side effects for pure applicative terms.

Another approach from the same era formalizes the notion of memory location, in order to support a call-by-value approach [59]. Similar to λvar, the authors propose assignable variables, but those are interpreted as memory locations in terms’ reductions. This semantics is revisited in ClassicJava [62] to formalize a subset of the Java language.

In his thesis [45], Colby also advocates for a semantics expressed in terms of a transition system that maps program states onto program states, unlike the fixed-point computation commonly used to describe the λ-calculus’ semantics. The advantage resides in that program analyses can be defined over evaluation traces, rather than phrased in terms of a type system’s inference rules. A similar idea is explored in [15] to analyze imperative programs.

Calculi for Objects

The resounding success of object-orientation in the last couple of decades has pushed research to direct a lot of effort toward the formalization of object-oriented calculi. The λ-calculus and close variants are not ideal to reason about object abstractions, as they rely on function application to encode data structures. This is unnatural in the context of object-orientation and yields a significant abstraction gap. Concurrently, higher-order functions are not nearly as pivotal, first because objects propose an alternative to data encapsulation, and secondly because object-oriented programming languages are traditionally more tilted toward an imperative paradigm.

Among the most popular object calculi is Featherweight Java [81], which aims to provide a sound base for studying the consequences of extensions to the Java language. As a result, it actually discards most of Java’s features, including statefulness, and only keeps a functional subset. Evidently, support for assignments is a natural extension for Featherweight Java, and is proposed in a variety of models (e.g. [100, 20]). Other noteworthy extensions include aspects of concurrency [113] and functional programming [17].

More recently, [34] introduces a calculus whose semantics is not defined with an auxiliary memory structure to represent assignable variables’ bindings. Instead, the authors propose to stay at a higher abstraction level and reduce assignable variables to the values they represent. Such values can be either irreducible data (e.g. a number) or another variable. Their approach is set in the context of object-oriented languages, and thus manipulates class instances. Consider for instance the following term, assuming the existence of a class C with a single integer field i:

a = new C(42) ; b = a ; b.i

The term is composed of three instructions.
The first creates a new instance of the class C, initialized with the value 42 for its field i, and assigns it to a variable a. The second instruction assigns an alias on a to the variable b. The third dereferences the field i of the instance referred to by the variable b. By substituting expressions with their respective values, the term can be reduced as follows:

a = new C(42) ; b = a ; b.i  ⇝  a = new C(42) ; a.i  ⇝  42

The advantage is that aliasing is conveyed at the syntactic level, whereas approaches based on an external store require aliasing information to be derived from variable mappings. The language also supports a notion of block to represent hierarchical topologies of references, enabling reasoning about aliasing constraints purely on the syntax. The model is further extended in [67] to integrate aliasing control mechanisms.
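The substitution-based reduction illustrated above can be sketched in a few lines. The following is a minimal interpreter, not the calculus of [34] itself; the term encoding and all names are ours:

```python
# Minimal sketch of substitution-based reduction, in the spirit of a
# store-less calculus: assignable variables reduce to the values they
# represent, and aliases are resolved purely syntactically.

def reduce_term(bindings, expr):
    """bindings: list of (var, value) pairs, where a value is either
    ('new', {field: int, ...}) or ('alias', var). expr: (var, field)."""
    env = dict(bindings)
    var, field = expr
    # Substitute aliases by the variable they refer to, syntactically.
    while env[var][0] == 'alias':
        var = env[var][1]
    kind, fields = env[var]
    assert kind == 'new'
    return fields[field]

# a = new C(42) ; b = a ; b.i  ~>  a = new C(42) ; a.i  ~>  42
term = [('a', ('new', {'i': 42})), ('b', ('alias', 'a'))]
print(reduce_term(term, ('b', 'i')))  # -> 42
```

Note that the aliasing relation between a and b is visible in the term itself, without consulting any store.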

2.2.2 Software Verification Mathematically describing a system allows said system to be formally verified. Furthermore, as formal models are equipped with a formal semantics, such verification can be automated.

Automated Theorem Proving Automated theorem proving (a.k.a. automated deduction) is arguably the most mathematically-oriented approach to software verification, with applications dating back to the 1950s [112]. Despite this seniority, modern tools such as Coq [16] or Isabelle/HOL [109] consistently demonstrate the relevance of (semi-)automated theorem proving, with remarkable results such as CompCert [95], a fully formally verified compiler for the C language. However, a significant drawback of this technique is that it most often demands a heavy manual effort to help the deduction process, along with a very high level of expertise.

Model Checking Model checking [43] covers a vast array of methods that aim at detecting implementation problems by checking whether a system's model follows a given specification. Model checking corresponds to the exhaustive exploration of all possible configurations, so that verification boils down to checking that the specification is satisfied in each of the enumerated configurations. Furthermore, the result of such a check is extremely accurate, as one can determine exactly which configurations violate a given invariant, hence effectively providing concrete counter-examples. Though early papers often refer to the model as a Kripke structure, model checking is in fact agnostic of the formalism used to model a system, as long as its evolutions can be formally defined. Language calculi obviously fulfill these requirements. Programming languages themselves can even be seen as suitable formalisms, though verification tools that can consume real programming languages as an input are scarce. Another, more common approach is to automatically transform source code into a formalism better suited for model checking.

So as to mitigate the complexity of the actual input language, models are often extracted from the intermediate representation (IR) of the compiler (e.g. Java bytecode). Noteworthy efforts in that direction include Java PathFinder [75], which translates Java code into Promela [79], and MoonWalker [48], its counterpart for the Microsoft .NET Framework. Less established tools have been proposed to extract model specifications from LLVM [12, 90]. Unfortunately, the appeal of model checking is often met with the daunting problem of state space explosion [44], in particular when modeling software. The number of states reachable by a typical program is most often too large to be enumerated, because of the combinatorial explosion of the values it can manipulate. Consequently, important research efforts have been aimed at containing this state space explosion [91].
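The exhaustive exploration at the heart of explicit-state model checking, including the production of concrete counter-examples, can be sketched as a breadth-first search. The toy transition system and invariant below are our own illustration:

```python
from collections import deque

def check(initial, successors, invariant):
    """Exhaustively explore all reachable states; return a counter-example
    path to a state violating the invariant, or None if none exists."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            # Rebuild the concrete counter-example from the search tree.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return list(reversed(path))
        for succ in successors(state):
            if succ not in parent:
                parent[succ] = state
                queue.append(succ)
    return None

# Toy system: a counter that may increment or reset; invariant: counter < 4.
trace = check(0, lambda n: [(n + 1) % 5, 0], lambda n: n < 4)
print(trace)  # -> [0, 1, 2, 3, 4]
```

The state space explosion mentioned above corresponds to the `parent` map growing combinatorially with the values a real program manipulates.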

2.3 Static Program Analysis

Unlike the more general formal software verification approaches (c.f. Section 2.2), static program analysis attempts to prove invariants sufficient to prescribe correctness from the shape of a program's evaluation, rather than its actual execution. Put differently, whereas the former aims at proving the full functional correctness of a given system, the latter settles for far humbler goals and focuses on statically identifiable error patterns [94]. For instance, a compiler for a statically typed language will usually produce an error if one attempts to assign a value of mismatching type to a variable, but will be unable to guarantee that a division by zero cannot occur. Static program analysis has been successfully applied to tackle numerous challenges, ranging from code optimization to code defect detection. This section discusses the techniques most related to this thesis' contributions, with a particular focus on Ownership Types and type capabilities.

2.3.1 Typestate Analysis Typestate analysis [66] is a refinement of the concept of type analysis. The traditional purpose of a type, in a typed language, is to restrict the operations supported on a particular value. For instance, a variable might be declared as holding numerical values, therefore allowing operations working on numbers (e.g. addition) to be applied to it, but preventing its use in operations working on character strings (e.g. concatenation). A compiler (or more precisely its type checker) may use this information to accept or reject inputs, hence eliminating programs that would otherwise attempt to apply an operation improperly. A type typically remains constant throughout the entire execution of the program, though many operations may actually be nonsensical at some specific points. For example, while a file descriptor might be used as the argument of a write command, such an operation will most likely fail if the file descriptor has not yet been properly initialized. Following this observation, a typestate adds further restrictions on the set of applicable operations, which then depend on the particular context in which a datum is used. One could define an associated typestate for the file descriptor that describes whether or not it is opened (i.e. initialized), and hence disallow the application of write commands when necessary. Though in their work [127] Strom and Yemini frame typestate in the context of static verification of variable initialization, subsequent research has aimed at verifying more complex object invariants [50, 18]. However, one recurring criticism of typestate systems is that they are difficult to apply in practice.
Adoption has remained scarce in actual programming languages, with the notable exception of Plaid [3], and tools supporting typestate on top of existing programming languages suffer from the burden of maintaining both an implementation and its specification alongside each other. Another common weakness of typestate systems is poor support for aliasing, as propagating typestate changes to all variables that may hold a reference to the same object is often intractable. Most attempts to tackle this issue either rely on expensive whole-program verification approaches [60], focus on strong aliasing restrictions at the expense of expressiveness, or use sophisticated type annotation systems. We elaborate further on these different approaches in the remainder of this section. [144] proposes mixing static and dynamic typing, a practice usually referred to as gradual typing, in order to alleviate the need for complex typestate specifications.
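The file-descriptor example above can be phrased as a small typestate automaton. The states and operation names below are our own illustration, not taken from [127]:

```python
# A typestate for a file descriptor: operations are only applicable in
# certain states, unlike a plain type, which is constant across execution.
TYPESTATE = {
    ('closed', 'open'):  'opened',
    ('opened', 'write'): 'opened',
    ('opened', 'close'): 'closed',
}

def check_trace(operations, state='closed'):
    """Check a sequence of operations against the automaton; return the
    final state, or raise on an ill-typed sequence."""
    for op in operations:
        if (state, op) not in TYPESTATE:
            raise TypeError(f"cannot '{op}' a {state} file descriptor")
        state = TYPESTATE[(state, op)]
    return state

print(check_trace(['open', 'write', 'close']))  # -> closed
try:
    check_trace(['write'])  # writing before opening is rejected
except TypeError as e:
    print(e)
```

A static analyzer applies this check to the operation sequences a program may produce, rather than to a single dynamic trace.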

2.3.2 Ownership Types The core idea behind Ownership Types [110, 42] is to limit the visibility of changes to an object, based on the observation that aliasing and mutation are not intrinsically problematic, but rather that changes in one object may unintentionally break invariants in another. In other words, Ownership Types aim at enforcing the encapsulation principle. The essence of the approach is to partition the memory into topologically nested regions, called ownership contexts, whose ownership is attributed to a particular object. Intuitively, a region often corresponds to the representation of an object, while the owner of such a region is the root of that representation. For example, on the binary tree from Figure 2.1.2, one could delimit an ownership context around each node of the tree, naturally nested within that of its parent. Not all properties of an object must necessarily be part of its ownership context. Instead, some references might be explicitly placed in another context, so as to denote, for instance, shareable objects.

An object is said to be owned by another if it is part of the latter's ownership context. The term "Ownership Types" stems from the fact that information about an object's owner is part of its type, and is usually defined statically. To that end, most Ownership Types systems (or ownership systems for short) propose at least two annotations, conventionally rep and world (a.k.a. norep in older literature), which respectively indicate whether a property is part of an object's representation, or if on the contrary it is to be considered shareable among all components of the program. This in turn can be used by an owner to restrict access to its representation, under different policies. Advocating reusability, ownership systems also generally offer some level of genericity on ownership annotations, so that the same type might be instantiated in different contexts. For instance, one could define a container with a generic parameter that controls in which ownership context its elements should be placed. The advantage of ownership systems over the typical access control most object-oriented languages offer is that aliasing restrictions are baked into references' types. In Java for instance, although one can protect a property from being directly accessed by marking it private, this does not prevent a method from returning a reference to that property, which then might get freely aliased and create a backdoor into the object's internal representation. In an ownership system, the return type of such a method must contain information about the returned value's owner, which can be used to forbid the creation of an alias, or limit its capabilities. The remainder of this sub-section discusses the most influential ownership systems, in particular those related to memory and alias safety. We refer the reader to [39] for a more detailed survey on Ownership Types and their applications to other domains.

Ownership For Topological Restrictions The owners-as-dominators [42] policy (a.k.a. deep ownership) is the first and most restrictive approach. It prescribes that all accesses to an object owned by some owner go through the methods of that same owner, effectively forbidding any outside alias to the internal representation of an object. On the other hand, the owner can freely manipulate aliases within its own representation, and be given permission to alias the direct representation of an enclosing object. In other words, an object a may hold an alias on b if either a is the owner of b, or b is owned by an object higher in the hierarchy than a. This structures the memory as a tree, as siblings may not alias each other's representation, and leads to an interesting theorem stating that an object's owner necessarily appears on any path to that object [41] (hence the term "dominator"). While the owners-as-dominators scheme perfectly captures the essence of encapsulation, it often proves too restrictive in practice. In particular, it excludes patterns that do require access to an internal representation, such as iterators and command objects, which therefore can only be supported through heavy workarounds [40]. The owners-as-modifiers [103, 51] policy relaxes the access restrictions by supporting unconstrained read-only aliases, while updates must pass through the owner. This enables a programming style where the properties of an object can be directly referenced, which is more in line with current software practice that discourages unnecessary getters [76]. An example is depicted in Figure 2.3.3. The dotted rounded rectangles delimit the contexts, whose hierarchy is represented topographically, and whose respective owner is drawn on their top border. The outer rectangle, drawn in gray, represents the world context. Figure 2.3.3a illustrates the owners-as-dominators strategy, which forbids the arrows drawn in red. Figure 2.3.3b illustrates the owners-as-modifiers strategy, which only allows read accesses, denoted with dashed lines, from outside an ownership context.

Figure 2.3.3: Ownership access policies. (a) Owners-as-dominators. (b) Owners-as-modifiers.
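The owners-as-dominators rule lends itself to a direct check over an ownership tree. The sketch below uses our own encoding of a tree-like hierarchy (names `tree`, `size`, `node`, `int` echo the figure) and verifies, for each candidate reference, that the target's owner appears on the source's ownership path:

```python
# Owners-as-dominators: an object a may hold an alias on b only if a is
# the owner of b, or b is owned by an object on a's ownership path up to
# the world context (hence "dominator").
OWNER = {'size': 'tree', 'node': 'tree', 'int': 'node'}  # child -> owner

def ancestors(obj):
    """Ownership path of obj, from itself up to (excluding) world."""
    path = [obj]
    while obj in OWNER:
        obj = OWNER[obj]
        path.append(obj)
    return path

def may_alias(source, target):
    """True iff a reference from source to target respects the policy."""
    return OWNER.get(target, 'world') in ancestors(source) + ['world']

print(may_alias('node', 'int'))   # True: node owns int
print(may_alias('size', 'int'))   # False: into a sibling's representation
```

Note that `may_alias('size', 'node')` holds, since node is owned by tree, which is higher in the hierarchy than size.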

The owners-as-modifiers discipline is among the ownership systems to have received the most attention, in particular with respect to program verification [52]. Nonetheless, numerous other proposals exist, with varying levels of expressiveness. In his thesis [24], Boyapati gives nested class objects2 a privileged access to the internal representation of their enclosing class instances. Furthermore, nested class instances are allowed to be owned externally, hence enabling the implementation of constructs that require external access to an object's representation. For instance, an iterator can be implemented as an inner class of the one representing the associated collection, so as to enjoy unconstrained access to the collection's representation. Tribal Ownership [32] adopts a similar view, but proposes to leverage the natural hierarchy described by virtual class families to infer ownership relations. Ownership Domains [4] generalize the concept of privileged relationships between ownership contexts, allowing an object to define multiple ownership domains and letting the developer specify fine-grained aliasing rules between these domains; they are consequently more flexible. Comparable results can be obtained with systems that allow multiple owners [31, 114]. Arguing that approaches relaxing the constraints on accesses to internal representations treat encapsulation too loosely, the owners-as-accessors [115] discipline does not proscribe any alias (read or write), but ensures they cannot be used (for either reading or writing) unless their owner appears on the call stack. This guarantees the latter the opportunity to maintain its invariants, for example allowing a tree object to properly update its size whenever a node is added to or removed from its internal representation.

2A nested class (a.k.a. inner class) is a class declared within the declaration of another, often used to describe a dependent type.
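The owners-as-accessors condition can be approximated dynamically by inspecting the call stack. The sketch below is our own encoding: an alias may exist freely, but a use is only permitted while the target's owner is among the active callers:

```python
# Owners-as-accessors: aliases are unrestricted, but using one requires
# the target's owner to appear on the call stack, giving the owner the
# opportunity to maintain its invariants around the access.
OWNER = {'node': 'tree'}  # object -> owner; unlisted objects are in world

def may_use(call_stack, target):
    """True iff the target may be accessed given the active callers."""
    owner = OWNER.get(target, 'world')
    return owner == 'world' or owner in call_stack

print(may_use(['main', 'tree'], 'node'))  # True: the owner is active
print(may_use(['main'], 'node'))          # False: tree is not on the stack
```

A static system enforces the same condition by reasoning about all possible call stacks rather than one concrete stack.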

Ownership For Memory Safety Other ownership systems also trade safety assumptions on encapsulation for more relaxed aliasing policies, but preserve important guarantees on memory safety. In Clarke's thesis [41], owner-polymorphic methods may be granted temporary unrestricted access to the internal representation of the object that calls them. In essence, this corresponds to the notion of reference borrowing (c.f. Section 2.3.3), as the model can be understood as an object that temporarily lends its ownership. The rationale is that separation of concerns is preserved, as an owner ultimately retains control over its context, as long as the receiver of an alias is forbidden to store it outside of its method-local variables. An even more permissive system is suggested in Wrigstad's thesis [145], lifting aliasing restrictions for objects whose owners live in either the same or an older stack frame. An example is portrayed in Figure 2.3.4. Black arrows denote valid references, while red ones show illegal ones. In both systems, the resulting ownership relations form a directed acyclic graph rather than a tree, but with the guarantee that dangling pointers may not occur, even with a stack-allocation strategy. [26] formalizes this intuition, combining ownership with region-based memory management. Ownership systems have also been employed to ensure data race freedom. In [25], ownership is associated with the notion of locks (thus the occasional use of the term owners-as-locks in related literature). The ownership system prescribes that acquiring a lock on some owner is the sole and sufficient condition to gain exclusive access to the members of its context. A different approach is taken in [146], using ownership to distinguish between thread-local and shareable portions of the heap. Threads define ownership contexts, within which memory accesses are unconstrained. On the contrary, accessing an object in a shareable portion of the heap requires synchronization.

Figure 2.3.4: Stack-based (a.k.a. generational) ownership.
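The generational discipline of Figure 2.3.4 amounts to a check over stack depths: a reference is legal only if its target's owner lives in the same or an older stack frame, so it cannot outlive its target. The encoding below is our own illustration:

```python
# Generational (stack-based) ownership: frame 0 is the oldest. A
# reference held in frame i toward an object owned in frame j is legal
# iff j <= i: the target is deallocated no earlier than the reference,
# so dangling pointers cannot occur, even with stack allocation.
def legal_reference(source_frame, target_owner_frame):
    return target_owner_frame <= source_frame

print(legal_reference(2, 0))  # True: pointing into an older frame
print(legal_reference(0, 2))  # False: would dangle once frame 2 is popped
```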

2.3.3 Substructural Type Systems Type systems are usually formalized in the form of a deductive system based on inference rules. For instance, the classical Hindley-Milner type system for the λ-calculus defines the following rule to type applications:

Γ ⊢ e1 : τ → σ        Γ ⊢ e2 : τ
────────────────────────────────
Γ ⊢ e1 e2 : σ

This rule states that if it can be deduced that e1 has the function type τ → σ and that e2 has the type τ, then e1 e2 has the type σ. Let Γ be defined as a sequence of assumptions of the form [x1 : τ1, x2 : τ2, ...], denoting that each xi has type τi. Formal deduction offers three structural properties to establish proofs with such an inference rule:

• Exchange states that the order in which the assumptions appear in Γ is irrelevant (e.g. [x1 : τ1, x2 : τ2] ⊢ e : σ =⇒ [x2 : τ2, x1 : τ1] ⊢ e : σ).

• Contraction states that identical assumptions in Γ may be replaced by a single occurrence (e.g. [x1 : τ1, x1 : τ1] ⊢ e : σ =⇒ [x1 : τ1] ⊢ e : σ).

• Weakening states that adding unnecessary assumptions to Γ has no influence on inference (e.g. [x1 : τ1] ⊢ e : σ =⇒ [x1 : τ1, x2 : τ2] ⊢ e : σ).

Substructural type systems are type systems in which one or several of these properties do not hold. The choice of the properties that are discarded leads to different type systems:

• Linear type systems abandon contraction and weakening. Without contraction, variables cannot be used more than once to type an expression; without weakening, they cannot be ignored. Thus a linear type system requires all variables to be used exactly once.

• Affine type systems abandon contraction. Like in linear type systems, variables cannot be used more than once to type an expression without contraction. However, weakening still allows them to be discarded. Thus an affine type system requires all variables to be used at most once.

• Relevant type systems abandon weakening. Like in linear type systems, variables cannot be ignored without weakening. However, contraction still allows them to be used multiple times. Thus a relevant type system requires all variables to be used at least once.

• Ordered type systems abandon all structural properties. Thus all variables must be used exactly once, like in linear type systems, and the order in which they are used cannot be arbitrary.

All four type systems above have practical applications for program analysis [2]; however, linear and affine type systems have received the most attention. Linear types [140] are well suited to reason about memory. A linear resource must be used exactly once, and therefore inferring its lifetime is trivial: once it has been read, its memory can be deallocated. Another evident application for linear types is to assert assumptions on resource usage. Consider for example a file descriptor. Using linearity, a type system can determine statically that an opened descriptor is eventually closed, and that a closed descriptor cannot be read. The alert reader will notice the similarity with the objectives of typestate analysis (c.f. Section 2.3.1). The abandonment of weakening leaves linear type systems difficult to use in practice, because they require that all variables necessarily be read. Affine type systems [135] relax this constraint, and only preserve the uniqueness requirement. This corresponds to the approach adopted by Rust [89], for instance. Linearity also implies uniqueness, because it proscribes cloning. This means that although a linear resource may be successively assigned to different variables, all assignments must necessarily incapacitate their right operand, for instance by setting a reference to the null pointer, or by guaranteeing that it can no longer appear on the right side of an assignment [27] or be passed as a function argument. Hence linear type systems present themselves as an elegant solution to prevent unintended sharing of state.
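The linear, affine and relevant disciplines differ only in which use-counts they accept, which a checker can verify directly (ordered systems additionally constrain the order of uses, which this sketch omits). The term encoding and function names are ours:

```python
from collections import Counter

def check_usage(assumptions, uses, discipline):
    """Check variable uses against a substructural discipline.
    assumptions: variables in the typing context; uses: variables read."""
    counts = Counter(uses)
    for var in assumptions:
        n = counts[var]
        if discipline in ('linear', 'relevant') and n == 0:
            return False  # no weakening: variables cannot be ignored
        if discipline in ('linear', 'affine') and n > 1:
            return False  # no contraction: at most one use
    return True

print(check_usage(['x', 'y'], ['x'], 'affine'))    # True: y discarded
print(check_usage(['x', 'y'], ['x'], 'linear'))    # False: y never used
print(check_usage(['x'], ['x', 'x'], 'relevant'))  # True: reuse allowed
```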

2.3.4 Type Capabilities Originally proposed as a generalization of reference (or pointer) annotations [29], type capabilities (a.k.a. permissions) encompass techniques aimed at extending types with permissions. Just as Ownership Types, they present themselves as properties of references that a compiler may use to reason about the correctness of an instruction, given its use in a particular context. However, whereas most ownership systems typically apply a single policy to all variables (with the notable exception of Ownership Domains [4]), type capabilities can offer a finer level of control over aliasing restrictions. In a sense, they can arguably be seen as a more general yet more powerful extension of Ownership Types. Type capability systems usually aim at formalizing one or both of two properties, namely uniqueness and/or mutability. The former is sufficient to prescribe safety from data races. Indeed, if an object is known to be uniquely referred to, then one can safely assume that any mutation of said object will not break invariants elsewhere in the program. In his seminal work on fractional permissions [28], Boyland sees uniqueness as a fractionable capability, each piece of which is held by an alias on a given object. If the object is uniquely referred to, then its sole reference holds all the pieces, which assembled together form the uniqueness capability, necessary to cause mutation. However, whenever an alias is created, one of these pieces is lent and therefore the uniqueness capability is temporarily lost, until said alias is destroyed. Despite the appeal of representing uniqueness as a fractionable capability, accounts of attempts to use Boyland's system in real settings have proved it hard for software developers [104]. Automated approaches have to rely on pointer analysis, which can prove fragile when applied to large code bases, while using annotations can impede maintainability.
As a result, a handful of related approaches have been proposed to simplify uniqueness management, while retaining sufficient expressiveness to prescribe reasonable guarantees on data safety. Clarke and Wrigstad advocate for external uniqueness as an adequate relaxation [40]. External uniqueness is an extension of the owners-as-dominators scheme, which prohibits multiple references to owners from outside their ownership context, hence allowing the enforcement of object locality at function boundaries. Unlike Boyland's proposal, it naturally aligns itself with encapsulation and therefore can be supported by a fairly simple annotation system. Two operations are supported: movement (which roughly corresponds to the treatment of references as linear resources [68]) and borrowing. The former allows an externally unique object to be reassigned to another variable, provided that the one to which it was originally assigned is nullified, or known to remain unused for the remainder of its lifetime [27]. The latter consists of temporarily lifting the uniqueness of a reference, for the duration of a lexical scope. Since ownership ensures that no reference escapes the borrowed construct, uniqueness can be expected to be recovered at the end of said scope. For instance, a method call would have to borrow an externally unique reference so that it could be used for the duration of the call. This mechanism is analogous to Boyland's fractional permissions, but leverages lexical scoping to be easier to predict. Unfortunately, reliance on the rather constrained deep ownership imposes significant restrictions in terms of expressivity (c.f. Section 2.3.2). The issue can be mitigated by relaxing the ownership [5], though at the cost of a weakened notion of encapsulation.
The approach also requires additional constraints to guarantee safety in multi-threaded contexts, in which ownership alone no longer suffices to prevent data races, as aliases may be accessed in another thread, effectively breaking uniqueness. In [73], Haller and Odersky recede from explicit ownership systems in favor of a dedicated annotation system, pointing out that ad-hoc inference of uniqueness often struggles to treat borrowing elegantly. In their system, a reference is guarded by a certain capability representing a region of the heap, and can only be accessed if such a capability is available. As capabilities denote regions, a unique reference represents in fact an entry point to an externally unique aggregate. Variables can be annotated @unique, so that their capability is treated affinely. In other words, use of a unique reference in a method call consumes its capability and leaves it unusable in all contexts, which contrasts with Clarke and Wrigstad's external uniqueness. Borrowing is indicated by an additional annotation @transient on method parameters, denoting that the method does not consume the capability. Marking a parameter with @peer(y) indicates that it is guarded by the same capability as y, which is necessary to handle cases requiring access to two elements of the same aggregate (e.g. to assign a field of y). Practical experiments with such a type system have been conducted with an extension of Scala's annotations [73], but the system was shown to be unsound when combined with all the intricacies of Scala's type system [121]. A more recent attempt, named LaCasa [72], uses Scala's implicits instead, and focuses on the actor model. Another effort at bringing uniqueness as a built-in capability in the form of an annotation system can be found in [104], with the objective of providing more flexible support for borrowing.
The most distinguishing feature of this approach is that, rather than relying on a purely flow-sensitive inference, the authors propose to annotate parameters with the change of permissions they will experience upon a call to the method. Their type system also supports other capabilities to convey additional properties such as immutability and thread locality, hence allowing a large array of situations to be described.
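Boyland's fractional accounting can be sketched with rational arithmetic: mutation demands the whole permission, and each alias lends away a fraction that must be returned before uniqueness is recovered. The class and method names below are ours, a minimal sketch rather than Boyland's system:

```python
from fractions import Fraction

class Permission:
    """Fractional permission on one object: holding the full fraction 1
    allows writing; any positive fraction allows reading."""
    def __init__(self):
        self.held = Fraction(1)

    def may_write(self):
        return self.held == 1

    def lend(self):
        """Creating an alias lends away half of our fraction."""
        half = self.held / 2
        self.held -= half
        return half

    def restore(self, fraction):
        """Destroying an alias returns its fraction to us."""
        self.held += fraction

p = Permission()
print(p.may_write())   # True: a unique reference holds the full permission
f = p.lend()
print(p.may_write())   # False: uniqueness temporarily lost
p.restore(f)
print(p.may_write())   # True: recovered once the alias is destroyed
```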

In situations where uniqueness cannot be guaranteed or would result in an overly constrained system, mutability (or immutability) can be used instead to derive data race freedom. There, the idea is to limit the contexts in which a particular object may be modified, so as to establish sound assumptions on invariants.3 In [104], a reference can be declared immutable, in which case it can only be used for reading. The type system further imposes that other references to the same object also be immutable, so an object referred to by an immutable reference is guaranteed to remain constant. Nonetheless, such restrictions are not permanent, as a reference may have its write access restored once immutable aliases go out of scope. Similar ideas are presented in [69], which additionally proposes a readable capability to denote references through which an object cannot be mutated, but that do not necessarily proscribe concurrent mutable references.

2.4 Synthesis

Across Chapter 1 and the present chapter, we have introduced various problems related to memory treatment in imperative programming languages, and discussed different techniques to mitigate the difficulty of writing correct programs. Furthermore, we have shown that aliasing poses formidable challenges, as it breaks the symmetry between the lexical shape of a program and that of its execution. We close this journey with a brief summary of these approaches.

Garbage Collection Although automatic garbage collection has the automation of deallocation as its primary objective, its adoption generally allows other error-prone tasks to be avoided [142]. Dynamic garbage collection performs this task at runtime, and thus imposes a non-negligible overhead in time and/or memory. Static garbage collection avoids this overhead, but requires either costly pointer analyses that do not scale well with the size of the program, or conservative assumptions that limit expressiveness.

Language Calculi Language calculi are mathematical tools to reason about computation, and hence about the core semantics of the concepts a language manipulates. Most models for imperative assignments augment the λ-calculus' semantics with a notion of store in order to associate values with assignable variables [111, 59]. The success of object-orientation has motivated a shift from approaches purely based on function application to native support for object abstraction, at the expense of higher-order constructs4 [81]. Although side effects have been studied extensively through the lens of type systems [2], models formalizing different flavors of assignment semantics are lacking.

3This observation is also the foundation for Ownership Types.

Software Verification While software verification promises grandiose benefits, its use in practice faces important limitations. Approaches based on automated theorem proving require a very high level of expertise, unaffordable to most software development efforts. Model checking addresses this issue with a more automated approach to establishing proofs, but does not scale well to large software bases [44].

Type Systems Type systems can be leveraged to establish safety invariants on both memory and aliasing. Ownership Types [39] focus on the latter, and propose to partition the memory into topologically nested regions to restrict aliasing. Most related approaches aim to enforce object encapsulation, while offering enough flexibility to express common programming patterns. Type systems can also check memory safety properties by considering flow-sensitive capabilities, which dictate what operations are allowed to be performed on a particular reference. Typestate analysis formalizes these capabilities in the form of state automata [66]. Substructural [2] and capability-based [29] type systems impose restrictions on resource usage.

Functional programming languages usually do not suffer from memory issues as much as their imperative counterparts. While memory leaks may still constitute a significant concern in implementations based on lazy evaluation [71], memory safety is generally guaranteed, and data races are eluded by the reliance on immutable values. Conversely, mutation is at the core of the imperative paradigm. Therefore, imperative languages cannot afford to fully abstract over memory, despite all the clever ruses that have been discussed above to simplify memory treatment and catch improper use. This thesis argues that such a quest is a non-goal, and that a sane formal model to describe assignment should embrace the concept of memory, in order to mark an unequivocal distinction between variables, memory and the relation therebetween. The remainder of this thesis discusses such a formal model, its properties and implementation. The core of our approach is a language calculus that focuses on assignment semantics, so as to describe these interactions with precision, at the operational level. Although we also abstract over the notion of pointers, the creation of aliases is syntactically explicit.

⁴A first-order object-oriented system can nonetheless encode partial application through defunctionalization [120].

Chapter 3

Mathematical Preliminaries

This chapter introduces the mathematical concepts and notations that are used throughout this thesis. In particular, we describe how we formally represent abstract data structures, such as composite (a.k.a. compound or aggregate) data types and symbol tables (a.k.a. associative arrays). A summary of our notations is provided in Appendix A, and is intended to be a companion for the remainder of this document.

3.1 General Notations

Sets Let A and B be two sets. A ∪ B denotes the union of A and B, A ∩ B denotes their intersection, A − B denotes the set difference of A and B, and A × B denotes their cartesian product. A ⊂ B holds if A is strictly included in B, A ⊆ B holds if either A ⊂ B or A = B, and a ∈ A holds if a is an element of A. The set of all subsets of A is denoted by P(A) = {A′ | A′ ⊆ A}. If A is a finite set, ‖A‖ denotes its cardinality. Let a, b ∈ N; {a ... b} denotes the set {i | i ∈ N ∧ a ≤ i ≤ b}. Similarly, {s_a ... s_b} denotes the set {s_i | a ≤ i ≤ b}.
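As a small illustration of these notations, the powerset P(A) can be enumerated in Python (a sketch using only the standard library; the function name `powerset` is ours):

```python
from itertools import chain, combinations

def powerset(A):
    """P(A): the set of all subsets of A, as a set of frozensets."""
    items = list(A)
    return {frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))}

# ‖P(A)‖ = 2^‖A‖ for a finite set A.
print(len(powerset({1, 2})))  # 4
```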

Words Let Σ be a set of letters. The set of all words over Σ is written Σ∗ and is the minimal set such that:

ε ∈ Σ∗, where ε is the empty word
s ∈ Σ∗, for s ∈ Σ
w1w2 ∈ Σ∗, for w1, w2 ∈ Σ∗

Let w ∈ Σ∗ be a word over Σ. We write s ∈ w if s ∈ Σ appears at least once in w. For example, assuming a set of symbols Σ = {a, b, c} and a word w = aab, then a ∈ w but c ∉ w. ‖w‖ denotes the length of w (e.g. ‖aab‖ = 3), and ‖w‖_s denotes the number of occurrences of s in w (e.g. ‖aab‖_a = 2). The set of words over Σ without letter repetition is written Σ#, and is defined as follows:

w ∈ Σ# ⇔ w ∈ Σ∗ ∧ ∀s ∈ w, ‖w‖_s = 1

Let w ∈ Σ∗ be a word over Σ; w_i represents the i-th letter of w, for any i ∈ {1 ... ‖w‖} (e.g. (aab)_3 = b). Let i, j ∈ N; w_{i,j} represents the substring from the i-th to the j-th letter (e.g. (aab)_{2,3} = ab). Let w ∈ Σ# be a word without repetition and s ∈ w a letter in w; w[s] denotes the position of the letter s in w, that is w[s] = i ⇔ w_i = s.
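These word notations can be mirrored in Python, modeling words over an alphabet as plain strings (an assumption of this sketch; the helper names are ours):

```python
def occurrences(w: str, s: str) -> int:
    """The number of occurrences of letter s in w, written ‖w‖_s."""
    return w.count(s)

def is_repetition_free(w: str) -> bool:
    """w ∈ Σ#: every letter of w appears exactly once."""
    return all(occurrences(w, s) == 1 for s in w)

def position(w: str, s: str) -> int:
    """w[s]: the 1-based position of letter s in a repetition-free word."""
    assert is_repetition_free(w) and s in w
    return w.index(s) + 1

print(occurrences("aab", "a"))  # 2, i.e. ‖aab‖_a = 2
print("aab"[2])                 # b, i.e. (aab)_3 = b (Python indices are 0-based)
print(position("abc", "c"))     # 3, i.e. (abc)[c] = 3
```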

Functions Let f be a function. dom(f) and codom(f) denote its domain and codomain, respectively. The notation f : A → B denotes a total function from a domain A to B, that is dom(f) = A and codom(f) = B. A partial function g : A ⇀ B is characterized by dom(g) ⊆ A.

3.2 Data Types

There is a gap between classic set-theoretic mathematical notation and the notation commonly used in programming languages. As this work is essentially a study of the latter through the lens of the former, it is only natural to aim at bridging this gap.

Composite Types One of the most basic kinds of data structure in programming is the composite type. A composite type denotes any type that is a composition of other types, such as, for example, a pair of integers. The closest set-theoretic equivalent of a composite type is the cartesian product. However, with this approach one cannot access a particular element of the pair without “binding” each component to some variable. For example, let p be a pair; expressing a constraint on its second element can only be achieved by matching p with a pair of variables. More formally, one is compelled to write p = ⟨x, y⟩ ∧ y > 8, whereas most programming languages allow for a far more concise notation, e.g. p.second > 8. One could argue for the use of indices, and write p_i for the i-th element of p. Unfortunately, this approach does not scale well with large tuples. Another alternative is to use subscripts, which resembles the typical syntax of object-oriented programming languages. Unfortunately, such notation is in fact impractical when dealing with chains of aggregate objects. For instance, let t denote the root of a binary tree; referring to a grandchild of t using subscripts becomes barely legible. We choose a third alternative, and represent composite types as families of functions, whose domains represent labels and whose codomains represent the values.

Definition 3.2.1: Composite Type
Let L be a finite set of labels, and d : L → D be a function that returns the domain of a label l ∈ L. A composite type is a family of functions F_{L,d} : L → ⋃_{l∈L} d(l).

Notation: Composite Type

Let L = {l1 ... ln} be a set of labels and D = {D1 ... Dn} be a set of domains. We use the notation ⟨l1 : D1 ... ln : Dn⟩ to denote the composite type F_{L,d} where d is defined such that ∀i ∈ {1 ... n}, d(li) = Di.

Example: The set of pairs of integers is defined by a composite type:

⟨first : Z, second : Z⟩

Put differently, the set of pairs of integers is a family of functions F_{L,d} where L = {first, second}, and d is defined such that d(first) = d(second) = Z.

Just as a cartesian product denotes a set in which one may pick a particular tuple, a composite type denotes a set of functions in which one may pick a particular function, which we call a record.

Definition 3.2.2: Record
Let L be a finite set of labels, and d : L → D be a function that returns the domain of a label l ∈ L. A record is an instance of a composite type, that is a member of the family of functions F_{L,d}.

Notation: Record
When the domain of each label is either known, obvious or irrelevant, we write ⟨l1 ↦ v1 ... ln ↦ vn⟩ the record of a composite type F_{L,d} where L = {l1 ... ln} and v1 ∈ d(l1) ... vn ∈ d(ln).

Example: Let p be a record of ⟨first : Z, second : Z⟩, defined as follows:

p = ⟨first ↦ 42, second ↦ 1337⟩

In other words, p is a pair whose elements are 42 and 1337.

Notation: Dot-notation We borrow the so-called “dot-notation” from programming languages and write a.b rather than a(b). The dot is left associative, i.e. a.b.c = (a.b).c = (a(b))(c). Furthermore, we call a.b a field of a.

Example: Let p be a pair defined by the following record:

p = ⟨first ↦ 42, second ↦ 1337⟩

Then p.second denotes the second field of p, that is p.second = 1337.
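Records and dot-notation have a direct Python analogue (a sketch; using `types.SimpleNamespace` to stand in for a record is an assumption of this illustration):

```python
from types import SimpleNamespace

# A record of the composite type ⟨first : Z, second : Z⟩.
p = SimpleNamespace(first=42, second=1337)
print(p.second)  # 1337: dot-notation selects a field

# Fields may themselves be records, giving legible chains like a.b.c.
t = SimpleNamespace(left=SimpleNamespace(value=1), value=0)
print(t.left.value)  # 1
```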

Notation: Update by extension

Let r : L → D be a record. Let {l1 ... ln} ⊆ L and {v1 ... vn} ⊆ D. We write r′ = r[l1 ↦ v1 ... ln ↦ vn] the update of r, that is the record that maps li to vi but m to r.m for any other m ∈ L. More formally:

∀m ∈ L, r′.m = vi if ∃i such that m = li, and r′.m = r.m otherwise

Example: Let p = ⟨x ↦ 42, y ↦ 1337⟩ be a pair of integers. p′ = p[x ↦ p.x + 2] is the pair p whose first field is incremented by two, that is p′.x = p.x + 2 = 44 and p′.y = p.y = 1337.

Notation: Update by intension
Let r : L → D be a record. We sometimes define updates by intension rather than extension and write r[l ↦ v | p(l)], which reads as r updated for each label l satisfying a predicate p.

Example: Let v = ⟨x ↦ 2, y ↦ 4, z ↦ 8⟩ be a 3-dimensional vector, represented by the coordinates of its tip and assuming its tail is at the origin. The update v′ = v[a ↦ v.a ∗ 2 | a ∈ dom(v)] is the vector v scaled by 2:

v′.x = v.x ∗ 2 = 4 and v′.y = v.y ∗ 2 = 8 and v′.z = v.z ∗ 2 = 16

Notation: Update

Let r : L → D be a record, {l1 ... ln} ⊆ L, and v ∈ D. When updating multiple fields with the same value, we usually write r[l1 ... ln ↦ v] short for r[l1 ↦ v ... ln ↦ v].

Example: Let v = ⟨x ↦ 2, y ↦ 4, z ↦ 8⟩ be a 3-dimensional vector, represented by the coordinates of its tip and assuming its tail is at the origin. The update v′ = v[x, y ↦ 0] is the vector projection of v onto the z-axis:

v′.x = v′.y = 0 and v′.z = v.z = 8

Notation: Deep update
Let a be a record that contains a field a.b that is a record L → D. Let {l1 ... ln} ⊆ L and {v1 ... vn} ⊆ D. We write a[b.l1 ↦ v1 ... b.ln ↦ vn] short for a[b ↦ a.b[l1 ↦ v1 ... ln ↦ vn]].

Example: Let T denote the set of linked lists of integers, inductively defined as the minimal set such that:

ε ∈ T is the empty list.

⟨h : Z, t : T⟩ ⊆ T is the set of linked lists headed by an integer and tailed by another linked list.

Let l = ⟨h ↦ 1, t ↦ ⟨h ↦ 2, t ↦ ⟨h ↦ 4, t ↦ ε⟩⟩⟩ be a record representing the sequence 1, 2, 4 as a linked list. l′ = l[t.h ↦ 3] is the update that only modifies the second element of l, so that l′ represents the sequence 1, 3, 4.
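These update notations map naturally onto dictionary operations in Python (a sketch; modeling records as plain dicts and the helper names `update`/`deep_update` are assumptions of this illustration):

```python
def update(r, **changes):
    """r[l1 -> v1 ...]: a fresh record with the given fields replaced."""
    out = dict(r)
    out.update(changes)
    return out

def deep_update(r, field, **changes):
    """a[b.l -> v]: shorthand for a[b -> a.b[l -> v]]."""
    return update(r, **{field: update(r[field], **changes)})

p = {"x": 42, "y": 1337}
p2 = update(p, x=p["x"] + 2)               # update by extension: p2["x"] == 44
scaled = {k: v * 2 for k, v in p.items()}  # update by intension over all labels

l = {"h": 1, "t": {"h": 2, "t": {"h": 4, "t": None}}}
l2 = deep_update(l, "t", h=3)              # now represents the sequence 1, 3, 4
print(p2, l2["t"]["h"])
```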

Table Types

Some data structures may behave similarly to a record, but not feature a fixed domain. For instance, a bank ledger may be represented as a record mapping names and dates onto transaction details. However, unlike a record, its domain shall grow as more transactions are registered. We call such a structure a table¹ in the remainder of this thesis. Just as with composite types, we use functions to represent instances of table types. Their labels do not have to belong to a finite set, but should share a single domain. An instance of a table type is simply called a table.

Definition 3.2.3: Table Type
Let L be a countable set of labels and V a set of values. A table type is a family of partial functions F_L : L ⇀ V.

¹The terms dictionary, map and associative array are sometimes used in programming languages and literature to represent the same object.

Notation: Table When the set of labels and their domain are either known, obvious or ir- relevant, we write t = {l1 7→ v1 ... ln 7→ vn} the table t: L 9 V where 0 {l1 ... ln} ⊆ L and {v1 ... vn} ⊆ V. We write dom(t) the set L ⊆ L for which t is defined, and write ∅ to denote an empty table t such that dom(t) = ∅.

Example: The following table is a log that maps dates, represented as character strings, onto temperatures:

g = {"19/08" ↦ 24, "08/11" ↦ 19}

Notation: Table updates and insertions

We extend the notation for updates and write t′ = t[l1 ↦ v1 ... ln ↦ vn] an update of t. However, since the domain of a table is not fixed, dom(t′) may differ from dom(t). More formally:

dom(t′) = dom(t) ∪ {l1 ... ln} ∧ ∀m ∈ dom(t′), t′.m = vi if ∃i such that m = li, and t′.m = t.m otherwise

Notation: Table removals

We use the notation t|_{l1 ... ln} to remove {l1 ... ln} from the domain of t. In other words, it reduces the size of the domain of t. More formally, let t′ = t|_{l1 ... ln}, then:

dom(t′) = dom(t) − {l1 ... ln} ∧ ∀m ∈ dom(t′), t′.m = t.m

Example: Let g = {"19/08" ↦ 24, "08/11" ↦ 19} be a temperature log.

g′ = g|_{"19/08"} = {"08/11" ↦ 19}

is an update of g that removes the entry for the 19th of August.

Notation: Table merges

Let t1 and t2 be two tables L ⇀ V. We write t1 ⊎ t2 the merging of the table t2 into t1, formally given by:

t1 ⊎ t2 = t1[x ↦ t2.x | x ∈ dom(t2)]

3.3 Inference Rules
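These table operations correspond closely to Python dictionaries (a sketch illustrating the notations above; the variable names are ours):

```python
g = {"19/08": 24, "08/11": 19}   # a temperature log

# Update/insertion: the domain may grow.
g2 = {**g, "25/12": -2}

# Removal g|_{"19/08"}: restricts the domain.
g3 = {k: v for k, v in g.items() if k != "19/08"}

# Merge t1 ⊎ t2: entries of t2 win on shared labels.
t1, t2 = {"a": 1, "b": 2}, {"b": 20, "c": 30}
merged = {**t1, **t2}
print(g3, merged)
```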

We define operational semantics and type systems with inference rules. An inference rule is a logical statement of the form:

p1  p2  ...  pn
───────────────
       q

p1, p2, ..., pn constitute the rule's premise, which is composed of n hypotheses. q denotes the rule's conclusion. Together, they read as “we can conclude q if all of p1, p2, ..., pn hold”. In other words, it states that (p1 ∧ · · · ∧ pn) ⟹ q, where p1 ... pn, q are predicate terms (e.g. a > b). A rule that has no hypotheses is an axiom. We use inference rules to build formal proofs with a process called resolution. Given a set of inference rules, resolution is a recursive process that consists in attempting to satisfy a proof obligation by first finding an inference rule that concludes it, and then proving that this inference rule's hypotheses hold. The recursion terminates when an axiom is found, or when the hypotheses are not satisfiable. We represent this process with a graphical notation that depicts recursive applications on top of one another.

Example 3.3.1: Sums
Consider the following inference rules, describing the operational semantics of a simple language to compute sums:

        e ∈ Z               e1 ⇓ n1    e2 ⇓ n2
Lit  ─────────       Add  ─────────────────────
        e ⇓ e              add(e1, e2) ⇓ n1 + n2

Computing the evaluation of an expression add(add(2, 3), 4) under this semantics consists in proving a conclusion of the form add(add(2, 3), 4) ⇓ n, where n is the result to deduce, as done in the following proof derivation:

  2 ∈ Z       3 ∈ Z
 ────── Lit  ────── Lit
  2 ⇓ 2       3 ⇓ 3
 ─────────────────── Add     4 ∈ Z
    add(2, 3) ⇓ 5           ────── Lit
                             4 ⇓ 4
 ────────────────────────────────── Add
       add(add(2, 3), 4) ⇓ 9

The derivation eventually shows that n = 9, which corresponds to the evaluated value.
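The Lit and Add rules above can be read as a recursive evaluator; a minimal Python sketch (the tuple encoding of add terms is an assumption of this illustration):

```python
def evaluate(e):
    """Big-step evaluation e ⇓ n for the sum language of Example 3.3.1."""
    if isinstance(e, int):       # Lit: e ∈ Z entails e ⇓ e
        return e
    op, e1, e2 = e               # Add: e1 ⇓ n1 and e2 ⇓ n2
    assert op == "add"
    return evaluate(e1) + evaluate(e2)  # entail add(e1, e2) ⇓ n1 + n2

print(evaluate(("add", ("add", 2, 3), 4)))  # 9, as in the derivation above
```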

We now formally describe this resolution process. Let X denote a set of variables. Let H_X denote the set of Horn clauses [80] over the variables in X. Let t ∈ H_X be a term; t[x/u] denotes the term t in which occurrences of x are substituted with u (e.g. (a ∧ b)[a/c] ≡ c ∧ b). Let σ : X ⇀ H_X be a substitution table that maps variables onto terms; we write t[σ] short for t[x1/σ.x1] ... [xn/σ.xn] for all xi ∈ dom(σ) (e.g. (a ∧ b)[{a ↦ c, b ↦ d}] ≡ c ∧ d). Let T ⊆ H_X be a set of terms; we write T[x/u] (resp. T[σ]) short for {t[x/u] | t ∈ T} (resp. {t[σ] | t ∈ T}). Let t ∈ H_X be a term; vars(t) denotes the variables in t (e.g. vars(a ∧ b) = {a, b}).

Let I be a set of inference rules. Let Q ⊆ H_X be a set of proof obligations. A single resolution step consists in eliminating a proof obligation q ∈ Q by finding an inference rule (P ⟹ q′) ∈ I such that q′ is unifiable (i.e. equivalent up to substitutions) with q, to produce a new set of proof obligations Q′ extended with the hypotheses in P.

Definition 3.3.1: Resolution rule

Let X be a set of variables and H_X a set of Horn clauses over X. Let I ⊆ P(H_X) × H_X be a set of inference rules of the form {p1 ... pn} ⟹ q. Let Q ⊆ H_X be a set of proof obligations and σ : X ⇀ H_X a substitution table. The resolution rule is given by the following statement:

(C ⟹ q′) ∈ I    ∃σ′, q[σ′] ≡ q′[σ′]
─────────────────────────────────────
{q} ∪ Q, σ ⟶ Q[σ′] ∪ C[σ′], σ ⊎ σ′

where vars(Q) ∩ vars(C ⟹ q′) = ∅.

Example 3.3.2: Resolution rule's application
Consider the inference rules presented in Example 3.3.1. Let us apply the resolution's transitive closure ⟶∗ on a term add(1, 2) ⇓ n. We start with the set of proof obligations Q = {add(1, 2) ⇓ n} and an empty substitution table σ = ∅. The first step is computed using Add:

{add(1, 2) ⇓ n}, ∅ ⟶ {1 ⇓ n1, 2 ⇓ n2}, {e1 ↦ 1, e2 ↦ 2, n ↦ n1 + n2}

Next, 1 ⇓ n1 is proven with Lit:

⟶ {1 ∈ Z, 2 ⇓ n2}, {e1 ↦ 1, e2 ↦ 2, n ↦ n1 + n2, e ↦ 1, n1 ↦ 1}

We proceed with a second application of Lit to prove 2 ⇓ n2:

⟶ {1 ∈ Z, 2 ∈ Z}, {e1 ↦ 1, ..., e′ ↦ 2, n2 ↦ 2}

The hypotheses 1 ∈ Z and 2 ∈ Z hold trivially, so the set of proof obligations is emptied:

⟶∗ ∅, {e1 ↦ 1, e2 ↦ 2, n ↦ n1 + n2, e ↦ 1, n1 ↦ 1, e′ ↦ 2, n2 ↦ 2}

The process terminates because there are no more proof obligations, and the substitution table indicates that n = n1 + n2 = 1 + 2 = 3.

A set of inference rules can have different properties that may impact resolution. A system is deterministic if, for each proof obligation, there is only one rule whose conclusion is unifiable with it. For instance, the inference rules presented in Example 3.3.1 form a deterministic system, because only one rule can be applied to evaluate a particular term. A non-deterministic system does not necessarily lead to different results, if all choices eventually produce the same conclusion. In this case, it is said to be confluent. Finally, note that resolution does not necessarily terminate. For instance, a system could contain a rule whose hypotheses recursively require proving the rule's conclusion, so that resolution fails to ever empty the set of proof obligations.

Example 3.3.3: Confluent non-deterministic terminating system
Consider the following inference rules, describing a system that evaluates words:

Empty:            Prefix:  w ⇓ w′       Postfix:  w ⇓ w′
       ─────             ─────────              ─────────
       ε ⇓ ε              sw ⇓ w′                ws ⇓ w′

Both rules have different conclusions. Prefix matches a letter prefixing an arbitrary word, whereas Postfix matches an arbitrary word suffixed by a letter. However, both conclusions can be unifiable with the same obligation. Accordingly, although both rules can be used to prove w ⇓ w′, the result w′ is necessarily ε for any word w. The system is non-deterministic, but confluent and terminating.

Chapter 4

The A-Calculus

Inside every large language is a small language struggling to get out...

Atsushi Igarashi, Benjamin Pierce, Philip Wadler [81]

Imperative programming languages describe computation as sequences of statements whose execution modifies the state of a program. This state is most commonly defined by the content of the memory of the machine or interpreter running the program, and its evolution is driven by assignments, which are particular statements that write (i.e. assign) some value to a specific location in the memory. Hence, all imperative languages feature at least one way to perform these commands, usually through instructions of the form “l := e”, where l represents some memory location, and e the expression of some value. As mentioned in the introduction, the precise semantics of the “:=” operator may vary from one language to another. While this already represents a challenge for developers learning a language, and hinders knowledge transfer, the fact that semantics may also vary within a single language is even more worrisome. Indeed, in an effort to abstract over the complex intricacies of memory management, contemporary programming languages tend to overload a single assignment operator with different meanings, depending on the type of its left and/or right operands. While adequately conveying the developer's intent in most situations, this abstraction usually ends up leaking implementation details in several edge cases, leading to sound yet confusing behaviors for novices and experts alike. In particular, whether or not an assignment creates an alias can prove difficult to evaluate. In this chapter, we propose to take a deep look into assignment semantics, by means of a language calculus that accurately models the different approaches

to assignment. Its most distinguishing feature lies in the use of multiple unequivocal operators, which dispel the ambiguities found in most contemporary programming languages. Hopefully, this results in programs whose semantics is easier to reason about. We first motivate the need for such a theoretical model, before presenting it informally and formally.

4.1 Problematic

Most contemporary programming languages abstract over the notion of pointers. Thus, variables can be understood as the value they ultimately represent, without concern for how they are represented, nor where they are stored. The advantage of this approach is immediate and unequivocal: it allows reasoning in terms of abstract data types. The use of abstract data structures is a natural evolution. For instance, manipulating a single object representing a matrix of 2D points is certainly preferable to dealing explicitly with a pointer to a contiguous range of memory locations, each holding a pointer to a 128-bit structure. However, this introduces a non-trivial challenge as to how to deal with storage. Indeed, whereas data structures may represent very abstract concepts such as lists, trees and tables, computer memory is generally organized in terms of stack and heap, inheriting all the intricacies discussed in Section 2.1. A simple way to overcome this issue is to primarily allocate objects on the heap and rely on some form of automatic garbage collection (see Section 2.1.2). We call the languages that adopt this strategy reference-based languages throughout this thesis, since almost all variables actually represent references (i.e. pointers) to heap addresses. Examples of such languages include Java, Python and Smalltalk. Conversely, we say that a language is stack-based¹ if its notion of variable is more tightly associated with the call stack, and pointers (if supported) are manipulated explicitly. Examples include C, Rust and Ada. Reference-based languages alleviate the burden of memory management. However, the abstraction often leaks implementation details, leading to confusing assignment semantics. For instance, many runtime systems store values of primitive types like numbers and booleans directly on the call stack, mainly for performance reasons.
Furthermore, they usually implement assignments as a straightforward “bitwise copy” of the right operand, which is typically available as a single CPU instruction and thus rather fast. Unfortunately, this strategy can introduce subtle differences between values stored on the stack and those stored on the heap. More precisely, the former get actually copied, whereas the latter only see the value of their pointer copied. Consequently, assignments whose right operands are in fact

¹Not to be confused with stack-oriented programming languages, in which a program is expressed directly in terms of stack operations.

pointers to heap-allocated memory effectively result in the creation of an alias, which can easily go unnoticed and cause unintended sharing of state, as demonstrated in Section 1.1. An example follows:

Example 4.1.1: Effect of bitwise copy assignment
Consider the following figure, depicting a situation where two local variables x and y (represented by hashed cells) are being assigned the value of two other variables a and b (represented by plain cells), respectively.

Stack (address, variable, content):
  0xd4  y  0x2a
  0xd8  x  42
  0xdc  b  0x2a
  0xe0  a  42
Heap: an object stored at address 0x2a.

Variable a holds a machine-size integer, i.e. a 32-bit sequence in this particular example, stored at the stack address 0xe0. The variable being used as the right operand of a bitwise copy assignment, that 32-bit sequence is copied, as is, and stored at 0xd8, which represents variable x. As a result, a and x are now two distinct variables holding two distinct values. Variable b on the other hand holds a pointer to a heap address, represented here as a 32-bit sequence stored at the stack address 0xdc. Just like the assignment to x, this 32-bit sequence is copied, as is, and stored at 0xd4, which represents variable y. However, the copy is applied to the pointer's value, while the object on the heap is completely ignored. As a result, b and y are now two distinct variables, but whose values actually point to the same object in the heap. In other words, y is effectively an alias of b.

At the language level, seemingly identical operations may have varying semantics. For instance in Python, using an augmented assignment statement (e.g. a += b) or expanding the statement (e.g. a = a + b) is semantically equivalent for the reference being assigned, but has different side effects, as illustrated in this example:

Example 4.1.2: Inconsistent assignment semantics

1 foo = [1, 2]
2 bar = foo
3 foo += [3]
4 print(bar)
5 # Prints "[1, 2, 3]"

foo is aliased at line 2 by a variable bar. Its value is mutated at line 3 using the augmented assignment +=, so bar can also observe this change.

1 foo = [1, 2]
2 bar = foo
3 foo = foo + [3]
4 print(bar)
5 # Prints "[1, 2]"

Here, foo is reassigned at line 3, and therefore bar can no longer observe this change.
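The difference between the two programs can be checked directly with Python's identity operator (a small sketch):

```python
foo = [1, 2]
bar = foo
foo += [3]           # list.__iadd__ mutates the shared object in place
print(bar is foo)    # True: still one object, observed through two names

foo = foo + [3]      # list.__add__ builds a new list; foo is rebound
print(bar is foo)    # False: bar still refers to the previous object
```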

Purposely manipulating references can also prove problematic. Indeed, if assigning to a variable holding a pointer necessarily rebinds said variable to another object, we run the risk of unintentionally breaking a chain of references. The workaround is then to rely only on self-modifying methods, or to wrap objects within containers. The latter essentially boils down to emulating pointers, the very concept being abstracted away. The problem is exacerbated by the fact that programming languages commonly overload a single assignment operator, whose semantics consequently depends on the type of its operands. For instance, Swift has distinct assignment semantics for what it calls reference types (e.g. classes) and value types (e.g. structures). The former follow Java's semantics, while assignments of value types create copies, as illustrated in the example below. Languages that, like (modern²) C++ or Rust, support move semantics along with other assignment policies add yet another source of confusion.

Example 4.1.3: Implicit assignment semantics in Swift.

1 class C { var m: Int }
2 var x = C(m: 15)
3 var y = x
4 x.m = 20
5 print(y.m)
6 // Prints "20"

Applied to reference types, assignments result in the creation of aliases.

²Move semantics were introduced in C++11.

1 struct S { var m: Int }
2 var x = S(m: 15)
3 var y = x
4 x.m = 20
5 print(y.m)
6 // Prints "15"

On value types, assignments result in an actual copy.

Overloading a single assignment operator can also introduce additional issues due to interactions with other language features. Swift for example features a powerful system of interfaces that lets a developer design specifications that can later be implemented by concrete types. Furthermore, the language allows function signatures to be defined over interfaces and concrete types alike, unlocking a lot of potential for genericity. However, as an interface can be implemented by both value and reference types, there is no way to determine the semantics of an assignment involving an argument typed with one.

Example 4.1.4: Obfuscated assignment semantics
Consider the following program:

Swift
1 protocol Fooable {
2     var bar: Int { get set }
3 }
4 func f(a: Fooable) -> Fooable {
5     var b = a
6     b.bar = 20
7     return b
8 }

Fooable is a protocol (the equivalent of e.g. a Java interface) that specifies a member bar. The function f expects a value of any type conforming to this protocol as parameter. If it is called with an argument of a value type, then the assignment at line 5 is a copy and f is pure. On the other hand, if it is called with an argument of a reference type, then the assignment at line 5 creates an alias and f has a side effect on its argument.

These subtle semantic differences have far-reaching consequences that are hard to understand for beginners, and can significantly steepen the learning curve, as pointers and references consistently appear as a difficult aspect of programming [132]. Furthermore, while seasoned developers may hold a more solid grasp on these notions, errors are likely to occur in refactoring. Note that the root of the problem is not the abstraction of pointers itself. It lifts the burden of explicit dereferencing (i.e. the act of getting the value that is stored at the memory referenced by a pointer), and removes the need to distinguish between stack and heap allocations, which is arguably beneficial to the developer. The issue is that the abstraction leaks implementation details when dealing with assignments. Choosing one type of assignment over the others is not a good solution either, as different situations call for different semantics. Indeed, it makes sense in some cases to prefer mutation over reassignment, and vice versa. A good abstraction should handle both scenarios consistently, for all kinds of values a program may manipulate, independently of where and how they are stored at the implementation level.

4.2 Informal Introduction

This section gives an overview of the A-calculus. We start by introducing its multiple unequivocal assignment operators, and describe their semantics informally. We then move on to describe the relationship between these operators and function parameter passing. We illustrate our framework in the context of a hypothetical programming language, whose formal description is provided by means of a minimal calculus in Section 4.3. An implementation of the A-calculus is discussed in Chapter 7, in the form of the Anzen programming language.

4.2.1 Assignment Operators

Our framework features three assignment operators, with a clear and unambiguous semantics, independent of the type of their operands:

• An aliasing operator &- creates an alias to the object on its right for the variable on its left. Its semantics is the closest to what is generally understood as an assignment in reference-based languages.

• A mutation operator := mutates the object assigned to the variable on its left with a copy of the object assigned to the variable on its right. The copying process is applied transitively to each of the source object's properties, so that no alias remains. Unlike the aliasing operator, this one mutates the value of its left operand – provided there was any – rather than (re)assigning the variable to a different memory location.

• A move operator ← moves the object from the variable on its right to the variable on its left, effectively treating the object affinely [135]. This operator is deeply intertwined with a notion of (external) uniqueness and ownership, and therefore assumes the moved object is not aliased.

[Figure 4.2.1, reconstructed textually from the three before/after diagrams:]
(a) Aliasing operator: after b &- a, both a and b are bound to a's memory location, holding 8.
(b) Mutation operator: after b := a, a and b keep distinct locations, but b's location now holds a copy of the value 8.
(c) Move operator: after b ← a, the value 8 is moved into b's location, and a is left unbound.

Figure 4.2.1: Effect of assignment operators. Each illustration depicts the situations before and after a particular assignment, starting from a state where two variables a and b are bound to unrelated memory locations, holding the values 8 and 4 respectively. Changes are highlighted in color.

Notice that we use the term “variable” as opposed to “pointer”. Similarly, we use the term “memory” rather than “stack” or “heap”. The reason is that we aim at making a clear distinction between the semantics of our operators and the actual memory model that is used to implement them. Whether a variable is a pointer to heap-allocated memory, or the address of some primitive type allocated on the stack, should be irrelevant for the developer. That way she may focus solely on the semantics of her program, at an abstraction level that does not bother with the specifics of compilation, optimization and/or interpretation. Consequently, fields of a record can be understood as variables as well.
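To make the contrast between the three operators concrete, they can be emulated in Python with an explicit store mapping variables to memory locations (a sketch; the `Store` class and its location counter are inventions of this illustration, not part of the A-calculus):

```python
import itertools
from copy import deepcopy

class Store:
    """A toy model: variables map to locations, locations hold values."""
    def __init__(self):
        self._fresh = itertools.count()
        self.bindings = {}   # variable -> location
        self.memory = {}     # location -> value

    def declare(self, var, value):
        loc = next(self._fresh)
        self.bindings[var] = loc
        self.memory[loc] = value

    def alias(self, dst, src):
        """dst &- src: rebind dst to src's location."""
        self.bindings[dst] = self.bindings[src]

    def mutate(self, dst, src):
        """dst := src: write a deep copy of src's value into dst's location."""
        self.memory[self.bindings[dst]] = deepcopy(self.memory[self.bindings[src]])

    def move(self, dst, src):
        """dst <- src: transfer src's value to dst and unbind src."""
        self.memory[self.bindings[dst]] = self.memory.pop(self.bindings[src])
        del self.bindings[src]

s = Store()
s.declare("a", 8)
s.declare("b", 4)
s.alias("b", "a")
print(s.bindings["a"] == s.bindings["b"])  # True: b is now an alias of a
```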

4.2.2 Parameter Passing

There is a compelling connection between variable assignment and the passing of parameters in a function call. In fact, just as most runtime systems implement assignments as a straightforward “bitwise copy”, they often do the same for function arguments. In other words, those get a bitwise copy of the argument's stack value. This makes sense from an implementation perspective, as it preserves the symmetry with assignment semantics. Unfortunately, it may also cause the same confusing situations as those we described in Section 4.1, with an extra layer of complexity introduced by the transfer of control flow. We tackle the issue by reusing the three aforementioned assignment operators for function arguments. A noteworthy advantage of this approach is that it lets us explicitly define the intended passing semantics. The use of an aliasing operator corresponds to a pass-by-alias policy, while the use of a mutation operator corresponds to a pass-by-value policy [46]. Unsurprisingly, the move operator treats objects as linear resources, and therefore mimics the affine semantics of Rust and C++. In order to limit the syntactic overhead of our examples, we use parameters' names as the left operand of an assignment statement to represent an argument together with its passing semantics. For instance, we write f(x &- a, y := b) the application of a function f with two parameters x and y, on two arguments a and b passed by alias and value, respectively. Similarly, we also reuse the assignment operators to describe how return values should be handled. The aliasing operator corresponds to a return-by-alias, the mutation operator corresponds to a return-by-value, and the move operator retains its linear semantics. An example follows:

Example 4.2.1: Explicit parameter passing semantics
Consider the following program:

A-calculus
1 fun f(x) {
2     x := 3
3     return &-x
4 }
5 let a <- 6
6 let b &- f(x &- a)

A first variable a is declared and initialized at line 5. A second variable b is declared at the next line, and bound to an alias of the result of the call to f. f is called with an alias on a, assigned to its x parameter. It mutates its value at line 2, before returning it by reference at line 3. The figure below illustrates the links between variables and memory after each line of the program. Changes are highlighted in orange. The dashed container “zooms” into f’s evaluation.

[Figure: variable–memory links after each line of the program; the dashed container details the evaluation of f.]
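The contrast between the aliasing and mutation policies can be sketched with a toy Python model of the pointer and memory tables (the dict-based encoding and every name below are ours and purely illustrative; the formal model follows in Section 4.3):

```python
# Toy model: `pi` maps variable names to locations, `mu` maps locations to values.
# All names here are illustrative; the thesis formalizes this in Section 4.3.

def call_by_alias(pi, mu, param, arg):
    """Pass-by-alias (&-): the parameter shares the argument's location."""
    pi[param] = pi[arg]

def call_by_value(pi, mu, param, arg, fresh):
    """Pass-by-value (:=): the parameter gets a copy at a fresh location."""
    pi[param] = fresh
    mu[fresh] = mu[pi[arg]]

# Mimic Example 4.2.1: f(x &- a) with body `x := 3`.
pi, mu = {"a": 1}, {1: 6}
call_by_alias(pi, mu, "x", "a")
mu[pi["x"]] = 3           # the body mutates x, hence a as well
assert mu[pi["a"]] == 3   # the caller observes the mutation

# Same call with pass-by-value: the caller's cell is untouched.
pi2, mu2 = {"a": 1}, {1: 6}
call_by_value(pi2, mu2, "x", "a", fresh=2)
mu2[pi2["x"]] = 3
assert mu2[pi2["a"]] == 6
```

With pass-by-alias, the callee’s write at line 2 is visible through a; with pass-by-value, it is not.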

The advantages of this approach are twofold. First, instead of having to opt for one particular passing policy, we let the developer choose the most appropriate one for herself, and provide her with the means to do so explicitly. We should stress that languages that support a first-class notion of reference share a similar ability. However, our system is more powerful, as we do not fix the choice at the function declaration, so that a different passing policy can be applied at each function call without having to resort to overloading. Second, reusing the assignment operators makes the relationship between assignment and parameter passing explicit, which we believe to be beneficial for educational purposes. Even the induced syntactic overhead may in fact be seen as an improvement. Indeed, whereas it is traditional to identify parameters by their position, one can leverage parameter names to improve the legibility of the code. For instance, the expression iter(from <- 0, by_steps_of <- 2) is arguably more self-explanatory than iter(0, 2). This feature is often referred to as “named parameters”.

4.3 Syntax and Semantics

This section formalizes the operational semantics of the A-calculus. We borrow from the λ-calculus with assignment [111] to represent the underlying computational model, which we extend with specific terms related to the semantics we presented informally in the previous section. As a result, a program is expressed by a single term t ∈ AC, where AC is a set that describes the terms of a variant of the classic λ-calculus with assignment, extended with our assignment operators. We use the words “term” and “program” interchangeably in the remainder of this thesis. We make two assumptions on the memory model. First, we consider memory as an infinite resource, and do not formalize allocation failures. Second and more importantly, we do not distinguish between heap and stack, and do not assume memory to ever be automatically garbage collected.

4.3.1 Abstract Syntax

We start by formally defining the abstract syntax of the A-calculus.

Definition 4.3.1: Abstract syntax

Assume a set of atomic literals, written A, typically denoting instances of primitive data types (e.g. numbers and booleans). Assume a countable set of identifiers, written X, denoting valid variable identifiers. Let O = { &-, ←, := } denote the set of assignment operators. The set of terms AC is defined recursively as the minimal set such that:

    a ∈ A =⇒ a ∈ AC                              an atom
    x ∈ X =⇒ x ∈ AC                              a variable
    x ∈ X ∧ t ∈ AC =⇒ let x in t ∈ AC            a declaration
    x ∈ X =⇒ alloc x ∈ AC                        an allocation
    t ∈ AC =⇒ del t ∈ AC                         a deallocation
    x ∈ X# ∧ t ∈ AC =⇒ λx . t ∈ AC               a function
    ⊙ ∈ O ∧ x ∈ X ∧ t ∈ AC =⇒ x ⊙ t ∈ AC         an assignment
    t ∈ AC ∧ u ∈ (X × O × AC)# =⇒ t(u) ∈ AC      an application
    ⊙ ∈ O ∧ t ∈ AC =⇒ ret ⊙ t ∈ AC               a return
    t1, t2 ∈ AC =⇒ t1 ; t2 ∈ AC                  a sequence
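For readers who prefer code, the grammar above can be transcribed as a set of Python dataclasses. This is a sketch under our own naming conventions; the strings "&-", "<-" and ":=" stand for the aliasing, move and mutation operators.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class Atom:                 # a ∈ A
    value: object

@dataclass
class Var:                  # x ∈ X
    name: str

@dataclass
class Let:                  # let x in t
    name: str
    body: "Term"

@dataclass
class Alloc:                # alloc x
    name: str

@dataclass
class Del:                  # del t
    term: "Term"

@dataclass
class Fun:                  # λx . t, with a word of parameter names
    params: Tuple[str, ...]
    body: "Term"

@dataclass
class Assign:               # x ⊙ t, op in {"&-", "<-", ":="}
    op: str
    name: str
    term: "Term"

@dataclass
class Apply:                # t(u), u a word of (identifier, operator, term)
    callee: "Term"
    args: Tuple[Tuple[str, str, "Term"], ...]

@dataclass
class Ret:                  # ret ⊙ t
    op: str
    term: "Term"

@dataclass
class Seq:                  # t1 ; t2
    first: "Term"
    second: "Term"

Term = Union[Atom, Var, Let, Alloc, Del, Fun, Assign, Apply, Ret, Seq]

# `let x in alloc x ; x <- 2` from the text:
t = Let("x", Seq(Alloc("x"), Assign("<-", "x", Atom(2))))
```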

Although parameter and argument lists are represented by words, we sometimes use commas to better distinguish between each individual subterm. For instance, we write λx, y . t rather than λxy . t. All functions are anonymous but can be assigned to variables. For instance, we write let f in alloc f ; f ← λx, z . ... to bind the variable f to a function. Consequently, recursion must be expressed by means of the Y combinator. Scopes are delimited implicitly. We sometimes use indentation to better visualize logical containment in the abstract syntax, although this is a purely stylistic choice that impacts neither the syntax nor the semantics of a term. Variables cannot be declared and bound to a value at once. Instead, variable declarations merely introduce a name into a new scope, in which allocation and initialization must be carried out. For instance, let x in alloc x ; x ← 2 denotes the declaration, allocation and initialization of a variable. Splitting declaration, allocation and initialization allows us to accurately model languages that do not necessarily couple them (e.g. C). In addition, it also makes the delimitation of lexical scopes extremely precise, as the order in which variables go in and out of scope is conveyed unambiguously. Figure 4.3.2 illustrates this idea. As the scope in which y is defined is shorter than that of x, the function call at the end refers to an undefined variable. Functions also delimit scopes, as the lifetime of an argument name is bound to that of their body. In the interest of preserving a symmetry between function parameters and variable declarations, the order in which arguments go out of scope is inverse to the order in which parameters are declared. For instance, argument x goes out of scope before y in the function λx, y . t. For the sake of legibility and when the context allows it, we sometimes write let x1 ... xn in t as short for let x1 in ... let xn in t. Similarly, we sometimes write alloc x1 ... xn as short for alloc x1 ; ... ; alloc xn. These are purely syntactic sugar and as such have no impact on the semantics. We use parentheses liberally to delimit scopes and clarify precedence. Otherwise, sequences of terms are always assumed to be declared in the innermost scope; for instance, both assignments are enclosed in the function’s body in λx . x ← 2 ; x ← 3. Function application is left-associative, for instance f(x ← 1)(y ← 2)(z ← 3) = (f(x ← 1)(y ← 2))(z ← 3). In contrast, the sequence operator ; is right-associative, that is t1 ; t2 ; t3 = t1 ; (t2 ; t3).

    let x in (alloc x ; let y in alloc y ; y ← 2 ; x &- y) ; f (z ← y)

Figure 4.3.2: Example of lexical scoping. The lifetime of x covers the whole parenthesized term, while the lifetime of y ends with the inner sequence; y has therefore gone out of scope by the time f(z ← y) is evaluated.

We bring the reader’s attention to the fact that return expressions are made explicit. This is a departure from most variants of λ-calculi (and functional languages in general), in which return values typically correspond to the evaluation of a function’s body. We cannot take the same direction, because our syntax needs to make the return policy explicit, by means of our assignment operators. Furthermore, although sequences of return statements are not syntactically ill-formed, the actual return value of a function is defined by the last return statement it executes. For instance, the function λ . ret ← 1 ; ret ← 2 always returns 2 by move. In other words, a function never “returns early” and instead ignores non-final return statements.

4.3.2 Operational Semantics

Unlike most variants of λ-calculi, the A-calculus has to factor the presence of memory into the evaluation of a term, because it has to distinguish between values and references thereupon. As a result, expressions are in effect evaluated as references (i.e. memory locations) rather than values, which is purposely reminiscent of the way variables are typically handled in the runtime systems of reference-based languages. We use the set of natural numbers N to denote memory locations. So as to avoid formalizing value representations beyond what is strictly necessary with respect to assignment semantics, we neglect data sizes and assume that a location l ∈ N can hold an arbitrarily large value. Recall also that we assume an unbounded memory, and therefore do not cap the number of memory locations a program may handle at once. We reserve 0 ∈ N to represent the null pointer. We define the set of semantic values V that describes the data a program may manipulate. This set includes atomic values and function objects. We ignore higher-order functions for the moment, as their formalization introduces complex challenges in terms of value and reference captures. Instead, while we do treat functions as first-class citizens, we do not let them refer to their instantiation context. Simply put, we do not support closures, and force all identifiers referred to in function bodies to be either parameters or variables declared in the same function. An important consequence of this restriction is that our language does not support partial application. A workaround is proposed at the end of this chapter. We also add ⊥ and ⊠ to represent undefined and moved values, respectively.

Definition 4.3.2: Semantic values

The set of semantic values V is defined as follows:

    ⊥ ∈ V                               the undefined value
    ⊠ ∈ V                               the moved value
    a ∈ A =⇒ a ∈ V                      an atom
    x ∈ X# ∧ t ∈ AC =⇒ λx . t ∈ V       a function

Given V the set of semantic values, we define a table µ : N ⇀ V to represent the memory of the executing machine. In other words, µ describes what value is stored at a given location. We abstract over data sizes and assume instead that any arbitrarily large value can be stored at any given location. Besides, we also abstract over where the memory is allocated, that is, we do not discriminate between stack and heap. We define another table π : X ⇀ N to represent variables. π describes what memory location is assigned to a given variable. Consequently, all variables are actually pointers, and do not refer directly to a particular value. Both tables are combined to form an evaluation context:

Definition 4.3.3: Evaluation context

Given V the set of semantic values, an evaluation context C is a record of a composite type ⟨π, µ⟩, where:

    π : X ⇀ N is a table mapping identifiers to memory addresses, and

    µ : N ⇀ V is a table mapping memory addresses to values.

Given a context C, we refer to C.π as its pointer table and to C.µ as its memory table. We say that a context C is empty if both its pointer table and its memory table are empty, that is C = ⟨π ↦ ∅, µ ↦ ∅⟩. We often use a tabular representation to describe the content of an evaluation context. For example, the following matrix represents an evaluation context in which three variables x, y and z are defined in the pointer table. x and y are assigned to the memory locations l1 and l2 respectively, while z is an alias on l1. The memory table is defined for l1 and l2, respectively holding the values va and vb.

    π           µ
    x ↦ l1      l1 ↦ va
    y ↦ l2      l2 ↦ vb
    z ↦ l1
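The same context can be written down as two Python dicts (an encoding of ours, with the locations l1 and l2 represented by the integers 1 and 2):

```python
# Pointer table π and memory table µ as plain dicts.
pi = {"x": 1, "y": 2, "z": 1}   # z is an alias on l1
mu = {1: "va", 2: "vb"}

assert mu[pi["x"]] == mu[pi["z"]] == "va"  # x and z share the same cell
assert mu[pi["y"]] == "vb"
```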

We can now give the operational semantics of the A-calculus, by means of big-step inference rules. All conclusions are of the form C ⊢ t ⇓ l, C′, where C and C′ are the evaluation contexts before and after evaluating the term t, respectively, and l ∈ N is the memory address at which the value represented by the term t is stored. Note that this semantics does not consider memory or data safety. Approaches to guarantee these properties dynamically and statically will be discussed in Chapter 5 and Chapter 6. The evaluation of programs that use memory improperly (e.g. dereferencing the null pointer) is therefore not defined. Undefined identifiers and type errors (e.g. evaluating an application with a non-function callee) are not defined either.

Variable Declarations

As mentioned earlier, variable declarations do not assign anything to the identifier being declared. Instead, declarations only consist in defining a new identifier in the pointer table before evaluating the body of the declaration, where subsequent allocations and assignments are carried out. More formally, given a declaration of the form let x in t, the goal is to evaluate t in a context where x is defined as a fresh identifier. The following is a first attempt:

Variable declarations (attempt)

E-Let
        C[π.x ↦ 0] ⊢ t ⇓ l, C′
    ──────────────────────────
        C ⊢ let x in t ⇓ l, C′

Recall that variable declarations are supposed to delimit scopes. The above rule is therefore incorrect, because it keeps x in the pointer table beyond the evaluation of t. Instead, C′ must be updated to remove the freshly defined identifier. We refine our attempt accordingly.

Variable declarations (attempt)

E-Let
        C[π.x ↦ 0] ⊢ t ⇓ l, C′
    ──────────────────────────────────────
        C ⊢ let x in t ⇓ l, C′[π ↦ C′.π|x]

The update C′[π ↦ C′.π|x] now removes x from C′.π’s domain, thereby ending the identifier’s lifetime. Unfortunately, this solution is also inaccurate, due to name shadowing. Indeed, if x is already defined in C.π, then the update C[π.x ↦ 0] shadows its mapping. The shadowing must be lifted once t’s evaluation is complete. This is achieved by means of a function unset, which removes a given identifier from a pointer table, or restores its mapping if it has been shadowed.

    unset(x, π′, π) =  π′[x ↦ π.x]    if x ∈ dom(π)
                       π′|x           otherwise

With that, the semantics of variable declarations can finally be defined properly:

Variable declarations

E-Let
        C[π.x ↦ 0] ⊢ t ⇓ l, C′
    ───────────────────────────────────────────────────
        C ⊢ let x in t ⇓ l, C′[π ↦ unset(x, C′.π, C.π)]
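A dict-based Python transcription of unset may make the two cases concrete (the encoding is ours):

```python
def unset(x, pi_new, pi_old):
    """Remove x from pi_new, or restore its old mapping if it was shadowed.
    A direct, dict-based transcription of the `unset` function above."""
    result = dict(pi_new)
    if x in pi_old:
        result[x] = pi_old[x]   # lift the shadowing
    else:
        result.pop(x, None)     # end the identifier's lifetime
    return result

# Shadowing: the inner declaration of x restores the outer binding.
outer = {"x": 10, "y": 20}
inner = {"x": 0, "y": 20}       # inner x was bound to the null pointer
assert unset("x", inner, outer) == {"x": 10, "y": 20}

# No shadowing: the identifier simply disappears.
assert unset("z", {"z": 5}, {}) == {}
```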

Example 4.3.1: Name shadowing

Consider the following term:

let x, y in alloc x, y ; x ← 2 ; y ← 4 ; let x in x &- y

Let C be an evaluation context defined so that C.π.x = l0 and C.µ.l0 = 2. We skip the grayed-out parts and focus on the second declaration, whose evaluation is described by the following derivation:

        C[π.x ↦ 0] ⊢ x &- y ⇓ l, C′
    ─────────────────────────────────────────────────────
        C ⊢ let x in x &- y ⇓ l, C′[π ↦ unset(x, C′.π, C.π)]

The second x is no longer bound to l0 when x &- y is evaluated, due to the premise C[π.x ↦ 0]. However, the alias to y is removed from the pointer table in the conclusion, as unset(x, C′.π, C.π) computes C′[π.x ↦ l0], thus restoring the binding of the first x.

Memory Management

Given an evaluation context C, an allocation equates to picking a memory location l that is not in the domain of C’s memory table, that is, finding l such that l ∉ dom(C.µ). Recall that we assume an unbounded memory. Therefore, finding such a location is always possible, or more formally:

    ∃l ∉ dom(C.µ)

In the A-calculus, atomic and function literals are always evaluated as newly allocated objects. Assignment operators can then appropriately consume the memory locations at which they are stored, just like for any other expression, and thereby remain agnostic of their right operand.

Literal values

E-Atom
        a ∈ A    ∃l ∉ dom(C.µ)
    ──────────────────────────
        C ⊢ a ⇓ l, C[µ.l ↦ a]

E-Fun
        ∃l ∉ dom(C.µ)
    ─────────────────────────────────
        C ⊢ λx . t ⇓ l, C[µ.l ↦ λx . t]

A literal term always represents the same value. Therefore, an alternative implementation could use the flyweight pattern [65]. In a nutshell, the approach consists of pre-allocating memory for all literals found in the program, and evaluating literal terms as the corresponding memory location at runtime. This method can save memory and avoid many unnecessary allocations, as immutable objects can easily be reused. This is for instance how Swift operates to allocate constant strings. However, we take another direction, because this approach is nothing more than an optimization, whereas the purpose of the A-calculus is to accurately represent the semantics of any existing implementation. Dereferencing an identifier consists in consulting the pointer table in order to retrieve the memory location to which it is associated. Notice that the rule does not apply if the identifier is undefined.

Variable dereferencing

E-Var
        x ∈ dom(C.π)
    ─────────────────
        C ⊢ x ⇓ C.π.x, C

Variable allocations borrow from the same principle and associate a given identifier with a new location, holding an uninitialized value. Deallocations simply consist in updating the memory table to remove a memory location.

Allocations and deallocations

E-New
        x ∈ dom(C.π)    ∃l ∉ dom(C.µ)
    ──────────────────────────────────────
        C ⊢ alloc x ⇓ l, C[π.x ↦ l, µ.l ↦ ⊥]

E-Del
        C ⊢ t ⇓ l, C′    l ∈ dom(C′.µ)    l ≠ 0
    ──────────────────────────────────────────
        C ⊢ del t ⇓ l, C′[µ ↦ C′.µ|l]

While a deallocation evaluates to the memory location it has freed, using that location in an assignment will obviously lead to a memory error. Hence, although using a deallocation as the right operand of an assignment operator is syntactically and semantically defined, doing so will inevitably result in an erroneous execution if the assigned variable is eventually used.
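E-New and E-Del translate directly into a few dict operations (a sketch on our toy model; `BOT` stands for the undefined value ⊥):

```python
BOT = object()   # stands for the undefined value ⊥

def fresh(mu):
    """Pick a location not in dom(mu); 0 is reserved for the null pointer."""
    return max(mu, default=0) + 1

def alloc(pi, mu, x):
    """E-New: bind x to a fresh location holding an uninitialized value."""
    l = fresh(mu)
    pi[x] = l
    mu[l] = BOT
    return l

def dealloc(mu, l):
    """E-Del: remove the location from the memory table."""
    assert l != 0 and l in mu
    del mu[l]
    return l

pi, mu = {"x": 0}, {}
l = alloc(pi, mu, "x")
assert mu[l] is BOT
dealloc(mu, l)
assert l not in mu    # any later use of x is now a memory error
```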

Assignment Semantics

The advantage of evaluating all expressions as memory locations emerges when formalizing the assignment operators, whose semantics can be expressed purely in terms of pointer and memory table updates. While many modern languages no longer consider assignments as assignable expressions, this feature is still used prominently in languages like C and Java. As a result, assignments are first-class terms in the A-calculus, and produce the memory location assigned to their left operand. An aliasing assignment is the simplest to evaluate. It evaluates its right operand to determine the memory location it represents, before updating the pointer table to map its left operand to that location, thus resulting in the creation of an alias.

Aliasing assignment

E-Alias
        C ⊢ t ⇓ l, C′
    ──────────────────────────────
        C ⊢ x &- t ⇓ l, C′[π.x ↦ l]

Remark that if x is the only remaining reference to a memory location l0 before the assignment, then access to that particular location is lost afterwards. In the absence of any automatic garbage collection strategy, this of course corresponds to a memory leak (cf. Section 5.2.4).

Example 4.3.2: Aliasing assignment

Let x and y be two initialized variables. Let C1 ⊢ y &- x ⇓ l, C2 be the conclusion of an aliasing assignment, then:

           π         µ                          π         µ
    C1  =  x ↦ l     l ↦ v1     =⇒      C2  =   x ↦ l     l ↦ v1
           y ↦ m     m ↦ v2                     y ↦ l     m ↦ v2

Mutation semantics consists in taking the value represented by the right operand, and storing a copy at the location represented by the left operand. Since V only contains atomic values and thin first-class functions (i.e. functions without closures), copying is a trivial operation. Note that the left operand must refer to a valid memory location, as checked by l ≠ 0, and that the right operand must be initialized, as checked by v ∉ {⊥, ⊠}.

Mutation assignment

E-Mut
        C ⊢ t ⇓ m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, ⊠}
        l = C′.π.x    l ≠ 0
    ──────────────────────────────
        C ⊢ x := t ⇓ l, C′[µ.l ↦ v]

Example 4.3.3: Mutation assignment

Let x and y be two initialized variables. Let C1 ⊢ y := x ⇓ m, C2 be the conclusion of a mutation assignment, then:

           π         µ                          π         µ
    C1  =  x ↦ l     l ↦ v1     =⇒      C2  =   x ↦ l     l ↦ v1
           y ↦ m     m ↦ v2                     y ↦ m     m ↦ v1

Notice that the right operand is evaluated before its left counterpart. This order is relevant because any term evaluation may have side effects on the context. As such, t has the potential to change the pointer table for x.

Example 4.3.4: Evaluation order

Consider the following term:

    let x in alloc x ; x := del x

Let C be an empty evaluation context; the term’s evaluation is illustrated by the following failed derivation:

        x ∈ dom(C.π)
    ───────────────── E-Var
        C ⊢ x ⇓ C.π.x, C
    ──────────────────────────────────────── E-Del
        C ⊢ del x ⇓ C.π.x, C[µ ↦ C.µ|C.π.x]        v = C[µ ↦ C.µ|C.π.x].µ.(C.π.x)
    ──────────────────────────────────────────────────────────────────
        C ⊢ let x in alloc x ; x := del x ⇓ l, C′

The derivation fails because x’s memory is deallocated before the mutation assignment takes place, removing C.π.x (i.e. the memory location to which x was bound) from the memory table’s domain. Therefore, v is undefined.
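The failure can be replayed on our toy dict model (names and encoding are ours): evaluating the right operand `del x` first removes the location, so the subsequent read required by E-Mut fails.

```python
# Right operand first: `x := del x` deallocates x's location before the
# mutation reads it, so the copy step fails.
pi, mu = {"x": 1}, {1: 7}

m = pi["x"]
del mu[m]            # evaluate `del x` (the right operand)
try:
    v = mu[m]        # E-Mut would now read the value at m ...
    raise AssertionError("unreachable")
except KeyError:
    pass             # ... but the location is gone: the derivation fails
```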

The semantics of the move operator is very similar to that of the mutation operator. It only differs in that the value of the right operand is moved, in order to satisfy the operator’s linear semantics. This is carried out by the update C′[µ.m ↦ ⊠] in the rule’s conclusion.

Move assignment

E-Move
        C ⊢ t ⇓ m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, ⊠}
        l = C′.π.x    l ≠ 0
    ──────────────────────────────────────
        C ⊢ x ← t ⇓ l, C′[µ.m ↦ ⊠][µ.l ↦ v]

Example 4.3.5: Move assignment

Let x and y be two initialized variables. Let C1 ⊢ y ← x ⇓ m, C2 be the conclusion of a move assignment, then:

           π         µ                          π         µ
    C1  =  x ↦ l     l ↦ v1     =⇒      C2  =   x ↦ l     l ↦ ⊠
           y ↦ m     m ↦ v2                     y ↦ m     m ↦ v1
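The three operators can be contrasted side by side in a Python sketch of E-Alias, E-Mut and E-Move on the dict model (the encoding, `MOVED` and `BOT` markers, and function names are ours); the assertions replay Examples 4.3.2, 4.3.3 and 4.3.5:

```python
MOVED = object()   # the moved value
BOT = object()     # the undefined value ⊥

def assign_alias(pi, mu, x, l):
    """E-Alias (x &- t): rebind x to the right operand's location."""
    pi[x] = l

def assign_mut(pi, mu, x, m):
    """E-Mut (x := t): copy the value at m into x's own location."""
    v = mu[m]
    assert v is not BOT and v is not MOVED and pi[x] != 0
    mu[pi[x]] = v

def assign_move(pi, mu, x, m):
    """E-Move (x <- t): copy the value, then mark the source as moved."""
    v = mu[m]
    assert v is not BOT and v is not MOVED and pi[x] != 0
    mu[m] = MOVED
    mu[pi[x]] = v

# Example 4.3.2 (alias): y now shares x's location.
pi, mu = {"x": 1, "y": 2}, {1: "v1", 2: "v2"}
assign_alias(pi, mu, "y", pi["x"])
assert pi == {"x": 1, "y": 1} and mu == {1: "v1", 2: "v2"}

# Example 4.3.3 (mutation): y's cell receives a copy of v1.
pi, mu = {"x": 1, "y": 2}, {1: "v1", 2: "v2"}
assign_mut(pi, mu, "y", pi["x"])
assert mu == {1: "v1", 2: "v1"}

# Example 4.3.5 (move): as mutation, but x's cell is marked moved.
pi, mu = {"x": 1, "y": 2}, {1: "v1", 2: "v2"}
assign_move(pi, mu, "y", pi["x"])
assert mu[1] is MOVED and mu[2] == "v1"
```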

Functions

The next set of rules describes the semantics of function calls. Roughly speaking, a function is simply a piece of reusable code that is inlined at each of its call sites, and whose parameter passing and return value are merely assignments. For example:

    x ← (λy . t ; ret ← y)(y ← x)  ≡  let y in y ← x ; t ; x ← y

We subdivide the evaluation of a function call into three steps:

Evaluate the callee. This step is described by a dedicated evaluation operator ⇓callee that retrieves the function value (e.g. λx . x) represented by the callee. Unlike ⇓, this operator produces function values (i.e. elements of V) rather than memory locations.

Evaluate the arguments. This step is described by another dedicated evaluation operator ⇓args that processes argument lists recursively.

Evaluate the body. A function’s body is merely a term that should be evaluated in the proper evaluation context. Hence this last step is described by ⇓.

The first step consists in getting the memory location represented by the callee, in order to determine the function’s parameters and body.

Callee evaluation

E-Callee
        C ⊢ tc ⇓ l, C′    l ∈ dom(C′.µ)    C′.µ.l ∉ {⊥, ⊠}    C′.µ.l = λx . t
    ─────────────────────────────────────────────────────────────
        C ⊢ tc ⇓callee λx . t, C′

Recall that we reuse our assignment operators to specify the parameter passing policy. The alias operator corresponds to the pass-by-alias semantics, the mutation operator corresponds to the pass-by-value semantics, and the move operator considers arguments as linear resources. Consequently, a parameter can be understood as a variable whose declaration and initialization precede the function’s body, while an argument describes how its corresponding parameter is initialized. It therefore makes sense to treat each argument as a regular assignment, leading to the following first attempt at the definition of ⇓args’s semantics. Below, ε denotes the empty argument list.

Argument lists (attempt)

E-Args-0
    ─────────────────
        C ⊢ ε ⇓args 0, C

E-Args-N
        C ⊢ let x in x ⊙ t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ─────────────────────────────────────────────
        C ⊢ (x ⊙ t)u ⇓args 0, C′

Despite the similarities between parameter handling and variable declarations, E-Let cannot be reused directly. The reason is that parameter names must be preserved in the domain of the pointer table until the function’s body is evaluated, while E-Let’s name shadowing mechanism removes them in the conclusion of the rule, by applying the unset function. In other words, given an evaluation context C1 and a list of parameter assignments u, the pointer table of the context C2 resulting from the evaluation C1 ⊢ u ⇓args 0, C2 would not contain the names of the parameters in u with this first attempt. Instead, E-Let’s name shadowing mechanism shall be discarded.

Argument lists (attempt)

E-Args-0
    ─────────────────
        C ⊢ ε ⇓args 0, C

E-Args-N
        C[π.x ↦ 0] ⊢ x ⊙ t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ─────────────────────────────────────────────
        C ⊢ (x ⊙ t)u ⇓args 0, C′

This second attempt is unfortunately incorrect as well, because it does not allow passing parameters by value or by move. The problem is due to the fact that all parameter identifiers refer to the null pointer before the argument’s evaluation. Therefore, a mutation or move assignment will fail to properly dereference the parameter name. The solution is to perform an allocation right before evaluating parameters passed by value or by move.

Argument lists

E-Args-0
    ─────────────────
        C ⊢ ε ⇓args 0, C

E-Args-Alias
        C[π.x ↦ 0] ⊢ x &- t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ──────────────────────────────────────────────
        C ⊢ (x &- t)u ⇓args 0, C′

E-Args-MM
        ⊙ ∈ {:=, ←}    ∃l ∉ dom(C.µ)
        C[π.x ↦ l, µ.l ↦ ⊥] ⊢ x ⊙ t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ────────────────────────────────────────────────────
        C ⊢ (x ⊙ t)u ⇓args 0, C′

Example 4.3.6: Argument List

Consider the following function call:

f (x &- z, y ← 2)

The term contains a list of parameter assignments u = x &- z, y ← 2, whose evaluation by ⇓args is described below. Given a context C such that:

          π          µ
    C  =  z ↦ l1     l1 ↦ 4

E-Args-Alias applies first and evaluates x &- z:

        C[π.x ↦ 0] ⊢ x &- z ⇓ lx, Cx    Cx ⊢ y ← 2 ⇓args 0, Cy
    ─────────────────────────────────────────────────────
        C ⊢ x &- z, y ← 2 ⇓args 0, Cy

As a result, Cx is now defined as follows:

           π          µ
    Cx  =  z ↦ l1     l1 ↦ 4
           x ↦ l1

Next, E-Args-MM applies to evaluate y ← 2, before the recursion ends.

        Cx[π.y ↦ l2, µ.l2 ↦ ⊥] ⊢ y ← 2 ⇓ ly, Cy    Cy ⊢ ε ⇓args 0, Cy
    ──────────────────────────────────────────────────── E-Args-MM
        Cx ⊢ y ← 2 ⇓args 0, Cy

As a result, Cy is now given as follows:

           π          µ
           z ↦ l1     l1 ↦ 4
    Cy  =  x ↦ l1     l2 ↦ 2
           y ↦ l2     l3 ↦ ⊠

Note that l3 denotes the memory location allocated for the literal 2, whose value has been moved by the passing of the parameter y.

As any other expression, a function call should be evaluated as a memory location. Unfortunately, this location cannot be known at the call site, because it depends on the return statement that will be executed. The support for multiple return semantics further complicates deciding whether an allocation is required. One solution is to take a path similar to the one we took for arguments, and interpret return statements as regular assignments. Let ϱ ∈ X be a reserved identifier denoting a virtual return variable. A return statement is then simply an assignment to this return identifier: ret ⊙ t can be considered equivalent to ϱ ⊙ t. Just as for arguments, an allocation is needed unless the return value is passed by alias.

Function returns

E-Ret-Alias
        C[π.ϱ ↦ 0] ⊢ ϱ &- t ⇓ l, C′
    ───────────────────────────────
        C ⊢ ret &- t ⇓ l, C′

E-Ret-MM
        ⊙ ∈ {:=, ←}    ∃l ∉ dom(C.µ)    C[π.ϱ ↦ l, µ.l ↦ ⊥] ⊢ ϱ ⊙ t ⇓ l, C′
    ──────────────────────────────────────────────────────────
        C ⊢ ret ⊙ t ⇓ l, C′

The rule E-Call assembles all the pieces together, but several considerations must be made regarding pointer tables.

Function calls (attempt)

E-Call
        C ⊢ f ⇓callee λx . t, C1    C1 ⊢ u ⇓args 0, C2    C2 ⊢ t ⇓ l, C′
    ─────────────────────────────────────────────
        C ⊢ f(u) ⇓ l, C′

This first attempt is incorrect, because name shadowing must also be taken into account. Indeed, parameter names may shadow identifiers from the call site. Furthermore, if the function call occurs in a return expression (e.g. ret ← f()), the return identifier may also be shadowed. The solution consists of applying unset for ϱ and each of the parameter identifiers x. This requires a minor extension of unset to words of identifiers.

Function calls (attempt)

E-Call
        C ⊢ f ⇓callee λx . t, C1    C1 ⊢ u ⇓args 0, C2    C2 ⊢ t ⇓ l, C′
    ─────────────────────────────────────────────────
        C ⊢ f(u) ⇓ l, C′[π ↦ unset(ϱx, C2.π, C1.π)]

Notice that C′’s pointer map is updated by applying unset on C1.π rather than C′.π. The reason is that C′.π is defined for the function parameters’ names, which have to be removed from the pointer table. In fact, C′’s pointer table can be completely discarded. The alert reader will notice that this solution is still incorrect if the name of a parameter appears in the arguments’ terms (e.g. in a function call of the form f(x &- x)). Recall that E-Args-Alias and E-Args-MM emulate a variable declaration for each parameter in order to deal with name shadowing. Consequently, a name may be shadowed before its argument is evaluated. The solution is to leverage the λ-calculus’s α-renaming transformation, and rename all identifiers in the callee so that none of them can clash with the identifiers in the current scope. Note that this does not entirely get rid of shadowing, as the return identifier cannot be renamed.

Function calls

E-Call
        C ⊢ f ⇓callee λx . t, C1    λx′ . t′ = renameC1(λx . t)
        C1 ⊢ u[xi ↦ x′i | i = 1 … ‖x‖] ⇓args 0, C2    C2 ⊢ t′ ⇓ l, C′
    ─────────────────────────────────────────────
        C ⊢ f(u) ⇓ l, C′[π ↦ unset(ϱ, C2.π, C.π)]

Example 4.3.7: Function call

Consider the following function call: f(y ← 8). Assume it is evaluated in a context C defined as follows:

          π          µ
    C  =  f ↦ l1     l1 ↦ λy . ret ← y

The first step consists in dereferencing f to a function value. In this particular example, this boils down to an application of E-Var:

        f ∈ dom(C.π)
    ───────────────── E-Var
        C ⊢ f ⇓ l1, C          C.µ.l1 = λy . ret ← y
    ──────────────────────────────────────── E-Callee
        C ⊢ f ⇓callee (λy . ret ← y), C

Since none of the variables in λy . ret ← y conflicts with the identifiers in dom(C.π), the callee’s renaming can yield the same function. The second step evaluates the function’s argument y ← 8, from which stems an evaluation context Cy in which a new memory location l2 has been allocated for the parameter y, holding the value 8:

        ∃l2 ∉ dom(C.µ)
        C[π.y ↦ l2, µ.l2 ↦ ⊥] ⊢ y ← 8 ⇓ ly, Cy    Cy ⊢ ε ⇓args 0, Cy
    ───────────────────────────────────────────────── E-Args-MM
        C ⊢ y ← 8 ⇓args 0, Cy

In E-Call, non-parameter identifiers are filtered from Cy.π, resulting in the following intermediate evaluation context:

                                            π          µ
    C′y  =  Cy[π.x ↦ Cy.π.x | x ∈ x]  =     y ↦ l2     l1 ↦ λy . ret ← y
                                                       l2 ↦ 8
                                                       l3 ↦ ⊠

The third step evaluates the body of the function, requiring another allocation to satisfy the return-by-move semantics.

        ∃l4 ∉ dom(C′y.µ)    C′y[π.ϱ ↦ l4, µ.l4 ↦ ⊥] ⊢ ϱ ← y ⇓ l4, C′
    ──────────────────────────────────────────── E-Ret-MM
        C′y ⊢ ret ← y ⇓ l4, C′

This results in the memory location l4 and the following evaluation context:

                                        π          µ
    C′[π ↦ unset(ϱ, Cy.π, C.π)]  =      f ↦ l1     l1 ↦ λy . ret ← y
                                                   l2 ↦ ⊠
                                                   l3 ↦ ⊠
                                                   l4 ↦ 8

Finally, the evaluation of a sequence is given by that of both its subterms, in order. Remember that our semantics is based on the assumption that return statements are never followed by any other expression. Indeed, as we do not implement a mechanism to “return early”, all terms following a return statement would otherwise be evaluated as well. Such programs are considered ill-formed.

Sequences

E-Seq
        C ⊢ t1 ⇓ l1, C1    C1 ⊢ t2 ⇓ l2, C2
    ─────────────────────────────────────
        C ⊢ t1 ; t2 ⇓ l2, C2

4.3.3 Extending the Calculus

Our calculus being very minimal, implementing elaborate constructs can prove challenging. Consider conditional expressions, for example. Borrowing from Church or Mogensen–Scott encodings [83, 102], boolean literals can be represented as functions that unconditionally return either of their two arguments. Therefore, a predicate can be expressed by the function below:

    λp, x, a, b . (ret := p(x := x)(t := a, f := b))

While such an approach is theoretically sound, it significantly hinders the practicality of our calculus, for various reasons. First, since the encoding relies extensively on function application, a lot of care must be taken to choose the appropriate parameter passing policy. Using the mutation operator is a safe choice, as we do not run the risk of accidentally creating aliases, nor of unintentionally “consuming” a variable with a destructive read. However, there might be cases in which this strategy is either suboptimal or even inaccurate with respect to the modeled semantics. One possible solution is to support polymorphic assignment operators, thereby providing some level of genericity, though this would considerably elevate the complexity of the calculus. Second, since our calculus does not support higher-order functions, representing composite structures with the usual Church or Mogensen–Scott methods is actually intractable. While this limitation could be lifted, it would require a very powerful type system to accommodate any sort of static analysis. Consider for instance the following Church encoding of a pair:

λx, y . (ret ← λ f . (ret ← f (x &- x, y &- y)))

This describes a function that takes two parameters x and y, representing the components of a pair. When applied, it returns another function that has captured both arguments by closure, and accepts yet another function as a parameter, which it applies to the captured values. Therefore, writing an accessor simply consists in passing a function that returns its first argument.

    let enc, p in
        alloc enc, p ;
        enc ← (λx, y . (ret ← λ f . (ret ← f (x &- x, y &- y)))) ;
        p ← enc(x := 4, y := 8) ;
        p( f ← λx, y . ret := x)

Unfortunately, typing enc requires a type system that can manipulate existentially typed functions as first-class citizens, without instantiating them at the moment of their declaration. The reason is that the returned function accepts as a parameter a function that takes the two components of the pair and returns some result, whose type is therefore unknown at the time the pair encoding is created. In fact, such a typing is not possible in most mainstream statically typed programming languages. In order to tackle these shortcomings, another plan of attack is to extend the calculus with more abstract constructs, so that we no longer need to rely exclusively on higher-order functions.
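To see what the Church pair encoding relies on, here is the same construction in a language that does support closures (Python); the inner lambda captures both components, which is precisely what the A-calculus forbids:

```python
# Church pair: `pair(x, y)` returns a function that has captured x and y
# by closure and applies its argument to them.
def pair(x, y):
    return lambda f: f(x, y)

# Accessors are functions that return one of their two arguments.
first = lambda x, y: x
second = lambda x, y: y

p = pair(4, 8)
assert p(first) == 4
assert p(second) == 8
```

Without closures, `pair`’s inner function cannot refer to x and y, which is why the calculus is extended with records instead.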

Records

The first extension we propose is support for records. Syntactically, we need two additional constructs: one that creates new record values, and one that allows access (read and write) to its fields:

Definition 4.3.4: Record syntax

Let the set of terms ACR be a superset of AC to support records, defined as the minimum set such that:

... ∈ ACR    terms of AC

x̄ ∈ X∗ =⇒ new x̄ ∈ ACR    an instantiation

t ∈ ACR ∧ x ∈ X =⇒ t.x ∈ ACR    a field

t ∈ ACR ∧ x ∈ X =⇒ alloc t.x ∈ ACR    a field allocation

⊙ ∈ O ∧ x ∈ X ∧ t1, t2 ∈ ACR =⇒ t1.x ⊙ t2 ∈ ACR    a field assignment

Notice the addition of a syntactic construction for field assignments, which lets fields appear as left operands. We also extend the semantic domain V to include a representation for records, provided in the form of a table mapping identifiers to memory locations.

Definition 4.3.5: Record values

Let the set of semantic values VR be a superset of V to support records, defined as the minimum set such that:

v ∈ V =⇒ v ∈ VR    values of AC

X′ ⊆ X ∧ r : X′ → N =⇒ r ∈ VR    a record

Finally, we extend the operational semantics to support the evaluation of the record-related constructs. We use an operator ⇓R to express this semantics, defined for the terms t ∈ AC of the regular A-calculus such that:

t ∈ AC        t ≠ (x B u)        C ⊢ t ⇓ l, C′
──────────────────────────────────────────────
C ⊢ t ⇓R l, C′

In other words, the semantics of ⇓R is identical to that of ⇓ for the terms of the regular A-calculus, except for mutation assignments, whose semantics requires minor amendments that will be described later³.

We define ⇓R for the syntactic extensions introduced in Definition 4.3.4, starting with record instantiations. These necessitate an allocation for the new record instance, whose fields all refer to the null pointer.

³The reader will notice that this formal description is actually inaccurate, as it does not account for rules that evaluate subterms in their premises. Therefore, all rules should in fact be redefined so that occurrences of ⇓ in their premises are replaced with ⇓R.

Instantiations

ER-Inst
∃l ∉ dom(C.µ)        r = ⟨xᵢ ↦ 0 | 1 ≤ i ≤ ‖x̄‖⟩
────────────────────────────────────────────────
C ⊢ new x̄ ⇓R l, C[µ.l ↦ r]

Example 4.3.8: Instantiation

Consider the term new p, q, expressing the instantiation of a 2D point. Its evaluation is described by the following derivation:

∃l ∉ dom(C.µ)        r = ⟨p, q ↦ 0⟩
─────────────────────────────────── ER-Inst
C ⊢ new p, q ⇓R l, C[µ.l ↦ r]

Assuming an empty initial context C, the result is given by:

C[µ.l ↦ ⟨p, q ↦ 0⟩]:    π = [],    µ = [l ↦ ⟨p ↦ 0, q ↦ 0⟩]
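The effect of the instantiation rule can be sketched in Python under this chapter's memory model, with a dict `mu` playing the role of the memory table C.µ (`new_record` and the dict-based tables are illustrative, not the thesis's notation):

```python
def new_record(mu, fields):
    # Pick a fresh location l not in dom(mu).
    l = max(mu, default=0) + 1
    # A record maps each field label to the null pointer 0.
    mu[l] = {x: 0 for x in fields}
    return l

mu = {}
l = new_record(mu, ["p", "q"])
print(mu[l])  # {'p': 0, 'q': 0}
```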

A record is essentially a pointer table, mapping each field label to a particular memory location. Hence, operations on variables are naturally extended to fields. Let rec be a predicate that holds for record values. The next rule describes field allocation.

Field allocations

ER-Field-New
C ⊢ t ⇓R l, C′        l ∈ dom(C′.µ)        r = C′.µ.l        rec(r)
x ∈ dom(r)        m ∉ dom(C′.µ)
──────────────────────────────────────────
C ⊢ alloc t.x ⇓R m, C′[µ.l.x ↦ m, µ.m ↦ ⊥]

Dereferencing a field is also very similar to dereferencing a variable (cf. E-Var). The difference lies in the pointer map in which the memory location is retrieved: variables are defined in the context's pointer table, while fields are defined in the record represented by the left operand.

Field dereferencing

ER-Field
C ⊢ t ⇓R l, C′        l ∈ dom(C′.µ)        r = C′.µ.l        rec(r)        x ∈ dom(r)
─────────────────────────────────────────────────────────────────────────────────────
C ⊢ t.x ⇓R r.x, C′

Obviously, all field assignments should evaluate their left operand as a field. Nonetheless, evaluations are still carried out from right to left, so the potential side effects of the right operand's evaluation apply before those of the left. Aliasing assignments on fields should modify the left operand's pointer table rather than that of the evaluation context.

Field aliasing assignment

ER-Field-Alias
C ⊢ t2 ⇓R m, C′        C′ ⊢ t1 ⇓R l, C″        l ∈ dom(C″.µ)
r = C″.µ.l        rec(r)        x ∈ dom(r)
──────────────────────────────────────────
C ⊢ t1.x &- t2 ⇓R m, C″[µ.l.x ↦ m]

Example 4.3.9: Field aliasing assignment

Consider the following term:

x ← 2 ; y ← new p ; y.p &- x

We skip the grayed-out parts and focus on the field aliasing assignment. Let C1 ⊢ y.p &- x ⇓R l′, C2 be the conclusion of its evaluation; then:

C1:    π = [x ↦ l1, y ↦ l2],    µ = [l1 ↦ 2, l2 ↦ ⟨p ↦ 0⟩]
  ⟹
C2:    π = [x ↦ l1, y ↦ l2],    µ = [l1 ↦ 2, l2 ↦ ⟨p ↦ l1⟩]

Recall that the mutation operator is defined so that it modifies the left operand with a transitive (a.k.a. deep) copy of its right one. As a result, the operator can no longer simply duplicate the semantic value stored in C.µ, as this would result in the creation of aliases for each of the fields. Instead, we define a function copy to perform this transitive operation. E-Mut is then modified as follows.

Mutation assignment

ER-Mut
C ⊢ t ⇓ m, C′        m ∈ dom(C′.µ)        v = C′.µ.m        v ∉ {⊥, ⊠}
u, C″ = copy(v, C′)        l = C″.π.x        l ≠ 0
──────────────────────────────────────────────────
C ⊢ x B t ⇓ l, C″[µ.l ↦ u]
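The need for a transitive copy can be illustrated in Python with the standard `copy` module: duplicating only the record's field table still shares the objects it points to, silently creating aliases.

```python
import copy

original = {"p": [1, 2]}        # a record whose field refers to a list
shallow = copy.copy(original)   # duplicates the table, shares the list
deep = copy.deepcopy(original)  # duplicates transitively

original["p"].append(3)
print(shallow["p"])  # [1, 2, 3] -- mutation visible through the alias
print(deep["p"])     # [1, 2]    -- the deep copy is unaffected
```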

Applying the mutation assignment operator on a field as a left operand is defined similarly, with the difference that the left operand is dereferenced. Note that the field must refer to a valid memory location, as checked by the premise l ≠ 0. This of course mimics the check that is performed on the identifier in ER-Mut.

Field mutation assignment

ER-Field-Mut
C ⊢ t2 ⇓R m, C′        m ∈ dom(C′.µ)        v = C′.µ.m        v ∉ {⊥, ⊠}
u, Cu = copy(v, C′)        Cu ⊢ t1.x ⇓R l, C″        l ≠ 0
──────────────────────────────────────────────────────────
C ⊢ t1.x B t2 ⇓R l, C″[µ.l ↦ u]

Example 4.3.10: Field mutation assignment

Consider the following term:

x ← 2 ; y ← new p ; y.p &- x ; y.p B z

We skip the grayed-out parts and focus on the field mutation assignment. Let C1 ⊢ y.p B z ⇓R l, C2 be the conclusion of its evaluation; then:

C1:    π = [x ↦ l1, y ↦ l2, z ↦ l3],    µ = [l1 ↦ 2, l2 ↦ ⟨p ↦ l1⟩, l3 ↦ 8]
  ⟹
C2:    π = [x ↦ l1, y ↦ l2, z ↦ l3],    µ = [l1 ↦ 8, l2 ↦ ⟨p ↦ l1⟩, l3 ↦ 8]

Move assignments on fields are defined similarly.

Field move assignment

ER-Field-Move
C ⊢ t2 ⇓R m, C′        m ∈ dom(C′.µ)        v = C′.µ.m        v ∉ {⊥, ⊠}
C′ ⊢ t1.x ⇓R l, C″        l ≠ 0
─────────────────────────────────────────
C ⊢ t1.x ← t2 ⇓R l, C″[µ.l ↦ v, µ.m ↦ ⊠]

Note that E-Move does not need to be modified to destructively read the fields of a moved record. Indeed, while the operator treats the record as a linear value, its fields may be aliased freely. This allows moving self-referential structures (e.g. graphs) through an externally unique handle [40].

Example 4.3.11: Field move assignment

Consider the following term:

x ← 2 ; y ← new p ; y.p &- x ; y.p ← z

We skip the grayed-out parts and focus on the field move assignment. Let C1 ⊢ y.p ← z ⇓R l, C2 be the conclusion of its evaluation; then:

C1:    π = [x ↦ l1, y ↦ l2, z ↦ l3],    µ = [l1 ↦ 2, l2 ↦ ⟨p ↦ l1⟩, l3 ↦ 8]
  ⟹
C2:    π = [x ↦ l1, y ↦ l2, z ↦ l3],    µ = [l1 ↦ 8, l2 ↦ ⟨p ↦ l1⟩, l3 ↦ ⊠]
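The destructive read performed by the move operator can be sketched in Python as a toy model: the target location receives the value while the source is overwritten with a sentinel standing in for the thesis's moved-memory marker (`MOVED` and `move` are illustrative names).

```python
MOVED = object()  # stands in for the moved-memory marker

def move(mu, src, dst):
    mu[dst] = mu[src]  # transfer the value to the destination location
    mu[src] = MOVED    # poison the source so later reads can be caught

mu = {1: 8, 2: None}
move(mu, src=1, dst=2)
print(mu[2], mu[1] is MOVED)  # 8 True
```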

Conditional Terms

Our second extension brings conditional terms to our calculus. Syntactically, we need only a single additional construct:

Definition 4.3.6: Conditional expression syntax

Let the set of terms ACC be a superset of AC to support conditional terms, defined as the minimum set such that:

... ∈ ACC terms of AC

t1, t2, t3 ∈ ACC =⇒ if t1 then t2 else t3 ∈ ACC    a conditional term

A conditional expression if t1 then t2 else t3 reads as "if t1 is true, then evaluate t2, otherwise evaluate t3". Consequently, we also need to identify values from our semantic domain that can be qualified as "true". To this end, we assume true, false ∈ A to be atomic values denoting booleans⁴. We extend the operational semantics of the regular A-calculus to support the evaluation of conditional terms with an operator ⇓C, defined for the terms t ∈ AC of the regular A-calculus such that:

t ∈ AC        C ⊢ t ⇓ l, C′
───────────────────────────
C ⊢ t ⇓C l, C′

Evaluating a conditional term simply boils down to evaluating its condition, so as to choose the next expression to evaluate. In most contemporary languages, the branches of a conditional term delimit lexical scopes, so that variables declared in one are not "visible" in the other. This is not necessary in the A-calculus, because identifier scopes are explicitly delimited by variable and function declarations. Therefore conditional terms do not have to deal with shadowing.

⁴Some languages (e.g. C) may feature a concept of truthy and falsey values rather than, or in addition to, booleans. We choose otherwise for the sake of conciseness.

Conditional terms

EC-Cond-T
C ⊢ t1 ⇓C l1, C′        l1 ∈ dom(C′.µ)
C′.µ.l1 ∉ {⊥, ⊠}        C′.µ.l1 = true        C′ ⊢ t2 ⇓C l2, C″
───────────────────────────────────────────────────────────────
C ⊢ if t1 then t2 else t3 ⇓C l2, C″

EC-Cond-F
C ⊢ t1 ⇓C l1, C′        l1 ∈ dom(C′.µ)
C′.µ.l1 ∉ {⊥, ⊠}        C′.µ.l1 = false        C′ ⊢ t3 ⇓C l3, C″
────────────────────────────────────────────────────────────────
C ⊢ if t1 then t2 else t3 ⇓C l3, C″

The rules produce the location of the branch that is evaluated, in order to allow the modeling of ternary operators (e.g. ?: in C), as found in most C-family languages. Conditional terms can of course be used in combination with the record extension presented above. Unless stated otherwise, we consider both extensions to be part of the syntax and semantics of the A-calculus for the remainder of this thesis. By abuse of notation, we simply write AC for the terms of the A-calculus extended with records and conditional expressions, and we simply write ⇓ to denote its operational semantics.
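The shape of EC-Cond-T and EC-Cond-F — check that the guard's location holds an initialized boolean, then evaluate exactly one branch and return its result, like C's ternary operator — can be sketched in Python (a toy model; `eval_cond` and the sentinel objects are illustrative):

```python
UNINIT, MOVED = object(), object()  # stand in for the uninitialized / moved markers

def eval_cond(mu, guard_loc, then_thunk, else_thunk):
    v = mu[guard_loc]
    if v is UNINIT or v is MOVED:
        # Mirrors the premise that the guard's value is not in {bottom, moved}.
        raise RuntimeError("guard refers to uninitialized or moved memory")
    # Evaluate exactly one branch, and produce its result.
    return then_thunk() if v is True else else_thunk()

mu = {1: True}
print(eval_cond(mu, 1, lambda: "then", lambda: "else"))  # then
```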

4.4 Examples

We now present two examples of translations from real programming languages to the A-calculus. For the sake of legibility, we use a concrete syntax rather than the abstract one described in Section 4.3. In concrete syntax, sequences of terms are delimited by line breaks rather than semicolons, and lexical scopes are delimited by curly braces. We write fun (x, y) { e } in concrete syntax for λx, y . e. We write return rather than ret, and enclose member names in angle brackets in record allocations. For instance, we write new<m, n> for new m, n.

4.4.1 Reassignment vs Self-Mutation

In this chapter’s introduction, we saw that augmented assignments (e.g. a += b ) could have different semantics that their supposed expansion (e.g. a = a + b ). More specifically, we presented the following example in Python: 80 CHAPTER 4. THE A-CALCULUS

Python
1 foo = [1, 2]
2 bar = foo
3 foo += [3]
4 print(bar)
5 # Prints "[1, 2, 3]"

Python is a reference-based language. Therefore, it makes sense that the bar reference is assigned by alias at line 2. However, the fact that foo modifies its value is more surprising. The supposed expansion of the augmented assignment is foo = foo + [3], and therefore one could expect foo to simply be reassigned to another list. However, Python implements augmented assignments as self-mutating methods. The subtlety is explicit in the A-calculus:

A-calculus
1  let foo, bar in {
2    // Create the list [1, 2]
3    alloc foo
4    foo <- new<head, tail>
5    alloc foo.head, foo.tail
6    foo.head <- 1
7    foo.tail <- new<head, tail>
8    alloc foo.tail.head
9    foo.tail.head <- 2
10
11   // Create an alias on foo
12   bar &- foo
13
14   // Mutate foo
15   foo.tail.tail <- new<head, tail>
16   alloc foo.tail.tail.head
17   foo.tail.tail.head <- 3
18
19   print(line &- bar)
20 }
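The same distinction can be observed directly in Python, since `+=` on a list dispatches to the in-place `__iadd__` method, while `a = a + b` builds a fresh list and rebinds the name, leaving prior aliases untouched:

```python
foo = [1, 2]
bar = foo
foo += [3]       # self-mutation: the shared list is modified in place
print(bar)       # [1, 2, 3]

foo = [1, 2]
bar = foo
foo = foo + [3]  # reassignment: foo now names a new list
print(bar)       # [1, 2]
```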

4.4.2 Higher-Order Functions

In the C programming language, a function is an object that can only be applied. Unlike other objects, like numbers and pointers, it cannot appear on either side of an assignment, be passed as an argument to another function, or be returned from one. Consequently, a C function is said to be a second-class citizen in the C language⁵. In contrast, a Swift function can be manipulated as any other object in the language. It can be assigned to a reference, passed as an argument to another function, or returned from one. Consequently, a Swift function is said to be a first-class citizen in the Swift language.

The terms "first-class function", "higher-order function" and "closure" are often used interchangeably in the literature. However, there are subtle differences between these notions. A first-class function is merely a function that can be used as any other value. A higher-order function is a function that either accepts another function as an argument, or produces one as its return value. Obviously, both notions are tightly linked, as one could hardly design a programming language with higher-order functions without functions being first-class citizens in the first place. A closure is a function that has access to the context in which it has been instantiated, so that it may refer to references from that context. Put differently, the body of a closure can contain free references that are bound outside of the function's body. Such references are said to be captured in the function's context. This enables a critical mechanism in functional programming called partial application. Consider for example the program below, which features partial application. The function make_inc accepts a parameter i and produces another function that increments another number by i. The returned function is in fact the partial application of the integer addition.
Python
1 def make_inc(i):
2     return lambda x: x + i
3
4 inc_by_two = make_inc(2)
5 print(inc_by_two(3))
6 # Prints "5"

Although most programming languages adopting some aspects of functional programming support closures, the latter are not an essential requirement for the support of higher-order (and hence first-class) functions. Note however that a closure can be emulated with a record that encapsulates its context, together with a first-class function whose domain is extended to accept the captured references as arguments. This process is often referred to as defunctionalization [120]. As mentioned before, the A-calculus only supports thin first-class functions, but allows closures to be defunctionalized by utilizing the record extension.
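As a bridge to the A-calculus program below, the same defunctionalization can be sketched directly in Python: the closure produced by make_inc is replaced by a record (a dict here, as an illustrative stand-in) pairing the captured value i with a thin function f that takes the captured value as an extra argument.

```python
def make_inc(i):
    # No free variables in f: the captured i travels in the record instead.
    return {"i": i, "f": lambda x, i: x + i}

inc_by_two = make_inc(2)
print(inc_by_two["f"](3, inc_by_two["i"]))  # 5
```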

⁵The reader proficient in C will remark that although functions are second-class, function pointers can be manipulated as first-class citizens.

The program below defunctionalizes the higher-order function presented in the Python example above. The record fn, allocated at line 6, contains two fields i and f. The first corresponds to the reference captured by closure in the original example. It is "captured" by alias at line 9, so as to match Python's capture semantics. At line 15, the field fn.f is assigned a first-class function that reproduces the behavior of the Python lambda. It is eventually applied at line 25.

A-calculus
1  let make_inc in {
2    alloc make_inc
3    make_inc <- fun (i) {
4      let fn in {
5        alloc fn
6        fn <- new<i, f>
7
8        // Save the closure's context
9        fn.i &- i
10
11       // Define a first-class function equivalent
12       // accepting the captured references as
13       // additional arguments.
14       alloc fn.f
15       fn.f <- fun (x, i) { return := x + i }
16
17       return <- fn
18     }
19   }
20   let inc_by_two in {
21     // Create the higher-order function and apply it.
22     alloc inc_by_two
23     inc_by_two <- make_inc(i <- 2)
24     print(
25       line &- inc_by_two.f(x := 3, i &- inc_by_two.i))
26   }
27 }

4.5 Summary

We have presented the A-calculus, a formal system for reasoning about memory management that aims at unifying imperative assignment semantics. The model features three assignment operators with distinct yet consistent semantics. An aliasing operator &- creates an alias to the object on its right for the variable on its left. A mutation operator B mutates the object assigned to the variable on its left with a copy of the object assigned to the variable on its right. A move operator ← moves the object from the variable on its right to the variable on its left. The following two chapters discuss memory safety, based on the A-calculus's formal operational semantics.

Chapter 5

Dynamic Safety

Manipulating memory through an imperative language is hard. The difficulty stems from the struggle to statically describe the dynamic evolution of memory. In other words, while variables and operations thereupon are expressed at a syntactic level, the state of memory cannot be fully conveyed lexically, requiring the programmer to keep a memory model in mind when writing code. Aliasing further worsens the situation by breaking the one-to-one mapping from variables onto memory. As a result, memory errors are consistently reported as accounting for about 5% of all software bugs, all programming languages taken together [119, 14].

Memory errors denote situations where the static description diverges from its actual execution. They generally fall into one of these three broad categories:

Invalid memory errors occur when a program attempts to read a value at a location that has not yet been assigned any sensical data (with respect to its semantic type), or that has been freed beforehand.

Memory leak errors occur when a program does not deallocate memory that is no longer accessible by any variable. Unlike other errors, memory leaks generally do not introduce unexpected behaviors directly, but may eventually cause a machine to run out of available memory.

Out-of-bounds access errors occur when a program attempts to access a location that has not been allocated, typically through an improper use of a buffer (e.g. accessing the 11th position of a 10-element array).

A clear understanding of what constitutes a memory error is paramount to identifying faulty programs. This chapter illustrates the use of the A-calculus to formalize memory errors. Two approaches are described. One is based on a post-mortem examination of failed executions to determine the cause of the failure. The other instruments the semantics to model errors explicitly, with a mechanism reminiscent of exceptions [30].


We mentioned in Section 4.3 that we glossed over the specifics of value representation, and therefore assumed memory locations can hold arbitrarily large values. This abstraction leaves the A-calculus unable to model out-of-bounds errors due to assignments of oversized values (e.g. assigning a 64-bit integer to a 32-bit location). However, these errors can be detected by means of type checking. Since our calculus does not support any sort of pointer arithmetic, errors due to out-of-bounds indices cannot be modeled either. Solutions to address this problem usually consist in associating bounds metadata with each pointer and checking them when loading and storing values [53, 58, 105].
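As an illustration of the bounds-metadata idea (a sketch of the general principle, not of the specific mechanisms of the cited works), a "fat pointer" can pair a buffer with its length and check every load against those bounds; `FatPtr` is an illustrative name:

```python
class FatPtr:
    def __init__(self, buf):
        self.buf = buf
        self.length = len(buf)  # bounds metadata carried with the pointer

    def load(self, i):
        # Every access is checked against the recorded bounds.
        if not 0 <= i < self.length:
            raise IndexError(f"out-of-bounds access at index {i}")
        return self.buf[i]

p = FatPtr(list(range(10)))
print(p.load(9))  # 9 -- the last valid position of a 10-element buffer
```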

5.1 Observing Memory Errors

The operational semantics we presented in Section 4.3 is undefined in the presence of invalid memory accesses. Therefore, a trivial description of an error could be characterized by the mere inability to evaluate a term. More formally, given a term t and an evaluation context C, an invalid memory access in t's evaluation implies there is no l, C′ such that C ⊢ t ⇓ l, C′ can be proven. Unfortunately, such an approach fails to distinguish between faulty programs and non-terminating evaluations. Indeed, if t's evaluation does not terminate (e.g. due to infinite recursion), then we cannot prove C ⊢ t ⇓ l, C′ either. As a result, we are left with a system that can only determine the absence of invalid accesses when a program successfully terminates. A more useful result would be to determine the cause of a failed execution. Termination obviously remains undecidable, but an execution whose failure is caused by the evaluation's inability to progress can offer some insight into the error that occurred. Consider for instance the following inference rules, describing the operational semantics of a simple language to express divisions:

Lit
t ∈ N
─────
t ⇓ t

Div
t1 ⇓ n1        t2 ⇓ n2        n2 ≠ 0
────────────────────────────────────
t1 ÷ t2 ⇓ n1 ÷ n2

Attempting to evaluate a term (16 ÷ 8) ÷ 0 will fail because Div cannot satisfy the hypothesis n2 ≠ 0. This will result in a term 2 ÷ 0 that cannot be evaluated further by any rule, as illustrated below:

16 ∈ N        8 ∈ N
─────── Lit   ───── Lit
16 ⇓ 16       8 ⇓ 8       8 ≠ 0             0 ∈ N
──────────────────────────────── Div        ───── Lit
(16 ÷ 8) ⇓ 2                                0 ⇓ 0       0 ≠ 0
───────────────────────────────────────────────────────────── Div
(16 ÷ 8) ÷ 0 ⇓ n

The idea is to "inspect" the evaluation to observe the cause of non-progression. We achieve this goal by using the resolution rule −→ (cf. Section 3.3). Let q = C ⊢ t ⇓ l, C′ denote the evaluation of a term t in an evaluation context C. We compute the resolution's transitive closure −→*, starting from the singleton Q = {q} as the set of proof obligations, and an empty substitution table σ = ∅. If the evaluation does not terminate, neither does −→*, and we are left with an undecidable result. However, {q}, σ −→* Q′, σ′ denotes either a successful execution, characterized by Q′ = ∅, or a failed one if Q′ is not empty. In the latter case, the cause of the error can be inferred by looking at the clauses in Q′ that could not be satisfied. We illustrate this principle below:

Example 5.1.1: Division by zero

Let us apply resolution on the evaluation of (16 ÷ 8) ÷ 0. We start with q = (16 ÷ 8) ÷ 0 ⇓ n, take {q} as the set of proof obligations, and compute the transitive closure:

{q}, ∅ −→* Q′, σ′

The only rule that applies to q is Div; therefore we first compute:

−→ {16 ÷ 8 ⇓ n1, 0 ⇓ n2, n2 ≠ 0}, {t1 ↦ 16 ÷ 8, t2 ↦ 0}

The second step is to prove Div’s hypotheses. Proving 0 ⇓ n2 requires an application of Lit:

−→ {16 ÷ 8 ⇓ n1, 0 ∈ N, 0 ≠ 0}, {t1 ↦ 16 ÷ 8, t2 ↦ 0, n2 ↦ 0}

0 ∈ N is proven trivially. However, we now also know that n2 = 0, which leaves 0 ≠ 0 unsatisfiable and indicates the proof is incorrect. Nonetheless, the transitive closure is not complete yet, as we still have to prove 16 ÷ 8 ⇓ n1. This will eventually determine that n1 = 2:

−→* {0 ≠ 0}, {t1 ↦ 16 ÷ 8, t2 ↦ 0, n2 ↦ 0, n1 ↦ 2, ...}

The process hence results in a non-empty set Q′, symbolizing a failed evaluation. Furthermore, the content of this set reveals that the error is due to the unsatisfiable clause 0 ≠ 0, which in the above semantics is indicative of a division by zero.

Note that the approach is sound only because the semantics is defined so that errors prevent progression. Otherwise, evaluation would successfully compute an incorrect value, hiding issues within the execution. This limitation is further discussed in Section 5.2. Another important property of the operational semantics is that it is deterministic: given a term and an evaluation context, at most one rule can apply. Consequently, a failing evaluation does not depend on a particular choice of inference rules in the derivation.
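The observation strategy can be mimicked for the division language above in a few lines of Python (a toy model; the term representation and `ev` are illustrative): instead of getting stuck, the evaluator records which instance of the premise n2 ≠ 0 could not be proven.

```python
def ev(t, failures):
    if isinstance(t, int):
        return t                  # rule Lit: a literal evaluates to itself
    # Rule Div: a pair (t1, t2) stands for the term t1 ÷ t2.
    n1, n2 = ev(t[0], failures), ev(t[1], failures)
    if n2 == 0:
        failures.append((n1, n2))  # the premise n2 != 0 is unsatisfiable
        return None
    return n1 // n2

failures = []
ev(((16, 8), 0), failures)       # (16 ÷ 8) ÷ 0
print(failures)  # [(2, 0)] -- the failed premise pinpoints 2 ÷ 0
```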

5.1.1 Uninitialized or Moved Memory Errors

Uninitialized or moved memory errors occur under any of these three conditions:

• v = C.µ.l appears in the premise of a rule while C.µ.l ∈ {⊥, ⊠}. In other words, l is the address of some uninitialized or moved memory.

• v = C.µ.l appears in the premise of a rule while l = 0. In other words, l is the null pointer, denoting an unallocated variable.

• An unallocated variable appears as the left operand of a mutation or move assignment.

In the first two situations, the error is due to value dereferencing. A predicate on C.µ.l denotes a predicate on a value. If such a value is uninitialized (e.g. C.µ.l = ⊥), moved (e.g. C.µ.l = ⊠) or simply unallocated (e.g. l = 0), then evaluation is not defined. This situation may occur while evaluating an assignment's right operand, the callee of a function call, the record of a field dereferencing, or the guard of a conditional term. Just like a typical runtime system, the operational semantics of the A-calculus is not defined in the presence of uninitialized or moved memory errors. If any of the above situations occurs, no evaluation rule can apply and the program derivation is unable to continue. Therefore we can formalize an uninitialized or moved memory error as follows:

Definition 5.1.1: Uninitialized memory errors

Let C ⊢ t ⇓ l, C′ denote the evaluation of a term t ∈ AC in a context C. An uninitialized memory error occurs in t's evaluation if and only if:

{C ⊢ t ⇓ l, C′}, ∅ −→* Q, σ

such that either

Q contains a clause of the form v ∉ {⊥, ⊠} and v[σ] = ⊥, or

Q contains a clause of the form l ≠ 0 and l[σ] = 0.

Definition 5.1.2: Moved memory errors

Let C ⊢ t ⇓ l, C′ denote the evaluation of a term t ∈ AC in a context C. A moved memory error occurs in t's evaluation if and only if:

{C ⊢ t ⇓ l, C′}, ∅ −→* Q, σ

such that Q contains a clause of the form v ∉ {⊥, ⊠} and v[σ] = ⊠.

Let us illustrate the above definitions with a concrete example. Recall the rule ER-Mut, which describes the semantics of the mutation operator. The value C′.µ.m in the premise represents the content stored at location m, which is the value to which the right operand (i.e. t) evaluates. If t refers to uninitialized memory, then C ⊢ t ⇓ m, C′ in the premise produces a location m such that v = C′.µ.m = ⊥, and v ∉ {⊥, ⊠} is not satisfied.

Example 5.1.2: Uninitialized memory errors

Consider the following faulty program:

let x, y in alloc x, y ; x ← y

For the sake of conciseness, we skip grayed-out terms to focus solely on the assignment, and assume it is evaluated in a context C such that:

C:    π = [x ↦ l1, y ↦ l2],    µ = [l1 ↦ ⊥, l2 ↦ ⊥]

In other words, both x and y are allocated but neither is bound to initialized memory. Then the assignment's evaluation is described by the following excerpt of derivation:

C ⊢ y ⇓ l2, C        l2 ∈ dom(C.µ)        ⊥ = C.µ.l2        ⊥ ∉ {⊥, ⊠}        ···
──────────────────────────────────────────────────────────────────────────────────
C ⊢ x ← y ⇓ l, C′

The derivation fails because the premise ⊥ ∉ {⊥, ⊠} cannot be satisfied. This means that the transitive closure of the resolution

{C ⊢ x ← y ⇓ l, C′}, ∅ −→* Q, σ

computes a non-empty set of clauses Q such that (⊥ ∉ {⊥, ⊠}) ∈ Q. The error is therefore detected.

5.1.2 Use After Free Errors

Freed locations are removed from the memory table (cf. E-Del). Hence, determining a use after free error simply consists in checking that l ∈ dom(C.µ) every time a clause of the form v = C.µ.l appears in a premise. Note however that observing l = 0 indicates uninitialized memory rather than a use after free error. Only variables can be bound to the null pointer, from the moment they are declared until they are allocated. Furthermore, memory allocations never produce the null pointer 0. Therefore, observing the null pointer cannot indicate a use after free error. This particular case is handled in the formal description by constraining l ≠ 0.

Definition 5.1.3: Use after free errors

Let C ⊢ t ⇓ l, C′ denote the evaluation of a term t ∈ AC in a context C. A use after free error occurs in t's evaluation if and only if:

{C ⊢ t ⇓ l, C′}, ∅ −→* Q, σ

such that Q contains a clause of the form l ∈ dom(C.µ), where l[σ] ∉ dom(C.µ[σ]) and l[σ] ≠ 0.

Example 5.1.3: Use after free errors

Consider the following faulty program:

let x, y in alloc x, y ; del y ; x ← y

For the sake of conciseness, we skip grayed out terms to focus solely on the assignment, and assume it is evaluated in a context C such that:

C:    π = [x ↦ l1, y ↦ l2],    µ = [l1 ↦ ⊥]

In other words, x is allocated and bound to uninitialized memory, while y is bound to freed memory. Then the assignment's evaluation is described by the following excerpt of derivation:

C ⊢ y ⇓ l2, C        l2 ∈ dom(C.µ)        ···
─────────────────────────────────────────────
C ⊢ x ← y ⇓ l, C′

The derivation fails because l2 ∉ dom(C.µ) in the premise. This means that the transitive closure of the resolution

{C ⊢ x ← y ⇓ l, C′}, ∅ −→* Q, σ

computes a non-empty set of clauses Q such that (l2 ∈ dom(C.µ)) ∈ Q, with l2 ≠ 0. The error is therefore detected.
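The membership check of Definition 5.1.3 can be sketched in Python (illustrative names; `UNINIT` stands for ⊥, and the null pointer 0 is reported separately, as in the text):

```python
UNINIT = object()  # stands in for the uninitialized-memory marker

def deref(mu, l):
    if l == 0:
        return "error: uninitialized (null pointer)"
    if l not in mu:                 # l has left dom(mu): it was freed
        return "error: use after free"
    return mu[l]

mu = {1: UNINIT, 2: UNINIT}  # x -> l1, y -> l2, both allocated
del mu[2]                    # del y: l2 leaves dom(mu)
print(deref(mu, 2))  # error: use after free
```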

5.1.3 Doubly Freed Memory

A doubly freed memory error occurs upon deallocating a memory location that has already been deallocated once. In real systems, this often opens the door to security exploits, as it may corrupt the runtime system's memory bookkeeping [38]. In the A-calculus, a doubly freed memory error occurs when del x is evaluated on a variable x that no longer refers to an allocated location. Looking at E-Del, we can see that the evaluation cannot proceed if the premise l ∈ dom(C′.µ) is unsatisfiable. As for use after free errors, observing l = 0 indicates uninitialized memory rather than a doubly freed memory error: the term representing the memory location to deallocate actually refers to the null pointer, which obviously cannot have been allocated.

Definition 5.1.4: Doubly freed memory errors

Let C ⊢ t ⇓ l, C′ denote the evaluation of a term t ∈ AC in a context C. A doubly freed memory error occurs in t's evaluation if and only if:

{C ⊢ t ⇓ l, C′}, ∅ −→* Q, σ

such that Q contains a clause of the form l ∈ dom(C.µ), where l[σ] ∉ dom(C.µ[σ]) and l[σ] ≠ 0.

Example 5.1.4: Doubly freed memory error

Consider the following faulty program:

let x in alloc x ; del x ; del x

For the sake of conciseness, we skip grayed-out terms to focus solely on the sequence of deallocations, and assume it is evaluated in a context C such that:

C:    π = [x ↦ l1],    µ = [l1 ↦ ⊥]

In other words, x is allocated but uninitialized; the first del x then removes l1 from the memory table. The doubly freed memory error is illustrated in the following derivation:

··· l1 ∈ dom(C.µ) ···            ··· l1 ∈ dom(C′.µ) ···
───────────────────── E-Del      ─────────────────────── E-Del
C ⊢ del x ⇓ l1, C′               C′ ⊢ del x ⇓ l, C″
───────────────────────────────────────────────────────
C ⊢ del x ; del x ⇓ l, C″

The intermediate evaluation context is defined by C′ = C[µ ↦ µ∖{l1}], that is, C with l1 removed from its memory table. The derivation fails because l1 ∉ dom(C′.µ). This means that the transitive closure of the resolution

{C ⊢ del x ; del x ⇓ l, C″}, ∅ −→* Q, σ

computes a non-empty set of clauses Q such that (l1 ∈ dom(C′.µ)) ∈ Q, with l1 ≠ 0. The error is therefore detected.

5.2 Signaling Memory Errors

Another approach to detecting memory issues is to instrument the semantics, in order to represent execution failures explicitly. More formally, given a term t and an evaluation context C, an instrumented semantics evaluates a faulty program as C ⊢ t ⇓X ξ, C′, where ξ is a value indicating the occurrence of an error.

While this technique requires a tedious modification of the operational semantics, in order to properly propagate the error to the root of the proof tree, it is more powerful than the observation strategy. First, it can model failures that do not prevent progression. For instance, a memory leak typically does not prevent a program from being successfully evaluated, and therefore cannot be detected by observing the unsatisfiable clauses of a failed execution. Second, it allows a simpler description of the error. While the evaluation of a faulty program (with respect to invalid memory accesses) is guaranteed to eventually fail, the causes of such a failure might indicate multiple errors. Distinguishing between those and/or tracing a particular one back to its origin becomes a difficult task, requiring careful inspection of the substitutions that result from the resolution's application. Evidently, the difficulty of this exercise increases with the size of the program, as the set of substitutions grows linearly with the number of subterms. Consider for instance the following term:

let x, y in x ← y

Assuming the term is evaluated in an empty context, the evaluation of the assignment is described by the following excerpt of derivation:

C ⊢ y ⇓ C.π.y, C        C.π.y ∈ dom(C.µ)        C ⊢ x ⇓ C.π.x, C        C.π.x ≠ 0        ···
────────────────────────────────────────────────────────────────────────────────────────────
C ⊢ x ← y ⇓ l, C′

Two uninitialized memory errors can be observed. One is provoked by y's dereferencing and indicated by the unsatisfiability of C.π.y ∈ dom(C.µ). The second is provoked by x's dereferencing and indicated by the unsatisfiability of C.π.x ≠ 0. Computing {C ⊢ let x, y in x ← y ⇓ l, C′}, ∅ −→* Q′, σ′ will produce a set Q′ with both unsatisfiable clauses. Upon meticulous inspection of σ′, we can indeed infer that x was unallocated, but only after tracing all substitutions resulting from two applications of E-Let, one of E-Mut and one of E-Var.

This problem has consequences in real systems languages as well, and greatly contributes to the difficulty of debugging. Indeed, tracing an error back to its origin in a large program might involve a rigorous review of dozens of instructions executed after the actual point at which the error occurred. As a result, unless the error is known to be recoverable, one desirable property of a runtime system is that it fails fast. In other words, evaluation should report errors as soon as possible. One way to adopt this behavior is to modify potentially failing rules so that they deviate from the normal execution path immediately after an error is detected. Such a mechanism can be implemented with exceptions [30]. Traditionally, an exception is a special signal a (sub)system can emit when it encounters an unexpectedly abnormal situation. This signal might also contain information about the nature of the error. For instance, an Intel CPU emits an interrupt together with an error code when presented with a division by zero [82]. A piece of software, typically the operating system, may then catch this signal, read the error code and hopefully recover from the error.
In other words, exceptions represent an additional operation outcome which deviates from the normal course of execution.

Example 5.2.1: Division by zero

Consider the following inference rules, describing the operational semantics of a simple language to express divisions. The value ξ0 is an exception denoting a division by zero.

Lit
    t ∈ N
    ─────
    t ⇓ t

Div
    t1 ⇓ n1    t2 ⇓ n2    n1, n2 ∈ N    n2 ≠ 0
    ──────────────────────────────────────────
    t1 ÷ t2 ⇓ n1 ÷ n2

Div-0
    t2 ⇓ n2    n2 = 0
    ─────────────────
    t1 ÷ t2 ⇓ ξ0

Evaluating the term (16 ÷ 8) ÷ 0 will obviously result in an error. But instead of just stopping after unsuccessfully attempting to apply Div on 2 ÷ 0,

the evaluation keeps progressing by applying Div-0 instead. Therefore the system eventually proves (16 ÷ 8) ÷ 0 ⇓ ξ0.
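The three rules above can be sketched as a tiny interpreter, here in Rust, with an Err variant standing for the exception value ξ0. The enum and function names are illustrative, and this sketch evaluates both operands before checking the divisor, which is one possible reading of the rules.

```rust
// Models the exception value ξ0 as an explicit error outcome.
#[derive(Debug, PartialEq)]
enum Exn {
    DivByZero,
}

// Terms of the small division language: literals and divisions.
enum Term {
    Lit(i64),
    Div(Box<Term>, Box<Term>),
}

// Evaluation either produces a number (rules Lit and Div) or the
// exception ξ0 (rule Div-0); `?` forwards exceptions from subterms.
fn eval(t: &Term) -> Result<i64, Exn> {
    match t {
        Term::Lit(n) => Ok(*n), // rule Lit
        Term::Div(t1, t2) => {
            let n1 = eval(t1)?;
            let n2 = eval(t2)?;
            if n2 == 0 {
                Err(Exn::DivByZero) // rule Div-0
            } else {
                Ok(n1 / n2) // rule Div
            }
        }
    }
}
```

Evaluating (16 ÷ 8) ÷ 0 with this sketch yields Err(DivByZero) rather than getting stuck, mirroring how the system proves (16 ÷ 8) ÷ 0 ⇓ ξ0.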

We propose to instrument the A-calculus' semantics to model memory errors as exceptions. We formalize this instrumentation in the form of another evaluation operator ⇓X, for which we redefine each inference rule. Conclusions are either of the form C ⊢ t ⇓X l, C′, mimicking the normal evaluation, or of the form C ⊢ t ⇓X ξ, C′ to describe an exceptional outcome. We define the set of exception values as follows:

Definition 5.2.1: Exception values

The set of exception values X is defined by X = {ξ⊥, ξ, ξuaf, ξdf, ξlk}, where:

• ξ⊥ denotes an uninitialized memory error,

• ξ denotes a moved memory error,

• ξuaf denotes a use after free error,

• ξdf denotes a doubly freed memory error, and

• ξlk denotes a memory leak.

We call the semantics of ⇓X the “safe semantics” of the A-calculus, while we refer to that introduced in Section 4.3 as its “unsafe semantics”.

5.2.1 Uninitialized or Moved Memory Errors

As mentioned in Section 5.1.1, uninitialized or moved memory errors are generally due to a failure in value dereferencing. Such a situation may occur while evaluating assignments, function callees, field dereferencing and conditional terms. As a result, the instrumentation process is extremely similar for every rule. Three cases should be considered:

1. The term to dereference is bound to the null pointer.

2. The term to dereference is bound to uninitialized memory.

3. The term to dereference is bound to moved memory.

The first two cases raise a ξ⊥ exception, while the last raises ξ. For instance, ER-Field, that describes the dereferencing of a record field, is instrumented as follows:

Uninitialized or moved memory in field dereferencing

EX-Field-0
    C ⊢ t ⇓X 0, C′
    ─────────────────────
    C ⊢ t.x ⇓X ξ⊥, C′

EX-Field-⊥
    C ⊢ t ⇓X l, C′    l ∈ dom(C′.µ)    C′.µ.l = ⊥
    ─────────────────────────────────────────────
    C ⊢ t.x ⇓X ξ⊥, C′

EX-Field-⊠
    C ⊢ t ⇓X l, C′    l ∈ dom(C′.µ)    C′.µ.l = ⊠
    ─────────────────────────────────────────────
    C ⊢ t.x ⇓X ξ, C′

Example 5.2.2: Uninitialized memory in field dereferencing

Consider the following faulty program:

let x in alloc x ; x.y

For the sake of conciseness, we skip grayed out terms to focus solely on the dereferencing, and assume it is evaluated in the context C such that:

         π           µ
    C =  x ↦ l1      l1 ↦ ⊥

In other words, x is allocated but refers to uninitialized memory. Then the term's evaluation is described by the following excerpt of derivation:

    C ⊢ x ⇓X l1, C    l1 ∈ dom(C.µ)    C.µ.l1 = ⊥
    ───────────────────────────────────────────── EX-Field-⊥
    C ⊢ x.y ⇓X ξ⊥, C
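The three dereferencing cases can be sketched in Rust as a single lookup on a memory table, assuming locations are machine integers with 0 playing the role of the null pointer. The enum names and the `load` helper are illustrative, not part of the calculus.

```rust
use std::collections::HashMap;

// Exception values relevant to dereferencing.
#[derive(Debug, PartialEq)]
enum Exn {
    Uninit,       // ξ⊥: null pointer or uninitialized memory
    Moved,        // moved-memory exception
    UseAfterFree, // location absent from the memory table
}

// A memory slot is uninitialized, moved, or holds a value.
enum Slot {
    Uninit,
    Moved,
    Val(i64),
}

// Dereference location `l` in memory table `mem`, mirroring the three
// instrumented cases from the text.
fn load(mem: &HashMap<u64, Slot>, l: u64) -> Result<i64, Exn> {
    if l == 0 {
        return Err(Exn::Uninit); // bound to the null pointer
    }
    match mem.get(&l) {
        Some(Slot::Val(v)) => Ok(*v),
        Some(Slot::Uninit) => Err(Exn::Uninit), // bound to uninitialized memory
        Some(Slot::Moved) => Err(Exn::Moved),   // bound to moved memory
        None => Err(Exn::UseAfterFree),         // freed location (Section 5.2.2)
    }
}
```

With a table {1 ↦ ⊥, 2 ↦ 7, 3 ↦ moved}, loading 1 raises the uninitialized error of Example 5.2.2, while loading 3 raises the moved-memory exception.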

In mutation and move assignments, another type of uninitialized memory error can occur if the left operand is not allocated. In this situation, the left operand evaluates to the null pointer (i.e. 0), and should be reported as an uninitialized memory error. For instance, ER-Mut, that describes the mutation assignment, is instrumented as below:

Uninitialized or moved memory in mutation assignments

EX-Mut-R0
    C ⊢ t ⇓X 0, C′
    ─────────────────────
    C ⊢ x ≔ t ⇓X ξ⊥, C′

EX-Mut-R⊥
    C ⊢ t ⇓X m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v = ⊥
    ──────────────────────────────────────────────────────
    C ⊢ x ≔ t ⇓X ξ⊥, C′

EX-Mut-R⊠
    C ⊢ t ⇓X m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v = ⊠
    ──────────────────────────────────────────────────────
    C ⊢ x ≔ t ⇓X ξ, C′

EX-Mut-L0
    C ⊢ t ⇓X m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, ⊠}    C′.π.x = 0
    ─────────────────────────────────────────────────────────────────────────
    C ⊢ x ≔ t ⇓X ξ⊥, C′

The first three rules check memory errors on the right operand, while EX-Mut-L0 applies when the left operand refers to the null pointer. Note that the rule does not compute a copy of v, and evaluates the left operand immediately after the first one. This differs from ER-Mut, in which the left operand is evaluated after v is copied. The change is in fact irrelevant, because copies only add information to the memory table, and therefore cannot influence the result of a dereferencing. This also means that the instrumentation of the move assignment is actually identical to that of the mutation assignment.

Example 5.2.3: Uninitialized memory exception in an assignment

Consider the following faulty program:

let x, y in alloc y ; y ← 2 ; x ≔ y

For the sake of conciseness, we skip grayed out terms to focus solely on the assignment, and assume it is evaluated in the context C such that:

         π           µ
    C =  x ↦ 0       l1 ↦ 2
         y ↦ l1

In other words, x is declared but unallocated and y refers to the value 2. Then the assignment's evaluation is described by the following excerpt of derivation:

    C ⊢ y ⇓X l1, C    l1 ∈ dom(C.µ)    ···    C.π.x = 0
    ──────────────────────────────────────────────────── EX-Mut-L0
    C ⊢ x ≔ y ⇓X ξ⊥, C

5.2.2 Use After Free Errors

Use after free errors are characterized by an attempt to dereference a (non-null) memory location that is no longer defined in the memory table. Therefore it suffices to instrument the rules in which a clause of the form C.µ.l appears in the premises. For instance, E-Move is instrumented as follows:

Use after free in move assignments

EX-Move-UaF
    C ⊢ t ⇓X m, C′    m ∉ dom(C′.µ)    m ≠ 0
    ────────────────────────────────────────
    C ⊢ x ← t ⇓X ξuaf, C′

Not only does the rule check that the location is undefined in C′.µ, but also that it is not the null pointer. Otherwise we should instead raise ξ⊥, to denote an uninitialized memory error, as described in the previous section.

Example 5.2.4: Use after free error in an assignment

Consider the following faulty program:

    let x, y in alloc y ; del y ; x ← y

For the sake of conciseness, we skip grayed out terms to focus solely on the assignment, and assume it is evaluated in the context C such that:

         π           µ
    C =  x ↦ 0
         y ↦ l1

In other words, x is declared but unallocated and y has been freed. Then the assignment's evaluation is described by the following excerpt of derivation:

    C ⊢ y ⇓X l1, C    l1 ∉ dom(C.µ)    l1 ≠ 0
    ───────────────────────────────────────── EX-Move-UaF
    C ⊢ x ← y ⇓X ξuaf, C

5.2.3 Doubly Freed Memory Errors

Doubly freed memory errors are a specific kind of use after free errors that only occur when deallocating a freed memory location. In other words, doubly freed memory errors occur when evaluating a term of the form del t, where t's evaluation produces a (non-null) memory location that is not defined in the memory table. Therefore the only rule to instrument is E-Del.

Doubly freed memory

EX-Del-DF
    C ⊢ t ⇓X l, C′    l ∉ dom(C′.µ)    l ≠ 0
    ────────────────────────────────────────
    C ⊢ del t ⇓X ξdf, C′

Example 5.2.5: Doubly freed memory error Consider the following faulty program:

let x in alloc x ; del x ; del x

For the sake of conciseness, we skip grayed out terms to focus solely on the second deallocation, and assume it is evaluated in the context C such that:

         π           µ
    C =  x ↦ l1

The doubly freed memory error is illustrated in the following excerpt of derivation: . .

    C ⊢ x ⇓X l1, C    l1 ∉ dom(C.µ)    l1 ≠ 0
    ───────────────────────────────────────── EX-Del-DF
    C ⊢ del x ⇓X ξdf, C
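The use-after-free and double-free checks of the last two sections amount to the same test on the memory table, performed on a read and on a deallocation respectively. A minimal Rust sketch, with illustrative names and u64 locations where 0 is the null pointer:

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Exn {
    Uninit,       // ξ⊥
    UseAfterFree, // ξuaf (cf. EX-Move-UaF)
    DoubleFree,   // ξdf  (cf. EX-Del-DF)
}

// An evaluation context reduced to its memory table.
struct Ctx {
    mem: HashMap<u64, i64>,
}

impl Ctx {
    // Dereference for a move or mutation: a non-null location absent
    // from the memory table signals a use after free.
    fn read(&self, l: u64) -> Result<i64, Exn> {
        if l == 0 {
            return Err(Exn::Uninit);
        }
        self.mem.get(&l).copied().ok_or(Exn::UseAfterFree)
    }

    // `del`: freeing a non-null location that is already gone signals
    // a doubly freed memory error.
    fn del(&mut self, l: u64) -> Result<(), Exn> {
        if l == 0 {
            return Err(Exn::Uninit);
        }
        self.mem.remove(&l).map(|_| ()).ok_or(Exn::DoubleFree)
    }
}
```

Running Example 5.2.5 against this sketch, the first `del` succeeds and the second reports the double free; a subsequent read reports a use after free.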

5.2.4 Memory Leaks

Memory leaks are more difficult to detect because they are not caused by an unsatisfiable premise. Moreover, since the A-calculus assumes an unbounded memory, no amount of memory leaks can ever cause evaluation to fail. Hence, we cannot use the same approach as that we have presented so far. Instead, detecting a memory leak requires additional instrumentation. The goal is to inspect the memory table to determine whether all non-freed locations are still reachable by at least one alive variable in the program¹ (e.g. a variable in the pointer table's domain). Then, if a leak is detected, the evaluation is forced to fail. A first step toward the formalization of this idea is to define memory accessibility.

Definition 5.2.2: Local memory accessibility

Given an evaluation context C, a memory location l is said to be locally accessible in C, written C ⊩ l, if there exists a variable in dom(C.π) that refers to it, or if it is referred to by the field of a record stored at an accessible memory location in C. More formally, a memory location l ∈ dom(C.µ) is locally accessible if and only if:

• ∃x ∈ dom(C.π), C.π.x = l, or

• ∃m ∈ dom(C.µ) such that m is accessible and r = C.µ.m is a record such that ∃x ∈ dom(r), r.x = l.

Example 5.2.6: Local memory accessibility

Consider the following evaluation context:

         π           µ
         x ↦ l1      l1 ↦ va
    C =  y ↦ l2      l2 ↦ ⟨z ↦ l3⟩
                     l3 ↦ vb
                     l4 ↦ vc

The location l1 is locally accessible by x because C.π.x = l1. The location l3 is locally accessible by y, because the record r stored at l2 is defined so that r.z = l3, and l2 is locally accessible by y. The location l4, however, is not locally accessible.
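Local accessibility is essentially a reachability computation over the pointer table and record fields, reminiscent of a mark phase. The following Rust sketch, with illustrative names, computes the accessible set and the locations it misses:

```rust
use std::collections::{HashMap, HashSet};

// A semantic value is either plain data or a record mapping field
// names to memory locations.
enum Val {
    Data,
    Record(HashMap<String, u64>),
}

// Locations locally accessible from the pointer table `pi`, following
// record fields transitively; 0 (the null pointer) is never accessible.
fn accessible(pi: &HashMap<String, u64>, mu: &HashMap<u64, Val>) -> HashSet<u64> {
    let mut seen = HashSet::new();
    let mut stack: Vec<u64> = pi.values().copied().collect();
    while let Some(l) = stack.pop() {
        if l == 0 || !seen.insert(l) {
            continue;
        }
        if let Some(Val::Record(fields)) = mu.get(&l) {
            stack.extend(fields.values().copied());
        }
    }
    seen
}

// A leak candidate is a non-freed location that no alive variable reaches.
fn leaked(pi: &HashMap<String, u64>, mu: &HashMap<u64, Val>) -> HashSet<u64> {
    let reach = accessible(pi, mu);
    mu.keys().copied().filter(|l| !reach.contains(l)).collect()
}
```

On the context of Example 5.2.6 (x ↦ l1, y ↦ l2, with a record ⟨z ↦ l3⟩ at l2), the accessible set is {l1, l2, l3} and l4 is reported as unreachable.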

One could be tempted to simply check whether the non-freed memory locations in dom(C.µ) are locally accessible in C after each evaluation step. Consider for instance this program:

let x in alloc x ; x ← 4 ; let y in alloc y ; y ← 2 ; x &- y

The last assignment’s evaluation yields an evaluation context as follows: π µ C = x 7→ l2 l1 7→ 4 y 7→ l2 l2 7→ 2

¹The reader will note that this strategy is reminiscent of tracing garbage collection [142].

Because of the reassignment of x to l2, l1 is no longer locally accessible in C, which would indicate a leak. This approach is unfortunately unsound, because a pointer table's domain is only defined for the current frame (hence the phrase locally accessible). Indeed, if a name is shadowed in C.π, the location to which it refers cannot be retrieved. Therefore, whether or not a particular memory location is accessible in an ancestor frame is undecidable. Consider for example this slightly altered version of the program:

let x in alloc x ; x ← 4 ; let x, y in alloc y ; y ← 2 ; x &- y

Now the last assignment’s evaluation yields the following context:

         π           µ
    C =  x ↦ l2      l1 ↦ 4
         y ↦ l2      l2 ↦ 2

Once again, the memory location l1 is not locally accessible in C. However, there exists a variable that refers to it in the previous frame, as the outer x has not been reassigned. In other words, l1 is locally accessible in the previous frame. There are two ways to solve this issue:

Use α-conversion The goal of this approach is to avoid name shadowing by renaming identifiers so that they are unique across the entire program. For instance, the term let x in let x in alloc x is α-equivalent (i.e. equivalent up to α-conversion) to let x in let x′ in alloc x′. The reader will recall that we have already employed this method to deal with parameters in E-Call (c.f. Section 4.3.2).

Formalize frames A second approach is to modify evaluation contexts so that they keep track of each frame's pointer map. Then, absolute memory accessibility can be checked by satisfying local accessibility in at least one frame.

5.2.5 Memory Leaks with α-conversion

Using α-conversion is the easiest approach, since it essentially removes name shadowing. Consequently, local accessibility is sufficient to determine memory leaks, because a pointer map necessarily "sees" all variables. This leads to the following definition:

Definition 5.2.3: Memory Leak

Let C ⊢ t ⇓ l, C′ be the evaluation of a term t in an evaluation context C. A memory leak is characterized by the existence of a location l′ ∈ dom(C′.µ) such that C′ ⊮ l′.

Signaling memory leaks requires instrumenting all rules, so that they check for the existence of a leaked location in the resulting context. At a meta-level, this corresponds to the following transformation:

    Γ                          Γ    ∀l′ ∈ dom(C′.µ), C′ ⊩ l′
    ──────────────      ⟿     ─────────────────────────────
    C ⊢ t ⇓ l, C′              C ⊢ t ⇓X l, C′

                               Γ    ∃l′ ∈ dom(C′.µ), C′ ⊮ l′
                               ─────────────────────────────
                               C ⊢ t ⇓X ξlk, C′

For instance, the evaluation of aliasing assignments is instrumented as follows:

Memory leak in aliasing assignments

EX-Alias
    C ⊢ t ⇓X l, C′    ∀l′ ∈ dom(C′.µ), C′ ⊩ l′
    ──────────────────────────────────────────
    C ⊢ x &- t ⇓X l, C′[π.x ↦ l]

EX-Alias-LK
    C ⊢ t ⇓X l, C′    ∃l′ ∈ dom(C′.µ), C′ ⊮ l′
    ──────────────────────────────────────────
    C ⊢ x &- t ⇓X ξlk, C′

Example 5.2.7: Memory leak

Consider the following program:

let x in alloc x ; x ← 2

Assume C and C0 are given as follows:

         π           µ                         π           µ
    C =  x ↦ l1      l1 ↦ 4     and     C′ =  x ↦ l1      l1 ↦ 4
                                                           l2 ↦ 2

Then the term’s evaluation is described by the following excerpt of deriva- 102 CHAPTER 5. DYNAMIC SAFETY

tion: . . 0 C[π.x0 7→ 0] ` x0 ← 2 ⇓ l2, C 0 C[π.x0 7→ 0] ` alloc x0 ; x0 ← 2 ⇓ l2, C 0 C ` let x in alloc x ; x ← 2 ⇓ l2, C

The memory location l2 corresponds to the allocation made for x's redeclaration (renamed x′), and is no longer accessible in C′, which indicates a memory leak.

5.2.6 Memory Leaks with Frame-Aware Contexts

An alternative approach to model memory leaks consists in modeling the stack of pointer tables explicitly. We first need to redefine evaluation contexts in that direction.

Definition 5.2.4: Frame-aware evaluation context

Given V the set of semantic values, a frame-aware evaluation context C̃ is a record of a composite type ⟨π, µ⟩, where:

π : (X ⇸ N)∗ is a sequence of pointer tables, starting with that of the youngest frame, and

µ : N ⇸ V is a memory table.

A frame-aware evaluation context should always define at least one frame. More formally, let C̃ be a context; C̃.π ≠ ε is an invariant.

We also adapt the table notation for frame-aware evaluation contexts, and use dotted lines to distinguish between frames. Older frames are represented above younger ones. Consider for instance the following expression:

    let x in alloc x ; x ← 4 ; let x, y in alloc y ; y ← 2 ; x &- y

The following table represents the frame-aware evaluation context that stems from the evaluation of the aliasing assignment at the end of the program:

          π                  µ
          ∅                  l1 ↦ 4
          ┄┄┄┄┄┄┄┄┄┄┄┄       l2 ↦ 2
          x ↦ l1
    C̃ =   ┄┄┄┄┄┄┄┄┄┄┄┄
          x ↦ 0
          ┄┄┄┄┄┄┄┄┄┄┄┄
          x ↦ l2
          y ↦ l2

Each variable declaration creates a new frame in the evaluation context. Recall that terms of the form let x1 ... xn in t are syntactic sugar for let x1 in ... let xn in t, which explains the presence of the last three frames. The oldest frame is necessarily empty, as it corresponds to the situation before the first variable's declaration is evaluated. Finally, notice that the mapping for the first x's declaration is still defined in C̃, even if x is being shadowed. Consequently, it can be observed that the memory location l1 is in fact absolutely accessible. We define this notion formally.

Definition 5.2.5: Absolute memory accessibility

Given a frame-aware evaluation context C̃, a memory location l is said to be absolutely accessible in C̃, written C̃ ⊩∗ l, if there exists a pointer table p in C̃.π such that l is locally accessible in p.

Example 5.2.8: Absolute memory accessibility

Consider the following frame-aware evaluation context:

          π                  µ
          x ↦ l1             l1 ↦ ⟨z ↦ l2⟩
    C̃ =   ┄┄┄┄┄┄┄┄┄┄┄┄       l2 ↦ va
          y ↦ l2             l3 ↦ vb
          x ↦ l4             l4 ↦ vc

The locations l2 and l4 are locally accessible in the last frame, and therefore absolutely accessible as well. Although the location l1 is locally inaccessible in the last frame, it is globally accessible by x from the first one. The location l3, however, is neither locally nor absolutely accessible.
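Absolute accessibility simply lifts the local check over the whole sequence of frames. A minimal Rust sketch, with illustrative names, which only follows direct variable references (record fields are omitted for brevity):

```rust
use std::collections::HashMap;

// A location is absolutely accessible if it is locally accessible in
// at least one frame of the pointer-table stack (youngest first here).
fn absolutely_accessible(frames: &[HashMap<String, u64>], l: u64) -> bool {
    frames.iter().any(|pi| pi.values().any(|&m| m == l))
}
```

On the frames of Example 5.2.8, l1 and l4 are absolutely accessible while l3 is not.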

Obviously, the semantics must be modified to accommodate frame-aware evaluation contexts. This can be done systematically by applying two modifications. First, references to C.π.x must be replaced with C̃.π1.x, where C̃.π1 refers to the pointer table of the youngest frame. For example, E-Var is modified as below:

Frame-aware variable dereferencing

EFA-Var
    x ∈ dom(C̃.π1)
    ──────────────────────
    C̃ ⊢ x ⇓FA C̃.π1.x, C̃

Secondly and more importantly, rules describing the semantics of scope delimiting terms must also be altered to represent the creation and removal of a new frame. The modification consists in creating a new frame in which subterms are evaluated. This newly created frame is then removed from the resulting context. The process is described by two functions, push and pop:

    push(p1 p) = p1 p1 p
    pop(p1 p2 p) = p1 p
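These two functions can be sketched concretely in Rust on a vector of pointer tables; the youngest frame is stored last here, which reverses the calculus' sequence orientation but keeps the behavior:

```rust
use std::collections::HashMap;

type Frame = HashMap<String, u64>;

// push duplicates the youngest frame and appends the copy.
fn push(frames: &mut Vec<Frame>) {
    let top = frames.last().cloned().unwrap_or_default();
    frames.push(top);
}

// pop removes the parent of the youngest frame, so updates made to
// non-shadowed names in the child survive the pop.
fn pop(frames: &mut Vec<Frame>) {
    let child = frames.pop().expect("a context defines at least one frame");
    let _parent = frames.pop(); // drop the frame duplicated by push
    frames.push(child);
}
```

Pushing, updating the new top frame, then popping leaves a single frame that carries the update, matching pop(p1 p2 p) = p1 p.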

The function push duplicates the current frame and prepends it to the sequence of frames. As a result, the newly pushed frame is a perfect copy of its predecessor. This means that popping is achieved by removing the parent of the current frame. For example, EC-Cond-T and EC-Cond-F are modified as follows:

Frame-aware conditional terms

EFA-Cond-T
    C̃ ⊢ t1 ⇓FA l1, C̃′    C̃′.µ.l1 = true    C̃′[π ↦ push(C̃′.π)] ⊢ t2 ⇓FA l2, C̃″
    ──────────────────────────────────────────────────────────────────────────
    C̃ ⊢ if t1 then t2 else t3 ⇓FA l2, C̃″[π ↦ pop(C̃″.π)]

EFA-Cond-F
    C̃ ⊢ t1 ⇓FA l1, C̃′    C̃′.µ.l1 = false    C̃′[π ↦ push(C̃′.π)] ⊢ t3 ⇓FA l2, C̃″
    ───────────────────────────────────────────────────────────────────────────
    C̃ ⊢ if t1 then t2 else t3 ⇓FA l2, C̃″[π ↦ pop(C̃″.π)]

Name shadowing shall also be taken into account. Fortunately this can be done in the same way as in the regular semantics, by the means of the unset function. For example, E-Let is modified as follows:

Frame-aware variable declarations

EFA-Let
    p1 p = push(C̃.π)    C̃[π ↦ p1[x ↦ 0] p] ⊢ t ⇓FA l, C̃′    p1′ p′ = C̃′.π
    ──────────────────────────────────────────────────────────────────────
    C̃ ⊢ let x in t ⇓FA l, C̃′[π ↦ pop(unset(x, p1′, p1) p′)]

Let us break down this rule in several steps:

1. The premise p1 p = push(C̃.π) computes the creation of a new frame p1.

2. p1 is updated so that it maps x to the null pointer, therefore introducing x in the new scope, and potentially shadowing x in C̃.π1.

3. The update is prepended to p, which actually corresponds to C̃.π (i.e. the sequence of pointer tables before the creation of a new frame in step 1).

4. The evaluation of t is carried out, producing a new context C̃′ from which a frame has to be popped.

5. The youngest frame in C̃′ is fed to the function unset to carry over the modifications on non-shadowed names.

In other words, steps 1 to 3 push a new frame in which the declared identifier is defined. Step 4 is the evaluation of the declaration's body and step 5 restores the evaluation context.

Example 5.2.9: Frame-aware name shadowing

Consider the following term:

    let x, y in alloc x, y ; let x in alloc x ; x ← 42 ; y &- x

We skip grayed out parts and focus solely on x's second declaration. Let C̃ be a frame-aware evaluation context defined as below:

          π                  µ
          ∅                  l1 ↦ ⊥
          ┄┄┄┄┄┄┄┄┄┄┄┄       l2 ↦ ⊥
    C̃ =   x ↦ 0
          ┄┄┄┄┄┄┄┄┄┄┄┄
          x ↦ l1
          y ↦ l2

In other words, x and y are allocated in the last frame. Note that the frame above is introduced by x's first declaration, and that the oldest frame corresponds to the initial situation. The last declaration's evaluation is described below. For spatial reasons, subparts of the derivation are described separately.

    (1) = push(C̃.π)    (2) ⊢ ··· ⇓FA l, C̃′    p1′ p′ = C̃′.π
    ────────────────────────────────────────────────────────
    C̃ ⊢ let x in ··· ⇓FA l, (3)

(1) corresponds to the pushing of a new frame, and is given below.

    (1) ⟿ p1 p
        ⟿ push({x ↦ l1, y ↦ l2}{x ↦ 0}∅)
        ⟿ {x ↦ l1, y ↦ l2}{x ↦ l1, y ↦ l2}{x ↦ 0}∅

(2) corresponds to the shadowing of x in the new frame, before the evaluation of the declaration's body:

    (2) ⟿ C̃[π ↦ p1[x ↦ 0] p]
        ⟿ C̃[π ↦ {x ↦ l1, y ↦ l2}[x ↦ 0] {x ↦ l1, y ↦ l2}{x ↦ 0}∅]
        ⟿ C̃[π ↦ {x ↦ 0, y ↦ l2}{x ↦ l1, y ↦ l2}{x ↦ 0}∅]

(3) corresponds to the frame popping:

    (3) ⟿ C̃′[π ↦ pop(unset(x, p1′, p1) p′)]
        ⟿ C̃′[π ↦ pop(unset(x, {x ↦ l3, y ↦ l3}, {x ↦ l1, y ↦ l2}) p′)]
        ⟿ C̃′[π ↦ pop({x ↦ l1, y ↦ l3} p′)]
        ⟿ C̃′[π ↦ pop({x ↦ l1, y ↦ l3}{x ↦ l1, y ↦ l2}{x ↦ 0}∅)]
        ⟿ C̃′[π ↦ {x ↦ l1, y ↦ l3}{x ↦ 0}∅]

The resulting context is given as follows:

          π                  µ
          ∅                  l1 ↦ ⊥
          ┄┄┄┄┄┄┄┄┄┄┄┄       l2 ↦ ⊥
    (3) = x ↦ 0              l3 ↦ 42
          ┄┄┄┄┄┄┄┄┄┄┄┄       l4 ↦ ⊠
          x ↦ l1
          y ↦ l3

5.2.7 Forwarding Exceptions

So far, we have only altered rules that may encounter memory errors, so that they generate exceptions if they do. However, we have not yet dealt with errors that may occur during the evaluation of a subterm. For instance, if an error occurs in a function call whose result is computed in an assignment, then the error signaled in the function call must be forwarded by the assignment's evaluation. This means that all rules should be defined to forward all possible errors occurring in the evaluation of their subterms. Fortunately, this tedious task can be avoided by altering the resolution rule (c.f. Section 3.3). The advantage of this approach is that it is agnostic of the exact evaluation semantics, and operates at a meta-level instead. The goal is to replace the rule so that evaluation either stops as soon as an exception appears in the proof obligations, or continues if no errors have been signaled. More formally, this is described as follows:

    (C ⟹ q′) ∈ I    ∃σ′, q[σ′] ≡ q′[σ′]    ∃l ∈ dom(σ′), σ′.l ∈ X
    ─────────────────────────────────────────────────────────────
    {q} ∪ Q, σ −→ σ′.l, σ′

    (C ⟹ q′) ∈ I    ∃σ′, q[σ′] ≡ q′[σ′]    ∄l ∈ dom(σ′), σ′.l ∈ X
    ─────────────────────────────────────────────────────────────
    {q} ∪ Q, σ −→ Q[σ′] ∪ C[σ′], σ ⊎ σ′

In plain English, this rule states that if the resolution of a proof obligation q leads to a substitution table σ′ that maps a variable l to an exception, then the resolution concludes with that exception.

Example 5.2.10: Forwarding

Consider the following term:

let x in del x ; alloc x

An uninitialized memory error will occur during this term's execution, specifically when the deallocation is evaluated. This will be revealed by applying our altered version of the resolution rule. The process starts with {∅ ⊢ let x in del x ; alloc x ⇓ l, C′} as the initial proof obligations, and an empty substitution table. The first step matches with E-Let:

    −→ {C ⊢ del x ; alloc x ⇓ l, C′}, {C ↦ {π ↦ {x ↦ 0}, µ ↦ ∅}}

The second step matches with E-Seq and produces the proof obligations for each subterm:

    −→ {(C ⊢ del x ⇓ l1, C1), (C1 ⊢ alloc x ⇓ l, C′)}, {C ↦ ...}

Attempting to prove C ⊢ del x ⇓ l1, C1 will trigger the exception, which will bind l1 to ξ⊥ after unifying the proof obligation with the corresponding inference rule's head:

    −→ ξ⊥, {C ↦ ..., l1 ↦ ξ⊥}

No more steps are required, and the process stops, revealing the uninitialized memory error.
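The forwarding behavior described above can be sketched in Rust, where the `?` operator plays the role of the altered resolution rule: as soon as a subterm's evaluation yields an exception, the enclosing evaluation stops and returns it. The function names are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum Exn {
    Uninit, // ξ⊥
}

// `del x` when x is bound to the null pointer raises an uninitialized
// memory error.
fn del(ptr: u64) -> Result<(), Exn> {
    if ptr == 0 { Err(Exn::Uninit) } else { Ok(()) }
}

// Evaluates `del x ; alloc x` where x was declared but never allocated.
fn del_then_alloc() -> Result<u64, Exn> {
    del(0)?; // fails here; the exception is forwarded immediately
    Ok(1)    // stands for `alloc x`; never reached in this run
}
```

No further steps run after the failure, mirroring how the altered resolution stops as soon as an exception appears.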

5.3 Summary

We have illustrated the use of the A-calculus' semantics to detect memory errors in programs' evaluations. Two techniques have been presented. One is based on a post-mortem examination of failed executions to determine their cause. Such an examination is done by observing the derivation of a program's evaluation to find the hypotheses that do not hold. The second technique consists in instrumenting the A-calculus' operational semantics to model memory errors explicitly. We proposed two approaches to handle shadowing variable declarations. The first simply relies on α-conversion to remove shadowed names. The second modifies evaluation contexts so that they keep track of each frame's pointer table.

In the following chapter, we discuss a capability-based type system to statically guarantee that such memory errors cannot happen at execution. Memory lifecycle is handled at compile-time, and aliasing restrictions are put in place to prevent dangling references.

Chapter 6

Static Safety

Well-typed expressions do not go wrong.

Robin Milner [101]

Memory errors are notoriously difficult to debug, because they may induce undefined behaviors. Therefore the consequences of an error are not necessarily predictable nor easily observable. It follows that many memory errors can remain undetected during development, slip past testing and cause malfunctions in production. Security exploits are perfect examples of this issue [138]. As a result, much effort has been put toward methods that eradicate memory errors in software, materialized in industry-proven tools ranging from runtime memory debuggers [123, 125, 107] to static analyzers [33, 130, 131]. However, while these solutions are unquestionably capable, research suggests they are underused in practice [85]. Reasons for this shy adoption often mention result understandability and annoyance about false positives. Furthermore, code analyzers present themselves as opt-in solutions. Consequently, frustration with a tool's use and/or reporting thereof may lead to plain and simple abandonment [74]. In response, a more recent trend is to propose programming languages that completely eliminate the threat of memory errors, enforcing conservative compiler-checked constraints on memory accesses (e.g. Rust, discussed further below). Overapproximation (i.e. false positives) remains an issue, but as safety mechanisms are baked into the language design, seasoned developers are less tempted to bypass them [97]. These programming languages offer powerful type systems to guarantee their safety claims, typically relying on type capabilities. In particular, affine types [135] and uniqueness have wind in their sails, and have made their way into mainstream programming languages such as Rust and C++. Unfortunately, some experiences report issues with the use of such restrictive type systems [96]. Specifically, linear references can turn simple data structures into complex implementation challenges, requiring developers to juggle with a plethora of concepts that are still relatively new in the mainstream software community. Move and borrow semantics are notably hard to master in Rust, while C++'s smart pointers bring their fair share of incidental complexity. This suggests there is still room for improvement in terms of flexibility and understandability.

We postulate that misunderstood assignments certainly account for some of the fog that surrounds affine semantics. Hence, we propose to study static memory control mechanisms in the context of the A-calculus. Our analysis results in a static type system directed toward static garbage collection. Ownership [39] is used to track memory lifetime, and uniqueness [28] guarantees memory safety. The type system is informally introduced in Section 6.2 and formalized in Section 6.3. Section 6.4 then discusses the main challenges related to the support of records for static garbage collection.

We continue our exploration in Section 6.5 and suggest a type system to guarantee immutability with type capabilities. Although we do not formalize it in this thesis, we presented a formal description of a similar system for JavaScript in [117], and implemented a dynamic variant in Anzen's runtime system.

To get in line with the nomenclature commonly used in the literature related to aliasing control, we use the term reference to refer to both variables and record fields in the remainder of this chapter.

6.1 Rust’s Type System in a Nutshell

Before we delve into the details of the A-calculus's type systems, let us first have a look at Rust [89]. Rust is a modern programming language that puts emphasis on a powerful type system to guarantee strong memory safety properties. Although it is still relatively young in comparison to more established languages such as C++ or Java, Rust has gained momentum in recent years, as witnessed by its appearance in numerous online polls¹.

Rust features both high-level concepts, such as generic types and higher-order functions, and a close control over memory. As it targets performance-critical applications, its runtime cannot afford the luxury of expensive dynamic memory management algorithms. Instead, Rust opts for a comprehensive program analysis to achieve static garbage collection. Ownership and uniqueness occupy a critical role in the language's type system, and are used to guarantee memory and data safety. While move assignment semantics are chosen as the default strategy, copy assignments are also possible (although not syntactically distinguishable),

and aliasing is supported by means of borrowing. In other words, Rust's type system is defined over all three assignment semantics from the A-calculus.

Although Rust does not have a complete formal semantics, recent efforts have been directed towards a formalization and proof of its safety claims [88]. Furthermore, its use in real-world applications is early empirical evidence of the type system's soundness.

¹E.g. https://insights.stackoverflow.com/survey/2019

6.1.1 Ownership in Rust

Rust handles garbage collection statically. Similar to C, the language primarily allocates objects on the stack, so they are automatically removed from memory at the end of the call frame. Upon assignment, the language distinguishes between two kinds of data types, namely copyable and non-copyable types. Copyable types are, as their name suggests, copied when assigned. On the other hand, non-copyable types are treated linearly, with a semantics close to that of the move operator in the A-calculus.

Objects whose representation requires heap allocations (e.g. a dynamic array) cannot be mechanically deallocated when a call frame is popped off the stack. Instead, the language automatically inserts a call to a specific function, named drop, that is responsible for executing the object's deallocation. Rust uses ownership to determine where such calls must be inserted. Each object is attributed a single owning reference. When it goes out of scope, a call to drop is inserted so that the object is properly disposed of. A critical implication of this design is that memory management is deeply intertwined with lexical scopes. Just like in the A-calculus, the scope of a reference begins right after its declaration. Therefore, sequences of variable declarations effectively denote a scope hierarchy.

Example 6.1.1: Static garbage collection

    1 {
    2     let x = Box::new(1);
    3     {
    4         let y = x;
    5     }
    6     println!("{:?}", x);
    7 }

x is given ownership of a heap-allocated number at line 2. In a nested scope, y gets ownership of x's object at line 4. As y goes out of scope at line 5, the box is removed from memory. Line 6 is illegal, because x's value was moved at line 4.

The above code can be translated to the following program in the A-calculus. Notice that line 9, which corresponds to line 6 in Rust's snippet, provokes a moved memory error.

A-calculus

    1  let x in {
    2      alloc x
    3      x <- 1
    4      let y in {
    5          alloc y
    6          y <- x
    7          del y
    8      }
    9      println(arg &- x)
    10 }
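The copyable/non-copyable distinction described above can be sketched with two small Rust functions; the names and values are ours, for illustration only:

```rust
// i32 implements Copy: assignment duplicates the value, and the
// original binding remains valid.
fn demo_copy() -> (i32, i32) {
    let a = 1;
    let b = a; // copy: `a` is still usable afterwards
    (a, b)
}

// String is non-copyable: assignment moves ownership, after which the
// original binding can no longer be read.
fn demo_move() -> String {
    let s = String::from("boxed");
    let t = s; // move: reading `s` here would be rejected by the compiler
    t
}
```

Reading `s` after the move in `demo_move` would fail to compile with a "value moved" error, which is exactly the situation Example 6.1.1 illustrates.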

Parameter passing in Rust also behaves like assignment, and thus transfers ownership from non-copyable types. However, the language offers borrowed references as a mechanism to implement a pass-by-alias semantics, without jeopardizing its static garbage collection mechanism. Similarly, a return-by-alias strategy is indicated by a reference type as the function's codomain. Rust does not abstract over the notion of pointers. Instead, pointers are represented by distinct types, denoted by an ampersand prepended to the pointed type. Hence, typing a parameter with &T instructs Rust's type checker that the function expects to borrow access to an object from its call site.

Objects allocated in a function and assigned to its local variables are owned by that function, and therefore cannot be returned by alias. It follows that, disregarding static memory, a function that returns an alias necessarily returns one that it borrowed from its call site. As Rust's type checker does not inline functions at each of their call sites, it cannot determine which borrowed argument is returned from a function and has instead to infer the return value's lifetime. If the function borrows only a single reference, the type checker (or more precisely the borrow checker) can infer that the return value has to be that exact same reference. Hence, the lifetime of the function's return value can also be inferred. On the contrary, if the function borrows more than one reference, the compiler can no longer infer the return value's lifetime and the function signature has to be annotated. Annotations are given in the form of generic parameters, that are added to the function's domain and codomain. As a result, function signatures act like boundary interfaces. In the function's body, annotations are used to type check assignments with an assumed knowledge of the arguments' lifetimes. At call sites, they form a constraint system that has to be solved to type check the call.

Example 6.1.2: Lifetime annotation

    fn first_of<'a, 'b>(
        x: &'a i32,
        y: &'b i32)
        -> &'a i32
    {
        return x;
    }

'a and 'b are generic parameters that are instantiated at each call site with x and y's lifetimes. The codomain indicates that the return value has the same lifetime as the first argument.

Lifetime annotations are more than simple placeholders for arguments' lifetimes. They are bounds, for which Rust's compiler must prove a reference's validity. For instance, the codomain of the function first_of in the above example reads as "a reference on an object that lives at least as long as x". In other words, lifetime annotations describe a constraint system, which the borrow checker aims to satisfy with the largest possible bounds. This results in a flexible system, capable of describing elaborate lifetime constraints on multiple references.

Example 6.1.3: Lifetime bounds

    fn either_of<'a>(
        x: &'a i32,
        y: &'a i32)
        -> &'a i32
    {
        return if x > y
            { x } else { y };
    }

The lifetime bound 'a indicates that each argument should have a lifetime at least as long as that of the other. The codomain is annotated to indicate that the return value has a lifetime at least as long as that of the shortest-living argument.

Note that nothing prevents x and y's respective arguments from having different lifetimes. The annotation simply instructs the borrow checker to find some lifetime that does not outlive either argument, so that the return value can be assumed to live at least as long.
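The function from Example 6.1.3 can be used to exercise this note: in the sketch below, a and b have different lifetimes, yet the call type checks because the borrow checker picks a common lifetime that neither argument outlives, and the result is not used beyond it.

```rust
fn either_of<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if x > y { x } else { y }
}

fn main() {
    let a = 6;
    {
        let b = 4; // b lives in a nested, shorter scope than a
        let largest = either_of(&a, &b);
        assert_eq!(*largest, 6);
    } // the borrow of b (and the result) end here
}
```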

The support for composite types (i.e. structures) requires more elaborate annotations. A record may store references to objects whose lifetimes have to be tracked as well. Hence, composite types also have to be augmented to provide information about their captures. Rust solves this issue by adding lifetime annotations on structure definitions, which are used to specify the fields' lifetimes.

Example 6.1.4: Lifetime bounds on structures

     1 struct Foo<'a> {
     2     bar: &'a i32
     3 }
     4 fn main() {
     5     let x = 32;
     6     let mut f = Foo {
     7         bar: &x
     8     };
     9     {
    10         let y = 13;
    11         f.bar = &y;
    12     }
    13 }

The lifetime bound 'a is used to denote the minimum lifetime of the field bar. Line 6 instantiates an instance of Foo, whose bar member lives in the same scope as x. Line 11 is illegal because y does not live long enough to be captured by f.

6.1.2 Mutability and Uniqueness in Rust

Rust's references are immutable by default. Mutability is expressed explicitly by means of the keyword mut.

    let x = 42;       // An immutable variable.
    let mut y = 1337; // A mutable variable.

Similarly, mutable borrowed references also have to be declared explicitly. In order to prevent data races, the type system enforces that there can only ever be one mutable reference to an object. Furthermore, mutable borrowed references cannot coexist with immutable borrowed ones. Simply put, there can be either multiple readers and no writer, or one writer and no readers. An important consequence of this restriction is that mutable self-referential structures (e.g. graphs or doubly linked lists) cannot be represented with references in (safe²) Rust.

Example 6.1.5: Mutable references

     1 let mut x = 2;
     2 x = 4;
     3 let r1 = &mut x;
     4 *r1 = 8;
     5 x = 16;
     6 let r2 = &x;

x is declared as a mutable variable, so line 2 is legal. r1 is a mutable reference on x's value, so line 4 is legal. Line 5 is illegal because r1 is the only active mutable reference. Line 6 is illegal because an immutable reference cannot coexist with a mutable one.

² Rust offers a compiler annotation to turn its borrow checker off, which allows unrestricted aliasing in "unsafe" portions of the code.

6.2 Static Garbage Collection

This section informally introduces a type system to handle static garbage collection in the A-calculus. Our approach borrows heavily from Rust's and uses ownership and uniqueness to determine where allocations and deallocations should be inserted. While the type system is able to track ownership across conditional terms and function boundaries, it does not support records. The causes of this limitation are explained in Section 6.4, together with suggested solutions.

6.2.1 Ownership and Uniqueness

The core of the approach is to statically determine where allocations and deallocations must be inserted, so that alloc and del terms are no longer explicitly required. Obviously, the semantics of the A-calculus' operators must be taken into account to avoid memory errors.

Example 6.2.1: Static garbage collection

     1 let x in {
     2     // <= alloc x
     3     x <- 42
     4     let y in {
     5         y &- x
     6     }
     7 } // <= del x

An allocation should be inserted at line 2 so that x is ready to be initialized. No allocation should be made for y, as it is only an alias. A deallocation should be inserted at line 7, as no other reference can access x's memory.

Static garbage collection is achieved by tracking ownership and uniqueness. Upon allocation, a memory location's ownership is given to a single particular reference, which thereby becomes responsible for its deallocation. Aliases, on the other hand, do not hold ownership of the value to which they refer, and therefore do not need to be deallocated. In other words, lexical scopes describe memory regions [26, 134], binding a memory location's lifetime to its owning reference, whereas aliases do not participate in the determination of the memory lifecycle. Memory errors are avoided by restricting the operators that can be used on a particular reference, depending on its state [127] and lexical scope. For instance, as y is not an owner in the above example, placing it on the right-hand side of a move assignment should be flagged as an illegal operation. References can be in any of these five states:

Unallocated denotes a reference that is not bound to any memory.

Unique denotes a reference to an unaliased memory location.

Shared denotes a reference to a memory location bound to other references.

Borrowed is similar to shared, but denotes references without ownership.

Moved denotes a reference that has been moved, by move assignment.

All references start in the unallocated state. An allocation is inserted when an unallocated reference is assigned by move (i.e. ←) or copy (i.e. :=). The newly allocated location is immediately placed under the assigned reference's ownership, which goes into the unique state. A reference goes into the shared state if it simultaneously holds ownership and refers to an aliased location. Conversely, a reference goes into the borrowed state if it does not hold ownership of the location to which it refers. A deallocation is inserted when an owning reference goes out of scope. Of course, the type system must guarantee that the reference is unique at this point. Unlike in Rust, where borrowed references are statically typed to reflect the fact that they do not hold ownership, the A-calculus does not define any "pointer type". Hence references may be owning and then borrowed in the course of their lifetime. When an owning reference is assigned by alias, a deallocation must be inserted to free the memory it owns, and the reference goes to the borrowed state. Finally, if ownership is taken from an owning reference, then it goes into the moved state. Table 6.1 summarizes how these transitions can take place. Each cell represents the state into which a reference goes if placed on the left or right side of the operator indicated by the column. The position of the bullet denotes the side of the operator on which the reference should be. For instance, • := denotes the left operand of a mutation assignment.

                  as left operand                   as right operand
                • &-      • :=      • ←       &- •      := •      ← •
    unalloc.    borrowed  unique    unique    ×         ×         ×
    unique      borrowed  unique    unique    shared    unique    moved
    shared      ×         shared    shared    shared    shared    ×
    borrowed    borrowed  borrowed  borrowed  borrowed  borrowed  ×
    moved       borrowed  unique    unique    ×         ×         ×

Table 6.1: Reference state transitions
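Table 6.1 can be read as a pair of partial transition functions. The sketch below is a hypothetical Rust encoding (not part of the thesis' formalization) that makes the table executable: `None` marks the cells containing ×.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum State { Unallocated, Unique, Shared, Borrowed, Moved }

#[derive(Clone, Copy)]
enum Op { Alias, Copy, Move } // &-, :=, <-

// The state a reference enters when it is the LEFT operand.
fn as_left(s: State, op: Op) -> Option<State> {
    use State::*;
    match (s, op) {
        (Unallocated, Op::Alias) => Some(Borrowed),
        (Unallocated, _) => Some(Unique),   // allocation inserted
        (Unique, Op::Alias) => Some(Borrowed),
        (Unique, _) => Some(Unique),
        (Shared, Op::Alias) => None,        // would strand the aliases
        (Shared, _) => Some(Shared),
        (Borrowed, _) => Some(Borrowed),
        (Moved, Op::Alias) => Some(Borrowed),
        (Moved, _) => Some(Unique),         // reallocation inserted
    }
}

// The state a reference enters when it is the RIGHT operand.
fn as_right(s: State, op: Op) -> Option<State> {
    use State::*;
    match (s, op) {
        (Unique, Op::Alias) => Some(Shared),
        (Unique, Op::Copy) => Some(Unique),
        (Unique, Op::Move) => Some(Moved),
        (Shared, Op::Move) => None,         // moving a shared owner
        (Shared, _) => Some(Shared),
        (Borrowed, Op::Move) => None,       // moving without ownership
        (Borrowed, _) => Some(Borrowed),
        _ => None, // unallocated / moved references cannot be read
    }
}

fn main() {
    // Row "unique", right of <-: ownership is taken from the owner.
    assert_eq!(as_right(State::Unique, Op::Move), Some(State::Moved));
    // Row "shared", left of &-: illegal per Table 6.1.
    assert_eq!(as_left(State::Shared, Op::Alias), None);
}
```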

Example 6.2.2: State transitions

     1 let x, y, z in {
     2     x := 10
     3     y <- 20
     4     z &- x
     5     y := 1337
     6     z <- 42
     7     z &- y
     8     y <- x
     9 }

Variables x, y and z start in the unallocated state. x and y become unique at lines 2 and 3, respectively. x becomes shared while z becomes borrowed at line 4. No state transition occurs at lines 5 and 6. y becomes shared while x goes back to unique at line 7. x becomes moved at line 8.

Recall from the operational semantics that literals are evaluated as newly allocated objects. Consequently, evaluating a literal allocates a new memory location that is not owned by anyone, which of course constitutes a memory leak. A simple solution to this problem is to create a "virtual" owner every time a literal value is allocated, so that it can dispose of the memory location it owns at the end of the scope, with the same mechanism as other regular references.

Example 6.2.3: Disposal of literal values

     1 let x, y in {
     2     // <= alloc x
     3     x := 42
     4
     5 } // <= del 42's owner
     6   // <= del x

At line 2, an allocation is inserted for x, after a virtual owner is given ownership of the location holding the value 42. At line 5, both the virtual owner and x are deallocated.

Although borrowed references are not involved in the lifecycle of the memory to which they refer, the type system guarantees that access to this memory is safe for the duration of the borrow. To that end, creating an alias on an unallocated or moved reference is forbidden. Furthermore, the type system prevents aliasing on memory locations whose owner lives in a nested scope, as those could be deallocated before the borrowed reference goes out of scope. These two restrictions are of course necessary (yet not sufficient) to avoid dangling references.

Example 6.2.4: Aliasing to a nested scope

     1 let y in {
     2     let z in {
     3         // <= alloc z
     4         z <- 42
     5         y &- z
     6     } // <= del z
     7 }

An allocation is inserted at line 3, attributing ownership to z. Line 5 is illegal because z has a shorter lifetime than y.

Representing lifetimes graphically, one can see how allowing line 5 could lead to a use-after-free error:

[Figure: timeline of y's and z's lifetimes — y's lifetime extends beyond z's, so once z is deallocated, y is left dangling.]

Ownership transfer is carried out by means of move assignments (i.e. ←). The right operand is relieved of its responsibilities, which are placed onto the left operand. Move operations can take a value out of a nested scope, effectively prolonging its lifetime. For instance, a function can move a return value owned by one of its local variables to avoid an expensive copy. Once ownership has been transferred, the former owner enters the moved state and can no longer appear as any assignment's right operand until it is reassigned. A deallocation is inserted to dispose of the moved memory location when it goes out of scope. An owner cannot abandon its responsibilities to assume ownership of another value. Hence, if one appears as a left operand, a deallocation is inserted before the assignment takes place. This implies that an owner cannot transfer ownership to itself, as its deallocation would take place right before the move operation, thus provoking a memory error.

Example 6.2.5: Ownership transfer

     1 let y in {
     2     let z in {
     3         // <= alloc z
     4         z <- 42
     5         // <= alloc y
     6         y <- z
     7     } // <= del z
     8 } // <= del y

An allocation is inserted at line 3, attributing ownership to z. Ownership is transferred from z to y, which is now responsible for deallocation. z remains moved until line 7, so a deallocation is inserted. y is freed because it bears ownership.

Transferring ownership from a shared owner is forbidden. This restriction preserves the linearity assumption of the move operator, as shared references are, by definition, not unique. Besides, it also prevents dangling aliases. Remember that, in the operational semantics, a move assignment assigns a moved value to its right operand. As a result, even if the new owner lived longer than the alias, any alias would refer to the moved value.

Example 6.2.6: Illegal ownership transfer

     1 let x, y, z in {
     2     // <= alloc x
     3     x <- 1337
     4     y &- x
     5     // <= alloc z
     6     z <- x
     7 } // <= del z

Transferring x's ownership at line 6 would leave y referring to a moved location.

The type system determines whether or not an owner is aliased based on a notion of uniqueness. Uniqueness in the A-calculus is treated as a fractionable property, similar to Boyland's proposal [28]. The intuition is that it is initially obtained in full at allocation, but gets fractioned for each alias. The loss of the uniqueness capability is only temporary, as an owner can later gather all fragments back to rebuild it. Put another way, each aliasing assignment creates a new fragment of uniqueness, while one is sent back to its owner every time an alias goes out of scope or is reassigned. Aliases themselves are allowed to further fraction their fragment to allow additional aliases. However, this does not create a hierarchy, and a fragment is always directly sent back to its owner. This mechanism effectively implements borrowing and reborrowing. As lexical scopes are used to delimit memory regions and automate deallocation, our model does not require an elaborate pointer analysis to determine when fragments should be reclaimed within function boundaries.

Example 6.2.7: Fractionable uniqueness

     1 let x in {
     2     // <= alloc x
     3     x <- 42
     4     let y in {
     5         y &- x
     6         let z in {
     7             z &- y
     8         }
     9     }
    10 } // <= del x

x gets a full uniqueness capability at line 3. y and z fraction x's capability at lines 5 and 7, respectively. z goes out of scope at line 8, so x gets a fragment back, but full uniqueness is not recovered until y goes out of scope at line 9.
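The fragment bookkeeping described above can be sketched with a simple counter. The names below (`Capability`, `borrow_fragment`, `return_fragment`) are hypothetical, not part of the thesis' formalization; the point is only that fragments flow straight back to the owner, with no hierarchy.

```rust
// An owner's uniqueness capability: whole at allocation, fractioned
// once per outstanding alias (including reborrows).
struct Capability { outstanding_fragments: u32 }

impl Capability {
    fn new() -> Self { Capability { outstanding_fragments: 0 } }
    // An aliasing assignment takes a fragment from the owner.
    fn borrow_fragment(&mut self) { self.outstanding_fragments += 1; }
    // A fragment is sent straight back to the owner when the alias
    // goes out of scope or is reassigned.
    fn return_fragment(&mut self) { self.outstanding_fragments -= 1; }
    // Uniqueness holds again once every fragment has been returned.
    fn is_unique(&self) -> bool { self.outstanding_fragments == 0 }
}

fn main() {
    // Mirrors Example 6.2.7.
    let mut x_cap = Capability::new(); // x <- 42: full uniqueness
    x_cap.borrow_fragment();           // y &- x
    x_cap.borrow_fragment();           // z &- y (reborrow)
    x_cap.return_fragment();           // z goes out of scope
    assert!(!x_cap.is_unique());       // y still holds a fragment
    x_cap.return_fragment();           // y goes out of scope
    assert!(x_cap.is_unique());        // full uniqueness recovered
}
```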

For the same reason that transferring ownership from a shared owner is forbidden, reassigning a shared owner to an alias is also prohibited. Doing so would trigger the deallocation of the owner's memory, leaving all aliases on it referring to a freed memory location. Furthermore, reassigning an owner to an alias of itself is also illegal, even if it is unique. The operation would leave the memory location unowned, thereby resulting in a memory leak.

Example 6.2.8: Owner reassignment

     1 let x in {
     2     // <= alloc x
     3     x <- 42
     4     x &- x
     5 } // <= no del x

x gets ownership at line 3. Line 4 is illegal because it leaves x's memory unowned and thereby leaking.

6.2.2 Static Nondeterminism

Remember that all the mechanisms discussed so far are carried out statically (i.e. at compile time). Therefore the compiler must be able to prove whether or not a reference is expected to hold ownership, in order to insert appropriate allocation and deallocation instructions. Errors in this process would have dire consequences on the resulting program, which would risk triggering uninitialized memory errors and/or leaking memory at runtime. This is problematic in the presence of conditional terms, because determining whether a reference gets or loses ownership becomes statically nondeterministic. Imagine for instance a situation in which a reference gets ownership in one branch but is left unassigned in the other. Furthermore, the assignment (or absence thereof) of a reference may also have implications on the state of other references, should the states of the latter change in one branch but not in the other. Indeed, if uniqueness cannot be properly tracked, then one can no longer determine whether move operations are safe.

Example 6.2.9: Statically nondeterministic ownership

     1 let x, y in {
     2     let z in {
     3         if c then {
     4             x <- 42
     5             y &- x
     6         } else {
     7             y <- 42
     8             x &- y
     9         }
    10         z := y // legal
    11     }
    12 } // <= del x, y?

If c holds, then x is owner and y is an alias. If c does not hold, then y is owner and x is an alias. Line 10 is legal either way, but which deallocation needs to be inserted at line 12 is nondeterministic.

The most conservative approach is to reject a program whenever the two branches of a conditional term do not lead to the exact same situation with respect to ownership. While this technique would address the issue rather easily, it would significantly hinder the usability of the language. Another, more relaxed idea is to analyze both branches of a conditional term, in order to determine a conservative scenario that can be satisfied no matter which is taken at runtime. For instance, in the example above, line 10 is legal no matter which branch is executed, because y is assigned in both of them. Unfortunately, while this approach can solve definite initialization [64], it does not address the ownership nondeterminism. Our type system should not only ensure that a reference is initialized before it is used, but it should also determine whether or not a reference has ownership when it is reassigned or goes out of scope. Put differently, we need a definite ownership analysis.

Nonetheless, we are not forced to adopt the most restrictive approach. We define a notion of transient ownership. An owner is transient if its ownership cannot be guaranteed beyond one branch of a conditional term. In other words, it denotes an owner whose ownership is nondeterministic. Unlike unique or shared references, transient owners cannot be aliased. However, they differ from unallocated and moved references in that they are allowed to be copied, i.e. placed on the right side of a mutation assignment. Furthermore, we apply a few rules to deal with conditional terms:

1. If a reference is allocated at the end of one branch but not at the end of the other, it is assumed unallocated after the conditional term.

2. If a reference holds ownership at the end of one branch but not at the end of the other, it is transient and assumed unallocated after the conditional term.

3. If a non-transient owner is aliased at the end of one branch, it is assumed aliased after the conditional term.

4. A reference borrowed from a transient owner cannot survive after the conditional term.

Notice the subtle difference between the first two rules. While both prescribe that a reference assigned in a branch shall be assigned in the other as well, the second adds another requirement when ownership is involved, so that acquiring or losing ownership in one branch does not lead to an inconsistent situation after the conditional term. In Example 6.2.9, this means that both references x and y are considered unallocated at line 9. While this turns line 10 into an illegal statement, which is overly conservative, the nondeterministic issue at line 12 is solved by issuing x and y's deallocations earlier. Note that assigning z is possible if the assignment is pushed to the end of both branches, as y can be copied even if it is transient.
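The first two rules amount to a merge function over the states a reference reaches at the end of each branch. The sketch below is a hypothetical encoding (the names `RefState` and `merge` are not from the thesis), reducing each branch's outcome to whether the reference is allocated and whether it owns.

```rust
#[derive(Clone, Copy)]
struct RefState { allocated: bool, owns: bool }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Merged { Unallocated, Owner, Alias }

// Merge a reference's state at the end of the two branches of a
// conditional term, per rules 1 and 2.
fn merge(t: RefState, e: RefState) -> Merged {
    if !t.allocated || !e.allocated {
        // Rule 1: allocated in only one branch => unallocated after.
        Merged::Unallocated
    } else if t.owns != e.owns {
        // Rule 2: ownership in only one branch => a transient owner,
        // assumed unallocated after the conditional term.
        Merged::Unallocated
    } else if t.owns {
        Merged::Owner
    } else {
        Merged::Alias
    }
}

fn main() {
    let owner = RefState { allocated: true, owns: true };
    let alias = RefState { allocated: true, owns: false };
    // Example 6.2.9: x owns in one branch and aliases in the other,
    // so it is assumed unallocated after the conditional term.
    assert_eq!(merge(owner, alias), Merged::Unallocated);
    assert_eq!(merge(owner, owner), Merged::Owner);
    assert_eq!(merge(alias, alias), Merged::Alias);
}
```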

Example 6.2.10: Aliasing on a transient owner

     1 let x, y in {
     2     if c then {
     3         x <- 42
     4         y &- x
     5         // <= del x
     6     } else {
     7         y <- 42
     8         x &- y
     9         // <= del y
    10     }
    11 }

y's assignment at line 4 is illegal because x is transient. x's assignment at line 8 is illegal because y is transient. No deallocation is inserted after the conditional term, because x and y are assumed unallocated after line 10.

The first rule guarantees that aliases formed in a branch can be assumed initialized after the conditional term. However, it does not entirely remove nondeterminism. If the two branches form aliases on different owners, then one cannot determine uniqueness statically. Fortunately, such situations can be supported conservatively if borrowing references take a fragment of uniqueness from every owner that may have been aliased.

6.2.3 Lifetime Annotations

A common technique to speed up source compilation is to process files (or groups thereof) separately. This allows some parts of a program to be compiled in separate modules, which are then linked together to constitute the complete output. In languages that undergo static compilation, this strategy is key to the support of reusable language libraries, as otherwise all code would have to be recompiled every time a program is built. However, the drawback is that it hinders the potential for inference and type checking, as the compiler no longer "sees" all of the program at once. The solution is to provide interfaces of the actual underlying code in a format that is faster to process, with enough information to make up for the missing material to which the compiler would otherwise have access. For instance, a C header file typically describes function signatures, so that the compiler can assume the existence of an actual function with the same name, domain and codomain. Another use for module interfaces is to link different languages together. It is possible, for instance, to reuse a function written in C from a Swift program using an interface that lets Swift's compiler perform its usual type inference and produce the necessary glue.

In a statically typed language supporting first-class functions, signatures are also necessary to type check function applications. Unlike in a first-order language, inlining is not an option, because the actual function a term represents cannot be determined statically. Therefore a type checker has to rely on the term's type to check whether a call expression is well-typed. This is illustrated in the following Swift excerpt, in which the function that is applied at line 5 cannot be determined statically:

     1 typealias Fn = (Int, Int) -> Int
     2 func either(_ f: Fn, _ g: Fn) -> Fn {
     3     return Bool.random() ? f : g
     4 }
     5 print(either(+, -)(6, 4))

In the A-calculus, modularity is drawn at function boundaries. In other words, functions are regarded as black boxes that can be swapped for one another. As our type system has to keep track of ownership and uniqueness across function applications, it needs some aliasing information to be described in function interfaces. For instance, statically determining whether a term x &- f (y &- x) is legal requires knowing the lifetime of f's return value, so as to ensure that it can outlive that of x. If f returns a value allocated within its own scope, then that value will be freed before x, leaving x dangling. On the other hand, if f returns a reference on its argument, then x is simply reassigned to itself³. Adding detailed lifetime annotations on every function parameter and return value would be cumbersome, hinder reusability and defeat the purpose of modularization. Fortunately, describing bounds on lifetimes is sufficient to prescribe safety and preserve genericity. Furthermore, given some safe defaults and assumptions, most annotations can be inferred automatically.

Lifetime Bounds in the A-Calculus

We draw inspiration from Rust's lifetime annotation system, but bring a few amendments to accommodate the A-calculus' semantics. As mentioned in Section 6.1, Rust does not abstract over the notion of pointers. Instead, pointers are represented as distinct types, denoted by a prepended ampersand. Thus, typing a return value with &T instructs Rust's compiler that the function will necessarily return an alias on one of its arguments, or fields thereof. Conversely, if the return value is not a pointer, then it is known to be returned with ownership. The A-calculus, on the other hand, cannot afford such assumptions a priori, as it cannot distinguish between owners and aliases statically. As a result, we are left with a nondeterministic choice to identify whether arguments and return values are passed with ownership. Furthermore, since functions are seen as black boxes and only described by their signature, one cannot resolve conflicting situations with transient ownership, similar to how conditional terms are handled. We have no choice but to annotate incoming and outgoing references to distinguish between owned and borrowed ones. Unfortunately, this means that some of the A-calculus' versatility has to be sacrificed with respect to parameter and return passing policies.

³ As mentioned earlier, the statement is still illegal if x is an owner.

We introduce a reference qualifier @own to annotate references that should be passed with ownership. Those can only be passed by move or copy (i.e. ← or :=). The dual qualifier @brw annotates references that should be passed borrowed. In the absence of any annotation, @own is assumed by default.

Example 6.2.11: Function without lifetime bounds

     1 fun(
     2     x: @brw,
     3     p: @brw)
     4 -> @own
     5 {
     6     if p(x &- x) then {
     7         return := x
     8     } else {
     9         return &- x
    10     }
    11 }

Each parameter is marked @brw, and is therefore expected to be provided borrowed. The codomain is marked @own, so return values are assumed to be passed with ownership. Line 6 is legal because it temporarily reborrows x. Line 7 is legal because it returns a copy of x. Line 9 is illegal because it attempts to return an alias on x.

Choosing the pass-by-alias policy as the default strategy on parameters would be an equally valid pick, roughly in line with the passing semantics of reference-based languages (discarding primitive types). However, it would be a poor choice for return values. On codomain annotations, @brw must be parameterized to describe lifetime bounds. This parameterization enables the inference of the return value's lifetime; it also serves to track uniqueness fragments beyond function boundaries. Choosing pass-with-ownership as the default strategy preserves symmetry between parameter and codomain annotations, and does not require any parameterization by default.

Unlike Rust, our type system does not rely on generic placeholders to describe lifetimes. Instead, we directly refer to the names of the function's parameters. While Rust allows for more expressiveness, in particular when dealing with composite types, this approach is sufficient in the context of the A-calculus. For instance, given a function with two parameters x and y, @brw(x, y) reads as "an alias that lives at least as long as the shortest of x and y's arguments". Note that we cannot determine statically whether or not uniqueness is still fragmented after the function call. This problem is similar to the so-called static nondeterminism observed in conditional terms, and relates to the inability to know whether a function actually returns one of the borrowed references it receives as input. Fortunately, because functions in the A-calculus do not have any closure, all possible situations can be inferred from the function's signature alone. Hence, if a borrowed reference returned by a function is reborrowed, taking a fragment of uniqueness from all arguments that may have been returned conservatively covers all aliasing scenarios.

Example 6.2.12: Lifetime bounds in the A-calculus

     1 fun(
     2     x: @brw,
     3     y: @brw,
     4     p: @brw)
     5 -> @brw(x, y)
     6 {
     7     if p(x &- x) then {
     8         return := x
     9     } else {
    10         return &- y
    11     }
    12 }

Parameters' annotations do not require parameterization. Return values' lifetimes are bound to x and y's lifetimes. Line 8 is illegal because it attempts to return ownership. Line 10 is legal because it returns an argument with the proper lifetime.

6.3 Type Checking

We now formalize the type system presented in the previous section. Type checking is performed in two steps. The first focuses on so-called flow-insensitive typing information, which does not depend on the program's evaluation context at runtime. The second step relates to so-called flow-sensitive typing information, which does depend on the runtime evaluation context. In other words, flow-insensitive properties are purely lexical, while flow-sensitive properties must be inferred from the program's semantics.

Consider for instance a reference declared to hold numbers, that has not been initialized yet. Its flow-insensitive type defines the operations supported by numbers (e.g. +, −, ×, ...), while its flow-sensitive type indicates that it cannot be used for read accesses. Assuming the language is not dynamically typed, binding the reference to a value will obviously not change the fact that it is supposed to hold numbers. Hence, its flow-insensitive type will not change during the course of the program's execution. On the other hand, its flow-sensitive type will be updated to reflect that it now supports read accesses.

As we assume all allocation and deallocation primitives to be inserted implicitly, following the strategy discussed in the previous section, they do not need to be considered in the type system's formalization. Instead, all rules are designed precisely so that the implicit insertion cannot cause memory issues.
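The split between the two kinds of typing information can be sketched as a pair of fields on each reference's typing, only one of which is ever updated. The names below (`RefTyping`, `Access`) are hypothetical illustrations, not the thesis' formal notation.

```rust
// The flow-insensitive part: fixed by the declaration.
#[derive(Debug)]
#[allow(dead_code)]
enum SemanticType { Number, Function }

// The flow-sensitive part: updated as the program is checked.
#[derive(Debug, PartialEq)]
enum Access { NoRead, ReadWrite }

#[allow(dead_code)]
struct RefTyping { semantic: SemanticType, access: Access }

impl RefTyping {
    fn declared(semantic: SemanticType) -> Self {
        // Before initialization, no read access is supported.
        RefTyping { semantic, access: Access::NoRead }
    }
    fn initialized(mut self) -> Self {
        // Binding a value only updates the flow-sensitive part;
        // the semantic type never changes.
        self.access = Access::ReadWrite;
        self
    }
}

fn main() {
    let r = RefTyping::declared(SemanticType::Number);
    assert_eq!(r.access, Access::NoRead);
    let r = r.initialized();
    assert_eq!(r.access, Access::ReadWrite);
}
```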

6.3.1 Flow-Insensitive Types

Flow-insensitive types describe two kinds of information. One relates to the semantic nature of the expression (e.g. a natural number), and is therefore called the semantic type of an expression. The other relates to its scope, and is therefore called the scope type of an expression. Scope types are flow-insensitive because they are delimited lexically. Note that typing the A-calculus has important consequences on its operational semantics. First, it means that a variable can no longer be assigned values of different types. Consider for instance the following term:

let x in x ← 2 ; x ← λy . y

Though the term is well-formed in the untyped A-calculus, it is ill-typed in the typed A-calculus because it violates x's semantic type. While this restriction could be lifted by the use of type unions, as available in TypeScript [19] for instance, taking this path would add considerable complexity to our type system. Secondly, and more consequentially, typing functions requires recursive types. It follows that the A-calculus cannot be simply typed (within the meaning of the simply typed λ-calculus). Consider for instance the following term:

let f in f ← (λx, f . ret ← f (x ← x, f &- f )) ; f (x ← 2, f &- f )

The semantic type of f recursively refers to itself, thus resembling a type of the form τ × (τ × (τ × · · · → τ) → τ) → τ. Dropping the support for recursion would significantly reduce the expressiveness of the language, to the point that it could no longer be considered Turing complete.
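The same phenomenon appears in Rust: a function that receives (a handle to) itself must go through a named type whose definition refers to itself, playing the role of μα.τ. The sketch below (hypothetical names) mirrors the self-passing pattern of the term above.

```rust
// A recursive type: Rec's definition refers to Rec, through a
// function-pointer indirection.
struct Rec(fn(&Rec, i32) -> i32);

// Counts down to zero by passing itself along, like the A-calculus
// term f(x <- 2, f &- f).
fn step(f: &Rec, x: i32) -> i32 {
    if x == 0 { 0 } else { (f.0)(f, x - 1) }
}

fn main() {
    let f = Rec(step);
    assert_eq!((f.0)(&f, 2), 0);
}
```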

Semantic Types

Formalizing type correctness and/or inference with respect to semantic types is beyond the scope of this thesis. For a language as simple as the A-calculus, semantic typing fits the standard Hindley-Milner system [101]. Nonetheless, annotated function signatures are required to check lifetime bounds at function boundaries. As discussed in the previous section, function signatures are annotated with reference qualifiers to specify bounds on parameters and return values. We start with a definition of such qualifiers.

Definition 6.3.1: Reference qualifiers

The set of reference qualifiers is given by QR = {brw, own}, where brw denotes a borrowed reference and own denotes an owned reference.

Definition 6.3.2: Semantic types

Let A denote the set of atomic literals, 𝒳 denote the set of identifiers, QR denote the set of reference qualifiers, and X ⊆ 𝒳 be a subset of identifiers. Let V denote a set of type variables. The set of semantic types T is defined as the minimal set such that:

    A ∈ T                                          (atomic values' type)
    ⟨dom : X → TQ, codom : TQ, bds : P(X)⟩ ∈ T     (a function type)
    α ∈ V  ⟹  α ∈ T                                (a type variable)
    τ ∈ T  ⟹  μα.τ ∈ T                             (a recursive type)

where TQ = ⟨qual : QR, type : T⟩ denotes qualified semantic types.

Notation: Recursive types The notation µα.τ describes a recursive type τ, in which occurrences of α denote µα.τ recursively.

Example: Consider the following type:

    τ = μα.f
    f.dom   = ⟨x ↦ ⟨qual ↦ brw, type ↦ A⟩, f ↦ ⟨qual ↦ brw, type ↦ α⟩⟩
    f.codom = ⟨qual ↦ own, type ↦ A⟩
    f.bds   = ∅

τ describes a recursive function type that accepts a borrowed reference to an atomic value as its x parameter, and a borrowed reference to a function with the same signature as its f parameter.

Notice that we represent function domains in the form of records that map parameter names to their expected type. Consequently, parameter names are inherently part of the function signature. Although this is a departure from most strong type systems, where parameters are usually positional, it is in line with the way function application is defined in the A-calculus, where parameter names are used as assignments' left operands to specify the passing policy. Another advantage is that lifetime bounds are more easily expressed, as they can unambiguously refer to parameter names.

Scope Types

We now focus on the second constituent of flow-insensitive types. We mentioned in Section 4.3.1 that variable declarations and functions delineate lexical scopes. In turn, those delimit identifiers' lifetimes and are used to derive static garbage collection. Therefore, the notion of lifetime should be formalized. We associate each scope with a rank i that decreases with its nesting depth: the more nested a scope is, the lower its rank. Intuitively, a scope rank i denotes a lifetime that outlives any lifetime associated with a lower rank i − n. By abuse of terminology, we sometimes use the terms "scope" and "scope rank" interchangeably.

Example 6.3.1: Scope rank

The picture below depicts a term with four lexical scopes, and shows their respective ranks.

let x in let y in let z in (x ← 1 ; y &- x ; z B y)

(scope ranks, from outermost to innermost: 4, 3, 2, 1)

The whole expression is enclosed in a scope with rank 4. The first declaration delimits a scope with rank 3. The second declaration delimits a scope with rank 2. The third declaration delimits a scope with rank 1.

Notation: Scope belonging

We say that a term t ∈ AC lives in a scope ranked i, written t @ i, if the value to which it evaluates is bound by i, that is, if t is defined in the scope ranked i.

Recall that, in order to avoid dangling references, the type system does not allow a variable to be bound by alias to a memory location whose owner lives in a nested scope. This can be enforced by guaranteeing that an aliasing assignment's left operand lives in a scope with a rank lower than that of its right operand. More formally, the type system has to prove that t &- u ⟹ t @ i ∧ u @ j ∧ i < j. Verifying this property for simple identifiers is trivial, as it only depends on the scope in which the identifiers live. However, the information contained in type signatures is necessary to check function boundaries. Codomain annotations form a constraint system that type checking must solve at each call site.

6.3.2 Scope Checking

We describe scope checking with inference rules. Typing judgements are of the form Σ, i ⊢ t @ j, where Σ is a scoping context, i the rank of the scope in which t is evaluated, and j the rank of the scope in which t lives.

Definition 6.3.3: Scoping context

Let X be the set of reference identifiers. A scoping context Σ : X ⇀ N ∪ {−∞, +∞} is a table that maps identifiers to scope ranks. Furthermore, we define a total order relation < on N ∪ {−∞, +∞} such that:

−∞ < +∞    ∀n ∈ N, n < +∞    ∀n ∈ N, −∞ < n

Unlike the operational semantics, our type system does not deal with name shadowing. Instead, we assume that all identifiers are declared only once in a given lexical scope. This does not restrict the set of programs that can be type checked, as any shadowing declaration can be removed by α-conversion. For instance, the term let x in let x in x ← 2 is α-equivalent (i.e. equivalent up to α-conversion) to let x in let x′ in x′ ← 2. There is no base case that fixes i in the typing judgement Σ, i ⊢ t @ j. Put differently, there is no rule that computes a concrete value for i. This is unnecessary, because we are only interested in the relations between scope ranks, which are sufficient to determine whether an expression lives shorter or longer than another.

Within Function Boundaries

Inferring variables' scopes consists in consulting the scoping context. As in the operational semantics, we do not consider programs with undefined identifiers; hence S-Var does not apply if x ∉ dom(Σ). The scoping context Σ is updated by variable declarations. Those map the declared identifier to a scope with an inferior rank, in which their body is then type checked. Although a variable declaration delimits a scope, the term constituting its body does not necessarily live in that scope, because it may refer to an expression (e.g. a variable) defined in another one. Consider for example the term let x in let y in x, and assume y's declaration delimits a scope with rank 3. The declaration's body consists of the identifier x, which is declared in an enclosing scope and is therefore necessarily associated with a higher rank.

Variables

S-Var
  x ∈ dom(Σ)
  ──────────────────
  Σ, i ⊢ x @ Σ.x

S-Let
  Σ[x ↦ i − 1], i − 1 ⊢ t @ j
  ────────────────────────────
  Σ, i ⊢ let x in t @ j

Example 6.3.2: Variable declarations

Consider the following term:

let x in let y in x

Let Σ be the empty scoping context; scope checking is described by the following derivation:

x ∈ dom({x ↦ i − 1, y ↦ i − 2})
─────────────────────────────────────────────
{x ↦ i − 1, y ↦ i − 2}, i − 2 ⊢ x @ i − 1
─────────────────────────────────────────────
{x ↦ i − 1}, i − 1 ⊢ let y in x @ i − 1
─────────────────────────────────────────────
∅, i ⊢ let x in let y in x @ i − 1

As assignments return the memory location of their left operand, they obviously live in the same scope as it. Aliasing assignments additionally require that the left operand live shorter than the right operand. This is, in fact, the most important constraint of the scope checking judgement, as it contributes to the mechanism that guarantees the absence of dangling references by preventing borrowed references on shorter-living owners.

Assignments

S-Mut
  Σ, i ⊢ x @ j    Σ, i ⊢ u @ j′
  ──────────────────────────────
  Σ, i ⊢ x B u @ j

S-Move
  Σ, i ⊢ x @ j    Σ, i ⊢ u @ j′
  ──────────────────────────────
  Σ, i ⊢ x ← u @ j

S-Alias
  Σ, i ⊢ x @ j    Σ, i ⊢ u @ j′    j < j′
  ─────────────────────────────────────────
  Σ, i ⊢ x &- u @ j

Example 6.3.3: Scope checking aliasing assignments

Consider the following term:

let x in let y in y &- x

The aliasing assignment is well-typed because x lives in a scope with a higher rank than y, as demonstrated by the following excerpt of derivation (sub-derivations for x @ i − 1 and y @ i − 2 are elided):

x @ i − 1    y @ i − 2    i − 2 < i − 1
─────────────────────────────────────────────────
{x ↦ i − 1, y ↦ i − 2}, i − 2 ⊢ y &- x @ i − 2
─────────────────────────────────────────────────
{x ↦ i − 1}, i − 1 ⊢ let y in y &- x @ i − 2
─────────────────────────────────────────────────
∅, i ⊢ let x in let y in y &- x @ i − 2

The scope checking succeeds because the side condition i − 2 < i − 1 holds for any i.

Atomic literals are always assumed to live in a nested scope, regardless of the scope in which they are evaluated, because virtual owners are deallocated before any actual owning reference.

Atomic values

S-Atom
  a ∈ A
  ─────────────────
  Σ, i ⊢ a @ −∞

Example 6.3.4: Aliasing an atomic literal

Consider the following term:

let x in x &- 2

The aliasing assignment is ill-typed because x lives in a scope with a rank superior to that in which the literal 2 lives. The type error is demonstrated by the following derivation:

x ∈ dom({x ↦ i − 1})    2 ∈ A
x @ i − 1    2 @ −∞    i − 1 < −∞
──────────────────────────────────────
{x ↦ i − 1}, i − 1 ⊢ x &- 2 @ i − 1
──────────────────────────────────────
∅, i ⊢ let x in x &- 2 @ i − 1

The scope checking fails because there is no i such that i − 1 < −∞.
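The rules introduced so far (S-Var, S-Let, S-Alias and S-Atom) can be read as a recursive procedure over terms. The following Python sketch is a hypothetical illustration, not the thesis's implementation: the tuple encoding of terms and the use of `-math.inf` for the rank −∞ are assumptions made for the example.

```python
import math

def scope_of(sigma, i, term):
    """Return the rank j such that Σ, i ⊢ term @ j, or raise TypeError."""
    kind = term[0]
    if kind == "atom":                     # S-Atom: literals live at rank −∞
        return -math.inf
    if kind == "var":                      # S-Var: consult the scoping context
        name = term[1]
        if name not in sigma:
            raise TypeError(f"undefined identifier {name}")
        return sigma[name]
    if kind == "let":                      # S-Let: body checked one rank lower
        _, name, body = term
        return scope_of(dict(sigma, **{name: i - 1}), i - 1, body)
    if kind == "alias":                    # S-Alias: left must live shorter
        _, lhs, rhs = term
        j, j2 = scope_of(sigma, i, lhs), scope_of(sigma, i, rhs)
        if not j < j2:
            raise TypeError("left operand does not live shorter than right")
        return j
    raise ValueError(f"unknown term kind {kind}")
```

With this encoding, let x in let y in y &- x checks successfully, whereas let x in x &- 2 is rejected because i − 1 < −∞ is unsatisfiable.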

Conditional terms live in the shortest-living of their two branches; sequences, on the other hand, live in the same scope as the last executed statement.

Conditional terms

S-Cond
  Σ, i ⊢ t1 @ j1    Σ, i ⊢ t2 @ j2    Σ, i ⊢ t3 @ j3
  ────────────────────────────────────────────────────
  Σ, i ⊢ if t1 then t2 else t3 @ min(j2, j3)

Sequences

S-Seq
  Σ, i ⊢ t1 @ j1    Σ, i ⊢ t2 @ j2
  ──────────────────────────────────
  Σ, i ⊢ t1 ; t2 @ j2

At Function Boundaries

Recall that modularity is drawn at function boundaries. Therefore, type checking has to extract all lifetime information from a function's signature to determine the correctness of the assignments performed within the function's body. Borrowed parameters necessarily live in the call site, and can therefore be assumed to have a longer lifetime than any local variable or parameter passed with ownership.

Example 6.3.5: Lifetime at function boundaries

Consider the following function:

A-calculus

fun (a: @brw, b: @brw, c: @own) -> @brw(a,b) {
  let d in {
    // ...
  }
}

The first two parameters a and b are borrowed and can therefore be expected to live longer than the function call, whereas the parameter c is passed with ownership and therefore lives in the same scope as the function's body. Since both parameters a and b appear in the codomain's bounds, the return identifier < can be assumed to live at least as long as the shortest-living of the two. Finally, the local variable d necessarily lives shorter than any of the parameters.

[Figure: relative lifetimes of the call, the parameters a, b and c, the local variable d, and the return identifier <]

For function declarations, the goal is to extract a constraint system from the type signature, so as to create a scoping context in which the function's body can be type checked. We define a procedure funscope(τ) to compute such a context. Given a function signature τ, let P_τ^brw be the set of names of the parameters passed by alias (i.e. those annotated with @brw), and P_τ^own be the set of names of the parameters passed by ownership (i.e. those annotated with @own). For instance, let f be a function signature whose domain f.dom is defined as follows:

f.dom = ⟨x, y ↦ ⟨qual ↦ brw, type ↦ A⟩, z ↦ ⟨qual ↦ own, type ↦ A⟩⟩

Then P_f^brw = {x, y} and P_f^own = {z}. Since assigning the return identifier < within the function's body is nonsensical, one could take the same approach as the one used to infer atomic literals' scopes, and assume < @ −∞ in all function bodies. Conversely, the lifetime of arguments passed by alias can be assumed greater than that of any local variable or argument passed with ownership. As a result, one could simply consider all arguments passed by reference to live in a scope with rank +∞.

Function's scoping context (attempt)

funscope(τ) = Σ ⟹ ∀p ∈ P_τ^own, −∞ < Σ.p < +∞
              ∧ ∀p ∈ P_τ^brw, Σ.p = +∞
              ∧ Σ.< = −∞

The procedure computes a scoping context that maps each borrowed argument to +∞, and each owned argument to some rank in N. Note that the value onto which owned parameters are mapped does not need to satisfy any relation with the rank at which the function declaration is evaluated. As functions are not allowed to capture closures, they can in fact be type checked in a completely unrelated scoping context. Unfortunately, this approach fails to type check return values, because it does not take lifetime bounds into account. Since all borrowed parameters are assumed to live in a scope with rank +∞, type checking return statements will always succeed, whereas returning an alias to a parameter whose name does not appear in the lifetime bounds annotation should result in a type error. A better solution is to choose a rank for < such that all borrowed parameters that appear in the lifetime bound are mapped to a greater value, and all others are mapped to a smaller value, including owned parameters. The intuition is that only the borrowed parameters specified in the bounds annotation should be assumed to live longer than the function call. Although this no longer prevents the return identifier from appearing as a right operand, writing such an assignment is in fact syntactically ill-formed (i.e. x &- < ∉ AC). Additional constraints must also be set to guarantee that all owned parameters are assumed to live shorter than borrowed ones.

Function's scoping context

funscope(τ) = Σ ⟹ ∀p ∈ P_τ^own, ∀q ∈ P_τ^brw, Σ.p < Σ.q
              ∧ ∀p ∈ τ.bds, Σ.< < Σ.p
              ∧ ∀p ∈ dom(τ.dom), p ∉ τ.bds ⟹ Σ.p < Σ.<
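Because only the ordering of ranks matters, a concrete context satisfying these constraints can be built directly, without a general constraint solver. The sketch below is a hypothetical illustration under the conventions of this chapter; the particular rank values (1, 0, −1, −2) are arbitrary, only their relative order matters.

```python
def funscope(dom, bds):
    """Build a scoping context satisfying the funscope(τ) constraints.

    dom maps parameter names to their qualifier ('brw' or 'own'); bds is
    the set of borrowed parameters in the codomain's lifetime bound.
    """
    sigma = {"<": 0}                   # the return identifier
    for name, qual in dom.items():
        if name in bds:
            sigma[name] = 1            # bound borrows outlive the call
        elif qual == "brw":
            sigma[name] = -1           # unbound borrows may not be returned
        else:
            sigma[name] = -2           # owned parameters live shortest
    return sigma
```

For the signature of Example 6.3.6 below, this yields a context in which a and b outlive <, while c, x and y live shorter than <.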

Then, we can define scope checking for function declarations. Notice that the function's body is type checked in a scope with a shorter lifetime than any parameter, so as to guarantee that local variables get a shorter lifetime.

Function declarations

S-Fun
  (λx̄ . t) : τ    Σ′ = funscope(τ)    Σ′, min({Σ′.x | x ∈ x̄}) − 1 ⊢ t @ j
  ─────────────────────────────────────────────────────────────────────────
  Σ, i ⊢ λx̄ . t @ −∞

Example 6.3.6: Parameter scope checking

Consider the function below. The first two parameters a and b appear in the lifetime bound of the function's codomain. The third parameter c does not. The last two parameters x and y are expected to receive ownership.

A-calculus

fun (a: @brw, b: @brw, c: @brw, x: @own, y: @own) -> @brw(a,b) {
  // ...
}

The application of S-Fun requires to find a scoping context Σ such that:

• x and y are mapped to a smaller value than that of a, b and c,

• x and y are mapped to a smaller value than Σ.<, and

• a and b are mapped to a greater value than Σ.<.

The constraint system is satisfiable with, for instance, Σ = {a 7→ i + 1, b 7→ i + 1, < 7→ i, x 7→ i − 1, y 7→ i − 1}. Therefore the function declaration is well-typed.

As S-Fun sets a virtual scope in which the return identifier is assumed to live, return statements can be scope checked like regular assignments (⊙ stands for any assignment operator).

Function returns

S-Ret
  Σ, i ⊢ < ⊙ t @ j
  ──────────────────
  Σ, i ⊢ ret ⊙ t @ j

Scope checking parameter assignments is unnecessary: since all parameters are assumed to live in a nested scope, those checks always succeed. However, the scope in which arguments live is required to determine that of the call when the callee's codomain is annotated with the brw qualifier. Otherwise, the call is assumed to live in a nested scope.

Function calls

S-Call
  f : τ    ∀(x ⊙ t) ∈ u, Σ, i ⊢ t @ j_x
  ──────────────────────────────────────────────────────────
  Σ, i ⊢ f(u) @ max(min({j_x | x ∈ τ.bds} ∪ {+∞}), i − 1)

Example 6.3.7: Scope checking function calls

Consider the following term:

f (x &- a, y &- b, z ← c)

Assume f to be an identifier typed with a function signature τ defined as follows:

τ.dom.x = hqual 7→ brw, type 7→ Ai τ.dom.y = hqual 7→ brw, type 7→ Ai τ.dom.z = hqual 7→ own, type 7→ Ai τ.codom = hqual 7→ brw, type 7→ Ai τ.bds = {x, y}

In other words, f is a function that accepts three parameters x, y and z, such that x and y are borrowed while z is owned, and returns either x or y by alias. Assume the call to f is type checked in a scoping context Σ, defined as follows:

Σ = {a ↦ i, b ↦ i − 1, c ↦ i − 2}

Then the scope checking of the function call is described by the following derivation:

a ∈ dom(Σ)    b ∈ dom(Σ)    c ∈ dom(Σ)    f : τ
Σ, i − 2 ⊢ a @ i    Σ, i − 2 ⊢ b @ i − 1    Σ, i − 2 ⊢ c @ i − 2
───────────────────────────────────────────────────────────────────────
Σ, i − 2 ⊢ f(x &- a, y &- b, z ← c) @ max(min({i, i − 1, +∞}), i − 3)

The scope checking concludes that the call lives in the scope max(min({i, i − 1, +∞}), i − 3) = i − 1, which corresponds to the scope associated with the argument b, passed as the second parameter y.
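The conclusion of S-Call is a straightforward computation over the argument ranks. A minimal sketch, assuming ranks are plain numbers and +∞ is modeled by `math.inf`; the function name is hypothetical.

```python
import math

def call_rank(arg_ranks, bds, i):
    """Rank of a call per S-Call: max(min({j_x | x ∈ τ.bds} ∪ {+∞}), i − 1).

    arg_ranks maps parameter names to the rank of the corresponding
    argument; bds is the set of parameters in the codomain's bounds.
    """
    bounded = [arg_ranks[x] for x in bds] + [math.inf]
    return max(min(bounded), i - 1)
```

On Example 6.3.7 (with i, i − 1, i − 2 instantiated as 0, −1, −2), the call gets rank −1, the rank of the argument b.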

6.3.3 Flow-Sensitive Types

We now move on to the flow-sensitive part of the type checking. Unlike its flow-insensitive counterpart, the flow-sensitive type of a reference evolves along with the program. In particular, it indicates whether a reference can be safely dereferenced, aliased or moved at some given point of the program. We use type capabilities to describe this information.

Type Capabilities

Each reference is associated with a type capability. A type capability describes whether a particular reference is assigned to a location, and whether it is that location's owner or an alias. A non-owning reference's capability further indicates the owners from which the reference is borrowed. We keep track of all type capabilities in a typing environment.

Definition 6.3.4: Type capabilities

Let X be the set of reference identifiers. The set of type capabilities C is defined as follows:

o ∈ C (denotes ownership)
X ⊆ X ⟹ (b, X) ∈ C (denotes borrowing)

A borrowed capability (b, X) is associated with a set X that designates the owners from which the reference is assumed borrowed. While at runtime a reference can never borrow ownership from more than one owner, recall that static nondeterminism forces the type system to consider situations in which ownership might have been borrowed from different owning references.

Example 6.3.8: Type capabilities

Consider the term below. Next to each line are inscribed the capabilities associated with each reference, using the notation r ↦ c to denote that a reference r holds capability c, where ⊥ denotes the absence thereof.

1  let x in      x ↦ ⊥
2  let y in      x ↦ ⊥, y ↦ ⊥
3  let z in      x ↦ ⊥, y ↦ ⊥, z ↦ ⊥
4  x ← 42 ;      x ↦ o, y ↦ ⊥, z ↦ ⊥
5  y &- x ;      x ↦ o, y ↦ (b, {x}), z ↦ ⊥
6  z &- y        x ↦ o, y ↦ (b, {x}), z ↦ (b, {x})

After their declaration, variables are left with no capabilities, since they

have yet to be assigned. At line 4, x obtains ownership as it is assigned the value 42. y obtains a borrowed reference on x at line 5, which is reflected by the capability (b, {x}). z reborrows y's reference at line 6, and therefore also obtains a (b, {x}) capability.

6.3.4 Type Capabilities Checking

The inscriptions next to each line in Example 6.3.8 correspond in fact to the typing environment we mentioned before. We call this environment a type capability context (or just capability context for short), and define it as follows:

Definition 6.3.5: Type capability context

Let X be the set of reference identifiers. A type capability context is a table K : X ⇀ C ∪ {⊥} that maps identifiers to capabilities, or the lack thereof (denoted ⊥).

Notation: Reference uniqueness

Given an identifier x and a type capability context K, we write x!K to denote that x is unique in the context K. The property holds if x is an owner and there is no borrowed capability on x in K. More formally:

x!K ⇔ K.x = o ∧ ∄y ∈ dom(K), K.y = (b, X) ∧ x ∈ X

Note that determining uniqueness this way is only possible for identifiers. For all intents and purposes, this is sufficient, because references cannot borrow from other constructions (e.g. borrowing from a literal value). Moreover, since those cases are already rejected by the flow-insensitive type checking step, they can be safely ignored here.
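Under a hypothetical encoding where a capability is the string "o" (ownership), a pair ("b", owners) (borrowing), or None (the absence ⊥), the uniqueness test x!K reads as follows; this is an illustration, not the thesis's implementation.

```python
def is_unique(x, K):
    """x!K: x owns its location and no borrowed capability in K mentions x."""
    if K.get(x) != "o":
        return False                     # x is not an owner
    return not any(isinstance(c, tuple) and c[0] == "b" and x in c[1]
                   for c in K.values())  # no borrow on x survives in K
```

For instance, in the context of Example 6.3.8 after line 5, x owns its location but is borrowed by y, so x is not unique.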

We describe type capabilities checking with inference rules. Typing judgements are of the form K ⊢ t ⇓ c, K′, where K and K′ are the capability contexts before and after evaluating the term t, respectively, and c ∈ C ∪ {⊥} is the capability held by the reference to which t evaluates.

Variables and Literals

Variable declarations must insert the declared name into the capability context, indicating that it does not have any capability yet. Remark that we do not need to restore the previous mapping for x in K′, as we make the assumption that the programs we type check are free from name shadowing. It is therefore safe to assume that x ∉ dom(K).

Variable declarations

K-Let
  K[x ↦ ⊥] ⊢ t ⇓ c, K′
  ─────────────────────────
  K ⊢ let x in t ⇓ c, K′|x

Dereferencing an identifier only requires that it holds some capability. If it does not, then the reference is either unallocated or moved.

Variable dereferencing

K-Var
  x ∈ dom(K)
  ────────────────
  K ⊢ x ⇓ K.x, K

Recall that the ownership of a memory location allocated by a literal's evaluation is given to a new virtual owner, assumed to live in some nested scope. It is therefore safe to assume that atomic literals always represent unique references.

Literal values

K-Atom
  a ∈ A
  ──────────────
  K ⊢ a ⇓ o, K

K-Fun
  ────────────────────
  K ⊢ λx̄ . t ⇓ o, K

Assignments

All assignments prescribe that the right operand be initialized; in other words, it must hold some type capability. This is checked by first type checking the right operand, and then verifying that the resulting capability is not ⊥. Mutation assignments are the simplest to type check, because they do not impose any additional constraint on the right operand. However, two situations must be considered on the other side. If the left operand already has a capability, nothing changes in the capability context. If the left operand has no capability, then an allocation has to be inserted and the reference gets ownership and uniqueness.

Mutation assignments

K-Mut-A
  K ⊢ t ⇓ c, K′    c ≠ ⊥    K′.x = ⊥
  ────────────────────────────────────
  K ⊢ x B t ⇓ o, K′[x ↦ o]

K-Mut-B
  K ⊢ t ⇓ c, K′    c ≠ ⊥    K′.x ≠ ⊥
  ────────────────────────────────────
  K ⊢ x B t ⇓ K′.x, K′

Example 6.3.9: Mutation assignments

Consider a simple mutation assignment x B 1. In a context K_A = {x ↦ ⊥}, the assignment is typed by the rule K-Mut-A, because x has no capability, indicating that it has yet to be initialized.

K_A ⊢ 1 ⇓ o, K_A    o ≠ ⊥    K_A.x = ⊥
─────────────────────────────────────── K-Mut-A
K_A ⊢ x B 1 ⇓ o, K_A[x ↦ o]

In a context K_B = {x ↦ (b, X)}, the assignment is typed by the rule K-Mut-B, because x has a borrowing capability, indicating that it represents a value that must be mutated.

K_B ⊢ 1 ⇓ o, K_B    o ≠ ⊥    K_B.x = (b, X)
──────────────────────────────────────────── K-Mut-B
K_B ⊢ x B 1 ⇓ (b, X), K_B

Move and aliasing assignments are slightly more complex, because they must take care of the bookkeeping of right operands' capabilities. If the right operand is borrowed, then the capability context's update relates to the borrowed owners, which can be retrieved from the capability itself. On the other hand, if the right operand is unique, then two situations must be considered:

1. The right operand is an identifier, in which case the capability context can be updated directly.

2. The right operand is another term, in which case we do not know which owner's capability must be updated.

Fortunately, cases where the right operand is a literal or a call to a function returning with ownership can be ignored, because they correspond either to a virtual owner or to an owner from a function scope. In both cases, such owners cannot be referenced again, which is why they do not even appear in the type capability context in the first place. Move assignments can therefore safely transfer ownership to their left operand, and aliasing assignments on such references are already rejected during the flow-insensitive typing step. Remember that an owner cannot be moved to itself. Since a move can only happen if the right operand has uniqueness, this situation is only possible if the same identifier appears on both sides of a move assignment, which is trivially detected. An owner cannot alias itself either. If the right operand is an owner with uniqueness, guaranteeing this constraint consists in making sure both operands are not the same identifier. If the right operand is a borrowed reference, then the identifier on the left should not appear in the set of owners defined by the borrowed capability.

Move assignments (from identifiers)

K-Move-A
  K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x = ⊥
  ─────────────────────────────────────────────────────
  K ⊢ x ← y ⇓ o, K′[x ↦ o, y ↦ ⊥]

K-Move-B
  K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x ≠ ⊥
  ─────────────────────────────────────────────────────
  K ⊢ x ← y ⇓ K′.x, K′[y ↦ ⊥]

Move assignments (from calls and literals)

K-Move-C
  K ⊢ t ⇓ c, K′    t ∉ X    c = o    K′.x = ⊥
  ─────────────────────────────────────────────
  K ⊢ x ← t ⇓ o, K′[x ↦ o]

K-Move-D
  K ⊢ t ⇓ c, K′    t ∉ X    c = o    K′.x ≠ ⊥
  ─────────────────────────────────────────────
  K ⊢ x ← t ⇓ K′.x, K′

Aliasing assignments

K-Alias-A
  K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x ≠ o ∨ x!K′
  ─────────────────────────────────────────────────────────────
  K ⊢ x &- y ⇓ o, K′[x ↦ (b, {y})]

K-Alias-B
  K ⊢ t ⇓ c, K′    c = (b, O)    x ∉ O    K′.x ≠ o ∨ x!K′
  ──────────────────────────────────────────────────────────
  K ⊢ x &- t ⇓ o, K′[x ↦ (b, O)]
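The mutation and move rules above can be sketched as transformers over a capability context. The encoding (the string "o" for ownership, a pair ("b", owners) for borrowing, None for the absence of capability ⊥) and the function names are assumptions for illustration only; the real rules also thread the judgement through the right operand.

```python
def check_mutation(x, rhs_cap, K):
    """K-Mut-A/B: rhs must hold a capability; allocate ownership if x has none."""
    if rhs_cap is None:
        raise TypeError("right operand is uninitialized or moved")
    if K[x] is None:                         # K-Mut-A: implicit allocation
        K2 = dict(K)
        K2[x] = "o"
        return "o", K2
    return K[x], K                           # K-Mut-B: context unchanged

def check_move(x, y, K):
    """K-Move-A/B: y must be a unique owner distinct from x; y loses its capability."""
    if x == y or K.get(y) != "o":
        raise TypeError("right operand is not a distinct owner")
    if any(isinstance(c, tuple) and y in c[1] for c in K.values() if c):
        raise TypeError("moving a shared owner would leave aliases dangling")
    K2 = dict(K)
    K2[y] = None                             # ownership leaves y
    if K[x] is None:                         # K-Move-A
        K2[x] = "o"
        return "o", K2
    return K[x], K2                          # K-Move-B
```

Moving y while some reference borrows from it is rejected, which is exactly the uniqueness premise y!K′ of K-Move-A and K-Move-B.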

Function Calls

As in the operational semantics, we decompose function calls' capabilities checking into three steps, namely the type checking of the callee, of the arguments, and finally of the function's body. The first step does not require any additional machinery, as it only ensures that the callee can be dereferenced. Handling arguments, on the other hand, requires more attention. All parameter assignments need to be type checked. However, since the parameters' names either do not exist or must be shadowed in the capability context, we first need to prepare the context so that the assignments can be properly checked. Furthermore, if the codomain of the function reveals a return-by-alias strategy, the arguments' capabilities must also be collected, as they will serve to determine the owners' set of the resulting borrowed capability. Unlike in the operational semantics, the callee's evaluation cannot be leveraged to retrieve the parameter names. Fortunately, those appear in the function's signature. We define a judgement of the form K, A ⊢ u ⇓args K′, A′, where u is a sequence of parameter assignments, represented as a word, K and K′ are the type capability contexts before and after the checking of the sequence, and A and A′ are tables X ⇀ C that map parameter names to the capability of their arguments.

Argument lists

K-Args-0
  ──────────────────────
  K, A ⊢ ε ⇓args K, A

K-Args-N
  x ∉ dom(K)    K[x ↦ ⊥] ⊢ x ⊙ t ⇓ c_x, K_x    K_x, A[x ↦ c_x] ⊢ u ⇓args K′, A′
  ───────────────────────────────────────────────────────────────────────────────
  K, A ⊢ (x ⊙ t)u ⇓args K′, A′

The final step is to compute the capabilities of the return value. If the function returns with ownership, the call is associated with the ownership capability. On the other hand, if it returns by alias, then we need to compute the union of all borrowed arguments' owners. Two cases should be considered. If the borrowed argument is an owning reference, then it must be added to the returned capability. If the borrowed argument is a borrowed reference, then its owners must be added to the returned capability. We define a procedure owners to handle these two situations, defined as below:

owners(x, A) = {x}  if A.x = o
owners(x, A) = X    if A.x = (b, X)
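Under the same hypothetical encoding of capabilities ("o" for ownership, ("b", X) for borrowing), the procedure and its set-valued extension owners∪ read as:

```python
def owners(x, A):
    """owners(x, A): the owning references behind argument x's capability."""
    cap = A[x]
    if cap == "o":
        return {x}          # an owning argument contributes itself
    _, X = cap              # cap = ("b", X): contribute the borrowed owners
    return set(X)

def owners_union(names, A):
    """owners∪: extend owners pointwise to a set of identifiers."""
    result = set()
    for x in names:
        result |= owners(x, A)
    return result
```

This is the computation K-Call-B performs over the codomain's bounds to build the capability of a call returning by alias.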

We further define a procedure owners∪ that extends owners to a set of identifiers. More formally, owners∪(X, A) = ⋃_{x ∈ X} owners(x, A).

Function calls

K-Call-A
  f : τ    τ.codom.qual = own
  K ⊢ f ⇓ c_f, K_f    c_f ≠ ⊥    K_f, ∅ ⊢ u ⇓args K′, A′
  ─────────────────────────────────────────────────────────
  K ⊢ f(u) ⇓ o, K′

K-Call-B
  f : τ    τ.codom.qual = brw
  K ⊢ f ⇓ c_f, K_f    c_f ≠ ⊥    K_f, ∅ ⊢ u ⇓args K′, A′
  ─────────────────────────────────────────────────────────
  K ⊢ f(u) ⇓ (b, owners∪(τ.codom.bds, A′)), K′

Conditional Terms

As discussed above, conditional terms pose the problem of static nondeterminism, which we resolve with conservative assumptions. Hence, a first step is to type check both branches, so that the resulting capability contexts can be compared. Note that if K_t and K_e respectively denote the capability contexts obtained after evaluating each branch of a conditional term, then dom(K_t) = dom(K_e) is an invariant, as variables and parameters are never accessible beyond the scope that declares them. We define a procedure merge(K_t, K_e) = K′ that attempts to find a capability context K′ satisfying our conservative constraints, given two contexts K_t and K_e. The constraints are defined as follows:

1. If a reference is allocated at the end of one branch but not at the end of the other, it is assumed unallocated after the conditional term. More formally:

∀x ∈ dom(K_t), (K_t.x = ⊥ ∨ K_e.x = ⊥) ⟹ K′.x = ⊥

2. If a reference has ownership at the end of one branch but not at the end of the other, it is transient in the former, and assumed unallocated after the conditional term. More formally:

∀x ∈ dom(K_t), (K_t.x = o ∧ K_e.x ≠ o) ∨ (K_t.x ≠ o ∧ K_e.x = o) ⟹ K′.x = ⊥

3. If a non-transient owner is aliased at the end of one branch, it is assumed aliased after the conditional term. More formally:

∀x ∈ dom(K_t), (K_t.x = (b, O1) ∨ K_e.x = (b, O2)) ⟹ K′.x = (b, O1 ∪ O2)

4. If a reference gets ownership at the end of both branches, it is assumed holding ownership after the conditional term. More formally:

∀x ∈ dom(K_t), (K_t.x = o ∧ K_e.x = o) ⟹ K′.x = o

5. A reference borrowed from a transient owner cannot survive after the con- ditional term. More formally:

∀x ∈ dom(K_t), K′.x = ⊥ ⟹ ∄y ∈ dom(K′), K′.y = (b, O) ∧ x ∈ O
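The five constraints can be satisfied constructively: the first four fix K′.x pointwise, and the fifth is then enforced by discarding any surviving borrow whose owner did not survive. A hypothetical Python sketch, with "o" for ownership, ("b", owners) for borrowing and None for the absence of capability:

```python
def merge(Kt, Ke):
    """Merge the capability contexts of a conditional's two branches."""
    K = {}
    for x in Kt:
        ct, ce = Kt[x], Ke[x]
        if ct is None or ce is None:       # 1: unallocated on one side
            K[x] = None
        elif ct == "o" and ce == "o":      # 4: owner on both sides
            K[x] = "o"
        elif ct == "o" or ce == "o":       # 2: transient owner
            K[x] = None
        else:                              # 3: union of the borrowed owners
            K[x] = ("b", ct[1] | ce[1])
    for x, c in K.items():                 # 5: drop borrows on dead owners
        if isinstance(c, tuple) and any(K[y] is None for y in c[1]):
            K[x] = None
    return K
```

For instance, merging a branch in which z is an owner with one in which z is unallocated discards z, and any borrow on a transient owner is discarded as well.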

We then define capability checking for conditional terms as follows. Notice that conditional terms do not bear any type capability, as indicated by the conclusion of the rule. This effectively prevents conditional terms from being used as first-class expressions. Lifting this restriction would entail additional constraints to merge the capabilities of both branches, in particular to handle cases where they would both produce different owning references.

Conditional terms

K-Cond
  K ⊢ t1 ⇓ c1, K1    K1 ⊢ t2 ⇓ c_t, K_t    K1 ⊢ t3 ⇓ c_e, K_e    K′ = merge(K_t, K_e)
  ─────────────────────────────────────────────────────────────────────────────────────
  K ⊢ if t1 then t2 else t3 ⇓ ⊥, K′

6.3.5 Key Properties

We now discuss the key properties of the type system in the context of memory safety. We start by defining the notion of well-formedness for capability contexts.

Definition 6.3.6: Well-formedness of capability contexts

A capability context K is well-formed if and only if the owners of every borrowed reference in K are owners in K. More formally:

∀x ∈ dom(K), K.x = (b, O) ⟹ O ⊆ dom(K) ∧ ∀y ∈ O, K.y = o

Example 6.3.10: Ill-formed capability context

Consider the capability context:

K = {x ↦ (b, {y}), y ↦ (b, {z}), z ↦ o}

K is ill-formed because it states that x is a borrowed reference whose owner is y, yet y is not an owner, that is K.y ≠ o.
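Definition 6.3.6 translates directly into a check over a context, again assuming the illustrative encoding "o" / ("b", O) / None for capabilities:

```python
def is_well_formed(K):
    """Every borrowed capability's owners must themselves own in K."""
    for cap in K.values():
        if isinstance(cap, tuple):               # cap = ("b", O)
            _, O = cap
            if not all(K.get(y) == "o" for y in O):
                return False
    return True
```

Applied to the context of Example 6.3.10, the check fails on x's capability, since y is itself borrowed rather than an owner.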

The reader will notice that the definition of well-formedness implies that borrowed references cannot borrow from themselves.

Lemma 6.3.1. If K is a well-formed capability context, then borrowed references in K do not borrow from themselves. More formally, ∀x ∈ dom(K), K.x = (b, O) ⟹ x ∉ O.

Proof. The proof is by contradiction. Let K be a well-formed capability context. Assume x borrows from itself, that is K.x = (b, O) ∧ x ∈ O. Then x is not an owner in K, which contradicts the definition of well-formedness.

A first key property of the type system is that it preserves well-formedness, as formalized by the following property. The full proof is given in Appendix C.

Property 6.3.2 (Well-formedness preservation). Let K be a well-formed capability context. If K ⊢ t ⇓ c, K′, then K′ is well-formed.

Proof Sketch. The proof is by induction over the typing judgements. 

Two other key properties relate to aliasing safety.

Property 6.3.3 (Moving safety). Let K be a well-formed capability context. If K ⊢ x ← y ⇓ c, K′ and y ∈ X, then y!K.

Proof. The proof is trivial by K-Move-A and K-Move-B. 

Property 6.3.3 states that the right operand of a move assignment has to be a unique owner. Consequently, this guarantees that a move assignment cannot leave aliases dangling, as shared owners (i.e. owners that are not unique in a given context) cannot be moved.

Property 6.3.4 (Reassignment safety). Let K be a well-formed capability context. If K ⊢ x &- t ⇓ c, K′ and K.x = o, then x!K.

Proof. The proof is trivial by K-Alias-A and K-Alias-B. 

Property 6.3.4 states that reassigning an owning reference implies that it is unique. Consequently, this guarantees that an aliasing assignment cannot leave other aliases dangling, as shared owners cannot be reassigned. The next key property relates to outliving aliases, and guarantees that non-owning references always live shorter than the owner from which they borrow. We first state two lemmas to establish its proof. First, an identifier is never defined in the context that results from the evaluation of its declaration.

Lemma 6.3.5. If K ⊢ let x in t ⇓ c, K′, then x ∉ dom(K′).

Proof. The proof is trivial by K-Let. 

Second, scope typing guarantees that if a variable y is declared in the body of a variable x's declaration, then the rank of the scope in which y lives is smaller than that in which x lives.

Lemma 6.3.6. Assume Σ, i ⊢ let y in t @ j. Then ∀x ∈ dom(Σ), Σ.x ≥ i.

Proof. The proof is trivial by S-Let.

Property 6.3.7 (Outliving aliases freedom). Let K be a well-formed capability context. If K ⊢ let x in t ⇓ c, K′, then ∄y ∈ dom(K′), K′.y = (b, O) ∧ x ∈ O.

Proof. The proof is by contradiction. Assume K ⊢ let x in t ⇓ c, K′ and ∃y ∈ dom(K′), K′.y = (b, O ∪ {x}). By Lemma 6.3.5, y ∈ dom(K′) ⟹ y ∈ dom(K), that is, y is already defined before x's declaration. By Lemma 6.3.6, Σ, i ⊢ let x in t @ j ⟹ Σ.y ≥ i, that is, y lives in a scope with a higher rank than x. Since y is an alias on x, there must be a subterm of t that is an aliasing assignment of the form y &- z such that either z = x or z is an alias on x.

• If z = x, then Σ, i ⊬ let x in t @ j, because x @ i − 1 and S-Alias does not apply.

• If z ≠ x, then there exists z ∈ dom(Σ) such that Σ.y < Σ.z, and therefore there must be K′.z = (b, O ∪ {x}). This does not hold, by induction on the first hypothesis.

A last key property relates to safety with respect to uninitialized memory errors. Our typing rules guarantee that dereferencing cannot occur on references that do not hold any capability. The property is obviously guaranteed for the evaluation of a function call's callee and the guard of a conditional expression.

Lemma 6.3.8 (Safe dereferencing of callees). Let K be a well-formed capability context. If K ⊢ f(u) ⇓ c, K′, then K ⊢ f ⇓ c1, K1 and c1 ≠ ⊥.

Proof. The proof is trivial by K-Call-A and K-Call-B.

Lemma 6.3.9 (Safe dereferencing of conditional guards). Assume K is a well-formed capability context. If K ⊢ if t1 then t2 else t3 ⇓ c, K′, then K ⊢ t1 ⇓ c1, K1 and c1 ≠ ⊥.

Proof. The proof is trivial by K-Cond.

Assignments always prescribe that the right operand must hold a capability, which trivially guarantees that dereferencing of uninitialized or moved memory cannot occur on the right side.

Lemma 6.3.10 (Safe dereferencing of right operands). Let K be a well-formed capability context. If K ⊢ x ⊙ t ⇓ c, K′, where ⊙ ∈ O is an assignment operator, then K ⊢ t ⇓ c1, K1 and c1 ≠ ⊥.

Proof. The proof is trivial by all assignment rules.

Although dereferencing also occurs on the left side of mutation and move assignments, recall that the type system assumes the implicit insertion of memory allocations before references to uninitialized or moved memory are assigned. Consequently, such cases can be discarded.

Property 6.3.11 (Safe dereferencing). Let K be a well-formed capability context. Then dereferencing a reference r used as the callee of a function call, the guard of a conditional term or the right operand of an assignment implies that K.r ≠ ⊥.

Proof. The proof is by Lemma 6.3.8, Lemma 6.3.9 and Lemma 6.3.10.
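Mainstream languages enforce comparable guarantees. For instance, Rust rejects any read of a binding that is not definitely initialized; a hedged sketch of our own:

```rust
// A binding may be declared without a value, but every use must be
// preceded by an assignment on all execution paths.
fn definite_init(cond: bool) -> i32 {
    let x: i32;
    // println!("{}", x); // error[E0381]: `x` isn't initialized here
    if cond {
        x = 1;
    } else {
        x = 2;
    } // x is initialized on every path, so it may now be read
    x
}

fn main() {
    assert_eq!(definite_init(true), 1);
    assert_eq!(definite_init(false), 2);
}
```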

6.3.6 Soundness Result

We can build upon the properties enunciated in the previous section to express a strong soundness result. More formally, we define soundness with the following theorem.

Theorem 6.3.12 (Soundness Result). If a program t ∈ AC can be type-checked, that is Σ, i ⊢ t @ j for some flow-insensitive typing context Σ and K ⊢ t ⇓ c, K′ for some well-formed flow-sensitive typing context K, then C ⊢ t ⇓ l, C′ for some evaluation context C without memory errors.

The formal proof of this theorem cannot be established directly, solely using our typing judgements and the description of the operational semantics. The problem is that our type system assumes that memory allocations and deallocations are inserted implicitly, and can therefore elide some of the cases that would, under the definition proposed in Chapter 4, lead to a memory error.

Example 6.3.11: Ill-formed capability context

Consider the following program:

let x in x ← 42

This program's evaluation should trigger an uninitialized memory error, as x is assigned before being allocated. However, our type system will nonetheless typecheck this program. More specifically, the rule K-Move-A applies, as x does not hold any type capability, and gives it ownership in the resulting context. The explanation for this behavior lies in the fact that, as described in Section 6.2, an allocation can be automatically inserted before an assignment if the left operand is unallocated.

Consequently, it follows that our type system alone is not sufficient to show a strong soundness result; a formal description of the algorithm that inserts allocations and deallocations is required. Moreover, this algorithm relies on typing information that should be inferred by our type system, namely reference ownership. Hence, a solution to break this circularity would be to devise a deduction system to first infer flow-sensitive types, which would be used by the static memory management algorithm to transform the input, and to establish soundness on this modified input. While such an approach is beyond the scope of this work, we can nonetheless illustrate intuitively, by means of examples, how our type system prevents the memory errors we formalized in Chapter 5.

Uninitialized and moved memory errors

Recall that uninitialized (resp. moved) memory errors occur when a reference being dereferenced has yet to be allocated (resp. refers to moved memory). For instance, such a situation may occur in the process of a move assignment's evaluation, as illustrated in the example of Section 5.1.1. However, such an assignment will be rejected by our type system, which prescribes that the right operand be unique, and therefore initialized.

Example 6.3.12: Uninitialized memory errors

Consider the following faulty program:

let x in let y in x ← y

For the sake of conciseness, we skip grayed out terms to focus solely on the assignment, and assume it is evaluated in a context C = ⟨π, µ⟩ such that:

π = { x ↦ l1, y ↦ ∅ }    µ = { l1 ↦ ⊥ }

Applying the rule E-Move fails, as C ⊢ y ↓ l, C yet l = ∅, indicating an uninitialized memory error. However, the flow-sensitive typing context K that type checks this assignment is given by K = { x ↦ o, y ↦ ⊥ }. Consequently, K-Move-B cannot apply, as the premise y!K is not satisfied. Note that the other type rules for move assignments do not apply either: either because x is already allocated, therefore invalidating K.x = ⊥, or because y is an identifier, therefore invalidating y ∉ X.
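Rust's move semantics provide a concrete analogue: once the source of a move has given up its value, any further use of it is rejected at compile time. A sketch of our own:

```rust
fn move_example() -> String {
    let y = String::from("42"); // y owns its heap allocation
    let x = y;                  // move assignment: ownership transfers to x
    // println!("{}", y);       // error[E0382]: borrow of moved value `y`
    x // x, the new unique owner, is returned
}

fn main() {
    assert_eq!(move_example(), "42");
}
```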

Use after free and doubly freed memory errors

Use after free errors occur when a variable that has been freed is dereferenced again. Lexical scoping already prevents variables defined in a particular scope (typically delimited by a let ··· in construct) from being accessed outside of said scope. However, our operational semantics does not prevent visible variables referring to deallocated memory from being used again. Two mechanisms prevent this situation. First, recall that variables that hold ownership at the end of their lexical scope are automatically deallocated.

Hence, by removing explicit deallocations from the language, situations where such an instruction would be called before an owning reference is dereferenced again are de facto prevented. Similarly, this also protects a program from doubly freed memory. Secondly, aliases must be guaranteed not to be dereferenced again after the memory to which they were assigned is deallocated. Our flow-insensitive typing rules guarantee this invariant by forbidding aliases to variables bound to a scope that lives shorter, as illustrated by Example 6.3.3.
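Rust implements both mechanisms: deallocation is tied to the owner's lexical scope, so it can neither run early nor run twice. A hedged sketch using a drop counter of our own devising:

```rust
use std::cell::Cell;

// Counts how many times the resource is "freed".
struct Res<'a>(&'a Cell<u32>);

impl<'a> Drop for Res<'a> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn scoped_drop() -> u32 {
    let frees = Cell::new(0);
    {
        let a = Res(&frees);        // `a` owns the resource
        let _alias = &a;            // borrows may not outlive `a`
        assert_eq!(frees.get(), 0); // not freed while still in scope
    } // the deallocation is inserted here, exactly once
    frees.get()
}

fn main() {
    assert_eq!(scoped_drop(), 1);
}
```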

Memory leaks

Our type system guarantees that all references are either owning, or owned by an owning reference, as formalized by the well-formedness property. Thanks to the automatic insertion of deallocation instructions for all owning references, it follows that all allocated memory must be freed eventually, which prevents memory leaks.

Example 6.3.13: Memory leak prevention

Consider the following program:

let x in let y in x ← 42 ; y ← 1337

The type checking of this program gives ownership to x before it is assigned the value 42, based on the assumption that an allocation will be automatically inserted at this point. Similarly, ownership is given to y before the second assignment. It is then assumed that deallocation instructions for both y and x will be inserted, as these will have been determined to go out of scope with ownership.
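The same reasoning underpins leak freedom in Rust: every heap allocation has an owner, and the compiler inserts its deallocation when the owner goes out of scope. A minimal sketch of our own:

```rust
// The heap allocation is owned by `b`; it is freed automatically when `b`
// goes out of scope at the end of the function, so nothing leaks.
fn boxed_sum() -> i32 {
    let b = Box::new(vec![1, 2, 3]);
    b.iter().sum()
}

fn main() {
    assert_eq!(boxed_sum(), 6);
}
```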

Limitation

We draw the reader's attention to the fact that our type system is not complete. In other words, there exist programs whose execution would not fail under the operational semantics of Chapter 4, but that cannot be checked by our type system. This is due to the conservative assumptions that have to be made in order to statically track type capabilities beyond function boundaries and across conditional terms.

Example 6.3.14: False positive

Consider the following program:

let x in let y in y ← 42 ; x &- y

For the sake of conciseness, we skip grayed out terms to focus solely on the assignment, and assume it is evaluated in a context C such that:

C = ⟨π, µ⟩ with π = { x ↦ ∅, y ↦ l1 } and µ = { l1 ↦ ⊥ }

Although our flow-insensitive typing rules fail to type check this program, due to the fact that y lives in a shorter scope than x, it will not crash at runtime, because x is not read after y is deallocated. In other words, while this program creates a dangling reference, it never dereferences it.
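Rust's borrow checker exhibits the same incompleteness: some programs that would run safely are rejected because of conservative flow assumptions. In the hedged sketch below (our own example), the commented form is safe at runtime yet rejected, while the equivalent restructured form is accepted:

```rust
fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
    // Rejected, although it never creates a dangling reference:
    // match v.first() {
    //     Some(x) => x,
    //     None => { v.push(0); &v[0] } // error[E0502]: `v` already borrowed
    // }
    if v.is_empty() {
        v.push(0); // same behavior, phrased so the borrow starts afterwards
    }
    &v[0]
}

fn main() {
    let mut empty = Vec::new();
    assert_eq!(*first_or_insert(&mut empty), 0);
    let mut filled = vec![7];
    assert_eq!(*first_or_insert(&mut filled), 7);
}
```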

6.4 Records and Static Garbage Collection

Manipulating records invites a plethora of complications, and would require the definition of additional constructions to guarantee memory safety. A first consideration asks how a field can be guaranteed to be initialized statically. Consider for instance the following function:

A-calculus
fun f(cond: @ref) -> @own {
  let x in {
    x <- new
    if cond {
      x.m <- 12
      return <- x
    } else {
      return <- x
    }
  }
}

Keep in mind that functions are considered black boxes. Consequently, from the call site, there is no way to determine whether the member m of f's return value is initialized. Annotations could be used to solve this issue, but fail in the presence of linked data structures. Imagine for instance a linked list implemented by a chain of records, each representing one particular node and referring by alias to the next one. Annotations cannot help, because such a linked list returned by alias could have an arbitrary depth, which cannot be determined statically. The inability to statically determine the state of a particular field poses other problems as well. In fact, none of the rules we described above can be safely applied.

Real programming languages solve this issue by means of two mechanisms: definite initialization [64] and optional (a.k.a. nullable) types. Definite initialization is a static analysis technique that aims to guarantee that a variable is initialized at a given point, no matter what happens in the program before. Roughly speaking, it consists of checking that it is initialized in all possible execution paths4. Definite initialization can be used to guarantee that a particular function necessarily initializes all fields of a record. If in addition record allocations are contained within such functions, usually called constructors, then a record can be used safely anywhere else. Nonetheless, this approach raises a question as to how linked and self-referential data structures can be handled. Indeed, if definite initialization prescribes that all fields be initialized before a constructor can return, then assigning a self-referential field is impossible. This is where optional types come in handy. Optional types denote values that may or may not exist, allowing definite initialization to ignore fields that should be initialized after the constructor returns. As a result, sequences and cycles of references can be broken.

Example 6.4.1: Constructor

Assume a hypothetical value nil denoting the absence of a value. Then, initializing a node for a linked list could be done with a constructor ctor, defined as follows:

A-calculus
ctor <- fun(v: @own) -> @own {
  let n in {
    n <- new
    n.val <- v
    n.next <- nil
    return <- n
  }
}
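For comparison, the same constructor can be written in Rust, where Option plays the role of nil and lets the compiler check definite initialization of every field (the field names below are our own):

```rust
struct Node {
    val: i32,
    next: Option<Box<Node>>, // the optional type breaks the chain of references
}

impl Node {
    // Constructor: every field is initialized before it returns.
    fn new(v: i32) -> Box<Node> {
        Box::new(Node { val: v, next: None })
    }
}

fn main() {
    let mut head = Node::new(1);
    head.next = Some(Node::new(2)); // the link is assigned after construction
    assert_eq!(head.next.unwrap().val, 2);
}
```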

Although optional types enable definite initialization analysis, they do not solve static safety on their own. In fact, they only help the type system identify where initialization cannot be assumed statically, while runtime checks are still the only way to test the validity of a field. As it stands, the A-calculus cannot express such checks, because the language's semantics states that any access to a reference's value requires a dereference, which will obviously fail if memory is unallocated. We therefore need the addition of a construct exists(x) for this task, which checks whether a particular reference is initialized (i.e. not in an uninitialized or moved state). This enables conditions to be expressed on the runtime state of a particular reference without dereferencing.

4The reader will remark that the rules described to handle static nondeterminism actually implement a form of definite initialization analysis.

Another consideration relates to garbage collection. As fields are references as well, they shall bear ownership of a particular memory location. However, a field is not bound to a scope lexically, like a variable is. Instead, it lives as a member of a record value, which is itself owned by a particular reference. Consequently, lexical scopes cannot be leveraged to determine fields' deallocations. One solution is to delegate field ownership to the reference holding the record they constitute, thereby establishing ownership trees, necessarily rooted by a lexically scoped variable. Unfortunately, this approach cannot deal with arbitrarily deeply nested data structures either. It follows that determining how many deallocations should be inserted once the root variable goes out of scope is also impossible. Real programming languages solve this issue in either of two ways, both applied at runtime. Dynamically garbage collected systems simply ignore deallocations altogether, expecting their memory management mechanism to get rid of unused memory. Others handle the deallocation with functions dedicated to the deallocation of a particular record instance, usually called destructors. On arbitrarily deeply nested structures, destructors can be applied recursively until reaching the bottom of the structure. Languages featuring static garbage collection (e.g. Rust) adopt the second approach, but insert destructor calls automatically.

Example 6.4.2: Destructor

Assume a hypothetical predicate exists(x) that holds if and only if x is neither unallocated nor moved. Then, deallocating an arbitrarily long linked list could be done with a destructor dtor, defined as follows:

A-calculus
dtor <- fun(n: @mut @brw, dtor: @brw) {
  if exists(n.next) {
    dtor(n &- n.next, dtor &- dtor)
    del n.next
  }
  del n.val
}
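Rust's Drop trait is the static-garbage-collection version of such a destructor: the compiler inserts the call automatically, and owned fields are then destroyed recursively. A hedged sketch with a logging destructor of our own:

```rust
use std::cell::RefCell;

struct Node<'a> {
    val: i32,
    next: Option<Box<Node<'a>>>,
    log: &'a RefCell<Vec<i32>>,
}

impl<'a> Drop for Node<'a> {
    fn drop(&mut self) {
        // Record which node is destroyed; owned fields drop recursively after.
        self.log.borrow_mut().push(self.val);
    }
}

fn drop_order() -> Vec<i32> {
    let log = RefCell::new(Vec::new());
    let list = Node {
        val: 1,
        next: Some(Box::new(Node { val: 2, next: None, log: &log })),
        log: &log,
    };
    drop(list); // destructor calls run down the whole chain automatically
    log.into_inner()
}

fn main() {
    assert_eq!(drop_order(), vec![1, 2]);
}
```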

The reader will notice that the deallocation scheme we propose is reminiscent of Ownership Types [39]. Although fields can be owners, they are under the control of the record's owner, which is thereby transitively responsible for deallocating its fields. This means that a record's owner naturally delimits an ownership context [42] that encloses all its members. Although we do not use ownership to restrict fields' aliasing (within the meaning of an owners-as-∗ policy), ownership contexts give us an appropriate abstraction to reason about records. Placing a record on the right hand side of a mutation assignment is always safe, as the transitive copy does not preserve any alias. In other words, not only is the ownership context copied, but also the objects that are aliased from within this context. Updating a record, on the other hand, requires additional precautions, so as to avoid breaking possible aliases to and from its fields. Consider for instance a tree node whose grandchildren are aliased by references from outside its ownership context. If the node's children are mutated, then these aliases will be invalidated. Therefore, mutating a record whose fields are aliased externally must be forbidden.

Example 6.4.3: External uniqueness

A-calculus
1  let x in {
2    let y in {
3      y <- 0
4      let z in {
5        z <- new
6        z.m &- y
7        x <- z
8      }
9    } // <= del y
10 }

z gets ownership on a record at line 5, and z.m borrows a reference on y at line 6. The situation is valid since y lives longer than z. Transferring z's ownership to x at line 7 is illegal, otherwise it would leave x.m dangling after line 9.

Move assignments add yet another constraint, because the validity of a record's internal aliases may be put in jeopardy. Consider for instance moving the tree to a reference with a longer lifetime, while its children contain aliases referring to references outside of the ownership context. If these references live shorter than the one to which ownership is transferred, children nodes risk containing dangling references. Therefore, moving a node whose fields contain aliases to the outside must be forbidden. The reader will remark that both these strategies correspond to the notions of external [40] and separate [73] uniqueness, whose constraints are illustrated in Figure 6.4.1.

Figure 6.4.1: Uniqueness of ownership contexts: (a) external uniqueness; (b) separate uniqueness. Ownership is denoted by solid arrows, while borrowing is drawn dashed. The dotted rounded boxes symbolize ownership contexts. [Diagrams omitted.]

6.5 Capabilities for Immutability

The term "immutability" can be a placeholder for vastly different concepts, depending on the programming language. We briefly describe them to establish the vocabulary we use in the remainder of this chapter. A more comprehensive discussion about immutability is given in [116].

Reassignability is a property of references, indicating whether or not they may be reassigned to another memory location after having been assigned once.

Reference immutability is a property of references that indicates whether or not the memory location to which they refer might be modified through them. In other words, an immutable reference is an alias through which the object’s mutation is not allowed.

Memory immutability (a.k.a. object immutability) is a restriction on a given memory location. Shallow immutability prevents the mutation of the content stored at the location, while deep immutability transitively applies to the other memory locations referred to by the stored object.
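These notions can be told apart concretely in Rust, which separates the reassignability of a binding from the immutability of a reference; a minimal sketch of our own:

```rust
fn immutability_flavors() -> (i32, i32) {
    let a = 1;     // non-reassignable binding
    // a = 2;      // error[E0384]: cannot assign twice to immutable variable
    let mut b = 1; // reassignable binding
    b += 1;        // legal: b is now 2
    let r = &b;    // reference immutability: no mutation *through* r
    // *r = 3;     // error[E0594]: cannot assign to `*r`
    (a, *r)
}

fn main() {
    assert_eq!(immutability_flavors(), (1, 2));
}
```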

While both reassignability and reference immutability can be used to avoid common programming mistakes, they do not pose any restriction on the mutability of the referred memory location itself. Therefore, one might be under the impression of manipulating immutable objects, while in fact those may be mutated via another reference. Once again, assignment semantics contribute to the confusion. In particular, a non-reassignable reference to a primitive type can easily be mistaken for an immutable object, as there is no obvious way to modify it (e.g. by the means of a self-modifying method).

Example 6.5.1: Reference immutability vs value immutability

1  typedef struct {
2    int bar;
3  } Foo;
4  void fn() {
5    Foo x;
6    x.bar = 1;
7    Foo const* r = &x;
8    r->bar = 2;
9    x.bar = 3;
10 }

Consider the C structure above. x denotes an instance of Foo, and r is an immutable reference on x. Line 8 is illegal because the object cannot be mutated through r. Line 9 is legal, despite the immutable reference r.

6.5.1 The Single Writer Constraint

Combining reference mutability with uniqueness offers a better protection. For instance, readers cannot coexist with writers in Rust, because mutable references must be unique. However, immutable references can be borrowed arbitrarily many times, with the guarantee that the underlying object is immutable5. This is in fact the essence of Rust's mechanism for data safety. The drawback is that self-referential mutable structures (e.g. mutable graphs or doubly linked lists) can no longer be represented with references.

The single writer constraint is necessary to prescribe data race freedom in a parallel context [35]. However, while this paradigm might seem to be the inevitable direction of modern software development, due to the democratization of multi-core architectures, cooperative multitasking [49] (a.k.a. non-preemptive multitasking) offers a promising alternative. In this paradigm, concurrent processes do not work in parallel but transfer control between each other over a single core. This differs from preemptive multithreading in that control is transferred voluntarily in cooperative multitasking, whereas this task is left to the operating system in the more traditional approach. Perhaps counterintuitively, cooperation can outperform its preemptive counterpart in I/O-intensive applications [54]. Performance aside, another advantage is that cooperation offers synchronization for free, which provides a defense against data races and opens the door to more relaxed immutability models. If multiple cooperative processes agree to share a mutable memory location, concurrent accesses to this location cannot happen simultaneously (within the meaning of multithreading) and data safety is preserved. Nonetheless, this is not to say that immutability is unnecessary. The ability to guarantee the validity of an invariant across successive resumptions of a cooperative process is a desirable property. For instance, a cooperative process could transfer control from the middle of a loop whose iterator depends on an invariant.

5The reader proficient in Rust will remark that exceptions to this rule shall be made for types encapsulating an UnsafeCell member to implement interior mutability.
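Rust itself acknowledges this: when code is known to run on a single thread, RefCell replaces the static single-writer rule with a dynamic check, which is exactly the kind of relaxation that cooperative multitasking permits. A hedged sketch of our own:

```rust
use std::cell::RefCell;

fn single_threaded_sharing() -> i32 {
    let shared = RefCell::new(0);
    {
        let r1 = shared.borrow(); // many readers may coexist...
        let r2 = shared.borrow();
        assert_eq!(*r1, *r2);
    } // ...until their loans end here
    *shared.borrow_mut() += 1;    // a writer then gets exclusive access
    // Overlapping borrow() and borrow_mut() would panic at runtime instead
    // of being rejected at compile time.
    let v = *shared.borrow();
    v
}

fn main() {
    assert_eq!(single_threaded_sharing(), 1);
}
```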

6.5.2 Many Readers or Many Writers

We suggest a capability-based type system that relaxes the single writer constraint, intended to bring memory immutability to the A-calculus. The goal is to guarantee that a memory location is, at any given time, referred to by either n readers and 0 writers, or n writers and 0 readers. Readers are called non-mutating references, and refer to immutable memory locations. Writers are called mutating references, and refer to mutable memory locations. Two capabilities are defined: the read-only capability and the read-write capability. These can be held by references to control the mutability of the value they represent. At declaration, a variable, field or parameter can be annotated either with @cst to designate it non-mutating, or with @mut to designate it mutating. Unannotated variables, fields and parameters are assumed annotated with @cst. Non-mutating references receive the read-only capability upon allocation, whereas references to mutable locations receive both capabilities and are called mutating.

Example 6.5.2: Mutability capabilities

A-calculus
1 let x: @mut in {
2   alloc x with <- 2
3   x := 4
4   let y: @cst in {
5     alloc y with <- 2
6     y := 4
7   }
8 }

x is declared mutating, and gets both mutability capabilities at line 2, so line 3 is legal. y is declared non-mutating, and gets only the read-only capability at line 5, so line 6 is illegal.

Notice that allocation constructions slightly differ from the regular A-calculus, as they are suffixed with an initial value. Conceptually, alloc x with ⊙ t2 simply expands to alloc x ; x ⊙ t2. However, the construction is not merely syntactic sugar. It also serves to atomically represent allocation and initialization, which is necessary to initialize the value of a non-mutating reference. Record fields are references too, and therefore must be qualified as well to indicate whether they are mutating or non-mutating.

Example 6.5.3: Mutability capabilities on fields

A-calculus
1 let x: @mut in {
2   alloc x with <- new
3
4   alloc x.m with <- 2
5   alloc x.c with <- 4
6   x.m := 8
7   x.c := 16
8 }

x is declared mutating, and is allocated and initialized with a record in which m is mutating and c is not. Both fields are initialized at lines 4 and 5. Line 6 is legal, as x.m is mutating. Line 7 is not, as x.c is non-mutating.

Non-mutating references assume the deep immutability of the object to which they refer. In other words, a memory location referred to by a non-mutating reference cannot be modified. Moreover, if that location stores a record, then its fields are considered non-mutating as well, regardless of their declaration.

Example 6.5.4: Transitive immutability

A-calculus
1 let y: @cst in {
2   alloc y with <- new
3
4   alloc y.m with <- 2
5   alloc y.c with <- 4
6   y.m := 8
7   y.c := 16
8 }

y is declared non-mutating, and is allocated and initialized with a record in which m is mutating and c is not. Both fields are initialized at lines 4 and 5. Both lines 6 and 7 are illegal, as y's immutability applies transitively.

Mutability capabilities are borrowed with the aliasing operator (i.e. &-). In order to enforce the multiple readers or multiple writers constraint, aliasing is only allowed in two situations:

1. The right operand's mutability is the same as that of the left, that is, both references are declared with the same qualifier.

2. The right operand is an unaliased mutating reference.

The latter case allows unique mutating references to be shared non-mutating temporarily (e.g. for the duration of a function call). Recall that mutating references receive both mutability capabilities upon allocation. Hence they can lend the read-only capability to a non-mutating alias, and refrain from using their read-write capability for the duration of the loan. Conversely, an alias is not allocated and only holds the capability it borrows, thus preventing it from being reborrowed mutating. Note that reassigning a field by alias mutates its record. Therefore, the operation is obviously prohibited if the record is held by a non-mutating reference.

Example 6.5.5: Mutability and aliasing

A-calculus
1 let x: @mut in {
2   let y: @cst in {
3     alloc x with <- 2
4     y &- x
5     x := 4
6   }
7   x := 4
8 }

x is declared mutating and y non-mutating. x gets both capabilities at line 3. y borrows x's read-only capability at line 4 and inhibits x's read-write capability; hence line 5 is illegal. y goes out of scope at line 6, so x's read-write capability is restored and line 7 is legal.
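The bookkeeping behind this kind of example can be sketched as a tiny flow-sensitive checker. The data structure and names below are our own illustration, not the thesis's formal system, and the sketch deliberately omits restoring capabilities at scope exit:

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Cap {
    ReadOnly,
    ReadWrite,
}

#[derive(Default)]
struct Ctx(HashMap<&'static str, Cap>);

impl Ctx {
    fn alloc(&mut self, x: &'static str, mutating: bool) {
        let c = if mutating { Cap::ReadWrite } else { Cap::ReadOnly };
        self.0.insert(x, c);
    }
    // `y &- x`: y borrows read-only, inhibiting x's read-write capability.
    fn alias(&mut self, y: &'static str, x: &'static str) {
        self.0.insert(y, Cap::ReadOnly);
        self.0.insert(x, Cap::ReadOnly);
    }
    fn can_write(&self, x: &str) -> bool {
        self.0.get(x) == Some(&Cap::ReadWrite)
    }
}

fn main() {
    let mut k = Ctx::default();
    k.alloc("x", true);
    assert!(k.can_write("x"));  // x := 4 is legal
    k.alias("y", "x");
    assert!(!k.can_write("x")); // x := 4 is now illegal while the loan lasts
}
```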

This model shares a few similarities with the notions of uniqueness and ownership we have discussed in the previous sections. Unaliased references act like unique owners and can fraction either of their mutability capabilities. These fragments are sent back to their owner when borrowed references are either reassigned or go out of scope. Consequently, the lifetime bounds mechanism we devised to track uniqueness beyond function boundaries has to be carried over to track mutability capabilities.

Example 6.5.6: Mutability at function boundaries

A-calculus
1 fun(
2   x: @brw @mut,
3   y: @brw @mut)
4   -> @brw(x: @mut) @mut
5 {
6   return &- y
7   return &- x
8 }

Function parameters are annotated to indicate their expected mutability. The codomain is annotated to indicate that the function returns mutating references that may borrow x mutably. Line 6 is illegal, as y is not mentioned in the function's codomain. Conversely, line 7 is legal.

6.6 Summary

In this chapter, we have studied the application of memory and aliasing control mechanisms to the A-calculus, in order to guarantee static properties. First, we have bootstrapped our analysis with a review of Rust's type system. Rust promotes linear assignment semantics to guarantee memory and data safety, and therefore solves issues similar to those of the A-calculus. Next, we have presented a type system to support static garbage collection and enforce memory safety. Lastly, we have suggested additional capabilities to guarantee immutability, and discussed how the single writer constraint could be relaxed in the context of single-threaded cooperative multitasking.

Chapter 7

Anzen

The last three chapters introduced a minimal calculus for imperative languages, described its formal semantics and proposed dynamic and static approaches to detect and prevent memory errors. Although these results constitute a solid basis to formally analyze existing languages, it remains to be seen whether the core concept of the A-calculus, namely its different assignment operators, can be integrated into an actual, practical programming language. Mixing multiple assignment semantics with high-level programming concepts such as classes and higher-order functions invites complex challenges to preserve sound assumptions on memory management. Another legitimate question asks whether the use of three distinct assignment operators is practical in an actual programming language.

This chapter introduces Anzen, a general purpose language based on the A-calculus. Anzen features all three assignment operators from the A-calculus, and combines them with more elaborate constructs, such as generic composite types and higher-order functions. The language also supports notions of ownership, uniqueness and immutability, although these properties are checked dynamically. A powerful type inference engine allows for most type annotations to be elided, including those related to memory safety. Input sources are transpiled into an intermediate representation, called the Anzen Intermediate Representation (AIR), that heavily borrows from the Swift Intermediate Language [9] and LLVM's Intermediate Representation [93]. AIR maps high-level programming concepts onto a language close to the A-calculus. This permits a painless implementation of the error detection techniques presented in Chapter 5. Although in its current version the compiler does not generate machine code, an interpreter for AIR can execute Anzen programs.

The remainder of this chapter is divided as follows.
We start with a brief review of compiler architectures and fix the nomenclature used throughout the following sections. Section 7.2 proposes a quick tour of Anzen's features to informally introduce its syntax and semantics, before a couple of examples are presented in Section 7.3. Next, Section 7.4 continues with a description of Anzen's intermediate representation. Finally, Section 7.5 gives an overview of Anzen's compiler architecture and implementation.

7.1 Compilers and Interpreters

A compiler is a program whose primary objective is to translate source code written in a particular programming language into another programming language. For instance, a C compiler (e.g. Clang [133]) typically turns source code written in C into machine code for a particular processor (e.g. Intel x86). Other compilers may target more abstract languages, such as the bytecode of a virtual machine, or even another human-readable high-level language1 (e.g. the TypeScript compiler [19]). An interpreter, on the other hand, is a slightly different piece of software that, instead of translating its input to another language, acts as a virtual machine and executes it directly.

7.1.1 Overview of a Compiler Architecture

A compiler can be seen as a chain of smaller programs, each carrying out a particular part of the whole translation process. Figure 7.1.1 gives a schematic overview.

Figure 7.1.1: Architecture of a compiler. [Diagram: the front-end parses the input (e.g. C source) into a CST and performs semantic analysis; the back-end then generates the output (e.g. x86 assembly) from an IR.]

Lexical and syntax analysis are likely the most recognizable tasks of a com- piler, and are commonly referred together as simply parsing. These consists in transforming a stream of characters (e.g. from a file) into a sequence of tokens, and ultimately into a concrete syntax tree (CST) representing the syntactic struc- ture of the input. While this goes past the scope of this thesis, a significant body

1Such compilers are often called transpilers in both literature and common parlance. 7.1. COMPILERS AND INTERPRETERS 163 of work has being dedicated to the study of parsers (see [70, Chapter 3] for a comprehensive review). One essential property of a compiler is that it must preserve the semantics of the program it translates. Therefore, it should create a semantic representation of its input, which it can use to rephrase the program in the target language. In other words, the objective is to attach a meaning to the program. This is typically done by building a more abstract representation of the CST, aptly named an ab- stract syntax tree (AST), before running one or more analysis passes, focusing on particular aspects of the semantics2 (e.g. lexical scoping or type inference). Obvi- ously, only programs that respect the source grammar can be turned into consistent ASTs. As a result, this can be interpreted as a first step into the verification of the input’s correctness, as faulty programs (w.r.t to the grammar) are de facto rejected. However, for most programming languages, a syntactically correct program may still be semantically nonsensical. Hence, additional steps may be required to un- derstand its intended meaning, which is usually done on one or several IRs of the program. Note that an AST is already an IR. However, trees are not always the most suitable data structures for all kinds of analysis, and therefore other forms of IRs are sometimes preferred (e.g. [47, 6]). Platform-agnostic code optimiza- tions may also be also considered at this stage, such as constant propagation, dead code elimination or loop unrolling, to cite a few. We refer collectively to these additional steps as semantic analysis. Parsing a program and building its semantic representations relate to the front- end of a compiler workflow. The back-end represents the final step, which consists of translating this representation into the target code. 
Just as parsing is highly dependent on the source language, code generation is highly dependent on the target language, and potentially on the system on which the latter is expected to be executed as well. For instance, targeting assembly code for a particular architecture requires a comprehensive knowledge of available registers, execution units, calling conventions, etc. The same goes for target-specific optimizations, for which the execution model should also be considered. Fortunately, the use of a standardized IR between front-end and back-end (e.g. LLVM [93] or Microsoft's CIL [106]) encourages code reuse and allows for a better compartmentalization of the code.

Rather than rephrasing the input program into some other target language, one may directly "execute" the semantic representation produced by a front-end. A program that operates this way is commonly named an interpreter. Note that the boundary between interpretation and actual program execution is blurry, as one could argue a CPU is in fact an interpreter for machine code. The general understanding, however, is that an interpreter is itself a program that runs on top of some existing architecture. While some programming languages are historically designated as interpreted or compiled, the two notions are not mutually exclusive. In fact, implementations of compilers and interpreters alike are now available for most mainstream programming languages (e.g. [92]).

2 The reader will notice we took this approach to type check garbage collection in Section 6.3.

7.1.2 Challenges of Compilation

Compilers occupy a very interesting place in software development, as they are completely unavoidable. They must extract a precise and unambiguous semantics from inputs designed to be as human-friendly as possible. This poses several important challenges:

• Compilers need to be fast. They are on the front line of every program build, which means all their executions are part of the development time. This constraint often disqualifies expensive analysis techniques, such as software model checking [43]. They also need to scale to arbitrarily large inputs, as compilers may have to ingest millions of lines of code. Linux, for example, has roughly 25 million lines of C, and can be expected to compile in less than two hours on a modern personal computer.

• Compilers must be reliable. They are pieces of software that create software, hence incorrect behaviors in the former are bound to introduce incorrect behaviors in the latter. Moreover, errors due to a faulty compiler might be hard to track down, as they are likely to be silently carried over into the translation [129]. As a result, particular care must be taken with their correctness, often by means of a formal approach [95], which also requires a formal description of their expected inputs and outputs. Testing is also challenging, due to the variety of execution paths available.

• Compilers are most often programs facing humans, and therefore should relay information in a way that is understandable to a human developer. While properties are formally expressed as mere facts that should be shown true or false, a simple invalid_program answer is of little to no help for a human. Instead, errors should be explained and put in context, which means a compiler not only has to detect that a property does not hold, but also has to identify why.

7.2 Anzen in a Nutshell

This section introduces Anzen's syntax and semantics informally. A complete concrete syntax is given in Appendix B.

Decades of tradition suggest that the first example of a programming language be a program that prints "Hello, World!". However, because Anzen aims at describing precise assignment semantics, we break with the establishment and start by describing variables.

Anzen has a comprehensive notion of immutability, and distinguishes between reassignability and memory immutability. Non-reassignable references are declared with the keyword let, and can be assigned only once, whereas reassignable references are declared with the keyword var, and can be freely reassigned throughout their lifetime. Memory immutability, on the other hand, is specified with type qualifiers: @cst describes a non-mutating reference and @mut describes a mutating one. Table 7.1 illustrates all possible combinations of reassignability and memory mutability. Anzen is an opinionated language, and one of its opinionated choices is to prefer immutability over mutability. Accordingly, omitting the mutability qualifier is interpreted as @cst by default, and parameters are always assumed non-reassignable. The three assignment operators of the A-calculus are carried over into Anzen, with the same semantics.

              Reassignable       Non-reassignable
Mutating      var a: @mut <- 0   let b: @mut <- 0
Non-mutating  var c: @cst <- 0   let d: @cst <- 0

Table 7.1: Reference and value mutability

Each reference in Anzen is associated with a semantic type. The type checker allows for most types to be inferred automatically. For instance in Table 7.1, all variables have the semantic type Int. Nonetheless, type annotations can be added, either to disambiguate between different solutions, or purely for the sake of legibility. Semantic type correctness is checked statically, so a declaration like the following will be rejected during compilation:

Anzen
let pkmn: Int <- "Pikachu"
// ^^^ Compile-time error: 'pkmn' has type Int

Reassignability is also checked statically, meaning that reassigning a parameter or reference declared with let will be rejected during compilation:

Anzen
let pkmn1: String <- "Pikachu"
let pkmn2: String <- "Bulbasaur"
pkmn2 &- pkmn1
// ^^^ Compile-time error: 'pkmn2' is not reassignable

Memory mutability is checked dynamically, meaning that modifying an immutable object will trigger an error during execution:

Anzen
let pkmn: String <- "Pikachu"
pkmn := "Bulbasaur"
// ^^^ Runtime error: 'pkmn' is not mutating

The runtime system also keeps track of ownership and uniqueness to prevent access to uninitialized or dangling references. Memory garbage collection is carried out automatically by means of reference counting.

Anzen
var x <- 1
let y &- x
let z <- x
// ^^^ Runtime error: 'x' is borrowed
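To give an intuition of how such a dynamic check can be realized, here is a minimal sketch in Python (an illustration only, not Anzen's actual runtime; the Ref class and its methods are our own): each reference records how many aliases currently borrow it, and moving out of it is rejected while a loan is active.

```python
# Illustrative sketch of dynamic borrow checking (not Anzen's runtime).
class BorrowError(Exception):
    pass

class Ref:
    def __init__(self, value=None):
        self.value = value
        self.borrowers = 0          # number of live aliases on this reference

    def alias(self):
        # 'y &- x': create a borrowed alias, recording the loan
        self.borrowers += 1
        return self

    def move_out(self):
        # 'z <- x': moving is only legal while no alias is live
        if self.borrowers > 0:
            raise BorrowError("reference is borrowed")
        value, self.value = self.value, None
        return value

x = Ref(1)                          # var x <- 1
y = x.alias()                       # let y &- x
try:
    z = Ref(x.move_out())           # let z <- x
except BorrowError as e:
    print("Runtime error:", e)      # prints: Runtime error: reference is borrowed
```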

Anzen supports custom types in the form of structures (a.k.a. composite types). Those must be declared and referred to by name in variable declarations and function signatures. Structures' fields are references as well, so their declarations resemble those of variables, reusing the let and var keywords, as well as the @cst and @mut mutability qualifiers. Borrowing from object-oriented programming languages, methods, constructors and destructors can be declared along with a structure, and used to compartmentalize an object's behavior. Inside these functions, a reserved self reference allows access to the instance's internals.

Anzen
struct Pokemon {
  let name: String
  let level: @mut Int

  new(name: String, level: Int) {
    self.name <- name
    self.level <- level
  }
}

let pkmn <- Pokemon(name <- "Pikachu", level <- 5)

self is always a non-reassignable reference, but whether or not it designates an immutable object depends on the method's declaration. By default, methods cannot mutate the instance's internals via self, but their definition can be prefixed with mutating to turn self into a mutating reference. Value immutability is applied transitively. In other words, mutating a field of an immutable structure instance is an illegal operation, and provokes an error at runtime. Consequently, calling a mutating method on a reference that is not typed with @mut is forbidden.

Anzen
struct Pokemon {
  // ...
  mutating fun level_up() {
    self.level <- self.level + 1
  }
}

let pkmn <- Pokemon(name <- "Pikachu", level <- 5)
pkmn.level_up()
// ^^^ Runtime error: 'pkmn' is not mutating

Function and method signatures can be annotated to specify the expected ownership and mutability of arguments and return values. The @own qualifier denotes ownership and is assumed by default, while the @brw qualifier denotes borrowed references. Just as for variables and fields, the @cst qualifier denotes non-mutating references and is assumed by default, whereas the @mut qualifier denotes mutating references. Note that methods cannot return non-mutating references on mutating fields. While doing otherwise would not impede the enforcement of memory immutability, the restriction is imposed for the sake of code legibility. Indeed, because a non-mutating reference can assume the memory to which it refers remains immutable for its entire lifetime, any mutation of the aliased field would have to be forbidden. However, there is no way to "see" where the non-mutating borrowing took place, or where it ends, from within a method.

Anzen
struct Pokemon {
  // ...
  fun get_level() -> @brw @cst Int {
    return &- self.level
    // ^^^ Compile-time error: cannot form a non-
    //     mutating alias on a mutating field
  }
}

The language supports function overloading (i.e. different function implementations with the same name), and uses function signatures to determine which function should actually be called at compile-time. Generic types can be used to further increase code reusability. Functions and structures accept generic placeholders that can be used to type parameters and fields. Upon instantiation, a generic type is specialized by replacing its placeholders with concrete types.

Anzen
fun swap(a: @brw @mut T, b: @brw @mut T) {
  let tmp := a
  a := b
  b <- tmp
}

let x: @mut <- "Vaporeon"
let y: @mut <- "Jolteon"
swap(a &- x, b &- y) // <= swap is instantiated here
print(line &- x)
// Prints "Jolteon"

Anzen is a higher-order programming language, meaning that functions are not only first-class citizens, but may also capture references by closure. Any identifier that is neither a parameter nor a variable declared within the function's body is captured by closure. Captured identifiers are in fact borrowed references on the variables available in the function's declaration context. It follows that reassigning a variable captured by a function closure does not modify the borrowed reference inside the closure.

Anzen
var counter: @mut <- 0
fun next_number() -> Int {
  counter <- counter + 1
  return := counter
}

print(line &- next_number())
// Prints "1"

counter := 10
print(line &- next_number())
// Prints "11"

let counter2: @mut <- 100
counter &- counter2
print(line &- next_number())
// Prints "12"

A function is not allowed to create aliases on its arguments that live longer than its call. This mechanism is used to prevent methods accepting higher-order functions from accidentally leaking a non-mutating reference on a field.

Anzen
struct Pokemon {
  // ...
  fun do_with_level(fn: (x: @brw Int) -> Nothing) {
    fn(x &- self.level)
    // ^^^ The loan of 'self.level' is guaranteed not
    //     to persist after fn returns
  }
}

let pkmn <- Pokemon(name <- "Pikachu", level <- 5)
var i <- 0
pkmn.do_with_level(fn: fun (x: @brw Int) {
  i &- x
  // ^^^ Runtime error: 'x' cannot escape
})

7.3 Examples

We now return to the security breach of JDK 1.1 that we presented in Section 1.1.1, and present two alternative implementations in Anzen to further showcase its features. In the original Java implementation, the flaw was caused by the exposure of an object's internal representation through one of its methods, allowing attackers to modify said internal representation. We recall the offending code:

Java
public class Class {
    public Identity[] getSigners() {
        return this.signers;
    }
    private Identity[] signers;
}

Calling the method getSigners creates a mutable alias on the signers field, allowing one to manipulate the object's internal representation. There are two ways to prevent this situation in Anzen3. The first is to return a deep copy of signers's value by means of the mutation operator, thereby removing any potential alias to the internal representation.

Anzen
struct Class {
  fun get_signers() -> @mut List {
    return := self.signers
  }
  let signers: @mut List
}

let class <- Class()

// 's' is a copy of 'class.signers'
let s: @mut <- class.get_signers()
print(line <- s.count())

// Mutating 's' does not affect 'class.signers'
s.append(element <- Identity(name <- "Judas"))

The drawback of this first approach is that it involves a deep copy, potentially expensive in terms of time and/or space. Although optimization techniques such as copy-on-write [136] can alleviate this issue, a better angle would be to use a non-mutating alias. However, as returning non-mutating aliases on mutating fields is prohibited, one cannot simply return signers by alias. Instead, the solution is to rely on Anzen's support for higher-order functions to invert control, and accept a function that manipulates an immutable reference to the field. That way, the non-mutating alias no longer escapes the method's scope. As aliases to mutating fields cannot be passed to escaping parameters, mutability is guaranteed to be restored after the higher-order function returns.

3 Admittedly, Anzen does not currently support access modifiers (e.g. private), therefore the language is unable to forbid a direct access to the field. We only describe protection mechanisms on the method's return value.

Anzen
struct Class {
  fun do_with_signers(
    fn: (signers: @brw List))
  {
    fn(signers &- self.signers)
    // ^^^ temporarily lend 'self.signers' immutable
  }
  let signers: @mut List
}

Here, the drawback is that the higher-order function cannot return a value. One could capture an identifier by closure and mutate it, thereby using it as a sort of inout parameter, but such an approach is unnatural. A more elegant solution is to leverage generic types, so that do_with_signers can accept a higher-order function with any codomain, and simply forward its return value. As the higher-order function's codomain is annotated with @own (i.e. the default value), it is safe to use the move operator to forward results, without any copy overhead.

Anzen
struct Class {
  fun do_with_signers(
    fn: (signers: @brw List) -> @own R)
    -> R
  {
    return <- fn(signers &- self.signers)
  }
  let signers: @mut List
}

let class <- Class()
let count <- class.do_with_signers(
  fn <- fun (signers: List) {
    return <- signers.count()
  })
print(line <- count)

7.4 Anzen’s Intermediate Representation

While ASTs are the most widespread form of IR, virtually present in all compilers, modern toolchains often use other IRs. In particular, a growing interest is directed toward intermediate languages. An intermediate language is usually a lower-level programming language, better suited to static analysis and code optimization. In addition, intermediate languages can be leveraged to adopt a divide-and-conquer approach to the translation of high-level constructs. For instance, it is simpler to first translate iterators into simple loops, rather than going from iterators to machine code directly. Consequently, many modern compilers rely on an intermediate language (e.g. Swift, Rust or Julia, to cite a few). Anzen's compiler is no exception, and translates Anzen into AIR.

AIR is a low-level instruction set similar to assembly code, but one that preserves Anzen's assignment operators, together with a handful of high-level concepts such as structures and higher-order functions. Although the language is not meant for writing programs directly, it has a concrete syntax, described in Appendix B, that can aid in debugging Anzen's compiler, and that we will use to illustrate various examples.

The most significant difference between AIR and Anzen is that an expression cannot contain other sub-expressions. Instead, it must be decomposed into simpler instructions that store intermediate results into temporary registers. Like in LLVM, registers are SSA variables [47] (i.e. variables that are assigned exactly once) and represent memory locations that store references. In C terms, a register is a pointer to a pointer. The advantage is that, unlike a reference identifier in Anzen, a register in AIR uniquely designates one and only one reference. Because of this design, AIR code explicitly exposes how references are manipulated.
Example 7.4.1: Simple AIR

Consider the following Anzen program:

Anzen
1 fun id(x: Int) -> Int { return <- x }
2 id(x <- 42)

The function call at line 2 is transpiled to the following AIR code:

AIR
1 %1 = make_ref (Int) -> Int
2 bind @id_Fxi2i, %1
3 %2 = make_ref Int
4 move 42, %2
5 %3 = apply %1, %2

At line 1, the instruction allocates a new reference for a function value, and stores its address into register %1. Line 2 assigns the function id by alias to the reference stored at %1. Line 3 creates a second reference. Line 4 moves the value 42 into the reference stored at %2. Finally, line 5 applies the function stored at %1 with the argument stored at %2.

We can draw a parallel with the A-calculus' semantics. Recall that evaluation is described by inference rules concluding statements of the form C ⊢ t ⇓ l, C′, where t is a term and l the memory location to which it evaluates. In AIR, most instructions are of the form %x = inst, where inst is an instruction and %x the memory location of a reference that points to its result.

7.4.1 Memory Management

As mentioned above, AIR uses SSA registers to represent references. Those are created with the instruction make_ref, which returns the address of the newly allocated reference. Once allocated, the reference itself is a null pointer that has yet to be assigned to an initialized memory location, meaning that dereferencing it will obviously provoke an uninitialized memory error.

1 %1 = make_ref Int
2 // ...

Stack: %1 → 0x0    Heap: (empty)

Like in the A-calculus, memory for atomic literals is allocated when said literals are evaluated. Note that Anzen, and by extension AIR, does not treat atomic values (e.g. numbers and booleans) differently from other objects. Therefore they are also allocated on the heap. Function literals, however, are treated a little differently. First-order functions are represented as global objects, and live as long as the program. Consequently, moving them is in fact forbidden, and silently interpreted as an alias creation. Higher-order functions, on the other hand, are actual objects, which are allocated by the partial_apply instruction. This mechanism is described later.

Instances of structures are allocated with the alloc instruction, which allocates a sequence of pointers that will be used to represent references to each member. This instruction only appears in constructors, and corresponds to the allocation of the self reference.

1 %1 = alloc Pair
2 // ...

Stack: %1 → 0x1    Heap: 0x1 → 0x0, 0x2 → 0x0

Field dereferencing is carried out with the extract instruction. It takes as arguments a reference to a structure instance and the offset of the field.

1 %1 = alloc Pair
2 %2 = extract %1, 1
3 move 42, %2

Stack: %1 → 0x1, %2 → 0x2    Heap: 0x1 → 0x0, 0x2 → 0x3, 0x3 = 42

Registers' lifetimes are bound to the call frame in which they are defined. Consequently, a reference is automatically dropped every time the function that created it returns. Heap memory is managed using reference counting [142].
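As an illustration of the reference-counting scheme (a sketch under assumed semantics, not the actual runtime code; the Cell type and retain/release helpers are ours), a heap cell can carry a counter that is incremented when a new reference to it is created and decremented when one is dropped; the storage is reclaimed when the counter reaches zero:

```python
# Illustrative sketch of reference counting for heap cells.
class Cell:
    def __init__(self, value):
        self.value = value
        self.count = 1               # one owner upon allocation

def retain(cell):
    # A new reference to the cell is created (e.g. an alias).
    cell.count += 1
    return cell

def release(cell, heap):
    # A reference is dropped (e.g. its call frame is popped).
    cell.count -= 1
    if cell.count == 0:
        heap.discard(cell)           # storage reclaimed at count zero

heap = set()
c = Cell(42)
heap.add(c)
retain(c)                # second reference created
release(c, heap)         # second reference dropped
assert c in heap         # one reference left: cell survives
release(c, heap)         # last reference dropped
assert c not in heap     # cell reclaimed
```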

7.4.2 Assignments

AIR features one instruction for each assignment operator in Anzen. Note that their operands appear in reverse order, so that the right operand is indicated before the left one. For instance, move 42, %1 reads as "move the value 42 into the reference %1". Aliasing assignments are represented by bind. Mutation assignments are represented by copy.

1 // ...
2 move 42, %1
3 bind %1, %2
4 copy 21, %1

Stack: %1 → 0x1, %2 → 0x1    Heap: 0x1 = 21
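The pointer-to-pointer reading of registers makes the three instructions easy to contrast. The following Python sketch (illustrative only; the encoding of registers and the heap is our own, not AIR's implementation) models a register as a slot holding an address, and a heap cell as the value that address designates:

```python
# Illustrative model of AIR's three assignment instructions:
# a register holds a reference (an address), and the reference
# points to a heap value -- a pointer to a pointer.
heap = {}            # address -> value
regs = {}            # register name -> address
next_addr = [1]

def alloc(value):
    addr = next_addr[0]; next_addr[0] += 1
    heap[addr] = value
    return addr

def move(value, dst):
    # 'move v, %r': put v in fresh memory and point %r's reference at it
    regs[dst] = alloc(value)

def bind(src, dst):
    # 'bind %s, %d': make both registers designate the same reference
    regs[dst] = regs[src]

def copy(value, dst):
    # 'copy v, %r': overwrite the memory %r already refers to
    heap[regs[dst]] = value

move(42, "%1")             # %1 -> fresh cell holding 42
bind("%1", "%2")           # %2 aliases %1's reference
copy(21, "%1")             # in-place mutation: both registers now see 21
print(heap[regs["%2"]])    # prints: 21
```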

7.4.3 Routines

Every instruction in AIR is executed in a routine. In fact, code written outside of any function in Anzen is implicitly declared in the "main" routine in AIR, except for type and function declarations, which are defined separately. AIR routines are not first-class values, although they can be referred to by their address, like in C. The following listing shows the AIR of a simple "Hello, World!" program (i.e. print(line <- "Hello, World!") in Anzen):

AIR
fun @main : () -> Nothing {
entry#0:
  %1 = make_ref (Anything) -> Nothing
  bind @__builtin_print, %1
  %2 = make_ref String
  copy "Hello, World!", %2
  %3 = apply %1, %2
  jump exit#0
exit#0:
  ret
}

In memory, a routine is represented as a type signature and a body, which consists of a sequence of instruction blocks, labeled by a name. A well-formed routine necessarily contains at least two blocks, labeled entry#0 and exit#0, that denote the entry and exit points of the routine, respectively. Naturally, entry#0 is always the first block and exit#0 the last one. By default, all instructions of a block are executed in order until the end of the block, and the jump instruction allows jumping from one block to another. An alternative branch instruction allows for conditional jumps. Except for the exit#0 block, which should always contain a single ret instruction, the last instruction of each block must be a jump, otherwise the runtime behavior is undefined.

Functions with empty closures are called thin, and can be simply represented as routine pointers. As seen in the example above, applying a function consists in performing all parameter assignments before the corresponding references are passed to the routine, using the apply instruction. This triggers the runtime to push a new stack frame and jump into the called routine's entry#0 block. Parameters are received into local registers, starting from %1.

Functions with closures are called thick, and need to be instantiated with the partial_apply instruction. It accepts a reference to a routine followed by a sequence of references for the closure, and returns a reference to a structure instance representing a higher-order function4, that can be fed to apply.

Example 7.4.2: Simple AIR

Consider the following Anzen program:

Anzen
let counter: @mut <- 0
fun next_number() -> Int {
  counter <- counter + 1
  return := counter
}
let x <- next_number()

The above program is transpiled to the following AIR code:

4 The reader will notice this process essentially implements defunctionalization [120].

AIR
fun @next_number_F2i : (Int) -> Int {
  //...
}

fun @main : () -> Nothing {
entry#0:
  %1 = make_ref Int
  move 0, %1
  %2 = make_ref Int
  %3 = make_ref () -> Int
  %4 = partial_apply @next_number_F2i, %1
  bind %4, %3
  %5 = apply %3
  move %5, %2
  jump exit#0
exit#0:
  ret
}

Recall that Anzen supports overloading and generic types. In AIR however, each version of an overloaded function is given a unique name, obtained by appending its signature. This technique is usually referred to as name mangling in other compilers. Generic functions require more attention; two techniques are used. The first consists in generating a monomorphized version of a generic function every time a particular specialization is needed, similar to how code for C++ templates is generated. In other words, each reference to a generic function instructs the compiler to clone its body, substituting concrete types for the generic placeholders. One drawback of this approach is that it may produce very large AIR units in the presence of highly generic code. Furthermore, and more importantly, monomorphization requires that the body of a called function be known during compilation, thus impeding the separate processing of each unit. To solve these issues, a second technique is to replace genericity with polymorphism, relying on boxing and virtual tables [57, Chapter 3] for function and method dispatching. Implementation details are described in [118]. Although monomorphization is preferred for performance reasons [56], it can only occur under two conditions. First, the generic function must be declared in the same compilation unit, so that the compiler can have access to its body during AIR generation. Second, the monomorphization of a function must not lead to another monomorphization of itself with a different type. The second constraint guarantees that the amount of code duplication is finite.

7.5 Anzen's Compiler

The Anzen compiler is written in the Swift programming language, and distributed as an executable for macOS and Linux. The compiler's architecture does not deviate from the usual design schematized in Figure 7.1.1. The source code is split across separate libraries with minimal interdependencies:

AST This library is the largest and most essential component of the compiler. It contains all classes and interfaces related to the AST (i.e. the AST.Node class) and annotations thereof.

Parser This library implements Anzen's parsing. The most important parts of its public API are the Lexer and Parser classes. The former breaks a sequence of characters into individual tokens, and the latter turns a sequence of tokens into an AST.

Sema This library implements the semantic analysis, which mostly relates to (semantic) type inference. It provides several classes representing distinct “passes” of the semantic analysis, in the form of AST visitors [65]. Further details are discussed below.

AnzenIR This library contains the code related to the translation of a fully anno- tated AST into AIR. It exposes a class AIRBuilder to ease the creation of a well-formed AIR program.

Interpreter This library simply offers a class Interpreter, that can execute an AIR program.

We will not delve into the details of the parsing task, as it is relatively straightforward. The implementation consists of a handwritten recursive descent parser that directly produces an AST, but keeps information about the concrete syntax in the form of annotations. This makes it possible to identify which part of the input source corresponds to a given node in the AST, which is paramount for producing meaningful error messages. Figure 7.5.2 depicts the Anzen AST for a program that prints "Hello, World!" on the console.
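The idea of annotating each node with its source range during descent can be sketched as follows (illustrative Python, not the actual Swift implementation; the toy grammar, Node class and parse_call function are our own). Ranges are 0-based character offsets here, whereas the compiler reports line:column positions.

```python
# Illustrative sketch: a recursive-descent step that builds AST nodes
# annotated with the source range they were parsed from.
import re

class Node:
    def __init__(self, kind, children, start, end):
        self.kind, self.children = kind, children
        self.range = (start, end)     # character offsets in the source

def parse_call(src, pos=0):
    # toy grammar: call := ident '(' string ')'
    m = re.match(r'(\w+)\(("[^"]*")\)', src[pos:])
    ident = Node("ident", [m.group(1)], pos, pos + len(m.group(1)))
    arg_start = pos + m.start(2)
    arg = Node("string_lit", [m.group(2)], arg_start, arg_start + len(m.group(2)))
    return Node("call", [ident, arg], pos, pos + m.end())

ast = parse_call('print("Hello, World!")')
assert ast.kind == "call"
assert ast.range == (0, 22)           # the whole call spans 22 characters
```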

7.5.1 Semantic Analysis

The main objective of the semantic analysis is to perform type inference. The module takes an untyped AST as input, and either outputs a fully typed AST or produces a set of compilation errors. Semantic analysis is decomposed into three stages. Each stage is implemented as an AST visitor [65], attaching information to the AST by means of node annotations.

[Figure 7.5.2: AST of the program print("Hello, World!"). Rounded rectangles represent AST nodes, while squares represent leaves. Annotations drawn in orange contain the position of each node in the source.]

Symbol Creation In this stage, nodes that delimit a lexical scope (e.g. function declarations) are associated with a scope object. Additionally, a unique symbol is associated with each identifier declaration, which will be used in the subsequent stages to resolve name shadowing and function overloading.

Name Binding In this stage, reference and type identifiers are attached to the scope that declares them, effectively linking identifiers to their declaration.

Type Inference In this stage, the AST is scanned to build a constraint system, which is solved by deduction. The resulting solution is used to annotate each node with its type, concluding the semantic analysis. This stage is the most complex, in particular with respect to error reporting.

Each stage can produce a different set of errors, as depicted in Figure 7.5.3. The compiler does not stop at the first error encountered, but instead attempts to progress further into the semantic analysis, so as to provide more thorough error reports. For instance, the detection of an undefined identifier does not stop the compiler from trying to complete the second stage, so that other undefined identifiers might be found and reported at the same time. As each node of the AST is annotated with its corresponding range in the source, the compiler can accurately report at which line and column errors from the first two stages occur. Typing errors are, however, more difficult to report.
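This recovery strategy can be sketched as follows (illustrative Python; the Diagnostics type and bind_names function are hypothetical, not part of the actual compiler): each pass records diagnostics instead of aborting, so several errors can be reported from a single run.

```python
# Illustrative sketch: accumulate diagnostics instead of stopping at
# the first error, so one compiler run reports as much as possible.
class Diagnostics:
    def __init__(self):
        self.errors = []
    def report(self, range_, message):
        self.errors.append((range_, message))   # ((line, col), message)

def bind_names(identifiers, scope, diags):
    # Continue past undefined identifiers so all of them get reported.
    for rng, name in identifiers:
        if name not in scope:
            diags.report(rng, f"undefined identifier '{name}'")

diags = Diagnostics()
bind_names([((1, 1), "foo"), ((2, 5), "bar")], scope={"print"}, diags=diags)
for (line, col), msg in diags.errors:
    print(f"{line}:{col}: error: {msg}")        # both errors in one run
```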

Symbol Creation          → Duplicate declarations
Name Binding             → Undefined symbols
Type Inference:
  Constraints Extraction → Invalid type annotations
  Constraints Solving    → Type errors
  Static Dispatch        → Ambiguous expressions

Figure 7.5.3: Possible errors encountered at each stage of the semantic analysis.

7.5.2 Type Inference

Type inference, the third stage of semantic analysis, poses a number of issues. In particular, overloading implies that there cannot be a simple one-to-one mapping between an identifier and its type. The use of unique symbols partially solves the issue, but generic types add yet another layer of complexity, as these must be specialized in context.

Example 7.5.1: Inference with overloading and specialization

Consider the following program, overloading a reference f with three definitions, one of which being generic.

Anzen
1 fun f(x: Int, y: Int) -> Bool { /*...*/ }
2 fun f(x: Int, y: String) -> Int { /*...*/ }
3 fun f(x: T, y: String) -> T { /*...*/ }
4
5 let a <- f(x <- 2, y <- 2)
6 let b <- f(x <- "R2D2", y <- "C3P0")
7 let c <- f(x <- 0, y <- "BB8")

Type inference has to distinguish between these cases based on the type of the arguments that are passed for the parameters x and y . Line 5 only matches the first definition, and is therefore dispatched accordingly. Line 6 only matches the third definition but requires a specialization. Line 7 is ambiguous because two different definitions match f in that context.

The Anzen compiler overcomes this issue by solving a constraint system. Every expression is first associated with a unique type variable, before the AST is scanned to extract constraints between these type variables. Five kinds of type constraints can be added to the system:

• Equality constraints are added to indicate that two type variables must be equal. This corresponds to unification.

• Conformance constraints denote a relaxed notion of equality, prescribing that one type's interface might be compatible with another5.

• Construction constraints require that one type be the signature of a constructor for the other.

• Member constraints: let T and U be two types; a member constraint requires that T be a composite type, with a member whose type is equal to U.

• Disjunction constraints represent a choice between different ways to type an expression.

Once built, the constraint system can be solved by breaking constraints into groups of smaller constraints, until only equality constraints remain to be solved. Those are in fact solved by unification. The process is bootstrapped by fixing some of the variables, when the type of the corresponding expression can be trivially inferred (e.g. integer literals have type Int).

Example 7.5.2: Type constraint system

Consider the following declaration:

Anzen
let x <- 9 + 3

Each expression is first associated with a type variable. For instance:

x ↦ α, 9 ↦ β, 3 ↦ γ

5 As of this writing, such a constraint can only be solved if the two types are equal, or if one is Anything, a type that can represent an instance of any type.

Then, the AST is scanned to extract typing constraints between these variables, based on the context in which the corresponding expressions are used. In this particular example, this will result in the following constraints:

α = δ ∧ β.+ = γ → δ ∧ β = Int ∧ γ = Int

As the system is solved, the second constraint (i.e. β.+ = γ → δ) is broken into an equality with the type of the built-in method Int.+ (i.e. Int → Int)ᵃ, so that α, the type of x, is eventually unified with Int.

ᵃIn Anzen, infix operators are implemented as methods of the left operand.
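To make the breaking-and-unification step concrete, here is a minimal sketch in Python (not the compiler's actual implementation, which is written in Swift; the representation of types and the helper names are hypothetical) that solves the example's constraint system:

```python
# Minimal unification over equality constraints, in the spirit of the
# system above. Representation is illustrative only.

class TypeVar:
    def __init__(self, name): self.name = name
    def __repr__(self): return self.name

def walk(t, subst):
    # Follow substitution chains until a non-variable (or unbound var) is found.
    while isinstance(t, TypeVar) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, TypeVar):
        return {**subst, a: b}
    if isinstance(b, TypeVar):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        # Function types as ('fun', domain, codomain): unify component-wise.
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

alpha, beta, gamma, delta = TypeVar('α'), TypeVar('β'), TypeVar('γ'), TypeVar('δ')
s = {}
s = unify(alpha, delta, s)    # α = δ
s = unify(beta, 'Int', s)     # β = Int
s = unify(gamma, 'Int', s)    # γ = Int
# The member constraint β.+ = γ → δ is broken into an equality with the
# type of the built-in method Int.+, i.e. Int → Int.
s = unify(('fun', 'Int', 'Int'), ('fun', gamma, delta), s)
print(walk(alpha, s))   # α is eventually unified with Int
```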

While this approach is simple, its main drawback relates to error reporting. If one of the broken pieces of a larger constraint is unsatisfiable, it is difficult to trace the error back to the point in the source code from which the original constraint was extracted. Consider for instance a constraint aiming to unify two function types Int → Int = Bool → Int. The constraint is eventually broken into two equality constraints, one of which attempts to unify the domains. While that piece is certainly unsatisfiable, an inability to trace the error back to the original, larger constraint results in poor error reporting. The solution we adopt is to attach to each constraint a log keeping a record of the reasons that brought it into existence. This information can then be used to create accurate error reports.

If all type constraints can be solved, the last step is to replace the type variables associated with each expression with actual concrete types. This process is straightforward if there is only a single solution to the constraint system. Otherwise, the compiler attempts to select the solution that requires the fewest generic specializations; in plain English, non-generic types are preferred. In Example 7.5.1 for instance, the type system chooses the second function definition rather than the third one: while both could be applied, using the former requires one less generic specialization. Unfortunately, this criterion is not sufficient to eliminate all ambiguous situations. In these cases, the type system gives up and reports the ambiguity.
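A minimal sketch of this logging scheme, with hypothetical names and a toy representation of constraints (the actual compiler records richer source information):

```python
# Sketch: each constraint carries a log of the reasons that brought it into
# existence, so a failure deep in the solver can be traced back to the
# original constraint. Names and the source location are illustrative only.

class Constraint:
    def __init__(self, lhs, rhs, log):
        self.lhs, self.rhs, self.log = lhs, rhs, log

    def broken_into(self, lhs, rhs, reason):
        # Derived constraints inherit their parent's log, plus the new reason.
        return Constraint(lhs, rhs, self.log + [reason])

# Unifying Int → Int with Bool → Int: break into domain/codomain equalities.
parent = Constraint(('Int', 'Int'), ('Bool', 'Int'),
                    log=['call to `f` at line 12'])
domains = parent.broken_into('Int', 'Bool', 'matching function domains')
if domains.lhs != domains.rhs:
    # The report points back at the source of the *original* constraint.
    print('type error:', domains.lhs, '!=', domains.rhs, '|', domains.log)
```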

7.5.3 AIR Generation

If semantic analysis succeeds, the final step is to transform the fully annotated AST into an AIR program. This is achieved by means of an AST visitor, called the AIR emitter, that walks down the tree and generates the AIR code corresponding to each node visited. Although AIR generation comes with its fair share of implementation challenges, those are quite unremarkable and are therefore not detailed in this document. Roughly speaking, the translation consists in allocating an SSA register for each expression; these registers are then passed as operands to AIR instructions.
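The translation scheme can be sketched as follows; the visitor, register format and instruction mnemonics are illustrative assumptions, not actual AIR:

```python
# Sketch of an AIR-style emitter: an AST visitor that allocates a fresh SSA
# register for each expression and emits assembly-like instructions taking
# those registers as operands.

class Emitter:
    def __init__(self):
        self.next_register = 0
        self.code = []

    def fresh(self):
        # Each expression gets its own SSA register, assigned exactly once.
        r = f'%{self.next_register}'
        self.next_register += 1
        return r

    def visit(self, node):
        if isinstance(node, int):                 # integer literal
            r = self.fresh()
            self.code.append(f'{r} = make_ref Int {node}')
            return r
        op, lhs, rhs = node                       # binary expression
        rl, rr = self.visit(lhs), self.visit(rhs)
        r = self.fresh()
        self.code.append(f'{r} = apply {op} {rl} {rr}')
        return r

e = Emitter()
e.visit(('+', 9, 3))          # e.g. the expression 9 + 3
print('\n'.join(e.code))
```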

7.5.4 Limitations

Though in its current state the Anzen language and its compiler can be used to write simple programs, both suffer from limitations in terms of performance and expressiveness that leave Anzen unsuitable for writing complex applications. Three of these limitations are discussed below.

Performance

No optimizations are currently performed at any stage of the compilation. Besides, the language can only be interpreted from its intermediate representation, further increasing the execution overhead. As a result, Anzen programs execute extremely slowly, even compared to other interpreted languages, with experiments showing Anzen running up to 90% slower than Python. This is obviously unacceptable for any actual application.

There are countless language-agnostic code optimization techniques that could improve this performance. Moreover, generating actual machine code from AIR would evidently yield a significant performance increase as well. One potential lead toward both goals is to translate AIR into LLVM IR, in order to benefit from a large collection of proven optimization techniques.

Besides the absence of even simple code optimizations, one of the causes of Anzen's poor performance is that AIR statements induce a lot of memory traffic. As each value is represented as a reference, assigning and dereferencing variables involves many expensive pointer indirections. Representing all values as heap-allocated objects allows for a simple implementation of the assignment operators, which does not have to take operand types into account. However, as Anzen is statically typed and since the expected semantics of every reference assignment is known, the compiler could promote heap allocations to stack allocations when the aliasing semantics is either unneeded or contained locally.
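A rough sketch of such a promotion criterion, assuming a hypothetical per-reference summary of how it is used (this is not part of the current compiler):

```python
# Sketch of a heap-to-stack promotion check: a reference whose aliasing
# semantics is never needed outside its scope could be stack-allocated.
# The use categories ('copy', 'move', 'local_alias', 'escaping_alias')
# are an illustrative abstraction, not actual compiler data.

def can_promote_to_stack(uses):
    # Any aliasing that escapes the local scope forces heap allocation;
    # copies, moves and purely local aliases are safe to promote.
    return all(u in ('copy', 'move', 'local_alias') for u in uses)

print(can_promote_to_stack(['copy', 'move']))            # stack candidate
print(can_promote_to_stack(['copy', 'escaping_alias']))  # must stay on the heap
```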

Debugging and Error Reporting

Troubleshooting Anzen programs is currently very difficult. For instance, type errors are reported as a set of unsolvable constraints, with the locations in the source from which they originate, which can certainly appear cryptic to the untrained eye.

Debugging also requires improvements. The current approach consists in inserting breakpoints into the interpreter's source to halt an AIR program's evaluation. The interpreter's structure is simple enough that the state of its memory registers can be inspected easily with a Swift debugger (e.g. Xcode [10]). Figure 7.5.4 illustrates this approach. However, controlling execution is difficult, and requires some knowledge about the interpreter's implementation. Furthermore, although the location in Anzen's source from which a particular AIR instruction originates can be retrieved from its metadata, consulting it is inconvenient at best.

Figure 7.5.4: Inspection of the AIR interpreter’s state in Xcode.

Lack of Module Support

Anzen has no support for module imports, meaning that all code must be provided in a single source file. This significantly hinders code reusability, in particular in a collaborative environment.

Supporting module imports is also paramount to providing a standard library. A standard library is a collection of types and functions that are commonly used in all sorts of programs; for instance, most modern programming languages offer built-in types to represent lists, sets and tables. Apart from a few built-in types, Anzen does not provide any of these tools, hence requiring every program to redefine all common data structures.

Several steps have already been taken toward the preparation of a module importing system. ASTs can be annotated with type definitions defined externally; in fact, built-in types are currently "imported" from a built-in module definition in the current implementation.

7.6 Summary

The chapter introduces Anzen, a general purpose programming language based on the A-calculus' semantics. Anzen combines the three assignment operators we have studied in the previous chapters with high-level programming concepts. Anzen's compiler transpiles programs into an intermediate language called AIR, an assembly-like language reminiscent of LLVM IR that uses SSA variables to represent references. This IR's operational semantics is almost in one-to-one correspondence with that of the A-calculus, as temporary registers (i.e. SSA variables) correspond to the memory locations computed by the formal semantics.

We have presented Anzen and its features, described its intermediate representation and discussed its compiler's implementation.

Chapter 8

Conclusion

This final chapter concludes with a summary of the work that has been presented in this thesis, and outlines future research and development on the A-calculus and Anzen.

8.1 Summary of our Contributions

Although decades of work in research and academia have yielded hundreds of impressive tools to model, test and analyze code, programming languages are still the first weapon in a developer's arsenal for writing correct code. Good abstractions, clear syntaxes and intuitive semantics are at least as important as good tooling for software development. Thanks to the continuing growth in computing power, modern compilers are now able to process extremely sophisticated languages that constantly raise the abstraction level closer to the human and farther from the machine. However, while these advances have undeniably contributed to making code simpler to write and clearer to read, relics of the underlying model still transpire in most languages' semantics, revealing cracks on the surface of the abstractions.

This problem is particularly pervasive in imperative systems, where the relation between variables and the values stored in memory is often confusing. Although it is tempting to see them as interchangeable notions, values are semantic objects that live in memory, whereas variables are syntactic tools to describe operations on them. Furthermore, there is not necessarily a one-to-one relationship between the two. Consequently, foreseeing the impact of a modification described over a variable can prove difficult. It is therefore of paramount importance that programming languages be defined with a syntax and semantics that leave no doubt about the effect of an instruction. Unfortunately, most mainstream programming languages hardly satisfy this need, because their assignment semantics is set at a lower abstraction level than the other concepts they have to offer.


This thesis revisits memory assignment semantics in the context of imperative programming languages, with the objective of providing a universal framework to reason about imperative systems, that can be used to model existing programming languages and design new ones. Its contributions are summarized below:

A computational model for imperative languages
Our main contribution is the A-calculus (pronounced assignment calculus), a formal model that aims at unifying imperative assignment semantics. Unlike other calculi formalizing assignment, the A-calculus does not attempt to abstract memory away, but rather gives a more precise definition of the relationship between variables and memory. This is achieved by means of three primitives representing the assignment semantics commonly found in imperative languages: one describes aliasing, another describes cloning, and the last relates to assignments that preserve uniqueness.

A formalization of memory errors
Using the A-calculus' operational semantics, common memory safety problems are formally described, namely uninitialized memory errors, use-after-free errors, doubly-freed pointers and memory leaks. Two approaches to detect them are proposed: one is based on a post-mortem examination of failed executions, whereas the other instruments the semantics to model errors explicitly.

A static type system for memory safety
This research also explores aliasing control mechanisms for memory and data safety in the context of the A-calculus. We first propose a type system to support static garbage collection, based on ownership and uniqueness to identify safe deallocation points. A second type system is suggested to bring immutability, while relaxing the usual single-writer constraints enforced in related approaches.

A programming language
Finally, this thesis introduces Anzen, a general purpose programming language that aims at validating the practicality of the A-calculus as an actual programming language. Anzen combines the A-calculus' semantics with high-level programming concepts, such as higher-order functions and generic types, and enforces aliasing control mechanisms at runtime. A compiler and an interpreter are presented: the former translates Anzen sources into an intermediate representation that is as close as possible to the A-calculus, while the latter executes it.

8.2 Critique

The A-calculus aims at being a minimal yet universal framework to formally define assignment semantics. Consequently, much effort has been put into removing all non-essential features. For instance, the language originally supported higher-order functions. However, their formalization brought much complexity to handle closures, which overlapped with composite types. While their removal ultimately proved beneficial in reducing the language's size, the model remains arguably complex, in particular because of its treatment of first-class functions. With some hindsight, opting for a computational model derived from the λ-calculus to describe imperative languages is a debatable choice. Most of the λ-calculus' elegance relies on the simplicity of β-conversion, from which our model has to walk away completely. Nonetheless, the renewed interest in functional programming in mainstream languages convinces us that first-class functions are worth the complexity.

Memory and data safety is very difficult to achieve in the A-calculus, because the model does not naturally distinguish between aliases and references. Other statically typed languages (e.g. Rust, C++, Swift) usually fix this as flow-insensitive information. Our type system attempts to distance itself from this approach as much as possible, but at the cost of much annotation overhead. Forcing constraints onto an a priori more permissive system is a questionable approach, because the reasons for such constraints have to be explained and understood.

Although this thesis describes a static approach to achieving memory safety, our implementation of Anzen performs type checking at runtime, and therefore deviates from the formal definitions presented in this document. While both type systems share many similarities, all mechanisms aimed at statically tracking type capabilities beyond function boundaries and conditional terms are omitted from our implementation.

Finally, Anzen's compiler is quite a complex machine. In particular, bringing generic types and type inference to Anzen has proven a formidable challenge. In retrospect, settling for either a simpler language or a less powerful type inference engine would have taken far less time, and given more opportunity to conduct practical evaluations.

8.3 Future Directions

The research presented in this thesis suggests a number of appealing future directions. We briefly discuss some of them below.

Concurrency
The A-calculus only models single-threaded programs, whereas real programming languages usually offer primitives for concurrency. Concurrency has been a topic of interest for many years and has been extensively researched. A number of language calculi have been introduced for studying various models of concurrency, including highly influential proposals such as CSP [77] and the π-calculus [122]. More recent efforts have been directed towards a unification of these models [1]. Other approaches have proposed to extend existing object calculi, such as Featherweight Java, with support for concurrency (e.g. [113, 36]). A natural future direction of our research is to determine which of these approaches would be best suited for extending the A-calculus with concurrency.

Refine the Type System
The type system we have presented for the A-calculus is opinionated, whereas the language aims to unify imperative assignment semantics. In fact, most of the type constraints we proposed relate to the constraints of static garbage collection. Consequently, one of the future directions we would like to take is to continue studying how aliasing and memory control mechanisms can be applied to the A-calculus. Of particular interest is to identify a relationship between language constructs and safety properties. For instance, we showed in Section 6.4 that constructors and destructors were needed to support static garbage collection.

Explicit memory management invites a number of challenges, because it expands the number of possibly unsafe situations. As it stands, our type system guarantees that a reference cannot be allocated but left uninitialized, because allocation and initialization are viewed as a single operation. This property implies that owning references are necessarily safe to dereference. However, it also entails definite initialization, which fails to model how certain languages operate (e.g. C), and is one of the main obstacles to the support of records, as discussed in Section 6.4. Definite initialization could instead be represented as an additional flow-sensitive capability, required to type check dereferencing. A similar approach is taken in the Cyclone language [84].

Dynamic garbage collection is a feature present in most contemporary programming languages, including Anzen. While assuming that deallocations are properly handled at runtime would certainly simplify our type system, as one could elide lifetime annotations and most of the restrictions on move operations, other statically verifiable properties could be valuable. For example, region types [26] could be used to detect and/or resolve circular references, and prevent strong reference cycles.

Lastly, the type system could be enhanced with additional features in numerous ways. One obvious extension candidate is immutability, which we suggested in Section 6.5. Other directions include the vast body of research dedicated to aliasing control mechanisms [39].

Model Checking
One objective of the A-calculus is to model existing programming languages, by translating programs into a simpler and formally defined form. From there, an interesting enterprise is to leverage model checking techniques to verify properties. Ideally, the translation from a high-level language to the A-calculus should be automated, and fed to a model checker without manual intervention. Although state space explosion is certainly a limiting factor for such an approach, program annotations and static analysis could be used to create predicate abstractions [11] and reduce the search space.

Anzen
We already mentioned a few of the limitations of Anzen's compiler in Section 7.5. One relates to the extremely poor performance of its interpreter. An obvious future work is the implementation of standard code optimizations, such as constant folding and propagation, common subexpression elimination or dead store elimination [70, Chapter 9]. Discussions with other compiler implementors suggest translating Anzen or its IR into another actual programming language, in order to rely on the latter's compiler for optimizations. A promising lead in that direction is to generate LLVM IR [93] code. LLVM IR is another SSA language, which also provides virtual registers to handle mutable state, and features powerful optimization techniques (e.g. iterated dominance frontier computation) to remove unnecessary memory traffic.

Anzen also lacks application reports to assess its practicality for writing actual programs. A standard library and/or support for modules are probably required to conduct such an evaluation.
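As an illustration of the kind of optimization meant here, a minimal constant folding and propagation pass over a toy three-address IR (the IR shape is hypothetical, not actual AIR):

```python
# Sketch of constant folding and propagation: registers known to hold
# constants are substituted into later instructions, and arithmetic on
# constants is evaluated at compile time.

def fold(instructions):
    env = {}    # registers known to hold constant values
    out = []
    for dst, op, args in instructions:
        vals = [env.get(a, a) for a in args]        # propagate known constants
        if op == 'const':
            env[dst] = vals[0]
        elif op == 'add' and all(isinstance(v, int) for v in vals):
            env[dst] = vals[0] + vals[1]            # fold at compile time
        else:
            out.append((dst, op, vals))             # leave as-is
            continue
        out.append((dst, 'const', [env[dst]]))
    return out

prog = [('%0', 'const', [9]), ('%1', 'const', [3]), ('%2', 'add', ['%0', '%1'])]
print(fold(prog))   # %2 is folded to the constant 12
```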

8.4 The Holy War of Programming Languages

No software developer is a stranger to the Holy War of Programming Languages. Everyone has their favorite, and even the most mature of us cannot, at least occasionally, refrain from looking down on others' choices. I am no exception. I started this thesis convinced Python was the best language in the world. I grew tired of runtime errors and turned to C++. I grew tired of memory errors and turned to Swift. I grew tired of confusing assignment semantics and decided to write my own. A few years later I succeeded, but cannot help lamenting the many, many areas where Anzen falls short compared to others.

This journey convinced me that, in some respects, all languages are bad. They have unsound type systems, leaky abstractions, horrendous syntax or even worse semantics. Therefore there is no best language; there are only ones that are "less bad".

In spite of this grim and cynical view, I cannot deny that languages are getting better. More code is written every day than the day before, for applications that are getting more and more complex. Evidently, less bad is good enough. This means that the quest for the holy programming language, savior of us all, is probably futile after all. With some hindsight, this mimics the state of affairs in nearly all scientific domains, in the sense that theories of everything are rarer than a unicorn. Instead, one should pursue with love and passion the discovery of small increments, which slowly but surely will help make sense of seemingly intractable predicaments. I certainly love my assignment operators, and by writing this thesis I hope I have convinced the reader that they constitute one of these small increments, making Anzen less bad than other languages, if not making it good.

Bibliography

[1] Umut A. Acar, Arthur Charguéraud, Mike Rainey, and Filip Sieczkowski. Dag-calculus: a calculus for parallel computation. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming, ICFP 2016, Nara, Japan, September 18-22, 2016, pages 18–32, 2016.

[2] Amal J. Ahmed, Matthew Fluet, and Greg Morrisett. A step-indexed model of substructural state. In Proceedings of the 10th ACM SIGPLAN International Conference on Functional Programming, ICFP 2005, Tallinn, Estonia, September 26-28, 2005, pages 78–91, 2005.

[3] Jonathan Aldrich, Robert Bocchino, Ronald Garcia, Mark Hahnenberg, Manuel Mohr, Karl Naden, Darpan Saini, Sven Stork, Joshua Sunshine, Éric Tanter, and Roger Wolff. Plaid: a permission-based programming language. In Companion to the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2011, part of SPLASH 2011, Portland, OR, USA, October 22-27, 2011, pages 183–184, 2011.

[4] Jonathan Aldrich and Craig Chambers. Ownership domains: Separating aliasing policy from mechanism. In ECOOP 2004 - Object-Oriented Programming, 18th European Conference, Oslo, Norway, June 14-18, 2004, Proceedings, pages 1–25, 2004.

[5] Jonathan Aldrich, Valentin Kostadinov, and Craig Chambers. Alias annotations for program understanding. In Proceedings of the 2002 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, OOPSLA 2002, Seattle, Washington, USA, November 4-8, 2002, pages 311–330, 2002.

[6] Frances E. Allen. Control flow analysis. In Proceedings of a Symposium on Compiler Optimization, pages 1–19, New York, NY, USA, 1970. ACM.

[7] Nada Amin and Ross Tate. Java and Scala's type systems are unsound: the existential crisis of null pointers. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016, pages 838–848, 2016.

[8] Paul Ammann and Jeff Offutt. Introduction to software testing. Cambridge University Press, 2008.

[9] Apple. Swift compiler. https://github.com/apple/swift, 2019.

[10] Apple. Xcode. https://developer.apple.com/support/xcode/, 2019.

[11] Thomas Ball, Rupak Majumdar, Todd D. Millstein, and Sriram K. Rajamani. Automatic predicate abstraction of C programs. In Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Snowbird, Utah, USA, June 20-22, 2001, pages 203–213, 2001.

[12] Jiri Barnat, Lubos Brim, and Petr Rockai. Towards LTL model checking of unmodified thread-based C & C++ programs. In NASA Formal Methods - 4th International Symposium, NFM 2012, Norfolk, VA, USA, April 3-5, 2012. Proceedings, pages 252–266, 2012.

[13] Mike Barnett, Manuel Fahndrich, Francesco Logozzo, and Diego Garbervetsky. Annotations for (more) precise points-to analysis. In International Workshop on Aliasing, Confinement and Ownership in Object-Oriented Programming, 2007, Berlin, Germany, January 2007.

[14] Emery D. Berger, Celeste Hollenbeck, Petr Maj, Olga Vitek, and Jan Vitek. On the impact of programming languages on code quality. CoRR, abs/1901.10220, 2019.

[15] Jan A. Bergstra, T. B. Dinesh, John Field, and Jan Heering. Toward a complete transformational toolkit for compilers. ACM Transactions on Programming Languages and Systems (TOPLAS), 19(5):639–684, 1997.

[16] Yves Bertot. A short presentation of Coq. In Theorem Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, Montreal, Canada, August 18-21, 2008. Proceedings, pages 12–16, 2008.

[17] Lorenzo Bettini, Viviana Bono, Mariangiola Dezani-Ciancaglini, Paola Giannini, and Betti Venneri. Java & Lambda: a featherweight story. Logical Methods in Computer Science, 14(3), 2018.

[18] Kevin Bierhoff, Nels E. Beckman, and Jonathan Aldrich. Practical API protocol checking with access permissions. In ECOOP 2009 - Object-Oriented Programming, 23rd European Conference, Genoa, Italy, July 6-10, 2009. Proceedings, pages 195–219, 2009.

[19] Gavin M. Bierman, Martín Abadi, and Mads Torgersen. Understanding TypeScript. In ECOOP 2014 - Object-Oriented Programming - 28th European Conference, Uppsala, Sweden, July 28 - August 1, 2014. Proceedings, pages 257–281, 2014.

[20] Gavin M. Bierman, M. J. Parkinson, and A. M. Pitts. MJ: An imperative core calculus for Java and Java with effects. Technical report, University of Cambridge, Computer Laboratory, 2003.

[21] Bruno Blanchet. Escape analysis: Correctness proof, implementation and experimental results. In POPL '98, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, January 19-21, 1998, pages 25–37, 1998.

[22] Bruno Blanchet. Escape analysis for Java: Theory and practice. ACM Transactions on Programming Languages and Systems, 25(6):713–775, 2003.

[23] Aniruddha Bohra and Eran Gabber. Are mallocs free of fragmentation? In Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference, June 25-30, 2001, Boston, Massachusetts, USA, pages 105–117, 2001.

[24] Chandrasekhar Boyapati. SafeJava: A Unified Type System for Safe Programming. PhD thesis, Massachusetts Institute of Technology, Boston, USA, 2001.

[25] Chandrasekhar Boyapati and Martin C. Rinard. A parameterized type system for race-free Java programs. In Proceedings of the 2001 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages and Applications, OOPSLA 2001, Tampa, Florida, USA, October 14-18, 2001, pages 56–69, 2001.

[26] Chandrasekhar Boyapati, Alexandru Salcianu, William S. Beebee, and Martin C. Rinard. Ownership types for safe region-based memory management in real-time Java. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, San Diego, California, USA, June 9-11, 2003, pages 324–337, 2003.

[27] John Boyland. Alias burying: Unique variables without destructive reads. Software: Practice and Experience, 31(6):533–553, 2001.

[28] John Boyland. Checking interference with fractional permissions. In Static Analysis, 10th International Symposium, SAS 2003, San Diego, CA, USA, June 11-13, 2003, Proceedings, pages 55–72, 2003.

[29] John Boyland, James Noble, and William Retert. Capabilities for sharing: A generalisation of uniqueness and read-only. In ECOOP 2001 - Object-Oriented Programming, 15th European Conference, Budapest, Hungary, June 18-22, 2001, Proceedings, pages 2–27, 2001.

[30] Peter A. Buhr and W. Y. Russell Mok. Advanced exception handling mechanisms. IEEE Transactions on Software Engineering, 26(9):820–836, 2000.

[31] Nicholas Robert Cameron, Sophia Drossopoulou, James Noble, and Matthew J. Smith. Multiple ownership. In Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, October 21-25, 2007, Montreal, Quebec, Canada, pages 441–460, 2007.

[32] Nicholas Robert Cameron, James Noble, and Tobias Wrigstad. Tribal ownership. In Proceedings of the 25th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2010, October 17-21, 2010, Reno/Tahoe, Nevada, USA, pages 618–633, 2010.

[33] G. Campbell and Patroklos P. Papapetrou. SonarQube in Action. Manning Publications Co., 2013.

[34] Andrea Capriccioli, Marco Servetto, and Elena Zucca. An imperative pure calculus. Electronic Notes in Theoretical Computer Science, 322:87–102, 2016.

[35] Elias Castegren. Capability-Based Type Systems for Concurrency Control. PhD thesis, Uppsala University, Sweden, 2018.

[36] Elias Castegren and Tobias Wrigstad. Oolong: an extensible concurrent object calculus. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09-13, 2018, pages 1022–1029, 2018.

[37] Kung Chen and Martin Odersky. A type system for a lambda calculus with assignments. In Theoretical Aspects of Computer Software, International Conference TACS '94, Sendai, Japan, April 19-22, 1994, Proceedings, pages 347–364, 1994.

[38] Shuo Chen, Jun Xu, Nithin Nakka, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. Defeating memory corruption attacks via pointer taintedness detection. In 2005 International Conference on Dependable Systems and Networks (DSN 2005), 28 June - 1 July 2005, Yokohama, Japan, Proceedings, pages 378–387, 2005.

[39] Dave Clarke, Johan Östlund, Ilya Sergey, and Tobias Wrigstad. Ownership types: A survey. In Aliasing in Object-Oriented Programming. Types, Analysis and Verification, pages 15–58. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013.

[40] Dave Clarke and Tobias Wrigstad. External uniqueness is unique enough. In ECOOP 2003 - Object-Oriented Programming, 17th European Conference, Darmstadt, Germany, July 21-25, 2003, Proceedings, pages 176–200, 2003.

[41] David G. Clarke. Object Ownership and Containment. PhD thesis, School of Computer Science and Engineering, University of New South Wales, Sydney, Australia, 2001.

[42] David G. Clarke, John Potter, and James Noble. Ownership types for flexible alias protection. In Proceedings of the 1998 ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages & Applications (OOPSLA '98), Vancouver, British Columbia, Canada, October 18-22, 1998, pages 48–64, 1998.

[43] Edmund M. Clarke, Orna Grumberg, and David E. Long. Model checking. In Proceedings of the NATO Advanced Study Institute on Deductive Program Design, Marktoberdorf, Germany, pages 305–349, 1996.

[44] Edmund M. Clarke, William Klieber, Milos Novácek, and Paolo Zuliani. Model checking and the state explosion problem. In Tools for Practical Software Verification, LASER, International Summer School 2011, Elba Island, Italy, Revised Tutorial Lectures, pages 1–30, 2011.

[45] Christopher Colby. Semantics-based Program Analysis via Symbolic Composition of Transfer Relations. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, 1996.

[46] Erik Crank and Matthias Felleisen. Parameter-passing and the lambda calculus. In Conference Record of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, Orlando, Florida, USA, January 21-23, 1991, pages 233–244, 1991.

[47] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, 1991.

[48] Niels H. M. Aan de Brugh, Viet Yen Nguyen, and Theo C. Ruys. Moonwalker: Verification of .NET programs. In Tools and Algorithms for the Construction and Analysis of Systems, 15th International Conference, TACAS 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings, pages 170–173, 2009.

[49] Ana Lúcia de Moura and Roberto Ierusalimschy. Revisiting coroutines. ACM Transactions on Programming Languages and Systems, 31(2):6:1–6:31, 2009.

[50] Robert DeLine and Manuel Fähndrich. Typestates for objects. In ECOOP 2004 - Object-Oriented Programming, 18th European Conference, Oslo, Norway, June 14-18, 2004, Proceedings, pages 465–490, 2004.

[51] Werner Dietl, Sophia Drossopoulou, and Peter Müller. Separating ownership topology and encapsulation with generic universe types. ACM Transactions on Programming Languages and Systems, 33(6):20:1–20:62, 2011.

[52] Werner Dietl and Peter Müller. Universes: Lightweight ownership for JML. Journal of Object Technology, 4(8):5–32, 2005.

[53] Baozeng Ding, Yeping He, Yanjun Wu, Alex Miller, and John Criswell. Baggy bounds with accurate checking. In 23rd IEEE International Symposium on Software Reliability Engineering Workshops, ISSRE Workshops, Dallas, TX, USA, November 27-30, 2012, pages 195–200, 2012.

[54] Renshuang Ding and Meihua Wang. Design and implementation of web crawler based on coroutine model. In Cloud Computing and Security - 4th International Conference, ICCCS 2018, Haikou, China, June 8-10, 2018, Revised Selected Papers, Part I, pages 427–435, 2018.

[55] Kevin Donnelly, J. J. Hallett, and Assaf J. Kfoury. Formal semantics of weak references. In Proceedings of the 5th International Symposium on Memory Management, ISMM 2006, Ottawa, Ontario, Canada, June 10-11, 2006, pages 126–137, 2006.

[56] Iulian Dragos and Martin Odersky. Compiling generics through user-directed type specialization. In ICOOOLPS 2009, pages 42–47, New York, NY, USA, 2009. ACM.

[57] Karel Driesen. Software and Hardware Techniques for Efficient Polymorphic Calls. PhD thesis, University of California, Santa Barbara, CA, USA, 1999.

[58] Gregory J. Duck and Roland H. C. Yap. Heap bounds protection with low fat pointers. In Proceedings of the 25th International Conference on Compiler Construction, CC 2016, Barcelona, Spain, March 12-18, 2016, pages 132–142, 2016.

[59] Matthias Felleisen and Daniel P. Friedman. A syntactic theory of sequential state. Theoretical Computer Science, 69(3):243–287, 1989.

[60] Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. Effective typestate verification in the presence of aliasing. ACM Transactions on Software Engineering and Methodology, 17(2):9:1–9:34, 2008.

[61] Marcus S. Fisher. Software verification and validation - an engineering and scientific approach. Springer, 2007.

[62] Matthew Flatt, Shriram Krishnamurthi, and Matthias Felleisen. Classes and mixins. In POPL ’98, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, January 19-21, 1998, pages 171–183, 1998.

[63] Martin Fowler and Kendall Scott. UML distilled - a brief guide to the Standard Object Modeling Language (2. ed.). The Addison-Wesley object technology series. Addison-Wesley-Longman, 2000.

[64] Nicu G. Fruja. The correctness of the definite assignment analysis in C#. Journal of Object Technology, 3(9):29–52, 2004.

[65] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1995.

[66] Ronald Garcia, Éric Tanter, Roger Wolff, and Jonathan Aldrich. Foundations of typestate-oriented programming. ACM Transactions on Programming Languages and Systems, 36(4):12:1–12:44, 2014.

[67] Paola Giannini, Marco Servetto, and Elena Zucca. A syntactic model of mutation and aliasing. In Proceedings Twelfth Workshop on Developments in Computational Models and Ninth Workshop on Intersection Types and Related Systems, DCM/ITRS 2018, Oxford, UK, 8th July 2018, pages 39–55, 2018.

[68] Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.

[69] Colin S. Gordon, Matthew J. Parkinson, Jared Parsons, Aleks Bromfield, and Joe Duffy. Uniqueness and reference immutability for safe parallelism. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2012, part of SPLASH 2012, Tucson, AZ, USA, October 21-25, 2012, pages 21–40, 2012.

[70] Dick Grune, Kees van Reeuwijk, Henri E. Bal, Ceriel J. H. Jacobs, and Koen Langendoen. Modern Compiler Design. Springer, New York, NY, USA, 2012.

[71] Jörgen Gustavsson and David Sands. Possibilities and limitations of call-by-need space improvement. In Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming (ICFP '01), Firenze (Florence), Italy, September 3-5, 2001, pages 265–276, 2001.

[72] Philipp Haller and Alexander Loiko. LaCasa: lightweight affinity and object capabilities in Scala. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016, pages 272–291, 2016.

[73] Philipp Haller and Martin Odersky. Capabilities for uniqueness and borrowing. In ECOOP 2010 - Object-Oriented Programming, 24th European Conference, Maribor, Slovenia, June 21-25, 2010. Proceedings, pages 354–378, 2010.

[74] Mark Harman and Peter W. O'Hearn. From start-ups to scale-ups: Opportunities and open problems for static and dynamic program analysis. In 18th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2018, Madrid, Spain, September 23-24, 2018, pages 1–23, 2018.

[75] Klaus Havelund and Thomas Pressburger. Model checking JAVA programs using JAVA pathfinder. STTT, 2(4):366–381, 2000.

[76] Geoffrey Hecht, Naouel Moha, and Romain Rouvoy. An empirical study of the performance impacts of android code smells. In Proceedings of the International Conference on Mobile Software Engineering and Systems, MOBILESoft '16, Austin, Texas, USA, May 14-22, 2016, pages 59–69, 2016.

[77] C. A. R. Hoare. Process algebra: A unifying approach. In Communicating Sequential Processes: The First 25 Years, Symposium on the Occasion of 25 Years of CSP, London, UK, July 7-8, 2004, Revised Invited Papers, pages 36–60, 2004.

[78] John Hogg, Doug Lea, Alan Wills, Dennis de Champeaux, and Richard C. Holt. The Geneva convention on the treatment of object aliasing. OOPS Messenger, 3(2):11–16, 1992.

[79] Gerard J. Holzmann. The SPIN Model Checker - primer and reference manual. Addison-Wesley, 2004.

[80] Alfred Horn. On sentences which are true of direct unions of algebras. Journal of Symbolic Logic, 16(1):14–21, 1951.

[81] Atsushi Igarashi, Benjamin C. Pierce, and Philip Wadler. Featherweight Java: a minimal core calculus for Java and GJ. ACM Transactions on Programming Languages and Systems, 23(3):396–450, 2001.

[82] Intel. Intel® 64 and IA-32 architectures developer's manual. https://software.intel.com/en-us/articles/intel-sdm, 2019.

[83] Jan Martin Jansen. Programming in the λ-calculus: From Church to Scott and back. In The Beauty of Functional Code - Essays Dedicated to Rinus Plasmeijer on the Occasion of His 61st Birthday, pages 168–180, 2013.

[84] Trevor Jim, J. Gregory Morrisett, Dan Grossman, Michael W. Hicks, James Cheney, and Yanling Wang. Cyclone: A safe dialect of C. In Proceedings of the General Track: 2002 USENIX Annual Technical Conference, June 10-15, 2002, Monterey, California, USA, pages 275–288, 2002.

[85] Brittany Johnson, Yoonki Song, Emerson R. Murphy-Hill, and Robert W. Bowdidge. Why don’t software developers use static analysis tools to find bugs? In 35th International Conference on Software Engineering, ICSE ’13, San Francisco, CA, USA, May 18-26, 2013, pages 672–681, 2013.

[86] Cliff B. Jones, Ian J. Hayes, and Robert J. Colvin. Balancing expressiveness in formal approaches to concurrency. Formal Aspects of Computing, 27(3):475–497, 2015.

[87] Maria Jump and Kathryn S. McKinley. Cork: dynamic memory leak detection for garbage-collected languages. In Proceedings of the 34th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2007, Nice, France, January 17-19, 2007, pages 31–38, 2007.

[88] Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. RustBelt: securing the foundations of the Rust programming language. PACMPL, 2(POPL):66:1–66:34, 2018.

[89] Steve Klabnik and Carol Nichols. The Rust Programming Language. No Starch Press, 2018.

[90] Moritz Kleine, Björn Bartels, Thomas Göthel, Steffen Helke, and Dirk Prenzel. LLVM2CSP: Extracting CSP models from concurrent programs. In NASA Formal Methods - Third International Symposium, NFM 2011, Pasadena, CA, USA, April 18-20, 2011. Proceedings, pages 500–505, 2011.

[91] F. Kordon, H. Garavel, L. M. Hillah, F. Hulin-Hubard, G. Ciardo, A. Hamez, L. Jezequel, A. Miner, J. Meijer, E. Paviot-Adet, D. Racordon, Y. Thierry-Mieg, K. Wolf, J. van de Pol, C. Rohr, M. Heiner, G. Tran, and J. Srba. Complete Results for the 2016 Edition of the Model Checking Contest. http://mcc.lip6.fr/2016/results.php, 2016.

[92] Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: a LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM 2015, Austin, Texas, USA, November 15, 2015, pages 7:1–7:6, 2015.

[93] Chris Lattner and Vikram S. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 20-24 March 2004, San Jose, CA, USA, pages 75–88, 2004.

[94] K. Rustan M. Leino. Applications of extended static checking. In Static Analysis, 8th International Symposium, SAS 2001, Paris, France, July 16-18, 2001, Proceedings, pages 185–193, 2001.

[95] Xavier Leroy. Formal verification of a realistic compiler. Commun. ACM, 52(7):107–115, 2009.

[96] Amit A. Levy, Michael P. Andersen, Bradford Campbell, David E. Culler, Prabal Dutta, Branden Ghena, Philip Levis, and Pat Pannuto. Ownership is theft: experiences building an embedded OS in rust. In Proceedings of the 8th Workshop on Programming Languages and Operating Systems, PLOS 2015, Monterey, California, USA, October 4, 2015, pages 21–26, 2015.

[97] Yi Lin, Stephen M. Blackburn, Antony L. Hosking, and Michael Norrish. Rust as a language for high performance GC implementation. In Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14 - 14, 2016, pages 89–98, 2016.

[98] Daiping Liu, Mingwei Zhang, and Haining Wang. A robust and efficient defense against use-after-free exploits via concurrent pointer sweeping. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 1635–1648, 2018.

[99] Eve MacGregor, Yvonne Hsieh, and Philippe Kruchten. Cultural patterns in software process mishaps: incidents in global projects. ACM SIGSOFT Software Engineering Notes, 30(4):1–5, 2005.

[100] Julian Mackay, Hannes Mehnert, Alex Potanin, Lindsay Groves, and Nicholas Robert Cameron. Encoding Featherweight Java with assignment and immutability using the Coq proof assistant. In Proceedings of the 14th Workshop on Formal Techniques for Java-like Programs, FTfJP 2012, Beijing, China, June 12, 2012, pages 11–19, 2012.

[101] Robin Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17(3):348–375, 1978.

[102] Torben Æ. Mogensen. Efficient self-interpretations in lambda calculus. Journal of Functional Programming, 2(3):345–363, 1992.

[103] Peter Müller and Arnd Poetzsch-Heffter. Universes: A type system for controlling representation exposure. In Programming Languages and Fundamentals of Programming, volume 263, 1999.

[104] Karl Naden, Robert Bocchino, Jonathan Aldrich, and Kevin Bierhoff. A type system for borrowing permissions. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2012, Philadelphia, Pennsylvania, USA, January 22-28, 2012, pages 557–570, 2012.

[105] Santosh Nagarakatte, Jianzhou Zhao, Milo M. K. Martin, and Steve Zdancewic. SoftBound: highly compatible and complete spatial memory safety for C. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2009, Dublin, Ireland, June 15-21, 2009, pages 245–258, 2009.

[106] George C. Necula, Scott McPeak, Shree Prakash Rahul, and Westley Weimer. CIL: intermediate language and tools for analysis and transformation of C programs. In Compiler Construction, 11th International Conference, CC 2002, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2002, Grenoble, France, April 8-12, 2002, Proceedings, pages 213–228, 2002.

[107] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, pages 89–100, 2007.

[108] Oscar Nierstrasz. A survey of object-oriented concepts. In Object-Oriented Concepts, Databases, and Applications, pages 3–21. ACM, New York, NY, USA, 1989.

[109] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL - A Proof Assistant for Higher-Order Logic, volume 2283 of Lecture Notes in Computer Science. Springer, 2002.

[110] James Noble, Jan Vitek, and John Potter. Flexible alias protection. In ECOOP’98 - Object-Oriented Programming, 12th European Conference, Brussels, Belgium, July 20-24, 1998, Proceedings, pages 158–185, 1998.

[111] Martin Odersky, Dan Rabin, and Paul Hudak. Call by name, assignment, and the lambda calculus. In Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Charleston, South Carolina, USA, January 1993, pages 43–56, 1993.

[112] Eugenio G. Omodeo and Alberto Policriti, editors. Martin Davis on Computability, Computational Logic, and Mathematical Foundations, volume 10 of Outstanding Contributions to Logic. Springer, 2016.

[113] Johan Östlund and Tobias Wrigstad. Welterweight Java. In Objects, Models, Components, Patterns, 48th International Conference, TOOLS 2010, Málaga, Spain, June 28 - July 2, 2010. Proceedings, pages 97–116, 2010.

[114] Johan Östlund and Tobias Wrigstad. Multiple aggregate entry points for ownership types. In ECOOP 2012 - Object-Oriented Programming - 26th European Conference, Beijing, China, June 11-16, 2012. Proceedings, pages 156–180, 2012.

[115] Alex Potanin, Monique Damitio, and James Noble. Are your incoming aliases really necessary? counting the cost of object ownership. In 35th International Conference on Software Engineering, ICSE '13, San Francisco, CA, USA, May 18-26, 2013, pages 742–751, 2013.

[116] Alex Potanin, Johan Östlund, Yoav Zibin, and Michael D. Ernst. Immutability. In Aliasing in Object-Oriented Programming. Types, Analysis and Verification, pages 233–269. Springer Berlin Heidelberg, 2013.

[117] Dimitri Racordon and Didier Buchs. A practical type system for safe aliasing. In Proceedings of the 11th ACM SIGPLAN International Conference on Software Language Engineering, SLE 2018, Boston, MA, USA, November 05-06, 2018, pages 133–146, 2018.

[118] Dimitri Racordon and Didier Buchs. Implementing a language with explicit assignment semantics. In 11th International Workshop on Virtual Machines and Intermediate Languages, VMIL 2019, Athens, Greece, October 22, 2019, Proceedings, 2019.

[119] Baishakhi Ray, Daryl Posnett, Premkumar T. Devanbu, and Vladimir Filkov. A large-scale study of programming languages and code quality in github. Commun. ACM, 60(10):91–100, 2017.

[120] John C. Reynolds. Definitional interpreters for higher-order programming languages. Higher-Order and Symbolic Computation, 11(4):363–397, 1998.

[121] Lukas Rytz. A Practical Effect System for Scala. PhD thesis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, 2014.

[122] Davide Sangiorgi and David Walker. The Pi-Calculus - a theory of mobile processes. Cambridge University Press, 2001.

[123] Konstantin Serebryany, Alexander Potapenko, Timur Iskhodzhanov, and Dmitriy Vyukov. Dynamic race detection with LLVM compiler - compile-time instrumentation for ThreadSanitizer. In Runtime Verification - Second International Conference, RV 2011, San Francisco, CA, USA, September 27-30, 2011, Revised Selected Papers, pages 110–114, 2011.

[124] Marco Servetto and Elena Zucca. Aliasing control in an imperative pure calculus. In Programming Languages and Systems - 13th Asian Symposium, APLAS 2015, Pohang, South Korea, November 30 - December 2, 2015, Proceedings, pages 208–228, 2015.

[125] Julien Signoles, Nikolai Kosmatov, and Kostyantyn Vorobyov. E-ACSL, a runtime verification tool for safety and security of C programs (tool paper). In RV-CuBES 2017. An International Workshop on Competitions, Usability, Benchmarks, Evaluation, and Standardisation for Runtime Verification Tools, September 15, 2017, Seattle, WA, USA, pages 164–173, 2017.

[126] Ian Sommerville. Software engineering, 8th Edition. International computer science series. Addison-Wesley, 2007.

[127] Robert E. Strom and Shaula Yemini. Typestate: A programming language concept for enhancing software reliability. IEEE Transactions on Software Engineering, 12(1):157–171, 1986.

[128] Bjarne Stroustrup. A Tour of C++. C++ In-Depth Series. Pearson Education, 2018.

[129] Chengnian Sun, Vu Le, Qirun Zhang, and Zhendong Su. Toward understanding compiler bugs in GCC and LLVM. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016, pages 294–305, 2016.

[130] Synopsys. Coverity. https://scan.coverity.com, 2019.

[131] OOO Program Verification Systems. PVS-Studio analyzer. https://www.viva64.com/en/pvs-studio/, 2019.

[132] Phit-Huan Tan, Choo-Yee Ting, and Siew-Woei Ling. Learning difficulties in programming courses: Undergraduates' perspective and perception. In 2009 International Conference on Computer Technology and Development, volume 1, pages 42–46, Nov 2009.

[133] LLVM Team et al. clang: a C language family frontend for LLVM. https://clang.llvm.org, 2007.

[134] Mads Tofte, Lars Birkedal, Martin Elsman, and Niels Hallenberg. A retrospective on region-based memory management. Higher-Order and Symbolic Computation, 17(3):245–265, 2004.

[135] Jesse A. Tov and Riccardo Pucella. Practical affine types. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, pages 447–458, 2011.

[136] Akihiko Tozawa, Michiaki Tatsubori, Tamiya Onodera, and Yasuhiko Minamide. Copy-on-write in the PHP language. In Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, Savannah, GA, USA, January 21-23, 2009, pages 200–212, 2009.

[137] Alan M. Turing. Computability and λ-definability. Journal of Symbolic Logic, 2(4):153–163, 1937.

[138] Victor van der Veen, Nitish dutt-Sharma, Lorenzo Cavallaro, and Herbert Bos. Memory errors: The past, the present, and the future. In Research in Attacks, Intrusions, and Defenses - 15th International Symposium, RAID 2012, Amsterdam, The Netherlands, September 12-14, 2012. Proceedings, pages 86–106, 2012.

[139] Jan Vitek and Boris Bokowski. Confined types in java. Software: Practice and Experience, 31(6):507–532, 2001.

[140] Philip Wadler. Linear types can change the world! In Programming concepts and methods: Proceedings of the IFIP Working Group 2.2, 2.3 Working Conference on Programming Concepts and Methods, Sea of Galilee, Israel, 2-5 April, 1990, page 561, 1990.

[141] Feng Wang, Fu Song, Min Zhang, Xiaoran Zhu, and Jun Zhang. KRust: A formal executable semantics of Rust. In 2018 International Symposium on Theoretical Aspects of Software Engineering, TASE 2018, Guangzhou, China, August 29-31, 2018, pages 44–51, 2018.

[142] Paul R. Wilson. Uniprocessor garbage collection techniques. In Memory Management, International Workshop IWMM 92, St. Malo, France, September 17-19, 1992, Proceedings, pages 1–42, 1992.

[143] Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles. Dynamic storage allocation: A survey and critical review. In Memory Management, International Workshop IWMM 95, Kinross, UK, September 27-29, 1995, Proceedings, pages 1–116, 1995.

[144] Roger Wolff, Ronald Garcia, Éric Tanter, and Jonathan Aldrich. Gradual typestate. In ECOOP 2011 - Object-Oriented Programming - 25th European Conference, Lancaster, UK, July 25-29, 2011, Proceedings, pages 459–483, 2011.

[145] Tobias Wrigstad. Ownership-Based Alias Management. PhD thesis, KTH, School of Information and Communication Technology (ICT), Stockholm, Sweden, 2006.

[146] Tobias Wrigstad, Filip Pizlo, Fadi Meawad, Lei Zhao, and Jan Vitek. Loci: Simple thread-locality for Java. In ECOOP 2009 - Object-Oriented Programming, 23rd European Conference, Genoa, Italy, July 6-10, 2009. Proceedings, pages 445–469, 2009.

[147] Hongseok Yang and Uday Reddy. Imperative lambda calculus revisited. Technical report, University of Illinois at Urbana-Champaign, Aug 1997.

Appendix A

Formal Companion

This appendix is a companion to the formal parts of this thesis. Appendix A.1 lists all symbols and notations used in mathematical formulae. Appendix A.2 gathers all inference rules related to the dynamic and static semantics of the A-calculus.

A.1 List of Symbols and Notations

The following list describes most of the symbols used in mathematical formulae. A comprehensive introduction to these notations is given in Chapter 3.

General Symbols and Notations

N    The set of natural numbers.
Z    The set of integer numbers.
P(A)    The set of all subsets (a.k.a. the powerset) of A.
{a ... b}    The set {i | i ∈ N ∧ a ≤ i ≤ b}.
{sa ... sb}    The set {si | a ≤ i ≤ b}.
‖A‖    The cardinal of a set A.
Σ∗    The set of words over Σ.
Σ#    The set of words over Σ without any letter repetition.
ε    The empty word.
w    A word.


w1w2    The concatenation of two words w1 and w2.
‖w‖    The cardinal (a.k.a. length) of a word w.
‖w‖s    The number of occurrences of s in a word w.
wi    The i-th letter of a word w.
wi,j    The substring of a word w from the i-th to the j-th letter.
w[s]    The position of a letter s in a word without repetition w.
dom    The domain of a function, record or table.
codom    The codomain of a function, record or table.

A → B    A total function from A to B.
A ⇸ B    A partial function from some set A′ ⊆ A to B.
t[x/u]    The substitution of the variable x by u in a term t.
t[/σ]    The application of the substitutions of a table σ in a term t.

Notations for Composite and Table Types

⟨l1 : D1 ... ln : Dn⟩    A composite type with labels l1 ... ln and domains D1 ... Dn.
⟨l1 ↦ v1 ... ln ↦ vn⟩    A record mapping li onto vi.
{l1 ↦ v1 ... ln ↦ vn}    A table mapping li onto vi.
r.l    The field labeled l of a record or table r.
r[l ↦ v]    The update of a field r.l to v.
r[l ↦ v | p(l)]    The update of all fields r.l to v for all labels such that p(l).
r[l1 ... ln ↦ v]    The update of all fields r.l to v for all labels {l1 ... ln}.
a[b.l ↦ v]    The update of a field's field a.b.l to v.
t|l    The removal of the label l from the domain of a table t.
t1 ⊎ t2    The merging of the table t2 into t1.
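As an illustration only, the update, removal, and merging operators above can be read over plain dictionaries. The sketch below is not part of the formalization; all function names are hypothetical, and the assumption that the right-hand table takes precedence in a merge is mine.

```python
# Hypothetical dictionary reading of the record/table notations.
def update(r, l, v):
    """r[l -> v]: update the field labeled l to v, leaving r untouched."""
    s = dict(r)
    s[l] = v
    return s

def update_where(r, v, p):
    """r[l -> v | p(l)]: update every field whose label satisfies p."""
    return {l: (v if p(l) else w) for l, w in r.items()}

def remove(t, l):
    """t|l: remove the label l from the domain of t."""
    return {k: w for k, w in t.items() if k != l}

def merge(t1, t2):
    """t1 (+) t2: merge t2 into t1 (assumed: t2's fields win)."""
    return {**t1, **t2}

r = {"x": 1, "y": 2}
assert update(r, "x", 7) == {"x": 7, "y": 2}
assert update_where(r, 0, lambda l: l == "y") == {"x": 1, "y": 0}
assert remove(r, "y") == {"x": 1}
assert merge({"x": 1}, {"y": 2, "x": 3}) == {"x": 3, "y": 2}
```

Each operator returns a fresh dictionary, mirroring the fact that the notations denote new tables rather than in-place mutation.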

Symbols and Notations Specific to the A-Calculus

AC    The set of terms of the A-calculus.
t    A term or expression of the A-calculus, t ∈ AC.

A    The set of atomic literals or values.
a    An atomic literal or value a ∈ A.

X    The set of variable identifiers.
x, y, z    Variable identifiers x, y, z ∈ X.
f, g    Usually variable identifiers f, g ∈ X denoting functions.

< The return identifier.

O The set of assignment operators.

 An assignment operator  ∈ O.

C An evaluation context.

C.π The pointer table of an evaluation context C.

C.µ The memory table of an evaluation context C.

Ce A frame-aware evaluation context.

The local memory accessibility relation.

∗    The absolute memory accessibility relation.
V    The set of semantic values.
v    A semantic value v ∈ V.

⊥ The uninitialized memory value ⊥ ∈ V.

 The moved memory value  ∈ V. l, m Memory locations l, m ∈ N.

X The set of memory exceptions.

ξ⊥ The uninitialized memory error ξ⊥ ∈ X.

ξ    The moved memory error ξ ∈ X.

ξuaf    The use-after-free error ξuaf ∈ X.

ξdf    The doubly-freed memory error ξdf ∈ X.

ξlk The memory leak error ξlk ∈ X.

QR    The set of reference qualifiers.
brw    The borrowed type qualifier brw ∈ QR.
own    The owned type qualifier own ∈ QR.

@    The scope belonging relation.
Σ    A scope context Σ : X ⇸ N.

C    The set of type capabilities.
o    The ownership type capability.

(b, i) The borrowed type capability.

◦ The absence of any type capability.

K    A type capability context K : X ⇸ C ∪ ◦.
x!K    The uniqueness of x in the type capability context K.

A.2 A-Calculus' Semantics

This section gathers the inference rules related to the A-calculus' operational semantics.

Literals

E-Atom
    a ∈ A    ∃l ∉ dom(C.µ)
    ──────────────────────────
    C ⊢ a ⇓ l, C[µ.l ↦ a]

E-Fun
    ∃l ∉ dom(C.µ)
    ──────────────────────────────
    C ⊢ λx.t ⇓ l, C[µ.l ↦ λx.t]

Variables

E-Let
    C[π.x ↦ 0] ⊢ t ⇓ l, C′
    ─────────────────────────────────────────────────
    C ⊢ let x in t ⇓ l, C′[π ↦ unset(x, C′.π, C.π)]

E-Var
    x ∈ dom(C.π)
    ─────────────────────
    C ⊢ x ⇓ C.π.x, C

Records

ER-Inst
    ∃l ∉ dom(C.µ)    r = ⟨xi ↦ 0 | 1 ≤ i ≤ ‖x‖⟩
    ──────────────────────────────────────────────
    C ⊢ new x ⇓R l, C[µ.l ↦ r]

ER-Field-New
    C ⊢ t ⇓R l, C′    l ∈ dom(C′.µ)    r = C′.µ.l    rec(r)    x ∈ dom(r)    m ∉ dom(C′.µ)
    ─────────────────────────────────────────────────
    C ⊢ alloc t.x ⇓R m, C′[µ.l.x ↦ m, µ.m ↦ ⊥]

ER-Field
    C ⊢ t ⇓R l, C′    l ∈ dom(C′.µ)    r = C′.µ.l    rec(r)    x ∈ dom(r)
    ─────────────────────────────────────────────────
    C ⊢ t.x ⇓ r.x, C′

Memory management

E-New
    x ∈ dom(C.π)    ∃l ∉ dom(C.µ)
    ──────────────────────────────────────
    C ⊢ alloc x ⇓ l, C[π.x ↦ l, µ.l ↦ ⊥]

E-Del
    C ⊢ t ⇓ l, C′    l ∈ dom(C′.µ)    l ≠ 0
    ──────────────────────────────────
    C ⊢ del t ⇓ l, C′[µ ↦ C′.µ|l]

Assignments

E-Alias
    C ⊢ t ⇓ l, C′
    ─────────────────────────────
    C ⊢ x &- t ⇓ l, C′[π.x ↦ l]

E-Mut
    C ⊢ t ⇓ m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, }    l = C′.π.x    l ≠ 0
    ──────────────────────────────
    C ⊢ x := t ⇓ l, C′[µ.l ↦ v]

ER-Mut
    C ⊢ t ⇓ m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, }    u, C″ = copy(v, C′)    l = C″.π.x    l ≠ 0
    ──────────────────────────────
    C ⊢ x := t ⇓ l, C″[µ.l ↦ u]

E-Move
    C ⊢ t ⇓ m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, }    l = C′.π.x    l ≠ 0
    ───────────────────────────────────────
    C ⊢ x ← t ⇓ l, C′[µ.m ↦ ][µ.l ↦ v]
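The three assignment operators can be given a concrete, simplified reading. The Python sketch below is an illustration, not the thesis' implementation: the class and method names are all hypothetical, and the evaluation context is collapsed to a pointer table `pi` and a memory table `mu`. Aliasing rebinds a pointer, mutation writes a copy of the value into the existing cell, and a move transfers the value and marks the source cell as moved.

```python
import copy

MOVED = object()  # placeholder for the "moved" memory value

class Context:
    """Toy evaluation context: pointers (var -> loc) and memory (loc -> value)."""
    def __init__(self):
        self.pi = {}
        self.mu = {}
        self.next_loc = 1

    def alloc(self, x):
        # E-New-like: bind x to a fresh, uninitialized location.
        l, self.next_loc = self.next_loc, self.next_loc + 1
        self.pi[x] = l
        self.mu[l] = None  # stands for the uninitialized value
        return l

    def alias(self, x, y):
        # E-Alias-like: x now points to y's location.
        self.pi[x] = self.pi[y]

    def mutate(self, x, y):
        # ER-Mut-like: write an independent copy of y's value into x's cell.
        self.mu[self.pi[x]] = copy.deepcopy(self.mu[self.pi[y]])

    def move(self, x, y):
        # E-Move-like: transfer y's value, mark y's cell as moved.
        m, l = self.pi[y], self.pi[x]
        self.mu[l] = self.mu[m]
        self.mu[m] = MOVED

c = Context()
c.alloc("a"); c.mu[c.pi["a"]] = [1, 2]
c.alloc("b"); c.mutate("b", "a")   # b holds an independent copy
c.mu[c.pi["b"]].append(3)
assert c.mu[c.pi["a"]] == [1, 2]   # a is unaffected by b's mutation
c.alloc("d"); c.move("d", "a")     # a's cell is now marked as moved
assert c.mu[c.pi["a"]] is MOVED
```

The usage at the bottom makes the key distinction visible: after a mutation the two variables own separate cells, whereas after a move the source cell holds only the moved marker, which is exactly what the moved-memory error guards against.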

Field assignments

ER-Field-Alias
    C ⊢ t2 ⇓R m, C′    C′ ⊢ t1 ⇓R l, C″    l ∈ dom(C″.µ)    r = C″.µ.l    rec(r)    x ∈ dom(r)
    ───────────────────────────────────────
    C ⊢ t1.x &- t2 ⇓R m, C″[µ.l.x ↦ m]

ER-Field-Mut
    C ⊢ t2 ⇓R m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, }    u, Cu = copy(v, C′)    Cu ⊢ t1.x ⇓R l, C″    C″.µ.l ≠ 0
    ───────────────────────────────────────
    C ⊢ t1.x := t2 ⇓R l, C″[µ.l ↦ u]

ER-Field-Move
    C ⊢ t2 ⇓R m, C′    m ∈ dom(C′.µ)    v = C′.µ.m    v ∉ {⊥, }    C′ ⊢ t1.x ⇓R l, C″    C″.µ.l ≠ 0
    ───────────────────────────────────────
    C ⊢ t1.x ← t2 ⇓R l, C″[µ.l ↦ v, µ.m ↦ ]

Sequences

E-Seq
    C ⊢ t1 ⇓ l1, C1    C1 ⊢ t2 ⇓ l2, C2
    ─────────────────────────
    C ⊢ t1; t2 ⇓ l2, C2

Conditional terms

EC-Cond-T
    C ⊢ t1 ⇓C l1, C′    l1 ∈ dom(C′.µ)    C′.µ.l1 ∉ {⊥, }    C′.µ.l1 = true    C′ ⊢ t2 ⇓C l2, C″
    ──────────────────────────────────────
    C ⊢ if t1 then t2 else t3 ⇓C l2, C″

EC-Cond-F
    C ⊢ t1 ⇓C l1, C′    l1 ∈ dom(C′.µ)    C′.µ.l1 ∉ {⊥, }    C′.µ.l1 = false    C′ ⊢ t3 ⇓C l3, C″
    ──────────────────────────────────────
    C ⊢ if t1 then t2 else t3 ⇓C l3, C″

Functions

E-Callee
    C ⊢ tc ⇓ l, C′    l ∈ dom(C′.µ)    C′.µ.l ∉ {⊥, }    C′.µ.l = λx.t
    ──────────────────────────────
    C ⊢ tc ⇓callee λx.t, C′

E-Args-0
    ─────────────────────
    C ⊢ ε ⇓args 0, C

E-Args-Alias
    C[π.x ↦ 0] ⊢ x &- t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ──────────────────────────────────
    C ⊢ (x &- t)u ⇓args 0, C′

E-Args-MM
    ⋄ ∈ {:=, ←}    ∃l ∉ dom(C.µ)    C[π.x ↦ l, µ.l ↦ ⊥] ⊢ x ⋄ t ⇓ lx, Cx    Cx ⊢ u ⇓args 0, C′
    ──────────────────────────────────
    C ⊢ (x ⋄ t)u ⇓args 0, C′

E-Ret-Alias
    C[π.< ↦ 0] ⊢ < &- t ⇓ l, C′
    ──────────────────────────────
    C ⊢ ret &- t ⇓ l, C′

E-Ret-MM
    ⋄ ∈ {:=, ←}    ∃l ∉ dom(C.µ)    C[π.< ↦ l, µ.l ↦ ⊥] ⊢ < ⋄ t ⇓ l, C′
    ─────────────────────────
    C ⊢ ret ⋄ t ⇓ l, C′

E-Call
    C ⊢ f ⇓callee λx.t, C1    λx′.t′ = renameC1(λx.t)    C1 ⊢ u[/{xi ↦ x′i | 1 ≤ i ≤ ‖x‖}] ⇓args 0, C2    C2 ⊢ t′ ⇓ l, C′
    ──────────────────────────────────────────────
    C ⊢ f(u) ⇓ l, C′[π ↦ unset(<, C2.π, C.π)]

A.3 A-Calculus' Type System

This section gathers the inference rules related to the A-calculus’ type system.

A.3.1 Flow-Insensitive Typing

Literals

S-Atom
    a ∈ A
    ───────────────
    Σ, i ⊢ a @ −∞

S-Fun
    (λx.t) : τ    Σ′ = funscope(τ)    Σ′, min({Σ′.x | x ∈ x}) − 1 ⊢ t @ j
    ─────────────────────
    Σ, i ⊢ λx.t @ −∞

Variables

S-Let
    Σ[x ↦ i − 1], i − 1 ⊢ t @ j
    ─────────────────────────
    Σ, i ⊢ let x in t @ j

S-Var
    x ∈ dom(Σ)
    ─────────────────
    Σ, i ⊢ x @ Σ.x

Assignments

S-Alias
    Σ, i ⊢ x @ j′    Σ, i ⊢ u @ j    j < j′
    ─────────────────────────
    Σ, i ⊢ x &- u @ j

S-Mut
    Σ, i ⊢ x @ j    Σ, i ⊢ u @ j′
    ─────────────────────────
    Σ, i ⊢ x := u @ j

S-Move
    Σ, i ⊢ x @ j    Σ, i ⊢ u @ j′
    ─────────────────────────
    Σ, i ⊢ x ← u @ j

Sequences

S-Seq
    Σ, i ⊢ t1 @ j1    Σ, i ⊢ t2 @ j2
    ─────────────────────────
    Σ, i ⊢ t1 ; t2 @ j2

Conditional terms

S-Cond
    Σ, i ⊢ t1 @ j1    Σ, i ⊢ t2 @ j2    Σ, i ⊢ t3 @ j3
    ───────────────────────────────────────────
    Σ, i ⊢ if t1 then t2 else t3 @ min(j2, j3)

Functions

S-Ret
    Σ, i ⊢ < ⋄ t @ j
    ──────────────────
    Σ, i ⊢ ret ⋄ t @ j

S-Call
    f : τ    ∀(x ⋄ t) ∈ u, Σ, i ⊢ t @ jx
    ────────────────────────────────────────────────
    Σ, i ⊢ f(u) @ min({jx | x ∈ τ.bds} ∪ {i − 1})

A.3.2 Flow-Sensitive Typing

Literals

K-Atom
    a ∈ A
    ─────────────
    K ⊢ a ⇓ o, K

K-Fun
    ─────────────────
    K ⊢ λx.t ⇓ o, K

Variables

K-Let
    K[x ↦ ] ⊢ t ⇓ c, K′
    ──────────────────────────
    K ⊢ let x in t ⇓ c, K′|x

K-Var
    x ∈ dom(K)
    ─────────────────
    K ⊢ x ⇓ K.x, K

Assignments

K-Alias-A
    K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x ≠ o ∨ x!K′
    ───────────────────────────────────
    K ⊢ x &- y ⇓ o, K′[x ← (b, {y})]

K-Alias-B
    K ⊢ t ⇓ c, K′    c = (b, O)    x ∉ O    K′.x ≠ o ∨ x!K′
    ───────────────────────────────────
    K ⊢ x &- t ⇓ o, K′[x ← (b, O)]

K-Mut-A
    K ⊢ t ⇓ c, K′    c ≠     K′.x = 
    ─────────────────────────────
    K ⊢ x := t ⇓ o, K′[x ↦ o]

K-Mut-B
    K ⊢ t ⇓ c, K′    c ≠     K′.x ≠ 
    ─────────────────────────────
    K ⊢ x := t ⇓ K′.x, K′

K-Move-A
    K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x = 
    ───────────────────────────────────
    K ⊢ x ← y ⇓ o, K′[x ← o, y ↦ ]

K-Move-B
    K ⊢ y ⇓ c, K′    y ∈ X    y!K′    x ≠ y    K′.x ≠ 
    ───────────────────────────────────
    K ⊢ x ← y ⇓ K′.x, K′[y ↦ ]

K-Move-C
    K ⊢ t ⇓ c, K′    t ∉ X    c = o    K′.x = 
    ──────────────────────────────
    K ⊢ x ← t ⇓ o, K′[x ← o]

K-Move-D
    K ⊢ t ⇓ c, K′    t ∉ X    c = o    K′.x ≠ 
    ──────────────────────────────
    K ⊢ x ← t ⇓ K′.x, K′
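The flow-sensitive move rules encode the idea that moving out of a variable consumes its capability, so later uses of the source can be rejected statically. The following Python sketch is purely illustrative: every name in it is hypothetical, the program representation is a flat list of operations rather than A-calculus terms, and the thesis' distinction between absent and consumed capabilities is collapsed into a single `None`.

```python
# Hypothetical flow-sensitive capability tracker: a move transfers
# ownership to the target and strips the source's capability.
OWN = "own"
NONE = None  # no capability (this sketch conflates "absent" and "consumed")

class CapabilityError(Exception):
    pass

def check(program):
    """Walk a list of (op, x, y) tuples, threading a capability context K."""
    K = {}
    for op, x, y in program:
        if op == "let":                # declare x with no capability
            K[x] = NONE
        elif op == "init":             # x becomes an owner of a fresh value
            K[x] = OWN
        elif op == "move":             # x <- y: K-Move-like transfer
            if K.get(y) != OWN:
                raise CapabilityError(f"{y} has no ownership to move")
            K[x], K[y] = OWN, NONE     # y loses its capability
        elif op == "use":              # reading x requires some capability
            if K.get(x) is NONE:
                raise CapabilityError(f"use of {x} after move")
    return K

ok = [("let", "a", None), ("init", "a", None),
      ("let", "b", None), ("move", "b", "a")]
assert check(ok)["b"] == OWN and check(ok)["a"] is NONE

try:
    check(ok + [("use", "a", None)])   # using the moved-out source fails
    raise AssertionError("expected a capability error")
except CapabilityError:
    pass
```

This mirrors the shape of K-Move-A/B: the target ends up with the ownership capability, the source ends up with none, and any later dereference of the source is flagged, which is the static counterpart of the moved-memory error in the dynamic semantics.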

Sequences

K-Seq
    K ⊢ t1 ⇓ c1, K1    K1 ⊢ t2 ⇓ c2, K2
    ─────────────────────────
    K ⊢ t1; t2 ⇓ c2, K2

Conditional terms

K-Cond
    K ⊢ t1 ⇓ c1, K1    K1 ⊢ t2 ⇓ ct, Kt    K1 ⊢ t3 ⇓ ce, Ke    K′ = merge(Kt, Ke)
    ──────────────────────────────────────
    K ⊢ if t1 then t2 else t3 ⇓ , K′

Functions

K-Args-0
    ──────────────────────
    K, A ⊢ ε ⇓args K, A

K-Args-N
    y ∉ dom(K)    K[y ↦ ] ⊢ y ⋄ t ⇓ cx, Kx    Kx, A[x ↦ cx] ⊢ u ⇓args K′, A′
    ──────────────────────────────────
    K, A ⊢ (x ⋄ t)u ⇓args K′, A′

K-Ret
    K ⊢ < ⋄ t ⇓ c, K′
    ──────────────────────
    K ⊢ ret ⋄ t ⇓ c, K′

K-Call-A
    f : τ    τ.codom.qual = own    K ⊢ f ⇓ cf, Kf    cf ≠     Kf, ∅ ⊢ u ⇓args K′, A′
    ─────────────────────
    K ⊢ f(u) ⇓ o, K′

K-Call-B
    f : τ    τ.codom.qual = brw    K ⊢ f ⇓ cf, Kf    cf ≠     Kf, ∅ ⊢ u ⇓args K′, A′
    ───────────────────────────────────
    K ⊢ f(u) ⇓ (b, owners(A′)), K′

Appendix B

Anzen

This appendix describes the concrete syntax of Anzen and of AIR.

B.1 Anzen’s Concrete Syntax

Below is given Anzen's concrete syntax, in the form of EBNF rules.

⟨program⟩ ::= ⟨stmt-list⟩
⟨stmt-list⟩ ::= { ⟨stmt⟩ ⟨stmt-sep⟩ }
⟨stmt-sep⟩ ::= ‘;’ | NEWLINE
⟨stmt⟩ ::= ⟨scope⟩ | ⟨prop-decl⟩ | ⟨fun-decl⟩ | ⟨struct-decl⟩ | ⟨while-stmt⟩ | ⟨ret-stmt⟩ | ⟨bind-stmt⟩ | ⟨expr⟩
⟨scope⟩ ::= ‘{’ ⟨stmt-list⟩ ‘}’
⟨prop-decl⟩ ::= (‘let’ | ‘var’) ⟨ident⟩ [ ‘:’ ⟨type-sign⟩ ] [ ⟨bind-op⟩ ⟨expr⟩ ]
⟨fun-decl⟩ ::= ‘fun’ ( ⟨ident⟩ | ⟨expr-op⟩ ) [ ‘<’ ⟨phldr-list⟩ ‘>’ ] ⟨fun-type⟩ ⟨scope⟩
⟨struct-decl⟩ ::= ‘struct’ ⟨ident⟩ [ ‘<’ ⟨phldr-list⟩ ‘>’ ] ‘{’ ⟨mbr-list⟩ ‘}’
⟨mbr-list⟩ ::= { ⟨struct-mbr⟩ ⟨stmt-sep⟩ }
⟨struct-mbr⟩ ::= ⟨field⟩ | ⟨mthd⟩ | ⟨ctor⟩ | ⟨dtor⟩
⟨field⟩ ::= (‘let’ | ‘var’) ⟨ident⟩ ‘:’ ⟨type-sign⟩
⟨mthd⟩ ::= [ ‘mutating’ ] ⟨fun-decl⟩
⟨ctor⟩ ::= ‘new’ ‘<’ ⟨phldr-list⟩ ‘>’ ⟨fun-type⟩ ⟨scope⟩
⟨dtor⟩ ::= ‘del’ ‘(’ ‘)’ ⟨scope⟩
⟨while-stmt⟩ ::= ‘while’ ⟨expr⟩ ⟨scope⟩
⟨ret-stmt⟩ ::= ‘return’ [ ⟨bind-op⟩ ⟨expr⟩ ]
⟨bind-stmt⟩ ::= ⟨expr⟩ ⟨bind-op⟩ ⟨expr⟩
⟨type-sign⟩ ::= ⟨qual-type⟩ | ‘(’ ⟨qual-type⟩ ‘)’
⟨qual-type⟩ ::= ⟨qual-list⟩ ⟨unqual-type⟩ | ⟨unqual-type⟩
⟨qual-list⟩ ::= ⟨qual⟩ { ⟨qual⟩ }
⟨qual⟩ ::= ‘@cst’ | ‘@mut’ | ‘@brw’ | ‘@own’
⟨unqual-type⟩ ::= ⟨ident⟩ [ ‘<’ ⟨spec-list⟩ ‘>’ ] | ⟨fun-type⟩
⟨fun-type⟩ ::= ‘(’ [ ⟨param-list⟩ ] ‘)’ ‘->’ ⟨type-sign⟩
⟨param-list⟩ ::= ⟨param⟩ { ‘,’ ⟨param⟩ } [ ‘,’ ]
⟨param⟩ ::= ⟨ident⟩ ‘:’ ⟨type-sign⟩
⟨phldr-list⟩ ::= ⟨ident⟩ { ‘,’ ⟨ident⟩ } [ ‘,’ ]
⟨spec-list⟩ ::= ⟨spec⟩ { ‘,’ ⟨spec⟩ } [ ‘,’ ]
⟨spec⟩ ::= ⟨ident⟩ ‘=’ ⟨unqual-type⟩
⟨expr⟩ ::= ⟨atom⟩ ⟨post-expr⟩
⟨post-expr⟩ ::= ‘as’ ⟨unqual-type⟩ | ⟨binary-op⟩ ⟨expr⟩ | ‘(’ ⟨expr⟩ ‘)’ | ‘.’ ⟨expr⟩
⟨atom⟩ ::= ‘(’ ⟨expr⟩ ‘)’ | ⟨unary-op⟩ ⟨expr⟩ | ⟨ident⟩ | ⟨if-expr⟩ | ⟨fun-expr⟩ | ⟨literal⟩
⟨ident⟩ ::= ( CHARACTER | ‘_’ ) { ALPHANUM | ‘_’ }
⟨if-expr⟩ ::= ‘if’ ⟨expr⟩ ⟨scope⟩ [ ‘else’ ⟨scope⟩ ]
⟨fun-expr⟩ ::= ‘fun’ ⟨fun-type⟩ ⟨scope⟩
⟨bind-op⟩ ::= ‘<-’ | ‘:=’ | ‘&-’
⟨expr-op⟩ ::= ⟨binary-op⟩ | ⟨unary-op⟩
⟨binary-op⟩ ::= ‘+’ | ‘-’ | ‘*’ | ‘/’ | ‘<’ | ‘<=’ | ‘==’ | ‘!=’ | ‘>=’ | ‘>’
⟨unary-op⟩ ::= ‘+’ | ‘-’ | ‘!’
⟨literal⟩ ::= ⟨int⟩ | ⟨float⟩ | ⟨str⟩ | ⟨bool⟩
⟨int⟩ ::= ( DIGIT - ‘0’ ) { DIGIT }
⟨float⟩ ::= ( ⟨int⟩ | ‘0’ ) ‘.’ ⟨int⟩
⟨str⟩ ::= ‘"’ { * - ‘"’ } ‘"’
⟨bool⟩ ::= ‘true’ | ‘false’

B.2 AIR's Concrete Syntax

B.2 AIR's Concrete Syntax

AIR's concrete syntax is given below, in the form of EBNF rules.

⟨program⟩ ::= ⟨struct-decls⟩ ⟨fun-decls⟩
⟨struct-decls⟩ ::= { ⟨struct-decl⟩ NEWLINE }
⟨struct-decl⟩ ::= ‘struct’ ⟨ident⟩ ⟨struct-body⟩
⟨struct-body⟩ ::= ‘{’ ⟨type-sign⟩ { ‘,’ ⟨type-sign⟩ } ‘}’
⟨fun-decls⟩ ::= { ⟨fun-decl⟩ NEWLINE }
⟨fun-decl⟩ ::= ‘fun’ ⟨ident⟩ ⟨fun-type⟩ ⟨fun-body⟩
⟨fun-body⟩ ::= ‘{’ { ⟨block⟩ } ‘}’
⟨block⟩ ::= ⟨label⟩ NEWLINE { ⟨inst⟩ NEWLINE }
⟨inst⟩ ::= ⟨alloc-inst⟩ | ⟨apply-inst⟩ | ⟨bind-inst⟩ | ⟨branch-inst⟩ | ⟨copy-inst⟩ | ⟨drop-inst⟩ | ⟨extract-inst⟩ | ⟨jump-inst⟩ | ⟨mkref-inst⟩ | ⟨move-inst⟩ | ⟨papply-inst⟩ | ⟨ret-inst⟩ | ⟨ucast-inst⟩
⟨alloc-inst⟩ ::= ⟨register⟩ ‘=’ ‘alloc’ ⟨type-sign⟩
⟨apply-inst⟩ ::= ⟨register⟩ ‘=’ ‘apply’ ⟨ref⟩ { ⟨arg-list⟩ }
⟨arg-list⟩ ::= ⟨value⟩ { ‘,’ ⟨value⟩ }
⟨bind-inst⟩ ::= ‘bind’ ⟨value⟩ ‘,’ ⟨register⟩
⟨branch-inst⟩ ::= ‘branch’ ⟨value⟩ ‘,’ ⟨label⟩ ‘,’ ⟨label⟩
⟨copy-inst⟩ ::= ‘copy’ ⟨value⟩ ‘,’ ⟨register⟩
⟨drop-inst⟩ ::= ‘drop’ ⟨register⟩
⟨extract-inst⟩ ::= ⟨register⟩ ‘=’ ‘extract’ ⟨register⟩ ‘,’ ⟨int⟩
⟨jump-inst⟩ ::= ‘jump’ ⟨label⟩
⟨mkref-inst⟩ ::= ⟨register⟩ ‘=’ ‘make_ref’ ⟨type-sign⟩
⟨move-inst⟩ ::= ‘move’ ⟨value⟩ ‘,’ ⟨register⟩
⟨papply-inst⟩ ::= ‘partial_apply’ ⟨ref⟩ { ⟨arg-list⟩ }
⟨ret-inst⟩ ::= ‘ret’ [ ⟨ref⟩ ]
⟨ucast-inst⟩ ::= ⟨register⟩ ‘=’ ‘ucast’ ⟨value⟩ ‘,’ ⟨type-sign⟩
⟨label⟩ ::= ‘#’ ⟨ident⟩
⟨value⟩ ::= ⟨ref⟩ | ⟨literal⟩
⟨ref⟩ ::= ⟨register⟩ | ⟨fun-ptr⟩
⟨register⟩ ::= ‘%’ ⟨int⟩
⟨fun-ptr⟩ ::= ‘@’ ⟨ident⟩
⟨type-sign⟩ ::= ⟨ident⟩ | ⟨fun-type⟩
⟨fun-type⟩ ::= ‘(’ [ ⟨param-list⟩ ] ‘)’ ‘->’ ⟨type-sign⟩
⟨param-list⟩ ::= ⟨param⟩ { ‘,’ ⟨param⟩ }
⟨param⟩ ::= ⟨ident⟩ ‘:’ ⟨type-sign⟩
⟨ident⟩ ::= ( CHARACTER | ‘_’ ) { ALPHANUM | ‘_’ }
⟨literal⟩ ::= ⟨int⟩ | ⟨float⟩ | ⟨str⟩ | ⟨bool⟩
⟨int⟩ ::= ( DIGIT - ‘0’ ) { DIGIT }
⟨float⟩ ::= ( ⟨int⟩ | ‘0’ ) ‘.’ ⟨int⟩
⟨str⟩ ::= ‘"’ { * - ‘"’ } ‘"’
⟨bool⟩ ::= ‘true’ | ‘false’
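For illustration, here is a small fragment derivable from these productions. The identifiers, labels, and the particular instruction sequence are arbitrary; the fragment only exercises the syntax of functions, labeled blocks, and a few instructions.

```
fun main () -> Int {
#entry
  %0 = alloc Int
  copy 42, %0
  branch %0, #then, #exit
#then
  %1 = make_ref Int
  bind %0, %1
  jump #exit
#exit
  ret %0
}
```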

Appendix C

Full Proof of Property 6.3.2

Let K be a well-formed capability context. If K ⊢ t ⇓ c, K′, then K′ is well-formed.

Proof. The proof is by induction over the typing judgements.

• Atoms
  – Case K-Let
    1. The proof is trivial, by the fact that K is not modified.
• Functions
  – Case K-Fun
    1. The proof is trivial, by the fact that K is not modified.
• Variables' dereferencing
  – Case K-Var
    1. The proof is trivial, by the fact that K is not modified.
• Variable declarations
  – Case K-Let
    1. K[x ↦ ⊥] is well-formed.
    2. By 1 and the induction hypothesis, K[x ↦ ⊥] ⊢ t ⇓ c, K′ implies that K′ is well-formed.
• Aliasing assignments
  – Case K-Alias-A
    1. By the induction hypothesis, K ⊢ y ⇓ c, K′ implies that K′ is well-formed.


    2. By definition, y!K′ implies that K′.y = o.
    3. ∄z ∈ dom(K′), K′.z = (b, O) ∧ x ∈ O,
       (a) either by definition if x!K′,
       (b) or by contradiction, as K′.x ≠ o ∧ ∃z, K′.z = (b, O) ∧ x ∈ O implies that K′ is not well-formed.
    4. By 1, 2, and 3, K′[x ↦ (b, {y})] is well-formed.
  – Case K-Alias-B
    1. By the induction hypothesis, K ⊢ t ⇓ c, K′ implies that K′ is well-formed.
    2. ∄z ∈ dom(K′), K′.z = (b, O) ∧ x ∈ O,
       (a) either by definition if x!K′,
       (b) or by contradiction, as K′.x ≠ o ∧ ∃z, K′.z = (b, O) ∧ x ∈ O implies that K′ is not well-formed.
    3. By 1 and 2, K′[x ↦ (b, {y})] is well-formed.
• Mutation assignments
  – Case K-Mut-A
    1. By the induction hypothesis, K ⊢ t ⇓ c, K′ implies that K′ is well-formed.
    2. By 1, K′[x ↦ o] is well-formed.
  – Case K-Mut-B
    1. By the induction hypothesis, K ⊢ t ⇓ c, K′ implies that K′ is well-formed.
• Move assignments
  – Case K-Move-A
    1. By the induction hypothesis, K ⊢ y ⇓ c, K′ implies that K′ is well-formed.
    2. By definition, y!K′ implies that ∄z ∈ dom(K′), K′.z = (b, O) ∧ y ∈ O.
    3. By 1 and 2, K′[x ↦ o, y ↦ ⊥] is well-formed.
  – Case K-Move-B
    1. By the induction hypothesis, K ⊢ y ⇓ c, K′ implies that K′ is well-formed.

    2. By definition, y!K′ implies that ∄z ∈ dom(K′), K′.z = (b, O) ∧ y ∈ O.
    3. By 1 and 2, K′[x ↦ o, y ↦ ⊥] is well-formed.
  – Case K-Move-C
    1. By the induction hypothesis, K ⊢ y ⇓ c, K′ implies that K′ is well-formed.
    2. By 1, K′[x ↦ o] is well-formed.
  – Case K-Move-D
    1. By the induction hypothesis, K ⊢ y ⇓ c, K′ implies that K′ is well-formed.
• Function applications
  1. If K is well-formed and K, A ⊢ u ⇓args K′, A′, then K′ is well-formed.
     (a) The proof is trivial for the case K-Args-0.
     (b) Case K-Args-N
         i. y ∉ dom(K) implies that K[y ↦ ⊥] is well-formed.

         ii. By the induction hypothesis, K[y ↦ ⊥] ⊢ y ← t ⇓ cx, Kx implies that Kx is well-formed.

         iii. By 1.b.ii and the induction hypothesis, Kx, A[x ↦ cx] ⊢ u ⇓args K′, A′ implies that K′ is well-formed.
  2. Cases K-Call-A and K-Call-B

     (a) By the induction hypothesis, K ⊢ f ⇓ cf, Kf implies that Kf is well-formed.
     (b) By 1 and the induction hypothesis, Kf, ∅ ⊢ u ⇓args K′, A′ implies that K′ is well-formed.
• Return statements
  – Case K-Ret
    1. By the induction hypothesis, K ⊢ ret ← t ⇓ c, K′ implies that K′ is well-formed.
• Sequences
  – Case K-Seq

    1. By the induction hypothesis, K ⊢ t1 ⇓ c1, K1 implies that K1 is well-formed.

    2. By 1 and the induction hypothesis, K1 ⊢ t2 ⇓ c2, K2 implies that K2 is well-formed.
• Conditional statements
  – Case K-Cond

    1. By the induction hypothesis, K ⊢ t1 ⇓ c1, K1 implies that K1 is well-formed.

    2. By 1 and the induction hypothesis, K1 ⊢ t2 ⇓ ct, Kt implies that Kt is well-formed.

    3. By 1 and the induction hypothesis, K1 ⊢ t3 ⇓ ce, Ke implies that Ke is well-formed.
    4. By 2 and 3, K′ = merge(Kt, Ke) implies that K′ is well-formed.
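The well-formedness invariant that the proof tracks can also be rendered as a small executable sketch. The encoding below is a simplification introduced purely for illustration (the function name well_formed and the Python representation of capabilities are not part of the thesis): a context maps each variable to "o" (it holds its object), to a pair ("b", aliases) (it borrows from the variables in aliases), or to None (no capability, written ⊥ above).

```python
def well_formed(K):
    """A context is well-formed when every variable appearing in some
    alias set holds the capability "o" in K (cf. step 3 of K-Alias-A)."""
    for cap in K.values():
        if isinstance(cap, tuple):
            _, aliased = cap
            for y in aliased:
                if K.get(y) != "o":
                    return False
    return True

# Declaring a fresh variable (K-Let) maps it to ⊥ (None here) and
# preserves the invariant.
K = {"y": "o"}
K["x"] = None
assert well_formed(K)

# An aliasing assignment (K-Alias-A) is admitted because the aliased
# variable owns its object; aliasing a non-owner would break the invariant.
K["x"] = ("b", frozenset({"y"}))
assert well_formed(K)

K_bad = {"y": None, "x": ("b", frozenset({"y"}))}
assert not well_formed(K_bad)
```

The sketch only checks the static shape of a context; the proof's real content is that every typing rule transforms one such well-formed context into another.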