Simplifying the Analysis of C++ Programs
Total Page:16
File Type:pdf, Size:1020Kb
SIMPLIFYING THE ANALYSIS OF C++ PROGRAMS A Dissertation by YURIY SOLODKYY Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Chair of Committee, Bjarne Stroustrup Committee Members, Jaakko J¨arvi Lawrence Rauchwerger Sergiy Butenko Head of Department, Duncan M. (Hank) Walker August 2013 Major Subject: Computer Science and Engineering Copyright 2013 Yuriy Solodkyy ABSTRACT Based on our experience of working with different C++ front ends, this thesis identifies numerous problems that complicate the analysis of C++ programs along the entire spectrum of analysis applications. We utilize library, language, and tool extensions to address these problems and offer solutions to many of them. In particular, we present efficient, expressive and non-intrusive means of dealing with abstract syntax trees of a program, which together render the visitor design pattern obsolete. We further extend C++ with open multi-methods to deal with the broader expression problem. Finally, we offer two techniques, one based on refining the type system of a language and the other on abstract interpretation, both of which allow developers to statically ensure or verify various run-time properties of their programs without having to deal with the full language semantics or even the abstract syntax tree of a program. Together, the solutions presented in this thesis make ensuring properties of interest about C++ programs available to average language users. ii ACKNOWLEDGEMENTS Knowledge is in the end based on acknowledgement. Ludwig Wittgenstein This Ph.D. thesis would not have been possible without direct or indirect support of many indi- viduals who I was fortunate enough to meet through life and be friends with. I hope I did not forget anyone, as these were the people who taught me values, shown new horizons and inspired me with their encouragements. First and foremost, I would like to express all my gratitude to my dissertation advisor and mentor, Bjarne Stroustrup, for believing in me despite my initial failures, and for creating an excellent research environment in the Programming Techniques, Tools and Languages Group that stimulates creative thinking, fosters new ideas and promotes collaboration. I would also like to thank him for creating C++, which became my hobby, my research, my bread and butter and my lingua franca to understanding other programming languages. I am very grateful to Jaakko J¨arvi,who has been my unofficial supervisor all this time not only in computer science, but also in squash and writing. I would also like to thank Gabriel Dos Reis for forcing me to take his classes, which always were a source of great knowledge and research ideas. I am greatly indebted to Lawrence Rauchwerger for teaching me to listen and encouraging me to pursue my dreams. And I would like to thank Sergiy Butenko for helping me never feel homesick as well as being my emergency contact on all the forms, though probably without knowing it. I am particularly thankful for all the discussions, collaborations and exchange of ideas we had with Peter Pirkelbauer, Andrew Sutton, Abe Skolnik, Luke Wagner, Jason Wilkins, Damian Dechev, Jacob Smith, Yue Li, Stephen Cook, John Freeman, Michael Lopez, Xiaolong Tang, Gabriel Foust, Olga & Roger Pearce, Mani Zandifar, Troy McMahon, Ioannis Papadopoulos, Nathan Thomas, Timmie Smith and many others in the Parasol Lab. My doctoral studies would not have been possible without support and endorsement from my Master thesis advisor Volodymyr Protsenko, as well as professors Yuriy Belov and Mykola Glybovets. I am also greatly indebted to Roman Pyrtko, Khrystyna Lavriv, Igor Panchuk and Yaroslav Vasylenko who taught me true values during times when no one cared about those. I would like to express my gratitude to all the friends who encouraged me through the years, no matter how long I did not answer their emails or return their phone calls: Alexander Boyko, Natasha Gulayeva, Roman Lototskyy, Iryna & Anna Khapko, Konstantyn Volyanskyy, Svyatoslav Trukhanov, Irina Shatruk, Jorge Dorribo Camba, Karen Triff, Zhanna Nepiyushchikh, Olga & Anatoliy Gashev, Mike & Elena Kolomiets, Ian Murray, Amineh Kamranzadeh, Gregory Berkolaiko and many, many others who always reminded me that there is more to life than grad school. I would like to reserve my special thanks to Vahideh whose patience, support and encouragement were the driving force behind completing this thesis. Finally, I am forever grateful to my parents, Iryna and Myroslav, as well as my brother Eugene for their endless encouragement and love. This work is dedicated to them. iii NOMENCLATURE ABI Application Binary Interface ACM Association for Computing Machinery ADT Algebraic Data Type AST Abstract Syntax Tree DAG Directed Acyclic Graph DLL Dynamically Linked Library DSL Domain-Specific Language EDG Edison Design Group GADT Generalized Algebraic Data Type GCC GNU Compiler Collection I/O Input/Output IDE Integrated Development Environment IEEE Institute of Electrical and Electronics Engineers IPR Internal Program Representation ISO International Organization for Standardization LOC Lines Of Code LLVM Low Level Virtual Machine MPL Boost Metaprogramming Library OMD Open-Method Description OOP Object-Oriented Programming RTTI Run-Time Type Information SAIL SAIL Abstract Interpretation Library SELL Semantically Enhanced Library Language STL Standard Template Library XML eXtensible Markup Language XPR eXternal Program Representation XTL eXtensible Typing Library iv TABLE OF CONTENTS Page ABSTRACT . ii ACKNOWLEDGEMENTS . iii NOMENCLATURE . iv TABLE OF CONTENTS . v LIST OF FIGURES . viii LIST OF TABLES . ix 1. INTRODUCTION . 1 1.1 Motivation . 1 1.2 Problem Description . 2 1.2.1 AST Complexity . 2 1.2.2 Dealing with AST . 3 1.2.3 Analysis Complexity . 3 1.2.4 Language Complexity . 5 1.2.5 Challenges and Opportunities . 5 1.3 Results . 6 1.4 Contributions . 6 1.5 Overview of the Thesis . 7 2. PRELIMINARIES1 ............................................. 9 2.1 Mathematical Notation . 9 2.2 Substitutability . 10 2.3 Classes, Objects and Subobjects . 11 2.3.1 Subobjects . 12 2.3.2 Virtual Table Pointers . 14 2.4 Algebraic Data Types . 17 2.4.1 Algebraic Data Types in C++ ................................ 18 2.4.2 Hierarchical Extensible Data Types . 20 2.5 Pattern Matching . 20 2.6 Expression Problem . 23 2.7 Visitor Design Pattern . 24 2.8 C++ Templates . 26 2.8.1 Template Metaprogramming . 27 2.8.2 Expression Templates . 28 2.8.3 Concepts . 30 3. RELATED WORK1 ............................................. 32 3.1 C++ Parsers and Front Ends . 32 3.1.1 Pivot . 33 3.2 Internal Approaches . 33 3.3 External Approaches . 34 3.4 Multiple Dispatch . 34 3.5 Type Switching . 36 3.6 Pattern Matching . 37 3.7 Abstract Interpretation . 39 3.7.1 Non-Relational Abstract Domains . 40 3.7.2 Relational Symbolic Abstract Domains . 41 4. OPEN MULTI-METHODS1 ........................................ 43 v 4.1 Introduction . 44 4.1.1 Summary . 45 4.2 Application Domains . 45 4.2.1 Shape Intersection . 45 4.2.2 Data Format Conversion . 45 4.2.3 Type Conversions in Scripting Languages . 46 4.2.4 Dealing with an AST . 47 4.2.5 Binary Method Problem . 47 4.2.6 Selection of an Optimal Algorithm Based on Dynamic Properties of Objects . 47 4.2.7 Action Systems . 48 4.2.8 Extending Classes with Operations . 49 4.2.9 Open-Methods Programming . 50 4.3 Definition of Open Methods . 51 4.3.1 Type Checking and Call Resolution of Open-Methods . 52 4.3.2 Overload Resolution . 52 4.3.3 Ambiguity Resolution . 53 4.3.4 Covariant Return Types . 55 4.3.5 Algorithm for Dispatch Table Generation . 56 4.3.6 Alternative Dispatch Semantics . 59 4.4 Design Choices and Decisions . 61 4.4.1 Late Ambiguities . 62 4.4.2 Consistency of Covariant Return Types . 64 4.4.3 Pure Open-Methods . ..