PROGRAMMING LANGUAGE EVOLUTION and SOURCE CODE REJUVENATION a Dissertation by PETER MATHIAS PIRKELBAUER Submitted to the Office
Total Page:16
File Type:pdf, Size:1020Kb
PROGRAMMING LANGUAGE EVOLUTION AND SOURCE CODE REJUVENATION A Dissertation by PETER MATHIAS PIRKELBAUER Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY December 2010 Major Subject: Computer Science PROGRAMMING LANGUAGE EVOLUTION AND SOURCE CODE REJUVENATION A Dissertation by PETER MATHIAS PIRKELBAUER Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Approved by: Chair of Committee, Bjarne Stroustrup Committee Members, Gabriel Dos Reis Lawrence Rauchwerger Jaakko J¨arvi Weiping Shi Head of Department, Valerie Taylor December 2010 Major Subject: Computer Science iii ABSTRACT Programming Language Evolution and Source Code Rejuvenation. (December 2010) Peter Mathias Pirkelbauer, Dipl.-Ing., Johannes-Kepler Universit¨atLinz, Austria; M.B.A., Texas A&M University Chair of Advisory Committee: Dr. Bjarne Stroustrup Programmers rely on programming idioms, design patterns, and workaround techniques to express fundamental design not directly supported by the language. Evolving languages often address frequently encountered problems by adding language and library support to subsequent releases. By using new features, programmers can express their intent more directly. As new concerns, such as parallelism or security, arise, early idioms and language facilities can become serious liabilities. Modern code sometimes benefits from optimization techniques not feasible for code that uses less expressive constructs. Manual source code migration is expensive, time-consuming, and prone to errors. This dissertation discusses the introduction of new language features and li- braries, exemplified by open-methods and a non-blocking growable array library. We describe the relationship of open-methods to various alternative implementation tech- niques. The benefits of open-methods materialize in simpler code, better performance, and similar memory footprint when compared to using alternative implementation techniques. Based on these findings, we develop the notion of source code rejuvenation, the automated migration of legacy code. Source code rejuvenation leverages enhanced program language and library facilities by finding and replacing coding patterns that iv can be expressed through higher-level software abstractions. Raising the level of abstraction improves code quality by lowering software entropy. In conjunction with extensions to programming languages, source code rejuvenation offers an evolutionary trajectory towards more reliable, more secure, and better performing code. We describe the tools that allow us efficient implementations of code rejuvena- tions. The Pivot source-to-source translation infrastructure and its traversal mecha- nism forms the core of our machinery. In order to free programmers from representa- tion details, we use a light-weight pattern matching generator that turns a C++ like input language into pattern matching code. The generated code integrates seamlessly with the rest of the analysis framework. We utilize the framework to build analysis systems that find common workaround techniques for designated language extensions of C++0x (e.g., initializer lists). More- over, we describe a novel system (TACE | template analysis and concept extraction) for the analysis of uninstantiated template code. Our tool automatically extracts requirements from the body of template functions. TACE helps programmers under- stand the requirements that their code de facto imposes on arguments and compare those de facto requirements to formal and informal specifications. v TABLE OF CONTENTS CHAPTER Page I INTRODUCTION :::::::::::::::::::::::::: 1 I.A. What Is Source Code Rejuvenation? . 2 I.A.1. Applications . 3 I.B. Case Study . 4 I.B.1. Initializer Lists . 4 I.C. Statement of Contribution . 6 II LITERATURE REVIEW :::::::::::::::::::::: 9 II.A. Tool Support . 9 II.B. Software Analysis . 12 II.C. Rejuvenation . 18 II.D. Refactoring . 21 II.D.1. Source Code Rejuvenation Versus Refactoring . 23 II.E. Multiple-Dispatch Extensions . 24 III TOOL SUPPORT :::::::::::::::::::::::::: 28 III.A. Pivot . 29 III.A.1. Internal Program Representation . 30 III.A.1.a. Implementation . 31 III.A.2. eXternal Program Representation . 35 III.B. Support Libraries . 36 III.B.1. Compile-Time Meta Information . 37 III.B.2. A Generic Visitor . 42 III.B.3. Traversal Library . 43 III.B.4. Equality Comparison . 46 III.B.5. Implementation . 47 III.C. A Lightweight Pattern Matching Tool . 48 III.C.1. Motivation . 49 III.C.2. Input Language . 50 III.C.2.a. Simple Patterns . 51 III.C.2.b. Pattern Variables . 53 III.C.2.c. Matching Sequences . 55 III.C.2.d. Pattern Calls . 57 vi CHAPTER Page III.C.2.e. Pattern Arguments . 58 III.C.2.f. Matching Order . 60 III.C.2.g. Ambiguity Checking and Resolution . 60 III.C.3. Output . 62 III.C.3.a. Tree Matching . 63 III.C.3.b. String Matching . 64 III.C.3.c. Runtime Engine . 65 III.C.3.d. Comparison . 65 III.C.4. Future Work . 66 IV THE OPEN-METHODS LANGUAGE EXTENSIONS ::::: 68 IV.A. Introduction . 68 IV.B. Application Domains . 70 IV.B.1. Shape Intersection . 71 IV.B.2. Data Format Conversion . 71 IV.B.3. Type Conversions in Scripting Languages . 73 IV.B.4. Compiler Pass over an AST . 73 IV.B.5. Binary Method Problem . 74 IV.B.6. Algorithm Selection Using Dynamic Properties of Objects . 75 IV.B.7. Action Systems . 77 IV.B.8. Extending Classes with Operations . 78 IV.B.9. Open-Methods Programming . 79 IV.C. Definition of Open Methods . 81 IV.C.1. Type Checking and Call Resolution of Open-Methods 83 IV.C.2. Overload Resolution . 84 IV.C.3. Ambiguity Resolution . 85 IV.C.3.a. Single Inheritance . 86 IV.C.3.b. Repeated Inheritance . 86 IV.C.3.c. Virtual Inheritance . 88 IV.C.4. Covariant Return Types . 89 IV.C.5. Algorithm for Dispatch Table Generation . 91 IV.C.6. Alternative Dispatch Semantics . 94 IV.D. Discussion of Design Decisions . 96 IV.D.1. Late Ambiguities . 97 IV.D.2. Consistency of Covariant Return Types . 103 IV.D.3. Pure Open-Methods . 105 IV.E. Relation to Orthogonal Features . 105 vii CHAPTER Page IV.E.1. Namespace . 105 IV.E.2. Access Privileges . 107 IV.E.3. Smart Pointers . 107 IV.F. Implementation . 108 IV.F.1. Changes to Compiler and Linker . 108 IV.F.2. Changes to the Object Model . 109 IV.F.3. Alternative Approaches . 112 IV.F.3.a. Multi-Methods . 112 IV.F.3.b. Chinese Remainders . 113 V COMPARISON WITH WORKAROUND TECHNIQUES :::: 120 V.A. Background on Runtime Concepts . 120 V.A.1. Introduction . 121 V.A.2. Generic Programming . 123 V.A.2.a. Standard Template Library . 125 V.A.3. Runtime Concepts for STL Iterators . 126 V.A.3.a. Concept Layer . 127 V.A.3.b. Model Layer . 128 V.A.3.c. Wrapper Layer . 130 V.A.4. The Algorithms Library . 131 V.A.4.a. Design Goals . 132 V.A.5. Implementation in ISO C++ . 133 V.A.6. Implementation in C++ with Open Multi-Methods 140 V.A.7. Tests . 144 V.A.7.a. Algorithm Performance . 145 V.A.7.b. Library Function Selection . 147 V.A.8. Related Work . 149 V.B. Comparison of Programming Styles . 150 V.B.1. Image Format Conversion . 150 V.B.2. AST Traversal . 154 V.C. Comparison with Other Implementation Techniques . 158 V.C.1. Implementations . 159 V.C.2. Results and Interpretation . 160 VI RECOVERY OF HIGHER LEVEL LANGUAGE ABSTRAC- TIONS :::::::::::::::::::::::::::::::: 163 VI.A. Semantics and Workaround Definition . 163 VI.A.1. Semantic Equality . 164 viii CHAPTER Page VI.B. Workaround Recognition . 166 VI.B.1. Patterns of Initializer List Construction . 166 VI.B.2. The State Machine . 169 VI.C. Results . 170 VI.C.1. Tests . 170 VI.C.2. Discussion . 171 VII REJUVENATION OF LIBRARY USE :::::::::::::: 172 VII.A. Introduction . 172 VII.A.1. Non-Blocking Synchronization . 173 VII.A.2. Non-Blocking Data Structures . 174 VII.A.3. A Non-Blocking Vector . 174 VII.B. Implementation . 175 VII.B.1. Implementation . 176 VII.B.1.a. The ABA Problem . 178 VII.C. Performance Evaluation . 181 VII.D. Future Work . 185 VIII TEMPLATE ANALYSIS AND CONCEPT EXTRACTION ::: 186 VIII.A.Introduction . 186 VIII.B.Concepts for C++ . 190 VIII.C.Requirements Extraction . 194 VIII.C.1. Evaluation of Expressions . 194 VIII.C.2. Evaluation of Declarations and Statements . 198 VIII.C.3. Evaluation of Class Instantiations . 199 VIII.C.4. Examples . 200 VIII.D.From Requirements to Concepts . 202 VIII.D.1. Function Calls . 202 VIII.D.2. Result Type Reduction . 203 VIII.E.Concept Matching with Repository . 205 VIII.E.1. Concept Kernel . 206 VIII.E.2. Concept Matching . 207 VIII.E.3. Families of Functions . 210 VIII.F.Results . 212 VIII.G.Related Work . 213 VIII.H.Future Work . 214 VIII.H.1. Concept Utilization Analysis . 214 VIII.H.2. Automatic Lifting of Non-Template Code . 218 ix CHAPTER Page IX CONCLUSION AND FUTURE WORK :::::::::::::: 219 REFERENCES ::::::::::::::::::::::::::::::::::: 221 APPENDIX A ::::::::::::::::::::::::::::::::::: 247 VITA :::::::::::::::::::::::::::::::::::::::: 254 x LIST OF TABLES TABLE Page I Source code rejuvenation vs. refactoring :::::::::::::::: 23 II Support libraries overview :::::::::::::::::::::::: 47 III Pattern matching strategies ::::::::::::::::::::::: 66 IV Dispatch table with repeated base classes ::::::::::::::: 87 V Dispatch table with virtual base class :::::::::::::::::: 89 VI Resolution with covariance :::::::::::::::::::::::: 90 VII Call resolution in D1 and D2 :::::::::::::::::::::: 103 VIII D1 and D2 loaded together ::::::::::::::::::::::: 104 IX Dispatch table with repeated base classes :::::::::::::::