Engineering of Reliable and Secure Software Via Customizable Integrated Compilation Systems
Total Page:16
File Type:pdf, Size:1020Kb
Engineering of Reliable and Secure Software via Customizable Integrated Compilation Systems Zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften von der KIT-Fakultät für Informatik des Karlsruher Institut für Technologie (KIT) genehmigte Dissertation von Dipl.-Inf. Oliver Scherer Tag der mündlichen Prüfung: 06.05.2021 1. Referent: Prof. Dr. Veit Hagenmeyer 2. Referent: Prof. Dr. Ralf Reussner This document is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/deed.en Abstract Lack of software quality can cause enormous unpredictable costs. Many strategies exist to prevent or detect defects as early in the development process as possible and can generally be separated into proactive and reactive measures. Proactive measures in this context are schemes where defects are avoided by planning a project in a way that reduces the probability of mistakes. They are expensive upfront without providing a directly visible benefit, have low acceptance by developers or don’t scale with the project. On the other hand, purely reactive measures only fix bugs as they are found and thus do not yield any guarantees about the correctness of the project. In this thesis, a new method is introduced, which allows focusing on the project specific issues and decreases the discrepancies between the abstract system model and the final software product. The first component of this method isa system that allows any developer in a project to implement new static analyses and integrate them into the project. The integration is done in a manner that automatically prevents any other project developer from accidentally violating the rule that the new static analysis checks. The second component is a way to directly integrate system models (e.g. from UML) into the project by treating the model as a direct input to the compiler, just like any other source code. These two components together allow developers to handle complex situations that are only relevant to the given project. The entire project gets analyzed for the correct usage of nontrivial APIs or other hazards which either are bugs or are likely to turn into bugs in future refactorings. Thus, the new method permits the incremental introduction of formal analysis without forcing a project’s developers to change to unfamiliar habits or styles. At the same time, it allows preventing classes of defects automatically, yielding immediate gains from the first usage of the new method. Contents 1 Introduction 1 1.1 Motivation ................................. 1 1.2 Overview over the state of the art of reliable and secure software ... 3 1.3 Open aspects of existing solutions .................... 4 1.4 Goals of this thesis ............................ 5 2 State of the art of reliable and secure software 7 2.1 Reliability ................................. 7 2.1.1 Availability ............................ 8 2.1.2 Redundancy ............................ 9 2.1.3 Security .............................. 10 2.1.4 Comprehensibility ........................ 10 2.2 Software engineering of reliable systems ................. 10 2.2.1 Software development process .................. 11 2.2.2 Continuous integration ...................... 11 2.2.3 Model checking .......................... 13 2.2.4 Typestate ............................. 13 2.2.5 Restricting Programming Languages to a safe subset ..... 15 2.2.6 Static Analyses .......................... 20 2.2.7 Project Specific Static Analyses ................. 21 2.2.8 Safety through Programming Languages ............ 22 2.2.9 Compiler extensions ....................... 23 3 New concept of an extendable compilation system with integrated static analysis 36 3.1 Components of compilation systems ................... 36 3.2 Automation in software engineering ................... 38 3.3 Analysis of requirements for extensible compilers ........... 38 3.3.1 Coverage of language constructs ................. 39 3.3.2 Direction of dataflow ....................... 39 3.4 Features enabled by the new concept .................. 41 3.4.1 Employing static analyses .................... 41 3.4.2 Model driven development .................... 47 3.4.3 Checking code against models .................. 48 3.4.4 Code generation from models .................. 49 3.4.5 Code generation vs code checking ................ 49 3.4.6 Abstract interpretation ...................... 50 3.5 Project specific compiler extensions instead of generally applicable tools .................................... 51 3.6 Lint categories .............................. 53 3.7 (Semi-)automatically resolving static analysis results ......... 55 3.8 Incremental steps towards full formal analysis ............. 55 3.9 Summary ................................. 56 4 Comparison of the new concept with related works 58 4.1 Components of compilation systems ................... 59 4.2 Automation in software engineering ................... 60 4.3 Compiler extensions ........................... 61 4.4 Formal analysis .............................. 61 4.5 Model driven development ........................ 62 4.5.1 Code generation ......................... 62 4.5.2 Manual transcription ....................... 62 4.5.3 New concept ........................... 62 4.6 Static analyses .............................. 63 5 Implementation of the components of the new concept 64 5.1 Integrating static analyses into the development process ....... 65 5.1.1 Developing a new static analysis ................ 65 5.1.2 Choosing a responsiveness level ................. 66 5.1.3 Forward compatibility of compiler extensions ......... 67 5.1.4 Diagnostics ............................ 67 5.1.5 Problem resolution instructions ................. 68 5.2 Model driven development ........................ 69 5.2.1 Code generation ......................... 69 5.2.2 Code checking .......................... 74 5.3 Abstract interpretation .......................... 76 5.3.1 Running abstract interpretation on test suites ......... 77 5.3.2 Restraining the virtual machine ................. 78 5.3.3 Employing abstract interpretation for debugging purposes .. 80 5.3.4 Summary ............................. 81 6 Aspects of the practical application of the new concept 82 6.1 Role of false positives on the usability of static analyses ........ 82 6.1.1 Stability across compiler changes ................ 85 6.2 Compiler extension schemes ....................... 89 6.3 Generating static analyses from examples ................ 89 6.3.1 The problem ........................... 89 6.3.2 The new tool and its usage ................... 90 6.4 Community driven database ....................... 93 7 Evaluation 96 7.1 Analysis of the goals of this thesis .................... 96 7.2 Usability study .............................. 97 7.2.1 Collected data .......................... 98 7.2.2 Analysis .............................. 98 7.2.3 Limits of this study ....................... 98 8 Conclusion 99 8.1 Contributions ............................... 101 8.2 Limits ................................... 102 8.3 Outlook .................................. 102 Abbreviations 104 Bibliography 106 A Semantic versioning 118 B Forward compatibility guarantees 119 C The Rust programming language 121 C.1 Basics ................................... 121 C.2 Constructors ............................... 121 C.3 Destructuring ............................... 122 C.4 Refutable destructuring ......................... 123 C.5 Ownership ................................. 124 D Rustc internals 126 D.1 AST Items ................................. 126 D.2 AST Types ................................ 127 D.3 AST Expressions ............................. 127 E Pointer equality operation during abstract interpretation 131 F Code generation from model 132 G Custom driver for project specific static analyses 135 H init function call before foo function call demo 137 I Survey about the ease of tool integration 145 J Full list of related publications 146 J.1 Peer reviewed conferences and journals ................. 146 J.2 Language design and standardization (peer reviewed white papers) . 146 J.3 Peer reviewed open source ........................ 147 Chapter 1 Introduction 1.1 Motivation Adding new ways to access any system, be it physical, informational or digital, comes with the inherent risk of giving access to unauthorized parties. Physical security has been a part of humanity since its dawn and part of the animal kingdom before that. Protecting offspring or property via preparation has clearly been a beneficial trait. Even before the existence of writing knowledge has been kept secret or hidden in code words. Communicating information without fear of interception by either the messenger or third parties was an issue well before the beginning of our calendar [63]. But only with the invention of automated information processing has the protection against malicious information become an end in itself. When it became possible to connect to computers via phone lines, systems did not have any IT-security (from here on just referred to as “security”) concept except for the secrecy of the phone number used to connect to it. Thus, the cat and mouse game of inventing new attacks and defending against them began with automated systems calling every possible phone number to find computers. After these systems were considered reasonably secure, the internet opened a new avenue of attack, because a system was not limited to