Refinement-Based Program Analysis Tools

Total pages: 16

File type: PDF, size: 1020 KB

Refinement-Based Program Analysis Tools

Manu Sridharan
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2007-125
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-125.html

October 23, 2007

Copyright © 2007, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Refinement-Based Program Analysis Tools

by Manu Sridharan

B.S. (Massachusetts Institute of Technology) 2001
M.Eng. (Massachusetts Institute of Technology) 2002

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley.

Committee in charge:
Professor Rastislav Bodík, Chair
Professor George Necula
Professor Leo Harrington

Fall 2007

Abstract

Refinement-Based Program Analysis Tools
by Manu Sridharan
Doctor of Philosophy in Computer Science
University of California, Berkeley
Professor Rastislav Bodík, Chair

Program analysis tools are starting to change how software is developed. Verifiers can now eliminate certain complex bugs in large code bases, and automatic refactoring tools can greatly simplify code cleanup. Nevertheless, writing robust large-scale software remains a challenge, as greater use of component frameworks complicates debugging and program understanding. Developers need more powerful programming tools to combat this complexity and produce reliable code.

This dissertation presents two techniques for more powerful debugging and program-understanding tools based on refinement. In general, refinement-based techniques aim to discover interesting properties of a large program by reasoning precisely only about its most important parts (typically a small amount of code), abstracting away the behavior of the rest of the program. Our key contribution is the first framework for effective refinement-based handling of object-oriented data structures; pervasive use of such data structures thwarts the effectiveness of most existing analyses and tools.

Our two refinement-based techniques significantly advance the state of the art in program analyses and tools for object-oriented languages. The first technique is a refinement-based points-to analysis that can compute precise answers in interactive time. This analysis enables precise reasoning about data-structure behavior in clients that require interactive performance, e.g., just-in-time compilers and integrated development environment (IDE) tools. Our second technique is thin slicing, which gives usable answers to code-relevance questions (e.g., "What code might have caused this crash?"), addressing a long-standing challenge for analysis tools. In this dissertation, we describe the design and implementation of these techniques and present experimental evaluations that validate their key properties.

Contents

List of Figures
List of Tables

1 Introduction
  1.1 Our Approach: Refinement
  1.2 Refinement-Based Points-To Analysis
    1.2.1 Motivation
    1.2.2 A Brief History of Points-To Analysis
    1.2.3 Our Approach
  1.3 Thin Slicing
    1.3.1 Motivation
    1.3.2 Our Approach
  1.4 Dissertation Overview

2 Points-To Analysis Background
  2.1 Points-To Analysis Definition
  2.2 Points-To Analysis Precision
    2.2.1 Assignments
    2.2.2 Field Accesses
    2.2.3 Call Graph
    2.2.4 Method Calls
    2.2.5 Heap Abstraction
    2.2.6 Control Flow

3 Points-To Analysis Formulations
  3.1 Context-Free Language Reachability
  3.2 Context-Insensitive Formulation
    3.2.1 Graph Representation
    3.2.2 Analysis Grammar
    3.2.3 Other Java Language Features
  3.3 Context-Sensitive Formulation
  3.4 On-The-Fly Call Graph Formulation
    3.4.1 Intuition
    3.4.2 Graph Representation
    3.4.3 The LOTF Language
    3.4.4 Adding Context Sensitivity
    3.4.5 Discussion

4 Context-Insensitive Points-To Analysis
  4.1 Algorithm Overview
    4.1.1 Demand-Driven Points-To Analysis
    4.1.2 Client-Driven Refinement
    4.1.3 Simplified Formulation
    4.1.4 Refinement Algorithm
    4.1.5 Proofs of Termination and Soundness
  4.2 Regular Approximation
    4.2.1 Regular Reachability
    4.2.2 RegularPT
    4.2.3 Improving Precision with Types
  4.3 Refinement
    4.3.1 Refining Through Match Edge Removal
    4.3.2 RefinedRegularPT
  4.4 Evaluation
    4.4.1 Experimental Configuration
    4.4.2 Experimental Results
  4.5 Java vs. C
  4.6 FullFS Details
  4.7 Adaptation of Tabulation Algorithm
    4.7.1 Tabulation Algorithm for Points-To Analysis
    4.7.2 Discussion

5 Context-Sensitive Points-To Analysis
  5.1 Algorithm Overview
    5.1.1 Simplified Formulation
    5.1.2 Context-Sensitive Refinement Algorithm
    5.1.3 Refinement on Java Programs
  5.2 Decidable Formulation
    5.2.1 Context-Sensitive Analysis in CFL-Reachability
    5.2.2 Handling Recursion
  5.3 Refinement Algorithm
    5.3.1 Refinement for LRF-Reachability
    5.3.2 Pseudocode
      5.3.2.1 Refinement Loop
      5.3.2.2 Computing Points-To Sets
      5.3.2.3 Refinement Policy
  5.4 Evaluation
    5.4.1 Experimental Configuration
    5.4.2 Experimental Results

6 Thin Slicing
  6.1 Defining Thin Slices
  6.2 Thin Slices as Dependences
  6.3 Expanding Thin Slices
    6.3.1 Question 1: Explaining Aliasing
    6.3.2 Question 2: Control Dependence
  6.4 Computing Thin Slices
    6.4.1 Graph Construction
    6.4.2 Context-Insensitive Thin Slicing
    6.4.3 Context-Sensitive Thin Slicing
  6.5 Evaluation
    6.5.1 Configuration and Methodology
    6.5.2 Experiment: Locating Bugs
    6.5.3 Experiment: Understanding Tough Casts

7 Related Work
  7.1 Pointer Analysis Related Work
    7.1.1 Context-Insensitive Points-To Analysis
    7.1.2 Context-Sensitive Points-To Analysis
    7.1.3 Refinement-Based Points-To Analysis
    7.1.4 Demand-Driven Points-To Analysis
    7.1.5 CFL-Reachability
    7.1.6 Incremental Points-To Analysis
    7.1.7 Cast Verification
  7.2 Thin Slicing Related Work

8 Conclusions and Future Work

Bibliography

List of Figures

1.1 Graphs showing the number of public methods for the Eclipse platform [Ecl] and the Java standard library [J2S] in successive releases. The size of Eclipse has increased by a factor of 3 since its initial release, while the Java library size has increased by a factor of 3.8.
1.2 Code examples to informally illustrate the effects of our refinement technique.
1.3 Example showing the advantages of thin slicing. The six statements with underlined expressions are in the thin slice seeded with line 24, while the traditional slice for line 24 contains all displayed code. The bodies of functions with inessential behavior (e.g., print()) have been elided for clarity.
2.1 A simple example to illustrate different treatments of assignments.
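To give a concrete feel for the thin slicing idea, the following sketch (our own illustration, not an example from the dissertation; all names are invented) marks which statements a thin slice would keep for a seed statement, in contrast to a traditional slice. Conceptually, a thin slice keeps only the statements that produce and copy the value seen at the seed, while a traditional slice also includes control dependences and the statements that explain why base pointers alias.

```java
// Hypothetical example (not from the dissertation): which statements a
// thin slice keeps for the seed, versus a traditional slice. We treat
// add/get conceptually as a store to and a load from the container.
import java.util.ArrayList;
import java.util.List;

public class ThinSliceDemo {
    public static void main(String[] args) {
        List<String> box = new ArrayList<>(); // traditional slice only: explains why add/get touch the same container
        String v = "crash cause";             // thin slice: produces the value that reaches the seed
        box.add(v);                           // thin slice: the value flows into the container
        if (args.length == 0) {               // traditional slice only: control dependence of the seed
            String w = box.get(0);            // thin slice: the value flows back out
            System.out.println(w);            // seed: "where did w come from?"
        }
    }
}
```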
Recommended publications
  • IBM Developer for z/OS Enterprise Edition
    Solution Brief

    IBM Developer for z/OS Enterprise Edition
    A comprehensive, robust toolset for developing z/OS applications using DevOps software delivery practices

    Companies must be agile to respond to market demands. The digital transformation is a continuous process, embracing hybrid cloud and the Application Program Interface (API) economy. To capitalize on opportunities, businesses must modernize existing applications and build new cloud-native applications without disrupting services. This transformation is led by software delivery teams employing DevOps practices that include continuous integration and continuous delivery to a shared pipeline. For z/OS developers, this transformation starts with modern tools that empower them to deliver more, faster, with better quality and agility. IBM Developer for z/OS Enterprise Edition is a modern, robust solution that offers the program analysis, edit, user build, debug, and test capabilities z/OS developers need, plus easy integration with the shared pipeline.

    The challenge

    IBM z/OS application development and software delivery teams have unique challenges with applications, tools, and skills.

    Adoption of agile practices
    • Mainframe applications require frequent updates
    • Development teams must adopt DevOps practices to improve their …

    Application modernization
    • Applications require access to digital services with controlled APIs
    • The journey to cloud …

    “DevOps and agile development on the platform have jumped from the early adopter stage in 2016 to becoming common among mainframe businesses”1
  • From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis
    From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis
    Mark Harman∗, Peter O'Hearn∗
    ∗Facebook London and University College London, UK

    Abstract—This paper describes some of the challenges and opportunities when deploying static and dynamic analysis at scale, drawing on the authors' experience with the Infer and Sapienz Technologies at Facebook, each of which started life as a research-led start-up that was subsequently deployed at scale, impacting billions of people worldwide. The paper identifies open problems that have yet to receive significant attention from the scientific community, yet which have potential for profound real-world impact, formulating these as research questions that, we believe, are ripe for exploration and that would make excellent topics for research projects.

    I. INTRODUCTION

    How do we transition research on static and dynamic analysis techniques from the testing and verification research communities to industrial practice? Many have asked this question, and others related to it. … research questions that target the most productive intersection we have yet witnessed: that between exciting, intellectually challenging science, and real-world deployment impact.

    Many industrialists have perhaps tended to regard it unlikely that much academic work will prove relevant to their most pressing industrial concerns. On the other hand, it is not uncommon for academic and scientific researchers to believe that most of the problems faced by industrialists are either boring, tedious or scientifically uninteresting. This sociological phenomenon has led to a great deal of miscommunication between the academic and industrial sectors.

    We hope that we can make a small contribution by focusing on the intersection of challenging and interesting scientific problems with pressing industrial deployment needs. Our aim is to move the debate beyond relatively unhelpful observations we have typically encountered in, for example, conference …
  • Building Useful Program Analysis Tools Using an Extensible Java Compiler
    Building Useful Program Analysis Tools Using an Extensible Java Compiler
    Edward Aftandilian, Raluca Sauciuc (Google, Inc., Mountain View, CA, USA), {eaftan, [email protected]}
    Siddharth Priya, Sundaresan Krishnan (Google, Inc., Hyderabad, India), {siddharth, [email protected]}

    Abstract—Large software companies need customized tools to manage their source code. These tools are often built in an ad-hoc fashion, using brittle technologies such as regular expressions and home-grown parsers. Changes in the language cause the tools to break. More importantly, these ad-hoc tools often do not support uncommon-but-valid code patterns.

    We report our experiences building source-code analysis tools at Google on top of a third-party, open-source, extensible compiler. We describe three tools in use on our Java codebase. The first, Strict Java Dependencies, enforces our dependency policy in order to reduce JAR file sizes and testing load. The second, error-prone, adds new error checks to the compilation process and automates repair of those errors at a whole-codebase scale.

    … a specific task, but they fail for several reasons. First, ad-hoc program analysis tools are often brittle and break on uncommon-but-valid code patterns. Second, simple ad-hoc tools don't provide sufficient information to perform many non-trivial analyses, including refactorings. Type and symbol information is especially useful, but amounts to writing a type-checker. Finally, more sophisticated program analysis tools are expensive to create and maintain, especially as the target language evolves. In this paper, we present our experience building special-purpose tools on top of the Java compiler, the piece of software in our …
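    As a rough illustration of the kind of check such an extensible compiler enables, here is a small sketch of our own (not code from the paper) written against javac's public tree API: it flags == and != comparisons involving a string literal, a bug pattern in the spirit of error-prone's checks. A production check would consult resolved type information rather than literal syntax.

```java
// Our sketch (not from the paper) of an AST check using javac's public
// tree API (com.sun.source). It flags == or != comparisons where either
// operand is a string literal.
import com.sun.source.tree.BinaryTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.TreeScanner;

public class StringEqualityScanner extends TreeScanner<Void, Void> {
    @Override
    public Void visitBinary(BinaryTree node, Void unused) {
        if (node.getKind() == Tree.Kind.EQUAL_TO
                || node.getKind() == Tree.Kind.NOT_EQUAL_TO) {
            if (isStringLiteral(node.getLeftOperand())
                    || isStringLiteral(node.getRightOperand())) {
                // A real tool would report through the compiler's diagnostic
                // interfaces; printing keeps the sketch small.
                System.err.println(
                        "warning: reference comparison with a String literal: " + node);
            }
        }
        return super.visitBinary(node, unused); // keep scanning subtrees
    }

    private static boolean isStringLiteral(Tree tree) {
        return tree instanceof LiteralTree
                && ((LiteralTree) tree).getValue() instanceof String;
    }

    // Entry point: scan a parsed compilation unit, e.g. one obtained from a
    // JavacTask created via ToolProvider.getSystemJavaCompiler().
    public static void check(CompilationUnitTree unit) {
        unit.accept(new StringEqualityScanner(), null);
    }
}
```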
  • The Art, Science, and Engineering of Fuzzing: A Survey
    The Art, Science, and Engineering of Fuzzing: A Survey
    Valentin J.M. Manès, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo

    Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective.

    Index Terms—software security, automated software testing, fuzzing.

    1 INTRODUCTION

    Ever since its introduction in the early 1990s [152], fuzzing has remained one of the most widely-deployed techniques to discover software security vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. … (see Figure 1 on p. 5) and an increasing number of fuzzing studies appear at major security conferences (e.g. [225], [52], [37], [176], [83], [239]). In addition, the blogosphere is filled with many success stories of fuzzing, some of which also contain what we consider to be gems that warrant a permanent place in the literature.
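    The mutational style of fuzzing described above is easy to sketch. The toy below is our own construction (the target function is invented, standing in for a real parser): it repeatedly mutates a seed input at random and reports any input that makes the target throw.

```java
// A toy mutational fuzzer (our sketch, not from the survey): randomly
// flip bytes of a seed input and report inputs that crash the target.
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class ToyFuzzer {
    // Hypothetical target: throws on some inputs, standing in for a parser.
    static void target(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        if (s.startsWith("FUZZ")) {
            throw new IllegalStateException("bug reached");
        }
    }

    public static void main(String[] args) {
        byte[] seed = "FAZZING".getBytes(StandardCharsets.UTF_8);
        Random rng = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            byte[] input = seed.clone();
            // Mutate one to three random byte positions.
            for (int m = 0; m < 1 + rng.nextInt(3); m++) {
                input[rng.nextInt(input.length)] = (byte) rng.nextInt(256);
            }
            try {
                target(input);
            } catch (RuntimeException crash) {
                System.out.println("crashing input: "
                        + new String(input, StandardCharsets.UTF_8));
                return;
            }
        }
        System.out.println("no crash found");
    }
}
```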
  • jEdit: Isabelle/jEdit
    Isabelle/jEdit
    Makarius Wenzel
    5 December 2013

    Abstract

    Isabelle/jEdit is a fully-featured Prover IDE, based on Isabelle/Scala and the jEdit text editor. This document provides an overview of general principles and its main IDE functionality.

    Isabelle's user interface is no advance over LCF's, which is widely condemned as "user-unfriendly": hard to use, bewildering to beginners. Hence the interest in proof editors, where a proof can be constructed and modified rule-by-rule using windows, mouse, and menus. But Edinburgh LCF was invented because real proofs require millions of inferences. Sophisticated tools (rules, tactics and tacticals, the language ML, the logics themselves) are hard to learn, yet they are essential. We may demand a mouse, but we need better education and training.
    Lawrence C. Paulson, "Isabelle: The Next 700 Theorem Provers"

    Acknowledgements

    Research and implementation of concepts around PIDE and Isabelle/jEdit started around 2008 and was kindly supported by:
    • TU München http://www.in.tum.de
    • BMBF http://www.bmbf.de
    • Université Paris-Sud http://www.u-psud.fr
    • Digiteo http://www.digiteo.fr
    • ANR http://www.agence-nationale-recherche.fr

    Contents
    1 Introduction
      1.1 Concepts and terminology
      1.2 The Isabelle/jEdit Prover IDE
        1.2.1 Documentation
        1.2.2 Plugins
        1.2.3 Options
        1.2.4 Keymaps
        1.2.5 Look-and-feel
    2 Prover IDE functionality
      2.1 File-system access
      2.2 Text buffers and theories
      2.3 Prover output
      2.4 Tooltips and hyperlinks
  • T-Fuzz: Fuzzing by Program Transformation
    T-Fuzz: Fuzzing by Program Transformation
    Hui Peng1, Yan Shoshitaishvili2, Mathias Payer1
    Purdue University1, Arizona State University2
    A presentation by Md Shamim Hussain and Nafis Neehal (CSCI 6450 - Principles of Program Analysis)

    Outline
    • Brief forecast
    • Background
    • Key Contributions
    • T-Fuzz Design
    • T-Fuzz Limitations
    • Experimental Results
    • Concluding Remarks
    • References

    Fuzzing
    • Randomly generates input to discover bugs in the target program
    • Highly effective for discovering vulnerabilities
    • Standard technique in software development to improve reliability and security
    • Roughly two types of fuzzers: generational and mutational
    • Generational fuzzers require a strict format for generating input
    • Mutational fuzzers create input by randomly mutating or by random seeds

    Challenges in Fuzzing
    • Shallow coverage
    • "Deep bugs" are hard to find
    • Reason: fuzzers get stuck at complex sanity checks

    [Fig 1: Fuzzing limitations; fuzzers tend to exercise only shallow code paths]

    Existing Approaches and Limitations
    • Existing approaches that focus on input generation:
      ◦ AFL (feedback based)
      ◦ Driller (concolic execution based)
      ◦ VUzzer (taint analysis based)
      ◦ etc.
    • Limitations:
      ◦ Heavyweight analysis (taint analysis, concolic execution)
      ◦ Not scalable (concolic execution)
      ◦ Gets stuck on complex sanity checks (e.g. checksum values)
      ◦ Slow progress for multiple sanity checks (feedback based)
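    A sketch of the transformation idea in source form (our illustration in Java; T-Fuzz itself rewrites binaries): disable the sanity check that blocks the fuzzer so that deeper code becomes reachable, then re-validate any crash against the original program.

```java
// Our illustration (in Java; T-Fuzz transforms binaries) of "fuzzing by
// program transformation": a check that stops the fuzzer is disabled in
// a copy of the program, and crashes found there must be reproduced on
// the original to rule out false positives.
public class SanityCheckExample {
    static int checksum(byte[] data) {
        int sum = 0;
        for (byte b : data) sum = (sum * 31) ^ (b & 0xff);
        return sum;
    }

    // Original program: a random fuzzer is very unlikely to guess a
    // matching checksum, so it gets stuck at this check.
    static void parseOriginal(byte[] input, int claimedChecksum) {
        if (checksum(input) != claimedChecksum) {
            throw new IllegalArgumentException("bad checksum");
        }
        deepLogic(input);
    }

    // Transformed copy: the check is disabled so fuzzing reaches deepLogic.
    static void parseTransformed(byte[] input, int claimedChecksum) {
        if (false) { // check negated by the (hypothetical) transformer
            throw new IllegalArgumentException("bad checksum");
        }
        deepLogic(input);
    }

    static void deepLogic(byte[] input) {
        // "Deep bug": only reachable once the sanity check is out of the way.
        if (input.length > 4 && input[0] == 'T') {
            throw new ArrayIndexOutOfBoundsException(input.length + 1);
        }
    }
}
```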
  • An Efficient Data-Dependence Profiler for Sequential and Parallel Programs
    An Efficient Data-Dependence Profiler for Sequential and Parallel Programs
    Zhen Li, Ali Jannesari, and Felix Wolf
    German Research School for Simulation Sciences, 52062 Aachen, Germany
    Technische Universität Darmstadt, 64289 Darmstadt, Germany
    {z.li, [email protected]}, [email protected]

    Abstract—Extracting data dependences from programs serves as the foundation of many program analysis and transformation methods, including automatic parallelization, runtime scheduling, and performance tuning. To obtain data dependences, more and more related tools are adopting profiling approaches because they can track dynamically allocated memory, pointers, and array indices. However, dependence profiling suffers from high runtime and space overhead. To lower the overhead, earlier dependence profiling techniques exploit features of the specific program analyses they are designed for. As a result, every program analysis tool in need of data-dependence information requires its own customized profiler. In this paper, we present an efficient and at the same time generic data-dependence profiler that can be used as a uniform basis for different dependence-based program analyses.

    Static approaches extract data dependences without executing the program. Although they are fast and even allow fully automatic parallelization in some cases [13], [14], they lack the ability to track dynamically allocated memory, pointers, and dynamically calculated array indices. This usually makes their assessment pessimistic, limiting their practical applicability. In contrast, dynamic dependence profiling captures only those dependences that actually occur at runtime. Although dependence profiling is inherently input sensitive, the results are still useful in many situations, which is why such profiling forms the basis of many program analysis tools [2], [5], [6]. Moreover, input sensitivity can be addressed by running the target program with changing inputs and computing the …
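    The core bookkeeping of a dependence profiler is easy to sketch (our simplification, not the paper's implementation, which instruments compiled code and uses far more memory-efficient structures): record the last writer of each address and emit a read-after-write dependence when the address is later read.

```java
// Our simplified sketch of a data-dependence profiler's core: track the
// last writer of every memory address and emit a read-after-write (RAW)
// dependence edge when the address is later read.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DependenceProfiler {
    private final Map<Long, String> lastWriter = new HashMap<>();
    private final Set<String> rawDeps = new HashSet<>();

    public void onWrite(long address, String stmt) {
        lastWriter.put(address, stmt);
    }

    public void onRead(long address, String stmt) {
        String writer = lastWriter.get(address);
        if (writer != null && !writer.equals(stmt)) {
            rawDeps.add(writer + " -> " + stmt); // RAW dependence edge
        }
    }

    public static void main(String[] args) {
        DependenceProfiler p = new DependenceProfiler();
        // Simulate the instrumented run of: a[0] = 1; x = a[0];
        p.onWrite(0x1000L, "line 3: a[0] = 1");
        p.onRead(0x1000L, "line 4: x = a[0]");
        p.rawDeps.forEach(System.out::println);
    }
}
```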
  • Static Program Analysis as a Fuzzing Aid
    Static Program Analysis as a Fuzzing Aid
    Bhargava Shastry1, Markus Leutner1, Tobias Fiebig1, Kashyap Thimmaraju1, Fabian Yamaguchi2, Konrad Rieck2, Stefan Schmid3, Jean-Pierre Seifert1, and Anja Feldmann1
    1 TU Berlin, Berlin, Germany, [email protected]
    2 TU Braunschweig, Braunschweig, Germany
    3 Aalborg University, Aalborg, Denmark

    Abstract. Fuzz testing is an effective and scalable technique to perform software security assessments. Yet, contemporary fuzzers fall short of thoroughly testing applications with a high degree of control-flow diversity, such as firewalls and network packet analyzers. In this paper, we demonstrate how static program analysis can guide fuzzing by augmenting existing program models maintained by the fuzzer. Based on the insight that code patterns reflect the data format of inputs processed by a program, we automatically construct an input dictionary by statically analyzing program control and data flow. Our analysis is performed before fuzzing commences, and the input dictionary is supplied to an off-the-shelf fuzzer to influence input generation. Evaluations show that our technique not only increases test coverage by 10-15% over baseline fuzzers such as afl but also reduces the time required to expose vulnerabilities by up to an order of magnitude. As a case study, we have evaluated our approach on two classes of network applications: nDPI, a deep packet inspection library, and tcpdump, a network packet analyzer. Using our approach, we have uncovered 15 zero-day vulnerabilities in the evaluated software that were not found by stand-alone fuzzers. Our work not only provides a practical method to conduct security evaluations more effectively but also demonstrates that the synergy between program analysis and testing can be exploited for a better outcome.
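    A drastically simplified version of the dictionary idea can be sketched as follows (our own sketch; the paper derives tokens from compiler-level control- and data-flow analysis, not from a regex scan): harvest string constants from program source and emit them in the keyword="value" dictionary format that fuzzers such as afl accept.

```java
// Our much-simplified sketch of the input-dictionary idea: harvest string
// literals from a source file and emit them as afl-style dictionary
// entries (name="value"). The paper's analysis is far more precise.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DictionaryBuilder {
    // Matches short double-quoted literals, e.g. "GET" or "\r\n" markers.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("\"([^\"\\\\]{2,16})\"");

    public static void main(String[] args) throws IOException {
        Set<String> tokens = new LinkedHashSet<>();
        Matcher m = STRING_LITERAL.matcher(Files.readString(Path.of(args[0])));
        while (m.find()) {
            tokens.add(m.group(1));
        }
        int i = 0;
        for (String token : tokens) {
            // One afl dictionary entry per line: token_N="value"
            System.out.println("token_" + (i++) + "=\"" + token + "\"");
        }
    }
}
```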
  • 5 Key Considerations Towards the Adoption of a Mainframe-Inclusive DevOps Strategy
    5 Key Considerations Towards the Adoption of a Mainframe-Inclusive DevOps Strategy

    PLANNING FOR DEVOPS IN MAINFRAME ENVIRONMENTS – KEY CONSIDERATIONS

    A mountain of information is available around the implementation and practice of DevOps, collectively encompassing people, processes and technology. This ranges significantly from the implementation of tool chains to discussions around DevOps principles, devoid of technology. While technology itself does not guarantee a successful DevOps implementation, the right tools and solutions can help companies better enable a DevOps culture.

    Unfortunately, the majority of conversations regarding DevOps center around distributed environments, neglecting the platform that still runs more than 70% of the business and transaction systems in the world: the mainframe. It is estimated that 5 billion lines of new COBOL code are added to live systems every year. Furthermore, 55% of enterprise apps and an estimated 80% of mobile transactions touch the mainframe at some point.

    Like their distributed brethren, mainframe applications benefit from DevOps practices, boosting speed and lowering risk. Instead of continuing to silo mainframe and distributed development, it behooves practitioners to increase communication and coordination between these environments to help address the pressure of delivering release cycles at an accelerated pace, while improving product and service quality. And some vendors are doing just that.

    Today, vendors are increasing their focus as it relates to the modernization of the mainframe, to not only better navigate the unavoidable generation shift in mainframe staffing, but to also better bridge mainframe and distributed DevOps cultures. DevOps is a significant topic incorporating community and culture, application and infrastructure development, operations and orchestration, and many more supporting concepts.
  • Set-Up Environment
    Set-up Environment

    Note: While this instruction uses Windows to demonstrate the installation process, it does apply to other operating systems including Mac OS and Linux.

    Step 1. Install MinGW on Windows (skip this step if you use Mac OS or Linux)

    FYI: "MinGW" is a Windows implementation of a C compiler that will convert your C source code into executable files. For Mac OS and Linux, C compilers should have already been installed. If you have not already installed "MinGW" for Windows, then use the other document <Install MinGW> on the same LMS page to install it. "MinGW" needs to be installed (or is not installed correctly) if you get the following error message: 'gcc' is not recognized as an internal or external command, operable program or batch file.

    Step 2.1. Check if you have Java SE 11 (JDK) installed

    FYI: You are not required to know what "JDK" stands for and what it means. Just bear in mind that it is a prerequisite to running jEdit 5.6.0.

    How do I know if I have Java installed?

    Windows: Select from the "Control Panel" => "Programs and Features" and check if you can find an entry called "Java(TM) SE Development Kit 11.0.10 (64-bit)" (or later). If you see such an entry, everything is fine and you can go to Step 3. Otherwise, you do not have Java installed and need to go to Step 2.2.

    Mac OS / Linux: Open "Terminal" and type "java -version". If you can see the line: java version "11.0.10" or later, everything is fine and you can go to Step 3.
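    If you prefer checking from code rather than the Control Panel or terminal, this one-file program (our addition, not part of the original instructions) prints the version of the Java that your PATH resolves to:

```java
// Our quick check: compile and run this file to see which Java your PATH
// resolves to. With JDK 11+ you can run it directly: java CheckJava.java
public class CheckJava {
    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("java.home     = " + System.getProperty("java.home"));
        // Runtime.version() was added in Java 9; feature() is the major version.
        System.out.println("major version = " + Runtime.version().feature());
    }
}
```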
  • Engineering Static Analysis Techniques for Industrial Applications
    Eighth IEEE International Working Conference on Source Code Analysis and Manipulation

    90% Perspiration: Engineering Static Analysis Techniques for Industrial Applications
    Paul Anderson
    GrammaTech, Inc.
    317 N. Aurora St., Ithaca, NY 14850
    [email protected]

    Abstract

    This article describes some of the engineering approaches that were taken during the development of GrammaTech's static-analysis technology that have taken it from a prototype system with poor performance and scalability and with very limited applicability, to a much-more general-purpose industrial-strength analysis infrastructure capable of operating on millions of lines of code. A wide variety of code bases are found in industry, and many extremes of usage exist, from code size through use of unusual, or non-standard features and dialects. Some of the problems associated with handling these code-bases are described, and the solutions that were used to address them, including some that were ultimately unsuccessful, are discussed.

    … engineered to allow it to operate on very large programs. Section 4 describes key decisions that were taken to expand the reach of the tool beyond its original conception. Section 5 discusses generalizations that were made to support multiple languages. Section 6 describes the compromises to principles that had to be made to be successful. Section 7 briefly describes anticipated improvements and new directions. Finally, Section 8 presents some conclusions.

    2. CodeSurfer

    CodeSurfer is a whole-program static analysis infrastructure. Given the source code of a program, it analyzes the program and produces a set of representations, on which static analysis applications can operate.
  • Course 2: «Open Source Software (OSS) Engineering Data»
    Course 2: «Open Source Software (OSS) Engineering Data»

    1st Day: Metrics and Tools for Software Engineering in Open Source Software
    1. Open Software / Hardware Technologies: introduction to Open Source Software and related technologies.
    2. Software Engineering FLOSS: Free Libre Open Source Software in Software Engineering.
    3. Metrics for Open Source Software: product and project metrics and related tooling.

    2nd Day: Research based on Open Source Software Data
    4. Facilitating Metric for White-Box Reuse: we will discuss a metric derived from the analysis of Open Source projects which facilitates white-box reuse (reuse based on source code analysis).
    5. Extracting Components from Open Source: the Component Adaptation Environment approach (COPE): we will discuss the results of the OPEN-SME EU research project and in particular the COPE tool for extracting reusable components from Open Source Software.
    6. Software Engineering Research based on Open Source Software Data, a recent example: we will discuss a recent study of open source repository mailing lists for deriving implications for the evolution of an open source project.
    7. Improving Component Coupling Information with Dynamic Profiling: we will discuss how dynamic profiling of an open source program can contribute towards its comprehension.

    At various points during the lectures students will be asked to carry out activities.

    Open Software / Hardware Technologies
    Ioannis Stamelos, Professor; Nikolaos Konofaos, Associate Professor (School of Informatics, Aristotle University of Thessaloniki)
    George Kakarontzas, Assistant Professor (University of Thessaly)
    2018-2019

    F/OSS - FLOSS Definition
    ● The traditional SW development model asks for a "closed member" team that develops proprietary source code.