Integrating a Graph Builder Into Python Tutor

Total Page:16

File Type:pdf, Size:1020Kb

Integrating a Graph Builder Into Python Tutor Integrating a Graph Builder into Python Tutor Diogo Soares # University of Minho, Braga Portugal Maria João Varanda Pereira1 # Ñ Research Centre in Digitalization and Intelligent Robotics, Polythechnic Insitute of Bragança, Portugal Pedro Rangel Henriques # Ñ University of Minho, Braga, Portugal Abstract Analysing unknown source code to comprehend it is quite hard and expensive task. Therefore, the Program Comprehension (PC) subject has always been an area of interest as it helps to realize how a program works by identifying the code that implements each functionality. This means being able to map the problem domain with the program domain. PC is a complex area, but its importance for programmers is so high that many approaches and tools were proposed along the last two decades. Program Animation is one of those approaches requiring specialized techniques. For each programming language, there are already tools that enable us to execute a program step by step, visualize its execution path, observe the effect of each instruction on its data structures, and inspect the value of its variables at any point. In the present context, we sustain the idea that PC techniques and tools can also be of great value for students taking the first steps in programming using a specific language. To this end, weaimto improve Python Tutor, a well-known program visualization tool, with graph-based representations of source code such as Control Flow Graph (CFG), Data Flow Graph (DFG), Function Call Graph (FCG) and System Control Graph (SCG). This helps novice programmers to understand the source code analyzing not only the variable contents but also a set of automatically generated graph-based visualizations, that were not included in Python Tutor so far. This will allow the students to be focused on certain aspects of the program (depending on the graph), abstracting others such as details of its syntax. 2012 ACM Subject Classification Human-centered computing → Visualization systems and tools; Social and professional topics → Computer science education; Software and its engineering → Source code generation Keywords and phrases Program Visualization, Python Tutor, Data Flow Graphs, Control Flow Graphs Digital Object Identifier 10.4230/OASIcs.ICPEC.2021.6 Supplementary Material Software (Graph Builder): https://graph-builder.di.uminho.pt/visualize.html Funding This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the Projects Scopes: UIDB/05757/2020 and UIDB/00319/2020. 1 Introduction There’s along the web many tutorials on programming and several learning methodologies but the majority of them uses the same formula [4]: start with the explanation of a set of examples and propose some similar exercises that train the students to write similar code to solve them. To fully understand a programming language, it is not enough to read the code 1 to mark corresponding author © Diogo Soares, Maria João Varanda Pereira, and Pedro Rangel Henriques; licensed under Creative Commons License CC-BY 4.0 Second International Computer Programming Education Conference (ICPEC 2021). Editors: Pedro Rangel Henriques, Filipe Portela, Ricardo Queirós, and Alberto Simões; Article No. 6; pp. 6:1–6:15 OpenAccess Series in Informatics Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany 6:2 Graph-Based Visualization Builder of programs written in that language, it is also necessary to comprehend how the computer behaves when executing the programs code. For each piece of code, the student shall be able to build a full map between the program domain and the problem domain. Usually it is hard to understand his own code, even to a software engineer that has to maintain third-part code he never saw before, the task is much harder. The average developer spends about 60% to 90% of his time understanding code previously developed; these figures are not changing over time [19]. With this in mind, this paper describes and discusses the integration into Python Tutor of a new tool for automatic generation of graph-based visualizations to turn ease the task of program comprehension. As said in [7], the best way to understand programs is by giving programs another aspect than that of their source code. This new feature works as Python Tutor plugin and it analyses the input code and produces several types of complementary graphs depending on the user desires. As the program comprehension task is also very relevant for the beginners wishing to understand how a given program works and how it solves a given problem, we believe that Python Tutor improved with our plug-in will be very useful to help programming students. The paper has 5 sections after that. The state of the art on program animation appears in Section 2. Section 3 discusses how graph-based visualizations can be included in an Animator adding actual value to it. Section 4 exhibits the architecture to build the new tool and discusses how it can be developed and integrated into the host tool. Section 5 illustrates the final system built and describes a experiment conducted to assess the value addedby the visualizations provided. Section 6 closes the paper and points out some directions for future research. 2 Program Visualization and Animation Think-aloud protocol [22] is a method where a person verbalize his thoughts and it can be used not only to improve self−understanding and reasoning but also to explain the cognitive process to others. That’s why is so important to produce documentation when developing software. The use of this method also allows realizing that the thoughts of a developer differ according to the familiarity of a program’s domains [17]. With this in mind, it was derived two concepts of comprehension models, the top-down comprehension model and the bottom-up comprehension models. The top-down approach is used when the programmer is familiar with the program he’s trying to understand. He starts by creating a general hypotheses about the program goal and how it is achieved by using his previous knowledge about the program. This is called the expectation-based comprehension, where the programmer has pre-generated expectations of the code’s meaning [13]. After creating his hypotheses, the developer starts searching in the code for algorithms/structures that he can link to his theory and refines his hypotheses as he does that [6]. This is the inference-based comprehension [13]. If the programmer has no previous knowledge about a program, he has to analyze line by line to create a hypothesis about the program purpose. In this method, the developer starts aggregating functions by their goals. What usually happens is that developers end by using an integrated model [11] where they use top-down approach when possible and bottom-up when necessary. Although the way of coding varies from person to person, follow a guide of good practices in programming (i.e. appropriate variable names and indentation rules [18]) can lead to an uniformed structure that makes program comprehension easier. A different form of coding can really slow down the comprehension of a program for the most experienced programmer [20]. D. Soares, M. J. V. Pereira, and P. R. Henriques 6:3 Nowadays, there’s even more factors that can affect program comprehension. For example, a simple color schema, available in syntax-oriented text editor/IDE, helps the programmers comprehending a program [16]. Newer studies have their focus not only on source code comprehension, but also on the comprehension of structures, hierarchies and architecture of the program as well on the relations between components [19]. Software visualization is the process of giving a visual representation of a program. It takes the information that is presented in the source code and converts it to a graphical representation with the goal of improving the comprehension of that program. Although a visual image can help to understand a program, it’s not that simple since a basic program can have multiple representations. The best one will depend on the task to be performed and what the developer wants to know. This also means that each representation has to be thought for specific tasks and a good portray of the system is imperative for it to behelpful. Simply repackaging massive textual information into a massive graphical representation is not helpful, so a linear translation is not ideal since experts want to see software visualised in context – not just what the code does, but what it means [14]. The visual representation of a program can be focused in some aspects of the program and it can have different levels of abstraction. This can also solve scalability problems because the visualization of big programs at once it will not be a good help. So, the visualization tool must allow an high level of interactivity to go one step at a time, filter the information, to establish the data that is wanted, to define the level of abstraction and zoom facilities [8]. Graphically these representations can be based on diagrams, maps, icons, graphs and specific drawings. Some examples are the Nassi-Shneiderman diagram used to represent a control-flow of a program [12]; Space-filling (like tree maps or sunburts) allows to visualize in a very compressed way code metrics and statistics [5]; Graphs are the most common representation and consists of nodes and arcs where the nodes represent blocks of code and the arcs its flow [10]. In terms of functionality, there are two types of visualizations: static and dynamic [10]. A static representation can be seen as a simple image of the program, it doesn’t change over time and it’s based on static information. Dynamic representations shows the information that changes as the program is executed. A good example of an animated visualization tool is Python Tutor.
Recommended publications
  • Undefined Behaviour in the C Language
    FAKULTA INFORMATIKY, MASARYKOVA UNIVERZITA Undefined Behaviour in the C Language BAKALÁŘSKÁ PRÁCE Tobiáš Kamenický Brno, květen 2015 Declaration Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Vedoucí práce: RNDr. Adam Rambousek ii Acknowledgements I am very grateful to my supervisor Miroslav Franc for his guidance, invaluable help and feedback throughout the work on this thesis. iii Summary This bachelor’s thesis deals with the concept of undefined behavior and its aspects. It explains some specific undefined behaviors extracted from the C standard and provides each with a detailed description from the view of a programmer and a tester. It summarizes the possibilities to prevent and to test these undefined behaviors. To achieve that, some compilers and tools are introduced and further described. The thesis contains a set of example programs to ease the understanding of the discussed undefined behaviors. Keywords undefined behavior, C, testing, detection, secure coding, analysis tools, standard, programming language iv Table of Contents Declaration ................................................................................................................................ ii Acknowledgements .................................................................................................................. iii Summary .................................................................................................................................
    [Show full text]
  • C Language Tutorial
    C Language Tutorial Version 0.042 March, 1999 Original MS-DOS tutorial by Gordon Dodrill, Coronado Enterprises. Moved to Applix by Tim Ward Typed by Karen Ward C programs converted by Tim Ward and Mark Harvey with assistance from Kathy Morton for Visual Calculator Pretty printed by Eric Lindsay Applix 1616 microcomputer project Applix Pty Ltd Introduction The C programming language was originally developed by Dennis Ritchie of Bell Laboratories, and was designed to run on a PDP-11 with a UNIX operating system. Although it was originally intended to run under UNIX, there was a great interest in running it on the IBM PC and com- patibles, and other systems. C is excellent for actually writing system level programs, and the entire Applix 1616/OS operating system is written in C (except for a few assembler routines). It is an excellent language for this environment because of the simplicity of expression, the compactness of the code, and the wide range of applicability. It is not a good "beginning" language because it is somewhat cryptic in nature. It allows the programmer a wide range of operations from high level down to a very low level approaching the level of assembly language. There seems to be no limit to the flexibility available. One experienced C programmer made the statement, "You can program anything in C", and the statement is well supported by my own experience with the language. Along with the resulting freedom however, you take on a great deal of responsibility. It is very easy to write a program that destroys itself due to the silly little errors that, say, a Pascal compiler will flag and call a fatal error.
    [Show full text]
  • Codesonar the SAST Platform for Devsecops
    DATASHEET CodeSonar The SAST Platform for DevSecOps Accelerate Application Security Software teams are under constant pressure to deliver more content with higher complexity, in shorter timeframes, with increased quality and security. Static Application Security Testing is a proven best practice to help software teams deliver the best code in the shortest timeframe. GrammaTech has been a leader in this field for over 15 years with CodeSonar delivering multi-language SAST capabilities for enterprises where software quality and software security matter. DevSecOps - Speed and Scale Language Support Software developers need rapid feedback on security CodeSonar supports many popular languages, including vulnerabilities in their code. CodeSonar can be integrated into C/C++, Java, C# and Android, as well as support for native software development environments, works unobtrusively to binaries in Intel, Arm and PowerPC instruction set architectures. the developer and provides rapid feedback. CodeSonar also supports OASIS SARIF, for exchange of information with other tools in the DevSecOps environment. Examples of Defects Detected • Buffer over- and underruns • Cast and conversion problems • Command injection • Copy-paste error • Concurrency • Ignored return value • Memory leak • Tainted data • Null pointer dereference • Dangerous function • Unused parameter / value And hundreds more Security Quality Privacy Broad coverage of security vulnerabilities, Integration into DevSecOps to improve Checkers that detect performance including OWASP Top10, SANS/CWE 25. quality of the code and developer impacts such as unnecessary test for Support for third party applications efficiency. Find code quality and nullness, creation of redundant objects or through byte code analysis. performance issues at speed. superfluous memory writes. Team Support Built In CodeSonar is designed to support large teams.
    [Show full text]
  • Grammatech Codesonar Product Datasheet
    GRAMMATECH CODESONAR PRODUCT DATASHEET Static Code Analysis and Static Application Security Testing Software Assurance Services Delivered by a senior software CodeSonar empowers teams to quickly analyze and validate the code – source and/or binary – engineer, the software assurance identifying serious defects or bugs that cause cyber vulnerabilities, system failures, poor services focus on automating reliability, or unsafe conditions. reporting on your software quality, creating an improvement plan and GrammaTech’s Software Assurance Services provide the benets of CodeSonar to your team in measuring your progress against that an accelerated timeline and ensures that you make the best use of SCA and SAST. plan. This provides your teams with reliable, fast, actionable data. GrammaTech will manage the static Enjoy the Benets of the Deepest Static Analysis analysis engine on your premises for you, such that your resources can Employ Sophisticated Algorithms Comply with Coding Standards focus on developing software. The following activities are covered: CodeSonar performs a unied dataow and CodeSonar supports compliance with standards symbolic execution analysis that examines the like MISRA C:2012, IS0-26262, DO-178B, Integration in your release process computation of the entire program. The US-CERT’s Build Security In, and MITRE’S CWE. Integration in check-in process approach does not rely on pattern matching or similar approximations. CodeSonar’s deeper Automatic assignment of defects analysis naturally nds defects with new or Reduction of parse errors unusual patterns. Review of warnings Analyze Millions of Lines of Code Optimization of conguration CodeSonar can perform a whole-program Improvement plan and tracking analysis on 10M+ lines of code.
    [Show full text]
  • Codesonar for C# Datasheet
    DATASHEET CodeSonar C# SAST when Safety and Security Matter Accelerate Application Security Software teams are under constant pressure to deliver more content with higher complexity, in shorter timeframes, with increased quality and security. Static Application Security Testing is a proven best practice to help software teams deliver the best code in the shortest timeframe. GrammaTech has been a leader in this field for over 15 years with CodeSonar delivering C# multi-language SAST capabilities for enterprises where software quality and software security matter. DevSecOps - Speed and Scale Abstract Interpretation Software developers need rapid feedback on security GrammaTech SAST tools use the concept of abstract vulnerabilities in their work artifacts. CodeSonar can be interpretation to statically examine all the paths through the integrated into software development environments, can work application and understand the values of variables and how they unobtrusively to the developer and provide rapid feedback. impact program state. Abstract interpretation gives CodeSonar for C# the highest scores in vulnerability benchmarks. Security Quality Privacy Broad coverage of security vulnerabilities, Integration into DevSecOps to improve Checkers that detect performance including OWASP Top10, SANS/CWE 25. quality of the code and developer impacts such as unnecessary test for Support for third party applications efficiency. Find code quality and nullness, creation of redundant objects or through byte code analysis. performance issues at speed. superfluous memory writes. Use Cases Enterprise customers are using C# in their internal applications, either in-house built, or built by a third-party. Static analysis is needed to to improve security and quality to drive business continuity. Mobile and Client customers are using C# on end-points, sometimes in an internet-of-things deployment, or to provide information to mobile users.
    [Show full text]
  • Software Vulnerabilities Principles, Exploitability, Detection and Mitigation
    Software vulnerabilities principles, exploitability, detection and mitigation Laurent Mounier and Marie-Laure Potet Verimag/Université Grenoble Alpes GDR Sécurité – Cyber in Saclay (Winter School in CyberSecurity) February 2021 Software vulnerabilities . are everywhere . and keep going . 2 / 35 Outline Software vulnerabilities (what & why ?) Programming languages (security) issues Exploiting a sofwtare vulnerability Software vulnerabilities mitigation Conclusion Example 1: password authentication Is this code “secure” ? boolean verify (char[] input, char[] passwd , byte len) { // No more than triesLeft attempts if (triesLeft < 0) return false ; // no authentication // Main comparison for (short i=0; i <= len; i++) if (input[i] != passwd[i]) { triesLeft-- ; return false ; // no authentication } // Comparison is successful triesLeft = maxTries ; return true ; // authentication is successful } functional property: verify(input; passwd; len) , input[0::len] = passwd[0::len] What do we want to protect ? Against what ? I confidentiality of passwd, information leakage ? I control-flow integrity of the code I no unexpected runtime behaviour, etc. 3 / 35 Example 2: make ‘python -c ’print "A"*5000’‘ run make with a long argument crash (in recent Ubuntu versions) Why do we need to bother about crashes (wrt. security) ? crash = consequence of an unexpected run-time error not trapped/foreseen by the programmer, nor by the compiler/interpreter ) some part of the execution: I may take place outside the program scope/semantics I but can be controled/exploited by an attacker (∼ “weird machine”) out of scope execution runtime error crash normal execution possibly exploitable ... security breach ! ,! may break all security properties ... from simple denial-of-service to arbitrary code execution Rk: may also happen silently (without any crash !) 4 / 35 Back to the context: computer system security what does security mean ? I a set of general security properties: CIA Confidentiality, Integrity, Availability (+ Non Repudiation + Anonymity + .
    [Show full text]
  • An Intelligent Debugging Tutor for Novice Computer Science Students Elizabeth Emily Carter Lehigh University
    Lehigh University Lehigh Preserve Theses and Dissertations 2014 An Intelligent Debugging Tutor For Novice Computer Science Students Elizabeth Emily Carter Lehigh University Follow this and additional works at: http://preserve.lehigh.edu/etd Part of the Computer Sciences Commons Recommended Citation Carter, Elizabeth Emily, "An Intelligent Debugging Tutor For Novice Computer Science Students" (2014). Theses and Dissertations. Paper 1447. This Dissertation is brought to you for free and open access by Lehigh Preserve. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Lehigh Preserve. For more information, please contact [email protected]. An Intelligent Debugging Tutor For Novice Computer Science Students By Elizabeth Emily Carter Presented to the Graduate and Research Committee Of Lehigh University In Candidacy for the Degree of Doctor of Philosophy In Computer Science Lehigh University May 2014 © Copyright by Elizabeth Carter All rights reserved ii Approved and recommended for acceptance as a dissertation in partial fulfillment of the requirements for the degree of Doctor of Philosophy __________________ Date ________________________ Dissertation Advisor (Dr. Gang Tan) ___________________ Accepted Date Committee Members: _______________________ Dr. Gang Tan ________________________ Dr. Jeff Heflin ________________________ Dr. H. Lynn Columba ________________________ Dr. Glenn Blank iii Acknowledgements: First and foremost I would like to thank my de facto advisor Dr. Glenn Blank. He was the first faculty member I spoke to when I applied to Lehigh’s PhD program and he has guided me through the entire process. His guidance and editing assistance have been absolutely invaluable. I would like to thank Dr. Blank, Dr. Henry Odi, Dr. H.
    [Show full text]
  • Code Review Guide
    CODE REVIEW GUIDE 2.0 RELEASE Project leaders: Larry Conklin and Gary Robinson Creative Commons (CC) Attribution Free Version at: https://www.owasp.org 1 F I 1 Forward - Eoin Keary Introduction How to use the Code Review Guide 7 8 10 2 Secure Code Review 11 Framework Specific Configuration: Jetty 16 2.1 Why does code have vulnerabilities? 12 Framework Specific Configuration: JBoss AS 17 2.2 What is secure code review? 13 Framework Specific Configuration: Oracle WebLogic 18 2.3 What is the difference between code review and secure code review? 13 Programmatic Configuration: JEE 18 2.4 Determining the scale of a secure source code review? 14 Microsoft IIS 20 2.5 We can’t hack ourselves secure 15 Framework Specific Configuration: Microsoft IIS 40 2.6 Coupling source code review and penetration testing 19 Programmatic Configuration: Microsoft IIS 43 2.7 Implicit advantages of code review to development practices 20 2.8 Technical aspects of secure code review 21 2.9 Code reviews and regulatory compliance 22 5 A1 3 Injection 51 Injection 52 Blind SQL Injection 53 Methodology 25 Parameterized SQL Queries 53 3.1 Factors to Consider when Developing a Code Review Process 25 Safe String Concatenation? 53 3.2 Integrating Code Reviews in the S-SDLC 26 Using Flexible Parameterized Statements 54 3.3 When to Code Review 27 PHP SQL Injection 55 3.4 Security Code Review for Agile and Waterfall Development 28 JAVA SQL Injection 56 3.5 A Risk Based Approach to Code Review 29 .NET Sql Injection 56 3.6 Code Review Preparation 31 Parameter collections 57 3.7 Code Review Discovery and Gathering the Information 32 3.8 Static Code Analysis 35 3.9 Application Threat Modeling 39 4.3.2.
    [Show full text]
  • Applications of SMT Solvers to Program Verification
    Nikolaj Bjørner Leonardo de Moura Applications of SMT solvers to Program Verification Rough notes for SSFT 2014 Prepared as part of a forthcoming revision of Daniel Kr¨oningand Ofer Strichman's book on Decision Procedures May 19, 2014 Springer Contents 1 Applications of SMT Solvers ............................... 5 1.1 Introduction . .5 1.2 From Programs to Logic . .6 1.2.1 An Imperative Programming Language Substrate. .6 1.2.2 Programs that are logic in disguise . .8 1.3 Test-case Generation using Dynamic Symbolic Execution . .9 1.3.1 The Application . .9 1.3.2 An Example . 10 1.3.3 Methodolgy. 11 1.3.4 Interfacing with SMT Solvers . 13 1.3.5 Industrial Adoption . 14 1.4 Symbolic Software Model checking . 15 1.4.1 The Application . 15 1.4.2 An Example . 15 1.4.3 Methodolgy. 16 1.4.4 Interfacing with SMT Solvers . 18 1.4.5 Industrial Adoption . 18 1.5 Static Analysis . 19 1.5.1 The Application . 19 1.5.2 An Example . 20 1.5.3 Methodolgy. 20 1.5.4 Interfacing with SMT Solvers . 22 1.5.5 Industrial Adoption . 22 1.6 Program Verification . 23 1.6.1 The Application . 23 1.6.2 An Example . 23 1.6.3 Methodolgy. 25 1.6.4 Bit-precise reasoning . 28 1.6.5 Industrial Adoption . 29 1.7 Bibliographical notes . 29 4 Contents References ..................................................... 31 1 Applications of SMT Solvers 1.1 Introduction A significant application domain for SMT solvers is in the analysis, verifi- cation, testing and construction of programs.
    [Show full text]
  • Report on the Third Static Analysis Tool Exposition (SATE 2010)
    Special Publication 500-283 Report on the Third Static Analysis Tool Exposition (SATE 2010) Editors: Vadim Okun Aurelien Delaitre Paul E. Black Software and Systems Division Information Technology Laboratory National Institute of Standards and Technology Gaithersburg, MD 20899 October 2011 U.S. Department of Commerce John E. Bryson, Secretary National Institute of Standards and Technology Patrick D. Gallagher, Under Secretary for Standards and Technology and Director Abstract: The NIST Software Assurance Metrics And Tool Evaluation (SAMATE) project conducted the third Static Analysis Tool Exposition (SATE) in 2010 to advance research in static analysis tools that find security defects in source code. The main goals of SATE were to enable empirical research based on large test sets, encourage improvements to tools, and promote broader and more rapid adoption of tools by objectively demonstrating their use on production software. Briefly, participating tool makers ran their tool on a set of programs. Researchers led by NIST performed a partial analysis of tool reports. The results and experiences were reported at the SATE 2010 Workshop in Gaithersburg, MD, in October, 2010. The tool reports and analysis were made publicly available in 2011. This special publication consists of the following three papers. “The Third Static Analysis Tool Exposition (SATE 2010),” by Vadim Okun, Aurelien Delaitre, and Paul E. Black, describes the SATE procedure and provides observations based on the data collected. The other two papers are written by participating tool makers. “Goanna Static Analysis at the NIST Static Analysis Tool Exposition,” by Mark Bradley, Ansgar Fehnker, Ralf Huuck, and Paul Steckler, introduces Goanna, which uses a combination of static analysis with model checking, and describes its SATE experience, tool results, and some of the lessons learned in the process.
    [Show full text]
  • Partner Directory Wind River Partner Program
    PARTNER DIRECTORY WIND RIVER PARTNER PROGRAM The Internet of Things (IoT), cloud computing, and Network Functions Virtualization are but some of the market forces at play today. These forces impact Wind River® customers in markets ranging from aerospace and defense to consumer, networking to automotive, and industrial to medical. The Wind River® edge-to-cloud portfolio of products is ideally suited to address the emerging needs of IoT, from the secure and managed intelligent devices at the edge to the gateway, into the critical network infrastructure, and up into the cloud. Wind River offers cross-architecture support. We are proud to partner with leading companies across various industries to help our mutual customers ease integration challenges; shorten development times; and provide greater functionality to their devices, systems, and networks for building IoT. With more than 200 members and still growing, Wind River has one of the embedded software industry’s largest ecosystems to complement its comprehensive portfolio. Please use this guide as a resource to identify companies that can help with your development across markets. For updates, browse our online Partner Directory. 2 | Partner Program Guide MARKET FOCUS For an alphabetical listing of all members of the *Clavister ..................................................37 Wind River Partner Program, please see the Cloudera ...................................................37 Partner Index on page 139. *Dell ..........................................................45 *EnterpriseWeb
    [Show full text]
  • Automatic Bug-Finding Techniques for Large Software Projects
    Masaryk University Faculty}w¡¢£¤¥¦§¨ of Informatics!"#$%&'()+,-./012345<yA| Automatic Bug-finding Techniques for Large Software Projects Ph.D. Thesis Jiri Slaby Brno, 2013 Revision 159 Supervisor: prof. RNDr. Antonín Kučera, PhD. Co-supervisor: assoc. prof. RNDr. Jan Strejček, PhD. ii Copyright c Jiri Slaby, 2013 All rights reservedtož to ja iii iv Abstract This thesis studies the application of contemporary automatic bug-finding techniques to large software projects. As they tend to be complex systems, there is a call for efficient analyzers that can exercise such projects without generating too many false positives. To fulfill these requirements, this thesis aims at the static analysis with the primary focus at techniques like the abstract interpretation and symbolic execution. Since their introduction in 1970’s, their applicability is increasing thanks to current fast processors and further improvements of the techniques. The thesis summarizes the current state of the art of chosen techniques implemented in the current static analyzers. Further, we discuss some large code bases which can be used as a test-bed for the techniques. Finally, we choose the Linux kernel. On the top of the explored state of the art, we de- velop our own techniques. We present the framework Stanse which utilizes the abstract interpretation and has no issues with handling the Linux kernel. The framework also prunes false positives resulting from some checkers. The pruning method is based on the pattern matching, but it turns out to be weak. Hence we subsequently try to improve the situation by the symbolic execution. For that purpose we use an executor called Klee in our new tool Symbiotic.
    [Show full text]