Dynamic Program Analysis for Suggesting Test Improvements to Developers Oscar Vera-Pérez
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
ON SOFTWARE TESTING and SUBSUMING MUTANTS an Empirical Study
ON SOFTWARE TESTING AND SUBSUMING MUTANTS An empirical study Master Degree Project in Computer Science Advanced level 15 credits Spring term 2014 András Márki Supervisor: Birgitta Lindström Examiner: Gunnar Mathiason Course: DV736A WECOA13: Web Computing Magister Summary Mutation testing is a powerful, but resource intense technique for asserting software quality. This report investigates two claims about one of the mutation operators on procedural logic, the relation operator replacement (ROR). The constrained ROR mutant operator is a type of constrained mutation, which targets to lower the number of mutants as a “do smarter” approach, making mutation testing more suitable for industrial use. The findings in the report shows that the hypothesis on subsumption is rejected if mutants are to be detected on function return values. The second hypothesis stating that a test case can only detect a single top-level mutant in a subsumption graph is also rejected. The report presents a comprehensive overview on the domain of mutation testing, displays examples of the masking behaviour previously not described in the field of mutation testing, and discusses the importance of the granularity where the mutants should be detected under execution. The contribution is based on literature survey and experiment. The empirical findings as well as the implications are discussed in this master dissertation. Keywords: Software Testing, Mutation Testing, Mutant Subsumption, Relation Operator Replacement, ROR, Empirical Study, Strong Mutation, Weak Mutation -
Advanced Systems Security: Symbolic Execution
Systems and Internet Infrastructure Security Network and Security Research Center Department of Computer Science and Engineering Pennsylvania State University, University Park PA Advanced Systems Security:! Symbolic Execution Trent Jaeger Systems and Internet Infrastructure Security (SIIS) Lab Computer Science and Engineering Department Pennsylvania State University Systems and Internet Infrastructure Security (SIIS) Laboratory Page 1 Problem • Programs have flaws ‣ Can we find (and fix) them before adversaries can reach and exploit them? • Then, they become “vulnerabilities” Systems and Internet Infrastructure Security (SIIS) Laboratory Page 2 Vulnerabilities • A program vulnerability consists of three elements: ‣ A flaw ‣ Accessible to an adversary ‣ Adversary has the capability to exploit the flaw • Which one should we focus on to find and fix vulnerabilities? ‣ Different methods are used for each Systems and Internet Infrastructure Security (SIIS) Laboratory Page 3 Is It a Flaw? • Problem: Are these flaws? ‣ Failing to check the bounds on a buffer write? ‣ printf(char *n); ‣ strcpy(char *ptr, char *user_data); ‣ gets(char *ptr) • From man page: “Never use gets().” ‣ sprintf(char *s, const char *template, ...) • From gnu.org: “Warning: The sprintf function can be dangerous because it can potentially output more characters than can fit in the allocation size of the string s.” • Should you fix all of these? Systems and Internet Infrastructure Security (SIIS) Laboratory Page 4 Is It a Flaw? • Problem: Are these flaws? ‣ open( filename, O_CREAT ); -
Opportunities and Open Problems for Static and Dynamic Program Analysis Mark Harman∗, Peter O’Hearn∗ ∗Facebook London and University College London, UK
1 From Start-ups to Scale-ups: Opportunities and Open Problems for Static and Dynamic Program Analysis Mark Harman∗, Peter O’Hearn∗ ∗Facebook London and University College London, UK Abstract—This paper1 describes some of the challenges and research questions that target the most productive intersection opportunities when deploying static and dynamic analysis at we have yet witnessed: that between exciting, intellectually scale, drawing on the authors’ experience with the Infer and challenging science, and real-world deployment impact. Sapienz Technologies at Facebook, each of which started life as a research-led start-up that was subsequently deployed at scale, Many industrialists have perhaps tended to regard it unlikely impacting billions of people worldwide. that much academic work will prove relevant to their most The paper identifies open problems that have yet to receive pressing industrial concerns. On the other hand, it is not significant attention from the scientific community, yet which uncommon for academic and scientific researchers to believe have potential for profound real world impact, formulating these that most of the problems faced by industrialists are either as research questions that, we believe, are ripe for exploration and that would make excellent topics for research projects. boring, tedious or scientifically uninteresting. This sociological phenomenon has led to a great deal of miscommunication between the academic and industrial sectors. I. INTRODUCTION We hope that we can make a small contribution by focusing on the intersection of challenging and interesting scientific How do we transition research on static and dynamic problems with pressing industrial deployment needs. Our aim analysis techniques from the testing and verification research is to move the debate beyond relatively unhelpful observations communities to industrial practice? Many have asked this we have typically encountered in, for example, conference question, and others related to it. -
Addresssanitizer + Code Coverage Kostya Serebryany, Google Eurollvm 2014 New and Shiny -Fprofile-Instr-Generate
AddressSanitizer + Code Coverage Kostya Serebryany, Google EuroLLVM 2014 New and shiny -fprofile-instr-generate ● Coming this year ● Fast BB-level code coverage ● Increment a counter per every (*) BB ○ Possible contention on counters ● Creates special non-code sections ○ Counters ○ Function names, line numbers Meanwhile: ASanCoverage ● Tiny prototype-ish thing: ○ Part of AddressSanitizer ○ 30 lines in LLVM, 100 in run-time ● Function- or BB- level coverage ○ Booleans only, not counters ○ No contention ○ No extra sections in the binary At compile time: if (!*BB_Guard) { __sanitizer_cov(); *BB_Guard = 1; } At run time void __sanitizer_cov() { Record(GET_CALLER_PC()); } At exit time ● For every binary/DSO in the process: ○ Dump observed PCs in a separate file as 4-byte offsets At analysis time ● Compare/Merge using 20 lines of python ● Symbolize using regular DWARF % cat cov.c int main() { } % clang -g -fsanitize=address -mllvm -asan-coverage=1 cov. c % ASAN_OPTIONS=coverage=1 ./a.out % wc -c *sancov 4 a.out.15751.sancov % sancov.py print a.out.15751.sancov sancov.py: read 1 PCs from a.out.15751.sancov sancov.py: 1 files merged; 1 PCs total 0x4850b7 % sancov.py print *.sancov | llvm-symbolizer --obj=a.out main /tmp/cov.c:1:0 Fuzzing with coverage feedback ● Test corpus: N random tests ● Randomly mutate random test ○ If new BB is covered -- add this test to the corpus ● Many new bugs in well fuzzed projects! Feedback from our customers ● Speed is paramount ● Binary size is important ○ Permanent & temporary storage, tmps, I/O ○ Stripping non-code -
The Art, Science, and Engineering of Fuzzing: a Survey
1 The Art, Science, and Engineering of Fuzzing: A Survey Valentin J.M. Manes,` HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, and Maverick Woo Abstract—Among the many software vulnerability discovery techniques available today, fuzzing has remained highly popular due to its conceptual simplicity, its low barrier to deployment, and its vast amount of empirical evidence in discovering real-world software vulnerabilities. At a high level, fuzzing refers to a process of repeatedly running a program with generated inputs that may be syntactically or semantically malformed. While researchers and practitioners alike have invested a large and diverse effort towards improving fuzzing in recent years, this surge of work has also made it difficult to gain a comprehensive and coherent view of fuzzing. To help preserve and bring coherence to the vast literature of fuzzing, this paper presents a unified, general-purpose model of fuzzing together with a taxonomy of the current fuzzing literature. We methodically explore the design decisions at every stage of our model fuzzer by surveying the related literature and innovations in the art, science, and engineering that make modern-day fuzzers effective. Index Terms—software security, automated software testing, fuzzing. ✦ 1 INTRODUCTION Figure 1 on p. 5) and an increasing number of fuzzing Ever since its introduction in the early 1990s [152], fuzzing studies appear at major security conferences (e.g. [225], has remained one of the most widely-deployed techniques [52], [37], [176], [83], [239]). In addition, the blogosphere is to discover software security vulnerabilities. At a high level, filled with many success stories of fuzzing, some of which fuzzing refers to a process of repeatedly running a program also contain what we consider to be gems that warrant a with generated inputs that may be syntactically or seman- permanent place in the literature. -
Thesis Template
Automated Testing: Requirements Propagation via Model Transformation in Embedded Software Nader Kesserwan A Thesis In The Concordia Institute For Information System Engineering Presented in Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy (Information and Systems Engineering) at Concordia University Montreal, Quebec, Canada March 2020 © Nader Kesserwan 2020 ABSTRACT Automated Testing: Requirements Propagation via Model Transformation in Embedded Software Nader Kesserwan, Ph.D. Concordia University, 2020 Testing is the most common activity to validate software systems and plays a key role in the software development process. In general, the software testing phase takes around 40-70% of the effort, time and cost. This area has been well researched over a long period of time. Unfortunately, while many researchers have found methods of reducing time and cost during the testing process, there are still a number of important related issues such as generating test cases from UCM scenarios and validate them need to be researched. As a result, ensuring that an embedded software behaves correctly is non-trivial, especially when testing with limited resources and seeking compliance with safety-critical software standard. It thus becomes imperative to adopt an approach or methodology based on tools and best engineering practices to improve the testing process. This research addresses the problem of testing embedded software with limited resources by the following. First, a reverse-engineering technique is exercised on legacy software tests aims to discover feasible transformation from test layer to test requirement layer. The feasibility of transforming the legacy test cases into an abstract model is shown, along with a forward engineering process to regenerate the test cases in selected test language. -
Testing, Debugging & Verification
Testing, debugging & verification Srinivas Pinisetty This course Introduction to techniques to get (some) certainty that your program does what it’s supposed to. Specification: An unambiguous description of what a function (program) should do. Bug: failure to meet specification. What is a Bug? Basic Terminology ● Defect (aka bug, fault) introduced into code by programmer (not always programmer's fault, if, e.g., requirements changed) ● Defect may cause infection of program state during execution (not all defects cause infection) ● Infected state propagates during execution (infected parts of states may be overwritten or corrected) ● Infection may cause a failure: an externally observable error (including, e.g., non-termination) Terminology ● Testing - Check for bugs ● Debugging – Relating a failure to a defect (systematically find source of failure) ● Specification - Describe what is a bug ● (Formal) Verification - Prove that there are no bugs Cost of certainty Formal Verification Property based testing Unit testing Man hours (*) Graph not based on data, only indication More certainty = more work Contract metaphor Supplier: (callee) Implementer of method Client: (caller) Implementer of calling method or user Contract: Requires (precondition): What the client must ensure Ensures (postcondition): What the supplier must ensure ● Testing ● Formal specification ○ Unit testing ○ Logic ■ Coverage criteria ■ Propositional logic ● Control-Flow based ■ Predicate Logic ● Logic Based ■ SAT ■ Extreme Testing ■ SMT ■ Mutation testing ○ Dafny ○ Input -
Static Program Analysis As a Fuzzing Aid
Static Program Analysis as a Fuzzing Aid Bhargava Shastry1( ), Markus Leutner1, Tobias Fiebig1, Kashyap Thimmaraju1, Fabian Yamaguchi2, Konrad Rieck2, Stefan Schmid3, Jean-Pierre Seifert1, and Anja Feldmann1 1 TU Berlin, Berlin, Germany [email protected] 2 TU Braunschweig, Braunschweig, Germany 3 Aalborg University, Aalborg, Denmark Abstract. Fuzz testing is an effective and scalable technique to per- form software security assessments. Yet, contemporary fuzzers fall short of thoroughly testing applications with a high degree of control-flow di- versity, such as firewalls and network packet analyzers. In this paper, we demonstrate how static program analysis can guide fuzzing by aug- menting existing program models maintained by the fuzzer. Based on the insight that code patterns reflect the data format of inputs pro- cessed by a program, we automatically construct an input dictionary by statically analyzing program control and data flow. Our analysis is performed before fuzzing commences, and the input dictionary is sup- plied to an off-the-shelf fuzzer to influence input generation. Evaluations show that our technique not only increases test coverage by 10{15% over baseline fuzzers such as afl but also reduces the time required to expose vulnerabilities by up to an order of magnitude. As a case study, we have evaluated our approach on two classes of network applications: nDPI, a deep packet inspection library, and tcpdump, a network packet analyzer. Using our approach, we have uncovered 15 zero-day vulnerabilities in the evaluated software that were not found by stand-alone fuzzers. Our work not only provides a practical method to conduct security evalua- tions more effectively but also demonstrates that the synergy between program analysis and testing can be exploited for a better outcome. -
A Randomized Dynamic Program Analysis Technique for Detecting Real Deadlocks
A Randomized Dynamic Program Analysis Technique for Detecting Real Deadlocks Pallavi Joshi Chang-Seo Park Mayur Naik Koushik Sen Intel Research, Berkeley, USA EECS Department, UC Berkeley, USA [email protected] {pallavi,parkcs,ksen}@cs.berkeley.edu Abstract cipline followed by the parent software and this sometimes results We present a novel dynamic analysis technique that finds real dead- in deadlock bugs [17]. locks in multi-threaded programs. Our technique runs in two stages. Deadlocks are often difficult to find during the testing phase In the first stage, we use an imprecise dynamic analysis technique because they happen under very specific thread schedules. Coming to find potential deadlocks in a multi-threaded program by observ- up with these subtle thread schedules through stress testing or ing an execution of the program. In the second stage, we control random testing is often difficult. Model checking [15, 11, 7, 14, 6] a random thread scheduler to create the potential deadlocks with removes these limitations of testing by systematically exploring high probability. Unlike other dynamic analysis techniques, our ap- all thread schedules. However, model checking fails to scale for proach has the advantage that it does not give any false warnings. large multi-threaded programs due to the exponential increase in We have implemented the technique in a prototype tool for Java, the number of thread schedules with execution length. and have experimented on a number of large multi-threaded Java Several program analysis techniques, both static [19, 10, 2, 9, programs. We report a number of previously known and unknown 27, 29, 21] and dynamic [12, 13, 4, 1], have been developed to de- real deadlocks that were found in these benchmarks. -
A Zero Kernel Operating System: Rethinking Microkernel Design by Leveraging Tagged Architectures and Memory-Safe Languages
A Zero Kernel Operating System: Rethinking Microkernel Design by Leveraging Tagged Architectures and Memory-Safe Languages by Justin Restivo B.S., Massachusetts Institute of Technology (2019) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Masters of Engineering in Computer Science and Engineering at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY February 2020 © Massachusetts Institute of Technology 2020. All rights reserved. Author............................................................................ Department of Electrical Engineering and Computer Science January 29, 2020 Certified by....................................................................... Dr. Howard Shrobe Principal Research Scientist, MIT CSAIL Thesis Supervisor Certified by....................................................................... Dr. Hamed Okhravi Senior Staff Member, MIT Lincoln Laboratory Thesis Supervisor Certified by....................................................................... Dr. Samuel Jero Technical Staff, MIT Lincoln Laboratory Thesis Supervisor Accepted by...................................................................... Katrina LaCurts Chair, Master of Engineering Thesis Committee DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited. This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions -
Different Types of Testing
Different Types of Testing Performance testing a. Performance testing is designed to test run time performance of software within the context of an integrated system. It is not until all systems elements are fully integrated and certified as free of defects the true performance of a system can be ascertained b. Performance tests are often coupled with stress testing and often require both hardware and software infrastructure. That is, it is necessary to measure resource utilization in an exacting fashion. External instrumentation can monitor intervals, log events. By instrument the system, the tester can uncover situations that lead to degradations and possible system failure Security testing If your site requires firewalls, encryption, user authentication, financial transactions, or access to databases with sensitive data, you may need to test these and also test your site's overall protection against unauthorized internal or external access Exploratory Testing Often taken to mean a creative, internal software test that is not based on formal test plans or test cases; testers may be learning the software as they test it Benefits Realization tests With the increased focus on the value of Business returns obtained from investments in information technology, this type of test or analysis is becoming more critical. The benefits realization test is a test or analysis conducted after an application is moved into production in order to determine whether the application is likely to deliver the original projected benefits. The analysis is usually conducted by the business user or client group who requested the project and results are reported back to executive management Mutation Testing Mutation testing is a method for determining if a set of test data or test cases is useful, by deliberately introducing various code changes ('bugs') and retesting with the original test data/cases to determine if the 'bugs' are detected. -
Guidelines on Minimum Standards for Developer Verification of Software
Guidelines on Minimum Standards for Developer Verification of Software Paul E. Black Barbara Guttman Vadim Okun Software and Systems Division Information Technology Laboratory July 2021 Abstract Executive Order (EO) 14028, Improving the Nation’s Cybersecurity, 12 May 2021, di- rects the National Institute of Standards and Technology (NIST) to recommend minimum standards for software testing within 60 days. This document describes eleven recommen- dations for software verification techniques as well as providing supplemental information about the techniques and references for further information. It recommends the following techniques: • Threat modeling to look for design-level security issues • Automated testing for consistency and to minimize human effort • Static code scanning to look for top bugs • Heuristic tools to look for possible hardcoded secrets • Use of built-in checks and protections • “Black box” test cases • Code-based structural test cases • Historical test cases • Fuzzing • Web app scanners, if applicable • Address included code (libraries, packages, services) The document does not address the totality of software verification, but instead recom- mends techniques that are broadly applicable and form the minimum standards. The document was developed by NIST in consultation with the National Security Agency. Additionally, we received input from numerous outside organizations through papers sub- mitted to a NIST workshop on the Executive Order held in early June, 2021 and discussion at the workshop as well as follow up with several of the submitters. Keywords software assurance; verification; testing; static analysis; fuzzing; code review; software security. Disclaimer Any mention of commercial products or reference to commercial organizations is for infor- mation only; it does not imply recommendation or endorsement by NIST, nor is it intended to imply that the products mentioned are necessarily the best available for the purpose.