Cost-Effective Quality Assurance For Long-Lived Software Using Automated Static Analysis

Daniela Steidl

Institut für Informatik der Technischen Universität München

Cost-Effective Quality Assurance For Long-Lived Software Using Automated Static Analysis

Daniela Steidl

Complete reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of

Doktor der Naturwissenschaften (Dr. rer. nat.).

Chair: Univ.-Prof. Bernd Brügge, Ph.D.
Examiners of the dissertation: 1. Univ.-Prof. Dr. Dr. h.c. Manfred Broy, 2. Assoc. Prof. Andy Zaidman, Ph.D., Delft University of Technology, Netherlands

The dissertation was submitted to the Technische Universität München on 05.11.2015 and accepted by the Fakultät für Informatik on 26.01.2016.

Abstract

Developing large-scale, long-lived software and adapting it to changing requirements is time and cost intensive. As many systems are maintained for decades, controlling their total life cycle costs is key for commercial success. With source code being the main software artifact, its quality significantly influences the arising costs for maintenance. However, as many changes have to be performed under time pressure, suboptimal implementation decisions often lead to gradual code quality decay if no counter measures are taken. To prevent code quality decay, automated static analysis is a powerful quality assurance technique that addresses different aspects of software quality—e. g. security, correctness, or maintainability—and reveals findings: Findings point to code that likely increases future costs in the life cycle of the software. Examples comprise hints to programming faults, vulnerabilities for security attacks, or maintainability problems like redundant code. Although static analyses are widely used, they are often not applied effectively: When introduced to a grown, long-lived system, they typically reveal thousands of findings. As this number is too large and the budget for quality improvements is usually limited, not all findings can be removed. Additionally, false positives disturb developers, and new findings might be created when existing ones are removed. Due to these challenges, code quality often does not improve in the long term despite static analysis tools being installed.

This thesis provides an approach for using static analysis cost-effectively in practice. To reach this goal, we first analyze the state-of-the-practice of using static analysis and investigate the number and the nature of findings occurring in industry. As we show that not all findings can be removed at once, we argue that only those should be selected which reveal the best cost-benefit ratio. Consequently, we build a conceptual model which outlines costs and benefits of finding removal. From the model, we derive a prioritization approach which helps developers to effectively select a subset of all findings for removal. However, to actually remove the selected findings, developers also require allocated resources and, thus, management support. To convince management of the benefit of providing resources for quality improvement, we combine the prioritization approach with a continuous quality control process which provides transparency for managers. Further, we evaluate the corresponding tool support necessary to apply the process in practice.

The combination of our prioritization approach, quality control process, and tooling enables the cost-effective usage of static analysis in practice. We evaluate the applicability and usefulness of the cost-benefit prioritization with several empirical studies, industrial case studies, and developer interviews. The evaluations show that the prioritization is appreciated by developers as a quality-improving measure. We analyze the success of the quality control process with a longitudinal study with one of our industrial partners. The study shows that the number of findings can be decreased in the long term even if systems are still growing in size.

"But it ain't about how hard you hit. It's about how hard you can get hit and keep moving forward."
Rocky Balboa

Acknowledgements

I would like to thank everyone who supported me throughout my thesis. First of all, I thank Prof. Manfred Broy for supervising my thesis, for the fruitful scientific discussions, for his invitations to every Lehrstuhlhuette, and also for providing me my own office desk at university. I always felt warmly welcomed in his lab.

My thanks go to the second member of my PhD committee, Prof. Andy Zaidman. His careful and detailed reviews as well as his remote support helped me tremendously in the final period of the PhD.

I would like to thank my company CQSE, my colleagues, and my customers, without whom this industry PhD would not have been possible. I greatly enjoyed the opportunity to conduct industry-oriented research while being closely in touch with our customers as a software quality consultant. I deeply appreciated the company environment providing first-hand industry experience combined with a surrounding scientific mind-set. In particular, my thanks go to my advisor Dr. Florian Deissenboeck for his countless, honest and direct reviews of my work, numerous profound and inspiring scientific discussions, and his moral support in moments of frustration. I would also like to thank Dr. Elmar Juergens for his very helpful input on numerous paper stories and the thesis story. Additionally, my thanks go to Dr. Benjamin Hummel for his technical advice and support and for guiding me throughout my bachelor and master thesis as well as guided research. Without having known him, I would not have dared to do an industry PhD in such a small and young company—which my company was at the time I started my PhD.

I would also like to thank all my colleagues at university. They helped me to broaden my scientific horizon and prevented me from becoming a lone warrior. In particular, I thank Benedikt Hauptmann for his support, both scientifically and morally. He kept me grounded on earth and helped me to take it simply step after step. I also thank Sebastian Eder for his co-authorship on our best-paper award, his always available help, and for being the best office mate possible! I really enjoyed the days I was able to spend at university; they were primarily filled with laughter and joy. I certainly will miss these days and my colleagues.

Outside of work, I thank my underwater hockey team in Munich for coping with my aggressive playing style and my bad mood when I was not proceeding with my PhD as fast as I wanted to! Hockey has been the best way to get refreshed. The sport, its people, and their friendship have been my greatest source of energy.

Last but not least, I thank my parents for the way they raised me, for the possibility to study without any financial worries, for countless supporting phone calls, and for encouraging me to reach my goals day after day.

Publication Preface

The contribution of this thesis is based on the following five first-author papers.

A D. Steidl, B. Hummel, E. Juergens: Incremental Origin Analysis for Source Code Files. Working Conference on Mining Software Repositories, 2014, 10 pages, [89] ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2597073.2597111

B D. Steidl, F. Deissenboeck: How do Java Methods Grow? Working Conference on Source Code Analysis and Manipulation, 2015, 10 pages, [84] Copyright ©2015 IEEE. Reprinted, with permission.

C D. Steidl, N. Goede: Feature-based Detection of Bugs in Clones. International Workshop on Software Clones, 2013, 7 pages, [87] Copyright ©2013 IEEE. Reprinted, with permission.

D D. Steidl, S. Eder: Prioritizing Maintainability Defects Based on Refactoring Recommendations. International Conference on Program Comprehension, 2014, 9 pages, [86] ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2597008.2597805

E D. Steidl, F. Deissenboeck, et al.: Continuous Software Quality Control in Practice. International Conference on Software Maintenance and Evolution, 2014, 4 pages, [85] Copyright ©2014 IEEE. Reprinted, with permission.

The following publication, with major contributions as second author, is also included.

F L. Heinemann, B. Hummel, D. Steidl: Teamscale: Software Quality Control in Real-Time. International Conference on Software Engineering, 2014, 4 pages, [40] ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2591062.2591068

Contents

Abstract 5

Acknowledgments 7

Publication Preface 9

1 Introduction 13
  1.1 Problem Statement 14
  1.2 Approach and Methodology 15
  1.3 Contributions 22
  1.4 Outline 23

2 Terms and Definitions 25
  2.1 Analysis Terms 25
  2.2 Finding Terms 25
  2.3 System Terms 27

3 Fundamentals 29
  3.1 Activity-Based Maintenance Model 29
  3.2 Software Product Quality Model 29
  3.3 Quality Assurance Techniques 32
  3.4 Application of Automated Static Analysis 37
  3.5 Limitations of Automated Static Analysis 39
  3.6 Summary 40

4 A Quantitative Study of Static Analysis Findings in Industry 41
  4.1 Study Design 41
  4.2 Study Objects 43
  4.3 Results 44
  4.4 Discussion 48
  4.5 Summary 49

5 A Cost Model for Automated Static Analysis 51
  5.1 Conceptual Model 51
  5.2 Empirical Studies 55
  5.3 Model Instantiation 60
  5.4 Discussion 64
  5.5 Model-based Prediction 65
  5.6 Threats to Validity 67
  5.7 Summary 70

6 A Prioritization Approach for Cost-Effective Usage of Static Analysis 71
  6.1 Key Concepts 71
  6.2 Automatic Recommendation 73
  6.3 Manual Inspection 82
  6.4 Iterative Application 84
  6.5 Quality Control Process 87
  6.6 Summary 89

7 Publications 91
  7.1 Origin Analysis of Software Evolution 93
  7.2 Analysis of Software Changes 104
  7.3 Prioritizing External Findings Based on Benefit Estimation 115
  7.4 Prioritizing Internal Findings Based on Cost of Removal 123
  7.5 Prioritizing Findings in a Quality Control Process 133
  7.6 Tool Support for Finding Prioritization 138

8 Conclusion 145
  8.1 Summary 145
  8.2 Future Work 150

A Appendix 153
  A.1 Notes On Copyrights 153

Bibliography 157

"The bitterness of poor quality remains long after the sweetness of meeting the schedule has been forgotten."
Anonymous

1 Introduction

Developing large-scale software is a time and cost intensive challenge. As many systems in the business information or embedded domain are maintained for decades, controlling their total life cycle costs is key for commercial success. However, between 60% and 80% of the life cycle costs of long-lived systems are spent during the maintenance phase rather than during initial development [12, 33, 37, 63, 71]. Hence, to control the total life cycle costs, maintaining the system becomes crucial, i. e. adapting it to changing requirements and performing change requests correctly and in a timely manner.

The ability to control total life cycle costs strongly correlates to the product quality of software. Based on the ISO definition, software product quality comprises different aspects, e. g. functional suitability, security, maintainability, or portability [43, 44]. With source code being the essential core of a software product, its quality significantly impacts the overall quality [8, 13, 14, 28]. However, code quality often gradually decays due to software aging and continuous time pressure. Consequently, total life cycle costs increase when no effective counter measures are taken [31,55,61,74]. In fact, many of today’s systems reveal decayed quality because they have been developed on the brown-field, i. e. their code base grew over a long time without strong quality assurance [85].

Low product quality increases the costs to maintain software [14, 24, 55]. In this thesis, we approximate product quality with the system's non-conformance costs during the maintenance process [21, 24, 51, 80, 82, 99]: In contrast to conformance costs as the amount spent to achieve high software quality (e. g. costs for training staff, testing before deployment), non-conformance costs include all expenses resulting from the failure to achieve high quality. Non-conformance costs comprise both internal failure costs before the software is delivered (e. g. costs for restructuring the code) as well as external failure costs arising from system failure in production (e. g. costs for fixing field defects, retesting). Modeling code quality in terms of non-conformance costs is established, but not yet widespread [80, 82, 99].

In the literature, gradual software quality decay leading to an increase of non-conformance costs has also been referred to as the accumulation of technical debt [17, 18, 22, 59, 65, 72]. Technical debt metaphorically describes reduced maintainability due to shortcuts taken during development to increase delivery speed. Just like actual debt, technical debt has to be paid back eventually, and the longer it is not repaid—i. e. the code not being refactored—the more interest has to be paid—i. e. additional time developers spend on code understanding and code modification. While this part of the metaphor is very intuitive and comprehensible, the metaphor also creates the temptation to calculate a debt value in monetary terms [62, 79]. As the calculation of a specific debt value is heavily influenced by many context factors, we abstain from using the term technical debt in this work.


To reduce and control non-conformance costs in the overall life cycle, gradual quality decay has to be prevented. For this purpose, automated static analyses are a powerful analytical quality assurance technique [75]. They analyze various different aspects of code quality [6, 19, 32, 58, 64, 81, 102] and reveal findings: Findings point to concrete locations in the source code that hamper quality, i. e. that are likely to increase non-conformance costs. For example, static bug pattern detection reveals functional defects. Structural metrics or redundancy measurements address the code's understandability and changeability. Whereas bug patterns point to a high risk of arising non-conformance costs from external failure, hampered changeability likely causes non-conformance costs from internal failure.

Even though automated static analyses are not capable of addressing the various quality aspects completely, they have several benefits when compared to other analytical measures such as dynamic analyses [96] or code reviews [9, 76]: First, they are easier to perform than dynamic analyses as they do not require a running system while analyzing the source code. They can also be applied effectively before the system is tested as they do not depend on the runtime environment, surrounding systems, or underlying hardware [6, 32, 60]. Compared to fully manual approaches such as code reviews, automated static analyses provide more tool support and are cheaper to perform [60, 100]. Additionally, they can be used to enhance an expensive manual review process [73]. Consequently, they constitute a vital complement to the set of available quality assurance techniques [108]. For the remainder of this thesis, we use the term static analysis to refer to automated static analysis.

1.1 Problem Statement

Static analyses are widely known among developers [96] and also supported by a large variety of tools [11, 25, 32, 39, 40, 46, 77]. However, although many companies have static analysis tools installed, developers do not use them frequently [45], the number of findings keeps growing, and the code quality of their systems often decays further [85]. Based on our industry experience of three years as software quality consultants working with customers from various different domains, we identified five major challenges developers usually face when applying static analyses. We identified these challenges across different domains based on numerous interviews conducted with developers, project managers, or chief information officers. While these claims are certainly not generalizable to all of software engineering, we believe they apply to the substantially large subset of long-lived software.

Number of Findings When static analyses are introduced to software systems developed on the brown-field, they typically result in thousands of findings [15, 16, 32, 45, 54, 77, 86]: Removing all findings at once is not feasible, as it would consume too much of current development time, and the budget for quality assurance is usually limited. Additionally, the risk of introducing new functional defects during an ad-hoc clean-up session is likely to be bigger than the expected reduction of non-conformance costs. Hence, developers can only remove a subset of the revealed findings within the available time.

Benefit of Removal Even though many findings are relevant, they reveal different benefits on removal [39, 78]: Removing findings that point, e. g., to a programming fault reveals an immediate benefit. Hence, developers are often willing to remove them instantly [6, 7]. Contrarily, removing findings that address maintainability provides only a long-term benefit. In the absence of immediate reward, developers often have a higher inhibition threshold to address these findings [78].

Relevance of Findings Among many relevant findings, static analyses also reveal false positives and non-relevant findings which disturb and distract developers [32, 39, 45, 77, 78, 100]. These findings should be eliminated from static analysis results while keeping the relevant findings.

Evolution of Findings Removing findings as a one-off event does not reduce the non-conformance costs in the long run: while removing existing findings, new findings can be introduced. Further, development continues and additional findings can emerge. These findings also have to be considered in the removal process.

Management Priority Change requests often dominate finding removal: Often, management is not convinced by the return on investment of using static analyses [60]. Under increasing time pressure and a rising number of change requests, static analyses are easily omitted—management often prioritizes new features instead of assuring high code quality.

Problem Statement Static analyses are often not applied effectively in practice: in brown-field software, they reveal thousands of relevant findings while the budget for quality improvement is usually limited; they produce false positives disturbing developers; and they are ineffective as one-off events ignoring ongoing development.

1.2 Approach and Methodology

The goal of this thesis is to provide an approach for using static analysis cost-effectively as a quality assurance technique, addressing the current challenges in practice. For this aim, we first analyze the current state-of-the-practice of using static analysis in industry. Based on the results, we derive subsequent steps. In the following, we outline the intermediate steps, the methodology we used, and how the results of each step contribute to the overall goal of this thesis. The intermediate steps are visualized in Figure 1.1.

1.2.1 Understanding State-Of-The-Practice

First, we analyze the current state-of-the-practice of using static analysis. With a large-scale empirical study of 43 systems written in Java, CSharp, or C/C++, we quantify how many findings are revealed by commonly used static analyses in industry and open source. In addition, we analyze the nature of the findings, i. e. how they affect the system's quality.


[Figure 1.1 summarizes the approach as nine steps. Problem understanding: (1) understanding the state-of-the-practice, resulting in an analysis of findings revealed in industry. Prestudies and fundamentals: (2) analyzing costs and benefits, resulting in a conceptual cost model for finding removal; (3) creating a strategy to separate external from internal findings, resulting in a detection strategy for external findings to maximize benefit; (4) creating the technical foundation, resulting in an origin analysis for reverse engineering complete software histories; (5) understanding software change, resulting in a study of changes in software histories; (6) deriving a change-based prioritization strategy for internal findings, resulting in a prioritization strategy for internal findings to minimize removal costs. Solution: (7) integrating the strategies into a prioritization approach, resulting in a cost-benefit-based prioritization. Application in industry: (8) evaluating process support with a quality control process; (9) evaluating tooling support with a Teamscale evaluation.]

Figure 1.1: Overview of Approach and Methodology

Results The study shows that there are two kinds of findings: While external findings present a high risk of a system failure observable by a user, internal findings reveal code pieces that hamper the system's internal (non-user-visible) quality, such as its maintainability or portability. The two kinds of findings differ notably in number: In our study systems, external findings were small in number. In contrast, thousands of internal findings were already revealed by only a few maintainability analyses.

Implications The number of findings is too large to be removed at once because the quality assurance budget is usually limited. To effectively use the available budget, those findings should be selected which reveal the best cost-benefit ratio. The benefit of removing external findings is quite obvious—a potential system failure is eliminated. However, the benefit of removing internal findings is harder to quantify. In addition, little is known about the removal costs. In the next step, we analyze the costs and benefits of finding removal in more depth.

1.2.2 Analyzing Costs and Benefits of Finding Removal

To prioritize the large number of findings, we need a better understanding of the cost-benefit ratio of their removal. We provide a conceptual cost model that formalizes the costs to remove a finding as well as the associated benefit. We enrich the model with industrial evidence which we gathered in empirical studies of one Java system in the insurance domain. We use its issue tracking system to analyze costs and its version control system to approximate benefit. The empirical evidence allows us to base the model on realistic assumptions.

Results For external findings, the benefit of their removal is quite obvious, i. e. a potential system failure is eliminated. On the cost side, the model reveals that external findings are usually very easy to remove. As external findings reveal an immediate reward upon removal, our approach prioritizes them differently than internal findings: External findings should be removed as soon as possible to reduce costly system failures. The small number of external findings and their low removal costs make this prioritization feasible.

For internal findings, the model shows that removing them is beneficial only if the corresponding code is maintained again. Contrary to commonly made assumptions [67], our empirical case study reveals that static analyses are of limited use for predicting future maintenance. Future changes are usually driven by (unknown) changing requirements and evolving market situations. We assume that existing software artifacts generally cannot provide sufficient information for an automated maintenance prediction.

Additionally, internal findings create higher removal costs than external findings. As they are also much larger in number, only a small subset can be removed within one development iteration. While we cannot reliably prioritize based on the benefit of their removal, we can still prioritize based on their removal costs. Low removal costs reduce the inhibition thresholds of developers and motivate them to remove internal findings despite the missing immediate reward. The model shows that removing internal findings is less costly when aligned with ongoing development: If a finding is removed in code that has to be changed anyway, overhead in code understanding and testing can be saved—performing a change request already comprises both.
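To make this reasoning concrete, consider a deliberately simplified sketch (an illustration only, not the model of Chapter 5; the symbols below are assumptions introduced here): let $c_r$ be the cost of removing an internal finding, $n$ the number of future changes to the code affected by the finding, and $s$ the overhead saved per such change through easier code understanding and testing. Removal then pays off roughly if

    n \cdot s > c_r

and aligning the removal with an ongoing change lowers $c_r$ itself, because code understanding and re-testing are already part of performing that change.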


Implications The considerations from our cost model have two implications (see Figure 1.1). First, to prioritize external over internal findings, we need to accurately differentiate between the two. This differentiation is targeted by our next step in Section 1.2.3. Second, to align the removal of internal findings with software changes, we have to better understand how software is actually changed. For that matter, the software's repository provides insights about its change history. However, history as recorded in version control systems is often not complete, because class or method refactorings are not recorded [83, 89]. Hence, the history of a class or method before the refactoring is often lost. Incomplete histories of code entities result in inaccurate analysis results. To accurately analyze change evolution, we first need an approach to reverse engineer a complete history. We target the reverse engineering of history in Subsection 1.2.4.

1.2.3 Creating a Strategy to Separate External from Internal Findings

Based on the results from our cost model, our approach prioritizes the removal of external findings over the removal of internal findings. Hence, we need to accurately differentiate between the two. Sometimes, differentiating between external and internal findings is trivial: As, e. g., a missing comment cannot affect the execution of the program, it also cannot lead to user-visible failure. Thus, it represents an internal finding. A null pointer exception, in contrast, is an external finding as it likely represents a programming fault which can lead to a system failure. For other findings, however, this differentiation is more complex. For example, code clones can be internal findings as they increase the change overhead for maintenance. But they can also be external findings as they are error-prone if changed in an unintentionally inconsistent way [49, 68].

Results In this thesis, we suggest an approach to differentiate between external and internal findings. In particular, for the complex example of code clones, we provide a classification algorithm that separates external from internal clone findings. In a case study, we reuse an existing industrial data set containing clones that were manually labeled by developers as unintentionally inconsistent or intentionally inconsistent [49]. Based on this data set, we evaluate the precision of our classification algorithm.
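The following Java sketch only illustrates the general idea of such a classification; the features, names, and threshold are assumptions chosen for illustration and are not the actual features or algorithm of Paper C:

    /** Illustrative sketch: classify an inconsistent clone pair as external or internal finding. */
    class CloneClassificationSketch {

        /** Assumed example features describing the inconsistency between two clone instances
         *  (cloneLengthInStatements is expected to be greater than zero). */
        record CloneFeatures(int cloneLengthInStatements,
                             int differingStatements,
                             boolean differenceTouchesCondition) {}

        enum Classification { EXTERNAL, INTERNAL }

        /** Heuristic: small deviations that touch a condition look unintentional and risky
         *  (external finding); larger, systematic deviations look intentional (internal finding). */
        static Classification classify(CloneFeatures features) {
            double relativeDifference =
                    (double) features.differingStatements() / features.cloneLengthInStatements();
            if (relativeDifference < 0.1 && features.differenceTouchesCondition()) {
                return Classification.EXTERNAL; // likely unintentionally inconsistent clone
            }
            return Classification.INTERNAL;
        }
    }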

Implications The differentiation allows us to separate external from internal findings and prioritize them differently: External findings are selected for immediate removal. As a next step, we target the remaining internal findings. Based on the results from the cost model (see Section 1.2.2), we strive to align the prioritization of internal findings with ongoing software changes and, hence, need a better understanding of how software is changed.

1.2.4 Creating Technical Foundation

In order to perform an accurate analysis of software changes, we first need to obtain a complete software history even if code artifacts were refactored or moved. In this thesis, we provide an origin analysis which reconstructs the change history on file and method level.


Results and Implications The origin analysis detects refactorings and moves of files and methods and, hence, allows us to track the complete history. We evaluate the results within a case study of seven open source systems in the Java domain. The recall is evaluated automatically based on information from version control systems. To evaluate precision, we rely on developer interviews. Based on the origin analysis, we can now perform an accurate, fine-grained study of how software systems are changed.
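To illustrate the core idea on file level (a minimal sketch under simplifying assumptions, not the algorithm of Paper A; the method names and the similarity threshold are invented), a moved or renamed file can be matched to a file deleted in the same commit by content similarity:

    import java.util.*;

    /** Minimal sketch: recover file moves/renames within one commit by content similarity. */
    class OriginAnalysisSketch {

        /** Maps each added file (path -> content) to the deleted file it most likely originates from. */
        static Map<String, String> matchMoves(Map<String, String> addedFiles,
                                              Map<String, String> deletedFiles,
                                              double similarityThreshold) {
            Map<String, String> origins = new HashMap<>();
            for (Map.Entry<String, String> added : addedFiles.entrySet()) {
                String bestMatch = null;
                double bestSimilarity = 0.0;
                for (Map.Entry<String, String> deleted : deletedFiles.entrySet()) {
                    double similarity = lineSimilarity(added.getValue(), deleted.getValue());
                    if (similarity > bestSimilarity) {
                        bestSimilarity = similarity;
                        bestMatch = deleted.getKey();
                    }
                }
                if (bestMatch != null && bestSimilarity >= similarityThreshold) {
                    // Interpret the add/delete pair as a move and keep the file's history.
                    origins.put(added.getKey(), bestMatch);
                }
            }
            return origins;
        }

        /** Jaccard similarity over the sets of source lines of two file contents. */
        static double lineSimilarity(String a, String b) {
            Set<String> linesA = new HashSet<>(Arrays.asList(a.split("\n")));
            Set<String> linesB = new HashSet<>(Arrays.asList(b.split("\n")));
            Set<String> intersection = new HashSet<>(linesA);
            intersection.retainAll(linesB);
            Set<String> union = new HashSet<>(linesA);
            union.addAll(linesB);
            return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
        }
    }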

1.2.5 Understanding Software Change

The origin analysis allows us to gain deeper insights about how software systems evolve by conducting a case study of ten open source and industry systems in the Java domain. From the results, we strive to derive a cost-based prioritization for internal findings.

Results The study reveals that software systems are changed primarily by adding new methods, not through modification of existing methods. In fact, on average, half of the existing methods are not changed again after their initial commit.

Implications Having obtained a deeper understanding of how systems change, we address the original goal to prioritize the large number of internal findings (Section 1.2.2). Our next step describes how we derive a prioritization from the results obtained in this study.

1.2.6 Deriving a Change-Based Prioritization for Internal Findings

Combining the cost model with the results from the change study, we derive a prioritization of the large number of internal findings.

Results The cost model already showed that aligning finding removal with ongoing changes reduces the removal costs. As software grows through new methods, we prioritize internal findings first in newly added code and, second, in recently modified code. As many existing methods remain unchanged, this prioritization, in fact, reduces the number of internal findings significantly. We refer to this prioritization as change-based prioritization, and a minimal sketch of such a filter is shown below.
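The following Java sketch illustrates the change-based filter in principle (the data types are invented for illustration; the actual recommendation mechanism is described in Chapter 6):

    import java.util.ArrayList;
    import java.util.List;

    /** Minimal sketch of a change-based prioritization of internal findings. */
    class ChangeBasedPrioritizationSketch {

        /** State of the code region containing a finding, relative to a chosen baseline. */
        enum CodeState { NEW, MODIFIED, UNCHANGED }

        record InternalFinding(String message, String location, CodeState enclosingCodeState) {}

        /** Recommends findings in newly added code first, then findings in recently modified code;
         *  findings in unchanged code are deferred. */
        static List<InternalFinding> recommend(List<InternalFinding> findings) {
            List<InternalFinding> recommendation = new ArrayList<>();
            for (InternalFinding finding : findings) {
                if (finding.enclosingCodeState() == CodeState.NEW) {
                    recommendation.add(finding);
                }
            }
            for (InternalFinding finding : findings) {
                if (finding.enclosingCodeState() == CodeState.MODIFIED) {
                    recommendation.add(finding);
                }
            }
            return recommendation;
        }
    }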

For highly dynamic systems, i. e. systems that are modified and extended extensively, the number of findings in new code might still be too large, however. In this case, further prioritization becomes necessary. We suggest selecting only those findings that have the lowest removal costs. Therefore, we provide an approach to detect findings that can be removed with a very fast and, hence, cheap refactoring. We provide a case study on two industrial Java systems for the exemplary findings of code clones and long methods. We show how fast-to-refactor findings can be detected, and evaluate their acceptance for removal amongst developers based on individual interviews.


Implications Based on all intermediate results, we obtain a benefit-based strategy for external findings and a cost-based strategy for internal findings. It remains to combine both in an applicable approach that developers can apply effectively during development.

1.2.7 Integrating Strategies to Prioritization Approach

Combining the previous results, we propose a prioritization approach to address the current challenges and to make static analysis successfully applicable in practice. The approach consists of two steps—an automated recommendation and a manual inspection of the findings. Both steps are applied iteratively during development.

Results To reduce the large number of findings, an automated prioritization mechanism recommends a subset of the findings for removal (addressing Challenge Number of Findings). To maximize benefit, external findings should be removed immediately in each iteration (Challenge Benefit of Removal). As they are usually small in number and easy to fix, their removal is feasible during an iteration. Internal findings are prioritized based on their removal costs: our approach recommends findings only in newly added or recently changed code. This change-based prioritization saves overhead in code understanding and testing. For highly dynamic systems, our approach focuses only on the subset of findings that can be removed with a fast refactoring and, hence, reveal the lowest removal costs.

After the recommendation, a manual inspection accepts (or rejects) the recommended findings for removal: The manual inspection can eliminate false positives and non-relevant findings (Challenge Relevance of Findings). As new findings can be introduced while existing findings are removed or while the system is further developed, both automated recommendation and manual inspection are to be applied iteratively during development. Hence, the approach takes the ongoing development and maintenance of the system into account (Challenge Evolution of Findings). Also, it can be flexibly adjusted to changing budget, team transitions, or developer fluctuation, or extended with additional analyses.

Implications Our approach allows developers to effectively select a subset of findings for removal. However, if management prioritizes new change requests over quality improvement, findings can still barely be removed due to missing resources. To lead to measurable success, the approach requires the collaboration of managers and developers.

1.2.8 Evaluating Process Support

To enable a collaboration of management and development, a process is required that meets the needs of both parties: developers require resources to remove findings, and management requires transparency about the resource usage as well as its impact (Challenge Management Priority). To address both, we suggest a continuous quality control process which embeds the change-based prioritization of internal findings.


Results We evaluate the process with a longitudinal case study with one of our industrial partners: We use quantitative measures to reveal its success in terms of a long-term declining number of findings for several systems. Qualitatively, we rely on the management decision of the study object to extend the process to most of its development teams.

Implications The longitudinal study shows that our prioritization approach and its quality control process require a substantial amount of tool support. As a final step, we evaluate the needs developers have with respect to tool support.

1.2.9 Evaluating Tooling Support

From the quality control process, we derive numerous requirements for static analysis tools to be able to support the process. As one example, our change-based prioritization requires the differentiation between findings in new or modified code and old findings. Many existing tools analyze only a snapshot of a system. Thus, they are not designed to provide this distinction. Teamscale, a tool developed by our company, however, analyzes the full history and tracks findings through all commits. Hence, Teamscale can accurately differentiate between newly added and old findings.
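The following sketch illustrates the required distinction in principle (this is not Teamscale's actual implementation; the fingerprint-based tracking shown here is a simplifying assumption):

    import java.util.*;

    /** Illustrative sketch: mark findings of the current commit as new or old by comparing
     *  them with the findings tracked in the previous commit. */
    class FindingTrackingSketch {

        record TrackedFinding(String category, String fingerprint, boolean isNew) {}

        /** A finding is considered old if a finding with the same fingerprint (e. g. a location
         *  normalized across moves and renames) already existed in the previous commit. */
        static List<TrackedFinding> track(Set<String> previousFingerprints,
                                          Map<String, String> currentFindingsByFingerprint) {
            List<TrackedFinding> result = new ArrayList<>();
            for (Map.Entry<String, String> finding : currentFindingsByFingerprint.entrySet()) {
                boolean isNew = !previousFingerprints.contains(finding.getKey());
                result.add(new TrackedFinding(finding.getValue(), finding.getKey(), isNew));
            }
            return result;
        }
    }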

Results We provide an evaluation of Teamscale and its suitability for a quality control process. The evaluation is based on a guided developer survey. To familiarize the developers with the tool, we design several tasks which they complete with the help of the tool. We evaluate the correct completion of the tasks and, finally, conduct an opinion poll.

1.2.10 Summary

Based on a quantitative study of the state-of-the-practice, a cost-benefit model, and a fine-grained analysis of software evolution, we derive a prioritization approach. The approach helps developers to effectively select findings for removal. Combined with a quality control process and supported by adequate tools, this approach enables the cost-effective usage of static analysis in practice. To obtain the necessary intermediate results and the final evaluation, we use different methodologies, comprising empirical studies, open-source and industrial case studies, developer interviews, as well as developer surveys.


1.3 Contributions

This thesis includes several contributions to both industry and academia. In the following, we list each contribution and denote the corresponding paper when published.

Quantitative Study of Static Analysis Findings. With a large-scale quantitative study on open-source and industrial systems, we show how many findings are revealed by commonly used static analyses. Future researchers can refer to this study to describe current state-of-the-practice of static analysis findings in industry.

Cost Model for Finding Removal. In a conceptual cost model, we formalize costs and benefits associated with finding removal. The instantiation based on industrial evidence is a first step towards a return-on-investment calculation for project management in industry. The cost model further contributes to current research as it evaluates state-of-the-art assumptions made when justifying static analysis as a quality assurance technique. It reveals weak assumptions that require future work.

Origin Analysis for Code Artifacts. We provide an origin analysis which enables an accurate and fine-grained analysis of software histories (Paper A). Our evaluation shows that our origin analysis is highly accurate. The origin analysis is essential for other researchers mining software histories who aim for accurate results. Further, it supports developers, as its application in quality analysis tools helps to obtain complete histories for code reviews, metric trend inspections, or change logs.

Fine-grained Study of Software Evolution. Based on an open-source and industrial case study, we provide a theory of how software systems are changed and how they grow (Paper B). The study shows that software systems grow primarily through adding new methods and not through modification of existing methods. These results can serve as a foundation for other researchers building a theory of software evolution.

Prioritization Approach for Static Analysis Findings. As the main contribution of this thesis, we provide a prioritization approach. The approach addresses the current challenges in practice and helps development teams to effectively use static analysis as a quality assurance technique. As specific instantiations, it comprises a classification of inconsistent clones (Paper C) and also a detection approach for fast refactoring opportunities for code clones and long methods (Paper D).

Quality Control Process and Tool Support. Finally, we show how the prioritization can be successfully applied within a quality control process (Paper E). Our evaluation serves as a foundation to convince project management of the benefits of using static analysis as a quality assurance technique: It shows that teams succeed in reducing the number of findings in the long term even while the systems keep growing in size. Additionally, it reveals a management decision of one of our customers to extend the process to most of its development teams as they are convinced by its benefits. Beyond the process aspects, we further show which tool support developers require to apply the process successfully (Paper F).


1.4 Outline

The remainder of this thesis is organized as follows: Chapter 2 contains the terms and definitions for this work. Chapter 3 introduces the fundamentals of this thesis, i. e. the underlying models of software maintenance and software quality as well as an overview of the current application and limitations of static analysis as a quality assurance technique. Chapter 4 presents the quantitative study of static analysis results in industry. Chapter 5 contains our cost model for finding removal and its instantiation. Chapter 6 describes the iterative approach for finding prioritization. The remaining Chapter 7 contains the publications. Papers A–F are briefly summarized with one page each to show their role in the overall approach for finding prioritization. Related work for this thesis can be found in the papers and partly in Chapters 3 and 5. Chapter 8 concludes this work and presents an outlook on future work.


2 Terms and Definitions

This chapter presents the underlying terms and definitions used throughout this work.

2.1 Analysis Terms

When referring to static analyses as a quality assurance technique, we use the following terms:

Static Analysis Static analysis is "the process of evaluating a system or component based on its form, structure, content, or documentation" [2]. Static analysis does not require the actual execution of the system. Instead, static analysis operates on the code of the system, for example in the form of source code or object code, i. e. byte code in the case of Java. Static analyses comprise fully automated analyses (supported by tools) and manual static analyses (carried out by a human, e. g. code reviews). In this thesis, we generally use the term static analysis to refer to fully automated static analysis. Otherwise, we explicitly use the term manual static analysis. Examples of static analyses comprise bug pattern detection, redundancy detection, structure analysis, or comment analysis.

Precision Many (automated static) code analyses come along with a certain false positive rate (see Section 2.2) and are, hence, not 100% precise: In the context of this work, we define precision as the number of true positives over the sum of true and false positives.
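Expressed as a formula, with TP and FP denoting the sets of true positive and false positive findings reported by an analysis:

    \text{precision} = \frac{|TP|}{|TP| + |FP|}

For example, an analysis reporting 80 relevant findings and 20 false positives has a precision of 0.8.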

2.2 Finding Terms

The following terms are used when referring to findings as a result of a static analysis:

Finding Category One single static analysis is associated with potentially multiple finding categories. A finding category is used to group the different results of one analysis. For example, the redundancy detection can result in exact copies of code fragments (type I clones), syntactically identical copies (type II clones), or inconsistent copies (type III clones) [57]. Hence, depending on its configuration, the analysis can produce findings in three different categories. Another example is the structure analysis that can reveal overly long files, overly long methods, or deep nesting as three different categories.
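For illustration, the following hypothetical Java methods show what findings in the three redundancy categories can look like (the identifiers and the Item type are invented):

    import java.util.List;

    /** Hypothetical snippets illustrating type I, II, and III clones. */
    class CloneTypeExamples {

        record Item(int price, boolean onSale) {}

        // Original fragment.
        static int totalPrice(List<Item> items) {
            int total = 0;
            for (Item item : items) { total += item.price(); }
            return total;
        }

        // Type I clone: an exact copy of the original (apart from layout and comments).
        static int totalPriceCopy(List<Item> items) {
            int total = 0;
            for (Item item : items) { total += item.price(); }
            return total;
        }

        // Type II clone: syntactically identical, but identifiers are renamed.
        static int orderSum(List<Item> articles) {
            int sum = 0;
            for (Item article : articles) { sum += article.price(); }
            return sum;
        }

        // Type III clone: an inconsistent copy; a statement was added.
        static int regularOrderSum(List<Item> articles) {
            int sum = 0;
            for (Item article : articles) {
                if (article.onSale()) continue;
                sum += article.price();
            }
            return sum;
        }
    }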


Finding A finding is a specific result instance of one static analysis in one finding category: Findings represent concrete locations in the code that hamper quality, i. e. that are likely to increase non-conformance costs [40, 64]. The scope of a finding can range from a single code line (e. g. a missing comment) over a complete class (e. g. an overly long class) to multiple classes (e. g. code snippets duplicated between several classes).

False Positive We define a finding to be a false positive if it represents a technical analysis mistake—a false positive is a problem of the analysis itself. An example is a data flow analysis that cannot parse customized assertions: For example, developers use customized asserts to ensure that variables are not null. The assert will handle the null case by, for example, throwing a customized exception. After the assert, the variable can be dereferenced. However, if the data flow analysis cannot parse the assert, it will mark the subsequent dereference as a possible null pointer dereference. This is a false positive, as it is asserted that the variable is never null.
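The situation described above can be illustrated with the following hypothetical Java fragment (the assert utility and the analysis behaviour are invented for illustration):

    /** Hypothetical example of a false positive caused by a customized assertion. */
    class CustomAssertExample {

        /** Customized assert: after this call, 'value' is guaranteed to be non-null. */
        static void assertNotNull(Object value) {
            if (value == null) {
                throw new IllegalStateException("value must not be null");
            }
        }

        static int nameLength(String name) {
            assertNotNull(name);
            // A data flow analysis that cannot interpret assertNotNull may flag the following
            // dereference as a possible null pointer dereference. This is a false positive,
            // because the customized assert guarantees that 'name' is not null at this point.
            return name.length();
        }
    }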

Non-Relevant Finding Even if a finding is a true positive, in a few cases it might not be relevant for removal. In contrast to false positives, we define a non-relevant finding not to be a problem of the analysis, but of missing context information—information that is not available in the code itself. An example can be duplicated code in a system which stems from two duplicated, yet different specifications [86]: Technically, the code in the system is indeed a code clone. However, the duplication cannot be resolved by removing the duplicated parts, because each part serves its own specification. Even though both specifications use the same identifier names by chance, their meanings can be different. As both specifications change and can diverge in the future, developers consider this clone not relevant for removal.

External Finding An external finding is a finding representing a high risk of leading to a system failure observable by a user.

Internal Finding Internal findings, in contrast, pose a risk for the internal quality of the system, e. g. its maintainability. These findings cannot be noticed by the user of the system.

Finding Removal We refer to finding removal when a finding is removed from the code base—for example, with an appropriate code refactoring, a programming fault removal, or a fix of a security vulnerability.


2.3 System Terms

In general, we use the following definitions when we describe a software system:

Code Type Different code serves different purposes. In this thesis, we differentiate between four types of code:

Application code is code that is manually maintained and deployed with the application in production.

Test code is code used for testing the application code. It is usually not deployed with the application.

Generated code is code which is not manually written but automatically generated.

Application, test, and generated code constitute a partition of the code base. Within these, we further classify unmaintained code:

Unmaintained code is dead code that could be deleted without changing the behavior of the system (if part of the application code) or without changing the tested functionality (if part of the test code). It comprises, for example, outdated experimental code, example code from external libraries, or outdated test cases.


"Quality is never an accident; it is always the result of intelligent effort."
John Ruskin

3 Fundamentals

This chapter provides the fundamentals for this thesis: It describes the underlying maintenance model (Section 3.1) as well as the underlying software quality model and introduces the concept of non-conformance costs as a measurement for quality (Section 3.2). It further gives an overview of existing quality assurance techniques (Section 3.3) and outlines the impact and limitations of automated static analysis as a quality assurance technique (Sections 3.4 and 3.5). Section 3.6 summarizes this chapter.

3.1 Activity-Based Maintenance Model

As the fundamental model for this thesis, we rely on an activity-based view of software maintenance. Based on the three maintenance characteristics as defined by Boehm [13]—modifiability, testability, and understandability—we define maintenance to consist of three main activities [24, 27]:

Definition 1 (Software Maintenance) Software maintenance comprises three main ac- tivities: code reading, code modification, and code testing.

This definition is still very coarse and has already been refined in related work. For example, Mayrhauser refines the activities based on adaptive, perfective, and corrective maintenance tasks [98]. Depending on the task category, code modification, for example, is subdivided into code changes, code additions, or code repairs. Additionally, Bennett and also Yau specified the change mini-cycle [10, 107] to consist of planning phase, program comprehension, change impact analysis, change implementation, restructuring for change, change propagation, verification and validation, and re-documentation. For the purpose of this work, however, we can rely on the most simple definition because it suffices for our context of finding removal (for the definition, see Chapter 2). To remove most static analysis results, the change is rather small and usually affects only a few files. Activities such as the request for change, planning phase, change impact analysis, change propagation, or re-documentation on a system level can be neglected for such small changes.

3.2 Software Product Quality Model

During ongoing maintenance, continuous change can impact the quality of software. Often, quality gradually decays if no effective counter measures are taken [31, 55, 61, 74]. Independent from software, quality in general "is a complex and multifaceted concept" as recognized by Garvin already in the 1980s [36]. Specifically for software products, quality has a multitude of different facets.

3.2.1 ISO Definition

In this thesis, we refer to the ISO 25010 Standard [44] (see Figure 3.1):

Definition 2 (Software Product Quality) Software product quality can be described with eight characteristics—functional suitability, performance efficiency, compatibility, us- ability, reliability, security, maintainability, and portability.

We chose this quality model as it is widespread and generally accepted: Even though it is often criticized for its lack of measurability and, hence, limited applicability in practice [50, 91, 101], it outlines the diversity of software product quality and provides a suitable classification for state-of-the-art static analyses (see Section 3.3).

In the quality model, functional suitability describes whether the software provides all of its specified functionality and whether it provides it correctly. Performance efficiency covers how efficiently the software performs given its resources. Compatibility describes the degree to which the system can exchange information with other systems using the same hardware or software environment. Usability refers to the user experience (effectiveness, efficiency, satisfaction). Reliability comprises the availability of the software, its fault tolerance, or its recoverability, for example. Security covers aspects of data protection by the system such that only authorized persons can access the data. Maintainability describes the effectiveness and efficiency with which the system can be modified. Portability examines how well the system can be transferred from one hardware or software platform to another.

[Figure 3.1 depicts the eight ISO 25010 characteristics of software product quality (functional suitability, performance efficiency, compatibility, usability, reliability, security, maintainability, and portability) together with their sub-characteristics, e. g. functional correctness, time behaviour, confidentiality, modularity, analysability, testability, or adaptability.]

Figure 3.1: Characteristics of Software Quality (ISO 25010)


3.2.2 Non-Conformance as Quality Measurement

In all business areas, achieving high quality products comes at a certain cost. Independent from software, Crosby defines quality as conformance to requirements and established the price of non-conformance as the measurement of quality [21]. Furthermore, Gryna defines conformance and non-conformance costs [51]: Whereas the cost of conformance is the amount spent to achieve high quality, non-conformance costs include all expenses resulting from insufficient quality.

While the original idea of non-conformance stems from manufacturing [51], it has been transferred and applied within software engineering as well [80, 82, 99]. In particular in activity-based software quality models, conformance and non-conformance costs impact the costs for software maintenance—low software product quality increases software maintenance costs [14, 24, 55]. For example, non-conformance costs comprise retesting after a software failure occurred in production or restructuring the code to be able to implement new change requests. In contrast, conformance costs include expenses to train developers in new design or process methodologies or testing before deployment.

In contrast to manufacturing, however, long-lived software products are continuously adapted after their initial development. In traditional manufacturing, once a product is sold, only its original functionality is maintained: A fridge, for example, will be repaired during its warranty time. However, it will not be extended to also include a freezer if it did not do so when it was sold. For software, requirements are quite different: Long-lived software is expected to be extended with new features and adapted to changing requirements as well as to changing hardware and technology.

As long-lived software is often maintained as a service, the differentiation between conformance and non-conformance costs becomes more difficult: When software reveals quality deficits after its release, the development team takes measures to redeem them, i. e. to test more, to restructure the code, or to refine the architecture. Based on Gryna's definition, these activities could be classified either as conformance costs—they are used to achieve high quality—or as non-conformance costs—they are necessary due to a lack of quality. For the purpose of this thesis, we refine Gryna's definition [99]:

Definition 3 (Conformance Costs) The cost of conformance is the amount spent to prevent quality deficits of a software release before it is deployed in production.

Conformance costs include all expenses prior to deployment that do not directly affect the code itself, for example training the developers, establishing adequate tool infrastructure, or creating coding conventions. Additionally, conformance costs comprise activities that are performed on the source code before it is released, e. g. testing before deployment.

Definition 4 (Non-Conformance Costs) Costs of non-conformance arise after a software release was deployed in production. They arise due to two different reasons: First, they can result directly from quality deficits of the software in production. These are called external non-conformance costs. Second, they can stem from measures to remove existing quality deficits. These are called internal non-conformance costs.

External non-conformance costs arise from software failures in production. They comprise, for example, costs due to software downtime, failure follow-up costs, or costs for loss of reputation. Internal non-conformance costs arise if a release has quality deficits that should be redeemed. For example, software that has lots of failures in production has to be tested more. Software that reveals higher-than-average costs to perform changes has to be refactored. A software architecture that does not allow adaptation to new technology or underlying hardware has to be refined. Based on our classification, these activities comprise internal non-conformance costs.

3.3 Quality Assurance Techniques

With source code being the essential core of a software product, its quality significantly impacts the overall quality [8, 13, 14, 28]. To ensure high code quality over time and reduce non-conformance costs in the overall life cycle, there exists a large variety of quality assurance activities. Quality assurance activities can be subdivided into analytical and constructive measures [75] (see Figure 3.2): Whereas constructive measures help to achieve high quality (e. g. educating staff, providing appropriate tools, designing flexible architectures), analytical measures analyze the existing quality of the product (e. g. testing or reviews). While constructive quality assurance creates conformance costs, analytical measures can create both conformance and non-conformance costs. On the one hand, analytical measures are used to achieve a software product of high quality. Hence, they create conformance costs (testing before deployment, for example). On the other hand, if analytical measures are used to redeem existing quality defects, they address the internal non-conformance of the system (for example, re-testing after fixing a field bug). In highly agile software development, the line between using analytical measures to achieve high quality (conformance) and using them to analyze insufficient quality (non-conformance) becomes very blurry.

[Figure 3.2 classifies quality assurance techniques into constructive measures (e. g. development processes, conventions, tools, training) and analytical measures, the latter comprising dynamic analyses (test, profiling, simulation) and static analyses, which are either manual (audits, reviews) or automated (static analysis tools, formal verification).]

Figure 3.2: Software Quality Assurance Techniques [75]

For example, in the SCRUM development approach, each sprint—usually comprising two or four weeks of development—targets the delivery of a product increment. Testing during a sprint can target both the quality assurance of the next product increment as well as fixing existing defects created during the last sprint. For this thesis, we classify the costs for analytical measures as internal non-conformance costs if they arise for code after it has been released and deployed in production.

3.3.1 Analytical Quality Assurance

Based on our years of industry experience as software quality consultants, we consider that the usage of analytical quality assurance still offers considerable room for improvement in practice: We have been observing this based on anecdotal evidence and numerous interviews in customer projects throughout different domains, for example, insurance, energy, automotive, or avionics. Yet, this claim is certainly not generalizable to software development in practice as a whole. Nevertheless, to improve quality assurance in our consultancy domain, we focus only on analytical quality assurance in this thesis.

Analytical measures comprise dynamic and static analyses, see Figure 3.2. Both are applied after the code is written. Dynamic analyses require the execution of the system, e. g. various forms of testing, profiling, or simulation.1 Static analyses comprise both manual analyses (code reviews, code audits, walkthroughs) and automated analyses (code analysis or code verification). As opposed to dynamic analyses, static analyses analyze the code without running it. Hence, they can be used before a system is tested [6]. Furthermore, they mostly do not require the complete code base but can also be used on a subset of the code.

The different analytical measures target different aspects of the eight software quality characteristics (see Figure 3.3). Nevertheless, they all aim to reduce the non-conformance costs of the system. Whereas testing, for example, is mostly used to ensure functional suitability, it can also address to some degree reliability (e. g. recovery tests), efficiency (performance tests), usability (usability tests), security (penetration tests), compatibility (compatibility tests), or portability (installation tests). However, testing is less used to address the maintainability of the system, i. e. its modifiability or its analyzability. Automated static analyses, in contrast, are mostly used to analyze the maintainability of the system [3, 29, 58, 64, 81, 99], followed by the functional suitability [6, 23, 52, 53, 92, 102, 108]. Additionally, some automated static analyses also address aspects of security, usability, efficiency, and reliability, and few address portability or compatibility [19, 32, 35, 58, 69, 81, 108].

3.3.2 Dynamic vs. Automated Static Analysis

Even though static and dynamic analyses primarily focus on different ISO quality criteria, they overlap in some. However, even when they overlap in one criterion, they address different aspects of it: For example, both static and dynamic analyses address functional suitability. Nevertheless, they find different types of programming faults [69,100]:

1 We generally refer to testing as the execution of the software with the goal of verifying a certain property of the software. In contrast, profiling only monitors the execution of the software and provides dynamic usage data for later inspection.


[Figure: mapping of dynamic and static analytical quality assurance techniques to the eight ISO software product quality characteristics]

Figure 3.3: Targets of Quality Assurance Techniques

Whereas automated static analyses can find low-level programming errors, e. g. buffer overflows or null pointer dereferences [70,99], testing can reveal deep functional or design errors [69]. Static analyses can find programming errors even on paths that are not executed during a test. Dynamic analyses, in contrast, can find defects that only occur for a certain input. In addition, static analyses can be applied even before the system is tested [6,32]. They can reveal faults directly during development. Hence, many companies try to perform a combination of both [108]. However, even in combination, static and dynamic analyses often cannot assure functional suitability completely, as field bugs still occur after the system is deployed in production.

As another example, both dynamic and static analyses analyze maintainability; yet, they analyze different aspects: Dynamic analyses—profiling usage data, for example—can detect unused code, i. e. code that is unnecessary and, hence, never used during program execution. This code creates unnecessary maintenance costs [30]. As detecting unused code requires the execution of the program, static analyses are not capable of analyzing this. Conversely, static analyses address other aspects of maintainability, for example, higher maintenance costs due to code duplication [47]. Detecting duplicated code cannot be done dynamically but requires the analysis of the code itself—hence, it can only be done statically. Just as they cannot fully address functional suitability, static analyses cannot completely quantify all aspects of software maintenance either. Instead, they can only provide some insights. We will discuss this limitation in Section 3.5.


Compared to dynamic analyses and manual static approaches, automated static analyses have several advantages that make them a vital part of the available quality assurance techniques. Dynamic analyses such as testing or security attack simulations can be quite expensive [99]. Static analyses, in contrast, can be applied even before the system is tested [6] and do not depend on the runtime environment, surrounding systems, or underlying hardware. Hence, they are easier to perform than dynamic analyses [60,96]. Compared to fully manual static approaches such as code reviews, automated static analyses provide more tool support and are, hence, cheaper to perform [100]. Also, they can be used as an aid to enhance manual review processes [73]. As many static analysis tools are open source, companies can use them at very little conformance cost.

3.3.3 Automated Static Analysis Examples

In the following, we describe a list of exemplary static analyses to assess code quality that we will refer to in this thesis. However, due to the large number of available static analyses, an exemplary list will never be complete and is also subjective [95]. Yet, the core idea of the approach presented in this thesis (Chapter 6) is independent of the selected static analyses. Hence, we believe that the subjective selection of static analyses does not reduce the contribution of this thesis. We selected our exemplary analyses based on our three-year industrial experience as software quality consultants, as we perceive them to have gained acceptance in industry as well as in academia [8,29,66,85,94,95,103–105]. Even though we have anecdotal evidence that these analyses are a useful quality assurance, they can, of course, still be questioned and debated. However, based on the current state of the art, we also have no evidence that better analyses are available.

In the remainder of this thesis, we will refer to the following examples of static analyses. In this section, we only describe each analysis at an abstract level. For the specific configuration of each analysis, please refer to Chapter 4. For each analysis, we indicate which quality aspect it addresses based on the ISO definition and whether it reveals external or internal findings (see Terms & Definitions in Chapter 2).

Clone Detection (Redundancy Assessment) Detecting code clones, i. e. duplicated pieces of source code, is a measure of code redundancy within a system. Depending on the configuration, there are three different types of clones—exact (identical) copies of code fragments (type I clones), syntactically identical copies (type II clones), or inconsistent copies (type III clones) [57]. All three types hamper maintainability and, hence, reveal internal findings: As changes have to be propagated to all duplicated pieces, clones increase the effort to perform a change request. Hence, they result in non-conformance costs from internal failure. In fact, research has shown that every second inconsistent clone class represents a fault in the system [49]. In particular, inconsistent clones do not only hamper maintainability, but are also error-prone if the inconsistency was unintentional. Hence, they also affect the functional correctness of the system. In this case, inconsistent clones can be external findings as well. For a classification between external and internal, please refer to Chapter 6.
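To make the clone types concrete, the following hypothetical Java fragment sketches a type-II clone: both methods share the same syntactic structure, and only identifiers differ. All names are purely illustrative.

class PriceCalculator {
    double customerDiscount(double amount, double rate) {
        if (rate < 0.0 || rate > 1.0) {
            throw new IllegalArgumentException("invalid rate");
        }
        return amount - amount * rate;
    }

    // Copied and pasted from customerDiscount with renamed identifiers.
    // Any fix to the range check has to be propagated to both copies;
    // changing only one of them would turn this into an inconsistent
    // (type III) clone.
    double partnerRebate(double total, double rebate) {
        if (rebate < 0.0 || rebate > 1.0) {
            throw new IllegalArgumentException("invalid rebate");
        }
        return total - total * rebate;
    }
}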


Long File Detection (Structure Assessment) As one structural metric, detecting overly long files reveals internal findings that hamper understandability and changeability of the code: If a file is overly long, it is harder to understand and developers need to take a lot of context information into account when trying to find a specific location for a change. Hence, overly long files are likely to increase internal non-conformance costs.

Long Method Detection (Structure Assessment) Besides the detection of long files, we use a second structural metric that detects overly long methods. Overly long methods are harder to understand, more difficult to test, and lead to coarser profiling information than short methods. Similar to long files, overly long methods hamper maintainability and are, hence, internal findings.

Nesting Depth (Structure Assessment) The nesting depth denotes the number of open scopes for each line of code. In Java, for example, it corresponds to the number of open (i. e. unclosed) curly braces. Deep nesting is classified as an internal finding because it hampers readability and understandability of the code: If a developer has to change a deeply nested line of code, he needs to take all information from the open scopes into account to understand which conditions hold for the line to be changed. Hence, deeply nested code hinders maintainability, and nesting findings are internal findings as well.
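As a hypothetical Java illustration, the print statement below lies within three open scopes counted from method level; an analysis using a threshold of, say, three open scopes would therefore flag this line. The class and its logic are invented for illustration only.

class OrderValidator {
    void validate(java.util.List<java.util.List<String>> orders) {
        for (java.util.List<String> order : orders) {      // open scope 1
            if (!order.isEmpty()) {                         // open scope 2
                for (String item : order) {                 // open scope 3
                    // This line has nesting depth 3: to understand it, all
                    // enclosing loop and branch conditions must be kept in mind.
                    System.out.println("checking " + item);
                }
            }
        }
    }
}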

Comment Completeness (Comment Assessment) Comments in code are a main source of code documentation. Among other things, comments should document the interface of the code to be used by others. Quantitatively, they should document the interface completely, i. e. every public class and public method of the interface should have a comment. Missing comments are classified as internal findings. Qualitatively, comments should provide information beyond what is obvious from the code itself; in [88], a few static analyses are provided to partly assess comment quality. These analyses can reveal internal findings as well.

Bug Pattern (Functional Suitability Assessment) For several object-oriented programming languages, a variety of bug pattern detection tools exists: For Java programs, for example, Findbugs2 performs a data flow analysis on byte code and provides rules for the developers. Some rules are helpful to detect bug patterns that likely point to a programming fault [6,7,97]. As some of these programming faults can lead to null pointer exceptions that, in turn, might cause unintended behavior, these findings carry the risk of a system failure. Hence, they are external findings. Similar tools exist for other domains as well, such as Coverity3 for C and C++.
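The following hypothetical Java snippet sketches the kind of pattern such detectors report: on the path where the map does not contain the user, the returned value is null and the subsequent dereference throws a NullPointerException at runtime. Whether and how a concrete tool reports this depends on its rule set and configuration.

import java.util.Map;

class SessionLookup {
    // Hypothetical example of a bug pattern: get() may return null for an
    // unknown user, so the array access below can fail at runtime.
    int sessionCount(Map<String, int[]> sessionsByUser, String user) {
        int[] sessions = sessionsByUser.get(user);
        return sessions.length; // possible null pointer dereference
    }
}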

2 www.findbugs.sourceforge.net
3 www.coverity.com


The quality consultancy company CQSE has been evaluating the code quality of many of their customers in one-time assessments, called audits. Besides manual inspection, static analysis is the main assessment technique, with a focus on maintainability and functional suitability. During various audits, bug pattern detectors revealed programming faults (confirmed by the developers). After one such programming fault was revealed in the final audit presentation, the manager of a Finnish customer stated that the costs of the audit were already redeemed; had the programming fault led to a system failure in production, the follow-up costs for debugging, retesting, and redeployment or a hotfix would have been significantly higher than the costs for the code audit.

Case 3.1: Cost-Effective Bug Pattern Detection

3.4 Application of Automated Static Analysis

Even though automated static analyses are not capable of addressing the various quality aspects completely, they are still a powerful approach. Compared to other analytical measures, they have several benefits that make them a vital complement to the set of available quality assurance techniques. We demonstrate the application of static analyses with the help of several anecdotal industrial cases. Due to NDA restrictions, we present all cases anonymously.

3.4.1 Cost-Effective Prevention of System Failures

Many static analyses target the maintainability of the code [58]. However, they are also extremely useful to detect programming faults or security vulnerabilities in the system [69]—in particular, because other quality assurance techniques such as testing or security attack simulations can be quite expensive [60]. As many static analysis tools are open source, companies can use them at very little conformance cost. When static analyses reveal programming faults or security vulnerabilities, the savings in non-conformance costs often redeem the conformance costs very quickly, as the following three industrial cases (Cases 3.1, 3.2, and 3.3) show.

3.4.2 Fundamental Assurance for Maintainability

Beyond saving non-conformance costs by revealing programming faults or security vulnerabilities, static analyses are most often used to save non-conformance costs by assuring maintainable code [3,29,58,64,81,99]. A well-maintained code base often becomes crucial for long-lived software systems: it allows the code base to be adapted to changing requirements, evolving technologies, or new hardware. In the following, we discuss the importance of static analyses to assure maintainability based on two industrial cases, see Cases 3.4 and 3.5. The first one illustrates an insurance company


As part of the audits described in the previous industrial case, we use a clone analysis to assess the redundancy within a system. Redundant code created by copy&paste programming does not only create a change overhead, but is also prone to become unintentionally inconsistent [49]. In almost every audit, we detect programming faults by inspecting inconsistent clones and confirm them with the developers. One example comprises the audit of a German mobile security provider: We detected a programming fault because code containing String constants was copied and pasted, and in one instance the String constant had not been adapted. The manager of the system was enthusiastic that a fault was revealed within this audit because—according to him—it would have taken the team much longer to detect this fault with testing techniques. Hence, he was convinced of the cost-effectiveness of the static analyses used in the audit.

Case 3.2: Programming Faults Through Copy&Paste

In regular Java audits, we often assess the exception handling of a system, i. e. its ability to react to any kind of unexpected behavior at runtime. In particular, we use a static analysis that reveals code catching the general Java class Exception with an empty catch block. In this case, not only null pointer exceptions are caught, but also any error thrown when the Java runtime environment stops working. While it is already critical that a generic exception handling is used instead of a specific one, it is even more critical that the exception is not handled at all in the empty catch block. During an audit of a German insurance company, we found one such finding that had a direct impact on security: The affected code fragment checks whether a user has access rights for a specific insurance tariff. During this check, an exception can be thrown. If the exception is thrown, the empty catch block is executed. As the catch block suppresses the problem, the subsequent code is executed as if the user had the correct access rights. Hence, in case of the exception, the security policy is violated as a user obtains access rights that he should not have. This can—in the worst-case scenario—cause data losses or misuse and create extremely high non-conformance costs.

Case 3.3: Security Problems due to Exception Handling
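The pattern from Case 3.3 can be sketched in Java as follows. The code is a hypothetical reconstruction, not the customer's actual source: catching Exception with an empty catch block silently swallows the failed access check, and execution continues as if the check had succeeded.

class TariffAccess {
    boolean mayViewTariff(String userId, String tariffId) {
        try {
            checkAccessRights(userId, tariffId); // may throw at runtime
        } catch (Exception e) {
            // Empty catch block: the failure is silently suppressed.
        }
        // Reached even if the access check failed, violating the security policy.
        return true;
    }

    private void checkAccessRights(String userId, String tariffId) throws Exception {
        // Access check against the rights database (omitted in this sketch).
    }
}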

being convinced of the benefit of using static analyses as a quality assurance technique. The second one provides anecdotal evidence of the impact of not addressing static analyses and their findings. It shows that maintainability problems such as the redundancy in a system can cause maintenance to stop entirely: in the worst-case scenario, the non-conformance costs finally comprise the costs of redevelopment.


The insurance company has been using static analyses for many years with the primary focus on functional suitability and maintainability. For one of their projects, they regularly devote time and budget to remove manually selected findings from the code base. Further, they have been actively using and evaluating the static analysis tool Teamscale for two years. Recently, heads of the development department and team managers discussed the benefits and costs of using static analyses. When discussing whether static analysis is the right quality assurance technique to invest money in, the insurance company was convinced that removing findings continuously enables them to achieve high code quality. Whereas high code quality is not an end in itself, they consider it the fundamental basis for being able to perform major architectural changes. Hence, they consider static analyses to be a fundamental quality assurance technique.

Case 3.4: Static Analyses as Maintainability Assurance

As many long-lived systems cause unexpected life cycle costs, several customers of the CQSE are facing the decision whether an existing, long-lived system should be redeveloped from scratch. Often, it is questionable whether these systems can still be maintained efficiently. One of the CQSE customers decided to stop the maintenance of a system because—amongst other reasons—the redundancy within the system created an unbearable overhead per change request. At the time of our audit, the system was three years old and had little more than 100,000 lines of code. Even within this short history, the system had a clone coverage of 37.4%. This indicates an extraordinarily high amount of redundancy; as a target reference, we use a threshold of 10% for all our customer systems in continuous quality control [85]. In this case, the effects of not using static analyses were one reason to stop the existing development. The non-conformance of this system finally resulted in the costs of redevelopment. It remains speculative what would have happened if a clone detection had been used repeatedly during initial development. Nevertheless, we have evidence that even in large systems, it is possible to control the redundancy both during development and later maintenance [85].

Case 3.5: Stop of Maintenance due to Cloning

3.5 Limitations of Automated Static Analysis

Even though static analyses are a powerful quality assurance technique, they face certain limitations. In the previous sections, we provided anecdotal evidence about their strengths and first insights about the benefit of removing their findings. However, it is unclear how much of the overall non-conformance of software can or cannot be quantified by static analyses. Static analyses are, by definition, restricted to the code base. It is barely examined how many problems in the code base can be detected by static analyses and, additionally,

how many problems are in fact unrelated to the code. In this section, we outline the state-of-the-art of current research.

In an empirical study, Yamashita and Moonen addressed this problem and examined to what extent maintenance problems can (or cannot) be predicted by static analysis tools (code smell detectors) [103,105]. Prior to the study, four different companies were hired to independently develop their version of the case study system, all using the same requirement specification. These four functionally equivalent systems were then used for the maintenance study, in which six professional developers had to perform a big maintenance task of replacing the systems’ underlying platform. The study kept track of any kind of problem that occurred during maintenance.

The study showed that about half the recorded maintenance problems were not related to the source code itself. Instead, they related to the “technical infrastructure, developer’s coding habits, dependence on external services, incompatibilities in the runtime environment, and defects initially present in the system” (prior to the maintenance phase). Within the code-related maintenance issues, 58% were associated with findings from static analyses. Hence, based on these results, static analyses can partly address the maintainability aspect of software quality, but not completely. They only revealed about a third of the overall maintenance problems. However, this study is limited to a selected subset of static analyses. Code duplication, for example, is stated to be beyond the scope of the study. Hence, even though the study provides very valuable insights about the impact and limitations of static analyses, it cannot make a general claim.

Yet, it is very difficult to conduct an all-encompassing study to quantify the limitations of static analyses. The longitudinal study setup of Yamashita and Moonen has been unprecedented—to our best knowledge—and cost about 50,000 Euro. It was the largest longitudinal study investigating the impact of code quality on software maintenance in four functionally equivalent systems. Yet, its results provide only first insights. It is out of the scope of this thesis to replicate such a study or to further quantify the impact and limitations of static analyses with a scientifically sound approach.

3.6 Summary

In this chapter, we introduced the underlying maintenance model of this thesis, the ISO 25010 standard for software quality, and the concept of non-conformance costs as a measurement of quality. We outlined the variety of quality assurance techniques and, in particular, examined the impact of using static analysis in industry. With several industrial cases, we showed that static analyses are very cost-effective when they reveal faults in the system. Further, they are a powerful technique to assure the maintainability of the code. Besides their considerable impact, static analyses also face certain limitations as a quality assurance technique. This chapter illustrated them: It is unclear how much non-conformance cost can be saved using static analysis. Further, it is also barely examined how many aspects of software quality cannot be addressed by static analyses at all.

”Quality means doing it right when no one is looking”
Henry Ford

4 A Quantitative Study of Static Analysis Findings in Industry

As a first step in this thesis, we analyze the state-of-the-practice of using static analysis in industry. In this chapter, we provide an empirical study to analyze the number and the nature of findings revealed by commonly used static analyses in practice. The study serves as the empirical starting point for our research, from which we derive the steps following in Chapters 5 and 6 to obtain a cost-effective approach for using static analyses as a quality assurance technique. Section 4.1 outlines the study design while Section 4.2 presents the study objects. We show the results of the study in Section 4.3 and discuss them in Section 4.4. Section 4.5 summarizes this chapter.

4.1 Study Design

The case study aims to determine how many findings commonly used static analyses reveal in brown-field software systems. In the study design, we differentiate between analyses that affect maintainability and analyses affecting program correctness. This differentiation is based on the observation in Chapter 3 that static analyses address different quality aspects of a system and that they are most frequently used to address maintainability and correctness in particular. Hence, these analyses are suitable for a state-of-the-practice investigation. As denoted in Chapter 2, we refer to findings of maintainability analyses as internal findings and to those from correctness analyses as external findings. The study uses several commonly used static analyses and applies them to a large number of industry and open source systems written in the three object-oriented languages Java, CPP, and CSharp. The study serves to quantify how many findings each analysis detects.

4.1.1 Analyses Regarding Maintainability

As common analyses with respect to maintainability, we use a code clone detector to measure redundancy, structural metrics to analyze the organization of the code in files and methods, and a comment completeness analysis that assesses the quantitative code documentation (see Section 3.3.2). We selected these maintainability analyses for several reasons. First, because they are commonly accepted in industry and academia [8,66,85,94,95,103–105]. Second, because

they can be conducted language-independently. Hence, we can provide comparable results for the study, which comprises three different object-oriented languages. Third, the findings produced by these analyses require manual effort to be removed appropriately: In contrast to, e. g., formatting violations that can easily be removed with a code auto-formatter, code clones or long files cannot be refactored automatically such that the refactoring is semantically meaningful (in the current state of the art). Also, code comments cannot be generated automatically with the same quality and information gain as human-written comments. Hence, to improve maintainability, comments have to be inserted manually. On the other hand, we limit the maintainability study to these analyses as they require the least manual interpretation of the results—they can be used fully automatically to assess their quality aspect. Other analyses, e. g. to evaluate the exception handling, the naming of identifiers, or the quality of comments, require considerably more manual inspection of the code to draw a meaningful comparison and to eliminate false positives. Hence, they are not suitable for a fully automatic analysis.

For each analysis, we explain its configuration and the derived metric in the following. For background information about the quality aspects addressed by a specific analysis, please refer to Section 3.3.2. The analyses were implemented within the quality analysis tool Teamscale [40].

Clone Detection For the study, we calculate the number of clone instances per system. The number of clone instances illustrates the amount of (clone) findings a developer is confronted with. We use the existing code clone detection of Teamscale [42] and configure it to detect type-II clones1 with a minimal clone length of ten statements.

Long File Detection We count how many overly long files exist in each system. We detect long files by counting source lines of code (SLOC, i. e. lines of code excluding comments and empty lines) and use a threshold of 300 SLOC to define a file as overly long. The threshold is an experience-based threshold from CQSE that has been used for many years and in many development teams across the Java and .NET domains.

Long Method Detection Besides the detection of long files, we use a second structural metric: we show how many long methods exist – further findings among which a developer needs to prioritize. We define a method to be long if it has more than 30 source lines of code. The threshold results from the experience of the CQSE GmbH and recommendations from [66]: it is based on the guideline that a method should be short enough to fit entirely onto a screen.
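As an illustration of the underlying size metric, the following simplified Java sketch counts source lines of code; it only handles // line comments and simple /* ... */ blocks and is not the actual Teamscale implementation. Files above 300 SLOC and methods above 30 SLOC would then be reported against the thresholds described above.

import java.util.List;

class SlocCounter {
    // Counts source lines of code: lines that are neither empty nor comments.
    static int countSloc(List<String> lines) {
        int sloc = 0;
        boolean inBlockComment = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlockComment) {
                if (line.contains("*/")) {
                    inBlockComment = false;
                }
                continue;
            }
            if (line.isEmpty() || line.startsWith("//")) {
                continue;
            }
            if (line.startsWith("/*")) {
                if (!line.contains("*/")) {
                    inBlockComment = true;
                }
                continue;
            }
            sloc++;
        }
        return sloc;
    }
}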

Nesting Depth As a third structural metric, we measure the nesting depth of the code. We define a nesting finding as a line of code that is located within at least three open scopes. We start counting open scopes at method level, i. e. the scope opened by the class definition is ignored. This threshold is also experience-based.
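The following simplified Java sketch illustrates how such a metric can be computed for brace-based languages. It assumes well-formed code, ignores braces inside strings and comments, and approximates the method-level baseline by subtracting the class and method scopes; it is a sketch of the idea, not the actual implementation.

import java.util.ArrayList;
import java.util.List;

class NestingDepthAnalysis {
    // Returns the 1-based numbers of lines whose nesting depth, counted from
    // method level, reaches the given threshold (three in the study).
    static List<Integer> deeplyNestedLines(List<String> lines, int threshold) {
        List<Integer> findings = new ArrayList<>();
        int openScopes = 0;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Ignore the scopes opened by the class and the method declaration.
            int methodLevelDepth = Math.max(0, openScopes - 2);
            if (!line.trim().isEmpty() && methodLevelDepth >= threshold) {
                findings.add(i + 1);
            }
            for (char c : line.toCharArray()) {
                if (c == '{') openScopes++;
                if (c == '}') openScopes--;
            }
        }
        return findings;
    }
}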

1 For the definition of type-I, II, and III clones, refer to [57].


Abbreviation            Findbugs Rule
[DefNPE]                Definite Null Pointer Exception
[PossibleNPE]           Possible Null Pointer Exception
[NullcheckPrevDeref]    Nullcheck of value previously dereferenced
[StringComp]            String Comparison with != or ==
[MisusedEqual]          Calling equals on different types or while assuming type of its argument
[UncheckedCast]         Unchecked or Unconfirmed cast
[KnownNull]             Load of known null value
[NullPass]              Method call passes null for non-null parameter
[WriteToStatic]         Write to static field from instance method

Table 4.1: Selected, Error-Prone Findbugs Rules

Comment Completeness We measure the comment completeness for interfaces: For Java and CSharp, we count how many of the public classes/types within a system are preceded by a comment, relative to the total number of public types. For C/CPP, we measure how many of the type or struct declarations are preceded by a comment (relative to the total number of declarations). This is a quantitative measure only. However, compared to a qualitative assessment that can be done at most semi-automatically, the quantitative measure provides an automatic and comparable metric for the study.
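The metric itself is a simple ratio. The following Java sketch, which assumes a pre-parsed list of public type declarations with a flag for a preceding comment, only illustrates the computation; it is not the tooling used in the study.

import java.util.List;

class CommentCompleteness {
    // Hypothetical representation of a parsed public type declaration.
    record PublicType(String name, boolean hasPrecedingComment) {}

    // Comment completeness: commented public types relative to all public types.
    static double completeness(List<PublicType> publicTypes) {
        if (publicTypes.isEmpty()) {
            return 1.0; // nothing that requires interface documentation
        }
        long commented = publicTypes.stream()
                .filter(PublicType::hasPrecedingComment)
                .count();
        return (double) commented / publicTypes.size();
    }
}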

4.1.2 Analyses Regarding Correctness

With respect to functional suitability, we use the Java-specific open-source tool Findbugs (see Section 3.3.2). Findbugs provides a bug pattern detection on byte code with a large variety of different rules. Some of them have a high correlation with actual faults in the program [97]. Based on [97], we selected a small subset of Findbugs rules whose correlation with actual programming faults is the highest. We enriched the selection process with empirical evidence gained through numerous code audits, in which we asked developers about the relevance of Findbugs results on their own code. Hence, we selected those rules that have the highest correlation with faults in [97] and revealed the highest relevance based on developers’ opinions in practice. Table 4.1 shows the selected rules.

4.2 Study Objects

As study objects, we selected a broad range of industry and open source systems. However, we did not have both source code and byte code available for all case study systems. Hence, as the maintainability analyses require source code and the correctness analyses require byte code, we use different case study objects for each. From all systems, we excluded generated code as it is usually only generated once and not manually maintained afterwards. Hence, the quality of generated code has to be measured

differently than the quality of application code. We further exclude test code from the study. The amount of test code varies greatly in industry systems and often influences the number of findings significantly [94]. Test code often contains a lot more clone findings than application code. Hence, to obtain comparable study results, we excluded test code.

4.2.1 Maintainability Study Objects

Table 4.2 shows the study objects for which we had the source code available. Amongst them, 14 are written in Java, 4 in C/C++, and 20 in CSharp. Due to non-disclosure agreements, we use anonymous names for the industry systems. In the table, the systems are sorted by system size, which ranges from about 20 KSLOC to a little over 2.5 MSLOC. We selected open source and industry systems that have grown over the years and span a variety of different domains—from insurance, avionics, energy, and general manufacturing to software product development. Hence, we believe that they are a representative sample of brown-field software development.

4.2.2 Correctness Study Objects

Table 4.3 shows the systems in the correctness study, for which we had the byte code available. Note that Systems D and E appear in both parts of the study because for them, we had both source and byte code. Unfortunately, we were able to include only seven systems in the study. Due to the low number of case study systems and the Java dependency, the correctness study is much less representative than the maintainability study. Table 4.3 also indicates how the teams were using Findbugs before their code was analyzed within this study. Only one team (System F’) claimed that Findbugs was integrated into their nightly build. Hence, developers had the warnings available. However, Findbugs rules were not strictly enforced, i. e. adding new Findbugs findings could not cause the build to break. One other team (System H’) indicated that they were running Findbugs occasionally, but not as part of their continuous integration system. To the best of our knowledge, all other teams were not using Findbugs.

4.3 Results

In the following, we present the quantitative number of findings separately for maintainability and correctness analyses.

4.3.1 Findings From Maintainability Analyses

Table 4.4 shows for each language the aggregated results over all five maintainability analyses: On the one hand, we calculated the absolute internal findings count as the sum of the five analyses’ findings; we show the average and the median over all systems as well as the minimum and maximum. On the other hand, we normalized the absolute findings


Name         Domain                              Development   Size [SLOC]

Java:
System A     Energy                              Industry      22,911
System B     Energy                              Industry      29,577
System C     Insurance                           Industry      34,164
System D     Avionics                            Industry      63,081
JabRef       Software Product Development        Open Source   68,549
Subclipse    Software Product Development        Open Source   84,467
ConQAT       Software Product Development        Open Source   92,635
System E     Business Information                Industry      100,219
JEdit        Software Product Development        Open Source   111,035
System F     General Manufacturing               Industry      147,932
Tomcat       Software Product Development        Open Source   190,202
ArgoUML      Software Product Development        Open Source   195,670
System G     Insurance/Reinsurance               Industry      240,512
System H     Mobile Telecommunication Services   Industry      263,322

CPP:
System I     Software Product Development        Industry      94,687
Tortoise     Software Product Development        Open Source   158,715
System J     General Manufacturing               Industry      550,243
Firefox      Software Product Development        Open Source   2,634,278

CS:
System K     Insurance/Reinsurance               Industry      20,851
System L     Insurance/Reinsurance               Industry      23,632
System M     Business Information                Industry      30,495
System N     Insurance/Reinsurance               Industry      31,969
System O     Insurance/Reinsurance               Industry      42,771
System P     Insurance/Reinsurance               Industry      51,339
System Q     Insurance/Reinsurance               Industry      57,118
System R     Insurance/Reinsurance               Industry      67,522
System S     Insurance/Reinsurance               Industry      85,775
System T     Insurance/Reinsurance               Industry      89,287
System U     Insurance/Reinsurance               Industry      96,510
System V     Insurance/Reinsurance               Industry      102,072
System W     Insurance/Reinsurance               Industry      130,253
System X     Insurance/Reinsurance               Industry      182,399
System Y     Insurance/Reinsurance               Industry      189,540
System Z     Insurance/Reinsurance               Industry      192,170
System A'    Insurance/Reinsurance               Industry      281,968
System B'    Insurance/Reinsurance               Industry      329,667
System C'    Insurance/Reinsurance               Industry      420,558
System D'    Insurance/Reinsurance               Industry      622,407

Table 4.2: Maintainability Study Objects


Name        Domain                  Development   Findbugs Usage   Size [LOC]

Java:
System D    Avionics                Industry      No               101,615
System E    Business Information    Industry      No               153,735
System E'   Business Information    Industry      No               84,248
System F'   Business Information    Industry      Regularly        396,225
System G'   Business Information    Industry      No               459,983
System H'   Business Information    Industry      Occasionally     573,352
System I'   Insurance               Industry      No               953,248

Table 4.3: Correctness Study Objects

count with the system size and denote the number of findings per thousand SLOC. Again, we show average, median, minimum, and maximum.

The table shows that we have on average 2,355 internal findings in CSharp, 3,103 in Java, and over 18,000 in CPP. The median is lower, ranging from 1,207 internal findings in CSharp to 7,010 in CPP. Normalizing the absolute findings count, we encounter between 17 findings per KSLOC (CS) and 24 (Java) on average.

Figure 4.1 shows the number of internal findings per analysis and per language. We aggregated the results of the different systems with box-and-whisker plots: the bottom and top of the box represent the first and third quartiles, the band inside the box the median. The ends of the whiskers denote the minimum and maximum of the data.

The redundancy measure produces the most findings, with a median of about 350 clone instances in CS, 500 in Java, and 1,200 in CPP. The maximums even reveal more than 14,000 clone instances in one CPP system. Among the structural metrics, the long file analysis causes the smallest number of findings, with a median of about 59 in Java, 68 in CS, and 236 in CPP. The number of findings increases for the long method analysis (medians 238 for CS, 421 for Java, and 1,011 for CPP) and even further for the nesting findings. Finally, there are 183 missing comments in Java (median), 219 in CS, and 2,753 in CPP.

          absolute [all findings]               relative [per KSLOC]
          ∅        median    min      max      ∅       median   min     max
Java      3,103    2,109     177      8,123    24.3    23.9     1.9     61.5
CPP      18,582    7,010     1,756   58,553    22.1    20.3     16.5    31.9
CS        2,355    1,207     158      9,107    17.1    14.8     7.5     31.7

Table 4.4: All Maintainability Findings


[Figure: box-and-whisker plots of the number of internal findings per analysis and per language (Java, CPP, CS), with panels (a) Clone Findings, (b) Long File Findings, (c) Long Method Findings, (d) Deep Nesting Findings, and (e) Comment Findings]

Figure 4.1: Findings Separated By Analysis

4.3.2 Findings From Correctness Analyses

Table 4.5 shows the results and reveals a much smaller number of external than internal findings. We found only two definite null pointer exceptions (in all systems) and not more than 20 possible null pointer exceptions per system. Overall, a system revealed between six and 115 critical external findings, depending on the system size. Based on our results, the system whose team was using Findbugs regularly (System F’) still reveals 14 critical findings. Even though developers had the Findbugs results available, they did not remove all critical warnings. We assume that this is due to the team not strictly


System D E E’ F’ G’ H’ I’ [DefNPE] – 1 – – – 1 – [PossibleNPE] 15 15 – 1 – 20 2 [NullcheckPrevDeref] 6 – – – – 14 – [StringComp] – 6 – 1 – 9 – [MisusedEqual] 10 – – 2 2 1 4 [UncheckedCast] – – – 5 – – – [KnownNull] – – 1 – 2 9 – [NullPass] 7 3 3 – 7 1 – [WriteToStatic] – – 1 5 3 60 – Sum 38 25 5 14 14 115 6

Table 4.5: Number of Error-Prone Findings

enforcing Findbugs warnings—the build would not fail even if developers created critical findings with their commit. The team developing System H’ claimed to use Findbugs as well, but not regularly. System H’ reveals the most findings. However, it was also the second biggest system in the correctness study. In this case, the number of findings might relate more strongly to the system size than to the occasional usage of Findbugs. Overall, the two teams that had used Findbugs before our analysis did not perform significantly better than the other teams. However, again, the small size of our study does not permit us to generalize this result.

4.4 Discussion

The results show that the number of internal findings revealed by only five commonly used maintainability analyses is large enough to require a prioritization mechanism: Even the minimum of the absolute findings count in the case study (158 findings on a CS system) is difficult to prioritize manually for removal. The study further shows that the maximum of the absolute findings count (58,553 findings on a CPP system) is beyond what a single developer or quality engineer can cope with.

The study’s results further show that the number of internal findings is considerably higher than the number of external findings. We assume the small number of external findings is due to their criticality motivating developers: if a finding can cause a costly system failure, developers are often willing to remove it immediately.

The small number of study systems in our correctness study threatens the validity of our result. However, the results of existing work match our finding that external findings occur much less frequently: Ayewah et al. used Findbugs on all 89 publicly available builds of the JDK (builds b12 through b105) and revealed a similar order of magnitude in terms of quantity [7]: Overall, they found, for example, 48 possible null pointer dereferences and 54 null checks of values previously dereferenced.


In addition, our results from the correctness study also match our impressions from several years of experience as software quality consultants: Beyond the study, we have used similar correctness analyses earlier during our consultancy on more than twenty other industrial systems. The study results match the results obtained earlier.

4.5 Summary

In this chapter, we analyzed how many findings commonly used automated static analyses reveal. We differentiated between analyses addressing maintainability and correctness. We conducted an empirical study on 38 industrial and open-source projects written in three different programming languages to investigate results from maintainability analyses. To examine findings from correctness analyses, we had seven study systems available. The results show that the number of external findings is quite small for our study systems. However, already a few maintainability analyses result in thousands of internal findings. Hence, as developers likely cannot remove all of them at once and the quality assurance budget is limited, they require a sophisticated prioritization approach.


”It is always cheaper to do the job right the first time.”
Philip Crosby

5 A Cost Model for Automated Static Analysis

In the previous chapter, we showed that already a few maintainability analyses result in thousands of internal findings. While the number of external findings is much smaller, the number of internal findings requires a prioritization. To effectively prioritize findings for removal, an estimation of the cost-benefit ratio of their removal is required. For external findings, the benefit of removal is quite obvious: A potential system failure is eliminated. For internal findings, the benefit—which is not immediate, but long-term—is much harder to quantify. Even though we have evidence that using static analyses can be cost-effective to reveal faults (Section 3.4.1) and is beneficial to assure maintainability (Section 3.4.2), the overall reduction of non-conformance costs can generally not be quantified yet. Additionally, little is known about how much it costs to remove a specific finding (and this might be very finding-specific).

In this chapter, we provide a conceptual cost model that formalizes a return-on-investment calculation (Section 5.1). Based on our model of software maintenance, this model describes costs and benefits of removing findings. As the model is parameterized with several variables, we conduct an empirical study to gather industrial evidence for some of the variables (Section 5.2). This data allows us to provide an exemplary instantiation of the model and calculate costs and benefits of removing a large class (Section 5.3). We discuss the implications of the instantiation in Section 5.4. In Section 5.5, we examine how well the model can be applied to predict the cost-benefit ratio. As the model and its instantiation are based on several assumptions, we discuss the resulting threats to validity in Section 5.6. Section 5.7 summarizes this chapter.

5.1 Conceptual Model

The model serves the purpose of describing when (or if) removing a finding from the code base pays off. Therefore, it models costs and benefits of finding removal. An instantiation of the model ought to help convince project management of the cost-efficiency of using static analyses as a quality assurance technique. The model is based on an activity-based model of software maintenance (Definition 1, Chapter 3) and combines it with the ISO definition of software product quality (Definition 2, Chapter 3).

5.1.1 Core Condition

The fundamental principle of our cost model is that a finding is worth removing if the benefits outweigh the removal costs. However, both removal costs and benefit depend on the

finding’s category and the finding itself: The benefit of removing code clones, for example, might differ from the benefit of removing coding guideline violations. Even among all code clones, some may hamper maintainability more than others, revealing a bigger benefit of removal. Additionally, in terms of removal costs, some code clones can be removed significantly faster than others [86]. The same applies for other findings and finding categories. Hence, all variables in our model take the specific finding f as a parameter. (As a finding is always uniquely associated with its finding category, the latter is not modeled explicitly.)

The removal costs for a finding depend not only on the finding itself, but also, e. g., on the quality of the surrounding source code or the testing environment: In cleaner code, a single finding might be easier to remove and, similarly, with a better test suite, the removal of a finding might be tested faster. For reasons of simplicity, we model this by parameterizing the removal costs with a time parameter t. Whereas the removal costs occur at time t when the finding is removed, the benefit is distributed over a time interval after the removal [4]. For external findings, costs, e. g. through loss of reputation, can be caused days or months after a system failure occurred. For internal findings, maintenance can be eased not only for the next modification, but for all subsequent ones. For this time interval, we use the notation ∆t that describes an interval ]t, t1].

In addition, the benefit of removing an internal finding also depends on the scope included in the consideration: For example, when a long method is split up into multiple shorter methods, this can be beneficial for subsequent maintenance activities affecting the method itself—the method might be easier to read. But it can also be beneficial for subsequent activities affecting the surrounding class—the class might be easier to overview. Consequently, we take the considered scope as a parameter into account when modeling the benefit. The scope depends on the finding’s category. For reasons of simplicity and as above, we formalize it with a function scope(f) that depends on the finding, because the finding is uniquely associated with its category.

Consequently, a finding is worth removing if there is a feasible, i. e. sufficiently small, time interval after which the cost savings are greater than the removal costs for the finding. In our model, a cost function returns a positive value if costs are created and a negative value if costs can be saved. Hence, we obtain the following core condition of our model for when a finding is worth removing:

∃t1, ∆t =]t, t1]: removal_cost(f, t) < −cost_savings(f, ∆t, scope(f)) (5.1)

Whether such an interval ∆t exists depends on the expected duration of a software’s life cycle. For software that is expected to be replaced in the near future, it is likely not worth the effort to remove findings from its code base. In contrast, software with an expected maintenance of several decades might be well worth the quality assurance. The final decision whether a finding is worth removing is very system-specific. In our model, the removal costs as well as the benefits are defined for an individual finding. Hence, it does not take correlations or side effects between multiple findings into account. We discuss this threat to validity in Section 5.6.
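To illustrate the core condition with purely hypothetical numbers: assume removing a particular long method costs 60 minutes and saves, on average, 5 minutes per subsequent modification of that method, which is modified twice per month. Then

removal_cost(f, t) = 60 min
−cost_savings(f, ∆t, scope(f)) ≈ 2 modifications/month · 5 min · |∆t|

so Condition 5.1 holds for every interval ∆t longer than six months; the removal breaks even after half a year of continued maintenance. All numbers are invented for illustration and are not taken from the case study in Section 5.2.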


5.1.2 Cost Model

In the following, we model the removal costs of a finding. In our model, we use time as the unit of cost: removing a finding costs developer time, which can then be converted into money. Based on our definition of maintenance (Definition 1, Section 3.1), the finding removal comprises three activities: reading the corresponding parts of the code to understand the change, performing the change by removing the finding, and testing the change. All three activities come at a certain cost. The removal costs depend on the current code base and the current testing environment. Hence, their cost functions take the parameter t as input as well as the finding f itself (see Table 5.1):

removal_cost(f, t) = c_reading(f, t) + c_modification(f, t) + c_testing(f, t)    (5.2)

5.1.3 Benefit Model

In the following, we model the benefit of removing a finding from the code base. We define a boolean function is_external which returns true if a finding is external and false if it is internal. For both cases, the benefit is modeled differently:

cost_savings(f, ∆t) =
    cost_savings_e(f, ∆t)              if is_external(f) = true
    cost_savings_i(f, ∆t, scope(f))    if is_external(f) = false    (5.3)

The benefit of removing an external finding, cost_savings_e (for example for a correctness finding), is quite obvious: the developer eliminates a potentially costly system failure which could additionally create failure follow-up costs, costs for debugging, hotfixes, retesting, or reputation loss. Whereas failure follow-up costs and costs of reputation loss are usually directly given as a money value in some currency, costs for debugging and retesting usually first require development time, which, in turn, can be converted to money.

For external findings, we rely on repeatedly experienced, anecdotal evidence from our customers in various quality control projects that system failures in production are very costly and, hence, that avoided failures quickly redeem the costs of the static analyses (see Section 3.4.1). Therefore, we abstain from modeling cost_savings_e in more detail. Instead, we assume for any external finding f1 and internal finding f2:

cost_savings_e(f1, ∆t) ≫ cost_savings_i(f2, ∆t, scope(f2))    (5.4)

and


cost_savings_e(f1, ∆t) ≫ removal_cost(f1, t)    (5.5)

For internal findings, in contrast, the benefit is much harder to quantify. While studies exist that investigate the negative impact of findings on maintainability in general [3,8,23,26,29,48,49,52,53,66,94,95,103–105], the benefit of removing a single finding is still very hard to predict. We model the benefit of removing an internal finding in terms of time—and, hence, cost—savings for the developer in the time interval after the finding removal. However, removing a maintainability finding is beneficial only when the affected code region is maintained again. We assume that removing a finding alleviates each subsequent maintenance activity as cleaner code can be read and modified faster [66] and tested more easily [93,94]. To quantify how much time a developer saves by reading, modifying, or testing cleaner code, we have to take the scope into account. The definition of the scope in our model influences how much code is read or modified: A file, for example, might be read and modified much more often than a single method. In our model, we consider both method and file level as possible scopes: scope(f) ∈ {FILE, METHOD}. Hence, we obtain the following model of benefit with scope and time interval as parameters:

cost_savings_i(f, ∆t, scope(f)) = s_reading(f, ∆t, scope(f)) + s_modification(f, ∆t, scope(f)) + s_testing(f, ∆t, scope(f))    (5.6)

The savings in code reading of a scope in a time period can be approximated by the average number of reads per interval multiplied by the average savings per read. Using averages as an approximation of course introduces fuzziness into our model. We discuss this threat to validity in Section 5.6. In this approximation, the average savings per read depend on the finding and the considered scope. The average number of reads, in contrast, is independent of the finding, as it rather depends on the performed change requests. Yet, it does depend on the considered scope, as the number of reads may vary when it is measured on file or method level, for example. As the same holds for code modification, we obtain the following approximation:

cost_savings_i(f, ∆t, scope(f)) = avg_s_read(f, scope(f)) · avg_n_read(∆t, scope(f)) + avg_s_mod(f, scope(f)) · avg_n_mod(∆t, scope(f)) + s_testing(f, ∆t, scope(f))    (5.7)
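The arithmetic of this approximation is straightforward once the averages are estimated; the testing term is discussed below. The following Java sketch, using invented parameter values, only illustrates how Equation 5.7 would be evaluated for a single internal finding:

class InternalSavingsModel {
    // Evaluates Equation 5.7 for one internal finding and its scope.
    // All inputs are estimates: savings in minutes, counts per interval ∆t.
    static double costSavingsInternal(double avgSavePerRead, double readsInInterval,
                                      double avgSavePerMod, double modsInInterval,
                                      double testingSavings) {
        return avgSavePerRead * readsInInterval
                + avgSavePerMod * modsInInterval
                + testingSavings;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 1 min saved per read, 12 reads in ∆t,
        // 5 min saved per modification, 6 modifications in ∆t, testing ignored.
        double savings = costSavingsInternal(1.0, 12, 5.0, 6, 0.0);
        System.out.println("Estimated savings in ∆t: " + savings + " min"); // 42.0 min
    }
}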

In contrast to reading and modification activities, savings for testing activities are more difficult to model. First, while there is, for example, only one concept of code reading,

code testing comprises the different concepts of unit testing, system testing, or integration testing. Second, the benefit of finding removal (or code refactoring) for the testing activity of subsequent modifications has multiple dimensions: the execution of existing tests might be faster (less time for the tester), the addition of new tests might be easier (less time for the developer), and the tests might reveal more programming faults (fewer failure follow-up costs). For example, the reduction of redundant implementations of the same functionality might decrease the number of necessary manual system tests. As the execution of manual system tests is usually quite expensive, fewer executions result in less time spent by the testers. As another example, removing long methods with multiple functionalities by extracting multiple short methods with a single functionality can ease the addition of new unit tests because the less functionality a method has, the simpler the unit test.

Athanasiou et al. modeled test code quality and evaluated the usefulness of their model based on expected benefits in code testing with respect to issue handling performance [5]. The authors found a positive correlation between test code quality and both throughput and productivity, but not between test code quality and defect resolution speed. The study shows that modeling the benefit of cleaner code for testing activities is quite complex. As our work focuses on static analysis and not on testing as a quality assurance technique, we abstain from modeling the benefit for testing but leave it for future work (Chapter 8).

Aside from the positive impact on code reading, modification, and testing, removing findings is beneficial in other dimensions as well, e. g. increased quality awareness or higher developer motivation. These dimensions are currently not included in the model. We discuss this threat to validity in Section 5.6.

5.2 Empirical Studies

Based on the conceptual model, we strive to obtain empirical data that allow us to make realistic assumptions about the removal costs and cost savings (Equations 5.2 and 5.7). These assumptions will help us to instantiate the model in Section 5.3. Hence, we conducted a case study to enrich our conceptual model with evidence from software engineering in practice.

The case study serves to provide a very first reference point about the dimensions of removal costs and cost savings. While the results are certainly not generalizable and cover only individual aspects of the model, they do help to formulate realistic assumptions about the model. However, obtaining a suitable industrial case study object is quite a difficult task. It requires a development context, in which

the source code is continuously analyzed with static analysis,

clean code is a quality goal of the team,

developers have enough resources to remove findings regularly,

developers keep track of the costs of finding removal,

the system, its history, or its team can provide some measure of benefit.


Variable: Explanation

c_reading(f, t): Costs for reading code with a finding at a time-dependent version of the code
c_modification(f, t): Costs to remove a finding at a time-dependent version of the code
c_testing(f, t): Costs for code testing after removing a finding at a time-dependent test set-up
s_reading(f, ∆t, scope(f)): Savings for reading a scope within a time period if a finding is not present
s_modification(f, ∆t, scope(f)): Savings for modifying a scope in a time period if a finding is not present
s_testing(f, ∆t, scope(f)): Savings for testing a scope within a time period if a finding is not present
avg_s_read(f, scope(f)): Average savings per read in a given scope if a finding is not present
avg_s_mod(f, scope(f)): Average savings per modification in a given scope if a finding is not present
avg_n_read(∆t, scope(f)): Average number of reads of a scope in a time period
avg_n_mod(∆t, scope(f)): Average number of modifications of a scope in a time period

Table 5.1: Variable Definition for Cost Model

Due to these requirements, we only had one case study object available. The single study object limits the results’ generalizability significantly. Existing work, however, has shown that access to only one industrial system is valuable as much of state-of-the-art research has no access to industrial data.

5.2.1 Study Design

The case study is based on a long-running project together with one of our customers, a German insurance company. For this project, we monitored an established quality improvement process over several years: Since March 2013, the developers have performed a structured quality-control process with our support, similar to the one described in [85]. In monthly meetings, we as external experts have discussed the latest quality improvements and new findings together with the development team.

Taking into account the developers’ opinions on the new findings, we selected a given number of findings based on the estimated cost and benefit of removal: The finding selection is guided by the finding’s introduction date, its probability of being a programming fault, the ease of removal, and the change frequency of the code that contains the finding. For each selected finding, we created a “task” (Scrum story) in the issue tracker that addresses the removal of the given finding. The tasks have been scheduled together with other stories in the sprint

planning and the findings were removed during the sprint. Each developer recorded the time needed for solving a given task in the issue tracker.

The subject system is written in Java and currently comprises roughly 260 kLOC. It has a history of more than seven years and is developed using Scrum. To analyze its source-code quality, we use the static code-analysis tool Teamscale [40]. To assess maintainability, Teamscale analyzes cloning, code structuring, commenting, and naming. To assess functional suitability, Teamscale utilizes selected FindBugs patterns. Table 5.2 comprises all findings categories. Developers of the team chose these categories as they considered them to be relevant to improve functional suitability and maintainability.

5.2.2 Data Collection

On the cost side, we extracted the time developers recorded for each finding removal from the issue tracker. This data provides a first reference point for removal_cost(f, t) in Equation 5.2. However, it serves as reference point only for the overall sum in Equation 5.2, not for its individual summands. The available data from the issue tracker did not allow a more fine-grained evaluation. We will discuss this threat to validity in Section 5.6. As modeled in Equation 5.2, we measure the costs in terms of the time needed to determine an appropriate solution to remove the finding, perform the removal, and test the changes. From the issue tracker, we were able to extract the removal times for each finding. As each ticket contains the finding category, we can analyze the data per category.

On the benefit side, we focus only on internal findings (Equation 5.7) as explained above. However, based on the available historical data from the version control system, we can provide reference data only in terms of modification data, i. e. avg_n_mod(∆t, scope(f)). Approximating the other variables in Equation 5.7 was not feasible based on this case study set-up. We discuss this in Section 5.6. We measure how often the surrounding scope—a method or a file—was changed after one of its findings had been removed. In particular, we count the modifications per month within the period in which the surrounding scope existed in the system. For most findings, we use the containing file as surrounding scope. Findings of the category “Large Class” obviously affect the entire file. As clones may comprise multiple methods and also the attributes of the class, we use the surrounding file as well. The same holds for task tags and unused code because they can be located either inside or outside a method. Only for the categories “Long Method” and “Deep Nesting” can we map all findings to a surrounding method as scope. To obtain accurate results, it is important to track the method or file through all refactoring activities performed in the history, such as moves and renames. For files, we use the tracking algorithm presented in [89]; for methods, the one presented in [84].

To evaluate the modification frequency, we extended the set of removed findings from the set in Table 5.2 to all findings that were ever removed in the history of the system. However, we excluded findings that were removed due to their surrounding code entity being deleted, for two reasons. First, these findings were not deliberately removed. Instead, their removal is only a side effect of the code deletion. Second, we cannot measure the benefit of the code

removal by subsequent changes as the removed code cannot be changed any more. Even though the code deletion might have benefits (e. g. in terms of reducing the overall system size), we ignore it within the scope of this case study. Please note that we exclude removed findings only if the surrounding entity was deleted simultaneously, not if it was deleted at least one revision after the finding removal.
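To illustrate how such a modification frequency can be computed from history data, the following sketch counts the changes per month to a (rename-tracked) entity after a finding was removed from it. It is a minimal illustration only: the case study relies on Teamscale and the tracking algorithms from [89] and [84], whereas the function below assumes that the change dates of each entity have already been resolved across moves and renames; all names are hypothetical.

```python
from datetime import datetime

def changes_per_month(change_dates, removal_date, history_end,
                      entity_deleted_with_removal=False):
    """Average modifications per month of a file/method after a finding removal.

    change_dates: datetimes of commits touching the rename-tracked entity.
    Entities deleted together with the finding are skipped, as in the case study.
    """
    if entity_deleted_with_removal:
        return None  # removal was only a side effect of deleting the code
    observed_months = (history_end - removal_date).days / 30.0
    if observed_months <= 0:
        return None  # no observation period after the removal
    later_changes = sum(1 for d in change_dates
                        if removal_date < d <= history_end)
    return later_changes / observed_months

# Example: a file changed three times in the half year after the removal.
rate = changes_per_month(
    [datetime(2014, 2, 1), datetime(2014, 4, 10), datetime(2014, 6, 5)],
    removal_date=datetime(2014, 1, 1),
    history_end=datetime(2014, 7, 1),
)
```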

5.2.3 Results

In the following, we present the results first for the removal costs and then for the average number of modifications.

Removal Costs Table 5.2 shows, per category, how many findings were removed and how long their removal took the developers on average and as median. Over the course of two years, the team removed 215 findings. On average, it took them between ten minutes (to remove commented-out code), 30–74 minutes (to remove large classes, deep nesting, or long methods), and up to 118 minutes (to remove redundancy, i. e. duplicated code). According to the developers, removing redundant code is particularly troublesome because it can cause unforeseen side effects that make the removal more complicated than it appears at first glance. The average over all categories is about an hour. The median is 30 minutes, indicating that some findings take significantly longer to remove than others.

The results show a notable difference between the mean time to remove a correctness finding and a maintainability finding: Removing a maintainability finding takes roughly half an hour longer than removing a correctness finding. A possible explanation is that bug fixes often require only a few tokens to be changed, whereas refactorings mostly involve larger code blocks or multiple locations to be changed. The median, however, is identical for both categories.

Average Number of Modifications Table 5.3 shows the results. For each category, it shows the number of findings that were removed, the mean and the median of the changes per month to the code entity after the finding’s removal, as well as the absolute and relative number of code entities that remained unchanged after the finding’s removal. It shows that the corresponding file or method of a finding is changed on average 0.44 times per month. This corresponds to almost once every second month. The median is less than half of the mean, indicating that some entities changed significantly more often than others.

For all categories measured on file basis, the mean varies between 0.35 and 0.63 changes per month. The only exception is the category of empty classes with only 0.08 changes per month. For most newly documented empty classes, the benefit is expected rather on the readability side. Among the rest, the change frequency is lowest for removed naming violations and code clones. For clones, this might be due to counting subsequent changes only in the files from which the clones were removed, but not in the files to which they were extracted, e. g. if clones were unified in a common base class.


                                        Time [Minutes]
Finding Category               #     Min   Mean   Median   Max
All                            215   1     59     30       870
Correctness (FindBugs)         100   1     47     30       660
Maintainability (¬FindBugs)    115   2     71     30       870
Cloning                        43    10    118    60       870
Unused Code                    6     30    90     45       300
Long Method                    10    15    74     53       240
Empty Class                    9     30    47     60       60
FindBugs                       100   1     47     30       660
Deep Nesting                   2     20    40     40       60
Large Class                    2     30    30     30       30
Task Tags                      18    5     29     15       120
Naming                         19    5     24     30       60
Commented-Out Code             6     2     11     5        30

Table 5.2: Time Spent for Removing Findings

In contrast to files, the surrounding methods are changed less frequently; only 0.10 or 0.15 times per month, and every third method remained unchanged after the finding’s removal. This is not surprising since methods naturally change less frequently than files. Still, we expect the benefit of an individual change to a method to be higher than for files, because we can be certain that the change happens exactly where the code quality was improved. For files, changes may happen in parts other than those where the code quality was improved. Future work might refine this measurement.

It is worth noting that besides saving time, there are other benefits of quality improvement that we identified within this case study. However, these benefits are currently not included in our model. We discuss this in Section 5.6.

5.2.4 Conclusion

On the one hand, the study provides empirical data for costs of finding removal. On the other hand, it provides evidence for one of the variables in estimating the benefit of finding removal, i. e. the modification frequency. On the cost side, the results indicate that external findings are on average significantly faster to remove than internal findings. On the benefit side, the study shows that a file is changed on average about once every second month after a finding was removed.


                                          Changes / Month     Unchanged
Finding Category       Entity    #        Mean     Median     #      %
All                    Both      5,259    0.44     0.20       621    11
Task Tags              File      704      0.63     0.39       31     4
Unused code            File      889      0.60     0.28       37     4
Commented-Out Code     File      810      0.59     0.31       71     8
Large Class            File      50       0.48     0.33       3      6
Cloning                File      1,762    0.35     0.18       183    10
Naming                 File      410      0.35     0.12       108    26
Long Method            Method    335      0.15     0.05       97     28
Deep Nesting           Method    239      0.10     0.05       70     29
Empty Class            File      60       0.08     0.03       21     35

Table 5.3: Number of Modifications (Changes) after Removing Findings

5.3 Model Instantiation

Based on the conceptual model, first industrial evidence, and existing work, we provide an exemplary instantiation of the model. The instantiation serves three purposes. First, it tests whether the conceptual model is applicable in practice. Second, it provides an order of magnitude of costs and benefits of finding removal. Third, it shows current limitations of the instantiation due to missing reliable data. Hence, it also provides directions for future research.

To instantiate the model, we strive to provide reference points for each variable in Equations 5.2 and 5.7. When providing reference points, we clearly mark how reliable our assumptions are (see Table 5.4). Some assumptions are based on empirical or anecdotal evidence, some are even based on pure gut feeling obtained from three years of personal experience as software quality consultant in the insurance, avionics, mobile, and banking domains. As we outline the reliability of each assumption, the instantiation can serve as a reference for future work to close existing gaps. Based on the current state of the art, our instantiation cannot serve as a profound scientific cost-benefit evaluation. Hence, the results are not to be seen as a thorough return-on-investment calculation that proves to managers in industry how much money they will save per cent spent on quality improvement. Rather, the instantiation is to be seen as a gedankenexperiment¹.

Some of the variables in Equations 5.2 and 5.7 depend on the finding’s category, as described in Section 5.1. To show the practicability of our model, we chose one exemplary category, namely large classes. For this category, to the best of our knowledge, we can rely on the most existing work, making our instantiation more plausible and reliable. Nevertheless, our instantiation remains incomplete, neglecting other categories. We discuss this threat to validity in Section 5.6.

¹This is a German expression describing a “thought experiment which considers some theory for the purpose of thinking through its consequences. Given the structure of the experiment, it may or may not be possible to actually perform it” [1].


Variable               Instantiation    Rationale / Assumption                                                                                    Reliability
cost_removal(t)        30 min           Empirical evidence (Section 5.2)                                                                          ■
s_testing(∆t, scope)   0 min            No evidence available (unknown)                                                                           ■
avg_s_read(FILE)       2.4 min/read     Relative speed up: 20% based on [3], absolute reading time per task: 12 min based on [56]                 ■
avg_s_mod(FILE)        3.6 min/mod      Relative speed up: 33% based on argumentation, absolute modification time per task: 11 min based on [56]  ■
avg_n_read(1m, FILE)   5 read/month     avg_n_read = 10 · avg_n_mod based on [66]                                                                 ■
avg_n_mod(1m, FILE)    0.5 mod/month    Empirical evidence (Section 5.2)                                                                          ■

Table 5.4: Instantiation of the Model For a Large Class with the Scope of a File and ∆t = 1 month

We use the finding category Large Class, because existing work can provide data additional to our own previous empirical study [3, 4, 29]: To our best knowledge, this existing work is the only one that tries to measure the negative impact of a God Class [4, 29] or the anti-pattern Blob [3] on program comprehension. While the concept of God Class directly maps onto our finding Large Class, the anti-pattern Blob considers the class’ complexity and cohesion in addition to size. We discuss this threat to validity in Section 5.6.

In the following, we instantiate the variables in Equations 5.2 and 5.7. For each variable, we discuss our assumptions and their reliability. Table 5.4 summarizes the assumptions. A colored square in Table 5.4 indicates how reliable the assumptions are based on our estimation. With no existing case study being large-scale or even generalizable, no assumption receives a green reliability rating. Instead, they are classified as yellow (empirical evidence based on few data points), orange (questionable combination of empirical evidence), and red (no empirical evidence, gut feeling).

cost_removal To approximate the removal costs, we rely on the data from our own empirical study in Section 5.2. Based on two data points, we showed that refactoring a large class took 30 minutes. The low number of data points limits the generalizability of the data. However, considering all 215 data points of the previous study, the mean removal cost was 30 minutes as well. Hence, we assume that an increased number of data points for large classes would not change our variable instantiation significantly.

s_testing As discussed in Section 5.1, modeling the savings of finding removal in terms of testing is very difficult. We have neither scientific nor anecdotal evidence of the order of magnitude and, additionally, we cannot even provide reasonable gut feelings. Hence, for our instantiation, we assume the worst case, i. e. that the savings in testing are zero. Consequently, if the savings in testing are in fact greater than zero, our model will state that the benefits outweigh the costs later than in reality.

avg_s_read To our best knowledge, the work in [3] is the only work that quantifies the speed up gained by cleaner code specifically for the code reading activity. The authors measure how much faster participants can answer questions with respect to programs which do or do not contain Blobs. They cannot find evidence for a positive reading speed up on all case study systems—one system actually reveals a negative impact. Similarly, the study in [4] also could not prove a speed up for solving tasks on refactored code (including code reading and modification) for all their case study systems. The authors in [4] argue that this might be due to developers not being used to the style of refactored code. As they found that tasks on refactored code were nevertheless completed more cleanly, they conclude that the benefit of cleaner code may not occur immediately but only after a certain period of time.

Both studies show how difficult it is to obtain accurate data, as the studies are heavily influenced by, for example, the developers’ skills. Additionally, the task completion time as well as the reading speed up can be influenced not only by the developers, but also by other factors—size of the task, size of the code base, the developers’ personal preferences, or development tooling, to name just a few.

In the absence of accurate data, we instantiate our model with data from the two study systems in [3] that revealed a positive speed up. Unfortunately, the provided data does not match our variable exactly: As the given questions were artificial, they do not match actual maintenance tasks but were significantly smaller. Hence, while we can derive a relative speed up in terms of code reading, we cannot yet approximate the absolute time savings for a real maintenance task. For the relative speed up, we use the minimum of both results to minimize the error in our model instantiation. The minimum relative speed up of the study was 20%.

The study in [56], in contrast, provides data on how much time developers require to perform a maintenance task. The authors provide a fine-grained differentiation between different maintenance activities and conclude that, on average, a maintenance task takes about 70 minutes (including simulated interruption time by coworkers). For each task, developers spent about 12 minutes on code reading (which corresponds to a fifth of their non-interrupted time per task).

Combining the evidence from both studies, we estimate the average speed up for reading cleaner code with the relative speed up taken from [3] and the absolute reading time per task from [56]: 0.2 · 12 min = 2.4 min. However, this assumption is not backed by reliable scientific evidence and needs to be treated with caution. At least, the result does not violate our gut feeling and sounds plausible.

avg_s_mod To our best knowledge, there is no empirical evidence on how much faster a refactored class can be modified. Similarly to the speed up in reading, we expect the speed up in modification to be very difficult to investigate in a scientifically sound way. In addition to the large number of parameters regarding reading activities, we expect even more parameters to come into play with respect to modification activities [41]: How fast a developer can modify code additionally depends on how fast he can type, how well he uses IDE support (e. g. auto completion), and how much he uses mouse versus keyboard interaction.

In the absence of data, we assume the speed up in modification to correlate with the size of the code affected by the modification: If a large class C with z lines of code was refactored and split into two classes A and B with x and y lines of code, we assume that class A can be modified faster with a relative speed up of (1 − x/z) · 100%. This assumption is based on pure reasoning. Currently, there is no evidence for it.

Based on anecdotal evidence from our customers, developers usually consider a class with more than 400 lines of code to be worth refactoring. Based on our experience, not every developer agrees with the threshold of 400 lines of code, but a majority does. Additionally, it depends on the content of the class; a class containing only UI code, for example, might still be easy to understand even if it has more than 400 lines of code. Hence, even though the threshold is not completely reliable, it provides a first reference point. To complete our gedankenexperiment, we assume that a class with 600 lines of code is refactored into classes containing roughly 400 and 200 lines of code. Hence, the relative speed up would be 33% for the larger, newly created class. However, this instantiation is built entirely on gut feeling and is not scientifically reliable.

The study in [56] provides data on how much time developers spend on modifying code while performing a maintenance task: The study claims that, on average, a developer spends 11 minutes modifying code. Again, this data cannot be generalized but provides first empirical evidence. Combining the absolute time for code modification from [56] with our hypothetical relative speed up of 33%, the overall speed up in modification is instantiated with 0.33 · 11 min = 3.6 min.

avg_n_read While we have evidence of how much code reading is involved in performing a maintenance task [56], we have less scientific evidence on how often a single class is read within a given period of time. In the absence of scientific data, we use anecdotal evidence from [66] stating that code is read ten times more than it is written. As we have empirical evidence from our own study on how often code is written (see the next item), we can extrapolate from it to approximate the number of reads.

avg_n_mod Based on our study in Section 5.2, large classes are modified approximately once within two months after they were refactored. The result from the study is not generalizable but provides a first reference point based on empirical evidence.

Given the instantiation of all variables in Equations 5.2 and 5.7, we can now calculate the time interval ∆t in which the refactoring of a large class would pay off based on our assumptions. To this end, we instantiate Equation 5.1 as follows:


∃ t1, x, ∆t = [t, t1] = x months:

    30 min < 2.4 min/read · 5 read/month · x months
           + 3.6 min/mod · 0.5 mod/month · x months
           + 0 min                                                    (5.8)

⟺ 30 min < 12x min + 1.8x min
⟺ 2.17 < x

Based on our assumptions, the refactoring of a large class would pay off after roughly three months, taking an error of up to approximately 30% into account. Hypothetically, after a year (x = 12 months), a developer would have saved approximately 135 minutes, i. e. 2.25 hours of working time. With an estimated average salary of 73,000 US Dollars² for a developer with five to nine years of working experience, one person hour costs about 40 US Dollars.³ Hence, within a year, the company would save roughly 90 Dollars for a single refactored large class.
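The break-even point and the yearly saving can be reproduced directly from the values in Table 5.4. The following sketch only replays the arithmetic of Equation 5.8 and the salary calculation above; the constants are the instantiated assumptions, not measured facts.

```python
# Values from Table 5.4 (large class, scope = file, times in minutes).
cost_removal = 30.0                    # empirical mean removal cost (Section 5.2)
avg_s_read, avg_n_read = 2.4, 5.0      # minutes saved per read, reads per month
avg_s_mod, avg_n_mod = 3.6, 0.5        # minutes saved per modification, modifications per month
s_testing = 0.0                        # assumed zero, as no evidence is available

saving_per_month = avg_s_read * avg_n_read + avg_s_mod * avg_n_mod   # 13.8 min/month
break_even_months = cost_removal / saving_per_month                  # ~2.17 months

# Hypothetical saving after one year (x = 12 months), net of the removal cost.
net_saving_minutes = 12 * saving_per_month - cost_removal            # ~135.6 min
hourly_rate = 73_000 / (46 * 40)                                     # ~39.7 US Dollars per hour
yearly_saving_dollars = net_saving_minutes / 60 * hourly_rate        # ~90 US Dollars
```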

5.4 Discussion

Our model and its empirical studies serve as a first starting point towards a return-on-investment calculation for finding removal. We instantiated the model exemplarily for the specific finding category of large classes. In the following, we discuss our findings. Based on the exemplary instantiation, the model reveals as an order of magnitude that the refactoring of a large class is likely to pay off within a few months. More specifically, we calculated that the refactoring of a large class could save a company about 90 Dollars within a year. Considering the results from our quantitative study in Chapter 4, a Java system contains about 80 overly large classes on average. Hence, if all these classes were refactored, the company could save about 7,200 US Dollars within a year. However, this is only a gedankenexperiment. In practice, it is not feasible and even risky to refactor all classes at once. Hence, the refactoring and the pay-off would both spread out over a certain time interval. Nevertheless, the gedankenexperiment reveals that the savings calculated with our model can reach thousands of dollars. However, as explained in Section 5.1, these numbers have to be treated with caution: On the one hand, they are based on weak assumptions or partly even gut feelings. On the other hand, they do not consider interference effects between multiple findings, for example, or other positive benefits from quality improvement such as

²http://www.computersciencedegreehub.com/software-development-salary/
³Calculation based on 6 weeks of vacation, hence 46 working weeks per year, and 40 working hours per week.

increased quality awareness, increased developer motivation, or reduced defect probability. Ammerlaan et al. suggested that refactored code is beneficial after a certain learning curve and that “good quality code prevents developers from writing sloppy code” in the first place [4]. In fact, the savings might be significantly larger than modeled in our theory: The case study in Section 3.5 showed that if static analyses are ignored, the costs of non-conformance may even result in the redevelopment of the system. Overall, our instantiation of the model revealed the following: First, it is possible to instantiate the model in practice. Second, gathering scientifically sound evidence to instantiate the different variables is difficult. Third, based on the current state of the art and this thesis, evidence is still missing for some variables. We further discuss this as future work in Chapter 8.

5.5 Model-based Prediction

With the conceptual model, we analyzed when the removal of a finding would pay off. However, for a prioritization of individual findings, a cost-benefit analysis based on average experience values from the past is not sufficient. Instead, we need to predict the benefit of removing a specific finding in the future. In particular for internal findings, the benefit is harder to predict as it depends on the future maintenance of the affected code region. Knowledge about the future maintenance of the system is required.

In this section, we examine to what extent static analyses can be used to predict future maintenance. More specifically, we conducted a case study to investigate how the existing code base and its history can be used to predict future changes. It is commonly assumed that “the more changes that have been made in the (recent) past, the likelier it is that this part of the software is still unstable and likely to evolve” [67]. We test this assumption and examine whether the change history of a system can be used to predict the changes in its future. This would give us an approximation for the variable avg_n_mod(∆t, scope(f)) in Equation 5.7 of our conceptual cost model in Section 5.1.

Problem Statement Estimating the benefit of removing internal findings requires knowledge about future maintenance: If we can predict the future changes to a piece of code, we can better estimate how much this piece of code is worth cleaning up. However, there are currently no precise prediction models available. This study examines whether the change frequency in the past serves as a good predictor of the future change frequency.

5.5.1 Data Collection

As data set for the study, we reuse the Java data from Paper B: the data set contains the change history of 114,652 methods from ten case study systems. Nine of them were open-source, one of them was an industrial system. Overall, the data set contains an analyzed history of 72 years. For each method, we extracted its changes from the initial commit to the head revision in the analyzed history.


[Figure: future-change probability (0 to 1.0) of the changed-methods and the all-methods sets for the yearly snapshots of ArgoUML, af3, ConQAT, jabRef, jEdit, jHotDraw, jMol, Subclipse, Vuze, and the anonymized industrial system.]

Figure 5.1: Data of Maintenance Prediction Study

5.5.2 Evaluation

To evaluate whether the change frequency in the past is a good predictor for the change frequency in the future, we use yearly snapshots of the history (e. g. May 1st, 2009, May 1st, 2010, etc.). For each snapshot, we create two method sets: First, the set of methods that were changed in the year prior to the snapshot. We refer to this set as the changed-methods. Second, the set of methods that existed prior to the snapshot—referred to as all-methods. The latter is thereby a superset of the first set as it comprises both the changed and unchanged methods. For both sets, we calculate the probability that a method is changed in the year after the snapshot and refer to it as future-change probability. The future-change probability denotes the relative frequency of the methods that were changed in the year after the snapshot.
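As an illustration of this evaluation, the following sketch computes the future-change probability of both sets for a single snapshot. It assumes that the change history has already been extracted into a mapping from method identifier to its change dates (including the initial commit); the data structures and names are hypothetical and not the tooling used in the study.

```python
from datetime import datetime, timedelta

def future_change_probabilities(method_changes, snapshot):
    """Future-change probability of the changed-methods and all-methods sets.

    method_changes: dict mapping a method id to the list of its change dates.
    Only methods that existed before the snapshot are considered.
    """
    year = timedelta(days=365)
    all_n = changed_n = all_future = changed_future = 0
    for dates in method_changes.values():
        if min(dates) >= snapshot:
            continue  # method did not exist before the snapshot
        changed_before = any(snapshot - year <= d < snapshot for d in dates)
        changed_after = any(snapshot <= d < snapshot + year for d in dates)
        all_n += 1
        all_future += changed_after
        if changed_before:
            changed_n += 1
            changed_future += changed_after
    return (changed_future / changed_n if changed_n else None,
            all_future / all_n if all_n else None)

# The hypothesis below holds for a snapshot if the first probability exceeds
# the second by at least the threshold delta (0%, 5%, 10%, 15% in Table 5.5).
```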

Hypothesis The case study is based on the following hypothesis: If the change frequency in the past is a good predictor for the change frequency in the future, then the future-change probability should be significantly higher for the changed-methods than for all-methods.

δ      A.UML   af3   ConQAT   jabRef   jEdit   jHotD.   jMol   Subclipse   Vuze    Anony.   Sum
0%     12/12   3/4   2/3      7/8      6/7     6/7      5/5    9/10        9/10    6/7      65/73
5%     8/12    1/4   2/3      6/8      6/7     3/7      4/5    7/10        7/10    5/7      49/73
10%    3/12    0/4   0/3      2/8      4/7     0/7      2/5    1/10        4/10    3/7      19/73
15%    0/12    0/4   0/3      0/8      1/7     0/7      0/5    0/10        1/10    2/7      4/73

Table 5.5: Snapshots with Higher Future-Change Probability for the Changed-Method Set than for the All-Method Set


5.5.3 Result

Figure 5.1 and Table 5.5 show the results of the evaluation. Out of 73 snapshots, our hypothesis was true for 65 snapshots. However, the future-change probability for the changed-methods was often only marginally higher than for all-methods. Hence, we also evaluated in how many cases the future-change probability for the changed-methods was at least 5%, 10%, or 15% higher. With a threshold difference of at least 5%, our hypothesis was still true in 49 out of 73 snapshots (67%). With a 10% threshold difference, the hypothesis only held in 19 out of 73 snapshots (26%), and with a 15% difference, only in 4 snapshots (5%).

5.5.4 Discussion

Based on these results, the change history of the past is helpful for the prediction of future changes; however, it does not improve the prediction significantly: In only 26% of the snapshots was the future-change probability at least 10% higher. We suspect that the current code base and its history can only provide limited information for the prediction because future maintenance is rather driven by influence factors that are not related to the code: At about 60%, maintenance is mostly perfective, i. e. changes comprise adapting existing or adding new functionality due to changes in the business process or new user requests [24, 63, 71]. Instead of deriving changes from the existing code base, they must rather be derived from changing requirements and new market situations. Hence, we assume that the automated, tool-based prediction of future maintenance based on existing software artifacts is inherently limited. This contradicts previous assumptions that entities that were changed frequently in the past are likelier to change in the future [67].

5.5.5 Conclusion

The results show that static analysis of the existing code base and its history is of limited use for predicting future maintenance. We assume that—in general—existing software artifacts cannot provide sufficient information to predict future changes, as these are usually rather derived from changing requirements [24, 63, 71].

5.6 Threats to Validity

As our model and its instantiation are based on a variety of assumptions, it faces a large number of threats to validity. In the following, we discuss them.

In the conceptual model, costs and benefits are formalized for a single finding. In our model, the removal costs as well as the benefits are defined for an individual finding. Existing work, however, has shown that while a single finding has moderate impact on program comprehension and software maintenance, a combination of them can have a significantly stronger negative impact [3, 106]. In [3], this effect of interference has been shown for

two findings and cannot yet be generalized, even though its existence seems plausible for other findings as well. Additionally, there is first scientific evidence that the occurrence of multiple findings not only increases the negative effect on maintainability but also on program correctness: The work in [34] provides one industrial case in which classes with more findings are also more error-prone. As it is out of the scope of this thesis to prove the interference effect for all findings (or even just a subset), our model does not include them. Consequently, if the core condition of our model is satisfied, it gives only an upper bound for ∆t: Due to possible interference effects, it might pay off earlier to remove a finding.

Modeling costs and benefits for a single finding, the result depends on the nature of the finding’s category. Code clones might impact maintainability differently than naming convention violations or large classes. In our empirical study (Section 5.2), we tried to mitigate this threat by providing finding-specific data. However, these data are only a first reference point and neither generalizable nor exhaustive. Hence, our model is to be understood only as a first step towards the formalization of a return-on-investment calculation of finding removal. It is generic enough to be applied to all static analysis results but does not yet comprise an instantiation for each of them.

The model uses averages as approximation for the benefit in reading and modification. When modeling the savings with respect to reading (s_reading), we approximated them with the average number of reads multiplied by the average saving per read (see Equation 5.7). The same holds for modification. The approximation with averages introduces a certain fuzziness to the result. Yet, based on existing related work and our own empirical study, we were only able to provide an instantiation for the average approximation. It remains future work to conduct a more fine-grained case study.

The model considers benefit only in terms of reading, modification, and testing. Aside from the positive impact on reading, modification, and testing, removing findings is beneficial in other dimensions as well. Existing work has shown that cleaner code encourages developers to produce cleaner code in the first place [4]. The actual benefit thus extends to the reduction in non-conformance costs resulting from not introducing further findings in the future. However, modeling this dimension of the benefit would require a prediction model of newly introduced findings in the future. For the purpose of this thesis, we preferred to keep the model as simple as possible to be able to provide a first instantiation as in Section 5.3.

Our own case study (Section 5.2) revealed several additional benefits. However, they are very hard to quantify. These benefits include the knowledge transfer between developers, monthly internal quality discussions which otherwise would not have taken place, higher quality awareness, higher developer motivation to work on clean code, and a lower risk of introducing bugs in the first place. Hence, our benefit considerations are to be seen as a lower bound of the real benefit. The costs of removing a finding might in fact be recouped earlier than stated in our model.


Our own empirical data to approximate benefit is not exhaustive. With our empirical case study (Section 5.2), we provided reference data only in terms of modification data; we approximated avg_n_mod(∆t, scope(f)). Approximating the other variables in Equation 5.7 was not feasible based on the case study set-up. Measuring the number of reads, avg_n_read(∆t, scope(f)), would have required a tracking mechanism of the developers’ navigation in the IDE as well as a time tracking tool for the developers’ reading activities. Neither was available for installation at our customer’s site. Additionally, measuring the average savings per read, per modification, as well as for testing (avg_s_read(f, scope(f)), s_testing(f, ∆t, scope(f)), avg_s_mod(f, scope(f))) was not possible as we had no evidence of what would have happened if a finding was not removed. For our discussion in Section 5.3, we had to rely on existing related work and were not able to contribute data from this case study.

Our empirical data to approximate removal costs is not fine-grained. In Section 5.2, we provided first empirical data on the removal costs for findings in ten different categories. These findings comprised both internal and external findings. This data gave a first indication of the overall removal costs in Equation 5.2. However, it does not allow a separation between c_reading, c_modification, and c_testing. For an analytical, retrospective calculation to decide whether the core condition in Equation 5.1 was satisfied for a removed finding, having evidence for the sum suffices. For a constructive application of the model, e. g. to predict cost and benefit and prioritize findings based on the best ratio, knowledge about the individual summands would allow a more fine-grained prediction. However, we currently do not have this data. How we use the cost prediction in our prioritization approach will be addressed in Section 6.2.3.

The instantiation is done for only one category of findings. Our instantiation has shown that the model is applicable for the category of large classes. Nevertheless, our instantiation remains incomplete, neglecting other categories. Due to the large number of different existing static analyses, we consider it to be infeasible to provide an instantiation for all of them, given the scope of this thesis. Hence, this thesis can only provide a starting point and a reference for future replication for other categories.

Yet, we believe that the instantiation can be done similarly for other categories. For many other categories, we have the corresponding empirical data from our own study in Section 5.2, affecting the instantiation of avg_n_mod, avg_n_read, and cost_removal. Further, our assumption for instantiating avg_s_mod can be partially transferred as well: The removal of long methods, deep nesting, or code clones also reduces the size of the code affected by a modification. For other findings, such as naming convention violations or code comments, the assumption cannot be transferred. This remains future work. Finally, for avg_s_read, empirical evidence is missing the most. However, as we outlined the difficulties in obtaining reasonable data, this is out of the scope of this thesis.


The instantiation for the finding Large Class uses related work on the anti-patterns God Class and Blob. Our instantiation of the model for large classes relies on empirical data from existing work on god classes [4, 29] or on the anti-pattern Blob [3]. While the concept of God Class directly maps onto our finding Large Class, the anti-pattern Blob considers the class’ complexity and cohesion in addition to size. As the anti-pattern considers more code characteristics than our finding, it adds fuzziness to our instantiation. Possibly, the negative effect of Blobs on program comprehension is larger than the effect of large classes. We tried to mitigate this threat by taking the minimum measured negative effect in the case study from [3]. To our current best knowledge, the data from [3] is the only scientific evidence that can enrich our model in the current state of the art (aside from our own, non-exhaustive empirical study). Yet, this threat to validity needs to be addressed by future work.

The instantiation for the finding Large Class is based on a large variety of assumptions. As the model is designed as a gedankenexperiment and scientifically sound evidence is missing and very difficult to gather, our instantiation depends on a large number of assumptions. These assumptions differ in reliability and influence the outcome of the instantiation. We believe we have outlined the reliability of our assumptions in detail in Section 5.3 and summarized them in Table 5.4. To our best knowledge, more reliable related work does not exist and the required studies for the missing data are out of the scope of this thesis.

5.7 Summary

In this chapter, we provided a conceptual model to formalize costs and benefits of finding removal. While the benefit of removing external findings is quite obvious, the model shows that removing internal findings is beneficial only if the corresponding code is maintained. However, the model also shows that static analyses are of limited use for predicting future maintenance. Future changes are usually rather derived from (unknown) changing requirements and evolving market situations. Hence, we assume that existing software artifacts generally cannot provide sufficient information for an automated maintenance prediction. On the cost side, the model revealed that external findings are usually very easy to remove. Internal findings, however, create higher removal costs. As they are also much larger in number, only a small subset of them can be removed within one development iteration. While we cannot reliably prioritize based on the benefit of their removal, we can still prioritize based on their removal costs. With low removal costs, it is easier to reduce the inhibition threshold of developers and motivate them to remove internal findings that do not reveal an immediate reward. Based on logical reasoning, the model showed that removing internal findings is less costly when aligned with ongoing development: If a finding is removed in code that has to be changed anyway, overhead in code understanding and testing can be saved—performing a change request already comprises both.

“It is not enough for code to work.”
Robert C. Martin

6 A Prioritization Approach for Cost-Effective Usage of Static Analysis

Even though many companies employ static analysis tools in their development, they often do not succeed in improving their code quality in the long term [85]. In this chapter, we present a prioritization approach for the effective usage of static analysis. The approach is derived from the results previously gained in Chapters 4 and 5.

After outlining the challenges developers face when applying static analyses, we derive the four key concepts of the approach in Section 6.1. Each of the four key concepts is explained in more detail in Sections 6.2–6.5. Section 6.6 summarizes this chapter.

6.1 Key Concepts

Based on three years of personal industry experience as software quality consultants, we observe that developers face several challenges when introducing static analyses to a software system developed on the brown-field. These challenges are based on personal, hence subjective, experience. The personal experience matches the experience of the quality consulting company CQSE within hundreds of customer projects. However, even though CQSE has a large number of customers, our assumptions might not apply to all software systems in all domains. Yet, we believe they apply to a substantially large subset of long-lived software.

As outlined in Chapter 1, we identified five challenges and repeat them briefly here: First, the number of findings is too large to be fixed at once (Challenge Number of Findings). Second, different findings reveal different benefits on removal (Challenge Benefit of Removal). Third, findings comprise false positives and non-relevant findings which disturb developers (Challenge Relevance of Findings). Fourth, while removing findings, new findings can be created that need to be considered as well (Challenge Evolution of Findings). And fifth, change requests often dominate finding removal and absorb all resources (Challenge Management Priority).

Due to the challenges of applying static analyses effectively in practice, developers require a substantiated approach to effectively use static analyses. In this section, we give an overview of our proposed approach. Generally, the approach is applicable to all static analyses that result in findings in the code—with a finding indicating the risk of increasing future non-conformance costs. Based on our definition of software quality (see Chapter 3), findings address a variety of different quality aspects—from functional suitability over security and usability to maintenance. For example, findings pointing to programming faults are likely to create non-conformance costs resulting from a system failure, vulnerabilities to security

attacks can create significant costs for data losses, and unmaintainable code can increase costs for future change requests.

The approach addresses the current challenges in practice with four key concepts. These are described briefly in the following, and in much detail in subsequent sections.

1. Automatic Recommendation—An automatic recommendation prioritizes the findings. We provide a fully automatic and tool-supported mechanism which recommends a certain set of findings for removal. This reduces the large amount of findings to a feasible size (Challenge Number of Findings). The approach is based on the results from the quantitative study (Chapter 4) as well as on cost and benefit considerations as outlined in our cost model (Chapter 5). Therefore, it differentiates between external and internal findings to take the developers’ motivation/inhibition threshold into account (Challenge Benefit of Removal).

2. Manual Inspection—Manual inspection filters relevant findings. We require a manual component to inspect the recommended findings and to accept the set (or a potential subset) for removal. By manually accepting or rejecting findings, false positives and non-relevant findings can be eliminated (Challenge Relevance of Findings). As a result, automatically recommended and manually accepted findings are prioritized to be removed from the system.

[Figure: overview of the approach. Static analyses are run on the system and produce all findings; the automatic recommendation selects recommended findings; the manual inspection yields accepted findings, which are scheduled for removal. These activities are applied in sequence, in parallel to ongoing development.]

Figure 6.1: Overview of the Approach


                              Key Concepts
                              Automatic         Manual        Iterative      Quality
Challenge                     Recommendation    Inspection    Application    Control
Number of Findings            X
Benefit of Removal            X
Relevance of Findings                           X
Evolution of Findings                                         X
Management Priority                                                          X

Table 6.1: Mapping between Key Concepts and Challenges

3. Iterative Application—An iterative application ensures continuity. Our approach suggests applying the prioritization iteratively during the development process. Consequently, the prioritization takes the ongoing development and maintenance of the code base into account (Challenge Evolution of Findings).

4. Quality Control—A quality control process raises management awareness. To guarantee enough resources for finding removal, the approach embeds the iterative prioritization in a quality control process. The process raises management awareness for code quality and provides transparency about the usage of the provided resources (Challenge Management Priority).

Figure 6.1 gives an overview of how the automatic recommendation and the manual inspection are applied iteratively: At the beginning of each iteration, a set of static analyses is run on the system, resulting in a set of findings. The automatic recommender selects a subset of the findings which are then inspected manually. During manual inspection, findings can be accepted or rejected. The accepted findings are scheduled for removal during the iteration. While development continues and the system is further modified, the static analyses are run again at the beginning of the next iteration and the prioritization repeats. Table 6.1 summarizes how the key concepts address the identified challenges in practice. In the remainder of this chapter, we describe the automatic recommendation and the manual inspection in Sections 6.2 and 6.3. Section 6.4 focuses on the iterative application. The quality control process and its required tool support are described at the end of this chapter in Section 6.5.

6.2 Automatic Recommendation

The main component of our prioritization mechanism is the fully automatic recommendation: Given a set of current findings in a system, the recommender outputs a subset of

them as recommended findings. The set of recommended findings should be of feasible size as, in a second step, these findings are manually inspected for acceptance. In general, the automatic recommendation is based on cost and benefit considerations as outlined in our cost model in Chapter 5.

Within the large amount of current findings, different findings reveal a different benefit when being removed (see Section 5.1). Our approach takes these differences in benefit into account by differentiating between external and internal findings (see Equation 5.3). Based on our experience, developers are willing to almost instantly remove external findings that point, e. g., to a potential programming error or security vulnerability. Removing these findings has an immediate pay-off, e. g. removing a fault from the system. In contrast, developers are harder to motivate to remove internal findings impacting the maintainability of the code, e. g. long methods or code clones. Removing these findings does not improve the user-visible behavior of the system. Hence, it has no short-term benefit but only the long-term benefit of achieving a more maintainable code base.

6.2.1 Findings Differentiation

This differentiation between external and internal findings is aligned with the two different kinds of non-conformance costs (see Definition 4): On the one hand, non-conformance can stem from an external failure of the system in production (e. g. costs for fixing field defects—debugging, retesting, deploying hot fixes); on the other hand, non-conformance can also result from internal failure of the system (e. g. costs for restructuring the code to make it more maintainable, or adapting the code to make it portable to a different technical platform).

External Findings These findings present a high risk of leading to a system failure observable by a user. The notion of risk is context and domain dependent: High risk findings point, for instance, to a programming fault or to code weaknesses causing security vulnerabilities. Examples include null pointer dereferences, synchronization problems, or vulnerabilities for SQL injections.

However, not all external findings necessarily cause a system failure in practice. For example, a null pointer dereference might not cause an exception if the code is never executed. If the exception is indeed thrown, it could still be caught and suppressed during later program execution. Nevertheless—if uncaught—the exception can create a user-visible failure and, hence, the null pointer dereference poses a certain risk.

Internal Findings These findings cannot be noticed by the user of the system. Nevertheless, they are likely to increase non-conformance costs resulting from internal failure as they hamper readability, understandability, or changeability of the code. Examples include structural findings like long classes and long methods, redundancy findings like code clones, or documentation problems like missing interface comments.


While most analyses reveal only internal or only external findings, some analyses can reveal both. One example is the detection of duplicated code, which can be configured in many different ways: If configured to detect only identically duplicated pieces of code, the analysis reveals internal findings because these clones represent an increased change effort when changes have to be propagated to all instances. In contrast, if the analysis is configured to detect inconsistent clones [49], differentiating between external and internal findings becomes more complex: In case of an unintentional inconsistency, the clone is likely to represent a programming fault [49]. Due to this risk—existing work shows that the probability of an inconsistent clone representing an actual fault is between 25% [68] and 50% [49]—these clones are classified as external findings. Otherwise, if the inconsistency was created on purpose, it does not represent a fault, but still increases the change effort. In this case, the inconsistent clone represents an internal finding. Using the example of code clones, we describe in Section 6.2.2 how to classify findings as external or internal.
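The decision logic for clone findings sketched above can be summarized as follows. This is a minimal illustration, assuming a hypothetical clone object with an is_inconsistent flag; the predicate deciding whether an inconsistency is unintentional stands in for the machine-learning classifier of Paper C [87] and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CloneFinding:
    is_inconsistent: bool   # True for gapped/inconsistent clones

def classify_clone_finding(clone: CloneFinding,
                           is_unintentional: Callable[[CloneFinding], bool]) -> str:
    """Classify a clone finding as 'external' or 'internal'.

    is_unintentional stands in for the machine-learning classifier of
    Paper C [87] that decides whether an inconsistency is unintentional.
    """
    if not clone.is_inconsistent:
        # Identical clones only increase the change effort: internal finding.
        return "internal"
    if is_unintentional(clone):
        # Unintentional inconsistency: likely a programming fault.
        return "external"
    # Intentional inconsistency: no fault, but still increased change effort.
    return "internal"
```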

6.2.2 Recommending External Findings

As external findings reveal a risk of current system failure, our approach adds all external findings to the set of recommended findings. However, this poses constraints on the selected analyses: Only those analyses should be run that result in a feasible number of findings that are risky enough to be manually inspected. Consequently, the recommendation of external findings comprises two steps: First, the appropriate analyses have to be selected. Second, amongst their findings, the external findings have to be classified.

Selecting Domain-Dependent Analyses

To obtain findings that are risky enough to all be manually inspected, the appropriate analyses have to be selected. The notion of a finding’s risk depends on two different aspects: First, the risk of a finding is system-specific and depends on the system domain as well as its context. Second, it can further depend on the type of code the finding is located in. As the selection of the domain-dependent analyses requires system-specific context knowledge, our approach provides no automated support for it. This has to be done manually.

Domain and Context Dependency Depending on the domain and context, the risk of a finding can vary. For example, for systems that operate with an online-accessible SQL database, findings representing vulnerabilities for SQL injections are very risky. In contrast, if the database is not accessible by a user, employing such an analysis reveals much less failure risk for the system (the number of potential attackers is reduced to internal users or administrators). Hence, analyses to detect external findings have to be selected depending on the system’s domain. In the domain of Java programs, for example, a variety of bug pattern detection tools exist: FindBugs¹, for instance, performs a data flow analysis on byte code to detect bug patterns that likely point to programming faults. As these programming faults can lead to a null

¹www.findbugs.sourceforge.net/

pointer exception that, in turn, might cause unintended behavior, these findings carry a risk of current system failure and should be addressed immediately. These findings have a high correlation to actual faults [6, 7, 20, 97]. Usually, they are small in number (as shown by the quantitative study in Chapter 4) and quite fast to remove (see the empirical data in Section 5.2). FindBugs provides a suitable set of rules to detect external findings in the Java domain [39, 92]. Similar tools exist for other domains as well [39], such as Coverity² for C and C++ [11].

Beyond static analyses for correctness, many static analyses also exist to address security. Tools like Coverity³ or Klocwork⁴ offer automated static analyses to identify potential security vulnerabilities [32]. Examples comprise vulnerabilities for SQL injections, cross-site scripting, or cross-site request forgery. An SQL injection can, for instance, result in data loss or data corruption and, hence, lead to a user-visible failure of the system.

Code Type Dependency In addition to the domain and context dependency, the risk of a finding can also depend on the type of code it is located in: Static analyses that are suitable for application code may not be suitable for test code at the same time, and vice versa [5, 94]: For example, findings in test code cannot lead to user-visible system failure because, by definition, test code is only used for testing the application and is not run in production.

For findings in test code, their risk has to be assessed separately from their risk in application code. Some findings may reveal the same risk in application and in test code, others are relevant for only one or the other. For example, a null pointer dereference is not desirable in either test or application code. The null pointer dereference will cause the test to fail, maybe even before the intended functionality was actually tested. As another example, inconsistent clones might often be detected in test code if the same, duplicated test code is enriched with different test data. Whereas inconsistent clones pose a big risk in application code [49], they are likely less harmful in test code.

Consequently, appropriate analyses have to be selected manually depending on domain and context. For each analysis, the decision has to be made whether its scope comprises only application code or both application and test code.

Classifying External Findings

Depending on the analyses, the classification of external findings requires different approaches: For some analyses, the decision whether a finding is external or not depends on the specific finding itself. For others, in contrast, this decision can already be made based on the category of the finding. In the following, we demonstrate the classification with respect to two specific examples—inconsistent clones and Java bug patterns. However, the underlying ideas can be generalized and applied in other domains as well.

²www.coverity.com
³https://www.coverity.com/products/code-advisor/
⁴http://www.klocwork.com


Classification based on Findings When a finding category contains both internal and external findings, each finding has to be classified individually. For example, the category of inconsistent clones contains both intentionally and unintentionally inconsistent clones. Hence, the decision which inconsistency is unintentional (representing an external finding) has to be made for each finding separately. For this example, we suggest a classification algorithm in Paper C [87]. Based on machine learning, the provided algorithm classifies inconsistent clones as unintentional and, hence, faulty, or as non-faulty (Section 7.3).

Classification based on Category In case an analysis reveals only external findings within one finding category, the finding’s risk can be derived from the category. For example, in the Java domain, this classification can be done for the checks provided by the tool FindBugs⁵ (the same classification can be done for bug pattern tools in other language domains). FindBugs itself provides a categorization of its checks as correctness, bad practice, malicious code, etc. It further assigns each check a confidence rank, or—formerly—a priority (high, medium, low). Existing work has shown that engineers’ perceptions of the importance of defects generally matched the rankings in FindBugs [6]. With an empirical study, Vetro et al. showed that the correctness checks of high priority are in fact a good predictor for actual faults in the software [97]. Also, Ayewah et al. showed that a few FindBugs rules account for the majority of defects found [7]. Based on their case study results, the following finding categories are examples representing external findings (for the full list, please refer to [97]):

Null pointer dereference

Nullcheck of value previously dereferenced

Infinite recursive loop

Impossible cast

For these findings, we have shown in the quantitative study in Chapter 4 that systems usually reveal only few of them. As these findings have a high correlation to actual faults, the corresponding Findbugs rules are suitable analyses to detect external findings.
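The category-based classification itself can then be as simple as a lookup: every finding of one of these bug patterns is treated as external. The following Java sketch illustrates this; the pattern names mirror the examples above, and the set is an illustrative excerpt, not the complete list from [97].

// Category-based classification: whole bug patterns whose findings are treated
// as external. The set below is an illustrative excerpt.
import java.util.Set;

final class ExternalFindingCategories {
    private static final Set<String> EXTERNAL_BUG_PATTERNS = Set.of(
            "Null pointer dereference",
            "Nullcheck of value previously dereferenced",
            "Infinite recursive loop",
            "Impossible cast");

    /** A finding is external if its whole category/pattern indicates a likely fault. */
    static boolean isExternal(String bugPatternName) {
        return EXTERNAL_BUG_PATTERNS.contains(bugPatternName);
    }
}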

Whereas some Findbugs checks have a high correlation to actual faults, others cannot create a user-visible failure. We can draw this conclusion already based on the category description:

Uncallable method defined in anonymous classes (this code will not be executed in production)

Unwritten field (unused code)

However, classifying findings in some other Findbugs categories as external or internal is a grey zone. They could lead to user-visible failure, but this is much less probable. Examples comprise:

5http://findbugs.sourceforge.net/


Method ignores return value (If the return value indicates an error state and is ignored, the untreated error might cause a user-visible failure. If the return value just provides additional information, it may be safely ignored.)

Invocation of toString on an array (If this string is displayed in a UI, it might be visible for the user. However, if it is only used for logging, the user will not notice it.)

In our approach, we suggest to tackle the grey zone of medium-risk findings not with a black-and-white automated classification as external or internal, but with a manual readjustment of the selected analyses. Further, existing work has tried to classify Findbugs warnings as actionable or as severe, trying to capture the developers’ willingness to remove them [38, 54, 69, 78]. Even though research has made substantial progress, existing approaches reveal an ongoing need for manual inspection. We address this with our manual inspection step after the automatic recommendation (see Section 6.3).

Readjusting the Selected Set of External Analyses

How many of all risky findings should be recommended for manual inspection depends on the available time and budget. With its iterative application, our approach gives the opportunity to readjust the selection of the analyses and the classification of their findings per iteration (see Section 6.4). If the budget is low or the initial number of high-risk external findings is unexpectedly large, only the high-priority findings (e. g. the null pointer dereferences) should be recommended for manual inspection. After some iterations, when all null pointer dereferences have been resolved, the set of analyses can be refined to also include the medium-priority findings, e. g. ignored return values.

6.2.3 Recommending Internal Findings

In contrast to external findings, internal findings do not refer to a potential user-visible failure of the system, but reveal a risk for non-conformance costs from internal failure. Even though the removal of internal findings is likely less urgent than the removal of external findings, they are still highly relevant for maintenance. However, the benefit of their removal is hard to quantify for several reasons—as formalized in our cost model in Chapter 5. While it is possible to some degree to analyze the benefit of an internal finding that was removed in the past (see Chapter 5), prioritizing individual findings requires predicting the benefit of removing a specific finding (see Equation 5.6). This benefit is hard to predict because it mainly depends on future maintenance: only where future changes occur can the removal of internal findings pay off. However, our study showed that static analyses have a very limited ability to predict future maintenance (Section 5.5). We assume that the existing code base (and its history) generally cannot provide sufficient information to predict future changes, as these are usually derived from changing requirements.

Additionally, numerous developer interviews during our consultancies for long-lived software show that developers are harder to convince that internal findings should be removed: the missing short-term benefit of removing a current system failure reduces developers’ motivation; the long-term reward of better maintainability seems to be less convincing (see Equation 5.4). This impression is based on the interviews conducted during our consultancies and cannot be generalized to all of software engineering. However, the interviews were conducted in several different domains and the interviewees were not selected with any particular bias. We therefore believe that the results are representative for the development of long-lived software. On top, the number of internal findings is usually much larger than the number of external findings (see the quantitative study in Chapter 4), which frustrates developers.

While we cannot reliably predict the benefit of removal for internal findings, we can still optimize their costs of removal (see Equation 5.2). With low removal costs, it is easier to reduce the inhibition thresholds of developers and motivate them to remove internal findings that do not offer an immediate reward. Our cost model in Chapter 5 showed that removing internal findings is less costly when aligned to ongoing development: If a finding is removed in code that has to be changed anyway, overhead in code understanding and testing can be saved—performing a change request already comprises both. To better understand how developers perform changes and how software evolves, we studied structural software evolution and derived a change-based prioritization from it.

Changes during Software Evolution In Paper B [84], we studied the evolution of code structure, i. e. the organization of code in methods and files, to better understand how developers perform changes. The study shows that while files are changed frequently, many methods remain unchanged after their first introduction to the system. Removing internal findings in methods that will never be modified again does not create an expected reduction of non-conformance costs for modifiability (see Equation 5.7). Instead, it limits the expected cost reduction to better readability (a method might still be read for a change even if that change does not affect it directly).

Change-Based Prioritization As a consequence of the study results, we suggest removing findings in code when it is actually being modified (or newly added). We suggest this for several reasons: First, code being modified has to be understood, changed, and tested anyway. Removing findings in this code creates less overhead than removing findings in unchanged code and corresponds to the boy scout rule—“Always leave the campground cleaner than you found it” [66]. We assume the removal costs as modeled in Equation 5.2 are thereby reduced. While we have no scientific evidence for the concrete savings in removal costs with this change-based prioritization, we consider them plausible and based on common sense.

Second, the prioritization of findings in changed code has a slightly higher estimated benefit, because a changed method is slightly more likely to be changed again in the future than an arbitrary method (see study result in Section 5.5). This increase in the number of subsequent modifications contributes to the benefit of finding removal as depicted earlier in Equation 5.6.


QG4: No findings (perfect)
QG3: No findings in modified code (improving)
QG2: No new findings (preserving)
QG1: Any (indifferent)

Figure 6.2: Four Quality Goals

Quality Goals To apply the change-based prioritization, our approach introduces the definition of an underlying quality goal to match different systems’ needs: As systems differ, e. g., in their current development activity, their criticality, or their expected life span, the quality goal and the derived budget for quality assurance measures differ as well. Based on our industry experience, we believe that the four quality goals indifferent (QG1), preserving (QG2), improving (QG3), and perfect (QG4) are sufficient to cover the broad range of different systems’ needs.

In our approach, the quality goal has to be defined for each system. If subsystems have different quality assurance budgets, the quality goal can also be defined per subsystem and can be adapted over time (see the process in Section 6.5). Based on the quality goal, we define the change-based prioritization as follows (see also Figure 6.2):

QG1 (indifferent) Any quality is sufficient—no quality monitoring is conducted at all and findings do not have to be removed.

QG2 (preserving) No new quality findings—existing quality findings are tolerated, but new quality findings must not be created (or must be consolidated immediately).

QG3 (improving) No quality findings in modified code—a modified method ought to be without findings, leading to continuous quality improvement during further development, similar to the boy scout rule also used in extreme programming (“Always leave the campground cleaner than you found it” [66]).

QG4 (perfect) No quality findings at all—quality findings must not exist in the entire system.

To be precise, the concepts of new findings and modified code introduce the dimension of time into the recommendation process. To define when a finding is new or for how long code is considered modified, we require a time interval. As we integrate the finding prioritization iteratively into the development process, we obtain iteration intervals. We address this in more detail in Section 6.4.
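The decision rule behind the quality goals fits into a few lines of code. The following Java sketch shows one possible formulation; the Finding type and its flags are hypothetical simplifications, and in practice the flags are computed per iteration interval by the analysis tooling. External findings are always recommended regardless of the goal, as discussed for the overall recommendation later in this chapter.

// Change-based recommendation rule per quality goal (minimal sketch, assumed types).
enum QualityGoal { INDIFFERENT, PRESERVING, IMPROVING, PERFECT }

final class Finding {
    final boolean isExternal;      // risk of user-visible failure
    final boolean isNew;           // introduced within the last iteration
    final boolean inModifiedCode;  // located in code modified during the last iteration
    Finding(boolean isExternal, boolean isNew, boolean inModifiedCode) {
        this.isExternal = isExternal;
        this.isNew = isNew;
        this.inModifiedCode = inModifiedCode;
    }
}

final class ChangeBasedPrioritization {
    /** Decides whether a finding is recommended for manual inspection. */
    static boolean recommend(Finding f, QualityGoal goal) {
        if (f.isExternal) {
            return true;                      // external findings: always recommended
        }
        switch (goal) {
            case INDIFFERENT: return false;   // QG1: no quality monitoring
            case PRESERVING:  return f.isNew; // QG2: no new findings
            case IMPROVING:   return f.isNew || f.inModifiedCode; // QG3
            case PERFECT:     return true;    // QG4: no findings at all
            default:          return false;
        }
    }
}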


QG1 (indifferent) targets running systems for which no maintenance is conducted. Examples include prototypes, migration tools, or systems that will be replaced in the future. In the latter case, even though the new system does not run in production yet, no maintenance (or quality improvement) is performed anymore on the old system that is to be replaced. QG2 (preserving) and QG3 (improving) are designed for grown systems developed on the brown field. QG4 (perfect) is designed for a new system developed on the green field. When quality is monitored continuously from the beginning, a quality goal of no findings is realistically achievable (see Section 6.5).

The idea of these four quality goals has been used by the quality consulting company CQSE for several years. Hence, this key concept is not a major contribution of this thesis. Instead, the contribution of this thesis comprises its formalization, evaluation, and integration into an overall finding prioritization approach.

Highly Dynamic System Development In case of very active and highly dynamic development, i. e. frequent modification of the source code, the number of new findings or the number of findings in modified code can still be too large to schedule manual inspection. In this case, the approach recommends only those findings that are very easy to eliminate, i. e. that reveal low costs for the refactoring of the code. In our cost model, this corresponds to minimizing the modification costs c_modification in Equation 5.2. However, focusing on these findings is only a temporary, less-than-ideal solution to get started. To achieve the long-term quality goal, all findings have to be considered in the recommendation.

Focusing on the easy-to-remove findings has the benefit of providing developers a first starting point for the quality preserving or improving process: In case the change-based recommendation still leads to a very large number of findings, developers are likely to be overwhelmed or even frustrated by the amount of findings in their system. The low-cost-removal recommendation provides them an opportunity to start with the finding removal process in an easy and comfortable, yet useful manner.

In particular, we evaluated the low-cost-removal concept for code clones and long methods in Paper D, [86]: The study shows how to detect findings that are very easy to remove, i. e. removable with a fast refactoring. To mitigate the threat that we prioritize findings that are indeed easy to remove but provide no gain for the system’s maintainability, we also evaluated whether the developers consider the removal of these findings to be useful. The evaluation showed that–in most cases–developers would in fact perform the refactoring because they considered it to be an improvement for the readability or changeability of the code. Hence, the removal of the prioritized findings is not only easy to perform, but also useful to improve maintainability.

6.2.4 Summary

To summarize, Figure 6.3 gives an overview. The figure refines the automatic recommendation process from Figure 6.1 and displays its three steps. Subfigure 6.3a shows the recommendation process from an activity perspective. Subfigure 6.3b shows the same recommendation steps, but from the cost model perspective.


1. External findings are recommended for removal with highest priority: Regardless of the quality goal, we assume external findings are worth being manually inspected immediately due to the high benefit of their removal—they reveal a risk of current system failure. We assume removing a potential system failure outweighs the removal costs and has a much higher benefit than removing an internal finding (Equations 5.4 and 5.5). External findings are classified either based on their category (manually) or—in certain cases—based on the single finding occurrence (e. g. with an automatic classifier for inconsistent clones). As we have shown that external findings are usually small in number (Chapter 4) and easy to remove (Chapter 5), this prioritization is feasible.

2. While the benefit of removing internal findings is much harder to predict (see Chapter 5), the prioritization of internal findings aims to minimize removal costs. Internal findings are prioritized in a change-based manner depending on the quality goal: Under the preserving quality goal, the approach recommends removing the new findings. Under the improving quality goal, findings in recently modified code are additionally recommended. In Figure 6.3a, we refer to findings prioritized based on the quality goal as QG findings. The change-based prioritization reduces the removal costs as modeled in Equation 5.2.

3. For highly dynamic systems with an extensive number of changes to the code, the number of findings in modified code may still be too large to be inspected manually. In this case, the approach classifies the findings in a third step based on the cost of their refactorings and recommends those that can be easily removed with a fast refactoring. This minimizes the cost of modification, c_modification, as expressed in Equation 5.2.
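These three steps can be composed into a single recommendation pipeline. The following Java sketch reuses the Finding type and the change-based rule from the earlier sketch; the removal-cost estimate and the capacity threshold are illustrative assumptions rather than part of the approach itself.

// Three-step recommendation pipeline (sketch): external findings first, then QG
// findings, then an optional cut to the cheapest refactorings.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class RecommendationPipeline {
    static List<Finding> recommend(List<Finding> allFindings, QualityGoal goal,
                                   ToDoubleFunction<Finding> estimatedRemovalCost,
                                   int capacityForInternalFindings) {
        List<Finding> recommended = new ArrayList<>();
        List<Finding> internalQgFindings = new ArrayList<>();
        for (Finding finding : allFindings) {
            if (finding.isExternal) {
                recommended.add(finding);                       // step 1: always recommend
            } else if (ChangeBasedPrioritization.recommend(finding, goal)) {
                internalQgFindings.add(finding);                // step 2: QG findings
            }
        }
        // Step 3 (optional): keep only the cheapest refactorings if there are too many.
        if (internalQgFindings.size() > capacityForInternalFindings) {
            internalQgFindings.sort(Comparator.comparingDouble(estimatedRemovalCost));
            internalQgFindings = internalQgFindings.subList(0, capacityForInternalFindings);
        }
        recommended.addAll(internalQgFindings);
        return recommended;
    }
}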

6.3 Manual Inspection

As false positives and non-relevant findings disturb and distract developers, they have to be eliminated from the set of automatically recommended findings. Therefore, our approach provides a manual component to accept or reject the recommended findings for removal.

For the elimination of the rejected findings, different mechanisms exist. As we manually inspect the findings individually, we suggest using blacklisting to eliminate single findings. Blacklisting denotes the option that many modern quality analysis tools provide to select a specific finding and remove it from the results so that it is not shown to the developer in the future.
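In its simplest form, a blacklist is a set of stable finding identifiers. The following Java sketch illustrates the idea; the fingerprint scheme (for example, analysis id plus normalized location) is an illustrative assumption, not the mechanism of any specific tool.

// Blacklisting sketch: rejected findings are suppressed in future iterations.
import java.util.HashSet;
import java.util.Set;

final class FindingBlacklist {
    private final Set<String> rejectedFingerprints = new HashSet<>();

    void reject(String findingFingerprint) {
        rejectedFingerprints.add(findingFingerprint);
    }

    /** Filters blacklisted findings out of the recommendation shown to developers. */
    boolean isVisible(String findingFingerprint) {
        return !rejectedFingerprints.contains(findingFingerprint);
    }
}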

However, sometimes, manual inspection reveals that entire groups of findings have to be blacklisted because they are located, for example, in unmaintained code that should not have been analyzed in the first place. This requires a reconfiguration of the analyses to exclude certain code regions. As our approach is applied iteratively, reconfiguring analyses is possible in-between two iterations. We address this in Section 6.4.


Figure 6.3: Three Steps of the Automatic Recommendation. (a) Activity perspective: all findings are differentiated into external and internal findings, QG findings are selected among the internal findings, low-removal-cost findings are optionally selected (in case the set of QG findings is still too large), and the resulting findings are recommended. (b) Cost model perspective: calculating is_external maximizes the benefit (cost_savings), selecting QG findings reduces the removal costs (cost_removal), and selecting low-removal-cost findings minimizes the refactoring costs (c_modification).

6.3.1 False Positives

Many automatic static code analyses come with a certain false positive rate and are, hence, not 100% precise. Repeating the example from Section 2.2, a data flow analysis might not be able to parse customized assertions: Developers use customized asserts to ensure that variables are not null. The assert handles the null case by, for example, throwing a customized exception. After the assert, the variable can be dereferenced safely. However, if the data flow analysis cannot parse the assert, it will mark the subsequent dereference as a possible null pointer dereference. This is a false positive, as the assertion guarantees that the variable is not null.
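The following Java listing shows the assertion pattern described above. A data flow analysis that does not know the project-specific assertNotNull will flag the dereference in process() as a possible null pointer dereference, although the assertion guarantees the non-null case. Class and method names are illustrative.

// Custom assertion pattern that typically causes false positives (sketch).
final class CustomAssertions {
    static void assertNotNull(Object value, String message) {
        if (value == null) {
            throw new IllegalStateException(message);  // customized exception instead of an NPE
        }
    }
}

final class OrderProcessor {
    void process(String orderId) {
        CustomAssertions.assertNotNull(orderId, "orderId must not be null");
        // Flagged by the analysis as a potential null dereference, but actually safe:
        int length = orderId.length();
        System.out.println("Processing order id of length " + length);
    }
}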

Even though analysis tools have improved significantly over the last years, false positives probably cannot be eliminated completely. In case of the above example, the data flow analysis could be extended and adapted to parse the customized assertions. However, as assertion frameworks evolve over time and vary greatly between different development teams, static analyses are likely to be a step behind during the evolution. Hence, the produced false positives require human inspection to be filtered out.


6.3.2 Non-Relevant Findings

In contrast to false positives, which are a problem of the analysis, the relevance of a finding depends on external context information: Even though a finding is technically a true positive, it might still not be relevant for removal under certain circumstances (as explained in Section 2.2). In Paper D, [86], we quantified which external context information developers consider when deciding if a finding is relevant enough to be removed—we ran the study exemplarily on long methods and code clones. For long methods, developers primarily considered the semantics of the code and its complexity. For code clones, they took into account the contextual reasons why certain redundancy was created on purpose.

As much of this context information (e. g. code semantics or historical reasons for redundancy) cannot be measured automatically, a human factor is necessary to decide whether a finding is relevant for removal. Our approach takes this into account as our prioritization is a two-step mechanism with an automated recommendation and a manual acceptance.

6.4 Iterative Application

To take the ongoing development and maintenance of the code base into account and avoid ad-hoc clean-up sessions, our approach integrates the automatic recommendation and manual inspection iteratively into the development process (see Figure 6.1). Each iteration aims at prioritizing and removing a certain number of findings to preserve (QG2) or step-wise improve (QG3) the code quality.

6.4.1 Parameters

The iterative application of the proposed prioritization can be adapted to each system with four parameters. All four parameters can be readjusted between iterations, see Section 6.4.3.

Analyzed Code The code being analyzed is the fundamental basis and has to be carefully selected: Different types of code are maintained differently and, hence, have a different impact on the non-conformance costs of a system:

Application code Our approach focuses on the application code: As application code runs in production, its findings are likely to create non-conformance costs: External findings in the application code can cause a system failure and internal findings are likely to hamper future change requests.

Test code The influence of test code on the system’s non-conformance costs is different from the influence of the application code. As test code does not run in production, its external findings, for example, cannot create a system failure. Its internal findings may, however, impact the costs of a change request, as modifying application code usually implies also modifying test code. For example, Athanasiou et al. showed that test code quality is positively correlated with throughput and productivity issue handling metrics [5]. Whether the same quality criteria hold for test code as for application code is frequently debated amongst developers. Recent research has shown that static analyses are an important quality assurance technique for test code as well, yet different analyses are better suited [5, 94]. Our approach offers a flexible mechanism to handle test code: For each system, it can be decided individually whether test code is included. Additionally, different static analyses can be applied to test code than to application code. Yet, the change-based prioritization holds for all revealed findings.

Generated code This code should be excluded in our approach: if the quality of generated code is to be assessed, the input of the generator should be analyzed rather than its generated output.

Additionally, both in application and test code (if included in the approach), unmaintained code should be identified and also excluded from the analysis—findings in unmaintained code disturb and distract developers unnecessarily.
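In tooling, this scoping typically boils down to include and exclude patterns per system. The following Java sketch illustrates such a configuration; the class name and the path patterns are illustrative placeholders, not the configuration format of any specific tool.

// Per-system analysis scope (sketch): application code always, test code optionally,
// generated and unmaintained code never.
import java.util.List;

final class AnalysisScope {
    final List<String> includedPaths;
    final List<String> excludedPaths;
    AnalysisScope(List<String> includedPaths, List<String> excludedPaths) {
        this.includedPaths = includedPaths;
        this.excludedPaths = excludedPaths;
    }

    static AnalysisScope forSystem(boolean includeTestCode) {
        List<String> included = includeTestCode
                ? List.of("src/main/**", "src/test/**")
                : List.of("src/main/**");
        // Generated and unmaintained code never enters the analysis (assumed paths).
        List<String> excluded = List.of("src/generated/**", "src/legacy-unmaintained/**");
        return new AnalysisScope(included, excluded);
    }
}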

Analysis Set Before the first iteration, the set of static analyses has to be defined. The set of chosen analyses defines the quality aspects of the system that are addressed (see Chapter 3). In particular, to detect external findings, choosing analyses appropriate for each domain is crucial (see Section 6.2.2).

Iteration Length The length of an iteration and the resulting number of iterations per year depend on the system’s current development activity and, hence, on the number of changes made to the code base: Highly dynamic systems with lots of development activities should have shorter iteration periods (e. g. an iteration length of one or two months) than systems with very little development (e. g. with only one or two iterations per year). Further, the iteration length can be adapted to the underlying development model: if the team uses scrum, for example, finding prioritization can be aligned to sprint planning.

Quality Goal The prioritization is based on the fundamental concept of a quality goal as explained in Section 6.2. The quality goal has to be chosen for each system before the first iteration. Both QG1 and QG4 require no prioritization as either no (QG1) or all (QG4) findings have to be removed. Hence, the iterative prioritization targets the preserving and improving quality goals (QG2, QG3).

6.4.2 Prioritization per Iteration

The iterative application starts with running the predefined set of static analyses on the current version of the system, resulting in a set of current findings. At the beginning of the first iteration, all detected findings are marked as existing. During the first iteration, developers are only recommended to follow the underlying quality goal, i. e. consolidate all new findings (preserving) or clean up modified code (improving). As the recommendation depends on the concept of findings in new and modified code, the recommendation cannot be performed until the second iteration.

At the beginning of the second (and any further) iteration, the set of static analyses is run again and the prioritization mechanism is applied: All external findings are recommended for inspection; internal findings are recommended based on the quality goal: all findings that are new, i. e. introduced within the previous iteration, and/or all findings in modified code, i. e. code that was modified during the last iteration, are recommended for manual inspection. If the set of findings in modified code is still too large to be handled within this iteration, the classification of low-removal-cost findings is applied (see Section 6.2.3).
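The per-iteration bookkeeping can be sketched as follows. The fingerprint-based comparison against the previous iteration and the set of files modified since then are assumptions about how the tooling provides this information; the sketch covers only the preserving and improving goals and reuses the QualityGoal enum from the earlier sketch.

// Iteration bookkeeping (sketch): baseline in the first iteration, recommendation
// of new findings (and, for QG3, findings in modified code) afterwards.
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TrackedFinding {
    final String fingerprint;
    final String file;
    final boolean isExternal;
    TrackedFinding(String fingerprint, String file, boolean isExternal) {
        this.fingerprint = fingerprint;
        this.file = file;
        this.isExternal = isExternal;
    }
}

final class IterationState {
    private Set<String> existingFingerprints = new HashSet<>();
    private boolean firstIteration = true;

    List<TrackedFinding> startIteration(List<TrackedFinding> currentFindings,
                                        Set<String> filesModifiedSinceLastIteration,
                                        QualityGoal goal) {
        List<TrackedFinding> recommended;
        if (firstIteration) {
            recommended = List.of();  // first iteration: only establish the baseline
        } else {
            recommended = currentFindings.stream()
                    .filter(f -> f.isExternal
                            || !existingFingerprints.contains(f.fingerprint)          // new finding
                            || (goal == QualityGoal.IMPROVING
                                && filesModifiedSinceLastIteration.contains(f.file))) // modified code
                    .collect(Collectors.toList());
        }
        existingFingerprints = currentFindings.stream()
                .map(f -> f.fingerprint).collect(Collectors.toSet());
        firstIteration = false;
        return recommended;
    }
}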

6.4.3 Adjusting the Parameters

In between two iterations, the parameters can be readjusted to take changed circumstances into account, e. g. a new static analysis, an increased development speed, or a cut in the quality assurance budget.

Analyzed Code In a well-structured system that clearly separates application, test, and generated code, and that has no unmaintained code, the analyzed code base usually does not have to be readjusted. However, based on our industrial experience, manual inspection does occasionally reveal unmaintained code that developers have forgotten about. This code can be removed from the analyzed code base in later iterations.

Analysis Set If a new static analysis is available or an existing analysis is discarded by the developers, the set of analyses can be expanded or reduced: In case an analysis is discarded, all its corresponding findings are neglected in future iterations. In case an analysis is added at the beginning of an iteration, all of its newly detected findings are marked as existing for the given iteration. Starting with the subsequent iteration, findings from this analysis can be treated like all other findings—they can be marked as a new finding or a finding in modified code and recommended accordingly.

Iteration Length Occasionally, development circumstances change and the budget for quality assurance is reduced, e. g. when developers leaving the team reduce the development capacity or a team transition absorbs management attention. The iteration length provides the opportunity to readjust the effort spent on finding removal: For example, when a system undergoes a team transition, the iteration length can be increased during the transition period and decreased after the new team has settled down.

Quality Goal The quality goal can be readjusted at system level: Increasing the quality goal (e. g. from QG2 to QG3) will result in a larger set of prioritized findings, decreasing it will have the opposite effect. It can also be redefined for each system component: For example, a newly developed component can receive QG4 (perfect) while critical components are put under QG3 and minor components under QG2.


6.5 Quality Control Process

The main contribution of our approach lies in the finding prioritization and its iterative integration. However, based on our industry experience, prioritizing findings for removal alone is not sufficient to lead to long-term quality improvement [85]. Under increasing time pressure, the implementation of new features often dominates quality assurance and static analyses are often omitted. Developers require not just the finding prioritization but also the necessary time and budget to perform refactorings. Hence, additional management awareness is needed. To get developers, quality engineers, and management on board, we describe a continuous control process to achieve the necessary management attention.

6.5.1 Task Management

First, we propose a task management to ensure that the prioritized findings are scheduled for removal within the development process. For each prioritized finding, a task is created to keep track of its removal status. If finding removal is organized within the same issue tracking system as change requests, chances are smaller that the usage of static analyses is forgotten. Tasks can easily be integrated into planning for agile development (e. g. scrum), but are also useful in non-agile processes (e. g. the spiral model).

Figure 6.4 shows the task management combined with the iterative prioritization mechanism: For each recommended and manually accepted finding, a task is created for the developers that contains the finding’s location and, optionally, a possible solution how to remove the finding. The solution can be, for example, a refactoring suggestion in case of a maintainability finding or a bug fix in case of a correctness finding pointing to a programming fault. A dedicated quality manager or the development team itself may, for example, create these suggestions. Before the beginning of the next iteration, the implemented solution for each task is checked: the task is marked either as solved or moved to the task backlog to be rescheduled if it was not solved sufficiently.
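The task lifecycle described above can be sketched in a few lines. Field names, the optional solution hint, and the backlog representation are illustrative assumptions, not the data model of a specific issue tracker.

// Task lifecycle sketch: one task per accepted finding, evaluated per iteration.
import java.util.ArrayDeque;
import java.util.Deque;

final class FindingTask {
    enum Status { OPEN, SOLVED, PENDING }
    final String findingLocation;
    final String suggestedSolution;  // e.g. a refactoring suggestion or a bug-fix hint
    Status status = Status.OPEN;
    FindingTask(String findingLocation, String suggestedSolution) {
        this.findingLocation = findingLocation;
        this.suggestedSolution = suggestedSolution;
    }
}

final class TaskBoard {
    private final Deque<FindingTask> backlog = new ArrayDeque<>();

    /** Called before the next iteration: solved tasks are closed, others rescheduled. */
    void evaluate(FindingTask task, boolean solvedSufficiently) {
        if (solvedSufficiently) {
            task.status = FindingTask.Status.SOLVED;
        } else {
            task.status = FindingTask.Status.PENDING;
            backlog.add(task);  // rescheduled in a later iteration
        }
    }
}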

6.5.2 Management Awareness

Second, to ensure that the tasks for finding removal get enough attention and budget under the daily pressure of new incoming change requests, management awareness is required. Even with prioritized findings and manageable tasks, developers cannot work off the tasks if they are not provided the necessary resources. To decrease the number of findings in the long term, a continuous quality control process is necessary. The process must involve project managers, quality engineers, and developers. It enhances the proposed finding prioritization and the task management with regular quality reports after each iteration that are forwarded to management. The reports create awareness for the relevance of quality assurance activities at management level as well as transparency—transparency about whether budget is missing to actually remove findings and whether the provided budget actually leads to a decreasing findings trend. More details about the task management and the control process can be found in Section 7.5, containing Paper E, [85].


Figure 6.4: Complete Approach with Task Management. The static analyses are run on the system; all findings pass through the automatic recommendation and the manual inspection, and for each accepted finding a task is created. Tasks are solved during development and evaluated as solved or pending; pending tasks are moved to the task backlog.

6.5.3 Tool Support

Third, to make the process applicable in reality, efficient tool support is crucial. In particular, to apply the prioritization iteratively, a finding tracking mechanism is necessary to track findings during evolution: the finding tracking should reliably mark a finding as new only if it was newly introduced to the system, not if it changed its location—for example, because a method was extracted, a method was moved to a different file, or a file was renamed or moved. Otherwise, if existing findings are falsely marked as new, they will disturb and distract developers and make them more reluctant to use static analyses routinely.

Many existing tools analyze only a snapshot of a system. Thus, they are not designed to provide a distinction between new and old findings. Teamscale, a tool developed by our company, however, analyzes the full history and, thus, provides the fundamental, time-based data for such a distinction. As a technical contribution of this thesis, we implemented a finding tracking algorithm in Teamscale (Section 7.1): In Papers A and B, we provide an origin analysis to track methods and files during evolution. If methods and files can be tracked through refactorings such as moves, renames, and extractions, tracking the findings contained in them becomes easier. This enables Teamscale to accurately differentiate between new and old findings.
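The following Java sketch illustrates the intended behavior of such a finding tracking: a finding is only marked as new if neither its current location nor the reconstructed origin of that location carried the finding before. The fingerprint scheme and the origin lookup are hypothetical placeholders for the origin analysis of Papers A and B, not Teamscale's actual implementation.

// Finding-tracking sketch: moved or renamed code must not make old findings look new.
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class FindingTracker {
    private final Map<String, String> originOfLocation; // current location -> previous location
    private final Set<String> previousFindingKeys;      // "location|analysisId" of the last revision

    FindingTracker(Map<String, String> originOfLocation, Set<String> previousFindingKeys) {
        this.originOfLocation = originOfLocation;
        this.previousFindingKeys = previousFindingKeys;
    }

    boolean isNewFinding(String currentLocation, String analysisId) {
        if (previousFindingKeys.contains(currentLocation + "|" + analysisId)) {
            return false;  // finding already existed at the same location
        }
        // Follow the reconstructed origin of the location (move, rename, extraction).
        Optional<String> origin = Optional.ofNullable(originOfLocation.get(currentLocation));
        return origin.map(o -> !previousFindingKeys.contains(o + "|" + analysisId)).orElse(true);
    }
}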

Section 7.6 additionally addresses the tool support from the usability side. In Paper F, [40], we evaluated to what extent the tool Teamscale supports the quality control process and whether the users are satisfied with the provided tool support.


6.6 Summary

In this chapter, we presented a prioritization approach based on our cost model (Chapter 5) to use static analyses effectively. In the following, we briefly summarize the approach.

Our approach automatically recommends a subset of the findings revealed by static analyses for removal. Hence, it reduces the number of findings to a feasible size. For the recommendation, the approach differentiates between external and internal findings, taking into account that these two kinds of findings have a different benefit of removal: As external findings reveal the short-term benefit of addressing a current, potentially user-visible and costly system failure, they are prioritized the highest. As they are usually small in number (Chapter 4) and easy to fix (Chapter 5), their removal is feasible in each iteration.

In contrast, the benefit of removing internal findings is much harder to predict. As future maintenance is usually derived from changing requirements, we assume that existing artifacts provide limited input for the automated prediction of future maintenance (Chapter 5). While we cannot reliably quantify the benefit of removal for internal findings automatically, we can yet estimate their costs of removal: With low removal costs, it is easier to reduce the inhibition thresholds of developers and motivate them to remove internal findings that do not reveal an immediate reward. Hence, we closely align the prioritization with ongoing development and recommend removing internal findings only in code developers are changing (or adding) anyway. This saves overhead costs of finding removal in terms of code understanding and testing, as modifying code due to a change request already comprises both. As our study on software changes showed, this prioritization in fact reduces the number of internal findings significantly, as many existing methods remain unchanged throughout their life cycle. If the number of internal findings in new code is still too large, our approach focuses initially only on those findings that are very easy to refactor.

After the automated recommendation, the approach contains a manual inspection for all recommended findings. Hence, false positives and non-relevant findings are filtered out. Only the manually accepted findings are prioritized for removal from the system.

As removing findings as a one-off event does not reduce non-conformance costs in the long term, the finding prioritization is applied iteratively. Hence, it integrates closely into the development process and takes into account that new findings can be created. Between two iterations, the prioritization can be adapted with several parameters to match the project’s time and budget planning.

To allocate enough resources for finding removal even when time pressure increases, we integrate the finding prioritization into a quality control process. The process contains a task management for each prioritized finding to reduce the chance that removing findings is omitted. Further, it suggests writing quality reports and regularly sending them to management. The reports provide transparency about the usage of the provided quality assurance budget. Finally, the approach also considers adequate tool support to make the process applicable in practice.


7 Publications

”To get to know, to discover, to publish—this is the destiny of a scientist.“ (François Arago)

In this chapter, we present the publications included in this thesis. Papers A–F are each briefly summarized within one page. The summary shows how each paper contributes to the approach in Chapter 6, see Figure 7.1. Papers A–E are first-author papers for which the first author claims full contribution to study design, implementation, evaluation, and writing of the paper. Paper F is a second-author publication with contributions to the study evaluation, the creation of the demonstration video, and most of the writing, but only minor parts of the implementation.


Problem understanding: 1) Understanding the state of the practice (analysis of findings revealed in industry); 2) Analyzing costs and benefits (conceptual cost model for finding removal).

Fundamentals: 3) Creating a strategy to separate external from internal findings (detection strategy for external findings to maximize benefit, Paper C); 4) Creating the technical foundation (origin analysis for reverse engineering complete software histories, Paper A); 5) Understanding software change (a study of changes in software histories, Paper B); 6) Deriving a change-based prioritization strategy for internal findings (prioritization strategy for internal findings to minimize removal costs, Paper D).

Solution: 7) Integrating the strategies into a prioritization approach (a cost-benefit-based prioritization).

Application in industry: 8) Evaluating process support (quality control process, Paper E); 9) Evaluating tooling support (Teamscale evaluation, Paper F).

Figure 7.1: Integration of Publications


7.1 Origin Analysis of Software Evolution

The paper “Incremental Origin Analysis of Source Code Files” (Paper A, [89]) provides the technical foundation for the evolution case study in Paper B and the tool support described in Paper F. Even though version control systems are widely used, the history of software is often not accurately recorded: The paper shows that up to 38% of the files have at least one move in their history that was not recorded in the version control system. However, a complete file history is the fundamental basis for any history analysis: It enables the calculation of accurate quality metric trends and precise findings tracking. With the origin analysis of this paper, we are able to reconstruct the complete history of a source code file, even if the file was refactored, e. g. renamed, copied, moved, or split, or if parts of the repository were moved.

Goal The paper aims to provide an incremental, i. e. commit-based, origin analysis for source code files to reconstruct the complete history of a file.

Approach For every commit in the system’s history, our approach determines for each added file whether it was moved or copied from a file in the previous revision. To detect moves and copies which are not explicitly recorded in the version control system, we provide a clone-based and a name-and-location-based heuristic.

Evaluation We evaluate the approach in terms of correctness, completeness, performance, and relevance with a case study among seven open source systems and a developer survey.

Results With a precision of 97-99%, the approach provides accurate information for any evolution-based static code analysis. A recall of 98% for almost all systems indicates that the heuristics can reliably reconstruct any missing information. The case study showed that our approach reveals a significant amount of otherwise missing information. On systems such as ArgoUML, for example, 38.9% of the files had an incomplete history as recorded in the repository. Without our approach, current static code analyses that rely on file mapping during evolution would have made an error on more than every third file. In contrast to many state-of-the-art tools, our approach is able to analyze every single commit even in large histories of ten thousand revisions in feasible runtime.

Thesis Contribution The origin analysis is a fundamental technical basis for this thesis: Its implementation is part of the tool Teamscale that was evaluated in Paper F. It enables accurate finding tracking and, hence, helps developers to accurately differentiate between new and old findings. This differentiation is necessary for the change-based prioritization of our recommendation approach as described in Section 6.2.3. Further, the file-based origin analysis was adapted and extended to a method-based origin analysis for Paper B.
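The sequential detection summarized in the approach above (explicit version control information first, then a clone-based heuristic, then a name-and-location-based heuristic) can be sketched as follows. All helper interfaces are hypothetical simplifications of the implementation evaluated in Paper A.

// Sequential origin detection for files added in a commit (sketch).
import java.util.Optional;

interface OriginHeuristic {
    /** Returns the path of the most likely predecessor file in revision n-1, if any. */
    Optional<String> findOrigin(String addedFilePath, long revision);
}

final class OriginAnalysis {
    private final OriginHeuristic explicitVcsInfo;        // svn move / svn copy records
    private final OriginHeuristic cloneHeuristic;         // content similarity via a clone index
    private final OriginHeuristic nameLocationHeuristic;  // same name, similar path

    OriginAnalysis(OriginHeuristic explicitVcsInfo, OriginHeuristic cloneHeuristic,
                   OriginHeuristic nameLocationHeuristic) {
        this.explicitVcsInfo = explicitVcsInfo;
        this.cloneHeuristic = cloneHeuristic;
        this.nameLocationHeuristic = nameLocationHeuristic;
    }

    Optional<String> originOf(String addedFilePath, long revision) {
        // The most reliable source is consulted first; heuristics complement it.
        return explicitVcsInfo.findOrigin(addedFilePath, revision)
                .or(() -> cloneHeuristic.findOrigin(addedFilePath, revision))
                .or(() -> nameLocationHeuristic.findOrigin(addedFilePath, revision));
    }
}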

Incremental Origin Analysis of Source Code Files∗

Daniela Steidl Benjamin Hummel Elmar Juergens [email protected] [email protected] [email protected] CQSE GmbH Garching b. München Germany

ABSTRACT

The history of software systems tracked by version control systems is often incomplete because many file movements are not recorded. However, static code analyses that mine the file history, such as change frequency or code churn, produce precise results only if the complete history of a source code file is available. In this paper, we show that up to 38.9% of the files in open source systems have an incomplete history, and we propose an incremental, commit-based approach to reconstruct the history based on clone information and name similarity. With this approach, the history of a file can be reconstructed across repository boundaries and thus provides accurate information for any source code analysis. We evaluate the approach in terms of correctness, completeness, performance, and relevance with a case study among seven open source systems and a developer survey.

Categories and Subject Descriptors
D.2.8 [Software Engineering]: Metrics

General Terms
Algorithms

Keywords
Software Evolution, Origin Analysis, Clone Detection

1. INTRODUCTION

Software development relies on version control systems (VCS) to track modifications of the system over the time. During development, the system history provides the opportunity to inspect changes in a file, find previous authors, or recover deleted code. For source code analyses, such as quality or maintenance assessments, the history of a system makes it possible to evaluate the software evolution and to discover metric trends. Many analyses depend on the complete history of source code files. For example, the clone community evaluates the impact of clones on maintenance by measuring the stability of cloned code in terms of age since the last change [21] or as change frequency [8, 20]. The defect prediction community also relies on history information: Prominent approaches use code churn, code stability, change frequency, or the number of authors per file to predict software bugs [15, 23, 24]. Other analyses such as type change analysis, signature change analysis, instability analysis [16], or clone genealogy extractors [17] also rely on accurate program element mapping between two versions.

Unfortunately, the source code history of a file is often not complete because movements or copies of the file are not recorded in the repository. We refer to those moves as implicit moves in contrast to recorded, explicit moves. Lavoie [22] has shown that a significant portion of file movements are implicit. This is due to several different reasons: When moving a file, developers might not use the commands as provided by VCSs such as subversion (SVN), e. g., svn move. Sophisticated development environments such as Eclipse offer the refactoring rename which automatically records the rename as a move in the repository. When developers rename the file manually, though, or copy and paste it, the move/copy will not be recorded in the VCS. Furthermore, when projects are migrated from one type of VCS to another, move/copy information will be lost. For example, many open source projects have been previously managed with CVS (where no origin information was stored at all) and were later migrated to SVN, where the moves and copies in the history of CVS time periods remain missing. On top, if parts of a repository are moved across repository boundaries, this event is not recorded in the history and cannot be captured by any VCS. With our approach, we will show that up to 38.9% of the files in open source systems have an incomplete history.

∗This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant ”EvoCon, 01IS12034A”. The responsibility for this article lies with the authors.

Problem Statement. The history of source code files as recorded in VCSs is often incomplete and provides inaccurate information for history-based static code analyses.

Reconstructing a complete source code history requires an origin analysis for each file. Reverse engineering the history of a file is not trivial: If a file is moved or renamed, it may also be modified at the same time. Hence, a simple comparison of whether the file’s content remains the same is not sufficient. Current approaches that perform any kind of origin analysis, either on file level [7, 22] or function level [9, 10] are release-

42 based and are not run incrementally. All published case The work of [2] automatically identifies class-level refac- studies include only few releases of a software system as torings based on vector space co- sine similarity on class snap-shots and compare them to each other. Release-based identifiers. The approach aimed to identify cases of class origin analysis can not reverse-engineer the complete history replacement, split, merge, as well as factoring in and out of in feasible time, and thus provides only a coarse picture. features. Based on a case study with only one system, the Our approach, by contrast, works incrementally and can approach does not always identify a unique evolution path analyze every commit, even in histories with ten thousands per class. Instead, the authors manually inspected the results. of commits. Hence, our approach provides a fast and fine- Our work, in contrast, uses clone detection as suggested by grained origin analysis. the authors in their future work section, to obtain unique For every commit in the system’s history, our approach results which we evaluated in a large case study based on determines for an added file whether it was moved or copied repository and developer information. from a file in the previous revision. We extract repository in- Git [1] provides an internal heuristic mechanism to track formation to detect explicit moves and use heuristics based on renames and copies based on file similarity. To our best cloning, naming, and location information to detect implicit knowledge, there exists no scientific evaluation on Git’s heu- moves. We evaluate the approach in terms of correctness, ristic to which we could compare our approach. In contrast, completeness, performance, and relevance with a case study many users report on web forums such as stackoverflow.com with seven open source systems and a survey among develop- that Git fails to detect renames1 or becomes very slow when ers. Although the approach is generally independent of the the option to also detect copies is set. underlying version control system, we evaluated it only on Kpodjedo [19] addresses a similar problem with a different SVN-based systems. approach. Using snapshots of the Mozilla project, the authors reverse engineer a class diagram for each revision and apply Contribution. In this paper, we present an incremental, the ECGM algorithm (Error Correcting Graph Matching) commit-based approach using repository information as well to track classes through the evolution of the software. They as clone, naming, and location information to reconstruct do not aim to find the origin for each file though, but to the complete history of a file. discover a stable core of classes that have existed from the very first snapshot with no or little distortion. 2. RELATED WORK The work of [7] detects refactorings such as splitting a class and moving parts of it to the super- or subclass, merging We group related work to origin analyses by file-based, functionality from a class and its super- or subclass, moving function-based, and line-based origin analyses and code pro- functionality from a class to another class, or splitting meth- venance. [16] gives an overview of other program element ods. The refactorings are detected by changes in metrics of matching techniques in the current state of the art. the affected code – this relates to our origin analysis as they 2.1 File-based Origin Analysis also track movements of code. 
However, the authors admit that their approach is vulnerable to renaming: “Measuring The work by Lavoie et al., [22], is most similar to ours as the changes on a piece of code requires that one is capable of authors also focus on inferring the history of repository file identifying the same piece of code in a different version.” As modifications. They reverse engineer moves of files between the authors use only “names to anchor pieces of code”, they released versions of a system using nearest-neighbor clone cannot track metric changes in the same source code file if detection. They evaluate their approach on three open source the file name changed, leading to a large number of false systems Tomcat, JHotDraw, and Adempiere. The evaluation positives in their case study. Hence, our approach consti- compares the moves detected by their approach with commit tutes a solution to one of the drawbacks of the work by [7] information extracted from the repository. Our work differs by providing accurate renaming information. in three major aspects from the work by Lavoie: First, we also On a very loose end, the work of [14] also relates to our detect file copies, besides moves. Lavoie et al. do not consider work: it proposes a new merging algorithm for source code copied files, hence, our approach provides more information to documents and includes a rename detector, which can detect recover a complete history. Second, Lavoie et al. reconstruct renames of a file - which is also part of our origin analysis. file movements only between released versions of a system in However, the authors use their rename detector in quite a feasible time. With our approach, we analyze every single different context: They present a new merging algorithm that commit in the history and find moves at commit level still merges two different versions of the same file and can detect with feasible runtime. Hence, we provide more details for when one version is basically the same as the other except developers: When a developer accesses the history of a file, for a syntactical rename. Due to the context of merging two e. g., to view changes, review status, or commit authors, he files, the input to their algorithm is quite different to the needs the complete history on commit level and not just input of our algorithm. between released versions. Third, Lavoie et al. compare their results only against the repository oracle, although their 2.2 Function-based Origin Analysis approach reveals a significant amount of implicit moves. As the authors state, they were not able to thoroughly evaluate Several papers have been published to track the origin these implicit moves except for a plausibility check of a small of functions instead of files (classes). One could argue that sample performed by the authors themselves. From this aggregating the result of function tracking can solve the point of view, our work is an extension of Lavoie’s work: In problem of file tracking. However, we disagree for the several addition to an evaluation based on the repository information, reasons. First, there has been no scientific evaluation of how we conduct a survey amongst developers to evaluate implicit 1 http://stackoverflow.com/questions/14527257/ moves. Hence, we provide reliable information about the git-doesnt-recognize-renamed-and-modified-package-file, last overall precision of our approach. access 2013-12-03

43 many functions a file may differ over and still be considered the same file. Our approach is also based on thresholding, but we present a thorough evaluation that confirms the choice of thresholds. Second, even if the thresholds for aggregating function tracking were evaluated, our origin analysis would produce more flexible results: Our origin analysis is based on clone detection in flexible blocks of source code such that we can detect that a file is the same even if code was extracted into new methods, added to a method, or switched between methods. Third, relying on function tracking would make the analysis dependent on languages that contain functions. Our approach, by contrast, is language-independent and can be also applied to function-less code artefacts such as HTML, CSS, or XML. Godfrey et al. coined the term origin analysis in [9,10,25]. They detect merging and splitting of functions based on a call relation analysis as well as several attributes of the function entities themselves. Automatically detecting merging and splitting on file level remained an area of active research for the authors in [10] as this is currently done only manually in their tool. As merging files is part of our move detection, Figure 1: Metamodel of the history of a file. our work is an extension of their work. In [18], the authors propose a function mapping across revisions which is robust to renames of functions. Their 3. TERMINOLOGY algorithm is based on function similarity based on naming We use the following terms to model the history of a source information, incoming and outgoing calls, signature, lines of code file: In the simplest case, the history of a file is linear code, complexity, and cloning information from CCFinder. (Figure 1); a file is truly added – in the sense that the file The authors conclude that dominant similarity factors in- is really new and had no predecessor in the system – with clude the function name, the text diff and the outgoing call a specific name (e. g., A.java or A.c) at a specific location set. In contrast, complexity metrics and cloning informa- (e. g., a path) in revision n. Then the file may be edited in tion (CCFinder) were insignificant. Our work also relies on some revisions, and finally deleted (or kept until the head naming information, however, for our work also the cloning revision of the repository). The other cases which are more information provided a significant information gain. complex are the target of our detection (Figure 1): Rysselberghe [26] presents an approach to detect method moves during evolution using clone detection. However, • Move: A file is moved if it changes its location (its in contrast to our work, as the authors use an exact line path) in the system. We also consider a rename of the matching technique for clone detection, their approach cannot file name as a move of the file. Repositories represent handle method moves with identifier renaming. a move typically as a file that existed in revision n at some path, got deleted at this location in revision n + 1, and was added at a different location and/or with a 2.3 Line-based Origin Analysis different name in revision n + 1. Quite a few papers track the origin of a single code line [3,5] • Copy: A file is copied if the file existed in revision n with a similar goal–to provide accurate history information and n + 1 at some location (e. g., a path), and was for metric calculations on the software’s evolution. 
However, added at a second location in revision n + 1 without identifying changed source code lines cannot provide infor- being deleted at the original location. A copy may mation for the more general problem of file origin analysis include a name change for the new file. In that case, a for two reasons. First, it has not been evaluated yet how copy might also be a split of a file into two files. many source code lines in a file can change such that the • Merge: A file got merged if multiple files were unified file remains the “same” file. Second, tracking single lines of into one single file. The merged file got added in revi- code excludes a lot of context information. Many code lines sion n + 1 and multiple files from revision n are deleted are similar by chance, hence, aggregating these results to file in revision n + 1. As VCSs such as SVN cannot capture level creates unreliable results. merges at all, repository information about them is missing. 2.4 Code Provenance A moved file still has a linear, unique history (no branch Code provenance and bertillonage [6,11] constitute another or split, and a unique predecessor and successor for each form of origin analysis: It also determines where a code revision except those of the initial add and the final delete). artifact stems from, but either for copyright reasons (such as A copied file can have two sucessors but both the original illegally contained open-source code in commercial software) and the copied file have a unique predecessor. In case of a or for detection of out-dated libraries. Code provenance merge, the predecessors of the merged file are not unique. differs from our origin analysis as we detect the origin of The detection of moves and copies is conceptually the same a file only within the current software containing the file in our approach, as they differ only insofar as the original whereas code provenance considers a multiple project scope file is deleted during a move, whereas it is retained during for origin detection. a copy. Hence, we will not differentiate between the two

Although we use the terms branch and merge in this paper, they do not have the same meaning as branching and merging in the SVN language: For SVN, a branch is used to create multiple development lines for experimental purposes or different customer versions of the same product. A merge is used afterwards to unify changes on a branch and the main trunk. In our case, however, we limit our analysis to the trunk of the repository only. Still, merges as described above can occur, where both original files are part of the trunk and were consolidated into one file.

4. APPROACH
In this section, we present the design and implementation details of our approach to recover the history of a source code file. For each commit, the approach determines whether an added file had a predecessor in the previous revision.

4.1 Underlying Framework
The approach is implemented within the code-analysis framework Teamscale [4, 12] which performs code analyses incrementally and in a distributed manner for large-scale software systems. It allows to configure analyses that are triggered by each commit in the history of a system – the analyses then process the added, changed, and deleted files per commit and update the results incrementally. The framework provides repository connectors to many common VCSs such as SVN, Git, or TFS and an incremental clone detector [13].

4.2 Overview
The approach detects explicit and implicit moves, copies, and merges of a source code file using different sources of information – the repository information (when move information is explicitly recorded) and information from heuristics (when explicit information is missing). We suggest two different heuristics, a clone heuristic and a name-based heuristic. The clone heuristic uses clone detection as a measure of content similarity and aims to detect renames. We do not use simpler metrics such as a longest common subsequence algorithm to find content similarity because we also want to detect files as moves even if their code was restructured. The name-based heuristic uses primarily information about the name and path, but also includes the content similarity based on clone detection to avoid false positives. The name-based heuristic can only detect moves without rename. The heuristics are designed to complement each other.

The heuristics and the repository information can be applied sequentially in any order. All three sources of information are described in more detail in Sections 4.3–4.5. For now, we choose an exemplary sequence of using the repository information first – as this is the most reliable one – and then the clone heuristic, followed by the name heuristic. In this setup, the approach processes the set of added files for each revision n in the history as follows (the sequence is also sketched in the code fragment below):

1. Extract information about explicit moves, i. e., moves that were recorded by the VCS. Mark explicitly moved files as detected and remove them from the set.

2. For the remaining set, apply the clone detection to find files at revision n − 1 that have a very similar content. Mark the detected files and remove them from the set.

3. For the remaining set, apply the name-based heuristic. Mark the detected files and remove them from the set.

We do not know in advance whether it is best to use only one heuristic, or to combine them both (either name before clone or clone before name). Figure 2 visualizes the exemplary sequential approach as described above. We will evaluate the best ordering in Section 5.

Figure 2: Design of the approach.

4.3 Explicit Move Information
Partly, information about a file's origin is already explicitly recorded in the VCS. VCSs such as SVN, for example, provide commands for developers such as svn move, svn copy, or svn rename. Using these commands for a move of a file f from path p to path p′, SVN stores the information that p is the origin for f when added at location p′. Sophisticated development environments such as Eclipse offer rename refactorings or move commands that automatically record the origin information in the repository. The recorded information can then be extracted by our framework and stored as a detected move or copy.

4.4 Clone Heuristic
The clone heuristic processes all added files f ∈ F of revision n in its input set as follows: Using incremental clone detection, it determines the file from revision n − 1 which is the most similar to f. We use a clone index to retrieve possible origin candidates f′ ∈ F′ and a clone similarity metric to determine if f and any f′ are similar enough to be detected as a move. This metric is a version of the standard clone coverage.

Incremental Clone Index. We use parts of the token-based, incremental clone detection as described in [13]. However, any other incremental clone detector could be used as well. The clone detector splits each file into chunks which are sequences of l normalized tokens. The chunk length l is a parameter of the clone detection, as is the normalization. The chunks of all files are stored together with their MD5 hash value as a key in a clone index which is updated incrementally for each commit. With the MD5 hash value, we identify chunks in file f that were cloned from other files f′ which are, hence, considered to be the set F′ of possible origin candidates. With the information about cloned chunks, we define the clone similarity metric. (More details about the clone index and how to use it to retrieve clones are provided in [13].)

Calculating Clone Similarity. The clone similarity metric measures the similarity between two files: f is similar to f′ if the number of chunks in file f that also occur in f′ (based on the MD5 hash) relative to the total number of chunks in f is higher than a certain threshold θ1. Intuitively, this corresponds to: f is similar to f′ if at least a certain amount of code in f was cloned from f′. We pose a second constraint that also a certain amount θ2 of code in f′ still needs to be in f. Figure 3 shows two examples f and f′ that are similar based on θ1 = 0.7 and θ2 = 0.5.

If we had used only threshold θ1, then we would have, e. g., detected a very large file f′ as origin for a very small file f, if most of the content in f stems from f′. However, there are potentially many large files f′ that could be the predecessor for the small piece of code in f; hence, the origin analysis is error-prone to false positives. Consequently, we use two thresholds to ensure that also a certain amount of code from f′ is still in f.

Intuitively, one might set θ1 = θ2. This would allow to detect only moves/copies including extensions of the origin file – when the new file has additional code compared to its origin (Figure 3, right example). However, we also would like to detect extractions, i. e., a part of f′ is extracted into a new file f (Figure 3, left example). We observed those extractions frequently during preliminary experiments in practice and assume that they represent refactorings of existing code, when a new class is extracted or code is moved to a utility class. To also detect extractions, we set θ2 smaller than θ1. Setting θ2 smaller than θ1 still detects file extensions.

Move Detection. For the move detection, we calculate the similarity of each added file f and all files f′ ∈ F′ at revision n − 1 that have common chunks with f. The incremental clone index allows to retrieve these files efficiently. We mark the file f′ with the highest clone similarity to f as origin.

Parameters. We selected the following parameters:
Chunk length: We set the chunk length to 25 after preliminary experiments showing that setting the chunk length is a trade-off between performance and recall of the clone heuristic: The smaller the chunk length, i. e., the fewer tokens a chunk consists of, the higher the chance that there are hashing collisions in the clone index because shorter normalized token sequences appear more often in big systems. This makes the calculation of the clone similarity more expensive as it queries the index how many chunks from file f also appear in other files f′. On the other side, it holds that the larger the chunk size, the smaller the recall: A file consisting of few tokens, e. g., an interface with only two method declarations, does not form a chunk of the required length and will not appear in the clone index. As no clone similarity can be calculated, it cannot be detected as move.
Normalization: We chose a conservative token-based normalization, which does not normalize identifiers, type keywords, or string literals, but normalizes comments, delimiters, boolean, character, and number literals as well as visibility modifiers. We chose a conservative normalization as we are interested in content similarity rather than type-3 clones.
Thresholds: After preliminary experiments on the case study objects (Section 5), we set θ1 = 0.7 and θ2 = 0.6. The evaluation will show that those thresholds yield very good results independent of the underlying system.

Figure 3: Examples of a detected move/copy of file f′ to f with thresholds θ1 = 0.7 and θ2 = 0.5.

4.5 Name-based Heuristic
As a second heuristic, we use a name-based approach: To detect for each added file f ∈ F whether it was moved, the name-based heuristic performs the following two actions:

1. It extracts all files f′ ∈ F′ from the system that have the same name and sorts them according to their path similarity with f. (We use the term name for the file name, e. g., A.java, and path for the file location including its name, e. g., src/a/b/A.java.)

2. It chooses the f′ ∈ F′ with the most similar path and the most similar content using cloning information.

Sorting the paths. The heuristic extracts all files f′ ∈ F′ from the previous revision that have the same name and are therefore considered as origin candidates for moves. The origin candidates are sorted based on path similarity to f. For example, when f has the path /src/a/b/A.java and the two candidates are f′ = /src/a/c/A.java and f″ = /src/x/y/A.java, then f′ will be considered more similar. Similarity between two paths is calculated with the help of a diff algorithm: With f′ and f″ being compared to f, all three paths are split into their folders, using / as separator. The diff algorithm returns in how many folders f′ and f″ differ from f. Path f′ is more similar to f than f″ if it differs in fewer folders from f. If f′ and f″ differ in the same number of folders from f, the sorting is determined randomly.

Comparing the content. The heuristic iterates over all origin candidates, sorted according to descending similarity. Starting with the f′ with the most similar path to f, it compares the content of f′ and f: It calculates the clone similarity between f′ and f, which is the same metric as used in Section 4.4. To determine whether f′ and f are similar enough to be detected as move/copy we use thresholds θ1 = 0.5 and θ2 = 0.5. We determined both thresholds with preliminary experiments. In comparison to the clone heuristic, we use lower thresholds as we have the additional information that the name is the same and the paths are similar.

For this heuristic, we use a separate, in-memory clone index in which we only insert file f and the current candidate f′ of the iteration. Hence, we can choose a very small chunk size without losing performance drastically. After preliminary experiments, we set the chunk size to 5. As discussed in Section 4.4, the clone heuristic cannot detect moves and copies of very small files, e. g., interfaces or enums, as the chunk size needs to be large to ensure performance. The name heuristic addresses this problem by using the name information to reduce the possible origin candidates and applies the more precise clone detection with a small chunk size only on very few candidates. Hence, this heuristic can detect those small files that are missed by the clone heuristic.
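The chunk-based similarity decision of Sections 4.4 and 4.5 can be illustrated with a small sketch. It is a simplified reading of the metric that represents chunks by their hash values and ignores the incremental index; chunk_hashes and is_move_or_copy are illustrative names, not part of the framework.

# Illustrative sketch of chunking and the two-threshold clone similarity.
import hashlib

def chunk_hashes(normalized_tokens, chunk_length=25):
    # Consecutive (overlapping) sequences of chunk_length normalized tokens,
    # keyed by their MD5 hash as in the clone index.
    chunks = set()
    for i in range(len(normalized_tokens) - chunk_length + 1):
        chunk = "\x00".join(normalized_tokens[i:i + chunk_length])
        chunks.add(hashlib.md5(chunk.encode("utf-8")).hexdigest())
    return chunks

def is_move_or_copy(chunks_f, chunks_f_origin, theta1=0.7, theta2=0.6):
    # One plausible reading of the two constraints: enough of f stems from the
    # candidate origin (theta1) and enough of the origin is still contained in f (theta2).
    if not chunks_f or not chunks_f_origin:
        return False   # files too small to form chunks cannot be decided by this heuristic
    shared = chunks_f & chunks_f_origin
    return (len(shared) / len(chunks_f) >= theta1 and
            len(shared) / len(chunks_f_origin) >= theta2)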

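The folder-wise path comparison of the name-based heuristic can likewise be written down compactly. The helper below is an illustrative sketch that ignores the random tie-breaking mentioned above; it is not the implementation used in the tool.

# Illustrative folder-wise path distance used to sort origin candidates with the
# same file name: the fewer folders differ from the added file's path, the more
# similar the candidate.
import difflib

def path_distance(path_a, path_b):
    folders_a = path_a.split("/")
    folders_b = path_b.split("/")
    matcher = difflib.SequenceMatcher(a=folders_a, b=folders_b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(folders_a), len(folders_b)) - matched   # number of differing folders

def sort_candidates(added_path, candidate_paths):
    return sorted(candidate_paths, key=lambda p: path_distance(added_path, p))

# Example from the text: /src/a/c/A.java differs in fewer folders from
# /src/a/b/A.java than /src/x/y/A.java does, so it is sorted first.
print(sort_candidates("/src/a/b/A.java", ["/src/x/y/A.java", "/src/a/c/A.java"]))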
5. EVALUATION
The evaluation analyzes four aspects of the approach which will be addressed by one research question each: completeness, correctness, performance, and relevance in practice. Section 5.1 shows the research questions which are answered with one case study each (Sections 5.4–5.7).

5.1 Research Questions
RQ1 (Completeness): What is the recall of the heuristic approach? We evaluate if the heuristics retrieve all moves in the history of a software system. If the heuristics succeed to reconstruct the complete history of a source code file, they can provide accurate information for all static code analyses mining software histories.
RQ2 (Correctness): What is the precision of the approach? We determine how many false positives the approach produces, i. e., how many files were determined as move although being newly added to the system. To provide a correct foundation for evolution-mining static code analyses, the number of false positives should be very low.
RQ3 (Relevance): How many implicit moves does the approach detect? We evaluate how much history information our approach reconstructs. This indicates how many errors are made when applying static code analyses of software repositories without our approach.
RQ4 (Performance): What is the computation time of the algorithm? We measure how fast our approach analyzes large repositories with thousands of commits.

Ordering of the heuristics. We also evaluate whether it is best to use only the clone or only the name heuristic, or both in either ordering. The ordering influences correctness and completeness of the approach. As the heuristics are applied sequentially, the second heuristic processes only those files that were not detected as moves by the first heuristic. Hence, the first heuristic should have the highest precision because the second heuristic cannot correct an error of the first heuristic. The recall of the first heuristic is not the major decision criterion for the ordering as a low recall of the first heuristic can be compensated by the second heuristic.

5.2 Setup
For evaluation, we change the design of our algorithm (Figure 2) and remove the sequential execution of extracting explicit move information and applying the two heuristics. Instead, we execute all three in parallel. Figure 4 visualizes the evaluation setup. This gives us the opportunity to evaluate the results of both heuristics separately and use the information extracted from the repository as gold standard.

Figure 4: Setup for Evaluation.

5.3 Study Objects
Research questions 1–4 require different study objects. To evaluate recall and precision, we need systems with a large number of explicit moves, providing a solid gold standard. To show the relevance of our approach, we need systems with a large number of implicit moves. To select suitable case study objects, we run our setup on the systems shown in Table 1. All study objects use SVN as repository. Amongst others, the table shows the number of commits, which is usually smaller than the difference between end and start revision as, in big repositories with multiple projects, not all commits affect the specific project under evaluation. Running our analysis first reveals how many explicit and implicit moves each repository has. The number of implicit moves is the result of our heuristics, which might be error-prone, but still sufficient to select the study objects. Table 2 shows that ConQAT, ArgoUML, Subclipse, and Autofocus have the largest numbers of explicit moves. ArgoUML and Autofocus also have many implicit moves.

ConQAT has a large number of explicit and a small number of implicit moves, as all developers develop in Eclipse and all major moves within the repository were done with the svn move command. ArgoUML has a high number of explicit moves. However, the significant number of implicit moves results from a CVS to SVN migration. Most implicit moves are dated prior to revision 11199, the beginning of the migration.² Although Autofocus 3 is primarily developed in Eclipse, there is still a significant number of implicit moves. Our manual inspection reveals that some major moves of packages (such as restructuring the mira/ui/editor package or moves from the generator to the common package) were not recorded. For Subclipse, only a small number of implicit but a large number of explicit moves were found, matching our initial expectation that Subclipse developers pay attention to maintaining their repository history as they develop a tool with the purpose of repository integration into Eclipse. Vuze (former Azureus) is also a system that was migrated from CVS to SVN.³ Besides many implicit moves due to the migration, there are not many explicit moves in the new SVN. For all other systems, we do not have additional information about their history.

² An ArgoUML contributor states that this commit was done by the cvs2svn job and he himself did a manual cleanup in rev. 11242.
³ http://wiki.vuze.com/w/Azureus CVS

Table 1: Case Study Objects
Name            Language  Domain                           History [Years]  Size [LOC]  Revision Range  # Commits
ArgoUML         Java      UML modeling tool                25               370k        2-19910         11721
Autofocus 3     Java      Software Development Tool        3                787k        18-7142         4487
ConQAT          Java      Source Code Analysis Tool        3                402k        31999-45456     9309
jabRef          Java      Bibliography reference manager   8                130k        10-3681         1521
jHotDraw7       Java      Java graphics GUI framework      7                137k        270-783         435
Subclipse       Java      Eclipse Plugin for Subversion    5                147k        4-5735          2317
Vuze (azureus)  Java      P2P file sharing client          10               888k        43-28702        20762

Table 2: Explicit and Implicit Repository Moves
Name         Explicit  Implicit
ConQAT       8635      191
ArgoUML      3571      1358
Subclipse    573       38
Autofocus 3  3526      1135
JabRef       6         34
JHotDraw 7   298       175
Vuze         73        860

Table 3: Recall
System     Clone  Name  Clone-Name  Name-Clone
ConQAT     0.84   0.96  0.92        0.98
ArgoUML    0.89   0.97  0.97        0.98
Subclipse  0.91   0.98  0.99        0.99
Autofocus  0.73   0.82  0.85        0.86

5.4 Completeness (RQ1)
5.4.1 Design
To evaluate the completeness of the heuristics, we use the recall metric. We calculate the recall based on the information extracted from the repository, which gives us only an approximation of the true recall: Among all explicit moves, we calculate how many are found by our heuristics. However, the heuristics' true recall over a long history cannot be calculated as there is no source of information about how many files were moved. Even asking developers would not provide a complete picture if the history comprises several years. The experimental setup (Section 5.2) allows us to run the heuristics in parallel to extract the repository information. We chose those open source systems from Table 1 that contain the largest numbers of explicit moves, i. e., ConQAT, ArgoUML, Subclipse, and Autofocus. With their explicit move information, those systems create a solid gold standard for evaluation.

5.4.2 Results
Table 3 shows the recall for all possible orderings of the two heuristics (as explained in Subsection 5.1). The sequential execution of the name heuristic before the clone heuristic (annotated in the column header with Name-Clone) yields the most promising results, with a recall of above 98% for ConQAT, ArgoUML, and Subclipse. Only for the study object Autofocus, the recall is slightly lower with 86%. This is due to one limitation of the current approach: its failure to detect reverts in the repository. The approach can detect moves or copies only if the origin is present in the previous revision. If some part of the system was deleted and later readded (reverted), our approach cannot detect this. As Autofocus contains some reverts, the recall is slightly lower. The name heuristic itself already achieves a recall of above 82% for all systems, indicating that many moves are done without renaming.

5.5 Correctness (RQ2)
5.5.1 Design
The correctness of the approach is evaluated with the precision metric. The precision of the approach cannot be evaluated solely based on explicit move information as the repository does not provide any information whether implicit moves detected by a heuristic are true or false positives. Hence, we evaluate the precision two-fold: First, we calculate a lower bound for the precision based on the explicit move information, assuming that all implicit moves are false positives. This assumption is certainly not true, but it allows us to automatically calculate a lower bound and make a first claim. The calculation is done on the same study objects as in the completeness case study because it requires a large number of explicit moves. Second, in a survey, we gather developer ratings from ConQAT and Autofocus developers for a large sample of implicit moves and use them to approximate the true precision of the approach.

Challenges in Evaluating the Precision. Although the precision is mathematically well defined as the number of true positives divided by the sum of true and false positives, defining the precision of our approach is not trivial: In case of a merge as shown in Figure 1, the origin predecessor of an added file is not unique. However, our heuristics are designed such that they will output only one predecessor. Hence, in case of a merge, it is not decidable which is the correct predecessor. In the absence of explicit move information or a disagreement between the two heuristics, there is no automatic way to decide which result is a true or false positive. For calculating the lower bound of the precision, we assumed the disagreement to represent two false positives.

Developer survey. We chose the two study objects ConQAT and Autofocus because their developers were available for a survey. Table 4 shows the participation numbers as well as the average years of programming experience and the average years of programming for the specific project. As the participants had many years of programming experience and were core developers of the respective system under evaluation, we consider them to be suitable for evaluation.
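The recall and the lower bound for the precision can be computed directly from the two sets of moves. The sketch below is illustrative and assumes that explicit (VCS-recorded) and heuristically detected moves are given as sets of (added file, origin) pairs.

# Illustrative computation of the evaluation metrics:
# recall is measured against the explicitly recorded moves (the gold standard),
# the lower bound for precision pessimistically counts every implicit move as a false positive.
def recall(detected_moves, explicit_moves):
    if not explicit_moves:
        return 1.0
    return len(detected_moves & explicit_moves) / len(explicit_moves)

def precision_lower_bound(detected_moves, explicit_moves):
    if not detected_moves:
        return 1.0
    true_positives = len(detected_moves & explicit_moves)
    return true_positives / len(detected_moves)   # implicit moves count as false positives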

We showed the developers a statistical sample of implicit moves and of moves for which a heuristic led to a different result than recorded in the repository. We randomly sampled the moves such that the developers evaluated n moves detected by the clone heuristic, n by the name heuristic, and n by each of the sequential executions. However, as those four sample sets intersect, the developers rated less than 4n samples. For ConQAT, we set n = 40 and for Autofocus, we set n = 20. To evaluate a sample, developers were shown the diff of the file and its predecessor and were allowed to use any information from their IDE. The developers rated each move as correct or incorrect, indicating also their level of confidence from 0 (I am guessing) to 2 (I am sure). Each move was evaluated by at least three developers. In case of disagreement, we chose majority vote as decision criterion. As we expected most moves to be correct, we added m files with a randomly chosen predecessor in the survey to validate the developers' attention while filling out the survey. For ConQAT we chose m = 20, for Autofocus m = 10.

Table 4: Experience of Survey Subjects [years]
System     Developers  ∅ Program. Experience  ∅ Project Experience
ConQAT     6           13.8                   5.9
Autofocus  3           17                     4.3

Table 5: Lower Bound for Precision
System     Clone  Name  Clone-Name  Name-Clone
ConQAT     0.91   0.98  0.92        0.97
ArgoUML    0.73   0.76  0.71        0.72
Subclipse  0.92   0.98  0.99        0.99
Autofocus  0.75   0.74  0.72        0.72

Table 6: Precision based on Developers' Rating
System     Clone  Name  Clone-Name  Name-Clone
ConQAT     0.975  0.925 0.95        0.95
Autofocus  1.0    0.90  0.90        0.90

Table 7: Statistical Approximation of Precision
System     Clone  Name  Clone-Name  Name-Clone
ConQAT     0.997  0.998 0.996       0.998
Autofocus  1.0    0.97  0.97        0.97

5.5.2 Results
Lower bound. Table 5 shows a lower bound for the precision. The accuracy of the lower bound depends on the number of implicit moves as they were all assumed to be false positives. Hence, the more implicit moves a system has, the lower the lower bound. The systems ConQAT and Subclipse show a very high lower bound for the precision with above 90% for all scenarios. This means that among all explicit moves, the heuristics performed correctly in at least nine out of ten cases. The execution of the name before the clone heuristic, which has led to the highest recall, also leads to the highest lower bound of the precision (97% for ConQAT, 99% for Subclipse). For ArgoUML and Autofocus, the lower bound drops as both systems have a large number of implicit moves which were counted as false positives (see Table 2). However, even if all implicit moves were false positives, the precision would still be 71% or 72%, respectively.

Developers' Rating. Table 6 shows the precision for the sample moves in the survey based on the developers' rating. For both survey objects, the clone heuristic performs better than the name heuristic, which achieves a precision of 92.5% for ConQAT and 90% for Autofocus 3. Developers occasionally disagreed with the name heuristic for the following reasons: In both systems, there are architectural constraints that different parts of the system need to have one specific file that has the same name and almost the same context. In ConQAT, these are the Activator.java and BundleContext.java which appear exactly once for each ConQAT bundle. The developers commented that those samples were clones but not necessarily a copy, as the developers could not decide which predecessor is the right one. For Autofocus, developers commented on similar cases: Although the newly added class was very similar to the suggested predecessor, they did not consider it as a copy because it was added in a very different part of the system and should not share the history of the suggested predecessor.

For ConQAT, the clone heuristic achieves a precision of 97.5%, which is clearly above the precalculated lower bound of 91%. In some cases, the clone heuristic outputs a different result than the repository due to the following reason: In ConQAT's history, the developers performed two major moves by copying the system to a new location in multiple steps. Each step was recorded in the repository and did not change a file's content. The clone heuristic chose a file as predecessor which was in fact a pre-predecessor (an ancestor) according to the repository. In other words, the clone heuristic skipped one step of the copying process. However, developers still considered these results as correct. For Autofocus, the developers never disagreed with the clone heuristic; hence, it achieves a precision of 100%.

All samples with a random origin were evaluated as incorrect, showing that the participants evaluated with care and did not only answer correct for all cases. In terms of inter-rater agreement, the Autofocus developers gave the same answers on all samples; the ConQAT developers agreed except for two cases when the majority vote was applied. Considering the confidence on judging whether a file was moved or copied from the suggested predecessor, the Autofocus developers had an average confidence value (per developer) between 1.77 and 2.0. The ConQAT developers indicated on average a confidence between 1.86 and 2.0.

Statistical Approximation. We combine the results from the survey on implicit moves and the repository information on explicit moves to calculate a statistical, overall approximation of the heuristics' precision: For all moves that were detected by a heuristic and that were explicitly recorded in the VCS, we use a precision of 100% as we assume the explicit move information to be correct. For all implicit moves, we use the precision resulting from the developer survey. We weight both precisions by the relative frequency of explicit and implicit moves. Table 7 shows the final results. On both survey objects, the clone and the name-based heuristic achieve a very high precision of above 97% in all cases with only minor differences between a single execution of a heuristic or both heuristics in either order. As the best recall was achieved by the execution of the name heuristic before the clone heuristic and differences in precision are only minor, we conclude that this is the best execution sequence.
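The weighting behind Table 7 amounts to a frequency-weighted average of the two precision estimates. The fragment below is a minimal sketch of one plausible weighting; the paper's exact counting (e. g., which moves enter the weights for each ordering) may differ, so the result is only an approximation of the reported values.

# Illustrative weighted approximation of the overall precision: explicit moves are
# assumed to be correct (precision 1.0), implicit moves get the survey precision.
def approximate_precision(n_explicit, n_implicit, survey_precision):
    total = n_explicit + n_implicit
    return (n_explicit * 1.0 + n_implicit * survey_precision) / total

# Example call with the ConQAT move counts from Table 2 and a survey precision of 0.95.
print(round(approximate_precision(8635, 191, 0.95), 3))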

5.6 Relevance (RQ3)
5.6.1 Design
To show the relevance of our approach in practice, we evaluate for all systems in Table 1 how many implicit moves exist. We calculate the number of implicit moves for files in the head revision divided by the total number of head files to show the information gain of our approach. This fraction can be interpreted in two ways – either as the average number of moves per file in the head revision that would not have been detected without our approach, or as the percentage of all files in the head that had on average one undetected move in their history. This quantifies the error rate that current state-of-the-art repository analyses would have without our approach.

Table 8: Relevance
Name         Implicit head moves  Head files  Information gain
ConQAT       114                  3394        3.3%
ArgoUML      741                  1904        38.9%
Subclipse    31                   872         3.5%
Autofocus 3  1067                 4585        23.2%
JabRef       30                   628         4.7%
JHotDraw 7   145                  692         20.9%
Vuze         666                  3511        18.8%

5.6.2 Results
Table 8 shows the approach's information gain. The number of implicit moves found in head files is smaller than the overall number of implicit moves: files deleted during the history do not appear in the head but their moves still count for the overall number of implicit moves. The table shows that for some systems significant origin information would be lost without our approach: For ArgoUML, e. g., the information gain is 39%, which mainly results from the loss of history information during the migration from CVS to SVN. The same applies to Vuze, for which our approach yields an information gain of 19%. Without our approach, the origin information of the CVS history would have been lost in both cases. Also the information gain on other systems, such as jHotDraw (21%) or Autofocus (23%), reveals that our approach reconstructs a large amount of missing information. However, our approach's relevance is not limited to CVS-to-SVN conversions but also applies to any other VCS change. Naturally, the relevance of our approach varies with different systems. For well-maintained histories such as ConQAT or Subclipse, the information gain of our approach is less significant. As we restricted the information gain to files in the head, we ignored implicit moves for deleted files.

5.7 Performance (RQ4)
5.7.1 Design
We measure the algorithm's performance with the total execution time and the average execution time per commit. The experiments were run on a virtual server with 4 GB RAM and 4 CPU cores (Intel Xeon). The approach is parallelized only between projects but not within a project. We evaluate the performance of the experimental setup (running heuristics in parallel, Figure 4) which denotes an upper bound of the intended execution of the algorithm (running heuristics sequentially).

Table 9: Execution times of the algorithm
Name         Commits  Total [h]  ∅ per commit [s]
ConQAT       9309     2.1        0.83
ArgoUML      11721    9.6        2.9
Subclipse    2317     2.8        4.3
Autofocus 3  4487     6.5        5.2
JabRef       1521     2.4        5.6
JHotDraw 7   435      0.9        7.6
Vuze         20762    29.5       5.1

5.7.2 Results
Table 9 shows the total execution time and the average per commit. Many systems were analyzed in about two hours. For the longest-lived system (Vuze, 20762 commits in 10 years), the approach took a little more than a day to finish. The average time per commit ranged from below one second up to eight seconds per commit. We consider the execution times of the algorithm feasible for application in practice.

Compared to the reported performance of Lavoie's algorithm [22], our analysis runs significantly faster: To analyze two versions of a system, Lavoie reports between 24 and 600 seconds depending on the size of the system (between 240 kLoC and 1.2 MLoC). With our systems being in the same size range, we can compare two versions on average in at most 8 seconds (jHotDraw). Our approach benefits from the incremental clone detection. However, we did not conduct an explicit comparison to a non-incremental approach as this has already been done in [13].

6. DISCUSSION AND FUTURE WORK
After presenting the results, we discuss applications and limitations of our approach, deriving future work.

Applications. Our approach is designed to incrementally reconstruct the history of a source code file to provide complete history data. This is relevant for all static code analyses that mine the software evolution, such as type change analysis, signature change analysis, instability analysis [16], clone genealogy extractors, clone stability analyses, or defect prediction based on change frequency. Furthermore, for monitoring software quality continuously in long-lived projects, our approach enables to show a precise and complete history for all quality metrics such as structural metrics, clone coverage, comment ratio, or test coverage.

Limitations. Our approach cannot detect reverts when a file was readded from a prior revision. Future work might solve this problem by keeping all data in the clone index. However, it remains debatable whether a complete file's history should contain the revert or stop at the reverting add, as the developer had deleted the file on purpose.

We designed our approach for the main trunk of the repositories, excluding branches and tags. Including branches and tags into our approach requires a cross linking between the main trunk and other branches. Otherwise, the name-based heuristic produces wrong results when a file is added to a second branch because it will detect the file in the first branch to have a more similar path than the file in the trunk. Future work is required to establish the cross linking and to integrate this information with the heuristics.
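As a worked example of the information gain reported in Table 8, the measure is simply the ratio of implicit moves on head files to the number of head files:

# Information gain = implicit moves affecting head files / number of head files.
def information_gain(implicit_head_moves, head_files):
    return implicit_head_moves / head_files

print(f"{information_gain(741, 1904):.1%}")   # ArgoUML row of Table 8: about 38.9%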

Ongoing work comprises the comparison of our approach to the tracking mechanism of Git. For future work, we plan to conduct a quantitative and qualitative comparison.

7. THREATS TO VALIDITY
The recall is based only on the explicit move information and, hence, provides only an approximation of the real recall. However, to the best of our knowledge, there is no other way of evaluating the recall, as even developers would not be able to tell all moves, copies, and renames if the history of their software comprises several years. To make the recall approximation as accurate as possible, we chose systems with a large number of explicit moves in their history.

To get a statistical sample to approximate the overall precision of the approach, we conducted a developer survey for ConQAT and Autofocus. As ConQAT is our own system, one might argue that the developers do not provide accurate information to improve the results of this paper. However, the first author of this paper was not a developer of ConQAT and did not participate in the case study.

We evaluated the use of our tool only on SVN and Java. Generally, our approach is independent of the underlying VCS and programming language. It can also be used with other VCSs such as Git or TFS.

8. CONCLUSION
We presented an incremental, commit-based approach to reconstruct the complete history of a source code file, detecting origin changes such as renames, moves, and copies of a file during evolution. With a precision of 97–99%, the approach provides accurate information for any evolution-based static code analysis such as code churn or code stability. A recall of 98% for almost all systems indicates that the heuristics can reliably reconstruct any missing information. A case study on many open source systems showed that our approach succeeds to reveal a significant amount of otherwise missing information. On systems such as ArgoUML, for example, 38.9% of the files had an incomplete history as recorded in the repository. Without our approach, current static code analyses that rely on file mapping during evolution would have made an error on more than every third file. In contrast to many state-of-the-art tools, our approach is able to analyze every single commit even in large histories of ten thousand revisions in feasible runtime.

9. REFERENCES
[1] Git. http://www.git-scm.com/. [Online; accessed 2013-12-03].
[2] G. Antoniol, M. D. Penta, and E. Merlo. An Automatic Approach to Identify Class Evolution Discontinuities. In IWPSE'04, 2004.
[3] M. Asaduzzaman, C. Roy, K. Schneider, and M. D. Penta. LHDiff: A Language-Independent Hybrid Approach for Tracking Source Code Lines. In ICSM'13, 2013.
[4] V. Bauer, L. Heinemann, B. Hummel, E. Juergens, and M. Conradt. A Framework for Incremental Quality Analysis of Large Software Systems. In ICSM'12, 2012.
[5] G. Canfora, L. Cerulo, and M. D. Penta. Identifying Changed Source Code Lines from Version Repositories. In MSR'07, 2007.
[6] J. Davies, D. M. German, M. W. Godfrey, and A. Hindle. Software Bertillonage: Finding the Provenance of an Entity. In MSR'11, 2011.
[7] S. Demeyer, S. Ducasse, and O. Nierstrasz. Finding Refactorings via Change Metrics. In OOPSLA'00, 2000.
[8] N. Göde and J. Harder. Clone Stability. In CSMR'11, 2011.
[9] M. Godfrey and Q. Tu. Tracking Structural Evolution Using Origin Analysis. In IWPSE'02, 2002.
[10] M. Godfrey and L. Zou. Using Origin Analysis to Detect Merging and Splitting of Source Code Entities. IEEE Transactions on Software Engineering, 31(2), 2005.
[11] M. W. Godfrey, D. M. German, J. Davies, and A. Hindle. Determining the Provenance of Software Artifacts. In IWSC'11, 2011.
[12] L. Heinemann, B. Hummel, and D. Steidl. Teamscale: Software Quality Control in Real-Time. In ICSE'14, 2014.
[13] B. Hummel, E. Juergens, L. Heinemann, and M. Conradt. Index-Based Code Clone Detection: Incremental, Distributed, Scalable. In ICSM'10, 2010.
[14] J. Hunt and W. Tichy. Extensible Language-Aware Merging. In ICSM'02, 2002.
[15] T. M. Khoshgoftaar, E. B. Allen, N. Goel, A. Nandi, and J. McMullan. Detection of Software Modules with High Debug Code Churn in a Very Large Legacy System. In ISSRE'96, 1996.
[16] M. Kim and D. Notkin. Program Element Matching for Multi-version Program Analyses. In MSR'06, 2006.
[17] M. Kim, V. Sazawal, D. Notkin, and G. Murphy. An Empirical Study of Code Clone Genealogies. In FSE'05, 2005.
[18] S. Kim, K. Pan, and E. J. Whitehead, Jr. When Functions Change Their Names: Automatic Detection of Origin Relationships. In WCRE'05, 2005.
[19] S. Kpodjedo, F. Ricca, P. Galinier, and G. Antoniol. Recovering the Evolution Stable Part Using an ECGM Algorithm: Is There a Tunnel in Mozilla? In CSMR'09, 2009.
[20] J. Krinke. Is Cloned Code More Stable than Non-cloned Code? In SCAM'08, 2008.
[21] J. Krinke. Is Cloned Code Older than Non-cloned Code? In IWSC'11, 2011.
[22] T. Lavoie, F. Khomh, E. Merlo, and Y. Zou. Inferring Repository File Structure Modifications Using Nearest-Neighbor Clone Detection. In WCRE'12, 2012.
[23] R. Moser, W. Pedrycz, and G. Succi. A Comparative Analysis of the Efficiency of Change Metrics and Static Code Attributes for Defect Prediction. In ICSE'08, 2008.
[24] N. Nagappan and T. Ball. Use of Relative Code Churn Measures to Predict System Defect Density. In ICSE'05, 2005.
[25] Q. Tu and M. W. Godfrey. An Integrated Approach for Studying Architectural Evolution. In IWPC'02, 2002.
[26] F. Van Rysselberghe and S. Demeyer. Reconstruction of Successful Software Evolution Using Clone Detection. In IWPSE'03, 2003.

7 Publications

7.2 Analysis of Software Changes

The paper “How Do Java Methods Grow?” (Paper B, [84]) contributes to the automatic recommendation step of the approach. In particular, we derived the change-based prioritization from it.

Goal The paper aims to obtain a better understanding of how software systems and their methods grow over time. A better understanding of how methods are changed helps to schedule refactorings that are closely aligned with ongoing development.

Approach To obtain a better understanding of software evolution, the approach mines the history as recorded in the version control system. For each method, it records its length after any modification. To obtain accurate results, it is important to track methods even in the presence of refactorings such as method renames, parameter changes, file renames, or file moves. Therefore, the approach contains a fine-grained origin analysis for methods which was adopted from the file-based origin analysis in Paper A, [89].

Evaluation The paper comprises a case study of nine open source and one industry system. Each system has at least three years of history and reveals a strong growth in system size. In total, we analyzed 114,000 methods in a history of 72 years with about 61,000 commits.

Results The paper shows that the case study systems grow by adding new methods rather than through growth in existing methods. Many methods—about half of them—even remain unchanged after their initial commit. They reveal a strong growth only if modified frequently. These results revealed little variance across all projects. As a summary, the paper gives the recommendation for developers to focus refactoring effort on frequently modified code hot spots as well as on newly added code.

Thesis Contribution We use the results to derive the change-based prioritization for internal findings from it: As half of the methods remain unchanged in the analyzed history, refactoring them would have no benefit in terms of enhanced changeability (but only in enhanced readability). Hence, our change-based prioritization does not prioritize these methods, but focuses on methods that were modified or newly added in the last iteration (see Section 6.2.3). Removing findings in modified code reduces overhead costs for code understanding and retesting because performing a change request already comprises both. Additionally, modified methods have a slightly higher probability to be modified again (see study in Section 5.5) and, hence, developers are expected to benefit from the finding removal in terms of enhanced changeability.

How Do Java Methods Grow?

Daniela Steidl, Florian Deissenboeck CQSE GmbH Garching b. München, Germany {steidl, deissenboeck}@cqse.eu

Abstract—Overly long methods hamper the maintainability of software—they are hard to understand and to change, but also difficult to test, reuse, and profile. While technically there are many opportunities to refactor long methods, little is known about their origin and their evolution. It is unclear how much effort should be spent to refactor them and when this effort is spent best. To obtain a maintenance strategy, we need a better understanding of how software systems and their methods evolve. This paper presents an empirical case study on method growth in Java with nine open source and one industry system. We show that most methods do not increase their length significantly; in fact, about half of them remain unchanged after the initial commit. Instead, software systems grow by adding new methods rather than by modifying existing methods.

I. INTRODUCTION
Overly long methods hamper the maintainability of software—they are hard to understand and to change, but also to test, reuse, and profile. Eick et al. have already determined module size to be a risk factor for software quality and a "cause for concern" [1]. To make the code modular and easy to understand, methods should be kept small [2]. Small methods also enable more precise profiling information and are less error-prone, as Nagappan et al. have shown that method length correlates with post-release defects [3]. Additionally, the method length is a key source code property to assess maintainability in practice [4], [5].

Technically, we know how to refactor long methods—for example with the extract method refactoring which is well supported in many development environments. However, we do not know when to actually perform the refactorings as we barely have knowledge about: How are long methods created? How do they evolve?

If methods are born long but subsequently refactored frequently, their hampering impact on maintainability will diminish during evolution. In contrast, if overly long methods are subject to a broken window effect¹, more long methods might be created and they might continue to grow [7]. Both scenarios imply different maintenance strategies: In the first case, no additional refactoring effort has to be made as developers clean up on the fly. In the second case, refactoring effort should be made to prevent the broken window effect, e. g., by refactoring a long method immediately after the first commit.

To better know when to refactor overly long methods, we need a deeper understanding of how methods evolve. In a case study on the history of ten Java systems, we study how single methods and the overall systems grow. We track the length of methods during evolution and classify them as short, long, and very long methods based on thresholds. The conducted case study targets the following five research questions regarding the evolution of a single method (RQ1, RQ2, RQ3) and the entire system (RQ4, RQ5):

RQ1 Are long methods already born long? We investigate whether methods are already long or very long at their first commit to the version control system or whether they gradually grow into long/very long methods.

RQ2 How often is a method modified in its history? We examine which percentage of methods is frequently changed and which percentage remains unchanged since the initial commit.

RQ3 How likely does a method grow and become a long/very long method? We check if methods that are modified frequently are likelier to grow over time, or, in contrast, likely to be shortened eventually.

RQ4 How does a system grow—by adding new methods or by modifying existing methods? We build upon RQ2 to see how many changes affect existing methods and how many changes result in new methods.

RQ5 How does the distribution of code in short, long, and very long methods evolve? We investigate if more and more code accumulates in long and very long methods over time.

The contribution of this paper is two-fold: First, we give insights into how systems grow and, second, we provide a theory for when to refactor. As the case study reveals that about half the methods remain unchanged and software systems grow by adding new methods rather than through growth in existing methods, we suggest to focus refactoring effort on newly added methods and methods that are frequently changed.

This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant Q-Effekt, 01IS15003A. The responsibility for this article lies with the authors.
¹ The broken window theory is a criminological theory that can also be applied to refactoring: a dirty code base encourages developers to try to get away with 'quick and dirty' code changes, while they might be less inclined to do so in a clean code base [6], [7].

II. RELATED WORK
A. Method Evolution
Robles et al. analyze the software growth on function level combined with human factors [8]. However, the authors do not use any origin analysis to track methods during renames or moves. Instead, they exclude renamed/moved functions. The result of our RQ2 is a replication of their finding that most functions never change and, when they do, the number of modifications is correlated to the size—but our result is based on a thorough origin analysis. Whereas Robles et al. further study the social impact of developers in the project, we gain a deeper understanding about how systems and their methods grow to better know when methods should be refactored.

More generally, in [9], Girba and Ducasse provide a meta model for software evolution analysis. Based on their classification, we provide a history-centered approach to understand how methods evolve and how systems grow. However, the authors state that most history-centered approaches lack fine-grained semantical information. We address this limitation by enriching our case study with concrete examples from the source code. With the examples, we add semantic explanations for our observed method evolution patterns.

B. System Size and File Evolution
Godfrey and Tu analyze the evolution of the Linux kernel in terms of system and subsystem size and determine a super-linear growth [10]. This work is one of the first steps in analyzing software evolution, which we expand with an analysis at method level. In [11], the authors discover a set of source files which are changed unusually often (active files) and find that these files constitute only between 2–8% of the total system size, but contribute 20–40% of system file changes. The authors similarly mine code history and code changes. However, we gain a deeper understanding about the evolution of long methods rather than the impact of active files.

C. Origin Analysis
In [12], we showed that up to 38% of files in open source systems have at least one move or rename in their history which is not recorded in the repository. Hence, tracking code entities (files or methods) is important for the analysis of code evolution. We refer to this tracking as origin analysis.

In [13], [14], the authors present the tool Beagle which performs an origin analysis to track methods during evolution. This relates to our work, as we also conduct a detailed origin analysis. However, instead of using their origin analysis, we reused our file-based approach from [12] which has been evaluated thoroughly and was easily transferable to methods.

To analyze method evolution, the authors of [15] propose a sophisticated and thoroughly evaluated approach to detect renamings. Our origin analysis can detect renamings as well, however, at a different level: Whereas for us, detecting a renaming as an origin change suffices, the work of [15] goes further and can also classify the renaming and with which purpose it was performed. As we do not require these details, our work relies on a more simplistic approach.

In [16], the authors propose a change distilling algorithm to identify changes, including method origin changes. Their algorithm is based on abstract syntax trees whereas our work is based on naming and cloning information. As we do not need specific information about changes within a method body, we can rely on more simplistic heuristics to detect origin changes.

Van Rysselberghe and Demeyer [17] present an approach to detect method moves during evolution using clone detection. However, in contrast to our work, as the authors use an exact line matching technique for clone detection, their approach cannot handle method moves with identifier renaming.

D. Code Smell Evolution
Three groups of researchers examined the evolution of code smells and all of them reveal that code smells are often introduced when their enclosing method is initially added and that code smells are barely removed [18]–[20]. Our paper explains these findings partially: As most methods remain unchanged, it is not surprising that their code smells are not removed. For the specific code smell of long methods, our paper confirms that the smell is added with the first commit (many long methods are already born long) and barely removed (many long methods are not shortened).

III. TERMS AND DEFINITION
Commit and Revision. A commit is the developer's action to upload his changes to the version control system. A commit results in a new revision within the version control system.

Initial and Head Revision. We refer to the initial revision of a case study object as the first analyzed revision of the history and to the head revision as the last analyzed revision, respectively. Hence, the initial revision must not necessarily be the initial revision of the software development, and the head revision is likely not the actual head revision of the system.

Number of Statements. To measure the length of a method, we use the number of its statements. We normalize the formatting of the source code by putting every statement onto a single line. Then, we count the lines excluding empty lines and comments. The normalization eliminates the impact of different coding styles among different systems such as putting multiple statements onto a single line to make a method appear shorter.

Green, Yellow, and Red Methods. Based on two thresholds (see Subsection IV-C), we group methods into short, long, and very long entities, and color-code these groups as green (short), yellow (long), and red (very long) methods both in language and in visualization. A green (yellow/red) method is a method that is green (yellow/red) in the head revision.

Touches and Changes. We refer to the touches per method as the creation (the initial commit) plus the edits (modifications) of a method, and to the changes as the edits excluding the creation of the method—hence: #touches = #changes + 1.

TABLE I: Case Study Objects
Name       History [Years]  Size Initial [SLoC]  Size Head [SLoC]  Growth  Revision Range  Commits  Methods Analyzed  Yellow  Red
ArgoUML    15               9k                   177k              (plot)  2-19,910        11,721   12,449            5%      1%
af3        3                11k                  205k              (plot)  18-7,142        4,351    13,447            5%      1%
ConQAT     3                85k                  208k              (plot)  31,999-45,456   9,308    14,885            2%      0%
jabRef     8                3.8k                 156k              (plot)  10-3,681        1,546    8,198             10%     4%
jEdit      7                98k                  118k              (plot)  6,524-23,106    2,481    6,429             9%      2%
jHotDraw7  7                29k                  82k               (plot)  270-783         435      5,983             8%      1%
jMol       7                13k                  52k               (plot)  2-4,789         2,587    3,095             7%      2%
Subclipse  5                1k                   96k               (plot)  4-5,735         2,317    6,112             9%      2%
Vuze       10               7.5k                 550k              (plot)  43-28,702       20,676   29,403            8%      3%
Anony.     7                118k                 259k              (plot)  60-37,337       5,995    14,651            5%      1%
Sum        72               375.3k               1,912k                                    61,417   114,652
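The statement counting and the threshold-based color classification used throughout the paper can be stated directly. The helpers below are an illustrative sketch; the statement counting is reduced to counting non-empty, non-comment lines after normalization, which simplifies the tooling described in the paper.

# Illustrative classification of a method by its number of statements:
# short ("green") below 30 statements, long ("yellow") from 30 to 75,
# very long ("red") above 75 statements.
def classify(statement_count, long_threshold=30, very_long_threshold=75):
    if statement_count < long_threshold:
        return "green"
    if statement_count <= very_long_threshold:
        return "yellow"
    return "red"

def count_statements(normalized_method_lines):
    # Assumes one statement per line after normalization; empty lines and
    # comment lines are excluded, mirroring the counting rule described above.
    return sum(1 for line in normalized_method_lines
               if line.strip() and not line.strip().startswith("//"))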

IV. APPROACH
We analyze a method's evolution by measuring its length after any modification in history. We keep track of the lengths in a length vector. Based on the length vector, we investigate if the length keeps growing as the system size grows or, in contrast, if methods are frequently refactored.

A. Framework
We implemented our approach within the incremental analysis framework Teamscale [21]. Teamscale analyzes every single commit in the history of a software system and, hence, provides a fine-grained method evolution analysis. To obtain the complete history of a method, we use the fully qualified method name as key to keep track of a method during history: the fully qualified method name consists of the uniform path to the file, the method name, and all parameter types of the method. In case the fully qualified method name changes due to a refactoring, we conduct an origin analysis.

B. Origin Analysis
To obtain an accurate length measurement over time, tracking the method during refactorings and moves is important—which we refer to as origin analysis. If the origin analysis is inaccurate, the history of an entity will be lost after a method or file refactoring and distort our results. Our method-based origin analysis is adopted from the file-based origin analysis in [12] which analyzes each commit incrementally. In principle, this origin analysis detects origin changes by extracting possible origin candidates from the previous revision and using a clone-based content comparison to detect the most similar candidate based on thresholds. We evaluated it with manual random inspection by the authors on the case study data.

We use a total number of four heuristics which are executed for each commit on the added methods—an added method is a method with a fully qualified method name that did not exist in the previous revision. For each added method, the heuristics aim to determine if the method is truly new to the system or if the method existed in the previous revision with a different fully qualified method name. A method that is copied is not considered to be new. Instead, it will inherit the history of its origin. The following heuristics are executed sequentially:

Parameter-change-heuristic detects origin changes due to a changed parameter type. It compares an added method to all methods of the previous revision in the same file and with the same name. The comparison is done with a type-II-clone detection on the content of the method body as in [12]. If the content is similar enough based on thresholds², we detect this method as origin for the added method.

Method-name-change-heuristic detects origin changes due to renaming. It compares an added method to all methods of the previous revision that were in the same file and had the same parameter types, using the same content comparison.

Parameter-and-method-name-change-heuristic detects origin changes due to renaming and a changed parameter type. It compares an added method to all methods of the previous revision that were in the same file. As there are more potential origin candidates than for the previous two heuristics, the content comparison of this heuristic is stricter, i. e., it requires more similarity to avoid false positives.³

File-name-change-heuristic detects origin changes due to method moves between files. It compares an added method to all methods of the previous revision that have the same method name and the same parameter types (but are located in different files).⁴

To summarize, our origin tracking is able to detect the common refactorings (as in [22]) method rename, parameter change, method move, file rename, or file move. It is also able to detect a method rename combined with a parameter change. Only a method rename and/or parameter change combined with a file rename/move is not detected. However, based on our experience from [12], we consider this case to be unlikely, as we showed that file renames rarely occur: many file moves stem from repository restructurings without content change.

C. Method Length and Threshold-based Classification
To measure method length, we use its number of statements (see Section III). We include only changes to the method body in its length vector: As we analyze the growth evolution of methods during modification, we do not count a move of a method or its entire file as a content modification. Excluding white spaces, empty and comment lines when counting statements, we do not count comment changes either.

Based on their length, we classify methods into three categories: Based on previous work [23] and general coding best practices [2], we consider methods with less than 30 statements to be short, between 30 and 75 statements to be long, and with more than 75 statements to be very long—under the assumption that a method should fit entirely onto a screen to be easy to understand. We further discuss this threshold-based approach in Section VIII.

V. CASE STUDY SET UP
The case study targets the research questions in Section I.

Study Objects. We use ten Java systems which cover different domains and use Subversion as version control, see Table I. Nine of them are open source; the anonymous system is an industrial one from an insurance company. We chose study objects that had either a minimum of five years of history or revealed a strongly growing and—in the end—large system size (e. g., ConQAT and af3). The systems provide long histories ranging from 3 to 15 years and all show a growing system size trend. The number of analyzed commits in the table is usually smaller

… added: In jabRef, e. g., antlr library code was added and removed two months later as developers decided to rather include it as a .jar file. We believe these deleted methods should not be analyzed. We also want to exclude debugging code, as this code is only used temporarily and, hence, likely to be maintained differently than production code. Excluding deleted methods potentially introduces a bias because, for example, also experimental code that can be seen as part of regular development is excluded. However, we believe that including deleted methods introduces a larger bias than excluding them. We excluded simple getter and setter methods as we expect that these methods are not modified during history. Table I shows for each system the total number of remaining, analyzed methods as well as the percentage of yellow and red methods. Yet, the table's growth plots show the size of the code base before excluding deleted, getter, and setter methods. Hence, for jabRef, the plot still shows the temporary small spike when the antlr library was added and deleted shortly after.

VI. RESEARCH QUESTION EVALUATION
For each research question, we describe the evaluation technique, provide quantitative results, supplement them with qualitative examples when necessary, and discuss each result.

A. Are long methods already born long? (RQ1)
We investigate whether the yellow and red methods are already yellow or red at the first commit of the method.
1) Evaluation: For each project, we count the number of methods that were already yellow or red upon their first commit relative to the total number of yellow or red methods in the head revision. Over all projects, we also calculate the average ∅ and the standard deviation σ.

² For replication purposes, the thresholds used were θ1 = θ2 = 0.5. For the meaning of the thresholds and how to choose them, please refer to [12].
³ Similarity thresholds θ1 = 0.7 and θ2 = 0.6.
⁴ We also use the higher similarity thresholds θ1 = 0.7 and θ2 = 0.6.
As the projects differ greatly than the difference between head and initial revision because, in the overall number of yellow/red methods, we calculate the in big repositories with multiple projects, not all commits weighted average and the weighted standard deviation and use affect the specific project under evaluation. For all systems, we the number of methods per project as weight. analyzed the main trunk of the subversion, excluding branches. 2) Quantitative Results: Table II shows the results: between 52% and 82% of yellow methods were already born yellow— Excluded Methods. For each study object, we excluded on average about 63%. Among all red methods, between 38% generated code because, very often, this code is only generated and 88% with an average of 51% were born already red. once and usually neither read nor changed manually. Hence, 3) Discussion: More than half of the red methods are already we assume generated entities evolve differently than manually born red and two third of the yellow methods are already born maintained ones (or do not evolve at all if never re-generated). yellow. This average reveals a small standard deviation among We also excluded test code because we assume test code the ten case study objects. Hence, these methods do not grow might not be kept up-to-date with the source code and, hence, to become long but are already long from the beginning. We might not be modified in the same way. In [24], the authors also investigated that green methods are already born green: in showed that test and production code evolved synchronously all case study objects, between 98–99% of the green methods only in one case study object. The other two case study are already born green. As our origin analysis tracks method objects revealed phased testing (stepwise growth of testing refactorings and also code deletions within a method, we artifacts) or testing backlogs (very little testing in early stages of conclude that most short methods are initially committed as a development). For this paper, we did not want to risk test code short method rather than emerging from a longer method. distorting the results, but future work is required to examine 4) Conclusion: Most yellow/red methods are already born the generalizat if our results on production code can or cannot yellow/red rather than gradually growing into it. be transfered to test code. We excluded methods that were deleted during history and B. How often is a method modified in its history? (RQ2) only work on methods that still exist in the head revision. In We examine how many methods are frequently changed and some study systems, for example, code was only temporarily how many remain unchanged since their initial commit.

154 TABLE II: Percentage of yellow/red methods that were born already yellow/red with average ∅ and standard deviation σ ArgoUML af3 ConQAT jabRef jEdit jHotDraw jMol Subclipse Vuze Anony. ∅ σ 62% 72% 72% 64% 82% 72% 60% 67% 52% 63% 63% 10% 46% 57% 88% 58% 73% 68% 38% 48% 45% 49% 51% 9%

18 6 1 8 4 16 14 12 3 (30%) 1 (10%) 10 short long v. long 8 1 3 6 4 2 3

Average number of touches 0 argouml af3 ConQAT jabRef jEdit jHotDraw jMol Subclipse Vuze Anony. Fig. 2: Method growth (orange) and method shrinking Fig. 1: Average number of touches overall (gray), to short transitions (gray) in a Markov chain (green), long (yellow), and very long (red) methods

in our case study than green methods. This would be a very 1) Evaluation: First, we evaluate the average number of simple reason why they are changed more often. We analyzed touches to a method over its life time. We analyze this average the history length of each method, counting the days from for all methods and for all green, yellow, and red methods its initial commit to the head revision. However, the average individually. Second, we compute the percentage of methods analyzed history of a yellow or red method is about the same that remain unchanged. as for a green one. Even when analyzing the average number 2) Quantitative Results: Figures 1 shows the average number of changes per day, the result remains the same: red methods of touches. The results are very similar for all case study are modified more often than yellow ones and yellow methods systems. On average, any method is touched 2–4 times, a are more often modified than green methods. green method between 2 and 3 times. This average increases on yellow and even more on red methods: Red methods are As about half of the methods remain unchanged, this raises changed more often than yellow methods than green methods. two questions: First, have certain components become unused Table III shows how many methods are changed or remain such that they could be deleted? To mitigate this threat of unchanged after their initial commit. For each system, the table analyzing out-dated code components, we additionally analyzed where changes occur on file-level. In contrast to the methods, denotes the percentage of changed and unchanged methods, 5 which sum up to 100%. The horizontal bar chart visualizes the we found that 91% of all files are touched at least twice, 83% relative distribution between changed (gray) and unchanged at least three times, still 67% at least five times. While the (black) entities. Only between 31% and 63% of the methods file analysis does not eliminate the threat completely, we yet are touched again after initial commit. On average, 54% of believe it reduces its risk. Second, this also raises the question all methods remain unchanged. This is the weighted average where the changes to a system actually occur. If many existing over all percentages of unchanged methods, with the number methods remain unchanged, changes either focus on only half of analyzed methods from Table I as weights. the methods or must result in new methods. RQ4 examines this in more detail. 3) Examples: In ArgoUML, frequently touched red methods 5) Conclusion: On average, a method gets touched between comprise the main method (167 touches) and run (19 touches) two and fours times. About half the methods remain unchanged. method in the classes Main.java and Designer.java, for example. Other methods that are frequently touched C. How likely do modified methods grow and become yellow are the parsing methods for UML notations such as or red? (RQ3) parseAttribute in AttributeNotationUml.java or parseTrigger in TransitionNotationUml.java. The results of RQ1 and RQ2 showed that many methods Parsing seems to be core functionality that changes frequently. are already born yellow and red and many methods are not 4) Discussion: When estimating where changes occur, one even changed. This research question examines if methods that can make different assumptions: First, one could assume that are frequently modified grow over time or, in contrast, if their changes are uniformly distributed over all lines of source code. 
likelihood to be refactored increases when modified often. Second, one could assume that they are uniformly distributed 1) Evaluation: We examine how likely a green method over the methods. Third, changes are maybe not uniformly becomes yellow and a yellow method becomes red—analyzing distributed at all. Our results support the first assumption— the transitions between green, yellow, and red methods—and, under which longer methods have a higher likelihood to get vice versa, how likely it is refactored. changed because they contain more lines of code. We also evaluated if yellow and red methods have a longer history 5This is the average over all systems with min. 80% and max. 100%

155 TABLE III: Percentage of changed (gray) / unchanged (black) methods

ArgoUML af3 ConQAT jabRef jEdit jHotDraw jMol Subclipse Vuze Anony.

62%/38% 46%/54% 31%/69% 49%/51% 35%/65% 46%/54% 63%/37% 39%/61% 48%/52% 48%/52%

Transition likelihoods in Markov model. In particular, we objects by calculating the median, the 25th and 75th percentile model the method evolution as a Markov diagram with states as well as the minimum and maximum. green, yellow, and red, see Figure 2. To investigate the method 2) Quantitative Results: Figure 3 shows the results for growth transitions, we examine the green → yellow, yellow → method growth transitions with box-and-whisker-plots. Taking red, green → red transitions. We further examine the red → all methods into account without modification constraint yellow, yellow → green, red → green transitions and refer (n ≥ 1), all three growth transition probabilities are rather to them as method shrinking transitions. However, strictly small. The highest probability is for the yellow → red transition speaking, these transitions can not only result from refactorings, with a mean of 7%. Hence, we can not observe a general but also from code deletions within a method. Hence, the increase in method length which is because many methods are resulting shrinking probabilities denote only an upper bound not changed. However, the mean of every growth transition for the real refactoring probability. probability increases from n ≥ 1 to n ≥ 10: the more a method We model a transition based only on the method’s state in the is modified, the more likely it grows. For the yellow → red initial and head revision, e. g., a method that was born green, transition, for example, the mean increases from 7% to 32%. grows yellow, and is refactored to green again, will lead to a This means that every third yellow method that is touched transition from green to green. To estimate the likelihood of a at least ten times will grow into a red method. For the short transition, we calculate the relative frequency of the observed methods, 21% of the methods being touched at least five times transitions in the histories of the case study objects: if a method grow into long methods. If short methods are touched at least was, e. g., born green and ended up yellow in the head revision, ten times, they grow into a long method with 27% probability. then we count this method evolution as one transition from The transition green → red occurs less often. state green to yellow. Figure 2 gives a fictitious example with Figure 4 shows the results for the method shrinking tran- 30 methods in total, 10 methods each being born green, yellow, sitions (stemming from refactorings or code deletions). For and red respectively. Among all yellow methods, 8 remained all three transitions and independent from the number of yellow till the head, one became green, one became red. Among modifications, the mean is between 10% and 25%. For the all green methods, 6 remained green, three evolved into yellow, red → yellow transition, this probability is between 10% and one into red. In this example, the relative frequency for a 15%, for yellow → green between 15% and 22%. method to switch from green to yellow would be 30% (= 3 ), 10 setTarget to switch from yellow to red would be 10% (= 1 ). 3) Examples: The method in 10 TabStyle.java (ArgoUML) serves as a method, Modification-dependent Markov model. We analyze the that grows from yellow to red while being frequently touched: transition likelihoods in the Markov model under four different It only contained 34 statements in revision 146 (year 1999). constraints depending on the number n of touches per method. 
After 19 touches and four moves, it contained 82 statements in First, we calculate the Markov model for all methods (n ≥ 1). revision 16902 (year 2009). The method gradually accumulated Then, we compute the model on the subset of methods being more code for various reasons: The type of its parameter t got touched at least twice (n ≥ 2), at least five (n ≥ 5) and at generalized from Fig to Object, requiring several additional least ten times (n ≥ 10). We investigate whether methods that instanceof Fig checks. Additionally, the method now are modified often (n ≥ 10) have a higher probability to grow supports an additional targetPanel on top of the existing than less modified ones. In contrast, one could also assume that stylePanel and it also now repaints after validating. a higher number of modifications increases the likelihood of a 4) Discussion: This research question shows that there is refactoring, e. g., when developers get annoyed by the same a strong growth for frequently modified methods: Every third long method so that they decide to refactor it if they have to yellow method that is touched at least ten times grows into touch it frequently. We choose these thresholds for n because a red method and every fifth green method that is touched five is approximately the average number of touches over all at least five times grows into a yellow method. However, the yellow methods in all study objects and ten is the smallest overall probability (n ≥ 1) that a method grows and passes upper bound for the average touches to a yellow method for the next length threshold is very small for all three growth all systems, see Figure 1. A larger threshold than 10 would transitions (with a the largest mean of 7% for yellow → red). reduce the size of the method subset too much for some case As we know that all case study systems are growing in system study objects—making the results less representative. size (see Table I), this raises the question where the new code is added. In VI-B, we concluded that changes either affect Aggregation. For each observed transition combined with each only half of the code or result mainly in new methods. As constraint on n, we aggregate the results over all case study many methods are unchanged and we do not observe a general

156 1 1 1 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 probability probability 0.2 probability 0.2 0.2 0 0 0 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 (a) green → yellow (b) yellow → red (c) green → red Fig. 3: Probabilities for growth transitions in the Markov model for methods

1 1 1 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 probability probability 0.2 probability 0.2 0.2 0 0 0 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 n ≥ 1 n ≥ 2 n ≥ 5 n ≥ 10 (a) red → yellow (b) yellow → green (c) red → green Fig. 4: Probabilities for refactorings transitions in the Markov model for methods growth in method length, we suspect that most changes lead the software grows. In case of positive code growth in new to new methods. The next research question will address this. and in existing methods (more added statements than deleted In terms of refactorings, we cannot observe a strong statements), the growth quotient takes a value between 0 and 1 correlation between the number of modifications and the mean and, hence, denotes the percentage of code growth that occurs shrinking probability. We conclude that the upper bound for the in new methods. If the growth quotient is negative, more code probability of method to be refactored is independent from the is deleted in existing methods than added in new methods number of modifications it receives. In general, the probability (hence, the system got reduced in size). If the growth quotient that a method shrinks is rather low—only between every tenth is larger than 1, code was deleted in existing methods, but and every fifth method cross the thresholds. more statements were added in new methods. 5) Conclusion: The more an entity is modified, the likelier To differentiate between code growth in new and in existing it grows. In contrast to previous research questions that found methods, we need to define the concept of a new method: we no size increase for methods in general, we reveal a strong need to define a time interval in which a method is considered method growth for methods that are modified. However, based to be new. All methods that were added after the interval on our markov modeling, the number of modifications does start date are considered new methods for this interval. For not have an impact on the method shrinking probability. Our the calculation of the subsequent interval, these methods will observed likelihood that a method is refactored is at most 25%. be counted as existing methods. Whenever our origin analysis detects a method extraction, we consider the extracted method D. How does a system grow—by adding new methods or by still as existing code as this code was not newly added to the modifying existing methods? (RQ4) system—it rather denotes a refactoring than a development of As RQ2 revealed that a large percentage of methods remain new code contributing to the system’s grow. unchanged after the initial commit, we aim to determine how We decided to use the interval lengths of one week, 6 weeks a system grows—by primarily adding new methods or by and three months. We also could have used different time modifying existing ones? intervals to define the concept of a new method and analyzed 1) Evaluation: We use the following metric to calculate the the history e. g., commit-based. Hence, code growth could only quotient of code growth that occurs in new methods: occur in a new method in the commit with which the method was added to the system. An implementation of a new feature code_growth that spreads out over several commits would then result in new growth_quotient = new code_growth + code_growth code growth only in the first commit. All other commits would existing new produce code growth in existing methods. 
This contradicts the With code growth, we refer to the number of added minus intention of the research question to investigate how the system the number of deleted statements. Instead of using code churn grows over the time span of many years because—with this (added plus deleted statements), we use the number of added intention—the entire implementation of a new feature should minus deleted statements because we are interested in where result in code churn in new methods.

157 TABLE IV: Average growth quotient ∅ and its deviation σ on 3-months, 6-weeks, and weekly basis ArgoUML af3 ConQAT jabRef jEdit jHotDraw jMol Subclipse Vuze Anony. ∅ 85% 96% 96% 70% 72% 62% 74% 71% 76% 93% 3m σ 56% 4% 3% 28% 24% 33% 37% 24% 12% 10% ∅ 83% 92% 95% 69% 67% 53% 75% 66% 95% 92% 6w σ 100% 21% 4% 32% 38% 39% 37% 30% 209% 13% ∅ 91% 84% 83% 43% 45% 52% 95% 50% 59% 85% 1w σ 750% 88% 142% 171% 62% 52% 231% 102% 41% 103%

2) Quantitative Results: Table IV shows the growth quotient developer who changes a single line of code has to change a as the average ∅ over all intervals and its standard variance green, yellow, or red method. If the probability to touch a line σ. On a three month basis the code growth is between 62% in a yellow or red method increases over time, we consider and 96% in new methods and on the 6-weeks basis between this an increasing risk for maintainability—and an indicator 53% and 95%. Hence, for both interval spans, at least half of for a required maintenance strategy to control the amount code the code growth occurs in new methods rather than in existing in yellow or red methods. methods. For the weekly basis, the growth quotient drops to 1) Evaluation: After each commit, we calculate the distribu- 43% up to 95%. tion of the code over green, yellow, and red methods: we sum up 3) Examples: The history of jEdit provides illustrative the statements in all green, yellow, and red methods separately examples: Commit 17604 adds “the ability to choose a prefered and count it relative to the overall number of statements in DataFlavor when pasting text to textArea” and increases the sys- methods. Thus, we obtain a code distribution metric and—over tem size by 98 SLoC—JEditDataFlavor.java is added the entire history—the distribution trend. (7 SLoC), RichTextTransferable.java is modified 2) Quantitative Results: Figures 5 shows the results for the slightly within an existing method, and the rest of the system evolution of the code distribution in green, yellow, and red growth stems from the modification of Registers.java methods. On the x-axis, we denote the years of analyzed history, which grows by adding two new paste methods that both the y-axis denotes the code distribution. The red plot indicates have a new, additional DataFlavor parameter. Commit the relative amount of code in red methods, the yellow plot 8845 adds a new layout manager by adding few lines to indicates the relative amount of code in both red and yellow VariableGridLayout within existing methods and adding methods. The green lines indicate the amount of code in green, two new files, increasing the overall system size by 872 SLoC. yellow, and red methods, and hence, always sums up to 100%. 4) Discussion: Unsurprisingly, for the weekly time interval, An increasing red or yellow plot-area indicates as a negative the growth quotient drops for many systems as it depends trend or an increasing risk for maintainability. on the commit and development behavior (systems in which Figure 5 shows that there is no general negative trend. Only the new feature implementations spread over weeks rather four systems (af3, jHotdraw, Subclipse, and Vuze) reveal a than being aggregated in one commit reveal a lower growth slight increase in the percentage of code in yellow and red quotient for a shorter interval). However, based on the 6-weeks methods. For ConQAT, jabref, and jEdit, the trend remains and three-months basis, we observe that—in the long run almost constant, for ArgoUML, jMol, and the industry system and independent from the commit behavior—software grows it even improves (partly). through new methods. These results match the outcome of the previous research questions: Most systems grow as new 3) Discussion: For systems without negative trend, we methods are added rather than through growth in existing assume that the length distribution of the new methods is methods. 
In particular, many methods remain unchanged after about the same as for the existing methods. As we have shown the initial commit (RQ2). Our results further match the result that there is no major length increase in existing methods, but of [25] which showed that earlier releases of the system are no systems grow by adding new methods instead, it is plausible longer evolved and that most change requests to the software that the overall code distribution remains the same. in their case study were perfective change requests. We briefly investigated whether code reviews could impact 5) Conclusion: Systems grow in size by adding new the evolution of the overall code distribution. Based on the methods rather than by modifying existing ones. commit messages, ConQAT and af3 perform code reviews regularly. ConQAT shows a positive evolution trend, af3, in E. How does the distribution of code in short, long, and very contrast, one of the stronger negative trends. Hence, we cannot long methods evolve over time? (RQ5) derive any hypothesis here. For the other systems, the commit RQ1-RQ3 investigated the growth of a single method and messages do not indicate a review process but future work and RQ4 investigated the system growth. This research question developer interrogation is necessary to verify this. examines how the overall distribution of the code over green, 4) Conclusion: With respect to the evolution of the code yellow, and red methods (e. g., 50% code in green methods, distribution in green, yellow, and red methods, we were not 30% in yellow methods, 20% in red methods) evolves. This able to show a general negative trend, i. e., no increasing risk distribution corresponds to the probability distribution that a for maintenance. Instead, it depends on the case study object.

158 af3 argouml conqat jabref jedit ution distri b cod e year 2011 2012 2013 ’99’01’03’05’07’09’11 2011 2012 2013 ’04’05’06’07’08’09’10’11 ’07’08’09’10’11’12’13

jhotdraw jmol subclipse vuze anony. ution distri b cod e year ’07 ’08 ’09 ’10 ’11 ’12 ’13 ’00 ’01 ’02 ’03 ’04 ’05 ’06 2005 ’07 ’09 ’11 2005 ’07 ’09 ’11 ’09 ’10 ’11 ’12 ’13 ‘14 ‘15 Fig. 5: Evolution of the code distribution over green, yellow, and red methods

VII.SUMMARY AND DISCUSSION First, to obtain short methods that are easy to read, test, reuse and profile, developers should focus on an acceptable method length before the initial commit as we have shown that methods The findings from our empirical study on software growth are unlikely to be shortened after. Even if these methods are can be summarized as follows: Most yellow and red methods not changed again, they will most likely still be read (when are already born yellow or red rather than gradually growing performing other changes), tested, and profiled (unless they into yellow/red: Two out of three yellow methods were already become unmaintained code that could be deleted, of course). yellow at their initial commit and more than half of the red Given that the overall percentage of code in yellow and red method were born red. Furthermore, about half of the methods methods reaches up to 50% in some case study objects (see are not touched after their initial commit. On average, a method Figure 5), the method size remains a risk factor that hampers is only touched two to four times during history. This average maintainability even in the absence of further changes. holds for all study objects with low variance. Hence, for Despite knowing that many long methods are already born most methods the length classification within the thresholds is long, we yet do not know why they are born long. Do developers determined from the initial commit on. However, if a method is require pre-commit tool support to refactor already in their modified frequently, the length does grow: among all methods workspace, prior to commit? Or what other support do they touched at least ten times, every third methods grows from a need? Future work is required to obtain a better understanding. yellow to a red method. Nevertheless, over the entire system, Second, when refactoring methods to obtain a changeable the probability of a method to grow is low even though the code base, developers should focus on the frequently modified system itself is growing in size. This can be explained by growth hot spots as we have shown that methods grow significantly through newly added methods rather than by growth in existing when being touched frequently. When developers are modifying methods. Finally, the distribution of the code in green, yellow, a frequently changed method, they have to understand and and red methods evolves differently for the case study systems. retest it anyway. Hence, refactoring it at the same time saves This distribution corresponds to the probability distribution overhead. Furthermore, the method’s change frequency over that a developer has to understand and change a yellow or the past might be a good predictor for its change frequency in red method when performing a change on a random source the future. However, future work is required to investigate this. code line. In our case study objects, we could not observe that To summarize, developers should focus on frequently mod- the probability to change a yellow or red method generally ified code hot spots as well as on newly added code when increases. Generally, our case study did not show a significant performing refactorings. This paper supports our hypothesis in difference between the industry and the open source system. [26], in which we showed that improving maintainability by However, we only analyzed one industry system. Hence, this prioritizing the maintenance defects in new code and in lately claim cannot be generalized. 
modified code is feasible in practice.

Implications. The implications of our work are based on VIII.THREATS TO VALIDITYAND FUTURE WORK two main results: First, most systems grow by adding new methods rather than through growth in existing methods. Internal validity. Our work is based on a threshold- Second, existing methods grow significantly only if touched categorization of methods into green, yellow, and red methods. frequently. Hence, we derive the following recommendations— This obfuscates a growth of a method within its category, e. g., while differentiating between the benefits of short methods for a method length increase from 30 to 74 statements, affecting readability, testability, reusability, and profile-ability on the one the results of RQ1, RQ3, RQ5. We introduced two thresholds side, and changeability on the other side. to measure significant growth (such as a yellow-red transition)

159 in contrast to minor length increases of few statements. Setting [2] R. C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, the thresholds remains a trade-off between obfuscating some 2008. [3] N. Nagappan, T. Ball, and A. Zeller, “Mining Metrics to Predict growth versus including minor length increases. However, we Component Failures,” in Int’l Conference on Software Engineering, 2006. used these thresholds as they have gained common acceptance [4] R. Baggen, J. P. Correia, K. Schill, and J. Visser, “Standardized Code in industry among hundreds of customers’ systems of the Quality Benchmarking for Improving Software Maintainability,” Software Quality Control, vol. 20, no. 2, 2012. software quality consulting company CQSE. [5] I. Heitlager, T. Kuipers, and J. Visser, “A Practical Model for Measuring Maintainability,” in Int’l Conference on the Quality of Information and Construct validity. The validity of our results depends on Communications Technology, 2007. the accuracy of our origin analysis. Based on a thoroughly [6] M. Gladwell, The Tipping Point: How Little Things Can Make a Big Difference. Back Bay Books, 2002. evaluated origin analysis for files [12], the adapted origin [7] E. Ammerlaan, W. Veninga, and A. Zaidman, “Old Habits Die Hard: Why analysis for methods was evaluated manually by the authors Refactoring for Understandability Does Not Give Immediate Benefits,” with a large random sample. By manual inspection, we were in Int’l Conference on Software Analysis, Evolution and Reengineering, i. e. 2015. able to validate only the precision, , that the detected method [8] G. Robles, I. Herraiz, D. German, and D. Izquierdo-Cortazar, “Modifica- refactorings were correct. We were not able to validate the tion and developer metrics at the function level: Metrics for the study recall on the case study objects. We did verify it on our artificial of the evolution of a software project,” in Int’l Workshop on Emerging test data set. However, for our case study, a lower recall would Trends in Software Metrics, 2012. [9] T. Gîrba and S. Ducasse, “Modeling history to analyze software evolution,” mean that we do not record some method extractions but Journal of Software Maintenance and Evolution: Research and Practice, treat the extracted methods as new methods. This does not vol. 18, no. 3, 2006. affect the results of RQ1 and RQ5. It would affect RQ2–RQ4 [10] M. W. Godfrey and Q. Tu, “Evolution in Open Source Software: A Case Study,” in Int’l Conference on Software Maintenance, 2000. but previous work has shown that overall absolute number of [11] L. Schulte, H. Sajnani, and J. Czerwonka, “Active Files As a Measure of method refactorings is not very high [22] so we assume that our Software Maintainability,” in Int’l Conference on Software Engineering, results are valid. Further, we measured the method length in 2014. [12] D. Steidl, B. Hummel, and E. Juergens, “Incremental Origin Analysis number of statements. One could argue that comments should of Source Code Files,” in Working Conference on Mining Software be included. However, as the overall comment ratio and also Repositories, 2014. the amount of commented out code varies greatly (see [27]), [13] M. Godfrey and Q. Tu, “Growth, Evolution, and Structural Change in Open Source Software,” in Int’l Workshop on Principles of Software we believe including comments would add unnecessary noise Evolution, 2001. to our data and make the results less comparable. 
[14] ——, “Tracking Structural Evolution Using Origin Analysis,” in Int’l Workshop on Principles of Software Evolution, 2002. External validity. We conducted our empirical case study only [15] V. Arnaoudova, L. Eshkevari, M. Di Penta, R. Oliveto, G. Antoniol, and Y.-G. Gueheneuc, “REPENT: Analyzing the Nature of Identifier on ten Java systems. To generalize our results, the case study Renamings,” IEEE Trans. on Software Engineering, vol. 40, no. 5, 2014. needs to be extended to other OO systems as well as to test [16] B. Fluri, M. Wursch, M. Pinzger, and H. Gall, “Change Distilling:Tree code. We have first indications that C/C++ and CSharp results Differencing for Fine-Grained Source Code Change Extraction,” IEEE Transactions on Software Engineering, vol. 33, no. 11, 2007. are similar but they require future work to be published. [17] F. Van Rysselberghe and S. Demeyer, “Reconstruction of Successful Software Evolution Using Clone Detection,” in Int’l Workshop on IX.CONCLUSION Principles of Software Evolution, 2003. Many software systems contain overly long methods which [18] R. Peters and A. Zaidman, “Evaluating the Lifespan of Code Smells are hard to understand, change, test, and profile. While we Using Software Repository Mining,” in European Conference on Software Maintenance and Reengineering, 2012. technically know, how to refactor them, we do not know when [19] M. Tufano, F. Palomba, G. Bavota, R. Oliveto, M. Di Penta, A. De Lucia, we should refactor them as we had little knowledge about and D. Poshyvanyk, “When and Why Your Code Starts to Smell Bad,” how long methods are created and how they evolve. In this in Int’l Conference on Software Engineering, 2015. [20] A. Chatzigeorgiou and A. Manakos, “Investigating the Evolution of paper, we provide empirical knowledge about how methods and Code Smells in Object-oriented Systems,” Innovations System Software the overall system grow: We show that there is no significant Engineering, vol. 10, no. 1, 2014. growth for existing methods; most overly long methods are [21] L. Heinemann, B. Hummel, and D. Steidl, “Teamscale: Software Quality Control in Real-Time,” in Int’l Conf. on Software Engineering, 2014. already born so. Moreover, many methods remain unchanged [22] Dig, Danny and Comertoglu, Can and Marinov, Darko and Johnson, during evolution: about half of them are not touched again after Ralph, “Automated Detection of Refactorings in Evolving Components,” initial commit. Instead of growing within existing methods, in European Conference on Object-Oriented Programming, 2006. [23] D. Steidl and S. Eder, “Prioritizing Maintainability Defects by Refactoring most systems grow by adding new methods. However, if a Recommendations,” in Int’l Conf. on Program Comprehension, 2014. method is changed frequently, it is likely to grow. Hence, when [24] A. Zaidman, B. V. Rompaey, A. van Deursen, and S. Demeyer, “Studying performing refactorings, developers should focus on frequently the co-evolution of production and test code in open source and industrial developer test processes through repository mining,” Empirical Software modified code hot spots as well as on newly added code. Engineering, vol. 16, no. 3, 2011. X.ACKNOWLEDGEMENTS [25] P. Mohagheghi and R. Conradi, “An Empirical Study of Software Change: Origin, Acceptance Rate, and Functionality vs. Quality Attributes,” in We deeply thank Andy Zaidman for his excellent reviews. Int’l Symposium on Empirical Software Engineering, 2004. [26] D. Steidl, F. Deissenboeck, M. Poehlmann, R. Heinke, and B. 
Uhink- REFERENCES Mergenthaler, “Continuous Software Quality Control in Practice,” in Int’l Conference on Software Maintenance and Evolution, 2014. [1] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, “Does [27] D. Steidl, B. Hummel, and E. Juergens, “Quality Analysis of Source Code Decay? Assessing the Evidence from Change Management Data,” Code Comments,” in Int’l Conference on Program Comprehension, 2013. IEEE Transaction on Software Engineering, 2001.

160 7.3 Prioritizing External Findings Based on Benefit Estimation

7.3 Prioritizing External Findings Based on Benefit Estimation

The paper “Feature-based Detection of Bugs in Clones” (Paper C, [87]) contributes to the automatic recommendation step of the approach. In particular, it contributes to the classification of external findings.

Goal Amongst many static analysis findings, clones are very relevant as duplicated code increases the maintenance effort and are error-prone: Clones bear the risk of incomplete bugfixes when the bug is fixed in one code fragment but at least one of its copiesisnot changed and remains faulty. As we find incompletely fixed clones in almost every system, we aim to provide an approach to automatically classify inconsistent (gapped) clones as faulty or non-faulty: we evaluate in how far certain features of clones can be used to automatically identify incomplete bugfixes to help developers and quality engineers.

Approach We evaluated which factors induce bugs in cloned code and in how far we can speed up the detection process of incompletely fixed bugs by using an automatic clone classification. We often observe that clones with incompletely fixed bugs have specific features—for example, they have only few differences (gaps). For clones with multiple gaps, the differences are much more likely to be intentional. Other examples comprisean additional !=null check or a comment preceding the gap and indicating a bug fix. To detect which clone features are relevant for predicting bugs, we employ a machine learner which labels each inconsistent clone as faulty or non-faulty.

Evaluation We evaluated the machine learner with cross-validation on the existing clone data set from [49]. The data set comprises three industrial CS systems and one open-source system in Java. For the inconsistent clones of these systems, the data set contained expert ratings from the developers; for each clone, the developers judged whether the inconsistency was intentional or unintentional and, in the latter, faulty or non-faulty.

Results The machine learning classifier achieved a precision of 22% for clones with faulty inconsistencies which is better than manual inspection in random order. The results provide some insights about successes and limitations of a feature-based approach: The approach is well-suited to correlate syntactic properties of cloned fragments with the existence of incomplete bugfixes. However, we observed that the presence of an incomplete bugfix strongly depends on implicit context information that cannot be expressed in features. Hence, the precision was not significantly higher than manual inspection chances.

Thesis Contribution In the context of this thesis, the classifier aims to detect faulty inconsistent clones as external findings for the automatic recommendation as described in Section 6.2.2. The inconsistent clones serve as an example for which the internal/external classification cannot be done based on the findings category, but has to be doneforeach finding individually. However, as the precision is not very high, the paper reveals limitations of the automated detection of external findings. Instead, it shows that the manual inspection step is absolutely necessary in this case.

115 Feature-Based Detection of Bugs in Clones

Daniela Steidl, Nils Gode¨ CQSE GmbH, Germany {steidl, goede}@cqse.eu

Abstract—Clones bear the risk of incomplete bugfixes when a bug. However, none of these approaches was evaluated with the bug is fixed in one code fragment but at least one of its respect to how effective the sorting really is. Therefore, we are copies is not changed and remains faulty. Although we find currently left with manual inspection of the clones in random incompletely fixed clones in almost every system, it is usually time consuming to manually locate these clones inside the results of an order. Given the number of clones that is usually returned ordinary clone detection tool. In this paper, we describe in how by an ordinary clone detection tool, the inspection takes long far certain features of clones can be used to automatically identify since incomplete bugfixes are usually found in only a small incomplete bugfixes. The results are relevant for developers to percentage of all detected clones [1], [2]. locate incomplete bugfixes—that is, defects still existing in the Problem Statement. Software contains a large number of system—and for us as clone researchers to quickly find examples that motivate the use of clone management. clones. Currently, there exists no algorithm to efficiently detect Index Terms—Software quality, code clones, bug detection gapped clones that contain an incompletely fixed bug as it is unclear which factors induce bugs in cloned code. I.INTRODUCTION In this paper, we evaluate which factors induce bugs in cloned code and in how far we can speed up the detection The continuous research and improvement of clone detec- process of incompletely fixed bugs by using an automatic tion algorithms has led to a number of industrial strength clone classification. We often observe that clones with in- approaches, for example [1]–[3], that can detect not only completely fixed bugs have specific features.1 For example, identical code fragments but also similar code fragments with these clones often have only few differences (gaps). For clones minor differences. These clones are commonly referred to as with multiple gaps, the differences are much more likely to inconsistent clones or gapped clones. Figure 1, taken from [2], be intentional. To detect which clone features are relevant shows such an example. Multiple studies, for example [2], [4], for predicting bugs, we employ machine learning algorithms. [5], have shown that these differences are often unintentional Please note that we do not think such an algorithm can provide and may sometimes point to bugs, which have been fixed only a perfect classification, but any automated classification with partially. That is, the bug was fixed in some of the fragments, higher precision than manual inspection in random order but at least one of the copies was not changed and the bug would be beneficial. remained effectively in the system. Contribution. We examine which clone features are rele- As quality experts, we are frequently confronted with large vant to predict incompletely fixed bugs. industrial systems from domains such as automotive systems, Outline. This paper is structured as follows. Section II energy controls, or insurance business. Our clone analysis, an presents previous work which is related to ours. Section III de- integral part of our quality assessment, shows that incomplete scribes the data used for machine learning. We explain the rel- bugfixes exist in almost every system. However, it is unclear evant features of clones in Section IV. Section V, Section VI, which factors induce bugs in cloned code. 
Bugs in clones and Section VII outline the machine learning algorithm we use are relevant for both developers and researchers. Clones that along with the evaluation technique. Section VIII presents our resemble incomplete bugfixes point to bugs that still exist results which are discussed in Section IX. Section X contains is software systems and may cause sever damage. Hence, threats to validity, our conclusions are given in Section XI. detecting them allows to reduce the number of defects in software systems. Besides improving correctness, incomplete II.RELATED WORK bugfixes are also well-suited examples for clone researchers to motivate the use of clone management. Finding clone-related In this section, we present previous work, which is related bugs helps us as researchers and quality experts to argue why to ours. We limit ourselves to publications that investigate the clone management should be an integral part of quality control. relation between clones and bugs as part of a case study. For Our experience tells us that incompletely fixed clones exist a general overview of clone research, please refer to existing in almost every system. Locating them, however, is a time- surveys—e. g., [6] and [7]. consuming process. A number of approaches have been sug- Li et al. [4] detected copy-paste related bugs based on the gested to sort clones according to the probability of containing consistency of identifiers in clone pairs. In particular, they calculated the UnchangedRatio that describes to which degree This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant “EvoCon, 01IS12034A”. The respon- 1In this paper we use the term feature, taken from the machine-learning sibility for this article lies with the authors. terminology, to describe characteristics of code clones.

978-1-4673-6445-4/13/$31.00 c 2013 IEEE 76 IWSC 2013, San Francisco, CA, USA Fig. 1. Example of an inconsistent clone identifiers are consistent. Their hypothesis is the lower the TABLE I UnchangedRatio (the more identifiers have been changed), SUBJECTSYSTEMS the more likely the clone pair contains a bug—except for Name Organization Language Age [Years] Size [kLOC] when the UnchangedRatio is 0. Although Li et al. use the UnchangedRatio to prune the results and sort clone pairs A MunichRe C# 6 317 B MunichRe C# 4 454 according to their likelihood of having a bug, there is no C MunichRe C# 2 495 empirical investigation of how useful the UnchangedRatio Sysiphus TUM Java 8 281 really is to identify clones with bugs. An alternative approach to detect bugs based on inconsis- tencies has been presented by Jiang and colleagues [8]. Instead III.CASE STUDY DATA of inconsistencies between the cloned fragments themselves, In the following, we describe the training data used to inconsistencies in the contexts of the cloned fragments have predict incompletely fixed bugs in gapped clones. To obtain been used to locate bugs. Jiang et al. identified different types reasonable results, it is important to have a sufficiently large of context inconsistencies which potentially indicate whether set of clone classes where experts for the corresponding code the inconsistency is a bug or not. But again, the usefulness of determined if they contain an incompletely fixed bug. Since the types of inconsistencies has not been empirically evaluated. this process is very time-intensive for us and the experts, Higo et al. [5] presented another approach to detect bugs we reuse data of a previous study [2], which had a different based on inconsistencies. Clones which might contain a bug goal, but nevertheless collected data suitable for this approach. were extracted by subtracting the results of a more restrictive These data include the code base of the subject systems, the detector from those of a more tolerant clone detector to detected clones, and the experts’ rating. extract certain types of clones which potentially contain a bug. Whether these particular clones are more likely to contain a Systems. The training data comprises five systems studied in bug than others has not been investigated. [2]. Due to our syntax-based classification, we excluded the An elaborate study on bugs in inconsistent clones has been COBOL system because of the significant syntactic differences conducted by Juergens et al. [2]. For systems from different between COBOL and the C-like languages C# and Java. The domains, experts rated whether a given inconsistent clone remaining subjects are three systems developed by different contained a bug or not. The results show that a notable number organizations for the Munich Re Group—one of the largest of inconsistencies in clones are actually unintentional and re-insurance companies in the world—and a collaboration contain a bug. However, apart from the inconsistency, no other environment for distributed software development projects, features of the clones have been analyzed according to whether Sysiphus, developed at the Technische Universitat¨ Munchen¨ they might help to find bugs. (TUM). Details about the systems are given in Table I. In summary, all of the previous studies identified bugs in Clone Detection. 
We also reused the clones from the former clones, using a particular pre-sorting based on certain features study, which were detected using our tool ConQAT.2 Gen- to identify clones likely to contain a bug. However, no previous erated code was excluded prior to the detection. The clone work investigated how good the pre-sorting actually is. The detection algorithm is in principle token-based. However, the usefulness of certain clone features to categorize the clone as individual tokens of each statement were merged into a single defective or not is unkown. In this paper, we extend previous artificial token representing that statement. Hence, detection is research by establishing a list of features motivated by earlier based on the sequence of statements rather than the sequence work and our practical experience and use machine learning to analyze the usefulness of these features. 2www.conqat.org

77 Clone Classes IV. FEATURES Consistent Inconsistent Our hypothesis is that we can identify incompletely fixed Non-Faulty bugs based on certain features of gapped clones. Therefore, we evaluate in how far this hypothesis holds for a given set of Intentional features. The features we use originate from recurring patterns Unintentional that we observed during many years of manually inspecting

[…] of original tokens. Identifiers and literals have been normalized except for regions of repetitive code. Clones were not allowed to cross method boundaries. The minimal clone length was set to 10 statements. For a more detailed description of the algorithm and its parameters, please refer to [2].

Rating. We also reused the developers' rating of the clones from the previous study. The detected clone candidates were presented to the developers who rated them in three steps:
1) The developers removed false positives of the clone detection, i.e., code fragments without semantic relationship although the tool detected them as clones. From the true positives, we removed clones without inconsistencies (gaps).
2) For inconsistent clones, the developers rated the inconsistencies as unintentional or intentional.
3) If a clone class was unintentionally inconsistent, developers classified it as faulty or non-faulty, or don't know in case their decision was unsure. Intentionally inconsistent clones are non-faulty by definition.

[Figure 2: Rating of clone classes as faulty and non-faulty – the gapped clones are partitioned into faulty (YES) and non-faulty (NO) clone classes.]

Figure 2 visualizes the sets of different clones. In cases where developers could not determine intentionality or faultiness, e.g., because none of the original developers was available for rating, the inconsistencies were treated as intentional and non-faulty. We used the developers' answers from the third step to assign each clone class one of the following labels:
• YES. Differences between gapped clones were unintentional and faulty: the clone class contains an incompletely fixed bug (dark gray in Figure 2).
• NO. The differences between the cloned fragments are on purpose or were unintentional but do not contain an incompletely fixed bug (light gray in Figure 2).

Data Size. In total, the developers rated 582 gapped clones, among which 104 clones were labeled with YES, i.e., contained an incompletely fixed bug. 456 clones were labeled with NO. We excluded 22 clones from the data set which were labeled with don't know. The resulting data set is imbalanced, as one class (label NO) is highly overrepresented compared to the other (label YES). We describe in Section VI how we handle the imbalanced data set in our approach.

IV. FEATURES

Our selection of features characterizing gapped clones is based on our experience with inconsistent clones. This experience includes but is not limited to the clones in our training data set. Please note that the set of all possible features is infinite and our selection provides only a first starting point. However, our analysis can be repeated for other features with only little effort. The remainder of this section describes the selected features that turned out to be useful for machine learning and documents our assumptions why we believe a feature is relevant for predicting faulty inconsistencies. We use features to describe both the whole content of the cloned fragments (global features) and only the gaps between the clones (local features).

A. Global Context Features

• length (Integer) – Length of the copied code fragment ignoring gaps.
Bugs are more likely to occur when there are differences within long similar code sequences compared to only short fragments.
• nesting depth (Integer) – Maximal nesting depth of a code fragment.
Faulty inconsistencies between code fragments are more likely to occur in algorithmic code with higher nesting depth than in data code.
• comment (Boolean) – Any gap is preceded by a comment.
Often developers comment on a gap when fixing a bug and denote an issue-id or a short bug-fix description. Hence, we assume commented gaps to indicate an incomplete bugfix.
• preceded by assignments (Boolean) – Any gap is preceded by two assignment statements.
It is easy to forget a single assignment while writing a set of assignments, e.g., setting various attributes of an object. Hence, a gap preceded by a set of assignments can indicate a faulty inconsistency. We use a simple heuristic to detect assignments by searching for lines containing = but not if, else, for, or while.

B. Local Gap Features

• attribute (Boolean) – Any gap contains an attribute declaration.
An attribute declaration seems to be a major change, indicating that the change was on purpose.
• new (Boolean) – Indicator for any gap containing the keyword new.
The keyword new signals the creation of an object. We assume the creation of a new object to represent a major change made on purpose.

• equal null (Boolean) – Any gap contains a condition including == null.
While writing if, for, or while statements, it is easy to forget a null check. Hence, we believe a gap containing a null check indicates an incomplete bugfix.
• not equal null (Boolean) – Any gap contains a condition including != null.
Analogous to the previous feature, we believe that an easily forgotten != null check indicates a faulty inconsistency.
• continue (Boolean) – Any gap contains the keyword continue.
During the implementation of loops, developers easily forget necessary continues. We assume this indicates a faulty inconsistency.
• break (Boolean) – Any gap contains the keyword break.
During the implementation of loops, developers easily forget necessary breaks. We assume this indicates a faulty inconsistency.
• num token type (Integer) – Number of different token types in the gaps.
We believe inconsistencies are more likely to occur in algorithmic code than in data code, which has fewer token types than algorithmic code.
• method call (Boolean) – Any gap matches the method call pattern.
We use [a-zA-Z]+\.[a-zA-Z]+\(.*\) as a regular expression for a method call and assume that a method call in a gap indicates a bugfix.
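To make the feature definitions concrete, the following minimal Java sketch shows how such boolean gap features could be computed from the source lines of a gap. It is an illustration only, not the authors' implementation; the class and method names are hypothetical, and only the method call regular expression and the assignment heuristic are taken from the text above.

import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of gap feature extraction; not the tooling used in the paper.
public class GapFeatures {

    // Regular expression for a method call as given in Section IV.
    private static final Pattern METHOD_CALL =
            Pattern.compile("[a-zA-Z]+\\.[a-zA-Z]+\\(.*\\)");

    /** method call feature: any gap line matches the method call pattern. */
    public static boolean containsMethodCall(List<String> gapLines) {
        return gapLines.stream().anyMatch(line -> METHOD_CALL.matcher(line).find());
    }

    /** equal null / not equal null features: any gap line contains the given check, e.g. "== null". */
    public static boolean containsCheck(List<String> gapLines, String check) {
        return gapLines.stream().anyMatch(line -> line.contains(check));
    }

    /** Assignment heuristic from the global features: a line containing '='
     *  but none of the keywords if, else, for, while counts as an assignment. */
    public static boolean isAssignment(String line) {
        return line.contains("=")
                && !line.matches(".*\\b(if|else|for|while)\\b.*");
    }
}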
C. Feature Implementation Details

Both global and local features of gapped clones contain boolean and integer features. In general, machine learning algorithms are applicable not only for feature representations with a unique type (e.g., an integer vector) but also for feature representations with mixed types (e.g., a feature vector with booleans and integers). However, including integer features with a broad range of possible values (e.g., the length feature with a value range of [minimal clone length..∞]) in a feature vector with mostly boolean features causes problems during machine learning. The classifier predominantly uses this feature to achieve a high precision/recall: The classifier makes a decision by splitting the large feature value range into very small subintervals such that most subintervals contain exactly one training data point. Consequently, the classifier separates the data set and does not need to consider other features. Solely based on this large-range integer feature, the classifier performs reasonably well. However, it strongly overfits the training data and cannot be generalized.

To avoid this phenomenon, we manually limit the number of different values for integer features that have a broad value range, i.e., the length and num token type features. (The nesting depth feature, in contrast, reveals a small value range in our data set, with most values in the interval [0..4] and a barely occurring maximum value of 6. Hence, the chance of overfitting based on this feature is low.) For the length feature we split the value range as follows: [minimal clone length..20] is mapped to value 0, [21..∞] to 1. For the num token type feature, values in [0..5] are transformed to 0, [6..10] to 1, and [11..∞] to 2. The feature value transformations were obtained based on preliminary experiments and manual inspection of the data set.

V. DECISION TREES

We assume that the features outlined in the previous section are relevant to classify a gapped clone as incomplete bugfix or not. Using a machine learning algorithm, we analyze how relevant each feature truly is. In this paper, we use decision trees as classifier and the standard machine learning library WEKA³ for implementation.

A decision tree is a tree with decision nodes as interior nodes and class labels as leaves. To classify an instance, the decision tree is a set of if-then-else statements: Based on the value of a single feature, a decision is made at each interior node to further traverse the left or the right branch of the tree until a leaf is reached. The class label represented by the leaf is assigned to the instance as its classification.

After preliminary experiments comparing the performance of various different classifiers (support vector machines, naive Bayes, AdaBoost, etc.), we chose decision trees due to their results being the most promising and easy to understand: the result of a decision tree can be easily visualized and understood by humans as opposed to, e.g., the high-dimensional hyperplanes of support vector machines. In particular, we used the J48 implementation of a decision tree [9].

VI. COST-SENSITIVE TRAINING

As described in Section III, our underlying training data set is imbalanced. This is problematic as machine learning algorithms operate under two assumptions:
1) The goal is to maximize the prediction accuracy.
2) The test data has the same underlying distribution as the training data.
In our data set, 18.6% of the data is labeled with YES (minority class), 81.4% is labeled with NO (majority class). Consequently, if the classifier predicted label NO for all data, it would already achieve an accuracy of 81.4% by default without learning from the data.

Machine learning research has proposed a variety of methods to overcome issues with imbalanced data sets [10], i.e., artificially sampling the data set to balance class distributions or adapting the learning algorithms: To improve the data set, upsampling approaches (interpolating more data samples of the minority class) or downsampling approaches (removing samples of the majority class) exist. In our case, the naive approach of randomly downsampling (using the WEKA SpreadSubsample) did not succeed, as the results strongly varied with the randomly chosen negative data points and hence overfitted the data.

³ http://www.cs.waikato.ac.nz/ml/weka/

More sophisticated approaches to upsample the data (e.g., with the SMOTE algorithm [11]) did not significantly improve the results.

Instead of balancing the distribution of the data by sampling, we used cost-sensitive training to adapt the learning algorithm to the imbalanced data set: Cost-sensitive learning addresses imbalanced data by using cost matrices describing the costs for misclassifying any particular data example [10]. We used the WEKA CostSensitiveClassifier in combination with a decision tree to reweight the data points according to the cost matrix in Table II, which was obtained by preliminary experiments. The matrix indicates that false negative mistakes are penalized five times more than false positive mistakes, which makes the classifier more sensitive to the minority class of the training data. For more detailed information about cost-sensitive training, refer to [10] and the WEKA documentation.

TABLE II: Cost matrix for false positives and false negatives

                    Classified as →   YES       NO
    Actual YES                         0        5 (FN)
    Actual NO                          1 (FP)   0
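As a rough illustration of this setup, the following Java sketch wires WEKA's CostSensitiveClassifier around a J48 decision tree and evaluates it with 5-fold cross-validation (see Section VII). It is a sketch under assumptions: the ARFF file name is hypothetical, the cost values mirror Table II, the method names follow current WEKA releases and may differ in older versions, and the row/column convention of the cost matrix should be checked against the WEKA version in use.

import java.util.Random;

import weka.classifiers.CostMatrix;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch only: trains a cost-sensitive J48 tree on the clone feature vectors.
public class CostSensitiveCloneClassification {

    public static void main(String[] args) throws Exception {
        // Feature vectors of the gapped clones with a YES/NO class attribute
        // (file name is hypothetical).
        Instances data = DataSource.read("gapped-clones.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Missing a faulty clone (false negative) costs five times as much as a
        // false alarm, cf. Table II; verify the row/column convention of CostMatrix.
        CostMatrix costs = new CostMatrix(2);
        costs.setCell(0, 1, 5.0);
        costs.setCell(1, 0, 1.0);

        CostSensitiveClassifier classifier = new CostSensitiveClassifier();
        classifier.setClassifier(new J48());        // decision tree base learner
        classifier.setCostMatrix(costs);
        classifier.setMinimizeExpectedCost(false);  // reweight training instances

        // 5-fold cross-validation, reporting per-class precision and recall.
        Evaluation evaluation = new Evaluation(data, costs);
        evaluation.crossValidateModel(classifier, data, 5, new Random(1));
        System.out.println(evaluation.toClassDetailsString());
    }
}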
VII. EVALUATION METHOD

We evaluate the cost-sensitive decision tree with k-fold cross-validation [12], using precision and recall. Precision and recall are calculated for each label l separately and then (weighted) averaged over all labels. As we use the experts' rating (YES or NO), the classification problem is binary.

Precision and recall. Precision represents the probability that the classifier makes a correct decision when predicting that the instance is labeled with l. Recall denotes the probability that an instance with label l is detected by the classifier. For this approach we are particularly interested in the precision for the positive label (YES): The algorithm's goal is to detect clones that contain incompletely fixed bugs. Without such automation, we are currently left with manual detection. For a quality engineer, finding incomplete bugfixes in an unfamiliar code base corresponds to inspecting code clones in random order. Hence, the probability for the quality engineer to find an incomplete bugfix is equivalent to the relative frequency of incomplete bugfixes among all gapped clones. Hence, if the algorithm has a higher precision for the positive label than visiting clones in random order, it speeds up the process of retrieving incomplete bugfixes. For the scenario of a beginning continuous control process, precision is more important than recall to convince developers of the necessity of clone management. In our training data set, the relative frequency of incomplete bugfixes is 18.6% (= 104/560). Hence we aim for a precision above 19%.

Cross validation. The overall evaluation of the algorithm is based on k-fold cross validation, a standard evaluation technique in machine learning to determine the quality of a classifier [12]: The training data set is split into k subsets. The classifier is built k times with k − 1 out of k subsets as training data. For each run, the remaining subset is used as test data and the classifier's precision and recall are calculated on this fold. To get an overall evaluation, precision and recall are averaged over all k runs. Common values for k are 5 or 10. In this paper, we only present results of 5-fold cross validation as they did not differ significantly from 10-fold cross validation.

VIII. RESULTS

This section describes the results of our approach obtained by applying a decision tree algorithm to the real-world data set presented in Section III with the features described in Section IV. We first interpret the learned classification model and then show precision, recall, and classification errors based on the results from cross-validation.

A. Classification Model

[Figure 3: Resulting classification model for bug detection – top levels of the learned decision tree, with the attribute feature at the root and the features preceded by assignments, comment, length, not equal null, new, and num token type at the next levels.]

Figure 3 shows the top levels of the resulting decision tree. The complete tree with a maximum depth of ten is not included in this paper due to length restrictions. The visualization of the classification model provides insights about the most relevant features for bug identification. The closer a feature is located to the root of the decision tree, the more information gain it provides for clone classification. The feature used for the root is the local feature attribute, indicating that at least one gap contains an attribute declaration. At the next depth levels, the tree uses the features preceded by assignments and comment (depth 1), length and not equal null (depth 2), and new and num token type (depth 3).

In the following, we present a simplified, top-level interpretation of the decision nodes: The existence of an attribute declaration in any gap indicates a non-faulty gap if the gap was commented. If no gap contained an attribute declaration, but a gap was preceded by variable assignments, the decision tree mostly decides for a faulty gap. If neither the attribute nor the preceded by assignments features are true, the decision tree splits the data based on the length and the new feature. In most of the cases, if a new object was created in any gap, the tree classifies the majority of data points as faulty.
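For illustration, this top-level interpretation can be paraphrased as nested if-then-else decisions, matching the view of a decision tree given in Section V. The sketch below is only a paraphrase of the described upper levels; the CloneFeatures accessor is hypothetical, the learned tree is ten levels deep, and its exact thresholds and leaf distributions are not reproduced here.

// Hypothetical feature accessor; illustrative paraphrase of the tree's top levels only.
interface CloneFeatures {
    boolean attribute();
    boolean comment();
    boolean precededByAssignments();
    boolean newKeyword();
    int length();
}

final class TopLevelTreeSketch {
    static String classify(CloneFeatures f) {
        if (f.attribute()) {
            // An attribute declaration in a commented gap indicates a non-faulty gap.
            return f.comment() ? "NO" : "undecided (deeper subtree)";
        }
        if (f.precededByAssignments()) {
            return "YES"; // mostly decided as faulty
        }
        if (f.length() > 20 && f.newKeyword()) {
            return "YES"; // a new object created in a gap of a long clone leans faulty
        }
        return "undecided (deeper subtree)";
    }
}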

B. Classification Evaluation

TABLE III: Confusion matrix

                    Classified as →   YES    NO
    Actual YES                         62     42
    Actual NO                         218    238

Table III shows the classification's confusion matrix. Rows represent the real label of the data, columns indicate the label assigned by the classifier. Hence, entries on the matrix diagonal are correct classifications, other entries indicate classification errors. In total, the classifier labels 300 instances correctly (54% of all data): 62 clones with incomplete bugfixes and 238 clones without bug. The classifier did not detect 42 incomplete bugfixes and misclassified 218 clones without bug.

TABLE IV: Results of J48 for clone classification

    Class              TP Rate   FP Rate   Precision   Recall
    YES                0.596     0.478     0.221       0.596
    NO                 0.522     0.404     0.850       0.522
    Weighted Average   0.536     0.418     0.733       0.536

Table IV shows the classification's precision, recall, true-positive rate, and false-positive rate, which are calculated for each label individually and summarized in a weighted average over all instances (the weights for each label correspond to the relative frequency of the label within the training data). Precision is calculated as the number of true positives over true positives plus false positives, recall as the number of true positives over true positives plus false negatives.

For clones with incomplete bugfixes, the algorithm achieves a precision of 22.1% (= 62/(62+218)), better than the target precision of 18.6% (see Section VII). The recall of 59.6% (= 62/(62+42)) indicates that the classifier detects more than every other incomplete bugfix. For clones without bug, precision is 85% and recall 52%. However, these metrics were not the primary target of our approach. In total, the classifier results in a weighted average precision of 73% and recall of 53%.

We also used a 10-fold cross validation and ran both the 5-fold and 10-fold cross validation multiple times with different random seeds, but did not detect a significant change in the output.

IX. DISCUSSION

The results revealed that the machine learning approach achieves a higher precision for incompletely fixed bugs than manual detection in an unfamiliar code base, where clones are inspected in random order and, hence, the probability to find an incomplete bugfix corresponds to the relative frequency of incomplete bugfixes in the data set. However, our experiments also show limitations of feature-based approaches, which are discussed in this section. First, we enumerate additional features that were not beneficial for bug prediction. Second, we show why not all information relevant for bug detection can be captured in features. Third, we discuss the challenges in obtaining a data base for classification.

A. Excluded Features

Initially, we experimented with more features than presented in Section IV. However, including some features in the data representation caused a lower performance of the classifier, a phenomenon frequently seen during machine learning. Nevertheless, it is valuable information which features do not provide information for detecting incompletely fixed bugs in clones. The following list describes our initial assumptions in more detail and explains the consequences of adding a single feature to the feature set.

• gap (Integer) – Total number of gaps in a clone class.
The more the code fragments differ, the less likely the clone class contains a bug. If there are only few differences, the probability of a bug is higher. Similar assumptions have been stated in previous work [4], [8]. Contrary to our assumption, our experiments showed that using the gap instead of the length feature, or both together, lowers precision and recall for faulty inconsistencies.
• if / else (Boolean) – Any gap contains the keyword if or else.
We assumed that a forgotten if or else statement indicates an incompletely fixed bug. Both features reduce the performance of the decision tree and do not seem to have significant information gain. The else feature does not even occur in the pruned version of the decision tree, the if feature only at high depths close to the leaves.
• boolean check (Boolean) – Any gap contains a == together with a boolean literal.
Similar to the not equal null and equal null features, we assumed that a forgotten condition in an if, while, or for statement can indicate incomplete bugfixes. Adding or removing this feature from the data representation does not influence the result significantly. It neither provides information gain nor causes the decision tree to make wrong decisions.
• boolean assign (Boolean) – Any gap contains an = assignment of a boolean literal.
We assumed that forgetting to assign a boolean variable within control structures such as for, while, or if often causes bugs when the boolean variable is used as a control condition. Adding this feature lowers precision and recall.

B. Limitations of Features

Experimenting with different sets of features provides some insights about the successes and limitations of a feature-based approach. The feature-based approach is well-suited to correlate syntactic properties of cloned fragments with the existence of incomplete bugfixes, such as a missing != null condition. However, we observed that the presence of an incomplete bugfix strongly depends on implicit context information that cannot be expressed in features. Often, it is hard for any human who is not an active developer of the code base to judge whether an inconsistency is unintentional or error-prone.

For example, the presence of a comment preceding a gap may indicate two things: Either a bug was fixed and commented but forgotten in the other cloned fragment, indicating an incomplete bugfix, or the developer cloned code and commented on intentionally added new functionality, indicating an intentional inconsistency. A human might be able to differentiate both cases by referring to the natural language information of the comment. A feature-based representation, however, fails to capture this difference. In case the comment contains signal words such as "bug", "fix", or "added", natural language processing can be useful for classification, but this does not generalize to an arbitrary commenting style.

The presence of the keywords if or else is another such example that can either indicate an incomplete bugfix (a forgotten if condition in one code fragment) or the intentional adaptation of a clone fragment to a different scenario. In such cases, it is hard even for a human to make a decision about the correctness without deep knowledge of the code.

C. Challenge of Data Collection

The success of any machine learning approach depends on the training data used. In particular, we experienced that the imbalanced class distribution of our data hinders further success. With random downsampling, we achieved a precision for the positive label of up to 68% in some cases. However, this performance varied so strongly with the random sample that it cannot be generalized. Nevertheless, the partial success of the approach makes us believe that automated clone classification has potential for future work if a better (larger and balanced) data set of inconsistent clones is available.

X. THREATS TO VALIDITY

This section summarizes various factors that threaten the general validity of our results. First of all, our choice of features was based on syntactic properties of C-like languages like Java, C++, and C#. Therefore, we cannot generalize our results to all programming languages.

Second, we mainly focused on the application of our approach in the domain of quality control to motivate the use of clone management. Hence, we argued that the precision of the positive label is the most important evaluation aspect. For other use cases, which include the retrieval of remaining bugs in the system, recall is also crucial. Future work should study features that improve the recall while maintaining precision.

Third, using experts' ratings as evaluation contains several threats already described in [2]. However, within this context, we rely on the experts' rating as no other evaluation method is available.

Furthermore, the configuration of the clone detection tool strongly influences the detection results shown to the experts. Refer to the justification in [2], which is based on a pre-study and long-term experience with clone detection.

XI. CONCLUSION AND FUTURE WORK

In this paper, we examined how much a feature representation of gapped clones can help to retrieve incomplete bugfixes in cloned code. We experimented with a broad range of features and used machine learning to determine their relevance in bug prediction. The machine learning classifier achieved a precision of 22% for clones with faulty inconsistencies, which is better than manual inspection in random order.

For developers, our approach can be used as a recommendation tool for finding incomplete bugfixes to remove errors still remaining in the system. For the use case of motivating clone management, our approach helps to quickly find faulty clones in our customers' code. We often observed that one or two examples of unintentional faulty inconsistent clones from the customers' own code can significantly support our argument to use clone management. In that sense, our approach is a first step to minimize the effort of retrieving convincing examples.

Our approach is a good starting point for future work as it has shown that feature-based bug detection and automated clone classification have a certain potential. The two most promising directions for future work are the creation of a larger data set for learning and the evaluation of other features. These may, for example, even include non-syntactic features like change frequency or authorship.

REFERENCES

[1] N. Göde and R. Koschke, "Frequency and risks of changes to clones," in Proceedings of the 33rd International Conference on Software Engineering. ACM, 2011, pp. 311–320.
[2] E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner, "Do code clones matter?" in Proceedings of the 31st International Conference on Software Engineering. IEEE Computer Society, 2009, pp. 485–495.
[3] C. K. Roy and J. R. Cordy, "NICAD: Accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization," in Proceedings of the 16th International Conference on Program Comprehension. IEEE Computer Society, 2008, pp. 172–181.
[4] Z. Li, S. Lu, S. Myagmar, and Y. Zhou, "CP-Miner: Finding copy-paste and related bugs in large-scale software code," IEEE Transactions on Software Engineering, vol. 32, no. 3, pp. 176–192, 2006.
[5] Y. Higo, K. Sawa, and S. Kusumoto, "Problematic code clones identification using multiple detection results," in Proceedings of the 16th Asia-Pacific Software Engineering Conference. IEEE Computer Society, 2009, pp. 365–372.
[6] R. Koschke, "Survey of research on software clones," in Duplication, Redundancy, and Similarity in Software, ser. Dagstuhl Seminar Proceedings, R. Koschke, E. Merlo, and A. Walenstein, Eds., no. 06301. Dagstuhl, Germany: Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, 2007.
[7] C. K. Roy and J. R. Cordy, "A survey on software clone detection research," Queen's University at Kingston, Ontario, Canada, Technical Report, 2007.
[8] L. Jiang, Z. Su, and E. Chiu, "Context-based detection of clone-related bugs," in Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. ACM, 2007, pp. 55–64.
[9] R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[10] H. He and E. A. Garcia, "Learning from imbalanced data," IEEE Transactions on Knowledge and Data Engineering, vol. 21, no. 9, pp. 1263–1284, Sep. 2009.
[11] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[12] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," in Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI '95). Morgan Kaufmann, 1995, pp. 1137–1143.


7.4 Prioritizing Internal Findings Based on Cost of Removal

The paper “Prioritizing Maintainability Defects based on Refactoring Recommendations” (Paper D, [86]) contributes to the automatic recommendation part of the approach. It demonstrates how to detect internal findings which reveal low removal costs.

Goal The paper aims to detect findings, in particular code clones and long methods, which are easy to refactor and, thus, cost very little time for the developers to remove. Additionally, we examined which external context information influences the decision to remove a single finding in industrial software development.

Approach The approach recommends code clones and long methods that are easy to refactor. We proceed in two steps: First, we detect the findings using a clone and a long method detector. Second, for both categories, we use different heuristics to detect findings that can be refactored easily. The selection is mostly based on a heuristic data flow analysis on source code. All heuristics are designed to match one of the two refactoring patterns pull-up method or extract method.

Evaluation We conducted a survey with developers from two industrial teams to find out if they would accept the suggested findings in their own code for removal. We evaluated which findings developers considered easy to remove and whether they would actually remove them. Further, we examined the limits of automatic finding recommendation by quantifying how much external context information is relevant to make the removal decision.

Results The survey showed that the heuristics recommend findings that developers consider to be easy to remove with high precision. Developers stated that 80% of the recommended findings were both easy and useful to remove. When developers would not have removed a finding, we analyzed what context information they used for their decision. For the global analysis of code clones, developers considered a lot more external context information than for the local analysis of long methods: Developers did not want to remove code clones mostly due to external reasons unrelated to the code itself (e. g. waiting for a general design decision or external factors causing the redundancy such as duplicated specifications). Developers declined to shorten long methods mostly due to the nature of the code when the methods were not hard to understand, e. g. if they contained UI code, logging code, or only filled data objects.

Thesis Contribution For this thesis, the paper serves as an example of how to detect internal findings that can be removed with low removal costs. These findings are prioritized in the thesis's approach for systems with highly-dynamic maintenance as described in Section 6.2.3. The paper's results further provide context information for the manual inspection as described in Section 6.3.2. They provide guidelines to reject non-relevant findings that developers do not consider worth removing. Beyond the scope of this paper, the idea to detect very-easy-to-remove findings can also be extended to other findings, e. g. long classes or deep nesting.

Prioritizing Maintainability Defects Based on Refactoring Recommendations∗

Daniela Steidl (CQSE GmbH, Germany, [email protected]) and Sebastian Eder (Technische Universität München, Germany, [email protected])

ABSTRACT

As a measure of software quality, current static code analyses reveal thousands of quality defects on systems in brown-field development in practice. Currently, there exists no way to prioritize among a large number of quality defects, and developers lack a structured approach to address the load of refactoring. Consequently, although static analyses are often used, they do not lead to actual quality improvement. Our approach recommends the removal of quality defects which are easy to refactor, exemplified by code clones and long methods, and thus provides developers a first starting point for quality improvement. With an empirical industrial Java case study, we evaluate the usefulness of the recommendation based on developers' feedback. We further quantify which external factors influence the process of quality defect removal in industrial software development.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management—Software quality assurance (SQA)

General Terms
Management, Measurement

Keywords
Software Quality, Dataflow Analysis, Finding Prioritization

1. INTRODUCTION

Software systems evolve over time and, without effective countermeasures, their quality gradually decays, making the system hard to understand and maintain [8, 17]. For easy program comprehension and effective maintenance, research and practitioners have proposed many different static code analyses. Analyzing the software quality, these analyses reveal quality defects in the code, which we refer to as findings. Static code analyses comprise structural metrics (file size, method length, maximal nesting depth), redundancy measurements (code cloning), test assessment (test coverage, build stability), or bug pattern detection. When quality analysis tools (such as ConQAT, Sonar, or Teamscale) are introduced to a software system that has been maintained over years, they typically reveal thousands of findings. We have frequently observed this in practice as software quality consultants of the CQSE, as we audited and monitored the quality of hundreds of systems in domains ranging from insurance to automotive or avionics. Also, this paper provides a benchmark showing that only two analyses (code duplication and method structure) already reveal up to a thousand findings each, in industry and open source systems.

Problem Statement. Existing quality analyses reveal thousands of findings in a grown software system, and developers lack a structured prioritization mechanism to start the quality improvement process.

Confronted with a huge number of findings, developers lack a starting point for a long-term software improvement process. Often, they do not fix any findings because they do not know which findings are worth spending the refactoring effort on. To actually start removing quality defects, developers need a finding prioritization mechanism to differentiate between the findings. Whether developers prioritize the removal of one finding over another depends on the expected return on investment. However, precisely estimating the ratio of expected costs for the removal and the expected gain in maintenance and program understanding is a difficult task. Often, the decision to remove a finding does not only depend on the finding itself but also on additional context information: for example, if the finding is located in critical code or, in contrast, in code that will be deleted.

We suggest to start removing defects which are easy to refactor, resulting in low expected costs for the removal – assuming that a quality defect which is easy to remove indicates an appropriate refactoring matching the existing code structure. In a case study on two industrial Java systems, we evaluate if developers agree to remove findings prioritized by our approach. We determine which additional context information developers consider in this decision.

Our analysis focuses on the two finding types code clones and long methods, which are both maintainability defects that do not necessarily contain bugs.

∗ This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant "EvoCon, 01IS12034A". The responsibility for this article lies with the authors.

ICPC '14, June 2–3, 2014, Hyderabad, India. Copyright 2014 ACM 978-1-4503-2879-1/14/06. http://dx.doi.org/10.1145/2597008.2597805

Long methods threaten maintainability as they are hard to understand, reusable only at very coarse granularity, and result in imprecise profiling information. Duplicated code increases the effort and risk of changes, as changes must be propagated to all instances and might be done inconsistently [14]. With the long method detection being a local analysis [1] and the clone detection a global one¹, we use two conceptually different analyses and evaluate if the scope of an analysis is reflected in the type of context information required for the removal decision.

Contribution. We provide a prioritization mechanism for code clones and long methods based on expected low costs for removal and quantify which external context information influences the removal of quality defects.

2. RELATED WORK

As we use refactoring recommendations to prioritize quality defects, our related work is grouped into the state of the art of refactorings and of prioritizing quality defects.

2.1 Refactorings

In the following, we discuss refactorings [9] to remove code clones and long methods.

Refactoring Clones. The most common refactorings for clone removal in object-oriented programming languages are the pull up and extract method refactorings [7, 10–12]. All three approaches suggest, detect, or automatically perform refactorings. In contrast, our work provides an evaluation with developers of two industrial systems: we evaluate when refactorings of quality defects are actually useful and what context information developers need in the removal decision. Existing approaches lack such an evaluation.

In [7], the authors provide a model for semi-automatic tool support for clone removal. At a very early stage, they outline in a position paper when clones can be refactored in object-oriented languages. We use the same refactoring idea, but take it to the next level by providing an evaluation.

The approach in [10–12] automatically detects possible refactorings of code clones by the above two refactoring patterns. This approach uses similar data flow heuristics and static analyses to detect clones which can be refactored. However, the evaluation only demonstrates that the tool finds clones that can be refactored and shows the reduction rate on the Java systems Ant and ANTLR [10, 12]. In the case study of [11], the authors classify the refactored clones in different groups depending on how difficult the refactoring is, but did not evaluate based on developer interviews. In contrast, we include a developer survey investigating which clones developers would or would not refactor.

Refactoring Long Methods. In [20], the authors focus on tool support to refactor long methods. They quantify in an experiment the shortcomings of the current extract method refactoring in Eclipse. Based on the observed drawbacks, they propose new tools and derive recommendations for future tools. In general, the paper focuses on the tool support for refactoring. Our approach, in contrast, focuses on which findings developers remove first.

The approach in [19] automatically detects code fragments from Self programs that can be extracted from methods or can be pulled up in the inheritance hierarchy. The complete inheritance structure is changed to gain smaller, more consistent, and better reusable code. However, the authors only claim that most of their refactorings improve the inheritance structure. Concrete proof based on developers' opinions is missing. Our work, in contrast, examines when developers are actually willing to restructure a method.

The approach in [23] is based on test-first refactoring: test code is reviewed to detect quality defects which are then refactored (e.g., with the extract method refactoring) in source and test code. Implicitly, the approach prioritizes findings that are found through examining the test code. In contrast, we examine how developers prioritize long methods in the source code without considering the test code.

2.2 Prioritizing Quality Defects

External Review Tool Results. [2, 3] propose an approach to prioritize findings produced by automatic code review tools. There, the execution likelihood of portions of code is calculated using a static profiler and then used to prioritize the findings. The authors focus on the applicability of their approach to detect the likelihood of execution and not on the impact of their prioritization. In contrast, we use a different prioritization mechanism (low costs for removal) and evaluate whether developers from industry would accept our recommendation in practice.

An approach to prioritize warnings produced by external review tools (FindBugs, JLint, and PMD) is described in [15]. The authors use the warning category and a learning algorithm based on the warning's lifetime to prioritize, with the assumption that the earlier a warning is addressed the more important it is. In contrast, we concentrate on the ease of removal of clones and long methods instead of review tools, and we also focus on a specific instance of a quality defect rather than prioritizing its category. Another approach of prioritizing warnings is based on the likeliness of a warning to contain a bug [4], with a case study evaluating how well the bug prediction mechanism performs. In contrast, we focus on maintainability and not on functional correctness, while providing information about which external factors influence the prioritization of findings.

Code Clones. The work in [24] uses a model for prioritizing clones including maintenance overhead, lack of software quality, and the suitable refactoring methods. Compared to us, this approach has a more complex mechanism to prioritize clones. However, the authors do not indicate whether the resulting prioritization matches the developers' opinion. We, in contrast, evaluate with developers from two industry systems. Another approach to prioritize code clones is described in [5]: clones are summarized in work packages by transferring the problem of making work packages into a constrained knapsack problem, considering the time it takes to remove the clone. The authors show that their estimations of the clone removal time are close to reality. As before, the prioritization of the approach is not validated. In [21], we focused on detecting bugs in inconsistent gapped clones, assuming that these should receive the highest prioritization. In this paper, we only focus on identical clones.

¹ A local analysis requires only information local to a single class, file, or method. A global analysis result of a file, however, can be potentially affected by changes in other files.

Table 1: Heuristics for Clones and Long Methods

                  Heuristic                  Refactoring       Sorting
    Clones        Common-Base-Class          Pull-Up Method    method-length ↓
                  Common-Interface           Pull-Up Method    method-length ↓
                  Extract-Block              Extract Method    block-length ↓
    Long Meth.    Inline-Comment             Extract Method    num-comments ↓
                  Extract-Block              Extract Method    block-length ↓
                  Extract-Commented-Block    Extract Method    block-length ↓
                  Method-Split               Extract Method    num-parameters ↑

3. APPROACH

Our approach recommends findings that are easy to refactor to give developers a starting point for quality defect removal, focusing on two categories of findings, code clones and long methods. We proceed in two steps: First, we detect the defects using a clone and a long method detector. Second, for both categories, we use different heuristics to determine where refactorings can be applied, which is mostly decided based on a heuristic dataflow analysis on source code. All heuristics are designed to match one of the two refactoring patterns pull-up method or extract method. Table 1 gives a high-level overview of the heuristics and their corresponding refactorings, which will be explained later in this section.

Each heuristic provides a criterion to sort its recommended findings: the best recommendations are the top findings in the sorting – Table 1 shows the sorting for each heuristic. A threshold for each criterion can be used to cut off the bottom of the sorted list to limit the number of recommended findings. However, this threshold depends on the amount of findings a developer is willing to consider as his starting point as well as on the current quality status of the system: if the system is of low maintainability with many findings, the threshold in practice will be higher than for systems with few findings. For this paper, we have set the thresholds based on preliminary experiments. Future work is required to determine them with a thorough empirical evaluation.

3.1 Findings Detection

This approach uses a code clone and a long method detector, implemented within the quality analysis tool ConQAT [6]. Their configurations are described in the following.

Clone Detection. We use the existing code clone detection of ConQAT [13] and configured it to detect nearly type-I clones²: We neither normalize identifiers, fully qualified names, type keywords, nor string literals. We do normalize integer, boolean, and character literals, visibility modifiers, comments, and delimiters. With this normalization, we detect almost identical clones – identical except for the above-mentioned literals, modifiers, etc.³

Long Method Detection. We detect long methods by counting lines of code. We define a method to be long if it has more than 40 lines of code (LoC). The threshold results from the experience of the CQSE GmbH and recommendations from [18]: it is based on the guideline that a method should be short enough to fit entirely onto a screen.

3.2 Dataflow Analysis on Source Code

To detect findings that can be refactored, we use a heuristic dataflow analysis on source code. We use a heuristic that performs without byte code, as we do not need source code that compiles. The heuristic provides a def-use analysis of all variables in the source code, as the definition and the uses of variables determine whether code can be refactored.

The heuristic extracts all variables from the source code, differentiating between local and global ones using a shallow parser⁴. For each local variable, it searches for definitions and uses, identifying reads and writes. A definition is detected when a type (e.g., int, Object) is followed by an identifier (e.g., a or object). A write is any occurrence of a variable – after its definition – on either the left side of an assignment (e.g., a = 5), within an assignment chain (e.g., a=b=c;), or in a modification (e.g., a++). Any other occurrence which is neither its definition nor a write is a read.

The heuristic was only implemented for the Java programming language and, hence, the evaluation in Section 4 includes only Java systems, too. A similar heuristic can be designed for other object-oriented programming languages.

² For the definition of type-I, II, and III clones, refer to [16].
³ We restrict our analysis to identical clones as we want to prioritize clones which are easy to refactor – assuming that type-I clones are easier to refactor than type-II or type-III: Pulling up a type-II clone might, e.g., require the use of generics if the clone contains objects of different types.
⁴ A shallow parser only parses to the level of individual statements, but not all details of expressions. This makes the parser easier to develop and more robust.
Long M of variables determine whether code can be refactored. The heuristic extracts all variables from the source code, differentiating between local and global ones using a shallow 3. APPROACH parser4. For each local variable, it searches for definition and Our approach recommends findings that are easy to refac- uses, identifying reads and writes. A definition is detected tor to give developers a starting point for quality defect when a type (e. g., int, Object) is followed by an identifier removal, focusing on two category of findings, code clones (e. g., a or object). A write is any occurrence of a variable – and long methods. We proceed in two steps: First, we detect after its definition – on either the left side of an assignment the defects using a clone and a long method detector. Second, (e. g., a = 5), within an assignment chain (e. g., a=b=c;), or for both categories, we use different heuristics to determine in a modification (e. g., a++). Any other occurrence which is where refactorings can be applied which is mostly decided neither its definition nor a write is a read. based on a heuristic dataflow analysis on source code. All The heuristic was only implemented for the Java pro- heuristics are designed to match one of the two refactoring gramming language and, hence, the evaluation in Section 4 patterns pull-up method or extract method. Table 1 gives a includes only Java systems, too. A similar heuristic can be high-level overview of the heuristics and their corresponding designed for other object-oriented programming languages. refactorings, which will be explained later in this section. Each heuristic provides a criterion to sort its recommended 3.3 Refactoring Patterns findings: the best recommendations are the top findings in We consider the two common refactorings pull up method the sorting – Table 1 shows the sorting for each heuristic. A and extract method. To refactor clones, the pull up method threshold for each criterion can be used to cut off the bottom refactoring can be applied if the clone covers a complete of the sorted list to limit the number of recommended findings. method and if the clone instances do not have a different However, this threshold depends on the amount of findings base class: the cloned method can be moved to the parent a developer is willing to consider as his starting point as class. If the parent class already provides an implementation well as on the current quality status of the system: if the of this method or other subclasses should not inherit the system is of low maintainability with many findings, the method, a new base class can be introduced at intermediate threshold in practice will be higher than for systems with hierarchy level that inherits from the former parent class. few findings. For this paper, we have set the thresholds The extract method refactoring can also be applied to clones based on preliminary experiments. Future work is required by moving parts of a cloned method into a new method. To to determine them with a thorough empirical evaluation. detect extractable parts of a method – called blocks – we use the dataflow heuristic. The new method can be in a super 3.1 Findings Detection class (if one exists) or a utility class. This approach uses a code clone and a long method detec- To refactor long methods, we suggest the extract method tor, implemented within the quality analysis tool ConQAT [6]. refactoring. A long method can be shortened by extracting Their configurations are described the following. 
one or several blocks into a new method. A long method can also always be split into two methods with the first Clone Detection. We use the existing code clone detection method returning the result of the second method. However, of ConQAT [13] and configured it to detect nearly type- this is only an appropriate refactoring if the second method I clones2: We neither normalize identifiers, fully qualified can operate with a feasible number of parameters. We also names, type keywords, nor string literals. We do normalize consider splitting a method as an extract method refactoring. integer, boolean, and character literals and visibility modi- fiers, comments, and delimiters. With this normalization, we 3.4 Heuristics for Code Clones detect almost identical clones – identical except of the above The following describes three heuristics to recommend code 3 mentioned literals, modifiers, etc. clones, which are summarized in Table 1. Long Method Detection. We detect long methods by 3.4.1 Common-Base-Class Heuristic counting lines of code. We define a method to be long if it has This heuristic suggests to refactor clones by pulling up a 2For definition of type-I, II, and III clones, refer to [16]. cloned method to a common base class. As the superclass 3We restrict our analysis to identical clones as we want to already exists, we assume that this can be done easily. prioritize clones which are easy to refactor – assuming that type-I-clones are easier to refactor than type-II or type-III: 4A shallow parser only parses to the level of individual Pulling up a type-II clone might, e. g., require the use of statements, but not all details of expressions. This makes generics if the clone contains objects of different types. the parser easier to develop and more robust.

Heuristic. This heuristic recommends a clone if all instances have the same base class and if they cover at least one method; we detect this by using a shallow parser.

Sorting and Threshold. To sort the results of this heuristic, we use the length of the methods to be pulled up, measured in number of statements, and sort in descending order. We assume that the more duplicated code can be eliminated, the more useful the recommendation of this finding. The heuristic cuts off the sorting if a clone does not contain a minimal number of l statements in total in the methods that are completely covered by the clone – l is called the min-length-method threshold. After preliminary experiments, we set l = 5. For example, a clone is included in the sorting if it covers one method with 5 statements, or five methods with one statement each.

3.4.2 Common-Interface Heuristic

This heuristic recommends the pull-up refactoring for clones that implement the same interface. We impose the constraint that no instance of the clone has an existing base class yet (other than Java's Object), as it occurs frequently that classes with different base classes implement the same interface. In this case, however, pulling up a cloned method can be difficult or maybe not possible. Without an existing base class, we assume that the cloned methods can be easily pulled up into a new base class which should implement the interface instead.

Heuristic. It recommends a clone if the instances all have the same interface and no base class, and if the clone covers at least one complete method, detected by a shallow parser.

Sorting and Threshold. We use the length of the method to be pulled up as descending sorting criterion. The heuristic has the same min-length-method threshold as the previous heuristic, also with l = 5.

3.4.3 Extract-Block Heuristic

This heuristic addresses the extract method refactoring. The extracted method ought to be placed in the super or a utility class.

Heuristic. It recommends a clone if all its instances contain a block that can be extracted, which we detect with our dataflow heuristic: A block is extractable if it writes at most one local variable that is read after the block before being rewritten. After extracting the block into a new method, this local variable will be the return value of the new method (or void if such a variable does not exist). Any local variable of the method that is read within the block to be extracted must become a parameter of the new method.

Parameters. This heuristic operates with one parameter, called max-parameters, which limits the maximal number of method parameters for the extractable block. We set it to 3, i.e., a block that would require more than 3 method parameters is not recommended. We pose this constraint to ensure readability of the refactored code: Extracting a method that requires a huge amount of parameters does not make the code more readable than before. The choice of the threshold 3 is only based on preliminary experiments. Setting this parameter depends on the quality of the underlying system: For systems with high quality that only reveal few findings, one can still raise this threshold. For systems with low quality, however, we decided to limit this number to recommend only those blocks that are obvious to extract.

Sorting and Threshold. We chose the length of the extractable block as sorting criterion (descending). As for the two other clone heuristics, we assume that the more code can be extracted, the more useful the recommendation. To cut off the sorting, we use the min-block-length threshold, which requires a minimum number of statements in the extractable block. After preliminary experiments, we set this to 10, as we did not perceive shorter clones to be worth extracting into a utility method.

3.5 Heuristics for Long Methods

We propose four heuristics for long methods, each with a sorting mechanism to prioritize the results (see Table 1). All heuristics target the extract method refactoring.

Before applying any heuristic, we use two filters to eliminate certain long methods: First, a repetitive code recognizer detects code which has the same stereotypical, repetitive structure, i.e., a sequence of method calls to fill a data map, a sequence of setting attributes, or a sequence of method calls to register JSPs. Long methods containing repetitive code will not be recommended by any heuristic, as we frequently observe in industry systems that these methods are easy to understand due to their repetitive nature despite being long. Second, all four heuristics can be parameterized to exclude static initializers from their recommendations. Depending on the context, many industry systems use static initializers to fill static data structures. It remains debatable whether moving the data to an external file, e.g., a plain text file, instead of keeping it within the source code improves readability. However, this debate is out of the scope of this paper and the answer might be specific to the underlying system.

3.5.1 Inline-Comment Heuristic

Heuristic. It recommends a long method if the method contains at least a certain amount of inline comments. This heuristic is based on the observation in [22] that short inline comments indicate that the following lines of code should be extracted into a new method. [22] observes this for inline comments with at most two words. As we have the additional information that the method containing the inline comments is very long, we take all inline comments into account, independent of the number of words contained. However, we excluded all inline comments with a @TODO tag or with commented-out code [22].

Sorting and Threshold. We sort in descending number of inline comments, assuming that the more comments a method contains, the more likely a developer wants to shorten it. To cut off the sorting, the min-comments threshold requires the recommended method to contain at least a certain number of inline comments – which we set to 3.

3.5.2 Extract-Block Heuristic

Heuristic. It recommends a long method if it contains a block which could be extracted into a new method based on our heuristic dataflow analysis. This heuristic operates in the same way and with the same parameters as the Extract-Block heuristic for clones (Section 3.4.3).

Parameters. We set max-parameters to 3.

Sorting and Threshold. For a descending sorting, we use the length of the extractable block, measured in number of statements. We assume the more code can be extracted to shorten a method, the more useful the recommendation. We set the min-block-length threshold to 5.
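A minimal sketch of the extract-block condition shared by the clone and long method variants of this heuristic (Sections 3.4.3 and 3.5.2) could look as follows. The data structures are hypothetical and the real analysis works on the shallow-parsed statement level rather than on plain sets.

import java.util.Set;

// Hypothetical model of a candidate block; illustrates the extractability rule only.
final class CandidateBlock {
    Set<String> localsWrittenInBlock;   // locals written inside the block
    Set<String> localsReadInBlock;      // locals read inside the block but defined before it
    Set<String> localsReadAfterBlock;   // locals read after the block before being rewritten
    int lengthInStatements;
}

final class ExtractBlockHeuristic {
    private static final int MAX_PARAMETERS = 3;    // paper default
    private static final int MIN_BLOCK_LENGTH = 10; // 10 for clones, 5 (or 4) for long methods

    /** A block is extractable if it writes at most one local variable that is
     *  still read after the block, and the new method needs few parameters. */
    static boolean isRecommended(CandidateBlock block) {
        long liveOutWrites = block.localsWrittenInBlock.stream()
                .filter(block.localsReadAfterBlock::contains)
                .count();                                // becomes the return value (or void)
        int parameters = block.localsReadInBlock.size(); // reads from outside become parameters
        return liveOutWrites <= 1
                && parameters <= MAX_PARAMETERS
                && block.lengthInStatements >= MIN_BLOCK_LENGTH;
    }
}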

3.5.3 Extract-Commented-Block Heuristic

Heuristic. This heuristic recommends a subset of the methods recommended by the extract-block heuristic by posing additional constraints: It only recommends methods if the extractable block is commented and is either a for or a while loop, or an if statement. This heuristic is based on [22]: short inline comments often indicate that the following lines of code should be extracted to a new method. We assume that a comment preceding a loop or an if statement strongly indicates that the block performs a single action and, hence, should be extracted into its own method.

Parameters. The max-parameters parameter is set to 3.

Sorting and Threshold. As before, we use the length of the extractable block as descending sorting criterion. As we assume that the comment already indicates that the block performs a single action on its own, we lower the min-block-length threshold and set it to 4.

3.5.4 Method-Split Heuristic

Heuristic. This heuristic recommends a long method if it can be split into two methods with at most a certain number of parameters. It iterates over every top-level block of the method and calculates, based on the dataflow, how many parameters the new method would require if the original method was split after the block.

Sorting and Threshold. We sort in ascending order of the number of parameters for the extracted method. We assume that the fewer method parameters are required, the nicer the refactoring. The sorting operates with a range for the max-parameter threshold, indicating how many method parameters the new method requires. After preliminary experiments, we recommend a method for splitting only if the second method works with at least one and at most three parameters. We choose the minimum of one parameter to eliminate methods that only fill global variables such as data maps or that set up UI components. We choose a maximum of three to guarantee easy readability of the refactored code.
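The parameter counting behind the method-split heuristic can be sketched as follows; this is an assumption-laden illustration (hypothetical types, statement-level granularity), not the published implementation.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: how many parameters would the extracted second method need
// if the original method were split after each top-level block?
final class MethodSplitSketch {

    static final class TopLevelBlock {
        Set<String> localsDefinedOrWritten = new LinkedHashSet<>();
        Set<String> localsRead = new LinkedHashSet<>();
    }

    /** Returns, per possible split point, the number of parameters of the extracted tail. */
    static List<Integer> parametersPerSplitPoint(List<TopLevelBlock> blocks) {
        List<Integer> result = new ArrayList<>();
        for (int split = 1; split < blocks.size(); split++) {
            Set<String> definedBefore = new LinkedHashSet<>();
            for (int i = 0; i < split; i++) {
                definedBefore.addAll(blocks.get(i).localsDefinedOrWritten);
            }
            Set<String> neededAfter = new LinkedHashSet<>();
            for (int i = split; i < blocks.size(); i++) {
                neededAfter.addAll(blocks.get(i).localsRead);
            }
            neededAfter.retainAll(definedBefore); // locals flowing into the tail become parameters
            result.add(neededAfter.size());
        }
        return result;
    }
}

A split point would then be recommended only if its parameter count lies between one and three, and candidates would be sorted ascending by that count, as described above.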
4. EVALUATION DESIGN

We evaluate the usefulness of our approach and its limitations. The approach aims at giving the developers a concrete starting point for finding removal. Hence, the top findings recommended by each heuristic should be accepted by developers for removal. Consequently, the approach should have a high precision. In contrast, we do not evaluate the recall of the approach because we are only interested in a useful but not complete list of recommendations. Further, we evaluate what additional context information is necessary to prioritize findings, based on developers' opinions.

4.1 Research Questions

RQ1: How many code clones and long methods exist in brown-field software systems? We conduct a benchmark among industry and open-source systems to show that clone detection and structural metrics alone already reveal hundreds of findings. We show that the amount of findings is large enough that prioritizing quality defects becomes necessary.

RQ2: Would developers remove findings recommended by our approach? We conduct a developer survey to find out if developers accept the suggested findings for removal. We evaluate which findings developers consider easy to remove and whether they would actually remove them.

RQ3: What additional context information do developers consider when deciding to remove a finding? In the survey, we also ask which context information developers consider when deciding to remove a finding. We evaluate the limits of automatic finding recommendation and quantify how much external context information is relevant. We examine if the type of the context information is different for a local analysis such as the long method detection compared to the global clone detection analysis.

4.2 Study Objects

Table 2 shows the study objects used for the benchmark and the survey, including six open source and seven industry systems, all written in Java. Due to non-disclosure agreements, we use the anonymous names A–G for the industry systems. The systems' sizes range from about 55 kLoC to almost 420 kLoC. As the open source and industry systems span a variety of different domains, we believe that they are a representative sample of software development.

Table 2: Benchmark and Survey Objects

    Name        Domain                                         Development    Size [LoC]
    ArgoUML     UML modeling tool                              Open Source    389952
    ConQAT      Code Quality Analysis Toolkit                  Open Source    207414
    JEdit       Text editor                                    Open Source    160010
    Subclipse   SVN Integration in Eclipse                     Open Source    127657
    Tomcat      Java Servlet and JavaServer Pages technologies Open Source    417319
    JabRef      Reference Manager                              Open Source    111927
    A           Business Information                           Industry       256904
    B           Business Information                           Industry       227739
    C           Business Information                           Industry       290596
    D           Business Information                           Industry       105270
    E           Business Information                           Industry       222914
    F           Business Information                           Industry       150348
    G           Business Information                           Industry        55095

For the benchmark (RQ1), we used all 6 open source and 7 industry systems. For the survey (RQ2, RQ3), we interviewed developers from systems C and F. One system stems from Stadtwerke München (SWM), the other from LV 1871: SWM is a utility supplying the city of Munich with electricity, gas, and water. LV 1871 is an insurance company for life policies. Both companies have their own IT department – the LV with about 50, the SWM with about 100 developers. We were not able to expand the survey because other developers were not available for interviews.

4.3 Benchmark Set Up

The benchmark compares all systems with respect to code cloning and method structuring to show that measuring only these two aspects already results in a huge amount of findings in practice. We calculate the clone coverage⁵ and the number of (type-II) clone classes per system. We use clone coverage for benchmarking as this is a common metric to measure the amount of duplicated code. Further, we show the number of clone classes to illustrate the amount of (clone) findings a developer is confronted with.

⁵ Coverage is the fraction of statements of a system which are contained in at least one clone [13].

Table 3: Experience of Developers

System   ∅ Program. Experience   ∅ Project Experience   Evaluated Findings
C        19.5 years              4.6 years              65
F        10.8 years              4.5 years              72

4.4 Survey Set Up

We conducted personal interviews with 4 developers each from SWM and LV 1871. Based on their general and project-specific programming experience (Table 3), we consider them experienced enough for a suitable evaluation. In the interview, we evaluated findings in random order that were recommended by a heuristic and findings not recommended by any heuristic or cut off due to the sorting threshold – which we refer to as anti-recommendations or anti-group.

Sorting and Sampling. For each heuristic and the anti-groups, we sampled 8 findings unless the heuristic recommended fewer findings (in System C, the Common-Interface heuristic only recommended 5 findings, the Extract-Commented-Block heuristic 4). Table 3 shows the overall number of evaluated findings. Within the anti-groups, we sampled randomly. Within the findings recommended by one heuristic, we sorted as depicted in Table 1 and chose the 8 top findings of the sorting.
Hence, we evaluate the top recommendations of each heuristic against a random sample from findings that were not recommended by any heuristic. This constitutes an evaluation of the complete approach rather than an evaluation of every single heuristic itself.

Interrater Agreement. We showed most of the samples to only one developer, except for two findings per sample group which were evaluated by two developers each. With two opinions on the same finding, we coarsely approximate the interrater agreement. Conducting the survey was a trade-off between evaluating as many findings as possible and obtaining a statistically significant interrater agreement, due to time limitations of the developers. We decided to evaluate more samples at the expense of a less significant interrater agreement to get more information about when developers do or do not remove a finding. Using our heuristics to provide recommendations to a development team, it is sufficient if one developer decides to remove a recommended finding as long as the heuristics recommend only a feasible set of findings.

Survey Questions. For each finding, we asked the developer two questions.

SQ1: How easy is the following finding to refactor? We provided the answer possibilities easy, medium, and hard and instructed to answer based on the estimated refactoring time and the amount of required context information to refactor correctly: easy for 10 minutes, medium for about 30 minutes and some required context information, and hard for an hour or more and deep source code knowledge. To estimate the refactoring time, we told the developers not to consider additional overhead such as adapting test cases or documenting the task in the issue tracker.

SQ2: Assuming you have one hour the next day to refactor long methods (code clones), would you remove the following long method (clone)? If not, please explain why. We instructed the developers to answer No if the refactoring was too complicated to be performed in an hour or if the developer did not want to remove the finding for any other reason. We took free text notes about the answer in case of a No.

Evaluation Criteria. For SQ1, we calculate the number of answers in the categories easy, medium, and hard. For a disagreement between two developers, we take the higher estimated refactoring time to not favor our results. For SQ2, we calculate the precision of each heuristic: how many findings would be removed by a developer divided by the overall number of evaluated findings. For anti-sets, we calculate an anti-precision – the number of rejected findings divided by the overall number of evaluated anti-recommendations. We aim for both a high precision and a high anti-precision. In case of a disagreement, we assume that one willing developer is sufficient to remove the finding and count the conflict as a Yes. This interpretation leads to a better precision but also to a worse anti-precision. Contrarily, evaluating a conflict as a No would lead to a worse precision, but a better anti-precision. Both interpretations favor one metric while penalizing the other – we avoid a bias by not only showing the resulting precisions, but also the raw answers.
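The following minimal Java sketch spells out this counting scheme for precision and anti-precision; the Answers record and its field names are illustrative only.

    /** Illustrative sketch of the survey evaluation metrics, with conflicts counted as acceptance. */
    class SurveyMetricsSketch {

        record Answers(int yes, int conflicts, int no) {
            int total() { return yes + conflicts + no; }
        }

        /** Precision of a heuristic: accepted findings (Yes or conflict) divided by evaluated findings. */
        static double precision(Answers a) {
            return (double) (a.yes() + a.conflicts()) / a.total();
        }

        /** Anti-precision of an anti-set: unique No answers divided by evaluated anti-recommendations. */
        static double antiPrecision(Answers a) {
            return (double) a.no() / a.total();
        }
    }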

5. RESULTS

5.1 How many code clones and long methods exist in software systems? (RQ1)

Figure 1 shows the benchmark. The clone coverage ranges from 2% to 26%, including application and test code, but excluding generated code. The duplicated code leads to 64 findings (clone classes) for ConQAT and up to 1032 findings for System B. In terms of long methods, the percentage of lines of code that lies in short methods (i. e., less than 40 LoC) varies significantly and ranges from 93% in ConQAT down to 24% for jEdit. Figure 1 shows the percentage of short methods with green, solid filled color, long methods with yellow, hatched color, and very long methods with red, double hatched color. There is a total number of 57 long methods in ConQAT and up to 857 long methods in Tomcat. (The ordering of systems based on the total number of long methods and the distribution of source code into short, long, and very long methods is not necessarily the same as, e. g., 50% of the code could be located in only three long methods.)

Conclusion. The code quality measured with clone coverage and code distribution over short and long methods varies between the 13 systems, revealing hundreds of findings in all systems. As the number of findings is too large to be inspected manually, the benchmark shows that prioritizing quality defects is necessary even for only two metrics.

Figure 1: Benchmark – (a) clone coverage, (b) number of clone classes, (c) percentage of code in short, long, and very long methods, and (d) number of long methods per system.
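To make the method classification behind Figure 1 (c) and (d) explicit, the following minimal Java sketch applies the thresholds used in the benchmark (short < 40 LoC, long ≥ 40 LoC, very long ≥ 100 LoC) and computes how much code falls into a given category; the method model is an assumption for illustration, not the benchmark tooling.

    /** Illustrative sketch of the short/long/very long method classification. */
    class MethodLengthDistributionSketch {

        enum Category { SHORT, LONG, VERY_LONG }

        static Category classify(int linesOfCode) {
            if (linesOfCode >= 100) return Category.VERY_LONG;
            if (linesOfCode >= 40) return Category.LONG;
            return Category.SHORT;
        }

        /** Fraction of code (in LoC) located in methods of the given category. */
        static double fractionOfCodeIn(Category category, int[] methodLengths) {
            long total = 0;
            long inCategory = 0;
            for (int loc : methodLengths) {
                total += loc;
                if (classify(loc) == category) {
                    inCategory += loc;
                }
            }
            return total == 0 ? 0.0 : (double) inCategory / total;
        }
    }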

5.2 Would developers remove the recommended findings? (RQ2)

Table 4 indicates the developers' estimation of how difficult it is to remove a finding (SQ1), based on the number of given answers easy, medium, and hard. For System F, the developers mostly estimated the difficulty to be medium or hard for the anti-sets. For the heuristics, they predominantly answered easy, confirming that our heuristics reveal findings that are easy to remove. For System C, the anti-sets do not reveal the highest costs for removing. Due to filtering repetitive code and static data map fillers, many of the methods in the anti-set are technically easy to refactor. However, developers would not have refactored these methods due to a lack of expected maintenance gain, which is reflected in the high anti-precision shown later. Hence, the lower expected costs are no threat to our prioritizing approach. Among all long method heuristics in Systems C and F, the inline-comment heuristic recommends findings that are the hardest to remove (highest number of medium and hard answers) as it is the only heuristic that does not use the data-flow analysis.

Table 4: Estimated Refactoring Costs (SQ1)

                                 System C              System F
                            Easy  Med.  Hard      Easy  Med.  Hard
Clones
  Com.-Base-Class             5     3     –         7     1     –
  Com.-Interface              3     2     –         6     1     1
  Extract-Block               2     4     2         5     1     2
  Anti                        5     1     2         2     1     5
Long Methods
  Inline-Comment              2     5     1         4     2     2
  Extract-Block               5     3     –         8     –     –
  Extract-Commented-Block     2     2     –         6     2     –
  Method-Split                6     2     –         5     3     –
  Anti                        5     3     –         4     1     3

For each heuristic and anti-set, Table 5 shows the unique Yes and No answers to SQ2 as well as the number of conflicts. Those three numbers sum up to 8 – the number of samples per heuristic – unless fewer than 8 findings were recommended in total. Summing up all Yes, No, and conflicting answers (counting as Yes) on findings recommended by a heuristic, 84 of 105 (80%) recommended findings were accepted for removal. We conclude that overall, prioritizing by low refactoring costs largely matches the developers' opinions. The table further indicates precision and anti-precision: The precisions for all heuristics are very high (above 75%), except for the extract-block heuristic on clones for System C (50%) and the extract-block heuristic on long methods for System F (63%). The anti-precisions are between 63% and 100%. This confirms that developers are willing to remove the findings recommended by a heuristic and mostly confirmed to not remove findings from our anti-group. The anti-precisions of 63% on long methods result from many conflicts in the anti-sets: As we count a conflict as a Yes, the anti-precision drops to 63% although most answers were unique Nos.

Interrater Agreement. Developers gave more conflicting answers for long methods (7 conflicts) than for clones (2 conflicts). The removal decision seemed to be more obvious for a global than for a local finding. In System F, most conflicts were on long methods containing anonymous inner classes. Some developers wanted to perform the easy refactoring (which was not considered by our heuristics), others argued to prioritize other long methods as they considered the anonymous classes to be less of a threat for maintainability. Other reasons for conflicts included different opinions about how easy a long method is to understand and how critical the code is in which a long method or clone is located.

Conclusion. The survey showed that the heuristics recommend findings that developers consider to be easy to remove, with a very high precision. Developers stated that 80% of the recommended findings are useful to remove. Also, they would not have prioritized most findings from the anti-set. Hence, our approach provides a very good recommendation.

Discussion. One could argue that with our prioritization developers do not carry out more complex and perhaps more critical defect resolutions. However, even if the obviously critical findings are treated separately, the remaining number of findings is still large enough to require prioritization: if the maintenance gain is expected to be equal, then prioritization based on low removal costs is a useful strategy.

Table 5: Decisions to Remove Findings (SQ2)

                                    System C                              System F
                         Yes  Conflict  No  Precision          Yes  Conflict  No  Precision
Clones
  Common-Base-Class       6      –      2   0.75                 8      –      –   1.0
  Common-Interface        4      –      1   0.8                  6      –      2   0.75
  Extract-Block           4      –      4   0.5                  5      1      2   0.75
  Anti                    1      1      6   0.75 (Anti)          –      –      8   1.0 (Anti)
Long Methods
  Inline-Comment          7      –      1   0.89                 7      –      1   0.88
  Extract-Block           8      –      –   1.0                  5      –      3   0.63
  Extract-Comm.-Block     2      1      1   0.75                 6      –      2   0.75
  Method-Split            7      –      1   0.89                 5      2      1   0.88
  Anti                    2      1      5   0.63 (Anti)          –      3      5   0.63 (Anti)

5.3 What additional context information do developers consider? (RQ3)

As a second goal of this work, we quantify which additional context information developers consider. Table 6 shows the reasons why developers did not prioritize a recommended finding. For System F, developers rejected three clones because they are located in two different components which were cloned from each other. The developers rather wanted to wait for a general management decision whether to remove the complete redundancy between the two components. They rejected other clones because they were located in different Eclipse projects or because they were located in a manually written instead of a generated data object (DTO). Developers rejected most long methods because they did not consider the code to be critical to understand (code without business logic, UI code, logging, or filling data objects) and, hence, not expected to threaten maintainability. Other reasons include methods that were maybe unused and methods that were not expected to be changed soon.

Most rejected clones from System C were due to redundant XML schemas. As the creation of the XML sheets is not in control of the developers, they cannot remove the redundancy. They further rejected clones that contained only minor configuration code and were, thus, not considered to be error-prone. Developers of System C rejected long methods which had no complexity or critical business logic.

Conclusion. The survey shows that the considered context information depends on the type of finding or, more generally, on the nature of the analysis: For the global analysis of clones, the developers considered a lot more external context information unrelated to the source code. In contrast, for the local analysis of long methods, they rejected the findings mostly due to the nature of the code itself: Developers did not remove clones primarily when they either waited for a general design decision or when the root cause for the redundancy was out of their scope. They rejected long methods mostly when they did not consider the method hard to understand (e. g., UI code, logging, filling data objects).

Table 6: Reasons for Rejections

System   Findings       Reason                                                  Count
F        Clones         Major cloned component                                    3
F        Clones         Different Eclipse projects                                1
F        Clones         Manually written instead of generated DTO                 1
F        Long methods   Only UI code (not critical, not to test with unit tests)  3
F        Long methods   No business logic                                         2
F        Long methods   Only logging                                              1
F        Long methods   Only filling data objects                                 1
F        Long methods   Maybe unused code                                         1
F        Long methods   No expected changes                                       1
C        Clones         Redundancy due to XML schema                              4
C        Clones         Only configuration code                                   2
C        Clones         Logging, no critical code                                 1
C        Long methods   No complexity                                             2
C        Long methods   No business logic (formatting strings, filling data
                        objects)                                                  2

6. CONCLUSION AND FUTURE WORK

We proposed a heuristic approach to recommend code clones and long methods that are easy to refactor. We evaluated in a survey with two industry systems if developers are willing to remove the recommended findings. The survey showed that 80% of all evaluated, recommended findings would have been removed by developers. When developers would not have removed a finding, we analyzed what context information they used for their decision. For the global analysis of code clones, developers considered a lot more external context information than for the local analysis of long methods: Developers did not want to remove code clones mostly due to external reasons unrelated to the code itself, as they were waiting for a general design decision or because the redundancy was caused by external factors. Developers rejected to shorten long methods mostly due to the nature of the code, when the methods were not hard to understand, e. g., if they contained UI code, logging code, or only filled data objects. In the future, we want to conduct a larger case study to show if our results from the clone and long method analysis can be transferred to other global and local analyses, too, and, hence, can be generalized.

Acknowledgement. We deeply appreciate the cooperation with our two industrial partners SWM and LV 1871, who provided us their source code and time. We also thank Fabian Streitel for his implementation of the Java data flow analysis heuristic.

7. REFERENCES
[1] V. Bauer, L. Heinemann, B. Hummel, E. Juergens, and M. Conradt. A framework for incremental quality analysis of large software systems. In ICSM '12, 2012.
[2] C. Boogerd and L. Moonen. Prioritizing software inspection results using static profiling. In SCAM '06, 2006.
[3] C. Boogerd and L. Moonen. Ranking software inspection results using execution likelihood. In Proceedings Philips Software Conference, 2006.
[4] C. Boogerd and L. Moonen. Using software history to guide deployment of coding standards, chapter 4, pages 39–52. Embedded Systems Institute, Eindhoven, The Netherlands, 2009.
[5] S. Bouktif, G. Antoniol, E. Merlo, and M. Neteler. A novel approach to optimize clone refactoring activity. In GECCO '06, 2006.
[6] F. Deissenboeck, E. Juergens, B. Hummel, S. Wagner, B. M. y Parareda, and M. Pizka. Tool support for continuous quality control. IEEE Softw., 2008.
[7] S. Ducasse, M. Rieger, and G. Golomingi. Tool support for refactoring duplicated OO code. In ECOOP '99, 1999.
[8] S. Eick, T. Graves, A. Karr, J. Marron, and A. Mockus. Does code decay? Assessing the evidence from change management data. IEEE Softw., 2001.
[9] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston, MA, USA, 1999.
[10] Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue. Refactoring support based on code clone analysis. In PROFES '04, pages 220–233. Springer, 2004.
[11] Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue. Aries: Refactoring support environment based on code clone analysis. In SEA 2004, 2004.
[12] Y. Higo, Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue. On software maintenance process improvement based on code clone analysis. In PROFES '02, 2002.
[13] B. Hummel, E. Juergens, L. Heinemann, and M. Conradt. Index-based code clone detection: Incremental, distributed, scalable. In ICSM '10, 2010.
[14] E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner. Do code clones matter? In ICSE '09, 2009.
[15] S. Kim and M. Ernst. Prioritizing warning categories by analyzing software history. In MSR '07, 2007.
[16] R. Koschke, R. Falke, and P. Frenzel. Clone detection using abstract syntax suffix trees. In WCRE '06, 2006.
[17] M. M. Lehman. Laws of software evolution revisited. In EWSPT '96, 1996.
[18] R. C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. Prentice Hall PTR, 2008.
[19] I. Moore. Automatic inheritance hierarchy restructuring and method refactoring. In OOPSLA '96, 1996.
[20] E. Murphy-Hill and A. P. Black. Breaking the barriers to successful refactoring: Observations and tools for extract method. In ICSE '08, 2008.
[21] D. Steidl and N. Goede. Feature-based detection of bugs in clones. In IWSC '13, 2013.
[22] D. Steidl, B. Hummel, and E. Juergens. Quality analysis of source code comments. In ICPC '13, 2013.
[23] A. van Deursen and L. Moonen. The video store revisited – thoughts on refactoring and testing. In XP '02, 2002.
[24] R. Venkatasubramanyam, S. Gupta, and H. Singh. Prioritizing code clone detection results for clone management. In IWSC '13, 2013.


7.5 Prioritizing Findings in a Quality Control Process

The paper “Continuous Software Quality Control in Practice” (Paper E, [85]) embeds the iterative finding prioritization into a quality control process. The evaluation shows that the process creates a set-up in which the combined effort of management, developers, and quality engineers actually results in a long-term decreasing number of findings.

Goal The paper provides an enhanced quality control process to show how code quality can effectively be improved in practice.

Approach The quality control process aims to integrate developers, quality engineers, and managers. After defining a quality goal and quality criteria, the process is applied iteratively: In each iteration, the quality engineer manually inspects the analysis results and selects a subset of the revealed findings for removal. He prioritizes findings in new code and lately modified code. For each selected finding, he writes a task for the developers. In regular time intervals, he forwards a report to management to document the effect of the provided quality improvement resources. In the paper, the description of the approach is enhanced with details from a running case study of 41 systems of the Munich RE company in the .NET and SAP area.

Evaluation We evaluated the approach with a qualitative impact analysis of four sample systems at the reinsurance company Munich RE. With the tool Teamscale (Paper F), we were able to analyze the complete history of these four systems and evaluate the number of findings before and after quality control was introduced: First, we chose the number of clones, long files, long methods, and high nesting findings as one trend indicator as these quality findings require manual effort to be removed—in contrast to, e. g., formatting violations that can be resolved automatically. We refer to this trend as the overall findings trend. Second, we provide the clone coverage trend and compare both trends with the system size evolution, measured in SLOC.
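As an illustration of how such a trend can be derived, the following minimal Java sketch counts, per analyzed snapshot, only the finding categories that require manual removal effort. The category names and the snapshot model are assumptions for illustration and not Teamscale's actual data model.

    import java.util.*;

    /** Illustrative sketch of the overall findings trend: manual-effort findings per snapshot. */
    class FindingsTrendSketch {

        static final Set<String> MANUAL_EFFORT_CATEGORIES =
                Set.of("code clone", "long file", "long method", "deep nesting");

        record Snapshot(String date, List<String> findingCategories) {}

        /** Number of manual-effort findings per snapshot, in snapshot order. */
        static List<Long> overallFindingsTrend(List<Snapshot> history) {
            List<Long> trend = new ArrayList<>();
            for (Snapshot snapshot : history) {
                long count = snapshot.findingCategories().stream()
                        .filter(MANUAL_EFFORT_CATEGORIES::contains)
                        .count();
                trend.add(count);
            }
            return trend;
        }
    }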

Results The study reveals a positive trend only after the introduction of the quality control: the overall findings trend as well as the clone coverage decrease for all systems after the process started, even though the systems still grow in size. Those trends go beyond anecdotal evidence but are not sufficient to scientifically prove our method. However, Munich RE decided only recently to extend our quality control from the .NET area to all SAP development. As Munich RE develops mainly in these two areas, most application development is now supported by quality control. The decision to extend the scope of quality control confirms that Munich RE is convinced by the benefit of quality control.

Thesis Contribution This paper shows that the change-based prioritization of internal findings in new and lately modified code (as described in Section 6.2.3) can be effectively applied in practice. It further demonstrates how the joint effort of developers, quality engineers, and managers in a quality control process can allocate enough resources to remove findings continuously (see Section 6.5) and improve quality in the long term, despite ongoing cost and time pressure.

2014 IEEE International Conference on Software Maintenance and Evolution

Continuous Software Quality Control in Practice

Daniela Steidl∗, Florian Deissenboeck∗, Martin Poehlmann∗, Robert Heinke†, Bärbel Uhink-Mergenthaler†
∗CQSE GmbH, Garching b. München, Germany
†Munich RE, München, Germany

Abstract—Many companies struggle with unexpectedly high maintenance costs for their software development which are often caused by insufficient code quality. Although companies often use static analysis tools, they do not derive consequences from the metric results and, hence, the code quality does not actually improve. We provide an experience report of the quality consulting company CQSE, and show how code quality can be improved in practice: we revise our former expectations on quality control from [1] and propose an enhanced continuous quality control process which requires the combination of metrics, manual action, and a close cooperation between quality engineers, developers, and managers. We show the applicability of our approach with a case study on 41 systems of Munich RE and demonstrate its impact.

This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant EvoCon, 01IS12034A. The responsibility for this article lies with the authors.

I. INTRODUCTION

Software systems evolve over time and are often maintained for decades. Without effective counter measures, the quality of software systems gradually decays [2], [3] and maintenance costs increase. To avoid quality decay, continuous quality control is necessary during development and later maintenance [1]: for us, quality control comprises all activities to monitor the system's current quality status and to ensure that the quality meets the quality goal (defined by the principal who outsourced the software development or the development team itself). Research has proposed various metrics to assess software quality, including structural metrics (e. g., file size, method length, or nesting depth) or code duplication, and has led to a massive development of analysis tools [4]. Much of current research focuses on better metrics and better tools [1], and mature tools such as ConQAT [5], Teamscale [6], or Sonar (http://www.sonarqube.org) have been available for several years.

In [1], we briefly illustrated how tools should be combined with manual reviews to improve software quality continuously, see Figure 1: We perceived quality control as a simple, continuous feedback loop in which metric results and manual reviews are used to assess software quality. A quality engineer – a representative of the quality control group – provides feedback to the developers based on the differences between the current and the desired quality. However, we underestimated the amount of required manual action to create an impact. Within five years of experience as software quality consultants in different domains (insurance companies, automotive manufacturers, or engineering companies), we frequently experienced that tool support alone is not sufficient for successful quality control in practice. We have seen that most companies cannot create an impact on their code quality although they employ tools for quality measurements because the pressure to implement new features does not allow time for quality assurance: often, newly introduced tools get attention only for a short period of time, and are then forgotten. Based on our experience, quality control requires actions beyond tool support.

Fig. 1. The former understanding of a quality control process

In this paper, we revise our view on quality control from [1] and propose an enhanced quality control process. The enhanced process combines automatic static analyses with a significantly larger amount of manual action than previously assumed to be necessary: Metrics constitute the basis but quality engineers must manually interpret metric results within their context and turn them into actionable refactoring tasks for the developers. We demonstrate the success and practicability of our process with a running case study with Munich RE which contains 32 .NET and 9 SAP systems.

II. TERMS AND DEFINITIONS

• A quality criterion comprises a metric and a threshold to evaluate the metric. A criterion can be, e. g., to have a clone coverage below 10% or to have at most 30% code in long methods (e. g., methods with more than 40 LoC).
• (Quality) Findings result from a violation of a metric threshold (e. g., a long method) or from the result of a static code analysis (e. g., a code clone).
• Quality goals describe the abstract goal of the process and provide a strategy how to deal with new and existing findings during further development: The highest goal is to have no findings at all, i. e., all findings must be removed immediately. Another goal is to avoid new findings, i. e., existing findings are tolerated but new findings must not be introduced. (Section III-B will provide more information.)
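The notion of a quality criterion defined above can be sketched as a metric paired with a threshold whose violation yields a finding. The following minimal Java example is illustrative only; the class and field names are assumptions and do not reflect the tooling used in the case study.

    /** Illustrative sketch of a quality criterion: a metric plus a threshold. */
    class QualityCriterionSketch {

        /** A measured value for one entity, e. g. the clone coverage of a project. */
        record MetricValue(String entity, String metricName, double value) {}

        /** A criterion such as "clone coverage below 10%": violated when the measured value exceeds the threshold. */
        record QualityCriterion(String metricName, double threshold) {
            boolean isViolatedBy(MetricValue m) {
                return m.metricName().equals(metricName) && m.value() > threshold;
            }
        }

        public static void main(String[] args) {
            QualityCriterion cloneCoverageBelowTenPercent = new QualityCriterion("clone coverage", 0.10);
            MetricValue measured = new MetricValue("project X", "clone coverage", 0.14);
            if (cloneCoverageBelowTenPercent.isViolatedBy(measured)) {
                System.out.println("Finding: clone coverage above 10% in " + measured.entity());
            }
        }
    }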

III. THE ENHANCED QUALITY CONTROL PROCESS

Our quality control process is designed to be transparent (all stakeholders involved agree on the goal and consequences of failures), actionable (measuring must cause actions), and binding (quality engineers, developers, and managers follow the rules they agreed on). The following three main activities reflect these conceptual ideas and are depicted in Figure 2.
1) The quality engineer defines a quality goal and specific criteria for the underlying project in coordination with management and in agreement with the developers. Criteria are usually company-specific, the goal needs to be adapted for each system. A common definition of the quality goal and criteria makes the process transparent.
2) The quality engineer takes over responsibility to manually interpret the results of automatic analyses. He reviews the code in order to specify actionable and comprehensible refactoring tasks at implementation level for the developers which will help to reach the predefined quality goal. This makes the process actionable as metric results are turned into refactoring tasks.
3) In certain intervals, the quality engineer documents which quality criteria are fulfilled or violated in a quality report and communicates it to the management. This creates more quality awareness at management level. The project manager reviews and schedules the tasks of the quality engineer, making the process binding for management (to provide the resources) and for the developers (to implement the suggested refactorings).

Fig. 2. The enhanced quality control process

In the following, we will discuss the process in detail. To demonstrate the practicability of our approach, we combine the description of the process with details from a running case study of more than four years at Munich RE. When we enhance the process description with specific information from Munich RE, we use a "@Munich RE" annotation as follows:

@Munich RE. The case study demonstrates how we apply the quality control process to 32 .NET and 9 SAP software systems mainly developed by professional software services providers on and off the premises of Munich RE. These applications comprise ∼11,000 kLoC (8,800 kLoC maintained manually, 2,200 kLoC generated) and are developed by roughly 150 to 200 developers distributed over multiple countries.

A. Quality Engineer

The quality engineer (QE) is in charge of interpreting the metric results, reviewing the system, writing the report and communicating it to developers and managers. The QE must be a skilled and experienced developer: he must have sufficient knowledge about the system and its architecture to specify useful refactoring tasks. This role can be carried out by either a member of the development team or by an external company. Based on our experience, most teams benefit from a team-external person providing neutral feedback and not being occupied by daily development. We believe that the process's success is independent of an internal or external QE, but dependent on the skills of the quality engineer.

@Munich RE. In all 41 projects of our case study, CQSE fulfills the tasks of a quality engineer who operates on site. Due to regular interaction with the quality engineer, developers perceive him as part of the development process.

B. Quality Goals

The right quality goal for a project constitutes the backbone for a successful control process. Based on our experience, the following four goals cover the range of different project needs.
QG1 (indifferent) Any quality sufficient – No quality monitoring is conducted at all.
QG2 (preserving) No new quality findings – Existing quality findings are tolerated but new quality findings must not be created (or consolidated immediately).
QG3 (improving) No quality findings in modified code – Any code (method) being modified must be without findings, leading to continuous quality improvement during further development, similar to the boy scouts rule also used for extreme programming ("Always leave the campground cleaner than you found it" [7]).
QG4 (perfect) No quality findings at all – Quality findings must not exist in the entire project.

@Munich RE. Among the 41 projects, 2 projects are under QG1, 18 under QG2, 10 under QG3 and 11 under QG4.

C. Quality Criteria

The quality criteria depend on project type and technologies used. The quality engineer is responsible for a clear communication of the criteria to all involved parties, including third-party contractors for out-sourced software development.

@Munich RE. For .NET projects, we use 10 quality criteria based on company internal guidelines and best practices: code redundancy, structural criteria (nesting depth, method length, file size), compiler warnings, test case results, architecture violations, build stability, test coverage, and coding guidelines violations. For SAP systems, we use seven quality criteria with similar metrics but different thresholds: clone redundancy, structural criteria, architecture violations, critical warnings and guideline violations.
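Read as filters on findings, the quality goals QG1–QG4 above determine which findings a team must act on. The following minimal Java sketch is an illustration under simplifying assumptions (a finding only knows whether it is new since the baseline and whether its code was modified since then); it is not the process tooling itself.

    /** Illustrative sketch of the quality goals interpreted as finding filters. */
    class QualityGoalSketch {

        enum QualityGoal { QG1_INDIFFERENT, QG2_PRESERVING, QG3_IMPROVING, QG4_PERFECT }

        record Finding(boolean newSinceBaseline, boolean inCodeModifiedSinceBaseline) {}

        /** Returns true if the finding must be addressed under the given quality goal. */
        static boolean mustBeAddressed(QualityGoal goal, Finding finding) {
            switch (goal) {
                case QG1_INDIFFERENT: return false;                      // no quality monitoring at all
                case QG2_PRESERVING:  return finding.newSinceBaseline(); // only new findings
                case QG3_IMPROVING:   return finding.newSinceBaseline()
                                          || finding.inCodeModifiedSinceBaseline(); // new or touched code
                case QG4_PERFECT:     return true;                       // no findings tolerated at all
                default:              return false;
            }
        }
    }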

D. Dashboard

As part of quality control, a dashboard provides a customized view on the metric results for both developers and quality engineers and makes the quality evaluation transparent (see Figure 2). The dashboard should support the developer to fulfill the quality goal of his projects and show him only findings relevant to the quality goal: For QG2 and QG3, we define a reference system state (baseline) and only show new findings (QG2) or new findings and findings in modified code (QG3) since then. Measuring quality relative to the baseline motivates the developers as the baseline is an accepted state of the system and only the relative system's quality is evaluated.

@Munich RE. We use ConQAT as the analysis tool and dashboard, which integrates external tools like FxCop, StyleCop, or the SAP code inspector.

E. Quality Report

In regular time intervals, the quality engineer creates a quality report for each project. The report gives an overview of the current quality status and outlines the quality trend since the last report: The report contains the interpretation of the current analysis results as well as manual code reviews. To evaluate the trend of the system's quality, the quality engineer compares the revised system to the baseline with respect to the quality goal. He discusses the results with the development team. The frequency of the quality report triggers the feedback loop and is in accordance with the release schedule. The quality engineer forwards the report to the project management to promote awareness for software quality. This guarantees that developers do not perceive quality control as an external short-term appearance, but as an internal management goal of the company.

@Munich RE. Quality reports are created for the majority of applications every three or four months. Exceptions are highly dynamic applications with five reports a year and applications with little development activities with only one report a year.

F. Task List

Based on the differences in the current findings and the quality goal, the quality engineer manually derives actionable refactoring tasks (e. g., remove the redundancy between class A and B by creating a super class C and pulling up methods x, y, z). He forwards these tasks to the developers and the project manager who can decide to accept or reject a task. For accepted refactorings, the project manager defines a due date. For rejected refactorings, he discusses the reasons for rejection with the quality engineer. Before the next report is due, the quality engineer checks that developers completed the scheduled tasks appropriately. The task list constitutes the focal point in the combination of tools, quality engineers, and management. The success of the process depends on the ability of the quality engineer to create tasks that effectively increase the system's quality.

@Munich RE. The quality engineer manually inspects all findings relevant to the project's quality goal. He creates tasks for findings he considers to hamper maintainability. It remains up to him to black-list findings that are not critical for code understanding or maintenance.

IV. IMPACT @MUNICH RE

Proving the process' success scientifically is difficult for several reasons: First, we do not have the complete history of all systems available. Second, we cannot make any reliable prediction about the quality evolution of these systems in case our process would not have been introduced. Consequently, we provide a qualitative impact analysis on four sample systems with an evolution history of up to five years rather than a quantitative impact analysis of all 41 systems. For these samples, we are able to show the quality evolution before and after the introduction of quality control.

First, we chose the number of clones, long files, long methods, and high nesting findings as one trend indicator as these quality findings require manual effort to be removed – in contrast to, e. g., formatting violations that can be resolved automatically. We refer to this trend as the overall findings trend. Second, we provide the clone coverage trend and compare both trends with the system size evolution, measured in SLoC. All trends were calculated with Teamscale [6]. (In contrast to ConQAT, Teamscale can incrementally analyze the history of a system including all metric trends within feasible time. Hence, although ConQAT is used as dashboard within Munich RE, we used Teamscale to calculate the metric trends.)

We show these trends exemplarily for the three systems with the longest available history (to be able to show a long-term impact) and with a sufficiently large development team size (to make the impact of the process independent from the behavior of a single developer). As these three systems are all QG3, we choose, in addition, one QG4 system (Table I).

TABLE I: SAMPLE OBJECTS FOR IMPACT ANALYSIS

Name  QG  Quality Control  Size      # Developers
A     3   4 years          200 kLoC  4
B     3   5 years          276 kLoC  5
C     3   3 years          341 kLoC  4
D     4   1.5 years        15 kLoC   2

Figures 3–6 show the evolution of the system size in black, the findings trend in red (or gray in black-white print), and the clone coverage in orange (or light gray in black-white print). The quality controlled period is indicated with a vertical line for each report, i. e., quality control starts with the first report date.

Figure 3 shows that our quality control has a great impact on System A: Prior to quality control, the system size grows as well as the number of findings. During quality control, the system keeps growing but the number of findings declines and the clone coverage reaches a global minimum of 5%. This shows that the quality of the system can be measurably improved even if the system keeps growing.

Fig. 3. System A. Fig. 5. System C. (Each figure shows the evolution of system size in SLoC, the number of findings, and the clone coverage over time.)

Fig. 4. System B. Fig. 6. System D. (Each figure shows the evolution of system size in SLoC, the number of findings, and the clone coverage over time.)

For System B, quality control begins in 2010. However, this system has already been in a scientific cooperation with the Technische Universitaet Muenchen since 2006, in which a dashboard for monitoring the clone coverage had been introduced. Consequently, the clone coverage decreases continuously in the available history (Figure 4). The number of findings, however, increases until mid 2012. In 2012, the project switched from QG2 to QG3. After this change, the number of findings decreases and the clone coverage settles around 6%, which is a success of the quality control. The major increase in the number of findings in 2013 is only due to an automated code refactoring introducing braces that led to threshold violations of a few hundred methods. After this increase, the number of findings starts decreasing again, showing the manual effort of the developers to remove findings.

For System C (Figure 5), the quality control process shows a significant impact after two years: Since the end of 2012, when the project also switched from QG2 to QG3, both the clone coverage and the overall number of findings decline. In the year before, the project transitioned between development teams and, hence, we only wrote two reports (July 2011 and July 2012).

System D (Figure 6) almost fulfills QG4 as after 1 year of development, it has only 21 findings in total and a clone coverage of 2.5%. Technically, under QG4, the system should have zero findings. However, in practice, exactly zero findings is not feasible as there are always some findings (e. g., a long method to create UI objects or clones in test code) that are not a major threat to maintainability. Only a human can judge based on manual inspection of the findings whether a system still fulfills QG4 if it does not have exactly zero findings. In the case of System D, we consider 21 findings to be few and minor enough to fulfill QG4.

To summarize, our trends show that our process leads to actual measurable quality improvement. Those trends go beyond anecdotal evidence but are not sufficient to scientifically prove our method. However, Munich RE decided only recently to extend our quality control from the .NET area to all SAP development. As Munich RE develops mainly in the .NET and SAP area, most application development is now supported by quality control. The decision to extend the scope of quality control confirms that Munich RE is convinced by the benefit of quality control. Since the process has been established, maintainability issues like code cloning are now an integral part of discussions among developers and management.

V. CONCLUSION

Quality analyses must not be solely based on automated measurements, but need to be combined with a significant amount of human evaluation and interaction. Based on our experience, we proposed a new quality control process for which we provided a running case study of 41 industry projects. With a qualitative impact analysis at Munich RE we showed measurable, long-term quality improvements. Our process has led to measurable quality improvement and an increased maintenance awareness up to management level at Munich RE.

REFERENCES
[1] F. Deissenboeck, E. Juergens, B. Hummel, S. Wagner, B. M. y Parareda, and M. Pizka, "Tool support for continuous quality control," IEEE Software, 2008.
[2] D. L. Parnas, "Software aging," in ICSE '94.
[3] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus, "Does code decay? Assessing the evidence from change management data," IEEE Trans. Software Eng., 2001.
[4] P. Johnson, "Requirement and design trade-offs in Hackystat: An in-process software engineering measurement and analysis system," in ESEM '07.
[5] F. Deissenboeck, M. Pizka, and T. Seifert, "Tool support for continuous quality assessment," in STEP '05.
[6] L. Heinemann, B. Hummel, and D. Steidl, "Teamscale: Software quality control in real-time," in ICSE '14.
[7] R. C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, 2008.


7.6 Tool Support for Finding Prioritization

The paper “Teamscale: Software Quality Control in Real-Time” (Paper F, [40]) focuses on the tool support to make the usage of static analysis applicable in practice.

Goal The paper presents the code analysis tool Teamscale which provides results from static analyses in real-time and integrates well into daily development.

Approach Teamscale is a quality analysis and management tool that overcomes the drawbacks of existing static analysis tools: Whereas existing tools operate in batch-mode, Teamscale provides up-to-date static analysis results within seconds. Based on incremental quality analyses, the tool processes each commit individually and, hence, provides real-time feedback to the developer. The tool has the full history of the code quality's evolution at its disposal, supporting efficient root cause analysis for specific quality problems. Combining incremental analysis with findings tracking, the tool distinguishes between old and new findings very accurately. This is a prerequisite for developers to tame the findings flood and to focus on newly created problems.

Evaluation We evaluated our tool with a development team from the German life insurance company LV 1871 which comprises about ten developers, among which six took part in the evaluation. We first gave the developers a short introduction to the tool. Afterwards, the developers were asked to solve specific tasks with Teamscale and to fill out an anonymous online survey about their user experience.

Results The evaluation showed that the developers were able to complete almost all tasks successfully and indicated a high user acceptance of the tool. The developers stated in the survey that they would like to use Teamscale for their own development: The tool provides fast enough feedback to integrate static analysis well into daily development and enables developers to accurately differentiate between old and new findings.

Thesis Contribution Teamscale provides the underlying tool support for this thesis: On the one hand, its incremental analysis engine allowed us to implement the case studies of Papers B, A, and E. On the other hand, it provides the necessary technical tool foundation to make the change-based prioritization and the quality control process applicable in practice (Section 6.5.3): this paper shows how drawbacks with existing static analysis tools can be overcome and how the tool helps to apply static analysis effectively in practice; its accurate findings tracking makes it possible to differentiate between new and old findings and, hence, helps the developers to manage the findings flood (Section 6.2.3).


Contribution of the Author As second author, we conducted the evaluation of the tool at our industrial partner and created the demonstration video1 necessary for the submission as a tool paper. We contributed significantly to the writing but only partly to the implementation of Teamscale by providing the origin analysis [89] for the findings tracking.

1https://www.cqse.eu/en/products/teamscale/overview/

Teamscale: Software Quality Control in Real-Time∗

Lars Heinemann Benjamin Hummel Daniela Steidl CQSE GmbH, Garching b. München, Germany {heinemann,hummel,steidl}@cqse.eu

ABSTRACT

When large software systems evolve, the quality of source code is essential for successful maintenance. Controlling code quality continuously requires adequate tool support. Current quality analysis tools operate in batch-mode and run up to several hours for large systems, which hampers the integration of quality control into daily development. In this paper, we present the incremental quality analysis tool Teamscale, providing feedback to developers within seconds after a commit and thus enabling real-time software quality control. We evaluated the tool within a development team of a German insurance company. A video demonstrates our tool: http://www.youtube.com/watch?v=nnuqplu75Cg.

Categories and Subject Descriptors: D.2.9 [Software Engineering]: Management—Software quality assurance (SQA)
General Terms: Management, Measurement
Keywords: Quality control, static analysis, incremental, real-time

∗This work was partially funded by the German Federal Ministry of Education and Research (BMBF), grant "EvoCon, 01IS12034A". The responsibility for this article lies with the authors.

1. INTRODUCTION

In many domains, software evolves over time and is often maintained for decades. Without effective counter measures, the software quality gradually decays [4, 10], increasing maintenance costs. To avoid long-term maintenance costs, continuous quality control is necessary and comprises different activities: When developers commit changes, they should be aware of the impact on the system's quality and, when introducing new quality defects, they should remove them appropriately. Project and quality managers should check the quality status in regular intervals and take actions to improve quality. Research has proposed metrics to assess software quality, including structural metrics (e. g., nesting depth) or redundancy measurements (e. g., code cloning), leading to a massive development of analysis tools.

However, existing tools have three major disadvantages. First, they require up to several hours of processing time for large systems [1]. Consequently, quality analyses are scheduled to off-work hours (e. g., the nightly build) and developers are not notified of problems in their code until the next morning. However, by then, they often already switched context and work on a different task. Second, development teams produce hundreds of commits per day. Nightly analyses aggregate them and make it hard to identify the root causes for unexpected quality changes. Instead, developers must reverse engineer the root causes from the versioning system. Third, analysis techniques reveal thousands of quality problems for a long-lived software, making it infeasible to fix all quality issues at once. With existing tools, developers cannot differentiate between old and new quality defects, which often results in frustration and no improvement at all.

Problem Statement. Existing quality analysis tools have long feedback cycles and cannot differentiate between old and new findings, hampering integration into daily development.

Teamscale (http://www.teamscale.com) overcomes these drawbacks with incremental quality analyses [1]. By processing each commit individually, it provides real-time feedback and reveals the impact of a single commit on the system's quality. With the full history of the code quality at its disposal, Teamscale supports efficient root cause analysis for specific quality problems. Combining incremental analysis with tracking [11] of quality issues, the tool distinguishes between old and new quality defects, encouraging developers to fix recently added findings.

Contribution. Teamscale provides quality analysis in real-time, integrating into the daily development. With a web front-end and an IDE integration, Teamscale offers different perspectives on a system's quality. It provides an overview of all commits and their impacts on quality and a file-based view on source code annotated with quality defects. It also offers an aggregated view on the current quality status with dashboards and enables root cause analysis for specific quality problems.

2. RELATED WORK

In the current state of the art, many quality analysis tools already exist. There are numerous tools, for example, for the analysis of a specific quality metric or defect. While some of them are even incremental [5], Teamscale aims at providing a differentiated, all-encompassing perspective of software quality by using a broad spectrum of metrics and analyses.

Distributed Quality Analysis Tools. Shang et al. [12] propose to address scalability by adapting tools which originally were designed for a non-distributed environment and deploying them on a distributed platform such as MapReduce. Using enough computing resources, their approach can reduce the feedback to developers to minutes. However, their approach focuses on individual analyses and does not provide a comprehensive front-end to the quality analysis data. Additionally, our approach of incremental analysis requires significantly less computing resources.

Quality Dashboards. Multiple tools provide the user with a dashboard containing various aggregations and visualizations of a system's source code quality. Tools include ConQAT [3], SonarQube (formerly called Sonar, http://www.sonarqube.org), or the Bauhaus Suite (http://www.axivion.com/products.html). All of these tools provide a custom analysis engine, integrate the results of other existing analysis tools, and provide configurable means of displaying the results. All of them, however, require multiple hours on large systems, making real-time feedback and fine-grained root cause analysis impossible.

Incremental Quality Analysis Tools. The conceptually most similar tool to ours is SQUANER [7]. It also proceeds incrementally, and updates are triggered by commits. It also aims to provide rapid developer feedback. However, the calculated metrics are file-based and thus limited to local analyses (see [1] for more details). Unfortunately, the corresponding web site is unreachable, but the paper suggests that the data representation does not support root cause analysis and that no tracking of quality defects is provided.

Figure 1: High-Level Architecture (the incremental analysis engine with scheduler, NoSQL store, and workers connects data sources such as SVN, TFS, and the file system via a REST service layer to the web, Eclipse, and Visual Studio clients)

3. ARCHITECTURE

Teamscale is a client/server application: the quality analyses are performed on a central server and different user clients present the results. Figure 1 shows an overview of Teamscale's architecture: The incremental analysis engine in the back-end connects to various different data sources, e. g., version control systems. A REST service layer provides the interface of the analysis results for different front-end clients.

Data Sources. Teamscale directly connects to different version control systems (VCS), such as Subversion, Git, and Microsoft's TFS. It constantly monitors the activity in the VCS and directly fetches its changes. Each commit in the VCS triggers the update of all quality analyses. Consequently, in contrast to batch tools, Teamscale does not need to be integrated with continuous integration tools. Teamscale is able to analyze multi-language projects, supporting all widely used languages, such as Java, C#, C/C++, JavaScript, and ABAP. Teamscale further relates quality analysis results with bug or change request data as it connects to bug trackers such as Jira, Redmine, and Bugzilla.

Incremental Analysis Engine. In the back-end, the incremental analysis engine [1] is built on top of the tool ConQAT [3]. Instead of analyzing a complete software system in batch-mode, it updates quality metrics and findings with each commit to the VCS. Our study in [1] showed that our incremental analyses perform significantly faster than batch analyses. As commits typically consist of only few files, Teamscale computes their impact on the system's source code quality within seconds, providing rapid feedback for the developer. Monitoring the quality changes based on single commits enables a fine-grained root cause analysis.

Storage System. Teamscale has a generic data storage interface supporting various NoSQL data stores (such as Apache Cassandra, http://cassandra.apache.org). Incremental storage of the analysis results over the system's history [1] allows to store the full analysis data for every single commit as well as the complete history of every source code file within reasonable space. A medium-sized system with 500 kLOC and a history of 5 years typically requires 2–4 GB of disk space.

REST Service Layer. A REST service layer provides an interface for retrieving all stored quality analysis results, supporting a variety of clients on multiple platforms.

Clients. Multiple user clients present the quality analysis results. A JavaScript-based web front-end allows platform independent access to Teamscale. For developers, IDE clients integrate quality analysis results directly in their development environment, such as Eclipse or Visual Studio.

4. CODE QUALITY ANALYSES

Teamscale offers many common quality analyses. For the remainder of this paper, we refer to the term finding for all quality defects that can be associated with a specific code region. A finding can, e. g., result from a violation of a metric threshold (such as a file which is too long) or be revealed by a specific analysis such as clone detection or bug pattern search. In the following, we will give an overview of the provided analyses:

Structure Metrics. As major structural metrics, Teamscale comprises file size, method length, and nesting depth. These metrics are aggregated based on the directory structure and result in findings when thresholds are violated.

Clone Detection. Teamscale uses an incremental, index-based clone detection algorithm [8] to reveal code duplication by copy and paste. It can also be configured to use an incremental gapped clone detection to find inconsistent clones which are likely to contain incomplete bug fixes [9].

Code Anomalies. Teamscale analyzes many different code anomalies, e. g., naming convention violations, coding guideline violations, and bug patterns: we integrate state-of-the-art bug pattern detection tools such as FindBugs (http://findbugs.sourceforge.net), PMD (http://pmd.sourceforge.net), and StyleCop (https://stylecop.codeplex.com). Tools that do not work incrementally (e. g., FindBugs due to non-trivial cross-file analysis) are integrated via a finding import from the nightly build.
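The commit-triggered update described in the architecture section can be illustrated with the following minimal Java sketch: only the files touched by a commit are re-analyzed, and the stored findings of exactly those files are replaced. This is a simplified illustration under stated assumptions, not Teamscale's actual analysis engine; it ignores cross-file analyses, history storage, and findings tracking.

    import java.util.*;

    /** Illustrative sketch of commit-triggered, file-wise incremental analysis. */
    class IncrementalAnalysisSketch {

        record Finding(String file, String message) {}

        interface FileAnalysis {
            List<Finding> analyze(String file, String content);
        }

        /** Findings per file for the latest analyzed revision. */
        private final Map<String, List<Finding>> findingsByFile = new HashMap<>();
        private final FileAnalysis analysis;

        IncrementalAnalysisSketch(FileAnalysis analysis) {
            this.analysis = analysis;
        }

        /** Processes one commit: changedFiles maps file paths to their new content. */
        void onCommit(Map<String, String> changedFiles, Set<String> deletedFiles) {
            deletedFiles.forEach(findingsByFile::remove);
            changedFiles.forEach((file, content) ->
                    findingsByFile.put(file, analysis.analyze(file, content)));
        }

        /** Current total number of findings after the last processed commit. */
        int totalFindings() {
            return findingsByFile.values().stream().mapToInt(List::size).sum();
        }
    }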

Architecture Conformance. Architecture conformance assessment compares a high-level architecture specification model against the actual dependencies in the code. Teamscale uses the ConQAT architecture conformance assessment [2] which provides a simple component dependency model. The system is modeled as a hierarchical decomposition into components with dependency policies between them. The assessment result is an annotated version of the component model that rates each dependency as satisfied or violated.

Source Code Comments. The code comment analysis verifies if documentation guidelines for the existence of code comments are met, which may for instance prescribe that every public type and public method must have an interface comment. In addition, a machine-learning based algorithm reveals commenting deficits, such as trivial or inconsistent comments which do not promote system understanding [13].

Test Quality. From the nightly build, Teamscale uses test coverage data and visualizes which parts of the source code are covered by tests.

5. USER CLIENTS

The results of all quality analyses are available in either a web front-end or the IDE.

5.1 Web Front End

The web client is organized in perspectives, which provide different views of the system's quality: The code perspective allows to browse the source code, its quality metrics and findings. The activity perspective displays all commits in the system's history, the finding perspective presents all current quality findings whereas the dashboard perspective offers an aggregated overview of the quality status. The web front-end supports various different use cases (UC):

UC1: Inspecting quality defects in specific components. In the code perspective, Teamscale offers a file-system based view on source code, metrics and findings. The user can navigate through the directory structure to find quality defects in specific parts of the system. As all source code and quality analysis results are held in Teamscale for the entire history, the code perspective also functions as a time machine to browse previous versions of the code and its findings. On file level, the source code is annotated with findings indicated with colored bars and tool tips next to the code. For each directory level and quality metric, Teamscale provides trend charts showing the evolution of the metric for the selected directory, such as the growth in lines of code or the evolution of the clone coverage.

UC2: Inspecting the quality impact of the last commit. The activity perspective shows all commits. Besides the information on the change itself (i. e., revision number, author, message, affected files), Teamscale reveals the impact on the quality status for each change: It indicates how many quality problems were introduced or removed with this change. In addition, the bug tracker integration relates the commits to change requests or bugs.

UC3: Comparison of two versions of the system. Teamscale can compute the change delta between two different versions of the system, which shows the changes within the time frame between the two versions: The change delta contains information about added, deleted, or changed files, the version control system commits within the time frame, metric value changes, as well as added or removed findings.

UC4: Assessment of architecture conformance. Teamscale provides a visualization of the architecture specification annotated with its assessment according to the current dependencies in the code. In case a dependency policy is violated, it allows to identify the problematic dependency in the code, i.e. navigate from the model down to individual files that contain the architecture violation.

UC5: Identification and management of findings. In the findings perspective, the user can browse quality defects along two dimensions: the directory hierarchy and the finding categories. Teamscale is capable of tracking these findings during the software evolution, in which the location of a finding can change: the finding may move within a file or even across files due to edit operations, moving code snippets, or file renames. Teamscale can track this, using a combination of ideas from [6] and [11]. Based on findings tracking, individual findings, such as false positives, can be blacklisted. A blacklisted finding will not be further reported. Teamscale also offers the opportunity to only inspect findings that were added since a configurable revision, e. g., the last release. This way, when developers clean up code, they can focus on newly added findings.

UC6: The system's quality at a glance. The dashboard perspective gives an at-a-glance view on the overall quality status. The users can freely build personal dashboards based on a variety of configurable widgets. With his personal dashboard, the user can focus on specific quality aspects that are relevant for him. Teamscale provides widgets for displaying metric values with delta indicators, metric distributions, tree maps, trend charts, and quality deficit hotspots. Figure 2 shows an example dashboard.

Figure 2: Dashboard Perspective

5.2 IDE Integration

Teamscale provides plug-ins for two widely used IDEs: Eclipse and Visual Studio. The IDE clients annotate the code with findings from the Teamscale server, making developers aware of existing quality issues in their code.

6. QUALITY CONTROL SUPPORT

Teamscale is designed to be used in a continuous quality control process during daily development. Continuous control requires constant monitoring of the quality, a transparent communication about the current status as well as specific tasks on how to improve the quality. Teamscale supports various usage scenarios for different stakeholders in this process: The project manager uses the dashboard to check the quality status of his system in regular intervals. For each commit, a developer receives feedback about findings that were introduced or removed with this commit. A quality manager can monitor the evolution of the quality, detect the root cause for quality problems and create specific tasks to improve the system's quality.

7. EVALUATION

We evaluated our tool with a development team from a German insurance company, LV 1871, which comprises about ten developers, among which six took part in the evaluation. The developers had an average of 18.5 years of programming experience and are hence considered suitable for evaluation.

Set Up. We first gave the developers a short introduction to the tool. Afterwards, the developers were asked to solve specific tasks with Teamscale and to fill out an anonymous online survey about their user experience. The tasks comprised nine questions related to the quality of the team's software and were guided by the use cases presented in Section 5.1, except of UC4 (as no architecture specification was available). Developers were also asked to commit live to the tool and evaluate the feedback cycle. Solving the tasks ensured that the developers were familiar with the tool before taking part in the survey and showed whether developers succeeded to use the tool to find the required information.

Results. Of all 9 tasks, all developers were able to complete 7 tasks correctly. Two tasks regarding the dashboard perspective were not solved correctly by all, two developers each made a mistake. Problems with those two tasks were related to a lacking piece of information about a specific dashboard widget. In the survey, developers rated several statements about Teamscale on a Likert scale from 1 (I totally agree) to 6 (I do not agree at all). Table 1 shows the average developer response per statement (µ) and the standard deviation (σ). The results show a very positive user experience. All developers agreed that they want to use Teamscale for their own development (µ = 1.2) and that Teamscale provides fast feedback in practice (µ = 2). They also agreed that they were able to differentiate between new and old quality problems (µ = 1.5) and that Teamscale enables root-cause analysis (µ = 1.67). To summarize, the developers showed that they were able to solve specific tasks successfully using the tool and that they would like to be supported by the tool in their daily development.

Table 1: Average Developers' Agreement

  Statement                                                                        µ      σ
  Teamscale provides a good overview of the quality of my software.                1.5    0.5
  Teamscale helps to identify the root cause of quality problems.                  1.67   0.47
  Teamscale supports me in a quality-driven software development.                  1.67   0.47
  Teamscale provides feedback fast enough in practice.                             2      0.57
  Teamscale makes it easy to differentiate between new and old quality problems.   1.5    0.5
  Teamscale's UI is intuitive.                                                     2      0
  I'd like to use Teamscale for my own development.                                1.2    0.37

8. CONCLUSION

In this paper, we presented a comprehensive quality analysis tool which performs in real-time. Giving feedback about the impact of a change on the quality of a system within seconds, the tool smoothly integrates into the daily development process, enabling successful continuous quality control in practice. Providing root cause analysis, developers can identify the cause of a specific quality problem and address it appropriately. With the help of findings tracking, developers can decide which findings should be reported, helping them to manage and remove findings such that quality actually improves. With the tool, developers, quality managers, and project managers get their own perspective on the system's quality, adapted to the specific needs of each role. A survey among developers showed a high user acceptance.

Acknowledgements

We thank our evaluation partner LV 1871 for their support.

9. REFERENCES

[1] V. Bauer, L. Heinemann, B. Hummel, E. Juergens, and M. Conradt. A framework for incremental quality analysis of large software systems. In ICSM'12, 2012.
[2] F. Deissenboeck, L. Heinemann, B. Hummel, and E. Juergens. Flexible architecture conformance assessment with ConQAT. In ICSE'10, 2010.
[3] F. Deissenboeck, E. Juergens, B. Hummel, S. Wagner, B. M. y Parareda, and M. Pizka. Tool support for continuous quality control. IEEE Softw., 2008.
[4] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus. Does code decay? Assessing the evidence from change management data. IEEE Softw., 2001.
[5] N. Göde and R. Koschke. Incremental clone detection. In CSMR'09, 2009.
[6] N. Göde and R. Koschke. Frequency and risks of changes to clones. In ICSE'11, 2011.
[7] N. Haderer, F. Khomh, and G. Antoniol. SQUANER: A framework for monitoring the quality of software systems. In ICSM'10, 2010.
[8] B. Hummel, E. Juergens, L. Heinemann, and M. Conradt. Index-based code clone detection: Incremental, distributed, scalable. In ICSM'10, 2010.
[9] E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner. Do code clones matter? In ICSE'09, 2009.
[10] D. L. Parnas. Software aging. In ICSE'94, 1994.
[11] S. P. Reiss. Tracking source locations. In ICSE'08, 2008.
[12] W. Shang, B. Adams, and A. E. Hassan. An experience report on scaling tools for mining software repositories using MapReduce. In ASE'10, 2010.
[13] D. Steidl, B. Hummel, and E. Juergens. Quality analysis of source code comments. In ICPC'13, 2013.


8 Conclusion

"The scientific man does not aim at an immediate result. His duty is to lay the foundation for those who are to come, and point the way."

Nikola Tesla

The final chapter summarizes the contributions of our work (Section 8.1), and outlines directions for future research (Section 8.2).

8.1 Summary

Automated static analyses are a powerful quality assurance technique. As they analyze the code only, they are easy to perform; they can be applied without the runtime environment of the software and even before a system is tested. Yet, many companies struggle to apply them successfully in practice. Even though they have static analysis tools installed, the number of findings often keeps increasing and code quality often decays further.

Based on three years of industry experience as software quality consultants working with customers from various domains, we identified five major challenges developers usually face when applying static analyses. While these claims are certainly not generalizable to all of software engineering, we believe they apply to a substantially large subset of long-lived software systems across different domains. The five challenges comprise the overly large number of findings, the differing benefit of their removal, and the varying relevance of findings, as well as controlling their evolution and obtaining management priority.

In this thesis, we presented an approach to apply static analyses effectively in practice, i. e. to reduce the number of findings in the long term. Our approach is based on the notion of quality as defined in ISO/IEC 25010 for software product quality. The ISO definition covers a large variety of quality aspects, ranging from functional suitability and security to maintainability (Chapter 3). For each quality aspect, a variety of static analyses exists. Independent of the quality aspect, their findings have in common that they likely increase the non-conformance costs of the system. For example, programming faults can create costs for hotfixes or redeployment, security vulnerabilities can create costs for lost data, and low-quality code can increase costs for change requests.

The approach was designed such that its general idea can, in principle, be applied to different static analyses. The specific instantiations in this thesis, however, focused on static analyses addressing correctness and maintainability, as these two analysis types are most frequently used in practice. Deriving our approach to use static analysis cost-effectively required several intermediate steps. Figure 8.1 repeats these steps and annotates which chapter or paper contains the corresponding contributions. In the following, we summarize each step and its corresponding contribution.


Problem Understanding
  1) Understanding State-of-the-Practice: Analysis of Findings Revealed in Industry (Chapter 4)

Prestudies, Fundamentals
  2) Analyzing Costs and Benefits: Conceptual Cost Model for Finding Removal (Chapter 5)
  3) Creating a Strategy to Separate External from Internal Findings: Detection Strategy for External Findings to Maximize Benefit (Paper C, Chapter 6)
  4) Creating Technical Foundation: Origin Analysis for Reverse Engineering Complete Software Histories (Paper A)
  5) Understanding Software Change: A Study of Changes in Software Histories (Paper B)
  6) Deriving a Change-Based Prioritization Strategy for Internal Findings: Prioritization Strategy for Internal Findings to Minimize Removal Costs (Paper D, Chapter 6)

Solution
  7) Integrating Strategies to a Prioritization Approach: A Cost-Benefit-Based Prioritization (Chapter 6)

Application in Industry
  8) Evaluating Process Support: Quality Control Process (Paper E)
  9) Evaluating Tooling Support: Teamscale Evaluation (Paper F)

Figure 8.1: Summary of Approach

Understanding State-Of-The-Practice As a first step, we conducted an empirical study to analyze the state-of-the-practice of using static analysis in industry (Chapter 4). The study showed that most industry systems reveal only a few external findings, which can potentially lead to user-visible failures of the system. In contrast, even a few maintainability analyses already resulted in thousands of internal findings addressing the (non-user-visible) internal quality. Overall, the number of findings in practice is too large to be fixed at once because the quality assurance budget is usually limited. To effectively use the available budget, we aimed to select those findings for removal which reveal the best cost-benefit ratio. To better understand the costs and benefits associated with finding removal, we provided a conceptual cost model.

Contribution: Our large-scale quantitative study revealed the state-of-the-practice of using static analysis and can serve other researchers as a reference.

Analyzing Costs and Benefits To obtain a prioritization strategy for external and internal findings, we provided a cost-benefit analysis for finding removal in the form of a conceptual cost model (Chapter 5). With an empirical study, we enriched the conceptual model with industrial evidence. Combined with results from existing work, this empirical data enabled an exemplary instantiation of the model. The instantiation provided an order of magnitude by showing that, based on our assumptions, the refactoring of an overly large class (as one finding example) would pay off after a few months. However, some assumptions were weak. Thus, the results need to be treated with caution.
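To illustrate the kind of reasoning the cost model formalizes, consider a deliberately simplified break-even sketch. The symbols below are introduced here for illustration only and are not the notation of Chapter 5: let c_removal denote the one-time cost of removing a finding, \Delta e the effort saved per subsequent change to the affected code, and n(t) the number of such changes expected up to time t. Removal pays off once

\[ n(t) \cdot \Delta e \;>\; c_{\mathit{removal}} \quad\Longleftrightarrow\quad n(t) \;>\; \frac{c_{\mathit{removal}}}{\Delta e}. \]

For instance, under the purely hypothetical assumption that refactoring an overly large class costs two developer days and saves half a day on each subsequent change request touching that class, the refactoring would break even after four such changes. The actual instantiation in Chapter 5 rests on empirical data rather than such round numbers, which is why its weak assumptions matter for the interpretation of the result.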

Based on the cost model, we derived the following reasoning to prioritize findings: The benefit of removing external findings, i. e. eliminating a potential system failure, was quite obvious. Further, the model revealed that external findings are usually fast to remove. Due to the immediate reward, our approach prioritized the removal of external findings over the removal of internal ones. The small number of external findings and their low removal costs make this prioritization feasible. However, treating external and internal findings differently required an accurate differentiation mechanism between the two as a next step.

For internal findings, the model showed that they take more time to remove than external findings. On the benefit side, the prediction was more difficult, as the benefit of removing internal findings depends on future maintenance. With an empirical study, we showed that static analyses have only limited power in predicting future maintenance. We assumed that future maintenance is driven by changing requirements rather than by the existing code base or other artifacts. While we could not reliably predict the benefit automatically based on existing artifacts, we chose removal costs for internal findings to serve as the prioritization basis: In the absence of an immediate reward, developers are much more motivated to remove internal findings if the removal costs do not create an additional burden. As outlined in our cost model, we can reduce costs if we closely align finding removal with ongoing changes during development. This saves overhead costs in terms of code understanding and testing, as performing a change request already comprises both.

To align finding removal with software changes, we strove to better understand how developers perform changes and analyzed the change history in an open source and an industrial case study. However, we found that the history as recorded in version control systems is often incomplete due to refactorings. To reverse engineer complete histories despite moves or refactorings and to obtain accurate results, we first had to lay the technical foundation. We addressed this in a separate step.

Contribution: Our conceptual cost model provided first empirical data about the costs and benefits associated with finding removal. Its instantiation is a first step towards a return-on-investment calculation for project management in industry. For researchers, it evaluates the current state-of-the-art and reveals weak assumptions in the justification of static analysis as a quality assurance technique, which require future work.

Creating a Strategy to Separate External from Internal Findings Based on the results from our cost model, our approach prioritized the removal of external findings over the removal of internal findings. Hence, we needed to accurately differentiate between the two. In this thesis, we showed that differentiating between external and internal findings can mostly be done trivially based on the finding's category. However, we further showed that for some categories, e. g. code clones, the differentiation becomes more complex because findings of the same category can represent both internal and external findings. Hence, the differentiation has to be made for each finding individually. For the specific example of code clones, we provided a classification algorithm and evaluated it on an industrial data set (Paper C). The evaluation showed that our classification algorithm has a higher precision than random sampling. However, the separation of external from internal clone findings still requires a manual inspection.
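As a minimal illustration of this two-stage differentiation, the following sketch assigns findings to the external or internal group purely by category and routes ambiguous categories such as clones to individual inspection. The enum values and group names are hypothetical simplifications introduced here for illustration; the actual clone classification of Paper C relies on a trained classifier rather than a fixed rule.

import java.util.List;

/** Minimal sketch: category-based triage of findings into external,
 *  internal, and individually-inspected groups (hypothetical names). */
public class FindingTriage {

    enum Category { BUG_PATTERN, SECURITY_VULNERABILITY, CODE_CLONE, LONG_METHOD, DEEP_NESTING, NAMING_CONVENTION }
    enum Group { EXTERNAL, INTERNAL, NEEDS_INDIVIDUAL_INSPECTION }

    record Finding(String location, Category category) {}

    /** Most categories map trivially; clones can be both and are routed to inspection. */
    static Group triage(Finding finding) {
        return switch (finding.category()) {
            // Potential user-visible failures: remove immediately.
            case BUG_PATTERN, SECURITY_VULNERABILITY -> Group.EXTERNAL;
            // Inconsistent clones may hide faults, consistent ones are maintainability issues.
            case CODE_CLONE -> Group.NEEDS_INDIVIDUAL_INSPECTION;
            // Remaining categories address (non-user-visible) internal quality.
            default -> Group.INTERNAL;
        };
    }

    public static void main(String[] args) {
        List<Finding> findings = List.of(
                new Finding("Order.java:120", Category.BUG_PATTERN),
                new Finding("Invoice.java:33", Category.CODE_CLONE),
                new Finding("Util.java:7", Category.LONG_METHOD));
        findings.forEach(f -> System.out.println(f.location() + " -> " + triage(f)));
    }
}

The design choice reflected here is that only the ambiguous categories incur the cost of per-finding inspection, while the bulk of findings is sorted automatically.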

Having detected the external findings, our approach suggested them for instant removal due to their immediate benefit. It remained to prioritize the internal findings. Based on the results from the cost model, we aimed to align the prioritization of internal findings with ongoing software changes and, hence, needed a better understanding of how software is changed.

Contribution: We provided a strategy for developers to separate external from internal findings which helps to prioritize findings based on benefit.

Creating Technical Foundation To perform an accurate analysis of software changes, we first provided an origin analysis as a technical foundation (Paper A). The origin analysis allowed us to obtain complete histories on file and method level even if renames, moves, or other refactorings occurred. The evaluation of precision and recall showed that the analysis is highly accurate. Additionally, it revealed that up to 38% of the analyzed file histories in the version control system were incomplete due to at least one refactoring. Without our origin analysis, any analysis of the history of these files would not be accurate.
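The core idea behind such an origin analysis can be sketched as follows: for every file that appears as added in a commit, compare its content with the files deleted in the same commit and, if the similarity exceeds a threshold, connect the two histories. The sketch below uses a crude line-based Jaccard similarity and a hypothetical threshold of 0.6; the actual analysis of Paper A is incremental, also reconstructs method-level origins, and uses more robust similarity heuristics and candidate selection.

import java.util.*;

/** Simplified origin detection: link an added file to a deleted file
 *  of the same commit if their line sets are sufficiently similar. */
public class OriginSketch {

    static final double SIMILARITY_THRESHOLD = 0.6; // hypothetical value

    /** Jaccard similarity over the sets of trimmed, non-empty lines. */
    static double similarity(String contentA, String contentB) {
        Set<String> a = toLineSet(contentA);
        Set<String> b = toLineSet(contentB);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    static Set<String> toLineSet(String content) {
        Set<String> lines = new HashSet<>();
        for (String line : content.split("\n")) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    /** Returns the most similar deleted file as origin candidate, if any. */
    static Optional<String> findOrigin(String addedContent, Map<String, String> deletedFiles) {
        String bestPath = null;
        double bestSimilarity = SIMILARITY_THRESHOLD;
        for (Map.Entry<String, String> entry : deletedFiles.entrySet()) {
            double sim = similarity(addedContent, entry.getValue());
            if (sim >= bestSimilarity) {
                bestSimilarity = sim;
                bestPath = entry.getKey();
            }
        }
        return Optional.ofNullable(bestPath);
    }

    public static void main(String[] args) {
        Map<String, String> deleted = Map.of("old/CustomerDao.java",
                "class CustomerDao {\n  void save() {}\n  void load() {}\n  void delete() {}\n}");
        String added = "class CustomerRepository {\n  void save() {}\n  void load() {}\n  void delete() {}\n}";
        System.out.println(findOrigin(added, deleted)); // prints Optional[old/CustomerDao.java]
    }
}

With such a link in place, the history of the renamed file can be stitched to the history of its predecessor, which is exactly what the completeness numbers above refer to.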

Contribution: Our origin analysis is a substantial foundation for other researchers mining software histories to obtain accurate results. Further, it enriches tool support for developers, as it facilitates obtaining complete histories for code reviews, metric trend inspections, or change logs.

Understanding Software Change Based on the origin analysis, we conducted a large-scale empirical study to better understand how software evolves (Paper B). The study showed that software systems grow primarily by adding new methods; they do not grow through modifications of existing methods. Many of the existing methods, in fact, remain unchanged. Based on these results, we provided a change-based prioritization for internal findings to reduce removal costs.

Contribution: We provided a theory of how software systems grow, which can serve as a foundation for other researchers.


Deriving a Change-Based Prioritization Strategy for Internal Findings As software grows through new methods, we prioritized internal findings first in newly added code and, second, in recently modified code. As many existing methods remain unchanged, this prioritization, in fact, reduces the number of internal findings significantly. For highly dynamic development, for which the number of findings in new code is still too large, we prioritized findings in new or changed code that can be removed with a fast refactoring. This minimizes the removal costs as outlined in our cost model. In particular, we provided a case study for the exemplary findings of code clones and long methods, showed how fast-to-refactor findings can be detected, and evaluated their acceptance among developers (Paper D): During individual interviews, developers confirmed that findings detected by our approach can be removed fast. Moreover, developers considered these removals not only fast, but also useful for achieving better maintainability.

Contribution: We provided a useful strategy for developers to prioritize internal findings based on their removal costs.
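A minimal sketch of the change-based ranking described above is given below: findings are ordered by whether their enclosing code was added, modified, or left untouched since a chosen baseline, and fast-to-refactor findings are preferred within each group. The Finding and ChangeStatus types as well as the fastToRefactor flag are hypothetical simplifications of the data an origin-aware analysis would provide.

import java.util.*;

/** Sketch: rank internal findings so that findings in new code come first,
 *  then findings in recently modified code, then findings in unchanged code;
 *  within each group, fast-to-refactor findings are preferred. */
public class ChangeBasedPrioritization {

    enum ChangeStatus { ADDED_SINCE_BASELINE, MODIFIED_SINCE_BASELINE, UNCHANGED }

    record Finding(String location, ChangeStatus status, boolean fastToRefactor) {}

    static List<Finding> prioritize(List<Finding> findings) {
        // Enum order encodes the priority: added < modified < unchanged.
        Comparator<Finding> byChangeStatus = Comparator.comparingInt(f -> f.status().ordinal());
        // Within a group, fast-to-refactor findings first (false sorts before true).
        Comparator<Finding> fastFirst = Comparator.comparing(f -> !f.fastToRefactor());
        return findings.stream().sorted(byChangeStatus.thenComparing(fastFirst)).toList();
    }

    public static void main(String[] args) {
        List<Finding> findings = List.of(
                new Finding("Report.java:12 (long method)", ChangeStatus.UNCHANGED, false),
                new Finding("Invoice.java:88 (clone)", ChangeStatus.MODIFIED_SINCE_BASELINE, true),
                new Finding("Order.java:40 (long method)", ChangeStatus.ADDED_SINCE_BASELINE, false));
        prioritize(findings).forEach(f -> System.out.println(f.location()));
    }
}

The baseline against which code counts as new or modified would typically be a configurable revision, for example the last release, as described for the tool support later in this chapter.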

Integrating Strategies to a Prioritization Approach Based on our benefit-based strategy for external findings on the one hand and our cost-based strategy for internal findings on the other, we formulated a prioritization approach (Chapter 6). The core of the approach comprises an automatic recommendation which prioritizes findings based on the previously gained insights: External findings should be removed immediately in each iteration, as their removal provides the immediate benefit of preventing costly system failures. Internal findings are prioritized based on their removal costs. We used the change-based prioritization to reduce removal costs by saving overhead in code understanding and testing. Additionally, for highly dynamic systems, we focused only on those findings that can be refactored fast, minimizing the overall removal costs. The automatic recommendation of a subset of findings reduces the flood of findings to a reasonable size. In a second, manual inspection step, false positives and non-relevant findings are eliminated so that they do not disturb and distract developers. Third, our approach is applied iteratively during development. This provides several advantages: On the one hand, newly created findings are considered. On the other hand, the set of selected analyses can be adjusted flexibly between two iterations, and the current quality assurance budget can be taken into account for each iteration.

Main Contribution: We provided a prioritization approach that addresses the current challenges of using static analysis. The approach helps developers to apply static analysis cost-effectively in practice.

Evaluating Process Support Our approach helps developers to select a subset of findings for removal. However, if management prioritizes change requests over quality improvements, developers still lack the resources to actually remove findings. We suggested a quality control process which provides the necessary transparency and impact analysis to convince management of the benefit of allocating resources for quality improvement. With an industrial evaluation, we showed quantitatively that the number of findings can be controlled and decreased even if the systems keep growing in size (Paper E). Qualitatively, the management in our case study was convinced by the benefits of the process and decided to extend it to almost all of their development projects. This decision revealed the applicability, benefit, and impact of our process.

Contribution: With our quality control process, we showed how our prioritization can be applied effectively in industry and lead to measurable quality improvements.

Evaluating Tooling Support The quality control process revealed numerous requirements that static analysis tools must fulfill to support the process. As one example, our change-based prioritization required the differentiation between old findings and findings in new or modified code. Many existing tools analyze only a snapshot of a system and were, thus, not designed to provide this distinction. In this thesis, we evaluated Teamscale, a tool developed by our company, which provides findings tracking through all commits (Paper F). Hence, Teamscale can differentiate between newly added and old findings. Our evaluation based on a guided developer survey showed a high user acceptance rate for the tool. Many developers stated that the tool integrates well into their development process and would help them to write cleaner code.

Contribution: The evaluation of corresponding tool support complements the applicability of our quality control process in industry. Additionally, it provides directions for the development of future static analysis tools.
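A drastically simplified way to obtain the old-versus-new distinction is to fingerprint each finding, for example by category and normalized code snippet, and to compare the fingerprint sets of a baseline revision and the current revision, as sketched below. The fingerprint() heuristic is an assumption made only for this illustration; Teamscale's actual tracking follows findings through edits, moves, and renames across every commit, combining ideas from the tracking literature cited in the reprinted paper (Paper F).

import java.util.*;

/** Sketch: classify current findings as NEW or OLD by comparing
 *  location-independent fingerprints against a baseline revision. */
public class FindingDelta {

    record Finding(String category, String normalizedSnippet) {
        String fingerprint() {
            // Location-independent key so that pure moves do not look like new findings.
            return category + "#" + normalizedSnippet.hashCode();
        }
    }

    static Map<String, List<Finding>> classify(Collection<Finding> baseline, Collection<Finding> current) {
        Set<String> baselineFingerprints = new HashSet<>();
        for (Finding finding : baseline) {
            baselineFingerprints.add(finding.fingerprint());
        }
        Map<String, List<Finding>> result = new HashMap<>();
        result.put("NEW", new ArrayList<>());
        result.put("OLD", new ArrayList<>());
        for (Finding finding : current) {
            String group = baselineFingerprints.contains(finding.fingerprint()) ? "OLD" : "NEW";
            result.get(group).add(finding);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Finding> baseline = List.of(new Finding("long method", "void export(){...}"));
        List<Finding> current = List.of(
                new Finding("long method", "void export(){...}"),   // survived -> OLD
                new Finding("clone", "if(order==null){return;}"));  // introduced -> NEW
        classify(baseline, current).forEach((group, list) -> System.out.println(group + ": " + list.size()));
    }
}

Even this crude sketch shows why snapshot-based tools struggle with the distinction: without some notion of identity across revisions, every analysis run reports the full set of findings as if it were new.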

8.2 Future Work

The main focus of this thesis was the effective application of static analyses as a quality assurance technique for long-lived systems. However, as discussed in Chapters 3 and 5, static analyses also have certain limitations. Also, our cost model revealed certain shortcomings in the current state-of-the-art when arguing about the return on investment of static analyses. Based on these shortcomings, we derive the following directions for future work:

8.2.1 Benefit of Removing Internal Findings

For internal findings, our prioritization approach focused on the costs of removal. In order to better predict their benefit, we need more empirical evidence and better prediction abilities for the variables in our benefit model (see Equation 5.7). Based on our model, removing findings is beneficial for the three main maintenance activities: code reading, code modification, and code testing.
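Schematically, and without reproducing the exact notation of Equation 5.7, the benefit of removing a single internal finding can be thought of as a sum over these three activities, where each term combines how often the affected code will be read, modified, or tested with the effort saved per activity once the finding is gone. The symbols n_a and \Delta e_a are illustrative placeholders rather than the thesis's notation:

\[ B \;\approx\; \sum_{a \,\in\, \{\mathrm{read},\,\mathrm{modify},\,\mathrm{test}\}} n_a \cdot \Delta e_a \]

The open questions discussed in the following paragraphs correspond directly to these factors: the per-activity speed-ups for reading and modification (readability and changeability), the future frequencies n_a (prediction of future maintenance), and the savings in testing effort (testability).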

Increase in Readability and Changeability To better predict the benefit of removing a single finding, future work is required to investigate the speed-up of code reading and code modification. While existing work has tried to find empirical evidence for both, it has not yet been able to reveal it consistently. Some case studies revealed an increase in developers' performance on cleaner code, while others did not. Even though we are convinced, based on anecdotal evidence from several of our customers, that this speed-up exists, it is extremely difficult to prove scientifically.

Large research grants could enable controlled experiments with two functionally equivalent systems, one being quality assured with static analysis and the other not. However, to analyze the speed-up of removing certain findings in isolation, the two systems would ideally need to have an identical setup in terms of development process, tools, testing, developers' skills, etc. To achieve scientifically sound results, this makes the study setup very complex and costly.

Prediction of Future Maintenance The benefit of removing internal findings does not only depend on the speed-up per maintenance activity, but also on the number of maintenance activities to be performed in the future. Hence, sophisticated maintenance prediction models are required. In this thesis, we stated the assumption that existing code bases and their history can provide only limited information for the prediction of future changes, because these changes are rather driven by changing requirements. It remains to investigate whether different static information can improve the prediction model. Additionally, the biggest question is which other artifacts from the areas of requirements engineering or marketing analysis can be included in the prediction. As this was out of the scope of this thesis, we had to leave it for future work.

Increase in Testability Additionally, we stated in our cost model that removing findings can also be beneficial for subsequent testing activities, e. g. shorter methods might be easier to test with unit tests or less redundancy might reduce the number of required manual test runs. However, within the scope of this thesis, we were not able to provide any evidence for this. Future work is required to investigate the impact of cleaner application code on the testing expenses for a system.

Interference Effects between Findings Our model formalized the costs and benefits of removing a single finding. However, initial existing work indicates that the hampering impact on maintainability increases significantly if multiple findings affect the same code. Our model does not yet take these interference effects into account. Future work is required to investigate how much multiple findings amplify the negative impact on maintainability.

8.2.2 Other Prioritization Aspects

We provided a prioritization based on a cost-benefit consideration for removing external and internal findings. While external findings were to be removed immediately (maximizing benefit), the removal of internal findings was change-based (minimizing costs). Yet, for future work, we could consider other dimensions for the prioritization of findings as well.


Social Aspects For future work, it might be worth investigating other prioritization aspects such as the criticality of the code, the centrality of the code [90], the ownership of the code, etc. A lot of questions are still to be answered here. To outline just a few: Are developers more willing to remove findings in critical and central code? Do certain parts of the code belong to only one developer? Would this developer be more willing to remove the findings? Do other social aspects influence the developer's motivation to produce clean code?

Learning from History With the origin analysis, our thesis provides a large technical contribution to tracking findings over time. Accurate findings tracking provides developers with precise information about truly new findings as opposed to existing findings that were modified or moved. This is a big step towards using historical information from the repository to support developers in containing the flood of findings. Yet, the repository history can provide more useful information beyond the technical findings history. We can potentially learn from the history when static analysis results correlated with software failures, and when and why developers decided to remove certain findings.

A Appendix

A.1 Notes On Copyrights

IEEE Publications According to the consent of IEEE1 to reuse papers in a PhD thesis, the author of the thesis is allowed to include Papers B, C, and E in the thesis:

The IEEE does not require individuals working on a thesis to obtain a formal reuse license, however, you may print out this statement to be used as a permission grant:

Requirements to be followed when using an entire IEEE copyrighted paper in a thesis:

1. The following IEEE copyright/credit notice should be placed prominently in the references: ©[year of original publication] IEEE. Reprinted, with permission, from [author names, paper title, IEEE publication title, and month/year of publication]

2. Only the accepted version of an IEEE copyrighted paper can be used when posting the paper or your thesis on-line.

3. In placing the thesis on the author's university website, please display the following message in a prominent place on the website: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of [university/educational entity's name goes here]'s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

If applicable, University Microfilms and/or ProQuest Library, or the Archives of Canada may supply single copies of the dissertation.

1http://ieeexplore.ieee.org/


ACM Publications According to the consent of ACM2 to reuse papers in a PhD thesis, the author of the thesis is allowed to include Papers A, D, and F in the thesis. For each of these three papers, we obtained a license through the Rightslink portal of ACM. The obtained license states the following:

1. The publisher of this copyrighted material is Association for Computing Machinery, Inc. (ACM). By clicking “accept” in connection with completing this licensing transaction, you agree that the following terms and conditions apply to this transaction (along with the Billing and Payment terms and conditions established by Copyright Clearance Center, Inc. (“CCC”), at the time that you opened your Rightslink account and that are available at any time at).

2. ACM reserves all rights not specifically granted in the combination of (i) the license details provided by you and accepted in the course of this licensing transaction, (ii) these terms and conditions and (iii) CCC's Billing and Payment terms and conditions.

3. ACM hereby grants to licensee a non-exclusive license to use or republish this ACM-copyrighted material3 in secondary works (especially for commercial distribution) with the stipulation that consent of the lead author has been obtained independently. Unless otherwise stipulated in a license, grants are for one-time use in a single edition of the work, only with a maximum distribution equal to the number that you identified in the licensing process. Any additional form of republication must be specified according to the terms included at the time of licensing.

4. Any form of republication or redistribution must be used within 180 days from the date stated on the license and any electronic posting is limited to a period of six months unless an extended term is selected during the licensing process. Separate subsidiary and subsequent republication licenses must be purchased to redistribute copyrighted material on an extranet. These licenses may be exercised anywhere in the world.

5. Licensee may not alter or modify the material in any manner (except that you may use, within the scope of the license granted, one or more excerpts from the copyrighted material, provided that the process of excerpting does not alter the meaning of the material or in any way reflect negatively on the publisher or any writer of the material).

6. Licensee must include the following copyright and permission notice in connection with any reproduction of the licensed material: “[Citation] ©YEAR Association for Computing Machinery, Inc. Reprinted by permission.” Include the article DOI as a link to the definitive version in the ACM Digital Library.

2http://dl.acm.org/ 3Please note that ACM cannot grant republication or distribution licenses for embedded third-party material. You must confirm the ownership of figures, drawings and artwork prior to use.


7. Translation of the material in any language requires an explicit license identified during the licensing process. Due to the error-prone nature of language translations, Licensee must include the following copyright and permission notice and disclaimer in connection with any reproduction of the licensed material in translation: “This translation is a derivative of ACM-copyrighted material. ACM did not prepare this translation and does not guarantee that it is an accurate copy of the originally published work. The original intellectual property contained in this work remains the property of ACM.”

8. You may exercise the rights licensed immediately upon issuance of the license at the end of the licensing transaction, provided that you have disclosed complete and accurate details of your proposed use. No license is finally effective unless and until full payment is received from you (either by CCC or ACM) as provided in CCC's Billing and Payment terms and conditions.

9. If full payment is not received within 90 days from the grant of license transaction, then any license preliminarily granted shall be deemed automatically revoked and shall be void as if never granted. Further, in the event that you breach any of these terms and conditions or any of CCC's Billing and Payment terms and conditions, the license is automatically revoked and shall be void as if never granted.

10. Use of materials as described in a revoked license, as well as any use of the materials beyond the scope of an unrevoked license, may constitute copyright infringement and publisher reserves the right to take any and all action to protect its copyright in the materials.

11. ACM makes no representations or warranties with respect to the licensed material and adopts on its own behalf the limitations and disclaimers established by CCC on its behalf in its Billing and Payment terms and conditions for this licensing transaction.

12. You hereby indemnify and agree to hold harmless ACM and CCC, and their respective officers, directors, employees and agents, from and against any and all claims arising out of your use of the licensed material other than as specifically authorized pursuant to this license.

13. This license is personal to the requestor and may not be sublicensed, assigned, or transferred by you to any other person without publisher's written permission.

14. This license may not be amended except in a writing signed by both parties (or, in the case of ACM, by CCC on its behalf).

15. ACM hereby objects to any terms contained in any purchase order, acknowledgment, check endorsement or other writing prepared by you, which terms are inconsistent with these terms and conditions or CCC's Billing and Payment terms and conditions. These terms and conditions, together with CCC's Billing and Payment terms and conditions (which are incorporated herein), comprise the entire agreement between you and ACM (and CCC) concerning


this licensing transaction. In the event of any conflict between your obligations established by these terms and conditions and those established by CCC's Billing and Payment terms and conditions, these terms and conditions shall control.

16. This license transaction shall be governed by and construed in accordance with the laws of New York State. You hereby agree to submit to the jurisdiction of the federal and state courts located in New York for purposes of resolving any disputes that may arise in connection with this licensing transaction.

17. There are additional terms and conditions, established by Copyright Clearance Center, Inc. (“CCC”) as the administrator of this licensing service that relate to billing and payment for licenses provided through this service. Those terms and conditions apply to each transaction as if they were restated here. As a user of this service, you agreed to those terms and conditions at the time that you established your account, and you may see them again at any time at http://myaccount.copyright.com

18. Thesis/Dissertation: This type of use requires only the minimum administrative fee. It is not a fee for permission. Further reuse of ACM content, by ProQuest/UMI or other document delivery providers, or in republication requires a separate permission license and fee. Commercial resellers of your dissertation containing this article must acquire a separate license.

Bibliography

[1] Definition: Thought experiment. https://en.wikipedia.org/wiki/Thought_ experiment. Accessed: 2015-09-07. [2] Ieee standard glossary of software engineering terminology. IEEE Std 610.12-1990, pages 1–84, 1990. [3] M. Abbes, F. Khomh, Y.-G. Gueheneuc, and G. Antoniol. An empirical study of the impact of two antipatterns, blob and spaghetti code, on program comprehen- sion. In Proceedings of the 15th European Conference on Software Maintenance and Reengineering (CSMR), 2011. [4] E. Ammerlaan, W. Veninga, and A. Zaidman. Old Habits Die Hard: Why Refactoring for Understandability Does Not Give Immediate Benefits. In Proceedings of the Inter- national Conference on Software Analysis, Evolution and Reengineering (SANER), 2015. [5] D. Athanasiou, A. Nugroho, J. Visser, and A. Zaidman. Test code quality and its relation to issue handling performance. IEEE Transactions on Software Engineering, 40(11), 2014. [6] N. Ayewah and W. Pugh. The google findbugs fixit. In Proceedings of the 19th ACM International Symposium on Software Testing and Analysis, 2010. [7] N. Ayewah, W. Pugh, J. D. Morgenthaler, J. Penix, and Y. Zhou. Evaluating static analysis defect warnings on production software. In Proceedings of the 7th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engi- neering, 2007. [8] R. Baggen, J. P. Correia, K. Schill, and J. Visser. Standardized code quality bench- marking for improving software maintainability. Software Quality Journal, 20(2), 2012. [9] M. Beller, A. Bacchelli, A. Zaidman, and E. Juergens. Modern code reviews in open- source projects: which problems do they fix? In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), 2014. [10] K. H. Bennett and V. T. Rajlich. Software maintenance and evolution: A roadmap. In Proceedings of the Conference on the Future of Software Engineering, 2000. [11] A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kam- sky, S. McPeak, and D. Engler. A few billion lines of code later: using static analysis to find bugs in the real world. Communications of the ACM, 53(2), 2010. [12] B. W. Boehm. Software Engineering Economics. 1981.


[13] B. W. Boehm, J. Brown, H. Kaspar, M. Lipow, G. McLeod, and M.Merritt. Charac- teristics of Software Quality. 1978. [14] B. W. Boehm, J. R. Brown, and M. Lipow. Quantitative Evaluation of Software Quality. In Proceedings of the 2Nd International Conference on Software Engineering (ICSE), 1976. [15] C. Boogerd and L. Moonen. Prioritizing software inspection results using static pro- filing. In Proceedings of the Sixth IEEE International Workshop on Source Code Analysis and Manipulation (SCAM), 2006. [16] C. Boogerd and L. Moonen. Ranking software inspection results using execution likelihood. In Proceedings Philips Software Conference, 2006. [17] N. Brown, Y. Cai, Y. Guo, R. Kazman, M. Kim, P. Kruchten, E. Lim, A. MacCor- mack, R. Nord, I. Ozkaya, et al. Managing technical debt in software-reliant systems. In Proceedings of the ACM FSE/SDP Workshop on Future of Software Engineering Research, 2010. [18] F. Buschmann. To pay or not to pay technical debt. IEEE Software, 28(6), 2011. [19] B. Chess and G. McGraw. Static analysis for security. IEEE Security & Privacy, (6), 2004. [20] C. Couto, J. E. Montandon, C. Silva, and M. T. Valente. Static correspondence and correlation between field defects and warnings reported by a bug finding tool. Software Quality Journal, 21(2), 2013. [21] P. B. Crosby. Quality is free: the art of making quality certain. 1979. [22] W. Cunningham. The wycash portfolio management system. ACM SIGPLAN OOPS Messenger, 4(2), 1993. [23] M. D’Ambros, A. Bacchelli, and M. Lanza. On the impact of design flaws on software defects. In Proceedings of the 10th International Conference on Quality Software (ICQS), 2010. [24] F. Deissenboeck. Continuous Quality Control of Long-Lived Software Systems. PhD thesis, Technische Universität München, 2009. [25] F. Deissenboeck, E. Juergens, B. Hummel, S. Wagner, B. M. y Parareda, and M. Pizka. Tool Support for Continuous Quality Control. IEEE Software, 2008. [26] F. Deissenboeck and M. Pizka. Concise and consistent naming. Software Quality Journal, 14(3), 2006. [27] F. Deissenboeck, S. Wagner, M. Pizka, S. Teuchert, and J. Girard. An activity- based quality model for maintainability. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM), 2007. [28] G. R. Dromey. A model for software product quality. IEEE Transactions on Software Engineering, 21(2), 1995.


[29] B. Du Bois, S. Demeyer, J. Verelst, T. Mens, and M. Temmerman. Does god class decomposition affect comprehensibility? In Proceedings of the IASTED Conference on Software Engineering, 2006. [30] S. Eder, M. Junker, E. Jurgens, B. Hauptmann, R. Vaas, and K. Prommer. How much does unused code matter for maintenance? In Proceedings of the 34th International Conference on Software Engineering (ICSE), 2012. [31] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus. Does code decay? Assessing the evidence from change management data. IEEE Software, 2001. [32] P. Emanuelsson and U. Nilsson. A comparative study of industrial static analysis tools. Electronic notes in theoretical computer science, 217, 2008. [33] L. Erlikh. Leveraging legacy system dollars for e-business. IT Professional, 2(3), 2000. [34] D. Falessi and A. Voegele. Validating and prioritizing quality rules for managing techncial debt: An industrial case study. MTD 2015-UNDER REVISION, 2015. [35] W. Fenske and S. Schulze. Code smells revisited: A variability perspective. In Pro- ceedings of the 9th ACM International Workshop on Variability Modelling of Software- intensive Systems, 2015. [36] D. A. Garvin. What does product quality really mean. Sloan management review, 26(1), 1984. [37] R. L. Glass. Maintenance: Less is not more. IEEE Software, 15(4), 1998. [38] S. Heckman and L. Williams. On establishing a benchmark for evaluating static analysis alert prioritization and classification techniques. In Proceedings of the 2nd ACM-IEEE International Symposium on Empirical Software Engineering and Mea- surement, 2008. [39] S. Heckman and L. Williams. A systematic literature review of actionable alert iden- tification techniques for automated static code analysis. Information and Software Technology, 53(4), 2011. [40] L. Heinemann, B. Hummel, and D. Steidl. Teamscale: Software Quality Control in Real-Time. In Proceedings of the 36th ACM/IEEE International Conference on Software Engineering (ICSE), 2014. ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2591062.2591068. [41] J. D. Herbsleb, A. Mockus, T. A. Finholt, and R. E. Grinter. An empirical study of global software development: distance and speed. In Proceedings of the 23rd Inter- national Conference on Software Engineering (ICSE), 2001. [42] B. Hummel, E. Juergens, L. Heinemann, and M. Conradt. Index-Based Code Clone Detection: Incremental, Distributed, Scalable. In Proceedings of the International Conference on Software Maintenance (ICSM), 2010. [43] ISO. Standard 9126 for software engineering – Product quality – Part 4: Quality in use metrics. Technical report, ISO, 2004.


[44] ISO/IEC. ISO/IEC 25010 - Systems and software engineering - Systems and soft- ware Quality Requirements and Evaluation (SQuaRE) - System and software quality models. Technical report, 2010. [45] B. Johnson, Y. Song, E. Murphy-Hill, and R. Bowdidge. Why don’t software de- velopers use static analysis tools to find bugs? In Proceedings of the 35th IEEE International Conference on Software Engineering (ICSE), 2013. [46] P. Johnson. Requirement and design trade-offs in hackystat: An in-process software engineering mea- surement and analysis system. In Proceedings of the 1st International Symposium on Empirical Software Engineering and Measurement, 2007. [47] E. Juergens. Why and How to Control Cloning in Software Artifacts. PhD thesis, Technische Universität München, 2011. [48] E. Juergens and F. Deissenboeck. How much is a clone? In Proceedings of the 4th International Workshop on Software Quality and Maintainability (WoSQ), 2010. [49] E. Juergens, F. Deissenboeck, B. Hummel, and S. Wagner. Do code clones matter? In Proceedings of the International Conference on Software Engineering (ICSE), 2009. [50] H.-W. Jung, S.-G. Kim, and C.-S. Chung. Measuring software product quality: A survey of iso/iec 9126. IEEE Software, (5), 2004. [51] J. Juran and F. Gryna. Quality Control Handbook. 1988. [52] F. Khomh, M. Di Penta, Y.-G. Guéhéneuc, and G. Antoniol. An exploratory study of the impact of antipatterns on class change-and fault-proneness. Empirical Software Engineering, (3), 2012. [53] F. Khomh, M. D. Penta, and Y.-G. Gueheneuc. An exploratory study of the impact of code smells on software change-proneness. In Proceedings of the 16th Working Conference on Reverse Engineering (WCRE), 2009. [54] S. Kim and M. D. Ernst. Which warnings should i fix first? In Proceedings of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 45–54. ACM, 2007. [55] B. A. Kitchenham, G. H. Travassos, A. von Mayrhauser, F. Niessink, N. F. Schnei- dewind, J. Singer, S. Takada, R. Vehvilainen, and H. Yang. Towards an ontology of software maintenance. Journal of Software Maintenance, 11(6), 1999. [56] A. J. Ko, B. A. Myers, M. J. Coblenz, and H. H. Aung. An Exploratory Study of How Developers Seek, Relate, and Collect Relevant Information During Software Maintenance Tasks. TSE, 32(12), 2006. [57] R. Koschke, R. Falke, and P. Frenzel. Clone detection using abstract syntax suf- fix trees. In Proceedings of the 13th Working Conference on Reverse Engineering (WCRE), 2006. [58] A. Kovács and K. Szabados. Test software quality issues and connections to interna- tional standards. Acta Universitatis Sapientiae, Informatica, 5(1), 2013.


[59] P. Kruchten, R. L. Nord, and I. Ozkaya. Technical debt: from metaphor to theory and practice. IEEE Software, (6), 2012. [60] R. Kumar and A. V. Nori. The Economics of Static Analysis Tools. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, 2013. [61] M. M. Lehman. Laws of software evolution revisited. In Proceedings of the European Workshop on Software Process Technology (EWSPT), 1996. [62] J.-L. Letouzey. The sqale method for evaluating technical debt. In Proceedings of the IEEE Third International Workshop on Managing Technical Debt (MTD), 2012. [63] B. P. Lientz, P. Bennet, E. B. Swanson, and E. Burton. Software Maintenance Man- agement: A Study of the Maintenance of Computer Application Software in 487 Data Processing Organizations. 1980. [64] K. Lochmann and L. Heinemann. Integrating Quality Models and Static Analysis for Comprehensive Quality Assessment. In Proceedings of 2nd International Workshop on Emerging Trends in Software Metrics (WETSoM), 2011. [65] R. Marinescu. Assessing technical debt by identifying design flaws in software systems. IBM Journal of Research and Development, 56(5), 2012. [66] R. C. Martin. Clean Code: A Handbook of Agile Software Craftsmanship. 2008. [67] T. Mens and S. Demeyer. Future trends in software evolution metrics. In Proceedings of the 4th ACM International Workshop on Principles of Software Evolution (IWPSE), 2001. [68] H. H. Mui, A. Zaidman, and M. Pinzger. Studying late propagations in code clone evolution using software repository mining. Electronic Communications of the EASST, 63, 2014. [69] N. Nagappan and T. Ball. Static analysis tools as early indicators of pre-release defect density. In Proceedings of the 27th International Conference on Software engineering (ICSE), 2005. [70] N. Nagappan, L. Williams, J. Hudepohl, W. Snipes, and M. Vouk. Preliminary results on using static analysis tools for software inspection. In Proceedings of the 15th IEEE International Symposium on Software Reliability Engineering (ISSRE), 2004. [71] J. T. Nosek and P. Palvia. Software maintenance management: changes in the last decade. Journal of Software Maintenance, 2(3), 1990. [72] A. Nugroho, J. Visser, and T. Kuipers. An empirical model of technical debt and interest. In Proceedings of the 2nd ACM Workshop on Managing Technical Debt (MTD), 2011. [73] S. Panichella, V. Arnaoudova, M. Di Penta, and G. Antoniol. Would static analysis tools help developers with code reviews? In Proceedings of the IEEE 22nd Inter- national Conference on Software Analysis, Evolution and Reengineering (SANER), 2015.


[74] D. L. Parnas. Software aging. In Proceedings of the 16th International Conference of Software Engineering (ICSE), 1994. [75] G. Pomberger and W. Pree. Software Engineering - Architektur-Design und Prozes- sorientierung (3. Aufl.). 2007. [76] A. Porter, H. P. Siy, C. Toman, L. G. Votta, et al. An experiment to assess the cost- benefits of code inspections in large scale software development. IEEE Transactions on Software Engineering, 23(6), 1997. [77] N. Rutar, C. B. Almazan, and J. S. Foster. A comparison of bug finding tools for java. In Proceedings of the 15th IEEE International Symposium on Software Reliability Engineering (ISSRE), 2004. [78] J. R. Ruthruff, J. Penix, J. D. Morgenthaler, S. Elbaum, and G. Rothermel. Pre- dicting accurate and actionable static analysis warnings: an experimental approach. In Proceedings of the 30th ACM International Conference on Software Engineering, 2008. [79] J. Sappidi and S. A. Curtis, Bill. The crash report - 2011/12 (cast report on application software health). [80] S. Sedigh-Ali, A. Ghafoor, and R. A. Paul. Software engineering metrics for COTS- based systems. Computer, 34(5), 2001. [81] H. Siy and L. Votta. Does the modern code inspection have value? In Proceedings of the IEEE international Conference on Software Maintenance (ICSM), 2001. [82] S. A. Slaughter, D. E. Harter, and M. S. Krishnan. Evaluating the cost of software quality. Commun. ACM, 41(8), 1998. [83] Q. D. Soetens, J. Pérez, S. Demeyer, and A. Zaidman. Circumventing refactoring masking using fine-grained change recording. In Proceedings of the 14th ACM Inter- national Workshop on Principles of Software Evolution (IWPSE), 2015. [84] D. Steidl and F. Deissenboeck. How do java methods grow? In Proceedings of the 15th IEEE International Working Conference on Source Code Analysis and Manipulation, 2015. Copyright ©2015 IEEE. Reprinted, with permission. [85] D. Steidl, F. Deissenboeck, M. Poehlmann, R. Heinke, and B. Uhink-Mergenthaler. Continuous Software Quality Control in Practice. In Proceedings of the IEEE Interna- tional Conference on Software Maintenance and Evolution (ICSME), 2014. Copyright ©2014 IEEE. Reprinted, with permission. [86] D. Steidl and S. Eder. Prioritizing Maintainability Defects by Refactoring Recommen- dations. In Proceedings of the 22nd International Conference on Program Compre- hension (ICPC), 2014. ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2597008.2597805. [87] D. Steidl and N. Göde. Feature-based detection of bugs in clones. In Proceedings of the 7th ICSE International Workshop on Software Clones (IWSC), 2013. Copyright ©2013 IEEE. Reprinted, with permission.


[88] D. Steidl, B. Hummel, and E. Juergens. Quality Analysis of Source Code Comments. In Proceedings of the 21st International Conference on Program Comprehension (ICPC), 2013.

[89] D. Steidl, B. Hummel, and E. Juergens. Incremental Origin Analysis of Source Code Files. In Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), 2014. ©2014 Association for Computing Machinery, Inc. Reprinted by permission. http://doi.acm.org/10.1145/2597073.2597111.

[90] D. Steidl, B. Hummel, and E. Juergens. Using network analysis for recommendation of central software classes. In Proceedings of the 19th Working Conference on Reverse Engineering (WCRE), 2012.

[91] W. Suryn, A. Abran, and A. April. ISO/IEC SQuaRE: The second generation of standards for software product quality, 2003.

[92] A. K. Tripathi and A. Gupta. A controlled experiment to evaluate the effectiveness and the efficiency of four static program analysis tools for Java programs. In Proceedings of the 18th ACM International Conference on Evaluation and Assessment in Software Engineering, 2014.

[93] A. van Deursen and L. Moonen. The video store revisited – thoughts on refactoring and testing. In Proceedings of the 3rd International Conference on eXtreme Programming and Flexible Processes in Software Engineering (XP), 2002.

[94] A. van Deursen, L. Moonen, A. van den Bergh, and G. Kok. Refactoring test code. 2001.

[95] E. Van Emden and L. Moonen. Java quality assurance by detecting code smells. In Proceedings of the 9th Working Conference on Reverse Engineering (WCRE), 2002.

[96] R. D. Venkatasubramanyam and S. G. R. Why is dynamic analysis not used as extensively as static analysis: An industrial study. In Proceedings of the 1st International Workshop on Software Engineering Research and Industrial Practices (SER&IPs), 2014.

[97] A. Vetro, M. Torchiano, and M. Morisio. Assessing the precision of FindBugs by mining Java projects developed at a university. In Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR), 2010.

[98] A. von Mayrhauser and A. M. Vans. Program comprehension during software maintenance and evolution. IEEE Computer, 28(8), 1995.

[99] S. Wagner. A literature survey of the quality economics of defect-detection techniques. In Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering, 2006.

[100] S. Wagner, J. Jürjens, C. Koller, and P. Trischberger. Comparing bug finding tools with reviews and tests. Lecture Notes in Computer Science, 3502, 2005.


[101] S. Wagner, K. Lochmann, L. Heinemann, M. Kläs, A. Trendowicz, R. Plösch, A. Seidl, A. Goeb, and J. Streit. The Quamoco product quality modelling and assessment approach. In Proceedings of the 34th International Conference on Software Engineering (ICSE), 2012.

[102] F. Wedyan, D. Alrmuny, and J. Bieman. The Effectiveness of Automated Static Analysis Tools for Fault Detection and Refactoring Prediction. In Proceedings of the International Conference on Software Testing Verification and Validation (ICST), 2009.

[103] A. Yamashita and L. Moonen. Do code smells reflect important maintainability aspects? In Proceedings of the 28th IEEE International Conference on Software Maintenance (ICSM), 2012.

[104] A. Yamashita and L. Moonen. Do developers care about code smells? An exploratory survey. In Proceedings of the 20th Working Conference on Reverse Engineering (WCRE), 2013.

[105] A. Yamashita and L. Moonen. To what extent can maintenance problems be predicted by code smell detection? - An empirical study. Information and Software Technology, 55(12), 2013.

[106] A. Yamashita, M. Zanoni, F. A. Fontana, and B. Walter. Inter-smell relations in industrial and open source systems: A replication and comparative analysis. In Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME), 2015.

[107] S. S. Yau, J. S. Collofello, and T. MacGregor. Ripple effect analysis of software maintenance. In Proceedings of the IEEE Computer Society’s Second International Computer Software and Applications Conference (COMPSAC), 1978.

[108] J. Zheng, L. Williams, N. Nagappan, W. Snipes, J. P. Hudepohl, and M. Vouk. On the value of static analysis for fault detection in software. IEEE Transactions on Software Engineering, 32(4), 2006.
