Dissertation Deep Pattern Mining for Program Repair
Total Page:16
File Type:pdf, Size:1020Kb
PhD-FSTC-2019-79 The Faculty of Science, Technology and Communication Dissertation Presented on the 18/12/2019 in Luxembourg to obtain the degree of Docteur de l’Université du Luxembourg en Informatique by Kui Liu Born on 11th August 1988 in Jiangxi, China Deep Pattern Mining for Program Repair Dissertation Defense Committee Dr. Yves Le Traon, Dissertation Supervisor Professor, University of Luxembourg, Luxembourg Dr. Tegawendé Bissyandé, Chairman Assistant Professor, University of Luxembourg, Luxembourg Dr. Dongsun Kim, Vice Chairman Quality Assurance Engineer, Furiosa.ai, South Korea Dr. Jacques Klein, Expert Associate Professor, University of Luxembourg, Luxembourg Dr. Andreas Zeller Professor, Saarland University, Saarbrucken, Germany Dr. David Lo Associate Professor, Singapore Management University, Singapore Abstract Error-free software is a myth. Debugging thus accounts for a significant portion of software maintenance and absorbs a large part of software cost. In particular, the manual task of fixing bugs is tedious, error- prone and time-consuming. In the last decade, automatic bug-fixing, also referred to as automated program repair (APR) has boomed as a promising endeavor of software engineering towards alleviating developers’ burden. Several potentially promising techniques have been proposed making APR an increasingly prominent topic in both the research and practice communities. In production, APR will drastically reduce time-to-fix delays and limit downtime. In a development cycle, APR can help suggest changes to accelerate debugging. As an emergent domain, however, program repair has many open problems that the community is still exploring. Our work contributes to this momentum on two angles: the repair of programs for functionality bugs, and the repair of programs for method naming issues. The thesis starts with highlighting findings on key empirical studies that we have performed to inform future repair approaches. Then, we focus on template-based program repair scenarios and explore deep learning models for inferring accurate and relevant patterns. Finally, we integrate these patterns into APR pipelines, which yield the state of the art repair tools. The dissertation includes the following contributions: • Real-world Patch Study: Existing APR studies have shown that the state-of-the-art techniques in automated repair tend to generate patches only for a small number of bugs even with quality issues (e.g., incorrect behavior and nonsensical changes). To improve APR techniques, the community should deepen its knowledge on repair actions from real-world patches since most of the techniques rely on patches written by human developers. However, previous investigations on real-world patches are limited to statement level that is not sufficiently fine-grained to build this knowledge. This dissertation starts with deepening this knowledge via a systematic and fine-grained study of real-world Java program bug fixes. • Fault Localization Impact: Existing test-suite-based APR systems are highly dependent on the performance of the fault localization (FL) technique that is the process of the widely studied APR pipeline. However, APR systems generally focus on the patch generation, but tend to use similar but different strategies for fault localization. To assess the impact of FL on APR, we identify and investigate a practical bias caused by the FL step in a repair pipeline. We propose to highlight the different FL configurations used in the literature, and their impact on APR systems when applied to the real bugs. Then, we explore the performance variations that can be achieved by “tweaking” the FL step. • Fix Pattern Mining: Fix patterns (a.k.a. fix templates) have been studied in various APR scenarios. Particularly, fix patterns have been widely used in different APR systems. To date, fix pattern mining is mainly studied in three ways: manually summarization, transformation inferring and code change action statistics. In this dissertation, we explore mining fix patterns for static bugs leveraging deep learning and clustering algorithms. • Avatar: Fix pattern based patch generation is a promising direction in the APR community. Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of fix pattern based APR systems, however, depends on the fix ingredients mined from commit changes i in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. We propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the Avatar APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. • TBar: Fix patterns are widely used in patch generation of APR, however, the repair performance of a single fix pattern is not studied. We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. • Debugging Method Names: Except the issues about semantic/static bugs in programs, we note that how to debug inconsistent method names automatically is important to improve program quality. In this dissertation, we propose a deep learning based approach to spotting and refactoring inconsistent method names in programs. ii To my dear wife Lie Ma. Acknowledgements This dissertation would not have been possible without the support of many people who in one way or another have contributed and extended their precious knowledge and experience in my PhD studies. It is my pleasure to express my gratitude to them. First of all, I would like to express my deepest thanks to my supervisor, Prof. Yves Le Traon, who has given me this great opportunity to come across continents to pursue my doctoral degree. He has always trusted and supported me with his great kindness throughout my whole PhD journey. Second, I am equally grateful to my daily advisers, Dr. Tegawendé Bissyandé and Dr. Dongsun Kim, who have introduced me into the world of automated program repair. Since then, working in this field is just joyful for me. I am particularly grateful to them as well as Dr. Jacques Klein, for their patience, valuable advice and flawless guidance. They have taught me how to perform research, how to write technical papers, and how to conduct fascinating presentations. Their dedicated guidance has made my PhD journey a fruitful and fulfilling experience. I am very happy for the friendship we have built up during the years. Third, I would like to extend my thanks to all my co-authors including Prof. Sin Yoo, Prof. Martin Monperrus, Prof. Suntae Kim, Dr. Li Li, Anil Koyuncu, Kisub Kim, Pingfan Kong, Taeyoung Kim, etc. for their valuable discussions and collaborations. I would like to thank all the members of my PhD defense committee, including chairman Dr. Tegawendé Bissyandé, Prof. David Lo, Prof. Andreas Zeller, my supervisor Prof. Yves Le Traon, and my daily adviser Dr. Dongsun Kim. It is my great honor to have them in my defense committee and I appreciate very much their efforts to examine my dissertation and evaluate my PhD work. I would like to also express my great thanks to all the friends that I have made in the Grand Duchy of Luxembourg for the memorable moments that we have had. More specifically, I would like to thank all the team members of SerVal at SnT for the great coffee breaks and interesting discussions. Finally, and more personally, I would like to express my deepest appreciation to my dear wife Lie Ma, for her everlasting support, encouragement, and love. Also, I can never overemphasize the importance of my mother in shaping who I am and giving me the gift of education. The last but not least, I would like to extend my thanks to my daughter Ruoshui Liu who gives me unexpected happiness and motivation for this dissertation. Without their support, this dissertation would not have been possible. Kui Liu University of Luxembourg December 2019 v Contents List of figures xi List of tables xv Contents xvi 1 Introduction 1 1.1 Motivation.........................................2 1.2 Challenges of Program Repair...............................3 1.2.1 Challenges in Fault Localization.........................3 1.2.2 APR-Inherited Challenges.............................4 1.2.3 Challenges of Debugging Method Names.....................5 1.3 Contributions........................................5 1.4 Roadmap..........................................6 2 Background 9 2.1 Automated Program Repair................................ 10 2.1.1 State-of-the-Art APR............................... 10 2.1.2 Fault Localizaiton in APR............................. 11 2.1.3 APR Assessment.................................. 12 2.2 Fix Pattern Inference.................................... 12 2.3 Debugging Method Names................................. 12 3 A Closer Look at Real-World