Phd Thesis the Open University

Total Page:16

File Type:pdf, Size:1020Kb

Phd Thesis the Open University Open Research Online The Open University’s repository of research publications and other research outputs Improving Information Retrieval Bug Localisation Using Contextual Heuristics Thesis How to cite: Dilshener, Tezcan (2017). Improving Information Retrieval Bug Localisation Using Contextual Heuristics. PhD thesis The Open University. For guidance on citations see FAQs. c 2016 The Author https://creativecommons.org/licenses/by-nc-nd/4.0/ Version: Version of Record Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.21954/ou.ro.0000c41c Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page. oro.open.ac.uk Improving Information Retrieval Based Bug Localisation Using Contextual Heuristics Tezcan Dilshener M.Sc. A thesis submitted to The Open University in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computing Department of Computing and Communications Faculty of Mathematics, Computing and Technology The Open University September 2016 Copyright © 2016 Tezcan Dilshener Abstract Software developers working on unfamiliar systems are challenged to identify where and how high-level concepts are implemented in the source code prior to performing maintenance tasks. Bug localisation is a core program comprehension activity in software maintenance: given the observation of a bug, e.g. via a bug report, where is it located in the source code? Information retrieval (IR) approaches see the bug report as the query, and the source files as the documents to be retrieved, ranked by relevance. Current approaches rely on project history, in particular previously fixed bugs and versions of the source code. Existing IR techniques fall short of providing adequate solutions in finding all the source code files relevant for a bug. Without additional help, bug localisation can become a tedious, time- consuming and error-prone task. My research contributes a novel algorithm that, given a bug report and the application’s source files, uses a combination of lexical and structural information to suggest, in a ranked order, files that may have to be changed to resolve the reported bug without requiring past code and similar reports. I study eight applications for which I had access to the user guide, the source code, and some bug reports. I compare the relative importance and the occurrence of the domain concepts in the project artefacts and measure the e↵ectiveness of using only concept key words to locate files relevant for a bug compared to using all the words of a bug report. Measuring my approach against six others, using their five metrics and eight projects, I position an e↵ected file in the top-1, top-5 and top-10 ranks on average for 44%, 69% and 76% of the bug reports respectively. This is an improvement of 23%, 16% and 11% respectively over the best performing current state-of-the-art tool. Finally, I evaluate my algorithm with a range of industrial applications in user studies, and found that it is superior to simple string search, as often performed by developers. These results show the applicability of my approach to software projects without history and o↵ers a simpler light-weight solution. i ii Acknowledgements I express my thanks and sincere gratitude to my supervisors, Dr. Michel Wermelinger and Dr. Yijun Yu, for believing in my research and in my capabilities. Their dedicated sup- port, direction and the guidance made my journey through the research forest a conquerable experience. Simon Butler for his assistance in using the JIM tool and for the countless proof readings of my work. As my research buddy, he always managed to motivate and encourage me. Prof. Marian Petre for sharing her valuable experience during online seminar sessions and organising the best departmental conferences. Prof. Hongyu Zhang for kindly providing BugLocator and its datasets, Ripon Saha for the BLUiR dataset, Laura Moreno for the LOBSTER dataset and Chu-Pan Wong for giving me access to the source code of his tool BRTracer. Jana B. at our industrial partner, a global financial IT solutions provider located in southern Germany, for providing the artefacts to one of their proprietary application that I used throughout my research and their input on diverse information required. Markus S., Georgi M. and Alan E. for being the proxy at their business locations while I conducted my user study. Also to Markus S. at our industrial partner for his countless hours of discussions on patterns in software applications and source code analysis. My wife Anja for her endless patience, understanding and respect. With her love, encour- agement and dedicated coaching, I stood the emotional up and down phases of my research, thus completed my Ph.D. studies. To my two daughters, Denise and Jasmin for allowing their play time with me to be spent on my research instead. Last but not least, thank you to all those who wish to remain unnamed for contributing to my dedication. Finally, there are no intended contradictory intentions implied to any individual or estab- lishment. iii iv Contents Contents viii List of Figures x List of Tables xiii 1Introduction 1 1.1 Academic Motivation: Software Maintenance Challenges . 1 1.2 Industry Motivation: Industry Needs in Locating Bugs . 3 1.3 An Example of Information Retrieval Process . 3 1.4 Existing Work and The Limitations . 7 1.5 ResearchQuestions................................. 9 1.6 OverviewoftheChapters ............................. 11 1.7 Summary ...................................... 13 2 Landscape of Code Retrieval 15 2.1 Concept Assignment Problem . 16 2.2 Information Retrieval as Conceptual Framework . 17 2.2.1 Evaluation Metrics . 18 2.2.2 Vector Space Model and Latent Semantic Analysis . 18 2.2.3 Dynamic Analysis and Execution Scenarios . 20 2.2.4 Source Code Vocabulary . 21 2.2.5 Ontology Relations . 25 2.2.6 Call Relations . 27 2.3 Information Retrieval in Bug Localisation . 29 2.3.1 Evaluation Metrics . 31 2.3.2 Vocabulary of Bug Reports in Source Code Files . 32 v 2.3.3 Using Stack Trace and Structure . 34 2.3.4 Version History and Other Data Sources . 37 2.3.5 Combining Multiple Information Sources . 39 2.4 CurrentTools ................................... 43 2.5 UserStudies..................................... 46 2.6 Summary ...................................... 49 2.6.1 Conclusion . 52 3 Relating Domain Concepts and Artefact Vocabulary in Software 55 3.1 Introduction..................................... 56 3.2 Previous Approaches . 57 3.3 My Approach . 59 3.3.1 DataProcessing .............................. 60 3.3.1.1 Data Extraction Stage . 61 3.3.1.2 Persistence Stage . 64 3.3.1.3 Search Stage . 66 3.3.2 Ranking Files . 68 3.4 Evaluation of the results . 71 3.4.1 RQ1.1: How Does the Degree of Frequency Among the Common Con- cepts Correlate Across the Project Artefacts? . 71 3.4.1.1 Correlation of common concepts across artefacts . 75 3.4.2 RQ1.2: What is the Vocabulary Similarity Beyond the Domain Con- cepts, which may Contribute Towards Code Comprehension? . 76 3.4.3 RQ1.3: How can the Vocabulary be Leveraged When Searching for Concepts to Find the Relevant Files? . 79 3.4.3.1 Performance . 81 3.4.3.2 Contribution of VSM . 81 3.4.3.3 Searching for AspectJ and SWT in Eclipse . 83 3.5 Threatstovalidity ................................. 84 3.6 Discussion ..................................... 85 3.7 ConcludingRemarks ............................... 86 vi 4 Locating Bugs without Looking Back 89 4.1 Introduction..................................... 90 4.2 Existing Approaches . 93 4.3 Extended Approach . 95 4.3.1 RankingFilesRevisited .......................... 96 4.3.1.1 Scoring with Key Positions (KP score) . 97 4.3.1.2 Scoring with Stack Traces (ST score) . 99 4.3.1.3 Rationale behind the scoring values . 100 4.4 Evaluation of the Results . 100 4.4.1 RQ2: Scoring with File Names in Bug Reports . 101 4.4.1.1 Scoring with Words In Key Positions (KP score) . 101 4.4.1.2 Scoring with Stack Trace Information (ST score) . 103 4.4.1.3 KP and ST Score improvement when Searching with Concepts Only vs all Bug Report Vocabulary . 105 4.4.1.4 Variations of Score Values . 106 4.4.1.5 Overall Results . 108 4.4.1.6 Performance . 114 4.4.2 RQ2.1: Scoring without Similar Bugs . 116 4.4.3 RQ2.2: VSM’s Contribution . 118 4.5 Discussion ..................................... 120 4.5.1 Threats to Validity . 122 4.6 Concluding Remarks . 123 5Userstudies 125 5.1 Background . 125 5.2 StudyDesign .................................... 127 5.3 Results........................................ 129 5.3.1 Pre-session Interview Findings . 129 5.3.2 Post-session Findings . 131 5.4 Evaluation of the Results . 134 5.4.1 Threats to Validity . 136 5.5 Concluding Remarks . 137 vii 6 Conclusion and Future Directions 139 6.1 How the Research Problem is Addressed . 139 6.1.1 Relating Domain Concepts and Vocabulary . 140 6.1.2 Locating Bugs Without Looking Back . 142 6.1.3 User Studies . 145 6.1.4 Validity of My Hypothesis . 145 6.2 Contributions . 146 6.2.1 Recommendations for Practitioners . 146 6.2.2 Suggestions for Future Research . 147 6.3 A Final Rounding O↵............................... 150 Bibliography 153 A Glossary 167 BTop-5Concepts 170 C Sample Bug Report Descriptions 172 D Search Pattern for Extracting Stack Trace 173 E Domain Concepts 174 viii List of Figures 1.1 Information retrieval repository creation and search process . 4 1.2 Bug report with a very terse description in SWT project . 5 1.3 AspectJ project’s bug report description with stack trace information . 6 1.4 Ambiguous bug report description found in SWT .
Recommended publications
  • The Origins of the Underline As Visual Representation of the Hyperlink on the Web: a Case Study in Skeuomorphism
    The Origins of the Underline as Visual Representation of the Hyperlink on the Web: A Case Study in Skeuomorphism The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Romano, John J. 2016. The Origins of the Underline as Visual Representation of the Hyperlink on the Web: A Case Study in Skeuomorphism. Master's thesis, Harvard Extension School. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:33797379 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA The Origins of the Underline as Visual Representation of the Hyperlink on the Web: A Case Study in Skeuomorphism John J Romano A Thesis in the Field of Visual Arts for the Degree of Master of Liberal Arts in Extension Studies Harvard University November 2016 Abstract This thesis investigates the process by which the underline came to be used as the default signifier of hyperlinks on the World Wide Web. Created in 1990 by Tim Berners- Lee, the web quickly became the most used hypertext system in the world, and most browsers default to indicating hyperlinks with an underline. To answer the question of why the underline was chosen over competing demarcation techniques, the thesis applies the methods of history of technology and sociology of technology. Before the invention of the web, the underline–also known as the vinculum–was used in many contexts in writing systems; collecting entities together to form a whole and ascribing additional meaning to the content.
    [Show full text]
  • MTS4CC Elementary Stream Compliance Checker Tutorials 001-1415-00
    MTS4CC Elementary Stream Compliance Checker Tutorials 001-1415-00 This document applies to software version 1.0 and above. www.tektronix.com Copyright © Tektronix. All rights reserved. Licensed software products are owned by Tektronix or its subsidiaries or suppliers, and are protected by national copyright laws and international treaty provisions. Tektronix products are covered by U.S. and foreign patents, issued and pending. Information in this publication supercedes that in all previously published material. Specifications and price change privileges reserved. TEKTRONIX and TEK are registered trademarks of Tektronix, Inc. Contacting Tektronix Tektronix, Inc. 14200 SW Karl Braun Drive P.O. Box 500 Beaverton, OR 97077 USA For product information, sales, service, and technical support: H In North America, call 1-800-833-9200. H Worldwide, visit www.tektronix.com to find contacts in your area. Table of Contents Getting Started............................................ 1 Basic Functions.................................................. 2 How to Begin a Tutorial........................................... 2 Tutorial 1: H.263 Standards Compliance and Motion Vectors..... 3 Procedure...................................................... 3 Conclusion..................................................... 8 Tutorial 2: MPEG-4 Compliance............................. 9 Procedure...................................................... 9 Conclusions..................................................... 15 Tutorial 3: MP4 Compliance Basics..........................
    [Show full text]
  • SHARING FILES and FOLDERS in WTC and WORKSPACES WORKSPACES V1.3X USER GUIDE
    SHARING FILES AND FOLDERS IN WTC AND WORKSPACES WORKSPACES v1.3x USER GUIDE GlobalSCAPE, Inc. (GSB) Corporate Headquarters Address: 4500 Lockhill-Selma Road, Suite 150, San Antonio, TX (USA) 78249 Sales: (210) 308-8267 Sales (Toll Free): (800) 290-5054 Technical Support: (210) 366-3993 Web Support: http://www.globalscape.com/support/ © 2008-2017 GlobalSCAPE, Inc. All Rights Reserved August 2, 2017 Table of Contents How Do I Share Files? .................................................................................................................................................... 7 WTC Administration ...................................................................................................................................................... 9 Enabling User Access to the Web Transfer Client .................................................................................................. 9 Localization (Language) Settings .......................................................................................................................... 10 WTC Error Messages in EFT .................................................................................................................................. 11 Disable CRC ........................................................................................................................................................... 14 Disabling "Update Your Browser" Prompts .......................................................................................................... 14 Terms and
    [Show full text]
  • Effects of Scroll Bar Orientation and Item Justification in Using List Box Widgets
    Effects of scroll bar orientation and item justification in using list box widgets G. Michael Barnes Erik Kellener California State University Northridge Hollywood Online Inc. 18111 Nordhoff Street 1620 26th St., #370S Northridge, CA 91330-8281 USA Santa Monica, CA 90404 +1 818 677 2299 +1 310 586 2020 [email protected] [email protected] Abstract List boxes are a common user interface component in graphical user interfaces. In practice, most list boxes use right-oriented scroll bars to control left-justified text items. A two way interaction hypothesis favoring the use of a scroll bar orientation consistent with list box item justification was obtained for speed of use and user preference. Item selection was faster with a scroll bar orientation consistent with list item justification. Subjects preferred left- oriented scroll bars with left-justified items and right-oriented scroll bars with right-oriented items. These results support a design principle of locality for user interface controls and controlled objects. Keywords List widgets, scroll bar, graphical user interface design, usability study Introduction This electronic publication is an updated statistical analysis of Erik Kellener’s unpublished masters’ thesis, “Are GUI Ambidexterous” completed at California State University Northridge, CA. 1996. List boxes are used in many graphical user interfaces (GUI) today. Whether its a desktop P.C., a personal digital assistant (PDA) or an information kiosk at the grocery store, list boxes are integrated into most GUIs. The list box GUI component is usually present in an interface that asks a user to make a selection from a list of items. The size of the list of items can vary significantly, however the screen area required by a list box is usually fixed.
    [Show full text]
  • Dealing with Document Size Limits
    Dealing with Document Size Limits Introduction The Electronic Case Filing system will not accept PDF documents larger than ten megabytes (MB). If the document size is less than 10 MB, it can be filed electronically just as it is. If it is larger than 10 MB, it will need to be divided into two or more documents, with each document being less than 10 MB. Word Processing Documents Documents created with a word processing program (such as WordPerfect or Microsoft Word) and correctly converted to PDF will generally be smaller than a scanned document. Because of variances in software, usage, and content, it is difficult to estimate the number of pages that would constitute 10 MB. (Note: See “Verifying File Size” below and for larger documents, see “Splitting PDF Documents into Multiple Documents” below.) Scanned Documents Although the judges’ Filing Preferences indicate a preference for conversion of documents rather than scanning, it will be necessary to scan some documents for filing, e.g., evidentiary attachments must be scanned. Here are some things to remember: • Documents scanned to PDF are generally much larger than those converted through a word processor. • While embedded fonts may be necessary for special situations, e.g., trademark, they will increase the file size. • If graphs or color photos are included, just a few pages can easily exceed the 10 MB limit. Here are some guidelines: • The court’s standard scanner resolution is 300 dots per inch (DPI). Avoid using higher resolutions as this will create much larger file sizes. • Normally, the output should be set to black and white.
    [Show full text]
  • M-Explorer User's Guide
    Table of Contents M-Explorer User’s Guide Table of Contents M-Explorer User’s Guide..............................................1 Chapter 1 Using This Guide.................................................................1-1 Introduction...................................................................................................... 1-1 Key Concepts................................................................................................... 1-2 Chapter Organization .....................................................................................................1-2 Online Help ....................................................................................................................1-2 Manual Conventions ......................................................................................................1-2 Chapter 2 Introduction to M-Explorer .................................................2-1 Introduction...................................................................................................... 2-1 Key Concepts................................................................................................... 2-2 System Component Interaction......................................................................................2-2 OPC Data Access Servers.............................................................................................2-2 M-Inspector ....................................................................................................................2-2 M-Password ...................................................................................................................2-3
    [Show full text]
  • Thomson Reuters Spreadsheet Link User Guide
    THOMSON REUTERS SPREADSHEET LINK USER GUIDE MN-212 Date of issue: 13 July 2011 Legal Information © Thomson Reuters 2011. All Rights Reserved. Thomson Reuters disclaims any and all liability arising from the use of this document and does not guarantee that any information contained herein is accurate or complete. This document contains information proprietary to Thomson Reuters and may not be reproduced, transmitted, or distributed in whole or part without the express written permission of Thomson Reuters. Contents Contents About this Document ...................................................................................................................................... 1 Intended Readership ................................................................................................................................. 1 In this Document........................................................................................................................................ 1 Feedback ................................................................................................................................................... 1 Chapter 1 Thomson Reuters Spreadsheet Link .......................................................................................... 2 Chapter 2 Template Library ........................................................................................................................ 3 View Templates (Template Library) ..............................................................................................................................................
    [Show full text]
  • How to Create and Maintain a Table of Contents
    How to Create and Maintain a Table of Contents How to Create and Maintain a Table of Contents Version 0.2 First edition: January 2004 First English edition: January 2004 Contents Contents Overview........................................................................................................................................ iii About this guide..........................................................................................................................iii Conventions used in this guide................................................................................................... iii Copyright and trademark information.........................................................................................iii Feedback.....................................................................................................................................iii Acknowledgments...................................................................................................................... iv Modifications and updates..........................................................................................................iv Creating a table of contents.......................................................................................................... 1 Opening Writer's table of contents feature.................................................................................. 1 Using the Index/Table tab ...........................................................................................................2 Setting
    [Show full text]
  • ALGE Displaystudio Manual
    ALGE DisplayStudio Manual DisplayStudio Table of Content 1 General .............................................................................................................................3 2 Getting started...................................................................................................................3 2.1 Main Window ............................................................................................................3 2.2 Tree view for project navigation................................................................................4 3 Lists...................................................................................................................................5 3.1 List Builder................................................................................................................5 3.1.1 Text panel.............................................................................................................6 3.1.2 Animation..............................................................................................................6 3.1.3 List compiling........................................................................................................7 4 Animations and wipes .......................................................................................................7 4.1 Adding animation/wipe to the project........................................................................8 4.2 Animation/wipe editing..............................................................................................8
    [Show full text]
  • Operating Procedures for TEM2 FEI Tecnai
    Operating Procedures for TEM2 FEI Tecnai Note: Do not press the buttons on the TEM. This will turn the TEM off and will take hours to bring it back up. Please do not reboot the TEM computer. This will turn off the vacuum and the high tension. If you are having problems with the TEM, please find a SMIF staff member immediately. 1) Log usage into the SMIF web site. 2) Check to see if the camera boxes are on. a) The TIA camera box is the box behind the monitors. The switch will be lit green if on. b) The AMT camera box is on the floor under the Cryo holder. The green light above the switch should be on. c) If either of the cameras were off, please find a SMIF staff member to turn on. The cameras must cool down for 1hr before they can be used. 3) If needed, log into the TEM computer. User name is : TEM Users, Password is: tecnai 4) If needed, open the Tecnai User Interface and then open the TIA software. If the TIA software crashes, please find a SMIF staff member. 5) Place LN2 into the cold finger dewar. The LN2 needs to be topped off before each use. Turning on TEM Note: It is important to change the kV before ramping up the Heat to # when turning on the TEM. 1) There are tabs at the top left of the Tecnai software. Go to the Tune tab. Under the Control Pads box click on Fluorescent Background Light button. This turns on the lights on the control panels.
    [Show full text]
  • John Athayde and Bruce Williams — «The Rails View
    What readers are saying about The Rails View This is a must-read for Rails developers looking to juice up their skills for a world of web apps that increasingly includes mobile browsers and a lot more JavaScript. ➤ Yehuda Katz Driving force behind Rails 3.0 and Co-founder, Tilde In the past several years, I’ve been privileged to work with some of the world’s leading Rails developers. If asked to name the best view-layer Rails developer I’ve met, I’d have a hard time picking between two names: Bruce Williams and John Athayde. This book is a rare opportunity to look into the minds of two of the leading experts on an area that receives far too little attention. Read, apply, and reread. ➤ Chad Fowler VP Engineering, LivingSocial Finally! An authoritative and up-to-date guide to everything view-related in Rails 3. If you’re stabbing in the dark when putting together your Rails apps’ views, The Rails View provides a big confidence boost and shows how to get things done the right way. ➤ Peter Cooper Editor, Ruby Inside and Ruby Weekly The Rails view layer has always been a morass, but this book reins it in with details of how to build views as software, not just as markup. This book represents the wisdom gained from years’ worth of building maintainable interfaces by two of the best and brightest minds in our business. I have been writing Ruby code for over a decade and Rails code since its inception, and out of all the Ruby books I’ve read, I value this one the most.
    [Show full text]
  • Princeton University COS 217: Introduction to Programming Systems a Minimal COS 217 Computing Environment
    Princeton University COS 217: Introduction to Programming Systems A Minimal COS 217 Computing Environment 1. Access the Fall 2020 COS 217 Account on Ed 1.1. You can access Ed through Canvas. 1.2 Post questions and comments (that comply with the course communication policies) to Ed. Posts will be available to all other students and instructors. Remember to check Ed often, especially while working on assignments and preparing for exams. 2. Activating Your University Computing Account One time only… 2.1. (If you're working off-campus) Perform the instructions on this web page to use SRA (secure remote access): http://helpdesk.princeton.edu/kb/display.plx?ID=6023 2.2. Use a Web browser to visit the OIT Account Activation Page at http://helpdesk.princeton.edu/kb/display.plx?ID=9973 2.3. Perform the five steps listed in the Set Your Security Profile section of the page to set your security profile. 2.4. In the After You Have Activated Your Account section of the page, click on the Enable your Unix account link. If you do not see these options on the page, then from the webpage in Step 2.2, in the gray box on the right, click on “How to activate your Princeton University Account and manage personal information." On that new page, scroll down to "Enable your Unix account." 2.5. In the resulting Unix: How do I enable/change the default Unix shell on my account? page, click on the Enable Unix Account link. 2.6. In the resulting dialog box, type your Princeton netid and password, and click the OK button.
    [Show full text]