Context-Sensitive Code Completion
A Thesis Submitted to the College of Graduate and Postdoctoral Studies
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
in the Department of Computer Science
University of Saskatchewan
Saskatoon
By Muhammad Asaduzzaman
© Muhammad Asaduzzaman, January 2018. All rights reserved.

Permission to Use
In presenting this thesis in partial fulfilment of the requirements for a Postgraduate degree from the University of Saskatchewan, I agree that the Libraries of this University may make it freely available for inspection. I further agree that permission for copying of this thesis in any manner, in whole or in part, for scholarly purposes may be granted by the professor or professors who supervised my thesis work or, in their absence, by the Head of the Department or the Dean of the College in which my thesis work was done. It is understood that any copying or publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Saskatchewan in any scholarly use which may be made of any material in my thesis. Requests for permission to copy or to make other use of material in this thesis in whole or part should be addressed to:
Head of the Department of Computer Science
176 Thorvaldson Building
110 Science Place
University of Saskatchewan
Saskatoon SK S7N 5C9
OR

Dean
College of Graduate and Postdoctoral Studies
University of Saskatchewan
116 - 110 Science Place
Saskatoon SK S7N 5C9
Abstract
Developers depend extensively on software frameworks and libraries to deliver products on time. While these frameworks and libraries support software reuse, save development time, and reduce the possibility of introducing errors, they do not come without a cost. Developers need to learn and remember Application Programming Interfaces (APIs) to use those frameworks and libraries effectively. However, APIs are difficult to learn and use: they are large in number, they may not be properly documented, and complex relationships exist between various classes and methods. To support developers using those APIs, this thesis focuses on the code completion feature of modern integrated development environments (IDEs). As a developer types code, a code completion system offers a list of completion proposals through a popup menu to navigate and select. This research aims to improve the ability of code completion systems to help developers discover APIs. In this direction, a case study on tracking source code lines was conducted to better understand how to capture code context and to evaluate the benefits of the simhash technique. Observations from the study helped to develop a simple, context-sensitive method call completion technique, called CSCC. The technique is compared with a large number of existing code completion techniques. The notion of context proposed in CSCC can even outperform graph-based statistical language models. Existing method call completion techniques leave the task of completing method parameters to developers. To address this issue, this thesis investigated how developers complete method parameters. Based on the analysis, a method parameter completion technique, called PARC, was developed. To date, the technique supports the largest number of expression types for completing method parameters.
The technique has been implemented as an Eclipse plug-in that serves as a proof of concept. To meet application-specific requirements, software frameworks need to be customized via extension points. It was observed that developers often pass a framework-related object as an argument to an API call to customize default aspects of application frameworks. To enable such customizations, the object can be created by extending a framework class, implementing an interface, or changing the properties of the object via API calls. However, finding all the details related to these customizations is both a common and a non-trivial task. To address this issue, a technique called FEMIR has been developed. The technique utilizes partial program analysis and graph mining techniques to detect, group, and rank framework extension examples. The tool extends the existing code completion infrastructure to inform developers about customization choices, enabling them to browse the extension points of a framework and frequent usages of each point in terms of code examples. Findings from this research and the proposed techniques have the potential to help developers learn different aspects of APIs, thus easing software development and improving developer productivity.
Acknowledgements
First of all, I would like to express my heartiest gratitude to my supervisors, Dr. Chanchal K. Roy and Dr. Kevin A. Schneider, for their constant guidance, valuable suggestions, encouragement, and extraordinary patience during this thesis work. This thesis would have been impossible without them. I am indebted to Dr. Daqing Hou (Electrical and Computer Engineering Department, Clarkson University, USA) for his continuous inspiration, expert advice, and brilliant ideas that not only shaped my research but also guided me through the jungle of ideas. I would like to thank Dr. Dwight Makaroff, Dr. Ian McQuillan, and Dr. Seok-Bum Ko for their participation in my thesis committee. Their advice, evaluation, and insightful comments led to the successful completion of this thesis. Thanks are also due to Dr. Tien N. Nguyen and all other members of my thesis examination committee for their helpful comments, insights, and suggestions. I am thankful to the co-authors of the papers I have published during my graduate studies, including Chanchal K. Roy, Kevin A. Schneider, Daqing Hou, Ripon K. Saha, Massimiliano D. Penta, Michael C. Bullock, Ahmed S. Mashiyat, Samiul Monir, and Muhammad Ahasanuzzaman. I would like to thank all the present and past members of the Software Research Lab for their support, advice, and inspiration. In particular, I would like to thank Manishankar Mondal, Jeffrey Svajlenko, Saidur Rahman, Ripon K. Saha, Minhaz Zibran, Mohammad Khan, Md. Sharif Uddin, Khalid Billah, Farouq Al. Omari, Masudur Rahman, Judith Islam, Kawser W. Nafi, Amit Mondal, Avigit K. Saha, Shamima Yeasmin, and Tonny Kar. This work is supported in part by the Canada First Research Excellence Fund (CFREF) under the Global Institute for Food Security (GIFS) and the Natural Sciences and Engineering Research Council of Canada (NSERC). I am grateful to them for their generous financial support, which helped me to concentrate more deeply on my thesis work.
I would like to thank all the staff members in the Department of Computer Science who have helped me throughout my graduate studies. In particular, I would like to thank Gwen Lancaster, Heather Webb, and Shakiba Jalal. The most wonderful thing in my life is my son, Nubaid Ilham, whose presence has inspired me to finish my thesis work. I am deeply indebted to my incomparable wife, Nushrat Jahan, for her love, patience, and support during the thesis work. They are the ones who sacrificed the most for this thesis. I express my heartiest gratitude to my family members and relatives, especially my mother Ayesha Akhtara Begum, my father Md. Abdul Jalil Miah, my brother Muhammad Ahasanuzzaman, and my uncles Motahar Hossain, Atahar Hossain, and Mozahar Hossain. Their constant encouragement and support have given me the confidence to complete this thesis.
This thesis is dedicated to my mother, Ayesha Akhtara Begum, whose affection and inspiration have given me mental strength in every step of my life, to my father, Md. Abdul Jalil Mia, whose guidance has helped me make the right decisions, and to my wife, Nushrat Jahan, whose continuous support and sacrifice have helped me complete this thesis.
Contents
Permission to Use i
Abstract ii
Acknowledgements iii
Contents v
List of Tables viii
List of Figures x
List of Abbreviations xii
1 Introduction 1
1.1 Motivation ...... 1
1.2 Research Problem ...... 2
1.3 Addressing Research Problems ...... 2
1.4 Contributions of the Thesis ...... 3
1.5 Outline of the Thesis ...... 4
2 Background and Taxonomy of Related Work 6
2.1 Benefits of Code Completion ...... 6
2.2 Requirements of Code Completion Systems ...... 7
2.3 Components of a Code Completion System ...... 9
2.4 Techniques ...... 10
2.4.1 Method call completion ...... 10
2.4.2 Method parameter completion ...... 14
2.4.3 Method body completion ...... 15
2.4.4 API usage pattern completion ...... 18
2.4.5 Language model-based code completion ...... 19
2.4.6 Keyword-based code completion ...... 20
2.4.7 Other forms of code completion ...... 22
3 Capturing code context and using the simhash technique: A case study on tracking source code lines 26
3.1 Introduction ...... 26
3.2 Related Work ...... 27
3.3 LHDiff: A Language-Independent Hybrid Line Tracking Technique ...... 29
3.3.1 Preprocess input files ...... 30
3.3.2 Detect unchanged lines ...... 30
3.3.3 Generate candidate list ...... 30
3.3.4 Resolve conflicts ...... 32
3.3.5 Detect line splits ...... 32
3.3.6 Evaluation ...... 33
3.4 Evaluation Using Benchmarks ...... 33
3.4.1 Experiment details ...... 33
3.4.2 Results ...... 36
3.4.3 Discussion ...... 36
3.5 Evaluation Using Mutation-based Analysis ...... 39
3.5.1 Editing taxonomy of source code changes ...... 39
3.5.2 Experiment details ...... 41
3.5.3 Results ...... 42
3.6 Threats to Validity ...... 43
3.7 Conclusion ...... 43
4 A Simple, Efficient, Context Sensitive Approach for Code Completion 45
4.1 Introduction ...... 45
4.2 Related Work ...... 48
4.3 Proposed Algorithm ...... 50
4.3.1 Collect usage context of method calls ...... 51
4.3.2 Determine candidates for code completion ...... 53
4.3.3 Recommend top-3 method calls ...... 55
4.4 Evaluation ...... 55
4.4.1 Test systems ...... 56
4.4.2 Evaluation results ...... 56
4.4.3 Evaluation using a taxonomy of method calls ...... 58
4.4.4 Comparison with Code Recommenders ...... 63
4.4.5 Runtime performance ...... 63
4.5 Discussion ...... 64
4.5.1 Why does CSCC consider four lines as context? ...... 64
4.5.2 Impact of different context information ...... 65
4.5.3 Effectiveness of the technique in cross-project prediction ...... 66
4.5.4 Performance of CSCC for other frameworks or libraries ...... 67
4.5.5 Using bottom lines for building context ...... 67
4.6 Extending CSCC for Field Completion ...... 69
4.6.1 Changes made ...... 71
4.6.2 Evaluation procedure ...... 71
4.6.3 Evaluation results ...... 73
4.6.4 Training with method calls and field accesses together ...... 73
4.7 Comparison With Statistical Language Model-based Code Completion Techniques ...... 75
4.7.1 Statistical language models ...... 75
4.7.2 Evaluation procedure ...... 76
4.7.3 Evaluation results ...... 77
4.7.4 How good is CSCC at recommending locally repetitive method calls or field accesses? ...... 78
4.8 Threats to Validity ...... 78
4.9 Conclusion ...... 79
5 Exploring API Method Parameter Recommendations 81
5.1 Introduction ...... 81
5.2 Related Work ...... 83
5.3 Preliminaries ...... 84
5.3.1 Parameter completion query ...... 84
5.3.2 Parameter expression types ...... 86
5.3.3 Distribution of parameter expressions ...... 86
5.4 RQ1: Is Source Code Locally Specific To Method Parameters? ...... 86
5.5 RQ2: How Can We Capture the Localness Property to Recommend Method Parameters? ...... 87
5.5.1 Building the parameter usage database ...... 88
5.5.2 Collect features from a query ...... 89
5.5.3 Determine feature similarity ...... 89
5.5.4 Static analysis and recommendation ...... 90
5.6 RQ3: Does the Technique Compare Well with Precise and JDT? ...... 91
5.6.1 Evaluation results ...... 91
5.6.2 Exploring parameter recommendations of Eclipse JDT ...... 93
5.7 Discussion ...... 97
5.7.1 Why did Precise not perform well? ...... 97
5.7.2 Runtime performance ...... 97
5.7.3 Cross-project prediction ...... 97
5.8 Threats to Validity ...... 98
5.9 Conclusion ...... 99
6 Recommending Framework Extension Examples 100
6.1 Introduction ...... 100
6.2 Related Work ...... 102
6.3 Taxonomy of Extension Patterns ...... 105
6.3.1 Simple ...... 105
6.3.2 Customize ...... 105
6.3.3 Extend ...... 105
6.3.4 Implement ...... 105
6.4 Technical Description ...... 106
6.4.1 Miner ...... 106
6.4.2 Recommender ...... 110
6.5 Evaluation ...... 111
6.5.1 Accuracy of FEMIR recommendation ...... 111
6.5.2 Examples of framework extension patterns ...... 115
6.5.3 Distribution of extension patterns by categories ...... 116
6.5.4 Distribution of extension points by classes ...... 117
6.5.5 How often are multiple extension points used together? ...... 118
6.6 Discussion ...... 118
6.6.1 Detecting examples of extension patterns by categories ...... 119
6.6.2 Quality of canonical forms ...... 119
6.6.3 Effect of the threshold value ...... 119
6.6.4 Runtime performance of FEMIR ...... 120
6.7 Threats to Validity ...... 120
6.8 Conclusion ...... 121
7 Conclusion 122
7.1 Summary ...... 122
7.2 Future Research Directions ...... 124
7.2.1 Analyze code completion systems using IDE usage data ...... 124
7.2.2 Bringing natural language processing to code completion ...... 124
7.2.3 Going beyond code completion ...... 125
7.2.4 Locating placeholder methods ...... 125
7.2.5 Supporting untyped/weakly typed languages ...... 126
References 127
List of Tables
3.1 Three forms of incorrect mappings ...... 33
3.2 Running location tracking techniques against Reiss, Eclipse, and NetBeans benchmarks ...... 34
3.3 Results of LHDiff before and after applying tokenization ...... 38
3.4 Editing Taxonomy of Source Code Changes ...... 40
3.5 Results of mutation based evaluation ...... 42
4.1 Evaluation results of code completion systems. Delta shows the improvement of CSCC over BMN ...... 57
4.2 Categorization of method calls for different AST node types for receiver expressions (for the top-3 proposals) ...... 59
4.3 Correctly predicted method calls where the parent expression expects a particular type of value (top-3 proposals for the Eclipse system) ...... 61
4.4 Percentage of correctly predicted method calls that are called in the overridden methods (for the top-3 proposals) ...... 61
4.5 Comparing performance (percentage of correctly predicted method calls) of CSCC at two different settings. The default setting does not explicitly include method name to both contexts, but the second setting does. Here, we only test those method calls that are located in overridden methods ...... 61
4.6 Percentage of correctly predicted method calls that are called in the overridden methods (for the top-3 proposals) ...... 62
4.7 Runtime performance of CSCC generating a database of 40,863 method calls (column 2) and performing 4,540 code completions (column 3) for the Eclipse system ...... 63
4.8 Sensitivity of performance for different context information ...... 65
4.9 Cross-project prediction results. P, R, and F refer to precision, recall, and F-measure, respectively ...... 66
4.10 Evaluation results of code completion systems for the java.io method calls ...... 68
4.11 Evaluation results of code completion systems for the java.util API method calls ...... 68
4.12 The number of correctly predicted method calls using two different forms of usage context ...... 69
4.13 Evaluation results of code completion systems for field accesses ...... 72
4.14 The accuracy (in percentages) of correctly predicted field accesses for two different settings. For setting A, we train code completion systems using both fields and methods. For setting B, we only use fields for training. Both settings use fields for testing ...... 73
4.15 Comparing CSCC with four statistical language model-based techniques (N-gram, Cache LM, CACHECA, and GraLan). Since CACHECA merges results of ECCRel with those of Cache LM, we also include ECCRel in the comparison ...... 76
4.16 Comparison of CSCC with Cache LM considering locally repetitive method calls and field accesses ...... 78
5.1 Examples of parameter expression types (highlighted in bold) and their distribution in three different subject systems ...... 85
5.2 Evaluation results of parameter recommendation techniques for all eleven parameter expression types PARC can detect ...... 92
5.3 Evaluation results of parameter recommendation techniques using only those parameter expression types that are supported by Precise ...... 92
5.4 Accuracy of correctly predicted parameters for different expression types for the top ten recommendations ...... 94
5.5 The percentage of correctly predicted method parameters for the simple name expression category ...... 96
5.6 Cross-project prediction results of PARC under two different settings ...... 98
6.1 Comparison of this work (FEMIR) with prior research on framework extension points (Y: well-addressed, P: partially addressed) ...... 103
6.2 Summary of framework extension dataset used in this study ...... 111
6.3 Evaluation results of recommending framework extension graphs (RMC: Receiver method call, AMC: Argument method call, E: Class extension, I: Interface implementation, FMC: Framework method call, O: Overridden method, Other: Other node types) ...... 113
6.4 The percentage of cases where different numbers of framework extension points are used together ...... 118
6.5 Evaluation results of FEMIR for each extension pattern category ...... 118
List of Figures
2.1 Five different components (trigger, algorithm, code context, presentation, and target) of a code completion system ...... 8
2.2 An example of integrating the BMN code completion system in the Eclipse IDE. The likelihood of calling each method is also reported to users (collected from Bruch et al. [18]) ...... 10
2.3 Examples of recommending method calls by BCC. The technique can sort completion proposals by popularity (1). The alternative approach (2) is to sort completion proposals based on the position of declaration in the type hierarchy followed by alphabetically within each type (collected from Hou and Pletcher [47]) ...... 12
2.4 An example of recommending method calls by DroidAssist (collected from Nguyen et al. [101]) ...... 14
2.5 An example of recommending method parameters. The code completion system recommends method calls (A). The developer selects a method call and the system automatically recommends the possible completions of the first parameter (B). The developer can cycle through the completion proposals of other parameters also (C) ...... 14
2.6 An example of completing a method body (collected from Hill and Rideout [44]). The technique sends the partially completed method body as the query to the server (A). The server returns a method that best matches with the query and the technique updates the method body (B) ...... 15
2.7 GraPacc recommends completion proposals based on API usage patterns (collected from Nguyen et al. [92]) ...... 18
2.8 Examples of keyword-based code completion. The top figure (A) shows an example of code completion that translates input words into a valid expression (collected from Little and Miller [77]). The bottom figure (B) shows an example of abbreviated code completion (collected from Han et al. [40]) ...... 21
2.9 Completing a regular expression using the palette (collected from Omar et al. [102]) ...... 22
2.10 Examples of using API Explorer. It can recommend the send method call even if the method call is not accessible from the current receiver object (A). API Explorer includes method calls from other related classes (B). It also allows construction of objects (C) (collected from Duala-Ekoka and Robillard [30]) ...... 23
2.11 Calcite supports construction of objects. Figure 11-A shows the suggested object construction proposals and 11-B shows the output after selecting a completion proposal. The tool also supports recommending placeholder methods. Figures 11-C and 11-D show the output before and after selecting a placeholder method proposal (collected from Mooty et al. [87]) ...... 24
2.12 The code completion system recommends previously deleted field variables, highlighted in gray color (collected from Lee et al. [74]) ...... 25
3.1 Summarizing the line mapping process in LHDiff ...... 29
3.2 An example of calculating Hamming distance ...... 31
3.3 An example of a line splitting ...... 32
3.4 Example of tokenization ...... 39
4.1 An example of reading a file where readLine is the target of code completion ...... 46
4.2 An overview of CSCC's recommendation process starting from a code completion request ...... 51
4.3 Overall context and line context for getDisplay ...... 52
4.4 Database of method call contexts are grouped by receiver types using an inverted index structure. The figure on the left side shows the collected context information for an API method call. The figure on the right side shows how we use the information to build an inverted index structure ...... 53
4.5 Context similarities are measured using the Hamming distance between simhash values ...... 54
4.6 Taxonomy of method calls ...... 58
4.7 Indexing method calls by enclosing method name can lead to the wrong recommendation ...... 62
4.8 The number of correct predictions at different context size ...... 64
4.9 An example of top and bottom overall context. When we use both top and bottom context, we concatenated the terms appearing in the bottom context to the terms appearing in the top context ...... 69
4.10 Examples of field accesses (highlighted in bold) ...... 70
4.11 Distribution of test cases in different candidate groups. We group the test cases into four categories based on the number of completion candidates. The X-axis positions four different groups and the Y-axis value refers to the percentage of test cases in a group ...... 74
5.1 An example of a parameter completion query ...... 84
5.2 Mean entropy distribution for the top ten tokens when we group code examples based on receiver type, method name and parameter position ...... 87
5.3 An example of a parameter usage instance ...... 89
5.4 When recommending a method parameter, Eclipse JDT puts the local variables first. The field variables are positioned after the local variables and method parameters. Thus, in this case JDT will place the variable toolBarBox in the last position of the completion popup. However, instead of the declaration point, considering the initialization or most recent assignment point prior to calling the target method can help us to place the variable in the top position ...... 95
5.5 An example of parameter recommendations made by the Eclipse JDT for the topBox.add() method. The actual parameter toolbarBox is placed in the sixth position by JDT. When there are more type compatible local variables, JDT would place the toolbarBox parameter in a lower position. Our proposed technique places toolbarBox in the top position ...... 96
6.1 Examples of framework extension points (Dimension and TreeCellRenderer) and extension patterns ...... 101
6.2 Working process of the graph miner of FEMIR ...... 106
6.3 Overview of the extension patterns recommender of FEMIR ...... 106
6.4 Framework extension graph for the example in Fig. 6.1(C) ...... 108
6.5 Canonical form of the graph shown in Fig. 6.4. Each node is represented by an index, an out degree and a list of neighbor nodes, separated by hyphens. Each neighbor node is represented by an edge label (e.g., inside call) and its index. Nodes are sorted by index and separated by colons ...... 109
6.6 One extension pattern recommended for Figure 6.1(C), where missing nodes are shown in bold and incorrect nodes are shown in dotted rectangles ...... 112
6.7 Example framework extension patterns mined by FEMIR: A customize extension pattern for the RowSorter extension point (A), a simple pattern for Icon (B), and two extend extension patterns for IBaseLabelProvider (C) and IContentProvider (D), respectively ...... 115
6.8 Distribution of extension patterns by categories ...... 117
6.9 Distribution of extension points across classes (Swing framework) ...... 117
6.10 Usage frequencies of Swing framework classes with different numbers of extension points ...... 117
6.11 Precision, recall, and F-measure at different threshold values ...... 119
List of Abbreviations
API Application Programming Interface
AST Abstract Syntax Tree
BCC Better Code Completion
BMN Best Matching Neighbors
CALCITE Construction and Language Completion Integrated Throughout Eclipse
CSCC Context Sensitive Code Completion
DVM Dalvik Virtual Machine
ECC Eclipse Code Completion
FCC Frequency-based Code Completion
FEMIR Framework Extension Miner and Recommender
GraLan Graph-based Language Model for Source Code
GraPacc Graph-based Pattern Oriented Context Sensitive Code Completion
HAPI Hidden Markov Model for API Usages
HMM Hidden Markov Model
IDE Integrated Development Environment
JADEITE Java API documentation with Extra Information Tacked-on for Emphasis
JVM Java Virtual Machine
LHDiff Language-independent Hybrid Diff
PARC Parameter Recommender
PBN Pattern Based Bayesian Networks
SLAMC Statistical Language Model for Source Code
Chapter 1
Introduction
This chapter provides a short introduction to the thesis. After providing a motivation to this thesis in Section 1.1, research problems are discussed in Section 1.2. Section 1.3 discusses the process of discovering research problems and their solutions. The contributions of this thesis are described in Section 1.4. Finally, Section 1.5 provides an outline of the remaining chapters.
1.1 Motivation
Developers rely on frameworks and libraries of APIs (Application Programming Interfaces) to ease application development. While these APIs provide ready-made solutions to many complex problems, developers need to learn to use them effectively. The problem is that, due to the large volume of APIs, it is practically impossible to learn and remember them completely. To avoid developers having to remember every detail, modern integrated development environments (IDEs) provide a feature called Code Completion, which displays a sorted list of completion proposals in a popup menu for a developer to navigate and select. For example, when a developer types a dot (.) after a receiver name to complete a method call, Eclipse1 offers a list of method names as completion proposals through a popup menu. When the developer selects a proposal by pressing the enter key, Eclipse automatically completes the method call. Omari et al. found that code completion operations are repetitive in nature [104]. In a study on the usage of the Eclipse IDE, Murphy et al. found content assist to be one of the top ten commands used by developers [90]. Surprisingly, the frequency of execution of this command was similar to that of other common editing commands (such as delete, save, and paste). The content assist command offers the code completion feature in the Eclipse IDE. Thus, previous research indicates that code completion is crucial for today's development, and it has become an integral part of almost all modern development environments. We can therefore leverage the code completion feature to help developers learn and use those APIs. Existing code completion systems are far from perfect. This thesis advances the state of the art in code completion research in terms of context formulation and learning APIs.
1https://eclipse.org/
1.2 Research Problem
An investigation of the code completion literature has been conducted to identify both limitations of existing completion systems and opportunities to improve them. The following research problems have been identified:
• First, existing code completion systems already consider the structure of source code. To improve their performance, additional sources of information need to be collected. Furthermore, a number of method call completion systems have been proposed in the literature that use different context information for recommending completion proposals. However, there is a lack of study on the performance implications of this context information.
• Second, existing method parameter completion systems are far from perfect. For example, consider the case of a parameter completion technique called Precise [144]. The technique does not support the completion of a number of parameter expression types: Precise excludes simple names, boolean variables, and null literals from its recommendations, and it also fails to detect parameters of the class instance creation type. However, a considerable number of parameter expressions fall into these categories. By integrating additional sources of information and different recommendation strategies, a system can possibly support the completion of a much larger number of parameter expression types.
• Third, the code completion interface can be utilized to push notifications that help developers learn various aspects of APIs as they write code. For example, code completion interfaces can help developers learn the possible choices for customizing the behavior of framework-related objects. However, there is a considerable lack of study in this direction.
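To make the parameter expression categories in the second problem above concrete, the following sketch shows hypothetical Java call sites whose arguments fall into the categories the text names (simple name, boolean variable, null literal, class instance creation, plus a string literal for contrast). The `describe` helper and all names are illustrative only; they are not part of Precise, PARC, or any real API.

```java
import java.util.ArrayList;
import java.util.List;

public class ParameterExpressionKinds {
    // Hypothetical helper used only for this illustration: it reports the
    // runtime type of an argument, or "null" for a null literal.
    public static String describe(Object o) {
        return o == null ? "null" : o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        String fileName = "config.xml";   // a local variable
        boolean verbose = true;           // a boolean variable

        List<String> kinds = new ArrayList<>();
        kinds.add(describe("literal"));            // string literal argument
        kinds.add(describe(fileName));             // simple name argument
        kinds.add(describe(verbose));              // boolean variable argument (autoboxed)
        kinds.add(describe(null));                 // null literal argument
        kinds.add(describe(new StringBuilder()));  // class instance creation argument
        System.out.println(kinds); // prints [String, String, Boolean, null, StringBuilder]
    }
}
```

The last three argument kinds are exactly the ones the text says Precise does not recommend, which motivates a technique covering more expression types.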
1.3 Addressing Research Problems
Since existing method call completion systems have already taken advantage of source code structure, there exists an opportunity to improve their accuracy by collecting additional information in the form of completion context. Careful observation of source code examples reveals that code elements that are related to each other are typically located within a few lines of each other. This localness property of source code can be utilized to capture the code completion context and improve the performance of existing method call completion systems. Furthermore, context-sensitive method completion requires matching a query context with the contexts of method calls collected from previous code examples [18]. A previous study on code clone detection found that the simhash technique is effective for quickly determining similarity [138]. However, the benefit of the localness property in capturing code context and of the simhash technique in context matching first needed to be validated. Thus, this thesis begins with a case study on tracking source code lines, owing to the simplicity of developing such a technique. Results from that study show that it is possible to capture the source code context of a line by
considering tokens within the top-4 lines. Furthermore, it is also possible to accelerate line matching by using the simhash technique. A similar context and the simhash technique are used to develop a method call completion technique called CSCC (Context Sensitive Code Completion). CSCC is context-sensitive in that it uses new sources of information as the context of a target method call, together with lightweight source code analysis, to better recommend API method calls. Evaluation against state-of-the-art code completion techniques suggests that CSCC has high potential. It is also investigated how the different contextual elements of the target call benefit CSCC. Existing method call completion techniques focus on completing method calls but not their parameters. However, a number of previous studies suggested that completing method parameters is also a non-trivial task [106, 107]. Thus, the third study of this thesis focused on the problem of completing method parameters. Investigation of existing code completion systems revealed that only a subset of parameter expression types is supported by existing studies. An experiment was conducted to better understand how developers complete method parameters, to learn the parameter expression types, and to discover any patterns of method parameter completion. Results from the study indicated that method parameter completion is also locally specific. Based on observations from that study, a recommendation technique called PARC was developed. The technique collects parameter usage context using the source code localness property, which suggests that developers tend to collocate related code fragments. PARC uses previous code examples together with contextual and static type analysis to recommend method parameters.
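The context matching described above, fingerprinting a bag of context tokens with simhash and comparing fingerprints by Hamming distance, can be sketched as follows. This is a minimal, generic simhash sketch: the token mixing function and the example token lists are illustrative assumptions, not the actual hashing or context extraction used by CSCC or LHDiff.

```java
import java.util.Arrays;
import java.util.List;

public class SimhashContext {
    // 64-bit simhash over a bag of context tokens: each token votes on every
    // bit position; the fingerprint keeps the bits with positive vote sums.
    public static long simhash(List<String> tokens) {
        int[] votes = new int[64];
        for (String t : tokens) {
            long h = mix(t);
            for (int i = 0; i < 64; i++)
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
        }
        long fp = 0L;
        for (int i = 0; i < 64; i++)
            if (votes[i] > 0) fp |= 1L << i;
        return fp;
    }

    // Illustrative 64-bit string mixer (not the hash any of the tools use).
    private static long mix(String s) {
        long h = 1125899906842597L;
        for (int i = 0; i < s.length(); i++) h = 31 * h + s.charAt(i);
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL; h ^= h >>> 33;
        return h;
    }

    // Similarity is the Hamming distance between fingerprints (smaller = closer).
    public static int hamming(long a, long b) { return Long.bitCount(a ^ b); }

    public static void main(String[] args) {
        List<String> query     = Arrays.asList("BufferedReader", "in", "new", "FileReader", "readLine");
        List<String> similar   = Arrays.asList("BufferedReader", "reader", "new", "FileReader", "readLine");
        List<String> unrelated = Arrays.asList("StringBuilder", "sb", "append", "length", "toString");
        long q = simhash(query);
        // Contexts sharing most tokens tend to yield nearby fingerprints.
        System.out.println("distance to similar context:   " + hamming(q, simhash(similar)));
        System.out.println("distance to unrelated context: " + hamming(q, simhash(unrelated)));
    }
}
```

Because the distance is a single `XOR` plus a population count, comparing a query against thousands of stored contexts stays cheap, which is the property the line-tracking case study set out to confirm.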
Evaluation of the technique against the only available state-of-the-art tool, using a number of subject systems and different Java libraries, shows that PARC has high potential. During the development of PARC, it was observed that developers often pass a framework-related object as an argument to an API method call to customize framework behavior. The formal parameter of such a method is called an extension point. The object can be created by subclassing an existing framework class, implementing a framework interface, or directly customizing an existing framework object. Doing this effectively, however, requires developers to have extensive knowledge of the framework's extension points and their interactions. To help developers in this regard, this thesis develops a technique that utilizes partial program analysis and graph mining to detect, group, and rank framework extension examples. The tool extends existing code completion infrastructure to inform developers about customization choices, enabling them to browse the extension points of a framework and frequent usages of each point in the form of code examples.
1.4 Contributions of the Thesis
The contributions of this thesis are as follows:
• First, a language-independent hybrid line location tracking technique [11, 12]. The technique was developed as part of the case study to capture the source code localness property and to determine the effectiveness of the simhash technique. Both have been used in developing the code completion systems.
• Second, a context-sensitive method call completion technique, called CSCC [6, 7, 8]. The technique has been evaluated against state-of-the-art method completion techniques.
• Third, a method parameter completion technique, called PARC, that supports the largest number of parameter expression types [4, 5].
• Fourth, a technique that uses the code completion menu to help developers to learn framework extension examples [9, 10].
1.5 Outline of the Thesis
This chapter (Chapter 1, Introduction) introduces the research problems and presents a short description of the research conducted to address them. The remaining chapters of this thesis are organized as follows.
• Chapter 2 describes the background topics.
• Chapter 3 provides a detailed description of the first study (i.e., Study 1) on tracking source code lines to understand capturing code context and to realize the benefits of using the simhash technique. The technique has been published in the Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM 2013), a premier international software engineering conference [11]. An implementation of the technique has been published in the tool demo track of the same conference [12].
• Chapter 4 provides a detailed description of the second study (i.e., Study 2) on improving code completion support for API method calls. This work has been published in the Proceedings of the 30th IEEE International Conference on Software Maintenance and Evolution (ICSME 2014, formerly known as ICSM) [7]. The technique has been implemented as an Eclipse plugin and has been published in the tool demo track of the same conference [6]. An extended version of the paper has also been published in the Journal of Software: Evolution and Process (Volume 28, Issue 7, July 2016) [8].
• Chapter 5 discusses the third study (i.e., Study 3) on how developers use API method parameters, including our proposed technique and evaluation results. A paper describing the technique has been published in the Proceedings of the 31st IEEE International Conference on Software Maintenance and Evolution (ICSME 2015) [4]. The technique has been implemented as an Eclipse plugin, called PARC. A description of the tool has also been published in the tool demo track of the same conference [5].
• Chapter 6 presents the fourth study (i.e., Study 4) on recommending framework extension examples. A paper describing the technique has been published in the Proceedings of the 33rd IEEE International Conference on Software Maintenance and Evolution (ICSME 2017) [10]. An implementation of the technique has also been published in the tool demo track of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), a leading conference in the area of software engineering [9].
• Chapter 7 concludes the thesis by summarizing research contributions and outlining future research directions.
Chapter 2 Background and Taxonomy of Related Work
This chapter introduces the code completion feature, an integral part of modern IDEs. It explains the benefits, requirements, and components of code completion systems. Existing code completion systems are divided into a number of categories, and the chapter briefly describes each of them. All of these build the foundation for this thesis.
2.1 Benefits of Code Completion
To identify the benefits of code completion, a number of research papers, blogs, and tutorials were reviewed. The following highlights the major benefits of code completion.
1. Avoid remembering every detail: Because of the large number of APIs, it is difficult for developers to remember every detail; even experienced developers suffer from this problem. Consider the situation where a developer wants to compare two strings in Java, ignoring case. The equalsIgnoreCase method of the String class is designed to accomplish this task, but the developer may forget the name of the method to use. However, the method is popular compared with the other methods of the String class. When she types the dot (.) after a String variable, a code completion system that sorts method names by popularity can place the target method in the top few positions, and the developer can easily identify it.
2. Help developers to write error-free code: Code completion systems offer only those completion proposals that are syntactically correct in the current development context. For example, when a developer requests code completion on a JFrame object of the javax.swing framework, only those methods that are accessible from the object are offered as completion proposals. This helps ensure that developers do not write uncompilable code.
3. Speed up typing: To help developers understand the code, descriptive names need to be used for classes, methods, and field variables. Typing these names can be cumbersome. However, code completion systems can automatically complete these names based on the developer's selection and thus speed up typing. Many code completion systems go beyond the line level and help developers automatically complete multiple lines of code [44].
4. Offer convenient documentation: When a code completion system offers a list of completion proposals, developers can browse through the proposals, and the completion system shows additional information about the currently selected proposal without requiring a context switch. This enables developers to learn about completion proposals, and code completion thus serves the purpose of documentation. For example, consider the situation where a developer is working with a MimeMessage object of the JavaMail API. She does not know which method to call and thus uses the code completion feature to access the list of methods that can be called on the MimeMessage object. As she traverses the list, she learns about the various methods of the MimeMessage class. The code completion system also shows the documentation of each method in a separate window on the fly.
5. Code reuse: Developers often reuse code through copy-paste programming practices to avoid implementing everything from scratch. This leads to duplicate code fragments, also known as code clones. Results from code clone research suggest that software systems contain a significant amount of cloned code, typically in the range of 15–25% [120] and sometimes up to 50% [112]. However, reusing code in this way requires developers to actively locate code fragments that can be reused. Code completion provides a mechanism to automatically suggest reusable code fragments and thus supports code reuse. For example, Hill and Rideout developed a technique that can suggest a complete method body based on the initial, incomplete implementation of that method [44].
2.2 Requirements of Code Completion Systems
After a careful review of the existing literature and existing code completion tools, the following requirements of a code completion system have been identified.
1. Short response time: One of the most important requirements of a code completion system is the ability to produce completion proposals in real time. This ensures that developers do not need to wait a considerable time to get suggestions from the completion system. Although the actual response time differs across completion systems, most existing research targets sub-second response time. The default activation delay for content assist in the Eclipse IDE is 100 ms. This can be considered an ideal response time because such a delay is unlikely to be noticeable.
2. Cue: Code completion systems should provide additional information, also called cues, to developers where possible to help them decide the applicability of a completion proposal at the current editing location. Cues can be provided in textual or visual form. For example, Eclipse offers the method signature and the documentation as a developer traverses the list of method completion proposals. This additional information helps developers pick the correct proposal even when the completion system fails to place it in the top few recommendations. All of this information should be accessible to users without introducing any context switching.
Figure 2.1: Five different components (trigger, algorithm, code context, presentation, and target) of a code completion system
3. High success rate: If a code completion system does not recommend useful suggestions in the majority of cases, developers will not be interested in using the tool. Instead, they will consider the tool a burden to development. Thus, a high success rate is a prerequisite for the survival of a code completion system.
4. Non-interruptive: This requirement ensures that the completion systems do not interrupt the natural flow of developers.
5. Auto-activation: Code completion systems should be able to automatically capture the code context and recommend completion proposals. This auto-activation feature ensures that developers do not need to spend time or effort to obtain suggestions. The dot character is typically part of the source code, which is why it is frequently used as the trigger character to activate code completion systems.
2.3 Components of a Code Completion System
To understand what constitutes a code completion system, the existing literature was carefully reviewed. The code completion features of various integrated development environments were also investigated, including but not limited to Eclipse,1 Microsoft Visual Studio,2 Atom,3 NetBeans,4 and PyCharm.5 To facilitate the discussion, the code completion feature of the Eclipse IDE is used as the running example. Based on this analysis, five different components of a code completion system are identified. Figure 2.1 depicts all five components, and the following briefly describes each of them.
1. Trigger: A trigger represents an action that activates a code completion system. For example, when a developer types a dot (.) after a receiver object in the Eclipse IDE, the completion system activates and presents a list of completion proposals (such as method names or field variables). Thus, pressing the dot (.) after a receiver variable acts as the trigger in this case. This is also an example of an implicit trigger, because the dot (.) is part of the code and users do not need to tell the IDE explicitly to activate the code completion system. An example of an explicit trigger is pressing Ctrl+Space to query the code completion system for completion proposals.
2. Algorithm: This represents the procedure for computing completion proposals. The Eclipse IDE contains two default code completion systems, ECCAlpha and ECCRelevance. For recommending method calls, both techniques leverage the static type system to collect all methods that can be called on the receiver object. ECCAlpha sorts the completion proposals in alphabetical order and presents them to developers. In contrast, ECCRelevance uses a positive integer value, called relevance, to sort them. The value is calculated based on the expected type of the expression as well as the types in the code context (such as return types, cast types, variable types, etc.). In both cases, the process of computing completion proposals constitutes the algorithm. Code completion systems often build on a data mining or machine learning technique that forms the core of computing completion proposals. In that case, the mining or learning technique itself is referred to as the algorithm, although additional steps may be associated with the process. For example, if a code completion system uses association rule mining to compute completion proposals, the association rule mining technique is referred to as the algorithm.
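The two default sorting strategies can be sketched as follows. This is an illustrative sketch, not Eclipse's actual JDT implementation; the Proposal class and the relevance values shown are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Illustrative sketch of the two default Eclipse sorting strategies.
// The Proposal class and relevance values are hypothetical, not JDT internals.
public class ProposalSorting {
    static class Proposal {
        final String name;
        final int relevance; // higher = better match for the expected type
        Proposal(String name, int relevance) { this.name = name; this.relevance = relevance; }
    }

    // ECCAlpha: plain alphabetical order.
    static List<String> sortAlphabetically(List<Proposal> proposals) {
        return proposals.stream()
                .sorted(Comparator.comparing(p -> p.name))
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    // ECCRelevance: descending relevance, alphabetical as a tie-breaker.
    static List<String> sortByRelevance(List<Proposal> proposals) {
        return proposals.stream()
                .sorted(Comparator.<Proposal>comparingInt(p -> -p.relevance)
                        .thenComparing(p -> p.name))
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Proposal> proposals = Arrays.asList(
                new Proposal("toString", 10),
                new Proposal("equals", 40),
                new Proposal("equalsIgnoreCase", 60));
        System.out.println(sortAlphabetically(proposals)); // [equals, equalsIgnoreCase, toString]
        System.out.println(sortByRelevance(proposals));    // [equalsIgnoreCase, equals, toString]
    }
}
```

The key design difference is visible directly: ECCAlpha ignores the editing context entirely, while ECCRelevance folds context information into a single integer before sorting.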
3. Completion Context: Completion context refers to the information required to compute completion proposals, also known as the input. For example, the completion context of ECCAlpha is the receiver type. However, ECCRelevance uses both the receiver type and the relevance value computed from
1https://eclipse.org 2https://www.visualstudio.com 3https://atom.io 4https://netbeans.org 5https://www.jetbrains.com/pycharm
the expression as completion context. BMN (Best Matching Neighbors) is a code completion system that collects the receiver type and the list of methods called on the receiver object prior to the editing location as the completion context [18].
4. Presentation: This refers to how completion proposals are presented to developers. The predominant approach is to present completion proposals as a list of items in a popup menu. When a developer selects a menu item, further information can be provided through additional popup menus or windows. For example, when a developer selects a method name from a completion popup menu, Eclipse shows the API documentation of the selected method. It is also possible to present additional information that can be linked through the code completion feature.
5. Target: This indicates what is recommended by a code completion system. Depending on the objective of the code completion system, the target can be method calls [109], method parameters [144], or even method bodies [53].
2.4 Techniques
This section summarizes various code completion systems targeting different API elements. It also describes completion systems that work at the lexical level but can be adapted to complete API elements; their suggestions may contain API elements, or they may help developers understand different aspects of APIs (such as API changes). Towards this goal, existing code completion techniques are divided into the following categories.
2.4.1 Method call completion
Code completion systems in this category focus on completing method calls. A number of such systems have been proposed in the literature, mostly because method calls are frequent in source code.
Figure 2.2: An example of integrating the BMN code completion system in the Eclipse IDE. The likelihood of calling each method is also reported to users (collected from Bruch et al. [18]).
Although the target is the same, the context and the algorithm used for recommending completion proposals vary from one technique to another. Bruch et al. [18] propose three different code completion techniques to suggest API method calls. The frequency-based code completion technique determines the frequency of method calls in the example code; the more frequent a method call, the higher its position in the recommended list of completion proposals. The association rule based code completion system applies association rule mining to suggest method calls. Finally, the BMN code completion system uses a k-nearest-neighbors machine learning algorithm to recommend API method calls for a particular receiver object (see Figure 2.2). The technique determines the context of each local variable of a framework type: the declared type of the variable, the names of the methods called on the variable, and the name of the enclosing method that contains the local variable.
The above information is encoded in a binary feature vector. BMN generates a set of feature vectors, represented as a two-dimensional usage matrix, using all the variables extracted from the code base, which acts as the training data. Given a variable v of a framework type, the technique represents the current programming context as a feature vector. It then computes the Hamming distance between the feature vector of v and the feature vectors in the code base, and selects the variables with the lowest distance. The next step is to synthesize recommendations: the technique suggests method calls that are missing on v but appear on the selected nearest-neighbor variables in the code base, ranked by their frequency. Results from the empirical study reveal that the BMN system outperforms the other two techniques. The BMN system does not consider the order of method calls. However, an implicit ordering of API method calls exists while implementing an API usage pattern. Considering the order of method calls and better context information could possibly improve the performance of the technique.
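The core of BMN-style matching can be sketched as follows. The feature names, the example data, and the single-nearest-neighbor simplification are illustrative assumptions; the actual encoding and k used by Bruch et al. differ in detail.

```java
import java.util.*;

// A minimal sketch of BMN-style matching over binary feature vectors.
// Feature names and data are illustrative, not the encoding of Bruch et al.
public class BmnSketch {
    // Features: presence of context elements (enclosing method, calls already made).
    static final String[] FEATURES = {"ctx:onCreate", "call:init", "call:setText", "call:show"};

    // Encode an observed context as a binary feature vector.
    static boolean[] vector(Set<String> observed) {
        boolean[] v = new boolean[FEATURES.length];
        for (int i = 0; i < FEATURES.length; i++) v[i] = observed.contains(FEATURES[i]);
        return v;
    }

    // Hamming distance: number of positions where the vectors disagree.
    static int hamming(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i]) d++;
        return d;
    }

    // Synthesize recommendations: calls present in the nearest example
    // but missing in the query context.
    static List<String> recommend(boolean[] query, List<boolean[]> examples) {
        boolean[] nearest = Collections.min(examples,
                Comparator.comparingInt(e -> hamming(query, e)));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < FEATURES.length; i++)
            if (nearest[i] && !query[i] && FEATURES[i].startsWith("call:"))
                out.add(FEATURES[i].substring(5));
        return out;
    }

    public static void main(String[] args) {
        List<boolean[]> examples = Arrays.asList(
                vector(new HashSet<>(Arrays.asList("ctx:onCreate", "call:init", "call:show"))),
                vector(new HashSet<>(Arrays.asList("call:setText"))));
        boolean[] query = vector(new HashSet<>(Arrays.asList("ctx:onCreate", "call:init")));
        System.out.println(recommend(query, examples)); // [show]
    }
}
```

The sketch uses only the single nearest neighbor; BMN aggregates over the k nearest variables and ranks the synthesized calls by frequency.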
Proksch et al. [109] develop a new algorithm, called Pattern Based Bayesian Networks (PBN), that improves on the BMN algorithm by using a Bayesian network and additional context information. BMN considers the type of the receiver variable, receiver call sites, and the enclosing method definition. In addition to these, PBN considers three further sources of information: parameter call sites, class context, and definition types. Evaluation results reveal that the technique improves prediction quality by 3% at the cost of increased model size. PBN can also use a clustering technique to reduce the model size significantly with a slight decrease in performance.
Hou and Pletcher [47, 48] propose a code completion technique that uses a combination of sorting, filtering, and grouping of APIs. They implement the technique in a research prototype called Better Code Completion (BCC) [105]. BCC can sort completion proposals based on the type hierarchy or the frequency count of method calls (see Figure 2.3). The technique supports user-specified filtering of method calls. This filtering is based on the observation that not all public methods are APIs, and certain methods are only designed to be called in a particular context and for particular receiver objects. BCC also enables developers to group related APIs, because alphabetical ordering separates logically related API methods. However, BCC does not leverage previous code examples. Moreover, BCC requires the filters to be specified manually, which can only be done by expert users of code libraries and frameworks.
Figure 2.3: Examples of recommending method calls by BCC. The technique can sort completion proposals by popularity (1). The alternative approach (2) is to sort completion proposals based on the position of declaration in the type hierarchy, and then alphabetically within each type (collected from Hou and Pletcher [47]).
Robbes and Lanza [113] propose eight different algorithms that consider varying sources of information for recommending method calls. These algorithms are evaluated using a data set consisting of the change history of several software systems. To collect the data set, the evolution of a software system is treated as a sequence of change operations rather than a set of versions. Each change operation can be executed to move a software system from one state to another. For object-oriented programming languages, the change operations alter the abstract syntax tree of the program; examples include creating a node, adding a node as the child of a parent node, changing the property of a node, and removing a node. They find that the type optimist completion algorithm, which merges the optimist algorithm with type information, provides the best result. In a different study, Robbes and Lanza extend the previous work by supporting completion of class names [115]. Here, a code completion system is evaluated each time a class reference is inserted in the code. The program change history can be considered the temporal context for a method call. While the previous techniques (such as BCC) do not consider the prefix of a method call as an input, Robbes and Lanza assume at least two prefix characters when evaluating their algorithms. Their techniques also require a change-based software repository for suggesting completion proposals; the problem is that such repositories are not yet common and are difficult to obtain.
Heinemann and Hummel [42] propose a technique that considers identifiers as the method call context to recommend API method calls. For each API method call, the technique collects identifiers that appear in a fixed number of non-comment, non-empty lines prior to the call of the target method. Identifiers consisting of multiple words are split based on the camel case convention in Java, and stemming is performed to reduce inflectional word forms (i.e., the endings added to words in their different grammatical forms). The set of identifiers constitutes the context of the method call and is encoded as a binary feature vector. The feature vectors of all API method calls form a two-dimensional matrix, similar to BMN. To recommend a method call, the technique generates a feature vector from the current code context. The nearest neighbors of the current feature vector are determined by comparing the Hamming distance with those feature vectors in the usage
matrix. The corresponding method calls are recommended after sorting them in ascending order of distance.
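The camel-case splitting step can be sketched as follows (stemming is omitted). The regular expression is one plausible implementation of the Java naming convention, not necessarily the one used by Heinemann and Hummel.

```java
import java.util.*;

// Sketch of identifier splitting for context extraction: camel-case words
// become lowercase terms. Stemming, performed by the original technique,
// is omitted here.
public class IdentifierContext {
    static List<String> splitCamelCase(String identifier) {
        List<String> words = new ArrayList<>();
        // Split at a lowercase-or-digit to uppercase boundary, e.g. "equalsIgnoreCase".
        for (String w : identifier.split("(?<=[a-z0-9])(?=[A-Z])")) {
            words.add(w.toLowerCase());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitCamelCase("equalsIgnoreCase")); // [equals, ignore, case]
        System.out.println(splitCamelCase("setText2"));         // [set, text2]
    }
}
```

The resulting terms from the lines preceding a call form the identifier set that is encoded into the binary feature vector described above.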
In a separate study, they evaluate the impact of several modifications to the original algorithm [41]. For example, they consider a fixed number of identifiers prior to an API method call as the method call context. The similarity between a query and the examples is determined by the Jaccard similarity of the terms that appear in their contexts. They also compare the technique with a structure-based approach, called Rascal [82]. Rascal uses a combination of collaborative and content-based filtering to recommend method calls. The content-based filtering component only considers the last method call to refine the suggestions made by the collaborative filtering. This limited context information likely contributes to the poor performance of the technique. Neither technique considers the type of the receiver variable as code context, which makes the recommendation task more difficult.
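The Jaccard similarity between a query context and an example context can be computed as follows; the identifier sets shown are illustrative.

```java
import java.util.*;

// Jaccard similarity of two identifier sets: |intersection| / |union|.
public class Jaccard {
    static double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0; // convention for two empty contexts
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> query = new HashSet<>(Arrays.asList("message", "send", "session"));
        Set<String> example = new HashSet<>(Arrays.asList("message", "send", "transport"));
        System.out.println(similarity(query, example)); // 0.5 (2 shared of 4 distinct terms)
    }
}
```

Unlike the Hamming distance over fixed-length vectors, the Jaccard measure operates directly on term sets and is unaffected by vocabulary terms absent from both contexts.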
APIRec is a method call completion technique that takes into account fine-grained changes in source code [91]. The technique is based on the observation that source code changes are repetitive in nature.
For example, when a developer adds a method call m1, she often also adds another method call m2; thus, one change in the source code can trigger another change (such as adding a method call). The technique mines previous versions of the source code to determine fine-grained co-occurring changes. When a developer requests a method call completion, the technique determines the fine-grained changes that took place in the same development session prior to the request. It also determines the tokens that appear within a fixed distance prior to the current editing location. The former is called the change context and the latter the code context. These two forms of context are matched with the context of method calls in previous code examples, and a combined similarity score is used to sort and recommend method calls.
Although the previous code completion techniques learn API usages from source code, this may be difficult when sufficient code examples are not available. The problem has been addressed in a work that develops a statistical approach, called SALAD, for learning APIs from Dalvik Virtual Machine (DVM) bytecode [100]. The technique has been implemented in a tool, called DroidAssist, that supports completion of API method calls [101] (see Figure 2.4). The tool learns API usages from the bytecode of Android mobile applications. It uses the Groum Extractor component to collect graph-based API usage models from the bytecode. The core of the tool is a statistical generative model, called HAPI (Hidden Markov Model for API Usages). HAPI uses a Hidden Markov Model to learn API usage patterns from sequences of method calls, which are first extracted from the Groums. DroidAssist also checks for suspicious calls in a method call sequence and recommends repairs for calls marked as suspicious. However, the authors did not compare the technique with various other popular API call completion systems, such as those not based on statistical language models.
Figure 2.4: An example of recommending method calls by DroidAssist (collected from Nguyen et al. [101]).
2.4.2 Method parameter completion
While a number of code completion techniques have been developed to support automatic completion of API method calls [18, 47, 48, 91, 113], they leave the task of completing method parameters to developers. Here, the term parameter is used to refer to the actual parameter (or argument), not the formal parameter. Determining the correct parameter for a method call is a non-trivial task and requires much attention [106, 107]. The only work that targets automatic completion of API method parameters is the study of Zhang et al. [144]. They propose a technique, called Precise, that mines existing code bases to generate a parameter usage database.
Figure 2.5: An example of recommending method parameters. The code completion system recommends method calls (A). The developer selects a method call and the system automatically recommends the possible completions of the first parameter (B). The developer can cycle through the completion proposals of other parameters also (C).
Figure 2.6: An example of completing a method body (collected from Hill and Rideout [44]). The technique sends the partially completed method body as the query to the server (A). The server returns a method that best matches the query and the technique updates the method body (B).
A parameter usage instance consists of four pieces of information: (i) the signature of the formal parameter bound to the actual parameter, (ii) the signature of the enclosing method in which the parameter is used, (iii) the list of methods that are called on the variable used in the actual parameter, and (iv) the methods that are invoked on the base variable of the method invocation using the actual parameter. Given a method invocation and a parameter position, Precise identifies the parameter usage context at the current position and then looks for a match in the parameter usage database using the kNN algorithm. Precise has been evaluated using SWT library method parameters and Eclipse. Although the technique is interesting, it has several limitations. First, Precise cannot recommend parameters of the following expression types: simple name, boolean, null literal, and class instance creation. Unfortunately, a large number of parameters have those expression types. Second, JDT6 (Java Development Tool) can suggest parameter expressions of the simple name, boolean, and null literal types, but it does not support parameters of the class instance creation type. Although the recommendations suggested by JDT can be combined with those of Precise, it is not clear how these two groups of recommendations should be combined and whether a better approach exists. Precise has been evaluated only on source code written in Java; thus, the performance of the technique across different programming languages (such as C#) is unclear.
Visual C# 2010 introduces named and optional arguments.7 It would be interesting to support auto-completion of both named and optional arguments.
2.4.3 Method body completion
A number of techniques have been developed that target automatic completion of method bodies [44, 53, 142]. Although these techniques do not focus on completing API calls, parameters, or expressions involving
6http://www.eclipse.org/jdt/ 7https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/classes-and-structs/ named-and-optional-arguments
API elements, they can support extending a framework. For example, such techniques can be used to provide the implementation of API methods that need to be overridden while extending a framework class or implementing a framework interface. Hill and Rideout [44] develop a technique that can automatically complete method bodies using atomic clones that exist in code bases (see Figure 2.6 for an example). Atomic clones are small, near-duplicate code fragments (5–10 lines) that are frequently created by repeatedly providing the same unit of implementation with little or no change (such as implementing a listener interface in Java). The technique generates a 154-dimensional feature vector for each method in a system. When a developer starts typing a method body, the technique uses the partial method implementation to generate a feature vector. It then applies the kNN algorithm with the Euclidean distance to determine the nearest-neighbor method bodies. The feature vector representation is limited in the sense that it cannot be used to represent arbitrary blocks of code.
Yamamoto et al. [142] also develop a technique to support method completion, but their approach differs from the previous technique. Given a collection of source code files, the technique first extracts all the methods, converts them into token sequences, and stores them as key-value pairs. For each method, the technique generates keys of varying sizes. Each key consists of the sequence of tokens that begins the method body, and the value corresponding to that key consists of the remaining tokens of the method body. Given a partially written code fragment, the technique converts the code into tokens and uses them as a key to look up values that are potential candidates for completing the remaining parts of the code. Although the technique supports completion of arbitrary code blocks, it generates a large number of key-value pairs: if the technique extracts n tokens from a method, it generates (n−1) key-value pairs, while in practice only a small subset of them can be reused by the developer.
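The key-value generation step can be sketched as follows, which also makes the (n−1) blow-up concrete. The token sequence is illustrative; the real technique operates on the token streams of full method bodies.

```java
import java.util.*;

// Sketch of prefix-based key-value generation: for a method of n tokens,
// keys of length 1..n-1 map each prefix to the remaining suffix tokens.
public class PrefixIndex {
    static Map<List<String>, List<String>> buildIndex(List<String> tokens) {
        Map<List<String>, List<String>> index = new LinkedHashMap<>();
        for (int k = 1; k < tokens.size(); k++) {
            List<String> key = new ArrayList<>(tokens.subList(0, k));
            List<String> value = new ArrayList<>(tokens.subList(k, tokens.size()));
            index.put(key, value);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("if", "(", "x", ")", "return");
        Map<List<String>, List<String>> index = buildIndex(tokens);
        System.out.println(index.size()); // 4 entries for 5 tokens, i.e. n-1
        System.out.println(index.get(Arrays.asList("if", "(", "x"))); // [), return]
    }
}
```

A partially typed fragment is tokenized and used as a lookup key; the stored suffix is the completion candidate. The quadratic storage in total tokens is exactly the limitation the text points out.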
Clones are similar code fragments, often created through copy-paste programming practices. Code clone detection techniques, particularly those that focus on detecting clones in large collections of source code repositories, are related to method body completion, because they use indexing and similarity detection techniques to find clones. Given a complete method body (also known as the query), a clone detection technique can be asked to detect clones quickly. The detected clones or method bodies can be identical to the query (known as exact or type-1 clones) or can contain minor to significant editing changes (known as near-miss clones). If the differences between near-miss clones are only in identifier names and literal values, they are called type-2 clones. Otherwise, they are called type-3 clones, which differ at the statement level (statements can be added, deleted, or modified).
A number of clone detection techniques have been proposed in the literature [15, 26, 31, 57, 58, 63, 68, 69, 70, 76, 81, 118, 120]. However, many of them cannot detect clones in real time because they are either not incremental or not scalable to hundreds of millions of lines of code [50]. Göde et al. propose an incremental clone detection technique based on the suffix tree [38]. SHINOBI is a clone detection tool that uses a suffix array index to detect clones quickly [59]. ClemanX is another incremental clone detection tool that uses characteristic vectors computed from the abstract syntax trees of code fragments
for detecting clones [97, 98]. Locality-sensitive hashing techniques have also been found effective for quick detection of code clones. DECKARD uses characteristic vectors to approximate structural information of code fragments [55]. It then uses locality-sensitive hashing (LSH) to cluster similar vectors, i.e., code clones. LSH hashes similar code fragments into the same bucket. The random projection method of LSH is called simhash [24]. The technique has been found effective for detecting near-duplicate web pages [80]. Uddin et al. evaluate the effectiveness of using simhash for near-miss clone detection [138]. Each simhash value is represented as a fixed-length binary value. The similarity between a pair of simhash values is calculated using the Hamming distance, i.e., the number of positions at which two equal-length binary values differ. The higher the similarity between two code fragments, the smaller the Hamming distance between their simhash values. Their proposed technique uses two-level indexing to speed up the clone detection process: the first level is based on the line length of code fragments, and the second level is based on the number of 1-bits in the simhash value. SourcererCC is a clone detection tool that uses an inverted index structure to quickly locate potential clones of code fragments [124, 125]. This is followed by a filtering step that considers the order of tokens in code fragments to remove false matches. Svajlenko et al. develop a technique that detects clones using the Jaccard similarity metric [132]. To avoid comparing code fragments that are unlikely to be clones of each other, the technique uses a sub-block filtering approach. To reduce the memory requirement, the technique leverages partitioned partial indexes, where code fragments are split into a number of partitions and a partial index is constructed for each partition.
At clone detection time, only one partition is kept in memory at a time, and the code fragments in that partition are used to query the index to detect potential clones. None of these works focuses on completing method bodies.
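The simhash fingerprinting and Hamming distance comparison discussed above can be sketched as follows. This is a simplified illustration, not any of the cited implementations: it hashes raw tokens with MD5 and weights every token equally, whereas real systems choose features and weights more carefully.

```python
import hashlib

def simhash(tokens, bits=64):
    """Random-projection fingerprint: each token votes on every bit
    position, and the sign of the accumulated vote decides the final bit."""
    weights = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a, b):
    """Number of bit positions at which two fingerprints differ."""
    return bin(a ^ b).count("1")

loop_a = "for ( int i = 0 ; i < n ; i ++ )".split()
loop_b = "for ( int j = 0 ; j < n ; j ++ )".split()
other  = "return map . get ( key ) ;".split()

# Similar fragments tend to produce nearby fingerprints, i.e., a small
# Hamming distance; dissimilar fragments tend to land far apart.
print(hamming(simhash(loop_a), simhash(loop_b)),
      hamming(simhash(loop_a), simhash(other)))
```

Because the fingerprint is a fixed-length integer regardless of fragment size, indexing on properties of the fingerprint (such as its 1-bit count, as in the two-level index above) becomes cheap.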
Real-time search of code clones is also related to automatic completion of method bodies. Keivanloo et al. identify a number of important requirements for real-time code clone search [61]: scalability, short response time, scalable incremental corpus update, and the ability to search all three clone types (type-1, type-2, and type-3). Results from the study reveal that it is possible to perform real-time code clone search through multi-level indexing. Zibran et al. develop an IDE-integrated code search tool that leverages a k-difference suffix tree to support real-time code clone search [146]. Both exact and near-miss clones are supported by the technique. Lee et al. develop an instant code clone search tool leveraging a multidimensional indexing structure (an R*-tree) that supports automatic detection of the top-k clones of a given code fragment [73]. To improve search performance, the technique exploits a dimensionality reduction approach. Method completion techniques can benefit from these real-time clone detection approaches. However, these approaches have not been evaluated for automatic completion of method bodies. Furthermore, while the query in clone detection is a whole method body or code block, the query in method completion is only a small fraction of one. This difference prevents the use of clone search techniques for automatic completion of method bodies without modification and calls for further research in this direction.
Ishihara et al. [53] propose a technique that recommends previously reused code examples detected using clone detection techniques.

Figure 2.7: GraPacc recommends completion proposals based on API usage patterns (collected from Nguyen et al. [92]).

The technique consists of a code analysis procedure and a code suggestion procedure. The code analysis procedure detects clones using a hash-based clone detection technique, determines a score for each code fragment using the component rank technique [51, 52], and extracts keywords from it. The score of a fragment is defined as the score of the method containing the clone fragment. The component rank technique uses a variation of the PageRank algorithm to compute the score. The basic idea behind the score is that methods that are called by many methods are significant, and methods that are called by significant methods are also significant. Given a query string, the code suggestion procedure sorts the code fragments related to the query using a combination of the component rank score and a word occurrence score. It then suggests code fragments in descending order of the score. An incomplete method body can be considered a query string, and code fragments related to it can be suggested to complete the body using the combination of the two scores.
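The intuition behind component rank, i.e., a PageRank-style score over the method call graph, can be sketched as follows. The call graph, damping factor, and iteration count here are illustrative assumptions, not values from the cited papers.

```python
def component_rank(call_graph, damping=0.85, iters=50):
    """Iteratively score methods: a method called by many (or by highly
    scored) methods accumulates a high score, as in PageRank."""
    nodes = list(call_graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        score = {
            n: (1 - damping) / len(nodes)
               + damping * sum(score[m] / len(call_graph[m])
                               for m in nodes if n in call_graph[m])
            for n in nodes
        }
    return score

# Hypothetical call graph: main and run both call parse.
calls = {"main": ["parse", "run"], "run": ["parse"], "parse": []}
ranks = component_rank(calls)
print(max(ranks, key=ranks.get))  # parse is called the most, so it scores highest
```

A clone fragment then inherits the score of its enclosing method, which is how the suggestion procedure can prefer fragments drawn from widely used methods.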
2.4.4 API usage pattern completion
Mining API usage patterns is an effective way of learning different usages of APIs and their code examples, and a number of techniques have been proposed in the literature. For example, Acharya et al. [1] mine API usage patterns as partial orders by leveraging static traces of source code. MAPO is an example of an API usage pattern mining framework [145]. It applies a clustering algorithm to group related API calls, generates method call sequences for each cluster, and then applies a sequential pattern mining technique to discover frequent patterns from those sequences. GrouMiner is a graph-based approach that can mine frequent usage patterns involving multiple objects from source code [99]. Wang et al. [140] propose a technique that uses a combination of closed frequent pattern mining and two-step clustering to find succinct and high
coverage API usage patterns. API usage patterns can help code completion systems better formulate the code context. GraPacc is a graph-based, pattern-oriented, context-sensitive code completion tool [92, 94]. Given a code base, the technique generates API usage patterns along with their frequencies by applying GrouMiner, a graph-based pattern mining tool that can determine frequent API usage patterns involving multiple objects. GraPacc uses a graph-based model, called a Groum, to represent API usage patterns. The nodes in a Groum represent method calls, objects/variables, and control structures (such as if, for, and while). The edges that connect nodes represent data dependencies. Given a partially completed code fragment involving a target API, called the query, the technique generates graph-based and token-based features. It also determines the same two categories of features for the mined API usage patterns. The similarity of features between the query and the mined API usage patterns is used to rank the patterns that match the query. If a developer selects a suggested pattern, the technique completes the query in the code according to the selected pattern (see Figure 2.7). One limitation of the technique is that only pattern numbers are displayed in the code completion popup menu, which makes it difficult to understand the purpose of a pattern before making the selection. While it is possible to label each pattern manually, automatically finding labels for the mined patterns would be more useful.
2.4.5 Language model-based code completion
Hindle et al. [45] find that the source code most developers write is natural in that its elements are simple, repetitive, and have predictable statistical properties. The repetitiveness of source code can be captured by statistical language models, and this repetitiveness or regularity is specific to each software project. They also observe that such regularity is common in applications within the same category but less common across applications from different categories. To capture the repetitiveness of source code, they develop a code completion system using a trigram language model that suggests the next token. They compare the technique with the code completion system (ECC) of the Eclipse IDE. Their code completion system performs better than ECC when the target token has six or fewer characters. They therefore merge the two algorithms into a new code completion system that recommends suggestions from ECC when ECC recommends a long suggestion; otherwise, the new technique combines half of the suggestions of the trigram model with half of the suggestions from ECC. While the previous works focus on discovering the repetitiveness of source code, they fail to capture its localness property. Tu et al. [137] find that source code is locally specific and proximally repetitive. The n-gram language models can capture global repetitiveness but fail to capture local repetitiveness of source code; for example, some n-grams may frequently appear in one file but not in other files. They develop a cache language model that gives more weight to locally specific and locally repetitive tokens. CACHECA is an Eclipse plugin that combines the suggestions of the Eclipse code completion system with those of the cache language model [36]. Each technique
produces a list of suggestions separately, and CACHECA mixes the suggestions as follows. Suggestions that appear within the top-3 suggestions of both lists are considered first. Next, if either list contains fewer than three suggestions, the top suggestion from that list is considered. Finally, CACHECA interleaves the remaining suggestions. However, CACHECA does not consider type information. Including additional context (such as information about the target token type and type information from the surrounding context) could help improve the performance of CACHECA. SLAMC is an example of a statistical language model for source code [96]. The technique recommends sequences of tokens by bringing semantic information into code tokens, annotating each token with its role and data type. One important contribution of SLAMC is incorporating semantic information into code tokens, which permits the language model to recommend suggestions even when the lexical information of the tokens varies. Another important contribution is considering the co-occurrences of code tokens, because some tokens trigger the appearance of others. For example, the try and catch keywords appear together in Java, and the appearance of the try keyword triggers the appearance of the catch keyword. However, the technique is not designed for dynamically typed programming languages. It is as yet unclear how semantic information can be incorporated for those languages and to what extent it would benefit them. SLANG is a code completion technique that collects sequences of method calls to create a statistical language model [110]. When a developer requests code completion, the tool completes the editing location using the highest-ranked sequence of method calls computed by the language model. The tool supports completion of multiple statements, handles multiple objects, and infers sequences not present in the training data.
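The trigram idea behind Hindle et al.'s completer can be sketched in a few lines. The toy corpus below is a hypothetical illustration; a real model would be trained on a large code base and use smoothing for unseen contexts.

```python
from collections import Counter, defaultdict

def train_trigrams(tokens):
    """Count, for each two-token context, which token follows it."""
    model = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        model[(a, b)][c] += 1
    return model

def suggest(model, a, b, k=3):
    """Rank candidate next tokens for the context (a, b) by frequency."""
    return [tok for tok, _ in model[(a, b)].most_common(k)]

code = "for ( int i = 0 ; i < n ; i ++ )".split()
model = train_trigrams(code * 3)  # a tiny, repetitive "corpus"
print(suggest(model, "i", "="))   # the context (i, =) was always followed by 0
```

Ranking by raw frequency is exactly what makes the plain n-gram model globally repetitive but locally blind, which is the gap the cache language model of Tu et al. addresses.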
GraLan is a graph-based statistical language model that targets completing API elements [93]. The term API element refers to the method calls, field accesses, and control units (such as for, while, and if) used in API usage examples. The technique mines source code to generate API usage graphs, called Groums. Given an editing location, the technique determines the API usage subgraphs surrounding the current location and uses them as context. GraLan then computes the probability of extending each context graph with an additional node. These additional nodes are collected, ranked, and recommended for code completion.
2.4.6 Keyword-based code completion
The idea of keyword-based code completion comes from end-user programming research, where the goal is to help users write scripting commands. Little and Miller develop a prototype system that transforms user-specified keywords expressing a command into executable code [77] (see Figure 2.8(A)). Their work is based on the assumption, supported by an empirical evaluation, that if a user is familiar with the application domain, she can write keywords expressing the command. They extend the idea to keyword programming, where instead of remembering the programming syntax, a developer types a small set of keywords and the system translates those words into a valid expression [78].

Figure 2.8: Examples of keyword-based code completion. The top figure (A) shows an example of code completion that translates input words into a valid expression (collected from Little and Miller [77]). The bottom figure (B) shows an example of abbreviated code completion (collected from Han et al. [40]).

Evaluation of the system against developer-generated inputs reveals a number of challenges. For example, developers often type synonyms rather than actual function names. One possible way to improve the technique is to consider online tutorials to determine the mapping between a task description and the function name required to complete the task. Another work related to this category of code completion is completing multiple keywords based on abbreviated input [40] (see Figure 2.8(B)). The technique utilizes a Hidden Markov Model to determine how to expand the abbreviated input using code examples. It does not require users to remember the abbreviated inputs; the system supports arbitrary inputs. Abbreviated input is also supported by various editors (such as GNU Emacs8). However, these editors only support a predefined set of abbreviated inputs: each input expands into another text when inserted in the editor, and users need to memorize those inputs. T2API is a statistical machine translation based tool that allows developers to write the textual description of a task and converts the query into an API usage example [95]. The technique leverages Stack Overflow9 posts to determine the mapping between individual words and the API elements that go together. The technique also uses a graph-based language model and determines API usage graphs using collected code examples. The technique processes the query to remove irrelevant words and determines the API
8https://www.gnu.org/software/emacs/ 9https://stackoverflow.com/
Figure 2.9: Completing a regular expression using the palette (collected from Omar et al. [102]).

elements that best match the given query words. It then suggests the API usage example that covers the most API elements. While the previous works in this section focus on completing an expression or a statement, this technique focuses on completing multiple statements that represent an API usage example.
2.4.7 Other forms of code completion
This section presents code completion systems that differ from the techniques in the above categories in terms of their objectives. They either refine the code completion menu to bring in additional sources of information, facilitate the discovery of APIs, or target completing language constructs other than those discussed above. Omar et al. [102, 103] propose an active code completion system, an architecture that allows developers to integrate specialized code generation interfaces, called palettes, via the code completion command (see Figure 2.9). For example, creating a Color object in Java requires a developer to call its constructor, which accepts red, green, and blue color values as parameters. Instead of manually specifying the color values, a color palette can assist the developer in selecting the color and generate the corresponding code in the IDE. The technique has been implemented in a tool called Graphite (Graphical Palette to Help Instantiate Types in the Editor) that provides active code completion support in the Eclipse IDE. To keep palette development independent of the IDE in use, palette developers use HTML5 technology to create palettes as webpages. Given a URL, the web browser control of the IDE can load the palette by showing the webpage. Graphite provides two different ways to associate palettes with a class. For example, a palette developer can associate the URL of the palette using an annotation. However, if the palette developers do not have access to the class, they can explicitly associate the palette with a class using the preference pane in the Eclipse IDE. Stylos and Myers discover that developers face difficulties in using APIs where the method they are looking for is not accessible from the object they are currently working with [131]. To better understand the problem, consider the task of sending a mail using the Java Mail API.
Here, a developer has created an object of type MimeMessage and then set different properties of the message using various setter methods. Empirical evidence indicates that developers tend to look for the send method in the MimeMessage type. However, the method they are looking for is not located in the type they started working with; it is located in the Transport class, and finding this information is a challenging task for developers inexperienced with the Java Mail APIs. Duala-Ekoko and Robillard [30] refer to this issue as the API discoverability problem. They use the term main type to refer to the type a developer is currently working with and the term helper type to refer to the type that contains the method the developer is looking for.

Figure 2.10: Examples of using API Explorer. It can recommend the send method call even if the method call is not accessible from the current receiver object (A). API Explorer includes method calls from other related classes (B). It also allows construction of objects (C) (collected from Duala-Ekoko and Robillard [30]).

They develop a code completion system that can help discover helper types and their methods by considering the structural relationships of API elements. These include return type relationships, subtype relations, and the relationships between a method and its parameters. The technique has been implemented as a tool called API Explorer (see Figure 2.10). The tool supports completing object construction expressions, suggesting methods that are located in helper types instead of the main type, and showing the relationships of API elements through code completion menus. Calcite [87] is an Eclipse plugin that leverages crowdsourcing to support automatic completion of constructors (see Figure 2.11 for examples of using the tool). This is important because an object can be constructed in various ways, and not all of them are easy for developers to discover. For example, Ellis et al. [32] found that developers require more time to create objects using factories instead of constructors. The additional time is most likely due to the effort of discovering the classes that are required to create the object.
Figure 2.11: Calcite supports construction of objects. Figure 11-A shows the suggested object construction proposals and 11-B shows the output after selecting a completion proposal. The tool also supports recommending placeholder methods. Figures 11-C and 11-D show the output before and after selecting a placeholder method proposal (collected from Mooty et al. [87]).
The tool uses a database that contains object construction examples for more than 4,000 public classes and interfaces of the Java 6 APIs. Instead of searching for examples in code repositories, Calcite uses the Yahoo Search Engine10 to find webpages containing code examples. Those examples are parsed to locate object construction examples, and the technique approximates the type of the object being created. Developers can request a constructor completion on the right-hand side of an assignment expression. Calcite determines the type of the left-hand-side expression and uses this information, along with frequency, to suggest constructor completions. In addition, Calcite suggests placeholder methods. These are methods that do not exist in the APIs but represent functionality expected by developers. Selecting a placeholder method does not insert any code; rather, it inserts a recommended fix as a TODO comment that can be configured by the developer. The final addition to the code completion menu by Calcite is the placeholder suggestion prompt, which, when selected, allows developers to add a placeholder method to a class, including the process of achieving the desired functionality. It should be noted that Calcite shares its database of object construction examples with Jadeite (Java API documentation with Extra Information Tacked-on for Emphasis) [129]. Unlike Calcite, Jadeite is a Javadoc-like API documentation system that contains information about placeholder methods and common ways to construct classes, and uses variable font sizes (for packages, classes, and interfaces). Lee et al. [74] develop a tool that integrates temporal dimensions and navigation into code completion (see Figure 2.12). Typically, code completion systems focus only on the current version of the software. However, it may be the case that the API method or field a developer is looking for has been deleted or replaced
10https://ca.search.yahoo.com/
Figure 2.12: The code completion system recommends previously deleted field variables, highlighted in gray (collected from Lee et al. [74]).

by another one. Their technique identifies the program elements in the current version, determines their mapping in past revisions (if any), and shows this information in the code completion popup menu. The technique also shows the methods and fields of past revisions in the completion menu and allows developers to open them in the IDE. This eliminates the need for context switching. CookBook is a code completion technique that suggests potential edit recipes [54]. Each edit recipe can be considered an abstract program transformation script that can suggest related changes. Given examples of changed methods, the technique applies an AST differencing algorithm to determine an edit script for each method. Each edit script consists of a set of line-level edits including their edit types (e.g., insert, delete, change). These represent the training data. When a developer starts editing a line in a method, CookBook determines suitable recipes based on context matching. Each time a developer edits a line, CookBook determines the edit type and content. It then uses this information to find edit recipes that contain similar edits. CookBook calculates a matching score for each of those recipes by combining context and edit content matching scores, sorts the recipes by score, and presents them to the developer.
Chapter 3
Capturing code context and using the simhash technique: A case study on tracking source code lines
This chapter discusses an approach to capture source code context. To quickly validate the benefit of such context information, as well as the simhash technique, this study focuses on the problem of tracking source code lines, owing to the simplicity of developing line location tracking techniques. Similar context information and the simhash technique are utilized in developing the code completion systems in the remaining chapters of this thesis. The line location tracking technique described in this chapter and its implementation have been published in a major software engineering conference [11, 12].
3.1 Introduction
Tracking source code across multiple versions of a program is an essential step in solving a number of problems related to multi-version program analysis. For example, consider the problem of locating bug-introducing changes. Existing techniques solve this problem by finding the lines that are affected by bug fixes and then tracing those lines back to determine their origin. If a bug has been identified in a software system, tracking the lines containing the bug in subsequent versions can help determine whether the same problem persists and, if so, allows developers to fix the problem with ease. Results obtained from a line tracking technique can be aggregated for fine-grained evolutionary analysis. For example, clone evolution analysis requires tracking clone fragments across multiple versions of a software system. Tracking the lines of a clone fragment can help us understand how that fragment evolves over time. Such analysis can also guide us in understanding the nature, effects, and reasons for cloning. To support collaboration in software development, annotation or tagging has been used in source code to facilitate both navigation and coordination [128], and source location tracking techniques can help manage tags across versions. Software development involves more than just the creation of source code; there are also other kinds of software artifacts that can benefit from mapping lines across versions. Possible applications include studying when and how requirements change or ensuring that intended changes have been applied to all configuration files. A number of line tracking techniques can be found in the literature. Reiss [111] listed a number of approaches for tracking lines across versions and found that language-independent approaches often provide
good results. Canfora et al. [22] proposed another language-independent line matching technique, called ldiff, that uses a combination of information retrieval techniques and the Levenshtein distance for mapping lines. Techniques that take into account the syntactic structure of source files can provide change information at a finer-grained level [35]. An example of a line tracking technique in this group is SDiff [127], which can determine changes at the method and identifier levels. If fine-grained source code change information is not required, language-independent text-based approaches are suitable for tracking source code lines and can be applied to different kinds of documents besides source code (e.g., test cases or use cases). They also have the potential to be integrated with existing version control systems with little or no modification. However, the performance of these techniques varies depending on the degree of change applied to the source code. Williams and Spacco [127] found that techniques that performed well in the Reiss study did not perform well against their benchmark, in which files had a higher degree of change. This motivates us to investigate the reasons behind this discrepancy and to devise a robust language-independent solution. This chapter introduces LHDiff (Language-independent Hybrid Diff), a language-independent technique to track the evolution of source code lines across versions of a software system. The technique uses Unix diff between two versions of a file to determine the set of unchanged lines. To track the remaining lines, it uses a combination of context and content similarity. However, to speed up the mapping process, it first leverages the simhash technique to determine a list of mapping candidates for each deleted line in the old file.
Next, the technique computes similarity scores again, but this time on the source code lines instead of the simhash values, to select one line from each set of mapping candidates. To validate the effectiveness of our language-independent technique, we compare it against state-of-the-art techniques using different benchmarks in which the files are collected from real-world applications. We further evaluate our technique against other approaches using different types of changes in a mutation-based analysis. The experimental results in both cases suggest that the technique is superior to other language-independent approaches, and often gives even better results than the language-dependent technique SDiff. The remainder of the chapter is organized as follows. Section 3.2 covers previous work related to our study. Section 3.3 describes the hybrid line mapping technique. Section 3.4 presents a quantitative evaluation of line tracking techniques using three different benchmarks. In Section 3.5, we describe the results of the mutation-based analysis. Section 3.6 explains some threats to our study and, finally, Section 3.7 concludes the chapter with future research directions.
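The two-phase idea can be illustrated with a small sketch. This is not LHDiff itself: it uses Python's difflib both for the diff phase and as a stand-in for the simhash-filtered context-and-content scoring, but it shows how unchanged lines are fixed first and the remaining lines are then mapped by similarity.

```python
import difflib

def track_lines(old, new, threshold=0.5):
    """Map each line index of `old` to a line index of `new`.
    Phase 1: pin down lines a diff reports as unchanged.
    Phase 2: map leftover lines to the most similar unmatched new line."""
    sm = difflib.SequenceMatcher(None, old, new)
    mapping, matched_new = {}, set()
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":                       # unchanged lines from the diff
            for off in range(i2 - i1):
                mapping[i1 + off] = j1 + off
                matched_new.add(j1 + off)
    for i, line in enumerate(old):               # resolve the remaining lines
        if i in mapping:
            continue
        best, best_score = None, threshold
        for j, cand in enumerate(new):
            if j in matched_new:
                continue
            score = difflib.SequenceMatcher(None, line, cand).ratio()
            if score > best_score:
                best, best_score = j, score
        if best is not None:
            mapping[i] = best
            matched_new.add(best)
    return mapping

old = ["int total = 0;", "for (int i = 0; i < n; i++)", "total += i;"]
new = ["int total = 0;", "// sum the first n integers",
       "for (int k = 0; k < n; k++)", "total += k;"]
print(track_lines(old, new))  # the renamed loop lines are still tracked
```

In LHDiff the second phase is made scalable by first comparing fixed-length simhash values (via the Hamming distance) to shortlist candidates, and only then scoring the actual line content and context.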
3.2 Related Work
Tracking source code lines across program versions is crucial to support various maintenance activities, and several approaches exist that consider line content, context, abstract syntax trees, edit distance, or a combination of these to solve the problem. In general, existing techniques can be divided into two categories: (1) text-based and (2) syntax tree-based. The first group of techniques is purely textual in nature
and does not require parsing source files. As a language-independent technique, the Unix diff algorithm has been widely used in many studies, not only to track lines but also for program differencing. Diff is based on solving the longest common subsequence problem, and it reports the minimum number of line changes that can convert one file into another. However, it has its limitations. For example, it cannot detect reordered lines, and the addition of a comment to a line causes diff to report the deletion of the old line and the addition of a new one. Such cosmetic changes are irrelevant to a programmer and should be ignored [60].
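The comment limitation is easy to reproduce with any line-based diff; the snippet below uses Python's difflib as a stand-in for Unix diff to show a purely cosmetic comment reported as a delete plus an add:

```python
import difflib

old = ["int x = compute();"]
new = ["int x = compute(); // cache the result"]

# A line-based diff sees two different lines, not one line with a cosmetic edit.
for line in difflib.unified_diff(old, new, lineterm=""):
    print(line)
```

Running this prints a `-` line for the original statement and a `+` line for the commented version, even though nothing about the program's behavior has changed; this is precisely the kind of spurious delete/add that line tracking techniques must see through.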
Diff reports the regions of lines that differ between a pair of files, where each region is called a hunk. Zimmermann et al. [147] addressed such modification changes using an annotation graph, where large modifications are considered combined additions and deletions of lines; otherwise, all lines between hunk pairs are connected with each other in a modification. The technique detects origins conservatively, does not consider the issue of reordered lines, and is susceptible to errors.
Canfora et al. [22, 23] developed a line differencing technique, called ldiff, to track line locations independent of language. Their technique uses the Unix diff algorithm to determine the unchanged lines; after that, set-based and sequence-based metrics are used to complete the mapping of the remaining lines. However, they only compared the technique with the Unix diff algorithm. To the best of our knowledge, Reiss was the first to conduct a study evaluating the performance of several techniques for tracking source locations [111]. Interestingly, the results reveal that simple techniques like those mentioned above that do not consider program structure perform better than those that consider the syntactic structure of source files (such as abstract syntax tree-based techniques). Reiss also recommended the W BESTI LINE technique to track line locations, which uses a combination of context and content similarity to find the evolution of lines independent of language. While LHDiff is also language-independent, our technique differs from the above approaches in that it uses the simhash technique to speed up the mapping process and a set of heuristics to improve the effectiveness of tracking source locations.
Spacco and Williams [127] extended the idea of tracking lines to tracking program statements across multiple revisions of Java code. They developed SDiff, an abstract syntax tree-based technique that leverages tokenization, Unix diff, and the Kuhn-Munkres algorithm [71] to track line locations. SDiff is effective at determining differences at the line level. However, the technique cannot be applied to arbitrary source code and cannot handle comments. While comments are not executed, their importance cannot be ignored: tags [128] are usually associated with comments to support asynchronous collaboration, and they need to be tracked across versions. Moreover, SDiff requires the source code to parse without errors, which in reality cannot be guaranteed. For example, the source code may be written using an old grammar of a language, or developers may issue file differencing commands in the middle of an edit operation. LHDiff, on the contrary, is language-independent, can be applied to arbitrary source code, and can track the evolution of comments and tags across versions.
Line location information can be obtained through techniques that determine fine-grain differences between two versions of a file. However, these techniques require knowledge about language constructs. Among these approaches, the most notable is ChangeDistiller [35], which considers the abstract syntax tree of a Java source file and leverages a tree differencing algorithm to determine fine-grain changes in the source code. The algorithm is not immune to cosmetic changes (changes that do not affect the behavior of a program, such as the addition of a comment), cannot work on arbitrary text files, and is limited to Java files only. Xing and Stroulia [141] presented an algorithm, known as UMLdiff, to determine structural changes in object-oriented software. It uses a combination of name similarity and structure similarity measures for recognizing conceptually the same entities. Apiwattanapong et al. [3] presented an algorithm to determine changes between two Java programs. The technique considers program structure and the semantics of programming language constructs to determine changes that are difficult to detect with a purely textual differencing technique. While these approaches are language specific and focus on fine-grain change details, LHDiff focuses on tracking lines independent of source code languages.
Techniques for tracking program elements across versions are also related to our study. Matching higher level language constructs prior to mapping lines improves mapping quality. A comprehensive survey of various techniques can be found in the work of Kim et al. [65]. Godfrey and Zou [39] developed a semi-automatic technique to detect merging and splitting of source code entities during the evolution of a software system. Kim et al. [64, 66] used a combination of textual similarity and a location overlapping score to track clone fragments across versions. Duala-Ekoko and Robillard developed a technique to track the evolution of clones [29]. The technique is also available as an Eclipse plugin called CloneTracker. Here, clones are identified using an abstract clone region descriptor (CRD) that is independent of source code line locations. While the above techniques focus on tracking code fragments, we focus on tracking individual lines.
[Figure 3.1 shows the LHDiff pipeline: the old and new files are preprocessed (noise removed), Unix diff detects unchanged lines and produces a left and a right list, simhash values on both context and content are compared by Hamming distance to generate a candidate list for each line of the left list, conflicts are resolved by selecting the candidate that best matches a line from the old file, line splits are detected, and the line mapping is output.]
Figure 3.1: Summarizing the line mapping process in LHDiff
3.3 LHDiff: A Language-Independent Hybrid Line Tracking Technique
This section introduces LHDiff, our language-independent hybrid line tracking technique. We refer to LHDiff as a hybrid technique since it combines different techniques, selected through manual investigation of the incorrect mappings produced by other techniques. Figure 3.1 summarizes the entire mapping process.
3.3.1 Preprocess input files
The algorithm starts by reading lines from two different versions of a file. A large number of changes in source code are only cosmetic in nature and do not change the behaviour of a program. Examples include changes to whitespace/newline characters, which are inserted to change the indentation of a program to improve readability. To ignore such changes, each line of the source file is normalized so that runs of multiple spaces are replaced by a single space. We also remove all parentheses and punctuation symbols from the text except curly braces, because we obtained the best results when considering curly braces as part of the line context.
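The normalization step can be sketched as follows. This is only an illustration: the exact set of punctuation symbols removed by LHDiff is not enumerated in the text, so the character class below is an assumption.

```python
import re

def normalize_line(line):
    """Normalize a source line so purely cosmetic edits are ignored.

    Punctuation and parentheses are dropped, but curly braces are kept
    because they carry useful context information; runs of whitespace
    collapse into a single space.
    """
    # Replace punctuation/parentheses (but NOT curly braces) with spaces.
    # The exact character set is an illustrative assumption.
    line = re.sub(r"[()\[\];,.:!?'\"]", " ", line)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", line).strip()

print(normalize_line("   if (x ==  1) {  "))  # -> if x == 1 {
```

After this step, reindenting a line or reformatting its punctuation no longer changes its normalized form.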
3.3.2 Detect unchanged lines
After preprocessing we apply Unix diff, which uses the longest common subsequence algorithm to determine the set of unchanged lines. We use diff because previous studies report that it can detect the set of unchanged lines with great accuracy [111]. Diff reports the sequences of lines that have been deleted or added between the files. We store the deleted and added lines in two different lists; from now on, we refer to them as the left and right lists, respectively.
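A minimal sketch of this step, using Python's `difflib.SequenceMatcher` as a stand-in for Unix diff (it computes matching blocks in a similar, LCS-like fashion; LHDiff itself invokes the real diff tool):

```python
import difflib

def split_changed_lines(old_lines, new_lines):
    """Diff two files and collect the changed lines.

    Returns (left, right, mapping): lines deleted from the old file and
    lines added in the new file, each as (line_number, text) pairs, plus
    the direct mapping of unchanged lines (1-based old -> new numbers).
    """
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    left, right, mapping = [], [], {}
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            for k in range(i2 - i1):
                mapping[i1 + k + 1] = j1 + k + 1  # unchanged: map directly
        else:
            left.extend((i + 1, old_lines[i]) for i in range(i1, i2))
            right.extend((j + 1, new_lines[j]) for j in range(j1, j2))
    return left, right, mapping

old = ["int a = 0;", "int b = 1;", "return a + b;"]
new = ["int a = 0;", "int total = 1;", "return a + total;"]
left, right, mapping = split_changed_lines(old, new)
print(left)   # deleted lines of the old file (the "left list")
print(right)  # added lines of the new file (the "right list")
```

Only the left and right lists need further processing; unchanged lines are already mapped.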
3.3.3 Generate candidate list
Our goal now is to determine the mapping of a line from the left list to a line in the right list. Similar to W BESTI LINE (recall that W BESTI LINE is another language-independent source location tracking technique), we use both the context and the content of a line to determine the correct mapping. The line itself represents the content, and the context is created by concatenating the four lines before and after the target line. While the content similarity between a pair of lines is calculated using the normalized Levenshtein edit distance, which considers the order of characters, the context similarity is calculated using cosine similarity,1 which does not consider the order of tokens/characters. While building the context, we ignore blank lines but keep the curly braces. These choices allow us to gather sufficient contextual information for a line. We then determine a combined similarity score by considering both content and context similarity. We need to calculate such scores between all possible pairs of deleted and added lines. After that, we map only those lines that provide the highest similarity scores and also exceed a predefined threshold value. However, the complete operation would take a long time to finish because of the complexity associated with computing similarity scores, particularly the normalized Levenshtein edit distance. To improve the running time of the algorithm we follow a different strategy. We first apply a form of locality-sensitive hashing and calculate the combined similarity score on the hash values instead of the original lines to determine a small set of possible mapping candidates for each line of the left list. Then we use the original line content and the combined similarity score to select a line from each set of mapping candidates. Since the hashing technique reduces a line to a much shorter sequence of bits, the overall mapping time is reduced significantly.
1http://www-nlp.stanford.edu/IR-book/
[Figure 3.2 shows an old and a new version of a Java method named largest: the variables num1, num2, num3, and largest are renamed to first, second, third, and value. For a target line (the content), the surrounding lines form its context. Simhash values are computed for the contexts of Line A (old file) and Line B (new file), and their Hamming distance, 9, is obtained by counting the differing bits.]
Figure 3.2: An example of calculating Hamming distance
While a cryptographic hash function tries to avoid generating the same key for different inputs to prevent collisions, in this form of hashing, files containing similar content are mapped to identical or very similar binary hash keys. The technique is known as simhash [24], and it has been found practically useful for determining near-duplicate pages in a large collection of documents [80]. The core of the algorithm uses a hash function to generate simhash values. Among various non-cryptographic hash functions we use the Jenkins hash function, since it shows better similarity-preserving behavior than other functions and was also found effective in detecting near-miss code fragments in a different study [138]. We generate a 64-bit simhash value for both context and content using the simhash algorithm.2 Instead of working on the original lines, we now work on the simhash values. We calculate the context and content similarity for each pair of added and deleted lines by computing the Hamming distance between their corresponding simhash values. While the Hamming distance between two strings is the number of substitutions required to convert one string into the other, for binary strings a and b it is calculated by counting the number of 1 bits in a ⊕ b (the bitwise exclusive OR of a and b). The smaller the Hamming distance, the closer the two strings are (see Figure 3.2). The context and content similarity values are normalized between zero and one. A combined similarity score is calculated for each pair of lines using 0.6 times the content similarity plus 0.4 times the context similarity (determined after experimenting with different combinations of values). For each line in the left list, we then determine the k neighbors from the right list that are the most probable mapping candidates of that line based on the combined score. We refer to this set as the matching candidates list.
Since we are not comparing raw source code lines for selecting the mapping candidates, this saves significant time. During our study with different values of k we found that k = 15 is a good choice.
2http://d3s.mff.cuni.cz/∼holub/sw/shash/
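The simhash and Hamming-distance computation described above can be sketched as follows. Note the assumptions: the thesis uses the Jenkins hash, for which we substitute a truncated MD5 here for brevity (so the exact bit patterns differ), and we tokenize on whitespace.

```python
import hashlib

def h64(token):
    """64-bit hash of a token. LHDiff uses the Jenkins hash; a truncated
    MD5 is substituted here purely for illustration."""
    return int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")

def simhash(text, bits=64):
    """Charikar's simhash: similar inputs yield values with a small
    Hamming distance. Each token votes +1/-1 on every bit position."""
    v = [0] * bits
    for token in text.split():
        hv = h64(token)
        for i in range(bits):
            v[i] += 1 if (hv >> i) & 1 else -1
    out = 0
    for i in range(bits):
        if v[i] > 0:
            out |= 1 << i
    return out

def hamming(a, b):
    """Number of differing bits: the popcount of a XOR b."""
    return bin(a ^ b).count("1")

a = simhash("if value > third return value")
b = simhash("if largest > num3 return largest")
# Lines sharing tokens typically land close in Hamming space.
print(hamming(a, b))
```

For each left-list line, the k = 15 right-list lines with the smallest combined Hamming-based distance (weighted 0.6 content, 0.4 context) would then form its matching candidates list.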
3.3.4 Resolve conflicts
We now have a candidate list for each line of the left list (except those that are detected as deleted/unchanged by the algorithm), but we do not know the exact mapping yet. The objective of this step is to select one line from each candidate list to resolve the conflicts. As an example, suppose we are looking for the mapping of line 20, and from the previous step we determine that the candidate list consists of three lines (37, 46, 51). We now use the original lines to generate both context and content, and the algorithm determines the combined similarity score for each possible mapping pair ({20, 37}, {20, 46}, and {20, 51}). It selects the one that gives the highest similarity score and also exceeds a predefined threshold value. As before, we use the normalized Levenshtein distance to measure content similarity and cosine similarity to measure context similarity. Both values are normalized, and the combined similarity score is determined using 0.6 times the content similarity plus 0.4 times the context similarity; these are the same weights used by Reiss. We set the threshold value to 0.45 after experimenting with various other values, because at this setting LHDiff provides good results.
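A minimal sketch of the conflict-resolution scoring, assuming (as in the candidate-generation step) content similarity via normalized Levenshtein distance and context similarity via cosine similarity over tokens, with the 0.6/0.4 weights and the 0.45 threshold; function names are illustrative.

```python
import math
from collections import Counter

def levenshtein_sim(a, b):
    """Normalized Levenshtein similarity in [0, 1] (order-sensitive)."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return 1 - prev[n] / max(m, n, 1)

def cosine_sim(a, b):
    """Cosine similarity over token frequencies (order-insensitive)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def resolve(line, context, candidates, threshold=0.45):
    """Pick the candidate with the highest combined score
    (0.6 * content + 0.4 * context similarity) above the threshold.
    `candidates` is a list of (line, context) pairs; returns None if
    no candidate qualifies."""
    best, best_score = None, threshold
    for cand_line, cand_context in candidates:
        score = (0.6 * levenshtein_sim(line, cand_line)
                 + 0.4 * cosine_sim(context, cand_context))
        if score > best_score:
            best, best_score = cand_line, score
    return best

cands = [("int total = 0;", "sum loop"), ("return x;", "exit")]
print(resolve("int sum = 0;", "sum loop", cands))  # -> int total = 0;
```

The same combined score is used as in candidate generation, only computed here on the raw lines rather than on simhash values.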
3.3.5 Detect line splits
The last part of our technique deals with detecting line splitting. We use the term line splitting instead of statement splitting since LHDiff is not aware of statement boundaries and only works at the line level. An example of line splitting is shown in Figure 3.3, where a single line breaks into multiple smaller lines. The basic LHDiff algorithm tries to map each line from the old file to a line in the new file, but fails to map some lines whose textual similarity differs greatly. To track lines affected by line splitting, we use the following approach. For each unmapped line of the new file, we repeatedly concatenate the line with its successive lines, one at a time, and determine the normalized Levenshtein distance (LD) with an unmapped line in the old file until the similarity value starts to decrease. Then, if the textual similarity between the concatenated lines and the old-file line crosses a predefined threshold value, we map them. As an example, we concatenate line 20 with line 21 and determine the normalized Levenshtein distance between R20+21 and L40.
The similarity value is greater than the similarity between R20 and L40. Thus, we concatenate the next line and determine the similarity again between R20+21+22 and L40. Since the similarity is again greater than the previous step we continue the concatenate operation until the similarity decreases. This happens as soon as we add line 24 (LD(R20+21+22+23+24,L40) < LD(R20+21+22+23,L40)). This indicates that line 24 cannot be
[Figure 3.3: lines 20–23 of the new file (protected Size2D arrangeRR / (Range withRange, / Range heightRange, / Graphics2D g2){) together correspond to the single line 40 of the old file; the following array declarations (new lines 24–25, old lines 41–42) are separate lines in both versions.]
Figure 3.3: An example of line splitting
a part of the split lines. Since LD(R20+21+22+23, L40) exceeds the threshold value and there are four lines in the concatenated part, LHDiff maps all four lines to line 40 on the left (old-file) side. To avoid false mappings, we set the threshold to a high value, 0.85, and we also limit the concatenation to a maximum of 8 lines; during our manual analysis, we did not find any example where a line splits into more lines than that. This heuristic approach can only compensate for line splitting when the operation does not change the contents of the line, or changes them only slightly. It cannot detect other, more complex line splitting operations, such as those described in Section 3.5.1.
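The greedy concatenation loop described above can be sketched as follows. The join-with-a-space rule and the function names are illustrative assumptions; LHDiff's exact concatenation rule is not specified in the text.

```python
def lev_sim(a, b):
    """Normalized Levenshtein similarity in [0, 1]."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1,
                         prev[j - 1] + (a[i - 1] != b[j - 1]))
        prev = cur
    return 1 - prev[n] / max(m, n, 1)

def detect_split(old_line, new_lines, start, sim, threshold=0.85, max_lines=8):
    """Greedily concatenate consecutive new-file lines starting at `start`
    while similarity to `old_line` keeps increasing; accept the mapping
    when the final similarity crosses `threshold`. Returns the number of
    new-file lines mapped to `old_line`, or 0 if no mapping qualifies."""
    concat = new_lines[start]
    best = sim(concat, old_line)
    used = 1
    while used < max_lines and start + used < len(new_lines):
        candidate = concat + " " + new_lines[start + used]
        s = sim(candidate, old_line)
        if s <= best:          # similarity starts to decrease: stop
            break
        concat, best, used = candidate, s, used + 1
    return used if best >= threshold else 0

old_line = ("protected Size2D arrangeRR (Range withRange, "
            "Range heightRange, Graphics2D g2){")
new_lines = [
    "protected Size2D arrangeRR",
    "(Range withRange,",
    "Range heightRange,",
    "Graphics2D g2){",
    "double[] w = new double[5];",
]
print(detect_split(old_line, new_lines, 0, lev_sim))  # -> 4
```

On the Figure 3.3 example, the similarity rises as lines 20–23 are concatenated, drops once the unrelated declaration is appended, and the four accumulated lines map to old line 40.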
3.3.6 Evaluation
We evaluate the effectiveness of LHDiff using two different methods. First, we consider three different benchmarks, each containing line mapping information for versions of files collected from real-world applications. Second, we compare LHDiff with other state-of-the-art techniques using a mutation-based analysis that considers different types of changes. We describe these two evaluation methods in detail in the following two sections.
3.4 Evaluation Using Benchmarks
This section describes the benchmarks we used to evaluate source location tracking techniques, along with the results of our evaluation.
3.4.1 Experiment details
We used two different benchmarks available from previous studies to measure the effectiveness of source location tracking techniques. The first one was developed by Reiss; it contains location information for 53 lines of a file (ClideDatabaseManager.java) over 25 revisions, amounting to 1325 test cases [111]. We also evaluated our technique with another benchmark developed by Reiss that contains locations of 14 lines of the JiveRuntime.java file in 27 different revisions. However, we do not report those results because the changes are simple and there is no significant difference among source location tracking techniques for those changes. In both cases, lines were selected by looking at every tenth line of the source file, discarding blank lines. Additional problematic or interesting changes (such as name and comment changes) were also included.

Table 3.1: Three forms of incorrect mappings

Label      Meaning
Change     the algorithm finds a mapping of a line but the mapping is not correct
Spurious   the algorithm detects a mapping of a line but the line is deleted
Eliminate  the algorithm detects deletion of a line but the line exists in the new file

Table 3.2: Running location tracking techniques against the Reiss, Eclipse, and NetBeans benchmarks. For each benchmark, the columns are Correct%, the incorrect mapping categories Change%, Spurious%, and Eliminate%, and Time [s].

Language-independent techniques (text-based approach):

Method                         |  Reiss benchmark               |  Eclipse benchmark             |  NetBeans benchmark
                               |  Corr  Chg  Spur  Elim   Time  |  Corr  Chg  Spur  Elim   Time  |  Corr  Chg  Spur  Elim   Time
W BESTI LINE                   |  96.7  0    29.6  70.5    4.8  |  52.6  28.2 56.4  15.5    0.7  |  61.9  41.9 45.2  12.9    2.9
LHDiff                         |  97.0  42.5  0    57.5    4    |  82.8  22.5 20    57.5    3.3  |  85.5  18.6 13.6  67.8    6.8
diff                           |  92.5  0     0   100      0.2  |  41.0   0.7  0    99.3    0.1  |  48.4   1.4  0    98.6    0.2
ldiff [-i 0]                   |  96.4  0    66.7  33.3   25.7  |  62.5   9.2  1.2  89.7   58.2  |  74.0  16.0  7.6  76.4    6.1
ldiff [-i 1]                   |  96.4  0    25    75     73.9  |  59.1   9.5  1.1  89.5  209.2  |  76.4  14.6  5.2  80.2  103.5
ldiff [-i 3]                   |  96.2  0    29.4  70.6  118.6  |  72.0  20    4.7  75.4  419.7  |  80.6  24.1  8.9  67.1  171.5
ldiff [-i 5]                   |  96.2  0    29.4  70.6  160.7  |  72.4  20.3  4.7  75    472.4  |  81.3  25    9.2  65.8  218.2
Git                            |  92.5  0     0   100      0.8  |  45.3   0.8  0    99.2    0.3  |  53.7   1.6  0.5  97.9    0.2

Language-dependent techniques (require access to the AST):

SDiff                          |  86.2  0    86.3  13.7    1.9  |  70.7  11.8 80.9   7.4    1.7  |  71.8   7.0 86.1   7.0    4.2
SDiff-probable                 |  87.0  3.0  82.4  14.6    1.7  |  69.9  11.4 75.7  12.9    1.5  |  73.5   9.3 82.4   8.3    4.3
SDiff-possible-probable        |  87.0  3.0  82.4  14.6    1.7  |  70.7  11.8 75    13.2    1.6  |  75.4  12   77    11      4.3
SDiff-min                      |  85.6  0    82.5  17.5    2.6  |  74.1  13.3 76.7  10      2    |  72.5   7.1 85.7   7.1    6.8
SDiff-min-probable             |  86.4  2.9  78.6  18.5    2.6  |  73.3  12.9 74.2  12.9    2.1  |  74.2  10.5 81.0   8.6    6.8
SDiff-min-possible-probable    |  86.4  2.9  78.6  18.5    2.6  |  74.1  13.3 73.3  13.3    2.1  |  75.2  12.9 74.3  12.9    6.8
SDiff-token                    |  85.6  0    82.5  17.4    1.0  |  73.3  11.3 79.0   9.7    1.0  |  71.0   6.8 87.3   5.9    2.3
SDiff-token-probable           |  85.6  2.7  79.8  17.5    1.0  |  73.3  11.3 75.8  12.9    1.0  |  73.7  10.3 82.2   7.5    2.3
SDiff-token-possible-probable  |  86.4  2.9  78.7  18.5    1.0  |  74.1  11.7 75    13.3    1.0  |  74.5  11.5 76.9  11.5    2.3
Williams and Spacco developed another benchmark (known as the Eclipse benchmark) containing change information for 232 lines [127]. The author of this thesis manually analyzed all changes, both to validate them and to identify interesting change patterns during the evolution of these lines. It should be noted that this manual investigation revealed a few incorrect mappings in the Eclipse benchmark. We used the benchmark in our evaluation after correcting those mappings. That is why readers will notice a slight difference between our results for the Eclipse benchmark and those reported in the original paper [127].
In addition, we developed a third benchmark using source code from the NetBeans project3. NetBeans is an integrated development environment for developing applications in different languages. We randomly selected a number of lines (including ones we found challenging) and then determined the new locations of those lines in another version. Since this decision is subjective in nature, to avoid bias two authors manually determined the correct mapping of these lines independently. In cases of disagreement between the two authors, we removed the line from our study.
Although our technique is language-independent, we considered techniques in both categories for comparison. We instructed ldiff to ignore all whitespace and also to ignore changes whose lines are all blank. For all other settings of ldiff we used default values, except that we set the number of iterations to four different values. When the value is set to 0, ldiff considers only line similarity; for all positive values it also considers hunk similarity. In the case of SDiff, we evaluated all nine configurations used in the original experiment. For details of these configurations we refer interested readers to the original paper [127]. Among the various techniques evaluated by Reiss, we report the results of W BESTI LINE, with the same similarity thresholds and context length suggested by Reiss, because Reiss recommended this technique for tracking lines. We also include results for diff, configured to ignore spacing differences and letter case during the mapping process.
To calculate a score for each method, we followed the basic scoring mechanism used in previous studies, where the score is the number of correct mappings. To document line mapping information, we used the following approach. If a line in the old file is deleted, the new location of that line is encoded as −1; otherwise, each line of an old file is associated with a non-negative integer representing the new position of that line. Incorrect classifications can result from three different types of errors, and we use different labels to signify them, as shown in Table 3.1. All experiments were conducted on a computer running Ubuntu Linux with a 2.93 GHz Intel Core i7 processor and 8 GB of memory.
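The encoding and the Table 3.1 error categories can be sketched as a small scoring function (the function name is ours; the −1 convention and category definitions follow the text):

```python
def classify(predicted, truth):
    """Score one line's mapping against the benchmark ground truth.

    Positions are 1-based new-file line numbers; -1 encodes 'deleted'.
    The error categories follow Table 3.1.
    """
    if predicted == truth:
        return "correct"
    if truth == -1:        # line was actually deleted, but a mapping was reported
        return "spurious"
    if predicted == -1:    # line still exists, but was reported as deleted
        return "eliminate"
    return "change"        # mapped, but to the wrong line

results = [classify(p, t) for p, t in [(10, 10), (12, 13), (7, -1), (-1, 9)]]
print(results)  # -> ['correct', 'change', 'spurious', 'eliminate']
```

The per-benchmark percentages in Table 3.2 are simply the shares of each category over all tracked lines.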
3https://netbeans.org/
3.4.2 Results
Table 3.2 shows the results of our evaluation. For each benchmark and line tracking technique, we provide not only the percentage of correct mappings but also the results for each incorrect mapping type (change, spurious, and eliminate; see Table 3.1 for their definitions). From the table we can see that LHDiff performs better than both SDiff and ldiff. The default setting of ldiff uses cosine similarity for mapping hunks and Levenshtein distance for mapping lines; we considered all possible combinations of metrics and tokenizers. While for the Reiss benchmark ldiff correctly maps around 96.5% of lines, the performance drops significantly for the Eclipse benchmark. This is consistent with the results of Spacco and Williams. On the other hand, although W BESTI LINE performs similarly to LHDiff on the Reiss benchmark, its accuracy drops to around 53% for the Eclipse benchmark, which contains a large number of changes because its files are heavily edited. The performance of W BESTI LINE and ldiff varies with the number of changes that occurred to the files; both SDiff and LHDiff, however, are stable in nature. Although SDiff shows around 87% accuracy for the Reiss benchmark, according to its authors accuracy can reach up to 96% if curly braces and non-executable statements are ignored. For the Eclipse benchmark, SDiff produces around 74% correct results. However, SDiff takes into account the structure of source files, requires the files to be syntactically valid, and is only available for Java. LHDiff, on the other hand, is purely textual in nature and can be applied to any file (be it a source file or a plain text file). Without considering structural information, LHDiff provides around 97% correct mappings for the Reiss benchmark, and for Eclipse the result is around 83%. Among the incorrect mappings we found for the Eclipse benchmark, the largest share (more than 57%) was due to elimination.
For the NetBeans benchmark (see Table 3.2), LHDiff shows better results than the other techniques. While LHDiff detects around 86% correct mappings, SDiff detects only around 75%. Although ldiff detects more correct mappings (81%) than SDiff, it requires more computation time.
3.4.3 Discussion
In this section, we first explain the reasons behind the poor results of W BESTI LINE and ldiff on the Eclipse benchmark, which originally motivated us to develop another language-independent source location tracking technique. We then compare LHDiff with the content tracking technique of Git and present another study in which we tried to improve the accuracy of LHDiff using tokenization.
Why did the technique recommended by Reiss or ldiff not perform well with the Eclipse benchmark?
The algorithm recommended by Reiss (W BESTI LINE) works fine as long as the context and content of a line do not change significantly; it fails when both go through a large amount of change. Another possible threat to this approach is the addition of blank lines or lines containing only stop-list words or punctuation symbols. If a developer inserts blank lines before and/or after a line, the line becomes isolated and its context differs remarkably. If the line itself also changes significantly, the algorithm fails to map the old line to the new one. Reiss did not consider this issue. However, we can eliminate this problem by ignoring blank lines, which helps locate proper contextual information. Another downside of this technique is its high running cost. For a line in the old version, the technique compares the line with every other line in the new version and selects the one that provides the best match. If the objective is to determine the new locations of a few lines of the old file, the technique may be adequate. However, in situations where we need to map every line of an old file to the new file, the approach can be expensive, particularly when the file is large. That is why we use the simhash technique in LHDiff to speed up the mapping process. W BESTI LINE tries to map only the subset of lines of an old file whose mappings are requested by a user. While this approach lessens the running time of the algorithm, it reduces the accuracy of the technique, particularly when files have gone through a large number of changes. A line li ∈ fold cannot be aggressively mapped to another line lj ∈ fnew without considering the similarity of lj to all other lines of the old file: there might be another line lk ∈ fold to which the newly mapped line lj best matches. For these reasons W BESTI LINE performed poorly on the Eclipse benchmark. The running time of ldiff improves because Unix diff is applied at the beginning to determine unchanged lines. A threat to the technique comes from the fact that, before mapping lines, ldiff tries to map a hunk from the old file to a hunk of the new file, and a line mapping process begins only if the hunk similarity exceeds a threshold value.
However, there is a possibility that the hunk similarity does not exceed the threshold value because of a size difference between the hunks, with only a few lines in common. In such situations ldiff reports the lines as deleted and added, which contributed to its poor performance on the Eclipse benchmark.
How is LHDiff comparable to the content tracking technique of Git?
Some version control systems (like CVS or SVN) change the code authorship of a line even if the line only goes through formatting changes or moves to a different location; in both cases, the line is reported as a new one. Git overcomes this limitation by tracking the content of a file across versions. The git blame -C -M command can track the movement of unchanged lines; here, -C finds code copies and -M finds code movement. However, if lines move with even slight changes, Git assigns code authorship to the new author and marks them as newly created, although ideally these lines come from different locations in the previous version. To compare the content tracking technique of Git with LHDiff, we first commit each pair of file versions of a benchmark (the original and its new version) to a local repository and then apply git blame -M -n to determine the origin of the line locations of the new file. Here, -n tells Git to show the line numbers of the original commit, which is turned off by default. The results are summarized in Table 3.2. In general, the mapping accuracy is similar to diff, except that Git detects more correct mappings in some cases where lines move within files (such as the encapsulation, function split, and reordering categories).
Table 3.3: Results of LHDiff before and after applying tokenization

Benchmark   LHDiff         Correct%   Change%   Spurious%   Eliminate%
Reiss       Non-tokenize   96.98      42.50      0          57.50
            Tokenize       93.66       0        48.81       51.19
Eclipse     Non-tokenize   82.76      22.50     20          57.50
            Tokenize       82.76      47.50     32.50       20
NetBeans    Non-tokenize   85.50      18.64     13.56       67.80
            Tokenize       82.56      32.39      9.86       57.75
Besides blame, Git also offers pickaxe to dig deeper into the history. While pickaxe can find the commits that changed a block of code in a file, it cannot find the list of commits that contain the block of code. In general, Git focuses more on tracking code blocks than individual lines and stays away from advanced diff or line tracking techniques, possibly to avoid the overhead of running such algorithms. Thus, the line-level content tracking of Git is not as powerful as ldiff [16] or LHDiff.
Can tokenization improve the performance of LHDiff?
Source code can go through various cosmetic changes that affect the performance of text-based source location tracking techniques. For example, the order in which parameters appear in a method definition can change. In object-oriented programming languages, developers often use the this keyword to access object fields or methods, which does not change the behavior of the program. The access modifier of a class can change from private to public to make the class visible to all other classes. Handling these changes requires access to individual source code elements, and in this regard tokenization can assist us. Tokenization is the process of converting the sequence of characters of a source file into a set of tokens, where each token is a string of characters representing a category of symbols. Tokenization does not require parsing source files and can be implemented using regular expressions. We anticipated that working on tokens instead of raw source files might help to ignore the effect of such source code changes and thus improve the accuracy of the algorithm. To verify this, we first tokenize the source files of our benchmarks according to the lexical rules of the Java programming language. During this process we keep track of the line location of each token so that we can reconstruct source files from the tokens later. Next, we apply a set of transformation rules to normalize the token sequences (an example is shown in Figure 3.4). This step is necessary to eliminate differences between source code versions. We then reconstruct the source files using the transformed token sequences and use LHDiff to track source code lines across the file versions. Working on tokens does not improve the performance of the LHDiff algorithm. While for the Eclipse
benchmark tokenization does not change mapping quality, performance drops by 3% for the NetBeans benchmark. When we examined the lines that were not correctly mapped by LHDiff, we found that working on tokens eliminates differences between file versions. In some cases this helps to correctly map lines that were earlier detected as deleted (the percentage of incorrect mappings in the eliminate category drops from 67.8% to 57.75% for the NetBeans benchmark). However, in other cases it leads to false mappings (the percentage of spurious mappings increases from 18.64% to 32.39%). We observe a similar picture for the other benchmarks. Though tokenization does not require parsing of source code, it requires knowledge of the tokens of the programming language. Since our objective is to develop a language-independent solution, LHDiff does not include tokenization as part of the detection process. Moreover, our study results reveal that tokenization can even lead to poorer results.

[Figure 3.4 illustrates tokenization: for example, the lines int sum = 0; and int number = 0; are transformed into the token sequences num sum op val colon and num number op val colon.]

Figure 3.4: Example of tokenization
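A regex-based token normalization of the kind illustrated in Figure 3.4 can be sketched as follows. The category names follow the figure (num, op, val, colon), but the exact rule set is our illustrative assumption; identifiers are kept verbatim, as in the figure.

```python
import re

# Illustrative lexical categories; the thesis does not list its exact rules.
TOKEN_SPEC = [
    ("num",   r"\b(?:int|long|float|double)\b"),  # type keywords
    ("val",   r"\b\d+\b"),                        # numeric literals
    ("op",    r"[=+\-*/<>]+"),                    # operators
    ("colon", r";"),                              # statement terminator
    ("name",  r"[A-Za-z_]\w*"),                   # identifiers (kept as-is)
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(line):
    """Replace lexical elements with category symbols, normalizing away
    type changes and literal edits while preserving identifier names
    (cf. Figure 3.4: 'int sum = 0;' becomes 'num sum op val colon')."""
    out = []
    for m in PATTERN.finditer(line):
        kind = m.lastgroup
        out.append(m.group() if kind == "name" else kind)
    return " ".join(out)

print(tokenize("int sum = 0;"))  # -> num sum op val colon
```

Two lines that differ only in a literal or a declared type normalize to the same token sequence, which is exactly why tokenization sometimes recovers mappings and sometimes produces spurious ones.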
Although the above results show that LHDiff outperforms other methods, they do not explain how particular kinds of source code changes affect line tracking techniques. To find out, we use a mutation-based analysis that considers the editing taxonomy of source code changes described in Section 3.5.1.
3.5 Evaluation Using Mutation-based Analysis
While previous studies evaluate source location tracking techniques against manually verified line mapping data, there is little discussion of the types of changes appearing in that data. This raises the question of how stable or vulnerable the techniques are to different change groups and makes it difficult to compare them in an objective fashion. Instead of randomly selecting lines from source code and tracking their positions in subsequent versions, we use a taxonomy of source code changes to synthesize new lines and evaluate the techniques objectively. Toward this goal, this section first presents an editing taxonomy that captures typical actions performed by developers during source code evolution and then uses a mutation-based analysis to evaluate line tracking techniques based on the taxonomy.
3.5.1 Editing taxonomy of source code changes
We developed an editing taxonomy by studying a large body of published work [34, 39, 65, 67], a clone taxonomy [119], our previous study on code clones [122], and through manual investigation of the previous benchmarks [111, 127]. We further refined our taxonomy by analyzing the changes of two open source Java
Table 3.4: Editing Taxonomy of Source Code Changes

1. Line Splitting/Merging: a statement splits across multiple lines, or the reverse.
   - An else if statement splits into two lines.
   - A statement in one line splits into multiple lines, each containing one statement, and some lines are added in between them:
     Old: if (list.get(i)%3 == 0) sum += list.get(i);
     New: if (list.get(i)%3 == 0){ System.out.println("i = " + i); sum += list.get(i); }
   - Another example of line splitting of the above kind:
     Old: list.add(Integer.parseInt(line));
     New: number = Integer.parseInt(line); System.out.println(number); list.add(number);

2. Function Splitting/Merging: a developer creates a function readFile with some lines from sum and replaces those lines in sum by a call to that function in the new version of sum, or vice versa:
   Old: public int sum(File file){ //Read numbers from the file //Store them in a list ... //Now process the list //Calculate sum of the numbers ... return sum; }
   New: public ArrayList readFile(File file){ ... return numberList; } public int sum(File file){ ArrayList list = readFile(file); //Now process the list //Calculate sum of the numbers ... return sum; }

3. Wrapping/Unwrapping: a line in the old file moves into a try-catch block in the new version of the file, or vice versa:
   Old: while((line = br.readLine()) != null)
   New: try { while((line = br.readLine()) != null) ... } catch (IOException e){ e.printStackTrace(); }

4. Change in Data Structure: the ArrayList data structure is replaced by LinkedList, or vice versa:
   Old: ArrayList list = new ArrayList();
   New: LinkedList list = new LinkedList();

5. Renaming: variable sum is renamed to total, or vice versa:
   Old: sum = sum + list.get(i); return sum;
   New: total = total + list.get(i); return total;

6. Code Reordering: functions sum and readFile are reordered, or vice versa.

systems (NetBeans4 and iText5). We chose Java because our familiarity with this programming language helped us determine the correct mapping of lines with less confusion. Discussing the frequency of the changes in software systems is out of the scope of this study.
Table 3.4 shows our editing taxonomy of source code changes. For each change type, we also provide an example that shows the change from the old version of a file to the new version of that file. The edit operations described in the taxonomy are not mutually exclusive; they can be applied together to create more complex changes. To save space, simple edits (addition, deletion, or modification of lines) are not discussed, although they are the building blocks of all other kinds of edits.
The first group of changes in our taxonomy is line splitting/merging. The use of code formatters can trigger line splitting to improve the readability of a source file. Table 3.4 shows three different kinds of line splitting examples; we do not show line merging, since it is simply the opposite of line splitting. Function splitting/merging is the second change type in our taxonomy. A function may be merged with another for service consolidation or code clone elimination, or split into smaller functions. The term wrapping refers to a change where an old line is moved inside a block of code; we define the opposite action as unwrapping. Wrapping makes a line difficult to trace even with contextual information, since the context may differ, and the situation is worse if the line itself changes significantly. In some cases, one form of wrapping can change into another; for example, a line attached to an if statement can be moved to the else part. We also found scenarios where data structures used in previous versions are replaced with other kinds. For example, a HashTable can be replaced by a Map, or an ArrayList implementation can be replaced by a LinkedList (see Table 3.4(4)). Changes of this kind may or may not preserve the textual similarity of lines and pose a threat to line mapping techniques. Changes to identifier names are common in source code evolution, particularly where copy-paste programming is involved; location tracking techniques generally perform well when identifiers have descriptive names. Other language constructs, such as function, method, or class names, can also be renamed. The last change type in our taxonomy is code reordering, where functions or blocks of code are reordered.
3.5.2 Experiment details
In the generation phase, we mutate source files. First, we select a source file in version vi and make a copy of it. The mutation is carried out either by injecting code into, or changing existing code fragments in, the copied file according to the change scenarios listed in our taxonomy of source code changes. To avoid bias in the comparison, we did not consider the mapping of comments or curly braces in our analysis, since SDiff ignores mapping them.
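As a concrete illustration of how one mutation operator from the taxonomy might be applied to a copied file, the sketch below implements a renaming mutation (taxonomy item 5). The regex-based whole-word replacement is an illustrative simplification of the idea, not the exact mutation machinery used in the study:

```java
// Sketch of one mutation operator from the taxonomy (item 5, renaming):
// given a line of the copied file, rename a variable wherever it occurs
// as a whole identifier. Names and mechanics are illustrative only.
public class RenameMutation {

    // Replace identifier `oldName` with `newName`, matching whole words
    // only, so that renaming "sum" leaves "sumTotal" intact. Assumes the
    // identifier contains no regex metacharacters.
    static String rename(String line, String oldName, String newName) {
        return line.replaceAll("\\b" + oldName + "\\b", newName);
    }

    public static void main(String[] args) {
        String mutated = rename("sum = sum + list.get(i);", "sum", "total");
        assert mutated.equals("total = total + list.get(i);");
        // Whole-word matching protects longer identifiers.
        assert rename("sumTotal += sum;", "sum", "x").equals("sumTotal += x;");
        System.out.println(mutated);
    }
}
```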
4 https://netbeans.org/
5 http://www.sourceforge.net/projects/itext
Table 3.5: Results of mutation-based evaluation

Method                        | Change in Data Structure | Wrapping | Line Split | Function Split | Renaming | Reordering
W-BESTI-LINE                  | 18.8 | 86.7 | 4.3  | 93.4 | 17.9 | 82.7
LHDiff                        | 56.3 | 90   | 68.1 | 93.4 | 35   | 83.8
Unix diff                     | 6.3  | 80   | 2.1  | 55.7 | 0    | 13.5
Git                           | 6.3  | 80   | 2.1  | 85.3 | 0    | 66.2
ldiff [-i 0]                  | 12.3 | 80   | 40.1 | 55.7 | 42.5 | 13.5
ldiff [-i 1]                  | 6.3  | 90   | 44.7 | 86.9 | 35   | 52.7
ldiff [-i 3]                  | 6.3  | 90   | 40.7 | 95.1 | 35   | 78.4
ldiff [-i 5]                  | 6.3  | 90   | 40.7 | 95.1 | 35   | 82.4
SDiff                         | 81.3 | 80   | 51.1 | 10   | 10   | 82.4
SDiff-probable                | 93.8 | 80   | 51.1 | 10   | 10   | 82.4
SDiff-possible-probable       | 93.8 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-min                     | 87.5 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-min-probable            | 93.8 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-min-possible-probable   | 93.8 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-token                   | 87.5 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-token-probable          | 93.8 | 80   | 51.1 | 15   | 15   | 82.4
SDiff-token-possible-probable | 93.8 | 80   | 51.1 | 15   | 15   | 82.4
3.5.3 Results
The results of our mutation-based analysis are summarized in Table 3.5. In most change groups, LHDiff either outperforms the other techniques or performs very close to the best technique, with the exception of the change in data structure category. In that case, its accuracy drops to 56.3%, whereas SDiff correctly detects around 93% of the mappings. When we investigated the reasons, we found that both context and content change significantly in these cases, which leads to this poor result.
W-BESTI-LINE suffers from the problem of detecting the boundaries of language constructs (such as a statement) and thus is vulnerable to line splitting. SDiff operates at the statement level and can detect line splitting, but its accuracy varies. For example, in cases where lines are added between the split lines (see Table 3.4(1), second example), SDiff cannot determine the correct mappings. In our mutation-based analysis, LHDiff performs reasonably well in detecting line splitting. Manual investigation reveals that even after splitting, the resulting lines share some degree of similarity with the original line.
Although W-BESTI-LINE and ldiff work at the line level, they can track the movement of lines caused by function splitting/merging. SDiff, in contrast, is affected by function or method splitting and cannot detect a block of lines that is moved to other functions; it reports them as deleted. For instance, SDiff cannot track lines that are moved from function sum to readFile (see Table 3.4(2)) and reports them as deleted. The reason is that SDiff first tries to map functions, and if it finds a mapping, it then tries to map their lines. For this example, SDiff finds a mapping between sum-old and sum-new and then maps their lines without considering that lines of the sum function can also be moved to other functions.
Renaming of variables, functions, or classes also affects line tracking techniques: ldiff, LHDiff, and W-BESTI-LINE may or may not map the line containing a function declaration, depending on the amount of change that occurred in the file. SDiff, however, uses an origin analysis technique [67] to map functions across versions and thus can determine the renaming of functions. In the case of reordering, both LHDiff and W-BESTI-LINE show good performance. As expected, Unix diff cannot detect any reordering, since reordered lines cause Unix diff to report them as additions and deletions. While SDiff shows 82.4% accuracy in detecting changes via reordering, Unix diff ranks last.
3.6 Threats to Validity
There are a number of threats to the validity of this study. In this section we discuss them in brief.
First, the mapping of a line is subjective in nature, and we cannot guarantee that there are no incorrect mappings in our benchmarks. However, we tried to minimize the number of false mappings as much as possible. We manually investigated all mappings, including those in the Reiss and Eclipse benchmarks. In cases where it was difficult to correctly map a line or where there was a disagreement, the mapping was removed to avoid ambiguity. Second, given the small sample size of the benchmarks, one can argue that the sample may not represent the population and thus that the performance observed in our study does not reflect real-world scenarios. However, finding changes of lines across versions and determining their correctness is a time-consuming task. We carefully validated the correctness of existing benchmarks and developed a new benchmark containing different types of changes. We also used a mutation-based analysis to compare our technique with other approaches. Third, lines can be changed in various ways. In this study, we describe various kinds of changes that can affect the evolution of a line. Although we cannot guarantee that the benchmarks contain all change types, we tried to minimize the effect of change types on the evaluation by collecting line change data at random. Finally, we penalize line tracking techniques with the same weight for the different false mapping types (spurious, change, and eliminate). While one can argue for weighting false mapping types differently, we highlight that this is the same scoring scheme used in previous studies [111], and we followed it to make our results comparable with others.
3.7 Conclusion
We propose a novel, language-independent line tracking approach called LHDiff. Our experiments show that lines can be tracked effectively across versions with LHDiff, that the context of a line can be captured by leveraging the locality of source code, and that the context matching process can be significantly accelerated using the simhash technique. We not only evaluate LHDiff with benchmarks created from real-world applications but also use a mutation-based analysis to evaluate it with
other line tracking techniques against different types of source code changes. The results reveal that LHDiff is more effective than any other language-independent line tracking technique. We also compared LHDiff with SDiff, a state-of-the-art language-dependent technique, and found that in most cases LHDiff provides better results. The mutation-based analysis also enables us to explain the strengths and weaknesses of line tracking techniques and can help a user decide when to use which technique. LHDiff incorporates some features from W-BESTI-LINE, another simple line tracking technique developed by Reiss, whose potential was not fully explored in that early work, possibly because the work focused on comparing a large number of techniques and the benchmark used in that experiment contained only small changes. LHDiff is reasonably fast (comparable in speed to SDiff), requires a small amount of memory, can easily be incorporated into source code control systems, and can be used with arbitrary text files. For future work, we plan to identify the degree of structural knowledge required to reduce incorrect mappings. The current implementation of LHDiff uses only information available within files, and we would like to explore whether information stored in source code control systems can assist in mapping lines. While this work focuses on tracking lines, visualizing this information poses another challenge that we also want to address. The code of LHDiff, the data files used in this experiment, and complete evaluation results can be found online.6
6 https://muhammad-asaduzzaman.com/research/
Chapter 4 A Simple, Efficient, Context Sensitive Approach for Code Completion
The previous chapter described a technique to track source code lines across versions. The study showed that one way to capture the context of a source code line is to consider only those tokens that appear within the top-4 lines. It also showed the benefits of the simhash technique, which can accelerate the line matching process. This chapter uses both observations to improve the performance of method call completion techniques. The study results, including the implementation of a method completion technique, have been published in a leading software conference [7, 6] and in a journal [8].
4.1 Introduction
Developers rely on frameworks and libraries of APIs to ease application development. While APIs provide ready-made solutions to complex problems, developers need to learn to use them effectively. The problem is that due to the large number of APIs, it is practically impossible to learn and remember them completely. To avoid developers having to remember every detail, modern integrated development environments provide a feature called Code Completion, which displays a sorted list of completion proposals in a popup menu for a developer to navigate and select. In a study on the Eclipse IDE, Murphy et al. [90] found that code completion is one of the top ten commands used by developers, indicating that the feature is crucial for today’s development. In this chapter, we focus our attention on method and field completion since these are the most frequently used forms of code completion [114] (other forms of code completion include word completion, method parameter completion, and statement completion). In the remainder of this chapter, we use the term Code Completion to refer to method and field completion unless otherwise stated. Existing code completion techniques can be divided into two broad categories. The first category uses mainly static type information, combined with various heuristics, to determine the target method call, but does not consider previous code examples or the context of a method call. A popular example is the default code completion system available in Eclipse, which utilizes a static type system to recommend method calls. It sorts the completion proposals either alphabetically or by relevance before displaying them to users in a popup menu. Hou and Pletcher [47] developed another technique, called Better Code Completion (BCC), that uses a combination of sorting, filtering and grouping of APIs to improve the performance of the default
String line;
StringBuilder sb = new StringBuilder();
br = new BufferedReader(new FileReader("file.txt"));
try {
    while ((line = br.readLine()) != null) {
        sb.append(line);
        sb.append('\n');
    }
} finally {
    br.close();
}
Figure 4.1: An example of reading a file where readLine is the target of code completion

type-based code completion system of Eclipse. The second category of techniques takes into account previous code examples and usage context matching to recommend target method calls [93, 18, 45]. For example, to make recommendations, the Best Matching Neighbor (BMN) [18] code completion system matches the current code completion context to previous code examples using the k-Nearest Neighbour (kNN) algorithm. BMN has successfully demonstrated that the performance of method call completion can be improved by utilizing the context of a target API method call. BMN focuses on a special kind of context for a given call site, i.e., the list of methods that have been invoked on the same receiver variable plus the enclosing method of the call site. However, there are many other possible forms of context to consider. As an example, consider the code shown in Figure 4.1, where a file is read via the BufferedReader object br in a while loop. The readLine method is commonly called as part of a while loop's condition located inside a try-catch block. Within a few lines of the readLine call, developers usually create various objects related to that method call. For example, developers typically create a BufferedReader object from an instance of FileReader and later use that object to call the readLine method. Therefore, in addition to the methods previously called on the receiver object br, keywords (such as while, try, and new) and constructor calls (such as FileReader and BufferedReader) can be considered part of the context of readLine as well. Adding these extra pieces of information can enrich the context of the targeted call and help recommend more relevant methods (readLine, in this case). In this study, we explore the performance implications of these additional forms of context for code completion.
To this end, we first propose a context-sensitive code completion technique, called CSCC (Context Sensitive Code Completion), that leverages code examples collected from repositories to extract method contexts to support code completion. Given a method call, we capture as its context any method names, Java keywords, and class or interface names that appear within the four preceding lines of code. In this way, we build a database of context-method pairs as potential matching candidates. We use tokenization rather than parsing and advanced analysis to collect the context data, so our technique is simple. When completing code, given the receiver object, we use its type name and context to search for method calls whose contexts match with
that of the receiver object. To scale up the search, we use the simhash technique [24] to quickly eliminate the majority of non-matching candidates. This allows us to further refine the remaining, much smaller set of matching candidates using more computationally expensive textual distance measures, making our technique efficient and scalable. We sort the matching candidates by similarity score and recommend the top candidates to complete the method call. We compare our context-sensitive code completion technique with five other state-of-the-art techniques [18, 48] using eight open source software systems. Results from the evaluation suggest that our proposed technique outperforms all five of the state-of-the-art type- or example-based systems we compared against. Moreover, to understand how exactly the context of a method call affects code completion, we propose a taxonomy for the contextual elements of a method call and compare how different techniques perform for each category of contextual elements (see Section 4.4.3). This experiment helps us clearly understand the strengths and limitations of CSCC and other existing techniques. We also conduct a set of experiments on CSCC to uncover the effect of context length, the impact of different context information, effectiveness in cross-project prediction, performance for different frameworks or libraries, and the effect of building context using the four lines following a method call. After releasing CSCC as an Eclipse plugin, we received a number of requests to extend support to fields. During our investigation, we found that the context CSCC uses for method call completion can easily capture the context of field accesses too. Evaluation of field completion on eight different subject systems against four state-of-the-art code completion techniques revealed that CSCC outperforms all four techniques by a substantial margin.
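The simhash-based filtering step described above can be sketched as follows. This is an illustrative implementation of the classic simhash fingerprinting idea over context tokens, with Hamming distance as the cheap similarity proxy; the 64-bit FNV-1a token hash is our own assumption, not necessarily the hash CSCC uses:

```java
import java.util.List;

// Hedged sketch: contexts are reduced to 64-bit fingerprints, and the
// Hamming distance between fingerprints cheaply approximates context
// similarity, so most non-matching candidates can be discarded before
// any expensive string comparison is attempted.
public class SimhashFilter {

    // Simple 64-bit FNV-1a hash for tokens (illustrative choice).
    static long tokenHash(String token) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < token.length(); i++) {
            h ^= token.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Classic simhash: each token votes +1/-1 on every bit position;
    // the fingerprint keeps the sign of the vote total per bit.
    static long simhash(List<String> contextTokens) {
        int[] votes = new int[64];
        for (String t : contextTokens) {
            long h = tokenHash(t);
            for (int b = 0; b < 64; b++)
                votes[b] += ((h >>> b) & 1) == 1 ? 1 : -1;
        }
        long fp = 0L;
        for (int b = 0; b < 64; b++)
            if (votes[b] > 0) fp |= 1L << b;
        return fp;
    }

    static int hamming(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        List<String> ctx1 = List.of("new", "BufferedReader", "FileReader", "while", "readLine");
        List<String> ctx2 = List.of("new", "BufferedReader", "FileReader", "while", "try");
        // Identical contexts collapse to identical fingerprints.
        assert hamming(simhash(ctx1), simhash(ctx1)) == 0;
        // Similar contexts tend to land at small Hamming distances.
        System.out.println("distance = " + hamming(simhash(ctx1), simhash(ctx2)));
    }
}
```

Candidates whose fingerprint distance exceeds a threshold can be dropped immediately; only the survivors proceed to the expensive LCS and Levenshtein comparisons.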
Recently, a number of techniques have been developed that use statistical language models for predicting the next token or completing API elements [45, 96, 93, 137]. They use different information sources to capture the context, including token sequences, caching, API usage graphs, or a combination of them. We are interested in how well CSCC performs compared to those techniques. Toward this goal, we compare CSCC with four other statistical language model techniques and present the study results in this chapter. We make the following contributions:
(1) A technique called CSCC to support code completion using a new kind of context and previous code examples as a knowledge-base,
(2) A quantitative comparison of the proposed technique CSCC with five existing state-of-the-art tools that shows the effectiveness of our proposed technique,
(3) A taxonomy of method call context elements and an experiment that helps to identify strengths and limitations of CSCC and other existing techniques (see Section 4.4.3 for details of the taxonomy),
(4) A set of studies that helps to understand different aspects of the technique (see Section 4.5),
(5) Extending CSCC for field completion and a quantitative comparison with four other state-of-the-art tools, and
(6) A comparison of CSCC with four state-of-the-art statistical language model code completion techniques.
The remainder of the chapter is organized as follows. Section 4.2 briefly describes related work. Section 4.3 describes our proposed technique CSCC. Section 4.4 compares CSCC with various code completion techniques using eight open source software systems for completing API method calls. We conduct a set of studies to uncover different aspects of the technique and present the results in Section 4.5. Section 4.6 describes the extension of the technique to support field completion including evaluation with four other state-of-the-art tools. We compare CSCC with statistical language model based code completion techniques in Section 4.7. Section 4.8 summarizes the threats to validity. Finally, Section 4.9 concludes the chapter.
4.2 Related Work
Bruch et al. [18] propose the Best Matching Neighbors (BMN) completion system, which uses the k-nearest neighbor algorithm to recommend method calls for a particular receiver object. The most fundamental difference between BMN and CSCC lies in their definitions of context. Our definition of a method call context includes any method names, keywords, and class or interface names within the four lines prior to a method call, whereas BMN's context consists of the set of methods that have been called on the receiver variable plus the enclosing method. Due to this difference, BMN and CSCC use different techniques to calculate similarities and distances. Lastly, BMN uses the frequency of method calls to rank completion proposals, whereas CSCC ranks them based on distance measures. Hou and Pletcher [48, 47] propose a code completion technique that uses a combination of sorting, filtering, and grouping of APIs, implemented in a research prototype called Better Code Completion (BCC). BCC can sort completion proposals based on the type hierarchy or the frequency count of method calls, and it can filter non-API public methods. BCC also allows developers to manually specify a set of methods that are logically related and thus belong to the same group and appear together when completion proposals are displayed in a popup menu. However, BCC does not leverage previous code examples, and its filters must be manually specified, which only expert users of code libraries and frameworks can do. In contrast, because CSCC considers the usage context of method calls to recommend completion proposals, methods that are not appropriate to call in a particular context are automatically filtered out, so CSCC requires less effort to use than BCC. Another important work is GraLan, a graph-based statistical language model that targets completing API elements [93].
The term API element refers to method calls, field accesses, and control units (such as for, while, and if) used in API usage examples. The technique mines source code to generate API usage graphs, called Groums. Given an editing location, the technique determines the API usage subgraphs surrounding the current location and uses them as context. GraLan then computes the probability of extending each context graph with an additional node. These additional nodes are collected, ranked, and recommended for code completion. CSCC differs from GraLan in terms of context definition, recommendation
formulation, and ranking strategy. For example, GraLan ranks completion proposals based on their probability distributions, but CSCC uses textual distance measures.
Besides GraLan, a number of code completion systems have been proposed that use statistical language models. Hindle et al. develop a code suggestion engine that uses the widely adopted N-gram model to recommend the next token [45]. Tu et al. develop a language model (known as Cache LM) that uses a cache component to capture the localness of software [137]. Franks et al. develop an Eclipse plugin, called CACHECA, that combines the native suggestions made by the Eclipse IDE with those of the cache language model [36]. Nguyen et al. propose a statistical semantic language model for source code, called SLAMC [96]. The model incorporates the semantic information of code tokens, and it captures global concerns using a topic model and pairwise associations of language elements. The model has been used to develop a code suggestion engine that predicts the next token. SLANG collects sequences of method calls to create a statistical language model [110]. When a developer requests a code completion, the tool completes the editing location using the highest-ranked sequence of method calls computed by the language model. While almost all of these techniques work at the lexical level, CSCC and SLANG work at the API level. CSCC differs from SLANG in context formulation: it collects not only method calls but also keywords and type names.
Nguyen et al. [94, 92] use a graph-based algorithm to develop a context-sensitive code completion technique, called GraPacc. The technique mines API usage graphs, or Groums, from open source code bases to capture API usage patterns, creating an API usage database. During development, the technique extracts context-sensitive features and matches them against the usage patterns in the database. It then recommends a list of matched patterns to complete the remaining code. Although both GraPacc and CSCC utilize code context to make recommendations, their goals and approaches differ: GraPacc recommends multiple statements at a time, whereas CSCC completes a single method call. Similar to GraLan, GraPacc leverages Groums to identify code context. Since the objective of CSCC is similar to that of GraLan, which we already include in our study, we did not compare with GraPacc.
Robbes and Lanza [114] propose a set of approaches to support code completion that use program history to recommend completion proposals. They define a benchmark to measure the usefulness of a code completion system and evaluate eight different code completion algorithms. They found that the typed optimist completion technique provides better results than the other techniques, since it merges the benefits of two different approaches. The program change history can be considered the temporal context of a method call, whereas ours is the spatial context. While their technique requires a change-based software repository to collect program history, our technique can work with any repository.
Research on recommending source code examples is also related to our study because of its use of context. Among the various works on code example recommendation, the most relevant to ours is that of Holmes and Murphy [46]. They use the notion of structural context to recommend source code examples. The context in their case contains information about inheritance, method calls, and types declared or used
in a method. Although our method call usage context shares some similarity with theirs, the objectives are completely different. There are also a number of other techniques and tools that make use of previous code examples, but their goals differ from ours. For example, Calcite helps developers correctly instantiate a class or interface using existing code examples [87]; while Calcite helps instantiate a class, we help developers complete method calls. Precise mines existing code bases to recommend appropriate parameters for method calls [144]. Hill and Rideout [44] focus on automatic completion of a method body by searching for similar code fragments or code clones in a code base. Lee et al. [74] introduce a technique that identifies changes to program entities (such as method names) during the evolution of a software system. When a developer requests a completion, the technique presents the program entities along with their changes through code completion; it also helps developers navigate to past code examples to see the changes. Jacobellis et al. [54] leverage code completion to automate edit operations on source code from user-specified, reusable templates of code changes. Keyword programming [78] is also related to our study, but it defines a completely different mode of user interaction for code completion: instead of typing a method name, users type keywords that hint at the method they are trying to call, and the algorithm automatically completes the method call or makes appropriate suggestions to complete the remaining part. Han and Miller [40] later introduce abbreviation completion, which uses a non-predefined set of inputs to complete the target method call. The previously discussed techniques all present completion proposals in floating menus, and none focuses on improving the user interface. Omar et al.
[103] present an approach that allows library developers to integrate specialized interfaces, called palettes, directly into the editor. The benefit of the approach is that developers do not need to write the code explicitly; rather, the specialized interface lets users provide the required input and generates the appropriate code. They developed a tool named Graphite that allows Java developers to write a regular expression in a palette, and found that such a specialized interface is helpful to professional developers. Instead of focusing on a specialized interface for code completion, we focus on predicting target method calls as a basis for suggesting completion proposals. However, palettes could complement our technique for completing method parameters, which remains future work.
4.3 Proposed Algorithm
In this section, we describe our algorithm for finding method calls to recommend for a target object. Figure 4.2 presents an overview of the process. Our example-based, context-sensitive code completion system works in three steps:
• Collect the usage context of API method calls from code examples and index them by their contexts to support quick access. We model the context of an API method call by method calls, Java keywords
and type names that appear within the four lines prior to the receiver object that called the method. We hypothesize that these elements around the target method call can provide a better, fuller context than other approaches [18, 48].

Figure 4.2: An overview of CSCC's recommendation process, starting from a code completion request.
• Search for method calls whose context matches with that of the target object. One approach would be to measure the similarity between the context of the target object and that of each method call in the example code base directly using string edit-distances. However, string edit-distance operations are computationally expensive. To speed up the search, we instead use the Hamming distance over the simhash values as similarity measures. We determine a smaller list of method names that are the more likely candidates for code completion, which we refer to as the candidate list.
• Synthesize the method calls from the candidate list. For each method name in the candidate list, we use a combination of token-based Longest Common Subsequence (LCS) and Levenshtein distance to determine a similarity value of its context with that of the receiver object. We then sort the method names in descending order of similarity value and recommend the top-3 names to complete the method call.
We describe the three steps in detail as follows.
4.3.1 Collect usage context of method calls
In this step, CSCC mines code examples to find the usage contexts of API method calls. To capture a method call's context, we consider the content of the n lines up to and including the line where the target method call appears. In this study, we use n = 4, and we validate this decision in Section 4.5. We collect the following three kinds of information from the four lines of context, which we refer to as the overall context of the method call:
(1) Any method names,
(2) Any Java keywords except access specifiers, and

(3) Any class or interface names.

As an example, consider the following code fragment:

    10. getButton.setEnabled(false);}
    11. protected Control createContents(Composite parent){
    12.     Text text = new Text(parent, SWT.MULTI | SWT.READ_ONLY | SWT.WRAP);
    13.     text.setForeground(JFaceColors.getErrorText(text.getDisplay()));

Our algorithm determines the following context for the getDisplay method:

    Receiver object type: org.eclipse.swt.widgets.Text

    Overall context (element, line number):
        1. getErrorText (13)    2. setForeground (13)    3. Text (12)    4. new (12)
        5. Text (12)            6. Composite (11)        7. createContents (11)    8. Control (11)

    Line context (element, line number):
        1. getErrorText (13)    2. setForeground (13)

Figure 4.3: Overall context and line context for getDisplay.
When extracting the overall context, we ignore blank lines, comment lines, and lines containing only curly braces, and we remove any duplicate tokens from the overall context. In addition, we separately collect a line context for the target method call, which includes any method names, keywords (except access specifiers), class or interface names, and assignment operators that appear on the same line but before the target method call. When the overall contexts are completely different and fail to match, line contexts act as a secondary matching criterion.

To further explain the construction of both the overall and line contexts, consider the method call at line 13 as shown in Figure 4.3. The contents of both contexts include tokens and their locations. Note that although line 10 is within four lines of our target method call getDisplay, it is not considered part of the context because it is located outside the createContents method that contains the target call.

We use a two-level indexing scheme to organize the collected usage contexts of method calls (see Figure 4.4 for an example). The type name of a receiver object is used to group all method calls that have been invoked on that type, and an inverted index structure1 is used to organize each such group of method calls. More specifically, an inverted index is a data structure that maps each term to its locations in a set of documents. We represent each overall context of a method call as a document, and use the tokens from the context to index the set of documents in which they appear.
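To illustrate the extraction rules above, the following sketch computes an overall context from pre-tokenized lines. This is a simplification of what CSCC actually does (CSCC works on the Java AST, tracks token kinds and line numbers, and skips blank, comment, and brace-only lines); the class and method names here are our own, not from the thesis.

```java
import java.util.*;

// A simplified sketch of overall-context extraction (not the thesis
// implementation). Each line is assumed to be pre-tokenized into the
// method names, keywords, and type names it contains.
class ContextExtractor {
    static final Set<String> ACCESS_SPECIFIERS =
            Set.of("public", "protected", "private");

    // Overall context: unique tokens from the last n lines, in order of
    // appearance, with access specifiers removed.
    static List<String> overallContext(List<List<String>> lines, int n) {
        List<String> context = new ArrayList<>();
        int start = Math.max(0, lines.size() - n);
        for (List<String> line : lines.subList(start, lines.size()))
            for (String token : line)
                if (!ACCESS_SPECIFIERS.contains(token)
                        && !context.contains(token))
                    context.add(token);
        return context;
    }
}
```

For the code fragment of Figure 4.3, feeding in the tokens of lines 11 to 13 would yield a deduplicated token list resembling the overall context shown in the figure.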
1 http://www-nlp.stanford.edu/IR-book/
[Figure 4.4 illustrates the scheme with javax.swing.JInternalFrame: an API method call such as setSelected yields a context document (overall context: try, add, setVisible, JInternalFrame, new; line context: empty), and the inverted index for the type maps each term (add, JInternalFrame, new, setVisible, try, ...) to the ids of the documents in which it appears.]
Figure 4.4: Database of method call contexts are grouped by receiver types using an inverted index structure. The figure on the left side shows the collected context information for an API method call. The figure on the right side shows how we use the information to build an inverted index structure.
4.3.2 Determine candidates for code completion
When a user requests a method call completion (for example, in Eclipse typing a dot (.) after an object name initiates such a request), the algorithm first extracts both overall and line contexts for the receiver object. To find candidate methods for code completion, we match the current context to those extracted from the example code base. Specifically, we use the type name of the receiver object as an index to determine the related inverted index structure, which contains all method calls made on the receiver type (Figure 4.4). We then use tokens from the overall context as keys to the inverted index structure to collect all those method calls in the code examples that have the same type as the receiver object. We refer to these matching method calls as the base candidate list.
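The two-level lookup described above can be sketched as follows. This is an illustrative data model, not CSCC's actual implementation; all class, field, and method names are our own.

```java
import java.util.*;

// Sketch of the two-level index: receiver type -> inverted index
// (term -> set of context-document ids). Each document id also maps to
// the method call that the example completes.
class CandidateIndex {
    final Map<String, Map<String, Set<Integer>>> typeToIndex = new HashMap<>();
    final Map<Integer, String> docToMethod = new HashMap<>();
    int nextDoc = 0;

    void addExample(String receiverType, List<String> overallContext,
                    String calledMethod) {
        int doc = nextDoc++;
        docToMethod.put(doc, calledMethod);
        Map<String, Set<Integer>> index =
                typeToIndex.computeIfAbsent(receiverType, k -> new HashMap<>());
        for (String term : overallContext)
            index.computeIfAbsent(term, k -> new HashSet<>()).add(doc);
    }

    // Base candidate list: every indexed document of the receiver's type
    // that shares at least one term with the query context.
    Set<Integer> baseCandidates(String receiverType, List<String> queryContext) {
        Map<String, Set<Integer>> index =
                typeToIndex.getOrDefault(receiverType, Map.of());
        Set<Integer> result = new HashSet<>();
        for (String term : queryContext)
            result.addAll(index.getOrDefault(term, Set.of()));
        return result;
    }
}
```

Grouping by receiver type first keeps each inverted index small, so a completion request only ever searches the contexts of calls made on the same type.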
The base candidate list often contains thousands of method calls, so we need to reduce it to a small number of the most likely candidates before making recommendations. We follow a two-step process. First, we use the simhash technique to determine a short list of method names (currently the top 200 deemed most similar to the target context) that are more likely to complete the current method call, quickly eliminating the majority of the others; to compute string similarity, we concatenate all the tokens of each context and generate a simhash value for the concatenated string (see Figure 4.5 for an example). We use the simhash technique for this coarse filtering because it is both fast and scalable. Second, we use normalized Longest Common Subsequence (LCS) and Levenshtein distances to measure the fine-grained similarity between the target context and the context of each likely matching candidate, obtaining a refined candidate list. Calculating LCS and Levenshtein
distances are both time-consuming operations. Although they provide fine-grained similarity measures, the near real-time constraint of practical code completion means we cannot apply them directly to the base candidate list.

Figure 4.5: Context similarities are measured using the Hamming distance between simhash values.

We use the simhash technique [24] to identify the most likely method call candidates. Simhash uses a hash function to generate binary hash keys, also known as simhash values. An important property of simhash is that strings that are similar to each other have identical or very similar simhash values. Therefore, we determine the similarity between each pair of contexts using the Hamming distance of their corresponding simhash values. We use the Hamming distance of the overall context to sort the matching candidates unless the Hamming distance of the line context exceeds a predefined threshold value, in which case we use the Hamming distance of the line context as the distance measure. After sorting by similarity, we take the top-k method contexts as the likely matching candidates of the target context. After experimenting with different values of k, we found that k = 200 is a good choice, and we use that value in our study. We refer to this list as the refined candidate list.

To recommend method calls, we further sort the method names in the refined candidate list by combining both overall and line context similarities as follows. We use the normalized Longest Common Subsequence (LCS) distance to measure the similarity of the token sequences from the overall contexts, and the Levenshtein distance to measure the similarity of the token sequences from the line contexts. We sort matching candidates in descending order of their overall context similarity; in the case of a tie, we use the line context similarity. We ignore all matching candidates whose similarity values drop below a certain threshold, for which we empirically found 0.30 to be a good choice.

The simhash technique has been found effective for detecting similar pages in large collections of web documents [80] and has also been used successfully for detecting similar code fragments in code clone detection [138]. Although various hash functions are available, we use the Jenkins hash function since it was found effective in a previous study [138]. We generate a 64-bit simhash value for both the overall and line contexts of the target object, and precompute the simhash values of the example contexts to save computation time.

4.3.3 Recommend top-3 method calls
The objective of this step is to recommend a list of completion proposals (i.e., method names). Since there are many code examples associated with the same method call, the sorted list of method names obtained from the previous step may contain many duplicates. After eliminating duplicates, we present the top-3 method names to the users.
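The similarity machinery used in the last two steps, coarse simhash filtering followed by fine-grained token-sequence comparison, can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the token-hash mixing function stands in for the Jenkins hash used by CSCC, and the normalization choice and all names are our own.

```java
import java.util.List;

// Illustrative sketch of CSCC's similarity machinery: a 64-bit simhash
// with Hamming distance for the coarse filter, and token-level normalized
// LCS / Levenshtein distance for the fine-grained ranking.
class ContextSimilarity {

    // --- Coarse filter: simhash + Hamming distance ----------------------

    static long tokenHash(String token) {
        long h = 1125899906842597L;                  // arbitrary seed
        for (char c : token.toCharArray()) h = h * 31 + c;
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL; h ^= h >>> 33;  // mixing
        return h;
    }

    // Each token votes on all 64 bit positions; the sign of the vote
    // total decides each bit of the context's simhash value.
    static long simhash(List<String> tokens) {
        int[] votes = new int[64];
        for (String t : tokens) {
            long h = tokenHash(t);
            for (int i = 0; i < 64; i++)
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
        }
        long result = 0L;
        for (int i = 0; i < 64; i++)
            if (votes[i] > 0) result |= 1L << i;
        return result;
    }

    static int hamming(long a, long b) { return Long.bitCount(a ^ b); }

    // --- Fine-grained ranking: token-level LCS and Levenshtein ----------

    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++)
            for (int j = 1; j <= b.size(); j++)
                dp[i][j] = a.get(i - 1).equals(b.get(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
        return dp[a.size()][b.size()];
    }

    // Similarity in [0, 1]: shared subsequence over the longer sequence.
    static double normalizedLcs(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;
        return (double) lcsLength(a, b) / Math.max(a.size(), b.size());
    }

    static int levenshtein(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 0; i <= a.size(); i++) dp[i][0] = i;
        for (int j = 0; j <= b.size(); j++) dp[0][j] = j;
        for (int i = 1; i <= a.size(); i++)
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                dp[i][j] = Math.min(dp[i - 1][j - 1] + cost,
                        Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1));
            }
        return dp[a.size()][b.size()];
    }
}
```

Because similar token sequences tend to produce simhash values that differ in only a few bits, the Hamming distance can discard most of the base candidate list before the quadratic LCS and Levenshtein computations are applied.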
4.4 Evaluation
We evaluate our technique and compare CSCC with five state-of-the-art code completion systems using eight open-source systems (Table 4.1). For a given subject system, we determine all locations where methods from a target API have been called; this set of method calls constitutes our data set. We then apply ten-fold cross validation [18] to measure the performance of each algorithm. This is a popular way of measuring the performance of information retrieval systems [20] and has been used previously in many research projects. First, for each system we divide the data set into ten folds, each containing an equal number of method calls. Next, for each fold, we use the code examples from the nine other folds to train the technique for method call completion, and use the remaining fold to test its performance. We use recall to measure the fraction of test cases for which a technique produces a relevant (correct) recommendation. A technique that always recommends all possible methods would trivially achieve a perfect recall of 1; we therefore also consider precision, defined as the ratio between the number of times a technique produces a relevant recommendation and the number of times it produces any recommendation. The higher a relevant recommendation appears in the list of recommendations, the better, so we collect precision at the top-1, top-3, and top-10 recommendations. Finally, we use the F-measure, a widely accepted measure that combines precision and recall by computing their harmonic mean.
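As a concrete illustration, these measures reduce to simple ratio computations over three counts. The class and variable names below are our own, not from the thesis.

```java
// A small sketch of the evaluation measures. Counts are: requested = number
// of completion requests in the test data, made = requests for which the
// technique produced any recommendation, relevant = requests where a
// produced recommendation was correct.
class Measures {
    static double recall(int relevant, int requested) {
        return requested == 0 ? 0.0 : (double) relevant / requested;
    }
    static double precision(int relevant, int made) {
        return made == 0 ? 0.0 : (double) relevant / made;
    }
    static double fMeasure(double precision, double recall) {
        return precision + recall == 0 ? 0.0
                : 2 * precision * recall / (precision + recall);
    }
}
```

A technique that answers every request (made = requested) has precision equal to recall; one that answers selectively can trade recall for precision, which the F-measure balances.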
    Recall = |recommendations made ∩ relevant| / recommendations requested        (4.1)

    Precision = |recommendations made ∩ relevant| / recommendations made          (4.2)

    F-measure = (2 · Precision · Recall) / (Precision + Recall)                   (4.3)

where recommendations requested is the number of method calls in our test data for which we make a code completion request, and recommendations made is the number of times a code completion system makes a recommendation.

4.4.1 Test systems
We chose to focus on two APIs, SWT and Swing/AWT, as the targets for our evaluation. These are popular libraries extensively used for developing GUI applications. We selected four systems that use SWT. The largest is Eclipse 3.5.2, a popular open-source IDE. Vuze is a P2P file-sharing client using the BitTorrent protocol. Subversive provides support for working with Subversion directly from Eclipse. RSSOwl is an RSS newsreader. We also chose four open-source systems for AWT/Swing: NetBeans 7.3.1, the largest, is an IDE; jEdit is a text editor; ArgoUML is a UML modeling tool; and JFreeChart is a Java charting library.

4.4.2 Evaluation results
In this section, we discuss the results of evaluating and comparing CSCC with five other code completion systems (ECCAlpha, ECCRelevance, FCC (Frequency-based Code Completion), BCC, and BMN) using the eight test systems. We introduced BCC and BMN earlier. ECCAlpha and ECCRelevance are two default Eclipse code completion systems that leverage the static type system: ECCAlpha sorts the completion proposals in alphabetical order, while ECCRelevance uses a positive integer value, called relevance, to sort them; the value is calculated based on the expected type of the expression as well as the types in the code context (such as return types, cast types, and variable types). The frequency-based code completion system (FCC) considers the frequency of method calls in a model to make recommendations: the more frequently a method occurs, the higher its position in the completion proposals. Table 4.1 shows the precision, recall, and F-measure values for the six code completion systems. The top four rows are results collected for SWT, and the bottom four rows for AWT/Swing. Overall, CSCC achieves higher precision and recall values than any of the other techniques for both single and top-3 completion proposals.
2 http://www.eclipse.org/
3 http://www.vuze.com/
4 https://www.eclipse.org/subversive/
5 http://www.rssowl.org/
6 https://netbeans.org/
7 http://sourceforge.net/projects/jedit/
8 http://argouml.tigris.org/
9 http://sourceforge.net/projects/jfreechart/
Table 4.1: Evaluation results of code completion systems. Delta shows the improvement of CSCC over BMN.

Precision:
System      Proposals  ECCAlpha  ECCRel  BCC   FCC   BMN   CSCC  Delta (%)
Eclipse     Top-1      0.006     0.01    0.24  0.31  0.36  0.60  24
Eclipse     Top-3      0.052     0.20    0.52  0.49  0.63  0.80  17
Eclipse     Top-10     0.14      0.34    0.80  0.73  0.69  0.90  21
Vuze        Top-1      0.005     0.10    0.23  0.33  0.35  0.56  21
Vuze        Top-3      0.03      0.16    0.49  0.49  0.59  0.73  14
Vuze        Top-10     0.20      0.36    0.94  0.71  0.61  0.83  22
Subversive  Top-1      0.01      0.03    0.30  0.36  0.58  0.68  10
Subversive  Top-3      0.02      0.07    0.63  0.62  0.77  0.86  9
Subversive  Top-10     0.07      0.20    0.96  0.88  0.79  0.91  12
RSSOwl      Top-1      0.01      0.078   0.25  0.32  0.48  0.65  17
RSSOwl      Top-3      0.024     0.16    0.58  0.51  0.74  0.84  10
RSSOwl      Top-10     0.077     0.29    0.85  0.74  0.80  0.90  10
NetBeans    Top-1      0.12      0.13    0.34  0.29  0.43  0.66  23
NetBeans    Top-3      0.18      0.25    0.62  0.53  0.67  0.86  19
NetBeans    Top-10     0.36      0.48    0.86  0.73  0.70  0.92  22
jEdit       Top-1      0.009     0.12    0.41  0.35  0.52  0.62  10
jEdit       Top-3      0.14      0.29    0.62  0.53  0.74  0.79  5
jEdit       Top-10     0.32      0.49    0.83  0.74  0.79  0.85  6
ArgoUML     Top-1      0.03      0.13    0.40  0.32  0.46  0.58  12
ArgoUML     Top-3      0.12      0.27    0.65  0.53  0.68  0.74  6
ArgoUML     Top-10     0.27      0.47    0.83  0.78  0.74  0.81  7
JFreeChart  Top-1      0.02      0.08    0.35  0.32  0.42  0.63  21
JFreeChart  Top-3      0.05      0.19    0.52  0.63  0.76  0.85  9
JFreeChart  Top-10     0.27      0.58    0.63  0.92  0.84  0.94  10

Recall (identical for the top-1, top-3, and top-10 proposals):
System      ECCAlpha  ECCRel  BCC  FCC   BMN   CSCC  Delta (%)
Eclipse     1         1       1    1     0.77  0.99  22
Vuze        1         1       1    1     0.76  0.98  22
Subversive  1         1       1    1     0.38  0.97  60
RSSOwl      1         1       1    1     0.72  0.98  26
NetBeans    1         1       1    1     0.67  0.98  31
jEdit       1         1       1    0.98  0.70  0.94  24
ArgoUML     1         1       1    0.99  0.68  0.95  27
JFreeChart  1         1       1    1     0.75  0.98  23

F-measure:
System      Proposals  ECCAlpha  ECCRel  BCC   FCC   BMN   CSCC  Delta (%)
Eclipse     Top-1      0.008     0.020   0.39  0.47  0.49  0.75  26
Eclipse     Top-3      0.10      0.34    0.68  0.66  0.69  0.88  19
Eclipse     Top-10     0.25      0.51    0.89  0.84  0.73  0.94  21
Vuze        Top-1      0.009     0.18    0.37  0.50  0.48  0.71  23
Vuze        Top-3      0.06      0.28    0.66  0.66  0.66  0.84  18
Vuze        Top-10     0.33      0.53    0.97  0.83  0.68  0.90  22
Subversive  Top-1      0.02      0.058   0.46  0.53  0.46  0.80  34
Subversive  Top-3      0.04      0.13    0.77  0.77  0.51  0.91  40
Subversive  Top-10     0.13      0.34    0.98  0.94  0.51  0.94  43
RSSOwl      Top-1      0.20      0.14    0.40  0.48  0.58  0.78  20
RSSOwl      Top-3      0.046     0.28    0.73  0.68  0.73  0.90  17
RSSOwl      Top-10     0.14      0.45    0.92  0.85  0.76  0.94  18
NetBeans    Top-1      0.21      0.23    0.51  0.45  0.52  0.79  27
NetBeans    Top-3      0.31      0.40    0.77  0.69  0.67  0.92  25
NetBeans    Top-10     0.70      0.65    0.92  0.84  0.68  0.95  27
jEdit       Top-1      0.02      0.21    0.58  0.52  0.60  0.75  15
jEdit       Top-3      0.25      0.45    0.77  0.69  0.72  0.86  14
jEdit       Top-10     0.48      0.66    0.91  0.84  0.74  0.89  15
ArgoUML     Top-1      0.058     0.23    0.57  0.48  0.55  0.72  17
ArgoUML     Top-3      0.21      0.43    0.79  0.69  0.68  0.83  15
ArgoUML     Top-10     0.43      0.64    0.91  0.87  0.71  0.87  16
JFreeChart  Top-1      0.040     0.15    0.52  0.48  0.54  0.77  23
JFreeChart  Top-3      0.10      0.32    0.68  0.77  0.75  0.91  16
JFreeChart  Top-10     0.43      0.73    0.77  0.96  0.79  0.96  17

[Figure 4.6 shows, with code examples, the three dimensions of the taxonomy: the AST node type of the receiver expression (array access, field access, simple name, qualified name, method invocation, class instance creation), the AST node type of the parent node (method argument, constructor argument, super constructor argument, class instance creation, method invocation, assignment statement, return statement, variable declaration fragment, and the condition parts of if, for, while, and do-while statements), and method calls made within overridden methods (e.g., FractionSpring overriding Spring.getMinimumValue()).]
Figure 4.6: Taxonomy of method calls
For the top-3 proposals, it achieves a precision of 73-86% and a recall of 97-99%. The recalls for ECCAlpha, ECCRelevance, and BCC are all 1. Both ECCAlpha and ECCRelevance performed poorly, and BCC outperformed both of them. Except in a few cases, BCC also outperforms FCC. The performance of FCC is not great: while its recall is close to 100%, its precision is only 49-62% for the top-3 proposals. Interestingly, BMN did not perform well either. Although its precision is better than that of BCC and FCC for the single and top-3 suggestions, its recall is poorer in both cases. This is because BMN targets method calls on local variables, but there are many places in source code where methods are called on fields, parameters, chained expressions, or even static types. We performed further experiments to elaborate on this issue in Section 4.4.3. The results for AWT/Swing shown in the bottom four rows of Table 4.1 are consistent with those for SWT. For example, for the top-3 proposals on the largest subject system (NetBeans), the F-measure of CSCC is 15% higher than that of the closest performing technique. To test whether CSCC performed significantly better than the other techniques, we also performed directional Wilcoxon signed-rank tests for the top-3 completion proposals between CSCC and each of the other techniques. The null hypothesis is that there is no difference in precision and recall values for the top-3 completion proposals. The tests show that the differences in precision and recall between CSCC and the other techniques are statistically significant at the 0.05 level.
4.4.3 Evaluation using a taxonomy of method calls
While the evaluation in Section 4.4.2 provides a ranking of the six techniques in terms of their performance, it does not reveal what factors contribute to CSCC’s better performance. We hypothesize that it is due to CSCC’s ability to capture a fuller context for method calls. To further shed light on this hypothesis, we propose a taxonomy for the characteristics of a method context, and compare the techniques using each
category of method call characteristics within the taxonomy. Our taxonomy (Figure 4.6) includes three categories of characteristics for the context of a target method call: the AST node types of its receiver expression, the AST node types of its parent node, and the enclosing overridden method that contains the target method call. Although we cannot guarantee that the taxonomy covers every possible aspect of method call completion, it can provide insights into code completion techniques and can also help us decide where more effort is needed. We use the following procedure for our evaluation: for each category of method call within each test fold of a subject system, we count how many of them are correctly predicted by a code completion technique, and for each system we then present the final result after adding the numbers for all ten test folds.

Table 4.2: Categorization of method calls for different AST node types for receiver expressions (for the top-3 proposals)

                              --------- NetBeans ----------    ---------- Eclipse ----------
Expression Type               Quantity  BMN       CSCC         Quantity  BMN       CSCC
                              (%)       correct   correct      (%)       correct   correct
                                        (%)       (%)                    (%)       (%)
Array Access                  0.54      0         54.16        1.65      0         79.73
Class Instance Creation       0.18      0         100          0.11      0         100
Field Access                  0.60      0         80.77        0.49      0         77.27
Method Invocation             21.66     0         94           11.88     0         75.42
Simple Name: Type             5.48      79.21     92.13        3.02      96.26     98.26
Simple Name: Local Variable   32.13     68.26     84.85        41.86     0.64      83.14
Simple Name: Field            51.94     52.07     79.95        46.92     50        75.46
Simple Name: Parameter        10.44     52.80     79.35        9.47      46.81     79.50
Simple Name: Total            74.08     58.84     82.13        85.02     56.86     79.65
Qualified Name                2.94      57.36     82.17        0.85      60.53     71.05
AST Node Type for Receiver Expression
We categorize the receiver expressions of the test method calls according to their AST node types. We count the number of test method calls that each code completion technique correctly recommends for each kind of AST node. Table 4.2 shows the results for the top-3 proposals from the two largest test systems, Eclipse and NetBeans. Table 4.2 suggests that the majority of receiver expressions fall in the simple name category. A simple name can be a variable name (declared as a method parameter, a local variable, or a field) or a type name (static method calls). The original BMN technique considers only local variables and their types to compute
completion proposals. However, Table 4.2 shows that many receiver expressions of method calls are not local variables, for which BMN therefore produces no recommendations. This explains why we did not obtain good results for BMN (Table 4.1): the way BMN collects usage context is quite limited. In contrast, CSCC can identify a usage context even when the receiver is not a local variable, and thus can recommend method names in those cases too. We performed another experiment in which we train and test both BMN and CSCC using only those method calls where the receiver is a local variable. For the top-3 proposals, BMN achieves 68% recall and 77% precision for the Eclipse system, both higher than those of any other technique except CSCC. The recall and precision for CSCC are 84% and 86%, respectively, indicating that CSCC performs better than BMN even when the receivers are local variables.
AST Node Types of Parent Node
We consider those method call expressions where a particular type of object or value is expected. For example, a framework method call can be located in the condition part of an if statement, which expects a boolean value, or on the right-hand side of an assignment expression: if the left-hand side of that assignment is of type Container, the right-hand side should return an object of type Container or one of its subtypes. The goal is to identify how well techniques that consider type information as context perform in these cases compared to others. To make the results comparable with the BMN code completion system, we consider those method calls where the receiver is a local variable. Table 4.3 shows the results of the top-3 proposals for the Eclipse system. We can see that considering the expected type as contextual information can help improve code completion techniques; that is why the accuracy of BCC comes close to that of CSCC, which achieves the highest accuracy for all but one category of AST node. Other code completion systems, such as BMN, FCC, and the default code completion systems of Eclipse, did not perform well in this experiment.
Overridden Methods
The objective is to verify whether methods called from within overridden methods pose any challenge to the evaluated code completion techniques. Our informal observation is that it may be difficult to identify the usage context for method calls within overridden methods. When we manually analyzed some of the code examples, we noticed that method calls within overridden methods may carry very limited contextual information. For example, a number of methods in Java Swing applications result from implementing the ActionListener interface, and those methods contain only a few method calls on the receiver objects. This can affect the performance of techniques that leverage receiver method calls for making recommendations. Similar to the previous experiment, we test only those method calls where the receiver is a local variable. CSCC again performs better than any other technique in this experiment. Table 4.4 summarizes the results of the study. While CSCC correctly recommends more than 84% of method calls for the top-3 recommendations for
Table 4.3: Correctly predicted method calls where the parent expression expects a particular type of value (top-3 proposals for the Eclipse system)

AST Node Type of Parent        Total Cases  ECCAlpha  ECCRel  BCC   FCC   BMN    CSCC
Method Argument                696          6         441     527   378   336    553
Assignment                     429          3         282     338   144   184    305
If Statement                   526          36        47      225   217   214    373
While Statement                8            0         0       4     0     2      4
Return                         99           1         56      62    38    32     65
Variable Declaration Fragment  1388         10        738     1018  498   515    1084
For Statement                  5            0         3       3     0     0      4
Class Instance Creation        126          3         76      95    46    54     98
Prefix Expression              425          30        75      396   282   228    346
Total                          3702         89        1718    2668  1603  1565   2832
Percentage correct                          2.4%      46.4%   72%   44.5% 42.27% 76.5%
Table 4.4: Percentage of correctly predicted method calls that are called in overridden methods (for the top-3 proposals)

Subject System  ECCAlpha  ECCRel  BCC    FCC    BMN    CSCC
Eclipse         4.98      15.19   55.56  55.12  57.04  84.28
NetBeans        12.00     33.60   52.20  52.34  56.25  85.24

both subject systems, none of the other techniques achieves more than 58% accuracy. We were interested to see whether we could take advantage of this special case. There are two possible ways to exploit the overridden method. First, we can include the method name in the context. Second, we can use the name to index the training examples; to recommend a method call inside an overridden method, we can then access candidates by the receiver type name and the overridden method name.

To check the first option, we explicitly add the overridden method name to both the overall and line contexts. It should be noted that CSCC may already include the overridden method name as part of the overall context, but only when the overridden method name is located within four lines of the call; in those cases, the overall context contains the overridden method name twice, which gives it even more weight. For this experiment, we use Eclipse and NetBeans as subject systems and test the accuracy of correctly predicting method calls within overridden methods. Table 4.5 shows the percentage of correctly predicted method calls that are located in overridden methods. For the Eclipse system, adding the overridden method name to both contexts results in an increase of 2.52%; for the NetBeans system, we obtain a relative improvement of 1.75%. The results thus suggest that adding overridden method names to the context can help better model the usage context of method calls located within overridden methods.

Table 4.5: Comparing performance (percentage of correctly predicted method calls) of CSCC at two different settings. The default setting does not explicitly add the overridden method name to both contexts; the second setting does. Only method calls located in overridden methods are tested.

                 ----------- Eclipse ------------   ----------- NetBeans -----------
Recommendations  Default (%)  With method name (%)  Default (%)  With method name (%)
Top-3            80.93        83.68                 81.36        82.61

To check the second option, we again use Eclipse and NetBeans as subject systems. We index the code examples by receiver type and enclosing method name, and test the accuracy of the correctly predicted method calls for those test cases where the method call appears within an overridden method. Table 4.6 shows the results of the study, which suggest that such an indexing scheme is not very effective. Although the indexing scheme can be used to correctly recommend method calls, there are a number of cases where similar training examples are not located inside overridden methods with the same name.

Table 4.6: Percentage of correctly predicted method calls located in overridden methods, with and without indexing by enclosing method name (for the top-3 proposals)

                 ----------- Eclipse -----------   ----------- NetBeans ----------
Recommendations  Without Index (%)  With Index (%)  Without Index (%)  With Index (%)
Top-3            83                 73.4            83.10              70.9

Figure 4.7 shows an example where indexing method calls by the enclosing method name leads to a wrong recommendation. Here, the target method call is getStyledDocument. The top and bottom examples represent two completion candidates. Although the enclosing method name of completion candidate B matches that of the query code (the code under development, shown in the middle of the figure), its context does not match. Instead, the correct match is completion candidate A, whose enclosing method name does not match that of the query code.

Completion candidate A (candidate method call: getStyledDocument):

    public void setVisible(boolean b) {
        if (b) {
            JTextPane pane = (JTextPane) getInvoker();
            StyledDocument doc = pane.

Query, i.e., the code under development (target method name: getStyledDocument):

    @Override
    public void mouseMoved(MouseEvent e) {
        JTextPane pane = (JTextPane) e.getSource();
        StyledDocument doc = pane.

Completion candidate B (candidate method call: viewToModel):

    public void mouseMoved(MouseEvent e) {
        JTextPane pane = (JTextPane) e.getSource();
        Element elm = document.getCharacterElement(pane.

Figure 4.7: Indexing method calls by enclosing method name can lead to the wrong recommendation.

Table 4.7: Runtime performance of CSCC generating a database of 40,863 method calls (column 2) and performing 4,540 code completions (column 3) for the Eclipse system.

Setup Used                  Database Generation Time  Code Completion Time
CSCC (with inverted index)  8,066 ms                  8,998 ms (avg. 1.94 ms)
Without inverted index      8,000 ms                  12,888 ms (avg. 2.77 ms)
4.4.4 Comparison with Code Recommenders
Code Recommenders is a more advanced Eclipse plugin that evolved from BMN. It utilizes a model trained on a set of code examples to provide intelligent code completion. When a developer invokes code completion in the IDE, Code Recommenders collects the current usage context and then looks in its model for possible matches to complete the code. Because the model is proprietary, we could not train it with new code examples. Furthermore, because we did not have access to an internal API to obtain completion proposals automatically, we could only perform a manual comparison. Our comparison indicates that CSCC performed better than Code Recommenders, although the comparison is limited in scale due to its manual nature; a more extensive evaluation may become possible if Code Recommenders becomes more open. From the code examples in a book on the Java Swing framework, we randomly selected 309 Swing/AWT method calls as our test cases. We enabled the Code Recommenders intelligent call completion in an Eclipse IDE. Then, for each test case, we manually opened the corresponding file in the IDE, removed any code after the target object, and tried to complete the method call by typing a dot (.) after the object name. We recorded the list of completion proposals suggested by Code Recommenders and determined the rank of the target method name in that list. To obtain the performance result for CSCC, we trained CSCC with the remaining examples and tested it against the 309 selected method calls. CSCC achieves better results than Code Recommenders: the precision, recall, and F-measure for Code Recommenders are 62%, 76%, and 68%, respectively, and for CSCC 92%, 82%, and 87%.
4.4.5 Runtime performance
To be useful, code completion must operate in near real time so that it does not interrupt a developer's flow of coding. Thus, to study the time required to suggest completion proposals, we measured the runtime of the
10http://www.eclipse.org/recommenders/ 11http://examples.oreilly.com/jswing2/code/
Figure 4.8: The number of correct predictions at different context sizes (x-axis: context line numbers; y-axis: correctly predicted method calls for the top-3 recommendations)
first and last two steps of CSCC. The first step builds the candidate method call database, and the last two steps recommend completion proposals. All experiments were performed on a computer running Ubuntu Linux with a 3.40 GHz Intel Core i7 processor and 10 GB of memory. As shown in Table 4.7, we provide runtime data for the Eclipse subject system, where the model is built using 40,863 method calls (column two) and the running time is the time required to test all 4,540 queries (column three). As expected, the first step takes the most time, but the database needs to be built only once. On average, it takes 1.94 ms (milliseconds) to compute the completion proposals for each method call, which is negligible. To understand the benefits of the inverted index structure, we also developed a variant of our algorithm that does not use the inverted index and measured the runtime again. The result is summarized in the second row of Table 4.7. While the model generation time decreases slightly, the code completion running time increases considerably. Using the inverted index structure not only reduces the runtime of the algorithm, but also improves the result slightly by eliminating many irrelevant mapping candidates.
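To make the filtering role of the inverted index concrete, it can be sketched as a map from context tokens to the stored usage contexts that contain them, so that only candidates sharing at least one token with the query are compared in detail. This is a minimal illustrative sketch, assuming each stored context is reduced to a bag of tokens; the class and method names (ContextIndex, candidates) are ours, not CSCC's actual implementation.

```java
import java.util.*;

// A minimal sketch of inverted-index candidate filtering. Each stored
// usage context is a list of tokens; the index maps a token to the ids
// of all contexts containing it.
public class ContextIndex {
    private final Map<String, Set<Integer>> index = new HashMap<>();
    private final List<List<String>> contexts = new ArrayList<>();

    public void add(List<String> contextTokens) {
        int id = contexts.size();
        contexts.add(contextTokens);
        for (String token : contextTokens) {
            index.computeIfAbsent(token, t -> new HashSet<>()).add(id);
        }
    }

    // Return only candidates sharing at least one token with the query,
    // so irrelevant candidates never reach the detailed similarity step.
    public Set<Integer> candidates(List<String> queryTokens) {
        Set<Integer> result = new HashSet<>();
        for (String token : queryTokens) {
            result.addAll(index.getOrDefault(token, Collections.emptySet()));
        }
        return result;
    }

    public List<String> context(int id) {
        return contexts.get(id);
    }
}
```

Because the detailed similarity computation runs only on the returned candidate set, the lookup cost grows with the number of contexts sharing a token rather than with the whole database, which matches the runtime difference reported in Table 4.7.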
4.5 Discussion
4.5.1 Why does CSCC consider four lines as context?
For an API method call, we use four lines prior to the method call to determine the overall context. The number four is determined experimentally as follows. For this experiment, Eclipse is used as a subject system. We collect all SWT method calls and randomly select 10% for testing. The remaining 90% of calls are used to train CSCC. Next, we run the algorithm 10 times by varying context line numbers from one to ten. The higher the context line number, the larger the context size, which results in an increase in computation time.
We need to keep the number of context lines as low as possible without hurting performance. Figure 4.8 shows the accuracy of CSCC at various context line numbers. From Figure 4.8, we can see that the number of correctly predicted method calls first drops when we increase the context size from one to two lines, then increases sharply as the context grows further. Once the context size exceeds four lines, there is no significant change in the number of correct predictions. Therefore, we set the context size to four lines.
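The experiment above varies a single parameter: how many lines preceding the call are kept as context. That extraction step can be sketched as follows, assuming the source file is available as a list of lines; the class and method names are illustrative, not CSCC's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextExtractor {
    // Return the k lines immediately preceding the line containing the
    // method call (callLine is a 0-based index into lines). With k = 4
    // this matches the context size CSCC settled on.
    public static List<String> precedingContext(List<String> lines,
                                                int callLine, int k) {
        int from = Math.max(0, callLine - k);
        return new ArrayList<>(lines.subList(from, callLine));
    }
}
```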
4.5.2 Impact of different context information
We evaluate the impact of CSCC's various kinds of context information on the performance of method call prediction. We run an experiment on NetBeans, our largest Swing/AWT system. Table 4.8 shows the percentage of correctly predicted method calls for different combinations of context information. In the first row, we model the overall context considering those methods previously called on the receiver object, but without the enclosing method name. Next, we consider a variation of the previous model that also takes into account the enclosing method name. For these two models, we neither consider any line context nor put any limit on context size. All the models in the following five rows use the overall context to recommend completion proposals. The third row corresponds to a model that considers only those method names previously called within four lines of distance on the receiver object of the target method call. The fourth row represents a model that, in addition to the above information, also considers the line context. The fifth row corresponds to a model that additionally considers any other method names located within four lines of distance. The model in the sixth row also takes into account any type names (class or interface names) appearing as part of class instance creation expressions. Finally, the last row implements the complete CSCC, which also includes any Java keywords except access specifiers. According to the results, CSCC in the last row achieves the highest accuracy. It is also clear from the table that performance improves as additional context information is added. Since adding the enclosing method name does not improve performance significantly (compare the first two rows),
Table 4.8: Sensitivity of performance for different context information
Correctly predicted method calls (%)
Model                                                  Top-1   Top-3   Top-5   Top-10
Rec. method calls                                      46.5    69.5    77.5    82.5
Rec. method calls + enclosing method                   46.6    68.7    76.9    81.8
Rec. method calls (within four lines)                  33      60.20   76      84.80
Rec. method calls (within four lines) + line context   34.8    61.62   76.26   84.89
Previous factors + other method calls                  58      78      83      87
Previous factors + type name                           62      82      86      89
Previous factors + keyword (CSCC)                      64      84      88      90
Table 4.9: Cross-project prediction results. P, R, and F refer to precision, recall, and F-measure, respectively.
Subject System   Rank    Measure   FCC (%)   BMN (%)   CSCC (%)
NetBeans         Top-1   P         39        46        69
                         R         100       90        98
                         F         56        61        81
                 Top-3   P         64        77        84
                         R         100       90        99
                         F         78        83        91
JEdit            Top-1   P         57        70        66
                         R         100       84        98
                         F         73        76        79
                 Top-3   P         69        85        83
                         R         100       84        98
                         F         82        84        90
ArgoUML          Top-1   P         48        48        54
                         R         100       85        100
                         F         65        61        70
                 Top-3   P         65        68        77
                         R         100       85        100
                         F         79        76        87
JFreeChart       Top-1   P         43        44        68
                         R         100       89        100
                         F         60        59        81
                 Top-3   P         74        85        88
                         R         100       89        100
                         F         85        87        94

we did not include the enclosing method name in CSCC. Among the various additional information we considered, method names and type names (appearing in class instance creation expressions) contributed the most. Although adding the line context improves overall performance by only around 1%, our manual investigation found that in that small number of cases the overall context differs considerably, so the line context complements the overall context. Surprisingly, adding keyword names did not improve performance significantly. We analyzed some cases manually and found that, while keywords are effective, their effect diminishes in the matching process because of the large number of method and class/interface names in the context.
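The combinations evaluated above differ only in which tokens are admitted into the overall context. A toy sketch of the fullest combination (method names, type names after `new`, and Java keywords other than access specifiers) is given below; it uses a simple regex over the raw text purely for illustration, whereas the real implementation works on the parsed source, and all names here are ours.

```java
import java.util.*;
import java.util.regex.*;

// Illustrative token extraction for an overall usage context: keeps
// called method names, type names in class instance creations, and a
// few Java keywords, while skipping access specifiers and identifiers.
public class ContextTokens {
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "if", "else", "for", "while", "return", "new", "try", "catch"));
    private static final Set<String> ACCESS = new HashSet<>(Arrays.asList(
            "public", "private", "protected"));

    public static List<String> extract(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*").matcher(code);
        String prev = null;
        while (m.find()) {
            String word = m.group();
            // A word directly followed by '(' is treated as a method call.
            boolean call = m.end() < code.length() && code.charAt(m.end()) == '(';
            if (KEYWORDS.contains(word)
                    || (call && !ACCESS.contains(word))
                    || "new".equals(prev)) {   // type name in instance creation
                tokens.add(word);
            }
            prev = word;
        }
        return tokens;
    }
}
```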
4.5.3 Effectiveness of the technique in cross-project prediction
We performed another experiment to evaluate the effectiveness of CSCC in cross-project prediction. For this experiment, we considered Swing/AWT library method calls for four subject systems: NetBeans, JEdit, ArgoUML, and JFreeChart. We followed the approach described by Nguyen et al. [96]. We divided the data set of each system into ten folds. For each fold of a system, we trained on the nine other folds of the same system plus code examples from all other systems, and then tested on the remaining fold. We then calculated the average precision and recall values. To make the results comparable with BMN, we only provide prediction results for local variable method calls. Table 4.9 shows the results of our method call prediction. CSCC once again performs better than the other techniques. There are two important lessons to be learned from the results. First, both precision and recall of CSCC either slightly increase or are consistent with those of Table 4.1, indicating that CSCC can recommend correct completion proposals even when the training
model contains examples from different systems. As long as relevant usage contexts exist, CSCC can find them and recommend completion proposals. Second, despite the considerable increase in the size of the training model, we did not notice a significant improvement in performance. This seems to be consistent with Hindle et al.'s finding that the degree of regularity across projects is similar to that within a single project [45].
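The cross-project protocol described above (nine folds of the target system plus all examples from the other systems) can be sketched generically as follows; the class and method names are illustrative, not part of our tooling.

```java
import java.util.ArrayList;
import java.util.List;

public class CrossProjectSetup {
    // Build the training set for one round of the cross-project
    // experiment: every fold of the target system except the test fold,
    // preceded by all code examples from the other systems.
    public static <T> List<T> trainingSet(List<List<T>> folds,
                                          int testFold,
                                          List<T> otherSystems) {
        List<T> train = new ArrayList<>(otherSystems);
        for (int i = 0; i < folds.size(); i++) {
            if (i != testFold) {
                train.addAll(folds.get(i));
            }
        }
        return train;
    }
}
```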
4.5.4 Performance of CSCC for other frameworks or libraries
We have already evaluated the performance of CSCC for API method call completion using two libraries, SWT and Swing/AWT. Both are used for developing graphical user interfaces. Thus, a threat to the validity of the study is that the performance of CSCC may differ for a different framework or library. We were interested to see whether the performance of CSCC is stable across different frameworks or libraries. To answer this question, we conducted another study using java.io and java.util method calls in two of our largest subject systems. The objectives and usage patterns of these library method calls differ from those used for developing graphical user interfaces. The first library provides API method calls to support system input and output, serialization, and the file system. The second contains various utility classes to facilitate working with dates, times, collections, and events. Table 4.10 shows comparison results of CSCC with five other state-of-the-art tools for the java.io API method calls. Results from the study suggest that both ECCAlpha and ECCRelevance perform poorly. While the recall of BMN is lower than that of FCC, BMN achieves higher precision than FCC for the top recommendation on the Eclipse system. In all other cases, FCC performs better than BMN on both subject systems. Interestingly, BCC performs better than both FCC and BMN: while its precision ranges from 47-58% for the top recommendation, the value increases to 70-81% for the top-3 recommendations. CSCC mostly achieves better results than any other technique. For example, for the top position on the Eclipse system, CSCC achieves 15% higher precision (11% higher F-measure) than its closest competitor. For the NetBeans system, CSCC achieves at least a 14% higher F-measure than any other technique for the top recommendation, and 9% higher for the top-3 recommendations. Table 4.11 shows comparison results for the java.util API method calls.
ECCAlpha, ECCRelevance, and BCC perform worse than the other completion systems. Similar to the previous result, CSCC performs better than the other code completion systems. For the top recommendation on the Eclipse system, CSCC achieves at least 25% higher precision (15% for the top-3 recommendations) than any other technique. The technique also achieves at least 25% higher recall (10% for the top-3 recommendations). Although the differences are smaller for the NetBeans system, CSCC still performs the best.
4.5.5 Using bottom lines for building context
In the previous experiments, we assume a top-down programming approach and ignore the presence of any code after the tested method call. Although our approach is consistent with the previous study [18], we cannot guarantee that the code was developed in that way. It may be the case that the developer copied the
Table 4.10: Evaluation results of code completion systems for the java.io method calls
                           Eclipse                   NetBeans
Measure     System         Top-1   Top-3   Top-10   Top-1   Top-3   Top-10
Precision   ECCAlpha       0.038   0.10    0.19     0.036   0.10    0.24
            ECCRelevance   0.13    0.27    0.29     0.14    0.30    0.57
            BCC            0.58    0.81    0.96     0.47    0.70    0.94
            FCC            0.47    0.72    0.91     0.38    0.63    0.89
            BMN            0.53    0.75    0.77     0.34    0.79    0.80
            CSCC           0.73    0.89    0.95     0.65    0.84    0.97
Recall      ECCAlpha       1       1       1        1       1       1
            ECCRelevance   1       1       1        1       1       1
            BCC            1       1       1        1       1       1
            FCC            1       1       1        1       1       1
            BMN            0.89    0.89    0.89     0.73    0.73    0.73
            CSCC           1       1       1        0.99    0.99    0.99
F-measure   ECCAlpha       0.073   0.18    0.32     0.07    0.18    0.39
            ECCRelevance   0.23    0.43    0.32     0.25    0.46    0.73
            BCC            0.73    0.90    0.98     0.64    0.82    0.97
            FCC            0.64    0.84    0.95     0.55    0.77    0.94
            BMN            0.66    0.81    0.83     0.46    0.76    0.76
            CSCC           0.84    0.94    0.97     0.78    0.91    0.98
Table 4.11: Evaluation results of code completion systems for the java.util API method calls
                           Eclipse                   NetBeans
Measure     System         Top-1   Top-3   Top-10   Top-1   Top-3   Top-10
Precision   ECCAlpha       0.18    0.21    0.58     0.087   0.11    0.27
            ECCRelevance   0.34    0.43    0.76     0.17    0.21    0.32
            BCC            0.38    0.61    0.71     0.27    0.28    0.35
            FCC            0.36    0.74    0.97     0.73    0.86    0.95
            BMN            0.50    0.77    0.79     0.82    0.91    0.91
            CSCC           0.75    0.92    0.97     0.87    0.94    0.96
Recall      ECCAlpha       1       1       1        1       1       1
            ECCRelevance   1       1       1        1       1       1
            BCC            1       1       1        1       1       1
            FCC            1       1       1        0.98    0.98    0.98
            BMN            0.91    0.91    0.91     0.88    0.88    0.88
            CSCC           0.99    0.99    0.99     0.98    0.98    0.98
F-measure   ECCAlpha       0.30    0.35    0.73     0.16    0.20    0.43
            ECCRelevance   0.51    0.60    0.86     0.29    0.35    0.48
            BCC            0.55    0.76    0.83     0.43    0.44    0.52
            FCC             0.53   0.85    0.98     0.84    0.92    0.96
            BMN            0.65    0.83    0.85     0.85    0.89    0.89
            CSCC           0.85    0.95    0.98     0.92    0.96    0.97

entire piece of code from elsewhere and then performed edit operations. It may also be the case that, during the code review process, a developer replaced a line or changed a method call on a line with a different one. In such cases, we may leverage the bottom lines of code for building context. Due to the lack of a complete edit history, we were unable to identify only those method calls. However, an alternative is to consider the best case: use the bottom lines of code already developed, and test whether considering those lines together with the top lines improves the performance of our code completion system. For this experiment, we developed a new version of CSCC that uses both the top and bottom four lines, including the line in which the method call appears, to generate the usage context. For evaluation, we use two of our largest subject systems, Eclipse and NetBeans. For Eclipse, we collect the API method calls of the SWT library, and for NetBeans we collect AWT and Swing API method calls. We then use ten-fold cross validation to compare the performance of the new version of CSCC with the old one. Table 4.12 compares the correctly predicted method calls for the two different forms of CSCC. Here, the term
public void shellActivated(ShellEvent e) {
    if (!table.isDisposed()) {
        int tableWidth = table.getSize().x;
        if (tableWidth > 0) {
            int c1 = tableWidth / 3;
            column1.setWidth(c1);
            column2.setWidth(tableWidth - c1);
        }
        getShell().removeShellListener(this);
    }
}

Method call: setWidth
Top overall context: setWidth int if getSize
Bottom overall context: getShell removeShellListener this
Figure 4.9: An example of top and bottom overall context. When we use both top and bottom context, we concatenate the terms appearing in the bottom context to those appearing in the top context
Table 4.12: The number of correctly predicted method calls using two different forms of usage context.
                  Eclipse                              NetBeans
Recommendations   Top Context (%)   Both Context (%)   Top Context (%)   Both Context (%)
Top-1             62                63                 64                70
Top-3             80                81                 85                86
Top-10            90                90                 91                91
Top Context refers to the original version of the technique. The term Both Context refers to the modified version that considers both the top and bottom four lines to construct the usage context of method calls. Results from the study suggest that considering the bottom four lines does not improve results for the top-3 or top-10 recommendations. However, we observe an improvement for the top recommendation. Although we find only a 1% improvement in accuracy for the Eclipse system, the number increases to 6% for the NetBeans system. This suggests that using the bottom four lines for building context can be useful.
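The Both Context variant can be sketched as extracting the k lines above the call, the call line itself, and the k lines below it, assuming (as in the best case discussed above) that the code below the call already exists; the names here are illustrative, not CSCC's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class BothContextExtractor {
    // Sketch of the Both Context window: k lines above the call, the
    // call line itself (0-based index callLine), and k lines below it.
    // With k = 4 this matches the modified CSCC evaluated in Table 4.12.
    public static List<String> surroundingContext(List<String> lines,
                                                  int callLine, int k) {
        int from = Math.max(0, callLine - k);
        int to = Math.min(lines.size(), callLine + k + 1);
        return new ArrayList<>(lines.subList(from, to));
    }
}
```

As Figure 4.9 illustrates, the terms extracted from the bottom lines are then simply concatenated to the terms extracted from the top lines before matching.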
4.6 Extending CSCC for Field Completion
After releasing CSCC as an Eclipse plugin, we received considerable feedback from users. Many of them asked us to extend the automatic code completion support to fields as well. During our investigation, we found that field completion is not trivial because of the large number of possible choices. The objects we instantiate from different libraries and frameworks often contain a large number of field variables. Before calling methods on those objects, we either assign values to those fields or access them to check preconditions. Many of these fields are also constants. The problem is that there are a large number of them, and it is difficult for a developer to remember each one. Although classes define or inherit a large number of field variables from other classes, objects instantiated from those classes use only a few of them in practice. Fortunately, many of these field variables are meant to be used in distinct contexts. For example, when we add a Swing component to a container, we need to specify a field constant indicating the location in the container where the component should be added. BorderLayout is a popular layout manager that uses static
public void displayKeyInfo(KeyEvent event, String status) {
    int keyId = event.getID();
    String typedKeyString;
    if (keyId == KeyEvent.KEY_TYPED) {
        char c = event.getKeyChar();
        typedKeyString = "character: " + c;
    } else {
        int keyCode = event.getKeyCode();
        typedKeyString = "key code: " + keyCode + "," + KeyEvent.getKeyText(keyCode);
    }
    // collect key location
    // display information about key event
}