University of New Orleans ScholarWorks@UNO

University of New Orleans Theses and Dissertations

Spring 5-22-2020

Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security

Md Rakibul Islam [email protected]


Recommended Citation
Islam, Md Rakibul, "Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security" (2020). University of New Orleans Theses and Dissertations. 2779. https://scholarworks.uno.edu/td/2779

This Dissertation is protected by copyright and/or related rights. It has been brought to you by ScholarWorks@UNO with permission from the rights-holder(s). You are free to use this Dissertation in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/or on the work itself.

This Dissertation has been accepted for inclusion in University of New Orleans Theses and Dissertations by an authorized administrator of ScholarWorks@UNO. For more information, please contact [email protected].

Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security

A Dissertation

Submitted to the Graduate Faculty of the University of New Orleans in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Engineering and Applied Science
Computer Science

by

Md Rakibul Islam

B.S. Khulna University, 2008

May, 2020

This dissertation is dedicated to my parents
MST. BILKIS BEGUM and late ABUL KALAM AZAD

Acknowledgements

First and foremost, I cordially express my gratitude and heartfelt thanks to my adviser Dr. Minhaz F. Zibran for allowing me to pursue my PhD under his supervision. Dr. Zibran has always been very supportive, caring, inspirational, and motivational. His innovative ideas and proper guidance immensely helped me to keep myself in the right direction of my PhD journey. I also thank the members of my thesis advisory committee, Dr. Md Tamjidul Hoque, Dr. Shengru Tu, Dr. Linxiong Li, and Dr. Syed Adeel Ahmed, who have been very supportive by providing their invaluable assistance and feedback at all stages of this thesis. Their criticisms, comments, and advice were critical in making this dissertation more accurate and complete. Thanks to the anonymous reviewers for their valuable comments and suggestions in improving the work and publications produced from this thesis.

I am thankful to my fellow researchers, faculty members, and staff of the Department of Computer Science for their spontaneous cooperation and encouragement. I especially thank Aayush Nagpal, Md Kauser Ahmmed, Pradeep Jakibanjar, and Rahul Chatterjee, with whom I jointly worked on a couple of research projects. I would also like to thank the University of New Orleans for providing me with an excellent environment for research and with financial support.

My father dreamed and believed that I could come this far to complete my PhD. Unfortunately, he is no longer with us to see me fulfilling his dream. I will never be able to say to him, "I have done it because of you, and I miss you a lot." My mother always remains by my side to give me love, courage, guidance, and everything I need to overcome any difficult situation whenever it comes my way. I am highly obliged to my parents, and I acknowledge their significant contributions to my every achievement.

I am also grateful to my wife MST. Shahana Islam, my soul-mate and the mother of our son Surahbil Sabib Tasfi. Living in a foreign country would never have been so easy without a caring partner like her. She has been extremely supportive of me throughout my entire PhD journey and has made countless sacrifices to help me get to this point. She always has tried to ease my stresses with her sweet smile and love. I believe that when our son Tasfi is grown up, he will be proud to realize what his father has achieved, and how he has been a source of inspiration in achieving that.

Last but not least, I extend my special thanks to my sister, uncles, other family members, and my friends Rajeeb, Reja, Jubayer, Monir, Shafiqur, Atik, Pulok, Ratul, Rana, and others at UNO who helped me in many ways and gave me many moments of joy.

Table of Contents

List of Figures ...... xii

List of Tables ...... xvii

Abstract ...... xviii

1 Introduction and Motivation 1 1.1 Mining and Analyzing Human Affect in Software Engineering ...... 2 1.1.1 Sub-problem I: Detecting developers’ sentiments and emotions with high accuracy ...... 3 1.1.2 Contribution ...... 3 1.1.3 Sub-problem II: Understanding developers’ sentiments ...... 4 1.1.4 Contribution ...... 4 1.2 Mining and Analyzing Impacts of Developers’ Copy-Paste Actions ...... 5 1.2.1 Sub-problem III: Understanding impacts of developers’ copy-paste actions ...... 5 1.2.2 Contribution ...... 6 1.3 Mining and Analyzing Fixing Patterns of Developers’ Mistakes ...... 6 1.3.1 Sub-problem IV: Understanding fixing patterns of software bugs ...... 6 1.3.2 Contribution ...... 7 1.4 Outline of the Thesis ...... 7

2 Background 9 2.1 Software Engineering Terminologies ...... 9 2.2 Sentiment Related Terminologies ...... 10 2.2.1 Dimensional Framework of Emotions ...... 10 2.2.2 A Popular Sentiment Analysis Technique in Software Engineering . . . . . 11 2.3 Terminologies Related to Developers’ Copy-paste Actions ...... 12 2.3.1 Clone Granularity ...... 13 2.4 Code Smell ...... 13 2.5 Security Vulnerabilities ...... 14 2.6 Summary ...... 15

3 Research Methodology 16 3.1 Mining and Analyzing Developers’ Sentiments ...... 16

3.1.1 Improving Sentiment Analysis in Software Engineering Text ...... 16 3.1.2 A Comparison of Domain Specific Sentiment Analysis Tools ...... 17 3.1.3 A Comparison of Dictionary Building Methods for Sentiment Analysis . . 18 3.1.4 Detection of Emotions in the Valence-Arousal Space ...... 19 3.1.5 Improving Detection of Emotions in the Valence-Arousal Space ...... 20 3.1.6 Understanding and Exploiting Developers’ Sentimental Variations . . . . . 21 3.1.7 Sentiment Analysis of Software Bug Related Commit Messages ...... 22 3.2 Mining and Analyzing Impacts of Developers’ Copy-Paste Actions ...... 23 3.2.1 A Study on Code Smells in Categories of Clones and Non-Cloned Code . . 23 3.2.2 A Study on Security Vulnerabilities in Clones and Non-Cloned Code . . . 24 3.2.3 On the Characteristics of Buggy Code Clones: A Code Quality Perspective 24 3.3 Mining and Analyzing Fixing Patterns of Developers’ Mistakes ...... 25 3.4 Summary ...... 26

4 Detecting Developers’ Sentiments 27 4.1 Introduction ...... 27 4.2 Exploratory Study of the Difficulties in Sentiment Analysis ...... 29 4.2.1 Benchmark Data ...... 30 4.2.2 Emotional Expressions to Sentimental Polarities ...... 30 4.2.3 Computation of emotional scores from human rated dataset ...... 31 4.2.3.1 Illustrative Example of Computing sentimental Polarity . . . . . 32 4.2.4 Sentiment Detection Using SentiStrength ...... 32 4.2.5 Analysis and Findings ...... 33 4.2.5.1 Insights into SentiStrength’s Internal Algorithm ...... 34 4.2.5.2 Difficulties in Automated Sentiment Analysis in Software Engi- neering ...... 34 4.3 Leveraging Automated Sentiment Analysis ...... 39 4.3.1 Creating a New Domain Dictionary for Software Engineering ...... 40 4.3.1.1 Further Enhancements to the Preliminary Domain Dictionary . . 41 4.3.2 Inclusion of Heuristics in Computation of Sentiments ...... 42 4.3.2.1 Addition of Contextual Sense to Minimize Ambiguity ...... 42 4.3.2.2 Bringing Neutralizers in Effect ...... 43 4.3.2.3 Integration of a Preprocessing Phase ...... 43 4.3.2.4 Parameter Configuration for Better Handling of Negations . . . . 44 4.4 Empirical Evaluation of SentiStrength-SE ...... 44 4.4.1 Head-to-head Comparison Using a Benchmark Dataset ...... 46 4.4.2 Comparison with respect to Human Raters’ Disagreements ...... 48

4.4.3 Evaluating the Contribution of Domain Dictionary ...... 51 4.4.3.1 Comparison between the SentiStrength and SentiStrength* ...... 52 4.4.4 Our Domain Dictionary vs. SentiStrength’s Optimized Dictionary . . . 54 4.4.4.1 Optimizing SentiStrength’s Dictionary ...... 54 4.4.4.2 Comparison between SentiStrength^O and SentiStrength* ...... 55 4.4.5 Comparison with a Large Domain-independent Dictionary ...... 57 4.4.5.1 Choosing a Domain Independent Dictionary for Comparison . . 57 4.4.5.2 Range Conversion ...... 58 4.4.5.3 Comparison between SentiStrength^W and SentiStrength* ...... 58 4.4.6 Comparison with an Alternative Domain Dictionary ...... 61 4.4.6.1 Building an Alternative Domain Dictionary ...... 61 4.4.6.2 Comparison between the New Dictionary and SentiStrength-SE’s Dictionary ...... 63 4.4.6.3 Manual Investigation to Determine Reasons ...... 64 4.4.7 Evaluating the Contributions of Heuristics ...... 65 4.4.7.1 Further Manual Investigation ...... 68 4.4.8 Qualitative Evaluation of SentiStrength-SE ...... 68 4.4.9 Threats to Validity ...... 70 4.4.9.1 Construct Validity and Internal Validity ...... 70 4.4.9.2 External Validity ...... 71 4.4.9.3 Reliability ...... 71 4.5 Limitations of SentiStrength-SE and Future Possibilities ...... 72 4.6 Related work ...... 73 4.7 Summary ...... 76

5 Comparison of Sentiment Analysis Tools 78 5.1 Introduction ...... 78 5.2 Datasets ...... 79 5.2.1 JIRA Issue Comments (JIC) Dataset ...... 79 5.2.1.1 Emotional Expressions to Sentimental Polarities ...... 79 5.2.1.2 Assignment of Sentiments to Text ...... 80 5.2.2 Stack Overflow Posts (SOP) Dataset ...... 80 5.2.3 Code Review Comments (CRC) Dataset ...... 81 5.3 Sentiment Analysis Tools under Study ...... 81 5.4 Evaluation and Findings ...... 81 5.4.1 Comparative Accuracy Analysis ...... 82

5.4.2 Analysis of Agreements ...... 84 5.5 Threats to Validity ...... 86 5.6 Related Work ...... 86 5.7 Summary ...... 87

6 Detection of Developers’ Emotions 88 6.1 Introduction ...... 88 6.2 Emotional Model ...... 89 6.3 DEVA ...... 89 6.3.1 Capturing Arousal ...... 90 6.3.1.1 Combining the SEA and ANEW dictionaries ...... 90 6.3.1.2 Adjusting the ranges of arousal scores ...... 91 6.3.1.3 Computing arousal score for text ...... 91 6.3.2 Capturing Valence ...... 91 6.3.2.1 Computing valence score for text ...... 92 6.3.3 Emotional States from Valence and Arousal ...... 92 6.3.4 Heuristics in DEVA ...... 92 6.4 Evaluation ...... 96 6.4.1 Creation of Ground-Truth Dataset ...... 96 6.4.1.1 Construction of a manageable subset ...... 96 6.4.1.2 Manual annotation by human raters ...... 97 6.4.2 Measurement of Accuracy ...... 97 6.4.3 Comparison with a Baseline ...... 98 6.4.4 Comparison with TensiStrength ...... 99 6.5 Threats and Limitations ...... 100 6.6 Related work ...... 101 6.7 Summary ...... 102

7 Machine Learning Based Detection of Developers’ Emotions 104 7.1 Introduction ...... 104 7.2 Emotional Model ...... 105 7.3 Marvalous ...... 106 7.3.1 Preprocessing ...... 106 7.3.2 Feature Selection ...... 108 7.3.3 Algorithm Selection ...... 110 7.4 Evaluation ...... 110 7.4.1 Dataset ...... 111

7.4.2 Evaluation of ML Algorithms ...... 111 7.4.3 Evaluation of Features in MarValous ...... 113 7.4.4 Comparison with DEVA ...... 114 7.5 Limitations and Threats to Validity ...... 116 7.6 Related work ...... 117 7.7 Summary ...... 118

8 Understanding and Exploiting Developers’ Sentimental Variations 119 8.1 Introduction ...... 119 8.2 Methodology ...... 121 8.2.1 Sentiment Analysis ...... 121 8.2.2 Metrics ...... 121 8.2.3 Tuning of SentiStrength ...... 122 8.2.4 Data Collection ...... 123 8.3 Analysis and Findings ...... 124 8.3.1 Emotional Variations in Different Task Types ...... 124 8.3.2 Emotional Variations in Bug-Fixing Tasks ...... 126 8.3.3 Emotional Variations in Days and Times ...... 128 8.3.4 Emotional Impacts on Commit Lengths ...... 129 8.4 Threats to Validity ...... 131 8.5 Related Work ...... 132 8.6 Summary ...... 133

9 Sentiment Analysis in Commit Messages of Buggy Code 135 9.1 Introduction ...... 135 9.2 Methodology ...... 136 9.2.1 Data Collection ...... 136 9.2.2 Sentiment Analysis ...... 137 9.2.3 Statistical Measurements ...... 137 9.3 Analysis and Findings ...... 137 9.3.1 Overall Emotional Variations ...... 138 9.3.2 Hour-wise Emotional Variations ...... 140 9.4 Threats to Validity ...... 142 9.5 Related Work ...... 143 9.6 Summary ...... 143

10 Roles of Affects in Software Engineering 145 10.1 Sentiments Analysis in Software Design and Quality ...... 145 10.2 Affective Analysis in Requirement Analysis and Software Maintenance ...... 146 10.3 Correlational Analysis between Developers’ Affects and Their Performances . . . . 154 10.4 Sentiment Analysis in Software Social Forums ...... 156 10.5 Summary ...... 158

11 Code Smells in Categories of Clones and Non-Cloned Code 159 11.1 Introduction ...... 159 11.2 Terminology and Metrics ...... 161 11.2.1 Characterizing Terminologies ...... 161 11.2.2 Metrics ...... 161 11.3 Study Setup ...... 164 11.3.1 Subject Systems ...... 164 11.3.2 Clone Detection ...... 165 11.3.3 Vulnerability Detection ...... 165 11.4 Analysis and Findings ...... 165 11.4.1 Comparative Vulnerability of Cloned vs. Non-Cloned Code ...... 167 11.4.2 Comparative Vulnerability of Different Types of Clones ...... 168 11.4.3 Relatively Frequent Vulnerabilities ...... 169 11.5 Threats to Validity ...... 171 11.6 Related Work ...... 172 11.7 Summary ...... 173

12 Security Vulnerabilities in Clones and Non-Cloned Code 174 12.1 Introduction ...... 174 12.2 Terminology and Metrics ...... 176 12.2.1 Security Vulnerabilities ...... 176 12.2.2 Metrics ...... 176 12.3 Study Setup ...... 179 12.3.1 Subject Systems ...... 179 12.3.2 Code Clone Detection ...... 179 12.3.3 Security Vulnerability Detection ...... 181 12.3.3.1 Flawfinder ...... 181 12.3.3.1.1 Limiting false positives in Flawfinder ...... 181 12.3.3.1.2 Effectiveness of customized configuration ...... 182 12.3.3.2 Cppcheck ...... 182

12.4 Analysis and Findings ...... 184 12.4.1 Vulnerabilities in Clones vs. Non-Cloned Code ...... 184 12.4.2 Densities of Vulnerabilities in Different Types of Clones ...... 187 12.4.3 Severity of Security Risks in Cloned and Non-Cloned Code ...... 188 12.4.4 Severity of Security Risks in Different Types of Clones ...... 189 12.4.5 Frequently Encountered Categories of Vulnerabilities ...... 189 12.5 Threats to Validity ...... 192 12.6 Related Work ...... 193 12.7 Summary ...... 194

13 Characteristics of Buggy Code Clones 195 13.1 Introduction ...... 195 13.2 Study Setup ...... 196 13.2.1 Subject Systems ...... 197 13.2.2 Clone Detection ...... 197 13.2.3 Distinguishing Buggy Clones ...... 197 13.2.4 Computation of Source Code Quality Metrics ...... 198 13.3 Analysis and Findings ...... 198 13.3.1 Complexity of Buggy and Non-buggy Clones ...... 200 13.3.2 Size Difference of Buggy and Non-buggy Clones ...... 201 13.3.3 Documentation in Buggy and Non-buggy Clones ...... 202 13.3.4 Coupling in Buggy and Non-buggy Clones ...... 203 13.4 Threats to Validity ...... 204 13.5 Related Work ...... 205 13.6 Summary ...... 207

14 Exposing Bug-fix Patterns 208 14.1 Introduction ...... 208 14.2 Methodology ...... 210 14.2.1 Subject Systems ...... 210 14.2.2 Collecting Bug-fixing Commits ...... 210 14.2.3 Generating Abstract Syntax Tree of Bug-fixing Changes ...... 211 14.3 Bug-fixing Edit Patterns ...... 213 14.3.1 Making Sense of GumtreeSpoon’s Outputs ...... 214 14.3.2 Mapping GSPatterns to PanPatterns ...... 215 14.3.3 Dominant Bug-fixing Edit Patterns ...... 217 14.3.3.1 Detected PanPatterns ...... 217

14.3.3.2 New bug-fixing edit patterns ...... 218 14.3.3.3 Comparative frequencies of the new patterns ...... 219 14.4 Dominant Nesting Patterns ...... 220 14.4.1 Sequential Pattern Mining of Nested Code Structures ...... 220 14.4.2 Clustering of Nesting Patterns ...... 221 14.4.2.1 Selection of clustering algorithm ...... 221 14.4.2.2 Determining optimal number of clusters ...... 221 14.4.2.3 Defining a distance function for k-medoids ...... 221 14.4.3 Characterization of the Clusters by Experts ...... 222 14.4.4 Mining Results ...... 222 14.5 Threats to Validity ...... 224 14.6 Related Work ...... 225 14.6.1 Identifying bug-fixing edit patterns ...... 225 14.6.2 Identifying code-change patterns ...... 227 14.7 Summary ...... 227

15 Conclusion 229 15.1 Summary ...... 229 15.2 Contributions ...... 231 15.3 Limitations ...... 233 15.4 Future Research Directions ...... 233

Bibliography 236

A Publications Out of This Dissertation Research 261 A.1 Co-authorship ...... 263

Vita ...... 265

List of Figures

1.1 Research problem and thesis organization ...... 2

2.1 Simple bi-dimensional model of emotions ...... 11

4.1 Steps to create the domain dictionary for SentiStrength-SE ...... 40 4.2 Default configuration of parameters in our SentiStrength-SE ...... 44 4.3 Procedural steps in developing a new alternative domain dictionary ...... 62

6.1 Simple bi-dimensional model of emotions ...... 89

7.1 Two-dimensional emotion classification model...... 105

8.1 Procedural Steps of Our Empirical Study ...... 121 8.2 Distribution of mean positive, negative, and cumulative emotional scores in commit messages dealing with different types of tasks ...... 125 8.3 Hierarchical agglomerative clustering of 30 developers enumerated as 1, 2, 3, … , 30 127 8.4 Distribution of mean positive, negative, and cumulative emotional scores in commit comments posted in different days of week ...... 128 8.5 Distribution of mean positive, negative, and cumulative emotional scores in commit comments posted in different periods of day ...... 129 8.6 Distribution of mean cumulative emotional scores of commit comments in groups of different lengths ...... 130

9.1 Procedural steps of our empirical study ...... 137 9.2 Distributions of positive and negative emotional scores in commit messages dealing with bug-introducing and bug-fixing tasks ...... 138 9.3 Distributions of positive and negative emotional scores in (a) bug-introducing and (b) bug-fixing commit messages ...... 139 9.4 Distributions of (a) positive emotional scores and (b) negative emotional scores, in bug-introducing and bug-fixing commit messages ...... 141

11.1 Procedural Steps of the Empirical Study ...... 164 11.2 Distribution of Vulnerabilities in Cloned and Non-cloned Code ...... 166 11.3 Distribution of LOC in Cloned and Non-cloned Code ...... 166 11.4 Densities of Vulnerabilities w.r.t. BOC ...... 166 11.5 Density of Vulnerabilities w.r.t. LOC ...... 167

11.6 Hierarchical Agglomerative Clustering of Vulnerabilities ...... 170

12.1 Procedural Steps of the Empirical Study ...... 179 12.2 Distribution of  (a) in Cloned and Non-cloned Code and (b) in Different Types of Clones ...... 184 12.3 Densities of  w.r.t. BOC ...... 185 12.4 Distribution of LOC (a) in Cloned and Non-cloned Code and (b) in Different Types of Clones ...... 185 12.5 Densities of  w.r.t. LOC ...... 186 12.6 Distribution of  (a) in Cloned and Non-cloned Code and (b) in Different Types of Clones ...... 186 12.7 Densities of  w.r.t. LOC ...... 187 12.8 Cumulative Severity Scores of  (a) in Cloned and Non-cloned Code and (b) in Different Types of Clones ...... 188 12.9 Risk Severity Scores per KLOC in Cloned and Non-cloned Code ...... 189 12.10 Densities of top six CWE categories of  per KLOC ...... 190 12.11 Densities of top five CWE categories of  per KLOC ...... 191

13.1 Procedural Steps of the Empirical Study ...... 196 13.2 Distribution of Complexity Metrics’ Values in Buggy and Non-buggy Cloned Code 200 13.3 Size Metrics’ Values in Buggy and Non-buggy Clones ...... 202 13.4 Documentation Metrics’ Values in Buggy and Non-buggy Clones ...... 203 13.5 Coupling Metrics’ Values in Buggy and Non-buggy Clones ...... 204

14.1 Procedural steps to identify edit patterns and nesting patterns of bug-fixing changes 209 14.2 (a) Changing a literal in an if statement to fix a bug and the presentations of the bug-fixing change using (b) GumTree and (c) GumtreeSpoon ...... 212 14.3 (a) Adding an if statement as a precondition to fix a bug and (b) corresponding representation of the bug-fixing change using GumtreeSpoon ...... 212 14.4 (a) Updating parameter type (float to int) of a method to fix a bug and (b) corresponding representation of the bug-fixing change using GumtreeSpoon ...... 214 14.5 Insertion of a constructor to fix a bug ...... 218 14.6 Deletion of a throw statement to fix a bug ...... 218 14.7 Updating implementation of a throw statement to fix a bug ...... 218 14.8 Changing class/target of a method call to fix a bug ...... 219 14.9 Wrapping up existing code using a for loop to fix a bug ...... 219 14.10 Steps to identify nesting patterns that host bug-fixing edits ...... 220

List of Tables

2.1 Role of a dictionary list in computation of sentimental scores in text ...... 12 2.2 Abridged Definitions of Common Code Smells ...... 14 2.3 Abridged Definitions of Common Security Vulnerabilities ...... 14

4.1 Example of annotation of an issue comment by four human raters ...... 30 4.2 The role of the dictionary lists in SentiStrength’s computation of sentimental scores in text ...... 33 4.3 Frequencies of difficulties misleading sentiment analysis ...... 34 4.4 Inter-rater disagreements in interpretation of sentiments ...... 41 4.5 Textual characteristics of the words in the datasets ...... 45 4.6 Structure of 2 × 2 contingency matrix of McNemar’s test for tools a and b .... 45 4.7 Head-to-head comparison of performances of the four tools/toolkits ...... 47 4.8 Contingency matrix of McNemar’s test in comparison between the original SentiStrength and SentiStrength-SE ...... 47 4.9 Comparison of tools’ accuracies for Set-A and Set-B issue comments ...... 50 4.10 Contingency matrix of McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength in Set-A dataset ...... 50 4.11 Contingency matrix of McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength in Set-B dataset ...... 51 4.12 Comparison of performances between SentiStrength and SentiStrength* ..... 52 4.13 McNemar’s test between SentiStrength and SentiStrength* ...... 53 4.14 Examples of assigning sentiment scores to labeled comments ...... 54 4.15 Comparison of performances of SentiStrength^O and SentiStrength* ...... 56 4.16 Contingency matrix for McNemar’s test between SentiStrength^O and SentiStrength* 56 4.17 Conversion of valence scores from [+1,+9] to [-5,+5] ...... 58 4.18 Comparison of performances of SentiStrength^W and SentiStrength* ...... 59 4.19 Contingency matrix for McNemar’s test between SentiStrength^W and SentiStrength* 60 4.20 Inter-rater disagreements in interpretation of sentiments ...... 63 4.21 Comparison of performances of SentiStrength-SE and SentiStrength-SE^N .... 63 4.22 Contingency matrix for McNemar’s test between SentiStrength-SE and SentiStrength-SE^N ...... 64 4.23 Contributions of heuristics in SentiStrength-SE ...... 66

4.24 Contingency matrix for McNemar’s test between SentiStrength-SE and SentiStrength* ...... 67 4.25 Uses of (domain-independent) tools for sentiment analysis in software engineering 74

5.1 Summary of the Datasets Used in this Study ...... 79 5.2 Tools’ Accuracies for JIC Dataset and for SOP Dataset ...... 83 5.3 Tools’ Accuracies for Code Review Comments Dataset ...... 83 5.4 Agreements between Tool-pairs in the Detection of Sentiments ...... 84 5.5 Agreements among Tool-trio in the Detection of Sentiments ...... 85

6.1 Conversion of arousal scores from [+1,+9] to [-5,+5] ...... 91 6.2 Emoticons expressing different emotions ...... 94 6.3 Interjections expressing different emotions ...... 94 6.4 Temporal terms included in the DEVA dictionary ...... 95 6.5 Task completion indication terms in DEVA ...... 95 6.6 Inter-rater disagreements in categories of emotions ...... 97 6.7 Comparison between DEVA and Baseline ...... 98 6.8 Comparison between DEVA and TensiStrength ...... 100

7.1 A customized stop-words list ...... 107 7.2 Categories of Software-specific Name Entities ...... 107 7.3 Emoticons and interjections expressing different emotions ...... 109 7.4 Number of comments in categories of emotions ...... 111 7.5 Comparison of ML algorithms in classification of emotional states ...... 112 7.6 Comparison of features in MarValous ...... 112 7.7 Comparison between DEVA and MarValous ...... 115 7.8 Contingency matrix of McNemar’s test ...... 115

8.1 Examples of commit comments and computation of their emotional scores . . . . . 123 8.2 Commits of different task categories and MWW tests between positive and negative emotions in them ...... 125 8.3 MWW tests between cumulative emotional scores of commit messages dealing with different types of tasks ...... 125 8.4 MWW tests over (d) scores of commit messages written by developers in each cluster 127 8.5 MWW tests over cumulative emotional scores of commit messages written in different days of week ...... 128 8.6 MWW tests over cumulative emotional scores of commit messages written in different times of a day ...... 129

8.7 Number of commit messages with different lengths (in words) and cumulative emotional scores ...... 130

9.1 Subject Systems ...... 136 9.2 Results of MWW tests between positive and negative emotional scores of different commit types posted in different times of a day ...... 140 9.3 Results of MWW tests of emotional scores between bug-introducing and bug-fixing commit messages posted in different times of a day ...... 141

11.1 Abridged Description of Major Vulnerabilities Found in the Subject Systems . . . 162 11.2 NiCad Settings For Code Clone Detection ...... 165 11.3 MWW tests over densities of vulnerabilities w.r.t. BOC in different categories of clones ...... 168 11.4 MWW tests over densities of vulnerabilities w.r.t. LOC in different categories of clones ...... 168 11.5 Vulnerabilities Dominating in Cloned and Non-cloned Code ...... 170 11.6 MWW tests between density-distributions in cloned and non-cloned code for vulnerabilities of each cluster ...... 171

12.1 Security vulnerabilities frequently identified in the systems ...... 177 12.2 Subject systems and their LOC in cloned and non-cloned code ...... 180 12.3 NiCad Settings For Code Clone Detection ...... 180 12.4 Reduction of false positives in vulnerability detection using the customized config- uration of Flawfinder ...... 182 12.5 Detected security vulnerabilities and their severity in cloned and non-cloned code . 183 12.6 MWW tests over densities of top six CWE categories of  per KLOC in cloned and non-cloned code ...... 190 12.7 MWW tests over densities of top five CWE categories of  per KLOC in cloned and non-cloned code ...... 191

13.1 Subject Systems ...... 197 13.2 Source Code Quality Metrics Used in this Study ...... 199 13.3 MWW Tests over the Distribution of Complexity Metrics for Buggy and Non-buggy Clones ...... 201 13.4 MWW Tests over Average Values of Size Metrics in Buggy and Non-buggy Cloned Code...... 201 13.5 MWW Tests over Average Values of Documentation Metrics in Buggy and Non- buggy Cloned Code ...... 202 13.6 MWW Tests over Average Values of Coupling Metrics in Buggy and Non-buggy Method Clones ...... 203

14.1 Subject Systems ...... 211 14.2 Distributions of PanPatterns identified in our dataset using GumtreeSpoon .... 216 14.3 Distributions of New Edit Patterns Identified in our Dataset using GumtreeSpoon . 217 14.4 Dominant Nesting Patterns That Host Bug-Fixing Edits ...... 223

Abstract

The impact of software is ever increasing as more and more systems are being software operated. Despite the usefulness of software, many instances of software failures have caused tremendous losses in lives and dollars. Software failures take place because of bugs (i.e., faults) in the software systems. These bugs cause programs to malfunction or crash and expose security vulnerabilities exploitable by malicious hackers. Studies confirm that software defects and vulnerabilities appear in source code largely due to human mistakes and errors of the developers.

Human performance is impacted by the underlying development process and by human affects, such as sentiment and emotion. This thesis examines these human affects of software developers, which have drawn recent interest in the community. For capturing developers' sentimental and emotional states, we have developed several software tools (i.e., SentiStrength-SE, DEVA, and MarValous). These are novel tools facilitating automatic detection of sentiments and emotions from software engineering textual artifacts. Using such an automated tool, the developers' sentimental variations are studied with respect to the underlying development tasks (e.g., bug-fixing, bug-introducing), development periods (i.e., days and times), team sizes, and project sizes. We expose opportunities for exploiting developers' sentiments for higher productivity and improved software quality.

While developers' sentiments and emotions can be leveraged as proactive safeguards in identifying and minimizing software bugs, this dissertation also includes in-depth studies of the relationships among various bug patterns, such as software defects, security vulnerabilities, and code smells, to find actionable insights for minimizing software bugs and improving software quality and security. Bug patterns are exposed through mining software repositories and bug databases. These bug patterns are crucial in localizing bugs and security vulnerabilities in a software codebase for fixing them, predicting portions of software susceptible to failure or exploitation by hackers, devising techniques for automated program repair, and avoiding code constructs and coding idioms that are bug-prone.

The software tools produced from this thesis are empirically evaluated using standard measurement metrics (e.g., precision, recall). The findings of all the studies are validated with appropriate tests for statistical significance. Finally, based on our experience and in-depth analysis of the present state of the art, we expose avenues for further research and development towards a holistic approach for developing improved and secure software systems.

Keywords: Software Engineering; Human affects; Sentiment; Emotion; Software Bug; Software Security; Code Quality.

Chapter 1

Introduction and Motivation

Software systems have become ubiquitous these days, and technology rarely exists today without a software interface or component. The impact of software is ever increasing as more and more systems are being software operated. Since the emergence of software systems, many instances of software failures have caused tremendous losses in lives and dollars. Software failures take place because of bugs (i.e., faults and security issues) in the software systems.

The battle against software bugs is as old as software itself. One of the main reasons why bugs exist is disappointingly simple: software is developed by humans, and humans are prone to error. Studies [1, 2] also confirm human error as one of the primary causes of software defects. As a primary cause, human error can be the key to understanding and minimizing software defects [2]. Surprisingly, much of software engineering research in the last decade is technical and deemphasizes the human factors [3, 4, 2] that are mainly responsible for human error. In this thesis, we mine and analyze one important human factor, i.e., human affect (e.g., feelings, sentiments, and emotions), that can be exploited in identifying and minimizing human error to develop high-quality software systems.

Although companies such as Facebook, Google, and Microsoft take various initiatives to ensure developers' positive affective states and thereby minimize human errors, such initiatives cannot guarantee mistake-free programming. Thus, companies also take many measures at the source code level, such as software testing [5, 6, 7, 8] and source code analysis (e.g., bug localization) [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21], to tackle software bugs. Despite those measures, a great number of software systems are regularly shipped with both known and unknown defects [22]. Indeed, there are more bugs in real-world programs than human programmers can realistically address [23]. Thus, in this thesis, we also analyze source code to identify actionable insights to minimize software bugs.

Interestingly, according to psychology, far from being random, humans' mistakes tend to fall into recurrent patterns [24]. Developers, being human, frequently exhibit patterns in making decisions and solving problems, and also in the errors/mistakes they commit. Mistakes of similar patterns are fixed in similar fashions, i.e., patterns exist in the mistakes' fixes too. Knowing and analyzing the patterns of the mistakes and their fixes can help developers avoid those mistakes, and can also help in generating solutions for them automatically. Thus, we also mine and analyze patterns in the fixes of developers' mistakes that result in software bugs.

While taking care of developers' affects and making them aware of the patterns of common mistakes can minimize software bugs and improve software quality, there will still be unexpected issues in source code. Developers' malpractices during development can lead to software bugs and poor code quality. Developers often apply the copy-paste technique when they are stressed (due to time pressure) or inexperienced. However, copy-pasting is a dangerously bad practice [25] since it may propagate software bugs, security vulnerabilities, and other issues that hamper code quality. Thus, we also mine and analyze developers' copy-paste practice to identify its impacts on software bugs and code quality.

In this thesis, we take a holistic approach to mine and analyze (i) human affects expressed in natural language texts (e.g., commit comments), and (ii) patterns of bugs identified in source code, to gain actionable insights that can be leveraged in minimizing bugs and improving the quality and security of software systems. In Figure 1.1, we pictorially summarize the research problem and divide it into the sub-problems that we address in this thesis. In the following, we discuss the research sub-problems in detail and briefly mention our contributions with respect to each sub-problem.

[Figure 1.1 depicts the research problem, its division into four sub-problems, and the chapters that address each: Sub-problem I (detecting developers' sentiments and emotions) is addressed in Chapters 4-7; Sub-problem II (understanding the impacts and roles of developers' sentiments) in Chapters 8-10; Sub-problem III (understanding the impacts of developers' copy-paste actions) in Chapters 11-13; and Sub-problem IV (understanding the fixing patterns of developers' mistakes) in Chapter 14.]

Figure 1.1: Research problem and thesis organization

1.1 Mining and Analyzing Human Affect in Software Engineering

Sentiments influence people's activities and interactions, and thus sentiments affect task quality, productivity, creativity, group rapport, and job satisfaction [26, 27, 28]. Software development, being

highly dependent on human efforts and interactions, is more susceptible to the sentiments of the practitioners. Hence, a good understanding of the developers' sentiments and their influencing factors can be exploited for effective collaborations, task assignments [29], and in devising measures to boost job satisfaction, which, in turn, can result in increased code quality and productivity [30] and reduced software bugs.

Attempts are made to capture the developers' sentiments in the workplace by means of traditional approaches such as interviews, surveys [31], and biometric measurements [32]. Capturing sentiments with the traditional approaches is more challenging for projects relying on geographically distributed team settings and voluntary contributions (e.g., open-source projects) [33, 34]. Moreover, the traditional approaches, involving direct observations and interactions with the developers, often hinder their natural workflow. Thus, to supplement or complement those traditional approaches, recent attempts detect sentiments from software engineering textual artifacts such as issue comments [35, 36, 34, 37, 38, 39, 40, 41], email contents [42, 43], and forum posts [4, 44].

1.1.1 Sub-problem I: Detecting developers’ sentiments and emotions with high accuracy

For automated extraction of sentiments from textual artifacts in the software engineering domain, three tools (i.e., SentiStrength [45], NLTK [46], and Stanford NLP [47]) have been used, with the use of SentiStrength found to be dominant [48, 49]. However, software engineering studies [35, 36, 37, 48, 50, 41, 51, 43] involving sentiment analysis repeatedly report concerns about the accuracy of those sentiment analysis tools in the detection of sentimental polarities (i.e., negativity, positivity, and neutrality) of plain text contents. For example, when applied in the software engineering domain, SentiStrength and NLTK are respectively reported to have only 29.56% and 52.17% precision in identifying positive sentiments, and even lower precision of 13.18% and 23.45% respectively in the detection of negative sentiments [48, 43].

Those sentiment analysis tools are developed and trained using data from non-technical social networking media (e.g., Twitter posts, forum posts, movie reviews), and when operated in a technical domain such as software engineering, their accuracy degrades substantially. Thus, the software engineering community demands a more accurate automatic sentiment analysis tool [35, 36, 38, 48, 50, 52, 41, 53, 43].

1.1.2 Contribution

To address Sub-problem I, first, using a large benchmark dataset, we carry out an in-depth exploratory study exposing the difficulties of automatic sentiment analysis in textual content in a technical domain such as software engineering. To the best of our knowledge, this is the first investigation conducted on a public benchmark dataset to identify challenges of sentiment analysis

in software engineering. Then, we propose techniques and realize them in SentiStrength-SE, a prototype tool that we develop for improved sentiment analysis in software engineering textual content. The tool is also made freely available online [54]. We further conduct a qualitative evaluation of our tool. Based on the exploratory study and the qualitative evaluation, we outline plans for further improvements in automated sentiment analysis in the software engineering area.

We also compare the performance of SentiStrength-SE against its later-developed domain-specific counterparts (e.g., Senti4SD [55], EmoTxt [56]) to identify the strengths and weaknesses of the domain-specific tools for future considerations. Finally, we select four dictionaries, developed using four distinct methods, and quantitatively compare their sentiment detection performances on software engineering textual artifacts [57] to identify which methods have higher/lower potential to perform well in constructing a domain dictionary to fit in SentiStrength-SE.

Then, we develop another tool, DEVA [58], that is capable of detecting four emotional/sentimental states: excitement, stress, depression, and relaxation. For empirical evaluation of DEVA, we construct a ground-truth dataset consisting of 1,795 JIRA issue comments, each of which is manually annotated by three human raters. This tool, along with the ground-truth dataset, is also made freely available online [59].

Next, we develop MarValous, the first machine learning (ML) based tool for this purpose, which performs significantly better than DEVA in detecting the individual emotional states of excitement, stress, depression, relaxation, and neutrality from software engineering texts. MarValous uses nine preprocessing steps and seven features, and incorporates nine popular and effective supervised ML algorithms for emotion/sentiment analysis.

1.1.3 Sub-problem II: Understanding developers’ sentiments

A few studies have been performed in the past to understand the role of sentiments in software development and engineering. Some of those earlier studies address when and why employees get affected by sentiments [26, 34, 4, 41, 43], whereas other work addresses how [60, 61, 62, 63, 31] sentiments impact the employees' performance at work. Despite these earlier studies, software engineering practices still lack theories and methodologies for addressing human factors, including sentiments [3, 4]. Hence, the community calls for research on the roles and impacts of sentiments in software engineering [61, 28, 64].

1.1.4 Contribution

In this thesis, we conduct a quantitative empirical study of the sentimental variations in different types of development activities (e.g., bug-fixing tasks), development periods (i.e., days and times), and in projects of different sizes involving teams of varying sizes [37, 38]. The study also includes an in-depth investigation of sentiments' impacts on software artifacts (i.e., commit messages) and

an exploration of scopes for exploiting sentimental variations in software engineering activities. The findings from this work add to our understanding of the role of sentiments in software development and expose scopes for exploiting sentimental awareness in improved task assignments and collaborations, which ultimately result in high-quality software.

We also study the sentimental variations in bug-introducing and bug-fixing commit messages [65]. The results advance our understanding of the extent to which sentiments affect tasks that result in software bugs, and how such bug-introducing commits differ from bug-fixing ones in terms of the sentiments expressed in the commit messages.

1.2 Mining and Analyzing Impacts of Developers’ Copy-Paste Actions

Developers often reuse existing code by copy-paste to increase their productivity. Such a reuse mechanism typically results in duplicate or very similar code fragments, commonly known as code clones. Aside from such deliberate cloning, unintentional clones are also created for various reasons under diverse circumstances [66, 67]. Software systems typically have 9%-17% [68] cloned code, and the proportion is sometimes found to be even 50% [69] or higher [70].

Despite a few benefits [71] of cloning, code clones are detrimental in most cases [71, 72, 73]. Code cloning is a notorious code smell (i.e., a symptom indicating a source of future problems) [74] that causes serious problems such as reduced code quality, code inflation, program faults, security vulnerabilities, and bug propagation [72, 75]. Clones are thus a major contributor to the high maintenance cost of software systems, and as much as 80% of software costs are spent on maintenance [76]. Therefore, it is necessary to keep the number of clones to a minimum and to remove them from source code by refactoring. However, not all clones in a software system are harmful [71], nor is it feasible to remove all clones from source code by refactoring [77, 75]. Therefore, we must distinguish the context and characteristics of clones that make them malign as opposed to benign.

Towards this goal, several studies have been performed in the past to examine or exploit the comparative stability of clones as opposed to non-cloned code [78, 79, 80, 81, 82], the relationships of clones with bug-fixing changes [25, 72, 83, 84, 85, 86, 87, 88, 89], and the impacts of clones on a program's changeability [73, 90, 91].

1.2.1 Sub-problem III: Understanding impacts of developers copy-paste action

Earlier attempts [85, 86, 87, 89] to determine the detrimental impacts of developers' copy-paste actions, which result in code clones, relied on a long-term history of bug-fixing changes preserved in version control systems. While such studies make important contributions, their approaches do not fit well for proactive clone management, especially at the early stages of the software development process, where a sufficiently long history of bug-fixing changes is not available. Therefore, to determine the detrimental impacts of code clones, we need to apply static source code analysis techniques that do not require any bug-fixing history. Moreover, software security has become one of the most pressing concerns recently. However, the security aspects of code clones have never been studied before, although copy-pasting vulnerable components and source code (i.e., code clones) multiplies security vulnerabilities [92, 93, 94].

1.2.2 Contribution

To gain insights into the detrimental implications of code cloning, we use static code analysis techniques to mine code smells and security vulnerabilities and study them in cloned and non-cloned code [95, 96]. Using statistical analyses, we derive insights into how code smells and security vulnerabilities cause software bugs and impact the code quality of cloned code compared to non-cloned code. We also conduct another study on the characteristics of buggy clones from a code quality perspective [97]. Findings from these studies add to our understanding of the characteristics and impacts of clones, which can be useful in clone-aware software development and in devising techniques for minimizing the negative impacts (e.g., software bugs) of code clones and increasing code quality.

1.3 Mining and Analyzing Fixing Patterns of Developers’ Mistakes

We also mine and analyze the fixing patterns of developers' mistakes that cause software bugs. Efforts to fix bugs consume a vast portion of the total expenses in software maintenance [98], while nearly 80% of software cost is spent on maintenance [76]. Typical bug-fixing efforts mainly involve two types of tasks: (a) localization of bugs in source code, and (b) bug-fixing edits to the source code. A deep understanding of the common bug-fixing patterns can immensely help in minimizing the efforts in both of these tasks and also contribute to devising techniques for automated program repair (APR). A good understanding of the bug-fixing patterns can also help a developer proactively avoid writing code that leads to program faults.

1.3.1 Sub-problem IV: Understanding fixing patterns of software bugs

Earlier studies on discovering bug-fix patterns remained focused on bug-fixing edit patterns, which capture bug-fixing changes to the source code at a very fine-grained level without capturing the nesting levels of the surrounding code. Thus, earlier studies are limited in two ways: (i) a very fine-grained presentation of patterns makes them difficult for a human to perceive, and (ii) those studies missed nested code structures (i.e., nesting patterns), which are an important aspect of bug-fix patterns.

1.3.2 Contribution

In this work, we capture both bug-fixing edit patterns and nesting patterns (i.e., frequent nested code structures) of bug-fixing edits through an in-depth (quantitative and qualitative) analysis of 4,653 buggy revisions of five software systems drawn from diverse application domains. We identify a total of 38 bug-fix patterns organized into 14 categories. This is the highest number of bug-fix patterns identified in a single study. Four of these patterns are completely new, and 34 of them confirm those reported in earlier studies. We also study the locations of bug-fix changes in nested code structures and identify 37 nesting patterns that hold the majority of the bug-fix edits. These nesting patterns are new (i.e., never targeted before) and add a new dimension to our understanding of bug-fix patterns.

1.4 Outline of the Thesis

In this chapter (Chapter 1), we have introduced the research problem and its sub-problems along with their backgrounds. We have also briefly mentioned the contributions with respect to the sub-problems. The remainder of this thesis is organized as follows. Chapter 2 presents the terminologies and concepts that develop the background needed to follow the rest of the thesis. Chapter 3 describes the research methodology, where we briefly describe each work with respect to the problem and motivation it addresses, and how we solve the problem. Chapter 4 to Chapter 14 address the respective sub-problems as shown in Figure 1.1.

In Chapter 4, we identify the challenges of detecting sentiment from software engineering texts and describe how the identified challenges are addressed to develop the first software engineering domain-specific sentiment analysis tool, SentiStrength-SE. We compare the performance of the SentiStrength-SE tool against its counterparts in Chapter 5. In Chapter 6, we present the first emotion analysis tool, DEVA, which is especially designed for software engineering text to capture excitement, stress, depression, and relaxation. Later, using machine learning and natural language processing techniques, we develop the tool MarValous, which is an improved version of DEVA. The development procedure of MarValous and its evaluation results are presented in Chapter 7.

Chapter 8 presents a quantitative empirical study of the sentiment variations in different types of development activities (e.g., bug-fixing tasks) and development periods (i.e., days and times), in addition to an in-depth investigation of emotions' impacts on software artifacts (i.e., commit messages) and an exploration of scopes for exploiting emotional variations in software engineering activities. In Chapter 9, we present a study of sentiment variations in bug-introducing and bug-fixing commit messages that helps in understanding the extent to which sentiments impact software development tasks. In Chapter 10, we identify the roles of human affects, i.e., sentiments and emotions, in various software engineering activities.

In Chapter 11, we describe a comparative study of different types of clones (i.e., copy-pasted code) and non-cloned code on the basis of their code smells. To explore and understand the security vulnerabilities and their severity in different types of clones compared to non-cloned code, we present a study in Chapter 12. Chapter 13 presents a comparative study to distinguish the characteristics (from a code quality perspective) of buggy and non-buggy clones. Chapter 14 presents an in-depth quantitative and qualitative analysis of buggy revisions for understanding the common patterns of bug-fixing changes. Finally, Chapter 15 concludes this thesis.

Major parts of this thesis have already been published in peer-reviewed journals and international conferences. A list of publications that are outcomes of this dissertation research is presented in Appendix A.

Chapter 2

Background

In this chapter, we introduce the terminologies and concepts necessary to develop the background that will help in following the remainder of this thesis. We begin with the definitions of software engineering terminologies in Section 2.1. Section 2.2 and Section 2.3 respectively describe the terminologies and concepts related to sentiments and to developers' copy-paste practice. In Section 2.4 and Section 2.5, we respectively introduce code smells and security vulnerabilities, which can spread in source code due to developers' copy-paste practice. Finally, Section 2.6 summarizes the chapter.

2.1 Software Engineering Terminologies

Software Bug: A software bug is an error, flaw, failure, or fault in a computer program or system that causes it to produce an incorrect or unexpected result, or to behave in unintended ways.

Issue Repository: An issue repository provides a great facility to describe and keep track of any type of issue, such as tasks, enhancements, and bugs, for a software project. In an issue repository, an issue can be shared and discussed among the team members in written form.

Issue Comments: In an issue repository, all the descriptions of issues and the discussions among team members (about those issues) happen in written form; these are known as issue comments.

Version Control System: A version control system keeps track of every modification to the code in a special type of software repository. It helps a software team manage changes to source code over time.

Code Commit: In version control systems, a code commit adds the latest code changes to a shared code repository.

Commit Messages: While committing code in a version control system, a developer usually describes or illustrates the change using a textual message, which is known as a commit message or commit comment.

Bug-introducing Commit: A code commit that introduces a bug in the software is known as a bug-introducing commit.

Bug-fixing Commit: A code commit that fixes a bug is known as a bug-fixing commit.

2.2 Sentiment Related Terminologies

Sentiment: According to the Merriam-Webster online dictionary [99], sentiment refers to an attitude, thought, or judgment prompted by feeling. In other words, sentiment is the attractiveness (or adverseness) to an event, object, or situation [100]. There are mainly two types of sentiments as follows.

1. Positive sentiment, which expresses a positive feeling toward an object or action. For example, in the following text, the writer expresses positive sentiment.

Example: I love to fix the bug in the software.

2. Negative sentiment, which expresses a negative feeling toward an object or action. For example, in the following text, the writer expresses negative sentiment.

Example: The bug seems very nasty to solve.

A feeling that expresses neither positive nor negative sentiment is considered neutral.

Emotion: Emotions are the expressions of affect and/or feelings. As in this thesis, many studies also use the concepts of emotion, sentiment, feeling, and affect interchangeably [101, 102].

Arousal: Arousal represents the intensity of emotional activation [103]. It is the sensation of being mentally awake and reactive to stimuli, i.e., vigor and energy versus fatigue and tiredness [104]. In the following example, the three exclamation marks at the end of the comment express the comment writer's high arousal.

Example: "I am very happy to see it is useful and used !!!"

2.2.1 Dimensional Framework of Emotions

Here we present a simple bi-dimensional model [105, 61] of emotions, which is a variant of the dimensional framework commonly known as the VAD (aka PAD) model [106]. In the bi-dimensional model, as shown in Figure 2.1, the horizontal dimension presents the sentimental polarities, also known as valence, and the vertical dimension indicates the levels of arousal. The dimensions are bipolar, where the valence dimension ranges from negative to positive and the arousal dimension ranges from low to high. Many emotional states of a person can be determined by combining valence and arousal. For example, positive valence and high arousal, in combination, indicate the emotional/sentimental state of excitement. However, instead of deriving many emotional states, for simplicity a large number of studies use a set of four major classes of emotional states that include excitement, stress, depression, and relaxation [61, 107].

[Figure: a two-dimensional plane whose vertical axis runs from Low Arousal to High Arousal and whose horizontal axis runs from Negative Valence to Positive Valence; the four quadrants are labeled Stress (negative valence, high arousal), Excitation (positive valence, high arousal), Depression (negative valence, low arousal), and Relaxation (positive valence, low arousal).]

Figure 2.1: Simple bi-dimensional model of emotions

As shown in Figure 2.1, the four emotional states are very distinct, as each state constitutes emotions that are quite different from the emotions of the other states [105]. Thus, the model is unambiguous in recognizing emotions, and it is simple and easy to understand.
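To make the mapping from the two dimensions to the four states concrete, the following is a minimal Python sketch. It is illustrative only and not the implementation of any tool developed in this thesis; the function name classify_emotion and the signed-score convention (positive values for positive valence and high arousal, negative values for negative valence and low arousal) are assumptions made for this example.

```python
# Minimal sketch (not the thesis tools' implementation) of mapping a signed
# valence/arousal pair onto the four quadrant states of Figure 2.1.
# Assumed convention: positive values mean positive valence / high arousal,
# negative values mean negative valence / low arousal, zero means no clear signal.

def classify_emotion(valence: float, arousal: float) -> str:
    """Return the emotional state for a signed (valence, arousal) pair."""
    if valence == 0 or arousal == 0:
        return "neutral"                     # no clear quadrant
    if valence > 0:
        return "excitement" if arousal > 0 else "relaxation"
    return "stress" if arousal > 0 else "depression"


if __name__ == "__main__":
    print(classify_emotion(+3, +2))          # excitement
    print(classify_emotion(-4, +3))          # stress
    print(classify_emotion(-2, -3))          # depression
    print(classify_emotion(+2, -1))          # relaxation
```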

2.2.2 A Popular Sentiment Analysis Technique in Software Engineering

Sentiment analysis using a lexicon-based technique on a given piece of text c (e.g., a commit message) computes a pair ⟨pc, nc⟩ of integers, where +1 ≤ pc ≤ +5 and −5 ≤ nc ≤ −1. Here, pc and nc respectively represent the positive and negative sentimental scores for the given text c. A given text c is considered to have positive emotions if pc > +1. Similarly, a text is held to contain negative emotions when nc < −1. Note that a given text can exhibit both positive and negative emotions at the same time, and a text is considered emotionally neutral when its emotional scores appear to be ⟨+1, −1⟩.

Procedure to compute sentimental scores: A lexicon-based sentiment classifier (LSC) maintains a dictionary of several lists of words and phrases as its key dictionaries to compute sentiments in texts. Among these lists, the sentimental words list, the list of booster words, the list of phrases, and the list of negation words play a vital role in the computation of sentiments. The entries in all these lists except the list of negation words are pre-assigned sentimental scores. The negation words in the fourth list are used to invert the sentimental polarity of a term when the term is located after a negation word in a text. For an input sentence, an LSC extracts the individual words from the sentence and searches for each of them in the sentimental words list to retrieve the corresponding sentimental scores. A similar search is made in the list of booster words to strengthen or weaken the sentimental scores.

Table 2.1: Role of a dictionary list in the computation of sentimental scores in text

Sample Sentence | ρc | νc | Dictionary List in Use | Explanation
It's a good feature. | 2 | -1 | Sentimental Words | The sentimental score of the word 'good' is pre-assigned to +2; so the sentence is assigned the positive score +2.
It's a very good feature. | 3 | -1 | Booster Words | As the booster word 'very' is used before the sentimental word, the sentence is assigned the positive score +3.
It's not a good feature. | 1 | -2 | Negations | The sentimental polarity of the sentimental word is inverted here due to the use of the negation word 'not' before the sentimental word.
It's a killer feature. | 2 | -1 | Phrases | "killer feature" is a phrase in the dictionary with positive score +2. Although the word 'kill' carries negative sentiment, its effect is overridden by the sentimental score of the enclosing phrase.

The list of phrases is used to distinguish groups of words as commonly used phrases. When such a phrase is identified, the sentimental score of the phrase overrides the sentimental scores of the individual words that constitute the phrase. The examples in Table 2.1 articulate how an LSC depends on a dictionary list for computing sentimental scores in plain texts.

2.3 Terminologies Related to Developers’ Copy-paste Actions

As stated earlier, developers' copy-paste actions can result in duplicate or similar code fragments, which are commonly known as code clones. In the following, we present the definitions of different types of code clones. Type-1 Clones: Identical pieces of source code with or without variations in whitespaces (i.e., layout) and comments are called Type-1 clones [67]. Type-2 Clones: Type-2 clones are syntactically identical code fragments with variations in the names of identifiers, literals, types, layout and comments [67]. Type-3 Clones: Code fragments, which exhibit similarities as of Type-2 clones and also allow further differences such as additions, deletions or modifications of statements, are known as Type-3 clones [67]. Notice that by the definitions above, Type-2 clones include Type-1 while Type-3 clones include both Type-1 and Type-2. Let T1, T2, and T3 respectively denote the sets of Type-1, Type-2, and Type-3 clones in a software system. Mathematically, T1 ⊆ T2 ⊆ T3. Thus, we further define two subsets of Type-2 and Type-3 clones as follows.

Pure Type-2 Clones: The set of pure Type-2 clones includes only those Type-2 clones that do not exhibit Type-1 similarity. Mathematically, T2^p = T2 − T1, where T2^p denotes the set of pure Type-2 clones. Pure Type-3 Clones: The set of pure Type-3 clones includes only those Type-3 clones that do not exhibit similarities at the levels of Type-1 or Type-2 clones. Mathematically, T3^p = T3 − T2, where T3^p denotes the set of pure Type-3 clones. An illustrative example of the three clone types follows.
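As an illustration of these definitions, consider the following hypothetical Java fragments (the method names and logic are invented for the example): the second method is a Type-1 clone of the first, the third is a pure Type-2 clone, and the fourth is a pure Type-3 clone.

// Hypothetical examples of the clone types defined above.
public class CloneTypeExamples {

    // Original fragment.
    int sumPositive(int[] values) {
        int total = 0;
        for (int v : values) {
            if (v > 0) total += v;
        }
        return total;
    }

    // Type-1 clone: the body is identical to the original apart from layout and comments
    // (the enclosing method name differs only so that the class compiles).
    int sumPositiveCopy(int[] values) {
        int total = 0; // running sum
        for (int v : values) { if (v > 0) total += v; }
        return total;
    }

    // Pure Type-2 clone: identifiers and types are renamed, but the syntactic structure is the same.
    long sumPositiveAmounts(long[] amounts) {
        long sum = 0;
        for (long a : amounts) {
            if (a > 0) sum += a;
        }
        return sum;
    }

    // Pure Type-3 clone: on top of Type-2 changes, statements are added and modified.
    long sumLargeAmounts(long[] amounts, long threshold) {
        long sum = 0;
        for (long a : amounts) {
            if (a > threshold) sum += a;      // modified condition
        }
        if (sum == 0) System.out.println("no amount above threshold"); // added statement
        return sum;
    }
}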

2.3.1 Clone Granularity

The definitions of all three types of clones are based on the notion of a code segment, i.e., a contiguous portion of code, and different levels of granularity have been used in the literature. For source code, the most commonly used granularities are at the level of the entire source file, class definition, method body, code block, and statements, which yield the notion of code clones of the following five types:
File clone: When two files are found to contain similar enough source code, they are called file clones.
Class clone: Two classes of object-oriented source code can be considered class clones if they have identical or near-identical code.
Function clone: Two functions are considered clones when the bodies of the functions consist of code that is similar enough.
Block clone: When two blocks of code (marked with opening and closing braces or indentation, or the like) are similar enough, they are called block clones. To consider clones at the granularity of blocks, one must decide how to deal with nested blocks (i.e., blocks inside another block).
Arbitrary statements clone: When two groups of statements at arbitrary regions of the source file are found to be similar enough, they are also regarded as clones.

2.4 Code Smell

Code smells are certain structures or patterns in code that indicate violations of fundamental design principles, which negatively impact design quality. Code smells are usually not bugs; they are not technically incorrect and do not prevent the program from functioning. Instead, they indicate weaknesses in design that may slow down development or increase the risk of bugs or failures in the future. A few commonly found code smells are listed and briefly described in Table 2.2, followed by a short illustrative snippet.

Table 2.2: Abridged Definitions of Common Code Smells

Code Smell | Definition
LawOfDemeter | Program unit needing too much knowledge about other units.
LocalVariableCouldBeFinal | Local variable assigned only once but not declared final.
ShortVariable | A field, local, or parameter with a too short name.
OnlyOneReturn | Method with more than one exit point.
IfStmtsMustUseBraces | 'if' statements without accompanying curly braces.
AssertionsShouldIncludeMessage | Assertions including no error message.
UselessParentheses | Useless parentheses in code.
IfElseStmtsMustUseBraces | 'if-else' statements without accompanying curly braces.
AvoidInstantiatingObjectsInLoops | Instantiation of new objects inside loops.
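As a concrete (and contrived) illustration of a few rules from Table 2.2, the following Java method would be flagged by PMD rules such as ShortVariable, IfStmtsMustUseBraces, OnlyOneReturn, UselessParentheses, and AvoidInstantiatingObjectsInLoops; the example is ours and is not taken from any studied system.

import java.util.ArrayList;
import java.util.List;

// A contrived example that exhibits several of the code smells listed in Table 2.2.
public class SmellyExample {

    public List<String> labels(int[] xs) {               // 'xs' -> ShortVariable
        List<String> result = new ArrayList<>();
        for (int x : xs) {                                // 'x'  -> ShortVariable
            StringBuilder sb = new StringBuilder();       // AvoidInstantiatingObjectsInLoops
            if (x < 0)                                    // IfStmtsMustUseBraces: no curly braces
                return result;                            // OnlyOneReturn: an extra exit point
            sb.append(((x)));                             // UselessParentheses
            result.add(sb.toString());
        }
        return result;
    }
}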

2.5 Security Vulnerabilities

A software security vulnerability is defined as a weakness in a software system that can lead to a compromise in the integrity, availability or confidentiality of that software system. For example, buffer overflow and dangling pointers are two well-known security vulnerabilities. The cybersecurity community maintains a community-developed list of common software security vulnerabilities where each category of vulnerability is enumerated with a CWE (Common Weakness Enumeration) number [108]. For example, CWE-120 refers to those vulnerabilities that fall into the CWE category of classic buffer overflow. More examples of security vulnerabilities along with their CWE enumerations are presented in Table 2.3, followed by a short illustrative snippet.

Table 2.3: Abridged Definitions of Common Security Vulnerabilities

Security Vulnerability | Description
Buffer Overflow | An application attempts to write data past the end of a buffer (CWE-120).
Uncontrolled Format String | Submitted data of an input string is evaluated as a command by the application (CWE-134).
Integer Overflow | The result of an arithmetic operation exceeds the maximum size of the integer type used to store it (CWE-190).
Null Pointer Dereference | Dereference of a pointer that is null (CWE-476).
Memory Leak | Allocated memory is not released (CWE-401).
Null Termination Errors | A string is incorrectly terminated (CWE-170).
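For illustration, two of the weaknesses in Table 2.3 can be reproduced in a short, contrived Java snippet: an integer overflow (CWE-190) and a null pointer dereference (CWE-476). Buffer overflows (CWE-120) and format string errors (CWE-134), in contrast, arise mainly in languages such as C/C++. The code below is ours and purely illustrative.

import java.util.HashMap;
import java.util.Map;

// Contrived illustrations of two CWE categories from Table 2.3.
public class VulnerabilityExamples {

    // CWE-190: the product of two moderately large ints silently wraps around.
    static int totalBytes(int itemCount, int itemSize) {
        return itemCount * itemSize;            // 50_000 * 50_000 overflows a 32-bit int
    }

    // CWE-476: Map.get returns null for a missing key and is dereferenced without a check.
    static int quotaFor(Map<String, Integer> quotas, String user) {
        Integer quota = quotas.get(user);
        return quota.intValue();                // NullPointerException if 'user' is absent
    }

    public static void main(String[] args) {
        System.out.println(totalBytes(50_000, 50_000));        // prints a negative number
        System.out.println(quotaFor(new HashMap<>(), "bob"));  // throws NullPointerException
    }
}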

2.6 Summary

In this chapter, we have introduced the terminologies and preliminary concepts that help in forming the foundation to understand the thesis. At the beginning of the chapter, we have presented different terminologies of software engineering. Then, we have introduced the terminologies related to sentiments. We have also introduced the two-dimensional emotional model and discussed a popular lexicon-based sentiment analysis technique. Later, we have presented definitions of different types of code clones and clone granularity. We have also described how code clones can appear in the source code due to the developers' intention or even without their awareness. Finally, we have introduced code smells and security vulnerabilities along with examples.

Chapter 3

Research Methodology

In this chapter, we briefly describe the research methodologies followed to address the problem defined in Chapter 1. Our research methodology for addressing the problem comprises the following aspects: (i) describing a problem's background, (ii) our objective/solution, (iii) the methodology to fulfill the objective/solution, and (iv) the results. To address sub-problem I (i.e., accurately detecting sentiment and emotion), we develop three state-of-the-art software engineering domain-specific tools (i.e., SentiStrength-SE, DEVA, and MarValous), which are described in subsection 3.1.1, subsection 3.1.4, and subsection 3.1.5 respectively. We further address sub-problem I by comparing our SentiStrength-SE against its domain-specific counterparts in subsection 3.1.2. In subsection 3.1.3, we compare four dictionaries, developed using four distinct methods, to compare their sentiment detection performances in software engineering textual artifacts. To address sub-problem II, respectively in subsection 3.1.6 and subsection 3.1.7, we describe two studies conducted to understand and exploit sentiments in software engineering activities. To address sub-problem III (related to developers' copy-paste actions), we present three studies in subsection 3.2.1, subsection 3.2.2, and subsection 3.2.3 respectively. To address sub-problem IV, we present the study in Section 3.3. Finally, we summarize the chapter in Section 3.4.

3.1 Mining and Analyzing Developers’ Sentiments

In this section, first, we describe the work performed to advance state-of-the-art tools for sentiment analysis in software engineering. Then, we describe the empirical studies where we mine and analyze sentiments expressed in software engineering textual artifacts.

3.1.1 Improving Sentiment Analysis in Software Engineering Text

Problem Background: Here we address sub-problem I, which we have described along with its background in subsection 1.1.1. To recall, the software engineering community demands a more accurate and automatic sentiment analysis tool. Objective: To address sub-problem I, first, we aim to identify the difficulties responsible for the low accuracy of general-purpose sentiment tools when applied in the software engineering domain.

Then, the identified difficulties are carefully addressed to develop a tool for improved sentiment analysis in text especially designed for application in the software engineering domain. Methodology: Using a benchmark dataset, we carry out an in-depth exploratory study and expose 12 difficulties in automatic sentiment analysis in textual content in software engineering. To overcome the majority of those difficulties, we develop a domain dictionary specific to software engineering text. We also develop a number of heuristics to address some of the other identified difficulties. Our new domain dictionary and the heuristics are integrated into SentiStrength-SE, a tool we develop for improved sentiment analysis in textual contents in a technical domain, especially in software engineering. Results: Over a large dataset consisting of 5,600 issue comments, we carry out quantitative comparisons of our domain-specific SentiStrength-SE with the three most popular domain-independent tools/toolkits (i.e., NLTK [46], Stanford NLP [47], and SentiStrength [45]). The empirical comparisons suggest that our domain-specific SentiStrength-SE is significantly superior to its domain-independent counterparts in detecting sentiments in software engineering textual artifacts.

3.1.2 A Comparison of Domain Specific Sentiment Analysis Tools

Problem Background: Sentiment Analysis (SA) in software engineering (SE) text has drawn immense interest recently. The poor performance of general-purpose SA tools, when operated on SE text, has led to the recent emergence of domain-specific SA tools specially designed for SE text. However, these domain-specific tools were tested on a single dataset and their performances were compared mainly against general-purpose tools. Thus, two things remain unclear: (i) how well these tools really work on other datasets, and (ii) which tool to choose in which context. Objective: We aim to answer the following two questions: (i) Can we identify a tool which shows the highest accuracy across different datasets? (ii) To what extent do the different sentiment analysis tools (dis)agree with each other? Methodology: We select two machine learning based tools, Senti4SD [55] and EmoTxt [56], and a rule-based tool, SentiStrength-SE [109, 110], that are designed to detect sentiments in software engineering texts. Then, we operate those three tools on three ground-truth datasets collected from JIRA issue comments [52], Stack Overflow posts [55] and code review comments [111]. For each tool, the accuracy of sentiment detection is measured in terms of precision, recall, and F-score separately computed for each of the three sentimental polarities (i.e., positivity, negativity and neutrality). We also compute agreements among the tools for each sentiment. Results: Our study reveals that the individual tools exhibit their best performance on the dataset on which they were originally tested at the time of their release. The overall accuracies of the tools tend to decrease when they are operated on a different dataset. The accuracies of the tools largely vary across

different datasets and sentimental polarities. Thus, none of the tools demonstrates substantially superior accuracies across sentiments and datasets. However, SentiStrength-SE is found to have consistently exhibited the highest recall and F-score while maintaining competitive precision across all the datasets and sentiments. From the agreement analysis among the tools, we find that the tools' agreements largely vary (between 49.74% and 94.70%) depending on the datasets they are operated on and the sentimental polarities they detect. The tools' agreements remain the lowest in the detection of negative sentiments. Their accuracies also remain lower in the detection of negative sentiments compared to non-negative sentiments. Thus, we suspect that all the tools more or less struggle in accurately detecting negative sentiments in SE text.

3.1.3 A Comparison of Dictionary Building Methods for Sentiment Analysis

Problem Background: Sentiment Analysis (SA) in Software Engineering (SE) texts suffers from low accuracy primarily due to the lack of an effective dictionary. The use of a domain-specific dictionary can improve the accuracy of SA in a particular domain. To build a dictionary, several approaches [112, 113, 114, 115] have been attempted in the past, resulting in variations in their performances [114, 116, 115]. The construction of a domain dictionary is a tedious task [115] and it is often difficult to know in advance which approach for dictionary building can suit a particular domain better [113]. A few studies were conducted on texts from non-technical domains (e.g., movie and restaurant reviews) to find out appropriate dictionaries for those domains [114, 117]. However, no such study was performed for a technical domain such as SE. Objective: To identify the dictionary building methods which have higher/lower potential to perform well in constructing a domain dictionary for SA in SE texts. Methodology: In this study, we use our recently developed SA tool SentiStrength-SE [110]. This lexical tool is developed in a modular manner to allow replacing its default dictionary with a different one. We study four different dictionaries (SentiStrength [45], AFINN [118], MPQA [119], and VADER [120]). We select these four dictionaries based on how well established they are in SA and the recency of their development. In addition, we make sure that those selected dictionaries are developed using distinct methodologies. We use a benchmark dataset [52], which is the only publicly available such dataset in the SE domain [110, 52]. In our work, we use the Group-2 and Group-3 portions of the dataset [52] containing 1,600 and 4,000 issue comments extracted from the JIRA issue tracking system. For each of the four dictionaries, we configure SentiStrength-SE to use that particular dictionary and run the tool on the issue comments. The outputs of the tool are compared with the annotated benchmark dataset to compute the performance of each dictionary in terms of precision, recall (and F-score) in the detection of positive, negative, and neutral sentiments.

Results: Considering the overall average accuracy, the SentiStrength dictionary appears to have performed the best, followed by AFINN and VADER. Although AFINN performs slightly better than SentiStrength in detecting positive and neutral sentiments, its performance is much worse compared to SentiStrength in the detection of negative sentiments. Thus, AFINN falls behind the SentiStrength dictionary in overall accuracy. It is interesting that MPQA is found to have performed the worst, although the dictionary incorporates contextual information and subjectivity. SE text being informal, containing technical jargon, and often including misspelled words and grammatically incorrect sentences can be among the reasons for MPQA's poor performance in our study. Thus, it appears that simple lexicon-based approaches for dictionary creation work better for SA in SE text.

3.1.4 Detection of Emotions in the Valence-Arousal Space

Problem Background: Software engineering specific sentiment analysis tools are limited in capturing sentiments at the necessary depth [50]. Existing approaches are able to detect valence (i.e., positivity and negativity of emotional polarities) only and fail to capture arousal or specific sentimental/emotional states such as excitement, stress, depression, and relaxation. At work, software developers frequently experience these sentiments [31], which can be attributed to their work progress. For example, a developer typically feels relaxed if he makes enough progress in his assigned jobs. Otherwise, the developer feels stressed. Thus, these sentiments need to be identified [121] with improved accuracy where the existing approaches fall short [50]. Objective: We develop an emotion analysis tool DEVA, which is capable of capturing the aforementioned sentimental/emotional states through the detection of both arousal and valence in software engineering text. Methodology: DEVA applies a dictionary-based lexical approach particularly designed for operation on software engineering text. For capturing both arousal and valence, the tool uses two separate dictionaries (an arousal dictionary and a valence dictionary) that we develop by exploiting a general-purpose dictionary and two domain dictionaries especially crafted for software engineering text. DEVA also includes a preprocessing phase and seven heuristics to improve the accuracy of detection of those sentiments. Results: For an empirical evaluation of DEVA, we construct a ground-truth dataset consisting of 1,795 JIRA issue comments, each of which is manually annotated by three human raters. This dataset is also a significant contribution to the community. From a quantitative evaluation using this dataset, DEVA is found to have achieved 82.19% precision and 78.70% recall. We also implement a baseline approach and compare that against DEVA. A recently released similar (but not identical) tool, TensiStrength [122], is also compared against our DEVA. From the comparisons, DEVA is found substantially superior to both the baseline and TensiStrength.
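The quadrant logic that such a valence-arousal approach ultimately relies on can be sketched as below; the thresholds, score ranges, and method names are hypothetical and are not taken from DEVA's implementation.

// Hypothetical mapping from (valence, arousal) scores to the four emotional states
// of the bi-dimensional model (Figure 2.1). The threshold values are illustrative only.
public class ValenceArousalSketch {

    enum EmotionalState { EXCITEMENT, STRESS, RELAXATION, DEPRESSION, NEUTRAL }

    // Assumed ranges: valence in [-5, +5] (0 means no detectable polarity), arousal in [1, 5].
    static EmotionalState classify(int valence, int arousal) {
        if (valence == 0) return EmotionalState.NEUTRAL;
        boolean highArousal = arousal >= 3;   // assumed cut-off between low and high arousal
        if (valence > 0) {
            return highArousal ? EmotionalState.EXCITEMENT : EmotionalState.RELAXATION;
        }
        return highArousal ? EmotionalState.STRESS : EmotionalState.DEPRESSION;
    }

    public static void main(String[] args) {
        // "I am very happy to see it is useful and used!!!" -> positive valence, high arousal
        System.out.println(classify(+4, 5));   // EXCITEMENT
        // A comment expressing frustration over a lingering blocker -> negative valence, high arousal
        System.out.println(classify(-3, 4));   // STRESS
    }
}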

3.1.5 Improving Detection of Emotions in the Valence-Arousal Space

Problem Background: While the dictionary and rule based emotion mining tool DEVA performs better compared to its counterparts, it comes with a few limitations, such as out-of-vocabulary words and unseen rules, which are specific to dictionary and rule based tools. Such limitations can be overcome using Machine Learning based techniques. Objective: We develop the first Machine Learning (ML) based tool MarValous for improved detection of the four emotional states excitement, stress, depression, and relaxation expressed in software engineering texts. Methodology: For improved classification performance, MarValous consists of two major modules: (i) data preprocessing and (ii) feature selection. In the preprocessing phase, we sanitize the input text to get rid of probable noises (e.g., code snippets, URLs, and numeric expressions), which otherwise could mislead classification. From the sanitized texts, we select seven features that include (i) n-grams, (ii) emoticons, (iii) interjections, (iv) exclamation marks, (v) uppercase words (e.g., GOOD), (vi) elongated words (e.g., goood), and (vii) use of +1 and −1 in sentences (it is a common practice of developers to put +1 and −1 to express positive and negative emotions in comments while discussing technical issues among themselves in Stack Overflow and JIRA). Then, we select the following nine ML algorithms based on their popularity in sentiment/emotion classification: (i) Adaptive Boosting (AB) [105], (ii) Decision Tree (DT) [105, 123], (iii) Gradient Boosting Tree (GBT) [124], (iv) K-nearest Neighbors (KNN) [105], (v) Naive Bayes (NB) [105, 125], (vi) Random Forest (RF) [126], (vii) Multilayer Perceptron (MLP) [127], (viii) Support Vector Machine with Stochastic Gradient Descent (SGD) [123], and (ix) Linear Support Vector Machine (SVM) [105, 128]. Results: To evaluate the performance of MarValous, we create a unifying dataset by combining two different benchmark datasets created by Islam and Zibran [129] and Novielli et al. [130]. To identify the best ML algorithm with the best combination of features, we run each of the algorithms using different combinations of the features on the dataset. From quantitative evaluations, we find that the algorithm SVM with all the seven features performs the best. Then, we have released MarValous by setting SVM as its default algorithm while enabling all those features in it. Then, we compare the performance of MarValous against the only available baseline tool DEVA. From a quantitative evaluation we find that, on average, across all the emotions, MarValous outperforms the baseline, as it has achieved 19.04% higher precision and 8.19% higher recall values compared to DEVA.
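A few of the surface features listed above (exclamation marks, all-uppercase words, elongated words, and the +1/−1 convention) could be extracted with simple pattern matching, as in the hypothetical sketch below; MarValous's actual preprocessing and feature pipeline is richer than this.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extraction of a few MarValous-style surface features from a comment.
public class FeatureSketch {

    private static final Pattern EXCLAMATION    = Pattern.compile("!");
    private static final Pattern UPPERCASE_WORD = Pattern.compile("\\b[A-Z]{2,}\\b");
    private static final Pattern ELONGATED_WORD = Pattern.compile("\\b\\w*(\\w)\\1{2,}\\w*\\b"); // e.g., "goood"
    private static final Pattern PLUS_MINUS_ONE = Pattern.compile("(^|\\s)[+-]1(\\s|$)");

    static Map<String, Integer> extract(String comment) {
        Map<String, Integer> features = new LinkedHashMap<>();
        features.put("exclamations",   count(EXCLAMATION, comment));
        features.put("uppercaseWords", count(UPPERCASE_WORD, comment));
        features.put("elongatedWords", count(ELONGATED_WORD, comment));
        features.put("plusMinusOne",   count(PLUS_MINUS_ONE, comment));
        return features;
    }

    private static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(extract("+1 this patch is GOOD, works goood now!!!"));
        // {exclamations=3, uppercaseWords=1, elongatedWords=1, plusMinusOne=1}
    }
}

Such counts would then be fed, together with the remaining features (e.g., n-grams), into one of the ML algorithms listed above.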

3.1.6 Understanding and Exploiting Developers’ Sentimental Variations

Problem Background: We address sub-problem II described along with its background in subsection 1.1.3. To recall, here the software engineering community demands more empirical studies to understand the roles and impacts of sentiments in the software engineering domain. Objective: To address sub-problem II, we aim to conduct a quantitative empirical study on the characteristics and impacts of sentiments extracted from developers' commit messages. Methodology: To meet the objective, we ask the following five research questions: (i) RQ1: do developers express different levels (e.g., high, low) and polarities (i.e., positivity, negativity, and neutrality) of sentiments when they commit different types (e.g., bug-fixing, new feature implementation, refactoring, and dealing with energy and security-related concerns) of development tasks? (ii) RQ2: can a group of developers be distinguished who express more sentiments (positive or negative) in committing a particular type (e.g., bug-fixing) of tasks? (iii) RQ3: do the polarities (i.e., positivity, negativity, and neutrality) of the developers' sentiments vary on different days of a week and at different times of a day? (iv) RQ4: do the developers' sentiments have any impact on the lengths of the commit comments they write? and (v) RQ5: do the sizes of the projects and development teams have any impacts on the developers' emotional states at work? We study commit messages for open-source projects obtained through Boa [131]. Boa is a recently introduced infrastructure with a domain specific language and public APIs to facilitate mining software repositories. We use the largest (as of February 2016) dataset from Boa, which is categorized as "full (100%)" and consists of more than 7.8 million projects collected from GitHub before September 2015. From this large dataset, we select the top 50 projects having the highest number of commits. We study all the commit messages in these projects, which constitute 490,659 commit comments. Associated information such as committers, commit timestamps, types of underlying work, revisions and project IDs are kept in a local relational database for convenient access and query. For each of the commit messages, we compute the sentimental scores. Then, to answer a research question, we identify the required attributes from the collected dataset, and then conduct various statistical analyses between the identified attributes and the computed emotional scores. For example, to answer RQ1, we identify the type of task of each of the commit messages and calculate a task-type wise emotional score. Then, we conduct the Mann-Whitney-Wilcoxon (MWW) statistical test [132] (a sketch of such a test appears at the end of this subsection) to verify whether developers express different levels of sentiments when they commit different types of tasks. Results: It is found that the polarities of the developers' sentiments significantly vary depending on the type of tasks they are engaged in. The developers express equally high positive and negative sentiments in commits for energy-aware tasks compared to other tasks. With respect to the polarities of commit messages, positive sentiments are found to be significantly higher than negative

sentiments in commits for bug-fixing and refactoring tasks. On the other hand, negative sentiments are significantly higher in security-related commits. Surprisingly, the same scenario is found for new feature implementation commits. A significant positive correlation is found between the lengths of commit messages and the sentiments expressed by developers. When the developers remain emotionally active, they tend to write longer commit comments. The developers tend to express more positive sentiments when they work in smaller projects or in smaller development teams, although the difference is not very large. Surprisingly, no significant variations are found in the developers' sentiments in commit messages posted at different times and days of a week. Based on the emotional contents in commit messages, a group of developers can be distinguished who express more positive sentiments in bug-fixing commit messages, another group with the opposite trait, and a third group of developers who equally render both positive and negative sentiments in bug-fixing activities. The same approach can be applied to other types of tasks to distinguish potential developers for improved task assignment.
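The kind of MWW comparison mentioned in the methodology can be sketched with the Apache Commons Math library as shown below; the sentiment scores, the grouping, and the 0.05 significance threshold are illustrative assumptions rather than the exact analysis pipeline of the study.

import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

// Illustrative Mann-Whitney-Wilcoxon test: do bug-fixing commits carry different
// sentiment scores than refactoring commits? The arrays below are made-up scores.
public class MwwSketch {
    public static void main(String[] args) {
        double[] bugFixScores      = { 2, 1, 3, 1, 2, 2, 1, 3, 2, 1 };
        double[] refactoringScores = { 1, 1, 2, 1, 1, 2, 1, 1, 1, 2 };

        MannWhitneyUTest test = new MannWhitneyUTest();
        double pValue = test.mannWhitneyUTest(bugFixScores, refactoringScores);

        System.out.printf("MWW p-value = %.4f%n", pValue);
        if (pValue < 0.05) {
            System.out.println("Sentiment levels differ significantly between the two task types.");
        } else {
            System.out.println("No statistically significant difference detected.");
        }
    }
}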

3.1.7 Sentiment Analysis of Software Bug Related Commit Messages

Problem Background: A few studies have successfully used sentiment as a factor in prioritizing applications' features to develop [133] and in predicting the quality of developers' interactions (e.g., asking questions and answering) in technical forums [134, 135, 136] and bug severity [137]. Considering the above studies, it seems sentiments can be an influential factor to be used in complex machine learning and deep learning systems to predict bugs (i.e., buggy commits) in software. However, before using sentiments to predict bugs, we need to empirically evaluate such a possibility in this context. Objective: To address sub-problem II, here we study the variations of the polarities (i.e., positivity, negativity, and neutrality) of sentiments expressed in two types of commit messages, (i) bug-introducing and (ii) bug-fixing, which are posted by developers contributing to open-source projects. Methodology: In particular, we ask and answer the following two research questions: (i) RQ1: do developers express different levels (e.g., high, low) and polarities (i.e., positivity, negativity, and neutrality) of sentiments in bug-introducing and bug-fixing commits? (ii) RQ2: do the polarities (i.e., positivity, negativity, and neutrality) of the developers' sentiments vary at different times of a day in bug-introducing and bug-fixing commits? To address the aforementioned research questions, we extract bug-introducing and bug-fixing commit messages from three selected projects. To study the relationship between developers' sentiments and the times of a day when commit comments are posted (i.e., RQ2), we divide the 24 hours of a day into three periods (a sketch of this division appears at the end of this subsection): (a) 00 to 08 hours as before working hours, (b) 09 to 17 hours as regular working hours and (c) 18 to 23 hours as after working hours. Then, for each project, we organize the

commit messages into three disjoint sets based on their timestamps of posting. Then, we compute the emotional scores of the commit messages using the tool SentiStrength-SE [110, 109], which is the first domain-specific sentiment analysis tool for software engineering texts. Finally, to answer the research questions we conduct statistical analyses. Results: In our study, we find that both bug-introducing and bug-fixing commit messages have overall statistically significantly higher positive emotional scores compared to negative emotional scores. We also observe similar findings while analyzing emotional scores in the bug-introducing and bug-fixing commit messages with respect to the three working periods. An exception is found in the latter case where no significant difference is found between positive and negative emotional scores in bug-fixing commit messages posted during after-work hours. While comparing with respect to a particular sentiment (i.e., for positive or negative sentiment), we find no significant difference between the overall emotional scores in bug-introducing and bug-fixing commit messages. The earlier finding also holds true while analyzing emotional scores in bug-introducing and bug-fixing commit messages with respect to the three working periods, with an exception where positive emotional scores are significantly higher in bug-fixing commits posted during working hours.
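The division of commit timestamps into the three working periods used in the methodology above can be sketched as follows; the class and method names are hypothetical.

import java.time.LocalDateTime;

// Hypothetical bucketing of a commit timestamp into the three working periods
// used in this study: before (00-08), regular (09-17), and after (18-23) working hours.
public class WorkingPeriodSketch {

    enum Period { BEFORE_WORK, WORKING_HOURS, AFTER_WORK }

    static Period periodOf(LocalDateTime commitTime) {
        int hour = commitTime.getHour();
        if (hour <= 8)  return Period.BEFORE_WORK;
        if (hour <= 17) return Period.WORKING_HOURS;
        return Period.AFTER_WORK;
    }

    public static void main(String[] args) {
        System.out.println(periodOf(LocalDateTime.parse("2015-06-12T07:45:00"))); // BEFORE_WORK
        System.out.println(periodOf(LocalDateTime.parse("2015-06-12T13:05:00"))); // WORKING_HOURS
        System.out.println(periodOf(LocalDateTime.parse("2015-06-12T22:30:00"))); // AFTER_WORK
    }
}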

3.2 Mining and Analyzing Impacts of Developers’ Copy-Paste Actions

Here, we address sub-problem III described along with its background in subsection 1.2.1. Using state-of-the-art static analysis tools, we mine code smells and security vulnerabilities in cloned and non-cloned code, and compute various metrics (e.g., densities of those code smells and security vulnerabilities in terms of lines and blocks) to compare and analyze them in different types of clones and non-cloned code. We also present another comparative study on the characteristics of the buggy and non-buggy clones from a code quality perspective. In the following subsection 3.2.1, subsection 3.2.2, and subsection 3.2.3, we present the studies conducted to address sub-problem III.

3.2.1 A Study on Code Smells in Categories of Clones and Non-Cloned Code

Objective: To address sub-problem III, we conduct a study to identify relationships of code smells with different types of code clones and non-cloned code. Methodology: Using the clone detection tool NiCad, we detect different types of code clones in 97 software systems. For the detection of vulnerabilities in source code, we use PMD (version 5.3.2) [12], which applies a static rule-based approach for source code analysis and identification of potential vulnerabilities in a software system. Upon detection of the clones and code smells, for each of the subject systems, we identify the co-locations of code clones and code smells, distinguish the code smells located in the non-cloned portion of code, and compute densities of code smells in cloned

and non-cloned code with respect to (w.r.t.) syntactic blocks of code (BOC) as well as w.r.t. lines of code (LOC). Results: We have found no significant differences between the densities of vulnerabilities in code clones and clone-free source code. Surprisingly, among the three categories (i.e., Type-1, pure Type-2, and pure Type-3) of clones studied in our work, Type-1 clones are found to be the most vulnerable whereas pure Type-3 clones are the least. In addition, our study identifies a set of five vulnerabilities that appear more frequently in cloned code compared to non-cloned code. Another set of 11 vulnerabilities is also distinguished, which are more frequently found in non-cloned code as opposed to cloned code.

3.2.2 A Study on Security Vulnerabilities in Clones and Non-Cloned Code

Objective: To address sub-problem III, we conduct another comparative study on security vulnerabilities and their severities in different types of clones compared to non-cloned code. Methodology: Using a state-of-the-art clone detector and two reputed security vulnerability detection tools, we detect clones and vulnerabilities in 8.7 million lines of code over 34 software systems. After detecting clones and vulnerabilities in the software systems, we determine the locations of the detected vulnerabilities in the different types of code (i.e., cloned and non-cloned code). A vulnerability is said to be located in cloned code if the reported source code line number of that vulnerability is included in a cloned block; otherwise, the vulnerability is located in non-cloned code. For each of the subject systems, we identify the co-locations of code clones and vulnerabilities, and distinguish the vulnerabilities located in the non-cloned portion of code. Then, we perform a comparative study of the vulnerabilities identified in different types of clones and non-cloned code. Results: Our study reveals that the security vulnerabilities found in code clones have a higher severity of security risks compared to those in non-cloned code. However, the proportion (i.e., density) of vulnerabilities in clones and the non-cloned code does not have any significant difference.
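The line-based co-location check described above amounts to a simple interval containment test, sketched below; the record types and field names are hypothetical.

import java.util.List;

// Hypothetical check of whether a reported vulnerability falls inside any detected clone block.
public class CloneLocationSketch {

    record CloneBlock(String file, int startLine, int endLine) { }
    record Vulnerability(String file, int line, String cweId) { }

    // A vulnerability is counted as located in cloned code if its reported line number
    // lies within any clone block detected in the same file.
    static boolean inClonedCode(Vulnerability v, List<CloneBlock> clones) {
        for (CloneBlock c : clones) {
            if (c.file().equals(v.file()) && v.line() >= c.startLine() && v.line() <= c.endLine()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<CloneBlock> clones = List.of(new CloneBlock("Parser.java", 120, 160));
        Vulnerability vuln = new Vulnerability("Parser.java", 143, "CWE-476");
        System.out.println(inClonedCode(vuln, clones)); // true
    }
}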

3.2.3 On the Characteristics of Buggy Code Clones: A Code Quality Perspective

Objective: To address sub-problem III, we again conduct a comparative study to analyze the characteristics of the buggy and non-buggy clones from a code quality perspective. Methodology: We select 2,077 bug-fixing revisions of three open-source software systems written in Java. We use the NiCad [138] clone detector (version 3.5) to detect method/function clones having at least five lines of code in those selected revisions. Clones that are affected by the bug-fixing commits are identified as buggy clones while the remaining clones are considered non-buggy. Using a free version of SourceMeter [139] (version 8.2.0-x64-linux), we compute 29 source code metrics for each of the buggy and non-buggy clones. We use a total of 29 source code quality metrics grouped into four categories: (i) complexity metrics, which measure the complexity of source code

elements; (ii) size metrics, which measure the basic properties of the analyzed system in terms of different cardinalities (e.g., number of code lines); (iii) documentation metrics, which measure the number of comments and documentation of source code elements in the system; and (iv) coupling metrics, which measure the inter-dependencies of source code elements. Then, we statistically analyze the code quality metrics' values in buggy and non-buggy cloned code. Results: In our study, we have found that buggy clones have higher complexity and lower maintainability compared to non-buggy cloned methods. Moreover, buggy clones are found to be larger in size measured in terms of the number of statements and lines of code. Surprisingly, we have found that the non-buggy method clones have a higher number of parameters, while too many parameters in functions are generally considered problematic and recognized as a code smell [74]. As expected, compared to non-buggy clones, the documentation quality of buggy clones is found inferior. Overall, buggy clones are found to be more coupled than non-buggy clones. It is interesting that the buggy cloned methods have significantly higher outgoing dependencies (i.e., outgoing invocations) compared to non-buggy clones. In the case of incoming dependencies, no significant difference is found between buggy and non-buggy clones.

3.3 Mining and Analyzing Fixing Patterns of Developers’ Mistakes

Objective: We address sub-problem IV described along with its background in subsection 1.3.1. In this work, we capture both bug-fixing edit patterns and nesting patterns (i.e., frequent nested code structures) of bug-fixing edits through an in-depth (quantitative and qualitative) analysis of 4,653 buggy revisions of five software systems drawn from diverse application domains. Methodology: For each subject system, we collect the bug-fixing revisions. Then, for each bug-fixing revision, using AST based code differencing tools, we detect differences between the bug-fixing revision and its immediate previous revision. Collections of such AST differences are then analyzed to detect bug-fixing edit patterns and dominant nested code structures of code changes to fix bugs. Results: In this work, we have reported 38 bug-fixing edit patterns, which is the highest number of bug-fixing edit patterns identified in a single study. Moreover, we have discovered four new bug-fixing edit patterns. The remaining 34 identified bug-fixing edit patterns confirm those reported earlier. Using sequential pattern mining and clustering techniques, we have also exposed 37 new bug-fixing nesting patterns, which capture the locations of the bug-fixing edits within the nested code structure surrounding them. This new set of nesting patterns is a novel contribution that adds a new dimension to our understanding of bug-fixing patterns. Our analysis of the nesting patterns reveals additional insights into bug-fix patterns. We have found that nodes/blocks associated with if blocks are the most bug-prone. The nesting pattern “if block inside loop block” experiences the highest number of bug-fixing edits, followed by the “if

block inside another if block” nesting pattern. Our analysis of the nesting patterns also indicates that nodes/blocks inside try-catch and synchronized blocks are bug-prone.

3.4 Summary

In this chapter, we have presented the research methodologies that are used to address sub-problems I, II, III and IV. To address sub-problem I, we have developed the state-of-the-art tools SentiStrength-SE, DEVA, and MarValous to detect sentiments/emotions at different levels. We have also compared our tool SentiStrength-SE against its domain-specific counterparts and found that SentiStrength-SE shows consistently higher recall values compared to the other tools. Moreover, our DEVA is the only tool available to detect excitement, stress, depression, and relaxation in software engineering textual artifacts. We have also compared four dictionaries, developed using four distinct methods, to compare their sentiment detection performances in software engineering textual artifacts. Then, we have conducted a quantitative empirical study of the emotional variations in different types of development activities (e.g., bug-fixing tasks), development periods (i.e., days and times) and in projects of different sizes involving teams of variant sizes. The study also includes an in-depth investigation of emotions' impacts on software artifacts (i.e., commit messages) and an exploration of scopes for exploiting emotional variations in software engineering activities. Next, we have also studied the emotional variations in bug-introducing and bug-fixing commit messages. Later, we have presented a comparative study on different types of clones and non-cloned code on the basis of their code smells. The study has found no significant differences between the densities of vulnerabilities in code clones and clone-free source code. Then, another similar comparative study is conducted on the basis of security vulnerabilities instead of code smells. The latter study has revealed that the security vulnerabilities found in code clones have higher severity of security risks compared to those in non-cloned code. Then, we have presented one more comparative study on the characteristics of the buggy and non-buggy clones from a code quality perspective. Finally, we have presented an in-depth study of both bug-fixing edit patterns and nesting patterns (i.e., frequent nested code structures) of bug-fixing edits.

Chapter 4

Detecting Developers’ Sentiments

Accurate detection of sentiments/emotions is the key to conducting sentiment/emotion related analyses in software engineering. However, sentiment analysis in software engineering textual artifacts has long been suffering from inaccuracies in the few tools available for the purpose. We conduct an in-depth qualitative study to identify the difficulties responsible for such low accuracy. The majority of the exposed difficulties are then carefully addressed through building a domain dictionary and appropriate heuristics. These domain-specific techniques are then realized in SentiStrength-SE, a tool we have developed for improved sentiment detection in text especially designed for application in the software engineering domain. This chapter is organized as follows. In Section 4.1, we provide the motivation behind this particular work. In Section 4.2, we describe the exploratory study to identify the difficulties in detecting sentiments. Section 4.3 describes how we develop a tool for improved sentiment analysis by overcoming most of the difficulties. Section 4.4 illustrates how we evaluate our sentiment analysis tool. In Section 4.5, we identify the limitations of the tool and future possibilities. We describe the related work in Section 4.6. Finally, we conclude in Section 4.7.

4.1 Introduction

Emotions are an inseparable part of human nature, which influence people's activities and interactions, and thus emotions affect task quality, productivity, creativity, group rapport and job satisfaction [26]. Software development, being highly dependent on human efforts and interactions, is more susceptible to the emotions of the practitioners. Hence, a good understanding of the developers' emotions and their influencing factors can be exploited for effective collaborations, task assignments [29], and in devising measures to boost up job satisfaction, which, in turn, can result in increased productivity and projects' success. Several studies have been performed in the past for understanding the role of human aspects in software development and engineering. Some of those earlier studies address when and why employees get affected by emotions [26, 34, 4, 41, 43], whereas some other work addresses how [60, 37, 38, 62, 39, 63, 31, 140] the emotions impact the employees' performance at work. Attempts are made to capture the developers' emotions in the workplace by means of traditional approaches such as interviews, surveys [31], and biometric measurements [32]. Capturing emotions

with the traditional approaches is more challenging for projects relying on geographically distributed team settings and voluntary contributions (e.g., open-source projects) [34, 33]. Moreover, the traditional approaches involving direct observations and interactions with the developers often hinder their natural workflow. Thus, to supplement or complement those traditional approaches, recent attempts detect sentiments from software engineering textual artifacts such as issue comments [34, 41, 37, 38, 39, 40, 35, 36], email contents [43, 42], and forum posts [4, 44]. For automated extraction of sentiments from textual artifacts in the software engineering domain, three tools (i.e., SentiStrength [45], NLTK (Natural Language Toolkit) [46], and Stanford NLP [47]) are used, while the use of SentiStrength is found dominant [48, 49]. However, software engineering studies [41, 43, 37, 35, 36, 48, 50, 51] involving sentiment analysis repeatedly report concerns about the accuracy of those sentiment analysis tools in the detection of sentimental polarities (i.e., negativity, positivity, and neutrality) of plain text contents. For example, when applied in the software engineering domain, SentiStrength and NLTK are respectively reported to have only 29.56% and 52.17% precision in identifying positive sentiments, and even lower precisions of 13.18% and 23.45% respectively in the detection of negative sentiments [43, 48]. Those sentiment analysis tools are developed and trained using data from non-technical social networking media (e.g., Twitter posts, forum posts, movie reviews), and when operated in a technical domain such as software engineering, their accuracy substantially degrades, largely due to domain-specific variations in the meanings of frequently used technical terms. Although such a domain dependency is indicated as a general difficulty against automated sentiment analysis in textual content, we need a deeper understanding of why and how such domain dependencies affect the performance of the tools, and how we can mitigate them. Indeed, the software engineering community demands a more accurate automatic sentiment analysis tool [41, 43, 38, 35, 36, 50, 52, 53]. In this regard, this chapter makes three major contributions:

• Using a large benchmark dataset, we carry out an in-depth exploratory study for exposing the difficulties in automatic sentiment analysis in textual content in a technical domain such as software engineering.

• We develop a domain dictionary specific for software engineering text. To the best of our knowledge, this is the first domain-specific sentiment analysis dictionary for the software engineering domain.

• We propose techniques and realize those in SentiStrength-SE, a prototype tool that we develop for improved sentiment analysis in software engineering textual content. The tool is also made freely available online [54]. SentiStrength-SE is the first domain-specific sentiment analysis tool especially designed for software engineering text.

Instead of building a tool from scratch, we develop our SentiStrength-SE on top of SentiStrength [45], which, till date, is the most widely used tool for automated sentiment analysis in software engineering [110]. From a quantitative comparison with the original SentiStrength [45], NLTK and Stanford NLP as operated in the software engineering domain, we find that our domain-specific SentiStrength-SE significantly outperforms those domain-independent tools/toolkits. We also separately evaluate the contributions of the individual major components (i.e., the domain dictionary and heuristics) of our SentiStrength-SE in sentiment analysis in software engineering text. Our evaluations demonstrate that, for software engineering text, domain-specific sentiment analysis techniques perform substantially better in detecting sentiments accurately. We further conduct a qualitative evaluation of our tool. Based on the exploratory study and the qualitative evaluation, we outline plans for further improvements in automated sentiment analysis in the software engineering area. This chapter is a significant extension to our recent work [110]. This chapter presents new evidence and insights by including a deeper analysis of the difficulties in automated sentiment analysis in software engineering text. The techniques applied in the development of SentiStrength-SE are described in greater detail. The empirical evaluation of the tool is substantially extended with deeper qualitative analyses and direct comparisons with NLTK and Stanford NLP in addition to the previously published comparison with the original SentiStrength. We include separate evaluations of the individual major components (i.e., the domain dictionary and heuristics) of SentiStrength-SE. The quantitative comparisons are validated in the light of statistical tests of significance.

4.2 Exploratory Study of the Difficulties in Sentiment Analysis

To explore the difficulties in automated sentiment detection in text, we conduct our qualitative analysis around the Java version of SentiStrength [45]. This Java version is the latest release of SentiStrength, while the older version, strictly for use on the Windows platform, is still available. As mentioned before, SentiStrength is a state-of-the-art sentiment analysis tool most widely adopted in the software engineering community. The reasons for choosing this particular tool are further justified in Section 4.6. English dictionaries consider the words 'emotion' and 'sentiment' as synonymous, and accordingly the words are often used interchangeably in practice. Although there is arguably a subtle difference between the two, in describing this work, we consider them synonymous. We formalize that, aside from subjectivity, a human expression can have two perceivable dimensions: sentimental polarity and sentimental intensity. Sentimental polarity indicates the positivity, negativity, or neutrality of an expression while sentimental intensity captures the strength of the emotional/sentimental expression, which sentiment analysis tools often report as numeric emotional scores.

4.2.1 Benchmark Data

In our work, we use a “Gold Standard” dataset [52, 141], which consists of 5,992 issue comments extracted from the JIRA issue tracking system. The entire dataset is divided into three groups named Group-1, Group-2 and Group-3, containing 392, 1,600 and 4,000 issue comments respectively. Each of the 5,992 issue comments is manually interpreted by n distinct human raters [52] and annotated with the emotional expressions found in those comments. For Group-1, n = 4 while for Group-2 and Group-3, n = 3. This is the only publicly available such dataset in the software engineering domain [52, 110]. A closed set E of emotional expressions is used in the annotation of the issue comments in the dataset, where E = {joy, love, surprise, anger, sad, fear}. The human raters labeled each of the issue comments depending on whether or not they found the sentimental expressions in the comments.

Formally, rj^εi(C) = 1 if emotion εi is found in comment C by rater rj, and rj^εi(C) = 0 otherwise.

An example of human annotations of an issue comment from the dataset is shown in Table 4.1.

Table 4.1: Example of annotation of an issue comment by four human raters

Issue comment (Comment ID-53257): "Thanks for the patch; Michale. Applied with a few modifications."

Human Raters (rj) | Joy | Love | Surprise | Anger | Sadness | Fear
Rater-1 (r1) | 1 | 1 | 0 | 0 | 0 | 0
Rater-2 (r2) | 0 | 0 | 0 | 0 | 0 | 0
Rater-3 (r3) | 1 | 0 | 0 | 0 | 0 | 0
Rater-4 (r4) | 1 | 0 | 0 | 0 | 0 | 0

Interpretation: Rater-1 found 'joy' and 'love' in the comment, while Rater-3 and Rater-4 found the presence of only 'joy', but Rater-2 did not identify any of the emotional expressions.

4.2.2 Emotional Expressions to Sentimental Polarities

Emotional expressions joy and love convey positive sentimental polarity, while anger, sadness, and fear express negative polarity. In some cases, an expression of surprise can be positive in polarity, denoted as surprise+, while other cases can convey a negative surprise, denoted as surprise−. Thus the issue comments in the benchmark dataset, which are annotated with the surprise expression, need to be further distinguished based on the sentimental polarities they convey. Hence, we get each of such comments reinterpreted by three additional human raters (computer science graduate students), who independently determine the polarities of the surprise expressions in each comment.

We consider a surprise expression in a comment polarized negatively (or positively) if two of the three raters identify negative (or positive) polarity in it. We found 79 issue comments in the benchmark dataset, which were annotated with the surprise expression. 20 of them express surprise with positive polarity and the remaining 59 convey negative surprise. Then we split the set E of emotional expressions into two disjoint sets E+ = {joy, love, surprise+} and E− = {anger, sad, fear, surprise−}. Thus, E+ contains only the positive sentimental expressions and E− contains only the negative sentimental expressions. A similar approach is also used in other studies [48, 142] to categorize emotional expressions according to their polarities.

4.2.3 Computation of Emotional Scores from the Human-Rated Dataset

For each of the issue comments in the “Gold Standard” dataset, we compute the sentimental polarity using the polarity labels assessed by the human raters. For an issue comment C rated by n human raters, we compute a pair ⟨ρc^rj, νc^rj⟩ of values for each of the n raters rj (where 1 ≤ j ≤ n) using Equation 4.1 and Equation 4.2:

ρc^rj = 1, if Σ_{εi ∈ E+} rj^εi(C) > 0; and 0, otherwise.    (4.1)

νc^rj = 1, if Σ_{εi ∈ E−} rj^εi(C) > 0; and 0, otherwise.    (4.2)

Thus, if a rater rj finds the presence of any of the positive sentimental expressions in the comment C, then ρc^rj = 1, otherwise ρc^rj = 0. Similarly, if any of the negative sentimental expressions is found in the comment C, then νc^rj = 1, otherwise νc^rj = 0. An issue comment C is considered neutral in sentimental polarity if we get the pairs ⟨ρc^rj, νc^rj⟩ with ρc^rj = 0 and νc^rj = 0 for at least n − 1 (i.e., the majority of) raters. If the comment is not neutral, then we determine the positive and negative sentimental polarities of that issue comment. To do that, using the following equations, we count the number of human raters N+(C) who found positive sentiment in the comment C and also the number of raters N−(C) who found negative sentiment in the comment C.

N+(C) = Σ_{j=1}^{n} ρc^rj    and    N−(C) = Σ_{j=1}^{n} νc^rj

An issue comment C is considered to exhibit positive sentiment if at least n − 1 human raters found positive sentiment in the comment. Similarly, we consider a comment to have negative sentiment if at least n − 1 raters found negative sentiment in it. Finally, we compute the sentimental polarities of an issue comment C as a pair ⟨ρc^h, νc^h⟩ using Equation 4.3 and Equation 4.4.

ρc^h = 0, if C is neutral; +1, if N+(C) ≥ n − 1; −1, otherwise.    (4.3)

νc^h = 0, if C is neutral; +1, if N−(C) ≥ n − 1; −1, otherwise.    (4.4)

Thus, ρc^h = 1 only if the comment C has positive sentiment, and νc^h = 1 only if the comment contains negative sentiment. Note that a given comment can exhibit both positive and negative sentiments at the same time. A comment is considered sentimentally neutral when the pair ⟨ρc^h, νc^h⟩ for the comment appears to be ⟨0, 0⟩. An issue comment is discarded from our study if at least n − 1 human raters (i.e., the majority) could not agree on any particular sentimental polarity of the comment. We have found 33 such comments in the Group-2 dataset that are excluded from our study. A similar approach is also followed to determine the sentiments of comments in another study [48].

4.2.3.1 Illustrative Example of Computing Sentimental Polarity

Consider the issue comment in Table 4.1. For this issue comment, we compute the pair ⟨ρc^rj, νc^rj⟩ for all four raters (i.e., n = 4). As we get the pair ⟨0, 0⟩ for only one (the second) of the four raters, the comment is not considered neutral. Hence, we compute the values of N+(C) and N−(C), which are three and zero respectively. N+(C) being three satisfies the condition N+(C) ≥ n − 1. Thus, ρc^h = 1, which means that the comment in Table 4.1 has positive sentiment. For the same comment, N−(C) < n − 1 and so νc^h = −1, which signifies that the comment has no negative sentiment.
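The majority-based aggregation of Equations 4.1 through 4.4 can be expressed in code as in the sketch below; the data layout is hypothetical, and the annotation matrix simply reproduces the example of Table 4.1 (with the surprise column split into its positive and negative variants).

// Hypothetical sketch of the gold-standard polarity computation (Equations 4.1-4.4).
public class GoldStandardPolaritySketch {

    // Emotion columns: joy, love, surprise+, anger, sad, fear, surprise-.
    static final boolean[] POSITIVE = { true, true, true, false, false, false, false };

    // Returns {rho_h, nu_h}: +1 = sentiment present, -1 = absent, 0 = neutral comment.
    static int[] goldPolarity(int[][] annotations) {
        int n = annotations.length;              // number of raters
        int nPos = 0, nNeg = 0, nBlank = 0;      // N+(C), N-(C), raters who found nothing
        for (int[] raterRow : annotations) {
            boolean pos = false, neg = false;
            for (int i = 0; i < raterRow.length; i++) {
                if (raterRow[i] == 1) { if (POSITIVE[i]) pos = true; else neg = true; }
            }
            if (pos) nPos++;
            if (neg) nNeg++;
            if (!pos && !neg) nBlank++;
        }
        if (nBlank >= n - 1) return new int[] { 0, 0 };   // neutral comment
        int rho = (nPos >= n - 1) ? +1 : -1;
        int nu  = (nNeg >= n - 1) ? +1 : -1;
        return new int[] { rho, nu };
    }

    public static void main(String[] args) {
        // The four raters of Table 4.1 (columns: joy, love, surprise+, anger, sad, fear, surprise-).
        int[][] table41 = {
            { 1, 1, 0, 0, 0, 0, 0 },   // Rater-1: joy and love
            { 0, 0, 0, 0, 0, 0, 0 },   // Rater-2: nothing
            { 1, 0, 0, 0, 0, 0, 0 },   // Rater-3: joy
            { 1, 0, 0, 0, 0, 0, 0 }    // Rater-4: joy
        };
        int[] p = goldPolarity(table41);
        System.out.println("rho_h = " + p[0] + ", nu_h = " + p[1]);  // rho_h = 1, nu_h = -1
    }
}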

4.2.4 Sentiment Detection Using SentiStrength

We apply SentiStrength to determine the sentiments expressed in the issue comments in Group-1 of the “Gold Standard” dataset. Sentiment analysis using SentiStrength on a given piece of text (e.g., an issue comment) C computes a pair ⟨ρc, νc⟩ of integers, where +1 ≤ ρc ≤ +5 and −5 ≤ νc ≤ −1. Here, ρc and νc respectively represent the positive and negative sentimental scores for the given text C. A given text C is considered to have positive sentiment if ρc > +1. Similarly, a text is held to contain negative sentiment when νc < −1. Besides, a text is considered sentimentally neutral when the sentimental scores for the text appear to be ⟨1, −1⟩. Hence, for the pair ⟨ρc, νc⟩ of sentimental scores for an issue comment C computed by SentiStrength, we compute another pair of integers ⟨ρc^t, νc^t⟩ as follows:

ρc^t = 1, if ρc > +1; and 0, otherwise.    νc^t = 1, if νc < −1; and 0, otherwise.

Here, ρc^t = 1 signifies that the issue comment C has positive sentiment, and νc^t = 1 implies that the issue comment C has negative sentiment. We apply SentiStrength to compute sentimental scores for each of the issue comments in the Group-1 portion of the “Gold Standard” dataset and then, for each issue comment C, we compute the pair ⟨ρc^t, νc^t⟩, which represents the sentimental polarity scores for C.
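The thresholding of the raw ⟨ρc, νc⟩ scores into the binary pair ⟨ρc^t, νc^t⟩ can be written directly as below; this sketch covers only the conversion rule, not how SentiStrength itself is invoked.

import java.util.Arrays;

// Sketch of converting SentiStrength's raw scores <rho_c, nu_c> into the binary
// polarity pair <rho_c^t, nu_c^t> used for the comparison against the human ratings.
public class ScoreThresholdSketch {

    // rawPositive is in [+1, +5] and rawNegative is in [-5, -1].
    static int[] toPolarityPair(int rawPositive, int rawNegative) {
        int rhoT = (rawPositive > +1) ? 1 : 0;   // the text carries positive sentiment
        int nuT  = (rawNegative < -1) ? 1 : 0;   // the text carries negative sentiment
        return new int[] { rhoT, nuT };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toPolarityPair(3, -1))); // [1, 0]: positive only
        System.out.println(Arrays.toString(toPolarityPair(1, -4))); // [0, 1]: negative only
        System.out.println(Arrays.toString(toPolarityPair(1, -1))); // [0, 0]: neutral
    }
}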

4.2.5 Analysis and Findings

For each of the 392 issue comments C in Group-1, we compare the sentimental polarity scores ⟨ρc^t, νc^t⟩ produced by SentiStrength and the scores ⟨ρc^h, νc^h⟩ computed using our approach described in Section 4.2.3. We find a total of 151 comments for which the ⟨ρc^t, νc^t⟩ scores obtained from SentiStrength do not match ⟨ρc^h, νc^h⟩. This implies that for those 151 issue comments SentiStrength's computation of sentiments is probably incorrect. Upon developing a solid understanding of the sentiment detection algorithm of SentiStrength, we then carefully go through all of those 151 issue comments to identify the reasons/difficulties which mislead SentiStrength in its identification of sentiments in textual content. We identify 12 such difficulties. Before discussing the difficulties, we first briefly describe the highlights of SentiStrength's internal working mechanism to develop the necessary context and background for the reader.

Table 4.2: The role of the dictionary lists in SentiStrength’s computation of sentimental scores in text
Sample sentence | Sent. scores ⟨c₊, c₋⟩ | Dictionary list in use | Explanation
"It’s a good feature." | ⟨2, −1⟩ | Sentimental words | The sentimental score of the word ‘good’ is pre-assigned as 02; so the sentence is assigned the positive score 02.
"It’s a very good feature." | ⟨3, −1⟩ | Booster words | As the booster word ‘very’ is used before the sentimental word, the sentence is assigned a positive score 03.
"It’s not a good feature." | ⟨1, −1⟩ | Negations | The sentimental polarity of the sentimental word is inverted here due to the use of the negation word ‘not’ before the sentimental word.
"It’s a killer feature." | ⟨2, −1⟩ | Phrases | "killer feature" is a phrase in the dictionary with positive score 02. Although the word ‘kill’ carries negative sentiment, its effect is overridden by the sentimental score of the phrase.

Table 4.3: Frequencies of difficulties misleading sentiment analysis
Difficulties | Frequency (%) | Scope*
D1: Domain-specific meanings of words | 123 (60.00) | SEDS
D2: Context-sensitive variations in meanings of words | 35 (17.07) | SAG
D3: Misinterpretation of the letter ‘X’ | 12 (05.85) | SEDS
D4: Sentimental words in copy-pasted content (e.g., code) | 12 (05.85) | SEDS
D5: Difficulties in dealing with negations | 08 (03.90) | SAG
D6: Missing sentimental words in dictionary | 02 (00.97) | SAG
D7: Spelling errors mislead sentiment analysis | 02 (00.97) | SAG
D8: Repetitive numeric characters considered sentimental | 01 (00.49) | SST
D9: Wrong detection of proper nouns | 01 (00.49) | SST
D10: Sentimental words in interrogative sentences | 01 (00.49) | SST
D11: Difficulty in dealing with irony and sarcasm | 01 (00.49) | SAG
D12: Hard to detect subtle expression of sentiments | 07 (03.41) | SAG
*Here, SEDS = Software Engineering Domain Specific, SAG = Sentiment Analysis in General, SST = Specific to the SentiStrength Tool.

4.2.5.1 Insights into SentiStrength’s Internal Algorithm

SentiStrength is a lexicon-based classifier that also uses additional (non-lexical) linguistic information and rules to detect sentiment in plain text written in English [45]. SentiStrength maintains several lists of words and phrases as its key dictionaries for computing sentiments in text. Among these lists, the list of sentimental words, the list of booster words, the list of phrases, and the list of negation words play a vital role in the computation of sentiments. The entries in all these lists, except the list of negation words, are pre-assigned sentimental scores. The negation words in the fourth list are used to invert the sentimental polarity of a term when the term is located after a negation word in text. For an input sentence, SentiStrength extracts individual words from the sentence and searches for each of them in the sentimental words list to retrieve the corresponding sentimental scores. A similar search is made in the list of booster words to strengthen or weaken the sentimental scores. The list of phrases is used to distinguish groups of words as commonly used phrases. When such a phrase is identified, the sentimental score of the phrase overrides the sentimental scores of the individual words that constitute the phrase. The examples in Table 4.2 articulate how SentiStrength depends on these dictionary lists for computing sentimental scores in plain text.
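For illustration only, the following toy Python sketch mimics this dictionary-driven scoring; the word lists and scores are small stand-ins rather than SentiStrength's real dictionaries, and the rules are heavily simplified:

```python
# Toy stand-ins for SentiStrength's dictionary lists (illustrative only).
SENTIMENT = {"good": 2, "kill": -3}
BOOSTERS = {"very": 1}
NEGATIONS = {"not"}
PHRASES = {("killer", "feature"): 2}   # a phrase score overrides its words

def score_sentence(sentence):
    """Return (positive, negative) scores for one sentence, roughly following
    the lexicon/booster/negation/phrase rules described above."""
    words = sentence.lower().strip(".!?").split()
    pos, neg = 1, -1                     # neutral baseline, as in SentiStrength
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:              # phrase score overrides its words
            s = PHRASES[pair]
            i += 2
        elif words[i] in SENTIMENT:
            s = SENTIMENT[words[i]]
            if i > 0 and words[i - 1] in BOOSTERS:
                s += BOOSTERS[words[i - 1]] if s > 0 else -BOOSTERS[words[i - 1]]
            if i > 0 and words[i - 1] in NEGATIONS:
                s = -s                   # negation inverts polarity
            i += 1
        else:
            i += 1
            continue
        pos, neg = max(pos, s, 1), min(neg, s, -1)
    return pos, neg

print(score_sentence("It's a very good feature."))   # ~ (3, -1)
print(score_sentence("It's a killer feature."))      # ~ (2, -1)
```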

4.2.5.2 Difficulties in Automated Sentiment Analysis in Software Engineering

Table 4.3 presents the number of times we found SentiStrength being misled by the 12 difficulties discovered during our manual investigation. It is evident in Table 4.3 that domain-specific meanings of words is the most prevalent among all the difficulties responsible for the low accuracy of the lexical approach of SentiStrength. However, not all the difficulties are specific to the software engineering domain; rather, some difficulties impact sentiment analysis in general (including software engineering) while a few are actually specific limitations of the tool SentiStrength. The right-most column in Table 4.3 indicates the scopes of the identified difficulties. We now describe the 12 difficulties with illustrative examples.

(D1) Domain-specific meanings of words: In a technical field, textual artifacts include many technical jargon terms, which have polarities in terms of their dictionary meanings but do not really express any sentiment in their technical context. For example, the words ‘Super’, ‘Support’, ‘Value’ and ‘Resolve’ are English words with known positive sentiment, whereas ‘Dead’, ‘Block’, ‘Default’, and ‘Error’ are known to have negative sentiment, but none of these words really bears any sentiment in software development artifacts.

As SentiStrength was originally developed and trained for non-technical texts written in plain English, it identifies those words as sentimental words, which is incorrect in the context of a technical field such as software engineering. In the following comment from the “Gold Standard" dataset, SentiStrength considers ‘Error’ as a negative sentimental word and detects ‘Support’ and ‘Refresh’ as positive sentimental words. Thus, it assigns both positive and negative sentimental scores to the comment, although the comment is sentimentally neutral.

"This was probably fixed by WODEN-86 which introduced support for the curly brace syntax in the http location template. This JIRA can now be closed. This test case is now passing ... There are now 12 errors reported for Woden on this test caseregenerated the results in r480113. I’ll have the W3C reports refreshed." (Comment ID: 18059)

(D2) Context-sensitive variations in meanings of words: Apart from domain-specific meanings of words, in natural language some words have multiple meanings depending on the context in which they are used. For example, the word ‘Like’ expresses positive sentiment when it is used in a sentence such as “I like you". On the other hand, that same word expresses no sentiment in the sentence “I would like to be a sailor, said George Washington". Again, SentiStrength identifies the word ‘Please’ as a positive sentimental word, although we find the word used neutrally to express a request in the training dataset. For example, in the comment below, the word ‘Please’ does not express any emotion.

"Updated in 1.2 branch. David; please download and try 1.2 beta when it is released in a week or so.." (Comment ID: 4223)

Again, words that are considered inherently sentimental often do not carry sentiments when used to express possibility and uncertainty. Distinguishing the context-sensitive meanings of such words is a big challenge for automated sentiment analysis in text, and the lexical approach of SentiStrength also falls short in this regard. For example, in the following issue comment, the sentimental word ‘Nice’ is used simply to express possibility regarding a change of something, but SentiStrength incorrectly computes positive sentiment in the message.

"The change you want would be nice; but is simply not possible. The form data ... Jakarta FileUpload library." (Comment ID: 51837)

Similarly, in the following comment, the sentimental word ‘Misuse’ is used in a conditional sentence, which does not express any sentiment, but SentiStrength interprets otherwise.

"Added a couple of small points ... if anyone notices any misuses of the document formatting ..." (Comment ID: 2463)

(D3) Misinterpretation of the letter ‘X’: In informal computer-mediated chat, the letter ‘X’ is often used to mean a ‘Kiss’, which conveys positive sentiment, and it is thus recorded in SentiStrength’s dictionary. However, in the technical domain, the letter is often used as a wildcard. For example, the sequence ‘1.4.x’ in the following comment is used to indicate a collection of versions/releases.

"Integrated in Apache Wicket 1.4.x ..." (Comment ID: 20748)

Since SentiStrength uses dot (.) as a delimiter to split a text into sentences, the ‘x’ is considered a one-word sentence and is misinterpreted to have expressed positive sentiment.

(D4) Sentimental words in copy-pasted content (e.g., code): When commenting, developers often copy-paste code snippets, stack traces, URLs, and camel-case words (e.g., variable names) into their issue comments. Such copy-pasted content often includes sentimental words in the form of variable names and the like, which do not convey any sentiment of the committer, but SentiStrength detects those sentimental words and incorrectly associates those sentiments with the issue comment and the committer. Consider the following issue comment, which includes a copy-pasted stack trace.

"... Stack: [main] FATAL ... org.apache.xalan.templates .ElemTemplateElement.resolvePrefixTables ..." (Comment ID: 9485) The words ‘Fatal’ and ‘Resolve’ (part of the camel case word ‘resolvePrefixTables’), are posi- tive and negative sentimental words respectively in the dictionary of SentiStrength’s. Hence, SentiStrength detects both positive and negative sentiments in the issue comment, but the stack trace content certainly does not represent the sentiments of the developer/committer.

(D5) Difficulties in dealing with negations: For automated sentiment detection, it is crucial to identify the presence of any negation term preceding a sentimental word, because negation terms invert the polarity of sentimental words. For example, the sentence “I am not in good mood" is equivalent to “I am in bad mood". When the negation of the positive word ‘Good’ cannot be identified as equivalent to the negative word ‘Bad’, the detection of sentimental polarity goes wrong. The default configuration of SentiStrength enables it to detect the negation of a sentimental word only if the negation term is placed immediately before the sentimental word. In all other cases, SentiStrength fails to detect negations correctly and often detects sentiments exactly opposite to what is expressed in the text. During our investigation, we find substantial instances where SentiStrength is misled by complex structural variations of negations present in the issue comments.

For example, in the following two comments, SentiStrength cannot correctly detect the negations used before the words ‘Bad’ (in the first comment) and ‘Good’ (in the second comment), and thus misclassifies the sentiments of those comments.

"I haven’t seen any bad behavior. I was using open ssh to test this. I used the .... with open ssh to disconnect;" (Comment ID: 6688)

"3.0.0 has been released; closing ... I didn’t change the jute - don’t think this is a good idea; esp as also effects the ...... Andrew could you take a look at this one?" (Comment ID: 1725)

In addition, we find that SentiStrength is unable to recognize shortened forms of negations such as “haven’t", “havent", “hasn’t", “hasnt", “shouldn’t", “shouldnt", and “not", since these terms are not included in the dictionary.

(D6) Missing sentimental words in dictionary: Since the lexical approach of SentiStrength is largely dependent on its dictionary of lists of words (as discussed in Section 4.2.5.1), the tool often fails to detect sentiments in some texts when the sentimental words used in the texts are absent in the dictionary. For example, the words ‘Apology’ and ‘Oops’ in the following two comments express negative sentiments, but SentiStrength cannot detect them since those words are not included in its dictionary.
"...This is indeed not an issue. My apologies ..." (Comment ID: 20729)
"Oops; issue comment had wrong ticket number in it ..." (Comment ID: 36376)

(D7) Spelling errors mislead sentiment analysis: Misspelled words are common in informal text, and writers often deliberately misspell words to express intense sentiments. For example, the misspelled word ‘Happpy’ expresses more happiness than the correctly spelled word ‘Happy’. Although SentiStrength can detect some of such intensified sentiments from misspelled sentimental words, its ability is limited to only those intentional spelling errors where repetition of certain letters occurs in a sentimental word. Most other (unintentional) types of misspelling of sentimental words cause SentiStrength to fail to find those words in its dictionary and consequently lead to incorrect computation of sentiments. For example, the word ‘Unfortunately’ was misspelled as ‘Unforunatly’ in an issue comment (comment ID: 11978) and “I’ll" was written as ‘ill’ in another (comment ID: 927). SentiStrength’s detection of sentiments in both of these comments is found to be incorrect.

(D8) Repetitive characters considered sentimental: As described before, SentiStrength detects higher intensity of sentiments by considering deliberately misspelled sentimental words with repeated letters. The tool also uses the same strategy for the same purpose by taking into account repetitive characters intentionally typed in words that are not necessarily sentimental by themselves. If anybody writes “I am goooing to watch movie" instead of “I am going to watch movie", then the former sentence is considered positively sentimental due to the emphasis on the word ‘Going’ created by repeating the letter ‘O’ three times.

However, this strategy also misguides SentiStrength in dealing with some numeric values. For example, in the following comment, SentiStrength incorrectly identifies the number ‘20001113’ as a positive sentimental word encountering repetition of the digits ‘0’ and ‘1’.

"See bug 5694 for the ... 20001113 /introduction.html ... Zip file with test case (java source and XML docs) 1. Do you use deferred DOM? 2. Can you try to run it against Xerces2 beta4 (or the latest code in CVS?) 3. Can you provide a sample file? Thank you." (Comment ID: 6447)

(D9) Incorrect detection of proper nouns: A proper noun can rightly be considered neutral in sentiment. SentiStrength detects a word starting with a capital letter as a proper noun when the word is located in the middle or at the end of a sentence. Unfortunately, grammar rules are often ignored in informal text, and thus sentimental words placed in the middle or at the end of a sentence often end up starting with a capital letter, which causes SentiStrength to mistakenly disregard the sentiments in those sentimental words. The following issue comment is an example of such a case, where the sentimental word ‘Sorry’, starting with a capital letter, is placed in the middle of the sentence, and SentiStrength erroneously considers ‘Sorry’ a neutral proper noun.

"Cool. Thanks for considering my bug report! ... About the title of the bug; in the description; I put: Sorry for the vague ticket title. I don’t want to make presumptions about the issue ... work for passwords." (Comment ID: 76385)

However, the older Windows version of SentiStrength does not have this shortcoming.

(D10) Sentimental words in interrogative sentences: Typically, negative sentimental words in interrogative sentences (i.e., in questions) either do not express any sentiment or at least weaken the intensity of sentiment [45]. However, we have found instances where SentiStrength fails to correctly interpret the sentimental polarities of such interrogative sentences. For example, SentiStrength incorrectly identifies negative sentiment in the comment below, although the comment merely includes a question expressing no negative sentiment, as indicated by the human raters.
"... Did I submit something wrong or duplicate? ..." (Comment ID: 24246)

(D11) Difficulty in dealing with irony and sarcasm: Automatic interpretation of irony in text written in natural language is very challenging, and SentiStrength often fails to detect sentiments in texts that express irony and sarcasm [45]. For example, due to the presence of the positive sentimental words “Dear God!" in the comment below, SentiStrength detects positive sentiment in the sentence, although the comment poster used it in a sarcastic manner and expressed negative sentiment only.
"The other precedences are OK; as far as I can tell ... ‘zzz’; Dear God! You mean the intent here is ... gotta confess I just saw the pattern and jumped to conclusions; hadn’t examined the code at all. But you’ve just made the job tougher ...?" (Comment ID: 61559)

(D12) Hard to detect subtle expression of sentiments: Text written in natural language can express sentiments without using any inherently sentimental words. The lexical approach of SentiStrength fails to identify sentiments in such a text due to its heavy dependence on the dictionary of lists of words and its inability to properly capture sentence structure and semantic meaning. Consider the following issue comment, which was labeled with negative sentiment by three human raters although there are no sentimental words in it. Unsurprisingly, SentiStrength incorrectly interprets it as a sentimentally neutral text.
"Brian; I understand what you say and specification about ‘serialization’ in XSLT not ‘indenting’. As I saied before; indenting is just the thing that we easily see the structure and data of XML document. Xalan output is not easy to see that. The last; I think the example of non-whitespace characters is no relationship to indenting. non-whitespace characters must not be stripped; but whitespace characters could be stripped. Regards; Tetsuya Yoshida." (Comment ID: 10134)

4.3 Leveraging Automated Sentiment Analysis

We address the challenges identified from our exploratory study as described in Section 4.2.5.2 and develop a tool particularly crafted for application in the software engineering domain. We call our tool SentiStrength-SE, which is built on top of the original SentiStrength. We now describe how we mitigate the identified difficulties in developing SentiStrength-SE for improved sentiment analysis in textual artifacts in software engineering.

Figure 4.1: Steps to create the domain dictionary for SentiStrength-SE. (The figure depicts the steps/actions, enumerated in chronological order: (1) lemmatization of the words in commit comments (CC) using Stanford NLP to obtain Mw; (2) converting wild-card words of the SentiStrength (SS) dictionary to full forms (Sw); (3) identifying common words (Cw); (4) excluding domain words (Dw); (5) converting words to wild-card forms to obtain the preliminary domain dictionary (Pw); and (6) further enhancement into the SentiStrength-SE dictionary (Fw).)

4.3.1 Creating a New Domain Dictionary for Software Engineering

As reported in Table 4.3, our exploratory study (Section 4.2.5.2) found the domain-specific challenges (difficulties D1, D3, D6) to be the most impeding factors against sentiment analysis in software engineering text. Hence, the accuracy of sentiment analysis can be improved by adopting a domain-specific dictionary [143, 144, 145]. We, therefore, first create a domain dictionary for software engineering text to address the issues with domain difficulty.
Figure 4.1 depicts the steps/actions taken to develop the domain dictionary for software engineering text. We collect a large dataset used in the work of Islam and Zibran [38]. This dataset consists of 490 thousand commit messages drawn from 50 open-source projects on GitHub. Using the Stanford NLP tool [146], we extract a set of lemmatized forms of all the words in the commit messages, which is denoted as Mw.
To identify the emotional words in the set Mw, we exploit SentiStrength’s existing dictionary. We choose SentiStrength’s existing dictionary as the basis for our new one because, in a recent study [57], SentiStrength’s dictionary building method was found superior to other approaches (AFINN [118], MPQA [119], and VADER [120]) for the creation of software engineering domain-specific dictionaries. We identify those words in SentiStrength’s original dictionary that have wild-card forms (i.e., words that have the symbol * as a suffix) and transform them to their corresponding lemmatized forms using the AFINN [118] dictionary. For example, the entry ‘Amaz*’ in the SentiStrength dictionary is transformed to the words ‘Amaze’, ‘Amazed’, ‘Amazes’ and ‘Amazing’, as those are found as emotional words in the AFINN dictionary corresponding to that particular entry. There are mainly two reasons for using the AFINN dictionary: (i) the dictionary is very similar to the SentiStrength dictionary as both use the same numeric scale to express sentimental polarities of words, and (ii) the AFINN dictionary is also widely used in many other studies [147, 148, 149, 57].

Table 4.4: Inter-rater disagreements in interpretation of sentiments
Sentimental Polarity | Disagreement between A, B | B, C | C, A
Positive | 11.81% | 19.68% | 17.32%
Negative | 08.62% | 10.19% | 09.41%
Neutral | 18.13% | 11.81% | 15.69%

If any wild-card form word is not found in the AFINN dictionary, we use our own judgment to convert that word to its lemmatized forms. Thus, by converting all such short-form words to full forms and combining those with the remaining words in the SentiStrength dictionary, we obtain a set of words Sw. Then, we distinguish a set Cw of words such that Cw = Mw ∩ Sw. The set Cw ends up containing 716 words, which represent an initial sentimental vocabulary pertinent to the software engineering domain.
We recognize that some of these 716 words are simply software engineering domain-specific technical terms expressing no sentiments in the software engineering context, which would otherwise express emotions when interpreted in a non-technical area such as social networking. There also remain other words such as ‘Decrease’, ‘Eliminate’ and ‘Insufficient’, which are unlikely to carry sentiments in the software engineering domain. We, therefore, engage three human raters (enumerated as A, B, C) to independently identify these non-sentimental domain words in Cw. Each of these three human raters is a computer science graduate student having at least three years of software development experience. A human rater annotates a word as neutral if the word appears to him/her as highly unlikely to express any sentiment when interpreted in the software engineering domain.
In Table 4.4, we present the sentiment-wise percentages of cases where the human raters disagree pair-wise. We also measure the degree of inter-rater agreement in terms of the Fleiss-κ [150] value. The obtained Fleiss-κ value of 0.739 signifies substantial agreement among the independent raters. We consider a word as a neutral domain word when two of the three raters identify the word as neutral. Thus, 216 words are identified as neutral domain words, which we exclude from the set Cw, resulting in another set Dw of the remaining 500 words. Such neutralization of words for a particular domain is also suggested in several studies [41, 43, 37, 38] in the literature.
Next, we adjust the words in Dw by reverting them to their wild-card forms (if available) to comply with SentiStrength’s original dictionary. This new set of words is called the preliminary domain dictionary (Pw), which has 167 positively and 310 negatively polarized words. This preliminary dictionary is further enhanced according to the description below to create the SentiStrength-SE dictionary (Fw).
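As a condensed sketch of the set manipulations described above (toy word sets only; the real Mw comes from the 490 thousand commit messages and Sw from SentiStrength's full expanded dictionary), the construction might look like this:

```python
# Illustrative sketch of the dictionary-construction steps (toy data only).
M_w = {"amaze", "error", "decrease", "love", "super"}    # lemmatized commit words
S_w = {"amaze", "amazed", "love", "super", "decrease"}   # expanded SentiStrength words

# Candidate sentimental vocabulary for the SE domain.
C_w = M_w & S_w

# Drop words that at least 2 of the 3 raters judged neutral in the SE domain.
rater_neutral_votes = {"super": 3, "decrease": 2, "error": 1}
D_w = {w for w in C_w if rater_neutral_votes.get(w, 0) < 2}

print(sorted(C_w))   # candidate words
print(sorted(D_w))   # words retained after domain neutralization
```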

4.3.1.1 Further Enhancements to the Preliminary Domain Dictionary

We further extend the newly developed preliminary dictionary in the light of our observations during the exploratory study described in Section 4.2.
Extension with new sentimental words and negations: During our exploratory study, we find several informal sentimental words such as ‘Woops’, ‘Uhh’, ‘Oops’ and ‘zzz’, which are not included in the original dictionary. The formal word ‘Apology’ is also missing from the dictionary. We have added all these missing words to the dictionary of our SentiStrength-SE as sentimental terms with appropriate sentimental polarities, which mitigates the difficulty D6. In addition, we also add to the dictionary the missing shortened forms of negation words as mentioned in the discussion of difficulty D5 in Section 4.2.5.2.
Discarding the letter ‘X’ from the dictionary: We exclude the letter ‘X’ from the domain dictionary of SentiStrength-SE to save lexical sentiment analysis from the difficulty D3 as described in Section 4.2.5.2.

4.3.2 Inclusion of Heuristics in Computation of Sentiments

While the creation of the new domain dictionary is a vital step towards automated sentiment analysis in software engineering text, we realize that the computations for sentiment detection also need improvements. Thus, in the implementation of our domain-specific SentiStrength-SE, we incorporate a number of heuristics in the computation, which we describe below.

4.3.2.1 Addition of Contextual Sense to Minimize Ambiguity

Recall that, in the creation of our initial domain dictionary, we neutralized 216 words on the basis of the judgements of three independent human raters. However, the neutralization of words is not always appropriate. For example, in the software engineering domain, the word ‘Fault’ typically indicates a program error and expresses neutral sentiment. However, the same word can also convey negative sentiment, as found in the following comment, where the word ‘Fault’ expresses the negative sentiment of the comment poster.
"As WING ... My fault: I cannot reproduce after holidays ... I might add that one; too" (Comment ID: 4694)

Again, the word ‘Like’ expresses positive sentiment if it is used as “I like", “We like", “He likes", or “They like". In most other cases the word ‘Like’ is used as a preposition or subordinating conjunction, and the word can safely be considered sentimentally neutral. For example, the following comment uses the word ‘Like’ without expressing any sentiment.
"Looks like a user issue to me ..." (Comment ID: 40844)

We can observe from the above examples that some of the 216 neutralized words can actually express sentiments when they are preceded by pronouns referring to a person or a group of persons, e.g., ‘I’, ‘We’, ‘He’, ‘She’, ‘You’, and possessive pronouns such as ‘My’ and ‘Your’. This contextual information is taken into account in SentiStrength-SE to appropriately deal with the contextual use of those words in the software engineering field and to minimize the difficulties D1 and D2. The complete list of such words is given in the SentiStrength-SE dictionary file named ‘ModifiedTermsLookupTable’, and these words are also vetted by the three raters. Note that to determine the Part-Of-Speech (POS) of words in sentences, we apply the Stanford POS tagger [146].
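A minimal sketch of this pronoun-based contextual check (our illustration; the actual implementation in SentiStrength-SE may differ) is:

```python
PERSONAL_PRONOUNS = {"i", "we", "he", "she", "you", "they", "my", "your"}

def is_sentimental_in_context(words, idx):
    """Heuristic sketch: treat an otherwise-neutralized domain word (e.g. 'fault',
    'like') as sentimental only when a personal/possessive pronoun precedes it."""
    return idx > 0 and words[idx - 1].lower() in PERSONAL_PRONOUNS

comment = "My fault : I cannot reproduce after holidays".split()
print(is_sentimental_in_context(comment, 1))   # True  -> 'fault' kept sentimental
comment2 = "Looks like a user issue to me".split()
print(is_sentimental_in_context(comment2, 1))  # False -> 'like' stays neutral
```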

4.3.2.2 Bringing Neutralizers in Effect

Our observations from the exploratory study (as presented in Section 4.2) reveal that the sentiment of a word can be neutralized if that word is preceded by any of the neutralizer words such as ‘Would’, ‘Could’, ‘Should’, and ‘Might’. For example, in the sentence “It would be good if the test could be completed soon”, the positive sentimental word ‘Good’ does not express any sentiment, as it is neutralized by the preceding word ‘Would’. We add a method to SentiStrength-SE that enables it to correctly detect uses of such neutralizer words in sentences and thus be more accurate in sentiment detection. This helps in minimizing the difficulty D2 described before.
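The neutralizer heuristic can be sketched as follows (our illustration; the look-back window size is an assumption, since the exact scope used by the tool is not restated here):

```python
NEUTRALIZERS = {"would", "could", "should", "might"}

def neutralized(words, idx, window=3):
    """Sketch: a sentimental word loses its polarity when a neutralizer such as
    'would'/'could'/'should'/'might' occurs within a few preceding words.
    The look-back window size is an illustrative assumption."""
    lo = max(0, idx - window)
    return any(w.lower() in NEUTRALIZERS for w in words[lo:idx])

sent = "It would be good if the test could be completed soon".split()
print(neutralized(sent, sent.index("good")))   # True -> 'good' treated as neutral
```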

4.3.2.3 Integration of a Preprocessing Phase

To minimize the difficulties D4, D7, D8, and D9 (as described in Section 4.2.5.2), we include a preprocessing phase in SentiStrength-SE as an integral part of the tool. Before computation on any given input text, SentiStrength-SE applies this preprocessing phase to filter out numeric characters and certain copy-pasted contents such as code snippets, URLs and stack traces. To locate code snippets, URLs and stack traces in text, we use regular expressions similar to the approach proposed by Bettenburg et al. [151]. In addition, a spellchecker [152] is also included to deal with the difficulty D7 by identifying and rectifying misspelled English words. Spell checking also complements our regular expression based method in the approximate identification of identifier names in code snippets.
To mitigate the difficulty D9 in particular, the preprocessing phase also converts all the letters of a comment to lower case. However, converting all the letters to lower case can also cause failure in the detection of proper nouns such as the names of developers and systems, which is also important as discussed in the description of difficulty D9 in Section 4.2.5.2. From our exploratory study, we have observed that developers typically mention their colleagues’ names in comments immediately after some sort of salutation word such as ‘Dear’, ‘Hi’, ‘Hello’, ‘Hellow’ or after the character ‘@’. Hence, in addition to converting all letters to lower case, the preprocessing phase also discards those words that are placed immediately after any of those salutation words or the character ‘@’. In addition, SentiStrength-SE maintains the flexibility to allow the user to instruct the tool to consider any particular words as neutral in sentiment, in case an inherently sentimental word must be recognized as a proper noun, for example, to deal with the situation where a sentimental word is used as a system’s name.
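A simplified sketch of such a preprocessing pass is shown below (our illustration; the regular expressions are rough approximations in the spirit of Bettenburg et al. [151], not the tool's actual patterns, and spell checking is omitted):

```python
import re

SALUTATIONS = {"dear", "hi", "hello", "hellow"}

def preprocess(comment):
    """Simplified sketch of the preprocessing phase: strip URLs, dotted
    stack-trace frames, camel-case identifiers and numbers, lower-case the
    text, and drop words following a salutation or '@' (likely person names)."""
    text = re.sub(r"https?://\S+", " ", comment)           # URLs
    text = re.sub(r"\b(?:\w+\.){2,}\w+\b", " ", text)       # dotted identifiers / stack frames
    text = re.sub(r"\b\w*[a-z][A-Z]\w*\b", " ", text)       # camel-case identifiers
    text = re.sub(r"\d+", " ", text)                        # numeric characters
    words, keep, skip_next = text.lower().split(), [], False
    for w in words:
        if skip_next:
            skip_next = False
            continue
        if w.rstrip(";,.") in SALUTATIONS or w.startswith("@"):
            skip_next = True    # drop the following word (assumed person name)
            continue
        keep.append(w)
    return " ".join(keep)

print(preprocess("Hi Andrew; FATAL org.apache.xalan.templates.ElemTemplateElement failed at 20001113"))
```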

Figure 4.2: Default configuration of parameters in our SentiStrength-SE

4.3.2.4 Parameter Configuration for Better Handling of Negations

We carefully set a number of configuration parameters as defaults in our SentiStrength-SE tool, as shown in Figure 4.2. This default configuration of SentiStrength-SE is different from that of the original SentiStrength. Particularly, to mitigate the difficulty D5 in dealing with negations, the negation configuration parameter marked with a black rectangle in Figure 4.2 is set to five in SentiStrength-SE, which enables the tool to detect negations over a larger range of proximity, allowing zero to five intervening words between a negation and a sentimental word, as was also suggested in a previous study [144].
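A minimal sketch of this widened negation scope (our illustration, assuming a simple token window rather than SentiStrength-SE's actual matching logic) is:

```python
NEGATION_TERMS = {"not", "don't", "didn't", "haven't", "hasn't", "shouldn't", "never"}

def negated(words, idx, max_gap=5):
    """Sketch of the widened negation scope: a sentimental word at position idx
    is treated as negated if a negation term occurs with at most `max_gap`
    intervening words before it."""
    lo = max(0, idx - max_gap - 1)
    return any(w.lower() in NEGATION_TERMS for w in words[lo:idx])

sent = "I don't think this is a good idea".split()
print(negated(sent, sent.index("good")))   # True -> polarity of 'good' is inverted
```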

4.4 Empirical Evaluation of SentiStrength-SE

While making the design and tuning decisions to develop SentiStrength-SE, we remain careful about the possibility that the application of a particular heuristic for improvement in one area might have side-effects causing performance degradation in another criterion. We empirically evaluate our domain-specific techniques and the accuracy of the domain-specific SentiStrength-SE in several phases around seven research questions.
Dataset: For the empirical evaluation of our SentiStrength-SE, we use the 5,600 issue comments in Group-2 and Group-3 of the “Gold Standard" dataset introduced in Section 4.2.1. The ground-truth about the sentimental polarities of those issue comments is determined based on the manual evaluations by human raters as described in Section 4.2.3. Before conducting the evaluation, we present textual characteristics of the Group-2 and Group-3 datasets in Table 4.5, which indicates no substantial differences in the characteristics of the two datasets.

Table 4.5: Textual characteristics of the words in the datasets
Datasets | Number of Distinct Words | Complexity Factor (Lexical Density) | Number of Sentences | Average Sentence Length (in words) | Number of Sentences per Comment
Group-2 | 5,295 | 21.00% | 5,671 | 8.23 | 3.54
Group-3 | 5,527 | 24.30% | 4,000 | 8.26 | 1.00

Metrics: The accuracy of sentiment analysis is measured in terms of precision, recall, and F-score computed for each of the three sentimental polarities (i.e., positivity, negativity and neutrality). Given a set S of textual contents, the precision (℘), recall (ℜ), and F-score (Ⅎ) for a particular sentimental polarity e are calculated as follows:

℘ = |Sₑ ∩ S′ₑ| / |S′ₑ|,    ℜ = |Sₑ ∩ S′ₑ| / |Sₑ|,    Ⅎ = (2 × ℘ × ℜ) / (℘ + ℜ)

¨ where e represents the set of texts having sentimental polarity e, and e denotes the set of texts that are detected (by tool) to have the sentimental polarity e. Statistical Test of Significance: We apply non-parametric McNemar’s test [153] to verify the sta- tistical significance in the difference of the results obtained by two tools, say a and b. As the non-parametric test does not require normal distribution of data, this test suits well for our purpose. We perform a McNemar’s test on 2 × 2 contingency table derived from the results obtained from tools a and b. The structure of such a contingency table is shown in Table 4.6.

Table 4.6: Structure of the 2 × 2 contingency matrix of McNemar’s test for tools a and b
n00: # of comments misclassified by both a and b | n01: # of comments misclassified by b but not by a
n10: # of comments misclassified by a but not by b | n11: # of comments correctly classified by both a and b

Let Fa and Fb denote the sets of comments misclassified by a and b respectively. In the contingency table (Table 4.6), n00 represents the number of issue comments misclassified by both a and b (i.e., n00 = |Fa ∩ Fb|), n01 represents the number of comments misclassified by b but not by a (i.e., n01 = |Fb − Fa|), n10 represents the number of comments misclassified by a but not by b (i.e., n10 = |Fa − Fb|), and n11 represents the number of comments correctly classified by both tools. Let S denote the set of all the issue comments classified against the ground-truth. Thus, n11 = |S| − |Fa ∪ Fb|. The superiority of tool b over tool a is observed if n10 > n01. Otherwise, a is considered superior if n01 > n10. Such observed superiority is considered statistically significant if the p-value obtained from a McNemar’s test is less than a

pre-specified significance level α. In our work, we set α = 0.001, which is a reasonable setup widely used in the literature.
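The following Python sketch (ours) shows how the contingency counts can be derived from the two tools' sets of misclassified comments and fed into a continuity-corrected McNemar's chi-square test; the dissertation does not restate which exact variant of the test was used, so the statistic below is an assumption for illustration:

```python
from scipy.stats import chi2

def mcnemar_from_failures(all_comments, failed_a, failed_b, alpha=0.001):
    """Build the 2x2 contingency counts from the sets of comments misclassified
    by tools a and b, and run a continuity-corrected McNemar's chi-square test.
    Assumes n01 + n10 > 0."""
    n00 = len(failed_a & failed_b)
    n01 = len(failed_b - failed_a)
    n10 = len(failed_a - failed_b)
    n11 = len(all_comments) - len(failed_a | failed_b)
    stat = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)   # continuity-corrected statistic
    p = chi2.sf(stat, 1)                             # p-value, chi-square with 1 df
    better = "b" if n10 > n01 else "a"
    return (n00, n01, n10, n11), stat, p, (p < alpha), better

# Tiny toy usage (not the dissertation's data):
comments = set(range(10))
fa = {0, 1, 2, 3, 4}      # misclassified by tool a
fb = {0, 1}               # misclassified by tool b
print(mcnemar_from_failures(comments, fa, fb))
```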

4.4.1 Head-to-head Comparison Using a Benchmark Dataset

We compare our software engineering domain-specific SentiStrength-SE with the original SentiStrength [45] tool and two other toolkits NLTK [46] and Stanford NLP [47]. To the best of our knowledge, these are the only domain independent tools/toolkits attempted in the past for sentiment analysis in software engineering text [41, 40, 50, 154]. In particular, we address the following research question:

RQ1: Does our domain-specific SentiStrength-SE outperform the existing domain independent tools for sentiment analysis in software engineering text?

We write a Python script to import the NLTK sentiment analysis package [155, 46] and run it on texts to determine their sentimental polarities. NLTK determines the probability of positivity, negativity and neutrality of a text. In addition, it also provides a compound value Cv, which ranges between −1 and +1. A text contains positive sentiment if Cv > 0, and a text has negative sentiment if Cv < 0. Otherwise, a text is considered sentimentally neutral when Cv = 0. A similar procedure is also followed in another study [155] to determine sentiments of texts using NLTK. We develop a Java program using the JAR of the Stanford NLP tool to apply it on texts to determine their sentimental polarities. For a given text, Stanford NLP provides a sentiment score Sv between zero and four, where 0 ≤ Sv ≤ 1 indicates negative sentiment, 3 ≤ Sv ≤ 4 indicates positive sentiment, and Sv = 2 indicates neutral sentiment of the text [156].
We separately operate each of the selected tools and our SentiStrength-SE on the Group-2 and Group-3 portions of the “Gold Standard" dataset. Recall that the Group-2 and Group-3 datasets contain 1,600 and 4,000 issue comments respectively. For each of the three sentimental polarities (i.e., positivity, negativity, and neutrality), we compare the tools’ outcomes with the ground-truth and separately compute precision, recall, and F-score for all the tools with respect to each dataset.
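For clarity, the score-to-label mappings used for NLTK and Stanford NLP can be sketched as follows (our illustration; the compound value Cv is assumed to come from NLTK's sentiment analyzer and Sv from Stanford NLP, as described above):

```python
def nltk_label(compound):
    """Map NLTK's compound score Cv in [-1, +1] to a polarity label,
    using the thresholds described above."""
    if compound > 0:
        return "positive"
    if compound < 0:
        return "negative"
    return "neutral"

def stanford_label(score):
    """Map Stanford NLP's sentence score Sv in 0..4 to a polarity label."""
    if score <= 1:
        return "negative"
    if score >= 3:
        return "positive"
    return "neutral"

print(nltk_label(0.42), stanford_label(2))   # -> positive neutral
```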

Table 4.7 presents the precision (℘), recall (ℜ), and F-score (Ⅎ) of all the tools in the detection of positive, negative and neutral sentiments, and also the average over all these three sentimental polarities.

Table 4.7: Head-to-head comparison of performances of the four tools/toolkits
Data | Sentiments | Met. | SentiStrength-SE | SentiStrength | NLTK | Stanford NLP
Group-2 | Positive | ℘ | 88.86% | 74.48% | 69.47% | 79.77%
Group-2 | Positive | ℜ | 98.81% | 98.81% | 81.55% | 71.67%
Group-2 | Positive | Ⅎ | 93.57% | 84.93% | 75.0% | 75.50%
Group-2 | Negative | ℘ | 53.42% | 28.22% | 40.46% | 13.28%
Group-2 | Negative | ℜ | 97.66% | 97.66% | 54.69% | 88.28%
Group-2 | Negative | Ⅎ | 69.06% | 43.78% | 46.51% | 23.08%
Group-2 | Neutral | ℘ | 98.14% | 96.83% | 69.53% | 63.70%
Group-2 | Neutral | ℜ | 83.00% | 52.42% | 50.86% | 25.57%
Group-2 | Neutral | Ⅎ | 89.94% | 68.01% | 58.75% | 36.49%
Group-3 | Positive | ℘ | 41.80% | 31.69% | 20.32% | 69.47%
Group-3 | Positive | ℜ | 82.04% | 87.79% | 86.33% | 81.55%
Group-3 | Positive | Ⅎ | 55.39% | 46.58% | 32.89% | 75.03%
Group-3 | Negative | ℘ | 68.61% | 47.61% | 50.65% | 40.46%
Group-3 | Negative | ℜ | 71.00% | 78.40% | 70.24% | 54.69%
Group-3 | Negative | Ⅎ | 69.78% | 59.25% | 58.86% | 46.51%
Group-3 | Neutral | ℘ | 90.64% | 91.28% | 91.17% | 69.53%
Group-3 | Neutral | ℜ | 80.05% | 56.16% | 45.78% | 50.86%
Group-3 | Neutral | Ⅎ | 85.02% | 69.54% | 60.96% | 58.74%
Overall average accuracy | | ℘ | 73.58% | 61.69% | 56.93% | 56.04%
Overall average accuracy | | ℜ | 85.43% | 78.54% | 64.91% | 62.10%
Overall average accuracy | | Ⅎ | 79.06% | 62.02% | 55.50% | 52.56%

Notice that for the Group-2 dataset, our SentiStrength-SE consistently achieves the highest precision, recall and F-score compared to the other tools. For the Group-3 dataset, SentiStrength-SE achieves the highest precision and F-score in detecting negative sentiments, and it achieves the highest recall and F-score in the detection of neutral sentiments. In those few cases where SentiStrength-SE does not achieve the best results, it remains the second best or marginally close to the best. In the detection of positive sentiments in the Group-3 dataset, Stanford NLP achieves the highest precision and recall, where our SentiStrength-SE yields the second-best results. Similarly, the original SentiStrength achieves the highest recall in the detection of negative sentiments in the Group-3 dataset, and again our SentiStrength-SE obtains the second-best result. The highest precision (91.28%, achieved by the original SentiStrength) for neutral sentiments is only 0.64% higher than that of our SentiStrength-SE. Thus, if we consider the overall average accuracy, as presented at the bottom of the table, it becomes evident that our SentiStrength-SE performs the best, followed by the original SentiStrength and NLTK. Notice that the overall precision, recall and F-score of our SentiStrength-SE are substantially higher than those of the second-best performing tool (i.e., the original SentiStrength).

Table 4.8: Contingency matrix of McNemar’s test in comparison between the original SentiStrength and SentiStrength-SE
n00 = 748 (# of comments misclassified by both a and b) | n01 = 365 (# of comments misclassified by b but not by a)
n10 = 1,527 (# of comments misclassified by a but not by b) | n11 = 2,924 (# of comments correctly classified by both a and b)
Here, a = original SentiStrength and b = SentiStrength-SE

To verify whether the observed performance difference between SentiStrength-SE and the original SentiStrength is statistically significant, we perform a McNemar’s test between the results of these two tools. For both datasets, Group-2 and Group-3, we compute n00, n01, n10 and n11 according to their specifications described in Table 4.6. In Table 4.8, we present the contingency matrix computed for the McNemar’s test. We observe the superiority of b (SentiStrength-SE) in the contingency table as n10 > n01. According to the p-value (p = 2.2 × 10⁻¹⁶, p < α) obtained from the test, the observed difference in the superior performance of SentiStrength-SE is statistically significant. Based on these observations and the statistical test, we now derive the answer to the research question RQ1 as follows:

Ans. to RQ1: In the detection of sentiments in software engineering text, our domain-specific SentiStrength-SE significantly outperforms the domain independent NLTK, Stanford NLP, and the original SentiStrength.

4.4.2 Comparison with respect to Human Raters’ Disagreements

Recall that the issue comments in the “Gold Standard" dataset are annotated with sentiments as identified by independent human raters. There are disagreements among the human raters in the identification of sentiments in some issue comments. While humans disagree about the sentiments in some issue comments, it is likely that the automated tools will also produce different outcomes, resulting in varied precision and recall. We, therefore, investigate to what extent the agreements and disagreements of annotations among human raters cause deviation in the results of the head-to-head comparison of tools as described in the previous section. Particularly, we want to verify whether our domain-specific SentiStrength-SE still outperforms the other domain-independent tools when the raters’ agreements and disagreements are taken into account. Here, we address the following research question:

RQ2: Does the accuracy of SentiStrength-SE largely vary compared to its domain independent counterparts when the agreements and disagreements among human raters are taken into account?

For this investigation, we use the Group-2 dataset where each issue comment was independently annotated by three human raters. We distinguish two sets of issue comments from this Group-2 dataset.

i) Set-A: containing those issue comments for which all the three human raters agreed on the sentiments expressed in those comments. This set contains 1,210 issue comments.

ii) Set-B: consisting of those issue comments for which two of the three raters agreed on the sentiments expressed in those comments. This set contains 357 issue comments.

We formulate the following null and alternative hypotheses to determine the statistical significance of the improved performances of the best tool.
Null Hypothesis-1 (H0¹): There is no significant difference in the performance of SentiStrength-SE compared to the other tools in sentiment detection in the issue comments in Set-A.
Alternative Hypothesis-1 (Ha¹): There exist significant differences in the performance of SentiStrength-SE compared to the other tools in sentiment detection in the issue comments in Set-A.
Null Hypothesis-2 (H0²): There is no significant difference in the performance of SentiStrength-SE compared to the other tools in sentiment detection in the issue comments in Set-B.
Alternative Hypothesis-2 (Ha²): There exist significant differences in the performance of SentiStrength-SE compared to the other tools in sentiment detection in the issue comments in Set-B.
We now examine whether these hypotheses hold true with respect to the four tools we compare. In a similar way to the head-to-head comparison described in the previous section, we separately run all four tools, including our SentiStrength-SE, on the Set-A and Set-B issue comments. For each of the three sentimental polarities (i.e., positivity, negativity, and neutrality), we compute precision (℘), recall (ℜ), and F-score (Ⅎ) for each of the tools separately over the issue comments in both Set-A and Set-B. For the issue comments in both Set-A and Set-B, the best tool must exhibit significantly improved performances compared to the other tools. For testing our hypotheses, in each of the Set-A and Set-B datasets, we first identify the two tools producing better results among all four tools. Then, we examine if there is any significant difference in the performances of the best tool and the second best one. If there exist significant differences between the accuracies of the top performing two tools, discovering such would suffice for demonstrating the existence of a significant difference of the best performing tool against the other tools.
Table 4.9 presents the metrics’ values of all the tools in the detection of positive, negative and neutral sentiments for each set of the issue comments. As seen in Table 4.9, for the issue comments in Set-A, SentiStrength-SE consistently achieves the highest precision, recall and F-score in the detection of positive, negative and neutral sentiments. The overall average accuracies indicate that the original SentiStrength achieves the second best accuracies in the Set-A dataset. Now, we perform a McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength for the issue comments in the Set-A dataset. The contingency table for the test is presented in Table 4.10. According to the contingency table, SentiStrength-SE (b) performs better as n10 > n01. The performance difference is found to be statistically significant with p = 2.2 × 10⁻¹⁶ and p < α. Thus, the McNemar’s test rejects our first null hypothesis (H0¹). Therefore, the first alternative hypothesis (Ha¹) holds true.

Table 4.9: Comparison of tools’ accuracies for Set-A and Set-B issue comments
Data | Sentiments | Met. | SentiStrength-SE | SentiStrength | NLTK | Stanford NLP
Set-A | Positive | ℘ | 90.15% | 70.46% | 67.9% | 83.39%
Set-A | Positive | ℜ | 95.33% | 91.20% | 61.58% | 76.66%
Set-A | Positive | Ⅎ | 92.67% | 79.50% | 64.17% | 79.89%
Set-A | Negative | ℘ | 53.66% | 28.17% | 49.45% | 9.66%
Set-A | Negative | ℜ | 100.00% | 100.00% | 54.55% | 86.36%
Set-A | Negative | Ⅎ | 69.84% | 43.96% | 51.87% | 17.38%
Set-A | Neutral | ℘ | 99.44% | 99.43% | 77.00% | 67.61%
Set-A | Neutral | ℜ | 91.15% | 58.84% | 54.08% | 28.40%
Set-A | Neutral | Ⅎ | 95.12% | 73.93% | 63.53% | 40.00%
Set-A | Overall average accuracy | ℘ | 81.08% | 66.02% | 64.78% | 53.55%
Set-A | Overall average accuracy | ℜ | 95.49% | 83.35% | 56.74% | 63.81%
Set-A | Overall average accuracy | Ⅎ | 85.88% | 65.79% | 59.86% | 45.76%
Set-B | Positive | ℘ | 72.48% | 63.64% | 67.32% | 64.91%
Set-B | Positive | ℜ | 96.89% | 97.92% | 70.46% | 57.13%
Set-B | Positive | Ⅎ | 82.92% | 77.14% | 68.86% | 60.98%
Set-B | Negative | ℘ | 51.75% | 21.63% | 50.74% | 20.83%
Set-B | Negative | ℜ | 96.72% | 60.65% | 55.73% | 90.16%
Set-B | Negative | Ⅎ | 67.42% | 31.89% | 53.12% | 33.85%
Set-B | Neutral | ℘ | 83.92% | 86.21% | 38.24% | 38.88%
Set-B | Neutral | ℜ | 40.87% | 21.74% | 33.91% | 12.17%
Set-B | Neutral | Ⅎ | 54.97% | 34.72% | 35.94% | 18.54%
Set-B | Overall average accuracy | ℘ | 69.38% | 57.16% | 52.10% | 41.54%
Set-B | Overall average accuracy | ℜ | 78.16% | 60.10% | 53.37% | 53.15%
Set-B | Overall average accuracy | Ⅎ | 68.44% | 58.59% | 52.72% | 37.79%

Table 4.10: Contingency matrix of McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength in the Set-A dataset
n00 = 84 (# of comments misclassified by both a and b) | n01 = 22 (# of comments misclassified by b but not by a)
n10 = 511 (# of comments misclassified by a but not by b) | n11 = 593 (# of comments correctly classified by both a and b)
Here, a = original SentiStrength and b = SentiStrength-SE

Again, as seen in Table 4.9, for the issue comments in Set-B, SentiStrength-SE achieves the highest F-score in detecting sentiments of all three polarities. The precision and recall of SentiStrength-SE are also the highest in all cases except for only two. The recall of SentiStrength-SE for positive sentiments is 96.89%, which is the second best and only 1.03% lower than the highest. Similarly, the precision of our SentiStrength-SE in detecting neutral sentiments is 83.92%, which is also next to the best and only 2.29% lower than the best. Also, with respect to the overall average accuracies, SentiStrength can be considered to have achieved the second best performance for the Set-B dataset.

Table 4.11: Contingency matrix of McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength in the Set-B dataset
n00 = 97 (# of comments misclassified by both a and b) | n01 = 20 (# of comments misclassified by b but not by a)
n10 = 135 (# of comments misclassified by a but not by b) | n11 = 105 (# of comments correctly classified by both a and b)
Here, a = original SentiStrength and b = SentiStrength-SE

Similar to the Set-A dataset, for Set-B, we carry out a McNemar’s test between the accuracies of SentiStrength-SE and the original SentiStrength to determine whether or not there are any statistically significant differences between the performances of these two tools. The contingency matrix for the test is presented in Table 4.11. According to the contingency matrix, SentiStrength-SE (b) outperforms the original SentiStrength (a) with n10 > n01. The McNemar’s test with the contingency matrix of Table 4.11 obtains p = 2.2 × 10⁻¹⁶ and thus p < α. Thus, our second null hypothesis (H0²) is rejected and the second alternative hypothesis (Ha²) holds true, which indicates that the superior performance of SentiStrength-SE over the original SentiStrength is statistically significant for the issue comments in the Set-B dataset. Thus, for both the Set-A and Set-B datasets, SentiStrength-SE significantly outperforms the next best performer SentiStrength. Based on our observations and the results from the statistical tests over both the Set-A and Set-B datasets, we now derive the answer to the research question RQ2 as follows:

Ans. to RQ2: When the agreements and disagreements among human raters are taken into ac- count, our domain-specific SentiStrength-SE still maintains significantly superior (compared to its domain independent counterparts) accuracies in detecting sentiments in software engineering text.

4.4.3 Evaluating the Contribution of Domain Dictionary

The newly developed software engineering domain dictionary is a major component of SentiStrength-SE. Here, we carry out a quantitative evaluation to verify the contribution of the domain dictionary to detecting sentiments in software engineering texts accurately. Especially, we address the following research question:

RQ3: Does the domain-specific dictionary in SentiStrength-SE really contribute to improved sentiment analysis in software engineering text?

For this particular evaluation, we again use the Group-2 and Group-3 datasets introduced before. We invoke the original SentiStrength for detecting sentiments in issue comments in these datasets. Then, we operate SentiStrength making it use our newly developed domain dictionary and invoke it for sentiment detection in the same issue comments. We use SentiStrength* to refer to the variant of the original SentiStrength that is forced to use our domain dictionary instead of its original one. For each of the three sentimental polarities, we separately compute and compare the precision, recall, and F-score resulting from the tools in each dataset.

Table 4.12: Comparison of performances between SentiStrength and SentiStrength*
Data | Sentiment | Met. | SentiStrength | SentiStrength*
Group-2 | Positive | ℘ | 74.48% | 87.56%
Group-2 | Positive | ℜ | 98.81% | 98.28%
Group-2 | Positive | Ⅎ | 84.93% | 92.61%
Group-2 | Negative | ℘ | 28.22% | 53.19%
Group-2 | Negative | ℜ | 97.66% | 97.65%
Group-2 | Negative | Ⅎ | 43.78% | 68.87%
Group-2 | Neutral | ℘ | 96.83% | 97.94%
Group-2 | Neutral | ℜ | 52.42% | 81.85%
Group-2 | Neutral | Ⅎ | 68.01% | 89.18%
Group-3 | Positive | ℘ | 31.69% | 40.44%
Group-3 | Positive | ℜ | 87.79% | 82.01%
Group-3 | Positive | Ⅎ | 46.58% | 54.16%
Group-3 | Negative | ℘ | 47.61% | 69.10%
Group-3 | Negative | ℜ | 78.40% | 72.65%
Group-3 | Negative | Ⅎ | 59.25% | 70.83%
Group-3 | Neutral | ℘ | 91.28% | 91.22%
Group-3 | Neutral | ℜ | 56.16% | 79.54%
Group-3 | Neutral | Ⅎ | 69.54% | 84.98%
Overall average accuracy | | ℘ | 61.69% | 73.24%
Overall average accuracy | | ℜ | 78.54% | 85.33%
Overall average accuracy | | Ⅎ | 62.02% | 76.77%
Here, SentiStrength* is forced to use our domain dictionary instead of its own one.

4.4.3.1 Comparison between the SentiStrength and SentiStrength*

If our domain dictionary actually contributes to improved sentiment analysis in software engineering text, SentiStrength* must perform better than the original SentiStrength. In Table 4.12, we present the precision (℘), recall (ℜ), and F-score (Ⅎ) obtained in the detection of each sentimental polarity. In the table, substantial (i.e., more than 1%) differences are marked in bold. As seen in Table 4.12, in every case, SentiStrength* achieves a higher F-score than the original SentiStrength. Moreover, SentiStrength* shows much higher precision in all cases except for neutral comments in the Group-3 dataset. For the neutral comments in the Group-3 dataset, the precision of the original SentiStrength is marginally higher by only 0.06%. In all the cases across datasets the precision, recall, and F-score of SentiStrength* are higher than or comparable to those of the original SentiStrength. Only in two of the 18 cases (i.e., the recall for positive and negative sentiments in the Group-3 dataset) is the original SentiStrength’s performance perceived (substantially) better than SentiStrength*. These observations are also reflected in the overall average accuracies presented in the bottom three rows of the table. The overall average accuracies indicate superior performance of SentiStrength* over the original SentiStrength. Hence, the observed accuracy of SentiStrength* is substantially higher when it is forced to use our new domain dictionary instead of its original one. To determine the statistical significance of our observations, we perform another McNemar’s test between the results of SentiStrength* and the original SentiStrength. As such, we formulate our null and alternative hypotheses as follows:
Null Hypothesis-3 (H0³): There is no significant difference between the accuracies of the original SentiStrength and SentiStrength*.
Alternative Hypothesis-3 (Ha³): There exist significant differences between the accuracies of the original SentiStrength and SentiStrength*.

Table 4.13: McNemar’s test between SentiStrength and SentiStrength*
n00 = 748 (# of comments misclassified by both a and b) | n01 = 334 (# of comments misclassified by b but not by a)
n10 = 1,527 (# of comments misclassified by a but not by b) | n11 = 2,955 (# of comments correctly classified by both a and b)
Here, a = original SentiStrength and b = SentiStrength*

The contingency matrix for the McNemar’s test is presented in Table 4.13. As seen in the contingency matrix, SentiStrength* (b) exhibits higher accuracies (compared to the original SentiStrength) as n10 > n01. The test obtains p = 2.2 × 10⁻¹⁶, where p < α. Thus, the test rejects our null hypothesis (H0³). Hence the alternative hypothesis (Ha³) holds true, indicating that the difference is statistically significant. Based on these observations and the statistical test, we conclude that our newly created domain dictionary indeed contributes to statistically significant improvements in sentiment analysis. Hence, we answer the research question RQ3 as follows:

Ans. to RQ3: Our newly created domain dictionary makes statistically significant contributions to the improvement of sentiment analysis in software engineering domain.

4.4.4 Our Domain Dictionary vs. SentiStrength’s Optimized Dictionary

The original SentiStrength has a feature that facilitates optimizing its dictionary for a particular domain [157]. We want to verify how our domain dictionary performs in comparison with SentiStrength’s dictionary optimized for software engineering text. In particular, we address the following research question:

RQ4: Can SentiStrength’s dictionary optimized for software engineering text perform better than SentiStrength-SE’s domain-specific dictionary we created?

4.4.4.1 Optimizing SentiStrength’s Dictionary

SentiStrength’s original dictionary can be optimized for a particular domain by feeding it with a set of annotated pieces of text. To optimize SentiStrength’s dictionary for the software engineering domain, we use a dataset consisting of Stack Overflow posts/comments related to software engineering. This Stack Overflow posts (SOP) dataset contains a total of 4,423 comments [55, 130]. Each comment in the SOP dataset is assigned appropriate sentimental polarities (i.e., positive, negative, neutral) depending on which ones it expresses. Thus, 35% of the posts are labeled with positive sentiment and 27% are labeled with negative sentiment, while 38% of the posts are labeled as neutral in sentiment [55]. Simple annotation with sentimental polarity labels is not enough for SentiStrength to be able to use the dataset for optimizing its dictionary. For this purpose, SentiStrength requires a pair of integer sentiment scores ⟨c₊, c₋⟩ pre-assigned to each comment C, where +1 ≤ c₊ ≤ +5 and −5 ≤ c₋ ≤ −1. The interpretation of these scores is similar to what is described in Section 4.2.4. c₊ and c₋ respectively represent the positive and negative sentimental scores pre-assigned to the given text C. A given text C labeled to have positive sentiment must be assigned a positive sentimental score c₊ > +1. A higher c₊ indicates a higher intensity/strength of the positive emotion. Similarly, a text labeled to have negative sentiment must be assigned a negative sentimental score c₋ < −1. A lower c₋ signifies a higher intensity/strength of the negative emotion expressed in the text C. A text labeled as neutral in sentiment must be assigned the sentimental scores ⟨1, −1⟩.

Table 4.14: Examples of assigning sentiment scores to labeled comments
Comment Text | Sentiments Labeled by Human Raters | Sentimental Scores ⟨c₊, c₋⟩
"@DrabJay: excellent suggestion! Code changed. :-)" | Positive | ⟨+3, −1⟩
"That really stinks! I was afraid of that..." | Negative | ⟨+1, −3⟩
"A few but they all seem proprietary" | Neutral | ⟨+1, −1⟩

According to the requirements described above, we derive the sentimental scores for each of the comments in the SOP dataset. For a comment C having positive sentiment, we set c₊ = +3. Similarly, we set c₋ = −3 for a comment expressing negative sentiment. Instead of using extreme values from the domains of c₊ and c₋, we choose the ones at the medians. In Table 4.14, we present examples demonstrating how we assign sentiment scores to the labeled comments in the SOP dataset. This dataset is then fed to the original SentiStrength for optimizing its dictionary for software engineering text. Thus, we produce another variant of the original SentiStrength. We refer to this variant with the optimized dictionary as SentiStrengthᴼ.
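A minimal sketch of this label-to-score mapping (ours; the function name label_to_scores is hypothetical) is:

```python
def label_to_scores(label):
    """Map a human polarity label to the <pos, neg> score pair fed to
    SentiStrength's dictionary-optimization mode, using the median-valued
    scheme described above."""
    return {"positive": (+3, -1),
            "negative": (+1, -3),
            "neutral":  (+1, -1)}[label]

print(label_to_scores("positive"))   # -> (3, -1), as in Table 4.14
```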

4.4.4.2 Comparison between SentiStrengthᴼ and SentiStrength*

SentiStrengthᴼ and SentiStrength* only differ in their dictionaries. SentiStrengthᴼ uses the optimized dictionary while SentiStrength* uses the dictionary we created for SentiStrength-SE. Thus, a comparison between SentiStrengthᴼ and SentiStrength* implies a comparison between SentiStrength’s optimized dictionary and the SentiStrength-SE software engineering domain dictionary we created.
We invoke SentiStrengthᴼ for detecting sentiments in the issue comments in the Group-2 and Group-3 datasets and compute the values of precision (℘), recall (ℜ), and F-score (Ⅎ) in the detection of each sentimental polarity. We present the computed metric values for SentiStrengthᴼ and SentiStrength* side by side in Table 4.15.
In Table 4.15, we see that SentiStrength* always achieves a higher F-score than SentiStrengthᴼ. Moreover, SentiStrength* achieves much higher precision in all cases except for neutral comments in the Group-3 dataset. For the neutral comments in the Group-3 dataset, the precision of SentiStrength* is marginally lower than that of SentiStrengthᴼ by only 0.49%. The recall of SentiStrength* is also substantially higher than or nearly equal to that of SentiStrengthᴼ in 16 of the 18 cases. Finally, the overall average accuracies, as presented in the bottom three rows of the table, indicate that the overall precision, recall, and F-score of SentiStrength* are substantially higher than those of SentiStrengthᴼ. To determine the statistical significance of our observations, we perform another McNemar’s test between the results of SentiStrengthᴼ and SentiStrength*. Thus, we formulate our null and alternative hypotheses as follows:
Null Hypothesis-4 (H0⁴): There is no significant difference between the accuracies of SentiStrengthᴼ and SentiStrength*.
Alternative Hypothesis-4 (Ha⁴): There exist significant differences between the accuracies of SentiStrengthᴼ and SentiStrength*.
The contingency matrix for the McNemar’s test is presented in Table 4.16. As seen in the contingency matrix, SentiStrength* (b) exhibits higher accuracies (compared to the

Table 4.15: Comparison of performances of SentiStrengthO and SentiStrength*

Data      Sentiment   Met.   SentiStrengthO   SentiStrength*
Group-2   Positive    ℘      74.45%           87.56%
                      ℜ      98.68%           98.28%
                      Ⅎ      84.87%           92.61%
          Negative    ℘      30.12%           53.19%
                      ℜ      97.66%           97.65%
                      Ⅎ      46.04%           68.87%
          Neutral     ℘      96.69%           97.94%
                      ℜ      54.29%           81.85%
                      Ⅎ      69.53%           89.18%
Group-3   Positive    ℘      26.00%           40.44%
                      ℜ      86.90%           82.01%
                      Ⅎ      40.02%           54.16%
          Negative    ℘      47.13%           69.10%
                      ℜ      76.77%           72.65%
                      Ⅎ      58.40%           70.83%
          Neutral     ℘      91.71%           91.22%
                      ℜ      58.02%           79.54%
                      Ⅎ      71.07%           84.98%
Overall average       ℘      61.02%           73.24%
accuracy              ℜ      78.72%           85.33%
                      Ⅎ      68.74%           78.82%
Note: SentiStrengthO uses the optimized dictionary; SentiStrength* uses our domain dictionary.

Table 4.16: Contingency matrix for McNemar’s test between SentiStrengthO and SentiStrength*

# of comments misclassified by both a and b:            942
# of comments misclassified by b but not by a:          140
# of comments misclassified by a but not by b:        1,046
# of comments correctly classified by both a and b:   3,436
Here, a = SentiStrengthO and b = SentiStrength*

As seen in the contingency matrix, SentiStrength* (b) exhibits higher accuracies (compared to SentiStrengthO) as n10 > n01. The test obtains p = 2.2 × 10⁻¹⁶ where p < α (the chosen significance level), and rejects our null hypothesis (H₀⁴). Hence, the alternative hypothesis (Hₐ⁴) holds true, indicating that the difference is statistically significant. The significantly superior accuracy of SentiStrength* implies that the domain dictionary we created for SentiStrength-SE outperforms the original SentiStrength’s optimized dictionary. We, therefore, answer the research question RQ4 as follows:

Ans. to RQ4: The domain dictionary we created for SentiStrength-SE performs significantly better than the optimized dictionary of the original SentiStrength.
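For reproducibility, the McNemar’s tests used here and in the later comparisons can be computed from the off-diagonal counts of the corresponding contingency matrices; the following is a minimal sketch assuming SciPy is available, not the exact script used for the reported results.

# Minimal sketch of the McNemar test from contingency-matrix counts.
from scipy.stats import chi2

def mcnemar_test(n01, n10):
    """n01: misclassified by b only; n10: misclassified by a only."""
    # Chi-squared statistic with continuity correction, 1 degree of freedom.
    statistic = (abs(n10 - n01) - 1) ** 2 / (n10 + n01)
    p_value = chi2.sf(statistic, df=1)
    return statistic, p_value

# Counts from Table 4.16: a = SentiStrengthO, b = SentiStrength*
print(mcnemar_test(n01=140, n10=1046))  # yields a vanishingly small p-value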

4.4.5 Comparison with a Large Domain-independent Dictionary

As mentioned before, domain difficulty is among the major reasons why domain-independent sentiment analysis techniques are found to have performed poorly when operated on technical texts. This work of ours reveals the same, as described in Section 4.2.5.2. To overcome the domain difficulties, we have created a domain-specific dictionary and heuristics in our SentiStrength-SE. However, compared to the existing domain-independent dictionaries, our domain-specific dictionary is small in size, with 167 positively and 310 negatively polarized entries. One might argue that a substantially large domain-independent dictionary might not suffer from the domain difficulties we are concerned about and may perform equally well, if not better than, our relatively small domain-specific dictionary. To verify this possibility, we compare the performance of our domain-specific dictionary with that of a large domain-independent dictionary. Particularly, we address the following research question:

RQ5: Can a large domain-independent dictionary perform better than the domain-specific dictionary we created for SentiStrength-SE?

4.4.5.1 Choosing a Domain Independent Dictionary for Comparison

There are several domain-independent dictionaries (e.g., AFINN [118], MPQA [119], VADER [120], SentiWordNet [158], SentiWords [159], and the dictionary of Warriner et al. [160]) available for sentiment analysis in general. Islam and Zibran [57] compared the performances of the AFINN [118], MPQA [119] and VADER [120] dictionaries in sentiment analysis of software engineering text. However, all the dictionaries used in the work of Islam and Zibran [57] can be considered to have low coverage. On the other hand, SentiWordNet [158], SentiWords [159], and the extended ANEW (Affective Norms for English Words) dictionary of Warriner et al. [160] are larger in size and have higher coverage compared to the AFINN, MPQA and VADER dictionaries. Among these three high-coverage large dictionaries, we opt for the extended ANEW dictionary of Warriner et al. [160], which includes 13,915 English lemmas having 67% reported coverage [159].

Table 4.17: Conversion of valence scores from [+1, +9] to [-5, +5]

Score in [+1, +9]:   +1   +2   +3   +4   +5   +6   +7   +8   +9
Score in [-5, +5]:   -5   -4   -3   -2   ±1   +2   +3   +4   +5

We choose this dictionary for two main reasons: (i) this dictionary has already been used in software engineering studies [161, 58]; (ii) use of parts-of-speech (POS) as context to determine words’ polarities is found to show low accuracy in detecting sentiments in software engineering texts [57]. Therefore, we exclude SentiWords and SentiWordNet, as these two dictionaries use POS as a context to determine words’ polarities.

4.4.5.2 Range Conversion

In the extended ANEW dictionary of Warriner et al. [160], each word ω is assigned a valence score vω, which is a real number between +1.0 and +9.0 signifying the sentimental polarity and strength/intensity of the word ω. The sentimental polarity of the word ω, denoted as sentiment(ω), is interpreted according to Equation 4.5 below.

sentiment(ω) =  Positive,  if vω > +5.0
                Negative,  if vω < +5.0                  (4.5)
                Neutral,   otherwise.

In contrast, both the original SentiStrength and our SentiStrength-SE use the integer range [-5, +5] and a different interpretation for the same purpose. To use this extended ANEW dictionary in SentiStrength, we convert the valence score of each word in the extended ANEW dictionary from the [+1.0, +9.0] range to the [-5, +5] range. In doing that, the fractional value of vω is first rounded to its nearest integer. Then, using the conversion scale in Table 4.17, we convert each rounded integer valence score in the range [+1, +9] to the corresponding score in the integer range [-5, +5]. For example, if the original valence score of a word rounded to the closest integer is +2, it is converted to -4, according to the mappings shown in Table 4.17. Such a conversion between ranges does not alter the original valence strength/intensity of the words [58]. A similar approach was adopted in a recent work [58] for range conversion of arousal scores.
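As an illustration, the conversion described above can be sketched as follows; the function name is hypothetical and the mapping simply encodes Table 4.17.

# Minimal sketch of the valence-score conversion in Table 4.17; this is our
# illustration of the mapping, not the dissertation's original code.
def convert_valence(v):
    """Map a valence score in [+1.0, +9.0] to the [-5, +5] scale used by SentiStrength."""
    v_rounded = int(round(v))   # round to the nearest integer in [+1, +9]
    mapping = {1: -5, 2: -4, 3: -3, 4: -2,
               5: +1,           # the ±1 (neutral) case
               6: +2, 7: +3, 8: +4, 9: +5}
    return mapping[v_rounded]

print(convert_valence(2.3))   # -4, as in the example given above
print(convert_valence(7.8))   # +4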

4.4.5.3 Comparison between SentiStrengthW and SentiStrength*

We create another variant of the original SentiStrength by replacing its original dictionary with the one created based on the dictionary of Warriner et al., and call this new variant SentiStrengthW. SentiStrengthW is invoked for detecting sentiments in issue comments in the Group-2 and Group-3 datasets. Then we compute the precision (℘), recall (ℜ), and F-score (Ⅎ) of SentiStrengthW in the detection of each sentimental polarity. The computed metrics values for SentiStrengthW and SentiStrength* are presented side by side in Table 4.18.

Table 4.18: Comparison of performances of SentiStrengthW and SentiStrength*

Data      Sentiment   Met.   SentiStrengthW   SentiStrength*
Group-2   Positive    ℘      50.17%           87.56%
                      ℜ      99.60%           98.28%
                      Ⅎ      66.73%           92.61%
          Negative    ℘      16.52%           53.19%
                      ℜ      85.94%           97.65%
                      Ⅎ      27.71%           68.87%
          Neutral     ℘      91.11%           97.94%
                      ℜ      05.86%           81.85%
                      Ⅎ      11.01%           89.18%
Group-3   Positive    ℘      10.40%           40.44%
                      ℜ      96.79%           82.01%
                      Ⅎ      18.79%           54.16%
          Negative    ℘      28.33%           69.10%
                      ℜ      70.29%           72.65%
                      Ⅎ      40.38%           70.83%
          Neutral     ℘      75.94%           91.22%
                      ℜ      08.23%           79.54%
                      Ⅎ      14.84%           84.98%
Overall average       ℘      45.41%           73.24%
accuracy              ℜ      61.12%           85.33%
                      Ⅎ      52.10%           78.82%
Note: SentiStrengthW uses the extended ANEW dictionary of Warriner et al. [160]; SentiStrength* uses our domain dictionary (created for SentiStrength-SE).

As seen in Table 4.18, in 16 of the 18 cases SentiStrength* achieves higher precision, recall, and F-score compared to those of SentiStrengthW. Only SentiStrength*’s recalls for positive sentiments are slightly lower than those of SentiStrengthW. Notice that, for those same cases, SentiStrengthW’s precision is substantially lower compared to SentiStrength*. In every case, SentiStrength* maintains a good balance between precision and recall in detecting sentimental and neutral comments. Such balancing between precisions and recalls results in higher F-scores for SentiStrength* in all cases. The overall accuracies, as presented in the bottom three rows of the table, indicate substantially higher precision, recall, and F-score of SentiStrength* compared to SentiStrengthW. To determine the statistical significance of the observed differences in the accuracies, we perform a McNemar’s test between the results of SentiStrengthW and SentiStrength*. For the statistical test, we formulate our null and alternative hypotheses as follows:
Null Hypothesis-5 (H₀⁵): There is no significant difference between the accuracies of SentiStrengthW and SentiStrength*.
Alternative Hypothesis-5 (Hₐ⁵): There exist significant differences between the accuracies of SentiStrengthW and SentiStrength*.

Table 4.19: Contingency matrix for McNemar’s test between SentiStrengthW and SentiStrength*

# of comments misclassified by both a and b:            1,014
# of comments misclassified by b but not by a:             68
# of comments misclassified by a but not by b:          3,270
# of comments correctly classified by both a and b:     1,212
Here, a = SentiStrengthW and b = SentiStrength*

The contingency matrix for the McNemar’s test is presented in Table 4.19. As seen in the contingency matrix, SentiStrength* (b) exhibits higher accuracies (compared to SentiStrengthW) as n10 > n01. The test obtains p = 2.2 × 10⁻¹⁶ where p < α. Thus, the test rejects our null hypothesis (H₀⁵). Hence, the alternative hypothesis (Hₐ⁵) holds true, indicating that the differences in the accuracies of SentiStrength* and SentiStrengthW are statistically significant. Therefore, we answer the research question RQ5 as follows:

Ans. to RQ5: For sentiment analysis in software engineering text, the domain-specific dictionary we created for SentiStrength-SE performs significantly better than a large domain-independent dictionary.

4.4.5.4 Manual Investigation to Reveal Cause

We conduct an immediate qualitative investigation to reveal why the large dictionary of Warriner et al., having higher coverage, performs worse than our smaller domain-specific dictionary. We identify

a set of comments C_m^W from the Group-2 dataset, which are misclassified by SentiStrengthW. From the set C_m^W we distinguish another subset C_c^S, which are correctly classified by SentiStrength*. Then we randomly pick 50 comments from the set C_c^S for manual investigation.
From the manual investigation, we find the domain-specific variations in the meaning of words (i.e., the difficulty D1 revealed in Section 4.2.5.2) to be the main reason for the low accuracies of SentiStrengthW. For example, the following neutral comment is identified to have both negative and positive sentiments by SentiStrengthW.

"... crash for the same reason. Made some local fixes here." (Comment ID: 149494)

The above comment is misclassified to have both positive and negative sentiments due to the presence of the words ‘Crash’ and ‘Fix’, which are respectively negatively and positively polarized words in the domain-independent ANEW dictionary of Warriner et al. [160]. In the software engineering domain, both words are neutral in sentiment. For the same reason, in the detection of neutral comments, the performance of the dictionary of Warriner et al. is even worse than both the optimized and the default dictionary of the original SentiStrength.
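To make the effect concrete, the following toy sketch contrasts a domain-independent lexicon with an SE-specific one on the comment above; the lexicon entries, scores, and helper functions are hypothetical and only mirror the behaviour described.

# Toy illustration (hypothetical lexicon entries and scores) of why 'crash' and
# 'fix' mislead a domain-independent dictionary but not an SE-specific one.
GENERAL_LEXICON = {"crash": -3, "fix*": +2}   # '*' mimics SentiStrength-style wild-card entries
SE_LEXICON = {}                               # both words treated as neutral domain terms

def lookup(word, lexicon):
    for entry, score in lexicon.items():
        if entry.endswith("*") and word.startswith(entry[:-1]):
            return score
        if word == entry:
            return score
    return 0

def detected_polarities(text, lexicon):
    scores = [lookup(w.strip(".,!?"), lexicon) for w in text.lower().split()]
    found = [p for p, present in (("positive", any(s > 0 for s in scores)),
                                  ("negative", any(s < 0 for s in scores))) if present]
    return " and ".join(found) or "neutral"

comment = "crash for the same reason. Made some local fixes here."
print(detected_polarities(comment, GENERAL_LEXICON))  # positive and negative
print(detected_polarities(comment, SE_LEXICON))       # neutral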

4.4.6 Comparison with an Alternative Domain Dictionary

Recall that our domain dictionary for SentiStrength-SE is developed using commit messages only. There is a possibility that a domain dictionary built on text from diverse sources may offer better performance. Thus, to verify this possibility, we create a second domain dictionary using text from diverse sources and compare this new alternative dictionary with the dictionary of SentiStrength-SE. In particular, we address the following research question:

RQ6: Can a domain dictionary developed using texts from diverse sources perform substantially better than SentiStrength-SE’s domain dictionary developed based on commit messages only?

4.4.6.1 Building an Alternative Domain Dictionary

In addition to the 490 thousand commit messages used to develop SentiStrength-SE’s dictionary, we obtain 1,600 Code Review Comments (CRC) [162], 1,795 JIRA Issue Comments (JIC) [58] and 4,423 Stack Overflow Posts (SOP) [55]. We use software engineering texts from these diverse datasets to build the alternative domain dictionary. Figure 4.3 depicts the steps/actions performed to develop this new dictionary.
Here, in the first three steps (i.e., step-1 through step-3), we first produce a set S that includes all distinct words from all four datasets. Then, we derive another set Cw of 1,198 words that are common to both S and the domain-independent dictionary of the original SentiStrength.

61 5.1

Datasets 4 Converting Full formed

SentiStrength Identifying wild-card U -SE (SSE) words of SSE D formed words C E w dictionary dictionary (E ) w w Commit to full forms w Comments (CC) Stanford NLP List of List of 5.2 6 List of

1 lemmatized 3 Identifying Identifying Excluding Code Review common — U sentimental Lemmatization C E w domain Comments (CRC) words in common words w w words (G ) of words datasets words words w (Cw) (Mw) JIRA Issue Comments (JIC) Converting Full formed 8 Converting 7 SentiStrength 2 List of Stackoverflow wild-card words of SS full-formed Merging (SS) New domain sentimental Posts (SOP) formed words dictionary words to wild- dictionary dictionary (Nw) words (Mw) to full forms (Sw) card forms

The symbols represent the steps/actions taken to develop the dictionary. The steps/actions are enumerated in chronological order

These three steps are similar to the first three steps (Figure 4.1) in developing the domain dictionary of our SentiStrength-SE. However, here in step-1, we use four datasets instead of commit messages only.
In step-4, we transform the wild-card formed words in SentiStrength-SE’s dictionary to their full forms. The set of full-formed words is denoted as Ew. In step-5, we derive a set of words Dw from the set Cw such that Dw contains only those words that are common between Cw and Ew. Mathematically, Dw = Cw ∩ Ew. We also derive another set Uw, which contains the words that are in Cw but not in Ew. Mathematically, Uw = Cw − Ew. All the words in Dw can safely be considered sentimental, as those words are also present in the dictionary of SentiStrength-SE. We need to identify those words in Uw that are neutral in the software engineering domain, but could be sentimental in general. Hence, in step-6, we involve three human raters (enumerated as A, B, and C) to independently identify those contextually neutral domain words. These human raters are the same three raters used in the development of the domain dictionary of SentiStrength-SE. In Table 4.20, we present the sentiment-wise percentages of cases where the human raters disagree pair-wise. We also measure the degree of inter-rater agreement in terms of the Fleiss-κ [150] value. The obtained Fleiss-κ value of 0.691 signifies substantial agreement among the independent raters. We consider a word a neutral domain word when two of the three raters identify the word as neutral. Thus, 373 words are identified as neutral domain words, which we exclude from the set Uw, resulting in another set of sentimental words Gw. Then, in step-7, by taking a union of the words in the sets Gw and Dw, we form another set of words Mw, which contains all the sentimental words. Finally, in step-8, we adjust the words in Mw by reverting them to their wild-card forms (if available) to comply with SentiStrength-SE’s dictionary. This new set of words forms our new domain dictionary (Nw). At this stage, we also make sure that the words manually added to the dictionary of SentiStrength-SE (see Section 4.3.1.1) are included in the set of words of

62 the new domain dictionary. This alternative dictionary contains 225 positively and 495 negatively polarized sentimental entries.
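The set manipulations in steps 5 through 7 reduce to a few standard operations; the following minimal sketch assumes the word sets have already been loaded, uses hypothetical example words, and follows the notation above.

# Minimal sketch of the set operations in steps 5-7; word sets and example
# words are hypothetical, variable names follow the notation in the text.
C_w = {"error", "fail", "disagree", "celebrate"}   # words common to the datasets and SentiStrength's dictionary
E_w = {"fail", "celebrate"}                        # full-formed words from SentiStrength-SE's dictionary
neutral_domain_words = {"error"}                   # identified by majority vote of the three raters

D_w = C_w & E_w                    # step-5: already known to be sentimental
U_w = C_w - E_w                    # step-5: candidates to be checked by the raters
G_w = U_w - neutral_domain_words   # step-6: sentimental words that survive the vote
M_w = G_w | D_w                    # step-7: all sentimental words for the new dictionary

print(sorted(M_w))   # ['celebrate', 'disagree', 'fail']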

4.4.6.2 Comparison between the New Dictionary and SentiStrength-SE’s Dictionary

We create a variant of SentiStrength-SE by replacing its domain dictionary with the newly created alternative domain dictionary. We call this variant SentiStrength-SEN. Then we compare the performance of SentiStrength-SEN against SentiStrength-SE, which actually implies a comparison of SentiStrength-SE’s dictionary with the new domain dictionary we have created.

Table 4.20: Inter-rater disagreements in interpretation of sentiments

                        Disagreements between Human Raters
Sentimental Polarity    A, B      B, C      C, A
Positive                11.32%    12.18%    12.22%
Negative                12.25%    11.19%    10.21%
Neutral                 12.15%    10.53%    13.42%

Table 4.21: Comparison of performances of SentiStrength-SEN and SentiStrength-SE

Data      Sentiment   Met.   SentiStrength-SEN   SentiStrength-SE
Group-2   Positive    ℘      87.82%              88.86%
                      ℜ      98.81%              98.81%
                      Ⅎ      92.99%              93.57%
          Negative    ℘      53.45%              53.42%
                      ℜ      97.68%              97.66%
                      Ⅎ      69.09%              69.06%
          Neutral     ℘      98.13%              98.14%
                      ℜ      82.57%              83.00%
                      Ⅎ      89.68%              89.94%
Group-3   Positive    ℘      39.16%              41.80%
                      ℜ      82.62%              82.04%
                      Ⅎ      53.13%              55.39%
          Negative    ℘      70.44%              68.61%
                      ℜ      72.25%              71.00%
                      Ⅎ      71.33%              69.78%
          Neutral     ℘      90.71%              90.64%
                      ℜ      78.94%              80.05%
                      Ⅎ      84.42%              85.02%
Overall average       ℘      73.28%              73.57%
accuracy              ℜ      85.47%              85.43%
                      Ⅎ      78.91%              79.06%
Note: SentiStrength-SEN uses the newly created domain dictionary; SentiStrength-SE uses its own domain dictionary.

We invoke SentiStrength-SEN for detecting sentiments in issue comments in the Group-2 and Group-3 datasets. Then we compute the precision (℘), recall (ℜ), and F-score (Ⅎ) of SentiStrength-SEN in the detection of each sentimental polarity. We present the computed metrics values obtained by SentiStrength-SEN and SentiStrength-SE side by side in Table 4.21.
As seen in Table 4.21, SentiStrength-SEN performs slightly better than SentiStrength-SE in the detection of negative sentiments. On the other hand, by observing the precision and F-score values, we can say that SentiStrength-SE performs a little better in detecting positive and neutral comments, although SentiStrength-SEN achieves slightly higher recall values for those positive and neutral comments. The overall accuracies, as presented in the bottom three rows of Table 4.21, indicate that the performances of SentiStrength-SEN and SentiStrength-SE do not differ substantially. To verify the statistical significance of the observed differences, we perform another McNemar’s test between the results of SentiStrength-SEN and SentiStrength-SE. For the statistical test, we formulate our null and alternative hypotheses as follows:
Null Hypothesis-6 (H₀⁶): There is no significant difference between the accuracies of SentiStrength-SEN and SentiStrength-SE.
Alternative Hypothesis-6 (Hₐ⁶): There exist significant differences between the accuracies of SentiStrength-SEN and SentiStrength-SE.

Table 4.22: Contingency matrix for McNemar’s test between SentiStrength-SEN and SentiStrength-SE

# of comments misclassified by both a and b:            1,033
# of comments misclassified by b but not by a:             49
# of comments misclassified by a but not by b:             83
# of comments correctly classified by both a and b:     4,399
Here, a = SentiStrength-SEN and b = SentiStrength-SE

The contingency matrix for the McNemar’s test is presented in Table 4.22. As seen in the contingency matrix, SentiStrength-SEN (a) and SentiStrength-SE (b) exhibit almost equal accuracies as n10 ≈ n01. The test obtains p = 0.0040 where p > α. Thus, the test fails to reject our null hypothesis (H₀⁶). Therefore, we conclude that the performance of the newly created domain dictionary does not significantly differ from that of SentiStrength-SE’s domain dictionary. We now formulate the answer to the research question RQ6 as follows:

Ans. to RQ6: There is no statistically significant difference between the performances of the newly created domain dictionary and the domain dictionary of SentiStrength-SE.

4.4.6.3 Manual Investigation to Determine Reasons

The result of the aforementioned comparison appears surprising to us, as we expected the newly created alternative dictionary to perform better than that of SentiStrength-SE. Recall that the newly created dictionary is larger than the domain dictionary of SentiStrength-SE: SentiStrength-SE’s dictionary has 167 positively and 310 negatively polarized sentimental words while the newly created one includes 225 positively and 495 negatively polarized words. While a large number of entries in a dictionary can be helpful in achieving high recall, they can also misguide the sentiment analysis for a particular domain, resulting in low precision. Hence, we manually investigate these possibilities in two phases and identify two reasons why the newly created alternative dictionary failed to outperform that of SentiStrength-SE.
Phase-1 Investigation: We randomly select a set of five issue comments whose sentiments SentiStrength-SEN misclassifies but SentiStrength-SE correctly classifies. One such comment is as follows:

"I disagree." (Comment ID: 1787887_1)

SentiStrength-SEN incorrectly identifies the above comment as negatively emotional since the word ‘disagree’ is recorded as a negatively polarized word in the newly created dictionary. SentiStrength-SE correctly identifies the comment as neutral, as the word is not included in its dictionary. We identify similar scenarios for all the five randomly picked issue comments.
Cause-1: Some emotional words present in the new domain dictionary also appear in many neutral comments of the ground-truth datasets, which is why SentiStrength-SEN ended up misclassifying those neutral comments as sentimental ones. This is a well-known problem of high-coverage dictionaries [159].
Phase-2 Investigation: We randomly pick 20 inherently emotional words from the new domain dictionary, which are not present in the dictionary of SentiStrength-SE. Then, we search for those words in the ground-truth dataset and find that five of the selected 20 words (i.e., ‘abhor’, ‘agony’, ‘appalling’, ‘crime’ and ‘delight’) do not appear in any comments in the dataset.
Cause-2: This implies that, although the new domain dictionary includes more sentimental words compared to the dictionary of SentiStrength-SE, many of those new sentimental words are unable to make any contribution to sentiment analysis due to their absence from the ground-truth datasets in use.

4.4.7 Evaluating the Contributions of Heuristics

In addition to the domain dictionary, SentiStrength-SE also includes a set of heuristics to guide the sentiment detection process towards higher accuracies. The heuristics, particularly designed for software engineering text, are also among the major contributions of this work. Here, we carry out a quantitative analysis to determine to what extent these heuristics contribute to the detection of sentiments in software engineering text. In particular, we address the following research question:

RQ7: Do the heuristics integrated in SentiStrength-SE really contribute to improved sentiment analysis in software engineering text?

We compare the performances of SentiStrength-SE and SentiStrength* to determine the contributions of the heuristics. Recall that SentiStrength* refers to the variant of the original SentiStrength that is forced to use our initial domain dictionary instead of its original one. Thus, SentiStrength-SE and SentiStrength* use the same domain dictionary, and the only difference between them is the set of heuristics that are included in SentiStrength-SE. Hence, the heuristics are responsible for any differences between the performances of SentiStrength-SE and SentiStrength*.

Table 4.23: Contributions of heuristics in SentiStrength-SE

Data      Sentiment   Met.   SentiStrength-SE   SentiStrength*
Group-2   Positive    ℘      88.86%             87.56%
                      ℜ      98.81%             98.28%
                      Ⅎ      93.57%             92.61%
          Negative    ℘      53.42%             53.19%
                      ℜ      97.66%             97.65%
                      Ⅎ      69.06%             68.87%
          Neutral     ℘      98.14%             97.94%
                      ℜ      83.00%             81.85%
                      Ⅎ      89.94%             89.18%
Group-3   Positive    ℘      41.80%             40.44%
                      ℜ      82.04%             82.01%
                      Ⅎ      55.39%             54.16%
          Negative    ℘      68.61%             69.10%
                      ℜ      71.00%             72.65%
                      Ⅎ      69.78%             70.83%
          Neutral     ℘      90.64%             91.22%
                      ℜ      80.05%             79.54%
                      Ⅎ      85.02%             84.98%
Overall average       ℘      73.58%             73.24%
accuracy              ℜ      85.43%             85.33%
                      Ⅎ      79.06%             78.82%
Note: SentiStrength* is forced to use our domain dictionary instead of its own one.

We present the performances of SentiStrength-SE and SentiStrength* in Table 4.23. For determining the effects of the heuristics included in SentiStrength-SE, let us compare the rightmost two columns in Table 4.23. We observe that the precision, recall, and F-score achieved by our SentiStrength-SE are consistently higher than those of SentiStrength* in most cases. In a few cases for the Group-3 dataset, SentiStrength-SE’s accuracy is nearly equal to that of SentiStrength*. The overall average accuracies, as presented at the bottom of Table 4.23, also indicate the superiority of our SentiStrength-SE over SentiStrength*, which implies that the heuristics incorporated in SentiStrength-SE really contribute to higher accuracy in the detection of sentiments in software engineering text.
However, as seen in Table 4.23, although the accuracy of SentiStrength-SE is higher compared to SentiStrength*, they do not differ by a large margin. Thus, it appears that the contributions of the heuristics, in this particular case, are not substantial and are unlikely to be statistically significant. To verify our observations, we perform a McNemar’s test between the results of SentiStrength* and SentiStrength-SE. For the test, we formulate our null and alternative hypotheses as follows:
Null Hypothesis-7 (H₀⁷): There is no significant difference between the accuracies of SentiStrength-SE and SentiStrength*.
Alternative Hypothesis-7 (Hₐ⁷): There exist significant differences between the accuracies of SentiStrength-SE and SentiStrength*.

Table 4.24: Contingency matrix for McNemar’s test between SentiStrength-SE and SentiStrength*

# of comments misclassified by both a and b:              993
# of comments misclassified by b but not by a:             78
# of comments misclassified by a but not by b:             81
# of comments correctly classified by both a and b:     4,433
Here, a = SentiStrength-SE and b = SentiStrength*

The contingency matrix for the McNemar’s test is presented in Table 4.24. As seen in the contingency matrix, SentiStrength-SE (a) and SentiStrength* (b) exhibit almost equal accuracies as n10 ≈ n01. The test obtains p = 0.874 where p > α. Thus, the test fails to reject our null hypothesis (H₀⁷). Therefore, we conclude that the contribution of the heuristics in the tool is not significant in this particular case. Based on our observations from the quantitative analysis and the statistical test, we now formulate the answer to the research question RQ7 as follows:

Ans. to RQ7: Although the set of heuristics integrated in SentiStrength-SE contribute towards improved sentiment analysis in software engineering text, the perceived improvement is not statis- tically significant for the given datasets.

Recall that, from the exploratory study (Section 4.2) using the Group-1 portion of the “Gold Standard" dataset, we found that the majority of the misclassifications of sentimental polarities are due to the limitations (difficulties D1, D3, D6) of the dictionary in use (Table 4.3). Hence, the majority of the misclassifications can be corrected by using a domain dictionary, leaving a relatively narrow scope for further contributions from the heuristics, at least for this “Gold Standard" dataset. Our manual investigation of the datasets used in this study confirms the existence of very few issue comments within the operational scope of the heuristics.

4.4.7.1 Further Manual Investigation

Although SentiStrength-SE is found to have performed better in most cases, as seen in Table 4.23, in four cases for the Group-3 dataset SentiStrength-SE’s accuracy is marginally lower than that of SentiStrength*. An immediate qualitative investigation reveals two reasons for this, which we discuss now.
First, our parameter setting to identify negations of sentimental words falls short in capturing negations in some cases. For example, in the following issue comment, the negation word “Don’t" preceding the word ‘Know’ neutralizes the negatively polarized word ‘Hell’ because the negation configuration parameter is set to five in SentiStrength-SE.

"I don’t know how the hell my diff program decide to add seemingly random CR chars, but i’ve removed them now" (Comment ID: 306519_2)

A lower negation parameter could work better for this particular issue comment, but might perform worse for negations in others. Other possible solutions are discussed in Section 4.4.9.
Second, we also found instances of incorrect annotations of negative sentiments for some issue comments in the Group-3 dataset, which made the accuracy of SentiStrength-SE appear lower. Consider the following two issue comments.
"Inserting timestamps automagically would be bad because it would limit a whole swath of use cases" (Comment ID: 1462480_2)
"If that would be the case, this would be bad design" (Comment ID: 748115_2)

In the above two issue comments, the author merely stated the possibilities of negative scenarios that had not happened yet. These comments do not convey negative sentiments, but the human raters annotated them with negative sentiments, possibly because of the use of the negatively polarized word ‘Bad’ in the sentences.
These observations inspire us to conduct a deeper qualitative investigation of the success scenarios and especially the failure cases of SentiStrength-SE, mainly to explore opportunities for further improvements to the tool. Hence, a qualitative evaluation of SentiStrength-SE is presented in the following section.

4.4.8 Qualitative Evaluation of SentiStrength-SE

Although from the comparative evaluations we found our SentiStrength-SE superior to all the selected tools, SentiStrength-SE is not a foolproof sentiment analysis tool. Indeed, 100% accuracy cannot be a pragmatic expectation. Nevertheless, we carry out another qualitative evaluation of SentiStrength-SE with two objectives: first, to confirm that the accuracy achieved in the comparative evaluations did not occur by chance, and second, to identify failure scenarios and scopes for further improvements.
We first randomly pick 150 issue comments (50 positive, 50 negative, and 50 sentimentally neutral) from the Group-2 and Group-3 portions of the “Gold Standard" dataset for which SentiStrength-SE correctly detected the sentimental polarities. From our manual verification of these 150 issue comments, we are convinced that the design decisions, heuristics, and parameter configuration adopted in SentiStrength-SE have positive impacts on the accurate detection of sentimental polarities.
Next, we randomly choose another 150 issue comments (50 positive, 50 negative, and 50 sentimentally neutral) for which SentiStrength-SE failed to correctly detect the sentimental polarities. Upon manual investigation of those 150 issue comments, we find a number of reasons for the inaccuracies, a few of which are within the scope of the design decisions applied to SentiStrength-SE, while the rest fall beyond it; we discuss these in Section 4.4.9. One of the reasons for failure is missing sentimental terms in our newly created domain dictionary. For example, SentiStrength-SE incorrectly identified the following comment as neutral in sentiment by misinterpreting the sentimental word ‘Stuck’ as a neutral word, since the word was not included in the dictionary; we have since added the word to the released dictionary of SentiStrength-SE.

"For the first part, I got stuck on two points" (Comment ID: 1610758_3)

In some other cases, we found inconsistencies in the human rating of sentiments in issue comments, which are responsible for inaccuracies attributed to SentiStrength-SE. For example, the following comment is rated as neutral in sentiment by the human raters, although it contains the positive sentimental term ‘Thanks’ along with the exclamatory sign ‘!’.

"And many thanks to you Oliver for applying this so quickly!" (Comment ID: 577184_1)

Our investigation reveals that 200 issue comments in Group-3 are wrongly interpreted by the human raters, which causes low accuracy of SentiStrength-SE in detecting positive sentiments; this is aligned with our earlier findings of such wrong interpretations of sentiments. Although the additional preprocessing phase of SentiStrength-SE filters out unwanted content such as source code, URLs, and numeric values from the input texts, we found several instances where such content escaped the filtering technique and misguided the tool.
In a few cases, we found that our heuristics to identify proper nouns fell short by not taking into account all probable cases. For example, SentiStrength-SE incorrectly computed negative sentiment in the following issue comment, in which a developer thanked his colleague named ‘Harsh’.

"Thanks Harsh, the patch looks good ... Since this is a new API, we are not sure if want to change it. Let’s leave it as-is for the moment." (Comment ID: 899420)

Failing to identify ‘Harsh’ as a proper noun, SentiStrength-SE considers the word sentimentally negative and erroneously detects negative sentiment in the message. Our immediate future plan includes further extension of our heuristics for locating proper nouns in text.
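One possible direction, sketched below using NLTK's off-the-shelf part-of-speech tagger, is to skip dictionary words tagged as proper nouns before sentiment scoring; this is an illustration of the idea rather than the heuristic currently implemented in SentiStrength-SE.

# Sketch of proper-noun filtering with NLTK's POS tagger (requires the 'punkt'
# and 'averaged_perceptron_tagger' resources); an illustration of one possible
# improvement, not SentiStrength-SE's current heuristic.
import nltk

def non_proper_noun_words(text):
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    # Keep only tokens that are not tagged as proper nouns (NNP/NNPS).
    return [word for word, tag in tagged if tag not in ("NNP", "NNPS")]

comment = "Thanks Harsh, the patch looks good."
print(non_proper_noun_words(comment))  # 'Harsh' would ideally be dropped before sentiment scoring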

4.4.9 Threats to Validity

In this section, we discuss the threats to the validity of the empirical evaluation of SentiStrength-SE and our efforts in mitigating them.

4.4.9.1 Construct Validity and Internal Validity

Threats to construct validity relate to the suitability of the evaluation metrics. We use three metrics, precision, recall and F-score, to evaluate the classification performances of SentiStrength-SE and the other tools. All three metrics have been commonly used for similar purposes in software engineering studies [162, 163, 55]. Quantitative analysis alone may not portray the whole picture, which is why we have performed both quantitative and qualitative evaluations of SentiStrength-SE.
The accuracies in the computations of the metrics are subject to the correctness of the manual annotation of the issue comments with sentimental polarities. Hence, we have manually checked the annotations of issue comments in the “Gold Standard” dataset. We identified around 200 issue comments that are labelled with wrong sentimental polarities. Nevertheless, we did not exclude those misclassified issue comments because they equally affect all the tools without favoring one over another.
To compare the performance of SentiStrength-SE against other tools (e.g., SentiStrength, NLTK, and Stanford NLP), we have used their default settings. Different settings of those tools might provide different results; however, we adhered to their default settings due to their use in earlier software engineering studies [34, 4, 41, 43, 40, 35, 36, 42, 48, 154, 164].
While optimizing SentiStrength’s default dictionary for the software engineering domain, we assigned three constant values, +3, -3 and ±1, to positive, negative and neutral comments, respectively. One may question the reasons for choosing those particular values instead of other values in the domains. For example, we could have used +2, +4 or +5 instead of +3, and -2, -4 or -5 instead of -3. Recall that those integer values not only indicate the polarities of sentiments but also their intensities/strength [157]. In the computation of precision, recall and F-score, only the sentimental polarities are considered. Hence, any value from +2 to +5 for positively polarized comments and any value from -2 to -5 for negatively polarized comments could be used in the process of optimization.

We simply picked the values at the median, instead of choosing extreme ones at the domain boundaries.
We changed the range of valence scores of the words from [+1, +9] to [-5, +5] in the ANEW dictionary of Warriner et al. [160] to compare its performance against the domain dictionary of SentiStrength-SE. One might argue that the range conversion might have altered the original sentimental polarities of some words. We have considered this possibility and carefully designed the conversion scheme to minimize such possibilities. A random sanity check after the range conversion indicates the absence of any such occurrence.

4.4.9.2 External Validity

The use of only one benchmark dataset (i.e., the “Gold Standard" dataset) can be considered a limitation of the empirical evaluation of our SentiStrength-SE. The outcome of the work would be more generalizable if more than one benchmark dataset could be used. At the time of the first release of SentiStrength-SE, this “Gold Standard" dataset was the only publicly available dataset especially crafted for the software engineering domain [52, 110]. A few newer datasets are available, but those are either not software engineering domain specific or they are even more specific to a narrower context (e.g., code review, product review). The issue comments in the benchmark dataset are collected from open-source systems, and thus one may question whether or not the tools, including ours, will perform differently if applied on datasets drawn from industrial/proprietary projects. Producing a large dataset with human annotations is a tedious and time-consuming task. We are working towards creating a second benchmark dataset for sentiment analysis in software engineering text. Once completed, we will release the dataset for the community.
Although there are diverse sources of textual content produced at different stages of software development and maintenance, the benchmark dataset we used contains only JIRA issue comments. Hence, one may argue that the results of the empirical comparison of tools might substantially vary if a dataset with a different type of text is used. Recall that the dictionary of our SentiStrength-SE tool is created based on commit comments. Thus, its superior performance on issue comments gives us confidence that the tool will also perform well on other types of textual content.

4.4.9.3 Reliability

The methodology of this study, including the procedure for data collection and analysis, is documented in this chapter. The “Gold Standard" dataset [52] and all the tools (i.e., SentiStrength-SE [54], SentiStrength [45], NLTK [46] and Stanford NLP [146]) are freely available online. Therefore, it should be possible to replicate the empirical evaluation of our tool.

4.5 Limitations of SentiStrength-SE and Future Possibilities

In this section, we discuss the limitations in the design and implementation of SentiStrength-SE as well as some directions for further improvements to our tool.
To develop SentiStrength-SE, we have addressed the difficulties identified from the exploratory study described in Section 4.2. Still, there is scope for further improvements, as we also found from the qualitative evaluation of the tool. For example, we have observed in Section 4.4.8 that missing sentimental words can mislead SentiStrength-SE. We have created an alternative new domain dictionary for SentiStrength-SE using texts from diverse sources. Surprisingly, this newly created domain dictionary does not offer significant performance improvements. We plan to further extend our domain dictionary by integrating it with different dictionary building approaches [163, 165, 166].
Our adopted approach for domain dictionary creation is distinct from other attempted approaches [163, 165, 166]. We have deliberately chosen this approach for two reasons. First, we wanted to introduce a new approach, and second, it was not possible to adopt existing approaches due to the limitation of resources such as sentiment-annotated texts in software engineering [52]. Through the empirical evaluations, we have shown that our created domain dictionary is effective for sentiment analysis in software engineering.
Moreover, in the process of development of the dictionary, the identification of domain terms by three human raters might cause a subjectivity bias. However, we have measured the inter-rater agreements and found reasonable agreement, which minimizes the threat substantially. In the creation of our new domain dictionary, we used graduate students as the human raters, rather than expert software developers from industry. However, all the participants have at least three years of software development experience, which mitigates this threat. Moreover, it is reported that only minor differences exist between the performances of graduate students and professional software developers, especially at small tasks involving simple judgements [167]. The use of only three human raters may be argued to be a small number of participants. However, two to three raters have been common practice in successful software engineering studies [162, 163, 55, 168]. Moreover, through the empirical evaluations (both quantitative and qualitative), we have shown that our created domain dictionary is effective for sentiment analysis in software engineering text.
Several methods have been proposed and discussed to identify the scope of the negations of polarized words in sentiment analysis [169, 170], which can be applied to improve the performance of our negation handling approach. Many sophisticated approaches to identify the negation’s scope include machine learning techniques [169, 171], complex rules [172], and identifying negated words using the semantics of phrases [173]. However, many existing sentiment analysis approaches have relatively simple methods to identify the scope of negation [125, 174]. Interestingly, the performance of a negation detection method can be improved by domain adaptation [175]. In the future, we will evaluate

all the mentioned methods by applying them in software engineering contexts to identify the best method for detecting the scope of negation.
Although our approach for filtering out code snippets may not correctly locate all code portions, the filtering indeed minimizes them. Indeed, isolating inline source code from plain text content is a challenging task, especially when the text can contain code written in diverse, undeclared programming languages. Such a code separation problem can be a separate research topic, and limited-scope attempts have been made in the past [176]. We also plan to invest efforts along this direction to further improve SentiStrength-SE.
At this stage, we have not addressed the difficulties D10, D11, and D12, which are included in our future plan. The detection of irony, sarcasm, and subtle emotions hidden in text is indeed a challenging research topic in NLP and is not only related to software engineering texts. Even human interpretations of sentiments in text often disagree, as we also found in the “Gold Standard" dataset. Combining the dictionary-based lexical method with machine learning [177] and other specialized techniques [178] can lead to potential means to address these difficulties. We also plan to add to SentiStrength-SE the capability to correctly identify interrogative sentences to mitigate the difficulty D10.

4.6 Related work

To the best of our knowledge, the qualitative study (Section 4.2) is the first study that analyzes a public benchmark dataset to expose the challenges to sentiment analysis in software engineering. In addition, we have developed the first sentiment analysis tool, SentiStrength-SE, crafted especially for the software engineering domain, which we expect to produce superior performance in other technical domains as well.
Aside from our tool, there are only four prominent tools/toolkits, namely SentiStrength [45], Stanford NLP [146], NLTK [46], and Alchemy [179], which facilitate automatic sentiment analysis in plain texts. The first three of these tools have been used for sentiment analysis in the software engineering domain, while SentiStrength is used most frequently in the studies, as presented in Table 4.25. We categorize those studies for a better understanding of the uses of those tools and the contributions of those studies in the software engineering domain. Tools that have previously been used in the software engineering area, but not for sentiment analysis, are excluded from the table. Notably, none of the studies used any domain-specific tool to detect sentiments.
Alchemy [179] is a commercial toolkit that offers limited sentiment analysis as a service through its published APIs. According to the study of Jongeling et al. [142], the performance of Alchemy is lower than that of SentiStrength [45] and NLTK [46]. NLTK and Stanford NLP [146] are general-purpose natural language processing (NLP) libraries/toolkits, which expect the user to have some NLP background and to write scripting code for carrying out sentiment analysis in plain text.

Table 4.25: Uses of (domain-independent) tools for sentiment analysis in software engineering

Tools                  Type of Work                                          Uses in Software Engineering Research
SentiStrength [45]     Analyzing sentiments in software engineering (SE)     [34, 43, 37, 38, 36, 50, 180, 181]
                       Applications of sentiments in SE                      [4, 40, 35, 42, 51, 182, 136]
                       Benchmarking study                                    [48, 142]
NLTK [46]              Analyzing sentiments in SE                            [41, 164]
                       Benchmarking study                                    [48, 142]
Stanford NLP [146]     Applications of sentiments in SE                      [154]
                       Benchmarking study                                    [48, 142]

In contrast, SentiStrength is a dedicated tool that applies a lexical approach for automated sentiment analysis and is ready to operate without requiring any scripting code (for natural language processing). Perhaps these are among the reasons why, in the software engineering community, SentiStrength has gained popularity over the alternatives. The same reasons also made us choose this particular tool as the basis of our work. Our SentiStrength-SE reuses the lexical approach of the original SentiStrength and is also ready to be used off the shelf.
All of the aforementioned four tools (i.e., SentiStrength [45], Stanford NLP [146], NLTK [46], and Alchemy [179]) are developed and trained to operate on non-technical texts drawn from social interactions and web pages, and they do not perform well enough when operated in a technical domain such as software engineering. Domain-specific (e.g., software engineering) technical uses of inherently emotional words seriously mislead the sentiment analyses of those tools [41, 43, 48, 50] and limit their applicability in the software engineering area. We have addressed this issue by developing the first software engineering domain-specific dictionary, included in our tool SentiStrength-SE. Along this direction, Mäntylä et al. [39] developed a dictionary to capture emotional arousal in software engineering texts.
Apart from creating a domain dictionary, a variety of machine learning (ML) techniques, such as the Naive Bayes classifier (NB), Support Vector Machine (SVM) [125], and Logistic Regression (LR) [183], have been explored in an attempt to minimize the domain difficulty. However, the performances of all these three classifiers are reported to be lower when operated on domain-specific texts [184]. Nonetheless, recently Murgia et al. [185] applied several ML techniques (e.g., NB, SVM) to identify only the emotions love, joy and sadness, in contrast to our tool SentiStrength-SE, which can differentiate positivity, negativity and neutrality of software engineering texts. Again, Panichella et al. [168] used an NB classifier to detect sentiments in software users’ reviews. However, the accuracy of their classifier was not reported. Those two tools are not publicly available to compare against our tool SentiStrength-SE. Moreover, we avoided applying ML techniques to implement

SentiStrength-SE due to the limitations of ML for sentiment analysis, which include the difficulty of integrating knowledge into a classifier and the fact that learned models often have poor adaptability between different text genres or domains, as they often rely on domain-specific features found in their training data [184].
Blaz and Becker [163] proposed three almost equally performing methods, a Dictionary Method (DM), a Template Method (TM) and a Hybrid Method (HM), for sentiment analysis in “Brazilian Portuguese” texts in IT (Information Technology) job submission tickets. The DM is a pure lexical approach similar to that of our SentiStrength-SE. Although their techniques might be suitable for formally structured texts, those may not perform well in dealing with informal texts that are frequently used in software engineering artifacts such as commit comments. In contrast, from the empirical evaluation over commit comments, our SentiStrength-SE is found to have high accuracy in detecting sentiments in those informal software engineering texts. The proposed methods of Blaz and Becker [163] are developed and evaluated against text written in the “Brazilian Portuguese” language instead of English. Thus, their approach and reported results are not directly comparable to ours.
Similar to the qualitative study included in our work, Novielli et al. [50] also conducted a relatively brief study of the challenges against sentiment analysis in the “social programmer ecosystem". They also used SentiStrength for the detection of emotional polarities and reported only domain difficulty as a key challenge. In their work, they manually studied only 100 questions and their follow-up comments as well as 100 answers and their follow-up discussions obtained from the Stack Exchange Data Dump [186]. In contrast, based on a deeper analysis over a publicly available benchmark dataset, our study exposes 12 difficulties including the domain dependency. In addition, we address a portion of those difficulties and develop a domain-specific tool for improved sentiment analysis in software engineering text.
Our SentiStrength-SE is the first software engineering domain-specific sentiment analysis tool. Soon after the release of SentiStrength-SE, four domain-specific tools/toolkits (i.e., Senti4SD [55], SentiCR [162], EmoTxt [56], and SentiSW [187]) have appeared over the last few months. Similar to our SentiStrength-SE, Senti4SD is also a software engineering domain-specific sentiment analysis tool. Senti4SD applies machine learning based on lexicon and keyword based features for detecting sentiments. SentiSW also applies machine learning techniques to detect sentiments at the entity level. EmoTxt [56] is an open-source toolkit for detecting six basic emotions (i.e., love, joy, anger, sadness, fear, and surprise) from technical text. The authors of SentiCR declared the scope of this tool limited to code review comments only. Again, the applicability of SentiSW is limited to only JIRA issue comments. The reported scope of EmoTxt is the technical domain, which is wider than the software engineering domain. On the contrary, the scopes of SentiCR and SentiSW are limited to the narrower domains of code review comments and JIRA issue comments, respectively.

Recently, two separate studies have been conducted to compare the performances of those domain-specific sentiment analysis tools. In the first study, Islam and Zibran [188] compared the performances of SentiStrength-SE, Senti4SD and EmoTxt and found no convincing winner among the tools for sentiment analysis in software engineering. In the later study, Novielli et al. [189] compared the performances of SentiStrength-SE, Senti4SD and SentiCR and found that the unsupervised approach of SentiStrength-SE provided performance comparable to that of the supervised techniques (i.e., Senti4SD and SentiCR). SentiSW was not available at the time of conducting the two comparative studies. However, the developers of SentiSW compared its performance against SentiStrength-SE and found their tool superior to SentiStrength-SE in detecting sentiments expressed in JIRA issue comments. While all those studies compared the performances of domain-specific tools, for the first time, Jongeling et al. [48, 142] compared the performances of the domain-independent tools SentiStrength [45], NLTK [46], Stanford NLP [146] and Alchemy [179] to measure their applicability in the software engineering domain.
We do not claim SentiStrength-SE to be the best tool among all the few tools available for sentiment analysis in software engineering text. Instead, by developing and evaluating the domain-specific SentiStrength-SE, we demonstrate that, for sentiment analysis in software engineering text, a domain-specific technique performs significantly better compared to its domain-independent counterparts. As mentioned before, our SentiStrength-SE is the first domain-specific tool for sentiment analysis in software engineering. Other researchers might have taken inspiration from our work [110] in attempting domain-specific solutions, resulting in the several domain-specific tools discussed above.

4.7 Summary

We have first presented an in-depth qualitative study to identify the difficulties in automated sentiment analysis in software engineering texts. Among the difficulties, the challenges due to domain dependency are found to be the most dominant. To address mainly the domain difficulty, we have developed a domain-specific dictionary especially designed for sentiment analysis in software engineering text. We have also developed a number of heuristics to address some of the other identified difficulties. Our new domain dictionary and the heuristics are integrated in SentiStrength-SE, a tool we have developed for improved sentiment analysis in textual contents in a technical domain, especially software engineering. Our tool reuses the lexical approach of SentiStrength [45], which, in software engineering, is the most widely adopted sentiment analysis technique. Our SentiStrength-SE is the first domain-specific sentiment analysis tool especially designed for software engineering text.
Over a large dataset (i.e., Group-2 and Group-3) consisting of 5,600 issue comments, we carry out quantitative comparisons of our domain-specific SentiStrength-SE with the three

most popular domain-independent tools/toolkits (i.e., NLTK [46], Stanford NLP [47], and the original SentiStrength [45]). The empirical comparisons suggest that our domain-specific SentiStrength-SE is significantly superior to its domain-independent counterparts in detecting emotions in software engineering textual contents.
Using both quantitative and qualitative evaluations, we also separately verify the effectiveness of the design decisions, including the domain dictionary and heuristics we have included in our domain-specific SentiStrength-SE. From the evaluations, we found that our newly created domain dictionary makes statistically significant contributions to improved sentiment analysis in software engineering text. However, the heuristics we developed to minimize the issues beyond the domain difficulties are found not to have substantial impacts on sentiment analysis of the chosen datasets. The non-substantial impact of the heuristics further validates that the improvements in the accuracies of SentiStrength-SE are attributed to its being domain-specific. Thus, we demonstrate that, for sentiment analysis in software engineering text, a domain-specific technique performs substantially better than domain-independent techniques.

Chapter 5

Comparison of Sentiment Analysis Tools

In the last chapter (Chapter 4), we identified the difficulties for sentiment analysis in software engineering and described the development procedure of our sentiment analysis tool SentiStrength-SE, which overcomes most of the identified difficulties. Recent attempts have led to the development of a few domain-specific sentiment analysis tools especially designed to deal with software engineering text. In this chapter, we compare the performances of, and the agreement among, the software engineering domain-specific sentiment analysis tools in detecting sentiments.
This chapter is organized as follows. In Section 5.1, we describe the motivation and context of the work. We describe the datasets and the domain-specific sentiment analysis tools that we use in this work in Section 5.2 and Section 5.3, respectively. The tools’ performance evaluations and findings are illustrated in Section 5.4. Threats to the validity of the work are described in Section 5.5. We discuss the related work in Section 5.6. Finally, Section 5.7 concludes the chapter.

5.1 Introduction

Sentiment Analysis (SA) in software engineering (SE) text has recently drawn interest in the community [37, 38]. Earlier attempts used general-purpose (i.e., domain-independent) sentiment detection tools (e.g., SentiStrength [45], NLTK [46] and Stanford NLP [146]) for SA in SE text. Those general-purpose SA tools are found to have very low accuracies when operated on text from a technical domain such as software engineering [37, 48, 43]. Those SA tools were developed and trained using data from non-technical social networking media (e.g., Twitter posts, forum posts, movie reviews) and perform poorly for software engineering text largely due to domain-specific variations in the meanings of frequently used technical terms [110]. Thus, recent attempts have led to the development of a few domain-specific SA tools especially designed to deal with SE text.
Each of these domain-specific SA tools was originally evaluated using a different dataset and compared against the existing domain-independent SA tools. The datasets (e.g., JIRA issue comments, Stack Overflow posts, code review comments) differ in the proportion and category of technical text they include. The accuracies of these SE domain-specific tools have never been compared using multiple datasets. Using multiple datasets, we carry out a quantitative comparison of three recently released SE domain-specific SA tools. In our study, we address the following two research questions.

RQ1: Can we identify a tool that shows the highest accuracy across different datasets? — We investigate which tool achieves higher accuracy on which dataset, and we attempt to distinguish a tool that achieves the best overall accuracy across all the datasets. This will help one in choosing the most appropriate tool for SA in SE text.
RQ2: To what extent do the different sentiment analysis tools (dis)agree with each other? — Here we examine to what extent the SA tools (dis)agree in their detection of sentimental polarities (i.e., positivity, negativity, and neutrality) in SE text. This agreement analysis will help in identifying the spots where those tools might need improvements.

5.2 Datasets

In our study, we use three ground-truth datasets drawn from software development ecosystems. A summary of these three datasets is presented in Table 5.1.

Table 5.1: Summary of the Datasets Used in this Study

                                          # of Comments
Dataset                  Group      Total    Pos     Neg     Neu     Non-neg
JIRA Issue Comments      Group-2    1,576    748     128     700     1,448
                         Group-3    4,000    375     672     2,953   3,324
SOP                      NA         4,423    1,527   1,202   1,694   3,221
CRC                      NA         2,000    NA      398     NA      1,202

5.2.1 JIRA Issue Comments (JIC) Dataset

This dataset is based on the work of Ortu et al. [52]. The entire dataset is divided into three groups named Group-1, Group-2, and Group-3. Group-1 contains 392 issue comments and Group-2 contains 1,600 issue comments. Group-3 contains 4,000 sentences written by developers. Each individual text (i.e., issue comment or sentence) in the dataset is manually annotated with emotions such as love, joy, surprise, anger, sadness, and fear. For the manual annotation, each of the 5,992 individual texts is interpreted by n distinct human raters [52] and annotated with the emotional expressions found in those comments. For Group-1, n = 4, while for Group-2 and Group-3, n = 3. In this study, we use the Group-2 and Group-3 portions of the dataset as those were also used in other studies [110, 52].

5.2.1.1 Emotional Expressions to Sentimental Polarities

We compute sentimental polarities (i.e., positivity, negativity, and neutrality) from the emotional expressions (i.e., love, joy, surprise, anger, sadness, fear) as follows. Emotional expressions joy and love denote positive sentiment, while anger, sadness, and fear indicate negative sentiment. We

take special measures for the surprise expression because, in some cases, an expression of surprise can indicate positive polarity, denoted as surprise⁺, while in other cases it can express a negative sentiment, denoted as surprise⁻. Thus the issue comments in the benchmark dataset that are annotated with the surprise emotion need to be further classified based on the sentimental polarities they convey. Hence, we get each of such comments interpreted by three human raters (computer science graduate students), who independently assign polarities to the surprise expressions in each comment. We consider a surprise expression in a comment negatively (or positively) polarized if two of the three raters identify negative (or positive) polarity in it. We found 79 issue comments in the benchmark dataset that were annotated with the surprise expression; 23 of them express surprise with positive polarity and the remaining 56 convey negative surprise. Then we split the set E of emotional expressions into two disjoint sets. The first set is E⁺ = {joy, love, surprise⁺} and the second set is E⁻ = {anger, sadness, fear, surprise⁻}. Thus, E⁺ contains only the positive sentimental expressions and E⁻ contains only the negative sentimental expressions. A similar approach is also used in other work [110, 48, 142] to categorize emotional expressions according to their polarities.

5.2.1.2 Assignment of Sentiments to Text

A piece of text is assigned positive sentiment if the majority of the n raters identify positive sentiment in it. For example, in Group-2, a text is considered to have a positive sentiment if two of the three raters agree on perceiving positive sentiment in it. Similarly, negativity and neutrality of pieces of text are also determined based on majority agreements. We find that the human raters could not agree on the sentiments of 24 issue comments in Group-2. These 24 comments are excluded from this study.

5.2.2 Stack Overflow Posts (SOP) Dataset

The second ground-truth dataset we use is based on the work of Calefato et al. [128]. This dataset is composed of 4,423 posts from Stack Overflow. Each of the 4,423 posts is interpreted and annotated with sentimental polarities (i.e., positive, negative, neutral) by three distinct human raters, and a total of 12 different raters were used to annotate the entire dataset. The sentiment expressed in a particular post is determined based on majority agreement. In this dataset, 35% of the posts convey positive sentiment and 27% express negative sentiment, while 38% of the posts are neutral in sentiment.

5.2.3 Code Review Comments (CRC) Dataset

The third dataset used in this work is based on the work of Ahmed et al. [111]. This dataset contains 2,000 manually annotated code review comments drawn from twenty open-source projects. Three human raters independently label each of the 2,000 code review comments as positive, negative, or neutral in accordance with the sentimental polarity they perceive in the comment. The decisive sentiment of a particular comment is determined based on majority agreement. Thus, Ahmed et al. produced a three-class dataset consisting of comments with positive, negative, or neutral sentiments. However, in their publicly published dataset, the positive and neutral comments are merged into one non-negative class. Therefore, in this work, we have to use this two-class dataset, where 19.9% of the comments express negative sentiments and the remaining 80.1% convey non-negative sentiments.

5.3 Sentiment Analysis Tools under Study

We study the following three SE domain specific SA tools released in the last couple of years.
SentiStrength-SE: The tool SentiStrength-SE [110] is the first domain specific tool especially developed for sentiment analysis in software engineering text. Given a piece of text, SentiStrength-SE computes a pair ⟨pc, nc⟩ of integers, where +1 ≤ pc ≤ +5 and −5 ≤ nc ≤ −1. Here, pc and nc respectively represent the positive and negative sentimental scores for the given text. In SentiStrength-SE, a given text is considered to have positive sentiment if pc > +1. Similarly, a text is held to contain negative sentiment when nc < −1. Besides, a text is considered sentimentally neutral when the sentimental scores for the text appear to be ⟨+1, −1⟩.
Senti4SD: The tool Senti4SD [128] is a machine learning based tool specifically trained to support sentiment analysis in software engineering related text. By exploiting a suite of both lexicon and keyword-based features, it can detect positive, negative, and neutral sentiments in text. The authors of the classifier claim that it reduces the misclassification of neutral and positive posts.
EmoTxt: The tool EmoTxt [56] is an open-source toolkit that can detect a set of six basic emotions, namely love, joy, anger, sadness, fear, and surprise, from technical text. To convert the emotional expressions to sentimental polarities, we use the same procedure as applied to the JIC dataset, described earlier in Section 5.2.1. However, EmoTxt cannot identify the polarities of the surprise expression. To mitigate this issue, from all the datasets, we exclude those comments which EmoTxt identifies to convey only the surprise expression (elaborated later in Section 5.4.1).
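To illustrate how such a score pair maps to a single polarity label, the following is a minimal Python sketch; the function name and the tie-break rule for mixed scores are our own assumptions for illustration, not part of any of the tools.

```python
def polarity_from_scores(pc: int, nc: int) -> str:
    """Map a SentiStrength-SE style score pair <pc, nc> to a polarity label.

    pc is the positive score in [+1, +5]; nc is the negative score in [-5, -1].
    A text is positive if pc > +1, negative if nc < -1, and neutral when the
    scores are exactly <+1, -1>.
    """
    if pc > 1 and nc < -1:
        # Both polarities present; pick the stronger one (an assumed tie-break
        # for this sketch only).
        return "positive" if pc >= abs(nc) else "negative"
    if pc > 1:
        return "positive"
    if nc < -1:
        return "negative"
    return "neutral"


print(polarity_from_scores(3, -1))   # positive
print(polarity_from_scores(1, -4))   # negative
print(polarity_from_scores(1, -1))   # neutral
```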

5.4 Evaluation and Findings

To address the first research question (stated in Section 5.1), we perform a two-stage analysis of the comparative accuracies of the sentiment analysis tools. The second research question is addressed through an agreement analysis, as described in Section 5.4.2.

5.4.1 Comparative Accuracy Analysis

The accuracy of sentiment detection is measured in terms of precision, recall, and F-score, separately computed for each of the three sentimental polarities (i.e., positivity, negativity, and neutrality). Given a set of textual contents, the precision (p), recall (r), and F-score (Ⅎ) for a particular sentimental polarity e are calculated as follows:

    p = |T^t_e ∩ G_e| / |T^t_e|,    r = |T^t_e ∩ G_e| / |G_e|,    Ⅎ = (2 × p × r) / (p + r)

where G_e represents the set of texts having sentimental polarity e (according to the ground truth), and T^t_e denotes the set of texts that tool t detects to have the sentimental polarity e.

Stage-1 Evaluation: We separately operate the three tools (SentiStrength-SE, Senti4SD, and EmoTxt) on the JIC (JIRA Issue Comments) dataset and the SOP (Stack Overflow Posts) dataset. We find 302 comments in the JIC dataset and 282 posts in the SOP dataset for which EmoTxt finds a surprise expression only and cannot proceed further to determine the polarities of those surprise expressions. These comments and posts are excluded from the respective datasets to maintain a level playing field for all the tools.

For each of the three sentimental polarities (i.e., positivity, negativity, and neutrality), we compare the tools' outcomes with the respective ground truth and separately compute precision, recall, and F-score for all the tools for both the JIC and SOP datasets. Table 5.2 presents the accuracies (in precision, recall, and F-score) of the three tools in their detection of positive, negative, and neutral sentiments. The average overall accuracies are presented in the bottom three rows. The highest metric values are highlighted in bold.

As seen in Table 5.2, the accuracy of EmoTxt remains lower than those of SentiStrength-SE and Senti4SD for both datasets. SentiStrength-SE achieves the highest accuracy for the JIC dataset, while Senti4SD achieves the highest accuracy for the SOP dataset. The overall average accuracies also indicate that both SentiStrength-SE and Senti4SD perform better than EmoTxt by a considerable margin. Although it is difficult to distinguish a clear winner, SentiStrength-SE can be held superior to Senti4SD due to its overall higher recall and slightly higher F-score.

It is interesting to observe that Senti4SD and the SOP dataset are from the same authors. Similarly, the JIC dataset is the same dataset on which SentiStrength-SE was originally evaluated at the time of its release. Hence, there is a chance of bias, and it is worth evaluating all these tools using a dataset on which none of them has ever been tested. We carry out such an evaluation in stage-2 using the third dataset described earlier in Section 5.2.3.

Stage-2 Evaluation: We separately operate all three tools on the CRC (Code Review Comments) dataset. Again, we find 188 comments that EmoTxt identifies to express the surprise emotion only. As in the stage-1 evaluation, we exclude these 188 comments from our analysis. Then, for non-negative and negative sentimental texts, we separately compute precision, recall, and F-score for all three tools. The computed accuracy measurements are presented in Table 5.3.
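As an illustration of how these per-polarity metrics can be computed from a list of ground-truth labels and a list of tool-predicted labels, here is a minimal Python sketch; the function and variable names are our own and not taken from any of the tools or datasets.

```python
from typing import Dict, List

def per_polarity_metrics(truth: List[str], predicted: List[str],
                         polarities=("positive", "negative", "neutral")) -> Dict[str, dict]:
    """Compute precision, recall, and F-score separately for each polarity.

    truth[i] and predicted[i] are the ground-truth and tool labels of the
    i-th text.  For polarity e, precision = |T_e ∩ G_e| / |T_e| and
    recall = |T_e ∩ G_e| / |G_e|, matching the formulas above.
    """
    scores = {}
    for e in polarities:
        g = {i for i, label in enumerate(truth) if label == e}      # G_e
        t = {i for i, label in enumerate(predicted) if label == e}  # T_e
        hit = len(g & t)
        p = hit / len(t) if t else 0.0
        r = hit / len(g) if g else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        scores[e] = {"precision": p, "recall": r, "f_score": f}
    return scores

truth = ["positive", "neutral", "negative", "neutral"]
pred  = ["positive", "positive", "negative", "neutral"]
print(per_polarity_metrics(truth, pred)["positive"])   # precision 0.5, recall 1.0
```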

Table 5.2: Tools' Accuracies for JIC Dataset and for SOP Dataset

Dataset              Sentiment   Metrics   SentiStrength-SE   Senti4SD   EmoTxt
JIRA Issue           Pos         p         64.73%             54.04%     61.26%
Comments                         r         94.29%             75.20%     70.12%
                                 Ⅎ         76.76%             62.89%     65.39%
                     Neg         p         70.96%             54.78%     34.34%
                                 r         78.14%             41.56%     61.74%
                                 Ⅎ         70.91%             47.26%     44.13%
                     Neu         p         91.87%             80.85%     88.92%
                                 r         79.63%             74.49%     65.98%
                                 Ⅎ         85.31%             77.54%     75.75%
Stack Overflow       Pos         p         82.69%             97.44%     88.57%
Posts                            r         94.15%             97.44%     94.15%
                                 Ⅎ         88.05%             97.44%     91.27%
                     Neg         p         73.45%             93.03%     64.93%
                                 r         78.17%             95.94%     96.02%
                                 Ⅎ         75.74%             94.46%     77.47%
                     Neu         p         80.73%             97.29%     94.84%
                                 r         69.10%             94.78%     51.15%
                                 Ⅎ         74.47%             96.02%     66.46%
Overall average                  p         77.41%             79.57%     72.14%
accuracy                         r         82.24%             79.90%     73.19%
                                 Ⅎ         79.75%             79.73%     72.66%

Table 5.3: Tools' Accuracies for Code Review Comments Dataset

Dataset              Sentiment   Metrics   SentiStrength-SE   Senti4SD   EmoTxt
Code Review          Non-neg     p         83.64%             81.69%     81.45%
Comments                         r         92.67%             93.13%     82.42%
                                 Ⅎ         87.92%             87.03%     81.93%
                     Neg         p         50.23%             55.09%     84.95%
                                 r         34.69%             28.75%     24.69%
                                 Ⅎ         41.04%             37.78%     38.26%
Overall average                  p         66.94%             68.39%     83.20%
accuracy                         r         63.68%             60.94%     53.56%
                                 Ⅎ         65.26%             64.45%     65.16%

As seen in Table 5.3, all the tools appear to have performed much better in the detection of non-negative sentiments compared to their accuracies in the detection of negative sentiments. EmoTxt achieves the highest precision in detecting negative sentiments, while Senti4SD has the highest recall in the detection of non-negative sentiments. On the other hand, SentiStrength-SE achieves the highest precision and F-score for non-negative sentiments as well as the highest recall and F-score for negative sentiments. The overall average accuracies suggest that EmoTxt has the highest precision but the lowest recall, and the differences from the other tools are substantial. On the contrary, SentiStrength-SE achieves the highest recall and F-score but the lowest precision. The overall average precision of Senti4SD is slightly higher than that of SentiStrength-SE, but Senti4SD's recall and F-score are lower by a small margin than those of SentiStrength-SE. Thus, similar to the result of the stage-1 evaluation, SentiStrength-SE can be considered to have achieved slightly better accuracies than the other tools in terms of recall and F-score, although the differences can be perceived as negligible.
Based on our observations and analyses of the results in both the stage-1 and stage-2 evaluations, we now derive the answer to the first research question (RQ1) as follows.
Ans. to RQ1: Accuracies of the tools vary across datasets and sentiments. None of the tools stands out as substantially superior to the others. However, SentiStrength-SE consistently achieves the highest recall with competitive precision across sentiments and datasets.

5.4.2 Analysis of Agreements

To address the second research question (RQ2), we perform an agreement analysis over the tools' sentiment detection results. We compute P^e_xy, denoting the agreement between tool x and tool y for a particular sentiment e, as follows:

    P^e_xy = ( |T^x_e ∩ T^y_e| / |G_e| ) × 100
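A small Python sketch of this pairwise agreement computation follows; it assumes, per the reconstructed formula above, that the intersection of the two tools' detections for sentiment e is normalized by the ground-truth set for e.

```python
def pairwise_agreement(truth, tool_x, tool_y, sentiment):
    """Percentage agreement between two tools for one sentiment class e:
    |T_x_e ∩ T_y_e| / |G_e| * 100, where the three arguments are parallel
    lists of labels for the same texts."""
    g_e = {i for i, s in enumerate(truth) if s == sentiment}
    t_x = {i for i, s in enumerate(tool_x) if s == sentiment}
    t_y = {i for i, s in enumerate(tool_y) if s == sentiment}
    return len(t_x & t_y) / len(g_e) * 100 if g_e else 0.0
```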

Table 5.4: Agreements between Tool-pairs in the Detection of Sentiments

Dataset                  Sentiment   SSE* vs. Senti4SD   SSE* vs. EmoTxt   Senti4SD vs. EmoTxt
JIRA Issue Comments      Pos         77.88%              72.17%            63.51%
                         Neg         51.32%              72.03%            49.74%
                         Neu         81.86%              72.29%            70.00%
Stack Overflow Posts     Pos         94.35%              92.31%            93.63%
                         Neg         79.02%              80.71%            93.49%
                         Neu         82.32%              83.22%            81.26%
Code Review Comments     Non-neg     94.70%              86.05%            83.81%
                         Neg         65.00%              66.56%            66.88%
SSE* = SentiStrength-SE

The computed agreements between each pair of the tools are presented in Table 5.4. It can be observed that the agreements between the tools vary across different datasets and sentiments. For example, the tools SentiStrength-SE and Senti4SD show the highest agreement (94.70%) for

non-negative sentiments in the CRC (Code Review Comments) dataset, whereas the lowest agreement (49.74%) is found between EmoTxt and Senti4SD in the detection of negative sentiments in the JIC (JIRA Issue Comments) dataset. We observe two patterns in the agreements of the tools presented in Table 5.4. First, SentiStrength-SE and Senti4SD always achieve the highest agreement for non-negative (i.e., positive and neutral) sentiments. Second, for each pair of tools, on every dataset, the lowest agreement is found in the detection of negative sentiments. The only exception to this holds for Senti4SD and EmoTxt in the SOP (Stack Overflow Posts) dataset.

Table 5.5: Agreements among Tool-trio in the Detection of Sentiments

Dataset                  Sentiment   Fleiss' κ   Agreement Strength   Reason of Interpretation
JIRA Issue Comments      Pos         0.108       poor                 0.00 ≤ κ ≤ 0.19
                         Neg         0.087       poor                 0.00 ≤ κ ≤ 0.19
                         Neu         0.101       poor                 0.00 ≤ κ ≤ 0.19
Stack Overflow Posts     Pos         0.498       moderate             0.40 ≤ κ ≤ 0.59
                         Neg         0.164       poor                 0.00 ≤ κ ≤ 0.19
                         Neu         0.731       substantial          0.60 ≤ κ ≤ 0.79
Code Review Comments     Non-neg     0.462       moderate             0.40 ≤ κ ≤ 0.59
                         Neg         0.265       fair                 0.20 ≤ κ ≤ 0.39

To further examine the agreements within the tool-trio (i.e., among the three tools), we compute Fleiss' kappa [150] (an adaptation of Cohen's kappa for three or more participants), denoted as κ, for each sentiment in all three datasets. The computed Fleiss' kappa (κ) values and their interpretations are presented in Table 5.5. As seen in the table, there is only one instance each of 'fair' and 'substantial' agreement, two instances of 'moderate' agreement, and in all other cases there is 'poor' agreement among the tool-trio. The tools agree the least on the JIC dataset. In every dataset, the lowest Fleiss' kappa (κ) values are found for the negative sentiments, again indicating the least agreement among the tools, as is also found in the results of the tool-pair agreements (Table 5.4). This can be related to our observations in both Table 5.2 and Table 5.3, where, for all the tools, the accuracy of detecting negative sentiments is found consistently lower compared to non-negative sentiments. Hence, we suspect that all three tools struggle, more or less, in accurately detecting especially the negative sentiments in text. Based on our analyses and observations, we now derive the answer to the second research question (RQ2) as follows:
Ans. to RQ2: Agreements between the tools in detecting sentiments vary largely across different datasets and sentiments, ranging between 49.74% and 94.70%. Much of the disagreement among the tools is attributed to their disagreements in the detection of negative sentiments in text.
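For reference, a minimal Python/numpy sketch of the Fleiss' kappa computation used in this kind of tool-trio agreement analysis is shown below; here the three "raters" are the three tools, and the category counts per text are assumed to be arranged in an N x k matrix.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an N x k matrix `counts`, where counts[i, j] is the
    number of raters (here, tools) that assigned text i to category j.
    Every row must sum to the same number of raters n (n = 3 for a tool-trio).
    """
    N, _ = counts.shape
    n = counts[0].sum()                                        # raters per text
    p_j = counts.sum(axis=0) / (N * n)                         # category proportions
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-text agreement
    P_bar = P_i.mean()                                         # mean observed agreement
    P_e = np.square(p_j).sum()                                 # chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Toy example: 4 texts labeled by 3 tools into {Pos, Neg, Neu}.
toy = np.array([[3, 0, 0],
                [1, 2, 0],
                [0, 0, 3],
                [1, 1, 1]])
print(round(fleiss_kappa(toy), 3))
```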

5.5 Threats to Validity

It may be argued that the datasets used in this study are not large enough to cover all possible categories of technical text relevant to software engineering. However, this study includes all the publicly available software engineering domain specific sentiment analysis datasets and tools. Nevertheless, our study does not include SentiCR [111], which is another recently released sentiment analysis tool. We deliberately exclude it since the authors of SentiCR declared the scope of this tool limited to code review comments only, and thus it could be unfair to evaluate its performance on datasets including JIRA issue comments or Stack Overflow posts. Blaz and Becker [163] and Ortu et al. [40] developed tools for sentiment analysis in software engineering related text. The tool and dataset of Blaz and Becker [163] are meant for the "Brazilian Portuguese" language and thus are not comparable with the datasets and tools used in this study. The tools and datasets of Blaz and Becker [163] and Ortu et al. [40] are not publicly available, which is another reason for excluding them from this study.

5.6 Related Work

We characterize all the SA tool comparison work, including ours, in four categories: (1) DI-DI: comparison of domain independent (DI) tools using DI datasets, (2) DI-DS: comparison of DI tools using SE domain specific (DS) datasets, (3) M-DS: comparison of mixed (i.e., DI and DS) tools using DS datasets, and (4) DS-DS: comparison of DS tools using DS datasets.
(1) DI-DI: Abbasi et al. [190] performed a comparison of domain independent sentiment analysis (SA) tools by operating them on five different Twitter datasets. Ribeiro et al. [191] conducted a comparison of 24 unsupervised off-the-shelf sentiment analysis methods. Their evaluation was based on labeled datasets including messages posted on social networks, movie and product reviews, as well as opinions and comments in news articles. In an earlier work, Gonçalves [192] compared eight sentence-level domain independent sentiment analysis methods using a single public DI dataset.
(2) DI-DS: Jongeling et al. [48] compared four domain independent SA tools on a software engineering dataset and expressed the need for a domain specific SA tool for software engineering text. Islam and Zibran developed SentiStrength-SE [110], which is the first software engineering domain specific SA tool (introduced in Section 5.3). In its evaluation, they compared SentiStrength-SE against a domain independent tool only. In a later study, Islam and Zibran [57] compared four general purpose SA dictionaries using the JIC dataset introduced in Section 5.2.1. Recently, Ahmed et al. [111] evaluated seven domain independent SA techniques (i.e., AFINN [118], NLTK [144], SentiStrength [45], TextBlob [193], USent [194], VADER [120], and Vivekn [195]) using the CRC dataset (Section 5.2.3).

(3) M-DS: Calefato et al. [128], the authors of Senti4SD (introduced in Section 5.3), compared their SE domain specific SA tool Senti4SD with the domain specific SentiStrength-SE [110] and the domain independent SentiStrength using the CRC dataset (introduced in Section 5.2.3).
(4) DS-DS: Unlike all the aforementioned work, ours is the first study that compares multiple SE domain specific SA tools using multiple publicly available SE domain specific datasets. Thus, this work makes a unique contribution to the literature.

5.7 Summary

In this chapter, we have presented the first comparative study of three publicly available software engineering (SE) domain specific sentiment analysis (SA) tools (i.e., SentiStrength-SE, Senti4SD, and EmoTxt) using three SE domain specific datasets.
Our study reveals that the individual tools exhibit their best performance on the dataset they were originally tested on at the time of their release. The overall accuracies of the tools tend to decrease when they are operated on a different dataset. The accuracies of the tools vary largely across different datasets and sentimental polarities. Thus, none of the tools demonstrates substantially superior accuracies across sentiments and datasets. However, SentiStrength-SE is found to have consistently exhibited the highest recall and F-score while maintaining competitive precision across all the datasets and sentiments.
From the agreement analysis among the tools, we find that the tools' agreements vary largely (between 49.74% and 94.70%) depending on the datasets they are operated on and the sentimental polarities they detect. The tools' agreements remain the lowest in the detection of negative sentiments. Their accuracies also remain lower in the detection of negative sentiments compared to non-negative sentiments. Thus, we suspect that all the tools more or less struggle in accurately detecting negative sentiments in SE text.
We plan to extend this work with new datasets and to include an in-depth qualitative analysis to investigate why and in which cases the tools disagree or are incorrect in detecting sentiments, especially in the case of negative ones.

Chapter 6

Detection of Developers’ Emotions

In Chapter 4 and Chapter 5, we described how we developed the first domain specific sentiment analysis tool and measured its performance against its domain specific counterparts. Sentiment analysis tools can detect valence (i.e., sentiment) only, and cannot capture arousal or individual emotional states such as excitement, stress, depression, and relaxation. In this chapter, we present the first emotion analysis tool, DEVA, which is especially designed for software engineering text and is also capable of capturing the aforementioned emotional states through the detection of both arousal and valence.
The rest of the chapter is organized as follows. In Section 6.2, we briefly introduce the model of emotions used in this work. In Section 6.3, we introduce our tool, DEVA. Our approach for capturing arousal is discussed in Section 6.3.1. The technique for capturing valence is presented in Section 6.3.2. A set of heuristics included in DEVA is described in Section 6.3.4. In Section 6.4, we describe how we empirically evaluate our tool. In Section 6.5, we discuss the limitations of this work. Related work is discussed in Section 6.6. Finally, Section 6.7 concludes the chapter.

6.1 Introduction

The techniques for automatic sentiment analysis in text appear to be highly sensitive to domain terms. Thus, the sentiment analysis tools (e.g., SentiStrength [45], NLTK [46], and Stanford NLP [47]) that are designed for general text do not perform well when applied to software engineering text [35, 36, 37, 48, 50, 41, 51, 43], largely due to the variations in meanings of domain-specific technical terms [110]. Hence, recent attempts [163, 110, 111, 128] devise automatic sentiment analysis techniques particularly meant for software engineering text.
All the existing tools are limited in capturing emotions at the necessary depth [50]. Existing approaches are able to detect valence (i.e., positivity and negativity of emotional polarities) only and fail to capture arousal or specific emotional states such as excitement, stress, depression, and relaxation. At work, software developers frequently experience these emotions [31], which can be attributed to their work progress. For example, a developer typically feels relaxed if he makes enough progress in his assigned jobs. Otherwise, the developer feels stressed. Thus, these emotions need to be identified [121], and it is here that the existing approaches fall short [50].

To identify the aforementioned emotions with high accuracy, we develop a tool named DEVA (Detecting Emotions in Valence Arousal Space in Software Engineering Text). The tool includes a lexical approach with a number of heuristics. In empirical evaluations using a ground-truth dataset of JIRA issue comments (described in Section 6.4.1), DEVA demonstrates 82.19% precision, 78.70% recall, and 80.11% F-score. Both the DEVA tool and the dataset are made freely available online [196].

6.2 Emotional Model

In this work, we use a simple bi-dimensional model [105, 61] of emotions, which is a variant of the dimensional framework commonly known as the VAD (aka PAD) model [106]. In the bi-dimensional model, as shown in Figure 6.1, the horizontal dimension presents the emotional polarities (i.e., positivity, negativity, and neutrality), known as valence, and the vertical dimension indicates the level of reactiveness, i.e., high or low arousal.

Figure 6.1: Simple bi-dimensional model of emotions (valence ranges from negative to positive along the horizontal axis and arousal from low to high along the vertical axis; the four quadrants correspond to stress, excitement, depression, and relaxation)

The dimensions are bipolar, where the valence dimension ranges from negative to positive and the arousal dimension ranges from low to high. While many emotional states of a person can be determined by combining valence and arousal, we use a set of four major classes of emotional states that include excitement, stress, depression, and relaxation. For example, positive valence and high arousal, in combination, indicate the emotional state excitement. The four emotional states are very distinct, as each state constitutes emotions that are quite different compared to the emotions of the other states [105]. Thus, the model is unequivocal in recognizing emotions, simple, and easy to understand. This particular emotional model is also used in earlier work [61, 107].

6.3 DEVA

DEVA applies a dictionary-based lexical approach particularly designed for operation on software engineering text. For capturing both arousal and valence, the tool uses two separate dictionaries

(an arousal dictionary and a valence dictionary) that we develop by exploiting a general-purpose dictionary and two domain dictionaries especially crafted for software engineering text. DEVA also includes a preprocessing phase and several heuristics. In the preprocessing phase, DEVA identifies and discards source code contents from a given text input using regular expressions similar to those proposed by Bettenburg et al. [151]. The code elements are discarded because they typically are copy-pasted content that does not really carry the writer's emotions [110]. In the following sections, we first describe DEVA's dictionary-based approaches for capturing arousal and valence, and how they are combined to identify different emotional states. Then, we describe the heuristics, which guide the computation of DEVA towards high accuracy.
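As a rough illustration of this kind of preprocessing, the following sketch strips code-like content from a comment using simple regular expressions; the patterns are illustrative assumptions only, not the exact expressions used in DEVA or proposed by Bettenburg et al.

```python
import re

# Illustrative patterns only: JIRA-style {code} blocks, inline back-ticked
# code, and Java stack-trace lines often pasted into issue comments.
CODE_PATTERNS = [
    re.compile(r"\{code.*?\}.*?\{code\}", re.DOTALL),
    re.compile(r"`[^`]+`"),
    re.compile(r"^\s+at\s+[\w.$]+\(.*\)$", re.MULTILINE),
]

def strip_code(text: str) -> str:
    """Remove code-like content so that only natural-language text remains."""
    for pattern in CODE_PATTERNS:
        text = pattern.sub(" ", text)
    return text

print(strip_code("Looks good! {code}int x = 0;{code} Please also fix `foo()`."))
```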

6.3.1 Capturing Arousal

For capturing arousal, we construct a new arousal dictionary for DEVA by combining the SEA (Software Engineering Arousal) [161] dictionary with the ANEW (Affective Norms for English Words) [160] dictionary.
The SEA [161] dictionary is specifically developed to detect arousal in text in the software developer ecosystems. The dictionary contains 428 words. Each of the 428 words ω is assigned an arousal score s_a^ω, which is a real number between +1 and +9. In the SEA dictionary, the arousal level of a word is interpreted as neutral if s_a^ω = +5. The arousal level of that word is considered high if s_a^ω > +5. Otherwise, that word is considered to have low arousal.
The ANEW [160] dictionary is a generic dictionary (i.e., not designed especially for any particular domain), which contains 13,915 words, where each word is annotated with arousal, valence, and dominance scores, each also ranging between +1 and +9.

6.3.1.1 Combining the SEA and ANEW dictionaries

At first, all the words (along with their arousal scores) in the ANEW dictionary are included in the new arousal dictionary of DEVA. Then, we add a word to the new dictionary if that word is found in the SEA dictionary but not in the ANEW dictionary. For example, the word 'ASAP' exists in SEA but not in ANEW, and thus this word, along with its arousal score, is added to our new arousal dictionary. If a word is found in both the SEA and ANEW dictionaries, then the arousal score from the SEA dictionary is assigned to that word in our new arousal dictionary. For example, the word 'Anytime' exists in both dictionaries, with arousal scores of 6.5 and 4.6 in the SEA and ANEW dictionaries, respectively. Hence, in our new arousal dictionary, the word is assigned an arousal score of 6.5. Thus, our newly constructed arousal dictionary includes 14,084 emotional words.
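The merge rule described above (SEA entries take precedence over ANEW entries) can be sketched in a few lines of Python; the word lists below are small illustrative stand-ins, and only the two 'Anytime' scores are taken from the example above.

```python
# Illustrative fragments of the two dictionaries (word -> arousal score in [+1, +9]).
anew = {"anytime": 4.6, "calm": 2.4, "panic": 7.9}
sea  = {"anytime": 6.5, "asap": 7.2}      # 'asap' is SEA-only; its score is made up here

# Start from ANEW, then let SEA scores override or extend it.
arousal_dict = dict(anew)
arousal_dict.update(sea)

print(arousal_dict["anytime"])  # 6.5 -- the SEA score wins for shared words
print(arousal_dict["asap"])     # 7.2 -- the SEA-only word is added
```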

Table 6.1: Conversion of arousal scores from [+1, +9] to [-5, +5]

Score in [+1, +9]:   +1   +2   +3   +4   +5     +6   +7   +8   +9
Score in [-5, +5]:   -5   -4   -3   -2   +/-1   +2   +3   +4   +5

6.3.1.2 Adjusting the ranges of arousal scores

To obtain an arousal scale consistent with the valence scale (described later), first, the fractional value of s_a^ω is rounded to its nearest integer ŝ_a^ω. Then, using the conversion scale in Table 6.1, we convert each integer arousal score ŝ_a^ω in the range [+1, +9] to a^ω in the integer range [-5, +5]. For example, if the original arousal score of a word rounded to the closest integer is +2, it is converted to -4, according to the mappings shown in Table 6.1. For an arousal score a^ω within the new range of [-5, +5], the arousal level of a word ω is interpreted using Equation 6.1.

    arousal level of ω = High, if a^ω > +1;  Low, if a^ω < -1;  Neutral, otherwise.    (6.1)

This conversion between ranges does not alter the original arousal levels of the words.
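The conversion in Table 6.1 and the level interpretation of Equation 6.1 can be captured by a small helper like the one below, a sketch under the assumption that scores are first rounded to integers as described above.

```python
def convert_arousal(score_1_to_9: float) -> int:
    """Map a rounded arousal score from [+1, +9] onto [-5, +5] per Table 6.1."""
    s = round(score_1_to_9)
    if s < 5:
        return s - 6          # +1..+4  ->  -5..-2
    if s > 5:
        return s - 4          # +6..+9  ->  +2..+5
    return 1                  # +5 is neutral, i.e., +/-1

def arousal_level(a: int) -> str:
    """Interpret a converted score per Equation 6.1."""
    if a > 1:
        return "High"
    if a < -1:
        return "Low"
    return "Neutral"

print(convert_arousal(7.2), arousal_level(convert_arousal(7.2)))  # 3 High
```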

6.3.1.3 Computing arousal score for text

DEVA views an input text t as a set of words, t = {ω1, ω2, ω3, ..., ωn}, where ω1, ω2, ω3, ..., ωn are the distinct words in t. In computing the arousal score for the entire text t, DEVA retrieves the arousal scores a^ω1, a^ω2, a^ω3, ..., a^ωn of all the words in t from the arousal dictionary we have constructed. At this particular stage of computation, a word in t is disregarded if it is not found in the arousal dictionary. Then, for t, DEVA computes a pair ⟨h_t, l_t⟩ where

    h_t = max{a^ω1, a^ω2, a^ω3, ..., a^ωn},
    l_t = min{a^ω1, a^ω2, a^ω3, ..., a^ωn}.

Finally, DEVA determines the overall arousal score A_t for the entire text t using Equation 6.2.

    A_t = h_t, if |h_t| ≥ |l_t|;  l_t, otherwise.    (6.2)
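A compact sketch of this aggregation is shown below, under the assumption that the dictionary maps words to scores already converted to the [-5, +5] range; words absent from the dictionary are simply skipped, as described above, and returning +1 for a text with no arousal-bearing word is an assumption of this sketch.

```python
def text_arousal(words, arousal_dict):
    """Overall arousal score of a text per Equation 6.2: the extreme
    (max/min) word score whose magnitude is larger wins."""
    scores = [arousal_dict[w] for w in words if w in arousal_dict]
    if not scores:
        return 1   # no arousal word found; treated as neutral (+/-1) in this sketch
    h, l = max(scores), min(scores)
    return h if abs(h) >= abs(l) else l

arousal_dict = {"asap": 4, "happy": 2, "commit": -3}   # illustrative entries only
print(text_arousal("please fix this asap".split(), arousal_dict))  # 4
```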

6.3.2 Capturing Valence

To capture valence in text, DEVA exploits the only available domain-specific valence dictionary, that of SentiStrength-SE [110], which is especially crafted for software engineering text. This dictionary contains 167 positively and 293 negatively polarized words. Each word ω is assigned a valence score v^ω where −5 ≤ v^ω ≤ +5. Based on the score v^ω, the valence polarity (i.e., positivity, negativity, or neutrality) of a word ω is interpreted using Equation 6.3.

    valence polarity of ω = Positive, if v^ω > +1;  Negative, if v^ω < −1;  Neutral, otherwise.    (6.3)

6.3.2.1 Computing valence score for text

The computation of the valence score for a text is similar to the computation of the arousal score, except that the valence dictionary is used in place of the arousal dictionary. Thus, for a given text t, DEVA computes a pair ⟨p_t, n_t⟩ of integers, where

    p_t = max{v^ω1, v^ω2, v^ω3, ..., v^ωn},
    n_t = min{v^ω1, v^ω2, v^ω3, ..., v^ωn}.

Here, p_t and n_t respectively represent the positive and negative valence scores for the text t. Finally, the overall valence score V_t for the text t is computed using Equation 6.4.

    V_t = p_t, if |p_t| ≥ |n_t|;  n_t, otherwise.    (6.4)

6.3.3 Emotional States from Valence and Arousal

Upon computing the arousal score A_t and the valence score V_t for a given text t, DEVA then maps the emotional scores to individual emotional states based on the bi-dimensional emotional model described in Section 6.2. In particular, the emotional state E_t (of the author) expressed in the text t is determined using the mapping specified in Equation 6.5.

    E_t = Excitement,  if V_t ≥ +2 and A_t ≥ +2
          Stress,      if V_t ≤ −2 and A_t ≥ +2
          Depression,  if V_t ≤ −2 and A_t ≤ −2      (6.5)
          Relaxation,  if V_t ≥ +2 and A_t ≤ −2
          Neutral,     if V_t = ±1.

In addition to the above mentioned emotional states, a text may express only valence and no arousal, or vice versa. DEVA is also able to detect those scenarios in the text.
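The mapping of Equation 6.5 translates directly into a small conditional, sketched below; the parameter names correspond to the reconstructed valence (V_t) and arousal (A_t) scores used above, and the handling of the remaining score combinations is simplified for this sketch.

```python
def emotional_state(valence: int, arousal: int) -> str:
    """Map overall valence and arousal scores (each in [-5, +5]) to one of the
    four emotional states of the bi-dimensional model, per Equation 6.5."""
    if valence >= 2 and arousal >= 2:
        return "Excitement"
    if valence <= -2 and arousal >= 2:
        return "Stress"
    if valence <= -2 and arousal <= -2:
        return "Depression"
    if valence >= 2 and arousal <= -2:
        return "Relaxation"
    return "Neutral"   # neutral valence, or valence/arousal-only cases (simplified)

print(emotional_state(3, 4))    # Excitement
print(emotional_state(-3, -2))  # Depression
```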

6.3.4 Heuristics in DEVA

While the underlying dictionaries play the major role in the lexical approach of DEVA, the tool also includes a number of heuristics to increase accuracy, as was also hinted at in earlier work [120, 110]. DEVA includes all the heuristics implemented in SentiStrength-SE [110], which is a recently released tool for the detection of valence only in software engineering text. For capturing arousal with high accuracy, DEVA also includes seven heuristics, which we have devised based on existing studies in psychology and software engineering [39, 197, 198] as well as our experience in the field. These seven heuristics for sensing arousal are discussed below with relevant examples excerpted from a dataset [52] composed of JIRA issue comments (described later in Section 6.4.1).
H1: The exclamation mark (!) in a text implies high arousal. The exclamation mark (!) in a text is commonly used to indicate high arousal of the text writer [199]. For example, in the following comment the commenter expresses excitement: the comment contains the word 'happy', which indicates positive valence, and the three exclamation marks at the end express high arousal.

"Very happy to see it is useful and used !!!." (Comment ID: 1927)

Thus, by combining positive valence (detected using the valence dictionary) and high arousal (detected using this heuristic H1), DEVA correctly identifies (using Equation 6.5) the emotional state excitement expressed in the above comment. Without the heuristic H1, the text would be incorrectly identified to have positive valence only.
H2: Words with all capital letters indicate high arousal. Words written in all capital letters often indicate a high arousal state of the writer [199]. In the following comment, the commenter starts with the word 'sorry', expressing negative valence. The all-capital letters in the word indicate a high arousal state of the commenter.

"SORRY Oliver, this is really my fault ... something like this way will not happen anymore." (Comment ID: 1802095)

Hence, DEVA detects negative valence and high arousal in the comment and identifies that the commenter is under stress. However, in some cases API names and code elements are written in all capital letters even though they do not express any emotional state of the writer. To distinguish such a scenario, DEVA checks the spellings of those words written in all capital letters against an English dictionary using the Jazzy [152] tool. A word written in all capital letters is considered a name of an API or code element if the word is misspelled. Thus, in the above comment, the word 'SOLR' will be identified as a name and DEVA will not interpret it to have expressed any arousal level.
H3: Emoticons express emotional states. Emotional icons, aka emoticons, are often used to express different emotions in informal text, including software engineering text [198, 105]. For example, in the following comment the writer uses the emoticon ':(' to express depression.

"Oops, I did not run the run-install :(" (Comment ID: 521081)

Table 6.2: Emoticons expressing different emotions

Emotions      Code of Emoticons
Excitement    :*, :?>, :x, :D, :)), 0:), @};-, =P~, :), :">
Stress        :((, X-(, :O, =;, :-/, (:|, >:P
Depression    :(, :-$, :-&, 8-), :-<, (:|, :-S, I-), :|
Relaxation    B-), :>, :P, ;;), ;), =D>, ;)), :-?, [-o<, /:)

DEVA is capable of identifying and interpreting the emotional states expressed by the emoticons used in text. In interpreting the emoticons, DEVA exploits a list of emoticons mapped to the four categories of emotional states (i.e., excitement, stress, depression/sadness, and relaxation). The mapping, as presented in Table 6.2, was originally proposed by Yang et al. [197].
H4: Interjections can indicate emotional states. Interjections are special parts of speech (POS) that are meant for expressing emotional states [200]. For example, even if the above comment (ID: 521081) did not include the emoticon ':(', it would still express depression through the interjection 'oops', which DEVA would capture correctly using a list of interjections mapped to their meanings [201], as presented in Table 6.3.

Table 6.3: Interjections expressing different emotions

Emotions      Interjections
Excitement    'Gee', 'Hurray', 'Ooh', 'Oooh', 'Wee', 'Wow', 'Yah', 'Yeah', 'Yeeeeaah', 'Yeehaw'
Stress        'Aah!', 'Aaah', 'Aaaahh', 'Argh', 'Augh', 'Bah', 'Boo', 'Boo!', 'Booh', 'Eek', 'Eep', 'Grr', 'Yikes'
Depression    'Duh', 'Doh', 'Eww', 'Gah', 'Humph', 'Harumph', 'Oops', 'Oww', 'Ouch', 'Sheesh', 'Jeez', 'Yick'
Relaxation    'Ahh', 'Phew'

The list of interjections in Table 6.3 includes only those interjections whose meanings are unambiguous and relevant to the emotional states considered in this work. For example, the interjection 'Yahoo' is excluded since it sometimes refers to the name of a company.
H5: Temporal terms indicate high arousal. In psychology and management, it is generally accepted that, because of time pressure, a worker shows increased alertness or readiness, i.e., high arousal [39]. This also applies to the software engineering field [161, 202, 203]. Thus, high arousal is expressed through temporal terms (e.g., asap, soon) in text. For example, in the following comment the commenter expresses high arousal by using the temporal term 'asap'.

"Sorry, I will fix these error asap." (Comment ID: 1600615)

To capture such arousal states expressed through temporal terms, DEVA maintains a list of 12 temporal terms (Table 6.4) commonly used for referring to timelines, deadlines, and the like.

Table 6.4: Temporal terms included in the DEVA dictionary
'Soon', 'Sooner', 'ASAP', 'EOD', 'EOB', 'Today', 'Tomorrow', 'Tonight', 'No later', 'At earliest', 'Tight schedule'

H6: Task completion leads to low arousal. Typically, completion of a task makes one relaxed and in a state of low arousal. For example, the following comment, indicating completion of a task, also expresses the relaxation of the commenter.

"The two approaches seem complimentary to me. I’m happy to see this committed. Does anyone object?" (Comment ID: 1667040)

The word 'happy' in the above comment indicates positive valence. However, the arousal state could be missed if the word 'committed' were not considered to express low arousal. DEVA takes into account both the words 'happy' and 'committed' and correctly identifies both positive valence and low arousal, jointly mapped to relaxation. For capturing task completion scenarios in software engineering, DEVA uses a collection of domain-specific words and phrases listed in Table 6.5.

Table 6.5: Task completion indication terms in DEVA
'Fixed', 'Resolved', 'Solved', 'Done', 'Patch looks good', 'Working fine', 'Working good', 'Working properly', 'Pushed in branch', 'Pushed in trunk', 'Committed'

H7: Negations reverse arousal state. Generally, a negation (e.g., no, not) is meant for reversing or weakening the meaning of the word it qualifies. DEVA weakens the arousal level associated with a word when the word is found negated in text. Thus, the high arousal level associated with a negated word is weakened to low arousal, and the low arousal of a negated word is neutralized. For example, in the comment below the high-arousal word 'worry' (which is also a negative valence word) is negated by 'not' and indicates low arousal.

"Lets not worry about this now" (Comment ID: 53698)

Again, in the following comment, the word ‘good’ is associated with a low arousal level in DEVA’s arousal dictionary. Identifying the negation of the word with ‘not’, DEVA neutralizes the low arousal.

"Agreed, its not good. Improved in 1.1." (Comment ID: 2263164)

6.4 Evaluation

The accuracy of emotion detection of DEVA is measured in terms of precision (℘), recall (ℜ), and F-score (Ⅎ), separately computed for each of the target emotional states described in Section 6.2 and formalized in Equation 6.5. Given a set of texts, the precision ℘, recall ℜ, and F-score Ⅎ for a particular emotional state e are calculated as follows:

    ℘ = |D_e ∩ G_e| / |D_e|,    ℜ = |D_e ∩ G_e| / |G_e|,    Ⅎ = (2 × ℘ × ℜ) / (℘ + ℜ)

where e ∈ {excitement, stress, depression, relaxation, neutral}, G_e represents the set of texts expressing the emotional state e (according to the ground truth), and D_e denotes the set of texts in which DEVA detects the emotional state e (so that D_e ∩ G_e is the set of texts for which DEVA correctly captures e).

Recall that DEVA is the first tool capable of automatic detection of the aforementioned emotional states in software engineering text, and no dataset is available for empirical evaluation of our tool. Hence, we first create a ground-truth dataset and compute the aforementioned metrics against it. Then we compare DEVA with a baseline approach that we also implement. Finally, we compare our tool with a similar (but not identical) tool, TensiStrength [122].

6.4.1 Creation of Ground-Truth Dataset

The considered dataset [52] consists of two million JIRA issue comments from more than 1,000 projects. JIRA (https://www.atlassian.com/software/jira) is a commercial tool widely used by software developers for describing, tracking, and managing user stories, bug reports, feature requests, and other development issues. This dataset has also been used in many studies [110, 39, 185, 63, 40] on the social and emotional aspects of software engineering.

6.4.1.1 Construction of a manageable subset

We want to create a dataset by manually annotating issue comments with their expressed emotional states. Manual annotation of two million issue comments would be a mammoth task. Hence, to minimize effort, we create a subset of 2,000 issue comments for manual annotation using the criteria described below.
The majority of issue comments in the above mentioned dataset are emotionally neutral [52]. Thus, a random selection is likely to include more neutral comments than those with other emotions. To avoid such a possibility, we first use a keyword-based search to collect from the original dataset a subset of 50 thousand comments that are likely to contain valence and arousal. We use 68 unigram keywords (listed elsewhere [105]) and their 136 synonyms detected using WordNet [204].


Table 6.6: Inter-rater disagreements in categories of emotions

                   Inter-rater Disagreements
Emotions       A, B      B, C      C, A      # of Issue Comments
Excitement     06.57%    07.79%    07.54%    411
Stress         06.75%    17.46%    14.68%    252
Depression     06.92%    11.07%    07.61%    289
Relaxation     05.72%    09.69%    05.72%    227
Neutral        07.30%    05.19%    05.68%    616
Total number of issue comments: 1,795

The synonyms of a keyword include every synonym of all variations of the keyword with respect to POS. Such a keyword-based search method is also used in another study [105] for a similar purpose. Again, from the original dataset, we randomly select another subset of 100 thousand issue comments and take the union of the two subsets. From that union, we filter out those comments that have more than 100 letters, which results in a set of 110 thousand comments. From this set, we randomly select 2,000 comments for manual annotation by human raters.

6.4.1.2 Manual annotation by human raters

We employ three human raters (enumerated as A, B, and C) to manually annotate the 2,000 issue comments with the emotions (i.e., excitement, stress, depression, relaxation, or neutral) they perceive in them. Each of the three human raters is a graduate student in computer science with one to five years of experience in software development in collaborative environments. Each of the human raters separately annotates each of the 2,000 issue comments. We consider a comment to convey the emotional state e if two of the three raters identify the same emotion in it. A total of 205 issue comments are discarded since the human raters do not agree on the emotions they perceive in those comments. Thus, our ground-truth dataset ends up containing 1,795 issue comments. The number of issue comments expressing each of the emotional states is presented in the rightmost column of Table 6.6. This table also presents the emotion-wise percentage of cases where the raters disagree. We also measure the degree of inter-rater agreement in terms of the Fleiss' κ [150] value. The obtained Fleiss' κ value of 0.728 signifies substantial agreement among the independent raters.
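The majority-agreement rule used here (and for the datasets of Chapter 5) can be expressed compactly; the snippet below is an illustrative sketch, not the scripts actually used for building the dataset.

```python
from collections import Counter

def majority_label(labels):
    """Return the label at least two of the three raters agree on,
    or None when all raters disagree (such comments are discarded)."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None

print(majority_label(["stress", "stress", "neutral"]))        # stress
print(majority_label(["stress", "relaxation", "neutral"]))    # None -> discarded
```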

6.4.2 Measurement of Accuracy

We invoke DEVA to detect the emotional states in each of the 1,795 issue comments in our human-annotated ground-truth dataset. Then, for each of the issue comments, we compare DEVA's detected

emotion with the human-annotated emotion (i.e., the ground truth). We separately measure precision (℘), recall (ℜ), and F-score (Ⅎ) for DEVA's detection of each of the emotional states, which are presented in the third column (from the left) of Table 6.7. As presented in the bottom three rows of the same column, across all the emotional states, DEVA achieves, on average, 82.19% precision, 78.70% recall, and 80.11% F-score.

Table 6.7: Comparison between DEVA and Baseline

Emotions      Metrics   DEVA     Baseline
Excitement    ℘         87.58    77.16
              ℜ         88.86    23.72
              Ⅎ         88.22    36.29
Stress        ℘         72.29    48.48
              ℜ         66.53    12.74
              Ⅎ         69.29    20.18
Depression    ℘         78.01    33.77
              ℜ         76.12    61.59
              Ⅎ         77.05    43.62
Relaxation    ℘         85.63    19.63
              ℜ         65.63    66.76
              Ⅎ         74.31    30.33
Neutral       ℘         87.44    72.45
              ℜ         96.37    31.63
              Ⅎ         91.69    44.03
Average       ℘         82.19    50.30
              ℜ         78.70    39.27
              Ⅎ         80.11    34.87

6.4.3 Comparison with a Baseline

DEVA is the first tool especially designed for software engineering text to detect the emotional states in the bi-dimensional emotion model encompassing both valence and arousal. No other such tool exists for a direct comparison with DEVA. Hence, we implement a baseline approach based on the work of Mäntylä et al. [39], who used the ANEW dictionary to study only valence and arousal in software engineering text. The baseline tool that we implement also exploits the ANEW dictionary. Thus, the baseline tool differs from DEVA in two ways. First, the baseline tool uses the regular ANEW dictionary while DEVA exploits a valence dictionary and an arousal dictionary especially designed for software engineering text. Second, DEVA applies a number of heuristics which are not included in the baseline tool. We want to verify whether the crafted dictionaries and heuristics actually contribute to higher accuracy in the detection of emotional states.

Hypothesis: Upon operating DEVA and the baseline tool on the same software engineering dataset, DEVA must outperform the baseline if the domain-specific dictionaries and heuristics included in it actually contribute to higher accuracies in the detection of emotional states in software engineering text.
We invoke the baseline tool to detect the emotional states in each of the issue comments in our ground-truth dataset. Then, we compute the precision (℘), recall (ℜ), and F-score (Ⅎ) for its detection of each emotional state (i.e., excitement, stress, depression, relaxation, and neutral), as shown in the rightmost column of Table 6.7. The overall average precision, recall, and F-score across all the emotional states are presented in the bottom three rows of the same column.
As we compare the accuracies of DEVA and the baseline approach in Table 6.7, DEVA is found to have outperformed the baseline in all cases by a large margin, except for the recall of relaxation, where DEVA falls short by only 1.13%. In all cases, DEVA maintains a substantially higher F-score compared to the baseline. In other words, DEVA maintains a balance between precision and recall for each emotional state, resulting in a higher F-score in all cases. Overall, on average, across all the emotions, DEVA clearly outperforms the baseline.
Thus, the results of the comparison imply that our hypothesis holds true, which means the domain-specific dictionaries and heuristics included in DEVA actually contribute to its superior performance.

6.4.4 Comparison with TensiStrength

Recently, TensiStrength [122] was released, which we find somewhat similar to our DEVA because both tools are capable of detecting stress and relaxation in text. However, DEVA and TensiStrength are more different than they are similar. First, unlike DEVA, the TensiStrength tool is not especially designed for software engineering text. Second, TensiStrength cannot detect excitement and depression, which DEVA detects. Nevertheless, we compare TensiStrength's accuracies against those of DEVA in the detection of stress and relaxation only, since these emotional states form a subset of the emotional states DEVA detects.
For a given text t, TensiStrength computes a pair ⟨r_t, s_t⟩ of integers, where +1 ≤ r_t ≤ +5 and −5 ≤ s_t ≤ −1. Here, r_t and s_t respectively represent the relaxation and stress scores for the given text t. A given text t is considered to express relaxation if r_t > +1. Similarly, a text is held to convey stress when s_t < −1. Besides, a text is considered neutral when the scores for the text appear to be ⟨+1, −1⟩.
We execute TensiStrength on the ground-truth dataset. Then, we separately measure the precision (℘), recall (ℜ), and F-score (Ⅎ) for TensiStrength's detection of each of the three target emotional states (i.e., relaxation, stress, and neutral). Table 6.8 shows the precision, recall, and F-score of both tools, DEVA and TensiStrength, in the detection of stress, relaxation, and neutral comments.

Table 6.8: Comparison between DEVA and TensiStrength

Emotions      Metrics   DEVA     TensiStrength
Stress        ℘         72.29    35.70
              ℜ         66.53    92.03
              Ⅎ         69.29    51.44
Relaxation    ℘         85.63    20.58
              ℜ         65.63    62.11
              Ⅎ         74.31    30.92
Neutral       ℘         87.44    82.31
              ℜ         96.37    79.73
              Ⅎ         91.69    81.00
Average       ℘         81.79    46.20
              ℜ         76.18    77.96
              Ⅎ         78.43    54.45

The overall average precision, recall, and F-score across the target emotional states are presented in the bottom three rows of the table. As seen in Table 6.8, DEVA consistently achieves higher precision and F-score in the detection of all the emotional states. The recall of DEVA is also higher in all cases except for the recall of stress, which affects the comparative overall recall of the tools. Still, DEVA maintains a higher overall average F-score.
TensiStrength cannot differentiate between depression and stress. It cannot distinguish between excitement and relaxation either. These shortcomings are among the reasons for the tool's lower precision in the detection of stress and relaxation. For example, in the following comment, the commenter conveys excitement, but due to the presence of the positive emotional word 'good', TensiStrength incorrectly determines the comment to have expressed relaxation.

"Good catch ! Will fix it asap." (Comment ID: 1348887)

6.5 Threats and Limitations

From the empirical evaluations, DEVA is found superior to both the baseline approach and TensiStrength. Still, its accuracy is not 100%, due to its shortcomings. Although DEVA captures negations very well, it still falls short in handling complex structures of negation. In the detection of subtle expressions of emotions in text, even the human raters are often in disagreement, and DEVA also falls short in capturing them. The tool cannot recognize irony and sarcasm in text, and fails to correctly identify emotions in such text. Capturing subtle emotional expressions, irony, and sarcasm in text is already recognized as a challenging problem in the area of Natural Language Processing (NLP).

The heuristics and domain-specific dictionaries included in DEVA contribute to the correct identification of emotional states, as verified in Section 6.4.3. However, in some cases the heuristics may mislead the tool, although such cases are relatively rare compared to the common situations. The lists of task completion terms, temporal terms, interjections, and emoticons included in DEVA might not be complete enough to cover all possible scenarios. Similarly, the valence and arousal dictionaries in DEVA might also miss relevant emotional terms. One might question whether, instead of using the lexical approach for building DEVA's domain-specific dictionaries, we could adopt a better approach that could possibly minimize these limitations. However, a recent study [57] reports that "lexicon-based approaches for dictionary creation work better for sentiment analysis in software engineering text."
One might argue that, in the construction of DEVA's arousal dictionary, the range conversion of arousal scores from [+1, +9] to [-5, +5] might have altered the original arousal levels of some words. We have considered this possibility and carefully designed the conversion scheme to minimize such possibilities. A random sanity check after the range conversion indicates the absence of any such occurrence. The regular expressions used in the preprocessing phase of DEVA for filtering out source code elements in text might not be able to discard all code elements. However, studies show that light-weight regular expressions perform better than other heavy-weight approaches (e.g., machine learning, island grammar) for this purpose [205]. Our ground-truth dataset, manually annotated by three human raters, is subject to human bias, experience, and understanding of the field. However, the fact that the human raters are computer science graduate students with software development experience in collaborative environments limits this threat.

6.6 Related work

A comprehensive list of the tools and techniques developed and used to detect emotions can be found elsewhere [206, 121, 207]. To maintain relevance, we limit our discussion to only those tools and techniques that have been attempted for software engineering text.
Earlier research involving sentiment analysis in software engineering text used three tools/toolkits, SentiStrength [45], Stanford NLP [146], and NLTK [46], with SentiStrength used the most frequently [110]. All of the aforementioned three tools are developed and trained to operate on non-technical text, and they do not perform well enough when operated in a technical domain such as software engineering. Domain-specific (e.g., software engineering) technical uses of inherently emotional words seriously mislead the sentiment analyses of those tools [48, 50, 41, 43] and limit their applicability in the software engineering area.
Blaz and Becker [163] proposed three almost equally performing lexical methods, a Dictionary Method (DM), a Template Method (TM), and a Hybrid Method (HM), for sentiment analysis in "Brazilian Portuguese" text in IT (Information Technology) job submission tickets.

Although their techniques might be suitable for formally structured text, they may not perform well in dealing with the informal text frequently used in software engineering artifacts such as commit comments [110]. SentiStrength-SE [110], Senti4SD [128], and SentiCR [111] are three recent tools especially designed to deal with software engineering text. However, all the aforementioned tools and techniques are meant for detecting valence only and cannot capture arousal or other emotional states at a deeper level.
To detect emotions at more fine-grained levels, Murgia et al. [185] constructed a machine learning classifier specifically trained to identify six emotions (joy, love, surprise, anger, sadness, and fear) in issue comments. Similar to their approach, Calefato et al. [56] also developed a toolkit to detect those six emotions. However, neither of these techniques is capable of detecting the emotional states excitement, stress, depression, and relaxation as captured in the well-established bi-dimensional emotional model encompassing both the valence and arousal dimensions.
TensiStrength [122] is a recently released tool, which we have compared with our DEVA. As mentioned before, TensiStrength can detect stress and relaxation in text, but cannot capture excitement or depression, while DEVA is capable of detecting all of them. Unlike our DEVA, TensiStrength is not especially designed for any particular domain, and thus performs poorly for software engineering text, as is also found in our comparison with DEVA. Mäntylä et al. [39] studied both valence and arousal in software engineering text. For detecting valence and arousal, they also used a lexical approach, which is not especially designed for software engineering text. Their approach relies on the ANEW (Affective Norms for English Words) dictionary only, whereas DEVA uses two separate valence and arousal dictionaries especially crafted for software engineering text. Although their approach was never realized in a reusable tool, it inspired us in the implementation of the baseline tool that we have compared with DEVA.

6.7 Summary

In this chapter, we have presented DEVA, a tool for automated sentiment analysis in text. DEVA is unique from existing tools in two aspects. First, DEVA is especially crafted for software engineering text. Second, DEVA is capable of detecting both valence and arousal in text and mapping them to capture individual emotional states (e.g., excitement, stress, depression, relaxation, and neutrality) conforming to a well-established bi-directional emotional model. None of the existing sentiment analysis tools has both of the aforementioned capabilities. DEVA applies a lexical approach with an arousal dictionary and a valence dictionary, both crafted for software engineering text. In addition, DEVA includes a set of heuristics, which help the tool maintain high accuracy. For the empirical evaluation of DEVA, we have constructed a ground-truth dataset consisting of 1,795 JIRA issue comments, each of which is manually annotated by three human raters. This dataset is

also a significant contribution to the community. From a quantitative evaluation using this dataset, DEVA is found to have achieved 82.19% precision and 78.70% recall. We have also implemented a baseline approach and compared it against DEVA. A recently released similar (but not identical) tool, TensiStrength, is also compared with our DEVA. From the comparisons, DEVA is found to be substantially superior to both the baseline and TensiStrength. The current release of DEVA and our ground-truth dataset are freely available [196] for public use.

We are aware of the existing limitations of our tool, which we have also discussed in this chapter. Addressing all these limitations is within our future plan. In future releases of DEVA, we will keep enriching the underlying dictionaries and enhancing the heuristics to further improve the tool's accuracy. Using DEVA and its future releases, we will conduct large-scale studies of emotional variations and their impacts in software engineering. Moreover, we plan to extend DEVA for aspect-oriented [208, 209] emotion analysis in software engineering text.

Chapter 7

Machine Learning Based Detection of Developers’ Emotions

Our tools SentiStrength-SE and DEVA (described in Chapter 4 and Chapter 6, respectively) are developed by combining domain-specific rules and dictionaries. However, a machine learning technique can overcome the inherent limitations of a tool (e.g., DEVA) developed using such rules and dictionaries. In this chapter, we describe the development of MarValous, the first Machine Learning based tool for improved detection of the excitement, stress, depression, and relaxation emotional states in software engineering text.

The remainder of the chapter is organized as follows. In Section 7.1, we introduce the motivation and context of the work. In Section 7.2, we briefly introduce the two-dimensional emotional model used in this work. In Section 7.3, we introduce our tool, MarValous, and describe our machine learning approach for the detection of individual emotional states. In Section 7.4, we describe how we empirically evaluate our tool. In Section 7.5, we discuss the limitations and threats to the validity of this work. Section 7.6 includes a discussion of related work. Finally, Section 7.7 concludes the chapter.

7.1 Introduction

Recently, a few SE domain-specific tools [163, 110, 111, 128] have been developed for detecting sentimental polarities (e.g., positivity, negativity) only. Those tools are limited in capturing emotional states at deeper levels, such as excitation, stress, depression, and relaxation, although it is important to capture these emotional states [129, 50]. Islam and Zibran recently addressed this limitation and developed DEVA [129], which, to date, is the only tool available for detecting those four emotional states in SE texts. However, DEVA is lexicon-based, and such a technique is limited by the quality and sizes of the underlying dictionaries in use [187, 210]. Arguably, a dictionary-based approach can fail to capture the organization of a text, which contributes information relevant to the emotion of the text's writer [211].

In this chapter, we present MarValous (Machine Learning Based Emotion Detector in Valence-Arousal Space), a tool that we have developed for the automatic detection of individual emotional states expressed in SE text. In particular, this work makes the following three contributions:

First, we develop MarValous, which exploits supervised ML techniques and includes nine text preprocessing steps and seven feature extraction modules. Second, we empirically evaluate MarValous using a unified dataset of manually annotated comments (described in Section 7.4.1); in this evaluation, MarValous demonstrates 83.37% precision, 79.33% recall, and 80.90% F-score. Third, both the MarValous tool and the dataset are made publicly available [196].

7.2 Emotional Model

We use a two-dimensional model [105, 61] of emotions to classify texts in software engineering. Figure 7.1 presents the most widely used emotion classification model in the two-dimensional approach [212], proposed by Russell and Mehrabian [106]. As shown in Figure 7.1, each dimension is bipolar: the valence dimension ranges from negative valence (i.e., unpleasant) to positive valence (i.e., pleasant), and the arousal dimension ranges from low to high. A total of 28 emotional states of a person can be determined by combining different levels of valence and arousal in each of the four quadrants marked as Q1, Q2, Q3, and Q4 in Figure 7.1.

Figure 7.1: Two-dimensional emotion classification model (valence on the horizontal axis, from negative to positive; arousal on the vertical axis, from low to high; the quadrants Q1-Q4 contain representative emotion words).

Several studies [129, 105, 213] use simplified versions of the aforementioned two-dimensional model of Russell and Mehrabian [106], where each quadrant is represented by a unique emotional state. For example, the quadrants Q1, Q2, Q3, and Q4 are represented by the emotions excitation, stress, depression, and relaxation, respectively, in the work of Islam and Zibran [129]. The four classes of emotions are very distinct, as each state constitutes emotions that are quite different compared to the emotions of the other states. Moreover, the model is simple and easy to perceive. Therefore, the simplified model of Islam and Zibran [129] is also adopted in our work.

7.3 MarValous

We develop MarValous in Python and use scikit-learn [214] for the supervised learning algorithms. For improved classification performance, MarValous consists of two major modules: (i) data preprocessing and (ii) feature selection, which are described in the following subsections.

7.3.1 Preprocessing

In the preprocessing phase, we sanitize the input text to remove probable noise, which could otherwise mislead classification.

URL and code snippet removal: Natural language texts (e.g., commit and issue comments) generated during software development often include noisy content such as URL references and code snippets, which do not convey any emotion of the writers of those texts. However, those URLs and code snippets may contain emotional words that can easily mislead emotion detection approaches and tools [109]. Moreover, keeping such noise in texts increases the size of the feature vector for an ML classifier. We use a simple regular expression technique to identify and remove all URLs and code snippets from our dataset. Such a simple regular expression technique is found to be effective [151] and has been used for a similar purpose in other studies [109, 111] too.

Removal of numeric expressions: Similar to URLs and code snippets, numbers in texts do not indicate any emotion of the writers and increase the size of the feature vector. Hence, using a regular expression, we identify and remove all numbers from our dataset.

Slang removal: Due to the informal nature of communications among developers, they frequently use slang in their writings. For example, in the comment, "Thanx a lot! Could you tell me, how can I download ... OM 2.2? :," the writer uses the slang 'Thanx' instead of the English word 'Thanks'. We replace such slang in texts with formal English words to reduce the feature vector size.

Stop word removal: Removing stop-words (such as articles, prepositions, and conjunctions) to reduce the number of features is a common practice in ML based techniques. While predicting emotions in texts, such removal of stop-words is highly desirable, as stop-words do not play any significant role in expressing emotions. Although popular natural language processing tools (e.g., Stanford CoreNLP [146] and NLTK [46]) provide lists of stop-words, we use the customized stop-word list presented in Table 7.1. The stop-word lists provided by the popular tools include personal pronouns (e.g., 'he', 'she', and 'my'), temporal terms (e.g., 'now'), booster words (e.g., 'very'), and a few other terms that are used for negation (e.g., 'no', 'not') and for asking questions (e.g., 'why', 'what'), all of which play important roles in expressing emotions [110, 129, 215]. Thus, those types of terms are removed from the stop-word collection to prepare our customized list.

Table 7.1: A customized stop-words list
'it,' 'itself,' 'this,' 'that,' 'these,' 'those,' 'is,' 'are,' 'was,' 'were,' 'be,' 'been,' 'being,' 'have,' 'has,' 'had,' 'having,' 'do,' 'does,' 'did,' 'doing,' 'a,' 'an,' 'the,' 'and,' 'if,' 'or,' 'as,' 'until,' 'while,' 'of,' 'at,' 'by,' 'for,' 'between,' 'into,' 'through,' 'during,' 'to,' 'from,' 'in,' 'out,' 'on,' 'off,' 'then,' 'once,' 'here,' 'there,' 'all,' 'any,' 'both,' 'each,' 'few,' 'more,' 'other,' 'some,' 'such,' 'than,' 'too,' 's,' 't,' 'can,' 'will,' 'don,' 'should'

Name replacement: Developers typically mention their colleagues' names in texts immediately after salutation words such as 'Dear', 'Hi', 'Hello', 'Hellow', or after the character '@' [110]; such names do not convey any emotion but increase the size of the feature vector. Hence, all words that appear after the words 'Dear', 'Hi', 'Hello', 'Hellow' or the symbol '@' are replaced by the single word 'UserName'.

Software-specific named entity replacement: Similar to colleagues' names, software-specific named entities do not express any emotion in texts and increase the size of the feature vector. Hence, we identify the named entities and replace them with the keyword 'NamedEntity'. To identify those named entities, we use the gazetteer [216] prepared for the software engineering domain. The gazetteer includes a total of 400,147 entries divided into five categories, which are presented in Table 7.2. However, all the entries in the API category are detected and removed by the code snippet detection logic.

Table 7.2: Categories of software-specific named entities

Named-Entity Category                                  # of Entries
Programming language (e.g., Java, C)                   419
Platform (e.g., x86, AMD64)                            175
API (e.g., Java ArrayList, toString())                 396,968
Tool-library-framework (e.g., JProfiler, Firebug)      2,196
Software standard (e.g., HTTP, FTP)                    389

Dealing with negations: Generally, a negation word (e.g., 'no', 'never') is used to reverse or weaken the meaning of the word it qualifies [110]. Since ML based classifiers operate on unigram and bigram representations of sentences, those classifiers often fail to identify negated opinions [111]. To overcome that problem, an earlier ML based sentiment analysis tool [125]

adopted a simple approach by prepending 'not' to the words found after a negation word. However, negations only affect verbs, adjectives, and adverbs, but do not alter nouns, determiners, articles, and particles [111]. Thus, we modify only the verbs, adjectives, and adverbs (instead of all the words) found within the scope of a negation word in a sentence by prepending 'not_' to them. To determine the scope of negations, we use chunking or shallow parsing [111] to divide a text into syntactically correlated groups of words. The work of Ahmed et al. [111] also follows the same procedure.

Word tokenizing and stemming: Among the available popular tokenizer tools, e.g., NLTK, Stanford CoreNLP, SyntaxNet [217], and spaCy [218], we use NLTK to tokenize words in texts, as it shows the best performance [219] in the software engineering domain. After tokenization, we apply word stemming [220] to convert each word to its root. We apply the Snowball Stemmer [220], as it is used in another software engineering study [111].

Expansion of contractions: Contractions are shortened forms of groups of words, which are commonly used in informal written communications. When a contraction is written in English, the omitted letters are replaced by an apostrophe. Some frequently used contractions and their expanded forms include: aren't → are not, I'm → I am, and won't → will not. Writing the same thing in two different ways (i.e., using or not using a contraction) increases the size of the feature vector. Thus, expansion of contractions reduces the number of unique lexicons (i.e., features), which, in turn, helps improve ML based classifiers' performances. We expand a total of 124 frequently used contractions [111] when they are found in texts.
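The cleanup steps above can be illustrated with a short Python sketch; it is not the MarValous implementation itself, and the slang map, contraction map, and stop-word set are abbreviated placeholders (stemming, negation marking, and named-entity replacement are omitted for brevity):

    import re

    SLANGS = {"thanx": "thanks", "plz": "please"}                  # abbreviated slang map (illustrative)
    CONTRACTIONS = {"aren't": "are not", "i'm": "i am", "won't": "will not"}  # 3 of the 124 contractions
    STOP_WORDS = {"it", "this", "that", "is", "are", "a", "an", "the", "and", "of"}  # excerpt of Table 7.1

    def preprocess(text):
        text = re.sub(r"https?://\S+", " ", text)                  # URL removal
        text = re.sub(r"@\w+", "UserName", text)                   # name after '@'
        text = re.sub(r"\b(dear|hi|hello|hellow)\s+\w+", r"\1 UserName",
                      text, flags=re.IGNORECASE)                   # name after a salutation word
        text = re.sub(r"\d+(\.\d+)?", " ", text)                   # numeric expressions
        tokens = [t.lower() for t in re.findall(r"[a-zA-Z']+", text)]
        tokens = [CONTRACTIONS.get(t, t) for t in tokens]          # expand contractions
        tokens = [SLANGS.get(t, t) for t in tokens]                # replace slang with formal words
        tokens = [t for t in tokens if t not in STOP_WORDS]        # drop customized stop-words
        return " ".join(tokens)

    print(preprocess("Thanx a lot! Could you tell me, how can I download OM 2.2?"))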

7.3.2 Feature Selection

For machine learning, we identify a set of features in text, which we describe below along with the rationale for why they can be useful in emotion classification.

n-gram: An n-gram is a contiguous sequence of n items from a given piece of text. The items can be phonemes, syllables, letters, words, or base pairs, and n can be any natural number. In our work, an item refers to a word in a given text and n ranges from one to two, i.e., we use unigrams (where n = 1) and bigrams (where n = 2) as features. For example, if a given text consists of the words w1 w2 w3, then the unigram features will contain each of the words, i.e., w1, w2, and w3, and the bigram features will contain the pairs of words, i.e., w1 w2 and w2 w3. We compute TF-IDF (Term Frequency - Inverse Document Frequency) [221] for each of the unigram and bigram features. The n-gram method is language independent and works well on noisy text [222]. Thus, it suits the informal and noisy nature of software engineering texts very well.

Emoticons: In informal written communications, emoticons are frequently used to express different emotions of the writers. Hence, emoticons have been commonly used to classify emotions in

several studies [129, 111, 105, 128]. We use a total of 38 emoticons in this work and categorize them into the four quadrants according to their emotions, as presented in Table 7.3 (see the first and third columns). This categorization of emoticons is used in other studies [129, 198] too. We use a binary feature (e.g., hasEmoticon) that records the existence of emoticons in a given text. Moreover, to reduce the feature vector size, we replace the 38 emoticons in texts with four keywords according to their emotions. For example, if we find an emoticon that expresses the emotion excitation, then that emoticon is replaced with the keyword 'Excited'. All four such keywords are listed in the second column of Table 7.3 with respect to their emotions.

Table 7.3: Emoticons and interjections expressing different emotions

Excitation (keyword 'Excited'):
  Emoticons: :*, :?>, :x, :D, :)), 0:), @};-, =P∼, :), :">
  Interjections: 'Gee', 'Hurray', 'Ooh', 'Oooh', 'Wee', 'Wow', 'Yah', 'Yeah', 'Yeehaw'
Stress (keyword 'Stressed'):
  Emoticons: :((, X-(, :O, =;, :-/, (:|, >:P
  Interjections: 'Aah!', 'Aaah', 'Argh', 'Augh', 'Bah', 'Boo', 'Boo!', 'Booh', 'Eek', 'Eep', 'Grr', 'Yikes'
Depression (keyword 'Depressed'):
  Emoticons: :(, :-$, :-&, 8-), :-<, (:|, :-S, I-), :|
  Interjections: 'Duh', 'Doh', 'Eww', 'Gah', 'Humph', 'Harumph', 'Oops', 'Oww', 'Ouch', 'Sheesh', 'Jeez', 'Yick'
Relaxation (keyword 'Relaxed'):
  Emoticons: B-), :>, :P, ;;), ;), =D, ;)), :-?, [-o<, /:)
  Interjections: 'Ahh', 'Phew'

Interjections: Interjections are a special part of speech (POS), which are also frequently used to express emotional states [129]. As presented in the fourth column of Table 7.3, we use a total of 37 interjections that are categorized into four emotions according to their meanings in an earlier work [129]. Similar to emoticons, we use another binary feature (e.g., hasInterjection) that identifies the presence of interjections in a given text. Again, to reduce the feature vector size, we replace the interjections with the four keywords presented in the second column of Table 7.3 (similar to the replacement process for emoticons) according to their emotions.

Exclamation marks: Writers often use exclamation marks when they want to express intense feelings [105, 128]. For example, the comment, "I will fix it immediately!," expresses a high level of stress of the writer. On the other hand, the comment, "Yonik, Thanks and Congratulations!," indicates a high level of excitation. In both cases, the writers of the comments use exclamation marks to put more emphasis on their feelings. Thus, we use a binary feature (e.g., hasExclamation) that captures the presence of exclamation marks in texts.

Uppercase words: To put greater emphasis on emotional expressions, writers sometimes write a few words using all uppercase/capital letters [129, 128] (e.g., GOOD, AWESOME, and BAD). For example, in the comment, "SORRY Oliver, this is really my fault," the writer expresses a higher level of sadness by writing the word 'Sorry' using all capital letters. We identify whether any word is written in all capital letters in a piece of text and record that information as a binary feature (e.g., hasAllCapitalLettersWord). In many cases, API names and code elements are written in all capital letters; these are discarded in the preprocessing phase (see Subsection 7.3.1).

Elongated words: Similar to uppercase words, writers use elongated words (e.g., Goood, Hurraaaay) to express intense emotions in informal written communications [128]. We identify the existence of such elongated words in texts and record that information as a binary feature (e.g., hasElongatedWord). Similar to contractions, elongated words also adversely contribute to increasing the size of the feature vector. For example, the elongated word 'Goood' would be considered a unique word/feature, although it is a deviated form of the English word 'Good'. To reduce the size of the feature vector, we correct the spellings of such identified elongated words found in texts.

Use of +1 and −1 in sentences: It is a common practice for developers to put +1 and −1 in comments while discussing technical issues among themselves in Stack Overflow and JIRA. When a developer likes or agrees with a colleague on an issue, he puts +1 while commenting on that issue [223]. For example, in the comment "+1, the new patch looks good," the writer uses +1 (at the beginning) to express his positive disposition toward a patch generated by his colleague. Thus, +1 in a comment indicates positive emotion, while −1 indicates the opposite. We identify the presence of +1 and −1 in a text and create two binary features: (i) hasPlusOne and (ii) hasMinusOne.
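These features can be assembled, for example, by concatenating TF-IDF n-gram vectors with the binary indicators. The following sketch is an assumption of how such features might be wired together with scikit-learn and SciPy, not the released MarValous code; the emoticon and interjection features are omitted for brevity, and the regular expressions are illustrative:

    import re
    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer

    EXCLAMATION = re.compile(r"!")
    ELONGATED = re.compile(r"(\w)\1{2,}")          # three or more repeated letters, e.g., "Goood"
    ALL_CAPS = re.compile(r"\b[A-Z]{2,}\b")        # preprocessing is assumed to have removed API names

    def binary_features(texts):
        rows = []
        for t in texts:
            rows.append([
                int(bool(EXCLAMATION.search(t))),  # hasExclamation
                int(bool(ELONGATED.search(t))),    # hasElongatedWord
                int(bool(ALL_CAPS.search(t))),     # hasAllCapitalLettersWord
                int("+1" in t),                    # hasPlusOne
                int("-1" in t),                    # hasMinusOne
            ])
        return csr_matrix(np.array(rows))

    tfidf = TfidfVectorizer(ngram_range=(1, 2))    # TF-IDF over unigrams and bigrams

    def build_matrix(texts, fit=False):
        X_ngram = tfidf.fit_transform(texts) if fit else tfidf.transform(texts)
        return hstack([X_ngram, binary_features(texts)])

    X_train = build_matrix(["I will fix it immediately!", "+1, the new patch looks good"], fit=True)
    print(X_train.shape)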

7.3.3 Algorithm Selection

There are many supervised ML algorithms available [214] for classification problems. Among those, we select scikit-learn's [214] implementations of the following nine ML algorithms, as they are popular and frequently used in sentiment/emotion classification: (a) Adaptive Boosting (AB) [105], (b) Decision Tree (DT) [105, 123], (c) Gradient Boosting Tree (GBT) [124], (d) K-nearest Neighbors (KNN) [105], (e) Naive Bayes (NB) [105, 125], (f) Random Forest (RF) [126], (g) Multilayer Perceptron (MLP) [127], (h) Support Vector Machine with Stochastic Gradient Descent (SGD) [123], and (i) Linear Support Vector Machine (SVM) [105, 128].
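For reference, these selections map onto scikit-learn estimators roughly as follows. This is a sketch with default hyper-parameters; the exact classes and settings used in MarValous are not specified here, and the Naive Bayes variant in particular is an assumption:

    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.neural_network import MLPClassifier
    from sklearn.linear_model import SGDClassifier
    from sklearn.svm import LinearSVC

    CLASSIFIERS = {
        "AB":  AdaBoostClassifier(),
        "DT":  DecisionTreeClassifier(),
        "GBT": GradientBoostingClassifier(),
        "KNN": KNeighborsClassifier(),
        "NB":  MultinomialNB(),          # a Naive Bayes variant suited to sparse, non-negative text features
        "RF":  RandomForestClassifier(),
        "MLP": MLPClassifier(),
        "SGD": SGDClassifier(),          # linear SVM trained with stochastic gradient descent (hinge loss)
        "SVM": LinearSVC(),
    }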

7.4 Evaluation

We use precision (℘), recall (ℜ), and F-score (Ⅎ) to measure the accuracy of emotion detection of MarValous for each of the five emotional states (as described in Section 7.2). Given a set T of texts, the precision (℘), recall (ℜ), and F-score (Ⅎ) for a particular emotional state e are calculated as follows:

    ℘ = |T_e ∩ T'_e| / |T'_e|,    ℜ = |T_e ∩ T'_e| / |T_e|,    Ⅎ = (2 × ℘ × ℜ) / (℘ + ℜ),

where e ∈ {excitation, stress, depression, relaxation, neutral}, T_e represents the set of texts expressing the emotional state e, and T'_e denotes the set of texts for which MarValous identifies the emotional state e.
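In practice, these per-state metrics can be computed with standard tooling; a usage sketch with scikit-learn follows, in which the gold and predicted label lists are hypothetical:

    from sklearn.metrics import precision_recall_fscore_support

    LABELS = ["excitation", "stress", "depression", "relaxation", "neutral"]
    gold = ["stress", "neutral", "excitation", "relaxation", "neutral"]       # hypothetical annotations
    pred = ["stress", "excitation", "excitation", "relaxation", "neutral"]    # hypothetical tool output

    p, r, f, _ = precision_recall_fscore_support(gold, pred, labels=LABELS, zero_division=0)
    for label, pi, ri, fi in zip(LABELS, p, r, f):
        print(f"{label:<11} precision={pi:.2%} recall={ri:.2%} F-score={fi:.2%}")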

7.4.1 Dataset

There is only one publicly available dataset in which software engineering texts are manually annotated with the four emotions excitation, stress, depression, and relaxation; it was created by Islam and Zibran [129]. The emotion-wise numbers of comments in that dataset are mentioned in the second column of Table 7.4.

Table 7.4: Number of comments in categories of emotions

Emotion       Islam and Zibran [129]   Novielli et al. [130]   Total # of Comments
Excitation    411                      1,709                   2,120
Stress        252                      988                     1,240
Depression    289                      230                     519
Relaxation    227                      0                       227
Neutral       616                      400                     1,016
Overall total number of comments: 5,122

The number of comments is not adequate to train and test an ML classifier [215]. Hence, we increase the number of comments by leveraging the dataset created by Novielli et al. [130]. They released a dataset of 4,800 questions, answers, and comments collected from Stack Overflow, which are manually annotated with six basic emotions, namely love, joy, anger, sadness, fear, and surprise. According to the emotion classification model in the two-dimensional approach [212] (as seen in Figure 7.1), the emotions love and joy fall in the first quadrant, the emotions anger and fear fall in the second quadrant, and the emotion sadness falls in the third quadrant. Hence, we select those comments that are annotated with the emotions love, joy, anger, fear, and sadness and assign to each of those comments the representative emotion of the quadrant to which it belongs. In some cases, an expression of surprise can be positive, while in other cases it can convey a negative sentiment/valence [110]. As the polarities along the valence dimension are not defined for the surprise comments in the dataset of Novielli et al., we exclude those comments to minimize ambiguity. Finally, we randomly pick 400 neutral comments from that dataset. For each of the emotions, the number of comments collected from the dataset of Novielli et al. is mentioned in the third column of Table 7.4. The combined dataset consists of 5,122 comments.
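The mapping just described can be expressed as a small lookup table; the sketch below is illustrative, and the label spellings are assumptions about how the two datasets encode their categories:

    # Quadrant-representative labels for the discrete emotions (surprise is dropped).
    DISCRETE_TO_QUADRANT = {
        "love": "excitation",     # Q1
        "joy": "excitation",      # Q1
        "anger": "stress",        # Q2
        "fear": "stress",         # Q2
        "sadness": "depression",  # Q3
        "neutral": "neutral",
    }

    def relabel(comments):
        """Keep only comments whose label maps to a quadrant-representative state."""
        return [(text, DISCRETE_TO_QUADRANT[label])
                for text, label in comments
                if label in DISCRETE_TO_QUADRANT]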

7.4.2 Evaluation of ML Algorithms

Here we seek to identify which ML algorithm shows the best performance on the dataset. We use 10-fold cross-validation to validate each of the algorithms, where the dataset is randomly divided into 10 groups and each of the ten groups is used as the test set once, while the remaining nine groups are used to train the classifier.
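A minimal sketch of this evaluation loop follows; it assumes the feature matrix X, the labels y, and the CLASSIFIERS mapping from the earlier sketches are available, and the fold/scoring configuration is illustrative rather than the exact MarValous script:

    from sklearn.model_selection import KFold, cross_validate

    # X, y: feature matrix and emotion labels prepared as in Section 7.3 (assumed to exist).
    folds = KFold(n_splits=10, shuffle=True, random_state=0)     # random 10-way split
    scoring = ["precision_macro", "recall_macro", "f1_macro"]
    for name, clf in CLASSIFIERS.items():
        scores = cross_validate(clf, X, y, cv=folds, scoring=scoring)
        print(name,
              "P = {:.2%}".format(scores["test_precision_macro"].mean()),
              "R = {:.2%}".format(scores["test_recall_macro"].mean()),
              "F = {:.2%}".format(scores["test_f1_macro"].mean()))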

Table 7.5: Comparison of ML algorithms in classification of emotional states

Emotion    Metric   AB       DT       GBT      KNN      NB       RF       MLP      SGD      SVM
Excit.     ℘        80.91%   86.82%   90.12%   82.31%   50.92%   80.67%   89.62%   85.88%   89.65%
           ℜ        88.47%   86.98%   93.13%   89.36%   98.72%   90.64%   92.34%   93.60%   93.77%
           Ⅎ        84.48%   86.88%   91.59%   85.68%   67.13%   85.31%   90.96%   89.28%   91.65%
Stress     ℘        55.08%   63.70%   79.67%   75.63%   85.94%   70.42%   74.61%   78.50%   75.81%
           ℜ        44.79%   54.63%   71.26%   40.33%   22.31%   49.19%   75.55%   69.80%   76.99%
           Ⅎ        48.81%   58.64%   75.10%   52.39%   35.26%   57.83%   74.96%   73.44%   76.32%
Sad        ℘        61.97%   56.01%   83.97%   59.26%   20.00%   63.74%   70.83%   74.56%   78.27%
           ℜ        25.26%   51.25%   63.66%   56.52%   00.37%   42.91%   62.25%   62.32%   60.73%
           Ⅎ        34.88%   53.17%   72.20%   57.36%   00.72%   51.13%   66.02%   67.27%   68.14%
Relax.     ℘        54.20%   64.38%   84.77%   67.41%   00.00%   80.07%   82.22%   80.35%   86.77%
           ℜ        52.07%   59.17%   71.37%   62.72%   00.00%   51.65%   75.62%   76.38%   77.15%
           Ⅎ        52.46%   61.22%   77.04%   64.48%   00.00%   62.21%   78.51%   77.29%   81.29%
Neutral    ℘        58.37%   68.86%   76.83%   59.56%   75.51%   67.54%   86.18%   86.33%   86.35%
           ℜ        77.45%   84.06%   88.94%   84.96%   50.21%   91.61%   82.09%   84.38%   87.99%
           Ⅎ        65.42%   75.65%   84.45%   69.91%   59.96%   77.68%   84.00%   85.33%   87.09%
Overall    ℘        62.10%   67.95%   83.07%   68.83%   46.47%   72.49%   80.69%   81.12%   83.37%
average    ℜ        57.61%   67.22%   77.67%   66.78%   34.32%   65.20%   77.57%   77.29%   79.33%
accuracy   Ⅎ        57.21%   67.11%   80.08%   65.96%   32.61%   66.83%   78.89%   78.52%   80.90%

Table 7.6: Comparison of features in MarValous

Emotion      Metric   All feat.   Feature subsets (see note)
Excitation   ℘        89.65%      88.03%   87.81%   88.42%   88.17%
             ℜ        93.77%      94.81%   94.77%   94.51%   94.34%
             Ⅎ        91.65%      91.27%   91.14%   91.35%   91.14%
Stress       ℘        75.81%      72.36%   72.30%   72.70%   72.47%
             ℜ        76.99%      74.85%   75.13%   76.17%   74.89%
             Ⅎ        76.32%      73.41%   73.62%   74.31%   73.52%
Sad          ℘        78.27%      69.08%   69.71%   70.53%   70.21%
             ℜ        60.73%      56.23%   55.20%   55.72%   56.11%
             Ⅎ        68.14%      61.85%   61.32%   62.14%   61.94%
Relaxation   ℘        86.77%      88.10%   88.18%   88.05%   88.96%
             ℜ        77.15%      73.79%   76.26%   76.66%   75.65%
             Ⅎ        81.29%      79.91%   81.46%   81.56%   81.57%
Neutral      ℘        86.35%      89.58%   89.29%   88.97%   88.53%
             ℜ        87.99%      82.37%   82.17%   83.15%   83.88%
             Ⅎ        87.09%      85.76%   85.52%   85.88%   86.11%
Overall      ℘        83.37%      81.43%   81.46%   81.73%   81.67%
average      ℜ        79.33%      76.41%   76.70%   77.24%   76.97%
accuracy     Ⅎ        80.90%      78.44%   78.61%   79.05%   78.86%
Note: the feature subsets are formed from {unigram and bigram}, {emoticons and exclamation mark}, and {all capital letters, elongated word and interjection}; the first subset column uses the n-gram features only.

For each of the ML algorithms, we run MarValous on the dataset and compute the averages of the precision, recall, and F-score values over the 10 folds for each of the emotional states. The computed metric values are presented in Table 7.5. For each ML algorithm, the overall average precision, recall, and F-score across all the emotional states are presented in the last three rows.

In Table 7.5, we see that SVM obtains the highest F-score for each of the emotional states except depression. For the emotions depression and relaxation, GBT and SVM respectively obtain the best performance for all the metrics. For the emotions excitation and stress, GBT shows the best results for precision, while NB and SVM obtain the highest recall values for those particular emotions, respectively. Although NB performs relatively well in detecting the emotions excitation, stress, and neutral, which have higher numbers of comments, it performs very poorly in detecting relaxation and depression, which have fewer comments. For neutral comments, RF obtains the highest recall value, whereas SVM obtains the highest precision value, followed by SGD.

Notably, the SGD and MLP algorithms show promising results, although their overall average accuracies are lower than those obtained by the SVM and GBT algorithms. SVM performs the best in terms of overall average precision, recall, and F-score (see the last three rows of the last column in Table 7.5), which indicates that the algorithm shows steady performance across all the emotional states. Hence, among the nine ML algorithms, we select SVM as the default for our MarValous tool and use it in our subsequent analyses.

7.4.3 Evaluation of Features in MarValous

The selection and quality of the features representing each class affect the accuracy and efficiency of classification by ML algorithms [224]. Identifying and removing irrelevant and unnecessary features increases learning accuracy and improves the comprehensibility of results. Toward that end, we measure the importance of the seven features of MarValous in classifying the four emotional states of the comments in our dataset. First, based on the similarities of the features, we divide them into four disjoint sets: (i) unigram and bigram, (ii) emoticons and exclamation mark, (iii) all-capital letters, elongated words and interjections, and (iv) plus one and minus one. Then, for various combinations of those feature sets, we run MarValous on our dataset. The combinations of those features and the computed results for each combination are presented in Table 7.6.

As seen in Table 7.6, the precision, recall, and F-score values are always higher when MarValous is operated with all features, except in four cases: (i) the highest recall value for comments belonging to the excitation emotion is obtained using only the n-gram features, and (ii) the best precision and F-score values for the emotion relaxation and (iii) the highest precision value for neutral comments are obtained with combinations of only two of the feature sets. However, those exceptions do not prevent the combination of all features from obtaining the highest overall average accuracies (see the last three rows of the third column in Table 7.6).

By observing the differences between the metric values presented in the third and fourth columns of Table 7.6, we see that the n-gram features (i.e., unigram and bigram) contribute the most significant part of the accuracy of our tool MarValous. Although the other features' contributions are not as large, they play their roles in increasing the overall accuracy of the tool. As the SVM algorithm with all the seven features has shown the best performance, we have released MarValous with SVM as its default algorithm and all those features enabled.

7.4.4 Comparison with DEVA

We compare our tool's accuracy with that of DEVA [129]. Until the public release of our MarValous, DEVA is the only available SE domain-specific tool for the detection of the four emotional states that our tool also detects. While DEVA uses a lexicon-based approach, our MarValous relies on ML techniques.

To ensure a fair comparison between the tools, we randomly divide the dataset into two subsets: (i) a training set containing 70% (i.e., 3,586) of the comments and (ii) a test set containing the remaining 30% (i.e., 1,536) of the comments. The training set is used to train our ML-based tool MarValous, and the test set is used to compute and compare the performances of DEVA and MarValous.

We separately operate DEVA and MarValous to detect the emotional states in each of the comments in the test set. Then, we compute the precision (℘), recall (ℜ), and F-score (Ⅎ) of their detection of each emotional state (i.e., excitation, stress, depression, relaxation, and neutral), as presented in Table 7.7. The overall average precision, recall, and F-score across all the emotional states are presented in the bottom three rows.

We see in Table 7.7 that MarValous outperforms the baseline tool DEVA in all cases by a large margin, except for the recall values of the stressed and neutral comments and the precision value of the comments belonging to the excitation emotion. MarValous maintains a proper balance between precision and recall for each emotional state, resulting in a higher F-score for each emotional state. Overall, on average, across all the emotions, MarValous clearly outperforms the baseline, as it achieves 19.04% higher precision and 8.19% higher recall compared to DEVA.

Statistical significance. We apply a non-parametric McNemar's test [153, 110] at significance level α = 0.05 to verify the statistical significance of the difference between the results obtained by the two tools, DEVA and MarValous. As the non-parametric test does not require a normal distribution of the data, it suits our purpose well. We perform a McNemar's test on a 2 × 2 contingency matrix (Table 7.8) derived from the results obtained from the two tools. A number/frequency in a cell is denoted as n_xy, where x and y denote

Table 7.7: Comparison between DEVA and MarValous

Emotion       Metric   DEVA     MarValous
Excitation    ℘        91.26%   86.86%
              ℜ        78.07%   93.36%
              Ⅎ        84.15%   89.99%
Stress        ℘        43.25%   79.82%
              ℜ        70.32%   56.13%
              Ⅎ        53.56%   65.91%
Depression    ℘        36.92%   90.00%
              ℜ        55.81%   73.26%
              Ⅎ        44.44%   80.77%
Relaxation    ℘        75.10%   75.31%
              ℜ        48.35%   76.08%
              Ⅎ        58.82%   75.70%
Neutral       ℘        74.41%   84.18%
              ℜ        93.38%   88.08%
              Ⅎ        82.82%   86.08%
Overall       ℘        64.19%   83.23%
average       ℜ        69.19%   77.38%
accuracy      Ⅎ        64.76%   79.69%

Table 7.8: Contingency matrix of McNemar's test

n00 = 146   # of comments misclassified by both a and b
n01 = 113   # of comments misclassified by b but not by a
n10 = 293   # of comments misclassified by a but not by b
n11 = 984   # of comments correctly classified by both a and b
Here, a = DEVA and b = MarValous

the row and column numbers, respectively, of a cell in the contingency table. According to the table, MarValous (b) performs better than DEVA (a), as n10 > n01. The difference in performance is found to be statistically significant, with p-value = 2.69 × 10^-11, where p < α. We, therefore, conclude that the observed superior performance of our MarValous over DEVA is statistically significant.
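The test statistic can be reproduced from the two disagreement cells of Table 7.8. The sketch below applies the chi-square form of McNemar's test with a continuity correction; the exact test variant used in the dissertation is not stated, so the resulting p-value may differ slightly from the one reported above:

    from scipy.stats import chi2

    n01, n10 = 113, 293                                   # disagreement cells from Table 7.8
    statistic = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)   # McNemar chi-square with continuity correction
    p_value = chi2.sf(statistic, df=1)
    print(f"chi-square = {statistic:.2f}, p = {p_value:.3g}")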

7.5 Limitations and Threats to Validity

To have a large dataset, we combine the two datasets of Islam and Zibran [129] and Novielli et al. [130], which were created following the two-dimensional and discrete emotional models, respectively. While those two emotional models differ from each other, we map the categories of emotions from the discrete emotional model to the two-dimensional model (see Section 7.4.1), which can be questioned. To validate our mapping process, we randomly pick 20 comments from each category of emotions and manually verify the correctness of the mappings.

Although we combine two datasets to increase the size of the used dataset, the number of comments (5,122 in total) is still low for training an ML-based classifier. Thus, the generalizability of MarValous can be questioned. However, the high performance of MarValous on the combined dataset, which consists of comments collected from two different data sources (JIRA issue comments and Stack Overflow comments), gives us confidence that it will perform well enough on other types of software engineering textual artifacts too. Moreover, the combined dataset is not perfectly balanced, as the emotion relaxation accounts for only 4.43% of all comments. However, such imbalance in the dataset does not appear to adversely affect the performance of MarValous.

While many ML algorithms are available, we have chosen the nine selected algorithms based on their popularity in the community. Moreover, in the selection of those algorithms, we ensure that the selection encompasses a variety of ML algorithms, e.g., Generalized Linear Models (GLM), Support Vector Machine (SVM), Stochastic Gradient Descent (SGD), Nearest Neighbors (NN), ensemble methods, and others. There is still scope to examine the performance of other ML algorithms.

Overfitting can be a common threat to any ML-based tool. To make sure there is no such overfitting of MarValous' accuracy, we use ten-fold cross-validation to measure the accuracy of the tool. Moreover, we verify the superior performance of MarValous over DEVA using the 30% of the comments that were never used in the training phase. Such precautions should have limited the threat of overfitting.

The lists of emoticons, interjections, and slangs might not include all available such items. Missing items from those lists might have restricted the performance of MarValous. Keyword-based detection of API names in texts might not be 100% accurate either. However, these limitations also apply to lexicon-based approaches (such as DEVA). We plan to minimize these limitations in our future work.

7.6 Related work

Earlier research related to sentiment analysis in software engineering text mostly used three tools: SentiStrength [215], Stanford NLP [146], and NLTK [46]. SentiStrength is the most frequently used [110] among those three. All of the aforementioned tools were developed and trained to operate on general-purpose texts (e.g., movie and hotel reviews), and they do not perform well enough when operated in a particular domain such as software engineering [109, 128]. Sentiment analyses using these tools are misleading mainly due to variations in the meanings of domain-specific technical terms [110] and the high amount of noise (e.g., code snippets, URLs, and API names) [110, 111, 128].

Blaz and Becker [163] proposed three lexicon-based methods, consisting of a dictionary method, a template method, and a hybrid method, for sentiment analysis in job submission tickets related to Information Technology (IT). Those three techniques were tested on formally written texts generated in a closed official environment and may not perform well in dealing with the informal and noisy texts frequently used in software engineering artifacts such as commit and code review comments [110].

SentiStrength-SE [110], Senti4SD [128], and SentiCR [111] are three recent sentiment analysis tools especially designed for software engineering text. However, all the aforementioned tools can detect valence (aka sentiment) only and cannot detect other emotional states at a finer-grained level.

To detect emotions at deeper levels, Murgia et al. [185] constructed a Machine Learning (ML) classifier to identify six emotions (joy, love, surprise, anger, sad, and fear) in issue comments related to software development. In another work, Calefato et al. [56] developed an ML-based toolkit named EmoTxt to detect those six emotions in technical posts on Stack Overflow. However, neither of these techniques is capable of detecting the emotional states in the well-established bi-directional emotional model that includes both the valence and arousal dimensions [129].

To address the above-mentioned issue, Islam and Zibran [129] developed a dictionary and rule-based tool, DEVA, to identify the four emotional states excitation, stress, depression/sadness, and relaxation in the bi-directional emotional model. Our work is inspired by DEVA. The lexical approach of DEVA is based on limited-size dictionaries and a set of predefined rules. Thus, the performance of DEVA is inherently limited by the quality and size of the dictionaries [187]. Moreover, DEVA will fail to correctly classify emotions when it encounters certain textual structures and content that are not covered by its rules and dictionaries. To overcome such limitations, in MarValous we have used ML techniques, and we have also demonstrated that our ML approach performs substantially better than the lexical approach of DEVA.

TensiStrength [122] is a tool released before DEVA, and it can detect stress and relaxation in texts, but cannot capture excitation or depression. Unlike MarValous and DEVA, TensiStrength is not particularly designed for any specific domain, and thus produces inferior performance compared

to DEVA in dealing with software engineering texts [129]. Our MarValous substantially outperforms DEVA, as found from the quantitative comparison in this work.

7.7 Summary

In this chapter, we have presented MarValous, which is the first Machine Learning (ML) based tool especially designed for software engineering text to detect the individual emotional states excitation, stress, depression, relaxation, and neutrality. Using nine preprocessing steps and seven features, we have developed MarValous to incorporate nine popular and effective supervised ML algorithms for emotion/sentiment analysis.

For evaluating the ML algorithms in MarValous, we have combined two existing manually annotated datasets consisting of 5,122 comments collected from JIRA and Stack Overflow. From a quantitative evaluation using this dataset, the Linear Support Vector Machine (SVM) algorithm is found to exhibit the best performance (overall average precision 83.37% and recall 79.33%), followed by GBT. The algorithms SGD and MLP have also shown promising results.

Next, to find an optimal feature set, we have evaluated various combinations of the seven features of MarValous using the best-performing SVM algorithm. We have found that the set of all seven features achieves the best overall average accuracy compared to the selected subsets of those features. As the SVM algorithm with all the seven features has shown the best performance, we have released MarValous with SVM as its default algorithm and all those features enabled.

We have also compared the performance of MarValous (with default settings) against the state-of-the-art tool DEVA. From the quantitative comparisons, MarValous is found to achieve 19.04% higher precision and 8.19% higher recall compared to DEVA. A statistical test has confirmed the significant superiority of MarValous over DEVA. The current release of MarValous and the combined benchmark dataset are freely available [196] for public use.

Creating a larger dataset of comments annotated with the four emotional states remains within our immediate future work. We will put effort into minimizing the limitations of MarValous to improve its performance. We will also conduct empirical studies of emotions and their probable impacts in software engineering using MarValous.

Chapter 8

Understanding and Exploiting Developers’ Sentimental Variations

In the last three chapters (Chapter 4, Chapter 6, and Chapter 7), we described the software engineering domain-specific tools that we have developed for sentiment and emotion analysis to address sub-problem I. Now, we aim to address sub-problem II (i.e., understanding developers' sentiments in software engineering). In this chapter, we present our first study addressing sub-problem II by conducting a quantitative empirical study of the emotional variations in different types of development activities (e.g., bug-fixing tasks) and development periods (i.e., days and times), in addition to an in-depth investigation of the emotions' impacts on software artifacts (i.e., commit messages) and an exploration of scopes for exploiting emotional variations in software engineering activities.

The chapter is organized as follows. In Section 8.1, we introduce the motivation of the work. The methodology of the work is described in Section 8.2. Results of the data analysis are illustrated in Section 8.3. Threats to the validity of the work are discussed in Section 8.4. Related work is described in Section 8.5. Finally, we conclude the chapter in Section 8.6.

8.1 Introduction

Emotions are an inseparable part of human nature, which influence people's activities and interactions, and thus emotions affect task quality, productivity, creativity, group rapport, and job satisfaction [26, 27, 28]. Software development, being highly dependent on human efforts and interactions, is more susceptible to the emotions of its practitioners. Hence, a good understanding of the developers' emotions and their influencing factors can be exploited for effective collaborations and task assignments [29], and in devising measures to boost job satisfaction, which, in turn, can result in increased productivity and project success [30].

Several studies have been performed in the past for understanding the role of human aspects in software development and engineering. Some of those earlier studies address when and why employees get affected by emotions [26, 34, 4, 41, 43], whereas other work addresses how [60, 61, 62, 63, 31] the emotions impact the employees' performance at work. Despite those earlier attempts, software engineering practices still lack theories and methodologies for addressing human

factors such as emotions, moods, and feelings [3, 4]. Hence, the community calls for research on the role of emotions in software engineering [61, 28, 64].

Some software companies try to capture the developers' emotional attachments to their jobs by means of traditional approaches such as interviews and surveys [31]. Capturing emotions with the traditional approaches is more challenging for projects relying on geographically distributed team settings and voluntary contributions (e.g., open-source projects) [33, 34]. Thus, to supplement or complement those traditional sources, software artifacts such as the developers' commit comments/messages have been identified for the extraction of important information including developers' emotional states [34, 4, 41].

In this work, we study the polarity (i.e., positivity, negativity, and neutrality) of emotions expressed in commit messages as posted by developers contributing to open-source projects. In particular, we address the following four research questions.

RQ1: Do developers express different levels (e.g., high, low) and polarities (i.e., positivity, negativity, and neutrality) of emotions when they commit different types (e.g., bug-fixing, new feature implementation, refactoring, and dealing with energy related concerns) of development tasks? If we can distinguish development tasks at which the developers express high negative emotions, low positive emotions, or an overall low emotional involvement, measures can be introduced to positively influence the emotions of the developers working on those particular types of development tasks, resulting in a higher success rate.

RQ2: Can we distinguish a group of developers who express more emotions (positive or negative) in committing a particular type (e.g., bug-fixing) of tasks? Programmers who develop positive emotions while carrying out a given development task can be more efficient and quicker in completing the task [63], resulting in reduced software cost. Thus, distinguishing a group of practitioners having a positive emotional attachment to a particular task can be useful in effective task assignments.

RQ3: Do the developers' polarities (i.e., positivity, negativity, and neutrality) of emotions vary on different days of a week and at different times of a day? If we can identify any particular days and times when developers express significant negative emotions, then managers can take motivating steps to boost the developers' positive feelings on those days and times. Guzman et al. [34] reported that commit comments posted on Mondays tend to have more negative emotions. We also want to verify their claim using a substantially larger dataset.

RQ4: Do the developers' emotions have any impact on the lengths of the commit comments they write? Commit messages are pragmatic means of communication among the developers contributing to the same project. Ideally, commit comments contain important information about the underlying development tasks, and the length of a developer's work description is an indication of the description quality [225]. If any relationship can be found between the developers' emotional state and

the lengths of commit comments, then project managers can take steps to stimulate the developers' emotional states to obtain high quality commit comments containing enough contextual information.

Figure 8.1: Procedural Steps of Our Empirical Study

8.2 Methodology

To address the aforementioned research questions, we extract emotions from the developers' commit messages using SentiStrength [45], a state-of-the-art sentiment analysis tool. SentiStrength was previously used for similar purposes [42, 4, 43] and was reported to be a good candidate for analyzing emotions in commit comments [34].

In the following subsections, we first briefly introduce sentiment analysis with SentiStrength (Section 8.2.1), and then we describe the metrics (Section 8.2.2), the tuning of SentiStrength (Section 8.2.3) for the software engineering context, and the data collection approach (Section 8.2.4) used in our study. The procedural steps of our empirical study are summarized in Figure 8.1.

8.2.1 Sentiment Analysis

Sentiment analysis using SentiStrength on a given piece of text (e.g., a commit message) c computes a pair ⟨ρ_c, η_c⟩ of integers, where +1 ≤ ρ_c ≤ +5 and −5 ≤ η_c ≤ −1. Here, ρ_c and η_c respectively represent the positive and negative emotional scores for the given text c. A given text c is considered to have positive emotions if ρ_c > +1. Similarly, a text is held to contain negative emotions when η_c < −1. Note that a given text can exhibit both positive and negative emotions at the same time, and a text is considered emotionally neutral when the emotional scores for the text appear to be ⟨1, −1⟩. Further details about the sentiment analysis algorithm of SentiStrength and the interpretation of its outputs can be found elsewhere [45].
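This interpretation amounts to a simple rule over the score pair; a minimal Python sketch:

    def polarity(pos_score, neg_score):
        """Classify a text from its SentiStrength pair (pos in [+1, +5], neg in [-5, -1])."""
        has_positive = pos_score > 1
        has_negative = neg_score < -1
        if has_positive and has_negative:
            return "both positive and negative"
        if has_positive:
            return "positive"
        if has_negative:
            return "negative"
        return "neutral"

    print(polarity(5, -1))   # positive
    print(polarity(1, -5))   # negative
    print(polarity(1, -1))   # neutral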

8.2.2 Metrics

To carry out our analyses for deriving the answers to the research questions, we formulate the following metrics. Given a set C of commit messages, we can obtain two subsets C⁺ and C⁻ defined as follows: C⁺ = {c | c ∈ C, ρ_c > +1} and C⁻ = {c | c ∈ C, η_c < −1}.

Mean Positive Emotional Score for a set C of commit messages, denoted as μ⁺(C), is defined as:

    μ⁺(C) = ( Σ_{c ∈ C⁺} ρ_c ) / |C⁺|                                        (8.1)

Mean Negative Emotional Score for a set C of commit comments, denoted as μ⁻(C), is defined as follows:

    μ⁻(C) = ( Σ_{c ∈ C⁻} |η_c| ) / |C⁻|                                      (8.2)

Cumulative Emotional Score for a particular commit message c, denoted as ε(c), is defined as follows:

    ε(c) = ρ'_c + η'_c                                                       (8.3)

where

    ρ'_c = ρ_c if ρ_c > +1, and 0 otherwise;
    η'_c = |η_c| if η_c < −1, and 0 otherwise.
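These definitions translate directly into code; a minimal Python sketch follows, applied to the ⟨ρ_c, η_c⟩ pairs of the example comments later shown in Table 8.1:

    def mean_positive(scores):
        """mu+(C): mean positive score over the commits with a positive score above +1."""
        pos = [p for p, n in scores if p > 1]
        return sum(pos) / len(pos) if pos else 0.0

    def mean_negative(scores):
        """mu-(C): mean absolute negative score over the commits with a negative score below -1."""
        neg = [abs(n) for p, n in scores if n < -1]
        return sum(neg) / len(neg) if neg else 0.0

    def cumulative(p, n):
        """epsilon(c) for a single commit message."""
        return (p if p > 1 else 0) + (abs(n) if n < -1 else 0)

    scores = [(5, -1), (1, -5), (3, -2), (1, -1)]   # pairs from Table 8.1
    print(mean_positive(scores), mean_negative(scores), [cumulative(p, n) for p, n in scores])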

8.2.3 Tuning of SentiStrength

The sentiment analysis tool SentiStrength was reported to have 60.7% precision for positive texts and 64.3% for negative texts [45]. To the best of our knowledge, all such sentiment analysis tools, including SentiStrength, are highly dependent on the polarities of individual words in a given text when computing its emotional scores. SentiStrength was originally trained on documents from the social web. In a technical field such as software engineering, commit messages include many keywords that have polarities in terms of dictionary meanings but do not really express any emotions in their technical context. For example, 'Super', 'Support', 'Value', and 'Resolve' are English words with known positive emotions, while 'Dead', 'Block', 'Default', and 'Garbage' are known to have negative emotions, but none of these words really bear any emotions in software development artifacts. They are simply domain-specific technical terms with special contextual meanings.

To save SentiStrength's computation of emotional scores from being misled by such technical terms, we tune the tool for application in our software engineering context. Based on our manual investigation, experience, and literature review [41, 43], we identify a total of 49 terms that can be misinterpreted by SentiStrength. These misleading terms are: 'Arbitrary', 'Block', 'Bug', 'Conflict', 'Constraint', 'Corrupt', 'Critical', 'Dynamic', 'Dead', 'Death', 'Default', 'Defect', 'Defensive', 'Disabled', 'Eliminate', 'Error', 'Exceptions', 'Execute', 'Failure', 'Fatal', 'Fault', 'Force', 'Garbage', 'Greater', 'Inconsistency', 'Interrupt', 'Kill', 'Like', 'Obsolete', 'Pretty', 'Redundant', 'Refresh', 'Regress', 'Resolve', 'Restrict', 'Revert', 'Safe', 'Security', 'Static', 'Super', 'Support', 'Success', 'Temporary', 'Undo', 'Value', 'Violation', 'Void', 'Vulnerable' and 'Wrong'.

SentiStrength provides the flexibility to modify the emotional interpretation of its existing lexicons to customize it for a target context (i.e., software engineering, in this work). For our purpose, we neutralize SentiStrength's interpretation of the aforementioned technical jargon, as was also suggested in earlier studies in the area [41, 43].
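One way to realize such a neutralization is to rewrite the strengths of the listed terms in a copy of SentiStrength's lexicon before scoring. In the sketch below, the lexicon file name, its tab-separated format, and the neutral strength value are assumptions made for illustration, not SentiStrength documentation:

    DOMAIN_TERMS = {"bug", "error", "kill", "super", "support", "value", "resolve",
                    "dead", "block", "default", "garbage"}     # excerpt of the 49 terms listed above

    def neutralize_lexicon(in_path, out_path):
        """Write a copy of the lexicon with the domain terms set to a neutral strength (assumed encoding)."""
        with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
            for line in src:
                parts = line.rstrip("\n").split("\t")          # assumed: term <TAB> strength
                term = parts[0].rstrip("*").lower() if parts else ""
                if term in DOMAIN_TERMS and len(parts) > 1:
                    parts[1] = "1"                             # assumed neutral strength value
                dst.write("\t".join(parts) + "\n")

    neutralize_lexicon("EmotionLookupTable.txt", "EmotionLookupTable_SE.txt")  # file names are assumptions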

Table 8.1: Examples of commit comments and computation of their emotional scores

Boa Proj. ID 12562083; Revision 7519717434bb0ae5fad5329885bd184e7b502d27
  Commit comment (c): "Fixes #1721 Committing work by Arnfried (EXCELLENT!)"
  Scores: ρ_c = 5, η_c = -1, ε(c) = 5
Boa Proj. ID 689344; Revision 2605951f8b73a963beb01b3806b3ad43ce638848
  Commit comment (c): "Don't save mute setting. Extremely annoying to start with lack of audio and have no idea what causes it"
  Scores: ρ_c = 1, η_c = -5, ε(c) = 5
Boa Proj. ID 11814891; Revision 01845191185d0a14960f1542ac77f512f8749514
  Commit comment (c): "a bit more detailed test; hope this avoids some reflection searches in FF emulation and makes the monster faster in special situations"
  Scores: ρ_c = 3, η_c = -2, ε(c) = 5
Boa Proj. ID 689344; Revision 00058618c9c3f1f9fc4d9310012a0d1881c0c940
  Commit comment (c): "(RMenu) RMenu refactor - have function pointers for menu struct"
  Scores: ρ_c = 1, η_c = -1, ε(c) = 0

Having SentiStrength tuned according to the procedure described above, we manually verify the impact of the tuning using a random sample of 200 commit messages extracted from Boa [131], and we found a 26% increase in precision (checked by comparing SentiStrength's computation of emotional polarities with subjective human interpretation over each of the 200 commit messages). Thus, for our work, we use this improved instance of SentiStrength tuned for use in the software engineering context.

8.2.4 Data Collection

We study commit messages of open-source projects obtained through Boa [131]. Boa is a recently introduced infrastructure with a domain-specific language and public APIs to facilitate mining software repositories. We use the largest (as of February 2016) dataset from Boa, which is categorized as "full (100%)" and consists of more than 7.8 million projects collected from GitHub before September 2015. From this large dataset, we select the top 50 projects having the highest numbers of commits.

We study all the commit messages in these projects, which constitute 490,659 commit comments. Associated information, such as committers, commit timestamps, types of underlying work, revisions, and project IDs, is kept in a local relational database for convenient access and querying. For each of the commit messages, we compute the emotional scores using the tuned SentiStrength tool. Table 8.1 shows some examples of emotional and neutral commit comments in our dataset and the computation of their emotional scores.
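The local database mentioned above can be as simple as a single table; the following sketch stores one placeholder record with SQLite, and the schema and column names are illustrative assumptions rather than the study's actual layout:

    import sqlite3

    conn = sqlite3.connect("commit_messages.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS commit_message (
                        project_id  TEXT,
                        revision_id TEXT,
                        committer   TEXT,
                        commit_time TEXT,
                        is_bug_fix  INTEGER,
                        message     TEXT,
                        pos_score   INTEGER,
                        neg_score   INTEGER)""")
    conn.execute("INSERT INTO commit_message VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 ("project-1", "rev-1", "developer-1", "2015-08-31T12:00:00", 1,
                  "example commit message", 2, -1))            # placeholder row, not real data
    conn.commit()
    conn.close()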

8.3 Analysis and Findings

The research questions RQ1, RQ2, RQ3, and RQ4 are addressed in Section 8.3.1, Section 8.3.2, Section 8.3.3, and Section 8.3.4, respectively.

8.3.1 Emotional Variations in Different Task Types

We investigate whether developers' emotions vary based on their involvement in four different types of software development tasks: (a) bug-fixing tasks, (b) new feature implementation, (c) refactoring, and (d) energy-aware development. We consider the first three types of tasks mentioned above to be self-explanatory. The fourth one (i.e., energy-aware development) deals with software issues concerning the consumption of energy, measured in terms of the usage of resources such as processing power and memory. Energy-aware development is a recent important topic in the area of green computing research. Categorizations of development tasks in this manner are also found in earlier studies [226, 36, 227] in software engineering research.

Task-based Characterization of Commits: To distinguish commits dealing with bug-fixing tasks, we rely on Boa's public APIs, which readily indicate whether or not a commit message is associated with a bug-fixing task.

To identify energy-aware commit messages, we select a list of keywords and search for those keywords in commit messages. A commit message is considered an energy-aware commit if it contains any of the selected keywords. The identified keywords are: *energy consum*, *energy efficien*, *energy sav*, *save energy*, *power consum*, *power efficien*, *power sa*, *save power*, *energy drain*, *energy leak*, *tail energy*, *power efficien*, *high CPU*, *power aware*, *drain*, *no sleep*, *battery life* and *battery consum*. The character '*' in each keyword works as a wildcard, i.e., a query will select those commit messages that contain at least one of these keywords, regardless of the beginning or the end of the commit message. Note that these keywords were also used in earlier studies [36, 228, 229, 227] for similar purposes.

To recognize commit messages dealing with new feature implementation and refactoring tasks, we select the keywords used by Ayalew and Mguniin [226] in their work. The keywords *add* and *new feature* are used to categorize commit messages related to new feature development, and the keywords *refactor* and *code clean* are used to distinguish commit messages posted by developers during code refactoring tasks. Note that a developer may perform refactoring while fixing a bug. Thus, a commit message can be characterized as relevant to more than one category of tasks.
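The keyword-based characterization just described amounts to case-insensitive substring matching, since the '*' wildcards allow arbitrary text on both sides. A sketch with abbreviated keyword lists (the bug-fix flag is assumed to come from Boa's API):

    ENERGY_KEYWORDS = ["energy consum", "energy efficien", "power consum", "battery life", "drain"]
    FEATURE_KEYWORDS = ["add", "new feature"]
    REFACTOR_KEYWORDS = ["refactor", "code clean"]

    def matches_any(message, keywords):
        text = message.lower()
        return any(k in text for k in keywords)      # '*keyword*' wildcard = substring containment

    def categorize(message, is_bug_fix):
        """A commit message may belong to more than one task category."""
        categories = set()
        if is_bug_fix:
            categories.add("bug-fixing")
        if matches_any(message, FEATURE_KEYWORDS):
            categories.add("new feature")
        if matches_any(message, REFACTOR_KEYWORDS):
            categories.add("refactoring")
        if matches_any(message, ENERGY_KEYWORDS):
            categories.add("energy-aware")
        return categories

    print(categorize("Refactor battery drain handling while fixing the leak", is_bug_fix=True))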

Investigation: The numbers of commit messages found relevant to each of the four categories of development tasks are presented in the second column from the left in Table 8.2. The boxplot in Figure 8.2 presents the distribution of the mean positive, negative, and cumulative emotional scores in each type of task for each of the 50 projects. An 'x' mark in a box in the boxplot indicates the mean emotional score over all the projects.

Table 8.2: Commits of different task categories and MWW tests between positive and negative emotions in them

Task Category   # of Commits   P-value   Significant?
Bug-Fixing      117,249        0.03288   Yes (P < α)
New Feature     89,019         0.00256   Yes (P < α)
Refactoring     5,431          0.04006   Yes (P < α)
Energy-Aware    182            0.39743   No (P > α)

Figure 8.2: Distribution of mean positive, negative, and cumulative emotional scores in commit messages dealing with different types of tasks

As observed in Figure 8.2, the emotional scores (positive, negative, and cumulative) for energy-aware commit messages are much higher than those in the commit messages for the three other tasks, and there is not much variation in the emotional scores among those three tasks. To verify the statistical significance of these observations, we conduct Mann-Whitney-Wilcoxon (MWW) tests [132] (with α = 0.05) between the distributions of the mean cumulative emotional scores in commit messages for each possible pair of development tasks. The results of the MWW tests are presented in Table 8.3. The P-values reported by the tests, as compared with α, suggest the statistical significance of our observations.

Table 8.3: MWW tests between cumulative emotional scores of commit messages dealing with different types of tasks

Task Categories   Bug-Fixing   Refactoring   New Feature   Energy-Aware
Bug-Fixing        -            0.75656       0.89656       0
Refactoring       0.75656      -             0.71884       0
New Feature       0.89656      0.71884       -             0
Energy-Aware      0            0             0             -

Again, looking at Figure 8.2, we see that the commit messages posted during new feature implementation tasks show more negative emotions than positive ones. The opposite observation is evident for commit messages of the three other types of tasks. To verify the statistical significance of our observations in the variations of polarity (positivity and negativity) of emotions, for each of the four types of development tasks, we separately conduct MWW tests between the mean positive and negative emotional scores of commit messages. The results of the MWW tests are presented in the right-most two columns in Table 8.2. The P-values of the tests, as compared with α, suggest statistical significance of our observations for bug-fixing, new feature implementation, and refactoring tasks, but not for the energy-aware development tasks. Based on our observations and statistical tests, we derive the answer to the research question RQ1 as follows:

Ans. to RQ1: Developers express significantly high positive and negative emotions almost equally in committing energy-aware tasks. For bug-fixing and refactoring tasks, positive emotions are significantly higher than negative emotions. And surprisingly, for new feature implementation tasks, negative emotions are significantly higher than positive emotions.

8.3.2 Emotional Variations in Bug-Fixing Tasks

It is natural that different developers have different expertise, comfort zones, and interests with respect to types of tasks. The research question RQ2 addresses the possibility of distinguishing a set of developers who particularly express positive emotions at the particular type of task at hand. In addressing the research question RQ2, we choose bug-fixing tasks as a representative of any particular type of task and proceed as such. Across all the projects, we distinguish 20 developers who are the authors of the bug-fixing commit messages having the highest positive mean emotional scores. Let D_p denote the set of these 20 developers. Similarly, we form another set D_n consisting of 20 developers who are the authors of bug-fixing commit comments having the highest negative mean emotional scores. By the union of these two sets, we obtain a set D of 30 developers who are authors of bug-fixing commits with the highest mean positive or negative emotional scores. Mathematically, D = D_p ∪ D_n. These 30 developers are the authors of 112,462 commit messages, among which 32,088 are bug-fixing commits. For each of these 30 developers, we compute a ratio ρ(d) as follows:

(d) (d) = , wℎere, d ∈  (8.4)  (d)

Here, d denotes the set of bug-fixing commit comments posted by developer d. Notice that, for a particular developer d, the ratio (d) close to 1.0 indicates that the positive and negative emotions are almost equal for the developer d. If (d) is much higher than 1.0, the developer d shows more positive emotions at bug-fixing tasks compared to negative emotions. The opposite holds when (d)

is much lower than 1.0. However, a threshold scheme seems necessary to determine when the value of ρ(d) can be considered significantly close to or distant from 1.0.
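Under the notation assumed in Equation 8.4, the per-developer ratio could be computed as in the following sketch; the score pairs are hypothetical, and the mean negative score is taken as an absolute value so that the ratio stays positive.

from statistics import mean

# Hypothetical data: developer id -> (positive, negative) score pairs of that
# developer's bug-fixing commit messages, as produced by the tuned sentiment tool.
bugfix_scores = {
    "dev-01": [(2, -1), (3, -1), (1, -2)],
    "dev-02": [(1, -3), (2, -2), (1, -1)],
}

def rho(pairs):
    """Ratio of mean positive to mean absolute negative emotional score."""
    pos = mean(p for p, _ in pairs)
    neg = mean(abs(n) for _, n in pairs)
    return pos / neg

ratios = {dev: rho(pairs) for dev, pairs in bugfix_scores.items()}
print(ratios)   # rho > 1: more positive than negative emotion at bug-fixing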

Figure 8.3: Hierarchical agglomerative clustering of 30 developers enumerated as 1, 2, 3, … , 30

Clustering Analysis: Instead of setting an arbitrary threshold, we apply unsupervised Hierarchical Agglomerative Clustering to partition the values of ρ(d). The dendrogram produced from this clustering is presented in Figure 8.3. In the dendrogram, we identify three major clusters/groups, two marked (by us) with dotted rectangles and the third left unmarked in the middle. This middle cluster, denoted as Gb, represents the set of those developers who equally express positive and negative emotions during bug-fixing. We have 0.992 ≤ ρ(d) ≤ 1.0, ∀d ∈ Gb. The set of developers included in the right-most cluster exhibit more positive emotions compared to negative emotions during bug-fixing. Let Gp denote the cluster of these developers. Here, 1.005 ≤ ρ(d) ≤ 1.178, ∀d ∈ Gp. The set of developers who render more negative emotions towards bug-fixing belong to the left-most cluster, denoted as Gn. We have 0.919 ≤ ρ(d) ≤ 0.982, ∀d ∈ Gn.
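A sketch of this clustering step is given below, using scipy’s hierarchical clustering on the one-dimensional ρ(d) values; the values themselves are placeholders, and average linkage is an assumption rather than the study’s documented setting.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder rho(d) values for 30 developers (not the study's actual data).
rho_values = np.array([0.92, 0.95, 0.97, 0.99, 1.00, 1.01, 1.08, 1.17, 0.98, 1.00,
                       0.93, 0.96, 1.12, 1.05, 0.99, 1.00, 0.97, 1.15, 0.94, 1.02,
                       0.98, 1.00, 0.99, 1.10, 0.96, 1.01, 0.95, 1.00, 1.06, 0.93])

# Agglomerative clustering over the one-dimensional rho(d) values.
Z = linkage(rho_values.reshape(-1, 1), method="average")

# Cut the dendrogram into three groups (negative-leaning, balanced, positive-leaning).
labels = fcluster(Z, t=3, criterion="maxclust")
for cluster_id in sorted(set(labels)):
    members = rho_values[labels == cluster_id]
    print(cluster_id, members.min(), members.max())

# scipy.cluster.hierarchy.dendrogram(Z) would draw a tree like Figure 8.3
# (plotting requires matplotlib).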

Table 8.4: MWW tests over ρ(d) scores of commit messages written by developers in each cluster

Cluster        Gp            Gn            Gb
P-values       0.00798       0.0268        0.26109
Significant?   Yes (P < α)   Yes (P < α)   No (P > α)

Statistical Significance: For each of the three clusters, we separately conduct MWW tests between the mean positive and negative emotional scores of the commit messages to verify the statistical significance of their differences. The results of the separate MWW tests (with α = 0.05) over each of the clusters are presented in Table 8.4. The P-values in Table 8.4 indicate statistical significance in the differences between positive and negative emotions for clusters Gp and Gn. As expected, no such significant difference is found for the cluster Gb, as in this cluster positive and negative emotions are expressed equally. Thus, our clustering of the developers appears to be accurate with statistical significance. Hence, we answer the research question RQ2 as follows:

Ans. to RQ2: We have been able to distinguish sets of developers who show either high positive or high negative emotions in bug-fixing commit messages, while some other developers are found to express both positive and negative emotions almost equally. The same approach can be applied to distinguish such groups of developers for other types of development tasks.

8.3.3 Emotional Variations in Days and Times

For each of the projects, we group all the commit messages into seven disjoint sets in accordance with the days of the week on which they are committed.


Figure 8.4: Distribution of mean positive, negative, and cumulative emotional scores in commit comments posted in different days of the week

Table 8.5: MWW tests over cumulative emotional scores of commit messages written in different days of the week

Day   Sat    Sun    Mon    Tue    Wed    Thu    Fri
Sat   -      0.44   0.23   0.11   0.33   0.41   0.35
Sun   0.44   -      0.71   0.42   0.77   0.98   0.84
Mon   0.23   0.71   -      0.68   0.79   0.71   0.84
Tue   0.11   0.42   0.68   -      0.55   0.41   0.49
Wed   0.33   0.77   0.79   0.55   -      0.83   0.96
Thu   0.41   0.98   0.71   0.41   0.83   -      0.86
Fri   0.35   0.84   0.84   0.49   0.96   0.86   -

Figure 8.4 plots the average (over each project) positive, negative, and cumulative emotional scores in commit messages posted in different days of a week. Among all the seven days of a week, negative emotions appear to be slightly higher in commit messages posted during the weekends (i.e., Saturday and Sunday) than in those posted on weekdays (i.e., Monday through Friday). Not much difference is visible in the emotional scores for commit messages posted in the five weekdays. MWW tests (with α = 0.05) between the distributions of emotional scores in each possible pair of the days of a week suggest no statistical significance in the differences of emotions. P-values of the MWW tests are presented in Table 8.5. As can be seen in Table 8.5, for all values of P, α < 0.11 ≤ P.

To study the relationship between developers’ emotions and the times of a day when commit comments are posted, we divide the 24 hours of a day into three periods: (a) 00 to 08 hours as before working hours, (b) 09 to 17 hours as regular working hours, and (c) 18 to 23 hours as after working hours. Then, for each project, we again organize the commit messages into three disjoint sets based on their timestamps of posting.
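The grouping of commits by day of the week and by period of the day can be sketched as follows; the timestamps and scores are hypothetical, and the commit timestamps are assumed to be already expressed in the committer’s local time.

from collections import defaultdict
from datetime import datetime

def period_of_day(hour):
    """Map an hour (0-23) to the three periods used in this study."""
    if hour <= 8:
        return "before-work (00-08)"
    if hour <= 17:
        return "working (09-17)"
    return "after-work (18-23)"

# Hypothetical (timestamp, cumulative emotional score) pairs for one project.
commits = [
    (datetime(2013, 3, 4, 10, 15), 2),   # Monday, working hours
    (datetime(2013, 3, 9, 22, 40), 3),   # Saturday, after work hours
]

by_day, by_period = defaultdict(list), defaultdict(list)
for ts, score in commits:
    by_day[ts.strftime("%a")].append(score)
    by_period[period_of_day(ts.hour)].append(score)

print(dict(by_day))      # {'Mon': [2], 'Sat': [3]}
print(dict(by_period))   # {'working (09-17)': [2], 'after-work (18-23)': [3]}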

Figure 8.5: Distribution of mean positive, negative, and cumulative emotional scores in commit comments posted in different periods of the day

Figure 8.5 presents the mean (over each project) positive and negative emotional scores (computed using Equation 8.1 and Equation 8.2) in commit messages posted in these three periods. Again, in Figure 8.5, we do not see much variation in the emotional scores of commit messages posted at different periods. MWW tests (with α = 0.05) between the distributions of mean positive and negative emotional scores in each possible pair of the periods indicate no statistical significance in their differences. P-values of the MWW tests are presented in Table 8.6. Hence, we derive the answer to the research question RQ3 as follows:

Ans. to RQ3: There are no significant variations in the developers’ emotions at different times and days of a week.

Table 8.6: MWW tests over cumulative emotional scores of commit messages written in different times of a day

Hours in a Day   00-08     09-17     18-23
00-08            -         0.59612   0.84148
09-17            0.59612   -         0.85716
18-23            0.84148   0.85716   -

8.3.4 Emotional Impacts on Commit Lengths

To investigate the existence of any relationship between emotions and lengths of commit messages, across all the 50 projects, we distinguish 141,033 commit comments, which are one to 50 words in length and have cumulative emotional scores (computed using Equation 8.3) higher than one. For each project, we organize these emotional commit messages into four disjoint groups based on their lengths as shown in Figure 8.6, which plots the mean (over each project) cumulative emotional scores of commit messages in the four groups. As seen in the figure, the emotional scores are strictly higher for the groups with lengthier commit messages.

Figure 8.6: Distribution of mean cumulative emotional scores of commit comments in groups of different lengths

Table 8.7: Number of commit messages with different lengths (in words) and cumulative emotional scores

Commit Length Groups   # of Commit Comments with Cumulative Emotional Score =
                       02       03      04      05      06    07   08
01-10                  46,486   2,734   2,558   245     28    12   3
11-20                  42,144   3,967   4,627   876     145   16   1
21-40                  22,732   2,633   5,008   1,155   203   31   2
41-50                  3,255    409     1,275   399     84    5    0

Table 8.7 presents the frequencies of commit messages in the four groups having different cumulative emotional scores. A Chi-squared test [132] (P = 2.2 × 10⁻¹⁶, α = 0.05) also strongly indicates statistical significance of the relationship between emotional scores and commit lengths. Next, we verify the significance of the direction of the relationship (i.e., whether one increases or decreases with the increase of the other). Fitting a Generalized Linear Model [132] on the emotional score and length of every emotional commit message confirms (with coefficient +0.01134, P = 2 × 10⁻¹⁶, α = 0.001) the positive correlation between emotional scores and commit lengths. Based on the analyses, we now derive the answer to the research question RQ4 as follows:

Ans. to RQ4: Developers’ emotions have statistically significant impacts on the lengths of the commit messages they write. Developers post longer commit comments when they are emotionally active.
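The two statistical checks described above could be reproduced roughly as in the following sketch. The contingency table is taken from Table 8.7, while the per-commit length and score arrays are hypothetical, and the Gaussian family used for the generalized linear model is an assumption.

import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

# Contingency table from Table 8.7 (rows: length groups 01-10, 11-20, 21-40, 41-50;
# columns: cumulative emotional scores 02..08).
table = np.array([
    [46486, 2734, 2558,  245,  28, 12, 3],
    [42144, 3967, 4627,  876, 145, 16, 1],
    [22732, 2633, 5008, 1155, 203, 31, 2],
    [ 3255,  409, 1275,  399,  84,  5, 0],
])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-squared p-value: {p:.3g}")

# Direction of the relationship: a generalized linear model of emotional score
# on commit length (hypothetical per-commit arrays shown here).
lengths = np.array([5, 12, 18, 25, 33, 41, 47])
emotional_scores = np.array([2, 2, 3, 3, 4, 4, 5])
glm = sm.GLM(emotional_scores, sm.add_constant(lengths),
             family=sm.families.Gaussian()).fit()
print(glm.params, glm.pvalues)   # a positive slope indicates positive correlation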

8.4 Threats to Validity

In this section, we discuss the limitations of our work, the threats to the validity of our findings, and our attempts to minimize those threats.

Internal Validity: The internal validity of our work depends on the accuracy of the tool’s computation of emotional scores. SentiStrength was reported to be effective in sentiment analysis [45] and suitable for extraction of emotions from commit comments [34]. SentiStrength has relatively high accuracy compared to other tools of its kind, and thus it was used for sentiment analysis in earlier work in software engineering research [36, 42, 34, 4, 43]. Moreover, for use in our work, we increased its accuracy in emotion extraction by 26% by tuning the tool for application in the software engineering context (Section 8.2.3). Nevertheless, the tuned tool is not 100% accurate in determining emotional polarities of commit messages, and it was not possible to perform a manual sanity check by going through each of the 490,659 commit messages included in our work. We are aware of this threat, although we minimized it by contextual customization of SentiStrength.

Construct Validity: The choice of the 30 developers in examining the relationships between emotions and bug-fixing tasks (Section 8.3.2) can be questioned. Note that these 30 developers are the authors of more than 112 thousand commit messages (22.85%), which is a large sample of data for dependable analysis. The objective was to check whether it was possible to distinguish a group of developers who are emotionally more active towards a particular type of task. If we chose a fair number of developers other than our choice of 30, we would still be able to distinguish a set of target developers. In that case, the size of the set of developers might differ from what we found using the 30 developers, but this does not invalidate the findings of the work. One may also question the validity of our categorization of the developers’ commits in different days and periods (Section 8.3.3), considering the possibility that the projects and developers may be physically located in different geographic locations and time-zones. However, as we found, most (86%) of the commits are posted on regular weekdays. Moreover, the majority (58%) of the commit messages are written in regular working hours, while 31% and 11% are found to have been posted before and after regular working hours, respectively. The proportions of commits in different days and periods suggest correctness of the categorization. In the analysis of the emotional impacts on the lengths of commit messages (Section 8.3.4), we excluded commit messages longer than 50 words, because we observed that commit messages of larger lengths include copy-pasted content such as SQL statements and code snippets. Such content is not directly created or typed by the committer and thus is unlikely to reflect his or her emotions. For the statistical tests of significance in the variations of different distributions, we used the Mann-Whitney-Wilcoxon (MWW) test [132]. The MWW test is a non-parametric test, which does not

require the data to have a normal distribution. Since the data in our work do not conform to a normal distribution, this particular test suits our purpose well. Moreover, the significance level α is set to 0.05, which is a widely adopted value for this parameter and enables 95% confidence in the results of the MWW tests.

External Validity: The findings of this work are based on our study of more than 490 thousand commit messages across 50 open-source projects. This large data-set yields high confidence in the generalizability of the results.

Reliability: The methodology of data collection, analysis, and results are well documented in this chapter. The sentiment analysis tool, SentiStrength [45], is freely available online, and the projects studied in this work are also freely accessible through Boa [131]. Hence, it should be possible to replicate this study.

8.5 Related Work

To explore the impacts of emotions on the debugging performance of software developers, Khan et al. [61] used high-arousal-invoking and low-arousal-invoking movie clips to trigger different levels of emotions in developers before having them perform some debugging tasks. However, they did not employ any measurement to extract and quantify the developers’ emotional states, and relied on the assumption that watching those movie clips would induce different levels of emotions in the developers. Lesiuk [62] recruited 56 software engineers to understand the impact of emotions on software design performance. In her work, music was played to arouse developers’ positive emotions. The participants self-assessed their emotional states and design performance. Similarly, self-assessment of emotional states was also used in the studies of Wrobel [31] and Graziotin et al. [60]. While the human participants themselves can be expected to accurately report their emotional states, such self-assessment based approaches suffer from the possibility that the participants might be uncomfortable in disclosing their negative emotional states. Biometric measurements such as multi-sensor inputs [32] and audio and video processing [230] do not suffer from such difficulties, but they can be logistically expensive and difficult for regular use at the workplace without disrupting the natural workflow of the practitioners. Both the self-assessment-based and biometric approaches for identification of emotions are difficult (if not impossible) to apply to geographically distributed teams and for extraction of emotions from software artifacts of already completed parts of projects. Note that, unlike our work, all of the research mentioned above focused on understanding the overall emotional impacts on human performance and indicated positive correlation between them. In contrast, ours includes a deeper analysis exploring the impacts and scopes for exploitation of emotions extracted from textual software artifacts such as commit messages. Several other studies also identify developers’ emotions from textual software artifacts. In such a study, Murgia et al. [63] reported that issue reports which express positive emotions take less time to be resolved. They used human raters to identify emotions in issue reports, and thus their work is subject to human errors. Unlike theirs, using the automatic tool SentiStrength, we identify emotions in a significantly larger number of commit messages. The automatic tool SentiStrength was also used in the studies of Guzman and Bruegge [4], Tourani et al. [43], Garcia et al. [42], Guzman et al. [34], and in the work of Chowdhury and Hindle [36]. But none of these works tuned the tool before application in the software engineering context, as we did in our work. Guzman and Bruegge [4] identified emotions in collaboration artifacts to relate them with different development topics. In a separate study, using SentiStrength, Guzman et al. [34] extracted emotions expressed in 60,425 commit messages and reported that commit comments written on Mondays tend to have more negative emotions compared to Sunday, Tuesday, and Wednesday. However, from the investigation of the same phenomenon using a substantially larger dataset of 490,659 commit messages, our study does not identify any statistically significant variations of emotions in commit comments posted in different days of a week. Using a Natural Language Toolkit (NLTK), Pletea et al.
[41] mined developers’ emotions from 60,658 commits and 54,892 pull requests of GitHub projects. They analyzed emotional variations in discussions on different topics and reported to have found higher negative emotions in security-related discussions in comparison with other topics. While their objective, approach, source of emotional content, and method of emotion extraction were different from our work, ours includes a deeper and larger analysis based on a larger number of commit messages and diverse aspects of emotional implications. Using SentiStrength, Tourani et al. [43] extracted emotions from emails of both developers and system users. They observed the differences of emotional expressions between developers and users of a system. Using the same tool, Garcia et al. [42] extracted developers’ emotions from their email contents to analyze any relationships between developers’ emotions and their activities in an open source software project. Although the studies of Tourani et al. [43] and Garcia et al. [42] also used the same sentiment analysis tool we used, the source of their emotional content is different, and the objectives of those works are also orthogonal to ours.

8.6 Summary

In this chapter, we have presented a quantitative empirical study on the characteristics and impacts of emotions extracted from developers’ commit messages. We have studied more than 490 thousand commit comments over 50 open-source projects. Although the majority (65%) of the commit messages are found to be neutral in emotion, surprisingly, positive emotions are found in a much smaller portion (13%) of the commit comments than the portion (22%) containing negative emotions.

In our study, we found that the polarities of the developers’ emotions significantly vary depending on the type of tasks they are engaged in. The developers express equally high positive and negative emotions in committing energy-aware tasks compared to other tasks. With respect to the polarities of commit messages, positive emotions are found to be significantly higher than negative emotions in commits for bug-fixing and refactoring tasks. Surprisingly, the opposite scenario is found for new feature implementation tasks. We also found a significant positive correlation between the lengths of commit messages and the emotions expressed in them. When the developers remain emotionally active, they tend to write longer commit comments. However, we did not find any significant variations in the developers’ emotions in commit messages posted at different times and days of a week.

Based on emotional contents in commit messages, we have also been able to distinguish a group of developers who express more positive emotions in bug-fixing commit messages, another group with the opposite trait, and a third group of developers who equally render both positive and negative emotions in bug-fixing activities. The same approach can be applied to other types of tasks to distinguish potential developers for improved task assignment.

The findings from this work are validated in the light of statistical significance. Although more experiments can be conducted to verify or confirm the findings, the results from this study significantly advance our understanding of the impacts of emotions on software development activities and artifacts, and we exemplify how emotional awareness can be exploited in improving software engineering activities. For the automatic computation of emotional polarities in commit messages, we have used a state-of-the-art tool, SentiStrength, while alternatives exist. Moreover, before applying the tool, we tuned it for our work in the context of software engineering. In the future, we plan to replicate this study using other tools and subjects to further validate the findings of this study. We also plan to conduct more studies on the impacts of emotions extracted from diverse artifacts including program comments, development forums, and email groups.

Chapter 9

Sentiment Analysis in Commit Messages of Buggy Code

In the last chapter (Chapter 8), we presented a quantitative empirical study on the characteristics and impacts of emotions of developers. In this chapter, we present a study of emotional variations in bug-introducing and bug-fixing commit messages that helps in understanding the extent to which emotions affect tasks resulting in software bugs. The rest of the chapter is organized as follows. In Section 9.1, we introduce the motivation of the work. The methodology of the work is described in Section 9.2. Data analysis and findings are illustrated in Section 9.3. Threats to the validity of the work are discussed in Section 9.4. Related work is described in Section 9.5. Finally, we conclude the chapter in Section 9.6.

9.1 Introduction

Software developers, being humans, are affected by their emotions, which influence their activities and interactions. Thus, emotions affect task quality, productivity, creativity, group rapport, and job satisfaction [26, 28]. Several studies have been performed in the past for understanding the roles of emotions in software development activities. Some of those earlier studies address when and why employees get affected by emotions [26, 34, 4, 41, 43]. Some other studies determine correlations between various job performance factors (e.g., productivity, quality, and efficiency) and the emotions that developers experience during development activities [60, 37, 38, 61, 62, 63, 31]. In addition, a few studies have successfully used emotion as a factor in prioritizing applications’ features to develop [133], in predicting the quality of developers’ interactions (e.g., asking questions and answering) in technical forums [134, 135, 136], and in predicting bug severity [231]. Considering the above studies, it seems emotions can be an influential factor to be used in complex machine learning and deep learning systems to predict bugs (i.e., buggy commits) in software. However, before using emotions to predict bugs, we need to empirically evaluate such a possibility in this context. Towards this goal, in this work, we study the polarity (i.e., positivity, negativity, and neutrality) of emotions expressed in two types of commit messages, (i) bug-introducing and (ii)

Table 9.1: Subject Systems

Systems                Lang.   Domain           BIC*    BFC+
Netty                  Java    Network          8,745   8,739
Presto                 Java    SQL              2,963   2,963
Facebook-android-SDK   Java    Social Network   740     740
*BIC = Bug-introducing commits, +BFC = Bug-fixing commits

bug-fixing, which are posted by developers contributing to open-source projects. In particular, we address the following two research questions.

RQ1: Do developers express different levels (e.g., high, low) and polarities (i.e., positivity, negativity, and neutrality) of emotions in bug-introducing and bug-fixing commits?
— If we can identify that developers express higher levels of emotions (either positive or negative) in commits that introduce bugs in software compared to other commits (e.g., bug-fixing commits), then the expressed emotional levels can be used as a feature to predict bugs in commits. In addition, here we also want to verify the finding of Islam and Zibran [38], who claim positive emotions are significantly higher in bug-fixing commits compared to negative emotions.

RQ2: Do the polarities (i.e., positivity, negativity, and neutrality) of developers’ emotions vary at different times of a day in bug-introducing and bug-fixing commits?
— Here we conduct a deeper analysis by grouping bug-introducing and bug-fixing commits according to their commit timestamps. If we can identify any particular times in a day when developers express significant negative emotions, then managers can take the required steps to uplift the developers’ positive feelings at those times.

9.2 Methodology

The procedural steps of our empirical study are summarized in Figure 9.1. To address the aforementioned research questions, we extract bug-introducing and bug-fixing commit messages from three selected projects. Then, we compute emotional scores of the commit messages using the tool SentiStrength-SE [110, 109], which is the first domain-specific sentiment analysis tool for software engineering texts. Finally, to answer the research questions, we conduct statistical analyses. In the following, we briefly describe the procedures of data collection, computation of emotional scores, and data analysis.

9.2.1 Data Collection

We collect three open-source projects from GitHub, listed in Table 9.1 along with other information related to those projects. These three projects are also used in other studies [232, 233].



Figure 9.1: Procedural steps of our empirical study

The bug-introducing and bug-fixing commit messages of those projects were identified in an earlier study [232], and we reuse them in this study. We study all commit messages of these two types in those projects, which constitute 24,890 commit comments. Associated information, such as committers, commit timestamps, revisions, and project names, is kept in a local relational database for convenient access and query.
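A minimal sketch of such a local database is shown below using Python’s built-in sqlite3 module; the table and column names are illustrative, not the exact schema used in the study.

import sqlite3

conn = sqlite3.connect("commits.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS commit_message (
        id                 INTEGER PRIMARY KEY,
        project            TEXT NOT NULL,
        committer          TEXT NOT NULL,
        revision           TEXT NOT NULL,
        committed_at       TEXT NOT NULL,    -- ISO-8601 timestamp
        message            TEXT NOT NULL,
        is_bug_introducing INTEGER NOT NULL, -- 1 = bug-introducing commit
        is_bug_fixing      INTEGER NOT NULL  -- 1 = bug-fixing commit
    )
""")
conn.execute(
    "INSERT INTO commit_message (project, committer, revision, committed_at, "
    "message, is_bug_introducing, is_bug_fixing) VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Netty", "dev-01", "abc123", "2015-06-01T10:15:00", "Fix channel leak", 0, 1),
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM commit_message").fetchone())
conn.close()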

9.2.2 Sentiment Analysis

For each of the commit messages, we compute the emotional scores. To do that, we use the tool SentiStrength-SE [110]. Sentiment analysis using SentiStrength-SE on a given piece of text (e.g., a commit message) c computes a pair ⟨p_c, n_c⟩ of integers, where +1 ≤ p_c ≤ +5 and −5 ≤ n_c ≤ −1. Here, p_c and n_c respectively represent the positive and negative emotional scores for the given text c. A given text c is considered to have positive emotions if p_c > +1. Similarly, a text is held to contain negative emotions when n_c < −1. Note that a given text can exhibit both positive and negative emotions at the same time, and a text is considered emotionally neutral when the emotional scores for the text appear to be ⟨+1, −1⟩. Further details about the sentiment analysis algorithm of SentiStrength-SE and the interpretation of its outputs can be found elsewhere [110].
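The interpretation of the score pairs described above can be expressed directly in code; the sketch below simply applies the stated thresholds to a given pair and does not invoke SentiStrength-SE itself.

def polarity(pos, neg):
    """Interpret a (positive, negative) score pair in the ranges
    +1..+5 and -5..-1; a text can be both positive and negative."""
    labels = []
    if pos > 1:
        labels.append("positive")
    if neg < -1:
        labels.append("negative")
    return "+".join(labels) if labels else "neutral"

print(polarity(1, -1))   # neutral
print(polarity(3, -1))   # positive
print(polarity(2, -4))   # positive+negative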

9.2.3 Statistical Measurements

To verify the statistical significance of emotional variances in bug-introducing and bug-fixing commit messages, we apply the statistical Mann-Whitney-Wilcoxon (MWW) test [234] at the significance level α = 0.05. The non-parametric MWW test does not require a normal distribution of the data, and thus it suits our purpose well. To measure the effect size, we compute the non-parametric effect size Cliff’s delta d [234]. We consider that a significant difference exists between the distributions of emotional scores in bug-introducing and bug-fixing commits if the p-value of a MWW test is found to be less than α and the Cliff’s delta d value is not negligible (i.e., |d| > 0.15).
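A sketch of this significance check is given below; scipy’s mannwhitneyu is used for the MWW test, Cliff’s delta is computed directly from its definition, and the sample score lists are hypothetical.

from scipy.stats import mannwhitneyu

ALPHA, NEGLIGIBLE = 0.05, 0.15

def cliffs_delta(xs, ys):
    """Non-parametric effect size: proportion of pairs with x > y minus x < y."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Hypothetical positive vs. (absolute) negative score samples from one commit type.
pos_scores = [2, 3, 2, 2, 4, 3, 2]
neg_scores = [1, 2, 1, 2, 1, 1, 2]

_, p = mannwhitneyu(pos_scores, neg_scores, alternative="two-sided")
d = cliffs_delta(pos_scores, neg_scores)
print(f"p = {p:.4f}, d = {d:.3f}, "
      f"significant = {p < ALPHA and abs(d) > NEGLIGIBLE}")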

9.3 Analysis and Findings

The research questions RQ1 and RQ2 are respectively addressed in Section 9.3.1 and Section 9.3.2.

Figure 9.2: Distributions of positive and negative emotional scores in commit messages dealing with bug-introducing and bug-fixing tasks

9.3.1 Overall Emotional Variations

Emotional variations in commit types: The boxplot in Figure 9.2 presents the distributions of positive and negative emotional scores in bug-introducing and bug-fixing commit messages collected from the three projects. An ‘x’ mark in a box in the boxplot indicates the mean emotional score over all the commits. As observed in Figure 9.2, for both bug-introducing and bug-fixing commit messages, the emotional scores follow two similar patterns: (i) the emotional scores of 75% of the commit messages remain within one to two in positively and negatively polarized commit messages, and (ii) the mean and median emotional scores in positively polarized messages are higher than the mean and median emotional scores in negatively polarized messages. We conduct a MWW test to verify the statistical significance of the difference between positive and negative emotional scores in bug-introducing commit messages. The test obtains a p-value of 2.57 × 10⁻¹¹ where p < α, and a Cliff’s delta d value of −0.3032 where |d| > 0.15. Thus, those values indicate that the difference between positive and negative emotional scores in bug-introducing commit messages is significant. Similarly, we conduct another MWW test between positive and negative emotional scores in bug-fixing commit messages to verify the statistical significance of their difference. The test obtains a p-value of 2.2 × 10⁻¹⁶ where p < α, and a Cliff’s delta d value of −0.2246. Again, the obtained values indicate that the difference between positive and negative emotional scores in bug-fixing commit messages is significant.

Variation of a particular emotion across commit types: Next, we focus on emotion-wise differences between the emotional scores in bug-introducing and bug-fixing commit messages. In other words, we want to verify the statistical significance of the difference between the positive emotional scores found in bug-introducing and bug-fixing commit messages and of the difference between the negative emotional scores found in bug-introducing and bug-fixing commit messages. From Figure 9.2, we see that the mean positive emotional scores of bug-introducing and bug-fixing

commit messages do not differ much. A similar pattern can also be seen for the mean negative emotional scores of bug-introducing and bug-fixing commit messages. To verify the statistical significance of our observations, we sequentially conduct two MWW tests. The first MWW test is conducted between the positive emotional scores found in bug-introducing and bug-fixing commit messages. The test obtains a p-value of 0.076 where p > α, which indicates that the difference between positive emotional scores in bug-introducing and bug-fixing commit messages is not significant. Similarly, we conduct another MWW test between the negative emotional scores found in bug-introducing and bug-fixing commit messages, which obtains a p-value of 0.9561 where p > α. Thus, the latter test also indicates no significant difference between negative emotional scores in bug-introducing and bug-fixing commit messages. Based on our observations and statistical tests, we derive the answer to the research question RQ1 as follows:

Ans. to RQ1: Both bug-introducing and bug-fixing commit messages have significantly higher positive emotional scores compared to negative emotional scores. However, neither positive nor negative emotional scores differ much between bug-introducing and bug-fixing commit messages.

Figure 9.3: Distributions of positive and negative emotional scores in (a) bug-introducing and (b) bug-fixing commit messages

9.3.2 Hour-wise Emotional Variations

To study the relationship between developers’ emotions and the times of a day when commit comments are posted, we divide the 24 hours of a day into three periods: (a) 00 to 08 hours as before working hours, (b) 09 to 17 hours as regular working hours, and (c) 18 to 23 hours as after working hours. Then, for each project, we organize the commit messages into three disjoint sets based on their timestamps of posting.

Variations of emotions in different commit types with respect to work hours: Figure 9.3(a) and 9.3(b) present the distributions of positive and negative emotional scores in bug-introducing and bug-fixing commit messages, respectively, posted in these three periods. We see from these figures that the average positive emotional scores are always higher than the average negative emotional scores, except in after work hours for bug-fixing commit messages. Again, for both sentiments, the median values in after work hours are equal only for bug-fixing commit messages, whereas for the rest of the cases the median values are always higher for positive sentiments. Another noticeable pattern can be observed for the positive emotional scores in work hours for bug-introducing commit messages, where the emotional scores are tightly centered around the value two.

Table 9.2: Results of MWW tests between positive and negative emotional scores of different commit types posted in different times of a day (each cell: p-value, Cliff’s delta d)

Commit Types      AWH                  BWH                  WH
Bug-introducing   0.0007, -0.2812 *    1.1e-10, 0.6943 *    2.2e-16, -0.1708 *
Bug-fixing        0.4430, -0.3882      6.8e-07, 0.6805 *    3.3e-08, -0.2941 *

Here, AWH = After Work Hours, BWH = Before Work Hours, WH = Work Hours; * = statistically significant difference (p < α and |d| > 0.15)

To verify the statistical significance of our observations, we conduct a series of MWW tests between the positive and negative emotional scores in bug-introducing and bug-fixing commit messages, respectively. The obtained p-values and their corresponding d values are presented (as a pair of p-value, d) in Table 9.2, where significant differences are marked with an asterisk. We see that the differences are statistically significant in all cases except in after work hours for bug-fixing commit messages.

Variation of a particular emotion with respect to work hours: Next, we focus on the emotion-wise difference between the emotional scores of bug-introducing and bug-fixing commit messages posted in the three working periods. Figure 9.4(a) and 9.4(b) present the distributions of positive and negative emotional scores, respectively, in bug-introducing and bug-fixing commit messages in the three working periods. In those working periods, the averages of positive emotional scores are always higher in bug-fixing commit messages (see Figure 9.4(a)), whereas, interestingly, the averages of negative emotional scores are always higher in bug-introducing commit messages (see Figure 9.4(b)).

(Value) 2.5

Emotional scores 2.0

1.5

1.0 After Work Hours Before Work Hours Work Hours (a) PositiveLabel Emotion 4.0 Commit Types Bug−fixing 3.5 Bug−introducing 3.0

(Value) 2.5

Emotional scores 2.0

1.5

1.0 After Work Hours Before Work Hours Work Hours (b) NegativeLabel Emotion Figure 9.4: Distributions of (a) positive emotional scores and (b) negative emotional scores, in bug- introducing and bug-fixing commit messages

Table 9.3: Results of MWW tests of emotional scores between bug-introducing and bug-fixing commit messages posted in different times of a day (each cell: p-value, Cliff’s delta d)

Emotion Types   AWH                BWH                WH
Positive        0.0866, 0.6640     0.2433, -0.2377    0.0089, -0.1708 *
Negative        0.9084, -0.4921    0.6677, 0.4075     0.9616, -0.6733

* = statistically significant difference (p < α and |d| > 0.15)

To verify the statistical significance of our observations, we again conduct a series of MWW tests, between the positive emotional scores in bug-introducing and bug-fixing commit messages and between the negative emotional scores in bug-introducing and bug-fixing commit messages. The obtained p-values and their corresponding d values are presented in Table 9.3, where significant differences are marked with an asterisk. We see that the only significant difference is between the positive emotional scores in bug-introducing and bug-fixing commit messages posted during working hours. Based on our observations and statistical tests, we now answer the research question RQ2 as follows:

Ans. to RQ2: Both bug-introducing and bug-fixing commit messages have significantly higher positive emotional scores compared to negative emotional scores in commits made in the three working periods, except for the after work hours of bug-fixing commit messages. During working hours, positive emotional scores in bug-introducing commits are significantly higher compared to the positive emotional scores in bug-fixing commits.

9.4 Threats to Validity

Construct Validity: One may question the validity of our categorization of the developers’ commits into different periods (Section 9.3.2), considering the possibility that the projects and developers may be physically located in different geographic locations and time-zones. However, we used the time zone information associated with the commit messages to convert all the timestamps to the corresponding local times. For the statistical tests of significance, we used the Mann-Whitney-Wilcoxon (MWW) test [132], and to compute effect size we used Cliff’s delta [234]. The non-parametric MWW test and the non-parametric effect size Cliff’s delta do not require the data to have a normal distribution. Since the data in our work do not conform to a normal distribution, these particular measures suit our purpose well. Moreover, the significance level α is set to 0.05, which is a widely adopted value for this parameter and enables 95% confidence in the results of the MWW tests.

Internal Validity: The internal validity of our work depends on the accuracy of the tool’s computation of emotional scores. SentiStrength-SE was reported to be effective in sentiment analysis [188] and suitable for extraction of emotions from commit comments. Nevertheless, the tool is not 100% accurate in determining emotional polarities of commit messages. We are aware of this threat; we manually checked 50 comments and found that their emotions were correctly identified by SentiStrength-SE. One may also question the accuracy of the approach used to determine bug-introducing and bug-fixing commits. However, a manual check found 96% accuracy of the approach in distinguishing bug-introducing and bug-fixing commits in the projects [232].

External Validity: The findings of this work are based on our study of more than 24,000 commit messages across three open-source projects. The generalizability of the results can be questioned due to the small size of the dataset.

Reliability: The methodology of this study, including the procedures for data collection and analysis, is well documented in this chapter. The subject systems, being open-source, are freely accessible, while the tool SentiStrength-SE is also available online. Moreover, all the bug-introducing and bug-fixing commits are also available. Therefore, it should be possible to replicate the study.

142 9.5 Related Work

Several studies in the literature compare emotional variations found in various textual artifacts related to software engineering. In such a study, Islam and Zibran [38] presented a quantitative empirical study of the emotional variations in different types of development activities and development periods, in addition to an in-depth investigation of emotions’ impacts on software artifacts. They found significantly higher positive emotion in bug-fixing commits compared to negative emotion, which is similar to our finding. From commit messages, Guzman et al. [34], Chowdhury and Hindle [36], and Sinha et al. [53] also identified emotions and grouped them along various dimensions. However, none of the above studies computed emotional scores in bug-introducing commits and compared them against emotional scores in bug-fixing commits, as we have done in our work. Tourani et al. [43] extracted emotions from emails of both developers and system users. They observed the differences of emotional expressions between developers and users of a system. Garcia et al. [42] extracted developers’ emotions from their email contents to analyze any relationships between developers’ emotions and their activities in an open source software project. The studies of Tourani et al. [43] and Garcia et al. [42] differ from our work mainly in two ways: (i) the source of their emotional content is different than ours, and (ii) the objectives of those works are also orthogonal to ours. Pletea et al. [41] mined developers’ emotions from commits and pull requests in GitHub projects. They analyzed emotional variations in discussions on different topics and reported to have found higher negative emotions in security-related discussions in comparison with other topics. While their objective, approach, source of emotional content, and method of emotion extraction were different from our work, ours includes a deeper analysis based on emotional variations in bug-introducing and bug-fixing commit messages. All the above mentioned studies used domain independent tools (e.g., SentiStrength and NLTK) to compute emotional scores in software engineering textual artifacts; however, for the first time, we have used a domain specific tool, SentiStrength-SE, for this study, which is one of the unique attributes of this work.

9.6 Summary

In this chapter, we have presented a quantitative empirical study on the emotional variations between bug-introducing and bug-fixing commit messages. We have studied more than 24,000 commit messages over three open-source projects. In our study, we find that both bug-introducing and bug-fixing commit messages have overall statistically significantly higher positive emotional scores compared to negative emotional scores. We also observe similar findings while analyzing emotional scores in bug-introducing and bug-fixing

commit messages with respect to the three working periods. An exception is found in the latter case, where no significant difference is found between positive and negative emotional scores in bug-fixing commit messages posted during after work hours. While comparing with respect to a particular sentiment (i.e., for positive or negative sentiment), we find no significant difference between the overall emotional scores in bug-introducing and bug-fixing commit messages. The earlier finding also holds true while analyzing emotional scores in bug-introducing and bug-fixing commit messages with respect to the three working periods, with one exception where positive emotional scores are significantly higher in bug-fixing commits posted during working hours. The findings from this work are validated in the light of statistical significance. Although more experiments can be conducted at a larger scale to verify or confirm the findings, the results from this study significantly advance our understanding of the impacts of emotions in software development activities.

Chapter 10

Roles of Affects in Software Engineering

In the previous chapter (Chapter 9), we investigated the variation of sentiments in bug-fixing and bug-introducing commit messages to identify the potential role of sentimental variation in predicting software bugs. In this chapter, we identify further roles of sentiments and emotions in software engineering activities. In Section 10.1, we identify the roles of sentiment in software design and quality. We present the roles and applications of human affects in software requirement and maintenance analysis in Section 10.2. Section 10.3 describes how developers’ affects can be correlated with their performances. In Section 10.4, we identify how human affects can be utilized for effective communication in developers’ social forums. Finally, Section 10.5 summarizes the chapter.

10.1 Sentiment Analysis in Software Design and Quality

Lesiuk [62] recruited 56 software engineers to understand the effects of music listening on software design performance. Data was collected over a five-week period. The participants self-assessed their affects and design performance. The results indicated that positive affects and self-assessed performance were lowest with no music, while time-on-task was longest when music was removed. Narrative responses revealed the value of music listening for positive mood change and enhanced perception of design while working. Miller et al. [235] stated that a user’s acceptance of products could be understood not only by their cognition but also by sentiment, while sentiment showed a deeper level of appeal. They proposed a flexible software modeling notation, called the people-oriented software engineering (POSE) model, by adding sentimental goals to it for high-quality software design. They evaluated their models, using a case study and a user study, and found the importance of stakeholders’ and users’ sentiments towards software design as one influential dimension that could improve the quality of the design. Tourani and Adams [51] investigated the impact of various factors, including sentiments, on software quality (in terms of defect-prone commits). Sentiments of issue comments or review comments were computed, as well as a variety of other factors defined in their study. Using the factors as input variables, they built logistic regression models to study the impact of the characteristics (including sentiments) of issue and review discussions on the quality of a patch. However, they used a general purpose tool that had low accuracy in detecting sentiments.

10.2 Affective Analysis in Requirement Analysis and Software Maintenance

Maintenance requests (e.g., reporting of a bug, requests for a feature, refactoring, and creation of test cases) are submitted as issues in an issue tracking system (e.g., JIRA1) for a software system [236]. An issue gets closed when it is solved. However, a closed issue can be reopened when the patch or solution of the issue is deemed inappropriate, which, in turn, increases the amount of software maintenance tasks. Cheruvelil and Silva [237] conducted a study to identify a correlation between sentiment and issue reopening. They collected 3,000 issues of eight projects from the JIRA issue repository system. They used the tool SentiStrength-SE [109, 110] to detect sentiments of the issue comments. By conducting statistical analyses, they found evidence that negative sentiment in issue comments was correlated with issue reopening. A common software maintenance task is to assign priority levels to bug reports submitted in an issue tracking system. Some bugs are important and need to be fixed right away, whereas others are minor, and their fixes can be postponed until resources are available. However, manual prioritization is time consuming and cumbersome, and automatic approaches can be an alternative to make the software maintenance job easier. Umer et al. [238] developed an emotion-based automatic approach to predict the priority of a bug report. They collected 80,000 bug reports of four projects of Eclipse from Bugzilla2. For each report, they computed (i) an emotion-value using the SentiWordNet [159] dictionary, and (ii) the term frequency of feature words. Those computed emotion-values and term frequencies are used as features in a Support Vector Machine [214] to develop the predictor. After training and testing of the emotion-based predictor, it was found that it outperformed a state-of-the-art technique in predicting priorities of bug reports. Umer et al. [239] proposed a sentiment based approach to predict how likely enhancement reports would be approved or rejected so that developers could first handle likely-to-be-approved requests. First, they preprocessed enhancement reports using natural language preprocessing techniques. Second, they identified the words having positive and negative sentiments in the summary attribute of the enhancement reports and calculated the sentiment of each enhancement report. Finally, with the history data of real software applications, they trained a machine learning based classifier to predict whether a given enhancement report would be approved. The proposed approach was evaluated with the history data from real software applications. The cross-application validation suggested that the proposed approach outperformed the state-of-the-art. The evaluation results suggested that the proposed approach increased the accuracy from 70.94% to 77.90% and improved the F-measure significantly from 48.50% to 74.53%.

1https://www.atlassian.com/software/jira 2https://www.bugzilla.org
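As a generic illustration of the recurring idea in these studies of feeding sentiment information into a supervised classifier, the sketch below trains an off-the-shelf SVM on two toy features (a sentiment score and an emotional-word count); the data, features, and configuration are entirely hypothetical and do not reproduce the cited authors’ pipelines.

import numpy as np
from sklearn.svm import SVC

# Each row: [mean sentiment score of a report, number of emotional words in it].
X = np.array([
    [-2.0, 5], [-1.5, 4], [-1.0, 3], [0.0, 1],
    [ 0.5, 1], [ 1.0, 2], [ 1.5, 2], [2.0, 3],
])
y = np.array([1, 1, 1, 0, 0, 0, 0, 0])   # 1 = high priority, 0 = lower priority

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[-1.7, 4], [1.2, 2]]))   # likely [1, 0]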

Yang et al. [137] successfully predicted severities of bug reports using sentiments expressed in those reports. To predict bug severity, they developed the EWD-Multinomial algorithm, which is a variant of the Naive Bayes Multinomial (NBM) [125] classifier. In a separate study, Yang et al. [240] also developed a novel emotion-based approach to predict the severity of reported bugs. They computed the probability distributions of emotional words in bug reports and used those distributions to train an NBM classifier to predict severities of the bug reports. 10-fold cross-validation was used to train and test the classifier using 23,929 bug reports collected from five open-source projects. Test results showed that their emotion-based approach outperformed other state-of-the-art techniques. Colomo-Palacios et al. [107] showed how emotions can indicate acceptance of requirements. They explicitly asked stakeholders about their feelings regarding particular requirements. The responses were recorded in a grid which represents on the X axis the arousal caused by the requirement and on the Y axis the pleasure. Comparing the responses with the evolution of the requirements throughout the requirement engineering process iterations, they concluded that high levels of pleasure and low levels of arousal seem to indicate accepted requirements. Colomo-Palacios et al. [28] aimed to integrate the developers’ and the system users’ emotions into the process of requirements engineering. They conducted a study on two software projects and 11 individuals. In total, 65 user requirements were produced between the two projects, which lasted six months and seven months, respectively. Each requirement faced tens of revisions. Each participant rated the affects associated with each requirement version. The results showed that the affects related to pleasantness associated with the final requirements are higher than for non-final requirements, while the affects related to mental activation (arousal) for the final requirements are lower than for non-final requirements. Miller et al. [235] argued that users’ emotional goals/needs should be considered as requirements to develop a system that could ensure users’ satisfaction. They designed and implemented a new prototype of a wellbeing-check emergency system for older people that considered the emotional goals of its users. To evaluate user satisfaction in using the emergency system, they placed it into the homes of nine older people over a period of approximately two weeks each. Interviews were conducted both before and after the deployment period to evaluate users’ satisfaction. The data gathered by the interviews demonstrated that considering users’ emotions in a system led to improved user satisfaction. Williams and Mahmoud [241] performed a systematic in-depth analysis in the domain of anonymous social networking apps (e.g., Whisper3 and Firechat4) and proposed a model to depict the interrelationships between app users’ sentimentally polarized (i.e., positive and negative) concerns and the core features of the apps in the domain. They argued that such a model (that included sentiment

3http://whisper.sh/ 4https://www.opengarden.com/firechat.html

analysis) could be utilized to identify urgent users’ concerns and help app developers to devise sustainable release engineering strategies. Carreno and Winbladh [242] adapted the Aspect and Sentiment Unification Model (ASUM) [243], which incorporated both topic modeling and sentiment analysis techniques, to identify requirement change requests mentioned in user comments on mobile applications. They applied the ASUM model on three different sets of user comments on mobile applications and examined the effectiveness of the model in identifying requirement change requests. The experimental results showed that the model clearly generated a good representation of topics that could be relevant with regard to requirements changes. Guzman and Maalej [182] also combined sentiment analysis and topic modeling to develop an automated approach to systematically analyze user opinions on various applications’ features. They used natural language processing techniques to identify fine-grained app features in the reviews. Then, they used the tool SentiStrength [45] to compute users’ sentiments about the identified features from user reviews. Finally, they used topic modeling techniques to group fine-grained features into more meaningful high-level features. They evaluated their approach with seven apps from the Apple App Store and Google Play Store and compared its results with a manual, peer-conducted analysis of the reviews. Their approach could help app developers to systematically analyze user opinions about single features and filter irrelevant reviews. One of the major risks associated with software development is related to the phenomenon of over-requirement. The over-requirement phenomenon is manifested when a product or a service is specified beyond the actual needs of the customer or the market (i.e., features that are nice to have but not necessary). Shmueli et al. [244] conducted a study to show that over-requirement was due partially to the emotional involvement of the developers with the features they specified. This emotional involvement seemed to be associated with three effects that had been demonstrated by behavioral economists: (i) the Endowment effect, i.e., the tendency of people to overvalue their possessions [245], (ii) the IKEA effect, i.e., the tendency of people to overvalue their self-constructed products [246], and (iii) the I-designed-it-myself effect, i.e., the tendency of people to overvalue their self-designed products [247]. To explore these behavioral effects and the interactions among them in the context of software development, they conducted an experiment in which over 200 participants were asked to specify a nice-to-have (but unnecessary) software feature. Their results confirmed the existence of these behavioral effects in software development and their influence on over-requirement. Fu et al. [248] proposed WisCom, a system that could analyze tens of millions of user ratings at three different levels (i.e., micro, meso, and macro) by leveraging sentiment analysis of apps’ comments. In the micro-level analysis, they performed word-level analysis of review comments to understand the impact of each word on users’ actual sentiments. This analysis helped them to identify the vocabulary users used to praise or criticize apps. By applying a regularized regression model, they developed a predictor to predict the rating score based on the comments users posted.
The predictor helped them to detect inconsistent ratings that did not match the actual texts of the comments. They found this type of inconsistency in roughly 0.9% of the user reviews they collected. In the meso-level analysis, they aggregated comments of individual apps and used text analysis (e.g., topic modeling [249]) to further study why users disliked apps. They further extended their analysis to the scope of the whole marketplace in the macro-level analysis. They aimed at understanding the general user preferences and concerns over different types of apps and providing guidelines to developers or even market operators. To identify evolutionary requirements for software systems, Jiang et al. [250] proposed a systematic approach based on users' sentimental opinions expressed in online reviews. At first, they automatically extracted positive and negative opinions expressed in the reviews about common software features. Then, they grouped similar opinion expressions about software features. Next, they produced structured feedback classified by software features, including several corresponding sentences. Afterwards, they measured users' satisfaction scores for a software feature based on the number and intensity of positive and negative opinions about the feature. Finally, based on the measured satisfaction scores of features, they manually generated the document of evolutionary requirements for each software feature. To evaluate the usefulness of the evolutionary requirements document generated by their approach, they conducted a human subjective study with 50 developers. The study found that the generated document could help developers understand why and what to evolve for future software releases. Zhao and Zhao [251] proposed a framework that leveraged sentiment analysis of users' reviews to automatically predict requirement evolution of products' features. The proposed framework combined a supervised deep learning neural network with an unsupervised hierarchical topic model to analyze user reviews automatically for product feature requirements evolution prediction. The approach could discover hierarchical product feature requirements from the hierarchical topic model and identify their sentiment by Long Short-Term Memory [252] with word embedding [253], which not only modeled hierarchical product requirement features from general to specific, but also identified sentiment orientation to better correspond to the different hierarchies of product features. The evaluation and experimental results showed that the proposed approach was effective and feasible. Zhou et al. [254] devised a technique to augment a software product line planning [255] technique by leveraging sentiment analysis of user-generated online product reviews. The technique consisted of three steps: (i) hybrid sentiment classification of product reviews with lexicons and a machine learning technique named rough set, (ii) feature extraction from product reviews using association rule mining and user-defined features, and mapping features and sentiments to determine customers' preferences, and (iii) supporting a software product line planning technique using the customers' preferences. They demonstrated the feasibility and potential of the proposed technique via an application case. Panichella et al.
[168] developed a technique that used textual analysis (TA), natural language parsing (NLP), and sentiment analysis (SA) to detect and classify app users' intentions (e.g., feature request, problem/bug discovery) expressed in users' review comments. They applied TA to preprocess texts and compute the frequency of each word in the processed texts. They applied NLP techniques to manually inspect 500 reviews to identify 246 recurrent linguistic patterns5. For each identified linguistic pattern they formalized and implemented an NLP heuristic to automatically recognize it in review comments. To detect the sentiment of each review comment, they trained a Naive Bayes classifier using word frequency as a feature. They used word frequencies, linguistic patterns, and sentiments as features in different machine learning techniques, namely, the standard probabilistic Naive Bayes classifier, Logistic Regression, Support Vector Machines, J48, and the alternating decision tree, to implement classifiers to detect app users' intentions. To evaluate the classifiers, they created a ground truth dataset of 1,421 sentences that were labeled according to their intentions by two raters. Then, they ran the machine learning classifiers using different combinations of the features on the ground truth dataset and measured the performances of the classifiers using precision, recall, and F-measure. The results showed that, among all possible feature inputs and classifiers, the combination of all features achieved the best results with the J48 algorithm, with 75% precision and 74% recall. Later, Panichella et al. [256] developed and published a tool named ARdoc using the described technique [168]. However, to develop ARdoc they used Stanford CoreNLP [146] instead of the Naive Bayes classifier to determine the sentiments of reviews. Sorbo et al. [257, 258] developed a tool named SURF by using the tool ARdoc. SURF could automatically (i) extract the topics (e.g., security, pricing, and content) expressed in reviews, (ii) classify the intention (by employing ARdoc) of the writers, to suggest the specific kinds of maintenance tasks developers have to accomplish, and (iii) group together sentences covering the same topic. They conducted an empirical study involving 12 developers/engineers from different companies in Switzerland, Italy and the Netherlands and 11 developers/engineers from the development team of Sony Mobile in Japan, to investigate the practical usefulness of summaries generated by SURF in the developers' "working context". The empirical study found that SURF was able to summarize thousands of app reviews in the form of an interactive, structured and condensed agenda of recommended software changes. Palomba et al. [259] introduced a novel approach that analyzed the structure, semantics, and sentiments of sentences contained in user reviews to extract useful (user) feedback from maintenance perspectives and recommend to developers changes to software artifacts. It relied on natural language processing and clustering algorithms to group user reviews around similar user needs and suggestions for change. Then, it involved textual-based heuristics to determine the code artifacts that needed to be maintained according to the recommended software changes. The quantitative and qualitative studies carried out on 44,683 user reviews of 10 open source mobile apps and their

5http://www.ifi.uzh.ch/seal/people/panichella/Appendix.pdf

original developers showed a high accuracy of the approach in (i) clustering similar user change requests and (ii) identifying the code components impacted by the suggested changes. Moreover, the obtained results showed that the approach was more accurate than a baseline approach for linking user feedback clusters to the source code in terms of both precision (+47%) and recall (+38%). Gu and Kim [260] proposed the Software User Review Miner (SUR-Miner), a framework that could summarize users' sentiments and opinions toward corresponding software aspects. Instead of treating reviews as bag-of-words, SUR-Miner made full use of the monotonous structure and semantics of software user reviews, and directly parsed aspect-opinion pairs from review sentences based on pre-defined sentence patterns. It then analyzed sentiments for each review sentence and associated the sentiments with aspect-opinion pairs in the same sentence. Finally, it summarized software aspects by clustering aspect-opinion pairs with the same aspects. They empirically evaluated the performance of SUR-Miner on recent user reviews of 17 Android apps such as Swiftkey, Camera360, WeChat and Templerun2. Results showed that SUR-Miner produced reliable summaries, with average F1-scores of 75%, 85%, and 80% for review classification, aspect-opinion extraction and sentiment analysis, respectively. The final aspects from SUR-Miner were significantly more accurate and clearer than state-of-the-art techniques, with an F1-score of 81%. Maalej and Nabil [261] applied several machine learning algorithms (e.g., Naive Bayes, Decision Tree, and MaxEnt) to classify app reviews into four types: bug reports, feature requests, user experiences, and ratings (that included great, good, nice, very, cool, love, hate, bad, worst). In this work, first, they collected real reviews from app stores and extracted their metadata (e.g., the star rating and text length). Second, they created a ground truth dataset by selecting a representative sample of 4,400 of the collected reviews and labeling them according to their review types by manually analyzing their content. Third, they implemented different classifiers using review metadata (e.g., star rating, length, and tense of a comment), bag-of-words of preprocessed comments, and sentiment scores (computed using SentiStrength) as the features of the machine learning algorithms. Fourth, they split the ground truth dataset at a ratio of 70:30, where 70% of the ground truth dataset was used to train the classifiers and 30% for testing. Fifth, they used the training set to train the classifiers and the test set to test the accuracies of the classifiers. Finally, they ran a series of experiments using the classifiers on the test set to evaluate the performances of the classifiers in terms of precision, recall, and F-measure. The experimental results showed that when metadata was combined with text processing and sentiment analysis, the classification precision increased. Williams and Mahmoud [262] used textual analysis and sentiment detection techniques to accurately capture and categorize the various types of actionable software maintenance requests (e.g., bug reports and user requirements) present in Twitter messages. First, they collected 4,000 unique Twitter messages related to 10 software systems and manually classified them according to their maintenance requests.
Second, they applied textual analysis to preprocess the texts, which included stemming and stop-word removal to reduce the number of features (i.e., words). Third, they computed sentiment scores of the messages using SentiStrength. Fourth, the existing words, i.e., the bag-of-words in messages (remaining after preprocessing), and the computed sentiment scores of those messages were used as features in two text classification machine learning classifiers, namely Naive Bayes and Support Vector Machines. They used 10-fold cross validation to train and test the classifiers. The test results showed that both Naive Bayes and Support Vector Machines were able to achieve competitive results in capturing and categorizing the various types of actionable software maintenance requests. Guzman et al. [263] conducted an experiment to investigate the performance of individual machine learning algorithms, namely Naive Bayes, Support Vector Machine, Logistic Regression and Neural Networks, and their ensembles for automatically classifying app reviews into different categories (e.g., bug report, praise, and complaint). They converted review texts into a vector space model, used TF-IDF as a weighting scheme, and used additional features, such as review rating, number of words in the review, ratio of positive sentiment words, and ratio of negative sentiment words. They evaluated the performance of the machine learning techniques on 4,550 reviews that were systematically labeled using content analysis methods. Overall, the ensembles had a better performance than the individual classifiers, with an average precision of 74% and 59% recall. Guzman et al. [264] presented ALERTme, an approach to automatically classify, group and rank tweets about software applications. They applied machine learning techniques for automatically classifying tweets requesting improvements, topic modeling for grouping semantically related tweets, and a weighted function for ranking tweets according to specific attributes, such as content category, sentiment (detected by SentiStrength) and number of retweets. They ran ALERTme on 68,108 collected tweets from three software applications and compared its results against software practitioners' judgements. The experimental results showed that ALERTme was an effective approach for filtering, summarizing and ranking tweets about software applications. ALERTme enabled the exploitation of Twitter as a feedback channel for information relevant to software evolution, including end-user requirements. In another study, Jha and Mahmoud [265] again used textual analysis and sentiment detection techniques to classify non-functional requirements (NFRs), namely dependability, performance, supportability, and usability, expressed in the review comments of iOS mobile applications. First, they collected 6,000 reviews, and human raters manually annotated those reviews according to the expressed NFRs. Second, they preprocessed the texts to perform word stemming and stop-word removal to reduce the number of features (i.e., words). Third, they computed sentiment scores of the messages using the dictionary VADER (Valence Aware Dictionary and sEntiment Reasoner) [120]. They also identified the domains (e.g., music, game, and education) of the apps.
Fourth, the existing words, i.e., the bag-of-words in messages (remaining after preprocessing), the computed sentiment scores of those messages, and the domains of the apps were used as features in two text classification machine learning classifiers, namely Naive Bayes and Support Vector Machines. Fifth, they split the ground truth dataset at a ratio of 70:30, where 70% of the ground truth dataset was used to train the classifiers and 30% for testing. Finally, they used the training set to train the classifiers and the test set to test the accuracies of the classifiers. The test results showed that classification features such as the sentiment score of the review led to a slight improvement in the accuracies of the classifiers in classifying the NFRs. Nayebi et al. [266] applied three machine learning algorithms to predict the marketability and success of a new release of an app (in a source code repository, e.g., GitHub6). The term "marketability" referred to the question whether a new release should be marketed or not. The success of a marketed app was measured using sentiment analysis of users' reviews of the app. They gathered a pool of 11,514 releases over 917 apps. Among them, 7,435 releases were published and 4,079 releases were not published. Among the 7,435 marketed releases, they identified 3,734 releases as successful and 3,701 releases as unsuccessful. Then, they extracted 12 release attributes (e.g., number of changed files and number of open issues) and three app attributes (e.g., sentiment) from each app. The extracted attributes were used as features in Decision Tree, Support Vector Machine, and Random Forest machine learning prediction models. They evaluated the accuracy of the prediction models by using both 10-fold and Leave-One-Out cross-validation (LOOCV). The performances of the three prediction models were about the same, while Random Forest had slightly better F1 scores. Besides, app attributes in conjunction with release attributes contributed to better recall with precision being about the same. Uddin and Khomh [267, 208] built a suite of techniques to automatically mine and categorize opinions about APIs from forum posts. First, they detected opinionated sentences in the forum posts. In the study, a sentence was opinionated if it carried positive or negative sentiment. They developed a technique named OpinerDSOSenti to detect sentiment in the sentences. Second, they associated the opinionated sentences to API mentions. Third, they detected API aspects (e.g., performance, usability) in the sentences. They developed and deployed a tool called Opiner, supporting the above techniques. Opiner is available online as a search engine, where developers can search for APIs by their names to see all the aggregated opinions about the APIs that are automatically mined and summarized from developer forums. Lin et al. [268] proposed an approach named POME that leveraged natural language parsing and pattern-matching to classify Stack Overflow sentences referring to APIs according to seven aspects (e.g., performance, usability), and to determine their polarity (positive vs. negative). By combining aspects and sentiments in sentences, the approach identified opinions related to APIs. A total of 157 patterns were inferred by manually analyzing 4,346 sentences collected from Stack Overflow linked to a total of 30 APIs.
They evaluated POME by (i) comparing the pattern-matching approach with machine learners leveraging the patterns themselves as well as n-grams extracted from Stack Overflow posts; (ii) assessing the ability of POME to detect the polarity of sentences, as compared to sentiment-analysis tools; and (iii) comparing POME with the state-of-the-art Stack Overflow opinion

6https://github.com

mining approach, Opiner, through a study involving 24 human evaluators. The evaluation showed that POME exhibited a higher precision than a state-of-the-art technique (Opiner), in terms of both opinion aspect identification and polarity assessment.
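Several of the review-classification studies discussed above follow the same basic recipe: turn each review into bag-of-words (or TF-IDF) features, append a numeric sentiment score, and feed the combined features to an off-the-shelf classifier. The following Python sketch (using scikit-learn) illustrates only that recipe; the sentiment_score helper, the sample reviews, and the labels are hypothetical stand-ins rather than artifacts of any cited study, and a real pipeline would obtain the sentiment feature from a tool such as SentiStrength.

import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def sentiment_score(text):
    # Hypothetical stand-in for a real sentiment tool (e.g., SentiStrength).
    positive, negative = {"love", "great", "nice"}, {"hate", "broken", "crash"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

reviews = [
    "Love the new design, great update",
    "The app is broken and will crash on startup",
    "Nice and fast, love it",
    "I hate the new layout",
]
labels = ["praise", "bug report", "praise", "complaint"]

bow = CountVectorizer().fit_transform(reviews)            # bag-of-words features
sent = np.array([[sentiment_score(r)] for r in reviews])  # one sentiment column per review
features = hstack([bow, sent])                            # combine text and sentiment features

classifier = LogisticRegression(max_iter=1000).fit(features, labels)
print(classifier.predict(features))                       # sanity check on the training data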

10.3 Correlational Analysis between Developers’ Affects and Their Performances

Ortu et al. [40] conducted a study to identify a correlation between developers' affectiveness (e.g., sentiment and politeness) and the time needed by developers to fix issues (e.g., bugs) reported in an Issue Tracking System (ITS). First, they collected 560K issue comments posted between 2002 and December 2013 in the ITS Jira7. Then, they extracted eight control metrics (e.g., issue priority, issue type) and 12 affective metrics (e.g., average sentiment of an issue, sentiment in the title of an issue). Next, they used a hierarchical modeling approach where one metric at a time was added to build a Logistic Regression model. The model was compared using an ANOVA test8 to the previous model (without that metric) to check whether the addition of the metric led to a statistically significant improvement (i.e., p-value < 0.01 for the metric) of the model. They found that six control metrics and nine affective metrics were significant. Those significant metrics were used to build the final Logistic Regression model. The evaluation results showed that the Logistic Regression model achieved a precision of 67% and a recall of 67.1%, against 31.9% and 56% respectively for the ZeroR model. They also built two variants of the Logistic Regression model by including and excluding the affective metrics and tested their performances in predicting issues' fixing times. The test results also confirmed that the addition of the affective metrics to the model increased precision and recall values. While the above study explored the correlation between issue fix time and affective metrics (e.g., emotions and sentiment), Mäntylä et al. [39] investigated the role of Valence, Arousal, and Dominance (VAD) metrics in correlating with issue fix time. They used the issue reports collected by Ortu et al. [40] to conduct the investigation. For each report, they extracted the title, description and comments, then calculated the VAD metrics using a lexicon [160] containing 13,915 English words with scores for Valence, Arousal and Dominance. They also computed the control and affective metrics used by Ortu et al. [40]. Then, they built a logistic regression model to investigate which variables were associated the most with issue resolution time. They used logistic regression to build the model hierarchically to predict the output variable (i.e., issue fix time) as a binary variable: "Short" (fix time lower than the median fix time) and "Long" (fix time larger than or equal to the median fix time). First, they built a model using the control metrics used by Ortu et al. [40]; then a second model was built using both the control and affective metrics of Ortu et al. [40]. Finally, a model with all control, affective, and VAD metrics was built. To evaluate precision and recall, they used

7https://www.atlassian.com/software/jira 8https://en.wikipedia.org/wiki/Analysis_of_variance

10-fold cross-validation. The evaluation results showed that the F-score of the final model significantly increased (from 71.7% to 75%) after adding the VAD metrics to the control and affective metrics. They also found that, across the VAD metrics, the valence of all comments and the valence of the issue title had the largest impact on issue fix time. Graziotin et al. [60] conducted an investigation on the correlation of affective states of software developers and their self-assessed productivity. They observed eight developers working on their individual projects. Their affective states (i.e., valence, arousal, and dominance) and their self-assessed productivity were measured at intervals of ten minutes using the Self-Assessment Manikin (SAM) [269]. A linear mixed-effects model [60] was used to estimate the value of the correlation of the affective states of valence, arousal, and dominance, and the productivity of developers. The model's values revealed that there were positive correlations between two affective state dimensions (valence, dominance) and the self-assessed productivity of software developers. Graziotin et al. [270] echoed the call for research on alternative factors influencing the performance of software developers. They conducted a study with 42 computer science students to investigate the relationship between the affects and the creative and analytical performance of software developers. The participants performed two tasks coming from psychology research. The first task was related to creative performance; the other was related to analytic performance and resembled algorithm design and execution. The participants' pre-existing affects were measured before each task. The analysis of the data showed empirical support for the claim that happy developers are indeed better problem solvers in terms of their analytical abilities. The study raised the need for studying the human factors of software engineering by employing a multidisciplinary viewpoint. To explore the impacts of sentiments on the debugging performance of software developers, Khan et al. [61] used high-arousal-invoking and low-arousal-invoking movie clips to trigger different levels of sentiments in developers before having them perform some debugging tasks. In the experiment, 72 programmers watched selected short movie clips that provoked positive and negative moods. Afterward, they completed a debugging test. Results showed that the video clips that swung programmers' sentiments/moods had a significant effect on programmers' debugging performances. However, they did not employ any measurement to extract and quantify the developers' sentimental states and relied on the assumption that watching those movie clips would induce different levels of sentiments in the developers. Garcia et al. [42] analyzed the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. They built datasets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a period of centralization within its bug triaging community. This was followed by considerable changes in community organization and performance after the sudden retirement of the central contributor.
They analyzed how this event correlated with the negative emotions, both in bilateral email discussions with the central contributor, and at the level of the whole community of contributors. They then extended the study to consider the activity patterns of Gentoo contributors in general. They found that contributors were more likely to become inactive when they expressed strong positive or negative emotions in the bug tracker, or when they deviated from the expected value of emotions in the mailing list. They used these insights to develop a Bayesian classifier that detected the risk of contributors leaving the project. Their analysis opened new perspectives for measuring online contributor motivation by means of sentiment analysis and for real-time predictions of contributor turnover in Open Source Software projects. Wrobel [31] also found correlations between developers' sentiments and their productivity. He pointed out that frustration was the riskiest emotional state in terms of developers' productivity. Graziotin et al. [60] conducted a study on how sentiments (including the dominance, valence, and arousal dimensions) impacted the productivity of developers. They selected eight participants (four students and four professional developers) and used questionnaires for measuring the affects of the participants and their productivity. Their experimental results showed that the happiness of a developer (i.e., positive sentiment) was positively correlated with self-assessed productivity. In both studies, the sentimental states of the participants were identified manually by questionnaires and/or interviews. Results obtained from a four-year project conducted by Ortu et al. [271] showed that positive sentiments and good manners positively impacted both the productivity and the wellness of developers.
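To make the modeling setup in studies like those of Ortu et al. and Mäntylä et al. concrete, the following Python sketch fits a logistic regression that predicts "long" versus "short" issue fix time twice, once with control metrics only and once with control plus affective metrics, and compares cross-validated F1 scores. The data are synthetic and the column names are assumptions for illustration; this is not a reimplementation of either study.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
issues = pd.DataFrame({
    "priority": rng.integers(1, 5, n),           # assumed control metrics
    "n_comments": rng.poisson(3, n),
    "avg_sentiment": rng.normal(0.0, 1.0, n),    # assumed affective metrics
    "politeness": rng.uniform(0.0, 1.0, n),
})
# Synthetic target: longer fixes loosely tied to more comments and negative sentiment.
fix_hours = 10 + 5 * issues["n_comments"] - 3 * issues["avg_sentiment"] + rng.normal(0, 5, n)
issues["long_fix"] = (fix_hours >= fix_hours.median()).astype(int)

control = ["priority", "n_comments"]
affective = ["avg_sentiment", "politeness"]

for name, columns in [("control only", control), ("control + affective", control + affective)]:
    f1 = cross_val_score(LogisticRegression(max_iter=1000),
                         issues[columns], issues["long_fix"], cv=10, scoring="f1")
    print(f"{name}: mean F1 = {f1.mean():.3f}")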

10.4 Sentiment Analysis in Software Social Forums

Calefato et al. [134] argued that the emotional style (along with other factors, e.g., presentation quality and user reputation) could increase the chance of getting an answer accepted in Stack Overflow (SO)9 posts. They started an investigation along that direction by collecting a dataset containing 348,618 answers from an SO data dump that was updated in September 2014. They computed five factors for each answer: acceptance vote, presentation quality, sentiment, temporal, and social metrics. They used SentiStrength-SE to compute the sentiment of the answers of SO posts. They used the acceptance vote as the dependent variable and the remaining four factors as independent variables in a logistic regression classifier. They split the dataset at a ratio of 70:30, where 70% of the dataset was used to train the classifier and 30% for testing. They assessed the classifier's quality in terms of the area under the Receiver Operating Characteristic curve and performed the experiment in an ablation test setting by removing one of the factors at a time while retaining the others. They found evidence that factors related to information presentation, time, and sentiment had an impact on the success of answers getting accepted.

9https://stackoverflow.com

SO has become a popular place for discussing different types of API issues, such as incomplete or erroneous documentation, poor performance, and backward incompatibility. Ahasanuzzaman et al. [272] proposed a supervised learning approach using Conditional Random Fields (CRFs) [273] to classify sentences in SO posts as either issue or non-issue. They considered an SO post to be related to an issue if it contained a link to a particular issue tracker address in the post's description. They collected 2,500 SO posts that contained links to issue trackers and conducted a manual study to ensure that all these posts were related to various issues. They used bag-of-words, parts-of-speech, sentiment, and the positions of API issue-related sentences in posts as features to train a CRF-based classifier. SentiWordNet 3.0 [158] was used to detect the sentiments of posts. Although they did not report the accuracy of the classifier, the developed CRF classifier was successfully used in another classifier to categorize issues into different types. Rahman et al. [154] proposed a mining technique that mined insightful comments from Stack Overflow for a given code segment, where the comments reveal identified issues, deficiencies and scopes for further improvement in the code. Their approach mined the comments by analyzing five aspects of comments: popularity, relevance, comment rank, word count, and the sentiment expressed in the text. Experiments with 292 Stack Overflow code segments and 5,039 discussion comments showed that their approach could extract the insightful comments with a promising recall of 85.42% and an MRR of 0.44 on average. A user study with professional developers and 85 code segments also showed that about 80% of the recommended comments were found accurate, precise, concise and useful by the participants. Jiarpakdee et al. [136] investigated the impact of affective features (e.g., sentiment and politeness) on the quality of a question asked/posted in SO. In the study, a question was considered to have high quality if that question got an accepted answer. To conduct their investigation, first, they collected 38,231 questions from the SO dataset provided for the MSR 2015 Challenge10. Second, from the questions, they extracted 31 features categorized into three classes: text-based (e.g., number of sentences in questions), community-based (e.g., reputation of the asker), and affective. Third, they implemented models by using different combinations of the features in a random forest algorithm [274]. To train and test the models, they applied 10-fold cross validation, repeated 100 times. Finally, after testing, they applied the Scott-Knott test [275] to reveal which features were influential in their models. The test results showed that community-based and affective features played an important role in question quality identification. Mondal et al. [276] proposed a sentiment metric based machine learning model for identifying low-quality and high-quality questions asked in SO. They defined a question's quality based on the up-votes and down-votes of the question. They considered that a question was of low quality if it got down-votes in SO, and vice versa. They collected 38,920 SO questions submitted between January 2014 and January 2015 and identified the qualities of those questions. For question quality

10http://2015.msrconf.org/challenge.php

prediction, they computed four features from a question, i.e., (i) TF-IDF [277] of key/indicator terms in the question texts, (ii) negative sentence count, (iii) positive sentence count, and (iv) neutral sentence count. Sentiments of sentences were detected by the Sentiment140 API11. These features were used in Multilayer Perceptron [278] and Support Vector Machine [279] algorithms to build the classifier models to classify/predict the qualities of questions. They used 10-fold cross-validation for training and testing the performance of the developed models. For evaluation, they used the precision and recall metrics. Evaluation results suggested about 70% precision and about 74% recall for their model, which clearly revealed the impact of human emotions upon the quality of questions. Using sentiment analysis of SO posts, researchers tried to identify not only the quality of a question but also the quality of source code. Serva et al. [280] developed an automatic sentiment analysis-based technique for mining poorly written code examples from developer question and answer forums, along with a technique to automatically.
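As an illustration of the feature-importance style of analysis used in some of the question-quality studies above, the short Python sketch below trains a random forest on synthetic question data with text-based, community-based, and affective features and prints their importances. All feature names, the synthetic data, and the quality label are made up for illustration; the sketch is not a reimplementation of any cited study.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 300
questions = pd.DataFrame({
    "n_sentences": rng.integers(1, 20, n),          # text-based feature (assumed)
    "asker_reputation": rng.exponential(500.0, n),  # community-based feature (assumed)
    "positive_sentences": rng.integers(0, 5, n),    # affective features (assumed)
    "negative_sentences": rng.integers(0, 5, n),
})
# Synthetic label: "high quality" loosely tied to reputation and tone.
quality = (questions["asker_reputation"] / 500
           + 0.3 * questions["positive_sentences"]
           - 0.3 * questions["negative_sentences"]
           + rng.normal(0, 1, n)) > 1.0

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(questions, quality)
for feature, importance in sorted(zip(questions.columns, forest.feature_importances_),
                                  key=lambda pair: -pair[1]):
    print(f"{feature}: {importance:.3f}")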

10.5 Summary

In this chapter, we have identified the roles and applications of affective analysis in various software engineering activities. We have found that affective analysis has been widely used in different software engineering activities, including software requirement and maintenance analysis, software design and quality, improving developers' performance, various correlational analyses to identify potential uses of human affects, and devising effective communication mechanisms in developers' social forums.

11http://help.sentiment140.com

Chapter 11

Code Smells in Categories of Clones and Non-Cloned Code

From Chapter 4 to Chapter 9, we presented our studies to address sub-problem I (i.e., detecting developers' sentiments/emotions) and sub-problem II (i.e., understanding developers' sentiments). The studies that we conducted to address sub-problem III (i.e., understanding the impacts of developers' copy-paste actions) are presented in Chapter 11, Chapter 12, and Chapter 13. In Chapter 11, we describe a comparative study of different types of clones (i.e., copy-pasted code) and non-cloned code on the basis of their code smells, which may lead to software defects and other issues in the future. The rest of the chapter is organized as follows. In Section 11.1, we introduce our motivation and the context of the work. We define the terminology and metrics used in this study in Section 11.2. In Section 11.3, we provide the details of the experimental setup and procedure of our empirical study. Section 11.4 presents our analysis and the findings of this study. In Section 11.5, we discuss the possible threats to the validity of our study. Section 11.6 includes related work, and Section 11.7 concludes the chapter.

11.1 Introduction

Source code reuse by copy-paste is a common practice that software developers adopt to increase productivity. Such a reuse mechanism typically results in duplicate or very similar code fragments, commonly known as code clones. Aside from such deliberate cloning, unintentional clones are also created for various reasons under diverse circumstances [66, 67]. Software systems typically have 9%-17% [68] cloned code, and the proportion is sometimes found to be even 50% [69] or higher [70]. Despite the few benefits [71] of cloning, code clones are detrimental in most cases [72, 71, 73]. Code clone is a notorious code smell (i.e., a symptom indicating a source of future problems) [74] that causes serious problems such as reduced code quality, code inflation, program faults, security vulnerabilities, and bug propagation [72, 75]. Clones are thus a major contributor to the high maintenance cost of software systems, and as much as 80% of software costs are spent on maintenance [76]. Therefore, it is necessary to keep the number of clones at the minimum and to remove them from source code by refactoring. However, not all the clones in a software system are harmful [71], nor is it feasible to remove all the clones in source code by refactoring [77, 75]. Therefore, we must

distinguish the context and characteristics of clones, which make them malign as opposed to the benign clones. Towards this goal, several studies have been performed in the past to examine or exploit the comparative stability of clones as opposed to non-cloned code [78, 79, 80, 81, 82], relationships of clones with bug-fixing changes [83, 84, 85, 72, 86, 87, 25, 88, 89], and the impacts of clones on a program's changeability [90, 73, 91]. Note that code smells are symptoms of poor coding patterns and are probable sources of future serious problems. While code clone itself is a notorious code smell [74], other fine-grained code smells, defects and vulnerabilities often hide inside cloned code. Thus, such "smelly code clones" can be regarded as bug-prone clones, which are likely to cause serious defects, and the reuse by copy-pasting of such a bug-prone piece of code causes multiplication of bug-proneness elsewhere in the software system. Earlier attempts [85, 86, 87, 89] to determine the bug-proneness of code clones relied on a long-term history of bug-fixing changes preserved in a version control system. While such studies make important contributions, their approaches do not fit well for proactive clone management, especially at the early stages of the software development process where a significantly long history of bug-fixing changes is not available. Therefore, to determine the bug-proneness of code clones, we choose to apply static source code analysis techniques that do not require any bug-fixing history. This chapter presents an empirical study on the relationships of program vulnerabilities with code clones. Here 'vulnerability' refers to problems in the source code identified based on bad coding patterns [74], which lead to bugs, security holes, performance issues, design flaws, and other difficulties. A list of such vulnerabilities is presented in Table 11.1. A particular piece of code is considered vulnerable if it contains code smells or bad coding patterns, and the severity of the vulnerability is dictated by the severity of the existing code smells. In this work, we address the following three research questions. RQ1: Are code clones more vulnerable than non-cloned code or vice versa? — Rahman et al. [25] reported that the great majority of software defects are not significantly associated with clones, while Juergens et al. [72] claimed otherwise. RQ2: Are clones of a certain category relatively more vulnerable than others? — If a certain type of clones is found to be more vulnerable, those clones can be high-priority candidates for removal or careful maintenance. RQ3: Is there a particular set of vulnerabilities that appear more frequently in cloned code as opposed to non-cloned code? — If such a set of vulnerabilities can be identified, the findings will help software developers stay cautious of such vulnerabilities while cloning source code. In addition, that particular set of vulnerabilities can be avoided or removed by the use of clone refactoring. To answer the aforementioned research questions, we conduct a quantitative empirical study over 97 open-source software systems drawn from diverse application domains. Using a wide range of metrics and characterization criteria, we carry out an in-depth analysis on the source code of the systems with respect to different categories of code clones, non-cloned code, and a diverse set of vulnerabilities.

11.2 Terminology and Metrics

In this section, we describe and define the terminologies and metrics used in our work.

11.2.1 Characterizing Terminologies

Our study includes clones at the granularity of syntactic blocks at different levels of similarity. Type-1 Clones: Identical pieces of source code with or without variations in whitespaces (i.e., layout) and comments are called Type-1 clones [67]. Type-2 Clones: Type-2 clones are syntactically identical code fragments with variations in the names of identifiers, literals, types, layout and comments [67]. Type-3 Clones: Code fragments, which exhibit similarities as of Type-2 clones and also allow further differences such as additions, deletions or modifications of statements, are known as Type-3 clones [67]. Notice that, by the definitions above, Type-2 clones include Type-1 while Type-3 clones include both Type-1 and Type-2. Let $T_1$, $T_2$, and $T_3$ respectively denote the sets of Type-1, Type-2, and Type-3 clones in a software system. Mathematically, $T_1 \subseteq T_2 \subseteq T_3$. Thus, we further define two subsets of Type-2 and Type-3 clones as follows. Pure Type-2 Clones: The set of pure Type-2 clones includes only those Type-2 clones that do not exhibit Type-1 similarity. Mathematically, $T_2^p = T_2 - T_1$, where $T_2^p$ denotes the set of pure Type-2 clones. Pure Type-3 Clones: The set of pure Type-3 clones includes only those Type-3 clones, which do not exhibit similarities at the levels of Type-1 or Type-2 clones. Mathematically, $T_3^p = T_3 - T_2$, where $T_3^p$ denotes the set of pure Type-3 clones.
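The pure clone categories are plain set differences, which the following minimal Python sketch makes explicit. The clone identifiers are hypothetical; in practice the per-type sets come from the clone detector's reports.

# Each set holds identifiers of detected clone fragments (illustrative values).
type1 = {"c1", "c2"}
type2 = {"c1", "c2", "c3"}          # Type-2 includes all Type-1 clones
type3 = {"c1", "c2", "c3", "c4"}    # Type-3 includes all Type-1 and Type-2 clones

pure_type2 = type2 - type1          # T2^p = T2 - T1
pure_type3 = type3 - type2          # T3^p = T3 - T2
print(pure_type2, pure_type3)       # {'c3'} {'c4'}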

11.2.2 Metrics

The most important metrics used in this study are defined in terms of density of vulnerabilities with respect to (w.r.t.) syntactic blocks of code (BOC) as well as w.r.t. lines of code (LOC). Note that only source lines of code are taken into consideration excluding comments and blank lines.

Density of vulnerabilities w.r.t. BOC in category $x$ clones, denoted as $\delta_x$, is defined as the ratio of the number of vulnerabilities found in clones of category $x$ and the number of clones of category $x$, where category $x \in \{T_1, T_2, T_3, T_2^p, T_3^p\}$. Mathematically,

$$\delta_x = \frac{\nu_x}{\eta_x} \qquad (11.1)$$

Table 11.1: Abridged Description of Major Vulnerabilities Found in the Subject Systems

LawOfDemeter (LD): Program unit needing too much knowledge about other units.
LocalVariableCouldBeFinal (LVF): Local variable assigned only once but not declared final.
ShortVariable (SV): A field, local, or parameter with a too short name.
OnlyOneReturn (OOR): Method with more than one exit point.
IfStmtsMustUseBraces (ISB): 'if' statements without accompanying curly braces.
AssertionsShouldIncludeMessage (AIM): Assertions including no error message.
UselessParentheses (UP): Useless parentheses in code.
IfElseStmtsMustUseBraces (IEB): 'if-else' statements without accompanying curly braces.
AvoidInstantiatingObjectsInLoops (AOL): Instantiation of new objects inside a loop.
NullAssignment (NA): Assignment of "null" to a variable (outside of its declaration).
ConfusingTernary (CT): Use of negation within an 'if' expression in an 'if-else' statement.
AvoidLiteralsInIfCondition (ALC): Use of hard-coded literals in conditional statements.
MethodShouldUseAnnotation (UA): Missing annotations for methods.
DataflowAnomaly (DA): Local definitions and references to variables on different paths.
ModifiedCyclomaticComplexity (MCC): A variant of Cyclomatic complexity, which treats switch statements as a single decision point.
TooManyMethods (TMM): A class with too many methods.
NPathComplexity (NPC): Too high a number of acyclic execution paths through a method.
AvoidCatchingGenericException (ACE): Use of a higher-level exception in catching low-level error conditions.
CommentRequired (CR): Missing required comment for specific language elements.
MethodArgumentCouldBeFinal (MAF): Non-final method argument that is never assigned to.
CommentSize (CS): Dimensions of non-header comments exceeding specified limits.
BeanMembersShouldSerialize (BMS): Class's member variables not marked as transient, static, or missing accessor methods.
VariableNamingConventions (VNC): Names of final variables not fully capitalized, or use of underscores in names of non-final variables.
LongVariable (LV): Too long a name for a field, method or local variable.
FieldDeclarationsShouldBeAtStartOfClass (FDC): Class's member fields not declared at the top of the class.
DefaultPackage (DP): Use of default package-private accessibility instead of explicit scoping.
UnusedModifier (UM): Use of modifiers in a place in the code where they will be ignored by the compiler.
RedundantFieldInitializer (RFI): Unnecessary explicit initialization of class's member fields.
ImmutableField (IMF): Class's private fields whose values never change once they are initialized but are not made final.

where $\nu_x$ denotes the number of vulnerabilities found in clones of category $x$ and $\eta_x$ denotes the total number of clones of category $x$.

Density of vulnerabilities w.r.t. BOC in all clones, denoted as $\delta_c$, is defined as the ratio of the number of vulnerabilities found in all clones and the total number of all the clones. Mathematically,

$$\delta_c = \frac{\nu_c}{\eta_c} \qquad (11.2)$$
where $\nu_c$ denotes the number of vulnerabilities found in all clones and $\eta_c$ denotes the total number of clones of all categories.

Density of vulnerabilities w.r.t. BOC in non-cloned code, denoted as $\delta_{\bar{c}}$, is defined as the ratio of the number of vulnerabilities found in non-cloned code and the number of non-cloned blocks of code. Mathematically,

$$\delta_{\bar{c}} = \frac{\nu_{\bar{c}}}{\eta_{\bar{c}}} \qquad (11.3)$$
where $\nu_{\bar{c}}$ denotes the number of vulnerabilities found in non-cloned code and $\eta_{\bar{c}}$ denotes the total number of non-cloned blocks of code.
Density of vulnerabilities w.r.t. LOC in category $x$ clones, denoted as $\delta_x^l$, is defined as the ratio of the number of vulnerabilities found in clones of category $x$ and the number of LOC in all the clones of category $x$, where category $x \in \{T_1, T_2, T_3, T_2^p, T_3^p\}$. Mathematically,

$$\delta_x^l = \frac{\nu_x}{l_x} \qquad (11.4)$$
where $\nu_x$ denotes the number of vulnerabilities found in clones of category $x$ and $l_x$ denotes the total number of LOC in clones of category $x$.
Density of vulnerabilities w.r.t. LOC in all clones, denoted as $\delta_c^l$, is defined as the ratio of the number of vulnerabilities found in all clones and the number of LOC in all the clones. Mathematically,

$$\delta_c^l = \frac{\nu_c}{l_c} \qquad (11.5)$$
where $\nu_c$ denotes the number of vulnerabilities found in all clones and $l_c$ denotes the total number of LOC in all clones.
Density of vulnerabilities w.r.t. LOC in non-cloned code, denoted as $\delta_{\bar{c}}^l$, is defined as the ratio of the number of vulnerabilities found in non-cloned code and the number of LOC in all non-cloned blocks of code. Mathematically,

$$\delta_{\bar{c}}^l = \frac{\nu_{\bar{c}}}{l_{\bar{c}}} \qquad (11.6)$$
where $\nu_{\bar{c}}$ denotes the number of vulnerabilities found in non-cloned code and $l_{\bar{c}}$ denotes the total number of LOC in non-cloned blocks of code.

Figure 11.1: Procedural Steps of the Empirical Study (for each subject system, code blocks are extracted, clones are detected with NiCad and vulnerabilities with PMD, and the resulting non-cloned blocks and cloned blocks of Type-1, Type-2, Type-3, pure Type-2, and pure Type-3 are analyzed to produce the findings)

Density of a particular vulnerability $v$ in cloned code, denoted as $d_c(v)$, is calculated by dividing the number of instances of $v$ found in cloned code by the total number of LOC in cloned code. Mathematically,

$$d_c(v) = \frac{v_c}{l_c} \qquad (11.7)$$
where $v_c$ denotes the number of instances of vulnerability $v$ found in cloned code and $l_c$ denotes the total number of LOC across all clones.
Density of a particular vulnerability $v$ in non-cloned code, denoted as $d_{\bar{c}}(v)$, is calculated by dividing the number of instances of $v$ found in non-cloned code by the total number of LOC in non-cloned blocks of code. Mathematically,

$$d_{\bar{c}}(v) = \frac{v_{\bar{c}}}{l_{\bar{c}}} \qquad (11.8)$$
where $v_{\bar{c}}$ denotes the number of instances of vulnerability $v$ found in non-cloned code and $l_{\bar{c}}$ denotes the total number of LOC in all non-cloned blocks of code.
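The density metrics above are simple ratios of vulnerability counts to block counts or LOC counts. The following Python sketch computes them from per-category counts; all numbers are illustrative placeholders, not values from the study.

def density(n_vulnerabilities, size):
    # "size" is a block count for the BOC-based densities or an LOC count
    # for the LOC-based densities.
    return n_vulnerabilities / size if size else 0.0

# Illustrative per-category counts: (vulnerabilities, clone blocks, LOC).
counts = {
    "Type-1":      (120, 400, 3_500),
    "Pure Type-2": (80, 350, 3_000),
    "Pure Type-3": (900, 2_600, 24_000),
}
for category, (vulns, blocks, loc) in counts.items():
    print(category,
          "density w.r.t. BOC:", round(density(vulns, blocks), 4),
          "density w.r.t. LOC:", round(density(vulns, loc), 4))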

11.3 Study Setup

The procedural steps of our empirical study are summarized in Figure 11.1.

11.3.1 Subject Systems

Our study investigates the source code of 97 software systems of the Qualitas Corpus [281], which is a large curated collection of open-source systems from diverse application domains, written in Java.

164 11.3.2 Clone Detection

Using the NiCad [138] clone detector (version 3.5), we separately detect code clones (with at least five LOC) in each of the subject systems. The parameter settings of NiCad used in our study are listed in Table 11.2. With these settings, NiCad detects Type-1, Type-2, and Type-3 clones. Further details on NiCad's tuning parameters and their influences on clone detection can be found elsewhere [138]. Then, we compute the pure Type-2 and pure Type-3 clones in accordance with their specifications outlined in Section 11.2.

Table 11.2: NiCad Settings For Code Clone Detection

Type-1: Dissimilarity Threshold = 0%; Identifier Renaming = No Rename
Type-2: Dissimilarity Threshold = 0%; Identifier Renaming = Blind Rename
Type-3: Dissimilarity Threshold = 30%; Identifier Renaming = No Rename

11.3.3 Vulnerability Detection

For the detection of vulnerabilities in source code, we use PMD (version 5.3.2) [12], which applies a static rule-based approach for source code analysis and identification of potential vulnerabilities in a software system. For vulnerability detection, we execute PMD from the command line interface and feed to it a set of rules, which is the default rule-set packaged with the Eclipse plugin variant of the tool. All other parameters of PMD are set to the defaults. Using PMD, we separately detect vulnerabilities in each of the subject systems in our study.
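For reference, tallying the detected vulnerabilities can be done from the analyzer's exported report. The Python sketch below counts violations per rule from a CSV report; the column names ("File", "Line", "Rule") and the report file name are assumptions for illustration rather than PMD's exact output format.

import csv
from collections import Counter

def count_rule_violations(report_path):
    # Tally how many times each rule (vulnerability type) is reported.
    counts = Counter()
    with open(report_path, newline="") as report:
        for row in csv.DictReader(report):
            counts[row["Rule"]] += 1
    return counts

# Example usage (the report file is hypothetical):
# print(count_rule_violations("pmd_report.csv").most_common(10))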

11.4 Analysis and Findings

Upon detection of the clones and vulnerabilities, for each of the subject systems, we identify the co-locations of code clones and vulnerabilities, distinguish the vulnerabilities located in the non-cloned portion of code, and compute all the metrics described in Section 11.2. To verify the statistical significance of the results derived from our quantitative analysis, we also apply the statistical Mann-Whitney-Wilcoxon (MWW) test [234] with $\alpha = 0.05$. The non-parametric MWW test does not require a normal distribution of data, and thus it suits well for our purpose.
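The co-location step amounts to checking whether each reported vulnerability's (file, line) position falls inside any detected clone block's line range. The following Python sketch shows that check with illustrative data structures; it is a simplified view of the bookkeeping, not the exact implementation used in the study.

def in_cloned_code(vulnerability, clone_blocks):
    # vulnerability: (file, line); clone_blocks: (file, start_line, end_line) tuples.
    file, line = vulnerability
    return any(f == file and start <= line <= end for f, start, end in clone_blocks)

clone_blocks = [("Foo.java", 10, 42), ("Bar.java", 5, 19)]                 # from the clone detector
vulnerabilities = [("Foo.java", 15), ("Foo.java", 80), ("Bar.java", 7)]    # from the static analyzer

cloned = [v for v in vulnerabilities if in_cloned_code(v, clone_blocks)]
non_cloned = [v for v in vulnerabilities if not in_cloned_code(v, clone_blocks)]
print(len(cloned), len(non_cloned))   # 2 1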

Figure 11.2: Distribution of Vulnerabilities in Cloned and Non-cloned Code

Figure 11.3: Distribution of LOC in Cloned and Non-cloned Code

Figure 11.4: Densities of Vulnerabilities w.r.t. BOC

Figure 11.5: Density of Vulnerabilities w.r.t. LOC

11.4.1 Comparative Vulnerability of Cloned vs. Non-Cloned Code

Figure 11.2 presents how the total number of vulnerabilities is distributed over non-cloned code and the different types of clones across all the systems. As seen in Figure 11.2, 77% of all vulnerabilities are found in non-cloned source code, whereas the clones contain only 23% of the vulnerabilities. The box-plot in Figure 11.4 presents the densities of vulnerabilities w.r.t. BOC (computed using Equation 11.1, Equation 11.2, and Equation 11.3) found in non-cloned code and in different types of clones over all the subject systems. The 'x' marks in the boxes indicate the mean densities over all the systems. As seen in Figure 11.4, the density of vulnerabilities (w.r.t. BOC) in non-cloned blocks is much higher than that in clones. Indeed, a larger portion of source code is likely to contain more vulnerabilities than a smaller portion of source code, which might be a reason why non-cloned code seems to have more vulnerabilities as observed in Figure 11.2 and Figure 11.4. To verify this possibility, we compute the distribution of LOC in non-cloned code and different types of clones over all the systems, as presented in Figure 11.3. Notice that the distribution of vulnerabilities (Figure 11.2) is very similar to the distribution of LOC (Figure 11.3) in non-cloned code and different types of clones. The average LOC in cloned and non-cloned blocks over all the systems are found to be 8.43 and 19.03 respectively. In the subject systems used in our study, 74% of the source code is clone-free over all the systems, as portrayed in Figure 11.3. Thus, the possibility of influence of code size (in terms of LOC) on the number or density of vulnerabilities w.r.t. BOC is found to be true. We, therefore, perform a deeper investigation using the densities of vulnerabilities w.r.t. LOC. The box-plot in Figure 11.5 presents the densities of vulnerabilities w.r.t. LOC (computed using Equation 11.4, Equation 11.5, and Equation 11.6) found in non-cloned code and in different types of clones over all the subject systems. Figure 11.5 shows that the densities of vulnerabilities (w.r.t. LOC) in cloned and non-cloned code are almost equal (with a mean difference of only 0.02). Thus, it appears that there is no significant difference in the density of vulnerabilities w.r.t. LOC in cloned

versus non-cloned code. An MWW test (P = 0.28, P > $\alpha$) over the distributions of densities of vulnerabilities (w.r.t. LOC) across all the subject systems also confirms this finding. Therefore, we derive the answer to RQ1 as follows: Ans. to RQ1: Cloned code is NOT more vulnerable than non-cloned code. Rather, a higher number of vulnerabilities can be found in non-cloned code due to its larger size (in terms of LOC) as compared to cloned code.

11.4.2 Comparative Vulnerability of Different Types of Clones

The distribution of vulnerabilities portrayed in Figure 11.2 shows that the pure Type-2 clones have the fewest vulnerabilities, whereas the number of vulnerabilities found in Type-1 clones is slightly higher than that in pure Type-2 clones. The vulnerabilities found in the cloned portion of source code are dominated by those found in pure Type-3 clones. However, the majority of cloned LOC are also in pure Type-3 clones, as can be seen in Figure 11.3, which might be a reason why a higher number of vulnerabilities are found in clones of this particular category. The box-plot in Figure 11.4 indicates that the density of vulnerabilities w.r.t. BOC is higher in Type-1 clones as compared to pure Type-2 and pure Type-3. MWW tests between the distributions of vulnerabilities (w.r.t. BOC) in each two of the three categories of clones also suggest statistical significance in the differences, except for the case of vulnerabilities in pure Type-2 and pure Type-3 clones. The results of the MWW tests are presented in Table 11.3.

Table 11.3: MWW tests over densities of vulnerabilities w.r.t. BOC in different categories of clones

Type-1 vs. Pure Type-2: P = 0.0
Type-1 vs. Pure Type-3: P = 0.0
Pure Type-2 vs. Pure Type-3: P = 0.7641

In Figure 11.5, the differences in the densities of vulnerabilities w.r.t. LOC in the three categories of clones are relatively higher, and the density is the highest in Type-1 clones while lowest in pure Type-3. MWW tests between the distributions of vulnerabilities (w.r.t. LOC) in each pair of the three categories of clones also suggest statistical significance in the differences. The results of the MWW tests are presented in Table 11.4.

Table 11.4: MWW tests over densities of vulnerabilities w.r.t. LOC in different categories of clones

Type-1 vs. Pure Type-2: P = 0.0173
Type-1 vs. Pure Type-3: P = 0.0
Pure Type-2 vs. Pure Type-3: P = 0.0

Based on the findings, we now answer RQ2 as follows:

Ans. to RQ2: Although the clones larger in size (w.r.t. LOC) are more vulnerable than smaller clones, in general, Type-1 clones are the most vulnerable while pure Type-3 clones are the least vulnerable, and pure Type-2 clones fit in between.
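The statistical check behind these comparisons is the Mann-Whitney-Wilcoxon test over per-system density distributions. The Python sketch below runs it with SciPy on made-up density values, using the same alpha of 0.05; the numbers are illustrative only.

from scipy.stats import mannwhitneyu

# Per-system vulnerability densities for two clone categories (made-up values).
densities_type1 = [0.31, 0.28, 0.35, 0.40, 0.27, 0.33]
densities_pure_type3 = [0.22, 0.19, 0.25, 0.21, 0.18, 0.20]

statistic, p_value = mannwhitneyu(densities_type1, densities_pure_type3,
                                  alternative="two-sided")
print(f"U = {statistic}, p = {p_value:.4f}, significant = {p_value < 0.05}")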

11.4.3 Relatively Frequent Vulnerabilities

To address the third research question (i.e., RQ3), we compute the densities of each individual vulnerability separately in cloned and non-cloned code (over all the subject systems) using Equation 11.7 and Equation 11.8 respectively. Then we distinguish the 20 vulnerabilities that have the highest densities in cloned code over all the subject systems. Let $\mathcal{V}_c$ denote the set of these 20 vulnerabilities. Similarly, we form another set $\mathcal{V}_{\bar{c}}$ consisting of the 20 vulnerabilities having the highest densities in non-cloned code. By the union of these two sets we obtain a set $\mathcal{V}$ of 29 vulnerabilities that have the highest densities across both cloned and non-cloned code. Mathematically, $\mathcal{V} = \mathcal{V}_c \cup \mathcal{V}_{\bar{c}}$. Short descriptions of these 29 vulnerabilities are given in Table 11.1; further elaborations can be found in [12]. The densities of each of these 29 vulnerabilities in cloned and non-cloned code are presented in Table 11.5. Note that these 29 vulnerabilities represent 85% of the total vulnerabilities in cloned code and 89% of the vulnerabilities found in non-cloned code over all the subject systems. Next, we want to partition these vulnerabilities into three clusters: one with vulnerabilities dominating in cloned code, another with those dominating in non-cloned code, and a third cluster with vulnerabilities that appear almost equally in both cloned and non-cloned code. Cluster Analysis: For the purpose of the aforementioned partitioning, we conduct a clustering analysis on the densities of these vulnerabilities in cloned and non-cloned code. For each $v$ of these 29 vulnerabilities, we compute a ratio $\rho(v)$ as follows:

$$\rho(v) = \frac{d_c(v)}{d_{\bar{c}}(v)}, \text{ where } v \in \mathcal{V} \qquad (11.9)$$
The ratios computed for each of the 29 vulnerabilities are presented in the second column from the right in Table 11.5. Notice that, for a particular vulnerability $v$, a ratio $\rho(v)$ close to 1.0 indicates that the vulnerability $v$ appears almost equally in both cloned and non-cloned code. If $\rho(v)$ is much higher than 1.0, the appearance of vulnerability $v$ can be characterized as dominating in cloned code. Similarly, $\rho(v)$ being much lower than 1.0 implies that the vulnerability $v$ appears more in non-cloned code. However, a threshold scheme seems required to determine when the value of $\rho(v)$ can be considered significantly close to or distant from 1.0. Instead of setting an arbitrary threshold ourselves, we apply unsupervised Hierarchical Agglomerative Clustering [282] for partitioning the values of $\rho(v)$. The dendrogram produced from this statistical clustering is presented in Figure 11.6. In the dendrogram, three major clusters are evident, two marked with dotted rectangles and the third left unmarked in the middle. The values of $\rho(v)$ for the vulnerabilities in the middle cluster range between 0.97 and 1.92. This middle cluster

Table 11.5: Vulnerabilities Dominating in Cloned and Non-cloned Code
(columns: vulnerability, density in cloned code $d_c(v)$, density in non-cloned code $d_{\bar{c}}(v)$, ratio $\rho(v) = d_c(v)/d_{\bar{c}}(v)$, high-frequency area)

LD: 0.1489, 0.10547, 1.4118, both clone and non-clone code
LVF: 0.08561, 0.05885, 1.4547, both clone and non-clone code
SV: 0.02258, 0.02149, 1.0507, both clone and non-clone code
OOR: 0.03129, 0.01625, 1.9255, both clone and non-clone code
ISB: 0.02124, 0.01605, 1.3234, both clone and non-clone code
AIM: 0.00968, 0.00707, 1.3692, both clone and non-clone code
UP: 0.00822, 0.00602, 1.3654, both clone and non-clone code
IEB: 0.00472, 0.00485, 0.9732, both clone and non-clone code
AOL: 0.00428, 0.00301, 1.4219, both clone and non-clone code
NA: 0.00423, 0.00258, 1.6395, both clone and non-clone code
CT: 0.00417, 0.00239, 1.7448, both clone and non-clone code
ALC: 0.00406, 0.00274, 1.4818, both clone and non-clone code
UA: 0.00389, 0.00218, 1.7844, both clone and non-clone code
DA: 0.0563, 0.02362, 2.3836, clone code only
MCC: 0.01026, 0.0013, 7.8923, clone code only
TMM: 0.00421, 0.0002, 21.0500, clone code only
NPC: 0.00398, 0.00087, 4.5747, clone code only
ACE: 0.00389, 0.00127, 3.0630, clone code only
CR: 0.03638, 0.08105, 0.4489, non-clone code only
MAF: 0.04668, 0.08102, 0.5762, non-clone code only
CS: 0.00227, 0.04542, 0.0500, non-clone code only
BMS: 0.00006, 0.02532, 0.0024, non-clone code only
VNC: 0.00277, 0.01866, 0.1484, non-clone code only
LV: 0.00309, 0.01727, 0.1789, non-clone code only
FDC: 0.00002, 0.00872, 0.0023, non-clone code only
DP: 0.0033, 0.00831, 0.3971, non-clone code only
UM: 0.00002, 0.00517, 0.0039, non-clone code only
RFI: 0.00004, 0.00467, 0.0086, non-clone code only
IMF: 0.00001, 0.00444, 0.0023, non-clone code only

Figure 11.6: Hierarchical Agglomerative Clustering of Vulnerabilities

This middle cluster includes the set of vulnerabilities that appear almost equally in both cloned and non-cloned code; let G_b denote this cluster. For all of the five vulnerabilities (i.e., MCC, NPC, DA, ACE, and TMM) in the right-most cluster, r(v) ≥ 2.38, which indicates that these vulnerabilities appear more frequently in cloned code than in non-cloned code. Let G_c denote the cluster of these vulnerabilities. The left-most cluster, denoted as G_c̄, includes the vulnerabilities with r(v) < 0.97, which are found more frequently in non-cloned code. The grouping in Table 11.5 labels the vulnerabilities in accordance with how they are clustered here.

Statistical Significance: For each of the three clusters of vulnerabilities, we separately conduct MWW tests between the densities of those vulnerabilities in cloned and non-cloned code to determine the statistical significance of the difference in their presence in those two categories (i.e., cloned and non-cloned) of code. The results of the separate MWW tests over each of the clusters are presented in Table 11.6.
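For illustration, the clustering step described above can be reproduced with off-the-shelf tooling. The following minimal sketch assumes the ratio values r(v) have already been extracted from Table 11.5 (only a subset is shown), and the use of SciPy together with the variable names is our own illustrative choice, not part of the original analysis pipeline.

```python
# Hierarchical agglomerative clustering of the ratios r(v) from Table 11.5.
# Assumption: `ratios` maps a vulnerability to r(v) = d_c(v) / d_c_bar(v).
from scipy.cluster.hierarchy import linkage, fcluster

ratios = {
    "TMM": 21.05, "MCC": 7.8923, "NPC": 4.5747, "ACE": 3.0630, "DA": 2.3836,  # clone-dominated
    "LD": 1.4118, "OOR": 1.9255, "IEB": 0.9732,                               # part of the "both" group
    "CR": 0.4489, "CS": 0.0500, "BMS": 0.0024,                                # non-clone-dominated
}

names = list(ratios)
values = [[ratios[n]] for n in names]        # one-dimensional feature: the ratio itself

tree = linkage(values, method="average")     # average-linkage agglomerative clustering
labels = fcluster(tree, t=3, criterion="maxclust")   # cut the dendrogram into 3 clusters

for cluster_id in sorted(set(labels)):
    print(cluster_id, [n for n, l in zip(names, labels) if l == cluster_id])
```

Cutting the dendrogram into three clusters in this way corresponds to the partition into G_c, G_c̄, and G_b described above.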

Table 11.6: MWW tests between density distributions in cloned and non-cloned code for the vulnerabilities of each cluster

Cluster     G_c      G_c̄      G_b
P-value     0.0474   0.0024   0.3575

The P-values in Table 11.6 indicate that the differences in the density distributions between cloned and non-cloned code are statistically significant for the vulnerabilities in clusters G_c and G_c̄, but not for those in cluster G_b. Thus, our clustering of the vulnerabilities is confirmed to be accurate with statistical significance. Now, we answer the research question RQ3 as follows: Ans. to RQ3: There are distinct sets of vulnerabilities (as characterized in Table 11.5), which frequently appear in cloned code or non-cloned code, while many other vulnerabilities are found to be equally present in both cloned and non-cloned code.
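For concreteness, the per-cluster MWW comparison can be sketched as follows. The density values are taken from Table 11.5 for cluster G_c; the exact p-value obtained may differ slightly from Table 11.6 depending on the tie handling and test variant used, and the use of SciPy is our own illustrative choice.

```python
# MWW (Mann-Whitney U) test comparing clone vs. non-clone density distributions
# for the vulnerabilities of one cluster (here G_c: MCC, NPC, DA, ACE, TMM).
from scipy.stats import mannwhitneyu

clone_densities     = [0.01026, 0.00398, 0.05630, 0.00389, 0.00421]  # d_c(v)
non_clone_densities = [0.00130, 0.00087, 0.02362, 0.00127, 0.00020]  # d_c_bar(v)

stat, p_value = mannwhitneyu(clone_densities, non_clone_densities,
                             alternative="two-sided")
print(f"U = {stat}, p = {p_value:.4f}")   # significant at alpha = 0.05 if p < 0.05
```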

11.5 Threats to Validity

In this section, we discuss possible threats to the validity of our study and how we have mitigated their effects.

Construct Validity: In the detection of vulnerabilities with PMD, we used its default settings and relied on the set of rules that came with the Eclipse plug-in variant of the tool. That set of rules might not cover all possible vulnerabilities, and we considered each vulnerability equally important. In the selection of the two dominant sets of vulnerabilities (Section 11.4.3), we picked the top 20 vulnerabilities for each set having the highest densities in cloned and non-cloned code, respectively. Although those chosen vulnerabilities cover more than 80% of all distinct vulnerabilities and more than 85%

of the instances of vulnerabilities found in all the systems, this choice may still be considered a threat to the validity of this work.

Internal Validity: The clone detector NiCad, used in our study, is reported to be very accurate in clone detection [138], and we have carefully set NiCad's parameters. PMD, the tool used in our work for vulnerability detection, is also known to be effective and is widely used in both industry and the research community. However, 15 out of the 112 systems in the Qualitas Corpus [281] could not be processed by either NiCad or PMD; those 15 systems are excluded from our study. Moreover, we manually verified the correctness of the computations for all the metrics used in our work. Thus, we have high confidence in the internal validity of this study.

External Validity: Although our study includes a large number of subject systems, all the systems are open-source and written in Java. Thus, the findings from this work may not generalize to industrial systems or to source code written in languages other than Java.

Reliability: The methodology of this study, including the procedure for data collection and analysis, is documented in this chapter. The subject systems, being open-source, are freely accessible, while the tools NiCad and PMD are also available online. Therefore, it should be possible to replicate the study.

11.6 Related Work

It is often believed that inconsistent changes to clones cause program faults and that frequent changes may lead to significant instances of inconsistent changes [83, 72]. Thus, to develop an understanding of the fault-proneness of code clones, studies have been conducted to examine the stability (in terms of frequency and sizes of changes) and the inconsistent changes in evolving code clones. Juergens et al. [72] reported that inconsistent changes to clones are very frequent and that a significant number of faults are induced by such inconsistent changes. Barbour et al. [83] suggested that late propagations due to inconsistent changes are prone to introduce software defects. While Lozano and Wermelinger [73] suggested that having a clone may increase the maintenance effort for changing a method, Hotta et al. [79] reported code clones not to have any negative impact on software changeability. Lozano et al. [91] reported that a vast majority of methods experience larger and more frequent changes when they contain cloned code. Mondal et al. [82] also reported code clones to be less stable. However, opposite results are found in other studies [283, 90, 78, 80]. Attempts have also been made to explore the fault-proneness of clones by relating them to bug-fixing changes obtained from commit history. Such a study was conducted by Jingyue et al. [87], who reported that only 4% of the bugs were found in duplicated code. In a similar study, Rahman et al. [25] also observed that the majority of bugs were not significantly associated with clones. These findings contradict those of Juergens et al. [72] and Barbour et al. [83].

The contradictory results from the earlier studies imply the necessity of further comparative investigations from a different dimension, which is exactly what we have done in this study. We have carried out a comparative investigation of vulnerabilities in cloned and non-cloned code, which was missing in the literature. In addition, the comparative analysis of vulnerabilities in Type-1, pure Type-2, and pure Type-3 clones is another unique aspect of our work.

11.7 Summary

In this chapter, we have presented a quantitative empirical study of the vulnerabilities (in terms of bad coding patterns) in different types of code clones and in non-cloned code across 97 open-source software systems written in Java. To the best of our knowledge, no earlier work has conducted a comparative study of such vulnerabilities in cloned and non-cloned code as done in our work. In our study, we have found no significant differences between the densities of vulnerabilities in code clones and in clone-free source code. Surprisingly, among the three categories (i.e., Type-1, pure Type-2, and pure Type-3) of clones studied in our work, Type-1 clones are found to be the most vulnerable whereas pure Type-3 clones are the least. In addition, our study identifies a set of five vulnerabilities that appear more frequently in cloned code than in non-cloned code. Another set of 11 vulnerabilities is also distinguished, which are more frequently found in non-cloned code as opposed to cloned code. The results are validated in the light of statistical significance.

The findings from this study significantly advance our understanding of the characteristics, impacts, and implications of code clones in software systems. These findings can help in identifying problematic clones, which demand extra care, and those vulnerabilities about which developers need to be particularly cautious while reusing code by cloning. For example, since Type-1 clones are found to be the most vulnerable, and refactoring Type-1 clones can be expected to be easier (due to the absence of differences among them) than refactoring other types of clones, we argue that this particular type of clones should be removed from source code by frequent refactoring.

Chapter 12

Security Vulnerabilities in Clones and Non-Cloned Code

In the last chapter (Chapter 11), we presented a comparative study of different types of clones (i.e., copy-pasted code) and non-cloned code on the basis of their code smells. Although earlier studies examined the bug-proneness, stability, and changeability of clones against non-cloned code, the security aspects remained ignored. In this chapter, we present a study to explore and understand the security vulnerabilities and their severity in different types of clones compared to non-cloned code. The rest of the chapter is organized as follows. In Section 12.1, we introduce the motivation and context of the work. Section 12.2 introduces the terms and metrics used in this study. Section 12.3 describes the procedure of our study. Analyses and findings are presented in Section 12.4. Section 12.5 presents the possible threats to the validity of this work. In Section 12.6, we discuss the existing work relevant to ours. Finally, Section 12.7 concludes the chapter with directions for future work.

12.1 Introduction

Software security has become one of the most pressing concerns recently. Software developers are expected to write secure source code and minimize security vulnerabilities in the systems under development. However, the developers' copy-paste practice for code reuse often causes the multiplication and propagation of program faults and security vulnerabilities [95, 72, 75]. Thus, there is a possibility that such vulnerabilities exist in cloned code at a higher rate. A code clone (i.e., similar or duplicated code) is already identified as a notorious code smell (i.e., a symptom indicating a source of future problems) [74] that causes other problems such as reduced code quality, code inflation, and change difficulties [72, 73, 284, 75]. Nevertheless, software systems typically have 9%-17% [68] cloned code, and the proportion is sometimes found to be 50% [69] or even higher [70]. Clones are arguably a major contributor to the high maintenance cost of software systems, and as much as 80% of software costs are spent on maintenance [76]. Thus, to understand the characteristics and contexts of the detrimental impacts of clones, earlier studies examined the comparative stability of clones as opposed to non-cloned code [78, 79, 80, 81, 82], relationships of clones with bug-fixing

changes [83, 84, 85, 72, 86, 87, 25, 88, 89], the impacts of clones on programs' changeability [90, 73, 91], and the comparative fault-proneness of cloned and non-cloned code [95, 285]. However, the security aspects of code clones have never been studied before, although the reuse of vulnerable components and source code (i.e., code clones) multiplies security vulnerabilities [92, 93, 94].

This chapter presents a large empirical study of the security vulnerabilities in code clones and a comparative analysis of these vulnerabilities in different types of clones as opposed to non-cloned code. In particular, we address the following five research questions.

RQ1: Do code clones contain a higher number of security vulnerabilities than non-cloned code, or vice versa? The answer to this question will add to our understanding of the negative impacts of clones, which will be useful in cost-benefit analysis [286] for improved clone management.

RQ2: Do clones of a certain category contain more security vulnerabilities than others? If a certain type of clone is found to have a higher number of security vulnerabilities, those clones will be high-priority candidates for removal or careful maintenance.

RQ3: Do code clones contain more severe (i.e., riskier) vulnerabilities compared to non-cloned code, or vice versa? Not all security vulnerabilities are equally risky in terms of security threat. The result of this research question will advance our understanding of clones' impacts and will be useful in cost-effective clone management [286].

RQ4: Do clones of a certain category contain relatively severe (i.e., riskier) security vulnerabilities than others? If a certain type of clone is found to have riskier security vulnerabilities, those clones will demand special attention and high priority in the clone-removal process.

RQ5: Can we distinguish some security vulnerabilities that appear more frequently in cloned code as opposed to non-cloned code? If we can distinguish a set of vulnerabilities that appear more frequently in code clones, the finding will help software developers stay cautious of such vulnerabilities while cloning source code. In addition, that particular set of vulnerabilities can be minimized by clone refactoring.

To answer the aforementioned research questions, we conduct a large-scale empirical study over 8.7 million lines of source code in 34 open-source software systems written in C. Using a wide range of metrics and characterization criteria, we carry out in-depth quantitative analyses of the source code of the systems with respect to different categories of code clones, non-cloned code, and a set of security vulnerabilities. In this regard, this work makes the following two contributions:

• We present a large-scale comparative study of the security vulnerabilities in code clones and non-cloned code. To the best of our knowledge, no such study of the security vulnerabilities in code clones exists in the literature.

• We also perform a comparative analysis of security vulnerabilities in different types of clones (e.g., Type-1, Type-2, and Type-3), which informs the relative security implications of clones at different similarity levels.

12.2 Terminology and Metrics

In this section, we describe and define the terminologies and metrics that are used in our work. Some of the metrics are adapted from the literature [95, 67].

12.2.1 Security Vulnerabilities

A software security vulnerability is defined as a weakness in a software system that can lead to a compromise of the integrity, availability, or confidentiality of that system. For example, buffer overflow and dangling pointers are two well-known security vulnerabilities. The cyber security community maintains a community-developed list of common software security vulnerabilities in which each category of vulnerability is enumerated with a CWE (Common Weakness Enumeration) number [108]. For example, CWE-120 refers to those vulnerabilities that fall into the CWE category of classic buffer overflow. More examples of security vulnerabilities along with their CWE enumerations are presented in Table 12.1.

12.2.2 Metrics

The required metrics are defined in terms of the density of vulnerabilities with respect to (w.r.t.) syntactic blocks of code (BOC) as well as w.r.t. lines of code (LOC). Only source lines of code are taken into consideration, excluding comments and blank lines.

Let B_c denote the set of all cloned code blocks and B_c̄ denote the set of all non-cloned code blocks. Also let V_c denote the set of vulnerabilities found in B_c and V_c̄ denote the set of vulnerabilities located in B_c̄. A cloned code block in B_c can be of clone category α, where α ∈ {T1, T2, T3, T2p, T3p} (i.e., Type-1, Type-2, Type-3, pure Type-2, and pure Type-3, respectively). Thus, V_c can be split into multiple sets, with V_α denoting the set of vulnerabilities identified in clones of category α.

Density of vulnerabilities w.r.t. BOC in category α clones, denoted as D^b_α, is defined as the ratio of the number of vulnerabilities found in clones of category α to the number of cloned blocks in category α clones. Mathematically,

    D^b_\alpha = \frac{|V_\alpha|}{N_\alpha}    (12.1)

where |V_α| denotes the number of vulnerabilities found in the blocks of category α clones, N_α denotes the total number of block clones of category α, and α ∈ {T1, T2, T3, T2p, T3p}.

Table 12.1: Security vulnerabilities frequently identified in the systems

Buffer Overflow (CWE-120): An application attempts to write data past the end of a buffer.
Uncontrolled Format String (CWE-134): Submitted data of an input string is evaluated as a command by the application.
Integer Overflow (CWE-190): The result of an arithmetic operation exceeds the maximum size of the integer type used to store it.
Null Pointer Dereference (CWE-476): A pointer that is null is dereferenced.
Memory Leak (CWE-401): Allocated memory is not released.
Null Termination Errors (CWE-170): A string is incorrectly terminated.
Concurrency Errors (CWE-362): Concurrent execution uses a shared resource with improper synchronization.
Double Free (CWE-415): The program calls free() twice on the same memory address.
Access Freed Memory (CWE-416): Memory is accessed after it has been freed.
Insecure Randomness (CWE-330): The software may use insufficiently random numbers or values in a security context that depends on unpredictable numbers.
OS Command Injection (CWE-78): Improper neutralization of special elements used in an operating system command.
Insecure Temporary File (CWE-377): Creating and using insecure temporary files can leave application and system data vulnerable to attack.
Reliance on Untrusted Inputs (CWE-807): The input can be modified by an untrusted actor in a way that bypasses the protection mechanism.
Poor Code Quality (CWE-398): An indication that the product has not been carefully developed or maintained.
Dead Code (CWE-561): Code that can never be executed.
Use of Obsolete Functions (CWE-477): The code uses deprecated or obsolete functions, which suggests that the code has not been actively reviewed or maintained.
Risky Cryptography (CWE-327): Information protected, under normal conditions, by SSL/TLS encryption can be stolen.
Unused Variable (CWE-563): The variable's value is assigned but never used.

Density of vulnerabilities w.r.t. BOC in type t code, denoted as D^b_t, is defined as the ratio of the number of vulnerabilities found in type t code to the total number of blocks in type t code, where t ∈ {c, c̄}. Mathematically,

    D^b_t = \frac{|V_t|}{N_t}    (12.2)

where t ∈ {c, c̄} and N_t denotes the total number of blocks in type t code.

Density of vulnerabilities per 1,000 LOC (KLOC) in category α clones, denoted as D^l_α, is defined as follows:

    D^l_\alpha = \frac{|V_\alpha|}{L_\alpha} \times 1000    (12.3)

where L_α denotes the total number of LOC in clones of category α.

Density of vulnerabilities per KLOC in type t code, denoted as D^l_t, is defined as follows:

    D^l_t = \frac{|V_t|}{L_t} \times 1000    (12.4)

where L_t denotes the total number of LOC in type t code.

Risk severity score per KLOC in category α clones, denoted as S_α, is defined as the ratio of the sum of the severity scores of the vulnerabilities found in clones of category α to the number of KLOC in the clones of category α. Mathematically,

    S_\alpha = \frac{\sum_{v \in V_\alpha} s(v)}{L_\alpha} \times 1000    (12.5)

where s(v) denotes the severity score of vulnerability v.

Risk severity score per KLOC in type t code, denoted as S_t, is defined as the ratio of the sum of the severity scores of the vulnerabilities found in type t code to the number of KLOC in that type of code. Mathematically,

    S_t = \frac{\sum_{v \in V_t} s(v)}{L_t} \times 1000    (12.6)

Density of a particular group of vulnerabilities per KLOC in type t code, denoted as D^g_t, is calculated by dividing the number of vulnerabilities of CWE category g found in type t code by the total number of KLOC in that type of code. Mathematically,

    D^g_t = \frac{|V_{g,t}|}{L_t} \times 1000    (12.7)

where |V_{g,t}| denotes the number of vulnerabilities belonging to CWE category g found in type t code.
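As an illustration of how Equations 12.1-12.7 translate into computation, the following sketch computes the densities and severity scores from per-system vulnerability records. The record layout and function names are hypothetical and are not taken from the dissertation's tooling.

```python
# Illustrative computation of the metrics in Equations 12.1-12.7.
# Assumption: each vulnerability is a tuple (cwe, severity, region), where region
# is a clone category ('T1', 'T2p', 'T3p', ...) or 'nonclone'; block and LOC
# counts per region are supplied separately.

def density_per_block(vulns, region, num_blocks):
    """Eq. 12.1 / 12.2: vulnerabilities per syntactic block of code (BOC)."""
    return sum(1 for _, _, r in vulns if r == region) / num_blocks

def density_per_kloc(vulns, region, loc):
    """Eq. 12.3 / 12.4: vulnerabilities per 1,000 lines of code."""
    return sum(1 for _, _, r in vulns if r == region) * 1000 / loc

def severity_per_kloc(vulns, region, loc):
    """Eq. 12.5 / 12.6: cumulative severity score per 1,000 lines of code."""
    return sum(s for _, s, r in vulns if r == region) * 1000 / loc

def cwe_density_per_kloc(vulns, region, cwe, loc):
    """Eq. 12.7: per-CWE-category density per 1,000 lines of code."""
    return sum(1 for c, _, r in vulns if r == region and c == cwe) * 1000 / loc

# Toy example (hypothetical records):
vulns = [("CWE-120", 4, "T1"), ("CWE-401", 2, "nonclone"), ("CWE-476", 3, "T3p")]
print(density_per_kloc(vulns, "T1", loc=5488))
```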

12.3 Study Setup

The procedural steps of our empirical study are summarized in Figure 12.1.


Figure 12.1: Procedural Steps of the Empirical Study

12.3.1 Subject Systems

Our study investigates the source code of 34 open-source software systems written in C. Although some of the systems contain files written in other languages such as C++, Perl, and other scripting languages, we consider only the files with extension '.c' or '.h', so as to exclude source code written in languages other than C. Projects of various sizes are deliberately chosen from different application domains including networking, communication, security, and text editing. Most of these subject systems are well-reputed in their respective development ecosystems (e.g., GitHub and SourceForge) and have been used in earlier research studies [287, 288, 289, 290, 291]. The names of the subject systems and their sizes in LOC in cloned and non-cloned code are presented in Table 12.2. In the computation of sizes, only the source code written in C is considered. The average size of the subject systems is 256 thousand LOC; the largest project consists of 3.4 million LOC and the smallest one contains 16 thousand LOC.
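A minimal sketch of this file-selection step is shown below; the directory layout and helper name are assumptions made only for illustration.

```python
# Collect only the C source and header files of a subject system,
# ignoring C++, Perl, and other scripting-language files.
from pathlib import Path

def c_files(system_dir):
    return [p for p in Path(system_dir).rglob("*")
            if p.is_file() and p.suffix in (".c", ".h")]

# Example usage with a hypothetical checkout path:
# for f in c_files("subject-systems/putty"):
#     print(f)
```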

12.3.2 Code Clone Detection

Using the NiCad [138] clone detector (version 3.5), we separately detect code clones in each of the subject systems at the granularity of syntactic code blocks. The parameter settings of NiCad used in our study are listed in Table 12.3. With these settings, NiCad detects Type-1, Type-2, and Type-3 clones. Further details on NiCad's tuning parameters and their influence on clone detection can be found elsewhere [138]. Then, we compute the pure Type-2 and pure Type-3 clones in accordance with their definitions outlined in Section 12.2, as sketched below.
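The derivation of the pure clone categories can be sketched as follows, assuming each clone block reported by NiCad is identified by its file and line range, and following the usual convention that pure Type-2 clones are Type-2 clones that are not also Type-1, while pure Type-3 clones are Type-3 clones that are neither Type-1 nor Type-2. The function and variable names are ours.

```python
# Derive pure Type-2 and pure Type-3 clone sets from per-type clone reports.
# Assumption: type1, type2, type3 are sets of (file, start_line, end_line)
# identifiers parsed from NiCad's output for each clone type.

def pure_clone_sets(type1, type2, type3):
    pure_type2 = type2 - type1            # Type-2 but not Type-1
    pure_type3 = type3 - type2 - type1    # Type-3 but neither Type-1 nor Type-2
    return pure_type2, pure_type3

t1 = {("a.c", 10, 30)}
t2 = {("a.c", 10, 30), ("b.c", 5, 25)}
t3 = {("a.c", 10, 30), ("b.c", 5, 25), ("c.c", 40, 90)}
p2, p3 = pure_clone_sets(t1, t2, t3)
print(p2)   # {('b.c', 5, 25)}
print(p3)   # {('c.c', 40, 90)}
```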

Table 12.2: Subject systems and their LOC in cloned and non-cloned code (LOC written in C only)

Subject System   Non-clone LOC   Clone LOC   Total LOC
Asn1c            40,487          5,488       45,975
Atlas            369,944         116,586     486,530
Clamav           337,019         40,371      377,390
Claws            239,784         32,868      272,652
Conky            27,759          14,494      42,253
Courier          107,223         13,105      120,328
Emacs            314,521         17,092      331,613
Ettercap         36,268          5,182       41,450
Ffdshow          832,342         46,367      878,709
Freediag         15,225          1,380       16,605
Freedroid        60,828          3,957       64,785
Gedit            42,112          4,331       46,443
Glimmer          29,536          4,923       34,459
Gnuplot          86,851          6,712       93,563
Gretl            302,777         36,499      339,276
Grisbi           99,205          18,479      117,684
Ipsec-tools      60,728          10,222      70,950
Modsecurity      26,332          7,232       33,564
Nedit            85,001          9,506       94,507
Net-snmp         231,422         42,725      274,147
Ocf-linux        45,761          8,743       54,504
Opendkim         47,298          18,480      65,778
Opensc           119,646         11,823      131,469
Putty            78,918          11,322      90,240
Razorback        36,403          8,823       45,226
Sdcc             3,477,010       4,122       3,481,132
Tboot            21,942          6,115       28,057
Tcl8             326,217         37,693      363,910
Tcpreplay        43,191          4,740       47,931
Trousers         56,673          17,599      74,272
Vi               21,924          734         22,658
Vim              314,480         5,752       320,232
XSupplicant      78,441          24,028      102,469
Zabbix           107,475         18,156      125,631

Table 12.3: NiCad Settings for Code Clone Detection

Parameter                 Type-1       Type-2          Type-3
Dissimilarity Threshold   0.0          0.0             0.3
Identifier Renaming       no rename    blind rename    no rename
Granularity               block        block           block
Minimum Clone Size        5 LOC        5 LOC           5 LOC

12.3.3 Security Vulnerability Detection

For the detection of vulnerabilities in source code, we use two open-source static analysis tools: Flawfinder (version 1.3) [9] and Cppcheck (version 1.76.1) [13]. Their ability to detect different sets of vulnerabilities makes them appropriate for our analysis. For example, Flawfinder is capable of detecting vulnerabilities such as Uncontrolled Format String, Integer Overflow, and Use of Risky Cryptographic Algorithm, which Cppcheck fails to detect [292]. On the other hand, Cppcheck is able to detect vulnerabilities such as Memory Leak, Dead Code, and Null Pointer Dereference that Flawfinder cannot detect [292]. We now briefly describe how these tools work for the detection of security vulnerabilities and the ways these tools are used in our study. We also present justifications for choosing these tools for our study.

12.3.3.1 Flawfinder

The tool contains a database of common functions known to be vulnerable. It operates by performing lexical tokenization of the C/C++ code and comparing the tokens with those in the database. Once the comparison is performed, it reports a list of possible vulnerabilities along with a source code line number and a numeric risk level (i.e., severity score) associated with each of the detected vulnerabilities. The severity scores vary from one (indicating little security risk) to five (indicating high risk). Although many other open-source static analysis tools such as RATS [293], SPLINT [14], and ITS4 [294] exist to identify potential security issues, we choose Flawfinder for a number of reasons. This tool is reported to have the highest vulnerability detection rate (i.e., highest recall) among the existing security vulnerability detectors [292, 295, 296, 297]. Moreover, a comparison of vulnerability detection tools [295] also recommends choosing Flawfinder to detect security vulnerabilities. Flawfinder has also been widely used in earlier studies [297, 298, 296]. To detect vulnerabilities with Flawfinder, we execute the tool from the command-line interface and separately detect vulnerabilities in each of the subject systems in our study. We refer to the vulnerabilities detected using Flawfinder as FDV (Flawfinder Detected Vulnerabilities).

12.3.3.1.1 Limiting false positives in Flawfinder

Although Flawfinder has the highest detection rate, at times it is blamed for reporting many false positives [296]. To reduce the false positives, we alter the default configuration of Flawfinder. We run the tool with the '-F' configuration parameter, which reduces the false positives by 62% as reported in a controlled experiment [299]. To further reduce the effect of false positives, we discard any detected vulnerabilities associated with risk levels less than two, as sketched below.
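The post-filtering step can be sketched as follows; it assumes the Flawfinder report has already been parsed into simple records, whose layout (and the helper name) is hypothetical.

```python
# Post-filter Flawfinder findings: keep only findings whose risk level is >= 2,
# as described above, to further limit false positives.
def filter_findings(findings, min_risk=2):
    return [f for f in findings if f["risk"] >= min_risk]

findings = [
    {"file": "net.c", "line": 120, "cwe": "CWE-120", "risk": 4},
    {"file": "ui.c",  "line": 42,  "cwe": "CWE-126", "risk": 1},   # discarded
]
print(filter_findings(findings))
```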

Table 12.4: Reduction of false positives in vulnerability detection using the customized configuration of Flawfinder

Test Suite ID   # of False Positives (Default Config.)   # of False Positives (Customized Config.)   Reduction of False Positives
57              08                                       01                                          87.50%
58              10                                       04                                          60.00%

12.3.3.1.2 Effectiveness of the customized configuration

To determine the effectiveness of the customized configuration stated above, we collect two C/C++ test suites, test-suite-57 and test-suite-58, from the Software Assurance Reference Dataset (SARD) [300]. These test suites include 41 and 39 pieces of code, respectively. While all pieces of code in test-suite-57 are known to be vulnerable, none of the pieces of code in test-suite-58 are vulnerable. We run Flawfinder on the two test suites separately using both the default and the customized configurations. By comparing the reported vulnerabilities with the known vulnerabilities in the test suites, we compute the false positives w.r.t. both test suites for each of the configurations and present the results in Table 12.4. Notice that with the customized configuration, false positives are reduced by 87.5% and 60% for test-suite-57 and test-suite-58, respectively. We also observe a 30% reduction in the detection of vulnerabilities for test-suite-57, mostly due to the elimination of false positives by the customized configuration. Thus, at the cost of a minor sacrifice in recall, the customized configuration is able to reduce a significant number of false positives.

12.3.3.2 Cppcheck

Cppcheck supports a wide variety of static checks that are rigorous rather than heuristic in nature [13]. Cppcheck aims to report zero false positives [13], which makes it unique among static security analysis tools. Unlike Flawfinder, Cppcheck does not assign numeric risk levels to vulnerabilities. Instead, Cppcheck classifies vulnerabilities into six severity categories, namely Error, Warning, Style, Performance, Portability, and Information. We operate Cppcheck from the command line using its default configuration separately on each of the subject systems. The output of the tool is generated in XML. We refer to the security vulnerabilities detected by Cppcheck as CDV (Cppcheck Detected Vulnerabilities). Brief descriptions along with the CWE numbers of the major vulnerabilities detected in the subject systems using Flawfinder and Cppcheck are presented in Table 12.1. Those reported as vulnerabilities but without an associated CWE number are excluded from our analysis to further limit the effect of possible false positives.
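A sketch of how such an XML report could be consumed is given below. The element and attribute names follow Cppcheck's version-2 XML schema as we understand it and may vary across Cppcheck versions; the helper name is ours.

```python
# Parse a Cppcheck XML report and keep only findings that carry a CWE number.
# Assumption: the report was produced with Cppcheck's XML output; the attribute
# names below ('id', 'severity', 'cwe', 'file', 'line') reflect the version-2
# schema and may differ in other versions.
import xml.etree.ElementTree as ET

def parse_cppcheck(xml_path):
    findings = []
    root = ET.parse(xml_path).getroot()
    for err in root.iter("error"):
        cwe = err.get("cwe")
        if cwe is None:                      # drop findings without a CWE number
            continue
        for loc in err.iter("location"):
            findings.append({
                "file": loc.get("file"),
                "line": int(loc.get("line")),
                "cwe": f"CWE-{cwe}",
                "severity": err.get("severity"),   # Error, Warning, Style, ...
            })
    return findings
```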

Table 12.5: Detected security vulnerabilities and their severity in cloned and non-cloned code (NC = non-cloned code, C = cloned code)

Subject System   # of FDV (NC)   # of FDV (C)   # of CDV (NC)   # of CDV (C)   Severity (NC)   Severity (C)
Asn1c            119             10             183             18             309             70
Atlas            1,258           1,607          2,307           3,240          4,429           4,246
Clamav           1,151           166            6,379           1,901          2,689           345
Claws            429             35             1,496           138            1,148           108
Conky            310             69             10              41             850             200
Courier          2,569           270            1,152           67             7,479           900
Emacs            1,071           99             666             108            2,990           281
Ettercap         419             78             235             23             982             177
Ffdshow          1,306           158            10,042          424            1,457           330
Freediag         416             54             104             11             1,521           202
Freedroid        429             26             180             0              1,181           88
Gedit            21              9              276             7              51              18
Glimmer          151             19             326             214            450             64
Gnuplot          717             67             1,322           87             2,172           208
Gretl            2,850           359            2,992           249            8,847           1,116
Grisbi           54              40             293             27             177             82
Ipsec-tools      447             100            790             134            960             210
Modsecurity      163             22             193             40             308             44
Nedit            592             62             624             56             1,884           216
Net-snmp         1,715           352            2,002           177            4,071           763
Ocf-linux        153             17             379             61             245             134
Opendkim         201             40             296             98             383             146
Opensc           1,150           141            735             30             2,347           404
Putty            523             108            593             59             1,217           392
Razorback        104             31             359             39             261             94
Sdcc             26              9              291             38             182             32
Tboot            105             59             80              225            234             140
Tcl8             1,111           218            3,592           463            2,552           628
Tcpreplay        421             87             232             10             1,252           228
Trousers         263             110            460             34             560             224
Vi               144             4              403             16             417             52
Vim              805             22             2,336           43             2,264           70
XSupplicant      838             246            2,593           505            1,822           548
Zabbix           512             82             566             24             992             278

Figure 12.2: Distribution of FDV (a) in Cloned and Non-cloned Code (non-cloned 83%, all clones 17%) and (b) in Different Types of Clones (Type-1 35%, pure Type-2 13%, pure Type-3 52%)

12.4 Analysis and Findings

After detecting the clones and vulnerabilities in the software systems, we determine the locations of the detected vulnerabilities in the different types of code (i.e., cloned and non-cloned code). A vulnerability is said to be located in cloned code if the reported source code line number of that vulnerability is included in a cloned block; otherwise, the vulnerability is located in a non-cloned block. For each of the subject systems, we identify the co-locations of code clones and vulnerabilities, distinguish the vulnerabilities located in the non-cloned portion of the code, and compute all the metrics described in Section 12.2. The numbers of FDV and CDV in the cloned and non-cloned code of each of the subject systems, and the cumulative vulnerability severity scores (obtained from Flawfinder), are presented in Table 12.5. We separately analyze the security vulnerabilities detected using both Flawfinder and Cppcheck to derive answers to the research questions RQ1, RQ2, and RQ5. Since Cppcheck does not provide a numeric severity score to indicate the risk level of a security vulnerability, the research questions RQ3 and RQ4 are addressed using FDV only.

Statistical measurements: To verify the statistical significance of the results derived from our analyses, we apply the statistical Mann-Whitney-Wilcoxon (MWW) test [234] and the Kruskal-Wallis test [234] at the significance level α = 0.05. We perform the Kruskalmc [234] test for post-hoc analysis at the same significance level. As the non-parametric MWW, Kruskal-Wallis, and Kruskalmc tests do not require a normal distribution of the data, these tests suit our purpose well. To measure effect size, we compute the non-parametric effect size Cliff's delta [234].
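The line-to-block mapping mentioned above can be sketched as follows; the record layouts are hypothetical and serve only to illustrate the co-location step.

```python
# Decide whether each detected vulnerability falls inside a cloned block.
# A vulnerability is attributed to cloned code if its reported line number lies
# within any detected clone block of the same file; otherwise it is attributed
# to non-cloned code.

def locate(vuln, clone_blocks):
    """clone_blocks: dict mapping file -> list of (start_line, end_line) ranges."""
    for start, end in clone_blocks.get(vuln["file"], []):
        if start <= vuln["line"] <= end:
            return "clone"
    return "non-clone"

clone_blocks = {"net.c": [(100, 160)]}
print(locate({"file": "net.c", "line": 120, "cwe": "CWE-120"}, clone_blocks))  # clone
print(locate({"file": "net.c", "line": 10,  "cwe": "CWE-401"}, clone_blocks))  # non-clone
```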

12.4.1 Vulnerabilities in Clones vs. Non-Cloned Code

Analysis Using FDV: Figures 12.2(a) and 12.2(b) present the distributions of the vulnerabilities detected using Flawfinder in non-cloned code and in different types of clones, respectively, over all the systems. As seen in Figure 12.2(a), 83% of all the vulnerabilities are found in non-cloned source code, whereas the clones contain only 17% of the vulnerabilities.

Figure 12.3: Densities of FDV w.r.t. BOC (box plots for non-cloned code, all clones, Type-1, pure Type-2, and pure Type-3 clones)

Figure 12.4: Distribution of LOC (a) in Cloned and Non-cloned Code (non-cloned 93%, all clones 7%) and (b) in Different Types of Clones (Type-1 29%, pure Type-2 23%, pure Type-3 48%)

The box-plot in Figure 12.3 presents the densities of vulnerabilities (i.e., FDV) w.r.t. BOC (computed using Equation 12.1 and Equation 12.2) found in non-cloned code and in different types of clones over all the subject systems. The 'x' marks in the boxes indicate the mean densities over all the systems. As seen in Figure 12.3, the density of vulnerabilities (w.r.t. BOC) in non-cloned blocks is much higher than that in code clones. It is highly probable that a larger portion of source code contains more vulnerabilities than a smaller portion of source code, which might be a reason why non-cloned code seems to have more vulnerabilities, as observed in Figure 12.2(a) and Figure 12.3. To verify this possibility, we compute the distribution of LOC in non-cloned code and in different types of clones over all the systems, as presented in Figures 12.4(a) and 12.4(b), respectively. In Figure 12.4(a), we find that the number of LOC in non-cloned code is significantly higher compared to cloned code. We also compute the lengths of non-cloned and cloned code blocks in terms of average LOC over all the systems, which are found to be 48.69 and 12.97, respectively. Thus, the suspected influence of code size (in terms of LOC) on the number and density of vulnerabilities w.r.t. BOC is found to be real. We, therefore, continue our investigations at a deeper level using the densities of vulnerabilities w.r.t. LOC. The box-plot in Figure 12.5 presents the densities of vulnerabilities w.r.t. LOC (computed using Equation 12.3 and Equation 12.4) found in non-cloned code and in different types of clones over all the subject systems. Figure 12.5 shows that both the median and average densities of vulnerabilities (w.r.t. LOC) in cloned code (all clones) are higher compared to non-cloned code.

Figure 12.5: Densities of FDV w.r.t. LOC (box plots for non-cloned code, all clones, Type-1, pure Type-2, and pure Type-3 clones)

Figure 12.6: Distribution of CDV (a) in Cloned and Non-cloned Code (non-cloned 84%, all clones 16%) and (b) in Different Types of Clones (Type-1 27%, pure Type-2 13%, pure Type-3 60%)

We conduct an MWW test to measure the statistical significance of the differences in the densities of vulnerabilities (w.r.t. LOC) in cloned and non-cloned code. The p-value (p = 0.20, p > α) obtained from the MWW test implies that the observed differences in the densities of vulnerabilities (w.r.t. LOC) in cloned and non-cloned code across all the subject systems are not statistically significant.

Analysis Using CDV: Figures 12.6(a) and 12.6(b) present the distributions of CDV in non-cloned code and in different types of clones, respectively, over all the systems. Interestingly, the pattern of the distributions of vulnerabilities is similar to the pattern observed in Figures 12.2(a) and 12.2(b) drawn for FDV. Similar to Figure 12.5, Figure 12.7 presents the densities of vulnerabilities (i.e., CDV) w.r.t. LOC (computed using Equation 12.3 and Equation 12.4) found in non-cloned code and in different types of clones over all the subject systems. As seen in Figure 12.7, the average density of vulnerabilities is higher in cloned code (all clones) compared to non-cloned code, although the median density is slightly higher in non-cloned code as opposed to code clones. Again, we conduct an MWW test to measure the statistical significance of the differences in the densities of vulnerabilities (w.r.t. LOC) in cloned and non-cloned code. The p-value (p = 0.42, p > α) obtained from the statistical test implies that the densities of vulnerabilities (w.r.t. LOC) in cloned and non-cloned code across all the subject systems do not differ significantly. This result agrees with that obtained for FDV. Therefore, we derive the answer to RQ1 as follows:

Figure 12.7: Densities of CDV w.r.t. LOC (box plots for non-cloned code, all clones, Type-1, pure Type-2, and pure Type-3 clones)

Ans. to RQ1: Densities of vulnerabilities in cloned code are NOT significantly higher than in non-cloned code.

12.4.2 Densities of Vulnerabilities in Different Types of Clones

Analysis Using FDV: The distribution of FDV portrayed in Figure 12.2(b) shows that the pure Type-2 clones are found to have the fewest vulnerabilities, whereas the number of vulnerabilities found in Type-1 clones is higher than that in pure Type-2 clones. The vulnerabilities found in the cloned portion of the source code are dominated by those found in pure Type-3 clones. However, the majority of cloned LOC are also in pure Type-3 clones, as can be observed in Figure 12.4(b), which might be a reason why a higher number of vulnerabilities are found in code clones of this particular category. As seen in Figure 12.5, the densities of vulnerabilities w.r.t. LOC in the different categories of code clones follow the same pattern as the densities of vulnerabilities w.r.t. BOC, where pure Type-3 clones show the highest average density of vulnerabilities, followed by Type-1 clones, and pure Type-2 clones show the lowest average density. Although noticeable differences are observed in the averages of the densities of vulnerabilities, we do not see much difference in the medians. To determine the statistical significance of our observations, we conduct a Kruskal-Wallis test between the densities of FDV (w.r.t. LOC) of the three categories of clones. The p-value (p = 0.5901, p > α) obtained from the Kruskal-Wallis test suggests no significant differences in the distributions of the densities of vulnerabilities.

Analysis Using CDV: As observed in Figure 12.2(b) and Figure 12.6(b), the patterns of the distributions of both FDV and CDV in different types of clones are very similar. However, a comparison of Figure 12.5 and Figure 12.7 makes some differences visible. In contrast with FDV (Figure 12.5), the average and median densities of CDV (Figure 12.7) in Type-1 and pure Type-2 code clones are almost equal, while both the average and the median are noticeably higher in pure Type-3 clones. We also see that the average densities of both FDV and CDV are the highest in pure Type-3 clones. Again, to determine the significance of the differences in the densities of CDV in different types of clones, we conduct a Kruskal-Wallis test between the densities of CDV (w.r.t. LOC) of the three categories of clones.

Figure 12.8: Cumulative Severity Scores of FDV (a) in Cloned and Non-cloned Code (non-cloned 81%, all clones 19%) and (b) in Different Types of Clones (Type-1 26%, pure Type-2 11%, pure Type-3 63%)

The p-value (p = 0.008086, p < α) obtained from the Kruskal-Wallis test suggests significant differences in the distributions of the densities of vulnerabilities. To determine the significance of the pairwise differences, we conduct a Kruskalmc post-hoc analysis. The test suggests statistically significant differences in the distributions of CDV in pure Type-3 clones against Type-1 and pure Type-2 clones. The computed Cliff's delta d values of 0.373 and 0.404, between pure Type-3 and Type-1 and between pure Type-3 and pure Type-2 respectively, indicate medium effect sizes. Based on our analyses of both FDV and CDV, we now answer RQ2 as follows: Ans. to RQ2: Pure Type-3 clones are the most insecure category of clones, while Type-1 and pure Type-2 clones are almost equal in terms of security vulnerability.
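For illustration, Cliff's delta can be computed directly from two density distributions as sketched below; the per-system values shown are hypothetical, and the interpretation thresholds (negligible below 0.147, small below 0.33, medium below 0.474, large otherwise) follow commonly used conventions rather than the dissertation itself.

```python
# Non-parametric Cliff's delta effect size between two density distributions,
# e.g., densities of CDV per KLOC in pure Type-3 clones vs. Type-1 clones
# across the subject systems (illustrative values only).

def cliffs_delta(xs, ys):
    greater = sum(1 for x in xs for y in ys if x > y)
    less    = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))

pure_type3 = [12.4, 9.8, 15.1, 7.6]   # hypothetical per-system densities
type1      = [6.2, 8.1, 5.9, 7.0]
d = cliffs_delta(pure_type3, type1)
print(round(d, 3))   # |d| in roughly 0.33-0.47 is conventionally a "medium" effect
```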

12.4.3 Severity of Security Risks in Cloned and Non-Cloned Code

Figures 12.8(a) and 12.8(b) present the distributions of the cumulative severity scores of FDV in non-cloned code and in different types of clones, respectively, over all the systems. As seen in Figure 12.8(a), as much as 81% of the total severity score of FDV is associated with non-cloned code, which can be expected as non-cloned code contributes 93% of the entire source code (Figure 12.4(a)). The box-plot in Figure 12.9 presents the risk severity scores per KLOC (computed using Equation 12.5 and Equation 12.6) for non-cloned code and different types of clones for each of the subject systems. Figure 12.9 shows that both the median and the average of the risk severity scores over all the systems are higher in cloned code (all clones) than in non-cloned code. We conduct an MWW test to measure the statistical significance of these observed differences. The p-value (p = 0.0494, p < α) obtained from the test indicates the statistical significance of the differences. The computed Cliff's delta d value of 0.381 suggests that the effect size is medium. Therefore, we derive the answer to RQ3 as follows: Ans. to RQ3: The security vulnerabilities in cloned code are significantly riskier than those in non-cloned code.

Figure 12.9: Risk Severity Scores per KLOC in Cloned and Non-cloned Code (box plots for non-cloned code, all clones, Type-1, pure Type-2, and pure Type-3 clones)

12.4.4 Severity of Security Risks in Different Types of Clones

The distribution of the cumulative severity scores of FDV in different types of code clones depicted in Figure 12.8(b) shows that the vulnerabilities in pure Type-2 clones pose the lowest cumulative severity score, whereas pure Type-3 clones show the highest severity score and Type-1 clones fit in between. Again, Figure 12.9 shows that the average severity scores of vulnerabilities in pure Type-3 and Type-1 clones are almost equal, while pure Type-2 clones have a slightly lower severity score compared to the former two. Moreover, we do not see much difference in the medians. To determine the statistical significance of our observations, we conduct a Kruskal-Wallis test between the severity scores of vulnerabilities (per KLOC) of the three categories of clones. The p-value (p = 0.5901, p > α) obtained from the Kruskal-Wallis test suggests no significant differences in the distributions of the severities of vulnerabilities. Based on these findings, we now answer RQ4 as follows: Ans. to RQ4: There is no significant difference in the severity of security vulnerabilities found in different types of code clones.

12.4.5 Frequently Encountered Categories of Vulnerabilities

Analysis Using FDV: For each CWE category of FDV identified in the subject systems, we compute the densities separately for cloned code and non-cloned code in each of the subject systems (using Equation 12.7). Then we distinguish the five CWE categories that have the highest vulnerability densities in cloned code over all the subject systems. Let C_c denote the set of these five CWE categories. Similarly, we form another set C_c̄ consisting of the five CWE categories of vulnerabilities having the highest densities in non-cloned code. By the union of these two sets, we obtain a set C of the top six CWE categories of vulnerabilities that have the highest densities across both cloned and non-cloned code. Mathematically, C = C_c ∪ C_c̄. Figure 12.10 presents the distributions of the densities of these top six CWE categories of vulnerabilities in cloned and non-cloned code in each of the subject systems. As we see in Figure 12.10, the average density of the CWE-807 category of vulnerabilities is higher in non-cloned code compared to cloned code.

Table 12.6: MWW tests over densities of top six CWE categories of FDV per KLOC in cloned and non-cloned code

CWE Category   P-Value   Significant?    Cliff's delta d
CWE-78         0.0233    Yes (p < α)     0.384 (medium)
CWE-120        0.3264    No (p > α)      not applicable
CWE-134        0.2809    No (p > α)      not applicable
CWE-190        0.4681    No (p > α)      not applicable
CWE-362        0.4051    No (p > α)      not applicable
CWE-807        0.0012    Yes (p < α)     0.379 (medium)

Figure 12.10: Densities of top six CWE categories of FDV per KLOC (box plots for cloned vs. non-cloned code: CWE-78, CWE-120, CWE-134, CWE-190, CWE-362, CWE-807)

The opposite is observed for the other five CWE categories. For each of these six CWE categories, we separately conduct MWW tests to determine the statistical significance of the differences in the densities of vulnerabilities in cloned and non-cloned code. The resulting p-values of the tests, presented in Table 12.6, indicate that the differences are statistically significant for CWE-78 and CWE-807, and not significant for the other four CWE categories. Then, we compute the Cliff's delta d values for the CWE-78 and CWE-807 categories and find medium effect sizes for both. Thus, we can distinguish CWE-78 as a category of vulnerabilities that appears in cloned code more frequently than in non-cloned code. We can also distinguish the CWE-807 category of vulnerabilities as having the opposite characteristic.

Analysis Using CDV: Using a procedure similar to that followed for FDV, we obtain the set C of five CWE categories of vulnerabilities from CDV. Figure 12.11 presents the distributions of the densities of those five CWE categories of vulnerabilities in cloned and non-cloned code over all the subject systems.

Figure 12.11: Densities of top five CWE categories of CDV per KLOC (box plots for cloned vs. non-cloned code: CWE-398, CWE-476, CWE-561, CWE-563, CWE-686)

Table 12.7: MWW tests over densities of top five CWE categories of CDV per KLOC in cloned and non-cloned code

CWE Category   P-Value   Significant?    Cliff's delta d
CWE-398        0.2980    No (p > α)      not applicable
CWE-476        0.0630    No (p > α)      not applicable
CWE-561        0.0321    Yes (p < α)     0.692 (large)
CWE-563        0.1538    No (p > α)      not applicable
CWE-686        0.0012    Yes (p < α)     0.372 (medium)

Again, for each of the five CWE categories of vulnerabilities, we separately conduct MWW tests to determine the significance of the differences in the densities of CDV in cloned and non-cloned code, and we also compute Cliff's delta d values for those distributions where p < α. The resulting p-values and the computed effect sizes of those tests are presented in Table 12.7. Combining our observations in Figure 12.11 with the p-values and the effect sizes in Table 12.7, we can infer that the densities of vulnerabilities of the CWE-561 and CWE-686 categories are significantly higher, with medium to large effect sizes, in non-cloned code compared to cloned code. Thus, we have been able to distinguish two CWE categories of vulnerabilities whose appearances are dominant in non-cloned code. Based on our analysis of both FDV and CDV, we derive the answer to RQ5 as follows:

Ans. to RQ5: It is possible to distinguish particular CWE categories of vulnerabilities, which frequently appear in cloned code (or non-cloned code).

12.5 Threats to Validity

Construct Validity: Although we have used two well-known tools for security vulnerability detection, we discarded from our study those reported vulnerabilities that the tools failed to associate with a CWE enumeration. We deliberately excluded those because a so-called vulnerability without an associated CWE number could actually be a stylistic issue that the tools erroneously report as a security vulnerability. In the selection of the two dominant sets of CWE categories of vulnerabilities (Section 12.4.5), we picked the top five CWE categories of vulnerabilities for each set having the highest densities in cloned and non-cloned code, respectively. Although those chosen vulnerabilities cover more than 85% of the instances of vulnerabilities found in all the systems, this choice may still be considered a threat to the validity of this work. The computation of averages of the ordinal values of severity scores in the analyses for RQ3 and RQ4 can be questioned, although it serves our purpose of analyzing the comparative severities of groups of vulnerabilities.

Internal Validity: In detecting vulnerabilities, false positives and false negatives could be two major threats to the validity of this study. Hence, for vulnerability detection, we have used two different tools, Flawfinder [9] and Cppcheck [13]. Flawfinder is known to have high recall [292, 295, 297, 296], and Cppcheck is reputed for its high precision with zero false positives [13]. Moreover, while using Flawfinder, we used a customized configuration, which significantly minimizes false positives [299], as also demonstrated in Section 12.3.3 of this chapter. In addition, some types of vulnerabilities (e.g., Null Pointer Dereference) may span multiple statements, which may partly overlap with both cloned and non-cloned code. Such a phenomenon could not be captured in the study, as the vulnerability detection tools report only a line number in a source file indicating the location of a particular detected vulnerability. The clone detector NiCad [138], used in our study, is reported to be very accurate in clone detection [138], and we have carefully set NiCad's parameters for the detection of Type-1, Type-2, and Type-3 clones. Moreover, we have taken care to avoid double-counting of nested blocks and the vulnerabilities identified in them. In addition, we have manually verified the correctness of the computations for all the metrics used in our work. Thus, we have high confidence in the internal validity of this study.

External Validity: Although our study includes a large number of subject systems, all the systems are open-source and written in C. Thus, the findings from this work may not generalize to industrial systems or to source code written in languages other than C.

Reliability: The methodology of this study, including the procedure for data collection and analysis, is documented in this chapter. The subject systems, being open-source, are freely accessible, while the

tools Flawfinder, Cppcheck, and NiCad are also available online. Therefore, it should be possible to replicate the study.

12.6 Related Work

As mentioned before, no other work in the literature includes a comparative study of security vulnerabilities in code clones and non-cloned source code. Hence, the studies that examined the characteristics and impacts of code clones are considered relevant to our work. Some studies included a comparative analysis of certain characteristics in clones against non-cloned code, as discussed below. Lozano and Wermelinger [73] analyzed revisions of only four open-source projects and suggested that having a clone may increase the maintenance effort for changing a method. Hotta et al. [79] studied the changeability of cloned and non-cloned code in revisions of 15 open-source software systems. They reported code clones not to have any negative impact on software changeability, which contradicts the claim of Lozano and Wermelinger [73]. Towards understanding the stability of code clones, Lozano et al. [91] studied changes to clones across revisions of only one software system and reported that a vast majority of methods experience larger and more frequent changes when they contain cloned code. Based on a study of the revisions of 12 software systems, Mondal et al. [82] also reported code clones to be less stable. However, opposite results are reported from other studies [283, 78, 90, 80]. In another study, Sajnani et al. [285] attempted to identify the relationships of code clones with statically identified bugs in systems written in Java. They found a considerably lower number of bugs in code clones compared to non-cloned code. Recently, Islam and Zibran [95] conducted a large comparative study of the code 'vulnerabilities' in cloned and non-cloned code. In their work, 'vulnerability' was defined as the "problems in the source code identified based on bad coding patterns, i.e., code smells". Our study is inspired by their work and significantly differs from theirs. First, in contrast with their study of code smells, we have studied real security flaws widely known as security vulnerabilities. Second, they studied software systems written in Java, while we have studied systems written in C. Third, they used PMD [12] to detect code smells, whereas we have used two separate tools, Flawfinder and Cppcheck, to detect security vulnerabilities in source code. In addition, we have analyzed the severities of security vulnerabilities, while such severity of code smells was not studied in the aforementioned work of Islam and Zibran. Attempts have also been made to explore the fault-proneness of clones by relating them to bug-fixing changes obtained from commit history. Such a study was conducted by Jingyue et al. [87], who reported that only 4% of the bugs were found in duplicated code. In a similar study, Rahman et

al. [25] also observed that the majority of bugs were not significantly associated with clones. Another study [301] along the same lines reported that 55% of the bugs in cloned code can be replicated bugs. As discussed before, contradictory results are often reported from comparative studies between clones and non-cloned code. This implies the necessity of further comparative investigations from a different dimension, which is exactly what we have done in this study. We have carried out a comparative investigation of security vulnerabilities in clones and non-cloned code, which was missing in the literature. In addition, the comparative analysis of vulnerabilities in Type-1, pure Type-2, and pure Type-3 clones is another important aspect of our work.

12.7 Summary

In this chapter, we have presented a large quantitative empirical study of the security vulnerabilities in cloned and non-cloned code in 34 open-source software systems (8.7 million LOC) written in C. To the best of our knowledge, no such study exists in the literature that performs a comparative analysis of security vulnerabilities in cloned and non-cloned code. For the detection of security vulnerabilities in source code, we have used two different tools (Flawfinder and Cppcheck), one of which is known to have high recall while the other is reputed for its high precision. For clone detection, we used a state-of-the-art clone detector, NiCad, which is also reported to have high accuracy. Our study reveals that the security vulnerabilities found in code clones have a higher severity of security risk compared to those in non-cloned code. However, the proportion (i.e., density) of vulnerabilities in clones and non-cloned code does not differ significantly. Among the three categories (i.e., Type-1, pure Type-2, and pure Type-3) of code clones studied in our work, pure Type-3 clones are found to be the most insecure, whereas Type-1 and pure Type-2 clones are nearly equal in terms of the vulnerabilities found in them. The results are validated in the light of statistical significance.

The findings from this study advance our understanding of the characteristics, impacts, and implications of code clones in software systems. These findings will help in identifying problematic clones, which demand extra care, and those vulnerabilities about which developers need to be particularly cautious while reusing code by cloning. In the future, we plan to conduct qualitative analyses to advance our understanding of these results. Moreover, we plan to perform similar analyses using software systems written in languages other than C to verify to what extent the findings from this study apply to a broader range of source code written in diverse programming languages.

Chapter 13

Characteristics of Buggy Code Clones

In the last chapter (Chapter 12), we presented a study to explore and understand the security vulnerabilities and their severity in different types of clones compared to non-cloned code. We know that code cloning is an immensely studied code smell. However, not all the clones in a software system are equally harmful. This chapter presents a comparative study to distinguish the characteristics (from a code quality perspective) of buggy and non-buggy clones. The rest of this chapter is organized as follows. In Section 13.1, we introduce the motivation of the work. In Section 13.2, we describe the setup and procedure of our empirical study. Section 13.3 presents our analysis and the findings derived from this study. In Section 13.4, we discuss the possible threats to the validity of our study. Section 13.5 includes related work, and Section 13.6 concludes the chapter.

13.1 Introduction

In the past, several studies examined the comparative stability of clones as opposed to non-cloned code [283, 78, 79, 80, 81, 82, 302], comparative vulnerabilities in cloned and non-cloned code [95, 96], relationships of clones with bug-fixing changes [83, 84, 85, 72, 86, 87, 25, 88, 89], the change-proneness of clones [303], and the impacts of clones on programs' changeability [90, 73, 91]. There have also been studies on clone removal in program history [289, 290, 291]. The existing literature suggests that (i) clones are problematic in many cases, (ii) not all clones are equally harmful [71], and (iii) it is not practically feasible to remove all clones from a system through aggressive refactoring [286, 304, 75]. Therefore, for cost-effective clone management, we must distinguish the characteristics of clones that make them problematic (e.g., buggy). Not much work has been done along this direction. The majority of earlier work made comparisons between clones and non-cloned code with respect to certain criteria (e.g., stability, vulnerability, bug-proneness). Only a few earlier studies suggested merely a handful of characteristics that might make certain clones detrimental. Such characteristics include late propagation of clones [283, 83] and stability/change-proneness of clones [302, 303]. In this chapter, using 29 code quality metrics, we study the characteristics of buggy and non-buggy clones, aiming to identify certain quality metrics that can indicate the potential bug-proneness of clones. In particular, we address the following four research questions.

[Figure omitted: workflow of the empirical study. For each bug-fixing commit c of a system P1, the (n-1)th (buggy) revision is obtained with JGit, Type-3 method clones are detected in that revision with NiCad, and 29 quality metrics are computed for the method clones with SourceMeter (recording, for each method, whether it is cloned, whether it is buggy, and its metric values). The bug-fixing changes leading to the nth revision are located with JGit, the changed (buggy) lines are combined with the clone information to distinguish buggy from non-buggy method clones, and statistical analyses yield the findings. The same computational steps are repeated for each bug-fixing commit in each of the subject systems.]

Figure 13.1: Procedural Steps of the Empirical Study

RQ1: Are buggy cloned methods more complex than non-buggy cloned methods, or vice versa? – Failure-prone software entities are statistically correlated with code complexity measures [305]. However, there is no single set of complexity metrics that can indicate defects [305]. Here, we investigate 15 complexity metrics to statistically measure their dominance in buggy and non-buggy cloned code.
RQ2: Are buggy cloned methods larger than non-buggy cloned methods, or vice versa? – Earlier studies [306, 68] found cloned methods to be smaller than non-cloned methods. However, it is unknown whether there is any substantial size difference between buggy and non-buggy clones. Using seven metrics, we investigate the size difference between buggy and non-buggy cloned methods.
RQ3: Are buggy cloned methods more documented than non-buggy cloned methods, or vice versa? – It is suggested that when cloning a piece of code, the variations should be well documented in order to facilitate bug-fix propagation [71]. As such, undocumented or poorly documented clones can have a high possibility of causing bugs. Hence, we investigate this possibility by analyzing five source code documentation metrics.
RQ4: Are buggy cloned methods more coupled than non-buggy cloned methods? – Low coupling and high cohesion are two prominent software design qualities expected to keep a software system's inherent complexity manageable. In other words, highly coupled code is harder to manage and could be bug-prone. Thus, we investigate whether coupling metrics' values are significantly higher in buggy cloned code compared to non-buggy cloned code.

13.2 Study Setup

The procedural steps of our empirical study are summarized in Figure 13.1, and described in the following subsections.

13.2.1 Subject Systems

We study 2,077 bug-fixing revisions of three open-source software systems written in Java. These subject systems, as listed in Table 13.1, are available in the GitHub repository. In Table 13.1, we present the total number of revisions and the number of bug-fixing revisions of each subject system, along with the number of source lines of code (LOC) in the last revision. We choose these three subject systems as they have variations in application domains, sizes, and numbers of revisions, and they are also used in another study [232].

Table 13.1: Subject Systems

Subject System       | Application Domain | LOC (last rev.) | Total # of Revisions | # of Bug-Fixing Rev.
Netty                | Network            | 1,078,493       | 8,534                | 1,103
Presto               | SQL                | 2,869,799       | 11,909               | 841
Facebook-android-SDK | Social Networking  | 172,695         | 671                  | 133
Total over all the systems                | 4,120,987       | 21,114               | 2,077

13.2.2 Clone Detection

Code clones at different levels of syntactic similarity appear in source code. Identical pieces of source code with or without variations in whitespace (i.e., layout) and comments are called Type-1 clones [286]. Type-2 clones are syntactically identical code fragments with variations in the names of identifiers, literals, types, layout, and comments [286]. Code fragments that exhibit the similarities of Type-2 clones and also allow further differences, such as additions, deletions, or modifications of statements, are known as Type-3 clones [286]. By definition, Type-3 clones include both Type-1 and Type-2 clones. In this work, we study Type-3 clones at the granularity of method bodies. We use the NiCad [138] clone detector (version 3.5) to detect method/function clones having at least five LOC. In detecting clones, the 'blind renaming' option of NiCad is kept enabled and the UPIT (i.e., dissimilarity threshold) is set to 0.3. Further details on NiCad's tuning parameters and their influences on clone detection can be found elsewhere [138].

13.2.3 Distinguishing Buggy Clones

Consider a bug-fixing commit c resulting in the nth revision of a system. If a particular method m is modified in the bug-fixing commit c, then it implies that the modification is necessary to fix the bug. Thus, the method m in the (n−1)th revision is considered a buggy method. In this work, we use the bug-fixing commits identified by Ray et al. [232]. These bug-fixing commits are distinguished through matching keywords (e.g., bug, defect, issue) in the commit messages and are reported to be 96% accurate [232].
For a project P, we determine the buggy cloned methods by performing the following steps: (i) we collect all the bug-fixing commits for P from the dataset of Ray et al. [232]. (ii) For each bug-fixing commit c, using JGit [307], we obtain the nth revision of P, which is the result of the commit c. (iii) Then, we obtain the (n−1)th revision of P. (iv) The changes between the nth and (n−1)th revisions of P are captured using JGit, along with the changed lines' numbers in the Java source files. (v) Using NiCad [138], we detect Type-3 method clones in the (n−1)th revision of P and record their locations and boundaries (start and end line numbers in source files). (vi) Whether a commit c affected a cloned method is determined by checking whether any of the changed lines' numbers identified in step (iv) falls within the boundary of the clone. (vii) Clones that are affected by bug-fixing commits are identified as buggy clones, while the remaining clones are considered non-buggy. Several other studies [301, 308, 303, 25] also adopted similar approaches for distinguishing buggy source code.
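The overlap check in step (vi) boils down to testing whether any changed line of a file falls within a clone's recorded boundary. The following is a minimal Java sketch of that step under our own simplifying assumptions; the class and method names are illustrative rather than part of the study's actual tooling, and the changed-line and clone data are assumed to have already been extracted with JGit and NiCad as described above.

// Minimal sketch of step (vi): deciding whether a bug-fixing commit touched a cloned method.
// Class and method names are illustrative, not taken from the actual study tooling.
import java.util.List;

public class BuggyCloneMarker {

    /** A detected method clone with its file path and line boundaries (from NiCad's report). */
    static class CloneRegion {
        final String file;
        final int startLine;
        final int endLine;
        boolean buggy = false;

        CloneRegion(String file, int startLine, int endLine) {
            this.file = file;
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }

    /** A changed line in the (n-1)th revision, located by diffing revisions n-1 and n with JGit. */
    static class ChangedLine {
        final String file;
        final int line;

        ChangedLine(String file, int line) {
            this.file = file;
            this.line = line;
        }
    }

    /** Marks every clone whose boundary contains at least one line changed by the bug-fixing commit. */
    static void markBuggyClones(List<CloneRegion> clones, List<ChangedLine> bugFixChanges) {
        for (CloneRegion clone : clones) {
            for (ChangedLine change : bugFixChanges) {
                if (clone.file.equals(change.file)
                        && change.line >= clone.startLine
                        && change.line <= clone.endLine) {
                    clone.buggy = true;   // affected by the bug-fixing commit, hence a buggy clone
                    break;
                }
            }
        }
    }
}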

13.2.4 Computation of Source Code Quality Metrics

We use a total of 29 source code quality metrics grouped into four categories: (i) complexity metrics, which measure the complexity of source code elements; (ii) size metrics, which measure the basic properties of the analyzed system in terms of different cardinalities (e.g., number of code lines); (iii) documentation metrics, which measure the amount of comments and documentation of source code elements in the system; and (iv) coupling metrics, which measure the interdependencies of source code elements. All the metrics are listed and briefly described in Table 13.2. Detailed descriptions of those metrics can be found in the user manual of SourceMeter [139], which is a proprietary static source code analyzer tool we use in this work. We opt out of describing the metrics in detail due to space limitations.
Using a free version of SourceMeter [139] (version 8.2.0-x64-linux), we compute all the metrics in Table 13.2 for each of the buggy and non-buggy clones. The tool's configuration parameters are kept at their defaults.
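As an illustration of how such per-method metric values can be aggregated for a revision, the sketch below averages one metric over the methods listed in a metrics file. It assumes a simple comma-separated export with a header row and one row per method; the actual column names and file layout of SourceMeter's output may differ, so treat this only as an assumption-laden example.

// Illustrative sketch only: averaging one metric (e.g., McCC) over a set of methods.
// The CSV layout (header names, separator, no quoted fields) is an assumption of this sketch.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class MetricAverager {

    /** Returns the average value of the given column in a simple comma-separated metrics file. */
    static double averageMetric(Path methodCsv, String metricColumn) throws IOException {
        List<String> lines = Files.readAllLines(methodCsv);
        List<String> header = Arrays.asList(lines.get(0).split(","));
        int col = header.indexOf(metricColumn);          // e.g., "McCC"; assumed to exist in the header
        double sum = 0;
        int count = 0;
        for (String row : lines.subList(1, lines.size())) {
            String[] cells = row.split(",");
            sum += Double.parseDouble(cells[col]);
            count++;
        }
        return count == 0 ? 0 : sum / count;
    }
}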

13.3 Analysis and Findings

We carry out our analyses in the light of each of the 29 quality metrics, separately averaged for buggy and non-buggy clones. For testing statistical significance, we apply the Mann-Whitney-Wilcoxon (MWW) test [234] with α = 0.05. To measure effect size, we compute Cliff's delta d [234]. Neither of these non-parametric statistical tests requires a normal distribution of the data, and thus they suit our purpose well. We consider that a significant difference exists between distributions if the p-value of an MWW test is found to be less than α and the Cliff's delta d value is not negligible (i.e., |d| > 0.15). In drawing the box-plots for our analyses, we normalize the metrics' values using the widely used Min-Max [309] method.

Table 13.2: Source Code Quality Metrics Used in this Study

Category              | Metric   | Description
Complexity metrics    | ↓ HCPL   | Hal. Calculated Program Length
                      | ↓ HDIF   | Hal. Difficulty
                      | ↓ HEFF   | Hal. Effort
                      | ↓ HNDB   | Hal. Number of Delivered Bugs
                      | ↓ HPL    | Hal. Program Length
                      | ↓ HPV    | Hal. Program Vocabulary
                      | ↓ HTRP   | Hal. Time Required to Program
                      | ↓ HVOL   | Hal. Volume
                      | ↑ MIMS   | Maintainability Index (MS)
                      | ↑ MI     | Maintainability Index (OV)
                      | ↑ MISEI  | Maintainability Index (SEIV)
                      | ↑ MISM   | Maintainability Index (SV)
                      | ↓ McCC   | McCabe's Cyclomatic Complexity
                      | ↓ NL     | Nesting Level
                      | ↓ NLE    | Nesting Level Else-If
Size metrics          | ↓ LOC    | Lines of Code
                      | ↓ LLOC   | Logical Lines of Code
                      | ↓ NUMPAR | Number of Parameters
                      | ↓ NOS    | Number of Statements
                      | ↓ TLOC   | Total Lines of Code
                      | ↓ TLLOC  | Total Logical Lines of Code
                      | ↓ TNOS   | Total Number of Statements
Documentation metrics | ↑ CD     | Comment Density
                      | ↑ CLOC   | Comment Lines of Code
                      | ↑ DLOC   | Documentation Lines of Code
                      | ↑ TCD    | Total Comment Density
                      | ↑ TCLOC  | Total Comment Lines of Code
Coupling metrics      | ↓ NII    | Number of Incoming Invocations
                      | ↓ NOI    | Number of Outgoing Invocations

Hal. = Halstead; MS = Microsoft version; OV = Original version; SEIV = SEI version; SV = SourceMeter version.
↑ by a metric indicates the higher the better for that metric; ↓ by a metric indicates the lower the better for that metric.
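The sketch below illustrates the statistical ingredients just described: Min-Max normalization, Cliff's delta d, and the decision rule (p < α and |d| > 0.15). It is only an illustrative implementation; the MWW p-value itself would come from a statistics package (e.g., R or Apache Commons Math) and is taken here as an input.

// Sketch of the statistical helpers used in the analysis; not the study's original scripts.
public class StatHelpers {

    /** Min-Max normalization of a single value into [0, 1]. */
    static double minMax(double value, double min, double max) {
        return (max == min) ? 0 : (value - min) / (max - min);
    }

    /** Cliff's delta: (#(x > y) - #(x < y)) / (|X| * |Y|) over all pairs of observations. */
    static double cliffsDelta(double[] xs, double[] ys) {
        long greater = 0, less = 0;
        for (double x : xs) {
            for (double y : ys) {
                if (x > y) greater++;
                else if (x < y) less++;
            }
        }
        return (double) (greater - less) / ((long) xs.length * ys.length);
    }

    /** Difference is treated as significant only if p < 0.05 and the effect size is not negligible. */
    static boolean isSignificant(double pValue, double cliffsDelta) {
        return pValue < 0.05 && Math.abs(cliffsDelta) > 0.15;
    }
}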

[Figure omitted: box plots of the Min-Max normalized average values of the 15 complexity metrics (HCPL, HDIF, HEFF, HNDB, HPL, HPV, HTRP, HVOL, McCC, MI, MIMS, MISEI, MISM, NL, NLE) for buggy and non-buggy method clones.]

Figure 13.2: Distribution of Complexity Metrics’ Values in Buggy and Non-buggy Cloned Code

13.3.1 Complexity of Buggy and Non-buggy Clones

In the box plot of Figure 13.2, we present the distribution of the average complexity metrics' values for buggy (grey boxes) and non-buggy (white boxes) clones for each studied revision of the subject systems. The 'x' marks in the box plots indicate the means over all the revisions across the subject systems.
As seen in Figure 13.2, for buggy clones there are more variations in the values of all the complexity metrics compared to those for non-buggy clones. Most importantly, considering the averages (marked with 'x'), all 15 complexity metrics' values are worse for buggy clones. Buggy clones exhibit lower values for the four maintainability-index-related complexity metrics (i.e., MI, MIMS, MISEI, and MISM), for which higher values are desirable as indicated in Table 13.2. For the remaining 11 complexity metrics, buggy clones are found to have higher values, while lower values are desirable for these 11 metrics. These observations indicate that buggy clones are more complex and less maintainable compared to non-buggy clones.
To determine whether our observations are statistically significant, for each of the 15 complexity metrics we separately conduct a one-sided pair-wise MWW test between the metric's values computed for buggy and non-buggy clones. The p-values obtained from these tests are presented in Table 13.3. The measurements of effect size (i.e., Cliff's delta d) corresponding to the MWW tests are also included in Table 13.3. As seen in Table 13.3, the p-values obtained from the MWW tests are less than α for all the complexity metrics except three (HEFF, HTRP, and McCC). However, for two (HNDB and HVOL) of the remaining 12 complexity metrics, the effect sizes computed with Cliff's delta d are found to be negligible. For the remaining 10 complexity metrics, the MWW tests indicate statistical significance in the differences of the metrics' values for buggy and non-buggy clones, while the Cliff's delta d values also suggest that the effect sizes are not negligible. Hence, from our observations and statistical tests, we now derive the answer to RQ1 as follows:

Ans. to RQ1: Buggy clones have significantly higher complexity and lower maintainability compared to non-buggy code clones.

Table 13.3: MWW Tests over the Distribution of Complexity Metrics for Buggy and Non-buggy Clones

Source Code Metric | P-value        | Cliff's delta d     | Significant?
HCPL               | 7.011 × 10^-12 | 0.2297 (small)      | Yes
HDIF               | 2.23 × 10^-11  | 0.2239 (small)      | Yes
HEFF               | 0.2787         | Not applicable      | No
HNDB               | 6.017 × 10^-5  | 0.1307 (negligible) | No
HPL                | 1.23 × 10^-6   | 0.1601 (small)      | Yes
HPV                | 9.75 × 10^-14  | 0.2499 (small)      | Yes
HTRP               | 0.5575         | Not applicable      | No
HVOL               | 2.75 × 10^-5   | 0.1371 (negligible) | No
MI                 | 2.20 × 10^-16  | −0.2882 (small)     | Yes
MIMS               | 2.20 × 10^-16  | −0.2882 (small)     | Yes
MISEI              | 1.48 × 10^-14  | −0.2614 (small)     | Yes
MISM               | 1.48 × 10^-14  | −0.2614 (small)     | Yes
McCC               | 0.1872         | Not applicable      | No
NL                 | 4.17 × 10^-7   | 0.1675 (small)      | Yes
NLE                | 1.65 × 10^-10  | 0.2136 (small)      | Yes

13.3.2 Size Difference of Buggy and Non-buggy Clones

In Figure 13.3, we plot the distribution of the seven size metrics' values computed for buggy and non-buggy clones. Similar to the case of the complexity metrics (discussed in Section 13.3.1), we see that the variations in all the size metrics are higher in buggy clones than in non-buggy ones.

Table 13.4: MWW Tests over Average Values of Size Metrics in Buggy and Non-buggy Cloned Code

Source Code Metric | P-value       | Cliff's delta d | Significant?
LLOC               | 1.34 × 10^-10 | 0.2147 (small)  | Yes
LOC                | 1.47 × 10^-11 | 0.2260 (small)  | Yes
NOS                | 5.48 × 10^-6  | 0.1500 (small)  | Yes
NUMPAR             | 3.44 × 10^-6  | −0.1529 (small) | Yes
TLLOC              | 8.88 × 10^-11 | 0.2169 (small)  | Yes
TLOC               | 1.12 × 10^-11 | 0.2275 (small)  | Yes
TNOS               | 1.94 × 10^-6  | 0.1570 (small)  | Yes

[Figure omitted: box plots of the Min-Max normalized average values of the seven size metrics (LLOC, LOC, NOS, NUMPAR, TLLOC, TLOC, TNOS) for buggy and non-buggy method clones.]

Figure 13.3: Size Metrics’ Values in Buggy and Non-buggy Clones

Average values of all the size metrics are higher (i.e., worse) for buggy method clones except for NUMPAR (i.e., the number of parameters). Surprisingly, the average value of this particular metric appears to be slightly higher for non-buggy clones. Again, to verify the significance of our observations and the effect size, we conduct a one-sided pair-wise MWW test and compute Cliff's delta d for each of the seven size metrics between their values computed for buggy and non-buggy clones. The results of the tests are presented in Table 13.4.
The results in Table 13.4 suggest statistical significance (with non-negligible effect size) in the differences of all seven size metrics' values computed for buggy and non-buggy clones. Based on the findings, we now answer RQ2 as follows:

Ans. to RQ2: Compared to non-buggy clones, buggy method clones have significantly higher code size measured in terms of the number of lines and statements. Surprisingly, non-buggy cloned methods are found to have a higher number of parameters.

13.3.3 Documentation in Buggy and Non-buggy Clones

Figure 13.4 depicts the distribution of average values of the five documentation metrics for buggy and non-buggy clones. As seen in the figure, the medians of all the documentation metrics’ values are consistently higher (better) for non-buggy clones. On the contrary, for non-buggy clones, the mean values are lower (worse) for all the metrics except for DLOC (Documentation Lines of Code).

Table 13.5: MWW Tests over Average Values of Documentation Metrics in Buggy and Non-buggy Cloned Code

Source Code Metric | P-value       | Cliff's delta d | Significant?
CD                 | 2.20 × 10^-16 | −0.2809 (small) | Yes
CLOC               | 3.68 × 10^-16 | −0.277 (small)  | Yes
DLOC               | 2.20 × 10^-16 | −0.7243 (large) | Yes
TCD                | 9.58 × 10^-16 | −0.2730 (small) | Yes
TCLOC              | 3.12 × 10^-15 | −0.2680 (small) | Yes

[Figure omitted: box plots of the Min-Max normalized average values of the five documentation metrics (CD, CLOC, DLOC, TCD, TCLOC) for buggy and non-buggy method clones.]

Figure 13.4: Documentation Metrics’ Values in Buggy and Non-buggy Clones

Now, for the distributions of each of the five documentation metrics computed for buggy and non-buggy clones, we separately conduct a one-sided pair-wise MWW test and also measure the effect size using Cliff's delta d to test our hypothesis that buggy clones have inferior documentation (i.e., lower documentation metric values) compared to non-buggy clones. The obtained p-values and Cliff's delta d values are presented in Table 13.5.
As we see in Table 13.5, for all five documentation metrics the p-values indicate statistical significance and the Cliff's delta d values suggest non-negligible effect sizes. Thus, the statistical tests support our hypothesis for each of the documentation metrics. We, therefore, answer the research question RQ3 as follows:

Ans. to RQ3: The quality of documentation in buggy clones is significantly inferior to that in non-buggy clones.

13.3.4 Coupling in Buggy and Non-buggy Clones

In Figure 13.5, we plot the distribution of average values of the two coupling metrics for buggy and non-buggy clones. As we see in Figure 13.5, for both the coupling metrics, the variations in buggy clones are higher compared to non-buggy clones. Again, for both the metrics, averages are higher for buggy clones, and the difference is more substantial for the NOI (Number of Outgoing Invocations) metric. The median for NOI is higher for buggy clones while for the NII (Number of Incoming Invocations) metric the medians are nearly equal for both buggy and non-buggy clones.

Table 13.6: MWW Tests over Average Values of Coupling Metrics in Buggy and Non-buggy Method Clones

Source Code Metric | P-value       | Cliff's delta d      | Significant?
NII                | 0.006414      | −0.0926 (negligible) | No
NOI                | 2.20 × 10^-16 | 0.2972 (small)       | Yes

[Figure omitted: box plots of the Min-Max normalized average values of the two coupling metrics (NII, NOI) for buggy and non-buggy method clones.]

Figure 13.5: Coupling Metrics’ Values in Buggy and Non-buggy Clones

Similar to the previous analyses, we conduct a one-sided pair-wise MWW test and compute Cliff's delta d separately for each of the coupling metrics to test the hypothesis that buggy clones are more highly coupled (i.e., have higher coupling metrics' values) than non-buggy clones. The results of the tests are presented in Table 13.6. As seen in the table, the p-values of the MWW tests for both metrics indicate statistical significance with p < α. However, the Cliff's delta d values indicate a non-negligible effect size for NOI, but a negligible effect size for the NII metric. Considering the two metrics in combination, we can say that buggy method clones are comparatively more coupled than non-buggy clones. Hence, we derive the answer to the research question RQ4 as follows:

Ans. to RQ4: Buggy method clones have a significantly higher number of outgoing dependencies (i.e., outgoing invocations) compared to non-buggy method clones. However, there is no significant difference in the incoming dependencies (i.e., incoming invocations) of buggy and non-buggy cloned methods. Overall, buggy clones are more coupled than non-buggy clones.

13.4 Threats to Validity

Construct Validity: We use the bug-fixing commits identified by Ray et al. [232]. These bug-fixing commits are distinguished based on matching keywords (e.g., bug-fix, bug, issue, bug id) in the commit messages. There is a possibility that some portion of the bug-fixing commits could be regular commits (e.g., relevant to new feature implementations and improvements) and might not be genuinely relevant to bug fixing. However, this dataset of bug-fixing commits is reported to be 96% accurate [232]. In identifying the lines of code affected by a bug-fixing commit, there is a possibility that not all the affected lines of code are indeed responsible for the bug fixed in that commit. However, for this purpose, this is an acceptable approach that is also widely adopted in other studies [301, 308, 303, 25].

When a cloned method m is affected by bug-fixing changes in revision n, the method m is considered buggy at revision n − 1. For the rest of the revisions, the method m is considered non-buggy. The content of the method m can safely be considered non-buggy in any later revision r with r > n, since the bug is fixed in revision n. However, prior to the bug-fixing revision n, the method m may or may not be buggy, but in our work we considered it non-buggy. This assumption can be argued as a threat to the construct validity of this work.
Internal Validity: In the detection of clones, we have used the NiCad clone detector, which is reported to be highly accurate [138, 310] and is widely used in many studies [301, 302, 68, 75]. The library JGit, used in our work for locating changes between two revisions, is also reliably used for similar purposes in other studies [311, 312]. Moreover, we manually verified the correctness of the computations by probing random samples. Thus, we develop high confidence in the internal validity of this study.
External Validity: Although our study includes a large number of revisions of three subject systems, all the systems are open-source and written in Java. Thus, the findings from this work may not be generalizable to industrial systems and source code written in languages other than Java.
Reliability: The methodology of this study, including the procedure for data collection and analysis, is documented in this chapter. The subject systems, being open-source, are freely accessible, while the tool NiCad and the library JGit are available online. All the bug-fixing commits are also available [232]. Therefore, it should be possible to replicate this study.

13.5 Related Work

Several attempts have been made to explore the fault-proneness of clones by relating them with bug-fixing changes obtained from commit history. Very recently, Mondal et al. [303] claimed that code clones which were recently changed or created have a high possibility of containing bugs. Again, Rahman and Roy [302] found that the stability and bug-proneness of code clones are related. The studies of Mondal et al. and Rahman and Roy only focused on the recency of code changes and the stability of cloned code, respectively. On the other hand, our work, for the first time, has investigated a comprehensive set of source code metrics to identify their relationships with the bug-proneness of cloned code.
Selim et al. [88] investigated the bug-proneness of code clones by combining source code and clone-related metrics. However, they used only four source code metrics (i.e., Lines of Code, number of tokens, Nesting Level, and Cyclomatic Complexity). Moreover, their investigation was based only on Type-1 and Type-2 method clones, thus missing Type-3 clones. In contrast to their work, we have used Type-3 clones for this study, which also include Type-1 and Type-2 clones. Moreover, we have used 29 source code metrics to conduct the analyses in this study.
Wang et al. [313] related eight source code metrics with the harmfulness of cloned code. They defined the harmfulness of a piece of cloned code based on maintenance cost: the higher the maintenance cost, the more harmful the cloned code.

In contrast, we have studied real buggy clones identified through bug-fixing changes and examined them using 29 software metrics.
Juergens et al. [72] studied inconsistent clones to relate them with bugs. They used manual investigation to identify bugs in inconsistent clones, and concluded that unintentionally made inconsistent clones are more likely to contain defects. They did not conduct any statistical test of significance for their findings. In contrast to their approach, we have mined source code repositories to identify bugs and conducted statistical tests of significance for all of our findings.
Rahman et al. [25] compared the bug-proneness of cloned and non-cloned code and found non-cloned code to be more bug-prone than cloned code. However, their investigation was based on monthly snapshots of their subject systems, and thus they had the possibility of missing buggy commits. In our study, we consider all the bug-fix revisions for each of the subject systems; thus, we have included all bug-fix commits.
Barbour et al. [83] suggested that late propagations due to inconsistent changes are prone to introduce software defects. While Lozano and Wermelinger [73] suggested that having a clone may increase the maintenance effort for changing a method, Hotta et al. [79] reported that code clones do not have any negative impact on software changeability. Lozano et al. [91] reported that a vast majority of methods experience larger and more frequent changes when they contain cloned code. Mondal et al. [82] also reported code clones to be less stable. However, opposite results are found in several other studies [283, 90, 78, 80].
Recently, Islam et al. [96] conducted a comparative study of security vulnerabilities in cloned and non-cloned code. Earlier, Islam and Zibran [95] and Sajnani et al. [285] conducted two comparative studies of code smells in cloned and non-cloned code. However, they defined code smells as vulnerability and bug patterns in their respective studies. By targeting code stability and the dispersion of code changes, Mondal et al. also performed comparative studies of cloned and non-cloned code [314, 315].
Along the line of comparative studies, Saini et al. [306] conducted a study to compare source code quality metrics for cloned and non-cloned code. The study used 27 software quality metrics, categorized in three groups, i.e., complexity, modularity, and documentation (code comments). They did not find any statistically significant difference between the quality of cloned and non-cloned methods for most of the metrics. However, we have compared the significance of the differences of 29 software quality metrics' values (categorized in four groups) in buggy and non-buggy cloned methods instead of comparing them in cloned and non-cloned code.
In another study, Islam et al. [308] found that the percentage of changed files due to bug-fix commits is significantly higher in cloned code compared with non-cloned code. They also found that the possibility of severe bugs occurring is higher in cloned code than in non-cloned code.
While all these studies provide valuable insights about the characteristics of code clones, the clone literature lacked a comparative study of the characteristics of buggy and non-buggy cloned code in terms of source code quality metrics. This study fills that gap to some extent and identifies which source code quality metrics are associated with buggy cloned code.

13.6 Summary

In this chapter, we have presented a comparative study of buggy and non-buggy Type-3 method clones in terms of 29 code quality metrics grouped into complexity metrics, size metrics, documentation metrics, and coupling metrics. Our quantitative study is based on 2,077 bug-fixing revisions of three open-source software systems written in Java.
In our study, we have found that buggy clones have higher complexity and lower maintainability compared to non-buggy cloned methods. Moreover, buggy clones are found to be larger in size, measured in terms of the number of statements and lines of code. Surprisingly, we have found that non-buggy method clones have a higher number of parameters, even though too many parameters in functions are generally considered problematic and recognized as a code smell [74].
As expected, compared to non-buggy clones, the documentation quality of buggy clones is found to be inferior. Overall, buggy clones are found to be more coupled than non-buggy clones. It is interesting to have found that buggy cloned methods have significantly higher outgoing dependencies (i.e., outgoing invocations) compared to non-buggy clones. In the case of incoming dependencies, no significant difference is found between buggy and non-buggy clones.
The results are validated with statistical tests of significance. However, there is a need for qualitative analyses to draw further insights into the reasons behind our findings, especially for the case of the higher number of method parameters in non-buggy method clones and for the case of the higher outgoing dependencies in buggy clones. We plan to extend this work with qualitative analyses along with a larger number of subject systems and revisions.
The findings from this study advance our understanding of the characteristics and impacts of buggy clones on code quality. It appears that some of the code quality metrics can be good indicators of potentially buggy clones, and thus can be applied for identifying problematic clones for removal by refactoring or other special treatments. To do so, we need to derive a thresholding mechanism, which would indicate at what values of the quality metrics certain clones would be reported as potentially harmful. These remain within our plans for expanding this work.
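As a purely hypothetical illustration of the thresholding idea mentioned above, the sketch below flags a cloned method when selected metrics cross chosen cut-offs. The metric selection and the threshold values are placeholders of our own, not results derived in this study.

// Hypothetical sketch of a metric-threshold flag for potentially buggy clones.
// The chosen metrics and cut-off values below are placeholders, not findings of this study.
import java.util.Map;

public class CloneRiskFlag {

    static boolean potentiallyBuggy(Map<String, Double> metrics) {
        double mccc = metrics.getOrDefault("McCC", 0.0);   // cyclomatic complexity
        double lloc = metrics.getOrDefault("LLOC", 0.0);   // logical lines of code
        double cd   = metrics.getOrDefault("CD", 1.0);     // comment density
        double noi  = metrics.getOrDefault("NOI", 0.0);    // outgoing invocations

        // Placeholder cut-offs: high complexity or size, poor documentation, or heavy outgoing coupling.
        return mccc > 10 || lloc > 80 || cd < 0.1 || noi > 15;
    }
}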

Chapter 14

Exposing Bug-fix Patterns

From Chapter 4 to Chapter 13, we presented our studies to address sub-problem I (i.e., detecting developers' sentiments/emotions), sub-problem II (i.e., understanding developers' sentiments), and sub-problem III (i.e., understanding the impacts of developers' copy-paste actions). In this chapter, we conduct an in-depth quantitative and qualitative analysis over buggy revisions of software systems to identify bug-fixing patterns, addressing sub-problem IV (i.e., understanding the fixing patterns of developers' mistakes that cause software bugs).
The remainder of this chapter is organized as follows. In Section 14.1, we introduce the motivation of the work. In Section 14.2, we describe the methodology of this study, including short descriptions of the dataset and tools used. In Section 14.3, we identify dominant bug-fixing edit patterns. In Section 14.4, we derive the nesting patterns by distinguishing those nested code structures that host frequent bug-fix edits. In Section 14.5, we describe possible limitations and the threats to the validity of this work. Related work is discussed in Section 14.6. Finally, Section 14.7 concludes the chapter.

14.1 Introduction

Bug-fixing efforts consume a large portion of the total expense of software maintenance [98], while nearly 80% of software cost is spent on maintenance [76]. Typical bug-fixing efforts mainly involve two types of tasks: (a) localization of bugs in source code, and (b) bug-fixing edits to source code. A deep understanding of the common bug-fixing patterns can immensely help in minimizing the efforts in both of these tasks and also contribute to devising techniques for automated program repair (APR). A good understanding of bug patterns can also help a developer proactively avoid writing code that leads to program faults.
Bug-fixing efforts require a good understanding of the source code, the intended edits, and their potential impacts. Studies [316, 317] find that code changes are repetitive in nature within and across code bases. Hence, mining code changes has become an effective way to support program comprehension and to derive patterns of diverse categories, including bug-fix patterns. Early efforts in discovering bug-fix patterns highly depended on manual effort [316, 318] in the analysis of textual differences among program entities. However, manual effort is criticized for being error-prone, tedious, incomplete, and imprecise [319, 320, 321]. Recent efforts have made use of Abstract Syntax Tree (AST) based code differencing tools (e.g., ChangeDistiller [322], Diff/TS [323], and GumTree [319]) for the automatic discovery of code changes and the differencing of program entities.

Previous work on discovering bug-fix patterns remained focused on bug-fixing edit patterns, which capture bug-fixing changes to source code at a very fine-grained level without capturing those changes' surrounding code contexts, such as nested code structures. The nested code structure, which is a hierarchy of AST nodes, indicates the location of a bug-fix change in an AST node hierarchy. Nested code structures provide an important code context/aspect of bug-fix changes but remained absent in the studies [319, 324, 321, 325, 326, 318, 327, 328] that identified bug-fixing edit patterns.
In this work, we capture both bug-fixing edit patterns and nesting patterns (i.e., frequent nested code structures) of bug-fixing edits through an in-depth (quantitative and qualitative) analysis of 4,653 buggy revisions of five software systems drawn from diverse application domains. We organize this chapter around two research questions as follows:
RQ1: What are the common patterns of bug-fixing edits? – Here, we explore the bug-fixing edits/changes made in source code and identify the bug-fixing edit patterns. We verify what portion of the identified bug-fixing edit patterns are new, and how many of them were previously reported in earlier studies [326, 318, 327].
RQ2: What are the prominent nested code structures that frequently host bug-fixing edits? – Here we investigate the frequent nested code structures (i.e., nesting patterns) where the bug-fixing edits are located. These nesting patterns complement the edit patterns in our understanding of bug-fixing patterns with information about the locations and contexts of individual edits within their surrounding nested code structures. Moreover, such AST-based nested code structure contexts can potentially be used along with other code contexts, such as textual similarity of code [329], to develop an effective technique to automatically locate program faults and repair them.

[Figure omitted: workflow with numbered actions showing, for each bug-fixing commit c of a system P1, obtaining the (n-1)th (buggy) and nth (bug-fixing) revisions with JGit, identifying AST differences between the versions with GumtreeSpoon and GumTree, and deriving bug-fixing edit patterns and nested code structures (nesting patterns) of bug-fixing edits; the same computational steps are repeated for each bug-fixing commit in each of the five subject systems, the numbers indicate the sequence of execution, and the nested-code-structure step is elaborated further in Figure 14.10.]

Figure 14.1: Procedural steps to identify edit patterns and nesting patterns of bug-fixing changes

Contributions: Towards a deeper understanding of bug-fix patterns, this work makes two major contributions:

• We identify a total of 38 bug-fix patterns organized in 14 categories. This is the highest number of bug-fix patterns identified in a single study. Four of these patterns are completely new, and 34 of them confirm those reported in earlier studies.

• We study locations of bug-fix changes in nested code structures and identify 37 nesting patterns that hold the majority of the bug-fix edits. These nesting patterns are new (i.e., never targeted before), and add a new dimension in our understanding of bug-fix patterns.

14.2 Methodology

The procedural steps of our empirical study are summarized in Figure 14.1. For each subject sys- tem, we collect the bug-fixing revisions. Then, for each bug-fixing revision, using AST based code differencing tools, we detect differences between the bug-fixing revision and its immediate previous revision. Collections of such AST differences are then analyzed to detect bug-fixing edit patterns and dominant nested code structures of code changes to fix bugs. In the following, we describe the subject systems and elaborate procedural steps with necessary discussions.

14.2.1 Subject Systems

We study 4,653 revisions of five open-source software systems written in Java. These subject systems, as listed in Table 14.1, are available at GitHub. In Table 14.1, we present the total number of revisions and the number of thousands of source lines of code (KLOC) in the last revision. We choose these five subject systems as they have variations in application domains, sizes, and numbers of revisions, and they are also used in other studies [97, 232]. Moreover, we consciously choose five subject systems that can be classified in two sets: (i) the first set consists of the first three subject systems, which have never been used before to detect bug-fixing edit patterns, and (ii) the second set consists of the remaining two subject systems, which were used in earlier studies [329]. Such a combination of subject systems provides the opportunity not only to verify whether earlier detected bug-fixing edit patterns exist in our study, but also to identify new bug-fixing edit patterns if they exist.

14.2.2 Collecting Bug-fixing Commits

For the first three systems, we collect the bug-fixing commits identified by Ray et al. [232]. These bug-fixing commits are distinguished through matching keywords (e.g., bug, defect, issue) in the commit messages and are reported to be 96% accurate [232]. To identify the bug-fixing commits in the remaining two systems, we use the same keywords used by Ray et al. [232]. The number of bug-fixing revisions for each system is listed in the last column of Table 14.1.
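A minimal sketch of this keyword-based identification is shown below. The regular expression is an illustrative approximation of the keyword matching; the exact keyword list used by Ray et al. [232] may differ.

// Sketch of keyword-based identification of bug-fixing commits from commit messages.
// The keyword list is illustrative, not the exact list used in the referenced dataset.
import java.util.regex.Pattern;

public class BugFixCommitFilter {

    private static final Pattern BUG_KEYWORDS =
            Pattern.compile("\\b(bug|fix(es|ed)?|defect|issue|error|fault|patch)\\b",
                    Pattern.CASE_INSENSITIVE);

    static boolean isBugFixing(String commitMessage) {
        return BUG_KEYWORDS.matcher(commitMessage).find();
    }
}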

Table 14.1: Subject Systems

Subject System       | Application Domain          | KLOC (last rev.) | Total # of Revisions | # of Bug-Fixing Rev.
Netty                | Network                     | 1,078            | 8,534                | 1,103
Presto               | SQL                         | 2,869            | 11,909               | 841
Facebook-android-SDK | Social Networking           | 172              | 671                  | 133
Accumulo             | Distributed key-value store | 458              | 9,734                | 1,941
Common-maths         | Math library                | 187              | 6,971                | 635
Total over all the systems                         | 4,764            | 37,819               | 4,653

14.2.3 Generating Abstract Syntax Tree of Bug-fixing Changes

Consider a bug-fixing commit c resulting in the nth revision of a system/project P1. If a particular line of code is modified in the bug-fixing commit c, then it implies that the modification is necessary to fix the bug. Thus, that line of code in the (n − 1)th revision is considered a buggy line. In other words, we consider the changes between the nth and (n − 1)th revisions of P1 to be buggy. Several other studies [301, 308, 303, 25] also adopted a similar approach for distinguishing buggy source code.
At this level, as shown in Figure 14.1, we obtain the nth and (n − 1)th revisions using JGit [307]. Then, we capture bug-fixing changes at the AST [319] level between those two revisions using GumtreeSpoon and GumTree separately (see actions 3 and 4 in Figure 14.1). The captured AST differences from GumtreeSpoon and GumTree are further processed to determine bug-fixing edit patterns and nesting patterns, respectively (see actions 5 and 6 in Figure 14.1).
Before describing how we identify bug-fixing edit patterns (in Section 14.3) and nesting patterns (in Section 14.4), in the following we discuss and compare the outputs of GumTree and GumtreeSpoon to develop the background/context that helps in understanding the rest of the chapter.
Understanding of GumTree's output. For each action/change in a node, GumTree generates four major attributes: (i) action name (e.g., ins, del, upd, or mov); (ii) label, which indicates the text/name of the changed node; (iii) type of the changed node (e.g., the changed node can be a simple variable name or an expression); and (iv) nested code structure (NCS), the tree/hierarchy of parent nodes of the changed node, which indicates the location of the changed node in an AST. We use these four attributes (action name, node type, label, NCS) to represent a changed AST node.

[Figure 14.2(a) (graphic not reproduced here): side-by-side buggy and bug-fixed versions of a Calculator class with a getSumofEvenNumbers(Int[] numbers) method, in which the literal 3 in an if condition inside a for loop (line 9) is changed to 2. The corresponding change representations are:]
(b) GumTree: (Update, NumberLiteral, 3, Infix_expression→Infix_expression→If_statement→Block→For_statement→Block→Method_declaration→Type_declaration→Compilation_unit)
(c) GumtreeSpoon: (Update, Literal, rightOperand, 3 to 2, CtBinaryOperatorImpl→CtBinaryOperatorImpl→CtIfImpl→CtBlockImpl→CtForImpl→CtBlockImpl→CtMethodImpl→CtClassImpl→CtModelImpl$CtRootPackage)

Figure 14.2: (a) Changing a literal in an if statement to fix a bug and the presentations of the bug-fixing change using (b) GumTree and (c) GumtreeSpoon

(a) Buggy code:
public class Math{
  public void sum(int a, int b){
    int c = a+b;
  }
}
Bug-fixed code:
public class Math{
  public void sum(int a, int b){
    if(a!=b){
      int c = a+b;
    }
  }
}
(b) (Insert, If, Statement, if (a != b) { ; }, CtBlockImpl→CtMethodImpl→CtClassImpl→CtModelImpl$CtRootPackage)

Figure 14.3: (a) Adding an if statement as a precondition to fix a bug and (b) corresponding representation of the bug-fixing change using GumtreeSpoon

Let us assume there is a bug in the piece of code presented on the left side of Figure 14.2(a). The bug resides in line number nine, where a developer uses literal '3' instead of literal '2'. The buggy code is fixed in the bug-fix revision, which is presented on the right side of Figure 14.2(a). If those two revisions are given to GumTree, it will generate the differences between the provided revisions, which can be presented using a tuple of four attributes as shown in Figure 14.2(b). From Figure 14.2(b), it is easily understood that a NumberLiteral is updated to fix the bug. From the NCS (the last attribute) of the updated node, we see that the NumberLiteral is part of two infix_expressions (i.e., == and %), which reside in an if statement. Again, the if statement resides in a block under a for statement. The for statement is part of a block inside a method. The method resides inside a type declaration (i.e., a class), and a compilation unit is always the root of an NCS. Here, it is noticeable that an NCS represents a sequence, where the root and all internal nodes have only one child, except the leaf node, which has no child.

Understanding of GumtreeSpoon's output. While the outputs of GumtreeSpoon are almost similar to the outputs of GumTree, some fundamental differences exist between them. First, GumtreeSpoon provides a changed node's role in its immediate parent node (i.e., role in parent), which helps in understanding code change patterns. For example, GumtreeSpoon indicates that the changed literal's role in its parent is rightOperand. How the attribute role in parent helps in determining bug-fixing edit patterns is elaborately described in Algorithm 1, presented later in this chapter. Second, GumtreeSpoon provides the modified source code as opposed to the label provided by GumTree. We find the modified source code more helpful than a label for understanding bug-fixing edit patterns (see Section 14.3.2). Thus, we use a tuple of five attributes, (action name, node type, role in parent, modified source code, nested code structure), to represent a code change using GumtreeSpoon's output, as shown in Figure 14.2(c). It is noticeable in Figures 14.2(b) and 14.2(c) that the naming conventions of the nodes are different between GumTree and GumtreeSpoon. Finally, while GumTree provides fine-grained differences, GumtreeSpoon generates summary/concise outputs of code changes that help in understanding bug-fixing edit patterns conveniently. For example, the code change shown in Figure 14.3(a) is represented using GumtreeSpoon's output in Figure 14.3(b). From Figure 14.3(b), we see that only an if node is inserted; thus, GumtreeSpoon ignores other fine-grained changes such as the additions of the conditional operator and the variables (e.g., a and b). In contrast, GumTree's output indicates that five nodes are inserted: a block, an ifStatement, an infixExpression (i.e., !=), and two simpleNames (i.e., variables a and b). While these detailed, in-depth, and verbose outputs provided by GumTree are suitable for analyzing nesting patterns at deeper levels to answer RQ2, the concise outputs of GumtreeSpoon are required for analyzing bug-fixing edit patterns to answer RQ1.
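To make the two representations concrete, the sketch below models them as simple Java records whose fields follow the attribute lists given above. The class names and the example constructor are our own illustration, not part of either tool's API.

// Illustrative data holders for the two change representations discussed above.
import java.util.List;

public class ChangeTuples {

    /** GumTree-style tuple: (action, node type, label, nested code structure). */
    record GumTreeChange(String action, String nodeType, String label, List<String> ncs) { }

    /** GumtreeSpoon-style tuple: (action, node type, role in parent, modified source code, NCS). */
    record SpoonChange(String action, String nodeType, String roleInParent,
                       String modifiedSource, List<String> ncs) { }

    static SpoonChange exampleFromFigure142c() {
        // The update of literal 3 to 2 shown in Figure 14.2(c).
        return new SpoonChange("Update", "Literal", "rightOperand", "3 to 2",
                List.of("CtBinaryOperatorImpl", "CtBinaryOperatorImpl", "CtIfImpl", "CtBlockImpl",
                        "CtForImpl", "CtBlockImpl", "CtMethodImpl", "CtClassImpl",
                        "CtModelImpl$CtRootPackage"));
    }
}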

14.3 Bug-fixing Edit Patterns

Once we have GumtreeSpoon's outputs for the bug-fixing changes, we aim to identify the bug-fixing edit patterns defined by Pan et al. [318]. Pan et al. have defined a set of 27 bug-fixing edit patterns divided into nine categories: If-related (IF), Method Calls (MC), Sequence (SQ), Loop (LP), Assignment (AS), Switch (SW), Try (TY), Method Declaration (MD), and Class Field (CF). Their study has identified the highest number of bug-fixing edit patterns in a single study. Moreover, according to the number of citations, it is one of the most important papers on bug-fix edit patterns, and it has thus become a benchmark for studies related to the detection of bug-fixing edit patterns. In the rest of this chapter, we use the term PanPattern to refer to a pattern identified by Pan et al. [318].

(a) Buggy code:
public class Math{
  public void sum(float a, int b){
    float c = a+b;
  }
}
Bug-fixed code:
public class Math{
  public void sum(int a, int b){
    float c = a+b;
  }
}
(b) (Update, TypeReference, float to int, type, CtParameterImpl→CtMethodImpl→CtClassImpl→CtModelImpl$CtRootPackage)

Figure 14.4: (a) Updating parameter type (float to int) of a method to fix a bug and (b) corresponding representation of the bug-fixing change using GumtreeSpoon

We also verify whether GumtreeSpoon is able to identify any new bug-fixing edit patterns beyond the PanPatterns in our dataset.

14.3.1 Making Sense of GumtreeSpoon’s Outputs

Here we manipulate GumtreeSpoon's outputs using Algorithm 1 to make them more interpretable for our analysis. Based on a preliminary investigation, we find that a code change belongs to or impacts the node that is the immediate previous node of the first occurrence of a block node in an NCS. For example, from the bug-fixing change presented in Figure 14.2(a), it is not obvious that the change occurs in an if statement until we see the immediate previous node of the first block node (which is indeed an if node) in the NCS given in Figure 14.2(c), generated by GumtreeSpoon. When an NCS contains at least one block node, we determine the pattern of a bug-fixing change using the procedure described in Algorithm 1, Lines 2–8.
An NCS starts with a block node if an insertion, deletion, or update is performed on a node that is not contained in, associated with, or linked to any other node within its block. As shown in Figure 14.3(b), a node If is inserted and the NCS starts with a block node. The pattern for this type of bug-fix change is determined using the action (i.e., ins/del/upd) performed on a node to change the code, and the name of the changed node, as shown in Algorithm 1, Lines 3–4.
Another category of bug-fix changes contains those types of patterns where the implementation of a node is updated by performing an action on other nodes that are contained in, associated with, or linked to the implementing node within its block. As shown in Figure 14.2(a), a literal node, which is contained in an implementing if node, is updated, where both nodes (i.e., if and literal) reside in the same block. In this case, the bug-fixing edit pattern is determined using the action, the changed node name, and the name of the immediate previous node of the first occurrence of a block node in the NCS (see Algorithm 1, Lines 6–7).

The third category of bug-fix edit patterns does not have any block node in the NCSes; it covers changes in the definitions of class or interface members, such as the addition/removal of class fields or methods, or changes in the types of parameters of methods. As shown in Figure 14.4(a), a developer updates the type of a parameter from float to int to fix a bug. Figure 14.4(b) represents the change using the output of GumtreeSpoon, where the NCS does not have any block node. For this case, we identify a bug-fixing edit pattern by incorporating the changed node's role in parent attribute, and consider the first node in the NCS as the location of the change.
If a class member (e.g., method, variable) or a parameter of a method is changed, we use the action, node name, and the first node in the NCS to determine the pattern of the bug-fix change (see Algorithm 1, Lines 10–14). If the type of a class variable or of a method's parameter is changed, then we determine the location of the change (e.g., the type of a class variable or of a method's parameter) (see Algorithm 1, Line 16), and use that along with the action and node name to determine the bug-fix pattern (see Algorithm 1, Line 17).
For the bug-fixing changes presented in Figure 14.2(a), Figure 14.3(a), and Figure 14.4(a), Algorithm 1 will output the patterns update literal of CtIfImpl, insert if, and update type of a parameter of a method, respectively. The set of patterns that we identify using Algorithm 1 are termed GSPatterns in the rest of this chapter.

Algorithm 1: Detection of GSPatterns
Input: T, a tuple of five attributes generated by GumtreeSpoon for a code change
1   String pattern;
2   if T.NCS.contains("Block") then
3       if T.NCS.startsWith("Block") then
4           pattern ← T.action + " " + T.nodeName;
5       else
6           String IPN ← getPreviousNodeOfFirstBlock(T.NCS);
7           pattern ← T.action + " " + T.nodeName + " of " + IPN;
8       end
9   else
10      String FNN ← getFirstNodeInNCS(T.NCS);
11      if T.roleInParent.equals("typeMember") then
12          pattern ← T.action + " " + T.nodeName + " in " + FNN;
13      else if T.roleInParent.equals("parameter") then
14          pattern ← T.action + " " + T.nodeName + " in " + FNN;
15      else if T.roleInParent.equals("type") then
16          String CFL ← getChangeLocation(T.NCS);
17          pattern ← T.action + " " + T.nodeName + " in " + CFL;
18      end
19  end
20  return pattern;
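A direct Java rendering of Algorithm 1 is sketched below. The Tuple type and the helper steps (how the first block node is located, and the simplified stand-in for getChangeLocation) are our own assumptions about how the GumtreeSpoon output could be modeled; only the branching logic follows the algorithm above.

// Sketch of Algorithm 1; data model and helper bodies are simplified assumptions.
import java.util.List;

public class GsPatternDetector {

    record Tuple(String action, String nodeName, String roleInParent,
                 String modifiedSource, List<String> ncs) { }

    static String detectPattern(Tuple t) {
        List<String> ncs = t.ncs();
        int firstBlock = indexOfFirstBlock(ncs);
        if (firstBlock >= 0) {                                   // the NCS contains a block node
            if (firstBlock == 0) {                               // the NCS starts with a block node
                return t.action() + " " + t.nodeName();
            }
            String ipn = ncs.get(firstBlock - 1);                // node just before the first block
            return t.action() + " " + t.nodeName() + " of " + ipn;
        }
        String fnn = ncs.get(0);                                 // first node in the NCS
        if (t.roleInParent().equals("typeMember") || t.roleInParent().equals("parameter")) {
            return t.action() + " " + t.nodeName() + " in " + fnn;
        }
        if (t.roleInParent().equals("type")) {
            String changeLocation = fnn;                         // simplified stand-in for getChangeLocation
            return t.action() + " " + t.nodeName() + " in " + changeLocation;
        }
        return t.action() + " " + t.nodeName();                  // fallback added for this sketch only
    }

    private static int indexOfFirstBlock(List<String> ncs) {
        for (int i = 0; i < ncs.size(); i++) {
            if (ncs.get(i).contains("Block")) return i;          // e.g., matches CtBlockImpl
        }
        return -1;
    }
}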

14.3.2 Mapping GSPatterns to PanPatterns

In most cases, a GSPattern can be mapped directly to its corresponding PanPattern. For example, the GSPattern update literal of CtIfImpl indicates its corresponding PanPattern change of if condition expression (IF-CC) [318]. However, we have to leverage the attribute modified source code to identify PanPatterns from their corresponding GSPatterns in two cases: (i) addition of a precondition (i.e., if node) check with/without a jump statement (e.g., return and break), and (ii) changes in a method call.
An inserted if statement acts as a precondition if it wraps up existing code; otherwise, it is considered a new insertion of an if node. For any inserted if node, if we find that the modified source code contains any lone semicolon (;) in a line, then the inserted if statement/node is considered a precondition. For example, the modified source code presented in Figure 14.3(b) contains a lone semicolon for the bug-fixing change presented in Figure 14.3(a). The number of such lone semicolons indicates the number of lines wrapped up by the precondition. In addition to semicolons, we also check whether the modified source code contains any jump statement, such as return, continue, or break, to identify whether a precondition is added with a jump, which corresponds to another PanPattern, addition of a precondition check with a jump (IF-APCJ). We use the same logic to identify whether a piece of code is wrapped up by statements such as try-catch, loop, or switch-case. We hypothesize that an inserted if is added as a postcondition if it is not a precondition.
In the second case, we parse the modified source code to extract the method call statements in a buggy revision and its non-buggy revision. Then, for each method call, we extract the method name, arguments, and class name of the method call, if available.
Then, we compare the extracted information between the buggy and non-buggy method call statements to identify the location where a change occurs, and map the change to its corresponding PanPattern. Multiple changes may occur in a method call (e.g., the return type can be changed and an argument can be inserted) to fix a buggy method call. In such cases, we record all types of changes and use them to identify the bug-fixing edit patterns.

Table 14.2: Distributions of PanPatterns identified in our dataset using GumtreeSpoon

Category (Cat)          | Pattern (Pat)                                                                  | # of Pat | % of Pat | Total per Cat | % per Cat
Method Declaration (MD) | Change of method declaration (MD-CHG)                                          | 4,116    | 17.54%   | 7,687         | 33.00%
                        | Addition of a method declaration (MD-ADD)                                      | 1,916    | 8.17%    |               |
                        | Removal of a method declaration (MD-RMV)                                       | 1,655    | 7.05%    |               |
If-related (IF)         | Addition of post-condition check (IF-APTC)                                     | 1,975    | 8.42%    | 4,877         | 20.78%
                        | Removal of an if predicate (IF-RMV)                                            | 1,588    | 6.77%    |               |
                        | Change of if condition expression (IF-CC)                                      | 761      | 3.24%    |               |
                        | Addition of precondition check (IF-APC)                                        | 309      | 1.32%    |               |
                        | Addition of precondition check with jump (IF-APCJ)                             | 37       | 0.16%    |               |
                        | Removal of an else branch (IF-RBR)                                             | 71       | 0.30%    |               |
                        | Addition of an else branch (IF-ABR)                                            | 136      | 0.58%    |               |
Method Call (MC)        | Method call with different number of parameters or different types of parameters (MC-DNP) | 2,402 | 10.24% | 4,639    | 20.00%
                        | Change of method call to a class instance (MC-DM)                              | 1,582    | 6.74%    |               |
                        | Method call with different actual parameter values (MC-DAP)                    | 655      | 2.79%    |               |
Class Field (CF)        | Addition of a class field (CF-ADD)                                             | 1,355    | 5.77%    | 3,735         | 16.00%
                        | Change of class field declaration (CF-CHG)                                     | 1,719    | 7.33%    |               |
                        | Removal of a class field (CF-RMV)                                              | 661      | 2.82%    |               |
Assignment (AS)         | Change of assignment block expression (AS-CE)                                  | 1,401    | 5.97%    | 1,401         | 5.97%
Loop (LP)               | Change of loop predicate (LP-CC)                                               | 549      | 2.34%    | 549           | 2.34%
Try (TY)                | Addition/removal of try statement (TY-ARTC)                                    | 491      | 2.09%    | 526           | 2.24%
                        | Addition/removal of a catch block (TY-ARCB)                                    | 35       | 0.15%    |               |
Switch (SW)             | Addition/removal of switch block branch (SW-ARSB)                              | 50       | 0.21%    | 50            | 0.21%
Overall total           |                                                                                | 23,464   | 100%     |               |

Table 14.3: Distributions of New Edit Patterns Identified in our Dataset using GumtreeSpoon

Category (Cat)            | Pattern (Pat)                                              | # of Pat | % of Pat | Total per Cat | % per Cat
Local Variable (LV)       | Update implementation of local variable (LV-IMPL)          | 4,043    | 15.41%   | 7,709         | 29.38%
                          | Addition or deletion of local variable (LV-AD)             | 3,666    | 13.97%   |               |
Method Call (MC)          | Class/target change of method call (MC-TC)                 | 2,881    | 10.98%   | 7,531         | 28.70%
                          | Addition of new method call (MC-A)                         | 2,754    | 10.50%   |               |
                          | Deletion of new method call (MC-D)                         | 1,896    | 7.23%    |               |
Return (RT)               | Update implementation of return statement (RT-IMPL)        | 3,361    | 12.81%   | 4,200         | 16.01%
                          | Addition or deletion of return statement (RT-AD)           | 839      | 3.20%    |               |
Assignment (AS)           | Addition or deletion of assignment block statement (AS-AD) | 3,390    | 12.92%   | 3,390         | 12.92%
Constructor (CT)          | *Addition or deletion of constructor (CT-AD)               | 578      | 2.20%    | 1,013         | 3.86%
                          | *Parameter update in constructor (CT-Param)                | 435      | 1.66%    |               |
Throw (TW)                | Update of implementation of throw statement (TW-IMPL)      | 651      | 2.42%    | 861           | 3.28%
                          | Addition or deletion of throw statement (TW-AD)            | 210      | 0.80%    |               |
Class or Interface (CI)   | Addition or deletion of class or interface (CI-AD)         | 480      | 1.83%    | 480           | 1.83%
Wrap/Unwrap Code (WU-Code)| Wrap/unwrap code with/from high-level node (WU-Code)       | 410      | 1.54%    | 410           | 1.54%
Loop (LP)                 | Addition and/or deletion of loop statement (LP-AD)         | 405      | 1.54%    | 405           | 1.54%
Catch (CA)                | *Addition or deletion of catch variable (CA-AD)            | 130      | 0.50%    | 130           | 0.50%
Enum (EN)                 | *Addition or deletion of enum statement (EN-AD)            | 112      | 0.43%    | 112           | 0.43%
Overall total             |                                                            | 26,241   | 100%     |               |

*Completely new bug-fixing edit patterns identified in this study.
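The precondition heuristic of Section 14.3.2 can be summarized by the small sketch below: an inserted if is treated as a precondition when its modified source code contains lone semicolons (placeholders for the wrapped-up statements), and as a precondition with a jump when a jump statement also appears. The method names and regular expressions are our own illustration of that heuristic.

// Sketch of the lone-semicolon precondition heuristic; names and regexes are illustrative.
public class PreconditionHeuristic {

    static boolean isPrecondition(String modifiedSource) {
        return countLoneSemicolons(modifiedSource) > 0;
    }

    static boolean isPreconditionWithJump(String modifiedSource) {
        return isPrecondition(modifiedSource)
                && modifiedSource.matches("(?s).*\\b(return|break|continue)\\b.*");
    }

    /** Counts lines that consist of a single semicolon, i.e., placeholders for wrapped-up statements. */
    static int countLoneSemicolons(String modifiedSource) {
        int count = 0;
        for (String line : modifiedSource.split("\\R")) {
            if (line.trim().equals(";")) count++;
        }
        return count;
    }
}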

14.3.3 Dominant Bug-fixing Edit Patterns

14.3.3.1 Detected PanPatterns

By processing GumtreeSpoon's outputs, we are able to detect 21 types of PanPatterns distributed in seven categories, presented in Table 14.2. The abbreviations/initials of the categories and patterns' names are given in the same table. The MD category contains the highest number of bug-fixing changes (33.00%), followed by the IF (20.78%), MC (20.00%), and CF (16.00%) categories. Noticeably, the first four categories account for almost 90% of the bug-fixing changes. Category SW experiences the lowest number of bug-fixing changes (0.21%), preceded by the LP and TY categories, which consist of only 2.34% and 2.24% of the total number of PanPatterns, respectively.
The pattern MD-CHG experiences the highest number of bug-fixing changes (17.54%), followed by the patterns MC-DNP (10.24%) and IF-APTC (8.42%). Interestingly, those three patterns are from three distinct categories. MD-ADD, CF-CHG, and MD-RMV are the next three patterns that experience the highest numbers of bug-fixing changes (ranging from 7.05% to 8.17%). The patterns MC-DM and IF-RMV experience almost equal amounts of bug-fixing changes (≈6.70%). Surprisingly, the patterns IF-APCJ, IF-RBR, and IF-ABR from the IF-related category together contribute only 1.04% of the total PanPatterns. Except for the patterns SW-ARSB and TY-ARCB (which contribute only 0.21% and 0.15% of the total number of PanPatterns, respectively), the proportions of the remaining PanPatterns range from 1.32% to 5.77%.

Buggy code:
public class Math{
  …
}
Bug-fixed code:
public class Math{
  …
  public Math() { … }
  …
}

Figure 14.5: Insertion of a constructor to fix a bug

Buggy code:
int index=checkIndex();
if(index<=0){
  throw new NullPointerException();
}
Bug-fixed code:
int index=checkIndex();
if(index<=0){
  return null;
}

Figure 14.6: Deletion of a throw statement to fix a bug

14.3.3.2 New bug-fixing edit patterns

Using GumtreeSpoon, we identify 17 types of new bug-fixing edit patterns in 11 categories, presented in Table 14.3. Here, we indicate as new those bug-fixing edit patterns that are not defined among the PanPatterns. Although some of those 17 bug-fixing edit patterns have already been identified in other studies [325, 326, 327], we identify four completely new bug-fixing edit patterns, as indicated in Table 14.3. In the following, we briefly define the new patterns that are relatively complex, while some of the patterns can be easily understood from their names, such as addition or deletion of a method call (MC-AD) or a class/interface (CI-AD).
Addition or deletion of node N1 (N1-AD). This type of pattern consists of the addition or deletion of a node N1, where N1 ∈ {constructor, throw, loop, enum, return, local variable, assignment}. For example, in Figure 14.5, we see a constructor is inserted in a class to fix a bug. Again, in Figure 14.6, a throw statement is deleted to fix another bug. For each of the seven nodes we define a pattern: (i) addition or deletion of constructor (CT-AD), (ii) addition or deletion of throw (TW-AD), (iii) addition or deletion of loop (LP-AD), (iv) addition or deletion of enum (EN-AD), (v) addition or deletion of return (RT-AD), (vi) addition or deletion of local variable (LV-AD), and (vii) addition or deletion of assignment (AS-AD).
Update implementation of node N2 (N2-IMPL). In this type of pattern, the implementation of a node N2 is updated by performing actions on other nodes associated with the implementing node. For example, in Figure 14.7, we see that the implementation of a throw node is changed by updating an associated node NullPointerException() to IndexOutOfBoundsException() to fix a bug. Here N2 ∈ {throw, return, local variable}. Again, for each of the three nodes, we define a pattern: update implementation of throw statement (TW-IMPL), update implementation of return statement (RT-IMPL), and update implementation of local variable (LV-IMPL).

Buggy version:
    int index = checkIndex();
    if (index <= 0) {
        throw new NullPointerException();
    }

Fixed version:
    int index = checkIndex();
    if (index <= 0) {
        throw new IndexOutOfBoundsException();
    }

Figure 14.7: Updating implementation of a throw statement to fix a bug

Buggy version:
    int x = 10, y = 15;
    int sum = Calculator.getSum(x, y);

Fixed version:
    int x = 10, y = 15;
    int sum = MathCalculator.getSum(x, y);

Figure 14.8: Changing class/target of a method call to fix a bug
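To illustrate how such patterns can be derived automatically, the following Python sketch shows one way the edit actions reported by an AST differencing tool could be mapped to the pattern codes defined above. The action labels, node-type names, and the classify helper are hypothetical simplifications introduced for illustration only; they are not the actual output format of GumtreeSpoon.

    # Hypothetical, simplified representation: each edit action is a pair of an
    # action kind ("insert", "delete", "update") and an AST node type.

    # Node types that yield "addition or deletion of node N1" (N1-AD) patterns.
    N1_NODES = {
        "constructor": "CT-AD", "throw": "TW-AD", "loop": "LP-AD",
        "enum": "EN-AD", "return": "RT-AD",
        "local_variable": "LV-AD", "assignment": "AS-AD",
    }

    # Node types that yield "update implementation of node N2" (N2-IMPL) patterns.
    N2_NODES = {
        "return": "RT-IMPL", "local_variable": "LV-IMPL",
        "throw": "TW-IMPL",  # code assumed by analogy; RT-IMPL and LV-IMPL appear in Table 14.3
    }

    def classify(action, node_type, parent_type=None):
        """Map a single AST edit action to a bug-fixing edit pattern code."""
        if action in ("insert", "delete") and node_type in N1_NODES:
            return N1_NODES[node_type]
        # An update on a child node is treated as changing the implementation
        # of its enclosing (parent) node.
        if action == "update" and parent_type in N2_NODES:
            return N2_NODES[parent_type]
        return None

    # The fix in Figure 14.5 inserts a constructor into a class.
    print(classify("insert", "constructor"))                             # CT-AD
    # The fix in Figure 14.7 updates the exception inside a throw statement.
    print(classify("update", "constructor_call", parent_type="throw"))   # TW-IMPL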


14.3.3.3 Comparative frequencies of the new patterns

As shown in Table 14.3, the LV category accounts for the highest proportion of bug-fixing changes (29.38%), followed by the MC (28.70%), RT (16.01%), and AS (12.92%) categories. Again, these four categories together account for almost 90% of the newly identified bug-fixing changes.

The pattern LV-IMPL experiences the highest proportion of bug-fixing changes (15.41%), followed by the pattern LV-AD (13.97%). The patterns AS-AD and RT-IMPL experience almost equal amounts of bug-fixing changes (≈13%). The next three patterns, MC-TC, MC-A, and MC-D, all from the MC category, account for 10.98%, 10.50%, and 7.23% of the bug-fixing changes, respectively. Those seven patterns together contribute almost 84% of the bug-fixing changes. The patterns EN-AD, CA-AD, and TW-AD have the three lowest proportions of bug-fixing changes (each below 1.00%). The proportions of the remaining patterns range from 1.50% to 3.20%.

[Figure 14.10 (flowchart): AST* changes between versions → detect nesting patterns (MG-FSM) → identify clusters of nesting patterns → analysis of clusters of nesting patterns by experts → meaningful clusters of nesting patterns]

*This artifact is generated in the procedural steps described in Figure 1

Figure 14.10: Steps to identify nesting patterns that host bug-fixing edits

14.4 Dominant Nesting Patterns

In Figure 14.10, we depict the steps required to detect nesting patterns by capturing the NCSes that frequently host the bug-fixing edits. The steps are briefly described in the following subsections.

14.4.1 Sequential Pattern Mining of Nested Code Structures

In Section 14.2.3, we see that an NCS, or parents’ tree structure, hosting a bug-fixing edit can be presented as a sequence of parent nodes. Thus, to identify nesting patterns (i.e., dominant NCSes), we use a sequential pattern mining technique. Sequential pattern mining identifies the set of subsequences, or patterns, that occur in at least a given percentage, or minimum support, of the input sequences. Any pattern whose support is greater than or equal to the minimum support is said to be a dominant pattern. Here, using a sequential pattern mining algorithm, we identify the nesting patterns that are dominant.

However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining would generate an exhaustive set of patterns, which would be highly expensive in terms of time and space. To reduce the number of smaller sub-patterns found by the sequential pattern mining algorithm, we require that the mining algorithm produce closed or maximal patterns [330, 331], where sub-patterns that are contained within longer patterns are ignored. As we aim to identify nesting patterns of bug-fixing changes, maximal pattern mining is preferable in our case. While a few commonly used sequential pattern mining algorithms are available [332], we use the recently proposed MG-FSM algorithm [333], which meets our requirement of being able to specify constraints such as the pattern type (e.g., closed or maximal) and the gap allowed between two successive nodes. Moreover, the algorithm can run in parallel using map-reduce (Hadoop) functionality. We run the tool allowing no gap between two successive nodes to determine the maximal patterns that have at least 1,000 occurrences. The tool delivers a total of 534 dominant sequences. We exclude those patterns that do not contain at least one block node, to ensure that each retained pattern captures the containment of a node inside another node. Finally, we have 385 nesting patterns that we use for clustering as follows.
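As a minimal illustration of what this mining amounts to (a brute-force stand-in, not MG-FSM itself, which scales through map-reduce), the following Python sketch keeps a frequent contiguous node sequence only if no longer frequent sequence contains it. The toy NCS sequences and the support threshold of 2 are made up for illustration; the actual mining used a minimum support of 1,000 occurrences.

    from collections import Counter

    def contiguous_subsequences(seq):
        """All non-empty contiguous subsequences (i.e., a gap constraint of zero)."""
        return [tuple(seq[i:j]) for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)]

    def maximal_frequent(sequences, min_support):
        """Frequent subsequences not contained in any longer frequent subsequence."""
        counts = Counter(s for seq in sequences
                         for s in set(contiguous_subsequences(seq)))
        frequent = {s for s, c in counts.items() if c >= min_support}

        def contained(small, big):
            return any(big[i:i + len(small)] == small
                       for i in range(len(big) - len(small) + 1))

        return [s for s in frequent
                if not any(s != t and contained(s, t) for t in frequent)]

    # Toy parent-node sequences (NCSes) hosting bug-fixing edits.
    ncs = [
        ("if_block", "loop_block", "method_declaration"),
        ("if_block", "loop_block", "method_declaration"),
        ("return", "if_block", "method_declaration"),
    ]
    print(maximal_frequent(ncs, min_support=2))
    # Only ('if_block', 'loop_block', 'method_declaration') survives; its
    # frequent sub-patterns are discarded as non-maximal.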

14.4.2 Clustering of Nesting Patterns

At this step, we cluster similar types of nesting patterns in groups. Such clustering provides a convenient way to examine the identified nesting patterns where developers commonly perform bug-fixing changes.

14.4.2.1 Selection of clustering algorithm

To cluster the nesting patterns, we use the k-medoids algorithm [334], a variant of the k-means algorithm [335]. While both k-means and k-medoids break a dataset up into groups, the latter uses existing points in the dataset as cluster centers (medoids). Moreover, k-medoids is known to be more robust against noise and outliers than k-means [334]. In addition, k-means cannot be applied directly to our dataset because the numerical operations it requires, such as addition and division, cannot be performed on patterns that consist of strings [333].
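A minimal sketch of the distinction: when only pairwise distances between patterns are available, a cluster's representative must be one of its own members (the medoid), namely the member minimizing the summed distance to the others. The pattern names and distance values below are hypothetical.

    def medoid(members, dist):
        """Return the cluster member with the smallest total distance to the rest."""
        return min(members, key=lambda m: sum(dist[m][n] for n in members))

    # Hypothetical pairwise distances between three mined nesting patterns.
    dist = {
        "if>loop":   {"if>loop": 0.0, "if>if": 0.4, "return>if": 0.7},
        "if>if":     {"if>loop": 0.4, "if>if": 0.0, "return>if": 0.5},
        "return>if": {"if>loop": 0.7, "if>if": 0.5, "return>if": 0.0},
    }
    print(medoid(list(dist), dist))  # "if>if" has the smallest summed distance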

14.4.2.2 Determining optimal number of clusters

To determine the optimal number of clusters (i.e., k in k-medoids), we use the gap statistic method [336] over other available options such as the elbow and silhouette methods. We choose this method because it can be applied to any clustering method (e.g., k-medoids, k-means, and hierarchical clustering). Using the gap statistic, we find that the optimal number of clusters for our data is 10.
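The following sketch outlines the idea of the gap statistic on toy numeric data, using k-means only for brevity (the method itself is clustering-agnostic, which is precisely why it is chosen above): the observed within-cluster dispersion is compared against that of reference data drawn uniformly over the data's bounding box. This is an illustrative simplification; the full gap-statistic criterion additionally uses the standard error of the reference dispersions when selecting k.

    import numpy as np
    from sklearn.cluster import KMeans

    def within_dispersion(X, labels):
        """W_k: summed squared distances of points to their cluster means."""
        return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                   for c in np.unique(labels))

    def gap(X, k, n_refs=10, seed=0):
        """Gap(k) = mean reference log(W_k) minus observed log(W_k)."""
        rng = np.random.default_rng(seed)
        obs = np.log(within_dispersion(
            X, KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)))
        refs = []
        for _ in range(n_refs):
            # Reference data drawn uniformly over the bounding box of X.
            R = rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)
            refs.append(np.log(within_dispersion(
                R, KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(R))))
        return np.mean(refs) - obs

    # Toy 2-D data with two well-separated groups.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
    gaps = {k: gap(X, k) for k in range(1, 6)}
    print(max(gaps, key=gaps.get))  # expected to peak at k = 2 for this toy data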

14.4.2.3 Defining a distance function for k-medoids

We use the Longest Common Subsequence (LCS) based string metric to measure the distance between a pair of mined nesting patterns. We define the distance function for any two mined nesting patterns S1 and S2 as follows.

D_LCS(S1, S2) = 1 − |LCS(S1, S2)| / max(|S1|, |S2|)

Here, S1 and S2 are two finite sequences of nodes, |LCS(S1, S2)| is the length of the longest common subsequence(s) of S1 and S2, and max(|S1|, |S2|) is the length of the longer of the two sequences. We measure the LCS metric using the Python package python-string-similarity [337], which implements the said metric.
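A self-contained sketch of this distance function is given below (the study itself relies on the python-string-similarity implementation); the two example nesting patterns are hypothetical.

    def lcs_length(s1, s2):
        """Length of the longest common subsequence of two node sequences."""
        dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
        for i, a in enumerate(s1, 1):
            for j, b in enumerate(s2, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j],
                                                                   dp[i][j - 1])
        return dp[-1][-1]

    def d_lcs(s1, s2):
        """D_LCS(S1, S2) = 1 - |LCS(S1, S2)| / max(|S1|, |S2|)."""
        return 1.0 - lcs_length(s1, s2) / max(len(s1), len(s2))

    p1 = ["return", "if_block", "method_declaration"]
    p2 = ["return", "loop_block", "if_block", "method_declaration"]
    print(d_lcs(p1, p2))  # 1 - 3/4 = 0.25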

Then, to cluster the nesting patterns, we run the open-source implementation of the k-medoids algorithm provided in the Python clustering package Pycluster 1.49 [338]. At this point, we have 10 clusters of nesting patterns. As the mechanism used to generate the clusters is based on the names of the AST nodes (i.e., text-based clustering), the clusters need to be interpreted/characterized by human experts to gain meaningful insights into the structures of the nesting patterns in the clusters.

14.4.3 Characterization of the Clusters by Experts

The first two authors are presented with a listing of all the patterns in each cluster and asked to characterize those patterns in terms of their nodes’ hierarchies. By observing the nodes’ hierarchies of the patterns, they create two sets of nodes: (i) a set of low-level nodes l, where l = {return, expression, throw, variable declaration, assignment}, and (ii) another set of high-level nodes ℎ defined in Section 14.3.3.2. Each author aims to identify whether a low-level node’s block from l is contained in a high-level node block from ℎ. Such an identification is represented as a pattern/cluster l block→ℎ block. For example, if a block of return is located inside an if block, then the authors label that hierarchy as return block→if block. If a high-level node block ℎ1 is contained in another high-level node block ℎ2, then that pattern is categorized as ‘Compound’ and represented as ℎ1 block→ℎ2 block. In a similar fashion, deeper levels of containment/hierarchy can also be represented (see the third column of the last row in Table 14.4).

Cohen’s kappa coefficient κ [339] is used to measure the agreement between the two authors in characterizing the patterns. A κ value of 0.79 indicates a high level of agreement on the characterization of the patterns in the clusters. For each disagreement, the authors discuss and, if necessary, verify the raw data to come to an agreement. Such discussions result in an unconventional pattern that has chained method invocations (e.g., m1().m2().m3()) in a single block (see the fourth category in Table 14.4). Finally, a total of 37 meaningful clusters are identified in six categories: (i) IF-related (IF), (ii) Try-Catch (TY-CA), (iii) Loop (LP), (iv) Chained Method Invocation (CMI), (v) Synchronize (SYN), and (vi) Compound (COM), as presented in Table 14.4. As per the definition of the COM category, the SYN category technically falls in COM, although the authors decide to create a separate category for it. As all the patterns end with Method_declaration→Type_declaration→Compilation_unit, we truncate that suffix for better presentation.
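As an illustration, the inter-rater agreement can be computed as in the following sketch, where the cluster labels assigned by the two raters are hypothetical and scikit-learn's cohen_kappa_score performs the computation.

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical characterizations assigned independently by the two raters.
    rater_1 = ["return>if", "if>loop", "compound", "chained", "if>loop", "try>if"]
    rater_2 = ["return>if", "if>loop", "compound", "if>loop",  "if>loop", "try>if"]

    kappa = cohen_kappa_score(rater_1, rater_2)
    print(round(kappa, 2))  # about 0.78 here; the study reports a kappa of 0.79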

14.4.4 Mining Results

Nine types of nesting patterns (clusters) belong to the IF category, which represents the largest share (40.72%) of the total number of patterns, followed by the COM category, whose 15 types of patterns contribute 31.82% of the total. The TY-CA and LP categories contribute 9.89% and 8.02% of the total patterns, respectively, followed by the CMI category, which accounts for 5.93%. The share of patterns belonging to the SYN category is the lowest (3.60%).

Table 14.4: Dominant Nesting Patterns That Host Bug-Fixing Edits

Category (Cat)            Pat ID  Mined Nesting Pattern (Pat)                      # of Pat  % of Pat  Total per Cat  % per Cat
IF-related (IF)             01    if block→if block                                  27,306    10.93%       101,691     40.72%
                            02    Method invocation block→if block                   19,430     7.78%
                            03    expression block→if block                          16,838     6.74%
                            04    assignment block→if block                          10,250     4.10%
                            05    variable declaration block→if block                 9,996     4.00%
                            06    throw block→if block                                9,274     3.71%
                            07    return block→if block                               4,923     1.97%
                            08    Method invocation block as expression→if block      3,771     1.51%
                            09    Nested If (with/without else)                       2,634     1.05%
Try-Catch (TY-CA)           10    block→try block                                    12,321     4.93%        24,709      9.89%
                            11    variable declaration block→catch block              5,858     2.34%
                            12    Method invocation block→try block                   3,197     1.28%
                            13    throw block→try-catch block                         1,248     0.49%
                            14    expression block→try block                          1,048     0.41%
                            15    try block→try block                                 1,037     0.41%
Loop (LP)                   16    variable declaration block→loop block              10,202     4.08%        20,029      8.02%
                            17    Method invocation block→loop block                  6,001     2.40%
                            18    expression block→loop block                         2,316     0.92%
                            19    assignment block→loop block                         1,510     0.60%
Chained Method
Invocations (CMI)           20    Chained method invocations                         14,828     5.93%        14,828      5.93%
Synchronize (SYN)           21    if block→synchronized block                         4,888     1.95%         8,986      3.60%
                            22    loop block→synchronized block                       4,098     1.64%
Compound (COM)              23    if block→loop block                                27,583    11.04%        79,484     31.82%
                            24    if block→try block                                 13,328     5.33%
                            25    loop block→if block                                 6,457     2.58%
                            26    loop block→try block                                4,818     1.92%
                            27    try block→if block                                  3,725     1.49%
                            28    expression block→loop block→if block                3,687     1.47%
                            29    loop block→loop block                               3,386     1.35%
                            30    try block→loop block                                2,205     0.88%
                            31    if block→loop block→if block                        2,141     0.85%
                            32    switch block→if block                               1,413     0.56%
                            33    if block→loop block→try block                       1,189     0.47%
                            34    if block→switch block                               1,161     0.46%
                            35    switch block→loop block                             1,145     0.45%
                            36    variable declaration block→if block→loop block      1,118     0.44%
                            37    expression block→loop block→try block               1,026     0.41%
Overall                                                                             249,727                 249,727       100%

By inspecting the individual clusters, we find some interesting patterns that cannot be identified without considering the hierarchies of NCSes. The 23rd cluster (i.e., if block→loop block) is the most bug-prone pattern, as it experiences the highest number (27,583) of bug-fix changes, followed by the first cluster (if block→if block), whose count is slightly lower. Noticeably, the number of bug-fix changes in a pattern l block→if block is always higher than in a pattern l block→ℎ′ block, where ℎ′ is any high-level node in ℎ other than if. For example, the number of occurrences of the pattern expression block→if block is higher than the number of occurrences of the pattern expression block→loop block. Recall that l represents the set of low-level nodes. It is very interesting that throw blocks inside if blocks are more bug-prone than throw blocks inside try-catch blocks.

Although in Table 14.2 we see that the number of changes in the Try (TY) category is very low, the opposite result is observed in Table 14.4, where the Try-related category (TY-CA) experiences the second highest number of bug-fix changes among the categories of simple high-level nodes. This means that pieces of code inside try-catch blocks frequently experience bug-fix changes. A similar observation also applies to the SYN category.

Surprisingly, only five of the 37 clusters involve three levels of containment (see clusters 28, 31, 33, 36, and 37), and together they account for only 3.64% of all patterns. The pattern expression block→loop block→if block accounts for almost 50% of those three-level patterns. No pattern with more than three levels of containment is found.

14.5 Threats to Validity

Construct Validity. For the first three projects, we use the bug-fixing commits identified by Ray et al. [232]. To collect those buggy commits, they used a technique similar to the approach of Mockus and Votta [340]. A similar approach is also used for detecting the bug-fixing commits of the last two projects. There is a possibility that some portions of the bug-fixing commits are actually general commits (e.g., new features and improvements); thus, our data may contain some false positives. Having said that, Ray et al. found their approach to collecting bug-fixing commits to be 96% accurate, which substantially minimizes this threat.

To detect bug-fixing edit patterns, we have considered only the nodes found before the occurrence of the first block node (if available) in an NCS. One may be skeptical about the capability of such an approach to detect bug-fixing edit patterns. However, our approach proved successful in detecting not only existing bug-fixing edit patterns but also new bug-fixing edit patterns in the code bases. To detect nesting patterns, we have identified maximal patterns instead of closed patterns of nested code structures. Closed sequential pattern mining algorithms remove all patterns that exist within other identified patterns and occur at the same support level, while maximal pattern mining removes sub-patterns regardless of the support level. For our problem, maximal pattern mining is preferable, as we have aimed to identify deeper nested code structures instead of sub-structures (i.e., sub-patterns).

To detect maximal patterns, we have allowed no gap between two nodes and considered as patterns only those sequences with at least 1,000 occurrences. To detect exact nested code structures, setting “no gap between two nodes” is obviously the best choice. Although the threshold of 1,000 can be criticized, we found that this setting retained over 70% of the bug-fixing transactions while minimizing the human effort needed to detect meaningful clusters. In detecting patterns, we have excluded any mov actions on nodes found in the ASTs of bug-fixing changes. Typically, an insertion or a deletion of a node causes other nodes in an AST to move, while the moved nodes themselves remain unchanged. Thus, such unchanged nodes (i.e., mov actions) can be ignored, although this remains a threat to detecting patterns related to mov actions.

Internal Validity. The correctness of our analysis depends on both the GumTree and GumtreeSpoon tools, which are used to answer RQ2 and RQ1, respectively. The former tool outperforms the state-of-the-art tool ChangeDistiller by maximizing the number of AST node mappings, minimizing the edit script size, and detecting better move actions [319]. Moreover, ChangeDistiller works at the statement level, preventing the detection of certain fine-grain patterns. Similar to GumtreeSpoon, another tool, ClDiff [320], is also available. Between ClDiff and GumtreeSpoon, we selected the latter for addressing RQ1 because, in a sample test run, we found that ClDiff failed to distinguish changes to method parameters in its concise-level outputs. For this work on bug-fixing edit patterns, that capability is very important for capturing the subtle changes made to code for bug fixing [318]. The library JGit [307] is used in our work to extract a buggy revision and its previous revision. The library has been applied for a similar purpose in other studies [311, 312], which gave us confidence in using it.

External Validity. Although our study includes a large number of revisions of five subject systems, all the systems are open-source and written in Java. Thus, the findings from this work may not be generalizable to industrial systems or to source code written in languages other than Java.

Reliability. The methodology of this study, including the procedures for data collection and analysis, is documented in this chapter. The subject systems, being open-source, are freely accessible, while the tools GumTree, GumtreeSpoon, MG-FSM, and the library JGit are also available online. Therefore, it should be possible to replicate the study.

14.6 Related Work

14.6.1 Identifying bug-fixing edit patterns

Pan et al. [318] manually identified a set of 27 bug-fixing edit patterns (i.e., PanPatterns) by exploiting textual differences between buggy and non-buggy programs. A similar approach was also applied by Yue et al. [328] to identify 11 bug-fixing edit patterns; however, they used a clustering technique

to minimize manual effort. Both studies are subject to a few limitations: (i) identifying all possible bug-fixing edit patterns through manual effort is a daunting task, which raises the possibility of failing to discover all types of bug-fixing edit patterns, and (ii) they used textual differences between buggy and non-buggy programs to identify bug-fixing edit patterns, an approach reported to be of limited use for detecting such patterns [319]. Despite these limitations, the study of Pan et al. is the most influential work (in terms of citation count) in this area, and until now it had identified the highest number of bug-fixing edit patterns in code bases. That is why we started our work by targeting this study to identify bug-fixing edit patterns (while, at the same time, keeping an eye out for any new or unseen patterns). Instead of textual differences, we have used the state-of-the-art AST-based code differencing tool GumtreeSpoon and developed a fully automated approach to detect bug-fixing edit patterns. Moreover, we have detected 37 bug-fixing edit patterns, which is the highest among all the studies [324, 326, 325, 318, 327] carried out to identify bug-fixing edit patterns to date. In addition, we have identified four new patterns in the Constructor, Catch, and Enum categories (see Table 14.3).

Kim et al. [324] also employed manual effort to identify a set of 10 dominant bug-fixing edit patterns (known as PAR templates). However, to collect bug-fixing patches they used the Kenyon framework [341] and clustered those patches using groums [342] to minimize human effort. Again, Sobreira et al. [327] manually analyzed only 395 patches collected from the Defects4J project [343] and identified 25 bug-fixing edit patterns. Although the latter study identified the second highest number of bug-fixing edit patterns after PanPatterns, and a few new patterns too, the work was conducted on a very small dataset. Thus, additional studies are required to verify their newly identified bug-fixing edit patterns related to return, throw, and wrap/unwrap-code using larger datasets, and our work can be considered such a study, as it verifies that those new patterns are dominant in bug-fixing changes. Moreover, they identified 33 instances of a pattern where throw blocks reside in if blocks. The sixth cluster in Table 14.4 confirms this finding of their study. In addition, in a very recent study, Tufano et al. [344] manually identified new patterns with only five instances related to additions and deletions of synchronized blocks. The question is: can we consider those as patterns with only five instances? Our study answers this question affirmatively, as evidenced by the 21st and 22nd clusters in the SYN category presented in Table 14.4.

To overcome the problems of manual approaches, automatic techniques have been developed to identify bug-fixing edit patterns. Martinez and Monperrus [326] identified 20 bug-fixing edit patterns using the AST-based code differencing tool ChangeDistiller [322]. In our study, we have used the tool GumtreeSpoon [345], developed based on GumTree [319], which is more accurate than ChangeDistiller. Soto et al. [346] conducted a study of bug-fixing commits in Java projects. Campos et al. [98] characterized the prevalence of the five most common bug-fixing edit patterns related to the IF category. However, the studies of Soto et al. and Campos et al. limited their search for bug-fixing edit patterns to PAR templates and to the patterns identified by Pan et al., respectively.
In our case, we have remained open to identifying any bug-fixing edit patterns that exist in the code bases. Osman et al. [312] applied a semi-automated approach to identify five bug-fixing edit patterns. In contrast to their approach, we have used a fully automated technique to identify 37 bug-fixing edit patterns. Hanam et al. [347] developed the BugAID technique for discovering bug-fixing edit patterns in JavaScript. Sudhakrishnan [348] identified 25 bug-fixing edit patterns in Verilog (a hardware description language). Long and Rinard [349] learned a probabilistic model of correct patches from bug-fixing changes in C code. In contrast to their studies, we have studied five Java projects. While all the previously mentioned studies identified bug-fixing edit patterns, none of them analyzed nested code structures to detect nesting patterns, which is a novel contribution of our study. A few other studies [10, 350, 351] identified fixing patterns for violations of good coding principles reported by static analysis tools such as PMD [12], FindBugs [352], and Splint [14]. In our study, however, we have studied real bug-fixing changes instead of coding-principle violations.

14.6.2 Identifying code-change patterns

Fluri et al. [353] identified code-change patterns in three Java applications using ChangeDistiller. Again, Martinez et al. [325] used ChangeDistiller to identify 18 code-change patterns. Molderez et al. [354] developed an automated system that mines code-change histories to detect unknown repetitive code changes. Kim et al. [355, 356] discovered logical and structural changes at or above the method level. Breu and Zimmermann [357] extracted method call change patterns. Negara et al. [358] developed the tool CodingTracker and discovered 10 previously unknown code-change patterns. Kim et al. [316] developed the tool LSdiff, which can group code changes into systematic change patterns. While all these studies analyzed code-change patterns, we have studied bug-fixing changes by mining software repositories.

14.7 Summary

In this chapter, we have reported 38 bug-fixing edit patterns organized in 14 categories. This is the highest number of bug-fixing edit patterns identified in a single study. Using sequential pattern mining and clustering techniques, we have also exposed 37 new bug-fixing nesting patterns, which capture the locations of the bug-fixing edits within the nested code structures surrounding them. This new set of nesting patterns is a novel contribution that adds a new dimension to our understanding of bug-fixing patterns. The findings are derived from an in-depth quantitative and qualitative analysis of 4,653 buggy revisions of five software systems written in Java.

Four of the identified bug-fixing edit patterns are completely new. They are related to the constructor, catch, and enum code constructs. The remaining 34 identified bug-fixing edit patterns confirm those reported in earlier studies. Among the 14 identified categories of bug-fixing edit patterns, Method Call (MC), Method Declaration (MD), Local Variable (LV), If-related (IF), Assignment (AS), and Return (RT) are the six most dominant categories that frequently host bug-fixing edits. The four least dominant categories are Try (TY), Switch (SW), Catch (CA), and Enum (EN). These findings are in accordance with the observations reported in other studies [318, 327, 326].

The new set of 37 nesting patterns is divided into six categories, listed in descending order of frequency as follows: If-related (IF), Compound (COM), Loop (LP), Try-catch (TY-CA), Synchronize (SYN), and Chained Method Invocation (CMI). Our analysis of the nesting patterns reveals additional insights into bug-fix patterns. We have found that nodes/blocks associated with if blocks are the most bug-prone. The nesting pattern “if block inside loop block” experiences the highest number of bug-fixing edits, followed by the “if block inside another if block” nesting pattern. Moreover, for the first time, we have discovered that nesting patterns in the CMI category experience a significant number of bug-fixing edits. Our analysis of the nesting patterns also indicates that nodes/blocks inside try-catch and synchronized blocks are bug-prone.

The findings from this work deepen our understanding of bug-fix patterns. Both the bug-fixing edit patterns and the nesting patterns can also be useful in devising techniques for automated program repair. For example, existing patch generation algorithms can incorporate the patterns of bug-fix edits and their locations in nested code structures to maximize the probability of success. Our future work will explore these possibilities.

Chapter 15

Conclusion

Nowadays, software systems are integral parts of most of the technologies that people use. With the increasing use of software systems, many instances of software failure have caused tremendous losses in lives and dollars. Software systems mainly fail due to software bugs (i.e., faults and security issues). The fight against software bugs has existed for as long as software itself. Although many tools and techniques have been devised and applied to identify, prevent, and minimize software bugs, numerous bugs remain undetected and are shipped with software systems. Thus, further studies are required to gain more insights into the causes of software bugs, devise new techniques, and strengthen existing ones. In this thesis, we take a holistic approach to mining and analyzing human affects expressed in natural language texts (e.g., commit comments) and bug patterns (e.g., security vulnerabilities, code smells, and bug-fixing patterns) identified in source code, to gain actionable insights that can be leveraged in minimizing bugs and improving the quality and security of software systems.

The remainder of the chapter is organized as follows. In Section 15.1, we present a brief summary of the entire thesis. Section 15.2 points to the major contributions of this thesis. Section 15.3 discusses the limitations of this thesis. Finally, Section 15.4 concludes the chapter with future research directions.

15.1 Summary

In this thesis, we have mined and analyzed human affects (i.e., sentiments and emotions) and bug patterns (i.e., software bugs, code smells, and security vulnerabilities) from natural language textual artifacts and source code, respectively, of software systems to gain actionable insights that can help in minimizing software bugs and improving software quality. To mine developers’ sentiments and emotions, we have developed three improved tools: SentiStrength-SE, DEVA, and MarValous. These tools are the first of their types capable of detecting sentiments and emotions at different levels in software engineering texts. While SentiStrength-SE is capable of detecting the positivity and negativity of sentiments, DEVA and MarValous can detect four distinct emotional states: excitement, stress, depression, and relaxation. MarValous, developed using machine learning and natural language processing techniques, has achieved higher accuracy than DEVA, which is built using domain dictionaries and sets of heuristics. To evaluate DEVA, we have produced a benchmark dataset consisting of 1,795 JIRA issue comments, manually annotated with the four emotional states identified in those comments.

We have also compared the performance of SentiStrength-SE against its domain-specific counterparts and found that SentiStrength-SE shows consistently higher recall values compared to the other tools [188, 359].

Using a state-of-the-art tool, we have studied developers’ sentimental variations with respect to the underlying development tasks (e.g., bug fixing, new feature implementation), development periods (i.e., days and times), and team and project sizes. We expose opportunities to exploit developers’ sentiments for higher productivity and improved software quality. In a separate study, the developers’ emotional variations in buggy and non-buggy commits are captured to derive insights into the role of the sentimental variations of commits when they introduce new defects in source code. We have also conducted a survey to identify more roles and applications of sentiments and emotions in various software engineering activities.

While developers’ sentiments can be leveraged in minimizing software bugs, their malpractices can lead to software bugs and degrade code quality. We have also mined and analyzed one of the dominant malpractices, i.e., the copy-paste practice of developers, to identify its detrimental impacts on software development. Recall that such copy-paste actions result in code clones. To gain insights into the detrimental implications of code cloning, we have used static source code analysis techniques to mine code smells and security vulnerabilities in cloned and non-cloned code. Our studies have revealed that the security vulnerabilities found in code clones have a higher severity of security risks compared to those in non-cloned code. However, the proportions (i.e., densities) of vulnerabilities in cloned and non-cloned code do not differ significantly. We have also found no significant difference between the proportions of smelly code in code clones and in clone-free source code. Surprisingly, among the three categories of clones studied in our work (i.e., Type-1, pure Type-2, and pure Type-3), Type-1 clones are found to be the most smelly whereas pure Type-3 clones are the least. These findings inform developers about the characteristics, impacts, and implications of code cloning and about problematic clones, which demand extra care. Such information also helps developers minimize the spread of smelly code and security vulnerabilities, which eventually can reduce software bugs and improve software quality.

We have also conducted an in-depth quantitative and qualitative analysis of buggy revisions to understand the common patterns of bug-fixing changes. In the study, we have identified not only existing but also new patterns of bug-fixing changes. We have also identified nested bug-fix patterns, which is a novel contribution of this study. The findings of the study can be leveraged to help developers avoid the common mistakes that cause bugs and to develop automatic program repair tools.

15.2 Contributions

While we have provided the details of our research in the earlier chapters, and presented an abridged summary of the entire thesis in Section 15.1, here we outline the major contributions of this thesis as follows:

Detecting developers’ sentiments and emotions: We have first presented an in-depth qualitative study to identify the difficulties in automated sentiment analysis of software engineering texts. We develop a number of rules and a domain-specific dictionary to address some of the identified difficulties. Our new domain dictionary and the rules are integrated in SentiStrength-SE, a tool we have developed for improved sentiment analysis of textual content in a technical domain, especially in software engineering. SentiStrength-SE is the first domain-specific sentiment analysis tool especially designed for software engineering text.

For the first time, we have conducted a comparative study of the performances of three software engineering domain-specific sentiment analysis tools (i.e., SentiStrength-SE, Senti4SD, and SentiCR) on three technical datasets. Our study has not found any clear-cut winner among the tools for sentiment analysis.

We have developed DEVA, a tool for automated emotion detection in software engineering texts. DEVA is distinct from existing tools as it is capable of detecting both valence and arousal in text and mapping them to capture individual emotional states (e.g., excitement, stress, depression, relaxation, and neutrality). DEVA applies a lexical approach with an arousal dictionary and a valence dictionary, both crafted for software engineering text. In addition, DEVA includes a set of heuristics that help the tool maintain high accuracy. To evaluate DEVA, we have also constructed a ground-truth dataset consisting of 1,795 JIRA issue comments.

To create an improved version of DEVA, we have developed MarValous, the first Machine Learning (ML) based tool for improved detection of the four emotional states excitement, stress, depression, and relaxation expressed in software engineering texts. We have evaluated nine supervised ML algorithms incorporated in MarValous to measure their applicability in detecting the aforementioned four emotional states. We have also created a unified dataset by combining two different benchmark datasets; it can be reused for training and testing similar tools, as we have done for our ML-based tool.

Understanding developers’ sentiments: As software development is immensely affected by emotions, we have conducted a quantitative empirical study to understand developers’ sentimental variations across different types of development activities (e.g., bug-fixing tasks) and development periods (i.e., days and times), and the impacts of sentiments on software artifacts (i.e., commit messages).

We have found that developers express significantly more positive than negative sentiment in commits for bug-fixing and refactoring tasks. Surprisingly, the opposite scenario is found for new feature implementation tasks. We have also found a significant positive correlation between the lengths of commit messages and the sentiment expressed in them. We have also distinguished a group of developers who express more positive sentiment in bug-fixing commit messages, whereas another group shows more negative sentiment. Such findings can be utilized for effective task assignment and for building cohesive teams.

In a separate study, we have quantitatively studied the sentimental variations in bug-introducing and bug-fixing commit messages. We have found that both bug-introducing and bug-fixing commit messages have, overall, statistically significantly higher positive sentimental scores than negative sentimental scores. We have observed similar findings while analyzing the sentimental scores in bug-introducing and bug-fixing commit messages with respect to three working periods (i.e., before, during, and after working hours). An exception is found for bug-fixing commit messages posted after work hours, where no significant difference is observed between positive and negative sentimental scores. Such variations of sentiments in commit messages have the potential to be used in a classifier that predicts bugs as developers commit code.

Understanding impacts of developers’ copy-paste actions: We have conducted a comparative study of different types of clones and non-cloned code on the basis of their code smells, which may lead to software defects, vulnerabilities, and other issues in the future. We have found that Type-1 clones contain the largest amount of smelly code, whereas pure Type-3 clones contain the least. Moreover, we have identified five types of code smells that appear more frequently in cloned code than in non-cloned code. We have also conducted a similar comparative study of different types of clones and non-cloned code on the basis of their security vulnerabilities. The latter study has revealed that the security vulnerabilities found in code clones have a higher severity of security risks compared to those in non-cloned code. However, the proportions (i.e., densities) of vulnerabilities in cloned and non-cloned code do not differ significantly. The findings from these two studies add to our understanding of the characteristics and impacts of clones, which will be useful in clone-aware software development with improved software security.

Understanding fixing patterns of software bugs: We have conducted an in-depth quantitative and qualitative analysis over 4,653 buggy revisions of five software systems. Our study has identified 38 bug-fixing edit patterns and exposed 37 new patterns of nested code structures, which frequently host the bug-fixing edits. While some of the edit patterns were reported in earlier studies, the nesting patterns are new and were never targeted before. The findings of the study advance our understanding of the patterns and nested locations of software bug fixes and, hence, of bugs. An understanding of the common patterns of bug-fixing changes is useful in several ways: (a) such knowledge can help developers in

proactively avoiding coding patterns that lead to bugs, and (b) bug-fixing patterns can be exploited in devising techniques for automatic program repair.

15.3 Limitations

We have tested our developed tools (i.e., SentiStrength-SE, DEVA, and MarValous) using a limited number of benchmark datasets, which is an obvious limitation of this thesis. The tools could be more generalizable if we could use a sufficient number of datasets to test their external validity. However, this was not possible, since the number of available benchmark datasets was limited during the development of the tools [52, 110]. The detection of irony, sarcasm, and subtle sentiments and emotions expressed in text is still a challenging task for the developed tools. Uncommon scenarios, such as the use of attenuators (e.g., rather, lack) and neutralizers (e.g., however, although) [360] in sentences, may hamper the performance of the tools.

In general, the empirical studies presented in this dissertation might have suffered from the limitations of the tools and algorithms used for data collection and analysis. Although in the studies we included a large number of subject systems collected from diverse application domains and written in different programming languages, those subject systems were mostly open-source. This can be regarded as a limitation of this thesis. One may question to what extent the findings and clone management implications derived from those studies can be expected to hold for industrial software systems.

15.4 Future Research Directions

1. Tool Development for Capturing Human Affects:

(a) Scope for tool improvement: As discussed in Chapter 4 and mentioned in Section 15.3, to improve the tools’ accuracy, the research community needs to develop approaches for detecting irony, sarcasm, and subtle sentiments and emotions expressed in software engineering texts. We also need better techniques to handle negation, attenuators, and neutralizers in sentences, which play critical roles in detecting human affects. Although we have applied a technique to separate natural language from technical content (e.g., source code, stack-overflow and HTML elements), there is scope for improvement in that area.

Moreover, if a sentence expresses both positive and negative sentiments, the machine learning based tools (e.g., Senti4SD [128], SentiCR [162], and EmoTxt [56]) can detect either the positive or the negative sentiment but not both, which is a major limitation of those tools. In addition, those machine learning based tools cannot detect the intensity (i.e., level) of sentiments or

emotions. Thus, a machine learning based tool is required that can detect both sentiments and their intensities in a sentence.

(b) New tool development: In technical and social forums, developers often discuss software-specific entities (e.g., tools, libraries, and APIs) and provide valuable opinions on various aspects (e.g., bugs, performance, documentation, and security) of those entities. Such opinions are often sentimentally polarized (i.e., positive or negative) and play a pivotal role, to a considerable degree, in shaping other developers’ perceptions of those entities. Such perceptions influence the choices that developers make about whether and how they should use those entities. To automatically mine these perceptions from developers’ discussions in a meaningful way, we need to combine three things: (i) the entity, the object about which the opinion is expressed, (ii) the aspect, a property of the entity, and (iii) the affects (e.g., sentiments and emotions) expressed in the opinions. However, the research community lacks a tool that can combine those three things.

2. Creation of Benchmark Dataset:

(a) A uniform guideline to annotate human affects: Studies [361, 189] found that humans are inconsistent in identifying affects in developers’ comments/discussions. This demonstrates the need for standardized annotation schemes that human annotators can follow to build benchmark datasets, which can then be used to train tools for reliable affect analysis in the software engineering domain.

(b) More benchmark datasets are required: To the best of our knowledge, there are only five benchmark datasets [130, 52, 162, 362], which together consist of only 11,233 labeled comments. Among the labeled comments, around 40% are neutral, expressing no affect. Such scarce labeled data can cover only a very limited range of expressions and thus cannot guarantee that tools trained and tested on those datasets will generalize. To address this problem, we need to create more benchmark datasets that contain a higher number of affective comments. We also need a publicly available large dataset in which comments are annotated to identify the entity, aspect, affect, and the dependencies among them, for training and testing tools for entity- and aspect-level affect analysis.

3. Future Application of Human Affects:

(a) Requirement analysis: As discussed in Chapter 10, researchers [257, 258, 259, 261] mined sentimental opinions of non-technical users to identify new requirements and various types of maintenance requests for different applications. We can also mine and analyze developers’ sentimental discussions and opinions to identify developers’ new requirements and maintenance requests regarding the various APIs, tools, and libraries they use.

(b) Predictive analysis: Garcia et al. [42] analyzed the relation between the emotions and the activity of developers in the Open Source Software project Gentoo, and developed an emotion-based predictor to estimate the risk of developers leaving the project. In the future, the research community can develop a predictor that estimates whether developers will stay in or leave their projects by analyzing the affects in their comments.

(c) Effective communication and collaboration: It would be very interesting to see the impacts of affective communication on code review and pull-request based development. More studies are required to analyze the affects of discussions in issue repositories to see their impacts on issue fixing. Developing a tool that detects the tone/affects of an email before it is sent could contribute to effective communication and collaboration.

(d) Task assignment: In Chapter 8, we have seen that developers have preferences for doing different types of jobs (e.g., bug fixing and new feature development). In the future, knowledge of such preferences can be utilized to assign developers their preferred jobs. Moreover, we will verify whether task assignments based on preferences result in a higher quality end product.

4. Studies Required to Discover More Bug-fixing Patterns: In Chapter 14, we have conducted a study to identify bug-fix patterns in five software systems. The research community needs to conduct larger studies to identify more, as yet unknown, bug-fix patterns in order to reap their full benefits for developing automatic program repair approaches. Moreover, studies should be conducted that focus on specific bug or issue types, such as security bugs, performance issues, and client-side programming (e.g., JavaScript) mistakes.

Bibliography

[1] G. Walia, J. Carver, Using error information to improve software quality, in: Proceedings of the IEEE International Symposium on Software Reliability Engineering, 2013, pp. 107–107. [2] F. Huang, B. Liu, Software defect prevention based on human error theories, Chinese Journal of Aeronautics 30 (3) 1054–1070. [3] D. Graziotin, X. Wang, P. Abrahamsson, Do feelings matter? On the correlation of affects and the self-assessed productivity in software engineering, J. of Softw.: Evolution and Proc. 27 (7) (2015) 467–487. [4] E. Guzman, B. Bruegge, Towards emotional awareness in software development teams, in: Proceedings of the International Symposium on the Foundations of Software Engineering, 2013, pp. 671–674. [5] D. Amalfitano, A. Fasolino, P. Tramontana, B. Ta, A. Memon, Mobiguitar: Automated model- based testing of mobile apps, IEEE Software 32 (5) (2015) 53–59. [6] D. Amalfitano, A. Fasolino, P. Tramontana, S. Carmine, A. Memon, Using gui ripping for automated testing of android applications, in: Proceedings of the International Conference on Automated Software Engineering, 2012, pp. 258–261. [7] V. Garousi, Empirical analysis of a genetic algorithm-based stress test technique, in: Pro- ceedings of the 10th annual conference on Genetic and evolutionary computation, 2008, pp. 1743–1750. [8] Z. Jiang, A. Hassan, A survey on load testing of large-scale software systems, IEEE Transac- tions on Software Engineering 41 (11) (2015) 1091–1117. [9] Flawfinder - C/C++ Source Code Analyzer, http://www.dwheeler.com/flawfinder/, July 2017. [10] K. Liu, D. Kim, T. Bissyande, S. Yoo, Y. Traon, Mining fix patterns for findbugs violations, IEEE Transactions on Software Engineering (2018) 1–24. [11] T. Kamiya, S. Kusumoto, K. Inoue, CCFinder: a multilinguistic token-based code clone de- tection system for large scale source code, IEEE Trans. Softw. Eng. 28 (7) (2002) 654–670. [12] PMD - Source Code Analyzer, http://pmd.sourceforge.net, July 2017. [13] Cppcheck - A tool for static C/C++ code analysis, http://cppcheck.sourceforge.net, July 2017. [14] SPLINT - Secure Programming LINT, http://www.splint.org, July 2017. [15] G. Baah, A. Podgurski, M. Harrold, Causal inference for statistical fault localization, in: Pro- ceedings of the International symposium on Software testing and analysis, 2010, pp. 73–84. [16] E. Alves, M. Gligoric, V. Jagannath, M. d’Amorim, Fault-localization using dynamic slicing and change impact analysis, in: Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, 2011, pp. 520–523.

236 [17] R. Abreu, P. Zoeteweij, R. Golsteijn, A. Gemund, A practical evaluation of spectrum-based fault localization, Journal of Systems and Software 82 (11) (2009) 1780–1792. [18] R. Abreu, A. Gonzalez, P. Zoeteweij, A.Gemund, Automatic software fault localization using generic program invariants, in: Proceedings of the ACM Symposium in Applied Computing, 2008, pp. 712–717. [19] M. Martinez, M. Monperrus, Astor: A program repair library for java, in: Proceedings of the International Symposium on Software Testing and Analysis, 2016, pp. 441–444. [20] Y. Xiong, J. Wang, R. Yan, J. Z. S. Han, G. H. L. Zhang, Precise condition synthesis for pro- gram repair, in: Proceedings of the Proceedings of the International Conference on Software Engineering, 2017, pp. 416–426. [21] J. Jiang, Y. Xiong, H. Zhang, Q. Gao, X. Chen, Shaping program repair space with exist- ing patches and similar code, in: Proceedings of the International Symposium on Software Testing and Analysis, 2018, pp. 298–309. [22] B. Liblit, A. Aiken, A. Zheng, M. Jordan, Bug isolation via remote program sampling, in: Proceedings of the SIGPLAN Conference on Programming Language Design and Implemen- tation, 2003, pp. 141–154. [23] M. Harman, Automated patching techniques: the fix is in: technical perspective, Communi- cations of the ACM 53 (5) (2010) 108–108. [24] J. Reason, Human error: models and management, West J Med. 172 (6) (2000) 393–396. [25] F. Rahman, C. Bird, P. Devanbu, Clones: what is that smell?, in: Proceedings of the Interna- tional Conference on Mining Software Repositories, 2010, pp. 72–81. [26] M. Choudhury, S. Counts, Understanding affect in the workplace via social media, in: Pro- ceedings of the Computer supported cooperative work, 2013, pp. 303–316. [27] R. Feldt, L. Angelis, R. Torkara, M. Samuelssonc, Links between the personalities, views and attitudes of software engineers, Information and Software Technology 52 (6) (2010) 611–624. [28] R. Palacios, A. López, A. Crespo, P. Acosta, A study of emotions in requirements engineering, Organizational, Business, and Technological Aspects of the Knowledge Society 112 (2010) 1–7. [29] P. Dewan, Towards emotion-based collaborative software engineering, in: Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, 2015, pp. 109–112. [30] P. Denning, Moods, Communications of the ACM 55 (12) (2012) 33–35. [31] M. Wrobel, Emotions in the software development process, in: Proceedings of the Interna- tional Conference on Human System Interaction, 2013, pp. 518–523. [32] D. McDuff, A. Karlson, A. Kapoor, A. Roseway, M. Czerwinski, Affectaura: an intelligent system for emotional memory, in: Proceedings of the Conference on Human Factors in Com- puting Systems, 2012, pp. 849–858. 237 [33] G. Destefanis, M. Ortu, S. Counsell, M. Marchesi, R. Tonelli, Software development: do good manners matter?, PeerJ PrePrints (2015) 1–17. [34] E. Guzman, D. Azócar, Y. Li, Sentiment analysis of commit comments in github: An empir- ical study, in: Proceedings of the International Conference on Mining Software Repositories, 2014, pp. 352–355. [35] F. Calefato, F. Lanubile, Affective trust as a predictor of successful collaboration in distributed software projects, in: Proceedings of the International Workshop on Emotion Awareness in Software Engineering, 2016, pp. 3–5. [36] S. Chowdhury, A. 

Appendix A

Publications Out of This Dissertation Research

Parts of this thesis work have been published in Software Engineering journals, conferences, and workshops, while other parts are under preparation for submission. The publications are listed below.

a) Refereed Journal Contributions

[a1] Md Rakibul Islam and Minhaz F. Zibran. SentiStrength-SE: Exploiting Domain Specificity for Improved Sentiment Analysis in Software Engineering Text. Elsevier Journal of Systems and Software (JSS), 145: 125-146, 2018.

[a2] Md Rakibul Islam and Minhaz F. Zibran. Exploration and Exploitation of Developers' Sentimental Variations in Software Engineering. International Journal of Software Innovation, 4(4): 35-55, 2016.

b) Refereed Conference & Workshop Contributions

[b3] Md Rakibul Islam and Minhaz F. Zibran. How Bugs Are Fixed: Exposing Bug-fix Patterns with Edits and Nesting Levels. In Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing (SAC), pp. NA, Brno, Czech Republic, 2020 (forthcoming).

[b4] Md Rakibul Islam, Md Kauser Ahmmed, and Minhaz F. Zibran. MarValous: Machine Learning Based Detection of Emotions in the Valence-Arousal Space in Software Engineering Text. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (SAC), pp. 1786-1793, Limassol, Cyprus, 2019.

[b5] Md Rakibul Islam and Minhaz F. Zibran. DEVA: Sensing Emotions in the Valence Arousal Space in Software Engineering Text. In Proceedings of the 33rd ACM/SIGAPP Symposium On Applied Computing (SAC), pp. 1536-1543, France, 2018.

[b6] Md Rakibul Islam and Minhaz F. Zibran. A Comparison of Software Engineering Domain Specific Sentiment Analysis Tools. In Proceedings of the 25th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pp. 487-491, Italy, 2018.

[b7] Md Rakibul Islam and Minhaz F. Zibran. Sentiment Analysis of Software Bug Related Commit Messages. In Proceedings of the 27th International Conference on Software Engineering and Data Engineering (SEDE), USA, 2018 (Winner of Best Paper Award).

[b8] Md Rakibul Islam, Minhaz F. Zibran, and Aayush Nagpal. Security Vulnerabilities in Categories of Clones and Non-Cloned Code: An Empirical Study. In Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 20-29, Canada, 2017.

[b9] Md Rakibul Islam and Minhaz F. Zibran. A Comparison of Dictionary Building Methods for Sentiment Analysis in Software Engineering Text. In Proceedings of the 11th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 478-479, Canada, 2017.

[b10] Md Rakibul Islam and Minhaz F. Zibran. Leveraging Automated Sentiment Analysis in Software Engineering. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR), pp. 203-214, Buenos Aires, Argentina, May 2017.

[b11] Md Rakibul Islam and Minhaz F. Zibran. Towards Understanding and Exploiting Developers' Emotional Variations in Software Engineering. In Proceedings of the 14th IEEE International Conference on Software Engineering Research, Management and Applications (SERA), pp. 185-192, Baltimore, Maryland, USA, 2016 (invited at the International Journal of Software Innovation).

[b12] Md Rakibul Islam and Minhaz F. Zibran. On the Characteristics of Buggy Code Clones: A Code Quality Perspective. In Proceedings of the 12th IEEE International Workshop on Software Clones (IWSC), pp. 23-29, Italy, 2018.

[b13] Md Rakibul Islam and Minhaz F. Zibran. A Comparative Study on Vulnerabilities in Categories of Clones and Non-Cloned Code. In Proceedings of the 10th IEEE International Workshop on Software Clones (IWSC), pp. 8-14, Osaka, Japan, 2016 (Winner of Best Paper Award).

c) Poster Presentation

[c14] Md Rakibul Islam and Minhaz F. Zibran. Entity Based Aspect-Oriented Opinion Mining in Software Engineering. InnovateUNO, New Orleans, LA, USA, 2019.

[c15] Md Rakibul Islam and Minhaz F. Zibran. An Empirical Study of Security Vulnerabilities in Software Systems. In 10th EAI International Conference on Digital Forensics & Cyber Crime (ICDF2C 2018), New Orleans, USA, 2018.

[c16] Md Rakibul Islam and Minhaz F. Zibran. Understanding Bug Fix Patterns: Towards an Improved Automated Program Repair Method. InnovateUNO, New Orleans, LA, USA, 2018.

d) Miscellaneous

[d17] Md Rakibul Islam and Minhaz F. Zibran. Insights into Continuous Integration Build Failures. In Proceedings of the 14th International Conference on Mining Software Repositories (MSR), pp. 467-470, Buenos Aires, Argentina, May 2017.

[d18] Duaa Alwad, Manisha Panta, Minhaz F. Zibran, and Md Rakibul Islam. An Empirical Study of the Relationships between Code Readability and Software Complexity. In Proceedings of the 27th International Conference on Software Engineering and Data Engineering (SEDE), USA, 2018.

[d19] Naw Safrin Sattar, Md A. M. Faysal, Minhaz Zibran, Shaikh Arifuzzaman, and Md Rakibul Islam. Data Mining in-IDE Activities: Why Software Developers Fail. In Proceedings of the 27th International Conference on Software Engineering and Data Engineering (SEDE), USA, 2018.

The publications [a1] and [b10] constitute Chapter 4. The publications [b5] and [b4] contribute to Chapter 6 and Chapter 7, respectively. Chapter 5 comprises the performance comparison of existing sentiment analysis tools that appears in publication [b6]. Chapter 8 corresponds to the publications [a2] and [b11]. The publication [b7] contributes to Chapter 9. The empirical studies that appeared in [b8], [b12], and [b13] constitute Chapter 12, Chapter 13, and Chapter 11, respectively. The empirical study of bug-fix patterns that appeared in publication [b3] constitutes Chapter 14. The publications [c14], [c15], and [c16] are also related to this thesis as a whole.

A.1 Co-authorship

In this thesis I have presented my own research conducted under the supervision of Minhaz F. Zibran. Citations are properly provided for the ideas and techniques that are not products of my own work. Where citations are not available, I describe the ideas and techniques in a way that indicates they existed prior to this work.
Small parts of the research presented in this thesis involved joint work with other researchers, resulting in co-authored publications. Md Kauser Ahmmed and Aayush Nagpal are co-authors of the publications [b4] and [b8], respectively. Md Kauser Ahmmed was the lead developer in transforming my ideas into MarValous, a machine learning based emotion mining tool. Aayush Nagpal helped in conducting the data analysis presented in the publication [b8]. I am the sole contributor to the initial formulation of the ideas, data collection, writing, and shaping of the results of the published research papers [b4] and [b8]. Other than the aforementioned contributions of the co-authors of my papers, the entire work presented in this thesis is the outcome of my own research carried out under the supervision of Minhaz F. Zibran.

Vita

Md Rakibul Islam has successfully defended his PhD thesis in the Computer Science department at the University of New Orleans (UNO), Louisiana, USA. After receiving his degree, he will join the Computer Science Department at the University of Wisconsin-Eau Claire as a tenure-track Assistant Professor in Fall 2020. His research interests include: (i) Human Aspects in Software Engineering, (ii) Software Security, (iii) Source Code Analysis, and (iv) Natural Language Processing. He often blends his research interests and applies techniques such as Machine Learning, Data Mining, and Graph Theory to produce useful tools and empirically derived insights for the benefit of the concerned communities. He has co-authored more than 15 papers in journals and venues that include MSR, SANER, ESEM, SAC, and others.
