Scoring Answers in Stack Overflow by Raul Quintana

Predictive Model: Using Text Mining for Determining Factors Leading to High-Scoring Answers in Stack Overflow

by Raul Quintana Selleras

B.S. in Information Technology, August 2012, Florida International University
B.A. in Religious Studies, December 2012, Florida International University
M.S. in Information Systems, May 2015, The University of Texas at Arlington

A Praxis submitted to The Faculty of The School of Engineering and Applied Science of The George Washington University in partial fulfillment of the requirements for the degree of Doctor of Engineering

January 10, 2020

Praxis directed by

Timothy Blackburn
Professorial Lecturer of Engineering Management and Systems Engineering

Amir Etemadi
Associate Professor of Engineering and Applied Science

The School of Engineering and Applied Science of The George Washington University certifies that Raul Quintana Selleras has passed the Final Examination for the degree of Doctor of Engineering as of October 15, 2019. This is the final and approved form of the Praxis.

Predictive Model: Using Text Mining for Determining Factors Leading to High-Scoring Answers in Stack Overflow

Raul Quintana Selleras

Praxis Research Committee:

Timothy Blackburn, Professorial Lecturer of Engineering Management and Systems Engineering, Praxis Co-Director
Amir Etemadi, Associate Professor of Engineering and Applied Science, Praxis Co-Director
Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science, Committee Member

© Copyright 2019 by Raul Quintana Selleras
All rights reserved

Dedication

The author wishes to dedicate this dissertation to his daughter, Alexia Quintana, and to his wife, Kristina Quintana, for their unconditional support. Also, the author would like to thank his parents, Raul Quintana Sarduy and Gilda Selleras Rivas, whose encouragement was vital to his educational accomplishments.

Acknowledgments

The author wishes to acknowledge his praxis director, Dr. Timothy Blackburn; his editor, Peter Rosenbaum; and all faculty and staff from the Doctor of Engineering program and the students from the seventh cohort. The author thanks Andrew Rothman and Lucas Longan for their insightful suggestions.

Abstract of Praxis

Predictive Model: Using Text Mining for Determining Factors Leading to High-Scoring Answers in Stack Overflow

With the advent of knowledge-based economies, knowledge transfer within online forums has become increasingly important to the work of IT teams. Stack Overflow, for example, is an online community in which computer programmers can interact and consult with one another to achieve information flow efficiencies and bolster their reputations, which are numerical representations of their standings within the platform. The high volume of information available in Stack Overflow, combined with significant variance in members' expertise and, hence, in the quality of their posts, hinders knowledge transfer and causes developers to waste valuable time locating good answers. Additionally, invalid answers can introduce security vulnerabilities and/or legal risks.

By conducting text analytics and regression, this research presents a predictive model to optimize knowledge transfer among software developers. This model incorporates the identification of factors (e.g., good tagging, answer character count, tag frequency) that reliably lead to high-scoring answers in Stack Overflow. Upon applying natural language processing, the following variables were found to be significant: (a) the number of answers per question, (b) the cumulative tag score, (c) the cumulative comment score, and (d) the bags of words' frequency. Additional methods were used to identify the factors that contribute to an answer being selected by the user who posted the question, the community at large, or both.


Predicting what constitutes a good, accurate answer helps not only developers but also Stack Overflow, as the site can redesign its user interface to make better use of its knowledge repository to transfer knowledge more effectively. Likewise, companies that use the platform can decrease the amount of time and resources invested in training, fix software bugs faster, and complete challenging projects in a timely fashion.

Table of Contents

Dedication
Acknowledgments
Abstract of Praxis
List of Figures
List of Tables
List of Symbols / Nomenclature
Glossary of Terms
Chapter 1: Introduction
  1.1 Background
  1.2 Research Motivation
  1.3 Problem Statement
  1.4 Thesis Statement
  1.5 Research Objectives
  1.6 Research Questions and Hypotheses
  1.7 Scope of Research
  1.8 Research Limitations
  1.9 Organization of Praxis
Chapter 2: Literature Review
  2.1 Introduction
  2.2 Information, Knowledge, and Related Concepts
  2.3 Digging into Stack Overflow
  2.4 Knowledge Transfer
  2.5 Online Forums
  2.6 Summary and Conclusions
Chapter 3: Methodology
  3.1 Introduction
  3.2 Data Collection and Analysis
  3.3 Research Methods
Chapter 4: Results
  4.1 Introduction
  4.2 Data Collection and Preprocessing
  4.3 Predictive Models
  4.4 Case Studies
Chapter 5: Discussion and Conclusions
  5.1 Discussion
  5.2 Conclusions
  5.3 Contributions to Body of Knowledge
  5.4 Recommendations for Future Research
References
Appendix A
Appendix B

List of Figures

Figure 1-1. Stack Overflow question.
Figure 1-2. Stack Overflow answer.
Figure 1-3. Stack Overflow and optimal answer region.
Figure 2-1. Interest graph.
