The Curious Case of Posts on Stack Overflow

Shailja Shukla
Subject: Information Systems
Corresponds to: 30 hp
Presented: VT 2020
Supervisor: Mudassir Imran Mustafa
Department of Informatics and Media

Contents

Abstract
Acknowledgements
Chapter 1: Introduction
  1.1 Background
  1.2 Motivation
  1.3 Research Questions
  1.4 Delimitation
  1.5 Limitation
Chapter 2: Theory
  2.1 Topic Modelling
  2.2 Latent Dirichlet Allocation (LDA)
  2.3 Related Work
Chapter 3: Methodology
  3.1 Data Collection
  3.2 Data Extraction
    3.2.1 Schema
  3.3 Data Pre-processing
    3.3.1 Subset corpus data
    3.3.2 Remove code snippets
    3.3.3 Combine related documents to form a single corpus
    3.3.4 Tokenization
    3.3.5 Lowercasing
    3.3.6 Remove punctuation
    3.3.7 Text Standardization/Replace Contractions
    3.3.8 Remove stop words
    3.3.9 Remove URLs
    3.3.10 Minimum size words
    3.3.11 Remove multiple whitespaces
    3.3.12 Generate N-Grams
    3.3.13 Stemming
    3.3.14 Lemmatisation
  3.4 Create Dictionary and Term Document Frequency
  3.5 Run the LDA model
Chapter 4: Analysis
Chapter 5: Result
  5.1 RQ1: What are the popular discussion topics on Stack Overflow?
    5.1.1 Web as a recurring discussion topic
    5.1.2 UI Development as a recurring discussion topic
    5.1.3 Data management as a recurring discussion topic
  5.2 RQ2: How does developers' interest change over time?
  5.3 RQ3: How do the interests in specific technologies change over time?
    5.3.1 React vs Angular
    5.3.2 Python vs JavaScript
    5.3.3 Popular discussion topics related to Web technologies
    5.3.4 Relational Databases (RDBMS)
    5.3.5 Android vs iOS
    5.3.6 Object-Oriented Programming
    5.3.7 Machine Learning
Chapter 6: Validity of research and experiences
Chapter 7: Conclusion
Chapter 8: Discussion & Future Work
Appendix 1: Tools and technology
Appendix 2: Lists of popular discussion topics among developers
Appendix 3: Acronym / Abbreviation Table
References

Table of Figures

Figure 1: Venn diagram of the intersection of text mining and six related fields (Miner et al., 2012)
Figure 2: Schematic overview of LDA (Debortoli et al., 2016)
Figure 3: Methodology model
Figure 4: Sample user post before code snippets are removed from the text content
Figure 5: Sample user post after code snippets are removed from the text content
Figure 6: Title of a sample user post
Figure 7: Body of a sample user post
Figure 8: Combined title and body of a sample user post
Figure 9: Sample text before pre-processing
Figure 10: Sample text after partial pre-processing
Figure 11: Sample text before stemming and lemmatisation
Figure 12: Sample text after stemming and lemmatisation
