Mining DEV for social and technical insights about software development

Maria Papoutsoglou∗†, Johannes Wachs‡§, Georgia M. Kapitsaki†
∗Aristotle University of Thessaloniki, Greece
†University of Cyprus, Cyprus
‡Vienna University of Economics and Business, Austria
§Complexity Science Hub Vienna, Austria
[email protected]; [email protected]; [email protected]

Abstract—Software developers are social creatures: they communicate, collaborate, and promote their work in a variety of channels. Twitter, GitHub, Stack Overflow, and other platforms offer developers opportunities to network and exchange ideas. Researchers analyze content on these sites to learn about trends and topics in software engineering. However, insight mined from the text of Stack Overflow questions or GitHub issues is highly focused on detailed and technical aspects of software development. In this paper, we present a relatively new online community for software developers called DEV. On DEV users write long-form posts about their experiences, preferences, and working life in software, zooming out from specific issues and files to reflect on broader topics. About 50,000 users have posted over 140,000 articles related to software development. In this work, we describe the content of posts on DEV using a topic model, showing that developers discuss a rich variety and mixture of social and technical aspects of software development. We show that developers use DEV to promote themselves and their work: 83% link their profiles to their GitHub profiles and 56% to their Twitter profiles. 14% of users pin specific GitHub repos in their profiles. We argue that DEV is emerging as an important hub for software developers, and a valuable source of insight for researchers to complement data from platforms like GitHub and Stack Overflow.

Index Terms—Developers, social networks, human factors, topic modeling, GitHub, DEV

To appear in the Proceedings of the 18th International Conference on Mining Software Repositories (MSR 2021). arXiv:2103.17054v2 [cs.SE] 16 May 2021

I. INTRODUCTION

Online communities and platforms play an important role in the professional lives of software developers [1]. Developers find ideas, collaborators, and even jobs through connections on software-oriented platforms like GitHub and Stack Overflow, and on traditional social media including Facebook and Twitter [2]. Their interactions and the content posted on these platforms are a valuable source of information about the practice of software engineering. The software engineering research community has long been interested in predicting current practice and future trends using digital trace data [3], [4].

Despite the richness of these data sources, researchers using them in this way often make broad inferences about software engineering from highly contextualized content. GitHub issues or commit messages generally refer to specific pieces of code. Stack Overflow questions also tend to be highly specialized. On more socially oriented platforms like Twitter, which limits post lengths to 280 characters, discussions about software mix with an endless variety of other content.

To address this gap we present a novel source of long-form text data created by people working in software: DEV (https://dev.to). DEV is “a community of software developers getting together to help one another out,” focused especially on facilitating cooperation and learning. Content on DEV resembles blog and Medium posts and, at a glance, covers everything from programming language choice to technical specifications of systems to social factors in software. Other users can interact with posts by liking them, commenting on them, or sharing them on other platforms. Contributions to DEV are longer than tweets: the nearly 140,000 texts in our dataset contain on average 710 words (median 461, stdev. 990). Users can fill out an extensive profile, including links to their other accounts on the web and to their personal projects on GitHub. This suggests that DEV has significant potential for developers to promote themselves and to drive attention to their software projects.

Text posted on DEV has a broader perspective than content that can be collected from GitHub or Stack Overflow, while remaining focused on software. Potential applications include predicting language and framework popularity, measuring the social attitudes of developers, and understanding the general trends in software. As we will show, many DEV users share links to their profiles on multiple platforms including GitHub, Twitter and LinkedIn, highlighting the platform’s potential for the study of cross-platform behavior [5]. About 85% of the users in our dataset listing both a GitHub and Twitter account are not in the state-of-the-art dataset linking GitHub and Twitter from MSR 2020 [6]. A significant number of users link to platforms of emerging interest to the software engineering community such as YouTube and Twitch.

The aim of our work is to give the empirical software engineering community a first look at DEV and to promote its potential for research. We collect data from the platform and apply a topic modeling approach to map the topics of conversation in the community. We find that users discuss a rich mixture of social and technical aspects of software development, often in the same post. We also describe the information users share about themselves, noting that many users link to their accounts on multiple other sites, and to personal projects. We highlight the potential of this new platform to complement data sources traditionally used by the software engineering community to study social aspects of software and trends in development.

II. RELATED WORK

Open source software developers use a variety of channels to promote themselves and their projects [7]. Among these channels, social media plays an important role in software engineering [8], offering communication (sharing content, meeting new people), coordination (about events, releases, calls to action) and promotion (launches, new opportunities). Social media and blogs also play an important role in the labor market for job-seekers and recruiters alike, especially when combined with content from online coding platforms. Companies and projects expend considerable effort and resources to attract the best developers, including examining the social web [9]. This online presence of developers can be utilized directly, for example by matching GitHub profiles to job ads [10].

Text analysis has been applied to many kinds of data generated by software developers, including data coming from the above sources. Though typically unstructured, text contains important information about how people think and feel about software development. Common sources for text about software include GitHub [11], Bugzilla [12], Stack Overflow and mailing lists [13]. A common approach is to fit a topic model to a corpus. A topic model is an unsupervised method which assigns topics to documents. This approach has been applied to data from technical contexts including GitHub pull requests [14], Stack Overflow Q&A and other Stack Exchange sites [15]–[17], and Gitter [18]. These sources focus on specific aspects of the software development process, and the resulting corpora are oriented towards technical themes.

But software engineering researchers are not only interested in how developers talk about specific technical issues. Seeking data that better captures social aspects of software development, including information about preferences, teamwork, collaboration, and potential trends, researchers study not just technical platforms like GitHub [19] but also broader social communities like Twitter [20]. Within such communities, researchers can to some extent identify conversations which are likely about software engineering. Researchers have not had access to data from an online community dedicated to social aspects of software development and computing. We present this analysis of DEV to fill this gap in the literature. Our work, to our knowledge the first to analyze DEV, seeks to promote the use of this new dataset in software engineering research.

III. DATA COLLECTION AND ANALYSIS SETUP

Data collection: On DEV, users typically tag their articles to improve the article’s visibility and attract readers. Users browsing DEV can search for articles by tag. We wrote and deployed a customized web crawler that searches for all tags on the platform and collects all articles using each tag. We stored metadata and author information for every article. For each unique user we crawled the information available on their public profile, including links to other platforms. Using the official DEV API (https://docs.forem.com/api/), we identified more than 53,000 posting users, which we used to verify the completeness of our collection of articles. Roughly 33% of users wrote 80% of articles in our corpus. As an article can have more than one tag, we removed duplicate articles based on their URL and title. We extracted the full text of 147,030 articles. Using a language detection method we filtered out non-English articles, leaving a corpus of 138,925 articles.

Preprocessing articles: In the data collection step we retrieved the text in HTML format, which let us locate specific HTML elements and use them to process the data. In DEV posts, as in Stack Overflow questions and answers, code snippets are enclosed in a dedicated HTML element. As we are interested in the natural language part of the articles, we removed these code snippets, other HTML elements (e.g. list elements), and URLs. Using the Quanteda [21] R package we ran the texts through a standard preprocessing pipeline, removing punctuation, number symbols, and stopwords, and casting to lowercase. We stemmed the remaining words in each text and created unigrams and bigrams. After evaluating preliminary results from various topic models, we discarded the unigrams, as topics derived from bigrams were found to have significantly richer meaning [22].

Topic modeling setup: Before proceeding with the topic modeling, we trimmed bigrams appearing in fewer than 90 texts and less than 200 times in total to reduce noise in the corpus. An important step before finalizing a topic modeling approach is to determine the best number of topics K. A popular metric for finding the best-fitting K is the Cv coherence score [16], [23]. We ran a range of experiments varying K from 5 to 50 in steps of five, recording each run’s coherence score. The model with 30 topics had a high coherence score and was also evaluated by two authors for interpretability. Having chosen K = 30 topics, we fit the model using the STM package in the R programming language [24]. This package provides different implementations of structural topic models, including two of the most common initialization methods: “spectral” and “LDA”. LDA (Latent Dirichlet Allocation) is the most frequently used topic modeling method, but the spectral model recovers similarly high quality topics from large (greater than 50,000 documents) corpora at lower computational cost [25], [26]. As our corpus is large in this sense, we proceed using spectral topic modeling.

IV. RESULTS

A. What do developers write about?

We present and summarize the topics found by our analysis in Table I. The table includes a description of each topic, our categorization of the topic as either social, technical or mixed, the relative frequency of the topic, and the top five distinguishing bigrams. The topics we found reflect the rich variety of content on DEV. Though we categorize most topics as technical, several of the most frequently mentioned topics can be classified as social.

TABLE I: Topics of the DEV corpus. We derive both the listed description and category from the key bigrams and an inspection of characteristic articles and their tags. The relative frequency of a topic refers to how often documents draw from that topic.

| # | Description | Category | Rel. Freq. | Key Bigrams |
|---|---|---|---|---|
| T1 | ESS EPICS Environment (E3) | Technical | .016 | e3 e3, host , nil return, overrid fun, error messag |
| T2 | NPM | Technical | .040 | project post, npm instal, npm run, file call, config file |
| T3 | Learning to program | Mixed | .046 | start learn, learn python, learn , front end, game develop |
| T4 | Supporting software and tools | Mixed | .046 | week back, make life, life easier, make sens, find work |
| T5 | Text editors | Technical | .026 | visual studio, text editor, oper system, command prompt, code extens |
| T6 | Databases | Technical | .031 | primari key, graphql api, sql server, graphql queri, app express |
| T7 | User activity logging | Technical | .007 | logrocket record, pixel-perfect video, instrument dom, stacktrac network, replay problem |
| T8 | Asynchrony | Technical | .029 | email list, promis resolv, async function, callback function, event handler |
| T9 | React | Technical | .041 | subscrib email, react import, react compon, button onclick, compon render |
| T10 | Team management | Social | .041 | great product, softwar develop, project manag, technic debt, team work |
| T11 | ASP.NET | Technical | .020 | asp net, net core, public class, spring boot, async task |
| T12 | Web Development/Learning | Mixed | .035 | blog post, web monet, rubi rail, node js, learn node |
| T13 | Work-life balance | Social | .080 | week time, remot work, work home, social media, spend time |
| T14 | Automatic testing | Technical | .021 | open full, unit test, write test, autom test, test suit |
| T15 | Foundations | Technical | .037 | prime number, data structur, time complex, sort algorithm, hash tabl |
| T16 | Open Source | Mixed | .040 | open sourc, sourc contribut, pull request, view github, version control |
| T17 | Data Science | Technical | .028 | machin learn, data scienc, neural network, deep learn, artifici intellig |
| T18 | REST APIs and the Web | Technical | .028 | rest api, http request, status code, web server, api call |
| T19 | APIs | Technical | .036 | api key, send messag, access token, send email, twitter api |
| T20 | Code Review, Challenges | Mixed | .045 | code review, code challeng, solv problem, clean code, make code |
| T21 | Function Handling | Technical | .043 | function call, return function, function program, data type, declar variabl |
| T22 | Containers and versioning | Technical | .051 | docker ps, commit, git add, docker run, git push |
| T23 | Mobile | Technical | .037 | subscrib youtub, react nativ, mobil app, io android, build app |
| T24 | Site accessibility/UX | Mixed | .038 | static site, site generat, load lazi, screen reader, user experi |
| T25 | Skills and learning | Social | .048 | program languag, soft skill, junior develop, appli job, find job |
| T26 | | Technical | .029 | api gateway, googl cloud, ec2 instanc, s3 bucket, kubernet cluster |
| T27 | Systems | Technical | .032 | smart contract, distribut system, web applic, microservic architectur, applic secur |
| T28 | Webforms | Technical | .010 | input type, type text, form field, usest const, form onsubmit |
| T29 | CSS | Technical | .031 | flex contain, display flex, posit absolut, css grid, media queri |
| T30 | Rust | Technical | .011 | amp mut, sourc code, code generat, compil time, syntax tree |

The assignment of posts to topics in a topic model is not one to one. Indeed, a strength of the topic modeling approach is that each document is modeled as a mixture of topics. The notion that a text can address multiple topics is referred to as heteroglossia [27]. As topics can appear together, we can visualize the overall structure of the DEV corpus using a topic network [28]. In a topic network, nodes represent topics and two nodes are connected by an edge weighted by the correlation between those topics across the corpus.

As all pairs of topics have some correlation between them, this network needs to be filtered. We filter the topic-topic correlation network by calculating its planar maximally filtered graph (PMFG) [29]. The PMFG extends the maximal spanning tree (MST) approach to filtering weighted networks. While the MST returns a tree, the PMFG returns a planar subgraph of the original graph with maximum edge weights (correlations). Planar graphs can include triangles and 4-cliques, making them significantly more dense than trees while still filtering out most edges. We plot the resulting network in Figure 1, distinguishing nodes we classified as social, mixed, or technical by their color. Node sizes increase with topic frequency in the corpus.

Fig. 1: Topic correlation network. Nodes are topics, connected by edges if they frequently appear in the same documents. Edges were selected using the planar maximally filtered graph method. Node sizes increase with topic frequency in the corpus. Numbers refer to topics described in Table I. Red nodes are topics we classified as social, green nodes as technical, and yellow nodes as mixed.

The network reveals that topics we labeled as social (T10, T13, T25) often occur in the same document and are among the most frequent topics. We also note that general programming topics like T8 (Asynchrony), T15 (Foundations), and T21 (Function Handling) play a bridging role between web development-related topics (T9, T28, T29) and the rest of the network. T22 (Containers and versioning) is also a bridging topic. Our interpretation is that these topics describe aspects of software dealt with by a wide variety of developers, and so it is natural that they connect topics that are more specialized. At a glance, the network reveals the multi-faceted nature of the DEV community and what its users write about.

B. How do developers present themselves?

To learn more about the users themselves, we now describe how they present themselves in their profiles. As in most online communities, DEV users have personal profile pages. On these pages users share information about themselves, including a brief biography, their location, workplace, and education. They can also link to their accounts on other platforms including GitHub, GitLab, Stack Overflow, Medium, Twitter, and LinkedIn. There is also a part of the user profile reserved for GitHub repositories, allowing users to directly promote their work [30]. Users can also list skills, interests, what they are learning, and what they are available for. The profile reports user statistics, like the number of posts and comments a user has made, and lists badges (gamification elements on DEV) that the user has collected.

Here we focus on the extent to which users fill out their profiles and how often they link to other platforms. We do this to emphasize the role of DEV as a social hub for developers and as a source for identifying users across platforms. We report the share and count of users sharing links to other platforms in their profiles in Table II. Users most frequently share their GitHub and/or GitLab profiles (84%). A majority of users (56%) share their Twitter profiles. Over 12 thousand posting users share their LinkedIn accounts, with potential value for studies of the labor market in software. We also note that some users on DEV link their YouTube and Twitch accounts, two platforms which are of growing interest to the software engineering community [?], [?].

TABLE II: Links to other platforms on DEV user profiles.

| Linked Platform | Share of Users | Count |
|---|---|---|
| GitHub/GitLab | 84% | 32,136 |
| - Has pinned Repo(s) on DEV | 14% | 5,346 |
| Twitter | 56% | 21,492 |
| LinkedIn | 32% | 12,320 |
| Medium | 12% | 4,595 |
| Stack Overflow | 11% | 4,083 |
| Instagram | 8% | 3,144 |
| Facebook | 8% | 2,864 |
| YouTube | 3% | 1,190 |
| Twitch | 2% | 778 |
| GitHub/GitLab & Twitter | 40% | 15,316 |
| GitHub/GitLab & LinkedIn | 29% | 11,117 |
| GitHub/GitLab & Twitter & LinkedIn | 19% | 7,207 |

We also report several key intersections: about 40% of users share both their GitHub profiles and their Twitter accounts. To what extent is this new data on cross-site linkages? We downloaded the replication materials of a paper published in MSR 2020 by Fang et al., which linked around 70 thousand individuals across the two platforms [6], and checked to what extent our data overlaps. Of the 15,316 users in our dataset sharing both a GitHub and Twitter account, 13,027 (85%) were not in the Fang et al. dataset, representing a substantial increase in the number of accounts that can be linked. As the DEV community continues to grow (13,732 posters joined in 2019 vs 6,006 in 2018), its value as a data source for cross-platform linkages will only increase.

V. DISCUSSION

In this paper we presented DEV, an online community for software developers that serves as a unique platform for social developers. We shared a first look at the community using a topic modeling approach to understand what its users are talking about. We found that they discuss a variety of technical and social aspects of software development. A significant number of posts in our corpus deal with purely social aspects of software development, underscoring that DEV offers information quite different from what researchers might find on Stack Overflow or GitHub. We also found that DEV users are highly networked in the sense that many of them link to their accounts on other platforms in their profiles.

In future work we plan to explore the dynamic evolution of topics, to better assess DEV’s potential for forecasting trends. We also plan to integrate user feedback on posts into our analysis, aiming to describe which posts and topics are more popular or controversial. Different users, characterized by how they fill out their profiles, likely post different kinds of content. We also plan to relate such user differences to differences in content. For example, users describing themselves as managers or leads might be more likely to post about coordination and collaboration than about technical subjects.

DEV also has significant potential for research on the labor market for software, complementing data from GitHub or Stack Overflow [31]. On DEV users describe, in plain text, their skills and preferences on their user profiles and indirectly via their posts. Data from DEV could be used to revisit important social factors in software engineering including gender gaps and disparities [32]–[34], the relationship between geography and collaboration [35], [36], and the effects of gamification on user behavior [37]. Of particular interest is the site’s social network, the structure and dynamics of which could be contrasted with developer networks on GitHub [38]. Before these projects can proceed, DEV data needs to be suitably anonymized to respect user privacy, especially when linking user identities across platforms [6].

In the broad and growing ecosystem of online communities relating to software development, DEV plays an important role. On DEV, users post about software from a broad perspective, integrating social and technical ideas in one place. For the software engineering research community, DEV presents an opportunity to better understand the social and networked software developer.

REFERENCES

[1] M.-A. Storey, L. Singer, B. Cleary, F. Figueira Filho, and A. Zagalsky, “The (r) evolution of social media in software engineering,” in Future of Software Engineering Proceedings, 2014, pp. 100–116.
[2] A. Begel, J. Bosch, and M.-A. Storey, “Social networking meets software development: Perspectives from GitHub, MSDN, Stack Exchange, and topcoder,” IEEE Software, vol. 30, no. 1, pp. 52–66, 2013.
[3] J. Rech, “Discovering trends in software engineering with google trend,” ACM SIGSOFT software engineering notes, vol. 32, no. 2, pp. 1–2, 2007.
[4] H. Borges, A. Hora, and M. T. Valente, “Predicting the popularity of github repositories,” in Proceedings of the 12th International Conference on Predictive Models and Data Analytics in Software Engineering, 2016, pp. 1–10.
[5] B. Vasilescu, V. Filkov, and A. Serebrenik, “Stackoverflow and github: Associations between software development and crowdsourced knowledge,” in 2013 International Conference on Social Computing. IEEE, 2013, pp. 188–195.
[6] H. Fang, D. Klug, H. Lamba, J. Herbsleb, and B. Vasilescu, “Need for tweet: How open source developers talk about their github work on twitter,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 322–326.
[7] H. S. Borges and M. T. Valente, “How do developers promote open source projects?” Computer, vol. 52, no. 8, pp. 27–33, 2019.
[8] M.-A. Storey, C. Treude, A. van Deursen, and L.-T. Cheng, “The impact of social media on software engineering practices and tools,” in Proceedings of the FSE/SDP workshop on Future of software engineering research, 2010, pp. 359–364.
[9] A. Capiluppi, A. Serebrenik, and L. Singer, “Assessing technical candidates on the social web,” IEEE software, vol. 30, no. 1, pp. 45–51, 2012.
[10] C. Hauff and G. Gousios, “Matching github developer profiles to job advertisements,” in 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 2015, pp. 362–366.
[11] A. Sharma, F. Thung, P. S. Kochhar, A. Sulistya, and D. Lo, “Cataloging github repositories,” in Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017, pp. 314–319.
[12] M. Sadat, A. B. Bener, and A. Miranskyy, “Rediscovery datasets: Connecting duplicate reports,” in 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR). IEEE, 2017, pp. 527–530.
[13] A. Zagalsky, D. M. German, M.-A. Storey, C. G. Teshima, and G. Poo-Caamaño, “How the R community creates and curates knowledge: an extended study of stack overflow and mailing lists,” Empirical Software Engineering, vol. 23, no. 2, pp. 953–986, 2018.
[14] M. M. Rahman and C. K. Roy, “An insight into the pull requests of github,” in Proceedings of the 11th Working Conference on Mining Software Repositories, 2014, pp. 364–367.
[15] G. M. Kapitsaki, M. Papoutsoglou, D. M. German, and L. Angelis, “What do developers talk about open source software licensing?” in 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2020, pp. 72–79.
[16] M. U. Haque, L. H. Iwaya, and M. A. Babar, “Challenges in docker development: A large-scale study using stack overflow,” in Proceedings of the 14th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2020, pp. 1–11.
[17] A. Abdellatif, D. Costa, K. Badran, R. Abdalkareem, and E. Shihab, “Challenges in chatbot development: A study of stack overflow posts,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 174–185.
[18] O. Ehsan, M. El Mezouar, S. Hassan, and Y. Zou, “An empirical study of developer discussions in the gitter platform,” ACM Transactions on Software Engineering and Methodology, vol. 30, 2020.
[19] B. Vasilescu, D. Posnett, B. Ray, M. G. van den Brand, A. Serebrenik, P. Devanbu, and V. Filkov, “Gender and tenure diversity in github teams,” in Proceedings of the 33rd annual ACM conference on human factors in computing systems, 2015, pp. 3789–3798.
[20] L. Singer, F. Figueira Filho, and M.-A. Storey, “Software engineering at the speed of light: how developers stay current using twitter,” in Proceedings of the 36th International Conference on Software Engineering, 2014, pp. 211–221.
[21] K. Benoit, K. Watanabe, H. Wang, P. Nulty, A. Obeng, S. Müller, and A. Matsuo, “quanteda: An r package for the quantitative analysis of textual data,” Journal of Open Source Software, vol. 3, no. 30, p. 774, 2018.
[22] M. Pejic-Bach, T. Bertoncel, M. Meško, and Ž. Krstić, “Text mining of industry 4.0 job advertisements,” International Journal of Information Management, vol. 50, pp. 416–431, 2020.
[23] J. Han, E. Shihab, Z. Wan, S. Deng, and X. Xia, “What do programmers discuss about deep learning frameworks,” Empirical Software Engineering, 2020.
[24] M. E. Roberts, B. M. Stewart, and D. Tingley, “Stm: An r package for structural topic models,” Journal of Statistical Software, vol. 91, no. 1, pp. 1–40, 2019.
[25] S. Arora, R. Ge, Y. Halpern, D. Mimno, A. Moitra, D. Sontag, Y. Wu, and M. Zhu, “A practical algorithm for topic modeling with provable guarantees,” in International Conference on Machine Learning. PMLR, 2013, pp. 280–288.
[26] B. Zafari and T. Ekin, “Topic modelling for medical prescription fraud and abuse detection,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 68, no. 3, pp. 751–769, 2019.
[27] P. DiMaggio, M. Nag, and D. Blei, “Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of us government arts funding,” Poetics, vol. 41, no. 6, pp. 570–606, 2013.
[28] D. M. Blei, J. D. Lafferty et al., “A correlated topic model of science,” The Annals of Applied Statistics, vol. 1, no. 1, pp. 17–35, 2007.
[29] M. Tumminello, T. Aste, T. Di Matteo, and R. N. Mantegna, “A tool for filtering information in complex systems,” Proceedings of the National Academy of Sciences, vol. 102, no. 30, pp. 10421–10426, 2005.
[30] Y. Lu, X. Mao, T. Wang, G. Yin, Z. Li, and W. Wang, “Studying in the “bazaar”: An exploratory study of crowdsourced learning in github,” IEEE Access, vol. 7, pp. 58930–58944, 2019.
[31] M. Papoutsoglou, N. Mittas, and L. Angelis, “Mining people analytics from stackoverflow job advertisements,” in 2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA). IEEE, 2017, pp. 108–115.
[32] D. Nafus, “‘patches don’t have gender’: What is not open in open source software,” New Media & Society, vol. 14, no. 4, pp. 669–683, 2012.
[33] A. May, J. Wachs, and A. Hannák, “Gender differences in participation and reward on stack overflow,” Empirical Software Engineering, vol. 24, no. 4, pp. 1997–2019, 2019.
[34] G. Catolino, F. Palomba, D. A. Tamburri, A. Serebrenik, and F. Ferrucci, “Gender diversity and women in software teams: How do they affect community smells?” in 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Society (ICSE-SEIS). IEEE, 2019, pp. 11–20.
[35] M. Cataldo and S. Nambiar, “The impact of geographic distribution and the nature of technical coupling on the quality of global software development projects,” Journal of software: Evolution and Process, vol. 24, no. 2, pp. 153–168, 2012.
[36] G. A. A. Prana, D. Ford, A. Rastogi, D. Lo, R. Purandare, and N. Nagappan, “Including everyone, everywhere: Understanding opportunities and challenges of geographic gender-inclusion in oss,” arXiv preprint arXiv:2010.00822, 2020.
[37] L. Moldon, M. Strohmaier, and J. Wachs, “How gamification affects software developers: Cautionary evidence from a natural experiment on github,” in 43rd International Conference on Software Engineering (ICSE 2021), 2021.
[38] A. Lima, L. Rossi, and M. Musolesi, “Coding together at scale: Github as a collaborative social network,” in Proceedings of the International AAAI Conference on Web and Social Media, vol. 8, no. 1, 2014.
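As a supplementary illustration of the bigram trimming step in the topic modeling setup (Section III), the following minimal Python sketch keeps only bigrams that meet both a document-frequency and a total-frequency threshold. The study itself used the Quanteda R package; the function names, the toy corpus, and the reading of the two thresholds as a joint requirement are assumptions made for illustration, not a reproduction of the original pipeline.

```python
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs joined by a single space."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def trim_bigrams(docs, min_docfreq, min_termfreq):
    """Keep bigrams that occur in at least `min_docfreq` documents AND
    at least `min_termfreq` times overall (assumed interpretation)."""
    doc_bigrams = [bigrams(tokens) for tokens in docs]
    # Total occurrences across the corpus.
    term_freq = Counter(b for doc in doc_bigrams for b in doc)
    # Number of distinct documents containing each bigram.
    doc_freq = Counter(b for doc in doc_bigrams for b in set(doc))
    keep = {
        b for b in term_freq
        if doc_freq[b] >= min_docfreq and term_freq[b] >= min_termfreq
    }
    return [[b for b in doc if b in keep] for doc in doc_bigrams]


# Hypothetical toy corpus of already stemmed, stopword-free tokens.
docs = [
    ["unit", "test", "write", "test"],
    ["write", "test", "unit", "test"],
    ["unit", "test", "suit"],
]
trimmed = trim_bigrams(docs, min_docfreq=2, min_termfreq=3)
print(trimmed)
```

On the full corpus, the thresholds reported in Section III would correspond to a call such as `trim_bigrams(docs, min_docfreq=90, min_termfreq=200)`.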
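The network filtering in Section IV-A builds on the maximum spanning tree (MST). The sketch below implements only that simpler baseline, using Kruskal's algorithm with a union-find structure; it is not the PMFG itself. The PMFG follows the same greedy loop over edges sorted by decreasing correlation, but accepts an edge whenever the graph remains planar rather than acyclic, which requires a planarity test that is omitted here. The function name and the toy correlations are hypothetical.

```python
def maximum_spanning_tree(n_nodes, weighted_edges):
    """Greedy (Kruskal) maximum spanning tree.

    `weighted_edges` is an iterable of (weight, u, v) tuples with
    node ids in range(n_nodes). Returns kept edges as (u, v, weight).
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    # Visit the strongest correlations first.
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding the edge creates no cycle
            parent[root_u] = root_v
            kept.append((u, v, weight))
    return kept


# Hypothetical topic-topic correlations for four topics.
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.7, 0, 2), (0.4, 2, 3), (0.1, 0, 3)]
tree = maximum_spanning_tree(4, edges)
print(tree)
```

For a network of n nodes an MST keeps n − 1 edges, while a PMFG keeps 3(n − 2); for the 30 topics in Table I that is 29 versus 84 edges, which is why the PMFG is denser while still discarding most of the 435 possible topic pairs.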
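The overlap check against the Fang et al. dataset in Section IV-B amounts to a set difference over linked account pairs. The sketch below assumes each linkage is represented as a (GitHub login, Twitter handle) tuple and compared case-insensitively; that representation, the function name, and the toy records are hypothetical and are not taken from the replication materials.

```python
def new_linkages(ours, reference):
    """Return the linked pairs in `ours` that are absent from
    `reference`, together with their share of `ours`."""

    def normalize(pairs):
        # Assumption: account names are compared case-insensitively.
        return {(github.lower(), twitter.lower()) for github, twitter in pairs}

    ours_norm = normalize(ours)
    unseen = ours_norm - normalize(reference)
    return unseen, len(unseen) / len(ours_norm)


# Hypothetical toy records: (GitHub login, Twitter handle).
ours = [("alice", "Alice_dev"), ("bob", "bobcodes"), ("carol", "carol")]
reference = [("Alice", "alice_dev")]
unseen, share = new_linkages(ours, reference)

# Counts reported in Section IV-B: 13,027 of 15,316 linked users were new.
reported_share = 13027 / 15316
print(sorted(unseen), round(share, 3), round(reported_share * 100))
```

The last line recomputes the reported figure from the two counts given in the text, which rounds to 85%.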