Learning to program using online forums: A comparison of links posted on Reddit and Stack Overflow

by Caroline D. Hardin

A thesis submitted in partial fulfillment of the requirements for the degree of

Master’s (Curriculum and Instruction) at the

UNIVERSITY OF WISCONSIN-MADISON 2015

Date of the final oral examination: 8/19/2015

Abstract:

The Internet offers a vast number of sites with content for learning programming. While the United States struggles with a mismatch between the number of people who want to learn computer programming and the formal educational opportunities to do so, online informal communities of learning are growing. Online educational resources also have the potential to help mitigate some of the diversity issues plaguing traditional CS education. While popular press articles and Google searches are one way to sift through the many online learning opportunities available, a more authentic measure may be to examine the links posted on technical forums. I wrote a program to scrape and analyze thousands of posts on two of the most popular Internet forums - Reddit and Stack Overflow - and compare how they differ in answering the question of ‘how to get started learning to program’. Understanding how these communities talk about the available online resources demonstrates the different interests and priorities each community has about what is desirable to learn. In addition, these findings have many practical implications for those interested in computer science education, whether as learners, teachers, or creators of educational content.

Introduction: high demand and low access in computer science education

If we look at the six month average for Google searches for the phrase ‘learn programming’ from 2007 to the present day, we see an impressive 54% increase (Google Trends, n.d.). The number of high school students taking the AP Computer Science exam has also been increasing over the last 15 years (Ericson, 2014).

Figure 1: The number of students taking AP CS exam has increased by more than three times since 1998

At this time of rapidly increasing interest in computer science education we are also facing a shortage of traditional, formal educational opportunities for learning it. In 2013, out of approximately 37,000 high schools (US Department of Ed, n.d.), only 2,246 participated in the AP CS exam (Ericson, 2014), which “is a reasonable proxy for the number of AP CS teachers in the country” (Guzdial, 2014b). For a national population of about 16 million high school students, only 29,555 took the AP CS exam (US Department of Ed, n.d.; Ericson, 2014). Cassidy (2013) placed the percent of high schools offering any computer science at less than 10%. Despite these limitations the number of students taking the exam continues to increase.

Computers play an increasingly important role in most aspects of our society, including education, career development, civic participation, entertainment, and creative expression. It must be recognized, however, that how computers can be used depends largely on the computer programs available, and that is often dependent on who is making the programs. Therefore, the best way to ensure the availability of computer programs which serve the widest possible range of people is to have a wide range of diversity among those who create the programs.

The availability of computer science education is often limited in ways which restrict diversity. For example, its availability is more prevalent in areas of higher socioeconomic status, as shown in Figure 2. The fact that high school classes are not serving a diverse population can also be observed by analyzing which students take the AP CS test (Ericson, 2014; Margolis et al., 2012).

Limited access to CS education in primary and secondary school has an impact on who is likely to be successful in college CS courses, as these courses often function on a model which “presumes prior experience in order for the student to succeed” (Guzdial, 2014b). The expectation that students have prior experience is built into the classes partially because colleges and universities have more students interested than they are able to accommodate (some schools accept fewer than 20% of applicants (Lohr, 2015)), causing “students to characterize introductory courses as weed-out courses” (Lewis, Yasuhara, & Anderson, 2011, p. 10). Subsequently, students are less likely to major in CS if they have no prior experience (Lewis et al., 2011, p. 10).

Figure 2: See how, as the median income increases, the number of students who take the AP CS exam increases. Data from (Ericson, 2014; US Census Bureau, n.d.)

Furthermore, for these students without prior experience college CS courses can be intimidating, as the students face a classroom culture which places high status on demonstrating prior experience. In order to improve the chances for these students to persist and become successful in computer science it is critical to expand prior opportunities for exposure to computer science concepts.

In addition, the shortage of opportunities to learn CS from traditional, formal pathways contributes to a cultural friction between those who had those opportunities and those who are self-taught, as “problems can arise when students confuse the source of knowledge that can lead to high status: intelligence versus experience. This is especially problematic for those with less experience, a group to which most female CS students belong” (Barker, Garvin-Doxas, & Jackson, 2002, p. 45). This is compounded by the tensions around ‘fixed mindset’ versus ‘growth mindset’ - the theory that people are on a spectrum as far as how much they believe their skills and talents are innate and immutable versus how much they believe they can grow and learn new skills (Dweck, 2006). In CS educational settings this is often manifested as a belief in a ‘geek gene’ (Lee, Heeter, Magerko, & Medler, 2012). These tensions can be seen in the online communities of people learning programming. As one user posted on Reddit,

“Going to school for computer science is not going to help you, I have worked in net sec my whole life starting at 18 and I can tell you in full confidence that it may lower your chances of getting the job you seek, the hard core net sec guys grew up caring about this stuff and look at those who are going to school for it outside their circles. . . ..Going to school for CS tells those who live and breath this field, the ones making 6 figures that the person did not know this field at the start of college and 2-4 years is not going to fix that learning from a bad system. . . .everyone who is entering college now to learn this is going to end up working under those we grew up doing this...” [sic]

The question ‘how do I get started programming?’ is so common on the www.reddit.com/r/learnprogramming community that it has a special ‘frequently asked questions’ page to answer it, saying it is “by far the most asked question on this subreddit” (Reddit.com, n.d.-a). At the top of the FAQ page is a link to where Cohen (2011) argues that the very premise of the question is passive, and he suggests the answer to ‘how do I learn to code’ is to, well, ‘learn to code’. This prevalence on forums of the inquiry ‘how do I get started?’, therefore, raises questions which have not yet been addressed by other research about how the millions of participants on these forums think about what it means to learn programming.

As such, the primary research question is:

• How do online programming forums function as learning resources?

More specifically, the thesis focuses on two sub-questions:

• How do the online resources suggested by the Reddit and Stack Overflow communities differ?

• What does that difference suggest about how people learn to program with these resources?

In order to answer the research questions, this work collected and counted the links which were posted in two forums (Reddit, Stack Overflow) for learning programming, and then analyzed how often the site names were used in conversation in order to find patterns of activity which demonstrate the learning strategies and values of these communities. With the proliferation of online resources for learning programming and a burgeoning audience for these resources, the results have implications for both students and teachers.

Why a focus on programming?

Although programming, or coding, is only a small part of what comprises the discipline of computer science, it does play a disproportionate role in what the public thinks computer science is, and which computer science skills they think are useful and important to know (Lewis et al., 2011). Due to this perception, this work focuses on the large proliferation of online resources designed to specifically teach programming, including: blogs, podcasts, Wikipedia articles, YouTube tutorials, animations, MOOCs, games, interactive puzzles, competitions, video mentoring, forums for asking and answering questions, collaborative projects, and more. These resources complement traditional avenues - classes and textbooks - for learning programming.

What online sites exist for learning programming?

While it would be impossible to count the total number of sites which offer programming educational content, some have raised millions in funding and garnered significant press. For example, Wortham (2012) states, “The sites and services catering to the learn-to-program market number in the dozens and have names like Code Racer, Women Who Code, Rails for Zombies and CoderDojo. But at the center of the recent frenzy in this field is . . . Since the service was introduced [in 2011], more than a million people have signed up, and it has raised nearly $3 million in venture financing”.

Other sites such as Code.org, which itself has raised over 17 million dollars in funding (Code.org Team, 2015; GuideStar, 2015), have been promoted by no less than the President of the United States (Mechaber, 2014). However, “the challenge for Codecademy and others catering to the hunger for technical knowledge is making sure people actually learn something, rather than dabble in a few basic lessons or walk away in frustration” (Wortham, 2012). This raises the question of how to evaluate the effective impact of the various sites. The larger question of whether people can learn skills online is beyond the scope of this work. One way of measuring impact is to examine the differences between the best-known/best-funded resources, and the resources discussed the most by online forums.

Another reason online resources were the focus of this work is the potential they have to address the access and equity issues discussed earlier. While the number of computer science classes cannot be quickly scaled, online resources are particularly well positioned to fill the critical gap between a need for more people with programming experience (Lohr, 2015) and the shortage of learning opportunities. Although Internet access also has some equity issues, online resources have several structural advantages over traditional classes and textbooks. A great diversity of formats exists for these resources and the vast majority are free and self-paced, making them more accessible for people with limited resources, non-traditional educational backgrounds, complicated work/life schedules, and low access to transportation options.

Why do people use forums to learn?

Learning a programming language for the first time can feel intimidating for novices (Bonar & Soloway, 1983), and online resources offer beginners protection from the ‘presumption of expertise’ especially prevalent in face-to-face computer science classes (Barker et al., 2002; Guzdial, 2014a). Online resources also provide them an environment where they can explore, experiment, and still ‘save face’ when mistakes are made (Guzdial, 2014a). By being interactive, and frequently playful, online resources are engaging and motivating, and many offer just-in-time guidance alongside entertaining projects. In addition, the social element built into many of these sites can help learners overcome the ‘culture of isolation’ which is a major source of discouragement for students considering computer science (Lewis et al., 2011).

Because forums for learning programming are complex communities of aspirants, learners, practitioners, and professionals who self-organize to share resources, answer questions, socialize, and offer encouragement, data from the two most popular online discussion forums in the United States was examined: Reddit and Stack Overflow. Reddit is the most popular discussion forum on the Internet, and the 10th most popular site overall with over 160 million monthly visitors (Alexa Internet Inc, 2015; Isaac, 2015). Stack Overflow is the 2nd most popular discussion forum and 42nd overall most popular site (Alexa Internet Inc, 2015).

The two communities differ in their focus. Reddit is “a place where, on a typical day, millions of users read and share content in over 9,600 self-contained communities called ‘subreddits’” (Chen, 2015; Isaac, 2015; Lynch & Swearingen, 2015). Each of these subreddits has its own topic, culture, norms, in-jokes, and moderators.

This work focused on the ‘learn programming’ subreddit (http://www.reddit.com/r/learnprogramming) as it is the most relevant to the topic at hand (following Reddit convention, and to make for easier reading, the Reddit subreddit ‘learnprogramming’ will be referred to as ‘/r/learnprogramming’ for the remainder of this work). While most subreddits are open for the public to read, many require users to ‘subscribe’ before posting or commenting on a thread; /r/learnprogramming has over 212,000 subscribers (as of the writing of this work).

Stack Overflow has 32 million monthly visitors and 4.5 million registered users, with the average person visiting Stack Overflow returning six times a month (, 2015). Stack Overflow is self-described as “a question and answer site for professional and enthusiast programmers” with 9.8 million questions, 74% of which have answers (Stack Exchange, 2015, n.d.-a). Its format and mechanics have emerged as the standard to follow, with over 80 other question and answer sites emulating it (Anderson, Huttenlocher, Kleinberg, & Leskovec, 2012). Stack Overflow’s role in the community of programmers can be summarized by a quote on their annual survey: “Code is everywhere, and just about every coder uses Stack Overflow.” (Stack Exchange, 2015)

An additional benefit of this study’s focus on online forums is their dynamic, unique combination of scale and specialization, and the authentic role they play (Shaffer & Resnick, 1999). The top three motivations for participating on Reddit.com, as reported by its users, are entertainment, curiosity, and information quantity. Information consumption comes in seventh. Each subreddit has its own rules about posting, commenting, and what content is considered relevant. Most Reddit communities, however, contain quite a bit of playfulness, joking, and socialization alongside the more on-topic discussions and debates (Massanari, 2013). Stack Overflow, on the other hand, is clear about its focus: “This site is all about getting answers. It’s not a discussion forum. There’s no chit-chat” and later, “Avoid questions that are primarily opinion-based, or that are likely to generate discussion rather than answers” (Stack Exchange, n.d.-b). Its moderators actively and vigorously enforce the quality of the questions asked. Between moderator deletions and self-deletions (users often delete posts which are negatively affecting their reputation) up to eight percent of questions are removed (Correa & Sureka, 2014). Therefore, studying the data from the two sites gives a useful blend of how people use forums for both informative and social purposes.

By virtue of being publicly available message boards, it was possible to get a wealth of data (a total of 10,943 links from the two sites) offering a detailed picture of these interest-driven groups. The size of the community creating these posts is also significant; these two sites had a combined audience of over 4.7 million registered users, in addition to many unregistered readers. These millions of users have self-selected as interested in learning more about computer programming and are participating in the forums as part of their authentic activity.

Reddit’s audience comes from over 195 different countries (Singer, Flöck, Meinhart, Zeitfogel, & Strohmaier, 2014); however, surveys put between 82.8% and 85% of Reddit users in the US, Canada, the UK, and Australia (Bogers & Wernersen, 2014; Reddit.com, n.d.-e). The demographics are skewed young and male, with “Some 15% of male Internet users ages 18-29 say that they use Reddit, compared with 5% of women in the same age range and 8% of men ages 30-49” (Duggan & Smith, 2013). However, other surveys found gender ratios of 49% female and 51% male (Bogers & Wernersen, 2014) or 18.9% female and 81.1% male (Reddit.com, n.d.-e).

In comparison, the population of the Stack Overflow community comes from over 157 countries, with a survey putting top user origins at 25% United States, 12.5% India, 5.5% UK, and 4.2% Germany (Stack Exchange, 2015). The mean age of a Stack Overflow user is between 28.9 and 29.02 (with a standard deviation of 7) (Stack Exchange, 2015; Morrison & Murphy-Hill, 2013). The gender diversity is low, with 92.1% of survey takers being male (Stack Exchange, 2015), but with 37% of women on Stack Overflow having only two years of experience or fewer, and 67% having fewer than five years, women who are entering the field are using the site. Between 13.6% and 15.9% of Stack Overflow users are students, and fully 41.8% of users report being self-taught software developers.

Literature Review

Figure 3: How the developers on Stack Overflow learned to code (Stack Exchange, 2015)

Stack Overflow and /r/learnprogramming are structured around a group of central participants in the model of legitimate peripheral participation (Lave & Wenger, 1992). This is demonstrated by users who move into increasing circles of participation: on Reddit 70% of users are active voters, 37% are active commenters and 12% are active posters (Bogers & Wernersen, 2014; Guzdial, 2014a; Lave & Wenger, 1992). The most centrally active group of users is the volunteer moderators, upon which each site depends, although how they are managed differs between Stack Overflow and Reddit. The former gives users increasing levels of power as they accumulate reputation points. At the top are 18 democratically elected moderators who pledge to follow a detailed job description, including a ‘philosophy of moderation’ and a formal ‘moderating agreement’ (Stack Exchange, n.d.-c). Stack Overflow summarizes it thusly: “At the high end of this reputation spectrum there is little difference between users with high reputation and moderators. That is intentional. We don’t run this site. The community does.” (Stack Exchange, n.d.-c).

Reddit, on the other hand, grants moderator status to anyone who founds a subreddit (or anyone the founder subsequently designates) for that subreddit only. Other than five basic ‘rules of reddit’ (listed in footnote 1), moderators have complete prerogative within their subreddit. At its best, this policy fosters the culture which has made it so popular: self-governing communities which champion a free exchange of opinions within. At its worst, however, “Reddit is governed, insofar as it’s governed at all, by a cabal of high-powered moderators who coordinate with administrators in private forums. . . ..A convoluted moderator culture has developed, full of intrigue and drama.” (Chen, 2015).

Each site has a mechanic which numerically tracks a user’s reputation to rank certain types of contributions according to what value they add to the community and how closely they adhere to community guidelines. Stack Overflow says, “Reputation is a rough measurement of how much the community trusts you; it is earned by convincing your peers that you know what you’re talking about” (Stack Exchange, n.d.-c), while Reddit describes their ‘karma’ as “reflect[ing] how much good the user has done for the reddit community” (Reddit.com, n.d.-a). This benefits the community both by encouraging the creation of high value content and making this higher-valued content easier to find. The reputation point system often takes on game-like behavior (Jenkins, 2009; Massanari, 2013). The Reddit Frequently Asked Questions answers “Why should I try to accumulate karma?” with “Why should you try to score points in a video game?” (Reddit.com, n.d.-a).

Footnote 1: 1) Don’t spam. 2) Don’t ask for votes or engage in vote manipulation. 3) Don’t post personal information. 4) No child pornography or sexually suggestive content featuring minors. 5) Don’t break the site or do anything that interferes with normal use of the site. (Reddit.com, n.d.-d)

Methodology

Both communities are free, anonymous, and allow the public to browse, only requiring registration to post. Discovering which sites for learning programming are most often linked to and discussed on Reddit and Stack Overflow required slightly different techniques.

Reddit.com has an application programming interface (API) which allows programs to be written to request up to 1000 posts which meet the program’s specifications. I wrote a Python program using PRAW (Boe, n.d.) to help make the maximum request for posts, and all their related comments, from /r/learnprogramming on June 19th, 2015. This request returned 78,561 posts containing 9,526 URLs.

These posts came from the ‘Top’ filtered posts, which are the posts with the most ‘up’ votes from the /r/learnprogramming community. The Reddit voting mechanic allows community members to vote content ‘up’ or ‘down’, with the cumulative score of a post or comment determining its visibility. Reddit was originally envisioned as a site to “capture and rank all kinds of diverse content collected from the Web by promoting the best parts via its voting process.” (Singer et al., 2014). By looking at the ‘Top’ voted posts on /r/learnprogramming, we are able to see which resources across the Internet are of the greatest interest to the community.

Stack Overflow releases its data weekly to http://data.stackexchange.com/stackoverflow/queries. The data for this work was retrieved on April 14, 2015 and so drew from all the posts on Stack Overflow from April 6th and earlier. Stack Overflow answers all manner of programming questions, but to make the best comparison with /r/learnprogramming, an effort was made to get posts which were focused on beginning to learn programming. Therefore, Structured Query Language (SQL) was used to filter these posts. After some experimentation, it was discovered that the best SQL query to get relevant data was:

select Id as [Post Link], Body, Score from Posts where (Title like '%get started%' OR Title like '%learn%') and ParentId is null

which retrieved the text and score of posts and comments where the original question had the phrase ‘get started’ or ‘learn’ in it. 3,087 posts were returned, which contained 1,276 URLs.
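The effect of this title filter can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module and a made-up Posts table rather than the Stack Exchange Data Explorer, so the sample rows and the name filter_beginner_questions are assumptions for illustration only, and the SQL dialect may behave slightly differently from the one the Data Explorer runs.

```python
import sqlite3

# Same WHERE clause as the query above; ORDER BY is added only so the
# sketch returns rows in a predictable order.
QUERY = """
SELECT Id, Body, Score
FROM Posts
WHERE (Title LIKE '%get started%' OR Title LIKE '%learn%')
  AND ParentId IS NULL
ORDER BY Id
"""

# Hypothetical rows: (Id, ParentId, Title, Body, Score).
SAMPLE_ROWS = [
    (1, None, "How do I get started with Node.js", "question body", 10),
    (2, 1, None, "an answer to post 1", 5),
    (3, None, "Why is my loop slow?", "unrelated question", 2),
    (4, None, "Best resources to learn JavaScript", "question body", 7),
    (5, None, "Scikit-learn import error", "off-topic match", 1),
]


def filter_beginner_questions(rows):
    """Load the rows into an in-memory table and apply the title filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Posts "
        "(Id INTEGER, ParentId INTEGER, Title TEXT, Body TEXT, Score INTEGER)"
    )
    conn.executemany("INSERT INTO Posts VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

In this sketch, posts 1, 4, and 5 are returned. Post 5 shows that a substring match on ‘learn’ also picks up titles such as ‘Scikit-learn’, which is why such posts need separate handling.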

Another Python script was used to count the number of hostnames (for the full URL en.wikipedia.org/wiki/AlanTuring, the hostname would be ‘wikipedia’), both as clickable links and as text mentions in the Stack Overflow and Reddit data. The scripts differed only to accommodate differences in the input data; the Stack Overflow script had a .csv (‘comma separated value’) file from the Stack Exchange database as its input, while the Reddit script had to make a request directly to Reddit and create its .csv from the results. Both input data files were cleaned to remove punctuation, capitalization, and other artifacts. In addition, the text ‘stack overflow’ was replaced with ‘stackoverflow’ to make counting easier. With 78,561 posts from Reddit and 3,087 from Stack Overflow, I used the specially formatted piece of ‘regular expression’ code below (Free Software Foundation Inc., n.d.) to automatically sort through the posts and look for text formatted like a URL:

egrep -o "http://([A-Za-z0-9-]+\.)+[A-Za-z]+(/)*([[:alnum:]/._])*"
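For readers more familiar with Python than with egrep, the same pattern can be sketched with the standard re module. This is a translation for illustration, not the script used in this work: the POSIX class [[:alnum:]] is written out as A-Za-z0-9, and the names URL_PATTERN and extract_urls are hypothetical.

```python
import re

# Python translation of the egrep pattern. Non-capturing groups are used
# so that findall() returns the whole matched URL rather than a group.
URL_PATTERN = re.compile(
    r"http://(?:[A-Za-z0-9-]+\.)+[A-Za-z]+/*[A-Za-z0-9/._]*"
)


def extract_urls(text):
    """Return every substring of text that matches URL_PATTERN, in order."""
    return URL_PATTERN.findall(text)
```

Note that the pattern as printed matches only links beginning with http://, so a link written with https:// would not be matched by this sketch. Characters outside the final character class, such as a hyphen or a question mark in the path, end the match early.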

The results were saved to a file. Next, the Python library tldextract (Kurkowski, n.d.) was used to isolate the hostname. A limitation of this method is that a link would get counted twice if it had been placed in HTML or Markdown formatting and then typed in plain text (this is usually done as a way of indicating exactly where the link would go). This was, however, not found to occur frequently enough to have significant effect on the overall counts. A secondary limitation is the possibility that sites with the same hostname but different top level domains would be combined, such as www.codeacademy.com and www.codeacademy.org both being counted as ‘codeacademy’. Except for the example given, however, no evidence was found of this occurring. Each link was counted individually (e.g., if one comment had three links, all three were counted). Different links which had the same hostname, e.g., https://en.wikipedia.org/wiki/AdaLovelace and https://en.wikipedia.org/wiki/GraceHopper, were both counted under the ‘wikipedia’ hostname.
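The counting step can be sketched as follows. To keep the example free of third-party dependencies, it replaces tldextract with a naive rule (take the label just before the last dot), which is an assumption of this sketch and would mishandle multi-part suffixes such as .co.uk. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def naive_hostname(url):
    """Return the label before the final suffix, e.g. 'wikipedia'.

    Simplifying assumption: the suffix is a single label (.com, .org).
    The thesis used the tldextract library for this step instead.
    """
    host = urlparse(url).netloc.lower()
    labels = host.split(".")
    return labels[-2] if len(labels) >= 2 else host


def count_hostnames(urls):
    """Count every link individually, grouped by its naive hostname."""
    return Counter(naive_hostname(u) for u in urls)
```

With this rule, two different Wikipedia article links are both counted under ‘wikipedia’, and the same name under .com and .org is merged into one entry, which mirrors the limitation described above.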

After all the links were counted, my program made a second pass through the data to count when the hostnames were used in a sentence, as it was not uncommon for someone to discuss a resource without providing a link, e.g., “I found the wikipedia article to be helpful”. These mentions were counted by comment - for example, if someone talks about ‘wikipedia’ three times in a single comment, it is counted as a single mention. In the situation where a site is so well known that it is discussed without any links being provided, my program also looked for the 10 language specific resources (RubyMonk, TryRuby, Hackety Hack, Codecademy, Codeacademy, Eloquent JavaScript, CaveOfProgramming, Udemy, Try Python, LearnPython, Crunchy) listed by the regularly updated /r/learnprogramming FAQ ‘How do I get started with programming’ in the ‘online resources’ section (Reddit.com, n.d.-c) as well as the three resources (Coursera, Udacity and EdX) listed in the ‘Interactive online courses’ section. This seeding included ‘CodeAcademy’, a common misspelling of ‘Codecademy’ (although ‘CodeAcademy’ was the name of a coding boot camp briefly in 2011-2012, it did not appear that any of the discussion referred to it) (Internet Archive, n.d.; Google Trends, n.d.). These sites were selected for seeding as they were the sites prominently featured in the FAQ and were thus viewed as most likely to be well enough known to the community to have high ‘mention’ counts, yet possibly have no links. This premise was borne out by the data, summarized in Figure 4:
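The per-comment counting rule can be sketched as below. The cleaning step follows the description given earlier (lowercasing, removing punctuation, joining ‘stack overflow’), but the details are assumptions: punctuation is replaced with spaces here, only single-word names are handled, and the function names are hypothetical.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase, join 'stack overflow', and turn punctuation into spaces."""
    text = text.lower().replace("stack overflow", "stackoverflow")
    return re.sub(r"[^a-z0-9\s]", " ", text)


def count_mentions(comments, hostnames):
    """Count, for each hostname, how many comments mention it at least once."""
    wanted = set(hostnames)
    mentions = Counter()
    for comment in comments:
        # A set of words means repeats inside one comment count only once.
        words = set(clean(comment).split())
        mentions.update(words & wanted)
    return mentions
```

A comment that says ‘wikipedia’ three times therefore adds one to the ‘wikipedia’ total, while seeded names that never appear simply keep a count of zero.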

Figure 4: Sites from the Reddit learnprogramming FAQ which were used to seed the search for mentions

For the ‘mentions’ data, hostnames which were also common English words were counted disproportionately, so hostnames which were among the top 1000 most common English words were removed (Wikitionary, n.d.). The highest ranking of these included ‘example’, ‘examples’, ‘mean’, ‘free’, and ‘bit’ for Stack Overflow and ‘i’, ‘up’, ‘good’, ‘go’, ‘might’, ‘bit’, ‘example’, ‘free’, ‘us’, ‘play’, ‘stack’, ‘study’, ‘about’, and ‘time’ for Reddit. On both sites ‘apple’ and ‘sun’ were retained because they are also the names of technical companies. Since Scikit-learn posts, collected due to the word ‘learn’ in the title, were not relevant to learning programming, they were removed from the Stack Overflow data. ‘localhost’ was also removed as it does not refer to an external resource. The results, as charts, are shown below; the full dataset (and the code used to scrape it) is available on GitHub at https://github.com/carolinehardin/learnProgrammingByForums and is released under a Creative Commons attribution, non-commercial, share-alike license.
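The filtering rules in this paragraph can be sketched as a single function. The function name, its parameters, and the sample counts in the usage note are hypothetical; only the rules themselves (drop common English words, keep ‘apple’ and ‘sun’, drop ‘localhost’) come from the text.

```python
def filter_mentions(mention_counts, common_words,
                    keep=("apple", "sun"), exclude=("localhost",)):
    """Drop hostnames that are common English words or explicitly excluded.

    Names listed in keep survive even if they are common words.
    """
    blocked = (set(common_words) - set(keep)) | set(exclude)
    return {
        name: count
        for name, count in mention_counts.items()
        if name not in blocked
    }
```

For instance, given made-up counts for ‘example’, ‘apple’, ‘coursera’, and ‘localhost’ and a common-word list containing ‘example’ and ‘apple’, only ‘apple’ and ‘coursera’ remain.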

These methods result in the following charts:

Figure 5: Stack Overflow most linked sites

Figure 6: Reddit most linked sites

Figure 7: Stack Overflow most linked sites by percent

Figure 8: Reddit most linked sites by percent

Figure 9: Stack Overflow most mentioned sites

Figure 10: Reddit most mentioned sites

It can be seen that the ‘mentions’ count shows a number of sites which have low ‘link’ counts - a humorous example is ‘damn.com’ for which it is safe to assume the 420 mentions on /r/learnprogramming are not actually referring to the site. In some cases, this is interesting, such as for Coursera, which demonstrates that the name is so familiar, or considered so Googleable, that links are not deemed necessary to discussing the sites. In other cases, such as ‘answers’ with its one link to www.answers.com, it demonstrates the difficulty of using an automated tool to understand how people talk about sites with names that are also regular English words. To correct for this, any sites where the ‘link’ count was two or fewer were removed to produce Figure 11 and Figure 12. The threshold of two was chosen due to the possibility of a single comment getting a double count if it used the link both in text and in HTML or Markdown (as discussed earlier), and the desire to draw out sites which have sufficient significance to be discussed by at least two different posts.
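The threshold rule can be sketched as follows. The function name is hypothetical, and apart from the 420 mentions of ‘damn’ and the single link for ‘answers’, the numbers in the usage note are invented for illustration.

```python
def mentions_with_enough_links(mention_counts, link_counts, min_links=3):
    """Keep mention counts only for hostnames linked at least min_links times."""
    return {
        name: count
        for name, count in mention_counts.items()
        if link_counts.get(name, 0) >= min_links
    }
```

With mentions of {'damn': 420, 'answers': 30, 'coursera': 100} and link counts of {'answers': 1, 'coursera': 5}, only ‘coursera’ survives: ‘damn’ has no recorded links in this made-up input and ‘answers’ has just one.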

Figure 11: Reddit most mentioned sites for sites with at least 3 ‘links’

Figure 12: Stack Overflow most mentioned sites for sites with at least 3 ‘links’

Comparing the top ten linked sites from Reddit and Stack Overflow gives us Figure 13:

Figure 13: Reddit and Stack Overflow top ten most linked sites

A direct comparison between the ‘mentions’ of Stack Overflow and Reddit for percent of ‘mentions’ does not result in a useful chart due to the differences in which sites meet the minimum threshold of three links.

Analysis and Results

The results of a close analysis of these links to resources, and the discussion around these resources, are of use to several audiences in a number of ways, some of which will be explored below. Students of programming can get ideas for resources to use beyond what they find in Google searches or popular press articles and benefit from an understanding of how these resources are used by other learners and professionals. Learning about the differences between which links are posted on the two sites can help learners understand the two communities and help inform them on how they will get the most benefit out of these sites.

For those who teach CS, and in particular, programming, it can be useful to know which questions come up the most frequently on these forums (and which answers are best regarded), thus giving them insight when addressing the misconceptions of their students. In addition, since sites like Stack Overflow and Reddit play a significant and authentic role in the practices of professional programmers, knowing the highest ranked content on these sites can help instructors integrate this content into their classroom teaching as part of enculturating their students.

Finally, creators, moderators, and participants of Stack Overflow, Reddit, and other online resources for learning CS can benefit from seeing the dynamics this data highlights. Understanding which resources have the attention of these communities of online learners can help content creators increase the appeal of their own sites, find community needs for content which has not yet been created, and possibly better understand which populations they are (and are not) serving as they make progress towards solving diversity issues.

To understand these results fully, it is helpful to look at the typical post titles from the two data sets. Examples of the ‘top’ threads on /r/learnprogramming are:

• I’m 32 years old, and just started my first full-time job as a developer. One year ago my programming knowledge was basically nil. Everything I learned, I found via /r/learnprogramming, so just wanted to share my experience.

• 40 Key Computer Science Concepts Explained In Layman’s Terms

• Here’s a list of 120 free online programming/CS courses (MOOCs) with feedback (i.e. exams/homeworks/assignments) that you can start this month (Jan 2015)

• How I learned to develop Android apps in less than a year

• 1000+ Beginner Programming Projects

In contrast, on Stack Overflow the top ranked questions are:

• How do I get started with Node.js

• Best resources to learn JavaScript

• Good Haskell source to read and learn from

• What are important languages to learn to understand different approaches and concepts?

• Where can I learn jQuery? Is it worth it?

As a sign of the culture of Stack Overflow, 4 of those 5 top questions, each with over 200,000 views, were flagged by the moderators as:

“This question exists because it has historical significance, but it is not considered a good, on-topic question for this site, so please do not use it as evidence that you can ask similar questions here.”

and the 4th one, which is a fundamental question for someone who wants to learn programming, was flagged as:

“As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. If this question can be reworded to fit the rules in the help center, please edit the question.”

That same question, on /r/learnprogramming, is not only welcomed, it is the #2 question on the ‘Frequently Asked Questions’ page, where the answer begins with a discussion of the misconceptions people bring to the question and follows with links to helpful resources.

The data gathered from these posts is therefore valuable in many ways for understanding how online communities of programming learners think about what is important to know and how they interact with the resources available on the Internet. Another pattern the graphs highlight is how few sites dominate the conversation. For Reddit, 9% of the unique hostnames accounted for 70% of all links posted. Stack Overflow had a wider diversity of links, but was still heavily weighted towards just a few sites, with 26% of hostnames accounting for 70% of all URLs. The chart shows how sharply the link count drops off. This shows that the discourse in technical forums tends to focus on relatively few sites.
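As an illustration of how a concentration figure of this kind can be computed, the following is a minimal sketch in Python. It is not the program used for this thesis: the function name `hostname_concentration` is hypothetical, and it uses the standard library's `urlparse` to extract hostnames, which (unlike a dedicated domain extractor) treats `www.example.com` and `example.com` as different hosts.

```python
from collections import Counter
from urllib.parse import urlparse


def hostname_concentration(urls, share=0.70):
    """Return the fraction of unique hostnames needed to cover `share` of all links.

    Hypothetical helper for illustration only. Hostnames are taken with
    urlparse, so subdomains count as separate hosts.
    """
    counts = Counter(urlparse(u).hostname for u in urls)
    counts.pop(None, None)  # drop URLs with no parseable hostname
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = 0
    # Walk hostnames from most to least linked until the target share is reached.
    for needed, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= share * total:
            return needed / len(counts)
    return 1.0
```

A result of 0.09 from such a function would correspond to the statement that 9% of the unique hostnames account for 70% of all links.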

But the sites which dominate the conversation on forums for learning programming are strikingly different from those highlighted in popular media. The New York Times, for example, recommends Codecademy, Lightbot, Hopscotch & Udacity, sites whose paucity of links can be seen in Chart 1 (Eaton, 2014). A Google search for “best sites to learn programming” returns articles with long lists of recommended sites, such as “Top 10 Websites to Learn Coding (Interactively) Online” listing Codecademy, Code Avengers, Code School, Treehouse, Learn Street, Udacity, CodeHS, Khan Academy, Scratch, and SQLZoo (Falcon, n.d.). Other sites with long lists of resources to learn programming (126 resources are listed on ‘The Best Websites to Learn Coding Online’ alone) are similar in how little their lists resemble the top posted links on Reddit and Stack Overflow (Agarwal, 2015). Considering the amount of money and attention being given to new websites that promise to teach computer science, these results indicate that there is little correlation between the money spent and acceptance by the community of online learners, as indicated by how little attention these sites receive in the everyday discourse of technical forums.

So while Reddit has a higher post-to-link ratio (8.25) than Stack Overflow (2.41), if we dig deeper we see that a full 19.9% of Stack Overflow’s links point back to itself, compared to only 17.6% on Reddit. In fact, Reddit was originally designed to be a social portal site which linked to other resources: “The majority of submissions has traditionally been links to external content, as the initial idea of the platform was solely to share links and vote on them” (Singer et al., 2014). Stack Overflow, on the other hand, makes it a community priority to answer technical questions fully on its own site and to avoid repeated questions by referring any question seen as a duplicate to the existing answer, thereby making links back to other Stack Overflow pages the most prevalent links. This same emphasis on streamlining is not evident on Reddit, where 52% of the most popular links receive little attention the first time they are posted (Gilbert, 2013), and where the average number of times each hostname was posted was 4.52, compared to 3.09 for Stack Overflow. Beginners sometimes struggle with this efficiency of Stack Overflow, as indicated by this quote from /r/learnprogramming:

“I’m aspiring programmer and . . . I give up with using stackoverflow for this reasons. I’m new to Java and asked probably silly simple question over there. Soon my question was marked as already answered with links to other topics, where subject was connected to my question, but with my low knowledge I wouldn’t be able to find it.”[sic]

A more experienced programmer’s perspective is:

“I’d add that your perspective on respecting experts’ time is why Stack Overflow is regarded as elitist, and also why that view is not universally correct. There are bound to be examples of elitism there, of course, but the broad problem is that (some) beginners seem to want more experienced coders to work for free for them. Those folks - not all beginners, I should emphasise - seem to think the existence of a helping community means they are owed an answer, and do not have to undertake any learning or hard work themselves.”

We can also see that Stack Overflow is more strictly fact- and answer-based, while Reddit foregrounds socialization and playfulness, by comparing the data in the charts. A large number of links on /r/learnprogramming (32.5% of all links) are to Imgur, an image hosting site which was designed for Reddit and is often used by the Reddit community to post humorous pictures (Singer et al., 2014). Stack Overflow has far fewer Imgur links (only 3.56% of all links), which is an indication that Stack Overflow is a more strictly on-topic community. Some of this difference may also be explained by Stack Overflow having its own internal hosting for images directly within posts, and some Reddit users did use Imgur for more on-topic screenshots. The playful and social dimensions of Reddit are also demonstrated by the high rank of sites such as ‘programarcadegames’ and ‘xkcd’, a web comic which frequently features technical topics. The comic is often difficult for beginners to understand (see example below), which generates the entries for ‘explainxkcd’ and ‘xkcdref’. Stack Overflow, on the other hand, has no links to xkcd and only one mention of it. Another high ranking site on the list of Reddit links is CodingHorror.com (created, incidentally, by the co-founder of Stack Overflow), a blog that focuses more on the culture of working in the software industry than on giving answers to specific questions.

Figure 14: The most linked XKCD comic on /r/learnprogramming is https://xkcd.com/327/ (tied with http://xkcd.com/292/)

Stack Overflow is commonly the first resort for people with specific programming questions, as exemplified by this quote from /r/learnprogramming:

“I love when searching something about C# on the internet. You can see lots of different approaches while people try to hammer each other to get the absolute best. You know where i am talking about right now, The Mighty Stackoverflow. As many of you know, when you have a question about something with programming you probably come across with stackoverflow.”[sic]

This strategy does have its detractors, however, and is sometimes considered a crutch for:

“... people who are literally clueless about development. People who have a problem, Google for the solution, and copy+paste the result from Stack Overflow. They keep doing this, until eventually one component breaks. And because they had no idea how the code worked in the first place, they can’t fix it.”

And despite Stack Overflow’s reputation as the place to go for answers, the /r/learnprogramming community sees itself as an important counterbalance to Stack Overflow:

“There should be only 3 types of people on this subreddit IMO [in my opinion]. 1) people who need help or direction. 2) People wanting to help others and grow a better community 3) The curious. I’ll admit there are some things posted that are ridiculous but I am pretty sure we have a back button for a reason. Elitists do not belong in a learning environment. This is not stack overflow....”

And

“the specificity of this sub[reddit] is what can sometimes prevent people from getting much out of it. Of course, most questions benefit the OP [original poster], but many of these cases the questions would be answered in a more detailed manner on stackoverflow.

With this in mind, however, offering a platform for questions on reddit can be better suited to people in that they find it more welcoming; I know when I first started out, sites like stackoverflow were very intimidating, so it’s good that at least some people can get help here.”

This suggests that beginner programmers would benefit from an awareness of the dominant culture of Stack Overflow in order to not be discouraged. The /r/learnprogramming community coaches their users on how to use Stack Overflow:

“Do all the research you can before you ask a question. The people on Stack Overflow can be brutal.”

And

“Stack Overflow has a *lot* of elitists. There are tons of people there who will ridicule you for being new, or not knowing absolutely everything about some subject. But don’t let this scare you. Usually, someone will give you a thorough explanation with examples and advice. These are so invaluable, it makes it worth dealing with the assholes that will inevitably show up.”

If their questions don’t meet the strict Stack Overflow criteria, beginners would also benefit from knowing about /r/learnprogramming or alternative sites on the Stack Exchange network such as the separate ‘programmers’, ‘code review’, or ‘computer science’ lists. In addition, people making other question and answer forums for learning programming would do well to be aware of the different cultures which have come up around /r/learnprogramming and Stack Overflow.

While the /r/learnprogramming community may both defer to, and be wary of, the Stack Overflow community, they use it a lot. It is the 6th most linked site and the 15th most mentioned site (for sites with at least three links) on /r/learnprogramming. The regard is not returned, however. Stack Overflow does not link to Reddit even once in the data gathered. Reddit is mentioned in text twice; both mentions are in discussions about Reddit’s technical design by people setting up their own sites.

Another example of the interesting patterns which this method of gathering and analyzing data highlights is the position of links to ‘w3schools’. While it makes the top 10 most linked sites for Stack Overflow (at 9th position), its position in the 19th spot for Reddit shows that the Reddit community discusses it significantly less. In fact, the /r/learnprogramming wiki page which offers basic guidelines for the community has a section titled ‘Discouraged Resources’ (“Some material does more harm than good. Here’s some often-recommended offenders”) with W3schools topping the list (Reddit.com, n.d.-b).

Finally, despite the vast number of online resources and the community’s clear enthusiasm for them, traditional resources such as books still play a major role in the conversation at Reddit, as evidenced by Amazon.com being the 4th most linked and 12th most mentioned site (for sites with at least three links). Stack Overflow, on the other hand, has Amazon at a dismal 75th most linked, with only two links (and only one referring to the book section of Amazon). Besides being a further indication that Reddit is happy to serve as a director to other resources while Stack Overflow attempts to answer questions without redirection, this type of information is of use to teachers and learners. By analyzing these links to Amazon more closely, it is possible to say which books are most discussed on /r/learnprogramming. The top six are listed in Figure 15.

Figure 15: Amazon.com books most linked to from /r/learnprogramming

Limitations and Future work

One limitation of counting ‘mentions’ is that text appearances are less informative for resources whose name is also the name of something other than a website. An example of this in the chart is ‘code’ having a high mention count when it is clear that few of these refer to the website ‘code.com’. Other examples include ‘java’, ‘php’, ‘python’, ‘asp’, ‘android’, etc. When ‘python’ is used in a sentence, for example, it may indeed refer to the Python website as a resource, or it may refer to the programming language. It would not be trivial to write an automated way to disambiguate which uses are strictly references to the website, so for these terms, the data on text appearances is ambiguous.
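The ambiguity described here can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the thesis's actual mention counter: the function `split_mentions` and the simple URL pattern are hypothetical. It separates appearances of a name inside URLs, which are unambiguous references to a site, from appearances in ordinary text, which may refer either to the site or to something else, such as a programming language.

```python
import re

# Simplified URL pattern, assumed for illustration; real posts may need a stricter one.
URL_RE = re.compile(r"https?://\S+")


def split_mentions(posts, name):
    """Return (in_url, in_text) counts of whole-word, case-insensitive matches of `name`."""
    word = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    in_url = in_text = 0
    for post in posts:
        # Matches inside a URL clearly point at a website.
        for url in URL_RE.findall(post):
            in_url += len(word.findall(url))
        # Matches in the remaining text are the ambiguous ones.
        in_text += len(word.findall(URL_RE.sub(" ", post)))
    return in_url, in_text


posts = [
    "I am learning Python with http://www.python.org/doc",
    "python lists are great",
]
print(split_mentions(posts, "python"))  # (1, 2)
```

In this toy input, only one of the three appearances of ‘python’ is certainly a reference to the website; the two text appearances cannot be classified without further context, which is the limitation noted above.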

An additional ambiguity is the valence of resource mentions: with this data it is not possible to say that a particular resource is recommended, only that it was discussed. It is likely that some resources have high link counts because of the volume of criticism they received. Future work could capture and analyze these sentiments, which would help inform learners, teachers and content creators about which resources are best liked.

It would be possible to gather this data periodically in order to compare it across time. A test scrape of the most recent posts on Reddit presents intriguing contrasts to the ‘top’ results (see chart below). Sites which may represent emerging trends, such as bogotobogo (a site only a year old) and the even newer site ‘freecodecamp’, are present on ‘Recent’ but absent from ‘top’. Work showing how valuable this is for looking at overall trends among keywords in text has been done for Stack Overflow (Barua, Thomas, & Hassan, 2012).

Figure 16: Reddit ‘Recent’ most linked sites

Comparing these results to links on Reddit ‘top’ also draws out a number of resources, such as ‘pastebin’ and ‘jsfiddle’, which are ranked much higher on ‘recent’ than on ‘top’. This indicates that they play a larger role in the everyday practice of learners without drawing the attention that would get them included on the long lists of resources which dominate the ‘top’ lists. Results such as these could be useful to highlight for learners.
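A comparison of this kind between the ‘top’ and ‘recent’ data sets can be sketched in a few lines of Python. The function names `rank_table` and `rank_shifts` and the sample hostnames below are hypothetical and chosen only for illustration; the sketch assumes each data set has already been reduced to a list of hostnames, one per posted link.

```python
from collections import Counter


def rank_table(hostnames):
    """Map each hostname to its 1-based rank by link count (1 = most linked)."""
    counts = Counter(hostnames)
    return {h: r for r, (h, _) in enumerate(counts.most_common(), start=1)}


def rank_shifts(top_hosts, recent_hosts):
    """Return (hosts only on 'recent', {host: places gained on 'recent' vs 'top'})."""
    top, recent = rank_table(top_hosts), rank_table(recent_hosts)
    only_recent = sorted(set(recent) - set(top))
    risers = {h: top[h] - recent[h]
              for h in recent if h in top and recent[h] < top[h]}
    return only_recent, risers


# Toy data, not taken from the thesis's scrape.
top = ["amazon"] * 3 + ["github"] * 2 + ["pastebin"]
recent = ["pastebin"] * 3 + ["freecodecamp"] * 2 + ["amazon"]
print(rank_shifts(top, recent))  # (['freecodecamp'], {'pastebin': 2})
```

Run periodically, a comparison like this would surface both newly appearing sites and everyday tools whose rank on ‘recent’ exceeds their rank on ‘top’.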

Another reason it is valuable to know more about these online communities of practice for learning programming is that we cannot assume that if we build it, diversity will come. Fisher and Margolis (2002) outline the many ways in which existing resources for learning computer science fail to engage a more diverse audience, partially due to a lack of access. While a focus on these online resources may help ameliorate these access issues, the design of many of these resources is, intentionally or not, targeted towards young white men. The data shows how poorly the sites oriented towards helping women fared. A special run of the Python programs, this time seeded with 17 resources² for learning to program which target women, had dismal results. Stack Overflow had no links to any of them and only a single mention of the site ‘RailsBridge’. Reddit had links to only six of them. Three of these links were from a single post about the gender gap in tech. It was promising to see five people discuss HackBright Academy. However, one user said about ‘girldevelopit’, “it’s mainly for girls and if you are a male you can you know lie a little but if you are a female than I don’t see no problem (-;” [sic]

² Ada Developers Academy, Hackbright Academy, Skillcrush, Codebar.io, Code First:Girls, CodeChix, Girl Develop It, Ladies Learning Code, PyLadies, Rails Girls, RailsBridge, Women’s Coding Collective, App Camp for Girls, Black Girls Code, Girls Learning Code, Girls Who Code, TechGirlz. List from Bradford (n.d.).

Implications & Conclusion

Due to the access shortage of computer science education, and the long term impact this has on potential CS students, it is critical to foster a deeper understanding of alternative modes of learning CS. A particularly robust community of online learners is centered around programming, and this work sought to answer the research question, "How do online programming forums function as learning resources?" By analyzing two premier Internet forums and counting the links posted on them, insight was gained into the following research sub-questions:

• How do the online resources suggested by the Reddit and Stack Overflow communities differ?

• What does that difference suggest about how people learn to program with these resources?

With 41% of Stack Overflow posts and 12% of Reddit posts containing links, online resources are demonstrated to have a significant presence in the discourse around learning programming. From this data it is possible to draw out a wide variety of useful and interesting information which is of benefit to learners, teachers, and computer science resource creators. Counting links allows us to know the different cultures of the sites so as to make the best use of them, to uncover which resources are getting the most attention, and to reveal which resources are not represented. This information is hidden in thousands of posts and contrasts sharply with what can be known by reading popular press articles, doing Google searches, or even reading the FAQ on the forums themselves.

So while the United States is experiencing dramatically increasing interest in learning computer science concepts, our traditional formal education options are not keeping pace. Extending the promise of a computer science education to everyone who is interested will require the use of free, interactive, and social online resources. A great number of such resources exist, yet only a few have enjoyed widespread publicity or been able to raise funding, and our current understanding of these resources and how online communities use them is limited. Whether the resources with the best brand awareness in the general public are those which are best regarded by current students and professionals of computer science is an open question, but the data in this work suggests that this is not the case. Gathering and analyzing data from these forums is an important step towards better leveraging and extending the wisdom of their users for the benefit of everyone interested in computer science education.

References

Agarwal, A. (2015). The Best Websites to Learn How to Write Code. Retrieved 2015-07-07, from http://www.labnol.org/internet/learn-coding-online/28537/

Alexa Internet Inc. (2015). Top Sites in US. Retrieved 2015-07-07, from http://www.alexa.com/topsites/countries/US

Anderson, A., Huttenlocher, D., Kleinberg, J., & Leskovec, J. (2012, August). Discovering value from community activity on focused question answering sites. In Proceedings of the 18th acm sigkdd international conference on knowledge discovery and data mining - kdd ’12 (p. 850). New York, New York, USA: ACM Press. Retrieved from http://dl.acm.org/citation.cfm?id=2339530.2339665 doi: 10.1145/2339530.2339665

Barker, L. J., Garvin-Doxas, K., & Jackson, M. (2002). Defensive climate in the computer science classroom. SIGCSE Bull., 34(1), 43–47. Retrieved from http://doi.acm.org/10.1145/563517.563354 doi: 10.1145/563517.563354

Barua, A., Thomas, S. W., & Hassan, A. E. (2012). What are developers talking about? An analysis of topics and trends in Stack Overflow. doi: 10.1007/s10664-012-9231-y

Boe, B. (n.d.). PRAW: The Python Reddit Api Wrapper — PRAW 3.1.0 documentation. Retrieved 2015-07-15, from https://praw.readthedocs.org/en/v3.1.0/

Bogers, T., & Wernersen, R. (2014, March). How ‘Social’ are Social News Sites? Exploring the Motivations for Using Reddit.com. In iconference 2014 proceedings. iSchools. Retrieved from https://www.ideals.illinois.edu/handle/2142/47295 doi: 10.9776/14108

Bonar, J., & Soloway, E. (1983). Uncovering principles of novice programming. Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, 10–13. Retrieved from http://portal.acm.org/citation.cfm?id=567069 doi: 10.1145/567067.567069

Bradford, L. (n.d.). 17 Places Where Women Can Learn How to Code. Retrieved 2015-07-29, from http://learntocodewith.me/posts/13-places-women-learn-code/

Chen, A. (2015, July). When the Internet’s ‘Moderators’ Are Anything But - The New York Times. Retrieved from http://www.nytimes.com/2015/07/26/magazine/when-the-internets-moderators-are-anything-but.html

Code.org Team. (2015). An Hour of Code for Every Student | Indiegogo. Retrieved 2015-07-07, from https://www.indiegogo.com/projects/an-hour-of-code-for-every-student#/story

Correa, D., & Sureka, A. (2014, April). Chaff from the wheat. In Proceedings of the 23rd international conference on world wide web - www ’14 (pp. 631–642). New York, New York, USA: ACM Press. Retrieved from http://dl.acm.org/citation.cfm?id=2566486.2568036 doi: 10.1145/2566486.2568036

Duggan, M., & Smith, A. (2013). 6% of Online Adults are reddit Users. Pew Internet & American Life Project, 3.

Dweck, C. (2006). Mindset: The new psychology of success (Vol. 19). Ballantine Books. Retrieved from http://www.amazon.com/Mindset-Psychology-Success-Carol-Dweck/dp/0345472322 doi: 10.5860/CHOICE.44-2397

Eaton, K. (2014, August). Programming Apps Teach the Basics of Code - The New York Times. Retrieved from http://www.nytimes.com/2014/08/28/technology/personaltech/get-cracking-on-learning-computer-code.html?smid=tw-share

Ericson, B. (2014). Detailed data on pass rates, race, and gender for 2013. Retrieved 2015-07-07, from http://home.cc.gatech.edu/ice-gt/556

Falcon, A. (n.d.). Top 10 Websites to Learn Coding (Interactively) Online. Retrieved 2015-07-07, from http://www.hongkiat.com/blog/sites-to-learn-coding-online/

Free Software Foundation Inc. (n.d.). GNU Grep 2.21. Retrieved 2015-08-01, from http://www.gnu.org/savannah-checkouts/gnu/grep/manual/grep.html

Gilbert, E. (2013, February). Widespread underprovision on Reddit. In Proceedings of the 2013 conference on computer supported cooperative work - cscw ’13 (p. 803). New York, New York, USA: ACM Press. Retrieved from http://dl.acm.org/citation.cfm?id=2441776.2441866 doi: 10.1145/2441776.2441866

Google Trends. (n.d.). Web Search interest: codeacademy, codecademy - Worldwide, 2004 - present. Retrieved 2015-07-07, from https://www.google.com/trends/explore#q=codeacademy,codecademy&cmpt=q&tz=Etc/GMT+5

GuideStar. (2015). GuideStar Exchange Reports for Code.org. Retrieved 2015-07-15, from http://www.guidestar.org/organizations/46-0858543/codeorg.aspx#financials

Guzdial, M. (2014a). Creating learning situations for adults Constructionism for Adults. ICER, 1–12.

Guzdial, M. (2014b). Using Instructional Design Techniques to Create Distance CS Education to Support In-Service Teachers (Tech. Rep.).

Internet Archive. (n.d.). Internet Archive Wayback Machine. Retrieved 2015-08-01, from https://web.archive.org/web/*/http://codeacademy.org

Isaac, M. (2015, July). Details Emerge About Victoria Taylor’s Dismissal at Reddit - The New York Times. New York, New York, USA. Retrieved from http://bits.blogs.nytimes.com/2015/07/13/details-emerge-about-victoria-taylors-dismissal-at-reddit/

Jenkins, H. (2009). Confronting the Challenges of Participatory Culture : Media Education for the 21 Century. Program, 21(1), 72. Retrieved from http://digitallearning.macfound.org/atf/cf/{7E45C7E0-A3E0-4B89-AC9C-E807E1B0AE4E}/JENKINS_WHITE_PAPER.PDF doi: 10.1108/eb046280

Kurkowski, J. (n.d.). tldextract. Retrieved 2015-01-01, from https://github.com/john-kurkowski/tldextract

Lave, J., & Wenger, E. (1992). Legitimate Peripheral Participation.

Lee, Y.-H., Heeter, C., Magerko, B., & Medler, B. (2012). Gaming Mindsets: Implicit Theories in Serious Game Learning. Cyberpsychology, Behavior, and Social Networking, 15(4), 190–194. doi: 10.1089/cyber.2011.0328

Lewis, C. M., Yasuhara, K., & Anderson, R. E. (2011). Deciding to major in Computer Science : A grounded theory of students’ self-assessment of ability. ICER ’11 Proceedings of the seventh international workshop on computing education research (2011), 3–10.

Lohr, S. (2015). As Tech Booms, Workers Turn to Coding for Career Change - The New York Times. Retrieved from http://www.nytimes.com/2015/07/29/technology/code-academy-as-career-game-changer.html

Lynch, B., & Swearingen, C. (2015, July). Why We Shut Down Reddit’s ‘Ask Me Anything’ Forum - The New York Times. New York, New York, USA. Retrieved from http://www.nytimes.com/2015/07/08/opinion/why-we-shut-down--ask-me-anything-forum.html

Margolis, J., Ryoo, J. J., Moreno Sandoval, C. D., Lee, C., Goode, J., & Chapman, G. (2012). Beyond access: Broadening participation in high school computer science. Inroads, 3(4), 72–78. Retrieved from http://delivery.acm.org/10.1145/2390000/2381102/p72-margolis.pdf?ip=128.97.244.96&acc=ACTIVESERVICE&CFID=222475124&CFTOKEN=65986042&__acm__=1354898096_3eec5d6f5129742b9c45549002c54a40 doi: 10.1145/2381083.2381102

Massanari, A. (2013, October). Playful Participatory Culture: Learning from Reddit. Selected Papers of Internet Research, 3. Retrieved from http://spir.aoir.org/index.php/spir/article/view/803

Mechaber, E. (2014). President Obama Is the First President to Write a Line of Code | The White House. Retrieved 2015-07-07, from https://www.whitehouse.gov/blog/2014/12/10/president-obama-first-president-write-line-code

Morrison, P., & Murphy-Hill, E. (2013, May). Is programming knowledge related to age? An exploration of stack overflow. In 2013 10th working conference on mining software repositories (msr) (pp. 69–72). IEEE. Retrieved from http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=6624008 doi: 10.1109/MSR.2013.6624008

Reddit.com. (n.d.-a). faq - reddit.com. Retrieved 2015-07-26, from https://www.reddit.com/wiki/faq

Reddit.com. (n.d.-b). /r/learnprogramming - Introduction. Retrieved 2015-08-01, from https://www.reddit.com/r/learnprogramming/wiki/index#wiki_discouraged_resources

Reddit.com. (n.d.-c). /r/learnprogramming - online resources. Retrieved 2015-08-01, from https://www.reddit.com/r/learnprogramming/wiki/online

Reddit.com. (n.d.-d). rules of reddit. Retrieved 2015-07-26, from https://www.reddit.com/rules

Reddit.com. (n.d.-e). What’s new on reddit: Who in the World is reddit? Results are in... Retrieved 2015-07-22, from http://www.redditblog.com/2011/09/who-in-world-is-reddit-results-are-in.html

Shaffer, D. W., & Resnick, M. (1999). "Thick" authenticity: New Media and Authentic Learning. Journal of Interactive Learning Research, 10(2), 195–215. Retrieved from http://eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ591695&ERICExtSearch_SearchType_0=no&accno=EJ591695

Singer, P., Flöck, F., Meinhart, C., Zeitfogel, E., & Strohmaier, M. (2014, April). Evolution of reddit: from the front page of the internet to a self-referential community? In (pp. 517–522). International World Wide Web Conferences Steering Committee. Retrieved from http://dl.acm.org/citation.cfm?id=2567948.2576943 doi: 10.1145/2567948.2576943

Stack Exchange. (n.d.-a). StackOverflow. Retrieved 2015-07-07, from http://stackoverflow.com/

Stack Exchange. (n.d.-b). Tour - Stack Overflow. Retrieved 2015-07-23, from http://stackoverflow.com/tour

Stack Exchange. (n.d.-c). What is reputation? How do I earn (and lose) it? - Help Center - Stack Overflow. Retrieved 2015-07-26, from http://stackoverflow.com/help/whats-reputation

Stack Exchange. (2015). Stack Overflow Developer Survey 2015. Retrieved 2015-07-25, from http://stackoverflow.com/research/developer-survey-2015

US Census Bureau. (n.d.). Data. Retrieved 2015-07-20, from http://www.census.gov/data.html

US Department of Ed. (n.d.). Projections of Education Statistics to 2021. National Center for Education Statistics. Retrieved 2015-07-21, from http://nces.ed.gov/programs/projections/projections2021/tables/table_01.asp

Wiktionary. (n.d.). Wiktionary:Most frequent 1000 words in English - Simple English Wiktionary. Retrieved 2015-07-23, from https://simple.wiktionary.org/wiki/Wiktionary:Most_frequent_1000_words_in_English

Wortham, J. (2012, March). A Surge in Learning the Language of the Internet - The New York Times. New York, NY. Retrieved from http://www.nytimes.com/2012/03/28/technology/for-an-edge-on-the-internet-computer-code-gains-a-following.html?_r=0