Echoing Within and Between: Quantifying Echo Chamber Behaviours on Reddit

A thesis submitted to The University of Manchester for the degree of Doctor of Philosophy in the Faculty of Humanities

2020

Ella M. Guest
School of Social Sciences, Social Statistics

Contents

1 Introduction
  1.1 Background and motivation
    1.1.1 Social media echo chambers: fact or fiction?
    1.1.2 Reddit: self-proclaimed ‘front page of the internet’
  1.2 Research aims
    1.2.1 Research questions
    1.2.2 Computational social science approach
  1.3 Thesis outline
2 Research Background
  2.1 The emergence of social media ‘echo chambers’
    2.1.1 The public sphere(s)
    2.1.2 Reinforcing and avoiding
    2.1.3 Echoing in networks
  2.2 ‘The front page of the internet’
    2.2.1 Subreddits: topic-based communities
    2.2.2 Who is on Reddit?
  2.3 Topic interests on Reddit
    2.3.1 ‘Interest meta-communities’ as public sphericules
    2.3.2 changemyview: the anti-echo chamber?
    2.3.3 The Donald: a self-defined echo chamber
  2.4 Chapter summary
3 Data Collection & Preparation
  3.1 Data collection
    3.1.1 Data quality
    3.1.2 Data access
    3.1.3 Overview of the data
  3.2 Subsetting data
    3.2.1 Subsetting authors
  3.3 Data Preparation
    3.3.1 Author-subreddit pairs
    3.3.2 Subreddit similarity
  3.4 Labelling subreddit topics
    3.4.1 Overview of topic frequencies
    3.4.2 Description of the political subreddits
  3.5 Chapter summary
4 Subreddits as Echo Chambers
  4.1 Defining chamberness
    4.1.1 Within subreddit author participation
    4.1.2 Between subreddit author participation
  4.2 Within subreddit participation results
    4.2.1 General trends
    4.2.2 The case study subreddits
  4.3 Between subreddit participation results
    4.3.1 General trends
    4.3.2 The case study subreddits
  4.4 Chapter summary
5 Networks of Echo Chamber
  5.1 Defining meta echo chambers
    5.1.1 Measuring subreddit similarity
    5.1.2 Defining chamberness similarity
  5.2 A network of public sphericules
    5.2.1 The subreddit network
    5.2.2 Detecting public sphericules
    5.2.3 Sphericules as chambers?
  5.3 The geo-political sphericule
    5.3.1 The most chamber-like edges?
    5.3.2 Political meta echo chambers?
    5.3.3 Political echoing on the left
  5.4 Chapter summary
6 Discussion
  6.1 Addressing the research questions
    6.1.1 Research question 1
    6.1.2 Research question 2
    6.1.3 Research questions 3 & 4
  6.2 The case study subreddits
    6.2.1 The right
    6.2.2 The neutral
    6.2.3 The left
    6.2.4 The anti echo chamber
  6.3 Non-political echoing
    6.3.1 The sports community
    6.3.2 The porn community
    6.3.3 Homophily vs echoing
  6.4 Implications of the research
    6.4.1 Wider theoretical implications
    6.4.2 Strengths of the research
    6.4.3 Limitations of the research
7 Conclusion
  7.1 Summary of key findings
  7.2 Key contributions of the research
  7.3 Future research
  7.4 Concluding remarks
References
A Appendix
  A.1 Code repository
  A.2 Subreddit topic distributions
  A.3 Political subreddit results
  A.4 Inter-community edge weights
  A.5 Community subreddit topic counts
  A.6 Closest neighbours of political subreddits

Word count: 61,102

List of Figures

2.1 Rates of Reddit use by US adults in 2019, data from (Pew Research Center 2019)
3.1 Screenshot of BigQuery interface querying January 2019 Reddit comment dataset
3.2 Overall author and comment counts per subreddit
3.3 Cumulative frequency of author counts for top 1000 subreddits
3.4 Distribution of 21 most common subreddit topic labels
3.5 Timeline of political subreddit creation
3.6 Author and comment counts for political subreddits
4.1 Frequency of median comment count per author for all subreddits
4.2 Frequency of median author insubreddit proportion for all subreddits
4.3 Scatterplot of median number of author out-comments vs in-comments for all subreddits
4.4 Heatmap of correlations between comment count, median comment count, and median insubreddit proportion for all subreddits
4.5 Author and comment counts for case study subreddits
4.6 Results of within subreddit author participation measures for case study subreddits
4.7 Scatterplot of median number of author out-comments vs in-comments for case study subreddits
4.8 Heatmap of political subreddit percentiles for within subreddit participation measures
4.9 Scatterplot of median comments per author by median author insubreddit proportion for case study subreddits
4.10 Frequency of median author subreddit count for all subreddits
4.11 Frequency of median author comment count for all subreddits
4.12 Frequency of median author average comment count for all subreddits
4.13 Frequency of median author Gini for all subreddits
4.14 Heatmap of between subreddit median author participation measure percentiles for case study subreddits
4.15 First pair of bar plots for between subreddit author participation measures for case study subreddits
4.16 Second pair of bar plots for between subreddit author participation measures for case study subreddits
5.1 Scatterplot of logged values of co-authorship (y-axis) and text similarity (x-axis) per pair of subreddits
5.2 Distribution of regression residuals for all subreddit pairs
5.3 Frequency distributions of subreddit degree in the network
5.4 Frequency of the 11 most popular topic labels for subreddits in the network
5.5 Heatmaps of relative subreddit topic and community frequencies
5.6 Community sizes
5.7 Network graph of communities
5.8 Barplot of degree percentile rank for political subreddits
5.9 Frequency distribution of residuals for all pairs of political subreddits
5.10 Number of edges shared by case study subreddits
5.11 Network of top edges between pairs of political subreddits
5.12 Network of case study subreddits and their ten closest neighbours
5.13 Barplot of percentile rank of edge weights between changemyview and political subreddits

List of Tables

3.1 Polarity of political subreddits in order of date created
3.2 Description of left-wing political subreddits
3.3 Description of neutral political subreddits
3.4 Description of right-wing subreddits
5.1 Most prevalent topics per community
5.2 Community Descriptive Statistics
5.3 Example contingency table of edges by association
5.4 Top 20 subreddits in the geo-political community by internal edge count
5.5 Top 20 subreddit pairs in the geo-political community by edge weight
5.6 Top pairs of political subreddits by residual value
5.7 Top 10 subreddits changemyview shares an edge
A.1 Frequencies of all subreddit topic labels
A.2 Chapter 4 raw values for the case study subreddits
A.3 Community edge weights
A.4 Community topic counts
A.5 Closest neighbours of the case study subreddits

Abstract

This thesis examines whether echo chambers exist on Reddit, the self-titled ‘front page of the internet’. As a social media platform, Reddit is widely popular in Western countries but remains relatively unknown to, and understudied by, academics. This research will show that the structural features of the platform, in particular its organisation into distinct topic-based communities called subreddits, make