
How Do Trends Affect News Exposure?

Shin Lee Advised by Professor Nicholas Diakopoulos

Northwestern University

Mathematical Methods and Social Sciences Thesis

Acknowledgement

I would like to thank my advisor, Professor Nicholas Diakopoulos, for his helpful feedback and guidance while I researched and wrote this thesis. This thesis would not be where it is today without his wisdom and weekly guidance. I would also like to thank Professor Joseph Ferrie and Nicole Schneider for being great resources whenever I had questions or concerns. Lastly, I would like to thank Professor Jeffrey Ely for his leadership in overseeing the MMSS program and its students.

Abstract

Facebook has recently been a topic of heated discussion regarding fake news and algorithms that present biased posts. I wanted to discover whether Facebook is truly posting biased news trends and whether this heavily influences the exposure that news sources receive. Using programming and analytical methods, I investigated personalization by geographic location and demographics in Facebook’s Trending Topics. I further analyzed the degree to which Facebook personalizes the news trends for each user. I also examined whether Facebook gives priority to certain news sources or gives all news sources equal opportunity. Lastly, I investigated how often Facebook’s algorithm updates its news trends and whether its behavior differs across days of the week. The data collection and analysis were conducted with Python scripts I wrote. My results show that Facebook does not personalize by geo-location and personalizes only slightly by demographics. Furthermore, my analysis showed that Facebook gives more news exposure to liberal than to conservative news sources. Lastly, Facebook’s Trending Topics does not behave consistently and tends to accumulate more new news trends when a lot of news is happening in the world. I would like to note that this paper explains basic programming concepts in detail because my primary audience’s expertise is not in computer science or programming. In addition, my advisor plans to extend my research in the future, so I include tips for future development of my thesis throughout this paper.


I. Introduction

Facebook has received a great deal of attention since its founder and CEO sat in front of Congress to answer questions about Facebook and the safety of its users following the Cambridge Analytica scandal[1]. The scandal began when Facebook allowed a professor from the University of Cambridge to collect information about its users for research purposes. However, the data, which consisted of over 50 million profiles including answers to a personality test, location, friends list, and “liked” content, was handed over to Cambridge Analytica, a UK-based political data company that was working on ’s campaign[2]. It is clear that Facebook holds personalized data on its individual users, but does it use these data to present certain types of news trends to them? Cass Sunstein believes Facebook feeds are “echo chambers” that only show posts by people who are like us and think like us[3]. He claims this hurts our democracy because Facebook users only read articles that align with their beliefs. As a result, people become more extreme, and when they encounter people who do not share their beliefs, they see them as enemies who are “crazy”[4]. Sunstein predicted that Facebook would begin to experiment with the algorithm that determines which news and posts are presented to its users, and he was correct. In 2018, Facebook announced it would present “less public content, including videos and other posts from publishers or businesses” and increase the visibility of local news. Mark Zuckerberg stated that local news helps us understand issues that affect our lives[5]. Thus, it is apparent that Facebook’s algorithms play a significant role in what its users see, and that Facebook is currently making an effort to increase the news that is relevant to its users. However, if Facebook increases the news that is relevant to its users, as Zuckerberg said he wanted, will that also increase exposure to biased news sources, as Sunstein warned?
Algorithm auditing is relatively simple in theory: it examines the inputs, outputs, and outcomes of some problem[6]. In practice, however, it is much harder to achieve. Algorithms are replacing human involvement in data collection, data analysis, and human input, which may be faster and more efficient since the human brain has limited computational abilities. However, algorithms come with their own challenges, such as consistency, the intention behind the algorithm, unintentional or intentional biases, verifying that the algorithm behaves the way the programmer expected, and many more. In this thesis, we dive deep into programming and take a closer look at how Facebook interacts with automated scraping of its data, the computational struggles behind Facebook data collection and analysis, and the process of creating Python and Selenium scripts to automatically gather large amounts of data. Up to this point, relevant research has investigated algorithms and the role they play in reporting and analyzing information, and whether an algorithm is accountable and to what

[1] Tuesday, For five hours on. “Your Facebook Data Scandal Questions Answered.” CNNMoney, Cable News Network, 11 Apr. 2018, money..com/2018/04/11/technology/facebook-questions-data-privacy/index.html.
[2] Riley, Charles. “Cambridge Analytica, Facebook and Your Data: Here's What to Know.” CNNMoney, Cable News Network, 20 Mar. 2018, money.cnn.com/2018/03/19/technology/facebook-data-scandal-explainer/index.html?iid=EL.
[3] “'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017, www..org/2017/02/20/516292286/-republic-author-describes-how-social-media-hurts-democracy.
[4] “'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017.
[5] Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.” Columbia Journalism Review, 18 Apr. 2018, www.cjr.org/tow_center/facebook-local-news.php.
[6] Rosén, Josefin. “What Every Business Manager Should Know about Algorithm Audits.” SAS Learning Post, 16 Oct. 2017, .sas.com/content/hiddeninsights/2017/10/16/algorithm-audits/.


degree[7]. Other studies have investigated how social media platforms become echo chambers because their algorithms decide what to present to users[8]. However, because Facebook and online personalization are still new, there is little research exploring whether Facebook News Trends are personalized by demographic and geo-location, how often trends update, and whether Facebook favors certain news sources. This paper studies the personalization of Facebook news trends by geo-location and demographic, as well as whether Facebook favors specific news sources. Additionally, it analyzes how often Facebook updates its news trends. I hypothesize that Facebook News Trends are personalized by demographic and geo-location. Furthermore, I hypothesize that Facebook gives slight favoritism to news sources that are more liberal than conservative, because Facebook is known to be more liberal as a company. Lastly, I hypothesize that Facebook News Trends will not have a consistent update schedule, as news changes depending on what happens around the world.

The data used for my thesis was collected by Python and Selenium scripts that I developed for this project. There are four categories: Trends, Trends and Tabs, Geo-location, and Personal vs Puppet. Each category has its own set of datasets that were collected in real time from Facebook. The data is stored in a database and further processed by algorithms to answer the following questions:
• How often are trends updating?
• Do news trends behave differently on weekends versus weekdays? What about across different weekdays?
• Which news sources does Facebook give more exposure to?
• How many news articles from external news sources does Facebook publish?
• Which news sources does Facebook include in the “Trending” section?
• Do news trends differ depending on geographic location? If so, how?
• Does Facebook personalize news trends by user? More specifically, do news trends differ depending on the Facebook account?

To the best of my knowledge, no existing research has answered the questions listed above. This paper investigates and answers these questions and provides insight into how end users experience Facebook’s Trending Topics news. This paper is structured as follows: Section II summarizes relevant research on social media, traditional news and news sources, Facebook news trends, and recent Facebook events. Section III presents the methodology: how data was collected, the technology used, details of the raw datasets, and descriptions of the methods used to analyze the data. Section IV presents the results. Section V offers the discussion, which includes the limitations and implications of my thesis as well as future research areas. Lastly, Section VI presents the conclusion.

[7] Diakopoulos, Nicholas. “Algorithmic Accountability.” Digital Journalism, vol. 3, no. 3, 2014, pp. 398–415, doi:10.1080/21670811.2014.976411.
[8] Alvarado, Oscar, and Annika Waern. “Towards Algorithmic Experience.” Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18, 2018, doi:10.1145/3173574.3173860.


II. Literature Review

The objective of this literature review is to present recent events involving Facebook, past research on Facebook and news trends, social media and traditional news, and algorithm auditing. In May 2016, Gizmodo.com, a design, technology, science, and website published an article in which several former Facebook employees who had worked as “news curators” claimed that they suppressed conservative news from the Trending news page[9]. They also claimed they were instructed to “artificially inject” specific stories into the Trending news page. In some cases, these injected stories were not naturally trending but were placed in the Trending section anyway[10]. This raised concerns because a process that was assumed to be determined purely by algorithms involved biased human judgment. A former Facebook employee also claimed that stories covered by conservative news sources that Facebook’s algorithm selected would not be included in Trending unless more unbiased news sources also covered the same story[11]. After Gizmodo’s coverage gained popularity, the Vice President of Search at Facebook responded on Facebook on May 9, 2016, stating the following[12]:

“There are rigorous guidelines in place for the review team to ensure consistency and neutrality. These guidelines do not permit the suppression of political perspectives. Nor do they permit the prioritization of one viewpoint over another or one news outlet over another. These guidelines do not prohibit any news outlet from appearing in Trending Topics. Trending Topics is designed to showcase the current conversation happening on Facebook. Popular topics are first surfaced by an algorithm, then audited by review team members to confirm that the topics are in fact trending news in the real world and not, for example, similar-sounding topics or misnomers.”

On August 26, 2016, Facebook announced it would remove human involvement in writing the descriptions in the Trending topic list[13]. This meant Facebook’s algorithm would have full control, and whatever it wrote would be published before a human looked at it. However, only a few days after the change, Facebook’s algorithm posted an article about Megyn Kelly on Trending with a description calling her a “traitor” and claiming that she was kicked out by for “backing Hillary”, which she was not[14]. This fueled the controversy over Facebook displaying fake news on its platform. There seems to be a conflict regardless of who is behind the decision making of Trending Topics: when Facebook used human editors for Trending, it was accused of injecting human biases into its decisions; when it removed the human involvement, it was accused of spreading fake news.

[9] Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.” Gizmodo, Gizmodo.com, 10 May 2016, gizmodo.com/former-facebook-workers-we-routinely-suppressed-conser-1775461006.
[10] Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.”
[11] Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.” Gizmodo.com.
[12] https://www.facebook.com/tstocky/posts/10100853082337958
[13] Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake News.” The Washington Post, WP Company, 29 Aug. 2016, www.washingtonpost.com/news/the-intersect/wp/2016/08/29/a-fake-headline-about-megyn-kelly-was-trending-on-facebook/?noredirect=on&utm_term=.2d050b7762f3.
[14] Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake News.”


In January 2018, Facebook announced it would change its algorithm to favor “authentic” connections between people[15]. Furthermore, the new algorithm would devalue posts by publishers and paid brands. For users, the quality of the experience is expected to increase: advertisements on the News Feed will be more relevant and fewer in number, and content from the user’s social connections will get a boost in exposure and is expected to appear higher on the News Feed[16]. However, this change will make advertising on Facebook more competitive and expensive. Additionally, more advertisements are likely to appear on other Facebook platforms such as Messenger, , and WhatsApp.

A study by Chakraborty, Messias, and Benevenuto investigated the demographic biases in crowdsourced recommendations on Twitter’s Trending Topics. News content is selected for recommendation according to how much popularity and activity it receives on the social media platform, which indirectly gives users the power to promote specific news. The study examined which demographic groups influenced which content was deemed worthy of recommendation and whether those groups were representative of the platform’s overall population[17]. The results showed that a significant percentage of the trends were promoted by demographic groups that were drastically different from the overall population[18]. Further concerns were raised when the authors discovered that some demographic groups were under-represented among the promoters of the trends. Specifically, Black women were the most under-represented group, followed by Black men, Asian women, and Asian men; white men were the least under-represented group[19]. Furthermore, middle-aged users were more under-represented than the younger population. In other words, this study identified which demographics influenced which news ranked in Trending Topics. I wanted to extend this research and investigate how Trending Topics are presented to different demographics. The results from this study added useful insights to my research, as I use two different Facebook accounts when scraping data: an impersonalized sock puppet account and an established account with a particular demographic.

A 2011 study by Cvijikj and Michahelles investigated trend detection over Facebook public posts[20]. They monitored trends through data collection and trend detection. The data collection process was continuous and in real time, like the algorithm implemented for my paper. Their study also encountered many difficulties collecting data from Facebook; I explain the difficulty of scraping data from Facebook in detail in the Discussion section. Their results suggested that Facebook trending topics should be divided into three categories: disruptive events, popular topics, and daily routines. However, the study did not report how Facebook trends are personalized by location or account, nor how often trends changed on Facebook’s Trending. Additionally, the study only collected data for 4 consecutive days and did not consider how Facebook Trends may change across different weeks and on weekends.

[15] Göös, Christine. “Blog.” Facebook Advertising Trends 2018, 15 Feb. 2018, www.smartly.io/blog/facebook-advertising-trends-2018.
[16] Göös, Christine. “Blog.” Facebook Advertising Trends 2018.
[17] Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. 1 Apr. 2017, arxiv.org/abs/1704.00139.
[18] Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations.
[19] Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations.
[20] Cvijikj, Irena Pletikosa, and Florian Michahelles. “Monitoring Trends on Facebook.” 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, 2011, doi:10.1109/dasc.2011.150.


A 2013 study by Groshek compared the agendas of established traditional news sources (New York Times and CNN) with the most frequently shared news and topics on popular social media sites (Facebook and Twitter)[21]. The study showed that the news agendas were similar between the two traditional news sources and Facebook, but there were variations in the ranking of the news items. The study also showed that social media influenced traditional media’s agenda for cultural topics. However, there was no relationship between political and cultural coverage within the social media platforms. This literature provided valuable information on the relationship, or lack thereof, between traditional news outlets and social media trending news. However, it did not address how location and/or Facebook account affect the types of news trends a user sees. In fact, it only investigated the news agendas in terms of categories and lacked a deeper analysis of which types of news sources were presented.

A 2016 study by Kazai, Yusof, and Clarke developed a prototype mobile application that recommended content by using the user’s location, Facebook and/or Twitter feed, and in-app activities[22]. Their model constructed the user’s personalized feed by mixing content from multiple sources, some taken directly from the user’s Facebook/Twitter feeds and some propagated through the user’s in-app social network. This study was the first to provide personalized feeds by pulling different sources and recommendations over a crowd-curated content pool. Their algorithm was different from Facebook’s Trending algorithm, but the idea was similar because it learned from the user’s activity and from what was popular within the user’s and the platform’s network. Using the data collected, they recommended content that they believed would interest their users.

Overall, research on Facebook is recent and has only lately gained popularity. As a result, Facebook research is not plentiful, and only a few past studies address Facebook Trending Topics and Facebook data scraping. To the best of my knowledge, no existing research investigates how Facebook Trends differ by geo-location and by personalization. There is also no research that investigates how often Facebook Trending topics change and whether behavior differs depending on the day of the week or the time of day.

In this paper, I describe the algorithm auditing process used to scrape the data from Facebook and analyze the collected data in terms of personalization by demographic and geo-location. I also investigate how often Facebook Trending topics change and whether Trending behaves differently on different days of the week and at different times of the day. Lastly, I investigate which news sources receive higher news exposure from Facebook. Based on the evaluation of the collected results, I discovered that Facebook Trending Topics are not as personalized as I hypothesized. However, Facebook gives clear preference to certain news sources over others.

[21] Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive Capacity of Social Networking Sites in Intermedia Agenda Setting across Topics over Time.” 2013, doi:10.12924/mac2013.01010015.
[22] Kazai, Gabriella, et al. “Personalised News and Blog Recommendations Based on User Location, Facebook and Twitter User Profiling.” Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '16, 2016, doi:10.1145/2911451.2911464.


III. Methodology

My thesis looks at how Facebook news trends behave and how that influences the news sources users are exposed to. Unlike most MMSS theses, I collected my data by writing my own scripts that scrape data from Facebook. This method was chosen because no data on Facebook news trends is publicly available. Additionally, Facebook news trends change often because news in the media is always changing depending on what is happening in the world. To account for the unpredictability of world events, it is important to have the latest data for my analysis. Furthermore, Facebook has announced it will make significant changes to its algorithm after it was criticized for fake news and biased news[23]. To capture the most up-to-date algorithm changes, it was important to collect recent data.

Scripting Technology

The scraping scripts were written in Python 3.6.4 with Selenium, a user interface (UI) automation tool designed for automated UI testing. Selenium was selected because it can simulate a real human user interacting with the web browser. It can distribute and scale scripts over many different environments and can support robust regression automation tests. The purpose of a regression automation test is to catch bugs that were accidentally introduced and to make sure previous bugs stay dead. Selenium clicked on buttons and scrolled through the webpage like a real human user would. This is important because Facebook tries to prevent scraping of its data and raises security checks when it suspects a security threat to an account, which I discuss later in this paper. Additionally, Selenium can visually show the user what is happening on the webpage. Even though the user did not physically perform any actions on the page, actions such as clicking and scrolling were carried out on the webpage. This helped with debugging the code and confirming the algorithm was working properly. The user could compare how Facebook reacted when an algorithm was navigating through its pages versus when a real human being clicked through the pages. Selenium also has the option to run the scripts in headless mode, where the browser is driven without the web page being displayed. This is an important feature because when the browser is visible, the run is more prone to human error: the user could accidentally click on the webpage and interfere with the script. Headless mode was added later in the data collection, since it is most useful for long intervals of data collection and for scripts that have been fully debugged. Since Selenium is a user interface testing platform, it works well with HTML and offers options to locate HTML elements by class, id, CSS selector, XPath, and practically any other attribute available on Facebook’s pages.

Python 3 was used over Python 2 because it is the newer version and offers more features. Furthermore, the community is shifting from Python 2 to Python 3, and to accommodate future research on my thesis topic, I used the newest version of Python so that the version difference is as small as possible for future researchers. Additionally, Python has many benefits as a programming language. There are six main reasons why I chose Python over other programming languages such as Java or C. First, the Python Package Index (PyPI) makes Python capable of interacting with many different languages and platforms because it includes various third-party modules. These modules made it easier to work with Python when I wanted to install software, for example open-source software, that is developed and shared in the public Python

[23] Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.”


community[24]. It was also beneficial because it made it easier for me to distribute my software via PyPI. Additionally, PyPI made it easier for Python to interact with different languages and platforms. In this data collection, I used two different platforms, Chrome and Mozilla Firefox. Second, Python has an extensive set of support libraries. Support libraries are standard libraries with predefined functions that can be used throughout the code once the library has been imported into the script. This significantly reduces both the length of the code and the time spent writing tedious, repetitive code. This is extremely important because the Facebook UI consists of many hundreds of lines of code. It would have been impossible to write code from scratch for every task my scripts needed to accomplish in the time available. Third, Python is open source and developed by its community[25]. Thus, it is free for the public to use, even for commercial purposes. It is also under ongoing development, as developers regularly contribute to new versions of Python. This is important because even minor updates to the language can fix bugs that were present in previous versions. Fourth, Python has extensive documentation and community Q&A, where programmers who run into problems post their errors and the Python community works together to solve them if no answer exists already. This is tremendously helpful when running into bugs, and I used it hundreds of times when writing and debugging my scripts. Though the command prompt throws errors and gives a brief description of what caused the error when a script fails, it is usually not enough to figure out how to solve the bug. The support available online encouraged me to pursue my thesis in Python and to stay with it. Fifth, Python has great built-in data structures such as lists, sets, and dictionaries that make development not only easy but also time efficient. This is critical because large amounts of data were collected and stored in multi-level data structures. Additionally, Python offers dynamic high-level data typing, which is useful because it decreases the amount of support code required and ultimately the amount of code written. Lastly, Python is an object-oriented programming language. It has strong text processing capabilities and its own unit testing framework, which is very important since a significant portion of the data collected is text and/or has gone through text processing[26].

Virtual Environment Set Up

To set up the environment and keep the correct dependencies and packages safely in one location, I created a virtual environment. This step is important and recommended for developers who wish to use my code in the future, because how I manage my dependencies may differ from how another developer manages hers. I focused on development and testing for the majority of my programming for this thesis, but I shifted my focus to deployment later on as I was completing my scripts and publishing my code to GitHub. After I downloaded Python 3.6.2, I downloaded pip separately. I developed on a Windows platform. On macOS, Python comes pre-installed on every device and pip comes with Python; however, for my operating system (Windows 10), I had to download pip manually. Afterwards, I

[24] “PyPI – the Python Package Index.” PyPI, pypi.org/.
[25] “Welcome to Python.org.” Python.org, www.python.org/about/.
[26] Rongala, Arvind. “Benefits of Python over Other Programming Languages.” Invensis Blog, 6 Apr. 2018, www.invensis.net/blog/it/benefits-of-python-over-other-programming-languages/.


used pip to download Pipenv, a dependency manager for Python projects[27]. Virtualenv is the virtual environment tool I used to develop in separate Python environments on the same local computer. Virtualenv created a folder that isolated my thesis development environment from the environments I was using for other projects. This is important because I needed different dependencies for my thesis than, for example, for my machine learning project. After creating a new virtual environment, I downloaded the libraries and dependencies needed for the data collection script. There are three main dependencies I downloaded into my virtual environment: Selenium WebDriver 3.9.0, APScheduler 3.5.0, and PyVirtualDisplay (PyPl). Selenium was a dependency I downloaded to integrate with my Python code; by combining Selenium and Python, I could control the user interface activity with Python code. APScheduler was used to schedule when and how my scripts would run. I discuss the APScheduler intervals further in the data collection section. PyPl was used to run my script on the Amazon Web Services (AWS) cloud. I discuss cloud computing further in the data collection and challenges section.

Web Browser

Initially, I collected data using Mozilla Firefox because Selenium IDE, a Firefox add-on, was only available on Firefox and I had experience with it. However, since I used Selenium WebDriver, which is available on multiple browsers, rather than Selenium IDE for this experiment, I was not restricted to one browser. After running into issues with Firefox when running the scripts on the AWS cloud server, I changed from Firefox to Google Chrome. Another reason for the change was that Chrome is the most popular web browser, with roughly 78% of browser usage versus 11% for Firefox[28]. It was important to collect data from the most commonly used browser since most Facebook users are likely to access their accounts through Chrome. For developers interested in running my scripts on Firefox, I have left the Firefox code commented out for future use. For this study, however, all data analyzed and presented in this paper was collected with Chrome. To use Chrome as the browser, you must download chromedriver to run the script properly on the cloud. Without chromedriver, the script will fail immediately, and you cannot collect data on an AWS server. Examples of common errors are shown in Figure 1. However, the script will run and collect data correctly when it is running locally on a Windows 10 operating system.
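The two kinds of failure discussed here (a driver file that cannot be executed and a binary that cannot be found) can often be detected before a long collection run starts. The following is only a minimal sketch using the Python standard library, not code from the thesis scripts. The function name find_driver and the binary names "chromedriver" and "google-chrome" are assumptions chosen for illustration and may differ on a given server.

```python
import os
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Classify a driver binary as 'ok', 'not_executable', or 'missing'.

    search_path is an os.pathsep-separated list of directories; when it is
    None, the PATH environment variable is used.
    """
    # shutil.which only returns files that exist AND are executable.
    path = shutil.which(name, path=search_path)
    if path:
        return "ok", path

    # Fall back to a plain existence check to tell "wrong permissions"
    # apart from "not installed at all".
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            return "not_executable", candidate
    return "missing", None


if __name__ == "__main__":
    # Hypothetical binary names; typical on Linux servers but not guaranteed.
    for binary in ("chromedriver", "google-chrome"):
        status, location = find_driver(binary)
        print(f"{binary}: {status} ({location})")
```

A status of not_executable corresponds to the situation where the driver file is present but lacks execute permission, which on Linux is usually fixed by marking the file as executable. A status of missing suggests the binary still needs to be installed or added to PATH.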

selenium.common.exceptions.WebDriverException: Message: 'chromedriver.exe' executable may have wrong permissions. Please see https://sites.google.com/a/chromium.org/chromedriver/home

selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary

Figure 1: Common Chromedriver Error

CSV and MySQL Database

[27] “Pipenv & Virtual Environments.” Freezing Your Code - The Hitchhiker's Guide to Python, docs.python-guide.org/en/latest/dev/virtualenvs/.
[28] “Browser Statistics.” W3Schools Online Web Tutorials, www.w3schools.com/browsers/default.asp.


In the initial stages of data collection, the data was stored in CSV files because Python has a csv support library with useful functions that made it simple to write data to a CSV file. Figure 2 shows the csv support library functions in use, where I create a new CSV file and fill in the column names. Figure 3 shows a list of lists containing data scraped from Facebook being written to a CSV file. The code in Figures 2 and 3 is from fb_trends.py. Additionally, in the earlier stages of data collection, I only collected data over short intervals, such as one to two hours, and only collected one subset of data called trends_only, which I discuss in detail later in the paper. As a result, the CSV file never became unmanageably large, and storing the data in CSV was feasible.
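As a rough illustration of the two steps just described (creating a CSV file with column names, then writing a list of lists to it), a minimal sketch with Python's csv module might look like the following. This is not the code from fb_trends.py: the column names, the sample rows, and the helper names create_csv and append_rows are all hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical column names; the real ones in fb_trends.py may differ.
COLUMNS = ["timestamp", "rank", "trend_title", "news_source"]


def create_csv(path, columns):
    """Create a new CSV file that contains only the header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(columns)


def append_rows(path, rows):
    """Append a list of lists (one inner list per scraped trend)."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# Made-up sample data standing in for one scrape of the Trending section.
scraped = [
    ["2018-04-20 12:00", 1, "Example Trend A", "example-news.com"],
    ["2018-04-20 12:00", 2, "Example Trend B", "another-example.org"],
]

csv_path = os.path.join(tempfile.mkdtemp(), "trends_only.csv")
create_csv(csv_path, COLUMNS)
append_rows(csv_path, scraped)

# Read the file back to confirm what was written.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows_read = list(csv.reader(f))
print(len(rows_read))  # header row plus the data rows
```

Opening the file with newline="" follows the csv module's documented recommendation and avoids blank lines between rows on Windows. Note that values read back from a CSV file are always strings, so numeric columns need to be converted before analysis.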

Figure 2: Create a New CSV File using CSV Support Library

Figure 3: Insert List of Lists into CSV File using CSV Support Library

As I began to collect more data, in terms of higher frequency, longer time ranges, and different types of data, the CSV files began to have performance issues. As a result, I moved my data storage from CSV to a MySQL database. A MySQL database has many advantages that CSV does not have. First, MySQL is designed to store extremely large amounts of data. Thus, even if I collect data every minute for 24 hours, there is no performance issue in retrieving the data. With MySQL, I can run SQL queries that let me filter the data by specific criteria. For example, Figure 4 shows an SQL query that selects all columns of the data in the PROXYIL table of the fb_scrape_db database. Figure 4 is an example of a basic select SQL query, but there are also action queries with which I can insert, update, or delete data inside the database. Furthermore, queries can calculate or summarize data and automate data management tasks29. To help understand the concept of SQL queries, you can think of a query as a search on Google. You ask the Google search engine a question and it returns an answer. In other words, you give Google a query, and it returns what it finds in its database.
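As a rough illustration of select, filter, and summary queries, the sketch below uses Python's built-in sqlite3 library as a stand-in for the MySQL server so that it runs without a database account. Only the table name PROXYIL comes from this paper; the column names and rows are made-up assumptions, and a real MySQL connection would need a MySQL driver and login credentials.

```python
import sqlite3

# sqlite3 (standard library) stands in for the MySQL server in this sketch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified version of the PROXYIL table.
cur.execute(
    "CREATE TABLE PROXYIL (type TEXT, title TEXT, trend_rank INTEGER, scrape_id INTEGER)"
)
cur.executemany(
    "INSERT INTO PROXYIL VALUES (?, ?, ?, ?)",  # '?' is sqlite3's placeholder style
    [
        ("top trends", "Example trend A", 1, 1),
        ("politics", "Example trend B", 1, 1),
        ("politics", "Example trend C", 2, 1),
    ],
)

# Basic select query: every column of every row.
all_rows = cur.execute("SELECT * FROM PROXYIL").fetchall()

# Filtered query: only the politics trends, ordered by rank.
politics = cur.execute(
    "SELECT title, trend_rank FROM PROXYIL WHERE type = 'politics' ORDER BY trend_rank"
).fetchall()

# Summary query: number of rows per topic.
counts = dict(cur.execute("SELECT type, COUNT(*) FROM PROXYIL GROUP BY type").fetchall())

conn.close()
```

The same pattern (a query string goes in, matching rows come out) applies to the insert, update, and delete action queries mentioned above.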

Figure 4: Basic Select SQL Query

Another benefit of the MySQL database is that it is stored in the cloud. This means that I, or anyone I want to share my data with, can access the data from any device as long as they have the login credentials. This was extremely valuable since I was running my scripts on three different devices during this experiment: my laptop, a desktop, and an AWS cloud server. Thus, there is complete mobility and flexibility in where and how I can access my data. For example, a desktop has a more powerful central processing unit (CPU) than a laptop, so the majority of the more

29 Rouse, Margaret. “What Is Query? - Definition from WhatIs.com.” SearchSQLServer, searchsqlserver.techtarget.com/definition/query.


complicated scripts were run on a desktop30. However, I can only access the desktop when I am home. With the MySQL database, I have the option to access the data being collected on my desktop directly from my laptop. Furthermore, the data is inserted into the database as soon as it is collected and, more importantly, is immediately accessible from anywhere that has access to the cloud. This meant I could check how my script was running on my desktop from my laptop. This was critical in collecting the data on time and accurately. Lastly, storing data in the MySQL database makes it easier for future developers to work on this project, since the data is stored in the cloud and can be accessed from anywhere.

AWS Cloud

During the early stages of data collection, the scripts ran locally on my Windows 10 laptop. However, as the time range and amount of data collected increased, it was no longer feasible to run the scripts locally. When a script runs locally, the laptop must stay on the whole time or the script will stop. Additionally, the laptop must stay connected to the internet at all times, or the connection to Facebook will drop and the script will fail. Because of these two constraints, it was impossible to collect data for a full 24 hours without interruption, because I needed my laptop for my classes. I would like to note that even though there is internet access on campus at Northwestern University, the connection is not stable outdoors. Thus, when I tried to walk from class to class with my laptop on, the script still failed because the connection was too poor to stay connected to Facebook. As a result, I decided to move the scripts from my laptop to the cloud. I used AWS for cloud computing because I had originally planned to also use Amazon’s Mechanical Turk to crowdsource Facebook data. However, due to the recent events surrounding Facebook’s data scandal, I decided not to pursue that method.
AWS is a cloud platform that offers both cloud computing and database storage31. Since my database was in AWS, I decided to use its cloud computing platform as well. Additionally, AWS is a very popular cloud services platform that is used by many large companies such as Spotify, Intuit, and Yelp. It is trustworthy and provides applications with flexibility, reliability, and scalability. To access the cloud server, I used PuTTY to connect from my local machine to the cloud. Please note that this cloud server has a private key that needs to be present when you try to connect. To transfer files between my local machine and the cloud server, I used WinSCP. Both applications are highly recommended for Windows users and are available online for free. When a script is running on the cloud, it is running on a virtual server. It is the same idea as running on a different computer; the only difference is that this one is in the cloud. So, you can think of it as another computer that is not physically next to you. Thus, I can connect to the cloud and run my scripts as I would on my local machine. However, there are some key differences. The AWS cloud server I used was in the Eastern time zone and its operating system was Red Hat Linux. Because of the difference in operating system, the command prompt commands are different. For example, in Linux, you must start your script with python3 while

30 “What Is a Central Processing Unit (CPU)? - Definition from Techopedia.” Techopedia.com, www.techopedia.com/definition/2851/central-processing-unit-cpu
31 “What Is AWS? - Amazon Web Services.” Amazon, Amazon, aws.amazon.com/what-is-aws/.


in Windows you start with py. Additionally, AWS cloud servers do not come with Chrome pre-installed, and since I do not have a graphical interface on the cloud, I had to download Chrome via the command prompt. There are many other key differences between Windows and Linux that users should be aware of for future additions to this study, but I will not include them in this paper since they are too technically complex for the average MMSS audience. Once again, I had to set up a virtual environment and download the three dependencies on the Linux cloud server. The purpose of the cloud server was to run my script without any interruptions for 24 hours. However, I had to make the UI present but not visible because there is no screen on a cloud computer. Thus, I used xvfb through pyvirtualdisplay (available on PyPI), a library that was imported into the script. This feature allowed an invisible UI to exist without being shown on a screen. To understand this concept, you must understand how a computer display works. A computer queues up the next visual images one after the other and shows them on the screen to the user. This is how we can browse from page to page or watch a video. On the cloud server, there is no physical screen on which to show these images. Thus, when I try to launch a Chrome browser, the browser cannot open and an error occurs. However, by using xvfb, we can queue the images up on the server without visually presenting them on a screen. In doing so, the cloud server loads the UI behind the scenes without erroring. This allows us to run UI scripts on the screenless cloud. It is important to note that the cloud server is like a separate invisible computer. When I disconnect from the cloud server, my session ends, much like logging off a computer. When the session ends, everything started in that session closes, including the scripts.
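The invisible display described above can be sketched as follows. This is a minimal illustration rather than the code in my scripts: it assumes the pyvirtualdisplay package and the Xvfb program are installed on the server, and the helper name virtual_display and its display_factory parameter are hypothetical additions that let the sketch be tried without Xvfb.

```python
from contextlib import contextmanager


@contextmanager
def virtual_display(size=(1280, 800), display_factory=None):
    """Start an off-screen display, hand it to the caller, and always stop it afterwards."""
    if display_factory is None:
        # Imported lazily so the sketch can be read and tested without Xvfb installed.
        from pyvirtualdisplay import Display
        display_factory = Display
    display = display_factory(visible=0, size=size)  # visible=0: nothing is drawn on a real screen
    display.start()
    try:
        yield display
    finally:
        display.stop()


# Usage on the cloud server (assumes Selenium and chromedriver are set up):
#
#     from selenium import webdriver
#     with virtual_display():
#         driver = webdriver.Chrome()   # Chrome opens on the hidden display
#         ...                           # log in, click through topics, scrape
#         driver.quit()
```

Wrapping the display in a context manager is a design choice that guarantees the virtual display is stopped even if the scraping code raises an error partway through.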
To keep the script running even after I exit the cloud server, I used nohup. Nohup is a command that prevents the command that follows it from being aborted automatically when you log out or exit the server32. When the command is also sent to the background, the command prompt returns to its normal state after the command is submitted. Figure 5 shows an example of a nohup command. Nohup is followed by the command that runs the script, python3 -u, which is followed by the script name, fb_trends_cloud_tab.py. The part after > writes the command prompt output to a text file, which is used for debugging purposes. Lastly, the & runs the command in the background, so the command prompt is free again right away, while nohup keeps the script alive after I close the connection to the cloud server. As shown in Figure 6, when the command in Figure 5 is executed, the output of the script is written to a text file instead of the command prompt. By running the script on the cloud, using xvfb, and using nohup, I could collect data without any interruptions for 24 hours.

nohup python3 -u fb_trends_cloud_tab.py > trends_only5.txt &

Figure 5: Nohup Example
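For readers who want to start a long-running script from Python instead of typing the command in Figure 5, the sketch below shows a rough analogue using the standard subprocess library. It is an illustration under assumptions, not something used in this study: the function name launch_detached is hypothetical, and the start_new_session option only has an effect on Linux and other POSIX systems.

```python
import subprocess


def launch_detached(args, log_path):
    """Start a command that keeps running after the launching session ends.

    Rough Python analogue of `nohup <command> > log_path &` on Linux.
    """
    log_file = open(log_path, "w")
    process = subprocess.Popen(
        args,
        stdout=log_file,            # like `> trends_only5.txt`
        stderr=subprocess.STDOUT,   # keep error messages in the same log file
        stdin=subprocess.DEVNULL,   # the script does not read from the keyboard
        start_new_session=True,     # POSIX only: detach from the terminal's session
    )
    log_file.close()  # the child process keeps its own copy of the file handle
    return process    # returns immediately, like `&`


# Example (file names taken from Figure 5):
#     launch_detached(["python3", "-u", "fb_trends_cloud_tab.py"], "trends_only5.txt")
```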

32 “Linux Nohup Command Help and Examples.” Computer Hope, 1 Apr. 2018, www.computerhope.com/unix/unohup.htm.


Figure 6: Nohup Command Prompt Results Sent to Text File

Another benefit of switching from local to cloud is that anyone with the credentials can access the cloud server. This means I can access the cloud from anywhere in the world, at any time, from any device. I can also share the account with another person in case we need to debug a script together. This makes collaboration and communication easier because everyone working on the project has the latest code. For my thesis, I wrote all the code myself. However, this is important to note for developers who wish to use my code in the future or to further this research.

Config File

A config file was used in the script to keep track of settings such as Facebook login credentials, data collection intervals, and CSV file and MySQL database names. This file contains the login information for the different accounts. A config file was used because it creates a single location for all of these settings. This makes it easier to change minor but critical variables in order to collect different types of data. Otherwise, one would need to go into every Python script and change the variables in every place they occur. A config file makes this process dynamic and centralized. This file will not be uploaded to GitHub because it contains private login information. For future development of my thesis, I highly suggest you create your own config.ini file and include your Facebook login credentials. You can also include other settings such as interval length and filenames.

Data

There are four main dataset categories that were collected: (1) trends, (2) trends and tabs, (3) different geographic locations, and (4) personal vs puppet Facebook accounts. In this paper, I will refer to the four datasets by the names mentioned above.

(1) Trends

The trends category’s data was collected from the “Trending” section on the right side of Facebook’s home page, as shown in Figure 7.
There are five topics in the Trending section: Top Trends, Politics, Science and Technology, Sports, and Entertainment. Each topic had a maximum of 10 trending news items. Each topic usually had 10 trends, but there


were instances where there were fewer than 10. Notice that in Figure 7 only three trends are shown. Once “See More” was clicked, the rest of the trends appeared.

Figure 7: Facebook Trending Page

For this category, I used a puppet Facebook account that had a bare profile. The only information provided on the puppet account was a fake first and last name, a phone number, a fake birthday, and a profile picture. It had no friends and no activity. The data was collected by first signing into the puppet Facebook account and loading all the HTML elements on the home page. It is important to note that Facebook does not load all five topics when you log into your account. It only loaded the topic the user was currently on, which by default is Top Trends. To load the other four topics, I wrote a function that clicked through every topic before scraping the data. Once the topics were clicked, the HTML elements were loaded and available to be collected. My script searched for specific HTML elements I defined and collected the corresponding data. For this category, I collected the following:

● Type: The topic (top trends, politics, science and technology, sports, or entertainment)
● Title: Title of news trend
● Description: The short description located under the title (Figure 7)
● Trend Link: The link that redirects the user when the trend is clicked. The link redirects the user to a compilation of news on a Facebook page.
● Rank: Where the trend is ranked in the trending list
● Scrape ID: An integer to keep track of which round of scraping the data is collected from
● Timestamp: The exact time and day the data was collected (YY-MM-DD HH-MM-SS)

Type was used to distinguish the different topics during the analysis. The analysis investigated the top trends for all five topics combined in addition to the top trends for each topic. Title and description were collected to provide detail on each news trend. The trend link was used to uniquely identify each trend. Rank was collected to discover where in the list certain news trends were placed and whether they moved up or down the list.
Scrape ID was used with trend link to uniquely identify trends across the whole 24-hour dataset. Timestamp was used to keep track of the time and date the data was collected, which was critical when analyzing trend behavior on different days of the week. I collected five rounds of 24-hour data on separate days locally. The data collected are listed in Figure 8. Trends 2 was collected at a high frequency (one-minute intervals) to analyze when and how often trends change. It investigated whether news trends change as often as every minute or whether a 5-minute, 10-minute, or longer interval was sufficient to catch most of the news trend updates. Trends 1 and 2 results were compared to investigate whether news trend behavior is consistent between the same weekdays in different weeks. Trends 1 was compared to Trends 3 to investigate whether news trend behavior is consistent between the weekday and the weekend. Trends 4 and 5 were compared to investigate whether different weekdays have different news trend behavior. For datasets that have different intervals, I only included data at the longer interval so that the interval variable is consistent between the two datasets. In conclusion, for the Trends category I conducted analysis to answer the following questions:
1. How often are trends updating?
2. Is there different news trend behavior on the weekend versus the weekday? What about on different weekdays?

            Start Date/Time               End Date/Time                 Interval per Round (min)
Trends 1    Wednesday 3/7/18, 4:30PM      Thursday 3/8/18, 4:30PM       5
Trends 2    Wednesday 3/14/18, 10:30PM    Thursday 3/15/18, 10:30PM     1
Trends 3    Saturday 4/21/18, 6:00PM      Sunday 4/22/18, 6:00PM        5
Trends 4    Thursday 4/26/18, 1:00PM      Friday 4/27/18, 1:00PM        5
Trends 5    Tuesday 5/15/18, 10:00AM      Wednesday 5/16/18, 10:00AM    10

Figure 8: Trends Data Information

To answer the questions, I calculated the Jaccard similarity. Jaccard similarity compares two sets and measures how similar they are33. The result is a number between 0 and 1, where a higher number means the two sets have more in common. Jaccard similarity is the size of the intersection divided by the size of the union of the two sets. The computation was written in Python and run as a script. The script then wrote the results to a CSV file, and graphs were produced with Excel and matplotlib, a Python library used for graphing. The analysis calculated the average, minimum, maximum, and standard deviation of the computed Jaccard similarities. The script I wrote also calculated the Jaccard similarities, average, minimum, maximum, and standard deviation for every 5-minute interval, starting from the interval at which the data was collected up to 60 minutes. Furthermore, I created a Python script to calculate the cumulative new trends for every 5-minute interval, starting from the interval at which the data was collected up to 60 minutes. This analysis showed how many brand-new trends appeared in the 24 hours and the results are printed into a

33 “Jaccard Index / Similarity Coefficient.” Statistics How To, www.statisticshowto.com/jaccard-index/.


csv file. The graphs for this analysis were also constructed using the same tools as the Jaccard similarities.
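The Jaccard similarity and cumulative new trends calculations can be sketched in a few lines of Python. This is a minimal sketch and not the thesis script itself. It assumes that each scraping round is represented as a set of trend links, that similarity is computed between consecutive rounds, that the standard deviation is the population standard deviation, and that the first round serves as the baseline for counting new trends. All function names are hypothetical.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def resample(rounds, collected_every, interval):
    """Keep only the rounds that fall on a coarser interval (both in minutes)."""
    if interval % collected_every != 0:
        raise ValueError("interval must be a multiple of the collection interval")
    return rounds[:: interval // collected_every]


def jaccard_summary(rounds):
    """Compare each round with the next one and summarize the similarities."""
    scores = [jaccard(prev, cur) for prev, cur in zip(rounds, rounds[1:])]
    return {
        "average": mean(scores),
        "minimum": min(scores),
        "maximum": max(scores),
        "std_dev": pstdev(scores),
    }


def cumulative_new_trends(rounds):
    """Count trends that had not appeared in any earlier round."""
    seen = set(rounds[0])  # assumption: the first round is the baseline
    new_count = 0
    for current in rounds[1:]:
        fresh = set(current) - seen
        new_count += len(fresh)
        seen |= fresh
    return new_count


# Four made-up rounds collected every 5 minutes; letters stand in for trend links.
rounds = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "d"}, {"b", "d", "e"}]
summary_5_min = jaccard_summary(rounds)
new_5_min = cumulative_new_trends(rounds)
new_10_min = cumulative_new_trends(resample(rounds, 5, 10))
```

In this toy example, sampling every 5 minutes finds two new trends (d and e), while sampling every 10 minutes finds only one, which illustrates why longer intervals can miss short-lived trends.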

(2) Trends and Tabs

The Trends and Tabs category was an extension of (1) Trends. In addition to collecting the Trends data, Tabs information was also collected. Tabs data was collected from the news trend page, which consisted of the top articles for that news topic. Once a user clicked on a news trend from the Trending section, as shown in Figure 7, she was redirected to a Facebook page with news articles and the Facebook community’s comments and likes, as shown in Figure 9.

Figure 9: Tabs – News Trends Facebook Page

For this category, I used the same puppet account and methodology to collect the Trends data as in (1) Trends. To collect the Tabs data, I opened a new tab for each news trend’s link and scraped the data from the first box containing the news sources for that topic. Once on the tabs page, the news source links did not load unless the user hovered over each news box. Thus, I wrote a hover function that hovered over each box and waited for each news source’s direct link to load before scraping the data. My script searched for specific HTML elements I defined and collected the corresponding data. For this category, the same attributes were collected for the Trends portion as in (1) Trends. For the Tabs portion, I collected the following:

• Timestamp: The exact time and day the data was collected (YY-MM-DD HH-MM-SS)
• Scrape ID: An integer to keep track of which round of scraping the data is collected from
• Type: The topic (top trends, politics, science and technology, sports, or entertainment)
• Rank: Where the news source is ranked in the list of news sources on tabs page


• Title: Title of article for the specific news source
• Source: News source of the article
• Published Date: When the article was published
• Time Since: Time since the article was published on Facebook
• Description: Description of the article
• URL: Direct URL to the article on the original news source page

Timestamp was used to keep track of the time and date the data was collected. The Scrape ID was used to keep track of which round of scraping the data was from. The Type was used to distinguish the different topics during analysis. Rank was important for tracking which news sources Facebook prioritized. Title and description were collected to provide detail on each news article. Source was used to analyze which news sources Facebook gave more news exposure to. Published Date and Time Since were collected to track how recent the news articles were. URL was used to uniquely identify each news article. Facebook only provided Published Date, Time Since, and Description data for the first news source. Collecting Tabs data required a lot of interaction with Facebook in a short amount of time. However, Facebook has been making a serious effort to block data collection. As a result, only limited Tabs data could be collected. Furthermore, the intervals were longer than in (1) Trends because there was more data to collect and each scrape required more time. I collected a 26-hour dataset locally. The data collected are listed in Figure 10. I conducted analysis to answer the following questions:
1. Which news sources does Facebook give the most news exposure to?
2. How many news articles from external news sources does Facebook publish?
3. Which news sources does Facebook include in the “Trending” section?
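The three questions above reduce to counting and de-duplicating the Source and URL attributes. A minimal sketch using Counter from Python's collections library is shown here. The rows are made-up placeholders rather than collected data, and treating every appearance of a source as one unit of exposure is an assumption of this sketch.

```python
from collections import Counter

# Made-up rows in the shape (type, source, url); real rows would come from the database.
rows = [
    ("politics", "Source A", "https://example.com/a1"),
    ("politics", "Source B", "https://example.com/b1"),
    ("sports", "Source A", "https://example.com/a2"),
    ("sports", "Source A", "https://example.com/a2"),  # same article seen in a later scrape
]

# Question 1: how often each source appears, overall and per topic.
overall = Counter(source for _, source, _ in rows)
per_topic = {}
for topic, source, _ in rows:
    per_topic.setdefault(topic, Counter())[source] += 1
ranking = overall.most_common()  # most frequently exposed source first

# Question 2: number of unique articles, identified by URL.
unique_articles = {url for _, _, url in rows}

# Question 3: the set of distinct sources that appeared at all.
unique_sources = {source for _, source, _ in rows}
```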
                     Start Date/Time                 End Date/Time                 Interval (min)
Trends and Tabs 1    Wednesday, 5/15/18, 10:00AM     Thursday, 5/16/18, 12:00PM    30

Figure 10: Trends and Tabs Data Information

To answer the first question, I calculated how often certain news sources were exposed overall and per topic. To answer the second question, I calculated the number of unique articles overall and per topic. To answer the third question, I used a set to include all the unique news sources Facebook exposed in the data collected. The computation was written in Python and run as a script. The script I wrote used Counter from the collections library to rank the news sources by how frequently they appeared. I exported the data from the MySQL database and conducted data aggregations in Python. The script then wrote the results to a CSV file, and graphs were produced with Excel and matplotlib.

(3) Geographic Location

For the geographic location category, I used the puppet account to gather data from two locations: Northern California and Chicago, Illinois. I obtained a proxy with an Internet Protocol (IP) address located in Northern California through AWS. Every device has an IP address associated with it. When a device connects to a website, the online connection gives your


computer an address, so the website knows how to send information to your computer34. This IP address identifies where the device is in the world. To make Facebook think I was logging in from a different location, I used an IP address from Northern California. The purpose of this experiment was to discover whether Facebook presented different news trends in different parts of the world. The Facebook account, time, and day were held constant. The variables collected in this experiment are the same variables collected in (1) Trends, and the same puppet account was used. I collected two datasets of 32 hours from 12:00AM, Sunday, May 13, 2018 to 3:00PM, Monday, May 14, 2018 locally. The data collected are listed in Figure 11. Geo-location 1 was compared with Geo-location 2 to analyze whether there were any differences between Facebook news trends depending on location. Every other variable was held constant. In conclusion, for the Geo-location category I conducted analysis to answer the following question:
1. Do news trends differ depending on geographic location? If so, how?

                  Location               Start Date/Time             End Date/Time              Interval per Round (min)
Geo-location 1    Northern California    Sunday, 5/13/18, 12:00AM    Monday, 5/14/18, 3:00PM    10
Geo-location 2    Chicago, IL            Sunday, 5/13/18, 12:00AM    Monday, 5/14/18, 3:00PM    10

Figure 11: Geographic Location Data Information

To answer the question, I analyzed the differences between the two datasets in terms of similarities and the number of unique trends overall and per topic. The results consisted of the number of trends for each location, the number of unique trends for each location, and the number of unique trends shared by the two locations. This analysis was conducted for both the overall news trends and per topic. The computation was written in Python and run as a script. I exported the data from the MySQL database and conducted data aggregations in Python. The script then wrote the results to a CSV file, and tables were produced with Excel.

(4) Personal vs Puppet

For the Personal vs Puppet category, I used the same puppet account as in the previous categories and my personal Facebook account. My personal Facebook account was created in 2008 and included a great deal of personal information. It had numerous pictures, statuses, likes, comments, and overall activity. For the privacy and safety of my account, I will not discuss my personal account in detail. The config file was very helpful for this category’s data collection because it allowed private information to be kept separate from the rest of the scripts. This was especially critical when I uploaded my code to GitHub, where other people have access to it. The purpose of this experiment was to discover whether Facebook presented different news trends to different Facebook accounts. More specifically, I investigated whether Facebook personalized its news trends based on what it believes a specific user would be interested in. This would be linked to whether Facebook used the data it collected from its users to present specific

34 “What Is a Proxy Server and Should You Risk Using One?” WhatIsMyIPAddress.com, whatismyipaddress.com/proxy-server


types of news trends on their profiles. The bare puppet account did not have any data or activity, so it was hypothesized that its news trends would be more general. However, my personal account had 10 years of data and activity. Thus, it was hypothesized to have news trends that would be personalized to me. As a result, I theorized that the news trends of the two profiles would be different. The only variable that changed was the Facebook account, because I collected data on both the puppet and the personal account. All other variables were held constant. The variables collected in this experiment are the same variables collected in (1) Trends, and the same puppet account was used. I collected two 40-hour datasets locally and the data collected are listed in Figure 12. The Personal 1 and Puppet 1 datasets were analyzed to investigate whether there was personalization of news trends. In conclusion, I conducted analysis to answer the following question:
1. Does Facebook personalize news trends by user? More specifically, do news trends differ depending on the Facebook account?

              Start Date/Time               End Date/Time                Interval per Round (min)
Personal 1    Sunday, 5/13/2018, 2:00PM     Tuesday, 5/15/18, 6:30PM     10
Puppet 1      Sunday, 5/13/2018, 2:00PM     Tuesday, 5/15/18, 6:30PM     10

Figure 12: Personal vs Puppet Data Information

To answer the question, I analyzed the differences between the pair of datasets in terms of similarities and the number of unique trends per topic. The results consisted of the number of trends for each account, the number of unique trends for each account, and the number of unique trends shared by the two accounts. The analysis for this category was like the analysis for (3) Geographic Location but with a focus on per-topic data. The computation was written in Python and run as a script. I exported the data from the MySQL database and conducted data aggregations in Python. The script then wrote the results to a CSV file, and tables were produced with Excel.
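The comparison used for both (3) Geographic Location and (4) Personal vs Puppet can be sketched with Python sets. This is a minimal illustration under assumptions, not the thesis script: each dataset is assumed to be a list of (topic, trend link) pairs, and the function name compare_trends and the sample rows are hypothetical.

```python
def compare_trends(rows_a, rows_b):
    """Compare two datasets given as lists of (topic, trend_link) pairs."""

    def summarize(links_a, links_b):
        unique_a, unique_b = set(links_a), set(links_b)
        return {
            "total_a": len(links_a),              # every scraped row, repeats included
            "total_b": len(links_b),
            "unique_a": len(unique_a),            # distinct trends per dataset
            "unique_b": len(unique_b),
            "shared": len(unique_a & unique_b),   # distinct trends seen in both
        }

    result = {
        "overall": summarize(
            [link for _, link in rows_a], [link for _, link in rows_b]
        )
    }
    for topic in sorted({topic for topic, _ in rows_a + rows_b}):
        result[topic] = summarize(
            [link for t, link in rows_a if t == topic],
            [link for t, link in rows_b if t == topic],
        )
    return result


# Made-up rows: dataset A could be the puppet account, dataset B the personal account.
rows_a = [("politics", "p1"), ("politics", "p1"), ("sports", "s1")]
rows_b = [("politics", "p1"), ("politics", "p2"), ("sports", "s2")]
comparison = compare_trends(rows_a, rows_b)
```

If the shared count equals both unique counts, the two datasets saw exactly the same trends, which would indicate no personalization along the variable being tested.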

IV. Results

The purpose of this research was to answer whether Facebook Trends affect news exposure. There are four categories: trends, trends and tabs, geo-location, and personal vs puppet accounts. This section presents the results for each category. When I refer to all intervals, I mean the 5-minute steps from 5 to 60 minutes. When I refer to new cumulative news trends, I mean brand new, unique news trends.

(1) Trends

Five datasets were analyzed to answer the two questions listed in the methodology section. The first dataset showed that Top Trends had the highest cumulative new trends on average. The other four topics had similar amounts of cumulative new trends for all intervals, as shown in Figure 13. In a span of 24 hours, there were 43 new trends in Top Trends and between 22 and 28 new trends in the other topics for 5-minute intervals. Furthermore, for 60-minute intervals, there were only 16 new trends for Top Trends, while the other four topics had between 16 and 18 new trends.

Cumulative New Trends - Trend 1
3/7/18 - 3/8/18 | 4:30PM

Interval (min)     5   10   15   20   25   30   35   40   45   50   55   60
Top Trends        43   38   37   28   27   28   24   17   23   22   25   16
Politics          28   26   25   23   19   20   18   20   19   18   20   18
Science/Tech      26   24   22   21   20   19   17   18   18   19   19   18
Sports            22   21   21   19   19   19   17   17   17   17   16   16
Entertainment     25   25   24   25   18   20   18   21   17   17   20   16

Figure 13: Trends 1 – Cumulative New Trends

On average, Top Trends had the lowest Jaccard similarity for all intervals. Figure 14 shows that Top Trends’ Jaccard similarity at 5-minute intervals was about the same as that of the other topics at longer intervals. Additionally, Top Trends’ Jaccard similarity for 5-minute intervals had the largest difference between the minimum (0.182) and maximum (1) compared to the other four topics: Politics minimum (0.667) and maximum (1), Science/Tech minimum (0.538) and maximum (1), Sports minimum (0.667) and maximum (1), and Entertainment minimum (0.538) and maximum (1), as shown in Appendix 1.


Jaccard Similarity Average - Trend 1
3/7/18 - 3/8/18 | 4:30PM

Interval (min)      5    10    15    20    25    30    35    40    45    50    55    60
Top Trends       0.58  0.36  0.47  0.31  0.44  0.26  0.44  0.20  0.29  0.21  0.31  0.22
Politics         0.87  0.81  0.74  0.69  0.69  0.60  0.61  0.54  0.46  0.49  0.42  0.42
Science/Tech     0.89  0.81  0.80  0.75  0.72  0.68  0.72  0.68  0.58  0.56  0.54  0.39
Sports           0.87  0.81  0.74  0.75  0.73  0.65  0.65  0.65  0.58  0.60  0.58  0.54
Entertainment    0.85  0.77  0.73  0.68  0.73  0.65  0.67  0.54  0.58  0.54  0.44  0.54

Figure 14: Trends 1 – Jaccard Similarity Average

The second dataset was used to investigate how often news trends change. Figure 15 shows that for one-minute intervals, Top Trends had 74 new cumulative trends, Politics 41, Science/Tech 33, Sports 39, and Entertainment 42 in 24 hours. In Figure 16, for 5 and 10-minute intervals respectively, Top Trends had 73 and 73 new cumulative trends, Politics 41 and 41, Science/Tech 33 and 32, Sports 38 and 37, and Entertainment 39 and 38. The biggest difference is between the 10 and 15-minute intervals for Top Trends, by a difference of 3; between the 30 and 35-minute intervals for Politics, by a difference of 3; between the 40 and 45-minute intervals for Science/Tech, by a difference of 2; and between the 15 and 20-minute intervals for Sports, by a difference of 2. Entertainment consistently has a difference of one between intervals. This dataset had a much higher number of cumulative new trends than Trends 1 even though both were collected from Wednesday to Thursday.

Type             Interval (min)    New Trends
Top Trends       1                 74
Politics         1                 41
Science/Tech     1                 33
Sports           1                 39
Entertainment    1                 42

Figure 15: Trends 2 – Cumulative New Trends: One Minute Intervals


Cumulative New Trends - Trends 2
3/14/18 - 3/15/18 | 10:30PM

Interval (min)     5   10   15   20   25   30   35   40   45   50   55   60
Top Trends        73   73   70   68   65   65   63   60   60   58   57   58
Politics          41   41   40   39   40   40   37   37   36   37   36   37
Science/Tech      33   32   33   32   33   32   33   32   30   30   30   30
Sports            38   37   37   35   37   35   33   35   34   35   33   34
Entertainment     39   38   38   38   37   37   38   37   36   37   35   36

Figure 16: Trends 2 – Cumulative New Trends: 5 to 60-Minute Intervals

For one-minute intervals, Entertainment had the lowest average Jaccard similarity at 0.800, followed by Top Trends at 0.803, as shown in Figure 17. The other three topics were between 0.93 and 0.97. For 5 to 60-minute intervals, the results showed that Top Trends had the lowest Jaccard similarity on average, followed by Entertainment, with no clear ranking among the other three topics, as shown in Figure 18. The analyzed data table in Appendix 1 showed that Entertainment’s Jaccard similarity is, on average, 0.10 lower than Top Trends’.

Jaccard Similarity Average - Trend 2
3/14/18 - 3/15/18 | 10:30PM

[Bar chart of the average Jaccard similarity per topic at one-minute intervals: Top Trends 0.803, Entertainment 0.800, and 0.969, 0.938, and 0.926 for the remaining three topics (Politics, Science/Tech, and Sports).]

Figure 17: Trends 2 – Jaccard Similarity Average: One-Minute Intervals


Jaccard Similarity Average - Trend 2
3/14/18 - 3/15/18 | 10:30PM

Interval (min)      5    10    15    20    25    30    35    40    45    50    55    60
Top Trends       0.77  0.68  0.62  0.57  0.54  0.55  0.50  0.48  0.48  0.48  0.56  0.38
Politics         0.93  0.90  0.89  0.87  0.85  0.83  0.78  0.78  0.77  0.78  0.75  0.73
Science/Tech     0.93  0.90  0.88  0.88  0.82  0.84  0.80  0.80  0.83  0.77  0.82  0.79
Sports           0.94  0.92  0.86  0.88  0.85  0.82  0.88  0.77  0.86  0.79  0.83  0.74
Entertainment    0.81  0.76  0.62  0.69  0.65  0.65  0.71  0.57  0.64  0.48  0.76  0.60

Figure 18: Trend 2 - Jaccard Similarity Average: 5 to 60-Minute Intervals

Dataset Trends 3 was used to investigate whether there were different levels of news activity on the weekend versus the weekday. As shown in Figure 19, the most apparent result was that Science/Tech only had six new cumulative news trends for all intervals. In other words, there were only six new news trends in the 24-hour period. Top Trends had the highest cumulative new trends for every interval, then Sports, then Politics, then Entertainment. Between the 5 and 60-minute intervals, Politics, Sports, and Entertainment only had differences of 4 or 5 cumulative new trends. Top Trends had a difference of 17 cumulative new trends.

Cumulative New Trends - Trend 3 4/21/18 - 4/22/18 | 6:00PM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 46 | 44 | 40 | 42 | 36 | 36 | 35 | 34 | 34 | 31 | 31 | 29
Politics | 25 | 25 | 24 | 25 | 23 | 24 | 25 | 24 | 22 | 23 | 23 | 20
Science/Tech | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6
Sports | 29 | 29 | 27 | 28 | 26 | 27 | 27 | 26 | 26 | 23 | 25 | 24
Entertainment | 24 | 24 | 23 | 24 | 23 | 22 | 23 | 24 | 21 | 21 | 22 | 20

Figure 19: Trend 3 – Cumulative New Trends: 5 to 60-Minute Intervals
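One reading of “cumulative new trends” that is consistent with the text (Science/Tech’s count of six is described as six news trends in total) is the number of distinct topics seen across the sampled snapshots, each counted the first time it appears. The sketch below implements that reading; the definition, the function name, and the sample data are assumptions for illustration only.

```python
def cumulative_new_trends(snapshots, step=1):
    """Count topics the first time they are seen in every `step`-th snapshot."""
    seen = set()
    new_count = 0
    for snapshot in snapshots[::step]:
        fresh = set(snapshot) - seen  # topics not observed in earlier samples
        new_count += len(fresh)
        seen |= fresh
    return new_count


# Hypothetical scrapes: topics "C" and "D" appear only briefly.
snapshots = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B"},
    {"B", "D"},
    {"B", "E"},
]
for step in (1, 2, 4):
    print(step, cumulative_new_trends(snapshots, step))
```

Because a coarser sample is a subset of the full sequence of scrapes, it can only miss short-lived topics relative to the finest interval, which is why the counts tend to fall as the interval grows.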


Figure 20 shows that Top Trends’ Jaccard Similarity was lower than that of the other four topics, while Politics, Sports, and Entertainment were relatively close to each other. Science/Tech consistently had a higher Jaccard Similarity on average than the other topics. Jaccard Similarity trends downward as the time interval increases. At the 50-minute interval, Politics has a sharp decrease but returns to the usual trend at the 55-minute interval. The analyzed data for all intervals are presented in Appendix 1.
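The sharp decrease described above can be located by differencing a topic’s series from one interval to the next. The sketch below uses the Politics averages reported in Figure 20; the helper name `largest_drop` is hypothetical.

```python
intervals = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
politics = [0.92, 0.86, 0.82, 0.78, 0.72, 0.73, 0.68, 0.67, 0.66, 0.52, 0.59, 0.61]


def largest_drop(xs, ys):
    """Return (x, change) for the biggest fall between neighbouring points."""
    changes = [
        (x, round(cur - prev, 2))
        for x, prev, cur in zip(xs[1:], ys, ys[1:])
    ]
    return min(changes, key=lambda item: item[1])


interval, change = largest_drop(intervals, politics)
print(interval, change)  # 50 -0.14
```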

Jaccard Similarity Average - Trend 3 4/21/18 - 4/22/18 | 6:00PM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 0.83 | 0.73 | 0.70 | 0.61 | 0.57 | 0.52 | 0.53 | 0.46 | 0.40 | 0.35 | 0.37 | 0.34
Politics | 0.92 | 0.86 | 0.82 | 0.78 | 0.72 | 0.73 | 0.68 | 0.67 | 0.66 | 0.52 | 0.59 | 0.61
Science/Tech | 0.97 | 0.94 | 0.90 | 0.89 | 0.84 | 0.81 | 0.80 | 0.77 | 0.74 | 0.73 | 0.68 | 0.69
Sports | 0.91 | 0.83 | 0.82 | 0.76 | 0.72 | 0.67 | 0.65 | 0.62 | 0.58 | 0.59 | 0.54 | 0.47
Entertainment | 0.93 | 0.88 | 0.84 | 0.81 | 0.76 | 0.76 | 0.73 | 0.67 | 0.67 | 0.65 | 0.61 | 0.61

Figure 20: Trend 3 - Jaccard Similarity Average: 5 to 60-Minute Intervals

Dataset Trends 4 was used to investigate whether different weekdays have different news trend behavior. Figure 21 shows that Top Trends had the highest cumulative new trends, followed by Sports, for most intervals (the exceptions being the 30, 40, 55, and 60-minute intervals), with an unclear ranking among Politics, Science/Tech, and Entertainment. Between the 5 and 60-minute intervals, cumulative new trends differed by 28 for Top Trends, 16 for Sports, 13 for Politics, 10 for Science/Tech, and 11 for Entertainment.


Cumulative New Trends - Trends 4 4/26/18 - 4/27/18 | 1:00PM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 60 | 55 | 51 | 47 | 43 | 40 | 36 | 39 | 39 | 31 | 32 | 32
Politics | 37 | 36 | 34 | 32 | 32 | 28 | 27 | 28 | 27 | 23 | 25 | 24
Science/Tech | 35 | 35 | 29 | 30 | 31 | 29 | 27 | 26 | 25 | 26 | 24 | 25
Sports | 49 | 47 | 46 | 45 | 43 | 42 | 36 | 42 | 38 | 28 | 37 | 33
Entertainment | 35 | 34 | 31 | 31 | 28 | 30 | 27 | 27 | 28 | 23 | 27 | 24

Figure 21: Trend 4 – Cumulative New Trends: 5 to 60-Minute Intervals

Figure 22 shows that Top Trends and Sports had the lowest Jaccard Similarity averages, while the other three topics had higher Jaccard Similarity averages that were similar to each other. On average, Jaccard Similarity decreases as the interval increases. There is a sharp increase at the 15-minute interval for Science/Tech, Entertainment, and Sports, and a sharp decrease at the 25-minute interval for Science/Tech and Politics. For most sharp changes, the series returns to its usual trend at the next interval.

Jaccard Similarity Average - Trend 4 4/26/18 - 4/27/18 | 1:00PM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 0.79 | 0.67 | 0.58 | 0.54 | 0.48 | 0.44 | 0.46 | 0.37 | 0.32 | 0.35 | 0.33 | 0.24
Politics | 0.85 | 0.74 | 0.74 | 0.69 | 0.51 | 0.64 | 0.59 | 0.55 | 0.48 | 0.42 | 0.49 | 0.45
Science/Tech | 0.82 | 0.67 | 0.78 | 0.60 | 0.46 | 0.61 | 0.63 | 0.61 | 0.59 | 0.35 | 0.57 | 0.44
Sports | 0.75 | 0.57 | 0.62 | 0.50 | 0.46 | 0.42 | 0.48 | 0.40 | 0.39 | 0.36 | 0.31 | 0.29
Entertainment | 0.80 | 0.63 | 0.71 | 0.58 | 0.59 | 0.51 | 0.50 | 0.59 | 0.55 | 0.45 | 0.50 | 0.48


Figure 22: Trend 4 Jaccard Similarity Averages: 5 to 60-Minute Intervals

Dataset Trends 5 was compared with Trends 4 to investigate whether different weekdays have different news trend behavior, and with Trends 2 to investigate whether news trend behavior is consistent between daytime and nighttime. Figure 23 shows that, for every interval, Top Trends had the highest cumulative new trends, followed by Politics, then Science/Tech and Entertainment roughly tied, and then Sports. Top Trends had the biggest difference between the 5 and 60-minute intervals at 49, then Politics at 22, then Science/Tech and Entertainment at 12, then Sports at 8.

Cumulative New Trends - Trends 5 5/15/18 - 5/16/18 | 10:00AM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 83 | 70 | 64 | 55 | 52 | 50 | 47 | 40 | 41 | 42 | 35 | 34
Politics | 53 | 49 | 45 | 43 | 40 | 40 | 39 | 36 | 37 | 37 | 31 | 31
Science/Tech | 43 | 41 | 40 | 38 | 38 | 37 | 36 | 33 | 35 | 37 | 32 | 31
Sports | 34 | 32 | 33 | 30 | 32 | 31 | 29 | 27 | 28 | 30 | 25 | 26
Entertainment | 42 | 42 | 41 | 40 | 41 | 40 | 37 | 34 | 37 | 36 | 31 | 30

Figure 23: Trend 5 – Cumulative New Trends: 5 to 60-Minute Intervals

Figure 24 shows that Science/Tech had the highest Jaccard Similarity for all intervals. Top Trends had the lowest Jaccard Similarity most of the time, except at the 20, 25, and 50-minute intervals, when Entertainment had the lowest. There is a sharp increase at the 15-minute interval and a sharp decrease at the 50-minute interval for Sports and Entertainment. The other three topics did not have any sharp increases or decreases. In general, Jaccard Similarity decreases for all topics as the interval increases.


Jaccard Similarity Average - Trends 5 5/15/18 - 5/16/18 | 10:00AM

Interval (min) | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60
Top Trends | 0.62 | 0.49 | 0.39 | 0.34 | 0.30 | 0.26 | 0.18 | 0.24 | 0.15 | 0.13 | 0.17 | 0.21
Politics | 0.76 | 0.66 | 0.58 | 0.48 | 0.45 | 0.42 | 0.37 | 0.30 | 0.25 | 0.25 | 0.29 | 0.30
Science/Tech | 0.79 | 0.71 | 0.68 | 0.61 | 0.52 | 0.52 | 0.45 | 0.46 | 0.37 | 0.30 | 0.33 | 0.41
Sports | 0.63 | 0.53 | 0.65 | 0.41 | 0.31 | 0.48 | 0.42 | 0.35 | 0.32 | 0.10 | 0.25 | 0.25
Entertainment | 0.66 | 0.55 | 0.63 | 0.32 | 0.23 | 0.40 | 0.37 | 0.29 | 0.23 | 0.04 | 0.20 | 0.29

Figure 24: Trend 5 – Jaccard Similarity Average: 5 to 60-Minute Intervals

The five datasets share some behavior. Top Trends tended to have the highest cumulative new trends and the lowest Jaccard Similarity averages across intervals and datasets. As expected, cumulative new trends and Jaccard Similarity averages decreased as the interval increased in all five datasets. Trends 1 and 2 showed that the same days of the week in different weeks do not have similar trend behavior: there were more cumulative new trends on Wednesday and Thursday, March 7-8, than on March 14-15, and the Jaccard Similarity averages differed as well. Trends 2 showed roughly double the number of cumulative new trends at 1-minute intervals compared with 5-minute intervals. As expected, the Jaccard Similarity averages were higher for 1-minute intervals than for 5-minute intervals. Trends 3 showed that cumulative new trends were lower on the weekend than on weekdays, and that Jaccard Similarity was slightly higher on weekdays than on the weekend. Trends 2 and 5 showed that news trends were more active in the daytime than at nighttime: daytime had slightly more cumulative new trends but significantly lower Jaccard Similarity averages than nighttime. Trends 4 and 5 showed that different weekdays do not have the same news trend behavior. The overall numbers of cumulative new trends differed, and the topics ranked differently against each other.

(2) Trends and Tabs

One dataset was analyzed to answer the three questions listed in the methodology section. Figure 25 shows that Top Trends had the highest total unique news articles, then Politics, then Science and Technology, then Entertainment, and then Sports. Overall, there were 6601 total unique news articles across all five topics, the majority of them from Top Trends and Politics.


Total Unique News Articles

Topic | Total Unique News Articles
Top Trends | 3109
Politics | 2242
Science and Technology | 509
Sports | 459
Entertainment | 493
Overall | 6601

Figure 25: Trends and Tabs 1 - Total Unique News Articles

Figure 26 shows the 15 news sources that had the highest exposure overall. MSN had the highest exposure with 356 articles, and the gap between the first and second sources was the largest between any two adjacent ranks in the top 15. The rest of the ranking is listed in Figure 26.

Ranking of News Source Exposure Overall

News Source | # of Articles
MSN | 356
CNN | 272
The Hill | 202
Fox News | 199
The New York Times | 198
Reuters | 190
USA TODAY | 163
CBS News | 137
NBC News | 127
HuffPost | 126
Washington Post | 124
BBC News | 106
Business Insider | 104
Yahoo | 104
People | 103

Figure 26: Trends and Tabs 1 - Ranking of News Sources Overall

Figure 27 shows the top 15 news sources Facebook published in each topic. MSN ranked 1st for Top Trends, Politics, and Science/Tech, 2nd for Sports, and 3rd for Entertainment. Many news sources, such as CNN, recur across topics, but some, such as NPR, appear in the top 15 of only one topic. The list of all news sources Facebook published in its “Trending” section can be found in Appendix 2.
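A ranking like those in Figures 26 and 27 could be produced by counting unique articles per source, overall and per topic. The sketch below assumes each scraped article is a (topic, source, url) record; that record layout, the function name, and the sample rows are hypothetical and not taken from the thesis scripts.

```python
from collections import Counter, defaultdict


def rank_sources(articles, top_n=15):
    """Rank sources by number of unique article URLs, overall and per topic."""
    overall_urls = set()
    seen_in_topic = set()
    overall = Counter()
    per_topic = defaultdict(Counter)
    for topic, source, url in articles:
        if url not in overall_urls:  # count each article once overall
            overall_urls.add(url)
            overall[source] += 1
        if (topic, url) not in seen_in_topic:  # and once per topic
            seen_in_topic.add((topic, url))
            per_topic[topic][source] += 1
    by_topic = {t: c.most_common(top_n) for t, c in per_topic.items()}
    return overall.most_common(top_n), by_topic


articles = [
    ("Top Trends", "MSN", "msn.example/a"),
    ("Politics", "MSN", "msn.example/a"),    # same article under a second topic
    ("Politics", "CNN", "cnn.example/b"),
    ("Top Trends", "MSN", "msn.example/c"),
    ("Top Trends", "MSN", "msn.example/c"),  # repeated scrape of the same article
]
overall, by_topic = rank_sources(articles)
print(overall)
print(by_topic["Politics"])
```

Counting an article once overall but once in each topic it appears under would be one explanation for why the per-topic totals in Figure 25 sum to more than the overall total of 6601.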

Rank | Top Trends | Politics | Science/Tech | Sports | Entertainment
1 | MSN 163 | MSN 124 | MSN 23 | The New York Times 24 | ESPN 27
2 | CNN 121 | CNN 122 | CNN 21 | MSN 24 | USA TODAY 25
3 | Fox News 108 | The Hill 115 | Reuters 21 | TechCrunch 20 | MSN 22
4 | The New York Times 104 | Reuters 88 | ABC News 13 | Engadget 16 | Bleacher Report 21
5 | USA TODAY 80 | CBS News 75 | Al Jazeera English 13 | ESPN 14 | New York Post 12
6 | People 75 | Fox News 68 | Fox News 12 | Reuters 13 | CBS Sports 11
7 | HuffPost 73 | The New York Times 56 | [name missing] 11 | Business Insider 10 | E! News 11
8 | BBC News 71 | Washington Post 55 | Washington Post 11 | Washington Post 10 | BBC News 10
9 | Reuters 68 | NBC News 54 | NBC News 10 | The Guardian 10 | Daily Mail 8
10 | The Hill 68 | [name missing] 48 | Business Insider 9 | Bleacher Report 8 | Fox News 8
11 | NBC News 56 | USA TODAY 46 | NPR 9 | USA TODAY 8 | Bloomberg 8
12 | Yahoo 53 | ABC News 44 | The Hill 9 | CBS Sports 8 | Washington Post 8
13 | CBS News 52 | Yahoo 42 | TechCrunch 8 | The Hill 8 | Deadline Hollywood 8
14 | CBS Sports 50 | HuffPost 39 | The Independent 8 | Yahoo Sports 7 | [name missing].com 8
15 | CNBC 48 | Business Insider 37 | The Verge 8 | CNNMoney 7 | The Independent 7

Figure 27: Trends and Tabs 1 - Ranking of News Sources per Topic

In conclusion, Facebook published a large number of news sources in its “Trending” section, but it gave more exposure to certain news sources than to others. This suggested there may have been some favoritism toward certain news sources. In total, Facebook published 6601 unique news articles on its “Trending” page in 26 hours.

(3) Geo-Location

Two datasets were analyzed to answer the question listed in the methodology section. Figure 28 shows that the only difference between trends in Chicago and Northern California was the quantity of news trends in Science/Tech, Sports, and Entertainment. In every topic, however, both locations had the same number of unique trends. This meant that every news trend exposed in Northern California was also exposed in Chicago, but at a lower frequency. Overall, there were 7917 news trends in Chicago and 7943 in Northern California, but only 129 unique news trends. Thus, on average, a Trending Topic was exposed for roughly 610 minutes, or 10.16 hours.

Topic | Total Trends in IL | Total Trends in CA | Total Unique Trends in IL | Total Unique Trends in CA | Total Similar Unique Trends Between IL and CA
Top Trends | 1930 | 1930 | 52 | 52 | 52
Politics | 1852 | 1852 | 33 | 33 | 33
Science and Technology | 852 | 858 | 15 | 15 | 15
Sports | 1640 | 1650 | 41 | 41 | 41
Entertainment | 1643 | 1653 | 32 | 32 | 32
Total: 173
Overall | 7917 | 7943 | 129 | 129 | 129

Figure 28: Proxy 1 – Total Trends and Similarity by Topic

(4) Personal vs Puppet

Two datasets were analyzed to answer the question listed in the methodology section. Figure 29 shows that there is a slight difference in the unique news trends between a personal and a puppet account. There were 95 unique trends for both accounts in Top Trends, but there was one Trending Topic that appeared on the personal and not the puppet account. Thus, there were only 93 similar unique trends. The news trend “Lewis Hamilton” appeared once in Top Trends for the personal account but did not appear in Top Trends for the puppet account. Lewis Hamilton news appeared in Sports for both accounts, which means this topic was promoted from Sports to Top Trends for the personal account. The news trend “Peru Two” appeared once in the personal account but not in the puppet account for any of the topics.

Topic | Total Trends in Personal | Total Trends in Puppet | Total Unique Trends in Personal | Total Unique Trends in Puppet | Total Similar Unique Trends Between Personal and Puppet
Top Trends | 2559 | 2559 | 95 | 95 | 93
Politics | 2507 | 2507 | 52 | 52 | 52
Science and Technology | 1854 | 1858 | 30 | 30 | 30
Sports | 1937 | 1938 | 40 | 40 | 40
Entertainment | 2260 | 2249 | 42 | 42 | 42

Figure 29: Personal vs Puppet 1 - Total Trends and Similarity by Topic
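The columns of Figures 28 and 29 reduce to list lengths and set operations on the topics each account or location saw. The sketch below shows one way to compute them; the function name, the dictionary keys, and the sample logs are hypothetical.

```python
def compare_unique_trends(trends_a, trends_b):
    """Summarise overlap between the topics seen by two accounts or locations."""
    unique_a, unique_b = set(trends_a), set(trends_b)
    return {
        "total_a": len(trends_a),            # every appearance, repeats included
        "total_b": len(trends_b),
        "unique_a": len(unique_a),
        "unique_b": len(unique_b),
        "shared": len(unique_a & unique_b),  # "similar unique trends"
        "only_a": sorted(unique_a - unique_b),
        "only_b": sorted(unique_b - unique_a),
    }


# Hypothetical scrape logs: one entry per topic per scrape.
personal = ["Topic 1", "Topic 2", "Topic 1", "Topic 3"]
puppet = ["Topic 1", "Topic 2", "Topic 2"]
summary = compare_unique_trends(personal, puppet)
print(summary)
```

Listing the one-sided topics (`only_a`, `only_b`) alongside the counts makes it easy to name the specific trends that differ, as the text does for “Lewis Hamilton” and “Peru Two”.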

V. Discussion

This paper investigated four categories: how often Facebook Trends change, whether Facebook personalizes its Trending Topics by demographic and by geo-location, and whether Facebook gave more news exposure to certain news sources.

(1) Trends

I discovered that there was no fixed rate at which Facebook Trends update, which aligned with my hypothesis. The highest number of new cumulative trends at 5-minute intervals was 83 for Top Trends from Tuesday to Wednesday at 10am (Trends 5), while the lowest was 6 for Science/Tech from Saturday to Sunday at 6pm (Trends 3). Even the lowest number of new cumulative trends for Top Trends was 43 (Trends 1), which was almost half of 83. However, Trends 1 had the lowest Jaccard Similarities, which meant Trending Topics were changing often but among news that had already been published before. This suggested that Facebook’s News Trends updated according to how often news updates in the world. I believe it was no coincidence that there was a high number of new cumulative trends for Trends 5, because it was collected only a couple of days before the Royal Wedding of Prince Harry and Meghan Markle. In fact, the new cumulative trends for Politics and Entertainment in Trends 5 were the highest of all the datasets, while those for Science/Tech and Sports were not. Again, it was probably no coincidence, since a royal wedding would fall under the Politics and/or Entertainment topics. Trends 2 showed that Facebook Trending’s behavior was similar between 1-minute and 5-minute intervals. This suggested that Trending news was not changing as frequently as every minute. In fact, it seemed 10-minute intervals were enough to gather most of the new cumulative trends, and the Jaccard Similarity had its first large difference between the 5 and 10-minute intervals. Top Trends had the most updates for a large majority of the intervals in all five datasets. This makes sense since Top Trends can consist of the highest trending news from the other four topics, which makes it more competitive for news to hold its place.


The next question investigated whether different days and times of the week had different Trending behavior. Trends 1 and 2 were both collected from Wednesday to Thursday and were compared to investigate whether the same weekdays in different weeks have different Trending behavior. The data showed that even on the same weekdays, Facebook’s Trending behavior differs. This aligns with the finding on how often Facebook Trends update. I hypothesize that when Trends 2 was collected, more news was produced that week than when Trends 1 was collected. Trends 3, 2, and 1 showed that there was a difference in Trending behavior between the weekend and weekdays. However, there is no clear pattern, because on average Trends 1 had fewer new cumulative trends than Trends 3, while Trends 2 had more. The unexpected behavior came from Trends 3, where Science/Tech had only six new cumulative trends for all intervals. This makes sense because most Science/Tech news comes from big technology companies, which are usually closed on the weekend. Furthermore, the average Jaccard Similarities of Trends 2 and 3 were closer to each other than those of Trends 1 and 2. Again, this aligned with the two findings above and supports my hypothesis that Facebook Trending follows the news industry.

Lastly, Trends 4 and 5 were compared to investigate whether different weekdays have different behavior. On average, Tuesday to Wednesday had a higher number of new cumulative trends than Thursday to Friday, except for Sports. Additionally, Thursday to Friday had higher Jaccard Similarity than Tuesday to Wednesday. Thus, it’s clear there was more news activity on Facebook from Thursday to Friday. However, it was unclear whether this result was the norm, because I discovered that the same weekdays in different weeks have different trend behavior. It could be that, in the weeks Trends 4 and 5 were collected, the news industry had more news one week than the other. On the other hand, if this trend is the norm, it suggests that there was more activity on Thursday and Friday than earlier in the week, which may be because more people are active on social media near the end of the work week than at the beginning. The findings from (1) Trends suggest that there may be a relationship between how often Facebook’s Trending news updates and how often the overall news industry updates. This suggestion aligned with Groshek’s paper, which stated that Facebook’s news agenda and traditional news agendas have strong similarities35.

(2) Trends and Tabs

This section investigated whether Facebook gave more news exposure to certain news sources and how many news articles Facebook exposed on average. The data showed that MSN had the highest news exposure, with 84 more articles than the 2nd highest, CNN, which in turn had 70 more than the 3rd highest, The Hill. After the 3rd highest, however, the gaps between adjacent ranks were not as large. In fact, the 10th had about 1/3 the number of articles of the 1st. It’s interesting to note that CNN and Fox News are liberal- and conservative-leaning news outlets, respectively, and even though CNN was ranked 2nd and Fox News 4th, there was a gap of 73 articles between them. Over the 26-hour collection period, this meant there were about 2.8 more CNN articles than Fox News articles every hour. Furthermore, CNN, The New York Times, USA Today, CBS News, NBC News, HuffPost, Washington Post, BBC News, and Yahoo have more consistently liberal audiences, according to Figure 30, and they ranked within the top 15 news sources with the highest number of articles posted on Facebook

35 Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive Capacity of Social Networking Sites in Intermedia Agenda Setting across Topics over Time.” 2013


Trending36, while Fox News was the only conservative-leaning source within the top 15. In other words, 9 liberal news sources and 1 conservative news source ranked within the top 15 by number of articles published on Facebook Trending. Of the news sources in Figure 30, only three conservative news outlets were published on Facebook Trending at least once in the 26 hours, while at least 19 liberal news sources were published. In fact, the number was higher than 19 because, for example, Yahoo had Yahoo Sports, Yahoo Finance, Yahoo News, and Yahoo Canada, which are different news outlets under the same parent company. This aligned with my hypothesis that Facebook gave more news exposure to certain news sources, more specifically liberal news sources.
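The gaps quoted in the Trends and Tabs discussion follow directly from the article counts in Figure 26 and the 26-hour collection window. A short check, with variable names chosen for illustration (The Hill’s count is the third row of Figure 26, which the discussion identifies as The Hill, and HuffPost is the 10th row):

```python
counts = {"MSN": 356, "CNN": 272, "The Hill": 202, "Fox News": 199, "HuffPost": 126}
hours = 26  # length of the collection period

msn_over_cnn = counts["MSN"] - counts["CNN"]
cnn_over_hill = counts["CNN"] - counts["The Hill"]
cnn_over_fox = counts["CNN"] - counts["Fox News"]
per_hour_gap = cnn_over_fox / hours
tenth_vs_first = counts["HuffPost"] / counts["MSN"]

print(msn_over_cnn, cnn_over_hill, cnn_over_fox)         # 84 70 73
print(round(per_hour_gap, 1), round(tenth_vs_first, 2))  # 2.8 0.35
```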

Figure 30: Liberal and Conservative Leaning News Source Metric

(3) Geo-location

This category investigated whether Facebook Trending news differed depending on geographic location. The results from this category were surprising because there was no difference in the Trending Topics Facebook posted between Chicago, IL and Northern California. This showed that Facebook did not personalize its news trends based on the location of the user. Some Trending Topics were presented at different times, but all Trending Topics were exposed at some point in the data collection period in both locations. Northern California has different local news stations and local news than Chicago, IL. It is highly unlikely that trending local news in Chicago would become trending local news in Northern California and vice versa, or else it would defeat the purpose of local news. At the beginning of the year, Mark Zuckerberg announced Facebook would try to increase exposure of local news on its News Feed37. It seems Facebook plans to apply that only to News Feed and not to Trending Topics, at least as of now.

(4) Personal vs Puppet

36 Engel, Pamela. “Here's How Liberal Or Conservative Major News Sources Really Are.” Business Insider, Business Insider, 21 Oct. 2014, www.businessinsider.com/what-your-preferred-news-outlet-says-about-your-political-ideology-2014-10.
37 Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.”


This category investigated whether Facebook Trending news differed between demographics. The results showed it did, but to an insignificant degree. Only one news trend showed up on the personal account but not the puppet account. A possible explanation for this minor difference is that there was a 17-second lag between the data collection of the two accounts. The Trending Topic that appeared on the personal account could have appeared during the 17-second window and disappeared before the next scrape. The news trend that showed up only on the personal account was about the Peru Two drug smuggling case and how one of the smugglers gave birth to twins. In theory, a younger female has a higher chance of clicking on this news than a middle-aged male because many celebrity media companies (e.g., People) that report on celebrity pregnancies and babies have a readership that is 70% women38. This study, together with the results from Chakraborty, Messias, and Benevenuto’s study, showed that even though certain demographics had a stronger influence on what ended up on Trending Topics, the news trends placed on Trending Topics were presented to all demographics with very little personalization. This means topics that certain demographics find interesting or meaningful may not be meaningful to other demographics. Furthermore, it means that under-represented demographics on Facebook are forced to view news that over-represented demographics boost. This raises concern because news presented on Trending could favor one demographic over another.

Limitation

There were a few limitations in the methodology. Facebook has been making efforts to prevent scraping of its data. Though I found workarounds to get the data I collected for this study, I had to scale back the amount of data I originally wanted to collect. Initially, I planned to collect a week’s worth of data for each category. The limitations with Facebook included the following: issues with multiple Facebook logins, account safety, trends that do not have full information, and fake accounts. Additionally, I planned to use Amazon Mechanical Turk (MTurk), a crowdsourcing tool, to gather Facebook Trending data from other personal accounts in specific demographics and/or geo-locations. However, after the Cambridge Analytica data scandal, my adviser and I decided it would be best to stay away from MTurk crowdsourcing because the data given to Cambridge Analytica was collected through MTurk.

Facebook detected when an account was logged in multiple times in short periods of time. This raised an issue for my data collection because, when debugging my data scraping code, I had to log into Facebook multiple times in short periods of time. Based on what I learned in this study, I believe that when an account was logged in multiple times, Facebook no longer allowed the HTML elements to be recognized, so the login would fail.

Facebook tracks where a user’s account logs in from and when there is “unusual behavior” in an account. When Facebook notices unusual activity, it logs the account out of all devices and asks the user to change their password. This was an issue when I was collecting data on the cloud with my personal account. The cloud server was in Columbus, Ohio, a location I have never logged into Facebook from. When I tried to log into my account from the cloud, Facebook thought someone was hacking my account and forced me to change my password. As a result, I was not able to collect data for (4) Personal vs Puppet on my personal account from the cloud. Furthermore, when I was scraping the data too fast, Facebook labeled this as “unusual

38 “35 Eye Opening People Magazine Demographics.” BrandonGaille.com, 14 Jan. 2017, brandongaille.com/35-eye-opening-people-magazine-demographics/.


behavior” too. The result was similar: Facebook logged the account out of all devices and asked me to reset my password. As a result, (2) Trends and Tabs data could only be collected at 30-minute intervals.

Sometimes, Facebook’s Trending Topics would be missing information, such as the pop-up with the news article links or the description of the topic. I noticed in my data collection that there was no consistent pattern to this behavior. Thus, I hypothesized that Facebook purposely added inconsistencies throughout the Trending section to catch automated scraping of its data. This caused issues in data collection because once one of Facebook’s random tests caught my script, the script would terminate and no longer collect data for that run. This limited how long I could collect data continuously.

Issues with scraping data from Facebook were expected, especially after the Cambridge Analytica data scandal. However, there were added complications when Facebook announced it was taking down fake accounts. On May 15, 2018, Mark Zuckerberg posted about Facebook’s efforts to shut down fake accounts, as shown in Figure 31. I only ran into an issue with the puppet account once, but I did notice that when Facebook detected “unusual behavior” in my personal account, it only asked me to change my password. When it noticed “unusual behavior” in the puppet account, it asked me to verify my phone number by typing in a code it sent to the phone. In other words, Facebook was checking that the account was a real account with a real phone number instead of a fake bot account.

Figure 31: Zuckerberg’s Post

There was also a limitation with the AWS cloud server. I experienced a bug where Chromedriver could not be reached. This error occurred about 2.5 hours after a script was launched on the cloud. Once the error occurred, the script would no longer scrape data because the UI was


no longer reachable. My adviser and I tried to the best of our ability to fix the issue. Memory was not the cause, as there was consistent free memory available from when the script started until the error occurred. I tried closing and relaunching the Chrome driver every interval, but that raised red flags with Facebook. I added no-sandbox to my script, as many internet sources recommended, but that did not fix the issue. All software and packages were updated to the latest version. Overall, this limited my ability to collect data for a long, continuous amount of time.

Further Research

One of the insights of my study was that there may be a correlation between news industry activity and Facebook Trending Topics. Future research could compare news activity in the news industry, both traditional and online, with Facebook Trending news to investigate whether Facebook truly follows the news industry’s patterns. The results of such a study would show whether Facebook filters certain news. If Trending activity did not closely follow the news industry’s activity, that would mean Facebook introduces bias in its Trending section by selecting specific types of news topics. Another suggestion is to collect Facebook Trending data from different countries and study whether different countries have different Trending Topics. This study found that different US cities do not have different Trending Topics, and the suggestion extends the comparison to an international scale.

VI. Conclusion

This study investigated whether Facebook personalizes its Trending Topics by demographic and geo-location. It further investigated how often Facebook trends change and whether Facebook gives more news exposure to certain news sources. In the data collection and analysis, I found that Facebook does not personalize by geo-location and only slightly personalizes by demographic. Furthermore, my results show that Facebook gives more news exposure to liberal news sources than to conservative ones. Lastly, the analysis showed that Facebook Trending Topics update irregularly. This leads to the hypothesis that Facebook follows the news industry and publishes news when the news industry publishes news. This is significant because Facebook has claimed it does not include any bias in or filter its Trending News, but my results show otherwise. Furthermore, only a small subset of Facebook’s demographics influences what goes on Trending, yet my analysis shows there is very little personalization by demographic. This raises concern because under-represented demographics see Trending News that does not match their interests or, even worse, skews their views toward a different demographic’s views. Lastly, even though Facebook claims it does not present any bias in its news, my results show it gives more news exposure to liberal news sources than to conservative ones.


Reference

Alvarado, Oscar, and Annika Waern. “Towards Algorithmic Experience.” Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems - CHI '18, 2018, doi:10.1145/3173574.3173860.
Brown, Pete. “Facebook Struggles to Promote 'Meaningful Interactions' for Local Publishers, Data Shows.” Columbia Journalism Review, 18 Apr. 2018, www.cjr.org/tow_center/facebook-local-news.php.
“Browser Statistics.” W3Schools Online Web Tutorials, www.w3schools.com/browsers/default.asp.
Chakraborty, et al. Who Makes Trends? Understanding Demographic Biases in Crowdsourced Recommendations. 1 Apr. 2017, arxiv.org/abs/1704.00139.
Cvijikj, Irena Pletikosa, and Florian Michahelles. “Monitoring Trends on Facebook.” 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing, 2011, doi:10.1109/dasc.2011.150.
Diakopoulos, Nicholas. “Algorithmic Accountability.” Digital Journalism, vol. 3, no. 3, 2014, pp. 398–415, doi:10.1080/21670811.2014.976411.
Engel, Pamela. “Here's How Liberal Or Conservative Major News Sources Really Are.” Business Insider, Business Insider, 21 Oct. 2014, www.businessinsider.com/what-your-preferred-news-outlet-says-about-your-political-ideology-2014-10.
Groshek, Jacob, and Megan Clough Groshek. “Agenda Trending: Reciprocity and the Predictive Capacity of Social Networking Sites in Intermedia Agenda Setting across Topics over Time.” 2013, doi:10.12924/mac2013.01010015.
Göös, Christine. “Blog.” Facebook Advertising Trends 2018, 15 Feb. 2018, www.smartly.io/blog/facebook-advertising-trends-2018.
“Jaccard Index / Similarity Coefficient.” Statistics How To, www.statisticshowto.com/jaccard-index/.
Kazai, Gabriella, et al. “Personalised News and Blog Recommendations Based on User Location, Facebook and Twitter User Profiling.” Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '16, 2016, doi:10.1145/2911451.2911464.
“Linux Nohup Command Help and Examples.” Computer Hope, 1 Apr. 2018, www.computerhope.com/unix/unohup.htm.
“Nearly Half of U.S. Adults Get News on Facebook, Pew Says.” Nieman Lab, www.niemanlab.org/2016/05/pew-report-44-percent-of-u-s-adults-get-news-on-facebook/.
Nunez, Michael. “Former Facebook Workers: We Routinely Suppressed Conservative News.” Gizmodo, Gizmodo.com, 10 May 2016, gizmodo.com/former-facebook-workers-we-routinely-suppressed-conser-1775461006.
Ohlheiser, Abby. “Three Days after Removing Human Editors, Facebook Is Already Trending Fake News.” The Washington Post, WP Company, 29 Aug. 2016, www.washingtonpost.com/news/the-intersect/wp/2016/08/29/a-fake-headline-about-megyn-kelly-was-trending-on-facebook/?noredirect=on&utm_term=.2d050b7762f3.
“Pipenv & Virtual Environments.” Freezing Your Code - The Hitchhiker's Guide to Python, docs.python-guide.org/en/latest/dev/virtualenvs/.
“PyPI – the Python Package Index.” PyPI, pypi.org/.

36

“'#Republic' Author Describes How Social Media Hurts Democracy.” NPR, NPR, 20 Feb. 2017, www.npr.org/2017/02/20/516292286/-republic-author-describes-how-social-media-hurts- democracy. Riley, Charles. “Cambridge Analytica, Facebook and Your Data: Here's What to Know.” CNNMoney, Cable News Network, 20 Mar. 2018, money.cnn.com/2018/03/19/technology/facebook-data- scandal-explainer/index.html?iid=EL. Rongala, Arvind. “Benefits of Python over Other Programming Languages.” Invensis Blog, 6 Apr. 2018, www.invensis.net/blog/it/benefits-of-python-over-other-programming-languages/. Rosén, Josefin. “What Every Business Manager Should Know about Algorithm Audits.” SAS Learning Post, 16 Oct. 2017, blogs.sas.com/content/hiddeninsights/2017/10/16/algorithm-audits/. Tuesday, For five hours on. “Your Facebook Data Scandal Questions Answered.” CNNMoney, Cable News Network, 11 Apr. 2018, money.cnn.com/2018/04/11/technology/facebook-questions-data- privacy/index.html. “Welcome to Python.org.” Python.org, www.python.org/about/. “What Is a Central Processing Unit (CPU)? - Definition from Techopedia.” Techopedia.com, www.techopedia.com/definition/2851/central-processing-unit-cpu. “What Is a Proxy Server and Should You Risk Using One?” WhatIsMyIPAddress.com, whatismyipaddress.com/proxy-server. “What Is AWS? - Amazon Web Services.” Amazon, Amazon, aws.amazon.com/what-is-aws/. “What Is Query? - Definition from WhatIs.com.” SearchSQLServer, searchsqlserver.techtarget.com/definition/query. www.facebook.com/tstocky/posts/10100853082337958.

37

Appendix

Appendix 1: Data from Trends

Trend 1: Jaccard Similarity – Wednesday, 3/7/18 to Thursday, 3/8/18

Topic          Interval  Avg         Min       Max       Standard Deviation
Top Trends     5         0.5839342   0.181818  1         0.764155876
Top Trends     10        0.36485084  0.083333  0.666667  0.604028837
Top Trends     15        0.46557987  0.181818  0.818182  0.682334137
Top Trends     20        0.30984848  0.083333  1         0.556640355
Top Trends     25        0.43504274  0.266667  0.666667  0.659577694
Top Trends     30        0.26260823  0.181818  0.428571  0.512453144
Top Trends     35        0.43818681  0.357143  0.538462  0.661956806
Top Trends     40        0.19772727  0         0.5       0.444665349
Top Trends     45        0.28869048  0.1875    0.428571  0.537299243
Top Trends     50        0.21010101  0.181818  0.266667  0.458367767
Top Trends     55        0.31111111  0.266667  0.333333  0.557773351
Top Trends     60        0.22424242  0.181818  0.266667  0.473542421
Politics       5         0.86720143  0.666667  1         0.931236504
Politics       10        0.80554739  0.428571  1         0.897522921
Politics       15        0.73529501  0.428571  1         0.857493445
Politics       20        0.68851981  0.333333  1         0.82977094
Politics       25        0.69191919  0.333333  1         0.831816802
Politics       30        0.604662    0.333333  0.818182  0.777600157
Politics       35        0.60714286  0.333333  1         0.779193722
Politics       40        0.53627622  0.25      0.818182  0.732308831
Politics       45        0.46053293  0.176471  0.666667  0.678625767
Politics       50        0.48504274  0.25      0.666667  0.696450095
Politics       55        0.41666667  0.25      0.666667  0.645497224
Politics       60        0.42156863  0.176471  0.666667  0.649283164
Science/Tech   5         0.88660359  0.538462  1         0.9415963
Science/Tech   10        0.80707528  0.538462  1         0.898373685
Science/Tech   15        0.80313626  0.538462  1         0.896178697
Science/Tech   20        0.75203963  0.538462  1         0.867202183
Science/Tech   25        0.72105672  0.538462  0.818182  0.849150588
Science/Tech   30        0.67599068  0.538462  0.818182  0.822186521
Science/Tech   35        0.71794872  0.538462  1         0.847318546
Science/Tech   40        0.67832168  0.538462  0.818182  0.823602864
Science/Tech   45        0.58119658  0.538462  0.666667  0.7623625
Science/Tech   50        0.55555556  0.333333  0.666667  0.745355992
Science/Tech   55        0.53554779  0.25      0.818182  0.731811305
Science/Tech   60        0.39423077  0.25      0.538462  0.627877989
Sports         5         0.86987522  0.666667  1         0.932671015
Sports         10        0.81461676  0.666667  1         0.90256122
Sports         15        0.7425302   0.538462  1         0.861701919
Sports         20        0.75203963  0.538462  1         0.867202183
Sports         25        0.73304473  0.428571  1         0.856180316
Sports         30        0.65401265  0.428571  0.818182  0.808710488
Sports         35        0.64502165  0.428571  0.818182  0.803132396
Sports         40        0.65084915  0.428571  0.818182  0.806752224
Sports         45        0.58119658  0.538462  0.666667  0.7623625
Sports         50        0.5950716   0.428571  0.818182  0.771408838
Sports         55        0.58119658  0.538462  0.666667  0.7623625
Sports         60        0.53846154  0.538462  0.538462  0.733799386
Entertainment  5         0.85451803  0.538462  1         0.924401445
Entertainment  10        0.76923077  0.538462  1         0.877058019
Entertainment  15        0.73151091  0.538462  1         0.855284113
Entertainment  20        0.67810315  0.25      1         0.823470186
Entertainment  25        0.72999223  0.538462  1         0.854395827
Entertainment  30        0.64568765  0.538462  0.818182  0.803546916
Entertainment  35        0.67249417  0.538462  0.818182  0.82005742
Entertainment  40        0.53627622  0.25      0.818182  0.732308831
Entertainment  45        0.58119658  0.538462  0.666667  0.7623625
Entertainment  50        0.54456654  0.428571  0.666667  0.737947522
Entertainment  55        0.44230769  0.25      0.538462  0.665062172
Entertainment  60        0.53846154  0.538462  0.538462  0.733799386

Trend 2: Jaccard Similarity – Wednesday, 3/14/18 to Thursday, 3/15/18

Topic          Interval  Avg         Min       Max       Standard Deviation
Top Trends     5         0.771027    0.181818  1         0.87808125
Top Trends     10        0.67951     0.083333  1         0.824324262
Top Trends     15        0.617197    0.083333  1         0.785618687
Top Trends     20        0.570689    0.181818  1         0.755439935
Top Trends     25        0.540294    0.083333  1         0.735046675
Top Trends     30        0.552632    0.083333  1         0.743392519
Top Trends     35        0.504391    0.083333  1         0.710204997
Top Trends     40        0.482348    0.083333  1         0.694512595
Top Trends     45        0.476563    0.083333  1         0.690335064
Top Trends     50        0.478761    0.083333  0.818182  0.691925498
Top Trends     55        0.559239    0.2       0.818182  0.74782316
Top Trends     60        0.381854    0         1         0.617943571
Politics       5         0.934812    0         1         0.966856783
Politics       10        0.903866    0         1         0.950718806
Politics       15        0.892534    0.666667  1         0.944739955
Politics       20        0.865741    0.666667  1         0.930451901
Politics       25        0.847842    0.538462  1         0.920783261
Politics       30        0.827506    0.538462  1         0.909673473
Politics       35        0.775712    0         1         0.880745192
Politics       40        0.783929    0.538462  1         0.88539767
Politics       45        0.770313    0.428571  1         0.87767478
Politics       50        0.775179    0.538462  1         0.880442414
Politics       55        0.754335    0.428571  1         0.868524846
Politics       60        0.726787    0.538462  1         0.852518095
Science/Tech   5         0.925648    0         1         0.962106145
Science/Tech   10        0.900104    0         1         0.948738369
Science/Tech   15        0.8816      0         1         0.938935301
Science/Tech   20        0.882997    0         1         0.939679005
Science/Tech   25        0.819106    0         1         0.90504485
Science/Tech   30        0.837753    0         1         0.91528822
Science/Tech   35        0.803059    0         1         0.896135442
Science/Tech   40        0.80303     0         1         0.896119581
Science/Tech   45        0.827433    0.538462  1         0.909633434
Science/Tech   50        0.774054    0         1         0.879803122
Science/Tech   55        0.815313    0.538462  1         0.902946783
Science/Tech   60        0.791667    0.666667  1         0.889756521
Sports         5         0.936478    0         1         0.967718029
Sports         10        0.921373    0         1         0.959881701
Sports         15        0.860981    0         1         0.927890588
Sports         20        0.877428    0         1         0.936711336
Sports         25        0.846797    0         1         0.920215671
Sports         30        0.824835    0         1         0.90820421
Sports         35        0.883052    0.538462  1         0.939708416
Sports         40        0.772339    0         1         0.878828068
Sports         45        0.864875    0.538462  1         0.929986402
Sports         50        0.79238     0         1         0.890157308
Sports         55        0.832526    0.538462  1         0.912428873
Sports         60        0.739899    0         1         0.860173814
Entertainment  5         0.812246    0         1         0.90124701
Entertainment  10        0.755799    0         1         0.869367226
Entertainment  15        0.619494    0         1         0.787079353
Entertainment  20        0.686027    0         1         0.82826743
Entertainment  25        0.652299    0         1         0.807650203
Entertainment  30        0.646465    0         1         0.804030252
Entertainment  35        0.708625    0         1         0.841798496
Entertainment  40        0.574722    0         1         0.758103934
Entertainment  45        0.638258    0         1         0.798910243
Entertainment  50        0.483321    0         1         0.695213116
Entertainment  55        0.757576    0         1         0.87038828
Entertainment  60        0.598533    0         1         0.773649411

Trend 3: Jaccard Similarity – Saturday, 4/21/18 to Sunday, 4/22/18

Topic          Interval  Avg         Min       Max       Standard Deviation
Top Trends     5         0.830378    0.538462  1         0.911250764
Top Trends     10        0.734254    0.428571  1         0.856886131
Top Trends     15        0.695804    0.333333  1         0.834148785
Top Trends     20        0.613648    0.428571  1         0.783357043
Top Trends     25        0.566767    0.428571  0.666667  0.752839005
Top Trends     30        0.51974     0.333333  0.666667  0.720929622
Top Trends     35        0.528846    0.333333  0.666667  0.727218092
Top Trends     40        0.462062    0.333333  0.538462  0.67975124
Top Trends     45        0.403694    0.25      0.538462  0.635368813
Top Trends     50        0.354762    0.25      0.428571  0.595618926
Top Trends     55        0.37381     0.25      0.428571  0.611399643
Top Trends     60        0.342949    0.25      0.538462  0.585618236
Politics       5         0.922619    0.666667  1         0.960530607
Politics       10        0.861472    0.666667  1         0.928155085
Politics       15        0.819477    0.538462  1         0.90524959
Politics       20        0.780886    0.538462  1         0.883677419
Politics       25        0.719856    0.538462  1         0.848443222
Politics       30        0.731121    0.428571  1         0.855055981
Politics       35        0.679196    0.538462  1         0.824133366
Politics       40        0.665668    0.333333  0.818182  0.815884591
Politics       45        0.656122    0.428571  0.818182  0.810013368
Politics       50        0.516484    0.428571  0.538462  0.718667876
Politics       55        0.593407    0.428571  0.666667  0.770328887
Politics       60        0.607143    0.428571  0.666667  0.779193722
Science/Tech   5         0.96875     0.666667  1         0.984250984
Science/Tech   10        0.9375      0.666667  1         0.968245837
Science/Tech   15        0.902778    0.666667  1         0.950146188
Science/Tech   20        0.886905    0.5       1         0.941756212
Science/Tech   25        0.840909    0.666667  1         0.917010955
Science/Tech   30        0.805556    0.666667  1         0.897527468
Science/Tech   35        0.802083    0.5       1         0.895591053
Science/Tech   40        0.77381     0.5       1         0.879664438
Science/Tech   45        0.736111    0.5       1         0.857969178
Science/Tech   50        0.733333    0.5       1         0.856348839
Science/Tech   55        0.683333    0.5       1         0.826639785
Science/Tech   60        0.6875      0.5       1         0.829156198
Sports         5         0.906926    0.666667  1         0.952326838
Sports         10        0.830087    0.666667  1         0.911090874
Sports         15        0.819477    0.538462  1         0.90524959
Sports         20        0.760906    0.538462  1         0.872299124
Sports         25        0.716011    0.428571  0.818182  0.846174486
Sports         30        0.674437    0.538462  0.818182  0.821240936
Sports         35        0.645646    0.428571  0.818182  0.803521014
Sports         40        0.622378    0.333333  0.818182  0.788909134
Sports         45        0.584249    0.428571  0.666667  0.76436188
Sports         50        0.589744    0.538462  0.666667  0.767947648
Sports         55        0.535714    0.25      0.666667  0.731925055
Sports         60        0.470925    0.25      0.666667  0.686239687
Entertainment  5         0.934217    0.8       1         0.966549105
Entertainment  10        0.884668    0.666667  1         0.940567972
Entertainment  15        0.841362    0.538462  1         0.917258056
Entertainment  20        0.81292     0.538462  1         0.901620992
Entertainment  25        0.763479    0.428571  1         0.873772822
Entertainment  30        0.763533    0.538462  1         0.873803618
Entertainment  35        0.726399    0.538462  0.818182  0.85229021
Entertainment  40        0.669997    0.538462  0.818182  0.818533243
Entertainment  45        0.665584    0.357143  0.818182  0.815833571
Entertainment  50        0.649351    0.428571  0.818182  0.805822964
Entertainment  55        0.609424    0.357143  0.818182  0.780656076
Entertainment  60        0.61297     0.428571  0.818182  0.782924238

Trend 4: Jaccard Similarity – Thursday, 4/26/18 to Friday, 4/27/18

Topic          Interval  Avg         Min       Max       Standard Deviation
Top Trends     5         0.785834    0.538462  1         0.886472813
Top Trends     10        0.674623    0.428571  1         0.821354367
Top Trends     15        0.584166    0.25      1         0.764307421
Top Trends     20        0.54326     0.25      0.818182  0.737061945
Top Trends     25        0.483971    0.333333  0.818182  0.695679937
Top Trends     30        0.435435    0.25      0.818182  0.659874939
Top Trends     35        0.46131     0.25      0.666667  0.679197706
Top Trends     40        0.367139    0.176471  0.666667  0.605920097
Top Trends     45        0.323063    0         0.666667  0.568385924
Top Trends     50        0.351961    0.176471  0.666667  0.593262829
Top Trends     55        0.330159    0.111111  0.666667  0.574594405
Top Trends     60        0.238051    0.052632  0.538462  0.487904762
Politics       5         0.849266    0         1         0.921556259
Politics       10        0.738833    0         1         0.859553719
Politics       15        0.73856     0.333333  1         0.859394953
Politics       20        0.691642    0.333333  1         0.831649981
Politics       25        0.511814    0         1         0.71541173
Politics       30        0.639435    0.333333  0.818182  0.799646572
Politics       35        0.594364    0.333333  0.818182  0.770950043
Politics       40        0.554814    0.25      0.666667  0.744858532
Politics       45        0.484235    0.176471  0.666667  0.695869758
Politics       50        0.415385    0         0.666667  0.644503387
Politics       55        0.488095    0.25      0.666667  0.698638131
Politics       60        0.452543    0.176471  0.666667  0.672712833
Science/Tech   5         0.819245    0         1         0.905121584
Science/Tech   10        0.665085    0         1         0.815527385
Science/Tech   15        0.776837    0.538462  1         0.881383684
Science/Tech   20        0.596023    0         1         0.772025275
Science/Tech   25        0.462326    0         0.818182  0.679945258
Science/Tech   30        0.612684    0.428571  1         0.782741089
Science/Tech   35        0.632784    0.333333  1         0.795477142
Science/Tech   40        0.610473    0.25      1         0.781327627
Science/Tech   45        0.591187    0.333333  1         0.768886592
Science/Tech   50        0.354762    0         0.666667  0.595618926
Science/Tech   55        0.571429    0.333333  1         0.755928946
Science/Tech   60        0.439139    0.176471  0.818182  0.662675857
Sports         5         0.754958    0         1         0.868883474
Sports         10        0.574212    0         1         0.757767446
Sports         15        0.618364    0         1         0.786361299
Sports         20        0.499304    0         0.818182  0.706614653
Sports         25        0.455333    0         1         0.674783333
Sports         30        0.422198    0         0.818182  0.649767783
Sports         35        0.476604    0         1         0.690365322
Sports         40        0.397762    0.111111  0.818182  0.630683863
Sports         45        0.389632    0.176471  0.818182  0.624204782
Sports         50        0.360218    0         0.818182  0.600181273
Sports         55        0.307516    0.111111  0.666667  0.554541558
Sports         60        0.286442    0.052632  0.666667  0.53520296
Entertainment  5         0.797166    0         1         0.892841527
Entertainment  10        0.627964    0         1         0.792441609
Entertainment  15        0.707398    0         1         0.841069477
Entertainment  20        0.578897    0         0.818182  0.760853004
Entertainment  25        0.592438    0         1         0.769699854
Entertainment  30        0.505236    0         1         0.710799202
Entertainment  35        0.502245    0         1         0.708692672
Entertainment  40        0.59431     0.333333  0.818182  0.770915334
Entertainment  45        0.54917     0.176471  0.818182  0.741059906
Entertainment  50        0.452525    0         0.818182  0.672699972
Entertainment  55        0.503692    0.176471  0.818182  0.70971289
Entertainment  60        0.478355    0.333333  0.818182  0.691632112

Trend 5: Jaccard Similarity – Tuesday, 5/15/18 to Wednesday, 5/16/18

Topic          Interval  Avg         Min       Max       Standard Deviation
Top Trends     5         0.621414    0.25      1         0.788297871
Top Trends     10        0.49392     0.071429  1         0.702794129
Top Trends     15        0.392564    0         0.818182  0.626548963
Top Trends     20        0.34303     0         0.666667  0.58568722
Top Trends     25        0.295228    0         0.538462  0.543348802
Top Trends     30        0.259829    0         0.538462  0.509734303
Top Trends     35        0.179038    0         0.428571  0.423129155
Top Trends     40        0.238311    0         0.538462  0.488170778
Top Trends     45        0.153651    0.034483  0.25      0.391983565
Top Trends     50        0.131865    0.034483  0.25      0.363131689
Top Trends     55        0.166667    0         0.333333  0.40824829
Top Trends     60        0.214286    0         0.428571  0.46291005
Politics       5         0.759806    0.2       1         0.871668392
Politics       10        0.65994     0.153846  1         0.812366949
Politics       15        0.580551    0.153846  1         0.761938857
Politics       20        0.483766    0.071429  0.818182  0.695533057
Politics       25        0.452298    0.071429  0.818182  0.672530819
Politics       30        0.42371     0         0.818182  0.650929815
Politics       35        0.365764    0.034483  0.666667  0.604783884
Politics       40        0.302093    0.034483  0.538462  0.54962946
Politics       45        0.246032    0.071429  0.333333  0.496015873
Politics       50        0.25        0.071429  0.428571  0.5
Politics       55        0.286472    0.034483  0.538462  0.53523093
Politics       60        0.304945    0.071429  0.538462  0.552218304
Science/Tech   5         0.787653    0         1         0.887498286
Science/Tech   10        0.709751    0.25      1         0.842467424
Science/Tech   15        0.678055    0.153846  1         0.82344112
Science/Tech   20        0.607567    0.111111  1         0.779465866
Science/Tech   25        0.524032    0.153846  0.818182  0.723900217
Science/Tech   30        0.515917    0.071429  0.818182  0.718273914
Science/Tech   35        0.454924    0.034483  0.818182  0.674480827
Science/Tech   40        0.463709    0.034483  0.818182  0.680961603
Science/Tech   45        0.370469    0.034483  0.538462  0.608661328
Science/Tech   50        0.302093    0.034483  0.538462  0.54962946
Science/Tech   55        0.333333    0         0.666667  0.577350269
Science/Tech   60        0.409091    0         0.818182  0.639602149
Sports         5         0.626893    0         1         0.791765888
Sports         10        0.526892    0         1         0.72587358
Sports         15        0.654069    0.428571  0.818182  0.808745488
Sports         20        0.414038    0         0.666667  0.643458113
Sports         25        0.305556    0         0.538462  0.552770798
Sports         30        0.478555    0.333333  0.636364  0.691776538
Sports         35        0.420214    0.176471  0.583333  0.64823926
Sports         40        0.353846    0.25      0.461538  0.59484969
Sports         45        0.324405    0.1875    0.5       0.569565415
Sports         50        0.095238    0         0.285714  0.3086067
Sports         55        0.24697     0.227273  0.266667  0.496960458
Sports         60        0.24697     0.227273  0.266667  0.496960458
Entertainment  5         0.656177    0         1         0.810047626
Entertainment  10        0.546753    0         1         0.739427648
Entertainment  15        0.634911    0.26087   1         0.796813319
Entertainment  20        0.322594    0         0.818182  0.567973655
Entertainment  25        0.226018    0         0.428571  0.475413445
Entertainment  30        0.398289    0.176471  0.818182  0.631101779
Entertainment  35        0.369545    0.16      0.818182  0.607902504
Entertainment  40        0.291644    0.16      0.538462  0.540040778
Entertainment  45        0.232906    0.115385  0.333333  0.482603339
Entertainment  50        0.038462    0         0.115385  0.196116135
Entertainment  55        0.203704    0.074074  0.333333  0.451335467
Entertainment  60        0.287088    0.035714  0.538462  0.535805853
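
For readers who want to recompute statistics like those in the Trend tables from raw snapshots, the following is a minimal Python sketch and not the script used for this thesis. It assumes each snapshot is a list of trend names and that the Interval column is the gap between the two snapshots being compared. The function names, the sample snapshots, and the choice of population standard deviation are illustrative assumptions. As a sanity check on the arithmetic, two lists of 10 trends that share 7 entries give 7 / 13 ≈ 0.538462, a value that appears often in the Min and Max columns. Each Standard Deviation entry in the tables above equals the square root of the Avg entry in the same row (for example, 0.5 for an Avg of 0.25), so a standard deviation recomputed from raw similarity scores would generally differ from the tabulated value.

```python
from statistics import mean, pstdev


def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections of trend names."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty snapshots are treated as identical.
        return 1.0
    return len(set_a & set_b) / len(union)


def similarities_at_lag(snapshots, lag):
    """Compare every snapshot with the one `lag` positions later."""
    return [
        jaccard(snapshots[i], snapshots[i + lag])
        for i in range(len(snapshots) - lag)
    ]


def summarize(values):
    """Average, min, max, and population standard deviation of the scores."""
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "std": pstdev(values),  # assumption: population, not sample, std dev
    }


# Hypothetical snapshots (not data from this thesis), 10 trends each.
snapshots = [
    [f"trend{i}" for i in range(0, 10)],
    [f"trend{i}" for i in range(3, 13)],  # shares 7 trends with the first
    [f"trend{i}" for i in range(3, 13)],  # unchanged from the second
]
scores = similarities_at_lag(snapshots, lag=1)
print([round(s, 6) for s in scores])  # [0.538462, 1.0]
print(summarize(scores))
```

If snapshots were collected every 5 minutes, a lag of 1 would correspond to the 5-minute rows and a lag of 12 to the 60-minute rows; that mapping is also an assumption about how the data were gathered.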

Appendix 2: Data from Trends and Tabs

Trends and Tabs 1: Every News Source Facebook Published

The New York 7News - WHDH Idaho State moneycontrol.co CNNMoney Times Ars Technica Liverpool FC www.gizbot.com Boston WDSU News MarketWatch Journal m WESH 2 News nativenewsonline www.nationalne Tulsa's Channel 8 WGN TV PCWorld inews.co.uk Mirror Football GSMArena.com Fox 35 WOFL WFMJ Tennessean .net wswatch.com - KTUL Portland Press lovinmanchester. www.13wmaz.co WGRZ - Channel Idaho Press- www.nationalobs ESPN Herald com NHL SB Nation m 2, Buffalo Tribune erver.com Bradenton Herald ABC Action medicalxpress.co London Evening www.phonearena The Seattle Anchorage Daily atlantablackstar.c www.palmbeach News - WFTS - Reuters RealClearPolitics m Standard .com Times News om www.kivitv.com post.com Tampa Bay 6abc Action Talking Points CBS7 News / News Recode Autoblog theScore SlashGear Memo Bustle KOSA-TV KTVB The Raw Story NBC 6 The Bangor Smithsonian www.belfastlive. Hawaii News Daily News Magazine co.uk talksport.com SPIN WDTN-TV Inverse Now Mic Teen Vogue TheBlaze Las Vegas The Columbus The Washington WMC Action morungexpress.c Bleacher Report Digital Trends Review-Journal Stars and Stripes Dispatch Times News 5 National Post teleSUR om WRCB Channel 3 Eyewitness Fox Carolina WPMT FOX43 The Guardian www.kqed.org IGN TechRadar News WebMD news360.com News www.whio.com Quartz News 12 Long www.zerohedge. East Idaho Island The Verge LiveScience ComicBook.com com WWLP-22News WWLTV Slate.com News.com 9to5toys.com WMUR-TV Entertainment taskandpurpose.c www.consumeraf www.dailywire.c FOX8 Washington Post MacRumors Weekly om fairs.com om Townhall.com www.wtol.com BGR WPSD-TV Manchester Android www.centralmain www.dailypost.c www.smartbrief. NBC Sports WFAA Evening News New York Post Authority 13 On Your Side KMBC 9 e.com o.uk com News 12 NBC 26 www.moneysavi signup.freebies.c www.fox25bosto Washington www.algemeiner. 
TechCrunch WSOC-TV ngexpert.com om CBC Sports n.com Examiner KVUE philly.com The Next Web com www.news- WHO TV Lebanon Daily The Hill Yahoo mail.com.au Variety CTV News speedsociety.com Channel 13 News blogs.edweek.org News VentureBeat VICE News www.sciencedail The Baltimore www.bollywoodl KXXV Central www.512tech.co TIME Daily Telegraph y.com Sun VICE ife.com Texas News Now junkee.com exclaim.ca m ABC 7 Chicago WGNO - News www.sciencenew www.catchnews. www.comingsoo The Dallas The Business With A Twist www.edweek.org s.org com n.net Morning News NBC15 Madison WLOS ABC 13 thatgrapejuice.net Journals WILX News 10 www.statnews.co The Gaston ' 14 NEWS Engadget m JoBlo.com Page Six WMTW-TV 10TV - WBNS Gazette www.rap-up.com .com WOOD TV8 azfamily 3TV www.irishexamin FOX 47 News - wildfiretoday.co CBS 5 NBC Chicago The Atlantic WKYT TVLine abc3340.com er.com The Times-News djbooth.net WSYM m heroichollywood. Post- FOX31 www.greaterkash www.defencenew BBC News NBC News The Sun com Den Of Geek UK Gazette KDVR.com Boing Boing mir.com KING 5 s.in The Mercury ABC30 Action Popular Business Insider WIRED News Gizmodo Vulture Wichita Eagle News Hot Air Herald Sun Mechanics Democracy Now! The Charlotte www.flickeringm National Observer Fortune The Scientist MovieWeb yth.com WIBW KTLA 5 News Geographic news.com.au Space.com IndyStar The Sacramento Asheville Citizen The West www.zmescience The Hamilton Fox News Daily Mail The Daily Beast The A.V. Club GameSpot WIS-TV Bee Times Australian .com Spectator www.heraldscotl www.jambase.co www.9news.com. 
Sauk Valley and.com People Metro Pitchfork m 9NEWS (KUSA) syracuse.com CommonDreams au ABC6 News Media electronicintifada InForum WJAC-TV News FOX Sports rolltide.com Rolling Stone al.com thefilmstage.com The World The FADER Boston Herald .net www.burnleyexp www.campusrefo The Jerusalem Region 8 News Breitbart Goal Indonesia ress.net The Irish Times rm.org theplaylist.net WNEP-TV Post / JPost.com GOLF.com thehustle.co KCBD Centre Daily The Post and ComicBookMovi www.middleeast The News & www.hookem.co NewsChannel 11 Times NBA E! News MLive.com Courier e.com PennLive.com monitor.com Observer m Omaha World- www.beinsports. Sky News www.technologyr KSLA News 12 Herald com Daily Express The Daily Caller ottawasun.com Highsnobiety Reading Eagle Australia NFL eview.com KOTV - News The Hollywood Tampa Bay The Times of www.thisismone On 6 ABC News Reporter FOX 17 Times NBC Bay Area WFMZ Israel y.co.uk TigerNet.com WTVC-TV New York The State The Edmonton consequenceofso NewsChannel 9 Yahoo Finance www.espn.co.uk HuffPost Canada Magazine SFGATE Sun und.net thespun.com News Al Jazeera www.football365 Life & Style The Salt Lake theundefeated.co www.evertonfc.c NJ.com English .com Weekly Ottawa Citizen LJWorld.com Tribune Toronto Sun bust.com m om www.thisisinside www.livesoccert www.popsugar.c crimewatchdaily. www.thelondone LNP + www.dailyedge.i r.com azcentral v.com om THE WEEK com conomic.com LancasterOnline e Golf Channel www.cbr.com The Boston www.screendaily ABC 7 News - The Times of WTHI-TV Globe Sky Sports .com WJLA India KGAN CBS 2 The Incline www.flare.com Golf Digest Newsarama www.shieldsgaze KENS 5 & Yahoo News The Denver Post Sporting News tte.com Adweek Kens5.com KWWL Public Opinion FanSided www.mlbtraderumors.com Indiatimes Forbes www.teamtalk.co www.snapchat.co KALB News WPTV LifeNews.com The York lithub.com Eagles

m m Channel 5 Dispatch

www.tribalfootba KCTV5 News NBC Connecticut ll.com The Toronto Star Kansas City 10News WTSP The Gazette Ledger-Enquirer NYLON www.racingpost.com York Daily WEEI Sports Time Out The Des Moines Record/Sunday www.instylemag. phys.org NDTV Radio Network London Ocala StarBanner KOAT Register News com.au Saturday Down South www.uppermichi LifeSiteNews.co Bloomberg Newsday 247sports.com Today Show ganssource.com FOX 12 Oregon m .com Mediaite www.seccountry.com www.brisbanetim WAFB Channel es.com.au NPR clutchpoints.com Vanity Fair 9 Deadspin www.aclu.org Boston.com The New Yorker Foreign Policy Philadelphia english.manoram twinning.popsuga WSMV News 4, CBS News Magazine aonline.com r.com WITN-TV GeekTyrant Nashville ABC Fox News Insider who…"

CBS Sports POLITICO Global News NewsOne Screen Rant Fox 13 News The Oregonian FOX 61 www.dailysabah.com CNBC Salon NESN bearingarms.com TheGrio AJC KATU News KOIN 6 680 NEWS The Quint www.jerusalemo CNET The Telegraph Scroll ABC7 AlterNet KHOU 11 News KRNV News 4 StateCollege.com nline.com NBC Charlotte Deadline Free The Kansas City CNN USA TODAY Yahoo Sports Hollywood Press KSDK News Star KGW-TV The Jewish Press Firstpost www.eveningexp KTVN Channel 2 ress.co.uk Vox ABC15 Arizona Indian Express The Root NOLA.com News Popular Science MassLive.com www.therichest.com The Advocate The Wall Street Ultimate Classic 7 Eyewitness (Baton Rouge, The Roanoke Fast Company Journal BBC Sport Rock News WKBW LA) KXAN News Roll Call Times Android Central Newschannel 3, CBS News, www.gizmodo.co WWMT, West The San Diego St. Louis Post- Statesman South m.au Fox Business Daily Record www.bgr.in Union-Tribune Dispatch Journal Sun Sentinel Fox2Now www.eurosport.c FOX6 News HuffPost Yahoo Canada o.uk Complex KTRE-TV Milwaukee WLOX-TV Us Weekly constitution.com WSFA-TV WKBN 27 WVUE FOX 8 The Register- indiancountryme The Independent 9to5Mac.com GiveMeSport /Film Youngstown OH News article.wn.com Guard dianetwork.com WSBT-TV www.japantimes. I fucking love The Virginian- co.jp science www.joe.co.uk WIRED UK AOL KPLC 7 News Pilot Idaho Statesman ipolitics.ca News 4 Tucson - KVOA Milwaukee Los Angeles KHQ Local WISC-TV / Daily Mirror Miami Herald Journal Sentinel BuzzFeed Hindustan Times Times Billboard News Channel 3000 www.nationalmemo.com www.liverpoolec sports.mynorthwe Corpus Christi www.financialex MSN appleinsider.com ho.co.uk Financial Times st.com Radio Times Caller-Times Willamette Week press.com yourstory.com
