Investigating Gender Stereotypes in the Media: a Natural Language Processing Approach to Understanding Gender Disparities in the Reporting of Football
Linköping University | Department of Management and Engineering
Master’s thesis, 30 credits | Master’s programme
Spring 2021 | LIU-IEI-FIL-A--21/03695--SE

Investigating gender stereotypes in the media: A Natural Language Processing approach to understanding gender disparities in the reporting of football

Isabel Pereira Fernandez

Supervisor: Miriam Hurtado Bodell
Examiner: Erik Rosenqvist

Linköping University
SE-581 83 Linköping, Sweden
+46 013 28 10 00, www.liu.se

Contents

1 Introduction
2 Literature Review
  2.1 Gender Stereotypes
  2.2 Media's role in gender stereotypes
    2.2.1 Semantic Differences
    2.2.2 Syntactical Differences
  2.3 History of Women's Football in the UK
3 Data & Methods
  3.1 Data
  3.2 Seeded Topic Modelling
    3.2.1 Jensen–Shannon Divergence
    3.2.2 Two Sample Kolmogorov–Smirnov Test
  3.3 POS Tagging
    3.3.1 Mann–Whitney U test
    3.3.2 Top Word Analysis
  3.4 Bonferroni Correction
4 Results
  4.1 Semantic Gender Differences
  4.2 Syntactic Gender Differences
5 Discussion
6 Conclusion
A List of Words Used for Categorization
B List of Seed Words Used
C Seeded Topic Models Robustness Test
D Full list of topics and top words
E Keyword Robustness Test
  E.1 POS Tag Results
  E.2 Seeded Topic Model Results
References

List of Figures

1 Number of Articles in the Dataset Over the Years
2 Intuition behind LDA model from Blei (2012)
3 How to Calculate the Two Sample Kolmogorov–Smirnov Test Statistic
4 Jensen–Shannon Divergence For Each Topic For Different Sized Bins
5 Average Jensen–Shannon Divergence For Each Topic Over Time
6 Adjusted P-values for the Two Sample Kolmogorov–Smirnov Test per Topic Over Time
7 Overview of results for the POS tag analysis on nouns and pronouns
8 Overview of results for the POS tag analysis on adjectives and adverbs
9 Overview of results for the POS tag analysis on verbs
List of Tables

1 Most Common Topics For Men's Football Articles
2 Most Common Topics For Women's Football Articles
3 Results for the Two Sample Kolmogorov–Smirnov Test
4 Results for Mann–Whitney U test on POS tag ratios
5 Most Common Unique Words in Category For Each Tag
6 Percentage of Words in the Top 500 That Are Different For Each Tag

Abstract

Sports can be an important factor in defining gender identity. However, sports are generally perceived as a masculine activity, especially when they are highly physical. In turn, this negatively impacts women who want to partake in such activities. The most widely watched sport that is perceived to be masculine is football, which reaches billions of people across the world. Since the media is the main source of information for thousands of people who follow football, it is important to understand what part the media plays in reproducing gender stereotypes. The aim of this research is to investigate this phenomenon by answering the following research question: In what ways does the media reproduce gender stereotypes when reporting on football? To do that, all articles from the Football section of the British newspaper The Guardian published between 2002 and 2020 were collected. The analysis is divided into two parts: semantic and syntactical differences. First, a seeded topic model is used to investigate whether the media focuses on different aspects of the sport depending on which gender is being reported on. Second, a POS tag analysis is conducted to examine whether the media employs different syntax in the coverage of men's and women's football. This is the first large-scale longitudinal study to examine gender differences in media reporting on sports, as well as one of few to use machine learning to analyse gender stereotypes. Findings indicate that both semantic and syntactical differences are prevalent in the reporting.
More specifically, results demonstrate that there is a greater focus on female footballers' personal lives, whereas for male football players the spotlight is on their performances and accomplishments on the pitch. Furthermore, the syntactical analysis indicates that the media uses gendered language more often when reporting on women's football, and utilizes action-packed language when covering men's football. In both semantic and syntactic aspects, the longitudinal analysis demonstrates that the differences are diminishing over time.

1 Introduction

Sports are an important factor in defining gender identities and stereotypes (Thorpe, 2010). From an early age, sports are seen as a male activity, especially when they require masculine attributes such as physical strength, violence and risk-taking (Musto, Cooky, & Messner, 2017; Thorpe, 2010). Therefore, women have a tendency to participate in physical activities that are classified as woman-like and avoid those that are perceived as more masculine. In England, for example, 67% of adult men practice sports regularly, compared to 55% of women (World Health Organization, 2015). This suggests that men are indeed more inclined to practice sports.

Football makes for an ideal candidate to study gender stereotypes in sports, both because of its vast popularity and because it is perceived as a masculine activity. Football is a contact sport which requires strength and aggression (Harris, 2005), and is therefore regarded as an activity more suited for men, stigmatizing women's football. In terms of the number of players, in England, the gap is even larger than that of active adults mentioned above: only 24% of registered footballers are women, with the remaining 76% being men (The Football Association, 2015). In general, women's football is less popular than men's. However, over the last decades, women's football has received increased attention and gained popularity.
The Women's World Cup, for example, had a growth of 65% in viewership from 2015 to 2019, reaching over 1.2 billion viewers (FIFA.com, 2019). Despite the increase in popularity, it is still only a fraction of the viewership of the men's events. In 2018, the men's World Cup final alone amounted to the same number of views as the whole women's tournament in 2019. In fact, the men's version of the tournament in its entirety received over 3.5 billion views, matching the Olympic Games in terms of viewership (FIFA, 2018). This is especially impressive since the Olympic Games are the most important sporting event to take place, as they include more than 30 different sports with participants from over 200 countries (Whannel, 1992). Put together, these characteristics make for an interesting case study, as football is a male-typed activity with a large following which has undergone a change in the gender composition within the sport in recent years.

Media outlets have an important role in influencing gender stereotypes and gender roles. This is because they can frame a story as they wish, and since they have a large reach, they can influence those who receive the story through their lens (Altheide & Snow, 1992; Rogers, 2004). In other words, the way in which a piece portrays a person impacts how the readers of said piece will perceive them. It has been shown that articles covering male and female athletes tend to use different language and focus on different aspects of their performance (Eastman & Billings, 2000; Messner, Duncan, & Jensen, 1993; Musto et al., 2017; Wensing & Bruce, 2003). This can lead people to perceive men and women in sports differently. Therefore, it is important to understand how the media portrays male and female athletes. Based on that, the following research question arises: In what ways does the media reproduce gender stereotypes when reporting on football?
As mentioned above, previous literature has found that gender stereotypes are reproduced by the media both through the topics covered and the language used. In that sense, both semantics and syntax play a role in the differential coverage between men's and women's football. As a result, two sub-questions are formulated: What are the semantic differences in the reporting of male and female footballers, and do they change over time? To what extent does the syntax used by the media lead to different portrayals of men and women in football, and how does it evolve over time?

To answer these questions, data from the British newspaper The Guardian is used. The dataset consists of all articles about football published by the newspaper from 2002 to 2020. These articles are categorized by the gender they refer to using keywords. Once the data has been appropriately categorized, two machine learning techniques are used to obtain results. First, a seeded topic model is adopted to examine the data semantically. Second, part-of-speech (POS) tagging is applied to study the syntactical content of the data.

The present study expands on previous literature in two ways. First, it examines a period of 19 years of coverage without any interruption, something that has not yet been done. This enables a longitudinal investigation of how gender stereotypes in football evolve over the years. Second, it makes use of machine learning, which has not previously been used to study gender differences in sports. This approach also allows for a larger volume of data to be analyzed.

This paper will continue as follows: first, a literature review will outline previous studies in the field and, based on that, draw six hypotheses which will be investigated. Then, the data and methods will be described in more detail.
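
To make the keyword-based categorization step mentioned above more concrete, the following is a minimal Python sketch of how an article might be assigned to a gender category. It is an illustration under stated assumptions and not the procedure used in this thesis: the keyword sets, the function name `categorize`, and the majority-count decision rule are all hypothetical. The word lists actually used are given in Appendix A.

```python
import re
from collections import Counter

# Hypothetical keyword sets for illustration only; the lists actually used
# for categorization are those in Appendix A of the thesis.
WOMEN_KEYWORDS = {"women", "women's", "she", "her", "female"}
MEN_KEYWORDS = {"men", "men's", "he", "his", "him", "male"}

# Lowercase word tokens, keeping an internal apostrophe (e.g. "women's").
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def categorize(article: str) -> str:
    """Label an article 'women', 'men', or 'unclear' by keyword counts.

    Assumed rule: the category with strictly more keyword hits wins;
    ties (including zero hits on both sides) are labelled 'unclear'.
    """
    counts = Counter(TOKEN_RE.findall(article.lower()))
    women_hits = sum(counts[word] for word in WOMEN_KEYWORDS)
    men_hits = sum(counts[word] for word in MEN_KEYWORDS)
    if women_hits > men_hits:
        return "women"
    if men_hits > women_hits:
        return "men"
    return "unclear"


if __name__ == "__main__":
    examples = [
        "She scored twice for the women's team.",
        "He said his side deserved the win.",
        "The match ended in a draw.",
    ]
    for text in examples:
        print(categorize(text), "|", text)
```

Matching on whole tokens rather than substrings avoids false hits such as "he" inside "the" or "her" inside "there". A real pipeline would likely need a richer keyword list and a rule for articles that mention both genders.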