Using Data Science to Understand the Film Industry's Gender Gap

Dima Kagan∗1, Thomas Chesney†2, and Michael Fire‡1

1 Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev
2 Nottingham University Business School

∗ [email protected]
† [email protected]
‡ mickyfi@post.bgu.ac.il

August 7, 2019

arXiv:1903.06469v3 [cs.SI] 6 Aug 2019

Abstract

Data science can offer answers to a wide range of social science questions. Here we turn attention to the portrayal of women in movies, an industry that has a significant influence on society, impacting such aspects of life as self-esteem and career choice. To this end, we fused data from the online movie database IMDb with a dataset of movie dialogue subtitles to create the largest available corpus of movie social networks (15,540 networks). Analyzing this data, we investigated gender bias in on-screen female characters over the past century. We find a trend of improvement in all aspects of women's roles in movies, including a constant rise in the centrality of female characters. There has also been an increase in the number of movies that pass the well-known Bechdel test, a popular, albeit flawed, measure of women in fiction. Here we propose a new and better alternative to this test for evaluating female roles in movies. Our study introduces fresh data, an open-code framework, and novel techniques that present new opportunities in the research and analysis of movies.

Keywords: Data Science, Network Science, Gender Gap, Social Networks

1 Introduction

The film industry is one of the strongest branches of the media, reaching billions of viewers worldwide [13, 19].