An Automated Background Check on Professional Football Players

Eindhoven University of Technology

MASTER

A method to perform an automated background check on professional football players

Hendrickx, T.

Award date: 2016

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain

School of Industrial Engineering
Information Systems Group

Master Thesis

T. Hendrickx
0920308

Supervisors:
dr. R.J. de Almeida e Santos Nogueira (TU/e)
prof.dr.ir. U. Kaymak (TU/e)
ir. B.J. Aalbers (SciSports)

Final Version
Enschede, July 2016

Abstract

Lately, much attention has been paid to modeling the performance of association football players on the pitch, to see whether a specific player fits in a team, to detect promising players early in their career, and to predict match outcomes in order to make a profit by placing bets. However, information about football players off the pitch has not been investigated yet, while many clubs show interest in a player background check.
Performing such a background check manually is a time-consuming process, and often considers text documents individually. Throughout this thesis, text mining techniques are used both to reduce the time it takes to perform a background check and to detect patterns in large amounts of textual data about football players. A first attempt is made to construct a personality profile in terms of the Big Five Personality Factor model, based on news articles about a football player. Three different methods, two of which use the bag-of-words approach and one of which uses part-of-speech tagging, are tested and the results are evaluated. Furthermore, other text mining techniques, such as regular expressions and sentiment analysis, are applied to obtain background information about football players from Twitter.

The results of the Twitter analysis are directly applicable in practice. A list of topics a player often talks about and of people to whom he is talking can save time. A word cloud and a visualization of the sentiment analysis provide insight into what the fans think about this player. The construction of a personality profile, on the other hand, requires further research. One of the experiments showed promising results on the Openness to Experience personality factor, and another one on the Neuroticism factor, but more work is needed to improve the construction of a person's personality profile based on news articles.

Keywords: bag-of-words, part-of-speech tagging, personality, sentiment analysis, text mining

Executive Summary

This master thesis project is carried out at SciSports, a company that wants to rationalize the decision-making process in the football transfer market (in this thesis, the term football refers to soccer or association football). The main priority of SciSports is looking at the performance of football players on the pitch.
This implies that data on player actions on the pitch is analyzed, transformed into information, and included in the player reports that are sold to professional football organizations. However, those football organizations also seem to be interested in background information about players, since SciSports receives a lot of requests for player background checks. Currently, these player background checks are created manually, which is considered to be a very time-consuming process. Reducing the time required for this process would give the personnel of SciSports more time to focus on other activities within the company. Furthermore, the background checks currently contain only certain facts about a player, found directly in a news article or a social media post. Combining different articles to find patterns that reveal new information is not done yet.

Therefore, this master thesis project should contribute to SciSports in two different ways. The first is to make the process of creating background checks less time-consuming, by automatically filtering interesting text data. The second is to combine different text data sources in order to reveal new information that cannot be extracted from the text data by just reading it. To address these challenges, an answer to the following research question is to be found:

How can text mining techniques help in performing a player background check on professional football players?

Selecting Sources and Gathering the Data

To find out which specific news sites and social media platforms are usable for this project, and how the data should be collected such that it can be analyzed, the first sub-question reads as follows:

Which data sources need to be selected in order to obtain the most meaningful information for a player background check and how should these data be gathered?
Since building one HTML parser to parse the content of different news websites is not feasible, eight news websites (four in Dutch and four in English) were selected and a specific HTML parser was built for each of them, such that a total of 58,140 news articles about 203 different players could be downloaded into a database. The websites were selected with the help of the person currently performing player background checks at SciSports. Furthermore, Twitter data on 10 different football players in both English and Dutch was downloaded. Using the Twitter API, more than 20,000 tweets were downloaded, both tweets posted by the players themselves and tweets posted by football fans about the players. The latter were obtained by downloading mentions of a football player's Twitter account.

Identifying Interesting Player Characteristics

After the data has been collected, it can be analyzed to discover information. Before this is done, it must be specified which kinds of information are interesting to know from both an academic and a practical point of view. This is done by answering the following sub-question:

Which player characteristics are interesting to know and how can these characteristics be obtained from the data?

The first aspect involved in the automated background check is the player's personality. Of the different frameworks for mapping personality that currently exist, the most widely used one, the Big Five Personality Factor model, is used throughout this master thesis. The five factors of this personality model are:
• Agreeableness versus Antagonism.
• Conscientiousness versus Lack of Direction.
• Extraversion versus Introversion.
• Neuroticism versus Emotional Stability.
• Openness to Experience versus Closedness to Experience.
SciSports indicated that they find the personality aspect of a player very interesting, and creating an overview of a player's personality based on news articles is also interesting from a theoretical point of view. Existing literature shows a clear relationship between written text and a person's personality in terms of the Big Five, but its conclusions are based on people performing specific writing tasks, or people writing pieces of text about themselves, and not on pieces of text that are written about them by others. There is no theory available on how to predict a person's personality based on text that is written about this person rather than by this person himself.

Furthermore, a relationship seems to exist between a person's personality and his social media profiles. However, user-created content, such as tweets and blog posts, is irrelevant according to the existing literature. Other parts of the social media profile, such as the number of friends and the photos posted, are more important in determining a person's personality. Since user-created tweets are the data used in this project, it is, according to the literature, not possible to determine a person's personality from these data alone.

Therefore, it is necessary to create a model that can predict a football player's personality by analyzing news articles about him. A large list of 435 personality adjectives was found in the literature, which can be used to analyze the articles. It is assumed that when many news articles about a player are gathered, the adjectives used in those articles can tell something about this player's personality. For example, if the word sympathetic appears a lot in news articles about a certain player, one might expect this player to score high on the Agreeableness personality factor.

Besides a player's personality, three other types of information that can be obtained from the Twitter data are proposed.
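As a rough illustration of the adjective-counting assumption described above, the sketch below sums signed adjective counts per Big Five factor over a set of articles. It is not the thesis's actual method, which is not spelled out in this summary. The lexicon here is a hypothetical stand-in for the 435-adjective list: only the link between sympathetic and Agreeableness comes from the text, while the other entries, the +1/-1 scoring, and the simple tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: adjective -> (factor, direction).
# Only "sympathetic" -> Agreeableness is taken from the text; every other
# entry is an illustrative assumption, not part of the 435-adjective list.
LEXICON = {
    "sympathetic": ("Agreeableness", +1),
    "kind": ("Agreeableness", +1),
    "rude": ("Agreeableness", -1),
    "talkative": ("Extraversion", +1),
    "quiet": ("Extraversion", -1),
    "anxious": ("Neuroticism", +1),
}


def factor_scores(articles):
    """Sum signed adjective occurrences per factor over all articles."""
    scores = Counter()
    for text in articles:
        # Naive tokenizer (an assumption): lowercase alphabetic runs only.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in LEXICON:
                factor, direction = LEXICON[token]
                scores[factor] += direction
    return dict(scores)


articles = [
    "He is a sympathetic and kind captain.",
    "Fans call him sympathetic, though quiet off the pitch.",
]
print(factor_scores(articles))
```

A plain count like this ignores negation ("not sympathetic") and does not check whether the adjective actually refers to the player, which is one reason such scores should be read as weak signals only.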
A first interesting aspect is what players generally talk about on Twitter, and to whom they are talking. Hashtags are commonly used on Twitter to indicate the topic of a tweet. Furthermore, mentions (an at sign followed by a username) are placed in tweets to notify other Twitter users. These mentions and hashtags are extracted to see to whom a player is talking on Twitter and what he is talking about.
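The extraction of hashtags and mentions described above could be sketched with regular expressions as follows. This is a minimal example under stated assumptions, not the thesis's implementation: the patterns, the `summarize` helper, the lowercasing of tags and usernames, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Assumed patterns: '#' or '@' followed by word characters, and not preceded
# by a word character, so e-mail addresses such as info@example.com are skipped.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w+)")


def summarize(tweets, top=3):
    """Return the most frequent hashtags and mentions in a list of tweets."""
    hashtags = Counter()
    mentions = Counter()
    for tweet in tweets:
        # Lowercase so that #UCL and #ucl count as the same topic.
        hashtags.update(tag.lower() for tag in HASHTAG.findall(tweet))
        mentions.update(name.lower() for name in MENTION.findall(tweet))
    return hashtags.most_common(top), mentions.most_common(top)


tweets = [
    "Great win today! #UCL @teammate_a",
    "Training with @teammate_a and @coach_b #ucl #training",
    "Mail me at info@example.com",
]
topics, contacts = summarize(tweets)
print(topics)    # most frequent hashtags
print(contacts)  # most frequently mentioned accounts
```

Ranking by frequency gives the kind of short list of topics and contacts that the summary describes as saving time during a manual check.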
