Reliability of online news media during periods of stress

Yoram Timmerman

Supervisor: Prof. dr. ir. Antoon Bronselaer
Counsellor: Hannah Van den Bossche

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Department of Telecommunications and Information Processing
Chair: Prof. dr. ir. Herwig Bruneel
Faculty of Engineering and Architecture
Academic year 2017-2018

Preface

Exactly one year ago, when I decided to write a thesis on the reliability of online news, I knew for sure that I was heading towards an interesting year. The subject perfectly combined my two biggest passions, computer science engineering and news. However, I also knew that writing a thesis would require a lot of work. One year of very intensive work later, I can say that it was worth it. I am proud of and satisfied with the end result that I can finally present.

First, I would like to thank my promotor, prof. dr. ir. Antoon Bronselaer. He was very closely involved in the research process that led to this final work and was always available to provide the necessary feedback. Without his help, it would have been impossible to write this thesis. I would also like to thank hir. Hannah Van den Bossche, who helped me a lot as well. She even performed part of the manual error annotations in this thesis, to verify whether her results coincided with mine.

Furthermore, I would like to thank my mother and my boyfriend, who provided me with the optimal circumstances to write this thesis.
Without their support, this thesis would not have been the same. Finally, a word of thanks goes to my cat, who was present during the writing of almost every page of this work. Although she probably does not remember anything of what I explained to her in the previous months, she was always there to listen to my ideas.

Yoram Timmerman, Ghent, May 2018

Permission for usage

“The author(s) gives (give) permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the copyright terms have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.”

Yoram Timmerman, Ghent, May 2018

Reliability of online news media during periods of stress

Yoram Timmerman

Supervisor: Prof. dr. ir. Antoon Bronselaer
Counsellor: hir. Hannah Van den Bossche

Master’s dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Department of Telecommunications and Information Processing
Chair: Prof. dr. ir. Herwig Bruneel
Faculty of Engineering and Architecture
Ghent University
Academic year 2017-2018

Summary

This Master's dissertation presents an extensive study of the reliability of online news in Flanders. More specifically, the two largest Flemish online newspapers are investigated. The analysis uses both manual and automated techniques. Next to a general overview of the reliability, attention is also given to the influence of breaking news events, and the periods of stress that accompany them, on this reliability. We analyze three different aspects of the reliability of online news: accuracy, consistency and relevance. In a first part, the accuracy of Flemish online newspapers and the influence of breaking news events on this accuracy are investigated.
This is approached by manually screening a data set of articles for errors. The results of these manual annotations are analyzed and conclusions regarding the accuracy of online news in Flanders are drawn. In a second part, an algorithm is developed that searches data sets of articles about the same subject to find inconsistencies between the articles. This algorithm processes structured graph representations of initially unstructured text articles. This technique is then tested on data sets of articles written during specific periods of stress, in order to quantify the problem of consistency of articles during these periods. Thirdly, a specific aspect of the relevance of articles during periods of stress, namely their freshness, is analyzed. An automatic analysis method based on similarity measures is presented to this end. By testing this method on two period-of-stress data sets, the problem of lack of freshness is investigated.

Keywords

online news reliability, graph databases, text processing, similarity measures

Reliability of online news media during periods of stress

Yoram Timmerman

Supervisor(s): Antoon Bronselaer, Hannah Van den Bossche

Abstract: Studies investigating the accuracy of printed news media are widespread. However, as far as we know, no studies exist that investigate the broad concept of reliability of online news media in Flanders and the influence of breaking news events on this. In this paper, the reliability of Flemish online news media is investigated by analyzing their accuracy, consistency and relevance. Next to an investigation of the accuracy of Flemish online news media under the influence of different breaking news events, two algorithms are presented. One allows journalists to find numerical inconsistencies within a data set of articles about the same subject. Another algorithm can be used to detect how much new information an article contains.

Keywords: online news reliability, period of stress, graph databases, text processing, similarity measures

I. INTRODUCTION

Following what happens around the world by reading newspapers or watching news shows on television is an important aspect of many people's daily lives. However, the way people follow the news is changing very rapidly (Picone, 2016). Instead of reading printed newspapers, more and more people use online newspapers as their primary source of information. The Digital Report of Belgium in 2016 (Picone, 2016) indicates that around 50% of the people in Flanders still read a printed news article at least once a week. However, for online news (including social media links to articles), an overwhelming 83% of the surveyed sample reports reading at least one such article a week. This percentage is ever increasing. Typical examples of such online newspapers in Flanders include HLN.be and nieuwsblad.be. It can thus be assumed that, in a world of fast digitalization, this trend will not stop in the next couple of years.

As more and more people make use of these online news services, it is important that these services are of sufficient quality. Quality of news is difficult to measure, as it is to a large extent a subjective issue: many common errors found in news articles are open to discussion. However, different studies have already been conducted in the past to measure the quality of printed news articles (e.g. Maier et al. (2002)). These studies, conducted both in the United States and in Europe, indicate that the number of errors that can be found in a collection of printed news articles is quite high. Maier et al. (2002) conclude that 59% of the investigated local articles that [...]

[...] news in a couple of aspects. Most important to note here is that online news media are part of the 24-hour news cycle (Bucy, Gantz, & Wang, 2007). While printed news media typically have a fixed deadline (e.g. the evening before publication), online news media publish their articles as fast as possible, 24/7. This possibly puts very high pressure on the editorial offices of such online newspapers. Especially when a breaking news event has happened, it can be assumed that the pressure to publish all incoming information as fast as possible becomes very high. Possibly, this is reflected in the quality and reliability of the online news articles that are finally published. As such, a study investigating the reliability of online news media in Flanders is important. To the best of our knowledge, no studies exist that investigate the quality or reliability of online news media in Flanders. Moreover, no studies were found that investigate the influence of breaking news events on this reliability.

Reliability is a very broad term. It can be summarized as the extent to which people reading online news can trust that what they read is a truthful, unbiased, correctly represented and correctly written article. In the context of this study, three different aspects of the reliability of online news were investigated: accuracy, consistency and relevance. The accuracy of an article is related to how many errors are present in the article. The consistency measures whether information present in different articles is compatible: if two articles contain contradictory information, the articles are said to be inconsistent. Finally, the relevance of an article expresses how important the information in the article is to the understanding of the subject it covers.

In section II, an investigation of the accuracy of Flemish online newspapers under the influence of different breaking news events is performed. In section III, a structured representation of online news articles is presented. Moreover, an algorithm is illustrated that exploits this structured representation to find numerical inconsistencies between articles about the same subject. In section IV, a specific aspect of the relevance of articles about a breaking news event, i.e. their freshness, is studied. An automatic analysis method is presented to this end.
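As an illustration of what a similarity-based freshness analysis could look like, the following is a minimal sketch only. It assumes a Jaccard similarity on sets of words and defines freshness as one minus the highest similarity to any earlier article; both choices, as well as the function names `tokens`, `jaccard` and `freshness`, are assumptions made for this example and are not necessarily the measures used in this work.

```python
import re


def tokens(text):
    """Return the set of lowercased word tokens in a text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def freshness(article, earlier_articles):
    """Hypothetical freshness score in [0, 1].

    Assumption: an article is fresh to the extent that it differs from
    the most similar article published before it.
    """
    if not earlier_articles:
        return 1.0  # nothing was published earlier, so everything is new
    current = tokens(article)
    highest = max(jaccard(current, tokens(e)) for e in earlier_articles)
    return 1.0 - highest
```

Under these assumptions, an article that merely repeats an earlier article scores 0, while an article sharing no words with any earlier article scores 1. A word-set measure ignores word order and paraphrasing, so it should be read as a rough lower bound on what a real analysis would need.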