Exploratory analysis of news sentiment using subgroup discovery Anita Valmarska Luis Adrian´ Cabrera-Diego Senja Pollak Jozefˇ Stefan Institute and Elvys Linhares Pontes Jozefˇ Stefan Institute Jamova cesta 39 L3i laboratory Jamova cesta 39 and University of Ljubljana, University of La Rochelle Ljubljana, Slovenia Faculty of Computer La Rochelle, France
[email protected] and information science luis.cabrera diego, Vecnaˇ pot 113 elvys.linhares pontes Ljubljana, Slovenia
[email protected] [email protected] Abstract fake news identification (Bhutani et al., 2019) and in media bias analysis (El Ali et al., 2018). In this study, we present an exploratory anal- In the current trend of natural language process- ysis of a Slovenian news corpus, in which we ing research (Rogers and Augenstein, 2020), the investigate the association between named en- main focus is on improving the predictive perfor- tities and sentiment in the news. We propose a methodology that combines Named Entity mance over state-of-the art especially using deep Recognition and Subgroup Discovery - a de- learning-based methods. The drawback of these scriptive rule learning technique for identify- models is in their very limited interpretability. In ing groups of examples that share the same contrast, several data and text mining techniques class label (sentiment) and pattern (features - have been developed to improve domain under- Named Entities). The approach is used to in- standing and support exploratory analysis of data, duce the positive and negative sentiment class with focus on explainable models, which is crucial rules that reveal interesting patterns related e.g. in medical applications, but also interesting for to different Slovenian and international politi- cians, organizations, and locations.