Leveraging Cross-domain Social Media Analytics to Understand TV Topics Popularity
Additional Material

Ruggero G. Pensa, Maria Luisa Sapino, Claudio Schifanella, Luca Vignaroli

THE SYSTEM ARCHITECTURE

In this section, we describe the architecture implementing our framework. Consistent with the framework, each source (both social and non-social) is associated with an analyzer module whose task is to collect the data from the source and extract concepts, subjects, social objects and their relations through the combined use of shared modules (a Named Entity Recognition module and a Sentiment Analysis module). The knowledge base extracted by each analyzer is used to update the graph G^K (see Section ??).

[Figure: layered architecture diagram. Labels: Sources Layer (S1, S2, ..., Sn), Extractors, Sources Processing, Domain Knowledge Layer, Knowledge Graph, Query Analysis Layer, Views, Analysis.]
Fig. 4. The concept-level integration framework.

For each TV program that a schedule analyzer has inserted in the knowledge graph, the Twitter module collects all related tweets in real time and groups them into time-dependent slices, where each slice contains the tweets published from time t to t + ∆. Each tweetset is then processed by an NER (Named Entity Recognition) module to detect the named entities (people, places and events), while a Sentiment Analysis module extracts the opinions contained in the tweetset. Similarly, at each time slice, the other source analyzer modules look for new elements (posts, videos, user comments) that belong to previously analyzed media and perform the same type of analysis described for Twitter.

Within the NER module, we can distinguish two phases: entity detection and entity disambiguation [1]. Entity detection is performed by a combined use of the Freeling POS tagger [2] and Wikipedia articles as the reference knowledge base. In particular, through the Wikipedia search API¹, the NER module is able to detect entities starting from hashtags: for example, the hashtag #barackobama will be recognized by Wikipedia as the string “Barack Obama”.

1. https://www.mediawiki.org/wiki/API:Search

Nevertheless, the most challenging task in NER is entity disambiguation (or resolution) [1]. Since our scenario is characterized by short and sparse texts (both for Twitter and Facebook comments), many of the existing approaches based on the bag-of-words model will fail. For this reason, our NER module leverages the additional information provided by the context of the TV program involved in the resolution process, in order to establish which entity is the best among the set of candidate real-world entities. In detail, the context of a TV program is defined by the Wikipedia categories it belongs to and by the set of all entities in the knowledge graph previously associated with the program. Notice also that Wikipedia categories have been previously mapped onto concepts belonging to the ontology graph G^O. In this manner, for each detected entity, the NER module tries to establish an order among all real-world candidates extracted from Wikipedia. For example, if the text “Michael Jordan” is contained in a tweetset related to a TV sports program, it is very likely that the tweetset refers to the famous basketball player rather than to the Berkeley professor; this is computed by comparing the Wikipedia categories of the candidates with the corresponding categories of the TV program. Moreover, if Michael Jordan is present in the knowledge graph as a real-world entity already recognized and associated with the considered TV program (for instance, because he is the presenter or a frequent guest), the NER module will choose that entity among all the real-world candidates.
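
The context-based ordering of candidates described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` class, the `rank_candidates` function, the use of Jaccard similarity as the category comparison, and the category sets are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Candidate:
    """A hypothetical real-world candidate entity with its Wikipedia categories."""
    title: str
    categories: FrozenSet[str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two category sets (assumed comparison measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(candidates: List[Candidate],
                    program_categories: Set[str],
                    program_entities: Set[str]) -> List[Candidate]:
    """Order candidates: entities already linked to the program come first,
    then candidates are sorted by category overlap with the program."""
    def score(c: Candidate):
        known = 1 if c.title in program_entities else 0
        return (known, jaccard(set(c.categories), program_categories))
    return sorted(candidates, key=score, reverse=True)


# Illustrative (made-up) category sets for the two candidates.
player = Candidate("Michael Jordan", frozenset({"basketball players", "sports"}))
professor = Candidate("Michael I. Jordan", frozenset({"computer scientists", "academics"}))
sports_program = {"sports", "television programs"}

ranked = rank_candidates([professor, player], sports_program, set())
```

In this toy setting the basketball player is ranked first for a sports program, while passing `{"Michael I. Jordan"}` as `program_entities` would put the professor first, mirroring the rule that entities already associated with the program take precedence.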
Finally, our module supports the integration of external knowledge generated in a supervised scenario, and it allows for user feedback through an active learning process. In our application, we filter out infrequently recognized entities with the energy cutoff method.

The sentiment analysis module is used to extract polarity values and emotions from tweetsets. For polarity, a first phase of lemmatization is performed by the Freeling POS tagger, and SentiWordNet [3] is then used to extract the polarity values; an aggregation function then enriches each tweetset in the knowledge graph with a degree of positivity, negativity and neutrality. With the same approach, WordNet-Affect [4] is used to extract emotions. Where necessary, MultiWordNet² is used for cross-language purposes.

The knowledge graph is stored in Neo4j³, a NoSQL graph database: it offers a comprehensive REST interface and an object-oriented API, and it scales up to billions of nodes and relationships with properties.

The last component of the architecture is the module dedicated to data analysis and to the publication of the results to the end users of the system. This module can extract simple views of the graph as well as run more complex query and analysis algorithms.
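
The polarity aggregation mentioned for the sentiment analysis module can be sketched as follows. The text does not specify the aggregation function, so a plain average over lemmas is assumed here; `TOY_LEXICON` is a hypothetical stand-in for SentiWordNet, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: lemma -> (positive, negative) scores in [0, 1].
# In SentiWordNet the objective score is 1 - positive - negative.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "good": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "debate": (0.0, 0.0),
}


def lemma_polarity(lemma: str) -> Tuple[float, float, float]:
    """Return (positive, negative, neutral) for a lemma; unknown lemmas are neutral."""
    pos, neg = TOY_LEXICON.get(lemma, (0.0, 0.0))
    return pos, neg, 1.0 - pos - neg


def aggregate_tweetset(tweets: List[List[str]]) -> Dict[str, float]:
    """Average the polarity of all lemmas in a tweetset (assumed aggregation).

    Each tweet is given as a list of lemmas, i.e. lemmatization is assumed
    to have been done already."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for lemmas in tweets:
        for lemma in lemmas:
            for i, value in enumerate(lemma_polarity(lemma)):
                totals[i] += value
            count += 1
    if count == 0:
        # An empty tweetset carries no opinion: treat it as fully neutral.
        return {"positive": 0.0, "negative": 0.0, "neutral": 1.0}
    return {
        "positive": totals[0] / count,
        "negative": totals[1] / count,
        "neutral": totals[2] / count,
    }


degrees = aggregate_tweetset([["good", "debate"], ["bad"]])
```

With this choice the three degrees always sum to one, which makes tweetsets of different sizes comparable; other aggregation functions (for example, weighting by retweets) would fit the same interface.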

REFERENCES

[1] H. Köpcke and E. Rahm, “Frameworks for entity matching: A comparison,” Data and Knowledge Engineering, vol. 69, no. 2, pp. 197–210, 2010.
[2] L. Padró and E. Stanilovsky, “FreeLing 3.0: Towards wider multilinguality,” in Proceedings of LREC 2012, 2012, pp. 2473–2479.
[3] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining,” in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), Valletta, Malta, 2010.
[4] C. Strapparava and A. Valitutti, “WordNet-Affect: An affective extension of WordNet,” in Proceedings of LREC 2004, 2004.

2. http://multiwordnet.fbk.eu
3. http://www.neo4j.org

[Figure: two knowledge graph diagrams. Legend: Concept, Social object, Subject node, Time node, Representation, Structure, Support, Active usr. Panel (a): Facebook post for Ballarò, 03/11/2015; node labels include rai, giova78, marco masi, Italia, Roma, Andrea Romano, Matteo Renzi, Luca Zaia. Panel (b): tweet set for Ballarò, 27/10/2015; node labels include @RaiBallaro, @claunet75, @skippy, Firenze, Roma, D. Della Valle, Raffaele Cantone, Matteo Renzi.]

Fig. 1. Knowledge graph representation of a Facebook post (a) and a Twitter thread (b).

(a) Twitter social network

Rank  Person             Centrality
1                        0.5244
2                        0.1624
3                        0.0477
4     Maurizio Landini   0.0379
5     Elsa Fornero       0.0139
6     Oscar Giannino     0.0128
7     Giorgia Meloni     0.0116
8     Maurizio Lupi      0.0103
9     Maurizio Martina   0.0092
10    Paola De Micheli   0.0061

(b) Facebook social network

Rank  Person             Centrality
1     Matteo Renzi       0.1662
2     Massimo Giannini   0.1550
3     Silvio Berlusconi  0.1159
4     Matteo Salvini     0.0714
5     Beppe Grillo       0.0655
6     Elsa Fornero       0.0292
7     Maurizio Landini   0.0212
8     Giovanni Floris    0.0180
9                        0.0108
10    Maurizio Lupi      0.0105

(c) Combined social network

Rank  Person             Centrality
1     Matteo Renzi       0.3089
2     Matteo Salvini     0.1208
3     Silvio Berlusconi  0.1008
4     Massimo Giannini   0.0689
5     Maurizio Landini   0.0362
6     Beppe Grillo       0.0319
7     Elsa Fornero       0.0294
8     Maurizio Lupi      0.0142
9     Oscar Giannino     0.0129
10    Daniela Santanchè  0.0119

Fig. 2. Top betweenness centrality scores of nodes from different social networks.
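
For readers unfamiliar with the measure, the following is a minimal sketch of betweenness centrality on an unweighted, undirected graph, computed with Brandes' algorithm. The construction of the social networks and the normalization used for the scores in Fig. 2 are not stated here, so the normalization by (n-1)(n-2)/2 and the toy star graph are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def betweenness_centrality(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, float]:
    """Normalized betweenness centrality of an unweighted, undirected graph."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: List[Hashable] = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    n = len(adj)
    # Each unordered pair was counted twice; dividing by (n-1)(n-2) both
    # halves the raw score and normalizes by the (n-1)(n-2)/2 possible pairs.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in score.items()}


# Toy star graph: every shortest path between leaves passes through "hub".
star_scores = betweenness_centrality([("hub", "a"), ("hub", "b"), ("hub", "c")])
```

In the star graph the hub lies on every shortest path between leaves, so it receives the maximum normalized score and the leaves receive zero.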

[Figure: six line plots, one per person: (a) Matteo Renzi, (b) Maurizio Landini, (c) Matteo Orfini, (d) Giorgia Meloni, (e) Matteo Salvini, (f) Oscar Giannino. x-axis: Episodes (Jan 2015 to Jul 2015); y-axis: Popularity (0 to 0.8); series: Twitter, Facebook, Twitter+Facebook.]

Fig. 3. Episode popularity of some cited persons from our knowledge graph.