From Dbpedia and Wordnet Hierarchies to Linkedin and Twitter
Total Page:16
File Type:pdf, Size:1020Kb
From DBpedia and WordNet hierarchies to LinkedIn and Twitter Aonghus McGovern Alexander O’Connor Vincent Wade ADAPT Centre ADAPT Centre ADAPT Centre Trinity College Dublin Trinity College Dublin Trinity College Dublin Dublin, Ireland Dublin, Ireland Dublin, Ireland [email protected] Alex.OConnor Vincent.Wade@cs @scss.tcd.ie .tcd.ie Abstract growing in number, but also in their heterogeneity' Previous research has demonstrated the benefits of (2014). They further argue that the best way to using linguistic resources to analyze a user’s social overcome these issues is by using linguistic media profiles in order to learn information about that resources that conform to Linked Open Data user. However, numerous linguistic resources exist, principles. Chiarcos et al. divide such resources raising the question of choosing the appropriate into two categories, distinguishing between resource. This paper compares Extended WordNet strictly lexical resources such as WordNet and Domains with DBpedia. The comparison takes the general knowledge bases such as DBpedia. form of an investigation of the relationship between users’ descriptions of their knowledge and background The study described in this paper examines a on LinkedIn with their description of the same single resource of each type. WordNet is characteristics on Twitter. The analysis applied in this considered to be a representative example of a study consists of four parts. First, information a user has shared on each service is mined for keywords. purely lexical resource given the extent of its use 1 These keywords are then linked with terms in in research . DBpedia is considered to be a DBpedia/Extended WordNet Domains. These terms are representative example of a knowledge base ranked in order to generate separate representations of because its information is derived from the user’s interests and knowledge for LinkedIn and Wikipedia, the quality of whose knowledge Twitter. Finally, the relationship between these almost matches that of Encyclopedia Britannica separate representations is examined. In a user study (Giles, 2005). These resources are compared by with eight participants, the performance of this analysis means of an investigation of the relationship using DBpedia is compared with the performance of between users’ descriptions of their knowledge this analysis using Extended WordNet Domains. The and background on LinkedIn with their best results were obtained when DBpedia was used. description of the same characteristics on Twitter. 1 Introduction Both LinkedIn and Twitter allow users to describe Natural Language Processing (NLP) techniques their interests and knowledge by: (i) Filling in have been shown in studies such as Gao et al. profile information (ii) Posting status updates. (2012) and Vosecky et al. (2013) to be successful However, the percentage of users who post status in extracting information from a user’s social updates on LinkedIn is significantly lower than media profiles. In these studies, linguistic the percentage of users who do so on Twitter resources are employed in order to perform NLP (Bullas, 2015). On the other hand, LinkedIn users tasks such as identifying concepts, Named Entity fill in far more of their profiles on average than Recognition etc. However, as Chiarcos et al. Twitter users (Abel, Henze, Herder, and Krause, argue, significant interoperability issues are posed 2010). by the fact that linguistic resources ‘are not only 1 A list of publications involving WordNet: http://lit.csci.unt.edu/~wordnet/ 1 Proceedings of the 4th Workshop on Linked Data in Linguistics (LDL-2015), pages 1–10, Beijing, China, July 31, 2015. c 2015 Association for Computational Linguistics and Asian Federation of Natural Language Processing Given the different ways in which users use these described in this paper, the Extended WordNet services, it is possible that they provide different Domain labels created by González et al. (2012) representations of their interests and knowledge are used. Not only do these labels provide greater on each one. For example, a user may indicate a synset coverage, González et al. report better new-found interest in ‘Linguistics’ through their WSD performance with Extended WordNet tweets before they list this subject on their Domains than with the original WordNet LinkedIn profile. This study examines the Domains. relationship between users’ descriptions of their Mihalcea and Csomai describe an approach for interests and knowledge on each service. identifying the relevant Wikipedia articles for a 2 Related Work piece of text (2007). Their approach employs a combination of the Lesk algorithm and Naïve Hauff and Houben describe a study that Bayes classification. Since DBpedia URIs are investigates whether a user’s bookmarking profile created using Wikipedia article titles, the above on the sites Bibsonomy, CiteULike and approach can also be used to identify DBpedia LibraryThing can be inferred using information entities in text. obtained from that user’s tweets (2011) . Hauff and Houben generate separate ‘knowledge Magnini et al’s approach offers a means of profiles’ for Twitter and for the bookmarking analysing the skill, interest and course lists on a sites. These profiles consist of a weighted list of user’s LinkedIn profile with regard to both terms that appear in the user’s tweets and WordNet and DBpedia. Unambiguous items in bookmarking profiles, respectively. The authors’ these lists can be linked directly with a WordNet approach is hindered by noise introduced by synset or DBpedia URI. These unambiguous tweets that are unrelated to the user’s learning items can then provide a basis for interpreting activities. This problem could be addressed by ambiguous terms. For example, the unambiguous enriching information found in a user’s profiles ‘XML’ could be linked with the ‘Computer with structured data in a linguistic resource. Science’ domain, providing a basis for interpreting the ambiguous ‘Java’. However, there are often multiple possible interpretations for a term. For example, the word The above analysis allows for items in a user’s ‘bank’ has entirely different interpretations when tweets and LinkedIn profile to be linked with it appears to the right of the word ‘river’ than entities in WordNet/DBpedia. Labels associated when it appears to the right of the word with these entities can then be collected to form ‘merchant’. When linking a word with separate term-list representations for a user’s information contained in a linguistic resource, the tweets and LinkedIn profile information. correct interpretation of the word must be chosen. Plumbaum et al. describe a Social Web User The NLP technique Word Sense Disambiguation Model as consisting of the following attributes: (WSD) addresses this issue. Two different Personal Characteristics, Interests, Knowledge approaches to WSD are described below. and Behavior, Needs and Goals and Context Magnini et al. perform WSD using WordNet as (2011). Inferring ‘Personal Characteristics’ (i.e. well as domain labels provided by the WordNet demographic information) from either a user’s Domains project2 (2002). This project assigned tweets or their LinkedIn profile information would domain labels to WordNet synsets in accordance require a very different kind of analysis from that with the Dewey Decimal Classification. However, described in this paper, for example that WordNet has been updated with new synsets since performed by Schler and Koppel (2006). As Magnini et al.’s study. Therefore, in the study Plumbaum et al. define ‘Behaviour’ and ‘Needs and Goals’ as system-specific characteristics, 2 http://wndomains.fbk.eu/ information about a specific system would be 2 required to infer them. ‘Context’ requires to recommend items to them e.g. news articles. information such as the user’s role in their social Furthermore, as user activity is significantly network to be inferred. higher on Twitter than on LinkedIn (Bullas, Given the facts stated in the previous paragraph, 2015), users may discuss recent interests on the the term lists generated by the analysis in this former without updating the latter. The second study are taken as not describing ‘Personal research question of this study is as follows: Characteristics’, ‘Behavior’, ‘Needs and Goals’ RQ2. Can information obtained from a user’s and ‘Context’. However, a term-list format has been used to represent interests and knowledge by tweets be used to recommend items for that user’s Ma, Zeng, Ren, and Zhong (2011) and Hauff and LinkedIn page? Houben (2011), respectively. As mentioned These questions aim to investigate: (i) The previously, users’ Twitter and LinkedIn profiles variation between a user’s description of their can contain information about both characteristics. Thus, the term lists generated in this study are interests and knowledge through their LinkedIn taken as representing a combination of the user’s profile and their description of these interests and knowledge. characteristics through their tweets (ii) Whether a user’s tweets can be used to augment the 3 Research Questions information in their LinkedIn profile. 3.1 Research Question 1 4 Method This question investigates the possibility that a The user study method is applied in this research. user may represent their interests and knowledge This decision is taken with reference to work such differently on different Social Media services. It is as Lee and Brusilovsky (2009) and Reinecke and as follows: Bernstein (2009). The aforementioned authors RQ1. To what extent does a user’s description of employ user studies in order to determine the their interests and knowledge through their tweets accuracy with which their systems infer correspond with their description of the same information about users. characteristics through their profile information 4.1 Analysis on LinkedIn? The analysis adopted in this study consists of four For example, 'Linguistics’ may be the most stages: discussed item in the user’s LinkedIn profile, but Identify keywords only the third most discussed item in their tweets. Link these keywords to labels in DBpedia This question is similar to that investigated by /Extended WordNet Domains) Hauff and Houben (2011).