A Semantic Tag Recommendation Framework for Collaborative Tagging Systems
Conference Paper · October 2011 · DOI: 10.1109/PASSAT/SocialCom.2011.170
2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing
978-0-7695-4578-3/11 $26.00 © 2011 IEEE

Zinovia I. Alepidou∗, Konstantinos N. Vavliakis∗,† and Pericles A. Mitkas∗,†
∗Electrical and Computer Engineering, Aristotle University of Thessaloniki
†Informatics and Telematics Institute, CERTH
Thessaloniki, Greece

Abstract: In this work we focus on folksonomies. Our goal is to develop techniques that coordinate information processing, by taking advantage of user preferences, in order to automatically produce semantic tag recommendations. To this end, we propose a generalized tag recommendation framework that conveys the semantics of resources according to different user profiles. We present the integration of various models that take into account content, historic values, user preferences and tagging behavior to produce accurate personalized tag recommendations. Based on this information we build several Bayesian models, we evaluate their performance, and we discuss differences in accuracy with respect to semantic matching criteria, and other approaches.

Keywords: Folksonomy; tagging; recommendation; personalization; semantic evaluation.

I. INTRODUCTION

Folksonomies allow users to describe a set of objects using freely chosen keywords. Formally, folksonomies can be regarded as a tripartite graph of three disjoint sets of vertices: a) users, U = {a_1, ..., a_n}, b) resources, R = {r_1, ..., r_n}, and c) tags, T = {t_1, ..., t_n}, thus defined by a set of annotations F ⊆ U × R × T.

The selection of keywords depends on the user’s comprehension of the concept/entity being tagged. Different attributes of the resource, i.e. title, content, author, creation date, as well as individual user attributes, i.e. preferences, profile and background, in correlation with the tagging time and date, contribute towards creating the user’s individual perception of a particular concept/entity. Our main goal is to produce user-centric, yet generalized, algorithms which simulate human behavior towards tagging.

Our work makes three main contributions: 1) We provide a modular tag recommendation framework, built on user-centric Bayesian models. 2) We combine and evaluate different recommendation algorithms, and we show that different techniques should be used, depending on the available information. 3) We provide a linguistic approach to tag recommendation, indicating that various algorithms may yield different results, depending on the evaluation method.

The paper is organized as follows: Section II presents related work, while Section III outlines the preprocessing steps and the Bayesian models developed. Section IV discusses semantic and string-based evaluation, and the last section presents our conclusions and future work.

II. RELATED WORK

Numerous methodologies, such as statistical models, multilabel classifiers and collaborative systems that combine information from multiple resources, have been proposed for tag recommendation. These methodologies are based on resource attributes and have been applied in different fields, such as YouTube, Flickr and music resources. In all cases the goal is to accurately specify the resource’s content, in order to elicit words semantically related to it.

The majority of tag recommendation systems are based on statistical models. These systems propose tags based on TF-IDF weights (term frequency - inverse document frequency) built from the information contained in the content of the resource, the user profile, and the resource history.

Other techniques focus on eliciting latent concepts included in the document. LDA (Latent Dirichlet Allocation) [1], ACT (Author-Conference-Topic) [2] and NMF (Non-Negative Matrix Factorization) classify a resource based on its main concepts. These algorithms attribute resources to appropriate categories and then propose a corresponding set of tags. Furthermore, well-accepted machine learning techniques, like clustering, logistic regression, and classification trees [3], have been extensively applied in order to produce tag recommendations. Many methodologies concentrate only on the content of resources. In this context, [4] uses FDT (Feature-Driven Tagging), a fast method that indexes tags by features and then creates a list of weighted tags that form the pool of possible suggestions. Moreover, [5] calculates the importance of words derived from the resource and combines information sources with barely-used tags.

A group of recommendation techniques propose tags based on similarities among posts. These similarities may refer to the users of posts, the documents they are about to mark, or the tags they assign to the resources [5]. In [6], Marinho et al. introduce relational classifiers that take into consideration information derived from all the resource’s attributes, as well as latent semantic relations between posts. For example, a scientific publication may be related to another one if both have been written by the same author or have the same citations. Finally, Lipczak et al. [7] integrate various recommendation mechanisms and generate a final tag recommendation set with high accuracy.

III. TAG RECOMMENDATION

In the preprocessing phase we start by building a lexicon as a repository where words extracted from the resource are connected to a number of tags. Thus, the final tag suggestion consists of the n tags with the highest co-occurrence probability based on the words that constitute the resource’s content. We created the lexicon using a linguistic approach. Linguistic types such as articles, pronouns and conjunctions lack semantic content, so they were discarded together with common stop-words. We used WordNet to apply stemming in order to acquire the root of each term and enrich our lexicon by adding the capability of hypernym search, alongside word search. Our system was built upon the resource’s title, history, and other tags. Next, we briefly discuss each of the three models.

A. Tag recommendation based on resource’s title

The title of a resource is a key indicator for its content, thus tags may be extracted from the resource’s attributes without further elaboration. Nouns, adjectives and verbs are the richest parts of speech, as far as semantic content is concerned. WordNet allows us to retrieve only these parts of speech from text, given the fact that it can recognize the linguistic type of a word. It is worth noting that some domain-specific terms that convey semantic meaning only within specific subjects or specific communities (e.g. Folksonomy, elearning, bibtex) are not contained in WordNet. These Non-WordNet terms are regarded as candidates for tag recommendations, as they account for a significant proportion of the terms contained in the training dataset.

We developed a “TitleRecommender” model, which detaches words from the basic resource’s attribute (title or URL), and suggests them as tags. Adjectives, nouns and Non-WordNet terms are extracted from the resource and proposed as tags. Furthermore, tag suggestions may also

[...]

of a post. If both the resource and the user appear for the first time in the system, we can propose the most frequent tags of the training dataset in general (“MostFrequentTags” method).

In an attempt to take advantage of the user’s personomy (the user’s past tagging behavior), we created a method called “UserPersonomy”, partially presented in Algorithm 1. Whereas the “User History Tag” method simply suggests the most popular tags in the user profile, “UserPersonomy” goes one step further, and counts the times a term derived from a resource’s primary attribute is also present in the user’s personomy. In other words, this method is a union of “TitleRecommender” and “User History Tags”, combining the former’s effectiveness and the latter’s subjectivity.

Algorithm 1 Partial pseudo-code for “UserPersonomy”.
for r_i resources in user’s personomy do
    extract w_i terms of ’TitleRecommender’
    for w_i terms in the title do
        Score_{w_i} = Σ_i (w_i ∩ tag_{i,user})
suggest w_i with the highest Score_{w_i}

C. Tag to tag recommendation

In some cases it may be useful to recommend tags based on other recommended tags, especially when the latter are provided with high probability. Given a tag, other tags can be produced from its etymologically related terms, like synonyms and hypernyms. This type of meta-recommender may support the basic recommenders by improving the quality and quantity of the recommendation, especially when the main recommender fails to provide a sufficient number of tags. Nevertheless, the main drawback of this technique is that