Organizing Information in Medical Blogs Using a Hybrid Taxonomy-Folksonomy Approach

Total Page:16

File Type:pdf, Size:1020Kb

Organizing Information in Medical Blogs Using a Hybrid Taxonomy-Folksonomy Approach Journal of Web Engineering, Vol. 1, No.1 (2015) 181-195 © Rinton Press ORGANIZING INFORMATION IN MEDICAL BLOGS USING A HYBRID TAXONOMY-FOLKSONOMY APPROACH YAMEN BATCH Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology National University of Malaysia, Selangor, Malaysia [email protected] MARYATI MOHD. YUSOF Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology National University of Malaysia, Selangor, Malaysia [email protected] Received October 11, 2014 Revised January 5, 2015 The retrieval of health-related information from medical blogs is challenging, primarily because these blogs lack a systematic method to organize their posts. This paper investigated the application of a hybrid taxonomy-folksonomy approach in medical blogs by reviewing the existing approaches that apply the hybrid taxonomy-folksonomy structure in web-based systems. The review showed that the hybrid structure was promising for enhanced classification of resources in web-based systems; particularly in a specific type of medical blog known as a physician-written blog. However, further research is needed to truly identify the long-term impact of the hybrid structure and its benefit in achieving a better organization of other categories of medical blogs. Key words: Web-Based Systems, Medical Blogs, Folksonomy, Taxonomy Communicated by: M. Gaedke & L. Carr 1 Introduction The Internet is leading an evolution in the manner in which health information is delivered to health professionals and consumers [1]. The term “e-health” is broadly used to describe the use of the internet or web technology in healthcare [2]. One of the major advances of the internet was the introduction of the Web 2.0 technology in 2004 [3], which represented a new generation of websites and services that supported online collaboration and sharing among users [4]. This provided benefits for an easy-to-use and free social software [5]. Patients participating in Web 2.0 communities could share information about their conditions, diagnoses and medications with others [6]. Physicians also use Web 2.0 applications (e.g., wikis, blogs, forums) to share advice and expertise with other physicians and keep up-to-date with the latest advances in their specialties [7]. Web 2.0 also strengthened the relationship 182 Organizing Information in Medical Blogs Using a Hybrid Taxonomy-Folksonomy Approach between patients and doctors [8]. Some Web 2.0 sites can assist patients in selecting the best doctor or health service [9], or even make appointments with physicians [10]. A growing interest was witnessed in the web-based collaboration ware (namely wikis, blogs and podcasts) which was adopted in the dissemination of health-related information among health professionals and health consumers [11]. The use of blogs in the healthcare context is growing [12, 13]. A medical blog primarily discuss healthcare topics; i.e., its posts usually focuses on medical topics such as diseases, medical treatments and medications [14]. Medical blogs provide health professionals with new channels to share health information with patients and members of the public [15, 16]. These blogs are categorized based on their authors: blogs that were (1) physician-written, (2) nurse-written and (3) patient-written [17]. Patients use blogs to share their own experiences on health and diseases [17]; a few examples include Diabetes Mine blog and My Breast Cancer blog. In contrast, health professionals use blogs to share their practical knowledge and skills [17]; such as in the Clinical Cases blog and Kevin MD blog. Health professionals and health consumers produce significant health-related content through blogs [18]. However, similar to other Web 2.0 sites, medical blogs have drawbacks that include scattered information of an uncertain quality [16]. Retrieving the content of medical blogs is currently challenging, particularly in terms of extracting relevant health-related information; given the lack of systematic methods in the organization of blog posts [19]. Therefore, these blogs still do not meet the expectations of their users in terms of retrieving relevant health information from their posts [20]. One of the most important functions of web information retrieval systems is to organize contents in such a way that users are able to easily retrieve relevant information [21]. Information organization (i.e., how the information is structured) is a crucial aspect to consider in order to achieve an enhanced information retrieval result [22]. Two types of information organization schemes are used in web- based systems: taxonomies (i.e., predefined classifications) and folksonomies (i.e., user-generated classifications). Taxonomies offered consistent classifications of web resources; however, they were not able to represent users’ vocabularies that continue to emerge in online communities. Folksonomies offered flexible and adaptable classifications of web resources; however, they lacked precision when describing resources. Several approaches have hybridized taxonomies with folksonomies to better organize and classify web resources; unfortunately, how a hybrid taxonomy-folksonomy structure can contribute to organizing information in medical blogs is still not well understood. The aim of this paper is to explore the benefits of applying a hybrid taxonomy-folksonomy in medical blogs. To achieve this aim, first, we discussed the characteristics and content of medical blogs. Second, we compared taxonomies to folksonomies and reviewed the approaches that integrated both classifications in web-based systems. Third, we presented an example of applying the hybrid approach in physician-written blogs highlighting its benefits and limitations. Finally, the paper was concluded with some implications for further research. 2 The Characteristics of Medical Blogs Medical posts contain information about medical concepts such as diagnoses, procedures or medications [23]. To gain a better understanding of their characteristics and contents, some of the research accomplished in the field of medical blogs were reviewed and discussed. To date, only a few studies have focused on the content of medical blogs. Table 1 summarized these studies. Y. Batch and M. M. Yusof 183 Author Objective Method Results Denecke [14] To introduce an The algorithm used an existing Distinguished algorithm to categorize information extraction system informative posts posts according to their (SeReMeD) to extract entities on from affective posts information type and to diagnoses, procedures and in medical blogs determine whether they medications based on a fixed and identified contained medical classification of related medical medical-related content. semantic types. information in posts. Lagu, Kaufman [15] To evaluate the content Two hundred and seventy-one Nearly half of the of blogs written by medical blogs were chosen and blogs discussed health professionals five entries per blog were healthcare-related (i.e., health-related or reviewed to identify content. topics. Over half of otherwise) and the the blogs had characteristics of identifiable authors. medical bloggers. More than half of the blogs described interactions with individual patients or promoted healthcare products. Denecke [24] To introduce a method The method was based on The method helped to study diversity in information extraction and domain improve post medical blogs’ topics. knowledge and was applied to a retrieval by set of medical posts. detecting the medical category of the post. Miller and Pole [12] To analyze the content Identified a sample of 951 health Most blogs focused of health blogs. blogs in 2007 and 2008. All blogs on health topics were U.S.-focused and were from professional updated regularly; their features and patient and topics were analyzed. perspective. Wagner, Paquin [25] To assess the Conducted a content analysis of In order to improve prevalence of various genetics blogs (n = 94) and Genetics blogs genetics-related topics archived blog contents published access, bloggers and perceived between June 15, 2007 and June should consider credibility indicators in 15, 2009. updating blog Genetics Blogs. content frequently and providing information related to their expertise. Greenberg, Yaari [26]To examine the Constructed six blogs on diabetes The participants credibility of medical treatment. Each blog was viewed have an attitude of blogs and their by approximately 60 participants. scepticism and/or published information The participants were asked to fill criticism of many as perceived by their in a short questionnaire that aspects of the readers. measured their perceived information in the credibility of the blog and its blogs in spite of information. their desire and readiness to use this information. Table 1 Summary of studies that discussed the content of medical blogs 184 Organizing Information in Medical Blogs Using a Hybrid Taxonomy-Folksonomy Approach The first study by Denecke [14] introduced an algorithm to classify medical posts according to their information type, and further distinguished informative posts from affective posts. A medical post was considered affective if it described the author’s feelings on treatments, diseases or medications. A medical post was considered informative if it contained information on diseases, treatments or information related to healthcare. An affective post was regarded as unhelpful to the readers since it reflected authors’ emotions and did not hold any
Recommended publications
  • Studying Social Tagging and Folksonomy: a Review and Framework
    Studying Social Tagging and Folksonomy: A Review and Framework Item Type Journal Article (On-line/Unpaginated) Authors Trant, Jennifer Citation Studying Social Tagging and Folksonomy: A Review and Framework 2009-01, 10(1) Journal of Digital Information Journal Journal of Digital Information Download date 02/10/2021 03:25:18 Link to Item http://hdl.handle.net/10150/105375 Trant, Jennifer (2009) Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information 10(1). Studying Social Tagging and Folksonomy: A Review and Framework J. Trant, University of Toronto / Archives & Museum Informatics 158 Lee Ave, Toronto, ON Canada M4E 2P3 jtrant [at] archimuse.com Abstract This paper reviews research into social tagging and folksonomy (as reflected in about 180 sources published through December 2007). Methods of researching the contribution of social tagging and folksonomy are described, and outstanding research questions are presented. This is a new area of research, where theoretical perspectives and relevant research methods are only now being defined. This paper provides a framework for the study of folksonomy, tagging and social tagging systems. Three broad approaches are identified, focusing first, on the folksonomy itself (and the role of tags in indexing and retrieval); secondly, on tagging (and the behaviour of users); and thirdly, on the nature of social tagging systems (as socio-technical frameworks). Keywords: Social tagging, folksonomy, tagging, literature review, research review 1. Introduction User-generated keywords – tags – have been suggested as a lightweight way of enhancing descriptions of on-line information resources, and improving their access through broader indexing. “Social Tagging” refers to the practice of publicly labeling or categorizing resources in a shared, on-line environment.
    [Show full text]
  • Why Do Folksonomies Need Semantic Web Technologies? Hak-Lae Kim Samsung Electronics Co., LTD
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by AIS Electronic Library (AISeL) Association for Information Systems AIS Electronic Library (AISeL) Americas Conference on Information Systems AMCIS 2010 Proceedings (AMCIS) 8-2010 Why Do Folksonomies Need Semantic Web Technologies? Hak-Lae Kim Samsung Electronics Co., LTD. 416, Maetan-3dong, Yeongtong-gu Suwon-city, Gyeonggi-do, 443-742 Korea, [email protected] Jae-Hwa Choi Dankook University, [email protected] Follow this and additional works at: http://aisel.aisnet.org/amcis2010 Recommended Citation Kim, Hak-Lae and Choi, Jae-Hwa, "Why Do Folksonomies Need Semantic Web Technologies?" (2010). AMCIS 2010 Proceedings. 68. http://aisel.aisnet.org/amcis2010/68 This material is brought to you by the Americas Conference on Information Systems (AMCIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in AMCIS 2010 Proceedings by an authorized administrator of AIS Electronic Library (AISeL). For more information, please contact [email protected]. Kim, et al. Why Do Folksonomies Need Semantic Web Technologies? Why Do Folksonomies Need Semantic Web Technologies? Hak-Lae Kim Jae-Hwa Choi Samsung Electronics Co., LTD. College of Business 416, Maetan-3dong, Yeongtong-gu Dankook University Suwon-city, Gyeonggi-do, 443-742 San#29, Anseo-dong, Dongnam-gu, Cheonan-si, Korea Chungnam, 330-714, Korea [email protected] [email protected] ABSTRACT This paper is to investigate some general features of social tagging and folksonomies along with their advantages and disadvantages, and to present an overview of a tag ontology that can be used to represent tagging data at a semantic level using Semantic Web technologies.
    [Show full text]
  • Folksonomies - Cooperative Classification and Communication Through Shared Metadata
    Folksonomies - Cooperative Classification and Communication Through Shared Metadata Adam Mathes Computer Mediated Communication - LIS590CMC Graduate School of Library and Information Science University of Illinois Urbana-Champaign December 2004 Abstract This paper examines user-generated metadata as implemented and applied in two web services designed to share and organize digital me- dia to better understand grassroots classification. Metadata - data about data - allows systems to collocate related information, and helps users find relevant information. The creation of metadata has generally been approached in two ways: professional creation and author creation. In li- braries and other organizations, creating metadata, primarily in the form of catalog records, has traditionally been the domain of dedicated profes- sionals working with complex, detailed rule sets and vocabularies. The primary problem with this approach is scalability and its impracticality for the vast amounts of content being produced and used, especially on the World Wide Web. The apparatus and tools built around professional cataloging systems are generally too complicated for anyone without spe- cialized training and knowledge. A second approach is for metadata to be created by authors. The movement towards creator described docu- ments was heralded by SGML, the WWW, and the Dublin Core Metadata Initiative. There are problems with this approach as well - often due to inadequate or inaccurate description, or outright deception. This paper examines a third approach: user-created metadata, where users of the documents and media create metadata for their own individual use that is also shared throughout a community. 1 The Creation of Metadata: Professionals, Con- tent Creators, Users Metadata is often characterized as “data about data.” Metadata is information, often highly structured, about documents, books, articles, photographs, or other items that is designed to support specific functions.
    [Show full text]
  • Technical Folksonomy of Tagging Information
    IRA-International Journal of Education & Multidisciplinary Studies ISSN 2455–2526; Vol.03, Issue 03 (2016) Institute of Research Advances http://research-advances.org/index.php/IJEMS Technical Folksonomy of Tagging Information Sonali Dapsi Librarian, Raja Peary Mohan College, India. Sudip Ranjan Hatua Asst Prof. Rabindra Bharati University, Kolkata, India. DOI: http://dx.doi.org/10.21013/jems.v3.n3.p6 How to cite this paper: Dapsi, S., & Hatua, S. (2016). Technical Folksonomy of Tagging Information. IRA International Journal of Education and Multidisciplinary Studies (ISSN 2455–2526), 3(3). doi:http://dx.doi.org/10.21013/jems.v3.n3.p6 © Institute of Research Advances This works is licensed under a Creative Commons Attribution-Non Commercial 4.0 International License subject to proper citation to the publication source of the work. Disclaimer: The scholarly papers as reviewed and published by the Institute of Research Advances (IRA) are the views and opinions of their respective authors and are not the views or opinions of the IRA. The IRA disclaims of any harm or loss caused due to the published content to any party. 333 IRA-International Journal of Education & Multidisciplinary Studies ABSTRACT The Study of the various articles published in Library and Information Science Journals in the resent times shows that the keywords provided by the authors along with their articles are mostly uncontrolled. They are basically phrases. In spite of the knowledge of controlled vocabulary and various subject heading scheme they mostly are using natural word and sentences to represent the thought content of their research outcomes. This is generating new trends of representing subjects known as technical folksonomy.
    [Show full text]
  • Ontology Vs. Folksonomy
    Ontology vs. Folksonomy Theodosia Togia 1 Comparison ONTOLOGY FOLKSONOMY designed by knowledge engineers collaboratively created by users laborious quick & easy requires expertise no expertise needed hard to implement on a large scale used on large-scale document collections controlled vocabulary no vocabulary control engineer's view of the world social aspect of meaning formal specification of knowledge domains informal metadata on documents structured unstructured (but structure emerges) not necessarily web-based typically web-based essential for the semantic web important in web 2.0 (social web) high expressive power low expressive power explicit meaning ambiguity synonyms, homonyms etc. can be clearly stated synonyms separated; homonyms conflated 2 Folksonomies • broad vs. narrow folksonomies • social vs. personal tagging (public vs. private tags) • tagging behaviour can give information about language and society • entry points to document collections: documents (their tags and users can be browsed), tags (their documents and users can be browsed), users (their tags and documents can be browsed) • not just typical information retrieval, but also resource discovery 1 3 Ontologies • facilitate interpersonal communication, human-computer interaction, inter-computer interaction • can be extended with new predicates (known as `concepts' or `classes' if they have 1 argument; `properties' or `relations' if they have at least 2 arguments) and axioms/rules. • How complex/expressive is an ontology? Depends on terminology. (usually frames, Description Logics, First-Order Logic; maybe higher- order logics too) • relations of equivalence (e.g. `equals', `hasOpposite'), hierarchy (e.g. `isA'/`isSubclassOf', `isPartOf', `isInstanceOf'), other associations (e.g. `hasColour') • decisions to be made: granularity, classes vs. instances 4 Competing or Complementary? Folksonomy and Ontology both: • broaden the spectrum of knowledge representation in different direc- tions.
    [Show full text]
  • Visualizing Social Indexing Semantics
    PREPRINT Visualizing Social Indexing Semantics Yusef Hassan-Montero1 and Víctor Herrero-Solana2 March 2007 1 SCImago Research Group. [email protected] 2 SCImago Research Group. [email protected] Abstract Social tagging is a distributed indexing process where web resources are described by means of tags – freely chosen keywords or labels – by their users. The aggregated result of social tagging is usually known as Folksonomy, an index where each resource is related to different tags by different users. This paper describes an approach to visualize both the overview and detail of semantic relationships intrinsic in the folksonomy. For this purpose, folksonomy is displayed as a network: tags are represented as nodes and semantic relationships as links between nodes. The semantic relationships are measured using the Jaccard coefficient, which determines the thickness of the link between tags. The bisecting K-mean algorithm is applied to cluster the tags, and the tags in the same cluster are displayed with the same colour. In order to ensure that the overview of social tagging is comprehensible, the Pathfinder Network Scaling algorithm is used to prune the network and only show the most significant links among tags. In addition, to get a better understanding of the local semantic relationships of a tag, an interactive procedure was applied to display some important links pruned by Pathfinder when the user moves mouse over a tag. The presented approach provides a means of knowledge acquisition and understanding of the socially constructed meaning of tags, as well as a means of visual information retrieval. Keywords Social Indexing, Pathfinder Networks, Clustering, Visual Information Retrieval Interfaces 1.
    [Show full text]
  • The Advantages and Disadvantages of Social Tagging: Evaluation of Delicious Website1
    The Advantages and Disadvantages of Social Tagging: Evaluation of Delicious Website1 Ruslan Lecturer of Library Science Department Faculty of Letter and Humanism Ar-Raniry State Islamic University Banda Aceh - Indonesia E-mail: [email protected] Introduction The internet is the fastest growing and largest tool for mass communication and information distribution in the world. It can be used to distribute large amounts of information anywhere in the world at a minimal cost. This progress also can be seen from the emergence of Web 2.02, a form of social computing which engages consumers at the grassroots level in systems that necessitate creative, collaborative or information sharing tasks. Web 2.0 encompasses social bookmarking, blogging, wikis and online social networking among others. One of the ways users can do this is through tagging. Tagging is referred to with several names: collaborative tagging, social classification, social indexing, folksonomy, etc. The basic principle is that end users do subject indexing instead of experts only, and the assigned tags are being shown immediately on the Web (Voss, 2007). Nowadays, social bookmarking systems have been successful in attracting and retaining users. This success initially originated from members’ ability to centrally store bookmarks on the web. 1 This is assignment paper of author when studying in School of Information Science, McGill University, Montreal-Canada, 2009. 2 Web 2.0 is a set of economic, social, and technology trends that collectively form the basis for the next generation of the Internet—a more mature, distinctive medium characterized by user participation, openness, and network effects.
    [Show full text]
  • Taxonomy Directed Folksonomies
    2nd Version Date : 19/06/2007 TAXONOMY DIRECTED FOLKSONOMIES Integrating user tagging and controlled vocabularies for Australian education networks Sarah Hayman and Nick Lothian education.au Adelaide Australia Meeting: 157 Classification and Indexing Simultaneous Interpretation: No WORLD LIBRARY AND INFORMATION CONGRESS: 73RD IFLA GENERAL CONFERENCE AND COUNCIL 19-23 August 2007, Durban, South Africa http://www.ifla.org/iv/ifla73/index.htm 1 Abstract What is the role of controlled vocabulary in a Web 2.0 world? Can we have the best of both worlds: balancing folksonomies and controlled vocabularies to help communities of users find and share information and resources most relevant to them? education.au develops and manages Australian online services for education and training. Its goal is to bring people, learning and technology together. education.au projects are increasingly involved in exploring the use of Web 2.0 developments building on user ideas, knowledge and experience, and how these might be integrated with existing information management systems. This paper presents work being undertaken in this area, particularly in relation to controlled vocabularies, and discusses the challenges faced. Education Network Australia (edna) is a leading online resource collection and collaborative network for education, with an extensive repository of selected educational resources with metadata created by educators and information specialists. It uses controlled vocabularies for metadata creation and searching, where users receive suggested related terms from an education thesaurus, with their results. We recognise that no formal thesaurus can keep pace with user needs so are interested in exploiting the power of folksonomies. We describe a proof of concept project to develop community contributions to managing information and resources, using Taxonomy-Directed Folksonomy.
    [Show full text]
  • Integration Guide 7.8.0-12 Table of Contents
    CUE Content Store Integration Guide 7.8.0-12 Table of Contents 1 Introduction.......................................................................................................................................... 6 1.1 Conventions Used in This Manual......................................................................................... 7 1.2 Backwards Compatibility.........................................................................................................7 1.3 How The Web Service Works................................................................................................7 1.4 Change Logs.......................................................................................................................... 9 1.4.1 Accessing Change Logs From the "Start" Feed..................................................... 10 1.4.2 Accessing a Section's Change Log.........................................................................12 1.4.3 Accessing Tag Structure Change Logs...................................................................12 1.4.4 Accessing Inbox/List Change Logs......................................................................... 13 1.5 Searching..............................................................................................................................13 2 Supported MIME Types.....................................................................................................................15 2.1 application/atom+xml...........................................................................................................
    [Show full text]
  • UNDERSTANDING a FOLKSONOMY AS a WEB CLASSIFICATION By
    UNDERSTANDING A FOLKSONOMY AS A WEB CLASSIFICATION by HEE JIN PARK A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Library, Archival and Information Studies) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) December 2009 © Hee Jin Park, 2009 ABSTRACT Despite increasing interest in folksonomy in practice as well as in research, little has been done to build a solid conceptual framework to understand how people classify Web resources using a folksonomy. This study is an attempt to articulate a conceptual framework that will help us better understand users’ interactions with a folksonomy. The conceptual framework consists of three components of users’ interactions with the folksonomy: (1) tagging – cognitive categorization of an individual user with a Web accessible resource; (2) navigation – exploration and discovery of Web accessible resources in the folksonomic system; and (3) knowledge sharing – representation and communication of knowledge within a domain that consists of a group of users who share the same interests or goals. The current study is exploratory and descriptive, focusing on the first component of users’ interaction with a folksonomy, tagging. The purpose of this study is to explore how users are tagging in order to utilize a folksonomy; and whether or how they understand the social and interactive aspects of tagging in three different folksonomic systems, Connotea (www.connotea.org ), Delicious ( http://delicious.com ), and CiteULike ( www.citeulike.org ). The study uses Web questionnaires, qualitative diary studies, and follow-up interviews to ii understand 12 participants’ tagging activities associated with folksonomic interactions.
    [Show full text]
  • Folksonomy with Practical Taxonomy, a Design for Social Metadata of the Virtual Museum of the Pacific
    University of Wollongong Research Online Faculty of Creative Arts - Papers (Archive) Faculty of Arts, Social Sciences & Humanities 1-1-2009 Folksonomy with practical taxonomy, a design for social metadata of the virtual museum of the Pacific Peter W. Eklund University of Wollongong, [email protected] Peter Goodall University of Wollongong, [email protected] Timothy Wray University of Wollongong, [email protected] Vinod Daniels The Australian Museum Melanie Van Olffen The Australian Museum Follow this and additional works at: https://ro.uow.edu.au/creartspapers Part of the Arts and Humanities Commons, and the Social and Behavioral Sciences Commons Recommended Citation Eklund, Peter W.; Goodall, Peter; Wray, Timothy; Daniels, Vinod; and Van Olffen, Melanie: Folksonomy with practical taxonomy, a design for social metadata of the virtual museum of the Pacific 2009, 112-117. https://ro.uow.edu.au/creartspapers/217 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] The 6th International Conference on Information Technology and Applications (ICITA 2009) Folksonomy with Practical Taxonomy, a Design for Social Metadata of the Virtual Museum of the Pacific1 Peter Eklund†, Peter J. Goodall†, Tim Wray†, Vinod Daniel‡ and Melanie Van Olffen‡ navigation by unobtrusively rendering a concept view with Abstract—The Virtual Museum of the Pacific is a Digital links to its upper (more general) and lower (more specialised) Ecosystem that engages members of several communities, neighbours. The navigation paradigm is based on a technique each with their own ontological relationships with the called Formal Concept Analysis [1] and the design results Pacific Collection of the Australian Museum.
    [Show full text]
  • Folksonomy-Based Term Extraction for Word Cloud Generation
    Folksonomy-Based Term Extraction for Word Cloud Generation DAVID CARMEL, EREL UZIEL, IDO GUY, YOSI MASS, and HAGGAI ROITMAN, IBM Research - Haifa Lab In this work we study the task of term extraction for word cloud generation in sparsely tagged domains, in which manual tags are scarce. We present a folksonomy-based term extraction method, called tag-boost, which boosts terms that are frequently used by the public to tag content. Our experiments with tag-boost based term extraction over different domains demonstrate tremendous improvement in word cloud quality, as reflected by the agreement between manual tags of the testing items and the cloud’s terms extracted from the items’ content. Moreover, our results demonstrate the high robustness of this approach, as compared to alternative cloud generation methods that exhibit a high sensitivity to data sparseness. Additionally, 60 we show that tag-boost can be effectively applied even in nontagged domains, by using an external rich folksonomy borrowed from a well-tagged domain. Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Storage and Retrieval— Information Search and Retrieval General Terms: Algorithms, Experimentation Additional Key Words and Phrases: Tag-cloud generation, keyword extraction, tag-boost ACM Reference Format: Carmel D., Uziel E., Guy I., Mass Y., and Roitman H. 2012. Folksonomy-based term extraction for word cloud generation. ACM Trans. Intell. Syst. Technol. 3, 4, Article 60 (September 2012), 20 pages. DOI = 10.1145/2337542.2337545 http://doi.acm.org/10.1145/2337542.2337545 1. INTRODUCTION The emergence of social media applications in recent years has encouraged people to be actively involved in content creation and classification.
    [Show full text]