
Exploiting Social Tagging Network for Web Mining and Search

A Thesis

Submitted to the Faculty

of

Drexel University

by

Caimei Lu

in partial fulfillment of the

requirements for the degree

of

Doctor of Philosophy

November 2011

© Copyright 2011 Caimei Lu All Rights Reserved.


ACKNOWLEDGEMENTS

First and foremost, I offer my sincere gratitude to my advisor Xiaohua (Tony) Hu for his supervision, encouragement and support from the preliminary to the concluding stages of my thesis. He showed me an excellent example of a successful researcher and professor.

The energy, enthusiasm and joy he has for his research motivated all his students and the people around him, including me. I believe the experience of working with him over the past five years will continue to benefit me in my future career.

Dr. Jung-Ran Park deserves special thanks for being my co-advisor. I am much indebted to Dr. Park for her supervision at the initial stage of my Ph.D. study, and for her valuable advice and funding support through the completion of this thesis.

My sincere thanks go to Dr. Il-Yeol Song who always kindly granted me his precious time for answering my questions and providing me useful advice on my research, teaching, and career development.

For this dissertation, I also would like to thank my committee members, Christopher Yang, Yuan An, and Jin Wen. I thank them for their constructive comments on this thesis and their insightful questions at my thesis defense.

It is also a pleasure to pay tribute to the members of the research group supervised by Tony Hu. The group has been a source of good advice and friendship. I benefited a lot from the outstanding work of Davis Zhou and Xiandan Zhang. The Dragon Toolkit developed by them was a very useful tool for my research experiments.

Many thanks go to Xin Chen for research discussions and the pleasure of working together on the design of algorithms and the development of research papers.

I am also grateful to all the fellow Ph.D. students I met at the Drexel iSchool. I would like to thank Shanshan Ma, Palakorn Achananuparp, Ornsiri Thonggong, Ki Jung Lee, Jian Zhang, Sidath Gunawardena, Jia Huang, Zunyan Xiong, and Mi Zhang for being my great classmates, collaborators and friends, and for all the fun we have had together in the last five years.

I gratefully acknowledge the funding sources that made my Ph.D. work possible.

The research related to this thesis was supported in part by an IMLS Early Career Development Grant (2006-2010), an NSF CAREER Grant (NSF IIS 0448023), and a PA Department of Health Grant (No. 239667).

Lastly, I owe my deepest gratitude to my family. This thesis would not have been possible without their support. I wish to thank my parents for raising me as a responsible person and constantly supporting me in all my pursuits. My special gratitude goes to my elder brother Yonggang Lu, who has been a role model for me since I was a teenager. I owe him many thanks for showing me the joy of intellectual pursuit and for generously helping me whenever I encountered difficulties in my study or life. And most of all, I wish to thank my loving, encouraging, and patient husband Haishan Zhao for his faithful support on my journey to the completion of this thesis.

TABLE OF CONTENTS

Table of Contents ...... iii

List of Tables ...... viii

List of Figures ...... x

Abstract ...... xii

CHAPTER 1: Introduction ...... 1

1.1 The Potential of Social Tagging for Web Mining and Search ...... 2

1.2 The Challenges of Utilizing Social Tagging for Web Mining and Search ...... 4

1.3 Research Questions and Framework ...... 7

1.3.1 The Effectiveness of Social Tags as Document Features ...... 7

1.3.2 Tag-based Web Clustering ...... 9

1.3.3 Personalized Search on Tagged Web ...... 10

1.3.4 Topic Models of Tagged Web ...... 11

1.4 Data Sets for Experimentation ...... 13

1.5 The Organization of the Thesis ...... 13

CHAPTER 2: Literature Review ...... 15

2.1 Usage Patterns and Semantics of Social Tags ...... 15

2.1.1 Power Law Distribution ...... 15

2.1.2 Stable Pattern in Social Tagging ...... 16

2.1.3 Tag Categories ...... 17

2.1.4 Meaningfulness of Social Tags ...... 18

2.2 Social Tags as Index Terms ...... 18

2.2.1 Social Tags and Professionally Created ...... 18

2.2.2 Social Tags, Full Text, and Query ...... 20

2.3 Tag-based Web Clustering ...... 21

2.3.1 Clustering on Extended Document Features ...... 21

2.3.2 Clustering on Social Tagging...... 23

2.4 Personalized Search on Tagged Web ...... 24

2.4.1 Personalized Search ...... 24

2.4.2 Personalized Search based on Social Tagging ...... 25

2.5 Topic Models of Tagged Web ...... 26

2.5.1 Topic Analysis using Generative Models ...... 26

2.5.2 Topic Models of Tagged Web ...... 27

CHAPTER 3: A Comparison between Social Tags and Expert-assigned Subject Terms 30

3.1 Introduction ...... 30

3.2 Dataset...... 33

3.3 Analysis and results ...... 35

3.3.1 Overlapping vocabulary ...... 36

3.3.2 Frequent tags and LC subject headings ...... 38

3.3.3 Tags and LC subject headings annotating the same book ...... 40

3.3.4 Tags and LCSH subdivisions ...... 42

3.3.5 Tags, LC subject headings and book titles...... 45

3.4 Conclusion ...... 46

3.5 Research Question Tested ...... 48

CHAPTER 4: A Comparison of Social Tags, Author-Provided Metadata and Content Words ...... 49

4.1 Introduction ...... 49

4.2 Dataset...... 52

4.3 Vocabulary Analysis ...... 54

4.4 Clustering Analysis ...... 57

4.4.1 Clustering Standard and Quality Metrics ...... 58

4.4.2 Clustering Method and Results ...... 59

4.5 Conclusion ...... 63

4.6 Research Question Tested ...... 64

CHAPTER 5: Clustering the Tagged Web ...... 65

5.1 Introduction ...... 65

5.2 Tripartite Clustering Model ...... 67

5.2.1 Model Formulation ...... 67

5.2.2 Algorithm ...... 71

5.3 Tag-based K-means and Link K-means ...... 75

5.3.1 K-means ...... 76

5.3.2 Link K-means on Social Annotation...... 77

5.4 Experiments ...... 80

5.4.1 Dataset and Clustering Standard ...... 80

5.4.2 Quality Metrics ...... 81

5.4.3 Parameter Selection ...... 81

5.5 Clustering Results ...... 83

5.5.1 Tripartite Clustering ...... 83

5.5.2 K-means on Social Tagging Network ...... 85

5.5.3 Link K-means on Social Tagging Network ...... 87

5.6 Discussion ...... 89

5.7 Conclusion and Future Work ...... 91

5.8 Research Questions Tested ...... 92

CHAPTER 6: Personalized Search on Tagged Web ...... 93

6.1 Introduction ...... 93

6.2 Personalized Search Framework ...... 95

6.2.1 Model Formulation ...... 96

6.2.2 Triple-based Query Expansion Method ...... 100

6.3 Enhanced Ranking ...... 103

6.4 Experiments ...... 105

6.4.1 Datasets ...... 105

6.4.2 Evaluation Method ...... 106

6.4.3 Evaluation Metrics ...... 107

6.4.4 Parameter Settings ...... 108

6.4.5 Baselines ...... 112

6.5 Results ...... 113

6.6 Conclusion ...... 117

6.7 Research Question Tested ...... 118

CHAPTER 7: A Topic-Perspective Model for the Social Tagging System ...... 119

7.1 Introduction ...... 119

7.2 Topic-Perspective Model ...... 122

7.2.1 Model Formulation ...... 122

7.2.2 Parameter Estimation ...... 125

7.3 Experiments ...... 128

7.3.1 Topic Models for Comparison ...... 128

7.3.2 Datasets ...... 129

7.3.3 Evaluation Criterion ...... 129

7.3.4 Experimental Setup ...... 130

7.4 Results ...... 132

7.4.1 Tag Perplexity ...... 132

7.4.2 Discovered Topics and Perspectives ...... 133

7.4.3 The Generation Sources of Tags ...... 136

7.5 Future Applications ...... 137

7.6 Conclusion ...... 138

7.7 Research Question Tested ...... 139

CHAPTER 8: Conclusions ...... 140

8.1 Social Tags as Index Terms ...... 140

8.2 Exploiting the Tripartite Network of Social Tagging ...... 142

8.3 Topic Models for Social Tagging System ...... 143

8.4 Contributions...... 144

List of References ...... 145

Vita ...... 154


LIST OF TABLES

Table 2.1 Mapping of four different classification schemas of social tags ...... 17

Table 3.1 Average frequency of shared terms and the whole set of tags and LCSH terms ...... 37

Table 3.2 Top 30 frequent tags and LC subject headings ...... 39

Table 3.3 Number of book records with different percentages of LCSHs present in tags ...... 40

Table 3.4 Example terms assigned by users and experts to the same books and different books ...... 41

Table 3.5 Statistics of LCSH subfield terms ...... 43

Table 3.6 Top 10 LCSH subfield terms also used as tags ...... 43

Table 3.7 Number of book records which have at least one tag appearing in LCSH subfields ...... 45

Table 4.1 30 most frequent tags and keyword terms ...... 57

Table 4.2 The clustering quality of K-means on different vector space models ...... 62

Table 5.1 Algorithm of Tripartite Clustering ...... 75

Table 5.2 Quality of Tripartite Clustering for URLs ...... 84

Table 5.3 Top 10 Frequent Tags of 15 Tag Clusters ...... 85

Table 5.4 Clustering Quality of K-Means with Different Vector Space Models and Two Weighting Schemes ...... 87

Table 5.5 Clustering Quality of Link-Based K-Means Based on Different Link Types ...... 88

Table 6.1 Searching methods used for experiments ...... 113

Table 6.2 MAP of different searching methods on Delicious dataset ...... 114

Table 6.3 Success of different searching methods on Delicious dataset ...... 114

Table 6.4 MAP of different searching methods on Bibsonomy dataset ...... 116

Table 6.5 Success of different searching methods on Bibsonomy dataset ...... 116

Table 7.1 Notations of Parameters ...... 127

Table 7.2 A subset of discovered topics ...... 134

Table 7.3 A subset of discovered perspectives ...... 135

Table 7.4 Example Tags with three different values of λ ...... 136


LIST OF FIGURES

Figure 1.1 Illustration of the tripartite network of social tagging ...... 6

Figure 3.1 An example of a bibliographic MARC record ...... 34

Figure 3.2 Jaccard Index for Top frequent tags and LCSH terms ...... 38

Figure 4.1 Distribution of the 23,188 URLs over 14 ODP categories ...... 53

Figure 4.2 The power law distribution of tag usage ...... 54

Figure 4.3 The power law distribution of keyword usage ...... 54

Figure 5.1 The similarity between two documents r1 and r2 is not only decided by their direct links to users, but also affected by the cluster structure of users, which is further decided by the links from users to tags...... 70

Figure 5.2 The distance between a document and the centroid of a document cluster is influenced by the cluster structure of the tag nodes...... 73

Figure 5.3 The value of quality metrics for different values of α when β=0.5 and γ=0.5...... 82

Figure 5.4 The value of quality metrics for different values of β when α=0.5 and γ=0.5...... 83

Figure 5.5 The value of quality metrics for different values of γ when α=0.5 and β=0.5...... 83

Figure 6.1 An example of the tripartite social tagging hyper-graph...... 95

Figure 6.2 MAP@10 over different setting of α ...... 108

Figure 6.3 Success@10 over different setting of α ...... 109

Figure 6.4 The value of MAP@10 over different settings of λ when word-based language model with Bayesian smoothing is incorporated into TripleQE to rank the candidate documents...... 111


Figure 6.5 The value of MAP@10 over different settings of λ when tag-based language model with Bayesian smoothing is incorporated into TripleQE to rank the candidate documents...... 111

Figure 7.1 The Topic-Perspective model for social tagging ...... 123

Figure 7.2 Graphical representation of (a) CI-LDA and (b) CorrLDA for modeling social annotations ...... 123

Figure 7.3 The perplexities over the iterations for five settings of topic number when perspective number L=80 ...... 131

Figure 7.4 The perplexities over the iterations for five settings of perspective number when topic number K=80 ...... 131

Figure 7.5 The perplexity results of CorrLDA, CI-LDA and TP-LDA (Topic-Perspective Model) for topic number K=10, 20, 40, 80, 160 ...... 133


ABSTRACT

Exploiting Social Tagging Network for Web Mining and Search

Caimei Lu
Xiaohua Hu, Ph.D., Supervisor

The rapidly growing social data created by users through Web 2.0 applications have stimulated active research in the data mining and information retrieval (IR) communities. The social tags created by users through social tagging systems are one type of such social data.

This thesis is dedicated to investigating whether and how the social tagging data can be utilized to enhance the performance of web mining and search methods.

First, in order to reveal whether social tags are effective document features which can be used to represent and index web documents, the author compares social tags with other types of index terms, including expert-created subject terms, author-provided keywords and description terms, as well as the content words of web documents. The results of these comparison studies show that social tags contain both high-quality index terms and subjective, personal terms. Besides, like author-provided keywords and description terms, social tags provide additional information beyond the content words of tagged web documents, but social tags are more effective than author-provided keywords and description terms as independent document features for web clustering.

The author further researches different approaches to improving web clustering performance by leveraging social tagging data. The author proposes a novel clustering method called Tripartite Clustering which clusters web documents, users and tags simultaneously based on the social tagging network. The author also investigates two other social tagging-based clustering approaches with K-means and Link K-means clustering methods. Experimental results show that all tag-based clustering methods can significantly improve the performance of content-based clustering. Compared to tag-based K-means and Link K-means, Tripartite Clustering achieves equivalent or better performance and produces more useful information.

The author also develops a novel personalized search framework based on a hyper-graph model of social tagging. During the search process, the proposed framework combines three types of relations from the social tagging network for query expansion and ranking: the social relation among users, the semantic relation among tags, and the tripartite relation among users, tags and web documents. Experiments demonstrate that the proposed personalized search framework is more efficient and effective than baseline search methods.

Finally, the author proposes a novel topic model to simulate the generation of social tags and accordingly discover the topical structures of documents and users’ tagging perspectives. Experimental analysis shows that the proposed Topic-Perspective model has better generative ability than topic models proposed in existing literature. Besides, this model also generates more useful information about document topics, user perspectives, and the impact of document topics and user perspectives on tag generation.

In sum, this thesis not only reveals the potential value of social tags as document features and index terms of web documents, but also demonstrates how the social tagging network can be effectively utilized for web mining and search.


CHAPTER 1: INTRODUCTION

Web 2.0 based applications and systems have experienced explosive growth in the past several years. These applications and systems have reshaped the way users manage, retrieve, share, and consume information. More importantly, they provide platforms for users to create new knowledge through online social activities such as tagging a web page, rating a book, commenting on a blog, answering a question, editing a page, etc. This collaboratively created knowledge, or social knowledge, not only contains rich semantic information about the described web objects, but also provides a window for information providers to learn a user’s information interests and preferences.

These days, the social knowledge created by online users through their social activities has attracted great interest among different research communities. In particular, in the data mining and information retrieval (IR) community, a number of interesting research studies have been conducted to investigate how to improve the performance of traditional data mining and IR tasks by leveraging the social knowledge and networks developed by online users. Inspired by the great potential of online social knowledge for data mining, in this thesis the author aims to explore the information value of a typical type of social knowledge, i.e. the social annotations or tags created by online users in social tagging systems, and to investigate its application to web mining and search.

Social tagging, also known as collaborative tagging, social classification, or social indexing, refers to the practice of users collaboratively creating and managing tags to annotate and categorize web resources. It has become one of the most popular Web 2.0 services since the first such system, Delicious, was developed in 2003. By now, various social tagging applications have been developed for users to organize and retrieve web resources. The web resources tagged by users can be of any type and in any format, such as web pages (e.g. Delicious1), videos (e.g. YouTube2), photos (e.g. Flickr3), academic papers (e.g. CiteULike4), books (e.g. LibraryThing5), music (e.g. Last.fm6), and so on. Despite focusing on different types of resources, all social tagging systems share the common purpose of helping users share, store, organize and search online resources that interest them. As more web resources get annotated every day, the annotations created by users and the network formed among users, tags and web resources have become valuable information sources for web mining and search.

1.1 The Potential of Social Tagging for Web Mining and Search

Social tagging can be exploited for web mining and search because of several of its properties. Firstly, the tags created by users provide additional semantic information about the web resources. In this sense, social tags can be viewed as a special type of metadata which can be utilized to index and classify web resources. Vander Wal coined the term “folksonomy” (a portmanteau of the words folk and taxonomy) to describe the conceptual structure generated by social tagging systems (Wal 2007). Different from traditional taxonomies, which are created by professionals based on strict standards and guidelines, a folksonomy is developed in an uncontrolled and spontaneous manner.

1 http://delicious.com/
2 http://www.youtube.com/
3 http://www.flickr.com/
4 http://www.citeulike.com/
5 http://www.librarything.com/
6 http://www.last.fm/

To a great extent, social tagging lowers the entrance threshold of metadata creation.

Therefore, social tagging provides a cheaper and more efficient alternative to the traditional metadata creation approach. The tags applied to web resources can be utilized as additional document features to enrich the document representation and thus enhance the performance of data mining and retrieval in the web environment.

Secondly, social tagging adapts quickly to emerging vocabularies and topics. As the social culture and technology evolve, new terms and concepts continue to emerge in different domains. For instance, ‘Folksonomy’ itself is one of them. These new terms and concepts as well as their related resources can be absorbed into the social tagging system in real time, which is impossible for traditional ontologies or controlled vocabularies due to their strict and complex maintenance procedures. This indicates that the social tagging system can be utilized to retrieve resources related to emerging concepts and topics, and it may also be used to track new topics in different domains along the time dimension.

Thirdly, from the point of view of information retrieval, it is easier to achieve indexer-searcher consistency in social tagging systems, which is a generally agreed upon robust indicator of retrieval effectiveness (Furner 2007). In the information retrieval domain, effective indexing and retrieval mean that indexers should be able to predict the terms that will be used by searchers to query the same resources. In social tagging systems, taggers are indexers and searchers at the same time. Therefore, the probability that indexers and searchers will agree on the subjects of a given resource and use the same combination of terms to express those subjects is higher in social tagging systems than in other indexing and metadata creation systems. This implies that better search queries can be formulated based on the social tags applied to relevant resources.

The last but not least property of social tagging that makes it valuable for web mining and search is that it provides a channel to learn users’ information needs, preferences and interests, which are essential factors in personalized search and recommendation systems. Specifically, in a social tagging system, a user’s preference and interest can be characterized by the web resources she/he has annotated and the tags she/he has applied to those web resources. Although researchers have proposed solutions to extract users’ interests and preferences from various information sources such as users’ search and browsing history, these information sources are generally unavailable to the public. For instance, users’ search, click-through and browsing history, recorded by specific search engines and information vendors, are normally unavailable to outsiders due to commercial and privacy concerns. As opposed to these information sources, social tagging data have very high availability, because a main objective of users’ social tagging activity is to share information.

1.2 The Challenges of Utilizing Social Tagging for Web Mining and Search

Despite the great potential of social tagging for web mining and search, the approaches to exploiting its value are not straightforward. Several challenges must be overcome before social tagging can be fully utilized to enhance traditional web mining and search tasks.

Firstly, the usage of non-words, polysemy, and synonymy, which usually causes semantic ambiguity and imprecision, is prevalent in social tagging. Suchanek et al. (2008) checked a sample of tags collected from Del.icio.us against two dictionaries: YAGO and WordNet. It was found that more than half of the sampled social tags are unknown to the dictionaries and that most tags known to the dictionaries have more than one meaning. A reason for this phenomenon is that, different from the controlled vocabularies of taxonomies and ontologies, a folksonomy is developed in a totally uncontrolled environment. Users can use essentially any terms to annotate a web resource in a social tagging system. Therefore, before utilizing social tagging data, it is necessary to investigate the quality and structures of social tags in general.

Besides, it has been shown that users tag web resources for purposes other than topical description. Golder and Huberman (2006) identify tags with seven different functions: (1) identifying what (or who) it is about; (2) identifying what it is; (3) identifying who owns it; (4) refining categories; (5) identifying qualities or characteristics; (6) self-reference; (7) task organizing. Sen et al. (2006) summarize the seven tag functions proposed by Golder and Huberman into three categories: Factual tags, Subjective tags, and Personal tags. Intuitively, Factual tags are more closely related to resource content and extrinsic to the taggers, while Subjective tags and Personal tags are less connected to resource topics and more influenced by users’ perspectives. Tags such as “to read”, “unread”, “funny”, “cool”, etc. are very common examples of personal and subjective tags. These subjective and personal tags may be useful for the tagger who created them, but may be valueless for other users. Comparatively, because factual tags are extrinsic to the taggers but closely related to the topics of web resources, they can be more effectively used as additional document features in general web mining and search tasks. Thus, it is desirable to distinguish factual tags from personal and subjective tags in social tagging-based applications. However, automatically classifying tags as factual versus subjective or personal is far from straightforward.

Finally, different from previously well-studied bipartite and unipartite network structures (such as the hyperlink network, author-document network, or document-word network), the social tagging network is composed of tripartite relations among users, tags and web documents. Whenever a user annotates a web document with a tag, a tripartite relation is built among the user, the document and the tag.


Figure 1.1 Illustration of the tripartite network of social tagging. u, r, and t respectively denote user, resource and tag. Each time a user annotates a resource with a tag, a tripartite relationship is built among the user, the resource and the tag. This figure contains seven tripartite relationships.

Figure 1.1 illustrates the social tagging tripartite graph. The tripartite network of a social tagging system contains valuable information for learning the topics of web resources, the semantics of tags, the interests of users, and their relations. However, modeling and mining a tripartite graph is demanding in terms of algorithm design and computational efficiency. In order to utilize social tagging data, a common approach is to project the social tagging network into bipartite or unipartite networks, and then apply existing models and algorithms for bipartite and unipartite relationships to the social tagging data. For instance, a user-document-tag tripartite relation can be projected into three undirected bipartite relations: a user-tag relation, a user-document relation and a document-tag relation. Although it is always possible to project the tripartite network of social tagging into bipartite or unipartite networks, the projection inevitably causes information loss.
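As a concrete sketch of the projection step described above, the following Python fragment (with invented users, resources, and tags) stores tagging events as (user, resource, tag) triples and derives the three bipartite relations; the comment at the end notes why the projection loses information. This is an illustrative sketch, not code from the thesis.

```python
from collections import Counter

# Each tagging event is one (user, resource, tag) triple (hypothetical data).
triples = [
    ("u1", "r1", "t1"), ("u1", "r1", "t2"), ("u1", "r2", "t2"),
    ("u2", "r2", "t3"), ("u2", "r3", "t3"), ("u2", "r3", "t4"),
]

# Project the tripartite relation into three undirected bipartite relations,
# each recorded as a co-occurrence count.
user_tag = Counter((u, t) for u, _, t in triples)
user_doc = Counter((u, r) for u, r, _ in triples)
doc_tag = Counter((r, t) for _, r, t in triples)

print(user_tag[("u2", "t3")])  # 2: u2 applied t3 in two tagging events
print(doc_tag[("r2", "t2")])   # 1: r2 was annotated with t2 once

# Information loss: from the three bipartite views alone, we can no longer
# tell which user attached which tag to which resource in a single event.
```

The three `Counter` objects are exactly the user-tag, user-document, and document-tag bipartite relations mentioned in the text; reconstructing the original triples from them is not possible in general.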

In this thesis, the author proposes different approaches to overcome these challenges in order to thoroughly investigate and utilize the value of social tagging in different web mining and search tasks.

1.3 Research Questions and Framework

Based on this understanding of the potential and challenges of utilizing social tagging data for web mining and search, the author proposes and answers four research questions in this thesis.

Question 1: Are social tags effective document features which can be used to

represent and index web documents in web mining and search applications?

Question 2: How can web clustering be enhanced using the social tagging network?

Question 3: How can the social tagging network be exploited for personalized web search?

Question 4: How can a topic model be developed for the tagged web?

1.3.1 The Effectiveness of Social Tags as Document Features

There is no direct way of examining the effectiveness of social tags as document features or index terms. In this thesis, the author attains this research goal through three indirect approaches. First, the author compares user-created social tags with professionally created index terms. Social tags have been recognized as a type of social metadata created by users through Web 2.0 applications (Smith-Yoshimura 2011). As an open and collaborative metadata creation approach, social tagging is much more efficient than the traditional indexing approach based on metadata standards and controlled vocabularies.

However, people are concerned about the quality of social tags as index terms because of their uncontrolled nature. In order to reveal the quality of social tags as index terms, an indirect but useful approach is to compare social tags with the subject terms created by information professionals in traditional information organization and retrieval environments. In this thesis (Chapter 3), the author compares social tags with the subject headings created by experts to investigate the differences and connections between these two types of metadata and thus indirectly assess the feasibility of using social tags as index terms for web documents.

Besides expert-assigned subject terms, the author also compares tags created by users with the metadata generated by the authors or creators of web documents, such as the keywords and descriptions of a web page given by its authors (Chapter 4). Before the appearance of social tagging systems, authors were seen as an appropriate alternative to experts as metadata creators, because author-generated metadata alleviates the scalability problems of expert-created metadata, and authors are supposed to be capable of describing their work with precise terms because they are most intimate with their work (Greenberg et al. 2001). However, some researchers point out that authors are not necessarily any better at generating metadata than anyone else (Mathes 2008, Thomas and Griffin 1999), because authors and the intended users of their work may have very different understandings of the work and thus disagree on the terms that should be used to describe it. In short, both expert-created metadata and author-created metadata share the same problem: the intended and unintended eventual users of the information are disconnected from the indexing process (Mathes 2008). On the contrary, in social tagging systems, taggers are indexers and information users at the same time. In this thesis, the author compares social tags with author-provided metadata in terms of their vocabulary and their effectiveness in facilitating web clustering and discovery.

Finally, the author investigates whether social tags provide additional information value beyond the content of web resources. Most existing text mining and information retrieval approaches are based on document content. For instance, most current search engines are founded on full-text indexing techniques. For social tags to be useful, they need to provide additional information value beyond document content. This means that the tags not only need to contain new information which cannot be derived from document content, but that this new information should also be valuable for identifying the documents. If social tags consisted only of terms copied from document content, or if the tags not found in document content were noisy and subjective, they would contribute no value to web mining and search. In this thesis (Chapters 4 and 5), the author compares the social tags assigned to web pages with the content terms contained in the web pages based on a large real-world dataset. The author not only checks the difference between social tags and document content in term usage, but also compares the effectiveness of social tags and content terms in web mining tasks such as web clustering.
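The term-usage comparison can be made concrete with a toy example (the tags and content words below are invented, not from the thesis dataset): compute the overlap between a page's tag set and its content vocabulary, and separate out the tags that carry information beyond the content.

```python
# Hypothetical tag set and content vocabulary for one web page.
tags = {"python", "tutorial", "toread", "cool"}
content_words = {"python", "language", "tutorial", "function", "example"}

overlap = tags & content_words          # tags also present in the content
jaccard = len(overlap) / len(tags | content_words)
novel_tags = tags - content_words       # information beyond the content words

print(sorted(overlap))     # ['python', 'tutorial']
print(round(jaccard, 3))   # 0.286
print(sorted(novel_tags))  # ['cool', 'toread']
```

Here the novel tags include a subjective tag ("cool") and a personal tag ("toread"), echoing the point that new tag information is only valuable when it helps identify the document rather than merely reflecting the tagger.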

1.3.2 Tag-based Web Clustering

Clustering is an unsupervised data mining method that has been widely used in different areas. It can automatically group thousands or even millions of documents into a list of meaningful categories based on some set of document features. In traditional clustering algorithms, such as k-means, content terms are used as the document features. In the web environment, however, the content features of web pages are sometimes missing, misleading or unrecognizable due to the lack of well-controlled authoring styles. For non-textual web documents, such as images and videos, content features are not directly available at all. It is therefore desirable to exploit other information sources to enhance clustering effectiveness. For instance, hyperlinks connecting web documents have been used as important features for text clustering in previous research (Angelova and Siersdorfer 2006, Qi and Davison 2006). In this thesis (Chapter 5), the author views the social tags assigned to documents by online users as a rich information source for web mining. Social tag based clustering is compared to content-based clustering using different methods. In particular, based on the tripartite social tagging network, the author develops a novel clustering method (Tripartite Clustering) which is independent of document content and clusters documents, users and tags simultaneously.
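The basic idea of tag-based clustering, replacing content terms with tags as the document features, can be sketched as follows. This is a toy k-means-style procedure over invented tag sets, not the Tripartite Clustering algorithm proposed in this thesis:

```python
from collections import Counter

def jaccard(a, b):
    """Similarity between two tag sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def tag_kmeans(docs, seeds, iters=10):
    """Toy k-means-style clustering of documents represented only by tag sets.
    Each centroid is kept as the set of tags used by at least half of the
    cluster's members."""
    centroids = [set(docs[i]) for i in seeds]
    labels = [0] * len(docs)
    for _ in range(iters):
        labels = [max(range(len(centroids)),
                      key=lambda k: jaccard(d, centroids[k]))
                  for d in docs]
        for k in range(len(centroids)):
            members = [docs[i] for i in range(len(docs)) if labels[i] == k]
            if members:
                counts = Counter(t for d in members for t in d)
                centroids[k] = {t for t, c in counts.items()
                                if c >= len(members) / 2}
    return labels

# Hypothetical pages represented only by their social tags.
docs = [{"python", "programming", "tutorial"},
        {"python", "code", "programming"},
        {"recipes", "cooking", "food"},
        {"food", "cooking", "baking"}]
print(tag_kmeans(docs, seeds=[0, 2]))  # [0, 0, 1, 1]
```

Note that no document content is consulted at any point, which is what makes this style of clustering applicable to images, videos and other non-textual resources.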

1.3.3 Personalized Search on Tagged Web

Without a proper information source for user interests, classical search engines have to assume that the relevance between a query and a document is decided solely by term-matching similarity. However, as pointed out by Pitkow et al. (2002), relevance is actually relative to each user. Teevan et al. (2007) likewise concluded that users differ greatly in which search results they consider relevant to the same query. A solution to this problem is to develop personalized search systems that incorporate user-specific information into the search process. In this thesis, the author explores how to develop personalized search systems by leveraging the user information learned from the social tagging network. Compared to the user information sources adopted in most existing studies of personalized search, such as users' search and browsing histories saved by specific search engines and information vendors, social tagging provides a more open and effective channel for learning users' information needs and interests. In this thesis (Chapter 6), the author proposes an efficient and effective personalized search framework (TripleQE) based on the tripartite network of social tagging. The proposed framework uses query expansion for personalization and ranking. Unlike traditional expansion methods, which only expand the query keywords, this framework also expands queries along the user dimension. More importantly, it relies on the tripartite relationships of the social tagging network for query expansion. The relative efficiency and effectiveness of this framework are demonstrated by experiments on real-world data.
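The general idea of expanding a query through tripartite (user, document, tag) co-occurrence can be illustrated roughly as follows. This is a simplified sketch over invented posts; TripleQE itself is defined in Chapter 6 and differs in detail:

```python
from collections import Counter

# Hypothetical tagging posts: (user, document, tag) triples.
posts = [
    ("alice", "d1", "python"), ("alice", "d1", "programming"),
    ("bob",   "d1", "python"), ("bob",   "d2", "scripting"),
    ("carol", "d3", "cooking"),
]

def expand_query(query_tags, posts, top_n=2):
    """Expand a query with tags that co-occur with the query tags on the
    same documents, or that were used by the same users."""
    docs = {d for u, d, t in posts if t in query_tags}
    users = {u for u, d, t in posts if t in query_tags}
    related = Counter(t for u, d, t in posts
                      if t not in query_tags and (d in docs or u in users))
    return list(query_tags) + [t for t, _ in related.most_common(top_n)]

print(expand_query({"python"}, posts))
```

Here "programming" is pulled in via a shared document (d1) and "scripting" via a shared user (bob); the same two edges of the tripartite network also support expansion along the user dimension.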

1.3.4 Topic Models of Tagged Web

In text mining and IR tasks, the choice of document features is a challenging problem. Traditional approaches usually represent documents using word features. However, the word feature set is usually very large, especially for large collections, which makes word-based applications inefficient. Dimensionality-reduction methods that enable efficient processing of large collections are therefore always desirable. A classic approach to dimensionality reduction is latent semantic analysis (LSA) (Deerwester et al. 1990), whose effectiveness has been proven in a wide range of applications. However, LSA still has limitations in its theoretical foundation and its ability to deal with polysemy. Topic models with a sound statistical foundation have thus been developed.

Topic models are probabilistic generative models which discover "latent topics" from a collection of documents. In a topic model, the representation of any document is reduced to a probability distribution over a fixed set of topics. An early topic model, Probabilistic Latent Semantic Indexing (PLSI), was proposed by Hofmann (1999). Building on PLSI, Blei et al. (2003) proposed the Latent Dirichlet Allocation (LDA) model, which has become the most commonly used topic model due to its solid theoretical foundation and promising performance. LDA assumes that the probability distribution over words in a document can be represented as a mixture of topics, where each topic is itself a probability distribution over words. LDA has been used in many applications, including clustering, classification and IR. Moreover, various extended LDA models have been proposed that exploit information other than document words for topic learning, such as extracted entities (Newman et al. 2006) and document links (Erosheva et al. 2004, Liu et al. 2009).
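The generative story that LDA assumes can be written down directly. The sketch below uses hand-picked toy parameters (two topics over a four-word vocabulary), not a model fitted to data, and it samples one document from the assumed process:

```python
import random

random.seed(0)

# Toy LDA parameters: 2 topics, each a distribution over a 4-word vocabulary.
vocab = ["python", "code", "cooking", "recipe"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],   # topic 0: programming
    [0.0, 0.0, 0.5, 0.5],   # topic 1: cooking
]

def generate_document(doc_topics, length):
    """Sample a document: for each word position, pick a topic from the
    document's topic mixture, then pick a word from that topic."""
    words = []
    for _ in range(length):
        z = random.choices(range(len(topic_word)), weights=doc_topics)[0]
        w = random.choices(vocab, weights=topic_word[z])[0]
        words.append(w)
    return words

# A document that is 80% about programming and 20% about cooking.
print(generate_document([0.8, 0.2], length=6))
```

Fitting LDA inverts this process: given only the observed words, it estimates the per-document topic mixtures and per-topic word distributions.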

In this thesis, the author investigates how to model the tagged web based on LDA (Chapter 7). Based on the nature of the social tagging process, the author develops a new topic model which models the generation of both words and tags. An important feature of this model is that, during the tag generation process, document topics and user perspectives together generate the social tags assigned to a resource, and each tag differs in the extent to which it depends on resource topics versus user perspectives. Through this feature, the proposed model can discover not only the latent topics underlying words and tags, but also the latent user perspectives. More importantly, it models tags with different functional purposes: it can estimate whether a tag is more likely a factual tag created for topic description or a subjective/personal tag used for opinion expression or self-reference. This makes the model useful for many applications.

1.4 Data Sets for Experimentation

For social tagging research, there is no standard dataset for experimentation. The common practice is to collect datasets from real-world social tagging systems. In this thesis, the author prepares three social tagging datasets collected from three systems: Delicious, LibraryThing and Bibsonomy. Delicious is a social bookmarking service for storing, sharing and discovering web bookmarks. The dataset from Delicious was crawled from its website between January 2009 and February 2009. The original dataset contains 3,246,424 posts created by 4784 users; each post is a bookmark to a webpage annotated by a user with one or more tags. This is the primary dataset used in this dissertation research. The second dataset was crawled from LibraryThing, a social cataloging web service that lets users tag, store and share book catalogs. The crawling was conducted during October 2009, and this dataset is used for the comparison study of social tags and expert-assigned subject terms. Bibsonomy is also a social bookmarking web service, but it allows users to tag both web pages and published articles. The Bibsonomy dataset was downloaded from its data dump page (http://www.kde.cs.uni-kassel.de/bibsonomy/dumps); the dump created on Jan 01, 2011 was chosen for experimentation. The Bibsonomy dataset is primarily used to study personalized search based on the social tagging network.

1.5 The Organization of the Thesis

The rest of the thesis is organized as follows. Chapter 2 reviews existing research on social tagging systems, including the usage patterns and semantic values of social tags, as well as the applications of social tagging data in document clustering, information retrieval and topic modeling. In Chapter 3, an empirical study is conducted to compare social tags with expert-created subject terms. In Chapter 4, social tags are compared with author-provided keywords and descriptions through an empirical study; in addition, both social tags and author-provided metadata are compared with content features at the vocabulary level and the application level. Chapter 5 discusses possible approaches to utilizing social tagging for web clustering and proposes a novel clustering method called Tripartite Clustering, which clusters web documents, tags and users at the same time based on the tripartite network of social tagging. Chapter 6 introduces a personalized search framework based on the tripartite social tagging network. Chapter 7 develops a new topic model, the Topic-Perspective model, to simulate the generation of the words in web documents as well as the tags assigned to those documents by users. Chapter 8 concludes the thesis.


CHAPTER 2: LITERATURE REVIEW

Social tagging has received major attention in recent literature. Existing research has focused on the usage patterns and semantics of social tags, the indexing ability of social tags compared to other indexing approaches, and the application of social tagging in text mining and information retrieval. This chapter reviews the major literature on all these topics. For the third topic in particular, the author focuses on the application of social tagging in clustering, information retrieval and web search, as well as topic modeling.

2.1 Usage Patterns and Semantics of Social Tags

Before studying the applications of social tags, it is worthwhile to first investigate their usage patterns and semantics, since such knowledge can help develop better application solutions. In this section, the author discusses the usage patterns and semantic values of social annotations identified in the existing literature.

2.1.1 Power Law Distribution

The most obvious usage pattern in a social tagging system is the power law. The frequencies of both tagged objects (e.g., URLs) and tags follow a power law distribution (Wu et al. 2006, Li et al. 2008). That is, a very small portion of web objects are tagged very frequently, while most objects are bookmarked by only a few users; likewise, a relatively small number of tags are heavily used while most tags are applied to only a few objects. Previous research showed that this small portion of frequently used tags usually comprises tags with generic meanings (Xu et al. 2006, Golder and Huberman 2006), which indicates that it is possible to filter out personal tags based on tag frequency. Golder and Huberman (2006) also found that when both general tags and personal tags are used, "users have a strong bias toward using general tags first". Besides the popularity of tagged objects and tags, the activity of users also follows a power law distribution (Li et al. 2008, Golder and Huberman 2006); however, Heymann et al. found that the top 10% of users account for only 56% of posts on Del.icio.us. Based on the tagging history of the Delicious site, Halpin et al. (2007) examined why and how the power law distribution of tag usage frequency forms over time in a mature social tagging system.
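A quick way to check for power-law behavior in tag frequencies is to examine the rank-frequency relationship on a log-log scale, where a power law appears as a straight line whose slope estimates the exponent. The sketch below uses synthetic Zipf-like counts (frequency proportional to 1/rank), not real Delicious data:

```python
import math

# Synthetic tag frequencies following an ideal Zipf distribution (f ~ 1/rank).
freqs = [round(10000 / rank) for rank in range(1, 101)]

# Fit a line to (log rank, log frequency) by least squares; the slope
# estimates the power-law exponent.
xs = [math.log(r) for r in range(1, 101)]
ys = [math.log(f) for f in freqs]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
print(round(slope, 2))  # close to -1 for Zipf-distributed counts
```

Real tag-frequency data typically shows this linear log-log trend over several orders of magnitude, with deviations at the head and tail.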

2.1.2 Stable Pattern in Social Tagging

In a social tagging system, tagging is conducted in an uncontrolled environment and users are free to choose their own vocabularies to describe objects, so one might expect a chaotic tag collection. However, based on a dataset collected from Delicious, Golder and Huberman (2006) found that as a URL's bookmarks accumulated, the proportion of each tag applied to the URL followed a stable pattern; in particular, after about 100 bookmarks, the proportion of each tag was nearly fixed. This pattern indicates that after a webpage has been bookmarked by a certain number of users, a consensus forms about the terms used to describe it. Interestingly, Suchanek et al. (2008) also found that the meaningfulness of a URL's most popular tags increased significantly after the URL had been tagged more than 100 times. These findings imply the potential value of popular social annotations as a reliable information source for document representation. Golder and Huberman attributed the stable pattern in tag proportions to two factors: imitation and shared knowledge.
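The kind of stabilization Golder and Huberman observed can be measured by tracking a tag's proportion as bookmarks accumulate. The sketch below does this for an invented four-bookmark stream of a single URL (real stabilization only emerges over hundreds of bookmarks):

```python
from collections import Counter

def tag_proportions(bookmark_stream):
    """After each bookmark, record the current top tag and its proportion
    of all tag assignments seen so far."""
    counts = Counter()
    total = 0
    history = []
    for tags in bookmark_stream:
        counts.update(tags)
        total += len(tags)
        top_tag, top_count = counts.most_common(1)[0]
        history.append((top_tag, round(top_count / total, 2)))
    return history

# Hypothetical bookmarks of one URL.
stream = [["python"], ["python", "webdev"], ["python"], ["python", "code"]]
print(tag_proportions(stream))
```

On real data, plotting such a history per tag shows the proportions fluctuating early on and then flattening out, the consensus effect described above.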

2.1.3 Tag Categories

In order to better understand the semantic structure of folksonomy, some researchers have analyzed the different types of tags used in social tagging systems. For instance, Golder and Huberman (2006) classified tags into seven categories based on their functional purposes (see Table 2.1). Sen et al. (2006) summarized the seven categories proposed by Golder and Huberman into three classes: factual tags, subjective tags, and personal tags. Xu et al. (2006) proposed another taxonomy containing five categories of tags. Based on Golder and Huberman's schema, Bischoff et al. (2008) specified eight categories of tags, as shown in Table 2.1. Moreover, Bischoff et al. studied the distribution of tag types across three different social tagging systems: Delicious, Flickr and Last.fm. The results showed that the distributions of tag types vary among social tagging systems. For instance, "Topic" is the most frequent tag type on Del.icio.us, while "Type" is most important on Last.fm. On Flickr, "Topic" and "Location" are the most popular tag types.

Table 2.1 Mapping of four different classification schemas of social tags

Bischoff et al. (2008)   Golder et al. (2006)            Xu et al. (2006)   Sen et al. (2006)
Topic                    What or who it is about         Content-based      Factual
Time                     Refining categories             Context-based      Factual
Location                                                 Context-based      Factual
Type                     What it is                      Attribute          Factual
Author/Owner             Who owns it                     Attribute          Factual
Opinions/Qualities       Qualities and characteristics   Subjective         Subjective
Usage context            Task organization               Organizational     Personal
Self-reference           Self-reference                  Organizational     Personal

2.1.4 Meaningfulness of Social Tags

Since social tags are assigned by ordinary web users rather than trained professionals, one may wonder how accurately social tags describe the annotated objects. It has been assumed that non-words, polysemy and synonymy are prevalent in social tags. To verify this assumption, Suchanek et al. (2008) checked the meanings of a sample of tags collected from Delicious against two lexical resources, YAGO and WordNet. It turned out that more than half of the social tags were unknown to these resources, and most tags known to them had more than one meaning. However, the authors also found that the most popular tags of a webpage are usually semantically meaningful in the sense that they are registered in these resources, and that a quarter of the social tags were used as DMOZ category labels. In order to check whether social tags are meaningful and relevant descriptors of the annotated objects, Heymann et al. (2008) had a group of graduate students evaluate a sample of posts collected from del.icio.us; the results showed that most tags are relevant and objective. Based on a probabilistic generative model, Zhang et al. (2006) identified the conceptual structures of folksonomy and showed that a hierarchical relationship can be derived from the identified concepts.

2.2 Social Tags as Index Terms

2.2.1 Social Tags and Professionally Created Metadata

A common approach to evaluating the indexing ability of social tags is to compare them with metadata created by professionals, and several studies have done so. In an early study on social tagging, Mathes (2004) first discussed the limitations and strengths of folksonomy as user-generated metadata vis-à-vis professional and author-created metadata. Bischoff et al. (2008) analyzed the overlap between social tags collected from Last.fm and music reviews written by experts for the same set of music tracks; 73.01% of the track tags could be found inside the review pages. They also compared track tags from Last.fm to the expert-created reviews from www.allmusic.com and found that 46.4% of the tags are present in the Allmusic review pages. Trant (2010) conducted a research project (steve.museum) to study the relationship of folksonomy to professionally created museum documentation, finding that 86% of user-created tags were not present in museum documentation, which indicates that tagging provides a significantly different vocabulary. Trant also compared tags with two controlled vocabulary sources adopted in museum cataloging, AAT (Art and Architecture Thesaurus) and ULAN (Union List of Artists Names), and found that many tags could not be found in either AAT or ULAN. Yi and Chan (2009) compared social annotations collected from Delicious with Library of Congress Subject Headings (LCSH) based on word-matching, and examined the distribution of tags over the LCSH hierarchy tree. The experimental results showed that about two-thirds of social tags could be matched to LC subject headings. The study also identified three distribution patterns of tags over the LCSH tree: skewedness, multifacet and Zipfian-pattern.

Smith (2007) conducted an exploratory study of how well user-created tags match the authoritative subject headings assigned to the same resources by comparing social tags in LibraryThing with LCSH. Based on five book samples, the author discussed the advantages and disadvantages of LibraryThing social tags and LC subject headings, as well as the efficacy and accuracy of social tags as an instrument for subject analysis. Rolla (2009) analyzed the library-assigned LCSH and the LibraryThing tags for 45 books dealing with different subjects and geographic regions, and found that compared to the subject headings assigned by libraries, user-created tags can cover more aspects of a book's subject, even though personal tags may not be effective for information retrieval. Heymann and Garcia-Molina (2009) also contrasted tags created by users on LibraryThing with the LC subject headings included in LC bibliographic records, based on a large-scale dataset in which both the LibraryThing tags and the LCSH annotate the same collection of books. Their study showed that LibraryThing tags and LCSH share many terms, but that the usage of these terms as tags and as LC subject headings differs. In this thesis, the author also compares LibraryThing tags with expert-assigned LCSH terms using the same data source, but the analysis is done from different perspectives. Heymann and Garcia-Molina analyzed the entire set of tags and LC subject headings applied to the book collection without considering their usage at the book level; in this thesis, the author compares tags with LC subject headings at both the collection level and the book level, and additionally compares tags with LCSH subdivisions and book titles.

2.2.2 Social Tags, Full Text, and Query

In order to investigate whether social tags provide an additional information source for web search, or whether social tags are effective index terms, Heymann et al. (2008) analyzed the intersection of social tags with the titles and textual content of web resources. Based on a sample of the Delicious dataset, they found that social tags occurred in the content of half of the pages and in the titles of 16% of the pages they annotated. Bischoff et al. (2008) performed a similar analysis on Delicious tags and found that 44.85% of the tags appear in the web page text. A precondition for social tags to be useful for web search is that they should overlap substantially with the query terms used for retrieving the tagged objects. Heymann et al. (2008) found a significant intersection between queries and tags by analyzing the overlap between popular query terms from an AOL query dataset and del.icio.us tags. Bischoff et al. (2008) counted the percentage of queries containing tags on three systems: del.icio.us, Flickr and Last.fm. They found that 71.22% of del.icio.us queries contain at least one tag and 30.61% consist entirely of tags; for Flickr the percentages are 64.54% and 12.66%, and for Last.fm 58.43% and 6%, respectively. Rather than measuring overlap, Suchanek et al. (2008) calculated the similarity between the frequency distributions of tags, page contents and queries using two statistics: cosine similarity and NDCG (normalized discounted cumulative gain). The cosine similarity and NDCG were found to be 16.76% and 12.64% for tags and content, and 21.61% and 25.07% for tags and queries. In this thesis, the author also compares social tags with the titles and content of tagged web documents, but studies not only the overlap between the two vocabularies but also their effectiveness as document features for clustering.
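The cosine similarity between two term-frequency distributions, the first of the two statistics above, can be computed as in the following sketch (the counts are invented toy data, not figures from Suchanek et al.):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse frequency distributions."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical frequency counts for one page's tags and its content terms.
tag_freq = Counter({"python": 3, "webdev": 2, "toread": 1})
content_freq = Counter({"python": 5, "tutorial": 4, "code": 2})

print(round(cosine(tag_freq, content_freq), 2))
```

Because the measure operates on whole frequency distributions rather than set overlap, it is sensitive to how often shared terms are used, not just whether they are shared.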

2.3 Tag-based Web Clustering

2.3.1 Clustering on Extended Document Features

Clustering algorithms have been well studied and applied in various areas (Jain et al. 1999, Leung et al. 2000, Andrews and Fox 2007), and a variety of clustering approaches have been developed to discover classes and identify topics in document collections. Traditional document clustering algorithms mostly rely on the content features of documents. However, as mentioned above, content is not always reliable, or even available, especially for web documents. Moreover, most content-based clustering algorithms, like k-means, are based on the BOW (bag of words) approach, which ignores the semantic relationships among words. To overcome this limitation, information beyond document content has been used to enrich document representation during clustering. For instance, some researchers developed heuristic similarity metrics that linearly combine link information with content information for clustering (He et al. 2001, Modha and Spangler 2000, Zhou et al. 2007). Modha and Spangler (2000) proposed an algorithm called TORIC k-means that clustered hypertext documents using words, in-links and out-links. Similarly, Zhou et al. (2007) represented each document as the combination of three vectors: an in-link vector, an out-link vector and a text vector; each cluster was annotated using six information nuggets: summary, breakthrough, review, keywords, citation, and reference. He et al. (2001) measured document similarity based on three types of information: hyperlink structure, textual information and co-citation patterns. Instead of using link information to improve similarity metrics, Angelova and Siersdorfer (2006) developed a relaxation-labeling-based algorithm which clusters a document using its content and the cluster labels of its linked documents. A similar algorithm was proposed by Zhang et al. (2007).

Besides links, document representation has also been enriched with background knowledge represented by ontologies. For instance, WordNet (Hotho et al. 2001) and MeSH (Yoo et al. 2006) have been used as external ontologies for text enrichment, although both have limited coverage. Recently, Wikipedia has been used to enhance document clustering in studies where the Wikipedia concepts and categories mapped to a document serve as additional document features (Banerjee et al. 2007, Hu et al. 2008, Hu et al. 2009). In this thesis, the author explores how to enhance document clustering with social tags.

2.3.2 Clustering on Social Tagging

The application of social tags in clustering has been explored in several studies. Begelman et al. (2006) built a tag graph based on the co-occurrence of tags in annotated resources, adopted a spectral bisection method to cluster the graph, and used the identified tag clusters to find semantically related tags. Li et al. (2008) used association rule algorithms to identify frequent tag co-occurrence patterns, which were viewed as topics of user interest; users and URLs were then clustered under different topics based on their relation to the tags in each topic. Although this approach can cluster both users and URLs at the same time, it has some limitations. First, the number of clusters depends on the support and size of the frequent tag co-occurrence patterns. Second, active users and popular URLs can belong to many clusters, while new or inactive users and emerging or unpopular URLs may not be assigned to any cluster. Giannakidou et al. (2008) proposed a co-clustering approach which utilizes not only tag co-occurrence but also the social aspect of tagging for clustering tagged documents. A more comprehensive study of social tagging-based clustering was conducted by Ramage et al. (2009), who incorporated social tags into two clustering methods, k-means and a generative clustering method based on LDA, and evaluated the clustering results against the ODP (Open Directory Project). Although their work proved the value of tags as an additional information source for clustering, the user dimension was missing from their clustering models. Nanopoulos et al. (2009) proposed an extended spectral clustering method which clusters tagged documents based on both the users who tagged them and the tags they used; a tensor factorization technique was employed to implement spectral clustering on the tripartite relation among documents, users and tags.

2.4 Personalized Search on Tagged Web

2.4.1 Personalized Search

In order to realize search personalization, researchers have attempted to learn user interests and preferences from a variety of sources, including web browsing and searching history (Agichtein et al. 2006, Qiu and Cho 2006, Sun et al. 2005, Tan et al. 2006, Teevan et al. 2009), users' desktop documents and personal emails (Chirita et al. 2007, Teevan et al. 2005), and user-specified interests (Chirita et al. 2005, Ma et al. 2007). Some of these methods have been successfully applied in industry; for instance, major commercial search engines have implemented personalized search based on users' search and click history. However, information sources such as the search, click-through and browsing histories stored by specific search engines and information vendors are normally unavailable to outsiders for commercial and privacy reasons. Other information sources such as emails and desktop documents are also unavailable to the public, because they are private and may contain sensitive information. By comparison, social annotations and the structured relationships among users, annotations and resources provide an open and effective information channel for learning users' information needs and interests.

2.4.2 Personalized Search based on Social Tagging

Many researchers have been interested in whether and how social annotations can be exploited for information retrieval. The potential value of social tagging for web search has been empirically studied by Bischoff et al. (2008), Heymann et al. (2008), and Suchanek et al. (2008). Xu et al. (2007) developed a language model for information retrieval based on the metadata property of social tags and their relationships to the tagged documents; however, user information is not used in their model. Wu et al. (2006) proposed a probabilistic generative model in which tags, objects and users were mapped to a common conceptual space represented by vectors; the generated conceptual vectors were used to develop various search models, including personalized search. Zhou et al. (2008) proposed an extended language model for information retrieval based on the estimated parameters of LDA (Latent Dirichlet Allocation), incorporating the topical background of documents and social tags as well as users' domain interests. Harvey et al. (2011) also proposed a personalized search method based on the parameters estimated from two LDA models developed on social tagging data.

Some researchers have implemented personalized search by re-ranking search results based on user profiles extracted from social tagging data. Hotho et al. (2006) proposed a PageRank-like method called FolkRank for personalized search based on the social tagging graph. Bao et al. (2007) introduced two ranking methods: SocialSimRank, which ranks pages based on the semantic similarity between tags and pages, and SocialPageRank, which re-ranks returned pages based on their popularity. Xu et al. (2008) proposed a method which re-ranks documents based on the topical similarity between the returned web pages and the user who issued the query, where both the pages' topical vectors and the users' interest vectors were learned from the social tags associated with them. Schenkel et al. (2008) developed a top-k algorithm which re-ranks search results based on the tags shared by the user who issued the query and the users who annotated the returned documents with the query tags. Noll and Meinel (2007) also proposed a re-ranking approach based on users' tagging profiles. Cai and Li (2010) developed a method to build user profiles and resource profiles from tags, which can better model users' interests and preferences; the resulting profiles are used for personalized search.

Social annotations have also been combined with other social data, such as user blogs, to construct user profiles (Bender et al. 2008, Carmel et al. 2009, Wang and Jin 2010). Bender et al. (2008) re-ranked search results based on user topic vectors derived from users' social tagging and blogging activities. Carmel et al. (2009) and Wang and Jin (2010) likewise combined information derived from social annotations and other social systems for personalized ranking, but focused more on the social relations built through users' social activities.

2.5 Topic Models of Tagged Web

2.5.1 Topic Analysis using Generative Models

In the data mining and information retrieval communities, a set of effective probabilistic models have been proposed to simulate the generation of a document, such as the Naïve Bayesian model, the Probabilistic Latent Semantic Indexing (PLSI) model (Hofmann 1999) and the Latent Dirichlet Allocation (LDA) model (Blei et al. 2003). The LDA model in particular has become popular in the text mining community due to its solid theoretical foundation and promising performance; since it was first proposed, a wide variety of extensions have been developed in different contexts for different modeling purposes.

Many extended LDA models explore information other than document words for topic learning. For instance, the author-topic model proposed by Rosen-Zvi et al. (2004) used authorship information together with words to learn topics. The correspondence LDA model learned topics simultaneously from images and caption words (Blei and Jordan 2003, Chen et al. 2009). The SwitchLDA model revealed topics from content words and entities in news articles (Newman et al. 2006), and the Link-LDA and Topic-Link LDA models represented topics and author communities using both content words and links between documents (Erosheva et al. 2004, Liu et al. 2009). Most of these models can also be applied in the social annotation context by treating tags as the additional information source for topic learning.

2.5.2 Topic Models of Tagged Web

A variety of LDA-based generative models have been proposed for modeling the generation of social tags. For clustering purposes, Ramage et al. (2009) proposed an LDA model which jointly models the generation of content words and tags. This model is essentially the same as the Conditionally-Independent LDA (CI-LDA) model used for generating words and entities by Newman et al. (2006) and the Link-LDA model used for generating words and document links by Erosheva et al. (2004). In the CI-LDA model, tags are generated from the same source as words, namely the topics of the document; users' impact on the generation of tags is not considered.

Zhou et al. (2008) proposed a more comprehensive model for social annotation which considers the impact of both document topics and user interests on tag generation. In this model, a user first decides to annotate a web document and then selects a topic, based on which a tag is generated to describe the document. Although this model provides a comprehensive view of the generation process of both content words and tags, it is unfortunately intractable because too many parameters must be estimated. Zhou et al. therefore simplified the model by assuming that words and tags are both generated from the same topics shared by documents and users. This assumption is problematic, because document words are created by the authors of the document, while tags are collaboratively generated by users with different backgrounds and interests.

Kashoob et al. (2009) proposed a generative model called community-based categorical annotation (CCA) model. Different from other models, in this model, the annotation process was modeled as a collective decision of user communities, which were viewed as groups forming around interests, expertise, language, etc. It was assumed that each community has a number of underlying categories as its world view, and each category could be represented as a mixture of tags. Therefore, in CCA model, a tag in a document was generated from a category which was further generated from a community selected for the document. The outputs of this model included the community distribution of a resource, category distribution of a community, and tag distribution of a category.

The authors compared the category distributions generated by the CCA model with the hidden topics in content words generated separately through the standard LDA model. They concluded that tag-based categories were not the same as content-based topics. A defect of this model is that it ignored the dependence of tags on resource topics: according to the model, tags are generated independently of resource topics, which is apparently not the case in the real tag generation process. Moreover, the model only considered the collective impact of communities on social annotation, whereas in some cases, such as personalized search, we are more interested in information about individual users.

Bundschus et al. (2009) adapted the correspondence LDA (Corr-LDA) model proposed by Blei and Jordan (2003) for modeling social tagging data. The Corr-LDA model first generated word topics for a document; the topics associated with the words in the document were then used to generate tags. Compared to the CI-LDA model, the Corr-LDA model can force a greater degree of correspondence between two information sources (in this case, words and tags). But as in CI-LDA, user information is missing from the tag generation process. In order to incorporate user factors into the tag generation process, another model called the User-Topic-Tag model was proposed (Bundschus et al. 2009). In this model, users were treated like authors in the author-topic model proposed by Rosen-Zvi et al. (2004). First, for each word in the document, a user was chosen uniformly at random from the group of users who annotated the document. Then, a topic was chosen from a distribution over topics specific to that user, and the word was generated from the chosen topic. Finally, as in the Corr-LDA model, the topics associated with the words in the document were used to generate each tag associated with the document. Although this model accounted for user factors, it did not correctly simulate the real social annotation process because users were modeled as creators of content words instead of tags.
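The Corr-LDA-style tag step described above can be sketched as follows: each tag's topic is drawn uniformly from the topics already assigned to the document's words, which is what enforces the word-tag correspondence. All distributions and names here are illustrative placeholders, not the published implementation.

```python
import random

def generate_tags_corrlda(word_topics, topic_tag, n_tags, rng=random.Random(1)):
    """Corr-LDA-style tag generation sketch: each tag's topic is drawn
    uniformly from the topic assignments of the document's words."""
    tags = []
    for _ in range(n_tags):
        z = rng.choice(word_topics)      # topic of a randomly picked word
        r, acc = rng.random(), 0.0
        for tag, p in topic_tag[z].items():  # sample a tag from that topic
            acc += p
            if r < acc:
                break
        tags.append(tag)
    return tags

word_topics = ["history", "history", "war"]   # topic assignments of three words
topic_tag = {"history": {"19th century": 0.5, "biography": 0.5},
             "war": {"wwii": 1.0}}
print(generate_tags_corrlda(word_topics, topic_tag, 4))
```

Because tags can only come from topics that some word already carries, a topic absent from the document's words can never produce a tag, unlike in CI-LDA.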


CHAPTER 3: A COMPARISON BETWEEN SOCIAL TAGS AND EXPERT-ASSIGNED SUBJECT TERMS

3.1 Introduction

Social tagging can be viewed as “a process by which many users add metadata in the form of keywords to shared content” (Golder and Huberman 2006). In other words, social tagging provides an open and collaborative approach to metadata creation. Nevertheless, user-created tags have been criticized for imprecision and semantic ambiguity caused by the prevalence of non-words, polysemy and synonymy, as well as a lack of hierarchy (Mathes 2004, Golder and Huberman 2006, Suchanek 2008). Macgregor and McCulloch (2006) point out that these quality problems with social annotations originate from the absence of the properties that characterize controlled vocabularies.

Traditionally, metadata is created by information professionals such as cataloguers based on controlled vocabularies. For instance, when applying a subject term to a resource, a cataloguer first summarizes the subject content of the resource and then matches it with the best subject representation captured in a controlled vocabulary such as the Library of Congress Subject Headings (LCSH).

Expert-created metadata is considered to be of high quality. It plays an indispensable role in libraries and other information institutions, and it remains the foundation of digital libraries and museums. Therefore, it is appropriate to use expert-created metadata as the benchmark against which to check the quality of social tags and their effectiveness as index terms.

Despite its claimed high quality, expert-created metadata nevertheless has several limitations. First, it is costly to produce and difficult to scale. Second, the language used by information professionals, such as subject headings, is often highly technical and specialized. Users who want to access resources by searching the catalogue by subject are expected to be knowledgeable about the specialized subject headings. This stringent requirement renders many professionally indexed resources inaccessible to regular users (Trant, 2010). On the other hand, the great success and popularity of modern search engines rely to a great extent on the fact that users can obtain desired results by entering free-form keywords. Moreover, an early study of the vocabulary problem in human–system communication showed that ‘no single access word, however well chosen, can be expected to cover more than a small proportion of users’ attempts’ (Furnas et al. 1987). Based on several experiments, Furnas et al. (1987) showed that there were no rules, guidelines or procedures for choosing a good name, in the sense of being “accessible to the unfamiliar user”. They further pointed out that ‘the only hope for untutored vocabulary driven access is to provide many, many alternate entry terms’. Based on these conclusions, it seems that social tagging provides an opportunity for information institutions to expand the access points of their resources beyond professionally created index terms and thus improve the accessibility of their resources.

In a social tagging system, the entrance threshold for metadata creation is lowered, and a great number of resources can be tagged within a short time. Besides, unlike the professionals who create metadata, taggers annotate resources from multiple individual perspectives. Resource organization is not the only motivation for users’ tagging behavior.

As pointed out by Zollers (2007), taggers are also motivated by social incentives. Zollers (2007) identified six incentives for social tagging: future retrieval, contribution and sharing, attracting attention, play and competition, self-presentation, and opinion expression. Users are often motivated by a number of these incentives simultaneously during the tagging process. To some extent, user-created tags provide a window into users’ interests, behaviors and attitudes that might help information institutions better understand and serve users.

Despite the quality issues of social annotation caused by its uncontrolled nature, many library researchers have recognized the unique value of social annotation and propose the co-existence of social tags with controlled vocabularies like LCSH. Macgregor and McCulloch (2006) stated that “librarians and information professionals have lessons to learn from the interactive and social aspects exemplified by collaborative tagging systems as well as their success in engaging users with information management”. Rolla (2009) pointed out that tags can be used to enhance subject access to library collections.

In this chapter, the author conducts a study to investigate the differences and connections between social tags and expert-created subject terms, which are an integral part of library metadata. This study is designed to shed light on questions such as:

 Whether social tags are comprised of a vocabulary similar to expert-assigned subject terms?
 What are the limitations of social tagging compared to the professional cataloging approach?
 In what aspects can social tags and expert-created subject headings complement each other?

The rest of the chapter is organized as follows: Section 3.2 introduces the datasets used for analysis. Section 3.3 describes the data analysis process and the results obtained. Section 3.4 summarizes the conclusions and gives a brief overview of future work.

3.2 Dataset

The dataset was collected using the same approach as in (Heymann and Garcia-Molina 2009). First, a dump of Library of Congress bibliographic records in MARC format was downloaded from the Archive7. These MARC records are encoded in XML format. We randomly selected 114,598 records for experimentation. Each bibliographic record comprises many MARC fields. Figure 3.1 shows an example of a bibliographic record (some data fields are omitted). In our study, for each record the author only extracted the content of Field 020 (ISBN), Field 245 (Title) and Field 650 (Topical Heading). Records without these fields were filtered out. Field 650 is not the only subject access field which can occur in a MARC record; there are many other fields (6XX) which can hold subject information. However, in the dataset, most MARC records only use Field 650 to represent the subject information, as shown in Figure 3.1.

The subject terms included in Field 650 are assigned by expert cataloguers based on a generally accepted subject heading system such as the LCSH, Medical Subject Headings or Canadian Subject Headings. In our dataset, the subject entries of Field 650 in all bibliographic records conform to LCSH. LCSH has subject headings and subdivisions. When encoded in MARC, the LCSH subject headings and subdivisions are structured into several subfields of Field 650. The terms contained in these subfields together form a full subject heading. For instance, the book record shown in Figure 3.1 has three 650 fields representing three subject headings, which are composed of the terms contained in their subfields. In this study, the author treats the terms contained in each subfield as separate keywords. Tags are compared with these separate subject terms.

7 http://www.archive.org/

Figure 3.1 An example of a bibliographic MARC record (some data fields omitted). The record contains a leader (01049dam a22002898a 4500), control numbers (95030125), an ISBN (0156767902), an author entry (Josephson, Matthew, 1899-1978), a title (The robber barons : the great American capitalists, 1861-1901 / by Matthew Josephson), and three 650 subject headings: Capitalists and financiers—United States; Railroads—United States—History; Industries—United States—History.

The ISBNs of the selected records were used to crawl tags assigned by users to the corresponding books on the social tagging website LibraryThing. The crawling process was conducted during October 2009. In total, 39,576 books were found to be annotated on LibraryThing. Many books were annotated with only a few tags. In order to reduce the sparseness of the dataset, the author filtered out books with fewer than 20 unique tags. The tags applied to the books with at least 20 unique tags cover more than 80% of the tags applied to all the books. The resulting dataset contains 8562 book records used for experiment and analysis.

In total, the 8562 book records were annotated with 176,105 unique tags by users and 7628 unique Library of Congress subject terms by experts. We can see that the size of the tag vocabulary is much larger than the LCSH vocabulary. This is because tags are created by a large number of users with different backgrounds in a totally uncontrolled environment. Different users may use different terms to describe the same subject. LCSH terms are chosen by information professionals from a controlled vocabulary, which is a carefully selected list of words and phrases. In addition, the number of subject headings assigned to each book is limited whereas the book can be annotated any number of times and with any number of different tags in the social tagging system. In most cases, fewer than five subject headings are assigned to a work. Therefore, the number of unique terms included in the subject headings assigned to each book is always much smaller than the number of tags assigned to the books.

3.3 Analysis and results

For analysis, social tags and LCSH terms are compared from different aspects. Generally, the comparison is conducted at two levels: the collection level and the book level. At the collection level, the author compares the social tags and the subject terms present in the whole dataset without considering their application to specific books. At the book level, social tags are compared to subject terms applied to the same books. In the word-comparison process, the author only considers the syntactic form of the terms. In other words, a social tag and an LCSH term are considered equivalent only if they are syntactically identical. For instance, the tag ‘historical fiction’ and the LCSH term ‘Historical fiction’ are considered equivalent, but the tag ‘novel’ and the LCSH term ‘fiction’ are viewed as two different terms even though they share the same core semantic meaning.

3.3.1 Overlapping vocabulary

The author first compares the entire set of unique tags and LCSH terms applied to the entire collection of books. It is found that 3824 unique terms have been adopted both as tags by users and in LCSHs by the experts. These overlapping terms comprise only a small portion (about 2.2%) of the entire tag vocabulary. That is, about 97.8% of tags cannot be found in the LCSHs for the same collection of books. However, the overlapping terms compose about half (50.1%) of the terms applied in LCSHs. This indicates that, given an LCSH term, there is a 50% probability that it has also been adopted by users as a social tag.
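Because matching is purely syntactic, the collection-level comparison reduces to set operations over the two vocabularies. The sketch below illustrates the computation on toy data; the function name and the toy vocabularies are hypothetical, while the percentages in the comments refer to the figures reported above for the full dataset.

```python
def vocabulary_overlap(tags, lcsh_terms):
    """Collection-level comparison of two vocabularies (syntactic matching only)."""
    tags, lcsh_terms = set(tags), set(lcsh_terms)
    shared = tags & lcsh_terms
    return {
        "shared": len(shared),
        "share_of_tags": len(shared) / len(tags),        # ~2.2% on the full dataset
        "share_of_lcsh": len(shared) / len(lcsh_terms),  # ~50.1% on the full dataset
    }

# Toy vocabularies; the thesis dataset has 176,105 unique tags and 7628 LCSH terms.
stats = vocabulary_overlap({"fiction", "tbr", "own", "history"},
                           {"fiction", "history", "juvenile literature"})
print(stats["shared"], stats["share_of_lcsh"])  # 2 shared terms, 2/3 of the LCSH set
```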

The author checked the frequency, or popularity, of the overlapping terms in tags and LCSHs. The frequency of a term is measured by the number of books annotated with it; the more popular a term, the more books it has been applied to. Table 3.1 lists the average frequency of the shared terms and the average frequency of the entire set of social tags and LCSH terms. From the table we can see that the average frequency of the overlapping terms is higher than the average frequency of the whole vocabulary, especially for tags. The average frequency of overlapping terms as tags is 33.5, but the average frequency of the whole set of tags is 5.3. This means that the tags which can be found in LCSHs are on average used more often by users.

Table 3.1 Average frequency of shared terms and the whole set of tags and LCSH terms

            Overlapping Terms   All Terms
Social Tag  33.5                5.3
LCSH        7.0                 4.6

The author further checked the popularity rank of the overlapping terms in social tags and LCSHs. Specifically, the author wants to know whether a shared term is equally popular when used as a tag and as a subject term. The author ordered the overlapping terms according to their frequency in the tag vocabulary and in LCSHs respectively, and computed the rank correlation of these two rank sequences. Spearman’s rank correlation coefficient was adopted, which assesses how well the relation between two rankings X and Y can be described using a monotonic function. Its value lies in the interval [–1, 1]. If Y tends to increase when X increases, the Spearman correlation coefficient is positive; if Y tends to decrease when X increases, it is negative. A greater absolute value indicates a stronger correlation, and a Spearman correlation of zero indicates that the two rankings are completely independent. In our case, the Spearman correlation coefficient of the two rankings (the overlapping terms ranked by their frequency in tags and in LC subject headings) is 0.507. This indicates a positive but only moderate correlation between the two rankings: a term frequently used by users as a tag tends to be used relatively frequently in LCSHs by experts, but the correspondence is far from perfect.

3.3.2 Frequent tags and LC subject headings
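The rank-correlation computation can be reproduced with a small amount of code: rank both frequency lists (averaging ranks for ties) and take the Pearson correlation of the rank vectors. This is a self-contained sketch with made-up frequencies, not the thesis data; in practice a library routine such as `scipy.stats.spearmanr` would typically be used.

```python
def rank(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Frequencies of five hypothetical overlapping terms, as tags and in LCSHs:
tag_freq  = [120, 40, 35, 10, 3]
lcsh_freq = [15, 30, 5, 8, 2]
print(round(spearman(tag_freq, lcsh_freq), 3))  # → 0.8
```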

To further examine whether tags share the professional vocabulary of subject experts or represent the different vernacular vocabulary of users, in this section the author compares the top n frequent tags with the top n frequent LCSH terms. The Jaccard similarity coefficient (also called the Jaccard index) is used to measure the similarity between the frequent sets of tags and LCSH terms and is calculated according to Equation 3.1:

J(T, L) = |T ∩ L| / |T ∪ L|    (3.1)

In Equation 3.1, n is the number of top frequent terms, T is the set of top n frequent tags and L is the set of top n frequent LCSH terms. Figure 3.2 shows the Jaccard index when n varies from 100 to 1000. We can see that the Jaccard index always stays between 0.08 and 0.1, indicating a very low overlap between the top frequent tags and top frequent LCSH terms. This means that the frequent terms used by users and experts are very different. Even though some terms are very popular among users, they are not frequently used by experts, and vice versa.

Figure 3.2 Jaccard index for top n frequent tags and LCSH terms
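The curve behind Figure 3.2 can be computed directly from Equation 3.1: take the n most frequent terms from each side and intersect the sets. The sketch below uses tiny made-up frequency tables purely for illustration; the function names are not from the thesis.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard index of two sets, as in Equation 3.1."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def top_n(freq, n):
    """The n most frequent terms from a {term: frequency} mapping."""
    return {term for term, _ in Counter(freq).most_common(n)}

# Toy frequency tables; the real study varies n from 100 to 1000.
tag_freq  = {"fiction": 90, "tbr": 80, "own": 70, "history": 5}
lcsh_freq = {"fiction": 40, "juvenile literature": 30, "history": 25, "biography": 20}
for n in (2, 3):
    print(n, round(jaccard(top_n(tag_freq, n), top_n(lcsh_freq, n)), 3))
```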


Table 3.2 Top 30 frequent tags and LC subject headings

Tags: non-fiction, read, fiction, unread, tbr, own, history, to read, paperback, 2009, women, hardcover, novel, library, 2008, 20th century, reference, read in 2009, owned, English, read in 2008, first edition, biography, American, religion, mystery, friendship, family, politics, signed

LCSHs: fiction, united states, biography, history, 20th century, juvenile fiction, large type books, literature, juvenile literature, England, schools, great Britain, Christianity, 19th century, women household employees, history and criticism, religious aspects, English language, social aspects, political activity, "world war 1939-1945", magic, philosophy, police, African Americans, psychological aspects, new york (state), adventure and adventurers, family life

Table 3.2 lists the top 30 popular tags and Library of Congress subject terms. There are only three terms, ‘fiction’, ‘history’ and ‘20th century’, which are used as both tags and Library of Congress subject terms. The LCSH terms contain many phrases, such as ‘juvenile literature’ and ‘history and criticism’, while tags usually consist of one word. Moreover, all LCSH terms are used to describe the subjects of the works, whereas many tags are used for purposes other than subject description. Some tags are used for self-reference, such as ‘read’, ‘unread’, ‘to read’, ‘read in 2009’ and ‘owned’. Some tags are used to describe physical features, such as ‘hardcover’ and ‘paperback’. Also, there are tags about edition information, such as ‘first edition’. Tags such as those for self-reference may be useful for the users who created them but do not fit the traditional cataloguing and classification systems. Previous studies of Delicious also found that while a number of tags are subject related and ‘good enough’ for index terms, many others are not directly related to the subject (Golder and Huberman 2006, Kipp and Campbell 2006). In one study, Kipp and Campbell (2006) found that over 16% of Delicious tags were non-subject terms. In another study, Kipp (2007) examined the use of two categories of non-subject tags: affective tags and time- and task-related tags. She suggests that the use of non-subject tags shows that users wish to attach personal resource management information to documents and view classification as a holistic process closely tied to themselves and their work.

3.3.3 Tags and LC subject headings annotating the same book

In the above analysis, tags and LCSH terms are compared at the collection level. In this section, the author analyzes tags and LCSH terms at the book level. Tags applied to an individual book are compared to the LCSH terms assigned to the same book. Overall, there are 7276 (85%) book records whose LCSHs contain at least one term which is also used by the users to tag the same books. This means that in most cases users and experts agree on at least one term which may be used to describe the subject of a book.

Table 3.3 Number of book records with different percentages of LCSHs present in tags

Percentage of LCSHs overlapped by tags   Number of books   Percentage of books
≥80%                                     3                 0.04%
≥70%                                     29                0.34%
≥60%                                     102               1.19%
≥50%                                     407               4.75%
≥40%                                     987               11.53%
≥30%                                     2088              24.39%
≥20%                                     4329              50.56%
≥10%                                     7030              82.11%
<10%                                     1532              17.89%
Total                                    8562              100%

The author calculated the number of books with different percentages of LCSH terms covered by user tags (see Table 3.3). As shown in Table 3.3, for more than 80% of the books, at least 10% of each book's LCSH terms can be found in its tags; for more than half of the book records, at least 20% of each book's LCSH terms are covered by its tags. For 407 books, more than half of each book's LCSH terms are also used as its tags.
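The per-book coverage figures tabulated above amount to, for each book, the fraction of its LCSH terms that also occur among its tags, followed by counting books per threshold. A minimal sketch with two hypothetical book records:

```python
def lcsh_coverage(lcsh_terms, tags):
    """Fraction of a book's LCSH terms that also occur among its tags."""
    lcsh_terms, tags = set(lcsh_terms), set(tags)
    return len(lcsh_terms & tags) / len(lcsh_terms) if lcsh_terms else 0.0

# Two toy book records: (LCSH terms, tags).
books = {
    "b1": ({"fiction", "19th century", "england"}, {"fiction", "novel", "england"}),
    "b2": ({"history", "france"}, {"tbr", "own"}),
}
# Count books whose coverage meets a threshold, as tabulated in Table 3.3:
n_at_least_half = sum(lcsh_coverage(l, t) >= 0.5 for l, t in books.values())
print(n_at_least_half)  # → 1 (only b1, which covers 2/3 of its LCSH terms)
```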

Table 3.4 Example terms assigned by users and experts to the same books and different books

Terms used by users and experts to      Terms used by users and experts to
annotate the same books                 annotate different books

fiction, biography, history,            adult, office, women’s studies,
united states, England, women,          theory, mystery fiction, occult,
20th century, Christianity,             paper, domestic fiction, drugs,
19th century, juvenile fiction,         young adults, translations, plays,
friendship, English language,           diet, paganism, Romans, texts,
France, magic, philosophy,              government policy, computer,
presidents, police, murder, dogs        methods

The author calculated the number of unique terms which have been used by both users and experts to describe the same book at least once. In total, there are 3083 such terms. When users and experts use a term to annotate the same book, it indicates that they have some common understanding about the semantics and usage of the term. The more often a term is applied by users and experts to the same books, the more likely it is that users and experts agree on the meaning and usage of the term. The left column of Table 3.4 lists some terms which are often applied by users and experts to the same books in our dataset. However, users and experts may also apply the same terms to different books. As shown in Section 3.3.1, 3824 terms are used both as tags by users and as subject terms by experts. However, only 3083 of them are used by taggers and experts to annotate the same books. This means that 741 (3824–3083) terms are used in different ways by users and experts. An example is ‘mystery fiction’, which appears in both the tag vocabulary and LCSHs. However, users assign this term to many books (such as ‘Jack and Jill’, ‘Cat and Mouse: a Novel’ and ‘River of Darkness’) which are not annotated by experts with this term. On the other hand, the only two books (‘The 8th Confession’ and ‘The Problem Child’) whose LCSHs contain this term are not tagged with it by users. Other example terms like ‘mystery fiction’ are listed in the right-hand column of Table 3.4.

3.3.4 Tags and LCSH subdivisions

Each 650 data field comprises several subfields. Besides the main topical subject, in MARC coding, field 650 also uses subfields to encode LCSH subdivisions. The LCSH subdivisions are used to narrow the scope of a heading and bring out specific aspects of a subject. In the above analysis, social tags are compared with the terms contained in all 650 subfields. In this section, the author compares social tags with the terms contained in each 650 subfield separately. In our dataset, the 650 field generally contains the following subfields:

$a - topical or geographic name entry element

$x - topical subdivision

$y - chronological subdivision

$z - geographic subdivision

$v - form subdivision

The first subfield ($a) designates a topical subject or a geographic name as an entry element. The other four subfields are subdivisions: form subdivision ($v) designates a specific kind or genre of material; topical subdivision ($x) usually represents a particular subtopic and narrows down the broader subject; chronological subdivision ($y) represents a period of time; and geographic subdivision ($z) specifies geographic information.
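Extracting the subfield terms from the MARCXML records can be done with the standard library's XML parser. The fragment below is a minimal, hypothetical record constructed for illustration (the real dump contains full records); the standard MARC 21 XML namespace is used.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical MARCXML fragment with one 650 field:
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Railroads</subfield>
    <subfield code="z">United States</subfield>
    <subfield code="x">History</subfield>
  </datafield>
</record>"""

NS = {"m": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(RECORD)
terms = []
for field in root.findall("m:datafield[@tag='650']", NS):
    for sub in field.findall("m:subfield", NS):
        # Each subfield term is treated as a separate keyword, as in this study.
        terms.append((sub.get("code"), sub.text))
print(terms)  # → [('a', 'Railroads'), ('z', 'United States'), ('x', 'History')]
```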

Table 3.5 Statistics of LCSH subfield terms

LCSH subfields                          $a     $v    $x    $y    $z
Total number of terms                   6615   123   634   66    449
Number of terms used as tags            2639   42    236   12    281
Percentage of terms used as tags (%)    39.9   34.1  37.2  18.2  62.6

Table 3.6 Top 10 LCSH subfield terms also used as tags

$a                $v                     $x            $y             $z
women             fiction                history       20th century   united states
friendship        biography              christianity  19th century   england
fantasy           juvenile fiction       philosophy    18th century   france
English language  drama                  grammar       21st century   new york
magic             patterns               rhetoric      16th century   great britain
presidents        poetry                 poetry        17th century   germany
police            humor                  usage         2008           italy
murder            interviews             drama         1933-1945      india
dogs              juvenile literature    English       15th century   europe
orphans           early works to 1800    judaism       1800           california

The author compared social tags with the terms contained in each of these five LCSH subfields to investigate whether users are more likely to annotate books with terms belonging to a certain subfield. The first row of Table 3.5 lists the total number of unique terms present in each LCSH subfield in our dataset. The aggregated number of terms contained in these subfields is larger than the total number of unique subject terms (7628) because some terms appear in more than one subfield. For instance, the term ‘fiction’ is found in both the topical subdivision ($x) and the form subdivision ($v). The second and third rows of Table 3.5 respectively show the number and percentage of LCSH subfield terms which are also applied by the users to annotate the same books. We can see that the chronological subdivision ($y) has the lowest percentage of terms used as social tags, whereas the geographic subdivision ($z) has the highest percentage of terms adopted by users as tags. More than half of the terms appearing in the geographic subdivision are adopted by users as social tags to annotate the same books.

This indicates that users are most likely to agree with experts on terms designating geographical information. The low percentage of chronological terms present in tags can be caused by either of two reasons: users do not tag books from the chronological aspect, or users and experts use different chronological terms to annotate the books. Whatever the reason, since LCSH contains many chronological terms which are not covered by tags, it can be used to complement the tag-based system when users retrieve books according to time period. Table 3.6 lists the top 10 subfield terms which are also applied by users to annotate the same books.

For each subfield, the author calculated the number of catalogue records which have at least one term present in social tags. The results are shown in Table 3.7. For instance, among the catalogue records containing subfield $a (topical term or geographic name entry element), 62.7% have at least one term in $a adopted by users as a tag to annotate the book. Except for $y (chronological subdivision), all subfields at least partly overlap with social tags in more than half of the book records. The percentage of book records with the chronological subfield covered by tags is still the lowest, under 50%. This further indicates that users are less likely to annotate books with chronological terms or to agree with the experts on terms representing chronological information.

Table 3.7 Number of book records which have at least one tag appearing in LCSH subfields

LCSH subfields                                     $a            $v            $x            $y           $z
Number of records with this field                  8562          4052          3325          678          2590
Number (%) of records with at least one
tag appearing in this field                        5371 (62.7)   3156 (77.9)   1870 (56.2)   315 (46.5)   1527 (59.0)

3.3.5 Tags, LC subject headings and book titles

Book title is a major access point in all catalogues. For users who have a specific title in mind, the easiest and most effective way to find the book is to search by title. But in most cases, users with an interest in a topic do not know the specific titles of the books on that topic, so topic-based search is essential. Because book titles already consist of the terms which the authors think most appropriately reflect the subjects of their books, to enhance subject access, social tags and LC subject headings should contain terms beyond those in titles. In this section, the author compares tags and LCSH terms with the titles of the annotated books to see whether users and experts are able to provide terms beyond those already included in the titles.

The author calculated the tags and LCSH terms which occur in the titles of the books annotated by them. Results indicate that 4119 (2.3%) unique tags and 812 (10.6%) unique LCSH terms can be found in the titles of the books they annotate. The author also calculated the number of books which have at least one tag or LCSH term appearing in their titles. It turns out that 5149 (60.1%) books have at least one tag corresponding to a term in their titles, and 1529 (17.9%) books have at least one LCSH term present in their titles. Clearly, although only a tiny portion of tags appear in book titles, more than half of the books have at least one tag coming from the title. In contrast, although the percentage of LCSH terms overlapping with titles is higher, only a minority of books have LCSH terms appearing in their titles. This is because subject experts intentionally avoid repeating the title in subject terms. Users are more likely than experts to annotate books with terms from the titles. However, users also use a much greater variety of terms beyond titles to annotate the books. Therefore, overall, user-created tags and expert-generated LCSH can both complement title-based search.

Users enrich the access points by annotating books with a multitude of different terms whereas experts enhance the subject access by annotating books with subject terms not covered by book titles.

3.4 Conclusion

In this chapter, the author investigated how social tags are different from the subject terms created by professional cataloguers. The study sheds light on the value, feasibility and obstacles of using social tags as index terms. The results of this study also provide some tangible evidence for those who are concerned with the feasibility and effectiveness of social tagging as an indexing approach. The conclusions drawn from the results are summarized as follows.

Users and experts share some common terms for annotating books. About half of the LCSH terms are covered by social tags, and in most cases users and experts agree on at least one term. However, tags mostly comprise a vocabulary different from that of professionals. Only a tiny portion (2.2%) of the tag vocabulary overlaps with LCSH terms. Even overlapping terms may be used by users and experts in different ways: a term frequently used by users may be used by experts only a few times, and vice versa. Additionally, the same term can be used by users and experts to annotate different books. Moreover, analysis of popular tags and subject terms shows that the top frequent terms used by users and experts are very different. These findings show that social taggers might help enhance subject access to library collections by describing library resources with terms different from those used by experts, although further study may be needed to determine whether and to what extent the tags not found in subject terms are useful.

An identified obstacle to social tagging becoming a formal indexing approach is the prevalence of tags used for purposes other than topical description. Subjective and personal tags, such as ‘to read’, ‘read in 2009’ and ‘unread’, are among the top frequent tags. These tags show that users view classification as a holistic process closely tied to themselves and their work (Kipp 2007). They may be useful for the users who created them, but they do not fit into the traditional cataloguing and classification systems.

LCSHs bring together the topical, geographical, chronological and genre aspects of a work, with each aspect represented in a subfield. In order to ascertain how these subfields are represented in social tags, the author compared social tags with the terms included in each subfield. The results show that the chronological subdivision of LCSH is least covered by social tags. This indicates that users mostly do not annotate books from a chronological aspect, or that they tend to use chronological terms different from those in LCSH to represent a period of time. If users consistently use different terms to describe the chronological information of books, then social tags and expert-created subject terms can complement each other to expand the chronological access points of book collections.

Book titles include terms which the authors consider most representative of the subject matter of the work. According to the data analysis of title terms used as tags and LCSH, users are more likely than experts to annotate books with terms from the titles. In our dataset, more than half of the books have at least one tag present in the title. However, users also annotate books with terms other than those contained in book titles. In that sense, both user-created tags and expert-generated subject headings can greatly complement title-based search.

A major limitation of the study is that the comparison analysis is conducted only at the syntactic level. Terms with different syntactic forms but similar semantic meanings are prevalent, and taking the semantics of the tags into account may lead to different findings. As future work, the author can conduct a comparison of tags and LCSH at the semantic level, to ascertain whether users and experts apply terms with more or less the same meaning to different books and to identify the semantic differences between the tags and LCSH applied to the same book. In addition, it will be useful to investigate whether user tags overlap with the terms used in search queries, especially those tags absent from expert-assigned subject terms.

3.5 Research Question Tested

Question 1: Are social tags effective document features which can be used to represent and index web documents in web mining and search applications?

In this chapter, the author used expert-created subject headings as the benchmark to investigate the effectiveness of social tags as index terms.

CHAPTER 4: A COMPARISON OF SOCIAL TAGS, AUTHOR-PROVIDED METADATA AND CONTENT WORDS

4.1 Introduction

Before social tagging, author-created metadata were seen as an appropriate alternative to professionally generated metadata for web documents (Greenberg et al. 2001, Mathes 2004). Authors can add metadata to their work in many forms. Common examples of author-created metadata include keywords, subject descriptors, categories, descriptions, and abstracts of web documents. Compared to professionally created metadata, author-generated metadata is highly scalable and less costly. Besides, authors are most intimate with their work and are thus supposed to be able to create effective metadata to describe it (Greenberg et al. 2001). Moreover, authors of web documents are motivated metadata creators because they want their work to be discovered and referenced by others on the web. If they understand that metadata would increase the availability and visibility of their work, authors would be willing to provide relevant metadata for it.

Despite the merits of authors as metadata creators, some researchers point out that authors are not necessarily any better at generating metadata than anyone else (Mathes 2004, Thomas and Griffin 1999). Like social tags, author-generated metadata has also been criticized for poor quality. Some researchers considered authors to be lacking in expert skills and incapable of producing high-quality metadata (Weinheimer 1999). Besides, Mathes (2004) pointed out that “author created metadata may help with the scalability problems in comparison to professional metadata, but both approaches share a basic problem: the intended and unintended eventual users of the information are disconnected from the process”. In other words, although authors are most familiar with their work, they may understand it differently from the users who use it, and the metadata or index terms they provide may not be the same terms people use to search for it. In social tagging, this problem seems to be solved, because taggers are at the same time indexers and searchers. Therefore, we could expect that the tags applied to a web document are very likely to be used as search queries to retrieve the same document.

To further examine the quality of social tags as index terms and find out whether users are better metadata creators than authors, in this chapter the author compares the social tags assigned to web pages with the keyword and description metadata provided by the web page creators. Both are also compared with the content of the web documents, because most current web mining and search applications are based on full-text indexing, and it is useful to see whether social tags and author-provided metadata can be utilized to improve content-based applications. The three sources of index terms (social tags, author-provided metadata and web content) are compared both at the vocabulary level, using a word-match method, and at the application level, with clustering methods.

The comparison is also conducted to examine the metadata quality of social tags and author-provided keywords and descriptions. Guy et al. (2004) state that “high quality metadata supports the functional requirements of the system it is designed to support, which can be summarized as quality is about fitness for purpose”. In this chapter, the author focuses on two major functions of metadata defined by the National Information Standards Organization (2004): (1) providing additional information about the resource it describes; and (2) facilitating the discovery of relevant information by bringing similar resources together and distinguishing dissimilar resources. Specifically, in this chapter, high-quality metadata refers to metadata which provides additional information about the topic of web pages and groups topically similar web pages into clusters.

In the study, real-world data were collected for the comparison analysis. The social tags were collected from the Delicious social bookmarking system. The author-provided metadata consist of the attributes (“keywords” and “description”) embedded in the meta elements of the HTML or XHTML documents of the web pages annotated by Delicious tags. Social tags and author-provided keywords and description terms are compared in different aspects.

The rest of the chapter is organized as follows. Section 4.2 introduces the details of the dataset. Section 4.3 investigates whether user tags and author-provided keywords and descriptions provide extra information beyond web content, based on word-match analysis.

Section 4.4 assesses the effectiveness of user-created tags and author-generated metadata for web clustering. The K-means clustering method is used to explore whether clustering performance can be enhanced by incorporating tags or author-provided keywords and descriptions into the clustering process. The clustering results are evaluated against a user-maintained web directory (Open Directory Project—ODP) based on several quality metrics.

4.2 Dataset

The Delicious dataset introduced in Section 1.4 is used to collect the social tags for analysis. The tagged URLs are used to extract the web content and author-provided metadata. Specifically, for the tagged URLs, the author crawled the values of the “keywords” and “description” attributes provided within the meta tag in the head section of their HTML or XHTML documents. The meta elements are used by web page authors to provide structured metadata about their pages, and their two most common attributes are “description” and “keywords”. For the same web pages, the author also extracted their titles and textual content for comparison.

In the comparison analysis, a clustering method is adopted to examine the value of social tags as well as of the author-provided keywords and descriptions. However, there is no gold standard for clustering evaluation. To solve this problem, following the approach adopted in (Ramage et al. 2009), the author used the web categories of the Open Directory Project (ODP)8 as the ground truth for web clusters. ODP is a human-edited hierarchical web directory containing 17 top-level categories. In the study, only 14 of the top-level categories were selected as the clustering standard (see Figure 4.1). By overlapping the URLs from the Delicious dataset with the URLs listed under the 14 categories of ODP, 45,462 URLs tagged in 208,437 Delicious bookmarks were obtained.

It was found that 23,188 of the 45,462 URLs contained both “keywords” and “description” attributes. Figure 4.1 shows the distribution of the 23,188 URLs among the 14 ODP categories. Note that a URL may belong to more than one category.

8 http://www.dmoz.org/

[Figure: bar chart of the number of URLs per category; counts range from 399 to 6,731]
Figure 4.1 Distribution of the 23,188 URLs over 14 ODP categories

In the dataset, the 23,188 URLs are associated with 19,934 different tag terms and 90,827 different keyword terms. The number of different keyword terms is thus about five times larger than the number of different tags. This is because the keywords were generated individually by the authors of each URL, while the tags were collaboratively created by users. To reveal the usage patterns of the keywords and tags applied to the URLs, the author analyzed their frequency distributions. Figure 4.2 and Figure 4.3 respectively show the distributions of tags and keywords by frequency. Since both axes of the figures are on a log scale, power-law distributions are clearly visible. The power-law distribution indicates that a relatively small number of tags are heavily used while most tags are applied to only a few objects. Similarly, a very small portion of keyword terms are frequently used by authors to describe the web pages they create, while most keyword terms are used to describe only a limited number of web pages.
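As a rough illustration of this frequency analysis, the following Python sketch (with made-up annotation pairs standing in for the Delicious data) counts how often each tag is used and how many distinct tags fall at each frequency level; plotting the second counter on log-log axes would reveal the power-law shape described above.

```python
from collections import Counter

# Hypothetical (url, tag) annotation pairs; in the study these come
# from the Delicious bookmarks.
annotations = [
    ("u1", "design"), ("u1", "web"), ("u2", "design"),
    ("u2", "css"), ("u3", "design"), ("u3", "web"), ("u4", "python"),
]

# Frequency of each tag = number of annotations it appears in.
tag_freq = Counter(tag for _, tag in annotations)

# Distribution over frequencies: how many distinct tags were used
# exactly f times. On log-log axes this shows the heavy tail.
freq_dist = Counter(tag_freq.values())

print(tag_freq.most_common())     # a few tags dominate
print(sorted(freq_dist.items()))  # most tags have low frequency
```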


[Figure: log-log plot of the number of tags vs. tag frequency]
Figure 4.2 The power law distribution of tag usage

[Figure: log-log plot of the number of keywords vs. keyword usage frequency]
Figure 4.3 The power law distribution of keyword usage

4.3 Vocabulary Analysis

In this section, the author explores whether social tags and author-provided keywords and descriptions add new information to existing web page content by vocabulary analysis.

Presently, most web search engines rely primarily on page content and link structure for indexing, searching and ranking. For social tags and author-provided metadata to improve the performance of web search or other web mining tasks, they should contain useful information which cannot be extracted from the page content directly. To investigate whether social tags and author-provided keywords contain such additional information value, we examine their occurrence in the titles and content of the annotated web pages based on word match.

• Tag vs. Title. Among the 19,934 different tags assigned to the 23,188 URLs, 4,854 (24%) appear in the titles of the annotated pages. 12,482 (54%) URLs are tagged with at least one term which can be found in their titles; 2,024 URLs have half or more of their tags occurring in their titles.

• Keywords vs. Title. Among the 90,827 different keyword terms, 14,965 (16%) can be found in the titles of the web pages they describe. In addition, 19,857 (85.6%) URLs have at least one keyword term present in the page title; 3,344 (14%) URLs have at least half of their keyword terms present in their titles.

• Tag vs. Content. Among all tags, 8,430 (42%) can be found in the textual content of the pages they annotate. About 80% (18,568) of the annotated pages have at least one tag present in their page text.

• Keywords vs. Content. 42,438 (47%) of the keyword terms are present in the textual content of the pages they describe. In addition, for 22,178 (96%) of the URLs, at least one keyword term is present in the page text.
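Statistics of this kind can be computed along the following lines. This is a minimal Python sketch with toy data standing in for the crawled pages, not the actual analysis code used in the study.

```python
# tags_of / title_terms_of map each URL to its set of lowercased terms
# (hypothetical toy data).
tags_of = {
    "url1": {"python", "tutorial", "fun"},
    "url2": {"design", "css"},
}
title_terms_of = {
    "url1": {"a", "python", "guide"},
    "url2": {"homepage"},
}

# Vocabulary-level overlap: distinct tags found in the title of at
# least one page they annotate.
all_tags = set().union(*tags_of.values())
tags_in_titles = {
    t for url, tags in tags_of.items() for t in tags & title_terms_of[url]
}

# URL-level overlap: pages with at least one tag present in their title.
urls_with_match = [
    url for url in tags_of if tags_of[url] & title_terms_of[url]
]

print(len(tags_in_titles) / len(all_tags))  # vocabulary overlap ratio
print(len(urls_with_match) / len(tags_of))  # URL overlap ratio
```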

Based on this analysis, we can see that less than half of the tags or keywords are present in the title or content of the pages. This indicates that both tags and keywords carry additional information value beyond the title and content of the pages. However, tags are more likely than keywords to be found in web page titles. This may be because some authors deliberately avoid choosing terms already used in page titles as keywords.

However, for about 85% of the URLs, the keywords provided by authors contain at least one term present in the title, whereas only about half of the URLs are annotated by at least one tag present in the page title. Authors are also more likely than users to adopt terms from the page content to annotate the pages: almost all (96%) of the URLs are described by at least one keyword term from the page content.

The analysis above reveals some differences in the usage patterns of tags and author-provided keywords. To further examine whether users and authors tend to use different terms to describe the same page, the rest of this section examines the overlap between user tags and author-provided keywords and descriptions for the same pages.

• Tag vs. Keywords. 6,447 terms (32% of tags, 7% of keywords) are applied both as tags by users and as keywords by authors to describe the same page. 16,113 (70%) URLs have at least one tag that is also adopted by the authors as a keyword describing the page.

• Tag vs. Description. 5,634 (28%) tags appear in the descriptions provided by the authors for the same pages. 15,161 (65%) URLs are annotated by at least one tag which also occurs in the author descriptions of the pages.

We can see that only a small portion of terms were applied by both users and authors for annotating and describing the same web pages. However, tags overlap more with keywords than with descriptions, even though descriptions generally contain more terms.

This may be because tags are by nature more like keywords: they are the key terms chosen by users and authors to summarize the topics of web pages. In particular, a greater overlap exists between the more popular tags and keywords. Table 4.1 lists the 30 most frequent tags and keyword terms used for the URLs in our dataset.

In Table 4.1, the terms are ordered by their frequency of usage. The bold terms are those used both by users as tags and by authors as keywords. We can see that more than half (17) of the 30 terms are shared. In particular, the eight most frequent keyword terms are all also used as social tags. This indicates that a consensus exists to some extent between users and authors on the terms for describing web resources. However, unlike keywords, which only include topical terms, tags also contain some subjective and personal terms such as “fun”.

Table 4.1 30 most frequent tags and keyword terms

Tags Keywords reference, software, tools, design, web, free, software, web, online, design, news, music, shopping, free, programming, blog, art, art, management, business, video, games, home, music, news, resources, education, internet, books, education, digital, search, webdesign, technology, online, business, development, photography, computer, information, development, research, science, web2.0, world, training, travel, school, windows, reviews, internet, tutorial, games, computer, research, science inspiration, how-to, fun, search

4.4 Clustering Analysis

An important function of metadata is to facilitate information organization (e.g., web clustering). As shown in the preceding section, a large portion of the tags and keyword terms are not present in the titles and content text of the pages. Although this indicates that both tags and author-provided keywords contain additional information beyond the page content, it is still unknown whether that additional information is useful for web organization and searching. This section aims to answer this question by examining whether social tags and author-provided metadata can enhance the performance of web clustering. The popular K-means clustering method is adopted for this analysis.

4.4.1 Clustering Standard and Quality Metrics

Because there is no ground truth for web clusters, the 14 ODP top-level categories mentioned earlier are adopted as the gold standard for evaluating the clustering results.

Cluster quality is evaluated with three metrics: F-score (Larson and Aone 1999), purity (Zhao and Karypis 2001), and normalized mutual information (NMI) (Strehl and Ghosh 2002). The F-score combines precision and recall:

$$F\text{-score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

To compute purity, each cluster $C_k$ is assigned to the category $L_j$ that is most frequent in the cluster, and the accuracy of the overall cluster assignment is then measured by dividing the total number of correctly assigned documents by $N$ (the number of clustered items):

$$\mathrm{Purity} = \frac{1}{N} \sum_{k} \max_{j} \left| C_k \cap L_j \right|$$

NMI is an increasingly popular measure of clustering quality. It is defined as the mutual information $I$ between the cluster assignments $X$ and a pre-existing class labeling $Y$ (given by $L$), normalized by the arithmetic mean of the maximum possible entropies of the empirical marginals, i.e.,

$$\mathrm{NMI}(X, Y) = \frac{I(X; Y)}{(\log k + \log c)/2}$$

where $k$ is the number of clusters and $c$ is the number of classes. A merit of NMI is that it does not necessarily increase when the number of clusters increases. All three metrics range from 0 to 1, and a higher value indicates better clustering quality.
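The purity and NMI metrics defined above can be implemented as in the following Python sketch (the actual tooling used in the experiments is not specified in the text; natural logarithms are used consistently in numerator and denominator).

```python
import math
from collections import Counter

def purity(clusters, labels):
    """clusters, labels: parallel lists giving each item's cluster id
    and true class. Each cluster is credited with its majority class."""
    n = len(labels)
    total = 0
    for c in set(clusters):
        members = [labels[i] for i in range(n) if clusters[i] == c]
        total += Counter(members).most_common(1)[0][1]
    return total / n

def nmi(clusters, labels):
    """Mutual information normalized by (log k + log c) / 2, as in the
    definition above."""
    n = len(labels)
    pc, pl = Counter(clusters), Counter(labels)
    joint = Counter(zip(clusters, labels))
    mi = 0.0
    for (c, l), nij in joint.items():
        mi += (nij / n) * math.log(nij * n / (pc[c] * pl[l]))
    n_clusters, n_classes = len(pc), len(pl)
    return mi / ((math.log(n_clusters) + math.log(n_classes)) / 2)

clusters = [0, 0, 1, 1]
labels = ["a", "a", "b", "b"]
print(purity(clusters, labels))  # 1.0 for a perfect clustering
print(nmi(clusters, labels))     # 1.0 for a perfect clustering
```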

4.4.2 Clustering Method and Results

In this study, the K-means clustering method is adopted to compare the effectiveness of tag-based, keyword-based, description-based and content-based clustering. K-means is a simple but efficient and highly scalable clustering method: it iteratively calculates the cluster centroids and reassigns each document to the closest cluster until no document can be reassigned. Traditional K-means models documents as word vectors. In this study, web documents are modeled not only as vectors of their content words but also as vectors of tags, keywords and description words. Specifically, K-means clustering was run on the following vector space models:

• Word vector: a web document is represented only by the vector of its content words. K-means based on this vector model is used as the baseline clustering method.

• Tag vector: a web document is represented only by the social tags applied to it.

• Keyword vector: a web document is represented only by the keyword terms used by the author to describe it.

• Description vector: a web document is represented only by the description words generated by the author.

• (Word+Tag) vector: the tags applied to a web document are viewed as additional words of the document and combined with its original words to form one vector.

• (Word+Keyword) vector: the keywords used to describe a web document are viewed as additional words of the document and combined with its original words to form one vector.

• (Word+Description) vector: the words contained in the descriptions of a web document are viewed as additional words of the document and combined with its original words to form one vector.

• Word vector + Tag vector: each document is represented with two independent vectors, a word vector and a tag vector. During clustering, the distance from a document to a cluster centroid is calculated as the linear combination of the distance based on the word vector and the distance based on the tag vector.

• Word vector + Keyword vector: each document is represented with two independent vectors, a word vector and a keyword vector. The distance from a document to a cluster centroid is the linear combination of the distances based on the two vectors.

• Word vector + Description vector: each document is represented with two independent vectors, a word vector and a description vector. The distance from a document to a cluster centroid is the linear combination of the distances based on the two vectors.
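For the two-vector models, the document-to-centroid distance might be computed as in this sketch. The cosine measure and the mixing weight `alpha` are illustrative assumptions; the chapter does not specify them here.

```python
import math

def cosine_distance(v1, v2):
    """1 - cosine similarity over sparse {term: weight} dict vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    n1 = math.sqrt(sum(w * w for w in v1.values()))
    n2 = math.sqrt(sum(w * w for w in v2.values()))
    if n1 == 0 or n2 == 0:
        return 1.0
    return 1.0 - dot / (n1 * n2)

def combined_distance(doc, centroid, alpha=0.5):
    """Linear combination of word-vector and tag-vector distances, as in
    the 'word vector + tag vector' model. alpha is a hypothetical
    mixing weight."""
    return (alpha * cosine_distance(doc["words"], centroid["words"])
            + (1 - alpha) * cosine_distance(doc["tags"], centroid["tags"]))

doc = {"words": {"python": 2.0, "code": 1.0}, "tags": {"programming": 1.0}}
cen = {"words": {"python": 1.0}, "tags": {"programming": 1.0, "software": 1.0}}
print(combined_distance(doc, cen))  # small: both channels overlap
```

During each K-means iteration, every document would be assigned to the centroid minimizing this combined distance, with the word and tag centroids updated separately.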

Another issue in K-means clustering is how to weight the features. In this experiment, the tf·idf weighting scheme was adopted for all vector models. The tf·idf value of a term is determined by both its term frequency and its document frequency in the entire collection. The number of clusters was set to 14, equal to the number of ODP categories. Because the clustering algorithm relies on random initialization, each clustering process was run 10 times, and the mean of each quality metric across the 10 runs was used as the final score. For each run, the number of iterations was set to 20. Clustering based on the word vector was used as the baseline. All words were lemmatized; stop words and rare words with a document frequency of less than 5 were filtered out.
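The tf·idf weighting described above can be sketched as follows. The exact tf and idf variants used in the experiment are not specified, so this sketch uses the common raw-tf × log(N/df) form.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf*idf} dict per
    document. A minimal sketch of the weighting scheme, assuming
    raw term frequency and idf = log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

docs = [["web", "design", "web"], ["web", "code"], ["design", "art"]]
vecs = tfidf_vectors(docs)
print(vecs[0])  # "web" appears in 2 of 3 docs, so its idf is log(3/2)
```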

Table 4.2 lists the clustering results of K-means based on the different vector space models. In Table 4.2, ** indicates that the improvement over the baseline is significant according to a paired-sample T-test at the level of p<0.01, and * indicates significance at the level of p<0.05. The same symbols are used in the following table.

From Table 4.2, we can see that, when clustering on a single vector, the tag vector generates the best results. Compared to the baseline (clustering on the word vector), clustering on either the tag vector or the keyword vector alone improves the clustering performance significantly across all quality metrics. However, when clustering on the description vector alone, only the F-score is significantly improved, and the NMI even decreases. This indicates that tags and keywords are more effective than descriptions or content words for distinguishing the topics of web pages.

Table 4.2 The clustering quality of K-means on different vector space models

Vector Space Model                  F-Score    Purity     NMI
word vector (baseline)              0.162      0.390      0.165
tag vector                          0.308**    0.444**    0.203**
keyword vector                      0.303**    0.428**    0.191**
description vector                  0.253**    0.398      0.160
(word+tag) vector                   0.194      0.398      0.175
(word+keyword) vector               0.197      0.414**    0.194**
(word+description) vector           0.225**    0.406      0.182*
word vector + tag vector            0.307**    0.469**    0.249**
word vector + keyword vector        0.275**    0.461**    0.232**
word vector + description vector    0.226**    0.441**    0.210**

Including tags, keywords or descriptions as additional content words also improves the clustering performance, but the improvement is not significant; treating tags as additional words is the least effective of these options.

Nevertheless, clustering on word vector + tag vector generates the best performance among all clustering models. It not only significantly improves the clustering quality over the baseline but also outperforms clustering on word vector + keyword vector and on word vector + description vector.

Overall, the clustering results suggest that tags are most effective in enhancing web clustering when used as an independent information source. Among the author-provided metadata, keywords are similar in nature to tags, but they are less effective than tags as an independent information channel for web clustering. This may be because keywords and page content are both created by the page authors. Descriptions generate better clustering results than keywords and tags only when used as additional content of the web page, which suggests that descriptions are qualitatively the most similar to page content. In sum, the results show that social tags are a more reliable information source for identifying related web pages than either page content or author-provided keywords and descriptions.

4.5 Conclusion

In this chapter, the author investigates two approaches to metadata creation in the web environment: user-created tags and author-provided metadata. The user-created tags are the social tags applied by users to annotate web pages. The author-provided metadata are the keywords and descriptions provided by authors in the meta element placed in the head part of a page’s HTML document.

To gain insight into whether user-created tags and author-generated keywords contain additional information beyond the page content, the author examined their overlap with page titles and text. The results show that both tags and keywords add new information to the existing page content: more than 50% of the tags and keywords are not present in the titles and content of the pages. Compared to users, authors are more likely to use terms from the page content to annotate the pages. The author also compared tags with keywords and descriptions; the data analysis shows that users and authors agree on only a small portion of the terms used to describe the web pages.

To evaluate whether social tags or author-provided keywords and descriptions are effective index terms or metadata which can be used to enhance the clustering and discovery of web resources, the author analyzed the data with a clustering method (K-means) that incorporates the tags, keywords and descriptions into the clustering process. The results indicate that both tags and author-provided metadata can be leveraged to improve clustering performance significantly, while tags are more effective than author-provided metadata as an independent information source for enhancing clustering quality. As future work, the author would like to investigate the effectiveness of user-created tags and author-provided metadata for other types of resources, such as videos and images.

4.6 Research Question Tested

Question 1: Are social tags effective document features which can be used to represent and index web documents in web mining and search applications?

In this chapter, the author investigated the additional information value provided by user-created social tags and author-provided metadata, as well as their effectiveness in facilitating web clustering and discovery.

CHAPTER 5: CLUSTERING THE TAGGED WEB

5.1 Introduction

Compared to the metadata property, a more important property of social tagging is the tripartite network formed through users’ social tagging behaviors. The tripartite network of the social tagging system provides valuable information for ascertaining the topics of web documents, the semantics of tags and the domain interests of users.

Conventional clustering algorithms, such as K-means, cluster documents based on their textual content. However, the content features of web documents are sometimes missing, misleading or unrecognizable, due to the lack of well-controlled authoring styles and other reasons. It is therefore desirable to exploit other information sources to enhance clustering effectiveness. Links among documents have been used as important features for text mining in previous research. For instance, hyperlinks created by authors are viewed as evidence of an association between two web pages and utilized for ranking web search results (Kleinberg 1998, Page et al. 1998), classifying hypertext (Angelova and Weikum 2006, Li et al. 2008) and clustering web pages (Angelova and Siersdorfer 2006, Qi and Davison 2006). Besides the hyperlink network, the network formed in social tagging systems provides another rich information source for web mining. However, the tripartite structure of the social tagging network differs fundamentally from the well-studied hyperlink graph. In this chapter, we examine how the social tagging network can be exploited for web clustering.

The social tagging network has been well studied in previous research (Halpin et al. 2007, Lambiotte and Ausloos 2005, Maulik et al. 2010, Schenkel et al. 2008, Schmitz et al. 2007, Wu et al. 2006). However, it has rarely been exploited for clustering purposes.

This chapter aims to explore whether and how the social tagging network can be utilized to improve web clustering. For this purpose, the author develops a novel clustering method based on the tripartite nature of the social tagging network. Unlike conventional clustering approaches, which cluster only a single type of data object (the documents), the proposed method clusters the three types of data objects (documents, users and tags) simultaneously. Accordingly, we call this method “Tripartite Clustering”. The method relies entirely on the link information within the social tagging network rather than on the content of documents, so it can be applied to any social tagging system. For instance, a large number of images and videos have been uploaded and annotated in social tagging systems such as Flickr and YouTube; the Tripartite Clustering method can be directly used to cluster these image and video documents.

The proposed Tripartite Clustering approach is compared with two other clustering approaches based on social tagging data. In the first approach, the tags applied to a document and the users who have annotated it are viewed as additional features of the document and used to enrich the content-based document representation; K-means is used as the clustering method for this approach. In the second approach, tags and users are not used directly as document features for clustering but are viewed as bridges connecting related documents: the cluster labels of the documents connected to a document d through shared tags and users are used to help decide the cluster assignment of d. This approach is implemented with the Link K-means method (Angelova and Siersdorfer 2006, Zhang et al. 2008). All three clustering methods were evaluated on a real-world social tagging dataset sampled from the Delicious website, and the clustering results were evaluated against a human-maintained web directory (Open Directory Project—ODP).

The organization of this chapter is as follows. Section 5.2 proposes the Tripartite Clustering model, which clusters a social tagging dataset based on the inherent network structure of social tagging systems. Section 5.3 introduces the social-tag-based K-means and Link K-means clustering methods. Section 5.4 presents the experimental settings and parameter tuning for the Tripartite Clustering method. Section 5.5 shows the clustering results of the three tag-based clustering approaches. Section 5.6 discusses and compares the performance of Tripartite Clustering with the other two tag-based clustering approaches. Finally, Section 5.7 concludes the chapter with the major findings and future work.

5.2 Tripartite Clustering Model

In this section, we first introduce the novel Tripartite Clustering model for the social tagging network.

5.2.1 Model Formulation

A social tagging network can be represented as a tripartite graph denoted by:

$$G = (U, D, T, E^{(UD)}, E^{(UT)}, E^{(DT)}) \qquad (5.1)$$

$U$, $D$, and $T$ are finite sets of users, documents, and tags; $E^{(UD)}$, $E^{(UT)}$, and $E^{(DT)}$ denote the three types of undirected links in the network: user–document, user–tag, and document–tag. Each of $E^{(UD)}$, $E^{(UT)}$, and $E^{(DT)}$ can be represented as a matrix whose rows and columns correspond to two different types of nodes.

In this graph, each type of node can be characterized by its links to the other two types of nodes. Specifically, a user’s interests can be characterized by the documents he has previously annotated and the tags he has used; a document’s topic can be represented by the users who have annotated it and the tags that have been assigned to it; and a tag’s semantics can be inferred from the documents that have been annotated with it and the users who have used it. Accordingly, a user $u_h$ can be represented with two vectors:

(D) (D) (D)  Uh  xhi i  1,2,,| D | , the document link vector, where xhi is the weight assigned

to the link from user uh to document di, e.g. the frequency that document di has been

annotated by uh, which normally equals to 1 in a typical social tagging system;

 (T ) (T ) , the tag link vector, where (T ) is the weight assigned to the Uh  xhj j 1,2,,|T | xhj

link from uh to tag tj, e.g., the times that tag tj has been used by uh,.

Likewise, a document d_i can be represented with:

• D_i^(U) = ⟨y_ih^(U) | h = 1, 2, …, |U|⟩, d_i's user link vector, where y_ih^(U) is the weight assigned to the link from d_i to u_h, and y_ih^(U) = x_hi^(D);

• D_i^(T) = ⟨y_ij^(T) | j = 1, 2, …, |T|⟩, d_i's tag link vector, where y_ij^(T) is the weight of the link from d_i to t_j, e.g., the number of times that d_i has been annotated with tag t_j.

And a tag t_j can be represented with:

• T_j^(U) = ⟨z_jh^(U) | h = 1, 2, …, |U|⟩, t_j's user link vector, where z_jh^(U) is the weight of the link from t_j to u_h, and z_jh^(U) = x_hj^(T);

• T_j^(D) = ⟨z_ji^(D) | i = 1, 2, …, |D|⟩, t_j's document link vector, where z_ji^(D) is the weight assigned to the link from t_j to d_i, and z_ji^(D) = y_ij^(T).

Based on this representation model, we can measure the similarity between any two nodes of the same type using their link distribution vectors. For instance, if two documents have similar user link distributions and similar tag link distributions, it can be inferred that these two documents have similar topics, and likewise for users and tags.

Accordingly, the distance between two document nodes d_1 and d_2 can be defined as the linear combination of the distance between their user link vectors, dist(D_1^(U), D_2^(U)), and the distance between their tag link vectors, dist(D_1^(T), D_2^(T)):

dist(d_1, d_2) = ω · dist(D_1^(U), D_2^(U)) + (1 − ω) · dist(D_1^(T), D_2^(T))    (5.2)

In equation (5.2), ω quantifies the relative influence of the user link vector and the tag link vector on the document distance. The value of the distance can be calculated based on various similarity measures, such as those proposed in (Markines et al. 2009). Distance metrics for user nodes and tag nodes can be formulated likewise.
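As an illustrative sketch, equation (5.2) can be written directly in Python; cosine distance is used as the underlying measure, and the toy link vectors below are hypothetical, not taken from the dataset:

```python
from math import sqrt

def cosine_distance(a, b):
    # 1 - cosine similarity; zero vectors are treated as maximally distant
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def document_distance(d1_user, d1_tag, d2_user, d2_tag, omega=0.5):
    # Equation (5.2): omega weighs the user-link distance against the tag-link distance
    return (omega * cosine_distance(d1_user, d2_user)
            + (1 - omega) * cosine_distance(d1_tag, d2_tag))

# Hypothetical toy data: three users, four tags
d1_user, d1_tag = [1, 1, 0], [2, 1, 0, 0]
d2_user, d2_tag = [0, 1, 1], [1, 0, 1, 0]
print(round(document_distance(d1_user, d1_tag, d2_user, d2_tag), 3))  # 0.434
```

The same two-term structure carries over to the user and tag distance metrics.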

Based on the representation model and distance metrics described above, the three types of nodes of the social tagging network can be clustered separately through any VSM (vector space model) based clustering method, such as K-means, agglomerative hierarchical clustering, or spectral clustering.

However, by clustering the three types of nodes separately, we ignore the interactions among the different types of nodes. After analyzing a large amount of social tagging data, it was found that topically related web pages are usually annotated with semantically related tags by users with similar interests; semantically related tags are usually assigned to documents with similar topics by users with similar interests; and users with common interests usually annotate topically related web pages using semantically related tags. The same observation has also been made in (Bao et al. 2007, Wu et al. 2006). Based on this observation, it is desirable to have a joint clustering approach capable of utilizing the interactions among the cluster structures of the different types of nodes. Such an approach should be able to, for instance, cluster documents not only based on their direct links to the user nodes and tag nodes but also based on the cluster structures of the user nodes and the tag nodes. Therefore, even though two document nodes are not connected to the same users or the same tags, they can still be assigned to the same cluster as long as they are connected to similar users (users with similar link features and belonging to the same cluster) or similar tags (tags with similar link features and belonging to the same cluster), as shown in Figure 5.1.

[Figure 5.1 about here: a tripartite graph with documents d1 and d2, users u1–u5, and tags t1–t7.]

Figure 5.1 The similarity between two documents d1 and d2 is not only decided by their direct links to users, but also affected by the cluster structure of users, which is further decided by the links from users to tags.

In Figure 5.1, if the document nodes are clustered separately based on their direct links to users, then the two documents d1 and d2 would be assigned to different clusters inasmuch as they link to different user nodes and thus have zero similarity. However, if we take into account the link distributions from users to tags, then we can find that users u1 and u2 are similar user nodes because they link to the same set of tag nodes, and u3, u4, and u5 are also similar users. Accordingly, the similarity between d1 and d2 is no longer zero, and they may be assigned to the same cluster.

5.2.2 Algorithm

In order to make use of the interactions among the cluster structures of different types of nodes, the author proposes an extended K-means algorithm based on the tripartite network of social tagging. The proposed algorithm takes the cluster numbers of documents, users, and tags as input, denoted by kD, kU, and kT respectively. Following the K-means approach, the algorithm first randomizes the cluster assignment of each node, and then iteratively calculates the centroid of each cluster and reassigns each node to the closest cluster based on its distance to the centroid of each cluster. In order to incorporate the interaction among the cluster structures of different types of nodes into the clustering process, the algorithm uses a novel metric to calculate the centroid of each cluster. In traditional K-means, the centroid of a cluster is the average of all the nodes in the cluster; that is, its coordinates are the arithmetic mean for each dimension separately over all the nodes in the cluster (MacKay 2003). In the proposed approach, the centroid of a cluster is decided not only by the features of the nodes in the cluster, but also by the cluster structures of the other two types of nodes. For example, when clustering documents, the centroid of a document cluster is calculated based on the link features of the document nodes included in this cluster as well as the cluster structures of the user nodes and tag nodes.

Let C_m^(D) (1 ≤ m ≤ kD) denote a document cluster containing |C_m^(D)| documents. Each document can be represented with a user link vector and a tag link vector as described above. Let us first consider the tag link vectors of these documents. The value of the centroid vector of a document cluster C_m^(D) at dimension t_τ is calculated as:

centroid_m^(T)(t_τ ∈ C_n^(T)) = ( Σ_{d_i ∈ C_m^(D), t_j ∈ C_n^(T)} y_ij^(T) ) / ( |C_m^(D)| · |C_n^(T)| )    (5.3)

where C_n^(T) (1 ≤ n ≤ kT) is the tag cluster that t_τ is assigned to, t_j is any tag node included in cluster C_n^(T), d_i is any document node assigned to C_m^(D), and y_ij^(T) is the weight of the link from d_i to t_j. According to equation (5.3), the value of a document cluster's centroid vector at dimension t_τ not only depends on the links from t_τ to all the documents in the cluster, but also relies on the links from other tag nodes (those assigned to the same cluster as t_τ) to the documents in the cluster. Note that the value of equation (5.3) can be viewed as an indicator of the association between the document cluster C_m^(D) and the tag cluster C_n^(T) (Long et al. 2006).
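As a minimal sketch of equation (5.3) (all ids and weights below are hypothetical), the centroid value is the total link weight between the documents of C_m^(D) and the tags of C_n^(T), normalized by the product of the two cluster sizes:

```python
def centroid_value(doc_cluster, tag_cluster, y):
    """Equation (5.3): value of the centroid of document cluster doc_cluster
    at any tag dimension whose tag belongs to tag_cluster.
    y maps (doc_id, tag_id) -> link weight y_ij^(T); missing pairs weigh 0."""
    total = sum(y.get((d, t), 0) for d in doc_cluster for t in tag_cluster)
    return total / (len(doc_cluster) * len(tag_cluster))

# Hypothetical toy cluster: two documents, a tag cluster of three tags
y = {(0, 0): 1, (0, 1): 1, (1, 0): 1}
print(centroid_value([0, 1], [0, 1, 2], y))  # 0.5
```

Note that every tag dimension within the same tag cluster receives the same centroid value, which is what lets semantically related but distinct tags contribute to each other.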

The superiority of the proposed method over traditional methods for calculating cluster centroids can be illustrated through the example shown in Figure 5.2. In Figure 5.2, C_1^(D) is a document cluster composed of two documents, d1 and d2. The tags t1: web2.0, t2: folksonomy, and t3: socialbookmark are three tags assigned to tag cluster C_1^(T); t4: socialnetwork, t5: facebook, and t6: socialsoftware are another three tags assigned to tag cluster C_2^(T). Now, we want to cluster document d3 based on its distance to all the document clusters, including C_1^(D). In C_1^(D), because d1 is annotated with t1, t2, and t4, its tag link vector is <1, 1, 0, 1, 0, 0>; d2 is annotated with t1, t4, and t5, so its tag link vector is <1, 0, 0, 1, 1, 0>. Likewise, the tag link vector of d3 is <0, 0, 1, 0, 0, 1>. If calculated according to the traditional K-means approach, the centroid vector of C_1^(D) is <1, 0.5, 0, 1, 0.5, 0>, and the cosine similarity between d3 and the centroid of C_1^(D) is 0. This is not a proper result, because even though the tags used to annotate d3 differ from those used to annotate d1 and d2 in C_1^(D), they are semantically related and assigned to the same tag clusters.

[Figure 5.2 about here: document cluster C_1^(D) = {d1, d2}; tag cluster C_1^(T) = {t1: web2.0, t2: folksonomy, t3: socialbookmark}; tag cluster C_2^(T) = {t4: socialnetwork, t5: facebook, t6: socialsoftware}; and a new document d3.]

Figure 5.2 The distance between a document node and the centroid of a document cluster is influenced by the cluster structure of the tag nodes.

Now, if we instead consider the cluster structure of the tag nodes and calculate the centroid of the document cluster using equation (5.3), then we get the centroid vector of C_1^(D) as <1/3, 1/3, 1/3, 1/2, 1/2, 1/2>. As a result, the cosine similarity between d3 and the centroid of C_1^(D) changes to 0.566. Compared to 0, this is a more reasonable similarity value, because d1, d2, and d3 are annotated with semantically related tags which have been assigned to the same tag clusters in previous iterations.
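The similarity value above can be checked directly; the two vectors are copied from the example:

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

d3 = [0, 0, 1, 0, 0, 1]                 # tag link vector of d3
c1 = [1/3, 1/3, 1/3, 1/2, 1/2, 1/2]     # centroid of C1^(D) per the example
print(round(cosine(d3, c1), 3))         # 0.566
```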

The centroid of a document cluster based on the user link vectors can be calculated likewise. The value of the centroid vector of a document cluster C_m^(D) at the user dimension u_μ is calculated as:

centroid_m^(U)(u_μ ∈ C_l^(U)) = ( Σ_{d_i ∈ C_m^(D), u_h ∈ C_l^(U)} y_ih^(U) ) / ( |C_m^(D)| · |C_l^(U)| )    (5.4)

where C_l^(U) (1 ≤ l ≤ kU) is the cluster that u_μ belongs to, u_h is any user node included in cluster C_l^(U), d_i is any document node in C_m^(D), and y_ih^(U) is the weight of the link from d_i to u_h.

Like the distance between two documents in equation (5.2), the distance from a document d_i to the centroid of a document cluster C_m^(D) can be calculated as a linear combination:

dist(d_i, centroid_m) = β · dist(D_i^(U), centroid_m^(U)) + (1 − β) · dist(D_i^(T), centroid_m^(T))    (5.5)

where dist(D_i^(U), centroid_m^(U)) denotes the distance between d_i and the centroid of C_m^(D) based on the documents' user link vectors, and dist(D_i^(T), centroid_m^(T)) represents the distance based on the documents' tag link vectors. β quantifies the influence of a document's user link vector on its cluster assignment: a greater value of β means the cluster assignment of documents relies more on user link vectors and less on tag link vectors. The distance can be measured using various similarity metrics; in this thesis, we adopt the cosine similarity.

The distance from a user (tag) node to the centroid of a user (tag) cluster can be calculated likewise, as shown in equations (5.6) and (5.7):

dist(u_j, centroid_l) = α · dist(U_j^(D), centroid_l^(D)) + (1 − α) · dist(U_j^(T), centroid_l^(T))    (5.6)

dist(t_h, centroid_n) = γ · dist(T_h^(U), centroid_n^(U)) + (1 − γ) · dist(T_h^(D), centroid_n^(D))    (5.7)

In the above two equations, α decides to what extent the clustering of a user relies on its document link vector, and γ specifies to what extent the cluster assignment of a tag depends on its user link vector.

Based on the proposed distance metrics, we develop an iterative clustering method, called Tripartite Clustering, to discover the cluster structures of the social tagging network. Table 5.1 shows the algorithm of the proposed clustering method.

Table 5.1 Algorithm of Tripartite Clustering

Input: The social tagging network TN = (U, D, T, E^(UD), E^(UT), E^(DT)); cluster numbers of document nodes, user nodes, and tag nodes: kD, kU, and kT.
Output: Cluster assignments of documents, users, and tags: C^(D), C^(U), and C^(T).
Method:
  Initialize C^(D), C^(U), and C^(T): assign each node to a random cluster;
  Repeat:
    For each type of node do
      Calculate the centroid of each document cluster as defined in equations (5.3) and (5.4); the centroids of user clusters and tag clusters are calculated likewise;
      For each document (user or tag) node do
        Calculate the distance between the node and the centroid of each document (user or tag) cluster according to equation (5.5), (5.6), or (5.7);
        Reassign the current node to the closest cluster based on the distance value;
      End For
    End For
  Until (the assignments no longer change OR the iteration number >= a threshold)
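A compact sketch of one strand of the loop in Table 5.1 follows: it reassigns documents against centroids computed per equation (5.3) while the tag clustering is held fixed. The full algorithm alternates over all three node types and adds the user-link term of equation (5.5); all names and the toy data here are hypothetical.

```python
from math import sqrt

def cosine_dist(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a)); nb = sqrt(sum(y * y for y in b))
    return 1.0 if na == 0 or nb == 0 else 1.0 - dot / (na * nb)

def centroid(members, tag_assign, Y, n_tag_clusters):
    # Equation (5.3): value at tag dimension t is the doc-cluster/tag-cluster
    # association, normalized by the product of the two cluster sizes.
    n_tags = len(tag_assign)
    csize = [tag_assign.count(n) for n in range(n_tag_clusters)]
    assoc = [0.0] * n_tag_clusters
    for d in members:
        for t in range(n_tags):
            assoc[tag_assign[t]] += Y[d][t]
    return [assoc[tag_assign[t]] / (len(members) * csize[tag_assign[t]])
            for t in range(n_tags)]

def cluster_documents(Y, tag_assign, k_docs, k_tags, init, iters=5):
    # Iteratively recompute centroids and reassign each document to the closest one
    assign = list(init)
    for _ in range(iters):
        cents = []
        for m in range(k_docs):
            members = [d for d in range(len(Y)) if assign[d] == m]
            cents.append(centroid(members, tag_assign, Y, k_tags) if members
                         else [0.0] * len(tag_assign))
        assign = [min(range(k_docs), key=lambda m: cosine_dist(Y[d], cents[m]))
                  for d in range(len(Y))]
    return assign

# Toy data: tags {0, 1} form one tag cluster, tags {2, 3} another;
# documents 0, 1 link into the first tag cluster, documents 2, 3 into the second.
Y = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
print(cluster_documents(Y, [0, 0, 1, 1], 2, 2, init=[0, 1, 1, 1]))  # [0, 0, 1, 1]
```

Starting from the deliberately wrong initialization [0, 1, 1, 1], the loop recovers the two topical groups because the tag-cluster-aware centroids pull document 1 back toward document 0.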

5.3 Tag-based K-means and Link K-means

In the Tripartite Clustering approach, web pages are clustered based on their user link features and tag link features, with no dependence on document content. In this section, the author exploits social tagging for document clustering with two more clustering methods: K-means and Link K-means.

5.3.1 K-means

Traditional K-means represents documents with word vectors. In Chapter 4, in order to compare the metadata effectiveness of social tags with that of author-provided keywords and descriptions, the author experimented with K-means clustering on a variety of tag-based, keyword-based, and description-based vector models. In this section, focusing on the social tagging network, the author models web documents with tag-based vectors and user-based vectors. Intuitively, if two documents are annotated with similar sets of tags, these two documents are likely to have similar topics. Similarly, documents annotated by the same users are more likely to be topically related than documents annotated by different users. Specifically, we experiment with K-means on seven vector space models. The first four vector space models were also adopted in the clustering analysis of Chapter 4.

• Word vector: a web document is represented only with the vector of its content words.

• Tag vector: a web document is represented only with the social tags applied to it.

• (Word+Tag) vector: tags applied to a web document are viewed as additional words of the document and combined with the original words of the document to form one vector.

• Word vector + Tag vector: each document is represented with two independent vectors, a word vector and a tag vector.

• User vector: each document is represented only with the users who annotated it.

• Word vector + User vector: each document is represented with two independent vectors, a word vector and a user vector. During the clustering process, the distance from a document to a cluster centroid is calculated as the linear combination of the distances calculated based on the word vector and the user vector.

• Word vector + Tag vector + User vector: each document is represented with three independent vectors: word vector, tag vector, and user vector. During the clustering process, the distance from a document to a cluster centroid is calculated as the linear combination of the distances calculated based on these three vectors.

5.3.2 Link K-means on Social Annotation

On-page features are not always sufficient for text mining purposes. One approach to further enhancing clustering performance is to combine on-page features with the features of linked pages. Angelova and Siersdorfer (2006) developed such a method, which incorporates the features of linked documents into the clustering process through iterative relaxation of cluster assignments. This algorithm can be built on top of any content-based clustering method. In our experiment, we build the algorithm on K-means, and thus call it Link K-means. The algorithm is briefly reviewed below.

Let D = {d_i, i = 1, 2, …, n} be a document set, represented with an undirected graph G. Each document d ∈ D corresponds to a node in the graph G, and each link between two documents in D corresponds to an edge in G. The set of documents linked to document d is denoted by L(d). Let c(d) stand for the cluster of document d, and t(d) denote the set of terms contained in d. The cluster assignment of d is then based on both d's on-page features t(d) and the cluster distributions of its linked documents L(d). If Φ_{i,d} denotes the probability of assigning document d to cluster i, then:

Φ_{i,d} = Pr(c(d) = i | t(d), L(d)) = Pr[c(d) = i | t(d), c(d1), c(d2), …, c(dk)]

where d1 through dk are the linked pages of document d in D. If we assume that (1) there is no direct coupling between the text of a document and the cluster labels of its linked documents, and (2) the labels of a document's neighbors are independent of each other, then we get the following formula:

Φ_{i,d} = Pr[c(d) = i | t(d)] · Σ_{c(L(d))} Π_{d* ∈ L(d)} Pr[c(d) = i ∧ c(d*) = j]

The above equation considers all combinations of the cluster assignments of d's linked documents. For simplicity, for each linked document d* ∈ L(d), we only consider its most probable cluster assignment c_max(d*). In addition, the link between d and d* is weighted by W based on some property of the link. The above equation can then be simplified as:

Φ_{i,d} = Pr[c(d) = i | t(d)] · Π_{d* ∈ L(d)} Pr[c(d) = i ∧ c(d*) = c_max(d*)]^W

The equation is resolved through iterative relaxation labeling. First, the class label of each document is initialized through a content-based clustering process (K-means). Then the cluster assignment of each document is re-estimated using the label assignments of its neighbors and its own content. The re-estimation process based on the above equation iterates until the probability Φ_{i,d} for each document stabilizes, the changes drop below some threshold, or the number of iterations reaches a certain limit.
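A hedged sketch of one re-estimation pass over the simplified equation follows; the probability tables are hypothetical placeholders, not values from the experiments:

```python
def relaxation_update(phi, content_prob, joint_prob, links):
    """One relaxation-labeling pass: for each document d and cluster i,
    score = Pr[c(d)=i | t(d)] * product over neighbors d* of
            Pr[c(d)=i and c(d*)=c_max(d*)] ** W, then normalize over i.
    phi[d][i]:        current label distribution of document d
    content_prob[d]:  content-based probabilities Pr[c(d)=i | t(d)]
    joint_prob[i][j]: joint probability Pr[c(d)=i and c(d*)=j]
    links[d]:         list of (neighbor_id, link_weight W) pairs"""
    k = len(content_prob[0])
    new_phi = []
    for d in range(len(phi)):
        scores = []
        for i in range(k):
            s = content_prob[d][i]
            for d_star, w in links[d]:
                # most probable cluster of the neighbor under the current phi
                c_max = max(range(k), key=lambda j: phi[d_star][j])
                s *= joint_prob[i][c_max] ** w
            scores.append(s)
        z = sum(scores) or 1.0
        new_phi.append([s / z for s in scores])
    return new_phi
```

In use, the pass would be repeated until the distributions stabilize, exactly as the text describes.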

The document links utilized in the algorithm can be of various types, such as hyperlinks among web pages and co-authorship among academic papers (Zhang et al. 2008). Here, the author investigates whether the links formed in social tagging systems can also be used in the same way to improve web clustering. The author examines three types of links existing among web pages in social tagging systems:

• Co-user link: if two web pages are annotated by the same user, then a co-user link is built between these two web pages. The number of users who have annotated both web pages in our dataset is used as the weight of the link. By using this type of link in the Link K-means algorithm, we assume that the cluster assignment of a web page can be influenced by the cluster distributions of the pages annotated by the same users as the page.

• Co-tag link: if two web pages are annotated with the same tag, then a co-tag link is built between these two pages; the number of tags applied to both pages is used as the weight of the link. By applying this type of link in the Link K-means algorithm, we assume that the cluster assignment of a web page is based on both its textual content and the cluster distributions of the pages which share the same tags as the page.

• Co-user-tag link: if two web pages are annotated by the same user with the same tag, then a co-user-tag link is created between these two pages. The frequency with which these two pages are annotated with the same tag by the same user is the weight of the link. The author experiments on this type of link because the same user may not always annotate topically related web pages, and the same tag may be used by different users to annotate topically unrelated pages. For instance, the tag "matrix" may be used by one user to annotate web pages about mathematical references on matrices, while another user may use it to tag a web page about the movie "The Matrix". Therefore, by applying this link in Link K-means, we assume that only pages annotated with the same tag by the same user are topically related, and the cluster assignment of one of them can be influenced by the cluster assignments of the others.

Although both Tripartite Clustering and Link K-means rely on the linkage among documents, tags, and users, they utilize the linkage in different ways. In Link K-means, tags and users are viewed as bridges connecting related documents together; tags and users are not directly used as document features for clustering. Only the cluster labels of the documents connected to a document d through co-tag links, co-user links, or co-user-tag links are used to help decide the cluster assignment of d. In the proposed Tripartite Clustering, on the other hand, tags and users are viewed as inherent document features, and tags and users are clustered in synchrony with documents; the cluster assignments of documents are influenced by the cluster structures of tags and users. Besides, Tripartite Clustering relies fully on the network structure of social tagging, whereas Link K-means is also based on the content features of documents.

5.4 Experiments

5.4.1 Dataset and Clustering Standard

The same dataset crawled from Delicious is used for experimentation. However, in order to reduce link sparsity, the URLs that were annotated only once and the tags that were used only once in the dataset were filtered out. For the rest of the URLs, the textual page content was crawled to support content-based clustering approaches.

Although all three types of nodes in the social tagging network are clustered through Tripartite Clustering, in our experiment we only evaluate the clustering results of web pages, owing to the lack of ground truth for user clusters and tag clusters. Following the approach adopted in Chapter 4, the 14 top-level web categories of the Open Directory Project (ODP) are used as the standard for web page clusters. In the experiments, all the clustering results are evaluated against these 14 ODP categories. After intersecting the URLs from the Delicious dataset with those classified under the 14 ODP web categories, the final dataset for experimentation contains 18,509 URLs, 19,452 tags, and 4,543 users.

5.4.2 Quality Metrics

Cluster quality is evaluated by three quality metrics: F-score, purity, and normalized mutual information (NMI). All of them have been used in Chapter 4 for clustering analysis. All three metrics range from 0 to 1; a higher metric value indicates better clustering quality.
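For reference, purity and NMI can be sketched as follows; these are the standard formulations, not definitions specific to this thesis, and F-score is omitted:

```python
from collections import Counter
from math import log

def purity(clusters, labels):
    """Fraction of nodes that fall in their cluster's majority class.
    clusters, labels: parallel lists of cluster ids and true class ids."""
    by_cluster = {}
    for c, l in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(l)
    return sum(Counter(ls).most_common(1)[0][1]
               for ls in by_cluster.values()) / len(labels)

def nmi(clusters, labels):
    """Normalized mutual information: I(C;L) / sqrt(H(C) * H(L))."""
    n = len(labels)
    pc, pl = Counter(clusters), Counter(labels)
    pj = Counter(zip(clusters, labels))
    mi = sum((nij / n) * log((nij / n) / ((pc[c] / n) * (pl[l] / n)))
             for (c, l), nij in pj.items())
    entropy = lambda cnt: -sum((v / n) * log(v / n) for v in cnt.values())
    denom = (entropy(pc) * entropy(pl)) ** 0.5
    return mi / denom if denom else 0.0

print(purity([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
```

Both measures reach 1 for a perfect clustering and drop toward 0 as cluster and class memberships decouple.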

5.4.3 Parameter Selection

In the experiment with Tripartite Clustering, the distance between a node and a cluster centroid was measured with cosine similarity. Because Tripartite Clustering relies on random initialization, the author ran the algorithm 10 times and used the mean of each quality metric across the 10 runs as the final score. For each run, the number of iterations was set to 30. Like K-means, Tripartite Clustering takes the number of clusters as an input parameter. Because Tripartite Clustering clusters documents, users, and tags all at once, choosing appropriate values of kD, kU, and kT (the cluster numbers of documents, users, and tags) is a problem. In the experiment, the clustering results of web pages were quantitatively evaluated against the 14 ODP categories, so kD was also set to 14. This setting is just for evaluation purposes and does not indicate that 14 is the best-performing number of document clusters. kU and kT were decided based on the clustering quality of a random sample of web pages: for different settings of kU and kT, we measured the clustering quality of web pages on this small dataset, and the settings that led to the best clustering quality were selected. The experiments show that Tripartite Clustering achieved comparatively higher performance when kU and kT were set to 25 and 35 respectively. Therefore, in the following experiments, the author set kU = 25 and kT = 35.

Besides the cluster numbers, Tripartite Clustering also involves three other parameters: α, β, and γ in equations (5.5), (5.6), and (5.7) respectively. In order to select proper values for α, β, and γ, we varied each of them from 0.1 to 0.9 while setting the other two to 0.5, and selected the value which resulted in the best clustering quality. Figures 5.3, 5.4, and 5.5 plot the clustering quality metrics of web pages for different values of α, β, and γ respectively. In Figure 5.3, we can see that α = 0.3 leads to the best clustering performance. Similarly, Figure 5.4 shows that the proposed clustering algorithm achieves the best performance when β = 0.5, and Figure 5.5 demonstrates that the best-performing value of γ is also 0.5. Accordingly, we set α = 0.3, β = 0.5, and γ = 0.5 in the following experiments.

Figure 5.3 The values of the quality metrics for different values of α when β = 0.5 and γ = 0.5. A greater value of α means the cluster assignments of users rely more on their document link vectors and less on their tag link vectors.


Figure 5.4 The values of the quality metrics for different values of β when α = 0.5 and γ = 0.5. A greater value of β indicates the cluster assignments of documents rely more on their user link vectors and less on their tag link vectors.

Figure 5.5 The values of the quality metrics for different values of γ when α = 0.5 and β = 0.5. A greater value of γ indicates the cluster assignments of tags rely more on their user link vectors and less on their document link vectors.
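The one-at-a-time sweep described above can be sketched generically; `evaluate` stands in for a full clustering run scored by a quality metric, and the quadratic toy objective below is purely illustrative:

```python
def sweep(evaluate, names=("alpha", "beta", "gamma"), default=0.5):
    """Vary one parameter over 0.1..0.9 while holding the others at `default`,
    keeping the best-scoring value of each parameter."""
    grid = [round(0.1 * i, 1) for i in range(1, 10)]
    best = {}
    for name in names:
        params = {n: default for n in names}
        scores = {}
        for v in grid:
            params[name] = v
            scores[v] = evaluate(**params)
        best[name] = max(scores, key=scores.get)
    return best

# Illustrative stand-in objective, peaked at alpha=0.3, beta=0.5, gamma=0.5
toy = lambda alpha, beta, gamma: -((alpha - 0.3)**2 + (beta - 0.5)**2 + (gamma - 0.5)**2)
print(sweep(toy))  # {'alpha': 0.3, 'beta': 0.5, 'gamma': 0.5}
```

Note this coordinate-wise search assumes the parameters interact weakly; a full grid search over all three would be the exhaustive alternative.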

5.5 Clustering Results

5.5.1 Tripartite Clustering

Although Tripartite Clustering generates cluster structures of web pages, tags, and users at the same time, only the quality of the URL clusters was evaluated quantitatively, against the ODP categories. In the future, we will explore more advanced approaches for evaluating the quality of tag clusters and user clusters.

Table 5.2 Quality of Tripartite Clustering for URLs

Cluster Method         F-Score   Purity   NMI
Tripartite Clustering  0.274     0.502    0.195

Table 5.2 shows the clustering quality for URLs. The quality of Tripartite Clustering is compared with that of the other two clustering methods in the following subsections. Although the clustering results of tags cannot be evaluated quantitatively because of the lack of ground truth, we can still get a general estimate of the quality of the tag clusters by examining the semantics of the tags assigned to each cluster. Table 5.3 displays the top 10 frequent tags from 15 randomly selected tag clusters. We can see that the top 10 tags in each cluster generally have similar or related semantics. The third column in Table 5.3 shows the dominant topic inferred from the semantics of the tags assigned to each cluster.

For instance, tags in cluster 2 are generally related to web development, tags in cluster 6 are generally associated with web 2.0 or social network, and tags in cluster 13 generally deal with photography. The tag clusters can be applied to various purposes such as automatic tag recommendation and query expansion.

Table 5.3 Top 10 Frequent Tags of 15 Tag Clusters

#   #tags  topic                     top tags
1   467    teaching                  teach elearn interactive child lesson parent algebra lessonplan literacy educational
2   447    web design                design webdesign cs inspiration flash html font usability icon typography
3   431    programming & database    java library python database documentation apache sql source performance
4   429    software & os             software opensource security network mac cm osx apple editor
5   406    outdoor activity          car camp buy cycle gear auto outdoors bargain transportation survival
6   344    web 2.0 & social network  web2.0 bookmark twitter share socialsoftware mashup feed 2.0 competition facebook
7   340    science & education       science education write learn math literature mathematics psychology astronomy
8   326    reference                 reference search research travel map history searchengine archive health dictionary
9   296    web development           program development javascript ajax webdev framework ruby code script xml
10  272    news & entertainment      news fun humor politics funny movy daily entertainment comicstrip film
11  266    news                      newspaper usa international mlf documentary press temp canada favorite translate
12  262    shopping                  shop architecture gadget home fashion clothe store museum toy tshirt
13  229    photography               photo photography graphic image photoshop wallpaper stock 3d desktop
14  209    music & multimedia        sound music audio tv mp3 stream guitar record player art
15  44     art                       cool magazine animation idea creativity retro style draw creative paint

5.5.2 K-means on Social Tagging Network

An issue in K-means clustering is how to weight the document features. For word features and tag features, two weighting functions, tf (term frequency) and tf·idf, were adopted. The tf·idf value of a term is decided by both its term frequency in the document and its document frequency in the whole collection. As for the user features, because a document can be annotated by a user at most once, the value of a document's user vector at each dimension can only be 1 or 0.
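A sketch of tf·idf weighting for tag features follows; the log(N/df) form of idf is assumed here, as the text does not spell out the exact variant used:

```python
from collections import Counter
from math import log

def tfidf_vectors(doc_tags):
    """doc_tags: one list of tags per document, repetitions counted as tf.
    Weight of tag t in document d: tf(t, d) * log(N / df(t))."""
    n = len(doc_tags)
    df = Counter()
    for tags in doc_tags:
        df.update(set(tags))          # document frequency counts each doc once
    vectors = []
    for tags in doc_tags:
        tf = Counter(tags)
        vectors.append({t: f * log(n / df[t]) for t, f in tf.items()})
    return vectors

docs = [["web", "web", "design"], ["web", "python"]]
vecs = tfidf_vectors(docs)
print(vecs[0]["design"] > vecs[0]["web"])  # True: "web" occurs in every document
```

Under this variant a tag applied to every document gets zero weight, which is exactly the down-weighting of undiscriminative tags that motivates tf·idf.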

The experiment used the same configuration as Tripartite Clustering: the cluster number K was set to 14; the number of iterations was set to 30; each evaluation was run 10 times with random initialization, and the average was used as the final clustering result. All the content words were lemmatized. Stop words and words with a document frequency of less than 5 were filtered out.

Table 5.4 lists the results of K-means clustering based on the different vector space models. ** indicates that, compared to the baseline (K-means on the word vector), the improvement was significant according to the paired-sample t-test at the level of p < 0.01; * indicates the improvement was significant at the level of p < 0.05. These two symbols are used in all the following tables.

From Table 5.4, we can see that K-means based on the tag vector and on the word vector + tag vector generates the best results among the vector space models that do not use user vectors. Both significantly outperform the baseline method (K-means on the word vector) under either the tf or the tf·idf scheme. The (word+tag) vector also improves clustering performance compared to the baseline, but the improvement is less significant, especially under the tf weighting scheme. This confirms that tags provide an information channel different from words for web clustering. In fact, from the clustering results we can conclude that tags are more reliable features than words for web clustering. Moreover, for all the vector space models involving tags, clustering based on tf·idf weighting always outperforms clustering based on tf weighting, indicating that tf·idf is more appropriate for weighting tags.

Table 5.4 Clustering Quality of K-Means with Different Vector Space Models and Two Weighting Schemes

Vector Space Model                      F-Score           Purity            NMI
                                        tf      tf·idf    tf      tf·idf    tf      tf·idf
Word vector                             0.123   0.103     0.424   0.445     0.078   0.125
(Word+Tag) vector                       0.144   0.168**   0.426   0.458*    0.082   0.146**
Tag vector                              0.252** 0.292**   0.505** 0.543**   0.192** 0.241**
Word vector + Tag vector                0.266** 0.278**   0.512** 0.552**   0.204** 0.244**
User vector                             0.154   -         0.404   -         0.055   -
Word vector + User vector               0.175*  0.162**   0.433   0.451     0.107** 0.137*
Word vector + Tag vector + User vector  0.263** 0.298**   0.522** 0.537**   0.215** 0.239**

User features are not as effective as tags for enhancing document clustering. K-means relying solely on the user vector even performs worse than the baseline method according to Purity and NMI. Although combining the user vector with the word vector improves clustering performance to some extent, the improvement is not substantial. However, if tags are also introduced into the clustering process, as in the word vector + tag vector + user vector model, clustering performance is remarkably improved. As shown in Table 5.4, in half of the cases the word vector + tag vector + user vector model leads to the best clustering performance among all the vector space models.

5.5.3 Link K-means on Social Tagging Network

The clustering performance of Link K-means can be influenced by the number of links, or bridges, built among the documents in the collection. In order to keep the numbers of the three types of links at the same scale, the author set a different threshold on the weight of each type of link. For the co-user link, the threshold was set to 2, which results in 2,285,080 co-user links. For the co-tag link, the threshold was set to 10, which generates 2,720,850 co-tag links. For the co-user-tag link, the threshold was set to 1, which produces 2,358,696 co-user-tag links. Like Tripartite Clustering and K-means, for each type of link the author ran the algorithm 10 times and took the average as the final clustering result. The number of iterations was set to 30, and the cluster number was again set to 14. The content words were weighted using both the tf and tf·idf schemes. Table 5.5 shows the clustering results.

Table 5.5 Clustering Quality of Link-Based K-Means Based on Different Link Types

                     F-Score           Purity            NMI
Link Type            tf      tf·idf    tf      tf·idf    tf      tf·idf
Co-user link         0.157   0.230     0.452   0.480     0.132   0.161
Co-tag link          0.149   0.213     0.439   0.458     0.105   0.132
Co-user-tag link     0.181** 0.243**   0.472** 0.501**   0.162** 0.187**

Note that Link K-means using the co-user-tag link evinces the best performance across all quality metrics under both tf and tf·idf weighting. It significantly outperforms both Link K-means using the co-user link and Link K-means using the co-tag link. This result confirms that the same user may annotate web pages with different topics, and that the same tag can be applied to topically different web pages by different users; the same user, however, would always apply the same tag to web pages with similar topics.
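The three link types can be derived from the raw (user, document, tag) triples by counting the bridges shared by each document pair and applying a weight threshold; a minimal sketch, with hypothetical toy triples and thresholds:

```python
from collections import defaultdict

def build_links(triples, threshold, key):
    # key="user" -> co-user links, key="tag" -> co-tag links,
    # key="user_tag" -> co-user-tag links (shared (user, tag) pairs).
    bridges = defaultdict(set)            # bridge -> documents it touches
    for user, doc, tag in triples:
        bridge = {"user": user, "tag": tag, "user_tag": (user, tag)}[key]
        bridges[bridge].add(doc)
    weight = defaultdict(int)             # (doc_a, doc_b) -> link weight
    for docs in bridges.values():
        docs = sorted(docs)
        for i, a in enumerate(docs):
            for b in docs[i + 1:]:
                weight[(a, b)] += 1       # one more shared bridge
    return {pair: w for pair, w in weight.items() if w >= threshold}

# Hypothetical (user, document, tag) triples.
triples = [("u1", "d1", "ftp"), ("u1", "d2", "ftp"),
           ("u1", "d1", "web"), ("u1", "d2", "web"),
           ("u2", "d1", "ftp"), ("u2", "d3", "ftp")]
print(build_links(triples, threshold=2, key="user_tag"))  # {('d1', 'd2'): 2}
```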

5.6 Discussion

The purpose of this chapter is to explore approaches to improving web clustering by leveraging the social tagging network. The experiments with three different clustering methods (Tripartite Clustering, K-means, and Link K-means) all demonstrate the value of the social tagging network for web clustering, though from different perspectives. Table 5.6 displays the results of Tripartite Clustering and the best cluster models of K-means and Link K-means.

K-means on word vector is used as the baseline method.

It can be seen that all three clustering methods significantly improve on the baseline by incorporating the social tagging data into the clustering process, each through a different approach. K-means on word vector + tag vector + user vector treats tags and users as additional document features and linearly combines them with the word vector to determine the distance between a document and a cluster centroid. In Link K-means, users and tags act as bridges that bring topically related web pages together; the cluster of a document is determined not only by its content words but also by the cluster assignments of neighboring documents. Tripartite Clustering relies fully on the relationships among the web pages, users and tags for clustering. It clusters the three entities simultaneously, so the cluster structure of one entity can influence the cluster structures of the other two.

Both Tripartite Clustering and K-means on the word vector + tag vector + user vector significantly outperform Link K-means with the co-user-tag link across all quality metrics. K-means on word vector + tag vector + user vector also outperforms Tripartite Clustering in most cases, especially when the tf·idf weighting scheme is used; however, the difference is not significant based on the t-test. Therefore, it can be concluded that, when applied to clustering documents, the Tripartite Clustering method, which relies fully on the social tagging network, can compete with clustering methods that rely on both content and social tagging.

Table 5.6 Comparison of the Clustering Quality of Different Clustering Methods

                                         F-Score           Purity            NMI
Clustering Method                        tf      tf·idf    tf      tf·idf    tf      tf·idf
K-means on Word vector                   0.123   0.103     0.424   0.445     0.078   0.125
K-means on Word vector + Tag vector
  + User vector                          0.263** 0.298**   0.522** 0.537**   0.215** 0.239**
Link K-means with Co-user-tag link       0.181** 0.243**   0.472** 0.501**   0.162** 0.187**
Tripartite Clustering                    0.274** -         0.502** -         0.195** -

Apart from document clusters, Tripartite Clustering also outputs the cluster structures of users and tags, as well as the relationships among the clusters of the different entities. This information can be exploited for various applications, such as automatic tag recommendation and personalized search. Moreover, because Tripartite Clustering does not rely on the content of the annotated documents, it can be utilized to cluster the non-textual documents annotated in some social tagging systems. For instance, we would like to use it to cluster the images in the Flickr dataset in the future.

Tripartite Clustering can also compete with K-means and Link K-means in terms of processing time. If the iteration number is I, the number of clusters is K, the number of entities to be clustered is N, the dimensionality of an entity's feature space is D, and the number of documents linked to a document is L, then the time complexities of K-means, Link K-means and Tripartite Clustering are O(IKND), O(IKND + IKNL), and O(IKND + IK²N) respectively. We can see that the computational complexities of all three clustering algorithms are on the same scale. In actual experiments, Tripartite Clustering takes longer because it clusters three types of entities at the same time. If we used K-means to cluster each type of entity individually, by transforming the links into features for each type of node, the total computational time would be close to that of Tripartite Clustering.

5.7 Conclusion and Future Work

This chapter explores how to utilize the social tagging network for document clustering.

The author extends two content-based clustering methods, K-means and Link K-means, to incorporate the social tagging information during the clustering process. The author also proposes a novel clustering method called Tripartite Clustering to discover the cluster structures of the tripartite network of social tagging. The clustering methods were applied to a real-world social tagging dataset and evaluated against a human-edited web directory. Experimental results show that all the social tagging-based clustering methods significantly outperform the content-based K-means in clustering web pages. The author also compared the Tripartite Clustering method with K-means using content words, social tags and users together as document features, and with Link K-means, which clusters web pages based on both on-page content and the links formed through the social tagging network. Experimental results show that the content-independent Tripartite Clustering method significantly outperforms Link K-means but performs slightly worse than the K-means approach.

In the future, the proposed Tripartite Clustering method can be further improved in several ways. Firstly, the feature weighting scheme used in the method needs to be refined. Experimental results of K-means and Link K-means both show that the clustering quality of these two methods can be further improved by weighting document features (content words and tags) with the tf·idf scheme. In the future, the author would like to explore more advanced weighting schemes for Tripartite Clustering.

A limitation of Tripartite Clustering is that each object can only be assigned to one cluster. In other words, like K-means, Tripartite Clustering is a hard clustering algorithm.

Its clustering results cannot correctly reflect documents with multiple topics, social tags with multiple meanings, or users with more than one interest. In the future, the author would like to develop a soft clustering method based on the Tripartite Clustering algorithm by representing the cluster membership of each object with probability values.

Although the proposed method also clusters users and tags, the author could not evaluate the quality of user and tag cluster structures owing to the lack of ground truth. In the future, the author would like to develop both quantitative and qualitative approaches for evaluating the clustering results of users and tags. Finally, the author also plans to experiment with the proposed clustering method on datasets collected from other social tagging systems, especially those for image and video annotation.

5.8 Research Questions Tested

Question 2: How to enhance web clustering based on the social tagging network?

This chapter explores how to utilize the social tagging network for document clustering. A novel clustering method called Tripartite Clustering is developed to discover the document cluster structures from the tripartite social tagging network.

CHAPTER 6: PERSONALIZED SEARCH ON TAGGED WEB

6.1 Introduction

The social tagging system not only provides a channel to learn the topics of web resources, but also provides a window to identify users’ information needs and preferences. In this chapter, the author investigates how to realize and enhance personalized search based on the social tagging network.

In a social tagging system, we can learn user interests from users' social tagging history, for instance based on the tags they have created or the documents they have annotated. Although the application of social tagging to personalization seems straightforward, it is still a challenge to fully exploit the social tagging network for personalized search. A major reason is that, unlike previously well-studied network structures (such as the hyperlink network, the author-document network, or the document-word network), the social tagging network is composed of tripartite relations among users, tags and web documents.

As pointed out by Lambiotte and Ausloos (2005), it is always possible to project the tripartite network of social tagging into bipartite or unipartite networks, and then apply existing methods for bipartite and unipartite relationships to model and exploit the social tagging network. For instance, a user-document-tag tripartite relation can be projected into three undirected bipartite relations: the user-tag relation, the user-document relation and the document-tag relation. However, the projection inevitably causes information loss.

Tensors have proved effective for modeling the tripartite relationship involved in social tagging data (Rendle et al. 2009, Wetzker et al. 2009). However, tensor-based algorithms (such as high-order singular value decomposition) are limited in efficiency and scalability. Considering the rapidly growing tagging community and tagged content, as well as the sparsity related to the power-law distribution in social tagging systems, highly scalable and efficient methods are desirable for modeling and applying the tripartite network of social tagging data. Efficiency is especially essential when we apply social tagging data to web search, where instant response to users' queries is a key factor affecting users' experience and evaluation of the system.

In this chapter, the author proposes an efficient and effective personalized search framework (TripleQE) based on the tripartite network of social tagging. Specifically, the proposed search framework utilizes query expansion for personalization and ranking.

However, different from previous expansion methods, which only expand the query keywords, the proposed search framework also expands the queries in the user dimension. More importantly, it incorporates the tripartite relationship of social tagging into query expansion. Based on two datasets collected from real-world social tagging systems, we compared our personalized search framework with two language model (LM) based search models and a personalized search method called FolkRank, which employs a PageRank-like algorithm on the social tagging graph to identify important documents for a user and a tag query. The experimental results demonstrate that our personalized search framework outperforms all the benchmark methods in terms of effectiveness. It is also much more efficient than FolkRank for personalized search.

The rest of the chapter is organized as follows: Section 6.2 introduces the proposed personalized search framework. Section 6.3 discusses the approaches of incorporating language model based search models into the proposed framework for document ranking. Section 6.4 specifies the experiments used for evaluation. Section 6.5 presents and discusses the experimental results. Finally, Section 6.7 summarizes the whole chapter.

[Figure 6.1 appears here.]

Figure 6.1 An example of the tripartite social tagging hyper-graph. Each time a user annotates a web page with a tag, a tripartite hyper-edge (triple) is created among the user, the resource and the tag. This figure contains eight hyper-edges: e1, e2, e3, e4, e5, e6, e7, e8.

6.2 Personalized Search Framework

In this section, we present the personalized search framework developed on the social tagging network. In the existing literature, personalized search primarily adopts two strategies. One strategy is re-ranking, which re-ranks the documents returned by a non-personalized search system using previously constructed user profiles. The other strategy is query expansion, which modifies the original query by expanding it with other terms to incorporate user information into the search process. The proposed personalized search framework adopts a strategy similar to query expansion. However, in our approach, the expansion is based not only on the query terms, but also on the query users and the tripartite relationship among tags, users, and documents.

6.2.1 Model Formulation

Before introducing the personalized search method, we first define a graph model to represent the social tagging network. A social tagging network can be represented by a 3-uniform hyper-graph denoted as:

HG = (U, D, T, E^(UDT))    (6.1)

• U, D, and T are finite sets of user vertices, document vertices, and tag vertices;

• E^(UDT) represents a set of hyper-edges. Each hyper-edge is a triple of a user node, a document node and a tag node:

E^(UDT) = { e = {u, d, t} | f(u, d, t) = 1 }

f(u, d, t) is a function that equals 1 if tag t is used by user u to annotate document d in the dataset, and 0 otherwise. Namely, f(u, d, t) defines whether u, d, and t form a triple in the social tagging hyper-graph.

For instance, the social tagging network represented in Figure 6.1 contains eight tripartite hyper-edges: E^(UDT) = {e1, e2, e3, e4, e5, e6, e7, e8} = {{u1, filezilla-project.org, filezilla}, {u1, filezilla-project.org, }, {u2, filezilla-project.org, filezilla}, {u2, filezilla-project.org, ftp}, {u2, filezilla-project.org, server}, {u2, www.01ftp.com, ftp}, {u3, www.01ftp.com, server}, {u3, www.01ftp.com, web2ftp}}.
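A minimal sketch of this representation stores the hyper-edge set E^(UDT) as a set of (user, document, tag) triples, with f(u, d, t) reduced to a membership test. The triples below follow the Figure 6.1 example (the tag of e2 is not legible in the source, so that edge is omitted here):

```python
# Hyper-edge set E^(UDT) stored as (user, document, tag) triples.
E_UDT = {
    ("u1", "filezilla-project.org", "filezilla"),
    ("u2", "filezilla-project.org", "filezilla"),
    ("u2", "filezilla-project.org", "ftp"),
    ("u2", "filezilla-project.org", "server"),
    ("u2", "www.01ftp.com", "ftp"),
    ("u3", "www.01ftp.com", "server"),
    ("u3", "www.01ftp.com", "web2ftp"),
}

def f(u, d, t):
    # f(u, d, t) = 1 iff user u annotated document d with tag t.
    return 1 if (u, d, t) in E_UDT else 0

print(f("u2", "filezilla-project.org", "ftp"))  # 1
print(f("u1", "www.01ftp.com", "ftp"))          # 0
```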

Note that the graph representation of the social tagging system proposed here is different from the one proposed in Equation 5.1 of Chapter 5, where the ternary user-document-tag hyper-edge is projected into three types of binary edges: the user-tag edge, the user-document edge and the document-tag edge.

Given a query q = (uq, tq), where uq is the user who issues the query and tq is the query term, the purpose of the personalized search method is to return a ranked set of documents based on the hyper-graph defined above. Here, we only consider queries that contain a single query term, but the proposed framework can easily be extended to support complex queries composed of more than one term. Besides, we also assume that the query term is one of the tags contained in the dataset. This assumption is based on two considerations. Firstly, in a social tagging system, users assign tags to documents so that they can later use the tags as keywords to search for the documents; thus the tags can, to some extent, be considered query keywords. Secondly, the tagged documents represent the web resources that are most interesting to online users, so it is very likely that the tags assigned to the documents comprise popular search keywords. The empirical study conducted by Heymann et al. (2008) shows that there is a significant overlap between popular query terms and tags.

Now, if the query term is a tag, the query q = (uq, tq) is actually part of a triple in the social tagging graph HG. Given the whole hyper-graph of social tagging, a simple personalized search system could just return the set of documents that have been annotated by user uq with tag tq: {d | f(uq, d, tq) = 1}. However, such documents may not be available: the user who issues the query may not have tagged any documents with the query term yet. Besides, even if the user did not apply the tag tq to a document, it does not necessarily mean that the user would not think tq a proper tag for the document, considering that the number of proper tags a user can come up with during the tagging process is always limited. A practical and effective personalized search system should be flexible enough to return a satisfying number of relevant documents under these conditions. An approach to addressing this issue is query expansion. Different from traditional query expansion methods, in which the query terms are expanded without taking the user factor into account, here we propose a new query expansion approach that also considers the user information in the query. The proposed approach is based on three assumptions:

Assumption 1: The user uq would think tq a proper tag for document d, if document d has been annotated by uq with some tags similar to tq.

This assumption is intuitively true. For instance, in Figure 6.1, if we know that server is a tag similar to ftp based on their co-occurrence in documents, then since user u3 has tagged document www.01ftp.com with server, it is reasonable to assume that u3 would also consider it proper to annotate www.01ftp.com with ftp. Based on this assumption, we can expand the query tag tq with similar tags that have been used by user uq. The documents that appear in the same triple with uq and a tag similar to tq can then be returned as relevant documents for q = (uq, tq). Formally, these documents can be denoted as D1:

D1 {d | f (uq ,d,ti ) 1,(ti Tq )} (6.2)

, where Tq is the set of tags similar to tq. The expansion with Assumption 1 is based on the tag dimension. In the personalized search framework, we also propose to expand the query from the user dimension based on Assumption 2.

Assumption 2: The user uq would think tq a proper tag for document d, if the users similar to uq have annotated document d with tq.

For example, in Figure 6.1, if we know that u1 and u2 are similar users based on their tagging history, it is reasonable to assume that u1 would also think ftp a proper tag for filezilla-project.org, since u2 has applied ftp to filezilla-project.org. Based on Assumption 2, we can expand the query q = (uq, tq) with users that are similar to uq and connected to tq. Then the documents that have been tagged with tq by users similar to uq can be returned as relevant documents for q = (uq, tq). Formally, the set of returned documents based on Assumption 2 can be represented as:

D2 = { d | f(uj, d, tq) = 1, uj ∈ Uq }    (6.3)

where Uq is the set of users similar to uq.

Finally, based on Assumption 1 and Assumption 2, we can further derive Assumption 3.

Assumption 3: The user uq would think tq a proper tag to annotate document d, if users similar to uq have annotated d with tags similar to tq.

For instance, in Figure 6.1, if we know that server is a tag similar to ftp based on their co-occurrence in documents, and that u1 and u2 are similar users based on their tagging history, we can assume that u1 would consider server a proper tag for filezilla-project.org, based on the fact that u2 has applied ftp to filezilla-project.org. Through Assumption 3, we implement personalized search with two-dimensional expansion: user expansion and tag expansion. These two expansions are achieved simultaneously on the social tagging hyper-graph. Specifically, the documents that have been annotated by users similar to uq with tags similar to tq are returned as relevant documents for q = (uq, tq).

Formally, these documents can be denoted as document set D3:

D3 = { d | f(uj, d, ti) = 1, uj ∈ Uq, ti ∈ Tq }    (6.4)

Uq and Tq have the same definitions as specified in Equations (6.2) and (6.3).

6.2.2 Triple-based Query Expansion Method

Having established how the query should be expanded, the author proposes a simple but very effective and efficient search method, the Triple-based Query Expansion search method (TripleQE), which identifies the documents included in D1, D2 and D3 within a unified framework.

Given a query (uq, tq), for each triple contained in the social tagging hyper-graph, we first calculate its similarity score to the query according to Equation (6.5):

SimScore = sim[(uj, ti), (uq, tq)] = α·sim(ti, tq) + (1 − α)·sim(uj, uq)    (6.5)

uj and ti are the user node and tag node of a triple satisfying f(uj, d, ti) = 1. α (0 ≤ α ≤ 1) decides to what extent the user information influences the search results. When α = 1, the query is expanded only along the tag dimension: the search system returns documents annotated with tags similar to the query tag, without considering the user information. Conversely, when α = 0, the query is expanded only along the user dimension: the search system returns documents tagged by users similar to the query user, without taking the tag information into account.

In Equation (6.5), the similarity between two users, sim(uj, uq), can be calculated based on the percentage of documents or tags they share in the dataset, or based on their group memberships. The similarity between two tags, sim(ti, tq), can be calculated with many methods, as introduced by Markines et al. (2009). In our current implementation, the similarities between two users and between two tags are calculated based on the user-document relation and the tag-document relation respectively. Specifically, each user uj is represented with a vector with |D| dimensions, where |D| is the number of document nodes in the social tagging graph:

U_j = (w_k), k = 1, 2, ..., |D|

w_k is the normalized frequency with which user uj is connected to document dk through a tripartite hyper-edge in the social tagging graph:

w_k = c(uj; dk) / Σ_{k=1}^{|D|} c(uj; dk)

where c(uj; dk) is the number of times that user uj and document dk appear in the same triple in the social tagging hyper-graph.

Based on the vector representation, the similarity between two users is measured with the cosine similarity of their corresponding vectors.

Likewise, each tag ti is also represented with a vector:

T_i = (v_k), k = 1, 2, ..., |D|

v_k is the normalized frequency with which tag ti is connected to document dk through a tripartite hyper-edge in the social tagging hyper-graph:

v_k = c(ti; dk) / Σ_{k=1}^{|D|} c(ti; dk)

where c(ti; dk) is the number of times that tag ti and document dk co-occur in the same triple in the entire social tagging graph. The similarity of two tags is then also measured with the cosine similarity of their corresponding vectors.
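The normalized co-occurrence vectors and cosine similarities above can be sketched as follows; the triples are hypothetical, and sparse dict vectors stand in for the full |D|-dimensional vectors:

```python
from collections import Counter, defaultdict
import math

def doc_profile(triples, entity_index):
    # Normalized document co-occurrence vector for each user (index 0) or
    # tag (index 2) in a list of (user, document, tag) triples:
    # w_k = c(x; d_k) / sum_k c(x; d_k).
    counts = defaultdict(Counter)
    for triple in triples:
        counts[triple[entity_index]][triple[1]] += 1
    return {x: {d: n / sum(c.values()) for d, n in c.items()}
            for x, c in counts.items()}

def cosine(a, b):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * b.get(d, 0.0) for d, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical triples: u1 and u2 share document d1.
triples = [("u1", "d1", "ftp"), ("u2", "d1", "ftp"), ("u2", "d2", "server")]
users = doc_profile(triples, 0)
print(round(cosine(users["u1"], users["u2"]), 3))  # 0.707
```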

After calculating the SimScore for each triple in the graph, the documents contained in the M triples with the highest SimScore are used as candidate documents to return. We use TR^M to denote the set of M triples with the highest SimScore:

TR^M := argmax^M_{ {uj, d, ti} ∈ E^(UDT) } sim[(uj, ti), (uq, tq)]    (6.6)

Then the set of candidate documents can be denoted as Dc:

Dc = { d | {uj, d, ti} ∈ TR^M }    (6.7)

Only considering the documents contained in the top M triples has two advantages: firstly, it helps filter out noisy information and thus improves search precision; secondly, it reduces the data dimensionality and thus improves search efficiency.
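Putting Equations (6.5)-(6.7) together, the candidate-selection step can be sketched as below; the similarity functions and triples are hypothetical stand-ins (exact-match similarity is used only to keep the example small):

```python
def candidate_documents(triples, u_q, t_q, sim_user, sim_tag, alpha, M):
    # Score every triple by SimScore = alpha*sim(t_i, t_q)
    # + (1 - alpha)*sim(u_j, u_q), keep the M highest-scoring triples
    # (TR^M), and collect their documents as the candidate set D_c.
    scored = [(alpha * sim_tag(t, t_q) + (1 - alpha) * sim_user(u, u_q), d)
              for u, d, t in triples]
    scored.sort(key=lambda pair: -pair[0])
    top = scored[:M]
    return {d for _, d in top}, top

# Hypothetical toy similarity: exact match only (a real system would use
# e.g. the cosine similarities of Section 6.2.2).
exact = lambda a, b: 1.0 if a == b else 0.0
triples = [("u1", "d1", "ftp"), ("u2", "d2", "ftp"), ("u3", "d3", "web")]
dc, _ = candidate_documents(triples, "u1", "ftp", exact, exact, alpha=0.5, M=2)
print(sorted(dc))  # ['d1', 'd2']
```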

It is easy to see that Dc in fact consists of documents from D1, D2 and D3 defined in Equations (6.2), (6.3) and (6.4). For instance, in Equation (6.6), when ti and tq are the same tag (i.e., sim(ti, tq) = 1) and the triple's user uj is similar to uq, the corresponding documents belong to D2.

In the final step, each candidate document in Dc is ranked based on two factors: the number of times the document appears in TR^M, and the SimScore of the triples in which it occurs. Formally, given the social tagging hyper-graph HG and the query (uq, tq), each candidate document d is ranked based on the score SG(d):

SG(d) = Σ_{ {uj, d, ti} ∈ TR^M } sim[(uj, ti), (uq, tq)]    (6.8)

Because the score SG(d) is calculated fully based on the social tagging hyper-graph, we call it the graph score.

6.3 Enhanced Ranking

The TripleQE method proposed above depends fully on the social tagging graph for personalized search, without considering document content. This makes it flexible enough to be directly applied to searching different types of tagged collections, such as web pages, articles, images, and videos. However, when the content of the tagged documents is available, it is still desirable to utilize content information during the search process, because document content contains useful information for document retrieval, as demonstrated in traditional information retrieval (IR) models. Therefore, we propose a method to incorporate the content information into the personalized search framework during the ranking step.

Besides SG(d), calculated from the social tagging hyper-graph, for each candidate document included in Dc we also calculate a content-based score using a traditional information retrieval model. In our experiments, we adopt the language model-based IR method to calculate the content-based score SLM(d):

SLM(d) = P(tq | d)

P(tq | d) can be calculated with different smoothing methods, as proposed by Zhai and Lafferty (2001). Here we adopt the Bayesian (Dirichlet) smoothing method (Zhai and Lafferty 2001):

SLM(d) = P(tq | d) = ( tf(tq; d) + μ·p(tq | C) ) / ( Σ_w tf(w; d) + μ )    (6.9)

where tf(tq; d) is the term frequency of the query term tq in document d as a content word, p(tq | C) is the maximum likelihood estimate of tq in the entire document collection, and μ is the Dirichlet prior. We can see that the value of SLM(d) is independent of the query user.

Finally, each candidate document in Dc is ranked based on the linear combination of the two scores SG(d) and SLM(d):

S(d) = λ·SG(d) + (1 − λ)·SLM(d)    (6.10)

where λ (0 ≤ λ ≤ 1) decides to what extent the rank of a document is influenced by SG(d). When λ = 1, the rank of a candidate document in the returned list relies fully on the graph-based score SG(d), whereas when λ = 0 it is fully decided by the language model based score SLM(d).
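The enhanced ranking of Equations (6.9) and (6.10) can be sketched as follows; the document statistics, collection probabilities, and parameter values are hypothetical:

```python
def dirichlet_lm_score(tq, doc_tf, collection_prob, mu):
    # Bayesian (Dirichlet) smoothed LM score:
    # P(tq|d) = (tf(tq;d) + mu * p(tq|C)) / (sum_w tf(w;d) + mu)
    doc_len = sum(doc_tf.values())
    return (doc_tf.get(tq, 0) + mu * collection_prob.get(tq, 0.0)) / (doc_len + mu)

def final_score(sg, slm, lam):
    # S(d) = lambda*SG(d) + (1 - lambda)*SLM(d)
    return lam * sg + (1 - lam) * slm

# Hypothetical document of 10 words, 2 of them the query tag "ftp".
slm = dirichlet_lm_score("ftp", {"ftp": 2, "server": 8}, {"ftp": 0.01}, mu=100)
print(round(final_score(0.8, slm, lam=0.5), 4))  # 0.4136
```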

In our experiments, besides the original words contained in the document, the tags that have been applied to a document are viewed as another type of document content. This is because some tagged web resources, such as images or videos, do not contain textual content; in these cases we can use tags as substitute content. Therefore, depending on whether document content is represented with words or tags, the SLM(d) of a document can take different values. When tags are used to represent document content, the content score is calculated as:

SLM(d) = P(tq | d) = ( tf_t(tq; d) + μ·p'(tq | Ct) ) / ( Σ_t tf_t(t; d) + μ )    (6.11)

where tf_t(tq; d) is the term frequency of the query term tq in document d as a tag, and p'(tq | Ct) is the maximum likelihood estimate of tq in the entire document collection represented with tags.

6.4 Experiments

This section introduces the datasets on which the experiments were conducted, the methods and performance metrics used for evaluation, the baseline methods for comparison, and the approach to parameter setting.

6.4.1 Datasets

In order to evaluate the performance of the proposed personalized search method, the author prepared two real-world datasets from two social tagging systems: Delicious and Bibsonomy. The Delicious dataset is a sample of the same dataset used in Chapters 4 and 5. It contains 189,910 social bookmarks covering 41,190 web pages annotated by 4,414 users with 28,740 unique tags. For each web page, the author also crawled its textual content from the web. The Bibsonomy dataset was downloaded from its data dump page (http://www.kde.cs.uni-kassel.de/bibsonomy/dumps); the dump created on Jan 01, 2011 is used. In Bibsonomy, the tagged documents include both web pages and published articles. In order to study whether the proposed method is effective for different document types, the author selected only the posts created for publications in the Bibsonomy dataset. Besides, in order to assess the effectiveness of the content-based ranking and compare the proposed personalized search method with content-based IR, the author chose only those publications with abstract information available in their corresponding BibTeX entries. The title and abstract of each publication are used as its word content. The final Bibsonomy dataset for the experiments contains 79,733 publications, 143,863 tags, and 34,649 users.

Both datasets, especially Bibsonomy, suffer from data sparsity due to the data nodes in the "long tail". Hence, in order to increase data density and restrict the evaluation to the "dense" part of the data, for each dataset the author applied the algorithm proposed by Hotho et al. (2006) to generate a p-core in which each document, user and tag occurs in at least 2 posts. The p-core of the Delicious data at level 2 contains 628,318 triples, 4,184 users, 16,910 documents, and 11,270 unique tags. The p-core of the Bibsonomy data at level 2 contains 93,943 triples, 648 users, 8,966 documents and 10,590 tags.
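The p-core restriction can be sketched as an iterative pruning loop. This is a simplification of the Hotho et al. (2006) algorithm, counting occurrences in raw triples rather than posts, with hypothetical toy triples:

```python
def p_core(triples, p=2):
    # Iteratively drop triples whose user, document, or tag occurs fewer
    # than p times, until every remaining entity occurs at least p times.
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for idx in (0, 1, 2):                      # user, document, tag
            counts = {}
            for tr in triples:
                counts[tr[idx]] = counts.get(tr[idx], 0) + 1
            kept = {tr for tr in triples if counts[tr[idx]] >= p}
            if kept != triples:
                triples, changed = kept, True
    return triples

# Hypothetical triples; only u3's single post falls outside the 2-core.
triples = [("u1", "d1", "t1"), ("u1", "d2", "t2"),
           ("u2", "d1", "t2"), ("u2", "d2", "t1"),
           ("u3", "d1", "t1")]
core = p_core(triples, p=2)
print(len(core))  # 4
```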

6.4.2 Evaluation Method

Evaluation of personalized search is always challenging due to the lack of benchmark queries and relevance judgments. For personalized search, only the querying user can judge whether a document is relevant to his/her query; the notion of relevance is therefore highly subjective and depends on the querying user. Previous studies of personalized search primarily evaluate through user experience studies (Chirita et al. 2005, Chirita et al. 2007, Qiu and Cho 2006, Teevan et al. 2005) and search engine query logs (Sun et al. 2005). However, user studies are costly in time and effort, and search engine query logs are of limited availability. In this chapter, we adopt the evaluation method proposed by Xu et al. (2008), which uses a user's tagging behavior to infer the user's relevance judgments: if the querying user u annotated a document d with a tag t, this user would consider d a relevant document when issuing a query containing t. The same evaluation method has been used in other personalized search studies and proved effective (Harvey et al. 2011, Schenkel et al. 2008).

Based on this evaluation method, for each dataset we create a set of pseudo queries. Each query is a randomly selected user-tag pair (uq, tq) that has appeared in some triples in the dataset. All the documents that have co-occurred with a query user-tag pair in a triple are considered relevant documents for this query. In other words, given a query q = (uq, tq), we consider the union of all the documents that have been annotated by user uq with tag tq as the ground truth for query q. The triples that contain the pseudo queries are selected as the test data, and the remaining triples are used for training. In this way, we make sure that all the query tags created by the corresponding query users are removed from their original posts.

For the Delicious dataset, 4,000 pseudo queries were created, and for the Bibsonomy data, 200 pseudo queries were created. For each dataset, we split the pseudo queries and the associated ground-truth documents into two subsets of equal size. One subset is used for parameter tuning; the other is used for final evaluation.
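The pseudo-query protocol can be sketched as below; the triples, query count, and random seed are hypothetical:

```python
import random

def make_pseudo_queries(triples, n_queries, seed=0):
    # Sample user-tag pairs as pseudo queries; the documents each pair
    # co-occurs with are its ground truth, and those triples are held
    # out of the training data.
    rng = random.Random(seed)
    pairs = sorted({(u, t) for u, _, t in triples})
    queries = rng.sample(pairs, min(n_queries, len(pairs)))
    ground_truth = {q: {d for u, d, t in triples if (u, t) == q}
                    for q in queries}
    train = [tr for tr in triples if (tr[0], tr[2]) not in ground_truth]
    return queries, ground_truth, train

# Hypothetical triples.
triples = [("u1", "d1", "ftp"), ("u1", "d2", "ftp"), ("u2", "d1", "web")]
queries, gt, train = make_pseudo_queries(triples, n_queries=1)
```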

6.4.3 Evaluation Metrics

Two metrics are used to measure the search effectiveness of the proposed method and the comparative methods. The first metric is MAP (mean average precision), a widely used evaluation metric in the IR community. It considers the order in which the returned documents are presented and favors ranking relevant documents higher:

MAP = ( Σ_{q=1}^{|Q|} AveP(q) ) / |Q|

The second metric is Success@k, the ratio of queries for which at least one relevant document is returned in the top k documents. For both metrics, the values at rank 1, rank 5, rank 10 and rank 20 are reported.

6.4.4 Parameter Settings

In the proposed Triple-based Query Expansion (TripleQE) method, there are two important parameters to set: α in Equation (6.5), specifying the influence of user information on the search results, and M in Equation (6.6), denoting the number of triples from which the candidate documents are generated. The values of these two parameters were decided one at a time based on the tuning subset. First, we fixed the value of α and tested M over a large range of settings from 40 to 200; M was set to the value leading to the best performance. It was found that when M was set to 80 for the Delicious data and 60 for the Bibsonomy data, the proposed TripleQE generated the best results. Note that here the documents were ranked based only on the social tagging graph score SG(d).

After fixing M at its optimized value, we varied α from 0.0 to 1.0 to identify the best setting; again, the value yielding the best performance was used in the final evaluation.

Figure 6.2 MAP@10 over different settings of α


Figure 6.3 Success@10 over different settings of α

Figure 6.2 and Figure 6.3 show the MAP and Success values at rank 10 as α varies from 0.0 to 1.0. For both datasets, MAP@10 and Success@10 peak when α equals 0.5. When queries were expanded only along the user dimension (α = 0), performance is worst; at the other extreme (α = 1), when queries were expanded only along the tag dimension, MAP@10 and Success@10 are also relatively low. This indicates that both user information and tag information can be exploited to improve search performance, and that the improvement is greatest when the query is expanded along both the user and tag dimensions.

Besides α and M, when using the enhanced ranking methods proposed in Section 6.3, it is also necessary to set the Dirichlet prior μ in Equations (6.9) and (6.11), as well as λ in Equation (6.10), which determines to what extent the rank of a document is influenced by SG(d) versus SLM(d).
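The interpolation of the two scores can be sketched as follows (a minimal illustration; `sg_score` and `slm_score` stand for precomputed SG(d) and SLM(d) values, and the function name is illustrative):

```python
def combined_rank(candidates, sg_score, slm_score, lam):
    """Rank candidate documents by lam * SG(d) + (1 - lam) * SLM(d).

    lam = 1.0 ranks purely by the social tagging graph score;
    lam = 0.0 ranks purely by the language model score."""
    scored = {d: lam * sg_score[d] + (1 - lam) * slm_score[d]
              for d in candidates}
    return sorted(scored, key=scored.get, reverse=True)
```

Sweeping `lam` over [0, 1] on the tuning subset reproduces the parameter search described below.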

The Dirichlet prior was tuned in an independent search system based only on the language model with Bayesian smoothing (BayesLM); the value yielding the best BayesLM performance was adopted in the ranking algorithm proposed in Section 6.3. For the tag-based language models, BayesLM performed best with μ = 300 on the Bibsonomy data and μ = 400 on the Delicious data. For the word-based language models, however, μ = 0 produced the best results on both datasets. This may be because the query keywords used in the experiments are tags themselves and are thus different in nature from content words; consequently, the smoothing strategy, whose purpose is to assign non-zero probability to unseen words, is ineffective for tag-based queries.

To identify the optimal value of λ, we likewise varied it from 0.0 to 1.0 and used the best-performing value for final evaluation. Figure 6.4 shows MAP@10 over different values of λ when word-based BayesLM is combined with TripleQE for ranking. For the Delicious data, search performance is best at λ = 0.7; for the Bibsonomy data, at λ = 0.3. Since (1 − λ) indicates the influence of word-based BayesLM on document ranking, this means that word information should be weighted more heavily for the Bibsonomy data than for the Delicious data. This may be because the word content of the Delicious documents, which are web pages, is noisier than that of the Bibsonomy documents, which are publication abstracts.

Figure 6.5 displays MAP@10 over different values of λ when tag-based BayesLM is combined with TripleQE to rank the candidate documents. Performance is optimized at λ = 0.8 for the Bibsonomy data and λ = 0.9 for the Delicious data. The parameter settings used in the final evaluation are summarized in Table 6.1.

Figure 6.4 The value of MAP@10 over different settings of λ when word-based language model with Bayesian smoothing is incorporated into TripleQE to rank the candidate documents.

Figure 6.5 The value of MAP@10 over different settings of λ when tag-based language model with Bayesian smoothing is incorporated into TripleQE to rank the candidate documents.

6.4.5 Baselines

The proposed personalized search method is compared with three baselines. Two are non-personalized methods based on the language model with Bayesian smoothing (Zhai and Lafferty 2001); the third is a personalized method based on FolkRank (Hotho et al. 2006).

BayesLM_word: the word-based language model with Bayesian smoothing, in which document content is represented with words. This is the traditional language model with Bayesian smoothing.

\[ P(t_q \mid d) = \frac{tf(t_q; d) + \mu \cdot p(t_q \mid C)}{\sum_{w} tf(w; d) + \mu} \]

However, as mentioned in the parameter settings section, this method generates its best result when the Dirichlet prior μ is set to 0. In the final evaluation, μ was therefore set to 0, which is equivalent to a non-smoothed language model.

BayesLM_tag: the tag-based language model with Bayesian smoothing, in which document content is represented with tags.

\[ P(t_q \mid d) = \frac{tf_t(t_q; d) + \mu \cdot p_t(t_q \mid C_t)}{\sum_{t} tf_t(t; d) + \mu} \]

In our experiments, the Dirichlet prior μ was optimized to 300 for the Bibsonomy dataset and 400 for the Delicious dataset.
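Both baselines score a document by the query likelihood under a Dirichlet-smoothed (Bayesian) language model; the same sketch applies to the word-based and tag-based variants by changing the term representation. This is a generic illustration, not the thesis implementation:

```python
import math
from collections import Counter

def dirichlet_lm_score(query_terms, doc_terms, collection_counts, mu):
    """Log-likelihood of the query under a document language model with
    Bayesian (Dirichlet) smoothing:
        p(t|d) = (tf(t; d) + mu * p(t|C)) / (|d| + mu)
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    coll_len = sum(collection_counts.values())
    score = 0.0
    for t in query_terms:
        p_c = collection_counts.get(t, 0) / coll_len  # background p(t|C)
        p = (tf.get(t, 0) + mu * p_c) / (doc_len + mu)
        if p == 0.0:          # unseen term with mu = 0: no smoothing
            return float("-inf")
        score += math.log(p)
    return score
```

With μ = 0 the formula collapses to the unsmoothed maximum-likelihood model, matching the word-based setting reported above.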

FolkRank: FolkRank is an algorithm inspired by PageRank, proposed by Hotho et al. (2006) for document search and ranking over the social tagging network. Its key idea is mutual reinforcement: a tag used by important users to annotate important documents is itself important, and the same holds for documents and users. The social tagging system is therefore represented as a hyper-graph in which vertices mutually reinforce each other by spreading their weights. Specifically, a weighted adjacency matrix A is constructed, with each element corresponding to the co-occurrence of two nodes in the social tagging graph, and the weight of each node is updated as w ← dAw + (1 − d)p, where p is a preference vector and d is the damping factor. First, a global weight vector w0 is generated by setting all entries of p to 1. Then an adapted weight vector w1 is obtained by setting the entries corresponding to the query user and query tag to |U| and |D|, respectively. Finally, each node is ranked according to w = w1 − w0, and the top-ranked document nodes are returned as the documents relevant to the query. In our experiments, the damping factor d was set to 0.7.
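The weight-spreading and differential ranking just described can be sketched as follows (a simplified dense-matrix version over a toy graph; the thesis runs this on the full tripartite graph, and the column normalization is an assumption made here so that spreading preserves total weight):

```python
def folkrank(adj, pref, d=0.7, iters=100):
    """Differential FolkRank on a weighted adjacency matrix (list of
    lists): w <- d*A*w + (1-d)*p; nodes are ranked by w1 - w0.
    Assumes every node has at least one edge (non-zero column sum)."""
    n = len(adj)
    # Column-normalize so each spreading step preserves total weight.
    col = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    A = [[adj[i][j] / col[j] for j in range(n)] for i in range(n)]

    def spread(p):
        s = sum(p)
        p = [x / s for x in p]
        w = [1.0 / n] * n
        for _ in range(iters):
            w = [d * sum(A[i][j] * w[j] for j in range(n)) + (1 - d) * p[i]
                 for i in range(n)]
        return w

    w0 = spread([1.0] * n)   # global weights: uniform preference
    w1 = spread(pref)        # preference boosts the query user and tag
    return [a - b for a, b in zip(w1, w0)]
```

Nodes whose differential weight w1 − w0 is largest are those pulled up specifically by the query preference, which is what makes the ranking query-dependent.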

Table 6.1 Searching methods used for experiments

Denotation        | Searching Method                                     | Parameter Settings
BayesLM_word      | Word-based language model with Bayesian smoothing    | μ = 0
BayesLM_tag       | Tag-based language model with Bayesian smoothing     | μ = 400 (Delicious), μ = 300 (Bibsonomy)
FolkRank          | FolkRank                                             | d = 0.7
TripleQE          | Triple-based query expansion                         | α = 0.5, M = 80 (Delicious), M = 60 (Bibsonomy)
TripleQE+LM_word  | TripleQE combined with BayesLM_word-based ranking    | λ = 0.7 (Delicious), λ = 0.3 (Bibsonomy)
TripleQE+LM_tag   | TripleQE combined with BayesLM_tag-based ranking     | λ = 0.9 (Delicious), λ = 0.8 (Bibsonomy)

6.5 Results

This section discusses the experimental results of the proposed personalized search method and the baseline methods introduced above. All metric values reported here were generated on the testing subset reserved for final evaluation. Table 6.1 lists the denotations of all search methods and their parameter settings in the final experiments.

Table 6.2 MAP of different searching methods on Delicious dataset

MAP               @1      @5      @10     @20
BayesLM_word      0.1225  0.1736  0.1803  0.1846
BayesLM_tag       0.2599  0.3582  0.3678  0.3735
FolkRank          0.2699  0.3819  0.3976  0.4081
TripleQE          0.2745  0.4181  0.4408  0.4559
TripleQE+LM_word  0.2957  0.4243  0.4448  0.4665
TripleQE+LM_tag   0.2993  0.4316  0.4549  0.4700

Table 6.3 Success of different searching methods on Delicious dataset

Success           @1     @5     @10    @20
BayesLM_word      0.145  0.298  0.342  0.422
BayesLM_tag       0.320  0.566  0.618  0.698
FolkRank          0.346  0.632  0.720  0.826
TripleQE          0.404  0.718  0.828  0.902
TripleQE+LM_word  0.368  0.724  0.824  0.898
TripleQE+LM_tag   0.420  0.755  0.840  0.935

Table 6.2 and Table 6.3 list the MAP and Success rates of the different methods on the Delicious data. In both tables, the two non-personalized methods, BayesLM_word and BayesLM_tag, show the lowest performance compared to the personalized methods. This indicates that search performance can be improved by incorporating user factors into the search process.

The tag-based BayesLM performs much better than the word-based BayesLM, for two likely reasons. First, the query keywords in our experiments are selected from tags, so the document language model estimated from tags is more accurate for query-likelihood ranking. Second, document tags contain less noise than the word content. This suggests that, even in non-personalized search models, tagging information can be exploited to improve search performance.

Among the personalized methods, all TripleQE-based methods outperform FolkRank, and they are also much more efficient. FolkRank must be run online: for every query it traverses each binary edge in the social tagging graph over several iterations, making its response time far longer than that of the other algorithms. For TripleQE, the similarities between two tags or two users can be computed offline; for each query it only needs to traverse the ternary hyper-edges in the social tagging graph once, and the number of ternary hyper-edges is much smaller than the number of binary edges.

Although the proposed TripleQE already outperforms all the baseline methods, performance can be further improved by introducing language-model-based methods into the ranking stage. TripleQE+LM_tag has the best performance of all methods on the Delicious data, while TripleQE+LM_word performs better than TripleQE but worse than TripleQE+LM_tag. This indicates that, for the Delicious data, tag-based BayesLM is more effective than word-based BayesLM when combined with TripleQE for ranking.

Table 6.4 MAP of different searching methods on Bibsonomy dataset

MAP               @1      @5      @10     @20
BayesLM_word      0.0683  0.1245  0.1400  0.1467
BayesLM_tag       0.1591  0.2742  0.2926  0.3018
FolkRank          0.1932  0.3151  0.3411  0.3565
TripleQE          0.2159  0.3321  0.3703  0.3983
TripleQE+LM_word  0.2562  0.4131  0.4454  0.4722
TripleQE+LM_tag   0.1778  0.3539  0.3819  0.4117

Table 6.5 Success of different searching methods on Bibsonomy dataset

Success           @1     @5     @10    @20
BayesLM_word      0.133  0.311  0.455  0.556
BayesLM_tag       0.320  0.520  0.623  0.693
FolkRank          0.380  0.600  0.740  0.800
TripleQE          0.380  0.780  0.870  0.960
TripleQE+LM_word  0.450  0.790  0.900  0.960
TripleQE+LM_tag   0.390  0.800  0.830  0.950

Table 6.4 and Table 6.5 show the MAP and Success of the different methods on the Bibsonomy data. The methods display similar patterns on the two datasets, except that on the Bibsonomy data TripleQE+LM_word outperforms TripleQE+LM_tag and achieves the best performance of all methods. In the Bibsonomy data, the tagged documents are publications, and the content words come from the titles and abstracts of published articles, which usually contain high-quality information; in the Delicious dataset, the documents are web pages whose content words are crawled from the Web, with little authority control.

Therefore, the word content of the Bibsonomy documents is more informative and less noisy than that of the Delicious documents. Based on this finding, we can conclude that, when combined with TripleQE for document ranking, word-based BayesLM is more effective than tag-based BayesLM when the word content is of high quality. This is probably because, unlike tags, whose information has already been used in TripleQE, words represent an additional and independent information source for the TripleQE method. This also confirms the value of document content in the search process.

In summary, based on the discussion above, we can conclude that the proposed personalized search method (TripleQE) is effective at enhancing personalized search. It outperforms competitive search models such as the language model with Bayesian smoothing, and it beats the well-known personalized search model FolkRank in both effectiveness and efficiency. Moreover, when a content-based LM is incorporated into TripleQE for document ranking, performance improves further.

6.6 Conclusion

The social tagging behavior of online users not only creates useful information for learning the topics of tagged web documents, but also provides a window onto users' information needs and interests. In this chapter, the author explores how to improve personalized search by exploiting social tagging data. For this purpose, a personalized search method is developed based on the social tagging hyper-graph formed by the ternary hyper-edges among users, tags, and documents. The proposed method implements personalized search through a query expansion strategy: during the search process, a query is expanded by utilizing the user relation identified through users' common interests in documents, the tag relation based on tags' co-occurrence in documents, and, most importantly, the tripartite relation among users, tags, and documents.

Experiments on two real-world datasets collected from Delicious and Bibsonomy demonstrate that the proposed personalized search framework is more effective than the baseline search methods. Moreover, performance improves further when the proposed method is combined with language-model-based search methods. In the future, the author would like to evaluate the proposed framework on much larger datasets and compare it with more search methods.

6.7 Research Question Tested

Question 3: How to exploit the social tagging network for personalized web search?

This chapter develops a personalized search method (TripleQE) based on the social tagging hyper-graph. The proposed method implements personalized search by expanding queries based on the tripartite relation among users, tags, and documents.

CHAPTER 7: A TOPIC-PERSPECTIVE MODEL FOR THE SOCIAL TAGGING SYSTEM

7.1 Introduction

In order to utilize the social tagging network for web mining, it is a prerequisite to properly model the relationships among the different entities involved in social tagging systems: users, documents, tags, and the content words of documents. In the previous two chapters, the social tagging network was modeled as a graph, in which the relations among the three entities (document, user, and tag) are represented as graph edges, and each entity is represented as a vector over the other two entities. Although the graph model is effective for tag-based clustering and search applications, it has limitations. First, the vector representations of the entities based on the graph model can have very high dimensionality, which makes the related computations inefficient. Second, the model does not reflect the real social tagging process, because the edges among documents, users, and tags are undirected and unordered. Moreover, the role of document content is not reflected in the graph model. In this chapter, the author adopts a different approach: modeling the interactions among the entities of a social tagging system by simulating the generation process of social tags with a probabilistic generative model. The proposed model can be used to discover the topical structures of documents and the perspectives of users. Most importantly, it can help identify the influence of document topics and users' personal perspectives on the generation of tags.

So far, a variety of probabilistic generative models have been developed for the social tagging system. For instance, Wu et al. (2006) propose a probabilistic generative model in which the three entities of a social tagging system (tags, resources, and users) are mapped to a common conceptual space, represented by a multi-dimensional vector with each dimension corresponding to a knowledge category.

In addition, hierarchical Bayesian models using Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed to model the social tagging process (Kashoob et al. 2009, Plangprasopchok and Lerman 2007, Ramage et al. 2009). In this chapter, the author proposes a new hierarchical Bayesian model, based on the well-known LDA, for simulating the generation process of social tagging.

In the proposed model, two different latent variables represent resource topics and user perspectives. Here a user's perspective does not merely refer to the user's interests; it is also affected by the user's expertise, motivation, language, and other personal factors. First, resource topics are generated and represented as word distributions. Then the identified topics of a resource, together with user perspectives, generate the tags. Accordingly, the model is named the "Topic-Perspective (TP) Model".

A distinct feature of the Topic-Perspective model is that, during the tag generation process, resource topics and user perspectives jointly generate the social tags for a resource, and each tag differs in the extent to which it depends on resource topics versus user perspectives. The rationale for this design is the existence of social tags with various functional purposes. Recall the seven functions of tags proposed by Golder and Huberman (2006) (see Sections 1.2 and 2.1.3). Sen et al. (2006) summarize these seven functions into three categories: factual tags, subjective tags, and personal tags. Intuitively, factual tags are closely related to resource content and extrinsic to the taggers, while subjective and personal tags are less connected to resource topics and more influenced by users' perspectives. For instance, on Delicious, the URL http://www.brainyquote.com/, titled "Famous Quote and Quotations at BrainyQuote", is annotated with factual tags like quotes, quotations, writing, and literature, and also with subjective and personal tags such as funny, cool, and interesting. Factual tags are more valuable than subjective and personal tags for representing resource content, and are thus more effective as additional information in web mining and search tasks.

Therefore, it is necessary to identify whether a tag relies more on resource topics or on user factors. Based on the proposed generative model for social annotation, we are able to estimate the probability that each tag is generated from user perspectives or from resource topics.

Moreover, unlike other models, this model treats the tag generation process separately from the content-word generation process. This design is based on the observation that tags and content words are generated differently: the content words of a document are produced by a single author or a small group of authors sharing common interests, while the social tags of a document are created by many users with different perspectives. A set of tags applied to a resource can reflect both users' perspectives and the resource's topics.

Based on this model, we can learn the topical distribution of each document, the perspective distribution of each user, the word distribution of each topic, the tag distribution of each topic, the tag distribution of each user perspective, and the probability of each tag being generated from resource topics versus user perspectives.

The rest of this chapter is organized as follows: Section 7.2 presents the proposed TP Model for social annotation and introduces the parameter estimation process. Section 7.3 evaluates the performance of the proposed model and compares it with two other models described in previous research. Section 7.4 discusses possible applications of the proposed model. Section 7.5 concludes this chapter.

7.2 Topic-Perspective Model

This section introduces the proposed Topic-Perspective Model for social annotation, which depicts the social annotation process and the generation of content terms in a unified framework. The motivation behind the model is to represent and connect all observed and hidden variables in a single framework: by estimating the model, we can simultaneously learn the topical structure of documents, terms, and tags, the tagging perspectives of users, and the representation of user perspectives with tags.

7.2.1 Model Formulation

The design of this model reflects the real social tagging process. Before a tag is created by a user for a document, the document terms already exist. Therefore, we first consider the term generation process, which can be modeled with a standard LDA model. Then, when a user annotates a document, two factors act on the tag generation process: the topics of the document and the perspective adopted by the user. Even among tags created by the same user, the extent to which each tag is affected by the user's perspective may differ because, as mentioned, each tag can be created for a different functional purpose. For instance, some tags are created to specify the topics of the resource, while other tags may be created for self-reference, quality evaluation, or opinion expression. Intuitively, tags of the former kind depend more on the topics of the documents, while those of the latter kind are more affected by users' personal perspectives.

Figure 7.1 The Topic-Perspective model for social tagging

Figure 7.2 Graphical representation of (a) CI-LDA and (b) CorrLDA for modeling social annotations

In order to reflect this nature of social annotations in the generative model, we adopt a switch variable to control the influence of user perspectives and document topics on tag generation. The proposed model is illustrated in Figure 7.1. The meanings of notations used in Figure 7.1 are summarized in Table 7.1.

As shown in Figure 7.1, the model comprises two parts, separated by the dashed line. The right part is essentially standard LDA, which models the generation of the content terms contained in the documents: for each word w in a document d, a topic z is first sampled, and then the word w is drawn conditioned on that topic.

The document d is generated by repeating this process N_d times, where N_d is the number of word tokens in d. The left part of Figure 7.1 models the generation of tags. Each tag t created by user u for document d can be drawn either from the topics associated with d's content words or from u's perspectives. To decide the source of each tag, a switch variable x is introduced, whose value (0 or 1) is sampled from a binomial distribution λ with a Beta prior γ. When the sampled value of x equals 1, tag t is drawn from the topic z_t, which is uniformly sampled from the topics assigned to the words of document d; the red arrows in Figure 7.1 show this process. When x equals 0, a perspective p is first sampled from the perspective distribution θ^(u) of user u, and then tag t is drawn from the tag distribution ψ_p of perspective p; the blue arrows in Figure 7.1 illustrate this procedure. Overall, the generation process of words and tags in the Topic-Perspective model is as follows:

1) For each of the D documents d, sample θ^(d)_d ~ Dirichlet(α_d);

2) For each of the U users u, sample θ^(u)_u ~ Dirichlet(α_u);

3) For each of the K topics k, sample φ^(w)_k ~ Dirichlet(β_w) and sample φ^(t)_k ~ Dirichlet(β_t);

4) For each of the L user perspectives l, sample ψ_l ~ Dirichlet(η);

5) For each of the N_d word tokens w_i in document d:

a) sample a topic z_i ~ Multinomial(θ^(d)_d);

b) sample a word w_i ~ Multinomial(φ^(w)_{z_i});

6) For each of the T tags t in the collection D, sample λ_t ~ Beta(γ);

7) For each of the M_d tag tokens t_j in document d created by user u:

a) sample a flag x ~ Binomial(λ_{t_j});

b) if x = 1:

i) sample a topic z̃_j ~ Uniform(z_{w_1}, ..., z_{w_{N_d}});

ii) sample a tag t_j ~ Multinomial(φ^(t)_{z̃_j});

c) if x = 0:

i) sample a perspective p_j ~ Multinomial(θ^(u)_u);

ii) sample a tag t_j ~ Multinomial(ψ_{p_j});
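The generative steps above can be sketched as a toy forward simulation (not inference code). For simplicity the per-tag Bernoulli parameter λ_t is collapsed to a single scalar `lam`, and the helper `sample_dirichlet` only illustrates how the priors in steps 1-4 would be drawn; all names are illustrative:

```python
import random

def sample_dirichlet(rng, alpha, dim):
    """Draw one symmetric-Dirichlet sample (steps 1-4 of the process)."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    s = sum(g)
    return [x / s for x in g]

def generate_document(rng, theta_d, theta_u, phi_w, phi_t, psi, lam,
                      n_words, n_tags):
    """One document under the Topic-Perspective process (steps 5-7):
    words from LDA; each tag drawn either from a uniformly chosen word
    topic (x = 1) or from a user perspective (x = 0). n_words >= 1."""
    K = list(range(len(phi_w)))
    L = list(range(len(psi)))
    word_topics, words = [], []
    for _ in range(n_words):                       # step 5
        z = rng.choices(K, weights=theta_d)[0]
        word_topics.append(z)
        words.append(rng.choices(range(len(phi_w[z])), weights=phi_w[z])[0])
    tags = []
    for _ in range(n_tags):                        # step 7
        if rng.random() < lam:                     # x = 1: topic route
            z = rng.choice(word_topics)            # uniform over word topics
            tags.append(rng.choices(range(len(phi_t[z])), weights=phi_t[z])[0])
        else:                                      # x = 0: perspective route
            p = rng.choices(L, weights=theta_u)[0]
            tags.append(rng.choices(range(len(psi[p])), weights=psi[p])[0])
    return words, tags
```

Setting `lam` to 1.0 or 0.0 forces all tags through the topic route or the perspective route, which makes the two branches of the switch variable easy to inspect.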

7.2.2 Parameter Estimation

The Topic-Perspective model has six parameter sets to estimate: (1) the document-topic distribution θ^(d), (2) the topic-word distribution φ^(w), (3) the topic-tag distribution φ^(t), (4) the user-perspective distribution θ^(u), (5) the perspective-tag distribution ψ, and (6) the binomial distribution λ. Several methods have been developed for estimating the latent parameters of the LDA model, such as variational expectation maximization (Blei and Jordan 2003), expectation propagation (Minka and Lafferty 2002), and Gibbs sampling (Griffiths and Steyvers 2004, Newman et al. 2006). Compared to the first two methods, which are computationally expensive, Gibbs sampling often yields relatively simple algorithms for approximate inference in high-dimensional models such as LDA. Therefore, we select this approach for parameter estimation. In Gibbs sampling, a Markov chain is constructed that converges to the posterior distribution over topic assignments z; the transition between successive states is modeled by repeatedly drawing a topic for each observed word from its conditional probability. For the Topic-Perspective model, an additional chain is introduced during the Gibbs sampling procedure to simulate tag generation. Inspired by the Gibbs sampling equation for the standard LDA model, we derive the sampling equations for our model; the major notations used in the following equations are explained in Table 7.1.

Sampling equation for the word topic variables for each content word w_i (the same as in the standard LDA model):

\[ p(z_i = k \mid w_i = v, \mathbf{z}_{-i}, \mathbf{w}_{-i}) \propto \frac{C^{KD}_{kd,-i} + \alpha_d}{\sum_{k'} C^{KD}_{k'd,-i} + K\alpha_d} \cdot \frac{C^{WK}_{vk,-i} + \beta_w}{\sum_{v'} C^{WK}_{v'k,-i} + V\beta_w} \]

Sampling equation for the tag topic variables when the switch variable x = 1:

\[ p(x_j = 1, \tilde{z}_j = z \mid t_j = q, \tilde{\mathbf{z}}_{-j}, \mathbf{t}_{-j}, \mathbf{w}) \propto \frac{\tilde{n}_{q,-j} + \gamma}{n_q + \tilde{n}_{q,-j} + 2\gamma} \cdot \frac{C^{KD}_{zd}}{N_d} \cdot \frac{C^{TK}_{q\tilde{z},-j} + \beta_t}{\sum_{q'} C^{TK}_{q'\tilde{z},-j} + T\beta_t} \]

Sampling equation for the tag perspective variables when the switch variable x = 0:

\[ p(x_j = 0, p_j = l \mid t_j = q, \mathbf{p}_{-j}, \mathbf{t}_{-j}, \mathbf{u}) \propto \frac{n_{q,-j} + \gamma}{n_{q,-j} + \tilde{n}_q + 2\gamma} \cdot \frac{C^{LU}_{lu,-j} + \alpha_u}{\sum_{l'} C^{LU}_{l'u,-j} + L\alpha_u} \cdot \frac{C^{TL}_{ql,-j} + \eta}{\sum_{q'} C^{TL}_{q'l,-j} + T\eta} \]

After a number of sampling iterations based on the posterior distributions calculated with the above equations, all the parameters can be estimated as follows:

\[ \theta^{(d)}_{kd} = \frac{C^{KD}_{kd} + \alpha_d}{\sum_{k'} C^{KD}_{k'd} + K\alpha_d}, \quad \phi^{(w)}_{vk} = \frac{C^{WK}_{vk} + \beta_w}{\sum_{v'} C^{WK}_{v'k} + V\beta_w}, \quad \phi^{(t)}_{q\tilde{z}} = \frac{C^{TK}_{q\tilde{z}} + \beta_t}{\sum_{q'} C^{TK}_{q'\tilde{z}} + T\beta_t} \]

\[ \theta^{(u)}_{lu} = \frac{C^{LU}_{lu} + \alpha_u}{\sum_{l'} C^{LU}_{l'u} + L\alpha_u}, \quad \psi_{ql} = \frac{C^{TL}_{ql} + \eta}{\sum_{q'} C^{TL}_{q'l} + T\eta}, \quad \lambda_q = \frac{\tilde{n}_q + \gamma}{n_q + \tilde{n}_q + 2\gamma} \]
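After sampling, these point estimates are simple smoothed ratios of the accumulated counts, for example (a sketch with illustrative names):

```python
def estimate_lambda(n_topic, n_persp, gamma):
    """lambda_q: probability that tag q is generated from topics,
    lambda_q = (n~_q + gamma) / (n_q + n~_q + 2*gamma)."""
    return (n_topic + gamma) / (n_topic + n_persp + 2 * gamma)

def estimate_row(counts, prior):
    """Generic Dirichlet-smoothed row estimate, e.g.
    theta_kd = (C_kd + alpha_d) / (sum over k' of C_k'd + K * alpha_d);
    the same form applies to phi, theta^(u), and psi."""
    total = sum(counts) + len(counts) * prior
    return [(c + prior) / total for c in counts]
```

Each estimated row is a proper probability distribution (the smoothed counts are normalized by the smoothed total), which is what makes the estimates usable directly in the perplexity computation of Section 7.3.3.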

Table 7.1 Notations of Parameters

d, u, v, q, k, l : instances of the variables: d for document, u for user, v for word, q for tag, k for topic, l for perspective
D, U, W, T : total numbers of documents, users, words, and tags in the dataset
K, L : the selected numbers of topics and perspectives
N_d, M_d : the numbers of word tokens and tag tokens contained in document d
C^{KD}_{kd,-i} : the number of times topic k has occurred in document d, excluding the current instance
C^{WK}_{vk,-i} : the number of times word v is assigned to topic k, excluding the current instance
C^{TK}_{qz̃,-j} : the number of times tag q is generated from topic k, excluding the current instance
C^{LU}_{lu,-j} : the number of times perspective l is adopted by user u, excluding the current instance
C^{TL}_{ql,-j} : the number of times tag q is generated from perspective l, excluding the current instance
ñ_{q,-j} : the number of times tag q is generated from topics (x_q = 1), excluding the current assignment
n_{q,-j} : the number of times tag q is generated from perspectives (x_q = 0), excluding the current assignment
θ^(d) : a D × K matrix giving the document-topic distribution
φ^(w) : a K × W matrix giving the topic-word distribution
φ^(t) : a K × T matrix giving the topic-tag distribution
θ^(u) : a U × L matrix giving the user-perspective distribution
ψ : an L × T matrix giving the perspective-tag distribution
λ : a vector giving the probability that each tag is generated from topics
α_d, α_u, β_w, β_t, η, γ : hyperparameters (priors of the Dirichlet distributions)

7.3 Experiments

7.3.1 Topic Models for Comparison

In this section, we investigate the performance of the proposed Topic-Perspective LDA (TP-LDA) model on a social bookmarking dataset crawled from del.icio.us. We also compare our model with two other LDA-based generative models for social annotation: the CI-LDA model and the CorrLDA model.

The CI-LDA model has been used to model the generation of words and entities (Newman et al. 2006) and of words and document links (Erosheva et al. 2004). It was adapted by Ramage et al. (2009) to jointly model the generation of words and tags for clustering purposes. Figure 7.2(a) shows its graphical representation. In the CI-LDA model, a tag is generated from the same source as a word, namely the topic of the document; users' impact on the generation of tags is not considered.

The correspondence LDA (CorrLDA) model was originally proposed by Blei and Jordan (2003) and was applied to model the social tagging process by Bundschus et al. (2009). The model is graphically represented in Figure 7.2(b). CorrLDA first generates word topics for a document; the topics associated with the words in the document are then used to generate the tags. Compared to CI-LDA, the CorrLDA model enforces a greater degree of correspondence between the two information sources (here, words and tags). But like CI-LDA, user information is missing from its tag generation process.

The author chooses these two models for comparison because, like the proposed Topic-Perspective model, they perform topical analysis of words and tags simultaneously. In fact, the Topic-Perspective model is built on CorrLDA, extending it by incorporating user factors into the tag generation process. For details of the Gibbs sampling process and equations for these two models, readers can refer to Newman et al. (2006), where they are used to model the topics of words and entities in news articles.

7.3.2 Datasets

The dataset used in this experiment is a social tagging dataset collected from the Delicious website. For each web page, the full textual content was also crawled and extracted from the Internet. Web pages with no tags or containing fewer than 20 words were filtered out. The final dataset contains 41,190 documents, 4,414 users, 28,740 unique tags, and 129,908 unique words.

7.3.3 Evaluation Criterion

The perplexity is used as the criterion for model evaluation. Perplexity is a standard measure for evaluating the generalization performance of a probabilistic model. The value of perplexity reflects the ability of a model to predict unseen data. Specifically, in our case, perplexity reflects the ability of a model to predict tags for new unseen documents. The perplexity is monotonically decreasing in the likelihood of the test data, and is algebraically equivalent to the inverse of the geometric mean of per-word (per-tag in our case) likelihood (Blei et al. 2003). A lower perplexity score indicates better generalization performance. Formally, the perplexity for a test set of Dtest documents is calculated as follows: 130 D  test log( p(t )) d1 d perplexity (Dtest )  exp{ D } test M d1 d

$$p(\mathbf{t}_d) = \prod_{t_d \in d}\left[p(x=1)\sum_{k=1}^{K} p(t_d \mid z_k)\, p_{test}(z_k \mid d) + p(x=0)\sum_{l=1}^{L} p(t_d \mid p_l)\, p_{test}(p_l \mid u)\right]$$

In the above equation, $t_d$ is a tag in the test document $d$. The probabilities $p(t_d \mid z_k)$, $p(t_d \mid p_l)$, and $p(x)$ are learned during training, while $p_{test}(z_k \mid d)$ and $p_{test}(p_l \mid u)$ are estimated through a Gibbs sampling process on the test data, based on the parameters $\phi^{(w)}$, $\phi^{(t)}$, $\psi$, and $\lambda$ learned from the training data.
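The perplexity computation itself reduces to a small calculation once the per-tag probabilities have been obtained from the model. The following sketch assumes a hypothetical input structure (one list of model probabilities per test document's tags) rather than the thesis's actual evaluation code:

```python
import math

def tag_perplexity(docs):
    """Perplexity over a test set, following
    perplexity = exp(- sum_d log p(t_d) / sum_d M_d),
    where each document contributes the log-likelihood of its M_d tags.

    `docs` is a hypothetical structure: a list where each entry holds the
    model probabilities p(t | d) for that document's tags.
    """
    total_log_lik = 0.0
    total_tags = 0
    for tag_probs in docs:
        total_log_lik += sum(math.log(p) for p in tag_probs)
        total_tags += len(tag_probs)
    return math.exp(-total_log_lik / total_tags)
```

A sanity check of the formula: if every tag has probability 1/10, the perplexity is exactly 10.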

7.3.4 Experimental Setup

The Topic-Perspective model has six Dirichlet prior parameters. The author tested a series of values for each parameter and found that their effect on the perplexity value is small. Previous research also found that these parameters affect the convergence of Gibbs sampling but have little effect on the output results (Ramage et al. 2009, Zhou et al. 2008). So the author set $\alpha_d = 0.3$, $\alpha_u = 0.3$, $\beta_w = 0.05$, $\beta_t = 0.05$, $\eta = 0.05$, $\gamma = 0.5$ for all experiments.

The remaining question is how to select the number of topics K and the number of perspectives L. The author first fixed the number of perspectives, and then tested the perplexity of the trained model on the test data for different topic numbers. The smallest topic number that leads to the minimum or near-minimum perplexity is selected. After the topic number is chosen, the perspective number is selected similarly based on perplexity. Figure 7.3 plots the perplexities for five settings of K when the perspective number is fixed at 80. In general, the perplexity scores for all topic number settings decrease over the iterations, and the algorithm tends to converge after about 40 iterations. A larger topic number leads to a smaller perplexity value from the start, indicating better prediction performance, but the effect of increasing the topic number on perplexity diminishes as the topic number grows. When the topic number is set to 160, the perplexity value actually goes up. Therefore, the topic number K is set to 80, which leads to the minimum perplexity among the five settings.

Figure 7.3 The perplexities over the iterations for five settings of topic number when perspective number L=80

Figure 7.4 The perplexities over the iterations for five settings of perspective number when topic number K=80

The situation for selecting the perspective number is similar. Figure 7.4 displays the perplexities for five settings of perspective number L when the topic number is set to 80. Again, the perspective number L=80 leads to the minimum perplexity score, and when L increases to 160, the perplexity value goes up sharply. So for the final experiment, both the topic number and the perspective number are set to 80.
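The two-stage selection procedure described above can be sketched as follows. Here `eval_fn` stands in for the train-and-evaluate step, and the helper name and the "near-minimum" tolerance are assumptions of this sketch, not part of the thesis's code:

```python
def pick_by_perplexity(candidates, eval_fn, tolerance=0.01):
    """Return the smallest candidate whose perplexity is within `tolerance`
    (relative) of the minimum -- i.e. the 'minimum or near-minimum' rule.
    `eval_fn(c)` is a placeholder that trains the model with that setting
    and returns the test-set perplexity.
    """
    scores = {c: eval_fn(c) for c in candidates}
    best = min(scores.values())
    return min(c for c, s in scores.items() if s <= best * (1 + tolerance))

# Two-stage selection, mirroring the procedure in the text:
# 1) fix L = 80 and sweep K; 2) fix the chosen K and sweep L.
# `perplexity_for` is a hypothetical wrapper around model training:
# K = pick_by_perplexity([10, 20, 40, 80, 160], lambda k: perplexity_for(k, L=80))
# L = pick_by_perplexity([10, 20, 40, 80, 160], lambda l: perplexity_for(K, l))
```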

7.4 Results

7.4.1 Tag Perplexity

The author compares the tag prediction ability of the Topic-Perspective model with those of the CorrLDA and CI-LDA models based on the perplexity value. Figure 7.5 plots the perplexity results for each model over different topic numbers. The perspective number of TP-LDA was set to 80, and the iteration number for all three models was set to 80.

We can see that, up to K=80, TP-LDA consistently performs better than the other two models, especially when the topic number is small. For all three models, a larger topic number generally leads to smaller perplexity scores, because the increased topic number reduces the uncertainty in training. However, the effect of the topic number on the three models' performance differs, and the TP-LDA model is least affected by it. In particular, when the topic number increases to 160, its perplexity value goes up. This is because TP-LDA incorporates the users' perspective information into the tag generation process, so the predicted tags do not depend entirely on document topics.

From Figure 7.5, we can also see that CorrLDA performs worse than CI-LDA. This is because CI-LDA uses both content words and tags to learn document topics, whereas CorrLDA learns topics from content words only. Recall that, in CorrLDA, only the topics learned from content words are used to generate tags. Therefore, in the CI-LDA model, the topics learned from the training data are more closely associated with tags and thus are more effective for tag prediction. This result further indicates the difference between words and tags in topical structure. Although CI-LDA generates better results, experimental results show that CI-LDA's word topics and tag topics are too decoupled: little correspondence can be found between the words and tags generated to represent the same topic. The TP model overcomes this limitation without sacrificing performance.

Figure 7.5 The perplexity results of CorrLDA, CI-LDA and TP-LDA (Topic- Perspective Model) for topic number K=10, 20, 40, 80, 160

7.4.2 Discovered Topics and Perspectives

The topics and perspectives discovered by the Topic-Perspective model can be analyzed by examining the top words and tags assigned to each topic and the top tags assigned to each perspective.

Table 7.2 A subset of discovered topics

Topic ID Top words Top tags
7 war world militaries nation state force politics history world international war military govern unite iraq countries international activism poverty information africa islam israel american armies peace government middleeast europe humanright
13 mountain fish camp boat adventure sea travel camp backpack hike photography climb river park trail climb ski new lake gear sail photo knot boat gear nature adventure ski sail kayak
15 movie film star video dvd man episode movy video entertainment film music review new release trailer love review girl fan movie humor television medium fun funny season cinema stream comicstrip
16 religion bible christianity christian church church god christian beer bible jesus history buddhism mythology atheism theology religion faith new catholic christ life spirituality philosophy apologetics catholic religion holi john culture
20 window linux software file software window linux mac freeware osx install mac computer user server program utility ubuntu apple download sysadmin your run desktop disk virtualization security opensource
25 law legal copyright inform public patent law copyright legal government internet right state court govern lawyer act protect privacy patent technology security politics file agency research right p2p tech plagiarism
28 recipe food cook cup coffee chocolate food recipe cook howto health drink coffee cake eat tea cheese bake add bread make nutrition vegetarian collection restaurant tea water bake kitchen diet
46 university student school education studies education teach learn school research science college science teacher program teach elearn university resource academic college kid course institution graduate academ study math lessonplan department
49 book write author stories publish writer book write ebook literature comicstrip read read fiction comic chapter novel poetries publish library comic poetry tutorial webcomic edit amazon poem selfpublish author scifi fiction

Despite the lack of quantitative assessment, we observe generally high semantic correlation among the top words and tags for each topic, and high correspondence between the words and tags for the same topic. The themes of the discovered topics are diverse and mostly related to popular subjects, such as web design, programming, traveling, shopping, education, and politics. Table 7.2 displays the top ten words and tags of a random subset of discovered topics. Because of the coherent semantics of the words and tags for each topic, the theme of each topic is obvious. For instance, Topic 7 is about war and politics, Topic 13 is about outdoor activities, Topic 15 is associated with movies, and so on.

Table 7.3 A subset of discovered perspectives

Perspective ID Top tags
11 reference guide multimedia list help codec portal emulator comparison boot virtual organize proxy competition switch event blogroll
24 likeddesign inspire wysiwyg creative artistresource ria tagthese domainname affiliate editorial cooky conference metadata openacce
32 tag association folksonomy censorship preservation digitallibrary sheetmusic librarian secondlife rfid directory digitalgame
36 search link portal directory list system indie tag rock about customize current label ezine kaizen synchronization
47 bookmark readlate quickd engl401 ircbot meetup shirt favoritesmenu emergent nikon twincity tattoo punk oreilly jobsite simplicity
51 good publication product stuff thesis awesome mypublication gobacktothis florida giztag sidebar nicedesign travelinfo longdistancetrip sourdough myprofile
52 developer article example onlinekit issuetrack flstudio aggregation swiftteam webbuild backend asus wtf guideline communication swiftmobile
58 compute archive multimedia app macintosh directory communication datum codec freedom admin cheat classic
64 mystuff list toread todo totry todownload webdevelope mind tobrowse tobuy tocheck conference landscape frequentlyuse epge usefulsoftware intelligence

The Topic-Perspective model also discovers user perspectives from the tags. The perspectives are more complicated than topics: the correlation among the tags assigned to each perspective is not as obvious as that for topics. This is because, unlike a topic, a user perspective does not reflect a single pure aspect of tags; each perspective may combine several user factors of social tagging, such as the user's domain background, preference, interest, and motivation. Despite the complexity of perspectives, we can still identify some patterns by examining the tags assigned to each perspective. Table 7.3 lists a subset of discovered perspectives and their top tags. We can see that the tags assigned to perspectives are very different from those assigned to topics. Looking back at Bischoff's classification of tags in Table 2.1, it is apparent that the tags assigned to perspectives generally belong to categories other than Topic. For instance, the tags for Perspective 11 mostly describe document type, the tags in Perspective 51 are used for opinion expression and self-reference, and the tags in Perspective 64 are used for task organization and self-reference.

7.4.3 The Generation Sources of Tags

The Topic-Perspective model uses an additional variable $\lambda$ ($0 \le \lambda \le 1$) to record the probability that each tag is generated from topics or from user perspectives. A greater value of $\lambda$ indicates a higher probability that the tag is generated from document topics, and vice versa. Table 7.4 lists some example tags with $\lambda$ = 1, 0.5, and 0.

Table 7.4 Example tags with three different values of λ

λ=1 λ=0.5 λ=0 palmpre audiomagazine mathsware library shop internet tag app interest archive postapocalyptic educause vomit research socialnetwork toread datum code todo nwiqpartn singlespe masterproef statistic ruby ajax webservice directory list richmullin sundial selenium showstep javascript webdev culture guide link portal training webmath randynewman immortalism music health graphic math site track article malazan architecturalproduct security cs politics reference web20 online fotologserevista biblioteque caribbean recipe photography search tool free cool europeana

The tags with λ=1 are generated entirely from topics and are not affected by users' perspectives. We can see that these tags clearly and objectively reflect the topics of the annotated documents. Conversely, the tags with λ=0 are generated entirely from users' perspectives. These tags contribute little to reflecting the topics of the annotated documents; they are created by users for purposes other than identifying topics. An interesting observation concerns the tags with λ=0.5, which are equally influenced by document topics and user perspectives. From Table 7.4, we can see that these tags are mostly terms invented by users which cannot be found in dictionaries. Most of them are phrases with no space between the words, such as "audiomagazine". Unlike tags with λ=0, these tags are actually related to the document topics; in other words, they are created to describe the topics in a personal way.
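A per-tag λ of this kind can be read off from the sampling assignments of the switch variable x. The following is a hypothetical post-processing sketch over saved Gibbs states; the (tag, x) pair format and the example tags are assumptions for illustration:

```python
from collections import defaultdict

def estimate_lambda(assignments):
    """Estimate, for each tag, the probability that it was generated from a
    document topic (x = 1) rather than a user perspective (x = 0).

    `assignments` is a hypothetical list of (tag, x) pairs collected over
    saved Gibbs sampling states; lambda_t = #(x=1 for t) / #(occurrences of t).
    """
    topic_counts = defaultdict(int)
    totals = defaultdict(int)
    for tag, x in assignments:
        totals[tag] += 1
        topic_counts[tag] += x
    return {tag: topic_counts[tag] / totals[tag] for tag in totals}

samples = [("python", 1), ("python", 1), ("toread", 0), ("mathsware", 1), ("mathsware", 0)]
lam = estimate_lambda(samples)
# In this toy run, "python" behaves like a topical tag, "toread" like a purely
# personal one, and "mathsware" is influenced by both.
```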

7.5 Future Applications

The results of the proposed model can be applied to tag prediction. Given a new document, based on the parameters estimated by the model, tags can be predicted in two ways. For general tag prediction, we consider only the tags with high topic probability and filter out tags with high perspective probability. The likelihood of a tag t for a test document d is:

$$p(t \mid d) = \sum_{k=1}^{K} p(t \mid z_k)\, p(z_k \mid d)$$

where $p(t \mid z_k)$ is given by the topic-tag distribution, and $p(z_k \mid d)$ is estimated online based on the parameters learned from the training process. For personalized tag prediction, namely if we want to predict the tags that a specific user u, who also appears in the training dataset, would create for a document, we can calculate the likelihood of a tag t for a test document d as follows:

$$p(t \mid d, u) = \lambda_t \sum_{k=1}^{K} p(t \mid z_k)\, p(z_k \mid d) + (1 - \lambda_t) \sum_{l=1}^{L} p(t \mid p_l)\, p(p_l \mid u)$$

where $p(t \mid z_k)$, $p(t \mid p_l)$, $p(p_l \mid u)$, and $\lambda$ are learned from the training process, while $p(z_k \mid d)$ is estimated online.
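Both prediction rules follow directly from the equations above and can be sketched in a few lines. The parameter containers below (lists of per-topic and per-perspective tag distributions, with toy values) are hypothetical stand-ins for the learned model parameters:

```python
def general_tag_score(t, p_t_given_z, p_z_given_d):
    """p(t|d) = sum_k p(t|z_k) p(z_k|d) -- general (topic-only) prediction."""
    return sum(p_t_given_z[k][t] * p_z_given_d[k]
               for k in range(len(p_z_given_d)))

def personalized_tag_score(t, lam_t, p_t_given_z, p_z_given_d,
                           p_t_given_p, p_p_given_u):
    """p(t|d,u) = lam_t * sum_k p(t|z_k) p(z_k|d)
                + (1 - lam_t) * sum_l p(t|p_l) p(p_l|u)."""
    topic_part = sum(p_t_given_z[k][t] * p_z_given_d[k]
                     for k in range(len(p_z_given_d)))
    persp_part = sum(p_t_given_p[l][t] * p_p_given_u[l]
                     for l in range(len(p_p_given_u)))
    return lam_t * topic_part + (1 - lam_t) * persp_part

# Hypothetical toy parameters (K=2 topics, L=1 perspective, one tag "python"):
p_t_given_z = [{"python": 0.6}, {"python": 0.1}]
p_z_given_d = [0.5, 0.5]
p_t_given_p = [{"python": 0.2}]
p_p_given_u = [1.0]
```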

The parameters learned from the model can also be used to enhance the performance of information retrieval (IR). For general information retrieval, we can enhance the IR language model with the tags of high topical probability and the topical structures of the words and tags. For personalized information retrieval, we can further expand the IR language model with tags of high perspective probability and the users' perspectives.
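One simple way to realize such an expansion is linear interpolation of a tag-based model into the document language model. The sketch below is illustrative only: the mixing weight mu, the function name, and the interpolation scheme are assumptions of this sketch, not the thesis's formulation:

```python
def expanded_doc_lm(p_w_content, p_w_tags, mu=0.3):
    """Interpolate a content language model with a tag-based model:
    p(w|d) = (1 - mu) * p_content(w|d) + mu * p_tags(w|d).

    Both inputs are dicts mapping terms to probabilities; mu is a
    hypothetical mixing weight controlling the influence of tags.
    """
    vocab = set(p_w_content) | set(p_w_tags)
    return {w: (1 - mu) * p_w_content.get(w, 0.0)
               + mu * p_w_tags.get(w, 0.0)
            for w in vocab}

lm = expanded_doc_lm({"python": 1.0}, {"toread": 1.0}, mu=0.3)
```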

7.6 Conclusion

This chapter proposes a Topic-Perspective LDA model to simulate the tag generation process. By modeling the tag generation and word generation processes separately and incorporating user information into the tag generation process, the proposed model is able to model the social annotation system in a more meaningful way and achieve better generalization performance than other models. Besides, this model also generates useful information about the topical structures of tags and words, as well as the influence of document topics and user perspectives on different tags. The results derived from this model can be utilized for automatic tag recommendation, personalized web search, and other web mining applications.

7.7 Research Question Tested

Question 4: How to build a topic model for the tagged web?

This chapter proposes a novel probabilistic generative model, called the Topic-Perspective Model, to model the tagged web. The Topic-Perspective model reflects the real social tagging process and represents all related entities (users, documents, words, and tags) and latent variables (topics, user perspectives) in a unified model.

CHAPTER 8: CONCLUSIONS

The research of this thesis is dedicated to investigating the role that social tagging data can play in enhancing the performance of web mining and search applications. The research is inspired by the observed properties of social tagging that make it potentially valuable for web mining and search applications. These properties include its indexing efficiency, its adaptability to emerging concepts and vocabularies, its consistency between indexers and searchers, and its rich information about users. At the same time, the author also realizes the factors that make it challenging to utilize social tagging data for web mining and search. These factors include the quality issue of social tagging stemming from its uncontrolled nature, the prevalence of personal and subjective tags, as well as the tripartite relation embedded in the social tagging network. The research of this thesis revolves around exploiting the valuable properties of social tagging for web mining and search and overcoming the challenges during the research process.

8.1 Social Tags as Index Terms

One of the research goals of this thesis is to investigate whether social tags are effective document features that can be used to represent and index web documents in web mining and search applications. To attain this goal, the author compares social tags with other types of index terms, including expert-created subject terms, author-created keywords and descriptions, and the content words of web pages. The key findings are summarized as follows.

First, it turns out that user tags can cover almost half of the subject terms created by experts, indicating that users are capable of creating some high-quality index terms. But users' tag vocabulary is much more diverse than expert-created subject terms, and most user tags cannot be found among the subject terms. The tags that do not overlap with subject terms represent the divergence between the terms used by users and by experts when describing web documents. This indicates that user tags could be used to expand the access points of resources already cataloged and indexed by experts.

Compared with expert-created subject terms, author-provided keywords seem closer to user tags in nature. In our study, about one third of the tags are covered by author keywords. Although user tags are very diverse compared with expert-created subject terms, they show higher convergence than author-generated keywords: for the same collection of web documents, the number of unique keywords is about five times the number of unique tags. This may be due to the stable pattern that has been identified in social tagging (see Section 2.1.2); namely, after a web page has been tagged by a certain number of users, a consensus forms about the terms used to describe it. This property makes social tags more manageable and favorable than author keywords as index terms.

Social tags and author keywords are also compared with content words. It turns out that both can add new information to existing page content. The value of the additional information provided by keywords and social tags is further demonstrated through a clustering study, which shows that both tags and author keywords can be used to significantly improve content-based clustering. Besides, tags are more effective than author-provided metadata when used as document features independent of document content. Although only the clustering method is used for experimentation, the conclusions can be extended to other web mining and search applications.

8.2 Exploiting the Tripartite Network of Social Tagging

A social tagging system can be modeled with a tripartite network of users, documents and tags. The tripartite network provides rich information to learn document topics, user interests and tag semantics. However, how to effectively model and utilize the tripartite network is challenging. In this thesis, the author proposes two approaches to model the tripartite network of social tagging.

The first approach represents the social tagging network as a tripartite graph, with three types of vertices (user, document, tag) and three types of undirected binary edges (user-document, user-tag, and document-tag) (see equation (5.1)). Based on this model, the author proposes a novel clustering method called Tripartite Clustering, which clusters documents, users, and tags simultaneously based on the three types of binary linkages in the graph. The Tripartite Clustering algorithm significantly outperforms content-based K-means. The author also compares the Tripartite Clustering method with an enhanced K-means algorithm which incorporates tags and users as document features for clustering, and a Link K-means algorithm which utilizes both content words and link information extracted from the social tagging network for document clustering. Experimental results show that the Tripartite Clustering method significantly outperforms Link K-means and achieves comparable results to the enhanced K-means approach.

In the second modeling approach, the tripartite social tagging network is represented with a 3-uniform hyper-graph which uses a hyper-edge to represent the ternary relation among users, documents, and tags (see equation (6.1)). In this way, the model avoids projecting the tripartite relation into three binary relations, and thus provides a more integral representation of the tripartite social tagging network. Based on this hyper-graph model, the author proposes a personalized search framework (TripleQE) which fully relies on the tripartite relationship among users, tags, and web documents for personalized search. Experiments demonstrate that the proposed personalized search framework is more effective than traditional language-model-based search methods. Moreover, when combined with language-model-based search methods, the proposed framework produces better search results.
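The difference between the two representations can be illustrated with a small sketch (the entity names are invented for illustration): projecting (user, document, tag) hyper-edges onto the three binary edge sets discards information, which is exactly what the hyper-graph model avoids.

```python
def project_to_binary_edges(annotations):
    """Project (user, doc, tag) hyper-edges -- the 3-uniform hyper-graph view --
    onto the three binary edge sets of the tripartite-graph view.
    The projection is lossy: distinct triple sets can yield identical projections.
    """
    user_doc = {(u, d) for u, d, t in annotations}
    user_tag = {(u, t) for u, d, t in annotations}
    doc_tag = {(d, t) for u, d, t in annotations}
    return user_doc, user_tag, doc_tag

# Two different annotation sets with identical binary projections: the
# projection loses which (user, doc, tag) combinations actually co-occurred.
A = {("u1", "d1", "t1"), ("u1", "d2", "t2"), ("u2", "d1", "t2"), ("u2", "d2", "t1")}
B = {("u1", "d1", "t2"), ("u1", "d2", "t1"), ("u2", "d1", "t1"), ("u2", "d2", "t2")}
```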

8.3 Topic Models for Social Tagging System

In the text mining and information retrieval community, topic modeling is becoming popular as an approach for discovering topical structures in documents. It is based on probabilistic generative models that simulate the generation of document content. In this thesis, the author applies topic models to simulate the generation of social tags and accordingly discover the topical structures of documents and users' tagging perspectives. Based on the nature of the social tagging process, the author proposes a Topic-Perspective LDA model which models the generation of tags and words in a unified model and incorporates the user variable into the tag generation process.

Experimental analysis shows that this comprehensive model has better generative ability than topic models proposed in the existing literature. Besides, this model also generates more useful information about document topics, user perspectives, and the impact of document topics and user perspectives on tag generation, which is useful for distinguishing topical tags from personal and subjective tags.

8.4 Contributions

Firstly, this thesis sheds light on the value and obstacles of using social tags to represent, index, organize and retrieve web documents. The empirical studies conducted by the author provide some tangible evidence for those who are concerned with the feasibility and effectiveness of social tagging as an indexing approach.

Secondly, the author developed several approaches to demonstrate how social tags can be exploited for web mining applications, such as web clustering. The proposed Tripartite Clustering method shows how the tripartite network of social tagging can be utilized to enhance clustering performance without using document content information. It also exemplifies a new clustering approach which clusters more than one type of object at once based on link information.

Thirdly, the author proposed a novel personalized search framework based on a hyper-graph model of the social tagging system. The proposed framework demonstrates an efficient and effective approach to utilizing the user information learned from the social tagging system for personalized search.

Finally, the author designed an advanced topic model for the social tagging system.

This model, called the Topic-Perspective model, not only reflects the real tag generation process, but also integrates the different variables involved in the social tagging process into a unified model. It reflects the interactions among different variables in a social tagging system. It can be used to identify the topical structures of tagged documents and the perspectives of taggers. Moreover, it can be used to estimate whether the generation of a tag is influenced more by document topics or by taggers' personal factors.

LIST OF REFERENCES

Al-Khalifa, H.S. and Davis H.C., “Measuring the semantic value of folksonomies” in the Second International IEEE Conference on Innovations in Information Technology. 2006. Dubai, UAE.

Agichtein E., Brill E., Dumais S., and Ragno R., “Learning user interaction models for predicting web search result preferences” in Proceedings of SIGIR, 2006, pp. 3–10.

Andrews N.O. and Fox E.A., “Recent developments in document clustering” in Tech. Rep. TR-07-35, Dept. of CS, Virginia Tech, 2007.

Angelova R. and Siersdorfer S., “A neighborhood-based approach for clustering of linked document collections” in Proc. 15th ACM Int. Conf. Information and Knowledge Management, New York, NY, USA, 2006, pp. 778-779.

Angelova R. and Weikum G., “Graph-based text classification: learn from your neighbors” in Proc. 29th Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, New York, NY, USA, 2006, pp. 485 – 492.

Banerjee S., Ramanathan K., and Gupta A., “Clustering short texts using Wikipedia” in Proc. 30th Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Amsterdam, The Netherlands, 2007, pp. 787-788.

Bao S., Wu X., Fei B., Xue G., Su Z., and Yu Y., “Optimizing web search using social annotations” in Proc. 16th Int. Conf. World Wide Web, Banff, Alberta, Canada, 2007, pp. 501-510.

Begelman G., Keller P., and Smadja F., “Automated tag clustering: improving search and exploration in the tag space” in Proc. Collaborative Web Tagging Workshop at WWW 2006, Edinburgh, UK, 2006.

Bender M., Crecelius T., Kacimi M., Michel S., Neumann T., Parreira J. X., Schenkel R., and Weikum G., “Exploiting social relations for query expansion and result ranking” in Proceedings of ICDE Workshops, 2008, pp. 501-506.

Bischoff K., Firan C. S., Nejdl W., and Paiu R., “Can all tags be used for search?” in Proc. 17th ACM Int. Conf. Information and Knowledge Management, Napa Valley, California, USA, 2008, pp. 203-212.

Blei D.M. and Jordan M.I., “Modeling annotated data” in The 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Toronto, Canada, 2003, pp. 127-134.

Blei D.M., Ng A.Y., and Jordan M.I., “Latent Dirichlet Allocation” in Journal of Machine Learning Research, 2003 (3), pp. 993-1022.

Bundschus M., Yu S., Tresp V., Rettinger A., Dejori M., and Kriegel H.-P., “Hierarchical Bayesian Models for Collaborative Tagging Systems” in ICDM '09. Ninth IEEE International Conference on Data Mining., IEEE, Miami, Florida, 2009, pp. 728-733.

Cai Y. and Li Q., “Personalized search by tag-based user profile and resource profile in collaborative tagging systems” in Proceedings of CIKM’10, Toronto, Canada, 2010, pp. 969-978.

Carmel D., Zwerdling N., Guy I., Ofek-Koifman S., Har’el N., Ronen I., Uziel E., Yogev S., and Chernov S., “Personalized social search based on the user’s social network” in Proceeding of CIKM, New York, NY, USA, 2009, pp. 1227–1236.

Carvalho F. and Lechevallier Y., “Dynamic clustering of interval-valued data based on adaptive quadratic distances” in IEEE Trans. Syst.,Man, Cybern. A, Syst., Humans, 39(6), pp. 1295-1306, Nov. 2009.

Chen X., Lu C., An Y., and Achananuparp P., “Probabilistic models for topic learning from images and captions in online biomedical literatures” in the 18th ACM conference on Information and knowledge management, ACM, Hong Kong, China 2009, pp. 495- 504.

Cheng T.H. and Wei C.P., “A clustering-based approach for integrating document- category hierarchies” in IEEE Trans. Syst.,Man, Cybern. A, Syst., Humans, 38(2): pp. 410-424, Mar. 2008.

Chirita P.A., et al., “P-TAG: Large scale automatic generation of personalized annotation tags for the web” in WWW 2007. Banff, Alberta, Canada, 2007. pp. 845-854.

Chirita P.A., Nejdl W., Paiu R., and Kohlschütter C., “Using ODP metadata to personalize search” in Proceedings of SIGIR’05, Salvador, Brazil, 2005, pp. 178-185.

Chirita P. A., Firan C. S., et al., “Personalized query expansion for the web” in Proceedings of SIGIR’07, Amsterdam, The Netherlands, 2007.

Das S., Abraham A., and Konar A., “Automatic clustering using an improved differential evolution algorithm” in IEEE Trans. Syst.,Man, Cybern. A, Syst., Humans, vol. 38, no. 1, pp. 218-237, Jan. 2008.

Deerwester S., Dumais S., Landauer T., Furnas G., and Harshman R., “Indexing by latent semantic analysis” in Journal of the American Society of Information Science, 41(6): pp. 391–407, 1990.

Erosheva E., Fienberg S., and Lafferty J., “Mixed-membership models of scientific publications” in Proceedings of the National Academy of Sciences 101 (2004), pp.5220- 5227.

Furner, J., “User tagging of library resources: Toward a framework for system evaluation” in World Library and Information Congress: 73RD IFLA General Conference AND Council. 2007: Durban, South Africa.

Giannakidou E., Koutsonikola V., Vakali A., Kompatsiaris Y., “Co-Clustering Tags and Social Data Sources” in WAIM '08. The Ninth International Conference on Web-Age Information Management, Zhangjiajie, Hunan, China, 2008, pp. 317-324.

Golder, S. and Huberman B.A., “Usage Patterns of Collaborative Tagging Systems” in Journal of Information Science, 2006. 32(2): p. 198-208.

Greenberg, J., et al., “Author-generated Dublin Core Metadata for Web Resources: A Baseline Study in an Organization” in Journal of Digital Information, 2001. 2(2): p. 1-10.

Griffiths T.L., and Steyvers M., “Finding scientific topics” in Proceedings of National Academy of Sciences of the United States of America 101 (2004) 5228-5235.

Guy, M., Powell A., and Day M., “Improving the Quality of Metadata” in Eprint Archives. Ariadne 2004. 38 (January).

Halpin, H., Robu V., and Shepherd H., “The Complex Dynamics of Collaborative Tagging” in WWW 2007. Banff, Alberta, Canada: ACM.

Harvey M., Ruthven I., and Carman M., “Improving social bookmark search using personalized latent variable language models” in Proceedings of WSDM’11, Hong Kong, China, 2011, pp. 485-494.

He X., Zha H., Ding C., and Simon H., “Web document clustering using hyperlink structures”, Tech. Rep. CSE-01-006, Dept. of CS and Eng., Pennsylvania State University, 2001.

Heymann, P., Koutrika G., and Garcia-Molina H., “Can Social Bookmarking Improve Web Search?” in WSDM’08. 2008: Palo Alto, California, USA.

Heymann P., Ramage D., and Garcia-Molina H., “Social Tag Prediction” in SIGIR’08, Singapore, 2008, pp. 531-538.

Hofmann T., “Probabilistic Latent Semantic Analysis” in 15th Conference on Uncertainty in Artificial Intelligence, UAI’99 Morgan Kaufmann, Stockholm, Sweden, 1999.

Hofmann T., “Probabilistic latent semantic indexing” in Proceedings of the Twenty- Second Annual International SIGIR Conference, 1999.

Hotho A., Jaschke R., Schmitz C., and Stumme G., “Information Retrieval in Folksonomies: Search and Ranking” in Proceeding of 3rd European Semantic Web Conference, 2006.

Hotho A., Maedche A., and Staab S., “Text clustering based on good aggregations” in Proc.IEEE Int. Conf. Data Mining. San Jose, CA, 2001, pp. 607-608.

Hu J., Fang L., and Cao Y., “Enhancing text clustering by leveraging Wikipedia semantics” in Proc.31st Annual Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Singapore, 2008, pp. 179-186.

Hu X., Zhang X., Lu C., Park E. K., and Zhou X., “Exploiting Wikipedia as external knowledge for document clustering” in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data mining, Paris, France, 2009, pp. 389-396.

Hughes, B., “Metadata quality evaluation: Experience from the Open Language Archives Community” in Digital Libraries: International Collaboration and Cross-Fertilization. 2004. p. 320-329.

Jain A. K., Murty M. N., and Flynn P. J., “Data clustering: A review” in ACM Comput. Surv., 31(3), pp. 264–323, Sep. 1999.

Jaschke, R., et al., “Tag Recommendations in Folksonomies” in Knowledge Discovery in Databases: PKDD 2007, pp. 506-514.

Smith-Yoshimura K. and Shein C., “Social Metadata for Libraries, Archives and Museums Part 1: Site Reviews”, OCLC Research, September 2011.

Kashoob S., Caverlee J., and Ding Y., “A Categorical Model for Discovering Latent Structure in Social Annotations” in the 3rd International AAAI Conference on Weblogs and Social Media, San Jose, CA, 2009.

Kleinberg J.M., “Authoritative sources in a hyperlinked environment” in Annual ACM-SIAM Symposium on Discrete Algorithms, New York, NY, USA, 1998.

Lambiotte R. and Ausloos M., “Collaborative tagging as a tripartite network” arXiv:cs/0512090v2, 2005, DOI: 10.1007/11758532_152.

Larsen, B. and Aone C., “Fast and effective text mining using linear-time document clustering” in Conference on Knowledge Discovery in Data. 1999. San Diego, California: ACM.

Leung Y., Zhang J., and Xu Z., “Clustering by scale-space filtering” in IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1396–1410, Dec. 2000.

Li, X., Guo L., and Zhao Y., “Tag-based Social Interest Discovery” in WWW 2008. Beijing, China, 2008. pp. 675-684.

Liu Y., Niculescu-Mizil A., and Gryc W., “Topic-link LDA: joint models of topic and author community” in the 26th Annual International Conference on Machine Learning, ACM, Montreal, Quebec, Canada, 2009 pp. 665-672.

Long B., Wu X., Zhang Z., and Yu P.S., “Unsupervised learning on k-partite graphs” in Proc. 12th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 2006, pp. 317 - 326.

Ma Z., Pant G., and Sheng O. R. L., “Interest-based personalized search” in ACM Trans. Inf. Syst., 25(1):5, 2007.

MacKay D., “Chapter 20. An example inference task: clustering” in Information Theory, Inference and Learning Algorithms, Cambridge University Press: New York, 2003, pp. 284–292.

Markines B., Cattuto C., Menczer F., Benz D., Hotho A., and Stumme G., “Evaluating similarity measures for emergent semantics of social tagging” in Proc. 18th Int. Conf. on World Wide Web, Madrid, Spain, 2009, pp. 641-650.

Mathes A., “Folksonomies – cooperative classification and communication through shared metadata” in Computer Mediated Communication, 2004 [cited 2008 Apr. 20]; Available from: http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.

Maulik U., Bandyopadhyay S., and Saha I., “Integrating clustering and supervised learning for categorical data analysis” in IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, 40(4): pp. 664-675, July 2010.

Mika P., “Ontologies are us: A unified model of social networks and semantics” in J. of Web Semantics, vol. 5, no. 1, pp. 5-15, 2007.

Minka T., and Lafferty J., “Expectation-propagation for the generative aspect model” in the 18th Conference in Uncertainty in Artificial Intelligence, Morgan Kaufmann, Edmonton, Alberta, Canada, 2002, pp. 352-359.

Modha D. S. and Spangler W. S., “Clustering hypertext with applications to web searching” in Proc. ACM Conf. Hypertext and Hypermedia, San Antonio, Texas, USA, 2000, pp. 143 – 152.

Nanopoulos A., Gabriel H. and Spiliopoulou M., “Spectral clustering in social-tagging systems” in Web Information Systems Engineering - WISE 2009, Lecture Notes in Computer Science, 2009, Vol. 5802, pp. 87-100.

Newman D., Chemudugunta C., and Smyth P., “Statistical entity-topic models” in the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM Philadelphia, PA, 2006, pp. 680 – 686.

NISO (National Information Standards Organization) (2004) “Understanding metadata”. Available from: http://www.niso.org/publications/press/UnderstandingMetadata.pdf

Noll M. G. and Meinel C., “Web search personalization via social bookmarking and tagging”. The Semantic Web, volume 4825 of LNCS, pp. 367–380. Springer, 2007.

Page L., Brin S., Motwani R., and Winograd T., “The pagerank citation ranking: bringing order to the web” in Tech. Rep. Stanford Digital Library Technologies Project, 1998.

Park, J.-r., “Semantic interoperability and metadata quality: An analysis of metadata item records of digital image collections” in Knowledge Organization, 2006. 33(1): pp. 20-34.

Pitkow J., Schutze H., Cass T., Cooley R., Turnbull D., Edmonds A., Adar E., and Breuel T., “Personalized search” in Commun. ACM, 45(9): pp. 50–55, 2002.

Plangprasopchok A. and Lerman K., “Exploiting social annotation for automatic resource discovery” in AAAI-07 Workshop on Information Integration on the Web, arXiv.org, Vancouver, BC, Canada, 2007.

Qi X. and Davison B.D., “Knowing a web page by the company it keeps” in Proc. 15th ACM Int. Conf. Information and Knowledge Management, New York, NY, 2006, pp. 228-237.

Qiu F. and Cho J., “Automatic identification of user interest for personalized search” in Proc. of WWW’06, pp. 727–736, 2006.

Ramage D., et al., “Clustering the tagged web” in WSDM 2009, ACM, Barcelona, Spain, 2009, pp. 54-63.

Rendle S., Marinho L. B., Nanopoulos A., Schmidt-Thieme L., “Learning optimal ranking with tensor factorization for tag recommendation” in SIGKDD’09, pp. 727-735, 2009.

Rosen-Zvi M., Griffiths T., Steyvers M., and Smyth P., “The author-topic model for authors and documents” in the 20th conference on Uncertainty in artificial intelligence AUAI Press, Banff, Canada 2004, pp. 487 - 494.

Schenkel, R., et al., “Efficient top-k querying over social-tagging networks” in SIGIR’08. 2008: Singapore. pp. 523-530.

Schmitz C., et al., “Network properties of folksonomies” in AI Communications, 20(4): pp. 245-262, 2007.

Sen S., Lam S.K.T., Rashid A.M., Cosley D., Frankowski D., Osterhouse J., Harper F.M., and Riedl J., “Tagging, communities, vocabulary, evolution” in CSCW’06, Banff, Alberta, Canada, 2006.

Song, Y., et al., “Real-time automatic tag recommendation” in SIGIR’08. 2008: Singapore. pp. 515-522.

Song, Y., Zhang L., and Giles C.L., “A sparse gaussian processes classification framework for fast tag suggestions” in CIKM'08. Napa Valley, California, USA, 2008: ACM.

Strehl, A. and Ghosh J., “Cluster ensembles — a knowledge reuse framework for combining partitions” in Journal of Machine Learning Research, 2002. 3: pp. 583–617.

Suchanek, F.M., Vojnovic M., and Gunawardena D., “Social tags: meaning and suggestions” in CIKM’08. 2008: Napa Valley, California, USA. pp. 223-232.

Sun J.-T., Zeng H.-J., Liu H., Lu Y., and Chen Z., “Cubesvd: a novel approach to personalized web search” in Proc. of WWW ’05, pp. 382–390, 2005.

Tan B., Shen X., and Zhai C., “Mining long-term search history to improve search accuracy” in Proceedings of KDD’06, ACM Press, pp. 718–723, 2006.

Teevan J., Dumais S. T., and Horvitz E., “Personalizing search via automated analysis of interests and activities” in Proc. of SIGIR ’05, pp. 449–456.

Teevan J., Dumais S. T., and Horvitz E., “Characterizing the value of personalizing search” in Proc. of SIGIR ’07, pp. 757–758.

Teevan J., Morris M. R., and Bush S., “Discovering and using groups to improve personalized search” in Proceedings of WSDM, 2009, pp. 15–24.

Thomas C.F. and Griffin L.S., “Who will create the metadata for the internet?” in First Monday: Peer Reviewed Journal of the Internet, 1999.

Wal, T.V. “Folksonomy coinage and definition”. 2007 [cited 2008 June 02]; Available from: http://vanderwal.net/folksonomy.html.

Wang Q. and Jin H., “Exploring online social activities for adaptive search personalization” in CIKM ’10, ACM, New York, NY, USA, 2010.

Weinheimer, J.L., “How to keep the practice of librarianship relevant in the age of the Internet”. VINE, 1999. 29(3): pp. 14-37.

Wetzker R., Zimmermann C., Bauckhage C., Albayrak S., “I Tag, You Tag: translating tags for advanced user models” in WSDM’10, pp. 71-80, 2010.

Wu X., Zhang L., and Yu Y., “Exploring social annotations for the semantic web” in Proc.15th Int. Conf. World Wide Web, Edinburgh, Scotland, 2006, pp. 417-426.

Xu, S., et al., “Using social annotations to improve language model for information retrieval” in CIKM'07. ACM: Lisboa, Portugal, 2007, pp. 1003-1006.

Xu, S., et al., “Exploring folksonomy for personalized search” in SIGIR’08. Singapore, 2008, pp. 155-162.

Yin Z., Li R., Mei Q., and Han J., “Exploring social tagging graph for web object classification” in the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, Paris, France, 2009, pp. 957-966

Yoo I., Hu X., and Song I.-Y., “Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering” in Proc. 12th ACM Int. Conf. Knowledge Discovery and Data Mining, Philadelphia, PA, 2006, pp. 791–796.

Zeng, M.L., Subrahmanyam L., and Shreve G.M., “Metadata quality study for the National Science Digital Library (NSDL) Metadata Repository” in Digital Libraries: International Collaboration and Cross-Fertilization, Proceedings. 2004. pp. 339-340.

Zhai C. and Lafferty J., “A study of smoothing methods for language models applied to Ad Hoc information retrieval” in Proceedings of SIGIR’01, New Orleans, Louisiana, United States, 2001.

Zhang, L., Wu X., and Yu Y., “Emergent semantics from folksonomies: A quantitative study” in Journal on Data Semantics 2006. pp. 168-186.

Zhang, X., Hu X., and Zhou X., “A comparative evaluation of different link types on enhancing document clustering” in SIGIR’08. Singapore, 2008: ACM.

Zhao Y. and Karypis G., “Criterion functions for document clustering: experiments and analysis”, Technical Report, Department of Computer Science, University of Minnesota, 2001.

Zhou, D., et al., “Exploring Social Annotations for Information Retrieval” in WWW 2008, Beijing, China, 2008. pp. 715-724.

Zhou X., Zhang X., and Hu X., “Semantic smoothing of document models for agglomerative clustering” in Proc. 20th Int. Joint Conf. Artificial Intelligence, Hyderabad, India, 2007, pp. 2922-2927.

VITA

CONTACT Caimei Lu, [email protected], 215-900-4164

EDUCATION
Ph.D. in Information Science, College of Information Science and Technology, Drexel University, 2006-2011
Master in Information Science, Department of Information Management, Nankai University, China, 2003-2006
Bachelor in Information Management Systems, School of Management, Lanzhou University, China, 1999-2003

RESEARCH INTERESTS Text Mining, Information Retrieval, and Social Network Analysis, with particular interests in developing models and systems that utilize social media data to enhance resource discovery and web search.

RESEARCH EXPERIENCE Research Assistant, College of Information Science and Technology, Drexel University 2006-2011

TEACHING EXPERIENCE Lecturer Info210 Database Management Systems College of Information Science and Technology, Drexel University 2010 Fall

Teaching Assistant Info210 Database Management Systems College of Information Science and Technology, Drexel University 2010 Summer

SELECTED PUBLICATIONS
- Caimei Lu, Xiaohua Hu, and Jung-ran Park. Exploit the Social Tagging Network for Web Clustering. IEEE Transactions on Systems, Man and Cybernetics--Part A: Systems and Humans (IEEE SMCA), 2011.
- Caimei Lu, Jung-ran Park, and Xiaohua Hu. User Tags versus Expert-assigned Subject Terms: A Comparison of LibraryThing Tags and Library of Congress Subject Headings. Journal of Information Science (JIS), 36(6), 2010.
- Caimei Lu, Xiaohua Hu, Jung-ran Park, and Jia Huang. (2011). Post-Based Collaborative Filtering for Personalized Tag Recommendation. iConference’10.
- Caimei Lu, Xiaohua Hu, Xin Chen, Jung-ran Park, et al. (2010). The Topic-Perspective Model for Social Tagging Systems. The 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’10).
- Caimei Lu, Jung-ran Park, Xiaohua Hu, and Il-Yeol Song. (2010). Metadata Effectiveness: A Comparison between User-Created Social Tags and Author-Provided Metadata. The 10th Hawaii International Conference on System Sciences (HICSS’10).
- Caimei Lu, Xin Chen, and E.K. Park. (2009). Exploit the Tripartite Network of Social Tagging for Web Clustering. The 18th ACM Conference on Information and Knowledge Management (CIKM’09).
- Jung-ran Park, Caimei Lu, and Linda Marion. (2009). Cataloging Professionals in the Digital Environment: A Content Analysis of Job Descriptions. Journal of the American Society for Information Science and Technology (JASIST), 60(4), pp. 844-857.