A Framework for Application Specific Knowledge Engines

Item Type: text; Electronic Dissertation
Authors: Lai, Guanpi
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date: 25/09/2021 03:58:57
Link to Item: http://hdl.handle.net/10150/204290

A FRAMEWORK FOR APPLICATION SPECIFIC KNOWLEDGE ENGINES

by

Guanpi Lai

_____________________

A Dissertation Submitted to the Faculty of the
DEPARTMENT OF SYSTEMS AND INDUSTRIAL ENGINEERING
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA

2010

THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Guanpi Lai entitled A Framework for Application Specific Knowledge Engines and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy

_____________________ Date: 4/28/2010
Fei-Yue Wang

_____________________ Date: 4/28/2010
Ferenc Szidarovszky

_____________________ Date: 4/28/2010
Jian Liu

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.
_____________________ Date: 4/28/2010
Dissertation Director: Fei-Yue Wang

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED: Guanpi Lai

ACKNOWLEDGEMENTS

I wish to thank my committee members, who were more than generous with their expertise and precious time. Special thanks to Prof. Fei-Yue Wang, my dissertation advisor and committee chair, for his countless hours of reflecting, reading, encouraging, and, most of all, patience throughout the entire process. Thank you, Prof. Ferenc Szidarovszky, Dr. Daniel Zeng, and Dr. Jian Liu, for agreeing to serve on my committee.

I especially thank Yanqing Gao, Yilu Zhou, Jialun Qin, and many others for their encouragement and emotional support during my tough times.

DEDICATION

This dissertation is dedicated to my family: my wife Xuetao Xu, my child Lucas Luming Lai, my parents Yangfu Lai and Chunrong Yang, and my parents-in-law Furong Xu and Guiju Liang. I give my deepest expression of love and appreciation for the encouragement that you gave during this long journey.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 Motivation and Research Description
  1.2 Organization of the Dissertation

CHAPTER 2 UNDERSTAND DATA ON THE WEB
  2.1 Two worlds of Data – Unstructured and Structured
    2.1.1 Manage unstructured data
    2.1.2 Structured data on the Web
  2.2 Life on the Web
    2.2.1 Online Communities
    2.2.2 Peer-to-Peer World
  2.3 Conclusions

CHAPTER 3 A FRAMEWORK FOR APPLICATION SPECIFIC KNOWLEDGE ENGINES
  3.1 Knowledge Portals and Applications
  3.2 An Overview of the Framework for Application Specific Knowledge Engines
  3.3 Construction of Data Repositories
    3.3.1 Data Collection
    3.3.2 Data Preparation
    3.3.3 Data Silo
  3.4 Searching by KCF with Result Presentation
    3.4.1 KCF Processing
    3.4.2 Semantic Search
    3.4.3 Result Presentation
  3.5 Conclusions

CHAPTER 4 SEARCHING TERRORIST GROUPS ON THE INTERNET
  4.1 Literature Review
    4.1.1 Digital Archiving for Terrorists’ Resources
    4.1.2 Terrorism Research Portals
    4.1.3 Multilingual Issues
  4.2 Research Questions
  4.3 Implementation of Dark Web Portal
    4.3.1 Dark Web Data Collection Building
    4.3.2 Post-retrieval Analysis and Multilingual Support
    4.3.3 Searching and Browsing in the Dark Web Portal
    4.3.4 Multilingual Support
    4.3.5 Semantic Search in the Dark Web
  4.4 Conclusions and Future Directions

CHAPTER 5 MONITOR FILE SHARING IN P2P WORLD
  5.1 Literature Review
    5.1.1 P2P History
    5.1.2 P2P Networks
    5.1.3 Related P2P Research
  5.2 Implementation of Building ASKE Data Collection
    5.2.1 Resource Identifier
    5.2.2 Spider agents
    5.2.3 Content Filter
  5.3 Services and Case Study
    5.3.1 Services for Copyright Owners
    5.3.2 Case Study – Watchmen
  5.4 Conclusions

CHAPTER 6 CONCLUSIONS AND FUTURE DIRECTIONS
  6.1 Conclusions
  6.2 Future Directions