A Bandwidth-Conserving Architecture for Crawling Virtual Worlds Dipesh Gautam University of Arkansas, Fayetteville
Total Page:16
File Type:pdf, Size:1020Kb
University of Arkansas, Fayetteville ScholarWorks@UARK Theses and Dissertations 12-2013 A Bandwidth-Conserving Architecture for Crawling Virtual Worlds Dipesh Gautam University of Arkansas, Fayetteville Follow this and additional works at: http://scholarworks.uark.edu/etd Part of the Graphics and Human Computer Interfaces Commons, and the Other Computer Sciences Commons Recommended Citation Gautam, Dipesh, "A Bandwidth-Conserving Architecture for Crawling Virtual Worlds" (2013). Theses and Dissertations. 933. http://scholarworks.uark.edu/etd/933 This Thesis is brought to you for free and open access by ScholarWorks@UARK. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of ScholarWorks@UARK. For more information, please contact [email protected], [email protected]. A Bandwidth-Conserving Architecture for Crawling Virtual Worlds A Bandwidth-Conserving Architecture for Crawling Virtual Worlds A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science by Dipesh Gautam Tribhuvan University Bachelor of Engineering in Computer Engineering, 2005 Chosun University Master of Science in Computer Engineering, 2008 December 2013 University of Arkansas This thesis is approved for recommendation to the Graduate Council. ____________________________________ Dr. Susan Gauch Thesis Director ____________________________________ Dr. Craig W. Thompson Committee Member ____________________________________ Dr. Gordon Beavers Committee Member ABSTRACT A virtual world is a computer-based simulated environment intended for its users to inhabit via avatars. Content in virtual worlds such as Second Life or OpenSimulator is increasingly presented using three-dimensional (3D) dynamic presentation technologies that challenge traditional search technologies. As 3D environments become both more prevalent and more fragmented, the need for a data crawler and distributed search service will continue to grow. By increasing the visibility of content across virtual world servers in order to better collect and integrate the 3D data we can also improve the crawling and searching efficiency and accuracy by avoiding crawling unchanged regions or downloading unmodified objects that already exist in our collection. This will help to save bandwidth resources and Internet traffic during the content collection and indexing and, for a fixed amount of bandwidth, maximize the freshness of the collection. This work presents a new services paradigm for virtual world crawler interaction that is co-operative and exploits information about 3D objects in the virtual world. Our approach supports analyzing redundant information crawled from virtual worlds in order to decrease the amount of data collected by crawlers, keep search engine collections up to date, and provide an efficient mechanism for collecting and searching information from multiple virtual worlds. Experimental results with data crawled from Second Life servers demonstrate that our approach provides the ability to save crawling bandwidth consumption, to explore more hidden objects and new regions to be crawled that facilitate the search service in virtual worlds. ACKNOWLEDGEMENTS I express my gratitude to my advisor Dr. Susan Gauch for her precious support and guidance during my study. Her guidance and valuable suggestions shaped this thesis to this form. I also thank my committee members Dr. Craig W. Thompson and Dr. Gordon Beavers for suggesting ways to better organize the thesis and for guiding me on which topics to emphasize. In addition, I thank my wife Tara for her support and encouragement throughout my career. Also my appreciation goes to my son Sauvagya for finally growing healthier throughout the period of writing this thesis. Last but not least, I thank my parents and all the family members for their support and love throughout my life. TABLE OF CONTENTS 1. Introduction .................................................................................................................. 1 1.1 Motivation ............................................................................................................................. 2 1.1.1 Crawling in Virtual Worlds ........................................................................................ 3 1.1.2 Redundant Data ........................................................................................................... 4 1.1.3 Bandwidth Saving Issues ............................................................................................... 4 1.2 Goals ..................................................................................................................................... 5 1.3 Approach ............................................................................................................................... 6 1.4 Organization of the Thesis .................................................................................................... 6 2. Background .................................................................................................................. 7 2.1 Virtual World Search Technology ........................................................................................ 7 2.2 Virtual Worlds Research ....................................................................................................... 8 2.2.1 Interoperation ................................................................................................................. 9 2.2.2 Behavioral Aspects ...................................................................................................... 10 2.2.3 Server Architecture ...................................................................................................... 11 2.3 Crawlers .............................................................................................................................. 12 2.3.1 WWW Crawlers ........................................................................................................... 13 2.3.2 Virtual Worlds Crawlers .............................................................................................. 16 3. Our Approach ............................................................................................................ 18 3.1 System Architecture ............................................................................................................ 19 3.1.1 Virtual World Server.................................................................................................... 20 3.1.2 Web Service ................................................................................................................. 21 3.1.3 NextGen Crawler ......................................................................................................... 23 3.1.4 Crawler Metadata ......................................................................................................... 24 3.1.5 Content Collection ....................................................................................................... 24 3.1.6 Search Service .............................................................................................................. 25 3.2 Traditional Virtual World Crawler ..................................................................................... 26 4. Experiments ................................................................................................................ 29 4.1 Data Collection ................................................................................................................... 29 4.2 Evaluation ........................................................................................................................... 31 4.2.1 Data Redundancy Metric ............................................................................................. 31 4.2.2 Bandwidth Consumption Savings Metric .................................................................... 32 4.2.3 Data Redundancy Results ............................................................................................ 32 4.2.4 Bandwidth Savings Results ......................................................................................... 35 4.4 Discussion ........................................................................................................................... 36 5. Conclusions ................................................................................................................. 38 5.1 Summary ............................................................................................................................. 38 5.2 Contributions ...................................................................................................................... 38 5.3 Future Work ........................................................................................................................ 39 References ........................................................................................................................ 41 LIST OF FIGURES Figure 1: Content Sources in Virtual Worlds ................................................................................. 1 Figure 2: System Architecture ...................................................................................................... 19 Figure 3: Typical WSDL type definition of web service .............................................................. 22 Figure 4: Typical message in WSDL ...........................................................................................