Studying the User Task of Information Gathering on the Web by for the Degree of Doctor of Philosophy at Halifax, Nova Scotia

Total Page:16

File Type:pdf, Size:1020Kb

Studying the User Task of Information Gathering on the Web by for the Degree of Doctor of Philosophy at Halifax, Nova Scotia Studying the User Task of Information Gathering on the Web by Anwar Alhenshiri Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Dalhousie University Halifax, Nova Scotia March 2013 © Copyright by Anwar Alhenshiri, 2013 DALHOUSIE UNIVERSITY FACULTY OF COMPUTER SCIENCE The undersigned hereby certify that they have read and recommend to the Faculty of Graduate Studies for acceptance a thesis entitled “Studying the User Task of Information Gathering on the Web” by Anwar Alhenshiri in partial fulfilment of the requirements for the degree of Doctor of Philosophy. Dated: March 13, 2013 External Examiner: __________________________ Research Supervisor: __________________________ Examining Committee __________________________ __________________________ Departmental Representative: __________________________ ii DALHOUSIE UNIVERSITY DATE: March 13, 2013 AUTHOR: Anwar Alhenshiri TITLE: Studying the User Task of Information Gathering on the Web DEPARTMENT OR SCHOOL: Faculty of Computer Science DEGREE: PhD CONVOCATION: May YEAR: 2013 Permission is herewith granted to Dalhousie University to circulate and to have copied for non-commercial purposes, at its discretion, the above title upon the request of individuals or institutions. I understand that my thesis will be electronically available to the public. The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author’s written permission. The author attests that permission has been obtained for the use of any copyrighted material appearing in the thesis (other than the brief excerpts requiring only proper acknowledgement in scholarly writing), and that all such use is clearly acknowledged. _________________________________ Signature of Author iii DEDICATION This dissertation is dedicated to the first two people I loved, my parents. May my father rest in peace! iv TABLE OF CONTENTS LIST OF TABLES ............................................................................................................. ix LIST OF FIGURES ............................................................................................................ x ABSTRACT ............................................................................................................... xii LIST OF ABBREVIATIONS USED .............................................................................. xiii ACKNOWLEDGEMENTS ............................................................................................. xiv CHAPTER 1 INTRODUCTION .................................................................................. 1 CHAPTER 2 RELATED WORK ................................................................................. 7 2.1. INFORMATION SEEKING MODELS ...................................................................... 7 2.1.1. Ellis’s Model .................................................................................... 7 2.1.2. Marchionini’s Model ........................................................................ 9 2.1.3. Wilson’s Model .............................................................................. 12 2.1.4. Wilson’s Combined Model ............................................................ 13 2.2. MODELS OF USER TASKS ON THE WEB ........................................................... 14 2.2.1. Broder’s Taxonomy ........................................................................ 16 2.2.2. Sellen’s Taxonomy ......................................................................... 17 2.2.3. Rose and Levinson’s Classification ............................................... 19 2.2.4. Kellar’s Framework ........................................................................ 21 2.3. A REVIEW OF THE TASK OF INFORMATION GATHERING ON THE WEB ............. 23 2.4. VISUALIZING, CLUSTERING, RE-FINDING, AND ORGANIZING WEB INFORMATION ............................................................................................................... 24 2.4.1. Visualizing Web Information ......................................................... 25 2.4.2. Clustering Web Information ........................................................... 28 2.4.3. Re-finding Web Information .......................................................... 30 2.4.4. Managing and Organizing Web Information ................................. 32 2.5. SUMMARY ....................................................................................................... 36 CHAPTER 3 INFORMATION GATHERING .......................................................... 37 3.1. SUBTASKS IN THE INFORMATION GATHERING TASK ....................................... 37 3.1.1. Interpreting the Task ...................................................................... 38 3.1.2. Finding Sources of Information on the Web .................................. 38 3.1.3. Finding Information on the Web .................................................... 39 3.1.4. Finding Related Information .......................................................... 39 v 3.1.5. Comparing Information, Reasoning, and Decision Making........... 41 3.1.6. Keeping and Re-finding Information ............................................. 42 3.1.7. Managing and Organizing Information .......................................... 42 3.1.8. Reviewing the Task ........................................................................ 43 3.2. DEFINING THE TASK OF INFORMATION GATHERING ........................................ 44 3.3. THE PROCESS OF INFORMATION GATHERING ON THE WEB ............................. 48 3.4. SUMMARY ....................................................................................................... 49 CHAPTER 4 PRELIMINARY STUDIES .................................................................. 51 4.1. VSE, A VISUAL SEARCH ENGINE FOR IMPROVING HOW USERS GATHER INFORMATION ON THE WEB .......................................................................................... 51 4.1.1. Design and Implementation............................................................ 51 4.1.2. Search and Response ...................................................................... 52 4.1.3. The Interactive Search Interface ..................................................... 52 4.1.4. Research Questions ........................................................................ 54 4.1.5. VSE Evaluation .............................................................................. 54 4.1.6. Study Location and Population ...................................................... 55 4.1.7. Study Design .................................................................................. 55 4.1.8. Study Tasks .................................................................................... 55 4.1.9. Study Methodology ........................................................................ 55 4.1.10. Study Data.................................................................................... 56 4.1.11. Study Results ............................................................................... 56 4.1.12. Study Limitations ......................................................................... 62 4.1.13. Discussion .................................................................................... 62 4.1.14. Conclusion ................................................................................... 63 4.2. VLN, A VISUAL LINK NAVIGATION INTERFACE FOR FINDING INFORMATION WITHIN WEB HIERARCHIES ........................................................................................... 63 4.2.1. Design and Implementation............................................................ 63 4.2.2. Research Questions ........................................................................ 66 4.2.3. VLN Evaluation ............................................................................. 66 4.2.4. Study Location and Population ...................................................... 67 4.2.5. Study Design .................................................................................. 67 4.2.6. Study Tasks .................................................................................... 67 4.2.7. Study Methodology ........................................................................ 68 vi 4.2.8. Study Data ...................................................................................... 68 4.2.9. Study Results .................................................................................. 68 4.2.10. Discussion .................................................................................... 73 4.2.11. Study Limitations ......................................................................... 74 4.2.12. Conclusion ................................................................................... 74 4.3. SUMMARY ....................................................................................................... 74 CHAPTER 5 INVESTIGATING THE TASK OF INFORMATION GATHERING ON THE WEB ............................................................................................................... 76 5.1. RESEARCH STUDY ........................................................................................... 76 5.1.1. Research Questions ........................................................................ 76 5.1.2. Study Design and Population ......................................................... 76 5.1.3. Task Description and Construction
Recommended publications
  • Open Source Intelligence (OSINT) Link Directory Targeting Tomorrow’S Terrorist Today (T4) Through OSINT By: Mr
    Creative Commons Copyright © Ben Benavides—no commercial exploitation without contract June 2011 Country Studies Public Places Open Source Intelligence (OSINT) Link Directory Targeting Tomorrow’s Terrorist Today (T4) through OSINT by: Mr. E. Ben Benavides CounterTerrorism Infrastructure Money Laundering Gang Warfare Open Source Intelligence is the non-cloak- and-dagger aspect of fact collecting. (Alan D. Tompkins) Human Smuggling Weapon Smuggling IEDs/EFPs Creative Commons Copyright © Ben Benavides—no commercial exploitation without contract Table of Contents Table of Contents ........................................................................................................................ 2 Comments ................................................................................................................................... 7 Open Source Intelligence (OSINT): What It Is and What It Isn’t ................................................... 8 How To Use Open Source Intelligence ........................................................................................ 9 Key Army Access Sites .............................................................................................................. 17 Must Haves References ............................................................................................................ 18 Core Open Source Intelligence Documents & Guides ........................................................... 18 MI Officer Students ...............................................................................................................
    [Show full text]
  • Comparison of Web Search Engines Using User-Based Metrics in Survey Statistics *Ogunyinka, P
    Futo Journal Series (FUTOJNLS) e-ISSN : 2476-8456 p-ISSN : 2467-8325 Volume-6, Issue-2, pp- 190 - 200 www.futojnls.org Research Paper December 2020 Comparison of web search engines using user-based metrics in survey statistics *Ogunyinka, P. I.1, Aigbogun, L. I.1, Iheanyichukwu, B. F.2, Ekundayo, O. M.3, Banjo, O.1, Olubanwo, O. O.1 and Dehinsilu, O. A.1 1Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Ogun State, Nigeria. 2Olabisi Onabanjo University Library, Olabisi Onabanjo University, Ago-Iwoye, Ogun State, Nigeria. 3Department of Computer Science, Federal University of Technology, Akure, Ondo State, Nigeria. *Corresponding Author’s Email: [email protected] Abstract Different web search engines had been rated based on different metrics. However, almost none had considered the search query length, the retrieved quantity and retrieval time for evaluation of web search engines. This study had rated five web search engines (Google, Yahoo, WOW, AOL and Bing) using non-parametric Kruskal-Wallis test for significant mean difference and single-phase sampling for regression estimation and examination of internal error. The retrieval time was used as the study variable while the retrieved quantity of the organic search results and the search query length were used as the auxiliary variables. The correlation coefficient, mean square error, percentage coefficient of variation and percentage relative efficiency were used for the evaluation and comparison of the estimated population mean of the retrieval time. Results revealed that Google was the most rated web search engine with the highest significant retrieved quantity and significant retrieval time while Bing was the least rated web search engine.
    [Show full text]
  • List of Search Engines
    A blog network is a group of blogs that are connected to each other in a network. A blog network can either be a group of loosely connected blogs, or a group of blogs that are owned by the same company. The purpose of such a network is usually to promote the other blogs in the same network and therefore increase the advertising revenue generated from online advertising on the blogs.[1] List of search engines From Wikipedia, the free encyclopedia For knowing popular web search engines see, see Most popular Internet search engines. This is a list of search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases. Contents 1 By content/topic o 1.1 General o 1.2 P2P search engines o 1.3 Metasearch engines o 1.4 Geographically limited scope o 1.5 Semantic o 1.6 Accountancy o 1.7 Business o 1.8 Computers o 1.9 Enterprise o 1.10 Fashion o 1.11 Food/Recipes o 1.12 Genealogy o 1.13 Mobile/Handheld o 1.14 Job o 1.15 Legal o 1.16 Medical o 1.17 News o 1.18 People o 1.19 Real estate / property o 1.20 Television o 1.21 Video Games 2 By information type o 2.1 Forum o 2.2 Blog o 2.3 Multimedia o 2.4 Source code o 2.5 BitTorrent o 2.6 Email o 2.7 Maps o 2.8 Price o 2.9 Question and answer .
    [Show full text]
  • Download Download
    International Journal of Management & Information Systems – Fourth Quarter 2011 Volume 15, Number 4 History Of Search Engines Tom Seymour, Minot State University, USA Dean Frantsvog, Minot State University, USA Satheesh Kumar, Minot State University, USA ABSTRACT As the number of sites on the Web increased in the mid-to-late 90s, search engines started appearing to help people find information quickly. Search engines developed business models to finance their services, such as pay per click programs offered by Open Text in 1996 and then Goto.com in 1998. Goto.com later changed its name to Overture in 2001, and was purchased by Yahoo! in 2003, and now offers paid search opportunities for advertisers through Yahoo! Search Marketing. Google also began to offer advertisements on search results pages in 2000 through the Google Ad Words program. By 2007, pay-per-click programs proved to be primary money-makers for search engines. In a market dominated by Google, in 2009 Yahoo! and Microsoft announced the intention to forge an alliance. The Yahoo! & Microsoft Search Alliance eventually received approval from regulators in the US and Europe in February 2010. Search engine optimization consultants expanded their offerings to help businesses learn about and use the advertising opportunities offered by search engines, and new agencies focusing primarily upon marketing and advertising through search engines emerged. The term "Search Engine Marketing" was proposed by Danny Sullivan in 2001 to cover the spectrum of activities involved in performing SEO, managing paid listings at the search engines, submitting sites to directories, and developing online marketing strategies for businesses, organizations, and individuals.
    [Show full text]
  • Web Search Engine
    Web Search Engine Bosubabu Sambana, MCA, M.Tech Assistant Professor, Dept of Computer Science & Engineering, Simhadhri Engineering College, Visakhapatnam, AP-531001, India. Abstract: This hypertext pool is dynamically changing due to this reason it is more difficult to find useful The World Wide Web (WWW) allows people to share information.In 1995, when the number of “usefully information or data from the large database searchable” Web pages was a few tens of millions, it repositories globally. We need to search the was widely believed that “indexing the whole of the information with specialized tools known generically Web” was already impractical or would soon become as search engines. There are many search engines so due to its exponential growth. A little more than a available today, where retrieving meaningful decade later, the GYM search engines—Google, information is difficult. However to overcome this problem of retrieving meaningful information Yahoo!, and Microsoft—are indexing almost a intelligently in common search engines, semantic thousand times as much data and between them web technologies are playing a major role. In this providing reliable sub second responses to around a paper we present a different implementation of billion queries a day in a plethora of languages. If this semantic search engine and the role of semantic were not enough, the major engines now provide much relatedness to provide relevant results. The concept higher quality answers. of Semantic Relatedness is connected with Wordnet which is a lexical database of words. We also For most searchers, these engines do a better job of made use of TF-IDF algorithm to calculate word ranking and presenting results, respond more quickly frequency in each and every webpage and to changes in interesting content, and more effectively Keyword Extraction in order to extract only useful eliminate dead links, duplicate pages, and off-topic keywords from a huge set of words.
    [Show full text]
  • Members' Library Finding Information Online January 2011
    Members' Library Finding information online January 2011 The National Assembly for Wales is the democratically elected body that represents the interests of Wales and its people, makes laws for Wales and holds the Welsh Government to account. Members' Library Finding information online January 2011 Finding information online This guide is intended to help Assembly Members, their support staff and Assembly officials find useful information quickly when searching the internet. Different types of search engines and online directories are included, followed by information on creating a strong search strategy and evaluating websites. Lastly, further sources of information and useful links are listed. The basics Search engines Search engines do not search the Search engines are useful when you know what you are looking for World Wide Web directly. Each one and can define it in a few keywords or phrases. searches a database of web pages Popular search engines include: that they have already collected and stored using spiders which “crawl” Altavista the internet looking for new web http://uk.altavista.com/ pages. Bing (Microsoft’s search engine) When a user clicks on the links http://www.bing.com/?mkt=en-gb provided in a search engine's search Exalead results, they retrieve the current http://www.exalead.com/search/ version of the page. Duckduckgo Many web pages are excluded from http://duckduckgo.com/ search engines, including most Google library catalogues and article http://www.google.co.uk/ databases, because search engine Google: Welsh language interface: spiders cannot access them. This http://www.google.co.uk/webhp?hl=cy material is referred to as the invisible web as it cannot be “seen” by search Yahoo engines.
    [Show full text]
  • How Does the Search Process Support Inquiry Learning?
    2 ESSENTIAL QUESTIONS How does the search process support inquiry learning? What are the best instructional strategies to teach the search process? Which search engines best support students' searching and exploration? Searching the Web 23 Alisha slips off her iPod earbuds and sits down at the computer; she has about 15 minutes before she meets with her team. She's thinking about their last meeting when they brainstormed questions and discussed what they each knew about the topic. She pulls up the Research Wiki set up by Mr. Rodriquez, her teacher-librarian. He always uses a wiki so kids can work collaboratively in identifying the best search tools. A few of the search engines peak her interest as she scans the annotated list; she clicks on Quintura to see how the tag cloud is used. As she watches the screencast, she opens another window and follows along experimenting with different keywords. Satisfied that she has mastered the new search engine, she returns to Google remem­ bering a strategy Mr. Rodriquez recommended at the class research session- "type define: search term" to get definitions. She uses Google all the time but did not realize that it had advanced searching strategies. Cool, it worked! She scans the list of 20 or more definitions of global warming, looking at the URLs. She tags the Web page in her shared del.icio. us account to share with her teammates. She moves to another search tool, Ask.com. When the class was comparing search engines, she noticed that Ask.com had a great results page-it gave listings of videos and up-to-date news on the search term and offered suggestions to narrow and broaden the search.
    [Show full text]
  • An Effective Framework for Cloud Based Search Engine Dr
    International Journal of Scientific Research in Computer Science, Engineering and Information Technology © 2018 IJSRCSEIT | Volume 3 | Issue 1 | ISSN : 2456-3307 An Effective Framework for Cloud Based Search Engine Dr. R. Malathi Ravindran Associate Professor, Computer Applications, NGM College, Pollachi, Tamil Nadu, India ABSTRACT In present world, we have plenty of information around us. However, for a specific information need, only a small subset of all the available information will be useful. Over the 1970’s and 1980’s, much of the research in information retrieval was focused on document retrieval, and the emphasis on this task in the Text Retrieval Conference (TREC) evaluations of the 1990’s has further reinforced the view that information retrieval is synonymous with document retrieval. Web search engines are, of course, the most common example of this type of information retrieval system. The enormous increase in the amount of online text available and the demand for access to different types of information have, however, led to a renewed interest in a broad range of information retrieval related areas that go beyond simple document retrieval, such as question answering, topic detection and tracking, summarization, multimedia retrieval (e.g., image, video and music), software engineering, chemical and biological informatics, text structuring, text mining and genomics. In this paper, Cloud Computing Based Information Retrieval (CCBIR) system is introduced for the information retrieval from the huge volume of data. Keywords : Search Engine, Cloud Computing, Information Retrieval, Document and CCBIR. I. INTRODUCTION Information Retrieval (CCBIR) is developed for the efficient information retrieval [4]. Search Engines has become an essential need for internet users in their today’s day to day life.
    [Show full text]
  • By Content/Topic
    Different search engines This is a list of Wikipedia articles about search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases. By content/topic General Baidu (Chinese, Japanese) Bing Blekko Google Sogou (Chinese) Soso.com (Chinese) Volunia Yahoo! Yandex (Russian) Yebol Yodao (Chinese) WireDoo P2P search engines FAROO Seeks (Open Source) YaCy (Free and fully decentralized) Metasearch engines See also: Metasearch engine Brainboost ChunkIt! Clusty DeeperWeb Dogpile Excite Harvester42 HotBot Info.com Ixquick Kayak LeapFish Mamma Metacrawler MetaLib Mobissimo Myriad Search SideStep WebCrawler Geographically limited scope Accoona, China/United States Alleba, Philippines Ansearch, Australia/United States/United Kingdom/New Zealand Biglobe, Japan Daum, Korea Goo, Japan Guruji.com, India Leit.is, Iceland Maktoob, Arab World Miner.hu, Hungary Najdi.si, Slovenia Naver, Korea Onkosh, Arab World Rambler, Russia Rediff, India SAPO, Portugal/Angola/Cabo Verde/Mozambique Search.ch, Switzerland Sesam, Norway, Sweden Seznam, Czech Republic Walla!, Israel Yandex, Russia Yehey!, Philippines ZipLocal, Canada/United States Accountancy IFACnet Business Business.com GenieKnows (United States and Canada) GlobalSpec Nexis (Lexis Nexis) Thomasnet (United States) Enterprise See also: Enterprise search AskMeNow: S3 - Semantic Search Solution Concept
    [Show full text]
  • BUSTY HOOKERS,POLITICAL MISFORTUNES and UNREGISTERED LOBBYING How Former Cabinet Minister and Conservative M.P
    MEDIATHE CANADIAN ASSOCIATION OF JOURNALISTS • SUMMER 2010 • VOLUME14, NUMBER THREE BUSTY HOOKERS,POLITICAL MISFORTUNES AND UNREGISTERED LOBBYING How former cabinet minister and Conservative M.P. Helena Guergis and her husband became fodder for sensational allegations that still have Parliament Hill reeling by Russ Martin Student Journalist Hong Kong Fellowship Exploring Asia’s world city MEDIA SUMMER 2010 • VOLUME 14, NUMBER THREE www.caj.ca/mediamag Hong Kong, Asia’s world city, is a a package including a five-day visit Special Administrative Region of the program with an economy class air ticket COLUMNS People’s Republic of China, run by Hong and hotel accommodation. In Hong Kong, 5 FIRST WORD by David McKie • Where are we going? Kong people under the “One Country, the winners will have the opportunity to Two Systems” principle. Hong Kong is visit various points of interest, and meet 6 WRITER’S TOOLBOX By Don Gibb • Make a plan before you write. Our writing coach completes his tip sheet with this simple one of the most open, externally oriented with people of diverse views and message: Tell stories. Tell them well. 9 JOURNALISMNET By Julian Sher . Digging for buried treasure. There are ways to navigate the invisible web. economies in the world. The city has been backgrounds. The selected student rated the world’s freest economy by the journalists must publish or broadcast at FEATURES Heritage Foundation and Fraser Institute. least three stories about Hong Kong within six months upon completion of the 10 MEDIA CIRCUS By Russ Martin • Busty hookers, political misfourtunes and unregisted lobbying: The story about how former What makes Hong Kong tick as a great cabinet minister and Conservative M.P.
    [Show full text]
  • Radio Frequency Identification Based Smart
    ISSN (Online) 2394-2320 International Journal of Engineering Research in Computer Science and Engineering (IJERCSE) Vol 3, Issue 10, October 2016 History and Web Search Engines Works [1]Saurabh Kumar, [2] Maneesh Kumar, [3] Vipul Agarwal M .Tech Scholar, Department of Computer Science and Engineering, Dr. A.P.J Abdul Kalam Technical University, Lucknow (U.P), Abstract: -- A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results often referred to as search engine results pages(SERPs). The information may be a mix of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real- time information by running an algorithm on a web crawler. I. INTRODUCTION Lycos Active Search engine is a web software program or Infoseek Inactive web based script available over the Internet that searches documents and files for keywords and returns the list of results containing those keywords. Today, there are 1995 AltaVista Inactive, redirected to Yahoo! numbers of different search engines available on the Internet, each with their own techniques and specialties. Daum Active Search Engine Optimization is a technique to improve visibility of a website in search engine. Magellan Inactive II. HISTORY Excite Active Timeline (full list) SAPO Active Year Engine Current status Yahoo! Active,
    [Show full text]
  • Search Engine: a Survey
    [Singh , 2(1): Jan., 2013] ISSN: 2277 -9655 IJESRT INTERNATIONAL JOURNA L OF ENGINEERING SCI ENCES & RESEARCH TECHNOLOGY Search Engine : A Survey Ankur Singh Bist Govind Ballabh P ant University of Agriculture And Technology, Pant, India [email protected] Abstract There is lot of information available on the internet so to fetch the relevant information is very typical task . In this paper our purpose is to make a survey on various issues related to search engines li ke semantic web search technology. Keywords: Crawler, Ontology .. Introduction AlltheWeb Inactive (URL redirected to Yahoo!) Searching is an important issue to retrieve the GenieKnows Active, rebranded Yellowee.com required data . Search engines are serving the same 1999 Naver Active task . There are various challenges covered by search Teoma Active engines in recent years . There are lot of challenges Viv isimo Inactive still to be solved . Answer the intelligent questions Baidu Active most efficiently is one of the major issue that is to be 2000 handled by search engines semantic web searching Exalead Inactive approaches are evolving that are responsible for 2002 Inktomi Acquired by Yahoo! making an appropriate results for intelligent input 2003 Info.com Active query . There are also various issues related to Active, Launched own web search Yahoo! Search vertical search engines that needed to be handled by (see Yahoo! Directory, 1995) 2004 applying efficient approaches .The timeline of search A9.com Inactive engines is provided below:-- Sogou Active AOL Search Active
    [Show full text]