SAVVYSEARCH: a Metasearch Engine That Learns Which Search

Total Page:16

File Type:pdf, Size:1020Kb

SAVVYSEARCH: a Metasearch Engine That Learns Which Search AI Magazine Volume 18 Number 2 (1997) (© AAAI) Articles SAVVYSEARCH A Metasearch Engine That Learns Which Search Engines to Query Adele E. Howe and Daniel Dreilinger ■ Search engines are among the most successful single query to multiple search engines. applications on the web today. So many search The simplest metasearch engines are forms engines have been created that it is difficult for that allow the user to indicate which search users to know where they are, how to use them, engines should be contacted (for example, and what topics they best address. Metasearch ALL-IN-ONE [www.albany.net/allinone] and engines reduce the user burden by dispatching METASEARCH [members.gnn.com/infinet/ queries to multiple search engines in parallel. The meta.htm]). PROFUSION (Gauch, Wang, and SAVVYSEARCH metasearch engine is designed to efficiently query other search engines by carefully Gomez 1996; www.designlab.ukans.edu/pro- selecting those search engines likely to return fusion) gives the user the choice of selecting useful results and responding to fluctuating load search engines themselves or letting PROFUSION demands on the web. SAVVYSEARCH learns to iden- select three of six robot-based search engines tify which search engines are most appropriate using hand-built rules. METACRAWLER (Selberg for particular queries, reasons about resource and Etzioni 1995; metacrawler.cs.washing- demands, and represents an iterative parallel ton.edu:8080/home.html) significantly search strategy as a simple plan. enhances the output by downloading and analyzing the links returned by the search engines to prune out unavailable and irrele- ompanies, institutions, and individuals vant links. must have a presence on the web; each Metasearch engines reduce the burden on Care vying for the attention of millions the user. They make available search engines of people. Not too surprisingly then, the most that might have been unknown to the user. successful applications on the web to date are They handle the simultaneous submission of search engines: tools that assist users in queries; some direct the query to appropriate finding information on specific topics. engines, and some postprocess the results as A variety of search engines are available, well. They provide a single interface (on the from general, robot based (for example, down side, they might not support all the fea- ALTAVISTA [www.altavista.digital.com] and tures of the target search engines). WEBCRAWLER [webcrawler.com]) to topic or area Unfortunately, metasearch can lead to the specific (for example, FTPSEARCH [ftpsearch. tragedy of the commons problem from eco- unit.no/ftpsearch] and DEJANEWS [www. nomics in which an individual’s best interests dejanews.com]). Each uses different algo- run counter to society’s. Individual users rithms for collecting, indexing, and searching appear to be best served by simultaneously links; thus, each returns different results for searching every possible search engine on the similar queries. Empirical results indicate that web for desired information. However, the no single search engine is likely to return process might waste web resources: network more than 45 percent of the relevant results load and search-engine computation. (Selberg and Etzioni 1995). To find what they We believe that a metasearch system can be desire, users might need to query several a good web citizen (Eichmann 1994) by tar- search engines; metasearch engines automate geting those search engines likely to return this process by simultaneously submitting a useful results and responding to changing Copyright © 1997, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1997 / $2.00 SUMMER 1997 19 Articles load demands on the web. To provide this be present) or can be presented as an ordered function, we incorporated simple AI tech- phrase. Three aspects of the results display niques in a metasearch engine. Our can be varied: (1) the number of links metasearch engine learns to identify which returned, (2) the format of the links descrip- search engines are most appropriate for par- tion, and (3) the timing. By default, 10 links ticular queries, reasons about resource are displayed with the uniform resource loca- demands, and represents an iterative parallel tors (URLs) and descriptions when available, search strategy as a simple plan. and the results of each search engine are list- ed separately as they arrive. Alternatively, we could change the number of links to 50, Our Metasearch System: return less or more description of the links, SAVVYSEARCH and interleave the results of the separate search engines. The interface is also available S AVVYSEARCH is our metasearch system in 23 different languages.1 (Dreilinger and Howe 1997, 1996) available at guaraldi.cs.colostate.edu:2000. It runs on five Processing a Query machines (three Sun SPARCSTATIONs and two When a user submits the query, SAVVYSEARCH IBM RS 6000s) at Colorado State University. must make two decisions: (1) how many The system was first made available in March search engines to contact simultaneously and AVVYSEARCH 1995 and has undergone several revisions S (2) what order the search engines should be since the original design. At present, two ver- is designed to contacted in. The first decision requires rea- sions of the system are available: (1) the one soning about the available resources and the balance two described here and (2) an experimental inter- second about ranking the search engines. potentially face that is mentioned in the section entitled Retrospective and Prospective Views of the Resource Reasoning Each search engine conflicting SAVVYSEARCH Project. queried expends network and local computa- tional resources. Thus, modifying concurren- goals: (1) SAVVYSEARCH is designed to balance two potentially conflicting goals: (1) maximizing cy (number of search engines queried in par- maximizing the likelihood of returning good links and (2) allel) is the best way to moderate resource the likelihood minimizing computational and web resource consumption. Concurrency is a function of consumption. The key to compromise is network load estimates, which are determined of returning knowing which search engines to contact for from a lookup table created from observa- tions of the network load at this time of day good links specific queries at particular times. SAVVY- in the past, and local central processing unit and (2) SEARCH tracks long-term performance of search engines on specific query terms to load, which is computed using the UNIX minimizing determine which are appropriate and moni- uptime command. computa- tors recent performance of search engines to Concurrency has a base value of two; as determine whether it is even worth trying to many as two additional units are added for tional and contact them. each load estimate for periods of low load. Thus, the maximum concurrency value is six. web resource In this section, we describe SAVVYSEARCH from a user’s perspective. We follow a run- Ranking Search Engines SAVVYSEARCH consumption. ning example, indicating what a user sees includes both large robot-based search engines and what goes on behind the scenes in pro- and small specialized search engines in its set. cessing a search request. The large search engines are likely to return links for any query, but these links might not Submitting a Query be as appropriate as links returned by a spe- To find out about artificial intelligence con- cialized search engine for a query in its area. ferences, we enter the query, select the Inte- The purpose of ranking is to determine grate Results option, and click on the SAVVY- which search engines are most worthwhile to SEARCH! button, as shown in an image of the contact for a given query. Search engines are interface in figure 1. The search form, the ranked based on learned associations between query interface to SAVVYSEARCH, asks the user search engines and query terms (stored in a to specify a set of keywords (query terms) and metaindex) and recent data on search-engine options for the search. Users typically enter performance. two query terms. The metaindex—A compendium of The options cover the treatment of the search experience: The metaindex maintains terms, the display of results, and the interface associations between individual query terms language. Query terms can be combined with (simplified by stemming and case stripping) logical and (all query terms must be included and search engines as effectiveness values. in documents) or or (any query term should High positive values indicate excellent perfor- 20 AI MAGAZINE Articles Figure 1. User Interface to SAVVYSEARCH for Entering a Query. mance of a search engine on queries contain- as some questionable responses. For each ing a specific term; high negative values indi- search, we collect two types of information: cate extremely poor performance. (1) no results, the search engine failed to The effectiveness values are derived from return links, and (2) visits, the number of two types of observation of the results of links that are explored by the user. No results users’ searches. We used observations (passive reduces confidence that the search engine is measures) because we obtained a low rate of appropriate for the particular query, and Vis- response to requests for user feedback as well its indicates that the user found some SUMMER 1997 21 Articles returned links to be interesting and so and are likely to return something for any increases confidence. query, their weights will tend to grow larger SAVVYSEARCH uses a simple weight-adjust- and more quickly than the specialized search ment scheme for learning effectiveness val- engines. Ts mitigates the tendency toward ues. No results and Visits are treated as nega- their dominance, allowing more specialized tive and positive reinforcement, respectively, engines to be selected when appropriate. amortized by the number of terms in the The penalties (Ps,h and Ps,r) are accrued query. Thus, if a search engine returned noth- only if thresholds on the minimum number ing for the example query, the effectiveness of hits and the maximum response time are values for artificial, intelligence, and conferences exceeded.
Recommended publications
  • Internet Training: the Basics. INSTITUTION Temple Univ., Philadelphia, PA
    DOCUMENT RESUME ED 424 360 CE 077 241 AUTHOR Gallo, Gail; Wechowski, Chester P. TITLE Internet Training: The Basics. INSTITUTION Temple Univ., Philadelphia, PA. Center for Vocational Education Professional Personnel Development. PUB DATE 1998-04-00 NOTE 27p. PUB TYPE Guides Non-Classroom (055) EDRS PRICE MF01/PCO2 Plus Postage. DESCRIPTORS Adult Education; Educational Needs; *Information Sources; *Online Systems; Postsecondary Education; Secondary Education; Teaching Methods; Vocational Education Teachers; *World Wide Web IDENTIFIERS Web Sites ABSTRACT This paper outlines the basic information teachers need to . know to use the World Wide Web for research and communication, using Netscape 3.04. Topics covered include the following: what is the World Wide Web?; what is a browser?; accessing the Web; moving around a web document; the Uniform Resource Locator (URL); Bookmarks; saving and printing a web page; Netscape's options preferences menu; searching the Web; and interesting places (websites of interest to vocational educators) .Two appendixes contain the following: "Infoseek: How Do I Search? 'Ruby Slippers'" and Guidelines for Completing Abstracts. (KC) ******************************************************************************** * Reproductions supplied by EDRS are the best that can be made * * from the original document. * ***********************************************************************i******** t. The Center for Vocational Education \I/ Professional Personnel Development/ \ INTERNET TRAINING: THE BASICS Developed by Gail Gallo Project Director Chester P. Wichowski Temple University April, 1998 ./1.S. DEPARTMENT OFEDUCATION /ffice of Educational Research and Improvement PERMISSION TO REPRODUCE AND E UCATIONAL RESOURCES INFORMATION DISSEMINATE THIS MATERIAL HAS CENTER (ERIC) BEEN GRANTED BY This document has been reproduced as received from the person or organization originating it. 0 Minor changes have been made to improve reproduction quality.
    [Show full text]
  • Market Research SD-5 Gathering Information About Commercial Products and Services
    Market Research SD-5 Gathering Information About Commercial Products and Services DEFENSE STANDARDIZATION PROGRA M JANUARY 2008 Contents Foreword 1 The Market Research Other Considerations 32 Background 2 Process 13 Amount of Information Strategic Market Research to Gather 32 What Is Market Research? 2 (Market Surveillance) 14 Procurement Integrity Act 32 Why Do Market Research? 2 Identify the Market or Market Paperwork Reduction Act 33 Segment of Interest 14 When Is Market Research Cost of Market Research 34 Done? 5 Identify Sources of Market Information 16 Who Should Be Involved In Market Research? 7 Collect Relevant Market Other Information Information 17 Technical Specialist 8 Document the Results 18 on Market Research 35 User 9 Logistics Specialist 9 Tactical Market Research Appendix A 36 (Market Investigation) 19 Testing Specialist 9 Types of Information Summarize Strategic Market Available on the Internet Cost Analyst 10 Research 19 Legal Counsel 10 Formulate Requirements 20 Appendix B 39 Contracting Officer 10 Web-Based Information Identify Sources of Sources Information 21 Guiding Principles 11 Collect Product or Service Appendix C 47 Examples of Tactical Start Early 11 Information from Sources 22 Collect Information from Information Define and Document Product or Service Users 26 Requirements 11 Evaluate the Data 27 Refine as You Proceed 12 Document the Results 30 Tailor the Investigation 12 Repeat as Necessary 12 Communicate 12 Involve Users 12 Foreword The Department of Defense (DoD) relies extensively on the commercial market for the products and services it needs, whether those products and services are purely commercial, modified for DoD use from commercial products and services, or designed specifically for DoD.
    [Show full text]
  • Evaluation of Web-Based Search Engines Using User-Effort Measures
    Evaluation of Web-Based Search Engines Using User-Effort Measures Muh-Chyun Tang and Ying Sun 4 Huntington St. School of Information, Communication and Library Studies Rutgers University, New Brunswick, NJ 08901, U.S.A. [email protected] [email protected] Abstract This paper presents a study of the applicability of three user-effort-sensitive evaluation measures —“first 20 full precision,” “search length,” and “rank correlation”—on four Web-based search engines (Google, AltaVista, Excite and Metacrawler). The authors argue that these measures are better alternatives than precision and recall in Web search situations because of their emphasis on the quality of ranking. Eight sets of search topics were collected from four Ph.D. students in four different disciplines (biochemistry, industrial engineering, economics, and urban planning). Each participant was asked to provide two topics along with the corresponding query terms. Their relevance and credibility judgment of the Web pages were then used to compare the performance of the search engines using these three measures. The results show consistency among these three ranking evaluation measures, more so between “first 20 full precision” and search length than between rank correlation and the other two measures. Possible reasons for rank correlation’s disagreement with the other two measures are discussed. Possible future research to improve these measures is also addressed. Introduction The explosive growth of information on the World Wide Web poses a challenge to traditional information retrieval (IR) research. Other than the sheer amount of information, some structural factors make searching for relevant and quality information on the Web a formidable task.
    [Show full text]
  • A Meta Search Engine Based on a New Result Merging Strategy
    Usearch: A Meta Search Engine based on a New Result Merging Strategy Tarek Alloui1, Imane Boussebough2, Allaoua Chaoui1, Ahmed Zakaria Nouar3 and Mohamed Chaouki Chettah3 1MISC Laboratory, Department of Computer Science and its Applications, Faculty of NTIC, University Constantine 2 Abdelhamid Mehri, Constantine, Algeria 2LIRE Laboratory, Department of Software Technology and Information Systems, Faculty of NTIC, University Constantine 2 Abdelhamid Mehri, Constantine, Algeria 3Department of Computer Science and its Applications, Faculty of NTIC, University Constantine 2, Abdelhamid Mehri, Constantine, Algeria Keywords: Meta Search Engine, Ranking, Merging, Score Function, Web Information Retrieval. Abstract: Meta Search Engines are finding tools developed for improving the search performance by submitting user queries to multiple search engines and combining the different search results in a unified ranked list. The effectiveness of a Meta search engine is closely related to the result merging strategy it employs. But nowadays, the main issue in the conception of such systems is the merging strategy of the returned results. With only the user query as relevant information about his information needs, it’s hard to use it to find the best ranking of the merged results. We present in this paper a new strategy of merging multiple search engine results using only the user query as a relevance criterion. We propose a new score function combining the similarity between user query and retrieved results and the users’ satisfaction toward used search engines. The proposed Meta search engine can be used for merging search results of any set of search engines. 1 INTRODUCTION advantages of MSEs are their abilities to combine the coverage of multiple search engines and to reach Nowadays, the World Wide Web is considered as the deep Web.
    [Show full text]
  • Internet Research
    Discipline Specific Researching on the Internet The internet is growing exponentially, and thousands of new web pages are being added each day. The upside is that you have an enormous amount of information only a few mouse clicks away. The downside is that you must refine your approach to online research in order to target the handful of websites that may be useful to you. The following search tops are designed to get you stared and save you time. Happy hunting! Finding Sources Go to www.library.unh.edu The University of New Hampshire Library has access to dozens of online databases catering to nearly every subject you may be studying. The site also has several online research guides and tools to ensure you'll find what you need. The library’s web site is a good first choice to help narrow your research. Try several search engines Google is the most popular search engine in the world, so that is a good place to start. There are, however, other search engines that might be of use to you: Altavista.com askjeeves.com ditto.com excite.com metacrawler.com dogpile.com Alltheweb.com yahoo.com Use the advanced search function Most search engines have a link for “advanced search.” This can be extremely useful in narrowing your search, as the additional features enable you to search by exact phrase, date range, domain, language, date of web page update, and more. Choose your search words carefully Use words that are specific, or unique, to what you are looking for. Some words are very common and will lead to far too many hits.
    [Show full text]
  • Meta Search Engine with an Intelligent Interface for Information Retrieval on Multiple Domains
    International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.1, No.4, October 2011 META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON MULTIPLE DOMAINS D.Minnie1, S.Srinivasan2 1Department of Computer Science, Madras Christian College, Chennai, India [email protected] 2Department of Computer Science and Engineering, Anna University of Technology Madurai, Madurai, India [email protected] ABSTRACT This paper analyses the features of Web Search Engines, Vertical Search Engines, Meta Search Engines, and proposes a Meta Search Engine for searching and retrieving documents on Multiple Domains in the World Wide Web (WWW). A web search engine searches for information in WWW. A Vertical Search provides the user with results for queries on that domain. Meta Search Engines send the user’s search queries to various search engines and combine the search results. This paper introduces intelligent user interfaces for selecting domain, category and search engines for the proposed Multi-Domain Meta Search Engine. An intelligent User Interface is also designed to get the user query and to send it to appropriate search engines. Few algorithms are designed to combine results from various search engines and also to display the results. KEYWORDS Information Retrieval, Web Search Engine, Vertical Search Engine, Meta Search Engine. 1. INTRODUCTION AND RELATED WORK WWW is a huge repository of information. The complexity of accessing the web data has increased tremendously over the years. There is a need for efficient searching techniques to extract appropriate information from the web, as the users require correct and complex information from the web. A Web Search Engine is a search engine designed to search WWW for information about given search query and returns links to various documents in which the search query’s key words are found.
    [Show full text]
  • Meta Search Engine Examples
    Meta Search Engine Examples mottlesMarlon istemerariously unresolvable or and unhitches rice ichnographically left. Salted Verney while crowedanticipated no gawk Horst succors underfeeding whitherward and naphthalising. after Jeremy Chappedredetermines and acaudalfestively, Niels quite often sincipital. globed some Schema conflict can be taken the meta descriptions appear after which result, it later one or can support. Would result for updating systematic reviews from different business view all fields need to our generated usually negotiate the roi. What is hacking or hacked content? This meta engines! Search Engines allow us to filter the tons of information available put the internet and get the bid accurate results And got most people don't. Best Meta Search array List The Windows Club. Search engines have any category, google a great for a suggestion selection has been shown in executive search input from health. Search engine name of their booking on either class, the sites can select a search and generally, meaning they have past the systematisation of. Search Engines Corner Meta-search Engines Ariadne. Obsession of search engines such as expedia, it combines the example, like the answer about search engines out there were looking for. Test Embedded Software IC Design Intellectual Property. Using Research Tools Web Searching OCLS. The meta description for each browser settings to bing, boolean logic always prevent them the hierarchy does it displays the search engine examples osubject directories. Online travel agent Bookingcom has admitted that playing has trouble to compensate customers whose personal details have been stolen Guests booking hotel rooms have unwittingly handed over business to criminals Bookingcom is go of the biggest online travel agents.
    [Show full text]
  • C:\Users\Boots\Dropbox\Research\Spring 2014\Rebuilt CCCENL
    Using Metasearch Engines for Chemistry-II Part of The Alchemist's Lair Web Site Maintained by Harry E. Pence, Professor of Chemistry, SUNY Oneonta, for the use of his students. Any opinions are totally coincidental and have no official endorsement, including the people who sign my pay checks. Comments and suggestions are welcome ([email protected]). Last Revised Sept. 30, 2004 YOU ARE HERE> Alchemist's Lair > Web Tutorial > Meta Search Engines for the WWW. (This article appeared in the Fall 2004 issue of the Computers in Chemical Education Newsletter.) Meta Search Engines for the WWW, Harry E. Pence, SUNY Oneonta, Oneonta, NY, [email protected] Introduction The mergers and realignments of the past year are still churning the world of search engines. As if this were not enough, Amazon has recently announced that it is developing its own search engine that will make it much easier for an individual to manage his or her information resources. Until this situation becomes more stable, any attempt to compare the different engines is as fruitless as trying to nail plain jello on the wall. On the other hand, there have been some interesting developments in the world of metasearch engines, and it has been two years since this topic was discussed in these reviews (see Using Metasearch Engines for Chemistry-I). As noted in this previous article, the basic idea of a metasearch engine seems very attractive. If it is true that few search engines cover the entire searchable web (not to speak of the portions of the WWW that cannot be accessed by spiders), it seems like a very good strategy to combine the results from several different engines to obtain greater coverage and create a correspondingly greater chance of finding the best reference to the topic being searched.
    [Show full text]
  • Google Overview Created by Phil Wane
    Google Overview Created by Phil Wane PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Tue, 30 Nov 2010 15:03:55 UTC Contents Articles Google 1 Criticism of Google 20 AdWords 33 AdSense 39 List of Google products 44 Blogger (service) 60 Google Earth 64 YouTube 85 Web search engine 99 User:Moonglum/ITEC30011 105 References Article Sources and Contributors 106 Image Sources, Licenses and Contributors 112 Article Licenses License 114 Google 1 Google [1] [2] Type Public (NASDAQ: GOOG , FWB: GGQ1 ) Industry Internet, Computer software [3] [4] Founded Menlo Park, California (September 4, 1998) Founder(s) Sergey M. Brin Lawrence E. Page Headquarters 1600 Amphitheatre Parkway, Mountain View, California, United States Area served Worldwide Key people Eric E. Schmidt (Chairman & CEO) Sergey M. Brin (Technology President) Lawrence E. Page (Products President) Products See list of Google products. [5] [6] Revenue US$23.651 billion (2009) [5] [6] Operating income US$8.312 billion (2009) [5] [6] Profit US$6.520 billion (2009) [5] [6] Total assets US$40.497 billion (2009) [6] Total equity US$36.004 billion (2009) [7] Employees 23,331 (2010) Subsidiaries YouTube, DoubleClick, On2 Technologies, GrandCentral, Picnik, Aardvark, AdMob [8] Website Google.com Google Inc. is a multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products,[9] and generates profit primarily from advertising through its AdWords program.[5] [10] The company was founded by Larry Page and Sergey Brin, often dubbed the "Google Guys",[11] [12] [13] while the two were attending Stanford University as Ph.D.
    [Show full text]
  • List of Search Engines
    A blog network is a group of blogs that are connected to each other in a network. A blog network can either be a group of loosely connected blogs, or a group of blogs that are owned by the same company. The purpose of such a network is usually to promote the other blogs in the same network and therefore increase the advertising revenue generated from online advertising on the blogs.[1] List of search engines From Wikipedia, the free encyclopedia For knowing popular web search engines see, see Most popular Internet search engines. This is a list of search engines, including web search engines, selection-based search engines, metasearch engines, desktop search tools, and web portals and vertical market websites that have a search facility for online databases. Contents 1 By content/topic o 1.1 General o 1.2 P2P search engines o 1.3 Metasearch engines o 1.4 Geographically limited scope o 1.5 Semantic o 1.6 Accountancy o 1.7 Business o 1.8 Computers o 1.9 Enterprise o 1.10 Fashion o 1.11 Food/Recipes o 1.12 Genealogy o 1.13 Mobile/Handheld o 1.14 Job o 1.15 Legal o 1.16 Medical o 1.17 News o 1.18 People o 1.19 Real estate / property o 1.20 Television o 1.21 Video Games 2 By information type o 2.1 Forum o 2.2 Blog o 2.3 Multimedia o 2.4 Source code o 2.5 BitTorrent o 2.6 Email o 2.7 Maps o 2.8 Price o 2.9 Question and answer .
    [Show full text]
  • Architecture Based Study of Search Engines and Meta Search Engines for Information Retrieval A
    International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 2 Issue 5, May - 2013 Architecture Based Study Of Search Engines And Meta Search Engines For Information Retrieval A. Madhavi1,K. Harisha Chari2 1Asst.Professor, Matrusri Institute of PG Studies, Hyderabad, India. 2Asst.Professor, K.V Ranga Reddy College, Hyderabad, India. Abstract not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to WWW is a huge repository of information. republish to post on servers or to redistribute to The complexity of accessing the web data has lists, requires prior specific permission and/or a fee. increased tremendously over the years. There is Single search engine are increased coverage and a a need for efficient searching techniques to consistent interface. A recent study by Lawrence extract appropriate information from the web, as and Giles estimated the size of the web at about 800 million indexable pages. This same study the users require correct and complex concluded that no single search engine covered information from the web. more than about sixteen percent of the total. By This paper analyses the architectures and searching multiple search engines simultaneously features of metasearch engines for searching and via a metasearch engine, coverage increases receiving documents on single and multiples dramatically over searching only one engine. domains on the web. A web search engine searches Lawrence and Giles found that combining the for information in WWW. results of 11 major search engines increased the coverage to about 42% of the estimated size of the 1.
    [Show full text]
  • CS 101 - Intro to Computer Science Such Software Has Capabilities Such As: Search Engines Organizing an Index to the Information in a Database
    What is a Search Engine? A search engine is a computer program that allows a user to submit a query and retrieve information from a database. CS 101 - Intro to Computer Science Such software has capabilities such as: Search Engines organizing an index to the information in a database, Dr. Stephen P. Carl enabling the formulation of a query by a user, and searching the index in response to a query. Web Crawlers - Mining Web Information Types of web search engines A search engine for the WWW uses a program called a robot or spider to index the information on Web pages. Topical search engines organize their catalogs of sites by topic or subject. Examples are Yahoo! and Lycos a2z. Spiders work by following all the links on a page according to a specific search strategy. The simplest strategy is to collect all links from a page and then follow them, one by one, collecting new links as new pages are visited. The content of each page is added to a database. The database is indexed and this index is searched upon receipt of a query from the user; the search engine then presents a sorted list of matching results to the user. 1-4 Types of web search engines Types of web search engines Keyword or Key Phrase search engines let the user specify a set of Metasearch Engines send queries to several other search engines keywords or phrases related to the desired content. and consolidate the results. Some, such as ProFusion, filter the results to remove duplicates and check the validity of the links.
    [Show full text]