Paper No: 05, ICT for Libraries
Module: 13, Search Engines: Concept, Types and Advantages

Development Team

Principal Investigator & Subject Coordinator: Dr. Jagdish Arora, Director, INFLIBNET Centre, Gandhinagar
Paper Coordinator: Dr. Jagdish Arora, Director, INFLIBNET Centre, Gandhinagar
Content Writer: Dr. Usha Mujoo Munshi, Librarian, Indian Institute of Public Administration
Content Reviewers: Mr. Dinesh Pradhan, Scientist B, and Dr. Jagdish Arora, Director, INFLIBNET Centre

Search Engines: Concept, Types and Advantages

I. Objectives

This lesson is designed to impart knowledge on the following components of Internet search engines:

• Search engines and their evolution;
• How search engines work, and the components of a search engine;
• Categories of search engines;
• Search techniques;
• Metadata and search engines;
• Evaluation of search engines; and
• Important search engines.

II. Learning Outcomes

On completion of this lesson, learners will have gained a basic knowledge of search engines. They will learn about the evolution of search engines and about their functions, components and categories. They will also learn, in brief, about important search engines.

III. Module Structure

1. Introduction
2. Search Engines: Definition
3. Evolution of Search Engines
4. How Do Search Engines Work?
4.1 The Robot or Spider
4.2 The Database
4.3 The User Interface or the Agent
5. Categories of Search Engines
5.1 Primary Search Engines
5.2 Meta Search Engines
5.3 Specialised Search Engines
5.4 Subject or Web Directories
5.5 Hybrid Search Engines
5.6 Subject Gateways or Subject Portals
6. Choosing a Search Engine
6.1 Ease of Use
6.2 Comprehensiveness
6.3 Quality of Content
6.4 Control Over the Search
6.5 Flexibility in Searching
6.6 Assessment of Relevance
6.7 Informative Presentation of Results
7. Searching the Web: Search Techniques
7.1 Basic Search
7.2 Advanced Search or Refining Your Search
7.2.1 Boolean Operators
7.2.2 Phrase Searching
7.2.3 Proximity Searching
7.2.4 Parentheses
7.2.5 Truncation and Wildcards
7.2.6 Case Sensitivity
7.2.7 Field Searching
8. Evaluation of Search Engines
8.1 Database of Web Documents
8.2 Capabilities of a Search Engine
8.3 Basic Search Options and Limitations
8.4 Advanced Search Options and Limitations
8.5 General Limitations and Features
8.6 Results Display
9. Important Search Engines
9.1 Primary Search Engines
9.1.1 Google (http://www.google.com/)
9.1.2 Bing Search (http://www.bing.com/)
9.1.3 Yahoo! (http://www.yahoo.com/)
10. Summary
11. References

1. Introduction

The growth of the Internet has led to a paradoxical situation: while a huge amount of information is available on the Internet, the sheer volume of unorganized information makes it difficult for users to find relevant and accurate information quickly and efficiently. The first Google index, in 1998, had 26 million pages; it touched the mark of one billion by 2000, and in mid-2008 it reached a new milestone of 1 trillion (as in 1,000,000,000,000) unique URLs. The Internet can be regarded as the most exhaustive, important and useful source of information on almost all branches of knowledge, hosted on millions of servers connected to the Internet around the world.
Searching for specific information is, for many users, the main purpose of using the Internet. However, with such an excess of information available, it has become very difficult for a common user to find precise and relevant information on the Internet. To tackle this situation, computer scientists came up with search tools that sift through the information on the Internet to churn out the information a user requires. A variety of search, resource-discovery and browsing tools have been developed to support more efficient information retrieval; search engines are one such discovery tool. Search engines use automated programs, variously called bots, robots, spiders, crawlers, wanderers and worms, to search the web. The robots traverse the web in order to index web sites. Some of them index web sites by title, some by uniform resource locators (URLs), some by the words in each document on a web site, and some by combinations of these. These search engines function in different ways and search different parts of the Internet.
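To make the idea concrete, here is a minimal sketch of such a robot in Python (standard library only). All names in it are illustrative and hypothetical, not taken from any real search engine: it fetches a page, records the page's title and words in an in-memory inverted index, extracts the outgoing links, and repeats up to a fixed limit. Real crawlers add politeness rules (robots.txt), deduplication, persistent storage and much more.

from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the <title>, the words and the outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.words = []
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.words.extend(data.lower().split())


def crawl(seed_url, max_pages=10):
    """Breadth-first crawl from seed_url, building an inverted index:
    word -> set of URLs whose text contains that word."""
    index = defaultdict(set)
    titles = {}
    frontier = [seed_url]
    seen = set()

    while frontier and len(seen) < max_pages:
        url = frontier.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue  # skip unreachable pages
        parser = PageParser()
        parser.feed(html)
        titles[url] = parser.title.strip()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute.startswith(("http://", "https://")):
                frontier.append(absolute)

    return index, titles

With this sketch, crawl("https://example.com", max_pages=5) would return an index in which index["search"] is the set of crawled URLs containing the word "search". An engine that indexes sites by title only, as some early robots did, would store parser.title instead of every word on the page.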
2. Search Engines: Definition

Search engine is a generic term for software that "searches" the web for pages relating to a specific query. Google, Yahoo and Bing are a few examples of common search engines that index and search a significant part of the web. Several web sites have their own search engines to index their own content, and the World Wide Web has several sites dedicated to indexing the information on all other sites. A search engine can be defined as a tool for finding, classifying and storing information held on various websites on the Internet. It can help in locating information of relevance on a particular subject by using various search methods. It is a service that indexes, organizes, and often rates and reviews web sites; it helps users find the proverbial needle in the Internet haystack. Different search engines work in different ways: some rely on people to maintain a catalogue of web sites or web pages, others use software to identify key information on sites across the Internet, and some combine both approaches. Searching the Internet with different search engines for the same topic therefore produces different results.

3. Evolution of Search Engines

Archie, developed in 1990 by Alan Emtage, a student at McGill University in Montreal, can be considered the first search engine; it was used for indexing and searching files on FTP servers. Archie became a database of filenames which it would match against users' queries. Inspired by the success of Archie, the University of Nevada developed Veronica (Very Easy Rodent-Oriented Netwide Index to Computerized Archives) to search all menu items on Gopher servers. Soon another tool with the same purpose appeared: Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display), a powerful Gopher search tool written by Rhett "Jonzy" Jones. It was a computer program that searched a specified Gopher site (not all sites), and it searched directory titles only, not the text of the resources that appeared on the Gopher submenus. Archie, Veronica and Jughead have now disappeared, but before the web's spectacular growth these tools were real workhorses for searchers on the Internet. In 1993, the first web robot, called the World Wide Web Wanderer, was introduced by Matthew Gray to traverse the Web.

In October 1993, Martijn Koster developed an Archie-like indexing tool for the Web, called ALIWEB. It did not use a robot to collect metadata; instead, it allowed users to submit the web sites they wanted indexed by ALIWEB, along with their own descriptions and keywords. By December 1993, three full-fledged robot-fed search engines had surfaced on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) spider. JumpStation gathered the title and header from web pages and retrieved them using a simple linear search; as the web grew, JumpStation slowed to a stop. The WWW Worm indexed titles and URLs. Neither JumpStation nor the World Wide Web Worm used any ranking method to list their search results; results were simply listed in the order they were found. The RBSE spider did implement a ranking system (a toy contrast between unranked and ranked retrieval is sketched at the end of this section).

Excite was a by-product of a project called Architext, started in 1993 by six Stanford undergraduates, who used statistical analysis of word relationships to make searching more efficient. The Excite search software was released by mid-1993. The technique was of limited value at the time, however, because the spiders were not intelligent enough to understand what all the links meant. The EINet Galaxy web directory was launched in January 1994 and became a success, since it offered Gopher and Telnet search features in addition to its web search feature. In April 1994, David Filo and Jerry Yang created Yahoo as a collection of their favorite web pages. As their number of links grew, they had to reorganize it into a searchable directory; the Yahoo directory provided a description with each URL, an improvement over the Wanderer.

Brian Pinkerton of the University of Washington launched WebCrawler on April 20, 1994. It was the first crawler that indexed entire pages. In 1997, Excite bought out WebCrawler, and AOL began using Excite to power its NetFind. WebCrawler opened the door for many other services to follow suit. Three important search engines, namely Lycos, Infoseek and OpenText, appeared soon after WebCrawler was launched. Lycos, the next major search engine, was developed at Carnegie Mellon University and launched on July 20, 1994 with a catalogue of 54,000 documents. By August 1994, Lycos had identified 394,000 documents, and by November 1996 it had indexed over 60 million documents, more than any other web search engine. In October 1994, Lycos ranked first on Netscape's list of search engines by finding the most hits on the word "surf". Infoseek was also launched in 1994, and in December 1995 Netscape started using it as its default search engine. AltaVista, also launched in December 1995, brought many important features to web searching.
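To close, the contrast noted above between early unranked engines (JumpStation, the WWW Worm) and ranked ones (the RBSE spider and its successors) can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the pages are invented, and simple term frequency stands in for whatever scoring the RBSE spider actually used, which the text does not specify.

def unranked_search(pages, term):
    """Return matching pages in crawl order, as JumpStation did."""
    return [page for page in pages if term in page["words"]]


def ranked_search(pages, term):
    """Return matches sorted by how often the term occurs on the page."""
    matches = unranked_search(pages, term)
    return sorted(matches, key=lambda page: page["words"].count(term),
                  reverse=True)


pages = [  # listed in crawl order; the contents are invented for the example
    {"url": "http://a.example", "words": ["surf", "board", "web"]},
    {"url": "http://b.example", "words": ["surf", "surf", "surf", "shop"]},
]

print([p["url"] for p in unranked_search(pages, "surf")])  # a.example first (crawl order)
print([p["url"] for p in ranked_search(pages, "surf")])    # b.example first (3 occurrences)

Both searches return the same pages; only the ordering differs, which is precisely what a ranking method adds.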