Intelligent Web Agent for Search Engines



International Conference on Trends and Advances in Computation and Engineering, TRACE-2010

Intelligent Web Agent for Search Engines

Avinash N. Bhute, Harsha A. Bhute, Dr. B. B. Meshram

Abstract

In this paper we review studies of the growth of the Internet and of technologies that are useful for information search and retrieval on the Web. Search engines retrieve this information efficiently. We collected data on the Internet from several different sources, e.g., current as well as projected numbers of users, hosts, and Web sites. The trends cited by the sources are consistent and point to exponential growth in the past and in the coming decade. Hence it is not surprising that about 85% of Internet users surveyed claim to use search engines and search services to find specific information, yet users are not satisfied with the performance of the current generation of search engines: retrieval is slow, communication delays are common, and the quality of retrieved results is poor. Web agents, programs acting autonomously on some task, are already present in the form of spiders, crawlers, and robots. Agents offer substantial benefits as well as hazards, and because of this their development must involve attention to technical detail. This paper illustrates the different types of agents, crawlers, robots, etc. used for mining the contents of the Web in a methodical, automated manner, and also discusses the use of crawlers to gather specific types of information from Web pages, such as harvesting e-mail addresses.

Keywords: Chatterbot, spiders, Web agents, Web crawler, Web robots.

I. INTRODUCTION

The potential Internet-wide impact of autonomous software agents on the World Wide Web [1] has spawned a great deal of discussion and occasional controversy. Based upon experience in the design and operation of the Spider [2], such tools can create substantial value for users of the Web. Unfortunately, agents can also be pests, generating substantial loads on already overloaded servers and generally increasing Internet backbone traffic. Much of the discussion in this paper is about agents, their types, and their architecture.

(Avinash N. Bhute is with Sinhgad College of Engineering, Department of Information Technology, as Assistant Professor, Pune, Maharashtra, India; e-mail: [email protected]. Harsha A. Bhute is pursuing ME, Computer Engg. (Computer N/W), S.C.O.E., Pune, Maharashtra, India; [email protected]. Dr. B. B. Meshram is Professor and Head of the Computer Engineering Department, VJTI, Mumbai, Maharashtra, India; [email protected].)

II. WEB CRAWLER

A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms [3], Web spiders, and Web robots; "Web crawling" and "spidering" are used interchangeably for the activity itself. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used to automate maintenance tasks on a Web site, such as checking links or validating HTML code, and to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam). A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in each page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
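To make the seed-and-frontier loop above concrete, the following short Python sketch implements a simplified breadth-first crawler. It is an illustration only, not code from the paper: the seed URL, the regular-expression link extractor, and the page limit are assumptions, and a real crawler would also apply the crawling policies discussed below.

```python
# Minimal breadth-first crawl loop: seeds -> frontier -> fetch -> extract links.
# Illustrative sketch only; a production crawler would also honour robots.txt,
# rate-limit requests, and parse HTML properly instead of using a regex.
import re
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def crawl(seeds, max_pages=100):
    frontier = deque(seeds)          # the "crawl frontier"
    visited = set()
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except Exception:
            continue                 # skip pages that cannot be fetched
        pages[url] = html            # keep a copy for later indexing
        for link in HREF_RE.findall(html):
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in visited:
                frontier.append(absolute)
    return pages

# Example with a hypothetical seed:
# pages = crawl(["http://example.com/"], max_pages=10)
```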
A. Crawling policies

There are important characteristics of the Web that make crawling it very difficult: its large volume, its fast rate of change, and dynamic page generation. These characteristics combine to produce a wide variety of possible crawlable pages. The large volume implies that the crawler can only download a fraction of the Web pages within a given time. The behavior of a Web crawler is the outcome of a combination of the following policies: a selection policy, a re-visit policy, a politeness policy, and a parallelization policy.

i. Selection policy

The selection policy states which pages to download. Because a crawler always downloads just a fraction of the Web pages, even large search engines cover only a portion of the publicly available Web; a study by Lawrence and Giles showed that no search engine indexed more than 16% of the Web in 1999 [4]. It is therefore highly desirable that the downloaded fraction contains the most relevant pages and not just a random sample of the Web. Cho et al. made the first study of policies for crawl scheduling. Their data set was a 180,000-page crawl of the stanford.edu domain, on which a crawling simulation was run with different strategies [5]. The ordering metrics tested were breadth-first, backlink count, and partial PageRank calculations. One of the conclusions was that if the crawler wants to download pages with high PageRank early in the crawling process, then the partial PageRank strategy is the best, followed by breadth-first and backlink count. However, these results are for a single domain only. Najork and Wiener [6] performed an actual crawl of 328 million pages, using breadth-first ordering. They found that a breadth-first crawl captures pages with high PageRank early in the crawl (although they did not compare this strategy against other strategies). The explanation given by the authors for this result is that "the most important pages have many links to them from numerous hosts, and those links will be found early, regardless of on which host or page the crawl originates."

ii. Re-visit policy

The re-visit policy states when to check for changes to the pages. The Web has a very dynamic nature, and crawling even a fraction of it can take a long time, usually measured in weeks or months. By the time a Web crawler has finished its crawl, many events may have happened, including page creations, updates, and deletions. From the search engine's point of view, there is a cost associated with not detecting an event and thus holding an outdated copy of a resource. The most widely used cost functions are freshness and age [7]. Two simple re-visiting policies were studied by Cho and Garcia-Molina [8]. The uniform policy involves re-visiting all pages in the collection with the same frequency, regardless of their rates of change. The proportional policy involves re-visiting more often the pages that change more frequently; the visiting frequency is directly proportional to the (estimated) change frequency. (In both cases, the repeated crawling order of pages can be either random or fixed.) Cho and Garcia-Molina proved the surprising result that, in terms of average freshness, the uniform policy outperforms the proportional policy in both a simulated Web and a real Web crawl. The explanation for this result is that, when a page changes too often, the crawler wastes time trying to re-crawl it too quickly and still cannot keep its copy of the page fresh.
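For reference, the freshness and age cost functions cited above [7] are usually defined as a binary up-to-date indicator and as the time elapsed since the first unsynchronized change of a page. The Python sketch below simply restates those standard definitions; the function and parameter names are ours, not the paper's.

```python
# Freshness and age of a page p at time t, as commonly defined in the
# re-visit literature cited above (names here are illustrative).

def freshness(local_copy_matches_live_page: bool) -> int:
    # F_p(t) = 1 if the local copy equals the live page at time t, else 0.
    return 1 if local_copy_matches_live_page else 0

def age(t: float, first_unsynced_change_time: float, up_to_date: bool) -> float:
    # A_p(t) = 0 while the copy is up to date; otherwise it grows as t minus
    # the time the live page was first modified after the last synchronization.
    return 0.0 if up_to_date else max(0.0, t - first_unsynced_change_time)
```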
iii. Politeness policy

The politeness policy states how to avoid overloading Web sites. Crawlers can retrieve data much more quickly and in greater depth than human searchers, so they can have a crippling impact on the performance of a site. If a single crawler performs multiple requests per second and/or downloads large files, a server can have a hard time keeping up with requests from multiple crawlers. As noted by Koster [9], Web crawlers are useful for a number of tasks, but their use comes at a price for the general community. The costs of using Web crawlers include: network resources, as crawlers require considerable bandwidth and operate with a high degree of parallelism over long periods of time; server overload, especially if the frequency of accesses to a given server is too high; poorly written crawlers, which can crash servers or routers, or which download pages they cannot handle; and personal crawlers that, if deployed by too many users, can disrupt networks and Web servers. A partial solution to these problems is the robots exclusion protocol, also known as the robots.txt protocol [10], a standard by which administrators indicate which parts of their Web servers should not be accessed by crawlers. This standard does not include a suggestion for the interval between visits to the same server, even though this interval is the most effective way of avoiding server overload. Recently, commercial search engines such as Ask Jeeves, MSN, and Yahoo have started to support an extra "Crawl-delay:" parameter in the robots.txt file to indicate the number of seconds to delay between requests.

iv. Parallelization policy

The parallelization policy states how to coordinate distributed Web crawlers. A parallel crawler is a crawler that runs multiple processes in parallel. The goal is to maximize the download rate while minimizing the overhead of parallelization and avoiding repeated downloads of the same page. To avoid downloading the same page more than once, the crawling system requires a policy for assigning the new URLs discovered during the crawling process, since the same URL can be found by two different crawling processes.

III. WEB CRAWLER ARCHITECTURE

Web crawlers are a central part of search engines, and details of their algorithms and architecture are kept as business secrets. When crawler designs are published, there is often an important lack of detail that prevents others from reproducing the work. There are also emerging concerns about "search engine spamming", which prevent major search engines from publishing their ranking algorithms.
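As an illustration of the politeness policy of Section II-A, the sketch below uses Python's standard urllib.robotparser module to check whether a URL may be fetched and to honor a site's "Crawl-delay:" value when one is given. The user-agent string and the one-second default delay are placeholder assumptions.

```python
# Politeness sketch: respect robots.txt rules and the "Crawl-delay:" parameter.
import time
import urllib.robotparser
from urllib.parse import urlparse

def make_robot_parser(url):
    # One parser per host; robots.txt always sits at the root of the host.
    robots_url = "{0.scheme}://{0.netloc}/robots.txt".format(urlparse(url))
    rp = urllib.robotparser.RobotFileParser(robots_url)
    rp.read()
    return rp

def polite_fetch_allowed(rp, url, user_agent="ExampleCrawler"):
    if not rp.can_fetch(user_agent, url):
        return False                        # disallowed by robots.txt
    delay = rp.crawl_delay(user_agent)      # the "Crawl-delay:" value, if any
    time.sleep(delay if delay else 1.0)     # assumed 1 s default between requests
    return True
```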
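For the parallelization policy, a common way to ensure that two crawling processes never fetch the same page is to assign every newly discovered URL to a worker by hashing its host name; the sketch below is one such assignment scheme (the hash choice and worker count are assumptions, not taken from the paper). Keeping all URLs of a host on one worker also makes per-host politeness easier to enforce.

```python
# Assign each discovered URL to exactly one of n crawler processes by hashing
# its host name, so no page is downloaded twice and each host has one worker.
import hashlib
from urllib.parse import urlparse

def assign_to_worker(url: str, n_workers: int) -> int:
    host = urlparse(url).netloc.lower()
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers

# Example: route URLs among 4 hypothetical workers.
# worker_id = assign_to_worker("http://example.com/page.html", 4)
```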
Recommended publications
  • Building a Scalable Index and a Web Search Engine for Music on the Internet Using Open Source Software
    Department of Information Science and Technology Building a Scalable Index and a Web Search Engine for Music on the Internet using Open Source software André Parreira Ricardo Thesis submitted in partial fulfillment of the requirements for the degree of Master in Computer Science and Business Management Advisor: Professor Carlos Serrão, Assistant Professor, ISCTE-IUL September, 2010 Acknowledgments I should say that I feel grateful for doing a thesis linked to music, an art which I love and esteem so much. Therefore, I would like to take a moment to thank all the persons who made my accomplishment possible and hence this is also part of their deed too. To my family, first for having instigated in me the curiosity to read, to know, to think and go further. And secondly for allowing me to continue my studies, providing the environment and the financial means to make it possible. To my classmate André Guerreiro, I would like to thank the invaluable brainstorming, the patience and the help through our college years. To my friend Isabel Silva, who gave me a precious help in the final revision of this document. Everyone in ADETTI-IUL for the time and the attention they gave me. Especially the people over Caixa Mágica, because I truly value the expertise transmitted, which was useful to my thesis and I am sure will also help me during my professional course. To my teacher and MSc. advisor, Professor Carlos Serrão, for embracing my will to master in this area and for being always available to help me when I needed some advice.
  • Risk Management Resolutions Easier Than a Diet; Good for the Health of Your Nonprofit by Melanie L
    A publication of the Nonprofit Risk Management Center, Volume 14, No. 1, January/February 2005. Risk Management Resolutions: easier than a diet; good for the health of your nonprofit, by Melanie L. Herman. Were you one of thousands of Americans who received a gym membership gift tucked neatly in a card from a loved one? Across the country millions of Americans are jotting down resolutions, most of which have something to do with the three Cs: calories, cardio-workouts, or carcinogens. Resolutions about the three health-inducing Cs are awfully tough to keep, as are risk management resolutions that could change the health of your nonprofit. Does the organization you serve have a mission worth preserving? Are there any risks lurking in your nonprofit's future that could spell significant set-back or disaster? Are you losing any sleep about risks related to HR, financial management, fundraising, reputation, or staff/participant injuries? This article offers some simple but important Risk Management Resolutions for 2005. The resolutions can be easily adapted to reflect the circumstances and resources of your nonprofit. (Sidebar: ... the dilemma you're facing and the solution we recommend. To access this free service, visit www.nonprofitrisk.org and click on the ADVICE tab. Or give us a call at (202) 785-3891. Resolution #1: Blow the Dust off Your Policies. Throughout the year the Nonprofit Risk Management Center receives phone calls from chief financial officers, continued on page 2. NEW! Risk Management Essentials Series: visit www.nonprofitrisk.org to check out a special offer from the Nonprofit Risk Management Center. The Risk Management Essentials Series takes the guesswork out of ...)
  • Open Search Environments: the Free Alternative to Commercial Search Services
    Open Search Environments: The Free Alternative to Commercial Search Services. Adrian O’Riordan ABSTRACT Open search systems present a free and less restricted alternative to commercial search services. This paper explores the space of open search technology, looking in particular at lightweight search protocols and the issue of interoperability. A description of current protocols and formats for engineering open search applications is presented. The suitability of these technologies and issues around their adoption and operation are discussed. This open search approach is especially useful in applications involving the harvesting of resources and information integration. Principal among the technological solutions are OpenSearch, SRU, and OAI-PMH. OpenSearch and SRU realize a federated model to enable content providers and search clients communicate. Applications that use OpenSearch and SRU are presented. Connections are made with other pertinent technologies such as open-source search software and linking and syndication protocols. The deployment of these freely licensed open standards in web and digital library applications is now a genuine alternative to commercial and proprietary systems. INTRODUCTION Web search has become a prominent part of the Internet experience for millions of users. Companies such as Google and Microsoft offer comprehensive search services to users free with advertisements and sponsored links, the only reminder that these are commercial enterprises. Businesses and developers on the other hand are restricted in how they can use these search services to add search capabilities to their own websites or for developing applications with a search feature. The closed nature of the leading web search technology places barriers in the way of developers who want to incorporate search functionality into applications.
  • Efficient Focused Web Crawling Approach for Search Engine
    Ayar Pranav, Sandip Chauhan, "Efficient Focused Web Crawling Approach for Search Engine," International Journal of Computer Science and Mobile Computing (IJCSMC), Vol. 4, Issue 5, May 2015, pp. 545-551, ISSN 2320-088X, available online at www.ijcsmc.com. Computer & Science Engineering, Kalol Institute of Technology and Research Center, Kalol, Gujarat, India; [email protected]; [email protected]. Abstract — A focused crawler traverses the web, selecting pages relevant to a predefined topic and neglecting those outside its concern. Collecting domain-specific documents using focused crawlers has been considered one of the most important strategies for finding relevant information. While surfing the internet, it is difficult to deal with irrelevant pages and to predict which links lead to quality pages. Most focused crawlers use a local search algorithm to traverse the web space, but they can easily be trapped within a limited sub-graph of the web that surrounds the starting URLs, and relevant pages are missed when no links lead to them from the starting URLs. To address this problem, we design a focused crawler that calculates the frequency of the topic keyword as well as the synonyms and sub-synonyms of the keyword. A weight table is constructed according to the user query, the similarity of web pages with respect to the topic keywords is checked, and the priority of each extracted link is calculated.
  • Distributed Indexing/Searching Workshop Agenda, Attendee List, and Position Papers
    Distributed Indexing/Searching Workshop: Agenda, Attendee List, and Position Papers. Held May 28-29, 1996 in Cambridge, Massachusetts. Sponsored by the World Wide Web Consortium. Workshop co-chairs: Michael Schwartz, @Home Network; Mic Bowman, Transarc Corp. This workshop brings together a cross-section of people concerned with distributed indexing and searching, to explore areas of common concern where standards might be defined. The Call For Participation [1] suggested particular focus on repository interfaces that support efficient and powerful distributed indexing and searching. There was a great deal of interest in this workshop. Because of our desire to limit attendance to a workable size group while maximizing breadth of attendee backgrounds, we limited attendance to one person per position paper, and furthermore we limited attendance to one person per institution. In some cases, attendees submitted multiple position papers, with the intention of discussing each of their projects or ideas that were relevant to the workshop. We had not anticipated this interpretation of the Call For Participation; our intention was to use the position papers to select participants, not to provide a forum for enumerating ideas and projects. As a compromise, we decided to choose among the submitted papers and allow multiple position papers per person and per institution, but to restrict attendance as noted above. Hence, in the paper list below there are some cases where one author or institution has multiple position papers. [1] http://www.w3.org/pub/WWW/Search/960528/cfp.html Agenda: The Distributed Indexing/Searching Workshop will span two days. The first day's goal is to identify areas for potential standardization through several directed discussion sessions.
  • Natural Language Processing Technique for Information Extraction and Analysis
    International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 2, Issue 8, August 2015, PP 32-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Natural Language Processing Technique for Information Extraction and Analysis T. Sri Sravya1, T. Sudha2, M. Soumya Harika3 1 M.Tech (C.S.E) Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] 2 Head (I/C) of C.S.E & IT Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] 3 M. Tech C.S.E, Assistant Professor, Sri Padmavati Mahila Visvavidyalayam (Women’s University), School of Engineering and Technology, Tirupati. [email protected] Abstract: In the current internet era, there are a large number of systems and sensors which generate data continuously and inform users about their status and the status of devices and surroundings they monitor. Examples include web cameras at traffic intersections, key government installations etc., seismic activity measurement sensors, tsunami early warning systems and many others. A Natural Language Processing based activity, the current project is aimed at extracting entities from data collected from various sources such as social media, internet news articles and other websites and integrating this data into contextual information, providing visualization of this data on a map and further performing co-reference analysis to establish linkage amongst the entities. Keywords: Apache Nutch, Solr, crawling, indexing 1. INTRODUCTION In today’s harsh global business arena, the pace of events has increased rapidly, with technological innovations occurring at ever-increasing speed and considerably shorter life cycles.
  • An Ontology-Based Web Crawling Approach for the Retrieval of Materials in the Educational Domain
    An Ontology-based Web Crawling Approach for the Retrieval of Materials in the Educational Domain Mohammed Ibrahim1 a and Yanyan Yang2 b 1School of Engineering, University of Portsmouth, Anglesea Road, PO1 3DJ, Portsmouth, United Kingdom 2School of Computing, University of Portsmouth, Anglesea Road, PO1 3DJ, Portsmouth, United Kingdom Keywords: Web Crawling, Ontology, Education Domain. Abstract: As the web continues to be a huge source of information for various domains, the information available is rapidly increasing. Most of this information is stored in unstructured databases and therefore searching for relevant information becomes a complex task and the search for pertinent information within a specific domain is time-consuming and, in all probability, results in irrelevant information being retrieved. Crawling and downloading pages that are related to the user’s enquiries alone is a tedious activity. In particular, crawlers focus on converting unstructured data and sorting this into a structured database. In this paper, among others kind of crawling, we focus on those techniques that extract the content of a web page based on the relations of ontology concepts. Ontology is a promising technique by which to access and crawl only related data within specific web pages or a domain. The methodology proposed is a Web Crawler approach based on Ontology (WCO) which defines several relevance computation strategies with increased efficiency thereby reducing the number of extracted items in addition to the crawling time. It seeks to select and search out web pages in the education domain that matches the user’s requirements. In WCO, data is structured based on the hierarchical relationship, the concepts which are adapted in the ontology domain.
  • A Hadoop Based Platform for Natural Language Processing of Web Pages and Documents
    Journal of Visual Languages and Computing 31 (2015) 130–138 Contents lists available at ScienceDirect Journal of Visual Languages and Computing journal homepage: www.elsevier.com/locate/jvlc A hadoop based platform for natural language processing of web pages and documents Paolo Nesi n,1, Gianni Pantaleo 1, Gianmarco Sanesi 1 Distributed Systems and Internet Technology Lab, DISIT Lab, Department of Information Engineering (DINFO), University of Florence, Firenze, Italy article info abstract Available online 30 October 2015 The rapid and extensive pervasion of information through the web has enhanced the Keywords: diffusion of a huge amount of unstructured natural language textual resources. A great Natural language processing interest has arisen in the last decade for discovering, accessing and sharing such a vast Hadoop source of knowledge. For this reason, processing very large data volumes in a reasonable Part-of-speech tagging time frame is becoming a major challenge and a crucial requirement for many commercial Text parsing and research fields. Distributed systems, computer clusters and parallel computing Web crawling paradigms have been increasingly applied in the recent years, since they introduced Big Data Mining significant improvements for computing performance in data-intensive contexts, such as Parallel computing Big Data mining and analysis. Natural Language Processing, and particularly the tasks of Distributed systems text annotation and key feature extraction, is an application area with high computational requirements; therefore, these tasks can significantly benefit of parallel architectures. This paper presents a distributed framework for crawling web documents and running Natural Language Processing tasks in a parallel fashion. The system is based on the Apache Hadoop ecosystem and its parallel programming paradigm, called MapReduce.
  • Curlcrawler:A Framework of Curlcrawler and Cached Database for Crawling the Web with Thumb
    Ela Kumar et al, (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (4), 2011, 1700-1705. CurlCrawler: A framework of CurlCrawler and Cached Database for crawling the web with thumb. Dr Ela Kumar, School of Information and Communication Technology, Gautam Buddha University, Greater Noida; Ashok Kumar, AIM & ACT, Banasthali University, Banasthali (Rajasthan). Abstract - WWW is a vast source of information and search engines are the meshwork to navigate the web for several purposes. It is problematic to identify and ping with graphical frame of mind for the desired information amongst the large set of web pages resulted by the search engine. With further increase in the size of the Internet, the problem grows exponentially. This paper is an endeavor to develop and implement an efficient framework with multi search agents to make search engines inapt to chaffing using curl features of the programming, called CurlCrawler. This work is an implementation experience for use of graphical perception to upright search on the web. Main features of this framework are curl programming, personalization of information, caching and graphical perception. Keywords and Phrases: Indexing Agent, Filtering Agent, Interacting Agent, Seolinktool, Thumb, Whois, CachedDatabase, IECapture, Searchcon, Main_spider, apnasearchdb. 1. INTRODUCTION. Information is a vital role playing versatile thing from availability at church level to web through trends of books. WWW is now the prominent and up-to-date huge repository of information available to everyone, everywhere and every time. Moreover information on web is navigated using search engines like AltaVista, WebCrawler, Hot Boat etc. [1]. The amount of information on the web is growing very fast and owing to this a search engine's optimized functioning is always a thrust area; these are the basic modules of search engine functioning and these essential modules are shown in Fig. 1 [9,10]. (Fig. 1: Components of Information Retrieval System. Store: it stores the crawled pages.)
  • Meta Search Engine with an Intelligent Interface for Information Retrieval on Multiple Domains
    International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.1, No.4, October 2011 META SEARCH ENGINE WITH AN INTELLIGENT INTERFACE FOR INFORMATION RETRIEVAL ON MULTIPLE DOMAINS D.Minnie1, S.Srinivasan2 1Department of Computer Science, Madras Christian College, Chennai, India [email protected] 2Department of Computer Science and Engineering, Anna University of Technology Madurai, Madurai, India [email protected] ABSTRACT This paper analyses the features of Web Search Engines, Vertical Search Engines, Meta Search Engines, and proposes a Meta Search Engine for searching and retrieving documents on Multiple Domains in the World Wide Web (WWW). A web search engine searches for information in WWW. A Vertical Search provides the user with results for queries on that domain. Meta Search Engines send the user’s search queries to various search engines and combine the search results. This paper introduces intelligent user interfaces for selecting domain, category and search engines for the proposed Multi-Domain Meta Search Engine. An intelligent User Interface is also designed to get the user query and to send it to appropriate search engines. Few algorithms are designed to combine results from various search engines and also to display the results. KEYWORDS Information Retrieval, Web Search Engine, Vertical Search Engine, Meta Search Engine. 1. INTRODUCTION AND RELATED WORK WWW is a huge repository of information. The complexity of accessing the web data has increased tremendously over the years. There is a need for efficient searching techniques to extract appropriate information from the web, as the users require correct and complex information from the web. A Web Search Engine is a search engine designed to search WWW for information about given search query and returns links to various documents in which the search query’s key words are found.
  • Effective Focused Crawling Based on Content and Link Structure Analysis
    (IJCSIS) International Journal of Computer Science and Information Security, Vol. 2, No. 1, June 2009. Effective Focused Crawling Based on Content and Link Structure Analysis. Anshika Pal, Deepak Singh Tomar, S.C. Shrivastava, Department of Computer Science & Engineering, Maulana Azad National Institute of Technology, Bhopal, India. Emails: [email protected], [email protected], [email protected]. Abstract — A focused crawler traverses the web, selecting out pages relevant to a predefined topic and neglecting those out of concern. While surfing the internet it is difficult to deal with irrelevant pages and to predict which links lead to quality pages. In this paper, a technique of effective focused crawling is implemented to improve the quality of web navigation. To check the similarity of web pages with respect to topic keywords, a similarity function is used, and the priorities of extracted out-links are also calculated based on metadata and the resultant pages generated by the focused crawler. The proposed work also uses a method for traversing the irrelevant pages met during crawling to improve the coverage of a specific topic. ... dependent ontology, which in turn strengthens the metric used for prioritizing the URL queue. The Link-Structure-Based method analyzes the reference information among the pages to evaluate page value, as in famous algorithms like the PageRank algorithm [5] and the HITS algorithm [6]. There are some other experiments which measure the similarity of page contents to a specific subject using special metrics and reorder the downloaded URLs for the next crawl [7]. A major problem faced by the above focused crawlers is that it is frequently difficult to learn that some sets of off-topic ...
  • UvA-DARE (Digital Academic Repository)
    UvA-DARE (Digital Academic Repository) Search engine freedom: on the implications of the right to freedom of expression for the legal governance of Web search engines van Hoboken, J.V.J. Publication date 2012 Link to publication Citation for published version (APA): van Hoboken, J. V. J. (2012). Search engine freedom: on the implications of the right to freedom of expression for the legal governance of Web search engines. General rights It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons). Disclaimer/Complaints regulations If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible. UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl) Download date:29 Sep 2021 Chapter 2: A Short History of Search Engines and Related Market Developments 22 2.1 The Internet, the Web and the rise of navigational media 2.1.1. Early visions of navigation in digitized information environments The way in which digital computing would lead to a revolution in information and knowledge navigation was already being explored more than half a century ago, when computers were still a rarity and neither the Internet, nor the World Wide Web existed.