Artificial Life Applied to Adaptive Information Agents
Total Page:16
File Type:pdf, Size:1020Kb
From: AAAI Technical Report SS-95-08. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved. Artificial Life Applied to Adaptive Information Agents Filippo Menczer Richard K. Belew Computer Science and Engineering Department, 0114 University of California, San Diego La Jolla, California 92093, USA {fil, rik} @cs.ucsd.edu Wolfram Willuhn CommunicationTechnology Lab, Image Science Group ETH-Zentrum, ETZ F86 8092 Zurich, Switzerland wolfram@ vision.ee.ethz.ch Abstract the complexity of such an environment. Traditional genetic Wepropose a model,inspired by recent artificial life theory, algorithms [Mitchell & Forrest 1994] are characterized by applied to the problemof retrieving information from a exploitation of information, but the distributed information large, distributed collection of documentssuch as the World gathering problem requires adaptation rather than WideWeb. A populationof agents is evolvedunder density optimization. Therefore we propose a model inspired by dependentselection for the task of locating informationfor the endogenous fitness metaphor to search for relevant the user. Theenergy necessary for survivalis obtainedfrom information through a population of intelligent agents both environmentand user in exchangefor appropriate evolving in the WWWenvironment. This paper presents information. By competingfor relevant documents,the various extensions to the work recently reported in agents robustly adapt to their informationenvironment and [Menczer, Willuhn & Belew 1994]. are allocated to efficiently exploit shared resources. We illustrate the roles playedby documentlocality, adaptive searchstrategies, andrelevance feedback, in the information Background gatheringprocess. The traditional solution to the problem of information retrieval (IR) on the WWWis to build a database index Introduction* all the documents, on which to use traditional search and retrieval techniques. Such virtual libraries are built and The World Wide Web (WWW)is an information updated off-line, either in a user-driven fashion or by environment madeof a very large distributed database of automated exhaustive programs, called spiders or robots. A heterogeneous documents, using a wide-area network and a well-known robot is the WWWWorm [McBryan 1994]. client-server protocol. The structure of this environmentis The off-line approachhas several drawbacks:first, the size that of a graph, where the nodes (documents)are connected of the WWWmakes it increasingly difficult to update by hyperlinks. The typical strategy for accessing virtual libraries without inefficient use of network information on the WWWis to navigate across documents resources. Moreover, any database has to abstract away through hyperlinks, retrieving the information of interest important information, concerning both the content of along the way. The dynamic and distributed nature of the documentsand their structure [Belew 1985]. Finally, the environment, however, makes retrieving specific information gathering and retrieval processes are information a hard task [De Bra & Post 1994, McBryan independent and therefore feedback from the latter cannot 1994]. be used to adaptively improveefficiency nor quality of the On the other hand, the WWWrepresents an ideal former. environmentin which to apply techniques recently matured To remedy someof these problems, the client-based Fish in the field of artificial life (ALife). In particular, adaptive Search algorithm has been proposed [De Bra & Post 1994]. and distributed algorithms seem to appropriately capture This approach uses the metaphorof a school of fish, where agents in a population survive, reproduce, and die based on the energy gained from their performance in the retrieval Researchpartly supportedby a fellowshipfrom the Apple task. This approach, however, stops short of solving the AdvancedTechnology Group remaining problems: no mutation occurs at reproduction, queries, just as different behaviors may be optimal for and each agent in the population follows a non-adaptive, different environments. Only the end results of a search exhaustive depth-first search algorithm. While some (the retrieved documents)can be evaluated, and the agents heuristics are used to order the graph traversal, the lack of identify the relevant information from its correlation with intelligent cutting of search branches results in slow speed energy. Therefore populations will evolve strategies and high network consumption (caching being proposed as effective in the different environments specified by both a palliative). information space and queries. Adaptation means for The idea of adaptive IR is not new [Belew 1989]. More agents to concentrate in high energy areas of the Web, recently, learning agents and traditional genetic algorithms where manydocuments are relevant. Each agent’s survival have been successfully applied to information retrieval will be ensured by exchanging an adequate flow of [Yang & Korfhage 1993] and information filtering [Maes information for energy. & Kozierok 1993, Sheth & Maes 1993]. Workin progress indicates that learning is sped up by extending such models to collaborative multi-agent systems [Lashkari, Metral, & An Implementation Maes1994]. Using a distributed population of cooperating best-first search agents has been recently proposed in the Wehave implemented the endogenous fitness model in a WebAntsproject [Mauldin & Leavitt 1994] to overcome simple prototype of IR system for the WWW.The the single-server and single-client bottlenecks. algorithm is illustrated in Table I. The user provides a query consisting of a set of keywords. A population of agents is initialized with some energy, some random Information Search by Endogenous Fitness strategy, and somedistribution in the Web.The ideal, zero- knowledge assumption is to start with a population at Endogenousfitness models are becoming an increasingly minimaldistance from all nodes. Typical heuristics suggest appreciated and well-understood paradigm in the ALife to initialize the population with a uniformdistribution in a community[Mitchell & Forrest 1994]. While a thorough default set of knownstarting points, or better yet in the discussion of the subject is out of the scope of this paper documentsreturned by a preliminary call to a traditional [see, e.g., Menczer & Belew 1995], we outline in this search engine. section the main aspects that makethis paradigmuseful for the problem of IR in the WWW. INPUT:n>O, e>O, t>O ; queryword(s) A population of agents becomesevolutionary adapted in a dynamicenvironment by a steady-state genetic algorithm. 1 initialize populationof n agentswith random,6, 7, E Energy is the single currency by which agents survive, reproduce, and die, and it must be positively correlated 2 while numberof agents> 0 do for all agents: with someperformance measure for the task defined by the 2.1 computefor eachlink i in current documentthe environment. Agents asynchronously go through a simple estimateL i = sumof occurrencesof eachquery cycle in which they receive input from the environmentas wordin document,weighted inversely to the well as internal state, perform some computation, and numberof headersbetween the positions of i and execute actions. Actions have an energy cost but mayresult the wordin the document in energy intake. Energy is used up and accumulated pick link i to follow accordingto the probability internally throughout an agent’s life; its internal level automatically determines reproduction and death, events in distribution Pi=exp(flLi)//~i exp(~Lj) whichenergy is conserved. Agents that perform the task better than average reproduce more and colonize the population. Indirect 2.2 E ~- E- N + eR whereN = server accesscost measureand current document’srelevance R = interaction among agents occurs without the need of numberof occurrencesof querywords in expensive communication,via competition for the shared, document finite environmental resources. Mutations afford the changes necessary for the evolution of dynamicallyadapted 1 agents. This paradigm enforces density-dependent if ( R + I ) 7 > -ff =:~E ~-- E + where F = user selection: the expected population size is determinedby the feedbackenergy (optional) carrying capacity of the environment. Associating high 2.3 if E>t cloneagent, split parent’senergy with energy costs with expensiveactions intrinsically enforces a offspring, mutateoffspring’s/~, % balanced network load by limiting inefficient uses of bandwidth. else if E < 0 destroyagent In the heterogeneous environment of the WWW,it is Table1: Algorithmfor adaptiveinformation agents. hard to associate a fitness measure with a strategy in general, but it is easy to estimate the results of a strategy In order to keep search strategies simple while allowing applied to a particular query. Different information search adaptivity, stochastic selection is used to navigate across and retrieval strategies may be optimal for different hyperlinks. For each cycle, each agent estimates the 129 hyperlinks from the current node to decide which node to Figure1. Thefact that this collection is closed to the rest of visit next. The estimates, based on a fixed matching the WWWis only one of its limitations. function of the current documentand the user keywords, are scaled by a non-decreasing function to obtain a probability distribution that is in turn used by a stochastic selector. The slope of the non-linear scaling function is determined by the agent’s genetic parameter fl This trait represent the adaptive part of the agent’s search